Hi All,

I've read a few posts regarding the possibility of I/O deadlock when using a loopback-mounted file as the storage layer for DRBD.

I have a few questions which I either haven't found answers to, or haven't found a recent answer to.

I'll just lay out my scenario so you understand where I'm coming from :)
We run a monitoring system that collects data from thousands of devices, which is stored in RRD files. Simple disks don't provide the I/O required for this system to work well. With our current NAS vendor, the cost-reward ratio isn't good enough to justify dedicating spindles to this workload. So we're left looking at other solutions.
SSDs seem to degrade too fast for the given I/O profile - although the last test was over a year ago, so it may be worth looking at this again.

So for the last year or two we have been using prefabricated disk images, dropped into tmpfs and loopback mounted. This gives great performance and some level of persistence (assuming the server doesn't die!): the disk image can be unmounted and copied back to physical disk before powering down the server, etc. This is very stable - the current primary server has had its image loopback mounted in this fashion for about two years and counting without unmounting :)
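For reference, the setup is roughly the following (a sketch only - the paths, size and mount points here are made-up placeholders, not our actual configuration):

    # Create a tmpfs big enough to hold the prefab image (size is an assumption)
    mount -t tmpfs -o size=32G tmpfs /mnt/ramdisk

    # Copy the prefabricated disk image from persistent storage into tmpfs
    cp /srv/images/rrd.img /mnt/ramdisk/rrd.img

    # Loopback mount the image (filesystem type is whatever the image was built with)
    mount -o loop /mnt/ramdisk/rrd.img /var/lib/rrd

    # Before a planned shutdown: unmount and copy the image back to physical disk
    umount /var/lib/rrd
    cp /mnt/ramdisk/rrd.img /srv/images/rrd.img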
We are also running the DB for this application in another prefab image using the same method.

One piece missing from our current solution - which I am currently looking at revamping - is decent replication for DR purposes (or a cluster partnership using active/standby).
My initial plan was to use DRBD :)

I have already configured this as a layer on top of our current tmpfs / prefab image / loopback solution for testing (roughly as sketched below).

Then I found posts advising against it because an I/O deadlock _will_ happen.
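This is approximately the layering I have been testing; again just a sketch, with the loop device, resource name and mount point as placeholders rather than our real config:

    # Attach a loop device to the image sitting in tmpfs
    losetup /dev/loop0 /mnt/ramdisk/rrd.img

    # drbd.conf for the test resource points its "disk" at /dev/loop0, so the
    # stack is: tmpfs -> image file -> loop device -> DRBD -> filesystem
    drbdadm create-md r0      # internal meta-disk needs spare room at the end of the image
    drbdadm up r0
    drbdadm primary r0        # on the first node, forced for the initial sync (exact flag varies by DRBD version)

    # The filesystem is then mounted from the DRBD device instead of the loop device
    mount /dev/drbd0 /var/lib/rrd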
Damn!

In this post, I found some interesting info:
http://www.digipedia.pl/usenet/thread/15095/10032/
<div><br></div><div>"you'd hit this sooner with drbd 0.6, or with drbd 0.7 and kernel 2.4.</div><div>you can probably still hit this with the most recent drbd and kernel."</div><div><br></div><div>"if you don't have much io load, and you do have much free memory, you</div>
<div>probably could even run this for a long time and never have problems.</div><div>but eventually it will deadlock."</div><div><br></div><div><div>So my question are...</div></div><div><br></div><div><br></div><div>
Is this still the case - DRBD 8.3 (or even 8.4) using Linux 3.1 ?</div><div>Has it been tested recently? - the quote is "you can _probably_ still hit this".</div><div><br></div><div>If this is the case - is there any way at all to track how close to deadlock we are?</div>
My plan would be to fail over from one cluster partner to the other about once a month anyway, and restart the loopback device layer. If the deadlock risk is quantifiable in some way, we may be able to make a call and say this is an acceptable risk.
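To make that concrete, the monthly failover I have in mind would look something like this (a sketch under the same placeholder names as above; the real procedure would also stop/start the application around these steps):

    # On the current primary: unmount and step down
    umount /var/lib/rrd
    drbdadm secondary r0

    # On the peer: promote and mount, so the service runs there
    drbdadm primary r0
    mount /dev/drbd0 /var/lib/rrd

    # Back on the old primary: tear down and rebuild the loopback layer,
    # then bring the resource back up and let DRBD resync from the new primary
    drbdadm down r0
    losetup -d /dev/loop0
    # ... recreate the tmpfs / image / loop device ...
    losetup /dev/loop0 /mnt/ramdisk/rrd.img
    drbdadm up r0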
Another more general question is: can anyone think of another way we might be able to leverage tmpfs for use with DRBD that I haven't thought of? :)
Thanks in advance - any assistance/input is appreciated.

Cheers,
Just