<div>Hi All,</div><div><br></div><div><br></div><div><br></div><div>I&#39;ve read a few posts regarding the possibility of I/O deadlock when using a loopback mounted file as the storage layer for DRBD.</div><div><br></div><div>

I have a few questions which I either haven&#39;t found answers to, or haven&#39;t found a recent answer to.</div><div><br></div><div>I&#39;ll just lay out my scenario so you understand where I&#39;m coming from :)</div><div>

<br></div><div><br></div><div><br></div><div>We run a monitoring system that collects data from thousands of devices, which is stored in RRD files.  Simple disks don&#39;t provide the I/O required for this system to work well.  With our current NAS vendor, it seems that the cost-reward ratio is not good enough to make dedicated use of these spindles.  So we&#39;re left looking at other solutions.</div>

<div><br></div><div>SSDs seem to degrade too fast for the given I/O profile - although the last test was over a year ago, so it may be worth looking at this again.</div><div><br></div><div>So for the last year or two we have been using prefabricated disk images, dropped into tmpfs and loopback mounted.  This gives great performance and some level of persistence (assuming the server doesn&#39;t die!): the disk image can be unmounted and copied back to physical disk before powering down the server etc.  This is very stable: the current primary server has had its image loopback mounted in this fashion for about two years and counting without unmounting :)</div>

<div>We are also running the DB for this application in another prefab image using the same method.</div><div><br></div><div>One piece missing from our current solution - which I am currently looking at re-vamping - is decent replication for DR purposes (or a cluster partnership using active/standby).</div>

<div><br></div><div>My initial plan was to use DRBD :)</div><div>I have already configured this as a layer on top of our current tmpfs, prefab image, loopback solution for testing.</div><div>Then I found posts advising against it because an I/O deadlock _will_ happen.</div>

<div><br></div><div>Damn!</div><div><br></div><div><br></div><div>In this post, I found some interesting info.</div><div><a href="http://www.digipedia.pl/usenet/thread/15095/10032/">http://www.digipedia.pl/usenet/thread/15095/10032/</a></div>

<div><br></div><div>&quot;you&#39;d hit this sooner with drbd 0.6, or with drbd 0.7 and kernel 2.4.</div><div>you can probably still hit this with the most recent drbd and kernel.&quot;</div><div><br></div><div>&quot;if you don&#39;t have much io load, and you do have much free memory, you</div>

<div>probably could even run this for a long time and never have problems.</div><div>but eventually it will deadlock.&quot;</div><div><br></div><div><div>So my questions are...</div></div><div><br></div><div><br></div><div>

Is this still the case with DRBD 8.3 (or even 8.4) on Linux 3.1?</div><div>Has it been tested recently?  -  the quote is &quot;you can _probably_ still hit this&quot;.</div><div><br></div><div>If this is the case - is there any way at all to track how close to deadlock we are?</div>

<div><br></div><div><br></div><div>My plan would be to fail-over from one cluster partner to the other about once a month anyway, and restart the loopback device layer.  If the deadlock is quantifiable in some way, we may be able to make a call and say this is acceptable risk.</div>

<div><br></div><div><br></div><div>Another more general question is - can anyone think of another way we might be able to leverage tmpfs for use with DRBD that I haven&#39;t thought of?  :)</div><div><br></div><div><br></div>

<div>Thanks in advance  - any assistance/input is appreciated.</div><div><br></div><div><br></div><br clear="all"><div><br></div>Cheers,<div>Just</div><br>

<pre>