Hi All,

I've read a few posts regarding the possibility of I/O deadlock when using a loopback-mounted file as the storage layer for DRBD. I have a few questions which I either haven't found answers to, or haven't found a recent answer to. I'll just lay out my scenario so you understand where I'm coming from :)

We run a monitoring system that collects data from thousands of devices, which is stored in RRD files. Simple disks don't provide the I/O required for this system to work well. With our current NAS vendor, the cost-reward ratio isn't good enough to justify dedicating those spindles to this system, so we're left looking at other solutions. SSDs seemed to degrade too fast for the given I/O profile - although the last test was over a year ago, so it may be worth looking at this again.

So for the last year or two we have been using prefabricated disk images, dropped into tmpfs and loopback mounted. This gives great performance and some level of persistence (assuming the server doesn't die!) - the disk image can be unmounted and copied back to physical disk before powering down the server, etc. This is very stable; the current primary server has had its image loopback mounted in this fashion for about two years and counting without unmounting :) We are also running the DB for this application from another prefab image using the same method.

One piece missing from our current solution - which I am currently looking at revamping - is decent replication for DR purposes (or cluster partnership using active/standby). My initial plan was to use DRBD :) I have already configured this as a layer on top of our current tmpfs, prefab image, loopback solution for testing. Then I found posts advising against it because an I/O deadlock _will_ happen. Damn!

In this post I found some interesting info: http://www.digipedia.pl/usenet/thread/15095/10032/

"you'd hit this sooner with drbd 0.6, or with drbd 0.7 and kernel 2.4. you can probably still hit this with the most recent drbd and kernel."

"if you don't have much io load, and you do have much free memory, you probably could even run this for a long time and never have problems. but eventually it will deadlock."

So my questions are...

Is this still the case with DRBD 8.3 (or even 8.4) on Linux 3.1? Has it been tested recently? - the quote is "you can _probably_ still hit this".

If it is still the case, is there any way at all to track how close to deadlock we are? My plan would be to fail over from one cluster partner to the other about once a month anyway, and restart the loopback device layer. If the deadlock risk is quantifiable in some way, we may be able to make a call and say this is an acceptable risk.

Another more general question: can anyone think of another way we might be able to leverage tmpfs for use with DRBD that I haven't thought of? :)

Thanks in advance - any assistance/input is appreciated.

Cheers,
Just
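
P.S. In case a concrete picture helps, here's roughly how the layering is wired up on the test box. The paths, size, hostnames and the resource name "r0" below are just placeholders, not necessarily what we run in production:

    # 1. tmpfs to hold the prefab image (size is just an example)
    mount -t tmpfs -o size=20G tmpfs /mnt/ramdisk

    # 2. drop the prefabricated image in and attach it to a loop device
    cp /srv/images/rrd-store.img /mnt/ramdisk/
    losetup /dev/loop0 /mnt/ramdisk/rrd-store.img

    # 3. today we mount the loop device directly:
    #   mount /dev/loop0 /var/lib/rrd
    #
    # with DRBD layered on top, the loop device becomes the backing disk
    # and the filesystem is mounted from /dev/drbd0 on the primary instead:
    drbdadm create-md r0     # once, to initialise the metadata
    drbdadm up r0
    drbdadm primary r0
    mount /dev/drbd0 /var/lib/rrd

and the matching resource stanza, with the loop device as DRBD's backing disk (8.3-style syntax; hostnames and addresses are made up):

    resource r0 {
      on nodeA {
        device    /dev/drbd0;
        disk      /dev/loop0;   # loop device backed by the image in tmpfs
        address   10.0.0.1:7789;
        meta-disk internal;
      }
      on nodeB {
        device    /dev/drbd0;
        disk      /dev/loop0;
        address   10.0.0.2:7789;
        meta-disk internal;
      }
    }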