[DRBD-user] Dual primary and LVM

Fri Aug 11 23:28:21 CEST 2017

On 07-08-2017 12:59, Lars Ellenberg wrote:
> DRBD does not at all interact with the layers above it,
> so it does not know, and does not care, which entities may or may not
> have cached data they read earlier.
> Any entities that need cache coherency across multiple instances
> need to coordinate in some way.
> 
> But that is not DRBD specific at all,
> and not even specific to clustering or multi-node setups.
> 
> This means that if you intend to use something that is NOT cluster aware
> (or multi-instance aware) itself, you may need to add your own band-aid
> locking and flushing "somewhere".
> 
> I remember that "in the old days", kernel buffer pages may linger for
> quite some time, even if the corresponding device was no longer open,
> which caused problems with migrating VMs even with something as simple
> as a shared SCSI device.  Integration scripts added explicit calls to
> sync and blockdev --flushbufs and the like...
> 
> The kernel then learned to invalidate cache pages on last close,
> so these hacks are no longer necessary (as long as no-one keeps
> the device open when not actively used).
> 
> The other alternative is to always use "direct IO".
> 
> You can (destructively!) experiment with dual primary drbd,
> make both nodes primary,
> 
> on node A,
> watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 | strings"
> watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 iflag=direct | strings"
> 
> on node B,
> while sleep 0.1; do date +%F_%T.%N | dd of=/dev/drbd0 bs=4096
> conv=sync oflag=direct; done
> 
> conv=sync pads with NUL to full bs,
> oflag=direct makes sure it finds its way to DRBD
> and not just into buffer cache pages
> 
> You should see both "watch" thingies show the date changes written on
> the other node.
> 
> If you then do on "node A": sleep 10 < /dev/drbd0,
> the non-direct watch should show the same date for ten seconds,
> because it gets its data from buffer cache, and the device is kept open
> by the sleep.
> 
> Once the "open count" of the device drops down to zero again, the kernel
> will invalidate the pages, and the next read will need to re-read from
> disk (just as the "direct" read always does).
> 
> You can then do
> "sleep 10 </dev/drbd0 & sleep 5 ; blockdev --flushbufs /dev/drbd0; wait",
> and see the non-direct watch update the date just once after 5 seconds,
> and then again once the sleep 10 has finished...
> 
> Again, this does not really have anything to do with DRBD,
> but with how the kernel treats block devices,
> and if and how entities coordinate alternating and concurrent access
> to "things".
> 
> You can easily have two entities on the same node corrupt a boring plain
> text file on a classic file system on just a single node, if they both
> assume "exclusive access", and don't coordinate properly.

Great explanation Lars, thank you very much!
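
For the archives, the last point (uncoordinated writers on a single node)
can be tried without any DRBD setup. What follows is only a minimal sketch,
assuming a Linux box with flock(1) from util-linux; the counter file, the
lock file and the bump function are names made up for this illustration and
come from neither DRBD nor LVM. Twenty background writers each do a
read-modify-write on one plain file, and an exclusive lock on a separate
lock file is the "coordination":

```shell
#!/bin/sh
# Illustrative only: counter, lock and bump are hypothetical names.
dir=$(mktemp -d)
counter="$dir/counter"
lock="$dir/counter.lock"
echo 0 > "$counter"

# Read-modify-write on the counter, guarded by an exclusive flock(1)
# held on file descriptor 9 for the lifetime of the subshell.
bump() {
    (
        flock -x 9                  # block until we hold the lock
        n=$(cat "$counter")
        echo $((n + 1)) > "$counter"
    ) 9>"$lock"
}

# Start 20 concurrent writers, then wait for all of them to finish.
i=0
while [ "$i" -lt 20 ]; do
    bump &
    i=$((i + 1))
done
wait

final=$(cat "$counter")
echo "final count: $final"          # 20 when every writer took the lock
rm -rf "$dir"
```

If the flock line is removed, the writers race on the read-modify-write and
updates may be lost, so the final count can end up below 20. It is the same
class of problem as two nodes each assuming exclusive access to a device.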

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8