On 07-08-2017 12:59, Lars Ellenberg wrote:
> DRBD does not at all interact with the layers above it,
> so it does not know, and does not care, which entities may or may not
> have cached data they read earlier.
> Any entities that need cache coherency across multiple instances
> need to coordinate in some way.
>
> But that is not DRBD specific at all,
> and not even specific to clustering or multi-node setups.
>
> This means that if you intend to use something that is NOT cluster aware
> (or multi-instance aware) itself, you may need to add your own band-aid
> locking and flushing "somewhere".
>
> I remember that "in the old days", kernel buffer pages might linger for
> quite some time, even if the corresponding device was no longer open,
> which caused problems with migrating VMs even with something such as a
> shared SCSI device. Integration scripts added explicit calls to sync and
> blockdev --flushbufs and the like...
>
> The kernel then learned to invalidate cache pages on last close,
> so these hacks are no longer necessary (as long as no-one keeps
> the device open when not actively used).
>
> The other alternative is to always use "direct IO".
>
> You can (destructively!) experiment with dual primary drbd,
> make both nodes primary,
>
> on node A,
>   watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 | strings"
>   watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 iflag=direct | strings"
>
> on node B,
>   while sleep 0.1; do date +%F_%T.%N | dd of=/dev/drbd0 bs=4096 conv=sync oflag=direct; done
>
> conv=sync pads with NUL to full bs,
> oflag=direct makes sure it finds its way to DRBD
> and not just into buffer cache pages
>
> You should see both "watch" thingies show the date changes written on
> the other node.
>
> If you then do on "node A": sleep 10 < /dev/drbd0,
> the non iflag=direct watch should show the same date for ten seconds,
> because it gets its data from buffer cache, and the device is kept open
> by the sleep.
>
> Once the "open count" of the device drops down to zero again, the kernel
> will invalidate the pages, and the next read will need to re-read from
> disk (just as the "direct" read always does).
>
> You can then do
>   "sleep 10 </dev/drbd0 & sleep 5 ; blockdev --flushbufs /dev/drbd0; wait",
> and see the non-direct watch update the date just once after 5 seconds,
> and then again once the sleep 10 has finished...
>
> Again, this does not really have anything to do with DRBD,
> but with how the kernel treats block devices,
> and if and how entities coordinate alternating and concurrent access
> to "things".
>
> You can easily have two entities on the same node corrupt a boring plain
> text file on a classic file system on just a single node, if they both
> assume "exclusive access", and don't coordinate properly.

Great explanation Lars, thank you very much!

-- 
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
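
For anyone who wants to see the NUL padding to a full block from the quoted message without touching a DRBD device, here is a minimal, non-destructive sketch. It assumes a scratch file from mktemp (a hypothetical stand-in for /dev/drbd0, not part of the original test) and uses dd's conv=sync, which pads a short input block with NULs up to bs. It leaves out direct I/O on purpose, since not every filesystem supports it, so it says nothing about the buffer cache behaviour itself.

```shell
#!/bin/sh
# Hypothetical scratch file; stands in for /dev/drbd0 so nothing is overwritten.
scratch=$(mktemp)

# Write one short line; conv=sync pads the partial input block with NULs
# up to the full block size of 4096 bytes.
printf 'hello\n' | dd of="$scratch" bs=4096 conv=sync 2>/dev/null

# Size of the resulting file in bytes (tr strips padding some wc builds add).
size=$(wc -c < "$scratch" | tr -d ' ')

# Read the block back and drop the NUL padding, leaving only the text.
text=$(dd if="$scratch" bs=4096 count=1 2>/dev/null | tr -d '\0')

echo "size=$size text=$text"
rm -f "$scratch"
```

The six input bytes end up as exactly one 4096-byte block, which is why the reader side can always fetch the current timestamp with bs=4096 count=1.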