Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Thu, Jul 27, 2017 at 10:11:48AM +0200, Gionatan Danti wrote:
> To clarify: the main reason I am asking about the feasibility of a
> dual-primary DRBD setup with LVs on top of it is cache coherency. Let
> me take a step back: the explanation given for denying even read access
> on a secondary node is that of broken cache coherency/consistency: if
> the read/write node writes something the secondary node had previously
> read, the latter will not recognize the changes done by the first node.
> The canonical solution to this problem is to use a dual-primary setup
> with a clustered filesystem (e.g. GFS2), which not only arbitrates
> write access, but also maintains read cache consistency.
>
> Now, let's remove the clustered filesystem layer, leaving "naked" LVs
> only. How is read cache coherency maintained in this case? As no
> filesystem is layered on top of the raw LVs, there is no real pagecache
> at work, but the kernel's buffers remain - and they need to be kept
> coherent. How does DRBD achieve this? Does it update the receiving
> kernel's I/O buffers each time the other node writes something?

DRBD does not interact with the layers above it at all, so it does not
know, and does not care, which entities may or may not have cached data
they read earlier.

Any entities that need cache coherency across multiple instances need to
coordinate in some way. But that is not DRBD specific at all, and not
even specific to clustering or multi-node setups.

This means that if you intend to use something that is NOT cluster aware
(or multi-instance aware) itself, you may need to add your own band-aid
locking and flushing "somewhere".

I remember that "in the old days", kernel buffer pages could linger for
quite some time, even if the corresponding device was no longer open,
which caused problems with migrating VMs even with something like a
shared SCSI device. Integration scripts added explicit calls to sync and
blockdev --flushbufs and the like...

The kernel then learned to invalidate cache pages on last close, so these
hacks are no longer necessary (as long as no one keeps the device open
when it is not actively used). The other alternative is to always use
"direct IO".

You can (destructively!) experiment with dual-primary DRBD.
Make both nodes primary.

On node A:
  watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 | strings"
  watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 iflag=direct | strings"

On node B:
  while sleep 0.1; do date +%F_%T.%N | dd of=/dev/drbd0 bs=4096 conv=sync oflag=direct; done

conv=sync pads with NUL to a full bs;
oflag=direct makes sure the data finds its way to DRBD
and not just into buffer cache pages.

You should see both "watch" thingies show the date changes written on the
other node.

If you then do, on node A, sleep 10 < /dev/drbd0, the non-direct watch
should show the same date for ten seconds, because it gets its data from
the buffer cache, and the device is kept open by the sleep. Once the
"open count" of the device drops back down to zero, the kernel will
invalidate the pages, and the next read will need to re-read from disk
(just as the "direct" read always does).

You can then do
  sleep 10 </dev/drbd0 & sleep 5; blockdev --flushbufs /dev/drbd0; wait
and see the non-direct watch update the date just once after 5 seconds,
and then again once the sleep 10 has finished...

Again, this does not really have anything to do with DRBD, but with how
the kernel treats block devices, and with if and how entities coordinate
alternating and concurrent access to "things".
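[Editor's note: for illustration only, a minimal sketch of the kind of
band-aid "flush before reading" helper hinted at above. The script name
read_fresh.sh and the hard-coded device path are made up for this
example; it assumes no other local process holds the device open or has
dirty data for it.]

  #!/bin/sh
  # read_fresh.sh -- example only: read the first 4 KiB of a DRBD device
  # without trusting possibly stale buffer cache pages.
  DEV=/dev/drbd0

  # Variant 1: bypass the buffer cache entirely with direct IO.
  dd if="$DEV" bs=4096 count=1 iflag=direct 2>/dev/null | strings

  # Variant 2: drop the cached pages for this device, then read normally.
  blockdev --flushbufs "$DEV"
  dd if="$DEV" bs=4096 count=1 2>/dev/null | strings

[Either variant only papers over the missing coordination; it does
nothing to order local reads against writes arriving from the peer.]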
You can easily have two entities corrupt a boring plain text file on a
classic file system on just a single node, if they both assume
"exclusive access" and don't coordinate properly.

--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT

__
please don't Cc me, but send to list -- I'm subscribed