Hello Lars,

> Upper layer submits write to DRBD.
> DRBD calculates checksum over data buffer.
> DRBD sends that checksum.
> DRBD submits data buffer to "local" backend block device.
> Meanwhile, upper layer changes data buffer.
> DRBD sends data buffer to peer.
> DRBD receives local completion.
> DRBD receives remote ACK.
> DRBD completes this write to upper layer.
> *only now* would the upper layer be "allowed"
> to change that data buffer again.

I think you were right and the upper layer misbehaves. I've turned write caching off for the Linux KVM guests, and the last check found only one out-of-sync (OOS) block. It was probably caused before I turned caching off, so I'll wait one more week. Thank you for pointing out the right way to dig.

So far I see the following ways to avoid OOS:

1. Disabling write caching.
2. Using barriers in the guest OSes. They are enabled by default for ext4 and can be enabled for ext3, but:
   - they can't be enabled for swap
   - I'm not sure what to do with Windows guests (NTFS is assumed to support barriers, but I've seen OOS on Windows partitions several times; maybe I need to disable write caching inside Windows)

The first way can cause slowdowns. The second is too difficult, especially when you can't control the guest OSes.

After all, I wonder why DRBD can't copy the buffer before writing and then submit/send this copy rather than the original (which can be changed at any time)?

Best regards,
Stanislav
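P.S. To make the question about copying the buffer concrete, here is a rough Python sketch of the sequence quoted above. It is only a toy model, not DRBD code: `replicated_write`, `scribble` and the `copy_first` flag are names I made up, CRC32 stands in for whatever checksum is configured, and the timing is simplified so that the buffer always changes between the local write and the send to the peer.

```python
import zlib


def replicated_write(buf, change_in_flight, copy_first):
    """Toy model of one replicated write.

    Returns (in_sync, checksum_ok): whether the local and peer copies
    match, and whether the data sent to the peer matches the checksum
    that was calculated up front.
    """
    # Either take a private snapshot or keep using the caller's live buffer.
    src = bytes(buf) if copy_first else buf
    checksum = zlib.crc32(src)   # checksum is calculated over the data
    local = bytes(src)           # data submitted to the local backend
    change_in_flight(buf)        # upper layer changes its buffer meanwhile
    peer = bytes(src)            # data that is actually sent to the peer
    return local == peer, zlib.crc32(peer) == checksum


def scribble(buf):
    """Hypothetical misbehaving upper layer: flips the first byte in flight."""
    buf[0] ^= 0xFF


no_copy = replicated_write(bytearray(b"guest page"), scribble, copy_first=False)
with_copy = replicated_write(bytearray(b"guest page"), scribble, copy_first=True)
print("without copy:", no_copy)
print("with copy:   ", with_copy)
```

In this model the write without a copy ends up out of sync and fails the checksum, while the write with a private copy stays consistent on both sides. The price is one extra copy of every buffer, which is presumably the reason it is not done by default.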