<div dir="ltr">Hello Lars,<br><br>> Upper layer submits write to <span class="">DRBD</span>.<br>> <span class="">DRBD</span> calculates checksum over data buffer.<br>> <span class="">DRBD</span> sends that checksum.<br>
> <span class="">DRBD</span> submits data buffer to "local" backend <span class="">block</span> <span class="">device</span>.<br>> Meanwhile, upper layer changes data buffer.<br>> <span class="">DRBD</span> sends data buffer to peer.<br>
> <span class="">DRBD</span> receives local completion.<br>> <span class="">DRBD</span> receives remote ACK.<br>> <span class="">DRBD</span> completes this write to upper layer.<br>> *only now* would the upper layer be "allowed"<br>
> to change that data buffer again.<br><div class="gmail_extra"><br></div><div class="gmail_extra">I think you were right and the upper layer misbehaves. I've turned write caching off for the Linux KVM guests, and the last check found only one OOS (it was probably caused before I turned caching off, so I'll wait one more week). Thank you for pointing me in the right direction.<br>
<br></div><div class="gmail_extra">So far I see the following ways to avoid OOS:<br></div><div class="gmail_extra">1. Disabling write caching<br></div><div class="gmail_extra">2. Using barriers in the guest OSes. They are enabled by default for ext4 and can be enabled for ext3, but:<br>
- they can't be enabled for swap<br>- I'm not sure what to do with Windows guests (NTFS supposedly supports barriers, but I've seen OOS on Windows partitions several times; maybe I need to disable write caching inside Windows)<br>
<br></div><div class="gmail_extra">The first way can cause slowdowns. The second way is too difficult, especially when you can't control the guest OSes.<br><br></div><div class="gmail_extra">After all this, I wonder why DRBD can't copy the buffer before writing and then submit/send that copy instead of the original (which can change at any time)?<br>
<br></div><div class="gmail_extra">Best regards,<br></div><div class="gmail_extra">Stanislav<br></div></div>
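To make the copy-before-write question concrete, here is a minimal user-space sketch in Python of the sequence quoted at the top. It is not DRBD code: the function names `replicate` and `scribble`, the CRC32 checksum, and the 4-byte buffer are all assumptions chosen for illustration. It shows how a buffer modified in flight can make the local and remote data diverge, and how working on a private snapshot would avoid that.

```python
import zlib


def replicate(buf, mutate, copy_first):
    """Simulate one replicated write; return (local, remote, checksum_ok).

    buf        -- bytearray owned by the "upper layer"
    mutate     -- callback that changes buf while the write is in flight
    copy_first -- if True, work on a private snapshot instead of the live buffer
    """
    src = bytes(buf) if copy_first else buf  # snapshot, or the live buffer
    checksum = zlib.crc32(src)               # checksum is calculated (and sent) first
    local = bytes(src)                       # data as submitted to the local backend
    mutate(buf)                              # meanwhile, the upper layer changes the buffer
    remote = bytes(src)                      # data as sent to the peer afterwards
    return local, remote, zlib.crc32(remote) == checksum


def scribble(b):
    # Hypothetical in-flight modification by a misbehaving upper layer.
    b[0] = 0xFF


# Without a copy: the peer receives different bytes than the local disk.
local, remote, ok = replicate(bytearray(b"AAAA"), scribble, copy_first=False)
print("no copy:   in sync =", local == remote, " checksum ok =", ok)

# With a copy: both sides receive the same snapshot.
local, remote, ok = replicate(bytearray(b"AAAA"), scribble, copy_first=True)
print("with copy: in sync =", local == remote, " checksum ok =", ok)
```

The snapshot costs one extra memory copy per write, which may be part of the trade-off behind the question above.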