Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Sat, Feb 26, 2011 at 07:31:03PM +0100, Walter Haidinger wrote:
> Hi Lars, thanks for the reply.
>
> > So you no longer have any problems/ASSERTs regarding drbd_al_read_log?
>
> No, those are gone. I did a create-md on the secondary node and a full
> resync. Don't know if that was "the fix", though, but I suppose so.
>
> > Well, what does the other (Primary) side say?
> > I'd expect it to say
> > "Digest mismatch, buffer modified by upper layers during write: ..."
>
> Yes, it does (see the kernel logs below).
>
> > If it does not, your link corrupts data.
> > If it does, well, then that's what happens.
> > (note: this double check on the sending side
> > has only been introduced with 8.3.10)
>
> Now where do I go from here?
> Any way to tell who or what is responsible for the data corruption?

There is just "buffers modified during writeout".
That is not necessarily the same as data corruption.

Quoting the DRBD User's Guide:

  Notes on data integrity

  There are two independent methods in DRBD to ensure the integrity of
  the mirrored data: the online-verify mechanism and the
  data-integrity-alg of the network section. Both mechanisms might
  deliver false positives if the user of DRBD modifies the data that
  gets written to disk while the transfer goes on. This may happen for
  swap, for certain append-while-global-sync workloads, or for
  truncate/rewrite workloads, and does not necessarily pose a problem
  for the integrity of the data. Usually, when the initiator of the
  data transfer does this, it already knows that that data block will
  not be part of an on-disk data structure, or will be resubmitted with
  correct data soon enough.

  ...

If you don't want to know about that, disable that check.
If the replication link interruptions caused by that check are bad for
your setup (particularly so in dual-primary setups), disable that check
(see the config sketch below).
If you want to use it anyway: that's great, do so, and live with it.

If you want to have DRBD do "end-to-end" data checksums, even if the
data buffers may be modified while in flight, and still want it to be
efficient, sponsor feature development.

The Problem:
http://lwn.net/Articles/429305/
http://thread.gmane.org/gmane.linux.kernel/1103571
http://thread.gmane.org/gmane.linux.scsi/59259
And many, many more older threads on various mailing lists, some of
them misleading, some of them mixing this issue of in-flight
modifications with actual (hardware-caused) data corruption.

Possible Solutions:

 - DRBD first copies every submitted data block to private pages, then
   calculates the checksum over those. As this is now a checksum over
   *private* pages, a mismatch is always a sign of data corruption. It
   is also a significant performance hit (sketched below). Potentially,
   we could optimistically try to get away without copying, and only
   take the performance hit once we see a mismatch, in which case we
   would still need to copy the data anyway and send it again -- if we
   still have it.

 - The generic Linux write-out path is fixed to not allow modifications
   of data during write-out.

 - The generic Linux block integrity framework is fixed in whatever way
   is deemed most useful, and DRBD switches to use that instead, or
   simply forwards integrity information that may already have been
   generated by some layer above DRBD.

The "generic write-out path" people seem to be on it, this time. Not
sure how much it will help with VMs on top of DRBD, though, as those
will run older kernels or different operating systems that do things
differently, potentially screwing things up.
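To make the first item under "Possible Solutions" a bit more concrete,
here is a rough userspace sketch of the "copy to private pages, then
checksum" idea. It is only an illustration: the helper name
send_with_private_digest and the toy checksum are made up, and real
DRBD would do this on bio pages with a digest from the kernel crypto
API rather than anything like the code below.

/* Userspace sketch only -- not DRBD code. The point: compute the
 * digest over a *private* copy of the payload, so that a later
 * mismatch on the peer can only mean real corruption, at the cost of
 * one extra memcpy per write. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* trivial stand-in for a real digest such as crc32c or sha1 */
static uint32_t toy_checksum(const unsigned char *buf, size_t len)
{
        uint32_t sum = 0;
        for (size_t i = 0; i < len; i++)
                sum = sum * 31u + buf[i];
        return sum;
}

/* hypothetical send helper implementing "copy first, then checksum" */
int send_with_private_digest(const unsigned char *data, size_t len)
{
        unsigned char *priv = malloc(len);
        uint32_t digest;

        if (!priv)
                return -1;

        memcpy(priv, data, len);            /* the extra copy = the cost   */
        digest = toy_checksum(priv, len);   /* digest over the frozen copy */

        /* ... transmit priv and digest to the peer here; even if the
         * submitter keeps scribbling on 'data', priv and digest agree ... */
        (void)digest;

        free(priv);
        return 0;
}

The memcpy per write is exactly the performance hit mentioned above;
the optimistic variant would skip the copy until a mismatch is
actually seen.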
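And for completeness, if you decide to just turn the check off:
data-integrity-alg lives in the net section. A minimal sketch, assuming
the 8.3.x configuration layout and a placeholder resource name; check
drbd.conf(5) for your version:

resource r0 {           # "r0" is a placeholder resource name
        net {
                # comment this out (or leave it unset) to disable the
                # in-flight data-integrity check discussed above
                data-integrity-alg sha1;
        }
        syncer {
                # online verify is configured separately (8.3.x syntax)
                verify-alg sha1;
        }
}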
-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com