[DRBD-user] tons of out-of-sync sectors detected

Eric Marin eric.marin at utc.fr
Thu Jul 31 10:06:30 CEST 2008

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi,

I understand how this works, and unfortunately I memtested the server a few
weeks ago - I'll do it again, just to be sure...

I think I'm going to go back to the old "official" Debian Etch kernel, to see
how it behaves with no offloading on the NIC and with integrity checking
enabled in DRBD (see the snippet below). With drbdadm verify all, it seemed to
find far fewer out-of-sync sectors than the new kernel does. Perhaps in
combination with these options it will eliminate the problem, though I'm not
holding my breath...
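
For reference, this is roughly what I plan to use (assuming the NIC is
eth0 and that I have the DRBD 8.2 syntax right):

  # turn off checksum/segmentation offloading on the NIC
  ethtool -K eth0 rx off tx off tso off

  # drbd.conf: checksum every data block sent over the wire
  net {
    data-integrity-alg md5;
  }

  # drbd.conf: digest used by "drbdadm verify all"
  syncer {
    verify-alg md5;
  }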

Couldn't OpenLDAP or MySQL with BDB and HDB databases write to disk in a manner
that causes false positives with DRBD? Do you know of any specific application
that could do that?

Eric

Quoting Lars Ellenberg <lars.ellenberg at linbit.com>:
>
>
> did you memtest recently?
>
>
> as I wrote before in other threads,
> there are various ways to make these checksums fail.
>
> e.g. (not exactly technically correct, but to get the picture)
>  application writes some data, using normal writes to the file system.
>  file system receives that data, puts it into the page cache.
>  page cache decides (or is forced) to write out some pages.
>  the page is submitted to "disk" (in this case DRBD)
>
>  drbd receives that data,
>    calculates a checksum over that data buffer,
>    [A]
>    submits it to local disk,
>    and queues it for sending over tcp.
>
>      local disk write completion comes in,
>      remote disk write completion comes in
>      in any order.
>
>    once both these completion events are there,
>    drbd completes the request to "upper layers",
>    that is, the page cache in this example.
>    [B]
>
>
> if your application uses direct io, leave out the page cache.
> if it happens to be a file system meta data block,
> leave out page cache and application.
>
>
> now, if the data buffer is _modified_ (by any component or bad ram)
> while being in flight (that is after being submitted and the checksum
> has been calculated at [A], but before DRBD has completed the write
> at [B]), then the checksum calculated on the originally submitted data
> would have a hard time matching the modified data that is being sent
> later.
>
> drbd does expect in-flight data buffers to not be modified until it
> completes the request.
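>
> to make this concrete, here is a toy userspace program in C (nothing
> like the real drbd code, and with a made-up checksum) that shows the
> effect of an in-flight modification:
>
>   #include <stdio.h>
>   #include <string.h>
>   #include <stdint.h>
>
>   /* toy checksum, stands in for the real digest */
>   static uint32_t csum(const unsigned char *buf, size_t len)
>   {
>       uint32_t s = 0;
>       while (len--)
>           s = s * 31 + *buf++;
>       return s;
>   }
>
>   int main(void)
>   {
>       unsigned char page[4096];
>       memset(page, 0xaa, sizeof(page));
>
>       /* [A] checksum taken when the write is submitted */
>       uint32_t submitted = csum(page, sizeof(page));
>
>       /* buffer modified "in flight": after [A], but before
>        * the request has been completed at [B] */
>       page[42] ^= 0xff;
>
>       /* this is the data that actually goes out over tcp */
>       uint32_t sent = csum(page, sizeof(page));
>
>       printf("%s\n", submitted == sent ? "match" : "MISMATCH");
>       return 0;
>   }
>
> run it and it prints MISMATCH, which is exactly what the integrity
> check on the receiving side sees.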
>
> there may be applications/subsystems out there that modify in-flight
> data buffers "accidentally", risking data integrity.
> it also may be "legal" from the application/file system/swap subsystem
> point of view to "re-use" an in-flight data buffer even before the write
> has been completed. the "on disk" result would then be undefined on
> _any_ io stack (either it reached "disk" before or after this
> modification), meaning that can only be "legal" if the user actually
> does not care what is written there.
>
> with DRBD, the result is doubly undefined, because it may have reached
> local disk before or after that modification, and it may have been sent
> before or after the modification, which may result in differing data
> being written on both nodes.
>
> this may or may not be what causes your observations.
>
> there is not much that DRBD could do about this, besides doing a full
> memcpy of the data buffer to some "drbd private" memory, before even
> calculating any checksums or submitting/queueing for tcp.
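>
> in code, that "safe" variant would look something like this sketch
> (again not actual drbd code; submit_to_local_disk and queue_for_tcp
> are made-up stand-ins for the real submit and send paths):
>
>   #include <stdlib.h>
>   #include <string.h>
>   #include <stdint.h>
>
>   /* made-up no-op stand-ins for the real write paths */
>   static void submit_to_local_disk(const unsigned char *b, size_t n)
>   { (void)b; (void)n; }
>   static void queue_for_tcp(const unsigned char *b, size_t n,
>                             uint32_t sum)
>   { (void)b; (void)n; (void)sum; }
>
>   /* toy checksum, as before */
>   static uint32_t csum(const unsigned char *buf, size_t len)
>   {
>       uint32_t s = 0;
>       while (len--)
>           s = s * 31 + *buf++;
>       return s;
>   }
>
>   /* snapshot the buffer into drbd-private memory _before_
>    * calculating the checksum at [A]; later modifications by the
>    * upper layer can then no longer make disk and peer diverge */
>   int write_with_private_copy(const unsigned char *upper_buf,
>                               size_t len)
>   {
>       unsigned char *priv = malloc(len);
>       if (!priv)
>           return -1;
>       memcpy(priv, upper_buf, len);
>
>       uint32_t sum = csum(priv, len);  /* [A] on the stable copy */
>       submit_to_local_disk(priv, len);
>       queue_for_tcp(priv, len, sum);
>       /* priv would be freed once both completions are in, at [B] */
>       return 0;
>   }
>
> the obvious price is an extra memcpy and an extra buffer for every
> in-flight write.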
>
>
> --
> : Lars Ellenberg                           http://www.linbit.com :
> : DRBD/HA support and consulting             sales at linbit.com :
> : LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
> : Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
> __
> please don't Cc me, but send to list -- I'm subscribed


