[DRBD-user] DRBD I/O problems & corrupted data

Lars Ellenberg lars.ellenberg at linbit.com
Tue Jan 27 12:39:26 CET 2015

On Mon, Jan 26, 2015 at 04:11:26PM +0100, Saso Slavicic wrote:
> There always is a multiple error scenario ;-)

And we can really only cover single-error scenarios.

If we can sometimes deal with multiple-error scenarios
in sensible ways, that's a bonus.

> > Because we did not anticipate that you would block the worker,
> > and did call the handler before we notify the peer.
>
> > *again* your blocking drbdadm detach from the local-io-error handler is
> > the trigger.
> > Don't do that. You also should even background the "notify", if you really
> > insist on using it.
>
> A local-io-error handler in the documentation example even has 'halt -f' in
> the sequence. How does that run before notifying the peer?
> However, none of the documentation examples show handlers being backgrounded.

Either return as quickly as possible,
or do not return at all.

Either way, don't call back into DRBD kernel code
from inside a synchronously running handler.
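A minimal sketch of the "return as quickly as possible" variant, written in Python for illustration (real handlers are usually shell scripts): the handler hands the slow work to a detached background process and returns 0 at once, so the worker is not held up and never waits on the child. It assumes drbdadm exports the resource name to handlers as `DRBD_RESOURCE`, and the notifier command is hypothetical.

```python
import os
import subprocess
import sys
from typing import Sequence


def launch_detached(argv: Sequence[str]) -> subprocess.Popen:
    """Start argv in a new session with stdio pointed at /dev/null and
    return at once, without waiting for the command to finish."""
    return subprocess.Popen(
        list(argv),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child calls setsid(), leaving our session
    )


def handle_local_io_error(notify_cmd: Sequence[str]) -> int:
    """Hypothetical local-io-error handler body: hand the slow work to a
    background process, then return 0 immediately."""
    # Assumption: drbdadm exports the resource name as DRBD_RESOURCE.
    resource = os.environ.get("DRBD_RESOURCE", "unknown")
    try:
        launch_detached([*notify_cmd, resource])
    except OSError as exc:
        # Never block or retry here; just log and get out of DRBD's way.
        print(f"could not start {notify_cmd[0]}: {exc}", file=sys.stderr)
    return 0
```

The handler script would end with something like `sys.exit(handle_local_io_error(["/usr/local/sbin/notify-admin.sh"]))`, where that path is a placeholder. The background process must not call back into DRBD while the handler is still running, which is why the handler exits first.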

> > Really? For 7 megabytes you want to pull ahead already?
> > You basically won't have a consistent secondary, ever.
> Another problem for me with DRBD: without DRBD Proxy, DRBD blocks even in
> protocol A when the network buffer is full. If the buffer is set to 10 MB,
> pull-ahead has to kick in before the buffer is full, or DRBD blocks. Our
> write load is low enough not to trigger a resync too often, but when it
> does, I want DRBD to pull ahead, not slow down to WAN link speed. Somehow I
> don't feel comfortable with the idea of having hundreds of MBs of network
> buffer in the kernel. In this case, the secondary is more of a standby that
> tries to have the most recent data possible, so I don't mind it being a bit
> out of sync from time to time.

It won't be "a bit out of sync".

It will be *inconsistent*.
That's a polite word for "corrupt by design".
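The difference between "stale" and "inconsistent" can be shown with a toy model in Python (not DRBD code; block numbers and contents are invented). After pulling ahead, DRBD only knows *which* blocks differ, so the resync copies them in bitmap order (ascending block number in this model), not in the order the primary wrote them.

```python
from typing import Dict, List, Set


def resync_states(target: Dict[int, str], primary: Dict[int, str],
                  out_of_sync: Set[int]) -> List[Dict[int, str]]:
    """Return the sync target's contents after each block copy.

    Out-of-sync blocks are copied in ascending block order, as if scanning
    a bitmap, ignoring the order the writes originally happened in."""
    state = dict(target)
    states = []
    for block in sorted(out_of_sync):
        state[block] = primary[block]
        states.append(dict(state))
    return states


# Secondary's last consistent view: block 3 is an index pointing at block 7.
secondary = {3: "index v1 -> data v1", 7: "data v1"}
# While pulled ahead, the primary first wrote new data to block 7,
# then updated the index in block 3 to point at it.
primary = {3: "index v2 -> data v2", 7: "data v2"}

for step, state in enumerate(resync_states(secondary, primary, {3, 7}), 1):
    print(step, state)
```

The first intermediate state is `{3: 'index v2 -> data v2', 7: 'data v1'}`: a new index pointing at data that is not there yet, a combination the primary never had on disk. If the primary is lost at that moment, this is what the secondary holds.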

Which is why you should take a snapshot of your last consistent data,
from your before-resync-target handler.
Yes, that's supposed to be synchronous.
No, it should not block for too long.
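A sketch of such a handler in Python, assuming the DRBD backing device is an LVM logical volume: it creates a snapshot of the backing volume synchronously, but with a hard time limit, and reports failure through its exit status. The volume names, snapshot size and 30-second timeout are hypothetical, and how DRBD reacts to a non-zero exit status from before-resync-target depends on the DRBD version, so check the documentation for yours.

```python
import subprocess
from typing import List, Sequence


def snapshot_cmd(vg: str, lv: str, size: str, name: str) -> List[str]:
    """Build an lvcreate command for a copy-on-write snapshot of vg/lv."""
    return ["lvcreate", "--snapshot", "--name", name, "--size", size,
            f"{vg}/{lv}"]


def run_bounded(cmd: Sequence[str], timeout_s: float) -> int:
    """Run cmd synchronously, but give up after timeout_s seconds.

    Returns 0 on success and 1 if the command failed, could not be
    started, or did not finish in time (run() kills it on timeout)."""
    try:
        result = subprocess.run(list(cmd), timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return 1
    return 0 if result.returncode == 0 else 1


def before_resync_target(vg: str, lv: str) -> int:
    # Hypothetical size, name and timeout; adjust to your setup.
    cmd = snapshot_cmd(vg, lv, size="1G", name=f"{lv}-before-resync")
    return run_bounded(cmd, timeout_s=30.0)
```

The handler script would end with something like `sys.exit(before_resync_target("vg0", "r0"))`. The snapshot must be taken of the backing volume, not the DRBD device, and removed again once the resync has finished and the secondary is consistent.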

: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
please don't Cc me, but send to list   --   I'm subscribed
