[DRBD-user] pacemaker/corosync fence drbd on digest integrity error - do not wait for reconnect

Lars Ellenberg lars.ellenberg at linbit.com
Mon Nov 29 11:09:58 CET 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Wed, Nov 17, 2010 at 12:36:32PM -0800, Dmitry Golubev wrote:
> 
> Hi,
> 
> I have a nasty problem with my cluster. For some reason DRBD sometimes
> fails with "Digest integrity check FAILED". If I understand this correctly,
> that is OK and DRBD will reconnect at once. However, before it does that,
> the cluster fences the secondary node and thus removes any possibility of
> the cluster ever working again - until I manually clear the fencing rules
> out of the crm config. The log looks like this:
> 
> 
> Nov 17 18:30:52 srv1 kernel: [2299058.247328] block drbd1: Digest integrity
> check FAILED.

> What can I do to fight this? I have no idea why the communication fails
> sometimes, although the NICs and cabling are fine. However, I read on the
> mailing lists that it might happen with some NIC/kernel combinations. Can
> we force the cluster software to wait a little for the reconnect?

Disable digest-integrity.
Buffers seem to be modified while they are being written out.
I consider that bad behaviour, but it is still "legal",
and all filesystems seem to do it under certain circumstances.
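For illustration, a minimal drbd.conf sketch of where that option lives; the
resource name r0 and the md5 algorithm are only placeholders, and removing or
commenting out the line (followed by something like "drbdadm adjust r0") is
what turns the digest check off:

    resource r0 {
      net {
        # data-integrity-alg md5;   # digest checking; remove/comment out to disable
      }
      # ... disk, syncer and on <host> sections unchanged ...
    }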

Digest-integrity with dual-primary is not a very good idea.
Also, dual-primary mode in DRBD does not necessarily add to your
availability, so think twice about whether you really need it.
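For context, dual-primary is what gets switched on with allow-two-primaries
in the net section; a sketch, again with a placeholder resource name, in case
you want to double-check whether you actually have it enabled:

    resource r0 {
      net {
        allow-two-primaries;   # dual-primary mode; omit unless you really need it
      }
    }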

We may add some workarounds in later versions of DRBD (copying
all data to private pages first, before we process it further),
which will obviously have an additional performance impact.
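To make the idea concrete, here is a small C sketch (not DRBD code, just an
illustration with hypothetical compute_digest/send_block helpers) of why
snapshotting the buffer into a private copy before checksumming avoids the
false mismatch, at the cost of an extra memcpy:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* hypothetical stand-ins for the real digest and network send paths */
    extern uint32_t compute_digest(const void *buf, size_t len);
    extern int send_block(const void *buf, size_t len, uint32_t digest);

    int send_with_stable_digest(const void *live_buf, size_t len)
    {
            void *copy = malloc(len);
            if (!copy)
                    return -1;

            /* snapshot the buffer; the caller may keep modifying live_buf */
            memcpy(copy, live_buf, len);

            /* digest and transmitted bytes now describe the same, stable data */
            uint32_t digest = compute_digest(copy, len);
            int ret = send_block(copy, len, digest);

            free(copy);
            return ret;
    }

Without the copy, the digest may be computed over bytes that change before
or while they go out on the wire, which is exactly the mismatch the receiver
then reports.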

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


