[DRBD-user] repeatable, infrequent, loss of data with DRBD

Lars Ellenberg lars.ellenberg at linbit.com
Sat Aug 22 11:07:16 CEST 2015

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Wed, Aug 19, 2015 at 04:57:09PM +0100, Matthew Vernon wrote:
> Hi,
> 
> I'm encountering a problem when using LVM & DRBD, the effect of
> which is that sometimes one of my nodes cannot see LVM metadata
> inside a DRBD device when it becomes Primary[1]. It seems that
> sometimes the early part of my DRBD device is not being correctly
> replicated (the LVM header is in the first MB).

You realize that, without cLVM, you need to deal with concurrency and
cache coherency between the hosts yourself?
Did you try the same with a variation of:
create: "echo MAGIC STRING | dd oflag=direct conv=sync bs=4k count=1"
verify: "dd iflag=direct bs=4k count=1 | xxd -a"

Also, lvm.conf "preferred_names" is not good enough.
You need to actually reject the lower-level devices with an LVM filter.
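
For example, something along these lines in the devices section of
lvm.conf (a sketch only; the exact patterns depend on your device naming,
and the /dev/dm-* aliases of those LVs may need the same treatment):

  # never scan the DRBD backing LVs as PVs; everything else is fair game
  filter        = [ "r|^/dev/guests/.*|", "r|^/dev/mapper/guests-.*|", "a|.*|" ]
  # if lvmetad is in use, global_filter is the one that is actually honoured
  global_filter = [ "r|^/dev/guests/.*|", "r|^/dev/mapper/guests-.*|", "a|.*|" ]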

> It seems likely that
> this is a bug in DRBD, but obviously it's difficult to be entirely
> sure. I'd be happy to try running different diagnostics if you think
> it'd help track down the problem.
> 
> My machines are Intel boxes running Debian Jessie (kernel 3.16.0),
> with the DRBD 8.4.6 kernel module, and drbd-utils version
> 8.9.2~rc1-2.
> 
> I have a script that can reproduce this problem; essentially[0] it does:
> 
> #on both hosts
> drbdadm -- --force create-md mcv21
> drbdadm up mcv21
> 
> #on host A only
> drbdadm wait-connect mcv21
> drbdadm new-current-uuid --clear-bitmap minor-7
> drbdadm primary mcv21
> 
> pvcreate /dev/drbd7
> drbdadm secondary mcv21
> 
> #on host B only
> drbdadm primary mcv21
> file -s /dev/drbd7 | grep 'LVM2 PV'
> 
> #on both hosts
> drbdadm down mcv21
> dd if=/dev/zero of=/dev/guests/mwsig-mcv21 bs=1M count=1
> 
> The "file" invocation is my diagnostic (it can detect the LVM PV
> metadata), and the dd is to make sure the LVM metadata is blanked
> before the next iteration.
> 
> The failure rate is about 1 in 1000 (an overnight run gave me 12
> failures to 12,988 passes).

You should also collect /proc/drbd, and maybe dmesg -c,
before and after each step.

Especially for the "failed" runs, /proc/drbd and
the kernel log of both nodes would be relevant.

It may be useful to have the LVM commands (pvcreate) run with -vvv
(or more).
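
A rough sketch of what that could look like (the helper name and the log
file path are made up, adjust to taste):

  # run on both nodes; "dmesg -c" prints and clears the kernel ring buffer
  log_state () {
      {
          date +%s.%N
          echo "== $1 =="
          cat /proc/drbd
          dmesg -c
      } >> /root/drbd-trace.log 2>&1
  }

  log_state "before pvcreate"
  pvcreate -vvv /dev/drbd7 >> /root/drbd-trace.log 2>&1
  log_state "after pvcreate"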

You may want to double check what is actually in the blocks.
You could "prime" them with 100k of "something"
# node A
  perl -e 'print "A" x 102400' | dd of=/dev/guests/mwsig...  oflag=direct"
# node B
  perl -e 'print "B" x 102400' | dd of=/dev/guests/mwsig...  oflag=direct"

then do your thing,
and verify later which blocks have actually been written to.
(I use: dd iflag=direct bs= count= skip= | xxd -a)
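
For example, to compare what each node ends up with on disk after a failed
run (a sketch; the host names are the ones from your config, and the two
dumps have to be brought together somewhere to diff them):

  # on each node, dump the first 1 MiB of the backing LV, bypassing caches
  dd if=/dev/guests/mwsig-mcv21 iflag=direct bs=4k count=256 | xxd -a \
      > /tmp/$(hostname).hex
  # then, with both files in one place:
  #   diff /tmp/ophon.hex /tmp/opus.hex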

> The resource config (minus IP addresses):
> 
> resource mcv21 {
>   device /dev/drbd7;
>   disk /dev/guests/mwsig-mcv21;
>   meta-disk internal;
>   on ophon {
>     address [IPv4 here]:7795;
>     }
>   on opus {
>     address [IPv4 here]:7795;
>     }
>   net {
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>   }
> }
> 
> The only interesting bits of global_common.conf are "protocol C;" and
> "allow-two-primaries;".
> 
> Regards,
> 
> Matthew
> 
> [0] the actual script is a bit fiddlier, as it has to deal with
> systemd-udevd sometimes holding /dev/drbd7 open a bit longer than it
> should, and has a pile of other related tests commented out in it. I
> can send it if you think it would help
> [1] my use case is building Xen guests - each guest has a host LV
> allocated, which is the "disk" for a DRBD device; I make that an LVM
> pv, which is what becomes the guests' storage
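
As for footnote [0]: a common workaround for udev still holding the device
open is to wait for the udev event queue and then retry the teardown
(a sketch, using the resource name from above):

  udevadm settle --timeout=10
  for try in $(seq 1 25); do
      drbdadm down mcv21 && break
      sleep 0.2
  done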

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


