[DRBD-user] repeatable, infrequent, loss of data with DRBD

Matthew Vernon mcv21 at cam.ac.uk
Wed Aug 19 17:57:09 CEST 2015



Hi,

I'm encountering a problem when using LVM on top of DRBD: occasionally
one of my nodes cannot see the LVM metadata inside a DRBD device when it
becomes Primary[1]. It seems that the early part of the DRBD device is
sometimes not replicated correctly (the LVM header is in the first MB).
This looks like a bug in DRBD, though obviously it's difficult to be
entirely sure. I'd be happy to run other diagnostics if you think that
would help track down the problem.

My machines are Intel boxes running Debian Jessie (kernel 3.16.0), with 
the DRBD 8.4.6 kernel module, and drbd-utils version 8.9.2~rc1-2.

I have a script that can reproduce this problem; essentially[0] it does:

#on both hosts
drbdadm -- --force create-md mcv21
drbdadm up mcv21

#on host A only
drbdadm wait-connect mcv21
drbdadm new-current-uuid --clear-bitmap minor-7
drbdadm primary mcv21

pvcreate /dev/drbd7
drbdadm secondary mcv21

#on host B only
drbdadm primary mcv21
file -s /dev/drbd7 | grep 'LVM2 PV'

#on both hosts
drbdadm down mcv21
dd if=/dev/zero of=/dev/guests/mwsig-mcv21 bs=1M count=1

The "file" invocation is my diagnostic (it can detect the LVM PV 
metadata), and the dd is to make sure the LVM metadata is blanked before 
the next iteration.
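
For anyone wanting a check that does not depend on the magic database used by "file", here is a minimal sketch. It assumes the standard LVM2 on-disk layout, in which the PV label starts with the string LABELONE in one of the first four 512-byte sectors (sector 1 by default). The function name has_lvm_label is hypothetical and is not part of the script described above.

```shell
# Hypothetical helper: succeed if an LVM2 label is found in sectors 0-3 of "$1".
# Assumes 512-byte sectors and the standard "LABELONE" label identifier.
has_lvm_label() {
    dev=$1
    for sector in 0 1 2 3; do
        # Read the first 8 bytes of the sector; drop NULs so the shell
        # does not complain about them in the command substitution.
        sig=$(dd if="$dev" bs=512 skip="$sector" count=1 2>/dev/null \
              | head -c 8 | tr -d '\0')
        if [ "$sig" = "LABELONE" ]; then
            return 0
        fi
    done
    return 1
}
```

On host B this could stand in for the "file" line, e.g. `has_lvm_label /dev/drbd7 || echo "LVM label missing"`. Reading the device needs root, and the node has to be Primary at that point.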

The failure rate is about 1 in 1000 (an overnight run gave me 12 
failures to 12,988 passes).
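
As a quick check of that figure, the arithmetic can be sketched in shell. The function name failure_rate is made up for illustration; the only inputs are the two counts quoted above.

```shell
# Hypothetical helper: print the failure rate for "$1" failures and "$2" passes.
failure_rate() {
    awk -v f="$1" -v p="$2" 'BEGIN {
        n = f + p                      # total iterations
        if (f == 0) { print "no failures in " n " runs"; exit }
        printf "%.4f%% (about 1 in %d)\n", 100 * f / n, n / f
    }'
}

failure_rate 12 12988   # prints: 0.0923% (about 1 in 1083)
```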

The resource config (minus IP addresses):

resource mcv21 {
   device /dev/drbd7;
   disk /dev/guests/mwsig-mcv21;
   meta-disk internal;
   on ophon {
     address [IPv4 here]:7795;
   }
   on opus {
     address [IPv4 here]:7795;
   }
   net {
     after-sb-0pri discard-zero-changes;
     after-sb-1pri discard-secondary;
     after-sb-2pri disconnect;
   }
}

The only interesting settings in global_common.conf are "protocol C;"
and "allow-two-primaries;".

Regards,

Matthew

[0] The actual script is a bit fiddlier, as it has to deal with
systemd-udevd sometimes holding /dev/drbd7 open a bit longer than it
should, and has a pile of other related tests commented out in it. I can
send it if you think it would help.
[1] My use case is building Xen guests: each guest has a host LV
allocated, which is the "disk" for a DRBD device; I make that an LVM PV,
which is what becomes the guest's storage.
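
On the udev issue in [0]: one way a reproduction script might cope with the device being held open briefly is a small retry wrapper around the command that fails. This is only a sketch under that assumption; the helper name retry and the attempt and delay values are invented for illustration, and this is not necessarily how the actual script handles it.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 as soon as the command succeeds, 1 otherwise.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while ! "$@"; do
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 0
}

# Assumed usage: retry 5 1 drbdadm secondary mcv21
```

Running `udevadm settle` before the demotion is another option worth trying, since it waits for pending udev events to be processed.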


