[DRBD-user] repeatable, infrequent, loss of data with DRBD

Fri Sep 11 15:04:07 CEST 2015

On Thu, Sep 03, 2015 at 04:55:51PM +0100, Matthew Vernon wrote:
> Hi,
> 
> On 22/08/15 10:07, Lars Ellenberg wrote:
> 
> Sorry for the delay - it took a while to sort out the necessary
> debugging output &c.

You have some strange effects in there.

In the failed run,
the peer disk (pdsk) goes Inconsistent -> UpToDate,
then back to Inconsistent again.

I see the connection drop.

I see the "other" node generate a new current uuid when made primary,
which it would not do if it were still properly connected.
So you get data divergence there.  DRBD also says at some point that it
knows there is at least one block "out of sync".

> > You should also collect /proc/drbd, and maybe dmesg -c,
> > before and after each step.
> > 
> > Especially for the "failed" runs, /proc/drbd and
> > the kernel log of both nodes would be relevant.
> 
> I have done this, and also collected network dumps (for ports 7790, the
> drbd-resource port and 22, the ssh port).
> 
> The attached tarball contains the following files relating to 2
> iterations of my script (1 failure, the other success):
> 
> script-output.txt - output from the script at the "driving" end -
> includes /proc/drbd dmesg -c and so on
> 
> syslog-16-16-04.txt - syslog from the "target" end for the successful run
> 
> 15-09-03-16-16-04.pcap - packet dump (tcpdump -w) for the successful run
> 
> syslog-16-16-06.txt & 15-09-03-16-16-06.pcap - syslog & packet dump from
> the failing run
> 
> each iteration through the loop (this is 2 iterations) does, roughly:
> 
> * output timestamp
> * start tcpdump
> * debug
> * drbd - create-md and up on both machines, wait-connect, new-current-uuid --clear-bitmap, primary

"wait-connect" may sometimes return early.

Before you promote, you should double-check that the new-current-uuid
was properly communicated and handled (see below).
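
One way to guard against an early return is to poll for the state you
want, with a timeout, rather than relying on the exit of "wait-connect"
alone. A minimal sketch in Python; wait_until and its parameters are
hypothetical names, not part of any DRBD tool, and the predicate stands
for whatever state check your script uses:

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Call predicate() repeatedly until it returns true or timeout expires.

    Returns True if the predicate became true, False on timeout.
    sleep and clock are injectable so the helper can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The predicate is always evaluated at least once, so a state that is
already correct is accepted even with a timeout of zero.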

> * debug
> * set magic string (TESTDATAMAGIC)
> * debug
> * secondary, make remote primary
> * debug
> * check for magic string on remote (and output FAILED if fails)
> * debug
> * drbdadm down (both machines)
> * debug
> * dd 1M of zero to front of underlying LV (both machines)
> * debug
> * kill tcpdump
> * output "Iteration done"
> 
> where debug = output the top 5 lines of /proc/drbd,  and dmesg -c
> 
> I hope that's enough debugging for you to track down the bug :-)

There seems to be at least one race condition somewhere.
I'm not saying there is no bug in DRBD, there may be some
unhandled race with connection handshake/loss/reconnect AND
the "skip initial sync" AND an early "application" write.  But I suspect
there may be a race condition in your test procedure as well.

I suggest you double-check that you really have, on both nodes,
"Connected Secondary/Secondary Inconsistent/Inconsistent",
before you do the "new-current-uuid --clear-bitmap",
and that after this, you have, on both nodes,
"Connected Secondary/Secondary UpToDate/UpToDate",
before you promote.
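
These two checks could be scripted roughly as follows. This is only a
sketch: it assumes the DRBD 8.x /proc/drbd layout, where each resource
line looks like " 0: cs:... ro:.../... ds:.../...", and all function
names are hypothetical. It has to be run against the /proc/drbd
contents of each node separately (for the remote node, for example,
on text fetched over ssh).

```python
import re

# One resource line of /proc/drbd in the (assumed) DRBD 8.x layout, e.g.
#  0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
STATE_RE = re.compile(
    r"^[ \t]*(?P<minor>\d+):[ \t]+cs:(?P<cs>\S+)[ \t]+ro:(?P<ro>\S+)[ \t]+ds:(?P<ds>\S+)",
    re.MULTILINE,
)

def parse_proc_drbd(text):
    """Return {minor: (cs, ro, ds)} for every resource line found in text."""
    return {
        int(m.group("minor")): (m.group("cs"), m.group("ro"), m.group("ds"))
        for m in STATE_RE.finditer(text)
    }

def state_is(text, minor, cs, ro, ds):
    """True only if the given minor shows exactly the expected state."""
    return parse_proc_drbd(text).get(minor) == (cs, ro, ds)

def ready_for_new_uuid(text, minor=0):
    """State expected before new-current-uuid --clear-bitmap."""
    return state_is(text, minor, "Connected",
                    "Secondary/Secondary", "Inconsistent/Inconsistent")

def ready_for_promote(text, minor=0):
    """State expected after new-current-uuid, before promoting."""
    return state_is(text, minor, "Connected",
                    "Secondary/Secondary", "UpToDate/UpToDate")
```

A minor that is missing, unconfigured, or not yet connected makes both
checks return False, so the script refuses to proceed rather than
guessing.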

After the new-current-uuid, only "role" (resp. "peer") state changes
Secondary -> Primary and back are expected; any additional state changes
(especially connection-related ones) would indicate some issue in the
setup. No further UUID changes would be expected, either.
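
A test script could enforce this by comparing the state snapshots it
takes at each debug step. A minimal sketch, assuming each snapshot is a
dict with the keys "cs", "ro" and "ds", plus an optional "uuid" entry
collected however you prefer; the function name and the snapshot layout
are hypothetical:

```python
def unexpected_changes(snapshots):
    """Compare consecutive snapshots taken after new-current-uuid.

    Each snapshot is a dict with "cs", "ro", "ds" and optionally "uuid".
    Returns a list of (index, field, old, new) for every change in
    connection state, disk state or UUID. Role changes are not reported,
    since Secondary -> Primary and back are the expected transitions.
    """
    problems = []
    for i in range(1, len(snapshots)):
        prev, cur = snapshots[i - 1], snapshots[i]
        for field in ("cs", "ds", "uuid"):
            if prev.get(field) != cur.get(field):
                problems.append((i, field, prev.get(field), cur.get(field)))
    return problems
```

An empty result means only role changes were seen; anything else points
at the step where the setup went wrong.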

In any case, we can try to reproduce the behaviour you observed, and if
we are able to trigger it, either explain what actually goes on, why it
behaves as it does, or see if we can "fix" it.

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.