[Drbd-dev] Resync Stalls at 100% patch problem

Montrose, Ernest Ernest.Montrose at stratus.com
Thu Jun 7 00:14:13 CEST 2007

I looked into the patch issue a bit more. 
This problem happens if we acquire the req_lock early.  What I think is
going on is this:

Primary Host ----> attach ----> send_state()
Peer ----> receive_state() ----> after_state_ch() ----> send_bitmap()
Primary Host ----> receive_bitmap() *** Unexpected cstate "Connected"
Peer ----> send_state()
Primary Host ----> receive_state(), and we are deadlocked.

The primary host is receiving the bitmap too early.  Essentially, the
peer should call send_state() before calling after_state_ch().
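
To make the ordering argument concrete, here is a toy model of the
exchange.  It is only a sketch and not DRBD code: the Node class, the
packet names and the simplified states "WFBitMap" and "Syncing" are my
own assumptions for illustration.  It just shows that a bitmap arriving
while the receiver is still "Connected" is rejected, and that the state
packet arriving afterwards leaves the receiver waiting for a bitmap that
has already gone by.

```python
# Toy model of the message ordering described above. NOT DRBD code:
# Node, the packet names and the simplified states are assumptions.

class Node:
    def __init__(self, name):
        self.name = name
        self.cstate = "Connected"
        self.log = []

    def receive(self, packet):
        if packet == "state":
            # Seeing the peer's state moves us into bitmap exchange.
            self.cstate = "WFBitMap"
            self.log.append("state ok")
        elif packet == "bitmap":
            if self.cstate != "WFBitMap":
                # Bitmap arrived before we were ready for it.
                self.log.append('unexpected cstate "%s"' % self.cstate)
            else:
                self.cstate = "Syncing"
                self.log.append("bitmap ok")


def peer_packets(send_state_first):
    """Packets the peer emits after it has received the primary's state."""
    if send_state_first:
        return ["state", "bitmap"]  # send_state() before after_state_ch()
    return ["bitmap", "state"]      # after_state_ch() sends the bitmap first


def run(send_state_first):
    primary = Node("primary")
    for packet in peer_packets(send_state_first):
        primary.receive(packet)
    return primary


if __name__ == "__main__":
    for first in (False, True):
        node = run(first)
        print(first, node.cstate, node.log)
```

With send_state_first=False the model ends up stuck in the
bitmap-waiting state after logging the unexpected cstate; with
send_state_first=True it proceeds to syncing.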

I noticed the patch had moved the send_state() call to after the call to
after_state_ch(), so moving it back before after_state_ch() should be OK.
Any reason why it was moved down?

I tested the included patch that moved just before we call
Let me know.



-----Original Message-----
From: drbd-dev-bounces at linbit.com [mailto:drbd-dev-bounces at linbit.com]
On Behalf Of Montrose, Ernest
Sent: Tuesday, June 05, 2007 6:50 PM
To: Philipp Reisner; drbd-dev at linbit.com
Subject: [Drbd-dev] Resync Stalls at 100% patch problem

Unfortunately, it seems the patch for receive_state() in
drbd_receiver.c has a problem.  Acquiring the req_lock at the top of this
routine causes the cstate machine to get confused.  To reproduce this,
just run, for instance:
drbdadm detach d0; sleep 3; drbdadm attach d0
One side ends up in WFBitMapT and the other in WFBitMapS, and they
deadlock.  I am researching more but wanted to let you know.  If I just
remove the early lock then things are


-------------- next part --------------
A non-text attachment was scrubbed...
Name: sync_100.patch
Type: application/octet-stream
Size: 1455 bytes
Desc: sync_100.patch
Url : http://lists.linbit.com/pipermail/drbd-dev/attachments/20070606/ffd78aba/sync_100.obj
