[Drbd-dev] Resync Stalls at 100% patch problem
Ernest.Montrose at stratus.com
Thu Jun 7 00:14:13 CEST 2007
I looked into the patch issue a bit more.
This problem happens if we acquire the req_lock early. What I think is
going on is this:
Primary host ---->attach---> send_state()
Peer -----> receive_state()-->after_state_ch()-->send_bitmap()
Primary Host ---->receive_bitmap()***Unexpected cstate "Connected"
Peer ----> send_state()
Primary Host --->receive_state() and we are deadlocked.
The primary host is receiving the bitmap too early. Essentially the
peer should call send_state() before calling after_state_ch().
I noticed that the patch moved the send_state() call to after
after_state_ch(), so moving it back before after_state_ch() should be OK.
Was there a reason it was moved down?
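
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not DRBD code: the names toy_cstate, toy_msg, toy_node, toy_receive() and toy_handshake() are hypothetical, and the three states are a deliberate simplification of the real cstate machine. It assumes the receiver accepts a bitmap only after a state packet has moved it out of "Connected", which is the behaviour the trace above suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified connection states; NOT DRBD's real enum. */
enum toy_cstate { TOY_CONNECTED, TOY_WF_BITMAP, TOY_SYNCING };

/* The two packet kinds exchanged in this toy model. */
enum toy_msg { MSG_STATE, MSG_BITMAP };

struct toy_node {
    enum toy_cstate cstate;
    bool stalled; /* set once a bitmap arrives in an unexpected state */
};

/*
 * Receiver side (the "primary host" in the trace).
 * A state packet moves Connected -> WFBitMap.
 * A bitmap is accepted only while waiting for one; otherwise the
 * node is marked as stalled and ignores everything that follows.
 */
static void toy_receive(struct toy_node *n, enum toy_msg m)
{
    assert(n);
    if (n->stalled)
        return;

    switch (m) {
    case MSG_STATE:
        if (n->cstate == TOY_CONNECTED)
            n->cstate = TOY_WF_BITMAP;
        break;
    case MSG_BITMAP:
        if (n->cstate == TOY_WF_BITMAP)
            n->cstate = TOY_SYNCING;
        else
            n->stalled = true; /* unexpected cstate, e.g. "Connected" */
        break;
    }
}

/*
 * Sender side (the "peer" in the trace) emits both packets in the
 * chosen order. Returns true if the receiver ends up syncing.
 */
static bool toy_handshake(bool state_first)
{
    struct toy_node primary = { TOY_CONNECTED, false };
    enum toy_msg order[2];

    order[0] = state_first ? MSG_STATE : MSG_BITMAP;
    order[1] = state_first ? MSG_BITMAP : MSG_STATE;

    for (int i = 0; i < 2; i++)
        toy_receive(&primary, order[i]);

    return !primary.stalled && primary.cstate == TOY_SYNCING;
}
```

In this model toy_handshake(true), which sends the state packet first, reaches the syncing state, while toy_handshake(false), which sends the bitmap first, leaves the receiver stalled. That mirrors the suggestion that the peer should call send_state() before after_state_ch().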
I tested the included patch that moved just before we call
Let me know.
From: drbd-dev-bounces at linbit.com [mailto:drbd-dev-bounces at linbit.com]
On Behalf Of Montrose, Ernest
Sent: Tuesday, June 05, 2007 6:50 PM
To: Philipp Reisner; drbd-dev at linbit.com
Subject: [Drbd-dev] Resync Stalls at 100% patch problem
Unfortunately it seems like the patch for receive_state() in
drbd_receiver.c has a problem. Acquiring the req_lock at the top of this
routine causes the cstate machine to get confused. To reproduce this, run
drbdadm detach d0; sleep 3; drbdadm attach d0, for instance. One side ends
up in WFBitMapT and the other in WFBitMapS, and they deadlock. I am
researching more but wanted to let you know. If I just remove the early lock then things are
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 1455 bytes
Url : http://lists.linbit.com/pipermail/drbd-dev/attachments/20070606/ffd78aba/sync_100.obj