[Drbd-dev] DRBD8: drbd nodes deadlock in WFBitMapT

Montrose, Ernest Ernest.Montrose at stratus.com
Mon Apr 2 23:44:51 CEST 2007


Phil,
OK...I have learned a tad more since that last email.  So before I even
try the proposed patch here is a way I was able to duplicate the
problem.
Perhaps that will help a bit.  Here it is with two nodes 'a' and 'b' .
Suppose
They are in steady states with UUIDS:
Xa:0:Ha:HH:1:1:0:1:0:0
Xb:0:Hb:HH:1:1:0:1:0:0
Role Secondary/Secondary

1) Disconnect/detach /dev/drbdX on nodea
2) Move Current UUID of nodea to history-UUID of nodea and set
current_UUID of nodea to 00000000000 with drbdmeta..
0:0:Xa:HH:1:1:0:1:0:0
3) Now attach and connect /dev/drbdX and the problem will occur

I have attached the logs for my "manufactured" version of the problem
:).  I have some extra instrumentation that I had placed in there so you
can ignore
Them...

I am not sure how we got that way under normal operation..:( One thing
is that
the test we are running calls for "pulling the plug" on one of the node.
So 
the reboot was not the result of a graceful shutdown and right before
the reboot, the messed up volume had just written its meta data "to
super block" coming fresh from a resync.

Hope this helps

Thanks a lot...

EM--

-----Original Message-----
From: Philipp Reisner [mailto:philipp.reisner at linbit.com] 
Sent: Monday, April 02, 2007 4:00 PM
To: drbd-dev at linbit.com
Cc: Montrose, Ernest
Subject: Re: [Drbd-dev] DRBD8: drbd nodes deadlock in WFBitMapT

Hi Ernest,

I removed all the linebreaks:

Drbd2: Writing metadata to superblock now.
......
....... This nodes is powered of and came back with:

13:16:53 je rct = 0 in
13:16:54 je drbd_sync_handshake:
13:16:54 je self
F71E503A8179BC5D:0000000000000000:3F430D4E1D59C3EA:6B0B1DA5CB20689C
13:16:54 je peer
F71E503A8179BC5C:0000000000000000:3F430D4E1D59C3EA:6B0B1DA5CB20689C
13:16:55 je uuid_compare()=0 by rule 4
<<<<<<<<=========---------------
13:16:55 je No resync, but bits in bitmap!

......

13:17:00 je drbd_sync_handshake:
13:17:00 je self
F71E503A8179BC5D:0000000000000000:3F430D4E1D59C3EA:6B0B1DA5CB20689C
13:17:00 je peer
0000000000000000:0000000000000000:F71E503A8179BC5C:3F430D4E1D59C3EA
13:17:00 je uuid_compare()=-2 by rule 6
13:17:00 je Writing meta data super block now.
13:17:01 je writing of bitmap took 11 jiffies
13:17:01 je 12 GB marked out-of-sync by on disk bit-map.
13:17:02 je 13336132 KB now marked out-of-sync by on disk bit-map.
13:17:02 je Writing meta data super block now.
13:17:02 je  uuid[History_start] now F71E503A8179BC5D
13:17:02 je  uuid[Current] now 0000000000000000
13:17:03 je conn( Connected -> WFBitMapT )
13:17:03 je Writing meta data super block now.

On theh other node=============================

13:16:48 be aftr_isp( 0 -> 1 )
13:16:48 be Handshake successful: DRBD Network Protocol version 86
13:16:48 be peer( Unknown -> Secondary ) conn( WFReportParams ->
Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
13:16:48 be Writing meta data super block now.

....

13:16:49 be rct = 2 in
13:16:49 be drbd_sync_handshake:
13:16:49 be self
F71E503A8179BC5C:0000000000000000:3F430D4E1D59C3EA:6B0B1DA5CB20689C
13:16:49 be peer
F71E503A8179BC5D:0000000000000000:3F430D4E1D59C3EA:6B0B1DA5CB20689C
13:16:49 be uuid_compare()=-1 by rule 4
<<<<<<<<=========---------------
13:16:49 be  uuid[History_start] now F71E503A8179BC5C
13:16:49 be  uuid[Current] now 0000000000000000
13:16:49 be conn( Connected -> WFBitMapT )
13:16:49 be Writing meta data super block now.


Unfortunately the clocks of the nodes seems to be not in sync (please
use NTP, this would make reading such logs much easiert!)

I assume that the line marked with the arrow ( <<==-- ) is 
where the troubles begin. Because the 'self' and 'peer' line
is matches here inverse. Then the suspicious decision is
the outcome of uuid_compare() with rule 4. 

While je (jerry) comes to the conclusion 0, gets ben -1. 

The rtc (a few lines before) suggests us that jerry thought
that neither node was a CRASHED_PRIMARY, while ben thought
that jerry was a CRASHED_PRIMARY. Out of you comment
that jerry was powered off, I guess that jerry was really
a CRASHED_PRIMARY, and therefore it sent this information
over to ben. But somehow that CRASHED_PRIMARY bit got
cleared before jerry came to its drbd_uuid_compare()
function. 

The question is: how did that happen ?

In the rest of the log entries on jerry we can see how he
tried to do a second sync handshake ... 

Back to the question of the lost CRASHED_PRIMARY bit. 

That bits can get cleared when:
1) We attach a disk.
2) Connection state goes to connected.
3) In the receive_bitmap.

I think that the clear_bit in receive_bitmap() is wrong, but
I currently can not think of a situation where this clear_bit
can get executed before the drbd_sync_handshake() function.

Ernest, could you provide a bit more context with a few more
lines of jerry's log before 13:16:53.

So far this my proposed patch. But there is still a missing
link in the reasoning:

--- drbd_receiver.c	(revision 2821)
+++ drbd_receiver.c	(working copy)
@@ -2509,7 +2509,6 @@
 		D_ASSERT(h->command == ReportBitMap);
 	}
 
-	clear_bit(CRASHED_PRIMARY, &mdev->flags); // md_write() is in
drbd_start_resync.
 	if (mdev->state.conn == WFBitMapS) {
 		drbd_start_resync(mdev,SyncSource);
 	} else if (mdev->state.conn == WFBitMapT) {
-------------- next part --------------
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown ) 
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: _drbd_set_state: No state new state to change to.
drbd2: short read expecting header on sock: r=-512
drbd2: asender terminated
drbd2: tl_clear()
drbd2: Connection closed
drbd2: Writing meta data super block now.
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: conn( Disconnecting -> StandAlone ) 
drbd2: receiver terminated
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: disk( UpToDate -> Diskless ) 
drbd2: drbd_bm_resize called with capacity == 0
drbd2: worker terminated
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: disk( Diskless -> Attaching ) 
drbd2: No usable activity log found.
drbd2: max_segment_size ( = BIO size ) = 32768
drbd2: drbd_bm_resize called with capacity == 26672264
drbd2: resync bitmap: bits=3334033 words=104190
drbd2: size = 12 GB (13336132 KB)
drbd2: reading of bitmap took 2 jiffies
drbd2: recounting of set bits took additional 0 jiffies
drbd2: 0 KB marked out-of-sync by on disk bit-map.
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: disk( Attaching -> UpToDate ) 
drbd2: Writing meta data super block now.
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: conn( StandAlone -> Unconnected ) 
drbd2: receiver (re)started
drbd2: drbd_connect: ****************Connecting********
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: conn( Unconnected -> WFConnection ) 
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: conn( WFConnection -> WFReportParams ) 
drbd2: Handshake successful: DRBD Network Protocol version 86
drbd2: drbd_connect: ****************sending procol while Connecting********
drbd2: drbd_connect: ****************Sending paran while Connecting********
drbd2: drbd_connect: **************** Sending sizes while Connecting********
drbd2: drbd_connect: ****************sendin uuids while Connecting********
drbd2: drbd_connect: ****************Sending states while Connecting********
drbd2: drbd_connect: ****************DONE!!!! Connecting********
drbd2: drbd_sync_handshake:
drbd2: self 0000000000000000:0000000000000000:B2E340FD0FB4F236:B2E340FD0FB4F219
drbd2: peer B2E340FD0FB4F236:0000000000000000:E515D7C70644AF92:B2E340FD0FB4F219
drbd2: uuid_compare()=-1 by rule 5
drbd2:  uuid[Current] now 0000000000000000
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) 
drbd2: Writing meta data super block now.
-------------- next part --------------
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown ) 
drbd2: Writing meta data super block now.
drbd2: meta connection shut down by peer.
drbd2: asender terminated
drbd2: tl_clear()
drbd2: Connection closed
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: conn( TearDown -> Unconnected ) 
drbd2: receiver terminated
drbd2: receiver (re)started
drbd2: drbd_connect: ****************Connecting********
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: conn( Unconnected -> WFConnection ) 
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: conn( WFConnection -> WFReportParams ) 
drbd2: Handshake successful: DRBD Network Protocol version 86
drbd2: drbd_connect: ****************sending procol while Connecting********
drbd2: drbd_connect: ****************Sending paran while Connecting********
drbd2: drbd_connect: **************** Sending sizes while Connecting********
drbd2: drbd_connect: ****************sendin uuids while Connecting********
drbd2: drbd_connect: ****************Sending states while Connecting********
drbd2: drbd_connect: ****************DONE!!!! Connecting********
drbd2: drbd_sync_handshake:
drbd2: self B2E340FD0FB4F236:0000000000000000:E515D7C70644AF92:B2E340FD0FB4F219
drbd2: peer 0000000000000000:0000000000000000:B2E340FD0FB4F236:B2E340FD0FB4F219
drbd2: uuid_compare()=-2 by rule 6
drbd2: Writing meta data super block now.
drbd2: writing of bitmap took 4 jiffies
drbd2: 12 GB marked out-of-sync by on disk bit-map.
drbd2: 13336132 KB now marked out-of-sync by on disk bit-map.
drbd2: Writing meta data super block now.
drbd2:  uuid[History_start] now B2E340FD0FB4F236
drbd2:  uuid[Current] now 0000000000000000
drbd2: _drbd_set_state: Changing state.
drbd2: _drbd_set_state: setting conn to SyncSource or SyncTarget see message below
drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) 
drbd2: Writing meta data super block now.


More information about the drbd-dev mailing list