[Drbd-dev] 8.2.6 Peer disk state handling issue when attaching

Tue Aug 12 05:52:30 CEST 2008

I have noticed that with 8.2.6, if the roles of a device are
Secondary/Secondary and you detach and then re-attach its disk on one
node, the peer disk state on the other node ends up as Consistent
instead of UpToDate. It seems that in this case the code does not check
whether a resync is required and goes directly from Diskless to
Consistent on the side that is not doing the detach/attach.

Here is a sample extract from the messages file on the two systems:

First, on the system where you do the detach followed by the attach
(connection state is Connected when this starts, roles are
Secondary/Secondary, disk states are UpToDate/UpToDate):

Aug  9 04:53:11 node0 kernel: drbd16: disk( UpToDate -> Diskless ) 

Aug  9 04:53:32 node0 kernel: drbd16: disk( Diskless -> Attaching ) 
Aug  9 04:53:32 node0 kernel: drbd16: No usable activity log found.
Aug  9 04:53:32 node0 kernel: drbd16: max_segment_size ( = BIO size ) = 32768
Aug  9 04:53:32 node0 kernel: drbd16: reading of bitmap took 1 jiffies
Aug  9 04:53:32 node0 kernel: drbd16: recounting of set bits took additional 0 jiffies
Aug  9 04:53:32 node0 kernel: drbd16: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Aug  9 04:53:32 node0 kernel: drbd16: disk( Attaching -> Negotiating ) 
Aug  9 04:53:32 node0 kernel: drbd16: Writing meta data super block now.
Aug  9 04:53:32 node0 kernel: drbd16: disk( Negotiating -> UpToDate )

On the other node (same starting state):

Aug  9 04:53:11 node1 kernel: drbd16: pdsk( UpToDate -> Diskless ) 

Aug  9 04:53:32 node1 kernel: drbd16: real peer disk state = Consistent
Aug  9 04:53:32 node1 kernel: drbd16: pdsk( Diskless -> Consistent )

I can see why the second node does not go to the UpToDate state: there
is a check in _drbd_set_state such that it only overwrites Consistent
with UpToDate if the connection state is also changing, which it is not
in this case. However, I'm not sure this is the right place to fix it.
It seems to me that we should check for a resync even in this case,
since one or both of the nodes could have been Primary and modified the
disk at some point and then been demoted to Secondary. So we really need
to call drbd_sync_handshake even in this case, but we don't seem to.
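
To make the described behaviour concrete, here is a toy model of the check
as characterised above. It is only a sketch and is not taken from the DRBD
source: the toy_* names, the reduced set of states, and the exact condition
are assumptions chosen to mirror the description (Consistent is promoted to
UpToDate only when the connection state changes in the same transition).

```c
/* Toy model only. NOT the DRBD source; all toy_* names are hypothetical. */

enum toy_conn { TOY_WF_REPORT_PARAMS, TOY_CONNECTED };
enum toy_disk { TOY_DISKLESS, TOY_CONSISTENT, TOY_UPTODATE };

struct toy_state {
    enum toy_conn conn;   /* connection state */
    enum toy_disk pdsk;   /* peer disk state as seen locally */
};

/*
 * Assumed shape of the check: a requested pdsk of Consistent is
 * overwritten with UpToDate only if the connection state is also
 * changing to Connected as part of the same state transition.
 */
struct toy_state toy_sanitize(struct toy_state os, struct toy_state ns)
{
    if (ns.pdsk == TOY_CONSISTENT &&
        ns.conn == TOY_CONNECTED &&
        os.conn != ns.conn)
        ns.pdsk = TOY_UPTODATE;
    return ns;
}

/* Peer re-attaches while already Connected: conn does not change. */
enum toy_disk toy_reattach_while_connected(void)
{
    struct toy_state os = { TOY_CONNECTED, TOY_DISKLESS };
    struct toy_state ns = { TOY_CONNECTED, TOY_CONSISTENT };
    return toy_sanitize(os, ns).pdsk;
}

/* Same pdsk change, but arriving together with a fresh connection. */
enum toy_disk toy_attach_during_connect(void)
{
    struct toy_state os = { TOY_WF_REPORT_PARAMS, TOY_DISKLESS };
    struct toy_state ns = { TOY_CONNECTED, TOY_CONSISTENT };
    return toy_sanitize(os, ns).pdsk;
}
```

In this model the first helper leaves pdsk at Consistent, matching the
node1 log above, while the second promotes it to UpToDate. Neither path
decides whether a resync is needed, which is the open question here.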

I don't see any fixes post 8.2.6 that obviously address this but perhaps
I missed something? If not, any thoughts on the right way to fix this?

Thanks,
Simon