[Drbd-dev] 8.2.6 Peer disk state handling issue when attaching

Tue Aug 12 10:49:46 CEST 2008

On Mon, Aug 11, 2008 at 11:52:30PM -0400, Graham, Simon wrote:
> I have noticed that with 8.2.6, if the roles are Secondary/Secondary
> and you detach and then re-attach a device, the peer disk state on the
> other node ends up as Consistent instead of UpToDate. It seems that in
> this case the code does not check whether a resync is required and goes
> directly from Diskless->Consistent on the node that is not doing the
> detach/attach.
> 
> Here is a sample extract from the messages file on the two systems:
> 
> First, on the system where you do the detach followed by the attach
> (connection state is Connected when this starts, roles are
> Secondary/Secondary, disk UpToDate/UpToDate):
> 
> Aug  9 04:53:11 node0 kernel: drbd16: disk( UpToDate -> Diskless ) 
> 
> Aug  9 04:53:32 node0 kernel: drbd16: disk( Diskless -> Attaching ) 
> Aug  9 04:53:32 node0 kernel: drbd16: No usable activity log found.
> Aug  9 04:53:32 node0 kernel: drbd16: max_segment_size ( = BIO size ) = 32768
> Aug  9 04:53:32 node0 kernel: drbd16: reading of bitmap took 1 jiffies
> Aug  9 04:53:32 node0 kernel: drbd16: recounting of set bits took additional 0 jiffies
> Aug  9 04:53:32 node0 kernel: drbd16: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Aug  9 04:53:32 node0 kernel: drbd16: disk( Attaching -> Negotiating ) 
> Aug  9 04:53:32 node0 kernel: drbd16: Writing meta data super block now.
> Aug  9 04:53:32 node0 kernel: drbd16: disk( Negotiating -> UpToDate )
> 
> On the other node (same starting state):
> 
> Aug  9 04:53:11 node1 kernel: drbd16: pdsk( UpToDate -> Diskless ) 
> 
> Aug  9 04:53:32 node1 kernel: drbd16: real peer disk state = Consistent
> Aug  9 04:53:32 node1 kernel: drbd16: pdsk( Diskless -> Consistent )
> 
> I can see why the second node does not go to the UpToDate state: there
> is a check in _drbd_set_state so that it only overwrites Consistent
> with UpToDate if the connection state is also changing, which it is not
> in this case. However, I'm not sure this is the right place to fix it.
> It seems to me that we should check for a resync even in this case,
> since one or both of the disks could have been Primary and modified the
> data at some point and then been demoted to Secondary. So we really need
> to call drbd_sync_handshake even in this case, but we don't seem to...
> 
> I don't see any fixes post 8.2.6 that obviously address this, but
> perhaps I missed something?

Confirmed in current 8.2 git.

> If not, any thoughts on the right way to fix this?

I leave that question open for now.
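
For illustration, a minimal C sketch contrasting the two behaviours discussed in this thread. It is a simplified model under stated assumptions, not DRBD code: the enum names, the functions pdsk_as_reported() and pdsk_with_handshake(), and the three-way sync_decision result are all hypothetical stand-ins for the logic in _drbd_set_state and drbd_sync_handshake. The first function encodes the rule as reported (Consistent is promoted to UpToDate only when the connection state changes); the second also considers a resync when the peer newly attaches on an established connection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified disk states; NOT DRBD's real enum. */
enum disk_state {
	DS_DISKLESS,
	DS_ATTACHING,
	DS_NEGOTIATING,
	DS_INCONSISTENT,
	DS_CONSISTENT,
	DS_UP_TO_DATE
};

/* Hypothetical outcome of a sync handshake, seen from this node. */
enum sync_decision {
	NO_RESYNC,           /* both sides hold the same data */
	PEER_IS_SYNC_SOURCE, /* peer has the newer data */
	PEER_IS_SYNC_TARGET  /* peer must be resynced from this node */
};

/* Rule as described in the report: a peer disk reported as Consistent
 * is promoted to UpToDate only if the connection state changes in the
 * same transition. On a detach/attach while Connected it does not, so
 * the peer disk state stays Consistent. */
enum disk_state pdsk_as_reported(enum disk_state real_peer_disk,
				 bool conn_changing)
{
	if (real_peer_disk == DS_CONSISTENT && conn_changing)
		return DS_UP_TO_DATE;
	return real_peer_disk;
}

/* Alternative suggested in the report: also consider a resync when the
 * peer newly attached on an established connection, and derive the
 * peer disk state from the (hypothetical) handshake decision. */
enum disk_state pdsk_with_handshake(enum disk_state real_peer_disk,
				    bool conn_changing,
				    bool peer_newly_attached,
				    enum sync_decision decision)
{
	/* Assumption: Negotiating has already been mapped to the "real"
	 * state, as the "real peer disk state = Consistent" log suggests. */
	assert(real_peer_disk != DS_NEGOTIATING);

	bool consider_resync = conn_changing || peer_newly_attached;
	if (real_peer_disk != DS_CONSISTENT || !consider_resync)
		return real_peer_disk;

	/* A sync target stays Inconsistent until the resync completes. */
	if (decision == PEER_IS_SYNC_TARGET)
		return DS_INCONSISTENT;
	return DS_UP_TO_DATE;
}
```

With the rule as reported, pdsk_as_reported(DS_CONSISTENT, false) stays DS_CONSISTENT, matching node1's "pdsk( Diskless -> Consistent )". With the handshake considered on re-attach, pdsk_with_handshake(DS_CONSISTENT, false, true, NO_RESYNC) yields DS_UP_TO_DATE, while a handshake that finds the peer out of date leaves it DS_INCONSISTENT pending resync.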

-- 
: Lars Ellenberg                            Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :