[Drbd-dev] [CASE-41] After re-connected, despite of OOS remaining primary does not start re-synchronization or continues AHEAD mode.

Lars Ellenberg lars.ellenberg at linbit.com
Thu Apr 21 16:42:53 CEST 2016


On Thu, Apr 21, 2016 at 11:01:56PM +0900, Jaeheon Kim wrote:
> Hi,
> 
> We wrote a temporary workaround for the "stays in AHEAD mode" problem.
> We cleared AHEAD_TO_SYNC_SOURCE and forced drbd_uuid_new_current
> at conn_disconnect.
> 
> Please check the following code:
> 
> 
> 1. drbd_disconnected()
> {
> 
>  .....
> 
> drbd_md_sync(device);
> 
> if (get_ldev(device)) {
>     drbd_bitmap_io(device, &drbd_bm_write_copy_pages,
>                  "write from disconnected",
>                  BM_LOCK_BULK | BM_LOCK_SINGLE_SLOT, peer_device);
>     put_ldev(device);
> }
> 
> #ifdef _WIN32_V9 // temporary patch (clear AHEAD_TO_SYNC_SOURCE flag)
> clear_bit(AHEAD_TO_SYNC_SOURCE, &device->flags); //  Windows DRBD
> #endif
> 
> }
> 
> 
> 
> 2. conn_disconnect()
> {
> 
> ... at the end of this function.
> 
> #ifdef _WIN32_V9 // (don't create uuid when primary is drbdadm-disconnected) temporary patch
> 
>  if (resource->role[NOW] == R_PRIMARY)
>  {
>         test_and_clear_bit(NEW_CUR_UUID, &device->flags);
>         mutex_lock(&resource->conf_update);
>         drbd_uuid_new_current(device, false);
>         mutex_unlock(&resource->conf_update);
>  }
> #endif
> 
> }
> 
> What do you think about this idea?

That the AHEAD_TO_SYNC_SOURCE bit needs to be cleared on disconnect
seems very plausible.
But that may get racy with more than one peer,
given that all peers share the same bit in device->flags.
That does not feel right either.
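
To illustrate the concern about the shared bit, here is a minimal toy
model. It is only a sketch: struct toy_device, toy_mark_ahead() and
toy_disconnect() are hypothetical names, not DRBD code, and plain
booleans stand in for the real flag bits. It shows how clearing one
device-wide bit when a single peer disconnects also discards the
state that was recorded for another peer, whereas a per-peer bit would
not.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PEERS 4

/* Toy model only; these are NOT the real DRBD structures. */
struct toy_device {
	/* one bit shared by all peers (like a bit in device->flags) */
	bool shared_ahead_to_sync_source;
	/* hypothetical alternative: one bit per peer */
	bool peer_ahead_to_sync_source[TOY_MAX_PEERS];
};

/* Record that we went "ahead" of the given peer. */
void toy_mark_ahead(struct toy_device *d, int peer)
{
	assert(peer >= 0 && peer < TOY_MAX_PEERS);
	d->shared_ahead_to_sync_source = true;
	d->peer_ahead_to_sync_source[peer] = true;
}

/* Clean up when the given peer disconnects. */
void toy_disconnect(struct toy_device *d, int peer)
{
	assert(peer >= 0 && peer < TOY_MAX_PEERS);
	/* shared bit: this also wipes what was recorded for other peers */
	d->shared_ahead_to_sync_source = false;
	/* per-peer bit: only the disconnecting peer is affected */
	d->peer_ahead_to_sync_source[peer] = false;
}
```

If peers 0 and 1 are both marked and then peer 0 disconnects, the
shared bit is gone although peer 1 still needs it, while the per-peer
entry for peer 1 survives. Real code would additionally need proper
locking or atomic bit operations, which this sketch leaves out.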


UUID gets bumped on first write after the need for a bump was detected.

It needs to be bumped immediately in certain situations, though:
at least when we lose the peer disk with the replication link still intact,
and probably in more situations.
Maybe simply whenever we lose contact with the peer while being "ahead".

Though in that case, the disk states (the peer would be expected to
be inconsistent, or at least out-of-date) should usually be enough to
start the resync. Anyway, it would be correct to bump the UUID
immediately when disconnecting while "ahead": we know that in general
we have a different (more recent) on-disk state in that case, so there
is no need to wait for yet another write.
I'll need to double-check whether that happens already.
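
The difference between the lazy bump and the immediate bump can be
sketched with a toy model. Everything here is an assumption made for
illustration: struct toy_uuid_state and the toy_*() functions are
hypothetical, the pending flag only mirrors the idea behind
NEW_CUR_UUID, and a counter stands in for the real (randomly
generated) UUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; NOT the real DRBD state. */
struct toy_uuid_state {
	uint64_t current_uuid;   /* counter standing in for the current UUID */
	bool new_uuid_pending;   /* a bump is needed, deferred to next write */
	bool ahead;              /* we were "ahead" of the peer */
};

/* Create a new current UUID and clear the pending marker. */
void toy_bump(struct toy_uuid_state *s)
{
	assert(s);
	s->current_uuid++;
	s->new_uuid_pending = false;
}

/* Lazy path: the first write after the bump became necessary does it. */
void toy_write(struct toy_uuid_state *s)
{
	if (s->new_uuid_pending)
		toy_bump(s);
}

/* Losing the peer: bump right away if we were "ahead", else defer. */
void toy_lose_peer(struct toy_uuid_state *s)
{
	if (s->ahead) {
		/* on-disk state already differs, no need to wait for a write */
		toy_bump(s);
		s->ahead = false;
	} else {
		s->new_uuid_pending = true;
	}
}
```

In this model, a node that disconnects while "ahead" has a new UUID
even if no further write ever arrives, while a node that was not
"ahead" keeps its UUID until toy_write() is called once.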

    Lars


