[DRBD-user] [BUG] drbdmanage 0.98 :: cannot add nodes

Mon Oct 31 10:07:05 CET 2016

On Sat, Oct 29, 2016 at 08:55:45PM +0200, Toni Bolduan wrote:
> Hi list,
> 
> Today I updated to drbdmanage 0.98 on my two Ubuntu server nodes.
> After setting up the volume group on both nodes, I started the
> initialization on node 1. That worked fine.
> 
> 
> Then I tried to add the second node to my cluster with "drbdmanage add-node
> node2 10.0.0.12", and drbdmanage got stuck after confirmation.

During startup drbdmanage now has to handle more things, so it might
take longer (~15-30 seconds).

> 
> On the second node dmesg shows the following:
> 
> [...]
> [ 1103.413457] drbd .drbdctrl: Terminating worker thread
> [ 1386.430669] drbd .drbdctrl: Starting worker thread (from drbdsetup [2142])
> [ 1386.437482] drbd .drbdctrl node1: Starting sender thread (from drbdsetup [2155])
> [ 1386.445330] drbd .drbdctrl/0 drbd0: disk( Diskless -> Attaching )
> [ 1386.445340] drbd .drbdctrl/0 drbd0: Maximum number of peer devices = 31
> [ 1386.445425] drbd .drbdctrl: Method to ensure write ordering: flush
> [ 1386.445427] drbd .drbdctrl/0 drbd0 node1: node_id: 0 idx: 0 bm-uuid: 0x0 flags: 0x10 max_size: 0 (DUnknown)
> [ 1386.445428] drbd .drbdctrl/0 drbd0: my node_id: 1
> [ 1386.445433] drbd .drbdctrl/0 drbd0 node1: node_id: 0 idx: 0 bm-uuid: 0x0 flags: 0x10 max_size: 0 (DUnknown)
> [ 1386.445434] drbd .drbdctrl/0 drbd0: my node_id: 1
> [ 1386.445435] drbd .drbdctrl/0 drbd0: drbd_bm_resize called with capacity == 8112
> [ 1386.445441] drbd .drbdctrl/0 drbd0: resync bitmap: bits=1014 words=496 pages=1
> [ 1386.445442] drbd .drbdctrl/0 drbd0: size = 4056 KB (4056 KB)
> [ 1386.446431] drbd .drbdctrl/0 drbd0: recounting of set bits took additional 0ms
> [ 1386.446440] drbd .drbdctrl/0 drbd0: disk( Attaching -> Outdated )
> [ 1386.446443] drbd .drbdctrl/0 drbd0: attached to current UUID: 120FE59FE04690DE
> [ 1411.289042] drbd .drbdctrl: State change failed: Need access to UpToDate data
> [ 1411.289066] drbd .drbdctrl: Failed: role( Secondary -> Primary )
> [ 1434.136862] drbd .drbdctrl: State change failed: Need access to UpToDate data
> [...]
> [ 2033.117704] drbd .drbdctrl: Failed: role( Secondary -> Primary )
> 
> How can I figure out what happened here and why?
> 

I guess that this happened while the second node was in the leader
election phase, where it tries to become DRBD Primary on the control
volume (.drbdctrl). That is basically how leader election works: all
nodes race to become Primary until one succeeds; the others then see a
Primary, give up, and become satellite nodes. The problem here is that
there is no UpToDate data: node2 attached its .drbdctrl disk as
Outdated, so every promotion attempt is refused with "Need access to
UpToDate data", and the attempts keep repeating.
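To make the race concrete, here is a toy Python model of it. This is not drbdmanage code: ControlVolume, promote and elect are hypothetical names, and the model only assumes that promotions to Primary are serialized and refused when no UpToDate data is reachable. With UpToDate data, exactly one node wins and the rest become satellites; without it, every node fails on each attempt, which mirrors the repeated "Need access to UpToDate data" lines in the log above.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class StateChangeFailed(Exception):
    """Raised when the (hypothetical) promotion to Primary is refused."""


@dataclass
class ControlVolume:
    # Hypothetical stand-in for .drbdctrl: it only tracks who is Primary
    # and whether any node can reach UpToDate data.
    uptodate_reachable: bool
    primary: Optional[str] = None
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def promote(self, node: str) -> None:
        with self._lock:  # promotions are serialized, so at most one wins
            if not self.uptodate_reachable:
                raise StateChangeFailed("Need access to UpToDate data")
            if self.primary is not None:
                raise StateChangeFailed(f"{self.primary} is already Primary")
            self.primary = node

    def current_primary(self) -> Optional[str]:
        with self._lock:
            return self.primary


def elect(vol: ControlVolume, nodes: List[str], attempts: int = 3) -> Dict[str, str]:
    """All nodes race to become Primary; returns each node's final role."""
    roles: Dict[str, str] = {}
    roles_lock = threading.Lock()

    def run(node: str) -> None:
        for _ in range(attempts):
            try:
                vol.promote(node)
                role = "leader"
                break
            except StateChangeFailed:
                if vol.current_primary() is not None:
                    role = "satellite"  # someone else won: give up
                    break
        else:
            role = "no leader (retries exhausted)"
        with roles_lock:
            roles[node] = role

    threads = [threading.Thread(target=run, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return roles


if __name__ == "__main__":
    print(elect(ControlVolume(uptodate_reachable=True), ["node1", "node2"]))
    print(elect(ControlVolume(uptodate_reachable=False), ["node1", "node2"]))
```

The lock is what guarantees a single winner in the model; in the real system that guarantee comes from DRBD itself refusing a second Primary on the control volume.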

I would run "drbdsetup status" in a second window and check whether the
resource (.drbdctrl) makes any progress. Does it sync to the second
node, or does it get stuck at some percentage? Or does it not start
syncing at all? Are the nodes in some strange network state? The output
of "drbdsetup status" from both nodes would help a lot.

Regards, rck