On Sat, Oct 29, 2016 at 08:55:45PM +0200, Toni Bolduan wrote:
> Hi list,
>
> Today I've updated to drbdmanage 0.98 on my 2 ubuntu server nodes.
> After setting up the volume group on both nodes I started the
> initialization on node 1. That worked fine.
>
>
> Then I tried to add the second node to my cluster with "drbdmanage add-node
> node2 10.0.0.12" and drbdmanage gets stuck after confirmation.

During startup drbdmanage now has to handle more things, so it might take
longer (~15-30 seconds).

>
> On the second node dmesg shows the following:
>
> [...]
> [ 1103.413457] drbd .drbdctrl: Terminating worker thread
> [ 1386.430669] drbd .drbdctrl: Starting worker thread (from drbdsetup
> [2142])
> [ 1386.437482] drbd .drbdctrl node1: Starting sender thread (from drbdsetup
> [2155])
> [ 1386.445330] drbd .drbdctrl/0 drbd0: disk( Diskless -> Attaching )
> [ 1386.445340] drbd .drbdctrl/0 drbd0: Maximum number of peer devices = 31
> [ 1386.445425] drbd .drbdctrl: Method to ensure write ordering: flush
> [ 1386.445427] drbd .drbdctrl/0 drbd0 node1: node_id: 0 idx: 0 bm-uuid: 0x0
> flags: 0x10 max_size: 0 (DUnknown)
> [ 1386.445428] drbd .drbdctrl/0 drbd0: my node_id: 1
> [ 1386.445433] drbd .drbdctrl/0 drbd0 node1: node_id: 0 idx: 0 bm-uuid: 0x0
> flags: 0x10 max_size: 0 (DUnknown)
> [ 1386.445434] drbd .drbdctrl/0 drbd0: my node_id: 1
> [ 1386.445435] drbd .drbdctrl/0 drbd0: drbd_bm_resize called with capacity
> == 8112
> [ 1386.445441] drbd .drbdctrl/0 drbd0: resync bitmap: bits=1014 words=496
> pages=1
> [ 1386.445442] drbd .drbdctrl/0 drbd0: size = 4056 KB (4056 KB)
> [ 1386.446431] drbd .drbdctrl/0 drbd0: recounting of set bits took
> additional 0ms
> [ 1386.446440] drbd .drbdctrl/0 drbd0: disk( Attaching -> Outdated )
> [ 1386.446443] drbd .drbdctrl/0 drbd0: attached to current UUID:
> 120FE59FE04690DE
> [ 1411.289042] drbd .drbdctrl: State change failed: Need access to UpToDate
> data
> [ 1411.289066] drbd .drbdctrl: Failed: role( Secondary -> Primary )
> [ 1434.136862] drbd .drbdctrl: State change failed: Need access to UpToDate
> data
> [...]
> [ 2033.117704] drbd .drbdctrl: Failed: role( Secondary -> Primary )
>
> How can I figure out what happened here and why?
>

I guess that happened while the second node was in the leader election
phase, where it tries to become DRBD Primary on the control volume
(.drbdctrl). That is basically how leader election works: all nodes race to
become Primary until one succeeds; the others then see a Primary, give up,
and become satellite nodes.

The problem is that the second node has no access to UpToDate data. Its
local disk is Outdated, so it cannot be promoted until it can reach
UpToDate data on the peer.

I would run "drbdsetup status" in a second window and check whether the
resource (.drbdctrl) makes any progress. Does it sync up to the second node,
or does it get stuck after some percentage? Or does it not start syncing at
all? Are the nodes in some strange network state, ...?

The output of "drbdsetup status" from both nodes would help a lot.

Regards, rck
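
P.S. To make the election logic described above a bit more concrete, here is a toy model in Python. It is only a sketch of the idea, not drbdmanage's actual code: the Node class, the "peers" list, and the function names are all made up for illustration, and the promotion rule is simplified to "no visible Primary, and UpToDate data reachable locally or on a connected peer".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical model of one cluster node; not a drbdmanage data structure.
    name: str
    disk: str                      # e.g. "UpToDate" or "Outdated"
    role: str = "Secondary"
    peers: list = field(default_factory=list, repr=False, compare=False)


def connect(a, b):
    """Model an established connection between two nodes."""
    a.peers.append(b)
    b.peers.append(a)


def can_reach_up_to_date(node):
    """True if the node or one of its connected peers holds UpToDate data."""
    return node.disk == "UpToDate" or any(p.disk == "UpToDate" for p in node.peers)


def try_become_primary(node):
    """One election attempt; returns a short description of the outcome."""
    if any(p.role == "Primary" for p in node.peers):
        return "satellite"         # somebody else already won the race
    if not can_reach_up_to_date(node):
        return "failed: Need access to UpToDate data"
    node.role = "Primary"
    return "leader"


if __name__ == "__main__":
    node1 = Node("node1", "UpToDate")
    node2 = Node("node2", "Outdated")

    # node2 is not (yet) connected to node1: every attempt fails.
    print(node2.name, "->", try_become_primary(node2))

    # Once the nodes see each other, node1 wins and node2 gives up.
    connect(node1, node2)
    print(node1.name, "->", try_become_primary(node1))
    print(node2.name, "->", try_become_primary(node2))
```

In this model the repeated "Failed: role( Secondary -> Primary )" lines correspond to the first case: an Outdated node that keeps retrying without being able to reach UpToDate data. Whether that is really what happens on your nodes is exactly what the "drbdsetup status" output should tell us.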