[DRBD-user] [BUG] drbdmanage 0.98 :: cannot add nodes

Toni Bolduan toni.bolduan at googlemail.com
Mon Oct 31 13:29:57 CET 2016

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi, thanks for the reply.
Here's the output of drbdsetup status

node1:
root@deb1:~# drbdsetup status
.drbdctrl role:Primary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate

node2:
root@deb2:~# drbdsetup status
.drbdctrl role:Secondary
  volume:0 disk:Inconsistent
  volume:1 disk:Inconsistent
  deb1 connection:Connecting


I figured out that this problem only occurs when using dedicated
interfaces for DRBD.
In a test setup it doesn't matter if everything runs over a single NIC,
but for production use I want DRBD on its own interface.

Here's the complete setup:


*Node1:*

*nic1*:

   - ip: 192.168.2.103
   - netmask: 255.255.255.0
   - gateway: 192.168.2.1

*nic2*:

   - ip: 10.0.0.11
   - netmask: 255.255.255.0

*hostname*:

   - deb1


*dns (/etc/hosts):*

   - 127.0.0.1       localhost
   - 10.0.0.11       deb1
   - 10.0.0.12       deb2


*volume group drbdpool:*

   - /dev/sdb


*Node2:*

*nic1*:

   - ip: 192.168.2.104
   - netmask: 255.255.255.0
   - gateway: 192.168.2.1

*nic2*:

   - ip: 10.0.0.12
   - netmask: 255.255.255.0

*hostname*:

   - deb2


*dns (/etc/hosts):*

   - 127.0.0.1       localhost
   - 10.0.0.11       deb1
   - 10.0.0.12       deb2

*volume group drbdpool:*

   - /dev/sdb
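
For completeness, this is roughly how I prepared both nodes before the
init; the volume group name and the addresses are the ones listed above,
the rest is just the standard sequence, so take it as a sketch rather
than an exact transcript:

# on both nodes: create the LVM volume group drbdmanage expects
root@deb1:~# pvcreate /dev/sdb
root@deb1:~# vgcreate drbdpool /dev/sdb

# on both nodes: /etc/hosts maps the hostnames to the dedicated NICs
# (the entries listed above under "dns")

# on deb1 only: initialize the cluster on the dedicated interface
root@deb1:~# drbdmanage init 10.0.0.11

# on deb1 only: add the second node via its dedicated address
root@deb1:~# drbdmanage add-node deb2 10.0.0.12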




It seems DRBD cannot figure out who/what is primary...
DRBD drives me insane... Sometimes it works and sometimes it doesn't...

drbdmanage init 10.0.0.11 got stuck twice... and on the 3rd try it worked
like a charm. Huh!?
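
While init/join is running I now keep a second terminal open and watch
the control volume, roughly like this (the grep filter is just my own
guess at the relevant kernel messages):

# watch the control volume state while drbdmanage init / add-node runs
root@deb1:~# watch -n1 'drbdsetup status .drbdctrl'

# kernel log, filtered for control-volume messages
root@deb1:~# dmesg | grep drbdctrl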


Here's the output after trying to add the second node:


------------------- 1st node ------------------------

root@deb1:~# drbdmanage add-node deb2 10.0.0.12
Operation completed successfully
Operation completed successfully

Executing join command using ssh.
IMPORTANT: The output you see comes from deb2
IMPORTANT: Your input is executed on deb2
You are going to join an existing drbdmanage cluster.
CAUTION! Note that:
  * Any previous drbdmanage cluster information may be removed
  * Any remaining resources managed by a previous drbdmanage installation
    that still exist on this system will no longer be managed by drbdmanage

Confirm:

  yes/no: yes
Operation completed successfully
root@deb1:~#

root@deb1:~# drbdsetup status
.drbdctrl role:Primary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate

root@deb1:~#
root@deb1:~#
root@deb1:~# drbdsetup status
.drbdctrl role:Primary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  deb2 role:Secondary
    volume:0 replication:SyncSource peer-disk:Inconsistent done:15.78
    volume:1 replication:SyncSource peer-disk:Inconsistent done:15.78

root@deb1:~# drbdsetup status
.drbdctrl role:Primary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  deb2 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate



------------------- 2nd node ------------------------

root@deb2:~# drbdsetup status
.drbdctrl role:Secondary
  volume:0 disk:Inconsistent
  volume:1 disk:Inconsistent
  deb1 role:Primary
    volume:0 replication:SyncTarget peer-disk:UpToDate done:81.46
    volume:1 replication:SyncTarget peer-disk:UpToDate done:81.46

root@deb2:~# drbdsetup status
.drbdctrl role:Secondary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  deb1 role:Primary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate

root@deb2:~# drbdsetup status
.drbdctrl role:Secondary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  deb1 role:Primary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate
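
Once the join does go through, this is how I check that the cluster is
actually up (the exact column layout of list-nodes may differ between
drbdmanage versions):

# both nodes should be listed, with no pending actions
root@deb2:~# drbdmanage list-nodes

# one node Primary on the control volume, the peer UpToDate
root@deb2:~# drbdsetup status .drbdctrl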


What is going on there, and why does it sometimes work and sometimes not?
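
For reference, here is the connectivity check I plan to run the next time
the join hangs; I'm assuming the control volume sits on the default port
6999 and that only the dedicated 10.0.0.0/24 link matters:

# from deb1: is the dedicated link alive at all?
root@deb1:~# ping -c3 10.0.0.12

# from deb1: can the peer's control-volume port be reached?
root@deb1:~# nc -zv 10.0.0.12 6999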


Best Regards,
Toni (Still a big fan)


2016-10-31 10:07 GMT+01:00 Roland Kammerer <roland.kammerer at linbit.com>:

> On Sat, Oct 29, 2016 at 08:55:45PM +0200, Toni Bolduan wrote:
> > Hi list,
> >
> > Today I've updated to drbdmanage 0.98 on my two Ubuntu server nodes.
> > After setting up the volume group on both nodes I started the
> > initialization on node 1. That worked fine.
> >
> >
> > Then I tried to add the second node to my cluster with "drbdmanage
> > add-node node2 10.0.0.12" and drbdmanage got stuck after confirmation.
>
> During startup drbdmanage now has to handle more things, so it might
> take longer (~15-30 seconds).
>
> >
> > On the second node dmesg shows the following:
> >
> > [...]
> > [ 1103.413457] drbd .drbdctrl: Terminating worker thread
> > [ 1386.430669] drbd .drbdctrl: Starting worker thread (from drbdsetup [2142])
> > [ 1386.437482] drbd .drbdctrl node1: Starting sender thread (from drbdsetup [2155])
> > [ 1386.445330] drbd .drbdctrl/0 drbd0: disk( Diskless -> Attaching )
> > [ 1386.445340] drbd .drbdctrl/0 drbd0: Maximum number of peer devices = 31
> > [ 1386.445425] drbd .drbdctrl: Method to ensure write ordering: flush
> > [ 1386.445427] drbd .drbdctrl/0 drbd0 node1: node_id: 0 idx: 0 bm-uuid: 0x0 flags: 0x10 max_size: 0 (DUnknown)
> > [ 1386.445428] drbd .drbdctrl/0 drbd0: my node_id: 1
> > [ 1386.445433] drbd .drbdctrl/0 drbd0 node1: node_id: 0 idx: 0 bm-uuid: 0x0 flags: 0x10 max_size: 0 (DUnknown)
> > [ 1386.445434] drbd .drbdctrl/0 drbd0: my node_id: 1
> > [ 1386.445435] drbd .drbdctrl/0 drbd0: drbd_bm_resize called with capacity == 8112
> > [ 1386.445441] drbd .drbdctrl/0 drbd0: resync bitmap: bits=1014 words=496 pages=1
> > [ 1386.445442] drbd .drbdctrl/0 drbd0: size = 4056 KB (4056 KB)
> > [ 1386.446431] drbd .drbdctrl/0 drbd0: recounting of set bits took additional 0ms
> > [ 1386.446440] drbd .drbdctrl/0 drbd0: disk( Attaching -> Outdated )
> > [ 1386.446443] drbd .drbdctrl/0 drbd0: attached to current UUID: 120FE59FE04690DE
> > [ 1411.289042] drbd .drbdctrl: State change failed: Need access to UpToDate data
> > [ 1411.289066] drbd .drbdctrl: Failed: role( Secondary -> Primary )
> > [ 1434.136862] drbd .drbdctrl: State change failed: Need access to UpToDate data
> > [...]
> > [ 2033.117704] drbd .drbdctrl: Failed: role( Secondary -> Primary )
> >
> > How can I figure out what happened here, and why?
> >
>
> I guess that happened while the second node was in the leader
> election phase, where it tries to become DRBD Primary on the control
> volume (.drbdctrl). That is how leader election basically works. All
> nodes race to become Primary until one succeeds, the others then see a
> Primary and give up and become satellite nodes. The problem is that
> there is no UpToDate data.
>
> I would run "drbdsetup status" in a second window and check if the
> resource (.drbdctrl) makes any progress. Does it sync up to the second
> node or does it get stuck after some percentage? Or does it not start
> syncing at all? Are the nodes in some strange network state? The output
> of "drbdsetup status" from both nodes would help a lot.
>
> Regards, rck
>


More information about the drbd-user mailing list