[DRBD-user] Could not add a new node

Roland Kammerer roland.kammerer at linbit.com
Tue Jul 25 08:50:34 CEST 2017

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Mon, Jul 24, 2017 at 05:25:40PM +0900, 大川敬臣 wrote:
> Please let me ask you some questions about DRBD.
> I'm testing DRBD 9.0.8 with RHEL 7.3 in an AWS environment.
> 
> [Configurations]
> OS: RHEL 7.3
> Kernel version: 3.10.0-514.el7.x86_64
> DRBD: drbd-9.0.8-1
>       drbd-utils-9.0.0
>       drbdmanage-0.99.5
> Host names: drbd-01, drbd-02
> Disk for DRBD: /dev/xvdb
> 
> - The DRBD volume is going to be used as the MySQL data volume.
> - The AWS env is only a test environment. The PROD env is VMware ESXi.
> 
> [Installing steps]
> - Installing required packages
> # yum -y install kernel-devel.x86_64
> # yum -y install gcc.x86_64
> # yum -y install flex.x86_64
> 
> - Installing DRBD
> # cd /usr/local/src/
> # tar zxf drbd-9.0.8-1.tar.gz
> # cd drbd-9.0.8-1
> # make KDIR=/usr/src/kernels/$(uname -r)
> # make install
> # modprobe drbd
> # cat /proc/drbd
> 
> - Installing drbd-utils
> # cd /usr/local/src/
> # tar zxf drbd-utils-9.0.0.tar.gz
> # cd drbd-utils-9.0.0
> # ./configure
> # make
> # make install
> 
> - Installing DRBD Manage
> # cd /usr/local/src/
> # tar zxf drbdmanage-0.99.5.tar.gz
> # cd drbdmanage-0.99.5
> # ./setup.py build
> # ./setup.py install
> 
> [Initialize DRBD]
> - Creating pool for DRBD
> # vgcreate drbdpool /dev/xvdb1
>   Physical volume "/dev/xvdb1" successfully created.
>   Volume group "drbdpool" successfully created
> #
> 
> - Execute on drbd-01
> # drbdmanage init 172.31.1.155
> 
> You are going to initialize a new drbdmanage cluster.
> CAUTION! Note that:
>   * Any previous drbdmanage cluster information may be removed
>   * Any remaining resources managed by a previous drbdmanage installation
>     that still exist on this system will no longer be managed by drbdmanage
> 
> Confirm:
> 
>   yes/no: yes
> Empty drbdmanage control volume initialized on '/dev/drbd0'.
> Empty drbdmanage control volume initialized on '/dev/drbd1'.
> Waiting for server: .
> Operation completed successfully
> #
> # drbdadm status
> .drbdctrl role:Primary
>   volume:0 disk:UpToDate
>   volume:1 disk:UpToDate
> 
> #
> #
> # lvdisplay -c
>   /dev/drbdpool/.drbdctrl_0:drbdpool:3:1:-1:2:8192:1:-1:0:-1:253:0
>   /dev/drbdpool/.drbdctrl_1:drbdpool:3:1:-1:2:8192:1:-1:0:-1:253:1
> #
> # drbdmanage list-nodes
> +-----------------------------------------+
> | Name    | Pool Size | Pool Free | State |
> |-----------------------------------------|
> | drbd-01 |     10236 |     10228 |    ok |
> +-----------------------------------------+
> #
> # drbdmanage new-node drbd-02 172.31.8.103
> Operation completed successfully
> Operation completed successfully
> 

Please read the following lines again:

> Executing join command using ssh.
> IMPORTANT: The output you see comes from drbd-02
> IMPORTANT: Your input is executed on drbd-02

Reread the lines above^^ ;-)

> You are going to join an existing drbdmanage cluster.
> CAUTION! Note that:
>   * Any previous drbdmanage cluster information may be removed
>   * Any remaining resources managed by a previous drbdmanage installation
>     that still exist on this system will no longer be managed by drbdmanage
> 
> Confirm:
> 
>   yes/no: yes
> Waiting for server to start up (can take up to 1 min)
> Waiting for server: ......
> Operation completed successfully
> Give leader time to contact the new node
> Operation completed successfully
> Operation completed successfully

I guess at that point your second node was successfully joined.
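
You can verify that on both nodes before going any further, e.g. (just a
sketch, using commands you already ran elsewhere):

# drbdadm status .drbdctrl    # the peer should show up as connected on both nodes
# drbdmanage list-nodes       # both nodes should be listed with state "ok"
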

> #
> #
> # drbdmanage howto-join drbd-02
> IMPORTANT: Execute the following command only on node drbd-02!
> drbdmanage join -p 6999 172.31.8.103 1 drbd-01 172.31.1.155 0
> aez1qL969FHRDYJH4qYD
> Operation completed successfully
> #
> #
> 
> - Execute on drbd-02
> # drbdmanage join -p 6999 172.31.8.103 1 drbd-01 172.31.1.155 0
> aez1qL969FHRDYJH4qYD
> You are going to join an existing drbdmanage cluster.
> CAUTION! Note that:
>   * Any previous drbdmanage cluster information may be removed
>   * Any remaining resources managed by a previous drbdmanage installation
>     that still exist on this system will no longer be managed by drbdmanage
> 
> Confirm:
> 
>   yes/no: yes
> Waiting for server to start up (can take up to 1 min)
> Operation completed successfully
> #
> #

Why did you rejoin it? It was already joined.

> 
> - Execute on drbd-01
> # drbdmanage list-nodes
> +---------------------------------------------------------------+
> | Name    | Pool Size | Pool Free |                       State |
> |---------------------------------------------------------------|
> | drbd-01 |     10236 |     10228 |  online/quorum vote ignored |
> | drbd-02 |     10236 |     10228 | offline/quorum vote ignored |
> +---------------------------------------------------------------+
> [root@drbd-01 drbd.d]#
> 
> - Execute on drbd-02
> # drbdmanage list-nodes
> +---------------------------------------------------------------+
> | Name    | Pool Size | Pool Free |                       State |
> |---------------------------------------------------------------|
> | drbd-01 |     10236 |     10228 | offline/quorum vote ignored |
> | drbd-02 |     10236 |     10228 |  online/quorum vote ignored |
> +---------------------------------------------------------------+
> #
> #
> 
> When I checked the syslog on drbd-01, the messages below were written.
> ----------
> Jul 24 03:27:58 drbd-01 dbus-daemon: .drbdctrl role:Primary
> Jul 24 03:27:58 drbd-01 dbus-daemon: volume:0 disk:UpToDate
> Jul 24 03:27:58 drbd-01 dbus-daemon: volume:1 disk:UpToDate
> Jul 24 03:29:55 drbd-01 drbdmanaged[2221]: INFO       DrbdAdm: Running external command: drbdadm -vvv adjust .drbdctrl
> Jul 24 03:29:55 drbd-01 drbdmanaged[2221]: ERROR      DrbdAdm: External command 'drbdadm': Exit code 1
> Jul 24 03:29:55 drbd-01 dbus-daemon: .drbdctrl role:Primary
> ----------
> 
> on drbd-02
> ----------
> Jul 24 03:29:59 drbd-02 drbdmanaged[2184]: INFO       DrbdAdm: Running external command: drbdadm -vvv adjust .drbdctrl
> Jul 24 03:29:59 drbd-02 drbdmanaged[2184]: ERROR      DrbdAdm: External command 'drbdadm': Exit code 1
> Jul 24 03:29:59 drbd-02 drbdmanaged[2184]: INFO       DRBDManage starting as potential leader node
> Jul 24 03:29:59 drbd-02 dbus-daemon: .drbdctrl role:Secondary
> Jul 24 03:29:59 drbd-02 dbus-daemon: volume:0 disk:Inconsistent
> Jul 24 03:29:59 drbd-02 dbus-daemon: volume:1 disk:Inconsistent
> Jul 24 03:29:59 drbd-02 dbus-daemon: drbd-01 connection:Connecting

It looks like the two machines never connected successfully at the DRBD
level. I saw that recently on AWS, and it is usually a problem with the
network. Are all the ports you need open? Did you use the node name
shown by "uname -n" for the join?

Review the res file of the control volume and try to connect .drbdctrl
at the DRBD level. As long as the nodes cannot connect at the DRBD
level, the next higher level, drbdmanage, will fail.
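
Roughly something like this (a sketch; on a drbdmanage setup the
generated res files should be under /var/lib/drbd.d/):

# drbdadm dump .drbdctrl      # shows the node names, addresses and port drbdmanage wrote
# drbdadm adjust .drbdctrl    # re-applies the config, including the connections
# drbdadm status .drbdctrl    # should go from Connecting to a connected, UpToDate peer

If adjust exits with 1 again, as in your syslog, run it by hand with
-vvv (like drbdmanage does) and look at what drbdsetup actually
complains about.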

Regards, rck


