[DRBD-user] FAQ: Reconnecting after a temporary primary node failure

Tue Mar 8 17:33:49 CET 2011

On Mon, Mar 07, 2011 at 06:39:09PM -0500, Mauricio Tavares wrote:
> Newbie question here: so I created a mysql+drbd test setup with two
> nodes (vms), db1 and db2.
> 
> root@db1:~# drbdadm role r0
> Primary/Secondary
> root@db1:~#
> root@db2:~# drbdadm role r0
> Secondary/Primary
> root@db2:~#

> root@db1:~# tail /var/log/kern.log
> Mar  7 18:24:19 db1 kernel: [  744.695406] block drbd0: receiver (re)started
> Mar  7 18:24:19 db1 kernel: [  744.695414] block drbd0: conn( Unconnected -> WFConnection )
> Mar  7 18:24:19 db1 kernel: [  744.696261] block drbd0: bind before connect failed, err = -99

There.
DRBD cannot bind to the address you configured for it
(err = -99 is EADDRNOTAVAIL: no local interface has that address).
So what can it do? -> go StandAlone.
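
The "err = -99" in that line is a negated Linux errno. A minimal sketch
for pulling such codes out of a kernel log follows; the function name
drbd_bind_errors is made up for illustration, and only errno 99 is
decoded, everything else is passed through as a bare number.

```shell
#!/bin/sh
# Hypothetical helper, not part of DRBD: read kernel log lines on stdin
# and report the errno from each "bind before connect failed" message.
drbd_bind_errors() {
    sed -n 's/.*bind before connect failed, err = -\([0-9][0-9]*\).*/\1/p' |
    while read -r code; do
        case "$code" in
            # 99 is EADDRNOTAVAIL on Linux ("Cannot assign requested address")
            99) echo "errno $code: EADDRNOTAVAIL (address not configured on any local interface)" ;;
            *)  echo "errno $code: see errno(3)" ;;
        esac
    done
}

# Possible usage (log path as in the quoted session):
#   drbd_bind_errors < /var/log/kern.log
```

If this reports EADDRNOTAVAIL, compare the address in the resource
configuration against what "ip addr show" lists at that moment.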

Get rid of NetworkManager, or any related tool that tries to be
smart and deconfigures a NIC when you unplug it.
That does not belong on a server.

> Mar  7 18:24:19 db1 kernel: [  744.696271] block drbd0: conn( WFConnection -> Disconnecting )
> Mar  7 18:24:19 db1 kernel: [  744.696345] block drbd0: Discarding network configuration.
> Mar  7 18:24:19 db1 kernel: [  744.696702] block drbd0: Connection closed
> Mar  7 18:24:19 db1 kernel: [  744.696719] block drbd0: conn( Disconnecting -> StandAlone )

There it goes to StandAlone.

> Mar  7 18:24:19 db1 kernel: [  744.697261] block drbd0: receiver terminated
> Mar  7 18:24:19 db1 kernel: [  744.697280] block drbd0: Terminating receiver thread

There you plug the network back in.
Note the timestamps.

> Mar  7 18:31:41 db1 kernel: [ 1186.440928] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
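
To put a number on that gap, here is a small sketch that subtracts the
first kernel uptime stamp on stdin from the last one. The function name
kernel_gap is hypothetical, and it assumes the syslog line format shown
in the quoted log ("kernel: [ seconds.micros] ...").

```shell
#!/bin/sh
# Hypothetical helper: print the whole seconds elapsed between the first
# and the last kernel timestamp found in the log lines on stdin.
kernel_gap() {
    # Keep only the number inside "kernel: [ ... ]" from each matching line.
    sed -n 's/.*kernel: \[ *\([0-9][0-9]*\.[0-9][0-9]*\)].*/\1/p' |
    awk 'NR == 1 { first = $1 }
         { last = $1 }
         END { if (NR) printf "%.0f\n", last - first }'
}

# Possible usage:
#   grep -e 'StandAlone' -e 'Link is Up' /var/log/kern.log | kernel_gap
```

Fed the StandAlone line and the link-up line quoted here, it prints 442:
the link came back more than seven minutes after DRBD had already
discarded its network configuration.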

Since the network part of DRBD is not configured (it has been discarded,
as DRBD was not able to bind() to the configured address),
you need to re-configure the network part now:
"drbdadm adjust", or "drbdadm connect".

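The recovery step can be sketched as a small wrapper. This is an
illustration only: reconnect_if_standalone and the DRBDADM variable are
made-up names, r0 is the resource from the quoted session, and the
sketch assumes the configured address is back on the interface before
it is run.

```shell
#!/bin/sh
# DRBDADM can be overridden, e.g. for a dry run; defaults to the real tool.
DRBDADM=${DRBDADM:-drbdadm}

# Hypothetical helper: re-apply the configuration of a resource only if
# its connection state is StandAlone.
reconnect_if_standalone() {
    res=$1
    state=$($DRBDADM cstate "$res") || return 1
    if [ "$state" = "StandAlone" ]; then
        # "adjust" re-applies the configuration, including the network
        # part; "$DRBDADM connect" would only set up the connection.
        $DRBDADM adjust "$res"
    else
        echo "$res is $state, nothing to do"
    fi
}

# Possible usage, as root on the node that went StandAlone:
#   reconnect_if_standalone r0
```

Afterwards, "drbdadm cstate r0" should no longer report StandAlone.
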
-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com