[DRBD-user] Handling failback

Wed Oct 15 11:53:39 CEST 2008

On Wed, Oct 15, 2008 at 10:44:17AM +0200, Federico Fanton wrote:
> Hi everyone!
> I'm setting up a little cluster with two identical machines running  
> CentOS 5.2 and drbd 8.2.6.
> drbd is mirroring a logical volume that contains a Xen 3.2.1 domU, which  
> is running a Samba server. The domU's configuration uses drbd directly  
> (disk=[drbd:samba-disk,xvda1,w]), so Xen handles drbd state.
>
> Right now I'm testing failover/failback *without heartbeat* (for  
> simplicity of testing) and I can't figure out a way to handle failback :(
>
> My test goes like the following:
> Initially drbd is in Connected Secondary/Secondary mode
>
> -> Start the VM on node1
> drbd on node1: Connected Primary/Secondary
>
> -> Unplug network cables from node1 to simulate a failure
> drbd on node2: WFConnection Secondary/Unknown

you are not simulating "a" failure,
you are simulating the absolute worst case scenario in a cluster.
you are simulating split brain.

how about resetting node1 to simulate node failure?

> -> Start the VM on node2 to simulate failover
> drbd on node2: WFConnection Primary/Unknown

you now have the vm started and running on both nodes.
don't do that.

using dopd with heartbeat (or even configuring stonith. but if you do,
do so properly; setting up a reliable stonith infrastructure is not as
easy as it may seem!) will help you avoid it.
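
for reference, a rough sketch of what the dopd setup looks like. this is
written from memory of the DRBD 8.x documentation, not taken from this
thread: the handler name (outdate-peer), the paths (may be /usr/lib64 on
64bit), the timeout and the resource name (samba-disk, taken from the Xen
disk line above) are all assumptions to check against your installed
version.

```
# /etc/ha.d/ha.cf (heartbeat side; paths are an assumption)
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

# /etc/drbd.conf (drbd side; only the relevant sections shown)
resource samba-disk {
  handlers {
    # called when the replication link is lost, asks the peer
    # (via heartbeat) to mark its data as outdated
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
  disk {
    # refuse to become primary on data that is known to be outdated
    fencing resource-only;
  }
}
```

with something like this in place, a node that was cut off from its peer
should refuse to be promoted on outdated data, instead of silently
creating a second primary.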

> -> Copy a file via Samba

> -> Reconnect node1
> drbd on node1: WFConnection Primary/Unknown
> drbd on node2: StandAlone Primary/Unknown
>
> What should I do now to reconnect drbd so that I don't lose the file I  
> copied via Samba?

you have diverging data sets.
you had (or still have?) the vm active on both nodes at the same time.
if you had done that using shared storage (iSCSI or FC), you would have
just scrambled and destroyed your vm image.

fortunately, with DRBD, you "just" have diverging data sets.

> I think I should stop the VM on node2, reconnect drbd (which I don't  
> know how) and restart the VM on node1, am I right?

you need to decide which of the data sets to keep.
if you decide to keep node2,
  node2# # keep this node's data
  node2# # make sure it is primary, for paranoia reasons

  node1# xm destroy whatever # stop vm and everything else depending on drbd
  node1# drbdadm secondary all # make drbd secondary
  node1# drbdadm -- --discard-my-data connect all

and on node2, you may now still need a
  node2# drbdadm adjust all # to make it attempt a reconnect

because you told node1 to connect and _lose_ on resync handshake,
node1 will now become sync target.
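
the node1 steps above could be wrapped in a small script, sketched here
under some assumptions: it only prints the commands unless RUN=1 is set,
the RES and DOMU variables are names made up for this sketch, and their
defaults ("all" and "whatever") are just the placeholders used above.
run it only on the node whose data you decided to throw away.

```shell
#!/bin/sh
# Sketch: split-brain recovery on the node whose data gets DISCARDED.
# RES  = drbd resource name (default "all", as in the commands above)
# DOMU = Xen domU to stop first (default "whatever" is a placeholder)
RES="${RES:-all}"
DOMU="${DOMU:-whatever}"

# Print the command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

discard_local_data() {
    run xm destroy "$DOMU"          # stop vm and anything else using drbd
    run drbdadm secondary "$RES"    # demote this node
    # reconnect and agree to lose the resync handshake (become sync target)
    run drbdadm -- --discard-my-data connect "$RES"
}

discard_local_data
```

look at the printed commands first, and only rerun with RUN=1 once the
resource and domU names are right. the surviving node may still need the
"drbdadm adjust" shown above before it reconnects.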

but please remember: don't do that.
don't simulate split brain if you mean to simulate node failure.

hope that helps.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed