[DRBD-user] Re: Handling failback

Federico Fanton fake at panizzolo.it
Wed Oct 15 15:00:10 CEST 2008



Lars Ellenberg wrote:
>> -> Start the VM on node1
>> drbd on node1: Connected Primary/Secondary
>>
>> -> Unplug network cables from node1 to simulate a failure
>> drbd on node2: WFConnection Secondary/Unknown
> 
> you are not simulating "a" failure,
> you are simulating the absolute worst case scenario in a cluster.
> you are simulating split brain.

Ouch! ^^;

> how about resetting node1 to simulate node failure?

I tried unplugging the power cord :) That way failback works
perfectly, thanks! (I just had to tweak the CentOS boot order to ensure
that xend starts before heartbeat)
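
For the archives, a minimal sketch of how that boot order could be
checked, assuming a SysV-init layout where enabled services appear as
S<NN>name symlinks in a runlevel directory such as /etc/rc3.d. The
function names start_prio and starts_before are made up for this
example; they are not part of heartbeat, Xen or CentOS.

```shell
#!/bin/sh
# Print the two-digit start priority of a service, taken from its
# S<NN>name link in a SysV runlevel directory.
# Returns 1 (and prints nothing) if the service is not enabled there.
start_prio() {
    # $1 = runlevel directory (assumed layout), $2 = service name
    for link in "$1"/S[0-9][0-9]"$2"; do
        # An unmatched glob stays literal, so check the entry exists.
        [ -e "$link" ] || [ -L "$link" ] || continue
        base=${link##*/}      # e.g. S98xend
        prio=${base#S}        # e.g. 98xend
        echo "${prio%"$2"}"   # e.g. 98
        return 0
    done
    return 1
}

# Succeed if service $2 starts strictly before service $3 in directory $1.
# Returns 2 if either service has no start link in that directory.
starts_before() {
    a=$(start_prio "$1" "$2") || return 2
    b=$(start_prio "$1" "$3") || return 2
    [ "$a" -lt "$b" ]
}
```

Usage would be something like

  starts_before /etc/rc3.d xend heartbeat && echo "order is fine"

If the order is wrong, one common approach on CentOS is to change the
start priority in the "# chkconfig:" header of the init script and then
re-register the service with chkconfig --del / chkconfig --add, but
check this against your own setup first.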


>> -> Start the VM on node2 to simulate failover
>> drbd on node2: WFConnection Primary/Unknown
> 
> you now have the vm started and running on both nodes.
> don't do that.
> 
> using dopd with heartbeat (or even configuring stonith. but if you do,
> do so properly; setting up a reliable stonith infrastructure is not as
> easy as it may seem!) will help you avoid it.

If I understand correctly, with dopd, when the link goes down every node
should go into the "Outdated" state and refuse to become primary. In
that case failover won't work, right? ?_? Sorry if I'm missing something
obvious; as you may have guessed, I'm quite new to cluster management.


>> What should I do now to reconnect drbd so that I don't lose the file I  
>> copied via Samba?
> 
> you have diverging data sets.
> you had (or still have?) the vm active on both nodes at the same time.
> if you had done that using shared storage (iSCSI or FC), you would have
> just scrambled and destroyed your vm image.
> 
> fortunately, with DRBD, you "just" have diverging data sets.

drbd protects the unwary :) (=me ^^; )


> you need to decide which of the data sets to keep.
> if you decide to keep node2,
>   node2# # keep this node's data
>   node2# # make sure it is primary, for paranoia reasons
> 
>   node1# xm destroy whatever # stop vm and everything else depending on drbd
>   node1# drbdadm secondary all # make drbd secondary
>   node1# drbdadm -- --discard-my-data connect all
> 
> and on node2, you may now still need a
>   node2# drbdadm adjust all # to make it attempt a reconnect
> 
> because you told node1 to connect and _lose_ on resync handshake,
> node1 will now become sync target.

Copied straight to our internal Wiki!
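
To go with the procedure above, a small sketch for checking how the
reconnect is going. It assumes that /proc/drbd prints one line per
device of the form " <minor>: cs:<state> ..." (which is what I see
here); the helper names drbd_cs and drbd_connected are invented for
this example and are not DRBD commands.

```shell
#!/bin/sh
# Print the connection state (the cs: field) of one DRBD minor,
# reading /proc/drbd-style text from stdin.
drbd_cs() {
    # $1 = device minor number, e.g. 0
    sed -n "s/^ *$1: cs:\([A-Za-z]*\).*/\1/p"
}

# Succeed only if the given minor reports cs:Connected.
# $2 optionally names the status file (defaults to /proc/drbd).
drbd_connected() {
    [ "$(drbd_cs "$1" < "${2:-/proc/drbd}")" = "Connected" ]
}
```

After the discard-my-data connect, running

  drbd_cs 0 < /proc/drbd

on node1 should, if I read the explanation above correctly, show
SyncTarget while the resync runs and Connected once it has finished.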


> but please remember: don't do that.
> don't simulate split brain if you mean to simulate node failure.
> 
> hope that helps.

It helped a lot, many thanks!



