Lars Ellenberg wrote:
>> -> Start the VM on node1
>>    drbd on node1: Connected Primary/Secondary
>>
>> -> Unplug network cables from node1 to simulate a failure
>>    drbd on node2: WFConnection Secondary/Unknown
>
> you are not simulating "a" failure,
> you are simulating the absolute worst case scenario in a cluster.
> you are simulating split brain.

Ouch! ^^;

> how about resetting node1 to simulate node failure?

I tried unplugging the power cord :)
In this way failback works perfectly, thanks!
(I just had to tweak the CentOS boot order to ensure that xend starts
before heartbeat)

>> -> Start the VM on node2 to simulate failover
>>    drbd on node2: WFConnection Primary/Unknown
>
> you now have the vm started and running on both nodes.
> don't do that.
>
> using dopd with heartbeat (or even configuring stonith. but if you do,
> do so properly; setting up a reliable stonith infrastructure is not as
> easy as it may seem!) will help you avoid it.

If I understand correctly, with dopd, when the link goes down every node
should go into the "Outdated" state and refuse to go primary. In this way
failover won't work, right? ?_?
Sorry if I'm missing something obvious; as you'd have imagined, I'm quite
new at cluster management.

>> What should I do now to reconnect drbd so that I don't lose the file I
>> copied via Samba?
>
> you have diverging data sets.
> you had (or still have?) the vm active on both nodes at the same time.
> if you had done that using shared storage (iSCSI or FC), you had just
> scrambled and destroyed your vm image.
>
> fortunately, with DRBD, you "just" have diverging data sets.

drbd protects the unwary :) (=me ^^; )

> you need to decide which of the data sets to keep.
> if you decide to keep node2,
> node2# # keep this nodes data
> node2# # make sure it is primary, for paranoia reasons
>
> node1# xm destroy whatever    # stop vm and everything else depending on drbd
> node1# drbdadm secondary all  # make drbd secondary
> node1# drbdadm -- --discard-my-data connect all
>
> and on node2, you may now still need a
> node2# drbdadm adjust all     # to make it attempt a reconnect
>
> because you told node1 to connect and _lose_ on resync handshake,
> node1 will now become sync target.

Copied straight to our internal Wiki!

> but please remember: don't do that.
> don't simulate split brain if you mean to simulate node failure.
>
> hope that helps.

It helped a lot, many thanks!
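
For the wiki copy, the recovery steps quoted above could be collected into a small helper. This is only a sketch under assumptions: the function name `recovery_plan` and the `VM_NAME` and `RESOURCE` variables are hypothetical, and the `xm` and `drbdadm` commands are exactly the ones quoted from Lars's message, for the case where node2's data is kept and node1's is discarded. It deliberately only prints the commands, prefixed with the node they belong to, so that nothing is discarded by accident.

```shell
#!/bin/sh
# Sketch: print (do not run) the split-brain recovery commands quoted above,
# for the case where node2's data is kept and node1's changes are discarded.
# VM_NAME and RESOURCE are hypothetical placeholders; adjust them as needed.
VM_NAME="${VM_NAME:-whatever}"
RESOURCE="${RESOURCE:-all}"

recovery_plan() {
    # Victim (node1): stop everything using drbd, demote, then reconnect
    # while agreeing to lose the resync handshake.
    echo "node1# xm destroy $VM_NAME"
    echo "node1# drbdadm secondary $RESOURCE"
    echo "node1# drbdadm -- --discard-my-data connect $RESOURCE"
    # Survivor (node2): may still need a nudge to attempt a reconnect.
    echo "node2# drbdadm adjust $RESOURCE"
}

recovery_plan
```

Run it as-is to get the checklist, or with, for example, `VM_NAME=myvm RESOURCE=r0` set in the environment (both values are made up) to fill in site-specific names. The commands still have to be typed on the right node by hand, after checking that node2 really is the data set to keep.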