<br><div class="gmail_quote">On Fri, Oct 30, 2009 at 7:43 PM, Jean-Francois Chevrette <span dir="ltr">&lt;<a href="mailto:jfchevrette@iweb.com">jfchevrette@iweb.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hello,<br>
<br>
we have a two-node Citrix XenServer cluster in which each node has a local partition configured as a DRBD resource. The resource is set to become primary on both nodes simultaneously. XenServer uses LVM, and it is my understanding that it works in such a way that no LV will ever be in use on both hosts at the same time, thus ensuring consistency between our dual-primary hosts.<br>
<br>
For the DRBD connectivity, both nodes are connected directly through a cross-over cable.<br>
<br>
For testing purposes, we unplugged the network interfaces, which forced both nodes into the WFConnection state with Primary/Unknown roles. VMs on each node kept working as usual.<br>
<br>
However, after reconnecting the network interfaces, both nodes became StandAlone and the logs showed that a split-brain had been detected. It was my understanding that DRBD would have been able to sync the out-of-sync (OOS) blocks from each node to the other properly.<br>
<br>
What is supposed to happen when the nodes of a dual-primary configuration reconnect to each other?<br>
<br>
<br>
after-sb-2pri disconnect;<br>
allow-two-primaries;<br>
}<br>
<br>
<br>
<br>
<br>
Logs from when we reconnected both nodes:<br>
<br>
block drbd0: Split-Brain detected, dropping connection!<br>
<br>
<br>
Can anyone tell me why I am not getting the behavior I am expecting?<br></blockquote></div><br>Hello,<br>the message is self-explanatory: in drbd.conf you set the policy to "disconnect" for a split brain (sb) that arises in a two-primary scenario (after-sb-2pri disconnect).<br>
And that is exactly what DRBD does...<br><br>By the way: since you run dual-primary with LVM, are you also using CLVMD? Otherwise, if you modify a VG on one node (for example, by adding an LV), the other node does not see the change immediately, because there is no cluster locking...<br>
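<br>As a rough sketch of where that policy lives, assuming DRBD 8.3 syntax and a placeholder resource name r0 (not taken from your configuration), the three split-brain policies sit in the net section. Keep in mind that DRBD does not merge writes made independently on both sides: after a split brain, one node's changes have to be discarded, so no automatic policy for the two-primary case can keep both sets of data.<br>
<pre>
resource r0 {                            # "r0" is a hypothetical resource name
  net {
    allow-two-primaries;

    # Neither node is primary when the split brain is detected:
    # resolve automatically only if one side made no changes at all.
    after-sb-0pri discard-zero-changes;

    # Exactly one node is primary: drop the secondary's changes.
    after-sb-1pri discard-secondary;

    # Both nodes are primary: drop the connection and wait for an
    # administrator to pick the node whose data survives.
    after-sb-2pri disconnect;
  }
}
</pre>
With after-sb-2pri disconnect, recovery is manual. On the node whose changes you decide to throw away, stop whatever is using the device, then run "drbdadm secondary r0" followed by "drbdadm -- --discard-my-data connect r0"; on the other node, run "drbdadm connect r0" if it is StandAlone as well.<br>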
<br>Bye,<br>Gianluca<br>