[DRBD-user] Split brain problem.

Mon Dec 5 03:25:15 CET 2011

Hi ALL,

Digimer, thank you again for your answer, I really appreciate it!
Unfortunately, I've tried to fix the split brain manually several times,
but it doesn't work.

# drbdadm disconnect r0
[root@infplsm017 ~]# drbdadm secondary r0
1: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 1 secondary' terminated with exit code 11
# drbdadm -- --discard-my-data connect r0
1: Failure: (123) --discard-my-data not allowed when primary.
Command 'drbdsetup 1 net 10.10.24.10:7789 10.10.24.11:7789 C 
--set-defaults --create-device --ping-timeout=20 
--after-sb-2pri=disconnect --after-sb-1pri=discard-secondary 
--after-sb-0pri=discard-zero-changes --allow-two-primaries 
--discard-my-data' terminated with exit code 10
#

I guess I need to stop the cluster daemons first, don't I?
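
For what it's worth, the (-12) error above suggests that something on this node still has the DRBD device open (a mounted filesystem or a cluster service using it, for example), and the (123) error follows from the node still being primary. Below is a rough sketch of the order the split-brain section of the users guide describes, assuming r0 is the resource and this node is the one whose changes get thrown away. The run wrapper, DRY_RUN and the function names are made up for illustration, and by default nothing is executed; the commands are only printed:

```shell
# Sketch only: manual split-brain recovery for one DRBD resource.
# RES and DRY_RUN are illustrative variables, not DRBD settings.
RES="${RES:-r0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# On the node whose data is discarded. Whatever holds the device open
# must already be stopped or unmounted, or the demotion fails with (-12).
recover_victim() {
    run drbdadm disconnect "$RES" || return 1
    run drbdadm secondary "$RES" || return 1
    run drbdadm -- --discard-my-data connect "$RES"
}

# On the other node, only needed if it is also StandAlone.
recover_survivor() {
    run drbdadm connect "$RES"
}
```

Calling recover_victim on the node to be overwritten and then recover_survivor on the peer prints the sequence; with DRY_RUN=0 the commands are really run.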

Thank you again,
Ivan

On 12/05/2011 12:21 PM, Digimer wrote:
> On 12/04/2011 04:15 PM, Ivan Pavlenko wrote:
>>          handlers {
>>                  pri-on-incon-degr
>> "/usr/lib/drbd/notify-pri-on-incon-degr.sh;
>> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
>> reboot -f";
>>                  pri-lost-after-sb
>> "/usr/lib/drbd/notify-pri-lost-after-sb.sh;
>> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
>> reboot -f";
>>                  local-io-error "/usr/lib/drbd/notify-io-error.sh;
>> /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger
>> ; halt -f";
>>          }
> You need to configure DRBD to use fencing. The best way to do this when
> using a Red Hat cluster is via Lon's "obliterate-peer.sh" script. You
> can download a copy this way;
>
> wget -c https://alteeve.com/files/an-cluster/sbin/obliterate-peer.sh -O /sbin/obliterate-peer.sh
> chmod a+x /sbin/obliterate-peer.sh
>
> Then add this;
>
> handlers {
>          fence-peer              "/sbin/obliterate-peer.sh";
> }
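
One thing worth spelling out: as far as the DRBD 8.3 documentation goes, the fence-peer handler is only called when a fencing policy is also set in the disk section. A minimal sketch, assuming DRBD 8.3 syntax and the resource name r0 from the commands above; the choice of resource-and-stonith is an assumption based on the cluster allowing two primaries:

```
resource r0 {
        disk {
                # assumption: with two primaries allowed, block I/O and
                # call the fence-peer handler when the peer is lost
                fencing         resource-and-stonith;
        }
        handlers {
                fence-peer      "/sbin/obliterate-peer.sh";
        }
        # ... rest of the existing resource definition unchanged ...
}
```

The existing handlers (pri-on-incon-degr and so on) would stay in the same handlers block next to fence-peer.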
>
>> Here are my answers to your questions:
>>
>> 1) This is definitely a split brain, not a network problem. As I showed
>> in my previous message, I can ping the cluster members and their
>> firewalls are open. When I use telnet and a sniffer, I see the nodes try
>> to establish a network connection, but they only send reject packets.
> Indeed.
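
The connection state in /proc/drbd is another way to confirm this from the DRBD side: according to the users guide, after a detected split brain at least one node ends up with cs:StandAlone (the other may be StandAlone or WFConnection). A small sketch for pulling that out, assuming the DRBD 8.3 layout of /proc/drbd; the function names are made up for illustration:

```shell
# Sketch only: read /proc/drbd-style text on stdin and print
# "<minor> <connection-state>" for every device line found.
drbd_cstates() {
    sed -n 's/^ *\([0-9][0-9]*\): cs:\([A-Za-z]*\).*/\1 \2/p'
}

# Succeeds (exit status 0) if any device on stdin is StandAlone.
has_standalone() {
    drbd_cstates | grep -q ' StandAlone$'
}
```

On a node this would be used as `drbd_cstates < /proc/drbd`; `drbdadm cstate r0` reports the same state for a single resource.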
>
>> Dec  2 10:04:00 infplsm018<kern.alert>  kernel: block drbd1: Split-Brain
>> detected but unresolved, dropping connection!
> You will need to manually recover from this split brain. See;
>
> http://www.drbd.org/users-guide/s-resolve-split-brain.html
>
>> 3) And here is my /etc/cluster/cluster.conf file:
>>
>> <fencedevice agent="fence_null" name="nullfence"/>
>> <fencedevice agent="fence_manual" name="manfence"/>
> These are not effective or supported. You need to use real fence
> devices. This is especially true when using shared storage in a cluster.
> What caused your split-brain in this case is largely meaningless without
> proper fencing.
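
For comparison, a real fence device entry might look roughly like the following. This is only a sketch, assuming the nodes have IPMI-capable management boards; the agent choice, node name form, device name, address, login and password are all placeholders, not taken from this cluster:

```
<clusternode name="infplsm017" nodeid="1">
        <fence>
                <method name="ipmi">
                        <!-- placeholder device name, must match a fencedevice below -->
                        <device name="ipmi_017" action="reboot"/>
                </method>
        </fence>
</clusternode>

<fencedevices>
        <!-- placeholder address and credentials of the node's management board -->
        <fencedevice agent="fence_ipmilan" name="ipmi_017" ipaddr="192.0.2.17" login="admin" passwd="secret"/>
</fencedevices>
```

Running fence_node against the peer from the other node is one way to check that the device actually works before relying on it.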
>
> Once you have this set up, tested and working, the next time DRBD would
> have split-brained, it will fence instead. At that point, you need to
> sort out what is breaking your cluster. That is another thread though.
>