On 12/04/2011 04:15 PM, Ivan Pavlenko wrote:
> handlers {
> pri-on-incon-degr
> "/usr/lib/drbd/notify-pri-on-incon-degr.sh;
> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
> reboot -f";
> pri-lost-after-sb
> "/usr/lib/drbd/notify-pri-lost-after-sb.sh;
> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
> reboot -f";
> local-io-error "/usr/lib/drbd/notify-io-error.sh;
> /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger
> ; halt -f";
> }
You need to configure DRBD to use fencing. The best way to do this when
using a Red Hat cluster is via Lon's "obliterate-peer.sh" script. You
can download a copy this way:
wget -c https://alteeve.com/files/an-cluster/sbin/obliterate-peer.sh -O /sbin/obliterate-peer.sh
chmod a+x /sbin/obliterate-peer.sh
Then add this:
handlers {
fence-peer "/sbin/obliterate-peer.sh";
}
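One thing to watch: DRBD only calls the fence-peer handler when a fencing policy is also set (on DRBD 8.3 that is, for example, "fencing resource-and-stonith;" in the disk section). As a rough sketch, assuming the parsed configuration is available as text, a small helper like the one below can confirm that both pieces are present. The function name check_fencing and the resource name r0 are made up for illustration.

```shell
#!/bin/sh
# check_fencing: read a DRBD configuration dump on stdin and report whether
# it contains both a fencing policy and a fence-peer handler.
# Hypothetical helper, not part of DRBD.
check_fencing() {
    cfg=$(cat)
    status=0
    # A policy of resource-only or resource-and-stonith is needed for the
    # fence-peer handler to be invoked at all.
    if ! printf '%s\n' "$cfg" | grep -Eq '^[[:space:]]*fencing[[:space:]]+resource-(only|and-stonith)'; then
        echo "missing: fencing policy"
        status=1
    fi
    if ! printf '%s\n' "$cfg" | grep -Eq '^[[:space:]]*fence-peer[[:space:]]'; then
        echo "missing: fence-peer handler"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "fencing configured"
    fi
    return "$status"
}
```

On a node you would feed it the parsed configuration, for instance "drbdadm dump r0 | check_fencing", where r0 stands in for your own resource name.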
> Here are my answers to your questions:
>
> 1) This is definitely a split brain, not a network problem. I showed
> in my previous message that I can ping members of the cluster and that
> their firewalls are open. When I use telnet and a sniffer I see the nodes
> try to establish a network connection, but they only send reject packets.
Indeed.
> Dec 2 10:04:00 infplsm018 <kern.alert> kernel: block drbd1: Split-Brain
> detected but unresolved, dropping connection!
You will need to manually recover from this split brain. See:
http://www.drbd.org/users-guide/s-resolve-split-brain.html
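For reference, the procedure on that page boils down to roughly the following. This is only a sketch using the DRBD 8.3 command syntax (8.4 writes the discard option differently); the resource name r0 and the two function names are placeholders, and you have to decide for yourself which node's changes get thrown away.

```shell
#!/bin/sh
# Sketch of manual split-brain recovery, DRBD 8.3 syntax.
# RES is a placeholder resource name. Set DRBDADM="echo drbdadm" to preview
# the commands without running them.
RES="${RES:-r0}"
DRBDADM="${DRBDADM:-drbdadm}"

# Run on the node whose changes will be discarded (the split-brain victim).
discard_local_changes() {
    $DRBDADM secondary "$RES"
    $DRBDADM -- --discard-my-data connect "$RES"
}

# Run on the surviving node, only if it is also in the StandAlone state.
reconnect_survivor() {
    $DRBDADM connect "$RES"
}
```

Once the victim reconnects it resynchronizes from the survivor, so anything written only on the victim during the split brain is lost.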
> 3) And here is my /etc/cluster/cluster.conf file
>
> <fencedevice agent="fence_null" name="nullfence"/>
> <fencedevice agent="fence_manual" name="manfence"/>
These are not effective or supported. You need to use real fence
devices. This is especially true when using shared storage in a cluster.
What caused your split-brain in this case is largely irrelevant without
proper fencing.
Once you have this set up, tested and working, the next time DRBD
would have split-brained, it will fence instead. At that point, you
need to sort out what is breaking your cluster. That is another thread,
though.
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"omg my singularity battery is dead again.
stupid hawking radiation." - epitron