[DRBD-user] Antwort: Re: Drbd split brain after network failure

Robert.Koeppl at knapp.com Robert.Koeppl at knapp.com
Thu Apr 9 16:53:01 CEST 2015



If you have the option of using a dedicated, ideally direct, network
connection for DRBD rather than the interface you use for all other
traffic, that will increase the stability of your solution. Configure two
rings in corosync and make the primary ring use the direct link. There is
a fair chance that you will be unable to reach the stonith device during
network issues on the normal network, so relying on stonith alone is a bad
idea. Stonith can also have unwanted side effects if the fenced node is
powered back on while the former master is down as well (a cascading power
failure or the like). You want to make sure the resource is fenced on both
nodes before the node itself is fenced; otherwise the fenced node has no
way of knowing that its data is outdated. I have not checked lately how
this is implemented when the policy is set as suggested, so be sure to
test that the former slave cannot become master if the former master has
failed in the meantime and no sync has taken place.
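The two-ring setup mentioned above could look roughly like this in corosync.conf; this is only a sketch, and the subnets are assumptions (ring 0 on the dedicated back-to-back link, ring 1 on the regular LAN). With the udpu transport you would instead list both ring addresses per node in a nodelist.

```
totem {
    version: 2
    rrp_mode: passive

    interface {
        # ring 0: dedicated direct link, also carrying DRBD (assumed subnet)
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastport: 5405
    }
    interface {
        # ring 1: the regular LAN, used only if ring 0 fails
        ringnumber: 1
        bindnetaddr: 192.168.100.0
        mcastport: 5407
    }
}
```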

Mit freundlichen Grüßen / Best Regards

Robert Köppl

Customer Support & Projects
Teamleader IT Support

KNAPP Systemintegration GmbH
Waltenbachstraße 9
8700 Leoben, Austria
Phone: +43 3842 805-322
Fax: +43 3842 82930-500
robert.koeppl at knapp.com
www.KNAPP.com

Commercial register number: FN 138870x
Commercial register court: Leoben




From:	Digimer <lists at alteeve.ca>
To:	violeta mateiu <mateiu.violeta at gmail.com>,
            drbd-user at lists.linbit.com
Date:	09.04.2015 14:32
Subject:	Re: [DRBD-user] Drbd split brain after network failure
Sent by:	drbd-user-bounces at lists.linbit.com



Fencing would prevent this. Configure (and test!) stonith in pacemaker,
then hook DRBD into it using the crm-fence-peer.sh and
crm-unfence-peer.sh handlers. You also need to set the fencing policy to
'resource-and-stonith;'. This way, instead of assuming the other peer is
dead, it will fence and be sure.
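On DRBD 8.4 (as shipped with Fedora 20), that combination would look something like the fragment below, added to each resource (or to the common section). The handler paths are the usual package locations, but verify them on your system:

```
resource www {
  disk {
    # freeze I/O and call the fence-peer handler; resume only
    # once fencing of the peer has been confirmed
    fencing resource-and-stonith;
  }
  handlers {
    # adds/removes a pacemaker location constraint that keeps
    # the outdated peer from being promoted to Master
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # ... existing net/on sections unchanged ...
}
```

Note that crm-fence-peer.sh works by manipulating the pacemaker CIB, so the DRBD resource must be under pacemaker control (as it is in your configuration) for the constraint to have any effect.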

On 09/04/15 06:04 AM, violeta mateiu wrote:
> Hello,
>
> I have configured a two node Active/Passive (host1, host2) cluster on
> Fedora 20 with pacemaker, corosync and drbd.
>
> Every time I unplug a network cable from the master host, drbd goes
> into a split-brain state. After I reconnect, drbd says StandAlone
> on one host and WFConnection on the other host.
>
> I can't fix this issue. Even though I declared automatic split-brain
> policies in the drbd configuration file, my DRBD never recovers after a
> network failure.
>
> My resource configuration is as follows:
>
> Resources:
>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=192.168.100.94 cidr_netmask=32
>   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
>  Resource: WebSite (class=ocf provider=heartbeat type=apache)
>   Attributes: configfile=/etc/httpd/conf/httpd.conf
> statusurl=http://localhost/server-status
>   Operations: monitor interval=1min (WebSite-monitor-interval-1min)
>  Master: WebDataClone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> clone-node-max=1 notify=true
>   Resource: WebData (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=www
>    Operations: monitor interval=60s (WebData-monitor-interval-60s)
>  Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd/by-res/www directory=/var/www/html
> fstype=ext4
>  Master: SQLDataClone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> clone-node-max=1 notify=true
>   Resource: SQLData (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=sql
>    Operations: monitor interval=60s (SQLData-monitor-interval-60s)
>  Resource: SQLFS (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd4 directory=/var/lib/mysql/data fstype=ext4
>  Resource: appServer (class=ocf provider=heartbeat type=anything)
>   Attributes: binfile=/home/myApp/myAppV2_17 workdir=/home/myApp/
> logfile=/home/myApp/logFile.log errlogfile=/home/myApp/errlogFile.log
> cmdline_options=/home/myApp/config.cfg
>   Operations: monitor interval=120s (appServer-monitor-interval-120s)
>  Resource: MySQL (class=ocf provider=heartbeat type=mysql)
>   Attributes: binary=/usr/sbin/mysqld config=/var/lib/mysql/my.cnf
> datadir=/var/lib/mysql/data pid=/var/run/mysqld/mysqld.pid
> socket=/var/lib/mysql/mysql.sock
>   Operations: monitor interval=60s (MySQL-monitor-interval-60s)
>
> I have two drbd partitions configured for mysql data files and one for
> apache files.
>
> Drbd configuration is as follows:
>
> global {
>  usage-count yes;
> }
> common {
>  protocol C;
> }
> resource sql {
>  meta-disk internal;
>  device /dev/drbd4;
>
>  syncer {
>   verify-alg sha1;
>  }
>
>  net {
>   allow-two-primaries;
>
>   after-sb-0pri discard-zero-changes;
>   after-sb-1pri discard-secondary;
>   after-sb-2pri disconnect;
>  }
>  on host1 {
>   disk /dev/SQL/TestSQL;
>   address 192.168.100.92:7789;
>  }
>  on host2 {
>   disk /dev/SQL/TestSQL;
>   address 192.168.100.93:7789;
>  }
> }
> resource www {
>  meta-disk internal;
>  device /dev/drbd3;
>  syncer {
>   verify-alg sha1;
>  }
>
>  net {
>   allow-two-primaries;
>
>   after-sb-0pri discard-zero-changes;
>   after-sb-1pri discard-secondary;
>   after-sb-2pri disconnect;
>  }
>  on host1 {
>   disk /dev/WEB/TestWEB;
>   address 192.168.100.92:7799;
>  }
>  on host2 {
>   disk /dev/WEB/TestWEB;
>   address 192.168.100.93:7799;
>  }
> }
>
> Thank you,
> Violeta
>
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>


--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?