[DRBD-user] Not able to test Automatic split brain recovery policies

Digimer lists at alteeve.ca
Wed Apr 10 16:41:48 CEST 2013


To your immediate problem:

If you had configured fencing, DRBD would not split-brain. Are you using 
Pacemaker or RHCS?

Secondly, 8.3.8 is very, very old. Upgrading to a newer 8.3.x version 
would be a good idea.

Back to split-brain: DRBD detects a split-brain when the nodes reconnect 
and find that both have been Primary while disconnected. To recover, you 
need to tell DRBD which node to consider "good", then discard the changes 
on the peer and let the good node sync to it.
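
As a rough sketch of that manual recovery on 8.3.x: the resource name r0 
is taken from your config, while the run wrapper, the DRY_RUN variable and 
the two function names are made up for illustration and are not part of 
DRBD. With DRY_RUN=1 (the default here) it only prints the commands.

```shell
#!/bin/sh
# Manual split-brain recovery sketch for DRBD 8.3.x.
# RES is the resource name (r0 in the quoted config).
# DRY_RUN=1 only prints the commands; set DRY_RUN=0 to execute them.
RES="${RES:-r0}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on the node whose changes will be thrown away (the "victim").
recover_victim() {
    run drbdadm secondary "$RES"
    run drbdadm -- --discard-my-data connect "$RES"
}

# Run on the node whose data is kept (the "good" node),
# only needed if that node is also StandAlone.
recover_survivor() {
    run drbdadm connect "$RES"
}
```

On the node to be discarded, call recover_victim with DRY_RUN=0; on the 
good node, call recover_survivor only if it is StandAlone as well. On 8.4 
the option moves after the subcommand: drbdadm connect --discard-my-data r0.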

On 04/10/2013 08:08 AM, Shailesh Vaidya wrote:
> I followed the same procedure (disabling the Ethernet card, etc.).
> Afterwards, the DRBD status on the two nodes is:
>
> [root@drbd1 ~]# cat /proc/drbd
> version: 8.3.8 (api:88/proto:86-94)
> GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
>  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r----
>     ns:4 nr:0 dw:12 dr:82 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:4
> [root@drbd1 ~]#
>
> [root@drbd2 ~]# cat /proc/drbd
> version: 8.3.8 (api:88/proto:86-94)
> GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
>  0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r----
>     ns:0 nr:4 dw:56 dr:42 al:1 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:4
> [root@drbd2 ~]#
>
> /var/log/messages shows:
>
> Apr 10 07:51:35 localhost kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> Apr 10 07:51:35 localhost kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Apr 10 07:51:35 localhost kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
> Apr 10 07:51:35 localhost kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
> Apr 10 07:51:35 localhost kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> Apr 10 07:51:35 localhost kernel: block drbd0: conn( WFReportParams -> Disconnecting )
> Apr 10 07:51:35 localhost kernel: block drbd0: error receiving ReportState, l: 4!
> Apr 10 07:51:35 localhost kernel: block drbd0: asender terminated
> Apr 10 07:51:35 localhost kernel: block drbd0: Terminating asender thread
> Apr 10 07:51:35 localhost kernel: block drbd0: Connection closed
> Apr 10 07:51:35 localhost kernel: block drbd0: conn( Disconnecting -> StandAlone )
> Apr 10 07:51:35 localhost kernel: block drbd0: receiver terminated
> Apr 10 07:51:35 localhost kernel: block drbd0: Terminating receiver thread
>
> Now if I run ‘drbdadm connect r0’ on both machines, I get:
>
> Apr 10 07:56:37 localhost kernel: block drbd0: uuid_compare()=100 by rule 90
> Apr 10 07:56:37 localhost kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> Apr 10 07:56:37 localhost kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Apr 10 07:56:37 localhost kernel: block drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from peer node
> Apr 10 07:56:37 localhost kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> Apr 10 07:56:37 localhost kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
> Apr 10 07:56:37 localhost kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
> Apr 10 07:56:37 localhost kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
> Apr 10 07:56:37 localhost kernel: block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
> Apr 10 07:56:37 localhost kernel: block drbd0: Began resync as SyncTarget (will sync 4 KB [1 bits set]).
> Apr 10 07:56:37 localhost kernel: block drbd0: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
>
> Regards,
>
> Shailesh Vaidya
>
> *From:*drbd-user-bounces at lists.linbit.com
> [mailto:drbd-user-bounces at lists.linbit.com] *On Behalf Of *Dan Barker
> *Sent:* Wednesday, April 10, 2013 5:16 PM
> *To:* drbd-user at lists.linbit.com
> *Subject:* Re: [DRBD-user] Not able to test Automatic split brain
> recovery policies
>
> You don’t show the status of the nodes, but I imagine you have two
> primary nodes. There is no handler specified for two primary nodes. Did
> you have two primary, disconnected nodes?
>
> It shouldn’t be possible to create split brain without writing on both
> nodes.
>
> Dan
>
> *From:* drbd-user-bounces at lists.linbit.com
> [mailto:drbd-user-bounces at lists.linbit.com] *On Behalf Of* Shailesh Vaidya
> *Sent:* Wednesday, April 10, 2013 1:58 AM
> *To:* drbd-user at lists.linbit.com
> *Subject:* [DRBD-user] Not able to test Automatic split brain recovery
> policies
>
> Hello,
>
> I am using DRBD 8.3.8.
>
> I have configured automatic split-brain recovery policies in
> /etc/drbd.conf as follows:
>
> net {
>     max-buffers     2048;
>     ko-count 4;
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
> }
>
> Both of my machines are virtual machines, so they are not connected by
> an actual back-to-back link. To reproduce split-brain, I use the
> procedure below:
>
> 1. On the Primary, disable the Ethernet card from ‘Virtual Machine
> properties’.
>
> 2. Wait for the Secondary to start the switchover, then re-enable the
> Ethernet card on the Primary.
>
> The log shows me that a split-brain occurred; however, it also shows
> that the connection was dropped.
>
> Apr  9 10:30:15 drbd1 kernel: block drbd0: uuid_compare()=100 by rule 90
> Apr  9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> Apr  9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Apr  9 10:30:15 drbd1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
> Apr  9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
> Apr  9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> Apr  9 10:30:15 drbd1 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
>
> Full DRBD conf file:
>
> [root@drbd1 ~]# cat /etc/drbd.conf
> global {
>     usage-count no;
> }
>
> resource r0 {
>     protocol C;
>     #incon-degr-cmd "echo !DRBD! pri on incon-degr | wall ; sleep 60 ; halt -f";
>
>     on drbd1 {
>         device     /dev/drbd0;
>         disk       /dev/sda3;
>         address    10.55.199.51:7789;
>         meta-disk  internal;
>     }
>
>     on drbd2 {
>         device    /dev/drbd0;
>         disk      /dev/sda3;
>         address   10.55.199.52:7789;
>         meta-disk internal;
>     }
>
>     disk {
>         on-io-error   detach;
>     }
>
>     net {
>         max-buffers     2048;
>         ko-count 4;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>     }
>
>     syncer {
>         rate 25M;
>         al-extents 257; # must be a prime number
>     }
>
>     startup {
>         wfc-timeout  20;
>         degr-wfc-timeout 120;    # 2 minutes.
>     }
> }
>
> [root at drbd1 ~]#
>
> Is this a configuration issue, or is my testing procedure not correct?
>
> Regards,
>
> Shailesh Vaidya
>
> DISCLAIMER ========== This e-mail may contain privileged and
> confidential information which is the property of Persistent Systems
> Ltd. It is intended only for the use of the individual or entity to
> which it is addressed. If you are not the intended recipient, you are
> not authorized to read, retain, copy, print, distribute or use this
> message. If you have received this communication in error, please notify
> the sender and delete all copies of this message. Persistent Systems
> Ltd. does not accept any liability for virus infected mails.
>


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


