[DRBD-user] Stuck in Standalone

Keith Ouellette KeithO at fibermountain.com
Thu Apr 16 02:49:12 CEST 2015


We have two nodes with two DRBD resources, one for each of two applications, on a pair of servers managed by Pacemaker. Everything works fine when the primary node is put into standby or power cycled: the DRBD Primary role moves to the new active node and the applications continue to run as expected. The problem appears when I pull the Ethernet cable out of the primary node and leave it unplugged for about half an hour. When I unplug it, the Primary role moves as expected and the applications continue to work. However, when I plug the Ethernet cable back in, both nodes go into the StandAlone state.

Node 1:

drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
srcversion: F97798065516C94BE0F27DC
m:res  cs          ro               ds                 p       mounted  fstype
0:r0   StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext4
1:r1   StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext4

Node 2:

drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
srcversion: F97798065516C94BE0F27DC
m:res  cs          ro                 ds                 p       mounted  fstype
0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r-----
1:r1   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r-----

As you can see, one node knows it is Primary, and that is where the applications continue to run. The second node knows it should be Secondary. All I do to resolve this is connect the resources on each node, using the --discard-my-data option on the Secondary.
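For reference, the manual recovery described above could be sketched roughly as follows. This is only an illustration, not a tested recovery tool: the function name plan_reconnect is made up, and it assumes the status listing shown above comes from the DRBD init script's status command. It reads that listing on stdin and prints, for each StandAlone resource, the drbdadm command that would be run on this node (DRBD 8.4 syntax). Nothing is executed.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the reconnect command for every
# StandAlone resource found in an init-script style status listing on stdin.
plan_reconnect() {
    awk '
        # Device lines look like: "0:r0  StandAlone  Secondary/Unknown  ..."
        $1 ~ /^[0-9]+:/ && $2 == "StandAlone" {
            split($1, id, ":")      # id[2] holds the resource name, e.g. r0
            split($3, role, "/")    # role[1] holds the local role
            if (role[1] == "Secondary")
                # The Secondary gives up its changes and resyncs from the peer.
                print "drbdadm connect --discard-my-data " id[2]
            else if (role[1] == "Primary")
                # The Primary keeps its data and simply reconnects.
                print "drbdadm connect " id[2]
        }
    '
}
```

On a node, this could be used as `/etc/init.d/drbd status | plan_reconnect`, and the printed commands reviewed before running any of them by hand.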

Is there a way to have the connections re-established automatically? This looks like a type of "split brain", and I do have split-brain handling configured in global_common.conf:

global {
        usage-count no;
        # minor-count dialog-refresh disable-ip-verification
}
common {
        handlers {
                # These are EXAMPLE handlers only.
                # They may have severe implications,
                # like hard resetting the node under certain circumstances.
                # Be careful when chosing your poison.
                # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }
        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }
        options {
                # cpu-mask on-no-data-accessible
        }
        disk {
                # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
                # disk-drain md-flushes resync-rate resync-after al-extents
                # c-plan-ahead c-delay-target c-fill-target c-max-rate
                # c-min-rate disk-timeout
        }
        net {
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                # after-sb-2pri consensus;
                after-sb-2pri disconnect;
                # protocol timeout max-epoch-size max-buffers unplug-watermark
                # connect-int ping-int sndbuf-size rcvbuf-size ko-count
                # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
                # after-sb-1pri after-sb-2pri always-asbp rr-conflict
                # ping-timeout data-integrity-alg tcp-cork on-congestion
                # congestion-fill congestion-extents csums-alg verify-alg
                # use-rle
        }
}
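Because the net section above mixes active settings with commented-out ones, it can help to list only the after-sb policies that are actually in force. The following is a minimal sketch: the function name active_after_sb is made up for illustration, and it only filters the text it is given; it does not ask DRBD for the effective configuration.

```shell
#!/bin/sh
# Hypothetical helper: print the uncommented after-sb-* policies found in
# DRBD configuration text supplied on stdin.
active_after_sb() {
    awk '
        { sub(/#.*/, "") }                 # drop comments; fields are re-split
        $1 ~ /^after-sb-[0-2]pri$/ {
            policy = $2
            sub(/;$/, "", policy)          # strip the trailing semicolon
            print $1, policy
        }
    '
}
```

It could be run as `active_after_sb < /etc/drbd.d/global_common.conf`, assuming the file lives at that usual location. For the configuration above it should list discard-zero-changes, discard-secondary, and disconnect for the 0pri, 1pri, and 2pri cases respectively.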

The resource files are as follows:

r0.res:

resource r0 {
        on Node1 {
                volume 0 {
                        device          /dev/drbd0;
                        disk            /dev/Node1-vg/AOS;
                        flexible-meta-disk      internal;
                }
                address         10.0.6.221:7788;
        }
        on Node2 {
                volume 0 {
                        device          /dev/drbd0;
                        disk            /dev/Node2-vg/AOS;
                        flexible-meta-disk      internal;
                }
                address         10.0.6.222:7788;
        }
}

r1.res:

resource r1 {
        on Node1 {
                volume 0 {
                        device          /dev/drbd1;
                        disk            /dev/Node1-vg/Controller;
                        flexible-meta-disk      internal;
                }
                address         10.0.6.221:7789;
        }
        on Node2 {
                volume 0 {
                        device          /dev/drbd1;
                        disk            /dev/Node2-vg/Controller;
                        flexible-meta-disk      internal;
                }
                address         10.0.6.222:7789;
        }
}

I am not sure if this is possible, but I figured I would ask.

Thanks,
Keith


Keith Ouellette


KeithO at fibermountain.com


700 West Johnson Avenue
Cheshire, CT06410
www.fibermountain.com


P. (203) 806-4046
C. (860) 810-4877
F. (845) 358-7882




