[DRBD-user] network startup failure

Scot Kreienkamp SKreien at la-z-boy.com
Thu Mar 8 15:30:02 CET 2012

Hey everyone,

I set up my first DRBD installation in a lab yesterday. I was able to configure it, start it, and perform the initial sync. Everything seemed to be working correctly until I restarted the two DRBD nodes. When they came back up, they dropped into recovery mode because they couldn't mount the disk from fstab. I took the entry out of fstab and restarted, and it's been all downhill from there. The official user guide leaves out a lot of practical information, so I would appreciate some pointers.

Question 1: I configured DRBD on each node to start automatically and to mount /dev/drbd1 from fstab. Is that the correct way to do things? This is a lab instance just for learning DRBD; I don't want to mess with Pacemaker, Heartbeat, or anything like that right now. Manual failover is OK for now.

Question 2: Since the failure, I have been unable to start DRBD despite everything I've tried. When I try to start it, this is what I get:

Starting DRBD resources: [
     create res: drbd0
   prepare disk: drbd0
    adjust disk: drbd0
     adjust net: drbd0:failed(connect:20)

That makes it look like a network problem. But this same config was working right up until the reboot, and the config files are identical on both nodes. Here is /proc/drbd from node 1:

version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@retv3130.na.lzb.hq, 2012-03-07 10:30:49

1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r----s
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:157277468

And node 2:

version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@retv3131.na.lzb.hq, 2012-03-07 10:32:07

1: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:157277468
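
Each device line in /proc/drbd packs three fields together: the connection state (cs:), the local/peer roles (ro:), and the local/peer disk states (ds:). The peer half reads Unknown/DUnknown when there is no peer connection. As a minimal sketch, assuming the DRBD 8.4 line layout shown above, the fields can be pulled out like this; parse_proc_drbd is a hypothetical helper, not part of DRBD's own tooling:

```python
import re

# Matches a per-device status line from /proc/drbd (DRBD 8.4 layout), e.g.
#  1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r----s
# Other lines (version, GIT-hash, ns:/nr: counters) do not match and are skipped.
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_proc_drbd(text):
    """Map each DRBD minor number to its connection state, roles and disk states."""
    devices = {}
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match is None:
            continue
        # ro: and ds: are "local/peer" pairs.
        local_role, peer_role = match.group("ro").split("/")
        local_disk, peer_disk = match.group("ds").split("/")
        devices[int(match.group("minor"))] = {
            "cs": match.group("cs"),
            "role": (local_role, peer_role),
            "disk": (local_disk, peer_disk),
        }
    return devices


# Copied from node 2's /proc/drbd output above.
NODE2 = """version: 8.4.1 (api:1/proto:86-100)
 1: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:157277468
"""

for minor, state in parse_proc_drbd(NODE2).items():
    print(minor, state["cs"], state["role"], state["disk"])
```

On a live node the same function could be fed open("/proc/drbd").read().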

From /proc/drbd it looks like both nodes are in StandAlone mode. I tried several things, all the way up to reinitialization, and it still will not connect. I tried connecting manually, I tried multiple restarts, and I tried outdating the peer from the primary node, which only returned the response that it had to be connected first. Nothing works. Both nodes are on the same network, they can ping each other, and nothing else is listening on the port that DRBD wants. Why will they not start or connect?
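
One way to confirm that nothing else holds a TCP port on a node is to try binding to it. A minimal Python sketch follows. The post does not say which port DRBD was configured with, so 7789 is only an assumed value (it appears in many DRBD configuration examples), and port_is_free is a hypothetical helper. A successful bind only shows that no local socket holds the port. It says nothing about firewalls between the nodes, and a successful ping (ICMP) does not prove TCP reachability either.

```python
import errno
import socket

# Assumed port: not stated in the post; 7789 is merely a common example value.
DRBD_PORT = 7789


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind to (host, port), False if it is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False  # some local socket already holds this port
            raise  # other errors (e.g. EACCES) are not a "port in use" answer
    return True  # the probe socket is closed again on leaving the with-block
```

Run port_is_free(DRBD_PORT) on each node while DRBD is stopped. A False result means another local process already holds the port.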

Here are my configs:

global {
        usage-count no;
        # minor-count dialog-refresh disable-ip-verification
}

common {
        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }

        startup {
                become-primary-on retv3130.na.lzb.hq;
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }

        options {
                # cpu-mask on-no-data-accessible
        }

        disk {
                # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
                # disk-drain md-flushes resync-rate resync-after al-extents
                # c-plan-ahead c-delay-target c-fill-target c-max-rate
                # c-min-rate disk-timeout
        }

        net {
                protocol B;
                after-sb-0pri discard-younger-primary;
                after-sb-1pri discard-secondary;
                after-sb-2pri discard-younger-primary;

                # protocol timeout max-epoch-size max-buffers unplug-watermark
                # connect-int ping-int sndbuf-size rcvbuf-size ko-count
                # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
                # after-sb-1pri after-sb-2pri always-asbp rr-conflict
                # ping-timeout data-integrity-alg tcp-cork on-congestion
                # congestion-fill congestion-extents csums-alg verify-alg
                # use-rle
        }
}

resource drbd0 {
  on retv3130.na.lzb.hq {
    device    /dev/drbd1;
    disk      /dev/mapper/vg_linuxtemplate-NFS;
    meta-disk internal;
  }
  on retv3131.na.lzb.hq {
    device    /dev/drbd1;
    disk      /dev/mapper/vg_linuxtemplate-NFS;
    meta-disk internal;
  }
}

Thanks for the help.

Scot Kreienkamp
skreien at la-z-boy.com
