[DRBD-user] DRBD Data Replication Problem in network failure

balajisundar at midascomm.com
Wed May 16 08:40:12 CEST 2007



Hi,
        I'm a newbie flirting with DRBD.
If I unplug the Ethernet cable and plug it in again (more than 10 seconds
apart), the DRBD connection springs back to life within a few seconds.

But if I do a "service network stop" and a "service network start" more
than 10 seconds apart, it doesn't re-establish the connection. (The same
happens if I do an "ifdown eth0" and "ifup eth0".)

The connection parameters that I think control this behaviour are
ping-int and ping-timeout, which I've set to 10 and 5 respectively.
(The output of drbdadm dump from PC1 is given below. It is the same on PC2.)
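
For reference, here is a small sketch of what these net settings work out to
in seconds. It assumes the units documented in the drbd.conf man page for
DRBD 8.x (timeout and ping-timeout in tenths of a second, connect-int and
ping-int in seconds); the helper name to_seconds is made up for illustration
and is not part of DRBD.

```python
# Divisors that turn raw drbd.conf "net" values into seconds.
# Assumption: units as documented in the DRBD 8.x drbd.conf man page.
UNITS_PER_SECOND = {
    "timeout": 10,       # given in tenths of a second
    "connect-int": 1,    # given in seconds
    "ping-int": 1,       # given in seconds
    "ping-timeout": 10,  # given in tenths of a second
}


def to_seconds(net_options):
    """Return the timing options from a net section, converted to seconds."""
    return {
        name: value / UNITS_PER_SECOND[name]
        for name, value in net_options.items()
        if name in UNITS_PER_SECOND
    }


# Values taken from the net sections in the drbdadm dump in this post.
net = {"timeout": 60, "connect-int": 10, "ping-int": 10, "ping-timeout": 5}
effective = to_seconds(net)

for name, seconds in effective.items():
    print(f"{name}: {seconds:g} s")
```

If those units apply here, a ping-timeout of 5 would mean half a second
rather than 5 seconds, and a timeout of 60 would mean 6 seconds.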

The logs in /var/log/messages when I remove and connect ethernet:

Ethernet unplugged
==================
May 15 17:44:12 PC1 kernel: drbd0: PingAck did not arrive in time.
May 15 17:44:12 PC1 kernel: drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
May 15 17:44:12 PC1 kernel: drbd0: Creating new current UUID
May 15 17:44:12 PC1 kernel: drbd0: asender terminated
May 15 17:44:12 PC1 kernel: drbd0: short read expecting header on sock: r=-512
May 15 17:44:12 PC1 kernel: drbd0: tl_clear()
May 15 17:44:12 PC1 kernel: drbd0: Connection closed
May 15 17:44:12 PC1 kernel: drbd0: Writing meta data super block now.
May 15 17:44:13 PC1 kernel: drbd0: conn( NetworkFailure -> Unconnected )
May 15 17:44:13 PC1 kernel: drbd0: receiver terminated
May 15 17:44:13 PC1 kernel: drbd0: receiver (re)started

Ethernet plugged
================
May 15 17:45:21 PC1 kernel: drbd0: conn( WFConnection -> WFReportParams )
May 15 17:45:21 PC1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
May 15 17:45:21 PC1 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
May 15 17:45:21 PC1 kernel: drbd0: Writing meta data super block now.
May 15 17:45:21 PC1 kernel: drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
May 15 17:45:21 PC1 kernel: drbd0: Began resync as SyncSource (will sync 0 KB [0 bits set]).
May 15 17:45:21 PC1 kernel: drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
May 15 17:45:21 PC1 kernel: drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
May 15 17:45:21 PC1 kernel: drbd0: Writing meta data super block now.
May 15 17:45:21 PC1 kernel: drbd0: aftr_isp( 0 -> 1 )
May 15 17:45:21 PC1 kernel: drbd0: aftr_isp( 1 -> 0 )

service network stop
====================
May 15 17:47:01 PC1 kernel: drbd0: PingAck did not arrive in time.
May 15 17:47:01 PC1 kernel: drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
May 15 17:47:01 PC1 kernel: drbd0: Creating new current UUID
May 15 17:47:01 PC1 kernel: drbd0: asender terminated
May 15 17:47:01 PC1 kernel: drbd0: short read expecting header on sock: r=-512
May 15 17:47:01 PC1 kernel: drbd0: tl_clear()
May 15 17:47:01 PC1 kernel: drbd0: Connection closed
May 15 17:47:01 PC1 kernel: drbd0: Writing meta data super block now.
May 15 17:47:01 PC1 kernel: drbd0: conn( NetworkFailure -> Unconnected )
May 15 17:47:01 PC1 kernel: drbd0: receiver terminated
May 15 17:47:01 PC1 kernel: drbd0: receiver (re)started
May 15 17:47:01 PC1 kernel: drbd0: conn( Unconnected -> WFConnection )

service network start
=====================
        DRBD logs nothing further in /var/log/messages.

Output of drbdadm dump
======================
# /etc/drbd.conf
common {
    syncer {
        rate             10M;
    }
}

resource drbd0 {
    protocol               C;
    on PC1 {
        device           /dev/drbd0;
        disk             /dev/mapper/vgroot-LogVol01;
        address          192.168.13.110:7788;
        meta-disk        internal;
    }
    on PC2 {
        device           /dev/drbd0;
        disk             /dev/mapper/vgroot-LogVol01;
        address          192.168.13.222:7788;
        meta-disk        internal;
    }
    net {
        timeout           60;
        connect-int       10;
        ping-int          10;
        ping-timeout       5;
        max-buffers      2048;
        max-epoch-size   2048;
        ko-count           2;
        after-sb-0pri    discard-least-changes;
        after-sb-1pri    discard-secondary;
        after-sb-2pri    disconnect;
        rr-conflict      disconnect;
    }
    disk {
        on-io-error      detach;
    }
    syncer {
        rate             10M;
        after            drbd1;
        al-extents       257;
    }
    startup {
        wfc-timeout       15;
        degr-wfc-timeout  15;
    }
    handlers {
        pri-on-incon-degr "echo O > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo O > /proc/sysrq-trigger ; halt -f";
        local-io-error   "echo O > /proc/sysrq-trigger ; halt -f";
    }
}

resource drbd1 {
    protocol               C;
    on PC1 {
        device           /dev/drbd1;
        disk             /dev/mapper/vgroot-LogVol00;
        address          192.168.13.110:7789;
        meta-disk        internal;
    }
    on PC2 {
        device           /dev/drbd1;
        disk             /dev/mapper/vgroot-LogVol00;
        address          192.168.13.222:7789;
        meta-disk        internal;
    }
    net {
        timeout           60;
        connect-int       10;
        ping-int          10;
        ping-timeout       5;
        max-buffers      2048;
        max-epoch-size   2048;
        after-sb-0pri    discard-least-changes;
        after-sb-1pri    discard-secondary;
        after-sb-2pri    disconnect;
        rr-conflict      disconnect;
        ko-count           2;
    }
    disk {
        on-io-error      detach;
    }
    syncer {
        rate             10M;
        al-extents       257;
    }
    startup {
        wfc-timeout       15;
        degr-wfc-timeout  15;
    }
}



May Day!

--Regards
S.Balaji






