Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,

I have two storage servers (modified Debian etch, kernel 2.6.24.7, DRBD 8.0.12) which export block devices to Xen hosts. On the Xen dom0s I mount the iSCSI devices for the domUs. On the first server, drbd0 is primary and drbd1 secondary; on the second, drbd0 is secondary and drbd1 primary.

There was a network failure one month ago, and I only noticed today that the DRBD devices have been unclean since then. On the first server both devices are visible: the first is primary and the second is out of date, and both are waiting for a connection (the network is up again). On the second server only one device is visible; it is primary and the network is up.

I would like to know how I can resolve this situation without downtime (if possible) and without breaking anything. If I run /etc/init.d/drbd reload on the second server, is there a risk of breaking something? I have also put what I think the manual recovery would look like after the config below; please correct me if it is wrong.

Thanks for any help.

srv-1 # cat /proc/drbd
version: 8.0.12 (api:86/proto:86)
GIT-hash: 5c9f89594553e32adb87d9638dce591782f947e3 build by root@contrebasse, 2008-06-06 16:22:21
 0: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:64001036 dr:34592790 al:746290 bm:746260 lo:0 pe:0 ua:0 ap:0
        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/127 hits:15253969 misses:24137474 starving:0 dirty:23391184 changed:746290
 1: cs:WFConnection st:Secondary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:0 dr:5832 al:0 bm:102 lo:0 pe:0 ua:0 ap:0
        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0

srv-2 # cat /proc/drbd
version: 8.0.12 (api:86/proto:86)
GIT-hash: 5c9f89594553e32adb87d9638dce591782f947e3 build by root@contrebasse, 2008-06-06 16:22:21
 1: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown r---
    ns:0 nr:0 dw:46114764 dr:11404539 al:578102 bm:578073 lo:0 pe:0 ua:0 ap:0
        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/127 hits:10950578 misses:16609945 starving:0 dirty:16031843 changed:578102

# drbd.conf
global { usage-count no; }
common { protocol C; }

resource drbd0 {
    syncer { rate 100M; }
    disk   { on-io-error detach; }
    net {
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict   disconnect;
    }
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error    "echo o > /proc/sysrq-trigger ; halt -f";
        split-brain       "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
    }
    startup { wfc-timeout 15; }
    on srv-1 {
        device    /dev/drbd0;
        disk      /dev/sda4;
        address   192.168.60.1:7788;
        meta-disk internal;
    }
    on srv-2 {
        device    /dev/drbd0;
        disk      /dev/sda4;
        address   192.168.60.2:7788;
        meta-disk internal;
    }
}

resource drbd1 {
    syncer { rate 100M; }
    disk   { on-io-error detach; }
    net {
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict   disconnect;
    }
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error    "echo o > /proc/sysrq-trigger ; halt -f";
        split-brain       "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
    }
    startup { wfc-timeout 15; }
    on srv-1 {
        device    /dev/drbd1;
        disk      /dev/sda5;
        address   192.168.60.1:7789;
        meta-disk internal;
    }
    on srv-2 {
        device    /dev/drbd1;
        disk      /dev/sda5;
        address   192.168.60.2:7789;
        meta-disk internal;
    }
}
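PS: from the split-brain recovery section of the DRBD user's guide, I think the manual recovery for drbd1 would look like the commands below. This is only my understanding of the documentation, and it assumes that srv-2 (where drbd1 is still Primary) holds the data to keep, so the changes made on srv-1's copy of drbd1 since the network failure would be discarded and resynced from srv-2:

    # on srv-1, whose copy of drbd1 would be discarded; it may need a
    # "drbdadm disconnect drbd1" first, since it is sitting in WFConnection
    srv-1 # drbdadm -- --discard-my-data connect drbd1

    # on srv-2, which is StandAlone for drbd1 and has to be told to reconnect
    srv-2 # drbdadm connect drbd1

I suppose the same idea would apply to drbd0 with the roles reversed (it is Primary on srv-1), except that srv-2 would first need a "drbdadm up drbd0", since drbd0 does not even show up in its /proc/drbd. Is that the right direction, or is /etc/init.d/drbd reload on srv-2 enough?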