[DRBD-user] Avoiding semi-fullsync's?

Wed Apr 27 14:27:46 CEST 2005

Hi,

Based on the story below, why are so many bits set
while it was perfectly in sync a few moments before
the Local IO failure? I would really like to avoid these
semi-full syncs. Please help. The story goes like this:

The left-side server is Secondary and has Heartbeat running,
ready to take over:

13:14:45 drbd0: Secondary/Secondary --> Secondary/Primary
13:28:34 drbd0: PARTNER DISKLESS
13:28:50 drbd0: PingAck did not arrive in time.
13:28:50 drbd0: drbd0_asender [24044]: cstate Connected --> NetworkFailure
13:28:50 drbd0: asender terminated
13:28:50 drbd0: drbd0_receiver [18435]: cstate NetworkFailure --> BrokenPipe
13:28:50 drbd0: short read expecting header on sock: r=-512
13:28:50 drbd0: worker terminated
13:28:50 drbd0: drbd0_receiver [18435]: cstate BrokenPipe --> Unconnected
13:28:50 drbd0: Connection lost.
13:28:50 drbd0: drbd0_receiver [18435]: cstate Unconnected --> WFConnection
13:29:00 drbd0: Secondary/Unknown --> Primary/Unknown
13:29:01 EXT3 FS on drbd0, internal journal
13:56:55 drbd0: drbd0_receiver [18435]: cstate WFConnection --> WFReportParams
13:56:55 drbd0: Handshake successful: DRBD Network Protocol version 74
13:56:55 drbd0: Connection established.
13:56:55 drbd0: I am(P): 1:00000003:00000001:00000040:0000000b:10
13:56:55 drbd0: Peer(S): 1:00000003:00000001:0000003f:0000000a:11
13:56:55 drbd0: drbd0_receiver [18435]: cstate WFReportParams --> WFBitMapS
13:57:36 drbd0: 1453882468 KB now marked out-of-sync by on disk bit-map.
13:57:37 drbd0: Primary/Unknown --> Primary/Secondary
13:57:37 drbd0: drbd0_receiver [18435]: cstate WFBitMapS --> SyncSource
13:57:37 drbd0: Resync started as SyncSource (need to sync 1453882468 KB [363470617 bits set]).
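
A quick check of the figures in these logs, as a sketch only: the number of bits set equals the total bitmap size the other node reports (bits=363470617), and the KB figure equals the whole device size, so this looks like a complete resync rather than a partial one. The 4 KB-per-bit granularity is inferred from these numbers and not taken from the DRBD documentation; the function names are made up for illustration.

```python
# Figures copied from the log lines in this post.
BITS_SET = 363_470_617          # "[363470617 bits set]"
BITMAP_BITS = 363_470_617       # "resync bitmap: bits=363470617"
OUT_OF_SYNC_KB = 1_453_882_468  # "need to sync 1453882468 KB"
DEVICE_SIZE_KB = 1_453_882_468  # "size = 1386 GB (1453882468 KB)"


def kb_per_bit(out_of_sync_kb: int, bits_set: int) -> int:
    """Return the KB covered by one bit; fail if the division is not exact."""
    quotient, remainder = divmod(out_of_sync_kb, bits_set)
    if remainder:
        raise ValueError("KB figure is not a whole multiple of the bit count")
    return quotient


def is_full_sync(bits_set: int, bitmap_bits: int) -> bool:
    """True when every bit in the bitmap is marked out-of-sync."""
    return bits_set == bitmap_bits


if __name__ == "__main__":
    print("KB per bit:", kb_per_bit(OUT_OF_SYNC_KB, BITS_SET))
    print("every bit set:", is_full_sync(BITS_SET, BITMAP_BITS))
    print("covers whole device:", OUT_OF_SYNC_KB == DEVICE_SIZE_KB)
```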

The right-side server is Primary and has Heartbeat running,
plus the nfsd and smbd services. To see what would happen,
I disconnected the SCSI subsystem; a mon monitor then kills
Heartbeat and tries to reboot:

13:14:45  drbd0: Secondary/Secondary --> Primary/Secondary
13:14:46  EXT3 FS on drbd0, internal journal
13:28:34  drbd0: Local IO failed. Detaching...
13:28:35  drbd0: Notified peer that my disk is broken.
13:55:50  drbd0: resync bitmap: bits=363470617 words=11358458
13:55:50  drbd0: size = 1386 GB (1453882468 KB)
13:56:09  drbd0: 0 KB marked out-of-sync by on disk bit-map.
13:56:09  drbd0: Found 10 transactions (592 active extents) in activity log.
13:56:09  drbd0: Marked additional 260 MB as out-of-sync based on AL.
13:56:09  drbd0: drbdsetup [2957]: cstate Unconfigured --> StandAlone
13:56:55  drbd0: drbdsetup [3010]: cstate StandAlone --> Unconnected
13:56:55  drbd0: drbd0_receiver [3011]: cstate Unconnected --> WFConnection
13:56:55  drbd0: drbd0_receiver [3011]: cstate WFConnection --> WFReportParams
13:56:55  drbd0: Handshake successful: DRBD Network Protocol version 74
13:56:55  drbd0: Connection established.
13:56:55  drbd0: I am(S): 1:00000003:00000001:0000003f:0000000a:11
13:56:55  drbd0: Peer(P): 1:00000003:00000001:00000040:0000000b:10
13:56:55  drbd0: drbd0_receiver [3011]: cstate WFReportParams --> WFBitMapT
13:56:55  drbd0: Secondary/Unknown --> Secondary/Primary
13:57:37  drbd0: drbd0_receiver [3011]: cstate WFBitMapT --> SyncTarget
13:57:37  drbd0: Resync started as SyncTarget (need to sync 1453882468 KB [363470617 bits set]).
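
To make the "I am" / "Peer" lines easier to compare, here is a small sketch that splits the two colon-separated strings from the handshake and lists the positions where they differ. It deliberately does not name the fields, since their meaning is not stated anywhere in this post; treating every field as hexadecimal is an assumption, and the helper names are hypothetical.

```python
from typing import List, Tuple

# Strings copied from the "I am(P)" and "Peer(S)" log lines.
PRIMARY = "1:00000003:00000001:00000040:0000000b:10"
SECONDARY = "1:00000003:00000001:0000003f:0000000a:11"


def parse_counters(text: str) -> List[int]:
    """Split on ':' and read each field as a hex number (an assumption)."""
    return [int(field, 16) for field in text.strip().split(":")]


def differing_fields(a: List[int], b: List[int]) -> List[Tuple[int, int, int]]:
    """Return (position, value_a, value_b) for every position that differs."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    for pos, mine, peer in differing_fields(parse_counters(PRIMARY),
                                            parse_counters(SECONDARY)):
        print(f"field {pos}: primary={mine:#x} secondary={peer:#x}")
```

Only the last three positions differ between the two nodes; the first three are identical.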

DRBD configuration (/etc/drbd.conf):
resource drbd0 {
    protocol               C;
    incon-degr-cmd       "logger '!DRBD! pri on incon-degr'";
    on left {
        device           /dev/drbd0;
        disk             /dev/sdc1;
        address          192.168.0.3:7788;
        meta-disk        /dev/sdc2[0];
    }
    on right {
        device           /dev/drbd0;
        disk             /dev/sdc1;
        address          192.168.0.4:7788;
        meta-disk        /dev/sdc2[0];
    }
    disk {
        on-io-error      detach;
    }
    syncer {
        rate            99M;
        al-extents      521;
    }
    startup {
        degr-wfc-timeout 300;
    }
}

resource drbd1 {
    protocol               C;
    incon-degr-cmd       "logger '!DRBD! pri on incon-degr'";
    on left {
        device           /dev/drbd1;
        disk             /dev/sdd1;
        address          192.168.0.3:7888;
        meta-disk        /dev/sdd2[0];
    }
    on right {
        device           /dev/drbd1;
        disk             /dev/sdd1;
        address          192.168.0.4:7888;
        meta-disk        /dev/sdd2[0];
    }
    disk {
        on-io-error      detach;
    }
    syncer {
        rate            99M;
        al-extents      521;
    }
    startup {
        degr-wfc-timeout 300;
    }
}

resource drbd2 {
    protocol               C;
    incon-degr-cmd       "logger '!DRBD! pri on incon-degr'";
    on left {
        device           /dev/drbd2;
        disk             /dev/sde1;
        address          192.168.0.3:7988;
        meta-disk        /dev/sde2[0];
    }
    on right {
        device           /dev/drbd2;
        disk             /dev/sde1;
        address          192.168.0.4:7988;
        meta-disk        /dev/sde2[0];
    }
    disk {
        on-io-error      detach;
    }
    syncer {
        rate            99M;
        al-extents      521;
    }
    startup {
        degr-wfc-timeout 300;
    }
}
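
For a sense of why a full resync hurts here, a rough estimate under two loudly stated assumptions: that "rate 99M" means 99 MiB per second, and that the resync actually sustains that rate from start to finish (in practice it may well not). The device size is the one reported in the log; the function name is invented for this sketch.

```python
DEVICE_SIZE_KB = 1_453_882_468  # "size = 1386 GB (1453882468 KB)"
RATE_MIB_PER_S = 99             # from "rate 99M"; the unit is an assumption


def resync_seconds(size_kb: int, rate_mib_per_s: int) -> float:
    """Best-case resync time in seconds at a constant rate."""
    rate_kb_per_s = rate_mib_per_s * 1024
    return size_kb / rate_kb_per_s


if __name__ == "__main__":
    seconds = resync_seconds(DEVICE_SIZE_KB, RATE_MIB_PER_S)
    print(f"best case: {seconds:.0f} s, or {seconds / 3600:.2f} hours")
```

Under these assumptions the best case works out to roughly four hours for drbd0 alone.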

Many thanks,
Leroy

PS: This isn't the full /var/log/messages; it doesn't
include the two other DRBD devices [1,2], which synced quickly.
PS2: I'm asking the list because this has happened twice now.