[DRBD-user] drbd-8.2.7 Local READ failed... ds: Diskless/UpToDate

Tue Dec 23 15:31:31 CET 2008

Hi,

I'm using drbd-8.2.7 with "common { disk { on-io-error detach; } }" and still see "drbd3: Local READ failed..." messages even after the logs show that the drbd3 disk state changed to Diskless. It seems drbd did not actually detach the local drbd3 disk.

This drives the load average beyond 40 and stalls the file system stacked on drbd3 while it waits for I/O, which is unacceptable.

If this is not a bug, could there be an option to emulate the drbd-0.7 behavior of detaching the local disk immediately on an I/O error?

Dec 22 10:25:38 node1 kernel: ata2: command 0x25 timeout, stat 0xd0 host_stat 0x1
Dec 22 10:25:38 node1 kernel: ata2: status=0xd0 { Busy }
Dec 22 10:25:38 node1 kernel: SCSI error : <1 0 0 0> return code = 0x8000002
Dec 22 10:25:38 node1 kernel: sdb: Current: sense key: Aborted Command
Dec 22 10:25:38 node1 kernel:     Additional sense: Scsi parity error
Dec 22 10:25:38 node1 kernel: end_request: I/O error, dev sdb, sector 4057363
Dec 22 10:25:38 node1 kernel: drbd3: got an _req_mod() errno of -5
Dec 22 10:25:38 node1 kernel: drbd3: Local READ failed sec=1952848s size=4096
Dec 22 10:25:38 node1 kernel: drbd3: disk( UpToDate -> Failed ) 
Dec 22 10:25:38 node1 kernel: drbd3: Local IO failed. Detaching...
Dec 22 10:25:38 node1 kernel: ATA: abnormal status 0xD0 on port 0xE007
Dec 22 10:25:38 node1 last message repeated 2 times
Dec 22 10:25:38 node1 kernel: drbd3: disk( Failed -> Diskless ) 
Dec 22 10:25:38 node1 kernel: drbd3: Notified peer that my disk is broken.
...
Dec 22 10:33:07 node1 watchdog[68054]: loadavg 37 24 12 is higher than the given threshold 36 27 18!
Dec 22 10:33:07 node1 watchdog[68054]: shutting down the system because of error -3
Dec 22 10:33:08 node1 kernel: ata2: command 0x25 timeout, stat 0xd0 host_stat 0x1
Dec 22 10:33:08 node1 kernel: ata2: status=0xd0 { Busy }
Dec 22 10:33:08 node1 kernel: SCSI error : <1 0 0 0> return code = 0x8000002
Dec 22 10:33:08 node1 kernel: sdb: Current: sense key: Aborted Command
Dec 22 10:33:08 node1 kernel:     Additional sense: Scsi parity error
Dec 22 10:33:08 node1 kernel: end_request: I/O error, dev sdb, sector 235310987
Dec 22 10:33:08 node1 kernel: drbd3: got an _req_mod() errno of -5
Dec 22 10:33:08 node1 kernel: drbd3: Local READ failed sec=233206472s size=4096
Dec 22 10:33:08 node1 kernel: ATA: abnormal status 0xD0 on port 0xE007
Dec 22 10:33:08 node1 last message repeated 2 times
...
Shutdown/reboot with sync took _very_ long; it got stuck waiting for drbd3!
...
Dec 22 11:13:10 node1 kernel: end_request: I/O error, dev sdb, sector 449904539
Dec 22 11:13:10 node1 kernel: drbd3: got an _req_mod() errno of -5
Dec 22 11:13:10 node1 kernel: drbd3: Local READ failed sec=447800024s size=4096
...
Dec 22 11:14:10 node1 kernel: end_request: I/O error, dev sdb, sector 180695108
Dec 22 11:14:10 node1 kernel: drbd3: got an _req_mod() errno of -5
Dec 22 11:14:10 node1 kernel: drbd3: Local WRITE failed sec=178590593s size=512
...
Dec 22 11:20:37 node1 syslogd 1.4.1: restart (remote reception).
Dec 22 11:20:37 node1 syslog: syslogd startup succeeded
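
To check this pattern against a full log rather than by eye, here is a minimal Python sketch. It is only an illustration: the function names and the regular expressions are my own assumptions, matched to the message formats in the excerpt above, and SAMPLE_LOG is an abridged copy of those lines. It reports local I/O failures logged after a device went Diskless, and the difference between the sector in each end_request error and the sector in the drbd message that follows it.

```python
import re

# Abridged copy of lines from the log excerpt above.
SAMPLE_LOG = """\
Dec 22 10:25:38 node1 kernel: end_request: I/O error, dev sdb, sector 4057363
Dec 22 10:25:38 node1 kernel: drbd3: Local READ failed sec=1952848s size=4096
Dec 22 10:25:38 node1 kernel: drbd3: disk( UpToDate -> Failed )
Dec 22 10:25:38 node1 kernel: drbd3: disk( Failed -> Diskless )
Dec 22 10:33:08 node1 kernel: end_request: I/O error, dev sdb, sector 235310987
Dec 22 10:33:08 node1 kernel: drbd3: Local READ failed sec=233206472s size=4096
Dec 22 11:14:10 node1 kernel: end_request: I/O error, dev sdb, sector 180695108
Dec 22 11:14:10 node1 kernel: drbd3: Local WRITE failed sec=178590593s size=512
"""

# Patterns are assumptions based on the message formats seen in this log only.
STATE_RE = re.compile(r"(drbd\d+): disk\( (\w+) -> (\w+) \)")
LOCAL_FAIL_RE = re.compile(r"(drbd\d+): Local (READ|WRITE) failed sec=(\d+)s")
END_REQUEST_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failures_after_diskless(log_text):
    """Return (device, op, sector) for local failures logged while Diskless."""
    diskless = set()
    late = []
    for line in log_text.splitlines():
        state = STATE_RE.search(line)
        if state:
            device, new_state = state.group(1), state.group(3)
            if new_state == "Diskless":
                diskless.add(device)
            else:
                # Any other disk state means the device is no longer Diskless.
                diskless.discard(device)
            continue
        fail = LOCAL_FAIL_RE.search(line)
        if fail and fail.group(1) in diskless:
            late.append((fail.group(1), fail.group(2), int(fail.group(3))))
    return late


def sector_offsets(log_text):
    """Return (backing-device sector - drbd sector) for each paired error."""
    offsets = []
    pending = None  # sector from the most recent unpaired end_request line
    for line in log_text.splitlines():
        err = END_REQUEST_RE.search(line)
        if err:
            pending = int(err.group(2))
            continue
        fail = LOCAL_FAIL_RE.search(line)
        if fail and pending is not None:
            offsets.append(pending - int(fail.group(3)))
            pending = None
    return offsets


if __name__ == "__main__":
    print(failures_after_diskless(SAMPLE_LOG))
    print(sector_offsets(SAMPLE_LOG))
```

On the lines above, the first check reports the 10:33:08 READ and the 11:14:10 WRITE as failures after the Diskless transition. The second gives the same difference, 2104515 sectors, for every pair (this also holds for the 11:13:10 pair in the full excerpt). That would be consistent with these being drbd3 requests still reaching sdb2, shifted by the start of the partition, but the partition table is not shown here, so that part is an inference.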

---

common {
    protocol               C;
    net {
        sndbuf-size        0;
        max-buffers      32768;
        unplug-watermark 2048;
        timeout           30;
        connect-int        5;
        ping-int           5;
        ko-count           3;
        after-sb-0pri    discard-older-primary;
        after-sb-1pri    consensus;
        after-sb-2pri    violently-as0p;
        rr-conflict      disconnect;
    }
    disk {
        on-io-error      detach;
    }
    syncer {
        rate              16M;
        al-extents       1187;
    }
    startup {
        wfc-timeout      120;
        degr-wfc-timeout 120;
    }
    handlers {
        before-resync-target "exit 0";
        after-resync-target "exit 0";
    }
}

resource drbd3 {
    on node1 {
        device           /dev/drbd3;
        disk             /dev/sdb2;
        address          ipv4 A.B.C.D:P;
        meta-disk        internal;
    }
    on node3 {
        device           /dev/drbd3;
        disk             /dev/sdb2;
        address          ipv4 A.B.C.E:P;
        meta-disk        internal;
    }
    disk {
        size             311385340K;
    }
    syncer {
        rate             16M;
        after            drbd2;
    }
}
