[DRBD-user] recovering from "Local IO failed. Detaching..."

Gianluca Cecchi gianluca.cecchi at gmail.com
Tue Sep 15 12:06:28 CEST 2009


I'm back again.

After a short while, the diskless problem reappeared with 8.3.3rc2 as well.
What should I change or try first to debug this?
In particular, I don't understand whether this line in dmesg

end_request: I/O error, dev cciss/c0d0, sector 0

is reported by drbd, by the SCSI layer, or by something else.
In theory drbd should not try to read sector 0 at all, since it uses c0d0p3 and
c0d0p4.
Or does this mean that it cannot read the partition table?
Thanks for any help,
Gianluca

On the problematic node I have:

drbd: initialized. Version: 8.3.3rc2 (api:88/proto:86-91)
drbd: GIT-hash: 04b2f175d7076ef2e0dd7d5ba6f6843357a041ed build by root@virtfedbis, 2009-09-11 10:06:20
drbd: registered as block device major 147
drbd: minor_table @ 0xffff880829c04700
block drbd0: Starting worker thread (from cqueue [1779])
block drbd0: disk( Diskless -> Attaching )
block drbd0: Found 6 transactions (244 active extents) in activity log.
block drbd0: Method to ensure write ordering: barrier
block drbd0: max_segment_size ( = BIO size ) = 32768
block drbd0: drbd_bm_resize called with capacity == 109317376
block drbd0: resync bitmap: bits=13664672 words=213511
block drbd0: size = 52 GB (54658688 KB)
block drbd0: recounting of set bits took additional 2 jiffies
block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd0: Marked additional 920 MB as out-of-sync based on AL.
end_request: I/O error, dev cciss/c0d0, sector 0
block drbd0: meta data flush failed with status -95, disabling md-flushes
block drbd0: disk( Attaching -> Consistent )
block drbd0: conn( StandAlone -> Unconnected )
block drbd0: Starting receiver thread (from drbd0_worker [1781])
block drbd0: receiver (re)started
block drbd0: conn( Unconnected -> WFConnection )
block drbd0: Handshake successful: Agreed network protocol version 91
block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [1806])
block drbd0: data-integrity-alg: <not-used>
block drbd0: drbd_sync_handshake:
block drbd0: self 71E806A9BE572C28:0000000000000000:9EB0CCB7634CBCDC:A0332E51B243BEE1 bits:235520 flags:0
block drbd0: peer 81BAD3F384A6F3C7:71E806A9BE572C29:9EB0CCB7634CBCDD:A0332E51B243BEE1 bits:109854 flags:0
block drbd0: uuid_compare()=-1 by rule 50
block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( Consistent -> Outdated ) pdsk( DUnknown -> UpToDate )
block drbd0: role( Secondary -> Primary )
block drbd0: conn( WFBitMapT -> WFSyncUUID )
block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
block drbd0: Began resync as SyncTarget (will sync 1381496 KB [345374 bits set]).
block drbd0: Resync aborted.
block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> Failed )
block drbd0: Local IO failed. Detaching...
block drbd0: Can not write resync data to local disk.
block drbd0: Can not write resync data to local disk.
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: Can not write resync data to local disk.
block drbd0: Can not write resync data to local disk.
block drbd0: Can not write resync data to local disk.
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: disk( Failed -> Diskless )
block drbd0: Notified peer that my disk is broken.
block drbd0: Can not write mirrored data block to local disk.
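
As a sanity check on the sizes in the log above, here is a small Python sketch. It assumes 512-byte sectors and one bitmap bit per 4 KiB of storage; neither is stated in the log, but both are consistent with the figures it prints. The variable names are only for illustration.

```python
# Cross-check the numbers printed in the DRBD log above.
# Assumptions (not stated in the log): 512-byte sectors and
# one resync-bitmap bit per 4 KiB of storage.
SECTOR_BYTES = 512
KIB_PER_BIT = 4

capacity_sectors = 109317376  # "drbd_bm_resize called with capacity =="
bitmap_bits = 13664672        # "resync bitmap: bits="
self_bits = 235520            # "self ... bits:" in drbd_sync_handshake
peer_bits = 109854            # "peer ... bits:" in drbd_sync_handshake

# Device size derived two ways: from the capacity and from the bitmap.
size_kib = capacity_sectors * SECTOR_BYTES // 1024
bitmap_kib = bitmap_bits * KIB_PER_BIT

# Out-of-sync amount on this node, in MiB.
al_mib = self_bits * KIB_PER_BIT // 1024

# Sum of both nodes' out-of-sync bits, and the matching size in KiB.
resync_bits = self_bits + peer_bits
resync_kib = resync_bits * KIB_PER_BIT

print("size from capacity (KiB):", size_kib)
print("size from bitmap   (KiB):", bitmap_kib)
print("out-of-sync on this node (MiB):", al_mib)
print("resync bits:", resync_bits, "resync KiB:", resync_kib)
```

Under those assumptions the results agree with the log: the size is 54658688 KB, the 235520 local bits correspond to the 920 MB marked from the activity log, and the two bit counts add up to the 345374 bits (1381496 KB) announced for the resync. So the bookkeeping looks consistent, and the failure happens only once resync data is actually written to the local disk.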

On the peer node I have:
Sep 15 11:46:29 virtfed kernel: block drbd0: Handshake successful: Agreed network protocol version 91
Sep 15 11:46:29 virtfed kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Sep 15 11:46:29 virtfed kernel: block drbd0: conn( WFConnection -> WFReportParams )
Sep 15 11:46:29 virtfed kernel: block drbd0: Starting asender thread (from drbd0_receiver [2450])
Sep 15 11:46:29 virtfed kernel: block drbd0: data-integrity-alg: <not-used>
Sep 15 11:46:29 virtfed kernel: block drbd0: drbd_sync_handshake:
Sep 15 11:46:29 virtfed kernel: block drbd0: self 81BAD3F384A6F3C7:71E806A9BE572C29:9EB0CCB7634CBCDD:A0332E51B243BEE1 bits:109854 flags:0
Sep 15 11:46:29 virtfed kernel: block drbd0: peer 71E806A9BE572C28:0000000000000000:9EB0CCB7634CBCDC:A0332E51B243BEE1 bits:235520 flags:2
Sep 15 11:46:29 virtfed kernel: block drbd0: uuid_compare()=1 by rule 70
Sep 15 11:46:29 virtfed kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
Sep 15 11:46:29 virtfed kernel: block drbd0: peer( Secondary -> Primary )
Sep 15 11:46:29 virtfed kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent )
Sep 15 11:46:29 virtfed kernel: block drbd0: Began resync as SyncSource (will sync 1381496 KB [345374 bits set]).
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?
Sep 15 11:46:32 virtfed kernel: block drbd0: Resync aborted.
Sep 15 11:46:32 virtfed kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> Diskless )
Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!
Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!
Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!
Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!
Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!
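
One way to read the two drbd_sync_handshake outputs is sketched below in Python. This is a simplified illustration, not DRBD's actual uuid_compare(), which has many more rules. It assumes that the colon-separated values are ordered current:bitmap:history:history and that the lowest bit of each UUID is a flag that is masked out before comparing. The function names are hypothetical, and the mapping of these two checks onto "rule 50" and "rule 70" is also an assumption.

```python
# Simplified sketch of two UUID checks; NOT the full DRBD logic.
# Assumptions: field order is current:bitmap:history:history, and the
# lowest bit of each UUID is a flag that is ignored when comparing.

def parse_uuids(text):
    """Turn 'AAAA:BBBB:CCCC:DDDD' into a list of integers."""
    return [int(part, 16) for part in text.split(":")]

def strip_flag(uuid):
    """Clear the lowest bit before comparing."""
    return uuid & ~1

def compare(self_uuids, peer_uuids):
    """Return -1 (become sync target), 1 (become sync source) or None."""
    self_current, self_bitmap = self_uuids[0], self_uuids[1]
    peer_current, peer_bitmap = peer_uuids[0], peer_uuids[1]
    # The peer's bitmap UUID matches my current UUID: the peer has
    # tracked changes since my data generation, so I am the target.
    if strip_flag(self_current) == strip_flag(peer_bitmap):
        return -1
    # Mirror case: my bitmap UUID matches the peer's current UUID.
    if strip_flag(self_bitmap) == strip_flag(peer_current):
        return 1
    return None  # not covered by this sketch

# UUID sets copied from the logs above.
virtfedbis = parse_uuids(
    "71E806A9BE572C28:0000000000000000:9EB0CCB7634CBCDC:A0332E51B243BEE1")
virtfed = parse_uuids(
    "81BAD3F384A6F3C7:71E806A9BE572C29:9EB0CCB7634CBCDD:A0332E51B243BEE1")

print("virtfedbis:", compare(virtfedbis, virtfed))
print("virtfed:   ", compare(virtfed, virtfedbis))
```

With the values from the logs, the sketch gives -1 on virtfedbis and 1 on virtfed, the same results that uuid_compare() reports on the two nodes. Under that reading the sync direction is as expected, and the problem starts only when the target writes the resync data locally.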

My drbd.conf at the moment:
global {
    usage-count yes;
}

common {
  syncer { rate 100M; }
}

resource r0 {
  protocol C;

  handlers {
    pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
    fence-peer "/usr/local/bin/obliterate-peer.sh";
  }

  startup {
    wfc-timeout  30;
    degr-wfc-timeout 10;    # 10 seconds.
    outdated-wfc-timeout 2;  # 2 seconds.
    become-primary-on both;
  }

  disk {
    on-io-error   detach;
    fencing resource-and-stonith;
  }

  net {
    allow-two-primaries;
    cram-hmac-alg "sha1";
    shared-secret "kvmdrbd";
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    al-extents 257;
  }

  on virtfed {
    device     /dev/drbd0;
    disk       /dev/cciss/c0d0p3;
    address    192.168.16.111:7788;
    flexible-meta-disk  /dev/cciss/c0d0p4;

  }

  on virtfedbis {
    device    /dev/drbd0;
    disk       /dev/cciss/c0d0p3;
    address    192.168.16.112:7788;
    flexible-meta-disk  /dev/cciss/c0d0p4;
  }
}