I'm back again.<br><br>After a short while, the diskless problem reappeared, even with 8.3.3rc2.<br>What should I change or try first to debug this?<br>In particular, I don't understand whether this line in dmesg<br><br>end_request: I/O error, dev cciss/c0d0, sector 0<br>
<br>is reported by DRBD, by the SCSI layer, or by something else.<br>In theory DRBD should not try to read sector 0 at all, since it uses c0d0p3 and c0d0p4.<br>Or does this mean that it is unable to read the partition table?<br>Thanks for your help,<br>
Gianluca<br><br>On the problematic node I have:<br><br>drbd: initialized. Version: 8.3.3rc2 (api:88/proto:86-91)<br>drbd: GIT-hash: 04b2f175d7076ef2e0dd7d5ba6f6843357a041ed build by root@virtfedbis, 2009-09-11 10:06:20<br>
drbd: registered as block device major 147<br>drbd: minor_table @ 0xffff880829c04700<br>block drbd0: Starting worker thread (from cqueue [1779])<br>block drbd0: disk( Diskless -> Attaching ) <br>block drbd0: Found 6 transactions (244 active extents) in activity log.<br>
block drbd0: Method to ensure write ordering: barrier<br>block drbd0: max_segment_size ( = BIO size ) = 32768<br>block drbd0: drbd_bm_resize called with capacity == 109317376<br>block drbd0: resync bitmap: bits=13664672 words=213511<br>
block drbd0: size = 52 GB (54658688 KB)<br>block drbd0: recounting of set bits took additional 2 jiffies<br>block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.<br>block drbd0: Marked additional 920 MB as out-of-sync based on AL.<br>
end_request: I/O error, dev cciss/c0d0, sector 0<br>block drbd0: meta data flush failed with status -95, disabling md-flushes<br>block drbd0: disk( Attaching -> Consistent ) <br>block drbd0: conn( StandAlone -> Unconnected ) <br>
block drbd0: Starting receiver thread (from drbd0_worker [1781])<br>block drbd0: receiver (re)started<br>block drbd0: conn( Unconnected -> WFConnection ) <br>block drbd0: Handshake successful: Agreed network protocol version 91<br>
block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC<br>block drbd0: conn( WFConnection -> WFReportParams ) <br>block drbd0: Starting asender thread (from drbd0_receiver [1806])<br>block drbd0: data-integrity-alg: <not-used><br>
block drbd0: drbd_sync_handshake:<br>block drbd0: self 71E806A9BE572C28:0000000000000000:9EB0CCB7634CBCDC:A0332E51B243BEE1 bits:235520 flags:0<br>block drbd0: peer 81BAD3F384A6F3C7:71E806A9BE572C29:9EB0CCB7634CBCDD:A0332E51B243BEE1 bits:109854 flags:0<br>
block drbd0: uuid_compare()=-1 by rule 50<br>block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( Consistent -> Outdated ) pdsk( DUnknown -> UpToDate ) <br>block drbd0: role( Secondary -> Primary ) <br>
block drbd0: conn( WFBitMapT -> WFSyncUUID ) <br>block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0<br>block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)<br>block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) <br>
block drbd0: Began resync as SyncTarget (will sync 1381496 KB [345374 bits set]).<br>block drbd0: Resync aborted.<br>block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> Failed ) <br>block drbd0: Local IO failed. Detaching...<br>
block drbd0: Can not write resync data to local disk.<br>block drbd0: Can not write resync data to local disk.<br>block drbd0: drbd_rs_complete_io() called, but extent not found<br>block drbd0: drbd_rs_complete_io() called, but extent not found<br>
block drbd0: drbd_rs_complete_io() called, but extent not found<br>block drbd0: drbd_rs_complete_io() called, but extent not found<br>block drbd0: drbd_rs_complete_io() called, but extent not found<br>block drbd0: drbd_rs_complete_io() called, but extent not found<br>
block drbd0: Can not write resync data to local disk.<br>block drbd0: Can not write resync data to local disk.<br>block drbd0: Can not write resync data to local disk.<br>block drbd0: drbd_rs_complete_io() called, but extent not found<br>
block drbd0: drbd_rs_complete_io() called, but extent not found<br>block drbd0: drbd_rs_complete_io() called, but extent not found<br>block drbd0: disk( Failed -> Diskless ) <br>block drbd0: Notified peer that my disk is broken.<br>
block drbd0: Can not write mirrored data block to local disk.<br><br>On the peer node I have:<br>Sep 15 11:46:29 virtfed kernel: block drbd0: Handshake successful: Agreed network protocol version 91<br>Sep 15 11:46:29 virtfed kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC<br>
Sep 15 11:46:29 virtfed kernel: block drbd0: conn( WFConnection -> WFReportParams ) <br>Sep 15 11:46:29 virtfed kernel: block drbd0: Starting asender thread (from drbd0_receiver [2450])<br>Sep 15 11:46:29 virtfed kernel: block drbd0: data-integrity-alg: <not-used><br>
Sep 15 11:46:29 virtfed kernel: block drbd0: drbd_sync_handshake:<br>Sep 15 11:46:29 virtfed kernel: block drbd0: self 81BAD3F384A6F3C7:71E806A9BE572C29:9EB0CCB7634CBCDD:A0332E51B243BEE1 bits:109854 flags:0<br>Sep 15 11:46:29 virtfed kernel: block drbd0: peer 71E806A9BE572C28:0000000000000000:9EB0CCB7634CBCDC:A0332E51B243BEE1 bits:235520 flags:2<br>
Sep 15 11:46:29 virtfed kernel: block drbd0: uuid_compare()=1 by rule 70<br>Sep 15 11:46:29 virtfed kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) <br>Sep 15 11:46:29 virtfed kernel: block drbd0: peer( Secondary -> Primary ) <br>
Sep 15 11:46:29 virtfed kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent ) <br>Sep 15 11:46:29 virtfed kernel: block drbd0: Began resync as SyncSource (will sync 1381496 KB [345374 bits set]).<br>
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?<br>Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?<br>Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?<br>
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?<br>Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?<br>Sep 15 11:46:32 virtfed kernel: block drbd0: Resync aborted.<br>
Sep 15 11:46:32 virtfed kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> Diskless ) <br>Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!<br>Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!<br>
Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!<br>Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!<br>Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!<br>
<br>My drbd.conf at the moment:<br>global {<br> usage-count yes;<br>}<br><br>common {<br> syncer { rate 100M; }<br>}<br><br>resource r0 {<br> protocol C;<br><br> handlers {<br> pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";<br>
pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";<br> local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f"; <br>
 fence-peer "/usr/local/bin/obliterate-peer.sh";<br> }<br><br> startup {<br> wfc-timeout 30;<br> degr-wfc-timeout 10; # 10 seconds.<br> outdated-wfc-timeout 2; # 2 seconds.<br> become-primary-on both;<br>
}<br><br> disk {<br> on-io-error detach;<br> fencing resource-and-stonith;<br> }<br><br> net {<br> allow-two-primaries;<br> cram-hmac-alg "sha1";<br> shared-secret "kvmdrbd";<br> after-sb-0pri discard-zero-changes;<br>
after-sb-1pri discard-secondary;<br> after-sb-2pri disconnect;<br> rr-conflict disconnect;<br> }<br><br> syncer {<br> al-extents 257;<br> }<br><br> on virtfed {<br> device /dev/drbd0;<br> disk /dev/cciss/c0d0p3;<br>
 address 192.168.16.111:7788;<br> flexible-meta-disk /dev/cciss/c0d0p4;<br><br> }<br><br> on virtfedbis {<br> device /dev/drbd0;<br> disk /dev/cciss/c0d0p3;<br>
 address 192.168.16.112:7788;<br> flexible-meta-disk /dev/cciss/c0d0p4;<br> }<br>}<br>
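<br>To make the disk state path in the logs above easier to follow, here is a minimal sketch that pulls the state transitions out of dmesg-style lines. It is only a reading aid, not a DRBD tool: the function name `transitions` and the `sample` list are my own illustration, and the regular expression assumes the `field( Old -> New )` layout seen in the log lines quoted above.

```python
import re

# Matches DRBD state changes such as "disk( Diskless -> Attaching )".
# The field names (disk, conn, role, peer, pdsk) are the ones that appear
# in the log above; the exact spacing is assumed from those lines.
TRANSITION = re.compile(r"\b(disk|conn|role|peer|pdsk)\( (\w+) -> (\w+) \)")


def transitions(lines, field="disk"):
    """Return the (old, new) state pairs for one field, in log order."""
    result = []
    for line in lines:
        for name, old, new in TRANSITION.findall(line):
            if name == field:
                result.append((old, new))
    return result


# A few lines copied from the log of the problematic node.
sample = [
    "block drbd0: disk( Diskless -> Attaching ) ",
    "block drbd0: disk( Attaching -> Consistent ) ",
    "block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) "
    "disk( Consistent -> Outdated ) pdsk( DUnknown -> UpToDate ) ",
    "block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) ",
    "block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> Failed ) ",
    "block drbd0: disk( Failed -> Diskless ) ",
]

for old, new in transitions(sample):
    print(f"{old} -> {new}")
```

Passing `field="conn"` or `field="pdsk"` gives the same view for the connection state or the peer's disk state, which may help when lining up the two nodes' logs.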