I'm back again. After a short while, the bad diskless situation started again, this time with 8.3.3rc2 as well. What should I change, or which debugging step should I try first?
In particular, I don't understand whether the dmesg line

end_request: I/O error, dev cciss/c0d0, sector 0

is reported by DRBD, by the SCSI layer, or by something else. In theory DRBD should not try to read sector 0 at all, since it uses c0d0p3 and c0d0p4... or does this mean that it is not able to read the partition table?

Thanks for your help,
Gianluca

On the problematic node I have:

drbd: initialized. Version: 8.3.3rc2 (api:88/proto:86-91)
drbd: GIT-hash: 04b2f175d7076ef2e0dd7d5ba6f6843357a041ed build by root at virtfedbis, 2009-09-11 10:06:20
drbd: registered as block device major 147
drbd: minor_table @ 0xffff880829c04700
block drbd0: Starting worker thread (from cqueue [1779])
block drbd0: disk( Diskless -> Attaching )
block drbd0: Found 6 transactions (244 active extents) in activity log.
block drbd0: Method to ensure write ordering: barrier
block drbd0: max_segment_size ( = BIO size ) = 32768
block drbd0: drbd_bm_resize called with capacity == 109317376
block drbd0: resync bitmap: bits=13664672 words=213511
block drbd0: size = 52 GB (54658688 KB)
block drbd0: recounting of set bits took additional 2 jiffies
block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd0: Marked additional 920 MB as out-of-sync based on AL.
end_request: I/O error, dev cciss/c0d0, sector 0
block drbd0: meta data flush failed with status -95, disabling md-flushes
block drbd0: disk( Attaching -> Consistent )
block drbd0: conn( StandAlone -> Unconnected )
block drbd0: Starting receiver thread (from drbd0_worker [1781])
block drbd0: receiver (re)started
block drbd0: conn( Unconnected -> WFConnection )
block drbd0: Handshake successful: Agreed network protocol version 91
block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [1806])
block drbd0: data-integrity-alg: <not-used>
block drbd0: drbd_sync_handshake:
block drbd0: self 71E806A9BE572C28:0000000000000000:9EB0CCB7634CBCDC:A0332E51B243BEE1 bits:235520 flags:0
block drbd0: peer 81BAD3F384A6F3C7:71E806A9BE572C29:9EB0CCB7634CBCDD:A0332E51B243BEE1 bits:109854 flags:0
block drbd0: uuid_compare()=-1 by rule 50
block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( Consistent -> Outdated ) pdsk( DUnknown -> UpToDate )
block drbd0: role( Secondary -> Primary )
block drbd0: conn( WFBitMapT -> WFSyncUUID )
block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
block drbd0: Began resync as SyncTarget (will sync 1381496 KB [345374 bits set]).
block drbd0: Resync aborted.
block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> Failed )
block drbd0: Local IO failed. Detaching...
block drbd0: Can not write resync data to local disk.
block drbd0: Can not write resync data to local disk.
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: Can not write resync data to local disk.
block drbd0: Can not write resync data to local disk.
block drbd0: Can not write resync data to local disk.
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: drbd_rs_complete_io() called, but extent not found
block drbd0: disk( Failed -> Diskless )
block drbd0: Notified peer that my disk is broken.
block drbd0: Can not write mirrored data block to local disk.

On the peer node I have:

Sep 15 11:46:29 virtfed kernel: block drbd0: Handshake successful: Agreed network protocol version 91
Sep 15 11:46:29 virtfed kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Sep 15 11:46:29 virtfed kernel: block drbd0: conn( WFConnection -> WFReportParams )
Sep 15 11:46:29 virtfed kernel: block drbd0: Starting asender thread (from drbd0_receiver [2450])
Sep 15 11:46:29 virtfed kernel: block drbd0: data-integrity-alg: <not-used>
Sep 15 11:46:29 virtfed kernel: block drbd0: drbd_sync_handshake:
Sep 15 11:46:29 virtfed kernel: block drbd0: self 81BAD3F384A6F3C7:71E806A9BE572C29:9EB0CCB7634CBCDD:A0332E51B243BEE1 bits:109854 flags:0
Sep 15 11:46:29 virtfed kernel: block drbd0: peer 71E806A9BE572C28:0000000000000000:9EB0CCB7634CBCDC:A0332E51B243BEE1 bits:235520 flags:2
Sep 15 11:46:29 virtfed kernel: block drbd0: uuid_compare()=1 by rule 70
Sep 15 11:46:29 virtfed kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
Sep 15 11:46:29 virtfed kernel: block drbd0: peer( Secondary -> Primary )
Sep 15 11:46:29 virtfed kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent )
Sep 15 11:46:29 virtfed kernel: block drbd0: Began resync as SyncSource (will sync 1381496 KB [345374 bits set]).
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?
Sep 15 11:46:31 virtfed kernel: block drbd0: Got NegAck packet. Peer is in troubles?
Sep 15 11:46:32 virtfed kernel: block drbd0: Resync aborted.
Sep 15 11:46:32 virtfed kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> Diskless )
Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!
Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!
Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!
Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!
Sep 15 11:46:32 virtfed kernel: block drbd0: Not sending RSDataReply, partner DISKLESS!

My drbd.conf at the moment:

global {
  usage-count yes;
}
common {
  syncer { rate 100M; }
}
resource r0 {
  protocol C;
  handlers {
    pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
    fence-peer "/usr/local/bin/obliterate-peer.sh";
  }
  startup {
    wfc-timeout 30;
    degr-wfc-timeout 10;    # 2 minutes.
    outdated-wfc-timeout 2; # 2 seconds.
    become-primary-on both;
  }
  disk {
    on-io-error detach;
    fencing resource-and-stonith;
  }
  net {
    allow-two-primaries;
    cram-hmac-alg "sha1";
    shared-secret "kvmdrbd";
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  syncer {
    al-extents 257;
  }
  on virtfed {
    device /dev/drbd0;
    disk /dev/cciss/c0d0p3;
    address 192.168.16.111:7788;
    flexible-meta-disk /dev/cciss/c0d0p4;
  }
  on virtfedbis {
    device /dev/drbd0;
    disk /dev/cciss/c0d0p3;
    address 192.168.16.112:7788;
    flexible-meta-disk /dev/cciss/c0d0p4;
  }
}

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20090915/0070e283/attachment.htm>
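For anyone cross-checking the numbers in the first log, here is a short Python sketch. It rests on assumptions that the logs themselves do not state: that DRBD 8.3's resync bitmap covers 4 KiB of data per bit and is stored in 64-bit words, that drbd_bm_resize reports capacity in 512-byte sectors, and that the -95 in "meta data flush failed with status -95" is a negated Linux errno (95 is EOPNOTSUPP on Linux). Under those assumptions, the size, bitmap, activity-log, and resync figures agree with one another. In this particular run, the resync total also equals the sum of the two nodes' bits: values from drbd_sync_handshake.

```python
"""Cross-check figures printed in the drbd0 kernel log (hedged sketch).

Assumptions, not taken from the post: one bitmap bit per 4 KiB block,
64-bit bitmap words, capacity in 512-byte sectors, and a flush "status"
that is a negated Linux errno.
"""
import errno
import os

SECTOR_BYTES = 512   # assumed unit of "drbd_bm_resize called with capacity == ..."
BM_BLOCK_KB = 4      # assumed: KiB of data covered by one bitmap bit
WORD_BITS = 64       # assumed: bitmap word size on x86_64


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return (a + b - 1) // b


def capacity_kb(sectors: int) -> int:
    """Convert a capacity given in 512-byte sectors to KiB."""
    return sectors * SECTOR_BYTES // 1024


def bitmap_geometry(size_kb: int) -> tuple[int, int]:
    """Return (bits, words) needed to cover size_kb KiB of data."""
    bits = ceil_div(size_kb, BM_BLOCK_KB)
    return bits, ceil_div(bits, WORD_BITS)


def bits_to_kb(bits: int) -> int:
    """KiB of data represented by a count of set bitmap bits."""
    return bits * BM_BLOCK_KB


def decode_status(status: int) -> str:
    """Return the C library's message for a negative kernel status such as -95."""
    return os.strerror(-status)


if __name__ == "__main__":
    size_kb = capacity_kb(109317376)
    print("size:", size_kb, "KB")                          # log: 54658688 KB
    print("bits, words:", bitmap_geometry(size_kb))        # log: bits=13664672 words=213511
    print("AL-marked:", bits_to_kb(235520) // 1024, "MB")   # log: 920 MB
    # In this run the resync total equals this node's bits plus the peer's bits.
    print("resync:", bits_to_kb(235520 + 109854), "KB")    # log: 1381496 KB
    print("-95 == -EOPNOTSUPP:", -95 == -getattr(errno, "EOPNOTSUPP", 0))
    print("status -95:", decode_status(-95))
```

The errno lines reflect the machine the script runs on, so they are only meaningful on Linux. Other platforms number errno values differently.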