I have a serious problem with my DRBD setup and need help.

*My scenario:*

- The hardware is identical on both nodes: two ASUS P8H77-M PRO workstations with Intel Core i7, Proxmox VE 2.3, DRBD 8.4.2, and LVM on top of DRBD.
- Each node has 2 Realtek RTL8111/8168 PCI-E 1 Gb/s NICs in an active-backup bond, used only for DRBD, with a direct NIC-to-NIC connection.
- I use "data-integrity-alg md5;" in the net section because data integrity is very important to me.
- "Node A" uses 2 resources (r0 and r1) and replicates to "Node B".

After half an hour it shows me this:

shell# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root at kvm5, 2013-06-16 13:44:51
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
    ns:451409 nr:0 dw:527749 dr:868064 al:635 bm:239 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:254168
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:36860 nr:0 dw:36860 dr:81763 al:93 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

*This is my configuration:*

*File global_common.conf:*

global {
    usage-count no;
}
common {
    protocol C;
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
    }
    startup {
        wfc-timeout 30;
        degr-wfc-timeout 20;
        outdated-wfc-timeout 15;
    }
    options {
        cpu-mask 0;
    }
    disk {
        on-io-error detach;
        al-extents 3389;
        resync-rate 75M;
    }
    net {
        sndbuf-size 0;
        no-tcp-cork;
        unplug-watermark 16;
        max-buffers 8000;
        max-epoch-size 8000;
        data-integrity-alg md5;
        verify-alg sha1;
    }
}

*File r0.res:*

resource r0 {
    protocol C;
    startup {
        become-primary-on both;
    }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    on kvm5 {
        device /dev/drbd0;
        disk /dev/sda3;
        address 10.2.2.50:7788;
        meta-disk internal;
    }
    on kvm6 {
        device /dev/drbd0;
        disk /dev/sda3;
        address 10.2.2.51:7788;
        meta-disk internal;
    }
}

*File r1.res:*

resource r1 {
    protocol C;
    startup {
        become-primary-on both;
    }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    on kvm5 {
        device /dev/drbd1;
        disk /dev/sdb3;
        address 10.2.2.50:7789;
        meta-disk internal;
    }
    on kvm6 {
        device /dev/drbd1;
        disk /dev/sdb3;
        address 10.2.2.51:7789;
        meta-disk internal;
    }
}

*These are my logs:*

*Log on Node A:*

Jun 17 05:58:39 kvm5 kernel: dlm: connecting to 4
Jun 17 06:31:14 kvm5 kernel: block drbd0: Digest mismatch, buffer modified by upper layers during write: 21908040s +4096
Jun 17 06:31:14 kvm5 kernel: d-con r0: *sock was shut down by peer*
Jun 17 06:31:14 kvm5 kernel: d-con r0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jun 17 06:31:14 kvm5 kernel: d-con r0: short read (expected size 16)
Jun 17 06:31:14 kvm5 kernel: d-con r0: *meta connection shut down by peer.*
Jun 17 06:31:14 kvm5 kernel: block drbd0: new current UUID 381CAEFBB77A202F:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF
Jun 17 06:31:14 kvm5 kernel: d-con r0: asender terminated
Jun 17 06:31:14 kvm5 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:14 kvm5 kernel: d-con r0: *Connection closed*
Jun 17 06:31:14 kvm5 kernel: d-con r0: conn( BrokenPipe -> Unconnected )
Jun 17 06:31:14 kvm5 kernel: d-con r0: receiver terminated
Jun 17 06:31:14 kvm5 kernel: d-con r0: Restarting receiver thread
Jun 17 06:31:14 kvm5 kernel: d-con r0: receiver (re)started
Jun 17 06:31:14 kvm5 kernel: d-con r0: conn( Unconnected -> WFConnection )
Jun 17 06:31:15 kvm5 kernel: d-con r0: Handshake successful: Agreed network protocol version 101
Jun 17 06:31:15 kvm5 kernel: d-con r0: conn( WFConnection -> WFReportParams )
Jun 17 06:31:15 kvm5 kernel: d-con r0: Starting asender thread (from drbd_r_r0 [117263])
Jun 17 06:31:15 kvm5 kernel: block drbd0: drbd_sync_handshake:
Jun 17 06:31:15 kvm5 kernel: block drbd0: self 381CAEFBB77A202F:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:14 flags:0
Jun 17 06:31:15 kvm5 kernel: block drbd0: peer 96BBC9E849D133DB:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:0 flags:0
Jun 17 06:31:15 kvm5 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm5 kernel: block drbd0: *Split-Brain detected but unresolved, dropping connection!*
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 17 06:31:15 kvm5 kernel: d-con r0: *meta connection shut down by peer.*
Jun 17 06:31:15 kvm5 kernel: d-con r0: conn( WFReportParams -> NetworkFailure )
Jun 17 06:31:15 kvm5 kernel: d-con r0: asender terminated
Jun 17 06:31:15 kvm5 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm5 kernel: d-con r0: *conn( NetworkFailure -> Disconnecting )*
Jun 17 06:31:15 kvm5 kernel: d-con r0: error receiving ReportState, e: -5 l: 0!
Jun 17 06:31:15 kvm5 kernel: d-con r0: Connection closed
Jun 17 06:31:15 kvm5 kernel: d-con r0: conn( Disconnecting -> StandAlone )
Jun 17 06:31:15 kvm5 kernel: d-con r0: receiver terminated
Jun 17 06:31:15 kvm5 kernel: d-con r0: Terminating receiver thread

*Log on Node B:*

Jun 17 05:58:39 kvm6 kernel: dlm: got connection from 3
Jun 17 06:31:14 kvm6 kernel: block drbd0: *Digest integrity check FAILED: 21908040s +4096*
Jun 17 06:31:14 kvm6 kernel: d-con r0: *error receiving Data, e: -5 l: 4112!*
Jun 17 06:31:14 kvm6 kernel: d-con r0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
Jun 17 06:31:14 kvm6 kernel: block drbd0: new current UUID 96BBC9E849D133DB:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF
Jun 17 06:31:14 kvm6 kernel: d-con r0: asender terminated
Jun 17 06:31:14 kvm6 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:14 kvm6 kernel: d-con r0: Connection closed
Jun 17 06:31:14 kvm6 kernel: d-con r0: conn( ProtocolError -> Unconnected )
Jun 17 06:31:14 kvm6 kernel: d-con r0: receiver terminated
Jun 17 06:31:14 kvm6 kernel: d-con r0: Restarting receiver thread
Jun 17 06:31:14 kvm6 kernel: d-con r0: receiver (re)started
Jun 17 06:31:14 kvm6 kernel: d-con r0: conn( Unconnected -> WFConnection )
Jun 17 06:31:15 kvm6 kernel: d-con r0: Handshake successful: Agreed network protocol version 101
Jun 17 06:31:15 kvm6 kernel: d-con r0: conn( WFConnection -> WFReportParams )
Jun 17 06:31:15 kvm6 kernel: d-con r0: Starting asender thread (from drbd_r_r0 [54943])
Jun 17 06:31:15 kvm6 kernel: block drbd0: drbd_sync_handshake:
Jun 17 06:31:15 kvm6 kernel: block drbd0: self 96BBC9E849D133DB:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:0 flags:0
Jun 17 06:31:15 kvm6 kernel: block drbd0: peer 381CAEFBB77A202F:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:14 flags:0
Jun 17 06:31:15 kvm6 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm6 kernel: block drbd0: *Split-Brain detected but unresolved, dropping connection!*
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm6 kernel: d-con r0: conn( WFReportParams -> Disconnecting )
Jun 17 06:31:15 kvm6 kernel: d-con r0: *error receiving ReportState, e: -5 l: 0!*
Jun 17 06:31:15 kvm6 kernel: d-con r0: asender terminated
Jun 17 06:31:15 kvm6 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:15 kvm6 kernel: d-con r0: *Connection closed*
Jun 17 06:31:15 kvm6 kernel: d-con r0: conn( Disconnecting -> StandAlone )
Jun 17 06:31:15 kvm6 kernel: d-con r0: receiver terminated
Jun 17 06:31:15 kvm6 kernel: d-con r0: Terminating receiver thread

I will be extremely grateful to anyone who can help me.

Best regards,
Cesar

--
View this message in context: http://drbd.10923.n7.nabble.com/d-con-r0-sock-was-shut-down-by-peer-DRBD-8-4-2-tp17912.html
Sent from the DRBD - User mailing list archive at Nabble.com.
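
For anyone who wants to catch this state earlier than "after half an hour", here is a minimal Python sketch that reads status text in the /proc/drbd format quoted in this post and flags devices that are not Connected or that report out-of-sync blocks (oos > 0). The function names parse_proc_drbd and needs_attention, the regular expressions, and the "needs attention" rule are my own assumptions for illustration; they are not part of DRBD, and the rule would also flag a device that is merely resyncing.

```python
import re

# Per-device status line as it appears in the post, e.g.
# " 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----"
DEVICE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+ro:(\S+)\s+ds:(\S+)")
# Out-of-sync counter from the counters line, e.g. "oos:254168"
OOS_RE = re.compile(r"\boos:(\d+)")

# Sample input copied from the /proc/drbd output in this post.
SAMPLE = """\
version: 8.4.2 (api:1/proto:86-101)
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
    ns:451409 nr:0 dw:527749 dr:868064 al:635 bm:239 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:254168
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:36860 nr:0 dw:36860 dr:81763 al:93 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""


def parse_proc_drbd(text):
    """Return one dict per DRBD minor found in /proc/drbd-style text."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            devices.append({
                "minor": int(match.group(1)),
                "cs": match.group(2),   # connection state
                "ro": match.group(3),   # roles (local/peer)
                "ds": match.group(4),   # disk states (local/peer)
                "oos": 0,               # filled in from the counters line
            })
        # The oos counter belongs to the most recently seen device.
        oos = OOS_RE.search(line)
        if devices and oos:
            devices[-1]["oos"] = int(oos.group(1))
    return devices


def needs_attention(dev):
    """Hypothetical rule: not Connected, or any out-of-sync data."""
    return dev["cs"] != "Connected" or dev["oos"] > 0


for dev in parse_proc_drbd(SAMPLE):
    status = "CHECK" if needs_attention(dev) else "ok"
    print(dev["minor"], dev["cs"], dev["oos"], status)
```

On a real node the text would come from reading /proc/drbd instead of SAMPLE, and the check could be run from cron alongside the split-brain notification handler already present in the configuration.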