I have a serious problem with my DRBD setup and need help.
*My scenario:*
- The hardware is identical on both nodes.
- Two ASUS P8H77-M PRO workstations with Intel Core i7, Proxmox VE 2.3, DRBD 8.4.2, and LVM on top of DRBD.
- Each node has two Realtek RTL8111/8168 PCI-E 1 Gb/s NICs in an active-backup bond, used only for DRBD over a direct NIC-to-NIC connection.
- In the net section I use "data-integrity-alg md5;" because data integrity is very important to me.
- "Node A" uses 2 resources (r0 and r1) and replicates to "Node B".
After about half an hour, it shows me this:
shell# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root at kvm5, 2013-06-16 13:44:51
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
    ns:451409 nr:0 dw:527749 dr:868064 al:635 bm:239 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:254168
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:36860 nr:0 dw:36860 dr:81763 al:93 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
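For anyone who wants to watch for this state automatically, here is a minimal Python sketch that reads /proc/drbd-style text (DRBD 8.x layout, as pasted above) and reports the connection state and out-of-sync amount per minor. The function name `parse_proc_drbd` and the returned dict keys are my own choices for illustration, not part of any DRBD tool.

```python
import re

# Per-device status line, e.g.
# " 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----"
STATUS_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+ro:(\S+)\s+ds:(\S+)", re.MULTILINE)
# The oos counter appears on a following line of the same device block.
OOS_RE = re.compile(r"oos:(\d+)")


def parse_proc_drbd(text):
    """Return one dict per DRBD minor found in /proc/drbd-style text."""
    matches = list(STATUS_RE.finditer(text))
    devices = []
    for i, m in enumerate(matches):
        # A device block ends where the next status line begins.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        oos = OOS_RE.search(text, m.end(), end)
        devices.append({
            "minor": int(m.group(1)),
            "cs": m.group(2),   # connection state
            "ro": m.group(3),   # roles, local/peer
            "ds": m.group(4),   # disk states, local/peer
            "oos": int(oos.group(1)) if oos else None,  # out of sync, KiB
        })
    return devices


if __name__ == "__main__":
    with open("/proc/drbd") as f:
        for dev in parse_proc_drbd(f.read()):
            flag = "" if dev["cs"] == "Connected" else "  <-- needs attention"
            print(f"drbd{dev['minor']}: cs={dev['cs']} oos={dev['oos']}{flag}")
```

The parser tolerates the wrapped lines produced by mail clients, because it looks for `oos:` anywhere between one status line and the next.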
*This is my configuration:*
*File global_common.conf:*
global { usage-count no; }
common {
    protocol C;
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
    }
    startup {
        wfc-timeout 30; degr-wfc-timeout 20; outdated-wfc-timeout 15;
    }
    options {
        cpu-mask 0;
    }
    disk {
        on-io-error detach; al-extents 3389; resync-rate 75M;
    }
    net {
        sndbuf-size 0; no-tcp-cork; unplug-watermark 16; max-buffers 8000; max-epoch-size 8000;
        data-integrity-alg md5;
        verify-alg sha1;
    }
}
*File r0.res:*
resource r0 {
    protocol C;
    startup {
        become-primary-on both;
    }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    on kvm5 {
        device /dev/drbd0;
        disk /dev/sda3;
        address 10.2.2.50:7788;
        meta-disk internal;
    }
    on kvm6 {
        device /dev/drbd0;
        disk /dev/sda3;
        address 10.2.2.51:7788;
        meta-disk internal;
    }
}
*File r1.res:*
resource r1 {
    protocol C;
    startup {
        become-primary-on both;
    }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    on kvm5 {
        device /dev/drbd1;
        disk /dev/sdb3;
        address 10.2.2.50:7789;
        meta-disk internal;
    }
    on kvm6 {
        device /dev/drbd1;
        disk /dev/sdb3;
        address 10.2.2.51:7789;
        meta-disk internal;
    }
}
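To make clear which of the three after-sb settings matters in a dual-primary setup like this one, here is a small Python sketch. It assumes the documented meaning of the options: after-sb-0pri applies when no node is Primary at the moment the split brain is detected, after-sb-1pri when one node is, and after-sb-2pri when both are. The function name and the dict are hypothetical and only mirror the net section above; this is a model of the selection, not DRBD code.

```python
# The after-sb settings from the net section of r0.res / r1.res.
NET_OPTIONS = {
    "after-sb-0pri": "discard-zero-changes",
    "after-sb-1pri": "discard-secondary",
    "after-sb-2pri": "disconnect",
}


def applicable_after_sb_policy(net_options, primaries_at_detection):
    """Pick the after-sb option that applies for 0, 1 or 2 Primary nodes.

    Returns (option_name, policy, attempts_auto_recovery).
    """
    if primaries_at_detection not in (0, 1, 2):
        raise ValueError("a two-node resource has 0, 1 or 2 primaries")
    name = f"after-sb-{primaries_at_detection}pri"
    policy = net_options[name]
    # "disconnect" means: do not try to recover, just drop the connection.
    attempts_auto_recovery = policy != "disconnect"
    return name, policy, attempts_auto_recovery


if __name__ == "__main__":
    # Both nodes are Primary here (allow-two-primaries, become-primary-on both).
    print(applicable_after_sb_policy(NET_OPTIONS, 2))
```

With both nodes Primary, the sketch selects after-sb-2pri, whose value here is disconnect, so no automatic recovery would be attempted under these assumptions.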
*Notes:*
In the net section I use "data-integrity-alg md5;" because data integrity is very important to me.
"Node A" uses 2 resources and replicates to "Node B".
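As a rough illustration of what a per-block digest check does, and how a "buffer modified by upper layers during write" situation can make it fail without any network fault, here is a simplified Python model. It is only a sketch under my own assumptions (a 4096-byte block, MD5 via hashlib, a hypothetical `simulate_write` helper) and does not reproduce DRBD's actual implementation.

```python
import hashlib


def digest_of(payload):
    """MD5 digest of a block, standing in for data-integrity-alg md5."""
    return hashlib.md5(payload).digest()


def simulate_write(modify_in_flight):
    """Return True if the receiver's digest check would pass.

    The sender computes the digest first and transmits the payload
    afterwards. If the buffer changes in between, the receiver sees data
    that no longer matches the digest it was given.
    """
    buffer = bytearray(4096)           # block handed down by the upper layer
    sent_digest = digest_of(buffer)    # digest computed before transmission

    if modify_in_flight:
        buffer[0] ^= 0xFF              # upper layer touches the buffer again

    payload_on_wire = bytes(buffer)    # what actually gets transmitted
    return digest_of(payload_on_wire) == sent_digest


if __name__ == "__main__":
    print("untouched buffer passes:", simulate_write(False))
    print("modified buffer passes: ", simulate_write(True))
```

In this model the data on the wire is intact in both cases; the check fails only because the digest and the payload were taken at different moments.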
*These are my logs:*
*Log in Node A:*
Jun 17 05:58:39 kvm5 kernel: dlm: connecting to 4
Jun 17 06:31:14 kvm5 kernel: block drbd0: Digest mismatch, buffer modified by upper layers during write: 21908040s +4096
Jun 17 06:31:14 kvm5 kernel: d-con r0: *sock was shut down by peer*
Jun 17 06:31:14 kvm5 kernel: d-con r0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jun 17 06:31:14 kvm5 kernel: d-con r0: short read (expected size 16)
Jun 17 06:31:14 kvm5 kernel: d-con r0: *meta connection shut down by peer.*
Jun 17 06:31:14 kvm5 kernel: block drbd0: new current UUID 381CAEFBB77A202F:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF
Jun 17 06:31:14 kvm5 kernel: d-con r0: asender terminated
Jun 17 06:31:14 kvm5 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:14 kvm5 kernel: d-con r0: *Connection closed*
Jun 17 06:31:14 kvm5 kernel: d-con r0: conn( BrokenPipe -> Unconnected )
Jun 17 06:31:14 kvm5 kernel: d-con r0: receiver terminated
Jun 17 06:31:14 kvm5 kernel: d-con r0: Restarting receiver thread
Jun 17 06:31:14 kvm5 kernel: d-con r0: receiver (re)started
Jun 17 06:31:14 kvm5 kernel: d-con r0: conn( Unconnected -> WFConnection )
Jun 17 06:31:15 kvm5 kernel: d-con r0: Handshake successful: Agreed network protocol version 101
Jun 17 06:31:15 kvm5 kernel: d-con r0: conn( WFConnection -> WFReportParams )
Jun 17 06:31:15 kvm5 kernel: d-con r0: Starting asender thread (from drbd_r_r0 [117263])
Jun 17 06:31:15 kvm5 kernel: block drbd0: drbd_sync_handshake:
Jun 17 06:31:15 kvm5 kernel: block drbd0: self 381CAEFBB77A202F:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:14 flags:0
Jun 17 06:31:15 kvm5 kernel: block drbd0: peer 96BBC9E849D133DB:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:0 flags:0
Jun 17 06:31:15 kvm5 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm5 kernel: block drbd0: *Split-Brain detected but unresolved, dropping connection!*
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 17 06:31:15 kvm5 kernel: d-con r0: *meta connection shut down by peer.*
Jun 17 06:31:15 kvm5 kernel: d-con r0: conn( WFReportParams -> NetworkFailure )
Jun 17 06:31:15 kvm5 kernel: d-con r0: asender terminated
Jun 17 06:31:15 kvm5 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm5 kernel: d-con r0: *conn( NetworkFailure -> Disconnecting )*
Jun 17 06:31:15 kvm5 kernel: d-con r0: error receiving ReportState, e: -5 l: 0!
Jun 17 06:31:15 kvm5 kernel: d-con r0: Connection closed
Jun 17 06:31:15 kvm5 kernel: d-con r0: conn( Disconnecting -> StandAlone )
Jun 17 06:31:15 kvm5 kernel: d-con r0: receiver terminated
Jun 17 06:31:15 kvm5 kernel: d-con r0: Terminating receiver thread
*Log in Node B:*
Jun 17 05:58:39 kvm6 kernel: dlm: got connection from 3
Jun 17 06:31:14 kvm6 kernel: block drbd0: *Digest integrity check FAILED: 21908040s +4096*
Jun 17 06:31:14 kvm6 kernel: d-con r0: *error receiving Data, e: -5 l: 4112!*
Jun 17 06:31:14 kvm6 kernel: d-con r0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
Jun 17 06:31:14 kvm6 kernel: block drbd0: new current UUID 96BBC9E849D133DB:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF
Jun 17 06:31:14 kvm6 kernel: d-con r0: asender terminated
Jun 17 06:31:14 kvm6 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:14 kvm6 kernel: d-con r0: Connection closed
Jun 17 06:31:14 kvm6 kernel: d-con r0: conn( ProtocolError -> Unconnected )
Jun 17 06:31:14 kvm6 kernel: d-con r0: receiver terminated
Jun 17 06:31:14 kvm6 kernel: d-con r0: Restarting receiver thread
Jun 17 06:31:14 kvm6 kernel: d-con r0: receiver (re)started
Jun 17 06:31:14 kvm6 kernel: d-con r0: conn( Unconnected -> WFConnection )
Jun 17 06:31:15 kvm6 kernel: d-con r0: Handshake successful: Agreed network protocol version 101
Jun 17 06:31:15 kvm6 kernel: d-con r0: conn( WFConnection -> WFReportParams )
Jun 17 06:31:15 kvm6 kernel: d-con r0: Starting asender thread (from drbd_r_r0 [54943])
Jun 17 06:31:15 kvm6 kernel: block drbd0: drbd_sync_handshake:
Jun 17 06:31:15 kvm6 kernel: block drbd0: self 96BBC9E849D133DB:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:0 flags:0
Jun 17 06:31:15 kvm6 kernel: block drbd0: peer 381CAEFBB77A202F:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:14 flags:0
Jun 17 06:31:15 kvm6 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm6 kernel: block drbd0: *Split-Brain detected but unresolved, dropping connection!*
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm6 kernel: d-con r0: conn( WFReportParams -> Disconnecting )
Jun 17 06:31:15 kvm6 kernel: d-con r0: *error receiving ReportState, e: -5 l: 0!*
Jun 17 06:31:15 kvm6 kernel: d-con r0: asender terminated
Jun 17 06:31:15 kvm6 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:15 kvm6 kernel: d-con r0: *Connection closed*
Jun 17 06:31:15 kvm6 kernel: d-con r0: conn( Disconnecting -> StandAlone )
Jun 17 06:31:15 kvm6 kernel: d-con r0: receiver terminated
Jun 17 06:31:15 kvm6 kernel: d-con r0: Terminating receiver thread
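Since the logs are long, a short Python sketch like the following can pull out just the connection-state transitions, which makes the path from Connected to StandAlone easier to follow. The helper name `conn_transitions` is my own; the pattern is based only on the `conn( X -> Y )` format visible in the lines above, and it strips the `*` emphasis markers and tolerates mail-wrapped lines.

```python
import re

# Matches "conn( Connected -> BrokenPipe )", also when wrapped over two lines.
CONN_RE = re.compile(r"\bconn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def conn_transitions(log_text):
    """Return the list of (old_state, new_state) pairs in order of appearance."""
    # Remove the '*' emphasis markers added by the mail client.
    cleaned = log_text.replace("*", "")
    return CONN_RE.findall(cleaned)


if __name__ == "__main__":
    import sys

    # Usage: python conn_transitions.py < kernel.log
    for old, new in conn_transitions(sys.stdin.read()):
        print(f"{old} -> {new}")
```

The peer( ... ) and pdsk( ... ) fields on the same lines are ignored on purpose, so the output shows only the connection state.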
I will be extremely grateful to anyone who can help me.
Best regards,
Cesar
--
View this message in context: http://drbd.10923.n7.nabble.com/d-con-r0-sock-was-shut-down-by-peer-DRBD-8-4-2-tp17912.html
Sent from the DRBD - User mailing list archive at Nabble.com.