[DRBD-user] d-con r0: sock was shut down by peer DRBD 8.4.2

cesar brain at click.com.py
Mon Jun 17 14:28:13 CEST 2013


I have a serious problem with my DRBD setup and need help.

*My scenario:*
- The hardware is identical on both nodes.
- Two ASUS P8H77-M PRO workstations with Intel Core i7, Proxmox VE 2.3, DRBD
8.4.2, and LVM on top of DRBD.
- Each node has 2 Realtek RTL8111/8168 PCI-E 1 Gb/s NICs in an active-backup
bond, used only for DRBD over a direct NIC-to-NIC connection.
- In the net section I use "data-integrity-alg md5;" because data integrity is
very important to me.
- "Node A" uses 2 resources (r0 and r1) and replicates to "Node B".

After about half an hour of running, /proc/drbd shows this:

shell# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@kvm5, 2013-06-16 13:44:51
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:451409 nr:0 dw:527749 dr:868064 al:635 bm:239 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:254168
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:36860 nr:0 dw:36860 dr:81763 al:93 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
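
For readers who want to check the same condition automatically, here is a minimal Python sketch that reads the status text above and flags any minor whose connection state is not Connected. The function names (parse_proc_drbd, degraded) are hypothetical helpers written for this illustration, not part of DRBD, and the parser only handles the 8.4-style layout shown in this post.

```python
import re

# Status lines copied from the post, with the mail-wrapped lines joined.
PROC_DRBD = """\
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:451409 nr:0 dw:527749 dr:868064 al:635 bm:239 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:254168
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:36860 nr:0 dw:36860 dr:81763 al:93 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""


def parse_proc_drbd(text):
    """Return {minor: {"cs": str, "ro": str, "ds": str, "oos": int}}."""
    devices = {}
    minor = None
    for line in text.splitlines():
        # First line of a device entry: "<minor>: cs:... ro:... ds:..."
        head = re.match(r"\s*(\d+):\s+cs:(\S+)\s+ro:(\S+)\s+ds:(\S+)", line)
        if head:
            minor = int(head.group(1))
            devices[minor] = {
                "cs": head.group(2),
                "ro": head.group(3),
                "ds": head.group(4),
                "oos": 0,
            }
            continue
        # Counter line: pick up the out-of-sync counter for the current minor.
        oos = re.search(r"\boos:(\d+)", line)
        if oos and minor is not None:
            devices[minor]["oos"] = int(oos.group(1))
    return devices


def degraded(devices):
    """Minors whose connection state is anything other than Connected."""
    return sorted(m for m, d in devices.items() if d["cs"] != "Connected")


if __name__ == "__main__":
    devs = parse_proc_drbd(PROC_DRBD)
    for m in degraded(devs):
        print(m, devs[m]["cs"], devs[m]["ro"], "oos:", devs[m]["oos"])
```

On the output above this flags only minor 0 (r0), which is StandAlone with a non-zero oos counter, while minor 1 (r1) is still Connected. Note that this simple test would also flag resync states such as SyncSource, so it is a coarse check only.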


*This is my configuration:*

*File global_common.conf:*
global { usage-count no; }

common { protocol C;
        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        }

        startup {
                wfc-timeout 30; degr-wfc-timeout 20; outdated-wfc-timeout 15;
        }

        options {
                cpu-mask 0;
        }

        disk {
                on-io-error detach; al-extents 3389; resync-rate 75M;
        }

        net {
                sndbuf-size 0; no-tcp-cork; unplug-watermark 16; max-buffers 8000; max-epoch-size 8000;
                data-integrity-alg md5;
                verify-alg sha1;
        }
}


*File r0.res:*
resource r0 { protocol C;
  startup {
    become-primary-on both;
  }
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  on kvm5 {
    device /dev/drbd0;
    disk /dev/sda3;
    address 10.2.2.50:7788;
    meta-disk internal;
  }
  on kvm6 {
    device /dev/drbd0;
    disk /dev/sda3;
    address 10.2.2.51:7788;
    meta-disk internal;
  }
}

*File r1.res:*
resource r1 { protocol C;
  startup {
    become-primary-on both;
  }
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  on kvm5 {
    device /dev/drbd1;
    disk /dev/sdb3;
    address 10.2.2.50:7789;
    meta-disk internal;
  }
  on kvm6 {
    device /dev/drbd1;
    disk /dev/sdb3;
    address 10.2.2.51:7789;
    meta-disk internal;
  }
}
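
To make the three after-sb-* settings above easier to relate to the logs, here is a small Python sketch of how the applicable setting depends on how many nodes are Primary when a split brain is detected. It is a simplified model under that assumption, not DRBD's own code, and policy_for is a hypothetical helper name.

```python
# Policies copied from the net section of r0.res and r1.res in this post.
AFTER_SB = {
    0: ("after-sb-0pri", "discard-zero-changes"),
    1: ("after-sb-1pri", "discard-secondary"),
    2: ("after-sb-2pri", "disconnect"),
}


def policy_for(roles):
    """Return the (option, value) pair that applies for the given node roles.

    roles is a two-element sequence such as ("Primary", "Primary").
    """
    primaries = sum(1 for role in roles if role == "Primary")
    return AFTER_SB[primaries]


if __name__ == "__main__":
    # Both nodes run as Primary in this setup (become-primary-on both).
    print(policy_for(("Primary", "Primary")))
```

With both nodes Primary, the model selects after-sb-2pri, which is set to disconnect in this configuration, so no automatic recovery would be attempted in that case.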


*Notes:*
In the net section I use "data-integrity-alg md5;" because data integrity is
very important to me.
"Node A" uses 2 resources and replicates to "Node B".

*These are my logs:*

*Log in Node A:*
Jun 17 05:58:39 kvm5 kernel: dlm: connecting to 4
Jun 17 06:31:14 kvm5 kernel: block drbd0: Digest mismatch, buffer modified by upper layers during write: 21908040s +4096
Jun 17 06:31:14 kvm5 kernel: d-con r0: *sock was shut down by peer*
Jun 17 06:31:14 kvm5 kernel: d-con r0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jun 17 06:31:14 kvm5 kernel: d-con r0: short read (expected size 16)
Jun 17 06:31:14 kvm5 kernel: d-con r0: *meta connection shut down by peer.*
Jun 17 06:31:14 kvm5 kernel: block drbd0: new current UUID 381CAEFBB77A202F:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF
Jun 17 06:31:14 kvm5 kernel: d-con r0: asender terminated
Jun 17 06:31:14 kvm5 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:14 kvm5 kernel: d-con r0: *Connection closed*
Jun 17 06:31:14 kvm5 kernel: d-con r0: conn( BrokenPipe -> Unconnected )
Jun 17 06:31:14 kvm5 kernel: d-con r0: receiver terminated
Jun 17 06:31:14 kvm5 kernel: d-con r0: Restarting receiver thread
Jun 17 06:31:14 kvm5 kernel: d-con r0: receiver (re)started
Jun 17 06:31:14 kvm5 kernel: d-con r0: conn( Unconnected -> WFConnection )
Jun 17 06:31:15 kvm5 kernel: d-con r0: Handshake successful: Agreed network protocol version 101
Jun 17 06:31:15 kvm5 kernel: d-con r0: conn( WFConnection -> WFReportParams )
Jun 17 06:31:15 kvm5 kernel: d-con r0: Starting asender thread (from drbd_r_r0 [117263])
Jun 17 06:31:15 kvm5 kernel: block drbd0: drbd_sync_handshake:
Jun 17 06:31:15 kvm5 kernel: block drbd0: self 381CAEFBB77A202F:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:14 flags:0
Jun 17 06:31:15 kvm5 kernel: block drbd0: peer 96BBC9E849D133DB:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:0 flags:0
Jun 17 06:31:15 kvm5 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm5 kernel: block drbd0: *Split-Brain detected but unresolved, dropping connection!*
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 17 06:31:15 kvm5 kernel: d-con r0: *meta connection shut down by peer.*
Jun 17 06:31:15 kvm5 kernel: d-con r0: conn( WFReportParams -> NetworkFailure )
Jun 17 06:31:15 kvm5 kernel: d-con r0: asender terminated
Jun 17 06:31:15 kvm5 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:15 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm5 kernel: d-con r0: *conn( NetworkFailure -> Disconnecting )*
Jun 17 06:31:15 kvm5 kernel: d-con r0: error receiving ReportState, e: -5 l: 0!
Jun 17 06:31:15 kvm5 kernel: d-con r0: Connection closed
Jun 17 06:31:15 kvm5 kernel: d-con r0: conn( Disconnecting -> StandAlone )
Jun 17 06:31:15 kvm5 kernel: d-con r0: receiver terminated
Jun 17 06:31:15 kvm5 kernel: d-con r0: Terminating receiver thread


*Log in node B:*
Jun 17 05:58:39 kvm6 kernel: dlm: got connection from 3
Jun 17 06:31:14 kvm6 kernel: block drbd0: *Digest integrity check FAILED: 21908040s +4096*
Jun 17 06:31:14 kvm6 kernel: d-con r0: *error receiving Data, e: -5 l: 4112!*
Jun 17 06:31:14 kvm6 kernel: d-con r0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
Jun 17 06:31:14 kvm6 kernel: block drbd0: new current UUID 96BBC9E849D133DB:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF
Jun 17 06:31:14 kvm6 kernel: d-con r0: asender terminated
Jun 17 06:31:14 kvm6 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:14 kvm6 kernel: d-con r0: Connection closed
Jun 17 06:31:14 kvm6 kernel: d-con r0: conn( ProtocolError -> Unconnected )
Jun 17 06:31:14 kvm6 kernel: d-con r0: receiver terminated
Jun 17 06:31:14 kvm6 kernel: d-con r0: Restarting receiver thread
Jun 17 06:31:14 kvm6 kernel: d-con r0: receiver (re)started
Jun 17 06:31:14 kvm6 kernel: d-con r0: conn( Unconnected -> WFConnection )
Jun 17 06:31:15 kvm6 kernel: d-con r0: Handshake successful: Agreed network protocol version 101
Jun 17 06:31:15 kvm6 kernel: d-con r0: conn( WFConnection -> WFReportParams )
Jun 17 06:31:15 kvm6 kernel: d-con r0: Starting asender thread (from drbd_r_r0 [54943])
Jun 17 06:31:15 kvm6 kernel: block drbd0: drbd_sync_handshake:
Jun 17 06:31:15 kvm6 kernel: block drbd0: self 96BBC9E849D133DB:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:0 flags:0
Jun 17 06:31:15 kvm6 kernel: block drbd0: peer 381CAEFBB77A202F:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF bits:14 flags:0
Jun 17 06:31:15 kvm6 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm6 kernel: block drbd0: *Split-Brain detected but unresolved, dropping connection!*
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 17 06:31:15 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 17 06:31:15 kvm6 kernel: d-con r0: conn( WFReportParams -> Disconnecting )
Jun 17 06:31:15 kvm6 kernel: d-con r0: *error receiving ReportState, e: -5 l: 0!*
Jun 17 06:31:15 kvm6 kernel: d-con r0: asender terminated
Jun 17 06:31:15 kvm6 kernel: d-con r0: Terminating asender thread
Jun 17 06:31:15 kvm6 kernel: d-con r0: *Connection closed*
Jun 17 06:31:15 kvm6 kernel: d-con r0: conn( Disconnecting -> StandAlone )
Jun 17 06:31:15 kvm6 kernel: d-con r0: receiver terminated
Jun 17 06:31:15 kvm6 kernel: d-con r0: Terminating receiver thread
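
The UUID sets printed during drbd_sync_handshake can be compared field by field. The following Python sketch does that for the two values logged on kvm5. It assumes the four colon-separated values are, in order, the current UUID, the bitmap UUID, and two history UUIDs; the field names and the helpers split_uuids and compare are labels chosen for this illustration.

```python
# "self" and "peer" UUID sets as logged on kvm5 during drbd_sync_handshake.
SELF = "381CAEFBB77A202F:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF"
PEER = "96BBC9E849D133DB:264FC7B7437F70E5:E50620E2018A2EEF:E50520E2018A2EEF"

# Assumed field order; these names are not taken from the log itself.
FIELDS = ("current", "bitmap", "history1", "history2")


def split_uuids(uuid_set):
    """Split a colon-separated UUID set into {field_name: int}."""
    parts = uuid_set.split(":")
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d UUIDs, got %d" % (len(FIELDS), len(parts)))
    return dict(zip(FIELDS, (int(p, 16) for p in parts)))


def compare(self_set, peer_set):
    """Return {field_name: True if both nodes hold the same value}."""
    a, b = split_uuids(self_set), split_uuids(peer_set)
    return {name: a[name] == b[name] for name in FIELDS}


if __name__ == "__main__":
    for name, same in compare(SELF, PEER).items():
        print(name, "same" if same else "DIFFERENT")
```

Under that assumption, only the first (current) UUID differs between the two nodes, and it matches the "new current UUID" each node logged at 06:31:14 right after the digest failure. The remaining three values are identical on both sides, which fits the picture of both Primaries continuing independently from the same starting point.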

I will be extremely grateful to anyone who can help me.

Best regards
Cesar



