[DRBD-user] List of drbd socket errors

Lars Ellenberg lars.ellenberg at linbit.com
Wed Sep 21 14:15:19 CEST 2011



On Wed, Sep 21, 2011 at 10:08:42AM +1000, Ivan Pavlenko wrote:
> Hi All,
> 
> Recently I had a split brain on my cluster. It was not a big
> issue, but I still haven't found the reason for this glitch. I got the
> following in my log file:

We call it a DRBD resource-internal split brain when there is a period
of time during which the two nodes cannot communicate, _and_ both have
been Primary.

Which means: whenever you run dual-primary DRBD and have a hiccup on
the replication link, that causes a DRBD "split brain", which may be
better read as "potential data-set divergence".
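To make that definition concrete, here is a minimal Python sketch over a
hypothetical timeline of observations. The `Sample` type and the
`split_brain_periods` function are illustrative assumptions, not part of
DRBD. DRBD itself only notices the divergence afterwards, at reconnect
time, by comparing its generation UUIDs (the `drbd_sync_handshake` lines
in the log below). Note that "both have been Primary" does not require
both to be Primary at the same instant.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One hypothetical observation of a two-node resource (illustrative only)."""
    connected: bool   # could the two nodes talk to each other at this moment?
    a_primary: bool   # was node A Primary at this moment?
    b_primary: bool   # was node B Primary at this moment?


def split_brain_periods(samples: List[Sample]) -> List[range]:
    """Return index ranges of disconnected periods during which BOTH nodes
    were Primary at some point (not necessarily simultaneously)."""
    periods = []
    start = None
    # A connected sentinel at the end closes a gap that runs to the end of the timeline.
    for i, s in enumerate(samples + [Sample(True, False, False)]):
        if not s.connected and start is None:
            start = i                      # a disconnected period begins
        elif s.connected and start is not None:
            gap = samples[start:i]         # the whole disconnected period
            if any(x.a_primary for x in gap) and any(x.b_primary for x in gap):
                periods.append(range(start, i))
            start = None
    return periods
```

With dual-primary, any hiccup qualifies:
`split_brain_periods([Sample(True, True, True), Sample(False, True, True), Sample(True, True, True)])`
returns `[range(1, 2)]`. With a single Primary, the same hiccup does not
qualify unless the other node gets promoted while the link is down.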

> Sep 20 18:44:35 infplsm004 <kern.info> kernel: VMCIUtil: Updating context id from 0x775d2835 to 0x775d2835 on event 0.
> Sep 20 18:44:35 infplsm004 <kern.err> kernel: block drbd2: sock_recvmsg returned -104
> Sep 20 18:44:35 infplsm004 <kern.info> kernel: block drbd2: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Sep 20 18:44:35 infplsm004 <kern.info> kernel: block drbd2: asender terminated
> Sep 20 18:44:35 infplsm004 <kern.info> kernel: block drbd2: Terminating asender thread
> Sep 20 18:44:35 infplsm004 <kern.err> kernel: block drbd2: short read expecting header on sock: r=-512
> Sep 20 18:44:35 infplsm004 <kern.info> kernel: block drbd2: Creating new current UUID
> Sep 20 18:44:36 infplsm004 <kern.info> kernel: block drbd2: Connection closed
> Sep 20 18:44:36 infplsm004 <kern.info> kernel: block drbd2: conn( NetworkFailure -> Unconnected )
> Sep 20 18:44:36 infplsm004 <kern.info> kernel: block drbd2: receiver terminated
> Sep 20 18:44:36 infplsm004 <kern.info> kernel: block drbd2: Restarting receiver thread
> Sep 20 18:44:36 infplsm004 <kern.info> kernel: block drbd2: receiver (re)started
> Sep 20 18:44:36 infplsm004 <kern.info> kernel: block drbd2: conn( Unconnected -> WFConnection )
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: Handshake successful: Agreed network protocol version 94
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: conn( WFConnection -> WFReportParams )
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: Starting asender thread (from drbd2_receiver [11360])
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: data-integrity-alg: <not-used>
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: drbd_sync_handshake:
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: self AD9C020C7BA6E149:51B8CD59E67A7227:01C987FB5F84C0D1:30241D96D32A31CF bits:1 flags:0
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: peer A2111F74640A099D:51B8CD59E67A7227:01C987FB5F84C0D0:30241D96D32A31CF bits:0 flags:0
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: uuid_compare()=100 by rule 90
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: helper command: /sbin/drbdadm initial-split-brain minor-2
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: helper command: /sbin/drbdadm initial-split-brain minor-2 exit code 0 (0x0)
> Sep 20 18:44:38 infplsm004 <kern.alert> kernel: block drbd2: Split-Brain detected but unresolved, dropping connection!
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: helper command: /sbin/drbdadm split-brain minor-2
> Sep 20 18:44:38 infplsm004 <kern.err> kernel: block drbd2: meta connection shut down by peer.
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: conn( WFReportParams -> NetworkFailure )
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: asender terminated
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: Terminating asender thread
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: helper command: /sbin/drbdadm split-brain minor-2 exit code 0 (0x0)
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: conn( NetworkFailure -> Disconnecting )
> Sep 20 18:44:38 infplsm004 <kern.err> kernel: block drbd2: error receiving ReportState, l: 4!
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: Connection closed
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: conn( Disconnecting -> StandAlone )
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: receiver terminated
> Sep 20 18:44:38 infplsm004 <kern.info> kernel: block drbd2: Terminating receiver thread
> 
> I'd like to draw your attention to the first two log entries. The
> DRBD socket receive call returned error code -104. What does it mean?
> Where can I find information about these error codes?

These are typically ordinary negative errno values;
on my box, 104 is ECONNRESET, "Connection reset by peer".
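As an illustration, the negative return values in such kernel log lines
can be decoded with Python's standard `errno` and `os` modules. This is a
minimal sketch; `decode`, `decode_log_line` and `KERNEL_INTERNAL` are
hypothetical helpers. Python's `errno` table reflects the platform it runs
on, so run this on Linux to interpret a Linux kernel log ("on my box"
matters: other systems number some errors differently). The `r=-512` in
the log is an assumption-labeled special case: 512 is the Linux
kernel-internal code ERESTARTSYS, which never reaches user space and is
therefore absent from the `errno` module.

```python
import errno
import os
import re

# Assumption: kernel-internal codes (from the Linux kernel's include/linux/errno.h)
# that user-space errno tables, and hence Python's errno module, do not contain.
KERNEL_INTERNAL = {512: "ERESTARTSYS"}

# Matches the two shapes seen in the log above: "returned -104" and "r=-512".
RET_RE = re.compile(r"(?:returned |r=)(-\d+)")


def decode(ret: int) -> tuple:
    """Map a negative kernel return value such as -104 to (name, description)."""
    code = -ret
    if code in errno.errorcode:
        return errno.errorcode[code], os.strerror(code)
    if code in KERNEL_INTERNAL:
        return KERNEL_INTERNAL[code], "kernel-internal code, not a user-space errno"
    return "UNKNOWN", os.strerror(code)


def decode_log_line(line: str):
    """Decode the first return value found in a log line, or None if there is none."""
    m = RET_RE.search(line)
    return decode(int(m.group(1))) if m else None
```

On a Linux box, `decode_log_line("block drbd2: sock_recvmsg returned -104")`
gives `('ECONNRESET', 'Connection reset by peer')`. The shell equivalent is
simply looking the number up in errno(3) or the system's `errno.h`.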

> 
> Thank you in advance,
> Ivan
> 
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


