[DRBD-user] possible split brain, neither node will promote to primary

Lonni J Friedman netllama at gmail.com
Thu Oct 17 21:08:58 CEST 2013



Greetings,
While I was recently out of the office, one of our DRBD clusters
(v8.4.2) experienced a power outage, which left things in an
inconsistent state.  The resource name is 'sdb'.  Neither node can be
promoted to primary manually; both fail with the error "Need access to
UpToDate data":
# drbdadm primary sdb
0: State change failed: (-2) Need access to UpToDate data
Command 'drbdsetup primary 0' terminated with exit code 17


If I invalidate (discard the changes on) the node that should be
secondary, that seems to work, insofar as there are no errors:
# drbdadm invalidate sdb

However, I'm still unable to promote the other node to primary:
# drbdadm primary sdb
0: State change failed: (-2) Need access to UpToDate data
Command 'drbdsetup primary 0' terminated with exit code 17

In dmesg, I see:

[ 1727.904874] block drbd0: State change failed: Need access to UpToDate data
[ 1727.959118] block drbd0:   state = { cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----- }
[ 1728.069711] block drbd0:  wanted = { cs:StandAlone ro:Primary/Unknown ds:Inconsistent/DUnknown r----- }
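
For reference, the two state lines above can be compared mechanically. The following is a minimal Python sketch, not a DRBD tool: the helper names `parse_state` and `has_uptodate_access` are hypothetical, and the check is a simplified reading of the error message (promotion needs an UpToDate disk either locally or on a connected peer), not DRBD's actual state-validation logic.

```python
import re

def parse_state(line):
    """Extract cs/ro/ds fields from a DRBD '{ cs:... ro:... ds:... }' log line."""
    body = re.search(r"\{(.*)\}", line).group(1)
    fields = {}
    for token in body.split():
        # Tokens such as 'r-----' carry no key and are skipped here.
        if ":" in token:
            key, value = token.split(":", 1)
            fields[key] = value
    local_role, peer_role = fields["ro"].split("/")
    local_disk, peer_disk = fields["ds"].split("/")
    return {
        "cs": fields["cs"],
        "local_role": local_role,
        "peer_role": peer_role,
        "local_disk": local_disk,
        "peer_disk": peer_disk,
    }

def has_uptodate_access(state):
    """Simplified assumption: some reachable disk must be UpToDate."""
    return "UpToDate" in (state["local_disk"], state["peer_disk"])

current = parse_state(
    "state = { cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----- }"
)
wanted = parse_state(
    "wanted = { cs:StandAlone ro:Primary/Unknown ds:Inconsistent/DUnknown r----- }"
)

# Fields that differ between the current and the requested state.
changed = {k: (current[k], wanted[k]) for k in current if current[k] != wanted[k]}
print(changed)
print(has_uptodate_access(current))
```

Only the local role differs between the two lines, and neither disk state is UpToDate, which matches the wording of the rejection.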


On the node that I want to be primary:
# drbd-overview
  0:sdb/0  WFConnection Secondary/Unknown Inconsistent/DUnknown C r-----

On the node that I want to be secondary:
# drbd-overview
  0:sdb/0  StandAlone Secondary/Unknown Inconsistent/DUnknown r----s
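
To compare the two nodes side by side, the `drbd-overview` lines above can be split into fields. This is a rough Python sketch that assumes the whitespace-separated layout shown in this message; `parse_overview_line` is a hypothetical helper, and the column meanings are inferred from these two lines only (the protocol column `C` appears on one node and not on the other).

```python
def parse_overview_line(line):
    """Split one drbd-overview line, as shown above, into named fields."""
    parts = line.split()
    minor, _, rest = parts[0].partition(":")
    resource = rest.split("/")[0]
    local_role, peer_role = parts[2].split("/")
    local_disk, peer_disk = parts[3].split("/")
    # The protocol letter is present in only one of the two sample lines.
    protocol = parts[4] if len(parts) == 6 else None
    return {
        "minor": int(minor),
        "resource": resource,
        "cs": parts[1],
        "local_role": local_role,
        "peer_role": peer_role,
        "local_disk": local_disk,
        "peer_disk": peer_disk,
        "protocol": protocol,
        "flags": parts[-1],
    }

intended_primary = parse_overview_line(
    "  0:sdb/0  WFConnection Secondary/Unknown Inconsistent/DUnknown C r-----"
)
intended_secondary = parse_overview_line(
    "  0:sdb/0  StandAlone Secondary/Unknown Inconsistent/DUnknown r----s"
)

for name, node in (("primary", intended_primary), ("secondary", intended_secondary)):
    print(name, node["cs"], node["local_disk"], node["flags"])
```

Both nodes report a local disk state of Inconsistent, so neither side currently holds data that DRBD considers UpToDate.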


If I explicitly disconnect and then connect on the secondary node, I see
the following in dmesg on the intended primary.  This suggests that the
nodes can talk to each other just fine; otherwise the primary would never
know what I was running on the secondary:
Oct 17 12:07:17 cuda-fs2a kernel: [ 1591.170600] d-con sdb: Handshake successful: Agreed network protocol version 101
Oct 17 12:07:17 cuda-fs2a kernel: [ 1591.282104] d-con sdb: conn( WFConnection -> WFReportParams )
Oct 17 12:07:17 cuda-fs2a kernel: [ 1591.340240] d-con sdb: Starting asender thread (from drbd_r_sdb [5253])
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.417309] block drbd0: drbd_sync_handshake:
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.476699] block drbd0: self B381016E97733504:0000000000000000:3B8A9D576A28E4D5:3BABCA99DD3A7BFC bits:488338885 flags:0
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.600033] block drbd0: peer BAF0602E951BBE80:B381016E97733504:3B8A9D576A28E4D4:3BABCA99DD3A7BFC bits:488338885 flags:2
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.731182] block drbd0: uuid_compare()=-1 by rule 50
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.796879] block drbd0: Implicitly upgraded pdsk
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.861511] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 12:07:18 cuda-fs2a kernel: [ 1592.048675] d-con sdb: sock was shut down by peer
Oct 17 12:07:18 cuda-fs2a kernel: [ 1592.048695] d-con sdb: meta connection shut down by peer.
Oct 17 12:07:18 cuda-fs2a kernel: [ 1592.048742] d-con sdb: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 17 12:07:18 cuda-fs2a kernel: [ 1592.048947] d-con sdb: asender terminated
Oct 17 12:07:18 cuda-fs2a kernel: [ 1592.048949] d-con sdb: Terminating drbd_a_sdb
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.415501] d-con sdb: Connection closed
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.472362] d-con sdb: conn( NetworkFailure -> Unconnected )
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.530239] d-con sdb: receiver terminated
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.587307] d-con sdb: Restarting receiver thread
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.644489] d-con sdb: receiver (re)started
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.701116] d-con sdb: conn( Unconnected -> WFConnection )
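
The `self` and `peer` UUID lines in the handshake above can be lined up programmatically to see which generation identifiers the two nodes share. The Python sketch below rests on two labeled assumptions rather than confirmed DRBD behavior: that the four colon-separated values are ordered current, bitmap, then two history entries, and that the lowest bit of each value is a flag and not part of the identifier. The names `parse_uuid_line` and `same_generation` are hypothetical.

```python
import re

UUID_LINE = re.compile(
    r"(self|peer)\s+((?:[0-9A-F]{16}:){3}[0-9A-F]{16})\s+bits:(\d+)\s+flags:(\d+)"
)
# Assumed slot order; not confirmed by the log itself.
SLOTS = ("current", "bitmap", "history_1", "history_2")

def parse_uuid_line(line):
    """Return (who, {slot: int}, bits, flags) for a handshake UUID line."""
    who, uuids, bits, flags = UUID_LINE.search(line).groups()
    values = [int(u, 16) for u in uuids.split(":")]
    return who, dict(zip(SLOTS, values)), int(bits), int(flags)

def same_generation(a, b):
    """Compare two UUIDs, ignoring the lowest bit (assumed to be a flag)."""
    return (a & ~1) == (b & ~1)

_, self_uuids, _, _ = parse_uuid_line(
    "block drbd0: self B381016E97733504:0000000000000000:"
    "3B8A9D576A28E4D5:3BABCA99DD3A7BFC bits:488338885 flags:0"
)
_, peer_uuids, _, _ = parse_uuid_line(
    "block drbd0: peer BAF0602E951BBE80:B381016E97733504:"
    "3B8A9D576A28E4D4:3BABCA99DD3A7BFC bits:488338885 flags:2"
)

# Pairs of (self slot, peer slot) that carry the same identifier;
# all-zero slots on the local side are ignored.
matches = [
    (s, p)
    for s in SLOTS
    for p in SLOTS
    if self_uuids[s] and same_generation(self_uuids[s], peer_uuids[p])
]
print(matches)
```

Under those assumptions, this node's current UUID equals the peer's bitmap UUID, which would be consistent with the peer having moved one generation ahead and with the log's transition to WFBitMapT before the peer closed the connection.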


At this point, I'm quite confused, and not sure how to get things
working again.  Help?

thanks


