Greetings,

While I was recently out of the office, one of our DRBD clusters (v8.4.2) experienced a power outage, which left things in an inconsistent state. The resource name is 'sdb'. Both nodes refuse to be promoted to primary manually, failing with the error "Need access to UpToDate data":

# drbdadm primary sdb
0: State change failed: (-2) Need access to UpToDate data
Command 'drbdsetup primary 0' terminated with exit code 17

If I try invalidating/discarding the changes on the node that should be secondary, that seems to work, insofar as there are no errors:

# drbdadm invalidate sdb

However, I'm still unable to promote the other node to primary:

# drbdadm primary sdb
0: State change failed: (-2) Need access to UpToDate data
Command 'drbdsetup primary 0' terminated with exit code 17

In dmesg, I see:

[ 1727.904874] block drbd0: State change failed: Need access to UpToDate data
[ 1727.959118] block drbd0: state = { cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----- }
[ 1728.069711] block drbd0: wanted = { cs:StandAlone ro:Primary/Unknown ds:Inconsistent/DUnknown r----- }

On the node that I want to be primary:

# drbd-overview
0:sdb/0 WFConnection Secondary/Unknown Inconsistent/DUnknown C r-----

On the node that I want to be secondary:

# drbd-overview
0:sdb/0 StandAlone Secondary/Unknown Inconsistent/DUnknown r----s

If I explicitly disconnect and then connect on the secondary node, I see the following in dmesg on the primary (which suggests that they can talk to each other just fine, or the primary would never know what I was running on the secondary):

Oct 17 12:07:17 cuda-fs2a kernel: [ 1591.170600] d-con sdb: Handshake successful: Agreed network protocol version 101
Oct 17 12:07:17 cuda-fs2a kernel: [ 1591.282104] d-con sdb: conn( WFConnection -> WFReportParams )
Oct 17 12:07:17 cuda-fs2a kernel: [ 1591.340240] d-con sdb: Starting asender thread (from drbd_r_sdb [5253])
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.417309] block drbd0: drbd_sync_handshake:
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.476699] block drbd0: self B381016E97733504:0000000000000000:3B8A9D576A28E4D5:3BABCA99DD3A7BFC bits:488338885 flags:0
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.600033] block drbd0: peer BAF0602E951BBE80:B381016E97733504:3B8A9D576A28E4D4:3BABCA99DD3A7BFC bits:488338885 flags:2
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.731182] block drbd0: uuid_compare()=-1 by rule 50
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.796879] block drbd0: Implicitly upgraded pdsk
Oct 17 12:07:18 cuda-fs2a kernel: [ 1591.861511] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 12:07:18 cuda-fs2a kernel: [ 1592.048675] d-con sdb: sock was shut down by peer
Oct 17 12:07:18 cuda-fs2a kernel: [ 1592.048695] d-con sdb: meta connection shut down by peer.
Oct 17 12:07:18 cuda-fs2a kernel: [ 1592.048742] d-con sdb: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 17 12:07:18 cuda-fs2a kernel: [ 1592.048947] d-con sdb: asender terminated
Oct 17 12:07:18 cuda-fs2a kernel: [ 1592.048949] d-con sdb: Terminating drbd_a_sdb
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.415501] d-con sdb: Connection closed
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.472362] d-con sdb: conn( NetworkFailure -> Unconnected )
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.530239] d-con sdb: receiver terminated
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.587307] d-con sdb: Restarting receiver thread
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.644489] d-con sdb: receiver (re)started
Oct 17 12:07:19 cuda-fs2a kernel: [ 1592.701116] d-con sdb: conn( Unconnected -> WFConnection )

At this point, I'm quite confused, and not sure how to get things working again. Help?

thanks
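
For readers following the "state = { ... }" lines in the dmesg output above, here is a minimal sketch of how the cs/ro/ds fields could be pulled apart in a script. The names `parse_state` and `has_uptodate_data` are hypothetical helpers, not part of DRBD, and the promotion check is only a simplification read off the error text ("Need access to UpToDate data"), not DRBD's actual state-validation logic.

```python
import re

# Matches the brace-delimited state that appears in the dmesg lines above, e.g.
# "state = { cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----- }"
STATE_RE = re.compile(
    r"\{\s*cs:(?P<cs>\S+)"
    r"\s+ro:(?P<ro_self>[^/\s]+)/(?P<ro_peer>\S+)"
    r"\s+ds:(?P<ds_self>[^/\s]+)/(?P<ds_peer>\S+)"
    r"\s+(?P<flags>\S+)\s*\}"
)


def parse_state(line):
    """Return the connection, role, and disk fields as a dict, or None."""
    match = STATE_RE.search(line)
    return match.groupdict() if match else None


def has_uptodate_data(state):
    """Assumption: promotion needs UpToDate data locally or on the peer."""
    return "UpToDate" in (state["ds_self"], state["ds_peer"])


if __name__ == "__main__":
    line = ("[ 1727.959118] block drbd0: state = { cs:StandAlone "
            "ro:Secondary/Unknown ds:Inconsistent/DUnknown r----- }")
    state = parse_state(line)
    print(state)
    print("UpToDate data reachable:", has_uptodate_data(state))
```

Run against the "state" line from the log, the local disk comes out as Inconsistent and the peer as DUnknown, which matches the refusal reported by drbdadm.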