[DRBD-user] DRBD sync stalled at 100% ?

Eric Robinson eric.robinson at psmnv.com
Sat Jun 27 15:37:50 CEST 2020


Sorry for cross-posting this, but I'm not sure which list is the right one.

I can't find anything on Google about this. Two DRBD nodes lost communication with each other, then reconnected and started a resync. The resync of ha01_mysql reached 100% and has stalled there: the peer disk is still reported as Inconsistent.

The nodes are 001db03a, 001db03b.

On 001db03a:

[root@001db03a ~]# drbdadm status
ha01_mysql role:Primary
  disk:UpToDate
  001db03b role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:100.00

ha02_mysql role:Secondary
  disk:UpToDate
  001db03b role:Primary
    peer-disk:UpToDate

On 001db03b:

[root@001db03b ~]# drbdadm status
ha01_mysql role:Secondary
  disk:Inconsistent
  001db03a role:Primary
    replication:SyncTarget peer-disk:UpToDate done:100.00

ha02_mysql role:Primary
  disk:UpToDate
  001db03a role:Secondary
    peer-disk:UpToDate
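
In the status output above, the stall shows up as a peer that is still in SyncSource or SyncTarget although done: already reads 100.00. For anyone who wants to check for that condition automatically, here is a minimal Python sketch. The function name stalled_resyncs and the parsing rules are my own assumptions, based only on the output format shown here and not on any DRBD API.

```python
def stalled_resyncs(status_text):
    """Return (resource, peer, replication_state) for every peer that is
    still resyncing although the reported progress is already 100%."""
    stalled = []
    resource = peer = None
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if not line[0].isspace():
            # Unindented line: "<resource> role:<role>"
            resource, peer = fields[0], None
        elif ":" not in fields[0]:
            # Indented line whose first token is a bare name: "<peer> role:<role>"
            peer = fields[0]
        else:
            # Indented line made only of key:value pairs
            kv = dict(f.split(":", 1) for f in fields)
            if (kv.get("replication") in ("SyncSource", "SyncTarget")
                    and float(kv.get("done", 0)) >= 100.0):
                stalled.append((resource, peer, kv["replication"]))
    return stalled


# Output of "drbdadm status" on 001db03a, copied from this post.
SAMPLE = """\
ha01_mysql role:Primary
  disk:UpToDate
  001db03b role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:100.00

ha02_mysql role:Secondary
  disk:UpToDate
  001db03b role:Primary
    peer-disk:UpToDate
"""

print(stalled_resyncs(SAMPLE))
# [('ha01_mysql', '001db03b', 'SyncSource')]
```

This only reads text that has already been captured, so it can be run against saved output without touching the cluster.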


On 001db03a, here are the DRBD messages from the onset of the problem until now.

Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: PingAck did not arrive in time.
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( UpToDate -> Consistent )
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: ack_receiver terminated
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Terminating ack_recv thread
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql: Preparing cluster-wide state change 2946943372 (1->-1 0/0)
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql: Committing cluster-wide state change 2946943372 (6ms)
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( Consistent -> UpToDate )
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Connection closed
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn( NetworkFailure -> Unconnected )
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Restarting receiver thread
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn( Unconnected -> Connecting )
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: PingAck did not arrive in time.
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: ack_receiver terminated
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Terminating ack_recv thread
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql/0 drbd0: new current UUID: D07A3D4B2F99832D weak: FFFFFFFFFFFFFFFD
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Connection closed
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn( NetworkFailure -> Unconnected )
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Restarting receiver thread
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn( Unconnected -> Connecting )
Jun 26 22:34:33 001db03a pengine[1474]:  notice:  * Start      p_drbd0:1        (                 001db03b )
Jun 26 22:34:33 001db03a crmd[1475]:  notice: Initiating notify operation p_drbd0_pre_notify_start_0 locally on 001db03a
Jun 26 22:34:33 001db03a crmd[1475]:  notice: Result of notify operation for p_drbd0 on 001db03a: 0 (ok)
Jun 26 22:34:33 001db03a crmd[1475]:  notice: Initiating start operation p_drbd0_start_0 on 001db03b
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Handshake to peer 0 successful: Agreed network protocol version 113
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Starting ack_recv thread (from drbd_r_ha02_mys [2116])
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Preparing remote state change 3920461435
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Committing remote state change 3920461435 (primary_nodes=1)
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: conn( Connecting -> Connected ) peer( Unknown -> Primary )
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( UpToDate -> Outdated )
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: drbd_sync_handshake:
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: self 492F8D33A72A8E08:0000000000000000:659DC04F5C85B6E4:8254EEA2EC50AD7C bits:0 flags:120
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: peer 5A6B1EBE80500C39:492F8D33A72A8E09:659DC04F5C85B6E4:51A00A23ED88187A bits:1 flags:120
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: uuid_compare()=-2 by rule 50
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: helper command: /sbin/drbdadm before-resync-target
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: helper command: /sbin/drbdadm before-resync-target exit code 0 (0x0)
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( Outdated -> Inconsistent )
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: repl( WFBitMapT -> SyncTarget )
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: Began resync as SyncTarget (will sync 4 KB [1 bits set]).
Jun 26 22:34:35 001db03a crmd[1475]:  notice: Initiating notify operation p_drbd0_post_notify_start_0 locally on 001db03a
Jun 26 22:34:35 001db03a crmd[1475]:  notice: Initiating notify operation p_drbd0_post_notify_start_0 on 001db03b
Jun 26 22:34:35 001db03a crmd[1475]:  notice: Transition aborted by status-2-master-p_drbd0 doing create master-p_drbd0=10000: Transient attribute change
Jun 26 22:34:35 001db03a crmd[1475]:  notice: Result of notify operation for p_drbd0 on 001db03a: 0 (ok)
Jun 26 22:34:35 001db03a crmd[1475]:  notice: Initiating monitor operation p_drbd0_monitor_60000 on 001db03b
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: updated UUIDs 5A6B1EBE80500C38:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( Inconsistent -> UpToDate )
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: repl( SyncTarget -> Established )
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: helper command: /sbin/drbdadm after-resync-target
Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: helper command: /sbin/drbdadm after-resync-target exit code 0 (0x0)
Jun 26 22:34:35 001db03a crmd[1475]:  notice: Transition aborted by status-2-master-p_drbd0 doing modify master-p_drbd0=1000: Transient attribute change
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Handshake to peer 0 successful: Agreed network protocol version 113
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Starting ack_recv thread (from drbd_r_ha01_mys [2110])
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Preparing remote state change 3458191960
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: Committing remote state change 3458191960 (primary_nodes=2)
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql 001db03b: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: drbd_sync_handshake:
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: self D07A3D4B2F99832D:50AE57670FCB98C3:7DDDDEEEEEA477C4:B75C5B6B7AAFBB6A bits:22 flags:120
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: peer 50AE57670FCB98C2:0000000000000000:7DDDDEEEEEA477C4:D2AAA82A5FF6EE84 bits:0 flags:20
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: uuid_compare()=2 by rule 70
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: pdsk( DUnknown -> Consistent ) repl( Off -> WFBitMapS )
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 83(1), total 83; compression: 100.0%
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: pdsk( Consistent -> Outdated )
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 83(1), total 83; compression: 100.0%
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: helper command: /sbin/drbdadm before-resync-source
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: helper command: /sbin/drbdadm before-resync-source exit code 0 (0x0)
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: Began resync as SyncSource (will sync 212 KB [53 bits set]).
Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: sock was shut down by peer
Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: conn( Connected -> BrokenPipe ) peer( Primary -> Unknown )
Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( UpToDate -> Consistent )
Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: meta connection shut down by peer.
Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: ack_receiver terminated
Jun 26 22:36:35 001db03a kernel: drbd ha02_mysql 001db03b: Terminating ack_recv thread
Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql: Preparing cluster-wide state change 2546365252 (1->-1 0/0)
Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql: Committing cluster-wide state change 2546365252 (9ms)
Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( Consistent -> UpToDate )
Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: Connection closed
Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: conn( BrokenPipe -> Unconnected )
Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: Restarting receiver thread
Jun 26 22:36:39 001db03a kernel: drbd ha02_mysql 001db03b: conn( Unconnected -> Connecting )
Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Handshake to peer 0 successful: Agreed network protocol version 113
Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Starting ack_recv thread (from drbd_r_ha02_mys [2116])
Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Preparing remote state change 1109150886
Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: Committing remote state change 1109150886 (primary_nodes=1)
Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql 001db03b: conn( Connecting -> Connected ) peer( Unknown -> Primary )
Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: drbd_sync_handshake:
Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: self 5A6B1EBE80500C38:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4 bits:0 flags:120
Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: peer 5A6B1EBE80500C39:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4 bits:0 flags:120
Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: uuid_compare()=0 by rule 38
Jun 26 22:36:40 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: PingAck did not arrive in time.
Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( UpToDate -> Consistent )
Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: ack_receiver terminated
Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: Terminating ack_recv thread
Jun 26 22:39:41 001db03a kernel: drbd ha02_mysql 001db03b: sock was shut down by peer
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql: Preparing cluster-wide state change 3067178175 (1->-1 0/0)
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql: Committing cluster-wide state change 3067178175 (8ms)
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( Consistent -> UpToDate )
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Connection closed
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: conn( NetworkFailure -> Unconnected )
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Restarting receiver thread
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: conn( Unconnected -> Connecting )
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Handshake to peer 0 successful: Agreed network protocol version 113
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Starting ack_recv thread (from drbd_r_ha02_mys [2116])
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Preparing remote state change 2747304939
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: Committing remote state change 2747304939 (primary_nodes=1)
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql 001db03b: conn( Connecting -> Connected ) peer( Unknown -> Primary )
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: drbd_sync_handshake:
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: self 5A6B1EBE80500C38:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4 bits:0 flags:120
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: peer 5A6B1EBE80500C39:0000000000000000:492F8D33A72A8E08:659DC04F5C85B6E4 bits:0 flags:120
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: uuid_compare()=0 by rule 38
Jun 26 22:39:45 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
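
One thing that can be checked mechanically in the messages above is whether every "Began resync" line has a matching "Resync done" line for the same resource. Below is a rough Python sketch, assuming the kernel log format shown here. The regular expressions and the name unfinished_resyncs are hypothetical and not part of DRBD.

```python
import re

# Patterns derived from the log lines in this post (an assumption, not a DRBD spec).
BEGAN = re.compile(r"drbd (\S+)/\d+ \S+ \S+: Began resync as (SyncSource|SyncTarget)")
DONE = re.compile(r"drbd (\S+)/\d+ \S+ \S+: Resync done")


def unfinished_resyncs(log_lines):
    """Map each resource to its resync role if a resync began but no
    later 'Resync done' message was logged for that resource."""
    open_resyncs = {}
    for line in log_lines:
        began = BEGAN.search(line)
        if began:
            open_resyncs[began.group(1)] = began.group(2)
            continue
        done = DONE.search(line)
        if done:
            open_resyncs.pop(done.group(1), None)
    return open_resyncs


# The three resync-related lines from the log above.
LOG = [
    "Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: Began resync as SyncTarget (will sync 4 KB [1 bits set]).",
    "Jun 26 22:34:35 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: Resync done (total 1 sec; paused 0 sec; 4 K/sec)",
    "Jun 26 22:34:35 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: Began resync as SyncSource (will sync 212 KB [53 bits set]).",
]

print(unfinished_resyncs(LOG))
# {'ha01_mysql': 'SyncSource'}
```

Fed those three lines, it reports ha01_mysql, which began its resync as SyncSource at 22:34:35 and has no completion message in the excerpt, while ha02_mysql finished within a second.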

