[DRBD-user] "drbdadm disconnect; drbdadm connect", and its out of sync again

Harald Dunkel harald.dunkel at aixigo.com
Wed Aug 16 15:13:23 CEST 2023


Hi folks,

I have introduced sync checks of my drbd volumes run by cron once
per month. Result: Most of my volumes are out of sync.

AFAIU the fix is to disconnect the secondary volume and connect it
again. drbd will sync the lost chunks of data automagically. The
output of cat /proc/drbd shows that it does.

Problem is, a "drbdadm verify data2" run immediately after the sync
shows that the volume is still (or again?) out of sync.

# cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: A2B7326333D8D6BCA07314C

  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
     ns:1796033864 nr:0 dw:1795344536 dr:687121205 al:573908 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
  2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
     ns:0 nr:72537008 dw:1479579756 dr:649638416 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:20

Please note the "oos:20". Below you can find the log file of the
first and second verify session, run on the secondary.

This is not what I had expected. I had assumed that the primary
keeps track about the sectors it has to sync to the secondary, if
the secondary is offline, no matter what. When the secondary is
connected again, *all* out-of-sync sectors are copied.

How comes? Is my kernel too old (5.10.179-1, as provided by
Debian 11)? How can I improve reliability?


Regards

Harri
--------------------------------------------------------------------
[Sun Aug 13 07:37:12 2023] block drbd2: Out of sync: start=1040747880, size=8 (sectors)
[Sun Aug 13 07:37:18 2023] block drbd2: Out of sync: start=1041586840, size=8 (sectors)
[Sun Aug 13 07:37:21 2023] block drbd2: Out of sync: start=1041920856, size=8 (sectors)
[Sun Aug 13 07:50:54 2023] block drbd2: Out of sync: start=1146912168, size=8 (sectors)
[Sun Aug 13 07:53:29 2023] block drbd2: Out of sync: start=1166579112, size=8 (sectors)
[Sun Aug 13 07:53:30 2023] block drbd2: Out of sync: start=1166581448, size=8 (sectors)
:
:
[Mon Aug 14 14:52:34 2023] block drbd2: Out of sync: start=15145086480, size=8 (sectors)
[Mon Aug 14 14:52:54 2023] block drbd2: Out of sync: start=15147457080, size=8 (sectors)
[Mon Aug 14 15:12:00 2023] block drbd2: Out of sync: start=15288564184, size=8 (sectors)
[Mon Aug 14 15:34:04 2023] block drbd2: Out of sync: start=15452534120, size=8 (sectors)
[Mon Aug 14 15:47:24 2023] block drbd2: Out of sync: start=15554437512, size=8 (sectors)
[Mon Aug 14 16:04:50 2023] block drbd2: Online verify done (total 122673 sec; paused 0 sec; 62616 K/sec)
[Mon Aug 14 16:04:50 2023] block drbd2: Online verify found 190 4k block out of sync!
[Mon Aug 14 16:04:50 2023] block drbd2: conn( VerifyS -> Connected )
[Mon Aug 14 16:04:50 2023] block drbd2: helper command: /sbin/drbdadm out-of-sync minor-2
[Mon Aug 14 16:04:50 2023] block drbd2: helper command: /sbin/drbdadm out-of-sync minor-2 exit code 0 (0x0)
[Mon Aug 14 22:54:48 2023] drbd data2: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
[Mon Aug 14 22:54:48 2023] drbd data2: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
[Mon Aug 14 22:54:48 2023] drbd data2: ack_receiver terminated
[Mon Aug 14 22:54:48 2023] drbd data2: Terminating drbd_a_data2
[Mon Aug 14 22:54:48 2023] drbd data2: Connection closed
[Mon Aug 14 22:54:48 2023] drbd data2: conn( Disconnecting -> StandAlone )
[Mon Aug 14 22:54:48 2023] drbd data2: receiver terminated
[Mon Aug 14 22:54:48 2023] drbd data2: Terminating drbd_r_data2
[Mon Aug 14 22:54:53 2023] drbd data2: conn( StandAlone -> Unconnected )
[Mon Aug 14 22:54:53 2023] drbd data2: Starting receiver thread (from drbd_w_data2 [1472])
[Mon Aug 14 22:54:53 2023] drbd data2: receiver (re)started
[Mon Aug 14 22:54:53 2023] drbd data2: conn( Unconnected -> WFConnection )
[Mon Aug 14 22:54:53 2023] drbd data2: Handshake successful: Agreed network protocol version 101
[Mon Aug 14 22:54:53 2023] drbd data2: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[Mon Aug 14 22:54:53 2023] drbd data2: Peer authenticated using 32 bytes HMAC
[Mon Aug 14 22:54:53 2023] drbd data2: conn( WFConnection -> WFReportParams )
[Mon Aug 14 22:54:53 2023] drbd data2: Starting ack_recv thread (from drbd_r_data2 [2092105])
[Mon Aug 14 22:54:53 2023] block drbd2: drbd_sync_handshake:
[Mon Aug 14 22:54:53 2023] block drbd2: self 691455693E5AB756:0000000000000000:5ECB208B898522BC:5ECA208B898522BD bits:190 flags:0
[Mon Aug 14 22:54:53 2023] block drbd2: peer 1D95BBC8D3EE5A27:691455693E5AB757:5ECB208B898522BD:5ECA208B898522BD bits:360 flags:0
[Mon Aug 14 22:54:53 2023] block drbd2: uuid_compare()=-1 by rule 50
[Mon Aug 14 22:54:53 2023] block drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
[Mon Aug 14 22:54:53 2023] block drbd2: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 1023(1), total 1023; compression: 100.0%
[Mon Aug 14 22:54:53 2023] block drbd2: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 1023(1), total 1023; compression: 100.0%
[Mon Aug 14 22:54:53 2023] block drbd2: conn( WFBitMapT -> WFSyncUUID )
[Mon Aug 14 22:54:53 2023] block drbd2: updated sync uuid 691555693E5AB756:0000000000000000:5ECB208B898522BC:5ECA208B898522BD
[Mon Aug 14 22:54:53 2023] block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2
[Mon Aug 14 22:54:53 2023] block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2 exit code 0 (0x0)
[Mon Aug 14 22:54:53 2023] block drbd2: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
[Mon Aug 14 22:54:53 2023] block drbd2: Began resync as SyncTarget (will sync 1440 KB [360 bits set]).
[Mon Aug 14 22:54:54 2023] block drbd2: Resync done (total 1 sec; paused 0 sec; 1440 K/sec)
[Mon Aug 14 22:54:54 2023] block drbd2: updated UUIDs 1D95BBC8D3EE5A26:0000000000000000:691555693E5AB756:691455693E5AB757
[Mon Aug 14 22:54:54 2023] block drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
[Mon Aug 14 22:54:54 2023] block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2
[Mon Aug 14 22:54:54 2023] block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2 exit code 0 (0x0)
[Mon Aug 14 22:55:33 2023] block drbd2: conn( Connected -> VerifyS )
[Mon Aug 14 22:55:33 2023] block drbd2: Starting Online Verify from sector 0
[Tue Aug 15 09:11:58 2023] block drbd2: Out of sync: start=4723202472, size=8 (sectors)
[Tue Aug 15 11:02:05 2023] block drbd2: Out of sync: start=5507952664, size=8 (sectors)
[Tue Aug 15 11:02:23 2023] block drbd2: Out of sync: start=5510085128, size=8 (sectors)
[Tue Aug 15 13:19:38 2023] block drbd2: Out of sync: start=6543445712, size=8 (sectors)
[Tue Aug 15 15:43:44 2023] block drbd2: Out of sync: start=7428513912, size=8 (sectors)
[Tue Aug 15 15:43:44 2023] block drbd2: Out of sync: start=7428513976, size=8 (sectors)
[Tue Aug 15 19:33:12 2023] block drbd2: Out of sync: start=9106431560, size=8 (sectors)
[Wed Aug 16 09:33:27 2023] block drbd2: Out of sync: start=15573735816, size=8 (sectors)
[Wed Aug 16 09:47:57 2023] block drbd2: Online verify done (total 125543 sec; paused 0 sec; 62456 K/sec)
[Wed Aug 16 09:47:57 2023] block drbd2: Online verify found 5 4k block out of sync!
[Wed Aug 16 09:47:57 2023] block drbd2: conn( VerifyS -> Connected )
[Wed Aug 16 09:47:57 2023] block drbd2: helper command: /sbin/drbdadm out-of-sync minor-2
[Wed Aug 16 09:47:57 2023] block drbd2: helper command: /sbin/drbdadm out-of-sync minor-2 exit code 0 (0x0)


More information about the drbd-user mailing list