[DRBD-user] DRBD 9: UpToDate but verify reports resource to be almost completely out of sync, no resynchronisation on disconnect/connect however

Michael Hierweck michael at hierweck.de
Thu May 6 10:06:00 CEST 2021


Hi all,

In our three-node, single-master DRBD cluster we ran into the following situation.



Check state:

root at host101010:/home/admins# drbdadm status vmXXXX
vmXXXX role:Primary
      disk:UpToDate
      host101011 role:Secondary
        peer-disk:UpToDate
      host101020 role:Secondary
        peer-disk:UpToDate

Everything looks nice.



Let's perform a verification:

root at host101010:/home/admins# drbdadm verify vmXXXX

Running:

root at host101010:/home/admins# drbdadm status vmXXXX
vmXXXX role:Primary
      disk:UpToDate
      host101011 role:Secondary
        replication:VerifyS peer-disk:UpToDate done:0.09
      host101020 role:Secondary
        replication:VerifyS peer-disk:UpToDate done:0.11



The results are remarkable:

[Wed May  5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: Online verify done (total 7112 sec; paused 0 sec; 29484 K/sec)
[Wed May  5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: Online verify found 1 4k blocks out of sync!
[Wed May  5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: repl( VerifyS -> Established )
[Wed May  5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: helper command: /sbin/drbdadm out-of-sync
[Wed May  5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: helper command: /sbin/drbdadm out-of-sync exit code 0
[Wed May  5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: Online verify done (total 7302 sec; paused 0 sec; 28720 K/sec)
[Wed May  5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: Online verify found 41790818 4k blocks out of sync!
[Wed May  5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: repl( VerifyS -> Established )
[Wed May  5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: helper command: /sbin/drbdadm out-of-sync
[Wed May  5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: helper command: /sbin/drbdadm out-of-sync exit code 0

Verify claims that almost the entire 150G device is out of sync.
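As a quick sanity check on that number, the reported 4k block count can be converted to a size (values taken from the verify log above):

```shell
# Convert the out-of-sync count reported by online verify
# (41790818 4k blocks on host101020) into a human-readable size.
oos_blocks=41790818
oos_kib=$(( oos_blocks * 4 ))            # 4 KiB per block
oos_gib=$(( oos_kib / 1024 / 1024 ))     # KiB -> GiB (integer)
echo "${oos_gib} GiB flagged out of sync"
# → 159 GiB flagged out of sync
```

That is on the order of the whole device, consistent with "almost completely out of sync".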


Let's perform disconnect/connect:
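The exact commands are not shown in the log; presumably it was the usual cycle run per peer (a sketch with the resource name from above; "echo" stands in so this does not touch a live cluster):

```shell
# Hypothetical reconstruction of the disconnect/connect cycle.
# On a real cluster, drop the "echo" to actually run drbdadm.
res="vmXXXX"
for action in disconnect connect; do
    echo "drbdadm $action $res"
done
```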

[Wed May  5 11:44:30 2021] drbd vmXXXX: Preparing cluster-wide state change 1977682193 (0->1 496/16)
[Wed May  5 11:44:30 2021] drbd vmXXXX: State change 1977682193: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFA
[Wed May  5 11:44:30 2021] drbd vmXXXX: Committing cluster-wide state change 1977682193 (0ms)
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Wed May  5 11:44:30 2021] drbd vmXXXX/0 drbdXXXX0 host101011: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: ack_receiver terminated
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: Terminating ack_recv thread
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: Restarting sender thread
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: Connection closed
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: helper command: /sbin/drbdadm disconnected
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: helper command: /sbin/drbdadm disconnected exit code 0
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: conn( Disconnecting -> StandAlone )
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: Terminating receiver thread
[Wed May  5 11:44:30 2021] drbd vmXXXX: Preparing cluster-wide state change 2927240878 (0->2 496/16)
[Wed May  5 11:44:30 2021] drbd vmXXXX: State change 2927240878: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFE
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: Cluster is now split
[Wed May  5 11:44:30 2021] drbd vmXXXX: Committing cluster-wide state change 2927240878 (0ms)
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Wed May  5 11:44:30 2021] drbd vmXXXX/0 drbdXXXX0 host101020: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: ack_receiver terminated
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: Terminating ack_recv thread
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: Restarting sender thread
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: Connection closed
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: helper command: /sbin/drbdadm disconnected
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: helper command: /sbin/drbdadm disconnected exit code 0
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: conn( Disconnecting -> StandAlone )
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: Terminating receiver thread
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: conn( StandAlone -> Unconnected )
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: Starting receiver thread (from drbd_w_vmXXXX [38063])
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: conn( Unconnected -> Connecting )
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: conn( StandAlone -> Unconnected )
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: Starting receiver thread (from drbd_w_vmXXXX [38063])
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: conn( Unconnected -> Connecting )
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: Handshake to peer 2 successful: Agreed network protocol version 119
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: Peer authenticated using 20 bytes HMAC
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: Starting ack_recv thread (from drbd_r_vmXXXX [16989])
[Wed May  5 11:44:33 2021] drbd vmXXXX: Preparing cluster-wide state change 2251651464 (0->2 499/145)
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: Handshake to peer 1 successful: Agreed network protocol version 119
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: Peer authenticated using 20 bytes HMAC
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: Starting ack_recv thread (from drbd_r_vmXXXX [16987])
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: drbd_sync_handshake:
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: self AE3DDC8114ED4F29:0000000000000000:BAD783DF14F75BB4:CD4C13862D04D7CC bits:41790818 flags:120
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: peer AE3DDC8114ED4F28:0000000000000000:9D5D2097F94E7D30:F96063637CE0E650 bits:41790624 flags:120
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: uuid_compare()=no-sync by rule 38
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: bitmap content (41790818 bits)
[Wed May  5 11:44:33 2021] drbd vmXXXX: State change 2251651464: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFA
[Wed May  5 11:44:33 2021] drbd vmXXXX: Committing cluster-wide state change 2251651464 (88ms)
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: pdsk( Outdated -> UpToDate )
[Wed May  5 11:44:34 2021] drbd vmXXXX: Preparing cluster-wide state change 1922267911 (0->1 499/145)
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: drbd_sync_handshake:
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: self AE3DDC8114ED4F29:0000000000000000:BAD783DF14F75BB4:CD4C13862D04D7CC bits:1 flags:120
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: peer AE3DDC8114ED4F28:0000000000000000:E7DC3C8D8FA65BA8:9D5D2097F94E7D30 bits:1 flags:120
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: uuid_compare()=no-sync by rule 38
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: bitmap content (1 bits)
[Wed May  5 11:44:34 2021] drbd vmXXXX: State change 1922267911: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF8
[Wed May  5 11:44:34 2021] drbd vmXXXX: Committing cluster-wide state change 1922267911 (48ms)
[Wed May  5 11:44:34 2021] drbd vmXXXX host101011: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: pdsk( Outdated -> UpToDate )
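One detail worth noting in the sync handshake above: the current UUIDs differ only in their lowest bit (AE3DDC8114ED4F29 on the Primary vs AE3DDC8114ED4F28 on the peers). If I read the DRBD sources correctly, bit 0 of the current UUID merely marks the primary role and is masked out before data generations are compared, so both sides count as the same generation and no resync is started; this masking is an assumption on my part, sketched here with the values from the log:

```shell
# Current UUIDs from the handshake log; assumption: DRBD masks bit 0
# (the primary-role flag) before comparing data generation UUIDs.
self_uuid=$(( 0xAE3DDC8114ED4F29 ))
peer_uuid=$(( 0xAE3DDC8114ED4F28 ))
if [ $(( self_uuid & ~1 )) -eq $(( peer_uuid & ~1 )) ]; then
    echo "current UUIDs equal (ignoring bit 0) -> no-sync"
fi
```

That would explain the "uuid_compare()=no-sync" result despite the large bitmap, but not why the bitmap bits are ignored afterwards.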


However, after the disconnect/reconnect, drbdadm status vmXXXX did not show any synchronisation activity. Instead, the state was immediately UpToDate/UpToDate/UpToDate again.


How can this be explained?



Debian GNU/Linux 10
drbd 9.0.28-1
drbd-utils 9.15 (from Buster Backports)

global {
      usage-count no;
      udev-always-use-vnr;
}

common {
      handlers {
      }
      startup {
      }
      options {
          auto-promote no;
      }
      disk {
          al-extents 250;
          c-plan-ahead 0;
      }
      net {
          protocol C;
          max-buffers 8000;
          max-epoch-size 8000;
          sndbuf-size 8M;
          cram-hmac-alg "sha1";
          verify-alg "sha1";
      }
}

resource "vmXXXX" {
      volume 0 {
          device      /dev/drbdXXXX0;
          disk        /dev/zvol/disk/data/vmXXXX-0;
          meta-disk   /dev/zvol/disk/meta/vmXXXX-0;
      }
      on host101011 {
          address 10.200.100.11:1XXXX;
          node-id 0;
      }
      on host101010 {
          address 10.200.100.10:1XXXX;
          node-id 1;
      }
      on host101020 {
          address 10.200.100.20:1XXXX;
          node-id 2;
      }
      connection-mesh {
          hosts host101011 host101010 host101020;
      }
      net {
          shared-secret "...";
      }
}


Thanks in advance,

Michael







