[DRBD-user] Tracking down sources of corruption (possibly) detected by drbdadm verify...

Tue Jan 15 08:32:13 CET 2008

Hi all.

I'm trying out the new online verify functionality of 8.2.4 on a sun x2100M2 
(amd64 nvidia MCP55 based system) running in 64bit mode.

Configuration is
o 2.6.22.15 kernel with vserver 2.2.0.5 patch
o 2 nodes
o 2 internal disks with md mirrors for /, /boot, swap and a partitionable md 
for the drbd partitions
o drbd traffic travels over an nvidia (forcedeth driver) gigE direct 
connection between the two nodes. Have tried this with TSO, RX and TX 
checksumming enabled and disabled (via ethtool -K eth3 tso|tx|rx off) on both 
nodes and the issue persists.

After doing a full resync (via drbdadm invalidate-remote disk0), and no 
activity having been performed on the drbd resource, I would expect 'drbdadm 
verify disk0' to not show any differences between the two. Is this correct?

# TSO and checksumming on
ethtool -K eth3 tx on
ethtool -K eth3 rx on
ethtool -K eth3 tso on

# ethtool -k eth3
Offload parameters for eth3:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off

# cat /proc/drbd
version: 8.2.4 (api:88/proto:86-88)
GIT-hash: fc00c6e00a1b6039bfcebe37afa3e7e28dbd92fa build by root at host1, 
2008-01-14 10:28:18
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:1844 nr:0 dw:8 dr:620364242 al:0 bm:12672 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:116285364 misses:37916 starving:0 dirty:0 
changed:37916
        act_log: used:0/257 hits:2 misses:0 starving:0 dirty:0 changed:0
 1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:62936 dw:62936 dr:208772245 al:0 bm:24 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:52196872 misses:12768 starving:0 dirty:0 
changed:12768
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

# drbdadm invalidate-remote disk0
...wait while resync finishes...
# dmesg | tail
drbd0: conn( Connected -> StartingSyncS ) pdsk( UpToDate -> Inconsistent ) 
drbd0: Writing meta data super block now.
drbd0: writing of bitmap took 11 jiffies
drbd0: 98 GB (25845948 bits) marked out-of-sync by on disk bit-map.
drbd0: Writing meta data super block now.
drbd0: conn( StartingSyncS -> SyncSource ) 
drbd0: Began resync as SyncSource (will sync 103383792 KB [25845948 bits 
set]).
drbd0: Writing meta data super block now.
drbd0: Resync done (total 3084 sec; paused 0 sec; 33520 K/sec)
drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
drbd0: Writing meta data super block now.

# cat /proc/drbd
version: 8.2.4 (api:88/proto:86-88)
GIT-hash: fc00c6e00a1b6039bfcebe37afa3e7e28dbd92fa build by root at host1, 
2008-01-14 10:28:18
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:206769423 nr:0 dw:8 dr:827131821 al:0 bm:25294 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:129195718 misses:50538 starving:0 dirty:0 
changed:50538
        act_log: used:0/257 hits:2 misses:0 starving:0 dirty:0 changed:0
 1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:62936 dw:62936 dr:208772245 al:0 bm:24 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:52196872 misses:12768 starving:0 dirty:0 
changed:12768
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

# drbdadm verify disk0
...
# cat /proc/drbd 
version: 8.2.4 (api:88/proto:86-88)
GIT-hash: fc00c6e00a1b6039bfcebe37afa3e7e28dbd92fa build by root at host1, 
2008-01-14 10:28:18
 0: cs:VerifyS st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:206769423 nr:0 dw:8 dr:852939569 al:0 bm:25294 lo:0 pe:1230 ua:0 ap:0
         24% oos:461  6451937 / 25845948 
        resync: used:1/31 hits:135647309 misses:52114 starving:0 dirty:0 
changed:52114
        act_log: used:0/257 hits:2 misses:0 starving:0 dirty:0 changed:0
 1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:62936 dw:62936 dr:208772245 al:0 bm:24 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:52196872 misses:12768 starving:0 dirty:0 
changed:12768
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

...wait while verify finishes...
# dmesg
drbd0: conn( Connected -> VerifyS ) 
drbd0: Out of sync: start=13194216, size=32 (sectors)
drbd0: Out of sync: start=13209840, size=24 (sectors)
drbd0: Out of sync: start=13263400, size=8 (sectors)
drbd0: Out of sync: start=13263712, size=8 (sectors)
drbd0: Out of sync: start=13272176, size=24 (sectors)
drbd0: Out of sync: start=13273048, size=16 (sectors)
drbd0: Out of sync: start=13273216, size=8 (sectors)
drbd0: Out of sync: start=13283312, size=8 (sectors)
drbd0: Out of sync: start=13290200, size=8 (sectors)
drbd0: Out of sync: start=13290344, size=8 (sectors)
drbd0: Out of sync: start=13299472, size=8 (sectors)

...
drbd0: Out of sync: start=13911744, size=8 (sectors)
drbd0: Out of sync: start=13915904, size=8 (sectors)
drbd0: Out of sync: start=13915920, size=8 (sectors)
drbd0: Out of sync: start=13924344, size=16 (sectors)
drbd0: Out of sync: start=13924840, size=16 (sectors)
drbd0: Out of sync: start=13929816, size=8 (sectors)
drbd0: Out of sync: start=13942080, size=16 (sectors)
drbd0: Online verify  done (total 2049 sec; paused 0 sec; 50452 K/sec)
drbd0: Online verify found 461 4k block out of sync!
drbd0: helper command: /sbin/drbdadm out-of-sync
drbd0: Writing the whole bitmap.
drbd0: writing of bitmap took 20 jiffies
drbd0: 1844 KB (461 bits) marked out-of-sync by on disk bit-map.
drbd0: conn( VerifyS -> Connected ) 

# cat /proc/drbd
version: 8.2.4 (api:88/proto:86-88)
GIT-hash: fc00c6e00a1b6039bfcebe37afa3e7e28dbd92fa build by root at host1, 
2008-01-14 10:28:18
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:206769423 nr:0 dw:8 dr:930515610 al:0 bm:25294 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:155035355 misses:56849 starving:0 dirty:0 
changed:56849
        act_log: used:0/257 hits:2 misses:0 starving:0 dirty:0 changed:0
 1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:62936 dw:62936 dr:208772245 al:0 bm:24 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:52196872 misses:12768 starving:0 dirty:0 
changed:12768
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

ifconfig doesn't show the interface(s) running any errors.

Running the same test with tso and checksumming off results in the same 
pattern of out-of-sync data (same start and stop positions and same number of 
sectors at each)

# TSO and checksumming off
ethtool -K eth3 tx off
ethtool -K eth3 rx off
ethtool -K eth3 tso off

# dmesg
drbd0: conn( Connected -> VerifyS ) 
drbd0: Out of sync: start=13194216, size=32 (sectors)
drbd0: Out of sync: start=13209840, size=24 (sectors)
drbd0: Out of sync: start=13263400, size=8 (sectors)
drbd0: Out of sync: start=13263712, size=8 (sectors)
drbd0: Out of sync: start=13272176, size=24 (sectors)
drbd0: Out of sync: start=13273048, size=16 (sectors)
drbd0: Out of sync: start=13273216, size=8 (sectors)
drbd0: Out of sync: start=13283312, size=8 (sectors)
drbd0: Out of sync: start=13290200, size=8 (sectors)
drbd0: Out of sync: start=13290344, size=8 (sectors)
drbd0: Out of sync: start=13299472, size=8 (sectors)
...
drbd0: Out of sync: start=13911744, size=8 (sectors)
drbd0: Out of sync: start=13915904, size=8 (sectors)
drbd0: Out of sync: start=13915920, size=8 (sectors)
drbd0: Out of sync: start=13924344, size=16 (sectors)
drbd0: Out of sync: start=13924840, size=16 (sectors)
drbd0: Out of sync: start=13929816, size=8 (sectors)
drbd0: Out of sync: start=13942080, size=16 (sectors)
drbd0: Online verify  done (total 2044 sec; paused 0 sec; 50576 K/sec)
drbd0: Online verify found 461 4k block out of sync!
drbd0: helper command: /sbin/drbdadm out-of-sync
drbd0: Writing the whole bitmap.
drbd0: writing of bitmap took 14 jiffies
drbd0: 1844 KB (461 bits) marked out-of-sync by on disk bit-map.
drbd0: conn( VerifyS -> Connected ) 

Assuming it is not a problem with 'drbd verify' or a drbd bug, what would 
people recommend as the next step to tracking down the source of this 
inconsistency?

It is worth noting that fsx (http://www.codemonkey.org.uk/cruft/fsx-linux.c) 
ran successfully for 4 days without producing any errors (ie no file or 
filesystem corruption) on the ext3 filesystem on disk0. However upon running 
the verify command against the disk after the fsx testing, it said that 
around 64MB was un-synchronised.

Another question is how to act to resync after discovering differences via a 
verify command. Florian's blog says to run a 'drbdadm disconnect <resource>' 
followed by a 'drbdadm connect <resource>' on the primary for that resource.
Can this be done whilst writes are being done to the primary? (Wouldn't the 
sync be running both ways then? New writes being sent to the secondary node 
then back to the primary, and verify related resyncs from the primary to the 
secondary?) 

Configuration and layout listed below

fdisk /dev/sda:
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          17      136521   fd  Linux raid autodetect
/dev/sda2              18         516     4008217+  fd  Linux raid autodetect
/dev/sda3             517        4501    32009512+  fd  Linux raid autodetect
/dev/sda4            4502       30400   208033717+  83  Linux

parted /dev/md_d0 (since fdisk doesn't understand gpt tables):
Disk /dev/md_d0: 213GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End    Size   File system  Name     Flags
 1      17.4kB  135MB  135MB               primary       
 2      135MB   106GB  106GB  ext3         primary       
 3      106GB   106GB  135MB               primary       
 4      106GB   213GB  107GB  ext3         primary       

/proc/mdstat:
Personalities : [raid1] 
md_d0 : active raid1 sdb4[1] sda4[0]
      208033600 blocks [2/2] [UU]

md1 : active raid1 sdb2[0] sda2[1]
      4008128 blocks [2/2] [UU]

md2 : active raid1 sdb3[0] sda3[1]
      32009408 blocks [2/2] [UU]

md0 : active raid1 sdb1[0] sda1[1]
      136448 blocks [2/2] [UU]

unused devices: <none>

/etc/drbd.conf:
global { usage-count no; }
common { syncer { rate 50M; } }
resource disk0 {
  protocol C;
  startup {
    wfc-timeout  10;
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  net {
    data-integrity-alg sha1;
    cram-hmac-alg sha1;
    shared-secret qx79PoVBIR5TvybCVri4QRfrsB34JcWj;
  }
  syncer {
    rate 50M;
    #group 1;
    al-extents 257;
    verify-alg sha1;
    cpu-mask 1;
  }
  on host1 {
    device     /dev/drbd0;
    disk       /dev/md_d0p2;
    address    192.168.2.1:7788;
    flexible-meta-disk  /dev/md_d0p1;
  }

  on host2 {
    device    /dev/drbd0;
    disk      /dev/md_d0p2;
    address   192.168.2.2:7788;
    flexible-meta-disk /dev/md_d0p1;
  }
}
resource disk1 {
  protocol C;
  startup {
    wfc-timeout  10;
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  net {
    data-integrity-alg sha1;
    cram-hmac-alg sha1;
    shared-secret gH1b7hEIzhmmXo2nLUSs0SbXJ7NVw41L;
  }
  syncer {
    rate 50M;
    #group 1;
    al-extents 257;
    verify-alg sha1;
    cpu-mask 2;
  }
  on host1 {
    device     /dev/drbd1;
    disk       /dev/md_d0p4;
    address    192.168.2.1:7789;
    flexible-meta-disk  /dev/md_d0p3;
  }

  on host2 {
    device    /dev/drbd1;
    disk      /dev/md_d0p4;
    address   192.168.2.2:7789;
    flexible-meta-disk /dev/md_d0p3;
  }
}