[DRBD-user] Tracking down sources of corruption (possibly) detected by drbdadm verify...

Wed Jan 16 10:59:33 CET 2008

On Tue, Jan 15, 2008 at 06:02:13PM +1030, Admin wrote:
> Hi all.
> 
> I'm trying out the new online verify functionality of 8.2.4 on a sun x2100M2 
> (amd64 nvidia MCP55 based system) running in 64bit mode.
> 
> Configuration is
> o 2.6.22.15 kernel with vserver 2.2.0.5 patch
> o 2 nodes
> o 2 internal disks with md mirrors for /, /boot, swap and a partitionable md 
> for the drbd partitions
> o drbd traffic travels over an nvidia (forcedeth driver) gigE direct 
> connection between the two nodes. Have tried this with TSO, RX and TX 
> checksumming enabled and disabled (via ethtool -K eth3 tso|tx|rx off) on both 
> nodes and the issue persists.
> 
> After doing a full resync (via drbdadm invalidate-remote disk0), and no 
> activity having been performed on the drbd resource, I would expect 'drbdadm 
> verify disk0' to not show any differences between the two. Is this correct?
> 
> # TSO and checksumming on
> ethtool -K eth3 tx on
> ethtool -K eth3 rx on
> ethtool -K eth3 tso on
> 
> # ethtool -k eth3
> Offload parameters for eth3:
> Cannot get device udp large send offload settings: Operation not supported
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: on
> udp fragmentation offload: off
> generic segmentation offload: off
> 
> # cat /proc/drbd
> version: 8.2.4 (api:88/proto:86-88)
> GIT-hash: fc00c6e00a1b6039bfcebe37afa3e7e28dbd92fa build by root at host1, 
> 2008-01-14 10:28:18
>  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
>     ns:1844 nr:0 dw:8 dr:620364242 al:0 bm:12672 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:116285364 misses:37916 starving:0 dirty:0 
> changed:37916
>         act_log: used:0/257 hits:2 misses:0 starving:0 dirty:0 changed:0
>  1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
>     ns:0 nr:62936 dw:62936 dr:208772245 al:0 bm:24 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:52196872 misses:12768 starving:0 dirty:0 
> changed:12768
>         act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
> 
> # drbdadm invalidate-remote disk0
> ...wait while resync finishes...
> # dmesg | tail
> drbd0: conn( Connected -> StartingSyncS ) pdsk( UpToDate -> Inconsistent ) 
> drbd0: Writing meta data super block now.
> drbd0: writing of bitmap took 11 jiffies
> drbd0: 98 GB (25845948 bits) marked out-of-sync by on disk bit-map.
> drbd0: Writing meta data super block now.
> drbd0: conn( StartingSyncS -> SyncSource ) 
> drbd0: Began resync as SyncSource (will sync 103383792 KB [25845948 bits 
> set]).
> drbd0: Writing meta data super block now.
> drbd0: Resync done (total 3084 sec; paused 0 sec; 33520 K/sec)
> drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
> drbd0: Writing meta data super block now.
> 
> # cat /proc/drbd
> version: 8.2.4 (api:88/proto:86-88)
> GIT-hash: fc00c6e00a1b6039bfcebe37afa3e7e28dbd92fa build by root at host1, 
> 2008-01-14 10:28:18
>  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
>     ns:206769423 nr:0 dw:8 dr:827131821 al:0 bm:25294 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:129195718 misses:50538 starving:0 dirty:0 
> changed:50538
>         act_log: used:0/257 hits:2 misses:0 starving:0 dirty:0 changed:0
>  1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
>     ns:0 nr:62936 dw:62936 dr:208772245 al:0 bm:24 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:52196872 misses:12768 starving:0 dirty:0 
> changed:12768
>         act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
> 
> # drbdadm verify disk0
> ...
> # cat /proc/drbd 
> version: 8.2.4 (api:88/proto:86-88)
> GIT-hash: fc00c6e00a1b6039bfcebe37afa3e7e28dbd92fa build by root at host1, 
> 2008-01-14 10:28:18
>  0: cs:VerifyS st:Primary/Secondary ds:UpToDate/UpToDate C r---
>     ns:206769423 nr:0 dw:8 dr:852939569 al:0 bm:25294 lo:0 pe:1230 ua:0 ap:0
>          24% oos:461  6451937 / 25845948 
>         resync: used:1/31 hits:135647309 misses:52114 starving:0 dirty:0 
> changed:52114
>         act_log: used:0/257 hits:2 misses:0 starving:0 dirty:0 changed:0
>  1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
>     ns:0 nr:62936 dw:62936 dr:208772245 al:0 bm:24 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:52196872 misses:12768 starving:0 dirty:0 
> changed:12768
>         act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
> 
> ...wait while verify finishes...
> # dmesg
> drbd0: conn( Connected -> VerifyS ) 
> drbd0: Out of sync: start=13194216, size=32 (sectors)
> drbd0: Out of sync: start=13209840, size=24 (sectors)
> drbd0: Out of sync: start=13263400, size=8 (sectors)
> drbd0: Out of sync: start=13263712, size=8 (sectors)
> drbd0: Out of sync: start=13272176, size=24 (sectors)
> drbd0: Out of sync: start=13273048, size=16 (sectors)
> drbd0: Out of sync: start=13273216, size=8 (sectors)
> drbd0: Out of sync: start=13283312, size=8 (sectors)
> drbd0: Out of sync: start=13290200, size=8 (sectors)
> drbd0: Out of sync: start=13290344, size=8 (sectors)
> drbd0: Out of sync: start=13299472, size=8 (sectors)
> 
> ...
> drbd0: Out of sync: start=13911744, size=8 (sectors)
> drbd0: Out of sync: start=13915904, size=8 (sectors)
> drbd0: Out of sync: start=13915920, size=8 (sectors)
> drbd0: Out of sync: start=13924344, size=16 (sectors)
> drbd0: Out of sync: start=13924840, size=16 (sectors)
> drbd0: Out of sync: start=13929816, size=8 (sectors)
> drbd0: Out of sync: start=13942080, size=16 (sectors)
> drbd0: Online verify  done (total 2049 sec; paused 0 sec; 50452 K/sec)
> drbd0: Online verify found 461 4k block out of sync!
> drbd0: helper command: /sbin/drbdadm out-of-sync
> drbd0: Writing the whole bitmap.
> drbd0: writing of bitmap took 20 jiffies
> drbd0: 1844 KB (461 bits) marked out-of-sync by on disk bit-map.
> drbd0: conn( VerifyS -> Connected ) 
> 
> # cat /proc/drbd
> version: 8.2.4 (api:88/proto:86-88)
> GIT-hash: fc00c6e00a1b6039bfcebe37afa3e7e28dbd92fa build by root at host1, 
> 2008-01-14 10:28:18
>  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
>     ns:206769423 nr:0 dw:8 dr:930515610 al:0 bm:25294 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:155035355 misses:56849 starving:0 dirty:0 
> changed:56849
>         act_log: used:0/257 hits:2 misses:0 starving:0 dirty:0 changed:0
>  1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
>     ns:0 nr:62936 dw:62936 dr:208772245 al:0 bm:24 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:52196872 misses:12768 starving:0 dirty:0 
> changed:12768
>         act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
> 
> 
> ifconfig doesn't show the interface(s) running any errors.
> 
> Running the same test with tso and checksumming off

stop.
"the same test"...
including the real full sync, now with tso off, before the online verify?

> results in the same 
> pattern of out-of-sync data (same start and stop positions and same number of 
> sectors at each)

interessting. looks somehow systematically.
have a look at the data at those offsets. use e.g.
  dd iflag=direct bs=512 skip=13194216 count=32 if=/dev/md_d0p2 | xxd -a
on both nodes, and compare. what is it, and how does it differ?

> # TSO and checksumming off
> ethtool -K eth3 tx off
> ethtool -K eth3 rx off
> ethtool -K eth3 tso off
> 
> # dmesg
> drbd0: conn( Connected -> VerifyS ) 
> drbd0: Out of sync: start=13194216, size=32 (sectors)
> drbd0: Out of sync: start=13209840, size=24 (sectors)
> drbd0: Out of sync: start=13263400, size=8 (sectors)
> drbd0: Out of sync: start=13263712, size=8 (sectors)
> drbd0: Out of sync: start=13272176, size=24 (sectors)
> drbd0: Out of sync: start=13273048, size=16 (sectors)
> drbd0: Out of sync: start=13273216, size=8 (sectors)
> drbd0: Out of sync: start=13283312, size=8 (sectors)
> drbd0: Out of sync: start=13290200, size=8 (sectors)
> drbd0: Out of sync: start=13290344, size=8 (sectors)
> drbd0: Out of sync: start=13299472, size=8 (sectors)
> ...
> drbd0: Out of sync: start=13911744, size=8 (sectors)
> drbd0: Out of sync: start=13915904, size=8 (sectors)
> drbd0: Out of sync: start=13915920, size=8 (sectors)
> drbd0: Out of sync: start=13924344, size=16 (sectors)
> drbd0: Out of sync: start=13924840, size=16 (sectors)
> drbd0: Out of sync: start=13929816, size=8 (sectors)
> drbd0: Out of sync: start=13942080, size=16 (sectors)
> drbd0: Online verify  done (total 2044 sec; paused 0 sec; 50576 K/sec)
> drbd0: Online verify found 461 4k block out of sync!
> drbd0: helper command: /sbin/drbdadm out-of-sync
> drbd0: Writing the whole bitmap.
> drbd0: writing of bitmap took 14 jiffies
> drbd0: 1844 KB (461 bits) marked out-of-sync by on disk bit-map.
> drbd0: conn( VerifyS -> Connected ) 
> 
> Assuming it is not a problem with 'drbd verify' or a drbd bug, what would 
> people recommend as the next step to tracking down the source of this 
> inconsistency?
> 
> It is worth noting that fsx (http://www.codemonkey.org.uk/cruft/fsx-linux.c) 
> ran successfully for 4 days without producing any errors (ie no file or 
> filesystem corruption) on the ext3 filesystem on disk0. However upon running 
> the verify command against the disk after the fsx testing, it said that 
> around 64MB was un-synchronised.

I don't know this fsx thing, but I take it that it has some
"excercising" mode, and a "verification" mode, and during the first, all
seemed well, while now during the latter, it complains?

or did you mean you ran the fsx on md_d0p4 directly, then later
configured the drbd, and now are wondering why drbd did not sync up the
changed blocks? I somehow cannot see exactly what you did when,
so have difficulties to interpret your findings.

> Another question is how to act to resync after discovering differences via a 
> verify command. Florian's blog says to run a 'drbdadm disconnect <resource>' 
> followed by a 'drbdadm connect <resource>' on the primary for that resource.
> Can this be done whilst writes are being done to the primary? (Wouldn't the 
> sync be running both ways then? New writes being sent to the secondary node 
> then back to the primary, and verify related resyncs from the primary to the 
> secondary?) 

huh? either I misunderstand something, or you confuse several concepts.

 we have *replication*. this is going on whenever DRBD peers are
 talking to each other, and are the filesystem/application level writes,
 which are mirrored.

 then we have the *resync*, which is bringing an
 inconsistent/out-of-date node up-to-date.  this of course can happen
 while any Primary stays usable, and continues with replication.

 then we have the online verify, which just compares checksums, and in
 the end sets some bits as dirty, but does not take action on its own,
 in case you want to inspect, in detail, by hand, the areas found.
 that is why you need to disconnect/reconnect to trigger the resync 
 of those areas.

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.