On Wed, May 16, 2012 at 11:58:36AM +0400, Vladimir Kuklin wrote:
> Hi guys
>
> I am experiencing very strange behaviour of DRBD.
>
> My setup is:
>
> Two absolutely identical Supermicro nodes using LSI 9265-8i
> controller with SAS disks.
>
> Scientific Linux 6.2 with latest OpenVZ stable kernel that uses drbd 8.3.10.
>
> the excerpt from my drbd.conf is following:
>
> ...
> resource r1 {
>     net
>     {
>         max-buffers 8000;
>         max-epoch-size 8000;
>         sndbuf-size 2M;
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>         # data-integrity-alg crc32c;
>         ping-int 25;
>     }
>     startup {
>         become-primary-on both;
>     }
>
>     syncer {
>         rate 100M;
>         al-extents 3383;
>         csums-alg crc32c;
>         verify-alg crc32c;
>     }
>
>     disk {
>         fencing resource-only;
>         # no-disk-barrier;
>         # no-disk-flushes;
>     }
>     handlers
>     {
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>         out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>         # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         # after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>     }
>     protocol C;
>
>     on srv10 {
>         device /dev/drbd1;
>         disk /dev/sdf1;
>         address 192.168.27.11:7789;
>         meta-disk internal;
>     }
>
>     on srv11 {
>         device /dev/drbd1;
>         disk /dev/sdf1;
>         address 192.168.27.12:7789;
>         meta-disk internal;
>     }
> }
> ...
>
> The problem is that after initial synchronization if I run "drbdadm
> verify r1", I get a bunch of out-of-sync blocks. Then I do
> disconnect and connect of this resource and run "drbdadm verify r1"
> again and then I again do get a bunch of out-of-sync blocks. Some of
> them are false-positives and some of them are really out of sync as
> dd shows (both with iflags=direct and without). And what is
> important: there is no write operations on this device. Nothing is
> written to it, but I get a bunch of out-of-sync blocks any time I
> run verify and resync DRBD.
>
> [root@srv10 vvk]# drbdadm verify r1
> [root@srv10 vvk]# cat /proc/drbd
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by
> phil@fat-tyre, 2011-01-28 12:17:35
>
>  1: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>     ns:1568452 nr:0 dw:4 dr:39777412 al:1 bm:1300 lo:333 pe:0 ua:367
>     ap:0 ep:1 wo:b oos:484868
>     [>....................] verified: 2.3% (9328/9536)M
>     finish: 0:01:29 speed: 106,784 (106,784) want: 102,400 K/sec
> [root@srv10 vvk]# cat /proc/drbd
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by
> phil@fat-tyre, 2011-01-28 12:17:35
>
>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>     ns:1568452 nr:0 dw:4 dr:49330856 al:1 bm:1300 lo:0 pe:0 ua:0
>     ap:0 ep:1 wo:b oos:724448
> [root@srv10 vvk]# dmesg | tail -10
>
> [ 2267.737437] block drbd1: helper command: /sbin/drbdadm
> before-resync-source minor-1 exit code 0 (0x0)
> [ 2267.737445] block drbd1: conn( WFBitMapS -> SyncSource ) pdsk(
> Consistent -> Inconsistent )
> [ 2267.737455] block drbd1: Began resync as SyncSource (will sync
> 724448 KB [181112 bits set]).
> [ 2267.737494] block drbd1: updated sync UUID
> 5B5B2B2EE43F8757:7BE37645CD25FF6D:7BE27645CD25FF6D:0001000000000000
> [ 2276.569106] block drbd1: Resync done (total 8 sec; paused 0 sec;
> 90556 K/sec)
> [ 2276.569113] block drbd1: 51 % had equal check sums, eliminated:
> 371216K; transferred 353232K total 724448K
> [ 2276.569120] block drbd1: updated UUIDs
> 5B5B2B2EE43F8757:0000000000000000:7BE37645CD25FF6D:7BE27645CD25FF6D
> [ 2276.569129] block drbd1: conn( SyncSource -> Connected ) pdsk(
> Inconsistent -> UpToDate )
> [ 2276.569727] block drbd1: bitmap WRITE of 52 pages took 0 jiffies
> [ 2276.569750] block drbd1: 0 KB (0 bits) marked out-of-sync by on
> disk bit-map.
>
> [root@srv10 vvk]# cat /proc/drbd
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by
> phil@fat-tyre, 2011-01-28 12:17:35
>
>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>     ns:353232 nr:0 dw:4 dr:50055636 al:1 bm:1410 lo:0 pe:0 ua:0 ap:0
>     ep:1 wo:b oos:0
> [root@srv10 vvk]# drbdadm verify r1
>
> [root@srv10 vvk]# cat /proc/drbd
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by
> phil@fat-tyre, 2011-01-28 12:17:35
>
>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>     ns:353232 nr:0 dw:4 dr:59822788 al:1 bm:1410 lo:0 pe:0 ua:0 ap:0
>     ep:1 wo:b oos:470532
>
> Is this normal DRBD behaviour?

No.
That is most likely broken hardware.

Still: try a verify with md5/sha1,
and a resync without csums-alg,
then an additional verify,
and no other interaction with the device in between.

If that still shows out of sync blocks, I'd strongly suggest you start
replacing hardware components.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
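
When repeating the verify / resync / verify sequence suggested above, it may help to record the oos counter from /proc/drbd after each step instead of comparing it by eye. The following is a minimal Python sketch, not part of DRBD itself: the function names are made up for illustration, and the 4 KiB-per-bitmap-bit figure is an assumption inferred from the quoted log line "will sync 724448 KB [181112 bits set]".

```python
import re

# Assumption: one bitmap bit covers 4 KiB, as implied by the quoted log
# line "will sync 724448 KB [181112 bits set]" (724448 / 181112 = 4).
KIB_PER_BIT = 4


def parse_oos_kib(proc_drbd_text, minor):
    """Return the oos counter (KiB) for one DRBD minor, or None if absent."""
    in_device = False
    for line in proc_drbd_text.splitlines():
        # A device section starts with a line such as " 1: cs:Connected ..."
        header = re.match(r"\s*(\d+): cs:", line)
        if header:
            in_device = int(header.group(1)) == minor
        if in_device:
            found = re.search(r"\boos:(\d+)", line)
            if found:
                return int(found.group(1))
    return None


def oos_bits(oos_kib):
    """Convert an oos value in KiB to the number of set bitmap bits."""
    return oos_kib // KIB_PER_BIT


def equal_checksum_percent(eliminated_kib, total_kib):
    """Whole-number share of a resync skipped because checksums matched."""
    return 100 * eliminated_kib // total_kib


# Values copied from the /proc/drbd output quoted in this thread.
SAMPLE = """\
version: 8.3.10 (api:88/proto:86-96)

 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1568452 nr:0 dw:4 dr:49330856 al:1 bm:1300 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:724448
"""

if __name__ == "__main__":
    oos = parse_oos_kib(SAMPLE, 1)
    print(oos, oos_bits(oos))  # prints: 724448 181112
```

With the numbers from the log above, equal_checksum_percent(371216, 724448) gives 51, matching the "51 % had equal check sums" line, and 371216K eliminated plus 353232K transferred adds up to the 724448K total. On a live node one would pass the contents of /proc/drbd instead of SAMPLE.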