[DRBD-user] Drbd many blocks out of sync

Lars Ellenberg lars.ellenberg at linbit.com
Wed May 16 14:37:00 CEST 2012


On Wed, May 16, 2012 at 11:58:36AM +0400, Vladimir Kuklin wrote:
> Hi guys
> 
> I am experiencing very strange behaviour of DRBD.
> 
> My setup is:
> 
> 
> Two identical Supermicro nodes, each using an LSI 9265-8i
> controller with SAS disks.
> 
> Scientific Linux 6.2 with latest OpenVZ stable kernel that uses drbd 8.3.10.
> 
> The relevant excerpt from my drbd.conf follows:
> 
> ...
> resource r1 {
>     net
>         {
>         max-buffers 8000;
>         max-epoch-size 8000;
>         sndbuf-size 2M;
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
> #        data-integrity-alg crc32c;
>         ping-int 25;
>     }
>     startup {
>         become-primary-on both;
>     }
> 
> 
> 
>     syncer {
>         rate 100M;
>         al-extents 3383;
>         csums-alg crc32c;
>         verify-alg crc32c;
>         }
> 
>     disk {
>         fencing resource-only;
> #        no-disk-barrier;
> #        no-disk-flushes;
>         }
>         handlers
>         {
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>         out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
> #       fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> #       after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>         }
>     protocol C;
> 
>     on srv10 {
>         device     /dev/drbd1;
>         disk       /dev/sdf1;
>         address    192.168.27.11:7789;
>         meta-disk internal;
>         }
> 
>       on srv11 {
>         device    /dev/drbd1;
>         disk      /dev/sdf1;
>         address   192.168.27.12:7789;
>         meta-disk internal;
>         }
> }
> ...
> 
> The problem is that after the initial synchronization, when I run
> "drbdadm verify r1", I get a bunch of out-of-sync blocks. I then
> disconnect and reconnect this resource, run "drbdadm verify r1"
> again, and once more get a bunch of out-of-sync blocks. Some of
> them are false positives and some of them are really out of sync, as
> dd shows (both with iflag=direct and without). And what is
> important: there are no write operations on this device. Nothing is
> written to it, yet I get a bunch of out-of-sync blocks every time I
> run verify and resync DRBD.
> 
> [root@srv10 vvk]# drbdadm verify r1
> [root@srv10 vvk]# cat /proc/drbd
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil@fat-tyre, 2011-01-28 12:17:35
> 
>  1: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>     ns:1568452 nr:0 dw:4 dr:39777412 al:1 bm:1300 lo:333 pe:0 ua:367 ap:0 ep:1 wo:b oos:484868
>         [>....................] verified:  2.3% (9328/9536)M
>         finish: 0:01:29 speed: 106,784 (106,784) want: 102,400 K/sec
> [root@srv10 vvk]# cat /proc/drbd
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil@fat-tyre, 2011-01-28 12:17:35
> 
>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>     ns:1568452 nr:0 dw:4 dr:49330856 al:1 bm:1300 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:724448
> [root@srv10 vvk]# dmesg | tail -10
> 
> [ 2267.737437] block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1 exit code 0 (0x0)
> [ 2267.737445] block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
> [ 2267.737455] block drbd1: Began resync as SyncSource (will sync 724448 KB [181112 bits set]).
> [ 2267.737494] block drbd1: updated sync UUID 5B5B2B2EE43F8757:7BE37645CD25FF6D:7BE27645CD25FF6D:0001000000000000
> [ 2276.569106] block drbd1: Resync done (total 8 sec; paused 0 sec; 90556 K/sec)
> [ 2276.569113] block drbd1: 51 % had equal check sums, eliminated: 371216K; transferred 353232K total 724448K
> [ 2276.569120] block drbd1: updated UUIDs 5B5B2B2EE43F8757:0000000000000000:7BE37645CD25FF6D:7BE27645CD25FF6D
> [ 2276.569129] block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
> [ 2276.569727] block drbd1: bitmap WRITE of 52 pages took 0 jiffies
> [ 2276.569750] block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> 
> [root@srv10 vvk]# cat /proc/drbd
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil@fat-tyre, 2011-01-28 12:17:35
> 
>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>     ns:353232 nr:0 dw:4 dr:50055636 al:1 bm:1410 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
> [root@srv10 vvk]# drbdadm verify r1
> 
> [root@srv10 vvk]# cat /proc/drbd
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil@fat-tyre, 2011-01-28 12:17:35
> 
>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>     ns:353232 nr:0 dw:4 dr:59822788 al:1 bm:1410 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:470532
> 
> 
> Is this normal DRBD behaviour?

No.
That is most likely broken hardware.

Still: try a verify with md5/sha1,
and a resync without csums-alg,
then an additional verify,
and no other interaction with the device in between.

If that still shows out of sync blocks,
I'd strongly suggest you start
replacing hardware components.
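
For illustration only, the test sequence suggested above might be
sketched roughly as below. It assumes the syncer section of drbd.conf
has already been edited on both nodes (verify-alg set to sha1 or md5,
csums-alg commented out) and that the matching digest module is
available in the kernel. The function name verify_cycle and the RUN
switch are made up for this sketch; by default it only prints the
commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper, not part of DRBD: walks through
# verify -> resync -> verify for one resource.
# Prints the commands unless RUN=1 is set in the environment.

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

verify_cycle() {
    res="$1"
    run drbdadm adjust "$res"      # pick up the edited drbd.conf (needed on both nodes)
    run drbdadm verify "$res"      # first online verify with the new digest
    # ... wait until /proc/drbd shows cs:Connected again, note oos: ...
    run drbdadm disconnect "$res"
    run drbdadm connect "$res"     # reconnecting resyncs the blocks marked out of sync
    # ... wait until the resync has finished and oos: is 0 ...
    run drbdadm verify "$res"      # second verify, with no other I/O in between
}

verify_cycle "${1:-r1}"
```

If oos: in /proc/drbd is non-zero again after the last step, the
result points at the hardware rather than at the checksum settings.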


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com


