Hi,

I was trying today to play with drbd's settings and benchmark the results in order to obtain the best performance. Here is my test setup: two identical machines with SAS storage boxes. Each machine has two 2 TB devices (in my case /dev/sdb and /dev/sdc) that I mirror over drbd, with LVM set up on top of them. The nodes share a Gbit link dedicated to drbd traffic.

After the initial sync, which took around 20 hours to finish, I created the LVM volume and formatted it with an ext3 filesystem. Then I started to play around with parameters like al-extents, unplug-watermark, max-buffers and max-epoch-size, changing the values and doing a "drbdadm adjust all" on each node (after copying the config file over accordingly, of course).

In the beginning it went pretty well; the maximum value attained by a dd test over drbd was 28.9 MB/s:

    [root@erebus testing]# dd if=/dev/zero of=test.dat bs=1G count=1 oflag=dsync
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 37.1114 seconds, 28.9 MB/s

The configuration used is given at the end. After a couple more tests I noticed a big impact on performance, getting around 19-20 MB/s, so I checked /proc/drbd to see what was going on. Surprisingly, it was doing a full resync of one of the disks. The problem is, I don't understand why, as normally it should only resync discrepancies. The output from /proc/drbd is as follows:

    version: 8.2.6 (api:88/proto:86-88)
    GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by root@leviathan.nl.imc.local, 2008-06-23 11:34:01
     0: cs:SyncTarget st:Secondary/Primary ds:Inconsistent/UpToDate C r---
        ns:0 nr:184932480 dw:239670288 dr:2127855509 al:36724 bm:142013 lo:30 pe:235 ua:29 ap:0 oos:1952003580
        [>...................] sync'ed:  8.6% (1906253/2084799)M
        finish: 8:54:23 speed: 60,812 (53,284) K/sec
     1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
        ns:0 nr:0 dw:33488348 dr:2134854516 al:31586 bm:130427 lo:0 pe:0 ua:0 ap:0 oos:0

Here is my drbd.conf file (the version that gave me the best result, 28.9 MB/s):

    global {
        usage-count no;
    }

    common {
        protocol C;
        syncer {
            rate 110M;
        }
    }

    resource drbd0 {
        on leviathan {
            device    /dev/drbd0;
            disk      /dev/sdb;
            address   10.0.0.10:7789;
            meta-disk internal;
        }
        on erebus {
            device    /dev/drbd0;
            disk      /dev/sdb;
            address   10.0.0.20:7789;
            meta-disk internal;
        }
        syncer {
            rate 110M;
            al-extents 641;
        }
        net {
            #on-disconnect reconnect;
            after-sb-0pri disconnect;
            after-sb-1pri disconnect;
            max-epoch-size 8192;
            max-buffers 8192;
            unplug-watermark 128;
        }
    }

    resource drbd1 {
        on leviathan {
            device    /dev/drbd1;
            disk      /dev/sdc;
            address   10.0.0.10:7790;
            meta-disk internal;
        }
        on erebus {
            device    /dev/drbd1;
            disk      /dev/sdc;
            address   10.0.0.20:7790;
            meta-disk internal;
        }
        syncer {
            rate 110M;
            al-extents 641;
        }
        net {
            #on-disconnect reconnect;
            after-sb-0pri disconnect;
            after-sb-1pri disconnect;
            max-epoch-size 8192;
            max-buffers 8192;
            unplug-watermark 128;
        }
    }

Does anyone have an idea what caused the full resync and how I can avoid it in the future?

Thanks and regards,
Andrei Neagoe.
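P.S. A quick sanity check of the numbers above (just a sketch — the constants are copied from the dd and /proc/drbd output, and drbd's oos counter and sync speed are both counted in KiB):

```shell
# dd throughput: bytes copied / elapsed seconds, in decimal MB/s
awk 'BEGIN { printf "%.1f MB/s\n", 1073741824 / 37.1114 / 1e6 }'
# -> 28.9 MB/s, matching dd's own figure

# resync ETA: out-of-sync KiB divided by the current sync speed in KiB/s
awk 'BEGIN { s = 1952003580 / 60812;
             printf "%d:%02d:%02d\n", s / 3600, (s % 3600) / 60, s % 60 }'
# -> roughly 8:55, in line with the "finish:" estimate drbd prints
```

So the figures are at least internally consistent, which suggests the drop to 19-20 MB/s is simply the ~60 MB/s resync competing with my dd test for the same disks and Gbit link.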