[DRBD-user] strange performance problems with drbd

Torsten Neumann torsten.neumann at dlh.de
Mon May 8 08:43:16 CEST 2006



Hello,

I am trying to set up drbd on some HP ProLiant servers but am having performance
problems. With most Linux kernels the speed is very slow; there are just a few
exceptions.

Software
Both nodes are running Debian 3.1 ("sarge")

Hardware
DL 380 G4 with a 440GB hardware RAID5 (4 x 144GB disks) on /dev/cciss/c0d1.
A dual Intel Gigabit controller configured as a bonding device for drbd use,
connected to an extra VLAN. Sending data via tcpspray looks like this:

linbackup-1:~# tcpspray -n 1000000 192.168.53.2
Transmitted 1024000000 bytes in 9.006240 seconds (111034.127 kbytes/s)

linbackup-2:~# tcpspray -n 1000000 192.168.53.1
Transmitted 1024000000 bytes in 12.658490 seconds (78998.364 kbytes/s)

(There is a significant speed difference between the two directions, but I am
not sure whether it explains the effect described below.)
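To separate a network bottleneck from a disk one, it might also be worth
measuring raw sequential write throughput on each node. A quick sanity check
(my suggestion, not something from this setup) with dd, writing to a scratch
file rather than the raw device so no data is destroyed:

```shell
# Write 64 MB to a scratch file, with an fdatasync at the end so the
# reported rate includes flushing to disk. The scratch path is an
# assumption; put it on the filesystem that sits on the RAID5 array.
dd if=/dev/zero of=/tmp/drbd-disk-test bs=1M count=64 conv=fdatasync
```

dd prints the elapsed time and throughput on stderr; comparing that figure on
both nodes would show whether the backing stores are as asymmetric as the
tcpspray numbers above.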

For all of the following results drbd-0.7.18 (SVN revision 2186M) was used, and
the following drbd.conf was the same on both nodes:

global {
    minor-count 5;
}

resource drbd0 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup {
    wfc-timeout  0;
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  net {
    max-buffers     131072;
    ko-count 4;
  }
  syncer {
    rate 90M;
    group 1;
    al-extents 257;
  }

  on linbackup-1 {
    device     /dev/drbd0;
    disk       /dev/vgdrbd/drbd;
    address    192.168.53.1:7788;
    meta-disk  /dev/vg00/lvol5[0];
  }

  on linbackup-2 {
    device     /dev/drbd0;
    disk       /dev/vgdrbd/drbd;
    address    192.168.53.2:7788;
    meta-disk  /dev/vg00/lvol5[0];
  }
}

(There was no speed difference between running drbd on top of LVM2 volumes and
running it directly on the device. The meta-disk is on a separate disk.)
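One knob that might be worth varying while chasing kernel-dependent throughput
(purely an assumption on my part, not something tested in this setup) is the
TCP send buffer drbd uses, set via sndbuf-size in the net section of drbd 0.7's
drbd.conf. A hypothetical variant of the net section above:

```
  net {
    sndbuf-size     512k;    # hypothetical value to experiment with
    max-buffers     131072;
    ko-count 4;
  }
```

Different kernel versions handle socket buffer autotuning differently, so a
fixed, larger buffer could at least rule that variable out.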

When syncing, most kernel versions give me something like this

(example with both nodes running linux-2.6.6)
SVN Revision: 2186M build by root@linbackup-1, 2006-05-06 19:52:29
 0: cs:SyncSource st:Secondary/Secondary ld:Consistent
    ns:263444 nr:0 dw:0 dr:272268 al:0 bm:45 lo:0 pe:192 ua:2206 ap:0
        [>...................] sync'ed:  4.6% (5511/5768)M
        finish: 0:47:02 speed: 1,672 (3,280) K/sec

With some versions I got about ten times that speed, e.g. with linux-2.6.5:

SVN Revision: 2186M build by root@linbackup-2, 2006-05-06 20:01:39
 0: cs:SyncTarget st:Secondary/Secondary ld:Inconsistent
    ns:0 nr:1090348 dw:1090344 dr:0 al:0 bm:60 lo:3524 pe:1721 ua:3524 ap:0
        [===>................] sync'ed: 19.6% (4349/5400)M
        finish: 0:01:41 speed: 43,868 (33,632) K/sec

In all tests (except where noted) linbackup-1 was the primary and linbackup-2
the secondary. The following kernel configurations were slow:

both nodes running 2.6.16.14,  2.6.15.7, 2.6.14.7, 2.6.13.5, 2.6.12.6,
2.6.11.12, 2.6.10, 2.6.6 and also
linbackup-1(primary) running 2.6.5 and linbackup-2(secondary) running  2.6.16.14

Only with the following configurations did I get the better speed:
both nodes running 2.6.5
linbackup-1(primary) running 2.6.16.14 and linbackup-2(secondary) running  2.6.5
and vice versa when switching primary and secondary
linbackup-1(secondary) with 2.6.5 and linbackup-2(primary) with 2.6.16.14

Other combinations of kernel versions were tested too (unfortunately not all of
them documented). Unless at least one node was running 2.6.5, syncing was
always slow.

Any ideas how I can track down this problem? I don't want to run ancient kernel
revisions on a production system that I can never update. I am not even sure
whether it is a hardware or a software problem. Is any more information needed,
or is there anything else I should test to make it work reliably?
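For narrowing this down, it may help to log the sync speed over time and
correlate it with kernel combinations. A toy sketch (my own helper, not a drbd
tool) that extracts the current speed figure from a /proc/drbd status line of
the form shown above; here it is fed a captured sample line, while on a live
node one would pipe `cat /proc/drbd` into it instead:

```shell
# Pull the current sync speed (K/sec) out of a drbd 0.7 status line.
# The sample line is copied from the slow-sync output earlier in this mail.
sample='        finish: 0:47:02 speed: 1,672 (3,280) K/sec'
speed=$(printf '%s\n' "$sample" | sed -n 's/.*speed: *\([0-9,]*\).*/\1/p' | tr -d ,)
echo "$speed"    # prints 1672
```

Run in a loop with a timestamp, this would give a per-kernel throughput trace
instead of a single eyeballed number.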

Regards
  Torsten



