Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi all,

we've been using DRBD for about six months now, and so far everything is
working pretty well. Our setup consists of two identical machines (apart
from the actual HDDs):

Dell 2950 III - 16GB RAM, dual quad-core Xeons 2.0GHz
- Red Hat Enterprise Linux 5.7, 2.6.18-238.19.1.el5
- PERC6/i RAID controller
- 2 disks OS, RAID 1 (sda)
- 4 disks content, RAID 10 (sdb)
- 500GB /dev/sdb5 extended partition
- LVM group 'content' on /dev/sdb5
- 400GB LVM volume 'data' created in LVM group 'content'
- DRBD with /dev/drbd0 on /dev/content/data (content being the LVM group,
  data being the LVM volume)
- /dev/drbd0 is mounted with noatime and ext3 ordered journaling, then
  exported via NFSv3 and mounted by 8 machines (all RHEL5)
- replication is done over a dedicated GBit NIC

The DRBD version is drbd 8.3.8-1, kmod 8.3.8.1. Here is the information
from /proc/drbd:

####
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate B r----
    ns:91617716 nr:15706584 dw:107784232 dr:53529112 al:892898 bm:37118 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
####

This is the latest "official" version available for CentOS/Red Hat from the
CentOS extras repo (as far as I know):
http://mirror.centos.org/centos/5/extras/x86_64/RPMS/

The configuration is identical on both nodes and looks like this:

##################################
# /etc/drbd.d/global_common.conf #
##################################
global {
    usage-count no;
}
common {
    protocol B;
    handlers {}
    startup {}
    disk {
        no-disk-barrier;
        no-disk-flushes;
        no-disk-drain;
    }
    net {
        max-buffers 8000;
        max-epoch-size 8000;
        unplug-watermark 1024;
        sndbuf-size 512k;
    }
    syncer {
        rate 25M;
        al-extents 3833;
    }
}
##################################

##################################
# /etc/drbd.d/production.conf    #
##################################
resource eshop {
    device    /dev/drbd0;
    disk      /dev/content/data;
    meta-disk internal;
    on nfs01.data.domain.de {
        address 10.110.127.129:7789;
    }
    on fallback.dta.domain.de {
        address 10.110.127.130:7789;
    }
}
##################################

The problem we have with this setup is quite complicated to explain. The
read/write performance in daily production use is sufficient and does not
affect the platform as a whole. The usual system load, viewed with top, is
pretty low, usually between 0.5 and 3.

As soon as I produce some artificial I/O on /dev/drbd0 on the primary, the
load pretty much explodes (up to 15) because of blocking I/O. The I/O is
done with dd and pretty small files of about 40MB:

dd if=/dev/zero of=./test-data.dd bs=4096 count=10240

Two successive runs like this make the load go up as far as 10-12,
rendering the whole system useless. In this state a running dd cannot be
interrupted, the NFS exports are totally inaccessible and the whole
production system is at a standstill. Using blktrace/blkparse one can see
that absolutely no I/O is possible.
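For reference, this is roughly how the test and the block-layer
observation are run (the mount point /mnt/drbd is just a placeholder for
our actual export path, everything else matches the setup above):

####
# terminal 1: trace the block layer on the DRBD backing device while the test runs
blktrace -d /dev/sdb -o - | blkparse -i -

# terminal 2: the small write on the drbd-backed filesystem that triggers the stall (~40MB)
cd /mnt/drbd
dd if=/dev/zero of=./test-data.dd bs=4096 count=10240
####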
'top' shows one or two cores at 100% wait:

###
Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,  0.0%id,100.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
###

This lasts for about 3-4 minutes, with the load slowly coming down again.
The behaviour can also be reproduced by using megacli to query the RAID
controller for information; a couple of successive runs result in the
behaviour described above.

And here comes the catch: this only happens if the DRBD layer is in use.
If I produce heavy I/O (100 runs of writing 400MB each)
- on the same block device /dev/sdb
- in the same volume group 'content'
- but on a newly created, _different_ LVM volume
- without using the DRBD layer
the system load rises only marginally and I/O never blocks (see the PS
below for a sketch of that comparison test).

Things I have tried:
- switching from protocol C to B
- disk: no-disk-barrier / no-disk-flushes / no-disk-drain
- net: max-buffers / max-epoch-size / unplug-watermark / sndbuf-size
- syncer: rate, al-extents
- various caching settings on the RAID controller itself (using megacli)
- completely disabling DRBD replication to rule out the replication setup
- ruled out a faulty RAID controller battery, which could force the
  controller into write-through mode instead of write-back
- checked/updated the firmware of the HDDs and the RAID controller
- went through the git changelogs for drbd 8.3, but I cannot say for sure
  which bug might or might not cause this behaviour; there are quite a lot
  of changes that mention sync/resync/replication and the like
- the newest RHEL5 kernel, kernel-2.6.18-274.12.1.el5

Well, if you have gotten this far, thank you for your patience and for
staying with me :-) If you have any suggestions on what could cause this
behaviour, please let me know.

Greetings from Germany,
Volker
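PS: For completeness, the plain-LVM comparison test mentioned above was
done roughly like this; the LV name 'plain', its size and the mount point
are placeholders, the rest matches the setup above:

####
# create a second LV in the same volume group, ext3 like the drbd-backed one
lvcreate -L 50G -n plain content
mkfs.ext3 /dev/content/plain
mkdir -p /mnt/plain
mount -o noatime /dev/content/plain /mnt/plain

# 100 runs of writing a 400MB file, bypassing drbd entirely
for i in $(seq 1 100); do
    dd if=/dev/zero of=/mnt/plain/test-$i.dd bs=1M count=400
done
####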