Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi all,

we've been using DRBD for about six months now, and so far everything is
working pretty well. Our setup consists of two identical machines (apart
from the actual HDDs):

Dell 2950 III - 16GB RAM, dual quad-core Xeons 2.0GHz
- Red Hat Enterprise Linux 5.7, 2.6.18-238.19.1.el5
- PERC6/i RAID controller
- 2 disks OS, RAID 1 (sda)
- 4 disks content, RAID 10 (sdb)
- 500GB /dev/sdb5 extended partition
- LVM group 'content' on /dev/sdb5
- 400GB LVM volume 'data' created in LVM group 'content'
- DRBD with /dev/drbd0 on /dev/content/data (content being the LVM group,
  data being the LVM volume)
- /dev/drbd0 is mounted with noatime and ext3 ordered journaling, then
  exported via NFSv3 and mounted by 8 machines (all RHEL5)
- replication is done over a dedicated GBit NIC

The DRBD version is drbd 8.3.8-1, kmod 8.3.8.1. Here is the information
from /proc/drbd:

####
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate B r----
    ns:91617716 nr:15706584 dw:107784232 dr:53529112 al:892898 bm:37118 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
####

This is the latest "official" version available for CentOS/Red Hat from the
CentOS extras repo (as far as I know):
http://mirror.centos.org/centos/5/extras/x86_64/RPMS/

The configuration is identical on both nodes and looks like this:

##################################
# /etc/drbd.d/global_common.conf #
##################################
global {
    usage-count no;
}
common {
    protocol B;
    handlers {}
    startup {}
    disk {
        no-disk-barrier;
        no-disk-flushes;
        no-disk-drain;
    }
    net {
        max-buffers 8000;
        max-epoch-size 8000;
        unplug-watermark 1024;
        sndbuf-size 512k;
    }
    syncer {
        rate 25M;
        al-extents 3833;
    }
}
##################################

##################################
# /etc/drbd.d/production.conf    #
##################################
resource eshop {
    device    /dev/drbd0;
    disk      /dev/content/data;
    meta-disk internal;
    on nfs01.data.domain.de {
        address 10.110.127.129:7789;
    }
    on fallback.dta.domain.de {
        address 10.110.127.130:7789;
    }
}
##################################

The problem we have with this setup is quite complicated to explain. The
read/write performance in daily production use is sufficient and does not
affect the platform as a whole. The usual system load, viewed with top, is
pretty low, usually between 0.5 and 3.

As soon as I produce some artificial I/O on /dev/drbd0 on the primary, the
load pretty much explodes (up to 15) because of blocking I/O. The I/O is
done with dd and pretty small files of about 40MB:

dd if=/dev/zero of=./test-data.dd bs=4096 count=10240

Two successive runs like this make the load go up as far as 10-12,
rendering the whole system useless. In this state a running dd cannot be
interrupted, the NFS exports are totally inaccessible and the whole
production system is at a standstill. Using blktrace/blkparse one can see
that absolutely no I/O is possible.
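For reference, this is roughly how the test and the block-layer
observation are run (the mount point /mnt/drbd is just a placeholder for
our actual export path, everything else matches the setup above):

####
# terminal 1: trace the block layer on the DRBD backing device while the test runs
blktrace -d /dev/sdb -o - | blkparse -i -

# terminal 2: the small write on the drbd-backed filesystem that triggers the stall (~40MB)
cd /mnt/drbd
dd if=/dev/zero of=./test-data.dd bs=4096 count=10240
####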
'top' shows one or two cores at 100% wait:

###
Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,  0.0%id,100.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
###

This lasts for about 3-4 minutes, with the load slowly coming down again.
The behaviour can also be reproduced by using megacli to query the RAID
controller for information; a couple of successive runs result in the
behaviour described above.

And here comes the catch: this only happens if the DRBD layer is in use.
If I produce heavy I/O (100 runs of writing 400MB each)
- on the same block device /dev/sdb
- in the same volume group 'content'
- but on a newly created, _different_ LVM volume
- without using the DRBD layer
the system load rises only marginally and I/O never blocks (see the PS
below for a sketch of that comparison test).

Things I have tried:
- switching from protocol C to B
- disk: no-disk-barrier / no-disk-flushes / no-disk-drain
- net: max-buffers / max-epoch-size / unplug-watermark / sndbuf-size
- syncer: rate, al-extents
- various caching settings on the RAID controller itself (using megacli)
- completely disabling DRBD replication to rule out the replication setup
- ruled out a faulty RAID controller battery, which could force the
  controller into write-through mode instead of write-back
- checked/updated the firmware of the HDDs and the RAID controller
- went through the git changelogs for drbd 8.3, but I cannot say for sure
  which bug might or might not cause this behaviour; there are quite a lot
  of changes that mention sync/resync/replication and the like
- the newest RHEL5 kernel, kernel-2.6.18-274.12.1.el5

Well, if you have gotten this far, thank you for your patience and for
staying with me :-) If you have any suggestions on what could cause this
behaviour, please let me know.

Greetings from Germany,
Volker
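PS: For completeness, the plain-LVM comparison test mentioned above was
done roughly like this; the LV name 'plain', its size and the mount point
are placeholders, the rest matches the setup above:

####
# create a second LV in the same volume group, ext3 like the drbd-backed one
lvcreate -L 50G -n plain content
mkfs.ext3 /dev/content/plain
mkdir -p /mnt/plain
mount -o noatime /dev/content/plain /mnt/plain

# 100 runs of writing a 400MB file, bypassing drbd entirely
for i in $(seq 1 100); do
    dd if=/dev/zero of=/mnt/plain/test-$i.dd bs=1M count=400
done
####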