Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,

have you tried with elevator=deadline? What does
"cat /sys/block/dm-*/queue/scheduler" show? (A short example of checking
and switching the scheduler is at the bottom of this mail.)

On Fri, 16 Dec 2011 13:27:55 +0100, Volker <mail at blafoo.org> wrote:
> Hi,
>
>>> [root at nfs01 nfs]# cat /proc/drbd
>>> version: 8.3.8 (api:88/proto:86-94)
>>
>> Really do an upgrade! ... elrepo seems to have latest DRBD 8.3.12
>> packages
>
> Thanks for the hint, we might consider that if nothing else helps :-)
>
> Not that we don't want the newer version. It's the unofficial repository
> that is the problem here. We are quite hesitant about unofficial repos,
> because that system hosts hundreds of customers.
>
>>> Why these resyncs happen and why so much data is being resynced is
>>> another matter. The nodes were disconnected for 3-4 minutes, which
>>> does not justify so much data. Anyway...
>>
>> If you adjust your resource after changing a disk option, the disk is
>> detached/attached ... this means syncing the complete AL when done on a
>> primary ... 3833*4MB=15332MB
>
> Great! Thanks for the insight. I'm really learning some stuff about DRBD
> here!
>
>>> After issuing the mentioned dd command
>>>
>>> $ dd if=/dev/zero of=./test-data.dd bs=4096 count=10240
>>> 10240+0 records in
>>> 10240+0 records out
>>> 41943040 bytes (42 MB) copied, 0.11743 seconds, 357 MB/s
>>
>> you benchmark your page cache here ... add oflag=direct to dd to bypass
>> it
>
> Now this makes me shiver and laugh at the same time (output shortened):
>
> ####
> [root at nfs01 nfs]# dd if=/dev/zero of=./test-data.dd bs=4096 count=10240
> 41943040 bytes (42 MB) copied, 24.7257 seconds, 1.7 MB/s
>
> [root at nfs01 nfs]# dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 oflag=direct
> 41943040 bytes (42 MB) copied, 25.9601 seconds, 1.6 MB/s
>
> [root at nfs01 nfs]# dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 oflag=direct
> 41943040 bytes (42 MB) copied, 44.4078 seconds, 944 kB/s
>
> [root at nfs01 nfs]# dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 oflag=direct
> 30384128 bytes (30 MB) copied, 26.9182 seconds, 1.3 MB/s
> ####
>
> The load rises a little while doing this (to about 3-4), but the system
> remains usable.
>
>> looks like I/O system or network is fully saturated
>
> It seems more like some sort of DRBD cache setting is broken somewhere.
>
> On an LVM volume without DRBD, dd works fine (output shortened):
>
> ####
> [root at nfs01 mnt]# dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 oflag=direct
> 41943040 bytes (42 MB) copied, 0.738741 seconds, 56.8 MB/s
>
> [root at nfs01 mnt]# dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 oflag=direct
> 41943040 bytes (42 MB) copied, 0.746778 seconds, 56.2 MB/s
>
> [root at nfs01 mnt]# dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 oflag=direct
> 41943040 bytes (42 MB) copied, 0.733518 seconds, 57.2 MB/s
>
> [root at nfs01 mnt]# dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 oflag=direct
> 41943040 bytes (42 MB) copied, 0.736617 seconds, 56.9 MB/s
>
> [root at nfs01 mnt]# dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 oflag=direct
> 41943040 bytes (42 MB) copied, 0.73078 seconds, 57.4 MB/s
> ####
>
> The network link is also just fine. We've tested it with almost
> 100 MB/s (that is, megabytes) of throughput. The only possible limit
> here would be the syncer rate of 25 MB/s, but the network link is only
> saturated during a resync.
>
> Any more ideas with this info?
>
> best regards
> volker
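In case it helps, here is roughly how to check and switch the scheduler
at runtime, no reboot needed. (dm-0 below is just an example device name;
use whichever dm device actually backs your DRBD resource.)

####
# show the available schedulers for each dm device; the active one
# is printed in square brackets
cat /sys/block/dm-*/queue/scheduler

# switch a single device to deadline at runtime
# (takes effect immediately, but is not persistent across reboots)
echo deadline > /sys/block/dm-0/queue/scheduler

# to make it the system-wide default, boot with elevator=deadline on
# the kernel command line; verify after the reboot with
cat /proc/cmdline
####

If the backing devices are currently on cfq, trying deadline (or noop)
is cheap and worth doing before digging deeper.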