Hello Volker,

On 12/15/2011 05:33 PM, Volker wrote:
> Hi Andreas,
>
>>> no-disk-drain;
>>
>> try replacing "no-disk-drain" by "no-md-flushes"
>
> Thanks for your suggestion. Unfortunately setting that made it worse.
> Shortly after
>
> $ drbdadm adjust content
>
> the load on the master went up to 4 and did not decrease afterwards.
> After removing 'no-md-flushes' the load went down to around 1-1.5 again.

Hmmm ... that is unexpected. This behaviour would only make sense if your
controller has no cache at all, or if it is configured to cache reads only.

> But:
>
> There were two resyncs directly after activating and deactivating it. It
> looked like below:
>
> ####
> [root at nfs01 nfs]# cat /proc/drbd
> version: 8.3.8 (api:88/proto:86-94)

Really do an upgrade! elrepo seems to have the latest DRBD 8.3.12 packages.

> GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by
> mockbuild at builder10.centos.org, 2010-06-04 08:04:09
>  0: cs:SyncTarget ro:Primary/Secondary ds:Inconsistent/UpToDate B r----
>     ns:902760 nr:13965696 dw:14865000 dr:75668 al:1163261 bm:45844 lo:0
>     pe:0 ua:0 ap:0 ep:1 wo:d oos:1737728
>     [================>...] sync'ed: 89.0% (1696/15332)M queue_delay: 0.0 ms
>     finish: 0:00:52 speed: 32,928 (25,156) want: 25,600 K/sec
> ###
>
> As you can see, the rate is at around 25MB/s, which is fine and fast
> enough. The system load on the master is not affected by this resync.
>
> Why these resyncs happen, and why so much data is being resynced, is
> another question. The nodes were disconnected for only 3-4 minutes, which
> does not justify that much data.

Anyways ... if you adjust your resource after changing a disk option, the
disk is detached and re-attached. When done on a primary, this means the
complete activity log (AL) is resynced: 3833 * 4MB = 15332MB.

> One further note regarding the blocking I/O:
>
> After issuing the mentioned dd command
>
> $ dd if=/dev/zero of=./test-data.dd bs=4096 count=10240
> 10240+0 records in
> 10240+0 records out
> 41943040 bytes (42 MB) copied, 0.11743 seconds, 357 MB/s

You are benchmarking your page cache here ... add oflag=direct to dd to
bypass it.

> dd finishes within a couple of seconds (1-2) and the system load does
> not increase right away. It takes about 4-5 seconds for the load to
> increase to around 5-6. If I issue a second dd command right after the
> first one finishes, the load increases even higher than 5-6, with the
> second dd command being uninterruptible.

Looks like the I/O system or the network is fully saturated.

> Interestingly, dd _always_ reports speeds of 200-350MB/s, which is
> obviously not the case.
>
> Any more ideas?

Try another RAID controller if the DRBD upgrade is not enough.

Regards,
Andreas

--
Need help with DRBD? http://www.hastexo.com/now

> greetings
> volker
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
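[Editorial sketch: the direct-I/O variant of the benchmark Andreas suggests would look something like the following. The file name and sizes simply mirror Volker's original command; `oflag=direct` requires a filesystem that supports O_DIRECT, and the reported rate then reflects what the storage stack (and DRBD replication) can actually sustain rather than the page cache.]

```shell
# Same write as in the original post, but bypassing the page cache.
# With oflag=direct, dd only returns once each block has been handed
# to the block layer, so the reported MB/s is the real device rate.
dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 oflag=direct

# Compare against the cached run to see how large the gap is:
dd if=/dev/zero of=./test-data.dd bs=4096 count=10240
```

Note that 10240 blocks of 4096 bytes is the same 42MB (41943040 bytes) as in the original test, so the two runs are directly comparable.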