[DRBD-user] umount costs lots of time in drbd 8.4.3

Wed May 8 21:13:18 CEST 2013

On Thu, May 09, 2013 at 12:16:56AM +0800, Mia Lueng wrote:
> In DRBD 8.4.3, I ran the following test:
> 
> [root@kvm3 drbd.d]# drbdadm dump drbd0
> # resource drbd0 on kvm3: not ignored, not stacked
> # defined at /etc/drbd.d/drbd0.res:1
> resource drbd0 {
>     on kvm3 {
>         device           /dev/drbd0 minor 0;
>         disk             /dev/vg_kvm3/drbd0;
>         meta-disk        internal;
>         address          ipv4 192.168.10.6:7700;
>     }
>     on kvm4 {
>         device           /dev/drbd0 minor 0;
>         disk             /dev/vg_kvm4/drbd0;
>         meta-disk        internal;
>         address          ipv4 192.168.10.7:7700;
>     }
>     net {
>         protocol           A;
>         csums-alg        md5;
>         verify-alg       md5;
>         ping-timeout      30;
>         ping-int          30;
>         max-epoch-size   8192;
>         max-buffers      8912;
>         unplug-watermark 131072;
>     }
>     disk {
>         on-io-error      pass_on;
>         disk-barrier      no;
>         disk-flushes      no;
>         resync-rate      100M;
>         c-plan-ahead      20;
>         c-delay-target   100;
>         c-max-rate       400M;
>         c-min-rate        2M;
>         al-extents       601;
>     }
> }
> 
> [root@kvm3 oradata]# dd if=t1 of=t2 bs=1M
> 5585+1 records in
> 5585+1 records out
> 5856305152 bytes (5.9 GB) copied, 286.119 s, 20.5 MB/s

That dd writes into the page cache; the kernel writes the data back
to the block device later, asynchronously.

No fsync, no sync: there will still be a few GB in the cache (RAM only).

> [root@kvm3 oradata]# cd
> [root@kvm3 ~]# umount /oradata
> 
> 
> It takes a long time (up to 600 seconds) to unmount the DRBD mount point.

On umount, the filesystem obviously has to flush all dirty pages first.

What is your replication bandwidth?
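
As a rough sketch of why that matters: the time umount needs is roughly
the amount of dirty data divided by the sustained write rate of the
slowest part of the path (backing disk or, presumably once the send
buffers fill up, the replication link). The helper below only does that
division. Its name and arguments are made up for illustration.

```sh
# Estimate how long it takes to flush a given amount of dirty data.
#   $1: dirty data in kB, as /proc/meminfo reports it (1 kB = 1024 bytes)
#   $2: sustained write rate in MB/s, as dd reports it (1 MB = 10^6 bytes)
# Prints whole seconds. This is a back-of-the-envelope figure, not a measurement.
estimate_flush_seconds() {
    awk -v kb="$1" -v mbps="$2" \
        'BEGIN { printf "%.0f\n", kb * 1024 / (mbps * 1000000) }'
}
```

For example, "estimate_flush_seconds 2000000 20.5" prints 100: a
hypothetical 2000000 kB of dirty data, written out at the 20.5 MB/s that
dd reported above, would take about 100 seconds. The real writeback rate
during umount may well be lower than what dd reported.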

> echo "1" >/proc/sys/vm/block_dump
> shows the following during umount:
> 
> [root@kvm3 ~]# dmesg|tail -n 100
...
> umount(3958): WRITE block 100925440 on dm-5
> umount(3958): WRITE block 100925440 on dm-5
> umount(3958): WRITE block 100925440 on dm-5
> umount(3958): WRITE block 0 on dm-5
> umount(3958): dirtied inode 1053911 (mtab.tmp) on dm-0
> umount(3958): dirtied inode 1053911 (mtab.tmp) on dm-0
> umount(3958): WRITE block 33845632 on dm-0
> umount(3958): dirtied inode 1053912 (?) on dm-0
> 
> 
> Is this because I use protocol A?

No.

The reason is that you need to understand caching and its tunables.

Some hints and keywords for a followup search:

Check how much "dirty" data (writes not yet on stable storage)
is still in RAM:
grep Dirty /proc/meminfo
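
A small sketch for watching that backlog while dd and umount run. It
prints only the Dirty and Writeback counters. The function name and the
optional file argument are made up for illustration.

```sh
# Print the Dirty and Writeback counters (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo. The exact match on field 1 skips WritebackTmp.
dirty_and_writeback_kb() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2 }' \
        "${1:-/proc/meminfo}"
}
```

Run it in a loop, for example "while sleep 1; do dirty_and_writeback_kb; done",
in a second terminal to see Dirty grow during the dd and drain during
the umount.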

Tune how much dirty data is "allowed" via sysctl:
	vm.dirty_background_bytes
	vm.dirty_bytes
	vm.dirty_writeback_centisecs
	vm.dirty_expire_centisecs
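
A sketch for inspecting those four settings before changing anything.
The function name and the optional directory argument are made up for
illustration, and the byte values in the commented-out commands are
arbitrary examples, not recommendations.

```sh
# Show the current writeback tunables (read-only, safe to run).
# $1: directory holding the tunables, defaults to /proc/sys/vm.
show_dirty_tunables() {
    for name in dirty_background_bytes dirty_bytes \
                dirty_writeback_centisecs dirty_expire_centisecs; do
        printf 'vm.%s = %s\n' "$name" "$(cat "${1:-/proc/sys/vm}/$name")"
    done
}

# Example only, illustrative values, run as root:
# sysctl -w vm.dirty_background_bytes=67108864   # background writeback from 64 MiB
# sysctl -w vm.dirty_bytes=268435456             # writers must flush from 256 MiB
```

Note that the *_bytes and *_ratio variants are mutually exclusive:
writing vm.dirty_bytes makes vm.dirty_ratio read as 0, and vice versa.
A smaller limit means less data left to flush at umount time, at the
cost of throttling writers earlier.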

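Also compare the timings below. In the first line dd returns as soon as
the data is in the page cache, and sync then waits for the flush; in the
second, conv=fsync makes dd flush the output file before it exits, so
its reported time includes the writeback: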
time dd if=t1 of=t2 bs=1M; time sync
time dd if=t1 of=t2 bs=1M conv=fsync

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed