[DRBD-user] umount costs lots of time in drbd 8.4.3

Lars Ellenberg lars.ellenberg at linbit.com
Tue May 14 02:07:12 CEST 2013



On Thu, May 09, 2013 at 10:33:16AM +0800, Mia Lueng wrote:
> # sysctl -a|grep dirty
> vm.dirty_background_ratio = 10
> vm.dirty_background_bytes = 0
> vm.dirty_ratio = 20
> vm.dirty_bytes = 0
> vm.dirty_writeback_centisecs = 500
> vm.dirty_expire_centisecs = 3000
> 
> bandwidth is 100M bps

You can replicate around 10 to 12 MByte per second.
To avoid long "write-out stalls" when flushing caches,
you should not allow more than about 20 MByte dirty,
and start write out much earlier.
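That estimate is just the link arithmetic (a sketch, ignoring TCP and DRBD protocol overhead):

```shell
# 100 Mbit/s divided by 8 bits per byte gives the raw ceiling in MByte/s
echo $((100 / 8))           # prints 12
# "about 20 MByte dirty" is roughly two seconds of buffered writes at ~10 MByte/s
echo $((2 * 10 * 1000000))  # prints 20000000, i.e. ~20 MByte
```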

vm.dirty_bytes=20100100
vm.dirty_background_bytes=500100
vm.dirty_writeback_centisecs=97

A ratio of 20 % of available RAM may well mean several GB.
How much RAM do you have?
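If you want those settings to survive a reboot, a sysctl drop-in fragment like the following would do (the filename is just an example):

```
# /etc/sysctl.d/99-drbd-writeback.conf  (example filename)
vm.dirty_bytes = 20100100
vm.dirty_background_bytes = 500100
vm.dirty_writeback_centisecs = 97
```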

Depending on what usage patterns and data characteristics
you actually have in production, maybe you want to try drbd-proxy.
Or check with LINBIT what other options you have.
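While experimenting with those settings, you can watch the dirty counters directly (a sketch; Linux-specific, reads /proc/meminfo):

```shell
# Show how much data is currently dirty or under writeback, in kB
awk '/^(Dirty|Writeback):/ {printf "%-12s %8d kB\n", $1, $2}' /proc/meminfo
```

Run it in a loop (e.g. under `watch`) during the umount to see the backlog drain at replication speed.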

> 2013/5/9 Lars Ellenberg <lars.ellenberg at linbit.com>
> 
> > On Thu, May 09, 2013 at 12:16:56AM +0800, Mia Lueng wrote:
> > > In drbd 8.4.3, I did the following test:
> > >
> > > [root at kvm3 drbd.d]# drbdadm dump drbd0
> > > # resource drbd0 on kvm3: not ignored, not stacked
> > > # defined at /etc/drbd.d/drbd0.res:1
> > > resource drbd0 {
> > >     on kvm3 {
> > >         device           /dev/drbd0 minor 0;
> > >         disk             /dev/vg_kvm3/drbd0;
> > >         meta-disk        internal;
> > >         address          ipv4 192.168.10.6:7700;
> > >     }
> > >     on kvm4 {
> > >         device           /dev/drbd0 minor 0;
> > >         disk             /dev/vg_kvm4/drbd0;
> > >         meta-disk        internal;
> > >         address          ipv4 192.168.10.7:7700;
> > >     }
> > >     net {
> > >         protocol           A;
> > >         csums-alg        md5;
> > >         verify-alg       md5;
> > >         ping-timeout      30;
> > >         ping-int          30;
> > >         max-epoch-size   8192;
> > >         max-buffers      8912;
> > >         unplug-watermark 131072;
> > >     }
> > >     disk {
> > >         on-io-error      pass_on;
> > >         disk-barrier      no;
> > >         disk-flushes      no;
> > >         resync-rate      100M;
> > >         c-plan-ahead      20;
> > >         c-delay-target   100;
> > >         c-max-rate       400M;
> > >         c-min-rate        2M;
> > >         al-extents       601;
> > >     }
> > > }
> > >
> > > [root at kvm3 oradata]# dd if=t1 of=t2 bs=1M
> > > 5585+1 records in
> > > 5585+1 records out
> > > 5856305152 bytes (5.9 GB) copied, 286.119 s, 20.5 MB/s
> >
> > That writes to the page cache, and from there to the block device.
> >
> > No fsync, no sync: there will still be a few GB in the cache (RAM only).
> >
> > > [root at kvm3 oradata]# cd
> > > [root at kvm3 ~]# umount /oradata
> > >
> > >
> > > it takes a lot of time (up to 600 seconds) to umount the drbd mount point.
> >
> > On umount, the filesystem obviously has to flush all dirty pages first.
> >
> > What is your replication bandwidth?
> >
> > > echo "1" >/proc/sys/vm/block_dump
> > > shows, during umount:
> > >
> > > [root at kvm3 ~]# dmesg|tail -n 100
> > ...
> > > umount(3958): WRITE block 100925440 on dm-5
> > > umount(3958): WRITE block 100925440 on dm-5
> > > umount(3958): WRITE block 100925440 on dm-5
> > > umount(3958): WRITE block 0 on dm-5
> > > umount(3958): dirtied inode 1053911 (mtab.tmp) on dm-0
> > > umount(3958): dirtied inode 1053911 (mtab.tmp) on dm-0
> > > umount(3958): WRITE block 33845632 on dm-0
> > > umount(3958): dirtied inode 1053912 (?) on dm-0
> > >
> > >
> > > Is the reason that I use protocol A?
> >
> > No.
> >
> > But that you need to understand caching and its tunables.
> >
> > Some hints and keywords for a followup search:
> >
> > Check how much "dirty" data (writes not yet on stable storage)
> > is still in RAM:
> > grep Dirty /proc/meminfo
> >
> > Tune how much dirty data is "allowed"
> > sysctl
> >         vm.dirty_background_bytes
> >         vm.dirty_bytes
> >         vm.dirty_writeback_centisecs
> >         vm.dirty_expire_centisecs
> >
> > also compare:
> > time dd if=t1 of=t2 bs=1M; time sync
> > time dd if=t1 of=t2 bs=1M conv=fsync
> >



-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


