Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Thu, Jul 16, 2015 at 12:07:32PM +0200, Helmut Wollmersdorfer wrote:
>
> Am 15.07.2015 um 14:49 schrieb Lars Ellenberg <lars.ellenberg at linbit.com>:
>
> > On Wed, Jul 15, 2015 at 01:01:02PM +0200, Helmut Wollmersdorfer wrote:
>
> […]
>
> >>
> >> This works nice for small devices (up to 10 GB):
> >>
> >> # time shred -n0 -zv /dev/vg1/lv_drbd6_1
> >> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...
> >> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...800MiB/10GiB 7%
> >> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...1.9GiB/10GiB 19%
> >> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...2.9GiB/10GiB 29%
> >> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...3.8GiB/10GiB 38%
> >> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...4.9GiB/10GiB 49%
> >> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...5.9GiB/10GiB 59%
> >> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...7.0GiB/10GiB 70%
> >> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...8.1GiB/10GiB 81%
> >> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...9.2GiB/10GiB 92%
> >> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...10GiB/10GiB 100%
> >>
> >> real 0m48.919s
> >> user 0m0.544s
> >> sys 0m11.917s
>
> That’s ~200 MB/s
>
>
> >>
> >> But for larger ones, e.g. 100 GB, it blocks the IO on the XEN node, triggering alerts in the monitoring of the other running XEN guests.
> >>
> >> As far as my googling went, ionice would not work.
> >>
> >> What I found as a solution is piping through pv:
> >>
> >> # time dd if=/dev/zero | pv -L 100M | dd of=/dev/vg1/lv_drbd7_2
> >> dd: writing to `/dev/vg1/lv_drbd7_2': No space left on device
> >> 2GB 0:01:32 [22.1MB/s] [ <=> ]
> >
> > That does not sound right.
> > Your pv limits to 100 M/s, but you still get only 20 M/s?
>
> Yeah, I pasted my first try to show exactly this problem.
> In my first try I just took half of the 200 MB/s as the limit, but that way of calling dd has its bottleneck at ~20 MB/s.
As I already said:
bs=1M oflag=direct
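In the pipe variant above, that would be something like this
(untested, reusing the device name and the 100M limit from your example):

  dd if=/dev/zero bs=1M \
    | pv -L 100M \
    | dd of=/dev/vg1/lv_drbd7_2 bs=1M oflag=direct iflag=fullblock

Without an explicit bs=, dd works in 512 byte blocks, which is
presumably where the ~20 MB/s ceiling comes from; oflag=direct keeps
the zeros out of the page cache, and iflag=fullblock lets the writing
dd assemble full 1M blocks from the pipe before issuing the direct writes.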
> This means shred is very fast but un-nice, while pv-piped dd is slow (maybe nice, maybe not).
>
> >
> >> 4194305+0 records in
> >> 4194304+0 records out
> >> 2147483648 bytes (2.1 GB) copied, 92.9986 s, 23.1 MB/s
> >>
> >> real 1m33.005s
> >> user 0m7.532s
> >> sys 0m52.443s
> >>
> >> Is there a better way?
> >
> > You don't care about the wall clock time needed,
> > just about the impact on overall system performance?
>
> Yes, stability of production has top priority.
>
> But duration is not unimportant, depending on the need to reuse the space
> (At the moment the space is not needed).
>
> >
> > You really want to avoid clobbering your precious cache,
> > or even driving out "idle" data pages into swap.
> >
> > Use direct IO. Or limit total memory usage, including buffer cache pages,
> > using cgroups. And use a rate limit (again, using cgroups, if you
> > like, or something as "crude" as "dd some chunk ; sleep 5; dd next chunk",
> > or your pv -L xM method above).
> >
> > If your devices support "discard" you could just use blkdiscard.
>
> This all smells like hours or days of RTFM & testing.
Please. Yes, cgroups is much "cooler".
I'll still give you just the crude and ugly version,
which at least works "everywhere"; rough sketches of the "cooler"
blkdiscard and cgroup variants are at the end of this mail.
Ok, for busybox you may need to remove the bashisms ;-)
zero_me=/dev/X
size_byte=$( blockdev --getsize64 $zero_me )
M_at_once=100
sleep_per_step=2
steps=$(( size_byte / (M_at_once * 2**20) ))
for (( i = 0; i <= steps ; i++ )) ; do
    dd if=/dev/zero of=$zero_me oflag=direct \
        bs=1M count=$M_at_once seek=$(( i * $M_at_once ))
    sleep $sleep_per_step
done
The important part is the oflag=direct
                          ^^^^^^^^^^^^
(and of course the block size...)
You can even add a "sleep only if load > watermark" magic there.
# see if 1/5/15 minute load average is below watermarks
# if no watermark is given, the previous watermark is used
# if no watermark for 1 minute is given, 20 is assumed.
# load fractions are ignored
load_less_than_x()
{
    local x1=$1 x5=$2 x15=$3
    : ${x1:=20} ; : ${x5:=$x1} ; : ${x15:=$x5}
    set -- $(< /proc/loadavg)
    local l1=${1%.*} l5=${2%.*} l15=${3%.*}
    (( $l1 < $x1 || $l5 < $x5 || $l15 < $x15 ))
}
load_less_than_x 12 9 7 || sleep $sleep_per_step
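For the record, rough and untested sketches of the two "cooler"
options mentioned above (device names and numbers made up, adjust to taste):

blkdiscard, if the whole stack supports discard:

  # WARNING: throws away ALL data on that device
  blkdiscard /dev/vg1/lv_drbd7_2

Whether a later read then really returns zeros depends on the device
(see discard_zeroes_data in the queue sysfs attributes).

And a cgroup (v1 blkio) throttle, assuming the controller is mounted
at /sys/fs/cgroup/blkio; the 253:7 major:minor is made up, look up
the real one with lsblk:

  mkdir /sys/fs/cgroup/blkio/slow-zero
  # limit writes to that one device to 100 MiB/s
  echo "253:7 $(( 100 * 2**20 ))" \
    > /sys/fs/cgroup/blkio/slow-zero/blkio.throttle.write_bps_device
  # move the current shell (and thus the dd it spawns) into the cgroup
  echo $$ > /sys/fs/cgroup/blkio/slow-zero/tasks
  dd if=/dev/zero of=/dev/vg1/lv_drbd7_2 bs=1M oflag=direct

Buffered writeback is not accounted to the cgroup there,
so you still want the oflag=direct.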
--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed