[DRBD-user] How to rate-limit device cleanup (shred, dd)

Lars Ellenberg lars.ellenberg at linbit.com
Wed Jul 15 14:49:52 CEST 2015

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Wed, Jul 15, 2015 at 01:01:02PM +0200, Helmut Wollmersdorfer wrote:
> Hi,
> 
> in an environment of
> 
> - raid
> - LVM on raid
> - DRBD on LVM
> - xen guest on DRBD
> 
> the space of the device needs to be cleaned (overwritten with zeroes or random data) before the logical volume is deleted.
> 
> The usual steps would be
> 
> # drbdadm down drbd6_2
> # shred -n0 -zv /dev/vg1/lv_drbd6_2 
> # lvremove /dev/vg1/lv_drbd6_2
> 
> This works nicely for small devices (up to 10 GB):
> 
> # time shred -n0 -zv /dev/vg1/lv_drbd6_1
> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...
> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...800MiB/10GiB 7%
> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...1.9GiB/10GiB 19%
> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...2.9GiB/10GiB 29%
> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...3.8GiB/10GiB 38%
> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...4.9GiB/10GiB 49%
> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...5.9GiB/10GiB 59%
> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...7.0GiB/10GiB 70%
> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...8.1GiB/10GiB 81%
> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...9.2GiB/10GiB 92%
> shred: /dev/vg1/lv_drbd6_1: pass 1/1 (000000)...10GiB/10GiB 100%
> 
> real	0m48.919s
> user	0m0.544s
> sys	0m11.917s
> 
> But for larger ones, e.g. 100 GB, it blocks I/O on the Xen node, triggering alerts in the monitoring of the other running Xen guests.
> 
> As far as I could tell from googling, ionice would not work.
> 
> What I found as a solution is piping through pv:
> 
> # time dd if=/dev/zero | pv -L 100M | dd of=/dev/vg1/lv_drbd7_2 
> dd: writing to `/dev/vg1/lv_drbd7_2': No space left on device
>    2GB 0:01:32 [22.1MB/s] [             <=>                                                                                ]

That does not sound right.
Your pv limits to 100 M/s, but you still get only 20 M/s?
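
With dd's default block size of 512 bytes on both ends of that pipe, the copy
may simply be syscall bound (note the ~52 seconds of system time below).
A quick, untested check would be to give both dd invocations a bigger block
size (the bs value here is an arbitrary pick):

  dd if=/dev/zero bs=1M | pv -L 100M | dd of=/dev/vg1/lv_drbd7_2 bs=1M iflag=fullblock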

> 4194305+0 records in
> 4194304+0 records out
> 2147483648 bytes (2.1 GB) copied, 92.9986 s, 23.1 MB/s
> 
> real	1m33.005s
> user	0m7.532s
> sys	0m52.443s
> 
> Is there a better way?

You don't care about the wall clock time needed,
just about the impact on overall system performance?

You really want to avoid clobbering your precious cache,
or even driving "idle" data pages out into swap.

Use direct IO.  Or limit total memory usage, including buffer cache pages,
using cgroups.  And use a rate limit (again via cgroups, if you like, or
something as "crude" as "dd some chunk; sleep 5; dd next chunk", or your
pv -L xM method above).
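
A rough, untested sketch of that "crude" chunked variant, combining direct IO
with a pause between chunks (device name taken from your example; chunk size
and sleep interval are arbitrary):

  # wipe the LV in 1 GiB chunks, bypassing the page cache, pausing in between
  SIZE_MB=$(( $(blockdev --getsize64 /dev/vg1/lv_drbd6_2) / 1024 / 1024 ))
  for (( off = 0; off < SIZE_MB; off += 1024 )); do
      dd if=/dev/zero of=/dev/vg1/lv_drbd6_2 bs=1M count=1024 seek=$off oflag=direct
      sleep 5
  done
  # the last chunk may complain about hitting the end of the device; that is expected

The cgroup variant would put the dd (or shred) into its own cgroup with a
memory limit and a blkio.throttle.write_bps_device limit for the backing
device.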

If your devices support "discard" you could just use blkdiscard.
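
For example, untested, in place of the shred step above (whether a read after
discard actually returns zeroes depends on the device, so verify that before
relying on it for cleanup):

  blkdiscard /dev/vg1/lv_drbd6_2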


-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


