[DRBD-user] can't remove directory with a few million files

Fri Jan 28 23:14:54 CET 2011

Hi there,

I'd treat this as a filesystem-handling problem.
I assume there are much more elegant solutions, like running DRBD
degraded to reduce I/O overhead, or tweaking the schedulers or using
ionice to keep some resources free for the system during the rm, but a
very quick solution is to separate most of the read load from the write
load.
e.g. for f in $(find $options_to_match_these); do rm -- "$f"; done
(this form assumes the file names contain no whitespace).
The system may get stuck for a while during the find operation, but for
much less time than during the rm.
For this kind of problem I'd also avoid find -exec rm {} \; although
that might work as well.
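
The idea of listing first and deleting afterwards could be sketched
roughly as below. This is only an illustration under assumptions: the
function name purge_dir and its batch and pause arguments are made up
for this example and do not come from the original setup, and the
defaults are arbitrary values that would need tuning.

```shell
#!/bin/sh
# Sketch: two-phase delete. purge_dir, batch and pause are hypothetical
# names chosen for illustration; the default values are assumptions.

purge_dir() {
    dir=$1              # directory whose files should be removed
    batch=${2:-1000}    # files handed to each rm call (assumed default)
    pause=${3:-1}       # seconds to sleep after each batch (assumed default)

    list=$(mktemp) || return 1

    # Phase 1 (mostly reads): record every regular file, NUL-separated so
    # that names containing spaces or newlines are handled correctly.
    find "$dir" -type f -print0 > "$list"

    # Phase 2 (mostly writes): remove the files batch by batch. Inside
    # sh -c, $0 holds the pause and "$@" holds the current batch of names.
    xargs -0 -n "$batch" sh -c 'rm -f -- "$@"; sleep "$0"' "$pause" < "$list"

    # Clean up the temporary list; the directory itself is left in place.
    rm -f -- "$list"
}
```

It could be called as purge_dir /path/to/dir 500 2 (a placeholder
path), and the emptied directory removed with rmdir afterwards. The
pause between batches is meant to give other I/O on the device a chance
to proceed; whether it helps in this particular case is untested.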

Anyway, once you have managed to remove the files, I'd suggest
rethinking your filesystem, scheduler, and DRBD settings.

Good luck!

> 
> Hello DRBD-users worldwide...
> 
> I've been using DRBD almost a year now, until now without problems that
> I couldn't resolve myself.
> But now I ran into quite a serious problem and I'm interested if someone
> else experienced something similar with or without DRBD (as of course I
> can't really be sure that DRBD is the problem):
> 
> A few months ago a colleague of mine forgot to activate a cronjob, that
> deletes a couple thousand very small temporary files each night on a
> DRBD-device. Now I have a directory with, I guess more than a million
> files, which wouldn't be so bad, if rm -rf {dir}/ could delete it. But
> sadly that is not the case.
> rm gets stuck after it has deleted a few hundred files and doesn't resume
> operation. Furthermore, all IO access on the DRBD device is completely
> stuck until the rm process is killed.
> 
> I've already disconnected all resources from its peer and shut down
> most of the non essential services on the machine.
> 
> It's running Debian Lenny with
> 
> uname -a
> Linux srv1.xxx.at 2.6.26-2-openvz-amd64 #1 SMP Wed May 12 18:14:56 UTC
> 2010 x86_64 GNU/Linux
> 
> cat /proc/drbd
> version: 8.3.7 (api:88/proto:86-91)
> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
> root at srv1.xxx.at, 2010-03-28 21:47:13
>   0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
>      ns:1875795496 nr:0 dw:225995436 dr:566154981 al:105639961
> bm:11019801 lo:2 pe:0 ua:0 ap:1 ep:1 wo:b oos:1242040
>   1: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r----
>      ns:0 nr:31796784 dw:31796784 dr:2253416 al:0 bm:1134 lo:0 pe:0 ua:0
> ap:0 ep:1 wo:d oos:0
>   2: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r----
>      ns:0 nr:57709884 dw:143774088 dr:8480 al:0 bm:50 lo:0 pe:0 ua:0
> ap:0 ep:1 wo:d oos:0
> 
> The filesystem on resource 0 is ext3  with a block size of 4096 and lies
> on a SW-RAID5 (far from ideal - I know).
> 
> 
> Atm. I'm using a bash-hack, that kills the rm-process every 30 seconds
> and restarts it as long as the directory still exists.
> 
> Thanks for any hints to what might cause this problem.
> 
> Joe
> 
> --
> Joseph Hauptmann
> 
> /digiconcept/ - GmbH.
> 1080 Wien
> Blindengasse 52/1
> 
> Tel. +43 1 218 0 212 - 24
> Fax +43 1 218 0 212 - 10
> 