Hi there,

I'd attribute this problem to the filesystem handling. I assume there are
much more elegant solutions, such as degrading the DRBD to reduce I/O
overhead, or tweaking the schedulers or ionice to free up some resources
for the system during the rm. A very quick solution, though, is to
separate most of the read load from the write load, e.g.:

    for f in `find $options_to_match_these`; do rm "$f"; done

Because of the backticks, find runs to completion before the first rm
starts. The system may get stuck for a while on the find operation, but
for much less time than on the rm. For this kind of problem I'd also avoid
find -exec rm {}, though that might work as well.

Anyway, once you have managed to rm the files, I'd suggest rethinking
your filesystem, scheduler and DRBD settings.

Good luck!

>
> Hello DRBD-users worldwide...
>
> I've been using DRBD almost a year now, until now without problems that
> I couldn't resolve myself.
> But now I ran into quite a serious problem and I'm interested if someone
> else experienced something similar with or without DRBD (as of course I
> can't really be sure that DRBD is the problem):
>
> A few months ago a colleague of mine forgot to activate a cronjob that
> deletes a couple thousand very small temporary files each night on a
> DRBD-device. Now I have a directory with, I guess, more than a million
> files, which wouldn't be so bad if rm -rf {dir}/ could delete it. But
> sadly that is not the case.
> rm gets stuck after it has deleted a few hundred files and doesn't resume
> operation. Furthermore, all IO access on the DRBD-device is completely
> stuck until the rm process is killed.
>
> I've already disconnected all resources from its peer and shut down
> most of the non-essential services on the machine.
>
> It's running Debian Lenny with
>
> uname -a
> Linux srv1.xxx.at 2.6.26-2-openvz-amd64 #1 SMP Wed May 12 18:14:56 UTC
> 2010 x86_64 GNU/Linux
>
> cat /proc/drbd
> version: 8.3.7 (api:88/proto:86-91)
> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
> root at srv1.xxx.at, 2010-03-28 21:47:13
> 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
> ns:1875795496 nr:0 dw:225995436 dr:566154981 al:105639961
> bm:11019801 lo:2 pe:0 ua:0 ap:1 ep:1 wo:b oos:1242040
> 1: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r----
> ns:0 nr:31796784 dw:31796784 dr:2253416 al:0 bm:1134 lo:0 pe:0 ua:0
> ap:0 ep:1 wo:d oos:0
> 2: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r----
> ns:0 nr:57709884 dw:143774088 dr:8480 al:0 bm:50 lo:0 pe:0 ua:0
> ap:0 ep:1 wo:d oos:0
>
> The filesystem on resource 0 is ext3 with a block size of 4096 and lies
> on a SW-RAID5 (far from ideal - I know).
>
>
> Atm. I'm using a bash-hack that kills the rm-process every 30 seconds
> and restarts it as long as the directory still exists.
>
> Thanks for any hints to what might cause this problem.
>
> Joe
>
> --
> Joseph Hauptmann
>
> /digiconcept/ - GmbH.
> 1080 Wien
> Blindengasse 52/1
>
> Tel. +43 1 218 0 212 - 24
> Fax +43 1 218 0 212 - 10
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
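
P.S. The "list first, delete afterwards" idea from the reply could be sketched a bit more defensively along these lines. This is only a rough sketch, not something tested on the setup described above: purge_dir, the batch size and the pause are made-up names and values, and it assumes find and xargs support -print0, -0 and -r (GNU findutils do). It only removes regular files directly inside the given directory and leaves subdirectories alone.

```shell
#!/bin/sh
# purge_dir DIR [BATCH] [PAUSE]  (hypothetical helper, names are assumptions)
# Phase 1 only reads the directory; phase 2 only unlinks, in small batches.
purge_dir() {
    dir=$1
    batch=${2:-1000}   # files per rm invocation (assumed default)
    pause=${3:-0}      # seconds to sleep after each batch (assumed default)
    list=$(mktemp) || return 1

    # Phase 1: build the complete file list before deleting anything.
    # NUL-separated, so names containing spaces or newlines survive.
    find "$dir" -mindepth 1 -maxdepth 1 -type f -print0 > "$list"

    # Phase 2: delete in batches; the pause gives other I/O a chance to run.
    # Inside the inner shell, $0 is the pause and "$@" are the file names.
    xargs -0 -r -n "$batch" sh -c 'rm -f -- "$@"; sleep "$0"' "$pause" < "$list"

    rm -f -- "$list"
}
```

Something like `purge_dir /path/to/tmpdir 500 1` would then remove 500 files at a time with a one-second pause in between. Whether this actually avoids the stall on ext3 on top of DRBD is something I can't promise; it just keeps each rm short and leaves gaps for other I/O.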