[DRBD-user] can't remove directory with a few million files

Joseph Hauptmann joseph.hauptmann at digiconcept.net
Fri Jan 28 22:39:23 CET 2011



Thanks again for the tip, but disconnecting the peer (i.e.
WFConnection mode) was the first thing I did.

I'm currently deleting with
find subdir/ -type f | while IFS= read -r LINE ; do rm -vf -- "$LINE" && sleep 0.03; done
That delay seems to be enough to keep the device from blocking
I/O access, so at least the machine is online again. Deleting this
way, though, will most likely take until the end of next week.
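
The throttled deletion described above can also be written so that find hands the file names over as arguments, which avoids any trouble with spaces or other odd characters in names. This is only a minimal sketch: the function name throttled_delete is made up for illustration, and it assumes a sleep that accepts fractional seconds (as GNU sleep does).

```shell
# Hypothetical helper (not from the thread): delete every regular file
# under DIR, sleeping DELAY seconds after each successful removal.
throttled_delete() {
    dir=$1
    delay=${2:-0.03}   # default mirrors the 0.03 s pause used in the thread
    # find passes the file names as arguments to the inner shell, so they
    # are never re-split or re-parsed; "$0" inside the inner script is the delay.
    find "$dir" -type f -exec sh -c '
        for f in "$@"; do
            rm -f -- "$f" && sleep "$0"
        done
    ' "$delay" {} +
}
```

It would be called as throttled_delete subdir/ 0.03. At 0.03 s per file the pauses alone add up to roughly 8.3 hours per million files (1,000,000 x 0.03 s = 30,000 s); the time the removals themselves take comes on top of that.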

enjoy your weekend,

joe

On 28.01.2011 21:59, Moti Levy wrote:
> All I can think of is that DRBD is trying to catch up and causes the
> delays.
> Maybe take one of the nodes offline and try to delete without "real time
> replication" ?
>
> Moti
>
>
> On Fri, Jan 28, 2011 at 2:44 PM, Joseph Hauptmann <joseph at digiconcept.net> wrote:
>
>>   Yes, I did try that. It doesn't make much of a (speed) difference.
>>
>> It seems the problem is not so much that rm gets stuck for good, but that it
>> takes really long breaks (about 20 sec.) while deleting. During those
>> breaks the whole partition is stuck and iostat reports 100% utilization,
>> compared to ~95% while actually deleting files. Could the "hang time" be
>> DRBD writing meta-information (internal in my case) and blocking every other
>> access as long as the meta-data hasn't been written to disk? Of course there
>> is also the ext3 journal that has to be written, but I still don't see why it
>> should take that long: I'm currently timing how long it takes to delete a
>> subdir with 285868 block-sized files in it (already more than 30 min).
>>
>>
>> dmesg is clear, so it does not seem to be a SATA reset.
>>
>> any other ideas?
>>
>>
>>
>>
>>
>> On 2011-01-28 20:02, Moti Levy wrote:
>>
>> Have you tried :
>> find dirname -type f -exec rm {} \;
>>
>>
>> On Fri, Jan 28, 2011 at 1:46 PM, Joseph Hauptmann <joseph at digiconcept.net> wrote:
>>> Hello DRBD-users worldwide...
>>>
>>> I've been using DRBD for almost a year now, so far without problems that I
>>> couldn't resolve myself.
>>> But now I've run into quite a serious problem, and I'd like to know whether
>>> anyone else has experienced something similar, with or without DRBD (as of
>>> course I can't really be sure that DRBD is the problem):
>>>
>>> A few months ago a colleague of mine forgot to activate a cronjob that
>>> deletes a couple of thousand very small temporary files each night on a
>>> DRBD device. Now I have a directory with, I guess, more than a million files,
>>> which wouldn't be so bad if rm -rf {dir}/ could delete it. But sadly that
>>> is not the case.
>>> rm gets stuck after it has deleted a few hundred files and doesn't resume
>>> operation. Furthermore, all I/O access on the DRBD device is completely
>>> stuck until the rm process is killed.
>>>
>>> I've already disconnected all resources from their peers and shut down most
>>> of the non-essential services on the machine.
>>>
>>> It's running Debian Lenny with
>>>
>>> uname -a
>>> Linux srv1.xxx.at 2.6.26-2-openvz-amd64 #1 SMP Wed May 12 18:14:56 UTC 2010 x86_64 GNU/Linux
>>>
>>> cat /proc/drbd
>>> version: 8.3.7 (api:88/proto:86-91)
>>> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root at srv1.xxx.at, 2010-03-28 21:47:13
>>>   0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
>>>     ns:1875795496 nr:0 dw:225995436 dr:566154981 al:105639961 bm:11019801 lo:2 pe:0 ua:0 ap:1 ep:1 wo:b oos:1242040
>>>   1: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r----
>>>     ns:0 nr:31796784 dw:31796784 dr:2253416 al:0 bm:1134 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
>>>   2: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r----
>>>     ns:0 nr:57709884 dw:143774088 dr:8480 al:0 bm:50 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
>>>
>>> The filesystem on resource 0 is ext3 with a block size of 4096 and sits
>>> on a software RAID5 (far from ideal, I know).
>>>
>>>
>>> At the moment I'm using a bash hack that kills the rm process every 30
>>> seconds and restarts it as long as the directory still exists.
>>>
>>> Thanks for any hints as to what might cause this problem.
>>>
>>> Joe
>>>
>>> --
>>> Joseph Hauptmann
>>>
>>> /digiconcept/ - GmbH.
>>> 1080 Wien
>>> Blindengasse 52/1
>>>
>>> Tel. +43 1 218 0 212 - 24
>>> Fax +43 1 218 0 212 - 10
>>>
>>> _______________________________________________
>>> drbd-user mailing list
>>> drbd-user at lists.linbit.com
>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>>
>>
>>



