[DRBD-user] can't remove directory with a few million files

Joseph Hauptmann joseph at digiconcept.net
Fri Jan 28 19:46:43 CET 2011


Hello DRBD-users worldwide...

I've been using DRBD for almost a year now, so far without any problems 
that I couldn't resolve myself.
But now I have run into quite a serious problem, and I'd like to know 
whether anyone else has experienced something similar, with or without 
DRBD (of course I can't really be sure that DRBD is the cause):

A few months ago a colleague of mine forgot to activate a cron job that 
deletes a couple of thousand very small temporary files each night on a 
DRBD device. Now I have a directory with, I guess, more than a million 
files, which wouldn't be so bad if rm -rf {dir}/ could delete it. But 
sadly that is not the case.
rm gets stuck after it has deleted a few hundred files and doesn't resume 
operation. Furthermore, all I/O access to the DRBD device is completely 
stuck until the rm process is killed.

I've already disconnected all resources from their peer and shut down 
most of the non-essential services on the machine.

It's running Debian Lenny with

uname -a
Linux srv1.xxx.at 2.6.26-2-openvz-amd64 #1 SMP Wed May 12 18:14:56 UTC 2010 x86_64 GNU/Linux

cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root at srv1.xxx.at, 2010-03-28 21:47:13
  0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
     ns:1875795496 nr:0 dw:225995436 dr:566154981 al:105639961 bm:11019801 lo:2 pe:0 ua:0 ap:1 ep:1 wo:b oos:1242040
  1: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r----
     ns:0 nr:31796784 dw:31796784 dr:2253416 al:0 bm:1134 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
  2: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r----
     ns:0 nr:57709884 dw:143774088 dr:8480 al:0 bm:50 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

The filesystem on resource 0 is ext3 with a block size of 4096 bytes and 
lies on a software RAID 5 (far from ideal, I know).


At the moment I'm using a bash hack that kills the rm process every 30 
seconds and restarts it as long as the directory still exists.
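
A kill-and-restart loop of this kind could look roughly like the sketch 
below. It is not the exact script in use, only an illustration: it assumes 
that the timeout command from GNU coreutils is available (it may have to be 
installed separately on older systems), and the function name 
remove_with_restarts and its arguments are made up for this example.

```shell
#!/bin/bash
# Sketch only: run "rm -rf" under a time limit and start it again
# for as long as the directory still exists.
# remove_with_restarts is a hypothetical helper name.
#
# Usage: remove_with_restarts DIR [SECONDS]
remove_with_restarts() {
    local dir="$1"
    local limit="${2:-30}"   # seconds before rm is killed (30 as in the post)
    local status

    while [ -d "$dir" ]; do
        # timeout sends SIGTERM to rm once the limit is reached
        timeout "$limit" rm -rf -- "$dir"
        status=$?

        # 124 means timeout had to kill rm, so just try again.
        # Any other non-zero status is an error from rm itself
        # (for example missing permissions), so stop instead of spinning.
        if [ "$status" -ne 0 ] && [ "$status" -ne 124 ]; then
            echo "rm failed with status $status, giving up" >&2
            return "$status"
        fi
    done
}
```

It would be called as remove_with_restarts {dir} 30. Note that a process 
blocked in uninterruptible I/O may not react to the signal until the 
pending request returns, so the limit is only approximate.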

Thanks for any hints as to what might cause this problem.

Joe

-- 
Joseph Hauptmann

/digiconcept/ - GmbH.
1080 Wien
Blindengasse 52/1

Tel. +43 1 218 0 212 - 24
Fax +43 1 218 0 212 - 10



