/ 2006-05-28 18:39:21 -0400
\ Maurice Volaski:
> drbd-0.7.19 under kernel 2.6.17-rc4 is running on a primary node
> standalone. There are 8 resources in the same group. fsck.ext3 -fv
> is being run simultaneously on all of them. Each of the drbd devices
> is running on an lv, and all the lvs belong to a single pv. The actual
> "disk" is a hardware RAID connected via SCSI (i.e., the mpt driver).
>
> Five of the fscks finished their tasks successfully and reported no
> problems. The remaining three got "stuck". There was no activity
> either on the physical RAID itself or listed in top. They were just
> listed as "D", uninterruptible sleep. Two of the fscks were at the
> end, giving the final summary information (no problems) for their
> respective filesystems, and were stuck at that point. The last one
> was stuck in the first pass. Attempts to kill them failed; even kill
> -9 and an attempted shutdown were ignored. I rebooted manually and
> ran the three stuck ones again without a hitch.

you cannot "kill" a process in "uninterruptible sleep".
that's just the point of being "uninterruptible".

If you suspect a DRBD bug (you obviously do),
 * _reproduce_ it, preferably with a non-rc kernel.
   if you cannot reproduce it, we'll file it under "cosmic rays".
   if you can only reproduce it with some rc kernel, it is very likely
   that drbd is not the cause, but only the trigger:
   -> tell "them" (the kernel developers) about it.

once it is "easily" reproducible:
 * is anything else "stuck"?
 * what are the numbers in /proc/drbd?
 * get the wchan of the stuck processes via top, or
   ps -eo pid,wchan:40,comm, or cat /proc/<pid>/wchan
   (not only of the fsck, but of everything that may be related:
   fs-related kernel threads, vm-related kernel threads,
   drbd-related kernel threads).
 * get a backtrace of the stuck processes, e.g.
   echo t > /proc/sysrq-trigger
   (again, of everything that may be related; the traces end up
   in the kernel log, so read them with dmesg).
   note that these are not necessarily reliable, but may be helpful
   to understand what's going on.
 * anything else that may be helpful for debugging the problem.
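For the wchan step, here is a minimal Python 3 sketch that does roughly what `ps -eo pid,wchan:40,comm` does, restricted to processes in uninterruptible sleep ("D"). It assumes only the standard Linux procfs files /proc/<pid>/stat and /proc/<pid>/wchan; the function names are made up for illustration. On kernels built without symbol lookup, wchan may show 0 or an address instead of a function name. It does not replace the sysrq backtrace: wchan is only the innermost wait point, not the whole stack.

```python
#!/usr/bin/env python3
"""List processes in uninterruptible sleep ("D") with their wchan.

Sketch only: assumes the usual Linux procfs layout (/proc/<pid>/stat,
/proc/<pid>/wchan). Function names are hypothetical, for illustration.
"""
import os


def read_text(path):
    """Return the stripped contents of a procfs file, or "" if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # the process may have exited meanwhile, or we lack permission
        return ""


def parse_stat(text):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' instead of on whitespace.
    """
    left, _, right = text.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state


def stuck_processes(proc="/proc"):
    """Yield (pid, comm, wchan) for every process listed in /proc
    (thread-group leaders, kernel threads included) in state 'D'."""
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # skip /proc/self, /proc/drbd, /proc/meminfo, ...
        stat = read_text(os.path.join(proc, name, "stat"))
        if not stat:
            continue
        pid, comm, state = parse_stat(stat)
        if state == "D":
            wchan = read_text(os.path.join(proc, name, "wchan")) or "?"
            yield pid, comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in sorted(stuck_processes()):
        print(f"{pid:>7}  {comm:<16}  {wchan}")
```

Run it (as root, to be safe) while the fsck hangs, and run it a few times: a process that sits in D at the same wchan across several samples is a better "stuck" candidate than one merely caught in the middle of an I/O.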
you seem to have a habit of complaining on many lists about many things;
by now you should know that just saying "it does not work" won't help
in solving issues. And you have probably already been told many times
what is necessary for a useful bug report.

as for the question:
[Q] What would cause fsck running on a drbd device to just stop?

it could be that you have just been too impatient.
it could be a process scheduler bug, or an io-scheduler bug;
it could even be a hardware thing.
it could be an in-kernel memory allocation within the io-path without
GFP_NOIO, so that memory reclaim tries to write out pages through the
very io-path that is waiting for that allocation. this does not need
to be in drbd; drbd just causes more memory pressure.
it could be... lots of things one can speculate about
if all one has is "does not work".

cheers,

-- 
: Lars Ellenberg                                Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe  http://www.linbit.com :
__
please use the "List-Reply" function of your email client.