On Mon, Sep 10, 2007 at 02:45:29PM -0700, Jason Zhang wrote:
> I have an experimental setup, and the primary node seems to lockup in a
> few occasions, mostly due to it is not connected to secondary (manually
> disconnected or connection breakage). Once it enters in that state,
> issuing shutdown command won't reboot the machine, had to power cycle.

first, upgrade to 8.0.6, please.
and give your exact kernel version, too.

> My questions are:
>
> 1. what information should I gather in this case?

you could do
	cat /proc/drbd
	ps -eo pid,state,wchan:40,comm | grep drbd
or even
	echo 1 > /proc/sys/kernel/sysrq    # enable all sysrq functions
	echo t > /proc/sysrq-trigger       # dump the state of all tasks to the kernel log
then take the stack dumps from wherever your syslog puts the kernel
messages, into some file, throw away the uninteresting userland
processes, if possible reduce the list to only kernel related, vm, fs,
maybe network and blockdevice related things, _avoid linebreaks_,
gzip it, and post that.

but since this may be A LOT of information which may be completely
useless, someone would need to really be motivated to look at that.

> 2. any software methods to kill/reload drbd besides physically pressing
> the reset/power buttons?

it depends on the nature of that "lockup".
mostly the answer is: no, you have to reset.

well, you can do
	echo s > /proc/sysrq-trigger       # sync all mounted filesystems
	echo u > /proc/sysrq-trigger       # remount all filesystems read-only
maybe stop all stoppable md-devices now, if you know a way to do so, then
	echo b > /proc/sysrq-trigger       # reboot immediately
and hope for the best, which may be slightly more convenient if you
don't have a network power switch, and remote hands are cumbersome
and costly...

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.