Hi,

When installing DRBD on a pair of Shuttle computers (SK41s), I find that it functions fine except after I simulate a failover. When I power off the primary machine, the secondary takes over OK. But when the failed machine rejoins the cluster, it undertakes the partial resync and then hangs hard partway through the resync.

Any suggestions on how I can debug this further? The keyboard is locked up (Caps Lock doesn't work, etc.), and I've tried Alt+SysRq+[srbp] etc. No messages are written to the console (aside from the normal boot-up messages).

I'm guessing it's hardware related, but the same systems only appear to have problems when running DRBD. Other mirroring software (e.g. under Windows) hasn't behaved in a similar manner.

Interestingly enough, this behaviour is repeatable: if I power cycle the hung machine and let it boot normally, it will hang again, and again. (About 1 in 4 times it seems to succeed, and things go back to normal.)

I have noticed that if I reboot the hung machine into single-user mode and then start the network and DRBD, it doesn't hang. I've tried disabling some random init scripts (e.g. hotplug, shmfs, etc.), but it doesn't help. </sigh>

The OS is SLES8 with the 2.4.21 kernel. The DRBD version is 0.7.4. (The same problems were experienced with drbd-0.6.12, although it didn't seem to be as consistent with regard to failing to undertake the resync after the failover.)

Suggestions welcome :)

Thanks,
David