[DRBD-user] machine hanging after resync (drbd 0.7.4, SLES8)

David Goodwin david at openminds.co.uk
Fri Sep 24 17:28:24 CEST 2004


Hi,

Having installed DRBD on a pair of Shuttle computers (sk41s), I find that 
it works fine until I simulate a failover.


When I power off the primary machine, the secondary takes over OK. But when 
the failed machine rejoins the cluster, it starts the partial resync 
and then hangs hard partway through it.

Any suggestions on how I can debug this further? The keyboard is locked up 
(Caps Lock doesn't respond, etc.), and I've tried Alt+SysRq+[srbp] and so on. 
No messages are written to the console (aside from the normal boot-up 
messages).


I'm guessing it's hardware-related, but these systems only appear to 
have problems when running DRBD. Other mirroring software (e.g. under 
Windows) hasn't behaved in a similar manner.



Interestingly enough, this behaviour is repeatable: if I power-cycle 
the hung machine and let it boot normally, it hangs again and again. 
About 1 time in 4 it seems to succeed, and things go back to normal.


I have noticed that if I reboot the hung machine into single-user mode 
and then start the network and DRBD by hand, it doesn't hang. I've tried 
disabling various init scripts (e.g. hotplug, shmfs), but that 
doesn't help.

</sigh>

The OS is SLES8 with the 2.4.21 kernel; the DRBD version is 0.7.4. (The same 
problems occurred with drbd-0.6.12, although the resync after a failover 
did not seem to fail as consistently.)

Suggestions welcome :)


thanks
David.


