[DRBD-user] lockup on primary node

Lars Ellenberg Lars.Ellenberg at linbit.com
Fri Jul 16 22:18:11 CEST 2004



/ 2004-07-16 14:52:58 -0500
\ John Lange:
> On Fri, 2004-07-16 at 14:16, Lars Ellenberg wrote:
> > have you any indication *what* may hang?
> > can you enable sysrq, and, when it hangs again,
> > hit sysrq-t (which is showTask)... and see which processes are in "D" state,
> > and where (this includes a stack backtrace of each process).
> > (if you hook up a serial console, you could capture the output)
> 
> The machines are in a datacenter across the city so it's not easy to do
> but possible. I'm going to go down there now with a new crossover cable
> to see if that solves the problem.
> 
> If not I will resort to what you mentioned above. The biggest problem is
> that it is so unpredictable. Usually it only happens once every 2-3 days
> but sometimes it's 2-3 times a day.
> 
> > what happens when you unplug the cable?
> > does the "hung" node recover then?
> 
> Haven't tried it but next time I get a hang I'll ask the data centre
> staff to do this instead of the hard-reset. They are fairly cooperative
> but I get the impression I'm starting to use up my free reboot
> support... :|
> 
> Thanks very much for your suggestions.
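
For the sysrq part, a minimal sketch, assuming a shell on the box still
responds and the kernel provides /proc/sysrq-trigger (otherwise use the
keyboard combination on the console). The helper names and the DRY_RUN
switch are made up for illustration; with DRY_RUN=1 (the default here)
nothing is written, the commands are only printed.

```shell
#!/bin/sh
# Sketch only: enable sysrq and request a task dump (sysrq-t).
# DRY_RUN, write_proc, enable_sysrq and dump_tasks are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"

# write_proc VALUE FILE: write VALUE to FILE, or just print the command.
write_proc() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "echo $1 > $2"
    else
        echo "$1" > "$2"
    fi
}

# Enable all sysrq functions (needs root).
enable_sysrq() {
    write_proc 1 /proc/sys/kernel/sysrq
}

# Same as hitting sysrq-t: dump every task's state and stack to the kernel log.
dump_tasks() {
    write_proc t /proc/sysrq-trigger
}
```

Run enable_sysrq ahead of time, then dump_tasks with DRY_RUN=0 when the
hang occurs; the output goes to the kernel log, so a serial console is
still the most reliable way to capture it.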

instead of unplugging the cable, you could also do
 iptables -I INPUT -i eth1 -j DROP
 iptables -I OUTPUT -o eth1 -j DROP
(or even just a drbd disconnect)
on the surviving box (assuming eth1 is the interface on your crossover cable)...
if the other box hangs somewhere within DRBD,
it _may_ recover once DRBD recognizes the connection loss.
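
The two iptables rules can be wrapped so that they are easy to remove
again once the test is done. This is only a sketch: IFACE, DRY_RUN and
the function names are assumptions for illustration, and eth1 is simply
the interface named above. With DRY_RUN=1 (the default here) the
commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch only: simulate pulling the crossover cable with iptables,
# and undo it afterwards. IFACE and DRY_RUN are hypothetical knobs.
IFACE="${IFACE:-eth1}"
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: execute the command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Drop everything arriving on or leaving through the replication link.
block_link() {
    run iptables -I INPUT -i "$IFACE" -j DROP
    run iptables -I OUTPUT -o "$IFACE" -j DROP
}

# Delete the same two rules again (-D matches the rule specification).
unblock_link() {
    run iptables -D INPUT -i "$IFACE" -j DROP
    run iptables -D OUTPUT -o "$IFACE" -j DROP
}
```

Call block_link as root with DRY_RUN=0 on the surviving box, watch
whether the hung node comes back, then call unblock_link so the nodes
can reconnect.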

	lge


