[DRBD-user] System lockup with DRBD

chambal 2iow-li6l at dea.spamcon.org
Sun Oct 24 05:37:29 CEST 2010


I have a pair of VIA M800 Mini-ITX systems, each with an SSD (one
OCZ Vertex-Turbo, one Intel), running CentOS 5.5 with current
patches.

When DRBD is active on both units, one of them locks up
completely at some random point, always within one day.
In all but one case, it was the Primary unit.

When I say locked up, I mean the PC is completely frozen -
keyboard is dead (can't toggle numlock, and Alt-SysRq - which is
enabled - doesn't work), there's no kernel panic dump on the
physical console, there's no response to tapping the power
switch, and it can't be pinged.  There's nothing in the syslog
after it's forcibly rebooted.

Possibly important clue: the front panel LED for hard disk
activity is solidly on when the failure occurs.

When DRBD is running only on the active (Primary) unit, after
"service drbd stop" on the inactive (Secondary) unit, the
lockup never occurs.

There is very little disk read/write activity on the shared
partition.  Both units are on the same local private LAN segment.

Originally I was using DRBD 8.0.1 (which didn't have this problem
on different, much older hardware and OS), then updated to DRBD
8.0.16, then yesterday to 8.3.9.  The problem is unchanged.
Because the kernel is 2.6.18-194.17.1.el5, I still have to use a
separately built kernel module.

I am rather lost on how to track down the cause of this problem
or find a solution.



