I have a pair of VIA M800 Mini-ITX units with SSDs (one OCZ Vertex-Turbo, one Intel), running CentOS 5.5 with current patches. When DRBD is active on both units, one of them locks up completely at some random point, but always within one day. In all but one case, it has been the Primary unit.

By "locked up" I mean the PC is completely frozen: the keyboard is dead (can't toggle numlock, and Alt-SysRq, which is enabled, doesn't work), there's no kernel panic dump on the physical console, there's no response to tapping the power switch, and it can't be pinged. There's nothing in the syslog after it's forcibly rebooted. Possibly important clue: the front panel LED for hard disk activity is solidly on when the failure occurs.

When DRBD is running only on the active (Primary) unit (after "service drbd stop" on the inactive (Secondary) unit), the lockup never occurs. There is not very much disk read/write activity on the shared partition. Both units are on the same local private LAN segment.

Originally I was using DRBD 8.0.1 (which didn't have this problem on different, much older hardware and OS), then updated to DRBD 8.0.16, then yesterday to 8.3.9. None of the updates made any difference. Because the kernel is 2.6.18-194.17.1.el5, I still have to use a kernel module.

I am rather lost on how to proceed in tracking down the cause of this problem or finding a solution.