[DRBD-user] primary node crashes

Todd Denniston Todd.Denniston at ssa.crane.navy.mil
Fri Jun 24 16:12:22 CEST 2005


Thomas Böhme wrote:
> 
> Hello,
> 
> a few weeks ago I already posted my problem.
> 

<SNIP>
> When I add the second node the sync starts and
> when it's finished it is a consistent secondary.
> After some hours (max. 1 day) the PRIMARY node
> crashes without a single error in syslog. And
> the secondary only reports that the Primary
> is dead (Network Failure). The secondary
> becomes primary and I add the crashed node as
> secondary. Again the new primary node crashes
> after some hours. If I do not add a secondary
> node, the primary is stable over weeks.

Yikes, that sounds familiar, even though I was using SCSI.

> 
> 1) How is it possible to configure the primary 
> to log everything to syslog when it crashes?

Three things you might do:
1] configure your machines' syslog to send to a separate machine;
see man syslog.conf and search for "Remote Machine".

2] build your own kernel and enable
"Support for console on line printer (LP_CONSOLE)" under
Device Drivers -> Character devices -> \
(PRINTER) -> (LP_CONSOLE)
It might even already be configured in.

3] IIRC there used to be a "console on serial" option in the kernel, with
which you could send the console (syslog) messages to another computer over
the serial line; I just cannot find it in a 2.6.X kernel now.
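
As a rough sketch of options 1] and 3], assuming a sysklogd-style syslogd
and an 8250/16550-compatible serial port. The host name "loghost", the
choice of selectors, and the port and baud rate are placeholders, not
values taken from this thread. In 2.6 kernels the serial console option
appears to live under Device Drivers -> Character devices -> Serial drivers
as "Console on 8250/16550 and compatible serial port"
(CONFIG_SERIAL_8250_CONSOLE).

```
# /etc/syslog.conf on each DRBD node (sysklogd syntax).
# "loghost" is a hypothetical name for the separate log machine.
kern.*          @loghost
*.err           @loghost

# The syslogd on loghost has to accept remote messages;
# with sysklogd that means starting it with the -r option.

# Kernel command line for a serial console (assumes the first serial
# port at 9600 baud and a kernel built with CONFIG_SERIAL_8250_CONSOLE).
# Kernel messages go to both the serial port and the local screen:
console=ttyS0,9600n8 console=tty0
```

With a null-modem cable to a second machine running a terminal program at
the same speed, whatever the kernel prints, including a panic, should show
up on the other side. Remote syslog may miss the panic itself, since
syslogd may no longer get to run once the kernel has panicked.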


Also possibly related to your problem, as you mentioned in
> http://lists.linbit.com/pipermail/drbd-user/2005-June/003054.html
; From that previous message Thomas Böhme wrote:
; And then after some hours the primary crashes 
; with a kernel panic (I have no output from that):

Is the reason you have no output that your systems were running X when
they panicked?
(If you have seen my last few messages to the mailing lists, you'll note my
panic debugging theme.)
If so, switch to run level 3 (and shut down X) before you expect the
system to panic; you should then at least be able to copy from the screen
what caused the panic. Doing this might be easier than getting console on a
printer or syslog to another computer.

> 2) Are there any suggested modifications to add 
> to drbd.conf when using a 1.4 TB storage?

No clue if this should even be a problem. Look back in the mailing list
archives: IIRC there were some conversations about problems with really big
devices (IIRC they were bigger than 2 or 4 TB, though) with a previous
version of DRBD, which may shed some light.

> 3) Can the crash be related to reiserfs or the
> underlying device driver 
> (CONFIG_SCSI_QLA2XXX -> QLogic Corp. QLA2300 
> 64-bit Fibre Channel Adapter)?
> 

Anything is possible, but knowing what panicked the kernel should help
narrow down the options.

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane) 
Harnessing the Power of Technology for the Warfighter


