[DRBD-user] Strange reboot situation

Lars Ellenberg lars.ellenberg at linbit.com
Sun Sep 7 01:11:35 CEST 2008


On Fri, Sep 05, 2008 at 10:15:49PM +0100, Henri Cook wrote:
> Hi all, I have a bizarre problem I'm hoping you can help me with.
> 
> Node A and Node B have /dev/drbd0 mounted in Primary-Primary on /shared
> 
> If Node B reboots, Node A stays online with the drive mounted and
> resyncs normally upon its return.
> 
> If, however, there is an FTP transfer in progress to /shared on Node A
> when Node B gets rebooted, as soon as Node A loses the DRBD connection
> (Primary/Unknown) it chooses to reboot itself also.
> 
> This obviously means my HA setup is going down in a sort of chain
> reaction when under load - have I missed some obvious on-net-loss reboot
> type option?
> 
> UPDATE: It appears that when I just reboot Node B with no active
> transfer, it registers as 'WFConnection', whereas if I reboot with an
> active transfer it registers as a 'NetworkFailure' - safe to assume
> then that default NetworkFailure behaviour is to reboot - can anyone
> tell me how to change this?

No.
DRBD does no such thing.  "NetworkFailure" is just one of the normal
transitional states DRBD goes through if the replication link "goes
away" unexpectedly for whatever reason; it is expected to settle
"quickly" into one of the less transient states like StandAlone or
WFConnection.

On occasion DRBD calls "user space helpers", called handlers.
Check whether you have any halt/reboot/poweroff commands configured
there (though I don't see a reason for any of those helpers to be
called in this scenario).
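
One way to check is to scan the handlers sections of the config for
commands that would take the node down. A rough sketch in Python,
assuming the handlers use the usual `name "command";` form; the function
name and the list of suspicious words are made up for illustration:

```python
import re

# Words that suggest a handler command stops or resets the node.
# This list is an assumption; extend it to match your setup.
SUSPECT = ("halt", "reboot", "poweroff", "shutdown", "sysrq-trigger")

def find_suspect_handlers(conf_text):
    """Return (handler_name, command) pairs that look like they stop the node."""
    # Drop full-line comments so commented-out handlers are not reported.
    text = re.sub(r"(?m)^[ \t]*#.*$", "", conf_text)
    hits = []
    # Look only inside "handlers { ... }" sections.
    for section in re.findall(r"handlers\s*\{(.*?)\}", text, flags=re.S):
        # Each handler is written as: name "command";
        for name, command in re.findall(r'([\w-]+)\s+"([^"]*)"\s*;', section):
            if any(word in command for word in SUSPECT):
                hits.append((name, command))
    return hits
```

Feed it the contents of your drbd.conf (commonly /etc/drbd.conf) and
look at whatever it returns. It is a plain text scan, not a real parser,
so a command containing a closing brace will confuse it.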

OCFS2 does a "ping to disk" and fences itself (the node resets)
if this "ping to disk" cannot be served within a (configurable) timeout.
Maybe that is your problem?
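
To get a feeling for how short that timeout is on your nodes, here is a
small sketch in Python. It assumes the o2cb setting
O2CB_HEARTBEAT_THRESHOLD and the commonly documented relation
timeout = (threshold - 1) * 2 seconds; please check both against the
OCFS2 documentation for your version. The function names are made up.

```python
import re

def heartbeat_threshold(conf_text):
    """Return O2CB_HEARTBEAT_THRESHOLD from o2cb config text, or None if unset."""
    # Match an uncommented assignment, with or without quotes around the value.
    m = re.search(r'(?m)^[ \t]*O2CB_HEARTBEAT_THRESHOLD[ \t]*=[ \t]*"?(\d+)"?',
                  conf_text)
    return int(m.group(1)) if m else None

def timeout_seconds(threshold):
    """Disk heartbeat timeout in seconds (assumed relation, see above)."""
    return (threshold - 1) * 2
```

The o2cb config usually lives in /etc/default/o2cb or /etc/sysconfig/o2cb,
depending on the distribution. If the resulting timeout is shorter than
the time DRBD needs to notice that the peer is gone, I/O may stall long
enough for the node to fence itself.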

You could also update to DRBD 8.0.13 and see if that helps.

Regarding debugging messages from DRBD:
you can make DRBD very noisy by echoing some values into
/sys/module/drbd/parameters/trace*. For the values, see drbd_int.h in
the DRBD sources and search for TraceLevel and TraceType.
I don't think that will help you,
but you asked for it (off-list).
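
Before changing anything it may be worth recording the current values,
so you can put them back afterwards. A minimal sketch in Python that
just reads whatever trace* files exist under the module parameter
directory; the function name is made up, and no particular parameter
names are assumed:

```python
import glob
import os

def read_trace_params(param_dir="/sys/module/drbd/parameters"):
    """Return {parameter_name: current_value} for all trace* parameters."""
    values = {}
    for path in sorted(glob.glob(os.path.join(param_dir, "trace*"))):
        name = os.path.basename(path)
        try:
            with open(path) as f:
                values[name] = f.read().strip()
        except OSError:
            # Unreadable parameter (permissions, module unloaded meanwhile).
            values[name] = None
    return values
```

An empty result means the module is not loaded or was built without
these parameters.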

-- 
: Lars Ellenberg                
: LINBIT HA-Solutions GmbH
: DRBD®/HA support and consulting    http://www.linbit.com

DRBD® and LINBIT® are registered trademarks
of LINBIT Information Technologies GmbH
__
please don't Cc me, but send to list   --   I'm subscribed


