[DRBD-user] DRBD resource fenced by crm-fence-peer.sh with exit code 5

Andrew Martin amartin at xes-inc.com
Tue Jun 17 00:18:07 CEST 2014

Hi Lars,

----- Original Message -----
> From: "Lars Ellenberg" <lars.ellenberg at linbit.com>
> To: drbd-user at lists.linbit.com
> Sent: Friday, June 13, 2014 9:46:12 AM
> Subject: Re: [DRBD-user] DRBD resource fenced by crm-fence-peer.sh with exit code 5
> > During testing, I've tried shutting down the currently-active node. When
> > doing
> > so, the fence peer handler inserts the constraint correctly, but it exits
> > with
> > exit code 5:
> > INFO peer is not reachable, my disk is UpToDate: placed constraint
> > 'drbd-fence-by-handler-ms_drbd_drives'
> 
> "Shutting down", is in how?
> Do you first cut the replication link, while still being primary?
> Well, that *of course* will prevent the other node from being promoted.
> That's exactly what this is supposed to do if a Primary loses the
> replication link.
I'm issuing the "reboot" command on the currently-primary node. I would expect
this to gracefully stop DRBD and hand control over to the other node; is that
correct? I'd like to simulate an accidental reboot of the server, a hardware
failure, and a kernel panic, to verify that the other node can take over if
the current primary fails under any of these conditions.

> 
> > crm-fence-peer.sh exit codes:
> > http://www.drbd.org/users-guide-8.3/s-fence-peer.html
> > 
> > I can see this constraint in the CIB, however, the remaining (still
> > secondary)
> > node fails to promote.
> 
> Yes. Because that constraint tells it to not become Master.
> 
> > Moreover, when the original node is powered back on, it
> > repeatedly attempts to remove the constraint by calling
> > crm-unfence-peer.sh,
> 
> Is that so.
> I don't see why it would do that.
> the crm unfence should be called only by the after-resync-target handler,
> so you would need to have a resync, be sync target, and finish that
> resync successfully.
I actually have a wrapper script configured in /etc/drbd.conf which does some
additional logging:
fence-peer "/usr/local/bin/fence-peer crm-fence-peer";
after-resync-target "/usr/local/bin/fence-peer crm-unfence-peer";
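For context, the fencing-related part of the resource definition looks roughly
like this (the resource name is a placeholder and everything unrelated to
fencing is elided; "fencing resource-only;" is the usual policy paired with
crm-fence-peer.sh):

resource r0 {
  disk {
    fencing resource-only;
  }
  handlers {
    fence-peer "/usr/local/bin/fence-peer crm-fence-peer";
    after-resync-target "/usr/local/bin/fence-peer crm-unfence-peer";
  }
  ...
}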

In this wrapper script, I record the arguments and then call either
crm-fence-peer.sh or crm-unfence-peer.sh:
echo "Calling with $*" >> $LOG

# fence the peer
/usr/lib/drbd/$1.sh $@ >> $LOG 2>> $LOG

I can tail the log and see these lines being printed frequently after
restarting the primary node:
Calling with crm-unfence-peer
Calling with crm-unfence-peer
Calling with crm-unfence-peer
Calling with crm-unfence-peer
Calling with crm-unfence-peer
Calling with crm-unfence-peer

I can update the script to include timestamps as well if that would be
helpful.
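For example, something along these lines (the log path is a placeholder for
wherever $LOG points in my setup, and the date format is just a suggestion):

#!/bin/bash
# /usr/local/bin/fence-peer - thin logging wrapper around the DRBD CRM handlers
LOG=/var/log/drbd-fence-peer.log   # placeholder for the real log location

# record a timestamp along with the arguments DRBD passed us
echo "$(date '+%Y-%m-%d %H:%M:%S') Calling with $*" >> "$LOG"

# call the requested handler (crm-fence-peer.sh or crm-unfence-peer.sh);
# since this is the last command, its exit status becomes the wrapper's,
# which is what DRBD acts on
/usr/lib/drbd/"$1".sh "$@" >> "$LOG" 2>&1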

Thanks,

Andrew


