Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
On Tue, Jun 01, 2010 at 05:47:34PM +0200, Florian Haas wrote:
> On 2010-06-01 15:17, Caspar Smit wrote:
> > I tested the following situation:
> >
> > - I pulled out BOTH sdb and sdc on node 1.
> > - Did a write action on the iscsitarget from a client machine.
> > - DRBD (on-io-error detach) detached the primary node 1 and became:
> >   Primary/Secondary - Diskless/UpToDate
> > - The performance degraded significantly after it became diskless.
> >
> > Is it possible for the Linbit OCF RA script to detect that a low-level
> > I/O error occurred on the DRBD backing storage (md0)?
> >
> > What I would like is that, in case of a low-level failure (software
> > RAID failure), a failover takes place and node 2 becomes the primary,
> > because right now that doesn't happen.
>
> It doesn't hurt to peruse the documentation. :)
>
> http://www.drbd.org/users-guide/s-handling-disk-errors.html
> http://www.drbd.org/users-guide/s-configure-io-error-behavior.html
>
> If you use a local-io-error handler that simply does
> "echo o > /proc/sysrq-trigger", then in case of an I/O error the node
> will remove itself from the cluster and Pacemaker will initiate failover.

That may be a bit brutal, but it should of course work.

The less brutal way would use the preference scores. The monitor action
of ocf:linbit:drbd adjusts its master score based on various things,
including the local disk state. If the local disk state is not UpToDate,
the master score will be <= 10; if the local disk is UpToDate and we are
connected, it will be 10000.

So unless you have other scores that carry more weight, I expect the PE
to decide that moving the resources over would be a good idea.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
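
[Editor's note: as a sketch only, the "brutal" handler approach described
above might look like the following drbd.conf fragment. The resource name
r0 and the md0 backing device are placeholders taken from this thread, not
a tested configuration; see the linked users-guide pages for the
authoritative syntax.]

    resource r0 {
      disk {
        # Detach from the backing device (here: software RAID md0)
        # on a lower-level I/O error, as Caspar already configured.
        on-io-error detach;
      }
      handlers {
        # On a local I/O error, power the node off via sysrq so that
        # Pacemaker promotes the peer. Brutal, but effective.
        local-io-error "echo o > /proc/sysrq-trigger";
      }
      disk { ... }
    }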