[DRBD-user] DRBD over RAID1 / Pacemaker failure detection

Caspar Smit c.smit at truebit.nl
Wed Jun 2 10:50:08 CEST 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


> On Tue, Jun 01, 2010 at 05:47:34PM +0200, Florian Haas wrote:
>> On 2010-06-01 15:17, Caspar Smit wrote:
>> > I tested the following situation:
>> >
>> > - I pulled out BOTH sdb and sdc on node 1.
>> > - Did a write action on the iscsitarget from a client machine.
>> > - DRBD (on-io-error detach) detached the backing disk on node 1, which
>> >   became: Primary/Secondary - Diskless/UpToDate (config excerpt below)
>> > - The performance degraded significantly after it became diskless.
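>> >
>> > For reference, the relevant disk section of my drbd.conf (a sketch;
>> > the surrounding resource definition is omitted):
>> >
>> >   disk {
>> >     on-io-error detach;   # on a lower-level I/O error, drop the
>> >                           # backing device and continue disklessly
>> >   }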
>> >
>> > Is it possible for the Linbit OCF RA script to detect that a low-level
>> > IO error occurred on the DRBD-backed storage (md0)?
>> >
>> > What I would like is for a failover to take place on a low-level
>> > failure (software RAID failure), so that node 2 becomes the primary.
>> > Right now that doesn't happen.
>>
>> It doesn't hurt to peruse the documentation. :)
>>
>> http://www.drbd.org/users-guide/s-handling-disk-errors.html
>> http://www.drbd.org/users-guide/s-configure-io-error-behavior.html
>>
>> If you use a local-io-error handler that simply does
>> "echo o > /proc/sysrq-trigger", then in case of an I/O error the node
>> will remove itself from the cluster and Pacemaker will initiate
>> failover.
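>>
>> A minimal sketch of what that could look like in drbd.conf (DRBD 8.3
>> syntax; the surrounding resource section is omitted):
>>
>>   disk {
>>     on-io-error call-local-io-error;  # invoke the handler below
>>   }
>>   handlers {
>>     # sysrq "o" powers the node off immediately
>>     local-io-error "echo o > /proc/sysrq-trigger";
>>   }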
>
> That may be a bit brutal, but should of course work.
>
> The less brutal way would use the preference scores.
> The monitor action of ocf:linbit:drbd adjusts its master score
> based on various things, including the local disk state.
>
> If the local disk state is not UpToDate, the master score will be <= 10;
> if the local disk is UpToDate and we are connected, it will be 10000.
>
> So unless you have other scores that carry more weight,
> I expect the PE to decide that moving the resources
> over would be a good idea.
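>
> To check what the PE actually sees, you can dump the allocation
> scores, e.g. (assuming the standard pacemaker tools; crm_simulate
> replaces ptest in newer versions):
>
>   ptest -sL | grep -i drbd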

The Policy Engine is the part that is still a little fuzzy to me.
I configured node 1 as the preferred master, so I guess that is why the
PE doesn't decide to fail over:

location ms-drbd0-master-on-node1 ms-drbd0 \
rule $id="ms-drbd0-master-on-node1-rule" $role="master" 100: #uname eq node1

So do I need to adjust this constraint to let it fail over?
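
To rule the constraint out as the cause, I suppose I could temporarily
remove it and repeat the test (crm shell, using the id from above):

  crm configure delete ms-drbd0-master-on-node1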

Thanks for the help,
Caspar Smit

>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed
