[DRBD-user] Dual Primary Mode: Shared Directory blocked after node crash until reboot

DRBD User DRBDUser at gmx.at
Wed May 13 08:36:30 CEST 2015


OK, I understand:

In a dual-primary setup without a valid STONITH configuration, I have to wait until the crashed node is put into a *known* state, e.g. by a reboot or by manual intervention.

But what if the crashed node never comes back?
Will a STONITH setup put the crashed node into a *known* state, so that the active node can continue to operate?
Or do I have to intervene manually?

So for my plan to run a highly available service (which saves its state to a shared directory), a primary/secondary setup may be the way to go. Or is fencing/STONITH always a must?
 
 

Sent: Tuesday, 12 May 2015, 15:11
From: Ivan <ivan at c3i.bg>
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Dual Primary Mode: Shared Directory blocked after node crash until reboot
On 05/12/2015 02:09 PM, DRBD User wrote:
> Hi
>
> @Cesar: thanks for your suggestion, but I don't want to fence manually.

From Digimer's replies to your posts:

1- the dlm "lock" will be released once the crashed node is set to a
*known* state in pacemaker. Until it is released, forget about using your
shared fs.
2- a *known* state requires a working stonith setup: either automatic
(IPMI, switched PDU, ...), or manual, as Cesar described.
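
As a rough illustration of the two points above, here is a toy Python model of the rule. It is only a sketch: the names `PeerState`, `KNOWN_STATES`, `state_after_crash` and `writes_may_proceed` are made up for this example and do not correspond to anything in pacemaker, dlm or DRBD.

```python
from enum import Enum


class PeerState(Enum):
    """Hypothetical states the surviving node may assign to its peer."""
    ONLINE = "online"
    CLEAN_SHUTDOWN = "clean shutdown"  # peer left in an orderly way (e.g. reboot)
    UNRESPONSIVE = "unresponsive"      # crashed? network cut? nobody can tell
    FENCED = "fenced"                  # stonith (automatic or manually acked) succeeded


# Only these states count as *known*; UNRESPONSIVE is deliberately missing.
KNOWN_STATES = {PeerState.ONLINE, PeerState.CLEAN_SHUTDOWN, PeerState.FENCED}


def state_after_crash(stonith_succeeded: bool) -> PeerState:
    """A crashed peer only becomes FENCED if a fence action was confirmed."""
    return PeerState.FENCED if stonith_succeeded else PeerState.UNRESPONSIVE


def writes_may_proceed(peer: PeerState) -> bool:
    """Model of point 1: the lock is released only for a peer in a known state."""
    return peer in KNOWN_STATES


if __name__ == "__main__":
    for label, peer in [
        ("clean reboot", PeerState.CLEAN_SHUTDOWN),
        ("power pulled, no stonith", state_after_crash(False)),
        ("power pulled, stonith confirmed", state_after_crash(True)),
    ]:
        print(f"{label}: writes may proceed = {writes_may_proceed(peer)}")
```

In this model a crashed node without any confirmed fence action stays unresponsive forever, which matches the behaviour described in this thread: writes block until the node comes back or somebody (or something) confirms that it is really off.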

Now, if you don't want to use stonith and you're brave enough to risk
having a split-brain (you have good backups, the data on the shared fs
is transient/not important, ...), I imagine you could have a shell
script with a loop running in the background that would automatically
ack a manual fence when needed. Or you could write a dummy stonith agent
that would always return success.



>
> During testing I found out that after pulling the power plug, the shared directory is not completely inaccessible: it is readable, and only a write blocks until the crashed node restarts. BUT what if the crashed node never restarts? (My service saves its state into the shared directory and should not block.)
>
> Maybe it's better to switch from active/active to active/passive. Or is the situation (pull power plug, blocking, ...) the same there?
>
> thx
>
> Sent: Tuesday, 12 May 2015, 12:33
> From: "Cesar Peschiera" <brain at click.com.py>
> To: "DRBD User" <DRBDUser at gmx.at>, drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Dual Primary Mode: Shared Directory blocked after node crash until reboot
>
> About your fencing problem:
>
> Instead of hardware fencing, you can use the manual fence that comes with the cluster software.
>
> Please note:
> 1- It does not require any hardware.
> 2- This option isn't advisable in production environments, but it is useful in development environments.
> 3- The program used is "fence_ack_manual".
> 4- It is run from the CLI on a node that is alive, to apply the fence to the other server.
> 5- Before using it, it is advisable to completely disconnect the electric power from the server that will be fenced; the goal is to hard power-off that server before running the fence command.
> 6- Finally, execute this command on a node that is alive:
> Shell# /[PATH]/fence_ack_manual [IP or Name of the Node that will be fenced]
> 7- Follow the steps as directed by this command.
>
> I hope this information is helpful.
>
> Best regards
> Cesar
>
> ----- Original Message -----
> From: DRBD User <DRBDUser at gmx.at>
> To: drbd-user at lists.linbit.com
> Sent: Tuesday, May 12, 2015 5:39 AM
> Subject: [DRBD-user] Dual Primary Mode: Shared Directory blocked after node crash until reboot
>
>
> The DRBD status is the same regardless of a 'nice' shutdown (e.g. reboot) or an 'abrupt' kill (e.g. pulling the power plug):
>
> cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated
>
> But only with a 'nice' shutdown is the shared directory still accessible...
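
For readers unfamiliar with this status line: in DRBD 8.x, `cs` is the connection state, `ro` the roles and `ds` the disk states, with the local node before the slash and the peer after it. A minimal Python sketch that splits such a line into named fields follows; the function name `parse_drbd_status` and the dictionary keys are invented for this example and are not part of any DRBD tool.

```python
def parse_drbd_status(line: str) -> dict:
    """Split a 'cs:... ro:A/B ds:C/D' status line into local/peer values."""
    # Turn "cs:WFConnection ro:Primary/Unknown ..." into {"cs": "WFConnection", ...}
    fields = dict(token.split(":", 1) for token in line.split() if ":" in token)
    result = {"connection": fields.get("cs")}
    for key, name in (("ro", "role"), ("ds", "disk")):
        # The value before the slash is the local node, after it the peer.
        local, _, peer = fields.get(key, "").partition("/")
        result[f"local_{name}"] = local or None
        result[f"peer_{name}"] = peer or None
    return result


status = parse_drbd_status("cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated")
print(status)
```

Since the line is identical after a clean shutdown and after a power pull, the DRBD status by itself does not show why writes to the shared directory block in only one of the two cases.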
>
>
> Sent: Tuesday, 12 May 2015, 09:44
> From: Digimer <lists at alteeve.ca>
> To: "DRBD User" <DRBDUser at gmx.at>, drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Dual Primary Mode: Shared Directory blocked after node crash until reboot
> On 12/05/15 03:42 AM, DRBD User wrote:
>>>>> Pacemaker's pcs property stonith-enabled is currently set to false
>>
>>> Well there's your problem. :)
>>
>> Since I don't have any (hardware) STONITH device, I have set stonith-enabled to false.
>> DRBD's fencing policy is set to 'fencing: resource-only'.
>>
>> My goal is: if one node crashes, the other node should take over the work immediately. But currently I have to wait out the reboot time of the crashed node. I thought that in such a situation the active node (or rather the shared directory) would be immediately usable?
>>
>> Maybe I should use another fence script?
>>
>> I tried to create the resource with the operation 'on-fail=restart', but without success...
>>
>> Any other suggestions?
>
> You *CAN NOT* safely proceed when a node stops responding _until_ you
> have put the lost node into a known state. To do otherwise would be to
> risk a split-brain.
>
> Good fence devices are switched PDUs, like the APC-brand AP7900 (not
> all makes/models are supported, so check first before buying other
> brands). The AP7900 can usually be found used for ~$200 and makes an
> excellent external fence device.
>
> Trying to use DRBD without proper fencing will result in pain and
> heartache. The delay needed to fence a lost node is FAR preferable to
> risking a split-brain.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user