[DRBD-user] Dual Primary Mode: Shared Directory blocked after node crash until reboot

Digimer lists at alteeve.ca
Mon May 11 17:24:41 CEST 2015


On 11/05/15 05:20 AM, DRBD User wrote:
> I have configured these handlers:
> 
> fence-peer: /usr/lib/drbd/crm-fence-peer.sh;
> after-resync-target: /usr/lib/drbd/crm-unfence-peer.sh;
> split-brain: /usr/lib/drbd/notify-split-brain.sh;
> 
> Pacemaker's pcs property stonith-enabled is currently set to false

Well there's your problem. :)

DRBD asks pacemaker to fence the node and waits for a "success". If
stonith is disabled (or enabled but not properly configured), pacemaker
will not return success and thus DRBD will block (assuming you have
'fencing resource-and-stonith;').
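
For reference, here is a minimal sketch of how that fencing policy and the
handlers quoted above might sit together in a resource file. It assumes
DRBD 8.4, where 'fencing' goes in the disk section. The resource name "r0"
is hypothetical and the rest of the resource definition is omitted:

```
# Sketch only: "r0" is a hypothetical resource name; devices, addresses
# and other options are left out.
resource r0 {
  disk {
    # Freeze I/O and call the fence-peer handler when the peer is lost.
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    split-brain         "/usr/lib/drbd/notify-split-brain.sh";
  }
}
```

With this policy, I/O stays frozen until the fence-peer handler reports
success, which is why a working stonith setup in pacemaker matters.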

> I do a synchronization of 2 services using a lck file in the shared directory.
> 
> If one service can lock the lck file, it becomes master service and can serialize its status in the shared directory.
> 
> If one service dies, the other service can take over the work (lock the lck file, deserialize the status, ...)
> 
> When I stop the node running the master service using 'pcs cluster standby node2', the shared directory is accessible on node1 and the service on node1 becomes master. But if I pull the power plug, I have to wait for the reboot to finish before the (new) master service has access to the shared directory.

Fencing is required only when a node enters an unknown state (or
pacemaker is configured to self-fence on failure to stop a resource). So
when you put a node into an "I'm out, later guys" mode, fencing is not
needed.

> 1) >>> Without successful fencing, the only safe option is to lock up
> How do I do this?

Given your report that it blocked, it sounds like it is working. :)

> 2) Is it possible to define how quickly an active node is informed about another node's crash (currently it takes about 2 sec before DRBD changes the status of the crashed node)?

The active node will know about the loss of its peer very quickly
(depending on your corosync/drbd config). That's why fencing was called.
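
As a rough sketch of which settings are involved: on the DRBD side, the
peer-loss detection time is governed by the net options below; on the
corosync side, by the totem token timeout. The values shown are
illustrative assumptions, not recommendations, and "r0" is again a
hypothetical resource name:

```
# DRBD resource file (sketch; values are illustrative)
resource r0 {
  net {
    ping-int     10;   # seconds between keep-alive pings on an idle link
    ping-timeout 5;    # tenths of a second to wait for a ping reply
    timeout      60;   # tenths of a second to wait for an expected reply
  }
}
```

```
# corosync.conf (sketch; value is illustrative)
totem {
  token: 1000          # milliseconds without the token before a node is declared lost
}
```

Lowering these makes failure detection faster but also makes false
positives more likely on a busy or lossy network, so change them with care.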

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


