[DRBD-user] Xen block-drbd and fencing

Mon Oct 4 15:51:40 CEST 2010

Hello everybody,
has anyone used crm-fence-peer.sh for resource-level fencing with 
resources managed by block-drbd?

Looking at the changelog, I can see that it is supposed to work as of 8.3.7:

* crm-fence-peer.sh is now also usable if DRBD is managed from the xen 
block helper script

I've tried to figure out how this is done by looking at the scripts 
(both block-drbd and crm-fence-peer.sh), but found nothing that seems 
related to this feature.

My DRBD version is 8.3.7 (from sources).
The cluster stack is corosync 1.2.1 (from sources).
I have configured 'Xen only' resources at the crm level, while DRBD is 
indirectly managed via the block-drbd script.
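
For context, a guest that uses the block-drbd helper names the DRBD 
resource directly in its disk line. A minimal sketch of such a domU 
config fragment (xm config files use Python syntax); the guest name 
"vm1" and the device "xvda" are assumptions for illustration only:

```python
# Hypothetical domU config fragment (xm config files are Python syntax).
# "vm1" and "xvda" are placeholder names, not taken from a real setup.
name = "vm1"

# The "drbd:" prefix hands the disk to the block-drbd helper script,
# which makes the DRBD resource res1 Primary before the guest starts.
disk = ["drbd:res1,xvda,w"]
```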

My DRBD resources are configured this way:

resource res1 {
   device    /dev/drbd1;
   disk      /dev/vhosts/res1;
   meta-disk internal;
   disk {
     on-io-error detach;
     fencing resource-only;
   }
   handlers {
     split-brain "/usr/lib/drbd/notify-split-brain.sh root";
     fence-peer  "/usr/lib/drbd/crm-fence-peer.sh";
     after-resync-target  "/usr/lib/drbd/crm-unfence-peer.sh";
   }
   syncer {
     rate 40M;
   }
   net {
     allow-two-primaries;
     max-buffers 8000;
     max-epoch-size 8000;
     sndbuf-size 0;
   }
   on cl1 {
     address   10.20.30.41:7789;
   }
   on cl2 {
     address   10.20.30.42:7789;
   }
}

The question is: is this supposed to work with DRBD 8.3.7?
Or does crm-fence-peer.sh need an explicit DRBD resource in the CIB to 
operate on?
In the second case, can I still use Xen resources with block-drbd while 
adding the corresponding DRBD resources to the CIB, or do I need to go 
for a completely crm-managed resource group?

Lastly, with this setup, in case of an I/O error on a host holding the 
resource in Primary state, the manual says the resource will be detached 
and will "run diskless". But I can't figure out whether promoting the 
DRBD resource to Primary on the peer and starting the Xen guest there is 
meant to be handled automatically by the crm-fence-peer scripts, or 
whether I must configure some other scripts to obtain the following 
result:

1) io failure on host cl1 (local disk failure) with res1=Primary on it
2) on-io-error detach detaches res1 from its backing device on cl1, 
   which goes Diskless
3) cl2 notifies crm of cl1 failure (how?)
4) crm starts res1 on cl2
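
The state I expect after step 2 can at least be detected from the DRBD 
status line. A minimal sketch, assuming the 8.3 /proc/drbd field layout 
(cs:, ro:, ds:); the sample line and the function names are illustrative 
assumptions, and the code only detects the condition, it does not notify 
crm or move the guest:

```python
import re

# Illustrative /proc/drbd status line for minor 1 after step 2.
# The exact text is an assumption, not captured from a real node.
SAMPLE = " 1: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r----"

# Fields: minor number, connection state, local/peer role, local/peer disk state.
STATUS_RE = re.compile(
    r"^\s*(\d+):\s+cs:\S+\s+ro:(\w+)/(\w+)\s+ds:(\w+)/(\w+)"
)


def parse_status(line):
    """Return (minor, local_role, peer_role, local_disk, peer_disk) or None."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    minor, local_role, peer_role, local_disk, peer_disk = m.groups()
    return int(minor), local_role, peer_role, local_disk, peer_disk


def is_diskless_primary(line):
    """True if the local node is Primary but has lost its backing disk."""
    status = parse_status(line)
    return status is not None and status[1] == "Primary" and status[3] == "Diskless"


print(is_diskless_primary(SAMPLE))
```

What I am missing is the piece that would turn this condition into 
steps 3 and 4.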

Many thanks in advance.
Sauro Saltini.