[DRBD-user] ZFS storage backend failed
Julien Escario
escario at azylog.net
Fri Feb 9 18:35:47 CET 2018
Hello,
I'm just doing a lab with a zpool as the storage backend for DRBD (storing VM images
with Proxmox).
Right now it's working pretty well once tuned: I've been able to achieve 500 MB/s
write speed, with just one small oddity regarding concurrent writes from both
hypervisors in the cluster, but that's not the point here.
To complete the resiliency tests, I simply unplugged a disk from one node. My
thought was that DRBD would detect the ZFS failure and detach the resources from
the failed backing device.
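For reference, the behaviour I expected is the one described by the on-io-error
detach policy in the disk section, roughly like this (I haven't double-checked my
exact configuration, so this may not match what I actually have):

    resource vm-101-disk-1 {
        disk {
            on-io-error detach;   # drop the backing device on I/O error, go diskless
        }
    }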
But ... nothing. I/O just hangs on the VMs running on the 'failed' node.
My zpool status:

        NAME      STATE     READ WRITE CKSUM
        drbdpool  UNAVAIL      0     0     0  insufficient replicas
          sda     UNAVAIL      0     0     0
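Could this be related to the pool's failmode property? If I understand it correctly,
the default failmode=wait blocks I/O until the device comes back instead of
returning errors, so DRBD might never see an EIO from the zvol. Something like this
should show or change it (I haven't tested the 'continue' setting myself):

    zpool get failmode drbdpool
    zpool set failmode=continue drbdpool   # return EIO to new writes instead of blocking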
but drbdadm shows this for the locally hosted VM (on the failed node):
vm-101-disk-1 role:Primary
  disk:UpToDate
  hyper-test-02 role:Secondary
    peer-disk:UpToDate
and for the remote VM (on the 'sane' node, from the failed node's point of view):
vm-104-disk-1 role:Secondary
  disk:Consistent
  hyper-test-02 connection:NetworkFailure
So it seems that DRBD didn't detect the I/O failure.
Is there a way to force automatic failover in this case? I probably missed a
detection mechanism.
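(I suppose I could detach the backing device by hand, e.g. with something like:

    drbdadm detach vm-101-disk-1

but I'd like DRBD to do this automatically when the backing storage fails.)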
Best regards,
Julien Escario