[DRBD-user] Drbd/pacemaker active/passive san failover

Digimer lists at alteeve.ca
Mon Sep 19 10:09:02 CEST 2016

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On 19/09/16 03:37 AM, Marco Marino wrote:
> Hi, I'm trying to build an active/passive cluster with drbd and
> pacemaker for a san. I'm using 2 nodes with one raid controller
> (megaraid) on each one. Each node has an ssd disk that works as a cache
> for reads (and writes?) using the proprietary CacheCade technology.
> 
> Basically, the structure of the san is:
> 
> Physical disks -> RAID -> Device /dev/sdb in the OS -> Drbd resource
> (that use /dev/sdb as backend) (using pacemaker with a master/slave
> resource) -> VG (managed with pacemaker) -> Iscsi target (with
> pacemaker) -> Iscsi LUNS (one for each logical volume in the VG, managed
> with pacemaker)
> 
> A few days ago, the ssd disk was wrongly removed from the primary node of
> the cluster and this caused a lot of problems: the drbd resource and all
> logical volumes went into read-only mode with a lot of I/O errors, but the
> cluster did not switch to the other node. All filesystems on the initiators

Corosync detects node faults and causes pacemaker to trigger fencing.
Storage failures, by default, don't trigger failover. DRBD should have
marked the node with failed storage as Diskless and continued to
function by reading and writing to the good node's storage. So that's
where I would start looking for problems.
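
For reference, how DRBD reacts to a lower-level I/O error is controlled
by the 'on-io-error' disk option. A minimal sketch, assuming DRBD 8.4
syntax and a hypothetical resource name 'r0':

    resource r0 {
        disk {
            # On an I/O error of the backing device, detach from it and
            # keep serving I/O diskless from the peer's storage.
            on-io-error detach;
        }
    }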

> went into read-only mode. There are 2 problems involved here (I think): 1)
> Why did removing the ssd disk cause read-only mode with I/O errors? This

Was it the FS that remounted read-only? The IO errors could have been
the OS complaining. Without logs, it's impossible to say what was
actually happening.
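
If you still have them, the kernel logs from around the time the SSD was
pulled are the place to start. Something like this (assuming an EL7-style
box; adjust the device and driver names to yours):

    # DRBD, block layer and RAID controller messages
    journalctl -k | grep -Ei 'drbd|sdb|megaraid|i/o error'
    # or, on systems without journalctl
    grep -Ei 'drbd|sdb|megaraid' /var/log/messages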

> means that the ssd is a single point of failure for a single node san
> with megaraid controllers and CacheCade technology..... and 2) Why did
> drbd not work as expected?

Again, no logs means we can't say. If your array is redundant (RAID
level 1, 5 or 6) then a drive failure should be no problem. If it's not
redundant, then drbd should have been able to continue to function, as
mentioned above.

> For point 1) I'm checking with the vendor and I doubt that I can do
> something

The OS, and DRBD, will react based on what the controller's driver tells
them has happened. Working out why it happened would require pulling the
logs from the controller.

If I recall our conversation on IRC, I believe you had a supermicro box
and possibly suffered a back-plane failure. If so, then it's possible
more than just one SSD failed. Again, not something we can speak to here
without logs and feedback from your hardware vendor's diagnostics.

> For point 2) I have errors in the drbd configuration. My idea is that
> when an I/O error happens on the primary node, the cluster should switch
> to the secondary node and shut down the damaged node.
> Here -> http://pastebin.com/79dDK66m it is possible to see the actual
> drbd configuration, but I need to change a lot of things and I want to
> share my ideas here:

You need to set 'fencing resource-and-stonith;', first of all. Secondly,
does stonith work in pacemaker? Also, setting 'degr-wfc-timeout 0;'
seems ... aggressive.
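
A quick sanity check that stonith is actually enabled and has a device
configured, assuming the pcs shell (crmsh has equivalents; 'peer-node'
below is just a placeholder for one of your node names):

    pcs property show stonith-enabled   # should report 'true'
    pcs stonith show                    # should list at least one fence device
    # careful: this really fences the node, only run it as a test
    stonith_admin --reboot peer-node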

> 1) The "handlers" section should be moved in the "common" section of
> global_common.conf and not in the resource file.

Makes no real difference, but yes, it's normally in common.

> 2) I'm thinking of modifying the "handlers" section as follows:
> 
> handlers {
> 		pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh;
> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
> reboot -f";
> 		pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh;
> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
> reboot -f";
> 		local-io-error "/usr/lib/drbd/notify-io-error.sh;
> /usr/lib/drbd/notify-emergency-shutdown.sh; echo o >
> /proc/sysrq-trigger ; halt -f";
>  
> 		# Hook into Pacemaker's fencing.
> 		fence-peer "/usr/lib/drbd/crm-fence-peer.sh";

You also need to unfence-peer, iirc.
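
For reference, the usual pairing with the stock scripts shipped in
drbd-utils (paths may differ on your distro) looks roughly like:

    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # drop the fencing constraint again once resync to this node finishes
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }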

> 	}
> 
> 
> In this way, when an I/O error happens, the node will be powered off and
> pacemaker will switch resources to the other node (or at least won't
> create problematic behaviors...)
> 
> 3) I'm thinking of moving the "fencing" directive from the resource to the
> global_common.conf file. Furthermore, I want to change it to
> 
> fencing resource-and-stonith;

This is what it should have been, yes. However, I doubt it would have
helped in this failure scenario.
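
If I remember right, with DRBD 8.4 the fencing policy goes in the 'disk'
section (DRBD 9 moved it to 'net'), and it only does something useful
together with the fence-peer handler above. A minimal sketch:

    disk {
        # freeze I/O and call the fence-peer handler when the peer is lost
        fencing resource-and-stonith;
    }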

> 4) Finally, in the global "net" section I need to add:
> 
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
> 
> At the end of the work the configuration will be -> http://pastebin.com/r3N1gzwx
> 
> Please give me suggestions about mistakes and possible changes.
> 
> Thank you


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


