Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello. I'm reading tons of information about Pacemaker/Corosync/DRBD and fence devices, and right now I'm feeling really lost.

My environment is a 2-node Corosync/Pacemaker cluster for KVM virtualization, using DRBD as shared storage. It doesn't have a true fence device (IPMI, APC or any other physical device/hardware). I could use my managed switch, but that would cut only the node's external link (the one where the services/VMs are available) and not DRBD's internal link (which runs over a direct crossover cable). And that's it! That is all I have for a production environment.

So... how do I do the fencing/stonith? Should I use the managed switch? Is it enough? In fact I believe I'm facing a lot of limitations in my understanding of fencing/stonith.

If I lose the external link on one node, I might be able to handle that scenario with the "ping" agent and migrate the services to the other node. As far as I can understand, this is not fencing/stonith but just a failover/migration, right? If I lose the external link on both nodes, then the "ping" agent should be smart enough to do nothing!

If I lose the internal link, DRBD replication stops at once. In that case the primary resource should be able to keep running on its current node with no problem. Once DRBD's link is up again, a resync will start and both nodes will be UpToDate at the end of the process. Again, this is not a fencing/stonith situation, right?

So let's look at some fencing/stonith situations that I believe are possible:

- One node crashes (power off, hangs, etc.): node "A" (for example) is crashed, node "B" detects the broken communication and should start a stonith, and only after the stonith's response is it free to restart the stopped services that were previously allocated to node "A". ---> In this scenario, won't my cluster end up with a primary/primary race condition and a split-brain anyway? I mean, node "A" was the primary when it crashed... node "B" takes over as primary and restarts the services, but when node "A" comes back, will it try to become primary again, or will it accept being secondary and start a resync?

- Both nodes crash: no idea what should happen... if only the network failed, both nodes will try a stonith... when communication is back, the stonith will take place and I might see at least one reboot on each node... in this case my "smart" ping agent should do nothing, because it cannot confirm the other node's link!

- Node "B" loses its external link (so my "smart" ping agent starts migrating services to node "A"), but then node "A" crashes... oh my blessed God... I have no idea.

At this point in my (lack of) understanding, I would use the following parameters in drbd.conf:

global {
    usage-count yes;
}
common {
    protocol C;
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    }
    disk {
        on-io-error detach;
        fencing resource-only;
    }
    net {
        data-integrity-alg md5;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    syncer {
        rate 42M;
        al-extents 3389;
        verify-alg md5;
        csums-alg md5;
    }
}

Is that enough, or do I need to enable stonith in Pacemaker? If I need stonith, may I use the stonith:null method on my cluster and put all my faith in the DRBD fence scripts?
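To make it easier to comment on, this is roughly what I have in mind for the "ping" agent and for the stonith:null idea (crm shell syntax; the gateway IP, the node names and the g_vms group are just placeholders for my real setup, and of course I may be getting it wrong, which is part of the question):

    # "ping" agent: keep the VMs on a node that can still reach the gateway
    primitive p_ping ocf:pacemaker:ping \
        params host_list="192.168.0.1" multiplier="1000" \
        op monitor interval="15s" timeout="60s"
    clone cl_ping p_ping
    location loc_vms_on_connected_node g_vms \
        rule -inf: not_defined pingd or pingd lte 0

    # stonith:null: dummy fencing device that always reports success (testing only?)
    primitive st_null stonith:null \
        params hostlist="nodeA nodeB"
    clone cl_st_null st_null
    property stonith-enabled="true"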
As an alternative to stonith:null I was thinking about:

- suicide as a stonith device (I have this on my openSUSE 12.3 openais/pacemaker/corosync box), but I have read a lot of people recommending against it;

- "sbd"... but this software-based fence device should not be used on top of a DRBD resource. And to make me crazier, the same guide gives me hope again by saying that I might be able to use iSCSI over DRBD for sbd fencing (or perhaps that's a mistranslation)! Below are the exact words:

"The SBD devices must not reside on a DRBD instance. Why? Because DRBD is not shared, but replicated storage. If your cluster communication breaks down, and you finally need to actually use stonith, chances are that the DRBD replication link broke down as well, whatever you write to your local instance of DRBD cannot reach the peer, and the peer's sbd daemon has no way to know about that poison pill it is supposed to commit suicide uppon. The SBD device may of course be an iSCSI LU, which in turn may be exported from a DRBD based iSCSI target cluster."

Is iSCSI over DRBD a real possibility for sbd fencing?

If you have recommendations or additional comments, please feel free to send them.

Regards,
Gilsberty

P.S.: I'm sorry for the size of the message.
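P.P.S.: If the iSCSI-over-DRBD approach for sbd is viable, this is roughly how I imagine the stonith resource would look (crm shell syntax again; the device path is only a placeholder for whatever the iSCSI LU shows up as on my nodes):

    primitive stonith_sbd stonith:external/sbd \
        params sbd_device="/dev/disk/by-id/MY-ISCSI-SBD-LU"
    # (plus the sbd daemon running on both nodes against the same device,
    # and stonith-enabled="true" as in the earlier snippet)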