<div dir="ltr">Hi Digimer, thank you for your support!<br><div><div class="gmail_extra"><br><div class="gmail_quote">2016-09-19 10:09 GMT+02:00 Digimer <span dir="ltr"><<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On 19/09/16 03:37 AM, Marco Marino wrote:<br>
> Hi, I'm trying to build an active/passive cluster with drbd and<br>
> pacemaker for a san. I'm using 2 nodes with one raid controller<br>
> (megaraid) on each one. Each node has an ssd disk that works as cache<br>
> for read (and write?), implementing the proprietary CacheCade technology.<br>
><br>
> Basically, the structure of the san is:<br>
><br>
> Physical disks -> RAID -> Device /dev/sdb in the OS -> Drbd resource<br>
> (that uses /dev/sdb as backend) (using pacemaker with a master/slave<br>
> resource) -> VG (managed with pacemaker) -> Iscsi target (with<br>
> pacemaker) -> Iscsi LUNS (one for each logical volume in the VG, managed<br>
> with pacemaker)<br>
><br>
> A few days ago, the ssd disk was mistakenly removed from the primary node of<br>
> the cluster and this caused a lot of problems: the drbd resource and all<br>
> logical volumes went into read-only mode with a lot of I/O errors, but the<br>
> cluster did not switch to the other node. All filesystems on the initiators<br>
<br>
</span>Corosync detects node faults and causes pacemaker to trigger fencing.<br>
Storage failures, by default, don't trigger failover. DRBD should have<br>
marked the node with failed storage as Diskless and continued to<br>
function by reading and writing to the good node's storage. So that's<br>
where I would start looking for problems.<br></blockquote><div><br></div><div>Yes, the resource became diskless, but could the problem be related to the ordering and colocation constraints in pacemaker? Remember that the drbd resource is managed by pacemaker as a master/slave resource. Diskless in drbd doesn't mean failure for pacemaker, so from pacemaker's point of view everything is normal... (maybe this is a stupid hypothesis.... I'm sorry.)<br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span><br>
> went to read-only mode. There are 2 problems involved here (I think): 1)<br>
> Why does removing the ssd disk cause read-only mode with I/O errors? This<br>
<br>
</span>Was it the FS that remounted read-only? The IO errors could have been<br>
the OS complaining. Without logs, it's impossible to say what was<br>
actually happening.<br></blockquote><div><br></div><div>I don't think the problem is related to the FS. When it happened, I ran lvs on the san and saw a lot of I/O errors on the logical volumes on top of the drbd resource (with the drbd resource in diskless mode).<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span><br>
> means that the ssd is a single point of failure for a single node san<br>
> with megaraid controllers and CacheCade technology..... and 2) Why did drbd<br>
> not work as expected?<br>
<br>
</span>Again, no logs means we can't say. If your array is redundant (RAID<br>
level 1, 5 or 6) then a drive failure should be no problem. If it's not<br>
redundant, then drbd should have been able to continue to function, as<br>
mentioned above.<br></blockquote><div> </div><div>I'm using RAID6, but I think the CacheCade technology sits outside the raid. The ssd disk is not involved in the creation of the raid. As I said, the ssd actually seems to be a single point of failure outside the raid (again, I'm sorry if this is a stupid hypothesis).<br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span><br>
> For point 1) I'm checking with the vendor and I doubt that I can do<br>
> something<br>
<br>
</span>The OS, and DRBD, will react based on what the controller's driver tells<br>
them has happened. Working out why it happened would require pulling the logs<br>
from the controller.<br>
<br>
If I recall our conversation on IRC, I believe you had a supermicro box<br>
and possibly suffered a back-plane failure. If so, then it's possible<br>
more than just one SSD failed. Again, not something we can speak to here<br>
without logs and feedback from your hardware vendor's diagnostics.<br></blockquote><div><br></div><div>No, only the ssd disk was removed, but as I said, the ssd disk is outside the raid.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span><br>
> For point 2) I have errors in the drbd configuration. My idea is that<br>
> when an I/O error happens on the primary node, the cluster should switch<br>
> to the secondary node and shut down the damaged node.<br>
> Here -> <a href="http://pastebin.com/79dDK66m" rel="noreferrer" target="_blank">http://pastebin.com/79dDK66m</a> you can see the current<br>
> drbd configuration, but I need to change a lot of things and I want to<br>
> share my ideas here:<br>
<br>
</span>You need to set 'fencing resource-and-stonith;', first of all. Secondly,<br>
does stonith work in pacemaker? Also, setting 'degr-wfc-timeout 0;'<br>
seems ... aggressive.<br>
<span><br></span></blockquote><div>Should I increase it? The nodes are connected over a direct 10G link for replication.<br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>
> 1) The "handlers" section should be moved to the "common" section of<br>
> global_common.conf and not in the resource file.<br>
<br>
</span>Makes no real difference, but yes, it's normally in common.<br></blockquote><div><br></div><div>Ok<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span><br>
> 2) I'm thinking of modifying the "handlers" section as follows:<br>
><br>
> handlers {<br>
> pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-i<wbr>ncon-degr.sh;<br>
> /usr/lib/drbd/notify-emergency<wbr>-reboot.sh; echo b > /proc/sysrq-trigger ;<br>
> reboot -f";<br>
> pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost<wbr>-after-sb.sh;<br>
> /usr/lib/drbd/notify-emergency<wbr>-reboot.sh; echo b > /proc/sysrq-trigger ;<br>
> reboot -f";<br>
> local-io-error "/usr/lib/drbd/notify-io-error<wbr>.sh;<br>
> /usr/lib/drbd/notify-emergency<wbr>-shutdown.sh; echo o ><br>
> /proc/sysrq-trigger ; halt -f";<br>
><br>
> # Hook into Pacemaker's fencing.<br>
> fence-peer "/usr/lib/drbd/crm-fence-peer.<wbr>sh";<br>
<br>
</span>You also need to unfence-peer, iirc.<br></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span><br>
> }<br>
><br>
><br>
> In this way, when an I/O error happens, the node will be powered off and<br>
> pacemaker will switch resources to the other node (or at least won't<br>
> cause problematic behavior...)<br>
><br>
> 3) I'm thinking of moving the "fencing" directive from the resource to the<br>
> global_common.conf file. Furthermore, I want to change it to<br>
><br>
> fencing resource-and-stonith;<br>
<br>
</span>This is what it should have been, yes. However, I doubt it would have<br>
helped in this failure scenario.<br></blockquote><div><br></div><div>Why? Using local-io-error should power off the node and switch the cluster to the remaining node....<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div><br>
> 4) Finally, in the global "net" section I need to add:<br>
><br>
> after-sb-0pri discard-zero-changes;<br>
> after-sb-1pri discard-secondary;<br>
> after-sb-2pri disconnect;<br>
><br>
> At the end of the work, the configuration will be -> <a href="http://pastebin.com/r3N1gzwx" rel="noreferrer" target="_blank">http://pastebin.com/r3N1gzwx</a><br>
><br>
> Please give me suggestions about mistakes and possible changes.<br>
><br>
> Thank you<br>
<br></div></div></blockquote><div><br><br></div><div>Thank you again<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>
<br>
</div></div><span><font color="#888888">--<br>
Digimer<br>
Papers and Projects: <a href="https://alteeve.ca/w/" rel="noreferrer" target="_blank">https://alteeve.ca/w/</a><br>
What if the cure for cancer is trapped in the mind of a person without<br>
access to education?<br>
</font></span></blockquote></div><br></div></div></div>
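<div dir="ltr"><br>For reference, a minimal sketch of how the fencing and I/O-error settings discussed in this thread could fit together. It assumes DRBD 8.4 (the version is not stated in the thread; in DRBD 9 the fencing option lives in the net section instead). The resource name r0 is hypothetical, and the crm-unfence-peer.sh path is assumed to sit next to the crm-fence-peer.sh script quoted above. Note that the local-io-error handler is only invoked when on-io-error is set to call-local-io-error; with detach, DRBD drops the backing device and continues Diskless through the peer, which matches the behavior described earlier in the thread.<br>
<pre>
# Sketch only, assuming DRBD 8.4 syntax; "r0" is a hypothetical resource name.
resource r0 {
  disk {
    # Freeze I/O and call the fence-peer handler when the peer is lost.
    fencing resource-and-stonith;

    # Pick ONE strategy for lower-level I/O errors:
    #   detach              - drop the backing device, keep running Diskless
    #   call-local-io-error - run the local-io-error handler below
    on-io-error call-local-io-error;
  }

  handlers {
    # Hook into Pacemaker's fencing (path as quoted in the thread).
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Remove the fencing constraint once resync has finished
    # (assumed path, shipped alongside crm-fence-peer.sh).
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";

    # Only reached because on-io-error is call-local-io-error.
    local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
  }

  net {
    # Split-brain recovery policies proposed in the thread.
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
}
</pre>
A fragment like this would normally be checked with "drbdadm dump r0" (which parses and prints the configuration) before being applied with "drbdadm adjust r0". Powering the node off from local-io-error only leads to a clean failover if stonith is working in pacemaker, as asked above.<br></div>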