
<p dir="ltr">On 19 Sep 2016 5:45 pm, &quot;Marco Marino&quot; &lt;<a href="mailto:marino.mrc@gmail.com">marino.mrc@gmail.com</a>&gt; wrote:<br>

&gt;<br>

&gt; Hi, I&#39;m trying to build an active/passive cluster with drbd and pacemaker for a san. I&#39;m using 2 nodes with one raid controller (megaraid) on each one. Each node has an ssd disk that works as a cache for reads (and writes?), implementing the proprietary CacheCade technology.<br>

&gt;<br>

Did you configure CacheCade yourself? If the write cache was enabled in write-back mode, then suddenly removing the device from under the controller would have caused serious problems, I guess, since the controller expects to write to the ssd cache first and then flush to the hdds. Maybe this explains the read-only mode?</p>

<p dir="ltr">&gt; Basically, the structure of the san is:<br>

&gt;<br>

&gt; Physical disks -&gt; RAID -&gt; Device /dev/sdb in the OS -&gt; Drbd resource (that uses /dev/sdb as backend) (using pacemaker with a master/slave resource) -&gt; VG (managed with pacemaker) -&gt; Iscsi target (with pacemaker) -&gt; Iscsi LUNS (one for each logical volume in the VG, managed with pacemaker)<br>

&gt;<br>

&gt; A few days ago, the ssd disk was wrongly removed from the primary node of the cluster and this caused a lot of problems: the drbd resource and all logical volumes went into read-only mode with a lot of I/O errors, but the cluster did not switch to the other node. All filesystems on the initiators went into read-only mode. There are 2 problems involved here (I think): 1) Why does removing the ssd disk cause a read-only mode with I/O errors? This means that the ssd is a single point of failure for a single node san with megaraid controllers and CacheCade technology. 2) Why did drbd not work as expected?<br>

What did /proc/drbd show at that point (connection state, roles and disk states)?</p>

<p dir="ltr">&gt; For point 1) I&#39;m checking with the vendor and I doubt that I can do something<br>

&gt; For point 2) I have errors in the drbd configuration. My idea is that when an I/O error happens on the primary node, the cluster should switch to the secondary node and shut down the damaged node. <br>

&gt; Here -&gt; <a href="http://pastebin.com/79dDK66m">http://pastebin.com/79dDK66m</a> it is possible to see the actual drbd configuration, but I need to change a lot of things and I want to share my ideas here:<br>

&gt;<br>

&gt; 1) The &quot;handlers&quot; section should be moved to the &quot;common&quot; section of global_common.conf and not kept in the resource file.<br>

&gt;<br>

&gt; 2) I&#39;m thinking of modifying the &quot;handlers&quot; section as follows:<br>

&gt;<br>

&gt; handlers {<br>
&gt; pri-on-incon-degr &quot;/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b &gt; /proc/sysrq-trigger ; reboot -f&quot;;<br>
&gt; pri-lost-after-sb &quot;/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b &gt; /proc/sysrq-trigger ; reboot -f&quot;;<br>
&gt; local-io-error &quot;/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o &gt; /proc/sysrq-trigger ; halt -f&quot;;<br>
&gt; # Hook into Pacemaker&#39;s fencing.<br>
&gt; fence-peer &quot;/usr/lib/drbd/crm-fence-peer.sh&quot;;<br>
&gt; }<br>

&gt;<br>

&gt;<br>

&gt; In this way, when an I/O error happens, the node will be powered off and pacemaker will switch resources to the other node (or at least it won&#39;t cause problematic behavior...)<br>

&gt;<br>

&gt; 3) I&#39;m thinking of moving the &quot;fencing&quot; directive from the resource file to global_common.conf. Furthermore, I want to change it to<br>

&gt;<br>

&gt; fencing resource-and-stonith;<br>

&gt;<br>

&gt;<br>

&gt; 4) Finally, in the global &quot;net&quot; section I need to add:<br>

&gt;<br>

&gt; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect;<br>

&gt;<br>

&gt; When the work is done, the configuration will be -&gt; <a href="http://pastebin.com/r3N1gzwx">http://pastebin.com/r3N1gzwx</a><br>

&gt;<br>

&gt; Please give me suggestions about mistakes and possible changes.<br>

&gt;<br>

&gt; Thank you<br>

</p>
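<p dir="ltr">For what it is worth, here is a minimal sketch of how your points 1) to 4) could sit together in global_common.conf. It assumes DRBD 8.4 syntax, where &quot;fencing&quot; belongs in the &quot;disk&quot; section. The on-io-error line and the commented-out after-resync-target line are my additions, not something taken from your pastebin: as far as I know the local-io-error handler is only invoked when on-io-error is set to call-local-io-error, and crm-unfence-peer.sh is the usual counterpart to crm-fence-peer.sh. Note also that resource-and-stonith only makes sense if STONITH is configured and working in Pacemaker.</p>

<pre>
common {
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b &gt; /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b &gt; /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o &gt; /proc/sysrq-trigger ; halt -f";

        # Hook into Pacemaker's fencing.
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # Assumption (not in the original proposal): remove the constraint again after resync.
        # after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }

    disk {
        # Assumption (not in the original proposal): without this policy the
        # local-io-error handler above is never called.
        on-io-error call-local-io-error;

        # Point 3: DRBD 8.4 keeps "fencing" in the disk section.
        fencing resource-and-stonith;
    }

    net {
        # Point 4: automatic split-brain recovery policies.
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
}
</pre>

<p dir="ltr">Anything already set in the resource file overrides the common section, so the old handlers and fencing lines would have to be removed there for this to take effect.</p>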