<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p><br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 19/09/2016 19:06, Marco Marino

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAFHVVuLOwbaUwAc1Kw7-y2GD0hAAVn7mweFT207vnxOK6NN8kg@mail.gmail.com"

      type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">2016-09-19 10:50 GMT+02:00 Igor

            Cicimov <span dir="ltr">&lt;<a moz-do-not-send="true"

                href="mailto:igorc@encompasscorporation.com"

                target="_blank">igorc@encompasscorporation.com</a>&gt;</span>:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <p dir="ltr"><span class="">On 19 Sep 2016 5:45 pm, "Marco

                  Marino" &lt;<a moz-do-not-send="true"

                    href="mailto:marino.mrc@gmail.com" target="_blank">marino.mrc@gmail.com</a>&gt;

                  wrote:<br>

                  &gt;<br>

                  &gt; Hi, I'm trying to build an active/passive cluster

                  with drbd and pacemaker for a san. I'm using 2 nodes

                  with one raid controller (megaraid) on each one. Each

                  node has an ssd disk that works as cache for read (and

                  write?) realizing the CacheCade proprietary technology.

                  <br>

                  &gt;<br>

                </span>

                Did you configure the CacheCade? If the write cache was

                enabled in write-back mode then suddenly removing the

                device from under the controller would have caused

                serious problems I guess since the controller expects to

                write to the ssd cache first and then flush to the

                hdd's. Maybe this explains the read only mode?</p>

            </blockquote>

            <div>Good point. It is exactly as you wrote. How can I

              mitigate this behavior in a clustered (active/passive)

              environment? As I said in the other post, I think the

              best solution is to power off the node using local-io-error

              and switch all resources to the other node. But please

              give me some suggestions.<br>

            </div>

            <div><br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    <blockquote

cite="mid:CAFHVVuLOwbaUwAc1Kw7-y2GD0hAAVn7mweFT207vnxOK6NN8kg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <p dir="ltr"><span class="">&gt; Basically, the structure

                  of the san is:<br>

                  &gt;<br>

                  &gt; Physical disks -&gt; RAID -&gt; Device /dev/sdb

                  in the OS -&gt; DRBD resource (that uses /dev/sdb as

                  backend) (using pacemaker with a master/slave

                  resource) -&gt; VG (managed with pacemaker) -&gt;

                  Iscsi target (with pacemaker) -&gt; Iscsi LUNS (one

                  for each logical volume in the VG, managed with

                  pacemaker)<br>

                  &gt;<br>

                  &gt; A few days ago, the ssd disk was wrongly removed

                  from the primary node of the cluster and this caused a

                  lot of problems: the drbd resource and all logical volumes

                  went into read-only mode with a lot of I/O errors, but the

                  cluster did not switch to the other node. All

                  filesystems on the initiators went into read-only mode. There

                  are 2 problems involved here (I think): 1) Why does

                  removing the ssd disk cause read-only mode with I/O

                  errors? This means that the ssd is a single point of

                  failure for a single-node san with megaraid

                  controllers and CacheCade technology. And 2) Why did

                  drbd not work as expected?<br>

                </span>

                What was the state in /proc/drbd ?</p>

            </blockquote>

            <br>

          </div>

        </div>

      </div>

    </blockquote>

    I think you will need to examine the logs to find out what happened.

    My guess (and it is only a wild guess) is that the caching is

    happening between DRBD and iSCSI instead of between DRBD and RAID.

    If the failure had happened under DRBD, then DRBD should have seen

    the read/write error and automatically failed the local storage. It

    wouldn't necessarily fail over to the secondary, but it would serve

    all reads and writes from the secondary node. The fact that this

    didn't happen makes it look like the failure happened above DRBD.<br>

    <br>

    At least that is my understanding of how it will work in that

    scenario.<br>
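    <br>
    As a rough illustration of what I would look for in /proc/drbd, here
    is a small sketch. It assumes the DRBD 8.x text format and an
    on-io-error policy of detach. The sample text and the helper names
    (parse_proc_drbd, served_from_peer) are made up for this example and
    are not taken from Marco's nodes. If DRBD had detached a failed
    backing device, I would expect the local disk state to read Diskless
    while the peer stays UpToDate.<br>
    <pre>
```python
import re

# Illustrative /proc/drbd text in DRBD 8.x format (an assumption);
# this is NOT output captured from the cluster discussed in this thread.
SAMPLE = """\
version: 8.4.5 (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r-----
"""

# Matches the per-minor status line: connection state, roles, disk states.
STATE_LINE = re.compile(
    r"^\s*(\d+): cs:(\S+) ro:(\S+)/(\S+) ds:(\S+)/(\S+)", re.MULTILINE
)


def parse_proc_drbd(text):
    """Return one dict per DRBD minor found in /proc/drbd-style text."""
    resources = []
    for match in STATE_LINE.finditer(text):
        minor, cs, local_role, peer_role, local_disk, peer_disk = match.groups()
        resources.append({
            "minor": int(minor),
            "connection": cs,
            "local_role": local_role,
            "peer_role": peer_role,
            "local_disk": local_disk,
            "peer_disk": peer_disk,
        })
    return resources


def served_from_peer(res):
    """True when the local backing disk is detached but the peer is UpToDate."""
    return res["local_disk"] == "Diskless" and res["peer_disk"] == "UpToDate"


if __name__ == "__main__":
    # On a real node, read open("/proc/drbd").read() instead of SAMPLE.
    for res in parse_proc_drbd(SAMPLE):
        print(res["minor"], res["local_disk"], served_from_peer(res))
```
    </pre>
    If the primary had shown UpToDate for its local disk while the I/O
    errors were occurring, that would fit my guess that the errors never
    reached DRBD.<br>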

    <br>

    Regards,<br>

    Adam<br>

  </body>

</html>