[DRBD-user] cman+pacemaker+drbd fencing problem

Wed Feb 15 23:45:59 CET 2012

I think this is a problem with DRBD and not cman+pacemaker, so I'm posting here
first.

I'm trying to set up an active/active HA cluster as explained in:

<http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08.html>

I'll give versions and config files below, but I'll start with what happens. I
start with an active/active cman+pacemaker+drbd+gfs2 cluster, with fencing
enabled. My fencing mechanism cuts power to a node by turning the load off in
its UPS. The two nodes are hypatia-tb and orestes-tb.

I want to test fencing and recovery. I start with both nodes running, and
resources properly running on both nodes. Then I simulate failure on one node,
e.g., orestes-tb. I've done this with "crm node standby", "service pacemaker
off", or by pulling the plug. The results are the same: all the resources move
to hypatia-tb, with the drbd resource as Primary.

When I try to bring orestes-tb back into the cluster with "crm node online" or
"service pacemaker on" (the inverse of how I removed it), orestes-tb is fenced.
OK, that makes sense, I guess; there's a potential split-brain situation.

I bring orestes-tb back up, with the intent of adding it back into the cluster.
I make sure the cman, pacemaker, and drbd services are off at system start. On
orestes-tb, I type "service drbd start".

What I expect to happen is that the drbd resource on orestes-tb is marked
"Outdated" or something like that. Then I'd fix it with "drbdadm
--discard-my-data connect admin" or whatever is appropriate, as in

<http://www.drbd.org/users-guide/s-resolve-split-brain.html>

What actually happens is that hypatia-tb is fenced. Since this is the node
running all the resources, this is bad behavior. It's even more puzzling when I
consider that, at the time, there isn't any fencing resource actually running on
orestes-tb; my guess is that DRBD on hypatia-tb is fencing itself.

Eventually hypatia-tb reboots, I fiddle with things, and the cluster is back to
normal. But as a fencing/stability/HA test, this is a failure.

Any ideas?

Versions:

Scientific Linux 6.2
kernel 2.6.32
cman-3.0.12
corosync-1.4.1
pacemaker-1.1.6
drbd-8.4.1

/etc/drbd.d/global-common.conf:

global {
        usage-count yes;
}

common {
        startup {
                wfc-timeout             60;
                degr-wfc-timeout        60;
                outdated-wfc-timeout    60;
        }
        net {
                ping-timeout 11;
        }
}

/etc/drbd.d/admin.res:

resource admin {

        protocol C;

        on hypatia-tb.nevis.columbia.edu {
                volume 0 {
                        device          /dev/drbd0;
                        disk            /dev/md2;
                        flexible-meta-disk      internal;
                }
                address         192.168.100.7:7788;
        }
        on orestes-tb.nevis.columbia.edu {
                volume 0 {
                        device          /dev/drbd0;
                        disk            /dev/md2;
                        flexible-meta-disk      internal;
                }
                address         192.168.100.6:7788;
        }

        startup {
        }

        net {
                allow-two-primaries     yes;
                after-sb-0pri      discard-zero-changes;
                after-sb-1pri      discard-secondary;
                after-sb-2pri      disconnect;
                sndbuf-size 0;
        }

        disk {
                resync-rate     100M;
                c-max-rate      100M;
                al-extents      3389;
                fencing resource-only;
        }
}

An edited output of "crm configure show":

node hypatia-tb.nevis.columbia.edu
node orestes-tb.nevis.columbia.edu
primitive StonithHypatia stonith:fence_nut \
   params pcmk_host_check="static-list" \
   pcmk_host_list="hypatia-tb.nevis.columbia.edu" \
   ups="sofia-ups" username="admin" password="XXX"
primitive StonithOrestes stonith:fence_nut \
   params pcmk_host_check="static-list" \
   pcmk_host_list="orestes-tb.nevis.columbia.edu" \
   ups="dc-test-stand-ups" username="admin" password="XXX"
location StonithHypatiaLocation StonithHypatia \
   -inf: hypatia-tb.nevis.columbia.edu
location StonithOrestesLocation StonithOrestes \
   -inf: orestes-tb.nevis.columbia.edu
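The two location constraints mean each UPS fence device may run only on the node it is not meant to power off. Here is a toy Python model of just that rule. It assumes a symmetric cluster and ignores every other constraint and score Pacemaker considers, so it is an illustration, not Pacemaker's scheduler. The function name allowed_nodes is hypothetical.

```python
# Toy model of where the two fence_nut primitives are allowed to run.
# NOT Pacemaker's scheduler: assumes a symmetric cluster and ignores every
# constraint other than the two -inf location rules in the crm configuration.

NODES = ["hypatia-tb.nevis.columbia.edu", "orestes-tb.nevis.columbia.edu"]

# Each fencing primitive is banned (-inf) from the node it powers off.
BANNED_FROM = {
    "StonithHypatia": {"hypatia-tb.nevis.columbia.edu"},
    "StonithOrestes": {"orestes-tb.nevis.columbia.edu"},
}


def allowed_nodes(resource, pacemaker_running_on):
    """Nodes that are running Pacemaker and are not banned for this resource."""
    return [node for node in NODES
            if node in pacemaker_running_on and node not in BANNED_FROM[resource]]


if __name__ == "__main__":
    scenarios = {
        "both nodes in the cluster": set(NODES),
        "only hypatia-tb in the cluster": {"hypatia-tb.nevis.columbia.edu"},
    }
    for label, running in scenarios.items():
        for resource in BANNED_FROM:
            print(f"{label}: {resource} -> {allowed_nodes(resource, running)}")
```

Under this reading, while orestes-tb has only DRBD started (no Pacemaker), StonithHypatia has no node it can run on, which is consistent with my observation that no fencing resource was running on orestes-tb when hypatia-tb was fenced.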

/etc/cluster/cluster.conf:

<?xml version="1.0"?>
<cluster config_version="17" name="Nevis_HA">
  <logging debug="off"/>
  <cman expected_votes="1" two_node="1" />
  <clusternodes>
    <clusternode name="hypatia-tb.nevis.columbia.edu" nodeid="1">
      <altname name="hypatia-private.nevis.columbia.edu" port="5405"
               mcast="226.94.1.1"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="hypatia-tb.nevis.columbia.edu"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="orestes-tb.nevis.columbia.edu" nodeid="2">
      <altname name="orestes-private.nevis.columbia.edu" port="5405"
               mcast="226.94.1.1"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="orestes-tb.nevis.columbia.edu"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <fence_daemon post_join_delay="30" />
  <rm disabled="1" />
</cluster>

The log messages on orestes-tb, just before hypatia-tb is fenced (there are no
messages in the hypatia-tb log for this time):

Feb 15 16:52:27 orestes-tb kernel: drbd: initialized. Version: 8.4.1 (api:1/proto:86-100)
Feb 15 16:52:27 orestes-tb kernel: drbd: GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@orestes-tb.nevis.columbia.edu, 2012-02-14 17:05:32
Feb 15 16:52:27 orestes-tb kernel: drbd: registered as block device major 147
Feb 15 16:52:27 orestes-tb kernel: d-con admin: Starting worker thread (from drbdsetup [2570])
Feb 15 16:52:27 orestes-tb kernel: block drbd0: disk( Diskless -> Attaching )
Feb 15 16:52:27 orestes-tb kernel: d-con admin: Method to ensure write ordering: barrier
Feb 15 16:52:27 orestes-tb kernel: block drbd0: max BIO size = 130560
Feb 15 16:52:27 orestes-tb kernel: block drbd0: Adjusting my ra_pages to backing device's (32 -> 768)
Feb 15 16:52:27 orestes-tb kernel: block drbd0: drbd_bm_resize called with capacity == 5611549368
Feb 15 16:52:27 orestes-tb kernel: block drbd0: resync bitmap: bits=701443671 words=10960058 pages=21407
Feb 15 16:52:27 orestes-tb kernel: block drbd0: size = 2676 GB (2805774684 KB)
Feb 15 16:52:28 orestes-tb kernel: block drbd0: bitmap READ of 21407 pages took 634 jiffies
Feb 15 16:52:28 orestes-tb kernel: block drbd0: recounting of set bits took additional 92 jiffies
Feb 15 16:52:28 orestes-tb kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Feb 15 16:52:28 orestes-tb kernel: block drbd0: disk( Attaching -> Outdated )
Feb 15 16:52:28 orestes-tb kernel: block drbd0: attached to UUIDs F5355FCF6114F218:0000000000000000:8A5519C7090D6BD6:8A5419C7090D6BD6
Feb 15 16:52:28 orestes-tb kernel: d-con admin: conn( StandAlone -> Unconnected )
Feb 15 16:52:28 orestes-tb kernel: d-con admin: Starting receiver thread (from drbd_w_admin [2572])
Feb 15 16:52:28 orestes-tb kernel: d-con admin: receiver (re)started
Feb 15 16:52:28 orestes-tb kernel: d-con admin: conn( Unconnected -> WFConnection )
Feb 15 16:52:29 orestes-tb kernel: d-con admin: Handshake successful: Agreed network protocol version 100
Feb 15 16:52:29 orestes-tb kernel: d-con admin: conn( WFConnection -> WFReportParams )
Feb 15 16:52:29 orestes-tb kernel: d-con admin: Starting asender thread (from drbd_r_admin [2579])
Feb 15 16:52:29 orestes-tb kernel: block drbd0: drbd_sync_handshake:
Feb 15 16:52:29 orestes-tb kernel: block drbd0: self F5355FCF6114F218:0000000000000000:8A5519C7090D6BD6:8A5419C7090D6BD6 bits:0 flags:0
Feb 15 16:52:29 orestes-tb kernel: block drbd0: peer 06B93A6C54D6D631:F5355FCF6114F219:8A5519C7090D6BD6:8A5419C7090D6BD6 bits:615 flags:0
Feb 15 16:52:29 orestes-tb kernel: block drbd0: uuid_compare()=-1 by rule 50
Feb 15 16:52:29 orestes-tb kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Feb 15 16:52:29 orestes-tb kernel: block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 39(1), total 39; compression: 100.0%
Feb 15 16:52:29 orestes-tb kernel: block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 39(1), total 39; compression: 100.0%
Feb 15 16:52:29 orestes-tb kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Feb 15 16:52:50 orestes-tb kernel: d-con admin: PingAck did not arrive in time.
Feb 15 16:52:50 orestes-tb kernel: d-con admin: peer( Primary -> Unknown ) conn( WFSyncUUID -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb 15 16:52:50 orestes-tb kernel: d-con admin: asender terminated
Feb 15 16:52:50 orestes-tb kernel: d-con admin: Terminating asender thread
Feb 15 16:52:51 orestes-tb kernel: block drbd0: bitmap WRITE of 3 pages took 247 jiffies
Feb 15 16:52:51 orestes-tb kernel: block drbd0: 2460 KB (615 bits) marked out-of-sync by on disk bit-map.
Feb 15 16:52:51 orestes-tb kernel: d-con admin: Connection closed
Feb 15 16:52:51 orestes-tb kernel: d-con admin: conn( NetworkFailure -> Unconnected )
Feb 15 16:52:51 orestes-tb kernel: d-con admin: receiver terminated
Feb 15 16:52:51 orestes-tb kernel: d-con admin: Restarting receiver thread
Feb 15 16:52:51 orestes-tb kernel: d-con admin: receiver (re)started
Feb 15 16:52:51 orestes-tb kernel: d-con admin: conn( Unconnected -> WFConnection )
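To read the two drbd_sync_handshake lines, here is a simplified Python sketch. It parses the self and peer UUID sets (current:bitmap:history1:history2), masks off the lowest bit (which DRBD uses as a flag rather than as part of the identity), and repeats only the one comparison I understand "rule 50" to be: the peer's bitmap UUID equals this node's current UUID, which would make orestes-tb the sync target (uuid_compare()=-1). It is not DRBD's full uuid_compare(). The 4 KiB-per-bit bitmap granularity is an assumption, though it agrees with the log's own "2460 KB (615 bits)". The helper names are hypothetical.

```python
# Simplified decoding of the drbd_sync_handshake lines from the orestes-tb log.
# This is NOT DRBD's uuid_compare(); it repeats only the single test below.

# UUID sets copied from the log: current:bitmap:history1:history2
SELF_LINE = "F5355FCF6114F218:0000000000000000:8A5519C7090D6BD6:8A5419C7090D6BD6"
PEER_LINE = "06B93A6C54D6D631:F5355FCF6114F219:8A5519C7090D6BD6:8A5419C7090D6BD6"

PEER_OUT_OF_SYNC_BITS = 615   # "bits:615" on the peer line
BITMAP_GRANULARITY_KB = 4     # assumption: one bitmap bit tracks 4 KiB


def parse_uuids(line):
    """Split a colon-separated UUID set into its four 64-bit values."""
    current, bitmap, hist1, hist2 = (int(field, 16) for field in line.split(":"))
    return {"current": current, "bitmap": bitmap, "history": (hist1, hist2)}


def strip_flag(uuid):
    """Clear the lowest bit, which DRBD uses as a flag, before comparing."""
    return uuid & ~1


def peer_bitmap_matches_my_current(me, peer):
    """True if the peer's bitmap UUID equals this node's current UUID.

    Read this way, the peer has been tracking writes since this node's data
    was last current, so this node should become the sync target.
    """
    return strip_flag(peer["bitmap"]) == strip_flag(me["current"])


if __name__ == "__main__":
    me = parse_uuids(SELF_LINE)
    peer = parse_uuids(PEER_LINE)
    print("current UUIDs differ:",
          strip_flag(me["current"]) != strip_flag(peer["current"]))
    print("peer bitmap UUID == my current UUID:",
          peer_bitmap_matches_my_current(me, peer))
    print("to resync:", PEER_OUT_OF_SYNC_BITS * BITMAP_GRANULARITY_KB, "KB")
```

Run as-is, it reports True for both checks and 2460 KB, the same figure the log prints at 16:52:51. If that reading is right, orestes-tb knew it was behind hypatia-tb and was sitting in WFSyncUUID, waiting to resync, when the PingAck timed out.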

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
