[DRBD-user] testing crm-fence-peer.sh

Pavlos Parissis pavlos.parissis at gmail.com
Mon Sep 20 22:18:49 CEST 2010


Hi,
I was testing crm-fence-peer.sh on a Heartbeat/Pacemaker cluster, and
I was wondering whether the message "Remote node did not respond",
which shows up several times in the log below, is normal.
To simulate the failure I used iptables on the slave to break the
replication link between the master and the slave. DRBD noticed the
broken link immediately and invoked crm-fence-peer.sh.
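For reference, the simulation on the slave looked roughly like this (a
sketch, not the exact commands I ran; the peer address 10.10.10.129 and
port 7789 are taken from the drbd_pbx_service_1 definition below):

```shell
# Sketch: drop DRBD replication traffic for drbd_pbx_service_1 on the
# slave. Peer address and port come from the resource config below.
DRBD_PEER=10.10.10.129
DRBD_PORT=7789

break_link() {
    # Block replication traffic in both directions.
    iptables -A INPUT  -s "$DRBD_PEER" -p tcp --dport "$DRBD_PORT" -j DROP
    iptables -A OUTPUT -d "$DRBD_PEER" -p tcp --dport "$DRBD_PORT" -j DROP
}

restore_link() {
    # Remove the same rules again after the test.
    iptables -D INPUT  -s "$DRBD_PEER" -p tcp --dport "$DRBD_PORT" -j DROP
    iptables -D OUTPUT -d "$DRBD_PEER" -p tcp --dport "$DRBD_PORT" -j DROP
}

# Only touch the firewall when actually run as root on a test node.
if [ "$(id -u)" -eq 0 ] && command -v iptables >/dev/null 2>&1; then
    break_link || echo "iptables call failed (expected outside a test node)"
fi
```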

Here is the log from the master; note that I broke the communication
link for only one of my DRBD resources (drbd_pbx_service_1):
Sep 20 22:07:22 node-01 kernel: block drbd1: PingAck did not arrive in time.
Sep 20 22:07:22 node-01 kernel: block drbd1: peer( Secondary ->
Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate ->
DUnknown )
Sep 20 22:07:22 node-01 kernel: block drbd1: asender terminated
Sep 20 22:07:22 node-01 kernel: block drbd1: Terminating asender thread
Sep 20 22:07:22 node-01 kernel: block drbd1: short read expecting
header on sock: r=-512
Sep 20 22:07:22 node-01 kernel: block drbd1: Creating new current UUID
Sep 20 22:07:22 node-01 kernel: block drbd1: Connection closed
Sep 20 22:07:22 node-01 kernel: block drbd1: helper command:
/sbin/drbdadm fence-peer minor-1
Sep 20 22:07:22 node-01 crm-fence-peer.sh[14877]: invoked for drbd_pbx_service_1
Sep 20 22:07:22 node-01 cibadmin: [14881]: info: Invoked: cibadmin -Ql
Sep 20 22:07:22 node-01 cibadmin: [14890]: info: Invoked: cibadmin -Q -t 1
Sep 20 22:07:24 node-01 crm-fence-peer.sh[14877]: Call cib_query
failed (-41): Remote node did not respond
Sep 20 22:07:24 node-01 cibadmin: [14905]: info: Invoked: cibadmin -Q -t 1
Sep 20 22:07:25 node-01 crm-fence-peer.sh[14877]: Call cib_query
failed (-41): Remote node did not respond
Sep 20 22:07:25 node-01 cibadmin: [14913]: info: Invoked: cibadmin -Q -t 1
Sep 20 22:07:27 node-01 crm-fence-peer.sh[14877]: Call cib_query
failed (-41): Remote node did not respond
Sep 20 22:07:27 node-01 cibadmin: [14958]: info: Invoked: cibadmin -Q -t 1
Sep 20 22:07:29 node-01 crm-fence-peer.sh[14877]: Call cib_query
failed (-41): Remote node did not respond
Sep 20 22:07:29 node-01 cibadmin: [14966]: info: Invoked: cibadmin -Q -t 2
Sep 20 22:07:31 node-01 cibadmin: [14992]: info: Invoked: cibadmin -C
-o constraints -X <rsc_location rsc="ms-drbd_01"
id="drbd-fence-by-handler-ms-drbd_01">   <rule role="Master"
score="-INFINITY" id="drbd-fence-by-handler-rule-ms-drbd_01">
<expression attribute="#uname" operation="ne" value="node-01"
id="drbd-fence-by-handler-expr-ms-drbd_01"/>   </rule> </rsc_location>
Sep 20 22:07:33 node-01 crm-fence-peer.sh[14877]: INFO peer is
reachable, my disk is UpToDate: placed constraint
'drbd-fence-by-handler-ms-drbd_01'
Sep 20 22:07:33 node-01 kernel: block drbd1: helper command:
/sbin/drbdadm fence-peer minor-1 exit code 4 (0x400)
Sep 20 22:07:33 node-01 kernel: block drbd1: fence-peer helper
returned 4 (peer was fenced)
Sep 20 22:07:33 node-01 kernel: block drbd1: pdsk( DUnknown -> Outdated )
Sep 20 22:07:33 node-01 kernel: block drbd1: conn( NetworkFailure ->
Unconnected )
Sep 20 22:07:33 node-01 kernel: block drbd1: receiver terminated
Sep 20 22:07:33 node-01 kernel: block drbd1: Restarting receiver thread
Sep 20 22:07:33 node-01 kernel: block drbd1: receiver (re)started
Sep 20 22:07:33 node-01 kernel: block drbd1: conn( Unconnected -> WFConnection )
Sep 20 22:07:33 node-01 cib: [15014]: info: write_cib_contents:
Archived previous version as /var/lib/heartbeat/crm/cib-86.raw
Sep 20 22:07:33 node-01 cib: [15014]: info: write_cib_contents: Wrote
version 0.216.0 of the CIB to disk (digest:
d702a9fda7620a063112250058a8cd85)
Sep 20 22:07:33 node-01 cib: [15014]: info: retrieveCib: Reading
cluster configuration from: /var/lib/heartbeat/crm/cib.C4omcr (digest:
/var/lib/heartbeat/crm/cib.E6M61E)
Sep 20 22:07:42 node-01 attrd: [3349]: info: attrd_ha_callback: flush
message from node-03
Sep 20 22:08:55 node-01 kernel: block drbd1: Handshake successful:
Agreed network protocol version 94
Sep 20 22:08:55 node-01 kernel: block drbd1: conn( WFConnection ->
WFReportParams )
Sep 20 22:08:55 node-01 kernel: block drbd1: Starting asender thread
(from drbd1_receiver [8421])
Sep 20 22:08:55 node-01 kernel: block drbd1: data-integrity-alg: sha1
Sep 20 22:08:55 node-01 kernel: block drbd1: drbd_sync_handshake:
Sep 20 22:08:55 node-01 kernel: block drbd1: self
977AD28C97AB9AED:C374CF64C8EBB0FF:2A3FC068D1DCB3EA:2BF2361F46F1D703
bits:0 flags:0
Sep 20 22:08:55 node-01 kernel: block drbd1: peer
C374CF64C8EBB0FE:0000000000000000:2A3FC068D1DCB3EA:2BF2361F46F1D703
bits:0 flags:0
Sep 20 22:08:55 node-01 kernel: block drbd1: uuid_compare()=1 by rule 70
Sep 20 22:08:55 node-01 kernel: block drbd1: peer( Unknown ->
Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated ->
UpToDate )
Sep 20 22:08:55 node-01 kernel: block drbd1: conn( WFBitMapS ->
SyncSource ) pdsk( UpToDate -> Inconsistent )
Sep 20 22:08:55 node-01 kernel: block drbd1: Began resync as
SyncSource (will sync 0 KB [0 bits set]).
Sep 20 22:08:55 node-01 kernel: block drbd1: Resync done (total 1 sec;
paused 0 sec; 0 K/sec)
Sep 20 22:08:55 node-01 kernel: block drbd1: conn( SyncSource ->
Connected ) pdsk( Inconsistent -> UpToDate )
Sep 20 22:08:57 node-01 cib: [15850]: info: write_cib_contents:
Archived previous version as /var/lib/heartbeat/crm/cib-87.raw
Sep 20 22:08:57 node-01 cib: [15850]: info: write_cib_contents: Wrote
version 0.217.0 of the CIB to disk (digest:
b96e1b896cee045e99f40f2678239b48)
Sep 20 22:08:57 node-01 cib: [15850]: info: retrieveCib: Reading
cluster configuration from: /var/lib/heartbeat/crm/cib.S2TNma (digest:
/var/lib/heartbeat/crm/cib.Cc8mF5)
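The constraint the handler placed can be inspected, and, should
crm-unfence-peer.sh (the after-resync-target handler) ever fail to clean
it up, removed by hand. Something along these lines (a sketch; the
constraint id is the one from the log above):

```shell
# Sketch: inspect and manually remove the fencing constraint placed by
# crm-fence-peer.sh. Normally crm-unfence-peer.sh removes it after resync.
CONSTRAINT_ID="drbd-fence-by-handler-ms-drbd_01"   # id from the log above

if command -v cibadmin >/dev/null 2>&1; then
    # Show all current location constraints, including the fence rule.
    cibadmin -Q -o constraints

    # Delete the fence constraint by its id.
    cibadmin -D -X "<rsc_location id=\"$CONSTRAINT_ID\"/>"
fi
```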


And here is the conf, which is the same on both systems:

#
# please have a look at the example configuration file in
# /usr/share/doc/drbd83/drbd.conf
#

global {
  usage-count yes;
}
common {
  protocol C;

  syncer {
    csums-alg sha1;
    verify-alg sha1;
    rate 10M;
  }

  net {
    data-integrity-alg sha1;
    max-buffers 20480;
    max-epoch-size 16384;
  }

  disk {
    on-io-error detach;
### Only when DRBD is under cluster ###
    fencing resource-only;
### --- ###
  }

  startup {
    wfc-timeout 60;
    degr-wfc-timeout 30;
    outdated-wfc-timeout 15;
  }

### Only when DRBD is under cluster ###
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
### --- ###
}

resource drbd_pbx_service_1 {

  on node-01 {
    device    /dev/drbd1;
    disk      /dev/sdd1;
    address   10.10.10.129:7789;
    meta-disk internal;
  }
  on node-03 {
    device    /dev/drbd1;
    disk      /dev/sdd1;
    address   10.10.10.131:7789;
    meta-disk internal;
  }
}

resource drbd_pbx_service_2 {

  on node-02 {
    device    /dev/drbd2;
    disk      /dev/sdb1;
    address   10.10.10.130:7790;
    meta-disk internal;
  }
  on node-03 {
    device    /dev/drbd2;
    disk      /dev/sdc1;
    address   10.10.10.131:7790;
    meta-disk internal;
  }
}
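While the link is down, the state transitions visible in the log can
also be checked directly on either node (a sketch, assuming the drbd
userland tools are installed):

```shell
# Sketch: query the connection and disk states seen in the log above.
RES=drbd_pbx_service_1

if command -v drbdadm >/dev/null 2>&1; then
    drbdadm cstate "$RES"   # e.g. WFConnection while the link is broken
    drbdadm dstate "$RES"   # e.g. UpToDate/Outdated once the peer is fenced
fi
```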

Thanks in advance,
Pavlos



More information about the drbd-user mailing list