Hi,

I was testing crm-fence-peer.sh on a Heartbeat/Pacemaker cluster, and I was wondering whether the message "Remote node did not respond", which I got four times, is normal. To simulate the failure I used iptables on the slave to break the replication link between master and slave. DRBD noticed the broken link immediately and invoked crm-fence-peer.sh.

Here is the log on the master (I broke the link for only one of my DRBD resources, drbd_pbx_service_1):

Sep 20 22:07:22 node-01 kernel: block drbd1: PingAck did not arrive in time.
Sep 20 22:07:22 node-01 kernel: block drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Sep 20 22:07:22 node-01 kernel: block drbd1: asender terminated
Sep 20 22:07:22 node-01 kernel: block drbd1: Terminating asender thread
Sep 20 22:07:22 node-01 kernel: block drbd1: short read expecting header on sock: r=-512
Sep 20 22:07:22 node-01 kernel: block drbd1: Creating new current UUID
Sep 20 22:07:22 node-01 kernel: block drbd1: Connection closed
Sep 20 22:07:22 node-01 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1
Sep 20 22:07:22 node-01 crm-fence-peer.sh[14877]: invoked for drbd_pbx_service_1
Sep 20 22:07:22 node-01 cibadmin: [14881]: info: Invoked: cibadmin -Ql
Sep 20 22:07:22 node-01 cibadmin: [14890]: info: Invoked: cibadmin -Q -t 1
Sep 20 22:07:24 node-01 crm-fence-peer.sh[14877]: Call cib_query failed (-41): Remote node did not respond
Sep 20 22:07:24 node-01 cibadmin: [14905]: info: Invoked: cibadmin -Q -t 1
Sep 20 22:07:25 node-01 crm-fence-peer.sh[14877]: Call cib_query failed (-41): Remote node did not respond
Sep 20 22:07:25 node-01 cibadmin: [14913]: info: Invoked: cibadmin -Q -t 1
Sep 20 22:07:27 node-01 crm-fence-peer.sh[14877]: Call cib_query failed (-41): Remote node did not respond
Sep 20 22:07:27 node-01 cibadmin: [14958]: info: Invoked: cibadmin -Q -t 1
Sep 20 22:07:29 node-01 crm-fence-peer.sh[14877]: Call cib_query failed (-41): Remote node did not respond
Sep 20 22:07:29 node-01 cibadmin: [14966]: info: Invoked: cibadmin -Q -t 2
Sep 20 22:07:31 node-01 cibadmin: [14992]: info: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="ms-drbd_01" id="drbd-fence-by-handler-ms-drbd_01"> <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-ms-drbd_01"> <expression attribute="#uname" operation="ne" value="node-01" id="drbd-fence-by-handler-expr-ms-drbd_01"/> </rule> </rsc_location>
Sep 20 22:07:33 node-01 crm-fence-peer.sh[14877]: INFO peer is reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-ms-drbd_01'
Sep 20 22:07:33 node-01 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 4 (0x400)
Sep 20 22:07:33 node-01 kernel: block drbd1: fence-peer helper returned 4 (peer was fenced)
Sep 20 22:07:33 node-01 kernel: block drbd1: pdsk( DUnknown -> Outdated )
Sep 20 22:07:33 node-01 kernel: block drbd1: conn( NetworkFailure -> Unconnected )
Sep 20 22:07:33 node-01 kernel: block drbd1: receiver terminated
Sep 20 22:07:33 node-01 kernel: block drbd1: Restarting receiver thread
Sep 20 22:07:33 node-01 kernel: block drbd1: receiver (re)started
Sep 20 22:07:33 node-01 kernel: block drbd1: conn( Unconnected -> WFConnection )
Sep 20 22:07:33 node-01 cib: [15014]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-86.raw
Sep 20 22:07:33 node-01 cib: [15014]: info: write_cib_contents: Wrote version 0.216.0 of the CIB to disk (digest: d702a9fda7620a063112250058a8cd85)
Sep 20 22:07:33 node-01 cib: [15014]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.C4omcr (digest: /var/lib/heartbeat/crm/cib.E6M61E)
Sep 20 22:07:42 node-01 attrd: [3349]: info: attrd_ha_callback: flush message from node-03
Sep 20 22:08:55 node-01 kernel: block drbd1: Handshake successful: Agreed network protocol version 94
Sep 20 22:08:55 node-01 kernel: block drbd1: conn( WFConnection -> WFReportParams )
Sep 20 22:08:55 node-01 kernel: block drbd1: Starting asender thread (from drbd1_receiver [8421])
Sep 20 22:08:55 node-01 kernel: block drbd1: data-integrity-alg: sha1
Sep 20 22:08:55 node-01 kernel: block drbd1: drbd_sync_handshake:
Sep 20 22:08:55 node-01 kernel: block drbd1: self 977AD28C97AB9AED:C374CF64C8EBB0FF:2A3FC068D1DCB3EA:2BF2361F46F1D703 bits:0 flags:0
Sep 20 22:08:55 node-01 kernel: block drbd1: peer C374CF64C8EBB0FE:0000000000000000:2A3FC068D1DCB3EA:2BF2361F46F1D703 bits:0 flags:0
Sep 20 22:08:55 node-01 kernel: block drbd1: uuid_compare()=1 by rule 70
Sep 20 22:08:55 node-01 kernel: block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> UpToDate )
Sep 20 22:08:55 node-01 kernel: block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Sep 20 22:08:55 node-01 kernel: block drbd1: Began resync as SyncSource (will sync 0 KB [0 bits set]).
Sep 20 22:08:55 node-01 kernel: block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Sep 20 22:08:55 node-01 kernel: block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Sep 20 22:08:57 node-01 cib: [15850]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-87.raw
Sep 20 22:08:57 node-01 cib: [15850]: info: write_cib_contents: Wrote version 0.217.0 of the CIB to disk (digest: b96e1b896cee045e99f40f2678239b48)
Sep 20 22:08:57 node-01 cib: [15850]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.S2TNma (digest: /var/lib/heartbeat/crm/cib.Cc8mF5)

And here is the configuration (the same on both systems):

#
# please have a look at the example configuration file in
# /usr/share/doc/drbd83/drbd.conf
#
global {
    usage-count yes;
}
common {
    protocol C;
    syncer {
        csums-alg sha1;
        verify-alg sha1;
        rate 10M;
    }
    net {
        data-integrity-alg sha1;
        max-buffers 20480;
        max-epoch-size 16384;
    }
    disk {
        on-io-error detach;
        ### Only when DRBD is under cluster ###
        fencing resource-only;
        ### --- ###
    }
    startup {
        wfc-timeout 60;
        degr-wfc-timeout 30;
        outdated-wfc-timeout 15;
    }
    ### Only when DRBD is under cluster ###
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    ### --- ###
}
resource drbd_pbx_service_1 {
    on node-01 {
        device /dev/drbd1;
        disk /dev/sdd1;
        address 10.10.10.129:7789;
        meta-disk internal;
    }
    on node-03 {
        device /dev/drbd1;
        disk /dev/sdd1;
        address 10.10.10.131:7789;
        meta-disk internal;
    }
}
resource drbd_pbx_service_2 {
    on node-02 {
        device /dev/drbd2;
        disk /dev/sdb1;
        address 10.10.10.130:7790;
        meta-disk internal;
    }
    on node-03 {
        device /dev/drbd2;
        disk /dev/sdc1;
        address 10.10.10.131:7790;
        meta-disk internal;
    }
}

Thanks in advance,
Pavlos
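P.S. For what it's worth, the log itself suggests what is happening with those messages: crm-fence-peer.sh simply re-runs `cibadmin -Q -t <timeout>` (timeouts 1, 1, 1, 1, then 2 seconds above) until the CIB answers, and only then places the constraint. A minimal Python sketch of that retry-with-timeout pattern, for illustration only — the function names are hypothetical and the real helper is a shell script driving cibadmin:

```python
def query_with_retries(run_query, timeouts=(1, 1, 1, 1, 2)):
    """Retry a CIB query with per-attempt timeouts until one succeeds.

    run_query(timeout) stands in for 'cibadmin -Q -t <timeout>' and must
    return (succeeded, output). Mirrors the sequence of attempts visible
    in the log above; not the actual crm-fence-peer.sh implementation.
    """
    for timeout in timeouts:
        ok, output = run_query(timeout)
        if ok:
            # CIB answered; the fencing constraint can now be placed.
            return output
        # Otherwise: "Call cib_query failed (-41): Remote node did not
        # respond" — try again with the next timeout.
    return None  # every attempt timed out; fencing via the CIB fails
```

In the log, the first four attempts fail and the fifth (with `-t 2`) succeeds, after which the `rsc_location` constraint is created, so the retries appear to be expected behaviour rather than an error in themselves.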