Hi,

I am using heartbeat2 from backports.org on Debian 4.0. While testing the dopd hotfix for etch (http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/) I still can't get dopd to work.

I followed the instructions in http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/README: I downloaded the fixed versions and the MD5SUM file, extracted the files and verified the MD5 sums.

--
debnode2:/usr/lib/heartbeat# wget http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/MD5SUM
--13:20:54--  http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/MD5SUM
           => `MD5SUM'
Resolving www.linbit.com... 212.69.162.23
Connecting to www.linbit.com|212.69.162.23|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 192 [text/plain]

100%[====================================>] 192           --.--K/s

13:20:54 (10.96 MB/s) - `MD5SUM' saved [192/192]

debnode2:/usr/lib/heartbeat# md5sum --check < MD5SUM
dopd: OK
drbd-peer-outdater: OK
dopd.bz2: OK
drbd-peer-outdater.bz2: OK
--

Then I shut down debnode1 (it's a VM, so I powered it off HARD). The heartbeat failover did not work. Logs:

Aug 15 12:39:33 debnode2 kernel: drbd0: PingAck did not arrive in time.
Aug 15 12:39:33 debnode2 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Aug 15 12:39:33 debnode2 kernel: drbd0: asender terminated
Aug 15 12:39:33 debnode2 kernel: drbd0: Terminating asender thread
Aug 15 12:39:33 debnode2 kernel: drbd0: short read expecting header on sock: r=-512
Aug 15 12:39:33 debnode2 kernel: drbd0: Writing meta data super block now.
Aug 15 12:39:33 debnode2 kernel: drbd0: tl_clear()
Aug 15 12:39:33 debnode2 kernel: drbd0: Connection closed
Aug 15 12:39:33 debnode2 kernel: drbd0: conn( NetworkFailure -> Unconnected )
Aug 15 12:39:33 debnode2 kernel: drbd0: receiver terminated
Aug 15 12:39:33 debnode2 kernel: drbd0: receiver (re)started
Aug 15 12:39:33 debnode2 kernel: drbd0: conn( Unconnected -> WFConnection )
Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: node debnode1: is dead
Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: No STONITH device configured.
Aug 15 12:39:37 debnode2 ipfail: [4559]: info: Status update: Node debnode1 now has status dead
Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: Shared disks are not protected.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Resources being acquired from debnode1.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Link debnode1:eth0 dead.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Link debnode1:eth1 dead.
Aug 15 12:39:37 debnode2 heartbeat: [5210]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Aug 15 12:39:37 debnode2 harc[5210]: [5223]: info: Running /etc/ha.d/rc.d/status status
Aug 15 12:39:37 debnode2 heartbeat: [5211]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys debnode2] to acquire.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: debug: StartNextRemoteRscReq(): child count 1
Aug 15 12:39:37 debnode2 ipfail: [4559]: debug: Found ping node 192.168.226.2!
Aug 15 12:39:37 debnode2 mach_down[5235]: [5256]: info: Taking over resource group drbddisk
Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5268]: info: Acquiring resource group: debnode1 drbddisk Filesystem::/dev/drbd0::/db::ext3::noatime IPaddr2:: 192.168.226.42/32/eth0
Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5284]: info: Running /etc/ha.d/resource.d/drbddisk start
Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5285]: debug: Starting /etc/ha.d/resource.d/drbddisk start [xxxxx]
Aug 15 12:39:37 debnode2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: drbd peer: debnode1
Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: drbd resource: db
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Connecting channel
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Client outdater (0x8057ee0) connected
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processing msg from outdater
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Got message from (drbd-peer-outdater). (peer: debnode1, res :db)
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Starting node walk
Aug 15 12:39:38 debnode2 ipfail: [4559]: info: NS: We are still alive!
Aug 15 12:39:38 debnode2 ipfail: [4559]: info: Link Status update: Link debnode1/eth0 now has status dead
Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: message: outdater_rc, debnode2
Aug 15 12:39:38 debnode2 kernel: drbd0: outdate-peer helper broken, returned 20
Aug 15 12:39:38 debnode2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer [xxxxx]

The section between the [xxxxx] markers repeats in a loop.
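For reference, as far as I understand it the DRBD 8.x kernel module only accepts a handful of exit codes from the outdate-peer helper (3 to 7), and 20 is not one of them, which matches the "outdate-peer helper broken, returned 20" line. Here is a small shell sketch that decodes the codes the way drbd_try_outdate_peer() appears to; the mapping is my assumption from reading the DRBD source, so please check it against your DRBD version. The function name describe_outdater_rc is made up for illustration.

```shell
#!/bin/sh
# Sketch: decode an outdate-peer helper exit code the way DRBD 8.x's kernel
# module appears to (assumption based on drbd_try_outdate_peer(); verify it
# against your DRBD version). describe_outdater_rc is a hypothetical name.
describe_outdater_rc() {
    case "$1" in
        3) echo "peer is inconsistent" ;;
        4) echo "peer outdated (or already was)" ;;
        5) echo "peer down or unreachable: treated as outdated" ;;
        6) echo "peer is primary: local disk gets outdated" ;;
        7) echo "peer fenced (STONITH)" ;;
        # Anything else falls into the kernel's "helper broken" branch.
        *) echo "unexpected code $1: kernel logs 'outdate-peer helper broken'" ;;
    esac
}
```

For example, `/usr/lib/heartbeat/drbd-peer-outdater -p debnode1 -r db -t 4; describe_outdater_rc $?` would print the "unexpected code 20" line in my setup.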
---
Then I tried to start the service manually (debnode1 still powered down):

debnode2:/usr/lib/heartbeat# cl_status listnodes
192.168.226.2
debnode2
debnode1
debnode2:/usr/lib/heartbeat# cl_status nodestatus debnode1
dead
debnode2:/usr/lib/heartbeat# echo "==========================" >> /var/log/syslog
debnode2:/usr/lib/heartbeat# /usr/lib/heartbeat/drbd-peer-outdater -p debnode1 -r db -t 4; echo $?
20

Return code 20, and the log says:

==========================
Aug 15 13:27:09 debnode2 drbd-peer-outdater: [6949]: debug: drbd peer: debnode1
Aug 15 13:27:09 debnode2 drbd-peer-outdater: [6949]: debug: drbd resource: db
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Connecting channel
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Client outdater (0x805fe40) connected
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processing msg from outdater
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Got message from (drbd-peer-outdater). (peer: debnode1, res :db)
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Starting node walk
Aug 15 13:27:10 debnode2 drbd-peer-outdater: [6949]: debug: message: outdater_rc, debnode2
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: WARN: Cluster node: debnode1: status: dead
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processed 1 messages
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processed 0 messages
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: destroying connection: (null)
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Deleting outdater (0x805fe40) from mainloop

debnode2:/usr/lib/heartbeat# drbdadm primary db
/dev/drbd0: State change failed: (-7) Refusing to be Primary while peer is not outdated
Command 'drbdsetup /dev/drbd0 primary' terminated with exit code 11

What's the issue here? From my understanding, dopd should "invalidate remote DRBD disks if ONLY the replication link is broken and heartbeat can still communicate with the remote peer over an alternate network - aka a second heartbeat". If heartbeat already knows that the node to be outdated is dead, then the node really is dead and dopd should just continue. That latter case is exactly the situation when a node really dies.

Or did I miss something?
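Put differently, this is the decision I would expect from dopd, written as a hypothetical shell sketch (it is not dopd's actual code). It takes the peer status as printed by `cl_status nodestatus`; returning 5 for a dead peer is my assumption about the code the DRBD 8.x kernel module treats as "peer down", and the function name expected_outdater_rc is made up.

```shell
#!/bin/sh
# Hypothetical sketch of the outdater decision I'd expect; NOT dopd's code.
# Input: peer status as printed by `cl_status nodestatus <node>`.
# Output: the exit code I'd expect drbd-peer-outdater to return, or
# "ask-peer" where dopd would have to contact the peer (not modelled here).
expected_outdater_rc() {
    case "$1" in
        dead)
            # Heartbeat has already declared the peer dead: there is no
            # remote disk to outdate, so report "peer down" (5, assumed
            # DRBD 8.x convention) and let this node become Primary.
            echo 5
            ;;
        *)
            # Peer still reachable over heartbeat: dopd would ask it to
            # outdate its disk; this sketch does not model that exchange.
            echo ask-peer
            ;;
    esac
}

# Example: expected_outdater_rc "$(cl_status nodestatus debnode1)"
```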
Just for completeness:

debnode2:/usr/lib/heartbeat# ls -al /usr/lib/heartbeat/drbd-peer-outdater*
-rwxr-xr-x 1 root root  8716 2008-08-05 15:49 /usr/lib/heartbeat/drbd-peer-outdater
-rw-r--r-- 1 root root  4419 2008-08-05 15:49 /usr/lib/heartbeat/drbd-peer-outdater.bz2
-rwxr-xr-x 1 root root  8716 2008-03-28 17:39 /usr/lib/heartbeat/drbd-peer-outdater.ORG
debnode2:/usr/lib/heartbeat# ls -al /usr/lib/heartbeat/dopd*
-rwxr-xr-x 1 root root 12744 2008-08-05 15:49 /usr/lib/heartbeat/dopd
-rw-r--r-- 1 root root  6116 2008-08-05 15:49 /usr/lib/heartbeat/dopd.bz2
-rwxr-xr-x 1 root root 12744 2008-03-28 17:39 /usr/lib/heartbeat/dopd.ORG
debnode2:/usr/lib/heartbeat# md5sum /usr/lib/heartbeat/drbd-peer-outdater
b195f526bb6fa3659f4c63e8f23b1d99  /usr/lib/heartbeat/drbd-peer-outdater
debnode2:/usr/lib/heartbeat# md5sum /usr/lib/heartbeat/dopd
9ce67567ea50157bfd2f0e3f3623010d  /usr/lib/heartbeat/dopd
debnode2:/usr/lib/heartbeat# dpkg -l | grep heartbeat
ii  heartbeat    2.1.3-5~bpo40+1  Subsystem for High-Availability Linux
ii  heartbeat-2  2.1.3-5~bpo40+1  Subsystem for High-Availability Linux

Any hints appreciated,
Robert