I actually compiled dopd from source myself (patched version =
Heartbeat 2.1.3 +
http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/src/heartbeat-dopd-fix.diff).
This time it works fine.

Aug 15 17:09:54 debnode2 drbd-peer-outdater: [3795]: debug: message: outdater_rc, debnode2
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: WARN: Cluster node: debnode1: status: dead
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: debug: Processed 1 messages
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: debug: invoked: outdater
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: debug: Processed 0 messages
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: debug: destroying connection: (null)
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: debug: Deleting outdater (0x80577a0) from mainloop
Aug 15 17:09:54 debnode2 ResourceManager[3761]: [3797]: debug: /etc/ha.d/resource.d/drbddisk start done. RC=0

After manually stripping the compiled binaries (strip dopd) I get these files:

-rwxr-xr-x 1 root root 13240 2008-08-15 17:04 /usr/lib/heartbeat/dopd
-rwxr-xr-x 1 root root 12744 2008-03-28 17:39 /usr/lib/heartbeat/dopd.ORG
-rwxr-xr-x 1 root root  9100 2008-08-15 17:04 /usr/lib/heartbeat/drbd-peer-outdater
-rwxr-xr-x 1 root root  8716 2008-03-28 17:39 /usr/lib/heartbeat/drbd-peer-outdater.ORG

debnode2:/INSTALL/heartbeat-2.1.3/contrib/drbd-outdate-peer# file /usr/lib/heartbeat/dopd
/usr/lib/heartbeat/dopd: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, stripped
debnode2:/INSTALL/heartbeat-2.1.3/contrib/drbd-outdate-peer# file /usr/lib/heartbeat/dopd.ORG
/usr/lib/heartbeat/dopd.ORG: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, stripped

Could it be that the files at
http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/ are still
broken?

Robert

Robert wrote:
> Hi, I am using heartbeat2 from backports.org on debian 4.0.
> When testing the dopd hotfix for etch
> (http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/)
> I still can't get dopd to work.
>
> I followed the instructions in
> http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/README and
> downloaded the fixed versions, downloaded the MD5SUM file, extracted
> the files and verified the MD5 sum.
> --
> debnode2:/usr/lib/heartbeat# wget http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/MD5SUM
>
> --13:20:54-- http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/MD5SUM
>
> => `MD5SUM'
> Resolving www.linbit.com... 212.69.162.23
> Connecting to www.linbit.com|212.69.162.23|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 192 [text/plain]
>
> 100%[====================================>] 192 --.--K/s
>
> 13:20:54 (10.96 MB/s) - `MD5SUM' saved [192/192]
>
> debnode2:/usr/lib/heartbeat# md5sum --check < MD5SUM
> dopd: OK
> drbd-peer-outdater: OK
> dopd.bz2: OK
> drbd-peer-outdater.bz2: OK
> --
>
> Then I shut down debnode1 (it's a VM, so I powered it down HARD) - the
> failover with heartbeat did not work. Logs:
>
> Aug 15 12:39:33 debnode2 kernel: drbd0: PingAck did not arrive in time.
> Aug 15 12:39:33 debnode2 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Aug 15 12:39:33 debnode2 kernel: drbd0: asender terminated
> Aug 15 12:39:33 debnode2 kernel: drbd0: Terminating asender thread
> Aug 15 12:39:33 debnode2 kernel: drbd0: short read expecting header on sock: r=-512
> Aug 15 12:39:33 debnode2 kernel: drbd0: Writing meta data super block now.
> Aug 15 12:39:33 debnode2 kernel: drbd0: tl_clear()
> Aug 15 12:39:33 debnode2 kernel: drbd0: Connection closed
> Aug 15 12:39:33 debnode2 kernel: drbd0: conn( NetworkFailure -> Unconnected )
> Aug 15 12:39:33 debnode2 kernel: drbd0: receiver terminated
> Aug 15 12:39:33 debnode2 kernel: drbd0: receiver (re)started
> Aug 15 12:39:33 debnode2 kernel: drbd0: conn( Unconnected -> WFConnection )
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: node debnode1: is dead
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: No STONITH device configured.
> Aug 15 12:39:37 debnode2 ipfail: [4559]: info: Status update: Node debnode1 now has status dead
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: Shared disks are not protected.
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Resources being acquired from debnode1.
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Link debnode1:eth0 dead.
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Link debnode1:eth1 dead.
> Aug 15 12:39:37 debnode2 heartbeat: [5210]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
> Aug 15 12:39:37 debnode2 harc[5210]: [5223]: info: Running /etc/ha.d/rc.d/status status
> Aug 15 12:39:37 debnode2 heartbeat: [5211]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys debnode2] to acquire.
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: debug: StartNextRemoteRscReq(): child count 1
> Aug 15 12:39:37 debnode2 ipfail: [4559]: debug: Found ping node 192.168.226.2!
> Aug 15 12:39:37 debnode2 mach_down[5235]: [5256]: info: Taking over resource group drbddisk
> Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5268]: info: Acquiring resource group: debnode1 drbddisk Filesystem::/dev/drbd0::/db::ext3::noatime IPaddr2::
> 192.168.226.42/32/eth0
> Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5284]: info: Running /etc/ha.d/resource.d/drbddisk start
> Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5285]: debug: Starting /etc/ha.d/resource.d/drbddisk start
> [xxxxx]
> Aug 15 12:39:37 debnode2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
> Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: drbd peer: debnode1
> Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: drbd resource: db
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Connecting channel
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Client outdater (0x8057ee0) connected
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processing msg from outdater
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Got message from (drbd-peer-outdater). (peer: debnode1, res :db)
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Starting node walk
> Aug 15 12:39:38 debnode2 ipfail: [4559]: info: NS: We are still alive!
> Aug 15 12:39:38 debnode2 ipfail: [4559]: info: Link Status update: Link debnode1/eth0 now has status dead
> Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: message: outdater_rc, debnode2
> Aug 15 12:39:38 debnode2 kernel: drbd0: outdate-peer helper broken, returned 20
> Aug 15 12:39:38 debnode2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
> [xxxxx]
>
> The section between the [xxxxx] markers repeats in a loop.
> ---
>
> Then I tried to manually start the service (debnode1 still powered down):
>
> debnode2:/usr/lib/heartbeat# cl_status listnodes
> 192.168.226.2
> debnode2
> debnode1
> debnode2:/usr/lib/heartbeat# cl_status nodestatus debnode1
> dead
> debnode2:/usr/lib/heartbeat# echo "==========================" >> /var/log/syslog
> debnode2:/usr/lib/heartbeat# /usr/lib/heartbeat/drbd-peer-outdater -p debnode1 -r db -t 4; echo $?
> 20
>
> Return code 20 and the log says:
>
> ==========================
> Aug 15 13:27:09 debnode2 drbd-peer-outdater: [6949]: debug: drbd peer: debnode1
> Aug 15 13:27:09 debnode2 drbd-peer-outdater: [6949]: debug: drbd resource: db
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Connecting channel
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Client outdater (0x805fe40) connected
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processing msg from outdater
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Got message from (drbd-peer-outdater). (peer: debnode1, res :db)
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Starting node walk
> Aug 15 13:27:10 debnode2 drbd-peer-outdater: [6949]: debug: message: outdater_rc, debnode2
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: WARN: Cluster node: debnode1: status: dead
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processed 1 messages
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processed 0 messages
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: destroying connection: (null)
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Deleting outdater (0x805fe40) from mainloop
>
> debnode2:/usr/lib/heartbeat# drbdadm primary db
> /dev/drbd0: State change failed: (-7) Refusing to be Primary while peer is not outdated
> Command 'drbdsetup /dev/drbd0 primary' terminated with exit code 11
>
> What's the issue here? From my understanding dopd should:
>
> "invalidate remote DRBD disks if ONLY the replication link is broken
> and heartbeat can still communicate with the remote peer over an
> alternate network - aka a second heartbeat". If the node to be
> outdated is known to be dead by heartbeat, the node is dead and dopd
> should just continue. This latter case is the behaviour if a node is
> really dead."
>
> Or did I miss something?
>
> Just for completeness:
> debnode2:/usr/lib/heartbeat# ls -al /usr/lib/heartbeat/drbd-peer-outdater*
> -rwxr-xr-x 1 root root 8716 2008-08-05 15:49 /usr/lib/heartbeat/drbd-peer-outdater
> -rw-r--r-- 1 root root 4419 2008-08-05 15:49 /usr/lib/heartbeat/drbd-peer-outdater.bz2
> -rwxr-xr-x 1 root root 8716 2008-03-28 17:39 /usr/lib/heartbeat/drbd-peer-outdater.ORG
> debnode2:/usr/lib/heartbeat# ls -al /usr/lib/heartbeat/dopd*
> -rwxr-xr-x 1 root root 12744 2008-08-05 15:49 /usr/lib/heartbeat/dopd
> -rw-r--r-- 1 root root  6116 2008-08-05 15:49 /usr/lib/heartbeat/dopd.bz2
> -rwxr-xr-x 1 root root 12744 2008-03-28 17:39 /usr/lib/heartbeat/dopd.ORG
>
> debnode2:/usr/lib/heartbeat# md5sum /usr/lib/heartbeat/drbd-peer-outdater
> b195f526bb6fa3659f4c63e8f23b1d99 /usr/lib/heartbeat/drbd-peer-outdater
> debnode2:/usr/lib/heartbeat# md5sum /usr/lib/heartbeat/dopd
> 9ce67567ea50157bfd2f0e3f3623010d /usr/lib/heartbeat/dopd
>
> debnode2:/usr/lib/heartbeat# dpkg -l | grep heartbeat
> ii heartbeat 2.1.3-5~bpo40+1 Subsystem for High-Availability Linux
> ii heartbeat-2 2.1.3-5~bpo40+1 Subsystem for High-Availability Linux
>
>
> Any hints appreciated,
> Robert
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
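
One observation on the listings: the downloaded hotfix binaries have exactly the same sizes as the .ORG backups (12744 and 8716 bytes), while the self-compiled ones are larger (13240 and 9100 bytes). Identical size does not prove identical content, so a byte-for-byte comparison may be worth running. A minimal sketch, assuming the .ORG backups still sit next to the installed files as shown; `same_content` is a hypothetical helper name, not part of heartbeat or DRBD:

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if both files are byte-identical.
# cmp -s is silent; it exits 0 if identical, 1 if different, 2 on error.
same_content() {
    cmp -s "$1" "$2"
}

# Paths taken from the ls output in the quoted message.
for bin in /usr/lib/heartbeat/dopd /usr/lib/heartbeat/drbd-peer-outdater; do
    if [ ! -f "$bin" ] || [ ! -f "$bin.ORG" ]; then
        echo "$bin: skipped (file or .ORG backup missing)"
    elif same_content "$bin" "$bin.ORG"; then
        echo "$bin: identical to $bin.ORG"
    else
        echo "$bin: differs from $bin.ORG"
    fi
done
```

If this reports "identical" for the files fetched from the etch-i386 directory, that would be consistent with the hotfix binaries being the unpatched build, but it is only a hint and not a diagnosis.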