Hi, I am using heartbeat2 from backports.org on Debian 4.0.
When testing the dopd hotfix for etch
(http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/)
I still can't get dopd to work.
I followed the instructions in
http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/README and
downloaded the fixed versions, downloaded the MD5SUM file, extracted the
files and verified the MD5 sum.
--
debnode2:/usr/lib/heartbeat# wget http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/MD5SUM
--13:20:54-- http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/MD5SUM
=> `MD5SUM'
Resolving www.linbit.com... 212.69.162.23
Connecting to www.linbit.com|212.69.162.23|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 192 [text/plain]
100%[====================================>] 192 --.--K/s
13:20:54 (10.96 MB/s) - `MD5SUM' saved [192/192]
debnode2:/usr/lib/heartbeat# md5sum --check < MD5SUM
dopd: OK
drbd-peer-outdater: OK
dopd.bz2: OK
drbd-peer-outdater.bz2: OK
--
Then I shut down debnode1 (it's a VM, so I powered it down HARD) - the
failover with heartbeat did not work. Logs:
Aug 15 12:39:33 debnode2 kernel: drbd0: PingAck did not arrive in time.
Aug 15 12:39:33 debnode2 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Aug 15 12:39:33 debnode2 kernel: drbd0: asender terminated
Aug 15 12:39:33 debnode2 kernel: drbd0: Terminating asender thread
Aug 15 12:39:33 debnode2 kernel: drbd0: short read expecting header on sock: r=-512
Aug 15 12:39:33 debnode2 kernel: drbd0: Writing meta data super block now.
Aug 15 12:39:33 debnode2 kernel: drbd0: tl_clear()
Aug 15 12:39:33 debnode2 kernel: drbd0: Connection closed
Aug 15 12:39:33 debnode2 kernel: drbd0: conn( NetworkFailure -> Unconnected )
Aug 15 12:39:33 debnode2 kernel: drbd0: receiver terminated
Aug 15 12:39:33 debnode2 kernel: drbd0: receiver (re)started
Aug 15 12:39:33 debnode2 kernel: drbd0: conn( Unconnected -> WFConnection )
Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: node debnode1: is dead
Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: No STONITH device configured.
Aug 15 12:39:37 debnode2 ipfail: [4559]: info: Status update: Node debnode1 now has status dead
Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: Shared disks are not protected.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Resources being acquired from debnode1.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Link debnode1:eth0 dead.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Link debnode1:eth1 dead.
Aug 15 12:39:37 debnode2 heartbeat: [5210]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Aug 15 12:39:37 debnode2 harc[5210]: [5223]: info: Running /etc/ha.d/rc.d/status status
Aug 15 12:39:37 debnode2 heartbeat: [5211]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys debnode2] to acquire.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: debug: StartNextRemoteRscReq(): child count 1
Aug 15 12:39:37 debnode2 ipfail: [4559]: debug: Found ping node 192.168.226.2!
Aug 15 12:39:37 debnode2 mach_down[5235]: [5256]: info: Taking over resource group drbddisk
Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5268]: info: Acquiring resource group: debnode1 drbddisk Filesystem::/dev/drbd0::/db::ext3::noatime IPaddr2:: 192.168.226.42/32/eth0
Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5284]: info: Running /etc/ha.d/resource.d/drbddisk start
Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5285]: debug: Starting /etc/ha.d/resource.d/drbddisk start
[xxxxx]
Aug 15 12:39:37 debnode2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: drbd peer: debnode1
Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: drbd resource: db
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Connecting channel
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Client outdater (0x8057ee0) connected
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processing msg from outdater
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Got message from (drbd-peer-outdater). (peer: debnode1, res :db)
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Starting node walk
Aug 15 12:39:38 debnode2 ipfail: [4559]: info: NS: We are still alive!
Aug 15 12:39:38 debnode2 ipfail: [4559]: info: Link Status update: Link debnode1/eth0 now has status dead
Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: message: outdater_rc, debnode2
Aug 15 12:39:38 debnode2 kernel: drbd0: outdate-peer helper broken, returned 20
Aug 15 12:39:38 debnode2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
[xxxxx]
The section between the [xxxxx] markers repeats in a loop.
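To put a number on that loop, a small sketch like the following could count how often the kernel reported the failing helper. The function name count_helper_failures is made up for illustration, and the default log path /var/log/syslog is only an assumption based on where the messages in this post ended up.

```shell
# Hypothetical helper: count how often DRBD logged a failing
# outdate-peer helper. The default log path is an assumption.
count_helper_failures() {
    logfile="${1:-/var/log/syslog}"
    # grep -c prints the number of matching lines. It exits non-zero
    # when nothing matches, so "|| true" keeps this usable under "set -e".
    grep -c 'outdate-peer helper broken' "$logfile" || true
}
```

Calling `count_helper_failures /var/log/syslog` a few seconds apart should show whether the counter is still climbing.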
---
Then I tried to manually start the service (debnode1 still powered down):
debnode2:/usr/lib/heartbeat# cl_status listnodes
192.168.226.2
debnode2
debnode1
debnode2:/usr/lib/heartbeat# cl_status nodestatus debnode1
dead
debnode2:/usr/lib/heartbeat# echo "==========================" >> /var/log/syslog
debnode2:/usr/lib/heartbeat# /usr/lib/heartbeat/drbd-peer-outdater -p debnode1 -r db -t 4; echo $?
20
The return code is 20, and the log says:
==========================
Aug 15 13:27:09 debnode2 drbd-peer-outdater: [6949]: debug: drbd peer: debnode1
Aug 15 13:27:09 debnode2 drbd-peer-outdater: [6949]: debug: drbd resource: db
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Connecting channel
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Client outdater (0x805fe40) connected
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processing msg from outdater
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Got message from (drbd-peer-outdater). (peer: debnode1, res :db)
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Starting node walk
Aug 15 13:27:10 debnode2 drbd-peer-outdater: [6949]: debug: message: outdater_rc, debnode2
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: WARN: Cluster node: debnode1: status: dead
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processed 1 messages
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processed 0 messages
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: destroying connection: (null)
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Deleting outdater (0x805fe40) from mainloop
debnode2:/usr/lib/heartbeat# drbdadm primary db
/dev/drbd0: State change failed: (-7) Refusing to be Primary while peer is not outdated
Command 'drbdsetup /dev/drbd0 primary' terminated with exit code 11
What's the issue here? From my understanding dopd should:
"invalidate remote DRBD disks if ONLY the replication link is broken and
heartbeat can still communicate with the remote peer over an alternate
network - aka a second heartbeat". "If the node to be outdated is known
to be dead by heartbeat, the node is dead and dopd should just continue.
This latter case is the behaviour if a node is really dead."
Or did I miss something?
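To make that expectation concrete, here is a rough sketch of the decision as I read it. This is not dopd's actual code. The function name expected_outdate_rc is made up, the status strings "active" and "up" are assumptions about what cl_status nodestatus can report, and the exit codes 4 (peer outdated) and 5 (peer unreachable) are what I recall from the DRBD handler documentation, so please treat them as assumptions too.

```shell
# Hypothetical helper, not dopd's real code: map heartbeat's view of the
# peer to the exit code I would expect the outdate-peer handler to return.
expected_outdate_rc() {
    peer_status="$1"   # e.g. "$(cl_status nodestatus debnode1)"
    case "$peer_status" in
        dead)
            # heartbeat already declared the peer dead: report it as
            # unreachable so DRBD may promote the surviving node
            echo 5 ;;
        active|up)
            # peer still reachable over heartbeat: it can be asked to
            # outdate itself (assumed code for "peer outdated")
            echo 4 ;;
        *)
            # anything else: the generic failure code seen in this post
            echo 20 ;;
    esac
}
```

With debnode1 powered off, `expected_outdate_rc "$(cl_status nodestatus debnode1)"` prints 5 under these assumptions, whereas the helper actually returned 20.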
Just for completeness:
debnode2:/usr/lib/heartbeat# ls -al /usr/lib/heartbeat/drbd-peer-outdater*
-rwxr-xr-x 1 root root 8716 2008-08-05 15:49 /usr/lib/heartbeat/drbd-peer-outdater
-rw-r--r-- 1 root root 4419 2008-08-05 15:49 /usr/lib/heartbeat/drbd-peer-outdater.bz2
-rwxr-xr-x 1 root root 8716 2008-03-28 17:39 /usr/lib/heartbeat/drbd-peer-outdater.ORG
debnode2:/usr/lib/heartbeat# ls -al /usr/lib/heartbeat/dopd*
-rwxr-xr-x 1 root root 12744 2008-08-05 15:49 /usr/lib/heartbeat/dopd
-rw-r--r-- 1 root root 6116 2008-08-05 15:49 /usr/lib/heartbeat/dopd.bz2
-rwxr-xr-x 1 root root 12744 2008-03-28 17:39 /usr/lib/heartbeat/dopd.ORG
debnode2:/usr/lib/heartbeat# md5sum /usr/lib/heartbeat/drbd-peer-outdater
b195f526bb6fa3659f4c63e8f23b1d99 /usr/lib/heartbeat/drbd-peer-outdater
debnode2:/usr/lib/heartbeat# md5sum /usr/lib/heartbeat/dopd
9ce67567ea50157bfd2f0e3f3623010d /usr/lib/heartbeat/dopd
debnode2:/usr/lib/heartbeat# dpkg -l | grep heartbeat
ii heartbeat 2.1.3-5~bpo40+1 Subsystem for High-Availability Linux
ii heartbeat-2 2.1.3-5~bpo40+1 Subsystem for High-Availability Linux
Any hints appreciated,
Robert