[DRBD-user] dopd still not working ?

Robert reg at elconas.de
Fri Aug 15 11:31:45 CEST 2008



Hi, I am using heartbeat2 from backports.org on Debian 4.0.
While testing the dopd hotfix for etch
(http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/),
I still can't get dopd to work.

I followed the instructions in
http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/README:
I downloaded the fixed binaries and the MD5SUM file, extracted the
files, and verified the MD5 sums.
--
debnode2:/usr/lib/heartbeat# wget http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/MD5SUM
--13:20:54--  http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/MD5SUM
           => `MD5SUM'
Resolving www.linbit.com... 212.69.162.23
Connecting to www.linbit.com|212.69.162.23|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 192 [text/plain]

100%[====================================>] 192           --.--K/s

13:20:54 (10.96 MB/s) - `MD5SUM' saved [192/192]

debnode2:/usr/lib/heartbeat# md5sum --check < MD5SUM
dopd: OK
drbd-peer-outdater: OK
dopd.bz2: OK
drbd-peer-outdater.bz2: OK
--

Then I shut down debnode1 (it's a VM, so I powered it off HARD), and
the heartbeat failover did not work. Logs:

Aug 15 12:39:33 debnode2 kernel: drbd0: PingAck did not arrive in time.
Aug 15 12:39:33 debnode2 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Aug 15 12:39:33 debnode2 kernel: drbd0: asender terminated
Aug 15 12:39:33 debnode2 kernel: drbd0: Terminating asender thread
Aug 15 12:39:33 debnode2 kernel: drbd0: short read expecting header on sock: r=-512
Aug 15 12:39:33 debnode2 kernel: drbd0: Writing meta data super block now.
Aug 15 12:39:33 debnode2 kernel: drbd0: tl_clear()
Aug 15 12:39:33 debnode2 kernel: drbd0: Connection closed
Aug 15 12:39:33 debnode2 kernel: drbd0: conn( NetworkFailure -> Unconnected )
Aug 15 12:39:33 debnode2 kernel: drbd0: receiver terminated
Aug 15 12:39:33 debnode2 kernel: drbd0: receiver (re)started
Aug 15 12:39:33 debnode2 kernel: drbd0: conn( Unconnected -> WFConnection )
Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: node debnode1: is dead
Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: No STONITH device configured.
Aug 15 12:39:37 debnode2 ipfail: [4559]: info: Status update: Node debnode1 now has status dead
Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: Shared disks are not protected.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Resources being acquired from debnode1.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Link debnode1:eth0 dead.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Link debnode1:eth1 dead.
Aug 15 12:39:37 debnode2 heartbeat: [5210]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Aug 15 12:39:37 debnode2 harc[5210]: [5223]: info: Running /etc/ha.d/rc.d/status status
Aug 15 12:39:37 debnode2 heartbeat: [5211]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys debnode2] to acquire.
Aug 15 12:39:37 debnode2 heartbeat: [2707]: debug: StartNextRemoteRscReq(): child count 1
Aug 15 12:39:37 debnode2 ipfail: [4559]: debug: Found ping node 192.168.226.2!
Aug 15 12:39:37 debnode2 mach_down[5235]: [5256]: info: Taking over resource group drbddisk
Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5268]: info: Acquiring resource group: debnode1 drbddisk Filesystem::/dev/drbd0::/db::ext3::noatime IPaddr2::192.168.226.42/32/eth0
Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5284]: info: Running /etc/ha.d/resource.d/drbddisk  start
Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5285]: debug: Starting /etc/ha.d/resource.d/drbddisk  start
[xxxxx]
Aug 15 12:39:37 debnode2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: drbd peer: debnode1
Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: drbd resource: db
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Connecting channel
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Client outdater (0x8057ee0) connected
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processing msg from outdater
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Got message from (drbd-peer-outdater). (peer: debnode1, res :db)
Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Starting node walk
Aug 15 12:39:38 debnode2 ipfail: [4559]: info: NS: We are still alive!
Aug 15 12:39:38 debnode2 ipfail: [4559]: info: Link Status update: Link debnode1/eth0 now has status dead
Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: message: outdater_rc, debnode2
Aug 15 12:39:38 debnode2 kernel: drbd0: outdate-peer helper broken, returned 20
Aug 15 12:39:38 debnode2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
[xxxxx]

The section between the [xxxxx] markers repeats in a loop.
---

Then I tried to manually start the service (debnode1 still powered down):

debnode2:/usr/lib/heartbeat# cl_status listnodes
192.168.226.2
debnode2
debnode1
debnode2:/usr/lib/heartbeat# cl_status nodestatus debnode1
dead
debnode2:/usr/lib/heartbeat# echo "==========================" >> /var/log/syslog
debnode2:/usr/lib/heartbeat# /usr/lib/heartbeat/drbd-peer-outdater  -p debnode1 -r db -t 4; echo $?
20

It returns 20, and the log says:

==========================
Aug 15 13:27:09 debnode2 drbd-peer-outdater: [6949]: debug: drbd peer: debnode1
Aug 15 13:27:09 debnode2 drbd-peer-outdater: [6949]: debug: drbd resource: db
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Connecting channel
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Client outdater (0x805fe40) connected
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processing msg from outdater
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Got message from (drbd-peer-outdater). (peer: debnode1, res :db)
Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Starting node walk
Aug 15 13:27:10 debnode2 drbd-peer-outdater: [6949]: debug: message: outdater_rc, debnode2
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: WARN: Cluster node: debnode1: status: dead
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processed 1 messages
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: invoked: outdater
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Processed 0 messages
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: destroying connection: (null)
Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Deleting outdater (0x805fe40) from mainloop

debnode2:/usr/lib/heartbeat# drbdadm primary db
/dev/drbd0: State change failed: (-7) Refusing to be Primary while peer is not outdated
Command 'drbdsetup /dev/drbd0 primary' terminated with exit code 11

What's the issue here? As I understand it, dopd should:

"invalidate remote DRBD disks if ONLY the replication link is broken and
heartbeat can still communicate with the remote peer over an alternate
network - aka a second heartbeat". If heartbeat already knows that the
node to be outdated is dead, then the node is dead and dopd should just
continue. That latter case is exactly what happens when a node really dies.

Or did I miss something?
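To make the expected behaviour concrete, here is a minimal shell sketch of the decision described in the quote above. It is NOT dopd's actual logic; the function name `expected_outdate_action` and its outputs `proceed` / `outdate-peer` are hypothetical. Its only input is the peer status string as printed by `cl_status nodestatus` (e.g. `dead`, as in the session above).

```shell
#!/bin/sh
# Hypothetical sketch of the behaviour expected from dopd.
# Not dopd's real implementation; names and outputs are illustrative only.
#
# $1: peer status as printed by `cl_status nodestatus <node>` (e.g. "dead").
expected_outdate_action() {
    case "$1" in
        dead)
            # Heartbeat has declared the peer dead: there is nobody left to
            # outdate, so the local node should be allowed to become Primary.
            echo "proceed"
            ;;
        *)
            # Peer is still visible to heartbeat (e.g. over a second link),
            # so only the replication link failed: outdate the peer's disk.
            echo "outdate-peer"
            ;;
    esac
}

# Example with live heartbeat status (requires a running heartbeat):
# expected_outdate_action "$(cl_status nodestatus debnode1)"
```

With debnode1 reported as `dead`, this sketch would print `proceed`, which is the outcome I expected instead of return code 20.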

Just for completeness:
debnode2:/usr/lib/heartbeat# ls -al /usr/lib/heartbeat/drbd-peer-outdater*
-rwxr-xr-x 1 root root 8716 2008-08-05 15:49 /usr/lib/heartbeat/drbd-peer-outdater
-rw-r--r-- 1 root root 4419 2008-08-05 15:49 /usr/lib/heartbeat/drbd-peer-outdater.bz2
-rwxr-xr-x 1 root root 8716 2008-03-28 17:39 /usr/lib/heartbeat/drbd-peer-outdater.ORG
debnode2:/usr/lib/heartbeat# ls -al /usr/lib/heartbeat/dopd*
-rwxr-xr-x 1 root root 12744 2008-08-05 15:49 /usr/lib/heartbeat/dopd
-rw-r--r-- 1 root root  6116 2008-08-05 15:49 /usr/lib/heartbeat/dopd.bz2
-rwxr-xr-x 1 root root 12744 2008-03-28 17:39 /usr/lib/heartbeat/dopd.ORG

debnode2:/usr/lib/heartbeat# md5sum /usr/lib/heartbeat/drbd-peer-outdater
b195f526bb6fa3659f4c63e8f23b1d99  /usr/lib/heartbeat/drbd-peer-outdater
debnode2:/usr/lib/heartbeat# md5sum /usr/lib/heartbeat/dopd
9ce67567ea50157bfd2f0e3f3623010d  /usr/lib/heartbeat/dopd

debnode2:/usr/lib/heartbeat# dpkg -l | grep heartbeat
ii  heartbeat                         2.1.3-5~bpo40+1                          Subsystem for High-Availability Linux
ii  heartbeat-2                       2.1.3-5~bpo40+1                          Subsystem for High-Availability Linux


Any hints appreciated,
Robert
