[DRBD-user] dopd still not working ?

Robert reg at elconas.de
Fri Aug 15 15:12:45 CEST 2008



I have now compiled dopd from source myself (Heartbeat 2.1.3 plus the patch at
http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/src/heartbeat-dopd-fix.diff).

This time it works fine:

Aug 15 17:09:54 debnode2 drbd-peer-outdater: [3795]: debug: message: outdater_rc, debnode2
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: WARN: Cluster node: debnode1: status: dead
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: debug: Processed 1 messages
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: debug: invoked: outdater
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: debug: Processed 0 messages
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: debug: destroying connection: (null)
Aug 15 17:09:54 debnode2 /usr/lib/heartbeat/dopd: [3042]: debug: Deleting outdater (0x80577a0) from mainloop
Aug 15 17:09:54 debnode2 ResourceManager[3761]: [3797]: debug: /etc/ha.d/resource.d/drbddisk  start done. RC=0
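
For anyone repeating the manual test from my first mail (quoted below), a
small helper can turn the return code of drbd-peer-outdater into words.
This is only a sketch: the function name describe_outdater_rc is made up,
and the meanings of codes 3 to 7 are what I recall from the description of
the outdate-peer handler in drbd.conf(5), so please check them against
your DRBD version. Anything else, such as the 20 I got before, lands in
the catch-all branch, which matches the kernel's "helper broken" message.

```shell
#!/bin/sh
# Decode the exit code of DRBD's outdate-peer helper.
# ASSUMPTION: the meanings of 3..7 are recalled from drbd.conf(5);
# verify them against the man page of the installed DRBD version.
describe_outdater_rc() {
    case "$1" in
        3) echo "peer disk was already inconsistent" ;;
        4) echo "peer was outdated" ;;
        5) echo "peer unreachable, assumed dead" ;;
        6) echo "peer is primary and refused to be outdated" ;;
        7) echo "peer was fenced (STONITH)" ;;
        *) echo "unexpected code $1, helper treated as broken" ;;
    esac
}
```

Usage, with the same command line as in my first mail:

  /usr/lib/heartbeat/drbd-peer-outdater -p debnode1 -r db -t 4
  describe_outdater_rc $?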

After manually stripping the compiled binaries (strip dopd) I get these 
files:

-rwxr-xr-x 1 root root 13240 2008-08-15 17:04 /usr/lib/heartbeat/dopd
-rwxr-xr-x 1 root root 12744 2008-03-28 17:39 /usr/lib/heartbeat/dopd.ORG
-rwxr-xr-x 1 root root  9100 2008-08-15 17:04 /usr/lib/heartbeat/drbd-peer-outdater
-rwxr-xr-x 1 root root  8716 2008-03-28 17:39 /usr/lib/heartbeat/drbd-peer-outdater.ORG

debnode2:/INSTALL/heartbeat-2.1.3/contrib/drbd-outdate-peer# file /usr/lib/heartbeat/dopd
/usr/lib/heartbeat/dopd: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, stripped
debnode2:/INSTALL/heartbeat-2.1.3/contrib/drbd-outdate-peer# file /usr/lib/heartbeat/dopd.ORG
/usr/lib/heartbeat/dopd.ORG: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, stripped

Could it be that the binaries at 
http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/ are 
still broken?
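
One way to check that suspicion would be a byte-for-byte comparison of the
downloaded hotfix binary with the original package binary. In my first mail
(quoted below) both dopd and dopd.ORG were 12744 bytes, which is what made
me wonder. The sketch below assumes the hotfix download was kept under a
hypothetical name such as dopd.HOTFIX; the function name same_binary is
also made up. It relies only on cmp -s.

```shell
#!/bin/sh
# Report whether two files are byte-for-byte identical.
# Returns 0 if identical, 1 if different, 2 if a file cannot be read.
same_binary() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "cannot read: $1 or $2"
        return 2
    fi
    if cmp -s "$1" "$2"; then
        echo "identical: $1 $2"
        return 0
    fi
    echo "different: $1 $2"
    return 1
}
```

Usage (dopd.HOTFIX is a hypothetical file name):

  same_binary /usr/lib/heartbeat/dopd.HOTFIX /usr/lib/heartbeat/dopd.ORG

If this reports "identical", the hotfix download is simply the unpatched
binary. A "different" result would not prove the fix is in, only that the
files are not the same.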

Robert

Robert wrote:
> Hi, I am using heartbeat2 from backports.org on Debian 4.0.
> When testing the dopd hotfix for etch 
> (http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/)  
> I still can't get dopd to work.
>
> I followed the instructions in 
> http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/README and 
> downloaded the fixed versions, downloaded the MD5SUM file, extracted 
> the files and verified the MD5 sum.
> -- 
> debnode2:/usr/lib/heartbeat# wget 
> http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/MD5SUM 
>
> --13:20:54--  
> http://www.linbit.com/support/hotfix/dopd_heartbeat_2.1.3/etch-i386/MD5SUM 
>
>           => `MD5SUM'
> Resolving www.linbit.com... 212.69.162.23
> Connecting to www.linbit.com|212.69.162.23|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 192 [text/plain]
>
> 100%[====================================>] 192           --.--K/s
>
> 13:20:54 (10.96 MB/s) - `MD5SUM' saved [192/192]
>
> debnode2:/usr/lib/heartbeat# md5sum --check < MD5SUM
> dopd: OK
> drbd-peer-outdater: OK
> dopd.bz2: OK
> drbd-peer-outdater.bz2: OK
> -- 
>
> Then I powered off debnode1 hard (it is a VM), but the heartbeat 
> failover did not work. Logs:
>
> Aug 15 12:39:33 debnode2 kernel: drbd0: PingAck did not arrive in time.
> Aug 15 12:39:33 debnode2 kernel: drbd0: peer( Primary -> Unknown ) 
> conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Aug 15 12:39:33 debnode2 kernel: drbd0: asender terminated
> Aug 15 12:39:33 debnode2 kernel: drbd0: Terminating asender thread
> Aug 15 12:39:33 debnode2 kernel: drbd0: short read expecting header on 
> sock: r=-512
> Aug 15 12:39:33 debnode2 kernel: drbd0: Writing meta data super block 
> now.
> Aug 15 12:39:33 debnode2 kernel: drbd0: tl_clear()
> Aug 15 12:39:33 debnode2 kernel: drbd0: Connection closed
> Aug 15 12:39:33 debnode2 kernel: drbd0: conn( NetworkFailure -> 
> Unconnected )
> Aug 15 12:39:33 debnode2 kernel: drbd0: receiver terminated
> Aug 15 12:39:33 debnode2 kernel: drbd0: receiver (re)started
> Aug 15 12:39:33 debnode2 kernel: drbd0: conn( Unconnected -> 
> WFConnection )
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: node debnode1: is dead
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: No STONITH device 
> configured.
> Aug 15 12:39:37 debnode2 ipfail: [4559]: info: Status update: Node 
> debnode1 now has status dead
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: WARN: Shared disks are not 
> protected.
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Resources being 
> acquired from debnode1.
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Link debnode1:eth0 
> dead.
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: info: Link debnode1:eth1 
> dead.
> Aug 15 12:39:37 debnode2 heartbeat: [5210]: debug: notify_world: 
> setting SIGCHLD Handler to SIG_DFL
> Aug 15 12:39:37 debnode2 harc[5210]: [5223]: info: Running 
> /etc/ha.d/rc.d/status status
> Aug 15 12:39:37 debnode2 heartbeat: [5211]: info: No local resources 
> [/usr/share/heartbeat/ResourceManager listkeys debnode2] to acquire.
> Aug 15 12:39:37 debnode2 heartbeat: [2707]: debug: 
> StartNextRemoteRscReq(): child count 1
> Aug 15 12:39:37 debnode2 ipfail: [4559]: debug: Found ping node 
> 192.168.226.2!
> Aug 15 12:39:37 debnode2 mach_down[5235]: [5256]: info: Taking over 
> resource group drbddisk
> Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5268]: info: 
> Acquiring resource group: debnode1 drbddisk 
> Filesystem::/dev/drbd0::/db::ext3::noatime IPaddr2::
> 192.168.226.42/32/eth0
> Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5284]: info: Running 
> /etc/ha.d/resource.d/drbddisk  start
> Aug 15 12:39:37 debnode2 ResourceManager[5257]: [5285]: debug: 
> Starting /etc/ha.d/resource.d/drbddisk  start
> [xxxxx]
> Aug 15 12:39:37 debnode2 kernel: drbd0: helper command: /sbin/drbdadm 
> outdate-peer
> Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: drbd peer: 
> debnode1
> Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: drbd 
> resource: db
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> Connecting channel
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> Client outdater (0x8057ee0) connected
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> invoked: outdater
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> Processing msg from outdater
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Got 
> message from (drbd-peer-outdater). (peer: debnode1, res :db)
> Aug 15 12:39:38 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> Starting node walk
> Aug 15 12:39:38 debnode2 ipfail: [4559]: info: NS: We are still alive!
> Aug 15 12:39:38 debnode2 ipfail: [4559]: info: Link Status update: 
> Link debnode1/eth0 now has status dead
> Aug 15 12:39:38 debnode2 drbd-peer-outdater: [5291]: debug: message: 
> outdater_rc, debnode2
> Aug 15 12:39:38 debnode2 kernel: drbd0: outdate-peer helper broken, 
> returned 20
> Aug 15 12:39:38 debnode2 kernel: drbd0: helper command: /sbin/drbdadm 
> outdate-peer
> [xxxxx]
>
> The section between the [xxxxx] markers repeats in a loop.
> ---
>
> Then I tried to manually start the service (debnode1 still powered down):
>
> debnode2:/usr/lib/heartbeat# cl_status listnodes
> 192.168.226.2
> debnode2
> debnode1
> debnode2:/usr/lib/heartbeat# cl_status nodestatus debnode1
> dead
> debnode2:/usr/lib/heartbeat# echo "==========================" >> 
> /var/log/syslog
> debnode2:/usr/lib/heartbeat# /usr/lib/heartbeat/drbd-peer-outdater  -p 
> debnode1 -r db -t 4; echo $?
> 20
>
> The return code is 20, and the log says:
>
> ==========================
> Aug 15 13:27:09 debnode2 drbd-peer-outdater: [6949]: debug: drbd peer: 
> debnode1
> Aug 15 13:27:09 debnode2 drbd-peer-outdater: [6949]: debug: drbd 
> resource: db
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> Connecting channel
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> Client outdater (0x805fe40) connected
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> invoked: outdater
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> Processing msg from outdater
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: Got 
> message from (drbd-peer-outdater). (peer: debnode1, res :db)
> Aug 15 13:27:09 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> Starting node walk
> Aug 15 13:27:10 debnode2 drbd-peer-outdater: [6949]: debug: message: 
> outdater_rc, debnode2
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: WARN: 
> Cluster node: debnode1: status: dead
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> Processed 1 messages
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> invoked: outdater
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> Processed 0 messages
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> destroying connection: (null)
> Aug 15 13:27:10 debnode2 /usr/lib/heartbeat/dopd: [4560]: debug: 
> Deleting outdater (0x805fe40) from mainloop
>
> debnode2:/usr/lib/heartbeat# drbdadm primary db
> /dev/drbd0: State change failed: (-7) Refusing to be Primary while 
> peer is not outdated
> Command 'drbdsetup /dev/drbd0 primary' terminated with exit code 11
>
> What's the issue here? From my understanding, dopd should:
>
> "invalidate remote DRBD disks if ONLY the replication link is broken 
> and heartbeat can still communicate with the remote peer over an 
> alternate network - aka a second heartbeat. If the node to be 
> outdated is known to be dead by heartbeat, the node is dead and dopd 
> should just continue. This latter case is the behaviour if a node is 
> really dead."
>
> Or did I miss something?
>
> Just for completeness:
> debnode2:/usr/lib/heartbeat# ls -al 
> /usr/lib/heartbeat/drbd-peer-outdater*
> -rwxr-xr-x 1 root root 8716 2008-08-05 15:49 
> /usr/lib/heartbeat/drbd-peer-outdater
> -rw-r--r-- 1 root root 4419 2008-08-05 15:49 
> /usr/lib/heartbeat/drbd-peer-outdater.bz2
> -rwxr-xr-x 1 root root 8716 2008-03-28 17:39 
> /usr/lib/heartbeat/drbd-peer-outdater.ORG
> debnode2:/usr/lib/heartbeat# ls -al /usr/lib/heartbeat/dopd*
> -rwxr-xr-x 1 root root 12744 2008-08-05 15:49 /usr/lib/heartbeat/dopd
> -rw-r--r-- 1 root root  6116 2008-08-05 15:49 /usr/lib/heartbeat/dopd.bz2
> -rwxr-xr-x 1 root root 12744 2008-03-28 17:39 /usr/lib/heartbeat/dopd.ORG
>
> debnode2:/usr/lib/heartbeat# md5sum /usr/lib/heartbeat/drbd-peer-outdater
> b195f526bb6fa3659f4c63e8f23b1d99  /usr/lib/heartbeat/drbd-peer-outdater
> debnode2:/usr/lib/heartbeat# md5sum /usr/lib/heartbeat/dopd
> 9ce67567ea50157bfd2f0e3f3623010d  /usr/lib/heartbeat/dopd
>
> debnode2:/usr/lib/heartbeat# dpkg -l | grep heartbeat
> ii  heartbeat                         
> 2.1.3-5~bpo40+1                          Subsystem for 
> High-Availability Linux
> ii  heartbeat-2                       
> 2.1.3-5~bpo40+1                          Subsystem for 
> High-Availability Linux
>
>
> Any hints appreciated,
> Robert
>



