Hello all,

I'm facing a problem with drbd and heartbeat. When the primary node is powered off, the secondary node stays in the secondary state and refuses to become primary. It says that the other peer is primary, but that peer is dead!

I'm using drbd (0.7.2, built from source) and heartbeat (1.2.0, from RPM) with a 2.6.8.1 kernel (built from source) on Mandrake Linux 10.0. Failover works well when triggered manually (heartbeat restart) or with the hb_standby command: the services move to the other node (it is a 2-node cluster), and DRBD behaves correctly in that context.

But when the power goes off on the primary node, the drbddisk script, which heartbeat launches to switch the node from secondary to primary, cannot switch DRBD to the primary state. The surviving node stays secondary, so my data partition cannot be mounted and the services that rely on that partition cannot start.

It seems that the remaining node "thinks" that the primary node (which is dead!) is still in the primary state, and so it does not want to become primary.

So, when a node crashes completely, how can the other node become primary and mount my partition in read-write mode?

Thanks for any help.

-------------- snip ha-debug of the remaining node
heartbeat: 2004/10/13_15:45:43 debug: Starting /etc/rc.d/init.d/drbddisk r0 start
ioctl(,SET_STATE,) failed: Permission denied
Partner is already primary
Command '/sbin/drbdsetup /dev/drbd/0 primary' terminated with exit code 20
drbdadm aborting
heartbeat: 2004/10/13_15:45:43 debug: /etc/rc.d/init.d/drbddisk r0 start done. RC=0
--------------- snip

--------------- snip ha-log of the remaining node
heartbeat: 2004/10/13_15:45:43 WARN: node truc1: is dead
heartbeat: 2004/10/13_15:45:43 WARN: No STONITH device configured.
heartbeat: 2004/10/13_15:45:43 WARN: Shared disks are not protected.
heartbeat: 2004/10/13_15:45:43 info: Resources being acquired from truc1.
heartbeat: 2004/10/13_15:45:43 info: Link truc1:eth1 dead.
heartbeat: 2004/10/13_15:45:43 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2004/10/13_15:45:43 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys truc2] to acquire.
heartbeat: 2004/10/13_15:45:43 info: Taking over resource group drbddisk::r0
heartbeat: 2004/10/13_15:45:43 info: Acquiring resource group: truc1 drbddisk::r0 Filesystem::/dev/drbd/0::/data::ext3 192.168.2.211 named inst_dns_server
heartbeat: 2004/10/13_15:45:43 info: Running /etc/rc.d/init.d/drbddisk r0 start
heartbeat: 2004/10/13_15:45:43 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd/0 /data ext3 start
heartbeat: 2004/10/13_15:45:43 ERROR: Couldn't mount filesystem /dev/drbd/0 on /data
heartbeat: 2004/10/13_15:45:43 ERROR: Return code 1 from /etc/ha.d/resource.d/Filesystem
heartbeat: 2004/10/13_15:45:43 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.211 start
heartbeat: 2004/10/13_15:45:44 info: /sbin/ifconfig eth0:0 192.168.2.211 netmask 255.255.255.0 broadcast 192.168.2.255
heartbeat: 2004/10/13_15:45:44 info: Sending Gratuitous Arp for 192.168.2.211 on eth0:0 [eth0]
heartbeat: 2004/10/13_15:45:44 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-192.168.2.211 eth0 192.168.2.211 auto 192.168.2.211 ffffffffffff
-------------------- snip

--
*****************************
Richard NAGY
Nameshield
*****************************