[DRBD-user] How to become primary when the other node gets dead?

Richard NAGY r.nagy at nameshield.net
Wed Oct 13 16:23:49 CEST 2004

Hello all,

I'm facing a problem with drbd and heartbeat. When the master node is 
powered off, the secondary node stays in the secondary state and refuses 
to become primary. It says that the other peer is primary, but that peer 
is dead!

I'm using drbd (0.7.2, from source) and heartbeat (1.2.0, from RPM) with 
a kernel built from source on Mandrake Linux 10.0. Failover works well 
when triggered manually (heartbeat restart) or with the hb_standby 
command: services move to the other node (2-node cluster), and DRBD 
behaves correctly in that context. But when the power goes off on the 
primary node, the drbddisk script, launched by heartbeat to switch the 
node from secondary to primary, cannot switch DRBD to the primary state. 
The surviving node stays in the secondary state, so my data partition 
cannot be mounted and the services that rely on that partition cannot 
start.

It seems that the remaining node "thinks" that the primary node (which 
is dead!) is still in the primary state, and so refuses to become 
primary. Can anyone explain this behaviour?
So, when a node crashes completely, how can the other node become 
primary and mount my partition in read-write mode?

Thanks for any help.

-------------- snip ha-debug of the remaining node
heartbeat: 2004/10/13_15:45:43 debug: Starting /etc/rc.d/init.d/drbddisk 
r0 start
ioctl(,SET_STATE,) failed: Permission denied
Partner is already primary
Command '/sbin/drbdsetup /dev/drbd/0 primary' terminated with exit code 20
drbdadm aborting
heartbeat: 2004/10/13_15:45:43 debug: /etc/rc.d/init.d/drbddisk r0 start 
done. RC=0
--------------- snip
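
One detail in the ha-debug excerpt above may be worth noting: drbdsetup 
exits with code 20, yet drbddisk still reports RC=0 to heartbeat, which 
then goes on to attempt the mount. The following is only a minimal 
sketch, in Python, of how such a masked failure could be spotted in a 
log. The function name masked_failures and both regular expressions are 
my own assumptions, written against the exact wording of the lines 
above (with the wrapped lines rejoined); they are not part of drbd or 
heartbeat.

```python
import re

# Excerpt copied from the ha-debug snip above, wrapped lines rejoined.
HA_DEBUG = """\
heartbeat: 2004/10/13_15:45:43 debug: Starting /etc/rc.d/init.d/drbddisk r0 start
ioctl(,SET_STATE,) failed: Permission denied
Partner is already primary
Command '/sbin/drbdsetup /dev/drbd/0 primary' terminated with exit code 20
drbdadm aborting
heartbeat: 2004/10/13_15:45:43 debug: /etc/rc.d/init.d/drbddisk r0 start done. RC=0
"""

# Hypothetical patterns, matched against the wording seen in this log only.
FAILED_CMD = re.compile(
    r"^Command '(?P<cmd>[^']+)' terminated with exit code (?P<code>\d+)$"
)
SCRIPT_DONE = re.compile(r"debug: (?P<script>\S+) .* done\. RC=(?P<rc>\d+)$")


def masked_failures(log_text):
    """Return (command, exit_code, script) tuples for every command that
    exited non-zero before a resource script reported RC=0."""
    pending = []  # failed commands seen since the last "done. RC=" line
    masked = []
    for line in log_text.splitlines():
        line = line.strip()
        failed = FAILED_CMD.match(line)
        if failed and int(failed.group("code")) != 0:
            pending.append((failed.group("cmd"), int(failed.group("code"))))
            continue
        done = SCRIPT_DONE.search(line)
        if done:
            if int(done.group("rc")) == 0:
                # The script claimed success although a command failed.
                masked.extend(
                    (cmd, code, done.group("script")) for cmd, code in pending
                )
            pending = []
    return masked


if __name__ == "__main__":
    for cmd, code, script in masked_failures(HA_DEBUG):
        print(f"{script} reported RC=0 although '{cmd}' exited with {code}")
```

Run against the excerpt, this flags the drbdsetup call as a failure that 
drbddisk did not pass on to heartbeat.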

--------------- snip ha-log of the remaining node
heartbeat: 2004/10/13_15:45:43 WARN: node truc1: is dead
heartbeat: 2004/10/13_15:45:43 WARN: No STONITH device configured.
heartbeat: 2004/10/13_15:45:43 WARN: Shared disks are not protected.
heartbeat: 2004/10/13_15:45:43 info: Resources being acquired from truc1.
heartbeat: 2004/10/13_15:45:43 info: Link truc1:eth1 dead.
heartbeat: 2004/10/13_15:45:43 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2004/10/13_15:45:43 info: No local resources 
[/usr/lib/heartbeat/ResourceManager listkeys truc2] to acquire.
heartbeat: 2004/10/13_15:45:43 info: Taking over resource group drbddisk::r0
heartbeat: 2004/10/13_15:45:43 info: Acquiring resource group: truc1 
drbddisk::r0 Filesystem::/dev/drbd/0::/data::ext3 named 
heartbeat: 2004/10/13_15:45:43 info: Running /etc/rc.d/init.d/drbddisk 
r0 start
heartbeat: 2004/10/13_15:45:43 info: Running 
/etc/ha.d/resource.d/Filesystem /dev/drbd/0 /data ext3 start
heartbeat: 2004/10/13_15:45:43 ERROR: Couldn't mount filesystem 
/dev/drbd/0 on /data
heartbeat: 2004/10/13_15:45:43 ERROR: Return code 1 from 
heartbeat: 2004/10/13_15:45:43 info: Running /etc/ha.d/resource.d/IPaddr start
heartbeat: 2004/10/13_15:45:44 info: /sbin/ifconfig eth0:0 
netmask  broadcast
heartbeat: 2004/10/13_15:45:44 info: Sending Gratuitous Arp for on eth0:0 [eth0]
heartbeat: 2004/10/13_15:45:44 /usr/lib/heartbeat/send_arp -i 500 -r 10 
-p /var/lib/heartbeat/rsctmp/send_arp/send_arp- eth0 auto ffffffffffff
-------------------- snip

Richard NAGY

