Hi everyone,

I'm running a two-node cluster with DRBD and Heartbeat. I've done a lot of tests and they all worked properly, but I can't understand why the simplest one doesn't work. The problem is that when I unplug eth0 (the public interface) on the primary, the secondary for some reason can't start the takeover procedure.

First of all, here are my ha.cf and drbd.conf files:

ha.cf

    logfile /var/log/ha-log
    keepalive 2
    deadtime 60
    initdead 120
    bcast eth1
    bcast eth0
    serial /dev/ttyS0
    auto_failback on
    node host1 host2
    respawn root /etc/init.d/apache2
    respawn root /etc/init.d/postgresql
    respawn root /usr/lib/heartbeat/ipfail
    ping (#ip addr gateway)

drbd.conf

    resource r0 {
      protocol C;
      net { timeout 15; }
      syncer { group 0; rate 5M; }
      on host1 {
        device /dev/drbd0;
        disk /dev/hda5;
        address 192.168.0.1:7788;
        meta-disk internal;
      }
      on host2 {
        device /dev/drbd0;
        disk /dev/hda5;
        address 192.168.0.2:7788;
        meta-disk internal;
      }
    }
    resource r1 {
      protocol C;
      net { timeout 15; }
      syncer { group 1; rate 5M; }
      on host1 {
        device /dev/drbd1;
        disk /dev/hdc1;
        address 192.168.0.1:7789;
        meta-disk internal;
      }
      on host2 {
        device /dev/drbd1;
        disk /dev/hda6;
        address 192.168.0.2:7789;
        meta-disk internal;
      }
      meta-disk internal;
    }
    }

The facts are:

- If I unplug both the serial cable and eth1 and then stop Heartbeat on the primary, the takeover takes effect correctly.
- If I unplug eth0, both nodes go into a kind of "standby" state: neither of them is working. But analyzing the log files (I'm not posting them, they are too long) I see that:
  1) on the primary, the takeover procedure has started for the secondary, but I get a warning that the second node is down (while it is up!!), and the primary keeps holding all the resources (for example the IP address);
  2) on the secondary I get the message "Both nodes own our resources".

Finally, if I plug eth0 back in, the strange thing is that the takeover does take effect (all resources go to the secondary) and then, because of "auto_failback on", they all come back to the primary.

I also thought it might be a timeout problem, but in that case it shouldn't work when I stop Heartbeat manually either (I think), and that does work.

I think I've configured everything properly (ipfail in particular). What's wrong, then? Should I avoid sending broadcasts on eth0?

Thanks in advance to anyone who can help me understand.
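
P.S. To make the two test cases easier to compare, here is a rough Python sketch (purely illustrative, not part of Heartbeat) that reads the ha.cf directives from this post and lists which heartbeat links are still connected after a given cable is pulled. The function names are my own invention, and the ping line is left out because the gateway address is not given above. It only models which links are configured; it says nothing about how Heartbeat or ipfail actually decide on a takeover.

```python
# Illustrative only: a tiny reader for ha.cf-style text, not a Heartbeat tool.
# HA_CF repeats the directives from this post (ping line omitted: address unknown).
HA_CF = """\
logfile /var/log/ha-log
keepalive 2
deadtime 60
initdead 120
bcast eth1
bcast eth0
serial /dev/ttyS0
auto_failback on
node host1 host2
"""


def parse_ha_cf(text):
    """Return a list of (directive, [args]) pairs, skipping blanks and comments."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        directive, *args = line.split()
        entries.append((directive, args))
    return entries


def heartbeat_paths(entries):
    """Collect the links that carry heartbeats: bcast interfaces and serial ports."""
    paths = []
    for directive, args in entries:
        if directive in ("bcast", "serial"):
            paths.extend(args)
    return paths


def surviving_paths(entries, unplugged):
    """Heartbeat links that remain once the links in `unplugged` are removed."""
    return [p for p in heartbeat_paths(entries) if p not in unplugged]


entries = parse_ha_cf(HA_CF)
print(heartbeat_paths(entries))             # ['eth1', 'eth0', '/dev/ttyS0']
print(surviving_paths(entries, {"eth0"}))   # ['eth1', '/dev/ttyS0']
```

With only eth0 unplugged, eth1 and the serial line are still listed, so the two nodes can still exchange heartbeats in that test; in the first test (serial and eth1 unplugged) only eth0 is left.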