i originally posted this on the linux-ha list, but received no replies. i was hoping somebody here might be able to shed some light on the issue:

i've run into a very weird problem involving heartbeat, dopd, drbd, and the linux bonding driver. first, my setup:

  OS: centos 5.2 running kernel 2.6.18-92.1.10.el5PAE
  heartbeat: 2.1.3 w/ dopd patch applied
  drbd: 8.2.6 for installed kernel

the 'w/ dopd patch applied' refers to this patch that fixes a critical breakage in dopd:

  http://hg.linux-ha.org/dev/rev/47f60bebe7b2

note that the problem i describe below happens with or without the patch.

basically, when i turn on dopd in my ha.cf file with the following directives:

  respawn hacluster /usr/lib/heartbeat/dopd
  apiauth dopd gid=haclient uid=hacluster

the bonding setup that i have for my drbd connection tanks with messages like these:

  Aug 31 19:53:25 beast kernel: bonding: bond1: link status definitely up for interface eth3.
  Aug 31 19:53:25 beast kernel: bonding: bond1: making interface eth3 the new active one.
  Aug 31 19:53:25 beast kernel: bonding: bond1: first active interface up!
  Aug 31 19:53:25 beast kernel: bonding: bond1: link status definitely up for interface eth4.
  Aug 31 19:53:25 beast kernel: bonding: bond1: link status definitely up for interface eth5.
  Aug 31 20:18:27 beast kernel: bonding: bond1: link status definitely down for interface eth5, disabling it
  Aug 31 20:18:28 beast kernel: bonding: bond1: link status definitely down for interface eth3, disabling it
  Aug 31 20:18:28 beast kernel: bonding: bond1: link status definitely down for interface eth4, disabling it
  Aug 31 20:18:28 beast kernel: bonding: bond1: now running without any active interface !
  Aug 31 20:18:28 beast kernel: bonding: bond1: Error: found a client with no channel in the client's hash table

the bond has three slave interfaces, using three crossover cables that run directly from one machine to the other.

the bond works perfectly with heartbeat and drbd _until_ i add those directives listed above to turn on dopd, at which point it tanks with those error messages. there's nothing special or interesting in the heartbeat logs that i can see -- when dopd gets started by heartbeat then the problem occurs.

in the meantime i've simply gotten rid of the bond, and heartbeat/dopd/drbd are all running happily together. i sure would like to have my cake and eat it, too, though -- so if anybody has any suggestions about how i can fix this, i'd surely appreciate it!