Tim Jackson wrote:
> Christophe Zwecker wrote:
>
>> node1 is primary with mounted fs
>> node2 is secondary
>>
>> node1 goes down (only network failure),
>
> "only" network failure? Which network? In many cases, a network failure
> alone is worse than one box completely failing, because it can cause
> "split brain" if you're not careful.

I unplugged the network cable from node1, leaving the crossover cable
between node1 and node2 in place.

> What connections do you have for Heartbeat to use? (A serial heartbeat
> is always a good idea if you can have it). As many redundant paths as
> possible is good. (typical might be 3: replication (crossover) network
> between the DRBD machines, "normal" network and serial heartbeat)

I use a crossover cable for testing; I'll add a serial link for production.

>> heartbeat unmounts the drbd fs on node1. node2 takes over and mounts
>> the drbd volume.
>
> And what happens to node1 here? Are you sure that Heartbeat stops the
> DRBD services? My guess is that you have a single network connection for
> both DRBD and Heartbeat, in which case DRBD will still be primary on node1.

Yes, Heartbeat stops DRBD on node1 and starts it on node2:

heartbeat[17239]: 2006/09/15_15:08:42 WARN: node 192.168.1.254: is dead
heartbeat[17239]: 2006/09/15_15:08:42 info: Link 192.168.1.254:192.168.1.254 dead.
harc[18084]: 2006/09/15_15:08:42 info: Running /etc/ha.d/rc.d/status status
heartbeat[17239]: 2006/09/15_15:08:54 info: mw-test-n1.i-dis.net wants to go standby [all]
heartbeat[17239]: 2006/09/15_15:08:55 info: standby: mw-test-n2.i-dis.net can take our all resources
heartbeat[18103]: 2006/09/15_15:08:55 info: give up all HA resources (standby).
ResourceManager[18113]: 2006/09/15_15:08:55 info: Releasing resource group: mw-test-n1.i-dis.net drbddisk::ha Filesystem::/dev/drbd0::/ha::ext3 192.168.1.123 httpd mysql
ResourceManager[18113]: 2006/09/15_15:08:55 info: Running /etc/init.d/mysql stop
ResourceManager[18113]: 2006/09/15_15:08:59 info: Running /etc/init.d/httpd stop
ResourceManager[18113]: 2006/09/15_15:08:59 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.123 stop
IPaddr[18295]: 2006/09/15_15:08:59 INFO: /sbin/route -n del -host 192.168.1.123
IPaddr[18295]: 2006/09/15_15:08:59 INFO: /sbin/ifconfig eth0:0 192.168.1.123 down
IPaddr[18295]: 2006/09/15_15:08:59 INFO: IP Address 192.168.1.123 released
IPaddr[18225]: 2006/09/15_15:08:59 INFO: IPaddr Success
ResourceManager[18113]: 2006/09/15_15:08:59 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha ext3 stop
Filesystem[18415]: 2006/09/15_15:09:00 INFO: Running stop for /dev/drbd0 on /ha
Filesystem[18415]: 2006/09/15_15:09:00 INFO: unmounted /ha successfully
Filesystem[18351]: 2006/09/15_15:09:00 INFO: Filesystem Success
ResourceManager[18113]: 2006/09/15_15:09:00 info: Running /etc/ha.d/resource.d/drbddisk ha stop
heartbeat[18103]: 2006/09/15_15:09:00 info: all HA resource release completed (standby).

So on node1:

[root at mw-test-n1 ~]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by root at mw-test-n1.i-dis.net, 2006-09-11 16:41:09
 0: cs:WFConnection st:Secondary/Unknown ld:Consistent
    ns:402708 nr:444 dw:403368 dr:14442 al:104 bm:381 lo:0 pe:0 ua:0 ap:0

On node2:

[root at mw-test-n2 ~]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by root at mw-test-n1.i-dis.net, 2006-09-11 16:41:09
 0: cs:WFConnection st:Primary/Unknown ld:Consistent
    ns:444 nr:402708 dw:403800 dr:12215 al:15 bm:18 lo:0 pe:0 ua:0 ap:0

>> node1 comes back up, mounts the drbd volume and the change isn't there
>> because:
>> Sep 15 13:47:03 mw-test-n2 kernel: drbd0: Current Primary shall become
>> sync TARGET!
>> Aborting to prevent data corruption.
>
> DRBD is doing the right thing here. Either your nodes weren't really
> synchronised before the failure, or you had a split brain where DRBD was
> primary on both machines.

The data was synced for sure. Could the problem be that when node1 comes
back up, DRBD on node1 is switched to primary before it has been synced?

--
Christophe Zwecker
:Sysctl
Koppel 96
20099 Hamburg
phone: +49 40 41263790
fax: +49 40 41263799
mail: czwecker at sysctl.de
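The two /proc/drbd outputs in this thread can be compared mechanically to tell "disconnected, one Primary" apart from the split-brain case Tim describes. Below is a minimal Python sketch, assuming the DRBD 0.7 layout shown in the post (one ` 0: cs:... st:local/peer ld:...` line per device). The names `parse_proc_drbd`, `diagnose` and `DrbdDevice` are hypothetical helpers, and the `diagnose` rules are a deliberate simplification for illustration, not DRBD's own resync decision (which is based on on-disk metadata).

```python
import re
from dataclasses import dataclass
from typing import Dict

# Per-device status line of DRBD 0.7 /proc/drbd, e.g.
# " 0: cs:WFConnection st:Secondary/Unknown ld:Consistent"
_DEVICE_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+"
    r"st:(?P<local>\w+)/(?P<peer>\w+)\s+ld:(?P<ld>\w+)"
)


@dataclass
class DrbdDevice:
    minor: int
    cs: str          # connection state, e.g. "Connected", "WFConnection"
    local_role: str  # this node's role, e.g. "Primary", "Secondary"
    peer_role: str   # "Unknown" while the peer is not connected
    ld: str          # local disk state, e.g. "Consistent"


def parse_proc_drbd(text: str) -> Dict[int, DrbdDevice]:
    """Return {minor: DrbdDevice}; the version, SVN and ns:/nr: lines are skipped."""
    devices: Dict[int, DrbdDevice] = {}
    for line in text.splitlines():
        m = _DEVICE_RE.match(line)
        if m:
            minor = int(m["minor"])
            devices[minor] = DrbdDevice(minor, m["cs"], m["local"], m["peer"], m["ld"])
    return devices


def diagnose(a: DrbdDevice, b: DrbdDevice) -> str:
    """Simplified reading of the same device as seen from both nodes."""
    primaries = [d for d in (a, b) if d.local_role == "Primary"]
    if len(primaries) == 2:
        # Both sides promoted while cut off from each other: the case
        # where DRBD refuses to sync rather than overwrite data.
        return "split brain: both nodes are Primary"
    if a.cs == "Connected" and b.cs == "Connected":
        return "connected"
    if len(primaries) == 1:
        return "disconnected: one Primary; the Secondary is expected to resync from it"
    return "disconnected: no Primary"


# Samples copied from the outputs posted above.
NODE1 = """version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by root at mw-test-n1.i-dis.net, 2006-09-11 16:41:09
 0: cs:WFConnection st:Secondary/Unknown ld:Consistent
    ns:402708 nr:444 dw:403368 dr:14442 al:104 bm:381 lo:0 pe:0 ua:0 ap:0
"""
NODE2 = """version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by root at mw-test-n1.i-dis.net, 2006-09-11 16:41:09
 0: cs:WFConnection st:Primary/Unknown ld:Consistent
    ns:444 nr:402708 dw:403800 dr:12215 al:15 bm:18 lo:0 pe:0 ua:0 ap:0
"""

if __name__ == "__main__":
    n1 = parse_proc_drbd(NODE1)[0]
    n2 = parse_proc_drbd(NODE2)[0]
    print(diagnose(n1, n2))
```

On the posted snapshot this reports the disconnected, single-Primary case. If node1 were promoted to Primary before reconnecting (the scenario asked about above), both sides would report `st:Primary/...` and the sketch would flag a split brain.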