-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi All,

I have set up a small cluster with drbd and heartbeat, with the following
characteristics and configuration:

Distribution: Debian Etch with as few backports packages as possible
kernel 2.6.25
drbd 8.0.11, built into the kernel
heartbeat 2.1.3

The two machines are:
alessandra (master)
lalla (slave)

================================================================================
drbd.conf

global { usage-count no; }
common { syncer { rate 100M; } }
resource r0 {
  protocol C;
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; /etc/init.d/heartbeat stop";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
    split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
  }
  startup {
    wfc-timeout 60;
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error detach;
  }
  net {
    rr-conflict disconnect;
    after-sb-0pri discard-younger-primary;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;
  }
  syncer {
    rate 100M;
    al-extents 257;
  }
  on alessandra {
    device /dev/drbd0;
    disk /dev/hda10;
    address 192.168.1.245:7788;
    meta-disk internal;
  }
  on lalla {
    device /dev/drbd0;
    disk /dev/hda9;
    address 192.168.1.15:7788;
    meta-disk internal;
  }
}
resource "r1" {
  protocol C;
  startup {
    wfc-timeout 60;          ## Infinite!
    degr-wfc-timeout 120;    ## 2 minutes.
  }
  disk {
    on-io-error detach;
  }
  net {
  }
  syncer {
    rate 100M;
  }
  on alessandra {
    device /dev/drbd1;
    disk /dev/hda11;
    address 192.168.1.245:7789;
    meta-disk internal;
  }
  on lalla {
    device /dev/drbd1;
    disk /dev/hda10;
    address 192.168.1.15:7789;
    meta-disk internal;
  }
}

================================================================================
ha.cf

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
udpport 694
ucast eth1 192.168.2.246 (on alessandra 192.168.2.245 on lalla)
auto_failback on
node alessandra
node lalla

================================================================================
authkeys

auth 2
2 sha1 unabellapasswordlungaecomplicata

================================================================================
haresources

alessandra IPaddr::192.168.2.247/24/eth1 \
  drbddisk::r0 Filesystem::/dev/drbd0::/cache::ext3::defaults \
  drbddisk::r1 Filesystem::/dev/drbd1::/jumper::ext3::defaults \
  bind9 samba winbind bla bla bla \
  MailTo::mario.guenzi at oceano.lan::Transizione_alessandra_lalla

================================================================================

The drbd devices are listed in /etc/fstab with the noauto option.

If I shut down the master, the slave takes over all the resources without any
problem. The same happens on a reboot, and when I run
/etc/init.d/heartbeat stop and then start.

But if I unplug the network cable to simulate a fault, I find this in the
slave's syslog:

lalla:/#
May 30 09:32:16 lalla heartbeat: [3018]: WARN: node alessandra: is dead
May 30 09:32:16 lalla heartbeat: [3018]: WARN: No STONITH device configured.
May 30 09:32:16 lalla heartbeat: [3018]: WARN: Shared disks are not protected.
May 30 09:32:16 lalla heartbeat: [3018]: info: Resources being acquired from alessandra.
May 30 09:32:16 lalla heartbeat: [3018]: info: Link alessandra:eth1 dead.
May 30 09:32:16 lalla heartbeat: [8063]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
May 30 09:32:16 lalla harc[8063]: info: Running /etc/ha.d/rc.d/status status
May 30 09:32:16 lalla heartbeat: [8064]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys lalla] to acquire.
May 30 09:32:16 lalla heartbeat: [3018]: debug: StartNextRemoteRscReq(): child count 1
May 30 09:32:16 lalla mach_down[8092]: info: Taking over resource group IPaddr::192.168.2.247/24/eth1
May 30 09:32:16 lalla ResourceManager[8118]: info: Acquiring resource group: alessandra IPaddr::192.168.2.247/24/eth1 drbddisk::r0 Filesystem::/dev/drbd0::/cache::ext3::defaults drbddisk::r1 Filesystem::/dev/drbd1::/jumper::ext3::defaults bind9 MailTo::mario.guenzi at oceano.lan::Transizione_lalla_alessandra
May 30 09:32:16 lalla IPaddr[8145]: INFO: Resource is stopped
May 30 09:32:16 lalla ResourceManager[8118]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 start
May 30 09:32:16 lalla ResourceManager[8118]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 start
May 30 09:32:17 lalla IPaddr[8243]: INFO: Using calculated netmask for 192.168.2.247: 255.255.255.0
May 30 09:32:17 lalla IPaddr[8243]: DEBUG: Using calculated broadcast for 192.168.2.247: 192.168.2.255
May 30 09:32:17 lalla IPaddr[8243]: INFO: eval ifconfig eth1:0 192.168.2.247 netmask 255.255.255.0 broadcast 192.168.2.255
May 30 09:32:17 lalla IPaddr[8243]: DEBUG: Sending Gratuitous Arp for 192.168.2.247 on eth1:0 [eth1]
May 30 09:32:17 lalla IPaddr[8214]: INFO: Success
May 30 09:32:17 lalla ResourceManager[8118]: debug: /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 start done. RC=0
May 30 09:32:17 lalla ResourceManager[8118]: info: Running /etc/ha.d/resource.d/drbddisk r0 start
May 30 09:32:17 lalla ResourceManager[8118]: debug: Starting /etc/ha.d/resource.d/drbddisk r0 start
May 30 09:32:29 lalla ResourceManager[8118]: debug: /etc/ha.d/resource.d/drbddisk r0 start done. RC=1
May 30 09:32:29 lalla ResourceManager[8118]: ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
May 30 09:32:29 lalla ResourceManager[8118]: CRIT: Giving up resources due to failure of drbddisk::r0
May 30 09:32:29 lalla ResourceManager[8118]: info: Releasing resource group: alessandra IPaddr::192.168.2.247/24/eth1 drbddisk::r0 Filesystem::/dev/drbd0::/cache::ext3::defaults drbddisk::r1 Filesystem::/dev/drbd1::/jumper::ext3::defaults bind9 MailTo::mario.guenzi at oceano.lan::Transizione_lalla_alessandra
May 30 09:32:30 lalla ResourceManager[8118]: info: Running /etc/ha.d/resource.d/MailTo mario.guenzi at oceano.lan Transizione_lalla_alessandra stop
May 30 09:32:30 lalla ResourceManager[8118]: debug: Starting /etc/ha.d/resource.d/MailTo mario.guenzi at oceano.lan Transizione_lalla_alessandra stop
May 30 09:32:30 lalla MailTo[8431]: INFO: Success
May 30 09:32:30 lalla ResourceManager[8118]: debug: /etc/ha.d/resource.d/MailTo mario.guenzi at oceano.lan Transizione_lalla_alessandra stop done. RC=0
May 30 09:32:30 lalla ResourceManager[8118]: info: Running /etc/init.d/bind9 stop
May 30 09:32:30 lalla ResourceManager[8118]: debug: Starting /etc/init.d/bind9 stop
May 30 09:32:30 lalla ResourceManager[8118]: debug: /etc/init.d/bind9 stop done. RC=0
May 30 09:32:30 lalla ResourceManager[8118]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /jumper ext3 defaults stop
May 30 09:32:30 lalla ResourceManager[8118]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd1 /jumper ext3 defaults stop
May 30 09:32:30 lalla Filesystem[8531]: INFO: Running stop for /dev/drbd1 on /jumper
May 30 09:32:31 lalla Filesystem[8520]: INFO: Success
May 30 09:32:31 lalla ResourceManager[8118]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd1 /jumper ext3 defaults stop done. RC=0
May 30 09:32:31 lalla ResourceManager[8118]: info: Running /etc/ha.d/resource.d/drbddisk r1 stop
May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting /etc/ha.d/resource.d/drbddisk r1 stop
May 30 09:32:31 lalla ResourceManager[8118]: debug: /etc/ha.d/resource.d/drbddisk r1 stop done. RC=0
May 30 09:32:31 lalla ResourceManager[8118]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /cache ext3 defaults stop
May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /cache ext3 defaults stop
May 30 09:32:31 lalla Filesystem[8641]: INFO: Running stop for /dev/drbd0 on /cache
May 30 09:32:31 lalla Filesystem[8630]: INFO: Success
May 30 09:32:31 lalla ResourceManager[8118]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /cache ext3 defaults stop done. RC=0
May 30 09:32:31 lalla ResourceManager[8118]: info: Running /etc/ha.d/resource.d/drbddisk r0 stop
May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting /etc/ha.d/resource.d/drbddisk r0 stop
May 30 09:32:31 lalla ResourceManager[8118]: debug: /etc/ha.d/resource.d/drbddisk r0 stop done. RC=0
May 30 09:32:31 lalla ResourceManager[8118]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 stop
May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 stop
May 30 09:32:32 lalla IPaddr[8769]: INFO: ifconfig eth1:0 down
May 30 09:32:32 lalla IPaddr[8740]: INFO: Success
May 30 09:32:32 lalla ResourceManager[8118]: debug: /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 stop done. RC=0
May 30 09:32:32 lalla mach_down[8092]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
May 30 09:32:32 lalla mach_down[8092]: info: mach_down takeover complete for node alessandra.
May 30 09:32:32 lalla heartbeat: [3018]: info: mach_down takeover complete.
May 30 09:33:02 lalla hb_standby[8823]: Going standby [foreign].
May 30 09:33:02 lalla heartbeat: [3018]: info: lalla wants to go standby [foreign]
May 30 09:33:13 lalla heartbeat: [3018]: WARN: No reply to standby request. Standby request cancelled.

and, obviously, nothing gets started.

Since drbddisk returns error 1
(lalla ResourceManager[8118]: ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk),
I searched Google and the drbd documentation but could not find anything.
Can someone please explain what I am doing wrong?

Thanks in advance and best regards,

- --
Mario Vittorio Guenzi
http://clark.tipistrani.it
Si vis pacem para bellum
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIRkRjm6qs1ZkNrIoRAmBUAJwIPD7g+blqWu4snVPef19goSbDyQCfZNvf
b4SsPj675oYC+VjkVorcHWo=
=KXDW
-----END PGP SIGNATURE-----
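
In a takeover log of this length, the one step that failed is easy to miss among the successful ones. The following is a minimal sketch, not part of heartbeat or drbd, that scans syslog text for ResourceManager "done. RC=N" lines and reports those with a non-zero return code. The function name `failed_steps` and the regular expression are assumptions for illustration, modelled only on the line format seen in the log above (one syslog entry per line).

```python
import re

# Assumed line format, taken from the log in this message, e.g.:
#   ... ResourceManager[8118]: debug: /etc/ha.d/resource.d/drbddisk r0 start done. RC=1
DONE_RE = re.compile(
    r"ResourceManager\[\d+\]: debug: "
    r"(?P<script>\S+) (?P<args>.*?) ?(?P<action>start|stop) done\. RC=(?P<rc>\d+)"
)


def failed_steps(log_text):
    """Return (script, args, action, rc) for each step whose RC is non-zero."""
    failures = []
    for line in log_text.splitlines():
        match = DONE_RE.search(line)
        if match and int(match.group("rc")) != 0:
            failures.append(
                (
                    match.group("script"),
                    match.group("args"),
                    match.group("action"),
                    int(match.group("rc")),
                )
            )
    return failures


if __name__ == "__main__":
    # Two entries copied from the log above: one success, one failure.
    sample = (
        "May 30 09:32:17 lalla ResourceManager[8118]: debug: "
        "/etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 start done. RC=0\n"
        "May 30 09:32:29 lalla ResourceManager[8118]: debug: "
        "/etc/ha.d/resource.d/drbddisk r0 start done. RC=1\n"
    )
    for script, args, action, rc in failed_steps(sample):
        print(f"{script} {args} {action} returned {rc}")
```

Run against the log above, this should single out the `drbddisk r0 start` step, which is the point where lalla gives up the resource group. It only locates the failing step; why drbddisk returned 1 still has to be read from the drbd messages in the kernel log.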