Dear Sir/Madam,

Can anyone help me diagnose the problem with my configuration? I cannot get HA (heartbeat) to work as expected. I tested with one DRBD partition and HA worked fine. Now I am using two DRBD partitions. If I start DRBD and mount the devices manually, everything works, but when heartbeat is supposed to bring them up automatically, it does not work: it will not mount my drbd1, and then everything fails.

I have included my drbd.conf, ha.cf, haresources and ha-debug log in this email, because I am not sure how else to describe it. Help?

From Cindy

-----------------------------------------------
heartbeat[8043]: 2009/03/24_09:19:34 info: **************************
heartbeat[8043]: 2009/03/24_09:19:34 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[8044]: 2009/03/24_09:19:34 info: heartbeat: version 2.1.3
heartbeat[8044]: 2009/03/24_09:19:34 info: Heartbeat generation: 1237792620
heartbeat[8044]: 2009/03/24_09:19:34 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat[8044]: 2009/03/24_09:19:34 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
heartbeat[8044]: 2009/03/24_09:19:34 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[8044]: 2009/03/24_09:19:34 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[8044]: 2009/03/24_09:19:34 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[8044]: 2009/03/24_09:19:34 info: Local status now set to: 'up'
heartbeat[8044]: 2009/03/24_09:19:35 info: Link f10-1:eth0 up.
heartbeat[8044]: 2009/03/24_09:20:24 info: Link f10-2:eth0 up.
heartbeat[8044]: 2009/03/24_09:20:24 info: Status update for node f10-2: status up
heartbeat[8055]: 2009/03/24_09:20:24 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
heartbeat[8044]: 2009/03/24_09:20:24 debug: get_delnodelist: delnodelist=
harc[8055]: 2009/03/24_09:20:24 info: Running /etc/ha.d/rc.d/status status
heartbeat[8044]: 2009/03/24_09:20:25 info: Comm_now_up(): updating status to active
heartbeat[8044]: 2009/03/24_09:20:25 info: Local status now set to: 'active'
heartbeat[8044]: 2009/03/24_09:20:25 info: Status update for node f10-2: status active
heartbeat[8072]: 2009/03/24_09:20:25 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[8072]: 2009/03/24_09:20:25 info: Running /etc/ha.d/rc.d/status status
heartbeat[8044]: 2009/03/24_09:20:35 info: local resource transition completed.
heartbeat[8044]: 2009/03/24_09:20:35 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[8127]: 2009/03/24_09:20:35 INFO: Resource is stopped
heartbeat[8091]: 2009/03/24_09:20:35 info: Local Resource acquisition completed.
heartbeat[8044]: 2009/03/24_09:20:35 debug: StartNextRemoteRscReq(): child count 1
heartbeat[8166]: 2009/03/24_09:20:35 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[8166]: 2009/03/24_09:20:35 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[8166]: 2009/03/24_09:20:35 received ip-request-resp 172.16.3.100 OK yes
ResourceManager[8187]: 2009/03/24_09:20:35 info: Acquiring resource group: f10-1 172.16.3.100 drbddisk::r0 drbddisk::r1 Filesystem::/dev/drbd0::/data::ext3 Filesystem::/dev/drbd1::/data2::ext3
heartbeat[8044]: 2009/03/24_09:20:35 info: remote resource transition completed.
IPaddr[8214]: 2009/03/24_09:20:35 INFO: Resource is stopped
ResourceManager[8187]: 2009/03/24_09:20:35 info: Running /etc/ha.d/resource.d/IPaddr 172.16.3.100 start
ResourceManager[8187]: 2009/03/24_09:20:35 debug: Starting /etc/ha.d/resource.d/IPaddr 172.16.3.100 start
IPaddr[8290]: 2009/03/24_09:20:35 INFO: Using calculated nic for 172.16.3.100: eth1
IPaddr[8290]: 2009/03/24_09:20:36 INFO: Using calculated netmask for 172.16.3.100: 255.255.0.0
IPaddr[8290]: 2009/03/24_09:20:36 DEBUG: Using calculated broadcast for 172.16.3.100: 172.16.255.255
IPaddr[8290]: 2009/03/24_09:20:36 INFO: eval ifconfig eth1:0 172.16.3.100 netmask 255.255.0.0 broadcast 172.16.255.255
IPaddr[8290]: 2009/03/24_09:20:36 DEBUG: Sending Gratuitous Arp for 172.16.3.100 on eth1:0 [eth1]
IPaddr[8273]: 2009/03/24_09:20:36 INFO: Success
INFO: Success
ResourceManager[8187]: 2009/03/24_09:20:36 debug: /etc/ha.d/resource.d/IPaddr 172.16.3.100 start done. RC=0
Filesystem[8432]: 2009/03/24_09:20:36 INFO: Resource is stopped
ResourceManager[8187]: 2009/03/24_09:20:36 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start
ResourceManager[8187]: 2009/03/24_09:20:36 debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start
Filesystem[8513]: 2009/03/24_09:20:36 INFO: Running start for /dev/drbd0 on /data
Filesystem[8502]: 2009/03/24_09:20:36 INFO: Success
INFO: Success
ResourceManager[8187]: 2009/03/24_09:20:36 debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start done. RC=0
Filesystem[8581]: 2009/03/24_09:20:36 INFO: Resource is stopped
*ResourceManager[8187]: 2009/03/24_09:20:36 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3 start
ResourceManager[8187]: 2009/03/24_09:20:36 debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3 start
Filesystem[8662]: 2009/03/24_09:20:37 INFO: Running start for /dev/drbd1 on /data2
Filesystem[8662]: 2009/03/24_09:20:37 ERROR: Couldn't find filesystem ext3 in /proc/filesystems
Filesystem[8651]: 2009/03/24_09:20:37 ERROR: Illegal argument
ERROR: Illegal argument*
ResourceManager[8187]: 2009/03/24_09:20:37 debug: /etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3 start done. RC=2
ResourceManager[8187]: 2009/03/24_09:20:37 ERROR: Return code 2 from /etc/ha.d/resource.d/Filesystem
ResourceManager[8187]: 2009/03/24_09:20:37 CRIT: Giving up resources due to failure of Filesystem::/dev/drbd1::/data2::ext3
ResourceManager[8187]: 2009/03/24_09:20:37 info: Releasing resource group: f10-1 172.16.3.100 drbddisk::r0 drbddisk::r1 Filesystem::/dev/drbd0::/data::ext3 Filesystem::/dev/drbd1::/data2::ext3
ResourceManager[8187]: 2009/03/24_09:20:37 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3 stop
Filesystem[8765]: 2009/03/24_09:20:37 INFO: Running stop for /dev/drbd1 on /data2
Filesystem[8754]: 2009/03/24_09:20:37 INFO: Success
INFO: Success
ResourceManager[8187]: 2009/03/24_09:20:37 debug: /etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3 stop done. RC=0
ResourceManager[8187]: 2009/03/24_09:20:37 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 stop
Filesystem[8848]: 2009/03/24_09:20:37 INFO: Running stop for /dev/drbd0 on /data
Filesystem[8848]: 2009/03/24_09:20:37 INFO: Trying to unmount /data
Filesystem[8848]: 2009/03/24_09:20:37 INFO: unmounted /data successfully
Filesystem[8837]: 2009/03/24_09:20:37 INFO: Success
INFO: Success
ResourceManager[8187]: 2009/03/24_09:20:37 debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 stop done. RC=0
ResourceManager[8187]: 2009/03/24_09:20:37 info: Running /etc/ha.d/resource.d/drbddisk r1 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: Starting /etc/ha.d/resource.d/drbddisk r1 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: /etc/ha.d/resource.d/drbddisk r1 stop done. RC=0
ResourceManager[8187]: 2009/03/24_09:20:37 info: Running /etc/ha.d/resource.d/drbddisk r0 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: Starting /etc/ha.d/resource.d/drbddisk r0 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: /etc/ha.d/resource.d/drbddisk r0 stop done. RC=0
ResourceManager[8187]: 2009/03/24_09:20:37 info: Running /etc/ha.d/resource.d/IPaddr 172.16.3.100 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: Starting /etc/ha.d/resource.d/IPaddr 172.16.3.100 stop
In IP Stop
SIOCDELRT: No such process
IPaddr[9012]: 2009/03/24_09:20:38 INFO: ifconfig eth1:0 down
IPaddr[8995]: 2009/03/24_09:20:38 INFO: Success
INFO: Success
ResourceManager[8187]: 2009/03/24_09:20:38 debug: /etc/ha.d/resource.d/IPaddr 172.16.3.100 stop done. RC=0
hb_standby[9057]: 2009/03/24_09:21:08 Going standby [foreign].
heartbeat[8044]: 2009/03/24_09:21:08 info: f10-1 wants to go standby [foreign]
heartbeat[8044]: 2009/03/24_09:21:08 info: standby: f10-2 can take our foreign resources
heartbeat[9071]: 2009/03/24_09:21:08 info: give up foreign HA resources (standby).
heartbeat[9071]: 2009/03/24_09:21:08 info: foreign HA resource release completed (standby).
heartbeat[8044]: 2009/03/24_09:21:08 info: Local standby process completed [foreign].
heartbeat[8044]: 2009/03/24_09:21:08 WARN: 1 lost packet(s) for [f10-2] [36:38]
heartbeat[8044]: 2009/03/24_09:21:08 info: remote resource transition completed.
heartbeat[8044]: 2009/03/24_09:21:08 info: No pkts missing from f10-2!
heartbeat[8044]: 2009/03/24_09:21:08 info: Other node completed standby takeover of foreign resources.

---------------------------
[root@f10-1 ~]# cat /etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth0 # Linux
auto_failback on
node f10-1
node f10-2

---------------------------
[root@f10-1 ~]# cat /etc/ha.d/haresources
f10-1 172.16.3.100 drbddisk::r0 drbddisk::r1 Filesystem::/dev/drbd0::/data::ext3 Filesystem::/dev/drbd1::/data2::ext3

---------------------------
[root@f10-1 ~]# cat /etc/drbd.conf
global {
    usage-count yes;
}
common {
    syncer { rate 100M; }
}
resource r0 {
    protocol C;
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }
    startup {
        degr-wfc-timeout 120; # 2 minutes.
    }
    disk {
        on-io-error detach;
    }
    net {
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 1000M;
        al-extents 257;
    }
    on f10-1 {
        device /dev/drbd0;
        disk /dev/VolGroup00/LogVol03;
        address 10.0.0.1:7788;
        meta-disk /dev/VolGroup00/LogVol02[0];
    }
    on f10-2 {
        device /dev/drbd0;
        disk /dev/VolGroup00/LogVol03;
        address 10.0.0.2:7788;
        meta-disk /dev/VolGroup00/LogVol02[0];
    }
}
resource r1 {
    protocol C;
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }
    startup {
        wfc-timeout 0; ## Infinite!
        degr-wfc-timeout 120; ## 2 minutes.
    }
    disk {
        on-io-error detach;
    }
    net {
        # timeout 60;
        # connect-int 10;
        # ping-int 10;
        # max-buffers 2048;
        # max-epoch-size 2048;
        # cram-hmac-alg "sha1";
        # shared-secret "FooFunFactory";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 1000M;
        al-extents 257;
    }
    device /dev/drbd1;
    disk /dev/VolGroup00/LogVol04;
    meta-disk /dev/VolGroup00/LogVol02[1];
    on f10-1 {
        address 10.0.0.1:7789;
    }
    on f10-2 {
        address 10.0.0.2:7789;
    }
}
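
With a ha-debug log as long as the one in this message, it can help to pull out only the resource script calls that returned a non-zero code. The following is a rough sketch, not a heartbeat tool: the regular expression is inferred from the "done. RC=" debug lines ResourceManager writes in this log, and the names failed_calls and SAMPLE are made up for the illustration. The two sample lines are copied from the log in this message.

```python
import re

# Pattern inferred from ResourceManager debug lines in the log above, e.g.
#   ResourceManager[8187]: 2009/03/24_09:20:37 debug: <command> done. RC=2
# This is an assumption about the format, not a documented heartbeat interface.
DONE_RE = re.compile(
    r"ResourceManager\[\d+\]:\s+(?P<ts>\S+)\s+debug:\s+"
    r"(?P<cmd>.+?)\s+done\.\s+RC=(?P<rc>\d+)"
)


def failed_calls(log_text):
    """Return (timestamp, command, rc) for each resource script call with RC != 0."""
    return [
        (m.group("ts"), m.group("cmd"), int(m.group("rc")))
        for m in DONE_RE.finditer(log_text)
        if int(m.group("rc")) != 0
    ]


# Two lines taken from the log in this message (hypothetical variable name).
SAMPLE = """\
ResourceManager[8187]: 2009/03/24_09:20:36 debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start done. RC=0
ResourceManager[8187]: 2009/03/24_09:20:37 debug: /etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3 start done. RC=2
"""

if __name__ == "__main__":
    for ts, cmd, rc in failed_calls(SAMPLE):
        print(f"{ts}  RC={rc}  {cmd}")
```

Run against the sample, this reports only the Filesystem start for /dev/drbd1 on /data2 with RC=2, which is the call heartbeat gives up on in the log. To scan a real file, the text of /var/log/ha-debug (the debugfile path from ha.cf) could be read and passed to failed_calls instead of SAMPLE.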