Dear Sir/Madam,
Can anyone help me diagnose what is wrong with my configuration? I
cannot get HA (Heartbeat) to work as expected.
I tested with a single DRBD partition and it worked fine with HA.
Now I am using two DRBD partitions. If I bring up DRBD and mount the
devices manually, everything works, but when HA is supposed to bring
them up automatically, it does not work as I want. It fails to mount
drbd1, and then everything else fails.
I have included my drbd.conf, ha.cf, haresources and the ha-debug log
in this email, because I am not sure how else to describe the problem.
Help?
From
Cindy
-----------------------------------------------
heartbeat[8043]: 2009/03/24_09:19:34 info: **************************
heartbeat[8043]: 2009/03/24_09:19:34 info: Configuration validated.
Starting heartbeat 2.1.3
heartbeat[8044]: 2009/03/24_09:19:34 info: heartbeat: version 2.1.3
heartbeat[8044]: 2009/03/24_09:19:34 info: Heartbeat generation: 1237792620
heartbeat[8044]: 2009/03/24_09:19:34 info: glib: UDP Broadcast heartbeat
started on port 694 (694) interface eth0
heartbeat[8044]: 2009/03/24_09:19:34 info: glib: UDP Broadcast heartbeat
closed on port 694 interface eth0 - Status: 1
heartbeat[8044]: 2009/03/24_09:19:34 info: G_main_add_TriggerHandler:
Added signal manual handler
heartbeat[8044]: 2009/03/24_09:19:34 info: G_main_add_TriggerHandler:
Added signal manual handler
heartbeat[8044]: 2009/03/24_09:19:34 info: G_main_add_SignalHandler:
Added signal handler for signal 17
heartbeat[8044]: 2009/03/24_09:19:34 info: Local status now set to: 'up'
heartbeat[8044]: 2009/03/24_09:19:35 info: Link f10-1:eth0 up.
heartbeat[8044]: 2009/03/24_09:20:24 info: Link f10-2:eth0 up.
heartbeat[8044]: 2009/03/24_09:20:24 info: Status update for node f10-2:
status up
heartbeat[8055]: 2009/03/24_09:20:24 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
heartbeat[8044]: 2009/03/24_09:20:24 debug: get_delnodelist: delnodelist=
harc[8055]: 2009/03/24_09:20:24 info: Running /etc/ha.d/rc.d/status
status
heartbeat[8044]: 2009/03/24_09:20:25 info: Comm_now_up(): updating
status to active
heartbeat[8044]: 2009/03/24_09:20:25 info: Local status now set to: 'active'
heartbeat[8044]: 2009/03/24_09:20:25 info: Status update for node f10-2:
status active
heartbeat[8072]: 2009/03/24_09:20:25 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
harc[8072]: 2009/03/24_09:20:25 info: Running /etc/ha.d/rc.d/status
status
heartbeat[8044]: 2009/03/24_09:20:35 info: local resource transition
completed.
heartbeat[8044]: 2009/03/24_09:20:35 info: Initial resource acquisition
complete (T_RESOURCES(us))
IPaddr[8127]: 2009/03/24_09:20:35 INFO: Resource is stopped
heartbeat[8091]: 2009/03/24_09:20:35 info: Local Resource acquisition
completed.
heartbeat[8044]: 2009/03/24_09:20:35 debug: StartNextRemoteRscReq():
child count 1
heartbeat[8166]: 2009/03/24_09:20:35 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
harc[8166]: 2009/03/24_09:20:35 info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[8166]: 2009/03/24_09:20:35 received ip-request-resp
172.16.3.100 OK yes
ResourceManager[8187]: 2009/03/24_09:20:35 info: Acquiring resource
group: f10-1 172.16.3.100 drbddisk::r0 drbddisk::r1
Filesystem::/dev/drbd0::/data::ext3 Filesystem::/dev/drbd1::/data2::ext3
heartbeat[8044]: 2009/03/24_09:20:35 info: remote resource transition
completed.
IPaddr[8214]: 2009/03/24_09:20:35 INFO: Resource is stopped
ResourceManager[8187]: 2009/03/24_09:20:35 info: Running
/etc/ha.d/resource.d/IPaddr 172.16.3.100 start
ResourceManager[8187]: 2009/03/24_09:20:35 debug: Starting
/etc/ha.d/resource.d/IPaddr 172.16.3.100 start
IPaddr[8290]: 2009/03/24_09:20:35 INFO: Using calculated nic for
172.16.3.100: eth1
IPaddr[8290]: 2009/03/24_09:20:36 INFO: Using calculated netmask for
172.16.3.100: 255.255.0.0
IPaddr[8290]: 2009/03/24_09:20:36 DEBUG: Using calculated broadcast
for 172.16.3.100: 172.16.255.255
IPaddr[8290]: 2009/03/24_09:20:36 INFO: eval ifconfig eth1:0
172.16.3.100 netmask 255.255.0.0 broadcast 172.16.255.255
IPaddr[8290]: 2009/03/24_09:20:36 DEBUG: Sending Gratuitous Arp for
172.16.3.100 on eth1:0 [eth1]
IPaddr[8273]: 2009/03/24_09:20:36 INFO: Success
INFO: Success
ResourceManager[8187]: 2009/03/24_09:20:36 debug:
/etc/ha.d/resource.d/IPaddr 172.16.3.100 start done. RC=0
Filesystem[8432]: 2009/03/24_09:20:36 INFO: Resource is stopped
ResourceManager[8187]: 2009/03/24_09:20:36 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start
ResourceManager[8187]: 2009/03/24_09:20:36 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start
Filesystem[8513]: 2009/03/24_09:20:36 INFO: Running start for
/dev/drbd0 on /data
Filesystem[8502]: 2009/03/24_09:20:36 INFO: Success
INFO: Success
ResourceManager[8187]: 2009/03/24_09:20:36 debug:
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start done. RC=0
Filesystem[8581]: 2009/03/24_09:20:36 INFO: Resource is stopped
*ResourceManager[8187]: 2009/03/24_09:20:36 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3
start
ResourceManager[8187]: 2009/03/24_09:20:36 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3
start
Filesystem[8662]: 2009/03/24_09:20:37 INFO: Running start for
/dev/drbd1 on /data2
Filesystem[8662]: 2009/03/24_09:20:37 ERROR: Couldn't find filesystem
ext3
in /proc/filesystems
Filesystem[8651]: 2009/03/24_09:20:37 ERROR: Illegal argument
ERROR: Illegal argument*
ResourceManager[8187]: 2009/03/24_09:20:37 debug:
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3
start done. RC=2
ResourceManager[8187]: 2009/03/24_09:20:37 ERROR: Return code 2 from
/etc/ha.d/resource.d/Filesystem
ResourceManager[8187]: 2009/03/24_09:20:37 CRIT: Giving up resources
due to failure of Filesystem::/dev/drbd1::/data2::ext3
ResourceManager[8187]: 2009/03/24_09:20:37 info: Releasing resource
group: f10-1 172.16.3.100 drbddisk::r0 drbddisk::r1
Filesystem::/dev/drbd0::/data::ext3 Filesystem::/dev/drbd1::/data2::ext3
ResourceManager[8187]: 2009/03/24_09:20:37 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3
stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3
stop
Filesystem[8765]: 2009/03/24_09:20:37 INFO: Running stop for
/dev/drbd1 on /data2
Filesystem[8754]: 2009/03/24_09:20:37 INFO: Success
INFO: Success
ResourceManager[8187]: 2009/03/24_09:20:37 debug:
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /data2 ext3
stop done. RC=0
ResourceManager[8187]: 2009/03/24_09:20:37 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 stop
Filesystem[8848]: 2009/03/24_09:20:37 INFO: Running stop for
/dev/drbd0 on /data
Filesystem[8848]: 2009/03/24_09:20:37 INFO: Trying to unmount /data
Filesystem[8848]: 2009/03/24_09:20:37 INFO: unmounted /data successfully
Filesystem[8837]: 2009/03/24_09:20:37 INFO: Success
INFO: Success
ResourceManager[8187]: 2009/03/24_09:20:37 debug:
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 stop done. RC=0
ResourceManager[8187]: 2009/03/24_09:20:37 info: Running
/etc/ha.d/resource.d/drbddisk r1 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: Starting
/etc/ha.d/resource.d/drbddisk r1 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug:
/etc/ha.d/resource.d/drbddisk r1 stop done. RC=0
ResourceManager[8187]: 2009/03/24_09:20:37 info: Running
/etc/ha.d/resource.d/drbddisk r0 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: Starting
/etc/ha.d/resource.d/drbddisk r0 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug:
/etc/ha.d/resource.d/drbddisk r0 stop done. RC=0
ResourceManager[8187]: 2009/03/24_09:20:37 info: Running
/etc/ha.d/resource.d/IPaddr 172.16.3.100 stop
ResourceManager[8187]: 2009/03/24_09:20:37 debug: Starting
/etc/ha.d/resource.d/IPaddr 172.16.3.100 stop
In IP Stop
SIOCDELRT: No such process
IPaddr[9012]: 2009/03/24_09:20:38 INFO: ifconfig eth1:0 down
IPaddr[8995]: 2009/03/24_09:20:38 INFO: Success
INFO: Success
ResourceManager[8187]: 2009/03/24_09:20:38 debug:
/etc/ha.d/resource.d/IPaddr 172.16.3.100 stop done. RC=0
hb_standby[9057]: 2009/03/24_09:21:08 Going standby [foreign].
heartbeat[8044]: 2009/03/24_09:21:08 info: f10-1 wants to go standby
[foreign]
heartbeat[8044]: 2009/03/24_09:21:08 info: standby: f10-2 can take our
foreign resources
heartbeat[9071]: 2009/03/24_09:21:08 info: give up foreign HA resources
(standby).
heartbeat[9071]: 2009/03/24_09:21:08 info: foreign HA resource release
completed (standby).
heartbeat[8044]: 2009/03/24_09:21:08 info: Local standby process
completed [foreign].
heartbeat[8044]: 2009/03/24_09:21:08 WARN: 1 lost packet(s) for [f10-2]
[36:38]
heartbeat[8044]: 2009/03/24_09:21:08 info: remote resource transition
completed.
heartbeat[8044]: 2009/03/24_09:21:08 info: No pkts missing from f10-2!
heartbeat[8044]: 2009/03/24_09:21:08 info: Other node completed standby
takeover of foreign resources.
---------------------------
[root@f10-1 ~]# cat /etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth0 # Linux
auto_failback on
node f10-1
node f10-2
---------------------------
[root@f10-1 ~]# cat /etc/ha.d/haresources
f10-1 172.16.3.100 drbddisk::r0 drbddisk::r1
Filesystem::/dev/drbd0::/data::ext3 Filesystem::/dev/drbd1::/data2::ext3
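
A possible first check, offered as a sketch rather than a confirmed
diagnosis: in the ha-debug log the Filesystem call for /dev/drbd0 is
logged as "ext3 start" on one line, while for /dev/drbd1 the output
breaks right after "ext3", both in the command line and in the
"Couldn't find filesystem ext3" error. One explanation that would fit
(an assumption, not something the log proves) is an invisible character
such as a carriage return at the end of the haresources line, which
would end up inside the last fstype argument. The helper below, with
the hypothetical name find_hidden_chars, lists the lines of a file that
contain a carriage return or end in a blank, and uses cat -v so that a
carriage return shows up as ^M.

```shell
#!/bin/sh
# Sketch only: find_hidden_chars is a hypothetical helper, not part of
# Heartbeat or DRBD. It prints every line of the given file that contains
# a carriage return or ends in a space or tab, prefixed with its line
# number. It returns 1 when no such line is found or the file is unreadable.
find_hidden_chars() {
    # A literal carriage return to use as a grep pattern.
    cr=$(printf '\r')
    # LC_ALL=C keeps the matching byte-oriented.
    hits=$(LC_ALL=C grep -n -e "$cr" -e '[[:blank:]]$' -- "$1") || return 1
    # cat -v renders a carriage return as ^M so it becomes visible.
    printf '%s\n' "$hits" | cat -v
}
```

It could be run on both nodes as
find_hidden_chars /etc/ha.d/haresources
where no output and a non-zero exit status only mean that this
particular check found nothing suspicious.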
---------------------------
[root@f10-1 ~]# cat /etc/drbd.conf
global {
  usage-count yes;
}
common {
  syncer { rate 100M; }
}
resource r0 {
  protocol C;
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
  startup {
    degr-wfc-timeout 120; # 2 minutes.
  }
  disk {
    on-io-error detach;
  }
  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  syncer {
    rate 1000M;
    al-extents 257;
  }
  on f10-1 {
    device /dev/drbd0;
    disk /dev/VolGroup00/LogVol03;
    address 10.0.0.1:7788;
    meta-disk /dev/VolGroup00/LogVol02[0];
  }
  on f10-2 {
    device /dev/drbd0;
    disk /dev/VolGroup00/LogVol03;
    address 10.0.0.2:7788;
    meta-disk /dev/VolGroup00/LogVol02[0];
  }
}
resource r1 {
  protocol C;
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
  startup {
    wfc-timeout 0; ## Infinite!
    degr-wfc-timeout 120; ## 2 minutes.
  }
  disk {
    on-io-error detach;
  }
  net {
    # timeout 60;
    # connect-int 10;
    # ping-int 10;
    # max-buffers 2048;
    # max-epoch-size 2048;
    # cram-hmac-alg "sha1";
    # shared-secret "FooFunFactory";
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  syncer {
    rate 1000M;
    al-extents 257;
  }
  device /dev/drbd1;
  disk /dev/VolGroup00/LogVol04;
  meta-disk /dev/VolGroup00/LogVol02[1];
  on f10-1 {
    address 10.0.0.1:7789;
  }
  on f10-2 {
    address 10.0.0.2:7789;
  }
}