Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
This looks like a drbd issue, you might have more luck on that list.

On Fri, Dec 3, 2010 at 4:15 PM, Dan Frincu <dfrincu at streamwide.ro> wrote:
> Hi,
>
> I don't know how to summarize what I've encountered, hence the rather lame
> subject. I'm running a 2-node HA setup on RHEL5U3, and I have done the
> following tests:
> - configure resources; everything runs on cluster1
> - hard reboot of both the active and the passive node (cluster1 and
>   cluster2) at the same time (echo b > /proc/sysrq-trigger)
> - after the reboot, checked the resources; all went fine, they started on
>   cluster1
> - hard rebooted the active node (cluster1)
> - the second node detects the failure of cluster1, but gets stuck when it
>   tries to start the drbd resources
>
> Dec 03 16:39:40 cluster2 crmd: [1699]: info: process_lrm_event: LRM operation ping_gw:0_monitor_0 (call=13, rc=7, cib-update=16, confirmed=true) not running
> Dec 03 16:39:41 cluster2 crm_attribute: [2585]: info: Invoked: crm_attribute -N cluster2 -n master-drbd_mysql:1 -l reboot -D
> Dec 03 16:39:41 cluster2 crm_attribute: [2584]: info: Invoked: crm_attribute -N cluster2 -n master-drbd_home:1 -l reboot -D
> Dec 03 16:39:41 cluster2 crm_attribute: [2583]: info: Invoked: crm_attribute -N cluster2 -n master-drbd_storage:1 -l reboot -D
> Dec 03 16:39:41 cluster2 attrd: [1697]: info: find_hash_entry: Creating hash entry for master-drbd_home:1
> Dec 03 16:39:41 cluster2 attrd: [1697]: info: find_hash_entry: Creating hash entry for master-drbd_storage:1
> Dec 03 16:39:41 cluster2 attrd: [1697]: info: find_hash_entry: Creating hash entry for master-drbd_mysql:1
> Dec 03 16:39:41 cluster2 crmd: [1699]: info: process_lrm_event: LRM operation drbd_storage:1_monitor_0 (call=12, rc=7, cib-update=17, confirmed=true) not running
> Dec 03 16:39:41 cluster2 crmd: [1699]: info: process_lrm_event: LRM operation drbd_home:1_monitor_0 (call=10, rc=7, cib-update=18, confirmed=true) not running
> Dec 03 16:39:41 cluster2 crmd: [1699]: info: process_lrm_event: LRM operation drbd_mysql:1_monitor_0 (call=11, rc=7, cib-update=19, confirmed=true) not running
> Dec 03 16:39:41 cluster2 attrd: [1697]: info: find_hash_entry: Creating hash entry for probe_complete
> Dec 03 16:39:41 cluster2 attrd: [1697]: info: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Dec 03 16:39:41 cluster2 attrd: [1697]: info: attrd_perform_update: Sent update 10: probe_complete=true
> Dec 03 16:39:41 cluster2 crmd: [1699]: info: do_lrm_rsc_op: Performing key=96:2:0:35c8ee9b-f99d-4d0a-9d16-41d119326713 op=ping_gw:0_start_0 )
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: rsc:ping_gw:0:14: start
> Dec 03 16:39:41 cluster2 crmd: [1699]: info: do_lrm_rsc_op: Performing key=14:2:0:35c8ee9b-f99d-4d0a-9d16-41d119326713 op=drbd_home:1_start_0 )
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: rsc:drbd_home:1:15: start
> Dec 03 16:39:41 cluster2 crmd: [1699]: info: do_lrm_rsc_op: Performing key=42:2:0:35c8ee9b-f99d-4d0a-9d16-41d119326713 op=drbd_mysql:1_start_0 )
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: rsc:drbd_mysql:1:16: start
> Dec 03 16:39:41 cluster2 crmd: [1699]: info: do_lrm_rsc_op: Performing key=70:2:0:35c8ee9b-f99d-4d0a-9d16-41d119326713 op=drbd_storage:1_start_0 )
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: rsc:drbd_storage:1:17: start
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:start:stdout)
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: RA output: (drbd_storage:1:start:stdout)
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: RA output: (drbd_home:1:start:stdout)
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: RA output: (drbd_storage:1:start:stdout)
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:start:stdout)
> Dec 03 16:39:41 cluster2 crm_attribute: [2746]: info: Invoked: crm_attribute -N cluster2 -n master-drbd_storage:1 -l reboot -v 5
> Dec 03 16:39:41 cluster2 attrd: [1697]: info: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_storage:1 (5)
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:start:stderr) 0: Failure: (124) Device is attached to a disk (use detach first)
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:start:stderr) Command 'drbdsetup 0 disk
> Dec 03 16:39:41 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:start:stderr) /dev/xvda5 /dev/xvda5 internal --set-defaults --create-device --fencing=resource-only --on-io-error=detach' terminated with exit code 10
>
> And then it continues with:
>
> Dec 03 16:40:42 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:promote:stderr) 0: State change failed: (-7) Refusing to be Primary while peer is not outdated
> Dec 03 16:40:42 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:promote:stderr) Command '
> Dec 03 16:40:42 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:promote:stderr) drbdsetup
> Dec 03 16:40:42 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:promote:stderr) 0
> Dec 03 16:40:42 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:promote:stderr) primary
> Dec 03 16:40:42 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:promote:stderr) ' terminated with exit code 11
> Dec 03 16:40:42 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:promote:stdout)
> Dec 03 16:40:50 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:promote:stderr) 0: State change failed: (-7) Refusing to be Primary while peer is not outdated
> Dec 03 16:40:50 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:promote:stderr) Command 'drbdsetup 0 primary
> Dec 03 16:40:50 cluster2 lrmd: [1696]: info: RA output: (drbd_mysql:1:promote:stderr) ' terminated with exit code 11
>
> Anyway, long story short: several minutes and hundreds of log messages later, the cluster settles into a working state.
>
> The RPMs I used are:
>
> # rpm -qa | grep -E '(corosync|drbd|pacemaker|resource-agents|openais)'
> corosynclib-1.3.0-1
> openaislib-1.1.3-1.6.el5
> drbd83-8.3.2-6.el5_3
> corosync-1.3.0-1
> pacemaker-libs-1.0.10-1.4.el5
> openais-1.1.3-1.6.el5
> pacemaker-1.0.10-1.4.el5
> kmod-drbd83-xen-8.3.2-6.el5_3
> resource-agents-1.0.3-2.6.el5
>
> Attached is an hb_report.
>
> I've never experienced this kind of behavior before; is this normal?
>
> Regards,
> Dan
>
> --
> Dan FRINCU
> Systems Engineer
> CCNA, RHCE
> Streamwide Romania
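
Two notes that may help while you take this to the drbd list. The "0: Failure: (124) Device is attached to a disk (use detach first)" / exit code 10 during the start op usually just means the backing device was still attached when the resource agent re-ran the attach step, so it tends to be harmless noise rather than a real failure. If you want to check or clear that state by hand, something like this should work (a sketch; I'm assuming the DRBD resource is named mysql, substitute your own resource names):

    # show the disk state of the resource (e.g. UpToDate, Diskless)
    drbdadm dstate mysql

    # detach and re-attach the backing device to re-run attach cleanly
    drbdadm detach mysql
    drbdadm attach mysql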
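
The "Refusing to be Primary while peer is not outdated" messages during promote are what DRBD's fencing policy is supposed to produce: with fencing set to resource-only, DRBD will not go Primary until its fence-peer handler has confirmed the peer as outdated, and after a hard node kill that can take a number of retries, which would explain the minutes of log traffic before the cluster settles. For the policy to resolve under Pacemaker, the handler pair needs to be wired up in drbd.conf, roughly like this (a sketch only; I'm assuming a resource named mysql and that your drbd83 build ships the crm-fence-peer.sh scripts, so check the paths on your system):

    resource mysql {
      disk {
        fencing resource-only;  # refuse promotion until the peer is outdated
      }
      handlers {
        # outdates the peer via a CIB constraint; the second script
        # removes the constraint again once the resync has finished
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }

If the handlers are missing or failing, promotion keeps being refused until DRBD can establish the peer's state some other way, which matches the behavior in your logs.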