Hi,

I have a cluster with 3 nodes and 2 resource groups, which use 2 different ms drbd resources. The 3rd node plays the role of backup node for the other 2 nodes, so it is an N+1 cluster.

If I start up only the first and the second node, I get the error "Failure: (124) Device is attached to a disk (use detach first)", but the promote works and the resources are started. I only get that error in this specific scenario. If the third node is down and I restart the ms drbd resources, or restart the whole cluster with heartbeat stop/start, I don't receive that error.

Any ideas? Since the resources are started I shouldn't be worried. But I would like to know why I get the error; I don't like to see errors, even harmless ones, without knowing the reason.

Thanks,
Pavlos

Sep 23 07:58:20 node-01 lrmd: [3604]: info: rsc:ip_02:2: probe
Sep 23 07:58:20 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=4:1:7:7d69fd52-a69b-4f73-b325-60e3676d6068 op=ip_02_monitor_0 )
Sep 23 07:58:20 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=5:1:7:7d69fd52-a69b-4f73-b325-60e3676d6068 op=fs_02_monitor_0 )
Sep 23 07:58:20 node-01 lrmd: [3604]: info: rsc:fs_02:3: probe
Sep 23 07:58:20 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=6:1:7:7d69fd52-a69b-4f73-b325-60e3676d6068 op=pbx_02_monitor_0 )
Sep 23 07:58:20 node-01 lrmd: [3604]: info: rsc:pbx_02:4: probe
Sep 23 07:58:20 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=7:1:7:7d69fd52-a69b-4f73-b325-60e3676d6068 op=drbd_01:0_monitor_0 )
Sep 23 07:58:20 node-01 lrmd: [3604]: info: rsc:drbd_01:0:5: probe
Sep 23 07:58:20 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=8:1:7:7d69fd52-a69b-4f73-b325-60e3676d6068 op=drbd_02:0_monitor_0 )
Sep 23 07:58:20 node-01 cib: [3614]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-65.raw
Sep 23 07:58:20 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=9:1:7:7d69fd52-a69b-4f73-b325-60e3676d6068 op=ip_01_monitor_0 )
Sep 23 07:58:20 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=10:1:7:7d69fd52-a69b-4f73-b325-60e3676d6068 op=fs_01_monitor_0 )
Sep 23 07:58:20 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=11:1:7:7d69fd52-a69b-4f73-b325-60e3676d6068 op=pbx_01_monitor_0 )
Sep 23 07:58:21 node-01 cib: [3614]: info: write_cib_contents: Wrote version 0.299.0 of the CIB to disk (digest: 1d63099ba4a68abd3ae745a8c7ff3791)
Sep 23 07:58:21 node-01 crmd: [3607]: info: process_lrm_event: LRM operation pbx_02_monitor_0 (call=4, rc=7, cib-update=8, confirmed=true) not running
Sep 23 07:58:21 node-01 cib: [3614]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.0aWXrl (digest: /var/lib/heartbeat/crm/cib.bQ9Y5r)
Sep 23 07:58:21 node-01 Filesystem[3616]: [3655]: WARNING: Couldn't find device [/dev/drbd2]. Expected /dev/??? to exist
Sep 23 07:58:21 node-01 crmd: [3607]: info: process_lrm_event: LRM operation fs_02_monitor_0 (call=3, rc=7, cib-update=9, confirmed=true) not running
Sep 23 07:58:21 node-01 crm_attribute: [3697]: info: Invoked: crm_attribute -N node-01 -n master-drbd_01:0 -l reboot -D
Sep 23 07:58:21 node-01 attrd: [3606]: info: find_hash_entry: Creating hash entry for master-drbd_01:0
Sep 23 07:58:21 node-01 crmd: [3607]: info: process_lrm_event: LRM operation drbd_01:0_monitor_0 (call=5, rc=7, cib-update=10, confirmed=true) not running
Sep 23 07:58:21 node-01 crmd: [3607]: info: process_lrm_event: LRM operation ip_02_monitor_0 (call=2, rc=7, cib-update=11, confirmed=true) not running
Sep 23 07:58:22 node-01 attrd: [3606]: info: attrd_ha_callback: flush message from node-02
Sep 23 07:58:22 node-01 attrd: [3606]: info: find_hash_entry: Creating hash entry for probe_complete
Sep 23 07:58:22 node-01 lrmd: [3604]: info: rsc:drbd_02:0:6: probe
Sep 23 07:58:22 node-01 lrmd: [3604]: info: rsc:ip_01:7: probe
Sep 23 07:58:22 node-01 lrmd: [3604]: info: rsc:fs_01:8: probe
Sep 23 07:58:22 node-01 lrmd: [3604]: info: rsc:pbx_01:9: probe
Sep 23 07:58:22 node-01 Filesystem[3723]: [3760]: WARNING: Couldn't find device [/dev/drbd1]. Expected /dev/??? to exist
Sep 23 07:58:22 node-01 crmd: [3607]: info: process_lrm_event: LRM operation fs_01_monitor_0 (call=8, rc=7, cib-update=12, confirmed=true) not running
Sep 23 07:58:23 node-01 lrmd: [3604]: info: RA output: (drbd_02:0:probe:stderr) 'drbd_pbx_service_2' ignored, since this host (node-01) is not mentioned with an 'on' keyword.
Sep 23 07:58:23 node-01 crmd: [3607]: info: process_lrm_event: LRM operation drbd_02:0_monitor_0 (call=6, rc=7, cib-update=13, confirmed=true) not running
Sep 23 07:58:23 node-01 crmd: [3607]: info: process_lrm_event: LRM operation pbx_01_monitor_0 (call=9, rc=7, cib-update=14, confirmed=true) not running
Sep 23 07:58:23 node-01 crmd: [3607]: info: process_lrm_event: LRM operation ip_01_monitor_0 (call=7, rc=7, cib-update=15, confirmed=true) not running
Sep 23 07:58:25 node-01 attrd: [3606]: info: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: rsc:drbd_01:0:10: start
Sep 23 07:58:25 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=11:2:0:7d69fd52-a69b-4f73-b325-60e3676d6068 op=drbd_01:0_start_0 )
Sep 23 07:58:25 node-01 attrd: [3606]: info: attrd_perform_update: Sent update 11: probe_complete=true
Sep 23 07:58:25 node-01 attrd: [3606]: info: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Sep 23 07:58:25 node-01 attrd: [3606]: info: attrd_perform_update: Sent update 14: probe_complete=true
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stdout)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stdout) Found valid meta data in the expected location, 8587153408 bytes into /dev/sdd1.
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stdout)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) 1: Failure: (124) Device is attached to a disk (use detach first)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) Command '
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) drbdsetup
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) 1
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) disk
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) /dev/sdd1
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) /dev/sdd1
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) internal
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) --set-defaults
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) --create-device
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) --fencing=resource-only
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr)
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) --on-io-error=detach
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stderr) ' terminated with exit code 10
Sep 23 07:58:25 node-01 drbd[3810]: [3864]: ERROR: drbd_pbx_service_1: Called drbdadm -c /etc/drbd.conf up drbd_pbx_service_1
Sep 23 07:58:25 node-01 drbd[3810]: [3866]: ERROR: drbd_pbx_service_1: Exit code 1
Sep 23 07:58:25 node-01 drbd[3810]: [3868]: ERROR: drbd_pbx_service_1: Command output:
Sep 23 07:58:25 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stdout)
Sep 23 07:58:26 node-01 attrd: [3606]: info: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_01:0 (10000)
Sep 23 07:58:26 node-01 attrd: [3606]: info: attrd_perform_update: Sent update 17: master-drbd_01:0=10000
Sep 23 07:58:26 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:start:stdout)
Sep 23 07:58:26 node-01 crmd: [3607]: info: process_lrm_event: LRM operation drbd_01:0_start_0 (call=10, rc=0, cib-update=16, confirmed=true) ok
Sep 23 07:58:26 node-01 attrd: [3606]: info: attrd_ha_callback: flush message from node-02
Sep 23 07:58:26 node-01 attrd: [3606]: info: find_hash_entry: Creating hash entry for master-drbd_02:0
Sep 23 07:58:28 node-01 lrmd: [3604]: info: rsc:drbd_01:0:11: notify
Sep 23 07:58:28 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=94:2:0:7d69fd52-a69b-4f73-b325-60e3676d6068 op=drbd_01:0_notify_0 )
Sep 23 07:58:28 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:notify:stdout)
Sep 23 07:58:28 node-01 crmd: [3607]: info: process_lrm_event: LRM operation drbd_01:0_notify_0 (call=11, rc=0, cib-update=17, confirmed=true) ok
Sep 23 07:58:31 node-01 lrmd: [3604]: info: rsc:drbd_01:0:12: notify
Sep 23 07:58:31 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=99:3:0:7d69fd52-a69b-4f73-b325-60e3676d6068 op=drbd_01:0_notify_0 )
Sep 23 07:58:31 node-01 crmd: [3607]: info: process_lrm_event: LRM operation drbd_01:0_notify_0 (call=12, rc=0, cib-update=18, confirmed=true) ok
Sep 23 07:58:33 node-01 lrmd: [3604]: info: rsc:drbd_01:0:13: promote
Sep 23 07:58:33 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=17:3:0:7d69fd52-a69b-4f73-b325-60e3676d6068 op=drbd_01:0_promote_0 )
Sep 23 07:58:33 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:promote:stdout)
Sep 23 07:58:33 node-01 crmd: [3607]: info: process_lrm_event: LRM operation drbd_01:0_promote_0 (call=13, rc=0, cib-update=19, confirmed=true) ok
Sep 23 07:58:36 node-01 lrmd: [3604]: info: rsc:drbd_01:0:14: notify
Sep 23 07:58:36 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=100:3:0:7d69fd52-a69b-4f73-b325-60e3676d6068 op=drbd_01:0_notify_0 )
Sep 23 07:58:36 node-01 lrmd: [3604]: info: RA output: (drbd_01:0:notify:stdout)
Sep 23 07:58:36 node-01 crmd: [3607]: info: process_lrm_event: LRM operation drbd_01:0_notify_0 (call=14, rc=0, cib-update=20, confirmed=true) ok
Sep 23 07:58:38 node-01 lrmd: [3604]: info: rsc:ip_01:15: start
Sep 23 07:58:38 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=70:3:0:7d69fd52-a69b-4f73-b325-60e3676d6068 op=ip_01_start_0 )
Sep 23 07:58:38 node-01 IPaddr2[3991]: [4023]: INFO: ip -f inet addr add 10.10.10.10/25 brd 10.10.10.127 dev eth0
Sep 23 07:58:38 node-01 IPaddr2[3991]: [4026]: INFO: ip link set eth0 up
Sep 23 07:58:38 node-01 IPaddr2[3991]: [4029]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-10.10.10.10 eth0 10.10.10.10 auto not_used not_used
Sep 23 07:58:38 node-01 crmd: [3607]: info: process_lrm_event: LRM operation ip_01_start_0 (call=15, rc=0, cib-update=21, confirmed=true) ok
Sep 23 07:58:40 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=71:3:0:7d69fd52-a69b-4f73-b325-60e3676d6068 op=ip_01_monitor_5000 )
Sep 23 07:58:40 node-01 lrmd: [3604]: info: rsc:ip_01:16: monitor
Sep 23 07:58:40 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=72:3:0:7d69fd52-a69b-4f73-b325-60e3676d6068 op=fs_01_start_0 )
Sep 23 07:58:40 node-01 lrmd: [3604]: info: rsc:fs_01:17: start
Sep 23 07:58:40 node-01 Filesystem[4038]: [4098]: INFO: Running start for /dev/drbd1 on /pbx_service_01
Sep 23 07:58:40 node-01 crmd: [3607]: info: process_lrm_event: LRM operation ip_01_monitor_5000 (call=16, rc=0, cib-update=22, confirmed=false) ok
Sep 23 07:58:40 node-01 crmd: [3607]: info: process_lrm_event: LRM operation fs_01_start_0 (call=17, rc=0, cib-update=23, confirmed=true) ok
Sep 23 07:58:42 node-01 lrmd: [3604]: info: RA output: (ip_01:start:stderr) ARPING 10.10.10.10 from 10.10.10.10 eth0 Sent 5 probes (5 broadcast(s)) Received 0 response(s)
Sep 23 07:58:44 node-01 lrmd: [3604]: info: rsc:fs_01:18: monitor
Sep 23 07:58:44 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=73:3:0:7d69fd52-a69b-4f73-b325-60e3676d6068 op=fs_01_monitor_20000 )
Sep 23 07:58:44 node-01 lrmd: [3604]: info: rsc:pbx_01:19: start
Sep 23 07:58:44 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=74:3:0:7d69fd52-a69b-4f73-b325-60e3676d6068 op=pbx_01_start_0 )
Sep 23 07:58:44 node-01 crmd: [3607]: info: process_lrm_event: LRM operation pbx_01_start_0 (call=19, rc=0, cib-update=24, confirmed=true) ok
Sep 23 07:58:44 node-01 crmd: [3607]: info: process_lrm_event: LRM operation fs_01_monitor_20000 (call=18, rc=0, cib-update=25, confirmed=false) ok
Sep 23 07:58:46 node-01 lrmd: [3604]: info: rsc:pbx_01:20: monitor
Sep 23 07:58:46 node-01 crmd: [3607]: info: do_lrm_rsc_op: Performing key=75:3:0:7d69fd52-a69b-4f73-b325-60e3676d6068 op=pbx_01_monitor_20000 )
Sep 23 07:58:46 node-01 crmd: [3607]: info: process_lrm_event: LRM operation pbx_01_monitor_20000 (call=20, rc=0, cib-update=26, confirmed=false) ok

[root@node-02 ~]# crm configure show
node $id="b8ad13a6-8a6e-4304-a4a1-8f69fa735100" node-02
node $id="d5557037-cf8f-49b7-95f5-c264927a0c76" node-01
node $id="e5195d6b-ed14-4bb3-92d3-9105543f9251" node-03
primitive drbd_01 ocf:linbit:drbd \
        params drbd_resource="drbd_pbx_service_1" \
        op monitor interval="30s"
primitive drbd_02 ocf:linbit:drbd \
        params drbd_resource="drbd_pbx_service_2" \
        op monitor interval="30s"
primitive fs_01 ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/pbx_service_01" fstype="ext3" migration-threshold="3" failure-timeout="60" \
        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20"
primitive fs_02 ocf:heartbeat:Filesystem \
        params device="/dev/drbd2" directory="/pbx_service_02" fstype="ext3"
primitive ip_01 ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.10" cidr_netmask="25" broadcast="10.10.10.127" failure-timeout="120" migration-threshold="3" \
        op monitor interval="5s"
primitive ip_02 ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.11" cidr_netmask="25" broadcast="10.10.10.127" \
        op monitor interval="5s"
primitive pbx_01 ocf:heartbeat:Dummy \
        params state="/pbx_service_01/Dummy.state" failure-timeout="60" migration-threshold="3" \
        op monitor interval="20s" timeout="40s"
primitive pbx_02 ocf:heartbeat:Dummy \
        params state="/pbx_service_02/Dummy.state"
group pbx_service_01 ip_01 fs_01 pbx_01 \
        meta target-role="Started"
group pbx_service_02 ip_02 fs_02 pbx_02 \
        meta target-role="Started"
ms ms-drbd_01 drbd_01 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
ms ms-drbd_02 drbd_02 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
location PrimaryNode-drbd_01 ms-drbd_01 100: node-01
location PrimaryNode-drbd_02 ms-drbd_02 100: node-02
location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01
location PrimaryNode-pbx_service_02 pbx_service_02 200: node-02
location SecondaryNode-drbd_01 ms-drbd_01 0: node-03
location SecondaryNode-drbd_02 ms-drbd_02 0: node-03
location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03
location SecondaryNode-pbx_service_02 pbx_service_02 10: node-03
colocation fs_01-on-drbd_01 inf: pbx_service_01 ms-drbd_01:Master
colocation fs_02-on-drbd_02 inf: fs_02 ms-drbd_02:Master
colocation pbx_01-with-fs_01 inf: pbx_01 fs_01
colocation pbx_01-with-ip_01 inf: pbx_01 ip_01
colocation pbx_02-with-fs_02 inf: pbx_02 fs_02
colocation pbx_02-with-ip_02 inf: pbx_02 ip_02
order fs_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start
order fs_02-after-drbd_02 inf: ms-drbd_02:promote fs_02:start
order pbx_01-after-fs_01 inf: fs_01 pbx_01
order pbx_01-after-ip_01 inf: ip_01 pbx_01
order pbx_02-after-fs_02 inf: fs_02 pbx_02
order pbx_02-after-ip_02 inf: ip_02 pbx_02
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        symmetric-cluster="false"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1000"

[root@node-02 ~]# cat /etc/drbd.conf
###common an all nodes
#
# please have a a look at the example configuration file in
# /usr/share/doc/drbd83/drbd.conf
#
global { usage-count yes; }
common {
  protocol C;
  syncer {
    csums-alg sha1;
    verify-alg sha1;
    rate 10M;
  }
  net {
    data-integrity-alg sha1;
    max-buffers 20480;
    max-epoch-size 16384;
  }
  disk {
    on-io-error detach;
    ### Only when DRBD is under cluster ###
    fencing resource-only;
    ### --- ###
  }
  startup {
    wfc-timeout 60;
    degr-wfc-timeout 30;
    outdated-wfc-timeout 15;
  }
  ### Only when DRBD is under cluster ###
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  ### --- ###
}
resource drbd_pbx_service_1 {
  on node-01 {
    device /dev/drbd1;
    disk /dev/sdd1;
    address 10.10.10.129:7789;
    meta-disk internal;
  }
  on node-03 {
    device /dev/drbd1;
    disk /dev/sdd1;
    address 10.10.10.131:7789;
    meta-disk internal;
  }
}
resource drbd_pbx_service_2 {
  on node-02 {
    device /dev/drbd2;
    disk /dev/sdb1;
    address 10.10.10.130:7790;
    meta-disk internal;
  }
  on node-03 {
    device /dev/drbd2;
    disk /dev/sdc1;
    address 10.10.10.131:7790;
    meta-disk internal;
  }
}
[root@node-02 ~]#
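
A note for reading the log: lrmd records the resource agent's stderr in small fragments, so the failing drbdsetup call for drbd_01:0 is spread over many separate entries. The following is only a minimal sketch of a helper that stitches those fragments back into one line. It assumes the log is available with one syslog entry per line; the function name ra_stderr and the file name ha.log are made up for illustration and are not part of heartbeat, pacemaker, or DRBD.

```shell
#!/bin/sh
# ra_stderr (hypothetical helper): join the stderr fragments that lrmd logged
# for one resource operation into a single line.
# Usage (ha.log is an assumed file name): ra_stderr drbd_01:0:start < ha.log
ra_stderr() {
    # $1 is the "resource:operation" part of the tag, e.g. drbd_01:0:start
    awk -v tag="($1:stderr)" '
        {
            i = index($0, tag)             # literal match, no regex escaping needed
            if (i == 0) next               # not a stderr entry for this operation
            frag = substr($0, i + length(tag))
            sub(/^ /, "", frag)            # drop the single separator space
            if (frag != "")                # skip the empty filler entries
                out = out (out == "" ? "" : " ") frag
        }
        END { print out }
    '
}
```

Run against the drbd_01:0 start entries in the log, this should yield the error message followed by the command in readable form: drbdsetup 1 disk /dev/sdd1 /dev/sdd1 internal --set-defaults --create-device --fencing=resource-only --on-io-error=detach, terminated with exit code 10.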