Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Thank you so much! Adjusting the timeout of the peer-outdater was the
solution. Of course I did look at the logs myself, but I could not get a
clue out of them. Again, thanks a lot for your explanations.

Christoph

2008/9/18 Lars Ellenberg <lars.ellenberg at linbit.com>:
> On Thu, Sep 18, 2008 at 10:29:33AM +0200, Christoph Eßer wrote:
>> Oops, gmail incorrectly labeled this mail as sent to the Linux-HA
>> list. This is the right one. Sorry for the disturbance.
>
> and it is much too long...
>
>> I added a complete log for the reboot process on my slave machine.
>> Maybe this helps.
>
> you could have looked at those logs yourself, you know?
>
>> 2008/9/18 Christoph Eßer <cehser at gmail.com>:
>> > Sorry, I accidentally sent the message again to this list. Just ignore it.
>> >
>> > 2008/9/18 Christoph Eßer <cehser at gmail.com>:
>> >> Hi there,
>> >>
>> >> I first came up with this question on the Linux-HA mailing list. As they
>> >> think it seems to be a DRBD-related problem, I repeat it here:
>> >>
>> >> I am using Heartbeat 2.99, Pacemaker 0.6.6 and DRBD 8.0.13 on Debian Lenny.
>> >>
>> >> Everything works as expected if I just force the system to fail over by
>> >> unplugging the power supply of either of my two nodes. Shutting down the
>> >> DRBD slave via "init 0" or "init 6" works fine as well.
>> >>
>> >> But whenever I restart my DRBD master via "init 6", the whole cluster
>> >> crashes and fails to reassign the resources, even after the node has
>> >> started up again. Anyway, I found some strange DRBD-related messages in
>> >> /var/log/syslog on the second node while restarting the master node:
>
>> Sep 18 08:14:22 viktor-01 drbd[3863]: [3900]: DEBUG: r0 status: Secondary/Primary Secondary Primary Connected
>
> fine.
>
>> Sep 18 08:15:19 viktor-01 kernel: [ 274.894318] drbd0: peer( Primary -> Secondary )
>
> ok, peer is going secondary.
>
>> Sep 18 08:15:24 viktor-01 kernel: [ 279.772756] drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
>
> peer disconnected.
>
>> Sep 18 08:15:24 viktor-01 kernel: [ 279.772756] drbd0: conn( Unconnected -> WFConnection )
>
> we go wait for the peer to come back.
>
>> Sep 18 08:15:25 viktor-01 cib: [2279]: info: apply_xml_diff: Digest mis-match: expected 615bd497ee2d3cfdec1bc8611c9af264, calculated dd22ea8b07a5c626d0c03c021d1c3045
>
> dunno whether this is normal behaviour of heartbeat:
>
>> Sep 18 08:15:25 viktor-01 cib: [2279]: info: cib_process_diff: Diff 0.209.194 -> 0.209.195 not applied to 0.209.194: Failed application of a global update. Requesting full refresh.
>> Sep 18 08:15:25 viktor-01 cib: [2279]: info: cib_process_diff: Requesting re-sync from peer: Failed application of a global update. Requesting full refresh.
>> Sep 18 08:15:25 viktor-01 cib: [2279]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh
>> Sep 18 08:15:25 viktor-01 drbd[4021]: [4050]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf cstate r0
>> Sep 18 08:15:25 viktor-01 cib: [2279]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh
>> Sep 18 08:15:25 viktor-01 cib: [2279]: WARN: cib_process_diff: Not applying diff 0.209.195 -> 0.209.196 (sync in progress)
>> Sep 18 08:15:25 viktor-01 drbd[4021]: [4057]: DEBUG: r0: Exit code 0
>> Sep 18 08:15:25 viktor-01 cib: [2279]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh
>> Sep 18 08:15:25 viktor-01 drbd[4021]: [4058]: DEBUG: r0: Command output: WFConnection
>> Sep 18 08:15:25 viktor-01 cib: [2279]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh
>
>
>> Sep 18 08:15:28 viktor-01 crmd: [2283]: notice: crmd_client_status_callback: Status update: Client viktor-02/crmd now has status [offline]
>> Sep 18 08:15:28 viktor-01 crmd: [2283]: info: crm_update_peer_proc: viktor-02.crmd is now offline
>
> but finally the peer is offline from heartbeat's perspective as well.
>
>> Sep 18 08:15:28 viktor-01 crmd: [2283]: info: crmd_client_status_callback: Got client status callback - our DC is dead
>
> unfortunately it was the DC,
> so heartbeat elects a new DC.
>
>> Sep 18 08:15:28 viktor-01 crmd: [2283]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK origin=crmd_client_status_callback ]
>> Sep 18 08:15:28 viktor-01 crmd: [2283]: info: update_dc: Unset DC viktor-02
>
> ...
>
>> Sep 18 08:15:35 viktor-01 ccm: [2278]: info: Break tie for 2 nodes cluster
>> Sep 18 08:15:35 viktor-01 crmd: [2283]: info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
>
> ...
>
>> Sep 18 08:15:35 viktor-01 crmd: [2283]: info: update_dc: Set DC to viktor-01 (2.1)
>
> there. we are the DC.
>
>> Sep 18 08:15:36 viktor-01 crmd: [2283]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
>
> cool. I agree with myself.
>
>
> we start the resource stack:
>
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: WARN: native_color: Resource drbd:0 cannot run anywhere
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: info: master_color: Promoting drbd:1 (Slave viktor-01)
>
> ...
>
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: NoRoleChange: Promote drbd:1#011(Slave -> Master viktor-01)
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: RecurringOp: Start recurring monitor (29s) for drbd:1 on viktor-01
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: StartRsc: viktor-01#011Start extern
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: StartRsc: viktor-01#011Start eth3:0
> ...
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: StartRsc: viktor-01#011Start fs_mount
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: StartRsc: viktor-01#011Start Exim4
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: StartRsc: viktor-01#011Start Squid
>
> ...
>
>> Sep 18 08:15:37 viktor-01 crmd: [2283]: info: do_lrm_rsc_op: Performing op=drbd:1_promote_0 key=7:0:0:48972b3a-fc9c-405d-bf6c-1385730bb2b3)
>
>> Sep 18 08:15:38 viktor-01 drbd[4080]: [4117]: DEBUG: r0 status: Secondary/Unknown Secondary Unknown WFConnection
>> Sep 18 08:15:38 viktor-01 drbd[4080]: [4118]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf primary r0
>> Sep 18 08:15:38 viktor-01 kernel: [ 293.423767] drbd0: helper command: /sbin/drbdadm outdate-peer
>
> you configured drbd to use the outdate-peer handler.
> so it does.
>
>> Sep 18 08:15:38 viktor-01 drbd-peer-outdater: [4124]: debug: drbd peer: viktor-02
>> Sep 18 08:15:38 viktor-01 drbd-peer-outdater: [4124]: debug: drbd resource: r0
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Connecting channel
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Client outdater (0x9864a50) connected
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: invoked: outdater
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Processing msg from outdater
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Got message from (drbd-peer-outdater). (peer: viktor-02, res :r0)
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Starting node walk
>> Sep 18 08:15:39 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: node viktor-02 found
>> Sep 18 08:15:39 viktor-01 /usr/lib/heartbeat/dopd: [2284]: info: sending start_outdate message to the other node viktor-01 -> viktor-02
>> Sep 18 08:15:39 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: sending [start_outdate res: r0] to node: viktor-02
>> Sep 18 08:15:39 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Processed 1 messages
>
> meanwhile the peer rebooted?
>
>> Sep 18 08:15:41 viktor-01 kernel: [ 297.000380] e1000: eth2: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Sep 18 08:15:44 viktor-01 kernel: [ 300.008048] drbd0: connect failed, err = -115
>> Sep 18 08:15:54 viktor-01 kernel: [ 310.108051] drbd0: connect failed, err = -115
>
> but it has no drbd port open yet.
>
>> Sep 18 08:15:57 viktor-01 lrmd: [2280]: WARN: drbd:1:promote process (PID 4080) timed out (try 1). Killing with signal SIGTERM (15).
>
>
> the timeout on the promote is too short,
> compared to the timeout of drbd-peer-outdater.
> add "-t 5" to the drbd-peer-outdater command in drbd.conf,
> or increase the promote timeouts (see the sketch below).
>
>
>> Sep 18 08:15:57 viktor-01 lrmd: [2280]: WARN: operation promote[54] on ocf::drbd::drbd:1 for client 2283, its parameters: clone_max=[2] CRM_meta_role=[Master] drbdconf=[/etc/drbd.conf] CRM_meta_notify_start_resource=[drbd:0 drbd:1 ] CRM_meta_notify_promote_resource=[drbd:1 ] drbd_resource=[r0] CRM_meta_master_max=[1] CRM_meta_stateful=[true] CRM_meta_timeout=[20000] crm_feature_set=[2.1] CRM_meta_clone_max=[2] CRM_meta_master_node_max=[1] master_node_max=[1] master_max=[1] CRM_meta_notify_promote_uname=[viktor-01 ] CRM_meta_notify_start_uname=[viktor-: pid [4080] timed out
>> Sep 18 08:15:57 viktor-01 crmd: [2283]: ERROR: process_lrm_event: LRM operation drbd:1_promote_0 (54) Timed Out (timeout=20000ms)
>
> the default timeout for drbd-peer-outdater is 60 seconds.
> the promote timeout is set to 20 seconds.
>
> too bad.
> the "recovery action" of heartbeat for a failed promote is
> probably "stop", then "start", while drbd itself is still trying to
> become primary once that drbd-peer-outdater helper returns.
>
> hm, that's going to hurt.
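>
> the sketch promised above. untested, adjust to your own setup; "r0"
> and the handler path are taken from your logs. in drbd.conf:
>
>   resource r0 {
>     handlers {
>       # give up outdating the peer after 5 seconds,
>       # well within the 20 second promote timeout
>       outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
>     }
>   }
>
> or, the other way around, raise the promote timeout in the cib above
> the 60 second drbd-peer-outdater default, with something like
>
>   <op id="drbd-promote" name="promote" timeout="90s"/>
>
> in the operations of the drbd master/slave resource.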
>
> ... blabla ...
>
>> Sep 18 08:15:58 viktor-01 pengine: [4066]: info: master_color: Promoting drbd:1 (Master viktor-01)
>> Sep 18 08:15:58 viktor-01 pengine: [4066]: info: master_color: ms_drbd_ip: Promoted 1 instances of a possible 1 to master
>> Sep 18 08:15:58 viktor-01 pengine: [4066]: info: master_color: ms_drbd_ip: Promoted 1 instances of a possible 1 to master
>> Sep 18 08:15:58 viktor-01 pengine: [4066]: notice: NoRoleChange: Leave resource drbd:1#011(Master viktor-01)
>
> heartbeat seems to be confused.
> wants to promote the master? but it is already master?
>
>
> ok, so it tries to demote first:
>
>> Sep 18 08:15:58 viktor-01 lrmd: [2280]: info: rsc:drbd:1: demote
>> Sep 18 08:15:58 viktor-01 crmd: [2283]: info: do_lrm_rsc_op: Performing op=drbd:1_demote_0 key=6:1:0:48972b3a-fc9c-405d-bf6c-1385730bb2b3)
>
>> Sep 18 08:15:58 viktor-01 drbd[4137]: [4148]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf secondary r0
>> Sep 18 08:15:59 viktor-01 drbd[4137]: [4155]: DEBUG: r0: Exit code 0
>> Sep 18 08:15:59 viktor-01 drbd[4137]: [4156]: DEBUG: r0: Command output: No response from the DRBD driver! Is the module loaded? Command 'drbdsetup /dev/drbd0 secondary' terminated with exit code 20 drbdadm aborting
>
> huh? did not work? well, no, because the promotion is still pending...
> so what is it? exit code 20, or exit code 0?
>
>> Sep 18 08:15:59 viktor-01 lrmd: [2280]: info: RA output: (drbd:1:demote:stdout) No response from the DRBD driver! Is the module loaded? Command 'drbdsetup /dev/drbd0 secondary' terminated with exit code 20 drbdadm aborting
>
> sure it does not answer to role change requests,
> as there is a role change request still pending.
>
>> Sep 18 08:16:00 viktor-01 heartbeat: [2173]: WARN: node viktor-02: is dead
>> Sep 18 08:16:00 viktor-01 crmd: [2283]: notice: crmd_ha_status_callback: Status update: Node viktor-02 now has status [dead]
>> Sep 18 08:16:00 viktor-01 heartbeat: [2173]: info: Link viktor-02:eth0 dead.
>> Sep 18 08:16:00 viktor-01 crmd: [2283]: info: crm_update_peer_proc: viktor-02.ais is now offline
>> Sep 18 08:16:00 viktor-01 heartbeat: [2173]: info: Link viktor-02:eth1 dead.
>
> uh?
> I thought we knew that already?
> no, we only knew it was offline and the former DC was dead.
> whatever, nitpicking on wording.
> ...
>
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: info: master_color: Promoting drbd:1 (Slave viktor-01)
>
> and more "recovery actions"
>
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: info: master_color: ms_drbd_ip: Promoted 1 instances of a possible 1 to master
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: info: master_color: ms_drbd_ip: Promoted 1 instances of a possible 1 to master
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: notice: NoRoleChange: Promote drbd:1#011(Slave -> Master viktor-01)
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: notice: StopRsc: viktor-01#011Stop drbd:1
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: notice: StartRsc: viktor-01#011Start drbd:1
>
> yes, now it wants to _stop_ (which in the heartbeat ocf drbd ra means
> even unloading the drbd module) and then start again...
>
> stupid timeouts.
>
> ...
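>
> (aside: whenever the ra and the cluster disagree like this, you can
> ask drbd itself what state it is really in. plain drbd commands,
> nothing heartbeat specific, and in the end all the ra does underneath
> anyway:
>
>   cat /proc/drbd        # full picture: cs, st, ds for every device
>   drbdadm state r0      # local/peer role, e.g. Primary/Unknown
>   drbdadm cstate r0     # connection state, e.g. WFConnection
>
> )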
>> Sep 18 08:16:02 viktor-01 lrmd: [2280]: info: rsc:drbd:1: stop
>> Sep 18 08:16:02 viktor-01 crmd: [2283]: info: do_lrm_rsc_op: Performing op=drbd:1_stop_0 key=1:2:0:48972b3a-fc9c-405d-bf6c-1385730bb2b3)
>
>> Sep 18 08:16:02 viktor-01 drbd[4214]: [4226]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf state r0
>> Sep 18 08:16:02 viktor-01 drbd[4214]: [4242]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf cstate r0
>> Sep 18 08:16:03 viktor-01 drbd[4214]: [4249]: DEBUG: r0: Exit code 0
>> Sep 18 08:16:03 viktor-01 drbd[4214]: [4250]: DEBUG: r0: Command output: No response from the DRBD driver! Is the module loaded? Command 'drbdsetup /dev/drbd0 cstate' terminated with exit code 20 drbdadm aborting
>
>> Sep 18 08:16:03 viktor-01 lrmd: [2280]: info: RA output: (drbd:1:stop:stdout)
>> Sep 18 08:16:03 viktor-01 drbd[4214]: [4257]: DEBUG: r0 stop: already unconfigured.
>
> and the ra thinks that drbd was already stopped.
> not good, as it is not.
>
> stupid stupid timeouts.
>
>
> now, the start:
>
>> Sep 18 08:16:04 viktor-01 pengine: [4066]: notice: StartRsc: viktor-01#011Start drbd:0
>
> ... and so on.
>
>> Sep 18 08:16:38 viktor-01 drbd-peer-outdater: [4124]: WARN: error: could not connect to dopd after 60 seconds: timeout reached
>
> there. finally. although that message is misleading:
> drbd-peer-outdater was able to connect to the local dopd,
> but the local dopd was not able to reach the remote dopd,
> and when it was invoked, apparently it was not yet able
> to determine that there was no point in trying to contact the remote
> dopd, as it was offline (but not dead). or some such.
> anyways, the full timeout expired.
>
>> Sep 18 08:16:57 viktor-01 kernel: [ 373.064043] drbd0: Handshake successful: DRBD Network Protocol version 86
>
> but by now the peer has rebooted, and drbd re-establishes the connection.
>
>> Sep 18 08:16:57 viktor-01 kernel: [ 373.064133] drbd0: conn( WFConnection -> WFReportParams )
>> Sep 18 08:16:57 viktor-01 kernel: [ 373.076877] drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
>> Sep 18 08:16:57 viktor-01 kernel: [ 373.076934] drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
>
>> Sep 18 08:16:58 viktor-01 heartbeat: [2173]: info: Heartbeat restart on node viktor-02
>
> you apparently really have drbd started from init _and_ use the
> heartbeat ocf drbd ra. not good, as heartbeat (with that drbd ocf ra)
> insists on doing it itself, and gets confused if it was done already.
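>
> on debian that means taking the drbd init script out of the boot
> sequence, so that only the ocf ra ever configures drbd. a sketch,
> assuming the stock lenny package layout:
>
>   update-rc.d -f drbd remove
>
> the drbd module and devices are then brought up by heartbeat alone.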
>
>> Sep 18 08:17:04 viktor-01 pengine: [4066]: info: determine_online_status: Node viktor-01 is online
>> Sep 18 08:17:04 viktor-01 pengine: [4066]: info: unpack_rsc_op: Remapping drbd:0_monitor_0 (rc=0) on viktor-01 to an ERROR (expected 7)
>
> right.
> heartbeat had its own picture of how drbd's state "should" be.
> but it got it wrong.
> reality looks different from what heartbeat thinks it should look like.
>
> ...
> and we need another membership update
>
>> Sep 18 08:17:10 viktor-01 tengine: [4065]: info: run_graph: Throttling output: batch limit (30) reached
>
> uh? is that normal, or is the heartbeat graph processing looping?
>
>
> anyways, heartbeat tries to make reality look "correct" again,
> by trying to stop drbd.
>
>> Sep 18 08:17:23 viktor-01 tengine: [4065]: info: send_rsc_command: Initiating action 15: stop drbd:1_stop_0 on viktor-02
>> Sep 18 08:17:25 viktor-01 kernel: [ 400.847494] drbd0: State change failed: Refusing to be Primary while peer is not outdated
>> Sep 18 08:17:25 viktor-01 kernel: [ 400.847494] drbd0: state = { cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate r--- }
>> Sep 18 08:17:25 viktor-01 kernel: [ 400.847494] drbd0: wanted = { cs:TearDown st:Primary/Unknown ds:UpToDate/DUnknown r--- }
>> Sep 18 08:17:25 viktor-01 kernel: [ 400.849214] drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> Outdated )
>
> ok, that did work (it needed two tries, as the first try could not yet
> confirm "out-of-date-ness" of the peer, but it did work).
>
> so, viktor-01 is still drbd Primary,
> while viktor-02 apparently now has been unconfigured.
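>
> (that "Refusing to be Primary while peer is not outdated" is drbd's
> fencing logic doing its job. for completeness: the same outdating can
> be done by hand on the node whose data is to be marked stale, with a
> plain drbd 8.0 command,
>
>   drbdadm outdate r0     # mark the local disk Outdated
>
> which is, in the end, what dopd arranges over the heartbeat links.)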
>
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: notice: clone_print: Master/Slave Set: ms_drbd_ip
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: notice: native_print: drbd:0#011(ocf::heartbeat:drbd):#011Started viktor-02
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: notice: native_print: drbd:1#011(ocf::heartbeat:drbd):#011Stopped
>
> hm. that still does not match reality.
>
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: WARN: native_color: Resource drbd:1 cannot run anywhere
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: info: master_color: Promoting drbd:0 (Slave viktor-02)
>
> nonsense.
> drbd:1 _IS_ still running (and Primary) on viktor-01.
> viktor-02 has just been unconfigured. at least the latter should be
> known by heartbeat, as it initiated (and successfully completed) it.
>
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: notice: NoRoleChange: Promote drbd:0#011(Slave -> Master viktor-02)
>
> that won't work.
>
>> Sep 18 08:17:30 viktor-01 tengine: [4065]: info: send_rsc_command: Initiating action 8: promote drbd:0_promote_0 on viktor-02
>> Sep 18 08:17:32 viktor-01 crmd: [2283]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
>> Sep 18 08:17:32 viktor-01 tengine: [4065]: info: status_from_rc: Re-mapping op status to LRM_OP_ERROR for rc=1
>> Sep 18 08:17:32 viktor-01 crmd: [2283]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
>> Sep 18 08:17:32 viktor-01 tengine: [4065]: WARN: status_from_rc: Action 8 (promote) on viktor-02 failed (target: 0 vs. rc: 1): Error
>
> right. you cannot promote an unconfigured resource.
>
>> Sep 18 08:17:32 viktor-01 pengine: [4066]: notice: native_print: drbd:0#011(ocf::heartbeat:drbd):#011Master viktor-02 FAILED
>
> OF COURSE.
> this is strange. why would heartbeat try to unconfigure and then
> promote, without a start in between?
>
>> Sep 18 08:17:32 viktor-01 pengine: [4066]: notice: native_print: drbd:1#011(ocf::heartbeat:drbd):#011Stopped
>
> this still runs on viktor-01, as Primary!
>
>> Sep 18 08:17:32 viktor-01 pengine: [4066]: WARN: native_color: Resource drbd:1 cannot run anywhere
>
> nonsense. it does.
>
>> Sep 18 08:17:32 viktor-01 pengine: [4066]: info: master_color: Promoting drbd:0 (Master viktor-02)
>
> why promote it?
> it should first be started!
>
> ah. right:
>
>> Sep 18 08:17:34 viktor-01 tengine: [4065]: info: send_rsc_command: Initiating action 7: demote drbd:0_demote_0 on viktor-02
>
>> Sep 18 08:17:40 viktor-01 tengine: [4065]: info: send_rsc_command: Initiating action 1: stop drbd:0_stop_0 on viktor-02
>
>> Sep 18 08:17:42 viktor-01 tengine: [4065]: info: send_rsc_command: Initiating action 5: start drbd:0_start_0 on viktor-02
>
> there we are again.
>
>> Sep 18 08:17:43 viktor-01 kernel: [ 418.532147] drbd0: Handshake successful: DRBD Network Protocol version 86
>> Sep 18 08:17:43 viktor-01 kernel: [ 418.534929] drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
>> Sep 18 08:17:43 viktor-01 kernel: [ 418.549743] drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent )
>> Sep 18 08:17:43 viktor-01 kernel: [ 418.549961] drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
>> Sep 18 08:17:45 viktor-01 tengine: [4065]: info: match_graph_event: Action drbd:0_start_0 (5) confirmed on viktor-02 (rc=0)
>
>
>> Sep 18 08:17:48 viktor-01 pengine: [4066]: WARN: unpack_rsc_op: drbd:0_monitor_0 found active drbd:0 on viktor-02
>
> uh?
>
>> Sep 18 08:17:48 viktor-01 pengine: [4066]: WARN: unpack_rsc_op: drbd:1_monitor_0 found active drbd:1 on viktor-02
>
> how can it find both clones active on the same node? maybe I just still
> don't get the concept of the clone/master/slave resources?
> I thought I did, though.
> very strange.
>
> and so on.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>