[DRBD-user] Re: DRBD+Linux-HA: Failover fails if initiated by init 6 on DRBD Master Node

Christoph Eßer cehser at gmail.com
Thu Sep 18 22:03:31 CEST 2008


Thank you so much! Adjusting the timeout of the peer-outdater was the solution.
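For the archives, the change amounts to something like the following in /etc/drbd.conf (a sketch of my setup; the handler path is the usual Debian location and may differ elsewhere):

```
resource r0 {
  handlers {
    # Tell the peer-outdater to give up after 5 seconds instead of its
    # 60-second default, so it returns well before the promote times out.
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
  # remaining resource configuration unchanged
}
```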

Of course I did look at the logs myself, but I could not get a clue out of them.

Again, thanks a lot for your explanations.

Christoph

2008/9/18 Lars Ellenberg <lars.ellenberg at linbit.com>:
> On Thu, Sep 18, 2008 at 10:29:33AM +0200, Christoph Eßer wrote:
>> Oops, gmail incorrectly labeled this mail as sent to the Linux-HA
>> list. This is the right one. Sorry for the disturbance.
>
> and it is much too long...
>
>> I added a complete log for the reboot process on my slave machine.
>> Maybe this helps.
>
> you could have looked at those logs yourself, you know?
>
>> 2008/9/18 Christoph Eßer <cehser at gmail.com>:
>> > Sorry, I accidentally sent the message to this list again. Just ignore it.
>> >
>> > 2008/9/18 Christoph Eßer <cehser at gmail.com>:
>> >> Hi there,
>> >>
>> >> I first came up with this question on the Linux-HA mailing list. As they
>> >> think it seems to be a DRBD-related problem, I repeat it here:
>> >>
>> >> I am using Heartbeat 2.99, Pacemaker 0.6.6 and DRBD 8.0.13 on Debian Lenny.
>> >>
>> >> Everything works as expected if I force the system to fail over by
>> >> unplugging the power supply of either of my two nodes. Shutting down the
>> >> DRBD slave via "init 0" or "init 6" works fine as well.
>> >>
>> >> But whenever I restart my DRBD master via "init 6", the whole cluster
>> >> crashes and fails to reassign the resources, even after the node has
>> >> started up again. I also found some strange DRBD-related messages in
>> >> /var/log/syslog on the second node while restarting the master node:
>
>> Sep 18 08:14:22 viktor-01 drbd[3863]: [3900]: DEBUG: r0 status: Secondary/Primary Secondary Primary Connected
>
> fine.
>
>
>> Sep 18 08:15:19 viktor-01 kernel: [  274.894318] drbd0: peer( Primary -> Secondary )
>
> ok, peer is going secondary.
>
>> Sep 18 08:15:24 viktor-01 kernel: [  279.772756] drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
>
> peer disconnected
>
>> Sep 18 08:15:24 viktor-01 kernel: [  279.772756] drbd0: conn( Unconnected -> WFConnection )
>
> we go wait for the peer to come back.
>
>> Sep 18 08:15:25 viktor-01 cib: [2279]: info: apply_xml_diff: Digest mis-match: expected 615bd497ee2d3cfdec1bc8611c9af264, calculated dd22ea8b07a5c626d0c03c021d1c3045
>
> dunno whether this is normal behaviour of heartbeat:
>
>> Sep 18 08:15:25 viktor-01 cib: [2279]: info: cib_process_diff: Diff 0.209.194 -> 0.209.195 not applied to 0.209.194: Failed application of a global update.  Requesting full refresh.
>> Sep 18 08:15:25 viktor-01 cib: [2279]: info: cib_process_diff: Requesting re-sync from peer: Failed application of a global update.  Requesting full refresh.
>> Sep 18 08:15:25 viktor-01 cib: [2279]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh
>> Sep 18 08:15:25 viktor-01 drbd[4021]: [4050]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf cstate r0
>> Sep 18 08:15:25 viktor-01 cib: [2279]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh
>> Sep 18 08:15:25 viktor-01 cib: [2279]: WARN: cib_process_diff: Not applying diff 0.209.195 -> 0.209.196 (sync in progress)
>> Sep 18 08:15:25 viktor-01 drbd[4021]: [4057]: DEBUG: r0: Exit code 0
>> Sep 18 08:15:25 viktor-01 cib: [2279]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh
>> Sep 18 08:15:25 viktor-01 drbd[4021]: [4058]: DEBUG: r0: Command output: WFConnection
>> Sep 18 08:15:25 viktor-01 cib: [2279]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh
>
>
>> Sep 18 08:15:28 viktor-01 crmd: [2283]: notice: crmd_client_status_callback: Status update: Client viktor-02/crmd now has status [offline]
>> Sep 18 08:15:28 viktor-01 crmd: [2283]: info: crm_update_peer_proc: viktor-02.crmd is now offline
>
> but finally the peer is offline from heartbeat's perspective as well.
>
>> Sep 18 08:15:28 viktor-01 crmd: [2283]: info: crmd_client_status_callback: Got client status callback - our DC is dead
>
> unfortunately it was the DC,
> so heartbeat elects a new DC.
>
>> Sep 18 08:15:28 viktor-01 crmd: [2283]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK origin=crmd_client_status_callback ]
>> Sep 18 08:15:28 viktor-01 crmd: [2283]: info: update_dc: Unset DC viktor-02
>
> ...
>
>> Sep 18 08:15:35 viktor-01 ccm: [2278]: info: Break tie for 2 nodes cluster
>> Sep 18 08:15:35 viktor-01 crmd: [2283]: info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
>
> ...
>
>> Sep 18 08:15:35 viktor-01 crmd: [2283]: info: update_dc: Set DC to viktor-01 (2.1)
>
> there. we are the DC.
>
>> Sep 18 08:15:36 viktor-01 crmd: [2283]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
>
> cool. I agree with myself.
>
>
> we start the resource stack:
>
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: WARN: native_color: Resource drbd:0 cannot run anywhere
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: info: master_color: Promoting drbd:1 (Slave viktor-01)
>
> ...
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: NoRoleChange: Promote drbd:1#011(Slave -> Master viktor-01)
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: RecurringOp:  Start recurring monitor (29s) for drbd:1 on viktor-01
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: StartRsc:  viktor-01#011Start extern
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: StartRsc:  viktor-01#011Start eth3:0
> ...
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: StartRsc:  viktor-01#011Start fs_mount
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: StartRsc:  viktor-01#011Start Exim4
>> Sep 18 08:15:37 viktor-01 pengine: [4066]: notice: StartRsc:  viktor-01#011Start Squid
>
> ...
>
>> Sep 18 08:15:37 viktor-01 crmd: [2283]: info: do_lrm_rsc_op: Performing op=drbd:1_promote_0 key=7:0:0:48972b3a-fc9c-405d-bf6c-1385730bb2b3)
>
>> Sep 18 08:15:38 viktor-01 drbd[4080]: [4117]: DEBUG: r0 status: Secondary/Unknown Secondary Unknown WFConnection
>> Sep 18 08:15:38 viktor-01 drbd[4080]: [4118]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf primary r0
>> Sep 18 08:15:38 viktor-01 kernel: [  293.423767] drbd0: helper command: /sbin/drbdadm outdate-peer
>
> you configured drbd to use the outdate-peer handler.
> so it does.
>
>> Sep 18 08:15:38 viktor-01 drbd-peer-outdater: [4124]: debug: drbd peer: viktor-02
>> Sep 18 08:15:38 viktor-01 drbd-peer-outdater: [4124]: debug: drbd resource: r0
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Connecting channel
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Client outdater (0x9864a50) connected
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: invoked: outdater
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Processing msg from outdater
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Got message from (drbd-peer-outdater). (peer: viktor-02, res :r0)
>> Sep 18 08:15:38 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Starting node walk
>> Sep 18 08:15:39 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: node viktor-02 found
>> Sep 18 08:15:39 viktor-01 /usr/lib/heartbeat/dopd: [2284]: info: sending start_outdate message to the other node viktor-01 -> viktor-02
>> Sep 18 08:15:39 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: sending [start_outdate res: r0] to node: viktor-02
>> Sep 18 08:15:39 viktor-01 /usr/lib/heartbeat/dopd: [2284]: debug: Processed 1 messages
>
> meanwhile the peer rebooted?
>
>> Sep 18 08:15:41 viktor-01 kernel: [  297.000380] e1000: eth2: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Sep 18 08:15:44 viktor-01 kernel: [  300.008048] drbd0: connect failed, err = -115
>> Sep 18 08:15:54 viktor-01 kernel: [  310.108051] drbd0: connect failed, err = -115
>
> but has no drbd port open yet.
>
>> Sep 18 08:15:57 viktor-01 lrmd: [2280]: WARN: drbd:1:promote process (PID 4080) timed out (try 1).  Killing with signal SIGTERM (15).
>
>
> the timeout on the promote is too short,
> compared to the timeout of drbd-peer-outdater.
> add "-t 5" to the drbd-peer-outdater command in drbd.conf,
> or increase the promote timeouts.
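The second option, in CIB terms, means giving the promote op more time than drbd-peer-outdater's 60-second default. A rough sketch for a Heartbeat 2.1-style configuration (only ms_drbd_ip is taken from the logs above; the other ids are illustrative):

```
<master_slave id="ms_drbd_ip">
  <primitive id="drbd" class="ocf" provider="heartbeat" type="drbd">
    <operations>
      <!-- promote must be allowed to outlast the 60s outdater default -->
      <op id="drbd-op-promote" name="promote" timeout="90s"/>
    </operations>
  </primitive>
</master_slave>
```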
>
>
>> Sep 18 08:15:57 viktor-01 lrmd: [2280]: WARN: operation promote[54] on ocf::drbd::drbd:1 for client 2283, its parameters: clone_max=[2] CRM_meta_role=[Master] drbdconf=[/etc/drbd.conf] CRM_meta_notify_start_resource=[drbd:0 drbd:1 ] CRM_meta_notify_promote_resource=[drbd:1 ] drbd_resource=[r0] CRM_meta_master_max=[1] CRM_meta_stateful=[true] CRM_meta_timeout=[20000] crm_feature_set=[2.1] CRM_meta_clone_max=[2] CRM_meta_master_node_max=[1] master_node_max=[1] master_max=[1] CRM_meta_notify_promote_uname=[viktor-01 ] CRM_meta_notify_start_uname=[viktor-: pid [4080] timed out
>> Sep 18 08:15:57 viktor-01 crmd: [2283]: ERROR: process_lrm_event: LRM operation drbd:1_promote_0 (54) Timed Out (timeout=20000ms)
>
> default timeout for drbd-peer-outdater is 60 seconds.
> promote timeout is set to 20 seconds.
>
> too bad.
> "recovery action" of heartbeat for a failed promote is
> probably "stop" "start". while drbd itself is still trying to become
> primary, once that drbd-peer-outdate helper returns.
>
> hm, that's going to hurt.
>
>
> ...  blabla ...
>
>> Sep 18 08:15:58 viktor-01 pengine: [4066]: info: master_color: Promoting drbd:1 (Master viktor-01)
>> Sep 18 08:15:58 viktor-01 pengine: [4066]: info: master_color: ms_drbd_ip: Promoted 1 instances of a possible 1 to master
>> Sep 18 08:15:58 viktor-01 pengine: [4066]: info: master_color: ms_drbd_ip: Promoted 1 instances of a possible 1 to master
>> Sep 18 08:15:58 viktor-01 pengine: [4066]: notice: NoRoleChange: Leave resource drbd:1#011(Master viktor-01)
>
> heartbeat seems to be confused.
> wants to promote the master? but it is already master?
>
>
> ok, so it tries to demote first:
>
>> Sep 18 08:15:58 viktor-01 lrmd: [2280]: info: rsc:drbd:1: demote
>> Sep 18 08:15:58 viktor-01 crmd: [2283]: info: do_lrm_rsc_op: Performing op=drbd:1_demote_0 key=6:1:0:48972b3a-fc9c-405d-bf6c-1385730bb2b3)
>
>> Sep 18 08:15:58 viktor-01 drbd[4137]: [4148]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf secondary r0
>> Sep 18 08:15:59 viktor-01 drbd[4137]: [4155]: DEBUG: r0: Exit code 0
>> Sep 18 08:15:59 viktor-01 drbd[4137]: [4156]: DEBUG: r0: Command output: No response from the DRBD driver! Is the module loaded? Command 'drbdsetup /dev/drbd0 secondary' terminated with exit code 20 drbdadm aborting
>
> huh? did not work? well, no, because the promotion is still pending...
> so which is it: exit code 20, or exit code 0?
>
>> Sep 18 08:15:59 viktor-01 lrmd: [2280]: info: RA output: (drbd:1:demote:stdout) No response from the DRBD driver! Is the module loaded? Command 'drbdsetup /dev/drbd0 secondary' terminated with exit code 20 drbdadm aborting
>
> sure it does not answer to role change requests,
> as there is a role change request still pending.
>
>> Sep 18 08:16:00 viktor-01 heartbeat: [2173]: WARN: node viktor-02: is dead
>> Sep 18 08:16:00 viktor-01 crmd: [2283]: notice: crmd_ha_status_callback: Status update: Node viktor-02 now has status [dead]
>> Sep 18 08:16:00 viktor-01 heartbeat: [2173]: info: Link viktor-02:eth0 dead.
>> Sep 18 08:16:00 viktor-01 crmd: [2283]: info: crm_update_peer_proc: viktor-02.ais is now offline
>> Sep 18 08:16:00 viktor-01 heartbeat: [2173]: info: Link viktor-02:eth1 dead.
>
> uh?
> I thought we knew that already?
> no, we only knew it was offline and the former DC was dead.
> whatever, nitpicking on wording.
> ...
>
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: info: master_color: Promoting drbd:1 (Slave viktor-01)
>
> and more "recovery actions"
>
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: info: master_color: ms_drbd_ip: Promoted 1 instances of a possible 1 to master
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: info: master_color: ms_drbd_ip: Promoted 1 instances of a possible 1 to master
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: notice: NoRoleChange: Promote drbd:1#011(Slave -> Master viktor-01)
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: notice: StopRsc:   viktor-01#011Stop drbd:1
>> Sep 18 08:16:01 viktor-01 pengine: [4066]: notice: StartRsc:  viktor-01#011Start drbd:1
>
> yes, now it wants to _stop_ (which in the heartbeat ocf drbd ra means even
> unloading the drbd module) and then start again...
>
> stupid timeouts.
>
> ...
>> Sep 18 08:16:02 viktor-01 lrmd: [2280]: info: rsc:drbd:1: stop
>> Sep 18 08:16:02 viktor-01 crmd: [2283]: info: do_lrm_rsc_op: Performing op=drbd:1_stop_0 key=1:2:0:48972b3a-fc9c-405d-bf6c-1385730bb2b3)
>
>> Sep 18 08:16:02 viktor-01 drbd[4214]: [4226]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf state r0
>> Sep 18 08:16:02 viktor-01 drbd[4214]: [4242]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf cstate r0
>> Sep 18 08:16:03 viktor-01 drbd[4214]: [4249]: DEBUG: r0: Exit code 0
>> Sep 18 08:16:03 viktor-01 drbd[4214]: [4250]: DEBUG: r0: Command output: No response from the DRBD driver! Is the module loaded? Command 'drbdsetup /dev/drbd0 cstate' terminated with exit code 20 drbdadm aborting
>
>> Sep 18 08:16:03 viktor-01 lrmd: [2280]: info: RA output: (drbd:1:stop:stdout)
>> Sep 18 08:16:03 viktor-01 drbd[4214]: [4257]: DEBUG: r0 stop: already unconfigured.
>
> and the ra thinks that drbd was already stopped.
> not good, as it is not.
>
> stupid stupid timeouts.
>
>
> now, the start:
>
>> Sep 18 08:16:04 viktor-01 pengine: [4066]: notice: StartRsc:  viktor-01#011Start drbd:0
>
> ... and so on.
>
>> Sep 18 08:16:38 viktor-01 drbd-peer-outdater: [4124]: WARN: error: could not connect to dopd after 60 seconds: timeout reached
>
> there. finally. although that message is misleading:
> drbd-peer-outdater was able to connect to the local dopd,
> but the local dopd was not able to connect to the remote dopd,
> and when it was invoked, apparently it was not yet able
> to determine that there was no point in trying to contact the remote
> dopd, as it was offline (but not dead). or some such.
> anyway, the full timeout expired.
>
>> Sep 18 08:16:57 viktor-01 kernel: [  373.064043] drbd0: Handshake successful: DRBD Network Protocol version 86
>
> but by now the peer has rebooted, and drbd re-establishes the connection.
>
>> Sep 18 08:16:57 viktor-01 kernel: [  373.064133] drbd0: conn( WFConnection -> WFReportParams )
>> Sep 18 08:16:57 viktor-01 kernel: [  373.076877] drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
>> Sep 18 08:16:57 viktor-01 kernel: [  373.076934] drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
>
>> Sep 18 08:16:58 viktor-01 heartbeat: [2173]: info: Heartbeat restart on node viktor-02
>
> you apparently really have drbd started from init _and_ use the
> heartbeat ocf drbd ra. not good, as heartbeat (with that drbd ocf ra)
> insists on doing it itself, and gets confused if it was done already.
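On Debian, letting only the cluster manager handle drbd would mean taking it out of the boot sequence, roughly like this (a sketch; run as root, and verify the init script name matches your package first):

```
# stop the instance that init started; the heartbeat ocf ra takes over
/etc/init.d/drbd stop

# drop the drbd init script from all runlevels (Debian)
update-rc.d -f drbd remove
```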
>
>> Sep 18 08:17:04 viktor-01 pengine: [4066]: info: determine_online_status: Node viktor-01 is online
>> Sep 18 08:17:04 viktor-01 pengine: [4066]: info: unpack_rsc_op: Remapping drbd:0_monitor_0 (rc=0) on viktor-01 to an ERROR (expected 7)
>
> right.
> heartbeat had its own picture of how drbd's state "should" be.
> but it got it wrong.
> reality looks different from what heartbeat thinks it should look like.
>
> ...
> and we need another membership update
>
>> Sep 18 08:17:10 viktor-01 tengine: [4065]: info: run_graph: Throttling output: batch limit (30) reached
>
> uh? is that normal, or is the heartbeat graph processing looping?
>
>
> anyway, heartbeat tries to make reality look "correct" again,
> by trying to stop drbd.
>
>> Sep 18 08:17:23 viktor-01 tengine: [4065]: info: send_rsc_command: Initiating action 15: stop drbd:1_stop_0 on viktor-02
>> Sep 18 08:17:25 viktor-01 kernel: [  400.847494] drbd0: State change failed: Refusing to be Primary while peer is not outdated
>> Sep 18 08:17:25 viktor-01 kernel: [  400.847494] drbd0:   state = { cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate r--- }
>> Sep 18 08:17:25 viktor-01 kernel: [  400.847494] drbd0:  wanted = { cs:TearDown st:Primary/Unknown ds:UpToDate/DUnknown r--- }
>> Sep 18 08:17:25 viktor-01 kernel: [  400.849214] drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> Outdated )
>
> ok, that did work (it needed two tries, as the first try could not yet
> confirm "out-of-date-ness" of the peer, but it did work).
>
> so, viktor-01 is still drbd Primary,
> while viktor-02 apparently now has been unconfigured.
>
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: notice: clone_print: Master/Slave Set: ms_drbd_ip
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: notice: native_print:     drbd:0#011(ocf::heartbeat:drbd):#011Started viktor-02
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: notice: native_print:     drbd:1#011(ocf::heartbeat:drbd):#011Stopped
>
> hm. that still does not match reality.
>
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: WARN: native_color: Resource drbd:1 cannot run anywhere
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: info: master_color: Promoting drbd:0 (Slave viktor-02)
>
> nonsense.
> drbd:1 _IS_ still running (and Primary) on viktor-01.
> viktor-02 has just been unconfigured. at least the latter should be
> known by heartbeat, as it initiated (and successfully completed) it.
>
>> Sep 18 08:17:29 viktor-01 pengine: [4066]: notice: NoRoleChange: Promote drbd:0#011(Slave -> Master viktor-02)
>
> that won't work.
>
>> Sep 18 08:17:30 viktor-01 tengine: [4065]: info: send_rsc_command: Initiating action 8: promote drbd:0_promote_0 on viktor-02
>> Sep 18 08:17:32 viktor-01 crmd: [2283]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
>> Sep 18 08:17:32 viktor-01 tengine: [4065]: info: status_from_rc: Re-mapping op status to LRM_OP_ERROR for rc=1
>> Sep 18 08:17:32 viktor-01 crmd: [2283]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
>> Sep 18 08:17:32 viktor-01 tengine: [4065]: WARN: status_from_rc: Action 8 (promote) on viktor-02 failed (target: 0 vs. rc: 1): Error
>
> right. you cannot promote an unconfigured resource.
>
>> Sep 18 08:17:32 viktor-01 pengine: [4066]: notice: native_print:     drbd:0#011(ocf::heartbeat:drbd):#011Master viktor-02 FAILED
>
> OF COURSE.
> this is strange: why would heartbeat try to unconfigure and then
> promote, without a start in between?
>
>> Sep 18 08:17:32 viktor-01 pengine: [4066]: notice: native_print:     drbd:1#011(ocf::heartbeat:drbd):#011Stopped
> this still runs on viktor-01, as Primary!
>
>> Sep 18 08:17:32 viktor-01 pengine: [4066]: WARN: native_color: Resource drbd:1 cannot run anywhere
>
> nonsense. it does.
>
>> Sep 18 08:17:32 viktor-01 pengine: [4066]: info: master_color: Promoting drbd:0 (Master viktor-02)
>
> why promote it?
> it should first be started!
>
> ah. right:
>> Sep 18 08:17:34 viktor-01 tengine: [4065]: info: send_rsc_command: Initiating action 7: demote drbd:0_demote_0 on viktor-02
>
>> Sep 18 08:17:40 viktor-01 tengine: [4065]: info: send_rsc_command: Initiating action 1: stop drbd:0_stop_0 on viktor-02
>
>> Sep 18 08:17:42 viktor-01 tengine: [4065]: info: send_rsc_command: Initiating action 5: start drbd:0_start_0 on viktor-02
>
> there we are again.
>
>> Sep 18 08:17:43 viktor-01 kernel: [  418.532147] drbd0: Handshake successful: DRBD Network Protocol version 86
>> Sep 18 08:17:43 viktor-01 kernel: [  418.534929] drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
>> Sep 18 08:17:43 viktor-01 kernel: [  418.549743] drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent )
>> Sep 18 08:17:43 viktor-01 kernel: [  418.549961] drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
>> Sep 18 08:17:45 viktor-01 tengine: [4065]: info: match_graph_event: Action drbd:0_start_0 (5) confirmed on viktor-02 (rc=0)
>
>
>> Sep 18 08:17:48 viktor-01 pengine: [4066]: WARN: unpack_rsc_op: drbd:0_monitor_0 found active drbd:0 on viktor-02
>
> uh?
>
>> Sep 18 08:17:48 viktor-01 pengine: [4066]: WARN: unpack_rsc_op: drbd:1_monitor_0 found active drbd:1 on viktor-02
>
> how can it find both clones active on the same node? maybe I just still
> don't get the concept of the clones/master/slave resources?
> I thought I did, though.
> very strange.
>
> and so on.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
