Hi Lars Ellenberg,

The master host has two network cards, eth0 and eth1. DRBD uses eth0.
By "not real dead" I mean that eth0 is dead (this can be seen in the HA log).
eth1 still answers ping, but I cannot log in over ssh, so I think Linux may
have panicked.

eth0 is dead, but DRBD does not detect this and return immediately. Why?

Thanks.

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of drbd-user-request at lists.linbit.com
Sent: Wednesday, August 22, 2012 18:00
To: drbd-user at lists.linbit.com
Subject: drbd-user Digest, Vol 97, Issue 23

----------------------------------------------------------------------

Message: 1
Date: Tue, 21 Aug 2012 12:50:12 +0200
From: Lars Ellenberg <lars.ellenberg at linbit.com>
Subject: Re: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.
To: drbd-user at lists.linbit.com
Message-ID: <20120821105012.GG20059 at soda.linbit>

On Tue, Aug 21, 2012 at 03:40:34PM +0800, simon wrote:
> Hi Pascal,
>
> I can't reproduce the error because the condition that it issues is
> very especially. The Master host is in the "not real dead" status.
> ( I doubt it is Linux's panic). The TCP stack maybe is bad in Master
> host. Now I don't want to avoid it because I can't reproduce it. I
> only want to succeed to switch form Master to Slave so that my
> service can be supplied normally. But I can't right to switch because
> of the 10 minutes delay of Drbd.

Well.
If it was "not real dead", then I'd suspect that the DRBD connection was
still "sort of up", and thus DRBD saw the other node as Primary still,
and correctly refused to be promoted locally.

To have your cluster recover from an "almost but not quite dead node"
scenario, you need to add stonith aka node level fencing to your
cluster stack.

> I run "drbdsetup 0 show" on my host, it shows as following,
>
> disk {
>         size                    0s _is_default; # bytes
>         on-io-error             detach;
>         fencing                 dont-care _is_default;
>         max-bio-bvecs           0 _is_default;
> }
>
> net {
>         timeout                 60 _is_default; # 1/10 seconds
>         max-epoch-size          2048 _is_default;
>         max-buffers             2048 _is_default;
>         unplug-watermark        128 _is_default;
>         connect-int             10 _is_default; # seconds
>         ping-int                10 _is_default; # seconds
>         sndbuf-size             0 _is_default; # bytes
>         rcvbuf-size             0 _is_default; # bytes
>         ko-count                0 _is_default;
>         allow-two-primaries;

Uh. You are sure about that?

Two primaries, and dont-care for fencing?

You are aware that you just subscribed to data corruption, right?

If you want two primaries, you MUST have proper fencing,
on both the cluster level (stonith) and the drbd level (fencing
resource-and-stonith; fence-peer handler: e.g. crm-fence-peer.sh).

>         after-sb-0pri           discard-least-changes;
>         after-sb-1pri           discard-secondary;

And here you configure automatic data loss.
Which is ok, as long as you are aware of that and actually mean it...

>         after-sb-2pri           disconnect _is_default;
>         rr-conflict             disconnect _is_default;
>         ping-timeout            5 _is_default; # 1/10 seconds
> }
>
> syncer {
>         rate                    102400k; # bytes/second
>         after                   -1 _is_default;
>         al-extents              257;
> }
>
> protocol C;
> _this_host {
>         device                  minor 0;
>         disk                    "/dev/cciss/c0d0p7";
>         meta-disk               internal;
>         address                 ipv4 172.17.5.152:7900;
> }
>
> _remote_host {
>         address                 ipv4 172.17.5.151:7900;
> }
>
> In the list , there is "timeout 60 _is_default; # 1/10 seconds".

Then guess what, maybe the timeout did not trigger,
because the peer was still "sort of" responsive?
-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
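
For reference, the DRBD-level fencing mentioned in the reply above (fencing
resource-and-stonith plus a fence-peer handler such as crm-fence-peer.sh)
might look roughly like this in drbd.conf on a setup like the one shown.
This is only a sketch: the resource name r0 and the handler paths under
/usr/lib/drbd/ are assumptions, not taken from the thread, and the fragment
only helps if stonith is also configured and working at the cluster level.

```
resource r0 {                        # resource name is hypothetical
  disk {
    fencing resource-and-stonith;    # replaces the default "dont-care"
  }
  handlers {
    # Handler paths are assumptions; adjust to where your packages put them.
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # existing net, syncer and host sections stay as they are
}
```

With resource-and-stonith, DRBD freezes I/O when it loses the peer and calls
the fence-peer handler before continuing, which is why it has to be paired
with real node-level fencing in the cluster manager.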