Dear friends,

I would like to suggest that you edit your messages before sending them to the list. Nowadays emails are often read on mobile devices such as smartphones. When editing, focus on removing kilometric logs like the ones in this thread: what is the point of repeating them over and over in ALL the replies?

I hope you will take my criticism in the spirit in which it is meant, as constructively as possible. Kind regards, and many thanks for sharing your experiences.

Robert
Your email reaches you everywhere with BlackBerry® from Vodafone!

-----Original Message-----
From: drbd-user-request at lists.linbit.com
Sender: drbd-user-bounces at lists.linbit.com
Date: Sat, 18 Aug 2012 16:24:45
To: <drbd-user at lists.linbit.com>
Reply-To: drbd-user at lists.linbit.com
Subject: drbd-user Digest, Vol 97, Issue 18

Send drbd-user mailing list submissions to
	drbd-user at lists.linbit.com

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.linbit.com/mailman/listinfo/drbd-user
or, via email, send a message with subject or body 'help' to
	drbd-user-request at lists.linbit.com

You can reach the person managing the list at
	drbd-user-owner at lists.linbit.com

When replying, please edit your Subject line so it is more specific than "Re: Contents of drbd-user digest..."

Today's Topics:

   1. Re: Drbd : PingAsk timeout, about 10 mins. (Pascal BERTON)
   2. Re: Drbd : PingAsk timeout, about 10 mins. (litao5 at hisense.com)

----------------------------------------------------------------------

Message: 1
Date: Sat, 18 Aug 2012 12:46:01 +0200
From: "Pascal BERTON" <pascal.berton3 at free.fr>
Subject: Re: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.
To: "'simon'" <litao5 at hisense.com>, <drbd-user at lists.linbit.com>
Message-ID: <000f01cd7d2e$a9caa4d0$fd5fee70$@berton3 at free.fr>
Content-Type: text/plain; charset="iso-8859-1"

Hi Simon.
AFAIK, a PingAck error means your replication network links are either down or suffering enough errors to prevent the two nodes from reaching each other in a timely manner. I have run into this behavior myself, for instance with bad optical fibers generating a huge number of network errors. You also have "network failure" messages in your logs, and the connection state is "Waiting for connection".

In your case I'd say the first thing to do is to test this network: can both nodes ping each other's address on this network? Does ifconfig report errors on each address? Etc.

I bet that once your replication network is up again, your cluster will run fine.

Pascal.

From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On behalf of simon
Sent: Saturday, 18 August 2012 03:37
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.

Hi all,

I use DRBD 8.3.7 with HA. When the master host dies and HA switches from master to slave, DRBD cannot switch over because it takes 10 minutes to mount its partition. That exceeds the HA timeout (the HA default timeout is 2 minutes). Why does DRBD take so long?
The log is:

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739458] block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739468] block drbd1: asender terminated
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739470] block drbd1: Terminating asender thread
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739526] block drbd1: short read expecting header on sock: r=-512
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739666] block drbd1: Connection closed
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739672] block drbd1: conn( NetworkFailure -> Unconnected )
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739678] block drbd1: receiver terminated
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739680] block drbd1: Restarting receiver thread
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739683] block drbd1: receiver (re)started
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739687] block drbd1: conn( Unconnected -> WFConnection )
Jul 22 21:06:39 QD-CS-MDC-B pengine: [17776]: info: crm_log_init: Changed active directory to /usr/var/lib/heartbeat/cores/root
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.727331] NET: Registered protocol family 17
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.768912] block drbd0: role( Secondary -> Primary )
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772742] block drbd1: role( Secondary -> Primary )
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772997] block drbd1: Creating new current UUID
Jul 22 21:08:47 QD-CS-MDC-B su: (to hitv) root on none
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] block drbd0: PingAck did not arrive in time.
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032493] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032503] block drbd0: asender terminated
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032506] block drbd0: Terminating asender thread
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032514] block drbd0: Creating new current UUID
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032567] block drbd0: short read expecting header on sock: r=-512
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032868] block drbd0: Connection closed
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032875] block drbd0: conn( NetworkFailure -> Unconnected )
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032879] block drbd0: receiver terminated
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032881] block drbd0: Restarting receiver thread
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032884] block drbd0: receiver (re)started
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032888] block drbd0: conn( Unconnected -> WFConnection )
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600888] kjournald starting. Commit interval 15 seconds
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600956] EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601330] EXT3 FS on drbd0, internal journal
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601334] EXT3-fs: recovery complete.
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601392] EXT3-fs: mounted filesystem with ordered data mode.

According to the log, the timeout happens in the PingAck operation.

Thanks for your help.
simon
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120818/cb6ca975/attachment-0001.htm>

------------------------------

Message: 2
Date: Sat, 18 Aug 2012 22:24:14 +0800 (CST)
From: <litao5 at hisense.com>
Subject: Re: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.
To: "Pascal BERTON" <pascal.berton3 at free.fr>
Cc: drbd-user at lists.linbit.com
Message-ID: <1408fd4.96f3e.1393a1eb50b.Coremail.litao5 at hisense.com>
Content-Type: text/plain; charset="utf-8"

Hi Pascal,

Thanks for your reply. Yes, the network was bad. The master host died, so the slave host took over its work and mounted the DRBD partition. The timeout occurred while mounting. But DRBD's default network timeout is 6 seconds (it can be set in drbd.conf), and it did not take effect. Why? Do you have a good idea for making it switch over immediately in this situation?

Thanks.
Simon
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120818/c5f788f1/attachment.htm>

------------------------------

_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


End of drbd-user Digest, Vol 97, Issue 18
*****************************************
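The "about 10 mins" in the subject can be measured directly from the bracketed kernel timestamps (seconds since boot) in simon's log. The Python sketch below copies four of those log lines verbatim and subtracts their timestamps. The helper names `first_timestamp` and `elapsed` are hypothetical, made up for illustration, and are not part of DRBD or Heartbeat; the sketch only measures the gaps and does not by itself explain why the PingAck timeout fired so late.

```python
import re
from typing import Optional

# Four entries copied verbatim from simon's kernel log in this thread.
LOG = """\
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739458] block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.768912] block drbd0: role( Secondary -> Primary )
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] block drbd0: PingAck did not arrive in time.
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601392] EXT3-fs: mounted filesystem with ordered data mode.
"""

# Captures the "[seconds.microseconds]" kernel timestamp and the message after it.
KERNEL_LINE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]\s+(.*)$")


def first_timestamp(log: str, needle: str) -> Optional[float]:
    """Return the kernel timestamp of the first kernel line whose
    message contains `needle` (plain substring match), or None."""
    for line in log.splitlines():
        match = KERNEL_LINE.search(line)
        if match and needle in match.group(2):
            return float(match.group(1))
    return None


def elapsed(log: str, start: str, end: str) -> float:
    """Seconds between the first `start` event and the first `end` event."""
    t0 = first_timestamp(log, start)
    t1 = first_timestamp(log, end)
    if t0 is None or t1 is None:
        raise ValueError("event not found in log")
    return t1 - t0


if __name__ == "__main__":
    gaps = {
        "drbd1 NetworkFailure -> drbd0 PingAck timeout": elapsed(
            LOG, "drbd1: peer( Primary -> Unknown )", "drbd0: PingAck did not arrive"),
        "drbd0 promoted -> drbd0 PingAck timeout": elapsed(
            LOG, "drbd0: role( Secondary -> Primary )", "drbd0: PingAck did not arrive"),
        "drbd0 PingAck timeout -> ext3 mounted": elapsed(
            LOG, "drbd0: PingAck did not arrive", "EXT3-fs: mounted filesystem"),
    }
    for label, seconds in gaps.items():
        print(f"{label}: {seconds:.1f} s")
```

On these lines, drbd0 is promoted at 21:06:47 but only reports the PingAck timeout (and leaves the Connected state) about 600 s later, while drbd1 had already dropped to WFConnection at 21:06:34; the ext3 mount then completes roughly 0.6 s after the timeout.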