<P>Hi Pascal,</P>
<P><STRONG>Thanks for your reply. </STRONG></P>
<P><STRONG>Yes, the network was bad. The Master host was dead, so the Slave host took over its work and mounted the DRBD partition. The timeout occurred during the mount. However, the default DRBD network timeout is 6 seconds (it can be set in drbd.conf), and it did not seem to take effect. Why?</STRONG></P>
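<P><STRONG>For reference, these are the network timers I am talking about, written as a drbd.conf fragment. The values are what I understand to be the 8.3 defaults from the drbd.conf man page, and the resource name r0 is only a placeholder, so please treat this as a sketch and not as my exact configuration:</STRONG></P>
<PRE>
resource r0 {              # hypothetical resource name
  net {
    timeout      60;       # unit is 0.1 s, so 6 seconds
    connect-int  10;       # seconds between connection attempts
    ping-int     10;       # idle seconds before a keep-alive ping is sent
    ping-timeout 5;        # unit is 0.1 s, so 0.5 seconds to wait for the PingAck
    ko-count     0;        # 0 disables expelling a peer that keeps timing out
  }
}
</PRE>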
<P><STRONG>Do you have a good idea for making it switch over immediately in this situation? </STRONG></P>
<P><STRONG>Thanks.</STRONG></P>
<P><STRONG> Simon</STRONG> </P>
<BLOCKQUOTE style="BORDER-LEFT: #a0c6e5 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px" name="replyContent">-----Original Message-----<BR><B>From:</B> "Pascal BERTON" &lt;pascal.berton3@free.fr&gt;<BR><B>Sent:</B> Saturday, August 18, 2012<BR><B>To:</B> 'simon' &lt;litao5@hisense.com&gt;, drbd-user@lists.linbit.com<BR><B>Cc:</B> <BR><B>Subject:</B> RE: [DRBD-user] Drbd : PingAsk timeout, about 10 mins.<BR><BR>
<DIV class="WordSection1">
<P class="MsoNormal"><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt" lang="EN-US">Hi Simon.<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt" lang="EN-US"><o:p> </o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt" lang="EN-US">AFAIK, the PingAck error means your replication network links are either down or suffering enough errors to prevent the two nodes from reaching each other in a timely manner. I have seen this behavior caused by bad optical fibers, for instance, which generated a huge number of network errors. You also have “network failure” messages in your logs, and the connection state is “Waiting for connection”. In your case I’d say the first thing to do is to test this network: Can both nodes ping each other’s address on this network? Does ifconfig report errors on each interface? Etc. I bet when your replication network is up again, your cluster will run fine.<o:p></o:p></SPAN></P>
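<P class="MsoNormal"><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt" lang="EN-US">To look at the error counters, here is a minimal Python sketch, assuming a Linux node where /proc/net/dev is available (as far as I know, these are the counters ifconfig reports). The function name, the interface name eth1 and the sample numbers are made up for illustration:</SPAN></P>
<PRE>
def interface_errors(proc_net_dev_text):
    """Return {interface: (rx_errs, tx_errs)} parsed from /proc/net/dev text."""
    counters = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, stats = line.partition(":")
        fields = stats.split()
        if len(fields) != 16:
            continue  # not the 8 receive + 8 transmit columns we expect
        # Column 2 is receive errs, column 10 is transmit errs.
        counters[name.strip()] = (int(fields[2]), int(fields[10]))
    return counters


# Made-up sample in /proc/net/dev layout; on a real node use
# open("/proc/net/dev").read() instead.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth1: 5000000    4000  312    0    0   312          0         0  4000000    3500    7    0    0     0       7          0
"""

for name, (rx, tx) in sorted(interface_errors(SAMPLE).items()):
    print(f"{name}: rx_errs={rx} tx_errs={tx}")
</PRE>
<P class="MsoNormal"><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt" lang="EN-US">If the error counters on the replication interface keep growing between two runs, the link itself is the first suspect.</SPAN></P>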
<P class="MsoNormal"><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt" lang="EN-US"><o:p> </o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt" lang="EN-US">Pascal.<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt" lang="EN-US"><o:p> </o:p></SPAN></P>
<DIV>
<DIV style="BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0cm; PADDING-LEFT: 0cm; PADDING-RIGHT: 0cm; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">
<P style="TEXT-ALIGN: left" class="MsoNormal" align="left"><B><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; FONT-SIZE: 10pt">From:</SPAN></B><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; FONT-SIZE: 10pt"> <A href="mailto:drbd-user-bounces@lists.linbit.com" target="_blank">drbd-user-bounces@lists.linbit.com</A> [mailto:<A href="mailto:drbd-user-bounces@lists.linbit.com" target="_blank">drbd-user-bounces@lists.linbit.com</A>] <B>On behalf of</B> simon<BR><B>Sent:</B> Saturday, August 18, 2012 03:37<BR><B>To:</B> <A href="mailto:drbd-user@lists.linbit.com" target="_blank">drbd-user@lists.linbit.com</A><BR><B>Subject:</B> [DRBD-user] Drbd : PingAsk timeout, about 10 mins.<o:p></o:p></SPAN></P></DIV></DIV>
<P style="TEXT-ALIGN: left" class="MsoNormal" align="left"><o:p> </o:p></P>
<P class="MsoNormal"><SPAN lang="EN-US">Hi all,<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US"><o:p> </o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">I am using DRBD 8.3.7 in an HA setup. When the Master host dies and HA switches from Master to Slave, DRBD cannot switch over because it takes 10 minutes to mount its partition. That exceeds the HA timeout (in HA, the default timeout is 2 minutes).<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US"><o:p> </o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Why does DRBD take such a long time? <o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US"><o:p> </o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">The log is:<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739458] block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) <o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739468] block drbd1: asender terminated<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739470] block drbd1: Terminating asender thread<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739526] block drbd1: short read expecting header on sock: r=-512<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739666] block drbd1: Connection closed<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739672] block drbd1: conn( NetworkFailure -> Unconnected ) <o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739678] block drbd1: receiver terminated<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739680] block drbd1: Restarting receiver thread<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739683] block drbd1: receiver (re)started<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739687] block drbd1: conn( Unconnected -> WFConnection ) <o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:39 QD-CS-MDC-B pengine: [17776]: info: crm_log_init: Changed active directory to /usr/var/lib/heartbeat/cores/root<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.727331] NET: Registered protocol family 17<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.768912] block drbd0: role( Secondary -> Primary ) <o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772742] block drbd1: role( Secondary -> Primary ) <o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="COLOR: red" lang="EN-US">Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772997] block drbd1: Creating new current UUID<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:08:47 QD-CS-MDC-B su: (to hitv) root on none<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="COLOR: red" lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] block drbd0: PingAck did not arrive in time.<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032493] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) <o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032503] block drbd0: asender terminated<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032506] block drbd0: Terminating asender thread<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032514] block drbd0: Creating new current UUID<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032567] block drbd0: short read expecting header on sock: r=-512<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032868] block drbd0: Connection closed<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032875] block drbd0: conn( NetworkFailure -> Unconnected ) <o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032879] block drbd0: receiver terminated<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032881] block drbd0: Restarting receiver thread<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032884] block drbd0: receiver (re)started<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032888] block drbd0: conn( Unconnected -> WFConnection )<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="FONT-SIZE: 12pt" lang="EN-US">Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600888] kjournald starting. Commit interval 15 seconds<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="FONT-SIZE: 12pt" lang="EN-US">Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600956] EXT3-fs warning: maximal mount count reached, running e2fsck is recommended<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="FONT-SIZE: 12pt" lang="EN-US">Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601330] EXT3 FS on drbd0, internal journal<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="FONT-SIZE: 12pt" lang="EN-US">Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601334] EXT3-fs: recovery complete.<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="FONT-SIZE: 12pt" lang="EN-US">Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601392] EXT3-fs: mounted filesystem with ordered data mode. <o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="FONT-SIZE: 12pt" lang="EN-US"> <o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">According to the log, the timeout occurs in the PingAck operation.<o:p></o:p></SPAN></P>
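<P class="MsoNormal"><SPAN lang="EN-US">As a rough check of the 10 minutes, here is a small Python sketch that subtracts the kernel timestamps (the numbers in square brackets) of two lines copied from the log above: the promotion of drbd0 and the PingAck failure on drbd0. The function names are my own and only for illustration:</SPAN></P>
<PRE>
import re

# Two lines copied from the log above.
PROMOTED = ("Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.768912] "
            "block drbd0: role( Secondary -> Primary )")
PING_LOST = ("Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] "
             "block drbd0: PingAck did not arrive in time.")


def kernel_seconds(line):
    """Return the kernel uptime stamp in square brackets as a float."""
    match = re.search(r"\[(\d+\.\d+)\]", line)
    if match is None:
        raise ValueError("no kernel timestamp in line")
    return float(match.group(1))


def stall_seconds(first, second):
    """Seconds elapsed between two kernel log lines."""
    return kernel_seconds(second) - kernel_seconds(first)


gap = stall_seconds(PROMOTED, PING_LOST)
print(f"drbd0: {gap:.1f} s between promotion and PingAck failure "
      f"(about {gap / 60:.1f} minutes)")
</PRE>
<P class="MsoNormal"><SPAN lang="EN-US">The gap is roughly 600 seconds, which matches the 10 minutes I see before the filesystem on drbd0 is mounted.</SPAN></P>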
<P class="MsoNormal"><SPAN lang="EN-US"><o:p> </o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US"><o:p> </o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US">Thanks for your help.<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US"> <o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US"> </SPAN><SPAN style="FONT-SIZE: 12pt" lang="EN-US">simon<o:p></o:p></SPAN></P>
<P class="MsoNormal"><SPAN style="FONT-SIZE: 12pt" lang="EN-US"><o:p> </o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US"><o:p> </o:p></SPAN></P>
<P class="MsoNormal"><SPAN lang="EN-US"><o:p> </o:p></SPAN></P></DIV></BLOCKQUOTE><BR><SPAN></SPAN>