<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7650.28">
<TITLE>DRBD Xen problems</TITLE>
</HEAD>
<BODY>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Hello,</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"></SPAN><SPAN LANG="en-us"></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">I'm running DRBD 0.7.22 under rPath in a Xen DomU. Initially, node 1 ran in a non-Xen RHEL4 instance under VMware and node 2 ran in an rPath Xen DomU, and everything worked fine. We then migrated node 1 to match node 2 (same OS and configuration), and everything continued to work. About a week later, we started seeing messages like this on the console:</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:30 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: sock was shut down by peer</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: worker terminated</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: Connection lost.</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: sock was shut down by peer</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: worker terminated</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: Connection lost.</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">...</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Restarting DRBD on the affected node has no effect. Restarting the machine seems to fix the problem, but then HA takes over the node and shuts down the active primary (gracefully, even though auto failback is disabled; that is an HA problem, though).</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Any ideas on why this might be happening? Could heavy I/O load cause this?</FONT></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"></SPAN><SPAN LANG="en-us"></SPAN></P>
<P DIR=LTR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Gary Wayne Smith</FONT></SPAN></P>
</BODY>
</HTML>