On Wednesday, 3 January 2007 04:44, Gary W. Smith wrote:
> Hello,
>
> I'm running drbd 0.7.22 under rPath in a Xen DomU. At first everything
> was working fine. Node 1 was actually under a non-Xen RHEL4 instance
> under VMWare and node 2 was under rPath Xen DomU. Everything worked
> fine there. We migrated Node 1 to match the other node (OS and config).
> Everything continued to work fine and about a week later we started
> seeing things like this on the console:
>
> Dec 29 19:23:30 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection
> Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams
> Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: sock was shut down by peer
> Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe
> Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0
> Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.
> Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: worker terminated
> Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected
> Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: Connection lost.
> Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection
> Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams
> Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: sock was shut down by peer
> Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe
> Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0
> Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.
> Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: worker terminated
> Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected
> Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: Connection lost.
> ...
>
> Restarting DRBD on the affected node does nothing. Restarting this
> machine seems to fix the problem, but then HA takes over the node and
> shuts down the active primary (gracefully, even though auto failback is
> disabled -- this is an HA problem though).
>
> Any ideas on why this might be happening? Would heavy IO load cause
> this?
>

To understand what is going on here, we need the kernel logs of both
machines for such an event!

PS: Please make sure that the clocks of those machines are in sync
(use NTP!)

-Phil
--
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :
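
Once the kernel logs of both machines are at hand, it can help to interleave them by timestamp so that each "sock was shut down by peer" on one node can be read next to whatever the other node logged in the same second. The following is only a rough sketch of that idea, not a DRBD tool. It assumes classic syslog lines that begin with a stamp such as "Dec 29 19:23:30", an English month abbreviation, a year supplied by the caller (the stamp carries none), and clocks that are actually in sync. The names parse_syslog_line and merge_logs are made up for this illustration.

```python
from datetime import datetime


def parse_syslog_line(line, year):
    """Return (timestamp, text) for a classic syslog line, or None if it does not parse.

    Assumption: the line starts with a 15-character stamp like "Dec 29 19:23:30".
    The stamp has no year, so the caller has to supply one.
    """
    stamp = line[:15]
    try:
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # not a syslog line in the assumed format; skip it
    return ts, line.rstrip("\n")


def merge_logs(logs, year):
    """Interleave the log lines of several nodes in timestamp order.

    logs maps a node label (any string you choose) to an iterable of log lines.
    Returns a list of (timestamp, node_label, text) tuples.
    """
    entries = []
    for node, lines in logs.items():
        for line in lines:
            parsed = parse_syslog_line(line, year)
            if parsed is not None:
                ts, text = parsed
                entries.append((ts, node, text))
    # list.sort is stable, so lines with the same second keep their
    # original relative order within each node.
    entries.sort(key=lambda entry: entry[0])
    return entries
```

To use it, pass one open log file per node, for example merge_logs({"node1": open(path_a), "node2": open(path_b)}, 2006), where path_a and path_b are placeholders for wherever the two kernel logs were saved, and print the resulting tuples. The interleaving is only meaningful if the clocks agree, which is why NTP matters here.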