Hello,

I'm running drbd 0.7.22 under rPath in a Xen DomU. At first everything was working fine. Node 1 was actually under a non-Xen RHEL4 instance under VMWare and node 2 was under rPath Xen DomU. Everything worked fine there. We migrated Node 1 to match the other node (OS and config). Everything continued to work fine and about a week later we started seeing things like this on the console:

Dec 29 19:23:30 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection
Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams
Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: sock was shut down by peer
Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe
Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0
Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.
Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: worker terminated
Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected
Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: Connection lost.
Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection
Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams
Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: sock was shut down by peer
Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe
Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0
Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.
Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: worker terminated
Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected
Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: Connection lost.
...
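For anyone triaging a similar log, here is a minimal Python sketch that tallies the cstate transitions so that a repeating handshake loop stands out. It is not part of DRBD. The function name count_transitions and the regular expression are hypothetical, and the pattern assumes the exact line layout shown in the excerpt above.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   ... kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection
CSTATE_RE = re.compile(r"(\w+): \w+ \[\d+\]: cstate (\w+) --> (\w+)")


def count_transitions(log_text):
    """Count (device, old_state, new_state) transitions found in log_text.

    Works on both line-per-entry and flattened text, because the pattern
    is searched across the whole string rather than line by line.
    """
    counts = Counter()
    for match in CSTATE_RE.finditer(log_text):
        device, old_state, new_state = match.groups()
        counts[(device, old_state, new_state)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_cstate.py < /var/log/messages
    for (device, old, new), n in count_transitions(sys.stdin.read()).most_common():
        print(f"{device}: {old} --> {new}: {n}")
```

If the same four transitions (Unconnected, WFConnection, WFReportParams, BrokenPipe, and back) show up with equal counts, the node is cycling through a failed handshake rather than hitting an occasional drop.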
Restarting DRBD on the affected node does nothing. Restarting the machine seems to fix the problem, but then HA takes over the node and shuts down the active primary (gracefully, even though auto failback is disabled -- this is an HA problem though).

Any ideas on why this might be happening? Would heavy IO load cause this?

Gary Wayne Smith