[DRBD-user] DRBD Xen problems

Gary W. Smith gary at primeexalia.com
Wed Jan 3 04:44:45 CET 2007


Hello, 

I'm running DRBD 0.7.22 on rPath in a Xen DomU.  Originally, node 1 ran
on a non-Xen RHEL4 instance under VMware and node 2 ran on an rPath Xen
DomU, and everything worked fine in that setup.  We then migrated node 1
to match node 2 (same OS and configuration).  Everything continued to
work fine for about a week, after which we started seeing messages like
these on the console:

Dec 29 19:23:30 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection
Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams
Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: sock was shut down by peer
Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe
Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0
Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.
Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: worker terminated
Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected
Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: Connection lost.
Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection
Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams
Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: sock was shut down by peer
Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe
Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0
Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.
Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: worker terminated
Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected
Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: Connection lost.
...
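To make the repeating pattern in the excerpt easier to see, here is a minimal Python sketch (an illustration only, not a DRBD tool) that pulls the cstate transitions out of saved kernel log lines and counts how often the connection dies in WFReportParams. The regular expression assumes the DRBD 0.7-style message format shown above, and the names `cstate_transitions`, `handshake_failures` and `SAMPLE_LOG` are hypothetical.

```python
import re
from collections import Counter

# Matches DRBD 0.7-style connection-state lines as they appear in the excerpt,
# e.g. "drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection".
# Assumption: other DRBD versions may log state changes in a different format.
CSTATE_RE = re.compile(r"(drbd\d+): \S+ \[\d+\]: cstate (\w+) --> (\w+)")

# The excerpt from this post, one kernel message per line.
SAMPLE_LOG = [
    "Dec 29 19:23:30 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection",
    "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams",
    "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: sock was shut down by peer",
    "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe",
    "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0",
    "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.",
    "Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: worker terminated",
    "Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected",
    "Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: Connection lost.",
    "Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection",
    "Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams",
    "Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: sock was shut down by peer",
    "Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe",
    "Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0",
    "Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.",
    "Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: worker terminated",
    "Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected",
    "Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: Connection lost.",
]


def cstate_transitions(lines):
    """Return (device, old_state, new_state) tuples in log order."""
    transitions = []
    for line in lines:
        match = CSTATE_RE.search(line)
        if match:
            transitions.append(match.groups())
    return transitions


def handshake_failures(transitions):
    """Count WFReportParams --> BrokenPipe transitions per device.

    In the excerpt each of these follows "sock was shut down by peer",
    i.e. the TCP connection was accepted but closed by the peer before
    the initial handshake finished.
    """
    return Counter(dev for dev, old, new in transitions
                   if old == "WFReportParams" and new == "BrokenPipe")


if __name__ == "__main__":
    transitions = cstate_transitions(SAMPLE_LOG)
    for dev, old, new in transitions:
        print(f"{dev}: {old} -> {new}")
    print(dict(handshake_failures(transitions)))
```

Run against a longer capture of the log, a failure count that keeps growing at a steady interval would suggest the peer is rejecting every reconnect attempt, rather than the link dropping occasionally under load.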

Restarting DRBD on the affected node does not help.  Rebooting the
machine seems to fix the problem, but then HA takes over on that node
and gracefully shuts down the active primary, even though auto failback
is disabled (that part is an HA problem, though).

Any ideas on why this might be happening?  Could heavy I/O load cause
this?

Gary Wayne Smith