[DRBD-user] DRBD Xen problems

Gary W. Smith gary at primeexalia.com
Wed Jan 3 20:38:28 CET 2007


It will be a little while before I can get the log information together.
We will be restarting the clusters this weekend.  After that the data
should be synced.  When that's done we'll start watching for the
failure.
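
For watching for the failure, a minimal sketch (not part of DRBD) is to scan the syslog for the cstate transitions quoted below and flag each handshake that ends in BrokenPipe. The regular expression is an assumption based only on the line format in this thread, and the names cstate_transitions and failed_handshakes are hypothetical helpers; other DRBD versions or syslog setups may format these lines differently.

```python
import re

# Assumed line format, taken from the log excerpt in this thread, e.g.
# "Dec 29 19:23:38 host kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe"
CSTATE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<dev>drbd\d+):\s+\S+\s+\[\d+\]:\s+cstate\s+(?P<old>\w+)\s+-->\s+(?P<new>\w+)"
)


def cstate_transitions(lines):
    """Return (stamp, host, device, old_state, new_state) for each cstate line."""
    found = []
    for line in lines:
        m = CSTATE_RE.match(line.strip())
        if m:
            found.append((m["stamp"], m["host"], m["dev"], m["old"], m["new"]))
    return found


def failed_handshakes(lines):
    """Return only the transitions WFReportParams --> BrokenPipe."""
    return [t for t in cstate_transitions(lines)
            if t[3] == "WFReportParams" and t[4] == "BrokenPipe"]


# Sample lines copied from the console output quoted in this thread.
SAMPLE = [
    "Dec 29 19:23:30 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection",
    "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams",
    "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: sock was shut down by peer",
    "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe",
]

if __name__ == "__main__":
    for stamp, host, dev, old, new in failed_handshakes(SAMPLE):
        print(f"{stamp} {host} {dev}: handshake failed ({old} --> {new})")
```

Running the same scan over the kernel logs of both nodes gives the timestamps of each failed handshake, which only line up if the clocks on the two machines are in sync.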

Gary 

> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com
> [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Philipp Reisner
> Sent: Wednesday, January 03, 2007 3:03 AM
> To: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] DRBD Xen problems
> 
> On Wednesday, 3 January 2007 04:44, Gary W. Smith wrote:
> > Hello,
> >
> > I'm running drbd 0.7.22 under rPath in a Xen DomU.  At first everything
> > was working fine.  Node 1 was actually under a non-Xen RHEL4 instance
> > under VMWare and node 2 was under rPath Xen DomU.  Everything worked
> > fine there.  We migrated Node 1 to match the other node (OS and config).
> > Everything continued to work fine and about a week later we started
> > seeing things like this on the console:
> >
> > Dec 29 19:23:30 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection
> > Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams
> > Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: sock was shut down by peer
> > Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe
> > Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0
> > Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.
> > Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: worker terminated
> > Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected
> > Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: Connection lost.
> > Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection
> > Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams
> > Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: sock was shut down by peer
> > Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe
> > Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0
> > Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.
> > Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: worker terminated
> > Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected
> > Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: Connection lost.
> > ...
> >
> > Restarting DRBD on the affected node does nothing.  Restarting this
> > machine seems to fix the problem, but then HA takes over the node and
> > shuts down the active primary (gracefully, even though auto failback is
> > disabled -- this is an HA problem though).
> >
> > Any ideas on why this might be happening?  Would heavy IO load cause
> > this?
> >
> 
> To understand what is going on here we need the kernel logs from both
> machines covering such an event!
> PS: Please make sure that the clocks of those machines are in sync
> (use NTP!)
> 
> -Phil
> 
> --
> : Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
> : LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
> : Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


