[DRBD-user] DRBD Xen problems

Philipp Reisner philipp.reisner at linbit.com
Wed Jan 3 12:03:04 CET 2007



On Wednesday, 3 January 2007 04:44, Gary W. Smith wrote:
> Hello,
>
> I'm running drbd 0.7.22 under rPath in a Xen DomU.  At first everything
> was working fine.  Node 1 was actually under a non-Xen RHEL4 instance
> under VMWare and node 2 was under rPath Xen DomU.  Everything worked
> fine there.  We migrated Node 1 to match the other node (OS and config).
> Everything continued to work fine and about a week later we started
> seeing things like this on the console:
>
> Dec 29 19:23:30 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection
> Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams
> Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: sock was shut down by peer
> Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe
> Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0
> Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.
> Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: worker terminated
> Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected
> Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: Connection lost.
> Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection
> Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams
> Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: sock was shut down by peer
> Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe
> Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: short read expecting header on sock: r=0
> Dec 29 19:23:51 pxtoaksql04a kernel: drbd0: Network error during initial handshake. I'll try again.
> Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: worker terminated
> Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected
> Dec 29 19:23:57 pxtoaksql04a kernel: drbd0: Connection lost.
> ...
>
> Restarting DRBD on the affected node does nothing.  Restarting this
> machine seems to fix the problem but then HA takes over the node and
> shuts down the active primary (gracefully even though auto failback is
> disabled -- this is a HA problem though).
>
> Any ideas on why this might be happening?  Would heavy IO load cause
> this?
>
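The quoted log repeats one fixed cycle of DRBD 0.7 connection states (Unconnected, WFConnection, WFReportParams, BrokenPipe, back to Unconnected): a connection is established, but the peer shuts the socket down during the initial handshake, and the node retries a few seconds later. A minimal Python sketch for pulling those transitions out of a kernel log and counting the failed handshakes might look like this. It assumes unwrapped syslog lines in exactly the format shown above; the helper names `cstate_transitions` and `failed_handshakes` are hypothetical, not part of any DRBD tool.

```python
import re

# Matches DRBD 0.7 state changes as they appear in the log above, e.g.
# "drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection".
CSTATE_RE = re.compile(r"(drbd\d+): .*cstate (\w+) --> (\w+)")


def cstate_transitions(lines):
    """Return (device, old_state, new_state) tuples in log order (hypothetical helper)."""
    result = []
    for line in lines:
        m = CSTATE_RE.search(line)
        if m:
            result.append(m.groups())
    return result


def failed_handshakes(transitions):
    """Count handshakes that broke off while waiting for the peer's parameters."""
    return sum(1 for _, old, new in transitions
               if old == "WFReportParams" and new == "BrokenPipe")


# Lines copied from the quoted log (one full retry cycle).
SAMPLE = [
    "Dec 29 19:23:30 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection",
    "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFConnection --> WFReportParams",
    "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: sock was shut down by peer",
    "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate WFReportParams --> BrokenPipe",
    "Dec 29 19:23:44 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate BrokenPipe --> Unconnected",
]

if __name__ == "__main__":
    for dev, old, new in cstate_transitions(SAMPLE):
        print(f"{dev}: {old} -> {new}")
    print("failed handshakes:", failed_handshakes(cstate_transitions(SAMPLE)))
```

Run over a longer excerpt, a steadily growing failed-handshake count with no other state changes is the pattern visible in the quoted log; the sketch only summarizes it and says nothing about why the peer closes the socket.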

To understand what is going on here, we need the kernel logs from both
machines covering such an event!
PS: Please make sure that the clocks of those machines are in sync
(use NTP!)
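Once both kernel logs are available, interleaving them by timestamp shows which side closes the socket first, but only if the clocks agree, which is why synchronized clocks matter here. A minimal Python sketch of such a merge, assuming classic syslog lines that start with "Mon DD HH:MM:SS" and each file already in time order; the helper names, the `year` default and the peer hostname in the usage lines are hypothetical.

```python
import heapq
from datetime import datetime


def syslog_time(line, year=2006):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    Classic syslog stamps carry no year, so one is supplied explicitly
    (hypothetical default matching the Dec 29 excerpt). Logs that span
    a year boundary are not handled.
    """
    stamp = " ".join(line.split()[:3])  # also copes with "Jan  3" padding
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def merge_node_logs(*logs, year=2006):
    """Interleave per-node logs (each already time-ordered) into one timeline."""
    return list(heapq.merge(*logs, key=lambda line: syslog_time(line, year)))


if __name__ == "__main__":
    node_a = [
        "Dec 29 19:23:30 pxtoaksql04a kernel: drbd0: drbd0_receiver [731]: cstate Unconnected --> WFConnection",
        "Dec 29 19:23:38 pxtoaksql04a kernel: drbd0: sock was shut down by peer",
    ]
    # Placeholder for the peer's log; hostname and message are hypothetical.
    node_b = ["Dec 29 19:23:35 peer-node kernel: drbd0: (peer-side message)"]
    for line in merge_node_logs(node_a, node_b):
        print(line)
```

`heapq.merge` is used instead of concatenating and sorting because each syslog file is already in order; if the node clocks drift, the merged order is simply wrong, and no amount of post-processing recovers it.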

-Phil

-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :


