[DRBD-user] DRBD device stalled after reconnection

Maros Timko timkom at gmail.com
Thu Feb 5 16:26:06 CET 2009


Hi Lars,

I downloaded the trunk of 8.2 some weeks ago (GIT-hash:
61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d) and created rpms. I assume it is
8.2.7.

Unfortunately, I have not yet tested such cases against 8.3.

Thanks

2009/2/5 Lars Ellenberg <lars.ellenberg at linbit.com>

> On Thu, Feb 05, 2009 at 02:25:54PM +0000, Maros Timko wrote:
> > Hi all,
> >
> > we are running Xen VMs on top of DRBD; the DRBD resources are defined on
> > top of LVM volumes. We use 64-bit CentOS 5.2 (2.6.18-92.1.22.el5xen).
> > Previously we were testing the setup with the DRBD RPMs from the CentOS
> > distribution (8.2.6-3), but we hit an issue: if a device still has a Xen
> > VM running on top of it while the DRBD communication path is broken for
> > some time (for simple tests we just removed the dedicated crossover
> > cable), it stalls with the sync progress at 100% after reconnection.
> > This was easily reproducible, and the more changes occurred on the
> > device while disconnected, the higher the probability of a stall. We use
> > a synchronous resync definition (using the "after" config option), so
> > for us it means that all the followers are stuck in PausedSync states
> > with inconsistent data. Reconnecting this device solves the issue;
> > however, there is no handler for such situations, and the device itself
> > looks happy (syncing, although at 100%).
> >
> > So we tried to upgrade to DRBD 8.2.7 (GIT-hash:
> > 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d) - it seemed like this release
> > solved the issue. However, we still experience it, although not as
> > often, and the behaviour is different: the device gets stalled at e.g.
> > 25% and then the number decreases. I think this is because new changes
> > are still coming in, so the statistics update gives such results.
>
> likely something completely different than the issue described in the
> first paragraph.
>
> > I tried to look for stalling issues on the list, but it seems there is
> > no definite answer. If anyone has experience with this, or information
> > on how to prevent such issues, it would be great. Most of the issues I
> > saw were related to network quality or to a huge amount of data that
> > needed to be resynced. But we are simply unplugging the cable.
> >
> > I am enclosing the dump of the related device only, all others are
> > exactly the same except for the LVMs ... and the corresponding
> > /var/log/messages section.
>
> This:
>
> > Feb  5 09:35:06 svdom0-0148 kernel: drbd1: cs:SyncSource rs_left=19637 > rs_total=19587 (rs_failed 0)
>
> is an interesting message.
> This should not normally happen,
> though there are situations where it may happen.
>
> Which one, exactly, is this, 8.2.7?
>
> Did you try with 8.3.0?
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
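
For anyone who wants to check their own logs for the kernel message quoted
in this thread, here is a minimal sketch in Python. The regular expression
is derived only from the single sample line above, so the field layout is
an assumption, and the names RS_LINE and find_rs_overruns are made up for
this example; they are not part of DRBD.

```python
import re

# Pattern modelled on the one sample line quoted in this thread.
# Assumption: other occurrences of the message use the same layout.
RS_LINE = re.compile(
    r"(?P<dev>drbd\d+): cs:(?P<cs>\w+) "
    r"rs_left=(?P<left>\d+) > rs_total=(?P<total>\d+)"
)


def find_rs_overruns(lines):
    """Return (device, state, rs_left, rs_total) for each matching line."""
    hits = []
    for line in lines:
        m = RS_LINE.search(line)
        if m:
            hits.append((m["dev"], m["cs"], int(m["left"]), int(m["total"])))
    return hits


# The sample is the log line from this thread, split only for readability.
sample = [
    "Feb  5 09:35:06 svdom0-0148 kernel: drbd1: cs:SyncSource "
    "rs_left=19637 > rs_total=19587 (rs_failed 0)",
]
print(find_rs_overruns(sample))
```

To scan a real log, pass an open file instead of the sample list, for
example find_rs_overruns(open("/var/log/messages")).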