[DRBD-user] DRBD device stalled after reconnection

Lars Ellenberg lars.ellenberg at linbit.com
Thu Feb 5 16:04:28 CET 2009



On Thu, Feb 05, 2009 at 02:25:54PM +0000, Maros Timko wrote:
> Hi all,
> 
> we are running Xen VMs on top of DRBD, and the DRBD resources are defined on
> top of LVMs. We use 64-bit CentOS 5.2 (2.6.18-92.1.22.el5xen). Previously we
> were testing the setup with the DRBD RPMs from the CentOS distribution
> (8.2.6-3), but we hit an issue: when the DRBD communication path is broken
> for some time (for simple tests we just removed the dedicated crossover
> cable) while a Xen VM is still running on top of a device, that device
> stalls with sync progress at 100% after reconnection. This was easily
> reproducible, and the more changes occurred on the device while it was
> disconnected, the higher the probability of the stall. We serialize the
> resyncs (using the "after" config), so for us this means that all the
> followers are stuck in PausedSync states with inconsistent data. Reconnecting
> this device solves the issue; however, there is no handler for such
> situations, and the device itself looks happy (syncing, although at 100%).
>
> So we upgraded to DRBD 8.2.7 (GIT-hash:
> 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d), and it seemed that this release
> solved the issue. However, we still experience it, although less often, and
> the behaviour is different: the device stalls at e.g. 25% and then the
> number decreases. I think this is because new changes keep coming in, so the
> statistics update gives such results.

Likely something completely different from the issue described in the
first paragraph.
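The "after" ordering from the first quoted paragraph can be illustrated with a small model: a resource may start its resync only when no resource earlier in its "after" chain is still in a sync state, so a single resource stuck in SyncSource keeps every follower paused. This is a simplified, hypothetical Python sketch; the state names are the ones shown in /proc/drbd, but the Resource class, may_sync_now(), schedule() and the resource names r0/r1/r2 are assumptions for illustration, not DRBD's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical model: connection states that count as "resync still in progress".
# The names match /proc/drbd output; the logic is an illustration only.
ACTIVE = {"SyncSource", "SyncTarget", "PausedSyncS", "PausedSyncT"}


@dataclass
class Resource:
    name: str
    wants_resync: bool           # True if this resource has out-of-sync blocks
    after: Optional[str] = None  # hypothetical: resource this one must wait for
    state: str = "Connected"


def may_sync_now(res: Resource, by_name: Dict[str, Resource]) -> bool:
    """Walk the whole "after" chain; any resource still resyncing blocks this one."""
    dep = res.after
    while dep is not None:
        prev = by_name[dep]
        if prev.state in ACTIVE:
            return False
        dep = prev.after
    return True


def schedule(resources: List[Resource]) -> Dict[str, str]:
    """Assign states; assumes resources are listed in dependency order."""
    by_name = {r.name: r for r in resources}
    for r in resources:
        if r.wants_resync and r.state not in ACTIVE:
            r.state = "SyncSource" if may_sync_now(r, by_name) else "PausedSyncS"
    return {r.name: r.state for r in resources}


res = [Resource("r0", True),
       Resource("r1", True, after="r0"),
       Resource("r2", True, after="r1")]
print(schedule(res))
# {'r0': 'SyncSource', 'r1': 'PausedSyncS', 'r2': 'PausedSyncS'}
```

In this model nothing moves r1 or r2 forward until r0 leaves SyncSource, which matches the symptom described: the followers wait in PausedSync while the first device sits at 100%.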

> I searched the list for stalling issues, but there seems to be no
> definitive answer. If anyone has experience with, or information on, how to
> prevent such issues, that would be great. Most of the issues I saw were
> related to network quality or to a huge amount of data that needed to be
> resynced. But we are simply unplugging the cable.
> 
> I am enclosing a dump of the related device only; all others are exactly
> the same except for the LVMs ... and the corresponding /var/log/messages section.

This:

> Feb  5 09:35:06 svdom0-0148 kernel: drbd1: cs:SyncSource rs_left=19637 > rs_total=19587 (rs_failed 0)

is an interesting message.
This should not normally happen,
though there are situations where it may happen.
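To see why rs_left exceeding rs_total is a red flag, here is a small arithmetic sketch using the numbers from the quoted log line. The progress formula below is an illustrative assumption (share of the total already synced), not necessarily the exact formula /proc/drbd uses, and the unsigned case assumes a 64-bit C unsigned long, as on an x86_64 kernel. It also shows Maros's hypothesis from the second paragraph: with the total fixed, newly dirtied blocks raise rs_left and the shown percentage drops.

```python
ULONG_MASK = (1 << 64) - 1  # assumption: 64-bit unsigned long (x86_64 kernel)


def sync_percent(rs_total: int, rs_left: int) -> float:
    """Illustrative progress formula: percentage of the resync already done."""
    return 100.0 * (rs_total - rs_left) / rs_total


def unsigned_done(rs_total: int, rs_left: int) -> int:
    """What rs_total - rs_left would wrap to with C unsigned long arithmetic."""
    return (rs_total - rs_left) & ULONG_MASK


# Numbers from the log line quoted above.
rs_left, rs_total = 19637, 19587

print(rs_left > rs_total)                         # True
print(round(sync_percent(rs_total, rs_left), 2))  # -0.26
print(unsigned_done(rs_total, rs_left))           # 18446744073709551566

# Hypothesis from the second paragraph: total fixed, new writes raise rs_left.
print(sync_percent(1000, 750))  # 25.0
print(sync_percent(1000, 850))  # 15.0
```

A signed calculation goes slightly negative, while an unguarded unsigned subtraction wraps to a huge value, so any code deriving progress from these counters has to treat rs_left > rs_total as an inconsistency rather than as progress.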

Which version, exactly, is this, 8.2.7?

Did you try with 8.3.0?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


