On Thu, Feb 05, 2009 at 02:25:54PM +0000, Maros Timko wrote:
> Hi all,
>
> we are running Xen VMs on top of DRBD, DRBD resources are defined on top of
> LVMs. We use 64-bit CentOS 5.2 (2.6.18-92.1.22.el5xen). Previously we were
> testing the setup with DRBD RPMs from CentOS distribution (8.2.6-3), but we
> met an issue: device on top of which still runs Xen VM at the time of DRBD
> communication path is broken (we just removed dedicated crossover cable for
> simple tests) for some time, stalled at the sync progress at 100% after
> reconnection. This was easily reproducible and the more changes occurred on
> the device when disconnected the higher probability of the stalling. We use
> synchronous resync definition (using "after" config) so it means for us
> that all the followers are stuck in PausedSync states with inconsistent data
> state. Reconnection of this device solves the issue, however, there is no
> handler for such situations and devices itself looks happy (syncing although
> at 100%).
>
> So we tried to upgrade to DRBD 8.2.7 (GIT-hash:
> 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d) - it seemed like this release
> solved such issue. However, we still experience this, although not so often
> and the behaviour is different - device get stalled at e.g. 25% and then the
> number decreases. This is I think because still new changes are coming so
> the update of statistics gives such results.

likely something completely different than the issue described in the first
paragraph.

> I tried to look for stalling issues on the list but seems like there is no
> definite answer. If anyone has an experience with some kind of information
> on how to prevent such issues, it would be great. Most of the issues what I
> saw were related to network quality or huge amount of data that needs to be
> resynced. But we are trying simply plug out the cable.
>
> I am enclosing dump of related device only, all others are exactly the
> same excepting LVMs ...
> and corresponding /var/log/messages section.

This:

> Feb 5 09:35:06 svdom0-0148 kernel: drbd1: cs:SyncSource rs_left=19637 > rs_total=19587 (rs_failed 0)

is an interesting message.
This should not normally happen, though there are situations where it may
happen.

Which one, exactly, is this, 8.2.7?
Did you try with 8.3.0?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
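
One thing that can be read directly off the quoted kernel message is that rs_left (19637) is larger than rs_total (19587). Whether that is exactly what makes the message interesting is an assumption here, but a check for it is easy to sketch. The snippet below is a minimal, hypothetical log filter in Python: the function name, the regular expression, and the returned field names are illustrative and not part of DRBD. The optional ">" between the two counters is allowed for because the archived line is ambiguous on that point.

```python
import re

# Matches the resync counters as they appear in the quoted message.
# The optional ">" between rs_left and rs_total is an assumption about
# the log format, since the archived line is ambiguous.
RS_PATTERN = re.compile(
    r"(?P<dev>drbd\d+): cs:(?P<cs>\w+) rs_left=(?P<left>\d+)\s*>?\s*"
    r"rs_total=(?P<total>\d+) \(rs_failed (?P<failed>\d+)\)"
)


def check_resync_line(line):
    """Parse one log line; return None if it carries no resync counters.

    'anomalous' is set when more blocks are reported as left to sync
    than the reported resync total.
    """
    m = RS_PATTERN.search(line)
    if m is None:
        return None
    left = int(m.group("left"))
    total = int(m.group("total"))
    return {
        "device": m.group("dev"),
        "state": m.group("cs"),
        "rs_left": left,
        "rs_total": total,
        "rs_failed": int(m.group("failed")),
        "anomalous": left > total,
    }


# The line quoted in the message above.
line = ("Feb 5 09:35:06 svdom0-0148 kernel: drbd1: cs:SyncSource "
        "rs_left=19637 > rs_total=19587 (rs_failed 0)")
result = check_resync_line(line)
print(result)
```

Running such a filter over /var/log/messages on both nodes would only show when and on which device the counters disagree; it says nothing about the cause.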