[DRBD-user] DRBD is passing I/O-error to upper layer, but should not
lars.ellenberg at linbit.com
Mon Jan 28 18:26:50 CET 2013
On Mon, Jan 28, 2013 at 05:57:43PM +0100, Lars Ellenberg wrote:
> On Mon, Jan 28, 2013 at 04:04:32PM +0100, Matthias Hensler wrote:
> > Hi,
> > On Mon, Jan 28, 2013 at 03:26:49PM +0100, Felix Frank wrote:
> > > On 01/26/2013 06:19 PM, Matthias Hensler wrote:
> > > > Jan 26 15:32:21 lisa kernel: block drbd12: IO ERROR: neither local nor remote data, sector 0+0
> > > > Jan 26 15:32:21 lisa kernel: block drbd9: IO ERROR: neither local nor remote data, sector 0+0
> > >
> > > I'm not at all sure about this, but this does seem to indicate that
> > > the peer either has disk problems of its own (not likely) or otherwise
> > > cannot access this specific (first?) sector.
> > I do not think that there were any disk problems on the peer. In fact I
> > rebooted all virtual machines on the peer side prior to replacing the
> > failed disk. Everything went smoothly from there.
> Did you actually see DRBD pass errors up the stack
> (file systems remounting read-only, guest VMs noticing IO errors),
> or have you "only" been scared by the above log messages?
> If the latter, we may need to do some "cosmetic surgery".
> If the former, and this node still had an established connection to a
> healthy peer device, we'd have a "real" bug.
Ok, I think we may in fact have a real bug there.
I still need to reproduce and double-check, but it looks like
"empty flushes" (file system barriers) on a Diskless Primary
may be affected. Obviously this only applies on systems that
support those, which means drbd 8.3.14 and 8.3.15, on kernels
that implement "barrier" as REQ_FLUSH/REQ_FUA (not REQ_BARRIER).
drbd 8.4 should be good.
We'll see what we can do.
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.