[DRBD-user] DRBD is passing I/O-error to upper layer, but should not

Mon Jan 28 18:26:50 CET 2013

On Mon, Jan 28, 2013 at 05:57:43PM +0100, Lars Ellenberg wrote:
> On Mon, Jan 28, 2013 at 04:04:32PM +0100, Matthias Hensler wrote:
> > Hi,
> > 
> > On Mon, Jan 28, 2013 at 03:26:49PM +0100, Felix Frank wrote:
> > > On 01/26/2013 06:19 PM, Matthias Hensler wrote:
> > > > Jan 26 15:32:21 lisa kernel: block drbd12: IO ERROR: neither local nor remote data, sector 0+0
> > > > Jan 26 15:32:21 lisa kernel: block drbd9: IO ERROR: neither local nor remote data, sector 0+0
> > > 
> > > I'm not at all sure about this, but this does seem to indicate that
> > > the peer either has disk problems of its own (not likely) or otherwise
> > > cannot access this specific (first?) sector.
> > 
> > I do not think that there were any disk problems on the peer. In fact I
> > did reboot all virtual machines on the peer side prior to replacing the
> > failed disk. Everything went smoothly from there.
> 
> 
> Did you actually see DRBD pass errors up the stack
> (file systems remounting read-only, guest VMs noticing IO errors),
> or have you "only" been scared by the above log message?
> 
> 
> If the latter, we may need to do some "cosmetic surgery".
> 
> If the former, and this node still had an established connection to a
> healthy peer device, we'd have a "real" bug.

OK. I think we may in fact have a real bug there.
I still need to reproduce and double-check, but it looks like
"empty flushes" (file system barriers) on a Diskless Primary
may be affected. That obviously applies only to setups that support
them, which means drbd 8.3.14 and 8.3.15 on kernels that implement
"barrier" as REQ_FLUSH/REQ_FUA (not REQ_BARRIER).
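
The "sector 0+0" in the quoted log lines would fit that picture, if the
number after the "+" is the request length in sectors (that reading is an
assumption here, not something confirmed in this thread): an empty flush
carries no data, so its length would be zero. To check whether other
nodes logged the same thing, a minimal Python sketch along these lines
could pick such lines out of a kernel log. The function name and the
regular expression are illustrative only and are built from the two log
lines quoted above.

```python
import re

# Matches the message format quoted in this thread. Treating the two
# numbers as "start sector" and "length in sectors" is an assumption.
IO_ERROR_RE = re.compile(
    r"block (?P<dev>drbd\d+): IO ERROR: neither local nor remote data, "
    r"sector (?P<sector>\d+)\+(?P<length>\d+)"
)


def find_zero_length_errors(lines):
    """Return (device, sector) pairs for errors logged with length 0."""
    hits = []
    for line in lines:
        match = IO_ERROR_RE.search(line)
        # Keep only requests whose logged length is zero; under the
        # assumption above, those are candidates for empty flushes.
        if match and int(match.group("length")) == 0:
            hits.append((match.group("dev"), int(match.group("sector"))))
    return hits
```

Feeding it the lines of a saved kernel log (for example an open file
object) returns one pair per matching message. A hit only says that a
zero-length request failed; it does not by itself show that the error
reached the file system or the guest.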

drbd 8.4 should be good.

We'll see what we can do.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.