[DRBD-user] DRBD is passing I/O-error to upper layer, but should not

Mon Jan 28 16:04:32 CET 2013

Hi,

On Mon, Jan 28, 2013 at 03:26:49PM +0100, Felix Frank wrote:
> On 01/26/2013 06:19 PM, Matthias Hensler wrote:
> > Jan 26 15:32:21 lisa kernel: block drbd12: IO ERROR: neither local nor remote data, sector 0+0
> > Jan 26 15:32:21 lisa kernel: block drbd9: IO ERROR: neither local nor remote data, sector 0+0
> 
> I'm not at all sure about this, but this does seem to indicate that
> the peer either has disk problems of its own (not likely) or otherwise
> cannot access this specific (first?) sector.

I do not think that there were any disk problems on the peer. In fact, I
rebooted all virtual machines on the peer side before replacing the
failed disk, and everything went smoothly from there.

Also, there were no errors reported in the log on the peer side. The only
messages were the disk-state transitions UpToDate -> Failed -> Diskless
(reported for the primary side).

I rechecked the logfiles on the primary, and these messages ("IO ERROR:
neither local nor remote data, sector 0+0") are reported for all affected
DRBD devices. Some devices were in "Failed" state before the first
occurrence of this message, while others were already "Diskless". The
messages also still occur long after the incident.
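
For anyone who wants to repeat this check on their own logs, here is a
minimal sketch in Python. It assumes syslog lines formatted exactly like
the two quoted above; other kernel or syslog versions may differ. The
function name summarize and the SAMPLE list are only illustrative and
not part of DRBD. It groups the messages by device and records how often
each one appears, the first and last timestamp, and which sectors were
named.

```python
import re

# Pattern modelled on the two log lines quoted in this thread (assumption:
# classic syslog timestamp, then hostname, then "kernel: block drbdN: ...").
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"block (?P<dev>drbd\d+): IO ERROR: neither local nor remote data, "
    r"sector (?P<sector>\d+)\+(?P<size>\d+)"
)


def summarize(lines):
    """Group matching log lines by DRBD device.

    Returns a dict mapping device name to a dict with the number of
    matching lines, the first and last timestamp seen (in input order),
    and the set of "sector+size" strings that were reported.
    """
    summary = {}
    for line in lines:
        match = PATTERN.match(line)
        if match is None:
            continue  # not one of the messages we are looking for
        entry = summary.setdefault(
            match["dev"],
            {"count": 0, "first": match["stamp"], "last": None, "sectors": set()},
        )
        entry["count"] += 1
        entry["last"] = match["stamp"]
        entry["sectors"].add(f'{match["sector"]}+{match["size"]}')
    return summary


# The two lines quoted in this thread, plus one unrelated line.
SAMPLE = [
    "Jan 26 15:32:21 lisa kernel: block drbd12: IO ERROR: neither local nor remote data, sector 0+0",
    "Jan 26 15:32:21 lisa kernel: block drbd9: IO ERROR: neither local nor remote data, sector 0+0",
    "Jan 26 15:32:22 lisa kernel: some unrelated message",
]

for dev, info in sorted(summarize(SAMPLE).items()):
    print(dev, info["count"], info["first"], info["last"], sorted(info["sectors"]))
```

To run it against a real log, pass an open file object to summarize()
instead of SAMPLE. If every device shows only "0+0" in the sectors
column, that matches what I see here.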

Also, only sector "0+0" is reported, never a "real" sector. Maybe DRBD
throws this error in a path where it is not supposed to? I tried to figure
it out in the source, but I am not very familiar with the logic in
drbd_req.c. To me, however, it looks like the error is reported without
actually trying to read data from the remote side.

> It seems natural for the guest OS to abort the journal under such
> circumstance. The root cause is not clear to me though.

Yes, if an I/O error occurs I have no objection to the guest aborting
the journal. However, I do not think that this I/O error should have
been reported to the guest in the first place (the error policy is
explicitly set to "detach", and I see no indication that the peer disk
had any problem at all).

Regards,
Matthias