[DRBD-user] What to do about read errors on the primary?

Lars Ellenberg lars.ellenberg at linbit.com
Fri Sep 21 11:14:25 CEST 2012

> ... you should have "on-io-error detach;"  ...

> So for now, you better have some RAID below DRBD.
> hard or soft (md), any redundant raid level is fine.
> Because, if your SyncSource fails during resync,
> you are out-of-luck.
> With sufficient context information, if you know what you are doing,
> and given specific failure modes, you then may be able to fix this by hand.
> One such theoretically fixable failure mode would be read errors on
> different sectors, while the respective sector on the other node still
> has the "correct" data (and you are sure that it is still the correct
> data).
> But no, DRBD does not do any such "advanced" fixing of multiple
> "simultaneously" failing replicas, yet.

A bit more detail about how to "fix this by hand".

First, "this" is supposed to be the following scenario,
which may be similar enough to what has happened.

Normal operation.

IO error (read or write) in the data area on Primary.

Because of the "on-io-error pass_on;" setting, we do not detach,
but mark the affected block as out-of-sync in our bitmap,
and change the disk state to "Inconsistent".
For this you need a sufficiently recent version of DRBD (>= 8.3.11, iirc).

A bit later, we have an additional error on the peer,
but in a different sector.

Fortunately, the sector numbers are in the log.
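
As a rough sketch of collecting those sector numbers: the snippet below
assumes the errors show up as the usual kernel block-layer lines of the
form "I/O error, dev sdb, sector 123456". That format is an assumption,
as is the helper name failed_sectors; adjust the pattern to what your
logs actually contain. Also note that logged sector numbers may be
relative to the whole disk rather than to the partition or the DRBD
device, and translating them is not covered here.

```python
import re

# Assumed log shape: "... I/O error, dev sdb, sector 123456".
# This pattern is an illustration, not something DRBD guarantees.
IO_ERROR_RE = re.compile(r"I/O error, dev ([^,\s]+), sector (\d+)")


def failed_sectors(log_lines):
    """Map each device name to the sorted list of sectors that failed."""
    found = {}
    for line in log_lines:
        match = IO_ERROR_RE.search(line)
        if match:
            dev = match.group(1)
            sector = int(match.group(2))
            # A set drops repeated reports of the same sector.
            found.setdefault(dev, set()).add(sector)
    return {dev: sorted(sectors) for dev, sectors in found.items()}
```

Run it over the logs of both nodes and compare the two lists; the
by-hand fix only makes sense if they do not overlap.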

You could now copy the affected sectors from the other node
(where they are supposedly still "correct"), write them locally, and hope
that by writing, you trigger sector re-allocation, and thereby "heal"
the broken sectors.
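
A minimal sketch of that copy step, under several assumptions: the
sector numbers are in 512-byte units, src_path is some readable copy of
the peer's data at the same offsets (how you get it across is up to
you), dst_path is the local backing device, and DRBD is not using that
device while you do this. The function name copy_sectors is made up for
illustration.

```python
import os

SECTOR_SIZE = 512  # assumption: logged sector numbers are 512-byte units


def copy_sectors(src_path, dst_path, sectors, sector_size=SECTOR_SIZE):
    """Copy each listed sector from src_path to the same offset in dst_path."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY)
    try:
        for sector in sectors:
            offset = sector * sector_size
            data = os.pread(src, sector_size, offset)
            if len(data) != sector_size:
                raise IOError("short read at sector %d" % sector)
            os.pwrite(dst, data, offset)
        # Push the writes down to the device before we report success.
        os.fsync(dst)
    finally:
        os.close(src)
        os.close(dst)
```

Whether the drive actually re-allocates the sector on write is up to the
drive; read the sectors back afterwards to check.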

If that is no longer possible, you could just write zeros to those
sectors, and again hope for "healing" by re-allocation.
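
Zeroing could be sketched like this, again assuming 512-byte sectors and
a device that DRBD is not currently using; zero_sectors is a
hypothetical helper name, and this destroys whatever was in those
sectors.

```python
import os


def zero_sectors(dev_path, sectors, sector_size=512):
    """Overwrite each listed sector of dev_path with zeros."""
    zeros = bytes(sector_size)
    fd = os.open(dev_path, os.O_WRONLY)
    try:
        for sector in sectors:
            os.pwrite(fd, zeros, sector * sector_size)
        # Make sure the zeros reach the device, not just the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)
```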

Or you could (dd_rescue) copy all still readable data to a different
drive, zeroing out the unreadable sectors.
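
To illustrate the idea behind such a rescue copy (this is not how
dd_rescue itself is implemented, just a sketch of the technique): read
block by block, and wherever a read fails, write zeros at that offset
instead. The function name rescue_copy and the injectable pread
parameter are assumptions made so the logic can be tried without a
failing disk.

```python
import os


def rescue_copy(src_fd, dst_fd, size, block_size=512, pread=os.pread):
    """Copy size bytes from src_fd to dst_fd; unreadable blocks become zeros.

    Returns the byte offsets of the blocks that could not be read.
    """
    bad_offsets = []
    for offset in range(0, size, block_size):
        length = min(block_size, size - offset)
        try:
            data = pread(src_fd, length, offset)
        except OSError:
            data = b""
        if len(data) != length:
            # Read error or short read: remember it and substitute zeros.
            bad_offsets.append(offset)
            data = bytes(length)
        os.pwrite(dst_fd, data, offset)
    return bad_offsets
```

Keep the returned offsets: they tell you which parts of the copy are
zero-filled rather than real data.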

This should get you a workable copy of your data,
in the first case even an actually good one.

In the zeroed-sector cases it is obviously a "damaged" one,
but you should be able to recover from there with standard tools.

