[DRBD-user] What to do about read errors on the primary?

Dan Barker dbarker at visioncomm.net
Tue Sep 18 19:39:54 CEST 2012


"shot myself in the foot somewhere along the line"

I'm glad you don't need any help on that subject. I have plenty of experience
shooting my own foot; I'm glad I don't need to share it with you<g>.

If the primary's disk is the best you've got, and it's worth risking some
file corruption (drbd abhors any single-bit difference between primary and
secondary), I think the best course (which will probably crash and burn) is
to dd the contents of the primary disk to a new, hopefully identical disk.
On a read error, dd will probably stop. You can then restart it just beyond
the "bad" spot, using skip= to move past it on the input and seek= to the
same position on the output. After "less than 200" tries, you'll have a copy
of the readable blocks on a disk that runs with no read errors, although
there will be junk in the places that were bad before.
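If restarting dd by hand gets tedious, here is a minimal Python sketch of the
same idea. It reads the source in fixed-size blocks, and where a read fails
(usually EIO on a bad sector) it writes zeros in that spot and keeps going,
roughly what GNU dd's conv=noerror,sync does (GNU ddrescue is a tool built for
exactly this job). The function name and the 4 KiB default block size are my
own choices, not anything from drbd; smaller blocks lose less data around each
bad sector but run slower. Only run it with nothing mounted on, or writing to,
either disk.

```python
import os


def copy_skipping_bad_blocks(src_path, dst_path, block_size=4096):
    """Copy src_path to dst_path block by block, zero-filling unreadable blocks.

    Roughly what `dd conv=noerror,sync` does. Returns the byte offsets of the
    blocks that could not be read, so you know where the junk is.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            # Seeking to the end gives the size of a regular file or a block device.
            size = os.lseek(src, 0, os.SEEK_END)
            offset = 0
            while offset < size:
                length = min(block_size, size - offset)
                try:
                    data = os.pread(src, length, offset)
                except OSError:
                    # Unreadable block: remember where it was and write zeros instead.
                    bad_offsets.append(offset)
                    data = bytes(length)
                # Write at the same offset so good blocks stay where they belong.
                os.pwrite(dst, data, offset)
                offset += length
            os.fsync(dst)
        finally:
            os.close(dst)
    finally:
        os.close(src)
    return bad_offsets
```

For example, copy_skipping_bad_blocks("/dev/sdb", "/dev/sdc") (device names
hypothetical) returns the list of byte offsets that were zero-filled, which
tells you roughly where to expect damage when you fsck.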

Attach that disk to drbd, connect the secondary with discard-my-data and let
them sync up. Then fsck and hold on to your shorts.

AFAICT, that's going to be your best (only?) shot.

Not knowing what you did this time makes it difficult to direct you not to
do that again, but I'm going to try, "Don't do that again".

A simple suggestion is to run a weekly verify that emails you if anything
is amiss. Of course, even that can fail. No email means no verify error, but
it doesn't mean the CPU didn't overheat and shut down one of the nodes
(that happened to me a couple of weeks back; it was a $2 fan).
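The check half of that weekly job might look like the Python sketch below. It
assumes DRBD 8.x, where /proc/drbd shows an oos: (out of sync, in KiB) counter
for each device; that cron has already run `drbdadm verify all` (which needs a
verify-alg set in the resource's net section) and that the verify has
finished; and that a mail relay is listening on localhost. The function names
and addresses are hypothetical.

```python
import re
import smtplib
from email.message import EmailMessage

# DRBD 8.x /proc/drbd: a device line like " 0: cs:Connected ..." is followed
# by a counters line that includes "oos:<KiB>".
_DEVICE_RE = re.compile(r"\s*(\d+):\s+cs:")
_OOS_RE = re.compile(r"\boos:(\d+)")


def out_of_sync_kib(proc_text):
    """Map DRBD minor number -> out-of-sync KiB, parsed from /proc/drbd text."""
    result = {}
    minor = None
    for line in proc_text.splitlines():
        dev = _DEVICE_RE.match(line)
        if dev:
            minor = int(dev.group(1))
        oos = _OOS_RE.search(line)
        if oos and minor is not None:
            result[minor] = int(oos.group(1))
    return result


def build_alert(problems, to_addr):
    """Build a plain-text email listing each device with out-of-sync data."""
    msg = EmailMessage()
    msg["Subject"] = "DRBD verify found out-of-sync blocks"
    msg["From"] = to_addr
    msg["To"] = to_addr
    msg.set_content("\n".join(
        f"minor {minor}: {kib} KiB out of sync"
        for minor, kib in sorted(problems.items())
    ))
    return msg


def check_and_alert(proc_path="/proc/drbd", to_addr="root@localhost",
                    smtp_host="localhost"):
    """Email to_addr if any device reports out-of-sync data; return what was found."""
    with open(proc_path) as f:
        problems = {m: kib for m, kib in out_of_sync_kib(f.read()).items() if kib > 0}
    if problems:
        with smtplib.SMTP(smtp_host) as smtp:  # assumes a local mail relay
            smtp.send_message(build_alert(problems, to_addr))
    return problems
```

You would call check_and_alert() from cron some hours after the verify has
been started. Note that it only speaks up when something is out of sync, so it
shares the weakness above: silence could also mean a dead node. Sending a
short "all clear" message every week as well would catch that case.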



-----Original Message-----
From: Alan Robertson [mailto:alanr at unix.sh] 
Sent: Tuesday, September 18, 2012 1:24 PM
To: Dan Barker
Subject: Re: [DRBD-user] What to do about read errors on the primary?

On 09/18/2012 10:24 AM, Dan Barker wrote:
> "I have read errors on the primary side, which caused the secondary to 
> go into an "inconsistent" state."
> It's a shame you lost the logs. They would have said much.
> When drbd loses a primary disk, it continues to work, read/write, 
> using the secondary disk. The active node will remain primary, the 
> standby node will remain secondary, but the disk state will be 
> diskless/uptodate. All I/O is going over the wire now, reads and 
> writes; not just writes as is the normal
> (uptodate/uptodate) case.
> You have described a result different than that, so the precipitating 
> events must be different too.

Thanks for the description of how it's supposed to work in this case.  I
didn't really know.

I may have shot myself in the foot somewhere along the line too...  I
certainly wouldn't count that out. :-D

The reason why the logs were lost is that I didn't notice for a long time...
It could have been many months.  This is my home system.  It's actually been
many years since I had a disk failure...

What I noticed was that some failover tests I was performing didn't work
- it insisted on leaving things on the (now-broken) primary side.  I then
noticed the DRBD state wasn't in sync (and even that was a month or so ago -
life has been busy).  I tried to bring them into sync using a variety of
techniques that didn't work. _Then_ I noticed the I/O errors.

The I/O errors are near the end of the disk.  I wonder if some of the I/O
errors were in the bitmap?

But after screwing around, and probably shooting myself in the foot, I'd
like for the two sides to continue to try and stay in sync as much as they
can.  I don't want the synchronization to stop just because there might be
an I/O error on one block.  Or at least, I _think_ that's what I want.  [In
my case, of course, it was a lot more than one block - but less than 200].

In my case, the only absolutely up-to-date copy I have is on this failing
drive.  Not what I wanted...  I may have caused this by my flailing around
trying to make failover work.

    Alan Robertson <alanr at unix.sh> - @OSSAlanR

"Openness is the foundation and preservative of friendship...  Let me claim
from you at all times your undisguised opinions." - William Wilberforce
