"shot myself in the foot somewhere along the line" I'm glad you don't need any help on that subject. I have much experience shooting my own foot; I'm glad I don't need to share them with you<g>. If the primary's disk is the best you've got, and it's worth some file corruption (drbd abhors any single-bit difference from primary to secondary), I think the best course (which will probably crash and burn) is to dd the contents of the primary disk to a new, hopefully identical disk. On error, dd will probably stop. You can then restart it beyond the "bad" spot with seek=. After "less than 200" trys, you'll have a copy of the readable blocks on a disk which will run with no read errors, although there will be junk in the places that were bad before. Mount that to drbd, mount the secondary discard-my-data and let them sync up. Then fsck and hold on to your shorts. AFAICT, that's going to be your best (only?) shot. Not knowing what you did this time makes it difficult to direct you not to do that again, but I'm going to try, "Don't do that again". A simple suggestion is to do a weekly verify with email to you if anything is amuck. Of course, even that can fail. No email means no verify error, but it doesn't mean the CPU didn't overheat and shutdown one of the nodes (happened to me a couple weeks back. $2 fan). hth Dan -----Original Message----- From: Alan Robertson [mailto:alanr at unix.sh] Sent: Tuesday, September 18, 2012 1:24 PM To: Dan Barker Subject: Re: [DRBD-user] What to do about read errors on the primary? On 09/18/2012 10:24 AM, Dan Barker wrote: > "I have read errors on the primary side, which caused the secondary to > go into an "inconsistent" state." > > It's a shame you lost the logs. They would have said much. > > When drbd loses a primary disk, it continues to work, read/write, > using the secondary disk. The active node will remain primary, the > standby node will remain secondary, but the disk state will be > diskless/uptodate. 
> All I/O is going over the wire now, reads and writes; not just
> writes as is the normal (uptodate/uptodate) case.
>
> You have described a result different than that, so the precipitating
> events must be different too.

Thanks for the description of how it's supposed to work in this case. I didn't really know.

I may have shot myself in the foot somewhere along the line too... I certainly wouldn't count that out. :-D

The reason why the logs were lost is that I didn't notice for a long time... It could have been many months. This is my home system. It's actually been many years since I had a disk failure...

What I noticed was that some failover tests I was performing didn't work - it insisted on leaving things on the (now-broken) primary side. I then noticed the DRBD state wasn't in sync (and even that was a month or so ago - life has been busy). I tried to bring them into sync using a variety of techniques that didn't work. _Then_ I noticed the I/O errors.

The I/O errors are near the end of the disk. I wonder if some of the I/O errors were in the bitmap?

But after screwing around, and probably shooting myself in the foot, I'd like for the two sides to continue to try and stay in sync as much as they can. I don't want the synchronization to stop just because there might be an I/O error on one block. Or at least, I _think_ that's what I want. [In my case, of course, it was a lot more than one block - but less than 200].

In my case, the only absolutely up-to-date copy I have is in this failing drive. Not what I wanted...

I may have caused this by my flailing around trying to make failover work.

-- 
Alan Robertson <alanr at unix.sh> - @OSSAlanR

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
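
For readers who want to see the shape of the block-by-block copy Dan describes (read what is readable, step past each bad spot, and keep the input and output offsets aligned, which is what dd's skip= and seek= do for the input and output respectively), here is a minimal Python sketch. It is not taken from the thread: the function name rescue_copy, the 4096-byte block size, and the injectable read_block parameter are all assumptions made for illustration, and it assumes a Unix-like system where os.pread and os.pwrite are available. Unreadable blocks are filled with zeros, which is the "junk in the places that were bad before".

```python
import os


def rescue_copy(src_fd, dst_fd, total_size, block_size=4096, read_block=os.pread):
    """Copy total_size bytes from src_fd to dst_fd one block at a time.

    Hypothetical helper for illustration only. A block that raises OSError
    on read is replaced with zeros, and every block is written at the same
    offset it was read from, so source and destination stay aligned.
    Returns the list of byte offsets that could not be read.
    """
    bad_offsets = []
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        try:
            data = read_block(src_fd, length, offset)
        except OSError:
            # Unreadable block: remember where it was and carry on.
            data = b""
            bad_offsets.append(offset)
        # Pad failed (or short) reads with zeros so the block keeps its size.
        data = data.ljust(length, b"\0")
        os.pwrite(dst_fd, data, offset)
        offset += length
    return bad_offsets
```

In real use the two descriptors would come from os.open() on the failing disk and the replacement disk (device names are up to you, none are given in the thread), and the returned offsets tell you where to expect damage when you fsck afterwards. GNU dd offers conv=noerror,sync, which likewise continues after read errors and pads the failed blocks with zeros.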