[DRBD-user] Re: DRBD with disk failure.

Lars Ellenberg Lars.Ellenberg at linbit.com
Wed Jul 26 01:50:23 CEST 2006


/ 2006-07-25 19:12:42 -0400
\ Brent A Nelson:
> Lars, none of your suggestions caused the drbd device to "unstick".  Both nodes were using anticipatory 
> io-scheduling (changing to deadline didn't get it going again, although I wanted to be running deadline, anyway, so 
> it's good to know that I wasn't).
> 
> Here are the relevant entries from /proc/drbd:
> Secondary:
>  3: cs:ServerForDLess st:Secondary/Primary ld:Consistent
>     ns:772036 nr:18499220 dw:18499220 dr:772036 al:0 bm:465 lo:0 pe:0 ua:0 ap:0
> 
> Primary:
>  3: cs:DiskLessClient st:Primary/Secondary ld:Inconsistent
>     ns:18499696 nr:0 dw:13933352 dr:4995314 al:5062 bm:626 lo:2 pe:0 ua:0 ap:0

ok, I guess this is it.
"lo:2" means:
  io-requests submitted locally on this node,
  for which neither a failure nor a completion event has been received yet.

Either we (drbd) sometimes have a problem counting things in the presence
of IO errors, or the broken driver/device failed to report the error
for these requests, which I have seen in the real world...

Since "ap" is zero, these are probably meta_data update requests.
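
To check for the same pattern on another node, the counter line can be
parsed mechanically. The following is only a rough sketch: the helper
names parse_counters and looks_stuck_on_meta_data are made up for this
illustration, and the rule "lo > 0 while ap == 0" is just the heuristic
from the reasoning above, not an official diagnostic. A single snapshot
cannot tell a stuck request from one that is merely in flight, so the
values should be read several times.

```python
import re

def parse_counters(line):
    """Turn a /proc/drbd counter line ('ns:1 nr:2 ... lo:2') into a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", line)}

def looks_stuck_on_meta_data(counters):
    # Heuristic only: local IO still outstanding (lo > 0) although
    # no application requests are pending (ap == 0).
    return counters.get("lo", 0) > 0 and counters.get("ap", 0) == 0

# Counter lines copied from the report quoted above.
primary = parse_counters(
    "ns:18499696 nr:0 dw:13933352 dr:4995314 al:5062 bm:626 lo:2 pe:0 ua:0 ap:0")
secondary = parse_counters(
    "ns:772036 nr:18499220 dw:18499220 dr:772036 al:0 bm:465 lo:0 pe:0 ua:0 ap:0")

print(looks_stuck_on_meta_data(primary))    # True
print(looks_stuck_on_meta_data(secondary))  # False
```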

It could be that we just "forgot" to properly clean up the reference
counters in one of the possible code paths resulting from the inability
to write to our meta data, and now we are waiting forever for "lo" to
drop to zero again... We'll look into that.
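
To illustrate the kind of bug suspected here, a toy model follows. It
is not DRBD code: ToyDevice, local_cnt and release_on_error are names
invented for this sketch, and the real code paths are more involved. It
only shows how an error path that returns without dropping its
reference leaves the counter above zero forever.

```python
class ToyDevice:
    """Toy model of a reference-counted IO path; not DRBD source code."""

    def __init__(self, release_on_error):
        self.local_cnt = 0                    # hypothetical stand-in for "lo"
        self.release_on_error = release_on_error

    def write_meta_data(self, disk_ok):
        self.local_cnt += 1                   # take a reference before submitting
        if disk_ok:
            self.local_cnt -= 1               # normal completion drops it
            return True
        if self.release_on_error:
            self.local_cnt -= 1               # a correct error path drops it too
        return False                          # otherwise the reference leaks

leaky = ToyDevice(release_on_error=False)
fixed = ToyDevice(release_on_error=True)
for dev in (leaky, fixed):
    dev.write_meta_data(disk_ok=False)        # two failed meta data writes
    dev.write_meta_data(disk_ok=False)

print(leaky.local_cnt, fixed.local_cnt)       # 2 0
```

Anything that waits for the counter of the leaky variant to reach zero
would wait indefinitely, which matches the symptom described above.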

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


