Note: "permalinks" may not be as permanent as we would like;
direct links from old sources may well be a few messages off.
/ 2006-07-25 19:12:42 -0400
\ Brent A Nelson:
> Lars, none of your suggestions caused the drbd device to "unstick". Both nodes were using anticipatory
> io-scheduling (changing to deadline didn't get it going again, although I wanted to be running deadline, anyway, so
> it's good to know that I wasn't).
>
> Here are the relevant entries from /proc/drbd:
> Secondary:
> 3: cs:ServerForDLess st:Secondary/Primary ld:Consistent
> ns:772036 nr:18499220 dw:18499220 dr:772036 al:0 bm:465 lo:0 pe:0 ua:0 ap:0
>
> Primary:
> 3: cs:DiskLessClient st:Primary/Secondary ld:Inconsistent
> ns:18499696 nr:0 dw:13933352 dr:4995314 al:5062 bm:626 lo:2 pe:0 ua:0 ap:0

ok, I guess this is it.

"lo:2" means: io-requests submitted locally on this node, for which
neither a failure nor a completion event has been received yet.

Either we (drbd) sometimes have a problem counting things in the
presence of IO errors, or the broken driver/device failed to report
the error for these requests, which I have seen in the real world...

Since "ap" is zero, these are probably meta_data update requests.
It could be that we just "forgot" to properly clean up the reference
counters in one of the possible code paths resulting from the
inability to write to our meta data -- and now we are waiting forever
for "lo" to drop to zero again...

We'll look into that.

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
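
To make the reading of the counters above concrete, here is a minimal sketch that pulls the "key:value" fields out of one device entry as quoted in this thread and checks for the pattern described: "lo" above zero while "ap" is zero. The function names and the check itself are illustrative assumptions, not part of drbd or any of its tools. A single snapshot with lo > 0 is normal on a busy device; it only points at the problem discussed here if the value never drops back to zero.

```python
import re

# Matches two-letter "key:value" fields such as "cs:DiskLessClient" or "lo:2".
# The leading device number ("3:") is skipped because it is not alphabetic.
FIELD_RE = re.compile(r"\b([a-z]{2}):(\S+)")


def parse_drbd_entry(text):
    """Parse one device entry from /proc/drbd into a dict.

    Numeric counters become ints; state fields stay strings.
    """
    fields = {}
    for key, value in FIELD_RE.findall(text):
        fields[key] = int(value) if value.isdigit() else value
    return fields


def has_pending_local_io_without_app_io(fields):
    """Hypothetical check: local requests outstanding (lo > 0) although
    no application requests are pending (ap == 0)."""
    return fields.get("lo", 0) > 0 and fields.get("ap", 0) == 0


# The two entries quoted in the message above.
SECONDARY = """3: cs:ServerForDLess st:Secondary/Primary ld:Consistent
ns:772036 nr:18499220 dw:18499220 dr:772036 al:0 bm:465 lo:0 pe:0 ua:0 ap:0"""

PRIMARY = """3: cs:DiskLessClient st:Primary/Secondary ld:Inconsistent
ns:18499696 nr:0 dw:13933352 dr:4995314 al:5062 bm:626 lo:2 pe:0 ua:0 ap:0"""

for name, entry in (("Secondary", SECONDARY), ("Primary", PRIMARY)):
    parsed = parse_drbd_entry(entry)
    print(name, parsed["cs"], "lo =", parsed["lo"], "ap =", parsed["ap"],
          "suspicious" if has_pending_local_io_without_app_io(parsed) else "ok")
```

Run against the two entries from this thread, only the Primary is flagged. To tell a real hang from ordinary load, the same check would have to be repeated over time on fresh reads of /proc/drbd.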