Hi,

I recently had the following occur on the primary node of a DRBD resource, running DRBD 8.4.5 on CentOS 6.6 (kernel 2.6.32-504.el6.x86_64):

  Nov 11 05:34:54 kernel: block drbd5: Remote failed to finish a request within ko-count * timeout
  Nov 11 05:34:54 kernel: block drbd5: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )

Being unfamiliar with ko-count, I looked at the documentation and found:

  ko-count number
    In case the secondary node fails to complete a single write
    request for count times the timeout, it is expelled from the
    cluster. (I.e. the primary node goes into StandAlone mode.)
    The default value is 0, which disables this feature.

The thing is -- nowhere in my config was ko-count set. So seeing it apparently kick in was an unwelcome surprise. I have since set ko-count and timeout to "large" values in the hope that it doesn't happen again.

Is this a DRBD bug, or expected behavior? If it's somehow the latter, I think the combination of the documentation and error messages is quite misleading and should be fixed.

Thanks,
Zev Weiss