Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
/ 2004-02-07 14:19:51 +0100
\ Jurgen Schoeters:
> Hi,
>
> I have a little question about ko-count:
>
> KO-count
>
> If some write request send times out this many times, the peer
> is considered dead, even if it still answers ping requests
>
> If a timeout happens then drbd will try to reconnect with the
> other node (connect-int), but if it happens for example 4 times
> (if ko-count=4).

No.

The option is intended to detect the case where a node (or its IO
subsystem) actually _is_ dead, but the network stack is still able
to process and answer small packets.

I try to transfer one block.
"ping-timeout" expires, ko-count decremented. Ping sent.
Still trying to send, same connection.
"ping-timeout" expires some more times, ko-count decremented each
time, ping sent each time.
Still trying to send, connection still up.
ko-count reaches 0 (meaning I have not been able to send data for
[ping-timeout * configured-ko-count]).

So I decide that the secondary has a problem, "feels" alive
ping-wise and thus probably won't just reboot or be stonithed, and
is very unlikely to recover in due time. I drop the connection
completely, falling back to local-only IO, rather than having a
blocking IO subsystem on the secondary block the IO subsystem on
the primary, too. The cluster is degraded now, and operator
intervention is required.

Other scenario:
If I sent a ping but did not get a ping-ack in due time, or a send
(or recv, for that matter) fails with an error code like
"Connection reset by peer", "Broken pipe" or some such, the
connection is dropped (or was already dropped by the peer or
hardware). But I try to reconnect, since the problem is likely to
go away without intervention: either the network was down and comes
back soon, or the peer will reboot and come back soon.
This has nothing to do with the ko-count.

	Lars Ellenberg
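
For illustration only, the two scenarios above can be sketched as a
small Python model. This is not DRBD code: the function name, the event
names and the Outcome values are all made up for this sketch, and it
assumes ko-count is at least 1. It only shows that repeated send
timeouts (with pings still answered) use up the ko-count and end in a
permanent drop, while a missing ping-ack or a socket error leads to a
reconnect attempt regardless of the ko-count.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical labels for this sketch, not DRBD state names.
    SENT = "block sent, connection stays up"
    PENDING = "still trying to send, connection still up"
    DEGRADED = "connection dropped, local-only IO, operator needed"
    RECONNECT = "connection dropped, automatic reconnect attempts"


def send_block(events, ko_count):
    """Model the fate of one block transfer.

    `events` is a hypothetical trace, one entry per step:
      "timeout" - the send timed out, but the ping was answered
      "no_ack"  - a ping was sent and no ping-ack arrived in time
      "error"   - send/recv failed, e.g. "Connection reset by peer"
      "sent"    - the block finally went out
    """
    if ko_count < 1:
        raise ValueError("this sketch assumes ko_count >= 1")

    remaining = ko_count
    for event in events:
        if event == "sent":
            return Outcome.SENT
        if event in ("no_ack", "error"):
            # Second scenario: independent of the ko-count.
            return Outcome.RECONNECT
        if event != "timeout":
            raise ValueError(f"unknown event: {event!r}")
        # First scenario: peer answers pings but does not take data.
        remaining -= 1
        if remaining <= 0:
            return Outcome.DEGRADED
    return Outcome.PENDING


if __name__ == "__main__":
    print(send_block(["timeout"] * 4, ko_count=4).value)
    print(send_block(["timeout", "no_ack"], ko_count=4).value)
```

With ko_count=4, four timeouts in a row on the same connection give the
degraded outcome; a single "no_ack" or "error" gives the reconnect
outcome no matter how many timeouts came before it.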