[DRBD-user] ko-count question

Sat Feb 7 15:20:29 CET 2004

/ 2004-02-07 14:19:51 +0100
\ Jurgen Schoeters:
> Hi,
> 
> I have a little question about ko-count:
> 
> KO-count
> 
> If some write request send times out this many times, the peer
> is considered dead, even if it still answers ping requests
> 
> If a timeout happens then drbd will try to reconnect with the
> other node (connect-int), but if it happens for example 4 times
> (if ko-count=4).

No. The option is intended to detect the case where a node (or its
IO subsystem) actually _is_ dead, but the network stack is still
able to process and answer small packets.

I try to transfer one block.
"ping-timeout" expires, ko-count is decremented.
A ping is sent. I am still trying to send, on the same connection.
"ping-timeout" expires some more times;
ko-count is decremented each time, and a ping is sent each time.
I am still trying to send, and the connection is still up.

ko-count reaches 0, which means that I have not been able to send
data for [ping-timeout * configured-ko-count].
So I decide that the secondary has a problem: it "feels" alive
ping-wise, and thus probably won't just reboot or be stonithed,
and it is very unlikely to recover in due time.

I drop the connection completely, falling back to local-only IO,
rather than letting a blocked IO subsystem on the secondary
block the IO subsystem on the primary, too.

The cluster is degraded now, and operator intervention is required.
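
The countdown described above can be sketched as a toy model in Python.
This is only an illustration, not DRBD's actual code: the function name,
its arguments, and the abstract time units are all made up here.

```python
def simulate_write(stall, ping_timeout, ko_count):
    """Toy model of one write to a peer that still answers pings.

    stall: time until the peer finally accepts the block, or None if
           its IO subsystem never recovers (hypothetical parameter).
    Returns (outcome, elapsed_time) in the same abstract time units.
    """
    remaining = ko_count
    elapsed = 0
    while remaining > 0:
        elapsed += ping_timeout      # wait one ping-timeout period
        if stall is not None and stall <= elapsed:
            return ("sent", stall)   # block went out; connection is kept
        remaining -= 1               # timeout expired: count down, ping the peer
    # Counter reached 0: the peer answers pings but accepts no data.
    return ("connection dropped, local IO only", elapsed)


print(simulate_write(stall=None, ping_timeout=6, ko_count=4))
print(simulate_write(stall=10, ping_timeout=6, ko_count=4))
```

With ping_timeout=6 and ko_count=4, the first call gives up after
6 * 4 = 24 time units; the second call succeeds because the peer
recovers before the counter runs out.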

Other scenario:
If I send a ping, but do not get a ping-ack in due time,
or a send (or recv, for that matter) fails with an error code
like "Connection reset by peer","Broken pipe" or some such,
the connection is dropped (or was already dropped by the peer or
hardware).
But I try to reconnect, since the problem is likely to go away
without intervention: either the network was down and comes back
soon, or the peer will reboot and come back soon.
This has nothing to do with the ko-count.
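
To put the two scenarios side by side, here is a small decision table
in Python. Again, this is just a sketch of the behaviour described in
this mail; the event names and the function are hypothetical and do not
correspond to anything in the DRBD source.

```python
def on_failure(event):
    """Map a failure event to the reaction described in this mail.

    event is one of three made-up labels:
      "ko_count_zero"     peer answers pings, but no data got through
      "ping_ack_timeout"  no ping-ack arrived in due time
      "socket_error"      send/recv failed, e.g. "Broken pipe"
    """
    if event == "ko_count_zero":
        # Peer looks alive but is stuck: stay disconnected and
        # wait for the operator.
        return {"drop_connection": True, "reconnect": False}
    if event in ("ping_ack_timeout", "socket_error"):
        # Network or peer is really gone: likely to come back on
        # its own, so keep trying to reconnect.
        return {"drop_connection": True, "reconnect": True}
    raise ValueError("unknown event: %r" % (event,))


for name in ("ko_count_zero", "ping_ack_timeout", "socket_error"):
    print(name, on_failure(name))
```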

	Lars Ellenberg