Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Lars Ellenberg wrote:
>
> / 2004-04-13 07:15:56 -0500
> \ Todd Denniston:
> > all of my drbd device net sections contain (and did at the time of the lockup
> > too):
<SNIP>
> > timeout = 60 # unit: 0.1 seconds
<SNIP>
> > ko-count = 10 # if some block send times out this many times,
<SNIP>
> > Which I thought meant that in ~60 seconds[1] I would get a failover.
>
> Ah. No.
> This only detects whether I was able to send something to the Secondary.
> If not (for that ~60 seconds), I disconnect and ignore my peer.
>
> This does NOT detect local IO failure, since that normally is caught by
> the "do-panic" option: when I get a local IO failure, I will just panic
> the box. Which should always trigger a failover ...
Unfortunately the lower level had not yet (at that time) declared an IO
failure; there might be a buglet there (Adaptec SCSI layer). :{
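To collect the knobs under discussion in one place, here is a sketch in the
0.6-era drbd.conf style quoted above (the exact section placement of
do-panic is from memory, so check your version's docs):

    resource drbd0 {
      net {
        timeout  = 60   # unit: 0.1 seconds => 6 s per send attempt
        ko-count = 10   # drop the peer after ~10 consecutive send timeouts
      }
      disk {
        do-panic        # panic the box on local IO error => forced failover
      }
    }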
>
> > What I was suspecting is that the two drbds can still talk to one
> > another, and the failing node's drbd is blocking (but not failing) on
> > the write to the SCSI layer, because the SCSI layer is in a loop
> > retrying the reset on the Promise box ''hard drive''.
> >
<SNIP>
> Hm.
> If the Secondary manages to get a "throughput" of more than one block
> per (ko-count * ping-interval), we do not disconnect. If it "even"
> manages to get >= 4k per ping interval, ko-count won't trigger at all.
>
Drat, so I am getting 'some' data through to the disk more often than once
every 6 seconds... I might lower that timeout and see if I get some ko markers.
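To put numbers on that, reading "ping interval" as my 6 second timeout
(since that is where my 6 seconds comes from), a quick sketch in C using
only the values quoted above:

    /* Arithmetic on the quoted settings: timeout = 60 (0.1 s units,
     * i.e. 6 s per send attempt), ko-count = 10, 4 KiB blocks.      */
    #include <stdio.h>

    int main(void)
    {
        double interval_s = 60 * 0.1; /* one send timeout: 6 s */
        int    ko_count   = 10;
        double block_kib  = 4.0;      /* one 4 KiB block       */

        /* ko-count never increments if >= one block lands per interval. */
        printf("ko-count never triggers at >= %.2f KiB/s\n",
               block_kib / interval_s);

        /* No disconnect if more than one block lands per
         * ko-count * interval (~60 s).                      */
        printf("no disconnect above %.3f KiB/s\n",
               block_kib / (ko_count * interval_s));
        return 0;
    }

So a disk trickling anything past ~0.07 KiB/s keeps us connected, and past
~0.67 KiB/s the counter never even starts.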
> So a "very slow" but "not slow enough" write throughput on the Secondary
> will throttle the Primary to the same slowness.
>
> On the Primary, if it still is responsive, try to watch the "ns:".
> If it still increases, this is what happens.
>
Good point, I'll look at it when it latches up today or tomorrow. (It seems
to happen around 14?? local time; it has for 2 working days in a row.)
> If you have some "creative" idea how to cope with this, tell us.
Unfortunately, I was expecting the throughput to the disk to be 0, but you
are suggesting it is slightly higher, and you have already covered the 0
case.
Perhaps a minimum pending throughput (min-pend-speed)? And a
pend-grace-period?
Warning, really nasty pseudo code:

if (new_data_received_from_primary &&
    age_of_oldest_write_pending_sync_to_disk > pend_grace_period &&
    avg_speed_to_disk_since_we_started_pending < min_pend_speed)
{
    call ko-count type routines;
}
This requires receipt/send time markers on data packets, and something
tracking the average data rate...
It looks easy in English ... it could be a pain to code. :}
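Something like this, as a standalone sketch of the check; every name here
(pend_state, min_pend_speed, and so on) is made up for illustration, none
of it is existing drbd code:

    /* Sketch: on the Secondary, when new data arrives from the
     * Primary, flag a stall if something has been pending sync to
     * disk for longer than pend_grace_period while the average
     * flush rate over that span stayed under min_pend_speed.      */
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    struct pend_state {
        time_t        oldest_pending; /* arrival of oldest unsynced write */
        unsigned long bytes_synced;   /* bytes flushed since then         */
        bool          have_pending;
    };

    /* Hypothetical tunables, analogous to the options proposed above. */
    static const double pend_grace_period = 30.0;   /* seconds      */
    static const double min_pend_speed    = 4096.0; /* bytes/second */

    /* Returns true when the ko-count type routines should be called. */
    bool secondary_stalled(const struct pend_state *s, time_t now)
    {
        if (!s->have_pending)
            return false;
        double pending_for = difftime(now, s->oldest_pending);
        if (pending_for <= pend_grace_period)
            return false;
        return (s->bytes_synced / pending_for) < min_pend_speed;
    }

    int main(void)
    {
        /* 2 KiB flushed in 60 s is ~34 B/s: over the grace period
         * and far under min_pend_speed, so it counts as a stall.  */
        struct pend_state s = { .oldest_pending = 0,
                                .bytes_synced   = 2048,
                                .have_pending   = true };
        printf("stalled: %d\n", secondary_stalled(&s, 60));
        return 0;
    }

The real thing would still need the time markers and rate tracking
mentioned above to fill in pend_state.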
<SNIP>
--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter