Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Lars Ellenberg wrote:
>
> / 2004-04-13 07:15:56 -0500
> \ Todd Denniston:
> > all of my drbd device net sections contain (and did at the time of the lockup
> > too):
<SNIP>
> > timeout = 60 # unit: 0.1 seconds
<SNIP>
> > ko-count = 10 # if some block send times out this many times,
<SNIP>
> > Which I thought meant that in ~60 seconds[1] I would get a failover.
>
> Ah. No.
> This only detects whether I was able to send something to the Secondary.
> If not (for that ~60 seconds), I disconnect and ignore my peer.
>
> This does NOT detect local IO failure, since that normally is caught by
> the "do-panic" option: when I get a local IO failure, I will just panic
> the box. Which should always trigger a failover ...
Unfortunately the lower level had not yet (at that time) declared an IO
failure; there might be a buglet there (Adaptec SCSI layer). :{
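To collect the knobs under discussion in one place, here is a sketch in the
0.6-era drbd.conf style quoted above (the exact section placement of
do-panic is from memory, so check your version's docs):

    resource drbd0 {
      net {
        timeout  = 60   # unit: 0.1 seconds => 6 s per send attempt
        ko-count = 10   # drop the peer after ~10 consecutive send timeouts
      }
      disk {
        do-panic        # panic the box on local IO error => forced failover
      }
    }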
>
> > What I was suspecting is that the two drbds can still talk to one
> > another, and the failing node's drbd is blocking (but not failing) on
> > the write to the SCSI layer, because the SCSI layer is in a loop
> > retrying the reset on the Promise box ''hard drive''.
> >
<SNIP>
> Hm.
> If the Secondary manages to get a "throughput" of more than one block
> per (ko-count * ping-interval), we do not disconnect. If it "even"
> manages to get >= 4k per ping interval, ko-count won't trigger at all.
>
Drat, so I am getting 'some' data through to the disk more often than once
every 6 seconds... I might lower that timeout and see if I get some ko markers.
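To put numbers on that, reading "ping interval" as my 6 second timeout
(since that is where my 6 seconds comes from), a quick sketch in C using
only the values quoted above:

    /* Arithmetic on the quoted settings: timeout = 60 (0.1 s units,
     * i.e. 6 s per send attempt), ko-count = 10, 4 KiB blocks.      */
    #include <stdio.h>

    int main(void)
    {
        double interval_s = 60 * 0.1; /* one send timeout: 6 s */
        int    ko_count   = 10;
        double block_kib  = 4.0;      /* one 4 KiB block       */

        /* ko-count never increments if >= one block lands per interval. */
        printf("ko-count never triggers at >= %.2f KiB/s\n",
               block_kib / interval_s);

        /* No disconnect if more than one block lands per
         * ko-count * interval (~60 s).                      */
        printf("no disconnect above %.3f KiB/s\n",
               block_kib / (ko_count * interval_s));
        return 0;
    }

So a disk trickling anything past ~0.07 KiB/s keeps us connected, and past
~0.67 KiB/s the counter never even starts.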
> So a "very slow" but "not slow enough" write throughput on the Secondary
> will throttle the Primary to the same slowness.
>
> On the Primary, if it still is responsive, try to watch the "ns:".
> If it still increases, this is what happens.
>
Good point, I'll look at it when it latches up today or tomorrow. (It seems
to happen around 14?? local time; it has for 2 working days in a row.)
> If you have some "creative" idea how to cope with this, tell us.
Unfortunately, I was expecting the throughput to the disk to be 0, but you
are suggesting it is slightly higher, and you have already covered the 0
case.
Perhaps a minimum pending throughput (min-pend-speed)? And a
pend-grace-period?
Warning, really nasty pseudo code:

if (new_data_received_from_primary &&
    age_of_oldest_write_pending_sync_to_disk > pend_grace_period &&
    avg_speed_to_disk_since_we_started_pending < min_pend_speed)
{
    call ko-count type routines;
}
This requires receipt/send time markers on data packets, and something
tracking the average data rate...
It looks easy in English ... it could be a pain to code. :}
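Something like this, as a standalone sketch of the check; every name here
(pend_state, min_pend_speed, and so on) is made up for illustration, none
of it is existing drbd code:

    /* Sketch: on the Secondary, when new data arrives from the
     * Primary, flag a stall if something has been pending sync to
     * disk for longer than pend_grace_period while the average
     * flush rate over that span stayed under min_pend_speed.      */
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    struct pend_state {
        time_t        oldest_pending; /* arrival of oldest unsynced write */
        unsigned long bytes_synced;   /* bytes flushed since then         */
        bool          have_pending;
    };

    /* Hypothetical tunables, analogous to the options proposed above. */
    static const double pend_grace_period = 30.0;   /* seconds      */
    static const double min_pend_speed    = 4096.0; /* bytes/second */

    /* Returns true when the ko-count type routines should be called. */
    bool secondary_stalled(const struct pend_state *s, time_t now)
    {
        if (!s->have_pending)
            return false;
        double pending_for = difftime(now, s->oldest_pending);
        if (pending_for <= pend_grace_period)
            return false;
        return (s->bytes_synced / pending_for) < min_pend_speed;
    }

    int main(void)
    {
        /* 2 KiB flushed in 60 s is ~34 B/s: over the grace period
         * and far under min_pend_speed, so it counts as a stall.  */
        struct pend_state s = { .oldest_pending = 0,
                                .bytes_synced   = 2048,
                                .have_pending   = true };
        printf("stalled: %d\n", secondary_stalled(&s, 60));
        return 0;
    }

The real thing would still need the time markers and rate tracking
mentioned above to fill in pend_state.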
<SNIP>
--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter