[DRBD-user] can drbd be made to detect that it has failed to write to the underlying device in a 'long time'?

Tue Apr 13 23:54:06 CEST 2004

Todd Denniston wrote:
> 
send button slipped.
> Todd Denniston wrote:
> >
> > Lars Ellenberg wrote:
> > >
> <SNIP>
> >
> > Unfortunately the lower level has not yet (at that time) declared an IO
> > failure, might be a buglet there (adaptec SCSI layer). :{
> >

Today the Adaptec driver gave an IO error ~35 seconds after it started dumping
card...
It may have handled the situation better because I passed in the verbose and
tag_depth:16 flags.
I'll try to catch that ns: stuff tomorrow.
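
For anyone wanting to catch the same thing: Lars's suggestion (quoted below) is
to watch whether "ns:" keeps increasing on the Primary. A minimal sketch of that
check follows. It assumes /proc/drbd prints one "<minor>:" line per device with
an "ns:<number>" field on that line or a following one; the exact layout differs
between DRBD versions, so treat the parsing as an assumption. The sample strings
are made up for illustration, not real output, and the names parse_ns,
still_sending and watch are hypothetical helpers, not part of DRBD.

```python
import re
import time

# Assumed layout: a device line starts with "<minor>:" and an "ns:<number>"
# field appears on that line or on a following line for the same device.
DEVICE_RE = re.compile(r"^\s*(\d+):")
NS_RE = re.compile(r"\bns:(\d+)")


def parse_ns(status_text):
    """Return {minor: ns_value} parsed from /proc/drbd-style text."""
    counters = {}
    device = None
    for line in status_text.splitlines():
        dev_match = DEVICE_RE.match(line)
        if dev_match:
            device = int(dev_match.group(1))
        ns_match = NS_RE.search(line)
        if ns_match and device is not None:
            counters[device] = int(ns_match.group(1))
    return counters


def still_sending(before, after):
    """Return {minor: True/False}: did ns grow between the two samples?"""
    return {dev: after[dev] > before[dev] for dev in before if dev in after}


def watch(path="/proc/drbd", interval=5.0):
    """Print, every `interval` seconds, whether ns grew for each device."""
    previous = None
    while True:
        with open(path) as status_file:
            current = parse_ns(status_file.read())
        if previous is not None:
            for dev, growing in sorted(still_sending(previous, current).items()):
                state = "still increasing" if growing else "not increasing"
                print(f"drbd{dev}: ns {state} ({current[dev]})")
        previous = current
        time.sleep(interval)


# Made-up samples (NOT real /proc/drbd output), taken "a few seconds apart".
SAMPLE_T0 = "1: cs:Connected st:Primary/Secondary ns:1000 nr:0 dw:1000 dr:50 pe:2 ua:0\n"
SAMPLE_T1 = "1: cs:Connected st:Primary/Secondary ns:1040 nr:0 dw:1040 dr:50 pe:2 ua:0\n"
```

Calling watch() on the Primary while the Secondary is struggling would show
whether data is still trickling out (ns still increasing, so the Primary is
being throttled) or has stopped entirely.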

> <SNIP>
> > > So a "very slow" but "not slow enough" write throughput on the Secondary
> > > will throttle the Primary to the same slowness.
> > >
> > > On the Primary, if it still is responsive, try to watch the "ns:".
> > > If it still increases, this is what happens.
> > >
> > good point, I'll look at it when it latches up today or tomorrow. (seems to
> > happen ~14?? local time for 2 working days in a row).
> >
> 
> <SNIP>
> Darn, I was not doing anything using the disk at the time it had a problem
> today, and missed the slow spot.
> 
> On a positive note, about 22 seconds into the Secondary's card dump (I think
> this is the time that the lockup starts) I got the following on the active
> Primary:
> Apr 13 15:36:23 foo kernel: drbd1: sock_sendmsg time expired on sock
> Apr 13 15:36:23 foo kernel: drbd1: no data sent since 10 ping intervals, peer
> seems knocked out: going to StandAlone.
> Apr 13 15:36:23 foo kernel: drbd1: Connection lost.
> 
> So were the 'ko count down' messages something that you might have added or
> fixed in 0.6.1[12]?
> 

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane) 
Harnessing the Power of Technology for the Warfighter