Todd Denniston wrote:
>
> Lars Ellenberg wrote:
> >
> <SNIP>
>
> Unfortunately the lower level has not yet (at that time) declared an IO
> failure, might be a buglet there (adaptec SCSI layer). :{
>
> <SNIP>
> > So a "very slow" but "not slow enough" write throughput on the Secondary
> > will throttle the Primary to the same slowness.
> >
> > On the Primary, if it still is responsive, try to watch the "ns:".
> > If it still increases, this is what happens.
> >
> good point, I'll look at it when it latches up today or tomorrow. (seems to
> happen ~14?? local time for 2 working days in a row).
> <SNIP>

Darn, I was not doing anything with the disk when it had its problem today, so I missed the slow spot.

On a positive note, about 22 seconds into the secondary's card dump (I think this is when the lockup starts), I got the following on the active primary:

Apr 13 15:36:23 foo kernel: drbd1: sock_sendmsg time expired on sock
Apr 13 15:36:23 foo kernel: drbd1: no data sent since 10 ping intervals, peer seems knocked out: going to StandAlone.
Apr 13 15:36:23 foo kernel: drbd1: Connection lost.

So were the 'ko count down' messages something that you might have added or fixed in 0.6.1[12]?

--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter
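Lars's suggestion to watch the "ns:" (network send) counter on the Primary can be scripted so that nobody has to be at the console when the lockup happens. The following is a minimal sketch, assuming each device line in /proc/drbd has the form shown in the code comment (the exact fields differ between DRBD versions). `ns_counters`, `still_sending` and `watch` are hypothetical helpers, not part of DRBD.

```python
import re
import time

# Assumed shape of a /proc/drbd device line (fields vary by DRBD version):
#   " 1: cs:Connected st:Primary/Secondary ns:123456 nr:0 dw:... dr:..."
# Header lines such as "version: ..." do not start with a digit and are skipped.
DEVICE_LINE = re.compile(r"^\s*(\d+):.*?\bns:(\d+)")


def ns_counters(proc_text):
    """Map DRBD minor number -> value of its 'ns:' (network send) counter."""
    counters = {}
    for line in proc_text.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            counters[int(m.group(1))] = int(m.group(2))
    return counters


def still_sending(before, after, minor):
    """True if the ns: counter for `minor` grew between two samples."""
    return after.get(minor, 0) > before.get(minor, 0)


def watch(minor, interval=5.0, samples=12, path="/proc/drbd"):
    """Poll `path` and print how much ns: grew per interval (hypothetical helper)."""
    with open(path) as f:
        prev = ns_counters(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = ns_counters(f.read())
        delta = cur.get(minor, 0) - prev.get(minor, 0)
        print(f"drbd{minor}: ns +{delta}")
        prev = cur
```

For example, `watch(1)` would sample drbd1 every 5 seconds for a minute. If the counter keeps climbing while the Primary feels stuck, the Secondary is still accepting data, only slowly, which is the throttling Lars describes; if it stops entirely, the peer is not taking data at all.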