On Mon, Oct 05, 2009 at 10:41:23AM +0200, Lars Ellenberg wrote:
> On Sun, Oct 04, 2009 at 10:14:22PM +0200, Lars Ellenberg wrote:
> > On Sun, Oct 04, 2009 at 03:55:44AM -0400, Gennadiy Nerubayev wrote:
> > > On Tue, Sep 22, 2009 at 5:01 PM, Jason McKay <jmckay at logicworks.net> wrote:
> > >
> > > > On Sep 22, 2009, at 4:34 PM, Lars Ellenberg wrote:
> > > >
> > > > > But correcting the tcp_mem setting above
> > > > > is more likely to fix your symptoms.
> > > >
> > > > I suspect it will. We'll test and follow up.
> > > >
> > >
> > > Hi guys,
> > >
> > > Unfortunately these are still occurring, even after we've updated to rc3,
> > > and used the tuning settings from rc3 notes (prior to this % of memory in
> > > pages were attempted with same results). They are a lot less frequent
> > > (intervals measured in hours), and have not yet caused a panic, but of
> > > course the worry is that it may happen regardless. Anything else that we
> > > could try here to eliminate it completely? Is there any chance that the
> > > ipoib stack is at fault?
> >
> > Possibly.
> > Maybe Vlad knows more?
> > From http://www.openfabrics.org/txt/documentation/linux/EWG_meeting_minutes/12_01_08.txt:
> > 1419 maj vlad at mellanox Iperf-2.0.4 fails: page allocation failure. order:5
> > I guess that means https://bugs.openfabrics.org/show_bug.cgi?id=1419
> > Not much progress on that bug, though.
>
> This appears related, as well:
> http://bugzilla.kernel.org/show_bug.cgi?id=10890
>
> Though there it was claimed that leaving network sysctls at the defaults
> "solved" the issue.

And yet one more, where sysctls helped:
http://thread.gmane.org/gmane.linux.nfs/20761/focus=695707

It has a different context, but that thread may give you an idea of how to
track it down further: turn on slab debugging, then sample /proc/slabinfo,
/proc/slab_allocators, /proc/net/sockstat, and maybe similar statistics in
the InfiniBand area.

BTW, maybe your netdev_max_backlog is a bit excessive?
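Here is a minimal Python sketch of the sampling suggested above. It assumes a Linux host with the usual procfs layout. The helper names (parse_sockstat, parse_slabinfo, parse_tcp_mem, sample_once) are hypothetical and belong to no DRBD or kernel tool. The script leaves out /proc/slab_allocators, which only exists with slab leak debugging enabled. It prints four things:

- TCP socket memory from /proc/net/sockstat.
- The tcp_mem thresholds. These and the sockstat TCP "mem" figure are both counted in pages.
- The current netdev_max_backlog.
- The largest slab caches.

```python
import os


def parse_sockstat(text):
    """Parse /proc/net/sockstat into {"TCP": {"inuse": 5, "mem": 3, ...}, ...}."""
    stats = {}
    for line in text.splitlines():
        proto, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # After "PROTO:" the fields come in "key value" pairs.
        stats[proto.strip()] = {
            fields[i]: int(fields[i + 1]) for i in range(0, len(fields) - 1, 2)
        }
    return stats


def parse_slabinfo(text):
    """Return {cache_name: num_objs * objsize} from /proc/slabinfo (version 2.x)."""
    usage = {}
    for line in text.splitlines():
        # Skip the "slabinfo - version" line and the "# name ..." column header.
        if line.startswith("slabinfo") or line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


def parse_tcp_mem(text):
    """tcp_mem holds three page counts: low, pressure, high."""
    low, pressure, high = (int(v) for v in text.split())
    return low, pressure, high


def read(path):
    with open(path) as f:
        return f.read()


def sample_once(top=5):
    """Print one snapshot; the paths are standard Linux procfs locations."""
    sock = parse_sockstat(read("/proc/net/sockstat"))
    tcp_pages = sock.get("TCP", {}).get("mem", 0)
    _, pressure, high = parse_tcp_mem(read("/proc/sys/net/ipv4/tcp_mem"))
    print(f"TCP mem: {tcp_pages} pages (pressure {pressure}, max {high})")
    backlog = read("/proc/sys/net/core/netdev_max_backlog").strip()
    print(f"netdev_max_backlog: {backlog}")
    # /proc/slabinfo is normally readable by root only.
    if os.access("/proc/slabinfo", os.R_OK):
        slabs = parse_slabinfo(read("/proc/slabinfo"))
        biggest = sorted(slabs.items(), key=lambda kv: kv[1], reverse=True)[:top]
        for name, size in biggest:
            print(f"  {name:<24} {size // 1024} KiB")
```

To use it, call sample_once() every few seconds as root, for example in a loop with time.sleep. Then compare the snapshots taken shortly before an allocation failure with those from a quiet period.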
-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.