[DRBD-user] Page allocation failure (IPOIB, Infiniband, connected mode)

Lars Ellenberg lars.ellenberg at linbit.com
Tue Mar 20 17:04:48 CET 2012



On Mon, Mar 19, 2012 at 05:04:44PM +0900, Christian Balzer wrote:
> 
> Hi Florian,
> 
> On Fri, 16 Mar 2012 13:55:17 +0100 Florian Haas wrote:
> 
> > On Wed, Mar 14, 2012 at 7:48 AM, Christian Balzer <chibi at gol.com> wrote:
> > > Hello,
> > >
> > > This is basically a repeat of:
> > > http://lists.linbit.com/pipermail/drbd-user/2011-August/016758.html
> > >
> > > 32GB RAM, Debian Squeeze, 3.2 (debian backport) kernel, 8.3.12 DRBD,
> > > IPOIB in connected mode with a 64k MTU. Just 2 DRBD resources.
> > >
> > > After encountering this for the first time (never showed up in two
> > > weeks of stress testing, which only goes to prove that real life just
> > > can't be simulated) I found the above article and changed the
> > > following sysctls:
> > >
> > > vm/min_free_kbytes = 262144
> [snip]
> > >
> > > Lars hinted at "atomic reserves" in his reply, which particular
> > > parameters are we talking about here?
> > 
> > I had hoped for Lars to pitch in here, but I guess I'll give it a go
> > instead. Note I'm certainly no kernel memory management expert, but
> > I'm not aware of anything that would fit that description other than
> > the vm.min_free_kbytes sysctl you've already mentioned.
> > 
> Yeah, that was my assumption, too. 

Well, no.  Or rather, "it depends".

The trace you posted contains tcp_sendmsg, so it comes from the send path.

In the *receive* path, min_free_kbytes actually makes a difference.
In the *send* path it typically does not: there we are not in
"atomic" context and may block/sleep, so this reserve should
normally not be touched.

Also, the problem is not insufficient free memory, but insufficient
free memory of the desired "order". Put differently: the problem
is memory fragmentation.
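
To make the "order" point concrete, here is a minimal sketch that parses
text in /proc/buddyinfo format and compares total free memory with the
free memory available in blocks of a given order or larger. The sample
numbers are made up for illustration, the helper names are hypothetical,
and a 4 KiB page size is assumed. A 64 KiB contiguous buffer is used only
as an example size; the allocation size in the actual trace may differ.

```python
PAGE_KB = 4  # assumption: 4 KiB pages

# Hypothetical sample in /proc/buddyinfo format (not from the reported system).
# After the zone name, column k is the number of free blocks of order k.
SAMPLE = """\
Node 0, zone      DMA      1      1      0      0      2      1      1      0      1      1      3
Node 0, zone    DMA32  12000   3000    400     20      0      0      0      0      0      0      0
Node 0, zone   Normal 250000  60000   1500     10      0      0      0      0      0      0      0
"""


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(x) for x in parts[4:]]
    return zones


def order_for(size_bytes, page_kb=PAGE_KB):
    """Smallest order whose block (2**order pages) holds size_bytes."""
    page_bytes = page_kb * 1024
    pages = -(-size_bytes // page_bytes)  # ceiling division
    return (pages - 1).bit_length()


def free_kb(counts, min_order=0, page_kb=PAGE_KB):
    """Free memory in KiB held in blocks of at least min_order."""
    return sum(n * (1 << o) * page_kb
               for o, n in enumerate(counts) if o >= min_order)


needed = order_for(64 * 1024)
for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
    print(f"node {node} zone {zone}: total free {free_kb(counts)} KiB, "
          f"free at order >= {needed}: {free_kb(counts, needed)} KiB")
```

In the made-up "Normal" zone above, well over 1 GiB is free in total, yet
no block of the needed order is available, so such an allocation could
fail despite plenty of free memory. On a real system you would pass the
contents of /proc/buddyinfo instead of SAMPLE.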

So you need to look into memory "defragmentation", which is better
known as "memory compaction" in the Linux kernel.

Relevant sysctls:
vm.compact_memory (write to it to trigger an ad-hoc compaction run),
vm.extfrag_threshold, and probably a few more.
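
As a rough sketch of how these two knobs can be poked from a script: the
files below are the usual /proc/sys/vm entries, but they only exist on
kernels built with compaction support, and writing compact_memory needs
root. The function names are hypothetical; nothing here is specific to
DRBD or IPoIB.

```python
from pathlib import Path

VM_DIR = Path("/proc/sys/vm")  # assumption: procfs mounted at /proc


def read_extfrag_threshold(vm_dir=VM_DIR):
    """Return the current vm.extfrag_threshold as an integer."""
    return int((vm_dir / "extfrag_threshold").read_text().strip())


def trigger_compaction(vm_dir=VM_DIR):
    """Write 1 to vm.compact_memory to request an ad-hoc compaction run."""
    (vm_dir / "compact_memory").write_text("1\n")


# Example use (as root, on a kernel with compaction enabled):
#   print(read_extfrag_threshold())
#   trigger_compaction()
```

Comparing /proc/buddyinfo before and after a compaction run gives a
quick idea of whether higher-order blocks become available again.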

Or you need to fix the drivers so that they do not require higher-order
page allocations, but are ok with just some single pages scattered around.

> > SUSE's kernel documentation team, btw, lists these "page allocation
> > failure" warnings as no cause for concern as long as they happen
> > infrequently:
> > 
> Once or twice per day would fit that bill, however they still make me
> wonder. 
> I doubled the vm.min_free_kbytes again to 512MB and still got them at
> times with particularly high activity. Not sure if upping to 1GB would
> actually make it go away, as reported free memory was several GB at least
> once when such a failure was logged.
> 
> I guess I'll just keep an eye on it, these boxes are at about 30% of their
> expected load/capacity (I/O, not space) now...

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


