[DRBD-user] recovery from "page allocation failure"

Christian Balzer chibi at gol.com
Thu Jul 18 11:07:29 CEST 2013



On Thu, 18 Jul 2013 10:53:12 +0200 Lars Ellenberg wrote:

> On Thu, Jul 18, 2013 at 10:11:15AM +0900, Christian Balzer wrote:
> > On Wed, 17 Jul 2013 11:27:23 +0200 Lars Ellenberg wrote:
> > 
> > > On Wed, Jul 17, 2013 at 05:25:13PM +0900, Christian Balzer wrote:
> > > > 
> > > > 
> > > > On a very busy cluster with kernel 3.4.48 and DRBD 8.4.3 I was
> > > > able to reduce these kernel messages from dozens a day to nearly
> > > > none by setting
> > > > 
> > > > vm/min_free_kbytes = 524288
> > > 
> > > Yes, that's a setting that should typically help.
> > > 
> > > > Lars, as this keeps popping up and always suggests DRBD to be the
> > > > guilty party even when it isn't, I wonder if you guys should have
> > > > some back-channel talk with the relevant people on the kernel ML...
> > > 
> > > I don't think that would lead anywhere,
> > > upstream kernel has the "memory compaction" meanwhile,
> > > so it should have become much less likely to hit this situation.
> > > 
> > Upstream kernel from which version on?
> > Note that even with the above setting the last time it failed to get an
> > order 5 alloc (128MB) things looked like this:
> 
> 128 *k*, of course.
> 
Indeed, on machines these days with many GBs of memory, 128KB just looks so
small and wrong. ^o^
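For reference, a buddy-allocator block of order n spans 2^n contiguous pages. Assuming the 4 KiB base pages typical of x86/x86-64 (other architectures and page sizes differ), an order 5 request therefore needs 32 contiguous pages, i.e. 128KB. A minimal Python sketch of that arithmetic; `order_to_kib` is a hypothetical helper name:

```python
# Assumption: 4 KiB base pages (x86/x86-64 default); other page sizes change the result.
PAGE_KIB = 4


def order_to_kib(order: int, page_kib: int = PAGE_KIB) -> int:
    """Size in KiB of one buddy-allocator block of the given order (2**order pages)."""
    return page_kib << order


for order in (0, 5, 10):
    print(f"order {order:2d}: {order_to_kib(order)} KiB")
```

Order 10 works out to 4096 KiB, which matches the largest block size listed in the free-list dumps in this thread.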
 
> > ---
> > Node 0 Normal: 66757*4kB 692*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 276660kB
> > Node 1 Normal: 25424*4kB 21411*8kB 5130*16kB 1037*32kB 96*64kB 26*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 402072kB
> > ---
> > 
> > So on the node where it counted there were lots of tiny fragments (and
> > just about half the free memory, to boot).
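The per-zone free-list summary that the kernel prints with an allocation failure can be checked mechanically. The Python sketch below (`parse_free_area` and `free_kb_at_least` are hypothetical helper names, not kernel or library functions) parses one zone line, confirms the block counts add up to the reported total, and sums only the free memory held in blocks of at least 128kB, which is all an order 5 request can actually use.

```python
import re


def parse_free_area(line: str) -> tuple[dict[int, int], int]:
    """Parse one 'Node N Zone: count*sizekB ... = totalkB' line.

    Returns ({block_size_kb: free_block_count}, reported_total_kb).
    """
    counts = {int(size): int(n) for n, size in re.findall(r"(\d+)\*(\d+)kB", line)}
    total_kb = int(re.search(r"=\s*(\d+)kB", line).group(1))
    return counts, total_kb


def free_kb_at_least(counts: dict[int, int], min_block_kb: int) -> int:
    """Free memory (kB) held in blocks of at least min_block_kb each."""
    return sum(size * n for size, n in counts.items() if size >= min_block_kb)


node0 = ("Node 0 Normal: 66757*4kB 692*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
         "0*512kB 0*1024kB 0*2048kB 1*4096kB = 276660kB")
counts, total_kb = parse_free_area(node0)
# The dump is self-consistent: block counts times block sizes equal the total.
assert sum(size * n for size, n in counts.items()) == total_kb
print(total_kb, free_kb_at_least(counts, 128))  # 276660 4096
```

For Node 0 only the single 4096kB block, roughly 1.5% of the 276660kB free, can serve an order 5 request; the same check on the Node 1 line gives 7680kB.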
> > 
> > > Part of the issue was that there is no "physically contiguous" memory
> > > available: even though we have free memory, it is too fragmented.
> > > 
> > > The "compaction" should cause "defragmentation" during normal
> > > allocations, making it much less likely to fail atomic allocations
> > > due to fragmentation.
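A toy model may make the compaction idea concrete. The Python sketch below is not kernel code and is greatly simplified: it treats a zone as a string of pages ('F' free, 'M' movable, 'U' unmovable) and mimics the general two-scanner scheme of Linux memory compaction, where a migration scanner walks up from the bottom of the zone and a free scanner walks down from the top, moving movable pages into free slots until the scanners meet. The helper names are hypothetical.

```python
def largest_free_run(pages: str) -> int:
    """Length of the longest run of contiguous free ('F') pages."""
    best = run = 0
    for p in pages:
        run = run + 1 if p == "F" else 0
        best = max(best, run)
    return best


def compact(pages: str) -> str:
    """Toy compaction: migrate 'M' pages from the low end into 'F' slots at the high end.

    'U' (unmovable) pages are never touched, so they can still split free space.
    """
    zone = list(pages)
    migrate, free = 0, len(zone) - 1
    while True:
        # Migration scanner: next movable page, searching upward from the bottom.
        while migrate < len(zone) and zone[migrate] != "M":
            migrate += 1
        # Free scanner: next free page, searching downward from the top.
        while free >= 0 and zone[free] != "F":
            free -= 1
        if migrate >= free:  # scanners met: nothing left to gain
            break
        zone[free], zone[migrate] = "M", "F"
    return "".join(zone)


for before in ("FMFMFMFM", "FMFUFMFM"):
    after = compact(before)
    print(before, largest_free_run(before), "->", after, largest_free_run(after))
```

In the first zone compaction turns four scattered free pages into one run of four. In the second, four pages are still free afterwards but the single unmovable page caps the largest contiguous run at three, which is why pages that cannot be migrated limit what compaction can recover.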
> > > 
> > If that is supposed to deliver the same results as a forced
> > compaction, I don't see it working, as the 3.2 tests I posted in the
> > thread last year and the current ones with 3.4 suggest:
> > ---
> > # cat /sys/kernel/debug/extfrag/unusable_index
> > Node 0, zone      DMA 0.000 0.000 0.000 0.000 0.000 0.008 0.016 0.032 0.032 0.097 0.226
> > Node 0, zone    DMA32 0.000 0.161 0.212 0.256 0.283 0.329 0.514 0.744 0.939 1.000 1.000
> > Node 0, zone   Normal 0.000 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983
> > Node 1, zone   Normal 0.000 0.297 0.783 0.889 0.935 0.963 0.985 0.989 0.989 0.989 0.989
> > # echo 1 > /proc/sys/vm/compact_memory
> > # cat /sys/kernel/debug/extfrag/unusable_index
> > Node 0, zone      DMA 0.000 0.000 0.000 0.000 0.000 0.008 0.016 0.032 0.032 0.097 0.226
> > Node 0, zone    DMA32 0.000 0.032 0.055 0.092 0.189 0.324 0.516 0.751 0.940 0.985 1.000
> > Node 0, zone   Normal 0.000 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984
> > Node 1, zone   Normal 0.000 0.304 0.798 0.902 0.940 0.964 0.985 0.989 0.989 0.989 0.989
> > ---
> > Not any real improvement where it counts.
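The unusable_index file reports, per zone and per order (orders 0 through 10 in the columns above), the fraction of free memory sitting in blocks too small for an allocation of that order: 0 means no fragmentation, 1 means all free memory is unusable at that order. Based on my reading of the kernel's implementation in mm/vmstat.c (treat the exact details as an assumption), the value is (free pages minus pages in free blocks of at least that order) divided by free pages, truncated to three decimals. A minimal Python sketch applying that formula to the free-list counts quoted earlier; those come from a different snapshot than the debugfs output, so the numbers are not expected to match exactly. `unusable_index` is a hypothetical helper name.

```python
def unusable_index(nr_free: list[int], order: int) -> float:
    """Unusable free space index for `order`.

    nr_free[i] is the number of free blocks of 2**i pages in the zone.
    Truncated (not rounded) to three decimals, like the debugfs output.
    """
    free_pages = sum(n << i for i, n in enumerate(nr_free))
    if free_pages == 0:
        return 1.0  # no free memory counts as entirely unusable
    # Pages held in blocks big enough to satisfy a request of this order.
    suitable_pages = sum(n << i for i, n in enumerate(nr_free) if i >= order)
    return (free_pages - suitable_pages) * 1000 // free_pages / 1000


# Free block counts for orders 0..10 from the Node 0 / Node 1 Normal dumps.
node0_normal = [66757, 692, 0, 0, 0, 0, 0, 0, 0, 0, 1]
node1_normal = [25424, 21411, 5130, 1037, 96, 26, 1, 0, 0, 0, 1]

for name, zone in (("Node 0 Normal", node0_normal), ("Node 1 Normal", node1_normal)):
    print(name, " ".join(f"{unusable_index(zone, o):.3f}" for o in range(11)))
```

With these counts order 5 comes out at 0.985 for Node 0 and 0.980 for Node 1, in the same high range as the Normal-zone rows of the debugfs output.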
> 
> Post that to the mm lists.
> You want to complain to the right people.
> 
Here will be dragons... ^o^

> > > "just use a more recent kernel" should help as well, already
> 
> I was just saying that the situation should have improved (compared with
> kernels that don't even know about compaction),
> and likely will keep improving (free memory fragmentation does affect
> other things and performance in general). I didn't say it was "fixed".
> 
> Still this "compaction" is an interesting problem,
> not all pages can be "migrated" freely for various reasons.
> 
Of course, so avoiding fragmentation in the first place would be a neat
trick; most kernel pages can't be relocated...

Laters,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/


