[DRBD-user] recovery from "page allocation failure"

Christian Balzer chibi at gol.com
Thu Jul 18 03:11:15 CEST 2013


On Wed, 17 Jul 2013 11:27:23 +0200 Lars Ellenberg wrote:

> On Wed, Jul 17, 2013 at 05:25:13PM +0900, Christian Balzer wrote:
> > 
> > 
> > On a very busy cluster with kernel 3.4.48 and DRBD 8.4.3 I was able to
> > reduce these kernel messages from dozens a day to nearly none by
> > setting
> > 
> > vm/min_free_kbytes = 524288
> 
> Yes, that's a setting that should typically help.
> 
> > Lars, as this keeps popping up and always suggests DRBD to be the
> > guilty party even if it's not, I wonder if you guys should have some
> > back channel talk with the relevant people on the kernel ML...
> 
> I don't think that would lead anywhere,
> upstream kernel has the "memory compaction" meanwhile,
> so it should have become much less likely to hit this situation.
> 
From which kernel version on is that in upstream?
Note that even with the above setting, the last time it failed to get an
order 5 alloc (32 contiguous 4kB pages, i.e. 128kB) things looked like this:
---
Node 0 Normal: 66757*4kB 692*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 276660kB
Node 1 Normal: 25424*4kB 21411*8kB 5130*16kB 1037*32kB 96*64kB 26*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 402072kB
---

So on the node where it counted, lots of tiny fragments (and only about
two thirds of the other node's free memory to boot).
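
To put a number on that, here is a rough sketch that sums up how much of
the free memory in such a line sits in blocks large enough for a given
request. The helper name free_at_least is made up for illustration, and
the sketch only looks at block sizes; it ignores watermarks, reserves and
migrate types, so it is an indication rather than a verdict.

```shell
#!/bin/sh
# free_at_least MIN_KB  (hypothetical helper name)
# Reads free-list lines such as
#   Node 0 Normal: 66757*4kB 692*8kB ... 1*4096kB = 276660kB
# on stdin and prints, per line, how many kB of free memory sit in
# blocks of at least MIN_KB.
free_at_least() {
    awk -v min="$1" '
    {
        big = 0; total = 0
        for (i = 1; i <= NF; i++) {
            # Only fields shaped like "66757*4kB" describe free blocks.
            if ($i ~ /^[0-9]+[*][0-9]+kB$/) {
                star  = index($i, "*")
                count = substr($i, 1, star - 1) + 0
                size  = substr($i, star + 1) + 0   # "4kB" + 0 is 4 in awk
                total += count * size
                if (size >= min + 0) big += count * size
            }
        }
        # Skip lines that contained no free-list fields at all.
        if (total > 0)
            printf "%s %s %s %d of %d kB free in blocks >= %d kB\n", $1, $2, $3, big, total, min
    }'
}

# The two lines quoted above, checked for order 5 (128kB) blocks.
free_at_least 128 <<'EOF'
Node 0 Normal: 66757*4kB 692*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 276660kB
Node 1 Normal: 25424*4kB 21411*8kB 5130*16kB 1037*32kB 96*64kB 26*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 402072kB
EOF
```

For the two lines above this reports 4096 of 276660 kB on Node 0 and 7680
of 402072 kB on Node 1, so on Node 0 a single 4096kB block is all that is
left at or above the 128kB mark.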

> Part of the issue was that there is no "physically contiguous" memory
> available: even though we have free memory, it is too fragmented.
> 
> The "compaction" should cause "defragmentation" during normal
> allocations, making it much less likely to fail atomic allocations due
> to fragmentation.
> 
If that is supposed to deliver the same results as a forced compaction, I
don't see it working. Both the 3.2 tests I posted in the thread last year
and the current ones with 3.4 suggest as much:
---
# cat /sys/kernel/debug/extfrag/unusable_index 
Node 0, zone      DMA 0.000 0.000 0.000 0.000 0.000 0.008 0.016 0.032 0.032 0.097 0.226 
Node 0, zone    DMA32 0.000 0.161 0.212 0.256 0.283 0.329 0.514 0.744 0.939 1.000 1.000 
Node 0, zone   Normal 0.000 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983 
Node 1, zone   Normal 0.000 0.297 0.783 0.889 0.935 0.963 0.985 0.989 0.989 0.989 0.989 
# echo 1 > /proc/sys/vm/compact_memory
# cat /sys/kernel/debug/extfrag/unusable_index 
Node 0, zone      DMA 0.000 0.000 0.000 0.000 0.000 0.008 0.016 0.032 0.032 0.097 0.226 
Node 0, zone    DMA32 0.000 0.032 0.055 0.092 0.189 0.324 0.516 0.751 0.940 0.985 1.000 
Node 0, zone   Normal 0.000 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984 
Node 1, zone   Normal 0.000 0.304 0.798 0.902 0.940 0.964 0.985 0.989 0.989 0.989 0.989 
---
No real improvement where it counts (Node 0, zone Normal).
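
For anyone wanting to watch just the order that matters, a small sketch
that picks one column out of that output. It assumes, as I understand the
format, that the first value on each line is order 0 and the columns count
up from there, and that a value near 1 means nearly all free memory in
that zone is in blocks too small for the order. The helper name
unusable_at_order is made up for illustration.

```shell
#!/bin/sh
# unusable_at_order ORDER  (hypothetical helper name)
# Reads /sys/kernel/debug/extfrag/unusable_index style lines on stdin and
# prints the value for the given order. Assumes the first value on each
# line belongs to order 0.
unusable_at_order() {
    awk -v order="$1" '
    $1 == "Node" && $3 == "zone" {
        col = 5 + order    # fields 1-4 are "Node", "N,", "zone", zone name
        if (col <= NF)
            printf "%s %s zone %s: order %d unusable_index %s\n", $1, $2, $4, order, $col
    }'
}

# Node 0, zone Normal before and after the forced compaction above.
unusable_at_order 5 <<'EOF'
Node 0, zone   Normal 0.000 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983
Node 0, zone   Normal 0.000 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984
EOF

# On a live system (needs debugfs mounted and root):
#   unusable_at_order 5 < /sys/kernel/debug/extfrag/unusable_index
```

With the figures above that gives 0.983 before and 0.984 after for Node 0,
zone Normal at order 5.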

> "just use a more recent kernel" should help as well, already
> 
When choosing a kernel for a new system/cluster I try to pick the latest
"longterm" one that works with the userspace tools of the current distro
release I'm using. 
And unless something absolutely requires me to do otherwise, these
machines will stay up as long as possible, in some cases until their
replacement (5 years).

So that's 3.4 at this time. If somebody convinces me that 3.10 will turn
longterm, I could give that a try and see if it plays nice with the Wheezy
userland tools when building the next cluster.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/


