On Thu, 18 Jul 2013 10:53:12 +0200 Lars Ellenberg wrote:

> On Thu, Jul 18, 2013 at 10:11:15AM +0900, Christian Balzer wrote:
> > On Wed, 17 Jul 2013 11:27:23 +0200 Lars Ellenberg wrote:
> >
> > > On Wed, Jul 17, 2013 at 05:25:13PM +0900, Christian Balzer wrote:
> > > >
> > > > On a very busy cluster with kernel 3.4.48 and DRBD 8.4.3 I was
> > > > able to reduce these kernel messages from dozens a day to nearly
> > > > none by setting
> > > >
> > > > vm/min_free_kbytes = 524288
> > >
> > > Yes, that's a setting that should typically help.
> > >
> > > > Lars, as this keeps popping up and always suggests DRBD to be
> > > > guilty party even if it's not, I wonder if you guys should have
> > > > some back channel talk with the relevant people on the kernel ML...
> > >
> > > I don't think that would lead anywhere,
> > > upstream kernel has the "memory compaction" meanwhile,
> > > so it should have become much less likely to hit this situation.
> > >
> > Upstream kernel being from what version on?
> > Note that even with the above setting the last time it failed to get an
> > order 5 alloc (128MB) things looked like this:
>
> 128 *k*, of course.
>
Indeed, in machines these days with many GBs of memory 128KB just looks so
small and wrong. ^o^

> > ---
> > Node 0 Normal: 66757*4kB 692*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 276660kB
> > Node 1 Normal: 25424*4kB 21411*8kB 5130*16kB 1037*32kB 96*64kB 26*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 402072kB
> > ---
> >
> > So on the node where it counted, lots of tiny fragments (and just about
> > half the free memory to boot).
> >
> > > Part of the issue was that there is no "physically contiguous" memory
> > > available: even though we have free memory, it is too fragmented.
> > >
> > > The "compaction" should cause "defragmentation" during normal
> > > allocations, making it much less likely to fail atomic allocations
> > > due to fragmentation.
> > >
> > If that is supposed to deliver the same results as a forced
> > compaction, I don't see it work, as the 3.2 tests I posted in the
> > thread last year and current ones with 3.4 suggest:
> > ---
> > # cat /sys/kernel/debug/extfrag/unusable_index
> > Node 0, zone      DMA 0.000 0.000 0.000 0.000 0.000 0.008 0.016 0.032 0.032 0.097 0.226
> > Node 0, zone    DMA32 0.000 0.161 0.212 0.256 0.283 0.329 0.514 0.744 0.939 1.000 1.000
> > Node 0, zone   Normal 0.000 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983
> > Node 1, zone   Normal 0.000 0.297 0.783 0.889 0.935 0.963 0.985 0.989 0.989 0.989 0.989
> > # echo 1 > /proc/sys/vm/compact_memory
> > # cat /sys/kernel/debug/extfrag/unusable_index
> > Node 0, zone      DMA 0.000 0.000 0.000 0.000 0.000 0.008 0.016 0.032 0.032 0.097 0.226
> > Node 0, zone    DMA32 0.000 0.032 0.055 0.092 0.189 0.324 0.516 0.751 0.940 0.985 1.000
> > Node 0, zone   Normal 0.000 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984
> > Node 1, zone   Normal 0.000 0.304 0.798 0.902 0.940 0.964 0.985 0.989 0.989 0.989 0.989
> > ---
> > Not any real improvement where it counts.
>
> Post that to the mm lists.
> You want to complain to the right people.
>
Here will be dragons... ^o^

> > > "just use a more recent kernel" should help as well, already
>
> I was just saying that the situation should have improved (compared with
> kernels that don't even know about compaction),
> and likely will keep improving (free memory fragmentation does affect
> other things and performance in general). I didn't say it was "fixed".
>
> Still this "compaction" is an interesting problem,
> not all pages can be "migrated" freely for various reasons.
>
Of course, so avoiding fragmentation in the first place would be neat
trick, most kernel pages can't be relocated...

Laters,

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
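
For readers who want to relate the per-order free lists quoted in the message to the unusable_index figures, here is a rough Python sketch. It assumes 4 kB base pages and assumes the index for a given order is the fraction of free pages sitting in blocks too small to satisfy that order, which is how the debugfs value is generally understood. The function names parse_free_list and unusable_index are made up for illustration and are not kernel or DRBD interfaces.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages, matching the smallest block size quoted above


def parse_free_list(line):
    """Turn '66757*4kB 692*8kB ...' into {order: number_of_free_blocks}."""
    counts = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        # A block of size_kb holds 2**order pages, so order = log2(size_kb / PAGE_KB).
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(count)
    return counts


def free_pages(counts):
    """Total number of free base pages across all orders."""
    return sum(n << order for order, n in counts.items())


def unusable_index(counts, order):
    """Fraction of free pages that live in blocks smaller than 2**order pages."""
    total = free_pages(counts)
    if total == 0:
        return 1.0  # no free memory at all: treat everything as unusable
    usable = sum(n << o for o, n in counts.items() if o >= order)
    return (total - usable) / total


# Free lists copied from the allocation-failure report quoted in the message.
node0 = parse_free_list(
    "66757*4kB 692*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
    "0*512kB 0*1024kB 0*2048kB 1*4096kB"
)
node1 = parse_free_list(
    "25424*4kB 21411*8kB 5130*16kB 1037*32kB 96*64kB 26*128kB 1*256kB "
    "0*512kB 0*1024kB 0*2048kB 1*4096kB"
)

order = 5
print("order", order, "block size:", (1 << order) * PAGE_KB, "kB")
for name, counts in (("Node 0 Normal", node0), ("Node 1 Normal", node1)):
    print(name, free_pages(counts) * PAGE_KB, "kB free,",
          "unusable at order %d: %.3f" % (order, unusable_index(counts, order)))
```

Run against the two quoted free lists, this reproduces the 276660 kB and 402072 kB totals and gives an order-5 unusable fraction of roughly 0.985 for Node 0 and 0.981 for Node 1, in the same range as the unusable_index output pasted above (which was taken at a different time, so the numbers are not expected to match exactly).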