On Wed, 17 Jul 2013 11:27:23 +0200 Lars Ellenberg wrote:

> On Wed, Jul 17, 2013 at 05:25:13PM +0900, Christian Balzer wrote:
> >
> > On a very busy cluster with kernel 3.4.48 and DRBD 8.4.3 I was able to
> > reduce these kernel messages from dozens a day to nearly none by
> > setting
> >
> > vm/min_free_kbytes = 524288
>
> Yes, that's a setting that should typically help.
>
> > Lars, as this keeps popping up and always suggests DRBD to be the
> > guilty party even if it's not, I wonder if you guys should have some
> > back channel talk with the relevant people on the kernel ML...
>
> I don't think that would lead anywhere,
> upstream kernel has the "memory compaction" meanwhile,
> so it should have become much less likely to hit this situation.
>
Upstream kernel being from what version on?

Note that even with the above setting the last time it failed to get an
order 5 alloc (128kB) things looked like this:
---
Node 0 Normal: 66757*4kB 692*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 276660kB
Node 1 Normal: 25424*4kB 21411*8kB 5130*16kB 1037*32kB 96*64kB 26*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 402072kB
---
So on the node where it counted, lots of tiny fragments (and just about
half the free memory to boot).

> Part of the issue was that there is no "physically contiguous" memory
> available: even though we have free memory, it is too fragmented.
>
> The "compaction" should cause "defragmentation" during normal
> allocations, making it much less likely to fail atomic allocations due
> to fragmentation.
>
If that is supposed to deliver the same results as a forced compaction, I
don't see it work, as the 3.2 tests I posted in the thread last year and
current ones with 3.4 suggest:
---
# cat /sys/kernel/debug/extfrag/unusable_index
Node 0, zone      DMA 0.000 0.000 0.000 0.000 0.000 0.008 0.016 0.032 0.032 0.097 0.226
Node 0, zone    DMA32 0.000 0.161 0.212 0.256 0.283 0.329 0.514 0.744 0.939 1.000 1.000
Node 0, zone   Normal 0.000 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983 0.983
Node 1, zone   Normal 0.000 0.297 0.783 0.889 0.935 0.963 0.985 0.989 0.989 0.989 0.989

# echo 1 > /proc/sys/vm/compact_memory

# cat /sys/kernel/debug/extfrag/unusable_index
Node 0, zone      DMA 0.000 0.000 0.000 0.000 0.000 0.008 0.016 0.032 0.032 0.097 0.226
Node 0, zone    DMA32 0.000 0.032 0.055 0.092 0.189 0.324 0.516 0.751 0.940 0.985 1.000
Node 0, zone   Normal 0.000 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984 0.984
Node 1, zone   Normal 0.000 0.304 0.798 0.902 0.940 0.964 0.985 0.989 0.989 0.989 0.989
---
Not any real improvement where it counts.

> "just use a more recent kernel" should help as well, already
>
When choosing a kernel for a new system/cluster I try to pick the latest
"longterm" one that works with the userspace tools of the current distro
release I'm using. And unless something absolutely requires me to do
otherwise, these machines will stay up as long as possible, in some cases
until their replacement (5 years).

So that's 3.4 at this time, if somebody convinces me that 3.10 will turn
longterm I could give that a try and see if it plays nice with Wheezy
userland tools when building the next cluster.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
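
Addendum: a rough way to cross-check the figures quoted above. The sketch below parses a free-block listing of the "N*SIZEkB" form shown in the allocation-failure dump and computes the fraction of free memory that cannot serve a request of a given order (an order-5 request needs 32 contiguous 4kB pages). The formula mirrors my understanding of how the kernel derives unusable_index; the function names and the 4kB page size are assumptions made for illustration, not anything taken from DRBD or the kernel sources.

```python
import re

PAGE_KB = 4  # assumed base page size, matching the 4kB column in the dump


def parse_free_blocks(line):
    """Return free block counts indexed by order from an 'N*SIZEkB' listing."""
    counts = {}
    for blocks, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        # 4kB -> order 0, 8kB -> order 1, ..., 4096kB -> order 10
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(blocks)
    return [counts.get(o, 0) for o in range(max(counts) + 1)]


def free_kb(counts):
    """Total free memory in kB represented by the per-order counts."""
    return sum(n << o for o, n in enumerate(counts)) * PAGE_KB


def unusable_index(counts, order):
    """Fraction of free pages sitting in blocks smaller than 2**order pages."""
    free_pages = sum(n << o for o, n in enumerate(counts))
    if free_pages == 0:
        return 1.0  # no free memory at all: treat everything as unusable
    usable = sum(n << o for o, n in enumerate(counts) if o >= order)
    return (free_pages - usable) / free_pages


node0 = parse_free_blocks(
    "Node 0 Normal: 66757*4kB 692*8kB 0*16kB 0*32kB 0*64kB 0*128kB "
    "0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 276660kB"
)
node1 = parse_free_blocks(
    "Node 1 Normal: 25424*4kB 21411*8kB 5130*16kB 1037*32kB 96*64kB "
    "26*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 402072kB"
)

for name, counts in (("Node 0", node0), ("Node 1", node1)):
    print(name, free_kb(counts), "kB free,",
          "order-5 unusable index %.3f" % unusable_index(counts, 5))
```

For the Node 0 line this reproduces the 276660kB total and gives an order-5 index of roughly 0.985, in the same range as the 0.983 and 0.984 values reported for Node 0, zone Normal, before and after the forced compaction.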