<div dir="ltr">Figured this out. It is caused by a huge buffer size, which is similar to the cache but holds disk metadata. The kernel won't release buffered memory until something else needs that memory, and <i>buffered</i> memory accumulates over time.<div>
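One way to force the kernel to shed those buffers is to allocate and touch a large block of anonymous memory so that reclaim kicks in. A minimal Python sketch of that kind of program (the size and function name here are my own illustration, not the actual program that was used):

```python
# Hedged sketch: pressure the kernel into reclaiming buffer/page-cache
# pages by allocating and touching a large block of anonymous memory.
# The demo size below is illustrative; on the 64 GB box described in
# this thread it would need to be a sizeable fraction of RAM.

PAGE_SIZE = 4096  # typical x86 page size


def touch_pages(n_bytes, page_size=PAGE_SIZE):
    """Allocate n_bytes and write one byte per page so every page
    becomes resident, forcing the kernel to free memory elsewhere."""
    buf = bytearray(n_bytes)
    for off in range(0, n_bytes, page_size):
        buf[off] = 1  # dirty one byte per page to fault it in
    return n_bytes // page_size  # number of whole pages touched


if __name__ == "__main__":
    # Small size for the demo; scale up to pressure a real machine.
    print(touch_pages(64 * 1024 * 1024))  # touch 64 MiB
```

On Linux the same reclaim can also be triggered directly (as root) with `echo 3 > /proc/sys/vm/drop_caches`, which drops clean page-cache pages and reclaimable slab objects such as the `buffer_head` entries that dominate the slabtop output quoted below.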
<br></div><div>I ran a program that allocates a big chunk of memory, and the buffer size went down to less than 1 MB.</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, May 21, 2013 at 12:14 PM, Lin Zhao <span dir="ltr"><<a href="mailto:lin@groupon.com" target="_blank">lin@groupon.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I've been running a DRBD setup for 3 months, and recently noticed high kernel memory usage on the secondary machines.<div>
<br></div><div>The secondary machine runs only very light user applications, yet total memory usage reaches as much as 60 GB.</div>
<div><br></div><div>Is there a known issue with a kernel memory leak? Attaching top, slabtop and meminfo output from my backup machine. You can see that the processes in <i>top</i> show very light RES sizes, but system memory usage reaches 63 GB. Can you identify anything obvious?</div>
<div><br></div><div>top:</div><div><div>last pid: 5836; load avg: 0.11, 0.17, 0.12; up 250+16:35:23 19:12:32</div>
<div>326 processes: 1 running, 325 sleeping</div><div>CPU states: 0.0% user, 0.0% nice, 0.0% system, 100% idle, 0.0% iowait</div><div>Kernel: 106 ctxsw, 1019 intr</div><div>Memory: 63G used, 426M free, 52G buffers, 179M cached</div>
<div>Swap: 116K used, 8000M free</div><div><br></div><div> PID USERNAME THR PRI NICE SIZE RES SHR STATE TIME CPU COMMAND</div><div>25785 ganglia 1 15 0 140M 8460K 3476K sleep 88:42 0.00% gmond</div>
<div> 4730 root 1 16 0 88M 3368K 2636K sleep 0:00 0.00% sshd</div><div> 4732 lin 1 15 0 88M 1836K 1084K sleep 0:00 0.00% sshd</div><div> 4733 lin 1 16 0 65M 1596K 1272K sleep 0:00 0.00% bash</div>
<div> 7523 root 1 15 0 65M 1596K 1272K sleep 0:00 0.00% bash</div><div> 5500 root 1 15 0 61M 1208K 644K sleep 2:20 0.00% sshd</div><div> 5834 root 1 15 0 61M 848K 336K sleep 0:00 0.00% crond</div>
<div> 8785 root 1 16 0 61M 1024K 516K sleep 0:03 0.00% crond</div><div> 7493 root 1 15 0 51M 1372K 1036K sleep 0:00 0.00% login</div><div> 5066 root 3 20 0 28M 576K 448K sleep 0:00 0.00% brcm_iscsiuio</div>
<div> 8886 root 1 15 0 23M 1984K 1464K sleep 0:00 0.00% ntpd</div><div> 1798 root 1 11 -4 12M 776K 456K sleep 0:00 0.00% udevd</div><div> 5072 root 1 5 -10 12M 4452K 3164K sleep 0:00 0.00% iscsid</div>
<div> 5071 root 1 18 0 12M 652K 416K sleep 0:00 0.00% iscsid</div><div> 5718 lin 1 15 0 11M 1152K 848K run 0:00 0.00% top</div><div>12349 root 1 15 0 11M 1532K 612K sleep 2:54 0.00% syslogd</div>
<div> 1 root 1 15 0 10M 752K 632K sleep 4:17 0.00% init</div><div> 5835 root 1 19 0 8688K 1072K 924K sleep 0:00 0.00% sh</div><div> 7301 root 1 19 0 3808K 532K 448K sleep 0:00 0.00% mingetty</div>
<div> 7300 root 1 18 0 3808K 532K 448K sleep 0:00 0.00% mingetty</div><div> 7299 root 1 17 0 3808K 532K 448K sleep 0:00 0.00% mingetty</div><div> 7298 root 1 16 0 3808K 532K 448K sleep 0:00 0.00% mingetty</div>
<div> 7302 root 1 18 0 3808K 528K 448K sleep 0:00 0.00% mingetty</div><div> 7303 root 1 18 0 3808K 528K 448K sleep 0:00 0.00% mingetty</div><div> 5836 root 1 19 0 3808K 484K 408K sleep 0:00 0.00% sleep</div>
<div> 1744 root 1 10 -5 0K 0K 0K sleep 649:36 0.00% md2_raid10</div><div> 6586 root 1 15 0 0K 0K 0K sleep 227:57 0.00% drbd1_receiver</div><div> 6587 root 1 -3 0 0K 0K 0K sleep 72:20 0.00% drbd1_asender</div>
<div> 1740 root 1 10 -5 0K 0K 0K sleep 64:18 0.00% md1_raid10</div><div> 1750 root 1 10 -5 0K 0K 0K sleep 16:02 0.00% kjournald</div><div><br></div><div>slabtop:</div><div>
<div>Active / Total Objects (% used) : 108378294 / 108636165 (99.8%)</div><div> Active / Total Slabs (% used) : 2746709 / 2746710 (100.0%)</div><div> Active / Total Caches (% used) : 100 / 150 (66.7%)</div><div>
Active / Total Size (% used) : 10273556.84K / 10298936.27K (99.8%)</div><div> Minimum / Average / Maximum Object : 0.02K / 0.09K / 128.00K</div><div><br></div><div> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME </div>
<div>108327280 108091540 20% 0.09K 2708182<span style="white-space:pre-wrap">        </span>40 10832728K buffer_head</div><div>228606 228226 99% 0.52K 32658 7 130632K radix_tree_node</div><div> 9856 9832 99% 0.09K 224<span style="white-space:pre-wrap">        </span> 44<span style="white-space:pre-wrap">        </span> 896K sysfs_dir_cache</div>
<div> 7847 3871 49% 0.06K 133<span style="white-space:pre-wrap">        </span> 59<span style="white-space:pre-wrap">        </span> 532K size-64</div><div> 7596 5889 77% 0.21K 422<span style="white-space:pre-wrap">        </span> 18<span style="white-space:pre-wrap">        </span> 1688K dentry_cache</div>
<div> 6300 5208 82% 0.12K 210<span style="white-space:pre-wrap">        </span> 30<span style="white-space:pre-wrap">        </span> 840K size-128</div><div> 4368 3794 86% 0.03K 39<span style="white-space:pre-wrap">        </span>112<span style="white-space:pre-wrap">        </span> 156K size-32</div>
<div> 3150 2793 88% 0.25K 210<span style="white-space:pre-wrap">        </span> 15<span style="white-space:pre-wrap">        </span> 840K size-256</div><div> 3068 2563 83% 0.06K 52<span style="white-space:pre-wrap">        </span> 59<span style="white-space:pre-wrap">        </span> 208K Acpi-Operand</div>
<div> 2904 1253 43% 0.17K 132<span style="white-space:pre-wrap">        </span> 22<span style="white-space:pre-wrap">        </span> 528K vm_area_struct</div><div> 2376 2342 98% 1.00K 594 4<span style="white-space:pre-wrap">        </span> 2376K size-1024</div>
<div> 2304 380 16% 0.02K 16<span style="white-space:pre-wrap">        </span>144 64K anon_vma</div><div> 2256 1852 82% 0.08K 47<span style="white-space:pre-wrap">        </span> 48<span style="white-space:pre-wrap">        </span> 188K selinux_inode_security</div>
<div> 2121 1943 91% 0.55K 303 7<span style="white-space:pre-wrap">        </span> 1212K inode_cache</div><div> 1776 1463 82% 0.50K 222 8<span style="white-space:pre-wrap">        </span> 888K size-512</div>
<div> 1710 705 41% 0.25K 114<span style="white-space:pre-wrap">        </span> 15<span style="white-space:pre-wrap">        </span> 456K filp</div><div> 1698 1642 96% 0.58K 283 6<span style="white-space:pre-wrap">        </span> 1132K proc_inode_cache</div>
<div> 1632 1606 98% 2.00K 816 2<span style="white-space:pre-wrap">        </span> 3264K size-2048</div><div> 1590 1147 72% 0.25K 106<span style="white-space:pre-wrap">        </span> 15<span style="white-space:pre-wrap">        </span> 424K skbuff_head_cache</div>
<div> 1584 324 20% 0.02K 11<span style="white-space:pre-wrap">        </span>144 44K numa_policy</div><div> 1180 359 30% 0.06K 20<span style="white-space:pre-wrap">        </span> 59 80K delayacct_cache</div>
<div> 1140 1101 96% 0.74K 228 5<span style="white-space:pre-wrap">        </span> 912K ext3_inode_cache</div><div> 1080 1049 97% 0.09K 27<span style="white-space:pre-wrap">        </span> 40<span style="white-space:pre-wrap">        </span> 108K drbd_ee</div>
<div> 1054 1024 97% 0.11K 31<span style="white-space:pre-wrap">        </span> 34<span style="white-space:pre-wrap">        </span> 124K drbd_req</div><div> 1010 339 33% 0.02K<span style="white-space:pre-wrap">        </span> 5<span style="white-space:pre-wrap">        </span>202 20K biovec-1</div>
<div> 1008 888 88% 0.03K<span style="white-space:pre-wrap">        </span> 9<span style="white-space:pre-wrap">        </span>112 36K Acpi-Namespace</div><div> 944 335 35% 0.06K 16<span style="white-space:pre-wrap">        </span> 59 64K pid</div>
<div> 650 514 79% 0.75K 130 5<span style="white-space:pre-wrap">        </span> 520K shmem_inode_cache</div><div> 630 542 86% 0.12K 21<span style="white-space:pre-wrap">        </span> 30 84K bio</div>
<div> 558 353 63% 0.81K 62 9<span style="white-space:pre-wrap">        </span> 496K signal_cache</div><div> 496 496 100% 4.00K 496 1<span style="white-space:pre-wrap">        </span> 1984K size-4096</div>
<div> 410 351 85% 1.84K 205 2<span style="white-space:pre-wrap">        </span> 820K task_struct</div><div> 399 355 88% 2.06K 133 3<span style="white-space:pre-wrap">        </span> 1064K sighand_cache</div>
<div> 354 54 15% 0.06K<span style="white-space:pre-wrap">        </span> 6<span style="white-space:pre-wrap">        </span> 59 24K fs_cache</div><div><br></div></div><div>meminfo:</div><div><div>MemTotal: 65996216 kB</div>
<div>MemFree: 436188 kB</div><div>Buffers: 54272396 kB</div><div>Cached: 183784 kB</div><div>SwapCached: 0 kB</div><div>Active: 324660 kB</div><div>Inactive: 54143868 kB</div><div>
HighTotal: 0 kB</div><div>HighFree: 0 kB</div><div>LowTotal: 65996216 kB</div><div>LowFree: 436188 kB</div><div>SwapTotal: 8192504 kB</div><div>SwapFree: 8192388 kB</div><div>Dirty: 0 kB</div>
<div>Writeback: 0 kB</div><div>AnonPages: 12320 kB</div><div>Mapped: 8312 kB</div><div>Slab: 10988324 kB</div><div>PageTables: 1584 kB</div><div>NFS_Unstable: 0 kB</div><div>
Bounce: 0 kB</div><div>CommitLimit: 41190612 kB</div><div>Committed_AS: 44772 kB</div><div>VmallocTotal: 34359738367 kB</div><div>VmallocUsed: 267000 kB</div><div>VmallocChunk: 34359471059 kB</div><div>
HugePages_Total: 0</div><div>HugePages_Free: 0</div><div>HugePages_Rsvd: 0</div><div>Hugepagesize: 2048 kB</div><span class="HOEnZb"><font color="#888888"><div><br></div></font></span></div><span class="HOEnZb"><font color="#888888">-- <br>
Lin Zhao<br>Project Lead of Messagebus<div><a href="https://wiki.groupondev.com/Message_Bus" target="_blank">https://wiki.groupondev.com/Message_Bus</a><br>
3101 Park Blvd, Palo Alto, CA 94306<br></div>
</font></span></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>Lin Zhao<br>Project Lead of Messagebus<div><a href="https://wiki.groupondev.com/Message_Bus" target="_blank">https://wiki.groupondev.com/Message_Bus</a><br>3101 Park Blvd, Palo Alto, CA 94306<br>
</div>
</div>