[DRBD-user] rpc_tasks slowly consumes memory

Mon Dec 22 22:30:30 CET 2008

  Thanks for the response, Lars, you were correct. As it turns out, I needed
to update the nfs-utils package. I have done so and am currently watching
rpc_tasks slab usage. It's only been running for a couple of days now, but
it already seems to have stabilized. If all is well on Friday, I'll consider
this issue fixed.
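
For anyone who wants to watch the same cache without keeping slabtop open,
here is a minimal sketch that parses the text of /proc/slabinfo. The function
names (parse_slabinfo, cache_size_kib) are my own, the 4 KiB page size is an
assumption for this i386 box, and reading /proc/slabinfo normally requires
root.

```python
def parse_slabinfo(text):
    """Parse /proc/slabinfo (format 2.x) text into {cache_name: stats}."""
    caches = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner, and the column-header comment.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        # Layout: <name> <5 numbers> : tunables ... : slabdata <3 numbers>
        parts = line.split(":")
        head = parts[0].split()
        tail = parts[-1].split()
        if len(parts) < 3 or len(head) < 6 or len(tail) < 3 or tail[0] != "slabdata":
            continue
        caches[head[0]] = {
            "active_objs": int(head[1]),
            "num_objs": int(head[2]),
            "objsize": int(head[3]),        # bytes per object
            "objperslab": int(head[4]),
            "pagesperslab": int(head[5]),
            "num_slabs": int(tail[2]),
        }
    return caches


def cache_size_kib(stats, page_kib=4):
    """Approximate memory held by a cache, assuming page_kib KiB pages."""
    return stats["num_slabs"] * stats["pagesperslab"] * page_kib
```

Feeding it the contents of /proc/slabinfo and printing
cache_size_kib(caches["rpc_tasks"]) every few hours gives a number that can be
compared from day to day. As a sanity check against the slabtop output quoted
below: if each slab is a single 4 KiB page (which 15 objects of 0.25K per slab
suggest), then 209935 slabs x 4 KiB = 839740K, which matches the reported
cache size.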

Thanks again,

Erik Lat wrote:
> 
>  Hey all. I currently have a DRBD filesystem set up between 2 nodes, both
> running CentOS 5.2. Version information:
> 
> heartbeat-2.1.3-3.el5.centos
> kmod-drbd82-8.2.6-1.2.6.18_92.1.10.el5
> drbd82-8.2.6-1.el5.centos
> kernel-2.6.18-92.1.10.el5
> 
> /proc/drbd
> version: 8.2.6 (api:88/proto:86-88)
> GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by
> buildsvn at c5-i386-build, 2008-08-07 17:07:52
>  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
>     ns:409988212 nr:1220768 dw:411208980 dr:108310305 al:1353369 bm:1174
> lo:0 pe:0 ua:0 ap:0 oos:0
> 
> 
>   The problem is that after some time I start to see memory usage on the
> primary drbd node increase and never get freed. This happens over the
> course of a month, before the box (with a gig of RAM) eventually starts
> swapping. Under top I only see the heartbeat process using the most memory,
> which is only at 1.2%.
> 
>   However, when I run slabtop, I see a process 'rpc_tasks' consuming around
> 850M of the memory:
> 
>    OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 3149025 3149025 100%    0.25K 209935       15    839740K rpc_tasks
>   93583   93530  99%    0.02K    461      203      1844K avtab_node
> 
>  I've no idea what this task is, but I assume it has something to do with
> NFS. The only way to free up the memory is to reboot the server. 
> 
> Under /var/log/messages I see the following errors repeated hundreds of
> times:
> 
> Dec 12 08:20:04 fs1 kernel: lockd: server 192.168.0.128 not responding,
> timed out
> Dec 12 14:52:16 fs1 kernel: lockd: server 192.168.0.128 not responding,
> timed out
> Dec 13 14:52:09 fs1 kernel: lockd: server 192.168.0.128 not responding,
> timed out
> 
> The .128 host is an NFS client accessing data on the DRBD NFS share. There
> are 2 other nodes here (.129 and .130), but neither of them has any
> problems, and all of them have lockd / statd / etc. running.
> Both are mounted using NFS via UDP, not TCP.
> 
> Does anyone have any idea what the memory consumption error could be
> related to? I can provide any additional information if necessary.
> 
> As a side note, I did see that CentOS has a kernel update and an update to
> the drbd kernel module, but since the rpc_tasks() (system call?) seems to
> be chewing up the memory, I don't think upgrading drbd's module version
> will fix it, and I'm hesitant to upgrade to the newer kernel willy-nilly.
> 
> I found this, but only found 2 fixes pertaining to NFS:
> https://rhn.redhat.com/errata/RHSA-2008-1017.html
> 450335 - LTC41974-Pages of a memory mapped NFS file get corrupted.
> 469650 - [REG][5.2][NFSv4] Accessing the same file at the same time causes
> NFSv4 open() call to stall forever on NFS4ERR_DELAY
> 
> Any help is greatly appreciated.
> 
> 
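
To see whether the lockd timeouts quoted above really come from a single
client, they can be tallied per server address. This is only a sketch: the
function name count_lockd_timeouts is made up for illustration, and it assumes
each message sits on one line in /var/log/messages (the quoted lines above are
wrapped by the mail client).

```python
import re
from collections import Counter

# Matches the kernel message quoted above and captures the address.
LOCKD_TIMEOUT = re.compile(r"lockd: server (\S+) not responding")


def count_lockd_timeouts(lines):
    """Count 'lockd: server X not responding' messages per address X."""
    counts = Counter()
    for line in lines:
        match = LOCKD_TIMEOUT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file handle for /var/log/messages as lines and printing
counts.most_common() lists the addresses with the most timeouts first.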
