[DRBD-user] rpc_tasks slowly consumes memory

Erik Lat forums at htbindustries.org
Fri Dec 19 22:33:30 CET 2008


I should also note that the NFS clients are using NFS version 2, with
rsize=4096 and wsize=4096.


Erik Lat wrote:
> 
> 
>   As a side note, I am unable to reboot the machine. A few minutes after
> running 'shutdown -r now' I see this on the console screen:
> 
> 
> ====================
> BUG:  soft lockup - CPU#0 stuck for 10s! [rpciod/0:3134]
> 
> Pid: 3134, comm: rpciod/0
> EIP: 0060:[<C060998E>] CPU: 0
> EIP is at _spin_lock_bh+0xf/0x18
>   EFLAGS: 00000286    Tainted: G      (2.6.18-92.1.10.el5 #1)
> EAX: f251b000 EBX: f38d7b58 ECX: d143ab84 EDX: 90d91878
> ESI: f38d7b58 EDI: 00000000 EBP: 00000246 DS: 007b ES: 007b
> CR0: 8005003b CR2: 00172fa0 CR3: 00726000 CR4: 000006d0
>   [<f99e89b9>] svc_wake_up+0xe/0x4b [sunrpc]
>   [<f99e5c54>] rpc_exit_task+0x1b/0x58 [sunrpc]
>   [<f99e5ed9>] __rpc_execute+0x7a/0x1f8 [sunrpc]
>   [<c043322a>] run_workqueue+0x78/0xb5
>   [<f99e6057>] rpc_async_schedule+0x0/0x5 [sunrpc]
>   [<c0433ade>] worker_thread+0xd9/0x10b
>   [<c042027b>] default_wake_function+0x0/0xc
>   [<c0433a05>] worker_thread+0x0/0x10b
>   [<c0435eed>] kthread+0xc0/0xeb
>   [<c0435e2d>] kthread+0x0/0xeb
>   [<c0405c3b>] kernel_thread_helper+0x7/0x10
> ====================
> 
> 
> 
> Erik Lat wrote:
>> 
>>  Hey all. I currently have a DRBD filesystem set up between two nodes,
>> both running CentOS 5.2. Version information:
>> 
>> heartbeat-2.1.3-3.el5.centos
>> kmod-drbd82-8.2.6-1.2.6.18_92.1.10.el5
>> drbd82-8.2.6-1.el5.centos
>> kernel-2.6.18-92.1.10.el5
>> 
>> /proc/drbd
>> version: 8.2.6 (api:88/proto:86-88)
>> GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-i386-build, 2008-08-07 17:07:52
>>  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
>>     ns:409988212 nr:1220768 dw:411208980 dr:108310305 al:1353369 bm:1174 lo:0 pe:0 ua:0 ap:0 oos:0
>> 
>> 
>>   The problem is that over time, memory usage on the primary DRBD node
>> increases and is never freed. This happens over the course of a month,
>> until the box (with a gig of RAM) eventually starts swapping. In top,
>> the biggest memory consumer I see is the heartbeat process, which is
>> only at 1.2%.
>> 
>>   However, when I run slabtop, I see an entry, 'rpc_tasks', consuming
>> around 850M of the memory:
>> 
>>     OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>>  3149025 3149025 100%    0.25K 209935       15    839740K rpc_tasks
>>    93583   93530  99%    0.02K    461      203      1844K avtab_node
>> 
>>  I've no idea what this task is, but I assume it has something to do with
>> NFS. The only way to free up the memory is to reboot the server. 
>> 
>> Under /var/log/messages I see the following errors repeated hundreds of
>> times:
>> 
>> Dec 12 08:20:04 fs1 kernel: lockd: server 192.168.0.128 not responding, timed out
>> Dec 12 14:52:16 fs1 kernel: lockd: server 192.168.0.128 not responding, timed out
>> Dec 13 14:52:09 fs1 kernel: lockd: server 192.168.0.128 not responding, timed out
>> 
>> The .128 host is an NFS client accessing data on the DRBD NFS share.
>> There are two other clients here (.129 and .130), but neither of them has
>> any problems. All of them have lockd / statd / etc. running.
>> Both are mounted using NFS via UDP, not TCP.
>> 
>> Does anyone have any idea what the memory consumption error could be
>> related to? I can provide any additional information if necessary.
>> 
>> As a side note, I did see that CentOS has a kernel update and an update
>> to the DRBD kernel module, but since rpc_tasks (a system call?) seems
>> to be chewing up the memory, I don't think upgrading the DRBD module
>> will fix it, and I'm hesitant to upgrade to the newer kernel willy-nilly.
>> 
>> I found this erratum, but it lists only two fixes pertaining to NFS:
>> https://rhn.redhat.com/errata/RHSA-2008-1017.html
>> 450335 - LTC41974-Pages of a memory mapped NFS file get corrupted.
>> 469650 - [REG][5.2][NFSv4] Accessing the same file at the same time
>> causes NFSv4 open() call to stall forever on NFS4ERR_DELAY
>> 
>> Any help is greatly appreciated.
>> 
>> 
> 
> 
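
For anyone who wants to track this over time instead of eyeballing slabtop, here is a rough Python sketch that reads one cache's line from /proc/slabinfo and works out its footprint. It assumes the slabinfo 2.x line layout and 4 KiB pages; the function names are made up for illustration, and the tunables values in the sample line are placeholders, not taken from the box above. Only the rpc_tasks object and slab counts come from the slabtop output quoted in this thread.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (true for i386)


def parse_slabinfo_line(line):
    """Parse one data line of /proc/slabinfo (assumed format version 2.x).

    Layout: name active_objs num_objs objsize objperslab pagesperslab
            : tunables ... : slabdata active_slabs num_slabs sharedavail
    """
    head, _tunables, slabdata = line.split(":")
    name, active_objs, num_objs, objsize, objperslab, pagesperslab = head.split()
    _label, active_slabs, num_slabs, _sharedavail = slabdata.split()
    return {
        "name": name,
        "active_objs": int(active_objs),
        "num_objs": int(num_objs),
        "objsize": int(objsize),
        "objperslab": int(objperslab),
        "pagesperslab": int(pagesperslab),
        "active_slabs": int(active_slabs),
        "num_slabs": int(num_slabs),
    }


def cache_size_kib(entry, page_size=PAGE_SIZE):
    """Memory held by the cache: slabs * pages per slab * page size."""
    return entry["num_slabs"] * entry["pagesperslab"] * page_size // 1024


def payload_kib(entry):
    """Memory taken by the objects themselves, ignoring per-slab slack."""
    return entry["num_objs"] * entry["objsize"] // 1024


def find_cache(path, name):
    """Return the parsed entry for cache `name`, or None if it is absent."""
    with open(path) as f:
        for line in f:
            fields = line.split(None, 1)
            # Header lines start with "slabinfo" or "#", so they never match.
            if fields and fields[0] == name:
                return parse_slabinfo_line(line)
    return None


# Sample line rebuilt from the slabtop figures in this thread.
# The tunables values (120 60 8) are placeholders.
SAMPLE = ("rpc_tasks 3149025 3149025 256 15 1 "
          ": tunables 120 60 8 : slabdata 209935 209935 0")
```

With the sample line, cache_size_kib() gives 839740, matching the CACHE SIZE column from slabtop, and payload_kib() gives 787256, so nearly all of that memory is live rpc_task objects rather than slack. On a real machine, something like find_cache("/proc/slabinfo", "rpc_tasks") run from cron (reading the file may need root) would show how quickly the cache grows.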




