[DRBD-user] kjournald getting stuck in sync_buffer on a drbd device

Florian Haas florian.haas at linbit.com
Fri Apr 25 12:08:23 CEST 2008


Lars,

On Thursday 24 April 2008 21:14:15 Lars Kellogg-Stedman wrote:
> Howdy,
>
> I've recently had access to a filesystem on a DRBD device freeze up on
> two separate systems, and I wanted to see if anyone else has
> encountered this behavior.  Both systems are running CentOS 5, with
> kernel 2.6.18-53.1.14.el5 and drbd 8.2.5.
>
> On two occasions, an attempt to "ls" a filesystem hung.  I found
> kjournald stuck in sync_buffer:
>
>    # ps -o pid,state,wchan:20,cmd -e | grep D
>    PID S WCHAN                CMD
>     3021 D sync_buffer          [kjournald]
>     6478 D get_write_access     ls --color=tty
>
> After several minutes (5?), things would eventually start working
> again.  While experiencing the problem, /proc/drbd looked like this:
>
> version: 8.2.5 (api:88/proto:86-88)
> GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by buildsvn at c5-x8664-build, 2008-03-09 10:16:01
>   0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
>      ns:3920 nr:4320648 dw:4324568 dr:4526 al:9 bm:270 lo:0 pe:0 ua:0 ap:0
> 	resync: used:0/31 hits:272417 misses:288 starving:0 dirty:0 changed:288
> 	act_log: used:0/127 hits:971 misses:9 starving:0 dirty:0 changed:9
>   1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
>      ns:4269080 nr:0 dw:132640 dr:4137225 al:49 bm:293 lo:0 pe:2 ua:0 ap:2
> 	resync: used:0/31 hits:272762 misses:283 starving:0 dirty:0 changed:283
> 	act_log: used:1/127 hits:33111 misses:55 starving:0 dirty:6 changed:49
>
> (The problem occurred on drbd1, for which this system is primary).
>
> Has anyone seen this before?

OK. Your Primary has a pending count ("pe") > 0 on drbd1, which means it has
sent requests to the Secondary and is still waiting for the Secondary to
acknowledge them. So this looks like you're having issues on your Secondary.
Please provide the same ps and /proc/drbd output for your Secondary, just like
you did for your Primary.
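
If you want to catch this condition while it is happening, the "pe" check
described above can be scripted. The following is only a rough sketch, not a
LINBIT tool: the function names pending_counts and stuck_devices are made up
for illustration, and the parsing assumes the 8.2-style /proc/drbd layout shown
in your mail (a "N: cs:..." line followed by a counters line containing
"pe:N"). Other DRBD versions may format this file differently.

```python
import re


def pending_counts(proc_drbd_text):
    """Return {minor: pe} for each device in /proc/drbd-style text.

    Assumes the layout quoted in this thread: a device line such as
    "  1: cs:Connected ..." followed by a counters line with "pe:<n>".
    """
    counts = {}
    minor = None
    for line in proc_drbd_text.splitlines():
        device = re.match(r"\s*(\d+):\s+cs:", line)
        if device:
            # Remember which minor the following counters belong to.
            minor = int(device.group(1))
            continue
        pending = re.search(r"\bpe:(\d+)", line)
        if pending and minor is not None:
            counts[minor] = int(pending.group(1))
    return counts


def stuck_devices(proc_drbd_text):
    """Return the minors whose pending count is greater than zero."""
    counts = pending_counts(proc_drbd_text)
    return [minor for minor, pe in sorted(counts.items()) if pe > 0]


# Counters copied from the output quoted above (wrapped lines rejoined).
SAMPLE = """\
  0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
     ns:3920 nr:4320648 dw:4324568 dr:4526 al:9 bm:270 lo:0 pe:0 ua:0 ap:0
  1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
     ns:4269080 nr:0 dw:132640 dr:4137225 al:49 bm:293 lo:0 pe:2 ua:0 ap:2
"""

if __name__ == "__main__":
    # On a live system, pass open("/proc/drbd").read() instead of SAMPLE.
    print(stuck_devices(SAMPLE))
```

A single non-zero reading is normal under load. What matters is a "pe" value
that stays above zero for minutes, so run this repeatedly while the hang is in
progress.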

Also, check the syslog on your Secondary and look for messages that indicate 
I/O subsystem errors and/or timeouts.

Finally, check the syslog on your Primary and grep for
"sock_sendmsg time expired" messages.
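
A plain grep is all you need here. If you would rather script the search, for
example to run it on both nodes, a minimal sketch could look like this. The
helper name find_send_timeouts is hypothetical, and the log lines in the
demonstration are placeholders, not real DRBD output. The syslog location is
also an assumption; on CentOS it is usually /var/log/messages.

```python
def find_send_timeouts(lines, needle="sock_sendmsg time expired"):
    """Return every line that contains the given message text."""
    return [line.rstrip("\n") for line in lines if needle in line]


if __name__ == "__main__":
    # Placeholder input for demonstration only. On a real node, iterate
    # over the syslog file instead, e.g. open("/var/log/messages").
    demo = [
        "placeholder line one\n",
        "placeholder prefix: sock_sendmsg time expired placeholder suffix\n",
        "placeholder line three\n",
    ]
    for hit in find_send_timeouts(demo):
        print(hit)
```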

Let us know if that helps.

Cheers,
Florian

-- 
: Florian G. Haas
: LINBIT Information Technologies GmbH
: Vivenotgasse 48, A-1120 Vienna, Austria

When replying, there is no need to CC my personal address.
I monitor the list on a daily basis. Thank you.
