[DRBD-user] kjournald getting stuck in sync_buffer on a drbd device

Lars Kellogg-Stedman lars at seas.harvard.edu
Sat Apr 26 01:18:09 CEST 2008


> OK. Your Primary has a pending count ("pe") > 0, which means it is
> waiting for the Secondary to complete stuff it is currently working on.
> So this looks like you're having issues on your Secondary. Please
> provide that ps and /proc/drbd output for your Secondary, just like you
> did for your Primary.

So when the primary looks like this:

version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by buildsvn at c5-x8664-build, 2008-03-09 10:16:01
  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
     ns:384 nr:440 dw:824 dr:3729 al:1 bm:5 lo:0 pe:2 ua:0 ap:1
         resync: used:0/31 hits:49 misses:5 starving:0 dirty:0 changed:5
         act_log: used:1/127 hits:95 misses:1 starving:0 dirty:0 changed:1
  1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
     ns:0 nr:21008 dw:21008 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
         resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
         act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0

The secondary looks like this:

version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by buildsvn at c5-x8664-build, 2008-03-09 10:16:01
  0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
     ns:42272 nr:32984 dw:38988 dr:45742 al:1 bm:52 lo:1 pe:0 ua:0 ap:0
	resync: used:0/31 hits:2478 misses:28 starving:0 dirty:0 changed:28
	act_log: used:0/127 hits:1500 misses:1 starving:0 dirty:0 changed:1
  1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
     ns:145396 nr:12 dw:145728 dr:69130 al:1 bm:131 lo:0 pe:0 ua:0 ap:0
	resync: used:0/31 hits:635 misses:9 starving:0 dirty:0 changed:9
	act_log: used:0/127 hits:36428 misses:1 starving:0 dirty:0 changed:1

(Here we're looking at drbd0, so when I say "primary" I mean "the  
system that is primary for drbd0", and similarly for "secondary").
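
For anyone comparing the two outputs by eye, here is a minimal sketch that pulls the per-device counters out of /proc/drbd text and lists the devices where any of lo, pe, ua or ap is nonzero. It assumes the 8.2-style layout pasted above; the function names are made up for illustration and are not part of DRBD, and the sample is the primary's output from this message.

```python
import re

# Device header line, e.g. "  0: cs:Connected st:Primary/Secondary ds:..."
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/([^/\s]+)")
# Counters on the "ns:... nr:..." line that follows each device header.
COUNTER = re.compile(r"\b(ns|nr|dw|dr|al|bm|lo|pe|ua|ap):(\d+)")
# "pe" is the pending count discussed above; lo, ua and ap are the other
# in-flight counters printed on the same line.
IN_FLIGHT = ("lo", "pe", "ua", "ap")

# The primary's /proc/drbd device lines as posted in this thread.
SAMPLE = """\
  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
     ns:384 nr:440 dw:824 dr:3729 al:1 bm:5 lo:0 pe:2 ua:0 ap:1
         resync: used:0/31 hits:49 misses:5 starving:0 dirty:0 changed:5
         act_log: used:1/127 hits:95 misses:1 starving:0 dirty:0 changed:1
  1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
     ns:0 nr:21008 dw:21008 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
         resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
         act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
"""


def parse_proc_drbd(text):
    """Return {minor: {"cs", "local_role", "peer_role", counters...}}."""
    devices = {}
    current = None
    for line in text.splitlines():
        header = DEVICE_LINE.match(line)
        if header:
            current = {
                "cs": header.group(2),
                "local_role": header.group(3),
                "peer_role": header.group(4),
            }
            devices[int(header.group(1))] = current
        elif current is not None and line.lstrip().startswith("ns:"):
            # Only the counter line is parsed; resync/act_log lines are skipped.
            for key, value in COUNTER.findall(line):
                current[key] = int(value)
    return devices


def busy_devices(devices):
    """Return the in-flight counters of every device where one is nonzero."""
    return {
        minor: {key: dev.get(key, 0) for key in IN_FLIGHT}
        for minor, dev in devices.items()
        if any(dev.get(key, 0) for key in IN_FLIGHT)
    }


for minor, counters in busy_devices(parse_proc_drbd(SAMPLE)).items():
    print("drbd%d: %s" % (minor, counters))
```

On a live system the text would come from reading /proc/drbd instead of SAMPLE. Running this periodically on both nodes would show whether the counters ever drain while kjournald is hung.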

The two systems are passing network traffic back and forth while  
kjournald is hung.  I took a short packet trace, which is available  
here:

   http://people.seas.harvard.edu/~lars/drbd/packets

On the primary, kjournald is stuck in the D state in sync_buffer.  On
the secondary, there don't appear to be any similarly stuck
processes.  There are no obvious errors in syslog on either system.
There are entries like this on the secondary:

   drbd1: [drbd1_worker/2321] sock_sendmsg time expired, ko = 4294967255

But these appear to correspond to me shutting down the primary, rather
than to the start of the hang.
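
The check for stuck processes mentioned above can be sketched as follows: filter ps output for tasks in uninterruptible sleep (state D) and report the kernel function they are waiting in. This is only an illustration, assuming procps-style `ps -eo pid,stat,wchan:32,comm` output; the helper names are hypothetical and the sample rows (including the PIDs) are invented, not taken from either system.

```python
import subprocess

# Invented sample in the shape of `ps -eo pid,stat,wchan:32,comm` output.
# The PIDs and the non-kjournald rows are made up for illustration.
SAMPLE_PS = """\
  PID STAT WCHAN                            COMMAND
    1 Ss   -                                init
 1234 D    sync_buffer                      kjournald
 5678 S    -                                sshd
"""


def uninterruptible(ps_output):
    """Return (pid, wchan, command) for every process whose state starts with D."""
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, stat, wchan, comm = fields
        if stat.startswith("D"):
            stuck.append((int(pid), wchan, comm.strip()))
    return stuck


def live_ps_output():
    """Run ps on the local machine (not called here; needs procps ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,wchan:32,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


for pid, wchan, comm in uninterruptible(SAMPLE_PS):
    print("%s (pid %d) blocked in %s" % (comm, pid, wchan))
```

Passing `live_ps_output()` instead of `SAMPLE_PS` would run the same filter against the local process table on each node.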

-- Lars



