Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Thu, Aug 23, 2007 at 11:41:44AM +0200, Jens Beyer wrote:
> Hi,
>
> I'm running 2.6.22.1 with 8.0.4 (also tried 8.0.5)
> on a Testsystem. The drbd itself is located on lvm.
>
> Since using 2.6.22.1, some problems with DRBD have appeared:
> under several conditions I can get an operation on a drbd device
> to 'hang' in D state.
>
> It is best triggered by disconnecting the secondary node of a running
> drbd 'cluster' via drbdadm disconnect|down <device> and running I/O
> operations on the primary node (like rm, or even a recovery mount of xfs).
>
> Trying to reconnect at this point isn't possible (the secondary comes up
> only as far as WFBitMapT; the primary ends up in WFReportParams, with
> drbd_receive in 'D' state too; see trace at end of mail).
>
> On the same cluster running 2.6.20.2 (and earlier) I never had
> similar problems (and they aren't reproducible the way they are on .22).
>
> Some traces follow (doing an rm -r on xfs on drbd); only processes in
> D state are shown. I can provide a full trace if needed.
>
> Though this seems to be kernel-related, I prefer to post it here rather
> than on lkml, for an easier track-down.

I think this is a side effect of git commit d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1
("When stacked block devices are in-use (e.g. md or dm), the recursive calls
to generic_make_request can use up a lot of space, and we would rather avoid that").

I don't see exactly where it locks up, since we actually no longer do a
blocking recursion into generic_make_request, but rather queue housekeeping
requests to the worker. well. unless, maybe, in the bitmap writeout path,
indirectly via submit_bio. but that should be it.
(unless it locks up in the layers _below_ us; are any more stacked block
devices involved? dm? md?)
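for reference, this is roughly what that commit turned generic_make_request
into (a simplified sketch from memory of 2.6.22 block/ll_rw_blk.c, not a
verbatim copy):

    void generic_make_request(struct bio *bio)
    {
            if (current->bio_tail) {
                    /* a make_request function further up the stack is
                     * already active: just queue the bio on the
                     * per-task list and return -- no recursion */
                    bio->bi_next = NULL;
                    *(current->bio_tail) = bio;
                    current->bio_tail = &bio->bi_next;
                    return;
            }
            /* outermost call: submit this bio, then keep draining
             * whatever nested calls queued up in the meantime */
            do {
                    current->bio_list = bio->bi_next;
                    if (bio->bi_next == NULL)
                            current->bio_tail = &current->bio_list;
                    else
                            bio->bi_next = NULL;
                    __generic_make_request(bio);
                    bio = current->bio_list;
            } while (bio);
            current->bio_tail = NULL;
    }

the important consequence: a bio submitted from within a make_request
function is only *queued*; it cannot make any progress until that
make_request function returns into the loop above.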
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079000] xfsbufd D 000001208fb9b9ad 0 14878 2 (L-TLB)
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079017] ffff81010bb75c00 0000000000000046 0000000000000000 ffff81011b74f7d0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079038] ffff81010bb75be0 ffffffff804b6080 ffffffff80551100 000000010001509c
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079058] 000000000bb75b90 ffff81011b71c9a8 0000000000000171 ffffffff804b6080
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079078] Call Trace:
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079100] [<ffffffff88144e8e>] :drbd:lc_find+0x1e/0x60
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079120] [<ffffffff881435d7>] :drbd:drbd_al_begin_io+0x267/0x320

hm. if anywhere, I'd expect it to lock up in drbd_al_begin_io, ok.
but not in lc_find, that is even before we queue the housekeeping request
for the worker... maybe that last part of the stack trace is unreliable.
probably I'm still missing something.

> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079131] [<ffffffff80242cf0>] autoremove_wake_function+0x0/0x30
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079139] [<ffffffff80242cf0>] autoremove_wake_function+0x0/0x30
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079150] [<ffffffff802ad1ac>] __bio_clone+0x9c/0xc0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079170] [<ffffffff88140803>] :drbd:drbd_make_request_common+0x5b3/0xb90
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079179] [<ffffffff80302130>] elv_rb_add+0x70/0x80
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079188] [<ffffffff80242cf0>] autoremove_wake_function+0x0/0x30
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079196] [<ffffffff80242cf0>] autoremove_wake_function+0x0/0x30
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079207] [<ffffffff80304014>] generic_make_request+0x1c4/0x260
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079216] [<ffffffff8030410e>] submit_bio+0x5e/0xf0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079224] [<ffffffff802ad01b>] __bio_add_page+0x1ab/0x220
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079259] [<ffffffff88296870>] :xfs:_xfs_buf_ioapply+0x230/0x2e0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079292] [<ffffffff88297659>] :xfs:xfs_buf_iorequest+0x29/0x70
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079324] [<ffffffff8829bda5>] :xfs:xfs_bdstrat_cb+0x35/0x50
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079355] [<ffffffff88297902>] :xfs:xfsbufd+0x92/0x150
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079387] [<ffffffff88297870>] :xfs:xfsbufd+0x0/0x150
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079394] [<ffffffff8024294c>] kthread+0x6c/0xa0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079403] [<ffffffff8020ac78>] child_rip+0xa/0x12
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079412] [<ffffffff802428e0>] kthread+0x0/0xa0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079419] [<ffffffff8020ac6e>] child_rip+0x0/0x12
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.079424]
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.082992]
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.082997] rm D 0000011a7687dbb8 0 15156 13547 (NOTLB)
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083014] ffff8101191af7e8 0000000000000082 0000000000000000 ffff8101191af87c
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083034] ffff8101191af880 ffffffff8825359b 0000000100000246 0000000100014656
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083059] 0000000308cd8080 ffff81004f9e8a68 000000000002c3ae ffff81011be057d0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083078] Call Trace:
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083109] [<ffffffff8825359b>] :xfs:xfs_bmap_search_multi_extents+0x10b/0x120
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083121] [<ffffffff804041b1>] wait_for_completion+0xa1/0x100
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083131] [<ffffffff80228f70>] default_wake_function+0x0/0x10
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083142] [<ffffffff80228f70>] default_wake_function+0x0/0x10
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083163] [<ffffffff88143938>] :drbd:_drbd_md_sync_page_io+0xc8/0x130

ah. no. the xfsbufd above is waiting for this rm to finish.

> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083186] [<ffffffff881442fc>] :drbd:drbd_md_sync_page_io+0x29c/0x4f0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083204] [<ffffffff881317a0>] :drbd:drbd_bm_get_lel+0x130/0x220
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083223] [<ffffffff88132a58>] :drbd:drbd_bm_write_sect+0xc8/0x220

right. we have to queue the bitmap updates for the worker as well.
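to spell out the pattern that bites us here (an illustrative sketch only,
not the actual drbd source; 2.6.22-era bio API, the helper names are made
up): a synchronous sector write issued from inside our make_request path.

    static int sync_write_endio(struct bio *bio, unsigned int bytes_done,
                                int error)
    {
            if (bio->bi_size)
                    return 1;       /* not done yet */
            complete((struct completion *) bio->bi_private);
            return 0;
    }

    /* called (indirectly) from drbd_make_request_common, i.e. from
     * within a make_request function */
    static int sync_sector_write(struct block_device *bdev, sector_t sector,
                                 struct page *page)
    {
            struct completion event;
            struct bio *bio = bio_alloc(GFP_NOIO, 1);

            init_completion(&event);
            bio->bi_bdev = bdev;
            bio->bi_sector = sector;
            bio_add_page(bio, page, 512, 0);
            bio->bi_private = &event;
            bio->bi_end_io = sync_write_endio;

            submit_bio(WRITE, bio);
            /* deadlock: we are inside generic_make_request, so the bio
             * above was merely queued on current->bio_list.  it cannot
             * be submitted before we return -- and we don't return: */
            wait_for_completion(&event);
            bio_put(bio);
            return 0;
    }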
so what happens is:

  fs
   -> generic_make_request
   -> __generic_make_request
   -> drbd_make_request_common
   -> drbd_al_begin_io
   -> drbd_bm_write_sect
   -> drbd_md_sync_page_io
   -> submit_bio
   -> generic_make_request
      (since this is recursed, it will only add the request to the
      list, but not submit it)
   -> wait for that bio to complete, which will never happen, since it
      would only actually be submitted in the loop of the outermost
      generic_make_request.

problem is understood. I'll fix this as soon as I find the time to code it up.

btw, we fixed the activity log write path back on 2006-01-29 (even within
drbd 0.7 already), when that patch by Neil Brown got included into FC4 and
those users first complained about deadlocks. but something else has changed
since, apparently, or we only fixed it half way back then.
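the fix will go in that direction, roughly like this (a sketch only, not
the actual patch; drbd_queue_work and struct drbd_work are drbd's existing
worker infrastructure, but bm_write_work, w_write_bm_sect and queue_bm_write
are made-up names, and error handling is omitted): hand the bitmap sector
write to the drbd worker thread, which runs outside of any
generic_make_request loop and may block safely.

    struct bm_write_work {
            struct drbd_work w;     /* generic drbd work item, has .cb */
            unsigned long enr;      /* extent number whose bitmap sector
                                     * needs to be written out */
    };

    /* runs in worker context, where blocking on disk I/O is fine */
    static int w_write_bm_sect(struct drbd_conf *mdev, struct drbd_work *w,
                               int cancel)
    {
            struct bm_write_work *bww =
                    container_of(w, struct bm_write_work, w);

            if (!cancel)
                    drbd_bm_write_sect(mdev, bww->enr);
            kfree(bww);
            return 1;
    }

    /* in the make_request path, instead of calling drbd_bm_write_sect()
     * synchronously: */
    static void queue_bm_write(struct drbd_conf *mdev, unsigned long enr)
    {
            struct bm_write_work *bww = kmalloc(sizeof(*bww), GFP_NOIO);

            bww->enr = enr;
            bww->w.cb = w_write_bm_sect;
            drbd_queue_work(&mdev->data.work, &bww->w);
    }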
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083243] [<ffffffff8814353d>] :drbd:drbd_al_begin_io+0x1cd/0x320
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083252] [<ffffffff80264f99>] mempool_alloc+0x39/0x110
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083262] [<ffffffff80264f99>] mempool_alloc+0x39/0x110
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083271] [<ffffffff80281d99>] cache_alloc_refill+0x199/0x500
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083280] [<ffffffff802ad1ac>] __bio_clone+0x9c/0xc0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083298] [<ffffffff88140803>] :drbd:drbd_make_request_common+0x5b3/0xb90
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083309] [<ffffffff80302130>] elv_rb_add+0x70/0x80
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083317] [<ffffffff80242cf0>] autoremove_wake_function+0x0/0x30
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083326] [<ffffffff80242cf0>] autoremove_wake_function+0x0/0x30
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083337] [<ffffffff80304014>] generic_make_request+0x1c4/0x260
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083346] [<ffffffff8030410e>] submit_bio+0x5e/0xf0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083354] [<ffffffff802ad01b>] __bio_add_page+0x1ab/0x220
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083390] [<ffffffff88296870>] :xfs:_xfs_buf_ioapply+0x230/0x2e0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083422] [<ffffffff88297659>] :xfs:xfs_buf_iorequest+0x29/0x70
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083454] [<ffffffff8829bda5>] :xfs:xfs_bdstrat_cb+0x35/0x50
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083486] [<ffffffff88297a04>] :xfs:xfs_buf_iostart+0x44/0xa0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083522] [<ffffffff88287ed8>] :xfs:xfs_trans_push_ail+0x1f8/0x280
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083556] [<ffffffff8827d374>] :xfs:xfs_log_reserve+0x74/0xf0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083591] [<ffffffff882874cf>] :xfs:xfs_trans_reserve+0xaf/0x200
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083624] [<ffffffff88293ae2>] :xfs:kmem_zone_zalloc+0x32/0x50
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083659] [<ffffffff882743a1>] :xfs:xfs_itruncate_finish+0x121/0x310
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083694] [<ffffffff8828dbeb>] :xfs:xfs_inactive+0x3fb/0x4e0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083727] [<ffffffff8829d97c>] :xfs:xfs_fs_clear_inode+0xec/0x120
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083737] [<ffffffff8029b126>] clear_inode+0x116/0x150
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083747] [<ffffffff8029b76b>] generic_delete_inode+0x11b/0x150
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083757] [<ffffffff8029a557>] iput+0x67/0x80
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083765] [<ffffffff80291391>] do_unlinkat+0x101/0x180
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083775] [<ffffffff802931fb>] sys_getdents+0xbb/0xe0
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083785] [<ffffffff80209e5e>] system_call+0x7e/0x83
> Aug 21 11:14:20 boxfe02 kernel: [ 1477.083793]

--
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.