[DRBD-user] Oops after upgrading kernel

Sat Jan 22 01:02:09 CET 2005

On 2005-01-21T23:03:09, Pavel Srubar <pajinek at centrum.cz> wrote:

> 
> I have run drbd 0.7.7 on SLES9 for a couple of months without any
> problems.  Every time I upgraded the distribution kernel I compiled drbd
> against the new kernel sources and everything went well. But after the
> last upgrade to kernel version 2.6.5-7.139-smp
> SLES9_SP1_BRANCH-200501141541330000 I get an Oops after trying to make
> a filesystem on the drbd0 device, and any command accessing the device
> hangs until a hard reboot. Hardware used: 2 machines with Intel Xeon
> 3.0GHz, 1GB RAM; each machine has 2 SCSI 72GB HDDs connected together
> as HW RAID 1, and one partition (53GB), /dev/cciss/c0d0p7, is used as
> the physical device for drbd0. The actions I took and the dmesg output
> after them follow (for this test I used another HDD with a 400MB
> partition):

Ah damn, so drbd is also affected by this. This is a recent semantic
change in the bio handling which is also present in 2.6.10-ac10 and up.

The cause is probably in drbd_actlog.c:_drbd_md_sync_page_io(), where a
bio is allocated on the stack and then initialized via bio_init(),
instead of being allocated as a struct bio * via bio_alloc().

Jens: Maybe it would be good to get that work-around into the core
kernel, with a loud warning, instead of oopsing with a NULL deref? There
may be more code in external modules relying on this :-(
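
To make the failure mode concrete, here is a minimal userspace sketch in
C. It is a toy model only and not kernel code: struct toy_bio and every
toy_* function are hypothetical names invented for this illustration,
and the fields do not match the real struct bio. It assumes only the
general pattern described above: an init-style helper zeroes the
structure, an alloc-style helper also sets up fields that later code
relies on, and a clone-style helper trusts those fields. The clone here
returns an error where the real code oopsed, in the spirit of the loud
warning suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a bio. NOT the kernel's struct bio layout. */
struct toy_bio {
    int    *vecs;      /* vector array, attached only by toy_bio_alloc() */
    size_t  max_vecs;  /* capacity of vecs, set only by toy_bio_alloc()  */
    size_t  vcnt;      /* number of entries in use                       */
};

/* Plays the role of an init helper: zeroes the struct, attaches nothing. */
void toy_bio_init(struct toy_bio *b)
{
    memset(b, 0, sizeof(*b));
}

/* Plays the role of an allocator: returns a fully set up object or NULL. */
struct toy_bio *toy_bio_alloc(size_t nr_vecs)
{
    struct toy_bio *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    toy_bio_init(b);
    b->vecs = calloc(nr_vecs, sizeof(*b->vecs));
    if (!b->vecs) {
        free(b);
        return NULL;
    }
    b->max_vecs = nr_vecs;
    return b;
}

void toy_bio_free(struct toy_bio *b)
{
    if (b) {
        free(b->vecs);
        free(b);
    }
}

/*
 * Clone that depends on the allocator-owned fields. Without the check it
 * would copy through a NULL pointer when dst was only init'ed on the
 * stack; with the check it fails with -1 instead of crashing.
 */
int toy_bio_clone(struct toy_bio *dst, const struct toy_bio *src)
{
    if (!dst->vecs || !src->vecs || dst->max_vecs < src->vcnt)
        return -1;
    memcpy(dst->vecs, src->vecs, src->vcnt * sizeof(*src->vecs));
    dst->vcnt = src->vcnt;
    return 0;
}
```

In this model, a toy_bio declared on the stack and passed through
toy_bio_init() is rejected by toy_bio_clone(), while one obtained from
toy_bio_alloc() is cloned normally. That mirrors the fix proposed for
drbd: obtain the object from the allocator rather than initializing one
on the stack.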

I'll fix this in drbd tomorrow after the dojo; it's too late to do it
now.

(A similar bug existed in md, which probably explains where drbd
'inherited' it from ;-)

> Unable to handle kernel paging request at virtual address 4008d3c7
>  printing eip:
> c0178555
> *pde = 31dd8067
> Oops: 0003 [#1]
> SMP
> CPU:    0
> EIP:    0060:[<c0178555>]    Tainted: G  U
> EFLAGS: 00210206   (2.6.5-7.139-smp SLES9_SP1_BRANCH-200501141541330000)
> EIP is at __bio_clone+0x35/0xc0
> eax: 0000000c   ebx: 00000000   ecx: 00000003   edx: edc36b00
> esi: cdfd77c0   edi: 4008d3c7   ebp: edd20bc4   esp: ede65b34
> ds: 007b   es: 007b   ss: 0068
> Process mkreiserfs (pid: 4650, threadinfo=ede64000 task=f1d779b0)
> Stack: ee20a600 00000000 edd20ba8 f183f800 edc36b00 f93900b4 eddf6ff4 f7a95284
>        000b8005 00000000 00000001 edd20bc4 f7192904 f7192904 c1ae6e14 00000000
>        00000008 f7192904 00000008 c01741e1 00000000 00000000 00001000 00000000
> Call Trace:
>  [<f93900b4>] drbd_make_request_26+0x364/0xbdf [drbd]
>  [<c01741e1>] __find_get_block+0xa1/0x1b0
>  [<c026152d>] generic_make_request+0x11d/0x200
>  [<f90bf7c3>] search_by_key+0x153/0x14c0 [reiserfs]
>  [<c0151703>] mempool_alloc+0x73/0x140
>  [<c0128d50>] autoremove_wake_function+0x0/0x40
>  [<c0261678>] submit_bio+0x68/0x120
>  [<c0128d50>] autoremove_wake_function+0x0/0x40
>  [<c015d5e6>] do_no_page+0x246/0x8e0
>  [<c01776f2>] bio_alloc+0xd2/0x1c0
>  [<c017386d>] submit_bh+0x17d/0x230
>  [<c0175967>] block_read_full_page+0x357/0x360
>  [<c0179c50>] blkdev_get_block+0x0/0x80
>  [<c014d3a7>] add_to_page_cache+0x57/0x180
>  [<c01552c0>] read_pages+0x130/0x1b0
>  [<c0153b7d>] __alloc_pages+0xad/0x310
>  [<c015eeb8>] handle_mm_fault+0x138/0xb60
>  [<c015544e>] do_page_cache_readahead+0x10e/0x180
>  [<c01555e8>] page_cache_readahead+0x128/0x240
>  [<c014e692>] do_generic_mapping_read+0x332/0x7d0
>  [<c014cc60>] file_read_actor+0x0/0xf0
>  [<c014f682>] __generic_file_aio_read+0x1e2/0x220
>  [<c014cc60>] file_read_actor+0x0/0xf0
>  [<c0153b7d>] __alloc_pages+0xad/0x310
>  [<c014f7ef>] generic_file_read+0x8f/0xb0
>  [<c015eeb8>] handle_mm_fault+0x138/0xb60
>  [<c0128d50>] autoremove_wake_function+0x0/0x40
>  [<c011df93>] do_page_fault+0x163/0x53f
>  [<c0172216>] vfs_read+0xc6/0x160
>  [<c0179d80>] block_llseek+0x0/0x110
>  [<c01724c1>] sys_read+0x91/0xf0
>  [<c01091d9>] sysenter_past_esp+0x52/0x79
> 
> Code: f3 a5 a8 02 74 02 66 a5 a8 01 74 01 a4 8b 42 0c 8b 0a 8b 5a

Sincerely,
    Lars Marowsky-Brée <lmb at suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business