On 2005-01-21T23:03:09, Pavel Srubar <pajinek at centrum.cz> wrote:

> I have run drbd 0.7.7 on SLES9 for a couple of months without any
> problems. Every time I upgraded the distribution kernel I compiled drbd
> against the new kernel sources and everything went well. But after the
> last upgrade to kernel version 2.6.5-7.139-smp
> SLES9_SP1_BRANCH-200501141541330000 I get an Oops after trying to make a
> filesystem on the drbd0 device, and any command accessing the device
> hangs until a hard reboot. HW used: 2 machines, each with an Intel Xeon
> 3.0 GHz and 1 GB RAM; each machine has 2 SCSI 72 GB HDDs combined as
> HW RAID 1, and one partition (53 GB), /dev/cciss/c0d0p7, is used as the
> physical device for drbd0. Here are the actions I performed and the dmesg
> output after them (for this test I used another HDD with a 400 MB
> partition):

Ah damn, so drbd is also affected by this. This is a recent semantic
change in the bio handling which is also present in 2.6.10-ac10 and up.

The cause is probably in drbd_actlog.c:_drbd_md_sync_page_io(), where a
bio is allocated on the stack and then initialized via bio_init(),
instead of being allocated as a struct bio * via bio_alloc().

Jens: Maybe getting that work-around into the core kernel would be good,
with a loud warning maybe, instead of oopsing with a NULL deref? There
may be more code relying on this in external modules :-(

I'll fix this in drbd after the dojo tomorrow, but it's too late now.

(A similar bug existed in md, which probably explains where drbd
'inherited' it from ;-)

> Unable to handle kernel paging request at virtual address 4008d3c7
> printing eip:
> c0178555
> *pde = 31dd8067
> Oops: 0003 [#1]
> SMP
> CPU: 0
> EIP: 0060:[<c0178555>] Tainted: G U
> EFLAGS: 00210206 (2.6.5-7.139-smp SLES9_SP1_BRANCH-200501141541330000)
> EIP is at __bio_clone+0x35/0xc0
> eax: 0000000c ebx: 00000000 ecx: 00000003 edx: edc36b00
> esi: cdfd77c0 edi: 4008d3c7 ebp: edd20bc4 esp: ede65b34
> ds: 007b es: 007b ss: 0068
> Process mkreiserfs (pid: 4650, threadinfo=ede64000 task=f1d779b0)
> Stack: ee20a600 00000000 edd20ba8 f183f800 edc36b00 f93900b4 eddf6ff4 f7a95284
> 000b8005 00000000 00000001 edd20bc4 f7192904 f7192904 c1ae6e14 00000000
> 00000008 f7192904 00000008 c01741e1 00000000 00000000 00001000 00000000
> Call Trace:
> [<f93900b4>] drbd_make_request_26+0x364/0xbdf [drbd]
> [<c01741e1>] __find_get_block+0xa1/0x1b0
> [<c026152d>] generic_make_request+0x11d/0x200
> [<f90bf7c3>] search_by_key+0x153/0x14c0 [reiserfs]
> [<c0151703>] mempool_alloc+0x73/0x140
> [<c0128d50>] autoremove_wake_function+0x0/0x40
> [<c0261678>] submit_bio+0x68/0x120
> [<c0128d50>] autoremove_wake_function+0x0/0x40
> [<c015d5e6>] do_no_page+0x246/0x8e0
> [<c01776f2>] bio_alloc+0xd2/0x1c0
> [<c017386d>] submit_bh+0x17d/0x230
> [<c0175967>] block_read_full_page+0x357/0x360
> [<c0179c50>] blkdev_get_block+0x0/0x80
> [<c014d3a7>] add_to_page_cache+0x57/0x180
> [<c01552c0>] read_pages+0x130/0x1b0
> [<c0153b7d>] __alloc_pages+0xad/0x310
> [<c015eeb8>] handle_mm_fault+0x138/0xb60
> [<c015544e>] do_page_cache_readahead+0x10e/0x180
> [<c01555e8>] page_cache_readahead+0x128/0x240
> [<c014e692>] do_generic_mapping_read+0x332/0x7d0
> [<c014cc60>] file_read_actor+0x0/0xf0
> [<c014f682>] __generic_file_aio_read+0x1e2/0x220
> [<c014cc60>] file_read_actor+0x0/0xf0
> [<c0153b7d>] __alloc_pages+0xad/0x310
> [<c014f7ef>] generic_file_read+0x8f/0xb0
> [<c015eeb8>] handle_mm_fault+0x138/0xb60
> [<c0128d50>] autoremove_wake_function+0x0/0x40
> [<c011df93>] do_page_fault+0x163/0x53f
> [<c0172216>] vfs_read+0xc6/0x160
> [<c0179d80>] block_llseek+0x0/0x110
> [<c01724c1>] sys_read+0x91/0xf0
> [<c01091d9>] sysenter_past_esp+0x52/0x79
>
> Code: f3 a5 a8 02 74 02 66 a5 a8 01 74 01 a4 8b 42 0c 8b 0a 8b 5a

Sincerely,
    Lars Marowsky-Brée <lmb at suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
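
The distinction drawn in the reply, a bio set up on the stack with
bio_init() versus one obtained from bio_alloc(), can be illustrated with a
small userspace model. This is only a sketch and not kernel code: the
message does not say which state bio_alloc() sets up that bio_init() leaves
out, so struct toy_bio, its pool field, and all toy_* functions below are
hypothetical placeholders. The check function also models the suggestion of
a loud warning in place of a NULL dereference.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct bio; NOT the kernel definition. */
struct toy_bio {
    int   nr_vecs;
    void *pool;    /* placeholder for state that only the allocator fills in */
};

static int toy_pool;   /* dummy object standing in for allocator bookkeeping */

/* Models bio_init(): resets the structure, sets no allocator state. */
static void toy_bio_init(struct toy_bio *b)
{
    memset(b, 0, sizeof *b);
}

/* Models bio_alloc(): heap allocation plus allocator bookkeeping. */
static struct toy_bio *toy_bio_alloc(int nr_vecs)
{
    struct toy_bio *b = malloc(sizeof *b);

    if (b == NULL)
        return NULL;
    toy_bio_init(b);
    b->nr_vecs = nr_vecs;
    b->pool = &toy_pool;
    return b;
}

/*
 * Models a consumer that relies on the allocator state. It warns and
 * returns -1 rather than dereferencing a NULL pointer; returns 0 if the
 * bio came from toy_bio_alloc().
 */
static int toy_bio_clone_check(const struct toy_bio *src)
{
    if (src->pool == NULL) {
        fprintf(stderr, "warning: bio was not obtained from the allocator\n");
        return -1;
    }
    return 0;
}
```

In this model, a struct toy_bio declared on the stack and passed through
toy_bio_init() fails toy_bio_clone_check(), while one returned by
toy_bio_alloc() passes.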