[DRBD-user] DRBD(?) BUG() in 2.6.27-rc6

Lars Ellenberg lars.ellenberg at linbit.com
Mon Sep 22 20:35:43 CEST 2008



On Mon, Sep 22, 2008 at 06:59:12PM +0200, Jan Kasprzak wrote:
> 	Hello,
> 
> I have tried to migrate one node of my DRBD+Heartbeat cluster to 2.6.27-rc6
> (with drbd-8.0.13), but got the following BUG() trace soon after the migration
> when commands started to access their data on ext3fs on top of drbd
> (the commands in the following oopses are kdc, postgrey, and dspam
> - all of them have their state data on the drbd volume).
> 
> 	I have reverted back to previous kernel, but there is possibly
> something wrong WRT. DRBD and the block layer in the upcoming kernel.
> I have not seen it on regular ext3-over-/dev/sd*.

ext3 tries to submit empty bios: apparently READ (and READA) requests
with no pages attached to store the data to be read.

drbd is not even involved yet: the BUG triggers in submit_bio,
before the bio is handed down to drbd.

my guess is that either there is a generic bug in there somewhere,
or they got the logic wrong for the case where bio_add_page()
accepts less than the requested length.
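
to make that second guess concrete, here is a toy user-space model in
Python. it is not kernel code: ToyBio, add_page, submit, the "limit"
parameter and both reader functions are made up for illustration, and the
device's limit is reduced to a single byte count. it only sketches how a
caller that submits the current bio whenever a page is refused can end up
submitting an empty one, and how checking for an empty bio first avoids that.

```python
from dataclasses import dataclass

PAGE = 4096  # bytes per page in this toy model


@dataclass
class ToyBio:
    """Toy stand-in for an I/O request; not the kernel's struct bio."""
    size: int = 0  # bytes of page data attached so far


def add_page(bio, length, limit):
    """Attach `length` bytes if the (hypothetical) device limit allows it.

    Returns the number of bytes accepted: `length` on success, 0 on refusal.
    """
    if bio.size + length > limit:
        return 0
    bio.size += length
    return length


def submit(bio, submitted):
    """Record a submission; an empty request is treated as a fatal bug."""
    if bio.size == 0:
        raise RuntimeError("BUG: empty bio submitted")
    submitted.append(bio.size)


def read_pages_naive(npages, limit):
    """On a refused page, always submit the current bio and start a new one."""
    submitted, bio = [], ToyBio()
    for _ in range(npages):
        if add_page(bio, PAGE, limit) < PAGE:
            submit(bio, submitted)  # raises if the bio is still empty
            bio = ToyBio()
            add_page(bio, PAGE, limit)
    if bio.size:
        submit(bio, submitted)
    return submitted


def read_pages_guarded(npages, limit):
    """Same loop, but never submit an empty bio; count such pages instead."""
    submitted, fallback, bio = [], 0, ToyBio()
    for _ in range(npages):
        if add_page(bio, PAGE, limit) == PAGE:
            continue
        if bio.size:
            # The bio is full: send it and retry the page on a fresh one.
            submit(bio, submitted)
            bio = ToyBio()
            if add_page(bio, PAGE, limit) == PAGE:
                continue
        # Refused even on an empty bio: needs some other read path.
        fallback += 1
    if bio.size:
        submit(bio, submitted)
    return submitted, fallback
```

with a limit of at least one page both variants behave the same. with a
limit below one page, read_pages_naive() raises on the first page, while
read_pages_guarded() reports every page as needing a fallback.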

if so, you should be able to reproduce the very same symptoms and
BUG stacktraces with software raid0 and a stripe width of <= 32k.

> -Yenya
> 
> ------------[ cut here ]------------
> kernel BUG at block/blk-core.c:1495!
> invalid opcode: 0000 [1] SMP 
> CPU 1 
> Modules linked in: drbd ftdi_sio usbserial
> Pid: 3518, comm: kdc Not tainted 2.6.27-rc6 #1
> RIP: 0010:[<ffffffff802f825a>]  [<ffffffff802f825a>] submit_bio+0x29/0xcf
> RSP: 0018:ffff88003e9a9a98  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff88007f960940 RCX: 0000000000000000
> RDX: ffffffff802a23c1 RSI: ffff88007f960940 RDI: 0000000000000000
> RBP: ffff88003e9a9bd8 R08: ffff8800568c6b50 R09: 0000000000000040
> R10: ffff88007f960940 R11: 0000000000000001 R12: ffffe20000d3a438
> R13: 0000000000000001 R14: 0000000000de5630 R15: 0000000000000001
> FS:  00007f82149006f0(0000) GS:ffff88007f8799c0(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000044f050 CR3: 000000004c268000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process kdc (pid: 3518, threadinfo ffff88003e9a8000, task ffff88007f9e3990)
> Stack:  ffff880000000000 0000000000000000 000000004975a130 0000000000001000
>  ffff88003e9a9bd8 0000000000001000 ffff88003e9a9bd8 ffffffff802a17c6
>  0000000000000001 ffffffff802a215e ffff88004975a198 ffffffff802c3dad
> Call Trace:
>  [<ffffffff802a17c6>] ? mpage_bio_submit+0x22/0x26
>  [<ffffffff802a215e>] ? do_mpage_readpage+0x3f2/0x4ab
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff8025c7dd>] ? add_to_page_cache_locked+0x74/0x96
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff802a2383>] ? mpage_readpages+0xa6/0xe4
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff80261033>] ? __alloc_pages_internal+0xd6/0x3a9
>  [<ffffffff80262f20>] ? __do_page_cache_readahead+0xfc/0x18e
>  [<ffffffff80263212>] ? ondemand_readahead+0x13a/0x149
>  [<ffffffff8025de23>] ? generic_file_aio_read+0x1f6/0x523
>  [<ffffffff8027c13b>] ? do_sync_read+0xc9/0x10c
>  [<ffffffff8023f800>] ? autoremove_wake_function+0x0/0x2e
>  [<ffffffff802e493f>] ? selinux_file_permission+0x4e/0xf6
>  [<ffffffff8027c8c0>] ? vfs_read+0xaa/0x133
>  [<ffffffff8027cba7>] ? sys_read+0x45/0x6e
>  [<ffffffff8020b20b>] ? system_call_fastpath+0x16/0x1b
> 
> 
> Code: d0 c3 55 48 63 c7 53 48 89 f3 48 83 ec 28 48 0b 46 20 8b 4e 30 a8 04 48 89 46 20 74 08 85 c9 0f 84 9d 00 00 00 83 7b 30 00 75 04 <0f> 0b eb fe 48 83 7b 40 00 75 04 0f 0b eb fe 40 88 f8 c1 e9 09 
> RIP  [<ffffffff802f825a>] submit_bio+0x29/0xcf
>  RSP <ffff88003e9a9a98>
> ---[ end trace 8f7cc555c11c0a36 ]---
> ------------[ cut here ]------------
> kernel BUG at block/blk-core.c:1495!
> invalid opcode: 0000 [2] SMP 
> CPU 1 
> Modules linked in: drbd ftdi_sio usbserial
> Pid: 3558, comm: postgrey Tainted: G      D   2.6.27-rc6 #1
> RIP: 0010:[<ffffffff802f825a>]  [<ffffffff802f825a>] submit_bio+0x29/0xcf
> RSP: 0000:ffff88004c357b38  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff88004c1a92c0 RCX: 0000000000000000
> RDX: ffffffff802a23c1 RSI: ffff88004c1a92c0 RDI: 0000000000000000
> RBP: ffff88004c357c78 R08: ffff88007f963680 R09: 0000000000000040
> R10: ffff88004c1a92c0 R11: 0000000000000001 R12: ffffe20000d09ac8
> R13: 0000000000000001 R14: 00000000011a43c0 R15: 0000000000000006
> FS:  00007f4e9b4b86f0(0000) GS:ffff88007f8799c0(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f4e9b4c607c CR3: 000000004c3e7000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process postgrey (pid: 3558, threadinfo ffff88004c356000, task ffff88004c339320)
> Stack:  ffff880000000000 0000000000000000 000000004c357b68 0000000000001000
>  ffff88004c357c78 0000000000001000 ffff88004c357c78 ffffffff802a17c6
>  0000000000000006 ffffffff802a215e 000000004c357c08 ffffffff802c3dad
> Call Trace:
>  [<ffffffff802a17c6>] ? mpage_bio_submit+0x22/0x26
>  [<ffffffff802a215e>] ? do_mpage_readpage+0x3f2/0x4ab
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff8025c7dd>] ? add_to_page_cache_locked+0x74/0x96
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff802a2383>] ? mpage_readpages+0xa6/0xe4
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff80261033>] ? __alloc_pages_internal+0xd6/0x3a9
>  [<ffffffff80262f20>] ? __do_page_cache_readahead+0xfc/0x18e
>  [<ffffffff8025e525>] ? filemap_fault+0x15d/0x348
>  [<ffffffff802683a2>] ? __do_fault+0x52/0x35f
>  [<ffffffff80269d82>] ? handle_mm_fault+0x369/0x654
>  [<ffffffff802200d1>] ? do_page_fault+0x377/0x732
>  [<ffffffff803044c7>] ? __up_write+0x21/0x10e
>  [<ffffffff80447ad9>] ? error_exit+0x0/0x51
> 
> 
> Code: d0 c3 55 48 63 c7 53 48 89 f3 48 83 ec 28 48 0b 46 20 8b 4e 30 a8 04 48 89 46 20 74 08 85 c9 0f 84 9d 00 00 00 83 7b 30 00 75 04 <0f> 0b eb fe 48 83 7b 40 00 75 04 0f 0b eb fe 40 88 f8 c1 e9 09 
> RIP  [<ffffffff802f825a>] submit_bio+0x29/0xcf
>  RSP <ffff88004c357b38>
> ---[ end trace 8f7cc555c11c0a36 ]---
> ------------[ cut here ]------------
> kernel BUG at block/blk-core.c:1495!
> invalid opcode: 0000 [3] SMP 
> CPU 1 
> Modules linked in: drbd ftdi_sio usbserial
> Pid: 5260, comm: dspam Tainted: G      D   2.6.27-rc6 #1
> RIP: 0010:[<ffffffff802f825a>]  [<ffffffff802f825a>] submit_bio+0x29/0xcf
> RSP: 0018:ffff880038f85a98  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff88004c1d2bc0 RCX: 0000000000000000
> RDX: ffffffff802a23c1 RSI: ffff88004c1d2bc0 RDI: 0000000000000000
> RBP: ffff880038f85bd8 R08: ffff88007f961380 R09: 0000000000000040
> R10: ffff88004c1d2bc0 R11: 0000000000000001 R12: ffffe20000bfaf18
> R13: 0000000000000001 R14: 00000000047a4048 R15: 0000000000000001
> FS:  00007f6e981b56f0(0000) GS:ffff88007f8799c0(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fffa01c3a28 CR3: 000000003a20c000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process dspam (pid: 5260, threadinfo ffff880038f84000, task ffff88007f9e2940)
> Stack:  ffff880000000000 0000000000000000 00000000ffffffff 0000000000001000
>  ffff880038f85bd8 0000000000001000 ffff880038f85bd8 ffffffff802a17c6
>  0000000000000001 ffffffff802a215e 00000010000000d0 ffffffff802c3dad
> Call Trace:
>  [<ffffffff802a17c6>] ? mpage_bio_submit+0x22/0x26
>  [<ffffffff802a215e>] ? do_mpage_readpage+0x3f2/0x4ab
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff8025c7dd>] ? add_to_page_cache_locked+0x74/0x96
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff802a2383>] ? mpage_readpages+0xa6/0xe4
>  [<ffffffff802c3dad>] ? ext3_get_block+0x0/0xe6
>  [<ffffffff80261033>] ? __alloc_pages_internal+0xd6/0x3a9
>  [<ffffffff80262f20>] ? __do_page_cache_readahead+0xfc/0x18e
>  [<ffffffff80263212>] ? ondemand_readahead+0x13a/0x149
>  [<ffffffff8025de23>] ? generic_file_aio_read+0x1f6/0x523
>  [<ffffffff8027c13b>] ? do_sync_read+0xc9/0x10c
>  [<ffffffff8023f800>] ? autoremove_wake_function+0x0/0x2e
>  [<ffffffff802e493f>] ? selinux_file_permission+0x4e/0xf6
>  [<ffffffff8027c8c0>] ? vfs_read+0xaa/0x133
>  [<ffffffff8027cba7>] ? sys_read+0x45/0x6e
>  [<ffffffff8020b20b>] ? system_call_fastpath+0x16/0x1b
> 
> 
> Code: d0 c3 55 48 63 c7 53 48 89 f3 48 83 ec 28 48 0b 46 20 8b 4e 30 a8 04 48 89 46 20 74 08 85 c9 0f 84 9d 00 00 00 83 7b 30 00 75 04 <0f> 0b eb fe 48 83 7b 40 00 75 04 0f 0b eb fe 40 88 f8 c1 e9 09 
> RIP  [<ffffffff802f825a>] submit_bio+0x29/0xcf
>  RSP <ffff880038f85a98>
> ---[ end trace 8f7cc555c11c0a36 ]---
> 
> -- 
> | Jan "Yenya" Kasprzak  <kas at {fi.muni.cz - work | yenya.net - private}> |
> | GPG: ID 1024/D3498839      Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
> | http://www.fi.muni.cz/~kas/    Journal: http://www.fi.muni.cz/~kas/blog/ |
> >>  If you find yourself arguing with Alan Cox, you’re _probably_ wrong.  <<
> >>     --James Morris in "How and Why You Should Become a Kernel Hacker"  <<

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


