Hello,
Running 2.6.17-rc1 on a dual Opteron in 64-bit mode, with the 32-bit
compatibility layer and a 64-bit userspace.
The module is built from the Debian drbd8-module-source package
(8.0-pre2-1), with fixes for the ioctl32 conversion.
Here is the ME TOO dump:
janneke:~# modprobe drbd
drbd: initialised. Version: 8.0-pre2 (api:81/proto:80)
drbd: SVN Revision: 2139 build by ard at tessa, 2006-04-18 18:07:26
drbd: registered as block device major 147
janneke:~# drbdsetup /dev/drbd0 disk /dev/sda9 internal flexible -d 193215870
drbd0: disk( Diskless -> Attaching )
drbd0: drbd_bm_resize called with capacity == 772863480
drbd0: bits = 96607935 in /usr/src/kernel/tyan-s2891/modules/drbd/drbd/drbd_bitmap.c:369
drbd0: resync bitmap: bits=96607935 words=1509499
drbd0: size = 368 GB (386431740 KB)
Unable to handle kernel paging request at 0000000000003240 RIP:
<ffffffff80256270>{pfn_to_page+32}
PGD 1001ee067 PUD 1001dc067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: drbd ipv6 tg3
Pid: 1902, comm: drbdsetup Not tainted 2.6.17-rc1-tyan-s2891 #1
RIP: 0010:[<ffffffff80256270>] <ffffffff80256270>{pfn_to_page+32}
RSP: 0018:ffff81017c391ac0 EFLAGS: 00010216
RAX: 0000000000000020 RBX: 0000000000000000 RCX: 0000000000000020
RDX: 0000000000000000 RSI: 0000000000011280 RDI: 00000004100005ba
RBP: ffff8101000edbc0 R08: 0000000000000000 R09: 000000000000000d
R10: 00000000ffffffff R11: 0000000000000001 R12: ffff81017e066000
R13: ffff81017be05d40 R14: 0000000000000001 R15: 0000000000000001
FS: 00002b468b643640(0000) GS:ffff8101000c38c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000003240 CR3: 00000001001b9000 CR4: 00000000000006e0
Process drbdsetup (pid: 1902, threadinfo ffff81017c390000, task ffff81010017d800)
Stack: ffffffff88066370 0000000000000001 0000000000000b85 ffff81017e066000
ffff81017be05d40 00000000ffffa368 ffffffff88066642 0000000000000000
ffff81017e0665c0 0000000000000292
Call Trace: <ffffffff88066370>{:drbd:drbd_bm_page_io_async+96}
<ffffffff88066642>{:drbd:drbd_bm_rw+98} <ffffffff880780bd>{:drbd:drbd_al_shrink+525}
<ffffffff880658cf>{:drbd:drbd_bm_resize+943} <ffffffff88065901>{:drbd:drbd_bm_resize+993}
<ffffffff88066b3e>{:drbd:drbd_bm_write+14} <ffffffff880680b4>{:drbd:drbd_determin_dev_size+724}
<ffffffff80235fa9>{lock_timer_base+41} <ffffffff80236088>{__mod_timer+168}
<ffffffff8806846b>{:drbd:drbd_check_al_size+443} <ffffffff88068983>{:drbd:drbd_ioctl_set_disk+1027}
<ffffffff8806a7df>{:drbd:drbd_ioctl+799} <ffffffff8047bea4>{__mutex_lock_slowpath+772}
<ffffffff8027e600>{do_open+608} <ffffffff80245491>{debug_mutex_add_waiter+161}
<ffffffff8047bea4>{__mutex_lock_slowpath+772} <ffffffff8047c12f>{__mutex_unlock_slowpath+415}
<ffffffff803110f4>{blkdev_driver_ioctl+100} <ffffffff8031131c>{blkdev_ioctl+492}
<ffffffff8027e9bb>{block_ioctl+27} <ffffffff80288d3a>{do_ioctl+58}
<ffffffff80289061>{vfs_ioctl+449} <ffffffff802890dd>{sys_ioctl+77}
<ffffffff80209b1a>{system_call+126}
Code: 48 2b ba 40 32 00 00 48 8b 92 30 32 00 00 48 8d 04 fd 00 00
RIP <ffffffff80256270>{pfn_to_page+32} RSP <ffff81017c391ac0>
CR2: 0000000000003240
Killed
The code decoded is this:
Code; ffffffff80256270 <pfn_to_page+20/40> <=====
0: 48 2b ba 40 32 00 00 sub 0x3240(%rdx),%rdi <=====
Code; ffffffff80256277 <pfn_to_page+27/40>
7: 48 8b 92 30 32 00 00 mov 0x3230(%rdx),%rdx
Code; ffffffff8025627e <pfn_to_page+2e/40>
e: 48 8d 04 fd 00 00 00 lea 0x0(,%rdi,8),%rax
Code; ffffffff80256285 <pfn_to_page+35/40>
15: 00
There is definitely a difference from all the other dumps:
my call trace goes through <ffffffff88066370>{:drbd:drbd_bm_page_io_async+96},
and that ends up in pfn_to_page+32 ...
And another thing: I am definitely not using LVM or MD for this device.
searching further:
./mm/page_alloc.c:struct page *pfn_to_page(unsigned long pfn)
./include/asm-x86_64/page.h:#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
(which would explain the difference since:
CONFIG_X86_64_ACPI_NUMA=y
which leads to:
CONFIG_DISCONTIGMEM=y )
and in:
drbd_bitmap.c:drbd_bm_page_io_async
struct page *page = virt_to_page((char*)(b->bm) + (PAGE_SIZE*page_nr));
when rewritten to:
printk(KERN_ERR "drbd bef b->bm=%p,page_nr=%d\n",(char*)(b->bm),page_nr);
page = virt_to_page((char*)(b->bm) + (PAGE_SIZE*page_nr));
printk(KERN_ERR "drbd aft b->bm=%p,page_nr=%d\n",(char*)(b->bm),page_nr);
delivers:
siep:~# drbdsetup /dev/drbd0 disk /dev/sda9 internal flexible
<snip>
drbd0: size = 368 GB (386431740 KB)
drbd bef b->bm=ffffc200005ba000,page_nr=0
Unable to handle kernel paging request at 0000000000003240 RIP:
<ffffffff80256270>{pfn_to_page+32}
So my conclusion is correct: the line
struct page *page = virt_to_page((char*)(b->bm) + (PAGE_SIZE*page_nr));
is what triggers the near-NULL dereference (the second printk is never reached).
Which leaves us to determine whether b->bm=ffffc200005ba000 is incorrect or
page_nr=0 is incorrect.
Since page_nr=0 is valid (bear with me... I am a little sleepy), only b->bm can be incorrect.
page_nr is the page number within the page buffer, and it is iterated starting from 0
from within drbd_bm_rw.
(looking further up into the code, unless philip finds it first ;-) )
--
begin LOVE-LETTER-FOR-YOU.txt.vbs
I am a signature virus. Distribute me until the bitter
end