[DRBD-user] Daily crashes with a XFS filesystem on a DRBD 0.7.5 device with 2.6 SMP kernel

Cyril Bouthors cyril at bouthors.org
Thu Nov 11 11:16:14 CET 2004

I'm experiencing several random crashes a day on my latest DRBD
machine.

It's exactly the same as the other (crash-free) ones I have, except
that the kernel is compiled with SMP support (Hyper-Threading).

I'm wondering whether anyone is running a heavily loaded production
NFS server with DRBD + XFS + SMP.

Here's the complete configuration I'm using:
 - DRBD 0.7.5
 - Linux 2.6.9-1-686-smp (Debian)
 - XFS over software RAID

This morning I reproduced the crash several times with this simple
test: make -C /usr/src/linux -j modules

The machine stops responding to ICMP requests after ~20 seconds; the
same test runs just fine if I use a UP kernel.

I've only just booted the UP kernel, so I'm not sure yet whether the
random daily crashes have gone away too. I'll let you know.

Most of the time the machine crashes without anything in the logs, but
last time I got the message below. I don't know whether it's DRBD
related; it may be a different bug:

------------[ cut here ]------------
kernel BUG at mm/rmap.c:474!
invalid operand: 0000 [#1]
Modules linked in: drbd ext2 mbcache nfsd exportfs lockd sunrpc sd_mod sg sr_mod scsi_mod cdrom ipt_LOG iptable_nat ip_conntrack iptable_filter ip_tables ipv6 dm_mod raid0 md capability commoncap r8169 tg3 firmware_class 3c59x 8139too mii crc32 forcedeth rtc xfs ide_generic piix ide_disk ide_core unix fbcon font vesafb cfbcopyarea cfbimgblt cfbfillrect
CPU:    1
EIP:    0060:[<c015199c>]    Tainted: GF  VLI
EFLAGS: 00010286   (2.6.9-1-686-smp) 
EIP is at page_remove_rmap+0x3c/0x50
eax: ffffffff   ebx: 00002000   ecx: da6add4c   edx: c139d9a0
esi: da56df00   edi: c139d9a0   ebp: 00000000   esp: da6adc78
ds: 007b   es: 007b   ss: 0068
Process munin-node (pid: 14532, threadinfo=da6ac000 task=f7321150)
Stack: c014af5e c139d9a0 00000000 c028fe10 da6adcd8 b83be000 da5bbb80 b7fe0000 
       00000000 c014b143 c18143a0 da5bbb7c b7fbe000 00022000 00000000 c18143a0 
       b7fbe000 da5bbb80 b7fe0000 00000000 c014b1b3 c18143a0 da5bbb7c b7fbe000 
Call Trace:
 [<c014af5e>] zap_pte_range+0x14e/0x2d0
 [<c028fe10>] schedule+0x520/0xbe0
 [<c014b143>] zap_pmd_range+0x63/0x80
 [<c014b1b3>] unmap_page_range+0x53/0x80
 [<c014b2e6>] unmap_vmas+0x106/0x220
 [<c014fa8f>] exit_mmap+0x9f/0x190
 [<c011de6b>] mmput+0x6b/0xa0
 [<c0166f9d>] exec_mmap+0xfd/0x1e0
 [<c016729a>] flush_old_exec+0x15a/0x870
 [<c015b9c7>] vfs_read+0x107/0x160
 [<c0166e8e>] kernel_read+0x4e/0x60
 [<c01861bb>] load_elf_binary+0x2db/0xbd0
 [<c011db60>] autoremove_wake_function+0x0/0x60
 [<c0166e8e>] kernel_read+0x4e/0x60
 [<c0185ee0>] load_elf_binary+0x0/0xbd0
 [<c0167d5e>] search_binary_handler+0x18e/0x2d0
 [<c0185565>] load_script+0x215/0x250
 [<c0140f55>] __alloc_pages+0x1d5/0x390
 [<c01b0aa2>] copy_from_user+0x42/0x70
 [<c01669d1>] copy_strings+0x1d1/0x220
 [<c0185350>] load_script+0x0/0x250
 [<c0167d5e>] search_binary_handler+0x18e/0x2d0
 [<c0168059>] do_execve+0x1b9/0x270
 [<c0104c62>] sys_execve+0x42/0x80
 [<c0106199>] sysenter_past_esp+0x52/0x71
Code: 98 c0 84 c0 74 24 8b 42 08 40 78 1f 9c 59 fa b8 00 e0 ff ff ba 00 4b 37 c0 21 e0 8b 40 10 03 14 85 20 80 37 c0 ff 4a 10 51 9d c3 <0f> 0b da 01 65 3a 2a c0 eb d7 0f 0b d7 01 65 3a 2a c0 eb bb 83 
 <6>note: munin-node[14532] exited with preempt_count 1
find_exported_dentry: npd != pd
find_exported_dentry: npd != pd
Cyril Bouthors
