[DRBD-user] Oops when shutting down drbd

Rainer Sabelka sabelka at iue.tuwien.ac.at
Tue Jul 31 17:58:09 CEST 2007


Hi,

I'm trying out drbd on two identical Linux boxes running Debian etch (standard 
Debian kernel 2.6.18-4-686) and drbd 8.0.4 (compiled from the sources in the 
Debian unstable repository).

From time to time I get a kernel oops caused by drbd. I think it happened 
while stopping drbd with "/etc/init.d/drbd stop" (although I'm not 100% 
sure here, because it always happened while executing a shell script which 
contains multiple drbd-related operations).

Unfortunately I'm not able to reproduce this reliably; in almost all cases 
starting and stopping drbd just works.

However, I thought I'd post the kernel traces of my last two drbd crashes to 
this list; maybe someone has an idea what went wrong.

--------------------------------------------------------------
1st incident:

Jul 30 14:57:40 newserver2 kernel: drbd0: peer( Primary -> Unknown ) 
conn( SyncTarget -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Jul 30 14:57:40 newserver2 kernel: drbd0: short read receiving data: read 720 
expected 4096
Jul 30 14:57:40 newserver2 kernel: drbd0: error receiving RSDataReply, l: 
32792!
Jul 30 14:57:40 newserver2 kernel: drbd0: asender terminated
Jul 30 14:57:40 newserver2 kernel: drbd0: tl_clear()
Jul 30 14:57:40 newserver2 kernel: drbd0: Connection closed
Jul 30 14:57:40 newserver2 kernel: drbd0: Writing meta data super block now.
Jul 30 14:57:40 newserver2 kernel: drbd0: conn( Disconnecting -> StandAlone )
Jul 30 14:57:40 newserver2 kernel: drbd0: receiver terminated
Jul 30 14:57:40 newserver2 kernel: drbd0: disk( Inconsistent -> Diskless )
Jul 30 14:57:40 newserver2 kernel: drbd0: drbd_bm_resize called with capacity 
== 0
Jul 30 14:57:40 newserver2 kernel: drbd0: disk( Diskless -> Attaching )
Jul 30 14:57:40 newserver2 kernel: drbd0: No usable activity log found.
Jul 30 14:57:40 newserver2 kernel: drbd0: max_segment_size ( = BIO size ) = 
32768
Jul 30 14:57:40 newserver2 kernel: drbd0: drbd_bm_resize: 
(down_trylock(&b->bm_change)) in /usr/src/modules/drbd/drbd/drbd_bitmap.c:370
Jul 30 14:57:40 newserver2 kernel: drbd0: drbd_bm_resize called with capacity 
== 1409178656
Jul 30 14:57:40 newserver2 kernel: drbd0: worker terminated
Jul 30 14:57:40 newserver2 kernel: BUG: unable to handle kernel NULL pointer 
dereference at virtual address 0000004c
Jul 30 14:57:40 newserver2 kernel: printing eip:
Jul 30 14:57:40 newserver2 kernel: f8c90f85
Jul 30 14:57:40 newserver2 kernel: *pde = 00000000
Jul 30 14:57:40 newserver2 kernel: Oops: 0000 [#1]
Jul 30 14:57:40 newserver2 kernel: SMP
Jul 30 14:57:40 newserver2 kernel: Modules linked in: drbd usbhid nfs nfsd 
exportfs lockd nfs_acl sunrpc button ac battery ipv6 cn dm_snapshot dm_mirror 
dm_mod loop serio_raw psmouse rtc shpchp pci_hotplug evdev pcspkr ext3 jbd 
mbcache ide_cd cdrom piix sd_mod generic ehci_hcd ide_core uhci_hcd qla2xxx 
megaraid_sas bnx2 usbcore firmware_class scsi_transport_fc scsi_mod thermal 
processor fan
Jul 30 14:57:40 newserver2 kernel: CPU:    7
Jul 30 14:57:40 newserver2 kernel: EIP:    0060:[<f8c90f85>]    Not tainted 
VLI
Jul 30 14:57:40 newserver2 kernel: EFLAGS: 00010246   (2.6.18-4-686 #1)
Jul 30 14:57:40 newserver2 kernel: EIP is at drbd_bm_resize+0x146/0x37b [drbd]
Jul 30 14:57:40 newserver2 kernel: eax: 00000000   ebx: 00000000   ecx: 
0a7fcb84   edx: 00000000
Jul 30 14:57:40 newserver2 kernel: esi: 00000000   edi: 00000000   ebp: 
f7fc8d40   esp: df977eb4
Jul 30 14:57:40 newserver2 kernel: ds: 007b   es: 007b   ss: 0068
Jul 30 14:57:40 newserver2 kernel: Process cqueue/7 (pid: 2743, ti=df976000 
task=c3307aa0 task.ti=df976000)
Jul 30 14:57:40 newserver2 kernel: Stack: 53fe5c20 00000000 f6ec1400 53fe5c20 
0a7fcb84 0053fe5e 00000000 53fe5c20
Jul 30 14:57:40 newserver2 kernel: 00000000 00000000 00000000 f8ca7c3d 
f6ec1400 0000a848 53fe5c20 00000000
Jul 30 14:57:40 newserver2 kernel: 53fe5c20 00000000 c011d97e f8cb1ca9 
df977f14 df977f14 f8ca7697 f8cb1ca9
Jul 30 14:57:40 newserver2 kernel: Call Trace:
Jul 30 14:57:40 newserver2 kernel: [<f8ca7c3d>] 
drbd_determin_dev_size+0x14d/0x370 [drbd]
Jul 30 14:57:40 newserver2 kernel: [<c011d97e>] printk+0x14/0x18
Jul 30 14:57:40 newserver2 kernel: [<f8ca7697>] 
drbd_setup_queue_param+0x2bc/0x2f6 [drbd]
Jul 30 14:57:40 newserver2 kernel: [<f8c90d2f>] __drbd_bm_lock+0x18/0xda 
[drbd]
Jul 30 14:57:40 newserver2 kernel: [<f8ca9359>] drbd_nl_disk_conf+0x524/0x788 
[drbd]
Jul 30 14:57:40 newserver2 kernel: [<f8ca8d86>] 
drbd_connector_callback+0xc4/0x173 [drbd]
Jul 30 14:57:40 newserver2 kernel: [<f8c570a2>] cn_queue_wrapper+0x9/0x1e [cn]
Jul 30 14:57:40 newserver2 kernel: [<c012abfc>] run_workqueue+0x78/0xb5
Jul 30 14:57:40 newserver2 kernel: [<f8c57099>] cn_queue_wrapper+0x0/0x1e [cn]
Jul 30 14:57:40 newserver2 kernel: [<c012b4e6>] worker_thread+0xd9/0x10b
Jul 30 14:57:40 newserver2 kernel: [<c0117778>] default_wake_function+0x0/0xc
Jul 30 14:57:40 newserver2 kernel: [<c012b40d>] worker_thread+0x0/0x10b
Jul 30 14:57:40 newserver2 kernel: [<c012d85f>] kthread+0xc2/0xef
Jul 30 14:57:40 newserver2 kernel: [<c012d79d>] kthread+0x0/0xef
Jul 30 14:57:40 newserver2 kernel: [<c0101005>] kernel_thread_helper+0x5/0xb
Jul 30 14:57:40 newserver2 kernel: Code: 07 83 d2 00 83 e0 f8 0f ac d0 03 8b 
54 24 08 31 db 89 44 24 10 83 c0 3f 83 e0 c0 8b 4c 24 10 c1 e8 05 89 44 24 14 
8b 42 14 31 d2 <8b> 40 4c 83 c0 b8 83 d2 ff 0f a4 c2 0c c1 e0 0c 39 d3 72 25 39
Jul 30 14:57:40 newserver2 kernel: EIP: [<f8c90f85>] 
drbd_bm_resize+0x146/0x37b [drbd] SS:ESP 0068:df977eb4

------------------------------------------------------------------------------

2nd incident:

Jul 31 17:23:45 newserver1 kernel: drbd0: peer( Secondary -> Unknown ) 
conn( SyncSource -> Disconnecting )
Jul 31 17:23:45 newserver1 kernel: drbd0: drbd_pp_alloc interrupted!
Jul 31 17:23:45 newserver1 kernel: drbd0: alloc_ee: Allocation of a page 
failed
Jul 31 17:23:45 newserver1 kernel: drbd0: error receiving RSDataRequest, l: 
24!
Jul 31 17:23:45 newserver1 kernel: drbd0: asender terminated
Jul 31 17:23:45 newserver1 kernel: drbd0: _drbd_send_page: size=4096 len=3492 
sent=-104
Jul 31 17:23:45 newserver1 kernel: drbd0: drbd_send_block() failed
Jul 31 17:23:45 newserver1 kernel: drbd0: tl_clear()
Jul 31 17:23:45 newserver1 kernel: drbd0: Connection closed
Jul 31 17:23:45 newserver1 kernel: drbd0: Writing meta data super block now.
Jul 31 17:23:45 newserver1 kernel: drbd0: conn( Disconnecting -> StandAlone )
Jul 31 17:23:45 newserver1 kernel: drbd0: receiver terminated
Jul 31 17:23:45 newserver1 kernel: drbd0: disk( UpToDate -> Diskless ) 
pdsk( Inconsistent -> DUnknown )
Jul 31 17:23:45 newserver1 kernel: drbd0: drbd_bm_resize called with capacity 
== 0
Jul 31 17:23:45 newserver1 kernel: drbd0: ASSERT( list_empty(&mdev->net_ee) ) 
in /usr/src/modules/drbd/drbd/drbd_main.c:2103
Jul 31 17:23:45 newserver1 kernel: drbd0: worker terminated
Jul 31 17:23:45 newserver1 kernel: ------------[ cut here ]------------
Jul 31 17:23:45 newserver1 kernel: kernel BUG at include/linux/mm.h:300!
Jul 31 17:23:45 newserver1 kernel: invalid opcode: 0000 [#1]
Jul 31 17:23:45 newserver1 kernel: SMP
Jul 31 17:23:45 newserver1 kernel: Modules linked in: drbd usbhid nfs nfsd 
exportfs lockd nfs_acl sunrpc button ac battery ipv6 cn dm_snapshot dm_mirror 
dm_mod loop serio_raw evdev shpchp psmouse pci_hotplug pcspkr rtc ext3 jbd 
mbcache ide_cd cdrom generic sd_mod piix ehci_hcd uhci_hcd megaraid_sas 
ide_core bnx2 usbcore qla2xxx firmware_class scsi_transport_fc scsi_mod 
thermal processor fan
Jul 31 17:23:45 newserver1 kernel: CPU:    7
Jul 31 17:23:45 newserver1 kernel: EIP:    0060:[<c014542c>]    Not tainted 
VLI
Jul 31 17:23:45 newserver1 kernel: EFLAGS: 00010046   (2.6.18-4-686 #1)
Jul 31 17:23:45 newserver1 kernel: EIP is at __free_pages+0x9/0x2f
Jul 31 17:23:45 newserver1 kernel: eax: 00000000   ebx: f79d9000   ecx: 
c27ab7a0   edx: 00000000
Jul 31 17:23:45 newserver1 kernel: esi: c27ab7a0   edi: 00000001   ebp: 
f7f46540   esp: ee651f08
Jul 31 17:23:45 newserver1 kernel: ds: 007b   es: 007b   ss: 0068
Jul 31 17:23:45 newserver1 kernel: Process rmmod (pid: 31376, ti=ee650000 
task=edc28000 task.ti=ee650000)
Jul 31 17:23:45 newserver1 kernel: Stack: f9114660 f4d1618c 00000001 ed0c3550 
f91146d8 f79d9000 f79d9000 00000002
Jul 31 17:23:45 newserver1 kernel: f79d9360 00000000 f9115d64 f79d9000 
00000008 00000000 f9127f1b 00000000
Jul 31 17:23:45 newserver1 kernel: f91387c0 00000008 00000000 00000880 
c0135c81 64627264 c014dd00 df8d6d4c
Jul 31 17:23:45 newserver1 kernel: Call Trace:
Jul 31 17:23:45 newserver1 kernel: [<f9114660>] drbd_pp_free+0x5d/0x78 [drbd]
Jul 31 17:23:45 newserver1 kernel: [<f91146d8>] drbd_free_ee+0x5d/0xa8 [drbd]
Jul 31 17:23:45 newserver1 kernel: [<f9115d64>] drbd_release_ee+0x1e/0x33 
[drbd]
Jul 31 17:23:45 newserver1 kernel: [<f9127f1b>] cleanup_module+0x19d/0x300 
[drbd]
Jul 31 17:23:45 newserver1 kernel: [<c0135c81>] sys_delete_module+0x1ad/0x1d4
Jul 31 17:23:45 newserver1 kernel: [<c014dd00>] unmap_region+0x64/0xf5
Jul 31 17:23:45 newserver1 kernel: [<c014ddc2>] remove_vma+0x31/0x36
Jul 31 17:23:45 newserver1 kernel: [<c014e674>] do_munmap+0x181/0x19b
Jul 31 17:23:45 newserver1 kernel: [<c0102c11>] sysenter_past_esp+0x56/0x79
Jul 31 17:23:45 newserver1 kernel: Code: 4e 18 89 4a 04 89 54 24 08 ba 01 00 
00 00 ff 74 24 04 e8 23 fb ff ff 5e 53 9d 83 c4 10 5b 5e 5f 5d c3 89 c1 8b 40 
04 85 c0 75 08 <0f> 0b 2c 01 21 a3 29 c0 f0 ff 49 04 0f 94 c0 84 c0 74 12 85 d2
Jul 31 17:23:45 newserver1 kernel: EIP: [<c014542c>] __free_pages+0x9/0x2f 
SS:ESP 0068:ee651f08

---------------------------------------------------------------------
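
For anyone reading the traces above: the "EIP is at" and call-trace entries 
all use the form symbol+0xOFFSET/0xSIZE, optionally followed by the module 
name in brackets. As a rough aid, here is a minimal Python sketch that pulls 
those fields out of a pasted log. The function name parse_symbol_refs and the 
regular expression are only assumptions for illustration; they are not part 
of drbd or of any kernel tooling.

```python
import re

# Matches references such as "drbd_bm_resize+0x146/0x37b [drbd]".
# The module part is optional and may sit on the next line, because the
# mail client wrapped some of the log lines.
SYMBOL_RE = re.compile(
    r"(?P<symbol>[A-Za-z_]\w*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_symbol_refs(text):
    """Return one dict per symbol+offset/size reference found in text."""
    refs = []
    for match in SYMBOL_RE.finditer(text):
        refs.append(
            {
                "symbol": match.group("symbol"),
                "offset": int(match.group("offset"), 16),
                "size": int(match.group("size"), 16),
                # None when the symbol belongs to the core kernel.
                "module": match.group("module"),
            }
        )
    return refs


# A few entries copied from the first incident, including one wrapped line.
sample = (
    "EIP is at drbd_bm_resize+0x146/0x37b [drbd]\n"
    "[<c011d97e>] printk+0x14/0x18\n"
    "[<f8c90d2f>] __drbd_bm_lock+0x18/0xda \n"
    "[drbd]\n"
)

for ref in parse_symbol_refs(sample):
    owner = ref["module"] or "kernel"
    print(f'{ref["symbol"]} at byte {ref["offset"]} of {ref["size"]} ({owner})')
```

The offset is the position inside the function where execution was, and the 
size is the length of that function, so the pair can be compared against a 
disassembly of the matching drbd build to locate the faulting statement.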

Best regards,
-Rainer


