Hi,

I'm trying out drbd on two identical Linux boxes running Debian etch (standard Debian kernel 2.6.18-4-686) and drbd 8.0.4 (compiled from the sources in the Debian unstable repository).

From time to time I get a kernel oops caused by drbd. I think it happened while stopping drbd with "/etc/init.d/drbd stop" (although I'm not 100% sure here, because it always happened while executing a shell script that contains multiple drbd-related operations). Unfortunately, I'm not able to reproduce this; in almost all cases starting and stopping drbd just works.

However, I thought I'd post the kernel traces of my last two drbd crashes to this list; maybe someone has an idea what has been going wrong.

--------------------------------------------------------------
1st incident:

Jul 30 14:57:40 newserver2 kernel: drbd0: peer( Primary -> Unknown ) conn( SyncTarget -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Jul 30 14:57:40 newserver2 kernel: drbd0: short read receiving data: read 720 expected 4096
Jul 30 14:57:40 newserver2 kernel: drbd0: error receiving RSDataReply, l: 32792!
Jul 30 14:57:40 newserver2 kernel: drbd0: asender terminated
Jul 30 14:57:40 newserver2 kernel: drbd0: tl_clear()
Jul 30 14:57:40 newserver2 kernel: drbd0: Connection closed
Jul 30 14:57:40 newserver2 kernel: drbd0: Writing meta data super block now.
Jul 30 14:57:40 newserver2 kernel: drbd0: conn( Disconnecting -> StandAlone )
Jul 30 14:57:40 newserver2 kernel: drbd0: receiver terminated
Jul 30 14:57:40 newserver2 kernel: drbd0: disk( Inconsistent -> Diskless )
Jul 30 14:57:40 newserver2 kernel: drbd0: drbd_bm_resize called with capacity == 0
Jul 30 14:57:40 newserver2 kernel: drbd0: disk( Diskless -> Attaching )
Jul 30 14:57:40 newserver2 kernel: drbd0: No usable activity log found.
Jul 30 14:57:40 newserver2 kernel: drbd0: max_segment_size ( = BIO size ) = 32768
Jul 30 14:57:40 newserver2 kernel: drbd0: drbd_bm_resize: (down_trylock(&b->bm_change)) in /usr/src/modules/drbd/drbd/drbd_bitmap.c:370
Jul 30 14:57:40 newserver2 kernel: drbd0: drbd_bm_resize called with capacity == 1409178656
Jul 30 14:57:40 newserver2 kernel: drbd0: worker terminated
Jul 30 14:57:40 newserver2 kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 0000004c
Jul 30 14:57:40 newserver2 kernel: printing eip:
Jul 30 14:57:40 newserver2 kernel: f8c90f85
Jul 30 14:57:40 newserver2 kernel: *pde = 00000000
Jul 30 14:57:40 newserver2 kernel: Oops: 0000 [#1]
Jul 30 14:57:40 newserver2 kernel: SMP
Jul 30 14:57:40 newserver2 kernel: Modules linked in: drbd usbhid nfs nfsd exportfs lockd nfs_acl sunrpc button ac battery ipv6 cn dm_snapshot dm_mirror dm_mod loop serio_raw psmouse rtc shpchp pci_hotplug evdev pcspkr ext3 jbd mbcache ide_cd cdrom piix sd_mod generic ehci_hcd ide_core uhci_hcd qla2xxx megaraid_sas bnx2 usbcore firmware_class scsi_transport_fc scsi_mod thermal processor fan
Jul 30 14:57:40 newserver2 kernel: CPU: 7
Jul 30 14:57:40 newserver2 kernel: EIP: 0060:[<f8c90f85>] Not tainted VLI
Jul 30 14:57:40 newserver2 kernel: EFLAGS: 00010246 (2.6.18-4-686 #1)
Jul 30 14:57:40 newserver2 kernel: EIP is at drbd_bm_resize+0x146/0x37b [drbd]
Jul 30 14:57:40 newserver2 kernel: eax: 00000000 ebx: 00000000 ecx: 0a7fcb84 edx: 00000000
Jul 30 14:57:40 newserver2 kernel: esi: 00000000 edi: 00000000 ebp: f7fc8d40 esp: df977eb4
Jul 30 14:57:40 newserver2 kernel: ds: 007b es: 007b ss: 0068
Jul 30 14:57:40 newserver2 kernel: Process cqueue/7 (pid: 2743, ti=df976000 task=c3307aa0 task.ti=df976000)
Jul 30 14:57:40 newserver2 kernel: Stack: 53fe5c20 00000000 f6ec1400 53fe5c20 0a7fcb84 0053fe5e 00000000 53fe5c20
Jul 30 14:57:40 newserver2 kernel: 00000000 00000000 00000000 f8ca7c3d f6ec1400 0000a848 53fe5c20 00000000
Jul 30 14:57:40 newserver2 kernel: 53fe5c20 00000000 c011d97e f8cb1ca9 df977f14 df977f14 f8ca7697 f8cb1ca9
Jul 30 14:57:40 newserver2 kernel: Call Trace:
Jul 30 14:57:40 newserver2 kernel: [<f8ca7c3d>] drbd_determin_dev_size+0x14d/0x370 [drbd]
Jul 30 14:57:40 newserver2 kernel: [<c011d97e>] printk+0x14/0x18
Jul 30 14:57:40 newserver2 kernel: [<f8ca7697>] drbd_setup_queue_param+0x2bc/0x2f6 [drbd]
Jul 30 14:57:40 newserver2 kernel: [<f8c90d2f>] __drbd_bm_lock+0x18/0xda [drbd]
Jul 30 14:57:40 newserver2 kernel: [<f8ca9359>] drbd_nl_disk_conf+0x524/0x788 [drbd]
Jul 30 14:57:40 newserver2 kernel: [<f8ca8d86>] drbd_connector_callback+0xc4/0x173 [drbd]
Jul 30 14:57:40 newserver2 kernel: [<f8c570a2>] cn_queue_wrapper+0x9/0x1e [cn]
Jul 30 14:57:40 newserver2 kernel: [<c012abfc>] run_workqueue+0x78/0xb5
Jul 30 14:57:40 newserver2 kernel: [<f8c57099>] cn_queue_wrapper+0x0/0x1e [cn]
Jul 30 14:57:40 newserver2 kernel: [<c012b4e6>] worker_thread+0xd9/0x10b
Jul 30 14:57:40 newserver2 kernel: [<c0117778>] default_wake_function+0x0/0xc
Jul 30 14:57:40 newserver2 kernel: [<c012b40d>] worker_thread+0x0/0x10b
Jul 30 14:57:40 newserver2 kernel: [<c012d85f>] kthread+0xc2/0xef
Jul 30 14:57:40 newserver2 kernel: [<c012d79d>] kthread+0x0/0xef
Jul 30 14:57:40 newserver2 kernel: [<c0101005>] kernel_thread_helper+0x5/0xb
Jul 30 14:57:40 newserver2 kernel: Code: 07 83 d2 00 83 e0 f8 0f ac d0 03 8b 54 24 08 31 db 89 44 24 10 83 c0 3f 83 e0 c0 8b 4c 24 10 c1 e8 05 89 44 24 14 8b 42 14 31 d2 <8b> 40 4c 83 c0 b8 83 d2 ff 0f a4 c2 0c c1 e0 0c 39 d3 72 25 39
Jul 30 14:57:40 newserver2 kernel: EIP: [<f8c90f85>] drbd_bm_resize+0x146/0x37b [drbd] SS:ESP 0068:df977eb4

------------------------------------------------------------------------------
2nd incident:

Jul 31 17:23:45 newserver1 kernel: drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> Disconnecting )
Jul 31 17:23:45 newserver1 kernel: drbd0: drbd_pp_alloc interrupted!
Jul 31 17:23:45 newserver1 kernel: drbd0: alloc_ee: Allocation of a page failed
Jul 31 17:23:45 newserver1 kernel: drbd0: error receiving RSDataRequest, l: 24!
Jul 31 17:23:45 newserver1 kernel: drbd0: asender terminated
Jul 31 17:23:45 newserver1 kernel: drbd0: _drbd_send_page: size=4096 len=3492 sent=-104
Jul 31 17:23:45 newserver1 kernel: drbd0: drbd_send_block() failed
Jul 31 17:23:45 newserver1 kernel: drbd0: tl_clear()
Jul 31 17:23:45 newserver1 kernel: drbd0: Connection closed
Jul 31 17:23:45 newserver1 kernel: drbd0: Writing meta data super block now.
Jul 31 17:23:45 newserver1 kernel: drbd0: conn( Disconnecting -> StandAlone )
Jul 31 17:23:45 newserver1 kernel: drbd0: receiver terminated
Jul 31 17:23:45 newserver1 kernel: drbd0: disk( UpToDate -> Diskless ) pdsk( Inconsistent -> DUnknown )
Jul 31 17:23:45 newserver1 kernel: drbd0: drbd_bm_resize called with capacity == 0
Jul 31 17:23:45 newserver1 kernel: drbd0: ASSERT( list_empty(&mdev->net_ee) ) in /usr/src/modules/drbd/drbd/drbd_main.c:2103
Jul 31 17:23:45 newserver1 kernel: drbd0: worker terminated
Jul 31 17:23:45 newserver1 kernel: ------------[ cut here ]------------
Jul 31 17:23:45 newserver1 kernel: kernel BUG at include/linux/mm.h:300!
Jul 31 17:23:45 newserver1 kernel: invalid opcode: 0000 [#1]
Jul 31 17:23:45 newserver1 kernel: SMP
Jul 31 17:23:45 newserver1 kernel: Modules linked in: drbd usbhid nfs nfsd exportfs lockd nfs_acl sunrpc button ac battery ipv6 cn dm_snapshot dm_mirror dm_mod loop serio_raw evdev shpchp psmouse pci_hotplug pcspkr rtc ext3 jbd mbcache ide_cd cdrom generic sd_mod piix ehci_hcd uhci_hcd megaraid_sas ide_core bnx2 usbcore qla2xxx firmware_class scsi_transport_fc scsi_mod thermal processor fan
Jul 31 17:23:45 newserver1 kernel: CPU: 7
Jul 31 17:23:45 newserver1 kernel: EIP: 0060:[<c014542c>] Not tainted VLI
Jul 31 17:23:45 newserver1 kernel: EFLAGS: 00010046 (2.6.18-4-686 #1)
Jul 31 17:23:45 newserver1 kernel: EIP is at __free_pages+0x9/0x2f
Jul 31 17:23:45 newserver1 kernel: eax: 00000000 ebx: f79d9000 ecx: c27ab7a0 edx: 00000000
Jul 31 17:23:45 newserver1 kernel: esi: c27ab7a0 edi: 00000001 ebp: f7f46540 esp: ee651f08
Jul 31 17:23:45 newserver1 kernel: ds: 007b es: 007b ss: 0068
Jul 31 17:23:45 newserver1 kernel: Process rmmod (pid: 31376, ti=ee650000 task=edc28000 task.ti=ee650000)
Jul 31 17:23:45 newserver1 kernel: Stack: f9114660 f4d1618c 00000001 ed0c3550 f91146d8 f79d9000 f79d9000 00000002
Jul 31 17:23:45 newserver1 kernel: f79d9360 00000000 f9115d64 f79d9000 00000008 00000000 f9127f1b 00000000
Jul 31 17:23:45 newserver1 kernel: f91387c0 00000008 00000000 00000880 c0135c81 64627264 c014dd00 df8d6d4c
Jul 31 17:23:45 newserver1 kernel: Call Trace:
Jul 31 17:23:45 newserver1 kernel: [<f9114660>] drbd_pp_free+0x5d/0x78 [drbd]
Jul 31 17:23:45 newserver1 kernel: [<f91146d8>] drbd_free_ee+0x5d/0xa8 [drbd]
Jul 31 17:23:45 newserver1 kernel: [<f9115d64>] drbd_release_ee+0x1e/0x33 [drbd]
Jul 31 17:23:45 newserver1 kernel: [<f9127f1b>] cleanup_module+0x19d/0x300 [drbd]
Jul 31 17:23:45 newserver1 kernel: [<c0135c81>] sys_delete_module+0x1ad/0x1d4
Jul 31 17:23:45 newserver1 kernel: [<c014dd00>] unmap_region+0x64/0xf5
Jul 31 17:23:45 newserver1 kernel: [<c014ddc2>] remove_vma+0x31/0x36
Jul 31 17:23:45 newserver1 kernel: [<c014e674>] do_munmap+0x181/0x19b
Jul 31 17:23:45 newserver1 kernel: [<c0102c11>] sysenter_past_esp+0x56/0x79
Jul 31 17:23:45 newserver1 kernel: Code: 4e 18 89 4a 04 89 54 24 08 ba 01 00 00 00 ff 74 24 04 e8 23 fb ff ff 5e 53 9d 83 c4 10 5b 5e 5f 5d c3 89 c1 8b 40 04 85 c0 75 08 <0f> 0b 2c 01 21 a3 29 c0 f0 ff 49 04 0f 94 c0 84 c0 74 12 85 d2
Jul 31 17:23:45 newserver1 kernel: EIP: [<c014542c>] __free_pages+0x9/0x2f SS:ESP 0068:ee651f08

---------------------------------------------------------------------

Best regards,
-Rainer