Hi Lars,

> > I've seen the above bug several times, most recently after hard
> > resetting a node that was primary for one of the devices.
>
> as this happened in the dispatcher of the connector, I
> suspect that the kernel configuration you built the drbd module against
> does not match your running kernel.

OK, recompiling, installing & booting kernel/modules/drbd to be 100% sure
everything matches.

> or maybe there is also the CONNECTOR missing from your kernel,
> and for some reason our detection magic did not prevent to
> build the "built-in-backport", with non-matching netlink-ABI.

Nope, I can definitely exclude that possibility:

# zcat /proc/config.gz |grep CONN
CONFIG_CONNECTOR=y

Let's see if I can still get the same error or if it's gone away:

Nope, the kernel/modules/drbd recompile + install didn't change a thing,
still the same error.

Next test: go back to the 8.0.0 release (2713) instead of current svn (2747M)...

Nope, also no change, it still crashes.

Circumstances may be somewhat unusual:

* start drbd on both nodes. Status connected, both devices on both nodes are secondary.
* start heartbeat on just one node.

Result: Heartbeat powers off the other node and switches drbd0 secondary => primary
right afterwards, resulting in the kernel bug. BTW, it doesn't always happen.

Feb 15 08:34:58 webc-neu2 tengine: [16712]: info: te_fence_node: Executing reboot fencing operation (27) on webc-neu1 (timeout=30000)
Feb 15 08:34:58 webc-neu2 stonithd: [16704]: info: client tengine [pid: 16712] want a STONITH operation RESET to node webc-neu1.
Feb 15 08:34:58 webc-neu2 pengine: [16713]: WARN: stage6: Scheduling Node webc-neu1 for STONITH
Feb 15 08:34:58 webc-neu2 stonithd: [16704]: info: stonith_operate_locally::2539: sending fencing op (1) for webc-neu1 to device external (rsc_id=s_webc-neu1, pid=16937)
Feb 15 08:34:58 webc-neu2 crmd: [16706]: info: do_lrm_rsc_op: Performing op=r_drbd_web_start_0 key=6:1:d2f35dfc-5132-4c91-86d8-b1a09e194839)
Feb 15 08:34:58 webc-neu2 kernel: drbd0: role( Secondary -> Primary )
Feb 15 08:34:58 webc-neu2 kernel: drbd0: Writing meta data super block now.
Feb 15 08:34:58 webc-neu2 kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
Feb 15 08:34:58 webc-neu2 kernel: printing eip:
Feb 15 08:34:58 webc-neu2 kernel: 00000000
Feb 15 08:34:58 webc-neu2 kernel: *pde = 00000000
Feb 15 08:34:58 webc-neu2 kernel: Oops: 0000 [#1]
Feb 15 08:34:58 webc-neu2 kernel: SMP
Feb 15 08:34:58 webc-neu2 kernel: Modules linked in: drbd usbcore sha1 ipmi_devintf ipmi_si ipmi_msghandler nfs lockd sunrpc tg3 iTCO_wdt
Feb 15 08:34:58 webc-neu2 kernel: CPU: 0
Feb 15 08:34:58 webc-neu2 kernel: EIP: 0060:[_proxy_pda+0/1048576] Not tainted VLI
Feb 15 08:34:58 webc-neu2 kernel: EIP: 0060:[<00000000>] Not tainted VLI
Feb 15 08:34:58 webc-neu2 kernel: EFLAGS: 00010247 (2.6.20-gentoo #1)
Feb 15 08:34:58 webc-neu2 kernel: EIP is at _stext+0x3feffc6c/0x20
Feb 15 08:34:58 webc-neu2 kernel: eax: f740e410 ebx: f7295d08 ecx: 00000001 edx: 00000246
Feb 15 08:34:58 webc-neu2 kernel: esi: f7295ccc edi: f7295ccc ebp: 00000246 esp: c228bf40
Feb 15 08:34:58 webc-neu2 kernel: ds: 007b es: 007b ss: 0068
Feb 15 08:34:58 webc-neu2 kernel: Process cqueue/0 (pid: 126, ti=c228a000 task=c2262550 task.ti=c228a000)
Feb 15 08:34:58 webc-neu2 kernel: Stack: c0261f69 f7295cd0 c21693c0 c0125957 00000000 e02b2276 00003d0f f7de1030
Feb 15 08:34:58 webc-neu2 kernel: c0261f5c c21693c0 c21693c0 c228bf80 fffffffc c0125b01 ffffffff ffffffff
Feb 15 08:34:58 webc-neu2 kernel: 00000001 00000000 c0113891 00010000 00000000 c213da70 00000000 c200c900
Feb 15 08:34:58 webc-neu2 kernel: Call Trace:
Feb 15 08:34:58 webc-neu2 kernel: [cn_queue_wrapper+13/36] cn_queue_wrapper+0xd/0x24
Feb 15 08:34:58 webc-neu2 kernel: [<c0261f69>] cn_queue_wrapper+0xd/0x24
Feb 15 08:34:58 webc-neu2 kernel: [run_workqueue+138/292] run_workqueue+0x8a/0x124
Feb 15 08:34:58 webc-neu2 kernel: [<c0125957>] run_workqueue+0x8a/0x124
Feb 15 08:34:58 webc-neu2 kernel: [cn_queue_wrapper+0/36] cn_queue_wrapper+0x0/0x24
Feb 15 08:34:58 webc-neu2 kernel: [<c0261f5c>] cn_queue_wrapper+0x0/0x24
Feb 15 08:34:58 webc-neu2 kernel: [worker_thread+272/315] worker_thread+0x110/0x13b
Feb 15 08:34:58 webc-neu2 kernel: [<c0125b01>] worker_thread+0x110/0x13b
Feb 15 08:34:58 webc-neu2 kernel: [default_wake_function+0/12] default_wake_function+0x0/0xc
Feb 15 08:34:58 webc-neu2 kernel: [<c0113891>] default_wake_function+0x0/0xc
Feb 15 08:34:58 webc-neu2 kernel: [default_wake_function+0/12] default_wake_function+0x0/0xc
Feb 15 08:34:58 webc-neu2 kernel: [<c0113891>] default_wake_function+0x0/0xc
Feb 15 08:34:58 webc-neu2 kernel: [worker_thread+0/315] worker_thread+0x0/0x13b
Feb 15 08:34:58 webc-neu2 kernel: [<c01259f1>] worker_thread+0x0/0x13b
Feb 15 08:34:58 webc-neu2 kernel: [kthread+116/152] kthread+0x74/0x98
Feb 15 08:34:58 webc-neu2 kernel: [<c0128614>] kthread+0x74/0x98
Feb 15 08:34:58 webc-neu2 kernel: [kthread+0/152] kthread+0x0/0x98
Feb 15 08:34:58 webc-neu2 kernel: [<c01285a0>] kthread+0x0/0x98
Feb 15 08:34:58 webc-neu2 kernel: [kernel_thread_helper+7/16] kernel_thread_helper+0x7/0x10
Feb 15 08:34:58 webc-neu2 kernel: [<c0103473>] kernel_thread_helper+0x7/0x10
Feb 15 08:34:58 webc-neu2 kernel: =======================
Feb 15 08:34:58 webc-neu2 kernel: Code: Bad EIP value.
Feb 15 08:34:58 webc-neu2 kernel: EIP: [_proxy_pda+0/1048576] _stext+0x3feffc6c/0x20 SS:ESP 0068:c228bf40
Feb 15 08:34:58 webc-neu2 kernel: EIP: [<00000000>] _stext+0x3feffc6c/0x20 SS:ESP 0068:c228bf40
Feb 15 08:34:58 webc-neu2 kernel: <5>Filesystem "drbd0": Disabling barriers, not supported by the underlying device
Feb 15 08:34:58 webc-neu2 kernel: XFS mounting filesystem drbd0
Feb 15 08:34:58 webc-neu2 kernel: Ending clean XFS mount for filesystem: drbd0
Feb 15 08:35:00 webc-neu2 kernel: tg3: eth1: Link is down.
Feb 15 08:35:02 webc-neu2 kernel: tg3: eth1: Link is up at 1000 Mbps, full duplex.
Feb 15 08:35:02 webc-neu2 kernel: tg3: eth1: Flow control is off for TX and off for RX.
Feb 15 08:35:04 webc-neu2 kernel: tg3: eth1: Link is down.
Feb 15 08:35:04 webc-neu2 kernel: drbd1: PingAck did not arrive in time.
Feb 15 08:35:04 webc-neu2 kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb 15 08:35:04 webc-neu2 kernel: drbd1: asender terminated
Feb 15 08:35:04 webc-neu2 kernel: drbd1: short read expecting header on sock: r=-512
Feb 15 08:35:04 webc-neu2 kernel: drbd1: tl_clear()
Feb 15 08:35:04 webc-neu2 kernel: drbd1: Connection closed
Feb 15 08:35:04 webc-neu2 kernel: drbd1: Writing meta data super block now.
Feb 15 08:35:04 webc-neu2 kernel: drbd1: conn( NetworkFailure -> Unconnected )
Feb 15 08:35:04 webc-neu2 kernel: drbd1: receiver terminated
Feb 15 08:35:04 webc-neu2 kernel: drbd1: receiver (re)started
Feb 15 08:35:04 webc-neu2 kernel: drbd1: conn( Unconnected -> WFConnection )

Strange - this error happens pretty consistently when running via heartbeat,
but I haven't been able to reproduce it outside heartbeat.

Any further hints about what's going on are greatly appreciated :-)

Bye,
Martin
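
For anyone who wants to double-check the module/kernel match discussed above, one possible approach is to compare the module's vermagic string with the running kernel release. The following is only a minimal sketch: it assumes `modinfo` and `uname` are on the PATH and that the module is named `drbd`; the helper names are made up for illustration and are not part of DRBD.

```python
import subprocess


def vermagic_release(vermagic: str) -> str:
    """Return the kernel release, i.e. the first field of a vermagic string."""
    fields = vermagic.split()
    return fields[0] if fields else ""


def module_matches_kernel(vermagic: str, running_release: str) -> bool:
    """True if the module was built for exactly the running kernel release."""
    return vermagic_release(vermagic) == running_release


def check_module(name: str = "drbd") -> bool:
    """Compare `modinfo -F vermagic <name>` with `uname -r` on this host.

    Assumption: the module is installed where modinfo can find it.
    Raises CalledProcessError if either command fails.
    """
    vermagic = subprocess.run(
        ["modinfo", "-F", "vermagic", name],
        capture_output=True, text=True, check=True,
    ).stdout
    running = subprocess.run(
        ["uname", "-r"], capture_output=True, text=True, check=True,
    ).stdout.strip()
    return module_matches_kernel(vermagic, running)
```

Calling `check_module()` on the affected node returns `True` when the release strings agree. A matching release is necessary but not sufficient: it does not show whether options such as CONFIG_CONNECTOR were the same at build time, so the check against /proc/config.gz is still needed.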