[DRBD-user] Kernel 2.6.20+ Drbd 8.0.0 (2738M): BUG: unable to handle kernel NULL pointer

Martin Bene martin.bene at icomedias.com
Thu Feb 15 08:50:06 CET 2007


Hi Lars,

> > 
> > I've seen the above bug several times, most recently after hard
> > resetting a node that was primary for one of the devices.
> 
> as this happened in the dispatcher of the connector, I
> suspect that the kernel configuration you built the drbd module
> against does not match your running kernel.
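A quick way to test that suspicion is to compare what the module was
built for with what is actually running. The sketch below is
hypothetical shell: the function names vermagic_matches and config_diff
are made up for illustration, it assumes modinfo supports -F (print a
single field), and it assumes the kernel build tree is /usr/src/linux,
which is only the usual Gentoo location. A matching release string alone
does not prove the configurations match, hence the CONFIG_ comparison.

```shell
#!/bin/sh
# Hypothetical helpers for checking that an out-of-tree module (here drbd)
# was built against the running kernel.

# Succeed if a module's vermagic string names the given kernel release.
# A vermagic string looks roughly like "2.6.20-gentoo SMP mod_unload 686";
# its first space-separated word is the release it was built for.
vermagic_matches() {
    built_for=${1%% *}   # strip everything from the first space onward
    [ -n "$built_for" ] && [ "$built_for" = "$2" ]
}

# Print CONFIG_ settings that differ between two kernel config files.
# "# CONFIG_FOO is not set" lines are ignored, so an option that is set
# in one file and unset in the other shows up as a one-sided diff line.
# Exit status follows diff: 0 = identical, 1 = differences, 2 = error.
config_diff() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    # Keep only "CONFIG_FOO=value" lines and sort them so that line order
    # in the two files does not matter.
    grep '^CONFIG_' "$1" | sort > "$tmp_a"
    grep '^CONFIG_' "$2" | sort > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Example use on the node (paths are assumptions, see above):
#   vermagic_matches "$(modinfo -F vermagic drbd)" "$(uname -r)" \
#       || echo "drbd was built for a different kernel release"
#   zcat /proc/config.gz > /tmp/running.config
#   config_diff /tmp/running.config /usr/src/linux/.config
```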

Ok, recompiling, installing & booting kernel/modules/drbd to be 100%
sure everything matches.

> or maybe there is also the CONNECTOR missing from your kernel,
> and for some reason our detection magic did not prevent to
> build the "built-in-backport", with non-matching netlink-ABI.

Nope, I can definitely exclude that possibility:

# zcat /proc/config.gz  |grep CONN
CONFIG_CONNECTOR=y
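A plain grep for CONN can also match unrelated options such as the
netfilter CONFIG_NF_CONNTRACK family, so a slightly stricter check that
distinguishes built in (y), module (m) and not set may be handy. This is
a hypothetical helper (config_state is a made-up name) that reads a
kernel config on stdin; it says nothing about how drbd's build-time
detection interprets each of those states.

```shell
#!/bin/sh
# Hypothetical helper: print how a kernel config option is set, reading the
# config from stdin. Output: "y" (built in), "m" (module), the literal value
# for non-tristate options, or "unset" if the option is absent or appears
# only as "# CONFIG_FOO is not set".
config_state() {
    # Anchor on "OPTION=" so CONFIG_CONNECTOR does not also match longer
    # option names that merely contain the same text.
    value=$(grep "^$1=" | tail -n 1 | cut -d= -f2-)
    if [ -z "$value" ]; then
        echo "unset"
    else
        echo "$value"
    fi
}

# Example use on the running kernel (/proc/config.gz exists only with
# CONFIG_IKCONFIG_PROC, which is evidently enabled on this machine):
#   zcat /proc/config.gz | config_state CONFIG_CONNECTOR
```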

Let's see whether I still get the same error or whether it has gone away.

Nope, recompiling and reinstalling kernel/modules/drbd didn't change a
thing; same error. Next test: go back to the 8.0.0 release (2713) instead
of current svn (2747M)...

Nope, also no change, still crashes.

Circumstances may be somewhat unusual:

* Start drbd on both nodes. Status: connected, both devices secondary on
  both nodes.
* Start heartbeat on just one node.

Result: Heartbeat fences (STONITH reset) the other node and switches drbd0
from secondary to primary right afterwards, which triggers the kernel
bug. BTW, it doesn't happen every time.

Feb 15 08:34:58 webc-neu2 tengine: [16712]: info: te_fence_node: Executing reboot fencing operation (27) on webc-neu1 (timeout=30000)
Feb 15 08:34:58 webc-neu2 stonithd: [16704]: info: client tengine [pid: 16712] want a STONITH operation RESET to node webc-neu1.
Feb 15 08:34:58 webc-neu2 pengine: [16713]: WARN: stage6: Scheduling Node webc-neu1 for STONITH
Feb 15 08:34:58 webc-neu2 stonithd: [16704]: info: stonith_operate_locally::2539: sending fencing op (1) for webc-neu1 to device external (rsc_id=s_webc-neu1, pid=16937)
Feb 15 08:34:58 webc-neu2 crmd: [16706]: info: do_lrm_rsc_op: Performing op=r_drbd_web_start_0 key=6:1:d2f35dfc-5132-4c91-86d8-b1a09e194839)
Feb 15 08:34:58 webc-neu2 kernel: drbd0: role( Secondary -> Primary )
Feb 15 08:34:58 webc-neu2 kernel: drbd0: Writing meta data super block now.
Feb 15 08:34:58 webc-neu2 kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
Feb 15 08:34:58 webc-neu2 kernel:  printing eip:
Feb 15 08:34:58 webc-neu2 kernel: 00000000
Feb 15 08:34:58 webc-neu2 kernel: *pde = 00000000
Feb 15 08:34:58 webc-neu2 kernel: Oops: 0000 [#1]
Feb 15 08:34:58 webc-neu2 kernel: SMP
Feb 15 08:34:58 webc-neu2 kernel: Modules linked in: drbd usbcore sha1 ipmi_devintf ipmi_si ipmi_msghandler nfs lockd sunrpc tg3 iTCO_wdt
Feb 15 08:34:58 webc-neu2 kernel: CPU:    0
Feb 15 08:34:58 webc-neu2 kernel: EIP:    0060:[_proxy_pda+0/1048576]    Not tainted VLI
Feb 15 08:34:58 webc-neu2 kernel: EIP:    0060:[<00000000>]    Not tainted VLI
Feb 15 08:34:58 webc-neu2 kernel: EFLAGS: 00010247   (2.6.20-gentoo #1)
Feb 15 08:34:58 webc-neu2 kernel: EIP is at _stext+0x3feffc6c/0x20
Feb 15 08:34:58 webc-neu2 kernel: eax: f740e410   ebx: f7295d08   ecx: 00000001   edx: 00000246
Feb 15 08:34:58 webc-neu2 kernel: esi: f7295ccc   edi: f7295ccc   ebp: 00000246   esp: c228bf40
Feb 15 08:34:58 webc-neu2 kernel: ds: 007b   es: 007b   ss: 0068
Feb 15 08:34:58 webc-neu2 kernel: Process cqueue/0 (pid: 126, ti=c228a000 task=c2262550 task.ti=c228a000)
Feb 15 08:34:58 webc-neu2 kernel: Stack: c0261f69 f7295cd0 c21693c0 c0125957 00000000 e02b2276 00003d0f f7de1030
Feb 15 08:34:58 webc-neu2 kernel:        c0261f5c c21693c0 c21693c0 c228bf80 fffffffc c0125b01 ffffffff ffffffff
Feb 15 08:34:58 webc-neu2 kernel:        00000001 00000000 c0113891 00010000 00000000 c213da70 00000000 c200c900
Feb 15 08:34:58 webc-neu2 kernel: Call Trace:
Feb 15 08:34:58 webc-neu2 kernel:  [cn_queue_wrapper+13/36] cn_queue_wrapper+0xd/0x24
Feb 15 08:34:58 webc-neu2 kernel:  [<c0261f69>] cn_queue_wrapper+0xd/0x24
Feb 15 08:34:58 webc-neu2 kernel:  [run_workqueue+138/292] run_workqueue+0x8a/0x124
Feb 15 08:34:58 webc-neu2 kernel:  [<c0125957>] run_workqueue+0x8a/0x124
Feb 15 08:34:58 webc-neu2 kernel:  [cn_queue_wrapper+0/36] cn_queue_wrapper+0x0/0x24
Feb 15 08:34:58 webc-neu2 kernel:  [<c0261f5c>] cn_queue_wrapper+0x0/0x24
Feb 15 08:34:58 webc-neu2 kernel:  [worker_thread+272/315] worker_thread+0x110/0x13b
Feb 15 08:34:58 webc-neu2 kernel:  [<c0125b01>] worker_thread+0x110/0x13b
Feb 15 08:34:58 webc-neu2 kernel:  [default_wake_function+0/12] default_wake_function+0x0/0xc
Feb 15 08:34:58 webc-neu2 kernel:  [<c0113891>] default_wake_function+0x0/0xc
Feb 15 08:34:58 webc-neu2 kernel:  [default_wake_function+0/12] default_wake_function+0x0/0xc
Feb 15 08:34:58 webc-neu2 kernel:  [<c0113891>] default_wake_function+0x0/0xc
Feb 15 08:34:58 webc-neu2 kernel:  [worker_thread+0/315] worker_thread+0x0/0x13b
Feb 15 08:34:58 webc-neu2 kernel:  [<c01259f1>] worker_thread+0x0/0x13b
Feb 15 08:34:58 webc-neu2 kernel:  [kthread+116/152] kthread+0x74/0x98
Feb 15 08:34:58 webc-neu2 kernel:  [<c0128614>] kthread+0x74/0x98
Feb 15 08:34:58 webc-neu2 kernel:  [kthread+0/152] kthread+0x0/0x98
Feb 15 08:34:58 webc-neu2 kernel:  [<c01285a0>] kthread+0x0/0x98
Feb 15 08:34:58 webc-neu2 kernel:  [kernel_thread_helper+7/16] kernel_thread_helper+0x7/0x10
Feb 15 08:34:58 webc-neu2 kernel:  [<c0103473>] kernel_thread_helper+0x7/0x10
Feb 15 08:34:58 webc-neu2 kernel:  =======================
Feb 15 08:34:58 webc-neu2 kernel: Code:  Bad EIP value.
Feb 15 08:34:58 webc-neu2 kernel: EIP: [_proxy_pda+0/1048576] _stext+0x3feffc6c/0x20 SS:ESP 0068:c228bf40
Feb 15 08:34:58 webc-neu2 kernel: EIP: [<00000000>] _stext+0x3feffc6c/0x20 SS:ESP 0068:c228bf40
Feb 15 08:34:58 webc-neu2 kernel:  <5>Filesystem "drbd0": Disabling barriers, not supported by the underlying device
Feb 15 08:34:58 webc-neu2 kernel: XFS mounting filesystem drbd0
Feb 15 08:34:58 webc-neu2 kernel: Ending clean XFS mount for filesystem: drbd0
Feb 15 08:35:00 webc-neu2 kernel: tg3: eth1: Link is down.
Feb 15 08:35:02 webc-neu2 kernel: tg3: eth1: Link is up at 1000 Mbps, full duplex.
Feb 15 08:35:02 webc-neu2 kernel: tg3: eth1: Flow control is off for TX and off for RX.
Feb 15 08:35:04 webc-neu2 kernel: tg3: eth1: Link is down.
Feb 15 08:35:04 webc-neu2 kernel: drbd1: PingAck did not arrive in time.
Feb 15 08:35:04 webc-neu2 kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb 15 08:35:04 webc-neu2 kernel: drbd1: asender terminated
Feb 15 08:35:04 webc-neu2 kernel: drbd1: short read expecting header on sock: r=-512
Feb 15 08:35:04 webc-neu2 kernel: drbd1: tl_clear()
Feb 15 08:35:04 webc-neu2 kernel: drbd1: Connection closed
Feb 15 08:35:04 webc-neu2 kernel: drbd1: Writing meta data super block now.
Feb 15 08:35:04 webc-neu2 kernel: drbd1: conn( NetworkFailure -> Unconnected )
Feb 15 08:35:04 webc-neu2 kernel: drbd1: receiver terminated
Feb 15 08:35:04 webc-neu2 kernel: drbd1: receiver (re)started
Feb 15 08:35:04 webc-neu2 kernel: drbd1: conn( Unconnected -> WFConnection )

Strange: this error happens pretty consistently when running via
heartbeat, but I haven't been able to reproduce it outside heartbeat.
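For attempts to reproduce this without heartbeat, a hypothetical
pre-check can confirm the starting state from the first step
(connected, both devices secondary) before drbd0 is promoted by hand.
The name all_connected_secondary is made up, and the device-line format
in the comment is an assumption based on drbd 8.0's /proc/drbd that may
differ in other drbd versions.

```shell
#!/bin/sh
# Hypothetical pre-check: read /proc/drbd-style text on stdin and succeed
# only if at least one device is configured and every configured device
# shows cs:Connected and st:Secondary/Secondary.
# ASSUMPTION: device lines look like
#   " 0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---"
all_connected_secondary() {
    awk '
        BEGIN { seen = 0; bad = 0 }
        # Device lines: optional spaces, a minor number, then ":".
        /^ *[0-9]+: / {
            if ($0 ~ /cs:Unconfigured/) next   # skip unused minors
            seen++
            if ($0 !~ /(^| )cs:Connected( |$)/ ||
                $0 !~ /(^| )st:Secondary\/Secondary( |$)/)
                bad++
        }
        END {
            if (seen > 0 && bad == 0) exit 0
            exit 1
        }
    '
}

# Example use (RESOURCE is a placeholder for the drbd resource name):
#   all_connected_secondary < /proc/drbd && drbdadm primary RESOURCE
```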

Any further hints about what's going on are greatly appreciated :-)

Bye, Martin


