[DRBD-user] Kernel Panic occurring when drbd is up & (re)syncing

Ivars Strazdiņš ivars.strazdins at gmail.com
Mon Nov 16 12:29:06 CET 2009


It looks like I am hitting a kernel bug on 64-bit Xen Debian under 
similar conditions, i.e., when running a DRBD verify.
It has happened on both cluster nodes.

Kernel 2.6.26-2-xen-amd64, DRBD 8.3.5 built using the Debian unstable 
packaging for 8.3.4.

For anyone interested, here is the stack trace.
BR,
Ivars


Nov 16 03:00:29 ariel kernel: [31375.026193] BUG: unable to handle 
kernel NULL pointer dereference at 0000000000000016
Nov 16 03:00:29 ariel kernel: [31375.026288] IP: [<ffffffffa02f9169>] 
:drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.026359] PGD 164c4067 PUD 170d1067 
PMD 0
Nov 16 03:00:29 ariel kernel: [31375.026423] Oops: 0000 [1] SMP
Nov 16 03:00:29 ariel kernel: [31375.026474] CPU 0
Nov 16 03:00:29 ariel kernel: [31375.026512] Modules linked in: 
xt_physdev iptable_filter ip_tables x_tables sha1_generic drbd cn 
iscsi_trgt crc32c libcrc32c ipv6 bridge xfs w83627ehf lm85 hwmon_vid 
netconsole configfs xenblktap netloop softdog ipmi_watchdog 
ipmi_msghandler loop psmouse serio_raw pcspkr i2c_i801 i2c_core button 
rng_core shpchp pci_hotplug intel_agp evdev ext3 jbd mbcache dm_mirror 
dm_log dm_snapshot dm_mod ide_cd_mod cdrom ide_disk ide_pci_generic 
ata_piix piix ide_core ata_generic libata scsi_mod dock skge ehci_hcd 
uhci_hcd thermal processor fan thermal_sys 
[last unloaded: scsi_wait_scan]
Nov 16 03:00:29 ariel kernel: [31375.027370] Pid: 3165, comm: cqueue Not 
tainted 2.6.26-2-xen-amd64 #1
Nov 16 03:00:29 ariel kernel: [31375.027405] RIP: 
e030:[<ffffffffa02f9169>]  [<ffffffffa02f9169>] 
:drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.027485] RSP: e02b:ffff8800104f3e50  
EFLAGS: 00010206
Nov 16 03:00:29 ariel kernel: [31375.027519] RAX: 0000000000000000 RBX: 
ffff88001648c220 RCX: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027555] RDX: 0000000000000000 RSI: 
0000000000000000 RDI: ffff8800164c9c10
Nov 16 03:00:29 ariel kernel: [31375.027597] RBP: ffff88001648c1d8 R08: 
ffff8800104f2000 R09: ffffffff80553e18
Nov 16 03:00:29 ariel kernel: [31375.027633] R10: 0000000000000000 R11: 
7fffffffffffffff R12: ffff8800164c9c10
Nov 16 03:00:29 ariel kernel: [31375.027669] R13: ffffffffa02d30c3 R14: 
ffffffff8057d1c0 R15: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027709] FS:  00007f9ee13c46e0(0000) 
GS:ffffffff8053a000(0000) knlGS:0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027761] CS:  e033 DS: 0000 ES: 0000
Nov 16 03:00:29 ariel kernel: [31375.027793] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027829] DR3: 0000000000000000 DR6: 
00000000ffff0ff0 DR7: 0000000000000400
Nov 16 03:00:29 ariel kernel: [31375.027866] Process cqueue (pid: 3165, 
threadinfo ffff8800104f2000, task ffff8800161e1440)
Nov 16 03:00:29 ariel kernel: [31375.027918] Stack:  0000000000000000 
ffff88001648c220 ffff88001648c1d8 ffff88001648c1d0
Nov 16 03:00:29 ariel kernel: [31375.028024]  ffffffffa02d30c3 
ffffffff8057d1c0 0000000000000000 ffffffffa02d30d8
Nov 16 03:00:29 ariel kernel: [31375.028120]  7fffffffffffffff 
ffff880016f76840 ffff88001648c1d0 ffffffff8023c34c
Nov 16 03:00:29 ariel kernel: [31375.028185] Call Trace:
Nov 16 03:00:29 ariel kernel: [31375.028250]  [<ffffffffa02d30c3>] ? 
:cn:cn_queue_wrapper+0x0/0x33
Nov 16 03:00:29 ariel kernel: [31375.028393]  [<ffffffffa02d30d8>] ? 
:cn:cn_queue_wrapper+0x15/0x33
Nov 16 03:00:29 ariel kernel: [31375.028439]  [<ffffffff8023c34c>] ? 
run_workqueue+0xbe/0x189
Nov 16 03:00:29 ariel kernel: [31375.028482]  [<ffffffff8023cd35>] ? 
worker_thread+0xd5/0xe0
Nov 16 03:00:29 ariel kernel: [31375.028522]  [<ffffffff8023f6c1>] ? 
autoremove_wake_function+0x0/0x2e
Nov 16 03:00:29 ariel kernel: [31375.028564]  [<ffffffff8023cc60>] ? 
worker_thread+0x0/0xe0
Nov 16 03:00:29 ariel kernel: [31375.028601]  [<ffffffff8023f593>] ? 
kthread+0x47/0x74
Nov 16 03:00:29 ariel kernel: [31375.028637]  [<ffffffff802283a8>] ? 
schedule_tail+0x27/0x5c
Nov 16 03:00:29 ariel kernel: [31375.028677]  [<ffffffff8020be28>] ? 
child_rip+0xa/0x12
Nov 16 03:00:29 ariel kernel: [31375.028722]  [<ffffffff8023f54c>] ? 
kthread+0x0/0x74
Nov 16 03:00:29 ariel kernel: [31375.028760]  [<ffffffff8020be1e>] ? 
child_rip+0x0/0x12
Nov 16 03:00:29 ariel kernel: [31375.028796]
Nov 16 03:00:29 ariel kernel: [31375.028824]
Nov 16 03:00:29 ariel kernel: [31375.028852] Code: 41 55 41 54 49 89 fc 
55 53 48 83 ec 08 65 8b 04 25 24 00 00 00 83 3d a6 75 01 00 02 74 1e 89 
c0 48 c1 e0 07 48 ff 80 00 09 31 a0 <f6> 42 16 20 be 98 00 00 00 0f 84 
20 01 00 00 eb 1a 41 5b 5b 5d
Nov 16 03:00:29 ariel kernel: [31375.029581] RIP  [<ffffffffa02f9169>] 
:drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.029657]  RSP <ffff8800104f3e50>
Nov 16 03:00:29 ariel kernel: [31375.029688] CR2: 0000000000000016
Nov 16 03:00:29 ariel kernel: [31375.030762] ---[ end trace 
296f6157c8798c56 ]---
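
A note on reading the trace: the faulting instruction is the one marked 
<f6> in the Code line. The bytes f6 42 16 20 decode as 
"testb $0x20,0x16(%rdx)", and with RDX = 0 that reads address 0x16, 
which matches CR2. The Python sketch below only checks that arithmetic. 
The function name fault_address is made up for illustration, and it 
handles just this one instruction form (opcode f6 /0 with an 8-bit 
displacement and no SIB byte or REX prefix); it is not a general 
disassembler.

```python
import re

# Register names for ModRM.rm values 0..7 when no REX prefix is present.
BASE_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def fault_address(code_line, regs):
    """Return the address read by a `test $imm8, disp8(%reg)` instruction
    located at the <..> marker of an oops Code: line.

    regs maps lowercase register names to integer values.
    """
    # The oops wraps the first byte of the faulting instruction in <..>.
    m = re.search(r"<([0-9a-f]{2})>((?: [0-9a-f]{2})+)", code_line)
    if m is None:
        raise ValueError("no faulting-byte marker found")
    insn = [int(m.group(1), 16)] + [int(b, 16) for b in m.group(2).split()]
    if len(insn) < 4:
        raise ValueError("not enough bytes after the marker")

    opcode, modrm, disp8 = insn[0], insn[1], insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7

    # Only f6 /0 (TEST r/m8, imm8) with mod=01 (disp8) and no SIB byte.
    if opcode != 0xF6 or reg != 0 or mod != 1 or rm == 4:
        raise ValueError("unsupported instruction form for this sketch")

    # disp8 is sign-extended before being added to the base register.
    disp = disp8 - 0x100 if disp8 >= 0x80 else disp8
    return (regs[BASE_REGS[rm]] + disp) & 0xFFFFFFFFFFFFFFFF


# Tail of the Code: line and the RDX value from the trace above.
code = "c0 48 c1 e0 07 48 ff 80 00 09 31 a0 <f6> 42 16 20 be 98 00 00 00"
print(hex(fault_address(code, {"rdx": 0x0})))  # prints 0x16, same as CR2
```

In other words, drbd_connector_callback+0x32 tests a flag byte at offset 
0x16 from a pointer held in RDX, and that pointer is NULL here.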


Jean-Francois Chevrette wrote:
> It appears that there is currently a problem with the latest 
> CentOS/Redhat kernel. We have noticed the same problem when using LVM 
> snapshots and a backup technology called R1Soft CDP.
>
> Some related info:
> http://bugs.centos.org/view.php?id=3869
> forum.r1soft.com/showthread.php?t=1158
>
> No sign of a bug at bugzilla.redhat.com
>
> For now we have reverted to kernel-2.6.18-128.7.1 on which we did not 
> have any issues for the past 4 hours. Previously, a few seconds after 
> starting a 'drbdadm verify' the kernel panic would occur.
>
> DRBD devs might want to check it out.
>
> Regards,

