It looks like I am getting a kernel bug on 64-bit Xen Debian under similar conditions, i.e. when running a DRBD verify. It happens on both cluster nodes.

Kernel: 2.6.26-2-xen-amd64
DRBD: 8.3.5, compiled from the Debian unstable package for 8.3.4

For anyone interested, here is the stack trace.

BR,
Ivars

Nov 16 03:00:29 ariel kernel: [31375.026193] BUG: unable to handle kernel NULL pointer dereference at 0000000000000016
Nov 16 03:00:29 ariel kernel: [31375.026288] IP: [<ffffffffa02f9169>] :drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.026359] PGD 164c4067 PUD 170d1067 PMD 0
Nov 16 03:00:29 ariel kernel: [31375.026423] Oops: 0000 [1] SMP
Nov 16 03:00:29 ariel kernel: [31375.026474] CPU 0
Nov 16 03:00:29 ariel kernel: [31375.026512] Modules linked in: xt_physdev iptable_filter ip_tables x_tables sha1_generic drbd cn iscsi_trgt crc32c libcrc32c ipv6 bridge xfs w83627ehf lm85 hwmon_vid netconsole configfs xenblktap netloop softdog ipmi_watchdog ipmi_msghandler loop psmouse serio_raw pcspkr i2c_i801 i2c_core button rng_core shpchp pci_hotplug intel_agp evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom ide_disk ide_pci_generic ata_piix piix ide_core ata_generic libata scsi_mod dock skge ehci_hcd uhci_hcd thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Nov 16 03:00:29 ariel kernel: [31375.027370] Pid: 3165, comm: cqueue Not tainted 2.6.26-2-xen-amd64 #1
Nov 16 03:00:29 ariel kernel: [31375.027405] RIP: e030:[<ffffffffa02f9169>] [<ffffffffa02f9169>] :drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.027485] RSP: e02b:ffff8800104f3e50 EFLAGS: 00010206
Nov 16 03:00:29 ariel kernel: [31375.027519] RAX: 0000000000000000 RBX: ffff88001648c220 RCX: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027555] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800164c9c10
Nov 16 03:00:29 ariel kernel: [31375.027597] RBP: ffff88001648c1d8 R08: ffff8800104f2000 R09: ffffffff80553e18
Nov 16 03:00:29 ariel kernel: [31375.027633] R10: 0000000000000000 R11: 7fffffffffffffff R12: ffff8800164c9c10
Nov 16 03:00:29 ariel kernel: [31375.027669] R13: ffffffffa02d30c3 R14: ffffffff8057d1c0 R15: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027709] FS: 00007f9ee13c46e0(0000) GS:ffffffff8053a000(0000) knlGS:0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027761] CS: e033 DS: 0000 ES: 0000
Nov 16 03:00:29 ariel kernel: [31375.027793] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027829] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 16 03:00:29 ariel kernel: [31375.027866] Process cqueue (pid: 3165, threadinfo ffff8800104f2000, task ffff8800161e1440)
Nov 16 03:00:29 ariel kernel: [31375.027918] Stack: 0000000000000000 ffff88001648c220 ffff88001648c1d8 ffff88001648c1d0
Nov 16 03:00:29 ariel kernel: [31375.028024] ffffffffa02d30c3 ffffffff8057d1c0 0000000000000000 ffffffffa02d30d8
Nov 16 03:00:29 ariel kernel: [31375.028120] 7fffffffffffffff ffff880016f76840 ffff88001648c1d0 ffffffff8023c34c
Nov 16 03:00:29 ariel kernel: [31375.028185] Call Trace:
Nov 16 03:00:29 ariel kernel: [31375.028250] [<ffffffffa02d30c3>] ? :cn:cn_queue_wrapper+0x0/0x33
Nov 16 03:00:29 ariel kernel: [31375.028393] [<ffffffffa02d30d8>] ? :cn:cn_queue_wrapper+0x15/0x33
Nov 16 03:00:29 ariel kernel: [31375.028439] [<ffffffff8023c34c>] ? run_workqueue+0xbe/0x189
Nov 16 03:00:29 ariel kernel: [31375.028482] [<ffffffff8023cd35>] ? worker_thread+0xd5/0xe0
Nov 16 03:00:29 ariel kernel: [31375.028522] [<ffffffff8023f6c1>] ? autoremove_wake_function+0x0/0x2e
Nov 16 03:00:29 ariel kernel: [31375.028564] [<ffffffff8023cc60>] ? worker_thread+0x0/0xe0
Nov 16 03:00:29 ariel kernel: [31375.028601] [<ffffffff8023f593>] ? kthread+0x47/0x74
Nov 16 03:00:29 ariel kernel: [31375.028637] [<ffffffff802283a8>] ? schedule_tail+0x27/0x5c
Nov 16 03:00:29 ariel kernel: [31375.028677] [<ffffffff8020be28>] ? child_rip+0xa/0x12
Nov 16 03:00:29 ariel kernel: [31375.028722] [<ffffffff8023f54c>] ? kthread+0x0/0x74
Nov 16 03:00:29 ariel kernel: [31375.028760] [<ffffffff8020be1e>] ? child_rip+0x0/0x12
Nov 16 03:00:29 ariel kernel: [31375.028796]
Nov 16 03:00:29 ariel kernel: [31375.028824]
Nov 16 03:00:29 ariel kernel: [31375.028852] Code: 41 55 41 54 49 89 fc 55 53 48 83 ec 08 65 8b 04 25 24 00 00 00 83 3d a6 75 01 00 02 74 1e 89 c0 48 c1 e0 07 48 ff 80 00 09 31 a0 <f6> 42 16 20 be 98 00 00 00 0f 84 20 01 00 00 eb 1a 41 5b 5b 5d
Nov 16 03:00:29 ariel kernel: [31375.029581] RIP [<ffffffffa02f9169>] :drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.029657] RSP <ffff8800104f3e50>
Nov 16 03:00:29 ariel kernel: [31375.029688] CR2: 0000000000000016
Nov 16 03:00:29 ariel kernel: [31375.030762] ---[ end trace 296f6157c8798c56 ]---

Jean-Francois Chevrette wrote:
> It appears that there is currently a problem with the latest
> CentOS/Red Hat kernel. We have noticed the same problem when using LVM
> snapshots and a backup technology called R1Soft CDP.
>
> Some related info:
> http://bugs.centos.org/view.php?id=3869
> forum.r1soft.com/showthread.php?t=1158
>
> There is no sign of a matching bug at bugzilla.redhat.com.
>
> For now we have reverted to kernel-2.6.18-128.7.1, on which we have not
> had any issues for the past 4 hours. Previously, the kernel panic would
> occur a few seconds after starting a 'drbdadm verify'.
>
> DRBD devs might want to check it out.
>
> Regards,
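The address in the first oops line can be cross-checked against the register dump. The following is a minimal Python sketch under stated assumptions: it takes the byte marked <f6> in the Code: line as the first byte of the faulting instruction, and it decodes only the TEST r/m8, imm8 form (opcode F6 /0) with a base-register-plus-8-bit-displacement operand and no prefixes. The helper names faulting_bytes and decode_test_disp8 are hypothetical. The result would be consistent with drbd_connector_callback reading a byte at offset 0x16 through a pointer held in RDX, which was NULL.

```python
# Sketch: check that the oops' faulting address (CR2) equals base register + displacement
# of the instruction marked <..> in the "Code:" line. Assumption: the instruction is
# TEST r/m8, imm8 (opcode F6 /0) with a [base + disp8] operand and no prefixes;
# any other encoding is rejected rather than guessed at.

# 64-bit register names in ModRM r/m order (no REX prefix).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line: str) -> bytes:
    """Return the bytes starting at the <xx> marker of an oops 'Code:' line (hypothetical helper)."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def decode_test_disp8(insn: bytes) -> tuple[str, int, int]:
    """Decode F6 /0 ib with mod=01; return (base register, disp8, imm8) (hypothetical helper)."""
    if insn[0] != 0xF6:
        raise ValueError("not opcode F6")
    modrm = insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or reg != 0 or rm == 0b100:  # rm=100 would need a SIB byte
        raise ValueError("unsupported ModRM form")
    disp = int.from_bytes(insn[2:3], "little", signed=True)  # sign-extended disp8
    return REGS64[rm], disp, insn[3]


# Values copied from the oops above.
CODE = ("41 55 41 54 49 89 fc 55 53 48 83 ec 08 65 8b 04 25 24 00 00 00 "
        "83 3d a6 75 01 00 02 74 1e 89 c0 48 c1 e0 07 48 ff 80 00 09 31 a0 "
        "<f6> 42 16 20 be 98 00 00 00 0f 84 20 01 00 00 eb 1a 41 5b 5b 5d")
REGISTERS = {
    "rax": 0x0000000000000000, "rbx": 0xFFFF88001648C220,
    "rcx": 0x0000000000000000, "rdx": 0x0000000000000000,
    "rsi": 0x0000000000000000, "rdi": 0xFFFF8800164C9C10,
}
CR2 = 0x16

base, disp, imm = decode_test_disp8(faulting_bytes(CODE))
address = (REGISTERS[base] + disp) & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits
print(f"testb ${imm:#x}, {disp:#x}(%{base}) -> address {address:#x}")
assert address == CR2
```

Run against the values above, it prints testb $0x20, 0x16(%rdx) -> address 0x16, which matches CR2.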