[DRBD-user] CentOS 7.3 3.10.0-514.10.2.el7.x86_64 w. DRBD v8.4.9-2: PANIC: ".1BUG: unable to handle kernel NULL pointer dereference at 0000000000000014"

Adi Pircalabu adi at ddns.com.au
Fri Jun 30 02:15:56 CEST 2017


On 05-05-2017 9:24, Adi Pircalabu wrote:
> On 5/5/17 2:18 AM, Robert Altnoeder wrote:
>> On 04/26/2017 06:03 AM, Adi Pircalabu wrote:
>>> Just fyi, crashed again yesterday morning 7:06am, similar backtrace.
>>> crash output for bt, ps, task & vm attached. I've since downgraded the
>>> drbd module version from 8.4.9-2 to 8.4.9-1, waiting for the crash to
>>> replicate again. And, as expected, the folks @RedHat closed the bug
>>> after reopening it as notabug, blaming drbd.
>> If they really explicitly blamed DRBD, then I suggest reopening the bug
>> and requesting a copy of their root cause analysis that proves that DRBD
>> is causing the problem.
> 
> I have, along with providing more debug information and asking why
> they think DRBD is to blame.
> 
>> Obviously, their point will be something like "no one knows what that
>> out-of-tree code might be doing"; granted, it's not an entirely invalid
>> point.
> 
> Agree.
> 
>> But then, I am quite sure - judging by the frequency and number of
>> kernel updates that are provided each year - that no one really knows
>> what the in-tree code might be doing, so one had better look there too
>> before blaming a piece of out-of-tree code that's pretty small compared
>> to all the other pieces of code that may have caused the crash.
> 
> Here is the additional comment when reopening the bug (email client
> wrapping may make it unreadable):
> 
> Looking further into the 2 backtraces:
> 
> 1. First crash, linux-3.10.0-514.10.2.el7.x86_64
> [793292.358213] .1BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
> [793292.358710] IP: [<ffffffff810c8375>] account_system_time+0x15/0x170
> [793292.358966] PGD 0
> [793292.359202] Oops: 0000 [#1] SMP
> [793292.359444] Modules linked in: binfmt_misc vfat fat drbd(OE)
> mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase dell_rbu
> bonding ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4
> xt_conntrack nf_conntrack iptable_filter dm_cache_smq dm_cache
> dm_persistent_data dm_bio_prison dm_bufio intel_powerclamp coretemp
> intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel
> aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt
> iTCO_vendor_support dcdbas pcspkr mxm_wmi sg sb_edac edac_core
> ipmi_devintf ipmi_si ipmi_msghandler lpc_ich mei_me mei shpchp
> acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc
> tcp_htcp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic
> crct10dif_pclmul crct10dif_common crc32c_intel drm_kms_helper
> syscopyarea sysfillrect
> [793292.362164]  sysimgblt fb_sys_fops ttm ixgbe drm ahci uas igb
> libahci mdio i2c_algo_bit usb_storage ptp libata pps_core i2c_core
> megaraid_sas dca fjes dm_mirror dm_region_hash dm_log dm_mod
> [793292.362978] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           OE ------------   3.10.0-514.10.2.el7.x86_64 #1
> [793292.363448] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.3.4 11/08/2016
> [793292.363910] task: ffff8804fa559f60 ti: ffff8804fa680000 task.ti: ffff8804fa680000
> [793292.364377] RIP: 0010:[<ffffffff810c8375>]  [<ffffffff810c8375>] account_system_time+0x15/0x170
> [793292.364850] RSP: 0018:ffff88086de43e00  EFLAGS: 00010086
> [793292.365088] RAX: 0000000000000000 RBX: ffff88086de56c40 RCX: 00000000000f4240
> [793292.365550] RDX: 00000000000f4240 RSI: 0000000000010000 RDI: 0000000000000000
> [793292.366012] RBP: ffff88086de43e28 R08: 0000000000000000 R09: 00000000000c1af5
> [793292.366470] R10: 000000003b9aca00 R11: 0000000000000000 R12: 00000000000f4240
> [793292.367018] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88086de4f9d8
> [793292.367473] FS:  0000000000000000(0000) GS:ffff88086de40000(0000) knlGS:0000000000000000
> [793292.367935] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [793292.368179] CR2: 0000000000000014 CR3: 00000000019ba000 CR4: 00000000003407e0
> [793292.368640] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [793292.369106] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [793292.369571] Stack:
> [793292.369801]  ffff88086de56c40 0000000000016c40 0000000000000000 0000000000000000
> [793292.370282]  ffff88086de4f9d8 ffff88086de43e60 ffffffff810c8682 0000000000000000
> [793292.370764]  0000000000000000 0000000000000003 ffffffff810f3180 ffff88086de4f9d8
> [793292.371250] Call Trace:
> [793292.371484]  <IRQ>
> [793292.371494]
> [793292.371730]  [<ffffffff810c8682>] account_process_tick+0x62/0x170
> [793292.371973]  [<ffffffff810f3180>] ? tick_sched_handle.isra.13+0x60/0x60
> [793292.372218]  [<ffffffff8109932d>] update_process_times+0x2d/0x80
> [793292.372465]  [<ffffffff810f3145>] tick_sched_handle.isra.13+0x25/0x60
> [793292.372712]  [<ffffffff810f31c1>] tick_sched_timer+0x41/0x70
> [793292.372957]  [<ffffffff810b4a32>] __hrtimer_run_queues+0xd2/0x260
> [793292.373197]  [<ffffffff810b4fd0>] hrtimer_interrupt+0xb0/0x1e0
> [793292.373445]  [<ffffffff81050fd7>] local_apic_timer_interrupt+0x37/0x60
> [793292.373692]  [<ffffffff8169920f>] smp_apic_timer_interrupt+0x3f/0x60
> [793292.373935]  [<ffffffff8169775d>] apic_timer_interrupt+0x6d/0x80
> [793292.374178]  <EOI>
> [793292.374187]
> [793292.374423]  [<ffffffff81514492>] ? cpuidle_enter_state+0x52/0xc0
> [793292.374664]  [<ffffffff815145d9>] cpuidle_idle_call+0xd9/0x210
> [793292.374908]  [<ffffffff810350ee>] arch_cpu_idle+0xe/0x30
> [793292.375154]  [<ffffffff810e7e65>] cpu_startup_entry+0x245/0x290
> [793292.375398]  [<ffffffff8104f07a>] start_secondary+0x1ba/0x230
> [793292.375640] Code: e8 81 63 07 00 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 d4 53 <f6> 47 14 10 48 89 fb 74 1c 65 48 8b 04 25 b8 cd 00 00 8b 80 44
> [793292.376666] RIP  [<ffffffff810c8375>] account_system_time+0x15/0x170
> [793292.376917]  RSP <ffff88086de43e00>
> [793292.377158] CR2: 0000000000000014
> crash> dis -rl ffffffff810c8375
> /usr/src/debug/kernel-3.10.0-514.10.2.el7/linux-3.10.0-514.10.2.el7.x86_64/kernel/sched/cputime.c: 213
> 0xffffffff810c8360 <account_system_time>:       nopl   0x0(%rax,%rax,1) [FTRACE NOP]
> 0xffffffff810c8365 <account_system_time+5>:     push   %rbp
> 0xffffffff810c8366 <account_system_time+6>:     mov    %rsp,%rbp
> 0xffffffff810c8369 <account_system_time+9>:     push   %r15
> 0xffffffff810c836b <account_system_time+11>:    push   %r14
> 0xffffffff810c836d <account_system_time+13>:    push   %r13
> 0xffffffff810c836f <account_system_time+15>:    push   %r12
> 0xffffffff810c8371 <account_system_time+17>:    mov    %rdx,%r12
> 0xffffffff810c8374 <account_system_time+20>:    push   %rbx
> /usr/src/debug/kernel-3.10.0-514.10.2.el7/linux-3.10.0-514.10.2.el7.x86_64/kernel/sched/cputime.c: 216
> 0xffffffff810c8375 <account_system_time+21>:    testb  $0x10,0x14(%rdi)
> 
> 2. Second crash, linux-3.10.0-514.16.1.el7.x86_64:
> [647323.702265] BUG: unable to handle kernel NULL pointer dereference at           (null)
> [647323.702774] IP: [<ffffffff8168e48f>] _raw_spin_lock_irqsave+0x1f/0x60
> [647323.703030] PGD 0
> [647323.703274] Oops: 0002 [#1] SMP
> [647323.703519] Modules linked in: mpt3sas mpt2sas raid_class
> scsi_transport_sas mptctl mptbase vfat fat drbd(OE) bonding ipt_REJECT
> nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
> nf_conntrack iptable_filter dm_cache_smq dm_cache dm_persistent_data
> dm_bio_prison dm_bufio intel_powerclamp coretemp intel_rapl iosf_mbi
> kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw
> gf128mul glue_helper ablk_helper cryptd iTCO_wdt ipmi_devintf
> iTCO_vendor_support sb_edac sg pcspkr ipmi_si edac_core mxm_wmi dcdbas
> ipmi_msghandler mei_me mei lpc_ich shpchp acpi_power_meter wmi nfsd
> auth_rpcgss nfs_acl lockd grace sunrpc tcp_htcp ip_tables xfs
> libcrc32c sd_mod crc_t10dif crct10dif_generic uas usb_storage
> crct10dif_pclmul crct10dif_common crc32c_intel drm_kms_helper
> syscopyarea sysfillrect sysimgblt
> [647323.706272]  fb_sys_fops ttm ixgbe ahci igb drm libahci mdio ptp
> libata i2c_algo_bit pps_core i2c_core megaraid_sas dca fjes dm_mirror
> dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
> [647323.707081] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           OE ------------   3.10.0-514.16.1.el7.x86_64 #1
> [647323.707562] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.3.4 11/08/2016
> [647323.708033] task: ffff8804fa61edd0 ti: ffff8804fa620000 task.ti: ffff8804fa620000
> [647323.708506] RIP: 0010:[<ffffffff8168e48f>]  [<ffffffff8168e48f>] _raw_spin_lock_irqsave+0x1f/0x60
> [647323.708986] RSP: 0018:ffff8804fa623e10  EFLAGS: 00010082
> [647323.709227] RAX: 0000000000000082 RBX: ffff88086de4f8e0 RCX: 000000000c0d81e5
> [647323.709698] RDX: 0000000000020000 RSI: ffff8804fa623e48 RDI: 0000000000000000
> [647323.710163] RBP: ffff8804fa623e10 R08: 0000000000000082 R09: 0000000000000000
> [647323.710631] R10: 0000000000000004 R11: 0000000000000005 R12: ffff88086de4fe10
> [647323.711100] R13: ffff8804fa623e48 R14: ffff8804fa620000 R15: 0000000000000000
> [647323.711578] FS:  0000000000000000(0000) GS:ffff88086de40000(0000) knlGS:0000000000000000
> [647323.712051] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [647323.712295] CR2: 0000000000000000 CR3: 00000000019ba000 CR4: 00000000003407e0
> [647323.712851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [647323.713311] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [647323.713771] Stack:
> [647323.714001]  ffff8804fa623e38 ffffffff810b4735 ffff88086de4fde0 00000000ffffffff
> [647323.714490]  ffff8804fa620000 ffff8804fa623e70 ffffffff810b4f37 ffffffff81514a2a
> [647323.714983]  a88acbe2089d5b89 ffff88086de4fde0 00024cbb3a6654d9 ffff8804fa620000
> [647323.715467] Call Trace:
> [647323.715707]  [<ffffffff810b4735>] lock_hrtimer_base.isra.20+0x25/0x50
> [647323.715949]  [<ffffffff810b4f37>] hrtimer_try_to_cancel.part.25+0x37/0x100
> [647323.716202]  [<ffffffff81514a2a>] ? cpuidle_enter_state+0x5a/0xc0
> [647323.716445]  [<ffffffff810b5048>] hrtimer_cancel+0x28/0x40
> [647323.716691]  [<ffffffff810f36d7>] tick_nohz_restart+0x17/0x70
> [647323.716935]  [<ffffffff810f417f>] tick_nohz_idle_exit+0x8f/0x150
> [647323.717182]  [<ffffffff810e81d1>] cpu_startup_entry+0x171/0x290
> [647323.717434]  [<ffffffff8104f07a>] start_secondary+0x1ba/0x230
> [647323.717676] Code: df 0f 1f 80 00 00 00 00 eb e0 66 90 0f 1f 44 00 00 55 48 89 e5 9c 58 0f 1f 44 00 00 49 89 c0 fa 66 0f 1f 44 00 00 ba 00 00 02 00 <f0> 0f c1 17 89 d1 c1 e9 10 66 39 d1 75 05 4c 89 c0 5d c3 83 e1
> [647323.718706] RIP  [<ffffffff8168e48f>] _raw_spin_lock_irqsave+0x1f/0x60
> [647323.718952]  RSP <ffff8804fa623e10>
> [647323.719191] CR2: 0000000000000000
> crash> dis -rl ffffffff8168e48f
> /usr/src/debug/kernel-3.10.0-514.16.1.el7/linux-3.10.0-514.16.1.el7.x86_64/kernel/spinlock.c: 144
> 0xffffffff8168e470 <_raw_spin_lock_irqsave>:    nopl   0x0(%rax,%rax,1) [FTRACE NOP]
> 0xffffffff8168e475 <_raw_spin_lock_irqsave+5>:  push   %rbp
> 0xffffffff8168e476 <_raw_spin_lock_irqsave+6>:  mov    %rsp,%rbp
> /usr/src/debug/kernel-3.10.0-514.16.1.el7/linux-3.10.0-514.16.1.el7.x86_64/arch/x86/include/asm/paravirt.h: 775
> 0xffffffff8168e479 <_raw_spin_lock_irqsave+9>:  pushfq
> 0xffffffff8168e47a <_raw_spin_lock_irqsave+10>: pop    %rax
> 0xffffffff8168e47b <_raw_spin_lock_irqsave+11>: nopl   0x0(%rax,%rax,1)
> 0xffffffff8168e480 <_raw_spin_lock_irqsave+16>: mov    %rax,%r8
> /usr/src/debug/kernel-3.10.0-514.16.1.el7/linux-3.10.0-514.16.1.el7.x86_64/arch/x86/include/asm/paravirt.h: 785
> 0xffffffff8168e483 <_raw_spin_lock_irqsave+19>: cli
> 0xffffffff8168e484 <_raw_spin_lock_irqsave+20>: nopw   0x0(%rax,%rax,1)
> /usr/src/debug/kernel-3.10.0-514.16.1.el7/linux-3.10.0-514.16.1.el7.x86_64/arch/x86/include/asm/spinlock.h: 86
> 0xffffffff8168e48a <_raw_spin_lock_irqsave+26>: mov    $0x20000,%edx
> 0xffffffff8168e48f <_raw_spin_lock_irqsave+31>: lock xadd %edx,(%rdi)
> 
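> In both cases RDI was NULL: the first fault is the access to 0x14(%rdi),
> the second the access to (%rdi), which matches the CR2 values of 0x14
> and 0x0. *And* neither of the two stack traces shows any evidence of
> DRBD causing the crash.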

Just a short update on this: since upgrading to
kernel-3.10.0-514.21.1.el7.x86_64 I haven't seen any more crashes. The
same drbd module has been in use throughout and still is:
modinfo drbd
filename:       /lib/modules/3.10.0-514.21.1.el7.x86_64/weak-updates/drbd84/drbd.ko
alias:          block-major-147-*
license:        GPL
version:        8.4.9-1
description:    drbd - Distributed Replicated Block Device v8.4.9-1
author:         Philipp Reisner <phil at linbit.com>, Lars Ellenberg <lars at linbit.com>
rhelversion:    7.3
srcversion:     D502CE1D6329A5626F8A7CD
depends:        libcrc32c
vermagic:       3.10.0-514.el7.x86_64 SMP mod_unload modversions
signer:         The ELRepo Project (http://elrepo.org): ELRepo.org Secure Boot Key
sig_key:        F3:65:AD:34:81:A7:B2:0E:34:27:B6:1B:2A:26:63:5B:83:FE:42:7B
sig_hashalgo:   sha256
parm:           minor_count:Approximate number of drbd devices (1-255) (uint)
parm:           disable_sendpage:bool
parm:           allow_oos:DONT USE! (bool)
parm:           proc_details:int
parm:           enable_faults:int
parm:           fault_rate:int
parm:           fault_count:int
parm:           fault_devs:int
parm:           usermode_helper:string
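
One detail in that output is easy to miss: the module sits under
weak-updates of the 514.21.1 kernel, while its vermagic names the
3.10.0-514.el7 kernel it was built for. A small Python sketch for
pulling both values out of modinfo output follows. The parse_modinfo
helper is hypothetical, and it assumes each field is on a single line
(the output above, with the fields needed here copied in).

```python
import re

# Fields copied from the modinfo output above, one field per line.
MODINFO = """\
filename:       /lib/modules/3.10.0-514.21.1.el7.x86_64/weak-updates/drbd84/drbd.ko
version:        8.4.9-1
vermagic:       3.10.0-514.el7.x86_64 SMP mod_unload modversions
"""


def parse_modinfo(text: str) -> dict[str, str]:
    """Map each 'key: value' line to a dict entry, keeping the first value per key."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


info = parse_modinfo(MODINFO)

# Kernel release whose module tree the file is installed under.
installed_for = re.search(r"/lib/modules/([^/]+)/", info["filename"]).group(1)
# Kernel release recorded in the module at build time (first vermagic token).
built_for = info["vermagic"].split()[0]

print("module version:", info["version"])
print("installed for: ", installed_for)
print("built for:     ", built_for)
print("weak-updates:  ", "/weak-updates/" in info["filename"])
```

So the drbd binary was not rebuilt for the newer kernel; only the kernel
underneath it changed.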

Cheers,

---
Adi Pircalabu


