[DRBD-user] Kernel panic with CentOS 6.0, drbd, pacemaker

Dominik Epple Dominik.Epple at EMEA.NEC.COM
Wed Aug 24 17:12:41 CEST 2011

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hello,

Correct me if I am wrong, but this looks like the problem reported by me and others (see http://lists.linbit.com/pipermail/drbd-user/2011-August/016703.html and the references therein), which occurs with RHEL 6 (or a compatible rebuild), DRBD, and DLM.

While I agree that it is strange for a userspace tool to trigger a kernel panic, it has been mentioned that the relevant drbdadm process is created from kernel space as a userspace callout helper (I do not remember the exact term), which makes the situation a bit more delicate.
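
To illustrate what "callout helper" means here: the kernel module itself spawns /sbin/drbdadm through the kernel's usermode-helper API and waits for it. The following is only a rough sketch of that mechanism (not the actual DRBD code; the function name is made up for the example):

  /*
   * Sketch only: how a kernel module typically fires a userspace callout
   * such as "/sbin/drbdadm before-resync-source minor-0".  The helper is
   * a real userspace process, but it is created by the kernel, which then
   * waits for it -- hence a crash in the helper's syscall path is
   * especially unpleasant.
   */
  #include <linux/kmod.h>

  static char helper_path[] = "/sbin/drbdadm";

  static int run_callout_sketch(char *cmd, char *minor_name)
  {
          char *argv[] = { helper_path, cmd, minor_name, NULL };
          char *envp[] = { "HOME=/", "TERM=linux",
                           "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };

          /* UMH_WAIT_PROC: block until the helper has exited, so the
           * module can evaluate its exit code. */
          return call_usermodehelper(helper_path, argv, envp, UMH_WAIT_PROC);
  }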

I would be very happy if someone could explicitly set up this combination (RHEL 6 (or compatible) + dual-primary DRBD + DLM) and report whether it runs reliably. So far I am only aware of people reporting problems, and of nobody reporting success.

Granted, the RHEL 6 kernel is not exactly a supported kernel (2.6.32 with lots of patches, versus 2.6.33, the first mainline kernel to include DRBD), but it is probably one of the most prevalent kernels out there...

Regards,
Dominik


> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-
> bounces at lists.linbit.com] On Behalf Of Lars Ellenberg
> Sent: Wednesday, 24 August 2011 16:17
> To: Peter Hinse
> Cc: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Kernel panic with CentOS 6.0, drbd, pacemaker
> 
> On Wed, Aug 24, 2011 at 11:27:56AM +0200, Peter Hinse wrote:
> > Hi all,
> >
> > I am trying to set up a KVM cluster with CentOS 6.0,
> > corosync/pacemaker, dual-primary drbd and KVM. Whenever I restart the
> > corosync process or reboot one of the machines, I get a kernel panic
> > and one (or even both) of the machines dies.
> >
> > I tried all the tips I found in mailing lists or bug trackers, such as
> > loading the drbd module with disable_sendpage=1 or disabling
> > checksumming and generic segmentation offload via ethtool.
> 
> That would be expected; none of those has anything to do with this
> issue.
> 
> > Same happens with drbd83 and drbd84 packages from elrepo and with a
> > self-compiled drbd84 from linbit sources.
> 
> I don't think this is anything DRBD specific.
> 
> The oops happens in drbdadm, which is a normal user space tool, while
> it does some ioctl on what is apparently a socket.
> 
> Doing some ioctl on some socket file descriptor from userland should not be
> able to trigger an oops.
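
[To make that concrete: from userland, an ioctl on a socket file descriptor is as mundane as the hypothetical little test program below (an illustration, not a known reproducer). It ends up in sock_ioctl() in the kernel -- the very function in the oops further down -- and the worst it should ever produce is an errno:]

  /* Illustration only: ioctl() on a socket fd must never crash the kernel. */
  #include <stdio.h>
  #include <string.h>
  #include <errno.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <sys/ioctl.h>

  int main(void)
  {
          int pending = 0;
          int fd = socket(AF_INET, SOCK_STREAM, 0);

          if (fd < 0) {
                  perror("socket");
                  return 1;
          }

          /* FIONREAD on a socket: how many bytes are waiting to be read.
           * On failure we merely get an errno back. */
          if (ioctl(fd, FIONREAD, &pending) < 0)
                  fprintf(stderr, "ioctl: %s\n", strerror(errno));
          else
                  printf("bytes pending: %d\n", pending);

          close(fd);
          return 0;
  }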
> 
> Try to search for similar symptoms not involving DRBD.
> 
> 
> Some more comments:
> 
> > /etc/drbd.conf:
> >
> > global {
> >   dialog-refresh	1;
> >   minor-count		5;
> >   usage-count		no;
> > }
> >
> > common {
> > }
> >
> > resource r0 {
> >   protocol		C;
> >   disk {
> >     on-io-error		pass_on;
> 
> You actually want "detach" there.
> 
> >   }
> >
> >   syncer {
> >     rate		100M;
> >   }
> >
> >   net {
> >     allow-two-primaries yes;
> >     after-sb-0pri	discard-zero-changes;
> >     after-sb-1pri	discard-secondary;
> 
> Configuring automatic data loss.
> Hope this was a conscious decision.
> 
> >     after-sb-2pri	disconnect;
> >   }
> 
> 
> You need
> 	fencing resource-and-stonith;
> and appropriate fencing handlers (the "obliterate peer" one would probably
> be the right one).  Of course you need stonith configured and working in your
> cluster first.
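
[For reference, a minimal sketch of what that could look like in drbd.conf; the handler path is only an example and depends on where your packages install the DRBD scripts:]

  resource r0 {
    disk {
      on-io-error  detach;                  # as suggested above
      fencing      resource-and-stonith;    # freeze I/O, call the fence-peer handler
    }
    handlers {
      # shoots the peer through the cluster's stonith layer
      fence-peer   "/usr/lib/drbd/obliterate-peer.sh";
    }
    # ... rest of the resource as before
  }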
> 
> >   startup {
> >     wfc-timeout		10;
> >     become-primary-on	both;
> 
> This is a "fair weather setup".  It will fail (aka: behave in strange and
> unexpected ways) when things go wrong.
> 
> Getting a DRBD dual-primary cluster file system setup to work reliably in
> face of errors is a bit more complex.
> 
> And you really need fencing (stonith).
> 
> >   }
> >
> >   on proxy03 {
> >     device		/dev/drbd0;
> >     address		10.10.10.27:7788;
> >     meta-disk		internal;
> >     disk		/dev/sysvg/kvm;
> >   }
> >   on proxy04 {
> >     device		/dev/drbd0;
> >     address		10.10.10.28:7788;
> >     meta-disk		internal;
> >     disk		/dev/sysvg/kvm;
> >   }
> > }
> >
> > last messages from /var/log/messages:
> >
> > Aug 24 10:43:55 proxy03 kernel: d-con r0: Handshake successful: Agreed network protocol version 100
> > Aug 24 10:43:55 proxy03 kernel: d-con r0: conn( WFConnection -> WFReportParams )
> > Aug 24 10:43:55 proxy03 kernel: d-con r0: Starting asender thread (from drbd_r_r0 [19247])
> > Aug 24 10:43:55 proxy03 kernel: block drbd0: drbd_sync_handshake:
> > Aug 24 10:43:55 proxy03 kernel: block drbd0: self 52406041848E78A3:F32F8530A9B9C955:66C1B63DDC072892:66C0B63DDC072893 bits:0 flags:0
> > Aug 24 10:43:55 proxy03 kernel: block drbd0: peer F32F8530A9B9C954:0000000000000000:66C1B63DDC072893:66C0B63DDC072893 bits:0 flags:0
> > Aug 24 10:43:55 proxy03 kernel: block drbd0: uuid_compare()=1 by rule 70
> > Aug 24 10:43:55 proxy03 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
> > Aug 24 10:43:55 proxy03 kernel: block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 99.9%
> > Aug 24 10:43:55 proxy03 kernel: block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 99.9%
> > Aug 24 10:43:55 proxy03 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
> > Aug 24 10:43:55 proxy03 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > Aug 24 10:43:55 proxy03 kernel: IP: [<ffffffff813fda60>] sock_ioctl+0x30/0x280
> > Aug 24 10:43:55 proxy03 kernel: PGD 242b39067 PUD 2422a0067 PMD 0
> > Aug 24 10:43:55 proxy03 kernel: Oops: 0000 [#1] SMP
> >
> > Message from syslogd at proxy03 at Aug 24 10:43:55 ...
> >  kernel:Oops: 0000 [#1] SMP
> > Aug 24 10:43:55 proxy03 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
> >
> > Message from syslogd at proxy03 at Aug 24 10:43:55 ...
> >  kernel:last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
> > Aug 24 10:43:55 proxy03 kernel: CPU 3
> > Aug 24 10:43:55 proxy03 kernel: Modules linked in: sctp gfs2 dlm configfs drbd(U) libcrc32c sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding ipv6 dm_mirror dm_region_hash dm_log cdc_ether usbnet mii serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg shpchp ioatdma dca i7core_edac edac_core bnx2 ext3 jbd mbcache sd_mod crc_t10dif megaraid_sas ata_generic pata_acpi ata_piix dm_mod [last unloaded: microcode]
> > Aug 24 10:43:55 proxy03 kernel:
> > Aug 24 10:43:55 proxy03 kernel: Modules linked in: sctp gfs2 dlm configfs drbd(U) libcrc32c sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding ipv6 dm_mirror dm_region_hash dm_log cdc_ether usbnet mii serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg shpchp ioatdma dca i7core_edac edac_core bnx2 ext3 jbd mbcache sd_mod crc_t10dif megaraid_sas ata_generic pata_acpi ata_piix dm_mod [last unloaded: microcode]
> > Aug 24 10:43:55 proxy03 kernel: Pid: 20331, comm: drbdadm Not tainted 2.6.32-71.29.1.el6.x86_64 #1 System x3550 M3 -[7944KBG]-
> > Aug 24 10:43:55 proxy03 kernel: RIP: 0010:[<ffffffff813fda60>]  [<ffffffff813fda60>] sock_ioctl+0x30/0x280
> > Aug 24 10:43:55 proxy03 kernel: RSP: 0018:ffff880242949e38  EFLAGS: 00010282
> > Aug 24 10:43:55 proxy03 kernel: RAX: 0000000000000000 RBX: 0000000000005401 RCX: 00007fff34be3c40
> > Aug 24 10:43:55 proxy03 kernel: RDX: 00007fff34be3c40 RSI: 0000000000005401 RDI: ffff880242b0b840
> > Aug 24 10:43:55 proxy03 kernel: RBP: ffff880242949e58 R08: ffffffff81536380 R09: 000000316920e930
> > Aug 24 10:43:55 proxy03 kernel: R10: 00007fff34be3a50 R11: 0000000000000202 R12: 00007fff34be3c40
> > Aug 24 10:43:55 proxy03 kernel: R13: 00007fff34be3c40 R14: ffff880252493140 R15: 0000000000000000
> > Aug 24 10:43:55 proxy03 kernel: FS:  00007fe14fe14700(0000) GS:ffff88002f660000(0000) knlGS:0000000000000000
> > Aug 24 10:43:55 proxy03 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Aug 24 10:43:55 proxy03 kernel: CR2: 0000000000000038 CR3: 0000000242196000 CR4: 00000000000006e0
> > Aug 24 10:43:55 proxy03 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > Aug 24 10:43:55 proxy03 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Aug 24 10:43:55 proxy03 kernel: Process drbdadm (pid: 20331, threadinfo ffff880242948000, task ffff8802714c34e0)
> > Aug 24 10:43:55 proxy03 kernel: Stack:
> >
> > Message from syslogd at proxy03 at Aug 24 10:43:55 ...
> >  kernel:Stack:
> > Aug 24 10:43:55 proxy03 kernel: ffff880242b0b840 ffff880252493188 00007fff34be3c40 0000000000000000
> > Aug 24 10:43:55 proxy03 kernel: <0> ffff880242949e98 ffffffff8117fdf2 ffff880242949eb8 0000000000000001
> > Aug 24 10:43:55 proxy03 kernel: <0> 0000000000402340 0000003169ad9050 ffff8802429db080 ffff880242b0b840
> > Aug 24 10:43:55 proxy03 kernel: Call Trace:
> >
> > Message from syslogd at proxy03 at Aug 24 10:43:55 ...
> >  kernel:Call Trace:
> > Aug 24 10:43:55 proxy03 kernel: [<ffffffff8117fdf2>] vfs_ioctl+0x22/0xa0
> > Aug 24 10:43:55 proxy03 kernel: [<ffffffff8117ff94>] do_vfs_ioctl+0x84/0x580
> > Aug 24 10:43:55 proxy03 kernel: [<ffffffff8113676d>] ? handle_mm_fault+0x1ed/0x2b0
> > Aug 24 10:43:55 proxy03 kernel: [<ffffffff81180511>] sys_ioctl+0x81/0xa0
> > Aug 24 10:43:55 proxy03 kernel: [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
> > Aug 24 10:43:55 proxy03 kernel: Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff
> >
> > Message from syslogd at proxy03 at Aug 24 10:43:55 ...
> >  kernel:Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff
> > Aug 24 10:43:55 proxy03 kernel: RIP  [<ffffffff813fda60>] sock_ioctl+0x30/0x280
> > Aug 24 10:43:55 proxy03 kernel: RSP <ffff880242949e38>
> > Aug 24 10:43:55 proxy03 kernel: CR2: 0000000000000038
> >
> > Message from syslogd at proxy03 at Aug 24 10:43:55 ...
> >  kernel:CR2: 0000000000000038
> > Aug 24 10:43:55 proxy03 kernel: ---[ end trace 2a8c21ee3fd5b98d ]---
> > Aug 24 10:43:55 proxy03 kernel: Kernel panic - not syncing: Fatal exception
> >
> > Message from syslogd at proxy03 at Aug 24 10:43:55 ...
> >  kernel:Kernel panic - not syncing: Fatal exception
> > Aug 24 10:43:55 proxy03 kernel: Pid: 20331, comm: drbdadm Tainted: G   D    ----------------  2.6.32-71.29.1.el6.x86_64 #1
> > Aug 24 10:43:55 proxy03 kernel: Call Trace:
> > Aug 24 10:43:55 proxy03 kernel: [<ffffffff814c8b54>] panic+0x78/0x137
> > Aug 24 10:43:55 proxy03 kernel: [<ffffffff814ccc24>] oops_end+0xe4/0x100
> > Aug 24 10:43:55 proxy03 kernel: [<ffffffff8104656b>] no_context+0xfb/0x260
> > Aug 24 10:43:55 proxy03 kernel: [<ffffffff810467f5>] __bad_area_nosemaphore+0x125/0x1e0
> >
> > Any ideas? More information needed?
> >
> > Regards,
> >
> > 	Peter
> 
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> 
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
> 
> 


