[DRBD-user] Xenserver 5.6 FP 1 DRBD crash

Lars Ellenberg lars.ellenberg at linbit.com
Mon Jan 10 22:28:25 CET 2011



On Mon, Jan 10, 2011 at 04:07:12PM +0100, Alex Kuehne wrote:
> Quoting Alex Kuehne <alex at movx.de>:
> 
> >Hi guys,
> >
> >This is another report of DRBD not working with Xenserver 5.6 FP1.
> >I tried version 8.3.9 and 8.3.8.1. With Xenserver 5.6 (without
> >FP1) at least version 8.3.8.1 is working.

For your drbdadm segfault with 8.3.10rc1,
please send me the config file it segfaults with (by PM or to the list).

For the kernel crash with FP1 kernel:

Did you notice this post?
| Date: Mon, 10 Jan 2011 11:38:27 -0500
| From: Roman <zherom at gmail.com>
| Subject: Re: [DRBD-user] Xenserver 5.6 FP 1 DRBD crash (Alex Kuehne)
| Message-ID: <AANLkTi=Au2uqe68nQ1vkDANW2ZzzNOViJu0+eOv-sG8x at mail.gmail.com>
| 
| This issue was reported to DRBD list and Citrix forums 2 weeks ago.
| There was no official follow-up from either of the vendors.
| It seems to be an issue with the new Xen kernel which was resolved
| internally at Citrix, but the fix is not available at present. See the thread below:
| http://forums.citrix.com/thread.jspa?threadID=279359&tstart=45
| 
| FP1 in general has plenty of issues, but that's beyond this topic.
| http://forums.citrix.com/forum.jspa?forumID=503&start=0
| Hope this saves you some time.

Other than that,
please try both 8.3.8.1 and 8.3.9-5
(not plain 8.3.9, which does spin_lock_irq/spin_unlock_irq in a place
where it should use the irqsave variants).
The 8.3.9-y branch is at
http://git.drbd.org/?p=drbd-8.3.git;a=shortlog;h=refs/heads/drbd-8.3.9-y


> >
> >The crash occurred when promoting one side to primary and calling
> >drbd-overview. The system immediately freezes and becomes
> >unresponsive.
> >
> >Another crash scenario is when I try to create an LVM storage on
> >the drbd device. The host where I type the commands crashes; the
> >peer stays available, i.e. it does not reboot or similar.
> >
> >Here is what Xenserver wrote to crash log before halting:
> >
> ><1>BUG: unable to handle kernel NULL pointer dereference at 00000004
> >        <1>IP: [<c01b9abc>] bio_free+0x2c/0x50
> >        <4>*pdpt = 000000020c6ac027 *pde = 0000000000000000
> >        <0>Oops: 0000 [#1] SMP
> >        <0>last sysfs file: /sys/block/drbd1/dev
> >        <4>Modules linked in: sha1_generic drbd cn lockd sunrpc
> >bonding bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
> >xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables
> >binfmt_misc nls_utf8 isofs dm_mirror video output sbs sbshc fan
> >battery ac parport_pc lp parport nvram sr_mod cdrom sg container
> >e1000e pata_acpi evdev bnx2 button thermal processor thermal_sys
> >ata_piix ata_generic serio_raw 8250_pnp 8250 serial_core rtc_cmos
> >rtc_core rtc_lib tpm_tis tpm i5k_amb hwmon tpm_bios libata
> >i2c_i801 i2c_core pcspkr dm_region_hash dm_log dm_mod ide_gd_mod
> >megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> >usbcore fbcon font tileblit bitblit softcursor [last unloaded:
> >microcode]
> >        <4>
> >        <4>Pid: 0, comm: swapper Not tainted
> >(2.6.32.12-0.7.1.xs5.6.100.307.170586xen #1) PRIMERGY RX200 S4
> >        <4>EIP: 0061:[<c01b9abc>] EFLAGS: 00010246 CPU: 1
> >        <4>EIP is at bio_free+0x2c/0x50
> >        <4>EAX: ee96410c EBX: ee9640c0 ECX: cec03000 EDX: ee96410c
> >        <4>ESI: 00000000 EDI: ee863d20 EBP: ee863cf0 ESP: ee863ce8
> >        <4> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> >        <0>Process swapper (pid: 0, ti=ee862000 task=ee83d3f0
> >task.ti=ee862000)
> >        <0>Stack:
> >        <4> 00000000 ee8d76a8 ee863cf8 c01b9aeb ee863d00 c01b8385
> >ee863d34 f0c8628b
> >        <4><0> 0000002f 00000001 00000000 c16cc380 cfbf075c
> >00000014 ee863d6c c0145279
> >        <4><0> f0c86200 ee8d76a8 00000000 ee863d40 c01b8290
> >ee9640c0 ee863d64 c020f12c
> >        <0>Call Trace:
> >        <4> [<c01b9aeb>] ? bio_fs_destructor+0xb/0x10
> >        <4> [<c01b8385>] ? bio_put+0x25/0x30
> >        <4> [<f0c8628b>] ? drbd_endio_pri+0x8b/0x130 [drbd]
> >        <4> [<c0145279>] ? sched_clock_local+0xc9/0x1a0
> >        <4> [<f0c86200>] ? drbd_endio_pri+0x0/0x130 [drbd]
> >        <4> [<c01b8290>] ? bio_endio+0x20/0x40
> >        <4> [<c020f12c>] ? req_bio_endio+0x5c/0xd0
> >        <4> [<c020f22e>] ? blk_update_request+0x8e/0x390
> >        <4> [<c020f546>] ? blk_update_bidi_request+0x16/0x60
> >        <4> [<c0210056>] ? blk_end_bidi_request+0x26/0x70
> >        <4> [<c02100b2>] ? blk_end_request+0x12/0x20
> >        <4> [<f037c13c>] ? scsi_io_completion+0x9c/0x480 [scsi_mod]
> >        <4> [<f037bcfc>] ? scsi_device_unbusy+0x8c/0xc0 [scsi_mod]
> >        <4> [<f037564d>] ? scsi_finish_command+0x9d/0x100 [scsi_mod]
> >        <4> [<f037919e>] ? scsi_decide_disposition+0x15e/0x170 [scsi_mod]
> >        <4> [<f037c61d>] ? scsi_softirq_done+0xfd/0x130 [scsi_mod]
> >        <4> [<c021576a>] ? trigger_softirq+0x8a/0xa0
> >        <4> [<c02157e8>] ? blk_done_softirq+0x68/0x80
> >        <4> [<c013117a>] ? __do_softirq+0xba/0x180
> >        <4> [<c01591a7>] ? handle_IRQ_event+0x37/0x100
> >        <4> [<c015c2c4>] ? move_native_irq+0x14/0x50
> >        <4> [<c01312b5>] ? do_softirq+0x75/0x80
> >        <4> [<c013159b>] ? irq_exit+0x2b/0x40
> >        <4> [<c02987e7>] ? evtchn_do_upcall+0x1e7/0x330
> >        <4> [<c012076f>] ? set_next_entity+0x1f/0x50
> >        <4> [<c01046ef>] ? hypervisor_callback+0x43/0x4b
> >        <4> [<c0106f35>] ? xen_safe_halt+0xb5/0x150
> >        <4> [<c010ac4e>] ? xen_idle+0x1e/0x50
> >        <4> [<c0102a7b>] ? cpu_idle+0x3b/0x60
> >        <4> [<c037afdd>] ? cpu_bringup_and_idle+0xd/0x10
> >        <0>Code: 89 e5 83 ec 08 89 1c 24 89 c3 89 74 24 04 89 d6
> >8b 50 38 85 d2 74 14 8d 40 4c 39 c2 74 0d 8b 4b 10 89 f0 c1 e9 1c
> >e8 a4 ff ff ff <2b> 5e 04 8b 56 08 89 d8 e8 97 99 fa ff 8b 1c 24
> >8b 74 24 04 89
> >        <0>EIP: [<c01b9abc>] bio_free+0x2c/0x50 SS:ESP 0069:ee863ce8
> >        <0>CR2: 0000000000000004
> >
> >I hope this is only a minor bug, as it used to work with 5.6; I
> >really intend to use DRBD with FP1. Any response is
> >appreciated, and if you need further info just let me know.
> >
> >Best regards
> >Alex Kuehne
> 
> Follow-up: I have now tried version 8.3.10rc1. While doing
> "service drbd start" on the Xenserver, the drbdadm command crashes
> with this error message:
> 
> Jan 10 15:57:50 xs1 kernel: drbd: initialized. Version: 8.3.10rc1
> (api:88/proto:86-96)
> Jan 10 15:57:50 xs1 kernel: drbd: GIT-hash:
> 1a1dfa9f736c091cf4a4b8f8042601f3bcd00c5e build by
> root at std11526-vm01, 2011-01-10 09:38:20
> Jan 10 15:57:50 xs1 kernel: drbd: registered as block device major 147
> Jan 10 15:57:50 xs1 kernel: drbd: minor_table @ 0xea4e30c0
> Jan 10 15:57:50 xs1 kernel: drbdadm[11720]: segfault at 0 ip
> 08052690 sp bffb4f60 error 4 in drbdadm[8048000+23000]
> 
> The drbd resource does not get initialized at all. I'm building
> everything on the Xenserver 5.6 FP1 DDK as an RPM package.
> 
> BR,
> Alex Kuehne
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


