FWIW, this problem, which was very repeatable on 2.6.15-1.2054_FC5xen0,
no longer seems to occur on 2.6.16-1.2080_FC5xen0.
On Mar 27, 2006, at 10:40 AM, Ben wrote:
> Hey guys, I'm trying to run some Xen virtual machines on top of
> DRBD to get failover protection, and it's almost working great.
> Unfortunately, I occasionally get an oops on one of my domUs, and
> it looks like this:
>
> Mar 27 01:39:36 johnny kernel: Unable to handle kernel paging request at ffff8800e53ba000 RIP:
> Mar 27 01:39:36 johnny kernel: <ffffffff80179b5b>{__bio_clone+46}
> Mar 27 01:39:36 johnny kernel: PGD 10d9067 PUD 16dd067 PMD 1807067 PTE 0
> Mar 27 01:39:36 johnny kernel: Oops: 0000 [1] SMP
> Mar 27 01:39:36 johnny kernel: CPU 0
> Mar 27 01:39:36 johnny kernel: Modules linked in: xt_physdev drbd(U) ipv6 bridge w83627hf hwmon_vid hwmon eeprom i2c_isa ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink ipt_LOG xt_tcpudp iptable_filter ip_tables x_tables video button battery ac lp parport_pc parport nvram ohci1394 ieee1394 sg e100 mii i2c_nforce2 i2c_core forcedeth dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv libata aacraid sd_mod scsi_mod
> Mar 27 01:39:36 johnny kernel: Pid: 5229, comm: xvd 8 93:02 Not tainted 2.6.15-1.2054_FC5xen0 #1
> Mar 27 01:39:36 johnny kernel: RIP: e030:[<ffffffff80179b5b>] <ffffffff80179b5b>{__bio_clone+46}
> Mar 27 01:39:36 johnny kernel: RSP: e02b:ffff8800ac24d948 EFLAGS: 00010216
> Mar 27 01:39:36 johnny kernel: RAX: ffff8800e53b9f50 RBX: ffff8800e49b9d40 RCX: 0000000000000050
> Mar 27 01:39:36 johnny kernel: RDX: ffff8800e53b9e80 RSI: ffff8800e53ba000 RDI: ffff8800a9785d30
> Mar 27 01:39:36 johnny kernel: RBP: ffff8800e702b338 R08: 0000000006ffb100 R09: ffff88000189c000
> Mar 27 01:39:36 johnny kernel: R10: 0000000000001000 R11: 0000000000000001 R12: 0000000000000023
> Mar 27 01:39:36 johnny kernel: R13: ffff8800e53b9e80 R14: ffff8800e4766150 R15: 0000000000000008
> Mar 27 01:39:36 johnny kernel: FS: 00002abf280251c0(0000) GS:ffffffff80499000(0000) knlGS:0000000000000000
> Mar 27 01:39:36 johnny kernel: CS: e033 DS: 0000 ES: 0000
> Mar 27 01:39:36 johnny kernel: Process xvd 8 93:02 (pid: 5229, threadinfo ffff8800ac24c000, task ffff8800a6ff6040)
> Mar 27 01:39:36 johnny kernel: Stack: ffff8800e53b9e80 ffff8800e49b9d40 ffff8800e53b9e80 ffffffff80179bed
> Mar 27 01:39:36 johnny kernel: ffff8800e70250d0 0000000000000023 ffff8800e70250d0 ffffffff88209471
> Mar 27 01:39:36 johnny kernel: 0000000000047ffd 00000001f1ba2a08
> Mar 27 01:39:36 johnny kernel: Call Trace: <ffffffff80179bed>{bio_clone+53} <ffffffff88209471>{:drbd:drbd_make_request_26+1046}
> Mar 27 01:39:36 johnny kernel: <ffffffff80155bbf>{mempool_alloc+66} <ffffffff8032835c>{_spin_unlock_irqrestore+9}
> Mar 27 01:39:36 johnny kernel: <ffffffff88086544>{:dm_mod:dm_request+345} <ffffffff8820924a>{:drbd:drbd_make_request_26+495}
> Mar 27 01:39:36 johnny kernel: <ffffffff801e9225>{generic_make_request+365} <ffffffff801ea61a>{submit_bio+186}
> Mar 27 01:39:36 johnny kernel: <ffffffff80266851>{dispatch_rw_block_io+994} <ffffffff80266c6a>{blkif_schedule+944}
> Mar 27 01:39:36 johnny kernel: <ffffffff80124780>{__wake_up_common+62} <ffffffff80141339>{autoremove_wake_function+0}
> Mar 27 01:39:36 johnny kernel: <ffffffff80140f17>{keventd_create_kthread+0} <ffffffff802668ba>{blkif_schedule+0}
> Mar 27 01:39:37 johnny kernel: <ffffffff80140f17>{keventd_create_kthread+0} <ffffffff80141200>{kthread+212}
> Mar 27 01:39:37 johnny kernel: <ffffffff8010b856>{child_rip+8} <ffffffff80140f17>{keventd_create_kthread+0}
> Mar 27 01:39:37 johnny kernel: <ffffffff8014112c>{kthread+0} <ffffffff8010b84e>{child_rip+0}
> Mar 27 01:39:37 johnny kernel:
> Mar 27 01:39:37 johnny kernel: Code: f3 a4 48 8b 02 48 89 03 48 8b 42 10 48 89 43 10 48 83 4b 18
> Mar 27 01:39:37 johnny kernel: RIP <ffffffff80179b5b>{__bio_clone+46} RSP <ffff8800ac24d948>
> Mar 27 01:39:37 johnny kernel: CR2: ffff8800e53ba000
>
>
> It appears the domU is trying to write to the DRBD resource, and
> DRBD chokes on that write (the oops is in __bio_clone, called from
> drbd_make_request_26). The domU then becomes unresponsive, and Xen
> itself begins to degrade ungracefully from there.
>
> I'm running 3 other domUs on top of DRBD, and all of them are
> working flawlessly, so maybe it's an issue with this particular
> resource? It seems to happen somewhat randomly, but is more likely
> under heavy I/O.
>
> I'm using DRBD 0.7.17 on 2.6.15-1.2054_FC5xenU.