Hey guys, I'm trying to run some Xen virtual machines on top of DRBD to get failover protection, and it's almost working great. Unfortunately, I occasionally get an oops on one of my domUs, and it looks like this:

Mar 27 01:39:36 johnny kernel: Unable to handle kernel paging request at ffff8800e53ba000 RIP:
Mar 27 01:39:36 johnny kernel: <ffffffff80179b5b>{__bio_clone+46}
Mar 27 01:39:36 johnny kernel: PGD 10d9067 PUD 16dd067 PMD 1807067 PTE 0
Mar 27 01:39:36 johnny kernel: Oops: 0000 [1] SMP
Mar 27 01:39:36 johnny kernel: CPU 0
Mar 27 01:39:36 johnny kernel: Modules linked in: xt_physdev drbd(U) ipv6 bridge w83627hf hwmon_vid hwmon eeprom i2c_isa ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink ipt_LOG xt_tcpudp iptable_filter ip_tables x_tables video button battery ac lp parport_pc parport nvram ohci1394 ieee1394 sg e100 mii i2c_nforce2 i2c_core forcedeth dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv libata aacraid sd_mod scsi_mod
Mar 27 01:39:36 johnny kernel: Pid: 5229, comm: xvd 8 93:02 Not tainted 2.6.15-1.2054_FC5xen0 #1
Mar 27 01:39:36 johnny kernel: RIP: e030:[<ffffffff80179b5b>] <ffffffff80179b5b>{__bio_clone+46}
Mar 27 01:39:36 johnny kernel: RSP: e02b:ffff8800ac24d948 EFLAGS: 00010216
Mar 27 01:39:36 johnny kernel: RAX: ffff8800e53b9f50 RBX: ffff8800e49b9d40 RCX: 0000000000000050
Mar 27 01:39:36 johnny kernel: RDX: ffff8800e53b9e80 RSI: ffff8800e53ba000 RDI: ffff8800a9785d30
Mar 27 01:39:36 johnny kernel: RBP: ffff8800e702b338 R08: 0000000006ffb100 R09: ffff88000189c000
Mar 27 01:39:36 johnny kernel: R10: 0000000000001000 R11: 0000000000000001 R12: 0000000000000023
Mar 27 01:39:36 johnny kernel: R13: ffff8800e53b9e80 R14: ffff8800e4766150 R15: 0000000000000008
Mar 27 01:39:36 johnny kernel: FS: 00002abf280251c0(0000) GS:ffffffff80499000(0000) knlGS:0000000000000000
Mar 27 01:39:36 johnny kernel: CS: e033 DS: 0000 ES: 0000
Mar 27 01:39:36 johnny kernel: Process xvd 8 93:02 (pid: 5229, threadinfo ffff8800ac24c000, task ffff8800a6ff6040)
Mar 27 01:39:36 johnny kernel: Stack: ffff8800e53b9e80 ffff8800e49b9d40 ffff8800e53b9e80 ffffffff80179bed
Mar 27 01:39:36 johnny kernel: ffff8800e70250d0 0000000000000023 ffff8800e70250d0 ffffffff88209471
Mar 27 01:39:36 johnny kernel: 0000000000047ffd 00000001f1ba2a08
Mar 27 01:39:36 johnny kernel: Call Trace: <ffffffff80179bed>{bio_clone+53} <ffffffff88209471>{:drbd:drbd_make_request_26+1046}
Mar 27 01:39:36 johnny kernel: <ffffffff80155bbf>{mempool_alloc+66} <ffffffff8032835c>{_spin_unlock_irqrestore+9}
Mar 27 01:39:36 johnny kernel: <ffffffff88086544>{:dm_mod:dm_request+345} <ffffffff8820924a>{:drbd:drbd_make_request_26+495}
Mar 27 01:39:36 johnny kernel: <ffffffff801e9225>{generic_make_request+365} <ffffffff801ea61a>{submit_bio+186}
Mar 27 01:39:36 johnny kernel: <ffffffff80266851>{dispatch_rw_block_io+994} <ffffffff80266c6a>{blkif_schedule+944}
Mar 27 01:39:36 johnny kernel: <ffffffff80124780>{__wake_up_common+62} <ffffffff80141339>{autoremove_wake_function+0}
Mar 27 01:39:36 johnny kernel: <ffffffff80140f17>{keventd_create_kthread+0} <ffffffff802668ba>{blkif_schedule+0}
Mar 27 01:39:37 johnny kernel: <ffffffff80140f17>{keventd_create_kthread+0} <ffffffff80141200>{kthread+212}
Mar 27 01:39:37 johnny kernel: <ffffffff8010b856>{child_rip+8} <ffffffff80140f17>{keventd_create_kthread+0}
Mar 27 01:39:37 johnny kernel: <ffffffff8014112c>{kthread+0} <ffffffff8010b84e>{child_rip+0}
Mar 27 01:39:37 johnny kernel:
Mar 27 01:39:37 johnny kernel: Code: f3 a4 48 8b 02 48 89 03 48 8b 42 10 48 89 43 10 48 83 4b 18
Mar 27 01:39:37 johnny kernel: RIP <ffffffff80179b5b>{__bio_clone+46} RSP <ffff8800ac24d948>
Mar 27 01:39:37 johnny kernel: CR2: ffff8800e53ba000

It appears the domU is trying to write to the DRBD resource, and DRBD has issues with that. The domU then becomes unresponsive, and Xen itself begins to degrade ungracefully from there. I'm running three other domUs on top of DRBD, and all of them are working flawlessly.
So maybe it's an issue with this particular resource? It seems to happen somewhat randomly, but the chance of hitting it goes up under heavier I/O load. I'm using DRBD 0.7.17 on 2.6.15-1.2054_FC5xenU.
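Since it seems load-dependent, I've been using something like the sketch below inside the domU to generate sustained write load and see how reliably the oops reproduces. This is just an illustration, not a proper benchmark: TARGET is a placeholder path that should sit on the filesystem backed by the suspect DRBD resource, and the sizes are deliberately small here.

```shell
#!/bin/sh
# Rough I/O stress sketch: repeated fsync'd sequential writes to a file
# on the DRBD-backed filesystem, to test the "higher I/O -> higher oops
# chance" theory. TARGET is a placeholder -- point it at the real volume.
TARGET="${TARGET:-/tmp/drbd-stress.img}"

for pass in 1 2 3; do
    # 4 MiB per pass, flushed to disk so the writes actually hit the
    # block layer (and thus DRBD) instead of sitting in the page cache.
    dd if=/dev/zero of="$TARGET" bs=1M count=4 conv=fsync 2>/dev/null
    sync
done

rm -f "$TARGET"
echo "stress passes complete"
```

Watching the domU's console (or dom0's logs) while this loops should show whether the oops tracks the write load or is genuinely random.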