[DRBD-user] FC5 oops?

Ben bench at silentmedia.com
Mon Mar 27 20:40:12 CEST 2006



Hey guys, I'm trying to run some Xen virtual machines on top of DRBD to get 
failover protection, and it's almost working great. Unfortunately, I 
occasionally get an oops on one of my domUs, and it looks like this:

Mar 27 01:39:36 johnny kernel: Unable to handle kernel paging request at ffff8800e53ba000 RIP:
Mar 27 01:39:36 johnny kernel: <ffffffff80179b5b>{__bio_clone+46}
Mar 27 01:39:36 johnny kernel: PGD 10d9067 PUD 16dd067 PMD 1807067 PTE 0
Mar 27 01:39:36 johnny kernel: Oops: 0000 [1] SMP
Mar 27 01:39:36 johnny kernel: CPU 0
Mar 27 01:39:36 johnny kernel: Modules linked in: xt_physdev drbd(U) ipv6 bridge w83627hf hwmon_vid hwmon eeprom i2c_isa ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink ipt_LOG xt_tcpudp iptable_filter ip_tables x_tables video button battery ac lp parport_pc parport nvram ohci1394 ieee1394 sg e100 mii i2c_nforce2 i2c_core forcedeth dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv libata aacraid sd_mod scsi_mod
Mar 27 01:39:36 johnny kernel: Pid: 5229, comm: xvd 8 93:02 Not tainted 2.6.15-1.2054_FC5xen0 #1
Mar 27 01:39:36 johnny kernel: RIP: e030:[<ffffffff80179b5b>] <ffffffff80179b5b>{__bio_clone+46}
Mar 27 01:39:36 johnny kernel: RSP: e02b:ffff8800ac24d948  EFLAGS: 00010216
Mar 27 01:39:36 johnny kernel: RAX: ffff8800e53b9f50 RBX: ffff8800e49b9d40 RCX: 0000000000000050
Mar 27 01:39:36 johnny kernel: RDX: ffff8800e53b9e80 RSI: ffff8800e53ba000 RDI: ffff8800a9785d30
Mar 27 01:39:36 johnny kernel: RBP: ffff8800e702b338 R08: 0000000006ffb100 R09: ffff88000189c000
Mar 27 01:39:36 johnny kernel: R10: 0000000000001000 R11: 0000000000000001 R12: 0000000000000023
Mar 27 01:39:36 johnny kernel: R13: ffff8800e53b9e80 R14: ffff8800e4766150 R15: 0000000000000008
Mar 27 01:39:36 johnny kernel: FS:  00002abf280251c0(0000) GS:ffffffff80499000(0000) knlGS:0000000000000000
Mar 27 01:39:36 johnny kernel: CS:  e033 DS: 0000 ES: 0000
Mar 27 01:39:36 johnny kernel: Process xvd 8 93:02 (pid: 5229, threadinfo ffff8800ac24c000, task ffff8800a6ff6040)
Mar 27 01:39:36 johnny kernel: Stack: ffff8800e53b9e80 ffff8800e49b9d40 ffff8800e53b9e80 ffffffff80179bed
Mar 27 01:39:36 johnny kernel:        ffff8800e70250d0 0000000000000023 ffff8800e70250d0 ffffffff88209471
Mar 27 01:39:36 johnny kernel:        0000000000047ffd 00000001f1ba2a08
Mar 27 01:39:36 johnny kernel: Call Trace: <ffffffff80179bed>{bio_clone+53} <ffffffff88209471>{:drbd:drbd_make_request_26+1046}
Mar 27 01:39:36 johnny kernel:        <ffffffff80155bbf>{mempool_alloc+66} <ffffffff8032835c>{_spin_unlock_irqrestore+9}
Mar 27 01:39:36 johnny kernel:        <ffffffff88086544>{:dm_mod:dm_request+345} <ffffffff8820924a>{:drbd:drbd_make_request_26+495}
Mar 27 01:39:36 johnny kernel: <ffffffff801e9225>{generic_make_request+365} <ffffffff801ea61a>{submit_bio+186}
Mar 27 01:39:36 johnny kernel: <ffffffff80266851>{dispatch_rw_block_io+994} <ffffffff80266c6a>{blkif_schedule+944}
Mar 27 01:39:36 johnny kernel:        <ffffffff80124780>{__wake_up_common+62} <ffffffff80141339>{autoremove_wake_function+0}
Mar 27 01:39:36 johnny kernel: <ffffffff80140f17>{keventd_create_kthread+0} <ffffffff802668ba>{blkif_schedule+0}
Mar 27 01:39:37 johnny kernel: <ffffffff80140f17>{keventd_create_kthread+0} <ffffffff80141200>{kthread+212}
Mar 27 01:39:37 johnny kernel:        <ffffffff8010b856>{child_rip+8} <ffffffff80140f17>{keventd_create_kthread+0}
Mar 27 01:39:37 johnny kernel:        <ffffffff8014112c>{kthread+0} <ffffffff8010b84e>{child_rip+0}
Mar 27 01:39:37 johnny kernel:
Mar 27 01:39:37 johnny kernel: Code: f3 a4 48 8b 02 48 89 03 48 8b 42 10 48 89 43 10 48 83 4b 18
Mar 27 01:39:37 johnny kernel: RIP <ffffffff80179b5b>{__bio_clone+46} RSP <ffff8800ac24d948>
Mar 27 01:39:37 johnny kernel: CR2: ffff8800e53ba000
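To see which modules the failing path runs through, the `<address>{symbol+offset}` entries in the trace above can be pulled out mechanically. This is a minimal Python sketch, assuming the 2.6.15 x86_64 trace format seen here: module symbols are printed as `:module:symbol`, core-kernel symbols have no module prefix, and offsets are taken to be decimal. `parse_frames` and `Frame` are hypothetical helper names, not part of any kernel tool.

```python
import re
from typing import NamedTuple, Optional

# Matches entries such as "<ffffffff88209471>{:drbd:drbd_make_request_26+1046}"
# and "<ffffffff80179bed>{bio_clone+53}" (assumed 2.6.15 x86_64 format).
FRAME_RE = re.compile(
    r"<(?P<addr>[0-9a-f]{16})>"
    r"\{(?::(?P<module>\w+):)?(?P<func>\w+)\+(?P<offset>\d+)\}"
)


class Frame(NamedTuple):
    addr: int
    module: Optional[str]  # None for symbols in the core kernel image
    func: str
    offset: int            # offset into the symbol, assumed decimal


def parse_frames(log_text: str) -> list[Frame]:
    """Return every <address>{[:module:]symbol+offset} entry, in log order."""
    return [
        Frame(
            addr=int(m["addr"], 16),
            module=m["module"],
            func=m["func"],
            offset=int(m["offset"]),
        )
        for m in FRAME_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "Call Trace: <ffffffff80179bed>{bio_clone+53} "
        "<ffffffff88209471>{:drbd:drbd_make_request_26+1046}\n"
        "       <ffffffff80155bbf>{mempool_alloc+66}"
    )
    for f in parse_frames(sample):
        print(f"{f.module or 'kernel':>7}  {f.func}+{f.offset}")
```

Fed the whole log, filtering on `module == "drbd"` picks out the two `drbd_make_request_26` frames (+1046 and +495). The first is the return address just after the call into `bio_clone`.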


It appears the domU is trying to write to the DRBD resource, and DRBD has issues
with that. The domU then becomes unresponsive, and Xen itself begins to degrade
ungracefully from there.
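The register dump narrows down where `__bio_clone` faulted. In this kernel's format, the `Code:` bytes start at RIP, and `f3 a4` is `rep movsb`, which copies RCX bytes from [RSI] to [RDI]. RSI equals CR2 and is page-aligned, and the page-table walk ends in `PTE 0`, so the copy ran off the end of a mapped page. The arithmetic sketch below rests on two assumptions: that this copy is the `memcpy` of the source bio's `bio_vec` array at the start of `__bio_clone` in 2.6.15, and that `struct bio_vec` is 16 bytes on x86_64. The register values are copied from the oops above.

```python
PAGE_SIZE = 4096      # x86_64 base page size
BIO_VEC_SIZE = 16     # assumed sizeof(struct bio_vec) on x86_64:
                      # struct page * (8) + bv_len (4) + bv_offset (4)

cr2 = 0xffff8800e53ba000  # faulting address (CR2)
rsi = 0xffff8800e53ba000  # source pointer of "rep movsb" at the fault
rcx = 0x50                # bytes "rep movsb" still had to copy


def page_offset(addr: int) -> int:
    """Offset of an address within its 4 KiB page."""
    return addr % PAGE_SIZE


# The read faulted exactly on the first byte of a new page...
assert cr2 == rsi and page_offset(cr2) == 0

# ...with 80 bytes left to copy, i.e. five bio_vec entries under the
# size assumption above.
remaining_vecs, leftover = divmod(rcx, BIO_VEC_SIZE)
print(f"{rcx} bytes left = {remaining_vecs} bio_vec entries (+{leftover} bytes)")
```

Under those assumptions, the source bio handed to `bio_clone` described a larger `bio_vec` array than was actually mapped. That is consistent with a stale or corrupted bio, but the dump alone cannot confirm which.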

I'm running 3 other domUs on top of DRBD, and all of them are working
flawlessly. So maybe it's an issue with this particular resource? It seems to
happen somewhat randomly, but it is more likely under heavier I/O load.

I'm using DRBD 0.7.17 on 2.6.15-1.2054_FC5xenU.


