[Drbd-dev] Re: [Ocfs2-devel] Fixing cluster/heartbeat.c's use of
bio_add_page()
Mark Fasheh
mark.fasheh at oracle.com
Tue Jan 9 23:28:04 CET 2007
Hi Philipp,
On Fri, Jan 05, 2007 at 03:06:52PM +0100, Philipp Reisner wrote:
> Hi There,
>
> As Mathieu Avila already pointed out on Thu, 07 Sep 2006 03:15:25 -0700,
> OCFS2 expects bio_add_page() to add pages to BIOs in an easily
> predictable manner.
>
> That is not true, especially for devices with their own merge_bvec_fn().
>
> Therefore OCFS2's heartbeat code is very likely to fail on such devices.
>
> The attached patch corrects this, and reduces the file cluster/heartbeat.c
> by 90 lines.
Thanks for taking this on - fixing up that code has been on my todo list for
a while.
> The whole patch builds around the change that the bio_put() call is
> moved to the bio's bi_end_io() function.
> This makes the whole idea of trying to predict the behaviour of
> bio_add_page() unnecessary, therefore we can remove compute_max_sectors()
> and o2hb_compute_request_limits().
>
> -Phil
>
> PS1: With that applied I can happily run OCFS2 on a DRBD-8.0 device which
> sits on top of software raid (both have interesting merge_bvec_fn()
> functions).
>
> PS2: I created the ocfs2-fix-bio_add_page.diff on 2.6.17 .
> And later refitted the patch to 2.6.20-rc2.
Ok, great. I only looked at the 2.6.20-rc2 patch as that's what I'm running
here.
> static int o2hb_read_slots(struct o2hb_region *reg,
> unsigned int max_slots)
> {
> - unsigned int num_bios, slots_per_bio, start_slot, num_slots;
> - int i, status;
> + unsigned int current_slot=0;
> + int status;
> struct o2hb_bio_wait_ctxt wc;
> - struct bio **bios;
> struct bio *bio;
>
> - o2hb_compute_request_limits(reg, max_slots, &num_bios, &slots_per_bio);
> -
> - bios = kcalloc(num_bios, sizeof(struct bio *), GFP_KERNEL);
> - if (!bios) {
> - status = -ENOMEM;
> - mlog_errno(status);
> - return status;
> - }
> -
> - o2hb_bio_wait_init(&wc, num_bios);
> -
> - num_slots = slots_per_bio;
> - for(i = 0; i < num_bios; i++) {
> - start_slot = i * slots_per_bio;
> + o2hb_bio_wait_init(&wc);
>
> - /* adjust num_slots at last bio */
> - if (max_slots < (start_slot + num_slots))
> - num_slots = max_slots - start_slot;
> -
> - bio = o2hb_setup_one_bio(reg, &wc, start_slot, num_slots);
> + while(current_slot < max_slots) {
> + bio = o2hb_setup_one_bio(reg, &wc, &current_slot, max_slots);
> if (IS_ERR(bio)) {
> - o2hb_bio_wait_dec(&wc, num_bios - i);
> -
> status = PTR_ERR(bio);
> mlog_errno(status);
> goto bail_and_wait;
> }
> - bios[i] = bio;
>
> + atomic_inc(&wc.wc_num_reqs);
> submit_bio(READ, bio);
> }
You changed o2hb_bio_wait_init() to initialize wc_num_reqs to zero, and
instead you increment the value before each submit_bio(). This makes sense
because we now don't know how many requests will ultimately be submitted.
However, a race now exists between the piecemeal increments of
wc_num_reqs and the completion code in o2hb_bio_wait_dec():
if (atomic_dec_and_test(&wc->wc_num_reqs)) {
BUG_ON(num > 0);
complete(&wc->wc_io_complete);
}
So, if enough bios complete before you're done submitting all your read
requests, you could trigger the wc_io_complete completion early.
This would certainly be hard to hit, but that's never stopped a race from
occurring before :)
So, NACK until we fix that race. Otherwise the patch seems to look fine. I
can test a future version here at Oracle.
Thanks,
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com