Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 08/08/2015 01:40 AM, Ming Lin wrote: > > On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: >> I'm for solution 3: >> >> - keep blk_bio_{discard,write_same}_split, but ensure we never built >> a > 4GB bio in blkdev_issue_{discard,write_same}. > > This has problem as I mentioned in solution 1. > We need to also make sure max discard size is of proper granularity. > See below example. > > 4G: 8388608 sectors > UINT_MAX: 8388607 sectors > > dm-thinp block size = default discard granularity = 128 sectors > > blkdev_issue_discard(sector=0, nr_sectors=8388608) > > 1. Only ensure bi_size not overflow > > It doesn't work. > > [start_sector, end_sector] > [0, 8388607] > [0, 8388606], then dm-thinp splits it to 2 bios > [0, 8388479] > [8388480, 8388606] ---> this has problem in process_discard_bio(), > because the discard size(7 sectors) covers less than a block(128 sectors) > [8388607, 8388607] ---> same problem > > 2. Ensure bi_size not overflow and max discard size is of proper granularity > > It works. > > [start_sector, end_sector] > [0, 8388607] > [0, 8388479] > [8388480, 8388607] > > > So how about below patch? > > commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7 > Author: Ming Lin <ming.l at ssi.samsung.com> > Date: Fri Aug 7 15:07:07 2015 -0700 > > block: remove split code in blkdev_issue_{discard,write_same} > > The split code in blkdev_issue_{discard,write_same} can go away > now that any driver that cares does the split. We have to make > sure bio size doesn't overflow. > > For discard, we ensure max_discard_sectors is of the proper > granularity. So if discard size > 4G, blkdev_issue_discard() always > send multiple granularity requests to lower level, except that the > last one may be not multiple granularity. > > Signed-off-by: Ming Lin <ming.l at ssi.samsung.com> > --- > block/blk-lib.c | 37 +++++++++---------------------------- > 1 file changed, 9 insertions(+), 28 deletions(-) > > diff --git a/block/blk-lib.c b/block/blk-lib.c > index 7688ee3..e178a07 100644 > --- a/block/blk-lib.c > +++ b/block/blk-lib.c > @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, > struct request_queue *q = bdev_get_queue(bdev); > int type = REQ_WRITE | REQ_DISCARD; > unsigned int max_discard_sectors, granularity; > - int alignment; > struct bio_batch bb; > struct bio *bio; > int ret = 0; > @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, > > /* Zero-sector (unknown) and one-sector granularities are the same. */ > granularity = max(q->limits.discard_granularity >> 9, 1U); > - alignment = (bdev_discard_alignment(bdev) >> 9) % granularity; > > /* > - * Ensure that max_discard_sectors is of the proper > - * granularity, so that requests stay aligned after a split. > - */ > - max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9); > + * Ensure that max_discard_sectors doesn't overflow bi_size and is of > + * the proper granularity. So if discard size > 4G, blkdev_issue_discard() > + * always split and send multiple granularity requests to lower level, > + * except that the last one may be not multiple granularity. > + */ > + max_discard_sectors = UINT_MAX >> 9; > max_discard_sectors -= max_discard_sectors % granularity; > - if (unlikely(!max_discard_sectors)) { > - /* Avoid infinite loop below. Being cautious never hurts. */ > - return -EOPNOTSUPP; > - } > > if (flags & BLKDEV_DISCARD_SECURE) { > if (!blk_queue_secdiscard(q)) > @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, > blk_start_plug(&plug); > while (nr_sects) { > unsigned int req_sects; > - sector_t end_sect, tmp; > + sector_t end_sect; > > bio = bio_alloc(gfp_mask, 1); > if (!bio) { > @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, > } > > req_sects = min_t(sector_t, nr_sects, max_discard_sectors); > - > - /* > - * If splitting a request, and the next starting sector would be > - * misaligned, stop the discard at the previous aligned sector. > - */ > end_sect = sector + req_sects; > - tmp = end_sect; > - if (req_sects < nr_sects && > - sector_div(tmp, granularity) != alignment) { > - end_sect = end_sect - alignment; > - sector_div(end_sect, granularity); > - end_sect = end_sect * granularity + alignment; > - req_sects = end_sect - sector; > - } > > bio->bi_iter.bi_sector = sector; > bio->bi_end_io = bio_batch_end_io; > @@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, > if (!q) > return -ENXIO; > > - max_write_same_sectors = q->limits.max_write_same_sectors; > - > - if (max_write_same_sectors == 0) > - return -EOPNOTSUPP; > + /* Ensure that max_write_same_sectors doesn't overflow bi_size */ > + max_write_same_sectors = UINT_MAX >> 9; > > atomic_set(&bb.done, 1); > bb.flags = 1 << BIO_UPTODATE; > Wouldn't it be easier to move both max_write_same_sectors and max_discard sectors to 64 bit (ie to type sector_t) and be done with the overflow? Seems to me this is far too much coding around self-imposed restrictions... Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare at suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)