Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Thu, Jun 4, 2015 at 2:06 PM, Mike Snitzer <snitzer at redhat.com> wrote:
> On Tue, Jun 02 2015 at 4:59pm -0400,
> Ming Lin <mlin at kernel.org> wrote:
>
>> On Sun, May 31, 2015 at 11:02 PM, Ming Lin <mlin at kernel.org> wrote:
>> > On Thu, 2015-05-28 at 01:36 +0100, Alasdair G Kergon wrote:
>> >> On Wed, May 27, 2015 at 04:42:44PM -0700, Ming Lin wrote:
>> >> > Here are fio results of XFS on a DM striped target with 2 SSDs + 1 HDD.
>> >> > Does it make sense?
>> >>
>> >> To stripe across devices with different characteristics?
>> >>
>> >> Some suggestions.
>> >>
>> >> Prepare 3 kernels.
>> >> O - Old kernel.
>> >> M - Old kernel with merge_bvec_fn disabled.
>> >> N - New kernel.
>> >>
>> >> You're trying to search for counter-examples to the hypothesis that
>> >> "Kernel N always outperforms Kernel O". Then, if you find any, you're trying
>> >> to show either that the performance impediment is small enough that
>> >> it doesn't matter, or that the cases are sufficiently rare or obscure
>> >> that they may be ignored because of the greater benefits of N in much more
>> >> common cases.
>> >>
>> >> (1) You're looking to set up configurations where kernel O performs noticeably
>> >> better than M. Then you're comparing the performance of O and N in those
>> >> situations.
>> >>
>> >> (2) You're looking at other sensible configurations where O and M have
>> >> similar performance, and comparing that with the performance of N.
>> >
>> > I didn't find case (1).
>> >
>> > But the important thing for this series is to simplify the block layer
>> > based on immutable biovecs. I don't expect a performance improvement.
>
> No, simplifying isn't the important thing. Any change to remove the
> merge_bvec callbacks needs to not introduce performance regressions on
> enterprise systems with large RAID arrays, etc.
>
> It is fine if there isn't a performance improvement, but I really don't
> think the limited testing you've done on a relatively small storage
> configuration has come even close to showing these changes don't
> introduce performance regressions.
>
>> > Here are the change statistics:
>> >
>> > "68 files changed, 336 insertions(+), 1331 deletions(-)"
>> >
>> > I ran the 3 test cases below to make sure it didn't bring any regressions.
>> > Test environment: 2 NVMe drives on a 2-socket server.
>> > Each case ran for 30 minutes.
>> >
>> > 1) btrfs raid0
>> >
>> > mkfs.btrfs -f -d raid0 /dev/nvme0n1 /dev/nvme1n1
>> > mount /dev/nvme0n1 /mnt
>> >
>> > Then run an 8K read:
>> >
>> > [global]
>> > ioengine=libaio
>> > iodepth=64
>> > direct=1
>> > runtime=1800
>> > time_based
>> > group_reporting
>> > numjobs=4
>> > rw=read
>> >
>> > [job1]
>> > bs=8K
>> > directory=/mnt
>> > size=1G
>> >
>> > 2) ext4 on MD raid5
>> >
>> > mdadm --create /dev/md0 --level=5 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
>> > mkfs.ext4 /dev/md0
>> > mount /dev/md0 /mnt
>> >
>> > fio script same as btrfs test
>> >
>> > 3) xfs on DM striped target
>> >
>> > pvcreate /dev/nvme0n1 /dev/nvme1n1
>> > vgcreate striped_vol_group /dev/nvme0n1 /dev/nvme1n1
>> > lvcreate -i2 -I4 -L250G -nstriped_logical_volume striped_vol_group
>> > mkfs.xfs -f /dev/striped_vol_group/striped_logical_volume
>> > mount /dev/striped_vol_group/striped_logical_volume /mnt
>> >
>> > fio script same as btrfs test
>> >
>> > ------
>> >
>> > Results:
>> >
>> >             4.1-rc4        4.1-rc4-patched
>> > btrfs       1818.6MB/s     1874.1MB/s
>> > ext4        717307KB/s     714030KB/s
>> > xfs         1396.6MB/s     1398.6MB/s
>>
>> Hi Alasdair & Mike,
>>
>> Would you like these numbers?
>> I'd like to address your concerns so we can move forward.
>
> I really don't see that these NVMe results prove much.
>
> We need to test on large HW RAID setups like a Netapp filer (or even
> local SAS drives connected via some SAS controller), like an 8+2 drive
> RAID6 or 8+1 RAID5 setup. Testing with MD raid on JBOD setups with 8
> devices is also useful. It is larger RAID setups that will be more
> sensitive to IO sizes being properly aligned on RAID stripe and/or chunk
> size boundaries.

I'll test it on a large HW RAID setup.

Here is a HW RAID5 setup with 19 278G HDDs on a Dell R730xd (2 sockets / 48 logical CPUs / 264G memory):
http://minggr.net/pub/20150604/hw_raid5.jpg

The stripe size is 64K.

I'm going to test ext4/btrfs/xfs on it, with "bs" set to 1216K (64K * 19 = 1216K) and 48 jobs:

[global]
ioengine=libaio
iodepth=64
direct=1
runtime=1800
time_based
group_reporting
numjobs=48
rw=read

[job1]
bs=1216K
directory=/mnt
size=1G

Or do you have other suggestions of what tests I should run?

Thanks.
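Note: a minimal sketch of the MD-RAID-on-JBOD comparison Mike suggests above
(an 8+1 RAID5 with a 64K chunk) is given below. The device names /dev/sdb
through /dev/sdj, the /dev/md0 array name, the XFS filesystem, and the /mnt
mount point are placeholders, not taken from the original mails; the fio
options simply mirror the job file quoted earlier in the thread.

# 8+1 MD RAID5 across nine JBOD drives, 64K chunk (device names are placeholders)
mdadm --create /dev/md0 --level=5 --chunk=64 --raid-devices=9 /dev/sd[b-j]
mkfs.xfs -f /dev/md0
mount /dev/md0 /mnt

# 8 data drives * 64K chunk = 512K full data stripe, so bs=512K keeps the
# large sequential reads aligned to RAID stripe boundaries.
fio --name=job1 --ioengine=libaio --iodepth=64 --direct=1 --runtime=1800 \
    --time_based --group_reporting --numjobs=48 --rw=read \
    --bs=512K --directory=/mnt --size=1G

The same job could then be repeated on the three kernels (O, M, N) that
Alasdair lists above to look for the regressions Mike is concerned about.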