Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
> -----Original Message-----
> From: Weilin Gong [mailto:wgong at alcatel-lucent.com]
> Sent: Wednesday, February 21, 2007 7:39 PM
> To: Ross S. W. Walker
> Cc: drbd-user at linbit.com
> Subject: Re: [DRBD-user] Performance with DRBD + iSCSI
>
> Ross S. W. Walker wrote:
> >> -----Original Message-----
> >> From: Weilin Gong [mailto:wgong at alcatel-lucent.com]
> >> Sent: Wednesday, February 21, 2007 5:47 PM
> >> To: Ross S. W. Walker
> >> Cc: drbd-user at linbit.com
> >> Subject: Re: [DRBD-user] Performance with DRBD + iSCSI
> >>
> >> Ross S. W. Walker wrote:
> >>>> -----Original Message-----
> >>>> From: drbd-user-bounces at lists.linbit.com
> >>>> [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Weilin Gong
> >>>> Sent: Wednesday, February 21, 2007 1:11 PM
> >>>> Cc: drbd-user at lists.linbit.com
> >>>> Subject: Re: [DRBD-user] Performance with DRBD + iSCSI
> >>>>
> >>>> Ross S. W. Walker wrote:
> >>>>> You can only write into drbd using what your application can
> >>>>> handle, and for VFS file operations that is 4k I/O!
> >>>>
> >>>> On Solaris ufs, the "maxcontig" parameter can be tuned to specify
> >>>> the number of contiguous blocks written to the disk. Haven't found
> >>>> the equivalent on Linux yet.
> >>>
> >>> Well, if you write your own app you can bypass the VFS page-memory
> >>> I/O restriction by using the generic block layer.
> >>>
> >>> I'm not sure if you quite understand the maxcontig parameter either:
> >>>
> >>>     maxcontig=n    The maximum number of logical blocks, belonging
> >>>                    to one file, that are allocated contiguously.
> >>>                    The default is calculated as follows...
> >>>
> >>> This parameter is for tuning disk space allocation in order to
> >>> reduce fragmentation; it doesn't affect the I/O block size.
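The "bypass the VFS page-memory I/O restriction" point above corresponds to direct I/O on Linux: opening with O_DIRECT skips the page cache, so requests reach the block layer at the size the application submits (subject to alignment constraints). A minimal sketch using dd, assuming GNU coreutils; the file path and sizes are illustrative, not from the thread:

```shell
f=./odirect_demo.bin   # hypothetical scratch file

# Direct write: O_DIRECT bypasses the page cache, so each 64 KiB
# request is submitted to the block layer as-is (alignment permitting).
# Some filesystems (e.g. tmpfs) reject O_DIRECT, hence the fallback.
dd if=/dev/zero of="$f" bs=64k count=16 oflag=direct 2>/dev/null \
  || echo "O_DIRECT not supported on this filesystem"

# Buffered write for comparison: the VFS/page cache absorbs it and
# flushes later in page-sized (4 KiB) granularity.
dd if=/dev/zero of="$f" bs=64k count=16 conv=fsync 2>/dev/null

stat -c '%s bytes written' "$f"   # prints: 1048576 bytes written
rm -f "$f"
```

Whether the direct path actually helps depends on the workload; it trades the cache's write coalescing for control over request size.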
> >> Actually, this defines the max I/O data size the file system sends
> >> down to the driver. We have a home-grown vdisk driver, similar to
> >> drbd; the "maxcontig" had to be tuned to match the size of the
> >> buffer allocated for the network transport.
> >
> > Ok, so I found this:
> >
> > ----------------
> > Tune maxcontig
> >
> > Under the Solaris OE, UFS uses an extent-like feature called
> > clustering. It is impossible to have a default setting for maxcontig
> > that is optimal for all file systems. It is too application
> > dependent. Many small files accessed in a random pattern do not need
> > extents, and performance can suffer for both reads and writes when
> > using extents. Larger files can benefit from read-ahead on reads and
> > improved allocation units when using extents in writes.
> >
> > For reads, the extent-like feature is really just a read-ahead. To
> > simply and dynamically tune the read-ahead algorithm, use the
> > tunefs(1M) command as follows:
> >
> > # tunefs -a 4 /ufs1
> >
> > The value changed is maxcontig, which sets the number of file system
> > blocks read in read-ahead. The preceding example changes the maximum
> > contiguous block count from 32 (the default) to 4.
> >
> > When a process reads more than one file system block, the kernel
> > schedules reads to fill the rest of maxcontig * file system
> > blocksize bytes. A single 8-kilobyte, or smaller, random read on a
> > file does not trigger read-ahead. Read-ahead does not occur on files
> > being read with mmap(2).
> >
> > The kernel attempts to automatically detect whether an application
> > is doing small random or large sequential I/O. This often works
> > fine, but the definition of small or large depends more on system
> > application criteria than on device characteristics. Tune maxcontig
> > to obtain optimal performance.
> > ----------------
> >
> > So this is like setting read-ahead with: blockdev --setra XX /dev/XX
>
> "maxcontig" is also used for writes:
>
> In UFS, the filesystem cluster size, for both reads and writes, is
> set to the value of /maxcontig/. The filesystem cluster size is used
> to determine:
>
>   * The maximum number of logical blocks contiguously laid out on
>     disk for a UFS filesystem before inserting a rotational delay.
>   * When, and how much, to read ahead and/or write behind if the
>     sequential I/O case is found. The algorithm that determines
>     sequential read-ahead in UFS is broken, so system administrators
>     use the /maxcontig/ value to tune their filesystems to achieve
>     better random I/O performance.
>   * How many pages to attempt to push out to disk at a time. It also
>     determines the frequency of pushing pages, because in UFS pages
>     are clustered for writes, based on the filesystem cluster size.

Interesting, so many seemingly unrelated performance metrics are tuned
with a single parameter; it must be hard to get it just right.

In Linux, each device driver has its own max_sectors and
max_phys_segments, which in turn determine how large a request that
driver can process, and the I/O scheduler makes sure, when merging
contiguous requests on the queue, not to build them past that point.
max_sectors is the maximum number of disk sectors per request, while
max_phys_segments is the maximum number of disperse memory locations
that the disk drive can DMA to in a request. Some drivers have a
tunable max_sectors, but max_phys_segments is a limitation of the HBA.

Read-ahead is tunable as above; write-back is handled transparently by
the page cache, but can be tuned to a degree using VFS API features.
Size is the main concern: if the write-back gets too big, there is a
substantial performance penalty to other I/O ops in flight when it
finally flushes.
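The per-driver limits described above are exposed per device under sysfs on current kernels, so they can be inspected without writing any code. A sketch, with the caveats that device names are illustrative and that the max_phys_segments limit surfaces under the attribute name max_segments in later kernels:

```shell
# List per-device request-size and read-ahead limits from sysfs.
for q in /sys/block/*/queue; do
  [ -d "$q" ] || continue   # no block devices visible (e.g. some containers)
  dev=${q#/sys/block/}; dev=${dev%/queue}
  printf '%s: max_sectors_kb=%s max_segments=%s read_ahead_kb=%s\n' \
    "$dev" \
    "$(cat "$q/max_sectors_kb" 2>/dev/null || echo '?')" \
    "$(cat "$q/max_segments"   2>/dev/null || echo '?')" \
    "$(cat "$q/read_ahead_kb"  2>/dev/null || echo '?')"
done

# Raising the per-request ceiling requires root and cannot exceed
# max_hw_sectors_kb, the driver/HBA hard limit:
#   echo 512 > /sys/block/sda/queue/max_sectors_kb
# Read-ahead can equivalently be set in 512-byte sectors with blockdev:
#   blockdev --setra 256 /dev/sda
```

This is the rough Linux analogue of tuning maxcontig on Solaris, except that allocation, request size, and read-ahead are separate knobs rather than one.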
-Ross