Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 01.03.2013 15:47, Jason Thomas wrote:
> Correct, replicated storage.
>
> We are using Infiniband for RDMA.
>
> Combining efforts sounds good, please let me know what I can do.

Perfect, so here is what you can do:

1. Build up an IB/SRP test setup (3 or 4 machines: 2 storages, 1 initiator, [+ 1 for VM live migration]).
2. Set up the storages for SCST/SRP (and DRBD if you want to compare).
3. Set up the initiator(s) for SRP and MD RAID-1.
4. Use at least kernel 3.4, since from that version on MD uses "blk_set_stacking_limits()" (which enables big 512 KiB IOs) and bio merging (also required for big IOs).

Are you familiar with "blktrace"? It is the best tool for detecting latency and block size issues (like the ones found in the configuration primary/secondary DRBD with iSCSI/iSER/SRP to the primary).

These are the things to be tested:

1. Connect to both storages with SRP and create an MD RAID-1 device on top of them with a 1.2 superblock and a write-intent bitmap.
2. Produce massive IO and test performance on both SRP devices.
   - Test this with and without the "nv_cache=1" option of SCST vdisk_fileio.
   - You should get great results when blktracing and benchmarking it.
3. Pull out one of the IB links (or use continuous "ibportstate reset" to simulate that).
   - How long does it take until IO is failed and the second path is used for reading? (It should be 2..3 minutes.)
4. Apply Bart's SRP HA patches, configure the timeouts so that IO is failed earlier, and retest step 3.

Unfortunately, the backport of Bart's srp-ha patches to kernels < 3.6 doesn't really work yet due to missing SCSI infrastructure, so this also needs to be tested against more recent kernels. This can also be tested with two paths and dm-multipath. Which kernel version are you running in production?

Report the results to me, and also to Bart/linux-rdma where appropriate. We should leave the drbd-user mailing list out of this discussion at this point, as we are moving off-topic.

Cheers,
Sebastian
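
As a rough illustration of the first half of test step 1, logging the initiator in to both storages is usually done with srp_daemon. This is only a sketch; the HCA name (mlx4_0) and port number are placeholders for whatever your initiator actually has:

    # Discover SRP targets reachable through port 1 of mlx4_0 and log in to
    # them, then exit (device name and port are placeholders).
    srp_daemon -e -o -i mlx4_0 -p 1

    # The SRP LUNs should then show up as ordinary SCSI disks:
    lsscsi            # or: cat /proc/scsi/scsi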
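For the MD RAID-1 part of test step 1, the array creation could look roughly like this, assuming the two SRP LUNs appear as /dev/sdb and /dev/sdc (placeholder names) and /dev/md0 is free:

    # 1.2 superblock and internal write-intent bitmap, as described above.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          --metadata=1.2 --bitmap=internal /dev/sdb /dev/sdc

    # Check that the bitmap is present and watch the initial resync:
    mdadm --detail /dev/md0
    cat /proc/mdstat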
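For the "massive IO" and blktrace part of test step 2, a minimal sketch, assuming fio and blktrace are installed; the block size matches the 512 KiB requests mentioned above, all other parameters are arbitrary examples:

    # Large sequential writes with direct IO so the big requests reach MD.
    # WARNING: this writes over /dev/md0 and destroys any data on it.
    fio --name=seqwrite --filename=/dev/md0 --rw=write --bs=512k \
        --direct=1 --ioengine=libaio --iodepth=32 --runtime=60 --time_based

    # Trace one of the SRP member disks to look at request sizes/latencies:
    blktrace -d /dev/sdb -o - | blkparse -i -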
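On the storage side, the nv_cache comparison from test step 2 could be set up through SCST's sysfs interface. The device name and backing path below are placeholders, and the exact mgmt syntax may differ between SCST releases:

    # Register a vdisk_fileio device with nv_cache enabled (names are examples).
    echo "add_device disk01 filename=/dev/vg0/testlv; nv_cache=1" \
        > /sys/kernel/scst_tgt/handlers/vdisk_fileio/mgmt

    # For the comparison run, remove it and re-add it without nv_cache=1:
    echo "del_device disk01" \
        > /sys/kernel/scst_tgt/handlers/vdisk_fileio/mgmt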
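To simulate the link failure of test step 3 without physically pulling a cable, something along these lines could work; the LID (5) and port number (1) are placeholders for the port you want to disturb:

    # Repeatedly reset the port and observe how long IO on /dev/md0 hangs
    # before MD fails the affected leg and reads from the second path.
    while true; do
        ibportstate 5 1 reset
        sleep 10
    done

    # Meanwhile, on the initiator:
    watch -n1 'cat /proc/mdstat'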