[DRBD-user] drbd pacemaker scst/srp 2 node active/passive question

Fri Mar 1 17:30:05 CET 2013

On 01.03.2013 15:47, Jason Thomas wrote:
> Correct, replicated storage.
> 
> We are using Infiniband for RDMA.
> 
> Combining efforts sounds good, please let me know what I can do.

Perfect, so here is what you can do:
1. build an IB/SRP test setup
(3 or 4 machines: 2 storage nodes, 1 initiator [+ 1 for VM live migration])
2. set up the storage nodes for SCST/SRP (and DRBD if you want to compare)
3. set up the initiator(s) for SRP and MD RAID-1
4. use at least kernel 3.4, since from that version on MD uses
"blk_set_stacking_limits()" (which enables big 512 KiB IOs) and bio
merging (also required for big IOs).

Are you familiar with "blktrace"?

It is the best tool for detecting latency and block-size issues (such
as those found in the configuration of primary/secondary DRBD with
iSCSI/iSER/SRP to the primary).
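To make the blktrace output easier to digest, here is a minimal sketch that summarizes a trace. It assumes blkparse's default text format (device, CPU, sequence, timestamp, PID, action, RWBS, "sector + nr_sectors"), captured with something like `blktrace -d /dev/sdX -o - | blkparse -i - > trace.txt`, where /dev/sdX is a placeholder. The function names are hypothetical. The sketch counts request sizes at issue ("D") and pairs D/C events to get the driver latency.

```python
import re
from collections import Counter

# Matches blkparse's default line format, e.g.
#   8,16   1   1   0.000000000  4711  D   W 2048 + 1024 [fio]
# Lines without "sector + nr_sectors" (plugs, unplugs, ...) simply don't match.
LINE_RE = re.compile(
    r"^\s*(?P<dev>\d+,\d+)\s+(?P<cpu>\d+)\s+(?P<seq>\d+)\s+"
    r"(?P<time>\d+\.\d+)\s+(?P<pid>\d+)\s+(?P<action>[A-Z]+)\s+"
    r"(?P<rwbs>\S+)\s+(?P<sector>\d+)\s+\+\s+(?P<nsect>\d+)"
)
SECTOR_BYTES = 512  # blktrace reports sizes in 512-byte sectors


def request_size_histogram(lines, action="D"):
    """Count requests by size in KiB for one action (default: D = issued to driver)."""
    hist = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("action") == action:
            hist[int(m.group("nsect")) * SECTOR_BYTES // 1024] += 1
    return hist


def issue_to_complete_latency(lines):
    """Pair D (issue) and C (complete) events by (device, sector); return seconds."""
    issued = {}
    latencies = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        key = (m.group("dev"), int(m.group("sector")))
        t = float(m.group("time"))
        if m.group("action") == "D":
            issued[key] = t
        elif m.group("action") == "C" and key in issued:
            latencies.append(t - issued.pop(key))
    return latencies
```

If the histogram shows mostly 4 KiB requests instead of 512 KiB ones, the big-IO path is not being used.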

These are the things to test:
1. connect to both storage nodes with SRP and create an MD RAID-1 device
on top of them, with a 1.2 superblock and a write-intent bitmap
2. generate heavy IO and test performance on both SRP devices; test
this with and without the "nv_cache=1" option of SCST vdisk_fileio. You
should get great results when blktracing and benchmarking it.
3. pull out one of the IB links (or use continuous "ibportstate reset"
to simulate that). How long does it take until IO fails and the second
path is used for reading? (It should take 2 to 3 minutes.)
4. apply Bart's SRP HA patches, configure the timeouts to fail IO
earlier, and repeat test 3.
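Steps 1 and 3 can be sketched in code as well. This is a minimal sketch, not a tested procedure. It only builds the mdadm command line and does not run it. The device names (/dev/md0 for the array, /dev/sdb and /dev/sdc for the two SRP disks) are hypothetical placeholders. For step 3, it assumes you log the completion time of each successful read from a read loop against the MD device while pulling the link. Under that assumption, the longest gap between completions approximates the failover time.

```python
def mdadm_raid1_create_cmd(md_device, srp_dev_a, srp_dev_b):
    """Build (but do not execute) an mdadm argv for a RAID-1 over two SRP disks
    with a 1.2 superblock and an internal write-intent bitmap."""
    return [
        "mdadm", "--create", md_device,
        "--level=1",
        "--raid-devices=2",
        "--metadata=1.2",       # 1.2 superblock
        "--bitmap=internal",    # write-intent bitmap
        srp_dev_a, srp_dev_b,
    ]


def longest_io_stall(completion_times):
    """Largest gap in seconds between consecutive successful read completions;
    returns 0.0 if fewer than two completions were recorded."""
    ts = sorted(completion_times)
    return max((b - a for a, b in zip(ts, ts[1:])), default=0.0)
```

For example, `subprocess.run(mdadm_raid1_create_cmd("/dev/md0", "/dev/sdb", "/dev/sdc"), check=True)` would run the command, once you have substituted the real SRP device names.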

Unfortunately, the backport of Bart's srp-ha patches to kernels older
than 3.6 doesn't really work yet due to missing SCSI functionality. So
this also needs to be tested against more recent kernels. It can also be
tested with two paths and dm-multipath.
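The two kernel thresholds mentioned above can be checked quickly. This is a minimal sketch. It assumes only that the release string (e.g. from `uname -r` or Python's `platform.release()`) starts with "major.minor". The function and key names are hypothetical. The thresholds come from this thread: 3.4 for MD's big IOs, and 3.6 so that the srp-ha patches don't need the currently broken backport.

```python
import re


def parse_kernel_version(release):
    """Extract (major, minor) from a release string such as '3.2.0-4-amd64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def check_kernel(release):
    """Report which of the thresholds discussed in this thread a kernel meets."""
    ver = parse_kernel_version(release)
    return {
        # MD uses blk_set_stacking_limits() and bio merging -> 512 KiB IOs
        "md_big_ios": ver >= (3, 4),
        # srp-ha patches usable without the (not yet working) backport
        "srp_ha_without_backport": ver >= (3, 6),
    }
```

Running `check_kernel(platform.release())` on each machine shows at a glance which tests it can take part in.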

Which kernel version does your production system run?

Please report the results to me, and also to Bart/linux-rdma where
appropriate.

We should move this discussion off the drbd-user mailing list at this
point, as we are drifting off-topic.

Cheers,
Sebastian