[DRBD-user] drbd pacemaker scst/srp 2 node active/passive question

Jason Thomas jthomas at medata.com
Fri Mar 1 15:47:29 CET 2013



----- Original Message -----
From: "Sebastian Riemer" <sebastian.riemer at profitbricks.com>
To: "Jason Thomas" <jthomas at medata.com>
Cc: drbd-user at lists.linbit.com
Sent: Friday, March 1, 2013 2:39:44 AM
Subject: Re: [DRBD-user] drbd pacemaker scst/srp 2 node active/passive question

On 01.03.2013 03:49, Jason Thomas wrote:
> I have a 2-node DRBD-backed SCST/SRP single-target (ib_srpt) setup working great using pacemaker/corosync. I am using this as the data store for a mail server. Where I am running into an issue is that the initiators are running on VMware ESXi 4.1 hosts: when a failover occurs on the target, the VM host initiators go dead and you have to rescan to pick up the target via the new path, causing the VM guest to go down until the new path is discovered.

Wait a minute, so you just need replicated HA storage with RDMA?

Correct, replicated storage.

SRP was the right decision as iSER and IPoIB are too complex and too
unstable.
Do you use InfiniBand, iWARP or RoCE for RDMA?

We are using InfiniBand for RDMA.

A primary/secondary setup introduces lots of latency as you have CHAINED
network paths. So there is no RDMA advantage anymore for writes. The
primary does "store and forward" for the secondary.
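
To make the chained vs. parallel difference concrete, here is a rough toy model in Python. It is only a sketch: it assumes fully synchronous replication (the write is acknowledged only after both copies are on stable storage), and the hop and disk times are made-up placeholder numbers, not measurements from any real setup.

```python
# Toy write-latency model. All numbers are hypothetical, in microseconds.

def chained_write_latency(hop_initiator_primary, hop_primary_secondary, disk):
    """Primary/secondary: the write first crosses the hop to the primary,
    which then commits locally while forwarding to the secondary.
    The ack goes back once both copies are done (assumed synchronous)."""
    local = disk
    remote = hop_primary_secondary + disk
    return hop_initiator_primary + max(local, remote)

def parallel_write_latency(hops, disk):
    """Initiator-side mirroring: the initiator writes to every replica
    at the same time and waits for the slowest one."""
    return max(hop + disk for hop in hops)

HOP_US = 10   # assumed round trip per network hop
DISK_US = 50  # assumed commit time of the backing store

chained = chained_write_latency(HOP_US, HOP_US, DISK_US)
parallel = parallel_write_latency([HOP_US, HOP_US], DISK_US)
print(f"chained:  {chained} us")
print(f"parallel: {parallel} us")
```

With these placeholder values the chained path pays for both hops in sequence, while the parallel path pays only for the slower of the two.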

This tells you exactly that DRBD isn't the best solution for you. This
is why we've hacked MD RAID-1 for high performance replication on the
initiator side (PARALLEL paths + simplicity + stability). We had to hack
it for VM live migration, read-only volumes, raw-to-md migration, etc.
It became a really cool solution but unfortunately it isn't really
possible to merge that to the mainline as replication is a completely
different use case for MD.

The write-intent bitmap of MD is really really sophisticated compared to
the DRBD metadata stuff. RAID-1 also has sophisticated read-balancing.
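
For readers who have not met a write-intent bitmap before, the basic idea can be sketched in a few lines of Python. This is a simplified illustration and not the MD implementation: the class and method names are invented here, and the real bitmap also clears bits lazily once all mirrors have completed a write. It only shows why a returning replica needs to resync the chunks that were written while it was away, rather than the whole device.

```python
class WriteIntentBitmap:
    """Toy dirty-chunk tracker (hypothetical names, not the MD code)."""

    def __init__(self, device_size, chunk_size):
        self.chunk_size = chunk_size
        n_chunks = -(-device_size // chunk_size)  # ceiling division
        self.dirty = [False] * n_chunks

    def mark_write(self, offset, length):
        # Mark every chunk touched by the byte range [offset, offset + length).
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        for chunk in range(first, last + 1):
            self.dirty[chunk] = True

    def chunks_to_resync(self):
        # Only these chunks have to be copied to the returning replica.
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]

    def clear(self, chunk):
        # Called once the chunk is back in sync on all replicas.
        self.dirty[chunk] = False


# Writes that arrive while one replica is unreachable:
bitmap = WriteIntentBitmap(device_size=1024, chunk_size=64)
bitmap.mark_write(100, 10)
bitmap.mark_write(120, 20)
bitmap.mark_write(960, 64)
pending = bitmap.chunks_to_resync()
print(pending)
```

In this example three of the sixteen chunks are marked, so a resync would touch those three only.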

But there are further issues in the mainline SRP initiator. It doesn't
support multipathing yet. It takes 2 to 3 minutes until ib_srp fails the
I/O to the upper layers, and only then can a path/replica be switched over.

The ib_srpt maintainer Bart Van Assche is working on fixing that. He has
posted his srp-ha patches to the "linux-rdma" mailing list, so take this
issue there. Bart will help you for sure. We've already adapted his
patches and implemented our own SRP reconnect in addition.

So the bad news is: you need a Linux kernel developer with RDMA and
storage skills for that.
Perhaps we can combine some efforts.

Cheers,
Sebastian

-- 
Sebastian Riemer
Linux Kernel Developer - InfiniBand and Storage

We are looking for (SENIOR) LINUX KERNEL DEVELOPERS!

ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Germany
www.profitbricks.com • sebastian.riemer at profitbricks.com

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Andreas Gauger, Achim Weiss


Combining efforts sounds good, please let me know what I can do.

-- 
Jason Thomas | AVP Technology | Medata, Inc. 


