----- Original Message -----
From: "Sebastian Riemer" <sebastian.riemer at profitbricks.com>
To: "Jason Thomas" <jthomas at medata.com>
Cc: drbd-user at lists.linbit.com
Sent: Friday, March 1, 2013 2:39:44 AM
Subject: Re: [DRBD-user] drbd pacemaker scst/srp 2 node active/passive question

On 01.03.2013 03:49, Jason Thomas wrote:
> I have a 2 node DRBD backed SCST/SRP single target (ib_srpt) setup
> working great using pacemaker/corosync. I am using this for the data
> store for a mail server. Where I am running into an issue is that the
> initiators are running on VMware ESXi 4.1 hosts. When a failover
> occurs on the target, the VM host initiators go dead and you have to
> rescan to pick up the target via the new path, causing the VM guest
> to go down until the new path is discovered.

Wait a minute, so you just need replicated HA storage with RDMA?

Correct, replicated storage.

SRP was the right decision as iSER and IPoIB are too complex and too unstable. Do you use InfiniBand, iWARP or RoCE for RDMA?

We are using InfiniBand for RDMA.

A primary/secondary setup introduces lots of latency as you have CHAINED network paths. So there is no RDMA advantage anymore for writes. The primary does "store and forward" for the secondary.

This tells you exactly that DRBD isn't the best solution for you. This is why we've hacked MD RAID-1 for high performance replication on the initiator side (PARALLEL paths + simplicity + stability).

We had to hack it for VM live migration, read-only volumes, raw-to-md migration, etc. It became a really cool solution, but unfortunately it isn't really possible to merge that into the mainline, as replication is a completely different use case for MD. The write-intent bitmap of MD is really, really sophisticated compared to the DRBD metadata stuff. RAID-1 also has sophisticated read-balancing.

But there are further issues in the mainline SRP initiator. It doesn't support multipathing yet. It takes 2 to 3 minutes until ib_srp fails the IO to upper layers so that a path/replica can be switched over.

The ib_srpt maintainer Bart Van Assche is working on fixing that. He released his srp-ha patches to the "linux-rdma" mailing list. So take this issue to the "linux-rdma" mailing list. Bart will help you for sure.

We've already adapted his patches and implemented our own SRP reconnect in addition.

So the bad news is: you need a Linux kernel developer with RDMA and storage skills for that. Perhaps we can combine some efforts.

Cheers,
Sebastian

--
Sebastian Riemer
Linux Kernel Developer - InfiniBand and Storage

We are looking for (SENIOR) LINUX KERNEL DEVELOPERS!

ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Germany
www.profitbricks.com • sebastian.riemer at profitbricks.com

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Andreas Gauger, Achim Weiss

Combining efforts sounds good, please let me know what I can do.

--
Jason Thomas | AVP Technology | Medata, Inc.
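
Sebastian's point about CHAINED versus PARALLEL paths can be made concrete with a little arithmetic. The sketch below is only a toy model, not a measurement: it assumes fully synchronous replication (a write is acknowledged only once both replicas have it), that the primary writes locally while it forwards to the secondary, and it uses made-up round-trip and disk times. All function names and numbers are hypothetical.

```python
def chained_write_latency(rtt_initiator_primary, rtt_primary_secondary,
                          disk_primary, disk_secondary):
    """Primary/secondary ("store and forward"): the initiator talks only to
    the primary, which writes locally and forwards the write to the secondary."""
    # On the primary, the local write and the forwarded write overlap;
    # the slower of the two decides when the primary can acknowledge.
    replicate = max(disk_primary, rtt_primary_secondary + disk_secondary)
    return rtt_initiator_primary + replicate


def parallel_write_latency(rtt_a, rtt_b, disk_a, disk_b):
    """Initiator-side mirroring: the initiator sends the write to both
    replicas at once and waits for the slower one."""
    return max(rtt_a + disk_a, rtt_b + disk_b)


# Hypothetical values in microseconds, chosen only for illustration.
RTT_US = 10    # one network round trip
DISK_US = 50   # one write to stable storage

chained = chained_write_latency(RTT_US, RTT_US, DISK_US, DISK_US)
parallel = parallel_write_latency(RTT_US, RTT_US, DISK_US, DISK_US)

print(f"chained:  {chained} us")   # 10 + max(50, 10 + 50) = 70
print(f"parallel: {parallel} us")  # max(10 + 50, 10 + 50) = 60
```

Under these assumptions the chained layout pays one extra network round trip per write. The gap grows if the link between primary and secondary is slower than the link from the initiator.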