[DRBD-user] dual primary DRBD, iSCSI and multipath possible?

Markus Hochholdinger Markus at hochholdinger.net
Wed Oct 6 15:48:50 CEST 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hello,

On 05.10.2010 at 14:36, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> On Mon, Oct 04, 2010 at 09:56:12PM +0200, Markus Hochholdinger wrote:
> > On 15.03.2010 at 15:47, Olivier LAMBERT
> > <lambert.olivier at gmail.com> wrote:
[..]
> > My setup would be:
> > On two nodes drbd with active/active (so xen live migration would work).
> > On each node export the drbd device with iscsi.
> If your xen attaches to iSCSI,
> why would you need anything else for live migration?

To use Xen on a drbd (storage) node!

I have a setup where I export logical volumes over iscsi to every other node. 
But I don't re-import (loop) the iscsi on the same host; there I use the 
logical volume directly. With the help of udev rules, every node sees the same 
path to the same volume, regardless of whether it's the local logical volume 
or a remote logical volume imported via iscsi. (A sketch of such a udev rule 
follows after the list below.)
In the Xen domUs I run software raid1. With this I have:
* Each node (hardware) is a Xen dom0 AND a storage host!
* I can live migrate the Xen domUs on every node I wish.
* I don't need heartbeat or anything similar.
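
The udev rule could look roughly like this (just a sketch; the volume name, 
serial and symlink are made up, and the exact match keys depend on the udev 
and lvm versions in use):

    # /etc/udev/rules.d/60-cluster-disks.rules
    # Local logical volume on the storage node: alias it under /dev/cluster/
    KERNEL=="dm-*", ENV{DM_LV_NAME}=="vm01-disk", SYMLINK+="cluster/vm01-disk"
    # The same volume seen via iSCSI on the other nodes: identical alias
    KERNEL=="sd*", ENV{ID_SCSI_SERIAL}=="vm01-disk", SYMLINK+="cluster/vm01-disk"

This way /dev/cluster/vm01-disk is a valid path in the domU config on every 
node.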

The downside of this setup is that I have to manage the raid1 inside the 
domUs.

So drbd comes to mind: I'd like to manage the software raid1 outside my 
domUs. With a two node setup and drbd in active/active mode this is very easy 
and straightforward. So I'm trying to expand the two node drbd setup into a 
more-than-two-node setup.
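
For the two node case the resource would look roughly like this (a sketch for 
drbd 8.3; hostnames, devices and addresses are made up):

    resource vm01 {
        protocol C;                          # synchronous, required for dual-primary
        net {
            allow-two-primaries;             # active/active, so Xen live migration works
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
        }
        startup { become-primary-on both; }
        on nodeA {
            device    /dev/drbd0;
            disk      /dev/vg0/vm01;
            address   192.168.1.1:7788;
            meta-disk internal;
        }
        on nodeB {
            device    /dev/drbd0;
            disk      /dev/vg0/vm01;
            address   192.168.1.2:7788;
            meta-disk internal;
        }
    }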


> > On each other node import the iscsi devices of both drbd nodes and put
> > multipath over it.
> Don't.

Why? Is there something in the semantics of drbd in active/active mode that 
speaks against this?


> > The tricky part now is how to handle failures. With this setup it is
> > possible that multipath switches between both drbd nodes. If we do this
> > more than once while we have a split brain, this would destroy our data!
> With dual-primary DRBD, you currently still have to set it up so that at
> least one node reboots if the replication link breaks, and the rebooted
> node must not go primary again until connection is re-established.

A reboot is very heavy-handed! Wouldn't it be sufficient to demote the primary 
role on one of the drbd nodes instead? My idea is that if the direct link 
fails, the drbd nodes could look up where the domU is running and keep the 
active (primary) role only on the node where the domU is reachable (or very 
near).
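
What I imagine is something along these lines (just a sketch; the handler 
script is hypothetical, and I don't know yet if drbd's hooks fit exactly):

    # in the resource definition:
    disk     { fencing resource-only; }
    handlers {
        # called when the replication link breaks; the script could check
        # where the domU runs and demote the unused node with
        #   drbdadm secondary <resource>
        fence-peer "/usr/local/sbin/demote-unused-primary.sh";
    }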


> > So the goal would be to develop a good multipath strategy.
> > How do you do this?
> You don't.

Why is multipath bad?


> > My idea is to tell multipath to stick to one path and only switch on an
> > error.
> Unfortunately you cannot control on what type of error the initiator will
> switch.

The initiator won't switch! The multipath function of the device mapper should 
do the switching, shouldn't it?
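
I was thinking of something like this in /etc/multipath.conf (a sketch; the 
wwid is a placeholder):

    multipaths {
        multipath {
            wwid                  <wwid-of-the-lun>
            path_grouping_policy  failover   # one active path, the other on standby
            failback              manual     # never switch back automatically
            no_path_retry         fail       # fail i/o instead of queueing forever
        }
    }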


> > Also you have to tell multipath to NOT recover faulty paths automatically
> > to prevent data loss in a split brain situation.
> That's not your only problem there.
> Please go for a failover setup.

It may sound funny, but I'd like to avoid any failover setup that involves 
heartbeat. I'd like to have the raid1 function very close to my domUs, and if 
I'm successful, I won't need any heartbeat setup for this.


> You can have two targets, one active on box A, one active on box B,
> both replicating to the respective other.

That's what I'd like to set up.
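
If I understand you correctly, crossed like this (sketched, names made up):

    # normal operation:
    #   boxA: drbdadm primary r0    -> target on boxA exports /dev/drbd0
    #   boxB: drbdadm primary r1    -> target on boxB exports /dev/drbd1
    # each resource replicates to the other box, which stays Secondary
    # if boxA dies:
    #   boxB: drbdadm primary r0    # promote the surviving replica
    #         and start the target for /dev/drbd0 there as well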


> As the iSCSI target is not cluster aware, if you try to do multipath to
> two _independent_ targets on an dual-primary DRBD, in general you will
> break things.

But the underlying drbd is cluster aware!? With the iscsi target in blockio 
mode no caching should interfere. And drbd itself should do things right in 
active/active mode, shouldn't it?
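
With IET, for example, I mean something like this (assuming iscsitarget; the 
target name is made up):

    # /etc/ietd.conf
    Target iqn.2010-10.net.example:vm01
        # Type=blockio bypasses the page cache on the target side
        Lun 0 Path=/dev/drbd0,Type=blockio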


> DRBD only replicates the data changes.  To make that actually work, the
> targets would have to be cluster aware, and replicate iSCSI state to
> each other. All the non-data commands, unit attention, lun resets, cmd
> aborts, not to speak of ordering or reservations.

Why is the iscsi state so important? If an iscsi link breaks, it answers with 
i/o errors until the link is re-established, and each command has to be sent 
again. I never had any problem with disconnecting and reconnecting iscsi 
devices on the initiator. With my current setup (software raid1 inside the 
domUs) I have to disconnect and reconnect the devices on each reboot of a 
dom0 (e.g. for a kernel security update of my dom0s) and rebuild the software 
raid1 in my domUs.
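
The repair after such a reboot looks roughly like this inside the domU 
(device names made up):

    # after the iscsi device is visible again in the domU:
    mdadm /dev/md0 --re-add /dev/xvdb1   # fast resync if a write-intent bitmap exists
    cat /proc/mdstat                     # watch the rebuild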


> It may appear to work as long as it is completely unloaded, you don't do
> reservations, target and initiator are mostly idle, there are not many
> scsi commands involved, and all you are doing is unplugging an iSCSI cable.
> But in general, if anything happens at all, you get high load on either
> the network or the initiators or the targets or the replication link
> or anything more interesting breaks, then I certainly don't want to be
> the one cleaning up the mess.

Is any of the iscsi state transported down to the drbd device? Is there some 
locking code inside drbd that would prevent switching? I still don't get it. 
Perhaps I have to test this until I realize where the problem is.


> I strongly recommend against it.

Many thanks for your explanations.

For now, my first step is to test and use drbd (active/active) in a two node 
xen setup. Once I'm familiar enough with drbd, I'll try the 
more-than-two-node setup in a test environment. I'll take your advice and 
test it under heavy load to see where my setup goes wrong. Perhaps then I'll 
understand you.


Many thanks,


-- 
greetings

eMHa

