[DRBD-user] Cluster filesystem question

Kushnir, Michael (NIH/NLM/LHC) [C] michael.kushnir at nih.gov
Tue Nov 29 23:02:27 CET 2011

Hi Lars,

Thanks for the info! I have a couple of questions:

>iSCSI is a stateful protocol, there is more to it than just reads and writes.
>To run multipath (or multi-connections per session) against *DISTINCT* targets [*] on separate nodes

What does that mean? What more is there than reads and writes? Is GNBD different? Can you run GNBD servers primary-primary with multipath?

I currently run DRBD 8.3 in dual-primary mode with IET iSCSI targets on both nodes. I use this as a VMFS datastore for 3 VMware ESX servers set for round-robin access. I've had no error messages or any indication of data corruption. I've got the nodes set up for IPMI-reboot fencing, with node1 always surviving in the case of a split brain and node2 always getting rebooted and overwritten.

Is this not a good setup? What could "blow up in my face"?


-----Original Message-----
From: Lars Ellenberg [mailto:lars.ellenberg at linbit.com] 
Sent: Monday, November 28, 2011 3:27 PM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Cluster filesystem question

On Fri, Nov 25, 2011 at 09:11:39PM +0100, Andreas Hofmeister wrote:
> On 25.11.2011 17:08, John Lauro wrote:
> >What I’m not sure about from the examples… Can you then add more
> >GFS/OCFS2 nodes (i.e. a 3rd and 4th node) that have no local disk as
> >part of the SAN cluster, but instead talk drbd to the first two nodes?
> >Or would you have to run a shared failover service such as NFS on top 
> >of the two node cluster if you need multiple hosts accessing it?
> For OCFS or GFS, all nodes need access to the same block device at the 
> same time. In a two-node setup, you can easily use DRBD in 
> dual-primary mode to provide such a shared block device.
> For more than two nodes, you need to export the block device provided 
> by DRBD by a different method (such as iSCSI), since DRBD
> (normally) supports only two nodes to share a block device.
> Depending on your needs, you can either go for an active-passive or an 
> active-active export of the DRBD device(s).
> Active-passive is way easier to implement. Try Google for "DRBD iSCSI" 
> and you'll find some interesting readings on that. Generally this 
> route involves DRBD in single-primary mode, iSCSI target/initiator 
> implementations, a method to share an IP address and a cluster
> manager to coordinate all those things.
> The active-active route seems to be somewhat harder - or at least I 
> did not find any useful references. A possible implementation would 
> likely include all of the above, except that one would use 
> SCSI-multipath instead of sharing a single IP address to access the 
> iSCSI targets.
> @list: did anybody try such a thing ?

"dual-primary" iscsi targets for multipath: does not work.

iSCSI is a stateful protocol, there is more to it than just reads and writes.
To run multipath (or multi-connections per session) against *DISTINCT* targets [*] on separate nodes

	** you'd need to have cluster aware iSCSI targets **

which coordinate with each other in some fashion.

To my knowledge, this does not exist (not for linux, anyways).
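To illustrate what "stateful" can mean here, below is a toy sketch in Python. It is not real target code: ToyTarget, the initiator names and the reservation logic are all hypothetical, and it assumes a SCSI-style reservation as one example of state that lives inside a target and is not replicated by DRBD. Both toy targets share the same data, but each keeps its own reservation, so an initiator that is correctly refused on one path gets through on the other.

```python
class ToyTarget:
    """Hypothetical stand-in for one iSCSI target; not a real target API."""

    def __init__(self, name, shared_blocks):
        self.name = name
        # The same dict is handed to both targets: this models the data
        # being identical on both nodes thanks to DRBD replication.
        self.blocks = shared_blocks
        # Reservation state lives in the target itself and is NOT replicated.
        self.reserved_by = None

    def reserve(self, initiator):
        """Grant an exclusive reservation unless someone else holds it."""
        if self.reserved_by not in (None, initiator):
            return False
        self.reserved_by = initiator
        return True

    def write(self, initiator, lba, data):
        """Refuse writes from anyone but the reservation holder."""
        if self.reserved_by not in (None, initiator):
            return False  # reservation conflict
        self.blocks[lba] = data
        return True


blocks = {}
t1 = ToyTarget("node1", blocks)
t2 = ToyTarget("node2", blocks)

# esx-a takes a reservation, but only node1's target knows about it.
reserved = t1.reserve("esx-a")

# esx-b is refused on the path through node1 ...
blocked = t1.write("esx-b", 0, b"intruder")

# ... but the path through node2 lets the same write through.
leaked = t2.write("esx-b", 0, b"intruder")

print("reserved on node1:", reserved)
print("write via node1 blocked:", not blocked)
print("write via node2 leaked:", leaked)
```

Cluster-aware targets would have to share that per-target state (and more) between the nodes; the sketch only shows what happens when they do not.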

[*] which happen to live on top of data that, due to replication, happens to be the same, most of the time, unless the replication link was lost for whatever reason; in which case you absolutely want to make sure that at least one box reboots hard before it even thinks about completing or even submitting another IO request...
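A toy Python sketch of that footnote, under loudly stated assumptions: ToyReplica is a hypothetical model of one node's local storage, the link_up flag stands in for the replication link, and there is deliberately no fencing at all. While the link is up, both nodes hold the same data; once it is down and both nodes keep accepting IO, the two copies diverge.

```python
class ToyReplica:
    """Hypothetical model of one node's local storage; not DRBD code."""

    def __init__(self):
        self.blocks = {}
        self.peer = None
        self.link_up = True

    def write(self, lba, data):
        # Always write locally; mirror to the peer only while the
        # (modelled) replication link is up.
        self.blocks[lba] = data
        if self.link_up:
            self.peer.blocks[lba] = data


node1, node2 = ToyReplica(), ToyReplica()
node1.peer, node2.peer = node2, node1

# Link up: a write on either node lands on both.
node1.write(0, "v1")
in_sync = node1.blocks == node2.blocks

# Link lost, nobody fenced: both nodes keep serving their own path.
node1.link_up = node2.link_up = False
node1.write(0, "from-path-1")
node2.write(0, "from-path-2")
diverged = node1.blocks[0] != node2.blocks[0]

print("in sync while link is up:", in_sync)
print("diverged after link loss:", diverged)
```

An initiator doing round-robin over both paths would now read different contents for the same block depending on the path, which is why one box has to be shot before it handles any further IO.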

Please do not try dual-primary DRBD exported as iSCSI target on both nodes for multipathing or MS/C initiators.

If you try anyways, it may appear to work for some time, and then blow up in your face just when it is most inconvenient.

If you want one single-primary DRBD exported on one node and another single-primary DRBD exported on the other node, that is certainly possible, and would at least distribute the read load (writes have to hit both storage backends anyway).
We can call that "active-active" as well, just not within the same drbd.

> >Mainly ask because I assume recovery time from failover would be much 
> >quicker with a cluster filesystem (doesn’t have to fsck/remount), and 
> >reads (would?) be split over the two hosts instead of just one, so it 
> >should be slightly faster with two hosts serving read I/O requests 
> >instead of one.
> In an active-passive setup, you won't get any performance gains.
> Whether you can see any throughput improvements with an active-active 
> setup would basically depend on your network setup.

: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
please don't Cc me, but send to list   --   I'm subscribed