Hello,

I'm pondering an HA iSCSI (really iSER or SRP, Infiniband backend) storage
cluster based on DRBD and Pacemaker. So something that has been documented
and implemented numerous times. However, setting things up on one of my
test clusters, it became clear to me that this is probably not all that
rosy.

Issues:

1. Which bloody iSCSI stack?
The obvious choice would be LIO, being the official stack and certainly
having the least "fend for yourself and use the source, Luke" homepage.
Alas, that requires at least a 3.4 kernel (3.3 really, but that's EOL) if
one wants SRP. A bit on the cutting edge, especially considering stable
userland distributions, Debian in my case. Also, what I really want is
iSER, being more feature-rich and a real [tm] standard. But for the sake
of going with the times, I used LIO for the testbed, foregoing SRP and
going with plain iSCSI (no Infiniband on that test cluster anyway ^o^).

2. House of cards.
Setting this up, I ran into several issues that boil down to: "if anything
goes wrong, wipe the slate". As in, reboot or manually clean up anything
left behind by either LIO (LUNs/block device attachments from failed
attempts or uncleanly shut down RAs) or LVM (still-active LVs due to LIO
still hogging them, or Pacemaker otherwise failing and leaving crud
behind). The Debian sid (bleeding edge) Pacemaker seems to be either not
quite up to date or nobody ever uses LIO; this warning every 10 seconds
doesn't instill confidence either:
---
Jul 27 10:52:41 borg00b iSCSILogicalUnit[27911]: WARNING: Configuration
parameter "scsi_id" is not supported by the iSCSI implementation and will
be ignored.
---
And before anybody asks, I followed the Linbit guide. I simply cannot
believe that a setup this fragile will survive normal operations like
adding additional targets or LUNs, least of all a real incident.
Especially not with 1000 targets/LUNs/LVs.
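For what it's worth, the manual "wipe the slate" dance can at least be
scripted. A rough sketch of what I end up doing by hand (untested as a
script; the IQN, backstore name and LV path are made-up examples, and it
only prints the commands unless you set DRY_RUN=0):

```shell
#!/bin/sh
# Sketch: clean up what a failed/unclean RA stop leaves behind.
# DRY_RUN=1 (default) only prints the commands it would run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

cleanup_lun() {
    # $1 = target IQN, $2 = block backstore name, $3 = LV device path
    # Tear down the target first so LIO releases the block device...
    run targetcli /iscsi delete "$1"
    run targetcli /backstores/block delete "$2"
    # ...otherwise lvchange fails with "Logical volume ... in use".
    run lvchange -an "$3"
}

# Example invocation with made-up names:
cleanup_lun "iqn.2012-07.com.example:vm0001" \
    "vm0001_disk0" "/dev/vg0/vm0001_disk0"
```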
Also, reading what others have found out about SRP with LIO, it isn't as
mature as one would wish for; a case in point was the lack of support for
disconnection. If that works both ways, it would result in lingering
targets/LUNs and the impact described above.

3. Objects in the rear view mirror.
Has anybody here deployed more than 10 targets/LUNs? And done so w/o going
crazy or running into the issues mentioned in 2)? How? Self-made
scripts/Puppet? I am looking at about 1000 VMs connecting to that storage
cluster, meaning 1000 targets, each with probably 2 LUNs. Doing this in
Pacemaker is a divine punishment and I can see it taking a loooong time
getting these started/stopped (with all the problems that can entail in
the Pacemaker logic).

I'm not asking for free counseling, I just would like to hear if anybody
has climbed those heights before w/o falling off the cliff or succumbing
to hypoxia. ^o^

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
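P.S.: the only way I can see to avoid hand-writing 1000 primitives is to
generate the crm configuration with a loop and feed it to "crm configure
load update -". A sketch of what I have in mind, assuming the
ocf:heartbeat:iSCSITarget / iSCSILogicalUnit agents from the Linbit guide;
the IQN prefix, VG name and one-group-per-VM layout are just my own
made-up conventions:

```shell
#!/bin/sh
# Sketch: emit crm configure syntax for one VM's target plus 2 LUNs.
# Pipe the output of a loop over all VMs into
#   crm configure load update -
gen_vm() {
    vm="$1"                                    # e.g. vm0001
    iqn="iqn.2012-07.com.example:${vm}"        # made-up IQN prefix
    cat <<EOF
primitive p_target_${vm} ocf:heartbeat:iSCSITarget \\
    params implementation=lio iqn=${iqn} \\
    op monitor interval=10s
primitive p_lu_${vm}_0 ocf:heartbeat:iSCSILogicalUnit \\
    params target_iqn=${iqn} lun=0 path=/dev/vg0/${vm}_disk0
primitive p_lu_${vm}_1 ocf:heartbeat:iSCSILogicalUnit \\
    params target_iqn=${iqn} lun=1 path=/dev/vg0/${vm}_disk1
group g_${vm} p_target_${vm} p_lu_${vm}_0 p_lu_${vm}_1
EOF
}

# Example: generate config for three VMs.
for i in 1 2 3; do
    gen_vm "$(printf 'vm%04d' "$i")"
done
```

Whether Pacemaker can actually digest ~3000 resources in a sane time is of
course exactly the question I'm asking.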