[DRBD-user] DRBD and iSCSI (which? ^o^) versus scalability

Fri Jul 27 13:35:53 CEST 2012

On Jul 26, 2012, at 10:32 PM, Christian Balzer wrote:

> 1. Which bloody iSCSI stack?

I've been satisfied with LIO, though I can't say I've tested the others as extensively. I'm running the Debian kernel from squeeze-backports with an Ethernet backend, so I can't comment on anything more expensive or bleeding edge.

> 2. The Debian sid (bleeding edge) pacemaker seems
> to be either not quite up to date or nobody ever uses LIO, this warning
> every 10 seconds doesn't instill confidence either:
> ---
> Jul 27 10:52:41 borg00b iSCSILogicalUnit[27911]: WARNING: Configuration parameter "scsi_id" is not supported by the iSCSI implementation and will be ignored.
> ---

Are you using pacemaker from squeeze-backports?

The warning is benign, but you will find that the RA provided with Pacemaker fails horribly with LIO for other reasons. LIO has a bug where it continues to reference the underlying device even after it has been freed, as long as there are connections to that LU. If you use Pacemaker's RAs, the LUs are unconfigured before the target, which leaves a small window in which LIO may receive a request and attempt to access the backing device of the LU you just unconfigured, causing a kernel panic. I was able to hit it more often than not by forcing a LU to migrate while reading from it with dd. I bet that if you stop the LU without stopping the target, you can get it every time.

The workaround is to tear down the TPG first, which will close the iSCSI connections before tearing down the backing devices, thus avoiding the bug. Incidentally, LIO will also take care to clean up all LUNs, backing devices, and other stuff used by a target when you delete the target, so the stop procedure is quite easy.
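To make the ordering concrete, here is a rough sketch in Python. It is a toy model only, not LIO and not my RA: the ToyTarget class, its method names, and the example IQN and device paths are all made up for illustration. It just shows that releasing a backing device while sessions are still open is the unsafe case, and that taking the TPG down first avoids it.

```python
class ToyTarget:
    """Toy model of a target (hypothetical, not LIO's API)."""

    def __init__(self, iqn, backing_devices, sessions):
        self.iqn = iqn
        self.backing = list(backing_devices)  # backing device paths
        self.sessions = sessions              # open initiator connections
        self.log = []

    def teardown_tpg(self):
        # Taking the TPG down closes every iSCSI connection to the target.
        self.sessions = 0
        self.log.append("tpg down, sessions closed")

    def release_backing(self, dev):
        # Stand-in for the bug: a request could still reach a freed device.
        if self.sessions > 0:
            raise RuntimeError(
                f"{dev} released with {self.sessions} session(s) still open"
            )
        self.backing.remove(dev)
        self.log.append(f"released {dev}")


def safe_stop(target):
    """Stop in the safe order: TPG first, backing devices afterwards."""
    target.teardown_tpg()
    for dev in list(target.backing):
        target.release_backing(dev)
    return target.log


if __name__ == "__main__":
    t = ToyTarget("iqn.2012-07.example:vm1", ["/dev/drbd0", "/dev/drbd1"], 2)
    for line in safe_stop(t):
        print(line)
```

Calling release_backing() on a fresh ToyTarget with open sessions raises instead, which corresponds to the order the stock RAs use.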

Anyhow, you can't do things in this order with the heartbeat resource agents. I borrowed the relevant bits from them and adapted them to my own RA. References:

http://comments.gmane.org/gmane.linux.scsi.target.devel/1568?set_cite=hide
http://oss.clusterlabs.org/pipermail/pacemaker/2012-July/014754.html

> 3. Has anybody here deployed more than 10
> targets/LUNs? And done so w/o going crazy or running into issues mentioned
> in 2)? 
> How? Self made scripts/puppet?

I've played with about 20 targets, most with 2 LUs, in a testing environment. I'm working on moving it to production now. I already had a description of all the VMs in Puppet, so I used that to generate the Pacemaker configuration. I generate a /etc/crm.conf, and when it changes, I have Puppet programmed to load it into a shadow CIB. Nagios checks for differences between that shadow and the live CIB so I get notified when action is required. Then I double-check it for sanity, run it through crm_simulate, and merge it. Notably, this also alerts me about things like forgetting I put a node in standby, or unmanaging a service for maintenance, or leaving a constraint from "crm resource migrate ..." in place.
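For illustration, the comparison behind the Nagios check could look roughly like the sketch below. This is not my actual check: the function names are made up, and it assumes you have already obtained the live and shadow CIBs as XML strings (how you dump them is left out). It compares only the <configuration> section, so differences in the <status> section or in attributes on the <cib> element itself, such as the epoch, do not trigger an alert.

```python
import xml.etree.ElementTree as ET

# Nagios plugin exit codes
OK, WARNING = 0, 1


def config_section(cib_xml):
    """Return the <configuration> section of a CIB dump in canonical form."""
    root = ET.fromstring(cib_xml)
    config = root.find("configuration")
    if config is None:
        raise ValueError("no <configuration> element found")
    config.tail = None  # drop whitespace that follows the closing tag
    raw = ET.tostring(config, encoding="unicode")
    # C14N sorts attributes; strip_text removes indentation-only differences.
    return ET.canonicalize(raw, strip_text=True)


def check_cib_drift(live_xml, shadow_xml):
    """Return (exit_code, message) in the style of a Nagios plugin."""
    if config_section(live_xml) == config_section(shadow_xml):
        return OK, "OK: shadow CIB matches live CIB"
    return WARNING, "WARNING: shadow CIB differs from live CIB"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): check.py live.xml shadow.xml
    with open(sys.argv[1]) as live, open(sys.argv[2]) as shadow:
        code, message = check_cib_drift(live.read(), shadow.read())
    print(message)
    sys.exit(code)
```

Because both directions are compared, this flags a change made by hand on the live cluster (standby, unmanaged, a leftover migrate constraint) just as it flags a freshly generated configuration that has not been merged yet.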

Of course, 1000 VMs is two orders of magnitude more than this. I really have no idea how Pacemaker and LIO scale to that size.