Feedback about iSCSI High Availability Clustering Using DRBD and Pacemaker on RHEL 9

Matt Kereczman matt at linbit.com
Fri Mar 15 17:19:48 CET 2024


On 3/14/24 00:23, Strahil Nikolov wrote:
> Hi All,
> 
> Do we have a thread about the HA iSCSI article ?
> I saw a few things that can be optimized/updated:

Hello Strahil,

Thank you for reaching out and starting this thread! I just want to 
say up top that we don't often receive feedback on our tech guides, 
but we absolutely value and appreciate it.

> 
> 
> 1. Point 3.7.1 shouldn't be done, as pcs has a native way to auth, 
> assemble and start the cluster and corosync tinkering is no longer 
> needed. Something like this:
> 
> echo 'somepass' | passwd --stdin hacluster
> pcs host auth node1 node1.example.com node2 node2.example.com
> 
> pcs cluster setup CLUSTERNAME node1 node2 totem token=10000 --enable 
> --start
> 

Great point. While I was aware of this, I'm very used to the older 
manual process of configuring the cluster communication layer. This 
comment applies to a lot of our newer guides that use pcs, and we'll 
work on addressing this throughout them all.

> 
> 2. Point 3.8 is against any high availability and against Red Hat 
> support policy - it should have a red label that this is done for the 
> demo !

Absolutely agree. There is a big red "WARNING" at the bottom of 
section 3.8 that touches on this point, but we can move it to the top 
for better visibility.

It is my understanding that Red Hat doesn't support a few things that 
LINBIT does support with regard to Pacemaker, including clusters 
without node-level fencing, although we strongly recommend fencing 
whenever possible.

> 
> 3. In point 3.10 I highly recommend setting scsi_sn for the 
> ocf:heartbeat:iSCSILogicalUnit - the software by default picks the 
> same SN for the first LUN and when 1 client attaches 2 LUNs (from 2 
> separate clusters) multipath will treat the 2 sources as one and 
> aggregate the paths - becomes a real mess

Good catch. The automation we use internally for testing HA iSCSI 
deployments sets the scsi_sn, so I'm surprised that I missed it in 
this guide. I believe older VMware products also required the scsi_sn 
to match for smooth failover. Either way, I will update the guide.
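
As a rough sketch of what setting the serial number could look like, 
the snippet below derives an 8-character value from a cluster name and 
LUN number, then prints a pcs command that would apply it. The helper 
make_scsi_sn, the cluster name "iscsi-prod-a", and the resource name 
"p_iscsi_lun0" are placeholders for illustration, not names from the 
guide. Any value that is unique per LUN across clusters should work 
equally well.

```shell
# Hypothetical helper: derive an 8-character serial number from a
# cluster name and LUN number so that two clusters do not share one.
make_scsi_sn() {
    printf '%s-lun%s' "$1" "$2" | md5sum | cut -c1-8
}

# "iscsi-prod-a" and "p_iscsi_lun0" are placeholder names (assumptions).
sn=$(make_scsi_sn iscsi-prod-a 0)

# Print the command for review instead of running it directly.
echo "pcs resource update p_iscsi_lun0 scsi_sn=${sn}"
```

The same cluster name and LUN number always produce the same value, so 
the serial number stays stable if the resource is ever recreated.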

> 
> 4. Consider LVM filter for the DRBD device - a client might use the 
> LUN as a PV in a volume group and then the situation will get messy - 
> the cluster won't be able to demote the node and pacemaker will fence it.

Excellent suggestion for a common use case. Will include.
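
A minimal sketch of such a filter follows, assuming the DRBD devices 
appear as /dev/drbd* on the cluster nodes. The function name 
print_drbd_lvm_filter is made up for this example. It only prints a 
candidate stanza for review. Depending on the layout, the DRBD backing 
device may need its own reject pattern, and on systems that use the LVM 
devices file, it is worth checking whether a filter is needed at all.

```shell
# Print a candidate lvm.conf stanza for review; merge it by hand into the
# devices { } section of /etc/lvm/lvm.conf on every cluster node.
print_drbd_lvm_filter() {
    cat <<'EOF'
devices {
    # Reject all DRBD devices so the host does not scan PVs created by initiators.
    # Devices that match no pattern are still accepted.
    global_filter = [ "r|^/dev/drbd.*|" ]
}
EOF
}

print_drbd_lvm_filter
```

After merging the setting, running "lvmconfig devices/global_filter" on 
each node should show the value that LVM actually uses.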

> 
> 5. Consider using fencing delay when using 2-node clusters - in case 
> of split brain scenario the node with more resources will survive:
> 
> pcs resource defaults update priority=1
> pcs property set priority-fencing-delay=10
<snip>

We definitely practice this for 2-node clusters we deploy with fencing 
configured.

I don't think it makes sense to include this in a guide where we do 
not configure fencing, but it should be included in a more general 
reference on how to properly configure fencing. I will make sure 
that we have this information somewhere for public consumption, and 
possibly link to it from all our tech guides.

Best Regards,
Matt Kereczman

P.S. We are happy to receive feedback and suggestions through this 
mailing list. Thanks again for what you have provided. You can also 
make suggestions about user's guides by opening issues in our 
GitHub repository (https://github.com/LINBIT/linbit-documentation). 
For technical "how-to" guides or other documentation content that 
isn't a user's guide, such as a knowledge base article or blog 
article, another option is to reach out to documentation at linbit.com.

