Feedback about iSCSI High Availability Clustering Using DRBD and Pacemaker on RHEL 9
Matt Kereczman
matt at linbit.com
Fri Mar 15 17:19:48 CET 2024
On 3/14/24 00:23, Strahil Nikolov wrote:
> Hi All,
>
> Do we have a thread about the HA iSCSI article ?
> I saw a few things that can be optimized/updated:
Hello Strahil,
Thank you for reaching out and starting this thread! I just want to
say up top that we don't often receive feedback on our tech guides,
but we absolutely value and appreciate it.
>
>
> 1. Point 3.7.1 Shouldn't be done as pcs has a native way to auth,
> assemble and start the cluster and corosync tinkering is no longer
> needed. Something like this:
>
> echo 'somepass' | passwd --stdin hacluster
> pcs host auth node1 node1.example.com node2 node2.example.com
>
> pcs cluster setup CLUSTERNAME node1 node2 totem token=10000 --enable
> --start
>
Great point. While I was aware of this, I'm very used to the older
manual process of configuring the cluster communication layer. This
comment applies to a lot of our newer guides that use pcs, and we'll
work on addressing this throughout them all.
>
> 2. Point 3.8 is against any high availability and against Red Hat
> support policy - it should have a red label that this is done for the
> demo !
Absolutely agree. There is a big red "WARNING" at the bottom of
section 3.8 that touches on this point, but we can move it to the top
for better visibility.
It is my understanding that Red Hat doesn't support a few things that
LINBIT does support with regard to Pacemaker, including clusters
without node-level fencing, although we strongly recommend fencing
whenever possible.
>
> 3. In point 3.10 I highly recommend setting scsi_sn for the
> ocf:heartbeat:iSCSILogicalUnit - the software by default picks the
> same SN for the first LUN and when 1 client attaches 2 LUNs (from 2
> separate clusters) multipath will treat the 2 sources as one and
> aggregate the paths - becomes a real mess
Good catch. The automation we use internally for testing HA iSCSI
deployments sets the scsi_sn, so I'm surprised that I missed it in
this guide. I believe older VMware products also required the scsi_sn
to match for smooth failover. Either way, I will update the guide.
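As a rough sketch of the idea, the serial number can be derived from
something unique to each cluster, so that two clusters using the same
resource name do not end up with the same value. The cluster name and
resource ID below are hypothetical placeholders, and the 8-character
length is an assumption based on what I recall of the resource agent's
default. The script only prints the pcs command for review; it does
not run it.

```shell
#!/bin/sh
# Hypothetical names; adjust to your own cluster and LUN resource.
CLUSTER_NAME="iscsi-cluster-a"
LUN_RESOURCE="p_iscsi_lun0"

# Derive a stable 8-hex-character serial from the cluster and resource
# names, so the same resource ID in another cluster yields a different value.
make_scsi_sn() {
    printf '%s' "$1:$2" | sha256sum | cut -c1-8
}

SCSI_SN="$(make_scsi_sn "$CLUSTER_NAME" "$LUN_RESOURCE")"

# Print the command instead of executing it, so it can be checked first.
echo "pcs resource update $LUN_RESOURCE scsi_sn=$SCSI_SN"
```

Because the value depends only on the two names, it stays the same
across failovers and reboots, which is what initiators expect.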
>
> 4. Consider LVM filter for the DRBD device - a client might use the
> LUN as a PV in a volume group and then the situation will get messy -
> the cluster won't be able to demote the node and pacemaker will fence it.
Excellent suggestion for a common use case. Will include.
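A minimal sketch of what such a filter might look like on the iSCSI
target nodes follows. It assumes the DRBD devices appear as
/dev/drbd*, and that regex filters are in effect on the node (check
lvmdevices(8) if the LVM devices file is in use). If the DRBD backing
device is a plain disk or partition, it may need its own reject
pattern as well. The script only prints the fragment; merging it into
/etc/lvm/lvm.conf is left as a manual step.

```shell
#!/bin/sh
# Candidate lvm.conf fragment for the target nodes. The /dev/drbd*
# pattern is an assumption about how the DRBD devices are named.
LVM_FILTER_FRAGMENT='devices {
    # Reject DRBD devices so client-created PVs on the LUN are never
    # scanned or activated on the target node; accept everything else.
    global_filter = [ "r|^/dev/drbd.*|", "a|.*|" ]
}'

# Print the fragment for review before adding it to /etc/lvm/lvm.conf.
printf '%s\n' "$LVM_FILTER_FRAGMENT"
```

After applying a filter like this, running pvs on the target node
should no longer list any PV that a client created on the LUN.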
>
> 5. Consider using fencing delay when using 2-node clusters - in case
> of split brain scenario the node with more resources will survive:
>
> pcs resource defaults update priority=1
> pcs property set priority-fencing-delay=10
<snip>
We definitely practice this for 2-node clusters we deploy with fencing
configured.
I don't think it makes sense to include this in a guide where we do
not configure fencing, but it should be included in a more general
reference on how to properly configure fencing. I will make sure
that we have this information somewhere for public consumption, and
possibly link to it from all our tech guides.
Best Regards,
Matt Kereczman
P.S. We are happy to receive feedback and suggestions through this
mailing list. Thanks again for what you have provided. You can also
make suggestions about user's guides by opening issues in our
GitHub repository (https://github.com/LINBIT/linbit-documentation).
For technical "how-to" guides or other documentation content that
isn't a user's guide, such as a knowledge base article or blog
article, another option is to reach out to documentation at linbit.com.