[DRBD-user] DRBD spontaneously loses connection

Fri Dec 11 06:40:32 CET 2015

On 11/12/15 12:12 AM, Igor Cicimov wrote:
> 
> 
> On Fri, Dec 11, 2015 at 3:08 AM, Digimer <lists at alteeve.ca> wrote:
> 
>     On 10/12/15 09:27 AM, Fabrizio Zelaya wrote:
>     > Thank you Lars and Adam for your recommendations.
>     >
>     > I have ping-timeout set to 5 and it still happened.
>     >
>     > Lars, about this fencing idea: I have been contemplating it, but I
>     > am rebuilding servers that are already in place. All the configuration
>     > related to cman and drbd was literally copied and pasted from the old
>     > servers.
> 
>     You can hook DRBD into cman's fencing with the rhcs_fence fence handler,
>     which is included with DRBD. This assumes, of course, that you have
>     cman's fencing configured properly.
> 
>     > Is this concept of having dual-primary without fencing being a
>     > mis-configuration something new?
> 
>     No, but it is often overlooked.
> 
>     > I ask this because as you would imagine by now, the old servers are
>     > working with the exact same configuration and have no problems at all.
>     > And while your idea makes perfect sense it would also make sense to have
>     > the exact same problem on every version of drbd.
> 
>     Fencing is like a seatbelt. You can drive for years never needing it,
>     but when you do, it saves you from hitting the windshield.
> 
>     Fencing is 100% needed, and all the more so with dual-primary.
> 
> 
> So this practically rules out using DRBD in public clouds like
> AWS, right? There, cutting the peer's power supply is impossible, and
> using the CLI gives no guarantee that shutting down a peer VM will ever
> happen, since it has to be done via the network. Even if the VMs have
> multiple network interfaces provisioned for redundancy in different
> subnets, they are still virtual, and there is a possibility they end up on
> the same physical interface on the hypervisor host that has failed (or
> its switch), causing the split brain in the first place.

As much as I may personally think that public clouds are not good
platforms for HA...

No, it is possible to use AWS. There is a fence agent called (I believe)
fence_ec2 that works by requesting that an instance be terminated and then
waiting for confirmation that the task completed. You would
configure this in cluster.conf and then hook DRBD into it by using
'fence-peer "/path/to/rhcs_fence";' in the handlers section and
'fencing resource-and-stonith;' in the disk section (DRBD 8.x option
names). Then, if a node is lost, DRBD will block I/O, ask cman to fence
the node, and wait for a success message.
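
As a rough sketch of the DRBD side (untested, and assuming DRBD 8.x, where
the handler is spelled 'fence-peer' and the policy is set with 'fencing' in
the disk section), the resource would look something like this. The
resource name "r0" is made up for illustration, and the handler path is
left as a placeholder; point it at wherever rhcs_fence is installed on
your system.

```
resource r0 {
    disk {
        # On loss of the peer, suspend I/O and call the fence-peer
        # handler; resume only after the handler reports success.
        fencing resource-and-stonith;
    }
    handlers {
        # Asks cman to fence the lost node and waits for the result.
        fence-peer "/path/to/rhcs_fence";
    }
}
```

With this in place DRBD relies entirely on cman's fencing, so the fence
devices in cluster.conf have to work before any of this is useful.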

All the other things you mention are reasons why I personally don't
consider the cloud a good platform, but it is used. For my part, I insist
on dual fence methods: first IPMI and, if that fails, a fallback to a pair
of switched PDUs that cut power to the (redundant) PSUs.
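
For illustration, a two-method setup like that might look roughly as
follows in cluster.conf. Treat it as a sketch under assumptions, not a
tested config: the node name, device names, addresses, credentials and
outlet numbers are all invented, and fence_apc_snmp is only one example of
a switched-PDU agent (use whichever agent matches your hardware).

```xml
<clusternode name="node1.example.com" nodeid="1">
  <fence>
    <!-- Methods are tried in order; "pdu" runs only if "ipmi" fails. -->
    <method name="ipmi">
      <device name="ipmi_n1" action="reboot"/>
    </method>
    <method name="pdu">
      <!-- With redundant PSUs, both feeds go off before either comes back on. -->
      <device name="pdu1" port="1" action="off"/>
      <device name="pdu2" port="1" action="off"/>
      <device name="pdu1" port="1" action="on"/>
      <device name="pdu2" port="1" action="on"/>
    </method>
  </fence>
</clusternode>

<fencedevices>
  <fencedevice name="ipmi_n1" agent="fence_ipmilan" ipaddr="10.0.0.1" login="admin" passwd="secret"/>
  <fencedevice name="pdu1" agent="fence_apc_snmp" ipaddr="10.0.0.11"/>
  <fencedevice name="pdu2" agent="fence_apc_snmp" ipaddr="10.0.0.12"/>
</fencedevices>
```

The ordering matters: IPMI can confirm the node is actually off, while the
PDUs only help if every PSU of the node is fed from an outlet listed here.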

>     > There is a difference to be considered, I guess. I am now installing these
>     > servers using SL6, as you saw in my original email; the old servers are
>     > working with Debian 6.0.7.
>     >
>     > The old servers are running drbd8-utils 2:8.3.7-2.1
> 

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?