[DRBD-user] split-brain on Ubuntu 14.04 LTS after reboot of master node

Digimer lists at alteeve.ca
Sun Nov 15 18:53:44 CET 2015

On 15/11/15 11:36 AM, Ivan wrote:
> On 11/15/2015 05:04 PM, Digimer wrote:
>> On 15/11/15 05:03 AM, Waldemar Brodkorb wrote:
>>> Hi,
>>> Digimer wrote,
>>>>> property $id="cib-bootstrap-options" \
>>>>>          dc-version="1.1.10-42f2063" \
>>>>>          cluster-infrastructure="corosync" \
>>>>>          stonith-enabled="false" \
>>>> And here's the core of the problem.
>>>> Configure and test stonith in pacemaker. Then, configure drbd to use
>>>> 'fencing resource-and-stonith;' and configure
>>>> 'crm-{un,}fence-peer.sh' as
>>>> the {un,}fence handlers.
>>> So stonith is a hard requirement even when a simple reboot is done?
>>> The docs mention the following:
>>> "The ocf:linbit:drbd OCF resource agent provides Master/Slave
>>> capability, allowing Pacemaker to start and monitor the DRBD
>>> resource on multiple nodes and promoting and demoting as needed. You
>>> must, however, understand that the drbd RA disconnects and detaches
>>> all DRBD resources it manages on Pacemaker shutdown, and also upon
>>> enabling standby mode for a node."
>>> http://drbd.linbit.com/users-guide-8.4/s-pacemaker-crm-drbd-backed-service.html
>>> So why does demoting not work when a reboot is done?
>>> When I do a simple crm node standby; sleep 30; crm node online
>>> everything is fine.
>>> best regards
>>>   Waldemar
>> It's a hard requirement, period. Without it, debugging problems is a
>> waste of time because the cluster enters an undefined state. Fix
>> stonith, see if the issue remains, and if so, let us know.
> You're right that fencing should be set up for production clusters (in
> the sense that you take a huge data consistency risk not setting it up)
> but last time I did a test environment without stonith I could reboot a
> node without getting a pacemaker split-brain. Either things have changed
> from back then, or the OP is hitting another problem; maybe the reboot
> doesn't properly shut down pacemaker, or the network (link, firewall,
> ...) is torn down before pacemaker is stopped, ...
> cheers
> ivan

It is entirely possible the issue is not related to fencing being
disabled. My point is that, without fencing, debugging becomes so much
more complicated that it's not worth the effort. With fencing, problems
become much easier to find, in my experience.

Also, you need fencing in all cases anyway, so why not use it from the
start? A "test cluster" that doesn't match what will become production
has limited use, wouldn't you agree?
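
As a rough sketch of the DRBD side of the advice quoted above, the
relevant parts of a resource definition would look something like the
fragment below. It assumes DRBD 8.4, a hypothetical resource named "r0",
and the handler scripts installed under /usr/lib/drbd/ (the path may
differ on your distribution), so treat it as a starting point rather
than a drop-in config:

```
# Sketch only: "r0" is a hypothetical resource name; adjust paths to your install.
resource r0 {
  disk {
    # On loss of the peer, freeze I/O and call the fence-peer handler.
    fencing resource-and-stonith;
  }
  handlers {
    # Adds a constraint in Pacemaker so the outdated peer cannot be promoted.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes that constraint once the peer has finished resyncing.
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

The stonith device itself depends on your hardware, so it is left out
here. Once one is configured and tested, the Pacemaker side is switched
on with 'crm configure property stonith-enabled=true'.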

Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
