[DRBD-user] split-brain on Ubuntu 14.04 LTS after reboot of master node

Ivan ivan at c3i.bg
Sun Nov 15 17:36:20 CET 2015

On 11/15/2015 05:04 PM, Digimer wrote:
> On 15/11/15 05:03 AM, Waldemar Brodkorb wrote:
>> Hi,
>> Digimer wrote,
>>>> property $id="cib-bootstrap-options" \
>>>>          dc-version="1.1.10-42f2063" \
>>>>          cluster-infrastructure="corosync" \
>>>>          stonith-enabled="false" \
>>> And here's the core of the problem.
>>> Configure and test stonith in pacemaker. Then, configure drbd to use
>>> 'fencing resource-and-stonith;' and configure 'crm-{un,}fence-peer.sh' as
>>> the {un,}fence handlers.
>> So stonith is a hard requirement even when a simple reboot is done?
>> The docs mention the following:
>> "The ocf:linbit:drbd OCF resource agent provides Master/Slave
>> capability, allowing Pacemaker to start and monitor the DRBD
>> resource on multiple nodes and promoting and demoting as needed. You
>> must, however, understand that the drbd RA disconnects and detaches
>> all DRBD resources it manages on Pacemaker shutdown, and also upon
>> enabling standby mode for a node."
>> http://drbd.linbit.com/users-guide-8.4/s-pacemaker-crm-drbd-backed-service.html
>> So why does demoting not work when a reboot is done?
>> When I do a simple crm node standby; sleep 30; crm node online
>> everything is fine.
>> best regards
>>   Waldemar
> It's a hard requirement, period. Without it, debugging problems is a
> waste of time because the cluster enters an undefined state. Fix
> stonith, see if the issue remains, and if so, let us know.

You're right that fencing should be set up for production clusters (in
the sense that you take a huge data consistency risk by not setting it
up), but the last time I built a test environment without stonith I
could reboot a node without getting a pacemaker split-brain. Either
things have changed since then, or the OP is hitting another problem;
maybe the reboot doesn't properly shut down pacemaker, or the network
(link, firewall, ...) is torn down before pacemaker is stopped, ...
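
For reference, the DRBD side of the fencing setup Digimer describes
above would look roughly like the fragment below in DRBD 8.4. This is
only a sketch: the resource name r0 is a placeholder, and the handler
paths assume the scripts are installed under /usr/lib/drbd, which may
differ on your system. It also presumes stonith is already configured,
tested, and enabled in pacemaker.

```
# Sketch only; "r0" is a placeholder resource name.
resource r0 {
  disk {
    # Freeze I/O and call the fence-peer handler when the peer is lost.
    fencing resource-and-stonith;
  }
  handlers {
    # Adds a pacemaker constraint so the outdated peer cannot be promoted.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes that constraint once the peer has finished resyncing.
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # ... existing device, disk, and address settings stay as they are ...
}
```

With this in place, a node that loses its peer suspends I/O until the
fence handler returns, instead of both sides carrying on independently.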

