[DRBD-user] split brain on both nodes

Digimer lists at alteeve.ca
Thu Oct 18 23:08:31 CEST 2018


On 2018-10-18 4:00 p.m., Lars Ellenberg wrote:
> On Wed, Oct 17, 2018 at 03:11:42PM -0400, Digimer wrote:
>> On 2018-10-17 5:35 a.m., Adam Weremczuk wrote:
>>> Hi all,
>>>
>>> Yesterday I rebooted both nodes a couple of times (replacing BBU RAID
>>> batteries) and ended up with:
>>>
>>> drbd0: Split-Brain detected but unresolved, dropping connection!
>>
>> Fencing prevents this.
>>
>>> on both.
>>>
>>> node1: drbd-overview
>>> 0:r0/0  StandAlone Primary/Unknown UpToDate/DUnknown /srv/test1 ext4
>>> 3.6T 75G 3.4T 3%
>>>
>>> node2: drbd-overview
>>> 0:r0/0  StandAlone Secondary/Unknown UpToDate/DUnknown
>>>
>>> I understand there is a good chance (but not absolute guarantee) that
>>> node1 holds consistent and up to date data.
>>>
>>> Q1:
>>>
>>> Is it reasonably possible to mount /dev/drbd0 (/dev/sdb1) on node2 in
>>> read only mode?
>>>
>>> I would like to examine the data before discarding and syncing
>>> everything from node1.
>>
>> Yes. You can also promote node 2 to examine it as well.
>>
>>> drbdadm disconnect all
>>> drbdadm -- --discard-my-data connect all
>>
>> Discarding the data will trigger a resync and resolve the split-brain,
>> but of course, any changes on the discarded node will be lost.
>>
>>> Q2:
>>>
>>> Will the above completely purge all data on node2 or just drbd metadata?
>>>
>>> I.e. will all 75G have to be fully copied block by block or a lot less?
>>
>> It will do a full resync.
> 
> Even after a "split brain" (rather: data divergence),
> DRBD usually can do an "incremental, bitmap based" resync:
> DRBD knows which blocks have changed on each node, and will sync the
> blocks described by the bit-or of those out-of-sync bitmaps.

Glad to hear I was wrong on that, cool.
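
To make the "bit-or of the out-of-sync bitmaps" idea concrete, here is a
toy sketch in Python. It is only an illustration, not DRBD's actual code:
the function names are made up for this example, and each bitmap is
modelled as a plain integer where bit i set means "block i changed on
that node since the nodes lost contact". The 4 KiB-per-bit granularity
matches what DRBD uses for its bitmap.

```python
# Toy model of a bitmap-based resync after data divergence.
# Assumption: function names and the integer-as-bitmap representation are
# illustrative only; DRBD keeps its real bitmap in its on-disk metadata.

BLOCK_SIZE = 4096  # bytes covered by one bitmap bit


def blocks_to_resync(survivor_bitmap: int, victim_bitmap: int) -> list:
    """Return the block numbers that must be copied to the victim.

    A block needs copying if it changed on either node, so the two
    out-of-sync bitmaps are combined with a bitwise OR.
    """
    merged = survivor_bitmap | victim_bitmap
    return [i for i in range(merged.bit_length()) if (merged >> i) & 1]


def resync_bytes(survivor_bitmap: int, victim_bitmap: int) -> int:
    """Amount of data the incremental resync would transfer, in bytes."""
    return len(blocks_to_resync(survivor_bitmap, victim_bitmap)) * BLOCK_SIZE


if __name__ == "__main__":
    # Survivor changed blocks 0 and 2, victim changed blocks 1 and 2.
    print(blocks_to_resync(0b0101, 0b0110))
    print(resync_bytes(0b0101, 0b0110))
```

The point is that the amount copied scales with the number of blocks
touched on either side while disconnected, not with the size of the
device.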

>>> I'm concerned about time and impact on performance when it comes to
>>> terabytes of data.
>>>
>>> Regards,
>>> Adam
>>
>> The resync (on 8.4) adapts the resync rate to minimize impact on
>> applications using the storage. As it slows itself down to "stay out of
>> the way", the resync time increases of course. You won't have redundancy
>> until the resync completes.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

