[DRBD-user] strange split-brain problem

Tue Dec 7 17:19:37 CET 2010

On Tue, 07 Dec 2010 16:47:09 +0100, Felix Frank wrote:

>> Have I understood correctly?
> 
> No ;-) Please see the activity log document linked earlier in this
> thread. 

I did.  I guess I didn't grasp it well enough.

> The activity log is used when the nodes are in sync. When node A
> is a StandAlone Primary, it will update its own QuickSync Bitmap
> instead. Then, when B rejoins the cluster, the sync will be from A to B,
> not vice versa.

But that doesn't seem to be what klaus.mailinglists at pernau.at is
describing.  Am I reading it incorrectly?  It appears to me that extents
on node A are being marked as out of date while B is down.

That's also what I thought you meant by:

	Node A is primary and marks some extents as hot. Then you 
	reboot it. It comes back up and finds hot extents, so it 
	flips the corresponding bits in its QuickSync Bitmap and 
	wants to sync those from the secondary. 

"Hot extents" is a concept involving the activity log, not the QuickSync 
bitmap, right?  So how could extents become hot while B is down?

Or could this have occurred before B was shut down?  

> 
>> I'm puzzled at how recovery and synchronization are occurring after a
>> failure.  In the scenario described by klaus.mailinglists at pernau.at,
>> node B has been down.  So where is the benefit of syncing with it when
>> node A returns to service?  Would that not roll the state of storage
>> back to when node B failed?
> 
> When A last saw B, B was online and UpToDate.

Yes, but after writes occur to A while B is down, B is out of date even 
if B doesn't know it.

> 
>> I'm envisioning an extent that is the recipient of multiple writes over
>> the time while B is down.  One is recent - leaving the extent "hot" -
>> when node A goes down.
>> 
>> If the "hotness" causes that extent to be restored to the state stored
>> on B, would that not roll that extent back several writes?
>> 
>> I assume that this doesn't actually happen, but I'm trying to
>> understand how the mechanism involved works.
> 
> When node B is offline while A is receiving writes, A will not try to
> refresh its hot extents from B under any circumstances, if I remember
> correctly, at least not any sectors that are already marked in A's
> Bitmap.

That makes sense to me.  It's what I would expect.  But it doesn't seem 
to fit what klaus.mailinglists at pernau.at is describing.
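
To pin down how I read the mechanism as described in this thread, here is
a minimal toy model in Python.  It is not DRBD code and may not match what
DRBD really does.  The class, its fields, and the method names are all
hypothetical, and the rule "hot extents already marked in the bitmap are
not refreshed from the peer" is taken from the description quoted above,
not from the DRBD source.

```python
# Toy model of the behaviour described in this thread. NOT DRBD code.
# All names here (Node, hot, bitmap, recover_after_crash) are hypothetical.

class Node:
    def __init__(self):
        self.peer_connected = True
        self.hot = set()     # activity log: extents with recent replicated writes
        self.bitmap = set()  # quick-sync bitmap: extents the peer has missed

    def write(self, extent):
        if self.peer_connected:
            # Nodes in sync: the write is replicated, the extent becomes hot.
            self.hot.add(extent)
        else:
            # StandAlone Primary: the peer misses this write, mark the bitmap.
            self.bitmap.add(extent)

    def recover_after_crash(self):
        # Assumption from the thread: a hot extent is only a candidate for
        # refresh from the peer if it is not already marked in the bitmap.
        pull_from_peer = self.hot - self.bitmap
        push_to_peer = set(self.bitmap)
        self.hot.clear()
        return pull_from_peer, push_to_peer


a = Node()
a.write(1)                 # B still connected: extent 1 becomes hot
a.peer_connected = False   # B goes down
a.write(1)                 # extent 1 rewritten while B is down
a.write(2)                 # extent 2 written while B is down
pull, push = a.recover_after_crash()
print(pull, push)          # pull is empty, push contains extents 1 and 2
```

In this model an extent can only become hot before B goes down, and any
extent rewritten while B is down is never rolled back to B's copy.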

	- Andrew