[DRBD-user] strange split-brain problem

Tue Dec 7 16:47:09 CET 2010

On 12/07/2010 04:39 PM, Andrew Gideon wrote:
> On Tue, 07 Dec 2010 14:29:39 +0100, Felix Frank wrote:
> 
>> Node A is primary and marks some extents as hot.
> 
> Is this just a matter of timing, or were they hot because the writes 
> occurred while node B was down?  I'm guessing the former.  If the latter, 
> then the activity log would grow unbounded while one node was down.
> 
> Have I understood correctly?

No ;-) Please see the activity log document linked earlier in this thread.
The activity log is used when the nodes are in sync. When node A is a
StandAlone Primary, it will update its own QuickSync Bitmap instead.
Then, when B rejoins the cluster, the sync will be from A to B, not vice
versa.
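
As a rough illustration of the distinction above, here is a toy Python model. It is not DRBD code: the class, its method names and the extent size of 4 blocks are made up for this sketch. It assumes that a write on a connected primary only touches the activity log, that a write on a StandAlone Primary sets a bit in the node's own bitmap instead, and that the bitmap decides what gets sent to the peer on reconnect. Because the bitmap holds one entry per block, repeated writes to the same block do not make it grow.

```python
class ToyPrimary:
    """Toy model of a primary node; not DRBD's real data structures."""

    def __init__(self):
        self.connected = True
        self.activity_log = set()  # "hot" extents (hypothetical representation)
        self.bitmap = set()        # blocks the peer is assumed to be missing

    def write(self, block, extent_size=4):
        """Record a write to `block`; extent_size is an arbitrary toy value."""
        if self.connected:
            # Peer receives the write too, so only the extent is marked hot.
            self.activity_log.add(block // extent_size)
        else:
            # Peer is away: remember the block so it can be resynced later.
            self.bitmap.add(block)

    def reconnect(self):
        """Return the blocks to send to the peer (A to B) and clear the bitmap."""
        to_send = sorted(self.bitmap)
        self.bitmap.clear()
        self.connected = True
        return to_send


a = ToyPrimary()
a.write(1)             # connected: extent 0 becomes hot
a.connected = False    # peer goes away, A is now standalone
a.write(9)
a.write(9)             # same block again: the bitmap does not grow
a.write(10)
print(a.reconnect())   # [9, 10]
```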

> I'm puzzled at how recovery and synchronization is occurring after a 
> failure.  In the scenario described by klaus.mailinglists at pernau.at, node 
> B has been down.  So where is the benefit of syncing with it when node A 
> returns to service?  Would that not roll the state of storage back to 
> when node B failed?

When A last saw B, B was online and UpToDate.

> I'm envisioning an extent that is the recipient of multiple writes over 
> the time while B is down.  One is recent - leaving the extent "hot" - 
> when node A goes down.
> 
> If the "hotness" causes that extent to be restored to the state stored on 
> B, would that not roll that extent back several writes?
> 
> I assume that this doesn't actually happen, but I'm trying to understand 
> how the mechanism involved works.

While node B is offline and A is receiving writes, A will not try to
refresh its hot extents from B under any circumstances, if I remember
correctly. At the very least, it will not refresh any sectors that are
already marked in A's bitmap.
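
Read literally, the weaker half of that statement is a set difference. The sketch below only restates the sentence above and is not taken from DRBD's source; the function name and the extent size of 4 blocks are hypothetical. It assumes a hot extent covers a fixed run of blocks and that any block marked in A's bitmap is excluded from a refresh from B.

```python
def blocks_a_might_refresh_from_b(hot_extents, bitmap, extent_size=4):
    """Blocks in A's hot extents that are NOT marked in A's bitmap.

    Toy restatement of the sentence above; extent_size is an arbitrary value.
    """
    hot_blocks = {
        extent * extent_size + offset
        for extent in hot_extents
        for offset in range(extent_size)
    }
    # Anything A has marked in its own bitmap is never taken from B.
    return sorted(hot_blocks - set(bitmap))


# Extent 2 covers blocks 8..11; blocks 9 and 10 were written while B was away.
print(blocks_a_might_refresh_from_b({2}, {9, 10}))  # [8, 11]
```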

Regards,
Felix