[DRBD-user] protocol C replication - unexpected behaviour

Tue Aug 10 08:11:59 CEST 2021

On 10/08/2021 00:01, Digimer wrote:
> On 2021-08-05 5:53 p.m., Janusz Jaskiewicz wrote:
>> Hello.
>>
>> I'm experimenting a bit with DRBD in a cluster managed by Pacemaker.
>> It's a two-node, active-passive cluster, and the service that I'm
>> trying to put in the cluster writes to the file system.
>> The service manages many files; it appends to some of them and
>> increments a single integer value in others.
>>
>> I'm observing surprising behaviour and I would like to ask you if what 
>> I see is expected or not (I think not).
>>
>> I'm using protocol C, but I still see some lag in the files that are
>> replicated to the secondary server.
>> For the files holding an incrementing integer, I see a difference that
>> corresponds roughly to 1 second of traffic.
>>
>> I'm really surprised to see this, as protocol C should guarantee 
>> synchronous replication.
>> I would instead expect some delay in processing (potentially slower
>> disk writes due to the network replication).
>>
>> The way I'm testing it:
>> The service runs on the primary and writes to the DRBD drive; the
>> secondary is connected and "UpToDate".
>> I kill the service abruptly (kill -9) and then take down the network 
>> interface between primary and secondary (kill and ifdown commands in 
>> the script so executed quite promptly one after the other).
>> Then I mount the DRBD drive on both nodes and compare the files with
>> the incrementing integer.
>>
>> I would appreciate any help or pointers on how to fix this.
>> But first of all I would like to confirm that this behaviour is not 
>> expected.
>>
>> Also, if it is expected/allowed, how can I reduce the impact?

> What filesystem are you using? Is it cluster / multi-node aware?

The filesystem may be relevant, in that filesystems can behave in ways
one might not expect depending on how they are tuned, so it would be
good to know what the fs is and which mount options are in use. However,
the filesystem certainly does not need to be aware that the underlying
block device is drbd or a cluster of any kind. A drbd device should look
like a regular block device, and there is no need to treat it as
anything else.
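One possible explanation worth ruling out, which is my assumption and not something established in this thread: protocol C only holds back completion of writes that have actually been submitted to the drbd device. Data the service has written but the kernel still holds as dirty pages in the page cache is not on the device yet; after a kill -9 those pages are flushed later by normal writeback, and if the replication link is already down by then, only the primary gets them. If each increment must be on both nodes, the service has to ask for that explicitly. A minimal Python sketch of a counter writer that does so, using fsync() plus an atomic rename; the function names and the ".tmp" scratch-file convention are hypothetical, not taken from the original service:

```python
import os


def read_counter(path: str) -> int:
    """Return the integer stored in `path`, or 0 if the file does not exist yet."""
    try:
        with open(path, encoding="ascii") as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return 0


def write_counter_durably(path: str, value: int) -> None:
    """Replace the contents of `path` with `value`, fsync'ing before returning."""
    tmp = path + ".tmp"  # hypothetical naming for the scratch file
    with open(tmp, "w", encoding="ascii") as f:
        f.write(f"{value}\n")
        f.flush()             # hand Python's buffered data to the kernel (page cache)
        os.fsync(f.fileno())  # block until the kernel reports it written to the device
    os.replace(tmp, path)     # atomically rename over the old file (POSIX)
    # fsync the containing directory so the rename itself also reaches the device.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def increment(path: str) -> int:
    """Read, increment and durably store the counter; return the new value."""
    value = read_counter(path) + 1
    write_counter_durably(path, value)
    return value


if __name__ == "__main__":
    print(increment("counter.txt"))  # hypothetical file name
```

Calling fsync() on every increment costs throughput, since under protocol C each call waits for the peer's acknowledgement as well as the local disk; grouping several updates per fsync() is the usual trade-off.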

I would like to know the kernel and drbd versions; if these are old
enough, then expecting things to "work" in a sane fashion might not be a
reasonable expectation :-)
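For gathering those versions, something along these lines would do. It is a sketch that assumes a Linux host where a loaded drbd module exposes a "version:" line in /proc/drbd; the exact layout of that file differs between drbd releases, so treat the regex as an assumption. Running uname -r and cat /proc/drbd by hand gives the same information.

```python
import platform
import re
from typing import Optional


def parse_drbd_version(proc_text: str) -> Optional[str]:
    """Extract the module version from the text of /proc/drbd, if present."""
    match = re.search(r"^version:\s*(\S+)", proc_text, re.MULTILINE)
    return match.group(1) if match else None


def gather_versions(proc_path: str = "/proc/drbd") -> dict:
    """Return the running kernel release and the drbd module version (or None)."""
    try:
        with open(proc_path, encoding="utf-8") as f:
            drbd = parse_drbd_version(f.read())
    except FileNotFoundError:
        drbd = None  # module not loaded, or not a Linux host
    return {"kernel": platform.release(), "drbd": drbd}


if __name__ == "__main__":
    print(gather_versions())
```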