[DRBD-user] protocol C replication - unexpected behaviour

Yanni M. gianni.milo22 at gmail.com
Wed Aug 11 21:02:48 CEST 2021


To verify that your assumptions are correct, you could simply tell fstab to
mount the filesystem (XFS) with the sync option enabled. That way, all
I/O to the filesystem will be done synchronously. Alternatively, you could
configure your 'service' to force synchronous writes somehow.
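A minimal sketch of the second option, assuming the service's write path can be changed. You can open the file with `os.O_SYNC`, so that each write completes only after the data has been transferred to the underlying device (here, the DRBD-backed XFS) rather than left in the page cache. Alternatively, you can keep buffered writes and call `os.fsync()` before treating a record as durable. The function names are hypothetical.

```python
import os


def append_sync(path: str, data: bytes) -> None:
    """Hypothetical helper: append data with O_SYNC set on the descriptor.

    With O_SYNC, each os.write() returns only after the data (and the
    metadata needed to read it back) has reached the underlying storage.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_SYNC, 0o644)
    try:
        view = memoryview(data)
        while view:                    # os.write() may write only part of it
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)


def append_then_fsync(path: str, data: bytes) -> None:
    """Hypothetical helper: buffered write, then an explicit fsync()."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()              # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())   # ask the kernel to write the cached pages out
```

For example, `append_sync("/mnt/drbd/service.log", b"record\n")`, where `/mnt/drbd` is an assumed mount point. O_SYNC pays for a flush on every write, so calling `fsync()` once per batch of records is a common compromise.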


On Wed, 11 Aug 2021 at 19:18, Janusz Jaskiewicz <janusz.jaskiewicz at gmail.com>
wrote:

> Looking at
> https://lists.linbit.com/pipermail/drbd-user/2017-September/023601.html
> for an explanation of how protocol C works, it seems that the scenario I
> described above is possible if the data is cached above DRBD before we
> disconnect the secondary node.
>
> My disk-flushes and md-flushes are left at their defaults, so they are 'yes'.
>
> Regards,
> Janusz.
>
> On Wed, 11 Aug 2021 at 01:05, Digimer <lists at alteeve.ca> wrote:
>
>> On 2021-08-10 3:16 p.m., Janusz Jaskiewicz wrote:
>>
>> Hi,
>>
>> Thanks for your answers.
>>
>> Answering your questions:
>> DRBD_KERNEL_VERSION=9.0.25
>>
>> Linux kernel:
>> 4.18.0-305.3.1.el8.x86_6
>>
>> File system type:
>> XFS.
>>
>> So the file system is not cluster-aware, but as far as I understand, that
>> should be OK in an active/passive (single-primary) setup, which is what I
>> have. I just checked the documentation, and it seems to confirm that.
>>
>> I think the problem may come from the way I'm testing it.
>> I came up with the testing scenario described in my first post because I
>> didn't have an easy way to abruptly restart the server.
>> When I do a hard reset of the primary server, it works as expected (or at
>> least I can find a logical explanation for what I see).
>>
>> I think what happened in my previous scenario was:
>> The service is writing to the disk, and some portion of the written data
>> is in a disk cache. As the picture
>> https://linbit.com/wp-content/uploads/drbd/drbd-guide-9_0-en/images/drbd-in-kernel.png
>> shows, the cache is above the DRBD module.
>> Then I kill the service and the network, but some data is still in the
>> cache.
>> At some point the cache is flushed and the data gets written to the disk.
>> DRBD probably reports some error at this point, as it can't send that
>> data to the secondary node (DRBD thinks the other node has left the
>> cluster).
>>
>> When I check the files at this point, I see more data on the primary
>> because it also contains the data from the cache, which was not replicated
>> because the network was down when the data reached DRBD.
>>
>> When I do a hard restart of the server, the data in the cache is lost, so
>> we don't observe the result described above.
>>
>> Does it make sense?
>>
>> Regards,
>> Janusz.
>>
>> OK, from your first post it sounded like you had the FS mounted on both
>> nodes at the same time; that would be a problem. If it's only mounted in
>> one place at a time, then it's OK.
>>
>> As for caching: in protocol C, DRBD on the Secondary reports "write
>> complete" to the Primary only once it has been told that the disk write is
>> complete. So if the cache is _above_ DRBD's kernel module, then that's
>> probably not the problem, because the Secondary won't tell the Primary it's
>> done until it receives the data. If there is a caching issue _below_ DRBD
>> on the Secondary, then it's _possible_ that's the problem, but I doubt it.
>> The reason is that whatever manages the cache below DRBD on the
>> Secondary should know that a given block hasn't been flushed yet and, on a
>> read request, read it from the cache rather than the disk. This is a guess
>> on my part.
>>
>> What are your 'disk { disk-flushes [yes|no]; and md-flushes [yes|no]; }'
>> set to?
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.com/w/
>> "I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
>>
>
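The scenario Janusz describes in the quoted thread (dirty data still cached above DRBD when the network is killed) can be illustrated with a toy model. It is a sketch, not DRBD's API. The class and function names are hypothetical. It also assumes that a disconnected primary keeps accepting writes locally and only remembers which blocks the peer is missing, rather than failing them.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedDevice:
    """Toy stand-in for a protocol C resource (hypothetical, not DRBD's API).

    While connected, a write completes only after local disk and peer have it.
    ASSUMPTION: while disconnected, writes land locally and are marked
    out of sync instead of being rejected.
    """
    local: dict = field(default_factory=dict)
    peer: dict = field(default_factory=dict)
    connected: bool = True
    out_of_sync: set = field(default_factory=set)

    def write(self, block: int, data: str) -> None:
        self.local[block] = data
        if self.connected:
            self.peer[block] = data      # peer holds it before completion
        else:
            self.out_of_sync.add(block)  # peer never receives this block


@dataclass
class WriteBackCache:
    """Toy page cache above the device: write() returns before DRBD sees data."""
    dev: ReplicatedDevice
    dirty: dict = field(default_factory=dict)

    def write(self, block: int, data: str) -> None:
        self.dirty[block] = data  # cached only; nothing sent to the device yet

    def flush(self) -> None:
        for block, data in sorted(self.dirty.items()):
            self.dev.write(block, data)
        self.dirty.clear()

    def hard_reset(self) -> None:
        self.dirty.clear()  # a hard reset loses data that was never flushed


def scenario(hard_reset: bool) -> ReplicatedDevice:
    """Replicate blocks 0 and 1, cut the link while block 2 is still cached."""
    dev = ReplicatedDevice()
    cache = WriteBackCache(dev)
    cache.write(0, "a")
    cache.write(1, "b")
    cache.flush()            # blocks 0 and 1 reach both nodes
    cache.write(2, "c")      # block 2 sits in the cache only
    dev.connected = False    # kill the network
    if hard_reset:
        cache.hard_reset()   # block 2 is simply lost
    else:
        cache.flush()        # block 2 reaches the primary's disk only
    return dev
```

With `scenario(hard_reset=False)` the primary ends up with blocks 0, 1 and 2 while the peer has only 0 and 1, which matches "more data on the primary". With `scenario(hard_reset=True)` both sides hold the same two blocks. Protocol C only governs writes once they reach DRBD; it cannot replicate data that is still cached above it.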

