[DRBD-user] drbd 8.4.7 out-of-sync on disabled host page cache in VM

Michael Hierweck michael at hierweck.de
Tue Apr 23 09:28:13 CEST 2019


On 23.04.19 09:05, Armin Schindler wrote:
> On 20.04.2019 14:38, acs at sysgo.com wrote:
>>> On 13 March 2019 at 11:47 Roland Kammerer <roland.kammerer at linbit.com> wrote:
>>>
>>>
>>> On Tue, Mar 12, 2019 at 09:08:42AM +0100, Armin Schindler wrote:
>>>> On 3/11/19 1:42 PM, Roland Kammerer wrote:
>>>>> On Mon, Mar 11, 2019 at 11:13:11AM +0100, Armin Schindler wrote:
>>>>>> 2 hosts Debian 9 (stretch) with default DRBD version 8.4.7.
>>>>>
>>>>> Please retry with the current 8.4.11 version of DRBD. You can get it from
>>>>> here:
>>>>> https://www.linbit.com/en/drbd-community/drbd-download/
>>>>
>>>> Okay, thanks. I will test 8.4.11.
>>>>
>>>> Do I need to change/update the tools as well or just the kernel driver?
>>>> I currently use drbd-utils 8.9.10.
>>>
>>> They should be fine. I don't remember any non-corner-case fixes for 8.4
>>> in drbd-utils.
>>
>> I tried version 8.4.11 and the problem persists.
>> When using Qemu/KVM virtio disk with a caching mode that uses host page cache,
>> or when using just a filesystem like ext4 (without Qemu/KVM) on the host, the
>> drbd device gets out of sync after a while.

Same here:

LVM (thick) => DRBD => Virtio (cache=none or cache=directsync)

After some weeks of running about 80 VMs on 4 nodes, some of the VM backing devices report
out-of-sync blocks. We are running an active/passive cluster with locally attached storage.
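
For anyone who wants to watch for this on their own nodes, here is a minimal sketch that
pulls the "oos:" (out-of-sync, in KiB) counter per device out of /proc/drbd text. It assumes
the DRBD 8.4 /proc/drbd layout; the sample text and the function names are made up for
illustration and are not output from our cluster. Note that the counter normally only shows
such blocks after an online verify has found them.

```python
import re

# Hypothetical sample in the DRBD 8.4 /proc/drbd layout; all numbers are made up.
SAMPLE = """\
version: 8.4.11 (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1024 nr:0 dw:1024 dr:2048 al:3 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:4096 dw:4096 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:128
"""

# A device block starts with "<minor>: cs:<connection state> ..."
MINOR_RE = re.compile(r"^\s*(\d+): cs:")
# The counter line of each device ends with "oos:<KiB>"
OOS_RE = re.compile(r"\boos:(\d+)")


def parse_oos(text):
    """Return {minor: out-of-sync KiB} for every device found in text."""
    result = {}
    minor = None
    for line in text.splitlines():
        m = MINOR_RE.match(line)
        if m:
            minor = int(m.group(1))
            continue
        m = OOS_RE.search(line)
        if m and minor is not None:
            result[minor] = int(m.group(1))
    return result


def devices_out_of_sync(text):
    """Return the sorted minors whose oos counter is non-zero."""
    return sorted(minor for minor, kib in parse_oos(text).items() if kib > 0)
```

On a real node the text would come from reading /proc/drbd instead of SAMPLE, for example
with open("/proc/drbd").read().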

We were not able to reproduce this behaviour when using cache="writethrough" or cache="writeback".
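
To make the grouping explicit: as far as I read the QEMU documentation, the cache modes break
down into three flags (cache.direct, cache.writeback, cache.no-flush), and the two modes where
we see out-of-sync blocks are exactly the ones with cache.direct=on, i.e. the ones that bypass
the host page cache. A small sketch of that table follows; the helper name is hypothetical,
and the correlation with the DRBD problem is only our observation, not an established cause.

```python
# QEMU cache modes mapped to (cache.direct, cache.writeback, cache.no-flush),
# following the table in the QEMU documentation.
CACHE_MODES = {
    "writeback":    (False, True,  False),
    "none":         (True,  True,  False),
    "writethrough": (False, False, False),
    "directsync":   (True,  False, False),
    "unsafe":       (False, True,  True),
}


def bypasses_host_page_cache(mode):
    """True if the mode opens the backing device with O_DIRECT (cache.direct=on)."""
    direct, _writeback, _no_flush = CACHE_MODES[mode]
    return direct


# Modes that bypass the host page cache; these match the ones affected here.
direct_modes = sorted(m for m in CACHE_MODES if bypasses_host_page_cache(m))
```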

We have been running this setup since 2011/2012. For the first years we were fine, but about
3 years ago we ran into serious trouble because out-of-sync blocks led to damaged file systems
(journals).

The issue was discussed in 2014:

https://lists.gt.net/drbd/users/25227

We love(d) DRBD because of its simplicity and reliability. (Ceph is much more complex...)
However, we wonder whether DRBD can still be considered as "simple and reliable" as it
was some years ago.

Even if the situation was introduced by virtio block driver optimizations some years ago
(no stable pages anymore?), a solution is needed.


Michael


