[DRBD-user] drbd 8.4.7 out-of-sync on disabled host page cache in VM

Armin Schindler acs at sysgo.com
Tue Apr 23 09:46:13 CEST 2019

On 23.04.2019 09:28, Michael Hierweck wrote:
> On 23.04.19 09:05, Armin Schindler wrote:
>> On 20.04.2019 14:38, acs at sysgo.com wrote:
>>>> On 13 March 2019 at 11:47 Roland Kammerer <roland.kammerer at linbit.com> wrote:
>>>> On Tue, Mar 12, 2019 at 09:08:42AM +0100, Armin Schindler wrote:
>>>>> On 3/11/19 1:42 PM, Roland Kammerer wrote:
>>>>>> On Mon, Mar 11, 2019 at 11:13:11AM +0100, Armin Schindler wrote:
>>>>>>> 2 hosts Debian 9 (stretch) with default DRBD version 8.4.7.
>>>>>> Please retry with the current 8.4.11 version of DRBD. You can get it from
>>>>>> here:
>>>>>> https://www.linbit.com/en/drbd-community/drbd-download/
>>>>> Okay, thanks. I will test 8.4.11.
>>>>> Do I need to change/update the tools as well or just the kernel driver?
>>>>> I currently use drbd-utils 8.9.10.
>>>> They should be fine. I don't remember any non-corner-case fixes for 8.4
>>>> in drbd-utils.
>>> I tried version 8.4.11 and the problem persists.
>>> When using Qemu/KVM virtio disk with a caching mode that uses host page cache,
>>> or when using just a filesystem like ext4 (without Qemu/KVM) on the host, the
>>> drbd device gets out of sync after a while.
> Same here:
> LVM (thick) => DRBD => Virtio (cache=none or cache=directsync)
> After some weeks of running about 80 VMs on 4 nodes, some of the VM backings report out of sync
> blocks. We are running an active/passive cluster with locally attached storage.
> We were not able to reproduce this behaviour when using cache="writethrough" or cache="writeback".
> We have been running this setup since 2011/2012. The first years we were fine, but about 3 years ago
> we ran into serious trouble because out-of-sync blocks led to damaged file systems (journals).
> The issue was discussed in 2014:
> https://lists.gt.net/drbd/users/25227
> We love(d) DRBD because of its simplicity and reliability. (Ceph is much more complex...)
> However we wonder whether DRBD can still be considered that kind of "simple and reliable" it
> was some years ago.

It sounds like we have the exact same setup and same problems, but

> Even if the situation might be introduced by virtio block driver optimizations some years ago
> (no stable pages anymore?) a solution is needed.

I don't think it was introduced by virtio block.
When I use the drbd device locally mounted, e.g. for a LXC root-fs, I
can reproduce the out-of-sync as well.
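
For anyone who wants to watch for the condition on their own hosts, here is a
minimal sketch that reads the "oos:" (out-of-sync, in KiB) counter from text in
the DRBD 8.4 /proc/drbd layout. The function names and the sample values are
made up for illustration and are not taken from the systems discussed in this
thread; it assumes each device block starts with "<minor>: cs:" and is followed
by a statistics line that contains "oos:<number>".

```python
import re

# Sample text in the DRBD 8.4 /proc/drbd layout.
# Illustrative values only, not output from the hosts in this thread.
SAMPLE = """\
version: 8.4.11 (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1024 nr:0 dw:1024 dr:2048 al:3 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:4096 dw:4096 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:128
"""


def parse_oos(text):
    """Return {minor: oos_kib} for every device found in the given text."""
    result = {}
    minor = None
    for line in text.splitlines():
        # A device block starts with "<minor>: cs:<connection state> ..."
        header = re.match(r"\s*(\d+):\s+cs:", line)
        if header:
            minor = int(header.group(1))
            continue
        # The statistics line of the current device carries "oos:<KiB>"
        stats = re.search(r"\boos:(\d+)", line)
        if stats and minor is not None:
            result[minor] = int(stats.group(1))
    return result


def out_of_sync_minors(text):
    """Return the sorted minors whose out-of-sync counter is non-zero."""
    return sorted(m for m, kib in parse_oos(text).items() if kib > 0)
```

On a real host the input would come from reading /proc/drbd instead of SAMPLE.
Note that the counter normally only reflects silently diverged blocks after an
online verify run (drbdadm verify) has found them, so a zero here does not by
itself prove that both sides are identical.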

