[DRBD-user] drbd 8.4.7 out-of-sync on disabled host page cache in VM

Michael Hierweck michael at hierweck.de
Tue Apr 23 09:44:46 CEST 2019


On 23.04.19 09:28, Michael Hierweck wrote:
> On 23.04.19 09:05, Armin Schindler wrote:
>> On 20.04.2019 14:38, acs at sysgo.com wrote:
>>>> On 13 March 2019 at 11:47 Roland Kammerer <roland.kammerer at linbit.com> wrote:
>>>>
>>>>
>>>> On Tue, Mar 12, 2019 at 09:08:42AM +0100, Armin Schindler wrote:
>>>>> On 3/11/19 1:42 PM, Roland Kammerer wrote:
>>>>>> On Mon, Mar 11, 2019 at 11:13:11AM +0100, Armin Schindler wrote:
>>>>>>> 2 hosts Debian 9 (stretch) with default DRBD version 8.4.7.
>>>>>>
>>>>>> Please retry with the current 8.4.11 version of DRBD. You can get it from
>>>>>> here:
>>>>>> https://www.linbit.com/en/drbd-community/drbd-download/
>>>>>
>>>>> Okay, thanks. I will test 8.4.11.
>>>>>
>>>>> Do I need to change/update the tools as well or just the kernel driver?
>>>>> I currently use drbd-utils 8.9.10.
>>>>
>>>> They should be fine. I don't remember any non-corner-case fixes for 8.4
>>>> in drbd-utils.
>>>
>>> I tried version 8.4.11 and the problem persists.
>>> When using Qemu/KVM virtio disk with a caching mode that uses host page cache,
>>> or when using just a filesystem like ext4 (without Qemu/KVM) on the host, the
>>> drbd device gets out of sync after a while.
> 
> Same here:
> 
> LVM (thick) => DRBD => Virtio (cache=none or cache=directsync)
> 
> After some weeks of running about 80 VMs on 4 nodes, some of the VM backing devices report
> out-of-sync blocks. We are running an active/passive cluster with locally attached storage.
> 
> We were not able to reproduce this behaviour when using cache="writethrough" or cache="writeback".
> 
> We have been running this setup since 2011/2012. For the first few years we were fine, but about
> 3 years ago we ran into serious trouble because out-of-sync blocks led to damaged file systems
> (journals).
> 
> The issue was discussed in 2014:
> 
> https://lists.gt.net/drbd/users/25227
> 
> We love(d) DRBD because of its simplicity and reliability. (Ceph is much more complex...)
> However, we wonder whether DRBD can still be considered as "simple and reliable" as it
> was some years ago.
> 
> Even if the situation might have been introduced by virtio block driver optimizations some years
> ago (no stable pages anymore?), a solution is needed.

Remark:

I would expect stacks such as "rbd => virtio" and maybe "ZFS ZVOL => virtio" to require stable
pages, too, for the same reason as DRBD does: checksum calculation.

http://lkml.iu.edu/hypermail/linux/kernel/1511.0/04497.html
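
To illustrate what I mean by "stable pages", here is a toy Python sketch. It is not DRBD code.
It only assumes that the same in-memory page is read twice (once for the checksum / local
write, once for the copy sent to the peer) and that the guest may modify the page in between.
The function name and the "mutate_in_flight" flag are made up for this example.

```python
import zlib

PAGE_SIZE = 4096

def replicate(page: bytearray, mutate_in_flight: bool):
    """Toy model: read the same page twice, as a replicating block layer might.

    Returns (checksum_ok, replicas_identical).
    """
    checksum = zlib.crc32(page)      # checksum computed over the page
    local_copy = bytes(page)         # data written to the local disk

    if mutate_in_flight:
        # Hypothetical guest behaviour: the page is rewritten while the
        # I/O is still in flight, i.e. the page is not "stable".
        page[0] ^= 0xFF

    peer_copy = bytes(page)          # data read again and sent to the peer

    checksum_ok = zlib.crc32(peer_copy) == checksum
    replicas_identical = local_copy == peer_copy
    return checksum_ok, replicas_identical

stable = replicate(bytearray(b"A" * PAGE_SIZE), mutate_in_flight=False)
unstable = replicate(bytearray(b"A" * PAGE_SIZE), mutate_in_flight=True)
print("stable page:  ", stable)
print("unstable page:", unstable)
```

With a stable page both reads see the same bytes. With an unstable page the checksum no longer
matches what is sent and the two replicas differ, which in this toy model corresponds to an
out-of-sync block.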