[DRBD-user] drbd 8.4.7 out-of-sync on disabled host page cache in VM

Armin Schindler acs at sysgo.com
Tue May 14 12:30:42 CEST 2019


On 4/23/19 9:46 AM, Armin Schindler wrote:
> On 23.04.2019 09:28, Michael Hierweck wrote:
>> On 23.04.19 09:05, Armin Schindler wrote:
>>> On 20.04.2019 14:38, acs at sysgo.com wrote:
>>>>> On 13 March 2019 at 11:47 Roland Kammerer <roland.kammerer at linbit.com> wrote:
>>>>>
>>>>>
>>>>> On Tue, Mar 12, 2019 at 09:08:42AM +0100, Armin Schindler wrote:
>>>>>> On 3/11/19 1:42 PM, Roland Kammerer wrote:
>>>>>>> On Mon, Mar 11, 2019 at 11:13:11AM +0100, Armin Schindler wrote:
>>>>>>>> 2 hosts Debian 9 (stretch) with default DRBD version 8.4.7.
>>>>>>>
>>>>>>> Please retry with the current 8.4.11 version of DRBD. You can get it from
>>>>>>> here:
>>>>>>> https://www.linbit.com/en/drbd-community/drbd-download/
>>>>>>
>>>>>> Okay, thanks. I will test 8.4.11.
>>>>>>
>>>>>> Do I need to change/update the tools as well or just the kernel driver?
>>>>>> I currently use drbd-utils 8.9.10.
>>>>>
>>>>> They should be fine. I don't remember any non-corner-case fixes for 8.4
>>>>> in drbd-utils.
>>>>
>>>> I tried version 8.4.11 and the problem persists.
>>>> When using Qemu/KVM virtio disk with a caching mode that uses host page cache,
>>>> or when using just a filesystem like ext4 (without Qemu/KVM) on the host, the
>>>> drbd device gets out of sync after a while.
>>
>> Same here:
>>
>> LVM (thick) => DRBD => Virtio (cache=none or cache=directsync)
>>
>> After some weeks of running about 80 VMs on 4 nodes, some of the VM backings report out of sync
>> blocks. We are running an active/passive cluster with locally attached storage.
>>
>> We were not able to reproduce this behaviour when using cache="writethrough" or cache="writeback".
>>
>> We have been running this setup since 2011/2012. For the first years we were fine, but about 3 years ago
>> we ran into serious trouble because out-of-sync blocks led to damaged file systems (journals).
>>
>> The issue was discussed in 2014:
>>
>> https://lists.gt.net/drbd/users/25227
>>
>> We love(d) DRBD because of its simplicity and reliability. (Ceph is much more complex...)
>> However, we wonder whether DRBD can still be considered as "simple and reliable" as it
>> was some years ago.
> 
> It sounds like we have the exact same setup and same problems, but
> 
>> Even if the situation might be introduced by virtio block driver optimizations some years ago
>> (no stable pages anymore?) a solution is needed.
> 
> I don't think it was introduced by virtio block.
> When I use the drbd device locally mounted, e.g. for a LXC root-fs, I
> can reproduce the out-of-sync as well.

Is there something else we can test?
Could a config setting be causing this? We mostly use the defaults.

Any help is welcome.

Armin

