[DRBD-user] 8 Zettabytes out-of-sync?

Jarno Elonen elonen at iki.fi
Fri Nov 2 09:45:16 CET 2018


More clues:

Just witnessed a resync (after invalidate) steadily go from 100%
out-of-sync to 0% (after several automatic disconnects and reconnects).
Immediately after reaching 0%, it jumped to -<very-large-number>%!
After that, drbdtop started showing 8.0ZiB out-of-sync.

Looks like a severe wrap-around bug.
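A quick arithmetic check supports the wrap-around reading of the counters in the status output quoted further down: rs-in-flight:18446744073709549808 is exactly 2^64 - 1808, i.e. -1808 when read as a signed 64-bit integer. The Python sketch below reproduces that, plus one possible explanation for out-of-sync:9223372036854774304: a counter kept in 512-byte sectors that underflowed to -3008 and was then halved to KiB. `to_signed64` is a hypothetical helper, and the sector-counter theory is an assumption that fits the numbers, not something confirmed from the DRBD sources.

```python
# Sketch: reinterpret the suspicious counters as 64-bit two's-complement
# values. Assumption: the kernel stores them as 64-bit integers and
# prints them unsigned.
U64 = 1 << 64


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed one."""
    return value - U64 if value >= (1 << 63) else value


rs_in_flight = 18446744073709549808
print(to_signed64(rs_in_flight))  # -1808

# Hypothesis (unconfirmed): an out-of-sync counter in 512-byte sectors
# underflows to -3008, and the unsigned value is halved to get KiB.
sectors = (-3008) % U64          # what an unsigned 64-bit counter would hold
out_of_sync_kib = sectors >> 1   # sectors -> KiB
print(out_of_sync_kib)           # 9223372036854774304

# KiB -> ZiB: 1 ZiB = 2**70 bytes = 2**60 KiB
print(f"{out_of_sync_kib / 2**60:.1f} ZiB")  # 8.0 ZiB
```

Halving 2^64 - 3008 lands exactly on 2^63 - 1504, which would explain why the figure sits just under 2^63 KiB (8 ZiB) rather than near 2^64.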

-Jarno


On Thu, 1 Nov 2018 at 22:30, Jarno Elonen <elonen at iki.fi> wrote:

> Here's some more info.
> dmesg shows some suspicious-looking log messages, such as:
>
> 1) FIXME drbd_s_vm-117-s[2830] op clear, bitmap locked for 'receive bitmap' by drbd_r_vm-117-s[5038]
>
> 2) Wrong magic value 0xffff0007 in protocol version 114
>
> 3) peer request with dagtag 399201392 not found
> got_peer_ack [drbd] failed
>
> 4) Rejecting concurrent remote state change 2226202936 because of state change 2923939731
> Ignoring P_TWOPC_ABORT packet 2226202936.
>
> 5) drbd_r_vm-117-s[5038] going to 'detect_finished_resyncs()' but bitmap already locked for 'write from resync_finished' by drbd_w_vm-117-s[2812]
> md_sync_timer expired! Worker calls drbd_md_sync().
>
> 6) incompatible discard-my-data settings
> conn( Connecting -> Disconnecting )
> error receiving P_PROTOCOL, e: -5 l: 7!
>
> Two of the four nodes run DRBD 9.0.15-1 and two run 9.0.16-1. All of
> them use transport API version 16:
>
> == mox-a ==
> version: 9.0.15-1 (api:2/proto:86-114)
> GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root at mox-a, 2018-10-28 03:08:58
> Transports (api:16): tcp (9.0.15-1)
>
> == mox-b ==
> version: 9.0.15-1 (api:2/proto:86-114)
> GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root at mox-b, 2018-10-10 17:50:25
> Transports (api:16): tcp (9.0.15-1)
>
> == mox-c ==
> version: 9.0.16-1 (api:2/proto:86-114)
> GIT-hash: ab9777dfeaf9d619acc9a5201bfcae8103e9529c build by root at mox-c, 2018-10-28 05:45:05
> Transports (api:16): tcp (9.0.16-1)
>
> == mox-d ==
> version: 9.0.16-1 (api:2/proto:86-114)
> GIT-hash: ab9777dfeaf9d619acc9a5201bfcae8103e9529c build by root at mox-d, 2018-10-29 00:22:23
> Transports (api:16): tcp (9.0.16-1)
>
> Running Proxmox (5.2-2), as you'd guess from the host names. DRBD
> resources are managed by LINSTOR.
>
>
> On Thu, 1 Nov 2018 at 17:32, Jarno Elonen <elonen at iki.fi> wrote:
>
>> Okay, today one of these resources suffered sudden, severe filesystem
>> corruption on the primary.
>>
>> On the other hand, the secondaries (which showed 8ZiB out-of-sync) were
>> still mountable after I disconnected the corrupted primary. No idea how
>> current the secondaries' data was, but drbdtop still showed them as
>> connected with 8ZiB out-of-sync.
>>
>> This is getting quite worrisome. Is anyone else experiencing this with
>> DRBD 9? Is something really wrong in my setup, or are there perhaps some
>> known instabilities in DRBD 9.0.15-1?
>>
>> -Jarno
>>
>>
>> On Wed, 31 Oct 2018 at 20:46, Jarno Elonen <elonen at iki.fi> wrote:
>>
>>> I've got several DRBD 9 resources that constantly show *UpToDate* with
>>> 9223372036854774304 KiB (just under 8ZiB) of OutOfDate data.
>>>
>>> Any idea what might cause this and how to fix it?
>>>
>>> Example:
>>>
>>> # drbdsetup status --verbose --statistics vm-106-disk-1
>>> vm-106-disk-1 node-id:0 role:Primary suspended:no
>>>     write-ordering:flush
>>>   volume:0 minor:1003 disk:UpToDate quorum:yes
>>>       size:16777688 read:215779 written:22369564 al-writes:89 bm-writes:0 upper-pending:0
>>>       lower-pending:0 al-suspended:no blocked:no
>>>   mox-a node-id:1 connection:Connected role:Secondary congested:no ap-in-flight:0
>>>       rs-in-flight:18446744073709549808
>>>     volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
>>>         received:215116 sent:22368903 out-of-sync:9223372036854774304 pending:0 unacked:0
>>>   mox-c node-id:2 connection:Connected role:Secondary congested:no ap-in-flight:0
>>>       rs-in-flight:18446744073709549808
>>>     volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
>>>         received:1188 sent:19884428 out-of-sync:0 pending:0 unacked:0
>>>
>>> Version info:
>>> version: 9.0.15-1 (api:2/proto:86-114)
>>> GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root at mox-b, 2018-10-10 17:50:25
>>> Transports (api:16): tcp (9.0.15-1)
>>>
>>> -Jarno
>>>
>>>
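Because these wrapped counters coexist with an UpToDate disk state, they are easy to miss. A monitoring check could scan `drbdsetup status --verbose --statistics` output for counters no real device could reach. Below is a minimal Python sketch, assuming the key:value text format shown in the status output quoted above. The threshold (1 ZiB expressed in KiB) and the `suspicious_counters` name are hypothetical choices, not DRBD features, and the code only parses text; it does not talk to DRBD.

```python
import re

# Heuristic threshold (an assumption, not a DRBD constant): 2**60 KiB is
# 1 ZiB, far beyond any real device, so a counter at or above it almost
# certainly holds an unsigned 64-bit value that underflowed below zero.
WRAP_THRESHOLD = 1 << 60

# Matches "key:number" pairs such as "out-of-sync:1504" or "rs-in-flight:0".
COUNTER_RE = re.compile(r"([a-z][a-z-]*):(\d+)\b")


def suspicious_counters(status_text, threshold=WRAP_THRESHOLD):
    """Return (line_number, key, value) for every numeric counter >= threshold."""
    hits = []
    for lineno, line in enumerate(status_text.splitlines(), start=1):
        for key, raw in COUNTER_RE.findall(line):
            value = int(raw)
            if value >= threshold:
                hits.append((lineno, key, value))
    return hits


# Excerpt of the status output quoted in this thread, with the mail
# client's line wrapping undone.
SAMPLE = """\
vm-106-disk-1 node-id:0 role:Primary suspended:no
  volume:0 minor:1003 disk:UpToDate quorum:yes
      size:16777688 read:215779 written:22369564 al-writes:89 bm-writes:0 upper-pending:0
  mox-a node-id:1 connection:Connected role:Secondary congested:no ap-in-flight:0
      rs-in-flight:18446744073709549808
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:215116 sent:22368903 out-of-sync:9223372036854774304 pending:0 unacked:0
"""

if __name__ == "__main__":
    for lineno, key, value in suspicious_counters(SAMPLE):
        print(f"line {lineno}: {key}={value}")
```

Run against the excerpt, it flags rs-in-flight on line 5 and out-of-sync on line 7. A threshold of 2^63 might look more natural, but it would miss out-of-sync, which sits 1504 KiB below it.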