[DRBD-user] 8 Zettabytes out-of-sync?

Jarno Elonen elonen at iki.fi
Thu Nov 1 21:30:59 CET 2018


Here's some more info.
Dmesg shows some suspicious-looking log messages, such as:

1) FIXME drbd_s_vm-117-s[2830] op clear, bitmap locked for 'receive bitmap'
by drbd_r_vm-117-s[5038]

2) Wrong magic value 0xffff0007 in protocol version 114

3) peer request with dagtag 399201392 not found
got_peer_ack [drbd] failed

4) Rejecting concurrent remote state change 2226202936 because of state
change 2923939731
Ignoring P_TWOPC_ABORT packet 2226202936.

5) drbd_r_vm-117-s[5038] going to 'detect_finished_resyncs()' but bitmap
already locked for 'write from resync_finished' by drbd_w_vm-117-s[2812]
md_sync_timer expired! Worker calls drbd_md_sync().

6) incompatible discard-my-data settings
conn( Connecting -> Disconnecting )
error receiving P_PROTOCOL, e: -5 l: 7!

Two of the four nodes run DRBD 9.0.15-1 and two run 9.0.16-1. All of them
report transport API version 16:

== mox-a ==
version: 9.0.15-1 (api:2/proto:86-114)
GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root at mox-a,
2018-10-28 03:08:58
Transports (api:16): tcp (9.0.15-1)

== mox-b ==
version: 9.0.15-1 (api:2/proto:86-114)
GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root at mox-b,
2018-10-10 17:50:25
Transports (api:16): tcp (9.0.15-1)

== mox-c ==
version: 9.0.16-1 (api:2/proto:86-114)
GIT-hash: ab9777dfeaf9d619acc9a5201bfcae8103e9529c build by root at mox-c,
2018-10-28 05:45:05
Transports (api:16): tcp (9.0.16-1)

== mox-d ==
version: 9.0.16-1 (api:2/proto:86-114)
GIT-hash: ab9777dfeaf9d619acc9a5201bfcae8103e9529c build by root at mox-d,
2018-10-29 00:22:23
Transports (api:16): tcp (9.0.16-1)

Running Proxmox (5.2-2), as you'd guess from the host names. The DRBD
resources are managed by LINSTOR.
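
For what it's worth, the huge counters in the status output quoted below look like small negative numbers that wrapped around in an unsigned 64-bit field, rather than real amounts of data. Here is a minimal Python sketch of that arithmetic. The helper names are made up for illustration, and I'm assuming that drbdsetup reports out-of-sync in KiB; this only checks the numbers and does not explain where they come from.

```python
# Sanity-check the counters from the quoted `drbdsetup status` output.
# Assumption: out-of-sync is reported in KiB (not bytes).

U64 = 2**64


def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    return value - U64 if value >= 2**63 else value


def kib_to_zib(kib):
    """Convert KiB to ZiB (1 ZiB = 2**70 bytes = 2**60 KiB)."""
    return kib / 2**60


rs_in_flight = 18446744073709549808  # value shown for both peers
out_of_sync = 9223372036854774304    # value shown for peer mox-a

# rs-in-flight reads as a small negative number when taken as signed.
print(as_signed64(rs_in_flight))     # -1808

# out-of-sync sits just below 2**63, which is 8 ZiB if the unit is KiB.
print(2**63 - out_of_sync)           # 1504
print(kib_to_zib(out_of_sync))       # 8.0 (to float precision)
```

So rs-in-flight equals 2^64 - 1808 and out-of-sync equals 2^63 - 1504, which is why I suspect a counter underflow somewhere instead of 8 ZiB of genuinely out-of-sync data.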


On Thu, 1 Nov 2018 at 17:32, Jarno Elonen <elonen at iki.fi> wrote:

> Okay, today one of these resources got a sudden, severe filesystem
> corruption on the primary.
>
> On the other hand, the secondaries (that showed 8ZiB out-of-sync) were
> still mountable after I disconnected the corrupted primary. No idea how
> current the data on the secondaries was, but drbdtop still showed them as
> connected and 8ZiB out-of-sync.
>
> This is getting quite worrisome. Is anyone else experiencing this with
> DRBD 9? Is something really wrong in my setup, or are there perhaps some
> known instabilities in DRBD 9.0.15-1?
>
> -Jarno
>
>
> On Wed, 31 Oct 2018 at 20:46, Jarno Elonen <elonen at iki.fi> wrote:
>
>> I've got several DRBD 9 resources that constantly show *UpToDate* with
>> 9223372036854774304 KiB (roughly 8ZiB) of out-of-sync data.
>>
>> Any idea what might cause this and how to fix it?
>>
>> Example:
>>
>> # drbdsetup status --verbose --statistics vm-106-disk-1
>> vm-106-disk-1 node-id:0 role:Primary suspended:no
>>     write-ordering:flush
>>   volume:0 minor:1003 disk:UpToDate quorum:yes
>>       size:16777688 read:215779 written:22369564 al-writes:89 bm-writes:0
>> upper-pending:0
>>       lower-pending:0 al-suspended:no blocked:no
>>   mox-a node-id:1 connection:Connected role:Secondary congested:no
>> ap-in-flight:0
>>       rs-in-flight:18446744073709549808
>>     volume:0 replication:Established peer-disk:UpToDate
>> resync-suspended:no
>>         received:215116 sent:22368903 out-of-sync:9223372036854774304
>> pending:0 unacked:0
>>   mox-c node-id:2 connection:Connected role:Secondary congested:no
>> ap-in-flight:0
>>       rs-in-flight:18446744073709549808
>>     volume:0 replication:Established peer-disk:UpToDate
>> resync-suspended:no
>>         received:1188 sent:19884428 out-of-sync:0 pending:0 unacked:0
>>
>> Version info:
>> version: 9.0.15-1 (api:2/proto:86-114)
>> GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root at mox-b,
>> 2018-10-10 17:50:25
>> Transports (api:16): tcp (9.0.15-1)
>>
>> -Jarno
>>
>>