Unsynced blocks if replication is interrupted during initial sync

Joel Colledge joel.colledge at linbit.com
Tue Mar 19 09:56:58 CET 2024


Hi Tim,

Thanks for the report and your previous one with the subject line
"Blocks fail to sync if connection lost during initial sync". This
does look like a bug.

I can reproduce the issue. It appears to be a regression introduced in
drbd-9.1.16 / drbd-9.2.5 by the following commit:
f647d1f7fff6 drbd: set bitmap UUID when setting bitmap slot

The issue only occurs when there is a spare bitmap slot. That is, when
create-md was run with --max-peers=2 or a higher value.

Based on a brief look at the logs, the root cause seems to be that the
day0 bitmap is unexpectedly copied after reconnecting during the
initial sync.

A workaround is to bring up the node whose data is to be preserved as
follows:
$ drbdadm up <res>
$ drbdadm invalidate <res>
$ drbdadm primary --force <res>
This causes all bits in the day0 bitmap to be set, so that when it is
unexpectedly used, a full sync is performed. This only needs to be
done at the very start. No additional steps should be required when
bringing up additional nodes later.
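
As a rough sketch only, the three commands could be wrapped in a small
shell function so that a later step is skipped if an earlier one
fails. The function name first_node_bringup and the DRBDADM override
variable are made up for illustration and are not part of DRBD; only
the three drbdadm invocations come from the steps above.

```shell
# Hypothetical helper: run the workaround steps in order for one resource.
# DRBDADM is an illustrative override (not a DRBD feature); setting it to
# "echo" prints the commands instead of running them.
first_node_bringup() {
    res="$1"
    drbdadm_cmd="${DRBDADM:-drbdadm}"

    # Chain with && so that a failed step stops the sequence.
    "$drbdadm_cmd" up "$res" &&
    "$drbdadm_cmd" invalidate "$res" &&
    "$drbdadm_cmd" primary --force "$res"
}
```

Running "DRBDADM=echo first_node_bringup <res>" previews the sequence
without touching the resource.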

An alternative workaround would be to reduce the max-peers value.
However, it is good practice to keep some spare peer slots for
potential future usage, so this is not ideal.

Best regards,
Joel

