Unsynced blocks if replication is interrupted during initial sync

Tim Westbrook Tim_Westbrook at selinc.com
Wed Apr 17 21:11:20 CEST 2024


Philipp,

Thanks again.

Is there a way to tell which issues were addressed between version 9.2.4 and the current version?

We may not be able to delay our release until 9.2.9, and we would like to understand as much as possible about what else we may need to be concerned with.

Cheers,
Tim


From: Philipp Reisner <philipp.reisner at linbit.com>
Sent: Thursday, April 4, 2024 1:06 PM
To: Tim Westbrook <Tim_Westbrook at selinc.com>
Cc: drbd-user at lists.linbit.com <drbd-user at lists.linbit.com>
Subject: Re: Unsynced blocks if replication is interrupted during initial sync
 
Hello Tim,

We were able to write a reproducer test case and fix this regression
with this commit:
https://github.com/LINBIT/drbd/commit/be9a404134acc3d167e8a7e60adce4f1910a4893

This commit will go into the drbd-9.1.20 and drbd-9.2.9 releases.

best regards,
 Philipp

On Fri, Mar 22, 2024 at 1:49 AM Tim Westbrook <Tim_Westbrook at selinc.com> wrote:
>
>
>
> Thank you
>
>
> So if the message "Copying bitmap of peer node_id=0" on reconnect after an interruption indicates the issue, then the issue still exists for me.
>
> I am able to dump the metadata, but not sure it is very useful at this point...
>
> I have not tried invalidating it after a mount/unmount, nor have I tried invalidating it after adding a node, but we were trying to avoid unmounting once configured.
>
> Would you recommend against going back to a release version prior to this change?
>
> Is there any other information I can provide that would help? Could I dump the metadata at some point to show the expected/unexpected state?
>
> Latest flow is below
>
> Thank you so much for your assistance,
> Tim
>
> 1. /dev/vg/persist mounted directly without drbd
> 2. Enable DRBD by creating a single node configuration file
> 3. Reboot
> 4. Create metadata on separate disk (--max-peers=5)
> 5. drbdadm up persist
> 6. drbdadm invalidate persist
> 7. drbdadm primary --force persist
> 8. drbdadm down persist
> 9. drbdadm up persist
> 10. drbdadm invalidate persist*
> 11. drbdadm primary --force persist
> 12. mount /dev/drbd0 to /persist
> 13. start using that mount point
> 14. some time later
> 15. Modify configuration to add new target backup node
> 16. Copy config to remote node and reboot, it will restart in secondary
> 17. drbdadm adjust persist (on primary)
> 18. secondary comes up and initial sync starts
> 19. stop at 50% by disabling network interface
> 20. re-enable network interface
> 21. sync completes right away - node-id 0 message here
> 22. drbdadm verify persist - fails many blocks
>
>
>
>
> From: Joel Colledge <joel.colledge at linbit.com>
> Sent: Wednesday, March 20, 2024 12:02 AM
> To: Tim Westbrook <Tim_Westbrook at selinc.com>
> Cc: drbd-user at lists.linbit.com <drbd-user at lists.linbit.com>
> Subject: Re: Unsynced blocks if replication is interrupted during initial sync
>
> > We are still seeing the issue as described but perhaps I am not putting the invalidate
> > at the right spot
> >
> > Note - I've added it at step 6 below, but I'm wondering if it should be after
> > the additional node is configured and adjusted (in which case I would need to
> > unmount as apparently you can't invalidate a disk in use)
> >
> > So do I need to invalidate after every node is added?
>
> With my reproducer, the workaround at step 6 works.
>
> > Also note, the node-id in the kernel logs is 0, but the peers are configured with 1 and 2.
> > Is this an issue, or are they separate IDs?
>
> I presume you are referring to the line:
> "Copying bitmap of peer node_id=0"
> The reason that node ID 0 appears here is that DRBD stores a bitmap of
> the blocks that have changed since it was first brought up. This is
> the "day0" bitmap. This is stored in all unused bitmap slots. All
> unused node IDs point to one of these bitmaps. In this case, node ID 0
> is unused. So this line means that it is using the day0 bitmap here.
> This is unexpected, as mentioned in my previous reply.
>
> Joel
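
Joel's description of the day0 bitmap can be pictured with a small toy
model. This is only a sketch under stated assumptions, not DRBD's real data
structures or code: the class ToyBitmapStore and all of its methods are
hypothetical names, block numbers stand in for bitmap bits, and the idea
that copying the day0 bitmap over a peer's bitmap is what cuts a resync
short is an interpretation of this thread, not confirmed DRBD behaviour.

```python
class ToyBitmapStore:
    """Toy model (hypothetical, not DRBD code) of per-peer and day0 bitmaps."""

    def __init__(self, n_blocks, peer_node_ids):
        self.n_blocks = n_blocks
        # day0 bitmap: blocks changed since the device was first brought up.
        self.day0 = set()
        # One bitmap per configured peer: blocks that peer still needs.
        self.peer_bitmaps = {nid: set() for nid in peer_node_ids}

    def bitmap_for(self, node_id):
        # Every unused node ID points at the day0 bitmap.
        return self.peer_bitmaps.get(node_id, self.day0)

    def write(self, block):
        # A local write is recorded in day0 and in every peer bitmap.
        self.day0.add(block)
        for bitmap in self.peer_bitmaps.values():
            bitmap.add(block)

    def start_full_sync(self, node_id):
        # Initial sync: the peer needs every block.
        self.peer_bitmaps[node_id] = set(range(self.n_blocks))

    def mark_synced(self, node_id, blocks):
        # Blocks successfully sent to the peer are cleared from its bitmap.
        self.peer_bitmaps[node_id] -= set(blocks)

    def copy_bitmap(self, from_node_id, to_node_id):
        # Overwrite one peer's bitmap with a copy of another bitmap.
        self.peer_bitmaps[to_node_id] = set(self.bitmap_for(from_node_id))


store = ToyBitmapStore(n_blocks=100, peer_node_ids=[1, 2])
store.write(3)
store.write(7)

store.start_full_sync(1)
store.mark_synced(1, range(50))          # interrupted at 50%
remaining_before = len(store.bitmap_for(1))

store.copy_bitmap(0, 1)                  # node ID 0 is unused, so this is day0
remaining_after = len(store.bitmap_for(1))

print("still to sync before copy:", remaining_before)
print("still to sync after copy: ", remaining_after)
```

In this toy model the peer's bitmap shrinks to the two blocks recorded in
day0, so blocks 50 to 99 would never be sent even though the resync appears
to finish. That matches the symptoms reported in steps 21 and 22 of the
flow, but only as an illustration.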

