Usynced blocks if replication is interrupted during initial sync
Tim Westbrook
Tim_Westbrook at selinc.com
Tue Apr 16 04:25:33 CEST 2024
Thank you!
________________________________
From: Philipp Reisner <philipp.reisner at linbit.com>
Sent: Thursday, April 4, 2024 1:06 PM
To: Tim Westbrook <Tim_Westbrook at selinc.com>
Cc: drbd-user at lists.linbit.com <drbd-user at lists.linbit.com>
Subject: Re: Usynced blocks if replication is interrupted during initial sync
Hello Tim,
We were able to write a reproducer test case and fix this regression
with this commit:
https://github.com/LINBIT/drbd/commit/be9a404134acc3d167e8a7e60adce4f1910a4893
This commit will go into the drbd-9.1.20 and drbd-9.2.9 releases.
best regards,
Philipp
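
For anyone checking whether an installed version carries the fix, here is a minimal Python sketch based only on the statement above that the commit goes into drbd-9.1.20 and drbd-9.2.9. The function name `contains_fix_commit` and the handling of other branches are assumptions made for illustration: branches newer than 9.2 are assumed to include the commit, and older branches are reported as not containing it, which says nothing about whether they are affected by the regression.

```python
import re

# First release in each branch that is said to contain the fix commit.
FIXED_IN = {(9, 1): 20, (9, 2): 9}


def parse_version(text):
    """Extract (major, minor, patch) from a string such as '9.1.20' or '9.2.9-1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not a DRBD version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def contains_fix_commit(version_text):
    """Return True if the given version should include the fix commit."""
    major, minor, patch = parse_version(version_text)
    branch = (major, minor)
    if branch in FIXED_IN:
        return patch >= FIXED_IN[branch]
    # Assumption: branches newer than 9.2 include the commit; older ones do not.
    return branch > max(FIXED_IN)
```

For example, `contains_fix_commit("9.1.19")` is false and `contains_fix_commit("9.2.9")` is true.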
On Fri, Mar 22, 2024 at 1:49 AM Tim Westbrook <Tim_Westbrook at selinc.com> wrote:
>
>
>
> Thank you
>
>
> So if the message "Copying bitmap of peer node_id=0" on reconnect after an interruption indicates the issue, then the issue still exists for me.
>
> I am able to dump the metadata, but not sure it is very useful at this point...
>
> I have not tried invalidating it after a mount/unmount, nor have I tried invalidating it after adding a node, but we were trying to avoid unmounting once configured.
>
> Would you recommend against going back to a release version prior to this change?
>
> Is there any other information I can provide that would help? Could I dump the metadata at some point to show the expected/unexpected state?
>
> Latest flow is below
>
> Thank you so much for your assistance,
> Tim
>
> 1. /dev/vg/persist mounted directly without drbd
> 2. Enable DRBD by creating a single node configuration file
> 3. Reboot
> 4. Create metadata on separate disk (--max-peers=5)
> 5. drbdadm up persist
> 6. drbdadm invalidate persist
> 7. drbdadm primary --force persist
> 8. drbdadm down persist
> 9. drbdadm up persist
> 10. drbdadm invalidate persist*
> 11. drbdadm primary --force persist
> 12. mount /dev/drbd0 to /persist
> 13. start using that mount point
> 14. some time later
> 15. Modify configuration to add new target backup node
> 16. Copy config to remote node and reboot, it will restart in secondary
> 17. drbdadm adjust persist (on primary)
> 18. secondary comes up and initial sync starts
> 19. stop at 50% by disabling network interface
> 20. re-enable network interface
> 21. sync completes right away - node-id 0 message here
> 22. drbdadm verify persist - fails many blocks
>
>
>
>
> From: Joel Colledge <joel.colledge at linbit.com>
> Sent: Wednesday, March 20, 2024 12:02 AM
> To: Tim Westbrook <Tim_Westbrook at selinc.com>
> Cc: drbd-user at lists.linbit.com <drbd-user at lists.linbit.com>
> Subject: Re: Usynced blocks if replication is interrupted during initial sync
>
> > We are still seeing the issue as described but perhaps I am not putting the invalidate
> > at the right spot
> >
> > Note - I've added it at step 6 below, but I'm wondering if it should be after
> > the additional node is configured and adjusted (in which case I would need to
> > unmount as apparently you can't invalidate a disk in use)
> >
> > So do I need to invalidate after every node is added?
>
> With my reproducer, the workaround at step 6 works.
>
> > Also note, the node-id in the kernel logs is 0, but the peers are configured with 1 and 2.
> > Is this an issue, or are they separate IDs?
>
> I presume you are referring to the line:
> "Copying bitmap of peer node_id=0"
> The reason that node ID 0 appears here is that DRBD stores a bitmap of
> the blocks that have changed since it was first brought up. This is
> the "day0" bitmap. This is stored in all unused bitmap slots. All
> unused node IDs point to one of these bitmaps. In this case, node ID 0
> is unused. So this line means that it is using the day0 bitmap here.
> This is unexpected, as mentioned in my previous reply.
>
> Joel
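
The day0 bitmap explanation above can be pictured with a small toy model in Python. This is only an illustrative sketch, not DRBD's actual data structures: the class name `ToyDay0Bitmaps`, the range of six node IDs, and the use of a Python set of block numbers as a "bitmap" are all assumptions. It shows only that every unused node ID resolves to the same record of blocks changed since the device was first brought up.

```python
class ToyDay0Bitmaps:
    """Toy illustration only; not DRBD's real on-disk or in-kernel layout."""

    def __init__(self, node_ids, used_ids):
        self.used_ids = set(used_ids)
        # Node IDs that are not in use; they all share the day0 bitmap.
        self.unused_ids = [n for n in node_ids if n not in self.used_ids]
        # Blocks changed since the device was first brought up.
        self.day0 = set()

    def record_write(self, block):
        """Mark a block as changed in the day0 bitmap."""
        self.day0.add(block)

    def bitmap_for(self, node_id):
        """Return the day0 bitmap if node_id is unused, else None."""
        if node_id in self.unused_ids:
            return self.day0
        return None  # bitmaps for node IDs in use are not modelled here


# Hypothetical setup: node IDs 1 and 2 are in use, as in the thread.
bitmaps = ToyDay0Bitmaps(node_ids=range(6), used_ids={1, 2})
for block in (3, 7, 7, 42):
    bitmaps.record_write(block)
```

With node IDs 1 and 2 in use, node ID 0 is unused in this model, so `bitmaps.bitmap_for(0)` returns the day0 record, the same object that every other unused ID returns. That matches the reading of the "node_id=0" log line given above.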