<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>


</head>


<body dir="ltr">


<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">


Thank you!</div>


<div id="appendonsend"></div>


<hr style="display:inline-block;width:98%" tabindex="-1">


<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Philipp Reisner &lt;philipp.reisner@linbit.com&gt;<br>


<b>Sent:</b> Thursday, April 4, 2024 1:06 PM<br>


<b>To:</b> Tim Westbrook &lt;Tim_Westbrook@selinc.com&gt;<br>


<b>Cc:</b> drbd-user@lists.linbit.com &lt;drbd-user@lists.linbit.com&gt;<br>


<b>Subject:</b> Re: Usynced blocks if replication is interrupted during initial sync</font>


<div>&nbsp;</div>


</div>


<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">


<div class="PlainText">[Caution - External]<br>


<br>


Hello Tim,<br>


<br>


We were able to write a reproducer test case and fix this regression<br>


with this commit:<br>


<a href="https://urldefense.com/v3/__https://github.com/LINBIT/drbd/commit/be9a404134acc3d167e8a7e60adce4f1910a4893__;!!O7uE89YCNVw!Lg3rRgojII2WxVzSLqO-h7mIpRxkiz34chmd89P-b1GDlUP3QD3-jc3gdlj5aTFp9uwgCw_5PBjXtwPtevJ0JK_oC8s8ZGg$">https://github.com/LINBIT/drbd/commit/be9a404134acc3d167e8a7e60adce4f1910a4893</a><br>


<br>


This commit will go into the drbd-9.1.20 and drbd-9.2.9 releases.<br>
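<br>
As a rough sketch, a check like the following could tell whether an installed version is at or past these two releases. The function name and the plain "X.Y.Z" input format are assumptions made for illustration, and a False result only means the version is not known to carry this commit, not that it is affected.<br>
<pre>
```python
def includes_fix(version):
    """Return True if a DRBD version string is at or past the announced fix release.

    Assumes a plain "X.Y.Z" string; suffixes such as "-1" or "rc1" are not handled.
    """
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    # First releases on each branch that carry the commit, as stated in the message.
    first_fixed = {(9, 1): 20, (9, 2): 9}
    branch = (major, minor)
    if branch in first_fixed:
        return patch >= first_fixed[branch]
    # Assumption: branches newer than 9.2 inherit the fix.
    # Older branches are simply reported as "not known to include it".
    return branch >= (9, 3)
```
</pre>
For example, includes_fix("9.1.19") is False and includes_fix("9.2.9") is True.<br>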


<br>


best regards,<br>


&nbsp;Philipp<br>


<br>


On Fri, Mar 22, 2024 at 1:49 AM Tim Westbrook &lt;Tim_Westbrook@selinc.com&gt; wrote:<br>


&gt;<br>


&gt;<br>


&gt;<br>


&gt; Thank you<br>


&gt;<br>


&gt;<br>


&gt; So if &quot;Copying bitmap of peer node_id=0&quot; on reconnect after an interruption indicates the issue, then the issue still exists for me.<br>


&gt;<br>


&gt; I am able to dump the metadata, but I am not sure it is very useful at this point...<br>


&gt;<br>


&gt; I have not tried invalidating it after a mount/unmount, nor have I tried invalidating it after adding a node, but we were trying to avoid unmounting once configured.<br>


&gt;<br>


&gt; Would you recommend against going back to a release version prior to this change?<br>


&gt;<br>


&gt; Is there any other information I can provide that would help?&nbsp; Could I dump the metadata at some point to show the expected or unexpected state?<br>


&gt;<br>


&gt; Latest flow is below<br>


&gt;<br>


&gt; Thank you so much for your assistance,<br>


&gt; Tim<br>


&gt;<br>


&gt; 1. /dev/vg/persist mounted directly without drbd<br>


&gt; 2. Enable DRBD by creating a single node configuration file<br>


&gt; 3. Reboot<br>


&gt; 4. Create metadata on separate disk (--max-peers=5)<br>


&gt; 5. drbdadm up persist<br>


&gt; 6. drbdadm invalidate persist<br>


&gt; 7. drbdadm primary --force persist<br>


&gt; 8. drbdadm down persist<br>


&gt; 9. drbdadm up persist<br>


&gt; 10. drbdadm invalidate persist*<br>


&gt; 11. drbdadm primary --force persist<br>


&gt; 12. mount /dev/drbd0 to /persist<br>


&gt; 13. start using that mount point<br>


&gt; 14. some time later<br>


&gt; 15. Modify configuration to add new target backup node<br>


&gt; 16. Copy config to remote node and reboot, it will restart in secondary<br>


&gt; 17. drbdadm adjust persist (on primary)<br>


&gt; 18. secondary comes up and initial sync starts<br>


&gt; 19. stop at 50% by disabling network interface<br>


&gt; 20. re-enable network interface<br>


&gt; 21. sync completes right away - node-id 0 message here<br>


&gt; 22. drbdadm verify persist - fails many blocks<br>
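&gt;<br>
&gt; For reference, steps 4 to 12 above could be written down as a dry-run command list, roughly as sketched here. The helper names are hypothetical, the exact create-md invocation is an assumption (the flow only says that metadata is created with --max-peers=5), and nothing is executed.<br>
<pre>
```python
import shlex


def single_node_setup_commands(resource="persist", max_peers=5,
                               device="/dev/drbd0", mount_point="/persist"):
    """Build, but do not run, the commands for steps 4 to 12 of the flow."""
    drbdadm = ["drbdadm"]
    return [
        # Step 4: create metadata (assumed invocation; the external metadata
        # disk itself would be named in the resource configuration file).
        drbdadm + ["create-md", "--max-peers={}".format(max_peers), resource],
        drbdadm + ["up", resource],                  # step 5
        drbdadm + ["invalidate", resource],          # step 6 (workaround)
        drbdadm + ["primary", "--force", resource],  # step 7
        drbdadm + ["down", resource],                # step 8
        drbdadm + ["up", resource],                  # step 9
        drbdadm + ["invalidate", resource],          # step 10
        drbdadm + ["primary", "--force", resource],  # step 11
        ["mount", device, mount_point],              # step 12
    ]


def render(commands):
    """Return the commands as shell-quoted lines for review or logging."""
    return [shlex.join(command) for command in commands]
```
</pre>
&gt; Calling render(single_node_setup_commands()) gives one printable line per step, which makes it easier to compare flows between test runs.<br>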


&gt;<br>


&gt;<br>


&gt;<br>


&gt;<br>


&gt; From: Joel Colledge &lt;joel.colledge@linbit.com&gt;<br>


&gt; Sent: Wednesday, March 20, 2024 12:02 AM<br>


&gt; To: Tim Westbrook &lt;Tim_Westbrook@selinc.com&gt;<br>


&gt; Cc: drbd-user@lists.linbit.com &lt;drbd-user@lists.linbit.com&gt;<br>


&gt; Subject: Re: Usynced blocks if replication is interrupted during initial sync<br>


&gt;<br>




&gt; &gt; We are still seeing the issue as described but perhaps I am not putting the invalidate<br>


&gt; &gt; at the right spot<br>


&gt; &gt;<br>


&gt; &gt; Note - I've added it at step 6 below, but I'm wondering if it should be after<br>


&gt; &gt; the additional node is configured and adjusted (in which case I would need to<br>


&gt; &gt; unmount as apparently you can't invalidate a disk in use)<br>


&gt; &gt;<br>


&gt; &gt; So do I need to invalidate after every node is added?<br>


&gt;<br>


&gt; With my reproducer, the workaround at step 6 works.<br>


&gt;<br>


&gt; &gt; Also note, the node-id in the kernel logs is 0, but the peers are configured with 1 and 2.<br>


&gt; &gt; Is this an issue, or are they separate IDs?<br>


&gt;<br>


&gt; I presume you are referring to the line:<br>


&gt; &quot;Copying bitmap of peer node_id=0&quot;<br>


&gt; The reason that node ID 0 appears here is that DRBD stores a bitmap of<br>


&gt; the blocks that have changed since it was first brought up. This is<br>


&gt; the &quot;day0&quot; bitmap. This is stored in all unused bitmap slots. All<br>


&gt; unused node IDs point to one of these bitmaps. In this case, node ID 0<br>


&gt; is unused. So this line means that it is using the day0 bitmap here.<br>


&gt; This is unexpected, as mentioned in my previous reply.<br>
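&gt;<br>
&gt; As a toy illustration only (this is not how DRBD lays out its metadata), the mapping described above can be sketched like this. The function names and the slot ordering are made up for the example; the only point is that a node ID with no configured peer resolves to a day0 bitmap.<br>
<pre>
```python
def build_slot_table(max_peers, peer_node_ids):
    """Label each bitmap slot as owned by a peer or holding the day0 bitmap.

    Slot order is hypothetical; peers are simply placed first.
    """
    if len(peer_node_ids) > max_peers:
        raise ValueError("more peers than bitmap slots")
    table = ["day0"] * max_peers  # every unused slot holds the day0 bitmap
    for slot, node_id in enumerate(sorted(peer_node_ids)):
        table[slot] = "peer {}".format(node_id)
    return table


def bitmap_kind(node_id, peer_node_ids):
    """Return which kind of bitmap a node ID resolves to in this toy model."""
    return "peer" if node_id in peer_node_ids else "day0"
```
</pre>
&gt; With peers 1 and 2 and five slots, bitmap_kind(0, [1, 2]) returns "day0", which matches node ID 0 showing up in the log line.<br>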


&gt;<br>


&gt; Joel<br>


</div>


</span></font></div>


</body>


</html>