Hi Lars,<br><br>Yesterday, on the secondary, I shut down DRBD and tried to recover the snapshot with rsync, but that failed. I then started DRBD again and resynced from the primary node. That was the only time I bypassed DRBD; however, the application seemed fine at the time.<br>
<br>This afternoon we continued the experiment: we disconnected DRBD on the secondary, changed some data, and resynced from the primary. Oracle was unable to start. The DRBD status showed both nodes as UpToDate, and drbdadm verify did not report any errors. After disconnecting one side and mounting the DRBD device on both nodes, md5sum showed that some files differed.<br>
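<br>The md5sum-and-diff comparison described above can be sketched as a short script. This is only a minimal sketch in Python: the function names are hypothetical, and it assumes each DRBD device is mounted (ideally read-only) at some mount point on its node while the resources are disconnected.<br>
<pre>
```python
import hashlib
import os


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of one file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents are compared.
            if os.path.isfile(full_path) and not os.path.islink(full_path):
                manifest[os.path.relpath(full_path, root)] = file_md5(full_path)
    return manifest


def diff_manifests(local, remote):
    """Return sorted relative paths that are missing on one side or differ."""
    return sorted(
        path
        for path in set(local) | set(remote)
        if local.get(path) != remote.get(path)
    )
```
</pre>
Run build_manifest() against the mount point on each node, copy one result to the other node, and pass both to diff_manifests(); every path it returns is either missing on one side or has different content.<br>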
<br><br><div class="gmail_quote">On Fri, Aug 26, 2011 at 8:28 PM, Lars Ellenberg <span dir="ltr"><<a href="mailto:lars.ellenberg@linbit.com">lars.ellenberg@linbit.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On Fri, Aug 26, 2011 at 05:50:59PM +0800, Lyre wrote:<br>
> Hi all:<br>
><br>
> Is there a way to check the data integrity on both nodes? I've encountered<br>
> a confusing problem.<br>
><br>
> We have two DRBD devices, a 20 GB one for the application and a 300 GB one for the Oracle<br>
> database (Oracle itself is installed in a different location, which is not<br>
> replicated). csums-alg and verify-alg are both set to crc32c, and the bandwidth<br>
> is 100M.<br>
<br>
</div>I suggest not using the same algorithm for csums-alg and verify-alg,<br>
so that verify can detect differences which the checksum-based resync treated<br>
as identical because the checksums matched, i.e. because of a hash collision.<br>
<br>
I further suggest that you use stronger hash algorithms for both,<br>
such as md5 and sha1, or similar.<br>
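<br>
As a rough illustration of that suggestion, a configuration fragment could look like the one below. It is only a sketch under assumptions: the resource name r0 is hypothetical, and it uses the syncer section as in DRBD 8.3-style configuration files; other releases place these options elsewhere, so check the drbd.conf man page for your version.<br>
<pre>
resource r0 {
  syncer {
    # different algorithms, so a collision in one is unlikely to hide from the other
    csums-alg  md5;
    verify-alg sha1;
  }
}
</pre>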
<div class="im"><br>
> I tried to upgrade our app and database on the secondary (node2) by disconnecting and<br>
> promoting the two DRBD devices to primary and then performing the upgrade; that went fine.<br>
> Then I tried to roll the secondary back to the original version with drbdadm --<br>
> --discard-my-data connect drbdX. After the DRBD sync, Oracle was unable to<br>
> start up and reported data corruption. So I issued drbdadm verify, but it<br>
> didn't report anything; everything seemed fine.<br>
><br>
> I disconnected the DRBD devices, mounted them on both sides, and ran md5sum on all files on the<br>
> disks. I diffed the output from both sides and found that the md5 values of several database<br>
> files were not identical. Then I reconnected the devices and found that they began to<br>
> sync about 30 GB of data. I did not make the devices read-only, but all<br>
> applications were stopped.<br>
><br>
><br>
> BTW, earlier I tried to restore the underlying LVM devices from a<br>
> snapshot but got an I/O error, so I just cancelled and resynced. Does that<br>
> matter, since I had bypassed DRBD there?<br>
<br>
</div>Can you give more detail there?<br>
What did you do?<br>
Did that manipulate DRBD meta data as well?<br>
Did you bypass DRBD at some point during the process?<br>
<br>
<br>
--<br>
: Lars Ellenberg<br>
: LINBIT | Your Way to High Availability<br>
: DRBD/HA support and consulting <a href="http://www.linbit.com" target="_blank">http://www.linbit.com</a><br>
</blockquote></div><br>