[DRBD-user] data integrity in drbd

Lyre 4179e1 at gmail.com
Fri Aug 26 17:33:09 CEST 2011


Hi Lars:

Yesterday, on the secondary, I shut down drbd and tried to use rsync to
recover the snapshot, but it failed. Then I started drbd to resync from the
primary node. That was the only time I bypassed drbd; however, the
application seemed fine at that time.

This afternoon we continued our experiment: we disconnected drbd on the
secondary, changed something, and resynced from the primary. Oracle was
unable to start. The drbd status suggested that both nodes were UpToDate,
and drbdadm verify didn't report any errors. After disconnecting one side
and mounting the drbd device on both nodes, md5sum showed that some files
were different.
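
For anyone who wants to repeat the file-level comparison, here is a minimal
sketch of it in Python. It is not what I actually ran (I used md5sum and
diff); the function names are made up for illustration. It assumes the drbd
devices are disconnected and the filesystem is mounted on each node, with no
applications writing to it.

```python
import hashlib
import os


def md5_manifest(root):
    """Map each regular file's path (relative to root) to its MD5 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, FIFOs and device nodes.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.md5()
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large database files do not fill memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def diff_manifests(a, b):
    """Return sorted relative paths that are missing on one side or differ."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Each node would build its own manifest from its mount point, and the two
manifests would then be brought to one host (for example dumped as JSON) and
passed to diff_manifests. An empty result means every file matched by MD5.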


On Fri, Aug 26, 2011 at 8:28 PM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:

> On Fri, Aug 26, 2011 at 05:50:59PM +0800, Lyre wrote:
> > Hi all:
> >
> >     Is there a way to check the data integrity on both nodes? I've
> > encountered a confusing problem.
> >
> > We have two drbd devices, a 20GB one for the application and a 300GB
> > one for the oracle database (oracle itself was installed in a different
> > location, which is not replicated). csums-alg & verify-alg were
> > configured to crc32c, and bandwidth was 100M.
>
> I suggest not using the same algorithm for csums-alg and verify-alg,
> so the verify can detect differences in blocks which the csums-based
> resync thought were identical due to identical checksums, i.e. a hash
> collision.
>
> I further suggest that you use stronger hash algorithms for both.
> Like md5 and sha1, or similar.
>
> >  I tried to upgrade our app & database on the secondary (node2) by
> > disconnecting and promoting the 2 drbd devices to primary and then
> > performing the upgrade; it was fine.
> > Then I tried to roll the secondary back to the original version with
> > drbdadm -- --discard-my-data connect drbdX.  After the drbd sync, oracle
> > was unable to start up; it reported data corruption. So I issued
> > drbdadm verify, but it didn't report anything; everything seemed good.
> >
> > I disconnected the drbd devices, mounted them on both sides, and ran
> > md5sum on all files on the disk. I diffed the output from both sides
> > and found that several database files' md5 values weren't identical.
> > Then I connected the drbd devices and found that they began to sync
> > about 30GB of data. I didn't mount the devices read-only, but had all
> > applications stopped.
> >
> >
> > BTW, earlier I tried to recover the underlying lvm devices from a
> > snapshot, but got an IO error, so I just cancelled and resynced. Does
> > that matter? I ask because I had bypassed drbd there.
>
> Can you give more detail there?
> What did you do?
> Did that manipulate DRBD meta data as well?
> Did you bypass DRBD at some point during the process?
>
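
To convince myself of the point in the quoted reply about not using the same
algorithm for csums-alg and verify-alg, here is a small illustrative sketch.
It is only an illustration under assumptions: it uses Python's zlib.crc32
(plain CRC-32), not the crc32c that we configured in drbd, and the function
name is made up. Both are 32-bit checksums, so the same birthday argument
applies: two different blocks with the same checksum can be found after
roughly tens of thousands of random tries.

```python
import hashlib
import random
import zlib


def find_crc32_collision(block_size=16, seed=0):
    """Birthday search: return two different blocks with the same CRC-32."""
    rng = random.Random(seed)
    seen = {}  # checksum -> first block that produced it
    while True:
        block = rng.getrandbits(block_size * 8).to_bytes(block_size, "big")
        crc = zlib.crc32(block)
        other = seen.get(crc)
        if other is not None and other != block:
            return other, block
        seen[crc] = block


a, b = find_crc32_collision()
# Different data, same 32-bit checksum: a second pass with the same
# algorithm would call these two blocks identical again.
print(a != b, zlib.crc32(a) == zlib.crc32(b))
# A different, stronger hash tells the two blocks apart.
print(hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest())
```

If a checksum-based resync skipped a block because of such a collision, a
verify run with the same algorithm would compute the same matching checksums
and report nothing, which is why a different algorithm for verify-alg makes
sense. Whether a collision is what actually happened in our case is not
something this sketch can show.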