[DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

Stanislav German-Evtushenko ginermail at gmail.com
Mon Jan 27 13:51:02 CET 2014



On Mon, Jan 27, 2014 at 4:18 PM, Bram Matthys <syzop at vulnscan.org> wrote:

> Hi,
>
> Just jumping in, unaware of the history of this thread...
>
> Stanislav German-Evtushenko wrote, on 27-1-2014 7:08:
> >
> > On Thu, Apr 18, 2013 at 4:21 PM, Stanislav German-Evtushenko
> > <ginermail at gmail.com> wrote:
> >
> >     No choice so far :)
> >     http://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_2.3
> >
> >     I don't think this is a kernel bug. Anyway, it would be nice if somebody
> >     could investigate and fix it, or at least find a workaround. IDE is slow
> >     compared to VIRTIO.
> >
> >     On Thu, Apr 18, 2013 at 2:31 PM, Felix Frank <ff at mpexnet.de> wrote:
> >     > On 04/18/2013 12:20 PM, Stanislav German-Evtushenko wrote:
> >     >>> Note that your kernel (and hence kvm/virtio) can be considered
> >     >>> rather old by now.
> >     >> This is a stable RHEL 6 kernel at the moment.
> >     >
> >     > Exactly ;-)
> >     >
> >     > Same for Debian 6, which I no longer consider fit for KVM setups
> >     > (without backports and such).
> >
> >
> > I have replaced all hard drives on the first server and upgraded the DRBD
> > kernel modules to 8.3.15. I run a verify every week. It usually finds new
> > out-of-sync sectors; I then check whether they are false positives (with
> > md5sum) and find that 95% of them are real.
> > Could anybody suggest a way to debug this? Could it be a DRBD + RAID
> > problem? Or a DRBD + one specific RAID problem?
>
> Have you figured out on which one of the servers the data is correct? And is
> it always the same server? This assumes a primary/secondary setup.
> If you know on which server the data is correct then you know - IF it's a
> hardware problem - which server is at fault. If it's a software problem,
> then you still can't tell.
>
> Do you run a weekly/monthly RAID verification job? On both servers? Linux sw
> raid has this, and presumably hw raid has this option as well.
> This would pick up (most) RAID / disk issues.
> Silent disk corruption on RAID arrays can occur and disk verification would
> be the only way to tell (well, apart from using a filesystem like ZFS).
>
> Good luck,
>
> Bram.
>
>
> --
> Bram Matthys
> Software developer/IT consultant        syzop at vulnscan.org
> Website:                                  www.vulnscan.org
> PGP key:                       www.vulnscan.org/pubkey.asc
> PGP fp: EBCA 8977 FCA6 0AB0 6EDB  04A7 6E67 6D45 7FE1 99A6


> Have you figured out on which one of the servers the data is correct?
> And is it always the same server?
It depends on which server is writing. On the server that does the writing,
the data is always correct.
The servers are identical and the firmware is up to date.
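
A minimal sketch of the md5sum comparison mentioned in the quoted message,
assuming the out-of-sync range is known as a start sector and a sector count,
and that sectors are 512 bytes. The function name and its parameters are
hypothetical; the idea is to run it on each node against the same range of the
backing device and compare the two digests.

```python
import hashlib

SECTOR_SIZE = 512  # assumption: ranges are given in 512-byte sectors


def range_md5(device_path, start_sector, num_sectors, sector_size=SECTOR_SIZE):
    """Return the MD5 hex digest of a sector range of a device or image file."""
    digest = hashlib.md5()
    remaining = num_sectors * sector_size
    with open(device_path, "rb") as dev:
        dev.seek(start_sector * sector_size)
        while remaining > 0:
            # Read in chunks of at most 1 MiB so large ranges stay cheap in memory.
            chunk = dev.read(min(remaining, 1024 * 1024))
            if not chunk:  # end of device reached before the range ended
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

If the digests differ between the nodes, the range really is out of sync. A
block that is being written while it is read can still produce a spurious
difference, so a repeated check on a quiet device is more trustworthy.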

> Do you run a weekly/monthly RAID verification job? On both servers?
That is a good thing to try. I thought I had already tried everything.

> This would pick up (most) RAID / disk issues.
This is very unlikely; however, I'll run a RAID verification job on both
servers and come back with the results.
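
A rough sketch of what such a verification job could look like, assuming Linux
software RAID (md); this thread does not say which RAID is in use, and a
hardware controller would need its vendor's tool instead. The md driver exposes
sync_action and mismatch_cnt under /sys/block/<array>/md/. The array name "md0"
and the helper names are hypothetical, and writing to sysfs requires root.

```python
import os


def _md_attr(md_name, attr, sysfs_root):
    """Build the path of an md sysfs attribute, e.g. /sys/block/md0/md/sync_action."""
    return os.path.join(sysfs_root, md_name, "md", attr)


def start_md_check(md_name, sysfs_root="/sys/block"):
    """Ask the md driver to start a read-only consistency check of the array."""
    with open(_md_attr(md_name, "sync_action", sysfs_root), "w") as f:
        f.write("check\n")


def md_action(md_name, sysfs_root="/sys/block"):
    """Return the current sync action; the kernel reports 'idle' when none is running."""
    with open(_md_attr(md_name, "sync_action", sysfs_root)) as f:
        return f.read().strip()


def read_mismatch_count(md_name, sysfs_root="/sys/block"):
    """Return the mismatch counter from the last check."""
    with open(_md_attr(md_name, "mismatch_cnt", sysfs_root)) as f:
        return int(f.read().strip())
```

Typical use would be to call start_md_check("md0") on both servers, wait until
md_action("md0") returns "idle", and then compare read_mismatch_count("md0") on
each. A non-zero count on one server only would point at that server's array.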

Stanislav