[DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

Pascal BERTON pascal.berton3 at free.fr
Mon Jan 27 15:03:06 CET 2014



Are you using iSCSI to access your volumes? It might be worth enabling iSCSI digests on both sides to see how it behaves then. You would probably lose some performance, but it would probably also help you identify the root cause of your problems.
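
To illustrate what a digest buys you: iSCSI header and data digests are CRC32C checksums carried with each PDU, so a payload altered in transit no longer matches the digest that was sent with it. The following is only a minimal sketch of that idea, not how any initiator or target implements it; the function names and the 4 KiB payload are made up for the example.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), the checksum used by iSCSI digests."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82F63B78 is the bit-reflected Castagnoli polynomial.
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def digest_matches(payload: bytes, sent_digest: int) -> bool:
    """Receiver-side check: recompute the digest and compare it."""
    return crc32c(payload) == sent_digest


# Hypothetical 4 KiB payload, with one bit flipped "in transit".
payload = bytes(4096)
sent_digest = crc32c(payload)
corrupted = bytearray(payload)
corrupted[100] ^= 0x01

print(digest_matches(payload, sent_digest))           # intact payload
print(digest_matches(bytes(corrupted), sent_digest))  # corrupted payload
```

With digests enabled, corruption between initiator and target would show up as digest errors on the iSCSI side instead of as silent differences found later by a DRBD verify.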

 

Regards,

 

Pascal.

 

From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On behalf of Stanislav German-Evtushenko
Sent: Monday, 27 January 2014 13:51
To: Bram Matthys
Cc: drbd-user
Subject: Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue

 

 

On Mon, Jan 27, 2014 at 4:18 PM, Bram Matthys <syzop at vulnscan.org> wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi,

Just jumping in, unaware of the history of this thread...

Stanislav German-Evtushenko wrote, on 27-1-2014 7:08:

>
> On Thu, Apr 18, 2013 at 4:21 PM, Stanislav German-Evtushenko
> <ginermail at gmail.com> wrote:
>
>     No choice so far :)
>     http://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_2.3
>
>     I don't think this is a kernel bug. Anyway, it would be nice if somebody
>     could investigate and fix it, or at least find a workaround. IDE is slow
>     compared to VIRTIO.
>
>     On Thu, Apr 18, 2013 at 2:31 PM, Felix Frank <ff at mpexnet.de> wrote:
>     > On 04/18/2013 12:20 PM, Stanislav German-Evtushenko wrote:
>     >>> Note that your kernel (and hence kvm/virtio) can be considered
>     >>> rather old by now.
>     >> This is a stable RHEL 6 kernel at the moment.
>     >
>     > Exactly ;-)
>     >
>     > Same for Debian 6, which I no longer consider fit for KVM setups
>     > (without backports and such).
>
>
> I have replaced all hard drives on the first server and upgraded the DRBD kernel
> modules to 8.3.15. I run a verify every week. It usually finds new
> out-of-sync sectors; I then check whether they are false positives (with
> md5sum) and find that 95% of them are real.
> Could anybody suggest a way to debug this? Could it be a DRBD + RAID problem? Or a DRBD
> + one specific RAID problem?

Have you figured out on which one of the servers the data is correct? And is
it always the same server? This assumes a primary/secondary setup.
If you know on which server the data is correct then you know - IF it's a
hardware problem - which server is at fault. If it's a software problem,
then you still can't tell.
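
The md5sum check mentioned in the quoted message can be narrowed to just the sector range that verify reported, run on each node, so the two digests can be compared side by side. This is a rough sketch under assumptions: the 512-byte sector size, the function name, and the device path in the usage note are illustrative and do not come from this thread.

```python
import hashlib

SECTOR_SIZE = 512  # assumption: sector numbers are in 512-byte units


def range_md5(path: str, start_sector: int, sector_count: int) -> str:
    """Return the MD5 hex digest of a sector range of a device or image file."""
    md5 = hashlib.md5()
    remaining = sector_count * SECTOR_SIZE
    with open(path, "rb") as dev:
        dev.seek(start_sector * SECTOR_SIZE)
        while remaining > 0:
            chunk = dev.read(min(remaining, 1 << 20))  # read at most 1 MiB at a time
            if not chunk:
                break  # reached the end of the device
            md5.update(chunk)
            remaining -= len(chunk)
    return md5.hexdigest()
```

On each server one would call something like `range_md5("/dev/vg0/backing_lv", start, count)` (a hypothetical path) and compare the two strings. Differing digests on a range that was not being written at the time indicate a real difference rather than a false positive.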

Do you run a weekly/monthly RAID verification job? On both servers? Linux sw
raid has this, and presumably hw raid has this option as well.
This would pick up (most) RAID / disk issues.
Silent disk corruption on RAID arrays can occur and disk verification would
be the only way to tell (well, apart from using a filesystem like ZFS).
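
For Linux software RAID, such a verification run is driven through the md sysfs files `sync_action` and `mismatch_cnt`. The following is a minimal sketch, assuming an array named `md0` and root privileges; the helper names are made up for the example, and hardware RAID controllers need their vendor tools instead.

```python
from pathlib import Path


def md_sysfs(array: str, sysfs_root: str = "/sys") -> Path:
    """Directory holding the md attributes of one array, e.g. /sys/block/md0/md."""
    return Path(sysfs_root) / "block" / array / "md"


def start_check(array: str, sysfs_root: str = "/sys") -> None:
    """Ask the md driver to start a read-only consistency check."""
    (md_sysfs(array, sysfs_root) / "sync_action").write_text("check\n")


def check_finished(array: str, sysfs_root: str = "/sys") -> bool:
    """True once the array is idle again, i.e. no check or resync is running."""
    return (md_sysfs(array, sysfs_root) / "sync_action").read_text().strip() == "idle"


def read_mismatch_count(array: str, sysfs_root: str = "/sys") -> int:
    """Mismatch count found by the last check; only meaningful once it has finished."""
    return int((md_sysfs(array, sysfs_root) / "mismatch_cnt").read_text().strip())
```

A typical use would be `start_check("md0")`, polling `check_finished("md0")`, then reading `read_mismatch_count("md0")` on both servers. A non-zero count on one node only would point at that node's array.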

Good luck,

Bram.


- --
Bram Matthys
Software developer/IT consultant        syzop at vulnscan.org
Website:                                  www.vulnscan.org
PGP key:                       www.vulnscan.org/pubkey.asc
PGP fp: EBCA 8977 FCA6 0AB0 6EDB  04A7 6E67 6D45 7FE1 99A6
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iF4EAREIAAYFAlLmToAACgkQbmdtRX/hmabbewD9HEaFbFw1j91AgDiAbgWcDari
qZ/fYOYBw/qyMMempbMA/iCKM5Y2Oa3XAUApPWc05cTZ+W9FyOGdOmNgIl4FMGE0
=z7Jn
-----END PGP SIGNATURE-----
_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user



> Have you figured out on which one of the servers the data is correct?
> And is it always the same server?
It depends on which server is writing. The data is always correct on the server that writes.
The servers are identical and the firmware is up to date.


> Do you run a weekly/monthly RAID verification job? On both servers?

That is a good thing to try. I thought I had tried everything already.


> This would pick up (most) RAID / disk issues.

This is very unlikely; however, I'll run a RAID verification job on both servers and come back with the results.

Stanislav


