[DRBD-user] DRBD crash on two nodes cluster. Some help please?
Theophanis Kontogiannis
theophanis_kontogiannis at yahoo.com
Sun Nov 1 14:05:42 CET 2009
Hello Lars and All,
Please see below.
On Thu, 2009-10-29 at 16:53 +0100, Lars Ellenberg wrote:
> On Thu, Oct 29, 2009 at 04:40:01PM +0200, Theophanis Kontogiannis wrote:
> > Hello all again.
> >
> > Following up on the issue described below, with integrity check
> > enabled, I used to get a crash at least once per 24 hours.
>
> No.
> You don't get "crashes".
>
> You configured it to fence its peer on connection loss,
> and that is what it does.
>
Correct, strictly speaking. I just had in mind that both nodes get
fenced, so I get a "crash" in the sense of having no service.
But yes, what actually happens is that the peer gets fenced.
> > Now I have integrity check disabled and the cluster is running without
> > crashes for the last 9 days.
> >
> > Could someone kindly provide some hints for the possible reasons of
> > this observed behavior?
> >
> > Off-loading is disabled on both dedicated gigabit NICs.
>
> Either something modifies in-flight buffers,
> which may or may not be intentional,
> and may or may not be "safe" wrt file system data integrity.
>
> Or you actually _do_ have data corruption.
>
> If drbd detects checksum mismatch (== data corruption,
> or more generally: data received is not the same as
> it was when calculating the checksum before it was
> sent), rather than knowingly writing diverging data,
> drbd disconnects, and tries to reconnect,
> hoping for the bitmap based resync to send
> "better" data this time.
>
> On disconnect, if so configured, a primary will call its
> fence-peer handler.
>
> You configured "obliterate" as fence peer handler.
>
> So it "obliterates" its peer.
>
> > Also is integrity-check really needed (I have read the
> > documentation :) ) if it keeps on breaking the cluster?
>
> If you rather have silent data corruption :-)
>
> ==> Find the cause of the checksum mismatch.
>
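To make sure I understand the sequence Lars describes (checksum taken
before sending, buffer changed afterwards, mismatch on the receiver,
disconnect, fence-peer handler called), here is a toy model of it. This is
not DRBD code: the function names are made up for illustration, and
zlib.crc32 only stands in for whatever integrity algorithm is actually
configured.

```python
import zlib


def send_block(buf, mutate_in_flight=False):
    """Toy sender: take the checksum first, then put the buffer on the 'wire'."""
    digest = zlib.crc32(bytes(buf))
    if mutate_in_flight:
        # Something rewrites the buffer after the checksum was calculated
        # but before the data has actually left the machine.
        buf[0] ^= 0xFF
    return bytes(buf), digest


def receive_block(payload, digest):
    """Toy receiver: True if the payload still matches the sender's checksum."""
    return zlib.crc32(payload) == digest


def replicate(buf, mutate_in_flight=False, fence_peer=None):
    """Send one block; on mismatch, disconnect instead of writing bad data."""
    payload, digest = send_block(buf, mutate_in_flight)
    if receive_block(payload, digest):
        return "written"
    # Checksum mismatch: drop the connection rather than knowingly write
    # diverging data. If a fence-peer handler is configured, it is called.
    if fence_peer is not None:
        fence_peer()
    return "disconnected"


if __name__ == "__main__":
    print(replicate(bytearray(b"block of data")))
    print(replicate(bytearray(b"block of data"), mutate_in_flight=True,
                    fence_peer=lambda: print("fence-peer handler called")))
```

In this model the data on disk may be perfectly fine; the mismatch alone is
enough to trigger the disconnect and, with a handler configured, the fencing.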
Is there any way to trace the CRC error at a really low level? Turn on
insane debugging in drbd, or something else?
I cannot think of any good way to go low-level for that!
Thank you All for your time.
T.K.