Hello Lars and All,

Please see below.

On Thu, 2009-10-29 at 16:53 +0100, Lars Ellenberg wrote:
> On Thu, Oct 29, 2009 at 04:40:01PM +0200, Theophanis Kontogiannis wrote:
> > Hello all again.
> >
> > In continuation to the below described issue, with integrity check
> > enabled, I used to get a crash at least once per 24 hours.
>
> No.
> You don't get "crashes".
>
> You configured it to fence its peer on connection loss,
> and that is what it does.

Correct in strict terminology. I just had in mind that both nodes get
fenced, so I get a "crash" in the sense of having no service. But yes,
what actually happens is that it gets fenced.

> > Now I have integrity check disabled and the cluster is running without
> > crashes for the last 9 days.
> >
> > Could someone kindly provide some hints for the possible reasons of
> > this observed behavior?
> >
> > Off-loading is disabled on both dedicated gigabit NICs.
>
> Either something modifies in-flight buffers,
> which may or may not be intentional,
> and may or may not be "safe" wrt file system data integrity.
>
> Or you actually _do_ have data corruption.
>
> If drbd detects checksum mismatch (== data corruption,
> or more general: data received is not the same as
> it was when calculating the checksum before it was
> sent), rather than knowingly writing diverging data,
> drbd disconnects, and tries to reconnect,
> hoping for the bitmap based resync to send
> "better" data this time.
>
> On disconnect, if so configured, a primary will call its
> fence-peer handler.
>
> You configured "obliterate" as fence peer handler.
>
> So it "obliterates" its peer.
>
> > Also is integrity-check really needed (I have read the
> > documentation :) ) if it keeps on breaking the cluster?
>
> If you rather have silent data corruption :-)
>
> ==> Find the cause of the checksum mismatch.

Is there any way to trace the CRC error down to a really low level?
Turn on insane debugging in drbd, or something else?
I cannot think of any good way to go that low-level myself!

Thank you all for your time.

T.K.
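Lars's explanation of why an integrity-check failure ends with a fenced peer can be sketched as a toy model. Everything below is hypothetical: `digest`, `Peer`, `send_block`, `receive_block`, `replicate` and `obliterate` are illustrative names, not DRBD code, and SHA-1 from `hashlib` merely stands in for whatever `data-integrity-alg` is configured. The sketch shows the point that matters for the debugging question: if a page is modified after its checksum is taken but before it is sent, the receiver sees a mismatch even though the network delivered exactly what was sent. Reconnection and the bitmap-based resync are not modelled.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def digest(buf) -> bytes:
    """Stand-in for DRBD's data-integrity-alg (SHA-1 is an assumption here)."""
    return hashlib.sha1(bytes(buf)).digest()


@dataclass
class Peer:
    """Hypothetical model of the secondary node."""
    connected: bool = True
    fenced: bool = False
    disk: Dict[int, bytes] = field(default_factory=dict)


def obliterate(peer: Peer) -> None:
    """Hypothetical stand-in for the configured fence-peer handler."""
    peer.connected = False
    peer.fenced = True


def send_block(page: bytearray,
               modify_in_flight: Optional[Callable[[bytearray], None]] = None):
    """Sender side: the checksum is computed first, then the page is sent."""
    checksum = digest(page)
    if modify_in_flight is not None:
        # Something rewrites the page after the checksum was taken
        # but before the data left the node.
        modify_in_flight(page)
    return checksum, bytes(page)


def receive_block(peer: Peer, sector: int, checksum: bytes, payload: bytes) -> bool:
    """Receiver side: recompute and compare before writing anything."""
    if digest(payload) != checksum:
        # Rather than knowingly writing diverging data, drop the connection.
        peer.connected = False
        return False
    peer.disk[sector] = payload
    return True


def replicate(page: bytearray, peer: Peer, sector: int,
              fence_peer: Callable[[Peer], None],
              modify_in_flight: Optional[Callable[[bytearray], None]] = None) -> str:
    checksum, payload = send_block(page, modify_in_flight)
    if receive_block(peer, sector, checksum, payload):
        return "written"
    # Connection lost: a primary with fencing configured calls its fence-peer handler.
    fence_peer(peer)
    return "disconnected, peer fenced"


def touch(p: bytearray) -> None:
    """Hypothetical in-flight modification of the first byte."""
    p[0] = ord("B")


if __name__ == "__main__":
    print(replicate(bytearray(b"A" * 4096), Peer(), 0, obliterate))         # written
    print(replicate(bytearray(b"A" * 4096), Peer(), 0, obliterate, touch))  # disconnected, peer fenced
```

From the receiver's side the two causes Lars lists, a buffer modified in flight or genuine corruption, look identical: both simply produce a digest that does not match the one that was sent.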