Hi Lars, thanks for your response.

What I'm trying to achieve is a load-balanced MySQL cluster, where my
application could write to both MySQL servers, on top of DRBD. I'm actually
using OCFS2 as the filesystem. But after several hours my nodes always get
disconnected due to the failure I pointed out in my last message.

The reason I'm running DRBD in dual-primary mode is that I expect to have
MySQL writing on both nodes. Am I missing something? What benefit would I
get from running a primary/secondary configuration instead of
primary/primary?

Best regards,
Thiago Vinhas

On Fri, Jun 24, 2011 at 2:27 PM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

> On Thu, Jun 23, 2011 at 07:39:00AM -0300, Thiago Vinhas wrote:
> > Hi,
> >
> > I'm testing a DRBD+MySQL environment in production, but after a while
> > the second node always gets disconnected, and I have no idea if it's a
> > hardware problem or misconfiguration.
> > The second node is not even mounted. I'm just replicating the data, not
> > using it.
> >
> > The error is at the end of the message. Here is my conf:
> >
> > resource r0 {
> >   meta-disk internal;
> >   device    /dev/drbd0;
> >   disk      /dev/sda4;
> >
> >   syncer { rate 33M; }
> >
> >   handlers {
> >     split-brain "/etc/init.d/mysql stop";
> >   }
> >
> >   net {
> >     allow-two-primaries;
>
> WHY?? You very likely do not want two primaries,
> only you do not know it yet ;-)
>
> >     after-sb-0pri discard-zero-changes;
> >     after-sb-1pri discard-secondary;
> >     after-sb-2pri disconnect;
> >     data-integrity-alg crc32c;
>
> Have you read
> http://www.mail-archive.com/drbd-user@lists.linbit.com/msg03373.html
>
> >     ko-count 4;
> >   }
> >
> >   startup { become-primary-on both; }
>
> Why??
> You do not want that.
> Really.
> Most people trying to use "dual primary DRBD"
> are really not needing it.
>
> If you think you really want it, make sure that you understand,
> and are able to deal with, the additional complexity it involves.
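[Archive editor's note: for readers skimming this thread, Lars's advice boils
down to dropping the dual-primary options. A minimal single-primary variant of
the poster's resource might look like the sketch below. Hostnames, addresses,
and disk layout are taken from the config quoted above; everything else is
illustrative, not something Lars prescribed.]

```
resource r0 {
  meta-disk internal;
  device    /dev/drbd0;
  disk      /dev/sda4;

  syncer { rate 33M; }

  net {
    # No allow-two-primaries: only one node may be Primary at a time,
    # so the after-sb-2pri policy is no longer needed.
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    ko-count 4;
  }

  # No "startup { become-primary-on both; }"; promotion is handled
  # manually (drbdadm primary r0) or by the cluster manager.

  on stewart { address 192.168.0.1:7789; }
  on prost   { address 192.168.0.2:7789; }
}
```

With a single Primary you would also use a plain journaling filesystem
(e.g. ext3) instead of OCFS2, and fail MySQL over rather than running it
on both nodes concurrently.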
>
> You realize of course that concurrent access with standard file systems
> simply does not work; for that you need to use OCFS or GFS.
>
> >   on stewart { address 192.168.0.1:7789; }
> >   on prost   { address 192.168.0.2:7789; }
> > }
> >
> > Is there something wrong in my conf? Should I change something?
> > Another problem is that after the second node gets disconnected, I have
> > to reconnect it by hand by running "drbdadm connect r0". Apparently
> > after running it the nodes get quickly re-synced (less than a minute),
> > and the previously disconnected node starts as Secondary, so I had to
> > run "drbdadm primary r0".
> >
> > Both nodes are Dell PowerEdge R710 with 48GB of RAM, running RHEL 5.6
> > and DRBD 8.3.10 (from ELRepo).
> >
> > Am I missing something here?
> >
> > Thanks for any help!
> >
> > Regards,
> > Thiago Vinhas
> >
> > block drbd0: Digest integrity check FAILED: 63266864s +4096
> > block drbd0: error receiving Data, l: 4136!
> > block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError )
> >   pdsk( UpToDate -> DUnknown )
> > block drbd0: new current UUID
> >   66983E6BBEE733F5:6157ABDB87926AA5:0001000000000001:5905CD0F6B61A6A9
> > block drbd0: asender terminated
> > block drbd0: Terminating asender thread
> > block drbd0: Connection closed
> > block drbd0: conn( ProtocolError -> Unconnected )
> > block drbd0: receiver terminated
> > block drbd0: Restarting receiver thread
> > block drbd0: receiver (re)started
> > block drbd0: conn( Unconnected -> WFConnection )
> > block drbd0: Handshake successful: Agreed network protocol version 96
> > block drbd0: conn( WFConnection -> WFReportParams )
> > block drbd0: Starting asender thread (from drbd0_receiver [7794])
> > block drbd0: data-integrity-alg: md5
> > block drbd0: drbd_sync_handshake:
> > block drbd0: self
> >   66983E6BBEE733F5:6157ABDB87926AA5:0001000000000001:5905CD0F6B61A6A9
> >   bits:0 flags:0
> > block drbd0: peer
> >   4C9FC71A2D13AF9F:6157ABDB87926AA5:0001000000000000:5905CD0F6B61A6A9
> >   bits:40 flags:0
> > block drbd0: uuid_compare()=100 by rule 90
> > block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> > block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> >   exit code 0 (0x0)
> > block drbd0: Split-Brain detected but unresolved, dropping connection!
> > block drbd0: helper command: /sbin/drbdadm split-brain minor-0
> > block drbd0: meta connection shut down by peer.
> > block drbd0: conn( WFReportParams -> NetworkFailure )
> > block drbd0: asender terminated
> > block drbd0: Terminating asender thread
> > block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code
> >   0 (0x0)
> > block drbd0: conn( NetworkFailure -> Disconnecting )
> > block drbd0: error receiving ReportState, l: 4!
> > block drbd0: Connection closed
> > block drbd0: conn( Disconnecting -> StandAlone )
> > block drbd0: receiver terminated
> > block drbd0: Terminating receiver thread
> >
> > Abs,
> > Thiago Vinhas
> >
> > _______________________________________________
> > drbd-user mailing list
> > drbd-user at lists.linbit.com
> > http://lists.linbit.com/mailman/listinfo/drbd-user
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list -- I'm subscribed
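[Archive editor's note: for anyone landing on this thread after a similar
"Split-Brain detected but unresolved" log, the usual manual recovery once
both nodes have dropped to StandAlone is sketched below. The resource name
r0 comes from this thread; the choice of which node is the "victim" is
yours, and this generic recipe is not something Lars prescribed here.]

```
# On the node whose changes you are willing to throw away (the victim):
drbdadm disconnect r0                  # ensure it is StandAlone
drbdadm secondary  r0                  # it must not be Primary while discarding
drbdadm connect --discard-my-data r0   # reconnect, discarding local changes

# On the surviving node (whose data you keep), if it is also StandAlone:
drbdadm connect r0
```

The victim then resynchronizes from the survivor; any writes it accepted
during the split-brain are lost, which is exactly why dual-primary without
reliable fencing is risky.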