Hi,

I'm testing a DRBD+MySQL environment in production, but after a while the second node always gets disconnected, and I have no idea whether it's a hardware problem or a misconfiguration. The second node is not even mounted; I'm just replicating the data, not using it. The error is at the end of this message.

Here is my conf:

resource r0 {
    meta-disk internal;
    device /dev/drbd0;
    disk /dev/sda4;

    syncer {
        rate 33M;
    }

    handlers {
        split-brain "/etc/init.d/mysql stop";
    }

    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        data-integrity-alg crc32c;
        ko-count 4;
    }

    startup {
        become-primary-on both;
    }

    on stewart {
        address 192.168.0.1:7789;
    }

    on prost {
        address 192.168.0.2:7789;
    }
}

Is there something wrong in my conf? Should I change something?

Another problem is that after the second node gets disconnected, I have to reconnect it by hand by running "drbdadm connect r0". Apparently, after running it, the nodes re-sync quickly (in less than a minute), but the previously disconnected node comes back as Secondary, so I also have to run "drbdadm primary r0".

Both nodes are Dell PowerEdge R710 with 48GB of RAM, running RHEL 5.6 and DRBD 8.3.10 (from ElRepo).

Am I missing something here?

Thanks for any help!

Regards,
Thiago Vinhas

block drbd0: Digest integrity check FAILED: 63266864s +4096
block drbd0: error receiving Data, l: 4136!
block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
block drbd0: new current UUID 66983E6BBEE733F5:6157ABDB87926AA5:0001000000000001:5905CD0F6B61A6A9
block drbd0: asender terminated
block drbd0: Terminating asender thread
block drbd0: Connection closed
block drbd0: conn( ProtocolError -> Unconnected )
block drbd0: receiver terminated
block drbd0: Restarting receiver thread
block drbd0: receiver (re)started
block drbd0: conn( Unconnected -> WFConnection )
block drbd0: Handshake successful: Agreed network protocol version 96
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [7794])
block drbd0: data-integrity-alg: md5
block drbd0: drbd_sync_handshake:
block drbd0: self 66983E6BBEE733F5:6157ABDB87926AA5:0001000000000001:5905CD0F6B61A6A9 bits:0 flags:0
block drbd0: peer 4C9FC71A2D13AF9F:6157ABDB87926AA5:0001000000000000:5905CD0F6B61A6A9 bits:40 flags:0
block drbd0: uuid_compare()=100 by rule 90
block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
block drbd0: Split-Brain detected but unresolved, dropping connection!
block drbd0: helper command: /sbin/drbdadm split-brain minor-0
block drbd0: meta connection shut down by peer.
block drbd0: conn( WFReportParams -> NetworkFailure )
block drbd0: asender terminated
block drbd0: Terminating asender thread
block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
block drbd0: conn( NetworkFailure -> Disconnecting )
block drbd0: error receiving ReportState, l: 4!
block drbd0: Connection closed
block drbd0: conn( Disconnecting -> StandAlone )
block drbd0: receiver terminated
block drbd0: Terminating receiver thread

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20110623/a4df4c39/attachment.htm>
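To make the sequence in the log above easier to follow, here is a minimal Python sketch that pulls the "conn( old -> new )" transitions and the self/peer UUID sets out of an excerpt of that log. The regular expressions assume the exact line formats shown in the log; the function names are hypothetical helpers, and both_sides_diverged() is a simplified reading that assumes the first two UUID fields are the current and bitmap UUIDs. It is not a reimplementation of DRBD's own uuid_compare() rules.

```python
import re

# Connection-state transitions as they appear in the kernel log,
# e.g. "conn( Connected -> ProtocolError )".
CONN_RE = re.compile(r"conn\( (\w+) -> (\w+) \)")

# Sync-handshake lines "self <uuids> bits:N" / "peer <uuids> bits:N",
# where <uuids> is four 16-digit hex values separated by colons.
UUID_RE = re.compile(r"\b(self|peer) ((?:[0-9A-F]{16}:){3}[0-9A-F]{16}) bits:(\d+)")


def conn_transitions(log: str) -> list[tuple[str, str]]:
    """Return every (old, new) connection state found in the log, in order."""
    return CONN_RE.findall(log)


def handshake_uuids(log: str) -> dict[str, list[str]]:
    """Map 'self' and 'peer' to their UUID fields from the sync handshake."""
    return {side: uuids.split(":") for side, uuids, _bits in UUID_RE.findall(log)}


def both_sides_diverged(uuids: dict[str, list[str]]) -> bool:
    """Simplified heuristic, NOT DRBD's uuid_compare() rule set.

    Assumes field 0 is the current UUID and field 1 the bitmap UUID.
    Different current UUIDs on top of the same bitmap UUID suggest that
    each node started a new data generation after their last common one.
    """
    self_u, peer_u = uuids["self"], uuids["peer"]
    return self_u[0] != peer_u[0] and self_u[1] == peer_u[1]


# Excerpt of the log from this message (only the lines the sketch needs).
LOG = """\
block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
block drbd0: conn( ProtocolError -> Unconnected )
block drbd0: conn( Unconnected -> WFConnection )
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: self 66983E6BBEE733F5:6157ABDB87926AA5:0001000000000001:5905CD0F6B61A6A9 bits:0 flags:0
block drbd0: peer 4C9FC71A2D13AF9F:6157ABDB87926AA5:0001000000000000:5905CD0F6B61A6A9 bits:40 flags:0
block drbd0: conn( WFReportParams -> NetworkFailure )
block drbd0: conn( NetworkFailure -> Disconnecting )
block drbd0: conn( Disconnecting -> StandAlone )
"""

if __name__ == "__main__":
    for old, new in conn_transitions(LOG):
        print(f"{old} -> {new}")
    print("diverged:", both_sides_diverged(handshake_uuids(LOG)))
```

On this excerpt it prints seven transitions ending in "Disconnecting -> StandAlone" and then "diverged: True". StandAlone is the state in which DRBD stops trying to reconnect, which matches having to run "drbdadm connect r0" by hand.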