On Tue, Jan 25, 2011 at 10:36:03AM +1100, Lew wrote:
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905963] block drbd9: drbd_sync_handshake:
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905967] block drbd9: self 49615ABF1622FC55:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:143432 flags:0
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905971] block drbd9: peer 6116B0558277E470:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:336381 flags:0

There. Both nodes have changes that the other node has not seen (yet). This is how DRBD can detect that the data diverged at some earlier point, usually because of a cluster split brain.

> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905975] block drbd9: uuid_compare()=100 by rule 90
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.906273] block drbd9: helper command: /sbin/drbdadm split-brain minor-9
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.937925] block drbd9: conn( WFReportParams -> NetworkFailure )
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.937935] block drbd9: asender terminated
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.937938] block drbd9: Terminating asender thread
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.950821] block drbd9: helper command: /sbin/drbdadm split-brain minor-9 exit code 127 (0x7f00)
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.950827] block drbd9: conn( NetworkFailure -> Disconnecting )
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951122] block drbd9: Connection closed
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951129] block drbd9: conn( Disconnecting -> StandAlone )
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951149] block drbd9: receiver terminated
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951151] block drbd9: Terminating receiver thread

And that divergence is detected. DRBD cannot decide which version of your data you would rather keep, so the default behaviour is to drop the network connection and stop talking to the peer.
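Each handshake line lists, for one node, the current UUID, the bitmap UUID and two history UUIDs, followed by the number of out-of-sync bits. The verdict printed as "uuid_compare()=100 by rule 90" can be illustrated with a simplified Python sketch. It models only a handful of the cases DRBD's own comparison handles, the result labels are made up for illustration, and masking off the lowest UUID bit mirrors the DRBD 8.x sources but should be treated as an assumption. When the current UUIDs differ, neither current UUID matches the other side's bitmap UUID, yet both bitmap UUIDs hold the same non-zero value, the two nodes share a common ancestor and have each moved on from it independently: a split brain.

```python
import re
from typing import NamedTuple, Tuple


class HandshakeUUIDs(NamedTuple):
    current: int
    bitmap: int
    history: Tuple[int, int]
    bits: int  # out-of-sync bitmap bits reported for this node


# Matches the "self ..." / "peer ..." lines printed after drbd_sync_handshake.
HANDSHAKE_RE = re.compile(
    r"\b(self|peer) ([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16}) bits:(\d+)"
)


def parse_handshake_line(line: str) -> Tuple[str, HandshakeUUIDs]:
    """Hypothetical helper: extract the UUID set from one handshake log line."""
    m = HANDSHAKE_RE.search(line)
    if m is None:
        raise ValueError(f"not a self/peer handshake line: {line!r}")
    who, cur, bm, h1, h2, bits = m.groups()
    return who, HandshakeUUIDs(int(cur, 16), int(bm, 16), (int(h1, 16), int(h2, 16)), int(bits))


def classify(me: HandshakeUUIDs, peer: HandshakeUUIDs) -> str:
    """Simplified subset of DRBD's UUID comparison (illustrative labels only)."""
    drop_flag = ~1  # assumption: ignore the lowest bit, as the DRBD 8.x code does
    my_cur, peer_cur = me.current & drop_flag, peer.current & drop_flag
    my_bm, peer_bm = me.bitmap & drop_flag, peer.bitmap & drop_flag
    if my_cur == peer_cur:
        return "in-sync"
    if my_cur == peer_bm:
        return "sync-target"  # the peer moved on from our current data
    if my_bm == peer_cur:
        return "sync-source"  # we moved on from the peer's current data
    if my_bm == peer_bm and my_bm != 0:
        return "split-brain"  # both moved on from the same ancestor
    return "unknown"  # cases this sketch does not model
```

Fed the two handshake lines above, `classify` returns "split-brain", because both nodes report the same bitmap UUID (643454BA1CA67140) while their current UUIDs differ.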
But 15 minutes later, you decided to try connecting them again:

> Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487616] block drbd9: conn( StandAlone -> Unconnected )
> Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487638] block drbd9: Starting receiver thread (from drbd9_worker [2126])
> Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487690] block drbd9: receiver (re)started
> Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487696] block drbd9: conn( Unconnected -> WFConnection )
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.182513] block drbd9: Handshake successful: Agreed network protocol version 91
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.182522] block drbd9: conn( WFConnection -> WFReportParams )
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.182539] block drbd9: Starting asender thread (from drbd9_receiver [20045])
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183313] block drbd9: data-integrity-alg: <not-used>
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183340] block drbd9: drbd_sync_handshake:
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183345] block drbd9: self 49615ABF1622FC55:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:143799 flags:0
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183349] block drbd9: peer 6116B0558277E470:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:336381 flags:0

DRBD notices that you still have not decided which version to use. We can also see that emlsurit-v4 is still being actively modified, since its out-of-sync bit count grew from 143432 to 143799 (we cannot be sure about the other node, though).
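The arithmetic behind that observation can be sketched as follows, assuming DRBD's bitmap granularity of one bit per 4 KiB block; the helper names are made up for this example.

```python
BITMAP_BLOCK_KIB = 4  # assumption: each DRBD bitmap bit tracks one 4 KiB block


def bits_to_kib(bits: int) -> int:
    """Data covered by `bits` out-of-sync bitmap bits, in KiB."""
    return bits * BITMAP_BLOCK_KIB


def newly_dirty_kib(bits_before: int, bits_after: int) -> int:
    """KiB of blocks that became out of sync between two handshakes.

    Rewriting a block that is already marked out of sync sets no new bit,
    so an unchanged count does not prove that a node was idle.
    """
    return bits_to_kib(bits_after - bits_before)


# Values from the two handshakes in the log above.
self_growth = newly_dirty_kib(143432, 143799)  # emlsurit-v4
peer_growth = newly_dirty_kib(336381, 336381)  # the peer
print(f"self: +{self_growth} KiB, peer: +{peer_growth} KiB")
print(f"out of sync now: self {bits_to_kib(143799)} KiB, peer {bits_to_kib(336381)} KiB")
```

This prints "self: +1468 KiB, peer: +0 KiB": roughly 1.4 MiB worth of additional blocks went out of sync on emlsurit-v4 between the two attempts. The peer's unchanged count is consistent with an idle peer, but, as the docstring notes, it would also be consistent with a peer that only rewrote blocks it had already dirtied.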
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183353] block drbd9: uuid_compare()=100 by rule 90
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183610] block drbd9: helper command: /sbin/drbdadm split-brain minor-9
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192301] block drbd9: conn( WFReportParams -> NetworkFailure )
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192309] block drbd9: asender terminated
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192311] block drbd9: Terminating asender thread
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192702] block drbd9: helper command: /sbin/drbdadm split-brain minor-9 exit code 127 (0x7f00)
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192709] block drbd9: conn( NetworkFailure -> Disconnecting )
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193004] block drbd9: Connection closed
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193012] block drbd9: conn( Disconnecting -> StandAlone )

And again, the connection is dropped.

> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193027] block drbd9: receiver terminated
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193029] block drbd9: Terminating receiver thread
> Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356300] block drbd9: conn( StandAlone -> Unconnected )
> Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356326] block drbd9: Starting receiver thread (from drbd9_worker [2126])
> Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356519] block drbd9: receiver (re)started
> Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356527] block drbd9: conn( Unconnected -> WFConnection )

So... my guess is that you still have two versions of your data. From this log, there was no sync, because the DRBD default behaviour in that case is to disconnect. Therefore there was no rollback, and no data loss. But you certainly have diverging data sets, and my guess is that they are still diverging. You have to figure out when they started to diverge, and why.
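Finding out when the divergence started usually means going back through older logs on both nodes; the last time the connection was lost before both sides were written to independently is a natural place to start. As a rough aid, a hypothetical helper (not a DRBD tool) can pull every conn( OLD -> NEW ) transition, with its syslog timestamp, out of a log. It assumes classic syslog timestamps and exactly the conn( X -> Y ) wording seen in the excerpts above.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Classic syslog timestamp, then DRBD's "conn( OLD -> NEW )" state-change message.
CONN_RE = re.compile(
    r"^(\w{3} [ \d]\d \d\d:\d\d:\d\d) .*?\bconn\( (\w+) -> (\w+) \)"
)


def conn_transitions(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, old_state, new_state) for every connection-state change."""
    found = []
    for line in lines:
        m = CONN_RE.search(line)
        if m:
            found.append(m.groups())
    return found


def first_entry_into(lines: Iterable[str], state: str) -> Optional[str]:
    """Timestamp of the first transition into `state`, or None if there is none."""
    for stamp, _old, new in conn_transitions(lines):
        if new == state:
            return stamp
    return None
```

Run over the excerpts in this thread, `first_entry_into(lines, "StandAlone")` only points back to Jan 23 22:19:35, when the split brain was detected; the divergence itself must have started earlier, so the same scan is more useful over older syslog files from both nodes.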
And you have to sort it out, decide which to keep, and tell DRBD (see the User's Guide for details on this). Consider booking DRBD Training ;-)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed