Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello, I have 2 nodes with a Heartbeat/DRBD configuration. (Node1: primary, the hostname is "bleu", Node2: secondary, the hostname is "rocamadour"). I want to check the automatic recovery. my configuration is: after-sb-0pri discard-least-changes; after-sb-1pri discard-secondary; after-sb-2pri call-pri-lost-after-sb; I disconnect the network wire from N2. N1 and N2 become primaries. I write some data on N1, I write MORE data on N2. I connect the network wire of N2. N1 immediately reboots. And after the boot of N1, data is synchronised from N1 to N2 (bad!!, there are more changes on N2) just before the boot of N1: May 25 16:38:26 rocamadour kernel: [ 1785.500483] block drbd1: drbd_sync_handshake: May 25 16:38:26 rocamadour kernel: [ 1785.500490] block drbd1: self 3E097AF02C8F65F1:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B bits:170465 flags:0 May 25 16:38:26 rocamadour kernel: [ 1785.500496] block drbd1: peer 7BF8153619974429:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B bits:227 flags:0 May 25 16:38:26 rocamadour kernel: [ 1785.500502] block drbd1: uuid_compare()=100 by rule 90 May 25 16:38:26 rocamadour kernel: [ 1785.500507] block drbd1: Split-Brain detected, 2 primaries, automatically solved. Sync from this node After the boot of N1: May 25 16:38:26 rocamadour kernel: [ 1785.500483] block drbd1: drbd_sync_handshake: May 25 16:38:58 rocamadour kernel: [ 1817.216740] block drbd1: self 3E097AF02C8F65F0:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B bits:170475 flags:0 May 25 16:38:58 rocamadour kernel: [ 1817.216746] block drbd1: peer 7BF8153619974428:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B bits:263168 flags:2 May 25 16:38:58 rocamadour kernel: [ 1817.216751] block drbd1: uuid_compare()=100 by rule 90 May 25 16:38:58 rocamadour kernel: [ 1817.216756] block drbd1: Split-Brain detected, 0 primaries, automatically solved. Sync from peer node I don't understand: - before the reboot "automatically solved. Sync from THIS node", the peer has "bits:227 flags:0" - and after the reboot "automatically solved. Sync from PEER node", the peer has "bits:263168 flags:2" why these differences? Thanks in advance for your light. --------- Thierry PS: The drbd version is "version: 8.3.3 (api:88/proto:86-91)". This is log (tail -f /var/log/messages | grep drbd) from N2 ----------------------------------------------- May 25 16:38:26 rocamadour kernel: [ 1785.500047] block drbd1: Handshake successful: Agreed network protocol version 91 May 25 16:38:26 rocamadour kernel: [ 1785.500061] block drbd1: conn( WFConnection -> WFReportParams ) May 25 16:38:26 rocamadour kernel: [ 1785.500093] block drbd1: Starting asender thread (from drbd1_receiver [1651]) May 25 16:38:26 rocamadour kernel: [ 1785.500412] block drbd1: data-integrity-alg: <not-used> May 25 16:38:26 rocamadour kernel: [ 1785.500483] block drbd1: drbd_sync_handshake: May 25 16:38:26 rocamadour kernel: [ 1785.500490] block drbd1: self 3E097AF02C8F65F1:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B bits:170465 flags:0 May 25 16:38:26 rocamadour kernel: [ 1785.500496] block drbd1: peer 7BF8153619974429:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B bits:227 flags:0 May 25 16:38:26 rocamadour kernel: [ 1785.500502] block drbd1: uuid_compare()=100 by rule 90 May 25 16:38:26 rocamadour kernel: [ 1785.500507] block drbd1: Split-Brain detected, 2 primaries, automatically solved. Sync from this node May 25 16:38:26 rocamadour kernel: [ 1785.500516] block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> UpToDate ) May 25 16:38:28 rocamadour ResourceManager[14013]: info: Releasing resource group: bleu IPaddr::192.168.1.228/24 drbddisk::mysql Filesystem::/dev/drbd1::/mnt/drbd::ext3::defaults mysql mon May 25 16:38:37 rocamadour kernel: [ 1796.000056] block drbd1: peer( Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) May 25 16:38:37 rocamadour kernel: [ 1796.000070] block drbd1: asender terminated May 25 16:38:37 rocamadour kernel: [ 1796.000075] block drbd1: Terminating asender thread May 25 16:38:44 rocamadour kernel: [ 1803.500048] block drbd1: md_sync_timer expired! Worker calls drbd_md_sync(). May 25 16:38:44 rocamadour kernel: [ 1803.518126] block drbd1: Connection closed May 25 16:38:44 rocamadour kernel: [ 1803.518137] block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 May 25 16:38:45 rocamadour kernel: [ 1803.657969] block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 5 (0x500) May 25 16:38:45 rocamadour kernel: [ 1803.657978] block drbd1: fence-peer helper returned 5 (peer is unreachable, assumed to be dead) May 25 16:38:45 rocamadour kernel: [ 1803.657993] block drbd1: pdsk( DUnknown -> Outdated ) May 25 16:38:45 rocamadour kernel: [ 1803.668081] block drbd1: conn( NetworkFailure -> Unconnected ) May 25 16:38:45 rocamadour kernel: [ 1803.668090] block drbd1: receiver terminated May 25 16:38:45 rocamadour kernel: [ 1803.668095] block drbd1: Restarting receiver thread May 25 16:38:45 rocamadour kernel: [ 1803.668100] block drbd1: receiver (re)started May 25 16:38:45 rocamadour kernel: [ 1803.668106] block drbd1: conn( Unconnected -> WFConnection ) May 25 16:38:45 rocamadour ResourceManager[14013]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /mnt/drbd ext3 defaults stop May 25 16:38:45 rocamadour Filesystem[14305]: INFO: Running stop for /dev/drbd1 on /mnt/drbd May 25 16:38:45 rocamadour Filesystem[14305]: INFO: Trying to unmount /mnt/drbd May 25 16:38:45 rocamadour Filesystem[14305]: INFO: unmounted /mnt/drbd successfully May 25 16:38:45 rocamadour ResourceManager[14013]: info: Running /etc/ha.d/resource.d/drbddisk mysql stop May 25 16:38:45 rocamadour kernel: [ 1804.477879] block drbd1: role( Primary -> Secondary ) May 25 16:38:58 rocamadour kernel: [ 1817.212040] block drbd1: Handshake successful: Agreed network protocol version 91 May 25 16:38:58 rocamadour kernel: [ 1817.212053] block drbd1: conn( WFConnection -> WFReportParams ) May 25 16:38:58 rocamadour kernel: [ 1817.212077] block drbd1: Starting asender thread (from drbd1_receiver [1651]) May 25 16:38:58 rocamadour kernel: [ 1817.216707] block drbd1: data-integrity-alg: <not-used> May 25 16:38:58 rocamadour kernel: [ 1817.216733] block drbd1: drbd_sync_handshake: May 25 16:38:58 rocamadour kernel: [ 1817.216740] block drbd1: self 3E097AF02C8F65F0:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B bits:170475 flags:0 May 25 16:38:58 rocamadour kernel: [ 1817.216746] block drbd1: peer 7BF8153619974428:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B bits:263168 flags:2 May 25 16:38:58 rocamadour kernel: [ 1817.216751] block drbd1: uuid_compare()=100 by rule 90 May 25 16:38:58 rocamadour kernel: [ 1817.216756] block drbd1: Split-Brain detected, 0 primaries, automatically solved. Sync from peer node May 25 16:38:58 rocamadour kernel: [ 1817.216764] block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( Outdated -> UpToDate ) May 25 16:38:58 rocamadour kernel: [ 1817.502214] block drbd1: conn( WFBitMapT -> WFSyncUUID ) May 25 16:38:58 rocamadour kernel: [ 1817.513044] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 May 25 16:38:58 rocamadour kernel: [ 1817.515887] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0) May 25 16:38:58 rocamadour kernel: [ 1817.515895] block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent ) May 25 16:38:58 rocamadour kernel: [ 1817.515905] block drbd1: Began resync as SyncTarget (will sync 1058948 KB [264737 bits set]). May 25 16:39:13 rocamadour kernel: [ 1832.100203] block drbd1: peer( Secondary -> Primary ) May 25 16:40:32 rocamadour kernel: [ 1911.244897] block drbd1: Resync done (total 93 sec; paused 0 sec; 11384 K/sec) May 25 16:40:32 rocamadour kernel: [ 1911.244912] block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) May 25 16:40:32 rocamadour kernel: [ 1911.244924] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 May 25 16:40:32 rocamadour kernel: [ 1911.248905] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0) -----------------------------------------------