[DRBD-user] automatic recovery, 2 successively drbd_sync_handshake with different results

Thierry Haillot thierry.haillot at gmail.com
Tue May 25 18:15:05 CEST 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hello,

I have 2 nodes with a Heartbeat/DRBD configuration.
(Node1: primary, the hostname is "bleu", Node2: secondary, the
hostname is "rocamadour").

I want to check the automatic recovery.
my configuration is:
                after-sb-0pri discard-least-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri call-pri-lost-after-sb;


I disconnect the network wire from N2.
N1 and N2 become primaries.
I write some data on N1, I write MORE data on N2.

I connect the network wire of N2.
N1 immediately reboots.
And after the boot of N1, data is synchronised from N1 to N2 (bad!!,
there are more changes on N2)

just before the boot of N1:
May 25 16:38:26 rocamadour kernel: [ 1785.500483] block drbd1:
drbd_sync_handshake:
May 25 16:38:26 rocamadour kernel: [ 1785.500490] block drbd1: self
3E097AF02C8F65F1:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B
bits:170465 flags:0
May 25 16:38:26 rocamadour kernel: [ 1785.500496] block drbd1: peer
7BF8153619974429:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B
bits:227 flags:0
May 25 16:38:26 rocamadour kernel: [ 1785.500502] block drbd1:
uuid_compare()=100 by rule 90
May 25 16:38:26 rocamadour kernel: [ 1785.500507] block drbd1:
Split-Brain detected, 2 primaries, automatically solved. Sync from
this node

After the boot of N1:
May 25 16:38:26 rocamadour kernel: [ 1785.500483] block drbd1:
drbd_sync_handshake:
May 25 16:38:58 rocamadour kernel: [ 1817.216740] block drbd1: self
3E097AF02C8F65F0:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B
bits:170475 flags:0
May 25 16:38:58 rocamadour kernel: [ 1817.216746] block drbd1: peer
7BF8153619974428:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B
bits:263168 flags:2
May 25 16:38:58 rocamadour kernel: [ 1817.216751] block drbd1:
uuid_compare()=100 by rule 90
May 25 16:38:58 rocamadour kernel: [ 1817.216756] block drbd1:
Split-Brain detected, 0 primaries, automatically solved. Sync from
peer node


I don't understand:
   - before the reboot "automatically solved. Sync from THIS node",
the peer has "bits:227 flags:0"
   - and after the reboot "automatically solved. Sync from PEER node",
the peer has "bits:263168 flags:2"

why these differences?

Thanks in advance for your light.

---------
Thierry



PS:
The drbd version is "version: 8.3.3 (api:88/proto:86-91)".


This is log (tail -f /var/log/messages | grep drbd) from N2
-----------------------------------------------

May 25 16:38:26 rocamadour kernel: [ 1785.500047] block drbd1:
Handshake successful: Agreed network protocol version 91
May 25 16:38:26 rocamadour kernel: [ 1785.500061] block drbd1: conn(
WFConnection -> WFReportParams )
May 25 16:38:26 rocamadour kernel: [ 1785.500093] block drbd1:
Starting asender thread (from drbd1_receiver [1651])
May 25 16:38:26 rocamadour kernel: [ 1785.500412] block drbd1:
data-integrity-alg: <not-used>
May 25 16:38:26 rocamadour kernel: [ 1785.500483] block drbd1:
drbd_sync_handshake:
May 25 16:38:26 rocamadour kernel: [ 1785.500490] block drbd1: self
3E097AF02C8F65F1:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B
bits:170465 flags:0
May 25 16:38:26 rocamadour kernel: [ 1785.500496] block drbd1: peer
7BF8153619974429:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B
bits:227 flags:0
May 25 16:38:26 rocamadour kernel: [ 1785.500502] block drbd1:
uuid_compare()=100 by rule 90
May 25 16:38:26 rocamadour kernel: [ 1785.500507] block drbd1:
Split-Brain detected, 2 primaries, automatically solved. Sync from
this node
May 25 16:38:26 rocamadour kernel: [ 1785.500516] block drbd1: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk(
Outdated -> UpToDate )
May 25 16:38:28 rocamadour ResourceManager[14013]: info: Releasing
resource group: bleu IPaddr::192.168.1.228/24 drbddisk::mysql
Filesystem::/dev/drbd1::/mnt/drbd::ext3::defaults mysql mon
May 25 16:38:37 rocamadour kernel: [ 1796.000056] block drbd1: peer(
Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk(
UpToDate -> DUnknown )
May 25 16:38:37 rocamadour kernel: [ 1796.000070] block drbd1: asender
terminated
May 25 16:38:37 rocamadour kernel: [ 1796.000075] block drbd1:
Terminating asender thread
May 25 16:38:44 rocamadour kernel: [ 1803.500048] block drbd1:
md_sync_timer expired! Worker calls drbd_md_sync().
May 25 16:38:44 rocamadour kernel: [ 1803.518126] block drbd1: Connection closed
May 25 16:38:44 rocamadour kernel: [ 1803.518137] block drbd1: helper
command: /sbin/drbdadm fence-peer minor-1
May 25 16:38:45 rocamadour kernel: [ 1803.657969] block drbd1: helper
command: /sbin/drbdadm fence-peer minor-1 exit code 5 (0x500)
May 25 16:38:45 rocamadour kernel: [ 1803.657978] block drbd1:
fence-peer helper returned 5 (peer is unreachable, assumed to be dead)
May 25 16:38:45 rocamadour kernel: [ 1803.657993] block drbd1: pdsk(
DUnknown -> Outdated )
May 25 16:38:45 rocamadour kernel: [ 1803.668081] block drbd1: conn(
NetworkFailure -> Unconnected )
May 25 16:38:45 rocamadour kernel: [ 1803.668090] block drbd1:
receiver terminated
May 25 16:38:45 rocamadour kernel: [ 1803.668095] block drbd1:
Restarting receiver thread
May 25 16:38:45 rocamadour kernel: [ 1803.668100] block drbd1:
receiver (re)started
May 25 16:38:45 rocamadour kernel: [ 1803.668106] block drbd1: conn(
Unconnected -> WFConnection )
May 25 16:38:45 rocamadour ResourceManager[14013]: info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /mnt/drbd ext3 defaults
stop
May 25 16:38:45 rocamadour Filesystem[14305]: INFO: Running stop for
/dev/drbd1 on /mnt/drbd
May 25 16:38:45 rocamadour Filesystem[14305]: INFO: Trying to unmount /mnt/drbd
May 25 16:38:45 rocamadour Filesystem[14305]: INFO: unmounted
/mnt/drbd successfully
May 25 16:38:45 rocamadour ResourceManager[14013]: info: Running
/etc/ha.d/resource.d/drbddisk mysql stop
May 25 16:38:45 rocamadour kernel: [ 1804.477879] block drbd1: role(
Primary -> Secondary )
May 25 16:38:58 rocamadour kernel: [ 1817.212040] block drbd1:
Handshake successful: Agreed network protocol version 91
May 25 16:38:58 rocamadour kernel: [ 1817.212053] block drbd1: conn(
WFConnection -> WFReportParams )
May 25 16:38:58 rocamadour kernel: [ 1817.212077] block drbd1:
Starting asender thread (from drbd1_receiver [1651])
May 25 16:38:58 rocamadour kernel: [ 1817.216707] block drbd1:
data-integrity-alg: <not-used>
May 25 16:38:58 rocamadour kernel: [ 1817.216733] block drbd1:
drbd_sync_handshake:
May 25 16:38:58 rocamadour kernel: [ 1817.216740] block drbd1: self
3E097AF02C8F65F0:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B
bits:170475 flags:0
May 25 16:38:58 rocamadour kernel: [ 1817.216746] block drbd1: peer
7BF8153619974428:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B
bits:263168 flags:2
May 25 16:38:58 rocamadour kernel: [ 1817.216751] block drbd1:
uuid_compare()=100 by rule 90
May 25 16:38:58 rocamadour kernel: [ 1817.216756] block drbd1:
Split-Brain detected, 0 primaries, automatically solved. Sync from
peer node
May 25 16:38:58 rocamadour kernel: [ 1817.216764] block drbd1: peer(
Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk(
Outdated -> UpToDate )
May 25 16:38:58 rocamadour kernel: [ 1817.502214] block drbd1: conn(
WFBitMapT -> WFSyncUUID )
May 25 16:38:58 rocamadour kernel: [ 1817.513044] block drbd1: helper
command: /sbin/drbdadm before-resync-target minor-1
May 25 16:38:58 rocamadour kernel: [ 1817.515887] block drbd1: helper
command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
May 25 16:38:58 rocamadour kernel: [ 1817.515895] block drbd1: conn(
WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
May 25 16:38:58 rocamadour kernel: [ 1817.515905] block drbd1: Began
resync as SyncTarget (will sync 1058948 KB [264737 bits set]).
May 25 16:39:13 rocamadour kernel: [ 1832.100203] block drbd1: peer(
Secondary -> Primary )

May 25 16:40:32 rocamadour kernel: [ 1911.244897] block drbd1: Resync
done (total 93 sec; paused 0 sec; 11384 K/sec)
May 25 16:40:32 rocamadour kernel: [ 1911.244912] block drbd1: conn(
SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
May 25 16:40:32 rocamadour kernel: [ 1911.244924] block drbd1: helper
command: /sbin/drbdadm after-resync-target minor-1
May 25 16:40:32 rocamadour kernel: [ 1911.248905] block drbd1: helper
command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)

-----------------------------------------------



More information about the drbd-user mailing list