2010/5/25 Maros Timko <timkom at gmail.com>:
>>
>> Hello,
>>
>> I have 2 nodes with a Heartbeat/DRBD configuration.
>> (Node1: primary, the hostname is "bleu", Node2: secondary, the
>> hostname is "rocamadour").
>>
>> I want to check the automatic recovery.
>> my configuration is:
>> after-sb-0pri discard-least-changes;
>> after-sb-1pri discard-secondary;
>> after-sb-2pri call-pri-lost-after-sb;
>
> What is your pri-lost-after-sb doing? Rebooting the node? Which one?

pri-lost-after-sb "echo perte primaire DRBD >~/.drbdStatus; drbdadm secondary mysql; drbdadm outdate mysql; ifconfig eth0 down; sync; reboot -f";

N1 reboots.

>
>>
>> I disconnect the network wire from N2.
>> N1 and N2 become primaries.
>> I write some data on N1, I write MORE data on N2.
>>
>> I connect the network wire of N2.
>> N1 immediately reboots.
>> And after the boot of N1, data is synchronised from N1 to N2 (bad!!,
>> there are more changes on N2)
>>
>> just before the boot of N1:
>> May 25 16:38:26 rocamadour kernel: [ 1785.500483] block drbd1:
>> drbd_sync_handshake:
>> May 25 16:38:26 rocamadour kernel: [ 1785.500490] block drbd1: self
>> 3E097AF02C8F65F1:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B
>> bits:170465 flags:0
>> May 25 16:38:26 rocamadour kernel: [ 1785.500496] block drbd1: peer
>> 7BF8153619974429:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B
>> bits:227 flags:0
>> May 25 16:38:26 rocamadour kernel: [ 1785.500502] block drbd1:
>> uuid_compare()=100 by rule 90
>> May 25 16:38:26 rocamadour kernel: [ 1785.500507] block drbd1:
>> Split-Brain detected, 2 primaries, automatically solved. Sync from
>> this node
>
> We have 2 primaries here
>
>>
>> After the boot of N1:
>> May 25 16:38:26 rocamadour kernel: [ 1785.500483] block drbd1:
>> drbd_sync_handshake:
>> May 25 16:38:58 rocamadour kernel: [ 1817.216740] block drbd1: self
>> 3E097AF02C8F65F0:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B
>> bits:170475 flags:0
>> May 25 16:38:58 rocamadour kernel: [ 1817.216746] block drbd1: peer
>> 7BF8153619974428:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B
>> bits:263168 flags:2
>> May 25 16:38:58 rocamadour kernel: [ 1817.216751] block drbd1:
>> uuid_compare()=100 by rule 90
>> May 25 16:38:58 rocamadour kernel: [ 1817.216756] block drbd1:
>> Split-Brain detected, 0 primaries, automatically solved. Sync from
>> peer node
>>
>
> We have 0 primaries here. You mentioned reboot of one node. Who demoted
> the other one?

At the same time on N2, heartbeat gives up all HA resources.
Heartbeat unmounts drbd and "drbd1: role( Primary -> Secondary )",

--- log on N2 ---
May 25 16:38:26 rocamadour kernel: [ 1785.500483] block drbd1: drbd_sync_handshake:
May 25 16:38:26 rocamadour kernel: [ 1785.500490] block drbd1: self 3E097AF02C8F65F1:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B bits:170465 flags:0
May 25 16:38:26 rocamadour kernel: [ 1785.500496] block drbd1: peer 7BF8153619974429:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B bits:227 flags:0
May 25 16:38:26 rocamadour kernel: [ 1785.500502] block drbd1: uuid_compare()=100 by rule 90
May 25 16:38:26 rocamadour kernel: [ 1785.500507] block drbd1: Split-Brain detected, 2 primaries, automatically solved. Sync from this node
May 25 16:38:26 rocamadour kernel: [ 1785.500516] block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> UpToDate )
May 25 16:38:27 rocamadour harc[13960]: info: Running /etc/ha.d/rc.d/status status
May 25 16:38:27 rocamadour heartbeat: [1523]: info: Link 192.168.1.119:192.168.1.119 up.
May 25 16:38:27 rocamadour heartbeat: [1523]: WARN: Late heartbeat: Node 192.168.1.119: interval 141210 ms
May 25 16:38:27 rocamadour ipfail: [1766]: info: Link Status update: Link 192.168.1.119/192.168.1.119 now has status up
May 25 16:38:27 rocamadour heartbeat: [1523]: info: Status update for node 192.168.1.119: status ping
May 25 16:38:27 rocamadour ipfail: [1766]: info: Status update: Node 192.168.1.119 now has status ping
May 25 16:38:27 rocamadour heartbeat: [1523]: info: Managed status process 13960 exited with return code 0.
May 25 16:38:27 rocamadour ipfail: [1766]: info: A ping node just came up.
May 25 16:38:27 rocamadour heartbeat: [1523]: info: all clients are now paused
May 25 16:38:28 rocamadour heartbeat: [1523]: info: hb_giveup_resources(): current status: active
May 25 16:38:28 rocamadour heartbeat: [1523]: info: Heartbeat shutdown in progress. (1523)
May 25 16:38:28 rocamadour heartbeat: [13997]: info: Giving up all HA resources.
May 25 16:38:28 rocamadour ResourceManager[14013]: info: Releasing resource group: bleu IPaddr::192.168.1.228/24 drbddisk::mysql Filesystem::/dev/drbd1::/mnt/drbd::ext3::defaults mysql mon
(...)
May 25 16:38:45 rocamadour Filesystem[14305]: INFO: Running stop for /dev/drbd1 on /mnt/drbd
May 25 16:38:45 rocamadour Filesystem[14305]: INFO: Trying to unmount /mnt/drbd
May 25 16:38:45 rocamadour Filesystem[14305]: INFO: unmounted /mnt/drbd successfully
May 25 16:38:45 rocamadour Filesystem[14289]: INFO: Success
May 25 16:38:45 rocamadour ResourceManager[14013]: info: Running /etc/ha.d/resource.d/drbddisk mysql stop
May 25 16:38:45 rocamadour kernel: [ 1804.477879] block drbd1: role( Primary -> Secondary )
May 25 16:38:45 rocamadour ResourceManager[14013]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.228/24 stop
May 25 16:38:46 rocamadour IPaddr[14456]: INFO: ifconfig eth0:0 down
May 25 16:38:46 rocamadour IPaddr[14428]: INFO: Success
May 25 16:38:46 rocamadour heartbeat: [13997]: info: All HA resources relinquished.

Thanks.
----------
Thierry
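
For anyone comparing the two handshakes in this thread, a small script can pull the "self" and "peer" lines out of the kernel log and put the numbers side by side. This is only a sketch, not a DRBD tool. It assumes the four colon-separated values are the current, bitmap and two history UUIDs in that order, that "bits" is the count of out-of-sync bitmap bits, and that "flags" is printed in hexadecimal. The names `HANDSHAKE_RE` and `parse_handshake` are made up for this example, and the sample input is the first handshake from the log on rocamadour, with each entry on one line.

```python
import re

# Matches the "self"/"peer" lines printed after drbd_sync_handshake.
# Assumption: four 16-digit hex UUIDs, then "bits:<decimal> flags:<hex>".
HANDSHAKE_RE = re.compile(
    r"block (?P<dev>drbd\d+): (?P<side>self|peer) "
    r"(?P<uuids>[0-9A-F]{16}(?::[0-9A-F]{16}){3}) "
    r"bits:(?P<bits>\d+) flags:(?P<flags>[0-9A-F]+)"
)


def parse_handshake(log_text):
    """Return one dict per self/peer handshake line found in log_text."""
    entries = []
    for m in HANDSHAKE_RE.finditer(log_text):
        # Assumed field order: current UUID, bitmap UUID, then history UUIDs.
        current, bitmap, *history = m.group("uuids").split(":")
        entries.append({
            "device": m.group("dev"),
            "side": m.group("side"),
            "current_uuid": current,
            "bitmap_uuid": bitmap,
            "history_uuids": history,
            "bits": int(m.group("bits")),
            "flags": int(m.group("flags"), 16),
        })
    return entries


# First handshake as logged on rocamadour (N2), copied from this thread.
log = """\
May 25 16:38:26 rocamadour kernel: [ 1785.500490] block drbd1: self 3E097AF02C8F65F1:106921FD4287CF02:D202FB3C36D5F660:59248EB25AF9FA1B bits:170465 flags:0
May 25 16:38:26 rocamadour kernel: [ 1785.500496] block drbd1: peer 7BF8153619974429:106921FD4287CF03:D202FB3C36D5F660:59248EB25AF9FA1B bits:227 flags:0
"""

for entry in parse_handshake(log):
    print(entry["side"], entry["current_uuid"], entry["bits"], entry["flags"])
```

Run against the second handshake (16:38:58), the same function gives the bits and flags values reported after N1 came back, so the two exchanges can be compared directly.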