[DRBD-user] drbd master to slave synchronisation under heartbeat

avn avn at avn.ro
Tue Mar 9 16:32:06 CET 2010

Lars,
I think we have to separate the issues (or maybe this is where I am
mistaken).
As I understand it, all the configuration issues you pointed out relate
only to what happens after a split brain is detected or has occurred.
A split brain can occur for many reasons. Yes, since the servers are
co-located and share a separate network connection, it should be very
unlikely, but not impossible.
So it should not influence the synchronization mechanism.

I reproduce the issue here (some unimportant or confidential data
trimmed/changed).
If you need, I can send you ssh details so you can look at the boxes.
In short, I changed a value in a file on /data, where drbd is mounted, then
copied the file to a normal disk on this node and also to a normal disk on
the peer.
(My comments are between ############## lines.)
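
The same check (keep a reference off the replicated volume, then compare
after failover) can be extended from one file to a whole directory. This is
only a sketch, assuming GNU coreutils sha256sum is available; the function
names snapshot_sums and verify_sums and the manifest location are
hypothetical and not part of the setup described here.

```shell
# Hypothetical helpers; names and paths are illustrative only.

# snapshot_sums DIR MANIFEST
# Record SHA-256 sums of every regular file under DIR into MANIFEST.
# MANIFEST should live outside DIR (for example on a non-replicated disk).
snapshot_sums() {
    ( cd "$1" && find . -type f -exec sha256sum {} + ) > "$2"
}

# verify_sums DIR MANIFEST
# Re-check the files under DIR against MANIFEST.
# Returns 0 only if every recorded file is present and unchanged.
verify_sums() {
    ( cd "$1" && sha256sum -c - ) < "$2"
}
```

For example, one could run snapshot_sums on the primary before the restart,
copy the manifest to a normal disk on the peer, and run verify_sums there
once the peer has taken over.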

1_mail ~ # nano /data/chroot/dns/etc/bind/pri/adomain-static.eu.zone
#############
Here I edited this file, changing the serial number from 2010030801 to
2010030802 
#############
1_mail ~ # cp /data/chroot/dns/etc/bind/pri/adomain-static.eu.zone /home/adomain-static.eu.zone
1_mail ~ # diff /data/chroot/dns/etc/bind/pri/adomain-static.eu.zone /home/adomain-static.eu.zone
1_mail ~ # rsync /data/chroot/dns/etc/bind/pri/adomain-static.eu.zone 2_mail:/home
1_mail ~ # cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root at 1_mail,
2010-03-04 01:20:09
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:17389628 nr:36296 dw:17425988 dr:135950 al:188 bm:118 lo:0 pe:0 ua:0
ap:0 ep:1 wo:d oos:0
1_mail ~ # ssh 2_mail cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root at 2_mail,
2010-03-03 21:46:58
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:26104 nr:17391104 dw:17417208 dr:11941 al:57 bm:59 lo:0 pe:0 ua:0
ap:0 ep:1 wo:d oos:0
1_mail ~ # shutdown -r now
#############################
I verified that drbd says it is in sync, then restarted the primary
server.
All resources moved immediately and the second server became primary.
#############################
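
The "drbd says it is in sync" check can be scripted instead of read by eye.
A minimal sketch, assuming the /proc/drbd layout printed by DRBD 8.3.7 in
this transcript; the function name drbd_in_sync is hypothetical. It looks
only at the connection state, the disk states and the oos counter, so it
says nothing about whether the file contents on the two nodes really match.

```shell
# Hypothetical helper: reads /proc/drbd-style text on stdin.
# Succeeds (exit 0) only if at least one resource line was seen, every
# resource line shows cs:Connected and ds:UpToDate/UpToDate, and every
# out-of-sync counter (oos:) is 0.
drbd_in_sync() {
    awk '
        # Resource status line, e.g. " 0: cs:Connected ro:... ds:..."
        /^ *[0-9]+: cs:/ {
            seen = 1
            if ($0 !~ /cs:Connected/ || $0 !~ /ds:UpToDate\/UpToDate/) bad = 1
        }
        # The oos counter may sit on a wrapped continuation line.
        match($0, /oos:[0-9]+/) {
            if (substr($0, RSTART + 4, RLENGTH - 4) + 0 != 0) bad = 1
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}
```

Usage on a node: drbd_in_sync < /proc/drbd && echo "drbd reports in sync"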

2_mail ~ # diff /data/chroot/dns/etc/bind/pri/adomain-static.eu.zone /home/adomain-static.eu.zone
6c6
<                         2010012204
---
>                         2010030802                               
###################################
It has a VERY old version!
Actually, it is a version that was edited when this server was primary and
was synced by manually declaring the other one out of sync.
Since then I have edited this file several times, always while 1_mail was
primary.
###################################
2_mail ~ # cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root at 2_mail,
2010-03-03 21:46:58
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated C r----
    ns:26104 nr:17398616 dw:17425392 dr:13246 al:59 bm:59 lo:0 pe:0 ua:0
ap:0 ep:1 wo:d oos:668
############
The other node is not up yet, so there is no unexpected sync between them.
############
2_mail ~ # crm_mon -1f
============
Last updated: Tue Mar  9 16:46:01 2010
Stack: Heartbeat
Current DC: 2_mail (5102bd2d-aef8-4b93-b01f-c5396c7dec41) - partition with
quorum
Version: 1.0.7-2eed906f43e90ee1e0f7d411f814fc585b30f869
2 Nodes configured, 2 expected votes
6 Resources configured.
============

Online: [ 2_mail ]
OFFLINE: [ 1_mail ]

 Master/Slave Set: ms_drbd
     Masters: [ 2_mail ]
     Stopped: [ drbd:0 ]
 Resource Group: HA
     fs (ocf::heartbeat:Filesystem):    Started 2_mail
     ip (ocf::heartbeat:IPaddr2):       Started 2_mail
     ip_flash   (ocf::heartbeat:IPaddr2):       Started 2_mail
     nginx_flash_ld     (ocf::heartbeat:ldirectord):    Started 2_mail
     pgsql      (ocf::heartbeat:pgsql): Started 2_mail
     named      (lsb:named):    Stopped

Migration summary:
* Node 2_mail:
   named: migration-threshold=2 fail-count=2

Failed actions:
    named_start_0 (node=2_mail, call=386, rc=1, status=complete): unknown
error

Mar  9 16:44:43 2_mail lrmd: [5879]: info: rsc:named:384: start
Mar  9 16:44:43 2_mail crmd: [5882]: info: do_lrm_rsc_op: Performing
key=57:0:0:0d15bbaf-4cda-4219-8e04-6667546ff9f1 op=named_start_0 )
Mar  9 16:44:43 2_mail lrmd: [5879]: info: RA output: (named:start:stdout) 
* Starting chrooted named ...
Mar  9 16:44:43 2_mail lrmd: [5879]: info: RA output: (named:start:stdout)  
* Mounting chroot dirs
Mar  9 16:44:43 2_mail lrmd: [5879]: info: RA output: (named:start:stdout) 
* mounting /etc/bind to /data/chroot/dns/etc/bind
Mar  9 16:44:43 2_mail lrmd: [5879]: info: RA output: (named:start:stdout) 
* mounting /var/bind to /data/chroot/dns/var/bind
Mar  9 16:44:43 2_mail lrmd: [5879]: info: RA output: (named:start:stdout) 
* mounting /var/log/named to /data/chroot/dns/var/log/named
Mar  9 16:44:43 2_mail named[4048]: starting BIND 9.6.1-P1 -u named -n 1 -t
/data/chroot/dns
Mar  9 16:44:43 2_mail named[4048]: built with '--prefix=/usr'
'--build=x86_64-pc-linux-gnu' '--host=x86_64-pc-linux-gnu'
'--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share'
'--sysconfdir=/etc' '--localstatedir=/var/lib' '--libdir=/usr/lib64'
'--sysconfdir=/etc/bind' '--localstatedir=/var' '--with-libtool'
'--with-openssl' '--without-idn' '--disable-ipv6' '--without-libxml2'
'--enable-linux-caps' '--enable-threads' '--with-randomdev=/dev/random'
'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-pc-linux-gnu'
'CFLAGS=-O2 -march=nocona -pipe' 'LDFLAGS=-Wl,-O1' 'CXXFLAGS=-O2
-march=nocona -pipe'
Mar  9 16:44:43 2_mail named[4048]: adjusted limit on open files from 1024
to 1048576
Mar  9 16:44:43 2_mail named[4048]: found 2 CPUs, using 1 worker thread
Mar  9 16:44:43 2_mail named[4048]: using up to 4096 sockets
Mar  9 16:44:43 2_mail named[4048]: loading configuration from
'/etc/bind/named.conf'
Mar  9 16:44:43 2_mail named[4048]: /etc/bind/named.conf:26: expected IPv4
address or '*' near '{'
Mar  9 16:44:43 2_mail named[4048]: loading configuration: unexpected token
Mar  9 16:44:43 2_mail named[4048]: exiting (due to fatal error)
Mar  9 16:44:43 2_mail lrmd: [5879]: info: RA output: (named:start:stdout)                                                                          
[ !! ]
Mar  9 16:44:43 2_mail lrmd: [5879]: WARN: Managed named:start process 3974
exited with return code 1.
########
named failed to start because the configuration file residing on the drbd
partition is an old one with errors in it!
Interestingly, the broken version of this config file dates from 2 days
ago, so long after the old version of the zone file above (the one with the
different serial number) was edited.
This is what I tried to explain earlier: there is no time-based rule
relating when the files were modified to which versions were synced.
########
1_mail ~ # cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root at 1_mail,
2010-03-04 01:20:09
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:32144 nr:940 dw:33084 dr:12505 al:56 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1
wo:b oos:0
1_mail ~ # crm_mon -1f
Last updated: Tue Mar 9 16:56:08 2010
Stack: Heartbeat
Current DC: 2_mail (5102bd2d-aef8-4b93-b01f-c5396c7dec41) - partition with
quorum
Version: 1.0.7-2eed906f43e90ee1e0f7d411f814fc585b30f869
2 Nodes configured, 2 expected votes
6 Resources configured.
============

Online: [ 1_mail 2_mail ]

 Master/Slave Set: ms_drbd
     Masters: [ 1_mail ]
     Slaves: [ 2_mail ]
 Resource Group: HA
     fs (ocf::heartbeat:Filesystem):    Started 1_mail
     ip (ocf::heartbeat:IPaddr2):       Started 1_mail
     ip_flash   (ocf::heartbeat:IPaddr2):       Started 1_mail
     nginx_flash_ld     (ocf::heartbeat:ldirectord):    Started 1_mail
     pgsql      (ocf::heartbeat:pgsql): Started 1_mail
     named      (lsb:named):    Started 1_mail

Migration summary:
* Node 2_mail:
   named: migration-threshold=2 fail-count=2
* Node 1_mail:

Failed actions:
    named_start_0 (node=2_mail, call=386, rc=1, status=complete): unknown
error                                                 
#######
When 1_mail came back up, the resources were migrated to it, with the
correct files in place (both the named config and the named zone file).
diff is ok, and named started.
#######
-- 
View this message in context: http://old.nabble.com/drbd-master-to-slave-synchronisation-under-heartbeat-tp27824570p27837328.html
Sent from the DRBD - User mailing list archive at Nabble.com.