-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

I'm a beginner with HA clusters. After reading the DRBD user's guide and
the ucarp documentation, and after a long search of the web to see how
others had solved the problem, I have configured two identical machines
with DRBD 8.0.11 and ucarp 1.2.

The environment is as follows:

Distro: Debian etch
Kernel: 2.6.24.3
drbd 8.0.11
ucarp 1.2 (etch package)

DRBD works on dedicated NICs:
Broadcom Corporation NetXtreme BCM5754 Gigabit Ethernet PCI Express (rev 02)

sangiorgio2:~# ethtool -i eth2
driver: tg3
version: 3.86
firmware-version: 5754-v3.12
bus-info: 0000:04:00.0

The two machines are plugged into two different routers of two ISPs.
I need to make squid, dansguardian, bind and apache highly available.
I have compiled DRBD into the kernel (built-in) and installed the tools.

Now, when I shut down the master, the slave takes over the addresses,
mounts the partition and starts the services without problems. But when
I reboot the master, it takes the address back, yet it does not become
primary again, does not mount the partition and does not start the
services.

After the master reboots, I have on the master:

sangiorgio2:~# cat /proc/drbd
version: 8.0.11 (api:86/proto:86)
GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by root at sangiorgio2, 2008-03-28 16:36:00
 0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
    ns:0 nr:1052732 dw:1052732 dr:0 al:0 bm:402 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:65591 misses:201 starving:0 dirty:0 changed:201
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

The master's syslog says:

Starting DRBD resources: [ d(r0) drbd0: disk( Diskless -> Attaching )
drbd0: Starting worker thread (from cqueue/0 [187])
drbd0: Found 6 transactions (324 active extents) in activity log.
drbd0: max_segment_size ( = BIO size ) = 32768
drbd0: drbd_bm_resize called with capacity == 61719704
drbd0: resync bitmap: bits=7714963 words=241094
drbd0: size = 29 GB (30859852 KB)
drbd0: reading of bitmap took 13 jiffies
drbd0: recounting of set bits took additional 0 jiffies
drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
drbd0: Marked additional 1028 MB as out-of-sync based on AL.
drbd0: disk( Attaching -> UpToDate )
drbd0: Writing meta data super block now.
s(r0) n(r0) drbd0: conn( StandAlone -> Unconnected )
drbd0: Starting receiver thread (from drbd0_worker [3391])
drbd0: receiver (re)started
drbd0: conn( Unconnected -> WFConnection )
].
drbd0: Handshake successful: DRBD Network Protocol version 86
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Starting asender thread (from drbd0_receiver [3396])
drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from peer node
drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: conn( WFBitMapT -> WFSyncUUID )
drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
drbd0: Began resync as SyncTarget (will sync 1052672 KB [263168 bits set]).
drbd0: Writing meta data super block now.
Starting ucarp service: ucarp.
Starting deferred execution scheduler: atd.
Starting periodic command scheduler: crond.
Starting anti portscan daemon: portsentry in atcp & audp mode.

Debian GNU/Linux 4.0 sangiorgio2 tty1

sangiorgio2 login: drbd0: Resync done (total 47 sec; paused 0 sec; 22396 K/sec)
drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: peer( Primary -> Secondary )
drbd0: Writing meta data super block now.

On the slave:

perseo:~# cat /proc/drbd
version: 8.0.11 (api:86/proto:86)
GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by root at sangiorgio2, 2008-03-28 16:36:00
 0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
    ns:2105712 nr:1052940 dw:1053820 dr:2215119 al:0 bm:807 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:196801 misses:609 starving:0 dirty:0 changed:609
        act_log: used:0/257 hits:220 misses:0 starving:0 dirty:0 changed:0

The slave's syslog says:

Apr 23 09:02:07 perseo kernel: tg3: eth2: Link is down.
Apr 23 09:02:09 perseo kernel: tg3: eth2: Link is up at 1000 Mbps, full duplex.
Apr 23 09:02:09 perseo kernel: tg3: eth2: Flow control is on for TX and on for RX.
Apr 23 09:02:12 perseo kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
Apr 23 09:02:12 perseo kernel: drbd0: conn( WFConnection -> WFReportParams )
Apr 23 09:02:12 perseo kernel: drbd0: Starting asender thread (from drbd0_receiver [5031])
Apr 23 09:02:12 perseo kernel: drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from this node
Apr 23 09:02:12 perseo kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Apr 23 09:02:12 perseo kernel: drbd0: Writing meta data super block now.
Apr 23 09:02:12 perseo kernel: drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Apr 23 09:02:12 perseo kernel: drbd0: Began resync as SyncSource (will sync 1052672 KB [263168 bits set]).
Apr 23 09:02:12 perseo kernel: drbd0: Writing meta data super block now.
Apr 23 09:02:30 perseo ucarp[3411]: [WARNING] Switching to state: BACKUP
Apr 23 09:02:30 perseo ucarp[3411]: [WARNING] Spawning [/etc/ucarp/ucarp-down.sh eth1:0]
Apr 23 09:02:30 perseo squid[5128]: Preparing for shutdown after 0 requests
Apr 23 09:02:30 perseo squid[5128]: Waiting 30 seconds for active connections to finish
Apr 23 09:02:30 perseo squid[5128]: FD 165 Closing HTTP connection
Apr 23 09:02:59 perseo kernel: drbd0: Resync done (total 47 sec; paused 0 sec; 22396 K/sec)
Apr 23 09:02:59 perseo kernel: drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Apr 23 09:02:59 perseo kernel: drbd0: Writing meta data super block now.

If I then type

sh /etc/ucarp/ucarp-up.sh

everything is OK and the problem is fixed. I cannot tell whether this
is a DRBD problem or a ucarp problem.

My configurations are:

drbd.conf

global {
    # minor-count 64;
    # dialog-refresh 5; # 5 seconds
    # disable-ip-verification;
    usage-count no;
}

common {
    syncer { rate 100M; }
}

resource r0 {
    protocol C;

    handlers {
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        # outdate-peer "/usr/lib/drbd/outdate-peer.sh on amd 192.168.22.11 192.168.23.11 on alf 192.168.22.12 192.168.23.12";
        # outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
        # Notify someone in case DRBD split brained.
        split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
    }

    startup {
        # wfc-timeout 0;
        degr-wfc-timeout 120; # 2 minutes.
        # wait-after-sb;
    }

    disk {
        on-io-error detach;
        # fencing resource-only;
        # size 10G;
    }

    net {
        sndbuf-size 512k;
        #timeout 60; # 6 seconds (unit = 0.1 seconds)
        #connect-int 10; # 10 seconds (unit = 1 second)
        #ping-int 10; # 10 seconds (unit = 1 second)
        #ping-timeout 5; # 500 ms (unit = 0.1 seconds)
        # max-buffers 2048;
        max-buffers 4096;
        # unplug-watermark 128;
        max-epoch-size 2048;
        # ko-count 4;
        # allow-two-primaries;
        # cram-hmac-alg "sha1";
        # shared-secret "FooFunFactory";
        after-sb-0pri discard-older-primary;
        after-sb-1pri call-pri-lost-after-sb;
        after-sb-2pri call-pri-lost-after-sb;
        rr-conflict disconnect;
        # DRBD-0.7's behaviour is equivalent to
        # after-sb-0pri discard-younger-primary;
        # after-sb-1pri consensus;
        # after-sb-2pri disconnect;
    }

    syncer {
        rate 100M;
        #after "r2";
        al-extents 257;
    }

    on sangiorgio2 {
        device /dev/drbd0;
        disk /dev/sda12;
        address 192.168.1.240:7788;
        meta-disk internal;
    }

    on perseo {
        device /dev/drbd0;
        disk /dev/sda12;
        address 192.168.1.5:7788;
        meta-disk internal;
    }
}

(the same on both nodes)

/etc/init.d/ucarp on the master:

#!/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
#set to -P if you want this to be the preferred master
PREEMPT=-P
#Log to /var/log/messages
LOG=syslog
FAILOVER_ID=15
SHARED_SECRET=secret1
VIRTUAL_IP=192.168.2.201
REAL_IP=192.168.2.200
INTERFACE=eth1:0
VIP_UP_SCRIPT=/etc/ucarp/ucarp-up.sh
VIP_DOWN_SCRIPT=/etc/ucarp/ucarp-down.sh
OPTIONS="-B -k 0 -b 1 -v $FAILOVER_ID -p $SHARED_SECRET -a $VIRTUAL_IP -s $REAL_IP -i $INTERFACE -u $VIP_UP_SCRIPT -d $VIP_DOWN_SCRIPT $PREEMPT -z -f $LOG"
test -x /usr/sbin/ucarp || exit 0
case "$1" in
blablabla

/etc/init.d/ucarp on the slave:

#!/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
#set to -P if you want this to be the preferred master
PREEMPT=-P
#Log to /var/log/messages
LOG=syslog
FAILOVER_ID=10
SHARED_SECRET=secret
NTLM_IP=192.168.2.201
REAL_IP=192.168.2.200
INTERFACE=eth0
VIP_UP_SCRIPT=/etc/ucarp/ucarp-up.sh
VIP_DOWN_SCRIPT=/etc/ucarp/ucarp-down.sh
OPTIONS="-B -k 200 -b 5 -v $FAILOVER_ID -p $SHARED_SECRET -a $NTLM_IP -s $REAL_IP -i $INTERFACE -u $VIP_UP_SCRIPT -d $VIP_DOWN_SCRIPT $PREEMPT -z -f $LOG"
test -x /usr/sbin/ucarp || exit 0
case "$1" in
blablabla

The differences between the two are -k and -b.

/etc/ucarp/ucarp-up.sh on both machines:

#!/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
VIP=192.168.2.201
VIP_BRD=255.255.255.0
NIC=eth1:0
NIC1=eth1:1
BIND=192.168.2.242
exec 2> /dev/null
ifconfig $NIC $VIP/24 broadcast $VIP_BRD
ifconfig $NIC1 $BIND/24 broadcast $VIP_BRD
arping -q -c 5 -U $VIP
arping -q -c 5 -U $BIND
drbdadm primary r0
mount /jumper
/etc/init.d/samba start
/etc/init.d/winbind start
chmod g+rx /var/run/samba/winbindd_privileged
chgrp proxy /var/run/samba/winbindd_privileged
/etc/init.d/bind start
/etc/init.d/squid start
/etc/init.d/dansguardian start
/etc/init.d/squid-ntlm start
/etc/init.d/apache start

/etc/ucarp/ucarp-down.sh on both machines (identical except that the
MAC address is reversed):

#!/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
/etc/init.d/apache stop
/etc/init.d/squid-ntlm stop
/etc/init.d/dansguardian stop
/etc/init.d/squid stop
/etc/init.d/samba stop
/etc/init.d/winbind stop
/etc/init.d/bind stop
umount /jumper
drbdadm secondary r0
OTHER_MAC=00:15:17:4B:A7:3A
VIP=192.168.2.201
NIC=eth1:0
BIND=192.168.2.242
NIC1=eth1:1
exec 2> /dev/null
ifconfig $NIC down
ifconfig $NIC1 down
send_arp \
    $VIP $OTHER_MAC \
    $VIP $OTHER_MAC \
    eth1:0 $OTHER_MAC FF:FF:FF:FF:FF:FF reply
sleep 1
send_arp \
    $VIP $OTHER_MAC \
    $VIP 00:00:00:00:00:00 \
    eth1:0 $OTHER_MAC FF:FF:FF:FF:FF:FF request
sleep 1
send_arp \
    $VIP $OTHER_MAC \
    $VIP $OTHER_MAC \
    "$NIC1" $OTHER_MAC FF:FF:FF:FF:FF:FF reply
sleep 1
send_arp \
    $BIND $OTHER_MAC \
    $BIND 00:00:00:00:00:00 \
    "$NIC1" $OTHER_MAC FF:FF:FF:FF:FF:FF request

Can someone help me understand what is wrong, or tell me where to look
for a deeper analysis?

Many thanks in advance and best regards.
Sorry for my English.

- --
Mario Vittorio Guenzi
E-mail jclark at tiscali.it
Si vis pacem, para bellum
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIDvakm6qs1ZkNrIoRAn4yAJsEI1UE9nC1EqH30mqY0buhpAVH1wCeN9rw
4Du4+Vtdz4QTxuHPHWAGzI8=
=AgR5
-----END PGP SIGNATURE-----
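
A note for anyone comparing the two /proc/drbd dumps quoted in this message: on the status line, cs is the connection state, st is the local/peer role and ds is the local/peer disk state. The following is only a minimal sketch and not part of the original setup. parse_drbd_status is a hypothetical helper name, and the pattern assumes the DRBD 8.0 line layout exactly as it appears in the message.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): reads /proc/drbd-style
# text on standard input and, for each device status line, prints
#   <connection> <local role> <peer role> <local disk> <peer disk>
# Lines that are not status lines (version, counters, ...) print nothing.
parse_drbd_status() {
    sed -n 's|^ *[0-9][0-9]*: cs:\([A-Za-z]*\) st:\([A-Za-z]*\)/\([A-Za-z]*\) ds:\([A-Za-z]*\)/\([A-Za-z]*\).*|\1 \2 \3 \4 \5|p'
}

# Example with the status line reported on both nodes after the reboot.
sample=' 0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---'
printf '%s\n' "$sample" | parse_drbd_status
```

Something like `parse_drbd_status < /proc/drbd`, logged just before and just after `drbdadm primary r0` in ucarp-up.sh (with stderr not redirected to /dev/null), might show which role and disk state each node was in at the moment of the takeover.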