Hi all,
I'm a beginner with HA clusters. After reading the DRBD user's guide
and the ucarp documentation, and after a long search of the web to see
how others had solved the problem, I have configured two identical
machines with drbd 8.0.11 and ucarp 1.2.
The environment is as follows:
Distro debian etch
kernel 2.6.24.3
drbd 8.0.11
ucarp 1.2 (etch package)
DRBD runs on dedicated NICs: Broadcom Corporation NetXtreme BCM5754
Gigabit Ethernet PCI Express (rev 02)
sangiorgio2:~# ethtool -i eth2
driver: tg3
version: 3.86
firmware-version: 5754-v3.12
bus-info: 0000:04:00.0
The two machines are connected to two different routers from two ISPs.
I need to make squid, dansguardian, bind and apache highly available.
I have compiled drbd into the kernel (built-in) and installed the tools.
Now, when I shut down the master, the slave takes over the addresses,
mounts the partition and starts the services without problems. But when
I reboot the master, it does not become primary again: it takes the
address, but it neither mounts the partition nor starts the services.
After the master reboots I have,
on master:
sangiorgio2:~# cat /proc/drbd
version: 8.0.11 (api:86/proto:86)
GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by
root at sangiorgio2, 2008-03-28 16:36:00
0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
ns:0 nr:1052732 dw:1052732 dr:0 al:0 bm:402 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:65591 misses:201 starving:0 dirty:0
changed:201
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
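To check this state from a script rather than by eye, the status line can be split into its fields. This is only a minimal sketch: `drbd_field` is a hypothetical helper name (it is not part of the DRBD tools), and the sample line is copied from the output above.

```shell
#!/bin/sh
# drbd_field is a hypothetical helper, not part of the DRBD tools.
# It extracts one "key:value" field (cs, st or ds) from a /proc/drbd
# status line.
drbd_field() {
    # $1 = field name, $2 = status line
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1://p" | head -n 1
}

# Sample line copied from the output above. On a live node it could be
# read with something like: line=$(grep 'cs:' /proc/drbd)
line='0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---'

echo "connection: $(drbd_field cs "$line")"
echo "roles:      $(drbd_field st "$line")"
echo "disks:      $(drbd_field ds "$line")"
```

In the `st` and `ds` fields the part before the slash is the local node and the part after it is the peer, so here both nodes are Secondary.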
The master's syslog says:
Starting DRBD resources: [ d(r0) drbd0: disk( Diskless -> Attaching )
drbd0: Starting worker thread (from cqueue/0 [187])
drbd0: Found 6 transactions (324 active extents) in activity log.
drbd0: max_segment_size ( = BIO size ) = 32768
drbd0: drbd_bm_resize called with capacity == 61719704
drbd0: resync bitmap: bits=7714963 words=241094
drbd0: size = 29 GB (30859852 KB)
drbd0: reading of bitmap took 13 jiffies
drbd0: recounting of set bits took additional 0 jiffies
drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
drbd0: Marked additional 1028 MB as out-of-sync based on AL.
drbd0: disk( Attaching -> UpToDate )
drbd0: Writing meta data super block now.
s(r0) n(r0) drbd0: conn( StandAlone -> Unconnected )
drbd0: Starting receiver thread (from drbd0_worker [3391])
drbd0: receiver (re)started
drbd0: conn( Unconnected -> WFConnection )
].
drbd0: Handshake successful: DRBD Network Protocol version 86
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Starting asender thread (from drbd0_receiver [3396])
drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync
from peer node
drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT )
pdsk( DUnknown -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: conn( WFBitMapT -> WFSyncUUID )
drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
drbd0: Began resync as SyncTarget (will sync 1052672 KB [263168 bits set]).
drbd0: Writing meta data super block now.
Starting ucarp service: ucarp.
Starting deferred execution scheduler: atd.
Starting periodic command scheduler: crond.
Starting anti portscan daemon: portsentry in atcp & audp mode.
Debian GNU/Linux 4.0 sangiorgio2 tty1
sangiorgio2 login: drbd0: Resync done (total 47 sec; paused 0 sec; 22396
K/sec)
drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: peer( Primary -> Secondary )
drbd0: Writing meta data super block now.
on slave:
perseo:~# cat /proc/drbd
version: 8.0.11 (api:86/proto:86)
GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by
root at sangiorgio2, 2008-03-28 16:36:00
0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
ns:2105712 nr:1052940 dw:1053820 dr:2215119 al:0 bm:807 lo:0 pe:0
ua:0 ap:0
resync: used:0/31 hits:196801 misses:609 starving:0 dirty:0
changed:609
act_log: used:0/257 hits:220 misses:0 starving:0 dirty:0 changed:0
The slave's syslog says:
Apr 23 09:02:07 perseo kernel: tg3: eth2: Link is down.
Apr 23 09:02:09 perseo kernel: tg3: eth2: Link is up at 1000 Mbps, full
duplex.
Apr 23 09:02:09 perseo kernel: tg3: eth2: Flow control is on for TX and
on for RX.
Apr 23 09:02:12 perseo kernel: drbd0: Handshake successful: DRBD Network
Protocol version 86
Apr 23 09:02:12 perseo kernel: drbd0: conn( WFConnection -> WFReportParams )
Apr 23 09:02:12 perseo kernel: drbd0: Starting asender thread (from
drbd0_receiver [5031])
Apr 23 09:02:12 perseo kernel: drbd0: Split-Brain detected, 1 primaries,
automatically solved. Sync from this node
Apr 23 09:02:12 perseo kernel: drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Apr 23 09:02:12 perseo kernel: drbd0: Writing meta data super block now.
Apr 23 09:02:12 perseo kernel: drbd0: conn( WFBitMapS -> SyncSource )
pdsk( UpToDate -> Inconsistent )
Apr 23 09:02:12 perseo kernel: drbd0: Began resync as SyncSource (will
sync 1052672 KB [263168 bits set]).
Apr 23 09:02:12 perseo kernel: drbd0: Writing meta data super block now.
Apr 23 09:02:30 perseo ucarp[3411]: [WARNING] Switching to state: BACKUP
Apr 23 09:02:30 perseo ucarp[3411]: [WARNING] Spawning
[/etc/ucarp/ucarp-down.sh eth1:0]
Apr 23 09:02:30 perseo squid[5128]: Preparing for shutdown after 0 requests
Apr 23 09:02:30 perseo squid[5128]: Waiting 30 seconds for active
connections to finish
Apr 23 09:02:30 perseo squid[5128]: FD 165 Closing HTTP connection
Apr 23 09:02:59 perseo kernel: drbd0: Resync done (total 47 sec; paused
0 sec; 22396 K/sec)
Apr 23 09:02:59 perseo kernel: drbd0: conn( SyncSource -> Connected )
pdsk( Inconsistent -> UpToDate )
Apr 23 09:02:59 perseo kernel: drbd0: Writing meta data super block now
If I then run sh /etc/ucarp/ucarp-up.sh by hand on the master,
everything works and the problem is fixed.
I cannot tell whether this is a drbd problem or a ucarp problem.
My configurations are:
drbd.conf
global {
# minor-count 64;
# dialog-refresh 5; # 5 seconds
# disable-ip-verification;
usage-count no;
}
common {
syncer { rate 100M; }
}
resource r0 {
protocol C;
handlers {
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
# outdate-peer "/usr/lib/drbd/outdate-peer.sh on amd 192.168.22.11 192.168.23.11 on alf 192.168.22.12 192.168.23.12";
# outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
# Notify someone in case DRBD split brained.
split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
}
startup {
# wfc-timeout 0;
degr-wfc-timeout 120; # 2 minutes.
# wait-after-sb;
}
disk {
on-io-error detach;
# fencing resource-only;
# size 10G;
}
net {
sndbuf-size 512k;
#timeout 60; # 6 seconds (unit = 0.1 seconds)
#connect-int 10; # 10 seconds (unit = 1 second)
#ping-int 10; # 10 seconds (unit = 1 second)
#ping-timeout 5; # 500 ms (unit = 0.1 seconds)
# max-buffers 2048;
max-buffers 4096;
# unplug-watermark 128;
max-epoch-size 2048;
# ko-count 4;
# allow-two-primaries;
# cram-hmac-alg "sha1";
# shared-secret "FooFunFactory";
after-sb-0pri discard-older-primary;
after-sb-1pri call-pri-lost-after-sb;
after-sb-2pri call-pri-lost-after-sb;
rr-conflict disconnect;
# DRBD-0.7's behaviour is equivalent to
# after-sb-0pri discard-younger-primary;
# after-sb-1pri consensus;
# after-sb-2pri disconnect;
}
syncer {
rate 100M;
#after "r2";
al-extents 257;
}
on sangiorgio2 {
device /dev/drbd0;
disk /dev/sda12;
address 192.168.1.240:7788;
meta-disk internal;
}
on perseo {
device /dev/drbd0;
disk /dev/sda12;
address 192.168.1.5:7788;
meta-disk internal;
}
}
/etc/init.d/ucarp (present on both nodes)
on master:
#!/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
#set to -P if you want this to be the preferred master
PREEMPT=-P
#Log to /var/log/messages
LOG=syslog
FAILOVER_ID=15
SHARED_SECRET=secret1
VIRTUAL_IP=192.168.2.201
REAL_IP=192.168.2.200
INTERFACE=eth1:0
VIP_UP_SCRIPT=/etc/ucarp/ucarp-up.sh
VIP_DOWN_SCRIPT=/etc/ucarp/ucarp-down.sh
OPTIONS="-B -k 0 -b 1 -v $FAILOVER_ID -p $SHARED_SECRET -a $VIRTUAL_IP
-s $REAL_IP -i $INTERFACE -u $VIP_UP_SCRIPT -d $VIP_DOWN_SCRIPT $PREEMPT
-z -f $LOG"
test -x /usr/sbin/ucarp || exit 0
case "$1" in
blablabla
on slave:
#!/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
#set to -P if you want this to be the preferred master
PREEMPT=-P
#Log to /var/log/messages
LOG=syslog
FAILOVER_ID=10
SHARED_SECRET=secret
NTLM_IP=192.168.2.201
REAL_IP=192.168.2.200
INTERFACE=eth0
VIP_UP_SCRIPT=/etc/ucarp/ucarp-up.sh
VIP_DOWN_SCRIPT=/etc/ucarp/ucarp-down.sh
OPTIONS="-B -k 200 -b 5 -v $FAILOVER_ID -p $SHARED_SECRET -a $NTLM_IP
-s $REAL_IP -i $INTERFACE -u $VIP_UP_SCRIPT -d $VIP_DOWN_SCRIPT $PREEMPT
-z -f $LOG"
test -x /usr/sbin/ucarp || exit 0
case "$1" in
blablabla
The differences between the two are the -k and -b options.
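To double-check what really differs between the two init scripts, the variable assignments can be listed and compared with diff. This is just a sketch: `list_settings` is a hypothetical helper, and the two stand-in files only contain a few of the values from the listings above. It looks at single-line `VAR=value` assignments only, so of the multi-line OPTIONS string it sees just the first line.

```shell
#!/bin/sh
# list_settings is a hypothetical helper: it prints the VAR=value lines
# of an init script, sorted, so that two scripts can be compared.
list_settings() {
    grep -E '^[A-Z_]+=' "$1" | sort
}

# Stand-in files for the demonstration. On the real nodes one would copy
# /etc/init.d/ucarp from each machine into files such as these.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'FAILOVER_ID=15\nLOG=syslog\nPREEMPT=-P\n' > "$tmp/master"
printf 'FAILOVER_ID=10\nLOG=syslog\nPREEMPT=-P\n' > "$tmp/slave"

list_settings "$tmp/master" > "$tmp/master.set"
list_settings "$tmp/slave"  > "$tmp/slave.set"

# diff exits with status 1 when the files differ, hence the "|| true".
diff "$tmp/master.set" "$tmp/slave.set" || true
```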
/etc/ucarp/ucarp-up.sh on both machines
#!/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
VIP=192.168.2.201
VIP_BRD=255.255.255.0
NIC=eth1:0
NIC1=eth1:1
BIND=192.168.2.242
exec 2> /dev/null
ifconfig $NIC $VIP/24 broadcast $VIP_BRD
ifconfig $NIC1 $BIND/24 broadcast $VIP_BRD
arping -q -c 5 -U $VIP
arping -q -c 5 -U $BIND
drbdadm primary r0
mount /jumper
/etc/init.d/samba start
/etc/init.d/winbind start
chmod g+rx /var/run/samba/winbindd_privileged
chgrp proxy /var/run/samba/winbindd_privileged
/etc/init.d/bind start
/etc/init.d/squid start
/etc/init.d/dansguardian start
/etc/init.d/squid-ntlm start
/etc/init.d/apache start
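Because the script above sends stderr to /dev/null, a failing step leaves no trace. For a deeper analysis, each step could be wrapped so that its exit status and output are logged. This is only a sketch under assumptions: `run_step` and `LOG_CMD` are hypothetical names, and the demonstration uses harmless commands instead of the real drbdadm and mount calls.

```shell
#!/bin/sh
# run_step is a hypothetical helper: it runs one command, captures its
# stdout and stderr, and reports the exit status through LOG_CMD.
# LOG_CMD is an assumption; setting it to "logger -t ucarp-up" would
# send the lines to syslog instead of stdout.
LOG_CMD=${LOG_CMD:-echo}

run_step() {
    if out=$("$@" 2>&1); then rc=0; else rc=$?; fi
    msg="step '$*' exit=$rc"
    if [ -n "$out" ]; then msg="$msg output: $out"; fi
    $LOG_CMD "$msg"
    return $rc
}

# Demonstration with harmless commands. In ucarp-up.sh the steps would
# be, for example, "run_step drbdadm primary r0" and
# "run_step mount /jumper".
run_step true
run_step sh -c 'echo "simulated failure" >&2; exit 3' \
    || echo "stopped with status $?"
```

With this in place, the log shows which command of the up script fails after the master reboots, and with what message.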
/etc/ucarp/ucarp-down.sh on both machines (identical except that
OTHER_MAC holds the other node's MAC address)
#!/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
/etc/init.d/apache stop
/etc/init.d/squid-ntlm stop
/etc/init.d/dansguardian stop
/etc/init.d/squid stop
/etc/init.d/samba stop
/etc/init.d/winbind stop
/etc/init.d/bind stop
umount /jumper
drbdadm secondary r0
OTHER_MAC=00:15:17:4B:A7:3A
VIP=192.168.2.201
NIC=eth1:0
BIND=192.168.2.242
NIC1=eth1:1
exec 2> /dev/null
ifconfig $NIC down
ifconfig $NIC1 down
send_arp \
$VIP $OTHER_MAC \
$VIP $OTHER_MAC \
eth1:0 $OTHER_MAC FF:FF:FF:FF:FF:FF reply
sleep 1
send_arp \
$VIP $OTHER_MAC \
$VIP 00:00:00:00:00:00 \
eth1:0 $OTHER_MAC FF:FF:FF:FF:FF:FF request
sleep 1
send_arp \
$VIP $OTHER_MAC \
$VIP $OTHER_MAC \
"$NIC1" $OTHER_MAC FF:FF:FF:FF:FF:FF reply
sleep 1
send_arp \
$BIND $OTHER_MAC \
$BIND 00:00:00:00:00:00 \
"$NIC1" $OTHER_MAC FF:FF:FF:FF:FF:FF request
Can someone help me understand what is wrong, or tell me where to look
for a deeper analysis?
Many thanks in advance and best regards.
Sorry for my English.
--
Mario Vittorio Guenzi
E-mail jclark at tiscali.it
Si vis pacem, para bellum