[DRBD-user] drbd+ucarp advice on config options

Mario Vittorio Guenzi jclark at tiscali.it
Wed Apr 23 10:43:16 CEST 2008




Hi all,
I'm a beginner with HA clusters. After reading the DRBD User's Guide and
the ucarp documentation, and after a long time searching the web to see
how others had solved the problem, I configured two identical machines
with drbd 8.0.11 and ucarp 1.2.
The environment is as follows:
Distro: Debian etch
kernel 2.6.24.3
drbd 8.0.11
ucarp 1.2 (etch package)
DRBD runs on dedicated NICs: Broadcom Corporation NetXtreme BCM5754
Gigabit Ethernet PCI Express (rev 02)
sangiorgio2:~# ethtool -i  eth2
driver: tg3
version: 3.86
firmware-version: 5754-v3.12
bus-info: 0000:04:00.0

The two machines are connected to two different routers of two different
ISPs. I need to make squid, dansguardian, bind and apache highly available.

I compiled DRBD into the kernel (built-in) and installed the userland tools.
Now, when I shut down the master, the slave takes over the addresses,
mounts the partition and starts the services without problems. But when
I reboot the master, it takes the address back but does not become
primary, does not mount the partition and does not start the services.
After the master reboots I have,
on the master:

sangiorgio2:~# cat /proc/drbd
version: 8.0.11 (api:86/proto:86)
GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by root@sangiorgio2, 2008-03-28 16:36:00
 0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
    ns:0 nr:1052732 dw:1052732 dr:0 al:0 bm:402 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:65591 misses:201 starving:0 dirty:0 changed:201
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

The master's syslog says:

Starting DRBD resources:    [ d(r0) drbd0: disk( Diskless -> Attaching )
drbd0: Starting worker thread (from cqueue/0 [187])
drbd0: Found 6 transactions (324 active extents) in activity log.
drbd0: max_segment_size ( = BIO size ) = 32768
drbd0: drbd_bm_resize called with capacity == 61719704
drbd0: resync bitmap: bits=7714963 words=241094
drbd0: size = 29 GB (30859852 KB)
drbd0: reading of bitmap took 13 jiffies
drbd0: recounting of set bits took additional 0 jiffies
drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
drbd0: Marked additional 1028 MB as out-of-sync based on AL.
drbd0: disk( Attaching -> UpToDate )
drbd0: Writing meta data super block now.
s(r0) n(r0) drbd0: conn( StandAlone -> Unconnected )
drbd0: Starting receiver thread (from drbd0_worker [3391])
drbd0: receiver (re)started
drbd0: conn( Unconnected -> WFConnection )
].
drbd0: Handshake successful: DRBD Network Protocol version 86
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Starting asender thread (from drbd0_receiver [3396])
drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from peer node
drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: conn( WFBitMapT -> WFSyncUUID )
drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
drbd0: Began resync as SyncTarget (will sync 1052672 KB [263168 bits set]).
drbd0: Writing meta data super block now.
Starting ucarp service: ucarp.
Starting deferred execution scheduler: atd.
Starting periodic command scheduler: crond.
Starting anti portscan daemon: portsentry in atcp & audp mode.

Debian GNU/Linux 4.0 sangiorgio2 tty1

sangiorgio2 login: drbd0: Resync done (total 47 sec; paused 0 sec; 22396 K/sec)
drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: peer( Primary -> Secondary )
drbd0: Writing meta data super block now.


On the slave:
perseo:~# cat /proc/drbd
version: 8.0.11 (api:86/proto:86)
GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by root@sangiorgio2, 2008-03-28 16:36:00
 0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
    ns:2105712 nr:1052940 dw:1053820 dr:2215119 al:0 bm:807 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:196801 misses:609 starving:0 dirty:0 changed:609
        act_log: used:0/257 hits:220 misses:0 starving:0 dirty:0 changed:0

The slave's syslog says:


Apr 23 09:02:07 perseo kernel: tg3: eth2: Link is down.
Apr 23 09:02:09 perseo kernel: tg3: eth2: Link is up at 1000 Mbps, full duplex.
Apr 23 09:02:09 perseo kernel: tg3: eth2: Flow control is on for TX and on for RX.
Apr 23 09:02:12 perseo kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
Apr 23 09:02:12 perseo kernel: drbd0: conn( WFConnection -> WFReportParams )
Apr 23 09:02:12 perseo kernel: drbd0: Starting asender thread (from drbd0_receiver [5031])
Apr 23 09:02:12 perseo kernel: drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from this node
Apr 23 09:02:12 perseo kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Apr 23 09:02:12 perseo kernel: drbd0: Writing meta data super block now.
Apr 23 09:02:12 perseo kernel: drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Apr 23 09:02:12 perseo kernel: drbd0: Began resync as SyncSource (will sync 1052672 KB [263168 bits set]).
Apr 23 09:02:12 perseo kernel: drbd0: Writing meta data super block now.
Apr 23 09:02:30 perseo ucarp[3411]: [WARNING] Switching to state: BACKUP
Apr 23 09:02:30 perseo ucarp[3411]: [WARNING] Spawning [/etc/ucarp/ucarp-down.sh eth1:0]
Apr 23 09:02:30 perseo squid[5128]: Preparing for shutdown after 0 requests
Apr 23 09:02:30 perseo squid[5128]: Waiting 30 seconds for active connections to finish
Apr 23 09:02:30 perseo squid[5128]: FD 165 Closing HTTP connection
Apr 23 09:02:59 perseo kernel: drbd0: Resync done (total 47 sec; paused 0 sec; 22396 K/sec)
Apr 23 09:02:59 perseo kernel: drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Apr 23 09:02:59 perseo kernel: drbd0: Writing meta data super block now
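The master's log above is console output without timestamps, so it is hard to see exactly when ucarp ran its up script relative to the resync. A minimal sketch for lining the two up on each node is to filter syslog for the drbd0 state transitions and the ucarp lines. It assumes that kernel messages and ucarp's messages (sent with -f syslog) both end up in /var/log/syslog, which is the usual Debian default; the function name drbd_timeline is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print only the drbd0 state transitions and the
# ucarp lines from a syslog file, in their original order, so the moment
# ucarp spawned its up/down script can be compared with the DRBD
# connection, role and disk state changes.
# Usage: drbd_timeline [LOGFILE]   (LOGFILE defaults to /var/log/syslog)
drbd_timeline() {
    # "conn(", "peer(", "disk(", "pdsk(", "role(" mark DRBD state changes;
    # "ucarp[" marks messages from the ucarp daemon.
    grep -E 'drbd0: .*(conn|role|peer|disk|pdsk)\(|ucarp\[' "${1:-/var/log/syslog}"
}
```

Running it on both nodes and comparing the timestamps would show whether the master's ucarp switched to MASTER before the peer had gone back to Secondary.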


If I then run sh /etc/ucarp/ucarp-up.sh by hand, everything works and the
problem is fixed.

I can't tell whether this is a DRBD problem or a ucarp problem.
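In the master's log, ucarp is started while drbd0 is still SyncTarget with its disk Inconsistent, and the peer only goes from Primary to Secondary after the resync is done (the slave switches to BACKUP at 09:02:30 and its down script then waits for the services to stop). If the master's ucarp-up.sh runs drbdadm primary r0 inside that window, the promotion may simply be refused, since this config does not allow two primaries, and with stderr sent to /dev/null nothing would show it; a later manual run would then succeed. One way to test this theory is a hypothetical check like the one below, which waits until /proc/drbd shows a local UpToDate disk and a peer that is not Primary. It assumes the DRBD 8.0 status-line format shown above.

```sh
#!/bin/sh
# Hypothetical helper, not part of the original setup: wait until DRBD
# minor 0 has a local UpToDate disk and a peer that is not Primary.
# Usage: wait_promotable [STATUS_FILE] [TIMEOUT_SECONDS]
# STATUS_FILE defaults to /proc/drbd; it is a parameter only so the
# check can be tried against a saved copy of the status output.
wait_promotable() {
    file=${1:-/proc/drbd}
    timeout=${2:-120}
    waited=0
    while :; do
        # Status line format, as in the output above:
        #  0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
        line=$(grep '^ *0:' "$file" || true)
        # ds:LOCAL/PEER -> keep LOCAL
        local_disk=$(printf '%s\n' "$line" | sed -n 's/.* ds:\([A-Za-z]*\)\/.*/\1/p')
        # st:LOCAL/PEER -> keep PEER
        peer_role=$(printf '%s\n' "$line" | sed -n 's/.* st:[A-Za-z]*\/\([A-Za-z]*\).*/\1/p')
        if [ "$local_disk" = UpToDate ] && [ "$peer_role" != Primary ]; then
            return 0
        fi
        # Give up once the timeout has been reached.
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}
```

In ucarp-up.sh it would go before the promotion, for example `wait_promotable /proc/drbd 120 && drbdadm primary r0 && mount /jumper`; the 120-second limit is an arbitrary choice.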


My configuration files are:

drbd.conf
global {
    # minor-count 64;
    # dialog-refresh 5; # 5 seconds
    # disable-ip-verification;
    usage-count no;
}
common {
  syncer { rate 100M; }
}
resource r0 {
  protocol C;
  handlers {
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    # outdate-peer "/usr/lib/drbd/outdate-peer.sh on amd 192.168.22.11 192.168.23.11 on alf 192.168.22.12 192.168.23.12";
    # outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
    # Notify someone in case DRBD split brained.
    split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
  }
  startup {
    # wfc-timeout  0;
    degr-wfc-timeout 120;    # 2 minutes.
    # wait-after-sb;
  }
  disk {
    on-io-error   detach;
    # fencing resource-only;
    # size 10G;
  }
  net {
    sndbuf-size 512k;
    #timeout       60;    #  6 seconds  (unit = 0.1 seconds)
    #connect-int   10;    # 10 seconds  (unit = 1 second)
    #ping-int      10;    # 10 seconds  (unit = 1 second)
    #ping-timeout   5;    # 500 ms (unit = 0.1 seconds)

    # max-buffers     2048;
    max-buffers     4096;
    # unplug-watermark   128;
    max-epoch-size  2048;
    # ko-count 4;
    # allow-two-primaries;
    # cram-hmac-alg "sha1";
    # shared-secret "FooFunFactory";
    after-sb-0pri discard-older-primary;
    after-sb-1pri call-pri-lost-after-sb;
    after-sb-2pri call-pri-lost-after-sb;
    rr-conflict disconnect;

    # DRBD-0.7's behaviour is equivalent to
    #   after-sb-0pri discard-younger-primary;
    #   after-sb-1pri consensus;
    #   after-sb-2pri disconnect;
  }

  syncer {

    rate 100M;
    #after "r2";
    al-extents 257;
  }

  on sangiorgio2 {
    device     /dev/drbd0;
    disk       /dev/sda12;
    address    192.168.1.240:7788;
    meta-disk  internal;

  }

  on perseo {
    device    /dev/drbd0;
    disk      /dev/sda12;
    address   192.168.1.5:7788;
    meta-disk internal;
  }
}

(drbd.conf is identical on both nodes.)

/etc/init.d/ucarp

On the master:

#!/bin/sh

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
#set to -P if you want this to be the preferred master
PREEMPT=-P
#Log to /var/log/messages
LOG=syslog
FAILOVER_ID=15
SHARED_SECRET=secret1
VIRTUAL_IP=192.168.2.201
REAL_IP=192.168.2.200
INTERFACE=eth1:0
VIP_UP_SCRIPT=/etc/ucarp/ucarp-up.sh
VIP_DOWN_SCRIPT=/etc/ucarp/ucarp-down.sh
OPTIONS="-B -k 0 -b 1 -v $FAILOVER_ID -p $SHARED_SECRET -a $VIRTUAL_IP
-s $REAL_IP -i $INTERFACE -u $VIP_UP_SCRIPT -d $VIP_DOWN_SCRIPT $PREEMPT
-z -f $LOG"
test -x /usr/sbin/ucarp || exit 0

case "$1" in
blablabla

On the slave:
#!/bin/sh

PATH=/sbin:/bin:/usr/sbin:/usr/bin
#set to -P if you want this to be the preferred master
PREEMPT=-P
#Log to /var/log/messages
LOG=syslog
FAILOVER_ID=10

SHARED_SECRET=secret
NTLM_IP=192.168.2.201
REAL_IP=192.168.2.200
INTERFACE=eth0
VIP_UP_SCRIPT=/etc/ucarp/ucarp-up.sh
VIP_DOWN_SCRIPT=/etc/ucarp/ucarp-down.sh
OPTIONS="-B -k 200 -b 5 -v $FAILOVER_ID -p $SHARED_SECRET -a $NTLM_IP
-s $REAL_IP -i $INTERFACE -u $VIP_UP_SCRIPT -d $VIP_DOWN_SCRIPT $PREEMPT
-z -f $LOG"
test -x /usr/sbin/ucarp || exit 0

case "$1" in
blablabla

The differences between the two are the -k (advskew) and -b (advbase) values.
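For reference, CARP (which ucarp implements) sends an advertisement every advbase + advskew/256 seconds, and the node that advertises more often wins the election; with -P on both nodes, the faster node also takes the address back when it returns. A small sketch of that arithmetic for the two settings above (adv_interval_ms is a made-up helper name):

```sh
#!/bin/sh
# CARP timing as used by ucarp: an advertisement is sent every
# advbase + advskew/256 seconds. Computed here in milliseconds with
# shell integer arithmetic, so the fractional part is truncated.
# adv_interval_ms is a hypothetical helper name.
adv_interval_ms() {
    advbase=$1   # value given with -b (seconds)
    advskew=$2   # value given with -k (0-255)
    echo $((advbase * 1000 + advskew * 1000 / 256))
}

echo "master (-b 1 -k 0):   $(adv_interval_ms 1 0) ms"
echo "slave  (-b 5 -k 200): $(adv_interval_ms 5 200) ms"
```

This prints 1000 ms for the master and 5781 ms for the slave, so the master is the preferred node, which matches the slave switching to BACKUP in its log.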

/etc/ucarp/ucarp-up.sh (identical on both machines):
#!/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
VIP=192.168.2.201
VIP_BRD=255.255.255.0
NIC=eth1:0
NIC1=eth1:1
BIND=192.168.2.242
exec 2> /dev/null

ifconfig $NIC $VIP/24 broadcast $VIP_BRD
ifconfig $NIC1 $BIND/24 broadcast $VIP_BRD
arping -q -c 5 -U $VIP
arping -q -c 5 -U $BIND
drbdadm primary r0
mount /jumper
/etc/init.d/samba start
/etc/init.d/winbind start
chmod g+rx /var/run/samba/winbindd_privileged
chgrp proxy /var/run/samba/winbindd_privileged
/etc/init.d/bind start
/etc/init.d/squid start
/etc/init.d/dansguardian start
/etc/init.d/squid-ntlm start
/etc/init.d/apache start
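Because the script starts with exec 2> /dev/null, a failing drbdadm primary r0 or mount /jumper leaves no trace, which makes this kind of problem hard to diagnose. A minimal sketch of a hypothetical wrapper, run, that records failed commands in syslog through logger (assumed to be installed, as it is on a standard Debian system) while keeping the original exit status:

```sh
#!/bin/sh
# Hypothetical wrapper for ucarp-up.sh: run a command and, if it fails,
# record the command line and exit status in syslog via logger. The
# original exit status is returned so callers can still react to it.
run() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ] && command -v logger >/dev/null 2>&1; then
        # "|| true" keeps a logging problem from masking the real status.
        logger -t ucarp-up "failed (exit $status): $*" || true
    fi
    return "$status"
}
```

For example, `run drbdadm primary r0 || exit 1` followed by `run mount /jumper || exit 1` would keep the script from starting the services on a node whose DRBD device could not be promoted, and would leave a syslog entry saying why.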

/etc/ucarp/ucarp-down.sh (identical on both machines except that the MAC address is swapped):

#!/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
/etc/init.d/apache stop
/etc/init.d/squid-ntlm stop
/etc/init.d/dansguardian stop
/etc/init.d/squid stop
/etc/init.d/samba stop
/etc/init.d/winbind stop
/etc/init.d/bind stop
umount /jumper
drbdadm secondary r0
OTHER_MAC=00:15:17:4B:A7:3A
VIP=192.168.2.201
NIC=eth1:0
BIND=192.168.2.242
NIC1=eth1:1
exec 2> /dev/null
ifconfig $NIC down
ifconfig $NIC1 down

send_arp \
       $VIP $OTHER_MAC \
       $VIP $OTHER_MAC \
       eth1:0  $OTHER_MAC FF:FF:FF:FF:FF:FF reply
sleep 1
send_arp \
       $VIP $OTHER_MAC \
       $VIP 00:00:00:00:00:00 \
       eth1:0  $OTHER_MAC FF:FF:FF:FF:FF:FF request
sleep 1
send_arp \
       $VIP $OTHER_MAC \
       $VIP $OTHER_MAC \
       "$NIC1"  $OTHER_MAC FF:FF:FF:FF:FF:FF reply
sleep 1
send_arp \
       $BIND $OTHER_MAC \
       $BIND 00:00:00:00:00:00 \
       "$NIC1"  $OTHER_MAC FF:FF:FF:FF:FF:FF request
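The down script runs drbdadm secondary r0 right after umount /jumper. If a service still has files open on /jumper when umount runs (the slave's log shows squid waiting 30 seconds for active connections), umount can fail because the filesystem is busy, and DRBD will then refuse the demotion because the device is still in use. A minimal sketch of a hypothetical retry helper that would give the unmount some time before demoting:

```sh
#!/bin/sh
# Hypothetical helper for ucarp-down.sh: run a command up to TRIES
# times, one second apart, until it succeeds.
# Usage: retry TRIES COMMAND [ARGS...]
retry() {
    tries=$1
    shift
    n=1
    until "$@"; do
        # Stop after the last allowed attempt has failed.
        if [ "$n" -ge "$tries" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep 1
    done
    return 0
}
```

In the down script it could replace the separate umount and drbdadm lines, for example `retry 40 umount /jumper && drbdadm secondary r0`; 40 attempts is an arbitrary value chosen to outlast squid's 30-second wait.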


Can someone help me understand what's wrong,
or tell me where to look for a deeper analysis?
Many thanks in advance and best regards.

Sorry for my English.

--

Mario Vittorio Guenzi
E-mail jclark at tiscali.it
Si vis pacem, para bellum






