[DRBD-user] mirror config questions for manual failover

Mon Jan 19 21:42:06 CET 2004

Some clarifications:
On Sun, 18 Jan 2004 22:18:51 -0500
george young <gry at ll.mit.edu> threw this fish to the penguins:

> [drbd-0.6.10, Suse 8.2 x86 linux 2.4.20-4GB-SMP, 2 nodes, pvt 100Mb net]
> I have two nodes, pig-app and pig-db.  Default config is that pig-app has
> the active copy of /home(36GB), pig-db has /db(1GB), each DRBD mirrored to
> the other.  If and *only if* an administrator decides that one node is
> down, she runs a script on the remaining node to take over the other's
> file system (and switch ip's around so users get the new host).  I'm
> having trouble getting the right drbd commands for this script.  I also
> see very slowww syncing time...  
> Both are reiser file systems.
> There is a private 100Mbit ethernet between the two nodes.

The 100Mb private net seems healthy:  tar-rsh-tar gets me 7.5MB/s.
Without the tar and fs overhead I get 7.9MB/s.

With --sync-min=10M, fsckcmd=/bin/true, separate --sync-groups,
and the disconnect/net pair of commands removed, I get a 1h44m sync
for the 36G partition, i.e. 5.8MB/s.  It would be nice to do 20%
better, but clearly it won't get much faster until I get gigabit ethernet.
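
The rate quoted above can be checked with a little arithmetic.  A minimal
sketch, assuming the disk-size of 36707364k from the drbd.conf below is in
KiB and that 1h44m is 6240 seconds; the function name sync_rate_mib is made
up for illustration:

```shell
# sync_rate_mib SIZE_KIB SECONDS
# Print the average sync throughput in MiB/s, to two decimals.
sync_rate_mib() {
    awk -v kib="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", kib / 1024 / secs }'
}

sync_rate_mib 36707364 6240    # 36G partition, 1h44m sync
```

This comes out at roughly 5.7 MiB/s, in line with the 5.8MB/s figure, and
roughly three quarters of the 7.9MB/s measured on the link itself.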

> I use the HA-Linux "IPaddr" script, but heartbeat is *not* enabled.
> 
> Here's my script for pig-app to grab the /db filesystem from pig-db:
> ------------------------------------------------------------
> if ping -c 1 pig-db; then
>     rsh -n pig-db /usr/local/etc/ha.d/resource.d/datadisk drbd_db stop &
>     sleep 30
> fi

I think I had put the next two commands in hoping to avoid a long
sync delay.  I'll take them out.
#> /sbin/drbdsetup /dev/nb1 disconnect
#> /sbin/drbdsetup /dev/nb1 net 10.0.0.115:7789 10.0.0.114:7789 C
> /usr/local/etc/ha.d/resource.d/datadisk drbd_db start

I *do* start other services in this script (samba, postgres) and move
the roving service IP addresses; those were omitted here for brevity.

> ------------------------------------------------------------
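
A minimal sketch of the takeover script with the changes described above
applied (disconnect/net pair removed).  The commands and paths are the ones
from the script; the DRY_RUN switch and the run, peer_alive and takeover
helpers are hypothetical additions so the sketch can be tried without
touching either node.  Starting samba, postgres and the service IPs is
still omitted:

```shell
#!/bin/sh
# Sketch: pig-app grabs the /db filesystem from pig-db.
DATADISK=/usr/local/etc/ha.d/resource.d/datadisk
PEER=pig-db
RESOURCE=drbd_db
DRY_RUN=${DRY_RUN:-1}    # hypothetical: 1 = only print, 0 = really execute

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# True if the peer answers a ping (assumed reachable in dry-run mode).
peer_alive() {
    [ "$DRY_RUN" = 1 ] || ping -c 1 "$PEER" >/dev/null 2>&1
}

takeover() {
    if peer_alive; then
        # Ask the peer to release the resource; backgrounded so a hung
        # rsh cannot block the takeover.
        run rsh -n "$PEER" "$DATADISK" "$RESOURCE" stop &
        run sleep 30
    fi
    # No drbdsetup disconnect/net here any more.
    run "$DATADISK" "$RESOURCE" start
}

takeover
```

With the default DRY_RUN=1 this only prints the commands it would run;
setting DRY_RUN=0 executes them.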
> I'm trying to ensure that I don't get into a 2-hour long sync while
> users are screaming.  After correcting the problem, I can revert
> off-hours, so that time is not critical.
> Does this script make sense?  How could it be better?
> 
> I am also frustrated that it takes 2 hours to sync 36G over a 100Mbit
> private net.  That's a rate of about 5 Mbytes/sec.  Disks on both hosts
> are fast hardware raids.  Am I missing something?
> 
> Below is my (common) drbd.conf:
> ------------------------------------------------------------
> resource drbd_home {
>   protocol = C
>   fsckcmd  = fsck -p -y
I'll change this to /bin/true

>   disk {
>     disk-size = 36707364k
>   }
>   net {
>     sync-min    = 500k
I'll change this to sync-min=10M

>     sync-max    = 100M    # maximal average syncer bandwidth
>     tl-size     = 5000  # transfer log size, ensures strict write ordering
>     timeout     = 60    # 0.1 seconds
>     connect-int = 10    # seconds
>     ping-int    = 10    # seconds

I'll add sync-group=1
>   }
>   on pig-app {
>     device  = /dev/nb0
>     disk    = /dev/rd/c0d0p5
>     address = 10.0.0.115
>     port    = 7788
>   }
>   on pig-db {
>     device  = /dev/nb0
>     disk    = /dev/rd/c0d2p1
>     address = 10.0.0.114
>     port    = 7788
>   }
> }
> resource drbd_db {
>   protocol = C
>   fsckcmd  = fsck -p -y
>   disk {
>     disk-size = 1052184k
>   }
>   net {
>     sync-min    = 500k
>     sync-max    = 100M    # maximal average syncer bandwidth
>     tl-size     = 5000  # transfer log size, ensures strict write ordering
>     timeout     = 60    # 0.1 seconds
>     connect-int = 10    # seconds
>     ping-int    = 10    # seconds

I'll add sync-group=2
>   }
>   on pig-app {
>     device  = /dev/nb1
>     disk    = /dev/rd/c0d0p3
>     address = 10.0.0.115
>     port    = 7789
>   }
>   on pig-db {
>     device  = /dev/nb1
>     disk    = /dev/rd/c0d0p1
>     address = 10.0.0.114
>     port    = 7789
>   }
> }
> 

-- 
"Are the gods not just?"  "Oh no, child.
What would become of us if they were?" (CSL)