[DRBD-user] Optimization Issues

Fri May 7 06:04:27 CEST 2004

Hi everyone,

I have a two-node HA cluster using DRBD for data replication.
All filesystems are ext3 in ordered-write mode. The software is DRBD
v0.6.12, Linux 2.4.26, and Debian GNU/Linux 3.0r2, with DRBD taken
directly from <http://fsrc.csee.wvu.edu/debian/apt-repository>.

The two machines are connected to each other via GbE with a crossover
cable. Each has two hardware SCSI RAID 10 arrays, with one DRBD device
set up per array. One DRBD device is low-traffic,
used for shared configs. The other DRBD device is used heavily for the
PostgreSQL database.

During synchronization and when there is heavy database I/O, I get a lot
of the following messages in my logs on the primary node (interestingly,
the secondary node doesn't complain at all):

    kernel: drbd1: transferlog too small!!
    kernel: drbd1: tl messed up!
    kernel: drbd1: Epoch set size wrong!!found=192 reported=191

I searched the archives and found that this basically means I need to
tune my drbd.conf file, and that it isn't critical. Is this correct?
Would anyone know of a general tuning and optimization guide for DRBD?
Or perhaps would anyone be able to spare me some time to comment on my
DRBD configuration file?

I'm also curious: what's the most reliable way of finding the value to
put in disk-size? Or can this be omitted for configurations where the
partitions on both sides are of exactly the same size?
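
One candidate source for that value is /proc/partitions, which lists
each block device's size in 1 KiB blocks. The Python sketch below only
illustrates that idea. The helper names parse_partitions and
disk_size_value are hypothetical, and taking the smaller of the two
nodes' sizes is an assumption on my part, not something the DRBD
documentation has confirmed to me.

```python
def parse_partitions(text):
    """Map device name -> size in 1 KiB blocks from /proc/partitions text."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with: major minor #blocks name
        # The header row and blank lines fail the isdigit() checks.
        if len(fields) >= 4 and fields[0].isdigit() and fields[2].isdigit():
            sizes[fields[3]] = int(fields[2])
    return sizes


def disk_size_value(local_kib, peer_kib):
    """Format a disk-size value; using the smaller side is an assumption."""
    return "%dk" % min(local_kib, peer_kib)


if __name__ == "__main__":
    # Print every device known to the local kernel with its size in KiB.
    with open("/proc/partitions") as table:
        for name, kib in sorted(parse_partitions(table.read()).items()):
            print("%-20s %dk" % (name, kib))
```

Running this on both nodes and comparing the figures for the backing
partitions (cciss/c0d0p5 and cciss/c0d1p1 here) would at least show
whether the two sides really are the same size.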

My configuration is as follows:

    resource drbd0 {
        protocol = C
        fsckcmd = fsck -p -y
        inittimeout = 60
        disk {
             do-panic
             disk-size = 61522304k
        }
        net {
             sndbuf-size = 1M
             sync-nice = -20
             sync-min = 4M
             sync-max = 600M
             tl-size = 5000
             timeout = 60
             connect-int = 10
             ping-int = 10
             ko-count = 4
        }
        on node1 {
             device = /dev/nb0
             disk = /dev/cciss/c0d0p5
             address = 192.168.4.1
             port = 7788
        }
        on node2 {
             device = /dev/nb0
             disk = /dev/cciss/c0d0p5
             address = 192.168.4.2
             port = 7788
        }
    }

    resource drbd1 {
        protocol = C
        fsckcmd = fsck -p -y
        inittimeout = 60
        disk {
             do-panic
             disk-size = 70005836k
        }
        net {
             sndbuf-size = 1M
             sync-nice = -20
             sync-min = 4M
             sync-max = 600M
             tl-size = 5000
             timeout = 60
             connect-int = 10
             ping-int = 10
             ko-count = 4
        }
        on node1 {
             device = /dev/nb1
             disk = /dev/cciss/c0d1p1
             address = 192.168.4.1
             port = 7789
        }
        on node2 {
             device = /dev/nb1
             disk = /dev/cciss/c0d1p1
             address = 192.168.4.2
             port = 7789
        }
    }

Thank you very much! :)

 --> Jijo

-- 
Federico Sevilla III : jijo.free.net.ph : When we speak of free software
GNU/Linux Specialist : GnuPG 0x93B746BE : we refer to freedom, not price.