Hi all,
I am currently struggling with a Linux HA cluster. This is
my first venture into this area; what I am after is an HA cluster
of SMB file servers.
The setup I am trying to get working is:
2 identical servers, each with:
 Tyan Thunder i7501 Pro Motherboard
 Dual Pentium Xeon 2.4GHz 533 FSB
 2GB ECC 266 RAM
 2 Gigabit Ethernet ports
 4GB SCSI HDD (/dev/sdc) SWAP Partition
 4GB SCSI HDD (/dev/sdd) System disk
 2 x 3Ware Escalade 8506 SATA RAID controllers
  Each with 4 SATA 200GB HDDs (8MB cache)
  Configured as RAID5 with 1 Hot Spare
 I want to RAID0 the two RAID5 arrays with software RAID for performance
The two servers will be clustered together with drbd and heartbeat over a
dedicated Gigabit link.
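As a rough sanity check on the layout above, here is a minimal Python sketch of the usable capacity. It assumes each RAID5 array is built from the 3 non-spare disks (one disk's worth of space going to parity) and that both arrays are the same size; it ignores 3ware and md metadata overhead. The function names are illustrative only.

```python
from typing import Sequence


def raid5_usable(disk_size: int, disks: int, hot_spares: int = 0) -> int:
    """Usable capacity of one RAID5 array, in the same unit as disk_size.

    Hot spares hold no data, and one active member's worth of space
    goes to parity.
    """
    active = disks - hot_spares
    if active < 3:
        raise ValueError("RAID5 needs at least 3 active members")
    return (active - 1) * disk_size


def raid0_usable(member_sizes: Sequence[int]) -> int:
    """Conservative RAID0 capacity: number of members times the smallest.

    For equal-sized members (as assumed here) this equals the plain sum.
    """
    return len(member_sizes) * min(member_sizes)


# Each controller: 4 x 200GB disks, RAID5 with 1 hot spare.
per_array = raid5_usable(200, disks=4, hot_spares=1)
# Software RAID0 across the two controller arrays.
total = raid0_usable([per_array, per_array])
print(per_array, total)  # 400 800
```

With nominal 200GB disks this gives 400GB per RAID5 array and 800GB for the RAID0 stripe. For comparison, the disk-size of 796582784k in the drbd.conf below works out to roughly 815.7 x 10^9 bytes (796582784 x 1024), so it may be worth checking it against the size the kernel actually reports for /dev/md0.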
The Problem
-----------
The hardware RAID5 seems to be fine and is working well, but when I introduce
the RAID0 level on top, drbd seems to hang the PRIMARY machine after a
relatively short period of time.
I have tried drbd with just the hardware RAID5, and that seems fine.
When I put RAID0 on top it all falls down.
I have tried the RAID0 layer both with and without a filesystem on it.
I have tried creating the RAID0 using both mdadm and raidtools.
I have updated the kernel to the latest version.
I have rebuilt the 3ware drivers to the latest version.
I have updated the 3ware firmware.
The signs
---------
The primary machine just hangs. There are no panics or logs of anything
unusual.
The secondary machine shows a status of c:WFConnect s:Secondary/Unknown in
/proc/drbd.
/var/log/messages reports a ping ack timeout error. That is all that
happens.
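To watch for the WFConnect and Secondary/Unknown state while reproducing the hang, a small parser for the /proc/drbd status line can help. This is a minimal Python sketch; it assumes the line is made up of whitespace-separated key:value fields like the status quoted above, and the key names (c and s here, cs and st in some versions) vary between DRBD releases, so treat them as assumptions.

```python
import re

# Matches whitespace-separated "key:value" tokens such as "c:WFConnect".
# A bare device prefix like "0: " is skipped because no value follows the colon.
TOKEN = re.compile(r"(\w+):(\S+)")


def parse_drbd_status(line: str) -> dict:
    """Split one /proc/drbd device line into a dict of its key:value fields."""
    fields = dict(TOKEN.findall(line))
    # Role fields are written as local/peer, e.g. "Secondary/Unknown".
    for key in ("s", "st"):
        if "/" in fields.get(key, ""):
            fields["local_role"], fields["peer_role"] = fields[key].split("/", 1)
    return fields


status = parse_drbd_status("0: c:WFConnect s:Secondary/Unknown")
print(status["c"], status["local_role"], status["peer_role"])
# WFConnect Secondary Unknown
```

A peer role of Unknown means the secondary no longer has a connection to its peer. Feeding it lines read from /proc/drbd in a loop, with timestamps, would show when that transition happens relative to the hang on the primary.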
Thank you in advance to anyone who might be able to help me.
Neil
Below are my drbd.conf file and the raidtab file (which I used in the
raidtools test):
# drbd.conf
resource drbd0 {
  protocol = B
  fsckcmd = /bin/true
  disk {
    do-panic
    disk-size = 796582784k
  }
  net {
    sync-nice  = -18
    sync-min   = 4M
    sync-max   = 500M
    tl-size     = 5000
    timeout     = 60
    connect-int = 10
    ping-int    = 10
  }
  on babbage {
    device  = /dev/nb0
    disk    = /dev/md0
    address = 172.16.0.1
    port    = 7788
  }
  on newton {
    device  = /dev/nb0
    disk    = /dev/md0
    address = 172.16.0.2
    port    = 7788
  }
}
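When reading the ping ack timeout message, it helps to know what the net {} values above mean in seconds. A minimal Python sketch, assuming the drbd.conf convention that timeout is given in tenths of a second while ping-int and connect-int are whole seconds (verify this against the drbd.conf documentation for the DRBD version in use); the function name is hypothetical.

```python
def drbd_net_timings(timeout: int, ping_int: int, connect_int: int) -> dict:
    """Convert drbd.conf net{} timing values to seconds.

    Assumption: 'timeout' is in tenths of a second, while 'ping-int'
    and 'connect-int' are in seconds. Check the man page for your version.
    """
    return {
        "ack_timeout_s": timeout / 10,
        "ping_interval_s": float(ping_int),
        "reconnect_interval_s": float(connect_int),
    }


# Values from the net {} section above.
print(drbd_net_timings(timeout=60, ping_int=10, connect_int=10))
# {'ack_timeout_s': 6.0, 'ping_interval_s': 10.0, 'reconnect_interval_s': 10.0}
```

Under that reading, timeout = 60 gives a window of about 6 seconds for an expected reply, so a primary that stops responding for longer than that would plausibly account for the ping ack timeout logged on the secondary.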
# raidtab
raiddev /dev/md0
 raid-level 0
 nr-raid-disks 2
 persistent-superblock 1
 chunk-size 64
 device  /dev/sda1
 raid-disk 0
 device  /dev/sdb1
 raid-disk 1
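With chunk-size 64 (KiB) and two members, md's RAID0 places consecutive 64KiB chunks alternately on /dev/sda1 and /dev/sdb1. The Python sketch below illustrates that round-robin mapping; it assumes equal-sized members whose data starts at offset 0, which is a simplification and not md's actual code.

```python
from typing import Tuple


def raid0_map(offset_kib: int, chunk_kib: int, n_disks: int) -> Tuple[int, int]:
    """Map a logical offset on a RAID0 array to (raid-disk index, offset on it).

    Assumes equal-sized members whose data starts at offset 0 (with the
    0.90 persistent superblock, md keeps the superblock at the device end).
    """
    chunk_no, within = divmod(offset_kib, chunk_kib)
    disk = chunk_no % n_disks        # chunks rotate round-robin over members
    stripe = chunk_no // n_disks     # full stripes before this chunk
    return disk, stripe * chunk_kib + within


# chunk-size 64, two members (raid-disk 0 = /dev/sda1, raid-disk 1 = /dev/sdb1)
for off in (0, 64, 100, 130):
    print(off, raid0_map(off, chunk_kib=64, n_disks=2))
# 0 (0, 0)
# 64 (1, 0)
# 100 (1, 36)
# 130 (0, 66)
```

Any I/O range that spans a chunk boundary therefore touches both 3ware controllers, which is one way the RAID0 setup differs from the plain RAID5 case that works.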