Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
/ 2004-01-28 09:02:43 -0000
\ Dave Smith:
> Hi all,
>
> I am currently struggling with a Linux HA cluster. This is
> my first venture into this area, and I am looking to build an HA cluster
> of smb file servers.
>
> The setup I am trying to get working is:
>
> 2 identical servers, each with:
> Tyan Thunder i7501 Pro Motherboard
> Dual Pentium Xeon 2.4GHz 533 FSB
> 2GB ECC 266 RAM
> 2 Gigabit Ethernet ports
> 4GB SCSI HDD (/dev/sdc) SWAP Partition
> 4GB SCSI HDD (/dev/sdd) System disk
> 2 x 3Ware Escalade 8506 SATA RAID controllers
> Each with 4 SATA 200GB HDD (8MB Cache)
> Configured as RAID5 with 1 Hot Spare
> I want to RAID0 the two RAID5 arrays with software RAID for performance
>
> The two servers will be clustered together with drbd and heartbeat over a
> dedicated Gigabit link.
>
> The Problem
> -----------
>
> The hardware RAID5 seems to be fine, and working well, but when I introduce
> the RAID0 level on top, drbd seems to hang the PRIMARY machine
> after a relatively short period of time.
>
> I have tried drbd with just RAID5 (hardware), and that seems fine.
>
> When I put RAID0 on top it all falls down.
>
> I have tried RAID0 on top without a file system and with a filesystem.
> I have tried creating the RAID0 using both mdadm and raidtools.
>
> I have updated the kernel to the latest version
> I have rebuilt the 3ware drivers to the latest version
> I have updated the 3ware firmware
What about the drbd version? Did you use 0.6.10+cvs?
And which "latest kernel version"? kernel.org? some vendor kernel?
Not that I think it is a kernel problem, but it won't be the first
interoperability problem ...
> The signs
> ---------
> The primary machine just hangs. There are no panics or logs of anything
> unusual.
> The secondary machine gives a c:WFConnect s:Secondary/Unknown status from
> /proc/drbd
> /var/log/messages reports a ping ack timeout error. This is all that
> happens.
>
> Thank you in advance to anyone who might be able to help me.
>
> Neil
>
>
> Below are the drbd.conf file and the raidtab file (which I used in the
> raidtools test)
>
> # drbd.conf
> resource drbd0 {
> protocol = B
You should use protocol C.
Unless you are mirroring over some long distance link, in which
case you should use A. Benchmarks suggest that proto B does not
cut it at all, even though one may think it does.
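
To make the rule of thumb above concrete, here is a minimal Python sketch.
It is not part of drbd; the names PROTOCOLS and suggest_protocol are
hypothetical, and the one-line descriptions follow the way the three
protocols are commonly described in the drbd documentation.

```python
# Hypothetical illustration only; nothing here is a drbd API.
# Each entry says when a write on the primary is reported as complete.
PROTOCOLS = {
    "A": "local disk write done, packet placed in the local TCP send buffer",
    "B": "local disk write done, packet has reached the peer (memory only)",
    "C": "local disk write done and the peer's disk write is confirmed",
}


def suggest_protocol(long_distance_link: bool) -> str:
    """Apply the advice above: A for long distance links, otherwise C."""
    return "A" if long_distance_link else "C"


if __name__ == "__main__":
    for link_is_long in (False, True):
        proto = suggest_protocol(link_is_long)
        print(f"long distance: {link_is_long} -> protocol {proto}: "
              f"{PROTOCOLS[proto]}")
```

For a dedicated Gigabit link between two machines, as in the setup
described here, this picks C.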
[...]
> # raidtab
> raiddev /dev/md0
> raid-level 0
> nr-raid-disks 2
> persistent-superblock 1
> chunk-size 64
> device /dev/sda1
> raid-disk 0
> device /dev/sdb1
> raid-disk 1
I don't see anything suspicious.
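
For reference, a rough Python sketch of what this raidtab describes: a
two-disk RAID0 with 64 KiB chunks, where consecutive chunks alternate
between the member devices. The function name locate is hypothetical, and
the sketch assumes equal-sized members and ignores the md superblock.

```python
# Values taken from the raidtab above; raidtab chunk-size is given in KiB.
CHUNK_SIZE = 64 * 1024
DEVICES = ["/dev/sda1", "/dev/sdb1"]


def locate(offset: int) -> tuple[str, int]:
    """Map a byte offset on /dev/md0 to (member device, byte offset on it).

    Simplified model: equal-sized members, no superblock accounting.
    """
    chunk = offset // CHUNK_SIZE        # index of the chunk on md0
    within = offset % CHUNK_SIZE        # position inside that chunk
    disk = chunk % len(DEVICES)         # chunks alternate across members
    stripe = chunk // len(DEVICES)      # how many full stripes precede it
    return DEVICES[disk], stripe * CHUNK_SIZE + within


if __name__ == "__main__":
    for off in (0, CHUNK_SIZE, 2 * CHUNK_SIZE + 5):
        print(off, locate(off))
```

Any sequential write larger than one chunk therefore hits both hardware
RAID5 arrays, which is the intended performance gain of this layout.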
Lars Ellenberg