Hi all,

I've come across an issue on a clustered setup that I haven't been able to find a resolution for. (It's a bit different from the other stalled ticket, as neither node is locking up on its own. They remain stable and are able to ping each other on both eth0 and eth1.) If anyone could shed some light on it, I'd very much appreciate it.

*SETUP*

Two hardware nodes, running a DRBD/LVM/Xen stack (in that order). One node is using a RAID-1 3ware controller with two Western Digital Blue 1.0TB drives; the other is using a RAID-5 3ware controller with three WD Black RE 500GB drives. Everything else is identical. (I'm attempting to sync the data to the RAID-5 node so I can fail over to it and replace the RAID-1 on the other node, since performance with DRBD on that node is awful.)

Controller information:

node1
3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xda100000, IRQ: 16.
3w-9xxx: scsi0: Firmware FE9X 4.06.00.004, BIOS BE9X 4.05.00.015, Ports: 4.

node2
3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xda100000, IRQ: 16.
3w-9xxx: scsi0: Firmware FE9X 4.06.00.004, BIOS BE9X 4.05.00.015, Ports: 2.

Software version information is identical on both nodes:

Distro: Debian 5.0.3
Kernel: Linux 2.6.26-2-xen-amd64 #1 SMP Thu Aug 20 00:36:34 UTC 2009 x86_64 GNU/Linux
drbdadm: Version: 8.0.14 (api:86)
Xen Version: 3.0.3

*PROBLEM*

The sync runs without any issues for a few hours, after which /proc/drbd reports that it has stalled. The Xen instance running on top stops responding entirely. There are a number of stack traces in the system log, which I have attached to this email.

Has anyone come across something like this before? Updating DRBD is a bit iffy, as our client is very downtime-averse given that he's paying for a high-availability setup. I don't know whether it can be done without reinitializing both resources.

Thanks in advance for any help!

Regards,
Tom Pawlowski

-------------- next part --------------
A non-text attachment was scrubbed...
Name: node2.drbd.crash.20090928.log
Type: text/x-log
Size: 25836 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20090928/a2e780f8/attachment.bin>
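
For anyone trying to catch the stall as it happens rather than hours later, here is a rough sketch of a watcher that polls /proc/drbd and flags a sync whose percentage has stopped advancing. It is only an illustration: the function names, the polling interval and the window size are all made up for this example, and the regular expression assumes the DRBD 8.x progress line contains text of the form "sync'ed: NN.N%", which should be checked against the actual output on the node. Note also that if the kernel side is truly wedged, reading /proc/drbd may itself block.

```python
import re
import time

# Assumption: the resync progress line in /proc/drbd contains
# "sync'ed: <percent>%". Verify against the real output before relying on it.
SYNC_RE = re.compile(r"sync'ed:\s*([0-9.]+)%")


def sync_percent(proc_text):
    """Return the first sync percentage found in the text, or None."""
    match = SYNC_RE.search(proc_text)
    return float(match.group(1)) if match else None


def is_stalled(samples):
    """True if the reported percentage did not advance across the samples.

    Samples that are None (no resync line present) are ignored, and fewer
    than two usable samples never count as a stall.
    """
    values = [s for s in samples if s is not None]
    if len(values) < 2:
        return False
    return values[-1] <= values[0]


def watch(path="/proc/drbd", interval=60, window=10):
    """Poll `path` forever and print a warning when the sync looks stalled.

    `interval` (seconds) and `window` (number of samples) are arbitrary
    defaults chosen for illustration only.
    """
    samples = []
    while True:
        with open(path) as handle:
            samples.append(sync_percent(handle.read()))
        samples = samples[-window:]
        if len(samples) == window and is_stalled(samples):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, "resync appears stalled at", samples[-1], "percent")
        time.sleep(interval)
```

Calling watch() from a screen session on the sync source would then log roughly when progress stops, which could be lined up against the stack traces in the system log.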