[DRBD-user] DRBD Crashing/Stalling on Sync

Tom Pawlowski tpawlowski at fortressitx.com
Mon Sep 28 22:41:23 CEST 2009



Hi all,

I've come across an issue on a clustered setup that I haven't been
able to find a resolution for. (It's a bit different from the other
stalled ticket, as neither of the nodes is locking up on its own.
They remain stable and are able to ping each other on both eth0/eth1.)
If anyone could shed some light on it, I'd very much appreciate it.


*SETUP*

Two hardware nodes, running a DRBD/LVM/Xen stack (in that order). One
node is using a RAID-1 3ware controller with two Western Digital Blue
1.0TB drives, the other a RAID-5 3ware controller with three WD Black RE
500GB drives. Everything else is identical.

(I'm attempting to sync the data to the RAID-5 node so I can fail over
to it and replace the RAID-1 on the other node, given that the
performance with DRBD on the latter is awful.)

Controller information:

node1
3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xda100000, IRQ: 16.
3w-9xxx: scsi0: Firmware FE9X 4.06.00.004, BIOS BE9X 4.05.00.015, Ports: 4.

node2
3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xda100000, IRQ: 16.
3w-9xxx: scsi0: Firmware FE9X 4.06.00.004, BIOS BE9X 4.05.00.015, Ports: 2.

Software version information is identical on both nodes:

Distro: Debian 5.0.3
Kernel: Linux 2.6.26-2-xen-amd64 #1 SMP Thu Aug 20 00:36:34 UTC 2009
x86_64 GNU/Linux
drbdadm Version: 8.0.14 (api:86)
Xen Version: 3.0.3


*PROBLEM*

The sync runs without any issues for a few hours, after which
/proc/drbd reports that it has stalled. The Xen instance running on
top stops responding entirely. There are a number of stack traces in
the system log, which I have attached to this email.
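
For anyone who wants to catch the stall as it happens rather than
after the fact, here is a minimal sketch that checks /proc/drbd-style
text for the sync percentage and the "stalled" marker. The SAMPLE text
and the parse_sync_status name are assumptions for illustration: the
sample is modelled on the general DRBD 8.0 layout and is not output
from the nodes described here, so the patterns may need adjusting.

```python
import re

# Hypothetical sample, modelled on DRBD 8.0 /proc/drbd output.
# It is NOT taken from the affected nodes; the exact layout is an assumption.
SAMPLE = """version: 8.0.14 (api:86/proto:86)
 0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
\t[=======>............] sync'ed: 42.5% (100/200)M
\tstalled
"""


def parse_sync_status(text):
    """Return (percent_synced, stalled) for /proc/drbd-style text.

    percent_synced is None when no resync progress line is present.
    stalled is True when a line consisting only of "stalled" is found.
    """
    match = re.search(r"sync'ed:\s*([0-9.]+)%", text)
    percent = float(match.group(1)) if match else None
    stalled = re.search(r"^\s*stalled\s*$", text, re.MULTILINE) is not None
    return percent, stalled


if __name__ == "__main__":
    percent, stalled = parse_sync_status(SAMPLE)
    print("synced: %s%%, stalled: %s" % (percent, stalled))
```

On a live node the same function could be fed the contents of
/proc/drbd from a cron job, logging a timestamp the first time the
stalled flag comes back true.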

Has anyone come across something like this before? Updating DRBD is a
bit iffy, as our client is very downtime-averse given that he's
paying for a high-availability setup. I don't know whether that can be
done without reinitializing both resources.

Thanks in advance for any help!

Regards,
Tom Pawlowski
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node2.drbd.crash.20090928.log
Type: text/x-log
Size: 25836 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20090928/a2e780f8/attachment.bin>

