[DRBD-user] DRBD Crashing/Stalling on Sync

Adam Taylor adam.taylor at wml.co.nz
Tue Sep 29 22:50:14 CEST 2009


Hi Tom,

Have you tried lowering the sync rate (the "rate" setting in the syncer
section) in drbd.conf?
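
For instance, a minimal sketch for a DRBD 8.0-style drbd.conf. The
resource name r0 and the 10M figure are placeholders I made up, not values
from your setup, so pick a rate well below what your disks and replication
link can sustain:

```
# Hypothetical resource name; use the one already in your drbd.conf.
resource r0 {
  syncer {
    # Cap background resync at about 10 MByte/s (bytes, not bits).
    # Assumed example value, not a recommendation for this hardware.
    rate 10M;
  }
  # ... the rest of the existing resource definition stays unchanged ...
}
```

After making the same change on both nodes, "drbdadm adjust r0" should
apply it to the running resource without taking it down.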

Thanks

Adam 

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com
[mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Tom Pawlowski
Sent: Tuesday, September 29, 2009 8:41 AM
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] DRBD Crashing/Stalling on Sync

Hi all,

I've come across an issue on a clustered setup that I haven't been able to
find a resolution for. (It's a bit different from the other stalled ticket,
as neither of the nodes is locking up on its own. They remain stable and
are able to ping each other on both eth0/eth1.) If anyone could shed some
light on it, I'd very much appreciate it.


*SETUP*

Two hardware nodes, running a DRBD/LVM/Xen stack (in that order). One node
is using a RAID-1 3ware controller with two Western Digital Blue 1.0TB
drives, the other a RAID-5 3ware controller with three WD Black RE 500GB
drives. Everything else is identical.

(I'm attempting to sync the data to the RAID-5 node so I can failover to
that and replace RAID-1 on the other node--given that the performance with
DRBD on the latter is awful.)

Controller information:

node1
3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xda100000, IRQ: 16.
3w-9xxx: scsi0: Firmware FE9X 4.06.00.004, BIOS BE9X 4.05.00.015, Ports: 4.

node2
3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xda100000, IRQ: 16.
3w-9xxx: scsi0: Firmware FE9X 4.06.00.004, BIOS BE9X 4.05.00.015, Ports: 2.

Software version information is identical for both nodes:

Distro: Debian 5.0.3
Kernel: Linux 2.6.26-2-xen-amd64 #1 SMP Thu Aug 20 00:36:34 UTC 2009 x86_64 GNU/Linux
drbdadm Version: 8.0.14 (api:86)
Xen Version: 3.0.3


*PROBLEM*

The sync will run along without any issues for a few hours, after which
/proc/drbd reports that it has stalled. The Xen instance running on top
stops responding entirely. There are a number of stack traces in the system
log which I have attached to the email.

Has anyone come across something like this before? Updating DRBD is a bit
iffy, as our client is very downtime-averse given that he's paying for a
high-availability setup. I don't know whether the update can be done
without reinitializing both resources.

Thanks in advance for any help!

Regards,
Tom Pawlowski


