[DRBD-user] DRBD stability issues

Lars Ellenberg lars.ellenberg at linbit.com
Wed Nov 16 13:03:47 CET 2011



On Wed, Nov 16, 2011 at 01:01:38PM +1100, Steve Kieu wrote:
> Hello,
> 
> I am experimenting with DRBD and its stability is not good (it is unusable).
> I saw this in the dmesg log:
> 
> block drbd1: md_sync_timer expired! Worker calls drbd_md_sync().

Usually, especially with "huge" devices, this is no reason to worry.
No need to do _anything_.

> After a restart it works for a while, and then all of a sudden
> cat /proc/drbd shows ProtocolError and everything hangs (mysql or any other
> process that reads/writes the DRBD partitions).
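
A minimal sketch for spotting the ProtocolError state from a script, assuming the 8.4-style /proc/drbd layout in which each minor has a status line such as " 1: cs:ProtocolError ro:Primary/Unknown ...". The helpers connection_states and minors_in_state are hypothetical names, not part of any DRBD tooling.

```python
import re

# Per-minor status line in the assumed 8.4-style layout, e.g.
# " 1: cs:ProtocolError ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"
STATUS_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")


def connection_states(proc_drbd_text: str) -> dict:
    """Map each DRBD minor number to its connection state (the cs: field)."""
    states = {}
    for line in proc_drbd_text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            states[int(match.group(1))] = match.group(2)
    return states


def minors_in_state(proc_drbd_text: str, state: str = "ProtocolError") -> list:
    """Return the sorted minor numbers whose connection state equals `state`."""
    return sorted(
        minor
        for minor, cs in connection_states(proc_drbd_text).items()
        if cs == state
    )
```

On a node with DRBD loaded, minors_in_state(open("/proc/drbd").read()) would list the affected minors; polling it periodically could help line up when the hang starts against the timestamps in dmesg.
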
> 
> It is repeatable, and when it happens the network is not busy, the machine
> load is nearly 0, and all other network connectivity is normal.
> 
> Googling shows that many users have the same problem, and one suggested
> lowering the resync and sync rates. I did that (for 100Mbit Ethernet I set
> resync to 3M and the syncer rate to 40M; I set up two volumes). The problem
> remains.
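
The rate figures above can be sanity-checked with a little arithmetic. A sketch, assuming DRBD's M suffix means MiB/s and ignoring TCP/IP overhead: a 100 Mbit/s link carries at most about 11.9 MiB/s, so a 40M rate is above what the wire can deliver. The share=0.3 default reflects the commonly quoted rule of thumb of roughly 30% of replication bandwidth for resync; treat it as an adjustable assumption. Both function names are hypothetical.

```python
def link_capacity_mib_s(link_mbit_s: float) -> float:
    """Raw ceiling of a link in MiB/s (1 Mbit = 10**6 bits), ignoring overhead."""
    return link_mbit_s * 1_000_000 / 8 / (1024 * 1024)


def check_rate(rate_mib_s: float, link_mbit_s: float, share: float = 0.3) -> dict:
    """Compare a configured DRBD rate (assumed to be in MiB/s) with the link.

    `share` is the fraction of link bandwidth to allow for resync; 0.3 is a
    rule-of-thumb assumption, not a hard limit.
    """
    capacity = link_capacity_mib_s(link_mbit_s)
    return {
        "capacity_mib_s": round(capacity, 2),
        "exceeds_link": rate_mib_s > capacity,
        "suggested_max_mib_s": round(capacity * share, 2),
    }
```

check_rate(40, 100) reports exceeds_link as True against a capacity of about 11.92 MiB/s, while check_rate(3, 100) stays below it.
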
> 
> 
> Here is the short description of the system:
> 
> * CentOS 6 x86_64
> * Kernel 2.6.32.43-vs2.3.0.36.29.8-h1-32cpu-noselinux, which is the vanilla
> kernel 2.6.32.43 with the vserver patch vs2.3.0.36.29.8, compiled with HZ=100
> and SMP for 32 CPUs
> * DRBD compiled from source, version 8.4.0 (including kernel module)

8.4.0 seems to have serious stability issues under moderate to heavy IO
when actually using the multi-volume feature :-(
We are preparing an 8.4.1.
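
To check whether a node runs the release described above, a sketch that reads the version line at the top of /proc/drbd (assumed format "version: 8.4.0 (api:1/proto:86-100)") and flags 8.4.0 combined with more than one volume per resource. drbd_version and multi_volume_risk are hypothetical helpers; the volume count is supplied by hand (two in the config quoted below).

```python
import re

# Assumed first line of /proc/drbd, e.g. "version: 8.4.0 (api:1/proto:86-100)".
VERSION_LINE = re.compile(r"^version:\s+(\d+)\.(\d+)\.(\d+)")


def drbd_version(proc_drbd_text: str):
    """Return the (major, minor, patch) tuple, or None if no version line is found."""
    for line in proc_drbd_text.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            return tuple(int(part) for part in match.groups())
    return None


def multi_volume_risk(version, volumes_per_resource: int) -> bool:
    """Flag the combination reported as unstable: 8.4.0 with more than one volume."""
    return version == (8, 4, 0) and volumes_per_resource > 1
```
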

> * DRBD built on top of LVM; here is the config:
> 
> resource r0 {
> 
>     on cosmos {
>         volume 0 {
>             #device minor 0;
>             device    /dev/drbd0;
>             meta-disk internal;
>             disk      /dev/vs-resource1/mysqldata;
>         }
> 
>         volume 1 {
>             device    /dev/drbd1;
>             meta-disk internal;
>             disk      /dev/vs-resource1/pgsqldata;
>         }
> 
>         address   10.200.11.4:7789;
>     }
> 
>     on seaspray {
>         volume 0 {
>             # device minor 0;
>             device    /dev/drbd0;
>             meta-disk internal;
>             disk      /dev/vg_seaspray/mysqldata;
>         }
> 
>         volume 1 {
>             device    /dev/drbd1;
>             meta-disk internal;
>             disk      /dev/vg_seaspray/pgsqldata;
>         }
> 
>         address   10.200.11.3:7789;
>     }
> 
>     startup {
>         #become-primary-on both;
>     }
> 
>     net {
>         #allow-two-primaries;
>         protocol C;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>         #cram-hmac-alg sha1;
>         #shared-secret "FooFunFactory";
>     }
> }
> 
> * DRBD runs in Primary/Secondary mode for now. The device is mounted into a
> vserver instance, and mysql and postgres are running inside the vserver.
> * iptables is set up to allow DRBD traffic; the problem happens even with
> iptables off.
> 
> * Network route
> route
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 10.200.11.0     *               255.255.255.224 U     0      0        0 eth0
> 10.200.11.128   *               255.255.255.192 U     0      0        0 eth1.503
> 192.168.100.0   *               255.255.255.0   U     0      0        0 dummy0
> 1.1.1.0         *               255.255.255.0   U     0      0        0 vmbr0
> link-local      *               255.255.0.0     U     1002   0        0 eth0
> link-local      *               255.255.0.0     U     1003   0        0 eth1
> link-local      *               255.255.0.0     U     1004   0        0 eth1.503
> default         10.200.11.1     0.0.0.0         UG    0      0        0 eth0
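
The table above can be read with a longest-prefix match to see which interface carries the replication traffic. A sketch using Python's ipaddress module, with rows copied from the table (link-local rows omitted, the default route written as 0.0.0.0/0.0.0.0): the peer 10.200.11.3 falls into 10.200.11.0/27, so DRBD traffic leaves through eth0, the same interface as the default route. outgoing_interface is a hypothetical helper; the kernel's real route selection also considers metrics and policy rules, which this ignores.

```python
import ipaddress

# (destination, genmask, iface) rows copied from the routing table above.
# Link-local rows are omitted; "default" is written as 0.0.0.0/0.0.0.0.
ROUTES = [
    ("10.200.11.0", "255.255.255.224", "eth0"),
    ("10.200.11.128", "255.255.255.192", "eth1.503"),
    ("192.168.100.0", "255.255.255.0", "dummy0"),
    ("1.1.1.0", "255.255.255.0", "vmbr0"),
    ("0.0.0.0", "0.0.0.0", "eth0"),  # default via 10.200.11.1
]


def outgoing_interface(peer: str, routes=ROUTES):
    """Pick the interface of the most specific (longest-prefix) matching route."""
    addr = ipaddress.ip_address(peer)
    best_net, best_iface = None, None
    for dest, genmask, iface in routes:
        net = ipaddress.ip_network(f"{dest}/{genmask}")
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_iface = net, iface
    return best_iface
```
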
> 
> I have attached the dmesg output in case it helps with debugging. I would
> like to get this fixed, so please help.
> 
> Many thanks,
> 
> 
> 
> 
> -- 
> Steve Kieu


> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


