[DRBD-user] DRBD stability issues

Steve Kieu msh.computing at gmail.com
Wed Nov 16 22:46:35 CET 2011


Hi,

Thanks for the reply.

The multi-volume feature is quite useful. With it I can, for example,
switch Primary/Secondary individually if mysql or pgsql crashes, without
one affecting the other.

I don't mean to push, but it would be great if you could say when the next
release is due so I can test it. Also, is 8.3.12 workable? I will give it a
test anyway if the next release is a long wait.

Many thanks,



On Wed, Nov 16, 2011 at 11:03 PM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:

> On Wed, Nov 16, 2011 at 01:01:38PM +1100, Steve Kieu wrote:
> > Hello,
> >
> > I am experimenting with drbd and it is not very stable (unusable). I saw
> > this in the dmesg log:
> >
> > block drbd1: md_sync_timer expired! Worker calls drbd_md_sync().
>
> Usually, especially with "huge" devices, this is no reason to worry.
> No need to do _anything_.
>
> > At first, after a restart, it works for a while, and then all of a sudden
> > cat /proc/drbd shows ProtocolError and the system hangs (mysql or any
> > other process reading/writing to the drbd partitions).
> >
> > It is repeatable, and when it happens the network is not busy, machine
> > load is nearly 0 and all other network connectivity is normal.
> >
> > Googling shows me that many users have the same problem, and one suggested
> > lowering the resync and sync rates. I did that (for 100Mbit ethernet I set
> > resync to 3M and the syncer rate to 40M); I set up two volumes. The problem
> > persists.
> >
> >
> > Here is the short description of the system:
> >
> > * CentOS 6 x86_64
> > * Kernel 2.6.32.43-vs2.3.0.36.29.8-h1-32cpu-noselinux, which is the vanilla
> > kernel 2.6.32.43 with the vserver patch vs2.3.0.36.29.8, compiled with
> > HZ = 100 and SMP for 32 CPUs
> > * DRBD compiled from source, version 8.4.0 (including kernel module)
>
> 8.4.0 seems to have serious stability issues under moderate to heavy IO
> when actually using the multi volume feature :-(
> We are preparing an 8.4.1.
>
> > * DRBD built on top of LVM; here is the config:
> >
> > resource r0 {
> >
> >     on cosmos {
> >         volume 0 {
> >             #device minor 0;
> >             device /dev/drbd0;
> >             meta-disk internal;
> >             disk /dev/vs-resource1/mysqldata;
> >         }
> >
> >         volume 1 {
> >             device /dev/drbd1;
> >             meta-disk internal;
> >             disk /dev/vs-resource1/pgsqldata;
> >         }
> >
> >         address 10.200.11.4:7789;
> >     }
> >
> >     on seaspray {
> >         volume 0 {
> >             # device minor 0;
> >             device /dev/drbd0;
> >             meta-disk internal;
> >             disk /dev/vg_seaspray/mysqldata;
> >         }
> >
> >         volume 1 {
> >             device /dev/drbd1;
> >             meta-disk internal;
> >             disk /dev/vg_seaspray/pgsqldata;
> >         }
> >
> >         address 10.200.11.3:7789;
> >     }
> >
> >     startup {
> >         #become-primary-on both;
> >     }
> >
> >     net {
> >         #allow-two-primaries;
> >         protocol C;
> >         after-sb-0pri discard-zero-changes;
> >         after-sb-1pri discard-secondary;
> >         after-sb-2pri disconnect;
> >         #cram-hmac-alg sha1;
> >         #shared-secret "FooFunFactory";
> >     }
> > }
> >
> > * DRBD runs in Primary/Secondary mode for now. The device is mounted into
> > a vserver instance, and mysql and postgres are running from the vserver
> > * iptables is set up to allow DRBD traffic; it happened even with iptables
> > off
> >
> > * Network route
> > route
> > Kernel IP routing table
> > Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> > 10.200.11.0     *               255.255.255.224 U     0      0        0 eth0
> > 10.200.11.128   *               255.255.255.192 U     0      0        0 eth1.503
> > 192.168.100.0   *               255.255.255.0   U     0      0        0 dummy0
> > 1.1.1.0         *               255.255.255.0   U     0      0        0 vmbr0
> > link-local      *               255.255.0.0     U     1002   0        0 eth0
> > link-local      *               255.255.0.0     U     1003   0        0 eth1
> > link-local      *               255.255.0.0     U     1004   0        0 eth1.503
> > default         10.200.11.1     0.0.0.0         UG    0      0        0 eth0
> >
> > I attach the dmesg here as well in case it helps to debug. I would like
> > to have it fixed, so please help.
> >
> > Many thanks,
> >
> >
> >
> >
> > --
> > Steve Kieu
>
>
> > _______________________________________________
> > drbd-user mailing list
> > drbd-user at lists.linbit.com
> > http://lists.linbit.com/mailman/listinfo/drbd-user
>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed
>
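
For anyone following the thread: the ProtocolError state mentioned above
shows up in the cs: field of /proc/drbd. Below is a minimal Python sketch
for picking that field out per minor, assuming the usual 8.x status-line
layout ("<minor>: cs:<state> ..."). The sample text is made up for
illustration, and the function names and the set of "ok" states are my own
choices, not anything DRBD ships.

```python
import re

# Made-up sample in the /proc/drbd layout used by DRBD 8.x (illustrative only).
SAMPLE = """\
version: 8.4.0 (api:1/proto:86-100)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:ProtocolError ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

# One status line per minor: "<minor>: cs:<connection state> ..."
STATUS_LINE = re.compile(r"^[ \t]*(\d+):[ \t]+cs:(\S+)", re.MULTILINE)


def connection_states(text):
    """Return {minor: connection state} parsed from /proc/drbd-style text."""
    return {int(m.group(1)): m.group(2) for m in STATUS_LINE.finditer(text)}


def unhealthy(states, ok=("Connected", "SyncSource", "SyncTarget")):
    """Return the sorted minors whose connection state is not in `ok`."""
    return sorted(minor for minor, cs in states.items() if cs not in ok)


if __name__ == "__main__":
    states = connection_states(SAMPLE)
    print(states)
    print(unhealthy(states))
```

On a real node the text would come from reading /proc/drbd instead of
SAMPLE; which states count as healthy is a judgment call for your setup.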



-- 
Steve Kieu