[DRBD-user] Deleting directories/files from drbd drive getting hung for some time.

Wed Jul 27 21:32:13 CEST 2011

From: "Balreddy Medipally" <balreddy.m at imimobile.com>
> I couldn't found anything like "sda: Uncorrectable error" in
> dmesg and I am sure that there is no faulty HDD in our servers.

No filesystem or disk errors anywhere in the dmesg output from either box?
OK.  That means something else is going on, and tracking it down will be more
annoying than replacing disks.

> Actually we are using DRBD for MySQL high availability, so when
> dropping some big table like 50GB or bigger [then some] applications
> are not able to access [the] MySQL DB [while the table's being
> dropped].

Dropping a table that big entails a lot of disk I/O.  There's not a whole lot
you can do about that.  Do you see the same problems on a non-DRBD system? 
That's something I'd check if I could.

> resource drbd0 {
> protocol C;
> startup {
> become-primary-on both;
> }

Dual-primary is important; you should've mentioned that in the first message. 
Which filesystem are you using?  What are the mount options?  Because
GFS/OCFS2 have to maintain consistency and make sure all writes are replicated
to both nodes and manage locking, they're going to perform worse than ext3 in
a primary/secondary setup.  If performance is that critical, you'd probably be
better off using primary/secondary than dual-primary.

If you're unable to do that, switching to protocol A would probably make
writes a bit faster, since the primary no longer waits for the peer to
acknowledge each write.  We've had a primary/secondary setup using protocol A
for over a year.  That setup has had two unplanned failovers due to equipment
failures, and we haven't lost any data.

> net {
>         timeout 60;
>         max-epoch-size  8000;
>         max-buffers     8000;
>         unplug-watermark        128;
>         connect-int     10;
>         ping-int        10;
>         sndbuf-size     1024k;
>         ko-count        180;
>         ping-timeout    5;
>         allow-two-primaries;

Probably OK.

> syncer {
>         rate 200M;
>         al-extents 3389;

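Can the link between the two nodes actually handle 200M/sec?  The rate is in
bytes, not bits, so that's more than a single gigabit link can carry.  If not,
set this lower.  Setting the sync rate too high can slow things down, because
resync traffic competes with normal replication, though it shouldn't cause the
problems you reported.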
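
For picking a lower number, here is a rough sketch of the arithmetic in
Python.  It assumes the rule of thumb from the DRBD User's Guide (use about
30% of the available replication bandwidth, where "available" is the slower
of network and disk).  The function name and the 110 MB/s and 180 MB/s
figures are made up for illustration; measure your own link and disks.

```python
# Rough estimate of a DRBD syncer rate.
# Assumption: ~30% of the bottleneck throughput, per the rule of thumb in
# the DRBD User's Guide.  All names and numbers here are illustrative.

def suggested_sync_rate(network_mb_s, disk_mb_s, fraction=0.3):
    """Return a suggested sync rate in MB/s.

    network_mb_s: sustained throughput of the replication link, in MB/s
    disk_mb_s:    sustained write throughput of the backing disks, in MB/s
    fraction:     share of the bottleneck to give to resynchronization
    """
    bottleneck = min(network_mb_s, disk_mb_s)  # the slower side limits us
    return bottleneck * fraction

# Example figures only: a gigabit link sustaining about 110 MB/s and disks
# sustaining about 180 MB/s.
rate = suggested_sync_rate(network_mb_s=110, disk_mb_s=180)
print("suggested syncer rate: about %dM" % round(rate))
```

With figures like those the result lands far below 200M, which is why the
configured rate looks suspicious unless the nodes are linked by something
faster than a single gigabit port.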

>         no-disk-barrier;
>         no-disk-flushes;
>         no-md-flushes;
>         on-io-error detach;

This should be OK too, provided the disk controllers have battery-backed write
cache.  Without that, turning off barriers and flushes can lose data on a
power failure.

-- 
Matt G / Dances With Crows
The Crow202 Blog:  http://crow202.org/wordpress/
There is no Darkness in Eternity/But only Light too dim for us to see