Hi,<div><br></div><div>Thanks for the reply. </div><div><br></div><div>The multi-volume feature is quite useful. With it I can, for example, switch Primary/Secondary individually if MySQL or PostgreSQL crashes, without one affecting the other. </div>
<div><br></div><div>I don't mean to rush you, but it would be great if you could tell me when the next release is expected so that I can test it. Also, is 8.3.12 workable? I will test it anyway if the next release is a long way off.</div>
<div><br></div><div>Many thanks,</div><div><br></div><div><br><br><div class="gmail_quote">On Wed, Nov 16, 2011 at 11:03 PM, Lars Ellenberg <span dir="ltr">&lt;<a href="mailto:lars.ellenberg@linbit.com">lars.ellenberg@linbit.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">On Wed, Nov 16, 2011 at 01:01:38PM +1100, Steve Kieu wrote:<br>
> Hello,<br>
><br>
> I am experimenting with DRBD and its stability is poor (unusable). I saw<br>
> this in the dmesg log:<br>
><br>
> block drbd1: md_sync_timer expired! Worker calls drbd_md_sync().<br>
<br>
</div>Usually, especially with "huge" devices, this is no reason to worry.<br>
No need to do _anything_.<br>
<div class="im"><br>
> At first restart it works for a while, and then all of a sudden cat<br>
> /proc/drbd shows ProtocolError and the system hangs (mysql or any other process<br>
> reading from or writing to the DRBD partitions).<br>
><br>
> It is repeatable, and when it happens the network is not busy, machine load is<br>
> nearly 0, and all other network connectivity is normal.<br>
><br>
> Googling shows that many users have the same problem, and one suggested<br>
> lowering the resync and sync rates. I did that (for 100Mbit ethernet I set<br>
> resync to 3M and the syncer rate to 40M; I set up two volumes). The problem persists.<br>
><br>
><br>
> Here is the short description of the system:<br>
><br>
> * Centos 6 x86_64<br>
> * Kernel 2.6.32.43-vs2.3.0.36.29.8-h1-32cpu-noselinux, which is the vanilla<br>
> kernel 2.6.32.43 with the vserver patch vs2.3.0.36.29.8, compiled with HZ = 100<br>
> and SMP for 32 CPUs<br>
> * DRBD compiled from source, version 8.4.0 (including kernel module)<br>
<br>
</div>8.4.0 seems to have serious stability issues under moderate to heavy IO<br>
when actually using the multi volume feature :-(<br>
We are preparing an 8.4.1.<br>
<div><div></div><div class="h5"><br>
> * DRBD built on top of LVM; here is the config:<br>
><br>
> resource r0 {<br>
><br>
> on cosmos {<br>
> volume 0 {<br>
> #device minor 0;<br>
> device /dev/drbd0;<br>
> meta-disk internal;<br>
> disk /dev/vs-resource1/mysqldata;<br>
> }<br>
><br>
> volume 1 {<br>
> device /dev/drbd1;<br>
> meta-disk internal;<br>
> disk /dev/vs-resource1/pgsqldata;<br>
> }<br>
><br>
> address 10.200.11.4:7789;<br>
> }<br>
><br>
> on seaspray {<br>
> volume 0 {<br>
> # device minor 0;<br>
> device /dev/drbd0;<br>
> meta-disk internal;<br>
> disk /dev/vg_seaspray/mysqldata;<br>
> }<br>
><br>
> volume 1 {<br>
> device /dev/drbd1;<br>
> meta-disk internal;<br>
> disk /dev/vg_seaspray/pgsqldata;<br>
> }<br>
><br>
> address 10.200.11.3:7789;<br>
> }<br>
><br>
> startup {<br>
> #become-primary-on both;<br>
><br>
> }<br>
> net {<br>
> #allow-two-primaries;<br>
> protocol C;<br>
> after-sb-0pri discard-zero-changes;<br>
> after-sb-1pri discard-secondary;<br>
> after-sb-2pri disconnect;<br>
> #cram-hmac-alg sha1;<br>
> #shared-secret "FooFunFactory";<br>
><br>
> }<br>
><br>
><br>
> }<br>
><br>
> * DRBD runs in Primary/Secondary mode for now. The device is mounted into a<br>
> vserver instance, and mysql and postgres are running from the vserver.<br>
> * iptables is set up to allow DRBD traffic; the problem occurs even when iptables is off.<br>
><br>
> * Network route<br>
> route<br>
> Kernel IP routing table<br>
> Destination Gateway Genmask Flags Metric Ref Use Iface<br>
> 10.200.11.0 * 255.255.255.224 U 0 0 0 eth0<br>
> 10.200.11.128 * 255.255.255.192 U 0 0 0 eth1.503<br>
> 192.168.100.0 * 255.255.255.0 U 0 0 0 dummy0<br>
> 1.1.1.0 * 255.255.255.0 U 0 0 0 vmbr0<br>
> link-local * 255.255.0.0 U 1002 0 0 eth0<br>
> link-local * 255.255.0.0 U 1003 0 0 eth1<br>
> link-local * 255.255.0.0 U 1004 0 0 eth1.503<br>
> default 10.200.11.1 0.0.0.0 UG 0 0 0 eth0<br>
><br>
> I have attached the dmesg output as well in case it helps with debugging. I would<br>
> like to get this fixed, so please help.<br>
><br>
> Many thanks,<br>
><br>
><br>
><br>
><br>
> --<br>
> Steve Kieu<br>
<br>
<br>
</div></div>> _______________________________________________<br>
> drbd-user mailing list<br>
> <a href="mailto:drbd-user@lists.linbit.com">drbd-user@lists.linbit.com</a><br>
> <a href="http://lists.linbit.com/mailman/listinfo/drbd-user" target="_blank">http://lists.linbit.com/mailman/listinfo/drbd-user</a><br>
<font color="#888888"><br>
<br>
--<br>
: Lars Ellenberg<br>
: LINBIT | Your Way to High Availability<br>
: DRBD/HA support and consulting <a href="http://www.linbit.com" target="_blank">http://www.linbit.com</a><br>
<br>
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.<br>
__<br>
please don't Cc me, but send to list -- I'm subscribed<br>
</font></blockquote></div><br><br clear="all"><div><br></div>-- <br>Steve Kieu<br>
</div>