Lars Ellenberg wrote:
> / 2006-09-19 16:44:14 +0200
> \ Maciej Bogucki:
>
>>Hello,
>>
>>I've been using drbd for the past few years without any problems, but now I have a strange one.
>>I have an HA mysql server with drbd 0.7.21 and kernel 2.6.17. When the mysql
>>partition (mysql databases) is on a drbd device (datadb), the server
>>lags, with no response for 1-4 seconds. It isn't a network-related problem (there is no packet loss!), but
>>the console (keyboard and monitor are directly connected) and all processes on the server hang too! The problem only
>>appears when the mysql database is on the drbd device; everything works fine when I move the data to a non-drbd
>>device (sda). So I'm sure that it is a drbd or kernel problem. When I migrate mysql to the secondary machine, I have the
>>same problems, so I think the hardware is ok :)
>>The same problem occurs with drbd 0.7.17 and kernel 2.6.14.
>>The strangest thing is that I have the same hardware and
>>software (drbd, kernel) in another location and there are no problems. The one
>>difference is that there I have apache instead of mysql.
>>Any ideas?
>
>
> outside drbd:
> verify which io scheduler you use.
> I'd recommend using "deadline" on servers.
I had the "cfq" scheduler, but I changed it to "deadline", and I still
have lags :(
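A minimal Python sketch for checking which io scheduler a block device is using. It assumes a 2.6 kernel that exposes the setting at /sys/block/<device>/queue/scheduler and marks the active choice with brackets (e.g. "noop anticipatory deadline [cfq]"). The device name sda comes from the post; with drbd it is presumably the backing device's scheduler that matters, since drbd passes its requests down to it. The helper names are hypothetical.

```python
import re
from pathlib import Path


def active_scheduler(contents: str) -> str:
    """Return the scheduler marked as active, e.g. 'cfq' from 'noop deadline [cfq]'."""
    match = re.search(r"\[([^\]]+)\]", contents)
    if match is None:
        raise ValueError(f"no active scheduler marked in {contents!r}")
    return match.group(1)


def scheduler_path(device: str) -> Path:
    """sysfs file holding the scheduler setting for a block device such as 'sda'."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def read_scheduler(device: str) -> str:
    """Read the currently active scheduler for a device from sysfs."""
    return active_scheduler(scheduler_path(device).read_text())


def set_scheduler(device: str, name: str) -> None:
    """Switch the scheduler (needs root); the change is lost on reboot."""
    scheduler_path(device).write_text(name)
```

Calling set_scheduler("sda", "deadline") as root does the same thing as `echo deadline > /sys/block/sda/queue/scheduler`; to make the choice the default at boot, the kernel's elevator= boot parameter can be used instead.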
>
> in drbd:
> you could play with "unplug-watermark" and "max-epoch-size" (and
> possibly max-buffers).
> when I say "play", I mean it. it could get better if you increase,
> it could get better when you decrease, it could get better if you
> adjust in opposite directions (where possible), and it could happen to
> have no noticeable effect at all, which is all very dependent on your
> lower level io subsystem and on network timings and ...
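One hedged way to "play" systematically is to enumerate candidate net sections in which each of max-buffers, max-epoch-size and unplug-watermark is halved, left alone, or doubled. This covers both the same-direction and the opposite-direction changes described above. The baseline numbers are placeholders, not recommendations or verified drbd defaults, and drbd may enforce its own ranges or relationships between these options, so check drbd.conf(5) for the installed version before trying any variant. The function names are hypothetical.

```python
from itertools import product

# Placeholder baseline values for illustration only; substitute the values
# your own drbd.conf actually uses.
BASELINE = {"max-buffers": 2048, "max-epoch-size": 2048, "unplug-watermark": 128}


def variants(baseline, factor=2):
    """Yield every combination of scaling each parameter down, unchanged, or up."""
    names = list(baseline)
    for scales in product((1 / factor, 1, factor), repeat=len(names)):
        # int() truncates; max() keeps every value at least 1.
        yield {name: max(1, int(baseline[name] * s)) for name, s in zip(names, scales)}


def render_net_section(params):
    """Render a parameter dict as a drbd.conf-style net { ... } block."""
    body = "\n".join(f"    {name} {value};" for name, value in params.items())
    return "net {\n" + body + "\n}"
```

With three parameters and three choices each this yields 27 candidates, e.g. `print(render_net_section(next(variants(BASELINE))))` prints the variant with every value halved.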
I know that I can play with them, but there is another strange thing.
When I disconnect the secondary node (shut down heartbeat and drbd), I
still get lags. Also, I don't have much traffic on the database (1256 writes per
minute, which is about 20KB per second, and only a few reads per minute), so
playing with "net" parameters is not necessary in my case. I think that
it is a drbd bug or some stupid thing :)
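The traffic figures can be cross-checked with a little arithmetic: 1256 writes per minute is about 21 writes per second, so 20 KB/s implies writes averaging roughly 1 KB each. The inputs are the numbers from the post; the function names are hypothetical.

```python
def per_second(count_per_minute: float) -> float:
    """Convert a per-minute rate to a per-second rate."""
    return count_per_minute / 60.0


def avg_write_size_kb(writes_per_minute: float, kb_per_second: float) -> float:
    """Average write size implied by a write count and a throughput figure."""
    return kb_per_second / per_second(writes_per_minute)


writes_per_s = per_second(1256)        # about 20.9 writes per second
size_kb = avg_write_size_kb(1256, 20)  # about 0.96 KB per write
print(f"{writes_per_s:.1f} writes/s, {size_kb:.2f} KB per write")
```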
>>resource datafs {
>> protocol C;
>> incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
>
>
> I know the "halt -f" is in the example config, but you may want to
> consider writing something like "sleep <verylargenumber>" or
> "killall -9 heartbeat ccm ipfail" instead...
But when I do as you suggest, there is a higher chance that I get split
brain. When I do "halt -f", the chance is minimal.
Best Regards
Maciej Bogucki