Lars Ellenberg wrote:
> / 2006-09-19 16:44:14 +0200
> \ Maciej Bogucki:
>
>>Hello,
>>
>>I've been using drbd for the past few years without any problems, but
>>now my problem is a strange one.
>>I have an HA mysql server with drbd 0.7.21 and kernel 2.6.17. When the
>>mysql partition (mysql databases) is on the drbd device (datadb), the
>>server lags - no response for 1-4 seconds. It isn't a network related
>>problem (there is no packet loss!!), but the console (keyboard and
>>monitor are directly connected) and all processes on the server hang
>>too! The problem only appears when the mysql database is on the drbd
>>device, and everything works fine when I move the data to a non-drbd
>>device (sda). So I'm sure that it is a drbd or kernel problem. When I
>>migrate mysql to the secondary machine, I have the same problems, so I
>>think that the hardware is ok :)
>>The same problem occurs with drbd 0.7.17 and kernel version 2.6.14.
>>The strangest thing is that I have the same hardware and software
>>(drbd, kernel) in another location and there are no problems there. The
>>one difference is that there I have apache instead of mysql.
>>Any ideas?
>
>
> outside drbd:
> verify what io scheduler you use.
> I'd recommend to use "deadline" on servers.

I had the "cfq" scheduler, but I changed it to "deadline", and I still have lags :(

>
> in drbd:
> you could play with "unplug-watermark" and "max-epoch-size" (and
> possibly max-buffers).
> when I say "play", I mean it. it could get better if you increase,
> it could get better when you decrease, it could get better if you
> adjust in opposite directions (where possible), and it could happen to
> have no noticeable effect at all, which is all very dependent on your
> lower level io subsystem and on network timings and ...

I know that I can play with them, but there is another strange thing. When I disconnect the secondary node (shut down heartbeat and drbd), I get lags as well.
Also, I don't have much traffic on the database (1256 writes per minute, so about 20KB per second, and only a few reads per minute), so playing with the "net" parameters is not necessary in my case. I think that it is a drbd bug or some stupid thing :)

>>resource datafs {
>>  protocol C;
>>  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
>
>
> I know the "halt -f" is in the example config, but you may want to
> consider to write something like "sleep <verylargenumber>" or
> "killall -9 heartbeat ccm ipfail" instead...

But if I do as you write, there is a higher chance that I get split brain. With "halt -f" the chance is minimal.

Best Regards
Maciej Bogucki
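
For anyone following the "verify what io scheduler you use" advice above, here is a minimal sketch of that check. It assumes a Linux kernel that exposes the scheduler in /sys/block/<device>/queue/scheduler, with the active one shown in square brackets. The function names and the default device name "sda" are only illustrative, not part of drbd or of this thread.

```python
import re
from pathlib import Path
from typing import Optional


def active_scheduler(sysfs_text: str) -> Optional[str]:
    """Return the scheduler name shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def read_scheduler(device: str = "sda") -> Optional[str]:
    """Read the active I/O scheduler for a block device from sysfs.

    The device name is an assumption; pass the disk that backs the drbd device.
    """
    path = Path("/sys/block") / device / "queue" / "scheduler"
    try:
        return active_scheduler(path.read_text())
    except OSError:
        # Device missing, or sysfs not available on this system.
        return None


if __name__ == "__main__":
    print(read_scheduler("sda"))
```

The scheduler can normally be switched at runtime by writing its name (for example "deadline") into the same sysfs file as root, which is worth re-checking after a reboot since the setting may not persist.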