Gerry Reno wrote:
> Lars,
> Yes I did a memtest and both machines got some errors on test #5
> (block moves, 64 moves). As I said in one of the previous posts I had
> read that some Athlons get some errors on test #5 and it may not be
> critical. I don't know whether that is true or not. The memory passed
> all the other tests. So maybe I'll go swap some memory around just to
> see if it makes a difference. Maybe drbd is putting a lot of pressure
> in the memory area that wasn't there previously?
>
> Gerry
>

After some investigating and testing I have some results.

What was baffling me was the fact that the machines were perfectly stable prior to installing HA and drbd - no hangs, no kernel oops, just rock solid. What I found was that apparently drbd was applying enough pressure on memory to reveal a concealed problem.

I began by taking the memory modules out of the servers and testing them. They tested perfectly. So why were they generating some errors on one of the Memtest86 tests when installed in the servers? I began looking around in the BIOS and tweaking some of the settings for slew rate, and found that with a small adjustment I could make the errors go away and the memory would pass all tests. This tweak probably became necessary because a few months ago I had installed some newer, faster memory in these machines, although I had not seen any problem until HA and drbd were running.

So the hang problem appears not to have been directly an HA or drbd problem after all. It is still very early, and the servers have been running with the new setting for less than 24 hours, but so far there have been no hangs or kernel oops. I am still concerned, though, about the drbd over md RAID-1 issue, and I will pursue that scenario next, but at least for now the servers appear stable once again.

My thanks to everyone who responded to the post.

Gerry