> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com
> [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Gerry Reno
> Sent: Saturday, February 24, 2007 9:44 PM
> To: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Re: heartbeat 2.0.8: lockups kerneloops
>
> Gerry Reno wrote:
> > Lars,
> > Yes I did a memtest and both machines got some errors on test #5
> > (block moves, 64 moves). As I said in one of the previous posts I had
> > read that some Athlons get some errors on test #5 and it may not be
> > critical. I don't know whether that is true or not. The memory passed
> > all the other tests. So maybe I'll go swap some memory around just to
> > see if it makes a difference. Maybe drbd is putting a lot of pressure
> > in the memory area that wasn't there previously?
> >
> > Gerry
> >
> After some investigating and testing I have some results. What was
> baffling me was the fact that the machines were perfectly stable prior
> to installing HA and drbd - no hangs, no kernel oops, just rock solid.
> What I found was that apparently drbd was applying enough pressure on
> memory to reveal a concealed problem. I began by taking the memory
> modules out of the server and testing them. They tested perfect. So why
> were they generating some errors on one of the Memtest86 tests when
> installed in the servers? I began looking around in the BIOS and
> tweaking some of the settings for slew rate and found that with a small
> adjustment I could make the error go away and the memory would pass all
> tests. This tweak probably became necessary because a few months ago I
> had installed some newer faster memory in these machines although I had
> not seen any problem until HA and drbd were running. So the hang problem
> appears not to have directly been an HA or drbd problem after all. It is
> still very early and I've only had the servers running with the new
> setting for less than 24 hours but so far no hangs or kernel oops. And I
> am still concerned though about the drbd over md RAID-1 issue and I will
> pursue that scenario next but at least for now the servers appear stable
> once again.
> My thanks to everyone who responded to the post.

I'm glad to hear you have solved your mysterious problem and that it
was hardware after all.

I have managed to backport Lars' patch, which fixes the performance
problems with drbd over md RAID-1, to the CentOS 2.6.9 plus kernel. I
also successfully built a custom version of the latest stable, and it
tested successfully. I'm trying to get it included in the next CentOS
4.4 plus kernel update.

If you want, I can forward the patches and my recipe for building
painlessly.

-Ross