[DRBD-user] Re: heartbeat 2.0.8: lockups kerneloops

Ross S. W. Walker rwalker at medallion.com
Sun Feb 25 06:51:45 CET 2007



> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com 
> [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Gerry Reno
> Sent: Saturday, February 24, 2007 9:44 PM
> To: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Re: heartbeat 2.0.8: lockups kerneloops
> 
> Gerry Reno wrote:
> > Lars,
> > Yes, I did a memtest and both machines got some errors on test #5
> > (block moves, 64 moves). As I said in one of the previous posts, I had
> > read that some Athlons get some errors on test #5 and it may not be
> > critical. I don't know whether that is true or not. The memory passed
> > all the other tests. So maybe I'll go swap some memory around just to
> > see if it makes a difference. Maybe drbd is putting a lot of pressure
> > on memory that wasn't there previously?
> >
> > Gerry
> >
> After some investigating and testing I have some results. What was
> baffling me was the fact that the machines were perfectly stable prior
> to installing HA and drbd: no hangs, no kernel oops, just rock solid.
> What I found was that apparently drbd was applying enough pressure on
> memory to reveal a concealed problem. I began by taking the memory
> modules out of the server and testing them. They tested perfect. So
> why were they generating errors on one of the Memtest86 tests when
> installed in the servers? I began looking around in the BIOS and
> tweaking some of the settings for slew rate, and found that with a
> small adjustment I could make the error go away and the memory would
> pass all tests. This tweak probably became necessary because a few
> months ago I had installed some newer, faster memory in these
> machines, although I had not seen any problem until HA and drbd were
> running. So the hang problem appears not to have been directly an HA
> or drbd problem after all. It is still very early and I've only had
> the servers running with the new setting for less than 24 hours, but
> so far there have been no hangs or kernel oops. I am still concerned
> about the drbd over md RAID-1 issue, and I will pursue that scenario
> next, but at least for now the servers appear stable once again.
> My thanks to everyone who responded to the post.

I'm glad to hear you have solved your mysterious problem and that it was
hardware after all. I have managed to backport Lars' patch that fixes the
performance problems with drbd and md RAID-1 to the CentOS 2.6.9 plus
kernel, and I successfully built a custom version of the latest stable
release, which tested successfully.

I'm trying to get it included in the next CentOS 4.4 plus kernel update.

If you want, I can forward the patches and my recipe for building them
painlessly.

-Ross




