Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 05/18/2011 01:02 PM, Richard Stockton wrote:
> Hi Felix,
>
> Thanks for responding, answers (and resolution) below...
>
> At 12:20 AM 5/18/2011, Felix Frank wrote:
>> On 05/18/2011 04:09 AM, Richard Stockton wrote:
>> > Hi drbd folks,
>> >
>> > We have been using DRBD with heartbeat for several years now.
>> > The load averages on our RedHat servers have almost always stayed
>> > below 1.0. Suddenly last week the loads jumped up to 12-14 on
>> > both servers. They go down at night, when the usage goes down,
>> > but by mid-day every business day they are back to 14.
>> >
>> > We don't see anything out of the ordinary in the logs: no drive
>> > warning lights, no degraded RAID, nothing especially out of
>> > bounds in iostat, NFS is running smoothly, and tcpdump doesn't
>> > show any nastiness (that we can see).
>> > We have run out of ideas.
>> >
>> > Our setup is 2 disk arrays, with each server the fail-over for
>> > the other. These are POP/IMAP accounts with half the alphabet
>> > on one server and half on the other. If one server fails, the
>> > other one takes over all of the alphabet. Each letter is a
>> > separate resource, with a separate IP, etc. The letter
>> > partitions range in size from 5G to 120G. The pop/imap servers
>> > access the letters via NFSv3.
>> >
>> > RedHat 5.3 (2.6.18-128.1.10.el5)
>> > DRBD 8.3.1
>> > Heartbeat 2.1.4
>> >
>> > Has anyone seen this sort of behavior before, and if so, what
>> > the heck is happening? We would appreciate any suggestions.
>>
>> Is NFS mounted sync or async?
>
> NFS is mounted "sync" (the NFSv3 default, I believe).
>
>> Which I/O scheduler is in use for your DRBDs (i.e., for their backing
>> devices)?
>
> cfq (also the default, I believe)
>
> Now for the interesting part. This problem had been occurring every
> day for about a week, so I joined this list yesterday and posted my
> issue. When I came to work this morning the problem had magically
> subsided! Loads are now back to 2.0 or less, even during the busy
> times. This is still slightly higher than normal, but certainly not
> enough to adversely affect performance.
>
> At this point we are assuming that some rebuild was happening in the
> background, and because of the somewhat large disk arrays (1.25 TB
> each), it just took a very long time. Either that, or the mere act
> of joining the list frightened the system into submission. [grin]
> In any case, all appears to be good now.
>
> However, we would still love to know exactly what was happening,
> so we can be prepared to deal with it if it ever happens again.
> If anyone has a clue, please share.

I recall someone else having similar issues, and it ended up being
related to a NIC that was on its way out and was sending bad packets
periodically, especially under heavy load. Maybe check your network
cards/cables?

Brian

> Thanks again.
>  - Richard
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
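
For anyone trying to rule out the same suspects, the checks raised in
this thread come down to a few standard commands. A minimal sketch,
assuming /dev/sdb as the DRBD backing device and eth1 as the
replication NIC (placeholder names, not Richard's actual setup):

    # Is a DRBD or md resync still running in the background?
    cat /proc/drbd
    cat /proc/mdstat      # only meaningful if the arrays are Linux md RAID

    # NFS mount options on the pop/imap clients (sync vs. async)...
    grep nfs /proc/mounts
    # ...and the export options on the NFS server side
    exportfs -v

    # Which I/O scheduler is active on the DRBD backing device?
    # The bracketed entry is the one in use, e.g. [cfq].
    cat /sys/block/sdb/queue/scheduler

    # Error and drop counters on the replication NIC; counters that keep
    # climbing point at a failing card, cable, or switch port.
    ip -s link show eth1
    ethtool -S eth1 | grep -iE 'err|drop|crc'

None of this explains a spike after the fact, but captured while the
load is high (alongside iostat -x and top) it usually narrows the
culprit down to disk, scheduler, or network.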