[drbd-mc] Lots of pvdisplay commands -> RA timeouts
Lars Ellenberg
lars.ellenberg at linbit.com
Tue Dec 18 11:00:47 CET 2012
On Mon, Dec 17, 2012 at 04:09:25PM +0100, Rasto Levrinc wrote:
> On Mon, Dec 17, 2012 at 3:39 PM, Caspar Smit <c.smit at truebit.nl> wrote:
> > 2012/12/17 Rasto Levrinc <rasto.levrinc at gmail.com>
> >>
> >> On Mon, Dec 17, 2012 at 3:09 PM, Caspar Smit <c.smit at truebit.nl> wrote:
> >> > Hi Rasto,
> >> >
> >> > I noticed this in one of my clusters:
> >> >
> >> ...
> >>
> >> > /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
> >> > root 9869 0.0 0.0 39856 1516 pts/7 S+ 14:49 0:00 \_ sudo -E -p DRBD MC sudo pwd: /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
> >> > root 9870 0.3 0.0 27676 5000 pts/7 S+ 14:49 0:00 \_ /usr/bin/perl /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
> >> > root 18176 0.0 0.0 9060 1180 pts/7 S+ 14:50 0:00 \_ sh -c /sbin/pvdisplay -C --noheadings -o pv_name,vg_name 2>/dev/null
> >> > root 18177 0.0 0.0 17872 1604 pts/7 D+ 14:50 0:00 \_ /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
> >> >
> >> > Why is LCMC running so many pvdisplay commands at once?
> >>
> >> Hi Caspar,
> >>
> >> it runs it once every 10 seconds, to see if something has changed.
> >> Can you check what it does on your nodes?
> >>
> >> /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
> >>
> >> Rasto
> >>
> >
> > # /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
> > /dev/sdb single_array3
> > /dev/sdc single_array3
> > /dev/sdd single_array3
> > /dev/sdh replicated_array1and2
> > /dev/sdi replicated_array1and2
> > /dev/sdj replicated_array1and2
> > /dev/sdk replicated_array1and2
> >
> > I know that LCMC does monitor changes with the lcmc-gui-helper script, but I
> > presume the "hw-info-daemon" part has to run only once and not 5(+) times
> > concurrently?
> >
> > Running 5x pvdisplay concurrently can really slow things down.
>
> It shouldn't run this 5x concurrently. What probably happens here is that
> the hw daemon takes too long, is assumed dead, and is restarted.
Which does not really improve things in this case ;-)
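
A minimal sketch of the alternative, in Python for brevity (the helper
itself is Perl): never start a new scan while the previous one is still
running. The class and method names are made up for illustration and are
not LCMC code.

```python
import subprocess


class ScanRunner:
    """Start a scan only if the previous one has finished."""

    def __init__(self, cmd):
        self.cmd = cmd      # e.g. ["/sbin/pvdisplay", "-C", "--noheadings", ...]
        self.proc = None    # handle of the scan currently in flight, if any

    def poll(self):
        """Return True if a new scan was started, False if one is still busy."""
        if self.proc is not None and self.proc.poll() is None:
            # Previous scan has not exited yet (it may be stuck in D state);
            # starting another one would only add to the I/O load.
            return False
        self.proc = subprocess.Popen(
            self.cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
```

Calling poll() from the periodic timer then degrades to "skip this round"
instead of stacking up scans when the box is busy.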
> Can it be that /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
> hangs or takes very long on your system, at least sometimes?
We have seen LVM commands that scan metadata take several *minutes* to
complete on a moderately busy server.
[0.1 seconds when the system is idle,
virtually "forever" when it is really busy :-)]

That was in part because of too many devices to be scanned, badly chosen
filter settings, badly chosen bio flags for O_DIRECT (which has since been
fixed in the kernel), device queues that were too long (nr_requests too
large), and evil I/O scheduler interactions.

All of that is tunable, or possible to work around.
Still, that only brought it down to ~20 seconds.
> Anyway I can/should fix LCMC to deal with this situation.
You should probably not initiate a full device scan every ten seconds,
but preferably on demand only,
or maybe every once in a while if loadavg is low.
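
Roughly like this (a sketch in Python for brevity, not a patch against the
Perl helper; the function names, the 10 minute interval and the load
threshold are assumptions picked for illustration):

```python
import os


def current_load():
    """1-minute load average of this host (Unix only)."""
    return os.getloadavg()[0]


def scan_due(on_demand, last_scan, now, load1,
             min_interval=600.0, max_load=1.0):
    """Decide whether a full device scan should be started now.

    on_demand  -- True if the user or GUI explicitly asked for a refresh
    last_scan  -- timestamp (seconds) of the previous scan
    now        -- current timestamp (seconds)
    load1      -- 1-minute load average
    """
    if on_demand:
        return True                      # explicit request always wins
    if now - last_scan < min_interval:
        return False                     # scanned recently enough
    return load1 < max_load              # background scan only when idle
```

The caller would pass time.monotonic() for the timestamps and
current_load() for load1, and simply try again on the next tick if
scan_due() says no.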
Lars