[drbd-mc] Lots of pvdisplay commands -> RA timeouts

Lars Ellenberg lars.ellenberg at linbit.com
Tue Dec 18 11:00:47 CET 2012


On Mon, Dec 17, 2012 at 04:09:25PM +0100, Rasto Levrinc wrote:
> On Mon, Dec 17, 2012 at 3:39 PM, Caspar Smit <c.smit at truebit.nl> wrote:
> > 2012/12/17 Rasto Levrinc <rasto.levrinc at gmail.com>
> >>
> >> On Mon, Dec 17, 2012 at 3:09 PM, Caspar Smit <c.smit at truebit.nl> wrote:
> >> > Hi Rasto,
> >> >
> >> > I noticed this in one of my clusters:
> >> >
> >> ...
> >>
> >> > /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
> >> > root      9869  0.0  0.0  39856  1516 pts/7    S+   14:49   0:00 \_ sudo -E -p DRBD MC sudo pwd:  /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
> >> > root      9870  0.3  0.0  27676  5000 pts/7    S+   14:49   0:00 \_ /usr/bin/perl /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
> >> > root     18176  0.0  0.0   9060  1180 pts/7    S+   14:50   0:00 \_ sh -c /sbin/pvdisplay -C --noheadings -o pv_name,vg_name 2>/dev/null
> >> > root     18177  0.0  0.0  17872  1604 pts/7    D+   14:50   0:00 \_ /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
> >> >
> >> > Why is LCMC running so many pvdisplay commands at once?
> >>
> >> Hi Caspar,
> >>
> >> it runs it once every 10 seconds to see if something has changed.
> >> Can you check what it does on your nodes?
> >>
> >>  /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
> >>
> >> Rasto
> >>
> >
> > # /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
> >   /dev/sdb   single_array3
> >   /dev/sdc   single_array3
> >   /dev/sdd   single_array3
> >   /dev/sdh   replicated_array1and2
> >   /dev/sdi   replicated_array1and2
> >   /dev/sdj   replicated_array1and2
> >   /dev/sdk   replicated_array1and2
> >
> > I know that LCMC does monitor changes with the lcmc-gui-helper script, but I
> > presume the "hw-info-daemon" part has to run only once and not 5(+) times
> > concurrently?
> >
> > Running 5x pvdisplay concurrently can really slow things down.
> 
> It shouldn't run this 5x concurrently. What probably happens here is that
> the hw daemon takes too long, is assumed dead, and is restarted.

Which does not really improve things in this case ;-)
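A process in uninterruptible sleep (state D, like the pvdisplay in the
ps listing above) cannot be killed until its I/O completes, so
"restarting" only queues yet another scan behind it. A minimal Python
sketch of the alternative, skipping a round while the previous scan is
still running. SingleFlightScanner and pvdisplay_scan are hypothetical
names; this is not how lcmc-gui-helper (a Perl script) is written.

```python
import subprocess
import threading
from typing import Callable, Optional


class SingleFlightScanner:
    """Hypothetical helper: run at most one scan at a time."""

    def __init__(self, scan: Callable[[], str]) -> None:
        self._scan = scan
        self._lock = threading.Lock()
        self.last_result: Optional[str] = None

    def try_scan(self) -> bool:
        # Non-blocking acquire: if the previous scan still holds the lock
        # (e.g. pvdisplay stuck in D state), skip this round instead of
        # starting another concurrent scan.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            self.last_result = self._scan()
            return True
        finally:
            self._lock.release()


def pvdisplay_scan() -> str:
    # The command discussed in this thread; needs root and LVM installed.
    return subprocess.run(
        ["/sbin/pvdisplay", "-C", "--noheadings", "-o", "pv_name,vg_name"],
        capture_output=True, text=True, check=True,
    ).stdout
```

A 10-second timer would call scanner.try_scan() and treat False as
"previous scan not finished yet". Since LCMC restarts the whole helper
process, a real fix would need a cross-process guard (e.g. a lock file);
this in-process lock only illustrates the idea.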

> Can it be that /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
> hangs or takes very long on your system, at least sometimes?

We have seen LVM commands that scan metadata take several *minutes* to
complete on a moderately busy server.
[0.1 seconds when the system is idle,
 virtually "forever" when it is really busy :-)]

This was partly due to too many devices being scanned, badly chosen filter
settings, badly chosen bio flags for O_DIRECT (since fixed in the kernel),
overly long device queues (nr_requests set too high),
and evil I/O scheduler interactions.
All of that is tunable, or can be worked around.
Even so, tuning only brought it down to ~20 seconds.
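To illustrate why the filter matters, here is a simplified Python model
of an lvm.conf "filter" list: rules are checked in order, the first
matching regex decides ("a" accepts, "r" rejects), and a device that
matches no rule is accepted. The function names and the example filter
are hypothetical, and the model ignores details such as how LVM handles
several names (symlinks) for one device.

```python
import re
from typing import List


def lvm_filter_accepts(device: str, rules: List[str]) -> bool:
    """Simplified model: first matching rule wins, default is accept."""
    for rule in rules:
        action, delim = rule[0], rule[1]        # e.g. 'a', '|'
        pattern = rule[2:rule.rindex(delim)]    # text between delimiters
        if re.search(pattern, device):
            return action == "a"
    return True


def devices_to_scan(devices: List[str], rules: List[str]) -> List[str]:
    # Only devices passing the filter get their metadata read.
    return [d for d in devices if lvm_filter_accepts(d, rules)]
```

With a hypothetical filter such as ["a|^/dev/sd[b-dh-k]$|", "r|.*|"],
only the PVs from the listing earlier in this thread would be scanned,
instead of every block device on the box.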

> Anyway I can/should fix LCMC to deal with this situation.

You should probably not start a full device scan every ten seconds,
but rather only on demand,
or perhaps once in a while when the load average is low.
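A minimal Python sketch of such a policy: always scan when the user asks
for it, otherwise refresh in the background only if the last scan is old
enough and the 1-minute load average is low. should_scan, min_interval
and max_load are hypothetical names and values, not LCMC settings.

```python
import os
from typing import Optional


def should_scan(now: float, last_scan: Optional[float], demanded: bool,
                loadavg_1min: float, min_interval: float = 300.0,
                max_load: float = 1.0) -> bool:
    """Hypothetical policy for when to start a full PV scan."""
    if demanded:
        return True                    # explicit user request
    if last_scan is None:
        return True                    # never scanned yet
    if now - last_scan < min_interval:
        return False                   # scanned recently enough
    return loadavg_1min < max_load     # background refresh only when idle


def current_loadavg() -> float:
    # os.getloadavg() is Unix-only; index 0 is the 1-minute average.
    return os.getloadavg()[0]
```

A fixed max_load ignores the number of CPUs; scaling it by
os.cpu_count() would be one obvious refinement.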

	Lars


