[drbd-mc] Lots of pvdisplay commands -> RA timeouts
Rasto Levrinc
rasto.levrinc at gmail.com
Sat Dec 22 23:14:03 CET 2012
On Mon, Dec 17, 2012 at 9:10 PM, Caspar Smit <c.smit at truebit.nl> wrote:
> 2012/12/17 Rasto Levrinc <rasto.levrinc at gmail.com>
>>
>> On Mon, Dec 17, 2012 at 3:39 PM, Caspar Smit <c.smit at truebit.nl> wrote:
>> > 2012/12/17 Rasto Levrinc <rasto.levrinc at gmail.com>
>> >>
>> >> On Mon, Dec 17, 2012 at 3:09 PM, Caspar Smit <c.smit at truebit.nl> wrote:
>> >> > Hi Rasto,
>> >> >
>> >> > I noticed this in one of my clusters:
>> >> >
>> >> ...
>> >>
>> >> > /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
>> >> > root 9869 0.0 0.0 39856 1516 pts/7 S+ 14:49 0:00 \_ sudo -E -p DRBD MC sudo pwd: /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
>> >> > root 9870 0.3 0.0 27676 5000 pts/7 S+ 14:49 0:00 \_ /usr/bin/perl /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
>> >> > root 18176 0.0 0.0 9060 1180 pts/7 S+ 14:50 0:00 \_ sh -c /sbin/pvdisplay -C --noheadings -o pv_name,vg_name 2>/dev/null
>> >> > root 18177 0.0 0.0 17872 1604 pts/7 D+ 14:50 0:00 \_ /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> >> >
>> >> > Why is LCMC running so many pvdisplay commands at once?
>> >>
>> >> Hi Caspar,
>> >>
>> >> it runs it once every 10 seconds, to see if something has changed.
>> >> Can you check what it does on your nodes?
>> >>
>> >> /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> >>
>> >> Rasto
>> >>
>> >
>> > # /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> > /dev/sdb single_array3
>> > /dev/sdc single_array3
>> > /dev/sdd single_array3
>> > /dev/sdh replicated_array1and2
>> > /dev/sdi replicated_array1and2
>> > /dev/sdj replicated_array1and2
>> > /dev/sdk replicated_array1and2
>> >
>> > I know that LCMC does monitor changes with the lcmc-gui-helper script, but I
>> > presume the "hw-info-daemon" part has to run only once and not 5(+) times
>> > concurrently?
>> >
>> > Running 5x pvdisplay concurrently can really slow things down.
>>
>> It shouldn't run this 5x concurrently. What probably happens here is that
>> the hw daemon takes too long, is assumed dead, and is restarted.
>>
>> Could it be that /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> hangs or takes very long on your system, at least sometimes?
>>
> Yes, that is probably the case because the system is under heavy (NFS) load.
>
>>
>> Anyway, I can/should fix LCMC to deal with this situation.
>
>
> That would be nice :)
>
Fixed in 1.4.5. The info daemon wasn't cancelled after a timeout, which
resulted in multiple instances running at the same time. Further
optimizations are possible, but this bug was the most important one for
now.
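The general pattern behind such a fix can be sketched as follows. This is a minimal, hypothetical sketch in Python rather than the helper's Perl, not LCMC's actual code: it assumes a strictly sequential poll loop that runs the command with a timeout, kills the whole process group when the timeout expires (both the `sh -c` wrapper and the pvdisplay it started), reaps it, and only then starts the next run. The names `run_with_timeout` and `poll` are made up for illustration.

```python
import os
import signal
import subprocess
import time


def run_with_timeout(argv, timeout):
    """Run argv once; return its stdout, or None if it timed out.

    Hypothetical helper. The child gets its own session (and process group),
    so on timeout the whole group is killed: both a `sh -c` wrapper and the
    command it started, not just the shell.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # like the 2>/dev/null in the helper's command
        text=True,
        start_new_session=True,  # child calls setsid(), so its pgid == pid
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
        return out
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # group already gone
        # Reap the child before returning, so no instance is left behind.
        # A process in D state only dies after its I/O finishes, so this may block.
        proc.communicate()
        return None


def poll(argv, interval, timeout, rounds):
    """Run argv `rounds` times, sleeping `interval` seconds between runs.

    Runs are strictly sequential: a new instance starts only after the
    previous one has exited or been killed and reaped.
    """
    results = []
    for i in range(rounds):
        results.append(run_with_timeout(argv, timeout))
        if i + 1 < rounds:
            time.sleep(interval)
    return results
```

For the case in this thread, one would call something like poll(["/sbin/pvdisplay", "-C", "--noheadings", "-o", "pv_name,vg_name"], interval=10, timeout=30, rounds=N); the 30-second timeout is an arbitrary assumption. A process in uninterruptible sleep (state D, as pvdisplay is in the ps output above) only dies once its I/O completes, so the reaping call can block until then. That is deliberate here: it prevents a second instance from starting while the first is still alive.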
Rasto
--
Dipl.-Ing. Rastislav Levrinc
rasto.levrinc at gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/