[drbd-mc] Lots of pvdisplay commands -> RA timeouts

Rasto Levrinc rasto.levrinc at gmail.com
Sat Dec 22 23:14:03 CET 2012


On Mon, Dec 17, 2012 at 9:10 PM, Caspar Smit <c.smit at truebit.nl> wrote:
> 2012/12/17 Rasto Levrinc <rasto.levrinc at gmail.com>
>>
>> On Mon, Dec 17, 2012 at 3:39 PM, Caspar Smit <c.smit at truebit.nl> wrote:
>> > 2012/12/17 Rasto Levrinc <rasto.levrinc at gmail.com>
>> >>
>> >> On Mon, Dec 17, 2012 at 3:09 PM, Caspar Smit <c.smit at truebit.nl> wrote:
>> >> > Hi Rasto,
>> >> >
>> >> > I noticed this in one of my clusters:
>> >> >
>> >> ...
>> >>
>> >> > /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
>> >> > root      9869  0.0  0.0  39856  1516 pts/7    S+   14:49   0:00
>> >> > \_ sudo -E -p DRBD MC sudo pwd:  /usr/local/bin/lcmc-gui-helper-1.4.2
>> >> > hw-info-daemon
>> >> > root      9870  0.3  0.0  27676  5000 pts/7    S+   14:49   0:00
>> >> > \_ /usr/bin/perl /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
>> >> > root     18176  0.0  0.0   9060  1180 pts/7    S+   14:50   0:00
>> >> > \_ sh -c /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> >> > 2>/dev/null
>> >> > root     18177  0.0  0.0  17872  1604 pts/7    D+   14:50   0:00
>> >> > \_ /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> >> >
>> >> > Why is LCMC running so many pvdisplay commands at once?
>> >>
>> >> Hi Caspar,
>> >>
>> >> it is running it once in 10 seconds, to see if something has changed.
>> >> Can you check what does it do on your nodes?
>> >>
>> >>  /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> >>
>> >> Rasto
>> >>
>> >
>> > # /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> >   /dev/sdb   single_array3
>> >   /dev/sdc   single_array3
>> >   /dev/sdd   single_array3
>> >   /dev/sdh   replicated_array1and2
>> >   /dev/sdi   replicated_array1and2
>> >   /dev/sdj   replicated_array1and2
>> >   /dev/sdk   replicated_array1and2
>> >
>> > I know that LCMC does monitor changes with the lcmc-gui-helper script,
>> > but I
>> > presume the "hw-info-daemon" part has to run only once and not 5(+)
>> > times
>> > concurrently?
>> >
>> > Running 5x pvdisplay concurrently can really slow things down.
>>
>> It shouldn't run this 5x concurrently. What here probably happens, is that
>> the hw daemon takes too long and is assumed dead and is restarted.
>>
>> Can it be that /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> hangs on or takes very long on your system, at least sometimes?
>>
> Yes, that is probably the case because the system is under heavy (NFS) load.
>
>>
>> Anyway I can/should fix LCMC to deal with this situation.
>
>
> That would be nice :)
>

Fixed in 1.4.5. The info daemon wasn't cancelled after a timeout and it
would result in multiple instances running at the same time. Some
further optimizations are possible but this bug was the most important
at the moment.

Rasto

-- 
Dipl.-Ing. Rastislav Levrinc
rasto.levrinc at gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/


More information about the drbd-mc mailing list