VMware graphs are dropping often

amaserverguy · May 2018

My cluster CPU and memory graphs are dropping often. Any reason for this?

Image: https://forum.xorux.com/uploads/editor/9b/nnt88i5g56ys.png

Image: https://forum.xorux.com/uploads/editor/n6/wxn373tkhqwb.png

Pavel · May 2018

Please send us the logs.

Note a short problem description in the text field of the upload form.

cd /home/lpar2rrd/lpar2rrd  # or wherever your LPAR2RRD working dir is
tar cvhf logs.tar logs etc tmp/*txt
gzip -9 logs.tar

Send us logs.tar.gz via https://upload.lpar2rrd.com
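For repeated collections, the commands above can be wrapped in a small bash function. This is only a sketch: `collect_logs` is a hypothetical helper, not part of LPAR2RRD. It skips glob patterns that match nothing (for example `tmp/*json` on an install without JSON files) instead of letting tar fail, and it overwrites an existing logs.tar.gz.

```bash
# collect_logs WORKDIR PATH...   (hypothetical helper, not shipped with LPAR2RRD)
# Builds WORKDIR/logs.tar.gz from the given paths, relative to WORKDIR.
# Quote glob patterns ('tmp/*txt') so they are expanded inside WORKDIR.
collect_logs() {
    local workdir=$1
    shift
    (
        cd "$workdir" || exit 1
        shopt -s nullglob            # a glob with no match expands to nothing
        files=()
        for pattern in "$@"; do
            # Unquoted on purpose so the glob expands here (patterns must not contain spaces)
            for f in $pattern; do
                if [ -e "$f" ]; then
                    files+=("$f")
                fi
            done
        done
        if [ "${#files[@]}" -eq 0 ]; then
            echo "nothing to collect in $workdir" >&2
            exit 1
        fi
        # h = follow symlinks, as in the original "tar cvhf"; -f lets gzip overwrite
        tar chf logs.tar "${files[@]}" && gzip -9f logs.tar
    )
}
```

Usage, matching the file list above: `collect_logs /home/lpar2rrd/lpar2rrd logs etc 'tmp/*txt'`.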

ivancalic · October 2018

Did you find the reason? We have same issue.

Pavel · October 2018

Hi,

we never got any feedback on this.

Basically, the problem is that the data load takes longer than an hour.

It was quite a huge environment.

To speed up processing, you can increase the parallelization of ESXi data collection.

Normally there are 9 processes running to get ESXi data from each vCenter.

Increase it to 20.

cd /home/lpar2rrd/lpar2rrd

vi etc/.magic

VMWARE_PARALLEL_RUN=20
export VMWARE_PARALLEL_RUN
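Editing etc/.magic by hand works fine. If you script the change, a helper like the following keeps exactly one definition of the variable and rewrites the file in place, so its owner and mode are preserved. `set_magic_var` is hypothetical, and it assumes etc/.magic is a plain sh-syntax file made of `NAME=value` and `export NAME` lines, as in the examples in this thread.

```bash
# set_magic_var FILE NAME VALUE   (hypothetical helper)
# Drops any existing "NAME=...", "export NAME=..." or "export NAME" lines
# from FILE and appends a single "export NAME=VALUE".
# NAME must be a plain shell identifier (it is used inside a regex).
set_magic_var() {
    local file=$1 name=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # grep exits 1 when every line is removed; that is not an error here
        grep -Ev "^[[:space:]]*(export[[:space:]]+)?${name}(=|[[:space:]]*$)" "$file" > "$tmp" || true
    fi
    printf 'export %s=%s\n' "$name" "$value" >> "$tmp"
    # Write back with cat instead of mv so FILE keeps its owner and mode
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Usage: `set_magic_var /home/lpar2rrd/lpar2rrd/etc/.magic VMWARE_PARALLEL_RUN 20`.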

Let it run.

Does it help?

Juergen · May 2019

Hi Pavel,
(Version 5.07)

just found this old thread.

I think your comment is not correct!

I checked the load_vmware.sh script and found that it looks for "VCENTER_PARALLEL_RUN", not "VMWARE_PARALLEL_RUN".

So, the correct syntax in etc/.magic should be:

VCENTER_PARALLEL_RUN=20
export VCENTER_PARALLEL_RUN

Furthermore, this variable is not really used by the scripts, so there is no point in setting it as things stand!
I've now added the following lines to the load_vmware.sh:

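# Default to 1 (serial) if the variable is not set in etc/.magic
VCENTER_PARALLEL_RUN=${VCENTER_PARALLEL_RUN:-1}

if [ "$VCENTER_PARALLEL_RUN" -gt 1 ]; then
     # Wait until fewer than VCENTER_PARALLEL_RUN copies of vmware_run.sh are running
     while [ "$(ps -e | grep -c vmware_run.sh)" -ge "$VCENTER_PARALLEL_RUN" ];
     do
          echo "waiting ...."
          sleep 1
     done
fi
if [ "$VCENTER_PARALLEL_RUN" -eq 1 ]; then
     # Serial: run in the foreground
     "$BINDIR/vmware_run.sh" 2>>"$ERRLOG" | tee -a "$INPUTDIR/logs/load_vmware.log"
else
     # Parallel: start in the background (no eval needed)
     "$BINDIR/vmware_run.sh" 2>>"$ERRLOG" | tee -a "$INPUTDIR/logs/load_vmware.log" &
fi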
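The loop above polls `ps` once a second. In bash 4.3 or newer, `wait -n` can enforce the same limit without polling: it blocks until any one background job exits. The sketch below is an alternative pattern, not what load_vmware.sh does; `run_throttled` and the worker function are hypothetical names, and the worker would be whatever wraps a single vmware_run.sh call in your setup.

```bash
# run_throttled MAX WORKER ITEM...   (hypothetical helper, bash >= 4.3)
# Runs "WORKER ITEM" in the background for every ITEM, with at most MAX
# jobs at a time, then waits for all of them to finish.
# Note: "wait -n" also reacts to background jobs this shell started earlier.
run_throttled() {
    local max=$1 worker=$2
    shift 2
    local item running=0
    for item in "$@"; do
        if [ "$running" -ge "$max" ]; then
            wait -n                  # block until any one job exits
            running=$((running - 1))
        fi
        "$worker" "$item" &
        running=$((running + 1))
    done
    wait                             # wait for the remaining jobs
}
```

For example, `run_throttled 20 fetch_vcenter vc1 vc2 vc3`, where `fetch_vcenter` is a function you would write yourself.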

That works so far, but the problem is not the time spent fetching data from the vCenters, it is the creation of the charts. Creating all the charts is very time-consuming and takes more than an hour. How can we parallelize this?

BR Juergen

Pavel · May 2019

Hi,

VMWARE_PARALLEL_RUN is the correct variable:

# grep VMWARE_PARALLEL_RUN bin/*
bin/vmw2rrd.pl:if ( defined $ENV{VMWARE_PARALLEL_RUN} ) {
bin/vmw2rrd.pl: $PARALLELIZATION = $ENV{VMWARE_PARALLEL_RUN};
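Because vmw2rrd.pl reads the value from `$ENV{...}`, the variable must be exported: a `VMWARE_PARALLEL_RUN=20` line without `export` only sets a shell variable that child processes do not see. One way to check what a child process would get is to source the file in a subshell and ask `printenv`. `magic_value` is a hypothetical helper, and it assumes the LPAR2RRD shell scripts source etc/.magic before starting vmw2rrd.pl.

```bash
# magic_value FILE NAME   (hypothetical helper)
# Prints NAME as a child process would see it after FILE is sourced.
# Exits non-zero (and prints nothing) if NAME is unset or not exported.
# Pass a path containing a slash (e.g. etc/.magic) so "." does not search PATH.
magic_value() {
    ( . "$1" && printenv "$2" )
}
```

Usage: `magic_value /home/lpar2rrd/lpar2rrd/etc/.magic VMWARE_PARALLEL_RUN`.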

Charts are not created in advance.

What is your product version?

Try increasing the parallel run to 20.

Let us know.

Juergen · May 2019

Hi Pavel,

as I mentioned in my first post, we are currently on version 5.07.

This version has no VMWARE_PARALLEL_RUN!

$ grep VMWARE_PARALLEL_RUN bin/vmw2rrd.pl
$ grep PARALLEL bin/vmw2rrd.pl           
     my $PARALLELIZATION = 10;
     #   $PARALLELIZATION = 1;
         if ( !$do_fork || $cycle_count == $PARALLELIZATION ) {

ok, now I've redone my changes and set $PARALLELIZATION = 20

I'll send you feedback once a few runs have finished.

BR Juergen

Pavel · May 2019

Hi,

also, definitely upgrade to 6.02; there were a lot of improvements and optimizations on the back end.

Juergen · May 2019

Hi,
after one night, it is still not working. All VMware charts have gaps. All Power charts are OK.
The problem started with our migration from AIX to Linux.

I still have to discuss it first, but I think we will upgrade to the latest version.

BR Juergen

Juergen · May 2019

Hi Pavel,
we have now upgraded to 6.02. The problem is almost solved. We still have a one-hour gap at midnight, but only on the VMware graphs, not on the Power graphs.

The load.sh run takes approx. 45 minutes and daily_lpar_check takes approx. 30 minutes, together more than one hour. So the next start of load.sh is blocked with "There is already running another copy of load.sh, exiting ...".
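The numbers explain the gap: roughly 45 + 30 = 75 minutes of work in a 60-minute slot means the next hourly start finds the previous run still active and exits. That message comes from a single-instance guard. Below is a minimal sketch of such a guard on Linux using flock(1) from util-linux; it illustrates the technique and is not the actual load.sh code, and `run_exclusive` is a hypothetical name. A kernel lock is released automatically when the holder exits, so a crashed run cannot leave a stale lock behind.

```bash
# run_exclusive LOCKFILE CMD...   (hypothetical helper, Linux/util-linux flock)
# Runs CMD only if no other process holds LOCKFILE; otherwise prints a
# message and returns 1 immediately instead of waiting.
run_exclusive() {
    local lockfile=$1
    shift
    (
        # -n: fail at once if the lock on fd 9 is already held
        flock -n 9 || { echo "There is already running another copy, exiting ..." >&2; exit 1; }
        "$@"
    ) 9>"$lockfile"
}
```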

Just for your information, we have the following number of systems:
60 HMCs
174 ManagedSystems (Ps)
2005 LPARs
29 VCenter
449 VMware Hosts
13835 VMware VMs

Best regards,
Juergen

Pavel · May 2019

Hi,

wow, that is a pretty big environment!

Use this as a hotfix; it will resolve the issue:

https://www.lpar2rrd.com/download/load.sh.gz

-rwxrwxr-x 1 lpar2rrd lpar2rrd 22664 May 27 12:32 load.sh

Gunzip it and copy it to /home/lpar2rrd/lpar2rrd/ (mode 755, owner lpar2rrd).

If your web browser gunzips it automatically, just rename it: mv load.sh.gz load.sh

Make sure the file size matches the listing above (22664 bytes).

If you are on Linux, change the interpreter on the first line of the script to /bin/bash.
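The steps above can be scripted roughly as follows. This is a sketch under assumptions: `install_hotfix` is a hypothetical helper; it expects to run as the lpar2rrd user so the copy ends up owned by lpar2rrd (as root you would add a chown); the expected size is passed in (22664 bytes per the listing above); and the shebang rewrite relies on GNU `sed -i`, which is why it only runs on Linux. The size is checked before line 1 is edited, because changing the interpreter changes the file size.

```bash
# install_hotfix SRC WORKDIR EXPECTED_SIZE   (hypothetical helper)
# SRC may still be gzip data or already decompressed by the browser.
install_hotfix() {
    local src=$1 workdir=$2 expected=$3 tmp size
    tmp=$(mktemp) || return 1
    if gzip -t "$src" 2>/dev/null; then
        gunzip -c "$src" > "$tmp"       # still compressed: decompress
    else
        cp "$src" "$tmp"                # browser already gunzipped it
    fi
    # Refuse a truncated or wrong download
    size=$(wc -c < "$tmp" | tr -d ' ')
    if [ "$size" -ne "$expected" ]; then
        echo "size mismatch: $size bytes, expected $expected" >&2
        rm -f "$tmp"
        return 1
    fi
    # On Linux, replace whatever interpreter line 1 names with /bin/bash
    if [ "$(uname -s)" = Linux ]; then
        sed -i '1s|^#!.*|#!/bin/bash|' "$tmp"
    fi
    if ! cp "$tmp" "$workdir/load.sh"; then
        rm -f "$tmp"
        return 1
    fi
    rm -f "$tmp"
    chmod 755 "$workdir/load.sh"
}
```

Usage: `install_hotfix ~/load.sh.gz /home/lpar2rrd/lpar2rrd 22664`.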

Juergen · May 2019

Hi Pavel,

it works!
Sending daily_lpar_check.pl into the background with nohup is an easy solution!
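The idea is that the hourly script starts the long daily check and returns right away instead of waiting about 30 minutes for it. A generic sketch of that pattern follows; `start_in_background` is a hypothetical helper, and the usage line is illustrative only, not how the hotfix load.sh actually invokes daily_lpar_check.pl.

```bash
# start_in_background LOGFILE CMD...   (hypothetical helper)
# Starts CMD detached from hangups, appends its output to LOGFILE,
# and prints the PID so the caller can record it.
start_in_background() {
    local log=$1
    shift
    nohup "$@" >>"$log" 2>&1 &
    echo "$!"
}
```

For example, `start_in_background "$INPUTDIR/logs/daily_lpar_check.log" perl "$BINDIR/daily_lpar_check.pl"`, where the log file name is an assumption.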

Thank you so far!

BR Juergen

MarekH · April 2020

Hello,

we have the same issue here (version 6.02). VMware data is dropping (every second hour is missing) because load.sh does not finish within an hour. Most likely the cause is the high number of VMs (5000+) on one of the vCenters. Is there a workaround, or are 5000+ VMs per vCenter simply too many?

We already use:

VMWARE_PARALLEL_RUN=20

Thank you.

Marek.

Pavel · April 2020

Hi,

well, 5k+ VMs is a lot, but we have users with 12k+ where it works.

We are still making enhancements in this area to keep load.sh runs under 1 hour.

The best option would be to upgrade to the latest build and see if that helps:

https://www.lpar2rrd.com/download/lpar2rrd-6.16-10.tar

You can also increase VMWARE_PARALLEL_RUN in etc/.magic:

export VMWARE_PARALLEL_RUN=40

Also make sure you have enough CPU resources on the lpar2rrd server.

If nothing helps, send us the logs and we will check what else we can do.

Note a short problem description in the text field of the upload form.

cd /home/lpar2rrd/lpar2rrd # or wherever your LPAR2RRD working dir is

tar cvhf logs.tar logs tmp/*txt tmp/*json

gzip -9 logs.tar

Send us logs.tar.gz via https://upload.lpar2rrd.com

MarekH · June 2020

Hello Pavel,

sorry for my delayed response, and thank you for your suggestions. I finally had a chance to work on the problem again and upgraded lpar2rrd to 6.16, but with no improvement.

Could it be that we simply don't have enough resources?

We have almost 3k LPARs (70 HMCs) and over 20k VMs (30 vCenters).

Our lpar2rrd server runs on RHEL Linux with 32 CPUs and 48 GB RAM.

Thank you very much.

Pavel · June 2020

Hi Marek,

could you upgrade to this version? We have just finished further enhancements, especially for big environments like yours.

https://www.lpar2rrd.com/download-static/lpar2rrd-6.16-28.tar

Then go to /home/lpar2rrd/lpar2rrd (lpar2rrd working dir) and set this in etc/.magic:

export VMWARE_PARALLEL_RUN=250

chmod 644 etc/.magic

If the problem persists, please send us the logs.

Note a short problem description in the text field of the upload form.

cd /home/lpar2rrd/lpar2rrd # or wherever your LPAR2RRD working dir is

tar cvhf logs.tar logs tmp/*txt tmp/*json

gzip -9 logs.tar

Send us logs.tar.gz via https://upload.lpar2rrd.com

MarekH · June 2020

Hi,

OK, I will give it a try. First I will crank up the VMWARE_PARALLEL_RUN value; 250 is quite a big jump, so that might help. But what about our CPU/RAM setup, is it OK for an environment like this?

As for the version, we have 6.16, which I downloaded a week ago (26 May) from your download section.

If the problem persists, we can check the logs.

Thank you.

Pavel · June 2020

It is OK, try it; your HW setup is fine for that.

MarekH · June 2020

Hi,

perhaps a silly question, but is it OK to remove the check in load.sh/load_vmware.sh that exits when a previous instance is still running, so that the next run starts no matter what? We've tried that and so far it runs fine with no gaps. Can we expect any side effects?

Thanks.
