Mismatch between CPU tab and CPU OS tab.

kbarth · October 2018

Over the weekend, some Nagios processes fell into a runaway state and spiked the CPU on a number of our AIX LPARs. LPAR2RRD captured this spike accurately on both the CPU and CPU OS tabs. This morning we cleaned up those processes and CPU utilization returned to normal (confirmed via vmstat, nmon, and topas output). When we looked at the LPAR2RRD graphs, the CPU OS graph (fed by the agent) reflected the drop in utilization, but the CPU graph continues to show high CPU utilization.

I checked the LPAR2RRD error.log and found nothing. I reran load.sh and it completed without any errors or warnings in the data collection output, yet the CPU graph still shows incorrect information.

Any ideas on how to correct this issue?

kbarth · October 2018

Image: https://forum.xorux.com/uploads/editor/qg/oy88pglqgcvd.jpg

kbarth · October 2018

A further update. It appears that the OS tab graph returns to reporting correct information 5 hours after the issue was corrected.

kbarth · October 2018

Our sample rate is set to 60.

We did upgrade the HMCs to V9R1 M920 recently to prepare for eventual migration to IBM Power9 hardware.

Pavel · October 2018

Wrong time on the HMC.

Check the last update time; it is quite different in the two examples.

su - lpar2rrd
cd /home/lpar2rrd/lpar2rrd
./bin/sample_rate.sh

The HMC time and the lslprutil time must be more or less the same for that server.

Pavel · October 2018

An HMC upgrade often resets the time zone.

Note: changing the TZ requires an HMC reboot!

kbarth · October 2018

Pavel, you were correct. This version upgrade of the HMC did reset the TZ to UTC; the prior version upgrade did not.
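For anyone who hits the same symptom, here is a rough sketch of how the time zone gap could be checked from the LPAR2RRD server. It compares two UTC offsets in the `+HHMM` / `-HHMM` form printed by `date +%z`. The function names are made up for illustration, and the host name, the `hscroot` user, and the availability of `date +%z` on both the HMC and the LPAR2RRD server are assumptions, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: convert a UTC offset such as -0500 or +0530
# (the format printed by `date +%z`) into signed minutes.
tz_offset_minutes() {
    off=$1
    sign=${off%????}          # leading "+" or "-"
    hh=${off#?}; hh=${hh%??}  # two-digit hours
    mm=${off#???}             # two-digit minutes
    hh=${hh#0}                # drop one leading zero so 08/09 are not read as octal
    mm=${mm#0}
    total=$((hh * 60 + mm))
    if [ "$sign" = "-" ]; then
        total=$((0 - total))
    fi
    printf '%s\n' "$total"
}

# Hypothetical helper: absolute gap in minutes between two UTC offsets.
tz_gap_minutes() {
    a=$(tz_offset_minutes "$1")
    b=$(tz_offset_minutes "$2")
    if [ "$a" -ge "$b" ]; then
        printf '%s\n' $((a - b))
    else
        printf '%s\n' $((b - a))
    fi
}

# Example usage (placeholder host and user; assumes `date +%z` works on both sides):
#   hmc_tz=$(ssh hscroot@hmc.example.com 'date +%z')
#   local_tz=$(date +%z)
#   gap=$(tz_gap_minutes "$hmc_tz" "$local_tz")
#   [ "$gap" -eq 0 ] || echo "HMC time zone differs from this server by $gap minutes"
```

With an HMC reset to UTC (`+0000`) and a server at `-0500`, the gap works out to 300 minutes, which would match the 5 hour lag mentioned above.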

kbarth · October 2018

It is interesting to note that the graph kept the local time settings. The update time showed the difference.

Pavel · October 2018

I know

This is definitely not the only upgrade that resets the TZ; we have seen it many times in the past.

kbarth · October 2018

The time zone has been changed, and we will be checking to make sure that the graphs are now consistent. I found it interesting that the bottom line of the graph had the local time, but the graph line followed UTC.

Pavel · October 2018

Both times should be the same; both come from the HMC.

Send a screenshot example.
