power 8 server statistic disappeared

owlmind · May 2019

Hello,

We upgraded to 6.02 and decided to switch to the REST API, and statistics from one of our servers disappeared. We have 4 servers connected to 2 HMCs.
3 of them, Power 7 systems, work fine, but the 9119-MHE server disappeared along with all LPARs on it. I don't see any issues in the logs, but there is no sign of it in the web interface.

download data : hmc2-sh:Server-9119-MHE-SN last 140 minute(s) (3 hours)

fetching HMC : hmc2-sh:Server-9119-MHE-SN lpar data

fetching HMC : hmc2-sh:Server-9119-MHE-SN pool data

last rec 3 : hmc2-sh:Server-9119-MHE-SN min:21144 , hour:353, 4/29/2019 16:59 : last-pool.txt - source:pool

fetching HMC : hmc2-sh:Server-9119-MHE-SN mem data

last rec 3 : hmc2-sh:Server-9119-MHE-SN min:21144 , hour:353, 4/29/2019 16:59 : last-mem.txt - source:mem

fetching HMC : hmc2-sh:Server-9119-MHE-SN CoD data

last rec 3 : hmc2-sh:Server-9119-MHE-SN min:2880 , hour:48, 4/29/2019 16:59 : init - last-cod.txt - source:cod

except for these errors:

ERROR: Server-9119-MHE-SN, hmc-sh data load in /home/lpar2rrd/lpar2rrd/bin/lpar2rrd.pl at line 851

ERROR: /home/lpar2rrd/lpar2rrd/data/Server-9119-MHE-SN/hmc-sh/rbo-db-sh.rrm: not a simple unsigned integer: '-32812763677588725' at /home/lpar2rrd/lpar2rrd/bin/LoadDataModule.pm line 420.

date load : hmc-sh:Server-9119-MHE-SN Tue May 14 09:18:30 2019

I see the time is weird; it looks like it is stuck at 4/29/2019 16:59.

We have 2 more HMCs with 5 servers and a similar 9119-MHE server connected. Everything works well there.

Pavel · May 2019

Hi,

use this as a fix:

https://www.lpar2rrd.com/download/LoadDataModule.pm.gz

-rwxrwxr-x 1 lpar2rrd lpar2rrd 315607 May 10 09:04 bin/LoadDataModule.pm

Gunzip it and copy it to /home/lpar2rrd/lpar2rrd/bin (mode 755, owner lpar2rrd)

If your web browser gunzips it automatically then just rename it: mv LoadDataModule.pm.gz LoadDataModule.pm

Make sure the file size is the same as in the listing above
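
The steps above could be scripted roughly as follows. This is only a sketch: the function name `install_patch` and its arguments are made up for illustration and are not part of LPAR2RRD. The expected size (315607 bytes) comes from the listing above. Changing the owner is left as a comment because it may need root.

```shell
# Hypothetical helper, not part of LPAR2RRD.
# Usage: install_patch <downloaded file> <target file> <expected size in bytes>
install_patch() {
  src=$1; dest=$2; expected=$3

  # Work on a temporary copy so the download itself stays untouched.
  tmp=$(mktemp) || return 1

  # Some browsers gunzip on the fly; only decompress if it really is gzip data.
  if gzip -t "$src" 2>/dev/null; then
    gzip -dc "$src" > "$tmp"
  else
    cp "$src" "$tmp"
  fi

  # Compare the byte count with the size given in the forum listing.
  actual=$(wc -c < "$tmp" | tr -d ' ')
  if [ "$actual" -ne "$expected" ]; then
    echo "size mismatch: got $actual, expected $expected" >&2
    rm -f "$tmp"
    return 1
  fi

  # Keep the previous version as a backup, then install with mode 755.
  if [ -f "$dest" ]; then
    cp -p "$dest" "$dest.bak"
  fi
  cp "$tmp" "$dest" && chmod 755 "$dest"
  rm -f "$tmp"
  # If not running as the lpar2rrd user, also run: chown lpar2rrd "$dest"
  echo "installed $dest ($actual bytes)"
}
```

For the file in this post it would be called as `install_patch LoadDataModule.pm.gz /home/lpar2rrd/lpar2rrd/bin/LoadDataModule.pm 315607`.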

owlmind · May 2019

Hello,

Thank you for your help. The server appeared in the web interface, and now I can see statistics from the LPAR agents. However, we have a new issue.

Most HMC statistics are missing for all servers since the moment I replaced the file.

Image: https://forum.xorux.com/uploads/editor/01/jqacaxa28gxv.png

Image: https://forum.xorux.com/uploads/editor/jl/sajve80qb9az.png

but the CPU pool is still there

Image: https://forum.xorux.com/uploads/editor/sf/j5e98dhkzyeq.png

There are messages about invalid JSON files in the logs.

last rec 3 : hmc-sh:Server-9117-MMD-SN067B767 min:2880 , hour:48, 5/15/2019 8:40 : init - last-cod.txt - source:cod

Rest API 2019-05-15T09:00:45 : inserting hmc2-sh Server-9117-MMD-SN064B6F7 HMC_hmc2-sh_lpars_perf_20190515_0900.json to rrd files

HMC_hmc2-kt_lpars_perf_20190515_0900.json is not valid : HASH(0x2101188)

no content in /home/lpar2rrd/lpar2rrd/data/Server-9117-MMC-SN062D397/hmc2-kt/iostat/HMC_hmc2-kt_lpars_perf_20190515_0900.json

HMC_hmc-kt_lpars_perf_20190515_0900.json is not valid : HASH(0x3237188)

jan_dvorak · May 2019

Hi,

here is a fix that should resolve the issue:

https://download.lpar2rrd.com/patch/hmc_rest_api.pl.gz

Gunzip it and copy it to /home/lpar2rrd/lpar2rrd/bin (mode 755, owner lpar2rrd)

-rwxrwxr-x 1 lpar2rrd lpar2rrd 241786 May 15 08:53 bin/hmc_rest_api.pl

If your web browser gunzips it automatically then just rename it: mv hmc_rest_api.pl.gz hmc_rest_api.pl

Make sure the file size is the same as in the listing above

owlmind · May 2019

This helped, issue solved. Thank you very much.

owlmind · May 2019

Hello,
Here I am again. We've got a new issue.
Now we have gaps in the HMC graphs.

Logs look like this

LPARSUTIL2 : tst-rep-ah-72

LPARSUTIL3 : tst-rep-ah-72 ts for 05/20/2019 09:39:30 is OK?

LPARSUTIL2 : tst_rep_ah

LPARSUTIL3 : tst_rep_ah ts for 05/20/2019 09:39:30 is OK?

LPARSUTIL2 : nes-t1a-app6

LPARSUTIL3 : nes-t1a-app6 ts for 05/20/2019 09:39:30 is OK?

Rest API 2019-05-20 09:40:39 : hmc-ah Server-9117-MMB-SN Perffiles OK

Rest API 2019-05-20 09:40:39 : hmc-ah Server-9117-MMB-SN PID end : 0

And then

[lpar2rrd@xorux-mon lpar2rrd]$ cat load_hmc_rest_api.out

Mon May 20 10:00:01 +06 2019: There is already running another copy of load_hmc_rest_api.sh, exiting ...

jan_dvorak · May 2019

Hi,

send us logs. It looks like your load_hmc_rest_api.sh runs for quite a long time. It should take only a few minutes.

Mon May 20 10:00:01 +06 2019: There is already running another copy of load_hmc_rest_api.sh, exiting ...
This indicates that more than 20 minutes are needed.

cd /home/lpar2rrd/lpar2rrd # or wherever your LPAR2RRD working directory is
grep -v password etc/web_config/hosts.json > tmp/hosts.txt
ls -l data/*/*/cpu.cfg > tmp/o.txt
ls -l data/*/*/pool.rrm >> tmp/o.txt
tar cvhf logs.tar logs tmp/restapi/* tmp/*.txt
gzip -9 logs.tar
Upload logs.tar.gz to https://upload.lpar2rrd.com/
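
To get a feel for how often runs are being skipped, the "already running" messages in load_hmc_rest_api.out can be counted. A minimal sketch, assuming the file keeps one such line per skipped run as in the output quoted above; the function name `count_skipped_runs` is made up for illustration.

```shell
# Hypothetical helper: count how many scheduled runs exited early because
# a previous load_hmc_rest_api.sh was still running.
count_skipped_runs() {
  # -F matches the text literally, -c prints the number of matching lines.
  # grep exits non-zero when the count is 0, so the status is ignored.
  grep -cF 'already running another copy of load_hmc_rest_api.sh' "$1" || true
}
```

Called as `count_skipped_runs load_hmc_rest_api.out` from the LPAR2RRD working directory, a count that keeps growing suggests the load regularly takes longer than the interval between runs.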

owlmind · May 2019

I figured it out. One of our HMCs hangs. Sorry to bother you.

owlmind · May 2019

We have 2 HMCs connected to the same servers, so if one of them takes too long to answer, we get the issue described above.

Pavel · May 2019

Hi,

basically yes, this happens when the connection to one HMC hangs.

You are the first one to report it; we have never seen it until now.

owlmind · May 2019

E332FFFF

Explanation

This error occurs when the HMC receives notification that a particular Java code string is corrupted.

Problem determination

This is the reason the HMC hung.

Pavel · May 2019

What was the resolution? An HMC reboot?

Where did this error appear? Any particular HMC log?

thanks

owlmind · May 2019

Yeah, I just rebooted the HMC. This error appeared in the HMC Serviceable Events Overview.
It repeated 6 times over the past 2 days.
There was also one more alert:

E212E151

Explanation

Licensed Internal Code failure on the Hardware Management Console (HMC).

Response

CPU Alert: The SE HMC overall was way too busy for too long. Error reason = percent in use scaled by 10.

I will try to get more logs.

Pavel · May 2019

ok, thanks, I think it is enough for identification

owlmind · May 2019

I opened a case with IBM support. Will let you know the result.
