POWER8 server statistics disappeared
Hello,
We upgraded to 6.02 and decided to switch to the REST API, and statistics from one of our servers disappeared. We have 4 servers connected to 2 HMCs.
Three of them, POWER7 systems, work fine, but the 9119-MHE server disappeared along with all LPARs on it. I don't see any issues in the logs, but there is no sign of the server in the web interface.
download data : hmc2-sh:Server-9119-MHE-SN last 140 minute(s) (3 hours)
fetching HMC : hmc2-sh:Server-9119-MHE-SN lpar data
fetching HMC : hmc2-sh:Server-9119-MHE-SN pool data
last rec 3 : hmc2-sh:Server-9119-MHE-SN min:21144 , hour:353, 4/29/2019 16:59 : last-pool.txt - source:pool
fetching HMC : hmc2-sh:Server-9119-MHE-SN mem data
last rec 3 : hmc2-sh:Server-9119-MHE-SN min:21144 , hour:353, 4/29/2019 16:59 : last-mem.txt - source:mem
fetching HMC : hmc2-sh:Server-9119-MHE-SN CoD data
last rec 3 : hmc2-sh:Server-9119-MHE-SN min:2880 , hour:48, 4/29/2019 16:59 : init - last-cod.txt - source:cod
The only exceptions are these errors:
ERROR: Server-9119-MHE-SN, hmc-sh data load in /home/lpar2rrd/lpar2rrd/bin/lpar2rrd.pl at line 851
ERROR: /home/lpar2rrd/lpar2rrd/data/Server-9119-MHE-SN/hmc-sh/rbo-db-sh.rrm: not a simple unsigned integer: '-32812763677588725' at /home/lpar2rrd/lpar2rrd/bin/LoadDataModule.pm line 420.
date load : hmc-sh:Server-9119-MHE-SN Tue May 14 09:18:30 2019
The time looks odd: it seems to be stuck at 4/29/2019 16:59.
We have 2 more HMCs with 5 servers, including a similar 9119-MHE server, and everything works well there.
Comments
-
Hi,
use this as a fix:
-rwxrwxr-x 1 lpar2rrd lpar2rrd 315607 May 10 09:04 bin/LoadDataModule.pm
Gunzip it and copy it to /home/lpar2rrd/lpar2rrd/bin (mode 755, owner lpar2rrd).
If your web browser gunzips it automatically, just rename it: mv LoadDataModule.pm.gz LoadDataModule.pm
Make sure the file size is the same as in the example above.
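The install steps above can be sketched as a small shell helper. This is only a sketch under assumptions: the function name install_fix and its arguments are hypothetical and not part of LPAR2RRD, and it assumes you run it as the lpar2rrd user so that the copied file gets the right owner.

```shell
#!/bin/sh
# install_fix SRC DEST_DIR EXPECTED_SIZE   (hypothetical helper, not shipped with LPAR2RRD)
# Unpacks a downloaded fix, copies it into DEST_DIR with mode 755 and
# compares its size with the size quoted in the post.
install_fix() {
    src=$1
    dest_dir=$2
    expected=$3

    # Target name is the source name without a trailing .gz
    name=$(basename "$src" .gz)
    target="$dest_dir/$name"

    # Keep a copy of the file that is being replaced
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi

    # Some browsers gunzip on download but keep the .gz suffix;
    # gzip -t tells a real gzip file from an already unpacked one.
    if gzip -t "$src" 2>/dev/null; then
        gunzip -c "$src" > "$target" || return 1
    else
        cp "$src" "$target" || return 1
    fi

    chmod 755 "$target" || return 1

    # Compare the byte count with the size from the ls -l line in the post
    actual=$(wc -c < "$target" | tr -d ' ')
    if [ "$actual" -ne "$expected" ]; then
        echo "size mismatch: $target is $actual bytes, expected $expected" >&2
        return 1
    fi
    echo "installed $target ($actual bytes)"
}

# Example call with the values quoted above (the download location is an assumption):
# install_fix ~/LoadDataModule.pm.gz /home/lpar2rrd/lpar2rrd/bin 315607
```

The backup copy (.bak) is an extra precaution that the original instructions do not ask for; it makes it easy to roll back if the size check fails.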
-
Hello,
Thank you for your help. The server appeared in the web interface, and now I can see statistics from the LPAR agents. However, we have a new issue.
Since the moment I replaced the file, most of the HMC statistics are missing for all servers,
but the CPU pool data is still there.
There are messages about invalid JSON files in the logs:
last rec 3 : hmc-sh:Server-9117-MMD-SN067B767 min:2880 , hour:48, 5/15/2019 8:40 : init - last-cod.txt - source:cod
Rest API 2019-05-15T09:00:45 : inserting hmc2-sh Server-9117-MMD-SN064B6F7 HMC_hmc2-sh_lpars_perf_20190515_0900.json to rrd files
HMC_hmc2-kt_lpars_perf_20190515_0900.json is not valid : HASH(0x2101188)
no content in /home/lpar2rrd/lpar2rrd/data/Server-9117-MMC-SN062D397/hmc2-kt/iostat/HMC_hmc2-kt_lpars_perf_20190515_0900.json
HMC_hmc-kt_lpars_perf_20190515_0900.json is not valid : HASH(0x3237188)
-
Hi,
here is the fix that should resolve the issue:
-rwxrwxr-x 1 lpar2rrd lpar2rrd 241786 May 15 08:53 bin/hmc_rest_api.pl
Gunzip it and copy it to /home/lpar2rrd/lpar2rrd/bin (mode 755, owner lpar2rrd).
If your web browser gunzips it automatically, just rename it: mv hmc_rest_api.pl.gz hmc_rest_api.pl
Make sure the file size is the same as in the example above.
-
This helped, issue solved. Thank you very much.
-
Hello,
Here I am again. We've got a new issue.
Now we have gaps in the HMC graphs.
The logs look like this:
LPARSUTIL2 : tst-rep-ah-72
LPARSUTIL3 : tst-rep-ah-72 ts for 05/20/2019 09:39:30 is OK?
LPARSUTIL2 : tst_rep_ah
LPARSUTIL3 : tst_rep_ah ts for 05/20/2019 09:39:30 is OK?
LPARSUTIL2 : nes-t1a-app6
LPARSUTIL3 : nes-t1a-app6 ts for 05/20/2019 09:39:30 is OK?
Rest API 2019-05-20 09:40:39 : hmc-ah Server-9117-MMB-SN Perffiles OK
Rest API 2019-05-20 09:40:39 : hmc-ah Server-9117-MMB-SN PID end : 0
And then:
[lpar2rrd@xorux-mon lpar2rrd]$ cat load_hmc_rest_api.out
Mon May 20 10:00:01 +06 2019: There is already running another copy of load_hmc_rest_api.sh, exiting ...
-
Hi,
please send us the logs. It looks like your load_hmc_rest_api.sh runs for quite a long time; it should take only a few minutes.
Mon May 20 10:00:01 +06 2019: There is already running another copy of load_hmc_rest_api.sh, exiting ...
This indicates that a run needs more than 20 minutes.
Collect the logs like this:
cd /home/lpar2rrd/lpar2rrd # or where is your LPAR2RRD working directory
grep -v password etc/web_config/hosts.json > tmp/hosts.txt
ls -l data/*/*/cpu.cfg > tmp/o.txt
ls -l data/*/*/pool.rrm >> tmp/o.txt
tar cvhf logs.tar logs tmp/restapi/* tmp/*.txt
gzip -9 logs.tar
Upload the resulting logs.tar.gz at https://upload.lpar2rrd.com/
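Before sending logs, it can help to check how often a run was skipped and how long the current run has been going. The following is a minimal sketch, not an LPAR2RRD tool: both function names are hypothetical, and the only thing taken from the thread is the "already running another copy" message text and the load_hmc_rest_api.out file name.

```shell
#!/bin/sh
# count_skipped_runs FILE   (hypothetical helper)
# Counts the scheduled runs that exited because a previous copy of
# load_hmc_rest_api.sh was still running.
count_skipped_runs() {
    grep -c 'There is already running another copy' "$1"
}

# show_running_loader   (hypothetical helper)
# Lists PID and elapsed time of any load_hmc_rest_api.sh process that is
# still alive. The [l] keeps grep from matching its own command line.
show_running_loader() {
    ps -eo pid,etime,args | grep '[l]oad_hmc_rest_api\.sh'
}

# Example, run from the LPAR2RRD working directory:
# count_skipped_runs load_hmc_rest_api.out
# show_running_loader
```

A growing count together with a large elapsed time for one process suggests a run that is stuck rather than merely slow.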
-
I figured it out. One of our HMCs hangs. Sorry to bother you.
-
We have 2 HMCs connected to the same servers, so if one of them takes too long to answer, we get the issue described above.
-
Hi,
basically yes, this happens when the connection to one HMC hangs.
You are the first one to report it; we have never seen it until now.
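One way to spot a hanging HMC early is to wrap any probe command in a time limit. This is a sketch under assumptions: probe_hmc is a hypothetical helper, it relies on the timeout utility from GNU coreutils, and the ssh probe in the example comment is an assumption about how your HMCs can be reached, not something LPAR2RRD does.

```shell
#!/bin/sh
# probe_hmc SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND with a time limit and prints OK, TIMEOUT or FAILED(rc).
# GNU timeout exits with status 124 when the limit is reached.
probe_hmc() {
    limit=$1
    shift
    rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo "OK" ;;
        124) echo "TIMEOUT" ;;
        *)   echo "FAILED($rc)" ;;
    esac
}

# Example (HMC names from this thread; user and command are assumptions):
# for h in hmc-sh hmc2-sh; do
#     echo "$h: $(probe_hmc 30 ssh -o BatchMode=yes "hscroot@$h" lshmc -V)"
# done
```

A TIMEOUT result for one HMC while the other answers OK would match the situation described in this thread.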
-
E332FFFF
Explanation
This error occurs when the HMC receives notification that a particular Java code string is corrupted.
Problem determination
This is the reason the HMC hung.
-
What was the resolution? An HMC reboot?
Where did this error appear? In any particular HMC log?
Thanks
-
Yeah, I just rebooted the HMC. This error appeared in the HMC Serviceable Events Overview
and was repeated 6 times over the past 2 days.
There was one more alert:
E212E151
Explanation
Licensed Internal Code failure on the Hardware Management Console (HMC).
Response
CPU Alert: The SE HMC overall was way too busy for too long. Error reason = percent in use scaled by 10.
I will try to get more logs.
-
OK, thanks, I think that is enough for identification.
-
I opened a case with IBM support. I will let you know the result.