"removed" VIO servers
Hello, I'm a new user this week on the community/free version of 7.80-1, running via the appliance.
I'm connected to an HMC and getting performance data back from lots of IBM i and VIO servers; all but two. On one host, on the left navbar under "IBM Power Systems -> servername -> LPAR -> Removed", there are two VIO servers listed. I'm getting data from all of our other VIO servers on all of our other hosts, as well as data from the IBM i LPARs on this specific host. And since they're listed as "Removed", it seems data was retrieved about them originally?
They seem to be displaying as expected in the HMC performance tabs.
Any ideas on how to troubleshoot/diagnose as the next step? All ideas are appreciated!
Thanks-- Jeff
Comments
Hi,
here are our notes with what we advise to users who have that issue
------------------------------------------------------------
VIOS data is not collected through the REST API because the VIOS does not communicate with the HMC. We have seen that many times; check below.
I would start with point 4, then go with point 3; both have helped a few users recently.
1. To resolve issue: http://www-01.ibm.com/support/docview.wss?uid=isg3T1024482
Restarting the vio daemon should also work.
2. There might be a problem with vio_daemon being stuck or not communicating with the HMC:
https://forum.xorux.com/discussion/comment/3450#Comment_3450
https://www.ibm.com/support/pages/node/629995
It can also be related to resolving the VIOS hostname(s) via DNS.
This might also help:
Please follow point 2 from the IBM Tech Note below to stop and start the vio daemon.
Basically, it would be enough to start or restart vio_daemon:
As root:
ls -l /usr/ios/db/bin/solid*
ps -ef | egrep "vio_daemon|solid|db"
lssrc -ls vio_daemon
stopsrc -s vio_daemon
startsrc -s vio_daemon
sleep 10
ps -ef | egrep "vio_daemon|solid|db"
lssrc -ls vio_daemon
When you go to UI --> IBM Power --> RMC check, is that VIOS listed there?
3. One user reported that after running the script cleanup_cmdb_with_logging.sh and then load.sh, it works correctly.
--> It helped another user as well; he saw this error on the HMC under Virtual Networks: Error occurred while quering for SharedEthernetAdapter from VIOS ....
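The restart from point 2 can also be wrapped in a small script that checks the result afterwards. This is only a minimal sketch: the function names vio_daemon_is_active and restart_vio_daemon are made up for illustration, and it assumes the usual lssrc table layout where the status is the last column. Run it as root on the VIOS.

```shell
#!/bin/sh
# Sketch only; function names are hypothetical, not part of VIOS or LPAR2RRD.

# Reads `lssrc -s vio_daemon` output on stdin and succeeds only if the
# vio_daemon row reports "active" in its last column (assumed layout).
vio_daemon_is_active() {
    awk '$1 == "vio_daemon" && $NF == "active" { found = 1 } END { exit !found }'
}

# Same stop/start sequence as in the notes above, followed by a status check.
restart_vio_daemon() {
    stopsrc -s vio_daemon
    startsrc -s vio_daemon
    sleep 10
    if lssrc -s vio_daemon | vio_daemon_is_active; then
        echo "vio_daemon is active"
    else
        echo "vio_daemon is NOT active, check: lssrc -ls vio_daemon" >&2
        return 1
    fi
}
```

Nothing runs until restart_vio_daemon is called, so the file can be sourced first and the function invoked by hand.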
4. Here is what resolved our internal issue on our P10 machine: proper DNS setup and a vio_daemon restart.
# tail -1 /etc/hosts
10.x.x.x p10-vios p10-vios.int.xorux.com
# tail -1 /etc/netsvc.conf
hosts=local,bind4
# cat /etc/resolv.conf
domain int.xorux.com
nameserver 10.x.x.x
nameserver 1.1.1.1
--> make sure DNS is working properly, nslookup/ping
# stopsrc -s vio_daemon
# startsrc -s vio_daemon
5. Can you test the same solution as described at the end of this thread?
https://forum.xorux.com/discussion/comment/5744#Comment_5744
6. You were indeed right: the problem was with the VIO servers. At first, we couldn't see any problems with the VIO functions from the HMC, but digging some more, certain "work with virtual networks" functions would throw an error on the HMC.
The solution was to force the VIO to IPv4 name resolution; we received this from IBM:
# vi /etc/netsvc.conf
hosts=local4,bind4 <===== change "hosts=local,bind" to this
Then
/usr/bin/stopsrc -s vio_daemon
Wait 300 seconds or until vio_daemon has stopped.
/usr/sbin/slibclean
rm -rf /home/ios/CM
/usr/bin/startsrc -s vio_daemon -a '-d 4'
ps -ef |grep vio_chgmgt |grep -v grep |awk -F ' ' '{print $2}'
kill -1 <PID_of_vio_chgmgt>
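If you prefer not to edit /etc/netsvc.conf by hand in vi, the change from point 6 can be sketched as a small function that keeps a backup first. The name force_ipv4_lookup and the .bak suffix are assumptions for illustration only; the daemon stop/start steps above still have to be run afterwards.

```shell
#!/bin/sh
# Sketch only; force_ipv4_lookup is a hypothetical helper name.

# Set "hosts=local4,bind4" in a netsvc.conf-style file (default /etc/netsvc.conf).
# The original file is copied to <file>.bak before it is rewritten.
force_ipv4_lookup() {
    conf=${1:-/etc/netsvc.conf}
    cp -p "$conf" "$conf.bak" || return 1
    if grep -q '^[[:space:]]*hosts[[:space:]]*=' "$conf.bak"; then
        # Replace the existing active hosts line; comments are left alone.
        sed 's/^[[:space:]]*hosts[[:space:]]*=.*/hosts=local4,bind4/' "$conf.bak" > "$conf"
    else
        # No active hosts line yet, so append one.
        echo 'hosts=local4,bind4' >> "$conf"
    fi
}
```

To undo the change, copy the .bak file back over the original.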
7. IBM support:
The VIOS version that you are using has an issue with log collection, so I'm missing the copy of the CMDB, which is the DB running on the VIOS that the HMC queries.
I do not see any clear error in the logs, but I can't check the DB to see if it's properly populated.
The good news is that we can recreate this DB without any impact to the running LPARs with the following steps:
$ oem_setup_env
# stopsrc -s vio_daemon
# /usr/sbin/slibclean
# rm -rf /home/ios/CM
# rm /home/ios/logs/viod_bkps/*
# startsrc -s vio_daemon -a '-d 4'
# kill -1 <PID_of_vio_daemon>
Then wait a few minutes and retry the operation that was failing.
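For the last step (kill -1 on the daemon's PID), looking up the PID can be scripted instead of being read from ps by eye. A minimal sketch follows; pids_of and hup_by_name are hypothetical names, and the filter simply skips any ps line that mentions grep or awk, so treat the result as a convenience and verify the PID before relying on it.

```shell
#!/bin/sh
# Sketch only; function names are hypothetical.

# Reads `ps -ef` output on stdin and prints the PID (second column) of every
# line containing the given name, skipping grep/awk helper processes.
pids_of() {
    awk -v name="$1" 'index($0, name) && $0 !~ /grep|awk/ { print $2 }'
}

# Send SIGHUP (kill -1) to every process matching the given name.
hup_by_name() {
    for pid in $(ps -ef | pids_of "$1"); do
        echo "sending HUP to $1 (PID $pid)"
        kill -1 "$pid"
    done
}
```

With this, hup_by_name vio_daemon covers the step above, and hup_by_name vio_chgmgt covers the equivalent step in point 6.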