"Removed" VIO servers

Hello, I'm a new user this week on the community/free version 7.80-1, running via the appliance.

I'm connected to an HMC and getting performance data back from lots of IBM i and VIO servers, all but two. On one host, in the left navbar under "IBM Power Systems -> servername -> LPAR -> Removed", two VIO servers are listed. I'm getting data from all of our other VIO servers on all of our other hosts, as well as from the IBM i LPARs on this specific host. And since they're listed as "Removed", it seems data must have been retrieved for them at some point?

They display as expected in the HMC performance tabs.

Any ideas on how to troubleshoot/diagnose as the next step? All ideas are appreciated!

Thanks-- Jeff

Comments

  • Pavel
    edited January 25

    Hi,

    Here are the notes we give to users who have this issue:

    ------------------------------------------------------------

    In that case, VIOS data is not collected through the REST API because the VIOS does not communicate with the HMC. We have seen that many times; check the points below.

    I would start with point 4, then try point 3; both have helped a few users recently.


    1. To resolve the issue, see: http://www-01.ibm.com/support/docview.wss?uid=isg3T1024482

     Restarting vio_daemon on the VIOS should also work:

      https://www.ibm.com/support/pages/when-using-hmc-gui-you-see-message-unable-connect-database-error-occurred


    2. vio_daemon may be stuck or not communicating with the HMC:

      https://forum.xorux.com/discussion/comment/3450#Comment_3450

      https://www.ibm.com/support/pages/node/629995

     It can also be related to resolving the VIOS hostname(s) via DNS.

     This might also help:

     Follow point 2 of the IBM Tech Note below to stop and start vio_daemon:

      https://www.ibm.com/support/pages/when-using-hmc-gui-you-see-message-unable-connect-database-error-occurred


    Usually it is enough to start or restart vio_daemon.

    As root:

     ls -l /usr/ios/db/bin/solid*

     ps -ef | egrep "vio_daemon|solid|db"

     lssrc -ls vio_daemon

     stopsrc -s vio_daemon

     startsrc -s vio_daemon

     sleep 10


     ps -ef | egrep "vio_daemon|solid|db"

     lssrc -ls vio_daemon
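    If you want to script the status check above, here is a minimal sketch. It assumes the usual `lssrc -s` layout (a header line, then one row per subsystem with Status as the last column); `subsys_status` is a hypothetical helper name, not an AIX command.

```shell
#!/bin/sh
# subsys_status NAME: read `lssrc -s` output on stdin and print the
# Status column (last field) of the row whose first field is NAME.
# Assumes the standard lssrc -s columns: Subsystem, Group, PID, Status.
# Returns 1 if no row for NAME is found.
subsys_status() {
    awk -v name="$1" '
        $1 == name { print $NF; found = 1 }
        END { if (!found) exit 1 }'
}

# Usage on the VIOS (as root):
#   lssrc -s vio_daemon | subsys_status vio_daemon
```

    "active" means the daemon is running; "inoperative" means it is stopped.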

     In the UI, go to IBM Power --> RMC check: is that VIOS listed there?


    3. https://www.ibm.com/support/pages/when-using-hmc-gui-you-see-message-unable-connect-database-error-occurred

    After running the script cleanup_cmdb_with_logging.sh and then load.sh, it worked correctly.

           --> It helped another user as well; he saw this error on the HMC under Virtual Networks: "Error occurred while querying for SharedEthernetAdapter from VIOS ...."


    4. Here is what resolved our internal issue on our P10 machine: a proper DNS setup and a vio_daemon restart.

                   # tail -1 /etc/hosts

                   10.x.x.x    p10-vios p10-vios.int.xorux.com

                   # tail -1 /etc/netsvc.conf

                   hosts=local,bind4

                   # cat /etc/resolv.conf

                   domain int.xorux.com

                   nameserver 10.x.x.x

                   nameserver 1.1.1.1

                           --> make sure DNS is working properly (check with nslookup/ping)

                   # stopsrc -s vio_daemon

                   # startsrc -s vio_daemon
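    A rough way to double-check the /etc/hosts and /etc/netsvc.conf settings above from a script. This is only a sketch: `hosts_entry` and `netsvc_hosts` are hypothetical helper names, and they only parse the files. They do not query DNS, so nslookup/ping is still the real test.

```shell
#!/bin/sh
# Hypothetical helpers; pass file paths explicitly so they can be tried
# on copies (on the VIOS: /etc/hosts and /etc/netsvc.conf).

# hosts_entry FILE NAME: print the IP address of the first non-comment
# line in an /etc/hosts-style FILE that lists NAME as a host name or
# alias. Returns 1 if NAME is not found.
hosts_entry() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                    # drop comments
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; found = 1; exit }
        }
        END { if (!found) exit 1 }' "$1"
}

# netsvc_hosts FILE: print the value of every non-comment "hosts=" line
# in a netsvc.conf-style FILE, blanks removed (e.g. "local,bind4").
# Printing all of them makes duplicate hosts= lines easy to spot.
netsvc_hosts() {
    awk '
        { sub(/#.*/, "") }                    # drop comments
        /^[ \t]*hosts[ \t]*=/ {
            v = $0
            sub(/^[^=]*=/, "", v)             # keep text after "="
            gsub(/[ \t]/, "", v)              # strip blanks
            print v; found = 1
        }
        END { if (!found) exit 1 }' "$1"
}

# Usage on the VIOS:
#   hosts_entry /etc/hosts p10-vios.int.xorux.com
#   netsvc_hosts /etc/netsvc.conf
```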


    5. Can you test the solution described at the end of this thread?

            https://forum.xorux.com/discussion/comment/5744#Comment_5744


    6. You were indeed right: the problem was with the VIO servers. At first, we couldn't see any problems with the VIO functions from the HMC, but after digging some more, we found that certain "Work with Virtual Networks" functions threw an error on the HMC.

    The solution was to force the VIOS to use IPv4 name resolution. We received this from IBM:


    # vi /etc/netsvc.conf

    hosts=local4,bind4     <===== change "hosts=local,bind" to this


    Then


     /usr/bin/stopsrc -s vio_daemon

    Wait 300 seconds or until vio_daemon has stopped.

     /usr/sbin/slibclean

     rm -rf /home/ios/CM

     /usr/bin/startsrc -s vio_daemon -a '-d 4'

     ps -ef | grep vio_chgmgt | grep -v grep | awk '{print $2}'

     kill -1 <PID_of_vio_chgmgt>
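    The "wait 300 seconds or until vio_daemon has stopped" step can be scripted as a polling loop instead of a fixed sleep. A minimal sketch, assuming `lssrc -s vio_daemon` shows "inoperative" in its last (Status) column once the daemon has stopped; `wait_until_inoperative` and the `LSSRC` override are hypothetical, for illustration and testing only.

```shell
#!/bin/sh
# Hypothetical helper, not an AIX command. LSSRC can be overridden
# (e.g. with a stub function) so the loop can be tried off the VIOS.
LSSRC=${LSSRC:-lssrc}

# wait_until_inoperative NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls `lssrc -s NAME` and reads the Status column (last field of the
# row for NAME). Returns 0 once it reads "inoperative", or 1 if
# TIMEOUT seconds (default 300) pass first.
wait_until_inoperative() {
    w_name=$1 w_timeout=${2:-300} w_interval=${3:-5} w_waited=0
    while :; do
        w_status=$("$LSSRC" -s "$w_name" 2>/dev/null |
                   awk -v n="$w_name" '$1 == n { print $NF }')
        [ "$w_status" = "inoperative" ] && return 0
        [ "$w_waited" -ge "$w_timeout" ] && return 1
        sleep "$w_interval"
        w_waited=$((w_waited + w_interval))
    done
}
```

    Run it between the stopsrc and slibclean steps, e.g. `wait_until_inoperative vio_daemon 300 || echo "vio_daemon still running"`, so the cleanup does not start while the daemon is still up.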


    7. IBM support:


    The VIOS version that you are using has an issue with log collection, so I'm missing the copy of the CMDB, which is the database running on the VIOS that the HMC queries.


    I do not see any clear error in the logs, but I can't check the DB to see if it's properly populated.


    The good news is that we can recreate this DB without any impact on the running LPARs with the following steps:

    $ oem_setup_env

    # stopsrc -s vio_daemon

    # /usr/sbin/slibclean

    # rm -rf /home/ios/CM

    # rm /home/ios/logs/viod_bkps/*

    # startsrc -s vio_daemon -a '-d 4'

    # ps -ef | grep vio_daemon | grep -v grep | awk '{print $2}'

    # kill -1 <PID_of_vio_daemon>

    Then wait a few minutes and retry the operation that was failing.
