Linux on Power Agent Problem

Hi,

we have a problem: the lpar2rrd agent (version 5.05-4) on our SLES12 SP2 (ppc64le) host is not sending any data to the lpar2rrd server.
We collect the basic information through the HMC, but the additional data (SAN IOPS etc.) is not collected, and that would be important information for us.

When I try to change the Data Source in the web UI, I see only "HMC (agentless)".

LPAR2rrd-Server: 5.05

$ tail -1 /var/tmp/lpar2rrd-agent-192.168.121.16-root.txt | cut -d : -f 1-7
9080-MHE*#######:lnx0a:9:1513843202:Thu Dec 21 09:00:02 2017 version 5.05-4
$ tail -1 /var/tmp/lpar2rrd-agent-192.168.121.16-root-ps_job.txt | cut -d : -f 1-7
408617,root,[kworker/18:0],00:00:00,0,0

Telnet works:
$ telnet 192.168.121.16 8162
Trying 192.168.121.16...
Connected to 192.168.121.16.
Escape character is '^]'.

Thank you for your help!



Comments

  • this output looks like something is failing in the agent. Try running:

    /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -d <LPAR2RRD-SERVER>

  • master07
    edited December 2017
    $ s /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl 192.168.121.16 -d
    LPAR2RRD agent version:5.05-4
    Thu Dec 21 15:57:26 2017
    main timeout 600
    JOB TOP setting: MAX_JOBS=20, LOAD_LIMIT=10, LOAD_LIMIT_MIN=1, PROCESSES_INCLUDE= --
    uname -W 2>/dev/null
    false
    false
    false
    cat /proc/meminfo 2>>/var/tmp/lpar2rrd-agent-192.168.121.16-root.err
    cat /proc/ppc64/lparcfg 2>>/var/tmp/lpar2rrd-agent-192.168.121.16-root.err
    false
    cat /proc/vmstat 2>>/var/tmp/lpar2rrd-agent-192.168.121.16-root.err
    false 2>>/var/tmp/lpar2rrd-agent-192.168.121.16-root.err
    false 2>/dev/null
    false 2>>/var/tmp/lpar2rrd-agent-192.168.121.16-root.err
    ifconfig -a 2>>/var/tmp/lpar2rrd-agent-192.168.121.16-root.err
    false 2>>/var/tmp/lpar2rrd-agent-192.168.121.16-root.err
    uname -n 2>>/var/tmp/lpar2rrd-agent-192.168.121.16-root.err
    vmstat 60 2 2>>/var/tmp/lpar2rrd-agent-192.168.121.16-root.err
    9080-MH:lnx00:9:1513868246:Thu Dec 21 15:57:26 2017 version 5.05-4:|::::mem:::4290320768:3661468032:628852736:-1:3603741888:57726144:pgs:::0:0:24607.9375:0.0:::lan:eth0:192.168.101.10:16962918304963:175760633497:::::lan:eth1:10.1.245.10:7764992448214:582111082689:::::lan:eth2:172.30.61.10:944962155345:8633136337424:::::cpu:::0:0:1:0::
    /var/tmp/lpar2rrd-agent-192.168.121.16-root.txt
     9080-MHE:lnx0a:9:1513868246:Thu Dec 21 15:57:26 2017 version 5.05-4:|::::mem:::4290320768:3661468032:628852736:-1:3603741888:57726144:pgs:::0:0:24607.9375:0.0:::lan:eth0:192.168.101.10:16962918304963:175760633497:::::lan:eth1:10.1.245.10:7764992448214:582111082689:::::lan:eth2:172.30.61.10:944962155345:8633136337424:::::cpu:::0:0:1:0::
    Agent send     : not sending data this time (act_time=1513868246, last_send_time=1513867802, next_time=1513868102, random=6)
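
The `Agent send` line above carries the timestamps needed to see how long ago the agent last sent data. A minimal shell sketch that pulls those fields out of such a line and prints the gaps in seconds. The function names `agent_field` and `send_gaps` are made up for illustration, and the sketch does not reproduce the agent's own decision logic, which is not shown in this thread.

```shell
#!/bin/sh
# Extract a numeric key=value field from an "Agent send" log line.
# Hypothetical helper, not part of the lpar2rrd agent.
agent_field() {
    # $1 = log line, $2 = field name (e.g. act_time)
    printf '%s\n' "$1" | sed -n "s/.*[( ]$2=\([0-9][0-9]*\).*/\1/p"
}

# Print the seconds elapsed since the last send, and the gap between
# last_send_time and next_time as reported in the line.
send_gaps() {
    act=$(agent_field "$1" act_time)
    last=$(agent_field "$1" last_send_time)
    next=$(agent_field "$1" next_time)
    if [ -z "$act" ] || [ -z "$last" ] || [ -z "$next" ]; then
        echo "unrecognised line" >&2
        return 1
    fi
    echo "since_last_send=$((act - last)) next_minus_last=$((next - last))"
}

line='Agent send     : not sending data this time (act_time=1513868246, last_send_time=1513867802, next_time=1513868102, random=6)'
send_gaps "$line"
```

For the line quoted above this prints `since_last_send=444 next_minus_last=300`, so the message by itself only says that the agent skipped sending on this run, not that sending is broken.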
  • OK, I did not notice that you had "cut" the output in the previous example. Here it looks OK.
    ls -l /var/tmp/lpar2rrd*
    tail -2 /var/tmp/lpar2rrd*err
  • master07
    edited December 2017
    $ ls -l /var/tmp/lpar2rrd*
    -rw-r--r-- 1 root root 123642 Dec 21 16:30 /var/tmp/lpar2rrd-agent-192.168.121.16-root-ps_job.txt
    -rw-r--r-- 1 root root   2169 Nov 30 16:09 /var/tmp/lpar2rrd-agent-192.168.121.16-root.err
    -rw-r--r-- 1 root root      0 Dec 21 16:31 /var/tmp/lpar2rrd-agent-192.168.121.16-root.stamp
    -rw-r--r-- 1 root root     10 Dec 21 16:31 /var/tmp/lpar2rrd-agent-192.168.121.16-root.stamp-send
    -rw-r--r-- 1 root root      0 Dec 21 14:33 /var/tmp/lpar2rrd-agent-192.168.121.16-root.stamp-trimlogs
    -rw-r--r-- 1 root root   1745 Dec 21 16:36 /var/tmp/lpar2rrd-agent-192.168.121.16-root.txt
    -rw-r--r-- 1 root root      0 Dec 21 16:31 /var/tmp/lpar2rrd-agent-192.168.121.16-root.txt-tmp
    -rw-r--r-- 1 root root   6332 Dec 21 16:31 /var/tmp/lpar2rrd-agent-192.168.121.16-root.txtorig
    -rw-r--r-- 1 root root      0 Nov 17 14:00 /var/tmp/lpar2rrd-agent-nmon-192.168.121.16-root-time_file.txt

    $ tail -2 /var/tmp/lpar2rrd*err
    Thu Nov 30 16:09:06 2017: wrong server response: agent_time:1512054001 : recv_time:0 :
    Thu Nov 30 16:09:06 2017: Error: Not all data has been sent out, refused line: 9080-MHE*78:lnx0a:9:1512054001:Thu Nov 30 16:00:01 2017 version 5.05-4:|::::mem:::4290320768:3509294592:781026176:-1:3453965952:55328640:pgs:::0:0:24607.9375:0.0:::lan:eth0:192.168.101.10:9502665536164:125387129903:::::lan:eth1:10.1.245.10:7755117983677:579608000258:::::lan:eth2:172.30.61.10:679845861905:7290371919012:::::cpu:::0:0:0:0:::CPUTOP:263757:h3eadm:hdbindexserver -port 30040:4265820:1074:3426822976:3431358272::CPUTOPb/log/DB_H31/:683:75:241536:643776::CPUTOP:263759:h3eadm:hdbindexserver -port 30043:219542:65:25269568:101901056::CPUTOP:263629:h3eadm:hdbnameserver:92488:27:19408832:98522880::CPUTOP:47876:sapadm:/usr/sap/hostctrl/exe/sapstartsrv pf=/usr/sap/hostctrl/exe/host_profile -D:73548:21:183168:1698112::CPUTOP:48827:root:/usr/sap/hostctrl/exe/saposcol -l -w60 pf=/usr/sap/hostctrl/exe/host_profile:50878:15:19648:24640::CPUTOP:263715:h3eadm:hdbpreprocessor:52975:15:1698816:91459520::CPUTOP:264237:h3eadm:hdbwebdispatcher:46682:14:1816640:90378496::CPUTOP:263713:h3eadm:hdbcompileserver:45577:14:1555008:90306944::CPUTOP:120017:daaadm:/u... -hostvm -nodeName=smdagent -file=/usr/sap/DAA/SMDA98/smdagentgroup.properties -jvmFile=/usr/sap/DAA/SMDA98/work/jstart.jvm -traceFile=/usr/sap/DAA/SMDA98/work/dev_smdagent -javaOutFile=/usr/sap/DAA/SMDA98/work/jvm_smdagent.out:34225:11:606592:9144640::CPUTOP:120015:-hostvm -nodeName=smdagent_saph3e00a -file=/usr/sap/DAA/SMDA98/smdagentgroup.properties -jvmFile=/usr/sap/DAA/SMDA98/work/jstart.jvm -traceFile=/usr/sap/DAA/SMDA98/work/dev_smdagent_saph3e00a -javaOutFile=/usr/sap/DAA/SMDA98/work/jvm_smdagent_saph3e00a.out:33635:11:612288:9225152: /opt/lpar2rrd-agent/lpar2rrd-agent.pl:2454
  • It looks fine; data is collected and transferred to the lpar2rrd server.
    Are you deleting the serial number from the examples above? I see only the server type, like 9080-MH; there should be a serial number as well.

    On the server side:
    cd /home/lpar2rrd/lpar2rrd
    ls -l data/*/*/lnx0a/


  • master07
    edited December 2017
    Hi Pavel,

    maybe I have deleted some lines in the output to hide some private information.

    here is the ls output:
    -rw-r--r-x 1 root root 3825840 Dec 28 08:00 data/9080-.../hmc0001a/lnx0a.rrm


    PS: Is there a way to send some private information only to you?

    Thanks!

  • OK, all data except SAN is collected.
    Note that not all FC drivers on Linux provide performance data.

    Does this file exist on your Linux host?
    ls -l /sys/class/fc_host/*/statistics/tx_frames

    Is there any data inside (cat ...)?

    BTW: you can reach us at: support at lpar2rrd.com
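
The check described above can be sketched as a small shell function that walks every FC host and reports whether it exposes `statistics/tx_frames`. The function name `check_fc_stats` and its optional directory argument are assumptions added for illustration; only the sysfs path comes from this thread.

```shell
#!/bin/sh
# Report, for each Fibre Channel host, whether tx_frames statistics exist.
# $1 is an optional base directory (hypothetical, handy for testing);
# it defaults to the sysfs location quoted in the thread.
check_fc_stats() {
    base=${1:-/sys/class/fc_host}
    found=0
    for host in "$base"/*; do
        # Skip the unexpanded glob when no hosts are present.
        [ -d "$host" ] || continue
        found=1
        stat_file="$host/statistics/tx_frames"
        if [ -r "$stat_file" ]; then
            echo "$(basename "$host"): tx_frames=$(cat "$stat_file")"
        else
            echo "$(basename "$host"): no statistics/tx_frames"
        fi
    done
    [ "$found" -eq 1 ] || echo "no FC hosts under $base"
}

check_fc_stats
```

A host that prints `no statistics/tx_frames` is one whose driver does not publish these counters in that location.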


  • No, there is no statistics directory:

    $ ls -l /sys/class/fc_host/*/statistics/tx_frames
    ls: cannot access '/sys/class/fc_host/*/statistics/tx_frames': No such file or directory

    $ ls /sys/class/fc_host/*/
    /sys/class/fc_host/host0/:
    dev_loss_tmo  issue_lip      port_id     port_type  subsystem           uevent
    device          maxframe_size  port_name     power        supported_classes
    fabric_name   node_name      port_state  speed        tgtid_bind_type

    /sys/class/fc_host/host1/:
    dev_loss_tmo  issue_lip      port_id     port_type  subsystem           uevent
    device          maxframe_size  port_name     power        supported_classes
    fabric_name   node_name      port_state  speed        tgtid_bind_type

    So I guess there is no way for the lpar2rrd-agent to get the SAN data?

    Thank you for your help!


  • Hmm, as you can see there is no file with statistics, so there is no way for us to collect them ...
    If you find any other method of getting the performance data on your side, let us know.
  • Okay, thank you anyway, and have a nice New Year's Eve.
    See you in 2018 :p
  • I have the same problem with a Solaris LDOM. I see the following message in the /var/tmp/lpar2rrd-agent.out file:

    Agent send     : not sending data this time (act_time=1535008380, last_send_time=1535008142, next_time=1535008442, random=9)

    On the server I see the data for this host, but it does not show up in the GUI:

    /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3

    [lpar2rrd@lbdgvlpar2rrd lbdgdbmurex3]$ ls -l
    total 20576
    -rw-r--r-- 1 lpar2rrd lpar2rrd       7 Aug  9 05:54 agent.cfg
    -rw-r--r-- 1 lpar2rrd lpar2rrd 3825840 Aug  9 09:12 cpu.mmm
    drwxrwxr-x 2 lpar2rrd lpar2rrd       6 Aug 23 02:13 JOB
    -rw-r--r-- 1 lpar2rrd lpar2rrd      14 Aug 22 23:16 lan-net0.cfg
    -rw-r--r-- 1 lpar2rrd lpar2rrd 3825840 Aug 23 02:09 lan-net0.mmm
    -rw-r--r-- 1 lpar2rrd lpar2rrd       8 Aug  9 05:54 lan-net1.cfg
    -rw-r--r-- 1 lpar2rrd lpar2rrd 3825840 Aug  9 09:12 lan-net1.mmm
    -rw-r--r-- 1 lpar2rrd lpar2rrd 5738368 Aug 23 02:09 mem.mmm
    -rw-r--r-- 1 lpar2rrd lpar2rrd 3825840 Aug 23 02:09 pgs.mmm



  • Hi,

    send us this info via support at lpar2rrd.com

    agent side:
    tail -1 /var/tmp/lpar2rrd*txt
    ls -l /var/tmp/lpar2rrd*

    lpar2rrd server side:
    cd /home/lpar2rrd/lpar2rrd
    grep -i lbdgdbmurex3 logs/error.log-daemon| tail




  • root@lbdgdbmurex3:~# tail -1 /var/tmp/lpar2rrd*txt
    27960,ora11g,ora_w008_murex1,00:00,4853400,4840800
    root@lbdgdbmurex3:~# ls -l /var/tmp/lpar2rrd*
    -rw-r--r--   1 lpar2rrd lpar2rrd   17905 Sep 10 11:00 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-lpar2rrd-ps_job.txt
    -rw-r--r--   1 lpar2rrd lpar2rrd    2096 Aug 30 12:45 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-lpar2rrd.err
    -rw-r--r--   1 lpar2rrd lpar2rrd       0 Sep 10 11:19 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-lpar2rrd.stamp
    -rw-r--r--   1 lpar2rrd lpar2rrd      10 Sep 10 11:19 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-lpar2rrd.stamp-send
    -rw-r--r--   1 lpar2rrd lpar2rrd       0 Sep 10 09:40 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-lpar2rrd.stamp-trimlogs
    -rw-r--r--   1 lpar2rrd lpar2rrd     323 Sep 10 11:20 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-lpar2rrd.txt
    -rw-r--r--   1 lpar2rrd lpar2rrd       0 Sep 10 11:19 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-lpar2rrd.txt-tmp
    -rw-r--r--   1 lpar2rrd lpar2rrd    8792 Sep 10 11:19 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-lpar2rrd.txtorig
    -rw-r--r--   1 root     root           0 Aug 23 01:20 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-root.err
    -rw-r--r--   1 root     root           0 Aug 23 01:20 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-root.stamp
    -rw-r--r--   1 root     root          10 Aug 23 01:20 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-root.stamp-send
    -rw-r--r--   1 root     root           0 Aug 23 01:20 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-root.stamp-trimlogs
    -rw-r--r--   1 root     root           0 Aug 23 01:20 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-root.txt-tmp
    -rw-r--r--   1 root     root         317 Aug 23 01:20 /var/tmp/lpar2rrd-agent-NimserverPRD-bck-root.txtorig
    -rw-r--r--   1 lpar2rrd lpar2rrd    1706 Sep 10 11:20 /var/tmp/lpar2rrd-agent.out
    root@lbdgdbmurex3:~#

    =======================

    [root@lbdgvlpar2rrd lpar2rrd]# grep -i lbdgdbmurex3 logs/error.log-daemon| tail
    Sat Jul 28 20:46:20 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803449 when last update time is 1532803449 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803449 : :
    Sat Jul 28 13:54:42 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803511 when last update time is 1532803511 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803511 : :
    Sat Jul 28 13:54:42 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803575 when last update time is 1532803575 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803575 : :
    Sat Jul 28 13:54:42 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803630 when last update time is 1532803630 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803630 : :
    Sat Jul 28 13:54:42 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803692 when last update time is 1532803692 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803692 : :
    Sat Jul 28 13:54:42 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803760 when last update time is 1532803760 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803760 : :
    Sat Jul 28 13:54:42 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803812 when last update time is 1532803812 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803812 : :
    Sat Jul 28 13:54:42 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803870 when last update time is 1532803870 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803870 : :
    Sat Jul 28 13:54:42 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803926 when last update time is 1532803926 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803926 : :
    Sat Jul 28 13:54:42 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803991 when last update time is 1532803991 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803991 : :
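
In every error line above, the two timestamps are identical, so each update was rejected because a sample with that timestamp was already stored. A rough shell sketch for counting such rejections per data file, reading daemon log lines from standard input. The function name `count_rejected_updates` is hypothetical, and the line layout is assumed to match the excerpts quoted in this thread.

```shell
#!/bin/sh
# Count "illegal attempt to update" errors per RRD file.
# Reads daemon error-log lines on stdin, prints "<file> <count>".
count_rejected_updates() {
    grep 'illegal attempt to update' |
        sed -n 's/.*ERROR: \([^ ]*\): illegal attempt.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Demo on two lines copied from the log excerpt above.
count_rejected_updates <<'EOF'
Sat Jul 28 20:46:20 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803449 when last update time is 1532803449 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803449 : :
Sat Jul 28 13:54:42 2018: Client comunication failed - client:  (192.168.60.15): ERROR: /home/lpar2rrd/lpar2rrd/data/Solaris/no_hmc/lbdgdbmurex3/lan-net0.mmm: illegal attempt to update using time 1532803511 when last update time is 1532803511 (minimum one second step) at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 1038. : sending ok time anyway : 1532803511 : :
EOF
```

On the server it could be fed the whole log, for example `count_rejected_updates < logs/error.log-daemon` from the lpar2rrd home directory.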



  • As far as I can see it is working fine. Do you see data in the GUI --> Unmanaged --> Solaris for that host?

    BTW, we are finishing a completely new Solaris implementation that follows Solaris (SPARC/x86) virtualisation (LDOM/Global zone support etc.). It is already being beta tested, with release planned for Oct 2018.