lpar2rrd network monitoring wrong data

Hi,

We have an AIX LPAR with three network interfaces carrying different traffic, but the lpar2rrd GUI shows the same performance data for all three interfaces, which is not correct.

[MB/sec]

Int     READ/IN Avg   Max       WRITE/OUT Avg   Max
en10    206.63        684.22    201.58          501.72
en11    206.63        684.19    201.58          501.70
en12    206.63        684.17    201.58          501.69


How can I fix this?

lpar2rrd Agent-version: the latest
lpar2rrd Server-version: 4.70

We are installing the lpar2rrd agent for the first time and want to roll it out to all production machines, so we need to fix this issue first.

I use the following crontab entry:
* * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <lpar2rrd_server>

I have had no entries in the /var/tmp/lpar2rrd*err files since May 11.

Comments

  • Hi

    We do not support such an old version for free users.
    Please upgrade to the latest 5.00 (server & agent) and let us know if the issue persists.

  • Hi Pavel,

    I have upgraded to the latest lpar2rrd version 5.0 (server and agent) but still have the same problem:

    [MB/sec]

    Int     READ/IN Avg   Max       WRITE/OUT Avg   Max
    en10    231.60        699.67    18.53           225.12
    en11    231.60        699.67    18.53           225.11
    en12    231.60        699.65    18.53           225.11


  • I am checking it in our environment and do not see that; everything looks OK on AIX and Linux.

    What is your agent OS?
    rpm -qa| grep lpar2rrd
    tail -1 /var/tmp/lpar2rrd*txt
  • Hi Pavel,

    $ rpm -qa | grep lpar
    lpar2rrd-agent-5.00-0
    $ oslevel -s
    7200-01-01-1642 AIX

    Here is the tail -1 output

    ==> /var/log/tb017/lpar2rrd-agent-lpar2rrd_server-root.txt <==
    9080-MME*1234567:<hostname>:3:1495011360:Wed May 17 10:56:00 2017 version 4.96-0:4024000000|8:<hostname>:0::mem:::1073741824:920092540:153649284:101339108:0:88235776:pgs:::0:0:24576:7:::lan:en10:192.168.203.31:1165993083957646:2335900672640161:136089428177:894643750007:::lan:en11:192.168.113.31:1165993086376987:2335900672896979:136089428232:894643750092:::lan:en12:10.1.236.120:1165993088459607:2335900673179882:136089428278:894643750198:::cpu:::0:11:49:3:::san:fcs0:0x20000090FADC896E:1699758578828377:211911817552992:66670565202:15896992573:::san:fcs1:0x20000090FADC896F:1780582838391467:960022827282432:7942518717:3793802404:::san:fcs2:0x20000090FAE09249:1699759495447486:211928097357920:66657032687:15896959120:::san:fcs3:0x20000090FAE0924A:1800190725660606:973053833142272:8046810280:3833127311:::san_resp:fcs0::0.8:0.4:::::san_resp:fcs1::27.9:4.5:::::san_resp:fcs2::0.8:0.4:::::san_resp:fcs3::27.9:4.5::::

    ==> /var/log/tb017/lpar2rrd-agent-nmon-lpar2rrd_server-root-time_file.txt <==
    SRV_9080-MME*1234567_LPR_<hostname>_TIME_1494507902_XOR_<hostname>_170511_0000.nmon


    Maybe it is not displayed correctly because we are using real native hardware network adapters in our LPAR rather than VEAs (Virtual Ethernet Adapters), as is usual under AIX?


    Thanks!


  • Here is our network setup with physical network adapters. The physical adapters are combined into an EtherChannel, and on the EtherChannel device we created three virtual network interfaces for our three different VLANs:

    $ lsdev | grep ent
    ent0            Available 02-00       PCIe3 4-Port 10GbE SR Adapter (df1020e21410e304)
    ent1            Available 02-01       PCIe3 4-Port 10GbE SR Adapter (df1020e21410e304)
    ent2            Available 02-02       PCIe3 4-Port 10GbE SR Adapter (df1020e21410e304)
    ent3            Available 02-03       PCIe3 4-Port 10GbE SR Adapter (df1020e21410e304)
    ent4            Available 03-00       PCIe3 4-Port 10GbE SR Adapter (df1020e21410e304)
    ent5            Available 03-01       PCIe3 4-Port 10GbE SR Adapter (df1020e21410e304)
    ent6            Available 03-02       PCIe3 4-Port 10GbE SR Adapter (df1020e21410e304)
    ent7            Available 03-03       PCIe3 4-Port 10GbE SR Adapter (df1020e21410e304)
    ent8            Available             Virtual I/O Ethernet Adapter (l-lan)
    ent9            Available             EtherChannel / IEEE 802.3ad Link Aggregation
    ent10           Available             VLAN
    ent11           Available             VLAN
    ent12           Available             VLAN

    $ lsattr -El 10
    lsattr: 0514-519 The following device was not found in the customized
            device configuration database:
            10

    $ lsattr -El ent9
    adapter_names   ent0,ent1,ent4,ent5 EtherChannel Adapters                           True
    alt_addr        0xe60086007b02      Alternate EtherChannel Address                  True
    auto_recovery   yes                 Enable automatic recovery after failover        True
    backup_adapter  NONE                Adapter used when whole channel fails           True
    hash_mode       default             Determines how outgoing adapter is chosen       True
    interval        short               Determines interval value for IEEE 802.3ad mode True
    mode            8023ad              EtherChannel mode of operation                  True
    netaddr         0                   Address to ping                                 True
    noloss_failover yes                 Enable lossless failover after ping failure     True
    num_retries     3                   Times to retry ping before failing              True
    retry_time      1                   Wait time (in seconds) between pings            True
    use_alt_addr    yes                 Enable Alternate EtherChannel Address           True
    use_jumbo_frame yes                 Enable Gigabit Ethernet Jumbo Frames            True

    $ lsattr -El ent11
    base_adapter  ent9 VLAN Base Adapter True
    vlan_priority 0    VLAN Priority     True
    vlan_tag_id   2113 VLAN Tag ID       True

    $ lsattr -El ent12
    base_adapter  ent9 VLAN Base Adapter True
    vlan_priority 0    VLAN Priority     True
    vlan_tag_id   236  VLAN Tag ID       True
    $

    $ lsattr -El ent10
    base_adapter  ent9 VLAN Base Adapter True
    vlan_priority 0    VLAN Priority     True
    vlan_tag_id   2203 VLAN Tag ID       True


    Our service IPs are defined on the network interfaces ent10, ent11 and ent12.
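
The `lsattr` output above shows all three VLAN adapters pointing at the same `base_adapter`. A minimal sketch of how that relationship could be extracted automatically is given here. The function names `parse_lsattr` and `group_by_base` are hypothetical, the input text is copied from the post, and the parser assumes every attribute line has a non-empty value in its second column.

```python
from collections import defaultdict

# `lsattr -El <device>` output copied from the post, one string per VLAN adapter.
LSATTR = {
    "ent10": """base_adapter  ent9 VLAN Base Adapter True
vlan_priority 0    VLAN Priority     True
vlan_tag_id   2203 VLAN Tag ID       True""",
    "ent11": """base_adapter  ent9 VLAN Base Adapter True
vlan_priority 0    VLAN Priority     True
vlan_tag_id   2113 VLAN Tag ID       True""",
    "ent12": """base_adapter  ent9 VLAN Base Adapter True
vlan_priority 0    VLAN Priority     True
vlan_tag_id   236  VLAN Tag ID       True""",
}


def parse_lsattr(text):
    """Return {attribute: value} taken from the first two columns of each line."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            attrs[parts[0]] = parts[1]
    return attrs


def group_by_base(lsattr_by_dev):
    """Map each base adapter to the (VLAN adapter, VLAN tag) pairs built on it."""
    groups = defaultdict(list)
    for dev, text in lsattr_by_dev.items():
        attrs = parse_lsattr(text)
        base = attrs.get("base_adapter")
        if base:
            groups[base].append((dev, attrs.get("vlan_tag_id")))
    return dict(groups)


if __name__ == "__main__":
    for base, vlans in group_by_base(LSATTR).items():
        print(base, "->", vlans)
```

Any base adapter that ends up with more than one VLAN adapter in this mapping is a candidate for the identical-counter behaviour discussed in this thread.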

  • entstat -d en10| egrep "Bytes"
    entstat -d en11| egrep "Bytes"
    entstat -d en12| egrep "Bytes"
    sleep 10
    entstat -d en10| egrep "Bytes"
    entstat -d en11| egrep "Bytes"
    entstat -d en12| egrep "Bytes"



  • $ entstat -d en10| egrep "Bytes"
    Bytes: 1242811655484412                       Bytes: 2521087133781420
    $ entstat -d en11| egrep "Bytes"
    Bytes: 1242811657235642                       Bytes: 2521087134061111
    $ entstat -d en12| egrep "Bytes"
    Bytes: 1242811658715091                       Bytes: 2521087134732180
    $ sleep 10
    $ entstat -d en10| egrep "Bytes"
    Bytes: 1242813951892748                       Bytes: 2521088268639117
    $ entstat -d en11| egrep "Bytes"
    Bytes: 1242813952700723                       Bytes: 2521088269711420
    $ entstat -d en12| egrep "Bytes"
    Bytes: 1242814297905151                       Bytes: 2521088466227944
    $


  • I can see that the traffic is very similar on all 3 interfaces. Is that possible?

    1242813951892748-1242811655484412
    2296408336
    1242813952700723-1242811657235642
    2295465081
    1242814297905151-1242811658715091
    2639190060
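
The subtraction above can be repeated for both counter columns and turned into a rate. This is only a sketch under stated assumptions: the left `Bytes:` column is taken as transmit and the right as receive (following the `entstat` column headings shown elsewhere in this thread), the interval is the nominal 10 seconds of the `sleep` even though the commands ran one after another, and 1 MB is taken as 10^6 bytes. The name `rates_mb_per_s` is hypothetical.

```python
# (left column, right column) byte counters copied from the entstat output above.
BEFORE = {
    "en10": (1242811655484412, 2521087133781420),
    "en11": (1242811657235642, 2521087134061111),
    "en12": (1242811658715091, 2521087134732180),
}
AFTER = {
    "en10": (1242813951892748, 2521088268639117),
    "en11": (1242813952700723, 2521088269711420),
    "en12": (1242814297905151, 2521088466227944),
}
INTERVAL_S = 10  # nominal; assumption, the real gap between samples is slightly longer


def rates_mb_per_s(before, after, interval_s):
    """Return {interface: (tx_MB_per_s, rx_MB_per_s)} from two counter samples."""
    rates = {}
    for ifname, (tx0, rx0) in before.items():
        tx1, rx1 = after[ifname]
        rates[ifname] = (
            (tx1 - tx0) / interval_s / 1e6,
            (rx1 - rx0) / interval_s / 1e6,
        )
    return rates


if __name__ == "__main__":
    for ifname, (tx, rx) in rates_mb_per_s(BEFORE, AFTER, INTERVAL_S).items():
        print(f"{ifname}: tx {tx:.1f} MB/s, rx {rx:.1f} MB/s")
```

Note that the absolute counters of the three interfaces are already almost equal before any subtraction, which is what one would expect if they all read from one shared counter sampled a moment apart.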


  • Hi Pavel,

    When I look at the current nmon network throughput, the entstat byte counts make no sense to me.
    Most of the network traffic is definitely on en12.
  • We just report what entstat reports. If it reports wrong numbers, then it is a problem in entstat.
    I remember the same case. entstat apparently ignores VLAN-tagged networks and shows just the total traffic. Do they have the same MAC? Check with: netstat -in

    You can upload the nmon file to lpar2rrd to compare.


  • Hi Pavel,

    Yes, they have the same MAC address, and it looks like this is the problem. Do you remember how to fix that behaviour in AIX, or maybe in lpar2rrd?


  • There is no fix or workaround.
    I am not sure whether we (customer) have raised a call with IBM.
    You can try it; the problem is clear.
  • Hi Pavel,

    Okay, I will open a call and we will see what IBM says.

    Thanks anyway.
  • ok, thanks.
    Let us know.
  • Hi,

    An update from IBM support:
    the VLAN adapters get their statistics from the real hardware adapter they are attached to.
    According to IBM, this is by design.

    Maybe lpar2rrd could use different commands to get the correct results?

    What do you think, Pavel?

    Thanks!
  • Hi,

    I do not think we can get such statistics at the VIOS (AIX) level using standard OS commands.

    As far as I remember, there is a way to get some VLAN stats at the HMC level, but it has never had high enough priority for us to even look into it :(



  • Warrax
    edited September 2018
    Hello,
    I'm seeing the same problem on all LPARs with EtherChannel
    (on LPARs with a single virtual Ethernet adapter, the network load data is reliable),
    on any lpar2rrd 5.0x version and older.

    If lpar2rrd simply collects data from the entstat output, could the data be processed incorrectly when several interfaces are interconnected?

    The entstat counters on my problem LPARs look reliable,
    for example:
    entstat -r en2
    entstat -d en2
    sleep 60
    entstat -d en2

    ETHERNET STATISTICS (en2) :
    Device Type: EtherChannel
    Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
    Transmit Statistics:                          Receive Statistics:
    --------------------                          -------------------
    Packets: 55                                   Packets: 24
    Bytes: 22319                                  Bytes: 2335

    Statistics for every adapter in the EtherChannel:
    -------------------------------------------------
    Number of adapters: 2
    Active channel: primary channel
    Operating mode: Network interface backup mode

    ETHERNET STATISTICS (ent0) :
    Device Type: Host Ethernet Adapter (l-hea)
    Transmit Statistics:                          Receive Statistics:
    --------------------                          -------------------
    Packets: 70                                   Packets: 50
    Bytes: 35984                                  Bytes: 11052

    Backup adapter - ent1:
    ======================
    ETHERNET STATISTICS (ent1) :
    Device Type: Host Ethernet Adapter (l-hea)
    Transmit Statistics:                          Receive Statistics:
    --------------------                          -------------------
    Packets: 0                                    Packets: 0
    Bytes: 0                                      Bytes: 0



    ETHERNET STATISTICS (en2) :
    Device Type: EtherChannel
    Elapsed Time: 0 days 0 hours 0 minutes 58 seconds
    Transmit Statistics:                          Receive Statistics:
    --------------------                          -------------------
    Packets: 223768                               Packets: 326280
    Bytes: 216985038                              Bytes: 115945103

    Statistics for every adapter in the EtherChannel:
    -------------------------------------------------
    Number of adapters: 2
    Active channel: primary channel
    Operating mode: Network interface backup mode
    -------------------------------------------------------------
    ETHERNET STATISTICS (ent0) :
    Device Type: Host Ethernet Adapter (l-hea)
    Transmit Statistics:                          Receive Statistics:
    --------------------                          -------------------
    Packets: 223771                               Packets: 326242
    Bytes: 216986211                              Bytes: 115942447


    Backup adapter - ent1:
    ======================
    ETHERNET STATISTICS (ent1) :
    Device Type: Host Ethernet Adapter (l-hea)
    Transmit Statistics:                          Receive Statistics:
    --------------------                          -------------------
    Packets: 0                                    Packets: 44
    Bytes: 0                                      Bytes: 3669

    P.S. On all lpar2rrd graphs with inaccurate LAN data, the average load is about the same, ~14 Mbps, while the actual load is quite different.
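
To compare the `entstat` sample above with a graph value in Mbps, the counters can be converted to a rate. This is a rough sketch, valid only because the counters were reset with `entstat -r` just before sampling, so each counter holds the bytes seen during the reported elapsed time. It assumes 1 Mbps = 10^6 bits per second; the names `parse_entstat` and `mbps` are hypothetical, and the sample text is the top-level EtherChannel section copied from the post.

```python
import re

# Top-level section of the second `entstat -d en2` output from the post.
SAMPLE = """ETHERNET STATISTICS (en2) :
Device Type: EtherChannel
Elapsed Time: 0 days 0 hours 0 minutes 58 seconds
Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 223768                               Packets: 326280
Bytes: 216985038                              Bytes: 115945103
"""


def parse_entstat(text):
    """Return (elapsed_seconds, tx_bytes, rx_bytes) from the first section."""
    t = re.search(
        r"Elapsed Time: (\d+) days (\d+) hours (\d+) minutes (\d+) seconds", text
    )
    days, hours, minutes, seconds = map(int, t.groups())
    elapsed = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    # First "Bytes:" line: transmit counter on the left, receive on the right.
    b = re.search(r"^Bytes:\s+(\d+)\s+Bytes:\s+(\d+)", text, re.MULTILINE)
    return elapsed, int(b.group(1)), int(b.group(2))


def mbps(nbytes, seconds):
    """Convert a byte count over an interval into megabits per second."""
    return nbytes * 8 / seconds / 1e6


if __name__ == "__main__":
    elapsed, tx, rx = parse_entstat(SAMPLE)
    print(f"tx {mbps(tx, elapsed):.2f} Mbps, rx {mbps(rx, elapsed):.2f} Mbps")
```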

  • Hi,

    Check what is transferred in the lpar2rrd agent data file below and compare it to the entstat data.

    tail -1 /var/tmp/lpar2rrd*txt
    Note that this may match 2 files; use the one without "ps" in its name.
    You will see whether the data counters are transferred correctly, as in this example:
    ... lan:en0:192.168.1.9:4069721276:3022293914:17760734:17787938 ...
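
The comparison suggested above can be scripted. The sketch below pulls the `lan:` sections out of an agent record so the counters can be set side by side with `entstat`. It assumes sections are separated by `:::` and that each `lan` section has the shape `lan:<interface>:<IPv4 address>:<four numeric counters>`, as in the example line; the meaning of the individual counters is not stated in this thread, so they are returned unlabeled. `parse_lan` is a hypothetical name, and the excerpt is cut from the agent record posted earlier in the thread.

```python
import re

# Excerpt of the agent record posted earlier in this thread (lan sections only).
EXCERPT = (
    "pgs:::0:0:24576:7"
    ":::lan:en10:192.168.203.31:1165993083957646:2335900672640161:136089428177:894643750007"
    ":::lan:en11:192.168.113.31:1165993086376987:2335900672896979:136089428232:894643750092"
    ":::lan:en12:10.1.236.120:1165993088459607:2335900673179882:136089428278:894643750198"
    ":::cpu:::0:11:49:3:::"
)

# Assumed layout: lan:<ifname>:<IPv4>:<c1>:<c2>:<c3>:<c4>
LAN_RE = re.compile(r"(?:^|:::)lan:([^:]+):([^:]*):(\d+):(\d+):(\d+):(\d+)")


def parse_lan(record):
    """Return {interface: (ip, [four counters])} for every lan section found."""
    result = {}
    for m in LAN_RE.finditer(record):
        name, ip = m.group(1), m.group(2)
        result[name] = (ip, [int(x) for x in m.groups()[2:]])
    return result


if __name__ == "__main__":
    lan = parse_lan(EXCERPT)
    for name, (ip, counters) in lan.items():
        print(name, ip, counters)
    firsts = [counters[0] for _, counters in lan.values()]
    print("spread of first counter:", max(firsts) - min(firsts))
```

For the record in this thread, the first counter differs by only a few million across en10, en11 and en12 while its absolute value is above 10^15, so the agent is passing on near-identical counters rather than mangling distinct ones.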
