Inconsistent data rate for nodes and for ports (cDOT)

Hello,

for some of our NetApps, the data rate shown for the nodes is inconsistent with the data rate shown for the ports.

In this example there is only throughput for node A and nothing for B.
The label for the Y axis also seems to be wrong: it is about a factor of 1000 too small compared to the real values, which can be seen for example in the picture for the rate by protocol.


In contrast, the picture for the ports shows node B (e0h/e0g) with throughput.




In Grafana the ports are displayed correctly.

We have two FAS8200 (ONTAP 9.5P8 and 9.5P6, respectively) with this behaviour, and two other FAS8200 (also ONTAP 9.5P8 and 9.5P6) where it looks OK.
And there is also a FAS2750 (9.5P6) with these inconsistencies.

I’ve also upgraded STOR2RRD to 8.21-16 without improvement.

Regards,
Arndt


Comments

  • Hi,

    please send us the logs and note the storage names: one that is displayed properly and one that is wrong.

    Add a short problem description in the text field of the upload form.

    cd /home/stor2rrd/stor2rrd # or where is your STOR2RRD working dir

    tar cvhf logs.tar logs tmp/*txt

    gzip -9 logs.tar

    Send us logs.tar.gz via https://upload.stor2rrd.com

  • Hi,

    I’ve uploaded logs.tar.gz as well as details.txt with details.

    The issue with the wrong label for the Y axis appears for all NetApps.

    Thanks


  • Hi Arndt,

    we need some raw perf data from the affected NetApp. Can you please do the following steps:
    su - stor2rrd
    cd /home/stor2rrd/stor2rrd
    echo "export DEBUG_FULL=2" >> etc/.magic
    Wait for 15 minutes, then:
    cd /home/stor2rrd/stor2rrd
    tar cvf dump.tar data/<YOUR_NETAPP>/*.dump
    gzip -9 dump.tar
    Please replace <YOUR_NETAPP> with the name of the affected NetApp.

    Send us dump.tar.gz for analysis ( https://upload.stor2rrd.com )
    Don't worry, there will be no private content in that file.

    Don't forget to remove etc/.magic afterwards, or just comment out the line containing DEBUG_FULL=2:
    rm etc/.magic
    Thanks in advance
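
If etc/.magic holds other settings that should stay, commenting out the debug line may be preferable to deleting the file. The following is only a sketch of that alternative, not an official STOR2RRD tool; the function name `comment_out_debug` is made up for illustration, and it assumes the line was added exactly as shown in the echo command above.

```shell
# comment_out_debug FILE
# Prefix every not-yet-commented line that contains DEBUG_FULL with '#'.
# Hypothetical helper, not part of STOR2RRD.
comment_out_debug() {
  magic_file=$1
  # Nothing to do if the file does not exist.
  [ -f "$magic_file" ] || return 0
  tmp_file="$magic_file.tmp.$$"
  # Skip lines that are already comments; '&' re-inserts the matched line.
  # Note: the rewritten file is created with the current umask.
  sed '/^[[:space:]]*#/!s/.*DEBUG_FULL.*/#&/' "$magic_file" > "$tmp_file" &&
    mv "$tmp_file" "$magic_file"
}
```

Run as the stor2rrd user, for example: `comment_out_debug /home/stor2rrd/stor2rrd/etc/.magic`. Running it twice is harmless, because lines that are already commented are left alone.
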
  • Hi Jirka,

    I’ve uploaded the data for all NetApps.

    Is it possible that the ports are missing an assignment to a node_name in the *cli.dump, like the one that exists for processor or fcp_lif?


  • edited September 10
    Hi Arndt,

    I can see data for both nodes only in the mshnap1 and mshnap2 dumps; the others have non-zero values for only one of them (either A or B), and that's what we present in the graphs.

    Please check it on your dumps:
    cd data/oggnap10
    grep -e 'system:node' *cli.dump
    Regarding the wrong y-axis scale: I have to do some calculations on the dumps you sent, I'll reply ASAP.

  • Hi Jirka,

    your grep for oggnap10 looks OK: nearly all throughput is on node A and nothing on node B. This corresponds to reality and to the representation in STOR2RRD for the nodes.

    The problem is that in the representation for the ports (e0g/e0h), all throughput is on node B and nothing on A, in contrast to the above.

    When I grep the cli.dump for e0g and e0h, I can’t see any assignment to a node. It seems as if the first set of values (recv-data, recv-packets, recv-errors, sent-data, sent-packets, sent-errors) belongs to node A, because these values are high, and the second set belongs to node B (low values).

    But in the representation for the ports it’s vice versa, as if something gets confused or has not the right assignment.






  • Hi Arndt,

    you're right, there is a bug in the port mapping: I've used a hash structure for the nodes instead of an array, so it's mapping ports to nodes in random order. Sometimes the result is right, sometimes it's wrong :-/

    I'm working on the fix.
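
The effect of such an ordering bug can be illustrated with a small sketch. This is not the naperf.pl code: the node names `nodeA`/`nodeB` and the block values `high`/`low` are hypothetical stand-ins for the two unlabelled sets of port counters seen in the dump. The function pairs the first block with the first node it is given and the second block with the second node, so the result is only correct if the node order is stable.

```shell
# pair_blocks NODE1 NODE2
# Pair the i-th stat block with the i-th node name (purely positional).
# Block values are hypothetical: "high" = busy node, "low" = idle node.
pair_blocks() {
  for block in high low; do
    printf '%s=%s\n' "$1" "$block"
    shift
  done
}

# Stable order, as an array would give it: traffic lands on the right node.
pair_blocks nodeA nodeB
# prints:
#   nodeA=high
#   nodeB=low

# Swapped order, simulating what iterating an unordered hash may return:
# the same blocks are now attributed to the wrong nodes.
pair_blocks nodeB nodeA
# prints:
#   nodeB=high
#   nodeA=low
```

This matches the symptom reported above, where the port graphs showed the throughput on the opposite node from the node graphs.
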
  • Great - thank you!
  • Hi Arndt,

    please try this patch, it should fix node->port mapping:

    https://download.stor2rrd.com/patch/2.81-21-64-g91de9/naperf.pl.gz

    Gunzip it and copy it to /home/stor2rrd/stor2rrd/bin (mode 755, owner stor2rrd):
    -rwxr-xr-x 1 stor2rrd stor2rrd 110083 Sep 16 09:18 naperf.pl

    If your web browser gunzips it automatically, then just rename it: mv naperf.pl.gz naperf.pl
    Make sure that the file size is the same as in the example above.

    Let us know please
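
The install steps above can be sketched as one helper that copes with both cases (file still gzip-compressed, or already unpacked by the browser) and then compares the size. The function name `install_patch` and its arguments are made up for illustration; the expected size of 110083 bytes is the one from the listing above. Ownership is not changed here, on the assumption that the script is run as the stor2rrd user.

```shell
# install_patch DOWNLOADED_FILE BIN_DIR EXPECTED_SIZE
# Hypothetical helper: unpack (if needed), install as naperf.pl, verify size.
install_patch() {
  src=$1
  bin_dir=$2
  expected_size=$3
  target="$bin_dir/naperf.pl"

  # gzip -t succeeds only if the input really is gzip data.
  if gzip -t < "$src" 2>/dev/null; then
    gzip -dc < "$src" > "$target" || return 1
  else
    # Browser already unpacked it: copy as is.
    cp "$src" "$target" || return 1
  fi

  chmod 755 "$target" || return 1

  # wc -c counts bytes; tr strips the padding some wc versions add.
  actual_size=$(wc -c < "$target" | tr -d ' ')
  if [ "$actual_size" -ne "$expected_size" ]; then
    echo "size mismatch: got $actual_size, expected $expected_size" >&2
    return 1
  fi
  echo "installed $target ($actual_size bytes)"
}
```

For example: `install_patch naperf.pl.gz /home/stor2rrd/stor2rrd/bin 110083`. It may be wise to keep a copy of the previous bin/naperf.pl first, since the helper overwrites it.
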
  • Hi Jirka,

    at first sight it looks good.
    But I have a problem with the FAS2750 oggnap11.

    Below is an example for the period from about 11am to 2pm. The graphs for the data rate of the nodes and of the ports should be similar, but they are not. In particular, there are contributions from ports OGGNAP11A-e0c/e which can’t be seen in the node representation.

    I’ve collected new dumps (only for 10 minutes) and uploaded them.

    I also wonder why I still can’t see any relation between the port data and the nodes in the dumps, as there is for other objects such as lif, processor, …




