Not getting performance metrics from NetApp devices

Hello!

We are using the latest docker version of STOR2RRD running on CentOS hosts.
We use docker volume mapping for the data and config folders to have persistent data storage.
The issue is the same on 2 different instances, on various sites, with several NetApp FAS and AFF devices.

Our NetApp devices are configured using NetApp FAS CDOT and all the tests are successful when run.
We still get the capacity metrics but no longer the I/O metrics: we briefly got some perf metrics right after we deployed the container, but then they stopped.

I tried updating the naperf.pl file (ONTAP v9.5 bug) but it hasn't fixed the issue so far...

As a side note, our E-series are working well, getting both capacity and performance metrics.

Any suggestion(s) on what could be wrong?

Thanks for any help!
JC

Comments

  • Hello,

    Please send us the logs.
    Add a short problem description in the text field of the upload form.

    cd /home/stor2rrd/stor2rrd # or wherever your STOR2RRD working dir is

    tar cvhf logs.tar logs tmp/*txt

    gzip -9 logs.tar

    Send us logs.tar.gz via https://upload.stor2rrd.com

  • Hi,

    I just uploaded the log files.

    Thank you for your help!
    JC
  • Jirka
    edited July 2021
    Hello,

    I can see that no data is collected via SSH. Did the connection test succeed? You can also check the connection from inside the container:

    docker exec -ti stor2rrd_cont_name /bin/bash
    su - stor2rrd
    cd stor2rrd
    bin/config_check.sh
    You should see:
    SSH connection OK 

    Let us know please
  • picodon
    edited July 2021
    Hi,

    Both the Web GUI and the CLI (config_check.sh) show all tests are ok (SSH and API).
    I am still getting capacity metrics but not the performance (IO's etc.).

    As a note, the data, etc, and log folders are mapped to the host using RW (docker volumes).

    Thanks,
    JC
  • Hello,

    there is something wrong, I can see this in logs regarding perf data collection:

    (0 lines collected [readings: 12])

    It means no perf data was collected after 12 attempts.

    Did you follow our NetApp how-to at https://stor2rrd.com/NetApp-monitoring.php ? Please check the roles and running services again.

    Let us know please
  • Hi,

    I have confirmed the role is properly configured on the NetApp side for the stor2rrd user used in the config.

    Which services do you want me to check? From inside the stor2rrd docker container?

    Thanks
  • picodon
    edited July 2021
    Hi,

    Is it mandatory to install the stor2rrd user's SSH key on the target NetApp device, or is an SSH password enough?
    In the docker version, SSH keys are not present by default and need to be generated.

    Here's an error I get constantly (every 5m poll) for a device showing all connection tests OK in the config (WebUI and CLI):

    ERROR Thu Jul 22 17:06:12 2021 naperf.pl: Error: show failed: sample: "d999_1626969606" does not exist in "nscluster01" context.

    Use of uninitialized value $id in concatenation (.) or string at /home/stor2rrd/stor2rrd/bin/LoadDataModule.pm line 47909, <FH> line 591.

    Use of uninitialized value $id in concatenation (.) or string at /home/stor2rrd/stor2rrd/bin/LoadDataModule.pm line 47909, <FH> line 591.

    ERROR Thu Jul 22 17:11:12 2021 naperf.pl: Error: show failed: sample: "d999_1626969906" does not exist in "nscluster01" context.

    Use of uninitialized value $id in concatenation (.) or string at /home/stor2rrd/stor2rrd/bin/LoadDataModule.pm line 47909, <FH> line 591.

    Use of uninitialized value $id in concatenation (.) or string at /home/stor2rrd/stor2rrd/bin/LoadDataModule.pm line 47909, <FH> line 591.

  • Hello,

    There are too many sample collections stored on your NetApp; old collections are not being deleted for some reason. You can see this in the error logs:

    Error: command failed: More than 50 samples are not allowed. Please delete existing sample(s) to create a new sample.
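
To see how often the two sample-related errors quoted in this thread occur, the logs can be counted with a small shell helper. This is only a sketch: the function name `count_sample_errors` is made up for illustration, and the log file names are whatever your installation writes under its `logs` directory (not confirmed here).

```shell
#!/bin/sh
# Count the two sample-related naperf.pl errors quoted in this thread.
# Hypothetical helper: pass one or more log files as arguments.
count_sample_errors() {
    if [ "$#" -eq 0 ]; then
        echo "usage: count_sample_errors LOGFILE..." >&2
        return 1
    fi
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    limit=$(cat "$@" 2>/dev/null | grep -c 'More than 50 samples are not allowed' || true)
    missing=$(cat "$@" 2>/dev/null | grep -c 'show failed: sample:' || true)
    echo "sample_limit_errors=$limit missing_sample_errors=$missing"
}
```

For example, run `count_sample_errors logs/*` from the STOR2RRD working directory. A non-zero `sample_limit_errors` count suggests the storage has hit its sample limit.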

    Maybe the stor2rrd user is not allowed to use the delete command on statistics. Can you check it, please:
    ssh stor2rrd@nscluster01
    set -privilege advanced -confirmations off
    statistics samples show
    statistics samples delete -sample-id <insert_one_of_listed>
    Did it work? Can you delete samples as stor2rrd user?
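
If many samples have piled up, typing one delete command per sample gets tedious. The sketch below turns a list of sample IDs (one per line, copied by hand from the `statistics samples show` output) into the commands shown above. The function name `emit_sample_deletes` and the file `ids.txt` are assumptions for illustration. The script only prints commands and does not contact the storage.

```shell
#!/bin/sh
# Print ONTAP CLI commands that delete the sample IDs read from stdin.
# Hypothetical helper: nothing is executed against the storage.
emit_sample_deletes() {
    echo 'set -privilege advanced -confirmations off'
    # "|| [ -n "$id" ]" keeps a last line that has no trailing newline.
    while IFS= read -r id || [ -n "$id" ]; do
        # Skip blank lines; IDs such as d999_1626969606 pass through unchanged.
        [ -n "$id" ] || continue
        printf 'statistics samples delete -sample-id %s\n' "$id"
    done
}
```

For example, `emit_sample_deletes < ids.txt` prints the commands so they can be reviewed and then pasted into the SSH session as the stor2rrd user.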

  • Hi,

    The number of samples was indeed the issue: we deleted a couple of them and we're now getting perf metrics again!

    Thank you very much for your help!!
    JC