ExportTool2 & controller failure

Greetings,

This isn't a request for help, just an idea / observation.

We monitor our F700 storage systems via ExportTool2. Today controller no. 1 failed on one of them.
Stor2rrd got stuck trying to get data from it. Not a problem, I just killed the command. However, when I tried to test the API connection, it returned "timeout", as if it were testing only against one controller (the failed one).
Manual test of API functionality "sh ./runUnix.sh show interval -ip <ip controller> -login <user> <password>" on the healthy controller finished successfully. 
Unfortunately, data collection didn't work either until I changed the IP of controller #1 to the IP of controller #2. Now it is collecting data, even though the API test still times out.
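
To illustrate the idea, here is a minimal sketch. It is only my assumption of how a fallback could look, not how stor2rrd or ExportTool2 actually work; probe and pick_controller are made-up names, and port 443 is an assumption. It tries each controller IP in turn with a short timeout and uses the first one that answers.

```shell
#!/bin/bash
# Hypothetical helpers, not part of stor2rrd.

# probe HOST PORT: succeed if a TCP connection opens within 5 seconds.
# Relies on bash's /dev/tcp redirection and the coreutils timeout command.
probe() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# pick_controller PORT HOST...: print the first host that answers on PORT.
# Returns non-zero if none of the hosts answers.
pick_controller() {
    local port=$1 host
    shift
    for host in "$@"; do
        if probe "$host" "$port"; then
            echo "$host"
            return 0
        fi
    done
    return 1
}

# Example call (IPs are placeholders):
#   ip=$(pick_controller 443 "<ip controller 1>" "<ip controller 2>") || echo "no controller reachable"
```

With something like this, a failed controller #1 would cost one timeout per run instead of blocking the whole collection.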

If you would be so kind as to consider this for a future release, it would be great. If you'd like to check any logs, let me know and I'll send them to you.

Keep up the good work on this amazing tool :)
Thank you
OndraS

Comments

  • Hello,

    What version of stor2rrd do you have?


    Thank you
  • Hello,

    STOR2RRD version: 2.81
    STOR2RRD edition: free
    OS info: Linux
    Perl version: This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
    Web server info: Apache/2.4.6 ()
    RRDTOOL version: 1.4.8
    RRDp version: 1.4008
  • Please send the outputs of these commands:

    cd /home/stor2rrd/stor2rrd # or where is your STOR2RRD working dir

    perl bin/conntest.pl <ip controller 1> 443

    perl bin/conntest.pl <ip controller 2> 443


    Thank you


  • here you go, thank you
  • Hello,

    try this fix (connection test)

    https://download.stor2rrd.com/patch/2.81-21-63-g228e/hds_check.sh.gz

    Gunzip it and copy to /home/stor2rrd/stor2rrd/bin (755, stor2rrd owner)

    -rwxr-xr-x 1 stor2rrd stor2rrd 10728 Sep 15 13:52 hds_check.sh

    If your web browser gunzips it automatically then just rename it: mv hds_check.sh.gz hds_check.sh

    Make sure the file size is the same as in the example above.

    If you are on Linux, then change the shell interpreter on the first line of that script to #!/bin/bash
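
    The unpack, copy and size-check steps above can be sketched as follows. This is only an illustration: install_patch is a made-up helper, it should be run as the stor2rrd user so that the file ownership is right, and the size is checked before the interpreter line is edited because that edit may change the size.

```shell
#!/bin/bash
# Hypothetical helper, not shipped with STOR2RRD.

# install_patch ARCHIVE BIN_DIR EXPECTED_SIZE
# Unpacks ARCHIVE (*.gz) into BIN_DIR, sets mode 755 and compares the
# resulting file size in bytes with EXPECTED_SIZE.
install_patch() {
    local archive=$1 bin_dir=$2 expected=$3
    local target actual
    target="$bin_dir/$(basename "$archive" .gz)"
    gunzip -c "$archive" > "$target" || return 1
    chmod 755 "$target"
    actual=$(wc -c < "$target")
    if [ "$actual" -ne "$expected" ]; then
        echo "size mismatch: $target is $actual bytes, expected $expected" >&2
        return 1
    fi
    echo "$target"
}

# Example call, using the size from the listing above:
#   install_patch hds_check.sh.gz /home/stor2rrd/stor2rrd/bin 10728
```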


    After copying the script to the bin directory:

    1. Change the controller IPs back (as they were originally).

    2.
    cd /home/stor2rrd/stor2rrd # or where is your STOR2RRD working dir
    ./bin/config_check.sh <storage_name>

    If the connection is OK, then try the agent script (data collection):



    https://download.stor2rrd.com/patch/2.81-21-63-g228e/vspgperf.pl.gz

    Gunzip it and copy to /home/stor2rrd/stor2rrd/bin (755, stor2rrd owner)

    -rwxr-xr-x 1 stor2rrd stor2rrd 264048 Sep 15 14:02 vspgperf.pl

    If your web browser gunzips it automatically then just rename it: mv vspgperf.pl.gz vspgperf.pl

    Make sure the file size is the same as in the example above.


    Let us know.

    Thank you


  • Hello,

    I updated the hds_check script and set up the controller IPs accordingly; output below


    When I changed it back and ran config_check again, it looked like the API test is started before the connection check. It also still keeps the "old" CTL1 IP somewhere.


    Thank you
  • Hello,

    I didn't know you were using the REST API. Use these latest scripts:

    https://download.stor2rrd.com/patch/2.81-21-63-g228e/vspg_apitest.pl.gz

    Gunzip it and copy to /home/stor2rrd/stor2rrd/bin (755, stor2rrd owner)

    -rwxr-xr-x 1 stor2rrd stor2rrd 6421 Sep 15 15:45 vspg_apitest.pl

    If your web browser gunzips it automatically then just rename it: mv vspg_apitest.pl.gz vspg_apitest.pl

    Make sure the file size is the same as in the example above.

    https://download.stor2rrd.com/patch/2.81-21-63-g228e/config_check.sh.gz

    Gunzip it and copy to /home/stor2rrd/stor2rrd/bin (755, stor2rrd owner)

    -rwxr-xr-x 1 stor2rrd stor2rrd 162369 Sep 15 15:45 config_check.sh

    If your web browser gunzips it automatically then just rename it: mv config_check.sh.gz config_check.sh

    Make sure the file size is the same as in the example above.

    If you are on Linux, then change shell interpreter on the first line of that script to #!/bin/bash

    After copying the scripts to the bin directory, run the connection test of the storage.

    Let us know.


    Thank you

  • Hello,

    I'm very sorry, I explained the REST API part very poorly. Next time I'll be more careful. I updated vspg_apitest.pl and config_check.sh and reverted the changes to hds_check.sh and vspgperf.pl, which turned out to be a bad idea :)

    After updating all 4 scripts, config_check.sh finished successfully after about 8 minutes. Unfortunately, it seems it can't keep up with the 5-minute crontab schedule: I now constantly see 4 - 5 vspg_stor_load.sh & vspgperf.pl processes at a time, each started 5 minutes after the previous one.
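
    A minimal sketch of how such a count can be made, assuming the storage name appears in the process arguments of both scripts (count_runs is a made-up helper, not part of stor2rrd):

```shell
#!/bin/bash
# Hypothetical helper, not part of stor2rrd.

# count_runs NAME: read "ps -eo pid,etime,args" output on stdin and print
# how many vspg_stor_load.sh / vspgperf.pl processes mention NAME.
count_runs() {
    grep -E 'vspg_stor_load\.sh|vspgperf\.pl' | grep -c -F -- "$1"
}

# Example call (the etime column shows how long each run has been active):
#   ps -eo pid,etime,args | count_runs "<storage_name>"
```

    A count that keeps growing between cron runs means each collection takes longer than the 5-minute interval.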

  • Hello,

    1. Scripts vspg_stor_load.sh & vspgperf.pl run for each storage (VSP).

    2. Do you see data for PH-GLC-VSP-I?

    Thank you
  • Hello,

    1. I know that those scripts run for each VSP. I was grepping (?!) only for the affected hostname.

    2. No data were displayed in stor2rrd, even though I let it run for 2 hours to "catch up". I also tried killing all the "old" processes to give it a fresh start, which didn't help either.

    Thank you
  • Send us logs.
    Note a short problem description in the text field of the upload form.

    cd /home/stor2rrd/stor2rrd # or where is your STOR2RRD working dir
    tar cvhf logs.tar logs tmp/*txt
    gzip -9 logs.tar

    Send us logs.tar.gz via https://upload.stor2rrd.com

    Thank you

  • Logs sent; I left the description the same as the name of this thread.

    Note: yesterday evening I changed the IP of ctl1 back to "ctl2", so we have some graphs.
    In case the logs have already rotated out, I can change it back, leave it running for several hours and send a new set of logs.

    Thank you