ExportTool2 & controller failure
Greetings,
This isn't a request for help, just an idea / observation.
We monitor our F700 storages via ExportTool2. Today controller no. 1 failed on one of them.
STOR2RRD remained stuck trying to get data from it. Not a problem, I just killed the command. However, when I tried to test the API connection, it returned "timeout", as if it was testing against only one controller (the failed one).
A manual test of the API functionality, "sh ./runUnix.sh show interval -ip <ip controller> -login <user> <password>", against the healthy controller finished successfully.
Unfortunately, data collection didn't work either until I changed the IP of controller #1 to the IP of controller #2. Now it is collecting data, even though the API test still times out.
If you would be so kind as to consider this for a future release, that would be great. If you'd like to check any logs, let me know and I'll send them to you.
Keep up the good work with this amazing tool
Thank you
OndraS
Comments
-
Hello,
What version of stor2rrd do you have?
Thank you
-
Hello,
STOR2RRD version 2.81
STOR2RRD edition free
OS info Linux
Perl version This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
Web server info Apache/2.4.6 ()
RRDTOOL version 1.4.8
RRDp version 1.4008
-
Please send the outputs of these commands:
cd /home/stor2rrd/stor2rrd # or where is your STOR2RRD working dir
perl bin/conntest.pl <ip controller 1> 443
perl bin/conntest.pl <ip controller 2> 443
Thank you
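
For anyone who wants to probe both controllers in one go, here is a minimal sketch of the same kind of reachability check, and of the "fall back to the healthy controller" idea from the original post. It is not how conntest.pl or STOR2RRD work internally; the function names probe and first_reachable are made up for illustration, and it assumes bash and the coreutils timeout command are available.

```shell
#!/bin/bash
# Hypothetical helpers, not part of STOR2RRD.

# probe HOST PORT
# Succeeds if a TCP connection to HOST:PORT opens within 5 seconds.
probe() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# first_reachable PORT HOST...
# Prints the first HOST that answers on PORT; fails if none does.
first_reachable() {
    local port=$1 host
    shift
    for host in "$@"; do
        if probe "$host" "$port"; then
            echo "$host"
            return 0
        fi
    done
    return 1
}
```

Called as first_reachable 443 <ip controller 1> <ip controller 2>, it should print the IP of the first controller that accepts a connection, so a dead controller no. 1 costs at most the 5-second timeout instead of blocking the whole test.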
-
Here you go, thank you.
-
Hello,
try this fix (connection test)
https://download.stor2rrd.com/patch/2.81-21-63-g228e/hds_check.sh.gz
Gunzip it and copy to /home/stor2rrd/stor2rrd/bin (755, stor2rrd owner)
-rwxr-xr-x 1 stor2rrd stor2rrd 10728 Sep 15 13:52 hds_check.sh
If your web browser gunzips it automatically then just rename it: mv hds_check.sh.gz hds_check.sh
Make sure the file size is the same as in the example above.
If you are on Linux, change the shell interpreter on the first line of that script to #!/bin/bash
After copying the script to the bin directory:
1. Change the controllers back (as they were originally)
2. cd /home/stor2rrd/stor2rrd # or where is your STOR2RRD working dir
./bin/config_check.sh <storage_name>
If the connection is OK, then try the agent script (data collection):
https://download.stor2rrd.com/patch/2.81-21-63-g228e/vspgperf.pl.gz
Gunzip it and copy to /home/stor2rrd/stor2rrd/bin (755, stor2rrd owner)
-rwxr-xr-x 1 stor2rrd stor2rrd 264048 Sep 15 14:02 vspgperf.pl
If your web browser gunzips it automatically then just rename it: mv vspgperf.pl.gz vspgperf.pl
Make sure the file size is the same as in the example above.
Let us know.
Thank you
-
Hello,
Updated the hds_check script and set up the controller IPs accordingly; output below.
When I tried to change it back and run config_check again, it looks like the API test starts before the connection check, and it still keeps the "old" CTL1 IP somewhere.
Thank you
-
Hello,
I didn't know you were using the REST API. Use these latest scripts:
https://download.stor2rrd.com/patch/2.81-21-63-g228e/vspg_apitest.pl.gz
Gunzip it and copy to /home/stor2rrd/stor2rrd/bin (755, stor2rrd owner)
-rwxr-xr-x 1 stor2rrd stor2rrd 6421 Sep 15 15:45 vspg_apitest.pl
If your web browser gunzips it automatically then just rename it: mv vspg_apitest.pl.gz vspg_apitest.pl
Make sure the file size is the same as in the example above.
https://download.stor2rrd.com/patch/2.81-21-63-g228e/config_check.sh.gz
Gunzip it and copy to /home/stor2rrd/stor2rrd/bin (755, stor2rrd owner)
-rwxr-xr-x 1 stor2rrd stor2rrd 162369 Sep 15 15:45 config_check.sh
If your web browser gunzips it automatically then just rename it: mv config_check.sh.gz config_check.sh
Make sure the file size is the same as in the example above.
If you are on Linux, change the shell interpreter on the first line of that script to #!/bin/bash
After copying the scripts to the bin directory, run the connection test of the storage.
Let us know.
Thank you
-
Hello,
I'm very sorry, I did a poor job of mentioning that we use the REST API. Next time I'll be more careful. I updated vspg_apitest.pl and config_check.sh and reverted the changes to hds_check.sh and vspgperf.pl, which turned out to be a bad idea.
After updating all 4 scripts, config_check.sh finished successfully after about 8 minutes. Unfortunately, it seems it can't keep up with the 5-minute crontab schedule, as I now constantly see 4-5 vspg_stor_load.sh & vspgperf.pl processes at a time, each started 5 minutes after the previous one.
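
As a stopgap against the runs piling up, one common approach is to wrap the scheduled command in a non-blocking lock so a new run is skipped while the previous one is still going. This is a general sketch, not something STOR2RRD is known to do: run_exclusive is a hypothetical wrapper name, the exit code 75 is an arbitrary choice, and it assumes the flock utility from util-linux is installed. It only hides the symptom; it does not make the collection itself any faster.

```shell
#!/bin/bash
# Hypothetical wrapper, not part of STOR2RRD.

# run_exclusive LOCK_FILE COMMAND [ARGS...]
# Runs COMMAND only if no earlier run still holds LOCK_FILE.
# Returns 75 when the run was skipped, otherwise COMMAND's exit status.
run_exclusive() {
    local lock_file=$1
    shift
    (
        # -n: do not wait for the lock; give up at once if it is taken.
        flock -n 9 || exit 75
        "$@"
    ) 9>"$lock_file"
}
```

With a per-storage lock file, a second start for the same VSP would return immediately while other storages keep their normal schedule.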
-
Hello,
1. Scripts vspg_stor_load.sh & vspgperf.pl run for each storage (VSP).
2. Do you see data for PH-GLC-VSP-I?
Thank you
-
Hello,
1. I know that those scripts run for each VSP. I was grepping (?!) only for the affected hostname.
2. No data was displayed in STOR2RRD, even though I let it run for 2 hours to "catch up". I also tried killing all the "old" processes to give it a fresh start, which didn't help either.
Thank you
-
Send us logs.
Note a short problem description in the text field of the upload form.
cd /home/stor2rrd/stor2rrd # or where is your STOR2RRD working dir
tar cvhf logs.tar logs tmp/*txt
gzip -9 logs.tar
Send us logs.tar.gz via https://upload.stor2rrd.com
Thank you
-
Logs sent; I left the description the same as the name of this thread.
Note: yesterday evening I changed the IP of ctl1 back to "ctl2", so we have some graphs.
In case the logs have already rotated out, I can change it back, leave it running for several hours, and send a new set of logs.
Thank you