Cisco edge switch timeouts
The issue we are running into is shown in the error log snippet below. It looks like load.sh times out, which is preventing us from getting continuous data from our Cisco edge switches. Data pulled from the core switches seems fine, and both are on the same IP subnet. Is there a setting we can tweak to address these gaps?
Errors:
Adding to menu : W : CISCO9513E: Health status
Adding SAN port:
end time : Wed Nov 21 11:21:46 CST 2018
An error occured, check /opt/stor2rrd/logs/error.log and output of load.sh
$ tail -701 /opt/stor2rrd/logs/error.log
………….
Wed Nov 21 11:05:01 2018: Could not find ports for Fabric VSAN00014xxxxxxx /opt/stor2rrd/bin/san.pl:1883
Wed Nov 21 11:05:01 2018: Could not find ports for Fabric VSAN00015xxxxxxx /opt/stor2rrd/bin/san.pl:1883
^* matches null string many times in regex; marked by <-- HERE in m/^* <-- HERE ENCAPSPOOL*$/ at /opt/stor2rrd/bin/storage.pl line 16823.
^* matches null string many times in regex; marked by <-- HERE in m/^* <-- HERE ENCAPSPOOL*$/ at /opt/stor2rrd/bin/storage.pl line 16823.
………..
Wed Nov 21 11:18:38 2018: /opt/stor2rrd/bin/custom_ext.pl timed out after : 600 seconds
Wed Nov 21 11:21:46 CST 2018
Comments
-
Hi Pete,
Please send us the logs. Add a short problem description in the text field of the upload form.
cd /home/stor2rrd/stor2rrd # or where is your STOR2RRD working dir
tar cvhf logs.tar logs etc tmp/*txt
gzip -9 logs.tar
Send us logs.tar.gz via https://upload.stor2rrd.com
-
Hi Pete,
Data collection takes more than 10 minutes:
Mon Nov 19 09:50:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 10:00:38 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 10:30:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 11:30:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 12:00:06 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 12:10:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 12:20:04 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : No such file or directory
Mon Nov 19 12:30:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 13:10:02 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : No such file or directory
Mon Nov 19 13:20:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 13:40:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 14:20:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 15:40:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Can you increase this timeout (max is 14 minutes)?
Use the following procedure:
su - stor2rrd
cd /home/stor2rrd/stor2rrd # or where is your STOR2RRD working dir
echo "export SAN_DATA_COLLECTION_TIMEOUT=840" >> etc/.magic
Also change the crontab entry.
From:
# SAN agent
#4,14,24,34,44,54 * * * * /opt/stor2rrd/load_sanperf.sh >/opt/stor2rrd/logs/load_sanperf.out 2>&1
*/10 * * * * /opt/stor2rrd/load_sanperf.sh >/opt/stor2rrd/logs/load_sanperf.out 2>&1
To:
# SAN agent
#4,14,24,34,44,54 * * * * /opt/stor2rrd/load_sanperf.sh >/opt/stor2rrd/logs/load_sanperf.out 2>&1
0,12,24,36,48 * * * * /opt/stor2rrd/load_sanperf.sh >/opt/stor2rrd/logs/load_sanperf.out 2>&1
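As a quick sanity check, the new timeout can be compared with the spacing of the cron runs. The sketch below is only an illustration: `min_gap_seconds` is a hypothetical helper, not part of STOR2RRD, and it handles only an explicit comma-separated minute list such as the one above (not `*/10` style entries). The thread does not say how load_sanperf.sh behaves if a previous run is still active, so treat the result as a prompt to check rather than a diagnosis.

```shell
#!/bin/sh
# min_gap_seconds (hypothetical helper): given a comma-separated list of cron
# minute values, print the shortest gap between consecutive runs in seconds,
# including the wrap-around into the next hour.
min_gap_seconds() {
    echo "$1" | tr ',' '\n' | sort -n | awk '
        NR == 1 { first = $1; prev = $1; next }
        { gap = $1 - prev; if (min == "" || gap < min) min = gap; prev = $1 }
        END {
            gap = first + 60 - prev   # wrap to the first run of the next hour
            if (min == "" || gap < min) min = gap
            print min * 60
        }'
}

timeout=840                               # SAN_DATA_COLLECTION_TIMEOUT from etc/.magic
gap=$(min_gap_seconds "0,12,24,36,48")    # minute field of the new crontab entry

if [ "$timeout" -gt "$gap" ]; then
    echo "timeout ${timeout}s is longer than the shortest run interval ${gap}s"
else
    echo "timeout ${timeout}s fits within the shortest run interval ${gap}s"
fi
# prints: timeout 840s is longer than the shortest run interval 720s
```

With the values from this thread, a run that uses the full 840 seconds would still be active when the next one is due 720 seconds after it started.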
Let us know if that helped.
-
I made the changes and will give it a few hours to run through its paces. Thank you for the quick assistance.
-
No changes to captured data. We still have major gaps in pulled data.
At first glance at the logs, I do not see timeouts as in the past, so I am not sure why these gaps only show up on the edge switches.
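One way to confirm that the timeouts really stopped is to count the "timed out after" lines per day in error.log. This is a minimal sketch that assumes the timestamp layout seen in the excerpts in this thread ("Mon Nov 19 09:50:03 2018: ..."); `count_timeouts_per_day` is a hypothetical helper name, not a STOR2RRD tool.

```shell
#!/bin/sh
# count_timeouts_per_day (hypothetical helper): for every day that has at
# least one "timed out after" line in the given log, print
# "<Mon> <DD> <YYYY> <count>".
# Assumes lines start like: "Mon Nov 19 09:50:03 2018: ..."
count_timeouts_per_day() {
    grep 'timed out after' "$1" | awk '
        {
            sub(/:$/, "", $5)             # strip the colon after the year
            n[$2 " " $3 " " $5]++         # key: month, day, year
        }
        END { for (d in n) print d, n[d] }' | sort
}

# Example call, using the log path shown earlier in this thread:
# count_timeouts_per_day /opt/stor2rrd/logs/error.log
```

If the days after the timeout change show no entries while the graphs still have gaps, the gaps are probably not caused by the collection timeout itself.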
-
Hi,
I am afraid we cannot do much here; the switches are not able to respond to SNMP queries within the given time period. We are planning to do some optimization in the code at the start of the new year, which might help a bit. Unfortunately, that is all we can do for now.