Cisco edge switch timeouts

Pete · November 2018

The issue we are running into is shown in the error log snippet below. It looks like load.sh times out, which prevents us from getting continuous data from our Cisco edge switches. Data pulled from the core switches seems to be fine, and both are on the same IP subnet. Is there a setting we can tweak to address these gaps?

Errors:

Adding to menu : W : CISCO9513E: Health status
Adding SAN port:
end time       : Wed Nov 21 11:21:46 CST 2018
An error occured, check /opt/stor2rrd/logs/error.log and output of load.sh
 
$ tail -701 /opt/stor2rrd/logs/error.log
 
………….
Wed Nov 21 11:05:01 2018: Could not find ports for Fabric VSAN00014xxxxxxx /opt/stor2rrd/bin/san.pl:1883
Wed Nov 21 11:05:01 2018: Could not find ports for Fabric VSAN00015xxxxxxx /opt/stor2rrd/bin/san.pl:1883
^* matches null string many times in regex; marked by <-- HERE in m/^* <-- HERE ENCAPSPOOL*$/ at /opt/stor2rrd/bin/storage.pl line 16823.
^* matches null string many times in regex; marked by <-- HERE in m/^* <-- HERE ENCAPSPOOL*$/ at /opt/stor2rrd/bin/storage.pl line 16823.
………..
Wed Nov 21 11:18:38 2018: /opt/stor2rrd/bin/custom_ext.pl timed out after : 600 seconds
Wed Nov 21 11:21:46 CST 2018

Karel · November 2018

Hi Pete,

Please send us the logs.

Add a short problem description in the text field of the upload form.

cd /home/stor2rrd/stor2rrd # or wherever your STOR2RRD working dir is
tar cvhf logs.tar logs etc tmp/*txt
gzip -9 logs.tar

Send us logs.tar.gz via https://upload.stor2rrd.com

Karel · November 2018

Hi Pete,

Data collection takes more than 10 minutes, as the error log shows:

Mon Nov 19 09:50:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 10:00:38 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 10:30:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 11:30:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 12:00:06 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 12:10:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 12:20:04 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : No such file or directory
Mon Nov 19 12:30:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 13:10:02 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : No such file or directory
Mon Nov 19 13:20:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 13:40:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 14:20:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call
Mon Nov 19 15:40:03 2018: Data collection via snmp timed out after 600 seconds : /opt/stor2rrd/bin/sanperf.pl:91 : Interrupted system call

Can you increase this timeout (max is 14 minutes)?
Use the following procedure:

su - stor2rrd
cd /home/stor2rrd/stor2rrd # or wherever your STOR2RRD working dir is
echo "export SAN_DATA_COLLECTION_TIMEOUT=840" >>  etc/.magic

Also change the crontab entry.
From:

# SAN agent
#4,14,24,34,44,54 * * * * /opt/stor2rrd/load_sanperf.sh >/opt/stor2rrd/logs/load_sanperf.out 2>&1
*/10 * * * * /opt/stor2rrd/load_sanperf.sh >/opt/stor2rrd/logs/load_sanperf.out 2>&1

To:

# SAN agent
#4,14,24,34,44,54 * * * * /opt/stor2rrd/load_sanperf.sh >/opt/stor2rrd/logs/load_sanperf.out 2>&1
0,12,24,36,48 * * * * /opt/stor2rrd/load_sanperf.sh >/opt/stor2rrd/logs/load_sanperf.out 2>&1
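
A minimal sketch of the two steps above, for anyone repeating them. It appends the export line only if the variable is not already set in the file, and compares a timeout against a cron interval. The function names `set_magic_var` and `timeout_fits_interval` are hypothetical helpers, not part of STOR2RRD. Note that 840 seconds is 14 minutes, while the new schedule starts a run every 12 minutes; whether load_sanperf.sh guards against overlapping runs is not stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: append "export NAME=VALUE" to FILE
# only if FILE does not already export NAME.
set_magic_var() {
    file=$1; name=$2; value=$3
    if grep -q "^export ${name}=" "$file" 2>/dev/null; then
        echo "already set in $file, leaving it unchanged"
    else
        echo "export ${name}=${value}" >> "$file"
        echo "added: export ${name}=${value}"
    fi
}

# Hypothetical helper: succeeds when a timeout given in seconds
# is no longer than a cron interval given in minutes.
timeout_fits_interval() {
    timeout_s=$1; interval_min=$2
    [ "$timeout_s" -le $((interval_min * 60)) ]
}
```

Run from the STOR2RRD working dir, this could be used as `set_magic_var etc/.magic SAN_DATA_COLLECTION_TIMEOUT 840`, followed by `timeout_fits_interval 840 12 || echo "timeout is longer than the cron interval"`.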

Let us know if that helped.

Pete · November 2018

I made the changes and will give it a few hours to run through its paces. Thank you for the quick assistance.

Pete · November 2018

No change in the captured data; we still have major gaps in the pulled data.
At first glance, the logs no longer show timeouts as they did before, so I am not sure why these gaps only show up on the edge switches.
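
A first glance can be double-checked by counting timeout messages per day. The sketch below assumes the log lines keep the format seen earlier in this thread (weekday, month, and day as the first three fields, and the phrase "timed out after" in the message); the function name `count_timeouts_by_day` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<Mon> <DD> <count>" for each day that has
# at least one line containing "timed out after" in the given log file.
count_timeouts_by_day() {
    awk '/timed out after/ { n[$2 " " $3]++ }
         END { for (d in n) print d, n[d] }' "$1" | sort
}
```

For example, `count_timeouts_by_day /opt/stor2rrd/logs/error.log` lists one line per day; a day that is missing from the output had no matching timeout messages.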

Image: https://forum.xorux.com/uploads/editor/ic/ofevma86mpju.jpg

Pavel · December 2018

Hi,

I am afraid we cannot do much here; the switches are not able to respond to SNMP queries within the given time period.

We are planning to do some optimization in the code at the start of the new year; it might help a bit.

Unfortunately, that is all we can do for now.
