Overall Health Status shows all FC Switches NotOK but not all are individually showing as NotOK

Cerberus128 · April 2021

We have 4 Cisco MDS 9148S FC switches, 2 at each site, with dual-redundant connectivity to IBM SWIZ FS5100(s). I added them to SAN in STOR2RRD (Hyper-V image), and they seem to be collecting the correct info. The 'Overall' Health Status shows all 4 as NotOK:

Image: https://forum.xorux.com/uploads/editor/iy/p3waa4wsdn2v.jpg

But when I go to each individual switch (SAN SWITCH>{SWITCH NAME}>Health status>Switch Status tab), two show as OK and two show as 'warning'.

Image: https://forum.xorux.com/uploads/editor/4o/j138u1hr45oc.jpg

Image: https://forum.xorux.com/uploads/editor/t0/y1oggl8yw2ts.jpg

Image: https://forum.xorux.com/uploads/editor/ux/qj5dff46xr8k.jpg

The two in 'warning' status are at one site, and the two that show 'status: ok' are at the other. One site is live and sends production data to the SAN over Fibre Channel. The other is the failover/role-swap site, so it sees no FC read/write traffic (it is HW replication, so only IP). My guess is that this is why the idle site is 'ok'.

P.S. What are (some of) the conditions you check to change the status from OK to warning, and which statuses can I expect to see? I cannot find any hint as to what criteria you use.

Karel · April 2021

Hi,

The main health status of a Cisco SAN switch is based on the following 2 conditions:

1. the switch status

connUnitStatus - 1.3.6.1.3.94.1.6.1.6
possible values: 1-unknown, 2-unused, 3-ok, 4-warning, 5-failed
Red status = warning or failed

2. port statuses

a) ifOperStatus - 1.3.6.1.2.1.2.2.1.8

possible values: 1-up, 2-down, 3-testing, 4-unknown, 5-dormant, 6-notPresent, 7-lowerLayerDown
Red status = dormant or lowerLayerDown

b) fcIfOperStatusCause - 1.3.6.1.4.1.9.9.289.1.1.2.1.7

You can find the possible values, and which of them cause a red switch status, in your STOR2RRD installation:

cd /home/stor2rrd/stor2rrd # or wherever your STOR2RRD working dir is
head etc/cisco-status.txt

# Cisco status error file
# It is based on http://tools.cisco.com/Support/SNMP/do/BrowseOID.do?local=en&translate=Translate&typeName=FcIfOperStatusReason
#
# You can modify it on your own, when you save it under cisco-status_custom.txt
# then it will be prefered and will not be overwriten by upgrade
#
grey : 1 : other
green : 2 : none
red : 3 : hwFailure
red : 4 : loopbackDiagFailure
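The rules above can be sketched as a few bash helpers. This is a minimal illustration of the logic as described in this answer, not STOR2RRD's actual code. All function names are hypothetical. How values the answer does not classify (e.g. connUnitStatus unknown/unused) are treated is an assumption, and the roll-up assumes that a single red condition is enough to make the overall status NotOK. The file lookup assumes the "color : code : name" format shown in cisco-status.txt and that cisco-status_custom.txt sits next to it in etc/.

```shell
#!/usr/bin/env bash
# Sketch of the health rules described above (NOT STOR2RRD's actual code).
# All function names are hypothetical.

# 1. Switch status: connUnitStatus (1.3.6.1.3.94.1.6.1.6)
#    3=ok; 4=warning and 5=failed are red. 1=unknown and 2=unused are not
#    classified in the answer, so they are reported as "other" (assumption).
connunit_status_color() {
  case "$1" in
    3)   echo "ok" ;;
    4|5) echo "red" ;;
    *)   echo "other" ;;
  esac
}

# 2a. Port status: ifOperStatus (1.3.6.1.2.1.2.2.1.8)
#     Only dormant (5) and lowerLayerDown (7) are red; up (1) is ok and
#     everything else, including plain down (2), is reported as "other".
ifoper_status_color() {
  case "$1" in
    1)   echo "ok" ;;
    5|7) echo "red" ;;
    *)   echo "other" ;;
  esac
}

# 2b. Port status cause: fcIfOperStatusCause (1.3.6.1.4.1.9.9.289.1.1.2.1.7)
#     Looks the code up in etc/cisco-status.txt ("color : code : name" lines),
#     preferring etc/cisco-status_custom.txt when it exists (location assumed).
#     $2 is the STOR2RRD working directory (defaults to the current directory).
cisco_cause_color() {
  local code="$1" dir="${2:-.}"
  local file="$dir/etc/cisco-status.txt"
  if [ -f "$dir/etc/cisco-status_custom.txt" ]; then
    file="$dir/etc/cisco-status_custom.txt"
  fi
  awk -F':' -v code="$code" '
    /^[[:space:]]*#/ { next }                 # skip comment lines
    NF >= 3 {
      gsub(/[[:space:]]/, "", $1)             # color, e.g. "red"
      gsub(/[[:space:]]/, "", $2)             # numeric code, e.g. "3"
      if ($2 == code) { print $1; found = 1; exit }
    }
    END { if (!found) print "unknown" }       # code not listed in the file
  ' "$file"
}

# Roll-up (assumption): the overall status is NotOK as soon as any single
# check, switch-level or port-level, comes back red.
overall_health() {
  local c
  for c in "$@"; do
    if [ "$c" = "red" ]; then
      echo "NotOK"
      return 0
    fi
  done
  echo "OK"
}

# Example (placeholders: host switch1, SNMPv2c community "public"):
#   snmpwalk -v2c -c public -Oqve switch1 1.3.6.1.2.1.2.2.1.8 \
#     | while read -r v; do ifoper_status_color "$v"; done | sort | uniq -c
```

Under this reading, a switch whose Switch Status tab shows OK can still be NotOK overall when one of its port-level checks is red, which may account for the difference between the two views.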
