Improved CPU utilization graphing for Hitachi arrays
Because of a CPU utilization issue we recently had on Hitachi arrays, I took a deep dive into how stor2rrd displays it and found that it needs improvement in several ways.
Let's discuss high-end Hitachi (VSP, VSP G1000, G1500, F1500, HPE XP) first.
Stor2rrd shows an average utilization graph followed by a breakdown for each object it retrieves. This is not very useful, because an array has 2 types of CPUs for different tasks, and over-utilization of one CPU type gets averaged out by the other. If there's an issue, we'll never notice it from this graph. The two types are:
1. MPB (Microprocessor Blade) - general I/O processing. A single MPB is basically an Intel CPU with 4 (Hitachi VSP) or 8 (Hitachi VSP G1000 and later) cores. An object string looks like this: MPB-1MA.MP00-1MB, where MPB-1MA is a CPU and MP00-1MB is an individual core.
2. DRR (Data Recovery and Reconstruction) - RAID parity calculations. A typical object string looks like this: CHA-xxx.DRR-xxxx or DKA-xxx.DRR-xxxx. CHA (Channel Adapter) processes front-end I/O, DKA (Disk Adapter) processes back-end I/O.
So what are my suggestions? I'd like to see the following in the CPU section:
- Total - Should be displayed per subsystem, without aggregating across subsystems: one averaged line each for MPB, DRR-CHA and DRR-DKA.
- MPB - Aggregated load per CPU (plot every MPB-xxx on the graph but have all cores averaged out). Plotting every core on the graph will probably give 64+ objects for large systems and is therefore not necessary. Individual CPU cores typically have the same utilization so averaging is fine.
- DRR - Aggregated per adapter type (CHA+DKA), aggregated per CHA (CHA-xxx.DRR-xxxx), aggregated per DKA (DKA-xxx.DRR-xxxx). If the number of items is too large, average out the items under every CHA-xxx and DKA-xxx - that will cut the number of plot items in half.
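To make the grouping concrete, here is a rough Python sketch of how the object strings could be bucketed. It is only an illustration of the idea, not how stor2rrd does it internally: the function names (`subsystem`, `unit`, `average_by`) and the dict-of-samples input are my own assumptions. Only the MPB-xxx / CHA-xxx.DRR-xxxx / DKA-xxx.DRR-xxxx naming comes from the arrays.

```python
from collections import defaultdict
from statistics import mean


def subsystem(name):
    """Map an object string to 'MPB', 'DRR-CHA' or 'DRR-DKA' (hypothetical helper)."""
    unit_part, _, sub_part = name.partition(".")
    if unit_part.startswith("MPB-"):
        return "MPB"
    if sub_part.startswith("DRR-") and unit_part[:4] in ("CHA-", "DKA-"):
        return "DRR-" + unit_part[:3]
    raise ValueError("unrecognised CPU object: " + name)


def unit(name):
    """Return the part before the dot: MPB-xxx, CHA-xxx or DKA-xxx."""
    return name.partition(".")[0]


def average_by(samples, key):
    """Average utilization (percent) of all objects that share key(name)."""
    groups = defaultdict(list)
    for name, util in samples.items():
        groups[key(name)].append(util)
    return {group: mean(values) for group, values in groups.items()}
```

Calling `average_by(samples, subsystem)` would give the three lines for the Total graph, and `average_by(samples, unit)` would give one line per MPB, CHA or DKA with the cores or DRR items underneath averaged out.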
Mid-range Hitachis are less complicated because they have no dedicated CPUs. Their CPU units are named MPU-xx.MPxx-xx, where MPU-xx is an individual CPU and MPxx-xx is a CPU core. I'd be happy to see a utilization graph broken down by MPU, so that cases of bad load balancing across the system become obvious. Just like above, the items under each MPU-xx can be averaged out to make the graphs less complex.
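For the mid-range case, the per-MPU breakdown could look roughly like the Python sketch below. Again, this is just an assumption about how it might be done. `per_mpu_average` and `imbalance` are names I made up, and the spread between the busiest and idlest MPU is simply one possible way to make bad balancing stand out.

```python
from collections import defaultdict
from statistics import mean


def per_mpu_average(samples):
    """Average the cores (MPxx-xx) under each MPU-xx; other objects are skipped."""
    cores = defaultdict(list)
    for name, util in samples.items():
        mpu, sep, core = name.partition(".")
        if mpu.startswith("MPU-") and sep and core.startswith("MP"):
            cores[mpu].append(util)
    return {mpu: mean(values) for mpu, values in cores.items()}


def imbalance(per_mpu):
    """Gap in percentage points between the busiest and the idlest MPU."""
    return max(per_mpu.values()) - min(per_mpu.values())
```

A large gap with a moderate overall average is exactly the situation that a single averaged graph hides.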
Hope you can help me out and we'll make stor2rrd even better than it is now.