Alert for a Specific Pool Fails

Hi,

I'm having problems setting up alerts for a specific pool on two different storage systems, a Hitachi and an IBM. These are the errors from the alert test:

For the IBM DS3700:
WARNING        : POOL: LID-STR-BKP:Pool-BKP_00:io_rate : no data at: 15:22:00 23/11/2016 - 600 seconds /home/lpar2rrd/stor2rrd/bin/AlertStor2rrd.pm:953 : 
Util           : POOL: LID-STR-BKP:Pool-BKP_00:io : some problem encountered, continuing with next rule

For the Hitachi:
Working for    : POOL: HUS130_9223267XX:0:resp_t_r : POOL:HUS130_922326XX:0:resp_t_r:5:10:15::Giorgio
WARNING        : POOL: HUS130_922326XX:0:resp_t_r  Pool "0" has not been found in /home/lpar2rrd/stor2rrd/data/HUS130_922326XX/pool.cfg /home/lpar2rrd/stor2rrd/bin/AlertStor2rrd.pm:509 : 
Working for    : POOL-ALL: HUS130_922326XX::io : POOL-ALL:HUS130_92232670::io:7000:10:15::Giorgio

The other configured alerts, which cover all pools of a storage system, work correctly. Could you please help me solve this issue?

Regards,
Giorgio

Comments

  • More info:

    I'm using the appliance, updated to 1.35.

    Regards.
  • DS3K: run the command below to check whether any data is being updated
     (it uses pool ID 0; substitute another ID if pool ID 0 does not exist):
    rrdtool fetch data/LID-STR-BKP/POOL/0.rrd AVERAGE -r 300 -s now-900 -e now

    HUS: does pool ID 0 exist? Check with:
    cat /home/lpar2rrd/stor2rrd/data/HUS130_922326XX/pool.cfg
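
When `rrdtool fetch` returns `nan` or `-nan` for an interval, no sample was stored for that interval. A minimal POSIX shell sketch for finding the newest interval that does hold data, without reading the rows by eye, is below. The function name `last_valid_sample` is hypothetical (it is not part of STOR2RRD), and it assumes each row prints as `timestamp: value value ...` with missing values shown as `nan` or `-nan`.

```shell
#!/bin/sh
# Hypothetical helper (not part of STOR2RRD): reads `rrdtool fetch` output
# on stdin and prints the timestamp of the newest row that contains at least
# one real value. Rows are assumed to look like "1479989700: 8.68e+04 -nan ..."
# with missing values printed as "nan" or "-nan". Prints nothing if no row
# holds data.
last_valid_sample() {
  awk '
    /^[0-9]+:/ {
      ts = $1
      sub(/:$/, "", ts)                 # strip the trailing colon
      for (i = 2; i <= NF; i++) {
        if (tolower($i) !~ /nan/) {     # one non-NaN value marks the row as valid
          last = ts                     # rows are in time order, so the last hit is the newest
          break
        }
      }
    }
    END { if (last != "") print last }
  '
}

# Example use on the appliance (run from the STOR2RRD home directory):
#   ts=$(rrdtool fetch data/LID-STR-BKP/POOL/0.rrd AVERAGE -r 300 -s now-900 -e now | last_valid_sample)
#   [ -n "$ts" ] && echo "newest sample is $(( $(date +%s) - ts )) s old"
```

Comparing the printed timestamp with the current time shows how far behind the pool data is.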

  • Hi Pavel,

    Here is what I've got for the IBM:

    [lpar2rrd@stor2rrd POOL]$ pwd
    /home/stor2rrd/stor2rrd/data/LID-STR-BKP/POOL
    [lpar2rrd@stor2rrd POOL]$ ls -l
    total 4416
    -rw-r--r-- 1 lpar2rrd lpar2rrd      10 Nov 23 19:00 2-cap.first
    -rw-r--r-- 1 lpar2rrd lpar2rrd 2915480 Nov 24 09:25 2-cap.rrd
    -rw-r--r-- 1 lpar2rrd lpar2rrd      10 Nov 23 19:00 2.first
    -rw-r--r-- 1 lpar2rrd lpar2rrd 1590560 Nov 24 09:25 2.rrd
    -rw-r--r-- 1 lpar2rrd lpar2rrd      43 Nov 21 11:20 pools.col
    [lpar2rrd@stor2rrd POOL]$ cat pools.col  
    0 : Pool-BKP_00                            
    [lpar2rrd@stor2rrd POOL]$ rrdtool fetch 2.rrd AVERAGE -r 300 -s now-900 -e now
                              read               write             read_io            write_io            resp_t_r            resp_t_w

    1479989700: 8.6878573970e+04 6.4870207873e+04 1.8670289867e+03 8.5343030667e+02 2.0230040000e+01 6.2984933333e+00
    1479990000: -nan -nan -nan -nan -nan -nan
    1479990300: -nan -nan -nan -nan -nan -nan
    1479990600: -nan -nan -nan -nan -nan -nan
    [lpar2rrd@stor2rrd POOL]$

    And for the HUS:

    [lpar2rrd@stor2rrd HUS130_922326XX]$ pwd
    /home/stor2rrd/stor2rrd/data/HUS130_922326XX 
    [lpar2rrd@stor2rrd HUS130_922326XX]$ cat pool.cfg                 
    0:0
    10:10
    [lpar2rrd@stor2rrd HUS130_922326XX]$ ls -la POOL/
    total 5200
    drwxr-xr-x  2 lpar2rrd lpar2rrd      76 Nov 21 11:20 .
    drwxr-xr-x 11 lpar2rrd lpar2rrd    4096 Nov 24 09:25 ..
    -rw-r--r--  1 lpar2rrd lpar2rrd      10 Nov 23 15:40 0.first
    -rw-r--r--  1 lpar2rrd lpar2rrd 2650496 Nov 24 09:25 0.rrd
    -rw-r--r--  1 lpar2rrd lpar2rrd      10 Nov 23 15:40 10.first
    -rw-r--r--  1 lpar2rrd lpar2rrd 2650496 Nov 24 09:25 10.rrd
    -rw-r--r--  1 lpar2rrd lpar2rrd      86 Nov 21 11:20 pools.col
    [lpar2rrd@stor2rrd HUS130_922326XX]$

    Thanks for your help!

    Regards
  • What is the local time on your V3700? Isn't it a couple of minutes behind?

    Regarding the HUS issue: there is a problem when the pool name is "0"; alerting does not work for such a pool. Install this fix:

    http://www.stor2rrd.com/download/AlertStor2rrd.pm.gz
    Gunzip it and copy it to /home/stor2rrd/stor2rrd/bin (permissions 755, owned by the stor2rrd user). It should then look like this:
    -rwxr-xr-x    1 lpar2rrd staff         40985 Nov 24 14:10 bin/AlertStor2rrd.pm
    If your web browser gunzips it automatically, just rename it: mv AlertStor2rrd.pm.gz AlertStor2rrd.pm
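
The install steps above can be scripted. This is a minimal sketch, assuming a POSIX shell with `gzip` available; the function name `install_alert_fix` is hypothetical. It accepts the download either still compressed or already unpacked by the browser, keeps a backup of the module being replaced, and sets mode 755. Ownership is left as a comment because changing it needs root, and the owner should be whichever user runs STOR2RRD (the listing above shows `lpar2rrd`).

```shell
#!/bin/sh
# Hypothetical helper: install_alert_fix SRC BIN_DIR
# SRC is the downloaded AlertStor2rrd.pm.gz, or the same file already
# gunzipped by the browser (possibly still carrying the .gz suffix).
install_alert_fix() {
  src=$1
  dest=$2/AlertStor2rrd.pm

  # Keep a copy of the module being replaced.
  if [ -f "$dest" ]; then
    cp -p "$dest" "$dest.bak" || return 1
  fi

  # `gzip -t` succeeds only on a valid gzip file, so it tells the two cases apart.
  if gzip -t "$src" 2>/dev/null; then
    gunzip -c "$src" > "$dest" || return 1
  else
    cp "$src" "$dest" || return 1
  fi

  chmod 755 "$dest"
  # As root, also set the owner to the user that runs STOR2RRD, e.g.:
  #   chown lpar2rrd "$dest"
}

# Example (target directory as given in the reply above):
#   install_alert_fix ~/AlertStor2rrd.pm.gz /home/stor2rrd/stor2rrd/bin
```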


  • The local time is the same on the appliance and the V7300, but the graphs show no data for the last 10 to 15 minutes. Any idea how to fix this?

    The fix worked great for the HUS, thanks.

    Regards,
  • I am not sure why it is 10-15 minutes behind (usually it is about 5-10 minutes behind), but it should not be a problem. It is only a problem when testing alerts from the GUI.

    Let it run, set an alert with a low limit, and check whether you receive it.