XIV volume data not displayed after Linux host reboot

Hi

We are running stor2rrd v2.40 on RHEL 7.5. After applying patches, we no longer see all the LUNs in the web GUI.

Looking at stor2rrd from the CLI, we see the following error in logs/error.log-XIVNAMEHERE:
WARNING Tue Jul 2 15:10:10 2019 xivperf.pl: Same sample: IBM.2812-7860226.volume.9e814d0006a = 20190629071904.000000+000, skipping

There are hundreds, if not thousands, of lines like this above it. I have looked at the code on GitHub; this seems to happen when the previous volume data has no timestamp. Is there any fix, such as deleting the stale files, to kick-start collection again?

Atif




Comments


  • Lines 335-339:

    &warning("Same sample: ".$cacheKey." = ".$stat->{'StatisticTime'}.", skipping.");
    $retry_cnt = 0;
    if($debug) { print("=\n"); }
    last;
    if ( $storage_epoch == $cached_raw_counters->{'storage_timestamp'} ) {
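
The check quoted above can be paraphrased in Python to show why a StatisticTime that never advances makes the collector skip every poll. This is only a sketch and not the actual xivperf.pl code: parse_statistic_time, is_same_sample, and the dictionary cache are hypothetical names, and the timestamp layout (date and time, microseconds, then a signed UTC offset in minutes) is assumed from the value shown in the log line.

```python
from datetime import datetime, timedelta, timezone


def parse_statistic_time(value):
    """Parse a timestamp such as '20190629071904.000000+000'.

    Assumed layout: yyyymmddHHMMSS.ffffff, a sign, and a three-digit
    UTC offset in minutes.
    """
    stamp, sign, offset = value[:21], value[21], value[22:25]
    minutes = int(offset) * (1 if sign == "+" else -1)
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S.%f")
    return parsed.replace(tzinfo=timezone(timedelta(minutes=minutes)))


def is_same_sample(cache, key, statistic_time):
    """Return True when the sample repeats the cached timestamp for key.

    A new timestamp is stored in the cache (a plain dict here, which is
    an assumption) and the function returns False.
    """
    current = parse_statistic_time(statistic_time)
    if cache.get(key) == current:
        return True
    cache[key] = current
    return False


cache = {}
key = "IBM.2812-7860226.volume.9e814d0006a"
first = is_same_sample(cache, key, "20190629071904.000000+000")
second = is_same_sample(cache, key, "20190629071904.000000+000")
print(first, second)  # False True
```

As long as the storage keeps reporting the same StatisticTime for a volume, every poll after the first one is treated as a duplicate, which would match a log that fills with "Same sample" warnings.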


  • Hi,

    Try removing these files:
    stor2rrd/data/XIVNAMEHERE/tmp/cache.file
    stor2rrd/data/XIVNAMEHERE/tmp/config.file

    Let us know if that helps.

  • Thanks for your reply.

    There were no files under stor2rrd/data/XIVNAMEHERE/tmp/*.
    Instead, I found the files here:
    stor2rrd/data/XIVNAMEHERE/cache.file
    stor2rrd/data/XIVNAMEHERE/config.file

    I have backed up the original files and deleted them. I re-ran load.sh and will give it a couple of hours to see if that helps. I will keep you posted.
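
For anyone repeating the backup-and-delete step above, it could be scripted roughly as follows. This is a hedged sketch: move_cache_files is a hypothetical helper, not part of stor2rrd, and the backup location in the usage lines is an arbitrary example. The backup directory is passed in explicitly so that it can be kept outside the stor2rrd data directory.

```python
import shutil
from pathlib import Path


def move_cache_files(storage_dir, backup_dir, names=("cache.file", "config.file")):
    """Move the named files from storage_dir into backup_dir.

    Files that do not exist are skipped. Returns the list of backup
    paths that were created.
    """
    storage_dir = Path(storage_dir)
    backup_dir = Path(backup_dir)
    moved = []
    for name in names:
        source = storage_dir / name
        if source.is_file():
            # Create the backup directory only when there is something to move.
            backup_dir.mkdir(parents=True, exist_ok=True)
            target = backup_dir / name
            shutil.move(str(source), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    # Example paths: the first is the location reported in this thread,
    # the second is an arbitrary backup location.
    for path in move_cache_files("stor2rrd/data/XIVNAMEHERE",
                                 "/tmp/XIVNAMEHERE-cache-backup"):
        print("backed up", path)
```

Moving rather than deleting keeps the originals available in case they need to be restored.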






  • Only one host is displaying the correct IO rate and data rate in the Stor2rrd interface. The rest of the hosts show "-nan" values in the graphs.
  • We have dug through the timeline. The day data collection stopped was the day of the reboot. Interestingly, this XIV was a target XIV in a mirrored relationship; we switched to it as primary and removed the mirrored relationships.
    After doing that, Stor2rrd got into the state where it displays -nan in the graphs.
  • The Linux host reboot and the mirror relationship change happened on the same day, and stor2rrd has never worked properly since the bounce.
  • Hello,

    Please send us the logs.
    Add a short problem description in the text field of the upload form.

    cd /home/stor2rrd/stor2rrd # or wherever your STOR2RRD working dir is

    tar cvhf logs.tar logs tmp/*txt

    gzip -9 logs.tar

    Send us logs.tar.gz via https://upload.stor2rrd.com
    You can also attach screenshots if they help in understanding the issue.

    Thank you.
  • Hi,

    Please send screenshots of the following:
    1. a host that is working
    2. some other host
    3. the volume aggregated graph
    4. output of: ls -l data/<storage>/HOST
    5. output of: cat data/<storage>/HOST/host*txt


  • Historical data on this XIV is not important as it was a mirrored target until we changed it to source recently.

    Is there any way we can remove this XIV completely from stor2rrd and rediscover it?

  • You can remove the data/<storage name> directory; everything will then be rediscovered.
  • I removed the old data using rm -Rf data/xivnamehere*

    Rediscovery worked, and now some hosts have values in the charts while others show the following error in the GUI:
    Error happened
    Check
    - $$LPAR2RRD_HOME$$/logs/error-cgi.log
    - Web server error log

    When I checked logs/error.log-xivname, I saw it flooded with entries like this one:

    Warning Tue Jul 9 11:50:13 2019 xivperf.pl: Same sample: IBM.2812-7860226.volume.xxxxxxxxxxxx =  20190629071904.000000+000, skipping

    where xxxxxxxxxxxx changes on every line. Interestingly, June 29 is when we stopped the remote mirror relationships, unlocked the volumes in this XIV, and made them independent volumes. It looks like the problem has nothing to do with the reboot and is somehow caused by stopping the relationships and unlocking the volumes on the XIV.

    The XIV itself is working fine, but stor2rrd has not worked properly since the replication change.
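
One way to confirm the pattern described above (many different volumes, one unchanging timestamp) is to group the warnings by the reported time. The following is a minimal sketch: summarize_same_sample and the regular expression are hypothetical and assume the message format shown in the log lines quoted in this thread.

```python
import re
from collections import defaultdict

# Matches "Same sample: <key> = <timestamp>" as seen in the quoted log lines.
SAME_SAMPLE = re.compile(r"Same sample:\s*(\S+)\s*=\s*([0-9.+-]+)")


def summarize_same_sample(lines):
    """Map each reported timestamp to the set of keys skipped with it."""
    by_time = defaultdict(set)
    for line in lines:
        match = SAME_SAMPLE.search(line)
        if match:
            key, statistic_time = match.groups()
            by_time[statistic_time].add(key)
    return dict(by_time)


# Message part of the two warnings quoted in this thread.
sample = [
    "xivperf.pl: Same sample: IBM.2812-7860226.volume.9e814d0006a = 20190629071904.000000+000, skipping",
    "xivperf.pl: Same sample: IBM.2812-7860226.volume.xxxxxxxxxxxx =  20190629071904.000000+000, skipping",
]
for statistic_time, keys in summarize_same_sample(sample).items():
    print(statistic_time, len(keys))  # 20190629071904.000000+000 2
```

To run it against a real log, pass an open file handle for logs/error.log-XIVNAMEHERE instead of the sample list. A single timestamp covering every volume would suggest that the statistics time reported for those volumes stopped advancing on that date.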
