Intermittent stats collection on G600

Hi Guys

Great product etc. I have it monitoring various NetApp, VSP, SVC/V7, E-Series, Brocade and, of course, VMware. I'm just having issues with inconsistent collection for the G600's. I have 4 G200's that are collecting very well, so I'm not sure what the issue is. My thoughts are:

The G6's are very busy: more than 60% on all MPUs constantly, whereas the G2's are at 30% max.
There are many other systems hitting the G6's, as they are customer prod, so there are more HORCM instances on them (about 3).
The G6's are a lot more complex than the G2's in terms of number of items (LDEVs, host groups, etc.), so could there be some timeout?

Anyway, I use a Command Device as per the Installation Guide for the virtual appliance I installed (version 2.31-1).



Comments

  • [lpar2rrd@bne3-0001xorux ~]$ cat /home/lpar2rrd/stor2rrd/data/BNE3DS009/log/export0402235103.log
    2019/04/02 23:51:03 Export tool start [Version 83-05-29-FF/00]
    2019/04/02 23:51:03 md.command = /home/lpar2rrd/stor2rrd/data/BNE3DS009/IOSTATS/BNE3DS009-type-vspg51.txt
    2019/04/02 23:51:03 md.logpath = /home/lpar2rrd/stor2rrd/data/BNE3DS009/log
    2019/04/02 23:51:03 md.logfile = null
    2019/04/02 23:51:03 md.rmitimeout = 20
    2019/04/02 23:51:03 md.divide = null
    2019/04/02 23:51:03 command file = /home/lpar2rrd/stor2rrd/data/BNE3DS009/IOSTATS/BNE3DS009-type-vspg51.txt
    2019/04/02 23:51:03 [  1] [****************]        ; Specifies IP adress of SVP
    2019/04/02 23:51:03     ipAddress = [****************]
    2019/04/02 23:51:03 [  2] dkcsn 410779           ; Specifies Serial Number of SVP
    2019/04/02 23:51:03 [  3] login User = [****************], Passwd = [****************]
    2019/04/02 23:51:04 . HostPath = //10.5.15.32:1099/com.hitachi.sanproject.rmi.supervisor.rmiserver
    2019/04/02 23:51:04 . [login] lookup1:start
    2019/04/02 23:51:05 . lookup1 Normal
    2019/04/02 23:51:05 . [login] checkLicenseEx:start
    2019/04/02 23:52:08 . checkLicenseEx OK!!
    2019/04/02 23:52:08 . [login] lookup2:start
    2019/04/02 23:52:09 . lookup2 Normal
    2019/04/02 23:52:09 . [login] setUserTimeoutValue:start
    2019/04/02 23:52:09 . setUserTimeoutValue Normal
    2019/04/02 23:52:09 [MPC Software Version] 83-05-29-40/00
    2019/04/02 23:52:09 [ExportTool version] 83-05-29-FF/00
    2019/04/02 23:52:09 [ExportTool IF version] 83-00-00
    2019/04/02 23:52:09 . [login] useMonitor:start
    2019/04/02 23:52:10 . Find ErrorCode = (1,5400)
    2019/04/02 23:54:10 . Retry!!!
    2019/04/02 23:54:11 . HostPath = //10.5.15.32:1099/com.hitachi.sanproject.rmi.supervisor.rmiserver
    2019/04/02 23:54:11 . [login] useMonitor:start
    2019/04/02 23:54:11 . Find ErrorCode = (1,5400)
    2019/04/02 23:56:11 . Retry!!!
    2019/04/02 23:56:12 . HostPath = //10.5.15.32:1099/com.hitachi.sanproject.rmi.supervisor.rmiserver
    2019/04/02 23:56:12 . [login] useMonitor:start
    2019/04/02 23:56:12 . Find ErrorCode = (1,5400)
    2019/04/02 23:58:12 . Retry!!!
    2019/04/02 23:58:12 . HostPath = //10.5.15.32:1099/com.hitachi.sanproject.rmi.supervisor.rmiserver
    2019/04/02 23:58:12 . [login] useMonitor:start
    2019/04/02 23:58:13 . Find ErrorCode = (1,5400)
    2019/04/02 23:58:13 . sanproject.serverux.data.SANRmiException : RMI server error (1, 5400)
    2019/04/02 23:58:13 Login failed [line = 3]
    2019/04/02 23:58:13 Execution stops.
    2019/04/02 23:58:13 logout
    2019/04/02 23:59:07 Export tool end

  • 2019/04/02 23:58:13 . Find ErrorCode = (1,5400)
    2019/04/02 23:58:13 . sanproject.serverux.data.SANRmiException : RMI server error (1, 5400)
    2019/04/02 23:58:13 Login failed [line = 3]

    Is it possible to increase the number of login retries beyond 4 attempts?


  • arqarq
    edited April 3
    I also noticed that even when the login and subsequent collection work, the timestamp of the run differs from the time range shown in the process log.

    This difference is present in the G200 logs as well, so is it expected?

    2019/04/03 11:15:55 . [Wed Apr 03 21:04:00 AEST 2019 -> Wed Apr 03 21:14:00 AEST 2019]

  • Hello,

    The error "sanproject.serverux.data.SANRmiException : RMI server error (1, 5400)" is described in the documentation.

    Here is the link:

    https://files.mtstatic.com/site_6669/23444/0?Expires=1554277198&Signature=QeprDmlgHKceC7WdRhJj4i2VUW5yfPmo55fPj4~P1qVSZRYLd3oiF1KJDejsILYJW9qqsygc1SRidqWUeSGfjlzadDSLkrSpp6CEdZySWbX4t~9ldPF7g~mtpMw-bCg5-wbLaLEgevO5RoQHfy2qBNmcjE9dS1641zDXepGstQk_&Key-Pair-Id=APKAJ5Y6AV4GI7A555NA


    I'm afraid there is not much we can do about that. Please try to follow the recommendation.

    Thank you


  • Great, thanks Lukas. I had not managed to find the description of that error anywhere.

    BTW, your link is broken or there is an issue at that URL, but I now need to consolidate the 3 systems I have so they share ONE instance of the export tool.

    I use a separate instance of the tool for Splunk, HDCA and XORUX. The G2's are not monitored by Splunk or HDCA, so this must be the issue and why it only affects the G6's.
  • I also have HCS running. :-(

  • Thanks Lukas

  • Hello

    I am running into a similar situation where the Hitachi G1500 array is busy because of the RAID agent running on an HDCA probe server, so the export tool fails. Stopping the RAID agent would rectify the situation. This, however, is an unacceptable solution, since I would lose data collection for troubleshooting and reporting purposes.

    Thank You
  • Hi,

    I am not sure what HDCA is. How can we help you?
  • HDCA is Hitachi Data Center Analytics. It uses probes to capture data from managed devices, such as the G1500 array, for reporting and troubleshooting. What Hitachi is saying is that the export tool fails because the array is busy while the Hitachi RAID agent on the probe server collects information. Once the array is less busy (by stopping the RAID agent), the export tool works. Hope this helps.
  • OK, so the export tool fails because HDCA is running?

  • Yes, guys, that is indeed the issue. The solution I see is for the STOR2RRD app to be able to collect more than 5 minutes of data at a time. It could then 'fill in' the missing data by collecting it on subsequent successful logins. As it stands, collecting only 5 minutes every 5 minutes means every failed login leaves a hole in the collected data.
  • Hi,

    Yes, collecting more data samples per run is possible. It has been on our TODO list for a long time; the priority just is not very high.

    Other users are not affected by this the way you are; you are the first case where we have seen it.

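The export tool log in the first comment shows a clear pattern: "useMonitor" fails with ErrorCode (1,5400), the tool waits about two minutes, and after the fourth failure it stops with "Login failed". A minimal Python sketch for summarizing that pattern across many export logs is below. It assumes only the line format visible in this thread; summarize_export_log is a hypothetical helper, not part of the export tool or STOR2RRD.

```python
import re

# Matches e.g. "2019/04/02 23:52:10 . Find ErrorCode = (1,5400)"
ERROR_RE = re.compile(r"Find ErrorCode = \((\d+),\s*(\d+)\)")


def summarize_export_log(text):
    """Count login attempts, retries and error codes in one export tool log.

    Hypothetical helper; assumes the log line format shown in this thread.
    """
    summary = {"attempts": 0, "retries": 0, "errors": [], "login_failed": False}
    for line in text.splitlines():
        # Each "[login] useMonitor:start" line is one monitor login attempt
        if "[login] useMonitor:start" in line:
            summary["attempts"] += 1
        m = ERROR_RE.search(line)
        if m:
            summary["errors"].append((int(m.group(1)), int(m.group(2))))
        if line.rstrip().endswith("Retry!!!"):
            summary["retries"] += 1
        if "Login failed" in line:
            summary["login_failed"] = True
    return summary
```

Fed the full log above, this would report 4 attempts, 3 retries, four (1, 5400) errors and a failed login; running it over every export*.log file in a storage's log directory would show how often the G600 logins fail compared with the G200's.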
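On the question of more than 4 login attempts: nothing in this thread shows the export tool's internal retry count being configurable. Assuming it is not, one workaround is an outer wrapper that re-runs a whole collection attempt. The Python sketch below is illustrative only; retry() and its defaults are hypothetical, and action stands for whatever launches one export tool run and reports success (for example, by checking its log for "Login failed"), which is not shown.

```python
import time


def retry(action, attempts=8, delay_s=120.0, sleep=time.sleep):
    """Call action() until it returns True or the attempts run out.

    Hypothetical outer wrapper, not an export tool setting. The 120 s
    default mirrors the ~2 minute gap between retries in the log above.
    Returns the number of attempts used, or None if all of them failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for i in range(1, attempts + 1):
        if action():
            return i
        if i < attempts:
            sleep(delay_s)  # wait before the next attempt, not after the last
    return None
```

The tradeoff: in the log above a single export tool run already spends about 8 minutes on its own 4 attempts, so extra outer retries will overrun a 5-minute collection cycle rather than recover the missed interval.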
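On the timestamp mismatch: in the sample line from the thread, the prefix reads 11:15:55 while the interval ends at 21:14:00 AEST. One plausible reading, not confirmed anywhere in the thread, is that the prefix is in UTC while the interval is in AEST (UTC+10). Under that assumption the line was written about 2 minutes after the interval closed. The Python sketch below checks that hypothesis; the parser, its regular expression and the prefix_tz parameter are assumptions based only on this one line.

```python
import re
from datetime import datetime, timedelta, timezone

AEST = timezone(timedelta(hours=10), "AEST")  # fixed UTC+10, no DST handling

# e.g. "2019/04/03 11:15:55 . [Wed Apr 03 21:04:00 AEST 2019 -> Wed Apr 03 21:14:00 AEST 2019]"
INTERVAL_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \. \[(.+?) -> (.+?)\]$"
)


def _parse_aest(text):
    # "Wed Apr 03 21:04:00 AEST 2019": drop the zone name, attach a fixed +10:00 offset
    parts = text.split()
    if len(parts) != 6 or parts[4] != "AEST":
        raise ValueError(f"unexpected timestamp: {text!r}")
    naive = datetime.strptime(" ".join(parts[:4] + parts[5:]), "%a %b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=AEST)


def parse_interval_line(line, prefix_tz=timezone.utc):
    """Return (logged_at, interval_start, interval_end) as aware datetimes.

    prefix_tz is an assumption about the time zone of the log-line prefix.
    """
    m = INTERVAL_RE.match(line.strip())
    if m is None:
        raise ValueError("not an interval line")
    logged = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S").replace(tzinfo=prefix_tz)
    return logged, _parse_aest(m.group(2)), _parse_aest(m.group(3))
```

For the sample line, with the default prefix_tz=timezone.utc, logged - end comes out as 0:01:55 and end - start as 0:10:00. A consistent small positive gap across many lines (G200 included) would support the UTC reading.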
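The backfill idea proposed in the thread (collect more than 5 minutes at a time so later successful logins fill the holes) can be sketched as two steps: find the 5-minute intervals that have no sample, then have the next successful run request one longer window covering them. This is a hypothetical Python illustration, not STOR2RRD's implementation; missing_intervals and backfill_window are made-up names, and whether the export tool can serve such a window is not established here.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=5)  # collection interval described in the thread


def missing_intervals(collected, window_start, window_end, step=STEP):
    """Return the step-aligned interval starts in [window_start, window_end)
    for which no sample was collected."""
    have = set(collected)
    gaps = []
    t = window_start
    while t < window_end:
        if t not in have:
            gaps.append(t)
        t += step
    return gaps


def backfill_window(gaps, step=STEP):
    """Return (start, end) spanning all gaps, i.e. the longer window a later
    successful run would need to request, or None if nothing is missing."""
    if not gaps:
        return None
    return min(gaps), max(gaps) + step
```

For example, if the runs at 23:45, 23:50 and 00:05 succeed but 23:55 and 00:00 fail on login, the gaps are 23:55 and 00:00, and the next run would request 23:55 to 00:05 instead of only its own 5 minutes.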