rotation of agent logfile store_not_confirm_file

Have you ever considered rotating the logfile store_not_confirm_file? One of our customers ran into a full filesystem because "$base_dir/lpar2rrd-agent-hmc-$host-$user_name.txt" had grown to over 300 MB.

Comments

  • Hi,

    it is not a log file. It is a file that stores data which has not yet been sent to the server.
    It looks like the agent has a problem connecting to the lpar2rrd daemon, or something else is preventing it from sending data to the server.

    Try to troubleshoot the problem as per http://www.lpar2rrd.com/OSagent-debug.htm

    Check /var/tmp/lpar2rrd*err on the agent side and search for the OS agent name in the server log logs/error.log-daemon

    Make sure you are not running an OS agent that is too old (use at least 4.80)
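A quick way to spot the symptom before the filesystem fills up is to look for unusually large agent buffer files. The following is only a minimal sketch: the helper name `find_large_files`, the glob pattern, and the 50 MB threshold are assumptions for illustration and are not part of lpar2rrd.

```python
import glob
import os


def find_large_files(pattern, limit_bytes):
    """Return (path, size) pairs for files matching pattern that exceed limit_bytes."""
    hits = []
    for path in glob.glob(pattern):
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size > limit_bytes:
                hits.append((path, size))
    # Largest files first, so the worst offender is at the top.
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed location and threshold; adjust to the agent's actual base directory.
    limit = 50 * 1024 * 1024
    for path, size in find_large_files("/var/tmp/lpar2rrd-agent-*.txt", limit):
        print(f"{size / 1024 / 1024:.1f} MB  {path}")
```

A file that keeps growing here suggests that the agent is collecting data but the server is not accepting it, which is the situation described above.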

  • Thank you for your fast feedback. I'm going to check why the server can't receive all of the information.
    Are there any limitations or recommendations regarding the maximum number of connected LPARs? Currently we have 21 Managed Systems configured with 149 LPARs.
  • No limit on the number of servers/HMCs or LPARs.
  • I have spent some more time on error analysis. In the log files on the server and the client I found the following messages. Is this perhaps a known error involving conversion problems? Don't be surprised by the different timestamps: the servers are not in the same time zone.

    Server

    /home/lpar2rrd/lpar2rrd/logs/error.log-daemon

    Thu Jan 26 08:08:06 2017: Client communication failed - client:  (172.20.222.18): ERROR: /home/lpar2rrd/lpar2rrd/data/XXXX-YYY*SERIALNUM/blade-ivm8/lpar2rrd-client/san-vscsi0.mmm: not a simple unsigned integer: '0.3' at /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl line 997  :

     

    Client

     

    /var/tmp/lpar2rrd-agent-lpar2rrd-server-lpar2rrd.txt

    XXXX-YYY*SERIALNUM:lpar2rrd-client:2:1485418140:Thu Jan 26 09:09:00 2017 version 4.95-7:3000000000|4:lpar2rrd-client:0::mem:::8388608:8149616:238992:2251412:0:1406748:pgs:::0:0:8192:0:::lan:en0:172.20.222.18:62112655517:11563133301:81535489:72792357:::cpu:::0:0:1:0:::san:fcs0:0xC0507606C8230000:32343837626504:14706443084320:72265518:43778518:::san:fcs1:0xC0507606C8230002:32344533554096:14719462789120:67503127:43826014:::san_resp:fcs0::0.4:0.4:::::san_resp:fcs1::0.4:0.4:::::san:vscsi0::0:15800:0:3.2:::san_resp:vscsi0::0.0:8.2::::

     

    /var/tmp/lpar2rrd-agent-lpar2rrd-server-lpar2rrd.err

    Thu Jan 26 09:08:07 2017: wrong server response: agent_time:1485286501 : recv_time: :

    Thu Jan 26 09:08:07 2017: Error: Not all data has been sent out, refused line: XXXX-YYY*SERIALNUM:lpar2rrd-client:2:1485286501:Tue Jan 24 20:35:01 2017 version 4.95-7:3000000000|4:lpar2rrd-client:0::mem:::8388608:8143396:245212:2249248:0:1407188:pgs:::0:0:8192:0:::lan:en0:172.20.222.18:59154407082:11410431140:79242025:71478840:::cpu:::0:0:1:0:::san:fcs0:0xC0507606C8230000:32128114194184:14616178154528:71820789:43471857:::san:fcs1:0xC0507606C8230002:32128987641264:14629308485120:67101082:43519577:::san_resp:fcs0::0.3:0.4:::::san_resp:fcs1::0.3:0.4:::::san:vscsi0::18500:25400:0.3:5.5:::san_resp:vscsi0::4.0:8.3:::: /opt/lpar2rrd-agent/lpar2rrd-agent.pl:2271
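The server complains about a value that is "not a simple unsigned integer", and the refused line above carries decimal values in its `san:vscsi0` record. A rough sketch for locating such values in an agent line follows. The record layout (`san:<adapter>:<id>:` followed by four values) is inferred only from the sample lines in this thread, not from the agent documentation, and the function name `non_integer_san_values` is hypothetical.

```python
import re

# Assumed layout, inferred from the sample lines: san:<adapter>:<id>:<v1>:<v2>:<v3>:<v4>
# The literal "san:" does not match "san_resp:" records.
SAN_RECORD = re.compile(
    r"(?:^|:)san:([^:]*):([^:]*):([^:]*):([^:]*):([^:]*):([^:]*)"
)


def non_integer_san_values(line):
    """Return (adapter, value) pairs for san values that are not plain unsigned integers."""
    problems = []
    for match in SAN_RECORD.finditer(line):
        adapter = match.group(1)
        # Groups 3..6 are the four counters; group 2 is the adapter id and is skipped.
        for value in match.groups()[2:]:
            if value and not re.fullmatch(r"[0-9]+", value):
                problems.append((adapter, value))
    return problems
```

Run against the refused line, this flags only the `vscsi0` adapter, whose record contains the same '0.3' that the server log reports for san-vscsi0.mmm.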



  • What is your lpar2rrd server version? (GUI --> menu panel --> next to the last item)
    It looks like you have installed a higher version of the agent than of the server (now or in the past).
    Based on the version info I will let you know the resolution.
  • We are using the latest versions of lpar2rrd for client and server.
    Client: lpar2rrd-agent-4.95-7
    Server: 4.95-4

  • OK, it looks like at some point in the past that LPAR briefly ran an agent version higher than the server's.

    Anyway, whenever you see the error "not a simple unsigned integer" in the server daemon log, remove the affected file (all files of that type).

    rm /home/lpar2rrd/lpar2rrd/data/XXXX-YYY*SERIALNUM/*/lpar2rrd-client/san-vscsi*

    Note: always use "*" instead of the HMC/IVM name!

    Once you remove them, the agent starts working again, sends all the already collected data to the server, and frees up the space in /var/tmp.
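If several files are affected, the wildcard patterns can be derived from the daemon log rather than typed by hand. This is a sketch only: it assumes the path layout seen in this thread (`.../<server>/<HMC or IVM>/<lpar>/<file>`), the function name `removal_patterns` is made up for illustration, and it only returns the patterns without deleting anything.

```python
import re

# Matches the path reported in lines such as:
# "... ERROR: /path/to/san-vscsi0.mmm: not a simple unsigned integer: '0.3' at ..."
ERROR_LINE = re.compile(r"ERROR: (\S+): not a simple unsigned integer")


def removal_patterns(log_lines):
    """Return sorted wildcard patterns covering the files named in the error lines."""
    patterns = set()
    for line in log_lines:
        match = ERROR_LINE.search(line)
        if not match:
            continue
        parts = match.group(1).split("/")
        if len(parts) < 4:
            continue
        # Replace the HMC/IVM directory with "*", as advised above.
        parts[-3] = "*"
        # Drop the trailing index and extension so all files of that type match.
        parts[-1] = re.sub(r"\d*\.[^.]+$", "", parts[-1]) + "*"
        patterns.add("/".join(parts))
    return sorted(patterns)
```

For the error line quoted earlier in the thread, this yields the same pattern as the rm command above. Review the result before removing anything.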
  • Thank you for the fast and good response. It looks like everything is working fine :smile: