Trouble getting osagent data into lpar2rrd server running in OpenShift
Hello,
We have been successfully running an lpar2rrd server in OpenShift for a few years; however, we have only been collecting HMC data. We would like to add AIX osagent data too. We have installed the osagent and created a route in OpenShift to direct the connection to port 8162 of the OpenShift pod. When we try to run the agent we get a "Connection refused" error. After troubleshooting with our OpenShift team, it turns out that our OpenShift setup only allows incoming requests on ports 443 or 80 through to the load balancer, which then redirects them to the appropriate port/pod. For example, a connection attempt from an AIX client to openshift_lpar2rrd_server@somewhere.com:8162 will not work. It would need to be openshift_lpar2rrd_server@somewhere.com (which has a route pointed to port 8162).
Has anyone else run into this?
Thanks
Comments
-
You can specify a port other than the default one, like openshift_lpar2rrd_server@somewhere.com:443
* * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl openshift_lpar2rrd_server@somewhere.com:443 > /var/tmp/lpar2rrd-agent.out 2>&1
-
I did try that, but lpar2rrd-agent.pl only allows ports of 1000 or higher (lines 4709 and 5149).
This is the error: error setting port 443 /opt/lpar2rrd-agent/lpar2rrd-agent.pl:4709
-
OK, I see it. There is a check in the agent code, but it makes no sense:
if ( ( !isdigit($port) ) || ( $port < 1000 ) ) {
    error( "error setting port $port " . __FILE__ . ":" . __LINE__ );
    next;
}
It is at line 4740 of /opt/lpar2rrd-agent/lpar2rrd-agent.pl.
Remove these 4 lines and try again.
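For comparison, a port check that accepts privileged ports such as 443 could look like the sketch below. It is written in Python as a standalone illustration and is not taken from lpar2rrd-agent.pl. The function name `parse_target` is hypothetical, and the default of 8162 is an assumption based on the daemon port mentioned in this thread.

```python
def parse_target(target, default_port=8162):
    """Split 'host[:port]' and validate the port.

    Hypothetical helper for illustration only; it does not
    reproduce the agent's own argument parsing.
    """
    host, sep, port_text = target.rpartition(":")
    if not sep:
        # No colon present: the whole string is the host.
        return target, default_port
    if not host:
        raise ValueError("missing host in %r" % target)
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError("port is not a number: %r" % port_text)
    port = int(port_text)
    # Any valid TCP port is accepted, including those below 1000.
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return host, port
```

With this check, `parse_target("openshift_lpar2rrd_server@somewhere.com:443")` yields the host string unchanged and the port 443, while a value such as `:0` or `:70000` is rejected.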
-
I changed the less-than to a greater-than and then received this error when using port 443. The server did not log anything.
Thu Sep 7 11:09:07 2023: wrong server response: agent_time:1694008742 39 XX 9XXX-MXX*123456: recv_time: :
Thu Sep 7 11:09:07 2023: Error: Not all data has been sent out, refused line: 9XX-MXX*123456:XX:39:1694008742:
-
weird, try this:
rm /var/tmp/lpar2rrd*
/usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -d openshift_lpar2rrd_server@somewhere.com:443
# do not forget the -d option in the command above
-
same result.
-
According to OpenShift support, for this to work we should not pass a port number.
If I curl to port 443 I get the following error and nothing is logged on the lpar2rrd server:
curl: (52) Empty reply from server
If I curl to just the hostname I get this message from curl:
The server returned an invalid or incomplete response.
And this error logged to the pod running the lpar2rrd server:
Thu Sep 7 18:48:39 2023: Received bad conn from: (X.X.XX.X.X) : port:38806 : GET /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl:440 :
-
telnet <name> 443
Then type something and press Enter. What do you get in the server's log and on the agent side?
-
nothing in the server log.
Here is the info from the agent side:
host:/home/> telnet server 443
Trying...
Connected to server
Escape character is '^]'.
ls
Connection closed.
-
That is weird. It does not seem to reach the lpar2rrd daemon.
Usually the lpar2rrd daemon answers with "Protocol error", like this:
$ telnet localhost 8162
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
kjh
Protocol error: kjh
Connection closed by foreign host.
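The telnet test above can be scripted so it is easy to repeat against different hosts and ports. The sketch below is a standalone Python illustration, not part of lpar2rrd. The function name `probe` and the status labels are made up here, and the only behaviour it relies on is the "Protocol error" reply shown in the session above.

```python
import socket


def probe(host, port, payload=b"ping\n", timeout=5.0):
    """Send one line to host:port and classify what comes back.

    Returns (status, reply_bytes). The status labels are
    illustrative names, not anything defined by lpar2rrd.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, DNS failure, or connect timeout.
        return "connect-failed", b""
    with sock:
        try:
            sock.sendall(payload)
            reply = sock.recv(4096)
        except socket.timeout:
            return "no-reply", b""
        except OSError:
            # For example, the peer reset the connection.
            return "closed-without-reply", b""
    if not reply:
        # Peer closed cleanly without sending anything.
        return "closed-without-reply", reply
    if reply.startswith(b"Protocol error"):
        # Matches the daemon's answer seen on port 8162 above.
        return "daemon-reached", reply
    return "other-reply", reply
```

Calling `probe("localhost", 8162)` on the server itself would be expected to report `daemon-reached` if the daemon behaves as in the session above. A `closed-without-reply` result through the route on port 443 would be consistent with something in front of the daemon closing the connection.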