Oracle 12c agent troubleshooting (EM_PING_NOTIF_RESPONSE: BACKOFF::180000)

  • From: Martin Bach <development@xxxxxxxxxxxxxxxxx>
  • To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 27 Oct 2011 15:59:34 +0100

Good afternoon!

It's been a busy day on the mailing list, and maybe I can benefit from 
this a little :) Before I begin I have to admit that I'm not the best 
agent troubleshooter, and 12.1 hasn't made that easier.

I have two agents deployed on a two-node cluster; both have worked in
the past. After a reboot, both stopped functioning. Now I have this:

[oracle@rac11203node1 log]$ emctl status agent
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 12.1.0.1.0
OMS Version : (unknown)
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/oracle/product/agent_inst
Agent Binaries : /u01/app/oracle/product/core/12.1.0.1.0
Agent Process ID : 13270
Parent Process ID : 13215
Agent URL : https://rac11203node1.localdomain:3872/emd/main/
Repository URL : https://oem12oms.localdomain:4901/empbs/upload
Started at : 2011-10-26 18:30:17
Started by user : oracle
Last Reload : (none)
Last successful upload : (none)
Last attempted upload : (none)
Total Megabytes of XML files uploaded so far : 0
Number of XML files pending upload : 1,858
Size of XML files pending upload(MB) : 8.05
Available disk space on upload filesystem : 49.16%
Collection Status : Collections enabled
Last attempted heartbeat to OMS : 2011-10-27 15:42:47
Last successful heartbeat to OMS : (none)

---------------------------------------------------------------
Agent is Running and Ready
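
When comparing a broken agent against a working one, the interesting fields (OMS Version, the heartbeat timestamps, the pending upload count) can be pulled out of this output with a small shell helper. This is only a sketch: agent_field is a name made up for illustration, not part of emctl, and it assumes the "Label : value" layout shown above.

```shell
# agent_field LABEL
# Reads `emctl status agent` output on stdin and prints the value that
# follows LABEL. Hypothetical helper for illustration, not part of emctl.
agent_field() {
  label="$1"
  awk -v label="$label" '
    # The line must start with the label text (literal match, so
    # parentheses in labels such as "upload(MB)" are safe).
    index($0, label) == 1 {
      rest = substr($0, length(label) + 1)
      # Drop the separator; only the first colon is removed, so URLs
      # and timestamps in the value stay intact.
      if (sub(/^[ \t]*:[ \t]*/, "", rest)) {
        print rest
        exit
      }
    }'
}
```

Usage, run on each node in turn: emctl status agent | agent_field "Last successful heartbeat to OMS"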

The settings are correct; I have verified them against another agent
that uploads and otherwise works fine.

I have also secured the agent. Both
$AGENT_BASE/agent_inst/sysman/log/secure.log and the output of the
emctl secure agent commands reported normal, successful operation.

Still the stubborn thing doesn't want to talk to the OMS - in the agent 
overview page both agents are listed as "unavailable", but not blocked. 
When I force an upload, I get this:

[oracle@rac11203node1 log]$ emctl upload
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload error:full upload has failed: uploadXMLFiles skipped :: OMS 
version not checked yet. If this issue persists check trace files for 
ping to OMS related errors. (OMS_DOWN)

However, it's not down: another agent (which happens to be on the same
box as the OMS) can reach it:

[oracle@oem12oms 12.1.0.1.0]$ $ORACLE_HOME/bin/emctl status agent
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 12.1.0.1.0
OMS Version : 12.1.0.1.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/gc12.1/agent/agent_inst
Agent Binaries : /u01/gc12.1/agent/core/12.1.0.1.0
Agent Process ID : 2964
Parent Process ID : 2910
Agent URL : https://oem12oms.localdomain:3872/emd/main/
Repository URL : https://oem12oms.localdomain:4901/empbs/upload
Started at : 2011-10-15 21:00:37
Started by user : oracle
Last Reload : (none)
Last successful upload : 2011-10-27 15:46:38
Last attempted upload : 2011-10-27 15:46:38
Total Megabytes of XML files uploaded so far : 137.79
Number of XML files pending upload : 0
Size of XML files pending upload(MB) : 0
Available disk space on upload filesystem : 50.78%
Collection Status : Collections enabled
Last attempted heartbeat to OMS : 2011-10-27 15:48:34
Last successful heartbeat to OMS : 2011-10-27 15:48:34

---------------------------------------------------------------
Agent is Running and Ready

And no, it's not the firewall: it is turned off, and I can connect to
the upload URL from any machine in the network:

[oracle@rac11203node1 log]$ wget --no-check-certificate https://oem12oms.localdomain:4901/empbs/upload
--2011-10-27 15:55:46-- https://oem12oms.localdomain:4901/empbs/upload
Resolving oem12oms.localdomain... 192.168.99.28
Connecting to oem12oms.localdomain|192.168.99.28|:4901... connected.
WARNING: cannot verify oem12oms.localdomain’s certificate, issued by 
“/O=EnterpriseManager on oem12oms.localdomain/OU=EnterpriseManager on 
oem12oms.localdomain/L=EnterpriseManager on 
oem12oms.localdomain/ST=CA/C=US/CN=oem12oms.localdomain”:
Self-signed certificate encountered.
HTTP request sent, awaiting response... 200 OK
Length: 314 [text/html]
Saving to: “upload.1”

100%[======================================>] 314 --.-K/s in 0s

2011-10-27 15:55:46 (5.19 MB/s) - “upload.1” saved [314/314]

The agent complains about this in gcagent.log:

2011-10-27 15:56:08,947 [37:3F09CD9C] WARN - improper ping interval (EM_PING_NOTIF_RESPONSE: BACKOFF::180000)
2011-10-27 15:56:18,471 [167:E3E93C4C] WARN - improper ping interval (EM_PING_NOTIF_RESPONSE: BACKOFF::180000)
2011-10-27 15:56:18,472 [167:E3E93C4C] WARN - Ping protocol error
o.s.gcagent.ping.PingProtocolException [OMS sent an invalid response: "BACKOFF::180000"]
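
To see how often the OMS answers a ping with a backoff, and with which values, the occurrences in gcagent.log can be tallied. A minimal sketch follows; backoff_summary is a hypothetical helper name, and the script only counts what is in the log. (If the value is in milliseconds, which the log does not say, 180000 would be three minutes.)

```shell
# backoff_summary
# Reads gcagent.log lines on stdin and prints one line per distinct
# BACKOFF value: "<value> <number of occurrences>".
# Hypothetical helper for illustration only.
backoff_summary() {
  awk '
    {
      # A line may contain the token more than once, so keep scanning
      # the remainder of the line after each match.
      while (match($0, /BACKOFF::[0-9]+/)) {
        value = substr($0, RSTART + 9, RLENGTH - 9)  # skip "BACKOFF::"
        count[value]++
        $0 = substr($0, RSTART + RLENGTH)
      }
    }
    END {
      for (value in count) printf "%s %d\n", value, count[value]
    }'
}
```

Usage, from the agent's log directory: backoff_summary < gcagent.log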

At least someone at Oracle has a sense of humour when it comes to
this :) For those who read all of this: have you seen this before? Any
pointers appreciated.

Martin
--
http://www.linkedin.com/in/martincarstenbach
http://martincarstenbach.wordpress.com

