Hi Folks.
Wintel / CTM 6.4.01.200 on all components.
I had a couple of weird problems last night.
Two jobs that executed successfully on an /Agent did not update their status in the /EM GUI. Dependent jobs continued to process normally, as if there were no problem.
If I 'hold' the 'stuck' job, the status, start and finish fields are corrected and the job immediately turns green (Ended OK).
So where did that update go? And why does a Hold pick up the status update when recycling the /Agent and /Server does not? I guess a communication problem stopped the update from reaching the GUI, but I can't see anything in the logs.
Any ideas? I have to come up with a reasonable explanation for my customer.
Cheers.
GUI shows Waiting status even after execution
Hi Breakfix,
The Control-M/Agent was not able to talk back to the Control-M/Server after receiving the Job, so the Control-M/Server could not perform the Post-Processing (e.g. Conditions) needed for the Successor Jobs to start.
The Hold/Free initiates communication between the Control-M/Server and Control-M/Agent, and this is why it worked.
Alternatively you could have waited for the "Track All" of the Control-M/Server (i.e. ctm_menu -> Parameter Customization -> Default Parameters for Communicating with Agent Platforms -> Polling Interval), but at the default of 900 seconds (15 minutes) this is often too long to wait. If you are desperate you can lower this value (this requires a recycle of the Control-M/Server), but do not be too aggressive, as this will overload the Control-M/Server Tracker (TR) process and have a negative impact on all Agents/Jobs. (I would not go below about 180 seconds.)
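To put rough numbers on that trade-off, here is a small Python sketch. It assumes each Track All cycle re-checks every Agent, so once communication is working again a stuck status waits at most one full interval. The 180-second floor is the rule of thumb from this post, not a documented product limit, and the function name is hypothetical.

```python
# Rule of thumb from the post: avoid Track All intervals much below ~180 s.
MIN_SUGGESTED_INTERVAL_S = 180
DEFAULT_INTERVAL_S = 900


def polling_tradeoff(interval_s):
    """Worst-case wait (minutes) vs. Track All cycles per hour for an interval.

    Assumes a stuck status is picked up by the next Track All cycle.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return {
        "worst_case_wait_min": interval_s / 60,
        "track_all_cycles_per_hour": 3600 / interval_s,
        "below_rule_of_thumb": interval_s < MIN_SUGGESTED_INTERVAL_S,
    }


if __name__ == "__main__":
    for interval in (DEFAULT_INTERVAL_S, 300, MIN_SUGGESTED_INTERVAL_S, 60):
        print(interval, polling_tradeoff(interval))
```

Going from 900 to 180 seconds cuts the worst-case wait from 15 to 3 minutes but multiplies the Track All cycles per hour by five (4 to 20), which is where the extra Tracker load comes from.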
As you said the problem only happened last night, I assume that things are "normally" OK, so it is most likely a network problem rather than a configuration problem.
Run "ag_diag_comm" on the Control-M/Agent machine and check whether it succeeds. You might find that the "Agent Ping to Control-M/Server" step takes an eternity before finally failing.
If so, refer this to your Network/comms team to investigate ports, firewalls, etc.
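If you want a figure to give the Network/comms team rather than "it takes an eternity", a sketch like the following times ag_diag_comm and gives up after a timeout. It assumes ag_diag_comm is on the PATH of the account that runs the Agent and needs no arguments; the 300-second timeout and the helper name are arbitrary, hypothetical choices.

```python
import shutil
import subprocess
import time


def time_command(cmd, timeout_s=300.0):
    """Run cmd and return (exit code, elapsed seconds, captured stdout).

    The exit code is None if the command was killed after timeout_s seconds.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        return result.returncode, time.monotonic() - start, result.stdout
    except subprocess.TimeoutExpired as exc:
        # On timeout, partial output is bytes (or None) even with text=True.
        partial = exc.stdout or b""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return None, time.monotonic() - start, partial


if __name__ == "__main__":
    # Resolve the full path so a .bat/.cmd wrapper on Windows can also be launched.
    exe = shutil.which("ag_diag_comm")
    if exe is None:
        print("ag_diag_comm not found on PATH; run this from the Agent's environment")
    else:
        code, elapsed, output = time_command([exe], timeout_s=300.0)
        print(output)
        status = "TIMED OUT" if code is None else f"exit code {code}"
        print(f"ag_diag_comm: {status} after {elapsed:.1f}s")
```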
If you need to give the Network/comms team more "output" for their investigation, try to telnet from the Control-M/Agent machine to the Control-M/Server machine, especially on the Agent-to-Server Port (as shown in the ag_diag_comm output).
If you want even more output, turn on debugging for the Control-M/Agent (ctmagcfg -> Diagnostic Level -> 4), run ag_diag_comm again and view the $CONTROLM/proclog/ag_ping_<PID>.log file.
If ag_diag_comm does come back successfully, go to the Control-M/Server and run "ctm_diag_comm <Agent>" to check that all of the details match.
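If telnet is not available on the Agent machine, this small Python sketch makes the same kind of TCP connection test and reports how long it took. The host and port arguments are placeholders: use the Control-M/Server machine and the Agent-to-Server Port shown in the ag_diag_comm output. A successful connect only shows that the port is reachable; it does not prove the Control-M/Server accepted anything from the Agent.

```python
import socket
import sys
import time


def check_tcp_port(host, port, timeout_s=10.0):
    """Attempt a TCP connection (what a telnet test does) and time it.

    Returns (connected, elapsed_seconds, error_text).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers refused connections, timeouts and DNS failures
        return False, time.monotonic() - start, str(exc)


if __name__ == "__main__":
    # Placeholder usage: python check_port.py <ctm_server_host> <agent_to_server_port>
    if len(sys.argv) == 3:
        ok, elapsed, err = check_tcp_port(sys.argv[1], int(sys.argv[2]))
        print(f"{'connected' if ok else 'FAILED'} after {elapsed:.1f}s {err}".rstrip())
    else:
        print("usage: check_port.py <ctm_server_host> <agent_to_server_port>")
```

A connect that hangs until the timeout usually points at a firewall silently dropping packets, while an immediate "refused" means the host answered but nothing was listening on that port.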
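Since the log name includes a different PID each time, a small helper that finds the newest one can save some hunting. This sketch assumes the CONTROLM environment variable points at the Control-M/Agent home directory (on a Windows Agent you may need to pass the path explicitly); the function name is hypothetical.

```python
import glob
import os


def newest_ag_ping_log(controlm_home=None):
    """Return the most recently modified proclog/ag_ping_*.log, or None."""
    home = controlm_home or os.environ.get("CONTROLM")
    if not home:
        return None
    candidates = glob.glob(os.path.join(home, "proclog", "ag_ping_*.log"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    path = newest_ag_ping_log()
    if path is None:
        print("No ag_ping_*.log found; check $CONTROLM and the Diagnostic Level")
    else:
        print(f"--- {path} ---")
        with open(path, encoding="utf-8", errors="replace") as fh:
            print(fh.read())
```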
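For the "details match" check, one low-tech approach is to copy the relevant values from both outputs into two dictionaries and diff them. The key names and values below are made-up illustrations, not the actual field labels printed by ag_diag_comm or ctm_diag_comm. Values are compared case-insensitively so that host names differing only in case are not flagged.

```python
def compare_settings(agent_side, server_side):
    """List settings whose values differ between the two sides.

    Values are compared as trimmed, lower-cased strings, so 7005 == "7005"
    and "CTMSRV01" == "ctmsrv01". Keys present on only one side are reported.
    """
    mismatches = []
    for key in sorted(set(agent_side) | set(server_side)):
        a = agent_side.get(key, "<missing>")
        s = server_side.get(key, "<missing>")
        if str(a).strip().lower() != str(s).strip().lower():
            mismatches.append(f"{key}: agent={a!r} server={s!r}")
    return mismatches


if __name__ == "__main__":
    # Illustrative values only: transcribe your own from ag_diag_comm / ctm_diag_comm.
    agent = {"server_host": "CTMSRV01", "agent_to_server_port": 7005, "server_to_agent_port": 7006}
    server = {"server_host": "ctmsrv01", "agent_to_server_port": "7005", "server_to_agent_port": "7016"}
    print("\n".join(compare_settings(agent, server)) or "all details match")
```

With these made-up values only the server-to-agent port is reported, since the host names differ only in case.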
Scenarios like this are a good reason to look into BIM and/or adding Late Shouts.
All the best...
Cheers, Graeme.