GUI shows Waiting status even after execution

BreakFix
Nouveau
Posts: 30
Joined: 24 Sep 2009 12:00

Post by BreakFix » 24 Sep 2010 12:22

Hi Folks.

Wintel / CTM 6.4.01.200 on all components.

I had a couple of weird problems last night.

Two jobs that executed successfully on an /Agent didn't update their status in the /EM GUI. Dependent jobs continued to process normally, as if there were no problem.

If I 'hold' the 'stuck' job the status, start and finish fields are corrected and the job turns Green immediately, Ended-OK.

So where did that update go? And how come activities like recycling the /Agent and /Server don't pick up the updated status, but a Hold does? I guess a communication problem stops the update reaching the GUI, but I can't see anything in the logs.

Any ideas? I have to come up with a reasonable explanation for my customer.

Cheers.
/BreakFix

gbyrnes
Nouveau
Posts: 8
Joined: 05 May 2010 12:00
Location: Melbourne, Australia

Post by gbyrnes » 24 Sep 2010 2:24

Hi Breakfix,

The Control-M/Agent is not able to talk back to the Control-M/Server after receiving the Job - therefore the Control-M/Server could not perform the Post-Processing (i.e. Conditions, etc.) in order for the Successor Jobs to start.

The Hold/Free initiates communication between the Control-M/Server and Control-M/Agent, and this is why it worked.

Alternatively you could have waited for the "Track All" of the Control-M/Server (i.e. ctm_menu -> Parameter Customization -> Default Parameters for Communicating with Agent Platforms -> Polling Interval), but at the default of 900 seconds (15 minutes) this is often too long to wait. If desperate you can tune this number down (this will require a recycle of the Control-M/Server), but do not be too aggressive, as this will overload the Control-M/Server Tracker (TR) process and have a negative impact on all Agents/Jobs. (I would not go below, say, 180 seconds.)
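
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. It only does the arithmetic on the two intervals mentioned above and assumes one full "Track All" sweep per polling interval; it is not a model of how the Tracker (TR) process actually behaves under load, and the function names are made up for illustration.

```python
def sweeps_per_hour(interval_seconds: int) -> float:
    """Number of full tracking sweeps per hour, assuming one sweep per interval."""
    return 3600 / interval_seconds


def worst_case_wait_minutes(interval_seconds: int) -> float:
    """Longest a missed status update could wait for the next sweep, in minutes."""
    return interval_seconds / 60


# Compare the default interval (900 s) with the suggested floor (180 s).
for interval in (900, 180):
    print(
        f"{interval}s: {sweeps_per_hour(interval):.0f} sweeps/hour, "
        f"up to {worst_case_wait_minutes(interval):.0f} min wait"
    )
```

So dropping from 900 to 180 seconds cuts the worst-case wait from 15 minutes to 3, at the cost of five times as many sweeps per hour.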

As you said the problem only happened last night, I assume that it is "normally" OK, so it is surely not a configuration problem, but a network problem.

Do a "ag_diag_comm" from the Control-M/Agent machine and see if this is successful - you might find that the "Agent Ping to Control-M/Server" takes an eternity, before finally failing.
If this is the case then you need to refer this to your Network/comms team to investigate ports, firewalls, etc.
If you needed to provide them with more "output" for their investigation then you can attempt to telnet from the Control-M/Agent machine to the Control-M/Server machine, including/especially on the Agent-to-Server Port (as seen in the ag_diag_comm output).
If you want even more output then debug the Control-M/Agent (ctmagcfg -> Diagnostic Level -> 4) and run ag_diag_comm again to view the $CONTROLM/proclog/ag_ping_<PID>.log file.
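
If telnet is not installed on the Agent machine, the same port check can be approximated with a few lines of Python. This is a minimal sketch, not a Control-M utility: the function name is made up, and the host and port are placeholders you would replace with your Control-M/Server host and the Agent-to-Server port shown in the ag_diag_comm output. It only proves that a TCP connection can be opened, nothing about the Control-M protocol itself.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example call (hypothetical host name and port, substitute your own):
#   can_connect("ctmserver.example.com", 12345)
```

A False result that takes the full timeout to come back usually points at a firewall silently dropping packets, while an immediate False suggests nothing is listening on that port or the connection is being actively refused.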

If the ag_diag_comm above did come back successfully, then go to the Control-M/Server and run "ctm_diag_comm <Agent>" to ensure that all of the details match.

Scenarios like this are a good reason to investigate (BIM and/or) the addition of Late Shouts.

All the best...
Cheers, Graeme.
