Hi All
I want to set up a job that checks agent status and, if any agent is down, informs ops via an alert.
Is it possible?
set up job for control-m agent down
It depends on your alerting system too (not everyone can tie directly into Remedy or whatever alerting system they use). Some configurations using the Patrol Control-M KM will not alert on an Agent Unavailable condition but will alert on job failures. I believe this was corrected or added as a feature in the Patrol KM for Control-M 8.0, but it is not documented.
To schedule a job to check the status of an agent, all you need to do is schedule a job that runs 'ls -la' or some other generic command, with a Late Sub alert set for shortly after the scheduled run time. If the agent is available, the job will run and complete. If it is not available, the job will go to "Wait for Resource" and trigger the Late Sub alert because it cannot run on time. That gives you a decent check on the health of your agent, since it shows whether the agent can actually run work by going through each step of the process: Submitting, Running, Completing, Returning a Completion code.
There are several problems with this method:
1) You must schedule a separate job for every hour of the day (or whatever interval you choose) at which you want to check. (Cyclic jobs won't work with Late Sub alerts.) If you have a lot of agents, this could mean a lot of tasks. 24 hourly checks * 50 agents * 365 days in a year = 438000 extra jobs.
2) If the agent goes down right after a check, it can stay down unnoticed for almost a full interval before the next check catches it.
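To put numbers on both problems, here is a rough Python sketch. The function names are made up for illustration and this is not anything Control-M provides. It just does the multiplication from point 1 and works out the longest time an agent could be down before the next check job is even due (the Late Sub delay would come on top of that).

```python
def yearly_check_jobs(checks_per_day: int, agents: int, days: int = 365) -> int:
    """Number of extra check jobs per year for a given schedule (hypothetical helper)."""
    return checks_per_day * agents * days


def worst_case_gap_minutes(checks_per_day: int) -> float:
    """Longest time between two evenly spaced checks, in minutes.

    An agent that fails right after one check is not tested again
    until this much time has passed.
    """
    return 24 * 60 / checks_per_day


# Hourly checks on 50 agents, as in the example above.
print(yearly_check_jobs(24, 50))
print(worst_case_gap_minutes(24))
```

Checking more often shrinks the gap but multiplies the job count, so the interval is a trade-off you have to pick for your site.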
Advantages:
1) Some agents go down routinely and come back up automatically within 5 minutes. With this method those agents are not alerted on immediately, so they do not cause extra tickets to be generated.
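A simplified model of that behaviour, as a Python sketch. This is my own approximation of how the Late Sub alert interacts with an outage, not Control-M code: times are in minutes, and late_after (how long after the scheduled time the Late Sub alert fires) is a name I made up for the example.

```python
def late_sub_alerts(outage_start, outage_end, check_times, late_after):
    """Return the times at which a Late Sub alert would fire (simplified model).

    A check job scheduled at time t waits while the agent is down.
    If the agent is back by t + late_after, the job runs on time and
    nothing is alerted; otherwise the alert fires at t + late_after.
    """
    alerts = []
    for t in check_times:
        agent_down_at_t = outage_start <= t < outage_end
        if agent_down_at_t:
            deadline = t + late_after
            if outage_end > deadline:
                alerts.append(deadline)
    return alerts


hourly = [0, 60, 120, 180]

# 5-minute blip around the 60-minute check: agent is back before the deadline.
print(late_sub_alerts(58, 63, hourly, late_after=10))   # []

# Longer outage covering the 60-minute check: alert at minute 70.
print(late_sub_alerts(58, 90, hourly, late_after=10))   # [70]

# Outage that falls entirely between two checks is never seen.
print(late_sub_alerts(61, 119, hourly, late_after=10))  # []
```

The last case is the flip side of the same behaviour: the check ignores short blips, but it also misses any outage that starts and ends between two checks.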
NOTE: Configure the job to delete its sysout after every run (success or failure); otherwise the proclog on the agent will fill up and, in some cases, consume the file system.