Alert Messages.

futre25
Nouveau
Posts: 166
Joined: 11 Aug 2009 12:00

Post by futre25 » 07 Dec 2011 2:15

Hi to all.

I am getting two alert messages that I do not understand:

- Failed to backup Agent's status parameters, and
- The interval longest run was over 13 minutes (this one appears every 5 hours).

I cannot find anything about them in the documentation.

Can you tell me why these alerts are raised and what the problem is?

Thanks to all for your help.

meuser
Nouveau
Posts: 4
Joined: 23 Nov 2010 12:00

Post by meuser » 09 Dec 2011 7:42

Hi,

Here are some answers to your questions. You can find more information on all sorts of problems at http://www.bmc.com/support/reg?c=n .

In the CONTROL-M/Enterprise Manager Global Alert Client (GAC), the following messages are appearing:

1. Failed to backup agent status parameters
2. The interval longest run was over X minutes
3. The script perl expire it's timeout and has been killed
CONTROL-M/Server Watchdog all databases
The problem occurs when a download is in progress, or when CONTROL-M/Server is handling Quantitative Resources or Control Resources, at the moment the backres_nodes_config Perl utility is activated by the watchdog (WD) process.

The backres_nodes_config_xxxxx.log in the CONTROL-M/Server's proclog directory will show a message similar to:

SQL> 2 update CMR_RESOURCELOCK set LASTEND = '0'
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
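
If you want to check whether this deadlock is what you are hitting, you can scan the proclog directory for the error code. This is only a minimal Python sketch, not a BMC utility: the function name, the file pattern (derived from the log name above) and the directory you pass in are my assumptions.

```python
from pathlib import Path

# Oracle error code shown in the log excerpt above.
DEADLOCK_MARKER = "ORA-00060"


def find_deadlock_logs(proclog_dir, pattern="backres_nodes_config_*.log"):
    """Return {file name: [line numbers]} for logs that contain the deadlock error."""
    hits = {}
    for log_path in sorted(Path(proclog_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(log_path, errors="replace") as handle:
            lines = [
                number
                for number, text in enumerate(handle, start=1)
                if DEADLOCK_MARKER in text
            ]
        if lines:
            hits[log_path.name] = lines
    return hits
```

Call it with the path to your own proclog directory. An empty result means none of the matching logs contains the deadlock message.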

Defect ID CAR00040571 was opened to track this issue. This issue has been resolved in Fix Pack 3 for version 6.4.01 (CONTROL-M/Server 6.4.01.300).
As a workaround until the Fix Pack is installed, modify the CONTROL-M/Server so that the Watchdog (WD) process executes the backup less frequently. NOTE: Once CONTROL-M/Server 6.4.01 Fix Pack 3 is applied, this workaround should be removed.

1. Navigate to the CONTROL-M/Server $HOME\Ctm_server\Data directory.
2. Find the config.dat file.
3. Make a backup copy of the config.dat file.
4. Open the config.dat file and scroll down to the line: # WatchDog exit's to backup nodes status parameters
5. The next line reads: WD_CTMEXIT_3_INTERVAL 10
6. Change the 10 to 360.
7. Save the file.
8. Cycle (stop and restart) the CONTROL-M/Server for the change to take effect.
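
If you prefer to script the edit, the backup and the value change can be sketched in Python as below. This is a sketch under assumptions, not an official procedure: the function name and the ".bak" backup suffix are mine, and it assumes the parameter sits on its own line as "WD_CTMEXIT_3_INTERVAL <number>". You still have to cycle the CONTROL-M/Server yourself afterwards.

```python
import re
import shutil
from pathlib import Path

PARAM = "WD_CTMEXIT_3_INTERVAL"


def set_wd_backup_interval(config_path, new_value=360):
    """Back up config.dat, then set WD_CTMEXIT_3_INTERVAL to new_value.

    Returns True if the parameter line was found and rewritten.
    """
    config_path = Path(config_path)
    # Keep a copy next to the original before touching anything (assumed suffix).
    backup_path = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup_path)

    # Matches the parameter followed by a number; comment lines do not match.
    pattern = re.compile(rf"^(\s*{PARAM}\s+)\d+\s*$")
    lines = config_path.read_text().splitlines(keepends=True)
    changed = False
    for index, line in enumerate(lines):
        match = pattern.match(line)
        if match:
            newline = "\n" if line.endswith("\n") else ""
            lines[index] = f"{match.group(1)}{new_value}{newline}"
            changed = True
    if changed:
        config_path.write_text("".join(lines))
    return changed
```

A return value of False means the parameter line was not found, in which case the file is left as it was and only the backup copy has been created.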


interval longest run:

The WD process is hard-coded to perform a list of tasks every 6 minutes, including, among others, the heartbeat check of the processes. If more than 6 minutes have passed by the time it completes the list, it records how long the run took. After 50 cycles (50 * 6 minutes = 300 minutes = 5 hours), it writes the longest run that was recorded, if any, to the WD log file and to the ctmlog in the following format:
<date> <time> WD: Handling error: The interval longest run was over <number> minutes.
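
To see how often and how badly the interval was exceeded, you can pull the minute values out of the log. A minimal Python sketch follows; it relies only on the message text quoted above, and the function and constant names are my own.

```python
import re

# Figures from the explanation above: one WD cycle is 6 minutes,
# and the longest run is reported after 50 cycles.
CYCLE_MINUTES = 6
CYCLES_PER_REPORT = 50

MESSAGE = re.compile(r"The interval longest run was over (\d+) minutes")


def report_period_hours():
    """Hours between two 'interval longest run' reports."""
    return CYCLE_MINUTES * CYCLES_PER_REPORT / 60


def longest_runs(log_lines):
    """Return the minute value from every matching line, in log order."""
    minutes = []
    for line in log_lines:
        match = MESSAGE.search(line)
        if match:
            minutes.append(int(match.group(1)))
    return minutes
```

Pass it the lines of the WD log or the ctmlog output. Keep in mind that the value only tells you the longest run within each 5-hour window, not when it happened.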


There is a known issue, documented as bug BMPM009641, concerning this message. Even after the original problem that triggered the message has been solved, the WD process does not reset the recorded value and keeps sending the same message every 5 hours.

The workaround to solve this problem is to kill the WD process.
Issue the following command:

kill <pid>

The WD process will be automatically restarted, and the message should not return.

Hope this helps

futre25
Nouveau
Posts: 166
Joined: 11 Aug 2009 12:00

Post by futre25 » 09 Dec 2011 11:10

Thanks very much for your reply. It is very interesting.

My company does not have a BMC support user, so we cannot access the support site.

If you do not mind, I would like to ask another question.

Occasionally, we also have the following alert:

"Low on database space", but the database is only at 11%:

ctmdbspace -LIMIT 90
The DB Data+Log use: 11%

So why am I getting the alert?
We should only be alerted when the database is more than 90% full.

Thanks very much.

futre25
Nouveau
Posts: 166
Joined: 11 Aug 2009 12:00

Post by futre25 » 21 Dec 2011 5:08

If you do not mind, I would like to repeat the last question.

Occasionally, we also have the following alert:

"Low on database space", but the database is only at 11%:

ctmdbspace -LIMIT 90
The DB Data+Log use: 11%

So why am I getting the alert?
We should only be alerted when the database is more than 90% full.

Also, every Saturday I get the following alert:

The script ctmdbspace expire it's timeout and has been killed

Why does this happen?
I would greatly appreciate a response.


Thanks very much to all.

aronfire
Nouveau
Posts: 8
Joined: 23 Dec 2010 12:00

Thanks Meuser

Post by aronfire » 07 Feb 2012 6:15

I had the same problem with the message "The interval ..." and solved it just by killing the watchdog process, which then started again automatically.
Thanks, meuser.
Aaron Hernandez
