Thursday 22 October 2015

AD Controller (adctrl) manages parallel workers in AD Administration and AutoPatch.

AD Controller is a general maintenance utility you use to determine the status of AutoUpgrade, AD Administration, or AutoPatch workers and to control worker operation. AD Controller is run in its own window, not in the same window as AutoUpgrade, AD Administration, or AutoPatch.
The command to start AD Controller is ‘adctrl’.
AD Controller prompts you for standard items such as the AOL username and password, and the log file name. AD Controller writes its log file to the current working directory.
Reviewing Worker Status
Status            Description
ASSIGNED          The manager assigned a job to the worker and the worker has not started.
COMPLETED         The worker completed the job and the manager has not yet assigned it a new job.
FAILED            The worker encountered a problem.
FIXED, RESTART    You fixed the problem and the worker should retry whatever failed.
RESTARTED         The worker is retrying a job or has successfully restarted a job.
RUNNING           The worker is running a job.
WAIT              The worker is idle.

Worker Status Flow

Resolving a Failed Worker
The first time a job fails, the manager automatically defers it to the end of the current phase and assigns a new job to the worker. If the deferred job fails the second time it is run, the manager defers it again only if the total runtime of the job is less than ten minutes. If the deferred job fails a third time (or if the job’s total runtime is ten minutes or more the second time it is run), the job stays at failed status and the worker waits. At this point, you must address the cause of the failure, and then restart the job using AD Controller. If you enabled the email feature when starting the AD utility that started the workers, you automatically receive an email when a worker fails.
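The deferral rule above can be written down as a small decision function. This is only an illustrative sketch of the rule as described in the text, not code taken from the AD utilities; the function name and the return values "defer" and "stay_failed" are made up for the example.

```python
# Threshold taken from the text: a second failure is deferred again only
# if the job's total runtime is under ten minutes.
SECOND_DEFERRAL_RUNTIME_LIMIT_SECONDS = 10 * 60


def action_after_failure(failure_count: int, total_runtime_seconds: float) -> str:
    """Return "defer" or "stay_failed" for a job that has just failed.

    failure_count is the number of failures so far, including this one.
    Both the function name and the return labels are hypothetical.
    """
    if failure_count < 1:
        raise ValueError("failure_count must be at least 1")
    if failure_count == 1:
        # First failure: always deferred to the end of the current phase.
        return "defer"
    if failure_count == 2 and total_runtime_seconds < SECOND_DEFERRAL_RUNTIME_LIMIT_SECONDS:
        # Second failure of a short job: deferred once more.
        return "defer"
    # Third failure, or second failure of a long job: needs manual attention.
    return "stay_failed"
```

A job that returns "stay_failed" is the case where you have to fix the cause and restart the job from AD Controller.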
Determining Why a Worker Failed
Perform the following steps to investigate the problem that caused the failure:
  1. Review worker status and confirm the failed status of the worker.
  2. Review the worker log file adworkXXX.log under $APPL_TOP/admin/<SID>/log to determine the source of the error.
  3. Resolve the error.
Restarting a Failed Worker
After you have resolved the error:
  1. Tell the worker to restart a failed job.
  2. When prompted, enter the number of the worker that failed.
  3. Review worker status again.
Restarting a Failed Patch Process
There may be cases where a worker fails and cannot be restarted; for example, a worker process may dump core. In this case the worker is listed as running, but there is actually no worker process present. If this occurs, the only alternative may be to stop the patch and restart it. In such a case, you must first shut down all workers manually. After you shut down the workers, you must tell the manager that the non-running worker has failed its job and that it acknowledges quit.
The progression of AD Controller commands is:
• Option 3: Tell worker to shut down/quit (all workers)
• Option 4: Tell manager that a worker failed its job
• Option 5: Tell manager that a worker acknowledges quit
Once AutoPatch shuts down, restart the patch process.
Terminating a Hanging Worker Process
When running the AD utilities, there may be situations when a worker process appears to hang, or stop processing. If this occurs, and you are satisfied that the process is not simply carrying out a long-running job, you may need to terminate the process manually. Once you do, you must also manually restart that process. In such a case:
1. Determine what the worker process is doing. Use the AD Controller worker status screen to determine the file being processed and check the worker log file to see what it is doing:
– Verify whether the process is consuming CPU.
– Review the file to see what actions are being taken.
– Check for correct indexes on the tables (if the problem appears to be performance related).
– Check for an entry for this process in the V$SESSION table. This may provide clues to what the process is doing in the database.
2. Get the worker’s process ID.
If the job is identified as hanging, determine the worker’s process ID.
– UNIX: $ ps -a | grep adworker
– Windows: Run Task Manager (for example, using Ctrl-Alt-Delete) to view processes.
3. Determine what processes the worker has started, if any. If there are child processes, obtain their process IDs. Examples of child processes include SQL*Plus and FNDLOAD.
4. Stop the hanging process, using the command that is appropriate for your operating system.
5. Make necessary changes. Fix the issue that caused the worker to hang. Contact Oracle Support Services if you do not understand how to proceed.
6. Restart the job or the worker.
Restarting a Terminated Worker
To restart a terminated worker process, complete these steps:
1. Start AD Controller.
2. Review worker status.
3. Take the appropriate action for each worker status:
If the worker shows Failed, restart the failed job. When prompted, enter the number of the worker that failed.
If the worker shows Running or Restarted status, but the process is not really running, select the following options:
• Option 4: Tell the manager that a worker has failed its job. When prompted, enter the number of the hanging worker.
• Option 6: Restart a worker on the current machine. When prompted, enter the number of the worker that failed.
Do not tell the manager to start a worker that has shut down if the worker process is running. Doing so will create duplicate worker processes with the same worker ID, which will give unpredictable results.
Restarting a Terminated Child Process
  • Some worker processes spawn other processes, called child processes.
  • If you terminate a child process that is hanging, the worker that spawned the process shows Failed as its status.
  • After you fix the problem, choose to restart the failed job.
  • Once the worker is restarted, the associated child processes are started as well.
Restarting an AD Utility after a Machine Crash
If your system crashes while running an AD utility, you should:
  1. Start AD Controller.
  2. Select the following options:
    1. Option 4: Tell manager that a worker has failed its job (specify all workers)
    2. Option 2: Tell worker to restart a failed job (specify all workers)
  3. Restart the AD utility that was running when the machine crashed.
Because the AD utilities cannot automatically detect a machine crash, you must manually notify the manager that all jobs have failed and manually restart the workers using AD Controller. If you restart the utility without doing this, the utility status and the system status will not be synchronized.
Shutting Down Managers
There may be situations when you need to shut down an AD utility (the manager) while it is running. For example, you may need to shut down your database while you are running AutoPatch, AutoUpgrade, or AD Administration.
You should perform this shutdown in an orderly fashion so that it does not affect your data.
The best way to do this is to shut down the workers manually, which also causes the AD utility to quit in an orderly manner.
Running AD Controller Interactively
Follow these steps to access AD Controller.
1. Log in as applmgr and set the environment as described in Setting the Environment.
2. Start AD Controller with the adctrl command. This will prompt you to:
• Confirm the value of APPL_TOP.
• Specify an AD Controller log file (the default is adctrl.log). The AD Controller log file is written in the current working directory.
• Supply the Oracle Application Object Library user name and password.
3. Choose an option from the main menu. Once you respond to the prompts, the main menu appears.

AD Controller Menu


Type a number to select an option. Press [Return] at any time to return to the AD Controller main menu.
Running AD Controller Non-Interactively
You can run AD Controller without user intervention by creating a defaults file, which captures the information you supply at the interactive prompts so that you can reuse it later. Creating a defaults file and running AD Controller non-interactively works in much the same way as it does for AD Administration.
As with AD Administration, the same defaults file can be used to run different AD Controller commands: a single file can contain all your choices for the different menu options. To choose which task the defaults file will run, you add menu_option=<menu choice> to the utility start command. This overrides any menu-specific keystroke information stored in the defaults file initially, and allows you to use the defaults file for any of the AD Controller menu items. It also ensures that the menu option you intended for the defaults file is always valid, even if the menu items are renumbered or relocated in subsequent releases.
The available options are listed in the following table.
AD Controller Menu Options
Menu Option         Effect
ACKNOWLEDGE_QUIT    Tell manager that a worker acknowledges quit
INFORM_FAILURE      Tell manager that a worker failed its job
RESTART_JOB         Tell worker to restart a failed job
SHOW_STATUS         Show worker status
SHUTDOWN_WORKER     Tell worker to quit
START_WORKER        Restart a worker on the current machine
The following is an example of running AD Controller non-interactively to show worker status:
$ adctrl interactive=n defaults_file=$APPL_TOP/admin/prod/ctrldefs.txt logfile=adctr.log menu_option=SHOW_STATUS
Using any menu option on the command line, except for SHOW_STATUS, requires that you also use the worker_range=<range> option. See the AD Controller command line help fo
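If you script these calls, a small helper can enforce the rule that every menu option except SHOW_STATUS needs a worker range. The sketch below only assembles the argument list from the keywords shown above (interactive, defaults_file, logfile, menu_option, worker_range); the helper name is hypothetical, and the worker range is passed through as an opaque string because its syntax is not described here.

```python
from typing import List, Optional

# Menu option names as listed in the table above.
MENU_OPTIONS = {
    "ACKNOWLEDGE_QUIT",
    "INFORM_FAILURE",
    "RESTART_JOB",
    "SHOW_STATUS",
    "SHUTDOWN_WORKER",
    "START_WORKER",
}


def build_adctrl_args(defaults_file: str, logfile: str, menu_option: str,
                      worker_range: Optional[str] = None) -> List[str]:
    """Assemble a non-interactive adctrl argument list (hypothetical helper)."""
    if menu_option not in MENU_OPTIONS:
        raise ValueError(f"unknown menu option: {menu_option}")
    if menu_option != "SHOW_STATUS" and worker_range is None:
        # Every option other than SHOW_STATUS needs worker_range.
        raise ValueError(f"{menu_option} requires worker_range")
    args = [
        "adctrl",
        "interactive=n",
        f"defaults_file={defaults_file}",
        f"logfile={logfile}",
        f"menu_option={menu_option}",
    ]
    if worker_range is not None:
        # The range string is not validated; check adctrl's own help for its format.
        args.append(f"worker_range={worker_range}")
    return args


if __name__ == "__main__":
    print(" ".join(build_adctrl_args("ctrldefs.txt", "adctr.log", "SHOW_STATUS")))
```

The helper only builds the argument list; it does not run adctrl or expand environment variables such as $APPL_TOP.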

Thursday 15 October 2015

CMCLEAN EXECUTION

What does this script do?

I’m sure one of the most popular scripts for Apps DBAs on My Oracle Support is cmclean.sql from MOS Article ID 134007.1 “Concurrent Processing – CMCLEAN.SQL – Non Destructive Script to Clean Concurrent Manager Tables”. DBAs usually use the script to clean up stale data from concurrent processing tables (FND_CONCURRENT_%) after incidents like a crash of the database or concurrent processing node. This script sets correct completion phase and status codes for terminated concurrent requests and sets correct control codes for terminated concurrent manager processes. Despite the reassuring “Non Destructive” claim in the title of the MOS Article, it is possible to lose concurrent request schedules when cmclean.sql is executed.
First of all it’s important to understand how scheduled concurrent requests are executed and resubmitted. A simplified process of the execution is:
  1. Concurrent manager process (e.g. FNDLIBR in case of Standard Manager) queries the FND_CONCURRENT_REQUESTS table for pending requests.
  2. When a pending request is found, the manager process updates the PHASE_CODE=R (Running) and STATUS_CODE=R (Running).
  3. The next step is to start the executable of the concurrent program. If it’s a PL/SQL procedure, FNDLIBR connects to the DB and executes the PL/SQL code; if it’s a Java program, FNDLIBR starts up a Java process to execute the Java class, etc.
  4. FNDLIBR catches the exit codes from the executable of the concurrent program and updates the statuses in FND_CONCURRENT_REQUESTS accordingly – PHASE_CODE=C (Completed) and STATUS_CODE = C (Normal), G (Warning) or E (Error).
  5. FNDLIBR checks if the concurrent request has a schedule and needs to be resubmitted. If yes – it resubmits a new concurrent request with the same parameters.
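The five steps can be mimicked with a toy model to make the state changes explicit. This is a simplified illustration of the description above, not how FNDLIBR is implemented: the Request class, the has_schedule flag, and the use of "P" as the phase code of the resubmitted request are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Completion status codes named in step 4.
STATUS_BY_OUTCOME = {"normal": "C", "warning": "G", "error": "E"}


@dataclass(frozen=True)
class Request:
    request_id: int
    phase_code: str
    status_code: Optional[str]
    has_schedule: bool = False  # hypothetical flag standing in for real schedule data


def start(request: Request) -> Request:
    """Step 2: mark a picked-up request as Running/Running."""
    return replace(request, phase_code="R", status_code="R")


def finish(request: Request, outcome: str,
           next_request_id: int) -> Tuple[Request, Optional[Request]]:
    """Steps 4 and 5: record the completion status, resubmit if scheduled."""
    done = replace(request, phase_code="C", status_code=STATUS_BY_OUTCOME[outcome])
    resubmitted = None
    if request.has_schedule:
        # "P" is an assumed code for a pending phase; the steps above name only R and C.
        resubmitted = Request(next_request_id, "P", None, has_schedule=True)
    return done, resubmitted
```

In this model, if the manager dies after start() and finish() is never called, the request stays at R/R and no new request is created.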
But what happens if the FNDLIBR process crashes, terminates or gets killed while it’s running a concurrent request? Who takes care of the statuses in the FND_CONCURRENT_REQUESTS table, and how is the request resubmitted if the concurrent manager process is not there anymore?
It appears the Internal Concurrent Manager (ICM) takes care of these tasks. It checks the running requests periodically (every two minutes by default) and if it finds any that are missing the concurrent manager process and the DB session, it updates the statuses for the concurrent request and also resubmits it if it has a schedule. This action is followed by a log entry in the ICM log file:
                   Process monitor session started : 17-JUL-2013 04:24:24
Found running request 5829148 attached to dead manager process.
Setting request status to completed.
Found dead process: spid=(15160), cpid=(2032540), ORA pid=(35), manager=(0/0)
Starting STANDARD Concurrent Manager               : 17-JUL-2013 04:24:25
                     Process monitor session ended : 17-JUL-2013 04:24:25
It is interesting to note that if the Internal Concurrent Manager is terminated at the same time as the manager process and is restarted later by the reviver process or by running “adcmctl.sh start” manually, the ICM performs the same check of running requests as part of the startup sequence, but this time it restarts the request instead of terminating and resubmitting it. The log of the ICM contains the following lines:
Found running request 5829146 attached to dead manager process.
Attempting to restart request.
The concurrent request is started again with exactly the same request_id as the previous time it was terminated, and the log file of the request will contain information from 2 executions: the 1st, which didn’t complete, and then the 2nd, which probably completed. I think this scenario is very confusing; instead of restarting the request, it would be better to terminate it and submit a new one.
Let’s get back to the problem with cmclean.sql! The worst thing that can be done is running cmclean.sql after the crash of the concurrent processing node before starting up the concurrent managers. Why? Because cmclean.sql cleans up data in FND_CONCURRENT_REQUESTS by executing one simple update statement to change the phase and status of any “Running” or “Terminating” request to “Completed/Error”:
UPDATE fnd_concurrent_requests
SET phase_code = 'C', status_code = 'E'
WHERE status_code ='T' OR phase_code = 'R';
Cmclean.sql does not resubmit the request if it has a schedule. Execute it and you risk losing some scheduled programs without any warning.
Similarly, never run cmclean.sql if you stopped the concurrent managers using “adcmctl.sh abort” or “kill -9” on concurrent manager processes to speed up the shutdown procedure. There’s the same risk of losing some scheduled requests.
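One way to reduce the risk is to record which rows the update would touch before it runs. The sketch below shows the idea against a toy in-memory SQLite table, not a real e-Business Suite database: the table holds only three columns plus a hypothetical has_schedule flag (the real table stores schedule information differently), and the WHERE clause is copied from the update statement above.

```python
import sqlite3

# Toy stand-in for FND_CONCURRENT_REQUESTS; has_schedule is a hypothetical column.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE fnd_concurrent_requests (
           request_id   INTEGER PRIMARY KEY,
           phase_code   TEXT,
           status_code  TEXT,
           has_schedule INTEGER)"""
)
conn.executemany(
    "INSERT INTO fnd_concurrent_requests VALUES (?, ?, ?, ?)",
    [
        (101, "R", "R", 1),  # running, scheduled: its schedule would be lost
        (102, "R", "R", 0),  # running, one-off request
        (103, "C", "C", 1),  # already completed: untouched by the update
        (104, "R", "T", 0),  # terminating
    ],
)

# Same filter as the update statement quoted above.
CMCLEAN_FILTER = "status_code = 'T' OR phase_code = 'R'"


def requests_cmclean_would_touch(connection):
    """Return (request_id, phase, status, has_schedule) rows matching the filter."""
    query = (
        "SELECT request_id, phase_code, status_code, has_schedule "
        f"FROM fnd_concurrent_requests WHERE {CMCLEAN_FILTER} ORDER BY request_id"
    )
    return connection.execute(query).fetchall()


# Capture the affected rows first, then apply the update.
affected = requests_cmclean_would_touch(conn)
to_resubmit = [row[0] for row in affected if row[3]]
conn.execute(
    "UPDATE fnd_concurrent_requests SET phase_code = 'C', status_code = 'E' "
    f"WHERE {CMCLEAN_FILTER}"
)
print("Resubmit manually:", to_resubmit)
```

The list saved before the update is what lets you resubmit the scheduled requests afterwards; once the rows are at Completed/Error, nothing marks them as interrupted runs any more.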
Despite the risks, cmclean.sql is still a useful tool in case concurrent managers don’t come up after a failure or there is some stale data that is otherwise not cleaned up. But please, be careful when you run it! Check closely the list of requests reported in the following section of the output from cmclean.sql, because these requests have to be resubmitted manually if they had schedules.
-- Updating any Running or Terminating requests to Completed/Error
Request ID Phase  Status
---------- ------ ------
6607       R      W
6700       R      W
893534056  R      R
3 rows updated.

“Concurrent Manager Recovery” wizard is even worse! 

After posting this article I started thinking about whether the “Concurrent Manager Recovery” Wizard available from Oracle Applications Manager in e-Business Suite was any better than cmclean.sql or not. As I didn’t have much experience with it I decided to give it a try. This is what I did:
  1. I scheduled 2 concurrent programs (“CP Java Regression Test” and “CP PLSQL Regression Test”) to restart 1 minute after the previous execution completes. These are simple test concurrent programs which sleep for some time and then complete.
  2. I made sure both programs were running and terminated all concurrent manager processes and DB sessions for these concurrent programs.
  3. The termination of the processes and sessions left the rows in FND_CONCURRENT_REQUESTS with PHASE_CODE=R and STATUS_CODE=R
  4. I executed the “Concurrent Manager Recovery” wizard which fixed the status codes of the concurrent manager processes, but didn’t touch the statuses of the concurrent requests – I thought this was a good thing (I expected the ICM to clean up the statuses and resubmit the requests at its startup phase)
  5. I started up the concurrent managers, but ICM didn’t clean up the 2 stale records in the FND_CONCURRENT_REQUESTS table. The 2 requests appeared as if they were running, while in fact they didn’t have any OS processes or DB sessions.
I didn’t have much time to look into the details, but it looks like the ICM is only cleaning up requests attached to dead managers (“Active” status in the FND_CONCURRENT_PROCESSES table and no OS processes running). Here, the Wizard updated the statuses of the manager processes as if they completed normally, so the ICM couldn’t identify them as being “dead”.
This actually means that the “Concurrent Manager Recovery” wizard can cause serious issues too: it doesn’t clear up the concurrent request statuses and it prevents ICM from doing it too, so once we start up the system the terminated requests appear as if they were running. Because of this, the Conflict Resolution Manager might prevent execution of some other programs that have incompatibility rules against the terminated requests. You will need to stop the managers and run cmclean.sql to fix the statuses (and lose the schedules) to get out of this situation.
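The behaviour I observed fits a simple selection rule, sketched below. It is my guess at the logic based on the test above, not Oracle's implementation; the function name, the dictionaries, and the "Completed" label for what the Wizard writes are made up for the illustration.

```python
from typing import Dict, List, Set


def requests_icm_would_clean(running_requests: Dict[int, int],
                             manager_status: Dict[int, str],
                             live_manager_pids: Set[int]) -> List[int]:
    """Guess at which running requests the ICM treats as orphaned.

    running_requests maps request_id -> id of the manager process running it.
    manager_status maps manager process id -> status recorded in the table.
    live_manager_pids holds the manager ids that still have an OS process.
    """
    return sorted(
        request_id
        for request_id, manager_id in running_requests.items()
        # A "dead" manager is still recorded as Active but has no OS process.
        if manager_status.get(manager_id) == "Active"
        and manager_id not in live_manager_pids
    )


running = {5829146: 10, 5829148: 11}

# After a crash: managers still recorded as Active, no OS processes left.
print(requests_icm_would_clean(running, {10: "Active", 11: "Active"}, set()))

# After the Wizard: managers marked as finished, so nothing is selected.
print(requests_icm_would_clean(running, {10: "Completed", 11: "Completed"}, set()))
```

Under this rule the second call selects nothing, which matches the two requests that stayed at Running/Running in my test.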