Restarting Jobs

Overview

Using SAS Grid Manager, you can restart jobs that have been submitted to the grid. A job can be restarted either when the server that is running the job stops responding or when the job ends with a specified return code.
The restart capability is available only if you are using the SAS Grid Manager Client Utility or are scheduling grid jobs. It is not available if you are using the grid to start remote SAS/CONNECT servers.

Restarting Unresponsive Jobs

You can automatically restart a job when the server on which the job is running stops responding. To use this capability:
  • add checkpoints to the SAS programs you submit to the grid
  • submit the programs to the grid by using the SAS Grid Manager Client Utility, and specify the -GRIDRESTARTOK argument
If the host that is running the job becomes unresponsive, the program is automatically restarted at the last checkpoint.
Note: This function is available only if you are using Platform Suite for SAS as the grid middleware.
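The submission described above can be sketched as a small Python helper that assembles the command line for the client utility. This is only an illustrative sketch: the executable name sasgsub and the -GRIDSUBMITPGM argument are assumptions that do not appear in this section, and the helper name build_restartable_submit is hypothetical. Only -GRIDRESTARTOK comes from the text. Check the argument names against the client utility syntax for your release before using them.

```python
def build_restartable_submit(program_path, utility="sasgsub"):
    """Assemble an argument list for submitting a restartable grid job.

    Assumptions (not confirmed by this section): the client utility is
    invoked as "sasgsub" and takes the program through -GRIDSUBMITPGM.
    -GRIDRESTARTOK is the argument named in the text above.
    """
    if not program_path:
        raise ValueError("a SAS program path is required")
    return [utility, "-GRIDSUBMITPGM", program_path, "-GRIDRESTARTOK"]


# Build the command without running it, so it can be reviewed first.
command = build_restartable_submit("nightly_load.sas")
print(" ".join(command))
```

Returning a list rather than a single string keeps the arguments separate, so the result could be passed to a process launcher without shell quoting problems. The program itself still needs checkpoints for the restart to resume at the last checkpoint.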

Restarting Jobs Based on Return Codes

You can set up a queue that automatically requeues and restarts any job that ends with a specified return code. To use this functionality, you must use Platform Suite for SAS as the grid middleware, and you must submit jobs with the SAS Grid Manager Client Utility. This utility is included with the second maintenance release for SAS 9.2.
To set up a queue for automatic restart, follow these steps:
  1. Create a queue, including the REQUEUE_EXIT_VALUES: return_code_a return_code_b ... return_code_n option in the queue definition. The return_code values are the job exit codes that should trigger a restart. Any job that exits with one of the specified codes is requeued and restarted.
    Note: If you specify a return_code value higher than 255, LSF uses the modulus of the value with 256. For example, if SAS returns an exit code of 999, LSF sees that value as (999 mod 256), or 231. Therefore, you must specify a value of 231 on REQUEUE_EXIT_VALUES.
  2. Specify the queue you created in step 1, either by modifying a grid server definition or by specifying the -GRIDJOBOPTS option.
    To create or modify a grid server definition, use the Server Manager plug-in in SAS Management Console. To specify the queue, enter queue=name_of_requeue_queue in the Additional Options field of the server definition.
    To use -GRIDJOBOPTS, submit the job using the -GRIDJOBOPTS queue=name_of_requeue_queue option.
  3. Submit the job to the requeue queue on the grid by using the server that you specified in step 2. You must submit the job with the SAS Grid Manager Client Utility and specify the -GRIDRESTARTOK option.
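The modulus rule in the note to step 1 is easy to get wrong by hand, so the arithmetic can be sketched in Python. The function names below are hypothetical helpers for illustration, not part of SAS or LSF. The sketch assumes non-negative exit codes and prints the result in the same REQUEUE_EXIT_VALUES notation used in step 1.

```python
def lsf_exit_code(sas_exit_code):
    """Return the exit code as LSF sees it: the value modulo 256."""
    if sas_exit_code < 0:
        raise ValueError("exit codes are expected to be non-negative")
    return sas_exit_code % 256


def requeue_exit_values(sas_exit_codes):
    """Map SAS exit codes to the values to list on REQUEUE_EXIT_VALUES.

    Duplicates are removed, because two different SAS codes can map to
    the same LSF value (for example, 2 and 258 both become 2).
    """
    return sorted({lsf_exit_code(code) for code in sas_exit_codes})


# 999 is the example from the note in step 1; 2 and 258 collide.
values = requeue_exit_values([999, 2, 258])
print("REQUEUE_EXIT_VALUES:", " ".join(str(v) for v in values))
```

Because distinct SAS exit codes can collapse to the same value, a queue configured for one code also restarts jobs that end with any other code sharing the same remainder. It is worth checking that the codes you choose do not collide with exit codes that should not trigger a restart.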