Scalability & Performance Notes and Questions

Notes and Frequently Asked Questions about Scheduling

Troubleshooting Scheduling Problems


Privilege assignments cause the greatest number of problems when you schedule jobs. Therefore, you should use Platform tools and log files to isolate a problem, verify that the command line runs correctly from an operating environment command prompt, and examine the output from your job.

Foundation SAS exits with warnings that cause an Exit instead of a Done status when using Platform Computing's scheduling servers

Platform LSF treats any nonzero exit code as Exit instead of Done. Here are two common ways to resolve this issue: use a job starter, or use Platform LSF in conjunction with the sasbatch script from a TUE install. The TUE install creates the sasbatch script to run SAS in batch mode.

For Windows installation, you can modify the script as follows:

  if not {%username%}=={} (
    call sas.bat %*
  ) else (
    call sas.bat -sasuser work %*
  )

  set rc=%ERRORLEVEL%
  if %rc%==1 goto makenormalexit

  exit %rc%

  :makenormalexit
  exit 0

For UNIX installation, you can modify the script as follows:

  sas "$@"
  rc=$?
  if [ $rc -eq 1 ]; then
    exit 0
  else
    exit $rc
  fi

The job starter is similar to an automatic job wrapper and is responsible for starting the task.
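The wrapper idea can be sketched in a few lines of shell. This is only an illustration: run_as_job_starter is a hypothetical function name, and the sh -c commands stand in for a SAS invocation. A real job starter script would use exit rather than return.

```shell
#!/bin/sh
# Sketch of a job starter: run the wrapped command, then translate its
# return code so that SAS warnings (1) count as success.
# run_as_job_starter is a hypothetical name, not part of Platform LSF.
run_as_job_starter() {
  starter_rc=0
  "$@" || starter_rc=$?            # start the task exactly as passed in
  if [ "$starter_rc" -le 1 ]; then # 0 = normal, 1 = warnings
    return 0
  fi
  return "$starter_rc"             # 2 or greater: report failure unchanged
}

# Stand-ins for SAS: small shell commands that exit with a chosen code.
status=0
run_as_job_starter sh -c 'exit 1' || status=$?
echo "warning run -> $status"

status=0
run_as_job_starter sh -c 'exit 2' || status=$?
echo "error run -> $status"
```

The first run reports 0 and the second reports 2, which matches the behavior that LSF needs in order to mark a job with warnings as Done.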

The job starter for Windows that is described below checks the ERRORLEVEL returned by SAS. If the ERRORLEVEL is 2 or greater, it exits with that error level. If the ERRORLEVEL is 0 or 1, it exits with 0. This addresses the issue of warnings in the SAS program log. The steps and the script example that follow cover the Windows implementation and its test cases.

To use these test programs, do the following:

  1. Copy run_sas_batch.cmd to a directory, for example,
  2. Open the lsb.queues file in an editor, for example,
  3. Add the following lines to _all_ the queue definitions:
    Begin Queue
    QUEUE_NAME = normal
    JOB_STARTER = D:\SAS_files\scripts\run_sas_batch.cmd    #<-- you add this
    End Queue
    Begin Queue
    QUEUE_NAME = priority
    JOB_STARTER = D:\SAS_files\scripts\run_sas_batch.cmd
    End Queue
    and so on.
  4. After you have modified the configuration, tell the batch system to re-read the configuration by running the command BADMIN RECONFIG. You will need to run this command as one of the Platform LSF administrator accounts. Alternatively, you can stop all the Platform LSF services, and then re-start them.
  5. Test this. If you use a SAS program such as the add
    date test;
    Call it in a .bat file.
    "c:\program files\...\sas.exe" D:\SAS_files\scripts\ -noterminal
    Submit it.
    bsub -o warn.txt warn.bat
    You will see that the job is in a DONE state when it is finished.
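Because step 3 asks for the JOB_STARTER line in every queue definition, it can help to check the file before reconfiguring. The following is a minimal sketch, assuming an lsb.queues file that uses the Begin Queue / End Queue layout shown above. The function name queues_missing_job_starter is hypothetical.

```shell
#!/bin/sh
# Print the name of every queue in an lsb.queues-style file that has no
# JOB_STARTER line. $1 is the path to the file.
queues_missing_job_starter() {
  awk '
    /^[ \t]*Begin[ \t]+Queue/ { in_queue = 1; name = ""; has_starter = 0; next }
    /^[ \t]*End[ \t]+Queue/ {
      if (in_queue && !has_starter) print name
      in_queue = 0
      next
    }
    in_queue && /^[ \t]*QUEUE_NAME[ \t]*=/ {
      line = $0
      sub(/^[^=]*=[ \t]*/, "", line)    # drop everything up to "="
      sub(/[ \t]*(#.*)?$/, "", line)    # drop trailing blanks and comments
      name = line
    }
    in_queue && /^[ \t]*JOB_STARTER[ \t]*=/ { has_starter = 1 }
  ' "$1"
}

# Usage (the path is a placeholder for your own lsb.queues file):
#   queues_missing_job_starter /path/to/lsb.queues
```

An empty result means that every queue in the file already defines a job starter.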

Note: The run_sas_batch.cmd script relies on the command having the correct file extension. Be sure that your command, which is defined in your SAS Batch Server definition, has the correct extension, for example, .bat, .cmd, or .exe. Alternatively, you can add call in front of the :program section in run_sas_batch if your command is a .bat or .cmd file.

Windows JobStarter Script Example

rem echo off
rem This script wraps a sas.exe invocation and analyzes the ERRORLEVEL
rem set by sas.exe to determine whether or not the sas program has
rem failed or not, and return this information to LSF using 'exit'
rem (since LSF does not interpret ERRORLEVEL).
rem Errorlevels set by SAS:
rem Condition                              Return code
rem =========                              ===========
rem All steps terminated normally               0
rem SAS System issued warning(s)                1
rem SAS issued error(s)                         2
rem User issued the ABORT statement             3
rem User issued the ABORT RETURN statement      4
rem User issued the ABORT ABEND statement       5
rem SAS internal error                          6
rem Any error codes above 6 are returned as a result of using the ABORT
rem statement with a numeric argument. If your program calls ABORT with
rem return codes above 6, you'll need to modify the script.
rem If the error condition is SUCCESS or WARNING, then the script will
rem exit with a zero exit code, thus indicating success to LSF. If the
rem error condition is ERROR, INFORMATIONAL or FATAL, the script will
rem exit with the provided ERRORLEVEL, thus indicating failure to LSF.
rem The script also distinguishes .cmd and .bat script files and runs
rem them using 'call' so that the exit from the script does not exit the
rem entire shell (and thus stop the execution of this script).

rem Check for at least one argument (command to run)
if X%1 == X goto noarg

rem Check if the program to run is a script or not
rem If there is no file extension, assume a regular program
echo "Checking extension ...."
if X%~x1 == X goto program

echo "Is it a cmd file...."
if %~x1 == .cmd goto script

echo "Is it a bat file...."
if %~x1 == .bat goto script

:program
        echo "Running program %1...."
        %*
        goto ran

:script
        echo "Running script %1...."
        call %*
        goto ran

:ran
        echo Errorlevel from %1 is %ERRORLEVEL%
        if %ERRORLEVEL% GEQ 7 goto unknown
        goto level%ERRORLEVEL%

:level0
        echo All steps terminated normally.
        goto success

:level1
        echo SAS issued warning(s).
        goto success

:level2
        echo SAS issued error(s).
        goto failure

:level3
        echo User issued the ABORT statement.
        goto failure

:level4
        echo User issued the ABORT RETURN statement.
        goto failure

:level5
        echo User issued the ABORT ABEND statement.
        goto failure

:level6
        echo SAS internal error.
        goto failure

:unknown
        echo Unknown ERRORLEVEL %ERRORLEVEL%.
        goto failure

:failure
        exit %ERRORLEVEL%

:success
        exit 0

:noarg
        echo "Usage: %0 command [arguments]"
        exit 127

UNIX JobStarter Script Example

#! /bin/ksh

# This script wraps a sas invocation and analyzes the return code
# set by sas to determine whether or not the sas program has
# failed or not, and return this information to LSF using 'exit'
# Errorlevels set by SAS:
# Condition                             Return code
# =========                             ===========
# All steps terminated normally             0
# SAS System issued warning(s)              1
# SAS issued error(s)                       2
# User issued the ABORT statement           3
# User issued the ABORT RETURN statement    4
# User issued the ABORT ABEND statement     5
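# SAS internal error                        6

# run the command that LSF passes to the job starter and save its return code
"$@"
rc=$?

# if it exits with 1, make it 0; otherwise exit with the same value
if [ $rc -eq 1 ]; then
  exit 0
else
  exit $rc
fi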
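The Windows job starter prints a message for each return code, and the UNIX script does not. The following sketch shows one way to add the same messages on UNIX. The function name describe_sas_rc is hypothetical, and the messages are taken from the return code table above.

```shell
#!/bin/sh
# Translate a SAS return code into the message from the table above.
# describe_sas_rc is a hypothetical helper name.
describe_sas_rc() {
  case "$1" in
    0) echo "All steps terminated normally" ;;
    1) echo "SAS System issued warning(s)" ;;
    2) echo "SAS issued error(s)" ;;
    3) echo "User issued the ABORT statement" ;;
    4) echo "User issued the ABORT RETURN statement" ;;
    5) echo "User issued the ABORT ABEND statement" ;;
    6) echo "SAS internal error" ;;
    *) echo "Unknown return code $1" ;;
  esac
}

# Usage inside a job starter, just before the exit logic:
#   describe_sas_rc "$rc" >&2
```

Writing the message to standard error keeps it out of the job's normal output while still making it visible in the file that is named with bsub -o or -e.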

Diagnostic Output

Process Manager (PM), JobScheduler (JS), and LSF each have a log directory to which they write logging information. The default locations are <PM_TOP>/log, <JS_TOP>/log, and <LSF_TOP>/logs, respectively. PM and JS output is written to the jfd.log.<host> file. LSF output is written to the log file of the LSF daemon that produced it.

You can change the configuration files (js.conf or lsf.conf) to have more diagnostic messages printed out. PM/JS logging is controlled by the parameter JS_LOG_MASK. The default value for JS_LOG_MASK is LOG_NOTICE. Debug settings are LOG_DEBUG1, LOG_DEBUG2, LOG_DEBUG3. LSF logging is controlled by a series of options (LSB_DEBUG*).

Logging for the Platform LSF daemons is controlled by the parameter LSF_LOG_MASK. The value of this parameter can be any log priority symbol that is defined in /usr/include/sys/syslog.h. The default value for LSF_LOG_MASK is LOG_WARNING. You can temporarily set the message log level by using the following commands:

jreconfigdebug -l debug_level

lsadmin limdebug [-c class_name] [-l debug_level] [-f logfile_name] [-o] [host_name]
lsadmin resdebug [-c class_name] [-l debug_level] [-f logfile_name] [-o] [host_name]
badmin mbddebug [-c class_name] [-l debug_level] [-f logfile_name] [-o]
badmin sbddebug [-c class_name] [-l debug_level] [-f logfile_name] [-o] [host_name]
In these commands, -o resets the daemon to its starting state.
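Before you raise the level temporarily, it can be useful to know which level is configured persistently. The following is a minimal sketch, assuming an lsf.conf file made of NAME=value lines. The function name configured_log_mask is hypothetical, and the fallback value is the LOG_WARNING default mentioned above.

```shell
#!/bin/sh
# Print the LSF_LOG_MASK value from an lsf.conf-style file, or the
# documented default LOG_WARNING when the parameter is not set.
# configured_log_mask is a hypothetical helper name.
configured_log_mask() {
  awk -F= '
    /^[ \t]*#/ { next }                 # skip comment lines
    {
      key = $1
      gsub(/[ \t]/, "", key)
    }
    key == "LSF_LOG_MASK" {
      val = $2
      sub(/#.*/, "", val)               # drop a trailing comment
      gsub(/[ \t"]/, "", val)           # strip blanks and quotation marks
      found = val                       # assumption: keep the last one seen
    }
    END { print (found != "" ? found : "LOG_WARNING") }
  ' "$1"
}

# Usage (the path is a placeholder for your own configuration file):
#   configured_log_mask /path/to/lsf.conf
```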

You can add -DDebug to the Java command line that invokes SAS Management Console. This causes JS to log information about the communication between the client (SAS Management Console) and the server. The output is written to the errorlog.txt file that SAS Management Console generates.

Windows Specific

The Windows security policy has some unique requirements. To install the Platform Computing software correctly, you must provide a valid user ID and password to run and administer the services. The user ID that runs the installation program must have the "Act as part of the operating system" privilege assigned to it. The user ID under which the services run must have the "Log on as a batch job" privilege assigned to it.

Problems can also occur because a password has expired or the wrong domain name was provided. Often these user IDs are not your usual user ID. One simple way to test the domain, user ID, and password is as follows:

  1. Bring up a DOS command prompt.
  2. Issue the RUNAS command to bring up a new DOS command prompt running as the other user ID
    -->runas /user:DOMAIN\userid cmd
  3. Type the password, and a new DOS command prompt should be running

You can use this new DOS command prompt to run the various scheduled commands that are failing to find out if they work from the DOS prompt. If they run, you know there is a problem with the scheduler setup. If they don’t run, then it’s probable that you will have additional information on the console of the DOS command window to help you find the problem.

In a multi-user environment in which you want more than one user submitting and running flows, there are privilege settings on folders that need to be in place. The scheduling server folders should already be set for service and administrator accounts and need no further changes. Verify that the LSF installed files have the following privileges:

Folder                       User Group             Privileges
======                       ==========             ==========
LSFTOP\work                  LSF service accounts   full control (All) (All)
LSFTOP\work                  LSF administrators     full control (All) (All)
LSFTOP\work                  Everyone               special access (R) (R)
LSFTOP\logs                  LSF service accounts   full control (All) (All)
LSFTOP\logs                  LSF administrators     full control (All) (All)
LSFTOP\logs                  Everyone               special access (R) (R)
LSFTOP\conf\lsfuser.passwd   JS service accounts    special access (R) (R)

Verify that the scheduling servers installed files have the following privileges:

Folder       User Group            Privileges
======       ==========            ==========
JSTOP\work   JS service accounts   full control (All) (All)
JSTOP\work   JS administrators     full control (All) (All)
JSTOP\work   Everyone              special access (R) (R)
JSTOP\log    JS service accounts   full control (All) (All)
JSTOP\log    JS administrators     full control (All) (All)
JSTOP\log    Everyone              special access (R) (R)

AIX Specific

The AIX environment is unique in that you can configure your kernel to be in either 32-bit mode or 64-bit mode, and your 64-bit applications will run with either mode. Platform Computing software requires that the kernel be in 64-bit mode for their 64-bit application. This means that you will need the kernel in 64-bit mode in order to use Platform Computing’s scheduling servers and Platform LSF under the AIX environment.

To determine if your kernel is in 64-bit mode, use the LSCONF command. Here is an example of a 32-bit kernel running with a 64-bit kernel installed.

   $ lsconf -k
   Kernel Type: 32-bit
   $ lslpp -l | grep bos | grep 64
     bos.64bit         COMMITTED  Base Operating System 64 bit
     bos.mp64          COMMITTED  Base Operating System 64-bit
     bos.64bit         COMMITTED  Base Operating System 64 bit
     bos.mp64          COMMITTED  Base Operating System 64-bit
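The check can be scripted by reading the Kernel Type line. The following is a minimal sketch, assuming output in the form shown above. The function name kernel_bits is hypothetical.

```shell
#!/bin/sh
# Read "lsconf -k"-style text on standard input and print 32 or 64.
# kernel_bits is a hypothetical helper name.
kernel_bits() {
  sed -n 's/^ *Kernel Type: *\([0-9][0-9]*\)-bit.*/\1/p'
}

# Usage on an AIX host:
#   if [ "$(lsconf -k | kernel_bits)" != "64" ]; then
#     echo "Kernel is not in 64-bit mode" >&2
#   fi
```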

To switch to 64-bit mode, the system must be restarted with the 64-bit kernel. Ask your System Administrator to do this for you. The IBM AIX documentation explains how to boot in 64-bit mode.

Platform Computing's Scheduling Servers and Platform LSF Specific

Q: What does the following message mean when I run the LSADMIN CKCONFIG -V command?
readKernel(): read(/dev/kmem) failed, Bad address.

A: Usually, this warning message means that a binary that’s running on the system is different from its operating environment’s bit system (it might be that a 32-bit machine is running a 64-bit binary or vice versa).

Q: What does the exception below mean when I try to run Flow Manager?

Exception in thread "main" java.lang.UnsatisfiedLinkError: /usr/local/js/5.31/linux2.4-glibc2.3-ia64/jre/lib/ia64/ cannot open shared object file: No such file or directory
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(
at java.lang.ClassLoader.loadLibrary(
at java.lang.Runtime.loadLibrary0(
at java.lang.System.loadLibrary(
at Method)
at sun.awt.font.NativeFontWrapper.(
at sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
at sun.awt.X11GraphicsEnvironment.(
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(
at java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(
at java.awt.Window.init(
at java.awt.Window.(
at java.awt.Frame.(
at java.awt.Frame.(
at javax.swing.JFrame.(

A: The preceding information indicates that the required version of the stdc++ library is missing. Ask your System Administrator to install the correct version.
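On Linux, the ldd command lists the shared libraries that a binary or library needs and marks the ones that cannot be resolved as "not found". The following sketch filters that output. The function name missing_libs is hypothetical, and the path in the usage comment is a placeholder for the file named in the exception message.

```shell
#!/bin/sh
# Read ldd-style output on standard input and print the name of each
# library that is reported as "not found".
# missing_libs is a hypothetical helper name.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Usage:
#   ldd /path/to/the/library/from/the/exception | missing_libs
```

Any name that the function prints is a library to request from your System Administrator.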