Verifying the SAS Environment

Verifying SAS Grid Metadata

To operate properly, SAS must retrieve metadata about the grid from a SAS Metadata Server. Start SAS Management Console and use the Server Manager plug-in to verify the following:
Logical grid server
Under the SAS Application Server context (for example, SASApp), verify that a logical grid server has been defined.
Open the Properties window for the logical grid server. Verify that the properties contain the correct path to the script file, or the correct command, that is executed on the grid node. Verify that the path exists, and that the command is valid, on every node in the grid.
Grid monitoring server
Verify that a grid monitoring server has been defined.
Open the connection properties for the server. Verify that the properties contain the name or address of the machine that is running the Grid Management Service daemon (typically the grid control machine). Verify that the port specified in the properties matches the port specified in the Grid Management Service configuration file (the default is 1976).
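One quick way to confirm that the host name and port in the connection properties reach a listening daemon is a plain TCP connection attempt. The Python sketch below is a hypothetical helper, not a SAS or Platform tool, and it only shows that something accepts connections on that port, not that it is the Grid Management Service. A refused connection usually means nothing is listening on the port (for example, the service is stopped); a timeout suggests a wrong address, an unreachable host, or filtered traffic.

```python
import socket

# Default port stated in the text; pass a different value if your
# Grid Management Service configuration file specifies one.
DEFAULT_GRID_MONITORING_PORT = 1976


def probe_grid_monitoring_server(host: str,
                                 port: int = DEFAULT_GRID_MONITORING_PORT,
                                 timeout: float = 5.0) -> str:
    """Attempt a TCP connection and classify the result.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host name and connects within `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but nothing is listening on the port.
        return "refused"
    except socket.timeout:
        # No answer in time: host down, wrong address, or traffic dropped.
        return "timeout"
    except OSError as exc:
        # Name-resolution failures, unreachable networks, and similar errors.
        return f"error: {exc}"
```

For example, `probe_grid_monitoring_server("gridctl.example.com")` probes port 1976 on a placeholder host name; substitute the name or address from your connection properties.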

Verifying Grid Monitoring

The Grid Manager plug-in for SAS Management Console displays information about the grid's jobs, hosts, and queues. After you define the Grid Monitoring Server in metadata and the Grid Management Service is running on the grid control machine, the plug-in displays grid information. Common error messages in the Grid Manager plug-in include the following:
Connection timed out or Connection refused
The Grid Management Service is not running. Start the Grid Management Service on the grid control machine.
Your user ID or password is invalid. Please try again or contact your systems administrator
Either the user supplied invalid credentials for the machine that is running the Grid Management Service, or the credentials stored in metadata for the user do not include a password for the login in the authorization domain that the Grid Monitoring Server connection uses. For example, "Grid 1 Monitoring Server" is defined in metadata to use the "DefaultAuth" authorization domain. In User Manager, a login has been defined for "User1" in the "DefaultAuth" domain, but only the user ID was specified; the password is blank.
You can correct the problem in one of three ways: provide complete credentials (user ID and password) for the user's login in that authorization domain, remove the user's login for that authorization domain, or configure the Grid Monitoring Server connection to use a different authorization domain. If you provide complete credentials, the user is not prompted for a user ID and password. If you remove the login, or switch the connection to a different authorization domain without adding a login for the user in that domain, the user is prompted for a user ID and password when connecting to the machine where the Grid Management Service is running.
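The outcomes described above follow a simple rule based on the login stored for the connection's authorization domain. The Python sketch below models that rule; the `Login` class and the function name are hypothetical and not part of any SAS API, and the model ignores the separate case of a user typing wrong credentials at the prompt.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Login:
    """Hypothetical model of one login stored in metadata for a user."""
    auth_domain: str
    user_id: str
    password: Optional[str] = None  # None or "" means no password is stored


def monitoring_connection_outcome(logins: List[Login], connection_domain: str) -> str:
    """Predict what happens when the Grid Manager plug-in connects.

    connection_domain is the authorization domain used by the Grid
    Monitoring Server connection (for example, "DefaultAuth").
    """
    for login in logins:
        if login.auth_domain == connection_domain:
            if login.password:
                return "connect with stored credentials"
            # A login exists but its password is blank, which produces the
            # "Your user ID or password is invalid" message.
            return "fail: stored login has no password"
    # No login for this domain, so the user is asked for credentials.
    return "prompt for user ID and password"
```

In the "User1" example, `monitoring_connection_outcome([Login("DefaultAuth", "User1", "")], "DefaultAuth")` falls into the failing branch; each of the three fixes moves it into one of the other two branches.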

Verifying SAS Job Execution

SAS provides a grid test program on the SAS support website that tests connectivity to all nodes in the grid. Run the program from a grid client. You can download the program from http://support.sas.com/rnd/scalability/grid/gridfunc.html#testprog. After you download the program, follow these steps:
  1. Copy and paste the grid test program into a Foundation SAS Display Manager session.
  2. If the application server that is associated with your logical grid server in metadata is not named "SASMain", change every occurrence of "SASMain" in the test program to the name of that application server. For example, some SAS installations name the application server "SASApp"; in that case, replace every occurrence of "SASMain" with "SASApp".
  3. Submit the code.
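Step 2 is an ordinary find-and-replace that you can do in the editor. If you prefer to edit the downloaded file before pasting it, a minimal Python sketch follows. It assumes that "SASMain" appears in the program only where it names the application server; the function names and file names are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def retarget_test_program(source: str, app_server: str,
                          old_name: str = "SASMain") -> Tuple[str, int]:
    """Replace every occurrence of old_name with app_server.

    Returns the edited program text and the number of replacements made,
    so you can confirm that at least one occurrence was found.
    """
    count = source.count(old_name)
    return source.replace(old_name, app_server), count


def retarget_file(src: Path, dst: Path, app_server: str) -> int:
    """Read the downloaded program, retarget it, and write the edited copy."""
    edited, count = retarget_test_program(src.read_text(), app_server)
    dst.write_text(edited)
    return count
```

For example, `retarget_file(Path("gridtest.sas"), Path("gridtest_sasapp.sas"), "SASApp")` writes an edited copy and returns how many occurrences were replaced; a result of 0 suggests the program already uses a different name.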
The program attempts to start one remote SAS session for every job slot available in the grid. On multiprocessor or multicore machines it might start more than one job, because by default LSF assigns one job slot to each core.
Here are some problems that you might encounter when running the grid test program:
Grid Manager not licensed message
Make sure that your SAS installation data (SID) file contains a license for SAS Grid Manager.
Grid Manager cannot be loaded message
Make sure that Platform Suite for SAS is installed and that the LSF environment variables and the PATH environment variable are set correctly.
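A quick environment check can be scripted. The Python sketch below assumes a typical UNIX-style LSF setup, in which LSF_ENVDIR points to the directory that contains lsf.conf and the LSF bin directory (which holds the bsub submission command) is on PATH. Your installation might rely on other variables, so treat this as a starting point rather than a complete test.

```python
import os
import shutil
from typing import Dict, Optional


def lsf_environment_report(env: Optional[Dict[str, str]] = None) -> Dict[str, object]:
    """Summarize LSF-related environment settings.

    Assumption: LSF_ENVDIR names the directory containing lsf.conf, and the
    LSF bin directory (with bsub) is on PATH. Pass `env` to check a captured
    environment; by default the current process environment is used.
    """
    env = dict(os.environ) if env is None else env
    envdir = env.get("LSF_ENVDIR")
    return {
        "LSF_ENVDIR": envdir,
        # True only if the variable is set and lsf.conf exists in that directory.
        "lsf_conf_found": bool(envdir) and os.path.isfile(os.path.join(envdir, "lsf.conf")),
        # Search only the PATH from `env`, not the caller's PATH.
        "bsub_on_path": shutil.which("bsub", path=env.get("PATH", "")) is not None,
    }
```

Run `lsf_environment_report()` in the same shell or service account that starts SAS; a `False` value points to the setting to fix.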
Invalid resource requested message
The application server name or workload value has not been defined as a resource in the lsf.shared file. Define it there, and in the lsf.cluster.<cluster_name> file, associate the value with each host that should run SAS programs.
The number of grid nodes is 0.
Possible reasons for this error include the following:
  • The application server name was not defined as a resource name in the lsf.shared file.
  • The application server name was not associated with any grid nodes in the lsf.cluster.<cluster_name> file.
  • The grid client where the job was submitted cannot communicate with the entire grid.
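You can check the first two causes by reading the LSF configuration files directly. The Python sketch below assumes the usual LSF layout: lsf.shared declares resources in a Begin Resource/End Resource section whose first column is the resource name, and lsf.cluster.<cluster_name> lists hosts in a Begin Host/End Host section with a parenthesized RESOURCES column. It is a heuristic text check, not an LSF tool. It treats a host as associated if the name appears inside any parenthesized field on that host's row. LSF's own commands (for example, lshosts) show what LSF actually loaded and are more authoritative.

```python
import re
from typing import List, Set


def section_rows(text: str, section: str) -> List[str]:
    """Return the data rows of every 'Begin <section>' ... 'End <section>' block.

    Comments (from '#' to end of line) and blank lines are dropped, and the
    first row of each block is skipped because it is the column header.
    """
    rows: List[str] = []
    inside = expect_header = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        keyword = [t.lower() for t in line.split()[:2]]
        if keyword == ["begin", section.lower()]:
            inside, expect_header = True, True
        elif keyword == ["end", section.lower()]:
            inside = False
        elif inside and expect_header:
            expect_header = False  # skip the column-header row
        elif inside:
            rows.append(line)
    return rows


def resource_defined(lsf_shared: str, name: str) -> bool:
    """True if `name` is the first column of a row in lsf.shared's Resource section."""
    return any(row.split()[0] == name for row in section_rows(lsf_shared, "Resource"))


def hosts_with_resource(lsf_cluster: str, name: str) -> Set[str]:
    """Hosts whose row in lsf.cluster.<cluster_name> lists `name` in a parenthesized field."""
    hosts: Set[str] = set()
    for row in section_rows(lsf_cluster, "Host"):
        fields = re.findall(r"\(([^)]*)\)", row)
        if any(name in field.split() for field in fields):
            hosts.add(row.split()[0])
    return hosts
```

Pass each file's text (for example, `open("lsf.shared").read()`) together with your application server name. A `False` from `resource_defined` matches the first cause above, and an empty set from `hosts_with_resource` matches the second.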
The number of grid nodes is not the same as the number of grid node machines.
As shipped, the grid test program counts each job slot as a grid node, so the number of grid nodes it reports equals the number of job slots in the grid. By default, LSF assigns one job slot per core, but the number of job slots for a grid node can be changed.
Another explanation is that the application server name has not been associated with all the grid nodes in the lsf.cluster.<cluster_name> file.
Jobs fail to start.
Possible reasons for this problem include the following:
  • The grid command defined in the logical grid server metadata is either not valid on the grid nodes or does not start SAS when it is run. To verify the command, log on to a grid node and run the command defined in the logical grid server definition. The command should attempt to start a SAS session on the grid node, although the session might not run successfully because the grid parameters are not included. Platform Suite for SAS returns code 127 if the command is not found, and code 128 if the command is found but there is a problem executing it.
  • An unsupported version of SAS is installed on the grid nodes. SAS 9.1.3 Service Pack 3 is the minimum supported version. A return code of 231 might be associated with this problem.
  • The grid client cannot communicate with the grid nodes. Verify that the network is set up properly, as described in Verifying the Network Setup.
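When a job fails to start, the return code narrows down the causes listed above. The Python sketch below encodes the codes named in this section as a lookup table; the table, the helper name, and the hint wording are illustrative. The 231 entry is phrased as a possibility because the text says only that the code might be associated with an unsupported SAS version.

```python
from typing import Dict

# Return codes named in this section and their likely causes.
JOB_START_HINTS: Dict[int, str] = {
    127: "Command not found: check the command or script path in the logical "
         "grid server definition on every grid node.",
    128: "Command found but could not be executed: log on to a grid node and "
         "run it manually to see the error.",
    231: "Possibly an unsupported SAS version on the grid node "
         "(SAS 9.1.3 Service Pack 3 is the minimum).",
}


def explain_job_start_rc(rc: int) -> str:
    """Return a troubleshooting hint for a job-start return code."""
    return JOB_START_HINTS.get(
        rc,
        f"No hint recorded for return code {rc}; check network connectivity "
        "between the grid client and the grid nodes.",
    )
```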
Jobs run on machines that are supposed to be only grid clients.
By default, all machines that are listed in the lsf.cluster.<cluster_name> file are part of the grid and can process jobs. If you want a machine to submit jobs to the grid (a grid client) but not process them (a grid node), set its maximum number of job slots to 0 or use the Grid Manager plug-in to close the host.
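The node-count arithmetic described in this section fits in a few lines. The Python sketch below is a hypothetical model, not an LSF interface. Each host contributes one job slot per core unless its maximum job slots have been overridden, a host set to 0 slots (a client-only machine) contributes nothing, and only hosts associated with the application server resource are counted. If the grid test program reports a different number, comparing the two can help you tell a slot-configuration difference from a missing resource association.

```python
from typing import Dict, Iterable, Optional


def expected_grid_node_count(cores: Dict[str, int],
                             app_server_hosts: Iterable[str],
                             max_slots: Optional[Dict[str, int]] = None) -> int:
    """Number of job slots the grid test program should report as grid nodes.

    cores: host name mapped to its number of cores (the default slot count).
    app_server_hosts: hosts associated with the application server resource.
    max_slots: optional per-host overrides of the maximum job slots;
        0 marks a host that only submits jobs (a grid client).
    """
    overrides = max_slots or {}
    associated = set(app_server_hosts)
    return sum(
        overrides.get(host, n_cores)  # an override wins; otherwise one slot per core
        for host, n_cores in cores.items()
        if host in associated  # hosts without the resource are not counted
    )
```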