-
Copy and paste the grid
test program into a Foundation SAS Display Manager session.
-
If the application server
associated with your logical grid server in your metadata is not named
“SASMain”, change all occurrences of “SASMain”
in the test program to the name of the application server that is
associated with your logical grid server. For example, if your SAS
installation names the application server “SASApp”, replace all
occurrences of “SASMain” with “SASApp”.
-
The program attempts
to start one remote SAS session for every job slot available in the
grid. The program might start more than one job on multi-processor
machines, because LSF assigns one job slot for each core by default.
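The substitution of the application server name described above can be scripted instead of edited by hand. The following is a minimal Python sketch, not part of the SAS tooling: the function name `rename_app_server` and the sample string are hypothetical, and the sample is only a placeholder, not a line from the actual grid test program.

```python
def rename_app_server(program_text: str, new_name: str,
                      old_name: str = "SASMain") -> str:
    """Return program_text with every occurrence of old_name replaced."""
    return program_text.replace(old_name, new_name)


# Placeholder text only; paste the real test program here instead.
sample = "first mention of SASMain, second mention of SASMain"
updated = rename_app_server(sample, "SASApp")
print(updated)
```

The replacement is case-sensitive and purely textual, so check the result before pasting it into the Display Manager session.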
Here are some problems
that you might encounter when running the grid test program:
Grid Manager not licensed message
Make sure that your
SID contains a license for SAS Grid Manager.
Grid Manager cannot be loaded message
Make sure that Platform
Suite for SAS has been installed and that the LSF and PATH environment
variables are defined properly.
Invalid resource requested message
The application server
name or workload value has not been defined in the lsf.shared file.
Also, make sure that the lsf.cluster.<cluster_name> file associates
the value with the hosts on which you want to run SAS programs.
The number of grid nodes is 0.
Possible reasons for
this error include the following:
-
The application server name was
not defined as a resource name in the lsf.shared file.
-
The application server name was
not associated with any grid nodes in the lsf.cluster.<cluster_name>
file.
-
The grid client where the job was
submitted cannot communicate with the entire grid.
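The first two causes above can be checked with a small script. The following Python sketch assumes the usual LSF layout of `Begin Resource` / `End Resource` and `Begin Host` / `End Host` sections, with host resources listed in the last parenthesized group of each host row. The sample file contents, host names, and function names are hypothetical, so compare them with the files in your own installation before relying on the result.

```python
def section_rows(text, section):
    """Return non-comment rows found between 'Begin <section>' and 'End <section>'."""
    rows, inside = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.lower().split()
        if words == ["begin", section.lower()]:
            inside = True
        elif words == ["end", section.lower()]:
            inside = False
        elif inside:
            rows.append(line)
    return rows


def resource_defined(shared_text, name):
    """True if name appears in the first column of the Resource section."""
    rows = section_rows(shared_text, "Resource")
    return any(row.split()[0] == name for row in rows[1:])  # rows[0] is the header


def hosts_with_resource(cluster_text, name):
    """Return host names whose last parenthesized group lists name."""
    hosts = []
    for row in section_rows(cluster_text, "Host")[1:]:  # skip the header row
        start, end = row.rfind("("), row.rfind(")")
        resources = row[start + 1:end].split() if 0 <= start < end else []
        if name in resources:
            hosts.append(row.split()[0])
    return hosts


# Hypothetical file contents, for illustration only.
SHARED = """
Begin Resource
RESOURCENAME  TYPE     INTERVAL  INCREASING  DESCRIPTION
SASMain       Boolean  ()        ()          (SAS application server)
End Resource
"""
CLUSTER = """
Begin Host
HOSTNAME  model  type  server  r1m  mem  swp  RESOURCES
node1     !      !     1       3.5  ()   ()   (SASMain)
node2     !      !     1       3.5  ()   ()   ()
End Host
"""

print(resource_defined(SHARED, "SASMain"))
print(hosts_with_resource(CLUSTER, "SASMain"))
```

If `resource_defined` returns False, or `hosts_with_resource` returns an empty list, the grid reports zero nodes for that application server name.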
The number of grid nodes is not the same as the number of grid
node machines.
As shipped, the number
of grid nodes equals the number of job slots in the grid. By default,
the number of job slots is equal to the number of cores, but the number
of job slots for a grid node can be changed.
Another explanation
is that the application server name has not been associated with all
the grid nodes in the lsf.cluster.<cluster_name>
file.
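The arithmetic behind this count can be sketched as follows. This Python example assumes that only hosts associated with the application server name contribute job slots to the reported number of grid nodes. The host names and slot counts are invented for illustration.

```python
def expected_grid_nodes(hosts):
    """Sum job slots over hosts that carry the application server resource.

    hosts maps a host name to a (job_slots, has_resource) pair.
    """
    return sum(slots for slots, has_resource in hosts.values() if has_resource)


# Hypothetical grid: three 4-core machines, one missing the resource.
grid = {
    "node1": (4, True),
    "node2": (4, True),
    "node3": (4, False),  # not associated with the application server name
}
print(expected_grid_nodes(grid))
```

With these assumed values, the result is 8 rather than 3 (the number of machines) or 12 (the total number of cores), which matches both explanations given above.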
Jobs fail to start.
Possible reasons for
this problem include the following:
-
The grid command defined in the
logical grid server metadata is either not valid on grid nodes or
does not bring up SAS on the grid node when the command is run. To
verify the command, log on to a grid node and run the command defined
in the logical grid server definition. The command should attempt
to start a SAS session on the grid node. However, the SAS session
might not run successfully because grid parameters have not been included.
Platform Suite for SAS provides a return code of 127 if the command
to be executed is not found and a return code of 128 if
the command is found, but there is a problem executing the command.
-
An incorrect version of SAS is installed
on grid nodes. SAS 9.1.3 Service Pack 3 is the minimum supported version.
A return code of 231 might be associated with this problem.
-
The grid client and grid nodes are
unable to communicate.
Verify that the network is set up properly, using the information
in “Verifying the Network Setup”.
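The return codes mentioned above can be collected into a small lookup for use when reviewing failed jobs. This Python sketch covers only the three codes named in this section. The function name and the hint wording are illustrative, and other return codes are deliberately left unexplained.

```python
# Hints taken from the troubleshooting text above; no other codes are assumed.
RETURN_CODE_HINTS = {
    127: "The grid command was not found on the grid node.",
    128: "The grid command was found, but there was a problem executing it.",
    231: "Possibly an unsupported SAS version on the grid node "
         "(minimum is SAS 9.1.3 Service Pack 3).",
}


def explain_return_code(code: int) -> str:
    """Return a troubleshooting hint for a job return code, if one is known."""
    return RETURN_CODE_HINTS.get(code, "No hint available for this return code.")


for code in (127, 128, 231, 1):
    print(code, explain_return_code(code))
```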
Jobs run on machines that are supposed to be only grid clients.
By default, all machines
that are listed in the lsf.cluster.<cluster_name>
file are part of the grid and can process jobs. If you want a machine
to be able to submit jobs to the grid (a grid client) but not be a
machine that can process the job (a grid node), set its maximum job
slots to 0 or use the Grid Manager plug-in to close the host.