Verifying the Network Setup

Overview

The first step in troubleshooting problems with a SAS grid is to verify that all computers in the grid can communicate with one another through the ports that are used by the grid middleware.

Host Addresses

Check the /etc/hosts file on each grid node to ensure that the machine name is not mapped to the 127.0.0.1 address. This mapping causes the signon connection to the grid node to fail or to hang, because the SAS session that is invoked on the grid node cannot determine the correct IP address of the machine on which it is running. A correct IP address must be returned to the client session in order to complete the connection. For example, delete the name "myserver" if the following line is present in the /etc/hosts file:
127.0.0.1 myserver localhost.localdomain localhost
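
The check described above can be scripted when there are many grid nodes. The following is a minimal sketch, not a SAS-supplied tool: the function name loopback_mappings is hypothetical, and it assumes that names are compared without regard to case and that a short name and a fully qualified name refer to the same machine.

```python
def loopback_mappings(hosts_text, machine_name):
    """Return hosts-file lines that map machine_name to 127.0.0.1.

    hosts_text is the content of a hosts file; machine_name is the value
    reported by the hostname command on that machine.
    """
    wanted_full = machine_name.lower()
    wanted_short = wanted_full.split(".")[0]
    hits = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        entry = line.split("#", 1)[0].split()
        if len(entry) < 2:
            continue  # blank line, comment-only line, or address without names
        address, names = entry[0], [n.lower() for n in entry[1:]]
        if address != "127.0.0.1":
            continue
        # Assumption: a short name and its fully qualified form are the same host.
        if any(n == wanted_full or n.split(".")[0] == wanted_short for n in names):
            hits.append(line.strip())
    return hits
```

On a grid node, the function could be given the content of /etc/hosts and the value of socket.gethostname() from the Python standard library. Any line that it returns is a candidate for the edit described above.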

Host Connectivity

You must verify that the network has been set up properly and that each machine knows the network address of all the other machines in the grid. Follow these steps to test the network setup:
  1. Run the hostname command on every machine in the grid (including grid nodes, grid control servers, and Foundation SAS grid clients).
  2. Run the ping command on all grid node machines and the grid control machine against every other machine in the grid (including grid client machines). When you ping a grid client machine, use the host name without the domain suffix.
  3. Run the ping command on each grid client machine against every other machine in the grid (including itself). When a grid client machine pings itself using the value from the hostname command, verify that the returned IP address is the same IP address that is returned when the grid nodes ping the client. The addresses might differ on machines that have multiple network adapters.
If the network tests indicate a problem, you must either correct the DNS server or add entries to each machine's hosts file. Contact your network administrator for the best way to fix the problem.
Platform LSF assumes that each host in the grid has a single name, that the IP address can be resolved from the name, and that the official name can be resolved from the IP address. If any of these conditions is not met, LSF needs its own hosts file, which is located in its configuration directory (LSF_ENVDIR/conf/hosts).
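
The forward and reverse lookups that LSF relies on can be exercised with a short script that is run on each machine against every host name in the grid. This is a sketch under stated assumptions: the function name check_name_resolution and the shape of its result are hypothetical, and it uses only the operating system resolver through the Python socket module, so it reflects the DNS and hosts-file settings of the machine on which it runs.

```python
import socket


def check_name_resolution(name,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve name to an IP address, then the IP address back to a name.

    The forward and reverse parameters default to the system resolver and
    can be replaced with stand-ins for testing.
    """
    result = {"name": name, "ip": None, "official_name": None, "problems": []}
    try:
        result["ip"] = forward(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result["problems"].append(f"cannot resolve {name}: {exc}")
        return result
    try:
        # gethostbyaddr returns (official_name, aliases, addresses).
        result["official_name"] = reverse(result["ip"])[0]
    except OSError as exc:  # socket.herror is a subclass of OSError
        result["problems"].append(f"no reverse lookup for {result['ip']}: {exc}")
    return result
```

Comparing the ip and official_name values that different machines report for the same host shows whether every machine sees that host under one address and one name.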

Host Ports

You must verify that the ports that SAS and LSF use for communication are accessible from other machines. The ports might not be accessible if a firewall is running on one or more machines. If firewalls are running, you must open ports so that communication works between the LSF daemons and the instances of SAS. Issue the telnet <host> <port> command to determine whether a port is open on a specific host.
The default ports used in a grid are:
  • LSF: 6878, 6881, 6882, 7869, 7870, 7871, 7872
  • Grid Monitoring Service: 1976
  • Platform Process Manager: 1966
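
Where telnet is not installed, a TCP connection attempt from a script gives the same information. The sketch below is illustrative only: the names DEFAULT_PORTS, port_is_open, and check_default_ports are hypothetical, and the port numbers are the defaults listed above. Like telnet, it tests TCP only, and a closed port can simply mean that the corresponding service does not run on that host.

```python
import socket

# Default ports from the list above; adjust if your site has changed them.
DEFAULT_PORTS = {
    "LSF": [6878, 6881, 6882, 7869, 7870, 7871, 7872],
    "Grid Monitoring Service": [1976],
    "Platform Process Manager": [1966],
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def check_default_ports(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Map (service, port) to True or False for every port in ports."""
    return {
        (service, port): port_is_open(host, port, timeout)
        for service, port_list in ports.items()
        for port in port_list
    }
```

Running check_default_ports against each grid host from each of the other machines shows which combinations are blocked, which helps to locate the firewall that needs to be changed.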
If you need to change any port numbers, modify these files:
  • LSF ports: LSF_ENVDIR/conf/lsf.conf and EGO_CONFDIR/ego.conf
  • Grid Monitoring Service port: gms/conf/ga.conf
  • Platform Process Manager port: pm/conf/js.conf
If you change the Grid Monitoring Service port, you must also change the metadata for the Grid Monitoring Server. If you change the Platform Process Manager port, you must also change the metadata for the Job Scheduler Server.
Ports might be used by other programs. To check for ports that are in use, stop the LSF daemons and issue the command netstat -an | <search-tool> <port>, where search-tool is grep (UNIX) or findstr (Windows). Check the output of the command for the LSF ports. If a port is in use, reassign the port or stop the program that is using the port.
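
A plain text search for a port number also matches longer numbers that contain it (searching for 7869 matches 17869, for example). The following sketch filters netstat -an output more strictly. The function names are hypothetical, and the matching rule is an assumption about common netstat layouts: the port is expected at the end of an address field, after a colon (Linux, Windows) or a period (some other UNIX systems).

```python
import re
import subprocess


def lines_using_port(netstat_output, port):
    """Return netstat -an lines in which an address field ends in the port."""
    # Assumption: addresses look like 0.0.0.0:7869 or *.7869.
    pattern = re.compile(r"[:.]%d$" % port)
    hits = []
    for line in netstat_output.splitlines():
        if any(pattern.search(field) for field in line.split()):
            hits.append(line.strip())
    return hits


def netstat_lines_for_port(port):
    """Run netstat -an on this machine and filter its output for the port."""
    completed = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    return lines_using_port(completed.stdout, port)
```

With the LSF daemons stopped, any line that netstat_lines_for_port returns for an LSF port points to another program that is using that port.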
SAS assigns random ports for connections, but you can restrict the range of ports SAS uses by using the -tcpportfirst <first-port> and the -tcpportlast <last-port> options. You can specify these options in the SAS configuration file or on the SAS command line. For remote sessions, you must specify these options either in the grid command script (sasgrid.cmd on Windows or sasgrid on UNIX) or in the Command field in the logical grid server definition in metadata. For example, adding the following parameters to the SAS command line in the grid script restricts the ports that the remote session uses to between 5000 and 5005:
-tcpportfirst 5000 -tcpportlast 5005
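
Before you settle on a range, it can help to confirm that the ports in it are not already taken on the grid node. The sketch below is an illustration, not part of SAS: the function names free_ports and sas_port_options are hypothetical, and the test works by briefly binding a TCP socket to each port, so ports below 1024 are reported as unavailable unless the script runs with sufficient privileges.

```python
import socket


def free_ports(first_port, last_port, bind_address=""):
    """Return the ports in [first_port, last_port] that can be bound right now."""
    if not (0 < first_port <= last_port <= 65535):
        raise ValueError("expected 0 < first_port <= last_port <= 65535")
    free = []
    for port in range(first_port, last_port + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((bind_address, port))
            except OSError:
                continue  # in use, or binding is not permitted
            free.append(port)
    return free


def sas_port_options(first_port, last_port):
    """Format the two options exactly as they appear on the SAS command line."""
    return f"-tcpportfirst {first_port} -tcpportlast {last_port}"
```

The result describes only the moment at which the script runs. The same range must also be open in any firewall between the grid client and the grid nodes.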