Problem Note 61712: Troubleshooting frequent failures of the SAS® LASR™ Analytic Server and SAS® High-Performance Analytics Environment in UNIX operating environments
You might experience persistent SAS LASR Analytic Server failures (crashes) or the following errors in SAS High-Performance Risk:
ERROR: Positions could not be sent to the HPRisk Engine.
ERROR: Positions could not be sent to the HPRisk Engine.
ERROR: Information could not be read from the HPRisk Engine. The HPRISK procedure must terminate the task.
In this scenario, it is a good practice to check for network and Secure Shell (SSH) settings that might interfere with your SAS LASR Analytic Server connection. This SAS Note covers some basic troubleshooting steps that you can take to try to address this issue:
- Check the /etc/ssh/sshd_config file on all TKGrid nodes for parameters that cause a time-out, such as the ClientAliveInterval and ClientAliveCountMax settings. Here is an example:
ClientAliveInterval 60
ClientAliveCountMax 5
With these settings, sshd closes the connection to a client that stops responding after five minutes (5 x 60 seconds = 300 seconds).
- Ensure that ClientAliveInterval is set to 0 on all nodes, and restart the SSH daemon (sshd) if you make any changes. A value of 0 disables the client-alive checks, which forces a persistent connection.
- Ensure that the setting "TCPKeepAlive no" does not appear in the /etc/ssh/ssh_config file or the /etc/ssh/sshd_config file on any node.
- Verify that the user who starts SAS LASR Analytic Server can use passwordless SSH from every node to every node. To do so, submit the following command exactly as shown, changing only the two paths to TKGrid/bin/simsh to match your installation:
/opt/TKGrid/bin/simsh /opt/TKGrid/bin/simsh hostname
This example uses the simsh script, which runs a command on all nodes in the TKGrid installation. By listing simsh twice, the script runs a nested FOR loop to try SSH connections from each node to each node. In addition, it issues the hostname command. The output looks similar to the following:
sasts009: sasts011: sasts011.unx.sas.com
sasts009: sasts012: sasts012.unx.sas.com
sasts009: sasts010: sasts010.unx.sas.com
sasts009: sasts009: sasts009.unx.sas.com
sasts010: sasts011: sasts011.unx.sas.com
sasts010: sasts012: sasts012.unx.sas.com
. . .more lines. . .
The output means that the machine in column 1 connected to the machine in column 2, and column 3 contains the hostname command output.
The user who starts SAS LASR Analytic Server should not receive any password prompts or failed-login messages when executing this command. Also note that all SAS LASR Analytic Server users need passwordless SSH access.
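On a grid with many nodes, the nested simsh output can be long, so a small filter can help spot problems. The following is a minimal sketch, not part of the TKGrid installation: the function name summarize_simsh is hypothetical, it assumes that each successful line has the form "source: target: hostname-output" shown above, and it treats any line containing "password", "denied", or "failed" as a problem. The expected node count is passed as an argument, because N nodes should produce N x N lines.

```shell
#!/bin/sh
# summarize_simsh (hypothetical helper): read nested simsh output on stdin
# and compare it with the number of node pairs that is expected.
# Assumption: a successful line looks like "source: target: hostname-output".
summarize_simsh() {
    expected_nodes=$1
    awk -F': ' -v n="$expected_nodes" '
        # Password prompts and login failures indicate missing passwordless SSH.
        tolower($0) ~ /password|denied|failed/ { bad++; print "PROBLEM: " $0; next }
        # Three or more fields: source, target, and the command output.
        NF >= 3 { ok++; next }
        # Any other non-empty line is unexpected.
        NF > 0  { bad++; print "PROBLEM: " $0 }
        END {
            printf "ok=%d problems=%d expected=%d\n", ok, bad, n * n
            # A nonzero exit status means that at least one pair needs attention.
            exit (bad > 0 || ok != n * n)
        }
    '
}

# Example use on a four-node grid (edit the paths to match your installation):
#   /opt/TKGrid/bin/simsh /opt/TKGrid/bin/simsh hostname 2>&1 | summarize_simsh 4
```

The function only summarizes what simsh prints. It does not open any SSH connections itself, so the exact text of a failure depends on your SSH client.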
Operating System and Release Information
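SAS System | SAS LASR Analytic Server | 64-bit Enabled AIX | | | | |
SAS System | SAS LASR Analytic Server | 64-bit Enabled Solaris | | | | |
SAS System | SAS LASR Analytic Server | Linux for x64 | | | | |
SAS System | SAS LASR Analytic Server | Solaris for x64 | | | | |
SAS System | SAS High-Performance Risk | Microsoft® Windows® for x64 | 4.1 | | 9.4 TS1M5 | |
SAS System | SAS High-Performance Risk | 64-bit Enabled AIX | 4.1 | | 9.4 TS1M5 | |
SAS System | SAS High-Performance Risk | Linux for x64 | 4.1 | | 9.4 TS1M5 | |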
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
The distributed SAS LASR Analytic Server requires a stable SSH connection. Network settings that interrupt this connection can cause what looks like a SAS LASR Analytic Server failure (crash) with no apparent cause.
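The SSH settings named in this note can be audited on each node with a short script. The following is a minimal sketch under stated assumptions: the function name check_ssh_timeouts is hypothetical, only the first occurrence of each ClientAlive keyword is used, and the "keyword=value" syntax and Match blocks are ignored. When ClientAliveCountMax is not set, the sketch assumes the OpenSSH default of 3.

```shell
#!/bin/sh
# check_ssh_timeouts (hypothetical helper): scan an SSH configuration file for
# the settings that this note identifies as able to interrupt connections.
# Simplification: "keyword=value" syntax and Match blocks are not handled.
check_ssh_timeouts() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }     # skip blank lines and comments
        { key = tolower($1) }             # keywords are case-insensitive
        key == "clientaliveinterval" && !have_i { interval = $2 + 0; have_i = 1 }
        key == "clientalivecountmax" && !have_c { count = $2 + 0; have_c = 1 }
        key == "tcpkeepalive" && tolower($2) == "no" { keepalive_off = 1 }
        END {
            if (!have_c) count = 3        # assumed OpenSSH default
            if (interval > 0) {
                printf "WARN: ClientAliveInterval %d x ClientAliveCountMax %d = %d seconds\n", interval, count, interval * count
                status = 1
            } else {
                print "OK: ClientAliveInterval is 0 or not set"
            }
            if (keepalive_off) { print "WARN: TCPKeepAlive no is set"; status = 1 }
            exit status + 0
        }
    ' "$1"
}

# Example use on one node:
#   check_ssh_timeouts /etc/ssh/sshd_config
#   check_ssh_timeouts /etc/ssh/ssh_config
```

A nonzero exit status means that at least one setting differs from the recommendations in this note. To cover the whole grid, the same check would need to be run on every TKGrid node.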
Type: | Problem Note |
Priority: | medium |
Date Modified: | 2023-03-22 07:57:48 |
Date Created: | 2018-01-16 09:31:24 |