UNIX and Linux SAS® administrators, have you ever been greeted by one of these statements as you walk into the office, before you have gotten your first cup of coffee? "Power outage! The SAS servers are down. I cannot access my reports." Have you frantically tried to restart the SAS servers to avoid lost productivity, only to miss one of the steps in the process, causing further delays while other work continues to pile up? If you have had this experience, you understand the benefit to be gained from a utility that automates the management of these multi-tiered deployments. Until recently, there was no method for automatically starting and stopping multi-tiered services in an orchestrated fashion. Instead, you had to use time-consuming manual procedures to manage SAS services. These procedures were also prone to human error, which could result in corrupted services and additional time lost debugging and resolving the issues that this process injected. To address this challenge, SAS Technical Support created the SAS Local Services Management (SAS_lsm) utility, which provides automated, orderly management of your SAS® multi-tiered deployments. This paper demonstrates the deployment and usage of the SAS_lsm utility. Now, go grab a coffee, and let's see how SAS_lsm can make life less chaotic.
Clifford Meyers, SAS
Data must be of moderate size when you estimate parameters with machine learning. For data with huge numbers of records, such as healthcare big data (for example, receipt data), or for super-high-dimensional data, such as genomic big data, it is important to follow a procedure in which the data is cleaned first and then the data or variables for modeling are selected. Big data often consists of macroscopic and microscopic groups. With these groups, it is possible to increase the accuracy of estimation by following the above procedure, in which the data is cleaned from a macro perspective and the data or variables for modeling are selected from a micro perspective. This kind of stepwise procedure can be expected to help reduce bias. We also propose a new analysis algorithm based on N-stage machine learning. For simplicity, we assume N = 2. Note that different machine learning approaches should be applied at each stage; that is, a random forest method is used at the first stage for data cleaning, and an elastic net method is used at the second stage for the selection of data or variables. To program N-stage machine learning, we use the LUA procedure, which is not only efficient but also enables an easily readable iteration algorithm to be developed. Note that we use well-known machine learning methods that are implementable with SAS® 9.4, SAS® In-Memory Statistics, and so on.
Ryo Kiguchi, Shionogi & Co., LTD
Eri Sakai, Shionogi & Co., LTD
Yoshitake Kitanishi, Shionogi & Co., LTD
Akio Tsuji, Shionogi & Co., LTD
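The N-stage structure described above can be sketched as a simple stage loop. This is a minimal illustration in plain Python, not the paper's SAS/PROC LUA code: the two stage functions are deliberately simplified stand-ins (a missing-value filter for the macro-level cleaning that the paper performs with a random forest, and a constant-variable filter for the micro-level selection that the paper performs with an elastic net); only the looping pattern corresponds to what a PROC LUA driver would iterate over.

```python
# Sketch of an N-stage (here N = 2) pipeline driver. Stage functions are
# illustrative stand-ins, not the paper's actual methods:
#   stage 1 ~ macro-level data cleaning (random forest in the paper)
#   stage 2 ~ micro-level variable selection (elastic net in the paper)

def clean_records(data):
    """Stage 1 stand-in: drop records with a missing target value."""
    return [row for row in data if row.get("y") is not None]

def select_variables(data):
    """Stage 2 stand-in: keep only predictors that vary across records."""
    keys = {k for row in data for k in row if k != "y"}
    kept = {k for k in keys if len({row.get(k) for row in data}) > 1}
    return [{k: v for k, v in row.items() if k == "y" or k in kept}
            for row in data]

def run_pipeline(data, stages):
    """Apply each stage in order, as a PROC LUA loop would iterate."""
    for stage in stages:
        data = stage(data)
    return data

raw = [
    {"x1": 1, "x2": 5, "y": 0.1},
    {"x1": 2, "x2": 5, "y": 0.4},
    {"x1": 3, "x2": 5, "y": None},  # removed by stage 1
]
result = run_pipeline(raw, [clean_records, select_variables])
# x2 is constant across the surviving records, so stage 2 drops it,
# leaving only x1 and y in each remaining record.
```

Extending to larger N only means appending further stage functions to the list, which mirrors why the authors favor an iteration construct such as PROC LUA for this design.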
SAS® In-Memory Analytics for Hadoop is an analytical programming environment that enables users to carry out the many components of an analytics project in a single environment, rather than switching between different applications. Users can easily prepare raw data for different types of analytics procedures, explore the data to enhance information extraction, and apply a large variety of statistical and machine learning techniques to the data to compare different analytical approaches. The model comparison capabilities let them quickly find the best model, which they can deploy and score in the Hadoop environment. All of these components of the analytics project are supported in a distributed in-memory environment for lightning-fast processing. This paper highlights tips for working with Hadoop data and the SAS® LASR Analytic Server. It contains multiple scenarios with elementary but pragmatic approaches that enable SAS® programmers to work efficiently within the SAS® In-Memory Analytics environment.
Venkateswarlu Toluchuri, United HealthCare Group