UNIX and Linux SAS® administrators, have you ever been greeted by one of these statements as you walk into the office before you have gotten your first cup of coffee? "Power outage! SAS servers are down." "I cannot access my reports." Have you frantically tried to restart the SAS servers to avoid loss of productivity and missed one of the steps in the process, causing further delays while other work continues to pile up? If you have had this experience, you understand the benefit to be gained from a utility that automates the management of these multi-tiered deployments. Until recently, there was no method for automatically starting and stopping multi-tiered services in an orchestrated fashion. Instead, you had to use time-consuming manual procedures to manage SAS services. These procedures were also prone to human error, which could result in corrupted services and additional time lost debugging and resolving the issues introduced along the way. To address this challenge, SAS Technical Support created the SAS Local Services Management (SAS_lsm) utility, which provides automated, orderly management of your SAS® multi-tiered deployments. The intent of this paper is to demonstrate the deployment and usage of the SAS_lsm utility. Now, go grab a coffee, and let's see how SAS_lsm can make life less chaotic.
Clifford Meyers, SAS
The singular spectrum analysis (SSA) method of time series analysis applies nonparametric techniques to decompose time series into principal components. SSA is particularly valuable for long time series, in which patterns (such as trends and cycles) are difficult to visualize and analyze. An important step in SSA is determining the spectral groupings; this step can be automated by analyzing the w-correlations of the spectral components. This paper provides an introduction to singular spectrum analysis and demonstrates how to use SAS/ETS® software to perform it. To illustrate, monthly data on temperatures in the United States over the last century are analyzed to discover significant patterns.
Michael Leonard, SAS
Bruce Elsheimer, SAS
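A minimal sketch of the kind of SSA run the paper describes, using the SSA statement of the TIMESERIES procedure in SAS/ETS; the data set, variable names, window length, output data set, and component grouping below are illustrative assumptions, and the automated w-correlation grouping discussed in the paper is not shown.

/* Hedged sketch: singular spectrum analysis of a monthly temperature series. */
/* TEMPS, DATE, TEMP, the window length, and the groups are assumed names.    */
proc timeseries data=temps outssa=ssa_components;
   id date interval=month;
   var temp;
   /* LENGTH= sets the SSA window; GROUPS= collects spectral components       */
   /* into groups (for example, a trend group and a cyclical group).          */
   ssa / length=120 groups=(1 2)(3 4);
run;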
Data that are gathered in modern data collection processes are often large and contain geographic information that enables you to examine how spatial proximity affects the outcome of interest. For example, in real estate economics, the price of a housing unit is likely to depend on the prices of housing units in the same neighborhood or nearby neighborhoods, either because of their locations or because of some unobserved characteristics that these neighborhoods share. Understanding spatial relationships and being able to represent them in a compact form are vital to extracting value from big data. This paper describes how to glean analytical insights from big data and discover their big value by using spatial econometric methods in SAS/ETS® software.
Guohui Wu, SAS
Jan Chvosta, SAS
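A hedged sketch of how a spatial autoregressive (SAR) model might be specified with the SPATIALREG procedure in SAS/ETS; the data set names, variables, and the WMAT= spatial weights specification are assumptions for illustration rather than the paper's own example.

/* Hedged sketch: spatial autoregressive model for housing prices.          */
/* HOUSING, W (spatial weights data set), PRICE, SQFT, and AGE are assumed. */
proc spatialreg data=housing Wmat=W;
   /* TYPE=SAR requests a spatial autoregressive model in which each        */
   /* response depends on the spatially lagged responses of its neighbors.  */
   model price = sqft age / type=SAR;
run;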
Detecting and adjusting structural breaks is an important step in modeling time series and panel data. In some cases, such as studying the impact of a new policy or an advertising campaign, structural break analysis might even be the main goal of a data analysis project. In other cases, the adjustment of structural breaks is a necessary step to achieve other analysis objectives, such as obtaining accurate forecasts and effective seasonal adjustment. Structural breaks can occur in a variety of ways during the course of a time series. For example, a series can have an abrupt change in its trend, its seasonal pattern, or its response to a regressor. The SSM procedure in SAS/ETS® software provides a comprehensive set of tools for modeling different types of sequential data, including univariate and multivariate time series data and panel data. These tools include options for easy detection and adjustment of a wide variety of structural breaks. This paper shows how you can use the SSM procedure to detect and adjust structural breaks in many different modeling scenarios. Several real-world data sets are used in the examples. The paper also includes a brief review of the structural break detection facilities of other SAS/ETS procedures, such as the ARIMA, AUTOREG, and UCM procedures.
Rajesh Selukar, SAS
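The paper's examples center on the SSM procedure; as a simpler, hedged illustration of the break-detection facilities it reviews in other SAS/ETS procedures, the sketch below asks PROC UCM to check the level component of a monthly series for structural breaks. The data set and variable names are placeholders.

/* Hedged sketch: check for a structural break in the level of a series. */
/* SALESDATA, DATE, and SALES are illustrative names.                    */
proc ucm data=salesdata;
   id date interval=month;
   model sales;
   irregular;
   /* CHECKBREAK requests a check for breaks in the level component.     */
   level / checkbreak;
   season length=12 type=trig;
   estimate;
   forecast lead=12;
run;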
Session 1528-2017:
Getting Started with ARIMA Models
This paper introduces the basic features of time series variation and the model components used to accommodate them: stationary (ARMA), trend and seasonal (the 'I' in ARIMA), and exogenous (related to input variables). The Identify, Estimate, and Forecast framework for building ARIMA models is illustrated with two demonstrations.
Chip Wells, SAS
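A hedged sketch of the Identify, Estimate, and Forecast cycle described above; the data set, variable, and model orders are illustrative assumptions rather than the demonstrations used in the presentation.

/* Hedged sketch: the Identify, Estimate, Forecast steps in PROC ARIMA.    */
/* MONTHLY, DATE, and SALES are illustrative names.                        */
proc arima data=monthly;
   /* Identify: difference the series (nonseasonal and seasonal) and       */
   /* inspect the ACF/PACF to suggest candidate ARMA orders.               */
   identify var=sales(1,12) nlag=24;
   /* Estimate: fit an assumed AR(1) model with a factored (1)(12) MA.     */
   estimate p=1 q=(1)(12) method=ml;
   /* Forecast: project 12 months ahead and save the forecasts.            */
   forecast lead=12 interval=month id=date out=fcst;
run;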
Recent advances in computing technology, monitoring systems, and data collection mechanisms have prompted renewed interest in multivariate time series analysis. In contrast to univariate time series models, which focus on temporal dependencies of individual variables, multivariate time series models also exploit the interrelationships between different series, thus often yielding improved forecasts. This paper focuses on cointegration and long memory, two phenomena that require careful consideration and are observed in time series data sets from several application areas, such as finance, economics, and computer networks. Cointegration of time series implies a long-run equilibrium between the underlying variables, and long memory is a special type of dependence in which the impact of a series' past values on its future values dies out slowly with the increasing lag. Two examples illustrate how you can use the new features of the VARMAX procedure in SAS/ETS® 14.1 and 14.2 to glean important insights and obtain improved forecasts for multivariate time series. One example examines cointegration by using the Granger causality tests and the vector error correction models, which are the techniques frequently applied in the Federal Reserve Board's Comprehensive Capital Analysis and Review (CCAR), and the other example analyzes the long-memory behavior of US inflation rates.
Xilong Chen, SAS
Stefanos Kechagias, SAS
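A hedged sketch of the cointegration workflow: the code below uses PROC VARMAX to run the Johansen cointegration test, fit a vector error correction model, and test Granger causality for two hypothetical series; the data set name, lag order, and cointegration rank are assumptions.

/* Hedged sketch: cointegration testing and a VECM in PROC VARMAX.        */
/* MACRO_DATA, DATE, Y1, and Y2 are illustrative names.                   */
proc varmax data=macro_data;
   id date interval=quarter;
   /* COINTTEST requests the Johansen cointegration rank test; ECM fits   */
   /* a vector error correction model with an assumed rank of 1.          */
   model y1 y2 / p=2 cointtest=(johansen) ecm=(rank=1 normalize=y1);
   /* Granger causality test of whether Y2 helps predict Y1.              */
   causal group1=(y1) group2=(y2);
run;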
People typically invest in more than one stock to help diversify their risk. These stock portfolios are a collection of assets that each have their own inherent risk. If you know the future risk of each of the assets, you can optimize how much of each asset to keep in the portfolio. The real challenge is trying to evaluate the potential future risk of these assets. Different techniques provide different forecasts, which can drastically change the optimal allocation of assets. This talk presents a case study of portfolio optimization under three different scenarios: historical standard deviation estimation, the capital asset pricing model (CAPM), and GARCH-based volatility modeling. The structure and results of these three approaches are discussed.
Aric LaBarr, Institute for Advanced Analytics at NC State University
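For the GARCH-based piece of the comparison, a minimal sketch of estimating a GARCH(1,1) volatility model for a single return series with PROC AUTOREG; the data set and variable names are placeholders, and the portfolio optimization step itself is not shown.

/* Hedged sketch: GARCH(1,1) conditional volatility for one asset's returns. */
/* RETURNS and RET are illustrative names.                                   */
proc autoreg data=returns;
   /* GARCH=(P=1,Q=1) fits a GARCH(1,1) model to the return series.          */
   model ret = / garch=(p=1, q=1);
   /* CEV= stores the estimated conditional error variance (volatility).     */
   output out=vol_out cev=cond_var;
run;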
This paper uses a simulation comparison to evaluate quantile approximation methods in terms of their practical usefulness and potential applicability in an operational risk context. A popular method for modeling the aggregate loss distribution in risk and insurance is the Loss Distribution Approach (LDA). Many banks currently use the LDA for estimating regulatory capital for operational risk. The aggregate loss distribution is a compound distribution resulting from a random sum of losses, where the losses are distributed according to some severity distribution and the number of losses is distributed according to some frequency distribution. In order to estimate the regulatory capital, an extreme quantile of the aggregate loss distribution has to be estimated. A number of numerical approximation techniques have been proposed to approximate the extreme quantiles of the aggregate loss distribution. We use PROC SEVERITY to fit various severity distributions to simulated samples of individual losses from a preselected severity distribution. The accuracy of the approximations obtained is then evaluated against a Monte Carlo approximation of the extreme quantiles of the compound distribution resulting from the preselected severity distribution. We find that the second-order perturbative approximation, a closed-form approximation, performs very well at the extreme quantiles over a wide range of distributions and is very easy to implement.
Helgard Raubenheimer, Center for BMI, North-West University
Riaan de Jongh, Center for BMI, North-West University
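A hedged sketch of the severity-fitting step described above: PROC SEVERITY fits several candidate severity distributions to a simulated loss sample and compares them by a fit criterion; the data set and variable names are placeholders, and the quantile approximations and Monte Carlo comparison are not shown.

/* Hedged sketch: fit candidate severity distributions to simulated losses.   */
/* SIMLOSS and LOSSAMT are illustrative names.                                */
proc severity data=simloss crit=aicc;
   loss lossamt;
   /* Candidate predefined severity distributions, compared by corrected AIC. */
   dist exp gamma logn pareto gpd weibull;
run;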
Panel data, which are collected on a set (panel) of individuals over several time points, are ubiquitous in economics and other analytic fields because their structure allows individuals to act as their own control groups. The PANEL procedure in SAS/ETS® software models panel data that have a continuous response, and it provides many options for estimating regression coefficients and their standard errors. Some of the available estimation methods enable you to estimate a dynamic model by using a lagged dependent variable as a regressor, thus capturing the autoregressive nature of the underlying process. Including lagged dependent variables introduces correlation between the regressors and the residual error, which necessitates using instrumental variables. This paper guides you through the process of using the typical estimation method for this situation, the generalized method of moments (GMM), and the process of selecting the optimal set of instrumental variables for your model. Your goal is to achieve unbiased, consistent, and efficient parameter estimates that best represent the dynamic nature of the model.
Roberto Gutierrez, SAS
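A hedged sketch of the dynamic panel setup: a lagged dependent variable enters as a regressor and the generalized method of moments estimator is requested in PROC PANEL, with lagged levels of the dependent variable serving as instruments. The data set, variables, and option values are illustrative assumptions, and the instrument-selection guidance in the paper is not reproduced.

/* Hedged sketch: dynamic panel model estimated by difference GMM.           */
/* FIRMS, FIRM, YEAR, Y, Y_LAG1, X1, and X2 are illustrative names; Y_LAG1   */
/* is assumed to be a previously created lag of the dependent variable.      */
proc panel data=firms;
   id firm year;
   /* Use lagged levels of the dependent variable plus the exogenous         */
   /* regressors as instruments (statement options assumed).                 */
   instruments depvar exogenous=(x1 x2);
   /* GMM requests the generalized method of moments estimator; TWOSTEP,     */
   /* NOLEVELS, and MAXBAND= are assumed option names.                        */
   model y = y_lag1 x1 x2 / gmm nolevels twostep maxband=6;
run;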
Graphs are mathematical structures capable of representing networks of objects and their relationships. Clustering is an area in graph theory where objects are split into groups based on their connections. Depending on the application domain, object clusters have various meanings (for example, in market basket analysis, clusters are families of products that are frequently purchased together). This paper provides a SAS® macro featuring PROC OPTGRAPH, which enables the transformation of transactional data, or any data with a many-to-many relationship between two entities, into graph data, allowing for the generation and application of the co-occurrence graph and the probability graph.
Linh Le, Kennesaw State University
Jennifer Priestley, Kennesaw State University
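A hedged sketch of clustering a co-occurrence graph once it has been built: the COMMUNITY statement of PROC OPTGRAPH is applied to a link data set. The data set and variable names, the option names, and the resolution value are illustrative assumptions; building the graph from transactional data is the subject of the macro the paper presents.

/* Hedged sketch: community detection on an assumed co-occurrence graph.     */
/* COOCCUR_LINKS is assumed to hold FROM, TO, and WEIGHT columns.            */
proc optgraph data_links=cooccur_links out_nodes=product_clusters;
   data_links_var from=from to=to weight=weight;
   /* RESOLUTION_LIST= controls the granularity of the detected communities. */
   community resolution_list=1.0;
run;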