Testing for unit roots and determining whether a time series is nonstationary is important for the economist who does empirical work. SAS® enables the user to detect unit roots using an array of tests: the Dickey-Fuller, augmented Dickey-Fuller, Phillips-Perron, and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests. This paper presents a brief overview of unit roots and shows how to test for a unit root using the example of U.S. national health expenditure data.
Don McCarthy, Kaiser Permanente
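As a minimal sketch of the tests the paper surveys, PROC ARIMA reports (augmented) Dickey-Fuller and Phillips-Perron statistics from its IDENTIFY statement, and PROC AUTOREG reports the KPSS statistic; the data set NHE and variable EXPEND are illustrative placeholders, not from the paper.

   /* Augmented Dickey-Fuller (lags 0-2) and Phillips-Perron tests */
   proc arima data=nhe;
      identify var=expend stationarity=(adf=(0,1,2) pp=(0,1,2));
   run;

   /* KPSS test, whose null hypothesis is stationarity rather than a unit root */
   proc autoreg data=nhe;
      model expend = / stationarity=(kpss);
   run;

Note that rejecting the ADF null suggests stationarity, while rejecting the KPSS null suggests nonstationarity, so the two tests complement each other.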
The importance of econometrics in the analytics toolkit is increasing every day. Econometric modeling helps uncover structural relationships in observational data. This paper highlights the many recent changes to the SAS/ETS® portfolio that increase your power to explain the past and predict the future. Examples show how you can use Bayesian regression tools for price elasticity modeling, use state space models to gain insight from inconsistent time series, use panel data methods to help control for unobserved confounding effects, and much more.
Mark Little, SAS
Kenneth Sanford, SAS
This analysis is based on data for all transactions at four parking meters within a small area in central Copenhagen over a period of four years. The observations record the exact minute at which parking was bought and the amount of time for which parking was bought in each transaction. These series of at most 80,000 transactions are aggregated to the hour, day, week, and month using PROC TIMESERIES. The aggregated series of parking times and transaction counts are then analyzed for seasonality and interdependence using PROC X12, PROC UCM, and PROC VARMAX.
Anders Milhoj, Copenhagen University
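A minimal sketch of the hourly aggregation step (the data set METERS, time stamp DT, and variable MINUTES_BOUGHT are illustrative names, not from the paper):

   /* TRANS = 1 per transaction, so its hourly total is the transaction count */
   data meters2;
      set meters;
      trans = 1;
   run;

   proc timeseries data=meters2 out=hourly;
      id dt interval=hour accumulate=total setmissing=0;
      var minutes_bought trans;
   run;

Changing INTERVAL= to DAY, WEEK, or MONTH produces the other aggregation levels.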
Many organizations need to forecast large numbers of time series that are discretely valued. These series, called count series, fall between continuously valued time series, for which there are many forecasting techniques (ARIMA, UCM, ESM, and others), and intermittent time series, for which there are only a few forecasting techniques (Croston's method and others). This paper proposes a technique for large-scale automatic count series forecasting and uses SAS® Forecast Server and SAS/ETS® software to demonstrate this technique.
Michael Leonard, SAS
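The abstract does not spell out the proposed count-series technique, but the following sketch shows the kind of continuously valued baseline (here, seasonal exponential smoothing with PROC ESM) against which such a method would be compared; all names are illustrative.

   proc esm data=counts out=fcast lead=12;
      id date interval=month;
      forecast demand / model=seasonal;
   run;

A count-series method would additionally respect the discrete, nonnegative support of the data.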
Design of experiments (DOE) is an essential component of laboratory, greenhouse, and field research in the natural sciences. It has also been an integral part of scientific inquiry in diverse social science fields such as education, psychology, marketing, pricing, and social work. The principles and practices of DOE are among the oldest and most advanced tools within the realm of statistics. DOE classification schemes, however, are diverse and, at times, confusing. In this presentation, we provide a simple conceptual classification framework in which experimental methods are grouped into classical and statistical approaches. The classical approach is further divided into pre-, quasi-, and true experiments. The statistical approach is divided into one-factor, two-factor, and multifactor experiments. Within these broad categories, we review several contemporary and widely used designs and their applications. The optimal use of Base SAS® and SAS/STAT® to analyze, summarize, and report these diverse designs is demonstrated. The prospects and challenges of these diverse and critically important analytic tools for extracting business insight in marketing and pricing research are discussed.
Max Friedauer
Jason Greenfield, Cardinal Health
Yuhan Jia, Cardinal Health
Joseph Thurman, Cardinal Health
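As a minimal sketch of one of the two-factor statistical designs the presentation classifies, a factorial analysis with interaction in SAS/STAT might look as follows (factor and data set names are hypothetical):

   proc glm data=trial;
      class a b;
      model yield = a b a*b;   /* main effects plus interaction */
      lsmeans a*b / slice=a;   /* simple effects of b within each level of a */
   run;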
As pollution and population continue to increase, new concepts of eco-friendly commuting are evolving. One of the emerging concepts is the bicycle sharing system, a short-term bike rental service offered at a moderate price. It gives people the flexibility to rent a bike at one location and return it at another. This business is quickly gaining popularity all over the globe. In May 2011, there were only 375 bike rental schemes comprising nearly 236,000 bikes. In just a couple of years, this number jumped to 535 bike sharing programs with approximately 517,000 bikes, and the trend is expected to continue at a similar pace. Most businesses in this market face the challenge of balancing supply with inconsistent demand. The number of bikes needed on a particular day can vary with several factors, such as season, time, temperature, wind speed, humidity, holidays, and day of the week. In this paper, we address this problem using SAS® Forecast Studio. By incorporating the effects of all the above factors and analyzing the demand trends of the last two years, we are able to accurately forecast the number of bikes needed on any day in the future. We also perform scenario analysis to observe the effect of particular variables on demand.
Kushal Kathed, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Ayush Priyadarshi, Oklahoma State University
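SAS Forecast Studio is point-and-click, but an equivalent batch sketch with PROC UCM shows how weather and calendar factors can enter a daily demand model as regressors (variable names are illustrative, not the authors' exact predictors):

   proc ucm data=bikes;
      id date interval=day;
      model rentals = temperature humidity windspeed holiday;
      level;                        /* slowly varying base demand */
      season length=7 type=trig;    /* day-of-week pattern */
      estimate;
      forecast lead=30;
   run;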
In many studies, a continuous response variable is repeatedly measured over time on one or more subjects. The subjects might be grouped into different categories, such as cases and controls. The study of resulting observation profiles as functions of time is called functional data analysis. This paper shows how you can use the SSM procedure in SAS/ETS® software to model these functional data by using structural state space models (SSMs). A structural SSM decomposes a subject profile into latent components such as the group mean curve, the subject-specific deviation curve, and the covariate effects. The SSM procedure enables you to fit a rich class of structural SSMs, which permit latent components that have a wide variety of patterns. For example, the latent components can be different types of smoothing splines, including polynomial smoothing splines of any order and all L-splines up to order 2. The SSM procedure efficiently computes the restricted maximum likelihood (REML) estimates of the model parameters and the best linear unbiased predictors (BLUPs) of the latent components (and their derivatives). The paper presents several real-life examples that show how you can fit, diagnose, and select structural SSMs; test hypotheses about the latent components in the model; and interpolate and extrapolate these latent components.
Rajesh Selukar, SAS
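A minimal single-profile sketch of such a structural SSM, with an order-2 polynomial smoothing spline for the mean curve plus observation noise (names are illustrative; a full functional-data model would add group-level and subject-specific components):

   proc ssm data=profiles;
      id time;
      trend meancurve(ps(2));   /* polynomial smoothing spline of order 2 */
      irregular wn;
      model y = meancurve wn;
      output out=smoothed;      /* BLUPs of the latent components */
   run;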
SAS/ETS® provides many tools to improve the productivity of the analyst who works with time series data. This tutorial will take an analyst through the process of turning transaction-level data into a time series. The session will then cover some basic forecasting techniques that use past fluctuations to predict future events. We will then extend this modeling technique to include explanatory factors in the prediction equation.
Kenneth Sanford, SAS
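A hedged sketch of the final step, extending a univariate model with an explanatory factor via a PROC ARIMA transfer-function input (all names are illustrative):

   proc arima data=monthly;
      identify var=sales(1) crosscorr=(price(1));   /* difference both series */
      estimate p=1 q=1 input=(price);
      forecast lead=12 out=fcst;
   run;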
Since the financial crisis of 2008, banks and bank holding companies in the United States have faced increased regulation. One of the recent changes to these regulations is known as the Comprehensive Capital Analysis and Review (CCAR). At the core of these new regulations, specifically under the Dodd-Frank Wall Street Reform and Consumer Protection Act and the stress tests it mandates, is a series of what-if, or scenario, analysis requirements that involve a number of scenarios provided by the Federal Reserve. This paper proposes frequentist and Bayesian time series methods that solve this stress testing problem using a highly practical top-down approach. The paper focuses on the value of using univariate time series methods, as well as the methodology behind these models.
Kenneth Sanford, SAS
Christian Macaro, SAS
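One common top-down pattern, sketched under assumed names: fit a univariate model with a macroeconomic input to a bank-level loss series, then forecast over the nine-quarter CCAR horizon after extending the input with the Federal Reserve scenario path (the response is left missing over the horizon).

   data stress;
      set history scenario_severe;   /* scenario rows carry only the input */
   run;

   proc arima data=stress;
      identify var=losses crosscorr=(unemployment);
      estimate p=1 input=(unemployment);
      forecast lead=9 out=sev_fcst;
   run;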
Operational risk losses are heavy tailed and likely to be asymmetric and extremely dependent among business lines and event types. We propose a new methodology to assess, in a multivariate way, the asymmetry and extreme dependence between severity distributions and to calculate the capital for operational risk. This methodology simultaneously uses several parametric distributions and an alternative mixture distribution (the lognormal for the body of losses and the generalized Pareto distribution for the tail) via extreme value theory using SAS®; the multivariate skew t-copula, applied for the first time to operational losses; and Bayesian inference theory to estimate new n-dimensional skew t-copula models via Markov chain Monte Carlo (MCMC) simulation. This paper analyzes a new operational loss data set, SAS® Operational Risk Global Data (SAS OpRisk Global Data), to model operational risk at international financial institutions. All of the severity models are constructed in SAS® 9.2 using PROC SEVERITY and PROC NLMIXED, and the paper describes this implementation.
Betty Johanna Garzon Rozo, The University of Edinburgh
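A minimal sketch of the severity-fitting step with PROC SEVERITY, comparing candidate distributions, including the lognormal and generalized Pareto, by information criterion (data set and variable names are illustrative; the skew t-copula and MCMC steps are beyond a short example):

   proc severity data=oploss criterion=aicc;
      loss lossamount;
      dist logn gpd burr;
   run;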
SAS® Forecast Server provides easy and automatic large-scale forecasting, which enables organizations to commit fewer resources to the process, reduce manual intervention, and minimize the biases that contaminate forecasts. SAS Forecast Server Client represents the modernization of the graphical user interface for SAS Forecast Server. This session will describe and demonstrate this new client, including new features such as demand classification, as well as its overall functionality.
Udo Sglavo, SAS
Faced with diminishing forecast returns from the forecast engine within its existing replenishment application, Tractor Supply Company (TSC) engaged SAS® to deliver a fully integrated forecasting solution that promised a significant improvement in chain-wide forecast accuracy. The end-to-end forecast implementation, including problems faced, solutions delivered, and results realized, will be explored.
Chris Houck, SAS
To stay competitive in the marketplace, health-care programs must be capable of reporting the true savings to clients. This is a tall order, because most health-care programs are set up to be available to the client's entire population and thus cannot be conducted as a randomized control trial. In order to evaluate the performance of the program for the client, we use an observational study design that has inherent selection bias due to its inability to randomly assign participants. To reduce the impact of bias, we apply propensity score matching to the analysis. This technique is beneficial to health-care program evaluations because it helps reduce selection bias in the observational analysis and in turn provides a clearer view of the client's savings. This paper explores how to develop a propensity score, evaluate the use of inverse propensity weighting versus propensity matching, and determine the overall impact of the propensity score matching method on the observational study population. All results shown are drawn from a savings analysis using a participant (case) versus non-participant (control) observational study design for a health-care decision support program that aims to reduce emergency room visits.
Amber Schmitz, Optum
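A minimal sketch of the two propensity steps discussed, under hypothetical covariate names: a logistic propensity model followed by inverse propensity weights (matching would instead pair cases and controls on PSCORE).

   /* Step 1: estimate the propensity of program participation */
   proc logistic data=study descending;
      model participant = age gender risk_score prior_er_visits;
      output out=ps p=pscore;
   run;

   /* Step 2: inverse propensity weights for the weighted comparison */
   data ipw;
      set ps;
      if participant = 1 then wt = 1 / pscore;
      else wt = 1 / (1 - pscore);
   run;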
Reporting Best Practices
Trina Gladwell, Bealls Inc
Pay-for-performance programs are putting increasing pressure on providers to better manage patient utilization through care coordination, with the philosophy that good preventive services and routine care can prevent the need for some high-resource services. Evaluation of provider performance frequently includes measures such as acute care events (ER and inpatient), imaging, and specialist services, yet these indicators are rarely adjusted for the underlying risk of providers' patient panels. In part, this is because standard patient risk scores are designed to predict costs, not the probability of specific service utilization. As such, Blue Cross Blue Shield of North Carolina has developed a methodology to model our members' risk of these events, in an effort to ensure that providers are evaluated fairly and to discourage adverse selection practices among our providers. Our risk modeling takes into consideration members' underlying health conditions and limited demographic factors during the previous 12-month period and employs two-part regression models using SAS® software. These risk-adjusted measures will subsequently be the basis of performance evaluation of primary care providers for our Accountable Care Organizations and medical home initiatives.
Stephanie Poley, Blue Cross Blue Shield of North Carolina
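A hedged sketch of a two-part specification of the kind described (covariates and names are hypothetical): a logistic model for any use of a service, then a count model for volume among users.

   /* Part 1: probability of any ER use in the 12-month period */
   proc logistic data=members descending;
      model any_er = age gender chronic_count;
      output out=p1 p=p_use;
   run;

   /* Part 2: expected visit counts among members with any use */
   proc genmod data=members;
      where any_er = 1;
      model er_visits = age gender chronic_count / dist=negbin link=log;
   run;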
This workshop provides hands-on experience using SAS® Forecast Server. Workshop participants will learn to create a project with a hierarchy, generate multiple forecasts automatically, evaluate forecast accuracy, and build a custom model.
Catherine Truxillo, SAS
George Fernandez, SAS
Terry Woodfield, SAS
In today's omni-channel world, consumers expect retailers to deliver the product they want, where they want it, when they want it, at a price they accept. A major challenge many retailers face in delighting their customers is successfully predicting consumer demand. Business decisions across the enterprise are affected by these demand estimates. Forecasts used to inform high-level strategic planning, merchandising decisions (planning assortments, buying products, pricing, and allocating and replenishing inventory), and operational execution (labor planning) are similar in many respects. However, each business process requires careful consideration of specific input data, modeling strategies, output requirements, and success metrics. In this session, learn how leading retailers are increasing sales and profitability by operationalizing forecasts that improve decisions across their enterprise.
Alex Chien, SAS
Elizabeth Cubbage, SAS
Wanda Shive, SAS
The unsustainable trend in health-care costs has led to efforts to shift some health-care services to less expensive sites of care. In North Carolina, the expansion of urgent care centers introduces the possibility that non-emergent and non-life-threatening conditions can be treated in a less intensive care setting. BCBSNC conducted a longitudinal study of the density of urgent care centers, primary care providers, and emergency departments, and of the differences in how members access care near those locations. This talk focuses on several analytic techniques that were considered for the analysis. The model needed to account for the complex relationship between changes in the population (including health conditions and health insurance benefits) and changes in the types and supply of services offered by nearby health-care providers. Results for the chosen methodology are discussed.
Laurel Trantham, Blue Cross and Blue Shield North Carolina
The bookBot Identity: January 2013. With no memory of it from the past, students and faculty at NC State awake to find the Hunt Library just opened, and inside it, the mysterious and powerful bookBot. A true physical search engine, the bookBot, without thinking, relentlessly pursues, captures, and delivers to the patron any requested book (those things with paper pages--remember?) from the Hunt Library. The bookBot Supremacy: Some books were moved from the central campus library to the new Hunt Library. Did this decrease overall campus circulation, or did the Hunt Library and its bookBot reign supreme in increasing circulation? The bookBot Ultimatum: To find out whether the opening of the Hunt Library decreased or increased overall circulation. To address the bookBot Ultimatum, the Circulation Statistics Investigation (CSI) team uses the power of SAS® analytics to model library circulation before and after the opening of the Hunt Library. The bookBot Legacy: Join us for the adventure-filled story. Filled with excitement and mystery, this talk is bound to draw a much bigger crowd than had it been more honestly titled "Intervention Analysis for Library Data." Tools used are PROC ARIMA, PROC REG, and PROC SGPLOT.
David Dickey, NC State University
John Vickery, North Carolina State University
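A minimal sketch of the intervention model behind the story: a step input that switches on when the Hunt Library opened, entered as a PROC ARIMA input (data set and variable names are illustrative):

   data circ2;
      set circ;
      hunt_open = (date >= '01jan2013'd);   /* 0 before opening, 1 after */
   run;

   proc arima data=circ2;
      identify var=checkouts crosscorr=(hunt_open);
      estimate p=1 input=(hunt_open);       /* shift in level at opening */
   run;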
A bank that wants to use the Internal Ratings Based (IRB) methods to calculate minimum Basel capital requirements has to calculate default probabilities (PDs) for all its obligors. Supervisors are mainly concerned about credit risk being underestimated. For high-quality exposures or groups with an insufficient number of obligors, calculations based on historical data may not be sufficiently reliable due to infrequent or no observed defaults. In an effort to solve the problem of default data scarcity, modeling assumptions are made, and to control the possibility of model risk, a high level of conservatism is applied. Banks, on the other hand, are more concerned about PDs that are too pessimistic, since these affect their pricing and economic capital. In small samples, or where we have few or no defaults, the data provide very little information about the parameters of interest. Incorporating prior information or expert judgment through Bayesian parameter estimation can be a very useful approach in such a situation. Using PROC MCMC, we show that a Bayesian approach can serve as a valuable tool for validation and monitoring of PD models for low-default portfolios (LDPs). We cover cases ranging from single-period, zero-correlation, zero-observed-default settings to multi-period settings with non-zero correlation and few observed defaults.
Machiel Kruger, North-West University
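A minimal sketch of the simplest case covered (single period, zero correlation): a binomial default count with a conservative beta prior encoding expert judgment, estimated in PROC MCMC (data set, variable names, and prior are illustrative).

   proc mcmc data=portfolio nmc=50000 seed=27513 outpost=post;
      parms pd 0.005;
      prior pd ~ beta(1, 199);                 /* prior mean PD of 0.5% */
      model defaults ~ binomial(obligors, pd);
   run;

The posterior draws in POST then support conservative quantile-based PD estimates even when no defaults are observed.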
Many retail and consumer packaged goods (CPG) companies are now keeping track of what their customers purchased in the past, often through some form of loyalty program. This record keeping is one example of how modern corporations are building data sets that have a panel structure, a data structure that is also pervasive in insurance and finance organizations. Panel data (sometimes called longitudinal data) can be thought of as the joining of cross-sectional and time series data. Panel data enable analysts to control for factors that cannot be considered by simple cross-sectional regression models that ignore the time dimension. These factors, which are unobserved by the modeler, might bias regression coefficients if they are ignored. This paper compares several methods of working with panel data in the PANEL procedure and discusses how you might benefit from using multiple observations for each customer. Sample code is available.
Bobby Gutierrez, SAS
Kenneth Sanford, SAS
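Since the paper notes that sample code is available, the following is only a hedged one-way fixed-effects sketch of the kind of model PROC PANEL supports, with illustrative names; FIXONE absorbs time-invariant unobserved customer effects.

   proc panel data=purchases;
      id customer month;                        /* cross section and time */
      model spend = price promo loyalty_flag / fixone;
   run;

Replacing FIXONE with RANONE fits the random-effects counterpart, and comparing the two (for example, via the Hausman test PROC PANEL reports) indicates whether the unobserved effects are correlated with the regressors.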