SAS/ETS^{®} Software Papers A-Z

Debugging SAS^{®} code contained in a macro can be frustrating because the SAS error messages refer only to the line in the SAS log where the macro was invoked. This can make it difficult to pinpoint the problem when the macro contains a large amount of SAS code. Using a macro that contains one small DATA step, this paper shows how to use the MPRINT and MFILE options along with the fileref MPRINT to write just the SAS code generated by a macro to a file. The 'de-macroified' SAS code can be easily executed and debugged.
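As a sketch of the technique described above (the macro name and output path here are hypothetical), the MPRINT and MFILE system options, together with a fileref that must be named MPRINT, route the macro-generated code to a file:

```sas
/* Hypothetical one-step macro for illustration */
%macro makebmi;
   data work.bmi;
      set sashelp.class;
      bmi = 703 * weight / (height ** 2);
   run;
%mend makebmi;

/* MFILE writes generated code to the fileref named MPRINT */
filename mprint 'c:\temp\demacroified.sas';
options mprint mfile;
%makebmi
options nomprint nomfile;
```

The file demacroified.sas then contains only the resolved DATA step, which can be submitted and debugged on its own.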

Evaluation of the efficacy of an intervention is often complicated because the intervention is not randomly assigned. Usually, interventions in marketing, such as coupons or retention campaigns, are directed at customers because their spending is below some threshold or because the customers themselves make a purchase decision. The presence of nonrandom assignment of the stimulus can lead to over- or underestimating the value of the intervention. This can cause future campaigns to be directed at the wrong customers or cause the impact of the intervention to be misstated. This paper gives a brief overview of selection bias, demonstrates how selection in the data can be modeled, and shows how to apply some of the important consistent methods of estimating selection models, including Heckman's two-step procedure, in an empirical example. Sample code is provided in an appendix.
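The paper's appendix contains the actual code; as a rough sketch (with a hypothetical data set and variable names), a Heckman-style sample-selection model can be estimated in PROC QLIM by pairing a probit selection equation with an outcome equation:

```sas
/* Hypothetical campaign data for illustration */
proc qlim data=campaign;
   /* Selection equation: probit model for receiving the coupon */
   model treated = income tenure / discrete;
   /* Outcome equation: spending, observed only for selected customers */
   model spend = income promo_count / select(treated = 1);
run;
```

PROC QLIM estimates the two equations jointly by maximum likelihood, an alternative to the two-step procedure that the paper discusses.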

Portfolio segmentation is key in all forecasting projects. Not all products are equally predictable. Nestlé uses animal names for its segmentation, and the animal behavior translates well into how planners should plan these products. Mad Bulls are products that are tough to predict when we don't know what is causing their unpredictability. Horses are easier to deal with. Modern time-series-based statistical forecasting methods can tame Mad Bulls because they allow explanatory variables to be added to the models. Nestlé now complements its SAP-based Demand Planning solution with predictive analytics technology provided by SAS^{®} to overcome these issues in an industry that is highly promotion-driven. In this talk, we provide an overview of the relationship Nestlé is building with SAS and give concrete examples of how modern statistical forecasting methods available in SAS^{®} Demand-Driven Planning and Optimization help us increase forecasting performance, and therefore provide high service to our customers with optimized stock, the primary goal of Nestlé's supply chains.

The role of the Data Scientist is the viral job description of the decade. And like LOLcats, there are many types of Data Scientists. What is this new role? Who is hiring them? What do they do? What skills are required to do their job? What does this mean for the SAS^{®} programmer and the statistician? Are they obsolete? And finally, if I am a SAS user, how can I become a Data Scientist? Come learn about this job of the future and what you can do to be part of it.

Retail price setting is influenced by two distinct factors: the regular price and the promotion price. Together, these factors determine the list price for a specific item at a specific time. These data are often reported only as a singular list price. Separating this one price into two distinct prices is critical for accurate price elasticity modeling in retail. These elasticities are then used to make sales forecasts, manage inventory, and evaluate promotions. This paper describes a new time-series feature extraction utility within SAS^{®} Forecast Server that allows for automated separation of promotional and regular prices.

Two examples of Vector Autoregressive Moving Average modeling with exogenous variables are given in this presentation. Data is from the real world. One example is about a two-dimensional time series for wages and prices in Denmark that spans more than a hundred years. The other is about the market for agricultural products, especially eggs! These examples give a general overview of the many possibilities offered by PROC VARMAX, such as handling of seasonality, causality testing and Bayesian modeling, and so on.
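A minimal PROC VARMAX sketch (the data set and variable names are hypothetical stand-ins for the Danish wage-price series) showing a bivariate model with differencing and a Granger causality test:

```sas
proc varmax data=denmark;
   /* Bivariate VARMA(2,1) on first-differenced series */
   model wages prices / p=2 q=1 dify=(1);
   /* Test whether prices Granger-cause wages */
   causal group1=(wages) group2=(prices);
run;
```

The same MODEL statement accepts exogenous regressors on the right-hand side, which is how the VARMAX ("X") part of the examples is specified.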

Duration and severity data arise in several fields, including biostatistics, demography, economics, engineering, and sociology. The SAS^{®} procedures LIFETEST, LIFEREG, and PHREG are the workhorses for the analysis of time-to-event data in biostatistics applications. Similar methods apply to the magnitude or severity of a random event, where the outcome might be right, left, or interval censored and/or right or left truncated. All combinations of types of censoring and truncation could be present in the data set. Regression models such as the accelerated failure time model, the Cox model, and the non-homogeneous Poisson model have extensions to address time-varying covariates in the analysis of clustered outcomes, multivariate outcomes of mixed types, and recurrent events. We present an overview of new capabilities that are available in the procedures QLIM, QUANTLIFE, RELIABILITY, and SEVERITY, with examples illustrating their application using empirical data sets drawn from easily accessible sources.

Companies in the insurance and banking industries need to model the frequency and severity of adverse events every day. Accurate modeling of risks and the application of predictive methods ensure the liquidity and financial health of portfolios. Often, the modeling involves computationally intensive, large-scale simulation. SAS/ETS^{®} provides high-performance procedures to assist in this modeling. This paper discusses the capabilities of the HPCOUNTREG and HPSEVERITY procedures, which estimate count and loss distribution models in a massively parallel processing environment. The loss modeling features have been extended by the new HPCDM procedure, which simulates the probability distribution of the aggregate loss by compounding the count and severity distribution models. PROC HPCDM also analyzes the impact of various future scenarios and parameter uncertainty on the distribution of the aggregate loss. This paper steps through the entire modeling and simulation process that is useful in the insurance and banking industries.

Global businesses must react to daily changes in market conditions over multiple geographies and industries. Consuming reputable daily economic reports assists in understanding these changing conditions, but requires both a significant human time commitment and a subjective assessment of each topic area of interest. To combat these constraints, Dow's Advanced Analytics team has constructed a process to calculate sentence-level topic frequency and sentiment scoring from unstructured economic reports. Daily topic sentiment scores are aggregated to weekly and monthly intervals and used as exogenous variables to model external economic time series data. These models serve both to validate our sentiment scoring process and as near-term forecasts where daily or weekly variables are unavailable. This paper first describes our process of using SAS^{®} Text Miner to import and discover economic topics and sentiment from unstructured economic reports. The next section describes sentiment variable selection techniques that use SAS/STAT^{®}, SAS/ETS^{®}, and SAS^{®} Enterprise Miner^{™} to generate similarity measures to economic indices. Our process then uses ARIMAX modeling in SAS^{®} Forecast Studio to create economic index forecasts with topic sentiments. Finally, we show how the sentiment model components are used as a matrix of economic key performance indicators by topic and geography.

One of the most striking features separating SAS^{®} from other statistical languages is that SAS has native SQL (Structured Query Language) capacity. In addition to the merging or the querying that a SAS user commonly applies in daily practice, SQL significantly enhances the power of SAS in descriptive statistics and data management. In this paper, we show reproducible examples to introduce 10 useful tips for the SQL procedure in the BASE module.
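One representative tip of this kind (not necessarily one of the paper's ten) is using PROC SQL for grouped descriptive statistics in a single step:

```sas
proc sql;
   select sex,
          count(*)     as n,
          mean(height) as mean_height format=6.2,
          std(height)  as sd_height   format=6.2
   from sashelp.class
   group by sex;
quit;
```

Summary functions combined with GROUP BY replace what would otherwise require a PROC MEANS step plus post-processing.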

Volatility estimation plays an important role in the fields of statistics and finance. Many different techniques address the problem of estimating the volatility of financial assets. Autoregressive conditional heteroscedasticity (ARCH) models and the related generalized ARCH (GARCH) models are popular models for volatility. This talk introduces the need for volatility modeling as well as the framework of ARCH and GARCH models. After a brief discussion of the structure of ARCH and GARCH models, these models are compared to other volatility modeling techniques.
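As a sketch of the framework (the data set and variable names are hypothetical), a GARCH(1,1) model can be fit with PROC AUTOREG in SAS/ETS:

```sas
/* Hypothetical daily return series for illustration */
proc autoreg data=returns;
   /* GARCH(1,1): conditional variance driven by one lagged
      squared residual (q=1) and one lagged variance (p=1) */
   model ret = / garch=(p=1, q=1);
run;
```

Setting p=0 reduces this to a pure ARCH(q) specification.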

Expensive physical capital must be regularly maintained for optimal efficiency and long-term insurance against damage. The maintenance process usually consists of constantly monitoring high-frequency sensor data and performing corrective maintenance when the expected values do not match the actual values. An economic system can also be thought of as a system that requires constant monitoring and occasional maintenance in the form of monetary or fiscal policy. This paper shows how to use the SSM procedure in SAS/ETS^{®} to make forecasts of expected values by using high-frequency multivariate time series. The paper also demonstrates the functionality of the new SASEFRED interface engine in SAS/ETS.
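As a hedged sketch of the SASEFRED interface engine (the path, API key, and series ID below are placeholders), economic series can be pulled from the Federal Reserve Economic Data (FRED) service into a SAS data set via a LIBNAME statement:

```sas
/* Placeholders: supply your own FRED API key and download path */
libname fred sasefred "c:\temp\fred"
   outxml=gdp
   apikey='your-fred-api-key'
   idlist='gdp';

data work.gdp;
   set fred.gdp;
run;
```

The resulting data set can then feed the MODEL statements of the SSM procedure as an exogenous or dependent series.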