Arrays and DO loops have been used for years by SAS® programmers who work with diagnosis fields in health-care data, and the opportunity to use these techniques has only grown since the launch of the Affordable Care Act (ACA) in the United States. Users new to SAS or to the health-care field may find an overview of existing (as well as new) applications helpful. Risk-adjustment software, including the publicly available Health and Human Services (HHS) risk software that uses SAS and was released as part of the ACA implementation, is one example of code that is significantly improved by the use of arrays. Similar projects might include evaluations of diagnostic persistence, comparisons of diagnostic frequency or intensity between providers, and checks for unusual clusters of diagnosed conditions. This session reviews examples suitable for intermediate SAS users, including the application of two-dimensional arrays to diagnosis fields.
Ryan Ferland, Blue Cross Blue Shield of Arizona
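As a minimal sketch of the two-dimensional array technique described above (the data set, code values, and field layout are hypothetical), the following DATA step checks whether a single diagnosis code persists across three years of claims, each year carrying ten diagnosis fields:

   /* Hypothetical layout: ten diagnosis fields per year for three years */
   data persistence;
      set member_claims;
      array dx{3,10} $ yr1dx1-yr1dx10 yr2dx1-yr2dx10 yr3dx1-yr3dx10;
      array seen{3} _temporary_;
      do yr = 1 to 3;
         seen{yr} = 0;
         do slot = 1 to 10;
            if dx{yr,slot} = 'E119' then seen{yr} = 1;  /* code found in that year */
         end;
      end;
      persistent = (seen{1} and seen{2} and seen{3});   /* diagnosed in all three years */
      drop yr slot;
   run;

The same row-by-column indexing extends naturally to comparing diagnostic frequency between providers or scanning for unusual condition clusters.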
Although marketing is essential in every business, the annual marketing budget is limited, and prudent fund allocation is required to optimize the return on marketing investment. In many businesses, the marketing budget is allocated based on the marketing manager's experience, departmental budget allocation rules, and sometimes the 'gut feelings' of business leaders. These traditional approaches to budget allocation yield suboptimal results and in many cases lead to money being wasted on irrelevant marketing efforts. Marketing mix models can be used to understand the effects of marketing activities and to identify the key efforts that drive the most sales among a group of competing marketing activities. The results can be used to take the guesswork out of marketing budget allocation. In this paper, we illustrate practical methods for developing and implementing marketing mix models using SAS® procedures. Real-life challenges of marketing mix model development and execution are discussed, and several recommendations are provided for overcoming some of those challenges.
Delali Agbenyegah, Alliance Data Systems
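As a hedged illustration of the modeling step (the variable names, the adstock carryover rate, and the log-log form are assumptions, not the author's specification), a marketing mix model is often fit as a regression of transformed sales on transformed media variables:

   /* Geometric adstock carries over part of last week's TV pressure (0.5 is assumed) */
   data mmm_weekly2;
      set mmm_weekly;
      retain adstock_tv 0;
      adstock_tv = tv_spend + 0.5*adstock_tv;
      log_tv = log(1 + adstock_tv);
   run;

   /* Log-log regression: coefficients approximate sales elasticities by channel */
   proc reg data=mmm_weekly2;
      model log_sales = log_tv log_digital log_print promo_flag;
   run;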
Customer retention is a primary concern for businesses that rely on a subscription revenue model. It is common for marketers of subscription-based offerings to develop predictive models aimed at identifying subscribers who have the highest risk of attrition. With these likely unsubscribers identified, marketers then attempt to forestall customer termination by using a variety of retention enhancement tactics, which might include free offers, customer training, satisfaction surveys, or other measures. Although customer retention is always a worthy pursuit, it is often expensive. In many cases, retention programs simply prove unprofitable over time because their overall cost frequently exceeds the lifetime value of the cohort of customers they aim to retain. Generally, it is more profitable to focus resources on identifying and marketing to targeted prospective customers. When the target marketing strategy focuses on identifying prospects who are most likely to subscribe over the long term, the need for special retention marketing efforts decreases sharply. This paper describes the results of an analytically driven targeting approach aimed at inviting new customers to a milk and grocery home-delivery service, with the promise of attracting only those prospects who are expected to exhibit high long-term retention rates.
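One plausible sketch of such a targeting model (the data sets, predictors, and logistic form are assumptions) fits long-term retention on historical acquisitions and then scores the prospect list:

   /* Fit on historical acquisitions: retained_12m = still active after 12 months */
   proc logistic data=past_customers;
      class acquisition_channel / param=ref;
      model retained_12m(event='1') = household_size home_owner acquisition_channel;
      store work.retain_model;
   run;

   /* Score the prospect list with the fitted model */
   proc plm restore=work.retain_model;
      score data=prospects out=scored predicted=p_retain / ilink;
   run;

Prospects with the highest predicted long-term retention would then receive the delivery-service invitation.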
Many organizations need to report forecasts of large numbers of time series at various levels of aggregation. Numerous model-based forecasts that are statistically generated at the lowest level of aggregation need to be combined to form an aggregate forecast that is not required to follow a fixed hierarchy. The forecasts need to be dynamically aggregated according to any subset of the time series, such as from a query. This paper proposes a technique for large-scale automatic forecast aggregation and uses SAS® Forecast Server and SAS/ETS® software to demonstrate this technique.
Michael Leonard, SAS
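A minimal sketch of the aggregation idea (table and column names are hypothetical, and independent forecast errors are assumed so that variances add):

   proc sql;
      create table agg_forecast as
      select date,
             sum(predict)             as predict,   /* point forecasts add */
             sum(std*std)             as var_sum,   /* variances add under independence */
             sqrt(calculated var_sum) as std
      from series_forecasts
      where region = 'WEST' and product_line = 'A'  /* any query-defined subset */
      group by date;
   quit;

The paper's technique addresses the same need at much larger scale, without requiring a fixed hierarchy.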
We introduce age-period-cohort (APC) models, which analyze data in which performance is measured by age of an account, account open date, and performance date. We demonstrate this flexible technique with an example from a recent study that seeks to explain the root causes of the US mortgage crisis. In addition, we show how APC models can predict website usage, retail store sales, salesperson performance, and employee attrition. We even present an example in which APC was applied to a database of tree rings to reveal climate variation in the southwestern United States.
Joseph Breeden, Prescient Models
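As a hedged sketch (data set and variable names are hypothetical), an APC model for loan performance can be fit as a generalized linear model with the three time dimensions as classification effects. Because age, period, and cohort are linearly dependent, at least one identifying constraint is required; the GLM-style parameterization imposes one implicitly by aliasing.

   /* One row per vintage x months-on-book x calendar quarter */
   proc genmod data=vintages;
      class age_on_book vintage calendar_qtr;
      model defaults/accounts = age_on_book vintage calendar_qtr
            / dist=binomial link=logit;
   run;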
SAS® Forecast Server is an excellent tool for forecasting time series across a variety of scenarios and industries. Given sufficient data, it can provide insights into seasonality, trend, and the effects of input variables and events. In some business cases, however, there might not be sufficient data within individual series to produce these complex models. This paper presents an approach to forecasting such short time series. Using sufficiently long time series as a training data set, forecast shapes and volumes are created through clustering and attribute-based regressions, allowing new series to receive forecasts based on their characteristics. These forecasts are passed into SAS Forecast Server through preprocessing. Producing artificial history and input variables, matched with a custom model added to the standard model repository, allows future values of the artificial variable to be realized as the forecast. This process not only removes the need for manual overrides, but also allows SAS Forecast Server to choose alternative models as additional observations are provided through incremental loads.
Cobey Abramowski, SAS
Christian Haxholdt, SAS
Chris Houck, SAS
Benjamin Perryman, SAS
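A rough sketch of the two preparatory steps (the data sets, attributes, and cluster count are assumptions, not the authors' production setup):

   /* Cluster long series by their normalized monthly volume profile */
   proc fastclus data=profiles maxclusters=5 out=clustered;
      var m1-m12;     /* each series' share of annual volume by month */
   run;

   /* Relate annual volume to numeric series attributes for new-series prediction */
   proc reg data=clustered;
      model annual_volume = store_size price_tier;
   run;

The cluster's shape, scaled by the predicted volume, becomes the artificial history passed to SAS Forecast Server.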
The importance of understanding terrorism in the United States assumed heightened prominence in the wake of the coordinated attacks of September 11, 2001. Yet surprisingly little is known about how frequently attacks might happen in the future and about the factors that lead to a successful terrorist attack. This research aims to forecast the frequency of attacks per annum in the United States and to determine the factors that contribute to an attack's success. Using data acquired from the Global Terrorism Database (GTD), we tested our hypothesis by examining the frequency of attacks per annum in the United States from 1972 to 2014 using SAS® Enterprise Miner™ and SAS® Forecast Studio. The data set has 2,683 observations. Our forecasting model predicts that there could be as many as 16 attacks every year for the next 4 years. From our study of factors that contribute to the success of a terrorist attack, we found that attack type, weapons used in the attack, place of attack, and type of target play pivotal roles in determining the success of a terrorist attack. Results reveal that the government might be successful in averting assassination attempts but might fail to prevent armed assaults and facility or infrastructure attacks. Therefore, additional security might be required for important facilities in order to prevent further attacks from succeeding. Results further reveal that it is possible to reduce the forecasted number of attacks by raising defense spending and by putting an end to the raging war in the Middle East. We are currently gathering data for in-depth analysis at the monthly and quarterly levels.
SriHarsha Ramineni, Oklahoma State University
Koteswara Rao Sriram, Oklahoma State University
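For context, a bare-bones version of an annual count forecast might look as follows (the study used SAS Forecast Studio; this PROC ARIMA sketch, with an assumed ARIMA(1,1,1) form and hypothetical data set names, is only illustrative):

   proc arima data=attacks_yearly;
      identify var=n_attacks(1);    /* first difference if the level drifts */
      estimate p=1 q=1;
      forecast lead=4 out=fc;       /* four years ahead, as in the study */
   run;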
Since Atul Gawande popularized the term in describing the work of Dr. Jeffrey Brenner in a New Yorker article, hot-spotting has been used in health care to describe the process of identifying super-utilizers of health care services, then defining intervention programs to coordinate and improve their care. According to Brenner's data from Camden, New Jersey, 1% of patients generate 30% of payments to hospitals, while 5% of patients generate 50% of payments. Analyzing administrative health care claims data, which contain information about diagnoses, treatments, costs, charges, and patient sociodemographics, can be a useful way to identify super-utilizers, as well as those who may be receiving inappropriate care. Both groups can be targeted for care management interventions. In this paper, techniques for patient outlier identification and prioritization are discussed using examples from private commercial and public health insurance claims data. The paper also describes techniques used with health care claims data to identify high-risk, high-cost patients and to generate analyses that can be used to prioritize patients for various interventions to improve their health.
Paul LaBrec, 3M Health Information Systems
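A simple sketch of the super-utilizer flagging step (data set and variable names are hypothetical):

   /* Rank members into cost percentiles, highest spenders first */
   proc rank data=member_costs groups=100 descending out=ranked;
      var total_paid;
      ranks cost_pctile;            /* 0 = top percentile of spending */
   run;

   data flagged;
      set ranked;
      top1 = (cost_pctile = 0);     /* the 1% driving roughly 30% of payments */
      top5 = (cost_pctile <= 4);    /* the 5% driving roughly 50% of payments */
   run;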
Electric load forecasting is a complex problem linked to social activity and to variations in weather and climate. Furthermore, electricity is one of the few goods that require demand and supply to be balanced in almost real time (that is, there is almost no inventory). As a result, the utility industry can be much more sensitive to forecast error than many other industries. The load forecasting problem is even more challenging for holidays, which have limited historical data. Because of the limited holiday data, the forecast error for holidays is higher, on average, than it is for regular days. Identifying days in the history that are similar to holidays and using them to help model holiday demand is not new in many industries. However, electric demand is strongly affected by the interaction of weather conditions and social activities, making the problem even more dynamic. This paper describes an investigation into various techniques for identifying days that are similar to holidays and into how those days can be used for holiday demand forecasting in the utility industry.
Rain Xie, SAS
Alex Chien, SAS
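One elementary way to score similarity (the weather attributes and the plain Euclidean distance are assumptions; the paper investigates more sophisticated techniques):

   /* Ten historical days closest to the target holiday's weather profile */
   proc sql outobs=10;
      select h.date,
             sqrt( (h.temp     - t.temp)**2
                 + (h.humidity - t.humidity)**2
                 + (h.daylight - t.daylight)**2 ) as dist
      from history_days h, target_holiday t      /* target_holiday has one row */
      order by dist;
   quit;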
Contemporary data-collection processes usually record information about the geographic location of each observation. This geospatial information gives modelers opportunities to examine how the interaction of observations affects the outcome of interest. For example, car sales at one auto dealership might depend on sales at a nearby dealership, either because the two dealerships compete for the same customers or because of some form of unobserved heterogeneity common to both. Knowledge of the direction and magnitude of such spillover effects is important for creating pricing or promotional policies. This paper describes how geospatial methods are implemented in SAS/ETS® and illustrates some ways you can incorporate spatial data into your modeling toolkit.
Guohui Wu, SAS
Jan Chvosta, SAS
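As a hedged sketch of fitting a spatial autoregressive (SAR) model with the SPATIALREG procedure (the data set, variables, and weights table are hypothetical, and exact options can vary by SAS/ETS release):

   /* W holds the spatial weights linking each dealership to its neighbors */
   proc spatialreg data=dealerships Wmat=W;
      model sales = price promo / type=SAR;
   run;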
Macroeconomic simulation analysis provides in-depth insight into a portfolio's performance spectrum. Conventionally, portfolio and risk managers obtain macroeconomic scenarios from third parties such as the Federal Reserve and determine portfolio performance under those scenarios. In this paper, we propose a technique that extends scenario analysis to an unconditional simulation capturing the distribution of possible macroeconomic climates, and hence the true multivariate distribution of returns. The proposed methodology adds to existing scenario analysis tools and can be used to determine which types of macroeconomic climates have the most adverse outcomes for the portfolio. This provides a broader perspective on value-at-risk measures, thereby allowing more robust investment decisions. We explain the use of the VARMAX and COPULA procedures, together with SAS/IML®, in this analysis.
Srikant Jayaraman, SAS
Joe Burdis, SAS Research and Development
Lokesh Nagar, SAS Research and Development
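A minimal sketch of the first stage (the series names and VAR(2) order are assumptions): fit a vector time series model to the macroeconomic factors, from which forecasts or simulated paths can be drawn; the copula step the abstract mentions would then tie the factor distributions to asset returns.

   proc varmax data=macro_quarterly;
      model gdp_growth unemp cpi / p=2;   /* VAR(2) for three macro factors */
      output out=fc lead=8;               /* eight quarters ahead */
   run;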
In 1965, nearly half of all Americans 65 and older had no health insurance. Now, 50 years later, only 2% lack health insurance. The difference, of course, is Medicare. Medicare now covers 55 million people, about 17% of the US population, and is the single largest purchaser of personal health care. Despite this success, the rising costs of health care in general and Medicare in particular have become a growing concern. Medicare policies are important not only because they directly affect large numbers of beneficiaries, payers, and providers, but also because they influence private-sector policies. Analyses of Medicare policies and their consequences are complicated both by the effects of an aging population that has changing cost drivers (such as less smoking and more obesity) and by different Medicare payment models. For example, the average age of the Medicare population will initially decrease as the baby-boom generation reaches eligibility, but then increase as that generation grows older. Because younger beneficiaries have lower costs, these changes will affect cost trends and patterns that need to be interpreted within the larger context of demographic shifts. This presentation examines three Medicare payment models: fee-for-service (FFS), Medicare Advantage (MA), and Accountable Care Organizations (ACOs). FFS, originally based on payment methods used by Blue Cross and Blue Shield in the mid-1960s, pays providers for individual services (for example, physicians are paid based on the fees they charge). MA is a capitated payment model in which private plans receive a risk-adjusted rate. ACOs are groups of providers who are given financial incentives for reducing cost and maintaining quality of care for specified beneficiaries. Each model has strengths and weaknesses in specific markets. We examine each model, in addition to new data sources and more recent, innovative payment models that are likely to affect future trends.
Paul Gorrell, IMPAQ International
As any airline traveler knows, connection time is a key element of the travel experience. A tight connection time can cause angst and concern, while a lengthy connection time can introduce boredom and a longer than desired travel time. The same elements apply when constructing schedules for airline pilots. Like passenger itineraries, pilot schedules are built around connections. Delta Air Lines operates a hub-and-spoke system that feeds both passengers and pilots from the spoke stations and connects them through the hub stations. Pilot connection times that are too tight can result in operational disruptions, whereas extended pilot connection times are inefficient and unnecessarily costly. This paper demonstrates how Delta Air Lines used SAS® PROC REG and PROC LOGISTIC to analyze historical data in order to build operationally robust and financially responsible pilot connections.
Andy Hummel, Delta Air Lines
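A plausible sketch of the logistic piece (data set and predictors are hypothetical, not Delta's actual model):

   /* Model the probability that a pilot connection is disrupted */
   proc logistic data=pilot_connections;
      class hub / param=ref;
      model disrupted(event='1') = connect_minutes hub inbound_delay_hist;
   run;

The coefficient on connect_minutes quantifies how much each added minute of connection time reduces disruption risk, which can be weighed against the cost of longer connections.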
Today, there are 28 million small businesses in the United States, accounting for 54% of all US sales. The challenge is that small businesses struggle every day to accurately forecast future sales. These forecasts not only drive investment decisions in the business, but are also used in setting daily par levels, determining labor hours, and scheduling operating hours. In general, owners rely on gut instinct. SAS® provides the opportunity to develop accurate and robust models that can uncover cost savings for small business owners in a short amount of time. This research examines over 5,000 records from the first year of daily sales data for a start-up small business, comparing the four basic forecasting models within SAS® Enterprise Guide®. The objective of this model comparison is to demonstrate how quick and easy it is to forecast small business sales using SAS Enterprise Guide. What does that mean for small businesses? More profit. SAS provides cost-effective models that help small businesses better forecast sales, resulting in better business decisions.
Cameron Jagoe, The University of Alabama
Taylor Larkin, The University of Alabama
Denise McManus, The University of Alabama
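As one example of how quickly such a baseline can be produced (the data set and model choice are assumptions; SAS Enterprise Guide exposes similar options through its forecasting tasks):

   /* 28-day holdout to compare fit statistics across candidate models */
   proc esm data=daily_sales back=28 lead=28 print=statistics;
      id date interval=day;
      forecast sales / model=addwinters;  /* also try simple, linear, seasonal */
   run;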
Optimization models require continuous constraints in order to converge. However, some real-life problems are better described by models that incorporate discontinuous constraints. A common type of discontinuous constraint arises when a regulation-mandated diversification requirement is implemented in an investment portfolio model. Generally stated, the requirement postulates that the aggregate weight of investments whose individual weights exceed a certain threshold must not exceed some predefined total within the portfolio. This form of diversification requirement can be defined by the rules of any specific portfolio construction methodology and is commonly imposed by regulators. The paper discusses the impact of this type of discontinuous diversification constraint on the portfolio optimization solution process and develops a convergent approach, which consists of a definite sequence of convergent nonlinear optimization problems and is presented in the framework of the OPTMODEL procedure modeling environment. The approach discussed has been used in constructing investable equity indexes.
Taras Zlupko, University of Chicago
Robert Spatz, University of Chicago
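For orientation, a sketch of the continuous core of such a model in PROC OPTMODEL (the input tables, the 10% per-asset cap, and the risk-aversion weight are hypothetical; the threshold-aggregate diversification constraint is the discontinuous piece that the paper handles through its sequence of convergent nonlinear problems):

   proc optmodel;
      set ASSETS;
      num mu {ASSETS};
      num cov {ASSETS, ASSETS};
      read data expret into ASSETS=[asset] mu;
      read data covmat into [a1 a2] cov=covval;
      var w {ASSETS} >= 0 <= 0.10;          /* simple continuous per-asset cap */
      con budget: sum {i in ASSETS} w[i] = 1;
      max util = sum {i in ASSETS} mu[i]*w[i]
               - 0.5 * sum {i in ASSETS, j in ASSETS} w[i]*cov[i,j]*w[j];
      solve with nlp;
      print w;
   quit;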
Fast detection of forest fires is a great concern among environmental experts and national park managers because forest fires create economic and ecological damage and endanger human lives. For effective fire control and resource preparation, it is necessary to predict fire occurrences in advance and estimate the possible losses caused by fires. For this purpose, using real-time sensor data on weather conditions and fire occurrences is highly recommended to support the prediction mechanism. The objective of this presentation is to use SAS® 9.4 and SAS® Enterprise Miner™ 14.1 to predict the probability of fires and to identify the weather conditions associated with larger burned areas in the Montesinho Park forest (Portugal). The data set was obtained from the Center for Machine Learning and Intelligent Systems at the University of California, Irvine, and contains 517 observations and 13 variables from January 2000 to December 2003. Support vector machine analyses with variable selection were performed on this data set for fire occurrence prediction, with a validation accuracy of approximately 60%. The study also incorporates the incremental response technique and hypothesis testing to estimate the increased probability of fire, as well as the extra burned area, under various conditions. For example, when there is no rain, a 27% higher chance of fire and 4.8 hectares of extra burned area are recorded, compared to when there is rain.
Quyen Nguyen, Oklahoma State University
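As a simpler stand-in for the support vector machine used in the study (PROC LOGISTIC in place of an SVM; the binary fire flag, for example derived from a positive burned area, is an assumption), an occurrence model might look like:

   /* temp, RH, wind, rain, and month are fields in the UCI forest fires data */
   proc logistic data=forestfires;
      class month / param=ref;
      model fire(event='1') = temp RH wind rain month;
   run;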
Value-based reimbursement is the emerging strategy in the US healthcare system. The premise of value-based care is simple in concept: high quality at low cost provides the greatest value to patients and the various parties that fund their coverage. The basic equation for value is equally simple to compute: value = quality / cost. However, there are significant challenges to measuring value accurately. Error or bias in measuring value could cause this strategy to fail to improve the healthcare system. This session discusses various methods and issues with risk adjustment in a value-based reimbursement model. Risk adjustment is an essential tool for ensuring fair comparisons when deciding which health services and providers deliver high value. The goal of this presentation is to give analysts an overview of risk adjustment and to provide guidance on when, why, and how to use risk adjustment when quantifying the performance of health services and healthcare providers on both cost and quality. Statistical modeling approaches are reviewed, practical issues with developing and implementing the models are discussed, and real-world examples are provided.
Daryl Wansink, Conifer Value Based Care
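A hedged sketch of one common risk-adjustment workflow (the data set, variables, and gamma GLM choice are assumptions): model expected cost from patient risk factors, then compare observed to expected at the provider level.

   proc genmod data=patients;
      class sex risk_group;
      model cost = age sex risk_group / dist=gamma link=log;
      output out=scored predicted=expected_cost;
   run;

   proc sql;
      create table provider_oe as
      select provider_id,
             sum(cost) / sum(expected_cost) as oe_ratio  /* observed-to-expected */
      from scored
      group by provider_id;
   quit;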
In forecasting, there are often situations where several time series are interrelated: components of one time series can transition into and from other time series. A Markov chain forecast model may readily capture such intricacies through the estimation of a transition probability matrix, which enables a forecaster to forecast all the interrelated time series simultaneously. A Markov chain forecast model is flexible in accommodating various forecast assumptions and structures. Implementation of a Markov chain forecast model is straightforward using SAS/IML® software. This paper demonstrates a real-world application in forecasting a community supervision caseload in Washington State. A Markov model was used to forecast five interrelated time series in the midst of turbulent caseload changes. This paper discusses the considerations and techniques in building a Markov chain forecast model at each step. Sample code using SAS/IML is provided. Anyone interested in adding another tool to their forecasting technique toolbox will find that the Markov approach is useful and has some unique advantages in certain settings.
Gongwei Chen, Caseload Forecast Council
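The core of such a model is compact in SAS/IML; in this sketch the transition matrix, states, and horizon are hypothetical, not the Washington State estimates:

   proc iml;
      /* 3-state transition matrix: row = from-state, column = to-state */
      P = {0.90 0.08 0.02,
           0.15 0.80 0.05,
           0.05 0.10 0.85};
      x = {1000, 500, 200};        /* current caseload counts by state */
      do step = 1 to 12;           /* forecast 12 periods ahead */
         x = t(P) * x;             /* all interrelated series update at once */
      end;
      print x;
   quit;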
SAS/ETS® 14.1 delivers a substantial number of new features to researchers who want to examine causality with observational data in addition to forecasting the future. This release adds count data models with spatial effects, new linear and nonlinear models for panel data, the X13 procedure for seasonal adjustment, and many more new features. This paper highlights the many enhancements to SAS/ETS software and demonstrates how these features can help your organization increase revenue and enhance productivity.
Jan Chvosta, SAS
This paper proposes a technique that uses wavelet analysis (WA) to improve the forecasting accuracy of the autoregressive integrated moving average (ARIMA) model for nonlinear time series. Because ARIMA assumes linear correlation and relies on conventional seasonal adjustment methods (that is, differencing, X-11, and X-12), the model might fail to capture nonlinear patterns. Rather than directly model such a signal, we decompose it into less complex components, such as trend, seasonality, process variations, and noise, using WA. We then use these components as exogenous variables in an autoregressive integrated moving average model with explanatory variables (ARIMAX). We describe the background of WA, and then demonstrate code and a detailed explanation of WA based on multi-resolution analysis (MRA) in SAS/IML® software. The ideas and mathematical bases of ARIMA and ARIMAX are also given. Next, we demonstrate our technique in forecasting applications using SAS® Forecast Studio. The demonstrated time series are nonlinear in nature and come from different fields. The results suggest that the WA components are good regressors in ARIMAX, capturing nonlinear patterns well.
Woranat Wongdhamma, Oklahoma State University
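Once the wavelet components have been extracted (here assumed already stored as variables trend and seasonal; the ARIMA orders are assumptions), the ARIMAX step can be run in PROC ARIMA:

   proc arima data=decomposed;
      identify var=y crosscorr=(trend seasonal);
      estimate p=1 q=1 input=(trend seasonal);
      forecast lead=12 out=fc;   /* future values of the inputs must be supplied */
   run;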
Each night on the news we hear the level of the Dow Jones Industrial Average along with the 'first difference,' which is today's price-weighted average minus yesterday's. It is that series of first differences that excites or depresses us each night, as it reflects whether stocks made or lost money that day. Furthermore, the differences form the data series that has the most addressable statistical features. In particular, the differences satisfy the stationarity requirement, which justifies standard distributional results such as asymptotically normal distributions of parameter estimates. Differencing arises in many practical time series because they seem to have what are called 'unit roots,' which mathematically indicate the need to take differences. In 1976, Dickey and Fuller developed the first well-known tests to decide whether differencing is needed. These tests are part of the ARIMA procedure in SAS/ETS®, as well as many other time series analysis products. I'll review a little of what it was like to do the development and the required computing back then, say a little about why this is an important issue, and focus on examples.
David Dickey, NC State University
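Running the tests Dickey and Fuller developed is now a one-line option in PROC ARIMA (the data set name and lag choices here are arbitrary):

   /* Augmented Dickey-Fuller unit root tests at lags 0, 1, and 2 */
   proc arima data=series;
      identify var=y stationarity=(adf=(0,1,2));
   run;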