SAS Enterprise Guide Papers A-Z

A
Paper 2641-2015:
A New Method of Using Polytomous Independent Variables with Many Levels for the Binary Outcome of Big Data Analysis
In big data, many variables are polytomous with many levels. The common way to handle a polytomous independent variable when the outcome is binary is to use a series of design variables, which correspond to the variable's levels and are generated by the CLASS statement in PROC LOGISTIC. If big data has many polytomous independent variables with many levels, using design variables makes the analysis very complicated in both computation time and results, and might contribute little to the prediction of the outcome. This paper presents a simple new method for logistic regression with polytomous independent variables in big data analysis. In the proposed method, the first step is an iterative statistical analysis run from a SAS® macro program. Similar to an algorithm for creating spline variables, this analysis searches for aggregation groups with statistically significant differences among all levels of a polytomous independent variable. The SAS macro program iteratively searches for new level groups with statistically significant differences. It starts from level 1, the level with the smallest outcome mean, and tests the level 1 group against the level 2 group, which has the second-smallest outcome mean. If these two groups differ significantly, we move on to test the level 2 group against the level 3 group. If levels 1 and 2 do not differ significantly, we combine them into a new level group 1 and test that group against level 3. The processing continues until all the levels have been tested. We then replace the original level values of the polytomous variable with the new level values, which differ significantly from one another. The polytomous variable with new levels can be described by the means of all new levels because of the 1-to-1 equivalence relationship of a piecewise function in logit from the variable's levels to the outcome means. It is easy to prove that the conditional mean of an outcome y given a polytomous variable x is a very good approximation based on maximum likelihood analysis. Compared with design variables, the new piecewise variable, which is based on the information of all levels, can capture the impact of all levels as a single independent variable in a much simpler way. We have used this method in predictive models of customer attrition on polytomous variables such as state, business type, and customer claim type. All of these polytomous variables show significantly better prediction of customer attrition than models that omit them or that use design variables.
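A minimal sketch of the level-merging step described above, assuming an input data set LEVELS with one row per level (LEVEL, its case count N, and its binary-outcome mean P), already sorted by P; all names are hypothetical, and the two-proportion z-test shown is a stand-in for whatever significance test the paper uses:

    /* A sketch only: collapse adjacent levels whose outcome means do not */
    /* differ significantly (two-proportion z-test, alpha = 0.05).        */
    data new_levels;
       set levels;                 /* one row per level: LEVEL, N, P */
       retain group 1 g_n g_p;
       if _n_ = 1 then do; g_n = n; g_p = p; end;
       else do;
          pooled = (g_n*g_p + n*p) / (g_n + n);
          z = (p - g_p) / sqrt(pooled*(1 - pooled)*(1/g_n + 1/n));
          if abs(z) > probit(0.975) then do;  /* significant: start a new group */
             group + 1;
             g_n = n; g_p = p;
          end;
          else do;                            /* not significant: merge into group */
             g_p = (g_n*g_p + n*p) / (g_n + n);
             g_n + n;
          end;
       end;
       keep level group;
    run;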
Read the paper (PDF).
Jian Gao, Constant Contact
Jesse Harriot, Constant Contact
Lisa Pimentel, Constant Contact
Paper 3194-2015:
A Tool That Uses the SAS® PRX Functions to Fix Delimited Text Files
Delimited text files are often plagued by appended and/or truncated records. Writing customized SAS® code to import such a text file and break out into fields can be challenging. If only there was a way to fix the file before importing it. Enter the file_fixing_tool, a SAS® Enterprise Guide® project that uses the SAS PRX functions to import, fix, and export a delimited text file. This fixed file can then be easily imported and broken out into fields.
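As a hedged illustration of the PRX approach (a sketch, not the author's actual tool): the step below rejoins records that were split across lines, assuming a pipe-delimited file that should carry exactly 8 fields per record; the file names and field count are assumptions.

    /* A sketch: accumulate fragments until a record has 8 fields (7 pipes) */
    data _null_;
       infile 'claims_raw.txt' truncover;
       file 'claims_fixed.txt';
       length rec $ 32767;
       retain rec;
       input line $char32767.;
       rec = catx(' ', rec, line);
       if prxmatch('/^([^|]*\|){7}[^|]*$/', strip(rec)) then do;
          put rec;                /* complete record: write it out */
          call missing(rec);      /* start accumulating the next one */
       end;
    run;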
Read the paper (PDF). | Download the data file (ZIP).
Paul Genovesi, Henry Jackson Foundation for the Advancement of Military Medicine, Inc.
Paper 3492-2015:
Alien Nation: Text Analysis of UFO Sightings in the US Using SAS® Enterprise Miner™ 13.1
Are we alone in this universe? This is a question that undoubtedly passes through every mind several times during a lifetime. We often hear stories about close encounters, Unidentified Flying Object (UFO) sightings, and other mysterious things, but we lack documented evidence for analysis on this topic. UFOs have been a matter of public interest for a long time. The objective of this paper is to analyze a database that has a collection of documented reports of UFO sightings to uncover any fascinating stories in the data. Using SAS® Enterprise Miner™ 13.1, the powerful capabilities of text analytics and topic mining are leveraged to summarize the associations between reported sightings. We used PROC GEOCODE to convert the addresses of sightings to locations on a map. Then we used PROC GMAP to produce a heat map representing the frequency of sightings in various locations. The GEOCODE procedure converts address data to geographic coordinates (latitude and longitude values). These geographic coordinates can then be used on a map to calculate distances or to perform spatial analysis. Preliminary analysis of the data found that the most popular words associated with UFOs describe their shapes, formations, movements, and colors. The Text Profile node in SAS Enterprise Miner 13.1 was leveraged to build a model and cluster the data into different levels of the segment variable. We also explain how opinions about the UFO sightings change over time using text profiling. Further, this analysis uses the Text Profile node to find interesting terms or topics that were used to describe the UFO sightings. Based on feedback received at a SAS® analytics conference, we plan to incorporate a technique to filter duplicate comments and to include weather data for each location.
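A hedged sketch of the geocode-and-map pipeline described above, assuming the sightings data set carries a numeric ZIP variable (the ZIP method of PROC GEOCODE looks up SASHELP.ZIPCODE by default); all data set names are illustrative:

    proc geocode method=zip data=sightings out=sightings_geo;
    run;
    data sightings_geo;
       set sightings_geo;
       state = stfips(zipstate(put(zip, z5.)));  /* ZIP to state FIPS, matching MAPS.US */
    run;
    /* state-level frequencies for a choropleth-style heat map */
    proc freq data=sightings_geo noprint;
       tables state / out=state_freq;
    run;
    proc gmap data=state_freq map=maps.us;
       id state;
       choro count / levels=5;
    run;
    quit;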
Read the paper (PDF). | Download the data file (ZIP).
Pradeep Reddy Kalakota, Federal Home Loan Bank of Des Moines
Naresh Abburi, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Zabiulla Mohammed, Oklahoma State University
Paper 3197-2015:
All Payer Claims Databases (APCDs) in Data Transparency and Quality Improvement
Since Maine established the first All Payer Claims Database (APCD) in 2003, 10 additional states have established APCDs and 30 others are in development or show strong interest in establishing APCDs. APCDs are generally mandated by legislation, though voluntary efforts exist. They are administered through various agencies, including state health departments or other governmental agencies and private not-for-profit organizations. APCDs receive funding from various sources, including legislative appropriations and private foundations. To ensure sustainability, APCDs must also consider the sale of data access and reports as a source of revenue. With the advent of the Affordable Care Act, there has been an increased interest in APCDs as a data source to aid in health care reform. The call for greater transparency in health care pricing and quality, development of Patient-Centered Medical Homes (PCMHs) and Accountable Care Organizations (ACOs), expansion of state Medicaid programs, and establishment of health insurance and health information exchanges have increased the demand for the type of administrative claims data contained in an APCD. Data collection, management, analysis, and reporting issues are examined with examples from implementations of live APCDs. The development of data intake, processing, warehousing, and reporting standards is discussed in light of achieving the triple aim of improving the individual experience of care; improving the health of populations; and reducing the per capita costs of care. APCDs are compared and contrasted with other sources of state-level health care data, including hospital discharge databases, state departments of insurance records, and institutional and consumer surveys. The benefits and limitations of administrative claims data are reviewed. Specific issues addressed with examples include implementing transparent reporting of service prices and provider quality, maintaining master patient and provider identifiers, validating APCD data and comparison with other state health care data available to researchers and consumers, defining data suppression rules to ensure patient confidentiality and HIPAA-compliant data release and reporting, and serving multiple end users, including policy makers, researchers, and consumers with appropriately consumable information.
Read the paper (PDF). | Watch the recording.
Paul LaBrec, 3M Health Information Systems
Paper 3140-2015:
An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG
In Scotia - Colpatria Bank, the retail segment is very important. The quantity of lending applications makes it necessary to use statistical models and analytic tools to make an initial selection of good customers, whom our credit analysts then study in depth to finally approve or deny a credit application. The construction of objective vintages using the Cox model will generate past-due alerts sooner, so mitigation measures can be applied one or two months earlier than currently. This can reduce losses by 100 bps in the new vintages. This paper estimates a Cox proportional hazards model and compares the results with a logit model for a specific product of the bank. Additionally, we estimate the objective vintage for the product.
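A minimal PROC PHREG sketch of the kind of model estimated here, with months_to_default as the time variable and default as the event indicator (0 = censored); all data set and variable names are illustrative:

    proc phreg data=vintages;
       class segment / param=ref;         /* illustrative categorical predictor */
       model months_to_default*default(0) = score ltv segment;
       output out=hazards survival=s;     /* survival estimates per account */
    run;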
Read the paper (PDF). | Download the data file (ZIP).
Ivan Atehortua Rojas, Scotia - Colpatria Bank
Paper 3439-2015:
An Innovative Method of Customer Clustering
This session will describe an innovative way to identify groupings of customer offerings using SAS® software. The authors investigated the customer enrollments in nine different programs offered by a large energy utility. These programs included levelized billing plans, electronic payment options, renewable energy, energy efficiency programs, a home protection plan, and a home energy report for managing usage. Of the 640,788 residential customers, 374,441 had been solicited for a program and had adequate data for analysis. Nearly half of these eligible customers (49.8%) enrolled in some type of program. To examine the commonality among programs based on characteristics of customers who enroll, cluster analysis procedures and correlation matrices are often used. However, the value of these procedures was greatly limited by the binary nature of enrollments (enroll or no enroll), as well as the fact that some programs are mutually exclusive (limiting cross-enrollments for correlation measures). To overcome these limitations, PROC LOGISTIC was used to generate predicted scores for each customer for a given program. Then, using the same predictor variables, PROC LOGISTIC was used on each program to generate predictive scores for all customers. This provided a broad range of scores for each program, under the assumption that customers who are likely to join similar programs would have similar predicted scores for these programs. PROC FASTCLUS was used to build k-means cluster models based on these predicted logistic scores. Two distinct clusters were identified from the nine programs. These clusters not only aligned with the hypothesized model, but were generally supported by correlations (using PROC CORR) among program predicted scores as well as program enrollments.
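A hedged two-step sketch of the score-then-cluster approach (program and predictor names are hypothetical; the study scored nine programs):

    /* Step 1: predicted enrollment propensity for one program */
    proc logistic data=customers noprint;
       model enroll_prog1(event='1') = income tenure usage_kwh;
       output out=scored p=p_prog1;
    run;
    /* ...repeat for programs 2 through 9 and merge the scores... */

    /* Step 2: k-means clustering on the predicted logistic scores */
    proc fastclus data=scored_all maxclusters=2 out=clusters;
       var p_prog1-p_prog9;
    run;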
Read the paper (PDF).
Brian Borchers, PhD, Direct Options
Ashlie Ossege, Direct Options
Paper 3472-2015:
Analyzing Marine Piracy from Structured and Unstructured Data Using SAS® Text Miner
Approximately 80% of world trade at present uses the seaways, with around 110,000 merchant vessels and 1.25 million marine farers transported and almost 6 billion tons of goods transferred every year. Marine piracy stands as a serious challenge to sea trade. Understanding how the pirate attacks occur is crucial in effectively countering marine piracy. Predictive modeling using the combination of textual data with numeric data provides an effective methodology to derive insights from both structured and unstructured data. 2,266 text descriptions about pirate incidents that occurred over the past seven years, from 2008 to the second quarter of 2014, were collected from the International Maritime Bureau (IMB) website. Analysis of the textual data using SAS® Enterprise Miner™ 12.3, with the help of concept links, answered questions on certain aspects of pirate activities, such as the following: 1. What are the arms used by pirates for attacks? 2. How do pirates steal the ships? 3. How do pirates escape after the attacks? 4. What are the reasons for occasional unsuccessful attacks? Topics are extracted from the text descriptions using a text topic node, and the varying trends of these topics are analyzed with respect to time. Using the cluster node, attack descriptions are classified into different categories based on attack style and pirate behavior described by a set of terms. A target variable called Attack Type is derived from the clusters and is combined with other structured input variables such as Ship Type, Status, Region, Part of Day, and Part of Year. A Predictive model is built with Attact Type as the target variable and other structured data variables as input predictors. The Predictive model is used to predict the possible type of attack given the details of the ship and its travel. Thus, the results of this paper could be very helpful for the shipping industry to become more aware of possible attack types for different vessel types when traversing different routes , and to devise counter-strategies in reducing the effects of piracy on crews, vessels, and cargo.
Read the paper (PDF).
Raghavender Reddy Byreddy, Oklahoma State University
Nitish Byri, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Tejeshwar Gurram, Oklahoma State University
Anvesh Reddy Minukuri, Oklahoma State University
Paper 3401-2015:
Assessing the Impact of Communication Channel on Behavior Changes in Energy Efficiency
With the increase in government and commissions incentivizing electric utilities to get consumers to save energy, there has been a large increase in the number of energy saving programs. Some are structural, incentivizing consumers to make improvements to their home that result in energy savings. Some, called behavioral programs, are designed to get consumers to change their behavior to save energy. Within behavioral programs, Home Energy Reports are a good method to achieve behavioral savings as well as to educate consumers on structural energy savings. This paper examines the different Home Energy Report communication channels (direct mail and e-mail) and the marketing channel effect on energy savings, using SAS® for linear models. For consumer behavioral change, we often hear the questions: 1) Are the people that responded via direct mail solicitation saving at a higher rate than people who responded via an e-mail solicitation? 1a) Hypothesis: Because e-mail is easy to respond to, the type of customers that enroll through this channel will exert less effort for the behavior changes that require more time and investment toward energy efficiency changes and thus will save less. 2) Does the mode of that ongoing dialog (mail versus e-mail) impact the amount of consumer savings? 2a) Hypothesis: E-mail is more likely to be ignored and thus these recipients will save less. As savings is most often calculated by comparing the treatment group to a control group (to account for weather and economic impact over time), and by definition you cannot have a dialog with a control group, the answers are not a simple PROC FREQ away. Also, people who responded to mail look very different demographically than people who responded to e-mail. So, is the driver of savings differences the channel, or is it the demographics of the customers that happen to use those chosen channels? This study used clustering (PROC FASTCLUS) to segment the consumers by mail versus e-mail and append cluster assignments to the respective control group. This study also used DID (Difference-in-Differences) as well as Billing Analysis (PROC GLM) to calculate the savings of these groups.
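A minimal sketch of the billing analysis: a difference-in-differences estimate via PROC GLM in which the group-by-period interaction carries the savings effect (data set and variable names assumed):

    proc glm data=billing;
       class group period;                   /* treatment vs. control; pre vs. post */
       model kwh = group period group*period; /* the interaction is the DID estimate */
       lsmeans group*period;
    run;
    quit;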
Read the paper (PDF).
Angela Wells, Direct Options
Ashlie Ossege, Direct Options
Paper SAS1887-2015:
Automating a SAS® 9.4 Installation without Using Provisioning Software: A Case Study Involving the Setup of Machines for SAS Regional Users Groups
Whether you manage computer systems in a small-to-medium environment (for example, in labs, workshops, or corporate training groups) or in a large-scale deployment, the ability to automate SAS® 9.4 installations is important to the efficiency and success of your software deployments. For large-scale deployments, you can automate the installation process by using third-party provisioning software such as Microsoft System Center Configuration Manager (SCCM) or Symantec Altiris. But what if you have a small-to-medium environment and you do not have provisioning software to package deployment jobs? No worries! There is a solution. This paper presents a case study of just such a situation where a process was developed for SAS regional users groups (RUGs). Along with the case study, the paper offers a process for automating SAS 9.4 installations in workshop, lab, and corporate training (small-to-medium sized) environments. This process incorporates the new -srwonly option with the SAS® Deployment Wizard, deployment-wizard commands that use response files, and batch-file implementation. This combination results in easy automation of an installation, even without provisioning software.
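As a hedged illustration of the response-file pattern (paths are placeholders; check the SAS Deployment Wizard documentation for your release), a deployment can be recorded once and then replayed silently from a batch file:

    rem Record a response file during a reference install
    setup.exe -record -responsefile "C:\sdw\sdwresponse.properties"

    rem Replay it silently on each lab machine
    setup.exe -quiet -responsefile "C:\sdw\sdwresponse.properties"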
Read the paper (PDF).
Max Blake, SAS
B
Paper 3120-2015:
"BatchStats": SAS® Batch Statistics, A Click Away!
Over the years, the SAS® Business Intelligence platform has proved its importance in this big data world with its suite of applications that enable us to efficiently process, analyze, and transform huge amounts of business data. Within the data warehouse universe, 'batch execution' sits at the heart of SAS Data Integration technologies. On a day-to-day basis, batches run, and the current status of the batch is generally sent out to the team or to the client as a 'static' e-mail or report. From experience, we know that these don't provide much insight into the real 'bits and bytes' of a batch run. Imagine if the status of the running batch were automatically captured in one central repository and presented in a beautiful web browser on your computer or on your iPad. All this can be achieved without asking anybody to send reports, and with all 'post-batch' queries answered automatically with a click. This paper presents a framework designed specifically to automate the reporting aspects of SAS batches and, yes, it is all about collecting statistics of the batch, so we call it 'BatchStats.'
Prajwal Shetty, Tesco HSC
Paper 3022-2015:
Benefits from the Conversion of Medicaid Encounter Data Reporting System to SAS® Enterprise Guide®
Kaiser Permanente Northwest is contractually obligated for regulatory submissions to Oregon Health Authority, Health Share of Oregon, and Molina Healthcare in Washington. The submissions consist of Medicaid Encounter data for medical and pharmacy claims. SAS® programs are used to extract claims data from Kaiser's claims data warehouse, process the data, and produce output files in HIPAA ASC X12 and NCPDP format. Prior to April 2014, programs were written in SAS® 8.2 running on a VAX server. Several key drivers resulted in the conversion of the existing system to SAS® Enterprise Guide® 5.1 running on UNIX. These drivers were: the need to have a scalable system in preparation for the Affordable Care Act (ACA); performance issues with the existing system; incomplete process reporting and notification to business owners; and a highly manual, labor-intensive process of running individual programs. The upgraded system addressed these drivers. The estimated cost reduction was from $1.30 per reported encounter to $0.13 per encounter. The converted system provides for better preparedness for the ACA. One expected result of ACA is significant Medicaid membership growth. The program has already increased in size by 50% in the preceding 12 months. The updated system allows for the expected growth in membership.
Read the paper (PDF).
Eric Sather, Kaiser Permanente
Paper SAS1801-2015:
Best Practices for Upgrading from SAS® 9.1.3 to SAS® 9.4
We regularly speak with organizations running established SAS® 9.1.3 systems that have not yet upgraded to a later version of SAS®. Often this is because their current SAS 9.1.3 environment is working fine, and no compelling event to upgrade has materialized. Now that SAS 9.1.3 has moved to a lower level of support and some very exciting technologies (Hadoop, cloud, ever-better scalability) are more accessible than ever using SAS® 9.4, the case for migrating from SAS 9.1.3 is strong. Upgrading a large SAS ecosystem with multiple environments, an active development stream, and a busy production environment can seem daunting. This paper aims to demystify the process, suggesting outline migration approaches for a variety of the most common scenarios in SAS 9.1.3 to SAS 9.4 upgrades, and a scalable template project plan that has been proven at a range of organizations.
Read the paper (PDF).
David Stern, SAS
Paper 3082-2015:
Big Data Meets Little Data: Hadoop and Arduino Integration Using SAS®
SAS® has been an early leader in big data technology architecture that more easily integrates unstructured files across multi-tier data system platforms. By using SAS® Data Integration Studio and SAS® Enterprise Business Intelligence software, you can easily automate big data using SAS® system accommodations for Hadoop open-source standards. At the same time, another seminal technology has emerged, which involves real-time multi-sensor data integration using Arduino microprocessors. This break-out session demonstrates the use of SAS® 9.4 coding to define Hadoop clusters and to automate Arduino data acquisition to convert custom unstructured log files into structured tables, which can be analyzed by SAS in near real time. Examples include the use of SAS Data Integration Studio to create and automate stored processes, as well as tips for C language object coding to integrate to SAS data management, with a simple temperature monitoring application for Hadoop to Arduino using SAS.
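A hedged sketch of the log-structuring step, assuming the Arduino emits comma-separated lines such as 2015-03-01T12:00:05,sensor1,22.4 (the layout, path, and variable names are assumptions, not the author's format):

    /* parse raw sensor log lines into a structured SAS table */
    data work.temps;
       infile '/logs/temp_sensor.log' dsd truncover;   /* DSD: comma-delimited */
       input ts :anydtdtm32. sensor :$12. temp_c;
       format ts datetime20.;
    run;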
Keith Allan Jones, PhD, QUALIMATIX.com
C
Paper 1442-2015:
Confirmatory Factor Analysis Using PROC CALIS: A Practical Guide for Survey Researchers
Survey research can provide a straightforward and effective means of collecting input on a range of topics. Survey researchers often like to group similar survey items into construct domains in order to make generalizations about a particular area of interest. Confirmatory Factor Analysis is used to test whether this pre-existing theoretical model underlies a particular set of responses to survey questions. Based on Structural Equation Modeling (SEM), Confirmatory Factor Analysis provides the survey researcher with a means to evaluate how well the actual survey response data fits within the a priori model specified by subject matter experts. PROC CALIS now provides survey researchers the ability to perform Confirmatory Factor Analysis using SAS®. This paper provides a survey researcher with the steps needed to complete Confirmatory Factor Analysis using SAS. We discuss and demonstrate the options available to survey researchers in the handling of missing and not applicable survey responses using an ARRAY statement within a DATA step and imputation of item non-response. A simple demonstration of PROC CALIS is then provided with interpretation of key portions of the SAS output. Using recommendations provided by SAS from the PROC CALIS output, the analysis is then modified to provide a better fit of survey items into survey domains.
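A minimal CFA sketch in PROC CALIS, assuming eight survey items loading on two hypothesized domains; item and factor names are illustrative:

    proc calis data=survey_resp;
       factor
          Domain1 ===> q1-q4,   /* items hypothesized to load on domain 1 */
          Domain2 ===> q5-q8;   /* items hypothesized to load on domain 2 */
    run;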
Read the paper (PDF).
Lindsey Brown Philpot, Baylor Scott & White Health
Sunni Barnes, Baylor Scott & White Health
Crystal Carel, Baylor Scott & White Health Care System
Paper 3217-2015:
Credit Card Holders' Behavior Modeling: Transition Probability Prediction with Multinomial and Conditional Logistic Regression in SAS/STAT®
Because of the variety of cardholders' behavior patterns and income sources, each consumer account can move among states such as non-active, transactor, revolver, delinquent, and defaulted, and each state requires an individual model for predicting generated income. Estimating the transition probabilities between states at the account level helps to avoid the lack of memory in the MDP approach. The key question is which approach gives more accurate results: multinomial logistic regression or a multistage decision tree with binary logistic regressions. This paper investigates approaches to credit card profitability estimation at the account level based on multistate conditional probabilities using the SAS/STAT® procedure PROC LOGISTIC. Both models show moderate, but not strong, predictive power. Prediction accuracy for the decision tree depends on the order of stages for the conditional binary logistic regressions. Current development is concentrated on discrete choice models such as nested logit with PROC MDC.
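A hedged sketch of the multinomial side of the comparison, using the generalized logit link in PROC LOGISTIC to model the next account state (data set and variable names are illustrative):

    proc logistic data=accounts;
       class current_state / param=ref;
       model next_state(ref='non-active') = current_state utilization
             payment_ratio months_on_book / link=glogit;  /* multinomial logit */
    run;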
Read the paper (PDF).
Denys Osipenko, the University of Edinburgh
Jonathan Crook
Paper 3511-2015:
Credit Scorecard Generation Using the Credit Scoring Node in SAS® Enterprise Miner™
In today's competitive world, acquiring new customers is crucial for businesses, but what if most of the acquired customers turn out to be defaulters? This decision would backfire on the business and might lead to losses. Extant statistical methods enable businesses to identify good-risk customers rather than judging them intuitively. The objective of this paper is to build a credit risk scorecard using the Credit Scoring node in SAS® Enterprise Miner™ 12.3, which can be used by a manager to make an instant decision on whether to accept or reject a customer's credit application. The data set used for credit scoring was extracted from the UCI Machine Learning Repository and consists of 15 variables that capture details such as the status of the customer's existing checking account, purpose of the credit, credit amount, employment status, and property. To ensure generalization of the model, the data set was partitioned using the Data Partition node into two groups of 70:30 as training and validation, respectively. The target is a binary variable that categorizes customers into a good-risk and a bad-risk group. After identifying the key variables required to generate the credit scorecard, a particular score was assigned to each of its subgroups. The final model generating the scorecard has a prediction accuracy of about 75%. A cumulative cut-off score of 120 was generated by SAS to demarcate good-risk and bad-risk customers. Even in case of future variations in the data, model refinement is easy because the whole process is already defined and does not need to be rebuilt from scratch.
Read the paper (PDF).
Ayush Priyadarshi, Oklahoma State University
Kushal Kathed, Oklahoma State University
Shilpi Prasad, Oklahoma State University
D
Paper 2000-2015:
Data Aggregation Using the SAS® Hash Object
Soon after the advent of the SAS® hash object in SAS® 9.0, its early adopters realized that the potential functionality of the new structure is much broader than basic O(1)-time lookup and file matching. Specifically, they went on to invent methods of data aggregation based on the ability of the hash object to quickly store and update key summary information. They also demonstrated that the DATA step aggregation using the hash object offered significantly lower run time and memory utilization compared to the SUMMARY/MEANS or SQL procedures, coupled with the possibility of eliminating the need to write the aggregation results to interim data files and the programming flexibility that allowed them to combine sophisticated data manipulation and adjustments of the aggregates within a single step. Such developments within the SAS user community did not go unnoticed by SAS R&D, and for SAS® 9.2 the hash object had been enriched with tag parameters and methods specifically designed to handle aggregation without the need to write the summarized data to the PDV host variable and update the hash table with new key summaries, thus further improving run-time performance. As more SAS programmers applied these methods in their real-world practice, they developed aggregation techniques fit to various programmatic scenarios and ideas for handling the hash object memory limitations in situations calling for truly enormous hash tables. This paper presents a review of the DATA step aggregation methods and techniques using the hash object. The presentation is intended for all situations in which the final SAS code is either a straight Base SAS DATA step or a DATA step generated by any other SAS product.
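For readers new to the technique, here is a minimal sketch of SUMINC-based aggregation with hypothetical data set and variable names: the REF method inserts a key or bumps its running sum, and the SUM method retrieves the accumulated total for output.

    data totals(keep=id total);
       if 0 then set trans;                      /* give ID its attributes in the PDV */
       dcl hash h(suminc:'amount', ordered:'a'); /* sum AMOUNT per key */
       h.definekey('id');
       h.definedata('id');
       h.definedone();
       do until (eof);
          set trans end=eof;                     /* TRANS: one row per transaction */
          h.ref();                               /* add key or increment its sum */
       end;
       dcl hiter hi('h');
       do while (hi.next() = 0);
          h.sum(sum: total);                     /* fetch the accumulated sum */
          output;
       end;
       stop;
    run;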
Read the paper (PDF).
Paul Dorfman, Dorfman Consulting
Don Henderson, Henderson Consulting Services
Paper 3104-2015:
Data Management Techniques for Complex Healthcare Data
Data sharing through healthcare collaboratives and national registries creates opportunities for secondary data analysis projects. These initiatives provide data for quality comparisons as well as endless research opportunities to external researchers across the country. The possibilities are bountiful when you join data from diverse organizations and look for common themes related to illnesses and patient outcomes. With these great opportunities comes great pain for data analysts and health services researchers tasked with compiling these data sets according to specifications. Patient care data is complex, and, particularly at large healthcare systems, might be managed with multiple electronic health record (EHR) systems. Matching data from separate EHR systems while simultaneously ensuring the integrity of the details of that care visit is challenging. This paper demonstrates how data management personnel can use traditional SAS PROCs in new and inventive ways to compile, clean, and complete data sets for submission to healthcare collaboratives and other data sharing initiatives. Traditional data matching techniques such as the SPEDIS function are uniquely combined with iterative SQL joins using the SAS® functions INDEX, COMPRESS, CATX, and SUBSTR to get the most out of complex patient and physician name matches. Recoding, correcting missing items, and formatting data can be achieved efficiently by using traditional tools such as the MAX function, PROC FORMAT, and the FIND function in new and inventive ways.
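A hedged sketch of one such combination: a PROC SQL fuzzy join on cleaned-up names using the SPEDIS function (table names, columns, and the distance threshold are assumptions):

    proc sql;
       create table name_matches as
       select a.patient_key, b.mrn,
              spedis(upcase(compress(a.pat_name, ' .,-')),
                     upcase(compress(b.pat_name, ' .,-'))) as dist
       from ehr_a as a, ehr_b as b
       where calculated dist <= 15     /* keep only close name matches */
       order by a.patient_key, dist;
    quit;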
Read the paper (PDF).
Gabriela Cantu, Baylor Scott & White Health
Christopher Klekar, Baylor Scott & White Health
Paper 3261-2015:
Data Quality Scorecard
Many users would like to check the quality of data after the data integration process has loaded the data into a data set or table. The approach in this paper shows users how to develop a process that scores columns based on rules judged against a set of standards set by the user. Each rule has a standard that determines whether it passes, fails, or needs review (a green, red, or yellow score). A rule can be as simple as: Is the value for this column missing, or is this column within a valid range? Further, it includes comparing a column to one or more other columns, or checking for specific invalid entries. It also includes rules that compare a column value to a lookup table to determine whether the value is in the lookup table. Users can create their own rules and each column can have any number of rules. For example, a rule can be created to measure a dollar column to a range of acceptable values. The user can determine that it is expected that up to two percent of the values are allowed to be out of range. If two to five percent of the values are out of range, then data should be reviewed. And, if over five percent of the values are out of range, the data is not acceptable. The entire table has a color-coded scorecard showing each rule and its score. Summary reports show columns by score and distributions of key columns. The scorecard enables the user to quickly assess whether the SAS data set is acceptable, or whether specific columns need to be reviewed. Drill-down reports enable the user to drill into the data to examine why the column scored as it did. Based on the scores, the data set can be accepted or rejected, and the user will know where and why the data set failed. The process can store each scorecard's data in a data mart. This data mart enables the user to review the quality of their data over time. It can answer questions such as: is the quality of the data improving overall? Are there specific columns that are improving or declining over time? What can we do to improve the quality of our data? This scorecard is not intended to replace the quality control of the data integration or ETL process. It is a supplement to the ETL process. The programs are written using only Base SAS® and Output Delivery System (ODS), macro variables, and formats. This presentation shows how to: (1) use ODS HTML; (2) color code cells with the use of formats; (3) use formats as lookup tables; (4) use INCLUDE statements to make use of template code snippets to simplify programming; and (5) use hyperlinks to launch stored processes from the scorecard.
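As a hedged sketch of techniques (1) and (2) from the list above, a format can map a rule's out-of-range percentage to a traffic-light background color in the HTML scorecard; the thresholds follow the 2%/5% example, and all names are illustrative:

    proc format;
       value stoplight  low -< 2  = 'lightgreen'   /* passes  */
                        2   -< 5  = 'yellow'       /* review  */
                        5 -  high = 'salmon';      /* fails   */
    run;

    ods html file='scorecard.html';
    proc report data=rule_scores;
       column rule_name pct_out_of_range;
       define pct_out_of_range / display format=8.1
              style(column)=[background=stoplight.];  /* format as color lookup */
    run;
    ods html close;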
Read the paper (PDF). | Download the data file (ZIP).
Tom Purvis, Qualex Consulting Services, Inc.
Paper 3483-2015:
Data Sampling Improvement by Developing the SMOTE Technique in SAS®
A common problem when developing classification models is the imbalance of classes in the classification variable. This imbalance means that one class is represented by a large number of cases while the other class is represented by very few. When this happens, the predictive power of the developed model could be biased, because classification methods tend to favor the majority class and are designed to minimize the error on the total data set regardless of the proportions or balance of the classes. Due to this problem, several techniques are used to balance the distribution of the classification variable. One method is to reduce the size of the majority class (under-sampling); another is to increase the number of cases in the minority class (over-sampling); a third is to combine these two methods. There is also a more complex technique called SMOTE (Synthetic Minority Over-sampling Technique) that consists of intelligently generating new synthetic records of the minority class using a closest-neighbors approach. In this paper, we present a SAS® implementation of a combination of the SMOTE and under-sampling techniques applied to a churn model. Then, we compare the predictive power of the model using this proposed balancing technique against other models developed with different data sampling techniques.
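A minimal sketch of the SMOTE interpolation step, assuming each minority-class record has already been paired with its nearest neighbor's coordinates (columns n_x1-n_x5); all data set and variable names are hypothetical:

    data smote_synthetic;
       set minority_pairs;                    /* case coords x1-x5, neighbor n_x1-n_x5 */
       array x  {5} x1-x5;
       array nx {5} n_x1-n_x5;
       if _n_ = 1 then call streaminit(2015);
       gap = rand('uniform');
       do i = 1 to dim(x);
          x{i} = x{i} + gap * (nx{i} - x{i}); /* a point on the segment to the neighbor */
       end;
       drop i gap n_:;
    run;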
Read the paper (PDF).
Lina Maria Guzman Cartagena, DIRECTV
E
Paper 3083-2015:
Easing into Analytics Using SAS® Enterprise Guide® 6.1
Do you need to deliver business insight and analytics to support decision-making? Using SAS® Enterprise Guide®, you can access the full power of SAS® for analytics, without needing to learn the details of SAS programming. This presentation focuses on the following uses of SAS Enterprise Guide: exploring and understanding--getting a feel for your data and for its issues and anomalies; visualizing--looking at the relationships, trends, and surprises; consolidating--starting to piece together the story; and presenting--building the insight and analytics into a presentation using SAS Enterprise Guide.
Read the paper (PDF).
Marje Fecht, Prowerk Consulting
Paper 3502-2015:
Edit the Editor: Creating Keyboard Macros in SAS® Enterprise Guide®
Programmers can create keyboard macros to perform common editing tasks in SAS® Enterprise Guide®. This paper introduces how to record keystrokes, save a keyboard macro, edit the commands, and assign a shortcut key. Sample keyboard macros are included. Techniques to share keyboard macros are also covered.
Read the paper (PDF). | Download the data file (ZIP).
Christopher Bost, MDRC
F
Paper SAS1750-2015:
Feeling Anxious about Transitioning from Desktop to Server? Key Considerations to Diminish Your Administrators' and Users' Jitters
As organizations strive to do more with fewer resources, many modernize their disparate PC operations to centralized server deployments. Administrators and users share many concerns about using SAS® on a Microsoft Windows server. This paper outlines key guidelines, plus architecture and performance considerations, that are essential to making a successful transition from PC to server. It covers the five key considerations for SAS customers who change their configuration from PC-based SAS to SAS on a Windows server: 1) data and directory references; 2) interactive and surrounding applications; 3) usability; 4) performance; 5) the SAS Metadata Server.
Read the paper (PDF).
Kate Schwarz, SAS
Donna Bennett, SAS
Margaret Crevar, SAS
Paper 3049-2015:
Filling your SAS® Efficiency Toolbox: Creating a Stored Process to Interact with Your Shared SAS® Server Using the X and SYSTASK Commands
SAS® Enterprise Guide® is a great interface for businesses running SAS® in a shared server environment. However, interacting with the shared server outside of SAS can require costly third-party software and knowledge of specific server programming languages. This can create a barrier between the SAS program and the server, which can be frustrating for even the best SAS programmers. This paper reviews the X and SYSTASK commands and creates a template of SAS code to pass commands from SAS to the server. By writing the server log to a text file, we demonstrate how to display critical server information in the code results. Using macros and the prompt functionality of SAS Enterprise Guide, we form stored processes, allowing SAS users of all skill levels to interact with the server environment. These stored processes can improve programming efficiency by providing a quick in-program solution to complete common server tasks such as copying folders or changing file permissions. They might also reduce the need for third-party programs to communicate with the server, which could potentially reduce software costs.
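A hedged template of the pattern described, assuming a UNIX server and placeholder paths: run the command with SYSTASK, capture its output in a text file, and echo that file back into the SAS log:

    /* copy a folder on the server and fix group permissions */
    %let copy_rc=;
    systask command "cp -R /proj/source /proj/backup > /tmp/srv.log 2>&1"
            wait status=copy_rc;
    x "chmod -R g+rX /proj/backup";

    /* surface the captured server output in the SAS log */
    data _null_;
       infile '/tmp/srv.log' truncover;
       input;
       putlog _infile_;
    run;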
Read the paper (PDF).
Cody Murray, Medica Health Plans
Chad Stegeman, Medica
Paper SAS1924-2015:
Find What You Are Looking For And More in SAS® Enterprise Guide®
Are you looking to track changes to your SAS® programs? Do you wish you could easily find errors, warnings, and notes in your SAS logs? Looking for a convenient way to find point-and-click tasks? Want to search your SAS® Enterprise Guide® project? How about a point-and-click way to view SAS system options and SAS macro variables? Or perhaps you want to upload data to the SAS® LASR™ Analytics Server, view SAS® Visual Analytics reports, or run SAS® Studio tasks, all from within SAS Enterprise Guide? You can find these capabilities and more in SAS Enterprise Guide. Knowing what tools are at your disposal and how to use them will put you a step ahead of the rest. Come learn about some of the newer features in SAS Enterprise Guide 7.1 and how you can leverage them in your work.
Read the paper (PDF).
Casey Smith, SAS
Paper 3471-2015:
Forecasting Vehicle Sharing Demand Using SAS® Forecast Studio
As pollution and population continue to increase, new concepts of eco-friendly commuting evolve. One of the emerging concepts is the bicycle sharing system. It is a bike rental service on a short-term basis at a moderate price. It provides people the flexibility to rent a bike from one location and return it to another location. This business is quickly gaining popularity all over the globe. In May 2011, there were only 375 bike rental schemes consisting of nearly 236,000 bikes. However, this number jumped to 535 bike sharing programs with approximately 517,000 bikes in just a couple of years. It is expected that this trend will continue to grow at a similar pace in the future. Most of the businesses involved in this system of bike rental are faced with the challenge of balancing supply and inconsistent demand. The number of bikes needed on a particular day can vary on several factors such as season, time, temperature, wind speed, humidity, holiday and day of the week. In this paper, we have tried to solve this problem using SAS® Forecast Studio. Incorporating the effects of all the above factors and analyzing the demand trends of the last two years, we have been able to precisely forecast the number of bikes needed on any day in the future. Also, we are able to do the scenario analysis to observe the effect of particular variables on the demand.
Read the paper (PDF).
Kushal Kathed, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Ayush Priyadarshi, Oklahoma State University
Paper 3142-2015:
Fuzzy Matching
Quality measurement is increasingly important in the health-care sphere for both performance optimization and reimbursement. Treatment of chronic conditions is a key area of quality measurement. However, medication compendiums change frequently, and health-care providers often free text medications into a patient's record. Manually reviewing a complete medications database is time consuming. In order to build a robust medications list, we matched a pharmacist-generated list of categorized medications to a raw medications database that contained names, name-dose combinations, and misspellings. The matching tool we used is the COMPGED function. We were able to combine a truncation function and an upcase function to optimize the output of COMPGED. Using these combinations and manipulating the scoring metric of COMPGED enabled us to narrow the database list to medications that were relevant to our categories. This process transformed a tedious task for PROC COMPARE or an Excel macro into a quick and efficient method of matching. The task of sorting through relevant matches was still conducted manually, but the time required to do so was significantly decreased by the fuzzy match in our application of COMPGED.
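A hedged sketch of the truncate-and-upcase pattern with the COMPGED function and a cutoff score (table names, columns, and thresholds are assumptions):

    proc sql;
       create table med_matches as
       select r.raw_med, c.category_med,
              compged(upcase(substr(r.raw_med, 1, 12)),
                      upcase(substr(c.category_med, 1, 12)), 300) as ged
       from raw_meds as r, med_list as c
       where calculated ged <= 100     /* keep plausible fuzzy matches only */
       order by r.raw_med, ged;
    quit;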
Read the paper (PDF).
Arti Virkud, NYC Department of Health
G
Paper SAS2602-2015:
Getting Started with Enabling Your End-User Applications to Use SAS® Grid Manager 9.4
A SAS® Grid Manager environment provides your organization with a powerful and flexible way to manage many forms of SAS® computing workloads. For the business and IT user community, the benefits can range from data management jobs effectively utilizing the available processing resources, complex analyses being run in parallel, and reassurance that statutory reports are generated in a highly available environment. This workshop begins the process of familiarizing users with the core concepts of how to grid-enable tasks within SAS® Studio, SAS® Enterprise Guide®, SAS® Data Integration Studio, and SAS® Enterprise Miner™ client applications.
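As a hedged illustration of grid-enabling a task in code (the resource name SASApp and the GRDSVC_ENABLE arguments shown are site-specific assumptions; check your grid documentation), SAS/CONNECT sessions can be directed to the grid:

    %let rc = %sysfunc(grdsvc_enable(_all_, resource=SASApp));
    signon task1;
    rsubmit task1 wait=no;            /* run on a grid node, asynchronously */
       proc sort data=work.big_table; by id; run;
    endrsubmit;
    waitfor task1;
    signoff task1;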
Edoardo Riva, SAS
Paper 3474-2015:
Getting your SAS® Program to Do Your Typing for You!
Do you have a SAS® program that requires adding filenames to the input every time you run it? Aren't you tired of having to check for the files, check the names, and type them in? Check out how my SAS® Enterprise Guide® project checks for files, figures out the file names, and saves me from having to type in the file names for the input data files!
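A hedged sketch of the file-detection idea, assuming monthly extracts land in one directory with names like claims_YYYYMM.csv (the directory, pattern, and names are assumptions):

    /* look for monthly extract files and build the input list automatically */
    filename indir '/data/incoming';
    data found_files;
       length fname $ 256;
       did = dopen('indir');
       do i = 1 to dnum(did);
          fname = dread(did, i);
          if prxmatch('/^claims_\d{6}\.csv$/', strip(fname)) then output;
       end;
       rc = dclose(did);
       keep fname;
    run;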
Read the paper (PDF). | Download the data file (ZIP).
Nancy Wilson, Ally
Paper 3198-2015:
Gross Margin Percent Prediction: Using the Power of SAS® Enterprise Miner™ 12.3 to Predict the Gross Margin Percent for a Steel Manufacturing Company
Predicting the profitability of future sales orders in a price-sensitive, highly competitive make-to-order market can create a competitive advantage for an organization. Order size and specifications vary from order to order and customer to customer, and might or might not be repeated. While it is the intent of the sales groups to take orders for a profit, because of the volatility of steel prices and the competitive nature of the markets, gross margins can range dramatically from one order to the next and in some cases can be negative. Understanding the key factors affecting the gross margin percent and their impact can help the organization to reduce the risk of non-profitable orders and at the same time improve their decision-making ability on market planning and forecasting. The objective of this paper is to identify the best model amongst multiple predictive models inside SAS® Enterprise Miner™, which could accurately predict the gross margin percent for future orders. The data used for the project consisted of over 30,000 transactional records and 33 input variables. The sales records have been collected from multiple manufacturing plants of the steel manufacturing company. Variables such as order quantity, customer location, sales group, and others were used to build predictive models. The target variable gross margin percent is the net profit on the sales, considering all the factors such as labor cost, cost of raw materials, and so on. The Model Comparison node of SAS Enterprise Miner was used to determine the best among different variations of regression models, decision trees, and neural networks, as well as ensemble models. Average squared error was used as the fit statistic to evaluate each model's performance. Based on the preliminary model analysis, the ensemble model outperforms the other models with the lowest average squared error.
Read the paper (PDF).
Kushal Kathed, Oklahoma State University
Patti Jordan
Ayush Priyadarshi, Oklahoma State University
H
Paper SAS1704-2015:
Helpful Hints for Transitioning to SAS® 9.4
A group tasked with testing SAS® software from the customer perspective has gathered a number of helpful hints for SAS® 9.4 that will smooth the transition to its new features and products. These hints will help with the 'huh?' moments that crop up when you are getting oriented and will provide short, straightforward answers. We also share insights about changes in your order contents. Gleaned from extensive multi-tier deployments, SAS® Customer Experience Testing shares insiders' practical tips to ensure that you are ready to begin your transition to SAS 9.4. The target audience for this paper is primarily system administrators who will be installing, configuring, or administering the SAS 9.4 environment. (This paper is an updated version of the paper presented at SAS Global Forum 2014 and includes new features and software changes since the original paper was delivered, plus any relevant content that still applies. This paper includes information specific to SAS 9.4 and SAS 9.4 maintenance releases.)
Read the paper (PDF).
Cindy Taylor, SAS
Paper 3431-2015:
How Latent Analyses within Survey Data Can Be Valuable Additions to Any Regression Model
This study looks at several ways to investigate latent variables in longitudinal surveys and their use in regression models. Three different analyses for latent variable discovery are briefly reviewed and explored. The procedures explored in this paper are PROC LCA, PROC LTA, PROC CATMOD, PROC FACTOR, PROC TRAJ, and PROC SURVEYLOGISTIC. The analyses defined through these procedures are latent profile analyses, latent class analyses, and latent transition analyses. The latent variables are included in three separate regression models. The effect of the latent variables on the fit and use of the regression model compared to a similar model using observed data is briefly reviewed. The data used for this study was obtained via the National Longitudinal Study of Adolescent Health, a study distributed and collected by Add Health. Data was analyzed using SAS® 9.3. This paper is intended for any level of SAS® user. This paper is also aimed at an audience with a background in behavioral science or statistics.
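Note that PROC LCA and PROC LTA are Penn State add-on procedures rather than part of Base SAS®; a hedged sketch of a three-class latent class model over eight binary items might look like this (data set and item names are illustrative):

    proc lca data=addhealth;
       nclass 3;                       /* number of latent classes */
       items  q1 q2 q3 q4 q5 q6 q7 q8; /* observed indicators */
       categories 2 2 2 2 2 2 2 2;     /* all items are binary */
       seed 2015;
    run;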
Read the paper (PDF).
Deanna Schreiber-Gregory, National University
Paper 3185-2015:
How to Hunt for Utility Customer Electric Usage Patterns Armed with SAS® Visual Statistics with Hadoop and Hive
Your electricity usage patterns reveal a lot about your family and routines. Information collected from electrical smart meters can be mined to identify patterns of behavior that can in turn be used to help change customer behavior for the purpose of altering system load profiles. Demand Response (DR) programs represent an effective way to cope with rising energy needs and increasing electricity costs. The Federal Energy Regulatory Commission (FERC) defines demand response as changes in electric usage by end-use customers from their normal consumption patterns in response to changes in the price of electricity over time, or to incentive payments designed to lower electricity use at times of high wholesale market prices or when system reliability is jeopardized. In order to effectively motivate customers to voluntarily change their consumption patterns, it is important to identify customers whose load profiles are similar so that targeted incentives can be directed toward these customers. Hence, it is critical to use tools that can accurately cluster similar time series patterns while providing a means to profile these clusters. In order to solve this problem, though, hardware and software that is capable of storing, extracting, transforming, loading and analyzing large amounts of data must first be in place. Utilities receive customer data from smart meters, which track and store customer energy usage. The data collected is sent to the energy companies every fifteen minutes or hourly. With millions of meters deployed, this quantity of information creates a data deluge for utilities, because each customer generates about three thousand data points monthly, and more than thirty-six billion reads are collected annually for a million customers. The data scientist is the hunter, and DR candidate patterns are the prey in this cat-and-mouse game of finding customers willing to curtail electrical usage for a program benefit. The data scientist must connect large siloed data sources, external data, and even unstructured data to detect common customer electrical usage patterns, build dependency models, and score them against their customer population. Taking advantage of Hadoop's ability to store and process data on commodity hardware with distributed parallel processing is a game changer. With Hadoop, no data set is too large, and SAS® Visual Statistics leverages machine learning, artificial intelligence, and clustering techniques to build descriptive and predictive models. All data can be usable from disparate systems, including structured, unstructured, and log files. The data scientist can use Hadoop to ingest all available data at rest, and analyze customer usage patterns, system electrical flow data, and external data such as weather. This paper will use Cloudera Hadoop with Apache Hive queries for analysis on platforms such as SAS® Visual Analytics and SAS Visual Statistics. The paper will showcase optionality within Hadoop for querying large data sets with open-source tools and importing these data into SAS® for robust customer analytics, clustering customers by usage profiles, propensity to respond to a demand response event, and an electrical system analysis for Demand Response events.
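A hedged sketch of the SAS-to-Hive connection (server, schema, and table names are placeholders), after which smart-meter reads can be summarized in place before clustering:

    libname hv hadoop server='hive.example.com' port=10000
                      schema=ami subprotocol=hive2;

    proc sql;
       create table work.meter_profiles as
       select meter_id, hour_of_day, avg(kwh) as avg_kwh
       from hv.interval_reads            /* summarization pushed to Hive */
       group by meter_id, hour_of_day;
    quit;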
Read the paper (PDF).
Kathy Ball, SAS
Paper 3446-2015:
How to Implement Two-Phase Regression Analysis to Predict Profitable Revenue Units
Is it a better business decision to determine the profitability of all business units/kiosks and then prune the nonprofitable ones? Or does model performance improve if we first find the units that meet the break-even point and then calculate their profits? In our project, we used a two-stage regression process due to the highly skewed distribution of the variables. First, we performed logistic regression to predict which kiosks would be profitable. Then, we used linear regression to predict the average monthly revenue at each kiosk. We used SAS® Enterprise Guide® and SAS® Enterprise Miner™ for the modeling process. The effectiveness of the linear regression model is much greater for predicting the target variable at profitable kiosks than at unprofitable kiosks. The two-phase regression model seemed to perform better than a single linear regression, particularly when the target variable has too many levels. In real-life situations, the dependent and independent variables can have highly skewed distributions, and two-phase regression can help improve model performance and accuracy. Some results: the logistic regression model has an overall accuracy of 82.9%, sensitivity of 92.6%, and specificity of 61.1%, with comparable figures for the training data set at 81.8%, 90.7%, and 63.8%, respectively. This indicates that the model predicts the profitable kiosks consistently and reasonably well. Linear regression model: for the training data set, the mean absolute percentage error (MAPE) of the predicted values (not log-transformed) against the actual values of the target is 7.2% for kiosks that earn more than $350, versus -102% for kiosks that earn less than $350. For the validation data set, the corresponding MAPEs are 7.6% and -142%. This means that average monthly revenue is better predicted for kiosks earning above the $350 threshold--that is, kiosks with a flag variable of 1. The model predicts the target variable with lower APE at higher values of the target for both the training data set and the entire data set. In fact, if the threshold value for the kiosks were moved to, say, $500, the predictive power of the model in terms of APE would increase substantially. The validation data set (Selection Indicator=0) has fewer data points, and therefore the contrast in APEs is higher and more varied.
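A hedged outline of the two-phase flow with hypothetical variable names: stage one classifies kiosks against the break-even point, and stage two models revenue on the predicted-profitable subset:

    /* Stage 1: which kiosks clear the break-even point? */
    proc logistic data=kiosks;
       model profitable(event='1') = foot_traffic rent location_type_num;
       output out=stage1 p=p_profit;
    run;

    /* Stage 2: linear regression on the predicted-profitable kiosks */
    proc reg data=stage1;
       where p_profit > 0.5;
       model log_revenue = foot_traffic rent location_type_num;
    run;
    quit;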
Read the paper (PDF).
Shrey Tandon, Sobeys West
I
Paper 3343-2015:
Improving SAS® Global Forum Papers
Just as research is built on existing research, the references section is an important part of a research paper. The purpose of this study is to find the differences between professionals and academicians with respect to the references section of a paper. Data is collected from SAS® Global Forum 2014 Proceedings. Two research hypotheses are supported by the data. First, the average number of references in papers by academicians is higher than those by professionals. Second, academicians follow standards for citing references more than professionals. Text mining is performed on the references to understand the actual content. This study suggests that authors of SAS Global Forum papers should include more references to increase the quality of the papers.
Read the paper (PDF).
Vijay Singh, Oklahoma State University
Pankush Kalgotra, Oklahoma State University
Paper 3356-2015:
Improving the Performance of Two-Stage Modeling Using the Association Node of SAS® Enterprise Miner™ 12.3
Over the years, very few published studies have discussed ways to improve the performance of two-stage predictive models. This study, based on 10 years (1999-2008) of data from 130 US hospitals and integrated delivery networks, is an attempt to demonstrate how we can leverage the Association node in SAS® Enterprise Miner™ to improve the classification accuracy of the two-stage model. We prepared the data with imputation operations and data cleaning procedures. Variable selection methods and domain knowledge were used to choose 43 key variables for the analysis. The prominent association rules revealed interesting relationships between prescribed medications and patient readmission/no-readmission. The rules with lift values greater than 1.6 were used to create dummy variables for use in the subsequent predictive modeling. Next, we used two-stage sequential modeling, where the first stage predicted if the diabetic patient was readmitted and the second stage predicted whether the readmission happened within 30 days. The backward logistic regression model outperformed competing models for the first stage. After including dummy variables from an association analysis, many fit indices improved, such as the validation ASE to 0.228 from 0.238, cumulative lift to 1.56 from 1.40. Likewise, the performance of the second stage was improved after including dummy variables from an association analysis. Fit indices such as the misclassification rate improved to 0.240 from 0.243 and the final prediction error to 0.17 from 0.18.
Read the paper (PDF).
Girish Shirodkar, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Ankita Chaudhari, Oklahoma State University
J
Paper 2340-2015:
Joining Tables Using SAS® Enterprise Guide®
Examples include how to join when your data is perfect, how to join when your data does not match and you have to manipulate it in order to join, how to create a subquery, and how to use a subquery.
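The point-and-click steps generate PROC SQL code behind the scenes; a hand-written sketch of the same patterns (table and column names are hypothetical) looks like this:

   proc sql;
      /* Inner join on a shared key */
      create table joined as
      select a.customer_id, a.balance, b.region
      from accounts as a
           inner join branches as b
           on a.branch_id = b.branch_id;

      /* Subquery: customers with above-average balances */
      create table high_balance as
      select customer_id, balance
      from accounts
      where balance > (select avg(balance) from accounts);
   quit;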
Read the paper (PDF).
Anita Measey, Bank of Montreal, Risk Capital & Stress Testing
L
Paper 3514-2015:
Learn How Slalom Consulting and Celebrity Cruises Bridge the Marketing Campaign Attribution Divide
Although today's marketing teams enjoy large-scale campaign relationship management systems, many are still left with the task of bridging the well-known gap between campaigns and customer purchasing decisions. During this session, we discuss how Slalom Consulting and Celebrity Cruises decided to take a bold step and bridge that gap. We show how marketing efforts are distorted when a team considers only the last campaign sent to a customer who later booked a cruise. Then we lay out a custom-built SAS 9.3 solution that scales to process thousands of campaigns per month using a stochastic attribution technique. This approach considers all of the campaigns that touch the customer, assigning credit to a single campaign or a set of campaigns that contributed to the customer's decision.
Christopher Byrd, Slalom Consulting
Paper 3342-2015:
Location-Based Association of Customer Sentiment and Retail Sales
There are various economic factors that affect retail sales. One important factor that is expected to correlate is overall customer sentiment toward a brand. In this paper, we analyze how location-specific customer sentiment can vary and correlate with sales at retail stores. To find any dependency, we used location-specific Twitter feeds related to a national-brand chain retail store. We opinion-mine their overall sentiment using SAS® Sentiment Analysis Studio. We estimate the correlation between the opinion index and retail sales within the studied geographic areas. Later in the analysis, using ArcGIS Online from Esri, we estimate whether other location-specific variables that could potentially correlate with customer sentiment toward the brand are significant predictors of the brand's retail sales.
Read the paper (PDF).
Asish Satpathy, University of California, Riverside
Goutam Chakraborty, Oklahoma State University
Tanvi Kode, Oklahoma State University
M
Paper 2481-2015:
Managing Extended Attributes With a SAS® Enterprise Guide® Add-In
SAS® 9.4 introduced extended attributes, which are name-value pairs that can be attached to either the data set or to individual variables. Extended attributes are managed through PROC DATASETS and can be viewed through PROC CONTENTS or through Dictionary.XATTRS. This paper describes the development of a SAS® Enterprise Guide® custom add-in that allows for the entry and editing of extended attributes, with the possibility of using a controlled vocabulary. The controlled vocabulary used in the initial application is derived from the lifecycle branch of the Data Documentation Initiative metadata standard (DDI-L).
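For readers unfamiliar with the feature, here is a minimal sketch of setting and viewing extended attributes; the data set, variable, and attribute names are hypothetical:

   proc datasets library=work nolist;
      modify survey;
      xattr set ds StudyName='Household Panel 2015';   /* data set level */
      xattr set var income (SourceQuestion='Q17');     /* variable level */
   quit;

   proc sql;
      select * from dictionary.xattrs
      where libname='WORK' and memname='SURVEY';
   quit;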
Read the paper (PDF).
Larry Hoyle, IPSR, Univ. of Kansas
Paper 3375-2015:
Maximizing a Churn Campaign's Profitability with Cost-sensitive Predictive Analytics
Predictive analytics has been widely studied in recent years, and it has been applied to solve a wide range of real-world problems. Nevertheless, current state-of-the-art predictive analytics models are not well aligned with managers' requirements in that the models fail to include the real financial costs and benefits during the training and evaluation phases. Churn predictive modeling is one of those examples in which evaluating a model based on a traditional measure such as accuracy or predictive power does not yield the best results when measured by investment per subscriber in a loyalty campaign and the financial impact of failing to detect a real churner versus wrongly predicting a non-churner as a churner. In this paper, we propose a new financially based measure for evaluating the effectiveness of a voluntary churn campaign, taking into account the available portfolio of offers, their individual financial cost, and the probability of acceptance depending on the customer profile. Then, using a real-world churn data set, we compared different cost-insensitive and cost-sensitive predictive analytics models and measured their effectiveness based on their predictive power and cost optimization. The results show that using a cost-sensitive approach yields an increase in profitability of up to 32.5%.
Alejandro Correa Bahnsen, University of Luxembourg
Darwin Amezquita, DIRECTV
Juan Camilo Arias, Smartics
Paper 2240-2015:
Member-Level Regression Using SAS® Enterprise Guide® and SAS® Forecast Studio
The need to measure slight changes in healthcare costs and utilization patterns over time is vital in predictive modeling, forecasting, and other advanced analytics. At BlueCross BlueShield of Tennessee, a method for developing member-level regression slopes creates a better way of identifying these changes across various time spans. The goal is to create multiple metrics at the member level that indicate when an individual is seeking more or less medical or pharmacy services. Significant increases or decreases in utilization and cost are used to predict the likelihood of acquiring certain conditions, seeking services at particular facilities, and self-engaging in health and wellness. Data setup and compilation consist of calculating a member's eligibility with the health plan and then aggregating cost and utilization of particular services (for example, primary care visits, Rx costs, ER visits, and so on). A member must have at least six months of eligibility for a valid regression slope to be calculated. Linear regression is used to build single-factor models for 6-, 12-, 18-, and 24-month time spans if the appropriate amount of data is available for the member. Models are built at the member-metric time period, resulting in the possibility of over 75 regression coefficients per member per monthly run. The computing power needed to execute such a vast number of calculations requires in-database processing of various macro processes. SAS® Enterprise Guide® is used to structure the data, and SAS® Forecast Studio is used to forecast trends at a member level. Algorithms are run the first of each month, and data is stored so that each metric and corresponding slope is appended on a monthly basis. Because of how the data is set up for the member regression algorithm, slopes are interpreted in the following manner: a positive value for -1*slope indicates an increase in utilization/cost; a negative value for -1*slope indicates a decrease in utilization/cost. The actual slope value indicates the intensity of the change in cost or utilization. The insight provided by this member-level regression methodology replaces subjective methods that used arbitrary thresholds of change to measure differences in cost and utilization.
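A minimal sketch of the BY-group slope calculation for one metric; data set and variable names are hypothetical, and the paper's production version runs in-database across many metrics:

   proc sort data=member_monthly; by member_id; run;

   proc reg data=member_monthly outest=slopes noprint;
      by member_id;
      model pcp_visits = month_index;   /* slope = monthly change in utilization */
   run;
   quit;

   data trends;
      set slopes;
      neg_slope = -1 * month_index;     /* the paper's -1*slope sign convention */
   run;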
Read the paper (PDF).
Leigh McCormack, BCBST
Prudhvidhar Perati, BlueCross BlueShield of TN
Paper 2524-2015:
Methodology of Model Creation
The goal of this session is to describe the whole process of model creation, from the business request through model specification, data preparation, iterative model creation, model tuning, implementation, and model servicing. Each phase consists of several steps, for which we describe the main goal, the expected outcome, the tools used, our own SAS® code, useful nodes and settings in SAS® Enterprise Miner™, procedures in SAS® Enterprise Guide®, measurement criteria, and expected duration in man-days. For three steps, we also present deep insights with examples of practical usage, explanations of the code used, settings, and ways of exploring and interpreting the output. During the actual model creation process, we suggest using Microsoft Excel to keep all input metadata along with information about transformations performed in SAS Enterprise Miner. To get information about model results faster, we use an automatic SAS code generator implemented in Excel, input the generated code to SAS Enterprise Guide, and create a specific profile of results directly from the output tables of the SAS Enterprise Miner nodes. This paper also presents an example of checking the stability of a binary model over time, performed in SAS Enterprise Guide by measuring the optimal cut-off percentage and lift. These measurements are visualized and automated using our own code. With this methodology, users have direct contact with the transformed data along with the possibility to analyze and explore any intermediate results. Furthermore, the proposed approach can be used for several types of modeling (for example, binary and nominal predictive models or segmentation models). In general, we summarize our best practices for combining specific procedures performed in SAS Enterprise Guide, SAS Enterprise Miner, and Microsoft Excel to create and interpret models faster and more effectively.
Read the paper (PDF).
Peter Kertys, VÚB a.s.
Paper 1381-2015:
Model Risk and Corporate Governance of Models with SAS®
Banks can create a competitive advantage in their business by using business intelligence (BI) and by building models. In the credit domain, the best practice is to build risk-sensitive models (Probability of Default, Exposure at Default, Loss-given Default, Unexpected Loss, Concentration Risk, and so on) and implement them in decision-making, credit granting, and credit risk management. On the next level, there are models and tools built on these models that are used to help achieve business targets: risk-sensitive pricing, capital planning, optimization of ROE/RAROC, management of the credit portfolio, setting the level of provisions, and so on. It works remarkably well as long as the models work. However, over time, models deteriorate and their predictive power can drop dramatically. Since the global financial crisis of 2008, we have faced a tsunami of regulation and an accelerated frequency of changes in the business environment, which cause models to deteriorate faster than ever before. As a result, heavy reliance on models in decision-making (some decisions are automated following the models' results--without human intervention) might result in a huge error that can have dramatic consequences for the bank's performance. In my presentation, I share our experience in reducing model risk and establishing corporate governance of models with the following SAS® tools: model monitoring, SAS® Model Manager, dashboards, and SAS® Visual Analytics.
Read the paper (PDF).
Boaz Galinson, Bank Leumi
Paper 3406-2015:
Modeling to Improve the Customer Unit Target Selection for Inspections of Commercial Losses in the Brazilian Electric Sector: The case of CEMIG
Electricity is an extremely important product for society. In Brazil, the electric sector is regulated by ANEEL (Agência Nacional de Energia Elétrica), and one of the regulated aspects is power loss in the distribution system. In 2013, 13.99% of all injected energy was lost in the Brazilian system. Commercial loss is one of the power loss classifications, which can be countered by inspections of the electrical installation in a search for irregularities in power meters. CEMIG (Companhia Energética de Minas Gerais) currently serves approximately 7.8 million customers, which makes it unfeasible (in financial and logistic terms) to inspect all customer units. Thus, the ability to select potential inspection targets is essential. In this paper, logistic regression models, decision tree models, and the Ensemble model were used to improve the target selection process in CEMIG. The results indicate an improvement in the positive predictive value from 35% to 50%.
Read the paper (PDF).
Sergio Henrique Ribeiro, Cemig
Iguatinan Monteiro, CEMIG
O
Paper 4300-2015:
"Out Here" Forecasting: A Retail Case Study
Faced with diminishing forecast returns from the forecast engine within the existing replenishment application, Tractor Supply Company (TSC) engaged SAS® Institute to deliver a fully integrated forecasting solution that promised a significant improvement in chain-wide forecast accuracy. The end-to-end forecast implementation, including problems faced, solutions delivered, and results realized, is explored.
Read the paper (PDF).
Chris Houck, SAS
P
Paper 3080-2015:
Picture-Perfect Graphing with Graph Template Language
Do you have reports based on SAS/GRAPH® procedures, customized with multiple GOPTIONS? Do you dream of those same graphs existing in a GOPTIONS and ANNOTATE-free world? Re-creating complex graphs using statistical graphics (SG) procedures is not only possible, but much easier than you think! Using before and after examples, I discuss how the graphs were created using the combination of Graph Template Language (GTL) and the SG procedures. This method produces graphs that are nearly indistinguishable from the original. This method simplifies the code required to make complex graphs, allows for maximum re-usability for graphics code, and enables changes to cascade to multiple reports simultaneously.
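A minimal GTL example of the pattern (a template definition rendered with PROC SGRENDER; the data set and variables are hypothetical):

   proc template;
      define statgraph trendplot;
         begingraph;
            entrytitle 'Monthly Revenue by Region';
            layout overlay;
               seriesplot x=month y=revenue / group=region;
            endlayout;
         endgraph;
      end;
   run;

   proc sgrender data=revenue template=trendplot;
   run;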
Read the paper (PDF).
Julie VanBuskirk, Nurtur
Paper 3326-2015:
Predicting Hospitalization of a Patient Using SAS® Enterprise Miner™
Inpatient treatment is the most common type of treatment ordered for patients who have a serious ailment and need immediate attention. Using a data set about diabetes patients downloaded from the UCI Network Data Repository, we built a model to predict the probability that the patient will be rehospitalized within 30 days of discharge. The data has about 100,000 rows and 51 columns. In our preliminary analysis, a neural network turned out to be the best model, followed closely by the decision tree model and regression model.
Nikhil Kapoor, Oklahoma State University
Ganesh Kumar Gangarajula, Oklahoma State University
Paper 3501-2015:
Predicting Transformer Lifetime Using Survival Analysis and Modeling Risk Associated with Overloaded Transformers Using SAS® Enterprise Miner™ 12.1
Utility companies in America are always challenged when it comes to knowing when their infrastructure fails. One of the most critical components of a utility company's infrastructure is the transformer. It is important to assess the remaining lifetime of transformers so that the company can reduce costs, plan expenditures in advance, and largely mitigate the risk of failure. It is equally important to identify high-risk transformers in advance and to maintain them accordingly in order to avoid sudden loss of equipment due to overloading. This paper uses SAS® to predict the lifetime of transformers, identify the various factors that contribute to their failure, and classify transformers into High, Medium, and Low risk categories based on load for easy maintenance. The data set from a utility company contains around 18,000 observations and 26 variables from 2006 to 2013, including the failure and installation dates of the transformers. The data set comprises many transformers that were installed before 2006 (there are 190,000 transformers on which several regression models are built in this paper to identify their risk of failure), but there is no age-related parameter for them. Survival analysis was therefore performed on this left-truncated and right-censored data. The data set has variables such as Age, Average Temperature, Average Load, and Normal and Overloaded Conditions for residential and commercial transformers. Data creation involved merging 12 different tables. Nonparametric models for failure time data were built to explore the lifetime and failure rate of the transformers. A Cox regression model was built to analyze the important factors contributing to transformer failure. Several risk-based models were then built to categorize transformers into High, Medium, and Low risk categories based on their loads. This categorization can help utility companies better manage the risks associated with transformer failures.
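A minimal sketch of a Cox model on left-truncated, right-censored data, using the counting-process style of PROC PHREG (the variable names are hypothetical):

   proc phreg data=transformers;
      /* age_at_start handles left truncation; age_at_end is the failure or censoring age */
      model (age_at_start, age_at_end) * failed(0) = avg_load avg_temp overload_flag;
   run;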
Read the paper (PDF).
Balamurugan Mohan, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Paper 3405-2015:
Prioritizing Feeders for Investments: Performance Analysis Using Data Envelopment Analysis
This paper presents a methodology developed to define and prioritize feeders with the least satisfactory performance for continuity of energy supply, in order to obtain an efficiency ranking that supports decision-making regarding investments to be implemented. Data Envelopment Analysis (DEA) was the basis for this methodology; the input-oriented model with variable returns to scale was adopted. To perform the analysis of the feeders, data from the utility geographic information system (GIS) and from the interruption control system was exported to SAS® Enterprise Guide®, where data manipulation was possible. Different continuity variables and physical-electrical parameters were consolidated for each feeder for the years 2011 to 2013. The feeders were separated according to the geographical regions of the concession area and their location (urban or rural), and then grouped by physical similarity. Results showed that 56.8% of the feeders could be considered efficient, based on the continuity of the service. Furthermore, the results enable identification of the assets with the most critical performance and their benchmarks, and the definition of preliminary goals to reach efficiency.
Read the paper (PDF).
Victor Henrique de Oliveira, Cemig
Iguatinan Monteiro, CEMIG
Paper 3258-2015:
Put Data in the Driver's Seat: A Primer on Data-Driven Programming Using SAS®
One of the hallmarks of a good or great SAS® program is that it requires only a minimum of upkeep. Especially for code that produces reports on a regular basis, it is preferable to minimize user and programmer input and instead have the input data drive the outputs of a program. Data-driven SAS programs are more efficient and reliable, require less hardcoding, and result in less downtime and fewer user complaints. This paper reviews three ways of building a SAS program to create regular Microsoft Excel reports: one method using hardcoded variables, another using SAS keyword macros, and the last using metadata to drive the reports.
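A minimal sketch of the metadata-driven variant, in which a control data set drives report creation via CALL EXECUTE; the file, data set, and variable names are hypothetical, and ODS EXCEL stands in for whatever output destination is actually used:

   data report_control;
      input dept $ outfile : $20.;
      datalines;
   SALES sales.xlsx
   HR    hr.xlsx
   ;
   run;

   data _null_;
      set report_control;
      /* Generate one export step per control record */
      call execute(cats(
         'ods excel file="', outfile, '";',
         'proc print data=master(where=(dept="', dept, '")); run;',
         'ods excel close;'));
   run;

With this design, adding a new report is a data change rather than a code change.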
Andrew Clapson, MD Financial Management
R
Paper 1341-2015:
Random vs. Fixed Effects: Which Technique More Effectively Addresses Selection Bias in Observational Studies
Retrospective case-control studies are frequently used to evaluate health care programs when it is not feasible to randomly assign members to a respective cohort. Without randomization, observational studies are more susceptible to selection bias where the characteristics of the enrolled population differ from those of the entire population. When the participant sample is different from the comparison group, the measured outcomes are likely to be biased. Given this issue, this paper discusses how propensity score matching and random effects techniques can be used to reduce the impact selection bias has on observational study outcomes. All results shown are drawn from an ROI analysis using a participant (cases) versus non-participant (controls) observational study design for a fitness reimbursement program aiming to reduce health care expenditures of participating members.
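A minimal sketch of the propensity score estimation step (the variable names are hypothetical; the matching itself, for example greedy nearest-neighbor matching on the score, follows in a separate step):

   proc logistic data=members;
      class gender region / param=ref;
      model participant(event='1') = age gender region baseline_cost;
      output out=ps_scored p=pscore;
   run;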
Read the paper (PDF). | Download the data file (ZIP).
Jess Navratil-Strawn, Optum
Paper 2382-2015:
Reducing the Bias: Practical Application of Propensity Score Matching in Health-Care Program Evaluation
To stay competitive in the marketplace, health-care programs must be capable of reporting the true savings to clients. This is a tall order, because most health-care programs are set up to be available to the client's entire population and thus cannot be conducted as a randomized control trial. In order to evaluate the performance of the program for the client, we use an observational study design that has inherent selection bias due to its inability to randomly assign participants. To reduce the impact of bias, we apply propensity score matching to the analysis. This technique is beneficial to health-care program evaluations because it helps reduce selection bias in the observational analysis and in turn provides a clearer view of the client's savings. This paper explores how to develop a propensity score, evaluate the use of inverse propensity weighting versus propensity matching, and determine the overall impact of the propensity score matching method on the observational study population. All results shown are drawn from a savings analysis using a participant (cases) versus non-participant (controls) observational study design for a health-care decision support program aiming to reduce emergency room visits.
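A minimal sketch of the inverse propensity weighting alternative discussed above, assuming a propensity score (pscore) has already been estimated, for example with PROC LOGISTIC; the data set and variable names are hypothetical:

   data weighted;
      set ps_scored;
      if participant = 1 then ipw = 1 / pscore;        /* treated */
      else ipw = 1 / (1 - pscore);                     /* controls */
   run;

   proc means data=weighted mean;
      class participant;
      var er_visits;
      weight ipw;
   run;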
Read the paper (PDF).
Amber Schmitz, Optum
Paper SAS1871-2015:
Regulatory Compliance Reporting Using SAS® XML Mapper
As a part of regulatory compliance requirements, banks are required to submit Microsoft Excel reports based on templates supplied by the regulators. This poses several challenges: the templates are highly complex, implementation using ODS can be cumbersome, and it is difficult to keep up with regulatory changes and support dynamic report content. At the same time, you need the flexibility to customize and schedule these reports as per your business requirements. This paper discusses an approach to building these reports using SAS® XML Mapper and the Excel XML spreadsheet format. This approach provides an easy-to-use framework that can accommodate template changes from the regulators without needing to modify the code. It is implemented using SAS® technologies, providing you the flexibility to customize to your needs. This approach is also easy to maintain.
Read the paper (PDF).
Sarita Kannarath, SAS
Phil Hanna, SAS
Amitkumar Nakrani, SAS
Nishant Sharma, SAS
S
Paper 3102-2015:
SAS® Enterprise Guide® 5.1 and PROC GPLOT--the Power, the Glory and the PROC-tical Limitations
Customer expectations are set high when Microsoft Excel and Microsoft PowerPoint are used to design reports. Using SAS® for reporting has benefits because it generates plots directly from prepared data sets, automates the plotting process, minimizes labor-intensive manual construction using Microsoft products, and does not compromise the presentation value. SAS® Enterprise Guide® 5.1 has a powerful point-and-click method that is quick and easy to use. However, it is limited in its ability to customize the output to mimic manually created Microsoft graphics. This paper demonstrates why SAS Enterprise Guide is the perfect starting point for creating initial code for plots using SAS/GRAPH® point-and-click features and how the code can be enhanced using established PROC GPLOT, ANNOTATE, and ODS options to re-create the look and feel of plots generated by Excel and PowerPoint. Examples show the generation of plots and tables using PROC TABULATE to embed the plot data into the graphical output. Also included are tips for overcoming the ODS limitation of SAS® 9.3, which is used by SAS Enterprise Guide 5.1, to transfer the SAS graphical output to PowerPoint files. These SAS® 9.3 tips are contrasted with the new SAS® 9.4 ODS POWERPOINT statement that enables direct PowerPoint file creation from a SAS program.
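A minimal sketch of the SAS 9.4 direct-to-PowerPoint approach mentioned above (the file name and data are hypothetical, and PROC SGPLOT stands in for the report-specific graphics code):

   ods powerpoint file='revenue_summary.pptx';
   title 'Quarterly Revenue';
   proc sgplot data=revenue;
      vbar quarter / response=amount;
   run;
   ods powerpoint close;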
Read the paper (PDF).
Christopher Klekar, Baylor Scott and White Health
Gabriela Cantu, Baylor Scott & White Health
Paper 3110-2015:
SAS® Enterprise Guide® Query Builder
The Query Builder in SAS® Enterprise Guide® is an excellent point-and-click tool that generates PROC SQL code and creates queries in SAS®. This hands-on workshop will give an overview of query options, sorting, simple and complex filtering, and joining tables. It is a great workshop for programmers and non-programmers alike.
Jennifer First-Kluge, Systems Seminar Consultants
Paper 2540-2015:
SAS® Enterprise Guide® System Design
A good system should embody the following characteristics: planned, maintainable, flexible, simple, accurate, restartable, reliable, reusable, automated, documented, efficient, modular, and validated. This is true of any system, but how to implement this in SAS® Enterprise Guide® is a unique endeavor. We provide a brief overview of these characteristics and then dive deeper into how a SAS Enterprise Guide user should approach developing both ad hoc and production systems.
Read the paper (PDF).
Steven First, Systems Seminar Consultants
Jennifer First-Kluge, Systems Seminar Consultants
Paper 2683-2015:
SAS® Enterprise Guide® or SAS® Studio: Which is Best for You?
SAS® Studio (previously known as SAS® Web Editor) was introduced in the first maintenance release of SAS® 9.4 as an alternative programming environment to SAS® Enterprise Guide® and SAS® Display Manager. SAS Studio is different in many ways from SAS Enterprise Guide and SAS Display Manager. As a programmer, I currently use SAS Enterprise Guide to help me code, test, maintain, and organize my SAS® programs. I have SAS Display Manager installed on my PC, but I still prefer to write my programs in SAS Enterprise Guide because I know it saves my log and output whenever I run a program, even if that program crashes and takes the SAS session with it! So should I now be using SAS Studio instead, and should you be using it, too?
Read the paper (PDF).
Philip Holland, Holland Numerics Limited
Paper 3479-2015:
SAS® Metadata Security 101: A Primer for SAS Administrators and Users Not Familiar with SAS
The purpose behind this paper is to provide a high-level overview of how SAS® security works in a way that can be communicated to both SAS administrators and users who are not familiar with SAS. It is not uncommon to hear SAS administrators complain that their IT department and users just don't 'get' it when it comes to metadata and security. For the administrator or user not familiar with SAS, understanding how SAS interacts with the operating system, the file system, external databases, and users can be confusing. Based on a SAS® Enterprise Office Analytics installation in a Windows environment, this paper walks the reader through all of the basic metadata relationships and how they are created, thus unraveling the mystery of how the host system, external databases, and SAS work together to provide users what they need, while reliably enforcing the appropriate security.
Read the paper (PDF).
Charyn Faenza, F.N.B. Corporation
Paper SAS1661-2015:
Show Me the Money! Text Analytics for Decision-Making in Government Spending
Understanding organizational trends in spending can help overseeing government agencies make appropriate modifications in spending to best serve the organization and the citizenry. However, given millions of line items for organizations annually, including free-form text, it is unrealistic for these overseeing agencies to succeed by using only a manual approach to this textual data. Using a publicly available data set, this paper explores how business users can apply text analytics using SAS® Contextual Analysis to assess trends in spending for particular agencies, apply subject matter expertise to refine these trends into a taxonomy, and ultimately, categorize the spending for organizations in a flexible, user-friendly manner. SAS® Visual Analytics enables dynamic exploration, including modeling results from SAS® Visual Statistics, in order to assess areas of potentially extraneous spending, providing actionable information to the decision makers.
Read the paper (PDF).
Tom Sabo, SAS
Paper SAS1972-2015:
Social Media and Open Data Integration through SAS® Visual Analytics and SAS® Text Analytics for Public Health Surveillance
A leading killer in the United States is smoking. Moreover, over 8.6 million Americans live with a serious illness caused by smoking or second-hand smoke. Despite this, over 46.6 million U.S. adults smoke tobacco, cigars, and pipes. The key analytic question in this paper is: how would e-cigarettes affect this public health situation? Can monitoring public opinions of e-cigarettes using SAS® Text Analytics and SAS® Visual Analytics help provide insight into the potential dangers of these new products? Are e-cigarettes an example of Big Tobacco up to its old tricks or, in fact, a cessation product? The research in this paper was conducted on thousands of tweets from April to August 2014. It includes API sources beyond Twitter--for example, indicators from the Health Indicators Warehouse (HIW) of the Centers for Disease Control and Prevention (CDC)--that were used to enrich Twitter data in order to implement a surveillance system developed by SAS® for the CDC. The analysis is especially important to the Office on Smoking and Health (OSH) at the CDC, which is responsible for tobacco control initiatives that help states to promote cessation and prevent initiation in young people. To help the CDC succeed with these initiatives, the surveillance system also: 1) automates the acquisition of data, especially tweets; and 2) applies text analytics to categorize these tweets using a taxonomy that provides the CDC with insights into a variety of relevant subjects. Twitter text data can help the CDC look at the public response to the use of e-cigarettes and examine general discussions regarding smoking and public health, as well as potential controversies (involving tobacco exposure to children, increasing government regulations, and so on). SAS® Content Categorization helps health care analysts review large volumes of unstructured data by categorizing tweets in order to monitor and follow what people are saying and why they are saying it. Ultimately, it is a solution intended to help the CDC monitor the public's perception of the dangers of smoking and e-cigarettes. In addition, it can identify areas where OSH can focus its attention in order to fulfill its mission and track the success of CDC health initiatives.
Read the paper (PDF).
Manuel Figallo, SAS
Emily McRae, SAS
T
Paper SAS1804-2015:
Take Your Data Analysis and Reporting to the Next Level by Combining SAS® Office Analytics, SAS® Visual Analytics, and SAS® Studio
SAS® Office Analytics, SAS® Visual Analytics, and SAS® Studio provide excellent data analysis and report generation. When these products are combined, their deep interoperability enables you to take your analysis and reporting to the next level. Build interactive reports in SAS® Visual Analytics Designer, and then view, customize and comment on them from Microsoft Office and SAS® Enterprise Guide®. Create stored processes in SAS Enterprise Guide, and then run them in SAS Visual Analytics Designer, mobile tablets, or SAS Studio. Run your SAS Studio tasks in SAS Enterprise Guide and Microsoft Office using data provided by those applications. These interoperability examples and more will enable you to combine and maximize the strength of each of the applications. Learn more about this integration between these products and what's coming in the future in this session.
Read the paper (PDF).
David Bailey, SAS
Tim Beese, SAS
Casey Smith, SAS
Paper 2920-2015:
Text Mining Kaiser Permanente Member Complaints with SAS® Enterprise Miner™
This presentation details the steps involved in using SAS® Enterprise Miner™ to text mine a sample of member complaints. Specifically, it describes how the Text Parsing, Text Filtering, and Text Topic nodes were used to generate topics that described the complaints. The text mining results (slightly modified for confidentiality) are reviewed, along with conclusions and lessons learned from the project.
Read the paper (PDF).
Amanda Pasch, Kaiser Permanente
Paper 3200-2015:
The Use Of SAS® Maps For University Retention And Recruiting
Universities often have student data that is difficult to represent, which includes information about the student's home location. Often, student data is represented in tables, and patterns are easily overlooked. This study aimed to represent recruiting and retention data at a county level using SAS® mapping software for a public, land-grant university. Three years of student data from the student records database were used to visually represent enrollment, retention, and other predictors of student success. SAS® Enterprise Guide® was used along with the GMAP procedure to make user-friendly maps. Displaying data using maps on a county level revealed patterns in enrollment, retention, and other factors of interest that might have otherwise been overlooked, which might be beneficial for recruiting purposes.
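A minimal PROC GMAP sketch of a county-level choropleth; the input data set and response variable are hypothetical, and 46 is the South Dakota FIPS code:

   proc gmap data=students map=maps.counties(where=(state=46));
      id state county;                 /* input data must carry matching state/county codes */
      choro enrollment / levels=5;
   run;
   quit;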
Read the paper (PDF).
Allison Lempola, South Dakota State University
Thomas Brandenburger, South Dakota State University
Paper 3003-2015:
Tips and Tricks for SAS® Program Automation
There are many 'gotcha's' when you are trying to automate a well-written program. The details differ depending on the way you schedule the program and the environment you are using. This paper covers system options, error handling logic, and best practices for logging. Save time and frustration by using these tips as you schedule programs to run.
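A minimal sketch of the kind of defensive setup meant here; the paths and option choices are illustrative assumptions, not the paper's specific recommendations:

   options errorcheck=strict noovp;                      /* stricter batch error behavior */
   proc printto log='/jobs/logs/daily_report.log' new;   /* capture the log to a file */
   run;

   /* ... scheduled program steps ... */

   proc printto;                                         /* restore the default log */
   run;

   %macro check_status;
      %if &syscc > 4 %then %put ERROR: Job ended with SYSCC=&syscc;
   %mend check_status;
   %check_status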
Read the paper (PDF).
Adam Hood, Slalom Consulting
Paper 3518-2015:
Twelve Ways to Better Graphs
If you are looking for ways to make your graphs more communication-effective, this tutorial can help. It covers both the new ODS Graphics SG (Statistical Graphics) procedures and the traditional SAS/GRAPH® software G procedures. The focus is on management reporting and presentation graphs, but the principles are relevant for statistical graphs as well. Important features unique to SAS® 9.4 are included, but most of the designs and construction methods apply to earlier versions as well. The principles of good graphic design are actually independent of your choice of software.
Read the paper (PDF).
LeRoy Bessler, Bessler Consulting and Research
U
Paper SAS1910-2015:
Unconventional Data-Driven Methodologies Forecast Performance in Unconventional Oil and Gas Reservoirs
How does historical production data relate a story about subsurface oil and gas reservoirs? Business and domain experts must perform accurate analysis of reservoir behavior using only rate and pressure data as a function of time. This paper introduces innovative data-driven methodologies to forecast oil and gas production in unconventional reservoirs that, owing to the nature of the tightness of the rocks, render the empirical functions less effective and accurate. You learn how implementations of the SAS® MODEL procedure provide functional algorithms that generate data-driven type curves on historical production data. Reservoir engineers can now gain more insight to the future performance of the wells across their assets. SAS enables a more robust forecast of the hydrocarbons in both an ad hoc individual well interaction and in an automated batch mode across the entire portfolio of wells. Examples of the MODEL procedure arising in subsurface production data analysis are discussed, including the Duong data model and the stretched exponential data model. In addressing these examples, techniques for pattern recognition and for implementing TREE, CLUSTER, and DISTANCE procedures in SAS/STAT® are highlighted to explicate the importance of oil and gas well profiling to characterize the reservoir. The MODEL procedure analyzes models in which the relationships among the variables comprise a system of one or more nonlinear equations. Primary uses of the MODEL procedure are estimation, simulation, and forecasting of nonlinear simultaneous equation models, and generating type curves that fit the historical rate production data. You will walk through several advanced analytical methodologies that implement the SEMMA process to enable hypotheses testing as well as directed and undirected data mining techniques. SAS® Visual Analytics Explorer drives the exploratory data analysis to surface trends and relationships, and the data QC workflows ensure a robust input space for the performance forecasting methodologies that are visualized in a web-based thin client for interactive interpretation by reservoir engineers.
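A minimal PROC MODEL sketch fitting a stretched exponential decline curve to historical rates; the data set, variable names, and starting values are assumptions, not the paper's calibration:

   proc model data=well_rates;
      parms q0 100 tau 12 n 0.5;                /* illustrative starting values */
      rate = q0 * exp(-(time/tau)**n);          /* stretched exponential decline */
      fit rate / out=fitted outpredict;
   run;
   quit;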
Read the paper (PDF).
Keith Holdaway, SAS
Louis Fabbi, SAS
Dan Lozie, SAS
Paper 1640-2015:
Understanding Characteristics of Insider Threats by Using Feature Extraction
This paper explores feature extraction from unstructured text variables using Term Frequency-Inverse Document Frequency (TF-IDF) weighting algorithms coded in Base SAS®. Data sets with unstructured text variables can often hold a lot of potential to enable better predictive analysis and document clustering. Each of these unstructured text variables can be used as inputs to build an enriched data set-specific inverted index, and the most significant terms from this index can be used as single word queries to weight the importance of the term to each document from the corpus. This paper also explores the usage of hash objects to build the inverted indices from the unstructured text variables. We find that hash objects provide a considerable increase in algorithm efficiency, and our experiments show that a novel weighting algorithm proposed by Paik (2013) best enables meaningful feature extraction. Our TF-IDF implementations are tested against a publicly available data breach data set to understand patterns specific to insider threats to an organization.
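A minimal sketch of counting term frequencies with a hash object, the building block of the inverted index described above; the data set and variable names are hypothetical:

   data _null_;
      length term $32;
      declare hash tf();                        /* term -> count */
      tf.defineKey('term');
      tf.defineData('term', 'count');
      tf.defineDone();
      do until (eof);
         set complaints end=eof;                /* text held in variable 'text' */
         i = 1;
         term = lowcase(scan(text, i));
         do while (term ne '');
            if tf.find() ne 0 then count = 0;   /* new term */
            count + 1;
            tf.replace();
            i + 1;
            term = lowcase(scan(text, i));
         end;
      end;
      tf.output(dataset: 'term_counts');
      stop;
   run;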
Read the paper (PDF). | Watch the recording.
Ila Gokarn, Singapore Management University
Clifton Phua, SAS
Paper SAS1925-2015:
Using SAS to Deliver Web Content Personalization Using Cloud-Based Clickstream Data and On-premises Customer Data
Real-time web content personalization has come into its teen years, but recently a spate of marketing solutions has enabled marketers to finely personalize web content for visitors based on browsing behavior, geo-location, preferences, and so on. In an age where the attention span of a web visitor is measured in seconds, marketers hope that tailoring the digital experience will pique each visitor's interest just long enough to increase corporate sales. The range of solutions spans the entire spectrum from completely cloud-based installations to completely on-premises installations. Marketers struggle to find the optimal solution that meets their corporation's marketing objectives, provides the highest agility and time-to-market, and still keeps the marketing budget low. In the last decade or so, marketing strategies that involved personalizing using purely on-premises customer data were quickly replaced by ones that involved personalizing using only web-browsing behavior (a.k.a. clickstream data). This was possible because of a spate of cloud-based solutions that enabled marketers to de-couple themselves from the underlying IT infrastructure and the storage issues of capturing large volumes of data. However, this new trend meant that corporations weren't using much of their treasure trove of on-premises customer data. Of late, however, enterprises have been trying hard to find solutions that give them the best of both--the ease of gathering clickstream data using cloud-based applications and on-premises customer data--to perform analytics that lead to better web content personalization for a visitor. This paper explains a process that attempts to address this rapidly evolving need. The paper assumes that the enterprise already has tools for capturing clickstream data, developing analytical models, and presenting the content. It provides a roadmap to implementing a phased approach where enterprises continue to capture clickstream data but bring that data in-house to be merged with customer data, enabling their analytics team to build sophisticated predictive models that can be deployed into the real-time web-personalization application. The final phase requires enterprises to improve their predictive models on a periodic basis.
Read the paper (PDF).
Mahesh Subramanian, SAS Institute Inc.
Suneel Grover, SAS
Paper 3503-2015:
Using SAS® Enterprise Guide®, SAS® Enterprise Miner™, and SAS® Marketing Automation to Make a Collection Campaign Smarter
Companies are increasingly relying on analytics as the right solution to their problems. In order to use analytics and create value for the business, companies first need to store, transform, and structure the data to make it available and functional. This paper shows a successful business case where the extraction and transformation of the data combined with analytical solutions were developed to automate and optimize the management of the collections cycle for a TELCO company (DIRECTV Colombia). SAS® Data Integration Studio is used to extract, process, and store information from a diverse set of sources. SAS Information Map is used to integrate and structure the created databases. SAS® Enterprise Guide® and SAS® Enterprise Miner™ are used to analyze the data, find patterns, create profiles of clients, and develop churn predictive models. SAS® Customer Intelligence Studio is the platform on which the collection campaigns are created, tested, and executed. SAS® Web Report Studio is used to create a set of operational and management reports.
Read the paper (PDF).
Darwin Amezquita, DIRECTV
Paulo Fuentes, Directv Colombia
Andres Felipe Gonzalez, Directv
Paper 3212-2015:
Using SAS® to Combine Regression and Time Series Analysis on U.S. Financial Data to Predict the Economic Downturn
During the financial crisis of 2007-2009, the U.S. labor market lost 8.4 million jobs, causing the unemployment rate to increase from 5% to 9.5%. One of the indicators of economic recession is negative gross domestic product (GDP) for two consecutive quarters. This poster combines quantitative and qualitative techniques to predict the economic downturn by forecasting recession probabilities. Data was collected from the Board of Governors of the Federal Reserve System and the Federal Reserve Bank of St. Louis, containing 29 variables and quarterly observations from 1976-Q1 to 2013-Q3. Eleven variables were selected as inputs based on their effects on recession and to limit multicollinearity: long-term Treasury yields (5-year and 10-year), mortgage rate, CPI inflation rate, prime rate, market volatility index, BBB corporate bond yield, house price index, stock market index, commercial real estate price index, and one calculated variable, yield spread (Treasury yield-curve spread). The target variable was a binary variable depicting the economic recession for each quarter (1=Recession). Data was prepared for modeling by applying imputation and transformation on variables. A two-step analysis was used to forecast the recession probabilities for the short-term period. Predicted recession probabilities were first obtained from a backward elimination logistic regression model that was selected on the basis of misclassification (validation misclassification = 0.115). These probabilities were then forecasted using the exponential smoothing method that was selected on the basis of mean absolute error (MAE = 11.04). Results show the recession periods, including the great recession of 2008, and the forecast for eight quarters (up to 2015-Q3).
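A minimal sketch of the second step, smoothing and extending the model-predicted probabilities with PROC ESM; the data set and variable names are hypothetical:

   proc esm data=recession_probs out=prob_forecast lead=8;
      id quarter interval=qtr;               /* quarter is a SAS date variable */
      forecast p_recession / model=simple;   /* simple exponential smoothing */
   run;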
Read the paper (PDF).
Avinash Kalwani, Oklahoma State University
Nishant Vyas, Oklahoma State University
W
Paper SAS1440-2015:
Want an Early Picture of the Data Quality Status of Your Analysis Data? SAS® Visual Analytics Shows You How
When you are analyzing your data and building your models, you often find out that the data cannot be used in the intended way. Systematic patterns, incomplete data, and inconsistencies from a business point of view are often the reason. You wish you could get a complete picture of the quality status of your data much earlier in the analytic lifecycle. SAS® analytics tools like SAS® Visual Analytics help you to profile and visualize the quality status of your data in an easy and powerful way. In this session, you learn advanced methods for analytic data quality profiling. You will see case studies based on real-life data, where we look at time series data from a bird's-eye view and interactively profile GPS trackpoint data from a sail race.
Read the paper (PDF). | Download the data file (ZIP).
Gerhard Svolba, SAS
Y
Paper 3262-2015:
Yes, SAS® Can Do! Manage External Files with SAS Programming
Managing and organizing external files and directories plays an important part in our data analysis and business analytics work. A good file management system can streamline project management and file organization and significantly improve work efficiency. Therefore, under many circumstances, it is necessary to automate and standardize the file management processes through SAS® programming. Compared with managing SAS files via PROC DATASETS, managing external files is a much more challenging task, which requires advanced programming skills. This paper presents and discusses various methods and approaches to managing external files with SAS programming. The illustrated methods and skills can have important applications in a wide variety of analytic fields.
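A minimal sketch of the kind of external file management meant here, using the SAS directory and file I/O functions (the paths are hypothetical):

   data file_list;
      length fname $256;
      rc  = filename('dir', '/project/archive');
      did = dopen('dir');                  /* open the directory */
      do i = 1 to dnum(did);
         fname = dread(did, i);            /* read each member name */
         output;
      end;
      rc = dclose(did);
   run;

   data _null_;
      rc = filename('old', '/project/archive/obsolete.txt');
      if fdelete('old') = 0 then put 'NOTE: File deleted.';
   run;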
Read the paper (PDF).
Justin Jia, Trans Union
Amanda Lin, CIBC