SAS Enterprise Guide Papers A-Z

A
Session 10081-2016:
An Application of the PRINQUAL Procedure to Develop a Synthetic Index of Customer Value for a Colombian Financial Institution
Currently Colpatria, as a part of Scotiabank in Colombia, has several methodologies that enable us to have a vision of the customer from a risk perspective. However, the current trend in the financial sector is to have a global vision that involves aspects of risk as well as of profitability and utility. As a part of the business strategies to develop cross-sell and customer profitability under conditions of risk needs, it's necessary to create a customer value index to score the customer according to different groups of business key variables that permit us to describe the profitability and risk of each customer. In order to generate the Index of Customer Value, we propose to construct a synthetic index using principal component analysis and multiple factorial analysis.
Read the paper (PDF)
Ivan Atehortua, Colpatria
Diana Flórez, Colpatria
B
Session 3640-2016:
Big Data, Big Headaches: An Agile Modeling Solution Designed for the Information Age
The surge of data and data sources in marketing has created an analytical bottleneck in most organizations. Analytics departments have been pushed into a difficult decision: either purchase black-box analytical tools to generate efficiencies or hire more analysts, modelers, and data scientists. Knowledge gaps stemming from restrictions in black-box tools or from backlogs in the work of analytical teams have resulted in lost business opportunities. Existing big data analytics tools respond well when dealing with large record counts and small variable counts, but they fall short in bringing efficiencies when dealing with wide data. This paper discusses the importance of an agile modeling engine designed to deliver productivity, irrespective of the size of the data or the complexity of the modeling approach.
Read the paper (PDF) | Watch the recording
Mariam Seirafi, Cornerstone Group of Companies
C
Session 2440-2016:
Change Management: The Secret to a Successful SAS® Implementation
Whether you are deploying a new capability with SAS® or modernizing the tool set that people already use in your organization, change management is a valuable practice. Sharing the news of a change with employees can be a daunting task and is often put off until the last possible second. Organizations frequently underestimate the impact of the change, and the results of that miscalculation can be disastrous. Too often, employees find out about a change just before mandatory training and are expected to embrace it. But change management is far more than training. It is early and frequent communication; an inclusive discussion; encouraging and enabling the development of an individual; and facilitating learning before, during, and long after the change. This paper not only showcases the importance of change management but also identifies key objectives for a purposeful strategy. We outline our experiences with both successful and not-so-successful organizational changes. We present best practices for implementing change management strategies and highlight common gaps. For example, developing and engaging Change Champions from the beginning alleviates many headaches and avoids disruptions. Finally, we discuss how the overall company culture can either support or hinder the positive experience change management should be and how to engender support for formal change management in your organization.
Read the paper (PDF) | Watch the recording
Greg Nelson, ThotWave
Session 11862-2016:
College Football: Can the Public Predict Games Correctly?
Thanks to advances in technologies that make data more readily available, sports analytics is an increasingly popular topic. A majority of sports analyses use advanced statistics and metrics to achieve their goal, whether it be prediction or explanation. Few studies include public opinion data. Last year's highly anticipated NCAA College Football Championship game between Ohio State and Oregon broke ESPN and cable television records with an astounding 33.4 million viewers. Given the popularity of college football, especially now with the inclusion of the new playoff system, people seem to be paying more attention than ever to the game. ESPN provides fans with College Pick'em, which gives them a way to compete with their friends and colleagues on a weekly basis, for free, to see who can correctly pick the winners of college football games. Each week, 10 close matchups are selected, and users must select which team they think will win the game and rank those picks on a scale of 1 (lowest) to 10 (highest), according to their confidence level. For each team, the percentage of users who picked that team and the national average confidence are shown. Ideally, one could use these variables in conjunction with other information to enhance one's own predictions. The analysis described in this session explores the relationship between public opinion data from College Pick'em and the corresponding game outcomes by using visualizations and statistical models implemented by various SAS® products.
View the e-poster or slides (PDF)
Taylor Larkin, The University of Alabama
Matt Collins, University of Alabama
E
Session 2760-2016:
Easing into Data Exploration, Reporting, and Analytics Using SAS® Enterprise Guide®
Whether you have been programming in SAS® for years, are new to it, or have dabbled with SAS® Enterprise Guide® before, this hands-on workshop sheds some light on the depth, breadth, and power of the Enterprise Guide environment. With all the demands on your time, you need powerful tools that are easy to learn and deliver end-to-end support for your data exploration, reporting, and analytics needs. Included are the following: data exploration tools; formatting code (cleaning up after your coworkers); the enhanced programming environment (and how to calm it down); easily creating reports and graphics; producing the output formats you need (XLS, PDF, RTF, HTML); workspace layout; start-up processing; and notes to help your coworkers use your processes. This workshop uses SAS Enterprise Guide 7.1, but most of the content is applicable to earlier versions.
Read the paper (PDF)
Marje Fecht, Prowerk Consulting
Session 9180-2016:
Efficiently Create Rates over Different Time Periods (PROC MEANS and PROC EXPAND)
This session illustrates how to quickly create rates over a specified period of time, using the MEANS and EXPAND procedures. For example, do you want to know how to use the power of SAS® to create a year-to-date, rolling 12-month, or monthly rate? At Kaiser Permanente, we use this technique to develop Emergency Department (ED) use rates, ED admit rates, patient day rates, readmission rates, and more. A powerful function of PROC MEANS, given a database table with several dimensions and one or more facts, is to perform a mathematical calculation on fact columns across several different combinations of dimensions. For example, if a membership database table exists with the dimensions member ID, year-month, line of business, medical office building, and age grouping, PROC MEANS can easily determine and output the count of members by every possible dimension combination into a SAS data set. Likewise, if a hospital visit database table exists with the same dimensions and facts, PROC MEANS can output the number of hospital visits by the dimension combinations into a second SAS data set. With the power of PROC EXPAND, each of the data sets above, once sorted properly, can have columns added, which calculate total members and total hospital visits by a time dimension of the analyst's choice. Common time dimensions used for Kaiser Permanente's utilization rates are monthly, rolling 12-months, and year-to-date. The resulting membership and hospital visit data sets can be joined with a MERGE statement, and simple division produces a rate for the given dimensions.
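The arithmetic behind the year-to-date and rolling 12-month rates described above can be sketched in a few lines. This is a plain-Python illustration, not the paper's PROC MEANS / PROC EXPAND code; the function names, the `YYYY-MM` month format, and the sample counts are all assumptions made for the example, and early months in the rolling window simply use whatever history is available.

```python
from collections import deque


def rolling_sum(values, window):
    """Trailing-window sum; early periods use the months available so far."""
    out, buf, total = [], deque(), 0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the month that fell out of the window
        out.append(total)
    return out


def year_to_date(months, values):
    """Cumulative sum that resets when the year part of 'YYYY-MM' changes."""
    out, total, current_year = [], 0, None
    for month, v in zip(months, values):
        year = month[:4]
        if year != current_year:
            total, current_year = 0, year
        total += v
        out.append(total)
    return out


# Hypothetical monthly counts for a single dimension combination.
months = ["2015-11", "2015-12", "2016-01", "2016-02"]
members = [1000, 1000, 1100, 1100]
visits = [50, 40, 55, 44]

# "Simple division" of the two aggregated series gives the rate.
rolling_rate = [v / m for v, m in zip(rolling_sum(visits, 12), rolling_sum(members, 12))]
ytd_rate = [v / m for v, m in zip(year_to_date(months, visits), year_to_date(months, members))]
```

Summing numerators and denominators separately before dividing, rather than averaging monthly rates, keeps months with larger membership from being under-weighted.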
Read the paper (PDF) | Watch the recording
Thomas Gant, Kaiser Permanente
Session 2581-2016:
Empowering People to Use SAS® as a Weapon for Work Reduction
You have SAS® Enterprise Guide® installed. You use SAS Enterprise Guide in your day-to-day work. You see how Enterprise Guide can be an aid to accessing data and insightful analytics. You have people you work with or support who are new to SAS® and want to learn. You have people you work with or support who don't particularly want to code but use the GUI and wizard within Enterprise Guide. And then you have the spreadsheet addict, the person or group who refuse to even sign on to SAS. These people need to consume the data sitting in SAS, and they need to do analysis, but they want to do it all in a spreadsheet. But you need to retain an audit trail of the data, and you have to reduce the operational risk of using spreadsheets for reporting. What do you do? This paper shares some of the challenges and triumphs in empowering these very different groups of people using SAS.
Read the paper (PDF)
Anita Measey, Bank of Montreal
Session 9600-2016:
Evaluation of a Customer's Life Cycle Time-to-Offer Cross-sales in a Bank, Based on the Behavior Score Using Logit Models & DTMC in SAS®
Scotiabank's Colombian division, Colpatria, is the national leader in terms of providing credit cards, with more than 1,600,000 active cards--equivalent to a portfolio of approximately 700 million dollars. The behavior score is used to offer credit cards through a cross-sell process, which happens only if customers have completed six months on books after using their first product with the bank. This is the minimum period of time requested by the behavior Artificial Neural Network (ANN) model. The six months on books internal policy suggests that the maturation of the client in this period is adequate, but this has never been proven. The following research aims to evaluate this hypothesis and calculate the appropriate time to offer cross-sales to new customers using Logistic Regression (Logit), while also segmenting these sales targets by their level of seniority using Discrete-Time Markov Chains (DTMC).
Read the paper (PDF)
Oscar Javier Cortés Arrigui, Scotiabank - Colpatria
Miguel Angel Diaz Rodriguez, Scotiabank - Colpatria
G
Session 11362-2016:
Generating Color Scales in SAS®: 256 Shades of RGB
Color is an important aspect of data visualization and provides an analyst with another tool for identifying data trends. But is the default option the best for every case? Default color scales may be familiar to us; however, they can have inherent flaws that skew our perception of the data. The impact of data visualizations can be considerably improved with just a little thought toward choosing the correct colors. Selecting an appropriate color map can be difficult, and you may decide that it may be easier to generate your own custom color scale. After a brief introduction to the red, green, and blue (RGB) color space, we discuss the strengths and weaknesses of some widely used color scales, as well as what to keep in mind when designing your own. A simple technique is presented, detailing how you can use SAS® to generate a number of color scales and apply them to your data. Using just a few graphics procedures, you can transform almost any complex data into an easily digestible set of visuals. The techniques used in this discussion were developed and tested using SAS® Enterprise Guide® 5.1.
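One simple way to build a custom scale in the RGB space is to interpolate linearly between two endpoint colors. The sketch below is an illustration in Python rather than the SAS technique the paper presents; the function name `color_scale` and the white-to-blue endpoints are assumptions chosen for the example.

```python
def color_scale(start, end, steps):
    """Linearly interpolate between two RGB triples, returning hex strings."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    scale = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the start color, 1.0 at the end color
        rgb = [round(s + (e - s) * t) for s, e in zip(start, end)]
        scale.append("#{:02X}{:02X}{:02X}".format(*rgb))
    return scale


# Hypothetical five-step scale from white to pure blue.
white_to_blue = color_scale((255, 255, 255), (0, 0, 255), 5)
```

Straight-line interpolation in RGB is easy to compute but is not perceptually uniform, which is one of the weaknesses of ad hoc scales worth keeping in mind.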
Read the paper (PDF) | View the e-poster or slides (PDF)
Jeff Grant, Bank of Montreal
Mahmoud Mamlouk, BMO Harris Bank
H
Session 8820-2016:
How Managers and Executives Can Leverage SAS® Enterprise Guide®
SAS® Enterprise Guide® is an extremely valuable tool for programmers, but it should also be leveraged by managers and executives to do data exploration, get information on the fly, and take advantage of the powerful analytics and reporting that SAS® has to offer. This can all be done without learning to program. This paper gives an overview of how SAS Enterprise Guide can improve the process of turning real-time data into real-time business decisions by managers.
Read the paper (PDF)
Steven First, Systems Seminar Consultants, Inc.
K
Session 9000-2016:
Kicking and Screaming Your Way to SAS® Enterprise Guide®
You are a skilled SAS® programmer. You can code circles around those newbies who point and click in SAS® Enterprise Guide®. And yet... there are tasks you struggle with on a regular basis, such as "Is the name of that data set DRUG or DRUGS?" and "What intern wrote this code? It's not formatted well at all and is hard to read." In this seminar you learn how to program, yes program, more efficiently. You learn the benefits of autocomplete and inline help, as well as how to easily format the code that intern wrote that you inherited. In addition, you learn how to create a process flow of a program to identify any dead ends, i.e., data sets that get created but are not used in that program.
Read the paper (PDF)
Michelle Buchecker, ThotWave Technologies
M
Session 5580-2016:
Macro Variables in SAS® Enterprise Guide®
For SAS® Enterprise Guide® users, sometimes macro variables and their values need to be brought over to the local workspace from the server, especially when multiple data sets or outputs need to be written to separate files in a local drive. Manually retyping the macro variables and their values in the local workspace after they have been created on the server workspace would be time-consuming and error-prone, especially when we have quite a number of macro variables and values to bring over. Instead, this task can be achieved in an efficient manner by using dictionary tables and the CALL SYMPUT routine, as illustrated in more detail below. The same approach can also be used to bring macro variables and their values from the local to the server workspace.
Read the paper (PDF) | Download the data file (ZIP) | Watch the recording
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
O
Session 10640-2016:
Optimizing Airline Pilot Connection Time Using PROC REG and PROC LOGISTIC
As any airline traveler knows, connection time is a key element of the travel experience. A tight connection time can cause angst and concern, while a lengthy connection time can introduce boredom and a longer than desired travel time. The same elements apply when constructing schedules for airline pilots. Like passengers, pilot schedules are built with connections. Delta Air Lines operates a hub and spoke system that feeds both passengers and pilots from the spoke stations and connects them through the hub stations. Pilot connection times that are tight can result in operational disruptions, whereas extended pilot connection times are inefficient and unnecessarily costly. This paper demonstrates how Delta Air Lines used SAS® PROC REG and PROC LOGISTIC to analyze historical data in order to build operationally robust and financially responsible pilot connections.
Read the paper (PDF)
Andy Hummel, Delta Air Lines
P
Session 10381-2016:
Pastries, Microbreweries, Diamonds, and More: Small Businesses Can Profit with SAS®
Today, there are 28 million small businesses, which account for 54% of all sales in the United States. The challenge is that small businesses struggle every day to accurately forecast future sales. These forecasts not only drive investment decisions in the business, but also are used in setting daily par, determining labor hours, and scheduling operating hours. In general, owners use their gut instinct. Using SAS® provides the opportunity to develop accurate and robust models that can unlock costs for small business owners in a short amount of time. This research examines over 5,000 records from the first year of daily sales data for a start-up small business, while comparing the four basic forecasting models within SAS® Enterprise Guide®. The objective of this model comparison is to demonstrate how quick and easy it is to forecast small business sales using SAS Enterprise Guide. What does that mean for small businesses? More profit. SAS provides cost-effective models for small businesses to better forecast sales, resulting in better business decisions.
View the e-poster or slides (PDF)
Cameron Jagoe, The University of Alabama
Taylor Larkin, The University of Alabama
Denise McManus, University of Alabama
Session 9660-2016:
Performing Efficient Wide-to-Long Transposes on Teradata Tables Using SAS® Explicit Pass-Through
SAS® provides in-database processing technology in the SQL procedure, which allows the SQL explicit pass-through method to push some or all of the work to a database management system (DBMS). This paper focuses on using the SAS SQL explicit pass-through method to transform Teradata table columns into rows. There are two common approaches for transforming table columns into rows. The first approach is to create narrow tables, one for each column that requires transposition, and then use UNION or UNION ALL to append all the tables together. This approach is straightforward but can be quite cumbersome, especially when there is a large number of columns that need to be transposed. The second approach is using the Teradata TD_UNPIVOT function, which makes the wide-to-long table transposition an easy job. However, TD_UNPIVOT allows you to transpose only columns with the same data type from wide to long. This paper presents a SAS macro solution to the wide-to-long table transposition involving different column data types. Several examples are provided to illustrate the usage of the macro solution. This paper complements the author's SAS paper Performing Efficient Transposes on Large Teradata Tables Using SQL Explicit Pass-Through in which the solution of performing the long-to-wide table transposition method is discussed. SAS programmers who are working with data stored in an external DBMS and would like to efficiently transpose their data will benefit from this paper.
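The first approach mentioned above (one narrow SELECT per column, appended with UNION ALL) can be sketched as follows. This is an illustration that runs against an in-memory SQLite database from Python, not Teradata and not the paper's macro; the table `wide`, its columns, and the helper `unpivot_sql` are hypothetical. Casting every value to text is one assumed way to let columns of different data types share a single value column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wide (id INTEGER, height REAL, city TEXT)")
conn.executemany(
    "INSERT INTO wide VALUES (?, ?, ?)",
    [(1, 1.8, "Bogota"), (2, 1.6, "Cali")],
)


def unpivot_sql(table, key, columns):
    """Build one narrow SELECT per column and append them with UNION ALL."""
    parts = [
        f"SELECT {key}, '{col}' AS col_name, "
        f"CAST({col} AS TEXT) AS col_value FROM {table}"
        for col in columns
    ]
    return " UNION ALL ".join(parts)


# ORDER BY uses ordinal positions so it applies to the whole compound query.
sql = unpivot_sql("wide", "id", ["height", "city"]) + " ORDER BY 1, 2"
rows = conn.execute(sql).fetchall()
```

The generated statement grows by one SELECT per transposed column, which is why this approach becomes cumbersome for very wide tables.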
Read the paper (PDF) | Watch the recording
Tao Cheng, Accenture
Session 11140-2016:
Predicting Rare Events Using Specialized Sampling Techniques in SAS®
In recent years, many companies have been trying to understand the rare events that are critical in the current business environment. But a data set with rare events is always imbalanced, and models developed using such a data set cannot predict the rare events precisely. Therefore, to overcome this issue, a data set needs to be sampled using specialized sampling techniques like over-sampling, under-sampling, or the synthetic minority over-sampling technique (SMOTE). The over-sampling technique deals with randomly duplicating minority class observations, but this technique might bias the results. The under-sampling technique deals with randomly deleting majority class observations, but this technique might lose information. SMOTE sampling deals with creating new synthetic minority observations instead of duplicating minority class observations or deleting the majority class observations. Therefore, this technique can overcome the problems, like biased results and lost information, found in other sampling techniques. In our research, we used an imbalanced data set containing results from a thyroid test with 3,163 observations, out of which only 4.7 percent of the observations had positive test results. Using SAS® procedures like PROC SURVEYSELECT and PROC MODECLUS, we created over-sampled, under-sampled, and SMOTE-sampled data sets in SAS® Enterprise Guide®. Then we built decision tree, gradient boosting, and rule induction models using four different data sets (non-sampled, majority under-sampled, minority over-sampled with majority under-sampled, and minority SMOTE sampled with majority under-sampled) in SAS® Enterprise Miner™. Finally, based on the receiver operating characteristic (ROC) index, Kolmogorov-Smirnov statistics, and the misclassification rate, we found that the models built using minority SMOTE sampled with majority under-sampled data yield better output for this data set.
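The three sampling ideas described above can be sketched in plain Python. This is a minimal illustration, not the paper's PROC SURVEYSELECT / PROC MODECLUS workflow: the function names are hypothetical, observations are assumed to be numeric tuples, and the SMOTE step is the basic form (interpolating between a minority point and one of its k nearest minority neighbors by Euclidean distance).

```python
import random


def oversample(minority, target, rng):
    """Randomly duplicate minority rows until there are `target` of them."""
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return minority + extra


def undersample(majority, target, rng):
    """Randomly keep `target` majority rows (without replacement)."""
    return rng.sample(majority, target)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points between minority rows and their neighbors."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen point, excluding itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical two-feature data; a fixed seed keeps the sketch repeatable.
rng = random.Random(0)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(float(i), float(i)) for i in range(5, 25)]

balanced_minority = minority + smote(minority, 7, 2, rng)
balanced_majority = undersample(majority, 10, rng)
```

Because each synthetic point lies on a segment between two existing minority observations, the new rows add variety without exact duplication, which is the property the abstract contrasts with plain over-sampling.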
Read the paper (PDF)
Rhupesh Damodaran Ganesh Kumar, Oklahoma State University (SAS and OSU data mining Certificate)
Kiren Raj Mohan Jagan Mohan, Zions Bancorporation
Session 11779-2016:
Predicting Response Time for the First Reply after a Question Is Posted in the SAS® Community Forum
Many inquisitive minds are filled with excitement and anticipation of response every time one posts a question on a forum. This paper explores the factors that impact the response time of the first response for questions posted in the SAS® Community forum. The factors are contributors' availability, nature of topic, and number of contributors knowledgeable about that particular topic. The results from this project help SAS® users receive an estimated response time, and the SAS Community forum can use this information to answer several business questions such as the following: What time of the year is likely to have an overflow of questions? Do specific topics receive delayed responses? On which days of the week is the community most active? To answer such questions, we built a web crawler using Python and Selenium to fetch data from the SAS Community forum, one of the largest analytics groups. We scraped over 13,443 queries and solutions starting from January 2014 to present. We also captured several query-related attributes such as the number of replies, likes, views, bookmarks, and the number of people conversing on the query. Using different tools, we analyzed this data set after clustering the queries into 22 subtopics and found interesting patterns that can help the SAS Community forum in several ways, as presented in this paper.
View the e-poster or slides (PDF)
Praveen Kumar Kotekal, Oklahoma State University
Session 11671-2016:
Predicting the Influence of Demographics on Domestic Violence Using SAS® Enterprise Guide® 6.1 and SAS® Enterprise Miner™ 12.3
The Oklahoma State Department of Health (OSDH) conducts home visiting programs with families that need parental support. Domestic violence is one of the many screenings performed on these visits. The home visiting personnel are trained to do initial screenings; however, they do not have the extensive information required to treat or serve the participants in this arena. Understanding how demographics such as age, level of education, and household income among others, are related to domestic violence might help home visiting personnel better serve their clients by modifying their questions based on these demographics. The objective of this study is to better understand the demographic characteristics of those in the home visiting programs who are identified with domestic violence. We also developed predictive models such as logistic regression and decision trees based on understanding the influence of demographics on domestic violence. The study population consists of all the women who participated in the Children First Program of the OSDH from 2012 to 2014. The data set contains 1,750 observations collected during screening by the home visiting personnel over the two-year period. In addition, they must have completed the Demographic form as well as the Relationship Assessment form at the time of intake. Univariate and multivariate analysis has been performed to discover the influence that age, education, and household income have on domestic violence. From the initial analysis, we can see that women who are younger than 25 years old, who haven't completed high school, and who are somewhat dependent on their husbands or partners for money are most vulnerable. We have even segmented the clients based on the likelihood of domestic violence.
View the e-poster or slides (PDF)
Soumil Mukherjee, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Miriam McGaugh, Oklahoma State Department of Health
R
Session 10701-2016:
Running Projects for the Average Joe
This paper explores some proven methods used to automate complex SAS® Enterprise Guide® projects so that the average Joe can run them with little or no prior experience. There are often times when a programmer is requested to extract data and dump it into Microsoft Excel for a user. Often these data extracts are very similar and can be run with previously saved code. However, the user quite often has to wait for the programmer to have the time to simply run the code. By automating the code, the programmer regains control over their data requests. This paper discusses the benefits of establishing macro variables and creating stored procedures, among other tips.
Read the paper (PDF) | Watch the recording
Jennifer Davies, Department of Education
S
Session 11480-2016:
Solving a Business Problem in SAS® Enterprise Guide®: Creating a "Layered" Inpatient Indicator Model
This paper describes a Kaiser Permanente Northwest business problem regarding tracking recent inpatient hospital utilization at external hospitals, and how it was solved with the flexibility of SAS® Enterprise Guide®. The Inpatient Indicator is an estimate of our regional inpatient hospital utilization as of yesterday. It tells us which of our members are in which hospitals. It measures inpatient admissions, which are health care interactions where a patient is admitted to a hospital for bed occupancy to receive hospital services. The Inpatient Indicator is used to produce data and create metrics and analysis essential to the decision making of Kaiser Permanente executives, care coordinators, patient navigators, utilization management physicians, and operations managers. Accurate, recent hospital inpatient information is vital for decisions regarding patient care, staffing, and member utilization. Due to a business policy change, Kaiser Permanente Northwest lost the ability to track urgent and emergent inpatient admits at external, non-plan hospitals through our referral system, which was our data source for all recent external inpatient admits. Without this information, we did not have complete knowledge of whether a member had an inpatient stay at an external hospital until a claim was received, which could be several weeks after the member was admitted. Other sources were needed to understand our inpatient utilization at external hospitals. A tool was needed with the flexibility to easily combine and compare multiple data sets with different field names, formats, and values representing the same metric. The tool needed to be able to import data from different sources and export data to different destinations. We also needed a tool that would allow this project to be scheduled. We chose to build the model with SAS Enterprise Guide.
View the e-poster or slides (PDF)
Thomas Gant, Kaiser Permanente
T
Session 8280-2016:
Transforming Data to Information in Education: Stop with the Point-and-Click!!!
Educational systems at the district, state, and national levels all report possessing amazing student-level longitudinal data systems (LDS). Are the LDS systems improving educational outcomes for students? Are they guiding development of effective instructional practices? Are the standardized exams measuring student knowledge relative to the learning expectations? Many questions exist about the effective use of the LDS system and educational data, but data architecture and analytics (including the products developed by SAS®) are not designed to answer any of these questions. However, the ability to develop more effective educational interfaces, improve use of data to the classroom level, and improve student outcomes, might only be available through use of SAS. The purpose of this session and paper is to demonstrate an integrated use of SAS tools to guide the transformation of data to analytics that improve educational outcomes for all students.
Read the paper (PDF)
Sean Mulvenon, University of Arkansas
U
Session 5581-2016:
Using PROC TABULATE and LAG(n) Function for Rates of Change
For SAS® users, PROC TABULATE and PROC REPORT (and its compute blocks) are probably among the most common procedures for calculating and displaying data. It is, however, pretty difficult to calculate and display changes from one column to another using data from other rows with just these two procedures. Compute blocks in PROC REPORT can calculate additional columns, but it would be challenging to pick up values from other rows as inputs. This presentation shows how PROC TABULATE can work with the lag(n) function to calculate rates of change from one period of time to another. This offers the flexibility of feeding into calculations the data retrieved from other rows of the report. PROC REPORT is then used to produce the desired output. The same approach can also be used in a variety of scenarios to produce customized reports.
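The idea of feeding a value from an earlier row into a rate-of-change calculation can be sketched as follows. This is a plain-Python illustration of the arithmetic only, not the paper's PROC TABULATE / PROC REPORT code; the helper `lag` here is a simple list shift written for the example, and the enrollment figures are hypothetical.

```python
def lag(values, n=1):
    """Shift values down by n positions, padding the start with None."""
    return ([None] * n + list(values))[:len(values)]


def rate_of_change(values, n=1):
    """(current - value n periods earlier) / value n periods earlier."""
    out = []
    for current, previous in zip(values, lag(values, n)):
        if previous in (None, 0):
            out.append(None)  # no earlier period, or division by zero
        else:
            out.append((current - previous) / previous)
    return out


# Hypothetical counts for three consecutive periods.
enrollment = [200, 220, 209]
changes = rate_of_change(enrollment)
```

Passing a larger `n` compares each period with the one `n` rows earlier, for example `n=4` for a same-quarter, prior-year comparison on quarterly data.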
Read the paper (PDF) | Download the data file (ZIP) | Watch the recording
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University