Getting Started Papers A-Z

A
Session 0294-2017:
A Critique of Implementing the Study Data Tabulation Model (SDTM) for Drugs and Medical Devices
The Clinical Data Interchange Standards Consortium (CDISC) encompasses a variety of standards for medical research. Amongst the several standards developed by the CDISC organization are standards for data collection (Clinical Data Acquisition Standards Harmonization CDASH), data submission (Study Data Tabulation Model SDTM), and data analysis (Analysis Data Model ADaM). These standards were originally developed with drug development in mind. Therapeutic Area User Guides (TAUGs) have been a recent focus to provide advice, examples, and explanations for collecting and submitting data for a specific disease. Non-subjects even have a way to collect data using the Associated Persons Implementation Guide (APIG). SDTM domains for medical devices were published in 2012. Interestingly, the use of device domains in the TAUGs occurs in 14 out of 18 TAUGs, providing examples of the use of various device domains. Drug-device studies also provide a contrast on adoption of CDISC standards for drug submissions versus device submissions. Adoption of SDTM in general and the seven device SDTM domains by the medical device industry has been slow. Reasons for the slow adoption are discussed in this paper.
Read the paper (PDF)
Carey Smoak, DataCeutics
Session SAS0729-2017:
A Man with One Watch Always Knows What Time It Is
A man with one watch always knows what time it is...but a man with two watches is never sure. Contrary to this adage, load forecasters at electric utilities would gladly wear an armful of watches. With only one model to choose from, it is certain that some forecasts will be wrong. But with multiple models, forecasters can have confidence about periods when the forecasts agree and can focus their attention on periods when the predictions diverge. Having a second opinion is preferred, and that's one of the six classic rules for forecasters as per Dr. Tao Hong of the University of North Carolina at Charlotte. Dr. Hong is the premiere thought leader and practitioner in the field of energy forecasting. This presentation discusses Dr. Hong's six rules, how they relate to the increasingly complex problem of forecasting electricity consumption, and the role that predictive analytics plays.
Read the paper (PDF)
Tim Fairchild, SAS
Session 0154-2017:
A Novel Approach to Calculating Medicare Hospital 30-Day Readmissions for the SAS® Novice
The hospital Medicare readmission rate has become a key indicator for measuring the quality of health care in the US. This rate is currently used by major health-care stakeholders including the Centers for Medicare and Medicaid Services (CMS), the Agency for Healthcare Research and Quality (AHRQ), and the National Committee for Quality Assurance (NCQA) (Fan and Sarfarazi, 2014). Although many papers have been written about how to calculate readmissions, this paper provides updated code that includes ICD-10 (International Classification of Diseases) code and offers a novel and comprehensive approach using SAS® DATA step options and PROC SQL. We discuss: 1) De-identifying patient data 2) Calculating sequential admissions 3) Subsetting criteria required to report for CMS 30-day readmissions. In addition, this papers demonstrates: 1) Using the output delivery system (ODS) to create a labeled and de-identified data set 2) Macro variables to examine data quality 3) Summary statistics for further reporting and analysis.
Read the paper (PDF) | View the e-poster or slides (PDF)
Karen Wallace, Centene Corporation
Session 0831-2017:
A Practical Guide to Healthcare Data: Tips, Traps, and Techniques
Healthcare is weird. Healthcare data is even more so. The digitization of healthcare data that describes the patient experience is a modern phenomenon, with most healthcare organizations still in their infancy. While the business of healthcare is already a century old, most organizations have focused their efforts on the financial aspects of healthcare and not on stakeholder experience or clinical outcomes. Think of the workflow that you might have experienced such as scheduling an appointment through doctor visits, obtaining lab tests, or obtaining prescriptions for interventions such as surgery or physical therapy. The modern healthcare system creates a digital footprint of administrative, process, quality, epidemiological, financial, clinical, and outcome measures, which range in size, cleanliness, and usefulness. Whether you are new to healthcare data or are looking to advance your knowledge of healthcare data and the techniques used to analyze it, this paper serves as a practical guide to understanding and using healthcare data. We explore common methods for how we structure and access data, discuss common challenges such as aggregating data into episodes of care, describe reverse engineering real world events, and talk about dealing with the myriad of unstructured data found in nursing notes. Finally, we discuss the ethical uses of healthcare data and the limits of informed consent, which are critically important for those of us in analytics.
Read the paper (PDF)
Greg Nelson, Thotwave Technologies, LLC.
Session SAS0245-2017:
Accessing DBMS with the GROOVY Procedure and a JDBC Connection
SAS/ACCESS® software grants access to data in third-party database management systems (DBMS), but how do you access data in DBMS not supported by SAS/ACCESS products? The introduction of the GROOVY procedure in SAS® 9.3 lets you retrieve this formerly inaccessible data through a JDBC connection. Groovy is an object-oriented, dynamic programming language executed on the Java Virtual Machine (JVM). Using Microsoft Azure HDInsight as an example, this paper demonstrates how to access and read data into a SAS data set using PROC GROOVY and a JDBC connection.
Read the paper (PDF)
Lilyanne Zhang, SAS
Session SAS0524-2017:
Adding a Workflow to Your Analytics with SAS® Visual Investigator
Monitoring server events to proactively identify future outages. Looking at financial transactions to check for money laundering. Analyzing insurance claims to detect fraud. These are all examples of the many applications that can use the power of SAS® analytics to identify threats to a business. Using SAS® Visual Investigator, users can now add a workflow to control how these threats are managed. Using the administrative tools provided, users can visually design the workflow that the threat would be routed through. In this way, the administrator can control the tasks within the workflow, as well as which users or groups those tasks are assigned to. This presentation walks through an example of using the administrative tools of SAS Visual Investigator to create a ticketing system in response to threats to a business. It shows how SAS Visual Investigator can easily be adapted to meet the changing nature of the threats the business faces.
Read the paper (PDF)
Gordon Robinson, SAS
Ryan Schmiedl, SAS
Session 1165-2017:
Advanced, Dynamic, and Effective Dashboarding with SAS® Visual Analytics
SAS® Visual Analytics provides a robust platform to perform business intelligence through a high-end and advanced dashboarding style. In today's technology era, dashboards not only help in gaining insight into an organization's operations, but they also are a key performance indicator. In this paper, I discuss five important and frequently used objects in SAS Visual Analytics. These objects are used to get the most out of dashboards in an effective and efficient way. This paper covers the use of dates (as a format) in the date slider and gauges, cascading filters, custom graphs, linking reports within sections of the same report or with other reports, and associating buttons with graphs for dynamic functionality.
Read the paper (PDF)
Abhilasha Tiwari, Accenture
Session 0988-2017:
An Analysis of the Coding Movement: How Code.org and Educators Are Bringing Coding to Every Student
Technology plays an integral role in every aspect of daily life. As a result, educators should leverage technology-based learning to ensure that students are provided with authentic, engaging, and meaningful learning experiences (Pringle, Dawson, and Ritzhaupt, 2015).The significance and value of computer science understanding continue to increase. A major resource that can be credited with spreading support for computer science is the site Code.org. Its mission is to enable every student in every school to have the opportunity to learn computer science (https://code.org/about). Two years ago, our mentor partnered with Code.org to conduct workshops within the Charlotte, NC area to educate teachers on how to teach computer science activities and concepts in their classrooms. We had the opportunity to assist during the workshops to provide student perspectives and opinions. As we look back on the workshops, we wondered, How are the teachers who attended the workshops implementing the concepts they were taught? After each workshop, a survey was distributed to the attendees to receive workshop feedback and to follow up. We collected the data from the surveys sent to participants and analyzed it using SAS® University Edition. The results of the survey concluded that the workshops were beneficial and that the educators had implemented a concept that they learned. We believe that computer science activity implementations will assist students across the curriculum.
View the e-poster or slides (PDF)
Lauren Cook, University of North Carolina at Charlotte
Talazia Moore, North Carolina State University
Session 1161-2017:
An Analysis of the Repetitiveness of Lyrics in Predicting a Song's Popularity
To determine whether there is a correlation between the repetitiveness of a song s lyrics and its popularity, the top 10 songs from the Billboard Hot 100 songs chart from 2006 to 2015 were collected. Song lyrics were assessed to determine the count of the top 10 words used. Word counts were used to predict the number of weeks the song was on the chart. The prediction model was analyzed to determine the quality of the model and whether word count was a significant predictor of a song s popularity. To investigate whether song lyrics are becoming more simplistic over time, several tests were performed to see whether the average word count has been changing over the years. All analysis was completed in SAS® using various procedures.
View the e-poster or slides (PDF)
Drew Doyle, University of Central Florida
Session 0773-2017:
An Easy-to-Use SAS® Macro for a Descriptive Statistics Table
Are you tired of copying PROC FREQ or PROC MEANS output and pasting it into your tables? Do you need to produce summary tables repeatedly? Are you spending a lot of your time generating the same summary tables for different subpopulations? This paper introduces an easy-to-use macro to generate a descriptive statistics table. The table reports counts and percentages for categorical variables, and means, standard deviations, medians, and quantiles for continuous variables. For variables with missing values, the table also includes the count and percentage missing. Customization options allow for the analysis of stratified data, specification of variable output order, and user-defined formats. In addition, this macro incorporates the SAS® Output Delivery System (ODS) to automatically produce a Rich Text Format (RTF) file, which can be further edited by a word processor for the purpose of publication.
Read the paper (PDF) | View the e-poster or slides (PDF)
Yuanchao Zheng, Stanford University
Jin Long, Stanford University
Maria Montez-Rath, Stanford University
Session 0853-2017:
An Information Technology Perspective of SAS® at The University of Memphis
The University of Memphis has been a SAS® customer since the mainframe computing era. Our deployments have included various SAS products involving web-based applications, client/server implementations, desktop installations, and virtualized services. This paper uses an information technology (IT) perspective to discuss how the University has leveraged SAS, as well as the latest benefits and challenges for our most recent deployment involving SAS® Visual Analytics.
Read the paper (PDF)
Robert Jackson, University of Memphis
Session SAS0409-2017:
An Insider's Guide to Fine-Tuning Your CREATE TABLE Statements Using SAS® Options
The SAS® code looks perfect. You submit it and to your amazement, there is a problem with the CREATE TABLE statement. You need to change the table definition, ever so slightly, but how? Explicit pass-through? That's not an option. Fortunately, there are a handful of SAS options that can save the day. This presentation covers everything you need to know in order to adjust the SAS CREATE TABLE statements using SAS options. This presentation covers the following SAS options: DBCREATE_TABLE_OPTS=, POST_STMT_OPTS=, POST_TABLE_OPTS=, PRE_STMT_OPTS=, and PRE_TABLE_OPTS=. We use Hadoop and Oracle examples to show why these options can make your life easier. From there, we use real code to show you how to use them.
Read the paper (PDF)
Jeff Bailey, SAS
Session SAS0282-2017:
Applying Text Analytics and Machine Learning to Assess Consumer Financial Complaints
The Consumer Financial Protection Bureau (CFPB) collects tens of thousands of complaints against companies each year, many of which result in the companies in question taking action, including making payouts to the individuals who filed the complaints. Given the volume of the complaints, how can an overseeing organization quantitatively assess the data for various trends, including the areas of greatest concern for consumers? In this presentation, we propose a repeatable model of text analytics techniques to the publicly available CFPB data. Specifically, we use SAS® Contextual Analysis to explore sentiment, and machine learning techniques to model the natural language available in each free-form complaint against a disposition code for the complaint, primarily focusing on whether a company paid out money. This process generates a taxonomy in an automated manner. We also explore methods to structure and visualize the results, showcasing how areas of concern are made available to analysts using SAS® Visual Analytics and SAS® Visual Statistics. Finally, we discuss the applications of this methodology for overseeing government agencies and financial institutions alike.
Read the paper (PDF)
Tom Sabo, SAS
Session 0836-2017:
Automate Validation of CDISC SDTM with SAS®
There are many good validation tools for Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) such as Pinnacle 21. However, the power and customizability of SAS® provide an effective tool for validating SDTM data sets used in clinical trials FDA submissions. This paper presents three distinct methods of using SAS to validate the transformation from Electronic Data Capture (EDC) data into CDISC SDTM format. This includes: duplicate programming, an independent SAS program used to transform EDC data with PROC COMPARE; rules checker, a SAS program to verify a specific SDTM or regulatory rules applied to SDTM SAS data sets; and transformation validation, a SAS macro used to compare EDC data and SDTM using PROC FREQ to identify outliers. The three examples illustrate the diverse approaches to applying SAS programs to catch errors in data standard compliance or identify inconsistencies that would otherwise be missed by other general purpose utilities. The stakes are high when preparing for an FDA submission. Catching errors in SDTM during validation prior to a submission can mean the difference between success or failure for a drug or medical device.
Read the paper (PDF)
Sy Truong, Pharmacyclics
Session 1104-2017:
Automatically Create Diagrams Showing the Structure and Performance of Your SAS® Code
I have come up with a way to use the output of the SCAPROC procedure to produce DOT directives, which are then put through the Graphviz engine to produce a diagram. This allows the production of flowcharts of SAS® code automatically. I have enhanced the charts to also show the longest steps by run time, so even if you look at thousands of steps in a complex program, you can easily see the structure and flow of it, and where most of the time is spent, just by having a look for a few seconds. Great for documentation, benchmarking, tuning, understanding, and more.
Read the paper (PDF)
Philip Mason, Wood Street Consultants Ltd.
B
Session 0175-2017:
Best-Practice Programming Techniques Using SAS® Software
It's essential that SAS® users enhance their skills to implement best-practice programming techniques when using Base SAS® software. This presentation illustrates core concepts with examples to ensure that code is readable, clearly written, understandable, structured, portable, and maintainable. Attendees learn how to apply good programming techniques including implementing naming conventions for data sets, variables, programs, and libraries; code appearance and structure using modular design, logic scenarios, controlled loops, subroutines and embedded control flow; code compatibility and portability across applications and operating platforms; developing readable code and program documentation; applying statements, options, and definitions to achieve the greatest advantage in the program environment; and implementing program generality into code to enable its continued operation with little or no modifications.
Read the paper (PDF)
Kirk Paul Lafler, Software Intelligence Corporation
Session SAS0761-2017:
Breaking through the Barriers: Innovative Sampling Techniques for Unstructured Data Analysis
As in any analytical process, data sampling is a definitive step for unstructured data analysis. Sampling is of paramount importance if your data is fed from social media reservoirs such as Twitter, Facebook, Amazon, and Reddit, where information proliferation happens minute by minute. Unless you have a sophisticated analytical engine and robust physical servers to handle billions of pieces of data, you can't use all your data for analysis without sampling. So, how do you sample textual data? The standard method is to generate either a simple random sample, or a stratified random sample if a stratification variable exists in the data. Neither of these two methods can reliably produce a representative sample of documents from the population data simply because the process does not encompass a step to ensure that the distribution of terms between the population and sample sets remains similar. This shortcoming can cause the supervised or unsupervised learning to yield inaccurate results. If the generated sample is not representative of the population data, it is difficult to train and validate categories or sub-categories for those rare events during taxonomy development. In this paper, we show you new methods for sampling text data. We rely on a term-by-document matrix and SAS® macros to generate representative samples. Using these methods helps generate sufficient samples to train and validate every category in rule-based modeling approaches using SAS® Contextual Analysis.
Read the paper (PDF)
Murali Pagolu, SAS
Session 0865-2017:
Building an Analytics Culture at a 114-year-old Regulated Electric Utility
Coming off a recent smart grid implementation, OGE Energy Corp. was collecting more data than at any time in its history. This data held the potential to help the organization uncover new insights and chart new paths. Find out how OGE Energy is building a culture of data analytics by using SAS® tools, a distributed analytics model, and an analytics center of excellence.
Clayton Bellamy, OGE Energy Corp
C
Session SAS0454-2017:
Change Management: Best Practices for Implementing SAS® Prescriptive Analytics
When new technologies, workflows, or processes are implemented, an organization and its employees must embrace changes in order to ensure long-term success. This paper provides guidelines and best practices in change management that the SAS Advanced Analytics Division uses with customers when it implements prescriptive analytics solutions (provided by SAS/OR® software). Highlights include engaging technical leaders in defining project scope and providing functional design documents. The paper also highlights SAS' approach in engaging business leaders on business scope, garnering executive-level project involvement, establishing steering committees, defining use cases, developing an effective communication strategy, training, and implementing of SAS/OR solutions.
Read the paper (PDF)
Scott Shuler, SAS
Session SAS0436-2017:
Choosing the Best Fit for Your Client/Server Architecture: SAS® Studio versus SAS® Enterprise Guide®
SAS® is often deployed in a client/server architecture in which SAS® Foundation is installed on a server and is accessed from each user's workstation. Many system administrators prefer that users not log on directly to the server to run SAS, nor do they want to set up a complex Citrix environment. SAS client applications are an attractive alternative for this type of architecture. But with the advent of multiple SAS® Studio editions and ongoing enhancements to SAS® Enterprise Guide®, choosing the most suitable client application presents a challenge for many system administrators. To help guide you in this choice, this paper compares the administration of three SAS Foundation client applications that can be used in a client/server architecture: SAS Enterprise Guide, SAS® Studio Basic, and SAS® Studio Mid-Tier. The usage differences between SAS Studio and SAS Enterprise Guide have been addressed elsewhere. In this paper, we focus on differences that pertain specifically to system administration, including deployment, maintenance, and authentication. The information presented here will help system administrators determine which application best fits the needs of their users and their environment.
Read the paper (PDF)
Shayne Muelling, SAS
John Brower, SAS
Session 1288-2017:
Creating a Departmental Standard SAS® Enterprise Guide® Template
This presentation describes an ongoing effort to standardize and simplify SAS® coding across a rapidly growing analytics team in the health care industry. The number of SAS analysts in Kaiser Permanente's Data and Information Management Enhancement (DIME) department has nearly doubled in the past two years, going from approximately 20 to 40 analysts. The level of experience and technical skill varies greatly within the department. Analysts are required to provide quick turn-around on a large volume of analytical requests in this dynamic and high-demand environment. An effort was initiated in 2016 to create a SAS® Enterprise Guide® Template to standardize and simplify SAS coding across the department. The SAS Enterprise Guide® template is designed to be a standard project file containing predefined code shells and examples that can be used as a basis for all new SAS Enterprise Guide® projects. The primary goals of the template are to: 1) Effectively onboard new analysts to department standards; 2) Increase the efficiency of SAS development; 3) Bring consistency to how SAS is used; and 4) Simplify the transitioning of SAS jobs to the department's Production Support team. This presentation focuses on the process in which the template was initiated, drafted, and socialized across a large and diverse team of SAS analysts. It also highlights plans for ongoing maintenance of and improvements to the original template.
Read the paper (PDF)
Amanda Pasch, Kaiser Permanente
Chris Koppenhafer, Kaiser Permanente
D
Session 0977-2017:
Deriving Rows in CDISC ADaM BDS Data Sets Using SAS® DATA Step Programming
The Analysis Data Model (ADaM) Basic Data Structure (BDS) can be used for many analysis needs. We all know that the SAS® DATA step is a very flexible and powerful tool for data processing. In fact, the DATA step is very useful in the creation of a non-trivial BDS data set. This paper walks through a series of examples showing use of the SAS DATA step when deriving rows in BDS. These examples include creating new parameters, new time points, and changes from multiple baselines.
Read the paper (PDF)
Sandra Minjoe
Session 0830-2017:
Developing Your Data Strategy
The ever growing volume of data challenges us to keep pace in ensuring that we use it to its full advantage. Unfortunately, often our response to new data sources, data types, and applications is somewhat reactionary. There exists a misperception that organizations have precious little time to consider a purposeful strategy without disrupting business continuity. Strategy is a phrase that is often misused and ill-defined. However, it is nothing more than a set of integrated choices that help position an initiative for future success. This presentation covers the key elements defining data strategy. The following key topics are included: What data should we keep or toss? How should we structure data (warehouse versus data lake versus real-time event streaming)? How do we store data (cloud, virtualization, federation, cloud, Hadoop)? What is the approach we use to integrate and cleanse data (ETL versus cognitive/ automated profiling)? How do we protect and share data? These topics ensure that the organization gets the most value from our data. They explore how we prioritize and adapt our strategy to meet unanticipated needs in the future. As with any strategy, we need to make sure that we have a roadmap or plan for execution, so we talk specifically about the tools, technologies, methods, and processes that are useful as we design a data strategy that is both relevant and actionable to your organization.
Read the paper (PDF)
Greg Nelson, Thotwave Technologies, LLC.
Session SAS0418-2017:
Dictionaries: Referencing a New PROC FCMP Data Type
Hash objects have been supported in the DATA step and in the FCMP procedure for a while, but have you ever felt that hash objects could do a little more? For example, what if you needed to store more than doubles and character strings? Introducing PROC FCMP dictionaries. Dictionaries allow you to create references not only to numeric and character data, but they also give you fast in-memory hashing to arrays, other dictionaries, and even PROC FCMP hash objects. This paper gets you started using PROC FCMP dictionaries, describes usage syntax, and explores new programming patterns that are now available to your PROC FCMP programs, functions, and subroutines in the new SAS® Viya platform environment.
Read the paper (PDF)
Andrew Henrick, SAS
Mike Whitcher, SAS
Karen Croft, SAS
E
Session 0809-2017:
Easing into Data Exploration, Reporting, and Analytics Using SAS® Enterprise Guide®
Whether you have been programming in SAS® for years, or you are new to it, or you have dabbled with SAS® Enterprise Guide® before, this hands-on workshop sheds some light on the depth, breadth, and power of the SAS Enterprise Guide environment. With all the demands on your time, you need powerful tools that are easy to learn and that deliver end-to-end support for your data exploration, reporting, and analytics needs. Included in this workshop are data exploration tools; formatting code (cleaning up after your coworkers); enhanced programming environment (and how to calm it down); easily creating reports and graphics; producing the output formats you need (XLS, PDF, RTF, HTML); workspace layout; and productivity tips. This workshop uses SAS Enterprise Guide 7.1, but most of the content is applicable to earlier versions.
Read the paper (PDF) | Download the data file (ZIP)
Marje Fecht
Session 1139-2017:
Enabling Advanced Customer Value Management with SAS®
Join this breakout session hosted by the Customer Value Management team from Saudi Telecommunications Company to understand the journey we took with SAS® to evolve from simple below-the-line campaign communication to advanced customer value management (CVM). Learn how the team leveraged SAS tools ranging from SAS® Enterprise Miner to SAS® Customer Intelligence Suite in order to gain a deeper understanding of customers and move toward targeting customers with the right offer at the right time through the right channel.
Read the paper (PDF)
Noorulain Malik, Saudi Telecom
Session SAS0377-2017:
Escape the Desktop with ODS EPUB
The Base SAS® 9.4 Output Delivery System (ODS) EPUB destination enables users to deliver SAS® reports as e-books on Apple mobile devices. ODS EPUB e-books are truly mobile you don't need an Internet connection to read them. Just install Apple's free iBooks app, and you're good to go. This paper shows you how to create an e-book with ODS EPUB and sideload it onto your Apple device. You will learn new SAS® 9.4 techniques for including text, images, audio, and video in your ODS EPUB e-books. You will understand how to customize your e-book's table of contents (TOC) so that readers can easily navigate the e-book. And you will learn how to modify the ODS EPUB style to create specialized presentation effects. This paper provides beginning to intermediate instruction for writing e-books with ODS EPUB. Please bring your iPad, iPhone, or iPod to the presentation so that you can download and read the examples.
Read the paper (PDF)
David Kelley, SAS
Session 0788-2017:
Examining Higher Education Performance Metrics with SAS® Enterprise Miner™ and SAS® Visual Analytics
Given the proposed budget cuts to higher education in the state of Kentucky, public universities will likely be awarded financial appropriations based on several performance metrics. The purpose of this project was to conceptualize, design, and implement predictive models that addressed two of the state's metrics: six-year graduation rate and fall-to-fall persistence for freshmen. The Western Kentucky University (WKU) Office of Institutional Research analyzed five years' worth of data on first-time, full-time bachelor's degree seeking students. Two predictive models evaluated and scored current students on their likelihood to stay enrolled and their chances of graduating on time. Following an ensemble of machine-learning assessments, the scored data were imported into SAS® Visual Analytics, where interactive reports allowed users to easily identify which students were at a high risk for attrition or at risk of not graduating on time.
Read the paper (PDF)
Taylor Blaetz, Western Kentucky University
Tuesdi Helbig, Western Kentucky University
Gina Huff, Western Kentucky University
Matt Bogard, Western Kentucky University
Session SAS0587-2017:
Exploring the Art and Science of SAS® Text Analytics: Best Practices in Developing Rule-Based Models
Traditional analytical modeling, with roots in statistical techniques, works best on structured data. Structured data enables you to impose certain standards and formats in which to store the data values. For example, a variable indicating gas mileage in miles per gallon should always be a number (for example, 25). However, with unstructured data analysis, the free-form text no longer limits you to expressing this information in only one way (25 mpg, twenty-five mpg, and 25M/G). The nuances of language, context, and subjectivity of text make it more complex to fit generalized models. Although statistical methods using supervised learning prove efficient and effective in some cases, sometimes you need a different approach. These situations are when rule-based models with Natural Language Processing capabilities can add significant value. In what context would you choose a rule-based modeling versus a statistical approach? How do you assess the tradeoffs of choosing a rule-based modeling approach with higher interpretability versus a statistical model that is black-box in nature? How can we develop rule-based models that optimize model performance without compromising accuracy? How can we design, construct, and maintain a complex rule-based model? What is a data-driven approach to rule writing? What are the common pitfalls to avoid? In this paper, we discuss all these questions based on our experiences working with SAS® Contextual Analysis and SAS® Sentiment Analysis.
Read the paper (PDF)
Murali Pagolu, SAS
Cheyanne Baird, SAS
Christina Engelhardt, SAS
F
Session 1516-2017:
Five Ways to Create Macro Variables: A Short Introduction to the Macro Language
The macro language is both powerful and flexible. With this power, however, comes complexity, and this complexity often makes the language more difficult to learn and use. Fortunately, one of the key elements of the macro language is its use of macro variables, and these are easy to learn and easy to use. You can create macro variables using a number of different techniques and statements. However, the five most commonly methods are not only the most useful, but also among the easiest to master. Since macro variables are used in so many ways within the macro language, learning how they are created can also serve as an excellent introduction to the language itself. These methods include: 1) the %LET statement; 2) macro parameters (named and positional); 3) the iterative %DO statement; 4) using the INTO clause in PROC SQL; and 5) using the CALL SYMPUTX routine.
Read the paper (PDF) | Download the data file (ZIP)
Art Carpenter, California Occidental Consultants
Session 1422-2017:
Flags Flying: Avoiding Consistently Penalized Defenses in NFL Fantasy Football
In fantasy football, it is a relatively common strategy to rotate which team's defense a player uses based on some combination of favorable/unfavorable player matchups, recent performance, and projection of expected points. However, there is danger in this strategy because defensive scoring volatility is high, and any team has the possibility of turning in a statistically bad performance in a given week. This paper uses data mining techniques to identify which National Football League (NFL) teams give up high numbers of defensive penalties on a week-to-week basis, and to what degree those high-penalty games correlate with poor team defensive fantasy scores. Examining penalty count and penalty yards allowed totals, we can narrow down which teams are consistently hurt by poor technique and find correlation between games with high penalty totals to their respective fantasy football score. By doing so, we seek to find which teams should be avoided in fantasy football due to their likelihood of poor performance.
Robert Silverman, Franklin & Marshall College
Session 0959-2017:
Formats Are Your Friends
Formats can be used for more than just making your data look nice. They can be used in memory lookup tables and to help you create data-driven code. This paper shows you how to build a format from a data set, how to write a format out as a data set, and how to use formats to make programs data driven. Examples are provided.
Read the paper (PDF)
Anita Measey, BMO Financial Group
Lei Sun, BMO Financial Group
Session 1385-2017:
From Researcher to Programmer: Five SAS® Tips I Wished I Knew Then
Having crossed the spectrum from an epidemiologist and researcher (where ad hoc is a way of life and where research is the main focus) to a SAS® programmer (writing reusable code for automation and batch jobs, which require no manual interventions), I have learned a few things that I wish I had known as a researcher. These things would not only have helped me to be a better SAS programmer, but they also would have saved me time and effort as a researcher by enabling me to have well-organized, accurate code (that I didn't accidentally remove) and code that would work when I ran it again on another date. This poster presents five SAS tips that are common practice among SAS programmers. I provide researchers who use SAS with tips that are handy and useful, and I provide code (where applicable) that they can try out at home. Using the tips provided will make any SAS programmer smile when they are presented with your code (not guaranteed, but your results should not vary by using these tips).
View the e-poster or slides (PDF)
Crystal Carel, Baylor Scott & White Health
G
Session 1529-2017:
Getting Started with Bayesian Analytics
The presentation will give a brief introduction to Bayesian Analysis within SAS. Participants will learn the difference between Bayesian and Classical Statistics and be introduced to PROC MCMC.
Danny Modlin, SAS
Session SAS0709-2017:
Getting Started with Designing and Implementing a SAS® 9.4 Metadata and File System Security Design
SAS® has been installed at your organization now what? How do you approach configuring groups, roles, folders, and permissions in your environment? This presentation is built on best practices used within the U.S. SAS® Professional Services and Delivery division and aims to equip new and seasoned SAS administrators with the knowledge and tools necessary to design and implement a SAS metadata and file system security model. We start by covering the basic building blocks of the SAS® Intelligence Platform metadata and security framework. We discuss the SAS metadata architecture, and highlight the differences between groups and roles, permissions and capabilities, access control entries and access control templates, and what content can be stored within metadata folders versus in file system folders. We review the various authorization layers in a SAS deployment that must work together to create a secure environment, including the metadata layer, the file system, and the data layer. Then, we present a 10-step best practice approach for how to design your SAS metadata security model. We provide an introduction to basic metadata security design and file system security design templates that have been used extensively by SAS Professional Services and Delivery in helping customers secure their SAS environments.
Read the paper (PDF)
Angie Hedberg, SAS
Philip Hopkins, SAS
Session 1527-2017:
Getting Started with Multilevel Modeling
In this presentation you will learn the basics of working with nested data, such as students within classes, customers within households, or patients within clinics through the use of multilevel models. Multilevel models can accommodate correlation among nested units through random intercepts and slopes, and generalize easily to 2, 3, or more levels of nesting. These models represent a statistically efficient and powerful way to test your key hypotheses while accounting for the hierarchical nesting of the design. The GLIMMIX procedure is used to demonstrate analyses in SAS.
Catherine Truxillo, SAS
Session 0818-2017:
Getting Started with SAS® Prompts
Allowing SAS® users to leverage SAS prompts when running programs is a very powerful tool. Using SAS prompts makes it easier for SAS users to submit parameter-driven programs and for developers to create robust, data-driven programs. This presentation demonstrates how to create SAS prompts from SAS® Enterprise Guide® and shows how to roll them out to users so that they can take advantage of them from SAS Enterprise Guide, the SAS® Add-In for Microsoft Office, and the SAS® Stored Process Web Application.
Read the paper (PDF)
Brian Varney, Experis
Session 1058-2017:
Going Green With Your SAS® Applications
This paper shows how you can reduce the computing footprint of your SAS® applications without compromising your end products. The paper presents the 15 axioms of going green with your SAS applications. The axioms are proven, real-world techniques for reducing the computer resources used by your SAS programs. When you follow these axioms, your programs run faster, use less network bandwidth, use fewer desktop or shared server computing resources, and create more compact SAS data sets.
Read the paper (PDF)
Michael Raithel, Westat
Session 0187-2017:
Guidelines for Protecting Your Computer, Network, and Data from Malware Threats
Because many SAS® users either work for or own companies that house big data, the threat that malicious software poses becomes even more extreme. Malicious software, often abbreviated as malware, includes many different classifications, ways of infection, and methods of attack. This E-Poster highlights the types of malware, detection strategies, and removal methods. It provides guidelines to secure essential assets and prevent future malware breaches.
Read the paper (PDF) | View the e-poster or slides (PDF)
Ryan Lafler
H
Session SAS2005-2017:
Hands-On Workshop: Exploring SAS Visual Analytics on SAS® Viya™
Nicole Ball, SAS
Session SAS2003-2017:
Hands-On Workshop: Macro Coding by Example
This hands-on-workshop explores the power of SAS Macro, a text substitution facility for extending and customizing SAS programs. Examples will range from simple macro variables to advanced macro programs. As a participant, you will add macro syntax to existing programs to dynamically enhance your programming experience.
Michele Ensor, SAS
Session SAS2008-2017:
Hands-On Workshop: Python and CAS Integration on SAS® Viya™
Jay Laramore, SAS
Session SAS2002-2017:
Hands-On Workshop: SAS® Studio for SAS Programmers
This workshop provides hands-on experience with SAS® Studio. Workshop participants will use SAS's new web-based interface to access data, write SAS programs, and generate SAS code through predefined tasks. This workshop is intended for SAS programmers from all experience levels.
Stacey Syphus, SAS
Session SAS2009-2017:
Hands-On Workshop: Statistical Analysis using SAS® University Edition
This workshop provides hands-on experience performing statistical analysis with the Statistics tasks in SAS Studio. Workshop participants will learn to perform statistical analyses using tasks, evaluate which tasks are ideal for different kinds of analyses, edit the generated code, and customize a task.
Danny Modlin, SAS
I
Session SAS0668-2017:
I Am Multilingual: A Comparison of the Python, Java, Lua, and REST Interfaces to SAS® Viya™
The openness of SAS® Viya , the new cloud analytic platform that uses SAS® Cloud Analytic Services (CAS), emphasizes a unified experience for data scientists. You can now execute the analytics capabilities of SAS® in different programming languages including Python, Java, and Lua, as well as use a RESTful endpoint to execute CAS actions directly. This paper provides an introduction to these programming languages. For each language, we illustrate how the API is surfaced from the CAS server, the types of data that you can upload to a CAS server, and the result tables that are returned. This paper also provides a comprehensive comparison of using these programming languages to build a common analytical process, including loading data to a CAS server; exploring, manipulating, and visualizing data; and building statistical and machine learning models.
Read the paper (PDF)
Xiangxiang Meng, SAS
Kevin Smith, SAS
Session 1318-2017:
Import and Export XML Documents with SAS®
XML documents are becoming increasingly popular for transporting data from different operating systems. In the pharmaceutical industry, the Food and Drug Administration (FDA) requires pharmaceutical companies to submit certain types of data in XML format. This paper provides insights into XML documents and summarizes different methods of importing and exporting XML documents with SAS®, including: using the XML LIBNAME engine to translate between the XML markup and the SAS format; creating an XML Map and using the XML92 LIBNAME engine to read in XML documents and create SAS data sets; and using Clinical Data Interchange Standards Consortium (CDISC) procedures to import and export XML documents. An example of importing OpenClinica data into SAS by implementing these methods is provided.
Read the paper (PDF)
Fei Wang, McDougall Scientific
Session SAS0562-2017:
Increasing Your Productivity with New Features in SAS® Enterprise Guide®
SAS® Enterprise Guide® continues to add easy-to-use features that enable you to work more efficiently. For example, you can now debug your DATA step code with a DATA step debugger tool; upload data to SAS® Viya with a point-and-click task; control process flow execution behavior when an error occurs; export results to Microsoft Excel and Microsoft PowerPoint destinations with the click of a button; zoom views; filter the data grid with your own WHERE clause; easily define case-insensitive filters; and automatically get the latest product updates. Come see these and more new features and enhancements in SAS Enterprise Guide 7.11, 7.12, and 7.13.
Read the paper (PDF)
Casey Smith, SAS
Session 0971-2017:
Instant Formats in a Blink with PROC FORMAT CNTLIN=
Do you need to create a format instantly? Does the format have a lot of labels, and it would take a long time to type in all the codes and labels by hand? Sometimes, a SAS® programmer needs to create a user-defined format for hundreds or thousands of codes, and he needs an easy way to accomplish this without having to type in all of the codes. SAS provides a way to create a user-defined format without having to type in any codes. If the codes and labels are in a text file, SAS data set, Excel file, or in any file that can be converted to a SAS data set, then a SAS user-defined format can be created on the fly. The CNTLIN=option of PROC FORMAT allows a user to create a user-defined format or informat from raw data or from a SAS file. This paper demonstrates how to create two user-defined formats instantly from a raw text file on our Census Bureau website. It explains how to use these user-defined formats for the final report and final output data set from PROC TABULATE. The paper focuses on the CNTLIN= option of PROC FORMAT, not the CNTLOUT= option.
Read the paper (PDF)
Christopher Boniface, U.S. Census Bureau
Session 1511-2017:
Intermediate SAS® ODS Graphics
This paper will build on the knowledge gained in the Intro to SAS® ODS Graphics. The capabilities in ODS Graphics grow with every release as both new paradigms and smaller tweaks are introduced. After talking with the ODS developers, a selection of the many wonderful capabilities was selected. This paper will look at that selection of both types of capabilities and provide the reader with more tools for their belt. Visualization of data is an important part of telling the story seen in the data. And while the standards and defaults in ODS Graphics are very well done, sometimes the user has specific nuances for characters in the story or additional plot lines they want to incorporate. Almost any possibility, from drama to comedy to mystery, is available in ODS Graphics if you know how. We will explore tables, annotation and changing attributes, as well as the BLOCK plot. Any user of Base SAS on any platform will find great value from the SAS ODS Graphics procedures. Some experience with these procedures is assumed, but not required.
Read the paper (PDF) | Download the data file (ZIP)
Chuck Kincaid, Experis Business Analytics
Session SAS0331-2017:
Introduction to SAS® Data Connectors and SAS® Data Connect Accelerators on SAS® Viya™
For many years now you have learned the ins and outs of using SAS/ACCESS® software to move data into SAS® to do your analytics. With the new open, cloud-ready SAS® Viya platform comes a new set of data access technologies known as SAS data connectors and SAS data connect accelerators. This paper describes what these new data access products are and how they integrate with the SAS Viya platform. After reading this paper, you will have the foundation needed to load data from third-party data sources into SAS Viya.
Read the paper (PDF)
Salman Maher, SAS
Chris DeHart, SAS
Barbara Kemper, SAS
Session SAS0722-2017:
Investigating Big-Data Crime Scenes
Statistical analysis is like detective work, and a data set is like the crime scene. The data set contains unorganized clues and patterns that can, with proper analysis, ultimately lead to meaningful conclusions. Using SAS® tools, a statistical analyst (like any good crime scene investigator) performs a preliminary analysis of the data set through visualization and descriptive statistics. Based on the preliminary analysis, followed by a detailed analysis, both the crime scene investigator (CSI) and the statistical analyst (SA) can use scientific or analytical tools to answer the key questions: What happened? What were the causes and effects? Why did this happen? Will it happen again? Applying the CSI analogy, this paper presents an example case study using a two-step process to investigate a big-data crime scene. Part I shows the general procedures that are used to identify clues and patterns and to obtain preliminary insights from those clues. Part II narrows the focus on the specific statistical analyses that provide answers to different questions.
Read the paper (PDF)
Theresa Ngo, SAS
Session SAS0495-2017:
Investigating Connections between Disparate Data Sources with SAS® Visual Investigator
In 1993, Erin Brockovich, a legal clerk to Edward L. Masry, began a lengthy manual investigation after discovering a link between elevated clusters of cancer cases in Hinkley, CA, and contaminated water in the same area due to the disposal of chemicals from a utility company. In this session, we combine disparate data sources - cancer cases and chemical spillages - to identify connections between the two data sets using SAS® Visual Investigator. Using the map and network functionalities, we visualize the contaminated areas and their link to cancer clusters. What took Erin Brockovich months and months to investigate, we can do in minutes with SAS Visual Investigator.
Read the paper (PDF)
Gordon Robinson, SAS
K
Session 1002-2017:
Know Thyself: Diabetes Trend Analysis
Throughout history, the phrase know thyself has been the aspiration of many. The trend of wearable technologies has certainly provided the opportunity to collect personal data. These technologies enable individuals to know thyself on a more sophisticated level. Specifically, wearable technologies that can track a patient's medical profile in a web-based environment, such as continuous blood glucose monitors, are saving lives. The main goal for diabetics is to replicate the functions of the pancreas in a manner that allows them to live a normal, functioning lifestyle. Many diabetics have access to a visual analytics website to track their blood glucose readings. However, they often are unreadable and overloaded with information. Analyzing these readings from the glucose monitor and insulin pump with SAS®, diabetics can parse their own information into more simplified and readable graphs. This presentation demonstrates the ease in creating these visualizations. Not only is this beneficial for diabetics, but also for the doctors that prescribe the necessary basal and bolus levels of insulin for a patient s insulin pump.
View the e-poster or slides (PDF)
Taylor Larkin, The University of Alabama
Denise McManus, The University of Alabama
M
Session SAS1008-2017:
Merging Marketing and Merchandising in Retail to Drive Profitable, Customer-Centric Assortments
As a retailer, have you ever found yourself reviewing your last season's assortment and wondering, What should I have carried in my assortment ? You are constantly faced with the challenge of product selection, placement, and ensuring your assortment will drive profitable sales. With millions of consumers, thousands of products, and hundreds of locations, this question can often times be challenging and overwhelming. With the rise in omnichannel, traditional approaches just won't cut it to gain the insights needed to maximize and manage localized assortments as well as increase customer satisfaction. This presentation explores applications of analytics within marketing and merchandising to drive assortment curation as well as relevancy for customers. The use of analytics can not only increase efficiencies but can also give insights into what you should be buying, how best to create a profitable assortment, and how to engage with customers in-season to drive their path to purchase. Leveraging an analytical infrastructure to infuse analytics into the assortment management process can help retailers achieve customer-centric insights, in a way that is easy to understand, so that retailers can quickly take insights to actions and gain the competitive edge.
Read the paper (PDF)
Brittany Bullard, SAS
Session 1231-2017:
Modeling Machiavelianism: Predicting Scores with Fewer Factors
Prince Niccolo Machiavelli said things on the order of, The promise given was a necessity of the past: the word broken is a necessity of the present. His utilitarian philosophy can be summed up by the phrase, The ends justify the means. As a personality trait, Machiavelianism is characterized by the drive to pursue one's own goals at the cost of others. In 1970, Richard Christie and Florence L. Geis created the MACH-IV test to assign a MACH score to an individual, using 20 Likert-scaled questions. The purpose of this study was to build a regression model that can be used to predict the MACH score of an individual using fewer factors. Such a model could be useful in screening processes where personality is considered, such as in job screening, offender profiling, or online dating. The research was conducted on a data set from an online personality test similar to the MACH-IV test. It was hypothesized that a statistically significant model exists that can predict an average MACH score for individuals with similar factors. This hypothesis was accepted.
View the e-poster or slides (PDF)
Patrick Schambach, Kennesaw State University
N
Session SAS0517-2017:
Nine Best Practices for Big Data Dashboards Using SAS® Visual Analytics
Creating your first suite of reports using SAS® Visual Analytics is like being a kid in a candy store with so many options for data visualization, it is difficult to know where to start. Having a plan for implementation can save you a lot of time in development and beyond, especially when you are wrangling big data. This paper helps you make sure that you are parallelizing work (where possible), maximizing your data insights, and creating a polished end product. We provide guidelines to common questions, such as How many objects are too many ? or When should I use multiple tabs versus report linking? to start any data visualizer off on the right foot.
Read the paper (PDF)
Elena Snavely, SAS
O
Session 0973-2017:
ODS TAGSETS.EXCELXP and ODS EXCEL Showdown
Do you create Excel files from SAS®? Do you use the ODS EXCELXP tagset or the ODS EXCEL destination? In this presentation, the EXCELXP tagset and the ODS EXCEL destination are compared face to face. There's gonna be a showdown! We give quick tips for each and show how to create Excel files for our Special Census program. Pros of each method are explored. We show the added benefits of the ODS EXCEL destination. We display how to create XML files with the EXCELXP tagset. We present how to use TAGATTR formats with the EXCELXP tagset to ensure that leading and trailing zeros in Excel are preserved. We demonstrate how to create the same Excel file with the ODS EXCEL destination with SAS formats instead of with TAGATTR formats. We show how the ODS EXCEL destination creates native Excel files. One of the drawbacks of an XML file created with the EXCELXP tagset is that a pop-up message is displayed in Excel each time you open it. We present differences using the ABSOLUTE_COLUMN_WIDTH= option in both methods.
Read the paper (PDF)
Christopher Boniface, U.S. Census Bureau
Session 1377-2017:
Optimizing SAS® on Red Hat Enterprise Linux 6 and 7
Today, companies are increasingly using analytics to discover new revenue and cost-saving opportunities. Many business professionals turn to SAS, a leader in business analytics software and service, to help them improve performance and make better decisions faster. Analytics is also being used in risk management, fraud detection, life sciences, sports, and many more emerging markets. However, to maximize the value to the business, analytics solutions need to be deployed quickly and cost-effectively, while also providing the ability to readily scale without degrading performance. Of course, in today's demanding environments, where budgets are still shrinking and mandates to reduce carbon footprints are growing, the solution must deliver excellent hardware utilization, power efficiency, and return on investment. To help solve some of these challenges, Red Hat and SAS have collaborated to recommend the best practices for configuring SAS®9 running on Red Hat Enterprise Linux. The scope of this document covers Red Hat Enterprise Linux 6 and 7. Areas researched include the I/O subsystem, file system selection, and kernel tuning, both in bare metal and virtualized (KVM) environments. Additionally, we now include grid-based configurations running with Red Hat Resilient Storage Add-On (Global File System 2 [GFS2] clusters).
Read the paper (PDF)
Barry Marson, Red Hat, Inc
P
Session 1294-2017:
Pillars of a Successful SAS® Implementation with Lessons from Boston Scientific
Moving a workforce in a new direction takes a lot of energy. Your planning should include four pillars: culture, technology, process, and people. These pillars assist small and large SAS® rollouts with a successful implementation and an eye toward future proofing. Boston Scientific is a large multi-national corporation that recently grew SAS from a couple of desktops to a global implementation. Boston Scientific's real world experiences reflect on each pillar, both in what worked and in lessons learned.
Read the paper (PDF)
Brian Bell, Boston Scientific
Tricia Aanderud, Zencos
Session 0942-2017:
Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data
The most commonly reported model evaluation metric is the accuracy. This metric can be misleading when the data are imbalanced. In such cases, other evaluation metrics should be considered in addition to the accuracy. This study reviews alternative evaluation metrics for assessing the effectiveness of a model in highly imbalanced data. We used credit card clients in Taiwan as a case study. The data set contains 30,000 instances (22.12% risky and 77.88% non-risky) assessing the likeliness of a customer defaulting on a payment. Three different techniques were used during the model building process. The first technique involved down-sampling the majority class in the training subset. The second used the original imbalanced data whereas prior probabilities were set to account for oversampling in the third technique. The same sets of predictive models were then built for each technique after which the evaluation metrics were computed. The results suggest that model evaluation metrics might reveal more about distribution of classes than they do about the actual performance of models when the data are imbalanced. Moreover, some of the predictive models were identified to be very sensitive to imbalance. The final decision in model selection should consider a combination of different measures instead of relying on one measure. To minimize imbalance-biased estimates of performance, we recommend reporting both the obtained metric values and the degree of imbalance in the data.
Read the paper (PDF)
Josephine Akosa, Oklahoma State University
Q
Session 0928-2017:
Quick Results with PROC SQL
SQL is a universal language that allows you to access data stored in relational databases or tables. This hands-on workshop presents core concepts and features of using PROC SQL to access data stored in relational database tables. Attendees learn how to define, access, and manipulate data from one or more tables using PROC SQL quickly and easily. Numerous code examples are presented on how to construct simple queries, subset data, produce simple and effective output, join two tables, summarize data with summary functions, construct BY-groups, identify FIRST. and LAST. rows, and create and use virtual tables.
Read the paper (PDF) | Download the data file (ZIP)
Kirk Paul Lafler, Software Intelligence Corporation
Session 0173-2017:
Quick Results with SAS® Enterprise Guide®
SAS® Enterprise Guide® empowers organizations, programmers, business analysts, statisticians, and end users with all the capabilities that SAS has to offer. This hands-on workshop presents the SAS Enterprise Guide graphical user interface (GUI). It covers access to multi-platform enterprise data sources, various data manipulation techniques that do not require you to learn complex coding constructs, built-in wizards for performing reporting and analytical tasks, the delivery of data and results to a variety of mediums and outlets, and support for data management and documentation requirements. Attendees learn how to use the graphical user interface to access SAS® data sets and tab-delimited and Microsoft Excel input files; to subset and summarize data; to join (or merge) two tables together; to flexibly export results to HTML, PDF, and Excel; and to visually manage projects using flow diagrams.
Read the paper (PDF)
Kirk Paul Lafler, Software Intelligence Corporation
Ryan Lafler
Session 0926-2017:
Quick Results with SAS® Enterprise Guide®
SAS® Enterprise Guide® empowers organizations, programmers, business analysts, statisticians, and users with all the capabilities that SAS® has to offer. This hands-on workshop presents the SAS Enterprise Guide graphical user interface (GUI), access to multi-platform enterprise data sources, various data manipulation techniques without the need to learn complex coding constructs, built-in wizards for performing reporting and analytical tasks, the delivery of data and results to a variety of mediums and outlets, and support for data management and documentation requirements. Attendees learn how to use the GUI to access SAS data sets and tab-delimited and Excel input files; how to subset and summarize data; how to join (or merge) two tables together; how to flexibly export results to HTML, PDF, and Excel; and how to visually manage projects using flow diagrams.
Read the paper (PDF) | Download the data file (ZIP)
Kirk Paul Lafler, Software Intelligence Corporation
Ryan Lafler
Session 0998-2017:
Quick Results with SAS® University Edition
The announcement of SAS Institute's free SAS® University Edition is an exciting development for SAS users and learners around the world! The software bundle includes Base SAS®, SAS/STAT® software, SAS/IML® software, SAS® Studio (user interface), and SAS/ACCESS® for Windows, with all the popular features found in the licensed SAS versions. This is an incredible opportunity for users, statisticians, data analysts, scientists, programmers, students, and academics everywhere to use (and learn) for career opportunities and advancement. Capabilities include data manipulation, data management, comprehensive programming language, powerful analytics, high-quality graphics, world-renowned statistical analysis capabilities, and many other exciting features. This paper illustrates a variety of powerful features found in the SAS University Edition. Attendees will be shown a number of tips and techniques on how to use the SAS® Studio user interface, and they will see demonstrations of powerful data management and programming features found in this exciting software bundle.
Read the paper (PDF)
Ryan Lafler
R
Session 0188-2017:
Removing Duplicates Using SAS®
We live in a world of data; small data, big data, and data in every conceivable size between small and big. In today's world, data finds its way into our lives wherever we are. We talk about data, create data, read data, transmit data, receive data, and save data constantly during any given hour in a day, and we still want and need more. So, we collect even more data at work, in meetings, at home, on our smartphones, in emails, in voice messages, sifting through financial reports, analyzing profits and losses, watching streaming videos, playing computer games, comparing sports teams and favorite players, and countless other ways. Data is growing and being collected at such astounding rates, all in the hope of being able to better understand the world around us. As SAS® professionals, the world of data offers many new and exciting opportunities, but it also presents a frightening realization that data sources might very well contain a host of integrity issues that need to be resolved first. This presentation describes the available methods to remove duplicate observations (or rows) from data sets (or tables) based on the row's values and keys using SAS.
Read the paper (PDF)
Kirk Paul Lafler, Software Intelligence Corporation
S
Session 1232-2017:
SAS® Abbreviations: Shortcuts for Remembering Complicated Syntax
One of the many difficulties for a SAS® programmer is remembering how to accurately use SAS syntax, especially syntax that includes many parameters. Not mastering basic syntax parameters definitely makes coding inefficient because the programmer has to check reference manuals constantly to ensure that syntax is correct. One of the more useful but somewhat unknown tools in SAS is the use of SAS abbreviations. This feature enables users to store text strings (such as the syntax of a DATA step function, a SAS procedure, or a complete DATA step) in a user-defined and easy-to-remember abbreviation. When this abbreviation is entered in the enhanced editor, SAS automatically brings up the corresponding stored syntax. Knowing how to use SAS abbreviations is beneficial to programmers with varying levels of SAS expertise. In this paper, various examples of using SAS abbreviations are demonstrated.
Yaorui Liu, USC
Session SAS0147-2017:
SAS® Customer Intelligence 360 for Dummies
Have you heard of SAS® Customer Intelligence 360, the program for creating a digital marketing SasS offering on a multi-tenant SAS cloud? Were you mesmerized by it but found it overwhelming? Did you tell yourself, I wish someone would show me how to do this ? This paper is for you. This paper provides you with an easy, step-by-step procedure on how to create a successful digital web, mobile, and email marketing campaign. In addition to these basics, the paper points to resources that allow you to get deeper into the application and customize each object to satisfy your marketing needs.
Read the paper (PDF)
Fariba Bat-haee, SAS
Denise Sealy, SAS
Session 1517-2017:
SAS® Data Integration: a Capgemini Solution to Accelerate and Keeping It All 'in Sync'
A common issue in data integration is that often the documentation and the SAS® data integration job source code start to diverge and eventually become out of sync. At Capgemini, working for a specific client, we developed a solution to rectify this challenge. We proposed moving all necessary documentation into the SAS® Data Integration Studio job itself. In this way, all documentation then becomes part of the metadata we have created, with the possibility of automatically generating Job and Release documentation from the metadata. This presentation therefore focuses on the metadata documentation generator. Specifically, this presentation: 1) looks at how to use programming and documentation standards in SAS data integration jobs to enable the generation of documentation from the metadata; and 2) shows how the documentation is generated from the metadata, and the challenges that were encountered creating the code. I draw on our hands-on experience; Capgemini has implemented this for a customer in the Netherlands, and we are rolling this out as an accelerator in other SAS data integration projects worldwide. I share examples of the generated documentation, which contains functional and technical designs, including a list with all source tables, a list with the target tables, all transformations with their own documentation, job dependencies, and more.
Read the paper (PDF)
Richard Hogenberg, Capgemini
Session 1479-2017:
SAS® Hash Objects, Demystified
The hash object provides an efficient method for quick data storage and data retrieval. Using a common set of lookup keys, hash objects can be used to retrieve data, store data, merge or join tables of data, and split a single table into multiple tables. This paper explains what a hash object is and why you should use hash objects, and provides basic programming instructions associated with the construction and use of hash objects in a DATA step.
Read the paper (PDF)
Dari Mazloom, USAA
Session 0969-2017:
SAS® Macros for Binning Predictors with a Binary Target
Binary logistic regression models are widely used in CRM (customer relationship management) or credit risk modeling. In these models, it is common to use nominal, ordinal, or discrete (NOD) predictors. NOD predictors typically are binned (reducing the number of their levels) before usage in a logistic model. The primary purpose of binning is to obtain parsimony without greatly reducing the strength of association of the predictor X to the binary target Y. In this paper, two SAS® macros are discussed. The %NOD_BIN macro bins predictors with nominal values (and ordinal and discrete values) by collapsing levels to maximize information value (IV). The %ORDINAL_BIN macro is applied to predictors that are ordered and in which collapsing can occur only for levels that are adjacent in the ordering of X. The %ORDINAL_BIN macro finds all possible binning solutions by complete enumeration. Solutions are ranked by IV, and monotonic solutions are identified.
Read the paper (PDF)
Bruce Lund, Magnify Analytic Solutions
Session 1293-2017:
SAS® Metadata Security 201: Security Basics for a New SAS Administrator
The purpose of this paper is to provide an overview of SAS® metadata security for new or inexperienced SAS administrators. The focus of the discussion is on identifying the most common metadata security objects such as access control entries (ACEs), access control templates (ACTs), metadata folders, authentication domains, and so on, and on describing how these objects work together to secure the SAS environment. Based on a standard SAS® Enterprise Office Analytics for Midsize Business installation in a Windows environment, this paper walks through a simple example of securing a metadata environment, which demonstrates how security is prioritized, the impact of each security layer, and how conflicts are resolved.
Read the paper (PDF)
Charyn Faenza, F.N.B. Corporation
Session 0135-2017:
Sankey Diagram: A Compelling, Convenient, and Informational Path Analysis with SAS® Visual Analytics
SAS® Visual Analytics provides a complete platform for analytics visualization and exploration of the data. There are several interactive visualizations such as charts, histograms, heat maps, decision tree, and Sankey diagrams. A Sankey diagram helps in performing path analytics and offers a better understanding of complex data. It is a graphic illustration of flows from one set of values to another as a series of paths, where the width of each flow represents the quantity. It is a better and more efficient way to illustrate which flows represent advantages and which flows are responsible for the disadvantages or losses. Sankey diagrams are named after Matthew Henry Phineas Riall Sankey, who first used this in a publication on energy efficiency of a steam engine in 1898. This paper begins with information regarding the essentials or parts of Sankey: nodes, links, drop-off links, and path. Later, the paper explains the method for creating a meaningful visualization (with the help of examples) with a Sankey diagram by looking into the data roles and properties, describing ways to manage the path selection, exploring the transaction identifier values for a path selection, and using the spotlight tool to view multiple data tips in SAS Visual Analytics. Finally, the paper provides recommendation and tips to work effectively and efficiently with the Sankey diagram.
Read the paper (PDF)
Abhilasha Tiwari, Accenture
Session 1340-2017:
Simplified Project Management Using a SAS® Visual Analytics Dashboard
The University of Central Florida (UCF) Institutional Knowledge Management (IKM) office provides data analysis and reporting for all UCF divisions. These projects are logged and tracked through the Oracle PeopleSoft content management system (CMS). In the past, projects were monitored via a weekly query pulled using SAS® Enterprise Guide®. The output would be filtered and prioritized based on project importance and due dates. A project list would be sent to individual staff members to make updates in the CMS. As data requests were increasing, UCF IKM needed a tool to get a broad overview of the entire project list and more efficiently identify projects in need of immediate attention. A project management dashboard that all IKM staff members can access was created in SAS® Visual Analytics. This dashboard is currently being used in weekly project management meetings and has eliminated the need to send weekly staff reports.
View the e-poster or slides (PDF)
Andre Watts, University of Central Florida
Danae Barulich, University of Central Florida
Session 1157-2017:
Statistical Volunteering with SAS: Experiences and Opportunities
This presentation brings together experiences from SAS® professionals working as volunteers for organizations, charities, and in academic research. Pro bono work, much like that done by physicians, attorneys, and professionals in other areas, is rapidly growing in statistical practice as an important part of a statistical career, offering the opportunity to use your skills in a places where they are so needed but cannot be supported in a for-pay position. Statistical volunteers also gain important learning experiences, mentoring, networking, and other opportunities for professional development. The presenter shares experiences from volunteering for local charities, non-governmental organizations (NGOs) and other organizations and causes, both in the US and around the world. The mission, methods, and focus of some organizations are presented, including DataKind, Statistics Without Borders, Peacework, and others.
Read the paper (PDF)
David Corliss, Peace-Work
Session 1095-2017:
Supplier Negotiations Optimized with SAS® Enterprise Guide®: Save Time and Money
Every sourcing and procurement department has limited resources to use for realizing productivity (cost savings). In practice, many organizations simply schedule yearly pricing negotiations with their main suppliers. They do not deviate from that approach unless there is a very large swing in the underlying commodity. Using cost data gleaned from previous quotes and SAS® Enterprise Guide®, we can put in place a program and methodology that move the practice from gut instinct to quantifiable and justifiable models that can easily be updated on a monthly basis. From these updated models, we can print a report of suppliers or categories that we should consider for cost downs, and suppliers or categories that we should work on to hold current pricing. By having all cost models, commodity data, and reporting functions within SAS Enterprise Guide, we are able to not only increase the precision and effectiveness of our negotiations, but also to vastly decrease the load of repetitive work that has been traditionally placed on supporting analysts. Now the analyst can execute the program, send the initial reports to the management team, and be leveraged for other projects and tasks. Moreover, the management team can have confidence in the analysis and the recommended plan of action.
View the e-poster or slides (PDF)
Cameron Jagoe, The University of Alabama
Denise McManus, The University of Alabama
Session 0924-2017:
Survival Analysis of Lung Cancer Patients Using PROC PHREG and PROC LIFETEST
Survival analysis differs from other types of statistical analysis, including graphical summaries and regression modeling procedures, because data is almost always censored. The purpose of this project is to apply survival analysis techniques in SAS® to practical survival data, aiming to understand the effects of gender and age on lung cancer patient survival at different cancer sites. Results show that both gender and age are significant variables in predicting lung cancer patient survival using the Cox proportional hazards model. Females have better survival than males when other variables in the model are fixed (p-value 0.0254). Moreover, the hazard of patients who are over 65 is 1.385 times that of patients who are under 65 (p-value 0.0145).
View the e-poster or slides (PDF)
Yan Wang, Kennesaw State University
T
Session 0832-2017:
The Elusive Data Scientist: Real-World Analytic Competencies
You've all seen the job posting that looks more like an advertisement for the ever-elusive unicorn. It begins by outlining the required skills that include a mixture of tools, technologies, and masterful things that you should be able to do. Unfortunately, many such postings begin with restrictions to those with advanced degrees in math, science, statistics, or computer science and experience in your specific industry. They must be able to perform predictive modeling, natural language processing, and, for good measure, candidates should apply only if they know artificial intelligence, cognitive computing, and machine learning. The candidate should be proficient in SAS®, R, Python, Hadoop, ETL, real-time, in-cloud, in-memory, in-database and must be a master storyteller. I know of no one who would be able to fit that description and still be able to hold a normal conversation with another human. In our work, we have developed a competency model for analytics, which describes nine performance domains that encompass the knowledge, skills, behaviors, and dispositions that today's analytic professional should possess in support of a learning, analytically driven organization. In this paper, we describe the model and provide specific examples of job families and career paths that can be followed based on the domains that best fit your skills and interests. We also share with participants a self-assessment tool so that they can see where the stack up!
Read the paper (PDF)
Greg Nelson, Thotwave Technologies, LLC.
Session 1322-2017:
The Orange Lifestyle
As a freshman at a large university, life can be fun as well as stressful. The choices a freshman makes while in college might impact his or her overall health. In order to examine the overall health and different behaviors of students at Oklahoma State University, a survey was conducted among the freshmen students. The survey focused on capturing the psychological, environmental, diet, exercise, and alcohol and drug use among students. A total of 795 out of 1,036 freshman students completed the survey, which included around 270 questions that covered the range of issues mentioned above. An exploratory factor analysis identified 26 factors. For example, two factors that relate to the behavior of students under stress are eating and relaxing. Further understanding the variables that contribute to alcohol and drug use might help the university in planning appropriate interventions and preventions. Factor analysis with Cronbach's alpha provided insight into a more defined set of variables to help address these types of issues. We used SAS® to do factor analysis as well as to create different clusters of students with unique characteristics and profiled these clusters
Read the paper (PDF)
Mohit Singhi, Oklahoma State University
Session 0978-2017:
The SAS® Ecosystem: A Programmer’s Perspective
You might encounter people who used SAS® long ago (perhaps in university), or people who had a very limited use of SAS in a job. Some of these people with limited knowledge and experience think that SAS is just a statistics package or just a GUI. Those that think of it as a GUI usually reference SAS® Enterprise Guide® or, if it was a really long time ago, SAS/AF® or SAS/FSP®. The reality is that the modern SAS system is a very large and complex ecosystem, with hundreds of software products and diversified tools for programmers and users. This poster provides diagrams and tables that illustrate the complexity of the SAS system from the perspective of a programmer. Diagrams and illustrations include the functional scope and operating systems in the ecosystem; different environments that program code can run in; cross-environment interactions and related tools; SAS® Grid Computing: parallel processing; how SAS can run with files in memory (the legacy SAFILE statement and big data and Hadoop); and how some code can run in-database. We end with a tabulation of the many programming languages and SQL dialects that are directly or indirectly supported within SAS. This poster should enlighten those who think that SAS is an old, dated statistics package or just a GUI.
View the e-poster or slides (PDF)
Thomas Billings, MUFG Union Bank
Session SAS0289-2017:
The Well-Equipped Student: Using SAS® University Edition and E-Learning to Gain SAS® Skills
SAS® programming skills are much in-demand, and numerous free tools are available for students who want to develop those skills. This paper introduces students to SAS® Studio and the Jupyter Notebook interface within SAS® University Edition. To make this introduction more tangible, the paper uses a large data set of baseball statistics as an example. In particular, statistical analysis using SAS® Studio examines the relationship between salary and performance for major leaguers. From importing text files to creating basic statistics to doing a more advanced analysis, this paper shows multiple ways to carry out tasks so that you can choose whichever method works best for you. Additional statistics that use t tests and linear regression are simple with SAS University Edition. For completeness, the paper shows the same code that is used in SAS Studio examples in the context of Jupyter Notebook in SAS University Edition. The paper also provides additional information about SAS e-learning and SAS Certification to show students how to be fully equipped in order to apply themselves to analytics and data exploration.
Read the paper (PDF)
Randy Mullis, SAS
Allison Mahaffey, SAS
Session 1270-2017:
Time Series Analysis and Forecasting in SAS® University Edition
Time series analysis and forecasting have always been popular as businesses realize the power and impact they can have. Getting students to learn effective and correct ways to build their models is key to having successful analyses as more graduates move into the business world. Using SAS® University Edition is a great way for students to learn analysis, and this talk focuses on the time series tasks. A brief introduction to time series is provided, as well as other important topics that are key to building strong models.
Read the paper (PDF)
Chris Battiston
Session 1489-2017:
Timing is Everything: Detecting Important Behavior Triggers
Predictive analytics is powerful in its ability to predict likely future behavior, and is widely used in marketing. However, the old adage timing is everything continues to hold true the right customer and the right offer at the wrong time is less than optimal. In life, timing matters a great deal, but predictive analytics seldom takes timing into account explicitly. We should be able to do better. Financial service consumption changes often have precursor changes in behavior, and a behavior change can lead to multiple subsequent consumption changes. One way to improve our awareness of the customer situation is to detect significant events that have meaningful consequences that warrant proactive outreach. This session presents a simple time series approach to event detection that has proven to work successfully. Real case studies are discussed to illustrate the approach and implementation. Adoption of this practice can augment and enhance predictive analytics practice to elevate our game to the next level.
Read the paper (PDF)
Daymond Ling, Seneca College
Session 1160-2017:
To Hydrate or Chlorinate: A Regression Analysis of the Levels of Chlorine in the Public Water Supply
Public water supplies contain disease-causing microorganisms in the water or distribution ducts. To kill off these pathogens, a disinfectant, such as chlorine, is added to the water. Chlorine is the most widely used disinfectant in all US water treatment facilities. Chlorine is known to be one of the most powerful disinfectants to restrict harmful pathogens from reaching the consumer. In the interest of obtaining a better understanding of what variables affect the levels of chlorine in the water, this presentation analyzed a particular set of water samples randomly collected from locations in Orange County, Florida. Thirty water samples were collected and their chlorine level, temperature, and pH were recorded. A linear regression analysis was performed on the data collected with several qualitative and quantitative variables. Water storage time, temperature, time of day, location, pH, and dissolved oxygen level were the independent variables collected from each water sample. All data collected was analyzed using various SAS® procedures. Partial residual plots were used to determine possible relationships between the chlorine level and the independent variables. A stepwise selection was used to eliminate possible insignificant predictors. From there, several possible models for the data were selected. F-tests were conducted to determine which of the models appeared to be the most useful.
View the e-poster or slides (PDF)
Drew Doyle, University of Central Florida
U
Session SAS0152-2017:
Using Python with SAS® Cloud Analytic Services
With SAS® Viya and SAS® Cloud Analytic Services (CAS), SAS is moving into a new territory where SAS® Analytics is accessible to popular scripting languages using open APIs. Python is one of those client languages. We demonstrate how to connect to CAS, run CAS actions, explore data, build analytical models, and then manipulate and visualize the results using standard Python packages such as Pandas and Matplotlib. We cover a wide variety of topics to give you a bird's eye view of what is possible when you combine the best of SAS with the best of open source.
Read the paper (PDF)
Kevin Smith, SAS
Xiangxiang Meng, SAS
Session 0194-2017:
Using SAS® Data Management Advanced to Ensure Data Quality for Master Data Management
Data is a valuable corporate asset that, when managed improperly, can detract from a company's ability to achieve strategic goals. At 1-800-Flowers.com, Inc. (18F), we have embarked on a journey toward data governance through embracing Master Data Management (MDM). Along the path, we've recognized that in order to protect and increase the value of our data, we must take data quality into consideration at all aspects of data movement in the organization. This presentation discusses the ways that SAS® Data Management is being leveraged by the team at 18F to create and enhance our data quality strategy to ensure data quality for MDM.
Read the paper (PDF)
Brian Smith, 1800Flowers.com
Session 0780-2017:
Using the Power of SAS® from JMP®
JMP® integrates very nicely with SAS® software, so you can do some pretty amazing things by combining the power of JMP and SAS. You can submit some code to run something on a SAS server and bring the results back as a JMP table. Then you can do lots of things with the JMP table to analyze the data returned. This workshop shows you how to access data via SAS servers, run SAS code and bring data back to JMP, and use JMP to do many things very quickly and easily. Explore the synergies between these tools; having both is a powerful combination that far outstrips just having one, or not using them together.
Read the paper (PDF) | Download the data file (ZIP)
Philip Mason, Wood Street Consultants Ltd.
W
Session SAS0708-2017:
What Is Needed for True Population Health Intelligence?
Health care has long been focused on providing reactive care for illness, injury, or chronic conditions. But the rising cost of providing health care has forced many countries, health insurance payers, and health care providers to shift approaches. A new focus on patient value includes providing financial incentives that emphasize clinical outcomes instead of treatments. This focus also means that providers and wellness programs are required to take a segmentation approach to the population under their care, targeting specific people based on their individual risks. This session discusses the benefits of a shift from thinking about health care data as a series of clinical or financial transactions, to one that is centered on patients and their respective clinical conditions. This approach allows for insights pertaining to care delivery processes and treatment patterns, including identification of potentially avoidable complications, variations in care provided, and inefficient care that contributes to waste. All of which contributes to poor clinical outcomes.
Read the paper (PDF)
Laurie Rose, SAS
Dan Stevens, SAS
Session 1393-2017:
What? I am the Linux Administrator for SAS® Visual Analytics?
Whether you are a new SAS® administrator or you are switching to a Linux environment, you have a complex mission. This job becomes even more formidable when you are working with a system like SAS® Visual Analytics that requires multiple users loading data daily. Eventually a user has data issues or creates a disruption that causes the system to malfunction. When that happens, what do you do next? In this paper, we go through the basics of a SAS Visual Analytics Linux environment and how to troubleshoot the system when issues arise.
Read the paper (PDF)
Ryan Kumpfmiller, Zencos
Session 1188-2017:
Where Does Cleopatra Really Belong? An Analysis of Slot Machine Placement and Performance Using SAS®
In the world of gambling, superstition drives behavior, which can be difficult to explain. Conflicting evidence suggests that slot machines, like BCLC's Cleopatra, perform well regardless of where they are placed on a casino floor. Other evidence disputes this, arguing that performance is driven by their strategic placement (for example, in high-traffic areas). We explore and quantify the location sensitivity of slot machines by leveraging SAS® to develop robust models. We test various methodologies and data import techniques (such as casino CAD floor plans) to unlock some of the nebulous concepts of player behavior, product performance, and superstition. By demystifying location sensitivity, key drivers of performance can be identified to aid in optimizing the placement of slot machines.
Read the paper (PDF)
Stephen Tam, British Columbia Lottery Corporation
Session SAS0567-2017:
Wrangling Your Data into Shape for In-Memory Analytics
High-quality analytics works best with the best-quality data. Preparing your data ranges from activities like text manipulation and filtering to creating calculated items and blending data from multiple tables. This paper covers the range of activities you can easily perform to get your data ready. High-performance analytics works best with in-memory data. Getting your data into an in-memory server, as well as keeping it fresh and secure, are considerations for in-memory data management. This paper covers how to make small or large data available and how to manage it for analytics. You can choose to perform these activities in a graphical user interface or via batch scripts. This paper describes both ways to perform these activities. You ll be well-prepared to get your data wrangled into shape for analytics!
Read the paper (PDF)
Gary Mehler, SAS
Y
Session SAS0603-2017:
You Imported What? Supporting International Trade with Advanced Analytics
Global trade and more people and freight moving across international borders present border and security agencies with a difficult challenge. While supporting freedom of movement, agencies must minimize risks, preserve national security, guarantee that correct duties are collected, deploy human resources to the right place, and ensure that additional checks do not result in increased delays for passengers or cargo. To meet these objectives, border agencies must make the most efficient use of their data, which is often found across disparate intelligence sources. Bringing this data together with powerful analytics can help them identify suspicious events, highlight areas of risk, process watch lists, and notify relevant agents so that they can investigate, take immediate action to intercept illegal or high-risk activities, and report findings. With SAS® Visual Investigator, organizations can use advanced analytical models and surveillance scenarios to identify and score events, and to deliver them to agents and intelligence analysts for investigation and action. SAS Visual Investigator provides analysts with a holistic view of people, cargo, relationships, social networks, patterns, and anomalies, which they can explore through interactive visualizations before capturing their decision and initiating an action.
Read the paper (PDF)
Susan Trueman, SAS
back to top