The Clinical Data Interchange Standards Consortium (CDISC) develops a variety of standards for medical research. Among the standards developed by CDISC are standards for data collection (Clinical Data Acquisition Standards Harmonization, CDASH), data submission (Study Data Tabulation Model, SDTM), and data analysis (Analysis Data Model, ADaM). These standards were originally developed with drug development in mind. Therapeutic Area User Guides (TAUGs) have been a recent focus, providing advice, examples, and explanations for collecting and submitting data for specific diseases. Data can even be collected on non-subjects using the Associated Persons Implementation Guide (APIG). SDTM domains for medical devices were published in 2012. Interestingly, 14 of the 18 TAUGs use device domains, providing examples of the use of the various device domains. Drug-device studies also provide a contrast between adoption of CDISC standards for drug submissions and for device submissions. Adoption of SDTM in general, and of the seven device SDTM domains in particular, by the medical device industry has been slow. Reasons for the slow adoption are discussed in this paper.
Carey Smoak, DataCeutics
We have many opportunities to use time-to-event (survival) analysis, especially in the biomedical and pharmaceutical fields. SAS® provides the LIFETEST procedure to calculate Kaplan-Meier estimates of the survival function and to produce a survival plot. The PHREG procedure fits Cox regression models to estimate the effect of predictors on hazard rates. Programs that use the ODS tables defined by PROC LIFETEST and PROC PHREG can extract additional statistical information from the generated data sets. This paper provides a macro that uses PROC LIFETEST and PROC PHREG with ODS. It helps users produce a survival plot with estimates that include the subjects at risk, the number of events and total subjects, the survival rate with median and 95% confidence interval, and hazard ratio estimates with 95% confidence intervals. Some of these estimates are optional in the macro, so users can select what they need to display in the output. (Subjects at risk and the event and subject counts are not optional.) Users can also specify the tick marks on the X-axis and in the subjects-at-risk table, for example, every 10 or 20 units. The macro dynamically calculates the maximum of the X-axis and uses the interval that the user specified. Finally, because the macro uses ODS, the output can be delivered in many document formats, including JPG, PDF, and RTF.
Chia-Ling Wu, University of Southern California
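As a rough illustration of the building blocks this macro combines, the sketch below requests Kaplan-Meier estimates, an at-risk table, and a Cox hazard ratio through ODS; the data set and variables (adtte, aval, cnsr, trt01p) are hypothetical, and the macro's actual plotting logic is more elaborate.

   ods graphics on;
   ods output Quartiles=quartiles CensoredSummary=censored;
   proc lifetest data=adtte plots=survival(atrisk=0 to 36 by 6);
      time aval * cnsr(1);          /* cnsr=1 flags censored observations */
      strata trt01p;                /* treatment arm */
   run;

   proc phreg data=adtte;
      class trt01p / ref=first;
      model aval * cnsr(1) = trt01p / risklimits;   /* hazard ratio and 95% CI */
      ods output ParameterEstimates=hr_est;
   run;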
The hospital Medicare readmission rate has become a key indicator for measuring the quality of health care in the US. This rate is currently used by major health-care stakeholders including the Centers for Medicare and Medicaid Services (CMS), the Agency for Healthcare Research and Quality (AHRQ), and the National Committee for Quality Assurance (NCQA) (Fan and Sarfarazi, 2014). Although many papers have been written about how to calculate readmissions, this paper provides updated code that includes ICD-10 (International Classification of Diseases) codes and offers a novel and comprehensive approach using SAS® DATA step options and PROC SQL. We discuss: 1) de-identifying patient data; 2) calculating sequential admissions; and 3) subsetting criteria required to report CMS 30-day readmissions. In addition, this paper demonstrates: 1) using the Output Delivery System (ODS) to create a labeled and de-identified data set; 2) macro variables to examine data quality; and 3) summary statistics for further reporting and analysis.
Karen Wallace, Centene Corporation
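As a sketch of step 2 above (calculating sequential admissions), the PROC SQL query below flags index stays that are followed by another admission within 30 days of discharge; the claims data set and the member_id, admit_date, and discharge_date variables are hypothetical stand-ins for the paper's actual data.

   proc sql;
      create table readmits as
      select a.member_id,
             a.admit_date     as index_admit,
             a.discharge_date as index_discharge,
             min(b.admit_date) as next_admit format=date9.,
             case
                when min(b.admit_date) is not missing
                     and min(b.admit_date) - a.discharge_date le 30
                then 1 else 0
             end as readmit_30
      from claims a
           left join claims b
             on  a.member_id = b.member_id
             and b.admit_date > a.discharge_date
      group by a.member_id, a.admit_date, a.discharge_date;
   quit;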
This presentation gives you the tools to begin using propensity scoring in SAS® to answer research questions involving observational data. It is for both those attendees who have never used propensity scores and those who have a basic understanding of propensity scores but are unsure how to begin using them in SAS. It provides a brief introduction to the concept of propensity scores, and then turns its attention to giving you the tips and resources you need to get started. The presentation walks you through how the code in the book 'Analysis of Observational Health Care Data Using SAS®', which was published by SAS Institute, is used to answer how a particular health care intervention impacted a health care outcome. It details how propensity scores are created and how propensity score matching is used to balance covariates between treated and untreated observations. With this case study in hand, you will feel confident that you have the tools necessary to begin answering some of your own research questions using propensity scores.
Thomas Gant, Kaiser Permanente
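For readers who want a concrete starting point, the minimal sketch below estimates propensity scores with PROC LOGISTIC, the kind of model the book's matching code builds on; the cohort data set and the treat, age, sex, and comorbidity_cnt variables are hypothetical.

   /* Estimate each subject's propensity to receive the intervention */
   proc logistic data=cohort descending;
      class sex / param=ref;
      model treat = age sex comorbidity_cnt;
      output out=ps_scored p=pscore;   /* pscore = estimated propensity score */
   run;

Matched pairs can then be formed on pscore, for example by nearest-neighbor matching within a caliper, before comparing outcomes between the treated and untreated groups.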
Healthcare is weird. Healthcare data is even more so. The digitization of healthcare data that describes the patient experience is a modern phenomenon, and most healthcare organizations are still in the early stages of it. While the business of healthcare is already a century old, most organizations have focused their efforts on the financial aspects of healthcare and not on stakeholder experience or clinical outcomes. Think of the workflow you might have experienced, from scheduling an appointment through doctor visits, lab tests, and prescriptions to interventions such as surgery or physical therapy. The modern healthcare system creates a digital footprint of administrative, process, quality, epidemiological, financial, clinical, and outcome measures, which range in size, cleanliness, and usefulness. Whether you are new to healthcare data or are looking to advance your knowledge of healthcare data and the techniques used to analyze it, this paper serves as a practical guide to understanding and using healthcare data. We explore common methods for structuring and accessing data, discuss common challenges such as aggregating data into episodes of care, describe reverse engineering real-world events, and talk about dealing with the myriad unstructured data found in nursing notes. Finally, we discuss the ethical uses of healthcare data and the limits of informed consent, which are critically important for those of us in analytics.
Greg Nelson, Thotwave Technologies, LLC.
Duplicates in a clinical trial or survey database can jeopardize data quality and integrity, and they can induce biased analysis results. These complications often arise in clinical trials, meta-analyses, and registry and observational studies. Common practice for identifying possible duplicates involves sensitive personal information, such as name, Social Security number (SSN), date of birth, address, and telephone number. However, access to this sensitive information is limited, and sometimes it is restricted entirely. As a data quality control measure, a SAS® program was developed to identify duplicated individuals using non-sensitive information, such as age, gender, race, medical history, vital signs, and laboratory measurements. A probabilistic approach was used by calculating weights for the data elements used to identify duplicates, based on two probabilities (the probability of agreement for an element among matched pairs and the probability of agreement purely by chance among non-matched pairs). For elements with categorical values, agreement was defined as matching pairs sharing the same value. For elements with interval values, agreement was defined as matching values within 1% of the measurement precision range. The probabilities used to compute matching element weights were estimated using an expectation-maximization (EM) algorithm. The method was then tested on survey and clinical trial data from hypertension studies.
Xiaoli Lu, VA CSPCC
Location information plays a big role in business data. Everything that happens in a business happens somewhere, whether it's sales of products in different regions or crimes that happened in a city. Business analysts typically use the historic data that they have gathered for years for analysis. One of the most important additions that can help answer more questions qualitatively is demographic data alongside the business data. An analyst can match the sales or the crimes with population metrics like gender, age group, family income, race, and other pieces of information that are part of the demographic data, for better insight. This paper demonstrates how a business analyst can bring demographic and lifestyle data from Esri into SAS® Visual Analytics and join it with business data. The integration of SAS Visual Analytics with Esri makes this possible. We demonstrate different methods of accessing Esri demographic data from SAS Visual Analytics. We also demonstrate how you can use custom shape files and integrate with Esri Portal for ArcGIS.
Murali Nori, SAS
Himesh Patel, SAS
Chronic obstructive pulmonary disease (COPD) is the third leading cause of death in the US. An estimated 24 million people suffer from COPD, and the associated medical cost stands at a whopping $36 billion. Besides the emotional and physical impact, a patient with COPD bears a severe economic burden to pay for the medication, and hospitals are subject to heavy penalties for high re-admission rates. Identifying the best medicine combinations to treat COPD benefits both patients and hospitals. This paper analyzes the effectiveness of three popular drugs prescribed for COPD patients in terms of mortality and re-admission within 30 days of discharge. The data, from Cerner Health Facts, consists of over 1 million anonymized patient records collected in real-world health environments. Base SAS® is used to perform statistical analysis and data processing; re-admission of patients is analyzed using the LAG function. The preliminary results show a re-admission rate of 5.96% and a mortality rate of 3.3% among all patients. The odds ratios computed using logistic regression show that mortality odds are 2.4 times higher for patients using Symbicort than for those using Spiriva or Advair. This paper also uses text mining of social media, drug portals, and blogs to gauge the sentiments of patients using these drugs. The results obtained through sentiment analysis are then compared with the statistical analysis to determine the effectiveness of the drugs prescribed to COPD patients.
Indra Kiran Chowdavarapu, Oklahoma State University
Dursun Delen, Oklahoma State University
Vivek Manikandan Damodaran, Oklahoma State University
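A minimal sketch of the LAG-based re-admission check described above might look like the following; the encounters data set and its patient_id, admit_date, and discharge_date variables are hypothetical.

   proc sort data=encounters;
      by patient_id admit_date;
   run;

   data readmit_flags;
      set encounters;
      by patient_id;
      prev_discharge = lag(discharge_date);          /* previous stay's discharge */
      if first.patient_id then prev_discharge = .;   /* no prior stay for this patient */
      readmit_30 = (prev_discharge ne . and admit_date - prev_discharge <= 30);
      format prev_discharge date9.;
   run;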
Customer churn is an important area of concern that affects not just the growth of your company, but also its profit. Conventional survival analysis can provide a customer's likelihood to churn in the near term, but it does not take into account the lifetime value of the higher-risk churn customers you are trying to retain. Not all customers are equally important to your company. Recency, frequency, and monetary (RFM) analysis can help companies identify the customers that are most important and most likely to respond to a retention offer. In this paper, we use the IML and PHREG procedures to combine RFM analysis and survival analysis in order to determine the optimal number of higher-risk and higher-value customers to retain.
Bo Zhang, IBM
Liwei Wang, Pharmaceutical Product Development Inc
Research frequently shows that exposure to sunlight contributes to non-melanoma skin cancer, but it also shows that sunlight might protect against multiple sclerosis and breast, ovarian, prostate, and colon cancer. In my study, I explored whether mortality from skin cancer, myocardial infarction, atrial fibrillation, and stroke is associated with exposure to sunlight. I used SAS® 9.4 and RStudio to conduct the entire study. I collected mortality data, including cause of death, for Los Angeles from 2000 to 2003. In addition, I collected sunlight data for Los Angeles for the same period. There are three types of sunlight in my data: global sunlight, diffuse sunlight, and direct sunlight. Data was collected at three different times: morning, midday, and afternoon. I used two models, a Poisson time series regression model and a logistic regression model, to investigate the association. I considered one-year and two-year lags of sunlight in association with the types of diseases. I adjusted for age, sex, race, education, temperature, and day of week. Results show that stroke is statistically significantly associated with a one-year lag of sunlight (p<0.001). Previous epidemiological studies have found that sunlight exposure can ameliorate osteoporosis in stroke patients, and my study provides evidence of the protective effects of sunlight for stroke patients.
Wei Xiong, University of Southern California
In the pharmaceutical industry, the Clinical Data Interchange Standards Consortium's (CDISC) Study Data Tabulation Model (SDTM) is required by the US Food and Drug Administration (FDA) as the standard data structure for regulatory submission of clinical data. Manually mapping raw data to SDTM domains can be time consuming and error prone, considering the increasing complexity of clinical data. However, this process can be much more efficient if the raw data is collected using the Clinical Data Acquisition Standards Harmonization (CDASH) standard, allowing for automatic conversion to the SDTM data structure. This paper introduces a macro that automatically creates a SAS® program for each SDTM domain (for example, dm.sas for Demographics [DM]) that maps CDASH data to SDTM data. The macro compares the attributes of the CDASH raw data sets with the SDTM domains to generate SAS code that performs the mapping. Each SAS program does the following: 1) sets up variables and assigns their proper order; 2) converts dates and times to the ISO 8601 standard; 3) converts numeric variables to character variables; and 4) transposes the data sets from wide to long for the Findings and Events domains. This macro, which sets up the basic frame of the SDTM mapping, can minimize the manual work for SAS programmers or, in some cases, completely handle simple domains without any further modification. This greatly increases the efficiency and speed of the SDTM conversion process.
Hao Xu, McDougall Scientific
Hong Chen, McDougall Scientific
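Two of the generated mapping steps, ISO 8601 date conversion and the wide-to-long transpose for Findings domains, might look roughly like the sketch below; the raw data sets and variable names (raw_dm, brthdat, raw_vs, and so on) are hypothetical.

   /* Convert a raw numeric SAS date to an ISO 8601 character date (YYYY-MM-DD) */
   data dm;
      set raw_dm;
      length brthdtc $10;
      brthdtc = put(brthdat, e8601da.);   /* brthdat is a numeric SAS date */
   run;

   /* Transpose a wide Findings data set (one column per test) to the long
      SDTM structure (one row per test per subject per visit) */
   proc transpose data=raw_vs
                  out=vs_long(rename=(_name_=vstestcd col1=vsorres));
      by usubjid visitnum;
      var sysbp diabp pulse;
   run;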
Data that are gathered in modern data collection processes are often large and contain geographic information that enables you to examine how spatial proximity affects the outcome of interest. For example, in real estate economics, the price of a housing unit is likely to depend on the prices of housing units in the same neighborhood or nearby neighborhoods, either because of their locations or because of some unobserved characteristics that these neighborhoods share. Understanding spatial relationships and being able to represent them in a compact form are vital to extracting value from big data. This paper describes how to glean analytical insights from big data and discover their big value by using spatial econometric methods in SAS/ETS® software.
Guohui Wu, SAS
Jan Chvosta, SAS
A Bayesian network is a directed acyclic graphical model that represents probability relationships and conditional independence structure between random variables. SAS® Enterprise Miner implements a Bayesian network primarily as a classification tool; it includes naïve Bayes, tree-augmented naïve Bayes, Bayesian-network-augmented naïve Bayes, parent-child Bayesian network, and Markov blanket Bayesian network classifiers. The HPBNET procedure uses a score-based approach and a constraint-based approach to model network structures. This paper compares the performance of Bayesian network classifiers to other popular classification methods, such as classification trees, neural networks, logistic regression, and support vector machines. The paper also shows some real-world applications of the implemented Bayesian network classifiers and a useful visualization of the results.
Ye Liu, SAS
Weihua Shi, SAS
Wendy Czika, SAS
Health insurers have terabytes of transactional data. However, transactional data does not tell a member-level story. Humana Inc. is often faced with requirements for tagging (identifying) members with various clinical conditions such as diabetes, depression, hypertension, and hyperlipidemia, and for computing various member-level utilization metrics. For example, Consumer Health Tags are built to identify a condition (that is, diabetes, hypertension, and so on) and to estimate the intensity of the disease using medical and pharmacy administrative claims data. This case study takes you on an analytics journey from the initial problem diagnosis to the analytics solution, using SAS®.
Brian Mitchell, Humana Inc.
A/B testing is a form of statistical hypothesis testing on two business options (A and B) to determine which is more effective in the modern Internet age. The challenge for startups or new-product businesses leveraging A/B testing is two-fold: a small number of customers and poor understanding of their responses. This paper shows you how to use the IML and POWER procedures to reassess sample size for adaptive multiple-business-stage designs based on conditional power arguments, using the data observed at the previous business stage.
Bo Zhang, IBM
Liwei Wang, Pharmaceutical Product Development Inc
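As a fixed-design starting point (the adaptive reassessment itself is handled in PROC IML in the paper), a PROC POWER call for comparing two conversion rates might look like the following; the 10% and 12% rates are illustrative assumptions.

   proc power;
      twosamplefreq test=pchi
         groupproportions = (0.10 0.12)   /* baseline vs. variant conversion */
         alpha            = 0.05
         power            = 0.80
         npergroup        = .;            /* solve for the per-group sample size */
   run;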
Epidemiologists and other health scientists are often tasked with solving health problems but find collecting original data prohibitive for a multitude of reasons. For this reason, it is common to instead use secondary data such as that from emergency department (ED) visits or inpatient hospital stays. In order to use some of these secondary data sets to study problems over time, it is necessary to link them together using common identifiers while keeping all the unique information about each ED visit or hospitalization. This paper discusses a method that was used to combine five years' worth of individual ED visits and five years' worth of individual hospitalizations to create a single and (much) larger data set for longitudinal analysis.
Charlotte Baker, Florida A&M University
Most studies of human disease networks have estimated the associations between disorders primarily from gene or protein information. Those studies, however, face difficulties because of the massive volume of data and the huge computational cost. Instead, we constructed a human disease network that describes the associations between diseases using claims data from the Korean health insurance system. Through several statistical analyses, we show the applicability and suitability of the disease network. Furthermore, we develop a statistical model that predicts the prevalence rate of dementia by using the statistically significant associations in the network.
Jinwoo Cho, Sung Kyun Kwan University
Clinical research study enrollment data consists of subject identifiers and enrollment dates that are used by investigators to monitor enrollment progress. Meeting study enrollment targets is critical to ensuring there will be enough data and endpoints to achieve the planned statistical power of the study. For clinical trials that do not experience heavy, nearly daily enrollment, there will be a number of dates on which no subjects were enrolled. Therefore, plots of cumulative enrollment represented by a smoothed line can give a false impression, or an imprecise reading, of study enrollment. A more accurate display is a step function plot that includes dates on which no subjects were enrolled. Rolling average plots often start by summing the data by month and creating a rolling average from the monthly sums. This session shows how to use the EXPAND procedure, along with the SQL and GPLOT procedures and the INTNX function, to create plots that display cumulative enrollment and rolling 6-month averages for each day. This includes filling in the dates with no subject enrollment and creating a rolling 6-month average for each date. This allows analysis of day-to-day variation as well as the short- and long-term impacts of changes, such as adding an enrollment center or launching initiatives to increase enrollment. This technique can be applied to any data that has gaps in dates. Examples include service history data and installation rates for a newly launched product.
Susan Schleede, University of Rochester
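A rough sketch of the daily gap-filling and rolling-average steps is shown below; it substitutes PROC TIMESERIES for the paper's SQL/INTNX calendar step, and the enroll data set and its enroll_date and enrolled variables are hypothetical.

   /* Daily enrollment counts, with zero filled in for days with no enrollment */
   proc timeseries data=enroll out=daily;
      id enroll_date interval=day accumulate=total setmissing=0;
      var enrolled;
   run;

   /* Cumulative enrollment and a trailing 180-day (~6-month) moving average */
   proc expand data=daily out=enroll_trends method=none;
      id enroll_date;
      convert enrolled = cum_enrolled / transformout=(sum);
      convert enrolled = avg_6mo      / transformout=(movave 180);
   run;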
For all business analytics projects, big or small, the results are used to support business or managerial decision-making processes, and many of them eventually lead to business actions. However, executives or decision makers are often confused and feel uninformed about the content when presented with complicated analytics steps, especially when multiple processes or environments are involved. After many years of research and experimentation, a web reporting framework based on SAS® Stored Processes was developed to smooth the communication between data analysts, researchers, and business decision makers. This web reporting framework uses a storytelling style to present essential analytical steps to audiences, with dynamic HTML5 content and drill-down and drill-through functions in text, graph, table, and dashboard formats. No special skills other than SAS® programming are needed to implement a new report. The model-view-controller (MVC) structure in this framework significantly reduces the time needed to develop high-end web reports for audiences not familiar with SAS. Additionally, the report contents can be delivered to tablet or smartphone users. A business analytics example is demonstrated during this session. By using this web reporting framework based on SAS Stored Processes, many existing SAS results can be delivered more effectively and persuasively on a SAS® Enterprise BI platform.
Qiang Li, Locfit LLC
Detection and adjustment of structural breaks are an important step in modeling time series and panel data. In some cases, such as studying the impact of a new policy or an advertising campaign, structural break analysis might even be the main goal of a data analysis project. In other cases, the adjustment of structural breaks is a necessary step to achieve other analysis objectives, such as obtaining accurate forecasts and effective seasonal adjustment. Structural breaks can occur in a variety of ways during the course of a time series. For example, a series can have an abrupt change in its trend, its seasonal pattern, or its response to a regressor. The SSM procedure in SAS/ETS® software provides a comprehensive set of tools for modeling different types of sequential data, including univariate and multivariate time series data and panel data. These tools include options for easy detection and adjustment of a wide variety of structural breaks. This paper shows how you can use the SSM procedure to detect and adjust structural breaks in many different modeling scenarios. Several real-world data sets are used in the examples. The paper also includes a brief review of the structural break detection facilities of other SAS/ETS procedures, such as the ARIMA, AUTOREG, and UCM procedures.
Rajesh Selukar, SAS
For all healthcare systems, considerable attention and resources are directed at gauging and improving patient satisfaction. Dignity Health has made considerable efforts to improve most areas of patient satisfaction. However, improving metrics around physician interaction with patients has been challenging. Failure to improve these publicly reported scores can result in reimbursement penalties, damage to Dignity's brand, and an increased risk of patient harm. One possible way to improve these scores is to better identify the physicians who present the best opportunity for positive change. Currently, the survey tool mandated by the Centers for Medicare and Medicaid Services (CMS), the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS), has three questions centered on patient experience with providers, specifically concerning listening, respect, and clarity of conversation. For purposes of relating patient satisfaction scores to physicians, Dignity Health has assigned scores based on the attending physician at discharge. A manual record review determined that this attribution method rarely corresponds to the physician identified by chart review (PPV = 20.7%, 95% CI: 9.9%-38.4%). Using a variety of SAS® tools and predictive modeling programs, we developed a logistic regression model that has better agreement with chart abstractors (PPV = 75.9%, 95% CI: 57.9%-87.8%). By attributing providers based on this predictive model, opportunities for improvement can be more accurately targeted, resulting in improved patient satisfaction and outcomes while protecting fiscal health.
Ken Ferrell, Dignity Health
Establishing an Agile, Self-Service Environment to Empower Agile Analytic Capabilities
Creating an environment that enables and empowers self-service and agile analytic capabilities requires a tremendous amount of collaboration and extensive agreement between IT and the business. Business and IT users struggle to know what version of the data is valid, where they should get the data from, and how to combine and aggregate all the data sources to apply analytics and deliver results in a timely manner. All the while, IT is struggling to supply the business with more and more data that is becoming available through many different data sources such as the Internet, sensors, the Internet of Things, and others. In addition, once they start trying to join and aggregate all the different types of data, the manual coding can be very complicated and tedious, can demand excessive resources and processing, and can negatively impact the overhead on the system. If IT enables agile analytics in a data lab, it can alleviate many of these issues, increase productivity, and deliver an effective self-service environment for all users. This self-service environment using SAS® analytics in Teradata has decreased the time required to prepare the data and develop the statistical data model, and it delivers results in minutes compared to days or even weeks. This session discusses how you can enable agile analytics in a data lab and leverage SAS analytics in Teradata to increase performance, and it describes how hundreds of organizations have adopted this concept to deliver self-service capabilities in a streamlined process.
Bob Matsey, Teradata
David Hare, SAS
With an aim to improve rural healthcare, the Oklahoma State University (OSU) Center for Health Systems Innovation (CHSI) conducted a study with primary care clinics (n=35) in rural Oklahoma to identify possible impediments to clinic workflows. The study entailed semi-structured personal interviews (n=241) and an online survey administered on an iPad (n=190). Respondents encompassed all consenting clinic constituents (physicians, nurses, practice managers, and schedulers). Quantitative data from the surveys revealed that electronic medical records (EMRs) are well accepted and contribute to increasing workflow efficiency. However, the qualitative data from the interviews reveal IT-related barriers such as Internet connectivity, hardware problems, and inefficiencies in information systems. Interview responses identified six IT-related response categories (computer, connectivity, EMR-related, fax, paperwork, and phone calls) that routinely affect clinic workflow. These categories together account for more than 50% of all the routine workflow-related problems faced by the clinics. Text mining was performed on the transcribed interviews using SAS® Text Miner to validate these six categories and to further identify concept linking for quantifiable insight. Two variables (Redundancy Reduction and Idle Time Generation) were derived from survey questions, with low scores of -129 and -64, respectively, out of 384. Finally, ANOVA was run using SAS® Enterprise Guide® 6.1 to determine whether the six qualitative categories affect the two quantitative variables differently.
Ankita Srivastava, Oklahoma State University
Ipe Paramel, Oklahoma State University
Onkar Jadhav, Oklahoma State University
Jennifer Briggs, Oklahoma State University
Credit card fraud. Loan fraud. Online banking fraud. Money laundering. Terrorism financing. Identity theft. The strains that modern criminals are placing on financial and government institutions demand new approaches to detecting and fighting crime. Traditional methods of analyzing large data sets on a periodic, batch basis are no longer sufficient. SAS® Event Stream Processing provides a framework and run-time architecture for building and deploying analytical models that run continuously on streams of incoming data, which can come from virtually any source: message queues, databases, files, TCP/IP sockets, and so on. SAS® Visual Scenario Designer is a powerful tool for developing, testing, and deploying aggregations, models, and rule sets that run in the SAS® Event Stream Processing Engine. This session explores the technology architecture, data flow, tools, and methodologies that are required to build a solution based on SAS Visual Scenario Designer that enables organizations to fight crime in real time.
John Shipway, SAS
The analysis of longitudinal data requires a model that correctly accounts for both the inherent correlation among the responses that results from the repeated measurements and the feedback between the responses and predictors at different time points. Lalonde, Wilson, and Yin (2013) developed an approach based on the generalized method of moments (GMM) for identifying and using valid moment conditions to account for time-dependent covariates in longitudinal data with binary outcomes. However, the model developed using this approach does not provide information about the specific relationships that exist across time points. We present a SAS® macro that extends the work of Lalonde, Wilson, and Yin by using valid moment conditions to estimate and evaluate the relationships between the response and predictors at different time periods. The performance of this method is compared to previously established results.
Jeffrey Wilson, Arizona State University
Would you like to be more confident in producing graphs and figures? Do you understand the differences between the OVERLAY, GRIDDED, LATTICE, DATAPANEL, and DATALATTICE layouts? Finally, would you like to learn the fundamental Graph Template Language (GTL) methods in a relaxed environment that fosters questions? Great! This topic is for you. In this hands-on workshop, you are guided through the fundamental aspects of GTL, and you can try fun and challenging SAS® graphics exercises to help you more easily retain what you have learned.
Kriss Harris
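A minimal example of the kind of GTL template covered in the workshop, using the SASHELP.CLASS sample data, is sketched below.

   proc template;
      define statgraph scatter_fit;
         begingraph;
            entrytitle "Weight by Height";
            layout overlay;
               scatterplot x=height y=weight;
               regressionplot x=height y=weight;
            endlayout;
         endgraph;
      end;
   run;

   proc sgrender data=sashelp.class template=scatter_fit;
   run;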
Do you need to add annotations to your graphs? Do you need to specify your own colors on the graph? Would you like to add Unicode characters to your graph, or would you like to create templates that can also be used by non-programmers to produce the required figures? Great, then this topic is for you! In this hands-on workshop, you are guided through the more advanced features of GTL. There are also fun and challenging SAS® graphics exercises to help you more easily retain what you have learned.
Kriss Harris
Secondary use of administrative claims data, EHRs and EMRs, registry data, and other data sources within the health data ecosystem provides rich opportunities to study topics ranging from public health surveillance to comparative effectiveness research. Data sourced from individual sites can be limited in scope, coverage, and statistical power. Sharing and pooling data from multiple sites and sources, however, present administrative, governance, analytic, and patient-privacy challenges. Distributed data networks represent a paradigm shift in health-care data sharing. They have evolved at a critical time when big data and patient privacy are often competing priorities. A distributed data network is one that has no central repository of data. Data reside behind the firewall of each data-contributing partner in a network. Each partner transforms its source data in accordance with a common data model and allows indirect access to data through a standard query approach using flexibly designed informatics tools. This presentation discusses how distributed data networks have matured to make important contributions to the health-care data ecosystem and the evolving Learning Healthcare System. The presentation focuses on: 1) the distributed data network and its purpose, concept, guiding principles, and benefits; 2) common data models and their concepts, designs, and benefits; 3) analytic tool development and its design and implementation considerations; and 4) analytic challenges.
Jennifer Popovic, Harvard Medical School / Harvard Pilgrim Health Care Institute
XML documents are becoming increasingly popular for transporting data between different operating systems. In the pharmaceutical industry, the Food and Drug Administration (FDA) requires pharmaceutical companies to submit certain types of data in XML format. This paper provides insights into XML documents and summarizes different methods of importing and exporting XML documents with SAS®, including: using the XML LIBNAME engine to translate between the XML markup and the SAS format; creating an XMLMap and using the XMLV2 LIBNAME engine to read XML documents and create SAS data sets; and using Clinical Data Interchange Standards Consortium (CDISC) procedures to import and export XML documents. An example of importing OpenClinica data into SAS by implementing these methods is provided.
Fei Wang, McDougall Scientific
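In skeleton form, the two LIBNAME-based approaches might look like the following; the file paths, XMLMap, and element names are hypothetical.

   /* Generic XML engine: read a simple, rectangular XML document */
   libname inxml xml 'C:\data\study.xml';
   data work.subjects;
      set inxml.subjects;        /* the element name becomes the member name */
   run;

   /* XMLV2 engine with an XMLMap for documents that need explicit mapping */
   filename mymap 'C:\data\study.map';
   libname inxml2 xmlv2 xmlmap=mymap access=readonly;
   proc copy in=inxml2 out=work;
   run;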
Interrupted time series analysis (ITS) is a tool that can help Learning Healthcare Systems evaluate programs in settings where randomization is not feasible. Interrupted time series is a statistical method that assesses repeated snapshots over regular intervals of time before and after a system-level intervention or program is implemented. This method can be used by Learning Healthcare Systems to evaluate programs aimed at improving patient outcomes in real-world, clinical settings. In practice, the number of patients and the timing of observations are restricted. This presentation describes a program that helps statisticians identify optimal segments of time within a fixed population size for an interrupted time series analysis. A macro creates simulations based on DO loops to calculate the power to detect changes over time due to system-level interventions. Parameters used in the macro are the sample size, the number of subjects in each time frame in each year, the number of intervals in a year, and the probability of the event before and after the intervention. The macro gives the user the ability to specify different assumptions, resulting in design options that yield varying power based on the number of patients in each time interval given the fixed parameters. The output from the macro can help stakeholders understand the parameters necessary to determine the optimal evaluation design.
Nigel Rozario, UNCC
Andrew McWilliams, CHS
Charity Moore, CHS
Throughout history, the phrase 'know thyself' has been the aspiration of many. The trend of wearable technologies has certainly provided the opportunity to collect personal data, enabling individuals to know themselves on a more sophisticated level. Specifically, wearable technologies that can track a patient's medical profile in a web-based environment, such as continuous blood glucose monitors, are saving lives. The main goal for diabetics is to replicate the functions of the pancreas in a manner that allows them to live a normal, functioning lifestyle. Many diabetics have access to a visual analytics website to track their blood glucose readings. However, these displays are often unreadable and overloaded with information. By analyzing the readings from the glucose monitor and insulin pump with SAS®, diabetics can parse their own information into simpler and more readable graphs. This presentation demonstrates the ease of creating these visualizations. Not only is this beneficial for diabetics, but also for the doctors who prescribe the necessary basal and bolus levels of insulin for a patient's insulin pump.
Taylor Larkin, The University of Alabama
Denise McManus, The University of Alabama
When analyzing data with SAS®, we often use the SAS DATA step and the SQL procedure to explore and manipulate data. Though both are useful tools in SAS, many SAS users do not fully understand their differences, advantages, and disadvantages, and thus engage in numerous unnecessary, biased debates about them. This paper illustrates and discusses these aspects with real work examples, giving SAS users deeper insight into using them. Using the right tool for a given circumstance not only provides an easier and more convenient solution, it also saves time and programming effort, thus improving work efficiency. Furthermore, the illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
Justin Jia, TransUnion
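As a small taste of the comparison, the same left join can be written either way; the orders and customers data sets and the cust_id key are hypothetical.

   proc sort data=orders;    by cust_id; run;
   proc sort data=customers; by cust_id; run;

   data orders_ds;                         /* DATA step merge (requires sorted data) */
      merge orders(in=a) customers;
      by cust_id;
      if a;
   run;

   proc sql;                               /* equivalent PROC SQL join (no sort needed) */
      create table orders_sql as
      select o.*, c.region, c.segment
      from   orders o
             left join customers c
               on o.cust_id = c.cust_id;
   quit;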
From stock price histories to hospital stay records, analysis of time series data often requires the use of lagged (and occasionally lead) values of one or more analysis variables. For the SAS® user, the central operational task is typically getting lagged (lead) values for each time point in the data set. Although SAS has long provided a LAG function, it has no analogous lead function, which is an especially significant problem in the case of large data series. This paper 1) reviews the LAG function, in particular the powerful but non-intuitive implications of its queue-oriented basis; 2) demonstrates efficient ways to generate leads with the same flexibility as the LAG function, but without the common and expensive recourse to data re-sorting; and 3) shows how to dynamically generate leads and lags through use of the hash object.
Mark Keintz, Wharton Research Data Services
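One common lead technique of the kind the paper covers is to merge the data set with itself, reading the second copy starting at observation 2; the sketch below assumes a prices data set already sorted by ticker and date, with hypothetical variable names.

   /* price_lead1 holds the next observation's price within each ticker */
   data with_lead;
      merge prices
            prices(firstobs=2 keep=ticker price
                   rename=(ticker=_tick2 price=price_lead1))
            end=last;
      if last or ticker ne _tick2 then price_lead1 = .;   /* don't cross series */
      drop _tick2;
   run;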
Data is your friend. This presentation discusses the use of data for quality improvement (QI). Measurement over time is integral to quality improvement, and statistical process control charts (also known as Shewhart or SPC charts) are a good way to learn from the way measures change over time in response to our improvement efforts. The presentation explains what an SPC chart is, how to choose the correct type of chart, how to create and update a chart using SAS®, and how to learn from the chart. The examples come from QI projects in health care, and the material is based on the Institute for Healthcare Improvement's Model for Improvement. However, the material is applicable to other fields, including manufacturing and business. The presentation is intended for people newly considering a QI project, people who want to graph their data and need help getting started, and anyone interested in interpreting SPC charts created by someone else.
Ruth Croxford, Institute for Clinical Evaluative Sciences
Every organization, from the most mature to a day-one start-up, needs to grow organically. A deep understanding of internal customer and operational data is the single biggest catalyst for developing and sustaining that growth. Advanced analytics and big data feed directly into this, and there are best practices that any organization (across the entire growth curve) can adopt to drive success. Analytics teams can be drivers of growth. But to be truly effective, key best practices need to be implemented. These practices include in-the-weeds details, like the approach to data hygiene, as well as strategic practices, like team structure and model governance. When executed poorly, business leadership and the analytics team are unable to communicate with each other: they talk past each other and do not work together toward a common goal. When executed well, the analytics team is part of the business solution, aligned with the needs of business decision-makers, and drives the organization forward. Through our engagements, we have discovered best practices in three key areas, all of which are critical to analytics team effectiveness: 1) data hygiene, 2) complex statistical modeling, and 3) team collaboration.
Aarti Gupta, Bain & Company
Paul Markowitz, Bain & Company
Niccolo Machiavelli said things on the order of, 'The promise given was a necessity of the past: the word broken is a necessity of the present.' His utilitarian philosophy can be summed up by the phrase, 'The ends justify the means.' As a personality trait, Machiavellianism is characterized by the drive to pursue one's own goals at the cost of others. In 1970, Richard Christie and Florence L. Geis created the MACH-IV test to assign a MACH score to an individual, using 20 Likert-scaled questions. The purpose of this study was to build a regression model that can be used to predict the MACH score of an individual using fewer factors. Such a model could be useful in screening processes where personality is considered, such as job screening, offender profiling, or online dating. The research was conducted on a data set from an online personality test similar to the MACH-IV test. It was hypothesized that a statistically significant model exists that can predict an average MACH score for individuals with similar factors. This hypothesis was accepted.
Patrick Schambach, Kennesaw State University
The EXPAND procedure is very useful when handling time series data and is commonly used in fields such as finance or economics, but it can also be applied to medical encounter data within a health research setting. Medical encounter data consists of detailed information about healthcare services provided to a patient by a managed care entity and is a rich resource for epidemiologic research. Specific data items include, but are not limited to, dates of service, procedures performed, diagnoses, and costs associated with services provided. Drug prescription information is also available. Because epidemiologic studies generally focus on a particular health condition, a researcher using encounter data might wish to distinguish individuals with the health condition of interest by identifying encounters with a defining diagnosis and/or procedure. In this presentation, I provide two examples of how cases can be identified from a medical encounter database. The first uses a relatively simple case definition, and then I EXPAND the example to a more complex case definition.
Rayna Matsuno, Henry M. Jackson Foundation
Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables in a multiple regression model are highly correlated. The presence of this phenomenon can have a negative impact on the analysis as a whole and can severely limit the conclusions of the research study. This paper reviews and provides examples of the different ways in which multicollinearity can affect a research project, and tells how to detect multicollinearity and how to reduce it once it is found. In order to demonstrate the effects of multicollinearity and how to combat it, this paper explores the proposed techniques by using the Behavioral Risk Factor Surveillance System data set. This paper is intended for any level of SAS® user and is written for an audience with a background in behavioral science or statistics.
Deanna Schreiber-Gregory, National University
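The standard diagnostics discussed in the paper can be requested directly from PROC REG, as in the hypothetical sketch below (the data set and variable names are placeholders, not the paper's actual model).

   /* Variance inflation factors, tolerance, and collinearity diagnostics */
   proc reg data=brfss_subset;
      model outcome = bmi age income smoker exercise / vif tol collin;
   run;
   quit;

A variance inflation factor above 10 is a common rule-of-thumb signal of problematic multicollinearity.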
Data has to be of moderate size when you estimate parameters with machine learning. For data with huge numbers of records, like healthcare big data (for example, receipt data), or super multi-dimensional data, like genome big data, it is important to follow a procedure in which the data is cleaned first and then the selection of data or variables for modeling is performed. Big data often consists of macroscopic and microscopic groups. With these groups, it is possible to increase the accuracy of estimation by following the above procedure, in which data is cleaned from a macro perspective and the selection of data or variables for modeling is performed from a micro perspective. This kind of step-wise procedure can be expected to help reduce bias. We also propose a new analysis algorithm with N-stage machine learning. For simplicity, we assume N = 2. Note that different machine learning approaches should be applied at each stage; that is, a random forest method is used at the first stage for data cleaning, and an elastic net method is used for the selection of data or variables. For programming N-stage machine learning, we use the LUA procedure, which is not only efficient but also enables an easily readable iteration algorithm to be developed. Note that we use well-known machine learning methods that are implementable with SAS® 9.4, SAS® In-Memory Statistics, and so on.
Ryo Kiguchi, Shionogi & Co., LTD
Eri Sakai, Shionogi & Co., LTD
Yoshitake Kitanishi, Shionogi & Co., LTD
Akio Tsuji, Shionogi & Co., LTD
Graphics are an excellent way to display results from multiple statistical analyses and get a visual message across to the right audience. Scientific journals often have very precise requirements for graphs that are submitted with manuscripts. While authors often find themselves using tools other than SAS® to create these graphs, the combination of the SGPLOT procedure and the Output Delivery System enables authors to create what they need in the same place where they conducted their analysis. This presentation focuses on two methods for creating a publication-quality graphic in SAS® 9.4 and provides solutions for some issues encountered when doing so.
Charlotte Baker, Florida A&M University
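One way (of several) to route a sized, high-resolution figure to an image file is sketched below; the dimensions, DPI, and file names are illustrative assumptions, since journal requirements vary.

   ods _all_ close;
   ods listing gpath='C:\figures' image_dpi=300;
   ods graphics / reset width=5in height=4in imagename='figure1' imagefmt=png;

   proc sgplot data=sashelp.class noautolegend;
      scatter x=height y=weight;
      reg x=height y=weight / nomarkers;
      xaxis label='Height (in)';
      yaxis label='Weight (lb)';
   run;

   ods listing close;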
Creating your first suite of reports using SAS® Visual Analytics is like being a kid in a candy store: with so many options for data visualization, it is difficult to know where to start. Having a plan for implementation can save you a lot of time in development and beyond, especially when you are wrangling big data. This paper helps you make sure that you are parallelizing work (where possible), maximizing your data insights, and creating a polished end product. We provide guidelines for common questions, such as 'How many objects are too many?' or 'When should I use multiple tabs versus report linking?', to start any data visualizer off on the right foot.
Elena Snavely, SAS
Organizations that create and store personally identifiable information (PII) are often required to de-identify sensitive data to protect an individual's privacy. There are multiple methods in SAS® that can be used to de-identify PII, depending on data types and encryption needs. The first method is to apply crosswalk mapping by linking a data set that contains PII to a secured data set that contains the PII and its corresponding surrogate. The surrogate then replaces the PII in the original data set. A second method is SAS encryption, which involves translating PII into an encrypted string using SAS functions. This could be a one-byte-to-one-byte swap or a one-byte-to-two-byte swap. The third method is in-database encryption, which encrypts the PII in a data warehouse, such as Oracle or Teradata, using SAS tools before any information is imported into SAS for users to see. This paper discusses the advantages and disadvantages of these three methods, provides sample SAS code, and describes the corresponding methods to decrypt the encrypted data.
Shuhua Liang, Kaiser Permanente
Zoe Bider-Canfield, Kaiser Permanente
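A minimal sketch of the first (crosswalk mapping) method is shown below; the clinical and crosswalk data sets and the ssn and surrogate_id variables are hypothetical.

   /* Replace the PII identifier with its surrogate from a secured crosswalk */
   proc sort data=clinical;  by ssn; run;
   proc sort data=crosswalk; by ssn; run;

   data deidentified;
      merge clinical(in=a) crosswalk(keep=ssn surrogate_id);
      by ssn;
      if a;
      drop ssn;          /* the surrogate now stands in for the PII */
   run;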
Research using electronic health records (EHR) is emerging, but questions remain about the completeness of the data, due in part to the time physicians have available to enter data in all fields. This presentation demonstrates the use of SAS® Enterprise Miner to predict the completeness of clinical data, using claims data as the standard 'source of truth' against which to compare it. A method for assessing and predicting the completeness of clinical data is presented using the tools and techniques of SAS Enterprise Miner. Some of the topics covered include: tips for preparing your sample data set for use in SAS Enterprise Miner; tips for preparing your sample data set for modeling, including effective use of the Input Data, Data Partition, Filter, and Replacement nodes; and building predictive models using the StatExplore, Decision Tree, Regression, and Model Compare nodes.
Catherine Olson, Optum
Thomas Horstman, Optum
In a randomized study, subjects are randomly assigned to either a treated group or a control group. Random assignment ensures that the distribution of the covariates is the same in both groups and that the treatment effect can be estimated by directly comparing the outcomes for the subjects in the two groups. In contrast, subjects in an observational study are not randomly assigned. In order to establish causal interpretations of the treatment effects in observational studies, special statistical approaches that adjust for the covariate confounding are required to obtain unbiased estimation of causal treatment effects. One strategy for correctly estimating the treatment effect is based on the propensity score, which is the conditional probability of the treatment assignment given the observed covariates. Prior to the analysis, you use propensity scores to adjust the data by weighting observations, stratifying subjects that have similar propensity scores, or matching treated subjects to control subjects. This paper reviews propensity score methods for causal inference and introduces the PSMATCH procedure, which is new in SAS/STAT® 14.2. The procedure provides methods of weighting, stratification, and matching. Matching methods include greedy matching, matching with replacement, and optimal matching. The procedure assesses covariate balance by comparing distributions between the adjusted treated and control groups.
Yang Yuan, SAS
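A minimal PROC PSMATCH call illustrating greedy matching and balance assessment might look like the following; the cohort data set and its variables are hypothetical.

   proc psmatch data=cohort region=allobs;
      class treat sex;
      psmodel treat(Treated='1') = age sex bmi comorbidity_cnt;
      match method=greedy(k=1) distance=lps caliper=0.25;
      assess lps var=(age bmi comorbidity_cnt);       /* covariate balance */
      output out(obs=match)=matched matchid=_matchid; /* matched pairs only */
   run;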
SAS® In-Memory Analytics for Hadoop is an analytical programming environment that enables a user to perform many components of an analytics project in a single environment, rather than switching between different applications. Users can easily prepare raw data for different types of analytics procedures. These techniques explore the data to enhance the information extracted. Users can apply a large variety of statistical and machine learning techniques to the data to compare different analytical approaches. The model comparison capabilities let them quickly find the best model, which they can deploy and score in the Hadoop environment. All of these different components of the analytics project are supported in a distributed in-memory environment for lightning-fast processing. This paper highlights tips for working with the interaction between Hadoop data and the SAS® LASR Analytic Server. It contains multiple scenarios with elementary but pragmatic approaches that enable SAS® programmers to work efficiently within the SAS® In-Memory Analytics environment.
Venkateswarlu Toluchuri, United HealthCare Group
Mediation analysis is a statistical technique for investigating the extent to which a mediating variable transmits the relation of an independent variable to a dependent variable. Because it is useful in many fields, there have been rapid developments in statistical mediation methods. The most cutting-edge statistical mediation analysis focuses on the causal interpretation of mediated effect estimates. Cause-and-effect inferences are particularly challenging in mediation analysis because of the difficulty of randomizing subjects to levels of the mediator (MacKinnon, 2008). The focus of this paper is how incorporating longitudinal measures of the mediating and outcome variables aids in the causal interpretation of mediated effects. This paper provides useful SAS® tools for designing adequately powered studies to detect the mediated effect. Three SAS macros were developed using the powerful but easy-to-use REG, CALIS, and SURVEYSELECT procedures to do the following: (1) implement popular statistical models for estimating the mediated effect in the pretest-posttest control group design; (2) conduct a prospective power analysis for determining the required sample size for detecting the mediated effect; and (3) conduct a retrospective power analysis for studies that have already been conducted, when the sample size required to detect an observed effect is desired. We demonstrate the use of these three macros with an example.
David MacKinnon, Arizona State University
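These are not the authors' macros, but the basic single-mediator model they build on can be sketched with two PROC REG steps (the study data set and the x, m, and y variables are hypothetical).

   /* Path a: effect of the independent variable X on the mediator M */
   proc reg data=study;
      model m = x;
   run;
   quit;

   /* Path b (and direct effect c'): regress the outcome Y on M and X */
   proc reg data=study;
      model y = m x;
   run;
   quit;

The mediated (indirect) effect is then estimated as the product of the coefficient of x from the first model and the coefficient of m from the second; PROC CALIS can fit both equations jointly and provide a standard error for that product.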
Hospital Information Technologists are faced with a dilemma: how to get the many pharmacy databases, dynamic data sets, and software systems to communicate with each other and generate useful, automated, real-time output. SAS® serves as a unifying tool for our hospital pharmacy. It brings together data from multiple sources, generates output in multiple formats, analyzes trends, and generates summary reports to meet workload, quality, and regulatory requirements. Data sets originate from multiple sources, including drug and device wholesalers, web-based drug information systems, dumb machine output, pharmacy drug-dispensing platforms, hospital administration systems, and others. SAS output includes CSV files that can be read by dispensing machines, report output for Pharmacy and Therapeutics committees, graphs to summarize year-to-year dispensing and quality trends, emails to customers with inventory and expiry date notifications, investigational drug information summaries for hospital staff, inventory trending with restock alerts, and quality assurance summary reports. For clinical trial support, additional output includes randomization codes, data collection forms, blinded enrollment summaries, study subject assignment lists, and others. For business operations, output includes invoices, shipping documents, and customer metrics. SAS brings our pharmacy information systems together and supports an efficient, cost-effective, flexible, and reliable workflow.
Robert MacArthur, Rockefeller University
Arman Altincatal, Evidera
Ensemble models have become increasingly popular in boosting prediction accuracy over the last several years. Stacked ensemble techniques combine predictions from multiple machine learning algorithms and use these predictions as inputs to a second-level learning algorithm. This paper shows how you can generate a diverse set of models by various methods (such as neural networks, extreme gradient boosting, and matrix factorizations) and then combine them with popular stacking ensemble techniques, including hill-climbing, generalized linear models, gradient boosted decision trees, and neural nets, by using both the SAS® 9.4 and SAS® Visual Data Mining and Machine Learning environments. The paper analyzes the application of these techniques to real-life big data problems and demonstrates how using stacked ensembles produces greater prediction accuracy than individual models and naïve ensembling techniques. In addition to training a large number of models, model stacking requires the proper use of cross validation to avoid overfitting, which makes the process even more computationally expensive. The paper shows how to deal with the computational expense and efficiently manage an ensemble workflow by using parallel computation in a distributed framework.
Funda Gunes, SAS
Russ Wolfinger, SAS
Pei-Yi Tan, SAS
Survival analysis differs from other types of statistical analysis, including graphical summaries and regression modeling procedures, because data is almost always censored. The purpose of this project is to apply survival analysis techniques in SAS® to practical survival data, aiming to understand the effects of gender and age on lung cancer patient survival at different cancer sites. Results show that both gender and age are significant variables in predicting lung cancer patient survival using the Cox proportional hazards model. Females have better survival than males when other variables in the model are fixed (p-value 0.0254). Moreover, the hazard of patients who are over 65 is 1.385 times that of patients who are under 65 (p-value 0.0145).
Yan Wang, Kennesaw State University
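The kind of Cox model described above can be sketched as follows; the lung data set and its survival_months, vital_status, gender, and age_group variables are hypothetical stand-ins for the study data.

   /* Cox proportional hazards model for lung cancer survival              */
   /* survival_months = follow-up time; vital_status=0 indicates censoring */
   proc phreg data=lung;
      class gender (ref='Male') age_group (ref='Under 65') / param=ref;
      model survival_months * vital_status(0) = gender age_group / risklimits;
   run;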
Have you ever used a control chart to assess the variation in a process? Did you wonder how you could modify the chart to tell a more complete story about the process? This paper explains how you can use the SHEWHART procedure in SAS/QC® software to make the following enhancements: display multiple sets of control limits that visualize the evolution of the process, visualize stratified variation, explore within-subgroup variation with box-and-whisker plots, and add information that improves the interpretability of the chart. The paper begins by reviewing the basics of control charts and then illustrates the enhancements with examples drawn from real-world quality improvement efforts.
Bucky Ransdell, SAS
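Before any of the enhancements, a basic PROC SHEWHART step of the kind the paper starts from might look like this; the measurements data set and its thickness and batch variables are hypothetical.

   proc shewhart data=measurements;
      xschart thickness * batch;    /* Xbar and S charts by subgroup */
      boxchart thickness * batch;   /* within-subgroup box-and-whisker chart */
   run;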
Socioeconomic status (SES) is a major contributor to health disparities in the United States. Research suggests that those with a low SES, compared with a high SES, are more likely to have lower life expectancy; participate in unhealthy behaviors such as smoking and alcohol consumption; experience higher rates of depression, childhood obesity, and ADHD; and experience problems accessing appropriate health care. Interpreting SES can be difficult due to the complexity of data, multiple data sources, and the large number of socioeconomic and demographic measures available. When SES is expanded to include additional social determinants of health (SDOH) such as language barriers and transportation barriers to care; access to employment and affordable housing; adequate nutrition, family support, and social cohesion; health literacy; crime and violence; quality of housing; and other environmental conditions, the ability to measure and interpret the concept becomes even more difficult. This paper presents an approach to measuring SES and SDOH using publicly available data. Various statistical modeling techniques are used to define state-specific composite SES scores at the local-area level (ZIP Code and Census Tract). Once developed, the SES/SDOH models are applied to health care claims data to evaluate the relationship between health services utilization, cost, and social factors. The analysis includes a discussion of the potential impact of social factors on population risk adjustment.
Paul LaBrec, 3M Health Information Systems, Inc.
Ryan Butterfield, DrPH, 3M HIS
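One common way to build a composite area-level score, shown here only as an illustrative sketch, is to take the first principal component of standardized socioeconomic measures; ACS_ZIP and the measure names below are hypothetical placeholders and do not reflect the authors' actual model.

   /* First principal component as a simple composite SES score
      (placeholder data set and variable names) */
   proc princomp data=acs_zip n=1 out=ses_scores prefix=SES_;
      var median_income pct_poverty pct_unemployed pct_no_hs_diploma;
   run;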
Every visualization tells a story. This paper demonstrates the effectiveness of visualization by telling stories about differences in US mortality with the National Longitudinal Mortality Study (NLMS) data, drawn from the Public-Use Microdata Samples (PUMS) of 1.2 million cases and 122 thousand mortality records. SAS® Visual Analytics is a versatile and flexible tool that easily displays the simple effects of differences in mortality rates between age groups, genders, races, places of birth (native or foreign), education and income levels, and so on. Sophisticated analyses, including logistic regression (with interactions), decision trees, and neural networks, are displayed in a clear, concise manner and help describe more interesting relationships among the variables that influence mortality. Some of the most compelling examples: males who live alone have a higher mortality rate than females, and white men have higher rates of suicide than black men.
Catherine Loveless-Schmitt, U.S. Census Bureau
You might scream in pain or cry with joy that SAS® software can directly produce output in Microsoft Excel as .xlsx workbooks. Excel is an excellent vehicle for delivering large amounts of summary information that needs to be partitioned for human review, exploratory filtering, and sorting. SAS supports ODS EXCEL as a production destination. This paper discusses using the ODS EXCEL statement and the TABULATE and REPORT procedures in the domain of summarizing cross-sectional data extracted from a medical claims database. The discussion covers data preparation, report preparation, and tabulation statements such as CLASS, CLASSLEV, and TABLE. The effects of STYLE options and the TAGATTR suboption for inserting Excel-specific features, such as formulas, formats, and alignment, are covered in detail. A short discussion of reusing these concepts in PROC REPORT statements such as DEFINE, COMPUTE, and CALL DEFINE is also included.
Richard DeVenezia, Johnson & Johnson
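A minimal sketch of the pattern described above appears below; CLAIMS, REGION, PRODUCT, and PAID_AMT are hypothetical placeholder names, and the TAGATTR value simply passes an Excel display format through to the workbook.

   /* Summary workbook via ODS EXCEL and PROC TABULATE (placeholder names) */
   ods excel file='claims_summary.xlsx'
             options(sheet_name='Summary' embedded_titles='yes');

   proc tabulate data=claims;
      class region product;
      var paid_amt / style={tagattr='format:#,##0.00'};
      table region*product, paid_amt*(n sum mean);
   run;

   ods excel close;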
For a freshman at a large university, life can be fun as well as stressful. The choices a freshman makes while in college might impact his or her overall health. In order to examine the overall health and different behaviors of students at Oklahoma State University, a survey was conducted among the freshmen students. The survey focused on capturing psychological and environmental factors, diet, exercise, and alcohol and drug use among students. A total of 795 out of 1,036 freshman students completed the survey, which included around 270 questions that covered the range of issues mentioned above. An exploratory factor analysis identified 26 factors. For example, two factors that relate to the behavior of students under stress are eating and relaxing. Further understanding the variables that contribute to alcohol and drug use might help the university in planning appropriate interventions and prevention efforts. Factor analysis with Cronbach's alpha provided insight into a more defined set of variables to help address these types of issues. We used SAS® to perform the factor analysis, to create clusters of students with unique characteristics, and to profile these clusters.
Mohit Singhi, Oklahoma State University
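The following sketch shows the general shape of such an analysis; SURVEY, Q1-Q270, and the item subsets are hypothetical placeholder names rather than the actual survey items.

   /* Exploratory factor analysis with factor scores written to SCORES */
   proc factor data=survey method=principal rotate=varimax nfactors=26
               scree out=scores;
      var q1-q270;
   run;

   /* Cronbach's alpha for the items loading on one candidate factor */
   proc corr data=survey alpha nomiss;
      var q10 q11 q12 q15;
   run;

   /* k-means clustering of students on the retained factor scores */
   proc fastclus data=scores maxclusters=5 out=clusters;
      var factor1-factor26;
   run;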
As the IT industry moves to further embrace cloud computing and the benefits it enables, many companies have been slow to adopt these changes due to concerns around data compliance. Compliance with state and federal law and the relevant regulations often leads decision makers to insist that systems dealing with protected health information or similarly sensitive data remain on-premises, as the risks for non-compliance are so high. In this session, we detail BNL Consulting's standard practices for transitioning solutions that are compliant with the Health Insurance Portability and Accountability Act (HIPAA) from on-premises to a cloud-based environment hosted by Amazon Web Services (AWS). We explain that by following best practices and doing plenty of research, HIPAA compliance in a cloud environment is no more challenging than compliance in an on-premises environment. We discuss the role of best-practice DevOps tools such as Docker, Consul, and the ELK Stack, which improve the reliability and the repeatability of your HIPAA-compliant solutions. We tie these recommendations to the use of common SAS tools and show how they can work in concert to stabilize and improve the performance of the solution over the on-premises alternatives. Although this presentation is focused on health care and HIPAA-specific examples, many of the described practices and processes apply to any sensitive-data solutions that are being considered for the cloud.
Jay Baker, BNL Consulting
Access to care for Medicaid beneficiaries is a topic of frequent study and debate. Section 1202 of the Affordable Care Act (ACA) requires states to raise Medicaid primary care payment rates to Medicare levels in 2013 and 2014. The federal government paid 100% of the increase. This program was designed to encourage primary care providers to participate in Medicaid, since this has long been a challenge for Medicaid. Whether this fee increase has increased access to primary care providers is still debated. Using SAS®, we evaluated whether Medicaid patients have a higher incidence of non-urgent visits to local emergency departments (ED) than do patients with other payment sources. The National Hospital Ambulatory Medical Care Survey (NHAMCS) data set, obtained from the Centers for Disease Control (CDC), was selected, since it contains data relating to hospital emergency departments. This emergency room data, for the years 2003–2011, was analyzed by diagnosis, expected payment method, reason for the visit, region, and year. To evaluate whether the ED visits were considered urgent or non-urgent, we used the NYU Billings algorithm for classifying ED utilization (NYU Wagner 2015). Three models were used for the analyses: binary classification, multi-class classification, and regression. In addition to finding no regional differences, decision trees and SAS® Visual Analytics revealed that Medicaid patients do not have a higher rate of non-emergent visits when compared to other payment types.
Bradley Casselman, CSA
Taylor Larkin, The University of Alabama
Denise McManus, The University of Alabama
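For readers who want to reproduce the general approach, a decision tree for the binary (non-urgent versus urgent) classification could be sketched as follows; ED_VISITS, NONURGENT, PAYER, REGION, YEAR, and AGE are hypothetical placeholder names, not the NHAMCS variables the authors used.

   /* Decision tree for non-urgent ED visits (placeholder names) */
   proc hpsplit data=ed_visits seed=12345;
      class nonurgent payer region;
      model nonurgent(event='1') = payer region year age;
      partition fraction(validate=0.3);
      grow entropy;
      prune costcomplexity;
   run;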
One of the research goals in public health is to estimate the burden of diseases on the US population. We describe burden of disease by analyzing the statistical association of various diseases with hospitalizations, emergency department (ED) visits, ambulatory/outpatient (doctors' offices) visits, and deaths. In this short paper, we discuss the use of large, nationally representative databases, such as those offered by the National Center for Health Statistics (NCHS) or the Agency for Healthcare Research and Quality (AHRQ), to produce reliable national estimates of disease burden. In this example, we use SAS® and SUDAAN to analyze the Nationwide Emergency Department Sample (NEDS), offered by AHRQ, to estimate ED visits for hand, foot, and mouth disease (HFMD) in children less than five years old.
Jessica Rudd, Kennesaw State University
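Because NEDS is a complex-survey sample, national estimates require the design variables and discharge weights; the sketch below uses PROC SURVEYFREQ with placeholder names (NEDS_SUB, HOSP_ED, NEDS_STRATUM, DISCWT, HFMD_FLAG) modeled on typical HCUP design variables rather than on the authors' actual program.

   /* Weighted national estimate of ED visits for a condition flag
      (placeholder data set and variable names) */
   proc surveyfreq data=neds_sub;
      cluster hosp_ed;
      strata  neds_stratum;
      weight  discwt;
      tables  hfmd_flag / cl;
   run;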
Bivariate Cox proportional hazards models are used when we test the association between a single covariate and the outcome, repeating the test for each covariate of interest. SAS® uses the last category as the default reference level. This raises problems when we want to keep using 0 as our reference for each covariate. The reference group can be changed in the CLASS statement, but if a format is associated with a covariate, we have to supply the formatted value rather than the raw numeric value. The problem becomes even worse when we have to repeat the test and manually enter the reference level every single time. This presentation demonstrates one way of fixing the problem using the macro facility and the CALL SYMPUT routine.
Zhongjie Cai, University of Southern California
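One minimal version of that idea, assuming a hypothetical data set ANALYSIS, time and censoring variables TIME and STATUS, a covariate TRT, and a format TRTFMT., looks like this: CALL SYMPUTX stores the formatted text for the raw value 0, and the macro reuses it as the reference level.

   %macro bicox(var=, fmt=);
      /* Look up the formatted text for raw value 0 (placeholder names) */
      data _null_;
         call symputx('refval', put(0, &fmt.));
      run;

      /* Bivariate Cox model with 0 (formatted) as the reference level */
      proc phreg data=analysis;
         class &var.(ref="&refval.") / param=ref;
         model time*status(0) = &var. / risklimits;
         format &var. &fmt.;
      run;
   %mend bicox;

   %bicox(var=trt, fmt=trtfmt.)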
Visualization of complex data can be a valuable tool for researchers and policy makers, and Base SAS® has powerful tools for such data exploration. In particular, SAS/GRAPH® software is a flexible tool that enables the analyst to create a wide variety of data visualizations. This paper uses SAS® to visualize complex demographic data related to the membership of a large American health care provider. Kaiser Permanente (KP) has demographic data about 4 million active members in Southern California. We use SAS to create a number of geographic visualizations of KP demographic data related to membership at the census-block level of detail and higher. Demographic data available from the US Census Bureau's American Community Survey (ACS) at the same level of geographic organization are also used as comparators to show how the KP membership differs from the demographics of the geographies from which it draws. In addition, we use SAS to create a number of visualizations of KP demographic data related to utilizations (inpatient and outpatient) at the medical center area level through time. As with the membership data, data available from the ACS are used as comparators to show how patterns of KP utilizations at various medical centers compare to the demographics of the populations that these medical centers serve. The paper will be of interest to programmers learning how to use SAS to visualize data and to researchers interested in the demographics of one of the largest health care providers in the US.
Don McCarthy, Kaiser Permanente
Michael Santema, Kaiser Permanente
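As a simplified illustration at the county level (the paper itself works at the census-block level and above), a choropleth of a membership measure with SAS/GRAPH might look like the sketch below; MEMBER_CTY and MEMBERS_PER_1000 are hypothetical placeholder names.

   /* County-level choropleth of a membership measure (placeholder names;
      STATE=6 restricts the map to California) */
   proc gmap data=member_cty map=maps.counties(where=(state=6));
      id state county;
      choro members_per_1000 / levels=5 coutline=gray;
   run;
   quit;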
Health care has long been focused on providing reactive care for illness, injury, or chronic conditions. But the rising cost of providing health care has forced many countries, health insurance payers, and health care providers to shift approaches. A new focus on patient value includes providing financial incentives that emphasize clinical outcomes instead of treatments. This focus also means that providers and wellness programs are required to take a segmentation approach to the population under their care, targeting specific people based on their individual risks. This session discusses the benefits of a shift from thinking about health care data as a series of clinical or financial transactions to thinking about it as centered on patients and their respective clinical conditions. This approach allows for insights pertaining to care delivery processes and treatment patterns, including identification of potentially avoidable complications, variations in care provided, and inefficient care that contributes to waste, all of which contribute to poor clinical outcomes.
Laurie Rose, SAS
Dan Stevens, SAS
In 2013, the Centers for Medicare & Medicaid Services (CMS) changed the pharmacy mail-order member-acquisition process so that Humana Pharmacy may only call a member with cost savings greater than $2.00 to educate the member on the potential savings and instruct the member to call back. The Rx Education call center asked for analytics work to help prioritize member outreach, improve conversions, and decrease the number of members who are unable to be contacted. After a year of contacting members using this additional insight, the conversion-after-agreement rate rose from 71.5% to 77.5%, and the unable-to-contact rate fell from 30.7% to 17.4%. This case study takes you on an analytics journey from the initial problem diagnosis and analytics solution through subsequent refinements and test-and-learn campaigns.
Brian Mitchell, Humana Inc.