Health Care Papers A-Z

A
Session 8160-2016:
A Descriptive Analysis of Reported Health Issues in Rural Jamaica
There are currently thousands of Jamaican citizens who lack access to basic health care. In order to improve the health-care system, I collect and analyze data from two clinics in remote locations of the island. This report analyzes data collected from Clarendon Parish, Jamaica. To create the descriptive analysis, I use SAS® Studio running SAS® 9.4. The procedures I use include PROC IMPORT, PROC MEANS, PROC FREQ, and PROC GCHART. After running these procedures, I am able to produce a descriptive analysis of the health issues plaguing the island.
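A minimal sketch of this kind of descriptive workflow (file, data set, and variable names are hypothetical):

   proc import datafile="clarendon_clinic.xlsx" out=work.visits
               dbms=xlsx replace;
   run;

   proc freq data=work.visits;
      tables reported_issue / nocum;   /* frequency of each reported health issue */
   run;

   proc means data=work.visits n mean std min max;
      var age;                         /* summary statistics for patient age */
   run;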
Read the paper (PDF) | View the e-poster or slides (PDF)
Verlin Joseph, Florida A&M University
Session SAS3020-2016:
A Prompted Application to Easily Create Forest Plots and Inner Margin Tables
Two of the powerful features of ODS Graphics procedures are the ability to create forest plots and to add inner margin tables to graphics output. The drawback, however, is that the PROC TEMPLATE syntax required of the programmer is complex and tedious. A prompted application, or even a parameterized stored process, that connects PROC TEMPLATE code to a point-and-click application definitely makes life easier for coders in many industries who frequently create these types of graphic output.
Read the paper (PDF) | Download the data file (ZIP)
Ted Durie, SAS
Session SAS5642-2016:
A Ringside Seat: The ODS Excel Destination versus the ODS ExcelXP Tagset
The new and highly anticipated SAS® Output Delivery System (ODS) destination for Microsoft Excel is finally here! Available as a production feature in the third maintenance release of SAS® 9.4 (TS1M3), this new destination generates native Excel (XLSX) files that are compatible with Microsoft Office 2010 or later. This paper is written for anyone, from entry-level programmers to business analysts, who uses the SAS® System and Microsoft Excel to create reports. The discussion covers features and benefits of the new Excel destination, differences between the Excel destination and the older ExcelXP tagset, and functionality that exists in the ExcelXP tagset that is not available in the Excel destination. These topics are all illustrated with meaningful examples. The paper also explains how you can bridge the gap that exists as a result of differences in the functionality between the destination and the tagset. In addition, the discussion outlines when it is beneficial for you to use the Excel destination versus the ExcelXP tagset, and vice versa. After reading this paper, you should be able to make an informed decision about which tool best meets your needs.
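A minimal side-by-side sketch of the two destinations discussed (output file names are arbitrary):

   /* New ODS Excel destination: writes native XLSX files */
   ods excel file="report.xlsx" options(sheet_name="Class");
   proc print data=sashelp.class noobs; run;
   ods excel close;

   /* Older ExcelXP tagset: writes XML that Excel can open */
   ods tagsets.excelxp file="report.xml" options(sheet_name="Class");
   proc print data=sashelp.class noobs; run;
   ods tagsets.excelxp close;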
Read the paper (PDF) | Watch the recording
Chevell Parker, SAS
Session 8000-2016:
A SAS® Macro for Geographically Weighted Negative Binomial Regression
Geographically Weighted Negative Binomial Regression (GWNBR) was developed by Silva and Rodrigues (2014). It is a generalization of the Geographically Weighted Poisson Regression (GWPR) proposed by Nakaya and others (2005) and of the Poisson and negative binomial regressions. This paper presents a SAS® macro, written in SAS/IML® software, that estimates the GWNBR model, and it shows how to use the GMAP procedure to draw the maps.
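The mapping step might look like the following minimal sketch (the estimates data set, its variables, and the use of the maps.brazil map data set are assumptions, not the paper's code):

   proc gmap data=work.gwnbr_estimates map=maps.brazil;
      id id;                          /* links estimates to map polygons */
      choro local_estimate / levels=5;
   run;
   quit;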
Read the paper (PDF) | Download the data file (ZIP)
Alan Silva, University of Brasilia
Thais Rodrigues, University of Brasilia
Session 11620-2016:
A SAS® Program to Analyze High Dimensional Sparse Counts Data Using a Lumping and Splitting Approach
It is common for hundreds or even thousands of clinical endpoints to be collected from individual subjects, but events from the majority of clinical endpoints are rare. The challenge of analyzing high dimensional sparse data is in balancing analytical consideration for statistical inference and clinical interpretation with precise meaningful outcomes of interest at intra-categorical and inter-categorical levels. Lumping or grouping similar rare events into a composite category has the statistical advantage of increasing testing power, reducing multiplicity size, and avoiding competing risk problems. However, too much or inappropriate lumping would jeopardize the clinical meaning of interest and external validity, whereas splitting or keeping each individual event at its basic clinically meaningful category can overcome the drawbacks of lumping. Splitting, however, might create analytical issues of increasing type II errors, multiplicity size, competing risks, and having a large proportion of endpoints with rare events. It seems that lumping and splitting are diametrically opposite approaches, but in fact, they are complementary. Both are essential for high dimensional data analysis. This paper describes the steps required for the lumping and splitting analysis and presents SAS® code that can be used to implement each step.
View the e-poster or slides (PDF)
Shirley Lu, VAMHCS, Cooperative Studies Program
Session 5800-2016:
A Short Introduction to Longitudinal and Repeated Measures Data Analyses
Longitudinal and repeated measures data are seen in nearly all fields of analysis. Examples of such data include weekly lab test results of patients or the test-score performance of children from the same class. Statistics students and analysts alike might be overwhelmed when it comes to repeated measures or longitudinal data analyses. They might try to educate themselves by diving into textbooks or taking semester-long or intensive weekend courses, resulting in even more confusion. Some might try to ignore the repeated nature of the data and take shortcuts, such as analyzing all data as independent observations or analyzing summary statistics such as averages or changes from first to last points while ignoring all the data in between. This hands-on presentation introduces longitudinal and repeated measures analyses without heavy emphasis on theory. Students in the workshop will have the opportunity to get hands-on experience graphing longitudinal and repeated measures data. They will learn how to approach these analyses with tools like PROC MIXED and PROC GENMOD. Emphasis will be on continuous outcomes, but categorical outcomes will briefly be covered.
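As a taste of the tools covered, a minimal PROC MIXED sketch for weekly lab results (data set and variable names are hypothetical):

   proc mixed data=work.labs;
      class patient treatment week;
      model result = treatment week treatment*week;
      repeated week / subject=patient type=cs;  /* models within-patient correlation */
   run;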
Read the paper (PDF)
Leanne Goldstein, City of Hope
Session 11721-2016:
A Text Analytics Approach for Unraveling Trends in Government Funding for Small Businesses
Government funding is a highly sought-after financing medium by entrepreneurs and researchers aspiring to make a breakthrough in the market and compete with larger organizations. The funding is channeled via federal agencies that seek out promising research and products that improve the extant work. This study uses a text analytics approach to analyze the project abstracts that earned government funding over the past three and a half decades in the fields of Defense and Health and Human Services, helping us understand the factors that might be responsible for awards made to a project. We collected 100,279 records from the Small Business Innovation and Research (SBIR) website (https://www.sbir.gov). Initially, we analyze the trends in government funding of small-business research. Then we perform text mining on the 55,791 abstracts of projects funded by the Department of Defense and the Department of Health. From the text mining, we portray the key topics and patterns related to past research and determine the changes in their trends over time. Through our study, we highlight research trends to enable organizations to shape their business strategy and make better investments in future research.
View the e-poster or slides (PDF)
Sairam Tanguturi, Oklahoma State University
Sagar Rudrake, Oklahoma State University
Nithish Reddy Yeduguri, Oklahoma State University
Session 5483-2016:
Accelerate the Time to Value by Implementing a Semantic Layer Using SAS® Visual Analytics
Although business intelligence experts agree that empowering businesses through a well-constructed semantic layer has undisputed benefits, a successful implementation has always been a formidable challenge. This presentation highlights the best practices to follow and mistakes to avoid, leading to a successful semantic layer implementation by using SAS® Visual Analytics. A correctly implemented semantic layer provides business users with quick and easy access to information for analytical and fact-based decision-making. Today, everyone talks about how the modern data platform enables businesses to store and analyze big data, but we still see most businesses trying to generate value from the data that they already store. From self-service to data visualization, business intelligence and descriptive analytics are still the key requirements for any business, and we discuss how to use SAS Visual Analytics to address them all. We also describe the key considerations in strategy, people, process, and data for a successful semantic layer rollout that uses SAS Visual Analytics.
Read the paper (PDF)
Arun Sugumar, Kavi Associates
Vignesh Balasubramanian, Kavi Global
Harsh Sharma, Kavi Global
Session 12527-2016:
An Analysis of Medicare Provider Utilization and Payment Data: A Focus on the Top 5 DRGs and Mental Health Care
In an effort to increase transparency and accountability in the US health care system, the Obama administration mandated the Centers for Medicare & Medicaid Services (CMS) to make available data for use by researchers and interested parties from the general public. Among the more well-known uses of this data are analyses published by the Wall Street Journal showing a large, and in some cases shocking, discrepancy between what hospitals potentially charge the uninsured and what they are paid by Medicare for the same procedure. Analyses such as these highlight both potential inequities in the US health care system and, more importantly, potential opportunities for its reform. However, while capturing the public imagination, analyses such as these are but one means to capitalize on the remarkable wealth of information this data provides. Specifically, the publicly distributed CMS data can help both researchers and the public better understand the burden that specific conditions and medical treatments place on the US health care system. It was this simple but important objective that motivated the present study. Our analyses focus on two questions we believe to be important. First, using the total number of hospital discharges as a proxy for the incidence of a condition or treatment, which conditions and treatments have the highest incidence rates nationally? Does their incidence remain stable, or is it increasing or decreasing? And is there variability in these incidence rates across states? Second, as psychologists, we are necessarily interested in understanding the state of mental health care. To date, and to the best of our knowledge, there has been no study utilizing the public inpatient Medicare provider utilization and payment data set to explore the utilization of mental illness services funded by Medicare.
Read the paper (PDF)
Joo Ann Lee, York University
Michael Friendly, York University
Cathy Labrish, York University
Session 9542-2016:
An Efficient Algorithm in SAS® to Reconcile Two Different Serious Adverse Event (SAE) Data Sources
In the regulatory world of patient safety and pharmacovigilance, whether during clinical trials or post-market surveillance, SAEs that affect participants must be collected and, if certain criteria are met, reported to the FDA and other regulatory authorities. SAEs are often entered into multiple databases by various users, resulting in possible data discrepancies and quality loss. Efforts have been made to reconcile the SAE data between databases, but there is no industry standard regarding the methodology or tool employed for this task. Some organizations still reconcile the data manually, with visual inspections and vocal verification. Not only is this laborious and error-prone, it becomes prohibitive when the data reach hundreds of records. We devised an efficient algorithm using SAS® to compare two data sources automatically. Our algorithm identifies matched, discrepant, and unpaired SAE records. Additionally, it employs a user-supplied list of synonyms to find non-identical but relevant matches. First, the two data sources are combined and sorted by key fields such as Subject ID, Onset Date, Stop Date, and Event Term. Record counts and Levenshtein edit distances are calculated within certain groups to assist with sorting and matching. This combined record list is then fed into a DATA step that decides whether each record is paired or unpaired. For an unpaired record, a stub record with all fields set to "?" is generated as a matching placeholder. Each record is written to one of two data sets. Later, the data sets are tagged and pulled into comparison logic using hash objects, which enables field-by-field comparison and displays discrepancies in a clean format for each field. Identical fields or columns are cleared or removed for clarity. The result is a streamlined and user-friendly process that allows for fast and easy SAE reconciliation.
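A simplified sketch of the hash-object comparison step (data set, key, and field names are hypothetical; the full algorithm adds synonym lists, stub records, and Levenshtein distances):

   data work.review;
      if _n_ = 1 then do;
         declare hash edc(dataset:
            'work.edc_saes(rename=(stop_date=edc_stop_date))');
         edc.defineKey('subject_id', 'event_term', 'onset_date');
         edc.defineData('edc_stop_date');
         edc.defineDone();
      end;
      length edc_stop_date 8;
      call missing(edc_stop_date);
      set work.safety_saes;                 /* SAEs from the first source */
      if edc.find() = 0 then do;            /* paired record found        */
         if stop_date ne edc_stop_date then status = 'DISCREPANT';
         else status = 'MATCHED';
      end;
      else status = 'UNPAIRED';
   run;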
Read the paper (PDF)
Zhou (Tom) Hui, AstraZeneca
Session SAS3100-2016:
An Efficient Pattern Recognition Approach with Applications
This paper presents supervised and unsupervised pattern recognition techniques that use Base SAS® and SAS® Enterprise Miner™ software. A simple preprocessing technique creates many small image patches from larger images. These patches encourage the learned patterns to have local scale, which follows well-known statistical properties of natural images. In addition, these patches reduce the number of features that are required to represent an image and can decrease the training time that algorithms need in order to learn from the images. If a training label is available, a classifier is trained to identify patches of interest. In the unsupervised case, a stacked autoencoder network is used to generate a dictionary of representative patches, which can be used to locate areas of interest in new images. This technique can be applied to pattern recognition problems in general, and this paper presents examples from the oil and gas industry and from a solar power forecasting application.
Read the paper (PDF)
Patrick Hall, SAS
Alex Chien, SAS Institute
Ilknur Kabul, SAS Institute
Jorge Silva, SAS Institute
Session 11702-2016:
Analyzing Non-Normal Binomial and Categorical Response Variables under Varying Data Conditions
When dealing with non-normal categorical response variables, logistic regression is the robust method to use for modeling the relationship between categorical outcomes and different predictors without assuming a linear relationship between them. Within such models, the categorical outcome might be binary, multinomial, or ordinal, and predictors might be continuous or categorical. Another complexity that might be added to such studies is when data is longitudinal, such as when outcomes are collected at multiple follow-up times. Learning about modeling such data within any statistical method is beneficial because it enables researchers to look at changes over time. This study looks at several methods of modeling binary and categorical response variables within regression models by using real-world data. Starting with the simplest case of binary outcomes through ordinal outcomes, this study looks at different modeling options within SAS® and includes longitudinal cases for each model. To assess binary outcomes, the current study models binary data in the absence and presence of correlated observations under regular logistic regression and mixed logistic regression. To assess multinomial outcomes, the current study uses multinomial logistic regression. When responses are ordered, using ordinal logistic regression is required as it allows for interpretations based on inherent rankings. Different logit functions for this model include the cumulative logit, adjacent-category logit, and continuation ratio logit. Each of these models is also considered for longitudinal (panel) data using methods such as mixed models and Generalized Estimating Equations (GEE). The final consideration, which cannot be addressed by GEE, is the conditional logit to examine bias due to omitted explanatory variables at the cluster level. Different procedures for the aforementioned models within SAS® 9.4 are explored and their strengths and limitations are specified for applied researchers finding similar data characteristics. These procedures include PROC LOGISTIC, PROC GLIMMIX, PROC GENMOD, PROC NLMIXED, and PROC PHREG.
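Two representative sketches from the families of models discussed (data set and variable names are hypothetical):

   /* Ordinal outcome: cumulative logit is the default in PROC LOGISTIC */
   proc logistic data=work.study;
      class treatment / param=ref;
      model severity = treatment age;
   run;

   /* Longitudinal binary outcome: GEE with exchangeable correlation */
   proc genmod data=work.study descending;
      class patient visit treatment;
      model event = treatment visit / dist=binomial link=logit;
      repeated subject=patient / type=exch;
   run;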
Read the paper (PDF)
Niloofar Ramezani, University of Northern Colorado
Session 2380-2016:
Analyzing the Hospital Episode Statistics Data Set: How Much Does SAS® Help?
Hospital Episode Statistics (HES) is a data warehouse that contains records of all admissions, outpatient appointments, and accident and emergency (A&E) attendances at National Health Service (NHS) hospitals in England. Each year it processes over 125 million admitted patient, outpatient, and A&E records. Such a large data set gives endless research opportunities for researchers and health-care professionals. However, patient care data is complex and might be difficult to manage. This paper demonstrates the flexibility and power of SAS® programming tools such as the DATA step, the SQL procedure, and macros to help to analyze HES data.
Read the paper (PDF)
Violeta Balinskaite, Imperial College London
Session 9420-2016:
Application of Data Mining Techniques in Improving Breast Cancer Diagnosis
Breast cancer is the second leading cause of cancer deaths among women in the United States. Although mortality rates have been decreasing over the past decade, it is important to continue to make advances in diagnostic procedures, as early detection vastly improves chances for survival. The goal of this study is to accurately predict the presence of a malignant tumor using data from fine needle aspiration (FNA) with visual interpretation. Compared with other methods of diagnosis, FNA displays the highest likelihood for improvement in sensitivity. Furthermore, this study aims to identify the variables most closely associated with accurate outcome prediction. The study utilizes the Wisconsin Breast Cancer data available within the UCI Machine Learning Repository. The data set contains 699 clinical case samples (65.52% benign and 34.48% malignant) assessing the nuclear features of fine needle aspirates taken from patients' breasts. The study analyzes a variety of traditional and modern models, including logistic regression, decision tree, neural network, support vector machine, gradient boosting, and random forest. Prior to model building, the weights of evidence (WOE) approach was used to account for the high dimensionality of the categorical variables, after which variable selection methods were employed. Ultimately, the gradient boosting model utilizing a principal component variable reduction method was selected as the best prediction model, with a 2.4% misclassification rate, 100% sensitivity, 0.963 Kolmogorov-Smirnov statistic, 0.985 Gini coefficient, and 0.992 ROC index for the validation data. Additionally, the uniformity of cell shape and size, bare nuclei, and bland chromatin were consistently identified as the most important FNA characteristics across variable selection methods. These results suggest that future research should attempt to refine the techniques used to determine these specific model inputs. Greater accuracy in characterizing the FNA attributes will allow researchers to develop more promising models for early detection.
Read the paper (PDF) | View the e-poster or slides (PDF)
Josephine Akosa, Oklahoma State University
Session 11801-2016:
Application of Gradient Boosting through SAS® Enterprise Miner™ to Classify Human Activities
Smart clothing with integrated wearable medical sensors that keep track of human health is now attracting many researchers. However, body movement caused by daily human activities inserts artificial noise into physiological data signals, which affects the output of a health monitoring/alert system. To overcome this problem, recognizing human activities, determining the relationship between activities and physiological signals, and removing noise from the collected signals are essential steps. This paper focuses on the first step: human activity recognition. Our research shows that no other study has used SAS® for classifying human activities. For this study, two data sets were collected from an open repository. Both data sets have 561 input variables and one nominal target variable with four levels. Principal component analysis, along with other variable reduction and selection techniques, was applied to reduce dimensionality in the input space. Several modeling techniques with different optimization parameters were used to classify human activity. The gradient boosting model was selected as the best model based on a test misclassification rate of 0.1233; that is, 87.67% of total events were classified correctly.
Read the paper (PDF) | View the e-poster or slides (PDF)
Minh Pham, Oklahoma State University
Mary Ruppert-Stroescu, Oklahoma State University
Mostakim Tanjil, Oklahoma State University
Session 5720-2016:
Arrays and DO Loops: Applications to Health-Care Diagnosis Fields
Arrays and DO loops have been used for years by SAS® programmers who work with diagnosis fields in health-care data, and the opportunity to use these techniques has only grown since the launch of the Affordable Care Act (ACA) in the United States. Users new to SAS or to the health-care field may find an overview of existing (as well as new) applications helpful. Risk-adjustment software, including the publicly available Health and Human Services (HHS) risk software that uses SAS and was released as part of the ACA implementation, is one example of code that is significantly improved by the use of arrays. Similar projects might include evaluations of diagnostic persistence, comparisons of diagnostic frequency or intensity between providers, and checks for unusual clusters of diagnosed conditions. This session reviews examples suitable for intermediate SAS users, including the application of two-dimensional arrays to diagnosis fields.
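A minimal one-dimensional example of the technique (field names and the ICD-9 prefix are illustrative):

   data work.flagged;
      set work.claims;
      array dx{25} $ dx1-dx25;                  /* diagnosis code fields */
      diabetes = 0;
      do i = 1 to dim(dx);
         if dx{i} in: ('250') then diabetes = 1;   /* any ICD-9 250.xx code */
      end;
      drop i;
   run;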
Read the paper (PDF)
Ryan Ferland, Blue Cross Blue Shield of Arizona
B
Session 7200-2016:
Bayesian Inference for Gaussian Semiparametric Multilevel Models
Bayesian inference for complex hierarchical models with smoothing splines is typically intractable, requiring approximate inference methods for use in practice. Markov Chain Monte Carlo (MCMC) is the standard method for generating samples from the posterior distribution. However, for large or complex models, MCMC can be computationally intensive, or even infeasible. Mean Field Variational Bayes (MFVB) is a fast deterministic alternative to MCMC. It provides an approximating distribution that has minimum Kullback-Leibler distance to the posterior. Unlike MCMC, MFVB efficiently scales to arbitrarily large and complex models. We derive MFVB algorithms for Gaussian semiparametric multilevel models and implement them in SAS/IML® software. To improve speed and memory efficiency, we use block decomposition to streamline the estimation of the large sparse covariance matrix. Through a series of simulations and real data examples, we demonstrate that the inference obtained from MFVB is comparable to that of PROC MCMC. We also provide practical demonstrations of how to estimate additional posterior quantities of interest from MFVB either directly or via Monte Carlo simulation.
Read the paper (PDF) | Download the data file (ZIP)
Jason Bentley, The University of Sydney
Cathy Lee, University of Technology Sydney
Session 3640-2016:
Big Data, Big Headaches: An Agile Modeling Solution Designed for the Information Age
The surge of data and data sources in marketing has created an analytical bottleneck in most organizations. Analytics departments have been pushed into a difficult decision: either purchase black-box analytical tools to generate efficiencies or hire more analysts, modelers, and data scientists. Knowledge gaps stemming from restrictions in black-box tools or from backlogs in the work of analytical teams have resulted in lost business opportunities. Existing big data analytics tools respond well when dealing with large record counts and small variable counts, but they fall short in bringing efficiencies when dealing with wide data. This paper discusses the importance of an agile modeling engine designed to deliver productivity, irrespective of the size of the data or the complexity of the modeling approach.
Read the paper (PDF) | Watch the recording
Mariam Seirafi, Cornerstone Group of Companies
Session 10540-2016:
Bridging the Gap: Importing Health Indicators Warehouse Data into SAS® Visual Analytics Using SAS® Stored Processes and APIs
The Health Indicators Warehouse (HIW) is part of the US Department of Health and Human Services' (DHHS) response to make federal data more accessible. Through it, users can access data and metadata for over 1,200 indicators from approximately 180 federal and nonfederal sources. The HIW also supports data access by applications such as SAS® Visual Analytics through the use of an application programming interface (API). An API serves as a communication interface for integration. As a result of the API, HIW data consumers using SAS Visual Analytics can avoid difficult manual data processing. This paper provides detailed information about how to access HIW data with SAS Visual Analytics in order to produce easily understood visualizations with minimal effort through a methodology that automates HIW data processing. This paper also shows how to run SAS® macros inside a stored process to make HIW data available in SAS Visual Analytics for exploration and reporting via API calls; the SAS macros are provided. Use cases involving dashboards are also examined in order to demonstrate the value of streaming data directly from the HIW. Both IT professionals and population health analysts will benefit from understanding how to import HIW data into SAS Visual Analytics using SAS® Stored Processes, macros, and APIs. This can be very helpful to organizations that want to lower maintenance costs associated with data management while gaining insights into health data with visualizations. This paper provides a starting point for any organization interested in deriving full value from SAS Visual Analytics while augmenting their work with HIW data.
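The general shape of such an API call from SAS is sketched below (the endpoint URL is hypothetical; the paper supplies the actual macros and stored process):

   filename resp temp;

   proc http
      url="https://healthindicators.example.gov/api/indicators?format=json"
      method="GET"
      out=resp;
   run;

   /* One way to parse the JSON response: the JSON libname engine (SAS 9.4M4+) */
   libname hiw json fileref=resp;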
Read the paper (PDF)
Li Hui Chen, US Consumer Product Safety Commission
Manuel Figallo, SAS
C
Session 7940-2016:
Careers in Biostatistics and Clinical SAS® Programming: An Overview for the Uninitiated
In the biopharmaceutical industry, biostatistics plays an important and essential role in the research and development of drugs, diagnostics, and medical devices. Familiarity with biostatistics combined with knowledge of SAS® software can lead to a challenging and rewarding career that also improves patients' lives. This paper provides a broad overview of the different types of jobs and career paths available, discusses the education and skill sets needed for each, and presents some ideas for overcoming entry barriers to careers in biostatistics and clinical SAS programming.
Read the paper (PDF)
Justina Flavin, Independent Consultant
Session SAS4321-2016:
Clinical Graphs Using SAS®
Graphs are essential for many clinical and health care domains, including analysis of clinical trials safety data and analysis of the efficacy of the treatment, such as change in tumor size. Creating such graphs is a breeze with procedures from SAS® 9.4 ODS Graphics. This paper shows how to create many industry-standard graphs such as Lipid Profile, Swimmer Plot, Survival Plot, Forest Plot with Subgroups, Waterfall Plot, and Patient Profile using Study Data Tabulation Model (SDTM) data with just a few lines of code.
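For example, a minimal waterfall-plot sketch with PROC SGPLOT (data set and variable names are hypothetical):

   proc sgplot data=work.tumor_response;
      vbar usubjid / response=pchg categoryorder=respdesc;
      refline -30 / axis=y lineattrs=(pattern=dash);  /* RECIST response threshold */
      xaxis display=(novalues) label="Subjects";
      yaxis label="Best % change from baseline";
   run;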
Read the paper (PDF) | Download the data file (ZIP) | Watch the recording
Sanjay Matange, SAS
Session 7882-2016:
Comparative Regional Analysis of Bacterial Pneumonia Readmission Patients in the Medicare Category
One of the major diseases that records a high number of readmissions is bacterial pneumonia in Medicare patients. This study compares and contrasts the Northeast and South regions of the United States based on the factors causing 30-day readmissions. The study also identifies some of the ICD-9 medical procedures associated with those readmissions. Using SAS® Enterprise Guide® 7.1 and SAS® Enterprise Miner™ 14.1, this study analyzes patient and hospital demographics of the Northeast and South regions of the United States using the Cerner Health Facts Database. Further, the study suggests some preventive measures to reduce readmissions. The 30-day readmissions are computed based on admission and discharge dates from 2003 until 2013. Using clustering, various hospitals, along with discharge disposition levels (where a patient is sent after discharge), are grouped. In both regions, patients who are discharged to home have shown significantly lower chances of readmission (odds ratio = 0.562 for the Northeast region and for the South region); all other disposition levels have significantly higher odds (> 1) compared to discharge to home. For the South region, females have around 53 percent higher odds of readmission compared to males (odds ratio = 1.535). Also, some of the hospital groups have higher readmission cases. The association analysis using the Market Basket node finds catheterization procedures to be highly significant (Lift = 1.25 for the Northeast and 3.84 for the South; Confidence = 52.94% for the Northeast and 85.71% for the South) in bacterial pneumonia readmissions. Research has found that during these procedures, patients are highly susceptible to acquiring Methicillin-resistant Staphylococcus aureus (MRSA) bacteria, a cause of pneumonia. Providing timely follow-up for the patients who undergo these procedures might limit readmissions. These patients might also be discharged to home under medical supervision, as such patients have shown significantly lower chances of readmission.
Read the paper (PDF)
Heramb Joshi, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Dr. William Paiva, Center for Health Systems Innovation, OSU, Tulsa
Aditya Sharma, Oklahoma State University
Session 11778-2016:
Comparative Study of PROC EXPORT and Output Delivery System
Suppose that you have a very large data set with specific values in one of its columns, and you want to split the entire data set into separate comma-separated values (CSV) files based on the values in that column. You might write SAS® code using IF/THEN and ELSE conditions along with OUTPUT statements, but converting each of the separated data sets into a CSV file by the conventional manual process is frustrating. This paper presents a comparative study of two automated approaches: the SAS macro facility combined with the EXPORT procedure, and the Output Delivery System (ODS) combined with the TABULATE procedure. With either approach, the whole tedious process is done automatically by SAS code.
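A minimal sketch of the macro-plus-PROC EXPORT approach (the data set and classification column are hypothetical):

   proc sql noprint;
      select distinct region into :regions separated by '|'
      from work.master;
   quit;

   %macro split_csv;
      %local i val;
      %let i = 1;
      %do %while(%length(%scan(&regions, &i, |)) > 0);
         %let val = %scan(&regions, &i, |);
         proc export data=work.master(where=(region="&val"))
                     outfile="&val..csv" dbms=csv replace;
         run;
         %let i = %eval(&i + 1);
      %end;
   %mend split_csv;
   %split_csv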
Read the paper (PDF) | Watch the recording
Saurabh Nandy, Oklahoma State University
Session 8620-2016:
Competing-Risks Analyses: An Overview of Regression Models
Competing-risks analyses are methods for analyzing the time to a terminal event (such as death or failure) and its cause or type. The cumulative incidence function CIF(j, t) is the probability of death by time t from cause j. New options in the LIFETEST procedure provide for nonparametric estimation of the CIF from event times and their associated causes, allowing for right-censoring when the event and its cause are not observed. Cause-specific hazard functions that are derived from the CIFs are the analogs of the hazard function when only a single cause is present. Death by one cause precludes occurrence of death by any other cause, because an individual can die only once. Incorporating explanatory variables in hazard functions provides an approach to assessing their impact on overall survival and on the CIF. This semiparametric approach can be analyzed in the PHREG procedure. The Fine-Gray model defines a subdistribution hazard function that has an expanded risk set, which consists of individuals at risk of the event by any cause at time t, together with those who experienced the event before t from any cause other than the cause of interest j. Finally, with additional assumptions a full parametric analysis is also feasible. We illustrate the application of these methods with empirical data sets.
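Minimal sketches of the two analyses described, assuming a data set with time t, status (0 = censored, 1 = cause of interest), and a group variable:

   /* Nonparametric CIF estimation, new in PROC LIFETEST */
   proc lifetest data=work.events plots=cif;
      time t * status(0) / eventcode=1;
      strata group;
   run;

   /* Fine-Gray subdistribution hazard model in PROC PHREG */
   proc phreg data=work.events;
      class group;
      model t * status(0) = group / eventcode=1;
   run;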
Read the paper (PDF) | Watch the recording
Joseph Gardiner, Michigan State University
Session 8360-2016:
Creating a Better SAS® Visual Analytics User Experience while Working under HIPAA Data Restrictions
The HIPAA Privacy Rule can restrict geographic and demographic data used in health-care analytics. After reviewing the HIPAA requirements for de-identification of health-care data used in research, this poster guides the beginning SAS® Visual Analytics user through different options to create a better user experience. This poster presents a variety of data visualizations the analyst will encounter when describing a health-care population. We explore the different options SAS Visual Analytics offers and also offer tips on data preparation prior to using SAS® Visual Analytics Designer. Among the topics we cover are SAS Visual Analytics Designer object options (including the geo bubble map, geo region map, crosstab, and treemap), tips for preparing your data for use in SAS Visual Analytics, and tips on filtering data after it has been loaded into SAS Visual Analytics.
View the e-poster or slides (PDF)
Jessica Wegner, Optum
Margaret Burgess, Optum
Catherine Olson, Optum
Session SAS4240-2016:
Creating a Strong Business Case for SAS® Grid Manager: Translating Grid Computing Benefits to Business Benefits
SAS® Grid Manager, like other grid computing technologies, has a set of great capabilities that we IT professionals love to have in our systems. This technology increases high availability, allows parallel processing, facilitates increasing demand by scaling out, and offers other features that make life better for those managing and using these environments. However, even when business users take advantage of these features, they are more concerned about the business part of the problem. Most of the time, business groups hold the budgets and are key stakeholders for any SAS Grid Manager project. Therefore, it is crucial to demonstrate to business users how they will benefit from the new technologies, how the features will improve their daily operations, help them be more efficient and productive, and help them achieve better results. This paper guides you through a process to create a strong and persuasive business plan that translates the technology features of SAS Grid Manager into business benefits.
Read the paper (PDF) | Watch the recording
Marlos Bosso, SAS
Session 12521-2016:
Cyclist Death and Distracted Driving: Important Factors to Consider
Introduction: Cycling is on the rise in many urban areas across the United States. The number of cyclist fatalities is also increasing, by 19% in the last 3 years. Given the broad-ranging personal and public health benefits of cycling, it is important to understand the factors that are associated with these traffic-related deaths. There are more distracted drivers on the road than ever before, but the question remains to what extent these drivers are affecting cycling fatality rates. Methods: This paper uses the Fatality Analysis Reporting System (FARS) data to examine factors related to cyclist death when drivers are distracted. We use a novel machine learning approach, adaptive LASSO, to determine the relevant features and estimate their effect. Results: If a cyclist makes an improper action at or just before the time of the crash, the likelihood that the driver of the vehicle was distracted decreases. At the same time, if a driver who is speeding or has failed to obey a traffic sign fatally hits a cyclist, the likelihood that the driver was also distracted increases. Being distracted is related to other risky driving practices when cyclists are fatally injured. Environmental factors such as weather and road condition did not affect the likelihood that a driver was distracted when a cyclist fatality occurred.
Read the paper (PDF)
Lysbeth Floden, University of Arizona
Melanie Bell, Department of Epidemiology & Biostatistics, University of Arizona
Patrick O'Connor, University of Arizona
D
Session 10841-2016:
Data Review Listings on Auto-Pilot: Using SAS® and Windows Server to Automate Reports and Flag Incremental Data Records
During the course of a clinical trial study, large numbers of new and modified data records are received on an ongoing basis. Providing end users with an approach to continuously review and monitor study data, while enabling them to focus reviews on new or modified (incremental) data records, allows for greater efficiency in identifying potential data issues. In addition, supplying data reviewers with a familiar machine-readable output format (for example, Microsoft Excel) allows for greater flexibility in filtering, highlighting, and retention of data reviewers' comments. In this paper, we outline an approach using SAS® in a Windows server environment and a shared folder structure to automatically refresh data review listings. Upon each execution, the listings are compared against previously reviewed data to flag new and modified records, as well as carry forward any data reviewers' comments made during the previous review. In addition, we highlight the use and capabilities of the SAS® ExcelXP tagset, which enables greater control over data formatting, including management of Microsoft Excel's sometimes undesired automatic formatting. Overall, this approach provides a significantly improved end-user experience above and beyond the more traditional approach of performing cumulative or incremental data reviews using PDF listings.
Read the paper (PDF)
Victor Lopez, Samumed, LLC
Heli Ghandehari, Samumed, LLC
Bill Knowlton, Samumed, LLC
Christopher Swearingen, Samumed, LLC
Session 10740-2016:
Developing an On-Demand Web Report Platform Using Stored Processes and SAS® Web Application Server
As SAS® programmers, we often develop listings, graphs, and reports that need to be delivered frequently to our customers. We might decide to manually run the program every time we get a request, or we might easily schedule an automatic task to send a report at a specific date and time. Both scenarios have some disadvantages. If the report is manual, we have to find and run the program every time someone requests an updated version of the output. It takes some time, and it is not the most interesting part of the job. If we schedule an automatic task in Windows, we still sometimes get an email from the customers because they need the report immediately. That means that we have to find and run the program for them. This paper explains how we developed an on-demand report platform using SAS® Enterprise Guide®, SAS® Web Application Server, and stored processes. We had developed many reports for different customer groups, and we were getting more and more emails from them asking for updated versions of their reports. We felt we were not using our time wisely and decided to create an infrastructure where users could easily run their programs through a web interface. The tool that we created enables SAS programmers to easily release on-demand web reports with minimal programming. It has web interfaces developed using stored processes for the administrative tasks, and it also automatically customizes the front end based on the user who connects to the website. One of the challenges of the project was that certain reports had to be available to a specific group of users only.
Read the paper (PDF)
Romain Miralles, Genomic Health
Session 11841-2016:
Diagnosing Obstructive Sleep Apnea: Using Predictive Analytics Based on Wavelet Analysis in SAS/IML® Software and Spectral Analysis in PROC SPECTRA
This paper presents an application based on predictive analytics and feature-extraction techniques to develop an alternative method for diagnosing obstructive sleep apnea (OSA). Our method reduces the time and cost associated with the gold standard, polysomnography (PSG), which is operated manually, by automatically determining the severity of a patient's OSA via classification models using the time series from a one-lead electrocardiogram (ECG). The data is from Dr. Thomas Penzel of Philipps-University, Germany, and can be downloaded at www.physionet.org. The selected data consists of 10 recordings (7 OSAs and 3 controls) of ECG collected overnight, and non-overlapping minute-by-minute OSA episode annotations (apnea and non-apnea states). This accounts for a total of 4,998 events (2,532 non-apnea and 2,466 apnea minutes). This paper highlights the nonlinear decomposition technique, wavelet analysis (WA) in SAS/IML® software, to maximize the information about OSA symptoms in the ECG, resulting in useful predictor signals. Then, spectral and cross-spectral analyses via PROC SPECTRA are used to quantify important patterns of those signals as numbers (features), namely power spectral density (PSD), cross power spectral density (CPSD), and coherency, such that the machine learning techniques in SAS® Enterprise Miner™ can differentiate OSA states. To eliminate variations due to body build, age, gender, and health condition, we normalize each feature by the corresponding feature of the original signal (that is, the ratio of the PSD of the ECG's WA to the PSD of the ECG). Moreover, because different OSA symptoms occur at different times, we account for this by taking features from adjacent minutes into the analysis and selecting only important ones using a decision tree model. The best classification result on the validation data (70:30 split), obtained from the random forest model, is 96.83% accuracy, 96.39% sensitivity, and 97.26% specificity. The results suggest that our method compares well with the gold standard.
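A sketch of the spectral step (data set and variable names are hypothetical; the wavelet decomposition in SAS/IML® comes first):

   proc spectra data=work.signals out=work.spec p s cross coh;
      var ecg ecg_wavelet;        /* original signal and a wavelet detail signal */
      weights parzen;             /* smooth the periodogram estimates */
   run;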
Read the paper (PDF)
Woranat Wongdhamma, Oklahoma State University
Session 11893-2016:
Disparities in the Receipt of Cardiac Revascularization Procedures
While cardiac revascularization procedures like cardiac catheterization, percutaneous transluminal angioplasty, and cardiac artery bypass surgery have become standard practices in restorative cardiology, the practice is not evenly prescribed or subscribed to. We analyzed Florida hospital discharge records for the period 1992 to 2010 to determine the odds of receipt of any of these procedures by Hispanics and non-Hispanic Whites. Covariates (potential confounders) were age, insurance type, gender, and year of discharge. Additional covariates considered included comorbidities such as hypertension, diabetes, obesity, and depression. The results indicated that even after adjusting for covariates, Hispanics in Florida during the time period 1992 to 2010 were consistently less likely to receive these procedures than their White counterparts. Reasons for this phenomenon are discussed.
Read the paper (PDF)
C. Perry Brown, Florida A&M University
Jontae Sanders, Florida Department of Health
E
Session SAS3120-2016:
Ensemble Modeling: Recent Advances and Applications
Ensemble models are a popular class of methods for combining the posterior probabilities of two or more predictive models in order to create a potentially more accurate model. This paper summarizes the theoretical background of recent ensemble techniques and presents examples of real-world applications. Examples of these novel ensemble techniques include weighted combinations (such as stacking or blending) of predicted probabilities in addition to averaging or voting approaches that combine the posterior probabilities by adding one model at a time. Fit statistics across several data sets are compared to highlight the advantages and disadvantages of each method, and process flow diagrams that can be used as ensemble templates for SAS® Enterprise Miner™ are presented.
Read the paper (PDF)
Wendy Czika, SAS
Ye Liu, SAS Institute
Session SAS5246-2016:
Enterprise Data Governance across SAS® and Beyond
As Data Management professionals, you have to comply with new regulations and controls. One such regulation is Basel Committee on Banking Supervision (BCBS) 239. To respond to these new demands, you have to put processes and methods in place to automate metadata collection and analysis, and to provide rigorous documentation around your data flows. You also have to deal with many aspects of data management including data access, data manipulation (ETL and other), data quality, data usage, and data consumption, often from a variety of toolsets that are not necessarily from a single vendor. This paper shows you how to use SAS® technologies to support data governance requirements, including third party metadata collection and data monitoring. It highlights best practices such as implementing a business glossary and establishing controls for monitoring data. Attend this session to become familiar with the SAS tools used to meet the new requirements and to implement a more managed environment.
Read the paper (PDF)
Jeff Stander, SAS
Session 9760-2016:
Evaluation of PROC IRT Procedure for Item Response Modeling
The experimental item response theory procedure (PROC IRT), included in the recently released SAS/STAT® 13.1 and 13.2, enables item response modeling and trait estimation in SAS®. With PROC IRT, you can perform item parameter calibration and latent trait estimation for a wide spectrum of educational and psychological research. This paper evaluates the performance of PROC IRT in item parameter recovery under various testing conditions. The pros and cons of PROC IRT versus BILOG-MG 3.0 are presented. For practitioners of IRT models, the development of IRT-related analysis in SAS is inspiring, offering a great choice to the growing population of IRT users. A shift to SAS can be beneficial based on several features of SAS: its flexibility in data management, its power in data analysis, its convenient output delivery, and its increasing richness in graphical presentation. It is critical to ensure the quality of item parameter calibration and trait estimation before you can continue with other components, such as test scoring, test form construction, IRT equating, and so on.
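Basic calibration is compact (data set and item names are hypothetical):

   proc irt data=work.responses;
      model item1-item20 / resfunc=twop;   /* two-parameter logistic model */
   run;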
View the e-poster or slides (PDF)
Yi-Fang Wu, ACT, Inc.
Session 9340-2016:
Exact Logistic Models for Nested Binary Data in SAS®
The use of logistic models for independent binary data has relied first on asymptotic theory and later on exact distributions for small samples, as discussed by Troxler, Lalonde, and Wilson (2011). While the use of logistic models for dependent analysis based on exact analyses is not common, it is usually presented in the case of one-stage clustering. We present a SAS® macro that allows the testing of hypotheses using exact methods in the case of one-stage and two-stage clustering for small samples. The accuracy of the method and the results are compared to results obtained using an R program.
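For the independent-data case, exact inference is available through the EXACT statement in PROC LOGISTIC (variable names are hypothetical); the paper's macro extends exact methods to clustered data:

   proc logistic data=work.smallstudy;
      class trt / param=ref;
      model response(event='1') = trt;
      exact trt / estimate=both;     /* exact test and parameter estimate */
   run;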
Read the paper (PDF)
Kyle Irimata, Arizona State University
Jeffrey Wilson, Arizona State University
Session SAS5060-2016:
Exploring SAS® Embedded Process Technologies on Hadoop
SAS® Embedded Process offers a flexible, efficient way to leverage increasing amounts of data by injecting the processing power of SAS® directly where the data lives. SAS Embedded Process can tap into the massively parallel processing (MPP) architecture of Hadoop for scalable performance. Using SAS® In-Database Technologies for Hadoop, you can run scoring models generated by SAS® Enterprise Miner™ or, with SAS® In-Database Code Accelerator for Hadoop, user-written DS2 programs in parallel. With SAS Embedded Process on Hadoop you can also perform data quality operations, and extract and transform data using SAS® Data Loader. This paper explores key SAS technologies that run inside the Hadoop parallel processing framework and prepares you to get started with them.
Read the paper (PDF)
David Ghazaleh, SAS
F
Session 9260-2016:
FASHION, STYLE "GOTTA HAVE IT" COMPUTE DEFINE BLOCK
Do you create complex reports using PROC REPORT? Are you confused by the COMPUTE BLOCK feature of PROC REPORT? Are you even aware of it? Maybe you already produce reports using PROC REPORT, but suddenly your boss needs you to modify some of the values in one or more of the columns. Maybe your boss needs to see the values of some rows in boldface and others highlighted in a stylish yellow. Perhaps one of the columns in the report needs to display a variety of fashionable formats (some with varying decimal places and some without any decimals). Maybe the customer needs to see a footnote in specific cells of the report. Well, if this sounds familiar, then come take a look at the COMPUTE BLOCK of PROC REPORT. This paper shows a few tips and tricks for using the COMPUTE BLOCK with conditional IF/THEN logic to make your reports stylish and fashionable. The COMPUTE BLOCK allows you to use DATA step code within PROC REPORT to provide customization and style to your reports. We'll see how the Census Bureau produces a stylish demographic profile for customers of its Special Census program using PROC REPORT with the COMPUTE BLOCK. The paper focuses on how to use the COMPUTE BLOCK to create this stylish Special Census profile. The paper shows quick tips and simple code to handle multiple formats within the same column, make the values in the Total rows boldface, apply traffic-lighting, and add footnotes to any cell based on the column or row. The Special Census profile report is an Excel table created with ODS tagsets.ExcelXP that is stylish and fashionable, thanks in part to the COMPUTE BLOCK.
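A small taste of the technique, using a SASHELP data set (the threshold and styles are arbitrary):

   proc report data=sashelp.prdsale nowd;
      column country actual;
      define country / group;
      define actual  / analysis sum format=dollar12.;
      compute actual;
         /* boldface and highlight large values */
         if actual.sum > 100000 then
            call define(_col_, 'style',
                        'style={font_weight=bold background=yellow}');
      endcomp;
   run;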
Read the paper (PDF) | Watch the recording
Chris Boniface, Census Bureau
Session SAS4720-2016:
Fitting Multilevel Hierarchical Mixed Models Using PROC NLMIXED
Hierarchical nonlinear mixed models are complex models that occur naturally in many fields. The NLMIXED procedure's ability to fit linear or nonlinear models with standard or general distributions enables you to fit a wide range of such models. SAS/STAT® 13.2 enhanced PROC NLMIXED to support multiple RANDOM statements, enabling you to fit nested multilevel mixed models. This paper uses an example to illustrate the new functionality.
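A minimal sketch of the multiple-RANDOM-statement syntax for a Gaussian three-level model (all parameter and variable names are hypothetical):

   proc nlmixed data=work.schools;
      parms b0=0 b1=0 s2u=1 s2v=1 s2e=1;
      mu = b0 + b1*x + u + v;
      model y ~ normal(mu, s2e);
      random u ~ normal(0, s2u) subject=school;          /* school level        */
      random v ~ normal(0, s2v) subject=class(school);   /* class nested within */
   run;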
Read the paper (PDF)
Raghavendra Kurada, SAS
Session 2700-2016:
Forecasting Behavior with Age-Period-Cohort Models: How APC Predicted the US Mortgage Crisis, but Also Does So Much More
We introduce age-period-cohort (APC) models, which analyze data in which performance is measured by age of an account, account open date, and performance date. We demonstrate this flexible technique with an example from a recent study that seeks to explain the root causes of the US mortgage crisis. In addition, we show how APC models can predict website usage, retail store sales, salesperson performance, and employee attrition. We even present an example in which APC was applied to a database of tree rings to reveal climate variation in the southwestern United States.
View the e-poster or slides (PDF)
Joseph Breeden, Prescient Models
G
Session 7980-2016:
Generating and Testing the Properties of Randomization Sequences Created by the Adaptive Biased Coin Randomization Design
There are many methods to randomize participants in randomized control trials. If it is important to have approximately balanced groups throughout the course of the trial, simple randomization is not a suitable method. Perhaps the most common alternative method that provides balance is the blocked randomization method. A less well-known method called the treatment adaptive randomized design also achieves balance. This paper shows you how to generate an entire randomization sequence to randomize participants in a two-group clinical trial using the adaptive biased coin randomization design (ABCD), prior to recruiting any patients. Such a sequence could be used in a central randomization server. A unique feature of this method allows the user to determine the extent to which imbalance is permitted to occur throughout the sequence while retaining the probabilistic nature that is essential to randomization. Properties of sequences generated by the ABCD approach are compared to those generated by simple randomization, a variant of simple randomization that ensures balance at the end of the sequence, and by blocked randomization.
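As a simplified illustration of the biased-coin idea (this sketch uses Efron's classic fixed-probability rule; the ABCD generalizes the assignment probability as a function of the current imbalance):

   data work.sequence;
      call streaminit(20160418);
      nA = 0; nB = 0;
      do patient = 1 to 100;
         diff = nA - nB;
         if diff = 0 then p = 0.5;        /* balanced: fair coin     */
         else if diff < 0 then p = 2/3;   /* A behind: bias toward A */
         else p = 1/3;                    /* A ahead: bias toward B  */
         if rand('bernoulli', p) then do; arm = 'A'; nA = nA + 1; end;
         else do; arm = 'B'; nB = nB + 1; end;
         output;
      end;
      keep patient arm nA nB;
   run;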
View the e-poster or slides (PDF)
Gary Foster, St Joseph's Healthcare
Session SAS5501-2016:
Getting There from Here: Lifting Enterprise SAS® to the Amazon Public Cloud
If your organization already deploys one or more software solutions via Amazon Web Services (AWS), you know the value of the public cloud. AWS provides a scalable public cloud with a global footprint, allowing users access to enterprise software solutions anywhere at any time. Although SAS® began long before AWS was even imagined, many loyal organizations driven by SAS are moving their local SAS analytics into the public AWS cloud, alongside other software hosted by AWS. SAS® Solutions OnDemand has assisted organizations in this transition. In this paper, we describe how we extended our enterprise hosting business to AWS. We describe the open source automation framework on which SAS Solutions OnDemand built its automation stack, which simplified the process of migrating a SAS implementation. We also provide the technical details of our automation and network footprint, a discussion of the technologies we chose along the way, and a list of lessons learned.
Read the paper (PDF)
Ethan Merrill, SAS
Bryan Harkola, SAS
Session 7300-2016:
Graphing Made Easy for Project Management
Project management is a hot topic across many industries, and multiple commercial software applications for managing projects are available. The reality, however, is that most project management software is not practical for daily use. SAS® has a solution for this issue that can be used to manage projects graphically in real time. This paper introduces a new paradigm for project management using the SAS® Graph Template Language (GTL). With GTL, SAS users can visualize, in real time, resource assignments, task plans, delivery tracking, and project status across multiple project levels for more efficient project management.
Read the paper (PDF)
Zhouming(Victor) Sun, Medimmune
H
Session 3620-2016:
Health Care's One Percenters: Hot-Spotting to Identify Areas of Need and Opportunity
Since Atul Gawande popularized the term in describing the work of Dr. Jeffrey Brenner in a New Yorker article, hot-spotting has been used in health care to describe the process of identifying super-utilizers of health care services, then defining intervention programs to coordinate and improve their care. According to Brenner's data from Camden, New Jersey, 1% of patients generate 30% of payments to hospitals, while 5% of patients generate 50% of payments. Analyzing administrative health care claims data, which contains information about diagnoses, treatments, costs, charges, and patient sociodemographic data, can be a useful way to identify super-users, as well as those who may be receiving inappropriate care. Both groups can be targeted for care management interventions. In this paper, techniques for patient outlier identification and prioritization are discussed using examples from private commercial and public health insurance claims data. The paper also describes techniques used with health care claims data to identify high-risk, high-cost patients and to generate analyses that can be used to prioritize patients for various interventions to improve their health.
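A minimal sketch of one way to surface the top 1% of utilizers from claims data (data set and variable names are hypothetical):

   proc summary data=work.claims nway;
      class patient_id;
      var paid_amount;
      output out=work.patient_cost(drop=_:) sum=total_paid;
   run;

   proc rank data=work.patient_cost out=work.ranked groups=100 descending;
      var total_paid;
      ranks cost_pctl;
   run;

   data work.superutilizers;      /* cost_pctl = 0 is the top percentile */
      set work.ranked;
      if cost_pctl = 0;
   run;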
Read the paper (PDF)
Paul LaBrec, 3M Health Information Systems
Session SAS4440-2016:
How Do My Neighbors Affect Me? SAS/ETS® Methods for Spatial Econometric Modeling
Contemporary data-collection processes usually involve recording information about the geographic location of each observation. This geospatial information provides modelers with opportunities to examine how the interaction of observations affects the outcome of interest. For example, it is likely that car sales from one auto dealership might depend on sales from a nearby dealership either because the two dealerships compete for the same customers or because of some form of unobserved heterogeneity common to both dealerships. Knowledge of the size and magnitude of the positive or negative spillover effect is important for creating pricing or promotional policies. This paper describes how geospatial methods are implemented in SAS/ETS® and illustrates some ways you can incorporate spatial data into your modeling toolkit.
Read the paper (PDF)
Guohui Wu, SAS
Jan Chvosta, SAS
Session 10840-2016:
How to Speed Up Your Validation Process Without Really Trying
This paper provides tips and techniques to speed up the validation process both without and with automation. For validation without automation, it introduces both standard and clever uses of options and statements in the COMPARE procedure that can speed up the validation process. For validation with automation, a macro named %QCDATA is introduced for individual data set validation, and a macro named %QCDIR is introduced for comparison of data sets in two different directories. Also introduced in this section is the &SYSINFO automatic macro variable and an explanation of how it can be used to interpret the result of the comparison.
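A minimal sketch of the non-automated piece, assuming production and QC versions of the same data set (the names here are hypothetical): &SYSINFO is the automatic macro variable that PROC COMPARE sets (0 means the data sets match), and it must be captured immediately after the step because later steps reset it.

    proc compare base=prod.ae compare=qc.ae listall criterion=1e-10;
    run;

    %let qc_rc = &sysinfo;   /* capture at once; any later step resets it */
    %put NOTE: PROC COMPARE return code = &qc_rc (0 = exact match).;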
Read the paper (PDF)
Alice Cheng, Portola Pharmaceuticals
Justina Flavin, Independent Consultant
Michael Wise, Experis BI & Analytics Practice
Session 9800-2016:
How to Visualize SAS® Data with JavaScript Libraries like HighCharts and D3
Have you ever wondered how to get the most from Web 2.0 technologies in order to visualize SAS® data? How to make those graphs dynamic, so that users can explore the data in a controlled way, without needing prior knowledge of SAS products or data science? Wonder no more! In this session, you learn how to turn basic sashelp.stocks data into a snazzy HighCharts stock chart in which a user can review any time period, zoom in and out, and export the graph as an image. All of these features are achieved with only two DATA steps and one SORT procedure, in 57 lines of SAS code.
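The general approach, serializing SAS data into a JavaScript array that HighCharts can read, might be sketched as below; the output path and series name are hypothetical, and this is not the author's exact 57-line program.

    proc sort data=sashelp.stocks(where=(stock='IBM')) out=ibm;
       by date;
    run;

    data _null_;
       set ibm end=last;
       file '/tmp/ibm_series.js';
       /* HighCharts expects [timestamp-in-milliseconds, value] pairs */
       ts = (date - '01jan1970'd) * 86400 * 1000;
       if _n_ = 1 then put 'var ibmSeries = [';
       put '[' ts :best16. ',' close :best12. ']' @;
       if last then put '];';
       else put ',';
    run;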
Download the data file (ZIP) | View the e-poster or slides (PDF)
Vasilij Nevlev, Analytium Ltd
I
Session SAS4040-2016:
Improving Health Care Quality with the RAREEVENTS Procedure
Statistical quality improvement is based on understanding process variation, which falls into two categories: variation that is natural and inherent to a process, and unusual variation due to specific causes that can be addressed. If you can distinguish between natural and unusual variation, you can take action to fix a broken process and avoid disrupting a stable process. A control chart is a tool that enables you to distinguish between the two types of variation. In many health care activities, carefully designed processes are in place to reduce variation and limit adverse events. The types of traditional control charts that are designed to monitor defect counts are not applicable to monitoring these rare events, because these charts tend to be overly sensitive, signaling unusual variation each time an event occurs. In contrast, specialized rare events charts are well suited to monitoring low-probability events. These charts have gained acceptance in health care quality improvement applications because of their ease of use and their suitability for processes that have low defect rates. The RAREEVENTS procedure, which is new in SAS/QC® 14.1, produces rare events charts. This paper presents an overview of PROC RAREEVENTS and illustrates how you can use rare events charts to improve health care quality.
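Based on the paper's description, a rare events chart for, say, days between hospital-acquired infections might be requested along these lines; the data set is hypothetical, and the CHART statement shown follows our reading of the SAS/QC 14.1 syntax rather than the paper's own code, so treat the details as an assumption.

    proc rareevents data=infections;
       chart days_between;   /* days elapsed between successive adverse events */
    run;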
Read the paper (PDF)
Bucky Ransdell, SAS
Session 10780-2016:
Increasing Efficiency by Parallel Processing
Working with big data is often time consuming and challenging. The primary goal in programming is to maximize throughput while minimizing the use of computer processing time, real time, and programmers' time. By using the Multiprocessing (MP) CONNECT method on a symmetric multiprocessing (SMP) computer, a programmer can divide a job into independent tasks and execute the tasks as threads in parallel on several processors. This paper demonstrates the development and application of a parallel processing program on a large amount of health-care data.
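The essence of the MP CONNECT technique is spawning independent SAS sessions and letting them run concurrently; a minimal sketch, with hypothetical library and data set names (CLAIMS and OUT are assumed to be assigned in the parent session), follows.

    options autosignon sascmd="!sascmd";

    rsubmit task1 wait=no inheritlib=(claims out);
       proc means data=claims.y2014 noprint;
          output out=out.sum2014;
       run;
    endrsubmit;

    rsubmit task2 wait=no inheritlib=(claims out);
       proc means data=claims.y2015 noprint;
          output out=out.sum2015;
       run;
    endrsubmit;

    waitfor _all_ task1 task2;   /* block until both threads finish */
    signoff task1;
    signoff task2;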
Read the paper (PDF) | View the e-poster or slides (PDF)
Shuhua Liang, Kaiser Permanente
Session 8680-2016:
Integrating Microsoft VBScript and SAS®
Microsoft Visual Basic Scripting Edition (VBScript) and SAS® software are each powerful tools in their own right. These two technologies can be combined so that SAS code can call a VBScript program or vice versa. This gives a programmer the ability to automate SAS tasks; traverse the file system; send emails programmatically via Microsoft Outlook or SMTP; manipulate Microsoft Word, Microsoft Excel, and Microsoft PowerPoint files; get web data; and more. This paper presents example code to demonstrate each of these capabilities.
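The SAS-to-VBScript direction can be as simple as shelling out to the Windows Script Host; the script path below is hypothetical.

    options noxwait xsync;
    /* run a VBScript (for example, one that refreshes an Excel workbook) and wait */
    x 'cscript //nologo "C:\scripts\refresh_workbook.vbs"';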
Read the paper (PDF) | Download the data file (ZIP)
Christopher Johnson, BrickStreet Insurance
Session 11420-2016:
Integrating SAS® and R to Perform Optimal Propensity Score Matching
In studies where randomization is not possible, imbalance in baseline covariates (confounding by indication) is a fundamental concern. Propensity score matching (PSM) is a popular method to minimize this potential bias, matching individuals who received treatment to those who did not, to reduce the imbalance in pre-treatment covariate distributions. PSM methods continue to advance as computing resources expand. Optimal matching, which selects the set of matches that minimizes the average difference in propensity scores between mates, has been shown to outperform less computationally intensive methods. However, many find the implementation daunting. SAS/IML® software allows the integration of optimal matching routines that execute in R, for example, the R optmatch package. This presentation walks through performing optimal PSM in SAS® by implementing R functions, including assessing whether covariate trimming is necessary prior to PSM. It covers the propensity score analysis in SAS, the matching procedure, and the post-matching assessment of covariate balance using SAS/STAT® 13.2 and SAS/IML procedures.
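The bridge between SAS and the R optmatch package runs through PROC IML's R interface (which requires a SAS session started with the RLANG option); a minimal sketch with hypothetical data set and variable names:

    proc iml;
       call ExportDataSetToR("work.ps_data", "psdat");
       submit / R;
          library(optmatch)
          # optimal 1:1 matching on the logit of the propensity score
          m <- pairmatch(treat ~ ps_logit, data = psdat)
          psdat$match_id <- as.character(m)
       endsubmit;
       call ImportDataSetFromR("work.matched", "psdat");
    quit;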
Read the paper (PDF)
Lucy D'Agostino McGowan, Vanderbilt University
Robert Greevy, Department of Biostatistics, Vanderbilt University
K
Session 10680-2016:
Key Features in ODS Graphics for Efficient Clinical Graphing
High-quality effective graphs not only enhance understanding of the data but also facilitate regulators in the review and approval process. In recent SAS® releases, SAS has made significant progress toward more efficient graphing in ODS Statistical Graphics (SG) procedures and Graph Template Language (GTL). A variety of graphs can be quickly produced using convenient built-in options in SG procedures. With graphical examples and comparison between SG procedures and traditional SAS/GRAPH® procedures in reporting clinical trial data, this paper highlights several key features in ODS Graphics to efficiently produce sophisticated statistical graphs with more flexible and dynamic control of graphical presentation including: 1) Better control of axes in different scales and intervals; 2) Flexible ways to control graph appearance; 3) Plots overlay in single-cell or multi-cell graphs; 4) Enhanced annotation; 5) Classification panel of multiple plots with individualized labeling.
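For instance, overlaying raw data and group means with fine axis control takes only a few SGPLOT statements; the data set and variables in this sketch are hypothetical.

    proc sgplot data=work.labs;
       scatter x=visitnum y=result / group=trt transparency=0.7;
       series  x=visitnum y=mean_result / group=trt lineattrs=(thickness=2);
       xaxis type=discrete label="Visit";
       yaxis type=log logbase=10 label="Lab result (U/L)";
    run;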
Read the paper (PDF) | Watch the recording
Yuxin (Ellen) Jiang, Biogen
Session 7140-2016:
Key Requirements For SAS® Grid Users
Considering the fact that SAS® Grid Manager is becoming more and more popular, it is important to fulfill the user's need for a successful migration to a SAS® Grid environment. This paper focuses on key requirements and common issues for new SAS Grid users, especially if they are coming from a traditional environment. This paper describes a few common requirements, such as the need for a current working directory, changes to file system navigation in SAS® Enterprise Guide® with user-specified locations, and getting a job execution summary email. The GRIDWORK directory, which has been introduced in SAS Grid Manager, is a bit different from the traditional SAS WORK location. This paper explains how you can use the GRIDWORK location in a more user-friendly way. Sometimes users experience data set size differences during grid migration. A few important reasons for these differences are demonstrated. We also demonstrate how to create new custom scripts as per business needs and how to incorporate them with the SAS Grid Manager engine.
Read the paper (PDF) | View the e-poster or slides (PDF)
Piyush Singh, Tata Consultancy Services Ltd
Tanuj Gupta, Tata Consultancy Services
Prasoon Sangwan, Tata Consultancy Services Ltd
L
Session 5500-2016:
Latent Class Analysis Using the LCA Procedure
This paper presents the use of latent class analysis (LCA) to identify a set of mutually exclusive latent classes of individuals based on responses to a set of categorical observed variables. The LCA procedure, a user-defined SAS® procedure for conducting LCA and LCA with covariates, is demonstrated using follow-up data on substance use from Monitoring the Future, a nationally representative panel study of high school seniors who are followed at selected time points during adulthood. The demonstration includes guidance on data management prior to analysis, PROC LCA syntax requirements and options, and interpretation of output.
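After the downloadable procedure is installed, a basic call looks roughly like the following; the item names are hypothetical, and the statements shown reflect the Methodology Center's documented syntax as best recalled, so treat the details as an assumption.

    proc lca data=mtf_followup;
       nclass 3;                 /* request a three-class solution */
       items alcohol cigarettes marijuana;
       categories 2 2 2;         /* each item is binary */
       seed 271828;
    run;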
Read the paper (PDF)
Patricia Berglund, University of Michigan
Session 11900-2016:
Latent Structure Analysis Procedures in SAS®
The current study looks at several ways to investigate latent variables in longitudinal surveys and their use in regression models. Several approaches to latent variable discovery are briefly reviewed and explored, using the procedures PROC LCA, PROC LTA, PROC TRAJ, and PROC CALIS. The latent variables are then included in separate regression models. The effect of the latent variables on the fit and use of the regression model, compared to a similar model using observed data, is briefly reviewed. The data used for this study was obtained from the National Longitudinal Study of Adolescent Health (Add Health). Data was analyzed using SAS® 9.4. This paper is intended for any level of SAS® user, and is written for an audience with a background in behavioral science and/or statistics.
Read the paper (PDF)
Deanna Schreiber-Gregory, National University
Session SAS5140-2016:
Leverage Your Reports in SAS® Visual Analytics: Using SAS® Theme Designer
Is uniqueness essential for your reports? SAS® Visual Analytics provides the ability to customize your reports to make them unique by using the SAS® Theme Designer. The SAS Theme Designer can be accessed from the SAS® Visual Analytics Hub to create custom themes to meet your branding needs and to ensure a unified look across your company. The report themes affect the colors, fonts, and other elements that are used in tables and graphs. The paper explores how to access SAS Theme Designer from the SAS Visual Analytics home page, how to create and modify report themes that are used in SAS Visual Analytics, how to create report themes from imported custom themes, and how to import and export custom report themes.
Read the paper (PDF)
Meenu Jaiswal, SAS
Ipsita Samantarai, SAS Research & Development (India) Pvt Ltd
Session 1720-2016:
Limit of Detection (LoD) Estimation Using Parametric Curve Fitting to (Hit) Rate Data: The LOD_EST SAS® Macro
The Limit of Detection (LoD) is defined as the lowest concentration or amount of material, target, or analyte that is consistently detectable (for polymerase chain reaction [PCR] quantitative studies, in at least 95% of the samples tested). In practice, the estimation of the LoD uses a parametric curve fit to a set of panel member (PM1, PM2, PM3, and so on) data where the responses are binary. Typically, the parametric curve fit to the percent detection levels takes on the form of a probit or logistic distribution. The SAS® PROBIT procedure can be used to fit a variety of distributions, including both the probit and logistic. We introduce the LOD_EST SAS macro that takes advantage of the SAS PROBIT procedure's strengths and returns an information-rich graphic as well as a percent detection table with associated 95% exact (Clopper-Pearson) confidence intervals for the hit rates at each level.
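Stripped of the macro's bells and whistles, the underlying fit might look like the sketch below; PANEL, HITS, TOTAL, and CONC are hypothetical names, and the INVERSECL option prints estimated concentrations (with confidence limits) at standard detection probabilities, from which the 95% LoD can be read.

    proc probit data=panel log10;
       /* events/trials syntax: hit rate modeled against log10 concentration */
       model hits/total = conc / d=logistic inversecl;
    run;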
Read the paper (PDF) | Download the data file (ZIP)
Jesse Canchola, Roche Molecular Systems, Inc.
Pari Hemyari, Roche Molecular Systems, Inc.
Session SAS4060-2016:
Location, Location, Location--Analytics with SAS® Visual Analytics and Esri
Business Intelligence users analyze business data in a variety of ways. Seventy percent of business data contains location information. For in-depth analysis, it is essential to combine location information with mapping. New analytical capabilities are added to SAS® Visual Analytics, leveraging the new partnership with Esri, a leader in location intelligence and mapping. The new capabilities enable users to enhance the analytical insights from SAS Visual Analytics. This paper demonstrates and discusses the new partnership with Esri and the new capabilities added to SAS Visual Analytics.
Read the paper (PDF)
Murali Nori, SAS
Himesh Patel, SAS
M
Session 5580-2016:
Macro Variables in SAS® Enterprise Guide®
For SAS® Enterprise Guide® users, sometimes macro variables and their values need to be brought over to the local workspace from the server, especially when multiple data sets or outputs need to be written to separate files in a local drive. Manually retyping the macro variables and their values in the local workspace after they have been created on the server workspace would be time-consuming and error-prone, especially when we have quite a number of macro variables and values to bring over. Instead, this task can be achieved in an efficient manner by using dictionary tables and the CALL SYMPUT routine, as illustrated in more detail below. The same approach can also be used to bring macro variables and their values from the local to the server workspace.
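A sketch of the idea: snapshot the server session's global macro variables into a data set, move that data set to the local workspace, and replay it with CALL SYMPUTX. The names are hypothetical; note that SASHELP.VMACRO splits values longer than 200 characters across rows (the OFFSET column), which a production version would need to handle.

    /* server session: capture global macro variables */
    data work.mvars;
       set sashelp.vmacro;
       where scope = 'GLOBAL' and name not like 'SYS%';
    run;

    /* local session, after WORK.MVARS has been downloaded */
    data _null_;
       set work.mvars;
       call symputx(name, value, 'G');   /* re-create each as a global */
    run;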
Read the paper (PDF) | Download the data file (ZIP) | Watch the recording
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
Session SAS6344-2016:
Mass-Scale, Automated Machine Learning and Model Deployment Using SAS® Factory Miner and SAS® Decision Manager
Business problems have become more stratified and micro-segmentation is driving the need for mass-scale, automated machine learning solutions. Additionally, deployment environments include diverse ecosystems, requiring hundreds of models to be built and deployed quickly via web services to operational systems. The new SAS® automated modeling tool allows you to build and test hundreds of models across all of the segments in your data, testing a wide variety of machine learning techniques. The tool is completely customizable, allowing you transparent access to all modeling results. This paper shows you how to identify hundreds of champion models using SAS® Factory Miner, while generating scoring web services using SAS® Decision Manager. Immediate benefits include efficient model deployments, which allow you to spend more time generating insights that might reveal new opportunities, expose hidden risks, and fuel smarter, well-timed decisions.
Read the paper (PDF)
Jonathan Wexler, SAS
Steve Sparano, SAS
Session 10761-2016:
Medicare Fraud Analytics Using Cluster Analysis: How PROC FASTCLUS Can Refine the Identification of Peer Comparison Groups
Although limited to a small fraction of health care providers, the existence and magnitude of fraud in health insurance programs requires the use of fraud prevention and detection procedures. Data mining methods are used to uncover odd billing patterns in large databases of health claims history. Efficient fraud discovery can involve the preliminary step of deploying automated outlier detection techniques in order to classify identified outliers as potential fraud before an in-depth investigation. An essential component of the outlier detection procedure is the identification of proper peer comparison groups to classify providers as within-the-norm or outliers. This study refines the concept of peer comparison group within the provider category and considers the possibility of distinct billing patterns associated with medical or surgical procedure codes identifiable by the Berenson-Eggers Type of Service (BETOS). The BETOS system covers all HCPCS codes (Healthcare Common Procedure Coding System); assigns a HCPCS code to only one BETOS code; consists of readily understood clinical categories; and consists of categories that permit objective assignment (Centers for Medicare & Medicaid Services, CMS). The study focuses on the specialty General Practice and involves two steps: first, the identification of clusters of similar BETOS-based billing patterns; and second, the assessment of the effectiveness of these peer comparison groups in identifying outliers. The working data set is a sample of the summary of 2012 data of physicians active in health care government programs made publicly available by the CMS through its website. The analysis uses PROC FASTCLUS, the SAS® cubic clustering criterion approach, to find the optimal number of clusters in the data. It also uses PROC ROBUSTREG to implement a multivariate adaptive threshold outlier detection method.
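The clustering step itself is compact; a sketch with hypothetical BETOS share variables follows, where the cubic clustering criterion printed by PROC FASTCLUS (compared across reruns with different MAXCLUSTERS= values) guides the choice of k.

    proc fastclus data=betos_profiles maxclusters=5 maxiter=100 out=clustered;
       var pct_betos1-pct_betos10;   /* share of billing in each BETOS group */
    run;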
Read the paper (PDF) | Download the data file (ZIP)
Paulo Macedo, Integrity Management Services
Session 7700-2016:
Medicare Payment Models: Past, Present, and Future
In 1965, nearly half of all Americans 65 and older had no health insurance. Now, 50 years later, only 2% lack health insurance. The difference, of course, is Medicare. Medicare now covers 55 million people, about 17% of the US population, and is the single largest purchaser of personal health care. Despite this success, the rising costs of health care in general and Medicare in particular have become a growing concern. Medicare policies are important not only because they directly affect large numbers of beneficiaries, payers, and providers, but also because they affect private-sector policies as well. Analyses of Medicare policies and their consequences are complicated both by the effects of an aging population that has changing cost drivers (such as less smoking and more obesity) and by different Medicare payment models. For example, the average age of the Medicare population will initially decrease as the baby-boom generation reaches eligibility, but then increase as that generation grows older. Because younger beneficiaries have lower costs, these changes will affect cost trends and patterns that need to be interpreted within the larger context of demographic shifts. This presentation examines three Medicare payment models: fee-for-service (FFS), Medicare Advantage (MA), and Accountable Care Organizations (ACOs). FFS, originally based on payment methods used by Blue Cross and Blue Shield in the mid-1960s, pays providers for individual services (for example, physicians are paid based on the fees they charge). MA is a capitated payment model in which private plans receive a risk-adjusted rate. ACOs are groups of providers who are given financial incentives for reducing cost and maintaining quality of care for specified beneficiaries. Each model has strengths and weaknesses in specific markets. We examine each model, in addition to new data sources and more recent, innovative payment models that are likely to affect future trends.
Read the paper (PDF)
Paul Gorrell, IMPAQ International
Session SAS5801-2016:
Minimizing Fraud Risk through Dynamic Entity Resolution and Network Analysis
Every day, businesses have to remain vigilant of fraudulent activity, which threatens customers, partners, employees, and finances. Normally, networks of people or groups perpetrate deviant activity. Finding these connections is now made easier for analysts with SAS® Visual Investigator, an upcoming SAS® solution that ultimately minimizes the loss of money and preserves mutual trust among its shareholders. SAS Visual Investigator takes advantage of the capabilities of the new SAS® In-Memory Server. Investigators can efficiently investigate suspicious cases across business lines, which has traditionally been difficult. However, the time required to collect and process data and to identify emerging fraud and compliance issues has been costly. Making proactive analysis accessible to analysts is now more important than ever. SAS Visual Investigator was designed with this goal in mind, and a key component is the visual social network view. This paper discusses how the network analysis view of SAS Visual Investigator, with all its dynamic visual capabilities, can make the investigative process more informative and efficient.
Read the paper (PDF)
Danielle Davis, SAS
Stephen Boyd, SAS Institute
Ray Ong, SAS Institute
Session 10460-2016:
Missing Values: They Are NOT Nothing
When analyzing data with SAS®, we often encounter missing or null values in data. Missing values can arise from the availability, collectibility, or other issues with the data. They represent the imperfect nature of real data. Under most circumstances, we need to clean, filter, separate, impute, or investigate the missing values in data. These processes can take up a lot of time, and they are annoying. For these reasons, missing values are usually unwelcome and need to be avoided in data analysis. There are two sides to every coin, however. If we can think outside the box, we can take advantage of the negative features of missing values for positive uses. Sometimes, we can create and use missing values to achieve our particular goals in data manipulation and analysis. These approaches can make data analyses convenient and improve work efficiency for SAS programming. This kind of creative and critical thinking is the most valuable quality for data analysts. This paper exploits real-world examples to demonstrate the creative uses of missing values in data analysis and SAS programming, and discusses the advantages and disadvantages of these methods and approaches. The illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
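One example of putting missing values to work: SAS offers 27 special missing values (.A through .Z and ._), which can encode why a value is absent while still behaving as missing in every calculation. The data set and variable names below are hypothetical.

    data coded;
       set survey;
       if      answer = 98 then income = .R;   /* refused to answer       */
       else if answer = 99 then income = .N;   /* question not applicable */
       else income = answer;
    run;

    proc format;
       value incfmt .R = 'Refused'
                    .N = 'Not applicable';
    run;

    proc freq data=coded;
       tables income / missing;   /* missing-value reasons appear as rows */
       format income incfmt.;
    run;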
Read the paper (PDF)
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
N
Session 10360-2016:
Nine Frequently Asked Questions about Getting Started with SAS® Visual Analytics
You've heard all the talk about SAS® Visual Analytics--but maybe you are still confused about how the product would work in your SAS® environment. Many customers have the same points of confusion about what they need to do with their data, how to get data into the product, how SAS Visual Analytics would benefit them, and even whether they should be considering Hadoop or the cloud. In this paper, we cover the questions we are asked most often about implementation, administration, and usage of SAS Visual Analytics.
Read the paper (PDF) | Watch the recording
Tricia Aanderud, Zencos Consulting LLC
Ryan Kumpfmiller, Zencos Consulting
Nick Welke, Zencos Consulting
P
Session 11780-2016:
PROC IMSTAT Boosts Knowledge Discovery in Big-Database (KDBD) in a Pharmaceutical Company
In recent years, big data has been in the limelight as a solution for business issues. Implementation of big data mining has begun in a variety of industries. The variety of data types and the velocity at which data is growing have been astonishing, whether the data is structured and stored in a relational database or unstructured (for example, text data, GPS data, image data, and so on). In the pharmaceutical industry, big data means real-world data such as electronic health records, genomics data, medical imaging data, social network data, and so on. Handling these types of big data often requires a special infrastructure for statistical computing. Our presentation covers case study 1: IMSTAT implementation as a large-scale parallel computation environment; converting a business issue into a data science issue in pharma; case study 2: data handling and machine learning for vertical and horizontal big data by using PROC IMSTAT; the importance of integrating analysis results; and cautions for big data mining.
Read the paper (PDF)
Yoshitake Kitanishi, Shionogi & Co., Ltd.
Ryo Kiguchi, Shionogi & Co., Ltd.
Akio Tsuji, Shionogi & Co., Ltd.
Hideaki Watanabe, Shionogi & Co., Ltd.
Session 7540-2016:
PROC SQL for SQL DieHards
Inspired by Christianna Williams's paper on transitioning to PROC SQL from the DATA step, this paper aims to help SQL programmers transition to SAS® by using PROC SQL. SAS adapted the Structured Query Language (SQL) by means of PROC SQL back in SAS® 6. PROC SQL syntax closely resembles SQL. However, there are some SQL features that are not available in SAS. Throughout this paper, we outline common SQL tasks and how they might differ in PROC SQL. We also introduce useful SAS features that are not available in SQL. Topics covered are appropriate for novice SAS users.
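One handy PROC SQL extension that ANSI SQL lacks is the CALCULATED keyword, which lets a query reuse a derived column without repeating its expression; the table and columns in this sketch are hypothetical.

    proc sql;
       select member_id,
              paid_amt - allowed_amt as overpaid,
              case when calculated overpaid > 0
                   then 'Review' else 'OK' end as flag
       from claims
       where calculated overpaid ne 0;
    quit;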
Read the paper (PDF)
Barbara Ross, NA
Jessica Bennett, Snap Finance
Session 2480-2016:
Performing Pattern Matching by Using Perl Regular Expressions
SAS® software provides many DATA step functions that search and extract patterns from a character string, such as SUBSTR, SCAN, INDEX, and TRANWRD. Using these functions to perform pattern matching often requires many function calls to match a character position. However, using the Perl regular expression (PRX) functions or routines in the DATA step improves pattern-matching tasks by reducing the number of function calls and making the program easier to maintain. This talk, in addition to discussing the syntax of Perl regular expressions, demonstrates many real-world applications.
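As a flavor of the syntax, the following sketch pulls the pieces of a US phone number out of free text in one pass; the data set and variables are hypothetical.

    data phones;
       set contacts;
       retain re;
       drop re;
       if _n_ = 1 then re = prxparse('/\((\d{3})\)\s*(\d{3})-(\d{4})/');
       if prxmatch(re, note) then do;
          area     = prxposn(re, 1, note);   /* first capture buffer  */
          exchange = prxposn(re, 2, note);   /* second capture buffer */
       end;
    run;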
Read the paper (PDF) | Download the data file (ZIP)
Arthur Li, City of Hope
Session 11846-2016:
Predicting Human Activity Sensor Data Using an Auto Neural Model with Stepwise Logistic Regression Inputs
Due to advances in medical care and the rise in living standards, life expectancy in the US increased to an average of 79 years. This has resulted in an aging population and increased demand for technologies that help elderly people live independently and safely. It is possible to meet this challenge through ambient-assisted living (AAL) technologies. Much research has been done on human activity recognition (HAR) in the last decade. This research work can be used in the development of assistive technologies, and HAR is expected to be the future technology for e-health systems. In this research, I discuss the need to predict human activity accurately by building various models in SAS® Enterprise Miner™ 14.1 on a free public data set that contains 165,633 observations and 19 attributes. Variables used in this research represent the metrics of accelerometers mounted on the waist, left thigh, right arm, and right ankle of four individuals performing five different activities, recorded over a period of eight hours. The target variable predicts human activity such as sitting, sitting down, standing, standing up, and walking. Upon comparing different models, the winner is an auto neural model whose input is taken from stepwise logistic regression, which is used for variable selection. The model has an accuracy of 98.73% and sensitivity of 98.42%.
Read the paper (PDF) | View the e-poster or slides (PDF)
Venkata Vennelakanti, Oklahoma State University
Session 9621-2016:
Predicting If Mental Health Facilities Will Offer Free Treatment to Patients Who Cannot Afford It
There is a growing recognition that mental health is a vital public health and development issue worldwide. Numerous studies have reported that there are close interactions between poverty and mental illness. The association between mental illness and poverty is cyclic and negative. Impoverished people are generally more prone to mental illness and are less able to afford treatment. Likewise, people with mental health problems are more likely to be in poverty. The availability of free mental health treatment to people in poverty is critical to break this vicious cycle. Based on this hypothesis, a model was developed based on the responses provided by mental health facilities to a federally supported survey. We examined whether we can predict if mental health facilities in the United States will offer free treatment to patients who cannot afford treatment costs. About a third of the 9,076 mental health facilities that responded to the survey stated that they offer free treatment to patients incapable of paying. Different machine learning algorithms and regression models were assessed to predict correctly which facility would offer treatment at no cost. Using a neural network model in conjunction with a decision tree for input variable selection, we found that the best performing model can predict the mental health facilities with an overall accuracy of 71.49%. Sensitivity and specificity of the selected neural network model were 82.96% and 56.57%, respectively. The top five most important covariates that explained the model's predictive power are: ownership of mental health facilities, whether facilities provide a sliding fee scale, the type of facility, whether facilities provide mental health treatment service in Spanish, and whether facilities offer smoking cessation services. More detailed spatial and descriptive analysis of those key variables will be conducted.
Read the paper (PDF)
Ram Poudel, Oklahoma State University
Lynn Xiang, Oklahoma State University
Session 11681-2016:
Predicting Occurrence Rate of Systemic Lupus Erythematosus and Rheumatoid Arthritis in Pregnant Women
Years ago, doctors advised women with autoimmune diseases such as systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) not to become pregnant out of concern for maternal health. Now, it is known that healthy pregnancy is possible for women with lupus, but at the expense of a higher pregnancy complication rate. The main objective of this research is to identify key factors contributing to these diseases and to predict the occurrence rate of SLE and RA in pregnant women. Based on the approach used in this study, the prediction of adverse pregnancy outcomes for women with SLE, RA, and other diseases such as diabetes mellitus (DM) and antiphospholipid antibody syndrome (APS) can be carried out. These results will help pregnant women undergo a healthy pregnancy by receiving proper medication at an earlier stage. The data set was obtained from the Cerner Health Facts data warehouse. The raw data set contains 883,473 records and 85 variables such as diagnosis code, age, race, procedure code, admission date, discharge date, total charges, and so on. Analyses were carried out with two different data sets--one for SLE patients and the other for RA patients. The final data sets had 398,742 and 397,898 records for modeling SLE and RA patients, respectively. To provide an honest assessment of the models, the data was split into training and validation using the Data Partition node. Variable selection techniques such as LASSO, LARS, stepwise regression, and forward regression were used. Using a decision tree, prominent factors that determine the SLE and RA occurrence rate were identified separately. Of all the predictive models run using SAS® Enterprise Miner™ 12.3, the Model Comparison node identified the decision tree (Gini) as the best model, with the lowest misclassification rate: 0.308 for predicting SLE patients and 0.288 for predicting RA patients.
Read the paper (PDF)
Ravikrishna Vijaya Kumar, Oklahoma State University
Shrie Raam Sathyanarayanan, Oklahoma State University - Center for Health Sciences
Session 11140-2016:
Predicting Rare Events Using Specialized Sampling Techniques in SAS®
In recent years, many companies have been trying to understand the rare events that are very critical in the current business environment. But a data set with rare events is always imbalanced, and the models developed using this data set cannot predict the rare events precisely. Therefore, to overcome this issue, a data set needs to be sampled using specialized sampling techniques like over-sampling, under-sampling, or the synthetic minority over-sampling technique (SMOTE). The over-sampling technique deals with randomly duplicating minority class observations, but this technique might bias the results. The under-sampling technique deals with randomly deleting majority class observations, but this technique might lose information. SMOTE sampling deals with creating new synthetic minority observations instead of duplicating minority class observations or deleting the majority class observations. Therefore, this technique can overcome the problems, like biased results and lost information, found in other sampling techniques. In our research, we used an imbalanced data set containing results from a thyroid test with 3,163 observations, out of which only 4.7 percent of the observations had positive test results. Using SAS® procedures like PROC SURVEYSELECT and PROC MODECLUS, we created over-sampled, under-sampled, and SMOTE sampled data sets in SAS® Enterprise Guide®. Then we built decision tree, gradient boosting, and rule induction models using four different data sets (non-sampled, majority under-sampled, minority over-sampled with majority under-sampled, and minority SMOTE sampled with majority under-sampled) in SAS® Enterprise Miner™. Finally, based on the receiver operating characteristic (ROC) index, Kolmogorov-Smirnov statistics, and the misclassification rate, we found that the models built using minority SMOTE sampling with majority under-sampling yield better output for this data set.
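The two simpler schemes might be set up as below with PROC SURVEYSELECT (SMOTE itself additionally requires the nearest-neighbor synthesis the authors build around PROC MODECLUS); the data set, class sizes, and variable names are hypothetical.

    /* under-sample the majority (negative) class */
    proc surveyselect data=thyroid(where=(result=0)) out=neg_under
                      method=srs sampsize=500 seed=20160418;
    run;

    /* over-sample the minority (positive) class with replacement */
    proc surveyselect data=thyroid(where=(result=1)) out=pos_over
                      method=urs samprate=4 outhits seed=20160418;
    run;

    data balanced;
       set neg_under pos_over;
    run;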
Read the paper (PDF)
Rhupesh Damodaran Ganesh Kumar, Oklahoma State University (SAS and OSU data mining Certificate)
Kiren Raj Mohan Jagan Mohan, Zions Bancorporation
Session 11671-2016:
Predicting the Influence of Demographics on Domestic Violence Using SAS® Enterprise Guide® 6.1 and SAS® Enterprise Miner™ 12.3
The Oklahoma State Department of Health (OSDH) conducts home visiting programs with families that need parental support. Domestic violence is one of the many screenings performed on these visits. The home visiting personnel are trained to do initial screenings; however, they do not have the extensive information required to treat or serve the participants in this arena. Understanding how demographics such as age, level of education, and household income, among others, are related to domestic violence might help home visiting personnel better serve their clients by modifying their questions based on these demographics. The objective of this study is to better understand the demographic characteristics of those in the home visiting programs who are identified with domestic violence. We also developed predictive models such as logistic regression and decision trees based on understanding the influence of demographics on domestic violence. The study population consists of all the women who participated in the Children First Program of the OSDH from 2012 to 2014. The data set contains 1,750 observations collected during screening by the home visiting personnel over the two-year period. In addition, participants must have completed the Demographic form as well as the Relationship Assessment form at the time of intake. Univariate and multivariate analyses have been performed to discover the influence that age, education, and household income have on domestic violence. From the initial analysis, we can see that women who are younger than 25 years old, who have not completed high school, and who are somewhat dependent on their husbands or partners for money are most vulnerable. We have even segmented the clients based on the likelihood of domestic violence.
View the e-poster or slides (PDF)
Soumil Mukherjee, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Miriam McGaugh, Oklahoma State Department of Health
Session 7560-2016:
Processing CDC and SCD Type 2 for Sources without CDC: A Hybrid Approach
In a data warehousing system, change data capture (CDC) plays an important part not just in making the data warehouse (DWH) aware of the change but also in providing a means of flowing the change to the DWH marts and reporting tables so that we see the current and latest version of the truth. This and slowly changing dimensions (SCD) create a cycle that runs the DWH and provides valuable insights into history for future decision making. What if the source has no CDC? It would be an ETL nightmare to identify the exact change and report the absolute truth. If these two processes can be combined into a single process, where one transform does both jobs of identifying the change and applying the change to the DWH, then we can save significant processing time and valuable system resources. Hence, I came up with a hybrid SCD-with-CDC approach. My paper focuses on sources that DO NOT have CDC and need SCD Type 2 performed on such records without worrying about data duplication and increased processing times.
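One way to detect change without source CDC is to compare a digest of the tracked columns against the digest stored on the current dimension row; a sketch, with hypothetical table and column names, follows.

    proc sql;
       create table delta as
       select s.cust_id, s.name, s.segment,
              put(md5(catx('|', s.name, s.segment)), $hex32.) as new_hash,
              t.row_hash as old_hash
       from staging s
            left join dim_customer t
            on s.cust_id = t.cust_id and t.current_flag = 'Y';
    quit;

    data changes;
       set delta;
       if old_hash = ' '            then change_type = 'INSERT'; /* brand new   */
       else if new_hash ne old_hash then change_type = 'SCD2';   /* expire+add  */
       else delete;                 /* unchanged: nothing to apply */
    run;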
Read the paper (PDF) | Watch the recording
Vishant Bhat, University of Newcastle
Tony Blanch, SAS Consultant
Session 10481-2016:
Product Purchase Sequence Analyses by Using a Horizontal Data Sorting Technique
Horizontal data sorting is a very useful technique in advanced data analysis with SAS® programming. Two years ago (SAS® Global Forum Paper 376-2013), we presented and illustrated various methods and approaches to perform horizontal data sorting, and we demonstrated its valuable application in strategic data reporting. However, this technique can also be used as a creative analytic method in advanced business analytics. This paper presents and discusses its innovative and insightful applications in product purchase sequence analyses such as product opening sequence analysis, product affinity analysis, next best offer analysis, time-span analysis, and so on. Compared to other analytic approaches, the horizontal data sorting technique has the distinct advantages of being straightforward, simple, and convenient to use. This technique also produces easy-to-interpret analytic results. Therefore, the technique can have a wide variety of applications in customer data analysis and business analytics fields.
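The heart of the technique is CALL SORTC, which sorts values across variables within a single observation; pairing each product code with its open date first keeps the two aligned, as in this hypothetical sketch.

    data purchase_seq;
       set cust_products;   /* one row per customer, one open date per product */
       array pair{3} $20 pair1-pair3;
       pair1 = put(dt_chequing, yymmddn8.) || ':CHK';
       pair2 = put(dt_savings,  yymmddn8.) || ':SAV';
       pair3 = put(dt_creditcd, yymmddn8.) || ':CRD';
       call sortc(of pair{*});              /* horizontal, chronological sort */
       first_product = scan(pair1, 2, ':'); /* earliest product opened */
    run;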
Read the paper (PDF) | View the e-poster or slides (PDF)
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
R
Session 10621-2016:
Risk Adjustment Methods in Value-Based Reimbursement Strategies
Value-based reimbursement is the emerging strategy in the US healthcare system. The premise of value-based care is simple in concept--high quality and low cost provide the greatest value to patients and the various parties that fund their coverage. The basic equation for value is equally simple to compute: value=quality/cost. However, there are significant challenges to measuring it accurately. Error or bias in measuring value could result in the failure of this strategy to ultimately improve the healthcare system. This session discusses various methods and issues with risk adjustment in a value-based reimbursement model. Risk adjustment is an essential tool for ensuring that fair comparisons are made when deciding what health services and health providers have high value. The goal of this presentation is to give analysts an overview of risk adjustment and to provide guidance for when, why, and how to use risk adjustment when quantifying performance of health services and healthcare providers on both cost and quality. Statistical modeling approaches are reviewed and practical issues with developing and implementing the models are discussed. Real-world examples are also provided.
Read the paper (PDF)
Daryl Wansink, Conifer Value Based Care
S
Session 2020-2016:
SAS® Grid Architecture Solution Using IBM Hardware
This session is an in-depth review of SAS® Grid performance on IBM hardware. The review spans our environment's growth over the last four years and includes the latest upgrade of our environment from the first maintenance release of SAS® 9.3 to the third maintenance release of SAS® 9.4 (with a hardware refresh in the process).
Read the paper (PDF)
Whayne Rouse, Humana
Andrew Scott, Humana
Session 10260-2016:
SAS® Macro for Generalized Method of Moments Estimation for Longitudinal Data with Time-Dependent Covariates
Longitudinal data with time-dependent covariates is not readily analyzed as there are inherent, complex correlations due to the repeated measurements on the sampling unit and the feedback process between the covariates in one time period and the response in another. A generalized method of moments (GMM) logistic regression model (Lalonde, Wilson, and Yin 2014) is one method for analyzing such correlated binary data. While GMM can account for the correlation due to both of these factors, it is imperative to identify the appropriate estimating equations in the model. Cai and Wilson (2015) developed a SAS® macro using SAS/IML® software to fit GMM logistic regression models with extended classifications. In this paper, we expand the use of this macro to allow for continuous responses and as many repeated time points and predictors as possible. We demonstrate the use of the macro through two examples, one with binary response and another with continuous response.
Read the paper (PDF)
Katherine Cai, Arizona State University
Jeffrey Wilson, Arizona State University
Session SAS5880-2016:
SAS® Mobile Analytics: Accelerate Analytical Insights on the Go
Mobile devices are an integral part of a business professional's life. These mobile devices are getting increasingly powerful in terms of processor speeds and memory capabilities. Business users can benefit from a more analytical visualization of the data along with their business context. The new SAS® Mobile BI contains many enhancements that facilitate the use of SAS® Analytics in the newest version of SAS® Visual Analytics. This paper demonstrates how to use the new analytical visualization that has been added to SAS Mobile BI from SAS Visual Analytics, for a richer and more insightful experience for business professionals on the go.
Read the paper (PDF)
Murali Nori, SAS
Session 11480-2016:
Solving a Business Problem in SAS® Enterprise Guide®: Creating a "Layered" Inpatient Indicator Model
This paper describes a Kaiser Permanente Northwest business problem regarding tracking recent inpatient hospital utilization at external hospitals, and how it was solved with the flexibility of SAS® Enterprise Guide®. The Inpatient Indicator is an estimate of our regional inpatient hospital utilization as of yesterday. It tells us which of our members are in which hospitals. It measures inpatient admissions, which are health care interactions where a patient is admitted to a hospital for bed occupancy to receive hospital services. The Inpatient Indicator is used to produce data and create metrics and analysis essential to the decision making of Kaiser Permanente executives, care coordinators, patient navigators, utilization management physicians, and operations managers. Accurate, recent hospital inpatient information is vital for decisions regarding patient care, staffing, and member utilization. Due to a business policy change, Kaiser Permanente Northwest lost the ability to track urgent and emergent inpatient admits at external, non-plan hospitals through our referral system, which was our data source for all recent external inpatient admits. Without this information, we did not have complete knowledge of whether a member had an inpatient stay at an external hospital until a claim was received, which could be several weeks after the member was admitted. Other sources were needed to understand our inpatient utilization at external hospitals. A tool was needed with the flexibility to easily combine and compare multiple data sets with different field names, formats, and values representing the same metric. The tool needed to be able to import data from different sources and export data to different destinations. We also needed a tool that would allow this project to be scheduled. We chose to build the model with SAS Enterprise Guide.
View the e-poster or slides (PDF)
Thomas Gant, Kaiser Permanente
Session 11773-2016:
Statistical Comparisons of Disease Prevalence Rates Using the Bootstrap Procedure
Disease prevalence is one of the most basic measures of the burden of disease in the field of epidemiology. As an estimate of the total number of cases of disease in a given population, prevalence is a standard in public health analysis. The prevalence of diseases in a given area is also frequently at the core of governmental policy decisions, charitable organization funding initiatives, and countless other aspects of everyday life. However, all too often, prevalence estimates are restricted to descriptive estimates of population characteristics when they could have a much wider application through the use of inferential statistics. As an estimate based on a sample from a population, disease prevalence can vary based on random fluctuations in that sample rather than true differences in the population characteristic. Statistical inference uses a known distribution of this sampling variation to perform hypothesis tests, calculate confidence intervals, and perform other advanced statistical methods. However, there is no agreed-upon sampling distribution of the prevalence estimate. In cases where the sampling distribution of an estimate is unknown, statisticians frequently rely on the bootstrap re-sampling procedure first given by Efron in 1979. This procedure relies on the computational power of software to generate repeated pseudo-samples similar in structure to an original, real data set. These multiple samples allow for the construction of confidence intervals and statistical tests to make statistical determinations and comparisons using the estimated prevalence. In this paper, we use the bootstrapping capabilities of SAS® 9.4 to statistically compare the difference between two given prevalence rates. We create a bootstrap analog of the two-sample t test to compare prevalence rates from two states, despite the fact that the sampling distribution of these estimates is unknown.
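The resampling engine can be sketched as follows for one state (the second state is handled identically); the data sets and the 0/1 DISEASE indicator are hypothetical, and the percentile bounds of the replicate-wise differences form the bootstrap confidence interval.

    proc surveyselect data=state_a out=boot_a seed=2016
                      method=urs samprate=1 reps=1000 outhits;
    run;

    proc means data=boot_a noprint nway;
       class replicate;
       var disease;                 /* 0/1 indicator: the mean is the prevalence */
       output out=prev_a mean=prev_a;
    run;

    /* after building PREV_B the same way */
    data diff;
       merge prev_a prev_b;
       by replicate;
       d = prev_a - prev_b;
    run;

    proc univariate data=diff noprint;
       var d;
       output out=ci pctlpts=2.5 97.5 pctlpre=d_;
    run;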
Read the paper (PDF)
Matthew Dutton, Florida A&M University
Charlotte Baker, Florida A&M University
Session 11682-2016:
Survival Analysis of the Patients Diagnosed with Non-Small Cell Lung Cancer Using SAS® Enterprise Miner™ 13.1
Cancer is the second-leading cause of death in the United States. About 10% to 15% of all lung cancers are non-small cell lung cancer, but they constitute about 26.8% of cancer deaths. An efficient cancer treatment plan is therefore of prime importance for increasing the survival chances of a patient. Cancer treatments are generally given to a patient in multiple sittings, and doctors often tend to make decisions based on improvement over time. Calculating the survival chances of the patient with respect to time and determining the various factors that influence the survival time would help doctors make more informed decisions about the further course of treatment and also help patients develop a proactive approach in making choices for the treatment. The objective of this paper is to analyze the survival time of patients suffering from non-small cell lung cancer, identify the time interval crucial for their survival, and identify various factors such as age, gender, and treatment type. The performances of the models built are analyzed to understand the significance of cubic splines used in these models. The data set is from the Cerner database with 548 records and 12 variables from 2009 to 2013. The patient records with loss to follow-up are censored. The survival analysis is performed using parametric and nonparametric methods in SAS® Enterprise Miner™ 13.1. The analysis revealed that the survival probability of a patient is high within two weeks of hospitalization and the probability of survival goes down by 60% between weeks 5 and 9 of admission. Age, gender, and treatment type play prominent roles in influencing the survival time and risk. The probability of survival for female patients decreases by 70% and 80% during weeks 6 and 7, respectively, and for male patients the probability of survival decreases by 70% and 50% during weeks 8 and 13, respectively.
Read the paper (PDF)
Raja Rajeswari Veggalam, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Akansha Gupta, Oklahoma State University
T
Session SAS2560-2016:
Ten Tips to Unlock the Power of Hadoop with SAS®
This paper discusses a set of practical recommendations for optimizing the performance and scalability of your Hadoop system using SAS®. Topics include recommendations gleaned from actual deployments from a variety of implementations and distributions. Techniques cover tips for improving performance and working with complex Hadoop technologies such as Kerberos, techniques for improving efficiency when working with data, methods to better leverage the SAS in Hadoop components, and other recommendations. With this information, you can unlock the power of SAS in your Hadoop system.
Read the paper (PDF)
Nancy Rausch, SAS
Wilbram Hazejager, SAS
Session 7840-2016:
Text Analytics and Assessing Patient Satisfaction
A survey is often the best way to obtain information and feedback to help in program improvement. At a bare minimum, survey research draws on economics, statistics, and psychology in order to develop a theory of how surveys measure and predict important aspects of the human condition. Drawing on healthcare-related research papers written by experts in the industry, I hypothesize a list of several factors that are important and necessary for patients during hospital visits. Through text mining using SAS® Enterprise Miner™ 12.1, I measure tangible aspects of hospital quality and patient care by using online survey responses and comments found on the website of the National Health Service in England. I use these patient comments to determine the majority opinion and whether it correlates with expert research on hospital quality. The implications of this research are vital to comprehending our health-care system and the factors that are important in satisfying patient needs. Analyzing survey responses can help us to understand and mitigate disparities in health-care services provided to population subgroups in the United States. Starting with online survey responses can help us to understand the overall methodology and motivation needed to analyze the surveys that have been developed for departments like the Centers for Medicare and Medicaid Services (CMS).
Read the paper (PDF) | View the e-poster or slides (PDF)
Divya Sridhar, Deloitte
Session 3980-2016:
Text Analytics and Brand Topic Maps
In this session, we examine the use of text analytics as a tool for strategic analysis of an organization, a leader, a brand, or a process. The software solutions SAS® Enterprise Miner™ and Base SAS® are used to extract topics from text and create visualizations that identify the relative placement of the topics next to various business entities. We review a number of case studies that identify and visualize brand-topic relationships in the context of branding, leadership, and service quality.
Read the paper (PDF) | Watch the recording
Nicholas Evangelopoulos, University of North Texas
Session 12489-2016:
The Application of Fatality Analysis Reporting System Data on the Road Safety Education of US, DC, and PR Minors
All public schools in the United States require health and safety education for their students. Furthermore, almost all states require driver education before minors can obtain a driver's license. Through extensive analysis of the Fatality Analysis Reporting System data, we have concluded that from 2011-2013 an average of 12.1% of all individuals killed in a motor vehicle accident in the United States, District of Columbia, and Puerto Rico were minors (18 years or younger). Our goal is to offer insight from our analysis in order to improve road safety education and prevent future premature deaths involving motor vehicles.
Read the paper (PDF)
Molly Funk, Bryant University
Max Karsok, Bryant University
Michelle Williams, Bryant University
Session 6541-2016:
The Baker's Dozen: What Every Biostatistician Needs to Know
It's impossible to know all of SAS® or all of statistics. There will always be some technique that you don't know. However, there are a few techniques that anyone in biostatistics should know. If you can calculate those with SAS, life is all the better. In this session you will learn how to compute and interpret a baker's dozen of these techniques, including several statistics that are frequently confused. The following statistics are covered: prevalence, incidence, sensitivity, specificity, attributable fraction, population attributable fraction, risk difference, relative risk, odds ratio, Fisher's exact test, number needed to treat, and McNemar's test. With these 13 tools in their tool chest, even nonstatisticians or statisticians who are not specialists will be able to answer many common questions in biostatistics. The fact that each of these can be computed with a few statements in SAS makes the situation all the sweeter. Bring your own doughnuts.
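Indeed, most of the thirteen fall out of PROC FREQ; a sketch with hypothetical variables shows a 2x2 analysis (risk difference, relative risk, odds ratio, Fisher's exact test) and McNemar's test for paired data.

    proc freq data=cohort;
       tables exposure*disease / riskdiff relrisk fisher;
    run;

    proc freq data=paired;
       tables before*after / agree;   /* AGREE gives McNemar's test for 2x2 */
    run;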
Read the paper (PDF)
AnnMaria De Mars, 7 Generation Games
Session 7120-2016:
The Combination of SAS® and VBA Makes Life Easier
VBA has been described as a glue language and has been widely used in exchanging data between Microsoft products such as Excel and Word or PowerPoint. How to trigger a VBA macro from SAS® via DDE has been widely discussed in recent years. However, using SAS to send parameters to a VBA macro has seldom been reported. This paper provides a solution for this problem. Copying Excel tables to PowerPoint using the combination of SAS and VBA is illustrated as an example. The SAS program rapidly scans all Excel files that are contained in one folder, passes the file information to VBA as parameters, and triggers the VBA macro to write PowerPoint files in a loop. As a result, a batch of PowerPoint files can be generated with just one mouse click.
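The mechanics might look like the sketch below: SAS pokes a parameter into a worksheet cell over DDE, then issues the X4ML RUN command to fire the macro. Excel must already be open on the same Windows machine, and the workbook, range, and macro names are hypothetical, not the author's actual ones.

    filename xlcell dde 'excel|[Params.xlsx]Sheet1!R1C1:R1C1';
    data _null_;
       file xlcell;
       put 'C:\reports\in';          /* parameter the VBA macro will read */
    run;

    filename cmds dde 'excel|system';
    data _null_;
       file cmds;
       put '[RUN("PERSONAL.XLSB!BuildDeck")]';   /* trigger the VBA macro */
    run;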
Read the paper (PDF) | Watch the recording
Zhu Yanrong, Medtronic
Session SAS6477-2016:
The Optimization of the Optimal Customer
For marketers who are responsible for identifying the best customer to target in a campaign, it is often daunting to determine which media channel, offer, or campaign program is the one the customer is more apt to respond to, and therefore, is more likely to increase revenue. This presentation examines the components of designing campaigns to identify promotable segments of customers and to target the optimal customers using SAS® Marketing Automation integrated with SAS® Marketing Optimization.
Read the paper (PDF)
Pamela Dixon, SAS
Session 5181-2016:
The Use of Statistical Sampling in Auditing Health-Care Insurance Claim Payments
This paper is a primer on the practice of designing, selecting, and making inferences on a statistical sample, where the goal is to estimate the magnitude of error in a book value total. Although the concepts and syntax are presented through the lens of an audit of health-care insurance claim payments, they generalize to other contexts. After presenting the fundamental measures of uncertainty that are associated with sample-based estimates, we outline a few methods to estimate the sample size necessary to achieve a targeted precision threshold. The benefits of stratification are also explained. Finally, we compare several viable estimators to quantify the book value discrepancy, making note of the scenarios where one might be preferred over the others.
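In practice, the selection and estimation steps reduce to a pair of procedures; the strata, sample sizes, and variables below are hypothetical, and the TOTAL= data set supplies stratum population counts for the finite population correction.

    proc surveyselect data=claims out=audit_sample
                      method=srs sampsize=(100 150 250) seed=1234 stats;
       strata payment_band;   /* e.g., low-, mid-, and high-dollar claims */
    run;

    /* after auditors record OVERPAYMENT for each sampled claim;
       STRATUM_TOTALS holds PAYMENT_BAND and _TOTAL_ (population counts) */
    proc surveymeans data=audit_sample total=stratum_totals mean clm sum;
       strata payment_band;
       var overpayment;
       weight samplingweight;
    run;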
Read the paper (PDF)
Taylor Lewis, U.S. Office of Personnel Management
Julie Johnson, OPM - Office of the Inspector General
Christine Muha, U.S. Office of Personnel Management
Session 7020-2016:
Three Methods to Dynamically Assign Colors to Plots Based on Group Value
Specifying colors based on group value is a popular practice in visualizing data, but it is not so easy to do, especially when there are multiple group values. This paper explores three different methods to dynamically assign colors to plots based on their group values: combining the EVAL and IFN functions in the plot statements; bringing the DISCRETEATTRMAP block into the plot statements; and using the macro from SAS® Sample 40255.
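A close SGPLOT relative of the GTL DISCRETEATTRMAP block is the discrete attribute map data set, which pins each group value to a fixed color no matter which values happen to appear in the data; the group values and colors in this sketch are hypothetical.

    data attrmap;
       length id $8 value $12 linecolor markercolor $10;
       input id $ value $ linecolor $ markercolor $;
       datalines;
    grp Placebo  gray gray
    grp LowDose  blue blue
    grp HighDose red  red
    ;
    run;

    proc sgplot data=results dattrmap=attrmap;
       series x=week y=mean / group=arm attrid=grp;
    run;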
Read the paper (PDF) | Watch the recording
Amos Shu, MedImmune
Session SAS3441-2016:
Tips and Techniques for Using Site-Signed HTTPS with SAS® 9.4
Are you going to enable HTTPS for your SAS® environment? Looking to improve the security of your SAS deployment? Do you need more details about how to efficiently configure HTTPS? This paper guides you through the configuration of SAS® 9.4 with HTTPS for the SAS middle tier. We examine how best to implement site-signed Transport Layer Security (TLS) certificates and explore how far you can take the encryption. This paper presents tips and proven practices that can help you be successful.
Read the paper (PDF)
Stuart Rogers, SAS
U
Session 12600-2016:
Unlocking Healthcare Data with Cloudera Enterprise Data Hub and SAS® to Improve Healthcare through Analytics
Over the years, complex medical data has been captured and retained in a variety of legacy platforms characterized by special formats, hierarchical/network relationships, and relational databases. Because of the complex data structures and the capacity constraints of outdated technologies, high-value health-care data locked in legacy systems has seen restricted use in analytics. With the emergence of highly scalable big data technologies such as Hadoop, it is now possible to move, transform, and unlock previously inaccessible legacy data economically. In addition, sourcing such data once on an enterprise data hub makes it possible to efficiently store, process, and analyze the data from multiple perspectives. This presentation illustrates how legacy data is not only unlocked but also enabled for rapid visual analytics by using Cloudera's enterprise data hub for data transformation together with the SAS® ecosystem.
Read the paper (PDF)
Session 9940-2016:
Use Capture-Recapture Model in SAS® to Estimate Prevalence of Disease
Administrative health databases, including hospital and physician records, are frequently used to estimate the prevalence of chronic diseases. Disease-surveillance information is used by policy makers and researchers to compare the health of populations and develop projections about disease burden. However, not all cases are captured by administrative health databases, which can result in biased estimates. Capture-recapture (CR) models, originally developed to estimate the sizes of animal populations, have been adapted by epidemiologists to estimate the total sizes of disease populations for conditions such as cancer, diabetes, and arthritis. Estimates of the number of cases are produced by assessing the degree of overlap among incomplete lists of disease cases captured in different sources. Two- and three-source CR models are most commonly used, often with covariates. Two important assumptions that underlie conventional CR models, independence of capture across data sources and homogeneity of capture probabilities, are unlikely to hold in epidemiological studies, and failure to satisfy them biases the model results. Log-linear, multinomial logistic regression, and conditional logistic regression models, if used properly, can incorporate dependency among sources and use covariates to model heterogeneity in capture probabilities. However, none of these models is optimal, and researchers might be unfamiliar with how to use them in practice. This paper demonstrates how to use SAS® to implement the log-linear, multinomial logistic regression, and conditional logistic regression CR models. Methods to address the assumptions of independence between sources and homogeneity of capture probabilities for a three-source CR model are provided. The paper uses a real numeric data set about Parkinson's disease involving physician claims, hospital abstracts, and prescription drug records from one Canadian province. Advantages and disadvantages of each model are discussed.
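As a simplified taste of the log-linear approach (a two-source model under the independence assumption, with made-up counts), the following sketch fits a Poisson model to the observed capture patterns and predicts the unobserved cell:

    data cells;
       input src1 src2 count;     /* (0,0) cell is unobserved */
       datalines;
    1 1 120
    1 0 260
    0 1 180
    0 0   .
    ;

    proc genmod data=cells;
       model count = src1 src2 / dist=poisson link=log;
       output out=fitted pred=mu;
    run;

The predicted value for the (0,0) row estimates the cases missed by both sources; adding it to the observed total estimates prevalence. Three-source models add pairwise interaction terms to relax the independence assumption, as the paper discusses.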
View the e-poster or slides (PDF)
Lisa Lix, University of Manitoba
Session 11844-2016:
Using Analytics to Devise Marketing Strategies for New Business
Someone has aptly said, "Las Vegas looks the way one would imagine heaven must look at night." What if you knew the secret to running a plethora of different businesses in the entertainment capital of the world? Nothing better, right? Well, we have what you want: all the necessary ingredients to know precisely which business to target in a particular locality of Las Vegas. Yelp, a community portal, wants to help people find great local businesses. It covers almost everything from dentists and hair stylists to mechanics and restaurants. Yelp's users, Yelpers, write reviews and give ratings for all types of businesses. Yelp then uses this data to recommend to Yelpers the establishments that best fit their individual needs. The Yelp academic data set comprises 1.6 million reviews and 500K tips from 366K users covering 61K businesses across several cities. We combine the current Yelp data sets for Las Vegas to create an interactive map that provides an overview of how businesses perform in each locality and how ratings and reviews affect a business. We answer the following questions: Where is the most appropriate neighborhood to start a new business (such as a cafe or a bar)? Which category of business has the greatest total count of reviews, making it the most talked-about (trending) business type in Las Vegas? How do a business's working hours affect customer reviews and the corresponding rating? Our findings support further research into how users' perceptions, expressed through reviews and ratings, relate to the growth of a business, encompassing a variety of topics in data mining and data visualization.
View the e-poster or slides (PDF)
Anirban Chakraborty, Oklahoma State University
Session 10920-2016:
Using Animation to Make Statistical Graphics Come to Life
The Statistical Graphics (SG) procedures and the Graph Template Language (GTL) are capable of generating powerful individual data displays. But what if one wants to visualize how a distribution changes with different parameters, or view multiple aspects of a three-dimensional plot, to give just two examples? By using macros to generate a graph for each frame, combined with the ODS PRINTER destination, it is possible to produce GIF files that make effective animated data displays. This paper outlines the syntax and strategy necessary to generate these displays and provides a handful of examples. Intermediate knowledge of PROC SGPLOT, PROC TEMPLATE, and the SAS® macro language is assumed.
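A minimal sketch of the animation mechanics follows: a macro loop writes one SGPLOT frame per parameter value into an animated GIF. The shifting normal density is a hypothetical example, not one of the paper's.

    options printerpath=gif animation=start animduration=0.3
            animloop=yes noanimoverlay;
    ods printer file='density.gif';

    %macro frames;
       %do mu = 0 %to 9;
          data frame;
             do x = -4 to 14 by 0.1;
                y = pdf('normal', x, &mu, 1);
                output;
             end;
          run;

          proc sgplot data=frame noautolegend;
             series x=x y=y;
             yaxis min=0 max=0.45;
             title "Normal density, mean = &mu";
          run;
       %end;
    %mend frames;
    %frames

    options animation=stop;
    ods printer close;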
Read the paper (PDF)
Jesse Pratt, Cincinnati Children's Hospital Medical Center
Session 1680-2016:
Using GENMOD to Analyze Correlated Data on Military Health System Beneficiaries Receiving Behavioral Health Care in South Carolina Health Care System
Many SAS® procedures can be used to analyze large amounts of correlated data. This study was a secondary analysis of data obtained from the South Carolina Revenue and Fiscal Affairs Office (RFA). The data includes medical claims from all health-care systems in South Carolina (SC). This study used the SAS procedure GENMOD to analyze a large amount of correlated data about Military Health System (MHS) beneficiaries who received behavioral health care in South Carolina health-care systems from 2005 to 2014. Behavioral health (BH) was defined by Major Diagnostic Categories (MDC) 19 (mental disorders and diseases) and 20 (alcohol/drug use). MDCs are formed by dividing all possible principal diagnoses from the International Classification of Diseases (ICD-9) codes into 25 mutually exclusive diagnostic categories. The sample included a total of 6,783 BH visits and 4,827 unique adult and child patients, including military service members, veterans, and their adult and child dependents with MHS insurance coverage. PROC GENMOD fit a multivariate GEE model with type of BH visit (mental health or substance abuse) as the dependent variable and gender, race group, age group, and discharge year as predictors. Hospital ID was used in the REPEATED statement with different correlation structures. Gender was significant with both the independent (p = .0001) and exchangeable (p = .0003) correlation structures. However, age group was significant using the independent correlation structure (p = .0160) but non-significant using the exchangeable structure (p = .0584). SAS is a powerful statistical program for analyzing large amounts of correlated data with categorical outcomes.
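For readers unfamiliar with the setup, a hedged sketch of such a GEE model follows; the data set and variable names are illustrative, not the study's actual code.

    proc genmod data=bh_visits descending;
       class hospital_id gender race_grp age_grp dc_year;
       model visit_type = gender race_grp age_grp dc_year
             / dist=binomial link=logit type3;
       /* Cluster visits within hospitals; exchangeable working
          correlation shown, TYPE=IND gives the independent fit */
       repeated subject=hospital_id / type=exch;
    run;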
Read the paper (PDF) | View the e-poster or slides (PDF)
Abbas Tavakoli, University of South Carolina
Jordan Brittingham, USC/ Arnold School of Public Health
Nikki R. Wooten, USC/College of Social Work
Session SAS6660-2016:
Using Metadata Queries To Build Row-Level Audit Reports in SAS® Visual Analytics
Sensitive data carries elevated security requirements and demands the flexibility to apply logic that subsets data based on user privileges. Following the instructions in SAS® Visual Analytics: Administration Guide gives you the ability to apply row-level permission conditions. After you have set the permissions, you must be able to prove through audits who has access and which row-level conditions apply. This paper gives you the ability to easily apply, validate, report, and audit all tables that have row-level permissions, along with the groups, users, and conditions that are applied. Take the hours of maintenance and lack of visibility out of row-level secured data, and build confidence in the data and analytics that are provided to the enterprise.
Read the paper (PDF) | Download the data file (ZIP)
Brandon Kirk, SAS
Session 5581-2016:
Using PROC TABULATE and LAG(n) Function for Rates of Change
For SAS® users, PROC TABULATE and PROC REPORT (with its compute blocks) are probably among the most common procedures for calculating and displaying data. It is, however, quite difficult to calculate and display changes from one column to another using data from other rows with just these two procedures. Compute blocks in PROC REPORT can calculate additional columns, but it is challenging to pick up values from other rows as inputs. This presentation shows how PROC TABULATE can work with the LAG(n) function to calculate rates of change from one period to another, which offers the flexibility of feeding data retrieved from other rows into the calculations. PROC REPORT is then used to produce the desired output. The same approach can be used in a variety of scenarios to produce customized reports.
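In outline (with a hypothetical enrollment data set), the idea is to compute the lagged comparison in an intermediate step and leave the display to the reporting procedure:

    /* Aggregate to one row per period */
    proc summary data=enrollment nway;
       class term;
       var headcount;
       output out=totals(drop=_:) sum=headcount;
    run;

    /* LAG pulls the prior row's value into the current row */
    data rates;
       set totals;
       prev = lag(headcount);
       if not missing(prev) then pct_change = (headcount - prev) / prev;
    run;

    proc report data=rates nowd;
       column term headcount pct_change;
       define term       / display 'Term';
       define headcount  / analysis 'Headcount' format=comma8.;
       define pct_change / analysis 'Change'    format=percent8.1;
    run;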
Read the paper (PDF) | Download the data file (ZIP) | Watch the recording
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
Session 11669-2016:
Using Predictive Analysis to Optimize Pharmaceutical Marketing
Most businesses have benefited from using advanced analytics for marketing and other decision making. Applying analytical techniques to pharmaceutical marketing, however, is an emerging challenge, because it is critical to ensure that the analysis makes sense from the medical side. Which drug is ultimately consumed for a specific disease is directly or indirectly influenced by many factors, including the disease's origins, health-care system policy, physicians' clinical decisions, and patients' perceptions and behaviors. The key to pharmaceutical marketing is identifying the targeted populations for specific diseases and focusing on those populations. Because the health-care environment constantly changes, predictive models are important for projecting how the targeted population changes over time, based on the patient journey and epidemiology. Time series analysis is used to forecast the number of cases of infectious diseases; correspondingly, demand for over-the-counter and prescribed medicines for a specific disease can be predicted. Accurate predictions provide valuable information for strategic campaign planning. Different analytical techniques are applied for different diseases. By taking the medical features of the disease and its epidemiology into account, predictions of the potential and total addressable markets can reveal more insightful marketing trends. And by simulating the important factors and quantifying how they affect the patient journey within the typical health-care system, the most accurate demand for specific medicines or treatments can be discovered. By monitoring the parameters in the dynamic simulation, smart decisions can be made using what-if comparisons to optimize marketing results.
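As one hedged example of the forecasting step (a SAS/ETS® sketch with a hypothetical data set of monthly case counts, not the author's actual model), exponential smoothing can project disease incidence twelve months ahead:

    /* CASES has a SAS date variable MONTH and a count N_CASES */
    proc esm data=cases out=forecasts outfor=details lead=12;
       id month interval=month;
       forecast n_cases / model=winters;   /* seasonal smoothing */
    run;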
Read the paper (PDF)
Xue Yao, Winnipeg Regional Health Authority
Session 11901-2016:
Using Propensity Score Analyses to Adjust for Selection Bias: A Study of Adolescent Mental Illness and Substance Use
An important strength of observational studies is the ability to estimate the effect of a key behavior or treatment on a specific health outcome. This is crucial because most health outcomes research studies are unable to use experimental designs due to ethical and other constraints. One drawback of observational studies (which experimental studies naturally control for) is that they lack the ability to randomize participants into treatment groups, which can introduce selection bias. One way to adjust for selection bias is through a propensity score analysis. In this study, we provide an example of how to use these analyses. Our question is whether recent substance abuse has an effect on an adolescent's identification of suicidal thoughts. To conduct this analysis, a selection bias was identified and adjusted for through three common forms of propensity scoring: stratification, matching, and regression adjustment. Each form is separately conducted, reviewed, and assessed for its effectiveness in improving the model. Data for this study was gathered through the Youth Risk Behavior Surveillance System, an ongoing nationwide project of the Centers for Disease Control and Prevention. This presentation is designed for any level of statistician, SAS® programmer, or data analyst with an interest in controlling for selection bias, as well as for anyone interested in the effects of substance abuse on mental illness.
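A minimal sketch of the stratification form follows; the data set and variable names are hypothetical illustrations, not the YRBSS codebook or the authors' code.

    /* 1. Estimate propensity scores for recent substance abuse */
    proc logistic data=yrbs descending;
       class sex race / param=ref;
       model substance_use = sex race age grade;
       output out=ps_data pred=pscore;
    run;

    /* 2. Stratify subjects into propensity score quintiles */
    proc rank data=ps_data groups=5 out=ps_strata;
       var pscore;
       ranks ps_quintile;
    run;

    /* 3. Outcome model adjusted for the propensity strata */
    proc logistic data=ps_strata descending;
       class ps_quintile / param=ref;
       model suicidal_thoughts = substance_use ps_quintile;
    run;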
Read the paper (PDF)
Deanna Schreiber-Gregory, National University
Session 9881-2016:
Using SAS® Arrays to Calculate Bouts of Moderate to Vigorous Physical Activity from Minute-by-Minute Fitbit Data
The increasing popularity and affordability of wearable devices, together with their ability to provide granular physical activity data down to the minute, have enabled researchers to conduct advanced studies on the effects of physical activity on health and disease. This presents statistical programmers with the challenge of processing the data and translating it into analyzable measures. One such measure is the number of time-specific bouts of moderate to vigorous physical activity (MVPA) (similar to exercise), which is needed to determine whether a participant meets current physical activity guidelines (for example, 150 minutes of MVPA per week performed in bouts of at least 20 minutes). In this paper, we illustrate how we used SAS® arrays to calculate the number of 20-minute bouts of MVPA per day. We provide working code showing how we processed Fitbit Flex data from 63 healthy volunteers whose physical activity was monitored daily for a period of 12 months.
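The core of the array logic can be sketched as follows, assuming one row per participant-day with minute flags MVPA1-MVPA1440 (1 = minute at or above moderate intensity); the variable names are illustrative, not the paper's.

    data bouts;
       set minutes;
       array m{1440} mvpa1-mvpa1440;
       run_len = 0;
       n_bouts = 0;
       do i = 1 to 1440;
          if m{i} = 1 then run_len + 1;
          else run_len = 0;
          /* Count each run the moment it reaches 20 consecutive
             minutes; longer runs still count as a single bout   */
          if run_len = 20 then n_bouts + 1;
       end;
       drop i run_len;
    run;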
Read the paper (PDF) | Download the data file (ZIP)
Faith Parsons, Columbia University Medical Center
Keith M Diaz, Columbia University Medical Center
Jacob E Julian, Columbia University Medical Center
Session 10729-2016:
Using SAS® to Conduct Multivariate Statistical Analysis in Educational Research: Exploratory Factor Analysis and Confirmatory Factor Analysis
Multivariate statistical analysis plays an increasingly important role in educational research as the number of variables being measured grows. In both cognitive and noncognitive assessments, many instruments that researchers aim to study contain a large number of variables, with each measured variable assigned to a specific factor of the larger construct. According to educational theory and prior empirical research, the factor structure of an instrument usually emerges in a consistent way. Two types of factor analysis are widely used to understand the latent relationships among these variables, depending on the scenario. (1) Exploratory factor analysis (EFA), performed using the SAS® procedure PROC FACTOR, is an advanced statistical method used to probe deeply into the relationship between the variables and the larger construct and then to develop a customized model for the specific assessment. (2) Once a model is established, confirmatory factor analysis (CFA) is conducted using the SAS procedure PROC CALIS to examine how well the model fits the data and then to adjust the model as needed. This paper presents the application of SAS to conduct these two types of factor analysis for various research purposes. Examples using real noncognitive assessment data are demonstrated, and the interpretation of the fit statistics is discussed.
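In skeletal form, the two steps might look like the sketch below, assuming a hypothetical ten-item instrument with two expected factors:

    /* Exploratory step: let the data suggest the structure */
    proc factor data=survey method=ml priors=smc rotate=promax
                nfactors=2 scree;
       var item1-item10;
    run;

    /* Confirmatory step: test the structure the EFA suggested */
    proc calis data=survey;
       factor
          F1 ===> item1-item5,
          F2 ===> item6-item10;
    run;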
Read the paper (PDF)
Jun Xu, Educational Testing Service
Steven Holtzman, Educational Testing Service
Kevin Petway, Educational Testing Service
Lili Yao, Educational Testing Service
Session 8260-2016:
Using SAS® to Integrate the LACE Readmissions Risk Score into the Electronic Health Record
The LACE readmission risk score is a methodology used by Kaiser Permanente Northwest (KPNW) to target and customize readmission prevention strategies for patients admitted to the hospital. This presentation shares how KPNW used SAS® in combination with Epic's Datalink to integrate the LACE score into its electronic health record (EHR) for use in real time. The LACE score is an objective measure composed of four components: L) length of stay; A) acuity of admission; C) pre-existing co-morbidities; and E) emergency department (ED) visits in the prior six months. SAS was used to perform complex calculations and combine data from multiple sources (which was not possible in the EHR alone) and then to calculate a score that was integrated back into the EHR. The technical approach includes a trigger macro to kick off the process once the database ETL completes, several explicit and implicit PROC SQL statements, a volatile temp table for filtering, and a series of SORT, MEANS, TRANSPOSE, and EXPORT procedures. We walk through the technical approach taken to generate and integrate the LACE score into Epic, describe the challenges we faced and how we overcame them, and share the beneficial results we have gained from the process.
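For orientation, a hedged sketch of the scoring arithmetic follows; the point values reflect the commonly published LACE index, but the input data set and variable names are hypothetical, not KPNW's implementation.

    data lace;
       set admissions;   /* los, acute_adm, charlson, ed_visits_6mo */
       select;                                   /* L: length of stay */
          when (los < 1)   l_pts = 0;
          when (los <= 3)  l_pts = los;
          when (los <= 6)  l_pts = 4;
          when (los <= 13) l_pts = 5;
          otherwise        l_pts = 7;
       end;
       a_pts = 3 * (acute_adm = 1);              /* A: acuity          */
       c_pts = ifn(charlson >= 4, 5, charlson);  /* C: co-morbidities  */
       e_pts = min(ed_visits_6mo, 4);            /* E: ED visits       */
       lace_score = sum(l_pts, a_pts, c_pts, e_pts);
    run;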
View the e-poster or slides (PDF)
Delilah Moore, Kaiser Permanente
Session 7340-2016:
Using a Single SAS® Radar Graphic to Describe Multiple Binary Variables within a Population
In health care and epidemiological research, there is a growing need for basic graphical output that is clear, easy to interpret, and easy to create. SAS® 9.3 has a very clear and customizable graphic called a radar graph, yet it can display the unique responses of only one variable, which is not useful for multiple binary variables. In this paper we describe a way to display multiple binary variables for a single population on a single radar graph. We then convert our method into a macro with as few parameters as possible to make the procedure accessible to everyday users.
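The essence of the method, sketched below with hypothetical condition flags, is to collapse the binary variables into one category/count pair and feed that single variable to the radar procedure:

    /* Sum each binary flag across the population */
    proc means data=patients noprint;
       var flag_asthma flag_diabetes flag_htn flag_obesity flag_smoker;
       output out=sums(drop=_:) sum=;
    run;

    /* One row per condition: name plus count */
    proc transpose data=sums
                   out=radar(rename=(_name_=condition col1=n));
    run;

    proc gradar data=radar;
       chart condition / freq=n;
    run;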
Read the paper (PDF) | Download the data file (ZIP)
Kevin Sundquist, Columbia University Medical Center
Jacob E Julian, Columbia University Medical Center
Faith Parsons, Columbia University Medical Center
W
Session SAS2400-2016:
What's New in SAS® Data Management
The latest releases of SAS® Data Integration Studio, SAS® Data Management Studio and SAS® Data Integration Server, SAS® Data Governance, and SAS/ACCESS® software provide a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types, including Hadoop, cloud, RDBMS, files, unstructured data, and streaming, and the ability to perform ETL and ELT transformations in diverse run-time environments, including SAS®, database systems, Hadoop, Spark, SAS® Analytics, cloud, and data virtualization environments. There are also new capabilities for lineage, impact analysis, clustering, and other data governance features, along with enhancements to master data and metadata management support. This paper provides an overview of the latest features of the SAS® Data Management product suite and includes use cases and examples for leveraging product capabilities.
Read the paper (PDF)
Nancy Rausch, SAS
Session SAS5520-2016:
When the Answer to Public or Private Is Both: Managing a Hybrid Cloud Environment
For many organizations, the answer to whether to manage their data and analytics in a public or private cloud is going to be both. Both can be the answer for many different reasons: common-sense logic not to replace a system that already works just to incorporate something new; legal or corporate regulations that require some data, but not all data, to remain in place; and even a desire to provide local employees with a traditional data center experience while providing remote or international employees with cloud-based analytics easily managed through software deployed via Amazon Web Services (AWS). In this paper, we discuss some of the unique technical challenges of managing a hybrid environment, including how to monitor system performance simultaneously for two systems that might not share the same infrastructure or provide comparable monitoring tools; how to manage authorization when access and permissions might be driven by two different security technologies that make a single protocol problematic to implement; and how to ensure overall automation of two platforms that might be independently automated but were not originally designed to work together. We also share lessons learned from a decade of experience implementing hybrid cloud environments.
Read the paper (PDF)
Ethan Merrill, SAS
Bryan Harkola, SAS
Session 9440-2016:
Who's Your Neighbor? A SAS® Algorithm for Finding Nearby Zip Codes
Even if you're not a GIS mapping pro, it pays to have some geographic problem-solving techniques in your back pocket. In this paper we illustrate a general approach to finding the closest location to any given US zip code, with a specific, user-accessible example of how to do it, using only Base SAS®. We also suggest a method for implementing the solution in a production environment, as well as demonstrate how parallel processing can be used to cut down on computing time if there are hardware constraints.
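A compact Base SAS sketch of the idea uses the SASHELP.ZIPCODE lookup table (X is longitude, Y is latitude) and the GEODIST function; the three candidate "locations" here are arbitrary examples, and a production version would restrict the cross join as the paper suggests.

    proc sql;
       create table nearest as
       select t.zip                              as target_zip,
              l.zip                              as nearest_zip,
              geodist(t.y, t.x, l.y, l.x, 'DM')  as miles
       from sashelp.zipcode t, sashelp.zipcode l
       where l.zip in (10001, 60601, 94105)   /* candidate sites */
       group by t.zip
       having calculated miles = min(calculated miles);
    quit;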
Read the paper (PDF) | Download the data file (ZIP)
Andrew Clapson, MD Financial Management
Annmarie Smith, HomeServe USA
Y
Session 10600-2016:
You Can Bet on It: Missing Observations Are Preserved with the PRELOADFMT and COMPLETETYPES Options
Do you write reports that sometimes have missing categories across all class variables? Some programmers write all sorts of additional DATA step code in order to show the zeros for the missing rows or columns. Did you ever wonder whether there is an easier way to accomplish this? PROC MEANS and PROC TABULATE, in conjunction with PROC FORMAT, can handle this situation with a couple of powerful options. With PROC TABULATE, we can use the PRELOADFMT and PRINTMISS options in conjunction with a user-defined format in PROC FORMAT to accomplish this task. With PROC SUMMARY, we can use the COMPLETETYPES option to get all the rows with zeros. This paper uses examples from Census Bureau tabulations to illustrate the use of these procedures and options to preserve missing rows or columns.
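A condensed sketch of both techniques, using a hypothetical four-region format and data set:

    proc format;
       value $regfmt 'NE'='Northeast' 'MW'='Midwest'
                     'SO'='South'     'WE'='West';
    run;

    /* PROC TABULATE: PRELOADFMT plus PRINTMISS shows zero rows */
    proc tabulate data=responses;
       class region / preloadfmt;
       format region $regfmt.;
       table region, n / printmiss misstext='0';
    run;

    /* PROC SUMMARY: COMPLETETYPES keeps rows for absent levels */
    proc summary data=responses completetypes nway;
       class region / preloadfmt;
       format region $regfmt.;
       output out=counts;
    run;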
Read the paper (PDF) | Watch the recording
Chris Boniface, Census Bureau
Janet Wysocki, U.S. Census Bureau