Research and Development Papers A-Z

A
Paper 1603-2015:
A Model Family for Hierarchical Data with Combined Normal and Conjugate Random Effects
Non-Gaussian outcomes are often modeled using members of the so-called exponential family. Well-known members are the Bernoulli model for binary data, leading to logistic regression, and the Poisson model for count data, leading to Poisson regression. Two of the main reasons for extending this family are (1) the occurrence of overdispersion, meaning that the variability in the data is not adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of hierarchical structure in the data, stemming from clustering in the data which, in turn, might result from repeatedly measuring the outcome, for various members of the same family, and so on. The first issue is dealt with through a variety of overdispersion models such as the beta-binomial model for grouped binary data and the negative-binomial model for counts. Clustering is often accommodated through the inclusion of random subject-specific effects. Conventionally, though not always, such random effects are assumed to be normally distributed. While both of these phenomena might occur simultaneously, models combining them are uncommon. This paper proposes a broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. We place particular emphasis on so-called conjugate random effects at the level of the mean for the first aspect and normal random effects embedded within the linear predictor for the second aspect, even though our family is more general. The binary, count, and time-to-event cases are given particular emphasis. Apart from model formulation, we present an overview of estimation methods, and then settle for maximum likelihood estimation with analytic-numerical integration. Implications for the derivation of marginal correlation functions are discussed. The methodology is applied to data from a study of epileptic seizures, a clinical trial for a toenail infection named onychomycosis, and survival data in children with asthma.
Read the paper (PDF). | Watch the recording.
Geert Molenberghs, Universiteit Hasselt & KU Leuven
Paper 3560-2015:
A SAS Macro to Calculate the PDC Adjustment of Inpatient Stays
The Centers for Medicare & Medicaid Services (CMS) uses the Proportion of Days Covered (PDC) to measure medication adherence, and some PDC-related research is based on Medicare Part D Event (PDE) data. However, under Medicare rules, beneficiaries who receive care at an inpatient (IP) facility may receive Medicare-covered medications directly from the IP rather than by filling prescriptions through their Part D contracts; thus, their medication fills during an IP stay would not be included in the PDE claims used to calculate the Patient Safety adherence measures (Medicare 2014 Part C & D Star Rating technical notes). Therefore, the previous PDC calculation method underestimated the true PDC value. Starting with the 2013 Star Rating, the PDC calculation was adjusted for IP stays: when a patient has an inpatient admission during the measurement period, the inpatient stay is censored for the PDC calculation, and if the patient also has measured drug coverage during the inpatient stay, the drug supplied during the stay is shifted to after the stay. This shifting can in turn trigger a chain of further shifts. This paper presents a SAS® macro that uses the SAS hash object to match inpatient stays, censor the inpatient stays, shift the drug start and end dates, and calculate the adjusted PDC.
Read the paper (PDF). | Download the data file (ZIP).
anping Chang, IHRC Inc.
Paper 2141-2015:
A SAS® Macro to Compare Predictive Values of Diagnostic Tests
Medical tests are used for various purposes, including diagnosis, prognosis, risk assessment, and screening. Statistical methodology is often used to evaluate such tests; the measures most frequently used for binary data are sensitivity, specificity, and positive and negative predictive values. An important goal in diagnostic medicine research is to estimate and compare the accuracies of such tests. In this paper I give a gentle introduction to measures of diagnostic test accuracy and introduce a SAS® macro to calculate the generalized score statistic and weighted generalized score statistic for comparison of predictive values, using formulas generalized and proposed by Andrzej S. Kosinski.
Read the paper (PDF).
Lovedeep Gondara, University of Illinois Springfield
Paper 2980-2015:
A Set of SAS® Macros for Generating Survival Analysis Reports for Lifetime Data with or without Competing Risks
This paper introduces a set of SAS® macros, %LIFETEST and %LIFETESTEXPORT, that generate survival analysis reports for data with or without competing risks. The macros provide a wrapper of PROC LIFETEST and an enhanced version of the SAS autocall macro %CIF to give users an easy-to-use interface for reporting both survival estimates and cumulative incidence estimates in a unified way. The macros also provide a number of parameters that enable users to flexibly adjust how the final reports should look, without the need to manually input or format them.
Read the paper (PDF). | Download the data file (ZIP).
Zhen-Huan Hu, Medical College of Wisconsin
Paper 3311-2015:
Adaptive Randomization Using PROC MCMC
Based on work by Thall et al. (2012), we implement a method for randomizing patients in a Phase II trial. We accumulate evidence that identifies which dose(s) of a cancer treatment provide the most desirable profile, per a matrix of efficacy and toxicity combinations rated by expert oncologists (0-100). Experts also define the region of Good utility scores and criteria of dose inclusion based on toxicity and efficacy performance. Each patient is rated for efficacy and toxicity at a specified time point. Simulation work is done mainly using PROC MCMC in which priors and likelihood function for joint outcomes of efficacy and toxicity are defined to generate posteriors. Resulting joint probabilities for doses that meet the inclusion criteria are used to calculate the mean utility and probability of having Good utility scores. Adaptive randomization probabilities are proportional to the probabilities of having Good utility scores. A final decision of the optimal dose will be made at the end of the Phase II trial.
Read the paper (PDF).
Qianyi Huang, McDougall Scientific Ltd.
John Amrhein, McDougall Scientific Ltd.
Paper 3492-2015:
Alien Nation: Text Analysis of UFO Sightings in the US Using SAS® Enterprise Miner™ 13.1
Are we alone in this universe? This is a question that undoubtedly passes through every mind several times during a lifetime. We often hear a lot of stories about close encounters, Unidentified Flying Object (UFO) sightings, and other mysterious things, but we lack documented evidence for analysis on this topic. UFOs have been a matter of public interest for a long time. The objective of this paper is to analyze a database that contains documented reports of UFO sightings and to uncover any fascinating story related to the data. Using SAS® Enterprise Miner™ 13.1, the powerful capabilities of text analytics and topic mining are leveraged to summarize the associations between reported sightings. We used PROC GEOCODE to convert addresses of sightings to locations on the map and then used the GMAP procedure to produce a heat map representing the frequency of sightings in various locations. The GEOCODE procedure converts address data to geographic coordinates (latitude and longitude values), which can then be used on a map to calculate distances or to perform spatial analysis. Preliminary analysis of the data found that the most popular words associated with UFOs describe their shapes, formations, movements, and colors. The Text Profile node in SAS Enterprise Miner 13.1 was leveraged to build a model and cluster the data into different levels of a segment variable. We also explain how opinions about the UFO sightings change over time using text profiling. Further, this analysis uses the Text Profile node to find interesting terms or topics that were used to describe the UFO sightings. Based on feedback received at the SAS® Analytics Conference, we plan to incorporate a technique to filter duplicate comments and to include the weather at each sighting's location.
Read the paper (PDF). | Download the data file (ZIP).
Pradeep Reddy Kalakota, Federal Home Loan Bank of Des Moines
Naresh Abburi, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Zabiulla Mohammed, Oklahoma State University
Paper 3197-2015:
All Payer Claims Databases (APCDs) in Data Transparency and Quality Improvement
Since Maine established the first All Payer Claims Database (APCD) in 2003, 10 additional states have established APCDs and 30 others are in development or show strong interest in establishing APCDs. APCDs are generally mandated by legislation, though voluntary efforts exist. They are administered through various agencies, including state health departments or other governmental agencies and private not-for-profit organizations. APCDs receive funding from various sources, including legislative appropriations and private foundations. To ensure sustainability, APCDs must also consider the sale of data access and reports as a source of revenue. With the advent of the Affordable Care Act, there has been an increased interest in APCDs as a data source to aid in health care reform. The call for greater transparency in health care pricing and quality, development of Patient-Centered Medical Homes (PCMHs) and Accountable Care Organizations (ACOs), expansion of state Medicaid programs, and establishment of health insurance and health information exchanges have increased the demand for the type of administrative claims data contained in an APCD. Data collection, management, analysis, and reporting issues are examined with examples from implementations of live APCDs. The development of data intake, processing, warehousing, and reporting standards is discussed in light of achieving the triple aim of improving the individual experience of care; improving the health of populations; and reducing the per capita costs of care. APCDs are compared and contrasted with other sources of state-level health care data, including hospital discharge databases, state departments of insurance records, and institutional and consumer surveys. The benefits and limitations of administrative claims data are reviewed. Specific issues addressed with examples include implementing transparent reporting of service prices and provider quality, maintaining master patient and provider identifiers, validating APCD data and comparing them with other state health care data available to researchers and consumers, defining data suppression rules to ensure patient confidentiality and HIPAA-compliant data release and reporting, and serving multiple end users, including policy makers, researchers, and consumers with appropriately consumable information.
Read the paper (PDF). | Watch the recording.
Paul LaBrec, 3M Health Information Systems
Paper 3412-2015:
Alternative Methods of Regression When Ordinary Least Squares Regression Is Not Right
Ordinary least squares regression is one of the most widely used statistical methods. However, it is a parametric model and relies on assumptions that are often not met. Alternative methods of regression for continuous dependent variables relax these assumptions in various ways. This paper explores procedures such as QUANTREG, ADAPTIVEREG, and TRANSREG for these kinds of data.
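As a brief, hedged illustration of the kinds of calls the paper explores (the data set SCORES and the variables Y, X1, and X2 below are hypothetical):

/* Median (quantile) regression: no normality or constant-variance assumption */
proc quantreg data=scores;
   model y = x1 x2 / quantile=0.5;
run;

/* Adaptive regression splines: relaxes the linearity assumption */
proc adaptivereg data=scores;
   model y = x1 x2;
run;

/* Box-Cox transformation of the response before ordinary regression */
proc transreg data=scores;
   model boxcox(y / lambda=-2 to 2 by 0.25) = identity(x1 x2);
run;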
Read the paper (PDF). | Watch the recording.
Peter Flom, Peter Flom Consulting
Paper 3300-2015:
An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies
Missing data are a common and significant problem that researchers and data analysts encounter in applied research. Because most statistical procedures require complete data, missing data can substantially affect the analysis and the interpretation of results if left untreated. Methods to treat missing data have been developed so that missing values are imputed and analyses can be conducted using standard statistical procedures. Among these missing data methods, multiple imputation has received considerable attention and its effectiveness has been explored (for example, in the context of survey and longitudinal research). This paper compares four multiple imputation approaches for treating missing continuous covariate data under MCAR, MAR, and NMAR assumptions, in the context of propensity score analysis and observational studies. The comparison of the four MI approaches in terms of bias in parameter estimates, Type I error rates, and statistical power is presented. In addition, complete case analysis (listwise deletion) is presented as the default analysis that would be conducted if missing data are not treated. Issues are discussed, and conclusions and recommendations are provided.
Read the paper (PDF).
Patricia Rodriguez de Gil, University of South Florida
Shetay Ashford, University of South Florida
Chunhua Cao, University of South Florida
Eun-Sook Kim, University of South Florida
Rheta Lanehart, University of South Florida
Reginald Lee, University of South Florida
Jessica Montgomery, University of South Florida
Yan Wang, University of South Florida
Paper 3294-2015:
An Introduction to Testing for Unit Roots Using SAS®: The Case of U.S. National Health Expenditures
Testing for unit roots and determining whether a data set is nonstationary is important for the economist who does empirical work. SAS® enables the user to detect unit roots using an array of tests: the Dickey-Fuller, Augmented Dickey-Fuller, Phillips-Perron, and the Kwiatkowski-Phillips-Schmidt-Shin test. This paper presents a brief overview of unit roots and shows how to test for a unit root using the example of U.S. national health expenditure data.
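As a starting point, a minimal sketch of such tests in SAS/ETS® follows; the series EXPEND and data set NHE are illustrative, and the exact stationarity options available depend on the SAS/ETS release.

/* Augmented Dickey-Fuller, Phillips-Perron, and KPSS tests from PROC AUTOREG */
proc autoreg data=nhe;
   model expend = / stationarity=(adf, phillips, kpss);
run;

/* Dickey-Fuller tests from the IDENTIFY statement in PROC ARIMA */
proc arima data=nhe;
   identify var=expend stationarity=(adf=(0,1,2));
run;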
Read the paper (PDF). | Download the data file (ZIP).
Don McCarthy, Kaiser Permanente
Paper 2600-2015:
An Introductory Overview of the Features of Complex Survey Data
A complex survey data set is one characterized by any combination of the following four features: stratification, clustering, unequal weights, or finite population correction factors. In this paper, we provide context for why these features might appear in data sets produced from surveys, highlight some of the formulaic modifications they introduce, and outline the syntax needed to properly account for them. Specifically, we explain why you should use the SURVEY family of SAS/STAT® procedures, such as PROC SURVEYMEANS or PROC SURVEYREG, to analyze data of this type. Although many of the syntax examples are drawn from a fictitious expenditure survey, we also discuss the origins of complex survey features in three real-world survey efforts sponsored by statistical agencies of the United States government--namely, the National Ambulatory Medical Care Survey, the National Survey of Family Growth, and the Consumer Building Energy Consumption Survey.
Read the paper (PDF). | Watch the recording.
Taylor Lewis, University of Maryland
Paper 1644-2015:
Analysis of Survey Data Using the SAS SURVEY Procedures: A Primer
This paper provides an overview of analysis of data derived from complex sample designs. General discussion of how and why analysis of complex sample data differs from standard analysis is included. In addition, a variety of applications are presented using PROC SURVEYMEANS, PROC SURVEYFREQ, PROC SURVEYREG, PROC SURVEYLOGISTIC, and PROC SURVEYPHREG, with an emphasis on correct usage and interpretation of results.
Read the paper (PDF). | Watch the recording.
Patricia Berglund, University of Michigan
Paper SAS1332-2015:
Analyzing Spatial Point Patterns Using the New SPP Procedure
In many spatial analysis applications (including crime analysis, epidemiology, ecology, and forestry), spatial point process modeling can help you study the interaction between different events and help you model the process intensity (the rate of event occurrence per unit area). For example, crime analysts might want to estimate where crimes are likely to occur in a city and whether they are associated with locations of public features such as bars and bus stops. Forestry researchers might want to estimate where trees grow best and test for association with covariates such as elevation and gradient. This paper describes the SPP procedure, new in SAS/STAT® 13.2, for exploring and modeling spatial point pattern data. It describes methods that PROC SPP implements for exploratory analysis of spatial point patterns and for log-linear intensity modeling that uses covariates. It also shows you how to use specialized functions for studying interactions between points and how to use specialized analytical graphics to diagnose log-linear models of spatial intensity. Crime analysis, forestry, and ecology examples demonstrate key features of PROC SPP.
Read the paper (PDF).
Pradeep Mohan, SAS
Randy Tobias, SAS
Paper 3327-2015:
Automated Macros to Extract Data from the National (Nationwide) Inpatient Sample (NIS)
The use of administrative databases for understanding practice patterns in the real world has become increasingly apparent. This is essential in the current health-care environment. The Affordable Care Act has helped us to better understand the current use of technology and different approaches to surgery. This paper describes a method for extracting specific information about surgical procedures from the Healthcare Cost and Utilization Project (HCUP) database (also referred to as the National (Nationwide) Inpatient Sample (NIS)). The analyses provide a framework for comparing the different modalities of surgical procedures of interest. Using an NIS database for a single year, we want to identify cohorts based on surgical approach. We do this by identifying the ICD-9 codes specific to robotic surgery, laparoscopic surgery, and open surgery. After we identify the appropriate codes using an ARRAY statement, a similar array is created based on the ICD-9 codes. Any minimally invasive procedure (robotic or laparoscopic) that results in a conversion is flagged as a conversion. Comorbidities are identified by ICD-9 codes representing the severity of each subject and merged with the NIS inpatient core file. Using a FORMAT statement for all diagnosis variables, we create macros that can be regenerated for each type of complication. These macros are compiled in SAS® and stored in a library containing the four macros that are called to build the tables; they invoke the macros with different macro variables, compute the frequencies for all cohorts, and create the table structure with the table title and number. This paper describes a systematic method in SAS/STAT® 9.2 to extract the data from the NIS using the ARRAY statement for the specific ICD-9 codes, to format the extracted data for the analysis, to merge the different NIS databases by procedures, and to use automated macros to generate the report.
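A condensed sketch of the array logic described above is shown below; the procedure-code variables PR1-PR15 follow the NIS layout, but the ICD-9 code lists are illustrative placeholders rather than the study's actual code sets.

data cohorts;
   set nis.core;
   array prc {15} pr1-pr15;                     /* ICD-9 procedure codes on the record   */
   robotic = 0; laparoscopic = 0; open = 0;
   do i = 1 to 15;
      if prc{i} in ('1742','1744')      then robotic = 1;       /* placeholder robotic codes      */
      else if prc{i} in ('5421','6851') then laparoscopic = 1;  /* placeholder laparoscopic codes */
      else if prc{i} in ('5459')        then open = 1;          /* placeholder open codes         */
   end;
   /* a minimally invasive case that also carries an open code is flagged as a conversion */
   conversion = ((robotic or laparoscopic) and open);
   drop i;
run;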
Read the paper (PDF).
Ravi Tejeshwar Reddy Gaddameedi, California State University, East Bay
Usha Kreaden, Intuitive Surgical
B
Paper 3643-2015:
Before and After Models in Observational Research Using Random Slopes and Intercepts
In observational data analyses, it is often helpful to use patients as their own controls by comparing their outcomes before and after some signal event, such as the initiation of a new therapy. It might be useful to have a control group that does not have the event but that is instead evaluated before and after some arbitrary point in time, such as their birthday. In this context, the change over time is a continuous outcome that can be modeled as a (possibly discontinuous) line, with the same or different slope before and after the event. Mixed models can be used to estimate random slopes and intercepts and compare patients between groups. A specific example published in a peer-reviewed journal is presented.
Read the paper (PDF).
David Pasta, ICON Clinical Research
Paper 3162-2015:
Best Practices: Subset without Getting Upset
You've worked for weeks or even months to produce an analysis suite for a project. Then, at the last moment, someone wants a subgroup analysis, and they inform you that they need it yesterday. This should be easy to do, right? So often, the programs that we write fall apart when we use them on subsets of the original data. This paper takes a look at some of the best practice techniques that can be built into a program at the beginning, so that users can subset on the fly without losing categories or creating errors in statistical tests. We review techniques for creating tables and corresponding titles with BY-group processing so that minimal code needs to be modified when more groups are created. And we provide a link to sample code and sample data that can be used to get started with this process.
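One of the simplest of these techniques is to drive both the table and its title from the BY variable so that new subgroups require no code changes; the sketch below uses an illustrative data set ADVERSE sorted by the subgroup variable TRTGRP.

options nobyline;                               /* suppress the default BY line          */
title "Adverse Events by Severity - Treatment Group: #BYVAL(trtgrp)";

proc freq data=adverse;
   by trtgrp;                                   /* one table per subgroup, automatically */
   tables aeterm*severity / missing;
run;

options byline;
title;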
Read the paper (PDF).
Mary Rosenbloom, Edwards Lifesciences, LLC
Kirk Paul Lafler, Software Intelligence Corporation
C
Paper 2080-2015:
Calculate Decision Consistency Statistics for a Single Administration of a Test
Many certification programs classify candidates into performance levels. For example, the SAS® Certified Base Programmer breaks down candidates into two performance levels: Pass and Fail. It is important to note that because all test scores contain measurement error, the performance level categorizations based on those test scores are also subject to measurement error. An important part of psychometric analysis is to estimate the decision consistency of the classifications. This study helps fill a gap in estimating decision consistency statistics for a single administration of a test using SAS.
Read the paper (PDF).
Fan Yang, The University of Iowa
Yi Song, University of Illinois at Chicago
Paper 3148-2015:
Catering to Your Tastes: Using PROC OPTEX to Design Custom Experiments, with Applications in Food Science and Field Trials
The success of an experimental study almost always hinges on how you design it. Does it provide estimates for everything you're interested in? Does it take all the experimental constraints into account? Does it make efficient use of limited resources? The OPTEX procedure in SAS/QC® software enables you to focus on specifying your interests and constraints, and it takes responsibility for handling them efficiently. With PROC OPTEX, you skip the step of rifling through tables of standard designs to try to find the one that's right for you. You concentrate on the science and the analytics and let SAS® do the computing. This paper reviews the features of PROC OPTEX and shows them in action using examples from field trials and food science experimentation. PROC OPTEX is a useful tool for all these situations, doing the designing and freeing the scientist to think about the food and the biology.
Read the paper (PDF). | Download the data file (ZIP).
Cliff Pereira, Dept of Statistics, Oregon State University
Randy Tobias, SAS
Paper 1329-2015:
Causal Analytics: Testing, Targeting, and Tweaking to Improve Outcomes
This session is an introduction to predictive analytics and causal analytics in the context of improving outcomes. The session covers the following topics: 1) Basic predictive analytics vs. causal analytics; 2) The causal analytics framework; 3) Testing whether the outcomes improve because of an intervention; 4) Targeting the cases that have the best improvement in outcomes because of an intervention; and 5) Tweaking an intervention in a way that improves outcomes further.
Read the paper (PDF).
Jason Pieratt, Humana
Paper 2380-2015:
Chi-Square and T Tests Using SAS®: Performance and Interpretation
Data analysis begins with cleaning up data, calculating descriptive statistics, and examining variable distributions. Before more rigorous statistical analysis begins, many statisticians perform basic inferential statistical tests such as chi-square and t tests to assess unadjusted associations. These tests help guide the direction of the more rigorous statistical analysis. We show how to perform chi-square and t tests, explain how to interpret the output and where to look for the association or difference based on the hypothesis being tested, and propose next steps for further analysis using example data.
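For reference, both tests reduce to very compact SAS code; the data set STUDY and the variables GROUP, OUTCOME, and BMI below are illustrative.

/* Chi-square (and, for sparse tables, Fisher's exact) test of association */
proc freq data=study;
   tables group*outcome / chisq expected;
run;

/* Two-sample t test, including the folded F test of equal variances */
proc ttest data=study;
   class group;
   var bmi;
run;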
Read the paper (PDF).
Maribeth Johnson, Georgia Regents University
Jennifer Waller, Georgia Regents University
Paper 3291-2015:
Coding Your Own MCMC Algorithm
In Bayesian statistics, Markov chain Monte Carlo (MCMC) algorithms are an essential tool for sampling from probability distributions. PROC MCMC provides ready-made implementations of these algorithms. However, it is often desirable to code an algorithm from scratch; this is especially true in academia, where students are expected to be able to understand and code an MCMC algorithm. The ability of SAS® to accomplish this is relatively unknown yet quite straightforward. We use SAS/IML® to demonstrate methods for coding an MCMC algorithm, with examples of a Gibbs sampler and a Metropolis-Hastings random walk.
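To give a flavor of such hand-coded samplers, here is a minimal random-walk Metropolis sketch in SAS/IML® for the mean of normal data with known variance; the data values, step size, and chain length are all illustrative.

proc iml;
   call randseed(2015);
   y = {1.2, 0.7, 2.3, 1.5, 0.9};               /* illustrative data; sigma fixed at 1    */
   nsim  = 5000;
   theta = j(nsim, 1, 0);                        /* Markov chain for the mean              */
   e = j(1, 1, .);   u = j(1, 1, .);
   logpost = sum(logpdf("Normal", y, theta[1], 1));   /* log-likelihood under a flat prior */
   do i = 2 to nsim;
      call randgen(e, "Normal", 0, 0.5);         /* random-walk proposal step              */
      prop   = theta[i-1] + e;
      lpProp = sum(logpdf("Normal", y, prop, 1));
      call randgen(u, "Uniform");
      if log(u) < lpProp - logpost then do;      /* Metropolis accept/reject               */
         theta[i] = prop;   logpost = lpProp;
      end;
      else theta[i] = theta[i-1];
   end;
   postMean = mean(theta[1001:nsim]);            /* posterior mean after burn-in           */
   print postMean;
quit;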
Read the paper (PDF).
Chelsea Lofland, University of California Santa Cruz
Paper 1424-2015:
Competing Risk Survival Analysis Using SAS®: When, Why, and How
Competing risks arise in time-to-event data when the event of interest cannot be observed because a competing event occurs first. For example, if the event of interest is a specific cause of death, then death from any other cause is a competing event; if the focus is relapse, then death before relapse constitutes a competing event. It is well established that, in the presence of competing risks, standard product-limit methods yield biased results because their basic assumption is violated. The effect of competing events on parameter estimation depends on their distribution and frequency. Fine and Gray's sub-distribution hazard model can be used in the presence of competing events and is available in PROC PHREG with the release of version 9.4 of SAS® software.
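For orientation, the sub-distribution hazard model is requested through the EVENTCODE= option of the MODEL statement in PROC PHREG (SAS/STAT® 13.1, shipped with SAS 9.4, or later); in the illustrative sketch below, STATUS=1 is the event of interest, STATUS=2 a competing event, and STATUS=0 a censored observation.

proc phreg data=relapse;
   class trt;
   model time*status(0) = trt age / eventcode=1;   /* Fine and Gray sub-distribution hazards */
run;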
Read the paper (PDF).
Lovedeep Gondara, University of Illinois Springfield
Paper 1442-2015:
Confirmatory Factor Analysis Using PROC CALIS: A Practical Guide for Survey Researchers
Survey research can provide a straightforward and effective means of collecting input on a range of topics. Survey researchers often like to group similar survey items into construct domains in order to make generalizations about a particular area of interest. Confirmatory Factor Analysis is used to test whether this pre-existing theoretical model underlies a particular set of responses to survey questions. Based on Structural Equation Modeling (SEM), Confirmatory Factor Analysis provides the survey researcher with a means to evaluate how well the actual survey response data fits within the a priori model specified by subject matter experts. PROC CALIS now provides survey researchers the ability to perform Confirmatory Factor Analysis using SAS®. This paper provides a survey researcher with the steps needed to complete Confirmatory Factor Analysis using SAS. We discuss and demonstrate the options available to survey researchers in the handling of missing and not applicable survey responses using an ARRAY statement within a DATA step and imputation of item non-response. A simple demonstration of PROC CALIS is then provided with interpretation of key portions of the SAS output. Using recommendations provided by SAS from the PROC CALIS output, the analysis is then modified to provide a better fit of survey items into survey domains.
Read the paper (PDF).
Lindsey Brown Philpot, Baylor Scott & White Health
Sunni Barnes, Baylor Scott & White Health
Crystal Carel, Baylor Scott & White Health Care System
Paper SAS1854-2015:
Creating Reports in SAS® Visual Analytics Designer That Dynamically Substitute Graph Roles on the Fly Using Parameterized Expressions
With the expansive new features in SAS® Visual Analytics 7.1, you can now take control of the graph data while viewing a report. Using parameterized expressions, calculated items, custom categories, and prompt controls, you can now change the measures or categories on a graph from a mobile device or web viewer. View your data from different perspectives while using the same graph. This paper demonstrates how you can use these features in SAS® Visual Analytics Designer to create reports in which graph roles can be dynamically changed with the click of a button.
Read the paper (PDF).
Kenny Lui, SAS
Paper 2242-2015:
Creative Uses of Vector Plots Using SAS®
Hysteresis loops occur because of a time lag between input and output. In the pharmaceutical industry, hysteresis plots are tools to visualize the time lag between drug concentration and drug effects. Before SAS® 9.2, SAS annotations were used to generate such plots. One of the criticisms was that SAS programmers had to write complex macros to automate the process; code management and validation tasks were not easy. With SAS 9.2, SAS programmers are able to generate such plots with ease. This paper demonstrates the generation of such plots with both Base SAS and SAS/GRAPH® software.
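One way to draw such a loop in current SAS releases, sketched here with a hypothetical concentration-effect data set PK sorted by time, is the VECTOR statement in PROC SGPLOT, which connects each observation to the previous one and preserves the direction of the loop.

data loop;
   set pk;                        /* one record per time point, sorted by time */
   xprev = lag(conc);             /* previous concentration                    */
   yprev = lag(effect);           /* previous effect                           */
run;

proc sgplot data=loop;
   vector x=conc y=effect / xorigin=xprev yorigin=yprev;  /* arrows trace the hysteresis loop */
   xaxis label="Drug concentration";
   yaxis label="Drug effect";
run;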
Read the paper (PDF).
Deli Wang, REGENERON
Paper 3249-2015:
Cutpoint Determination Methods in Survival Analysis Using SAS®: Updated %FINDCUT Macro
Statistical analyses that use data from clinical or epidemiological studies include continuous variables such as patient age, blood pressure, and various biomarkers. Over the years, there has been an increase in studies that focus on assessing associations between biomarkers and the disease of interest. Many of the biomarkers are measured as continuous variables. Investigators seek to identify a possible cutpoint to classify patients as high risk versus low risk based on the value of the biomarker. Several data-oriented techniques, such as the median and upper quartile, and outcome-oriented techniques based on score, Wald, and likelihood ratio tests are commonly used in the literature. Contal and O'Quigley (1999) presented a technique that uses the log-rank test statistic to estimate the cutpoint. Their method was computationally intensive and hence was overlooked due to the unavailability of built-in options in standard statistical software. In 2003, we provided the %FINDCUT macro, which uses Contal and O'Quigley's approach to identify a cutpoint when the outcome of interest is measured as time to event. Over the past decade, demand for this macro has continued to grow, which has led us to update the %FINDCUT macro to incorporate new tools and procedures from SAS® such as array processing, Graph Template Language, and the REPORT procedure. New and updated features include results presented in a much cleaner report format, user-specified cutpoints, macro parameter error checking, temporary data set cleanup, preservation of current option settings, and increased processing speed. We present the utility and added options of the revised %FINDCUT macro using a real-life data set. In addition, we critically compare this method to some existing methods and discuss the use and misuse of categorizing a continuous covariate.
Read the paper (PDF).
Jay Mandrekar, Mayo Clinic
Jeffrey Meyers, Mayo Clinic
D
Paper 3104-2015:
Data Management Techniques for Complex Healthcare Data
Data sharing through healthcare collaboratives and national registries creates opportunities for secondary data analysis projects. These initiatives provide data for quality comparisons as well as endless research opportunities to external researchers across the country. The possibilities are bountiful when you join data from diverse organizations and look for common themes related to illnesses and patient outcomes. With these great opportunities comes great pain for data analysts and health services researchers tasked with compiling these data sets according to specifications. Patient care data is complex, and, particularly at large healthcare systems, might be managed with multiple electronic health record (EHR) systems. Matching data from separate EHR systems while simultaneously ensuring the integrity of the details of that care visit is challenging. This paper demonstrates how data management personnel can use traditional SAS PROCs in new and inventive ways to compile, clean, and complete data sets for submission to healthcare collaboratives and other data sharing initiatives. Traditional data matching methods such as SPEDIS are uniquely combined with iterative SQL joins using the SAS® functions INDEX, COMPRESS, CATX, and SUBSTR to yield the most out of complex patient and physician name matches. Recoding, correcting missing items, and formatting data can be efficiently achieved by using traditional functions such as MAX, PROC FORMAT, and FIND in new and inventive ways.
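A simplified version of that kind of match, with illustrative data set and variable names, combines SPEDIS with cleanup functions inside a PROC SQL join:

proc sql;
   create table matched as
   select a.patient_id, a.phys_name as name_sys1,
          b.provider_id, b.phys_name as name_sys2
   from ehr_system1 as a, ehr_system2 as b
   where a.birth_date = b.birth_date                             /* block on an exact field first    */
     and spedis( upcase(compress(a.phys_name, ' .,-')),
                 upcase(compress(b.phys_name, ' .,-')) ) <= 15;  /* tolerate small spelling distance */
quit;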
Read the paper (PDF).
Gabriela Cantu, Baylor Scott &White Health
Christopher Klekar, Baylor Scott and White Health
Paper 3321-2015:
Data Summarization for a Dissertation: A Grad Student How-To Paper
Graduate students often need to explore data and summarize multiple statistical models into tables for a dissertation. The challenges of data summarization include coding multiple, similar statistical models, and summarizing these models into meaningful tables for review. The default method is to type (or copy and paste) results into tables. This often takes longer than creating and running the analyses. Students might spend hours creating tables, only to have to start over when a change or correction in the underlying data requires the analyses to be updated. This paper gives graduate students the tools to efficiently summarize the results of statistical models in tables. These tools include a macro-based SAS/STAT® analysis and ODS OUTPUT statement to summarize statistics into meaningful tables. Specifically, we summarize PROC GLM and PROC LOGISTIC output. We convert an analysis of hospital-acquired delirium from hundreds of pages of output into three formatted Microsoft Excel files. This paper is appropriate for users familiar with basic macro language.
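The core of that approach, sketched here with illustrative names, is to wrap the procedure in a macro and capture its ODS tables as data sets that can later be stacked and exported to Excel (PROC EXPORT with DBMS=XLSX assumes SAS/ACCESS® Interface to PC Files is licensed).

%macro fitlogit(outcome=, predictor=);
   ods output ParameterEstimates=pe_&predictor OddsRatios=or_&predictor;
   proc logistic data=analysis;
      model &outcome(event='1') = &predictor age sex;
   run;
%mend fitlogit;

%fitlogit(outcome=delirium, predictor=sedation);
%fitlogit(outcome=delirium, predictor=restraint);

data all_odds_ratios;        /* stack the captured odds ratio tables */
   set or_: ;
run;

proc export data=all_odds_ratios outfile='delirium_models.xlsx'
            dbms=xlsx replace;
run;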
Read the paper (PDF).
Elisa Priest, Texas A&M University Health Science Center
Ashley Collinsworth, Baylor Scott & White Health/Tulane University
Paper 3305-2015:
Defensive Coding by Example: Kick the Tires, Pump the Breaks, Check Your Blind Spots, and Merge Ahead!
As SAS® programmers and statisticians, we rarely write programs that are run only once and then set aside. Instead, we are often asked to develop programs very early in a project, on immature data, following specifications that may be little more than a guess as to what the data is supposed to look like. These programs will then be run repeatedly on periodically updated data through the duration of the project. This paper offers strategies for not only making those programs more flexible, so they can handle some of the more commonly encountered variations in that data, but also for setting traps to identify unexpected data points that require further investigation. We will also touch upon some good programming practices that can benefit both the original programmer and others who might have to touch the code. In this paper, we will provide explicit examples of defensive coding that will aid in kicking the tires, pumping the breaks, checking your blind spots, and merging ahead for quality programming from the beginning.
Read the paper (PDF).
Donna Levy, inVentiv Health Clinical
Nancy Brucken, inVentiv Health Clinical
Paper 3386-2015:
Defining and Mapping a Reasonable Distance for Consumer Access to Market Locations
Using geocoded addresses from FDIC Summary of Deposits data with Census geospatial data, including TIGER boundary files and population-weighted centroid shapefiles, we were able to calculate a reasonable distance threshold by metropolitan statistical area (MSA) (or metropolitan division (MD), where applicable) through a series of SAS® DATA steps and SQL joins. We first used a Cartesian join in PROC SQL between the data set containing population-weighted centroid coordinates and a data set containing geocoded coordinates of approximately 91,000 full-service bank branches. Using the GEODIST function in SAS, we were able to calculate the distance to the nearest bank branch from the population-weighted centroid of each Census tract. The tract data set was then grouped by MSA/MD and sorted in ascending order by distance to the nearest bank branch within each grouping; using the RETAIN statement, we calculated the cumulative population and cumulative population percent for each MSA/MD. The reasonable threshold distance is established where the cumulative population percent is closest (in either direction +/-) to 90%.
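The distance step itself is a single function call; a simplified sketch of the Cartesian join and nearest-branch calculation, with illustrative data set and variable names and distances in miles, looks like this:

proc sql;
   create table tract_nearest as
   select t.tract_id, t.msa_md, t.population,
          min( geodist(t.cent_lat, t.cent_lon,
                       b.branch_lat, b.branch_lon, 'DM') ) as nearest_branch_mi
          /* 'D' = coordinates in degrees, 'M' = distance in miles */
   from tract_centroids as t, bank_branches as b     /* Cartesian join: every tract x every branch */
   group by t.tract_id, t.msa_md, t.population;
quit;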
Read the paper (PDF).
Sarah Campbell, Federal Deposit Insurance Corporation
Paper 2442-2015:
Don't Copy and Paste--Use BY Statement Processing with ODS to Make Your Summary Tables
Most manuscripts in medical journals contain summary tables that combine simple summaries and between-group comparisons. These tables typically combine estimates for categorical and continuous variables. The statistician generally summarizes the data using the FREQ procedure for categorical variables and compares percentages between groups using a chi-square or a Fisher's exact test. For continuous variables, the MEANS procedure is used to summarize data as either means and standard deviation or medians and quartiles. Then these statistics are generally compared between groups by using the GLM procedure or NPAR1WAY procedure, depending on whether one is interested in a parametric test or a non-parametric test. The outputs from these different procedures are then combined and presented in a concise format ready for publications. Currently there is no straightforward way in SAS® to build these tables in a presentable format that can then be customized to individual tastes. In this paper, we focus on presenting summary statistics and results from comparing categorical variables between two or more independent groups. The macro takes the dataset, the number of treatment groups, and the type of test (either chi-square or Fisher's exact) as input and presents the results in a publication-ready table. This macro automates summarizing data to a certain extent and minimizes risky typographical errors when copying results or typing them into a table.
Read the paper (PDF).
Jeff Gossett, University of Arkansas for Medical Sciences
Mallikarjuna Rettiganti, UAMS
Paper 3381-2015:
Double Generalized Linear Models Using SAS®: The %DOUBLEGLM Macro
The purpose of this paper is to introduce a SAS® macro named %DOUBLEGLM that enables users to model the mean and dispersion jointly using the double generalized linear models described in Nelder (1991) and Lee (1998). The R functions FITJOINT and DGLM (R Development Core Team, 2011) were used to verify the suitability of the %DOUBLEGLM macro estimates. The results showed that the macro's estimates were close to those produced by the R functions.
Read the paper (PDF). | Download the data file (ZIP).
Paulo Silva, Universidade de Brasilia
Alan Silva, Universidade de Brasilia
E
Paper SAS1911-2015:
Equivalence and Noninferiority Testing Using SAS/STAT® Software
Proving difference is the point of most statistical testing. In contrast, the point of equivalence and noninferiority tests is to prove that results are substantially the same, or at least not appreciably worse. An equivalence test can show that a new treatment, one that is less expensive or causes fewer side effects, can replace a standard treatment. A noninferiority test can show that a faster manufacturing process creates no more product defects or industrial waste than the standard process. This paper reviews familiar and new methods for planning and analyzing equivalence and noninferiority studies in the POWER, TTEST, and FREQ procedures in SAS/STAT® software. Techniques that are discussed range from Schuirmann's classic method of two one-sided tests (TOST) for demonstrating similar normal or lognormal means in bioequivalence studies, to Farrington and Manning's noninferiority score test for showing that an incidence rate (such as a rate of mortality, side effects, or product defects) is no worse. Real-world examples from clinical trials, drug development, and industrial process design are included.
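As one small example of these methods, Schuirmann's TOST for lognormal bioequivalence can be requested directly in PROC TTEST; the data set, variables, and 0.80-1.25 limits below sketch a standard bioequivalence setup and are illustrative only.

proc ttest data=pk_study dist=lognormal tost(0.80, 1.25);
   class treatment;        /* test vs. reference formulation   */
   var auc;                /* pharmacokinetic exposure measure */
run;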
Read the paper (PDF).
John Castelloe, SAS
Donna Watts, SAS
Paper 3047-2015:
Examination of Three SAS® Tools for Solving a Complicated Data Summary Problem
When faced with a difficult data reduction problem, a SAS® programmer has many options for how to solve the problem. In this presentation, three different methods are reviewed and compared in terms of processing time, debugging, and ease of understanding. The three methods include linearizing the data, using SQL Cartesian joins, and using sequential data processing. Inconsistencies in the raw data caused the data linearization to be problematic. The large number of records and the need for many-to-many merges resulted in a long run time for the SQL code. The sequential data processing, although older technology, provided the most time efficient and error-free results.
Read the paper (PDF).
Carry Croghan, US-EPA
F
Paper 3419-2015:
Forest Plotting Analysis Macro %FORESTPLOT
A powerful tool for visually analyzing regression analysis is the forest plot. Model estimates, ratios, and rates with confidence limits are graphically stacked vertically in order to show how they overlap with each other and to show values of significance. The ability to see whether two values are significantly different from each other or whether a covariate has a significant meaning on its own is made much simpler in a forest plot rather than sifting through numbers in a report table. The amount of data preparation needed in order to build a high-quality forest plot in SAS® can be tremendous because the programmer needs to run analyses, extract the estimates to be plotted, structure the estimates in a format conducive to generating a forest plot, and then run the correct plotting procedure or create a graph template using the Graph Template Language (GTL). While some SAS procedures can produce forest plots using Output Delivery System (ODS) Graphics automatically, the plots are not generally publication-ready and are difficult to customize even if the programmer is familiar with GTL. The macro %FORESTPLOT is designed to perform all of the steps of building a high-quality forest plot in order to save time for both experienced and inexperienced programmers, and is currently set up to perform regression analyses common to the clinical oncology research areas, Cox proportional hazards and logistic, as well as calculate Kaplan-Meier event-free rates. To improve flexibility, the user can specify a pre-built data set to transform into a forest plot if the automated analysis options of the macro do not fit the user's needs.
Read the paper (PDF).
Jeffrey Meyers, Mayo Clinic
Qian Shi, Mayo Clinic
Paper SAS1580-2015:
Functional Modeling of Longitudinal Data with the SSM Procedure
In many studies, a continuous response variable is repeatedly measured over time on one or more subjects. The subjects might be grouped into different categories, such as cases and controls. The study of resulting observation profiles as functions of time is called functional data analysis. This paper shows how you can use the SSM procedure in SAS/ETS® software to model these functional data by using structural state space models (SSMs). A structural SSM decomposes a subject profile into latent components such as the group mean curve, the subject-specific deviation curve, and the covariate effects. The SSM procedure enables you to fit a rich class of structural SSMs, which permit latent components that have a wide variety of patterns. For example, the latent components can be different types of smoothing splines, including polynomial smoothing splines of any order and all L-splines up to order 2. The SSM procedure efficiently computes the restricted maximum likelihood (REML) estimates of the model parameters and the best linear unbiased predictors (BLUPs) of the latent components (and their derivatives). The paper presents several real-life examples that show how you can fit, diagnose, and select structural SSMs; test hypotheses about the latent components in the model; and interpolate and extrapolate these latent components.
Read the paper (PDF). | Download the data file (ZIP).
Rajesh Selukar, SAS
Paper 3142-2015:
Fuzzy Matching
Quality measurement is increasingly important in the health-care sphere for both performance optimization and reimbursement. Treatment of chronic conditions is a key area of quality measurement. However, medication compendiums change frequently, and health-care providers often free-text medications into a patient's record. Manually reviewing a complete medications database is time consuming. In order to build a robust medications list, we matched a pharmacist-generated list of categorized medications to a raw medications database that contained names, name-dose combinations, and misspellings. The matching technique we used is based on the COMPGED function. We were able to combine a truncation function and an upcase function to optimize the output of COMPGED. Using these combinations and manipulating the scoring metric of COMPGED enabled us to narrow the database list to medications that were relevant to our categories. This process transformed a tedious task for PROC COMPARE or an Excel macro into a quick and efficient method of matching. The task of sorting through relevant matches was still conducted manually, but the time required to do so was significantly decreased by the fuzzy match in our application of COMPGED.
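In skeletal form, with illustrative data set and variable names and an arbitrary score cutoff, the scoring step looks like this:

proc sql;
   create table med_matches as
   select r.raw_med, c.category_med, c.category,
          compged( upcase(substr(r.raw_med, 1, 10)),
                   upcase(substr(c.category_med, 1, 10)) ) as ged_score
   from raw_meds as r, category_list as c
   where calculated ged_score <= 100       /* keep only close matches for manual review */
   order by r.raw_med, ged_score;
quit;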
Read the paper (PDF).
Arti Virkud, NYC Department of Health
G
Paper 2787-2015:
GEN_OMEGA2: A SAS® Macro for Computing the Generalized Omega-Squared Effect Size Associated with Analysis of Variance Models
Effect sizes are strongly encouraged to be reported in addition to statistical significance and should be considered in evaluating the results of a study. The choice of an effect size for ANOVA models can be confusing because indices might differ depending on the research design as well as the magnitude of the effect. Olejnik and Algina (2003) proposed the generalized eta-squared and omega-squared effect sizes, which are comparable across a wide variety of research designs. This paper provides a SAS® macro for computing the generalized omega-squared effect size associated with analysis of variance models by using data from PROC GLM ODS tables. The paper provides the macro programming language, as well as results from an executed example of the macro.
Read the paper (PDF).
Anh Kellermann, University of South Florida
Yi-hsin Chen, USF
Jeffrey Kromrey, University of South Florida
Thanh Pham, USF
Patrice Rasmussen, USF
Patricia Rodriguez de Gil, University of South Florida
Jeanine Romano, USF
H
Paper 3485-2015:
Health Services Research Using Electronic Health Record Data: A Grad Student How-To Paper
Graduate students encounter many challenges when conducting health services research using real world data obtained from electronic health records (EHRs). These challenges include cleaning and sorting data, summarizing and identifying present-on-admission diagnosis codes, identifying appropriate metrics for risk-adjustment, and determining the effectiveness and cost effectiveness of treatments. In addition, outcome variables commonly used in health service research are not normally distributed. This necessitates the use of nonparametric methods in statistical analyses. This paper provides graduate students with the basic tools for the conduct of health services research with EHR data. We will examine SAS® tools and step-by-step approaches used in an analysis of the effectiveness and cost-effectiveness of the ABCDE (Awakening and Breathing Coordination, Delirium monitoring/management, and Early exercise/mobility) bundle in improving outcomes for intensive care unit (ICU) patients. These tools include the following: (1) ARRAYS; (2) lookup tables; (3) LAG functions; (4) PROC TABULATE; (5) recycled predictions; and (6) bootstrapping. We will discuss challenges and lessons learned in working with data obtained from the EHR. This content is appropriate for beginning SAS users.
Read the paper (PDF).
Ashley Collinsworth, Baylor Scott & White Health/Tulane University
Elisa Priest, Texas A&M University Health Science Center
Paper SAS1747-2015:
Helping You C What You Can Do with SAS®
SAS® users are already familiar with the FCMP procedure and the flexibility it provides them in writing their own functions and subroutines. However, did you know that FCMP also allows you to call functions written in C? Did you know that you can create and populate complex C structures and use C types in FCMP? With the PROTO procedure, you can define function prototypes, structures, enumeration types, and even small bits of C code. This paper gets you started on how to use the PROTO procedure and, in turn, how to call your C functions from within FCMP and SAS.
Read the paper (PDF). | Download the data file (ZIP).
Andrew Henrick, SAS
Karen Croft, SAS
Donald Erdman, SAS
Paper 3431-2015:
How Latent Analyses within Survey Data Can Be Valuable Additions to Any Regression Model
This study looks at several ways to investigate latent variables in longitudinal surveys and their use in regression models. Three different analyses for latent variable discovery are briefly reviewed and explored. The procedures explored in this paper are PROC LCA, PROC LTA, PROC CATMOD, PROC FACTOR, PROC TRAJ, and PROC SURVEYLOGISTIC. The analyses defined through these procedures are latent profile analyses, latent class analyses, and latent transition analyses. The latent variables are included in three separate regression models. The effect of the latent variables on the fit and use of the regression model compared to a similar model using observed data is briefly reviewed. The data used for this study was obtained via the National Longitudinal Study of Adolescent Health, a study distributed and collected by Add Health. Data was analyzed using SAS® 9.3. This paper is intended for any level of SAS® user. This paper is also aimed at an audience with a background in behavioral science or statistics.
Read the paper (PDF).
Deanna Schreiber-Gregory, National University
Paper 3214-2015:
How is Your Health? Using SAS® Macros, ODS Graphics, and GIS Mapping to Monitor Neighborhood and Small-Area Health Outcomes
With the constant need to inform researchers about neighborhood health data, the Santa Clara County Health Department created socio-demographic and health profiles for 109 neighborhoods in the county. Data was pulled from many public and county data sets, compiled, analyzed, and automated using SAS®. With over 60 indicators and 109 profiles, an efficient set of macros was used to automate the calculation of percentages, rates, and mean statistics for all of the indicators. Macros were also used to automatically group individual census tracts into pre-defined neighborhoods to avoid data entry errors. Simple SQL procedures were used to calculate and format percentages within the macros, and output was pushed out using Output Delivery System (ODS) Graphics. This output was exported to Microsoft Excel, which was used to create a sortable database for end users to compare cities and/or neighborhoods. Finally, the automated SAS output was used to map the demographic data using geographic information system (GIS) software at three geographies: city, neighborhood, and census tract. This presentation describes the use of simple macros and SAS procedures to reduce resources and time spent on checking data for quality assurance purposes. It also highlights the simple use of ODS Graphics to export data to an Excel file, which was used to mail merge the data into 109 unique profiles. The presentation is aimed at intermediate SAS users at local and state health departments who might be interested in finding an efficient way to run and present health statistics given limited staff and resources.
Read the paper (PDF).
Roshni Shah, Santa Clara County
Paper 3143-2015:
How to Cut the Time of Processing of Ryan White Services (RSR) Data by 99% and More.
The widely used method to convert RSR XML data to some standard, ready-to-process database uses a Visual Basic mapper as a buffer tool when reading XML data (for example, into an MS Access database). This paper describes the shortcomings of this method with respect to the different schemas of RSR data and offers a SAS® macro that enables users to read any schema of RSR data directly into a SAS relational database. This macro entirely eliminates the step of creating an MS Access database. Using our macro, the user can cut the time of processing of Ryan White data by 99% and more, depending on the number of files that need to be processed in one run.
Read the paper (PDF).
Michael Costa, Abt Associates
Fizza Gillani, Brown University and Lifespan/Tufts/Brown Center for AIDS Research
Paper 3252-2015:
How to Use SAS® for GMM Logistic Regression Models for Longitudinal Data with Time-Dependent Covariates
In longitudinal data, it is important to account for the correlation due to repeated measures and time-dependent covariates. Generalized method of moments can be used to estimate the coefficients in longitudinal data, although there are currently limited procedures in SAS® to produce GMM estimates for correlated data. In a recent paper, Lalonde, Wilson, and Yin provided a GMM model for estimating the coefficients in this type of data. SAS PROC IML was used to generate equations that needed to be solved to determine which estimating equations to use. In addition, this study extended classifications of moment conditions to include a type IV covariate. Two data sets were evaluated using this method, including re-hospitalization rates from a Medicare database as well as body mass index and future morbidity rates among Filipino children. Both examples contain binary responses, repeated measures, and time-dependent covariates. However, while this technique is useful, it is tedious and can also be complicated when determining the matrices necessary to obtain the estimating equations. We provide a concise and user-friendly macro to fit GMM logistic regression models with extended classifications.
Read the paper (PDF).
Katherine Cai, Arizona State University
I
Paper 3411-2015:
Identifying Factors Associated with High-Cost Patients
Research has shown that the top five percent of patients can account for nearly fifty percent of the total healthcare expenditure in the United States. Using SAS® Enterprise Guide® and PROC LOGISTIC, a statistical methodology was developed to identify factors (for example, patient demographics, diagnostic symptoms, comorbidity, and the type of procedure code) associated with the high cost of healthcare. Analyses were performed using the FAIR Health National Private Insurance Claims (NPIC) database, which contains information about healthcare utilization and cost in the United States. The analyses focused on treatments for chronic conditions, such as trans-myocardial laser revascularization for the treatment of coronary heart disease (CHD) and pressurized inhalation for the treatment of asthma. Furthermore, bubble plots and heat maps were created using SAS® Visual Analytics to provide key insights into potentially high-cost treatments for heart disease and asthma patients across the nation.
Read the paper (PDF). | Download the data file (ZIP).
Jeff Dang, FAIR Health
Paper 2320-2015:
Implementing a Discrete Event Simulation Using the American Community Survey and the SAS® University Edition
SAS® University Edition is a great addition to the world of freely available analytic software, and this 'how-to' presentation shows you how to implement a discrete event simulation using Base SAS® to model future US Veterans population distributions. Features include generating a slideshow using ODS output to PowerPoint.
Read the paper (PDF). | Download the data file (ZIP).
Michael Grierson
Paper 3343-2015:
Improving SAS® Global Forum Papers
Just as research is built on existing research, the references section is an important part of a research paper. The purpose of this study is to find the differences between professionals and academicians with respect to the references section of a paper. Data is collected from SAS® Global Forum 2014 Proceedings. Two research hypotheses are supported by the data. First, the average number of references in papers by academicians is higher than those by professionals. Second, academicians follow standards for citing references more than professionals. Text mining is performed on the references to understand the actual content. This study suggests that authors of SAS Global Forum papers should include more references to increase the quality of the papers.
Read the paper (PDF).
Vijay Singh, Oklahoma State University
Pankush Kalgotra, Oklahoma State University
Paper 3356-2015:
Improving the Performance of Two-Stage Modeling Using the Association Node of SAS® Enterprise Miner™ 12.3
Over the years, very few published studies have discussed ways to improve the performance of two-stage predictive models. This study, based on 10 years (1999-2008) of data from 130 US hospitals and integrated delivery networks, is an attempt to demonstrate how we can leverage the Association node in SAS® Enterprise Miner™ to improve the classification accuracy of the two-stage model. We prepared the data with imputation operations and data cleaning procedures. Variable selection methods and domain knowledge were used to choose 43 key variables for the analysis. The prominent association rules revealed interesting relationships between prescribed medications and patient readmission/no-readmission. The rules with lift values greater than 1.6 were used to create dummy variables for use in the subsequent predictive modeling. Next, we used two-stage sequential modeling, where the first stage predicted if the diabetic patient was readmitted and the second stage predicted whether the readmission happened within 30 days. The backward logistic regression model outperformed competing models for the first stage. After including the dummy variables from the association analysis, several fit indices improved: for example, the validation average squared error (ASE) decreased from 0.238 to 0.228, and the cumulative lift increased from 1.40 to 1.56. Likewise, the performance of the second stage improved after including the dummy variables: the misclassification rate decreased from 0.243 to 0.240, and the final prediction error from 0.18 to 0.17.
Read the paper (PDF).
Girish Shirodkar, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Ankita Chaudhari, Oklahoma State University
Paper 3052-2015:
Introduce a Linear Regression Model by Using the Variable Transformation Method
This paper explains how to build a linear regression model using the variable transformation method. Testing the assumptions, which is required for linear modeling and testing the fit of a linear model, is included. This paper is intended for analysts who have limited exposure to building linear models. This paper uses the REG, GLM, CORR, UNIVARIATE, and GPLOT procedures.
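To give a flavor of the approach, here is a minimal sketch (data set and variable names are assumed, not the paper's own code): a skewed response is log-transformed, checked with PROC UNIVARIATE, and then modeled with PROC REG, whose output data set supports the assumption checks.

  /* Hedged sketch: data set and variable names are assumed */
  data model_data;
     set raw_data;
     log_cost = log(total_cost);            /* variable transformation */
  run;

  proc univariate data=model_data normal;
     var log_cost;                          /* check the transformed distribution */
  run;

  proc reg data=model_data;
     model log_cost = age comorbidity_count / vif;
     output out=diagnostics r=resid p=pred; /* residuals for assumption checks */
  run; quit;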
Read the paper (PDF). | Download the data file (ZIP).
Nancy Hu, Discover
J
Paper 3020-2015:
Jeffreys Interval for One-Sample Proportion with SAS/STAT® Software
This paper introduces Jeffreys interval for one-sample proportion using SAS® software. It compares the credible interval from a Bayesian approach with the confidence interval from a frequentist approach. Different ways to calculate the Jeffreys interval are presented using PROC FREQ, the QUANTILE function, a SAS program of the random walk Metropolis sampler, and PROC MCMC.
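For readers who want a quick taste before the paper, the interval can be computed directly from Beta quantiles with the QUANTILE function; the counts below are arbitrary placeholders, and PROC FREQ's BINOMIAL option offers an equivalent route.

  /* Jeffreys interval for x successes in n trials (x, n, and alpha are assumed values) */
  data jeffreys;
     x = 81; n = 263; alpha = 0.05;
     p_hat = x / n;
     lower = quantile('BETA', alpha/2,     x + 0.5, n - x + 0.5);
     upper = quantile('BETA', 1 - alpha/2, x + 0.5, n - x + 0.5);
     put p_hat= lower= upper=;
  run;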
Read the paper (PDF).
Wu Gong, The Children's Hospital of Philadelphia
K
Paper 2480-2015:
Kaplan-Meier Survival Plotting Macro %NEWSURV
The research areas of pharmaceuticals and oncology clinical trials greatly depend on time-to-event endpoints such as overall survival and progression-free survival. One of the best graphical displays of these analyses is the Kaplan-Meier curve, which can be simple to generate with the LIFETEST procedure but difficult to customize. Journal articles generally prefer that statistics such as median time-to-event, number of patients, and time-point event-free rate estimates be displayed within the graphic itself, and this was previously difficult to do without an external program such as Microsoft Excel. The macro %NEWSURV takes advantage of the Graph Template Language (GTL) that was added with the SG graphics engine to create this level of customizability without the need for back-end manipulation. Taking this one step further, the macro was improved to be able to generate a lattice of multiple unique Kaplan-Meier curves for side-by-side comparisons or for condensing figures for publications. This paper describes the functionality of the macro and describes how the key elements of the macro work.
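For context, a minimal PROC LIFETEST call of the kind the macro builds on is sketched below; the variable names are assumed, and the macro itself adds the in-graph statistics through GTL.

  ods graphics on;
  proc lifetest data=trial plots=survival(atrisk cb=hw);
     time os_months * death(0);   /* time variable and censoring indicator */
     strata arm;                  /* one curve per treatment arm */
  run;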
Read the paper (PDF).
Jeffrey Meyers, Mayo Clinic
L
Paper 3297-2015:
Lasso Regularization for Generalized Linear Models in Base SAS® Using Cyclical Coordinate Descent
The cyclical coordinate descent method is a simple algorithm that has been used for fitting generalized linear models with lasso penalties by Friedman et al. (2007). The coordinate descent algorithm can be implemented in Base SAS® to perform efficient variable selection and shrinkage for GLMs with the L1 penalty (the lasso).
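A minimal sketch of the cyclical coordinate descent update (soft-thresholding each coefficient in turn) is shown below in SAS/IML for brevity, although the paper works in Base SAS; the data set and penalty value are arbitrary choices for illustration.

  proc iml;
     start SoftThresh(z, g);
        return( sign(z) # max(abs(z) - g, 0) );   /* soft-thresholding operator */
     finish;

     use sashelp.class;
     read all var {height weight} into X;
     read all var {age} into y;
     close sashelp.class;

     n = nrow(X);  p = ncol(X);
     X = (X - repeat(mean(X), n, 1)) / repeat(std(X), n, 1);  /* standardize predictors */
     y = y - mean(y);                                         /* center the response    */
     lambda = 0.5;                        /* assumed penalty value */
     beta = j(p, 1, 0);

     do iter = 1 to 100;                  /* cycle until the coefficients settle */
        do j = 1 to p;
           r = y - X*beta + X[, j]*beta[j];              /* partial residual */
           z = X[, j]` * r / n;
           beta[j] = SoftThresh(z, lambda) / (X[, j]` * X[, j] / n);
        end;
     end;
     print beta;
  quit;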
Read the paper (PDF).
Robert Feyerharm, Beacon Health Options
Paper SAS1748-2015:
Lost in the Forest Plot? Follow the Graph Template Language AXISTABLE Road!
A forest plot is a common visualization for meta-analysis. Some popular versions that use subgroups with indented text and bold fonts can seem outright daunting to create. With SAS® 9.4, the Graph Template Language (GTL) has introduced the AXISTABLE statement, specifically designed for including text data columns into a graph. In this paper, we demonstrate the simplicity of creating various forest plots using AXISTABLE statements. Come and see how to create forest plots as clear as day!
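As a rough sketch of the idea (data set, column names, and styling are assumed, and real forest plots typically add several statistic columns), an AXISTABLE can sit in the graph's inner margin alongside the plotted estimates:

  proc template;
     define statgraph forest;
        begingraph;
           entrytitle 'Treatment Effect by Subgroup';
           layout overlay / yaxisopts=(reverse=true) xaxisopts=(label='Odds Ratio');
              scatterplot y=subgroup x=oddsratio / xerrorlower=lcl xerrorupper=ucl;
              referenceline x=1;
              innermargin / align=left;
                 axistable y=subgroup value=nobs;   /* text column inside the graph */
              endinnermargin;
           endlayout;
        endgraph;
     end;
  run;

  proc sgrender data=meta template=forest;
  run;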
Read the paper (PDF). | Download the data file (ZIP).
Prashant Hebbar, SAS
M
Paper 2481-2015:
Managing Extended Attributes With a SAS® Enterprise Guide® Add-In
SAS® 9.4 introduced extended attributes, which are name-value pairs that can be attached to either the data set or to individual variables. Extended attributes are managed through PROC DATASETS and can be viewed through PROC CONTENTS or through Dictionary.XATTRS. This paper describes the development of a SAS® Enterprise Guide® custom add-in that allows for the entry and editing of extended attributes, with the possibility of using a controlled vocabulary. The controlled vocabulary used in the initial application is derived from the lifecycle branch of the Data Documentation Initiative metadata standard (DDI-L).
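A minimal sketch of the underlying statements the add-in drives is shown below; the attribute names and values are hypothetical.

  proc datasets library=work nolist;
     modify mydata;
        xattr set ds  StudyName='Example study' Version='1.0';               /* data set level */
        xattr set var age (Units='Years' Definition='Age at enrollment');    /* variable level */
  quit;

  proc sql;                       /* view the stored attributes */
     select *
     from dictionary.xattrs
     where libname = 'WORK' and memname = 'MYDATA';
  quit;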
Read the paper (PDF).
Larry Hoyle, IPSR, Univ. of Kansas
Paper 2240-2015:
Member-Level Regression Using SAS® Enterprise Guide® and SAS® Forecast Studio
The need to measure slight changes in healthcare costs and utilization patterns over time is vital in predictive modeling, forecasting, and other advanced analytics. At BlueCross BlueShield of Tennessee, a method for developing member-level regression slopes creates a better way of identifying these changes across various time spans. The goal is to create multiple metrics at the member level that will indicate when an individual is seeking more or less medical or pharmacy services. Significant increases or decreases in utilization and cost are used to predict the likelihood of acquiring certain conditions, seeking services at particular facilities, and self-engaging in health and wellness. Data setup and compilation consist of calculating a member's eligibility with the health plan and then aggregating cost and utilization of particular services (for example, primary care visits, Rx costs, ER visits, and so on). A member must have at least six months of eligibility for a valid regression slope to be calculated. Linear regression is used to build single-factor models for 6-, 12-, 18-, and 24-month time spans if the appropriate amount of data is available for the member. Models are built at the member-metric-time-period level, resulting in the possibility of over 75 regression coefficients per member per monthly run. The computing power needed to execute such a vast amount of calculations requires in-database processing of various macro processes. SAS® Enterprise Guide® is used to structure the data and SAS® Forecast Studio is used to forecast trends at a member level. Algorithms are run on the first of each month. Data is stored so that each metric and corresponding slope is appended on a monthly basis. Because the data is set up for the member regression algorithm, slopes are interpreted in the following manner: a positive value for -1*slope indicates an increase in utilization/cost; a negative value for -1*slope indicates a decrease in utilization/cost. The actual slope value indicates the intensity of the change in cost and utilization. The insight provided by this member-level regression methodology replaces subjective methods that used arbitrary thresholds of change to measure differences in cost and utilization.
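A stripped-down sketch of the slope calculation (variable names assumed; the production process adds eligibility rules, multiple time spans, and in-database execution) uses BY-group regression with an OUTEST= data set:

  proc sort data=member_metrics;
     by member_id metric month_index;
  run;

  proc reg data=member_metrics outest=slopes noprint;
     by member_id metric;
     model monthly_value = month_index;   /* one slope per member and metric */
  run; quit;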
Read the paper (PDF).
Leigh McCormack, BCBST
Prudhvidhar Perati, BlueCross BlueShield of TN
Paper 3760-2015:
Methodological and Statistical Issues in Provider Performance Assessment
With the move to value-based benefit and reimbursement models, it is essential to quantify the relative cost, quality, and outcome of a service. Accurately measuring the cost and quality of doctors, practices, and health systems is critical when you are developing a tiered network, a shared savings program, or a pay-for-performance incentive. Limitations in claims payment systems require developing methodological and statistical techniques to improve the validity and reliability of providers' scores on cost and quality of care. This talk discusses several key concepts in the development of a measurement system for provider performance, including measure selection, risk adjustment methods, and peer group benchmark development.
Read the paper (PDF). | Watch the recording.
Daryl Wansink, Qualmetrix, Inc.
Paper 2400-2015:
Modeling Effect Modification and Higher-Order Interactions: A Novel Approach for Repeated Measures Design Using the LSMESTIMATE Statement in SAS® 9.4
Effect modification occurs when the association between a predictor of interest and the outcome is differential across levels of a third variable--the modifier. Effect modification is statistically tested as the interaction effect between the predictor and the modifier. In repeated measures studies (with more than two time points), higher-order (three-way) interactions must be considered to test effect modification by adding time to the interaction terms. Custom fitting and constructing these repeated measures models are difficult and time consuming, especially with respect to estimating post-fitting contrasts. With the advancement of the LSMESTIMATE statement in SAS®, a simplified approach can be used to custom test for higher-order interactions with post-fitting contrasts within a mixed model framework. This paper provides a simulated example with tips and techniques for using an application of the nonpositional syntax of the LSMESTIMATE statement to test effect modification in repeated measures studies. This approach, which is applicable to exploring modifiers in randomized controlled trials (RCTs), goes beyond the treatment effect on outcome to a more functional understanding of the factors that can enhance, reduce, or change this relationship. Using this technique, we can easily identify differential changes for specific subgroups of individuals or patients that subsequently impact treatment decision making. We provide examples of conventional approaches to higher-order interaction and post-fitting tests using the ESTIMATE statement and compare and contrast this to the nonpositional syntax of the LSMESTIMATE statement. The merits and limitations of this approach are discussed.
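To give a flavor of the nonpositional syntax (a hedged sketch with hypothetical variable names, not the paper's simulated example), each bracketed term pairs a contrast coefficient with the class levels of the interaction cell it applies to:

  proc mixed data=trial;
     class id trt modifier time;
     model y = trt|modifier|time / solution;
     repeated time / subject=id type=un;
     /* treatment contrast at time 3 within modifier level 1 */
     lsmestimate trt*modifier*time 'trt 1 vs 2, modifier=1, time=3'
                 [ 1, 1 1 3]
                 [-1, 2 1 3] / cl;
  run;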
Read the paper (PDF). | Download the data file (ZIP).
Pronabesh DasMahapatra, PatientsLikeMe Inc.
Ryan Black, NOVA Southeastern University
Paper 3430-2015:
Multilevel Models for Categorical Data Using SAS® PROC GLIMMIX: The Basics
Multilevel models (MLMs) are frequently used in social and health sciences where data are typically hierarchical in nature. However, the commonly used hierarchical linear models (HLMs) are appropriate only when the outcome of interest is normally distributed. When you are dealing with outcomes that are not normally distributed (binary, categorical, ordinal), a transformation and an appropriate error distribution for the response variable needs to be incorporated into the model. Therefore, hierarchical generalized linear models (HGLMs) need to be used. This paper provides an introduction to specifying HGLMs using PROC GLIMMIX, following the structure of the primer for HLMs previously presented by Bell, Ene, Smiley, and Schoeneberger (2013). A brief introduction into the field of multilevel modeling and HGLMs with both dichotomous and polytomous outcomes is followed by a discussion of the model-building process and appropriate ways to assess the fit of these models. Next, the paper provides a discussion of PROC GLIMMIX statements and options as well as concrete examples of how PROC GLIMMIX can be used to estimate (a) two-level organizational models with a dichotomous outcome and (b) two-level organizational models with a polytomous outcome. These examples use data from High School and Beyond (HS&B), a nationally representative longitudinal study of American youth. For each example, narrative explanations accompany annotated examples of the GLIMMIX code and corresponding output.
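A minimal sketch of a two-level organizational model with a dichotomous outcome (variable names assumed; the paper's examples use the HS&B data) looks like this:

  proc glimmix data=hsb method=laplace;
     class school;
     model passed(event='1') = female ses school_size
           / dist=binary link=logit solution oddsratio;
     random intercept / subject=school;   /* level-2 random intercept */
  run;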
Read the paper (PDF).
Mihaela Ene, University of South Carolina
Bethany Bell, University of South Carolina
Genine Blue, University of South Carolina
Elizabeth Leighton, University of South Carolina
Paper 2081-2015:
Multiple Imputation Using the Fully Conditional Specification Method: A Comparison of SAS®, Stata, IVEware, and R
This presentation emphasizes use of SAS® 9.4 to perform multiple imputation of missing data using the PROC MI Fully Conditional Specification (FCS) method with subsequent analysis using PROC SURVEYLOGISTIC and PROC MIANALYZE. The data set used is based on a complex sample design. Therefore, the examples correctly incorporate the complex sample features and weights. The demonstration is then repeated in Stata, IVEware, and R for a comparison of major software applications that are capable of multiple imputation using FCS or equivalent methods and subsequent analysis of imputed data sets based on complex sample design data.
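A condensed sketch of the workflow (imputation, design-based analysis by imputation, then pooling) follows; the analysis and design variable names are assumed.

  proc mi data=survey nimpute=5 seed=2015 out=imputed;
     class smoker;
     fcs logistic(smoker) reg(income);   /* FCS models for the incomplete variables */
     var age income smoker outcome;
  run;

  proc surveylogistic data=imputed;
     by _imputation_;                    /* analyze each imputed data set */
     strata varstrat;
     cluster varpsu;
     weight wtvar;
     model outcome(event='1') = age income smoker;
     ods output ParameterEstimates=parms;
  run;

  proc mianalyze parms=parms;            /* combine results across imputations */
     modeleffects Intercept age income smoker;
  run;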
Read the paper (PDF).
Patricia Berglund, University of Michigan
N
Paper 3253-2015:
Need Additional Statistics in That Report? ODS OUTPUT to the Rescue!
You might be familiar with or experienced in writing or running reports using PROC REPORT, PROC TABULATE, or other methods of report generation. These reporting methods are often very flexible, but they can be limited in the statistics that are available as options for inclusion in the resulting output. SAS® provides the capability to produce a variety of statistics through Base SAS® and SAS/STAT® procedures by using ODS OUTPUT. These procedures include statistics from PROC CORR, PROC FREQ, and PROC UNIVARIATE in Base SAS, as well as PROC GLM, PROC LIFETEST, PROC MIXED, PROC LOGISTIC, and PROC TTEST in SAS/STAT. A number of other procedures can also produce useful ODS OUTPUT objects. Commonly requested statistics for reports include p-values, confidence intervals, and test statistics. These values can be computed with the appropriate procedure; ODS OUTPUT can then write the desired information to a data set so that it can be combined with the other data used to produce the report. Examples that demonstrate how to easily generate the desired statistics or other information and include them in the requested final reports are provided and discussed.
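For example, a t-test's statistics can be captured as data and merged into report data (a minimal sketch with assumed names; the ODS table names come from the TTEST procedure):

  ods output TTests=work.ttests ConfLimits=work.conflimits;
  proc ttest data=analysis;
     class treatment;
     var change;
  run;

  proc print data=work.ttests;   /* p-values and test statistics, ready to merge */
  run;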
Read the paper (PDF).
Debbie Buck, inVentiv Health Clinical
Paper 3455-2015:
Nifty Uses of SQL Reflexive Join and Subquery in SAS®
SAS® SQL is so powerful that you hardly miss using Oracle PL/SQL. One SAS SQL forte can be found in using the SQL reflexive join. Another area of SAS SQL strength is the SQL subquery concept. The focus of this paper is to show alternative approaches to data reporting and to show how to surface data quality problems using reflexive join and subquery SQL concepts. The target audience for this paper is the intermediate SAS programmer or the experienced ANSI SQL programmer new to SAS programming.
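Two small sketches of the ideas (table and column names assumed): a reflexive join that surfaces subjects with conflicting birth dates, and a subquery that finds visits without a matching demographics record.

  proc sql;
     /* reflexive (self) join: same subject, two different birth dates */
     select a.subjid, a.birthdt as birthdt_a, b.birthdt as birthdt_b
     from demog as a inner join demog as b
          on a.subjid = b.subjid and a.birthdt < b.birthdt;

     /* subquery: visits with no demographics record */
     select distinct subjid
     from visits
     where subjid not in (select subjid from demog);
  quit;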
Read the paper (PDF).
Cynthia Trinidad, Theorem Clinical Research
P
Paper 3370-2015:
PROC RANK, PROC SQL, PROC FORMAT, and PROC GMAP Team Up and a (Map) Legend Is Born!
The task was to produce a figure legend that gave the quintile ranges of a continuous measure corresponding to each color on a five-color choropleth US map. Actually, we needed to produce the figures and associated legends for several dozen maps for several dozen different continuous measures and time periods, as well as create the associated alt text for compliance with Section 508. So, the process needed to be automated. A method was devised using PROC RANK to generate the quintiles, PROC SQL to get the data value ranges within each quintile, and PROC FORMAT (with the CNTLIN= option) to generate and store the legend labels. The resulting data files and format catalogs were used to generate both the maps (with legends) and associated alt text. Then, these processes were rolled into a macro to apply the method for the many different maps and their legends. Each part of the method is quite simple--even mundane--but together, these techniques enabled us to standardize and automate an otherwise very tedious process. The same basic strategy could be used whenever you need to dynamically generate data buckets and keep track of the bucket boundaries (for producing labels, map legends, or alt text or for benchmarking future data against the stored categories).
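A compact sketch of the pipeline (variable names assumed; the paper wraps this in a macro and adds the alt text) follows.

  proc rank data=state_measures groups=5 out=ranked;
     var rate;
     ranks quintile;                      /* quintile group 0-4 */
  run;

  proc summary data=ranked nway;          /* value range within each quintile */
     class quintile;
     var rate;
     output out=qranges min=start max=end;
  run;

  data legend;                            /* CNTLIN= input for PROC FORMAT */
     set qranges;
     retain fmtname 'rate_fmt';
     label = catx(' - ', put(start, 8.1), put(end, 8.1));
  run;

  proc format cntlin=legend;
  run;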
Read the paper (PDF).
Christianna Williams, Self-Employed
Louise Hadden, Abt Associates Inc.
Paper 3154-2015:
PROC SQL for PROC SUMMARY Stalwarts
One of the fascinating features of SAS® is that the software often provides multiple ways to accomplish the same task. A perfect example of this is the aggregation and summarization of data across multiple rows or BY groups of interest. These groupings can be study participants, time periods, geographical areas, or just about any type of discrete classification that you want. While many SAS programmers might be accustomed to accomplishing these aggregation tasks with PROC SUMMARY (or equivalently, PROC MEANS), PROC SQL can also do a bang-up job of aggregation--often with less code and fewer steps. This step-by-step paper explains how to use PROC SQL for a variety of summarization and aggregation tasks. It uses a series of concrete, task-oriented examples to do so. The presentation style is similar to that used in the author's previous paper, 'PROC SQL for DATA Step Die-Hards.'
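As a small illustration (using SASHELP.CLASS rather than the paper's examples), the same BY-group aggregation can be written either way:

  proc summary data=sashelp.class nway;
     class sex;
     var height weight;
     output out=stats mean= n= / autoname;
  run;

  proc sql;
     create table stats_sql as
     select sex,
            mean(height) as height_mean,
            mean(weight) as weight_mean,
            count(*)     as n
     from sashelp.class
     group by sex;
  quit;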
Read the paper (PDF).
Christianna Williams, Self-Employed
Paper 3516-2015:
Piecewise Linear Mixed Effects Models Using SAS
Evaluation of the impact of critical or high-risk events or periods in longitudinal studies of growth might provide clues to the long-term effects of life events and efficacies of preventive and therapeutic interventions. Conventional linear longitudinal models typically involve a single growth profile to represent linear changes in an outcome variable across time, which sometimes does not fit the empirical data. The piecewise linear mixed-effects models allow different linear functions of time corresponding to the pre- and post-critical time point trends. This presentation shows: 1) how to fit piecewise linear mixed-effects models using SAS, step by step, in the context of a clinical trial with two-arm interventions and a predictive covariate of interest; 2) how to obtain the slopes and corresponding p-values for intervention and control groups during pre- and post-critical periods, conditional on different values of the predictive covariate; and 3) how to make meaningful comparisons and present the results in a scientific manuscript. A SAS macro to generate the summary tables assisting the interpretation of the results is also provided.
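A minimal sketch of the model (variable names and the month-6 change point are assumed) codes two time variables and interacts each with treatment arm:

  data long2;
     set long;
     time_pre  = min(month, 6);       /* slope before the critical point */
     time_post = max(month - 6, 0);   /* slope after the critical point  */
  run;

  proc mixed data=long2;
     class id arm;
     model y = arm time_pre time_post arm*time_pre arm*time_post / solution;
     random intercept time_pre time_post / subject=id type=un;
  run;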
Qinlei Huang, St Jude Children's Research Hospital
Paper SAS1774-2015:
Predictive Modeling Using SAS® Visual Statistics: Beyond the Prediction
Predictions, including regressions and classifications, are the predominant focus of many statistical and machine-learning models. However, in the era of big data, a predictive modeling process contains more than just making the final predictions. For example, a large collection of data often represents a set of small, heterogeneous populations. Identification of these subgroups is therefore an important step in predictive modeling. In addition, big data sets are often complex and exhibit high dimensionality. Consequently, variable selection, transformation, and outlier detection are integral steps. This paper provides working examples of these critical stages using SAS® Visual Statistics, including data segmentation (supervised and unsupervised), variable transformation, outlier detection, and filtering, in addition to building the final predictive model using methodology such as linear regressions, decision trees, and logistic regressions. The illustration data consist of vehicle emission testing results collected from 2010 to 2014.
Read the paper (PDF).
Xiangxiang Meng, SAS
Jennifer Ames, SAS
Wayne Thompson, SAS
Paper 2103-2015:
Preparing Students for the Real World with SAS® Studio
A common complaint of employers is that educational institutions do not prepare students for the types of messy data and multi-faceted requirements that occur on the job. No organization has data that resembles the perfectly scrubbed data sets in the back of a statistics textbook. The objective of the Annual Report Project is to quickly bring new SAS® users to a level of competence where they can use real data to meet real business requirements. Many organizations need annual reports for stockholders, funding agencies, or donors. Or, they need annual reports at the department or division level for an internal audience. Being tapped as part of the team creating an annual report used to mean weeks of tedium, poring over columns of numbers in 8-point font in (shudder) Excel spreadsheets, but no more. No longer painful, using a few SAS procedures and functions, reporting can be easy and, dare I say, fun. All analyses are done using SAS® Studio (formerly SAS® Web Editor) of SAS OnDemand for Academics. This paper uses an example with actual data for a report prepared to comply with federal grant funding requirements as proof that, yes, it really is that simple.
Read the paper (PDF). | Watch the recording.
AnnMaria De Mars, AnnMaria De Mars
Paper 3247-2015:
Privacy, Transparency, and Quality Improvement in the Era of Big Data and Health Care Reform
The era of big data and health care reform is an exciting and challenging time for anyone whose work involves data security, analytics, data visualization, or health services research. This presentation examines important aspects of current approaches to quality improvement in health care based on data transparency and patient choice. We look at specific initiatives related to the Affordable Care Act (for example, the qualified entity program of section 10332 that allows the Centers for Medicare and Medicaid Services (CMS) to provide Medicare claims data to organizations for multi-payer quality measurement and reporting, the open payments program, and state-level all-payer claims databases to inform improvement and public reporting) within the context of a core issue in the era of big data: security and privacy versus transparency and openness. In addition, we examine an assumption that underlies many of these initiatives: data transparency leads to improved choices by health care consumers and increased accountability of providers. For example, recent studies of one component of data transparency, price transparency, show that, although health plans generally offer consumers an easy-to-use cost calculator tool, only about 2 percent of plan members use it. Similarly, even patients with high-deductible plans (presumably those with an increased incentive to do comparative shopping) seek prices for only about 10 percent of their services. Anyone who has worked in analytics, reporting, or data visualization recognizes the importance of understanding the intended audience, and that methodological transparency is as important as the public reporting of the output of the calculation of cost or quality metrics. Although widespread use of publicly reported health care data might not be a realistic goal, data transparency does offer a number of potential benefits: data-driven policy making, informed management of cost and use of services, as well as public health benefits through, for example, the recognition of patterns of disease prevalence and immunization use. Looking at this from a system perspective, we can distinguish five main activities: data collection, data storage, data processing, data analysis, and data reporting. Each of these activities has important components (such as database design for data storage and de-identification and aggregation for data reporting) as well as overarching requirements such as data security and quality assurance that are applicable to all activities. A recent Health Affairs article by CMS leaders noted that the big-data revolution could not have come at a better time, but it also recognizes that challenges remain. Although CMS is the largest single payer for health care in the U.S., the challenges it faces are shared by all organizations that collect, store, analyze, or report health care data. In turn, these challenges are opportunities for database developers, systems analysts, programmers, statisticians, data analysts, and those who provide the tools for public reporting to work together to design comprehensive solutions that inform evidence-based improvement efforts.
Read the paper (PDF).
Paul Gorrell, IMPAQ International
R
Paper 1341-2015:
Random vs. Fixed Effects: Which Technique More Effectively Addresses Selection Bias in Observational Studies
Retrospective case-control studies are frequently used to evaluate health care programs when it is not feasible to randomly assign members to a respective cohort. Without randomization, observational studies are more susceptible to selection bias where the characteristics of the enrolled population differ from those of the entire population. When the participant sample is different from the comparison group, the measured outcomes are likely to be biased. Given this issue, this paper discusses how propensity score matching and random effects techniques can be used to reduce the impact selection bias has on observational study outcomes. All results shown are drawn from an ROI analysis using a participant (cases) versus non-participant (controls) observational study design for a fitness reimbursement program aiming to reduce health care expenditures of participating members.
Read the paper (PDF). | Download the data file (ZIP).
Jess Navratil-Strawn, Optum
Paper 2382-2015:
Reducing the Bias: Practical Application of Propensity Score Matching in Health-Care Program Evaluation
To stay competitive in the marketplace, health-care programs must be capable of reporting the true savings to clients. This is a tall order, because most health-care programs are set up to be available to the client's entire population and thus cannot be conducted as a randomized control trial. In order to evaluate the performance of the program for the client, we use an observational study design that has inherent selection bias due to its inability to randomly assign participants. To reduce the impact of bias, we apply propensity score matching to the analysis. This technique is beneficial to health-care program evaluations because it helps reduce selection bias in the observational analysis and in turn provides a clearer view of the client's savings. This paper explores how to develop a propensity score, evaluate the use of inverse propensity weighting versus propensity matching, and determine the overall impact of the propensity score matching method on the observational study population. All results shown are drawn from a savings analysis using a participant (cases) versus non-participant (controls) observational study design for a health-care decision support program aiming to reduce emergency room visits.
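A minimal sketch of the first steps (variable names assumed): estimate the propensity score with PROC LOGISTIC, then derive inverse-propensity weights; the matching itself can be done with a greedy or caliper algorithm.

  proc logistic data=study;
     class gender region / param=ref;
     model participant(event='1') = age gender region risk_score;
     output out=ps p=pscore;                /* propensity score */
  run;

  data weighted;
     set ps;
     if participant = 1 then iptw = 1 / pscore;
     else                    iptw = 1 / (1 - pscore);   /* inverse propensity weight */
  run;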
Read the paper (PDF).
Amber Schmitz, Optum
Paper 2601-2015:
Replication Techniques for Variance Approximation
Replication techniques such as the jackknife and the bootstrap have become increasingly popular in recent years, particularly within the field of complex survey data analysis. The premise of these techniques is to treat the data set as if it were the population and repeatedly sample from it in some systematic fashion. From each sample, or replicate, the estimate of interest is computed, and the variability of the estimate from the full data set is approximated by a simple function of the variability among the replicate-specific estimates. An appealing feature is that there is generally only one variance formula per method, regardless of the underlying quantity being estimated. The entire process can be efficiently implemented after appending a series of replicate weights to the analysis data set. As will be shown, the SURVEY family of SAS/STAT® procedures can be exploited to facilitate both the task of appending the replicate weights and approximating variances.
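For instance, a jackknife analysis (a minimal sketch with assumed design variable names) needs only a VARMETHOD= option, and the replicate weights can be written out for reuse:

  proc surveymeans data=survey varmethod=jackknife(outweights=jkweights);
     strata stratum;
     cluster psu;
     weight samplewt;
     var bmi;
  run;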
Read the paper (PDF).
Taylor Lewis, University of Maryland
Paper 3740-2015:
Risk-Adjusting Provider Performance Utilization Metrics
Pay-for-performance programs are putting increasing pressure on providers to better manage patient utilization through care coordination, with the philosophy that good preventive services and routine care can prevent the need for some high-resource services. Evaluation of provider performance frequently includes measures such as acute care events (ER and inpatient), imaging, and specialist services, yet rarely are these indicators adjusted for the underlying risk of providers' patient panel. In part, this is because standard patient risk scores are designed to predict costs, not the probability of specific service utilization. As such, Blue Cross Blue Shield of North Carolina has developed a methodology to model our members' risk of these events in an effort to ensure that providers are evaluated fairly and to prevent our providers from adverse selection practices. Our risk modeling takes into consideration members' underlying health conditions and limited demographic factors during the previous 12 month period, and employs two-part regression models using SAS® software. These risk-adjusted measures will subsequently be the basis of performance evaluation of primary care providers for our Accountable Care Organizations and medical home initiatives.
Read the paper (PDF).
Stephanie Poley, Blue Cross Blue Shield of North Carolina
S
Paper 3155-2015:
SAS® Formats Top Ten
SAS® formats can be used in so many different ways! Even the most basic SAS format use (modifying the way a SAS data value is displayed without changing the underlying data value) holds a variety of nifty tricks, such as nesting formats, formats that affect various style attributes, and conditional formatting. Add in picture formats, multi-label formats, using formats for data cleaning, and formats for joins and table look-ups, and we have quite a bag of tricks for the humble SAS format and PROC FORMAT, which are used to generate them. This paper describes a few very useful programming techniques that employ SAS formats. While this paper will be appropriate for the newest SAS user, it will also focus on some of the lesser-known features of formats and PROC FORMAT and so should be useful for even quite experienced users of SAS.
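A few of the tricks in miniature (hypothetical formats; the paper covers many more, including CNTLIN= and multilabel formats):

  proc format;
     value agegrp    low-<18 = 'Child'    18-<65 = 'Adult'    65-high = 'Older adult';
     value chargefmt .       = 'Missing'  other   = [comma12.];   /* nested format */
     value $sitefmt  'A' = 'Site A'  'B' = 'Site B'  other = 'Unknown';
  run;

  proc freq data=patients;
     tables age site charge;
     format age agegrp. site $sitefmt. charge chargefmt.;
  run;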
Read the paper (PDF).
Christianna Williams, Self-Employed
Paper 3102-2015:
SAS® Enterprise Guide® 5.1 and PROC GPLOT--the Power, the Glory and the PROC-tical Limitations
Customer expectations are set high when Microsoft Excel and Microsoft PowerPoint are used to design reports. Using SAS® for reporting has benefits because it generates plots directly from prepared data sets, automates the plotting process, minimizes labor-intensive manual construction using Microsoft products, and does not compromise the presentation value. SAS® Enterprise Guide® 5.1 has a powerful point-and-click method that is quick and easy to use. However, it is limited in its ability to customize the output to mimic manually created Microsoft graphics. This paper demonstrates why SAS Enterprise Guide is the perfect starting point for creating initial code for plots using SAS/GRAPH® point-and-click features and how the code can be enhanced using established PROC GPLOT, ANNOTATE, and ODS options to re-create the look and feel of plots generated by Excel and PowerPoint. Examples show the generation of plots and tables using PROC TABULATE to embed the plot data into the graphical output. Also included are tips for overcoming the ODS limitation of SAS® 9.3, which is used by SAS Enterprise Guide 5.1, to transfer the SAS graphical output to PowerPoint files. These SAS® 9.3 tips are contrasted with the new SAS® 9.4 ODS POWERPOINT statement that enables direct PowerPoint file creation from a SAS program.
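For the SAS 9.4 route mentioned at the end, a minimal sketch (file name and data are placeholders) is:

  ods powerpoint file='trend_report.pptx';
  proc sgplot data=sashelp.stocks;
     where stock = 'IBM';
     series x=date y=close;     /* each graph or table becomes a slide */
  run;
  ods powerpoint close;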
Read the paper (PDF).
Christopher Klekar, Baylor Scott and White Health
Gabriela Cantu, Baylor Scott &White Health
Paper 3309-2015:
Snapshot SNAFU: Preventative Measures to Safeguard Deliveries
Little did you know that your last delivery ran on incomplete data. To make matters worse, the client realized the issue first. Sounds like a horror story, no? A few preventative measures can go a long way in ensuring that your data are up-to-date and progressing normally. At the data set level, metadata comparisons between the current and previous data cuts will help identify observation and variable discrepancies. Comparisons will also uncover attribute differences at the variable level. At the subject level, they will identify missing subjects. By compiling these comparison results into a comprehensive scheduled e-mail, a data facilitator need only skim the report to confirm that the data is good to go--or in need of some corrective action. This paper introduces a suite of checks contained in a macro that will compare data cuts in the data set, variable, and subject levels and produce an e-mail report. The wide use of this macro will help all SAS® users create better deliveries while avoiding rework.
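A minimal sketch of the data set level check (library and data set names assumed): compare the metadata of the current and prior cuts and keep only the differences.

  proc contents data=prior.ae   out=meta_prior(keep=name type length label)   noprint; run;
  proc contents data=current.ae out=meta_current(keep=name type length label) noprint; run;

  proc sort data=meta_prior;   by name; run;
  proc sort data=meta_current; by name; run;

  proc compare base=meta_prior compare=meta_current
               out=meta_diffs outnoequal noprint;
     id name;                   /* attribute discrepancies land in META_DIFFS */
  run;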
Read the paper (PDF). | Download the data file (ZIP).
Spencer Childress, Rho,Inc
Alexandra Buck, Rho, Inc.
Paper 3274-2015:
Statistical Analysis of Publicly Released Survey Data with SAS/STAT® Software SURVEY Procedures
Several U.S. Federal agencies conduct national surveys to monitor health status of residents. Many of these agencies release their survey data to the public. Investigators might be able to address their research objectives by conducting secondary statistical analyses with these available data sources. This paper describes the steps in using the SAS SURVEY procedures to analyze publicly released data from surveys that use probability sampling to make statistical inference to a carefully defined population of elements (the target population).
Read the paper (PDF). | Watch the recording.
Donna Brogan, Emory University, Atlanta, GA
Paper SAS1789-2015:
Step into the Cloud: Ways to Connect to Amazon Redshift with SAS/ACCESS®
Every day, companies all over the world are moving their data into the Cloud. While there are many options available, much of this data will wind up in Amazon Redshift. As a SAS® user, you are probably wondering, 'What is the best way to access this data using SAS?' This paper discusses the many ways that you can use SAS/ACCESS® to get to Amazon Redshift. We compare and contrast the various approaches and help you decide which is best for you. Topics that are discussed are building a connection, choosing appropriate data types, and SQL functions.
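As one hedged illustration of the general pattern (connection details are placeholders, and the paper also covers the dedicated SAS/ACCESS engine for Amazon Redshift), an ODBC DSN can be assigned as a library and queried with implicit pass-through:

  libname rs odbc dsn='RedshiftDSN' user=myuser password='XXXXXXXX';

  proc sql;
     create table work.top_customers as
     select customer_id, sum(amount) as total_amount
     from rs.orders
     group by customer_id
     order by total_amount desc;
  quit;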
Read the paper (PDF).
James (Ke) Wang, SAS
Salman Maher, SAS
Paper 3187-2015:
Structuring your SAS® Applications for Long-Term Survival: Reproducible Methods in Base SAS® Programming
SAS® users organize their applications in a variety of ways. However, some approaches are more successful than others. In particular, needing to run only part of a program's code at certain times can be challenging. Reproducible research methods require that SAS applications be understandable by the author and other staff members. In this presentation, you learn how to organize and structure your SAS application to manage the process of data access, data analysis, and data presentation. Structuring an application requires that the tasks in the data analysis process be compartmentalized. This can be done using a well-defined program. The author presents his structuring algorithm, and discusses the characteristics of good structuring methods for SAS applications. Reproducible research methods are becoming more centrally important, and SAS users must keep up with the current developments.
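A minimal sketch of one such structure (paths and file names assumed) is a short driver program that runs each compartmentalized task in order:

  %let root = /project/study01;

  %include "&root./programs/01_access.sas";    /* data access       */
  %include "&root./programs/02_analyze.sas";   /* data analysis     */
  %include "&root./programs/03_report.sas";    /* data presentation */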
Read the paper (PDF). | Download the data file (ZIP).
Paul Thomas, ASUP Ltd
Paper 1521-2015:
Sums of Squares: The Basics and a Surprise
Most 'Design of Experiment' textbooks cover Type I, Type II, and Type III sums of squares, but many researchers and statisticians fall into the habit of using one type mindlessly. This breakout session reviews the basics and illustrates the importance of the choice of type as well as the variable definitions in PROC GLM and PROC REG.
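A minimal sketch of requesting the different types side by side (SASHELP.CLASS used purely for illustration):

  proc glm data=sashelp.class;
     class sex;
     model weight = sex height sex*height / ss1 ss2 ss3;   /* compare the types */
  run; quit;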
Read the paper (PDF).
Sheila Barron, University of Iowa
Michelle Mengeling, Comprehensive Access & Delivery Research & Evaluation-CADRE, Iowa City VA Health Care System
Paper 2040-2015:
Survival Analysis with Survey Data
Surveys are designed to elicit information about population characteristics. A survey design typically combines stratification and multistage sampling of intact clusters, sub-clusters, and individual units with specified probabilities of selection. A survey sample can produce valid and reliable estimates of population parameters at a fraction of the cost of carrying out a census of the entire population, with clear logistical efficiencies. For analyses of survey data, SAS® software provides a suite of procedures from SURVEYMEANS and SURVEYFREQ for generating descriptive statistics and conducting inference on means and proportions to regression-based analysis through SURVEYREG and SURVEYLOGISTIC. For longitudinal surveys and follow-up studies, SURVEYPHREG is designed to incorporate aspects of the survey design for analysis of time-to-event outcomes based on the Cox proportional hazards model, allowing for time-varying explanatory variables. We review the salient features of the SURVEYPHREG procedure with application to survey data from the National Health and Nutrition Examination Survey (NHANES III) Linked Mortality File.
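A minimal sketch of a design-based Cox model (design and analysis variable names are assumed stand-ins for the NHANES III file):

  proc surveyphreg data=nhanes3_mortality;
     strata design_stratum;
     cluster design_psu;
     weight exam_weight;
     class sex;
     model followup_months*death(0) = age sex bmi;
  run;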
Read the paper (PDF).
Joseph Gardiner, Michigan State University
T
Paper SAS1804-2015:
Take Your Data Analysis and Reporting to the Next Level by Combining SAS® Office Analytics, SAS® Visual Analytics, and SAS® Studio
SAS® Office Analytics, SAS® Visual Analytics, and SAS® Studio provide excellent data analysis and report generation. When these products are combined, their deep interoperability enables you to take your analysis and reporting to the next level. Build interactive reports in SAS® Visual Analytics Designer, and then view, customize and comment on them from Microsoft Office and SAS® Enterprise Guide®. Create stored processes in SAS Enterprise Guide, and then run them in SAS Visual Analytics Designer, mobile tablets, or SAS Studio. Run your SAS Studio tasks in SAS Enterprise Guide and Microsoft Office using data provided by those applications. These interoperability examples and more will enable you to combine and maximize the strength of each of the applications. Learn more about this integration between these products and what's coming in the future in this session.
Read the paper (PDF).
David Bailey, SAS
Tim Beese, SAS
Casey Smith, SAS
Paper SAS1387-2015:
Ten Tips for Simulating Data with SAS®
Data simulation is a fundamental tool for statistical programmers. SAS® software provides many techniques for simulating data from a variety of statistical models. However, not all techniques are equally efficient. An efficient simulation can run in seconds, whereas an inefficient simulation might require days to run. This paper presents 10 techniques that enable you to write efficient simulations in SAS. Examples include how to simulate data from a complex distribution and how to use simulated data to approximate the sampling distribution of a statistic.
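One of the central tips, sketched minimally: simulate all samples in a single DATA step, then use BY-group processing instead of a macro loop.

  %let NumSamples = 1000;
  %let N = 50;

  data sim;
     call streaminit(20150426);               /* reproducible stream */
     do SampleID = 1 to &NumSamples;
        do i = 1 to &N;
           x = rand('Normal', 0, 1);
           output;
        end;
     end;
  run;

  proc means data=sim noprint;
     by SampleID;
     var x;
     output out=sampdist mean=SampleMean;     /* approximate sampling distribution of the mean */
  run;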
Read the paper (PDF). | Download the data file (ZIP).
Rick Wicklin, SAS
Paper 3504-2015:
The %ic_mixed Macro: A SAS Macro to Produce Sorted Information Criteria (AIC and BIC) List for PROC MIXED for Model Selection
PROC MIXED is one of the most popular SAS procedures to perform longitudinal analysis or multilevel models in epidemiology. Model selection is one of the fundamental questions in model building. One of the most popular and widely used strategies is model selection based on information criteria, such as Akaike Information Criterion (AIC) and Sawa Bayesian Information Criterion (BIC). This strategy considers both fit and complexity, and enables multiple models to be compared simultaneously. However, there is no existing SAS procedure to perform model selection automatically based on information criteria for PROC MIXED, given a set of covariates. This paper provides information about using the SAS %ic_mixed macro to select a final model with the smallest value of AIC and BIC. Specifically, the %ic_mixed macro will do the following: 1) produce a complete list of all possible model specifications given a set of covariates, 2) use a DO loop to read in one model specification at a time and save it in a macro variable, 3) execute PROC MIXED and use the Output Delivery System (ODS) to output AICs and BICs, 4) append all outputs and use the DATA step to create a sorted list of information criteria with model specifications, and 5) run PROC REPORT to produce the final summary table. Based on the sorted list of information criteria, researchers can easily identify the best model. This paper includes the macro programming language, as well as examples of the macro calls and outputs.
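The core step the macro repeats, sketched for a single candidate model (variable names assumed): capture the fit statistics table with ODS OUTPUT.

  ods output FitStatistics=fit_candidate1;
  proc mixed data=cohort method=ml;       /* ML so fixed-effect specifications are comparable */
     class id;
     model y = age sex bmi / solution;
     random intercept / subject=id;
  run;
  /* FIT_CANDIDATE1 holds rows such as 'AIC (Smaller is Better)' for this model */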
Read the paper (PDF).
Qinlei Huang, St Jude Children's Research Hospital
Paper 3060-2015:
The Knight's Tour in Chess--Implementing a Heuristic Solution
The knight's tour is a sequence of moves on a chess board such that a knight visits each square only once. Using a heuristic method, it is possible to find a complete path, beginning from any arbitrary square on the board and landing on the remaining squares only once. However, the implementation poses challenging programming problems. For example, it is necessary to discern viable knight moves, which change throughout the tour. Even worse, the heuristic approach does not guarantee a solution. This paper explains a SAS® solution that finds a knight's tour beginning from every initial square on a chess board...well, almost.
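A sketch of Warnsdorff's heuristic (always move to the unvisited square with the fewest onward moves) in a single DATA step is shown below; the starting square is arbitrary, ties are broken by the first candidate found, and, as the paper warns, a complete tour is not guaranteed.

  data tour;
     array board[8, 8] _temporary_;                 /* missing = unvisited       */
     array dr[8] _temporary_ ( 1  2  2  1 -1 -2 -2 -1);
     array dc[8] _temporary_ ( 2  1 -1 -2 -2 -1  1  2);

     r = 1; c = 1;                                  /* starting square (assumed) */
     board[r, c] = 1;
     step = 1; row = r; col = c; output;

     do step = 2 to 64;
        best = .; bestr = .; bestc = .;
        do m = 1 to 8;                              /* examine each candidate move */
           nr = r + dr[m]; nc = c + dc[m];
           if 1 <= nr <= 8 and 1 <= nc <= 8 then
              if board[nr, nc] = . then do;
                 onward = 0;                        /* count onward moves from (nr, nc) */
                 do k = 1 to 8;
                    tr = nr + dr[k]; tc = nc + dc[k];
                    if 1 <= tr <= 8 and 1 <= tc <= 8 then
                       if board[tr, tc] = . then onward + 1;
                 end;
                 if best = . or onward < best then do;
                    best = onward; bestr = nr; bestc = nc;
                 end;
              end;
        end;
        if best = . then do;                        /* dead end: the heuristic failed */
           put 'NOTE: No complete tour found from this starting square.';
           stop;
        end;
        r = bestr; c = bestc;
        board[r, c] = step;
        row = r; col = c; output;
     end;
     keep step row col;
  run;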
Read the paper (PDF).
John R Gerlach, Dataceutics, Inc.
Paper 3361-2015:
The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS® Enterprise Miner™
Random Forest (RF) is a trademarked term for an ensemble approach to decision trees. RF was introduced by Leo Breiman in 2001. Due to our familiarity with decision trees--one of the intuitive, easily interpretable models that divides the feature space with recursive partitioning and uses sets of binary rules to classify the target--we also know some of its limitations such as over-fitting and high variance. RF uses decision trees, but takes a different approach. Instead of growing one deep tree, it aggregates the output of many shallow trees and makes a strong classifier model. RF significantly improves the accuracy of classification by growing an ensemble of trees and letting the trees vote for the most popular class. Unlike decision trees, RF is robust against over-fitting and high variance, since it randomly selects a subset of variables in each split node. This paper demonstrates this simple yet powerful classification algorithm by building an income-level prediction system. Data extracted from the 1994 Census Bureau database was used for this study. The data set comprises information about 14 key attributes for 45,222 individuals. Using SAS® Enterprise Miner™ 13.1, models such as random forest, decision tree, probability decision tree, gradient boosting, and logistic regression were built to classify the income level (>50K or <=50K) of the population. The results showed that the random forest model was the best model for this data, based on the misclassification rate criterion. The RF model predicts the income-level group of the individuals with an accuracy of 85.1%, with the predictors capturing specific characteristic patterns. This demonstration using SAS® can lead to useful insights into RF for solving classification problems.
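Outside Enterprise Miner, a roughly comparable model can be sketched with PROC HPFOREST (a hedged sketch; the variable names are placeholders for the census attributes):

  proc hpforest data=census maxtrees=100;
     target income_level / level=binary;
     input age education_num hours_per_week capital_gain / level=interval;
     input workclass marital_status occupation / level=nominal;
  run;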
Read the paper (PDF).
Narmada Deve Panneerselvam, OSU
Paper SAS1745-2015:
The SEM Approach to Longitudinal Data Analysis Using the CALIS Procedure
Researchers often use longitudinal data analysis to study the development of behaviors or traits. For example, they might study how an elderly person's cognitive functioning changes over time or how a therapeutic intervention affects a certain behavior over a period of time. This paper introduces the structural equation modeling (SEM) approach to analyzing longitudinal data. It describes various types of latent curve models and demonstrates how you can use the CALIS procedure in SAS/STAT® software to fit these models. Specifically, the paper covers basic latent curve models, such as unconditional and conditional models, as well as more complex models that involve multivariate responses and latent factors. All illustrations use real data that were collected in a study that looked at maternal stress and the relationship between mothers and their preterm infants. This paper emphasizes the practical aspects of longitudinal data analysis. In addition to illustrating the program code, it shows how you can interpret the estimation results and revise the model appropriately. The final section of the paper discusses the advantages and disadvantages of the SEM approach to longitudinal data analysis.
Read the paper (PDF).
Xinming An, SAS
Yiu-Fai Yung, SAS
Paper 3741-2015:
The Spatio-Temporal Impact of Urgent Care Centers on Physician and ER Use
The unsustainable trend in healthcare costs has led to efforts to shift some healthcare services to less expensive sites of care. In North Carolina, the expansion of urgent care centers introduces the possibility that non-emergent and non-life threatening conditions can be treated at a less intensive care setting. BCBSNC conducted a longitudinal study of density of urgent care centers, primary care providers, and emergency departments, and the differences in how members access care near those locations. This talk focuses on several analytic techniques that were considered for the analysis. The model needed to account for the complex relationship between the changes in the population (including health conditions and health insurance benefits) and the changes in the types of services and supply of services offered by healthcare providers proximal to them. Results for the chosen methodology are discussed.
Read the paper (PDF).
Laurel Trantham, Blue Cross and Blue Shield North Carolina
Paper 3338-2015:
Time Series Modeling and Forecasting--An Application to Stress-Test Banks
Did you ever wonder how large US bank holding companies (BHCs) perform stress testing? I had the pleasure to be a part of this process on the model building end, and now I perform model validation. As with everything that is new and uncertain, there is much room for the discovery process. This presentation explains how banks in general perform time series modeling of different loans and credits to establish the bank's position during simulated stress. You learn the basic process behind model building and validation for Comprehensive Capital Analysis and Review (CCAR) purposes, which includes, but is not limited to, back testing, sensitivity analysis, scenario analysis, and model assumption testing. My goal is to gain your interest in the areas of challenging current modeling techniques and looking beyond standard model assumption testing to assess the true risk behind the formulated model and its consequences. This presentation examines the procedures that happen behind the scenes of any code's syntax to better explore statistics that play crucial roles in assessing model performance and forecasting. Forecasting future periods is the process that needs more attention and a better understanding because this is what the CCAR is really all about. In summary, this presentation engages professionals and students to dig deeper into every aspect of time series forecasting.
Read the paper (PDF).
Ania Supady, KeyCorp
Paper 1600-2015:
Tips for Publishing in Health Care Journals with the Medical Expenditure Panel Survey (MEPS) Data Using SAS®
This presentation provides an in-depth analysis, with example SAS® code, of the health care use and expenditures associated with depression among individuals with heart disease using the 2012 Medical Expenditure Panel Survey (MEPS) data. A cross-sectional study design was used to identify differences in health care use and expenditures between depressed (n = 601) and nondepressed (n = 1,720) individuals among patients with heart disease in the United States. Multivariate regression analyses using the SAS survey analysis procedures were conducted to estimate the incremental health services and direct medical costs (inpatient, outpatient, emergency room, prescription drugs, and other) attributable to depression. The prevalence of depression among individuals with heart disease in 2012 was estimated at 27.1% (6.48 million persons) and their total direct medical costs were estimated at approximately $110 billion in 2012 U.S. dollars. Younger adults (< 60 years), women, unmarried, poor, and sicker individuals with heart disease were more likely to have depression. Patients with heart disease and depression had more hospital discharges (relative ratio (RR) = 1.06, 95% confidence interval (CI) [1.02 to 1.09]), office-based visits (RR = 1.27, 95% CI [1.15 to 1.41]), emergency room visits (RR = 1.08, 95% CI [1.02 to 1.14]), and prescribed medicines (RR = 1.89, 95% CI [1.70, 2.11]) than their counterparts without depression. Finally, among individuals with heart disease, overall health care expenditures for individuals with depression was 69% higher than that for individuals without depression (RR = 1.69, 95% CI [1.44, 1.99]). The conclusion is that depression in individuals with heart disease is associated with increased health care use and expenditures, even after adjusting for differences in age, gender, race/ethnicity, marital status, poverty level, and medical comorbidity.
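The design-adjusted regressions follow the usual MEPS pattern (a minimal sketch; variable names are assumed stand-ins for the MEPS stratum, PSU, and person-level weight):

  proc surveyreg data=meps2012;
     strata varstr;
     cluster varpsu;
     weight perwt12f;
     class depressed sex;
     model total_exp = depressed age sex poverty_cat comorbid_count / solution;
  run;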
Read the paper (PDF).
Seungyoung Hwang, Johns Hopkins University Bloomberg School of Public Health
Paper 3423-2015:
To Believe or Not To Believe? The Truth of Data Analytics Results
Drawing on the results from machine learning, exploratory statistics, and a variety of related methodologies, data analytics is becoming one of the hottest areas in a variety of global industries. The utility and application of these analyses have been extremely impressive and have led to successes ranging from business value generation to hospital infection control applications. This presentation examines the philosophical foundations (epistemology) associated with scientific discovery and shows whether the currently used analytics techniques rest on a rational philosophy of science. Examples are provided to assist in making the concepts more concrete to the business and scientific user.
Read the paper (PDF). | Watch the recording.
Mike Hardin, The University of Alabama
U
Paper 3408-2015:
Understanding Patterns in the Utilization and Costs of Elbow Reconstruction Surgeries: A Healthcare Procedure that is Common among Baseball Pitchers
Athletes in sports, such as baseball and softball, commonly undergo elbow reconstruction surgeries. There is research that suggests that the rate of elbow reconstruction surgeries among professional baseball pitchers continues to rise by leaps and bounds. Given the trend found among professional pitchers, the current study reviews patterns of elbow reconstruction surgery among the privately insured population. The study examined trends (for example, cost, age, geography, and utilization) in elbow reconstruction surgeries among privately insured patients using analytic tools such as SAS® Enterprise Guide® and SAS® Visual Analytics, based on the medical and surgical claims data from the FAIR Health National Private Insurance Claims (NPIC) database. The findings of the study suggested that there are discernable patterns in the prevalence of elbow reconstruction surgeries over time and across specific geographic regions.
Read the paper (PDF). | Download the data file (ZIP).
Jeff Dang, FAIR Health
Paper 3141-2015:
Unstructured Data Mining to Improve Customer Experience in Interactive Voice Response Systems
Interactive Voice Response (IVR) systems are likely one of the best and worst gifts to the world of communication, depending on who you ask. Businesses love IVR systems because they take out hundreds of millions of dollars of call center costs in automation of routine tasks, while consumers hate IVRs because they want to talk to an agent! It is a delicate balancing act to manage an IVR system that saves money for the business, yet is smart enough to minimize consumer abrasion by knowing who they are, why they are calling, and providing an easy automated solution or a quick route to an agent. There are many aspects to designing such IVR systems, including engineering, application development, omni-channel integration, user interface design, and data analytics. For larger call volume businesses, IVRs generate terabytes of data per year, with hundreds of millions of rows per day that track all system and customer-facing events. The data is stored in various formats and is often unstructured (lengthy character fields that store API return information or text fields containing consumer utterances). The focus of this talk is the development of a data mining framework based on SAS® that is used to parse and analyze IVR data in order to provide insights into usability of the application across various customer segments. Certain use cases are also provided.
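As a tiny illustration of the parsing side (the field layout and names here are invented), a Perl regular expression can pull structured pieces out of a semi-structured event field:

  data ivr_parsed;
     set ivr_events;
     length return_code $8 utterance $200;
     retain rx;
     if _n_ = 1 then rx = prxparse('/code=(\w+).*utterance="([^"]*)"/');
     if prxmatch(rx, event_detail) then do;
        return_code = prxposn(rx, 1, event_detail);   /* capture group 1 */
        utterance   = prxposn(rx, 2, event_detail);   /* capture group 2 */
     end;
     drop rx;
  run;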
Read the paper (PDF).
Dmitriy Khots, West Corp
Paper 2740-2015:
Using Heat Maps to Compare Clusters of Ontario DWI Drivers
SAS® PROC FASTCLUS generates five clusters for the group of repeat clients of Ontario's Remedial Measures program. Heat map tables are shown for selected variables such as demographics, scales, factor, and drug use to visualize the difference between clusters.
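A minimal sketch of the clustering step (data set and variable names assumed):

  proc fastclus data=remedial_clients maxclusters=5 out=clustered;
     var age_scaled drug_use_factor prior_offenses;   /* cluster assignments land in CLUSTER */
  run;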
Rosely Flam-Zalcman, CAMH
Robert Mann, CAMH
Rita Thomas, CAMH
Paper 3101-2015:
Using SAS® Enterprise Miner™ to Predict Breast Cancer at an Early Stage
Breast cancer is the leading cause of cancer-related deaths among women worldwide, and its early detection can reduce the mortality rate. Using a data set containing information about breast screening provided by the Royal Liverpool University Hospital, we constructed a model that can provide early indication of a patient's tendency to develop breast cancer. This data set has information about breast screening from patients who were believed to be at risk of developing breast cancer. The most important aspect of this work is that we excluded variables that are in one way or another associated with breast cancer, while keeping the variables that are less likely to be associated with breast cancer or whose associations with breast cancer are unknown as input predictors. The target variable is a binary variable with two values, 1 (indicating a type of cancer is present) and 0 (indicating a type of cancer is not present). SAS® Enterprise Miner™ 12.1 was used to perform data validation and data cleansing, to identify potentially related predictors, and to build models that can be used to predict at an early stage the likelihood of patients developing breast cancer. We compared two models: the first model was built with an interactive node and a cluster node, and the second was built without an interactive node and a cluster node. Classification performance was compared using a receiver operating characteristic (ROC) curve and average squared error. Interestingly, we found significantly improved model performance by using only variables that have a lesser or unknown association with breast cancer. The result shows that the logistic model with an interactive node and a cluster node has better performance with a lower average squared error (0.059614) than the model without an interactive node and a cluster node. Among other benefits, this model will assist inexperienced oncologists in saving time in disease diagnosis.
Read the paper (PDF).
Gibson Ikoro, Queen Mary University of London
Beatriz de la Iglesia, University of East Anglia, Norwich, UK
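The models in this paper are built with Enterprise Miner nodes rather than code, but for readers who prefer a programmatic starting point, a roughly comparable logistic baseline might look like the hedged sketch below; the data set screening, the predictor names, and the 70/30 split are invented for illustration and are not the authors' variables.

/* Illustrative code analogue only: partition the data, fit a logistic model  */
/* on the training portion, and score the holdout portion.                    */
proc surveyselect data=screening out=screening_part outall
                  samprate=0.7 seed=12345 method=srs;
run;                                  /* Selected=1 -> training, 0 -> holdout */

proc logistic data=screening_part(where=(selected=1)) descending
              outmodel=cancer_model;
   class clinic_site / param=ref;
   model cancer = age bmi clinic_site;
run;

proc logistic inmodel=cancer_model;
   score data=screening_part(where=(selected=0)) out=holdout_scored;
run;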
Paper 3484-2015:
Using SAS® Enterprise Miner™ to Predict the Number of Rings on an Abalone Shell Using Its Physical Characteristics
Abalone is a common name given to sea snails or mollusks. These creatures are highly iridescent, with shells of strong, changeable colors, which makes the shells attractive to humans as decorative objects and jewelry. The abalone's structure is also being researched for building body armor. The value of a shell varies with its age and the colors it displays. Determining the number of rings on an abalone is a tedious and cumbersome task, usually done by cutting the shell through the cone, staining it, and counting the rings under a microscope. In this poster, I aim to predict the number of rings on an abalone from its physical characteristics. The data, obtained from the UCI Machine Learning Repository, consists of 4,177 observations with 8 attributes, and the number of rings is the target variable. An abalone's age can be reasonably approximated as 1.5 times the number of rings on its shell. Using SAS® Enterprise Miner™, I built regression models and neural network models to determine which physical measurements are most responsible for the number of rings. While I have obtained a coefficient of determination of 54.01%, my aim is to improve and expand the analysis using the power of SAS Enterprise Miner. The initial results indicate that the height, the shucked weight, and the viscera weight of the shell are the three most influential variables in predicting the number of rings on an abalone.
Ganesh Kumar Gangarajula, Oklahoma State University
Yogananda Domlur Seetharam
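The analysis itself is done in Enterprise Miner, but a comparable regression could be sketched in code as below. The data set abalone and its variable names follow the UCI description in spirit; treat them as assumptions rather than the author's actual project.

/* Hedged code analogue: regress ring count on the physical measurements,     */
/* holding out 30% of the observations for validation.                        */
proc glmselect data=abalone seed=2015 plots=all;
   partition fraction(validate=0.3);
   class sex;
   model rings = sex length diameter height whole_weight shucked_weight
                 viscera_weight shell_weight / selection=stepwise(select=aicc);
run;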
Paper 1340-2015:
Using SAS® Macros to Flag Claims Based on Medical Codes
Many epidemiological studies use medical claims to identify and describe a population. But finding out who was diagnosed, and who received treatment, isn't always simple. Each claim can have dozens of medical codes, with different types of codes for procedures, drugs, and diagnoses. Even a basic definition of treatment could require a search for any one of 100 different codes. A SAS® macro may come to mind, but generalizing the macro to work with different codes and code types allows it to be reused in a variety of scenarios. We look at a number of examples, starting with a single code type and variable. Then we consider multiple code variables, multiple code types, and multiple flag variables. We show how these macros can be combined and customized for different data with minimal rework. Macro flexibility and reusability are also discussed, along with ways to keep our list of medical codes separate from our program. Finally, we discuss time-dependent medical codes, codes requiring database lookup, and macro performance.
Read the paper (PDF). | Download the data file (ZIP).
Andy Karnopp, Fred Hutchinson Cancer Research Center
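A stripped-down sketch of the kind of reusable flagging macro the paper describes appears below. The macro name, its parameters, and the example diagnosis variables dx1-dx25 are hypothetical; they illustrate the general pattern rather than reproduce the author's macros.

/* Hypothetical, simplified flagging macro: set &flagvar to 1 when any of the */
/* listed code variables matches any code in a space-separated code list.     */
%macro flag_codes(data=, out=, codevars=, codelist=, flagvar=flag);
   data &out;
      set &data;
      array codes{*} &codevars;
      &flagvar = 0;
      do _i = 1 to dim(codes);
         if not missing(codes{_i}) and
            indexw("&codelist", strip(codes{_i})) > 0 then &flagvar = 1;
      end;
      drop _i;
   run;
%mend flag_codes;

/* Example call: flag claims carrying either of two illustrative diagnosis codes. */
%flag_codes(data=claims, out=claims_flagged,
            codevars=dx1-dx25, codelist=%str(25000 25001), flagvar=diabetes);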
Paper SAS1951-2015:
Using SAS® Text Analytics to Examine Labor and Delivery Sentiments on the Internet
In today's society, where seemingly unlimited information is just a mouse click away, many turn to social media, forums, and medical websites to research and understand how mothers feel about the birthing process. Mining the data in these resources helps provide an understanding of what mothers value and how they feel. This paper shows the use of SAS® Text Analytics to gather, explore, and analyze reports from mothers to determine their sentiment about labor and delivery topics. Results of this analysis could aid in the design and development of a labor and delivery survey and be used to understand what characteristics of the birthing process yield the highest levels of importance. These resources can then be used by labor and delivery professionals to engage with mothers regarding their labor and delivery preferences.
Read the paper (PDF).
Michael Wallis, SAS
Paper 3281-2015:
Using SAS® to Create Episodes-of-Hospitalization for Health Services Research
An essential part of health services research is describing the use and sequencing of a variety of health services. One of the most frequently examined health services is hospitalization. A common problem in describing hospitalizations is that a patient might have multiple hospitalizations to treat the same health problem. Specifically, a hospitalized patient might be (1) sent to and returned from another facility in a single day for testing, (2) transferred from one hospital to another, and/or (3) discharged home and re-admitted within 24 hours. In all of these cases, the hospitalizations treat the same underlying health problem and should be considered a single episode. If examined without regard for the episode, the patient in this example would be identified as having 4 hospitalizations (the initial hospitalization, the testing hospitalization, the transfer hospitalization, and the readmission), when in reality there was one hospitalization episode spanning multiple facilities. IMPORTANCE: Failing to account for multiple hospitalizations in the same episode has implications for many disciplines, including health services research, health services planning, and quality improvement for patient safety. HEALTH SERVICES RESEARCH: Hospitalizations will be counted multiple times, leading to an overestimate of the number of hospitalizations a person had. For example, a person can be identified as having 4 hospitalizations when in reality they had one episode of hospitalization, making the person appear to be a heavier user of health care than is true. RESOURCE PLANNING FOR HEALTH SERVICES: The average time and resources needed to treat a specific health problem may be underestimated. To illustrate, if a patient spends 10 days in each of 3 different hospitals in the same episode, the total number of days needed to treat the health problem is 30, but each hospital will believe it is only 10, and planned resourcing may be inadequate. QUALITY IMPROVEMENT FOR PATIENT SAFETY: Hospital-acquired infections are a serious concern and a major cause of extended hospital stays, morbidity, and death. As a result, many hospitals have quality improvement programs that monitor the occurrence of infections in order to identify ways to reduce them. If episodes of hospitalization are not considered, an infection acquired in one hospital that does not manifest until the patient is transferred to a different hospital will incorrectly be attributed to the receiving hospital. PROPOSAL: We have developed SAS® code to identify episodes of hospitalization, the sequence of hospitalizations within each episode, and the overall duration of the episode. The output displays the data in an intuitive and easy-to-understand format. APPLICATION: The method we describe and the associated SAS code will be useful not only to health services researchers, but also to anyone who works with temporal data that includes nested, overlapping, and subsequent events.
Read the paper (PDF).
Meriç Osman, Health Quality Council
Jacqueline Quail, Saskatchewan Health Quality Council
Nianping Hu, Saskatchewan Health Quality Council
Nedeene Hudema, Saskatchewan Health Quality Council
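For illustration, one hedged way to implement the core episode-building step is sketched below. It assumes a data set hospitalizations with one record per hospital stay, a patient identifier, and SAS date variables admit_dt and disch_dt; a new episode starts when the gap between an admission and the previous discharge for the same patient exceeds one day. This is a simplified outline, not the authors' code.

/* Collapse transfers and rapid readmissions into episodes of hospitalization. */
proc sort data=hospitalizations out=hosp_sorted;
   by patient_id admit_dt disch_dt;
run;

data episodes;
   set hosp_sorted;
   by patient_id;
   retain episode_end;
   format episode_end yymmdd10.;
   if first.patient_id then do;
      episode_id  = 0;
      episode_end = .;
   end;
   if missing(episode_end) or admit_dt - episode_end > 1 then do;
      episode_id + 1;                             /* start a new episode      */
      episode_end = disch_dt;
   end;
   else episode_end = max(episode_end, disch_dt); /* same episode: extend it  */
run;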
Paper 3644-2015:
Using and Understanding the LSMEANS and LSMESTIMATE Statements
The concept of least squares means, or population marginal means, seems to confuse a lot of people. We explore least squares means as implemented by the LSMEANS statement in SAS®, beginning with the basics. Particular attention is paid to the effect of alternative parameterizations (for example, whether binary variables are in the CLASS statement) and the effect of the OBSMARGINS option. We use examples to show how to mimic LSMEANS using ESTIMATE statements and to illustrate the advantages of the relatively new LSMESTIMATE statement. The basics of estimability are discussed, including how to get around the dreaded non-estimable messages. Emphasis is placed on using the STORE statement and PROC PLM to test hypotheses without having to redo all the model calculations. This material is appropriate for all levels of SAS experience, but some familiarity with linear models is assumed.
Read the paper (PDF). | Watch the recording.
David Pasta, ICON Clinical Research
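As a small taste of the statements discussed, the hedged sketch below fits a hypothetical model with a three-level treatment factor, requests LS-means and an LSMESTIMATE contrast, and then reuses the stored model in PROC PLM; the data set trial and its variables are assumptions for illustration.

/* Fit the model once, request LS-means and a custom contrast, and store it. */
proc glimmix data=trial;
   class treatment sex;
   model response = treatment sex treatment*sex age;
   lsmeans treatment / obsmargins diff;       /* margins weighted as observed */
   lsmestimate treatment 'A vs average of B and C' 2 -1 -1 / divisor=2;
   store work.trial_model;
run;

/* Test further hypotheses later without refitting the model. */
proc plm restore=work.trial_model;
   lsmeans treatment*sex / diff;
   lsmestimate treatment 'A vs C' 1 0 -1;
run;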
Paper 1335-2015:
Using the GLIMMIX and GENMOD Procedures to Analyze Longitudinal Data from a Department of Veterans Affairs Multisite Randomized Controlled Trial
Many SAS® procedures can be used to analyze longitudinal data. This study used data from a multisite randomized controlled trial to demonstrate the use of two SAS procedures, GLIMMIX and GENMOD, for analyzing longitudinal data from five Department of Veterans Affairs Medical Centers (VAMCs). Older male veterans (n = 1222) seen in VAMC primary care clinics were randomly assigned to one of two behavioral health models: integrated (n = 605) or enhanced referral (n = 617). Data were collected at baseline and at 3-, 6-, and 12-month follow-up. A mixed-effects repeated measures model was used to examine the dependent variable, problem drinking, which was defined both as a count and as a dichotomous outcome from baseline to 12-month follow-up. Sociodemographics and depressive symptoms were included as covariates. First, bivariate analyses used general linear models and chi-square tests to examine covariates by group and group by problem-drinking outcomes. All significant covariates were included in the GLIMMIX and GENMOD models. Then, multivariate analyses used mixed models and generalized estimating equations (GEEs). The effects of group, time, and the group-by-time interaction were examined after controlling for covariates. Multivariate results were inconsistent between GLIMMIX and GENMOD using lognormal, Gaussian, Weibull, and gamma distributions. SAS is a powerful statistical program for analyzing longitudinal data.
Read the paper (PDF).
Abbas Tavakoli, University of South Carolina/College of Nursing
Marlene Al-Barwani, University of South Carolina
Sue Levkoff, University of South Carolina
Selina McKinney, University of South Carolina
Nikki Wooten, University of South Carolina
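For readers less familiar with the two procedures, the general shape of the calls is sketched below with placeholder names; the data set drinking, the covariates, and the negative binomial distribution are illustrative assumptions, not the study's actual specification.

/* Subject-specific (mixed) model for the count outcome with PROC GLIMMIX. */
proc glimmix data=drinking method=laplace;
   class subject_id group time;
   model drinks = group time group*time age depression / dist=negbinomial link=log;
   random intercept / subject=subject_id;
run;

/* Population-averaged (GEE) model for the same outcome with PROC GENMOD. */
proc genmod data=drinking;
   class subject_id group time;
   model drinks = group time group*time age depression / dist=negbin link=log;
   repeated subject=subject_id / type=cs corrw;
run;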
Paper SAS1502-2015:
Using the OPTMODEL Procedure in SAS/OR® to Solve Complex Problems
Mathematical optimization is a powerful paradigm for modeling and solving business problems that involve interrelated decisions about resource allocation, pricing, routing, scheduling, and similar issues. The OPTMODEL procedure in SAS/OR® software provides unified access to a wide range of optimization solvers and supports both standard and customized optimization algorithms. This paper illustrates PROC OPTMODEL's power and versatility in building and solving optimization models and describes the significant improvements that result from PROC OPTMODEL's many new features. Highlights include the recently added support for the network solver, the constraint programming solver, and the COFOR statement, which allows parallel execution of independent solver calls. Best practices for solving complex problems that require access to more than one solver are also demonstrated.
Read the paper (PDF). | Download the data file (ZIP).
Rob Pratt, SAS
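As a self-contained taste of the PROC OPTMODEL syntax the paper builds on, the sketch below formulates and solves a toy product-mix linear program; the products, coefficients, and capacities are invented for illustration.

/* Toy linear program: choose production quantities to maximize profit        */
/* subject to machine-hour and labor-hour capacities.                         */
proc optmodel;
   set <str> PRODUCTS = {'chairs', 'tables'};
   num profit{PRODUCTS}      = [45 80];
   num machine_hrs{PRODUCTS} = [2 5];
   num labor_hrs{PRODUCTS}   = [3 4];

   var Make{PRODUCTS} >= 0;

   max TotalProfit = sum{p in PRODUCTS} profit[p] * Make[p];
   con MachineCap: sum{p in PRODUCTS} machine_hrs[p] * Make[p] <= 400;
   con LaborCap:   sum{p in PRODUCTS} labor_hrs[p]   * Make[p] <= 480;

   solve with lp;                   /* invokes the linear programming solver */
   print Make;
quit;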
Paper SAS1855-2015:
Using the PHREG Procedure to Analyze Competing-Risks Data
Competing risks arise in studies in which individuals are subject to a number of potential failure events and the occurrence of one event might impede the occurrence of other events. For example, after a bone marrow transplant, a patient might experience a relapse or might die while in remission. You can use one of the standard methods of survival analysis, such as the log-rank test or Cox regression, to analyze competing-risks data, whereas other methods, such as the product-limit estimator, might yield biased results. An increasingly common practice of assessing the probability of a failure in competing-risks analysis is to estimate the cumulative incidence function, which is the probability subdistribution function of failure from a specific cause. This paper discusses two commonly used regression approaches for evaluating the relationship of the covariates to the cause-specific failure in competing-risks data. One approach models the cause-specific hazard, and the other models the cumulative incidence. The paper shows how to use the PHREG procedure in SAS/STAT® software to fit these models.
Read the paper (PDF).
Ying So, SAS
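A hedged sketch of the two approaches follows, assuming a bone-marrow-transplant-style data set bmt in which status is coded 0 for censored, 1 for relapse, and 2 for death in remission; the variable names are illustrative only.

/* Cause-specific hazard model for relapse: the competing event (death in     */
/* remission) is treated as censored by listing code 2 among the censoring    */
/* values.                                                                    */
proc phreg data=bmt;
   class disease_group;
   model time*status(0,2) = disease_group age;
run;

/* Fine and Gray model for the cumulative incidence of relapse: EVENTCODE=    */
/* names the event of interest, and other nonzero codes are competing risks.  */
proc phreg data=bmt;
   class disease_group;
   model time*status(0) = disease_group age / eventcode=1;
run;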
Paper 3376-2015:
Using the SAS-PIRT Macro for Estimating the Parameters of Polytomous Items
Polytomous items have been widely used in educational and psychological settings. As a result, the demand for statistical programs that estimate the parameters of polytomous items has been increasing. For this purpose, Samejima (1969) proposed the graded response model (GRM), in which category characteristic curves are characterized by the difference of the two adjacent boundary characteristic curves. In this paper, we show how the SAS-PIRT macro (a SAS® macro written in SAS/IML®) was developed based on the GRM and how it performs in recovering the parameters of polytomous items using simulated data.
Read the paper (PDF).
Sung-Hyuck Lee, ACT, Inc.
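For readers new to the model, the relationship the abstract refers to can be written compactly. In the GRM, for an item $i$ with ordered categories $k = 0, 1, \ldots, m_i$, the probability of responding in category $k$ is the difference of two adjacent boundary (cumulative) curves:
\[
P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta),
\qquad
P^{*}_{ik}(\theta) = \frac{1}{1 + \exp\{-a_i(\theta - b_{ik})\}},
\]
with the conventions $P^{*}_{i0}(\theta) = 1$ and $P^{*}_{i,m_i+1}(\theta) = 0$, where $a_i$ is the item discrimination and the $b_{ik}$ are the ordered category boundary parameters.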
V
Paper SAS1888-2015:
Visualizing Clinical Trial Data: Small Data, Big Insights
Data visualization is synonymous with big data, for which billions of records and millions of variables are analyzed simultaneously. But that does not mean data scientists analyzing clinical trial data that include only thousands of records and hundreds of variables cannot take advantage of data visualization methodologies. This paper presents a comprehensive process for loading standard clinical trial data into SAS® Visual Analytics, an interactive analytic solution. The process implements template reporting for a wide variety of point-and-click visualizations. Data operations required to support this reporting are explained and examples of the actual visualizations are presented so that users can implement this reporting using their own data.
Read the paper (PDF).
Michael Drutar, SAS
Elliot Inman, SAS
W
Paper 3600-2015:
When Two Are Better Than One: Fitting Two-Part Models Using SAS
In many situations, an outcome of interest has a large number of zero outcomes and a group of nonzero outcomes that are discrete or highly skewed. For example, in modeling health care costs, some patients have zero costs, and the distribution of positive costs is often extremely right-skewed. When modeling charitable donations, many potential donors give nothing, and the majority of donations are relatively small, with a few very large donors. In the analysis of count data, there are also times when there are more zeros than would be expected using standard methodology, or cases where the zeros might differ substantially from the non-zeros, such as the number of cavities a patient has at a dental appointment or the number of children born to a mother. If data have such structure and ordinary least squares methods are used, then predictions and estimates might be inaccurate. The two-part model provides a flexible and useful modeling framework in many such situations. Methods for fitting these models with SAS® software are illustrated.
Read the paper (PDF).
Laura Kapitula, Grand Valley State University
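A minimal sketch of the two-part approach for a cost-type outcome is given below, assuming a data set costs with the outcome total_cost and a few covariates; the variable names are placeholders, and the distributional choices (logistic for the zero part, gamma with a log link for the positive part) are one common option among several.

/* Part 1: logistic model for the probability of incurring any cost.          */
data costs2;
   set costs;
   any_cost = (total_cost > 0);
run;

proc logistic data=costs2 descending;
   class insurance / param=ref;
   model any_cost = age insurance chronic_conditions;
run;

/* Part 2: gamma regression with a log link, fit to the positive costs only.  */
proc genmod data=costs2;
   where total_cost > 0;
   class insurance / param=ref;
   model total_cost = age insurance chronic_conditions / dist=gamma link=log;
run;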