Health Care Papers A-Z

A
Paper 1603-2015:
A Model Family for Hierarchical Data with Combined Normal and Conjugate Random Effects
Non-Gaussian outcomes are often modeled using members of the so-called exponential family. Well-known members are the Bernoulli model for binary data, leading to logistic regression, and the Poisson model for count data, leading to Poisson regression. Two of the main reasons for extending this family are (1) the occurrence of overdispersion, meaning that the variability in the data is not adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of hierarchical structure in the data, stemming from clustering in the data, which, in turn, might result from repeatedly measuring the outcome, for various members of the same family, and so on. The first issue is dealt with through a variety of overdispersion models, such as the beta-binomial model for grouped binary data and the negative-binomial model for counts. Clustering is often accommodated through the inclusion of random subject-specific effects. Such random effects are conventionally, though not always, assumed to be normally distributed. While both of these phenomena might occur simultaneously, models combining them are uncommon. This paper proposes a broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. We place particular emphasis on so-called conjugate random effects at the level of the mean for the first aspect and normal random effects embedded within the linear predictor for the second aspect, even though our family is more general. The binary, count, and time-to-event cases are given particular emphasis. Apart from model formulation, we present an overview of estimation methods, and then settle for maximum likelihood estimation with analytic-numerical integration. Implications for the derivation of marginal correlation functions are discussed. The methodology is applied to data from a study of epileptic seizures, a clinical trial for a toenail infection named onychomycosis, and survival data in children with asthma.
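As an illustrative sketch (notation ours, not taken from the paper), the count case of such a combined model can be written in LaTeX form as:

    Y_{ij} \mid b_i, \theta_{ij} \sim \mathrm{Poisson}\!\left( \theta_{ij}\,\exp\!\left( x_{ij}^{\top}\beta + z_{ij}^{\top} b_i \right) \right),
    \qquad \theta_{ij} \sim \mathrm{Gamma}(\alpha, \delta),
    \qquad b_i \sim N(0, D),

where the gamma-distributed (conjugate) random effects capture overdispersion at the level of the mean and the normal random effects in the linear predictor capture clustering.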
Read the paper (PDF). | Watch the recording.
Geert Molenberghs, Universiteit Hasselt & KU Leuven
Paper 3560-2015:
A SAS Macro to Calculate the PDC Adjustment of Inpatient Stays
The Centers for Medicare & Medicaid Services (CMS) uses the Proportion of Days Covered (PDC) to measure medication adherence. There is also some PDC-related research based on Medicare Part D Event (PDE) data. However, under Medicare rules, beneficiaries who receive care at an inpatient (IP) facility may receive Medicare-covered medications directly from the IP facility, rather than by filling prescriptions through their Part D contracts; thus, their medication fills during an IP stay would not be included in the PDE claims used to calculate the Patient Safety adherence measures (Medicare 2014 Part C&D Star Rating technical notes). Therefore, the previous PDC calculation method underestimated the true PDC value. Starting with the 2013 Star Rating, the PDC calculation was adjusted for IP stays. That is, when a patient has an inpatient admission during the measurement period, the inpatient stays are censored for the PDC calculation. If the patient also has measured drug coverage during the inpatient stay, the drug supplied during the inpatient stay is shifted to after the inpatient stay. This shifting can also cause a chain of further shifts. This paper presents a SAS® macro that uses the SAS hash object to match inpatient stays, censor the inpatient stays, shift the drug start and end dates, and calculate the adjusted PDC.
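A minimal sketch of the hash-object matching step described above, assuming hypothetical data sets IPSTAYS (one inpatient stay per member, with ADMIT_DT and DISCH_DT) and PDE_FILLS (with FILL_DT); the actual macro also handles multiple stays per member and the chained shifts the abstract mentions:

    data fills_adj;
       if _n_ = 1 then do;
          if 0 then set ipstays(keep=member_id admit_dt disch_dt);
          declare hash ip(dataset:'ipstays');
          ip.definekey('member_id');
          ip.definedata('admit_dt', 'disch_dt');
          ip.definedone();
       end;
       set pde_fills;
       /* if the fill falls inside an inpatient stay, shift it past discharge */
       if ip.find() = 0 and admit_dt <= fill_dt <= disch_dt then
          fill_dt = disch_dt + 1;
    run;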
Read the paper (PDF). | Download the data file (ZIP).
anping Chang, IHRC Inc.
Paper 2141-2015:
A SAS® Macro to Compare Predictive Values of Diagnostic Tests
Medical tests are used for various purposes, including diagnosis, prognosis, risk assessment, and screening. Statistical methodology is often used to evaluate such tests; the most frequently used measures for binary data are sensitivity, specificity, and positive and negative predictive values. An important goal in diagnostic medicine research is to estimate and compare the accuracies of such tests. In this paper, I give a gentle introduction to measures of diagnostic test accuracy and introduce a SAS® macro to calculate the generalized score statistic and the weighted generalized score statistic for comparison of predictive values, using formulas generalized and proposed by Andrzej S. Kosinski.
Read the paper (PDF).
Lovedeep Gondara, University of Illinois Springfield
Paper 2980-2015:
A Set of SAS® Macros for Generating Survival Analysis Reports for Lifetime Data with or without Competing Risks
This paper introduces a set of SAS® macros, %LIFETEST and %LIFETESTEXPORT, that generate survival analysis reports for data with or without competing risks. The macros provide a wrapper for PROC LIFETEST and an enhanced version of the SAS autocall macro %CIF, giving users an easy-to-use interface for reporting both survival estimates and cumulative incidence estimates in a unified way. The macros also provide a number of parameters that enable users to flexibly adjust how the final reports look, without the need to manually input or format the final reports.
Read the paper (PDF). | Download the data file (ZIP).
Zhen-Huan Hu, Medical College of Wisconsin
Paper 3311-2015:
Adaptive Randomization Using PROC MCMC
Based on work by Thall et al. (2012), we implement a method for randomizing patients in a Phase II trial. We accumulate evidence that identifies which dose(s) of a cancer treatment provide the most desirable profile, per a matrix of efficacy and toxicity combinations rated by expert oncologists (0-100). Experts also define the region of Good utility scores and criteria of dose inclusion based on toxicity and efficacy performance. Each patient is rated for efficacy and toxicity at a specified time point. Simulation work is done mainly using PROC MCMC in which priors and likelihood function for joint outcomes of efficacy and toxicity are defined to generate posteriors. Resulting joint probabilities for doses that meet the inclusion criteria are used to calculate the mean utility and probability of having Good utility scores. Adaptive randomization probabilities are proportional to the probabilities of having Good utility scores. A final decision of the optimal dose will be made at the end of the Phase II trial.
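As a simplified, hedged sketch of the Bayesian machinery involved (a single binary efficacy outcome rather than the paper's joint efficacy-toxicity model; data set and variable names are hypothetical), PROC MCMC can generate the posterior draws from which the randomization probabilities are computed:

    proc mcmc data=phase2 nmc=20000 nbi=2000 seed=2015 outpost=post;
       parms beta0 0 beta1 0;                   /* dose-response parameters */
       prior beta0 beta1 ~ normal(0, var=100);  /* vague normal priors      */
       p_eff = logistic(beta0 + beta1*dose);    /* probability of efficacy  */
       model eff ~ binary(p_eff);
    run;

The posterior draws in the OUTPOST= data set can then be summarized into the utility-based probabilities that drive the adaptive randomization.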
Read the paper (PDF).
Qianyi Huang, McDougall Scientific Ltd.
John Amrhein, McDougall Scientific Ltd.
Paper SAS2603-2015:
Addressing AML Regulatory Pressures by Creating Customer Risk Rating Models with Ordinal Logistic Regression
With increasing regulatory emphasis on using more scientific statistical processes and procedures in the Bank Secrecy Act/Anti-Money Laundering (BSA/AML) compliance space, financial institutions are being pressured to replace their heuristic, rule-based customer risk rating models with well-established, academically supported, statistically based models. As part of their enhanced customer due diligence, firms are expected to both rate and monitor every customer for the overall risk that the customer poses. Firms with ineffective customer risk rating models can face regulatory enforcement actions such as matters requiring attention (MRAs); the Office of the Comptroller of the Currency (OCC) can issue consent orders for federally chartered banks; and the Federal Deposit Insurance Corporation (FDIC) can take similar actions against state-chartered banks. Although there is a reasonable amount of information available that discusses the use of statistically based models and adherence to the OCC bulletin Supervisory Guidance on Model Risk Management (OCC 2011-12), there is only limited material about the specific statistical techniques that financial institutions can use to rate customer risk. This paper discusses some of these techniques; compares heuristic, rule-based models and statistically based models; and suggests ordinal logistic regression as an effective statistical modeling technique for assessing customer BSA/AML compliance risk. In discussing the ordinal logistic regression model, the paper addresses data quality and the selection of customer risk attributes, as well as the importance of following the OCC's key concepts for developing and managing an effective model risk management framework. Many statistical models can be used to assign customer risk, but logistic regression, and in this case ordinal logistic regression, is a fairly common and robust statistical method of assigning customers to ordered classifications (such as Low, Medium, High-Low, High-Medium, and High-High risk). Using ordinal logistic regression, a financial institution can create a customer risk rating model that is effective in assigning risk, justifiable to regulators, and relatively easy to update, validate, and maintain.
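A minimal sketch of an ordinal (cumulative logit) model of the kind suggested above, assuming hypothetical customer attributes and an ordered RISK_TIER outcome (for example, Low < Medium < High):

    proc logistic data=customers;
       class business_type geography_risk / param=ref;
       /* with a multi-level ordered response, PROC LOGISTIC fits a
          cumulative logit (proportional odds) model by default      */
       model risk_tier = business_type geography_risk wire_volume cash_intensity;
    run;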
Read the paper (PDF).
Edwin Rivera, SAS
Jim West, SAS
Paper 3197-2015:
All Payer Claims Databases (APCDs) in Data Transparency and Quality Improvement
Since Maine established the first All Payer Claims Database (APCD) in 2003, 10 additional states have established APCDs and 30 others are in development or show strong interest in establishing APCDs. APCDs are generally mandated by legislation, though voluntary efforts exist. They are administered through various agencies, including state health departments or other governmental agencies and private not-for-profit organizations. APCDs receive funding from various sources, including legislative appropriations and private foundations. To ensure sustainability, APCDs must also consider the sale of data access and reports as a source of revenue. With the advent of the Affordable Care Act, there has been an increased interest in APCDs as a data source to aid in health care reform. The call for greater transparency in health care pricing and quality, development of Patient-Centered Medical Homes (PCMHs) and Accountable Care Organizations (ACOs), expansion of state Medicaid programs, and establishment of health insurance and health information exchanges have increased the demand for the type of administrative claims data contained in an APCD. Data collection, management, analysis, and reporting issues are examined with examples from implementations of live APCDs. The development of data intake, processing, warehousing, and reporting standards is discussed in light of achieving the triple aim of improving the individual experience of care; improving the health of populations; and reducing the per capita costs of care. APCDs are compared and contrasted with other sources of state-level health care data, including hospital discharge databases, state departments of insurance records, and institutional and consumer surveys. The benefits and limitations of administrative claims data are reviewed. Specific issues addressed with examples include implementing transparent reporting of service prices and provider quality, maintaining master patient and provider identifiers, validating APCD data and comparing it with other state health care data available to researchers and consumers, defining data suppression rules to ensure patient confidentiality and HIPAA-compliant data release and reporting, and serving multiple end users, including policy makers, researchers, and consumers with appropriately consumable information.
Read the paper (PDF). | Watch the recording.
Paul LaBrec, 3M Health Information Systems
Paper 3294-2015:
An Introduction to Testing for Unit Roots Using SAS®: The Case of U.S. National Health Expenditures
Testing for unit roots and determining whether a data set is nonstationary is important for the economist who does empirical work. SAS® enables the user to detect unit roots using an array of tests: the Dickey-Fuller, Augmented Dickey-Fuller, Phillips-Perron, and the Kwiatkowski-Phillips-Schmidt-Shin test. This paper presents a brief overview of unit roots and shows how to test for a unit root using the example of U.S. national health expenditure data.
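A minimal sketch of how these tests can be requested, assuming a hypothetical data set NHE containing a log-transformed expenditure series LNHE:

    proc arima data=nhe;
       identify var=lnhe stationarity=(adf=(0,1,2));  /* Augmented Dickey-Fuller */
       identify var=lnhe stationarity=(pp=(0,1,2));   /* Phillips-Perron         */
    run;

    proc autoreg data=nhe;
       model lnhe = / stationarity=(kpss);  /* KPSS: null hypothesis of stationarity */
    run;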
Read the paper (PDF). | Download the data file (ZIP).
Don McCarthy, Kaiser Permanente
Paper 2600-2015:
An Introductory Overview of the Features of Complex Survey Data
A complex survey data set is one characterized by any combination of the following four features: stratification, clustering, unequal weights, or finite population correction factors. In this paper, we provide context for why these features might appear in data sets produced from surveys, highlight some of the formulaic modifications they introduce, and outline the syntax needed to properly account for them. Specifically, we explain why you should use the SURVEY family of SAS/STAT® procedures, such as PROC SURVEYMEANS or PROC SURVEYREG, to analyze data of this type. Although many of the syntax examples are drawn from a fictitious expenditure survey, we also discuss the origins of complex survey features in three real-world survey efforts sponsored by statistical agencies of the United States government--namely, the National Ambulatory Medical Care Survey, the National Survey of Family Growth, and the Consumer Building Energy Consumption Survey.
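A minimal sketch of how the four design features map onto SURVEY procedure syntax, using hypothetical variable names from an expenditure survey:

    /* TOTAL= supplies the population size for the finite population correction */
    proc surveymeans data=expenditures total=50000 mean sum clm;
       strata region;        /* stratification                      */
       cluster psu_id;       /* clustering (primary sampling units) */
       weight samp_wt;       /* unequal weights                     */
       var annual_expense;
    run;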
Read the paper (PDF). | Watch the recording.
Taylor Lewis, University of Maryland
Paper SAS1759-2015:
An Overview of Econometrics Tools in SAS/ETS®: Explaining the Past and Modeling the Future
The importance of econometrics in the analytics toolkit is increasing every day. Econometric modeling helps uncover structural relationships in observational data. This paper highlights the many recent changes to the SAS/ETS® portfolio that increase your power to explain the past and predict the future. Examples show how you can use Bayesian regression tools for price elasticity modeling, use state space models to gain insight from inconsistent time series, use panel data methods to help control for unobserved confounding effects, and much more.
Read the paper (PDF).
Mark Little, SAS
Kenneth Sanford, SAS
Paper 3473-2015:
Analytics for NHL Hockey
This presentation provides an overview of the advancement of analytics in National Hockey League (NHL) hockey, including how it applies to areas such as player performance, coaching, and injuries. The speaker discusses his analysis on predicting concussions that was featured in the New York Times, as well as other examples of statistical analysis in hockey.
Read the paper (PDF).
Peter Tanner, Capital One
Paper SAS1332-2015:
Analyzing Spatial Point Patterns Using the New SPP Procedure
In many spatial analysis applications (including crime analysis, epidemiology, ecology, and forestry), spatial point process modeling can help you study the interaction between different events and help you model the process intensity (the rate of event occurrence per unit area). For example, crime analysts might want to estimate where crimes are likely to occur in a city and whether they are associated with locations of public features such as bars and bus stops. Forestry researchers might want to estimate where trees grow best and test for association with covariates such as elevation and gradient. This paper describes the SPP procedure, new in SAS/STAT® 13.2, for exploring and modeling spatial point pattern data. It describes methods that PROC SPP implements for exploratory analysis of spatial point patterns and for log-linear intensity modeling that uses covariates. It also shows you how to use specialized functions for studying interactions between points and how to use specialized analytical graphics to diagnose log-linear models of spatial intensity. Crime analysis, forestry, and ecology examples demonstrate key features of PROC SPP.
Read the paper (PDF).
Pradeep Mohan, SAS
Randy Tobias, SAS
Paper 3330-2015:
Analyzing and Visualizing the Sentiment of the Ebola Outbreak via Tweets
The Ebola virus outbreak is producing some of the most significant and fastest trending news throughout the globe today. There is a lot of buzz surrounding the deadly disease and the drastic consequences that it potentially poses to mankind. Social media provides the basic platforms for millions of people to discuss the issue and allows them to openly voice their opinions. There has been a significant increase in the magnitude of responses all over the world since the death of an Ebola patient in a Dallas, Texas hospital. In this paper, we aim to analyze the overall sentiment that is prevailing in the world of social media. For this, we extracted the live streaming data from Twitter at two different times using the Python scripting language. One instance relates to the period before the death of the patient, and the other relates to the period after the death. We used SAS® Text Miner nodes to parse, filter, and analyze the data and to get a feel for the patterns that exist in the tweets. We then used SAS® Sentiment Analysis Studio to further analyze and predict the sentiment of the Ebola outbreak in the United States. In our results, we found that the issue was not taken very seriously until the death of the Ebola patient in Dallas. After the death, we found that prominent personalities across the globe were talking about the disease and then raised funds to fight it. We are continuing to collect tweets. We analyze the locations of the tweets to produce a heat map that corresponds to the intensity of the varying sentiment across locations.
Read the paper (PDF).
Dheeraj Jami, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Shivkanth Lanka, Oklahoma State University
Paper 3426-2015:
Application of Text Mining on Tweets to Collect Rational Information about Type-2 Diabetes
Twitter is a powerful form of social media for sharing information about various issues and can be used to raise awareness and collect pointers about associated risk factors and preventive measures. Type-2 diabetes is a national problem in the US. We analyzed Twitter feeds about Type-2 diabetes in order to suggest a rational use of social media with respect to an assertive study of any ailment. To accomplish this task, 900 tweets were collected using Twitter API v1.1 in a Python script. Tweets, follower counts, and user information were extracted via the scripts. The tweets were segregated into different groups on the basis of their annotations related to risk factors, complications, preventions and precautions, and so on. We then used SAS® Text Miner to analyze the data. We found that 70% of the tweets stated the status quo, based on marketing and awareness campaigns. The remaining 30% of tweets contained various key terms and labels associated with type-2 diabetes. It was observed that influential users tweeted more about precautionary measures, whereas non-influential people gave suggestions about treatments as well as preventions and precautions.
Read the paper (PDF).
Shubhi Choudhary, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Vijay Singh, Oklahoma State University
Paper 3354-2015:
Applying Data-Driven Analytics to Relieve Suffering Associated with Natural Disasters
Managing the large-scale displacement of people and communities caused by a natural disaster has historically been reactive rather than proactive. Following a disaster, data is collected to inform and prompt operational responses. In many countries prone to frequent natural disasters such as the Philippines, large amounts of longitudinal data are collected and available to apply to new disaster scenarios. However, because of the nature of natural disasters, it is difficult to analyze all of the data until long after the emergency has passed. For this reason, little research and analysis have been conducted to derive deeper analytical insight for proactive responses. This paper demonstrates the application of SAS® analytics to this data and establishes predictive alternatives that can improve conventional storm responses. Humanitarian organizations can use this data to understand displacement patterns and trends and to optimize evacuation routing and planning. Identifying the main contributing factors and leading indicators for the displacement of communities in a timely and efficient manner prevents detrimental incidents at disaster evacuation sites. Using quantitative and qualitative methods, responding organizations can make data-driven decisions that innovate and improve approaches to managing disaster response on a global basis. The benefits of creating a data-driven analytical model can help reduce response time, improve the health and safety of displaced individuals, and optimize scarce resources in a more effective manner. The International Organization for Migration (IOM), an intergovernmental organization, is one of the first-response organizations on the ground that responds to most emergencies. IOM is the global co-lead for the Camp Coordination and Camp Management (CCCM) cluster in natural disasters. This paper shows how to use SAS® Visual Analytics and SAS® Visual Statistics for the Philippines in response to Super Typhoon Haiyan in November 2013 to develop increasingly accurate models for better emergency preparedness. Using data collected from IOM's Displacement Tracking Matrix (DTM), the final analysis shows how to better coordinate service delivery to evacuation centers sheltering large numbers of displaced individuals, applying accurate hindsight to develop foresight on how to better respond to emergencies and disasters. Predictive models build on patterns found in historical and transactional data to identify risks and opportunities. The capacity to predict trends and behavior patterns related to displacement and mobility has the potential to enable the IOM to respond in a more timely and targeted manner. By predicting the locations of displacement, numbers of persons displaced, number of vulnerable groups, and sites at most risk of security incidents, humanitarians can respond quickly and more effectively with the appropriate resources (material and human) from the outset. The end analysis uses the SAS® Storm Optimization model combined with human mobility algorithms to predict population movement.
Lorelle Yuen, International Organization for Migration
Kathy Ball, Devon Energy
Paper 3327-2015:
Automated Macros to Extract Data from the National (Nationwide) Inpatient Sample (NIS)
The use of administrative databases for understanding practice patterns in the real world has become increasingly apparent. This is essential in the current health-care environment. The Affordable Care Act has helped us to better understand the current use of technology and different approaches to surgery. This paper describes a method for extracting specific information about surgical procedures from the Healthcare Cost and Utilization Project (HCUP) database (also referred to as the National (Nationwide) Inpatient Sample (NIS)). The analyses provide a framework for comparing the different modalities of surgical procedures of interest. Using an NIS database for a single year, we want to identify cohorts based on surgical approach. We do this by identifying the ICD-9 codes specific to robotic surgery, laparoscopic surgery, and open surgery. After we identify the appropriate codes using an ARRAY statement, a similar array is created based on the ICD-9 codes. Any minimally invasive procedure (robotic or laparoscopic) that results in a conversion is flagged as a conversion. Comorbidities are identified by ICD-9 codes representing the severity of each subject and merged with the NIS inpatient core file. Using a FORMAT statement for all diagnosis variables, we create macros that can be regenerated for each type of complication. These macros are compiled in SAS® and stored in the library that contains the four macros called by the tables; the tables invoke the macros with different macro variables. In addition, they create the frequencies for all cohorts and create the table structure with the title and number of each table. This paper describes a systematic method in SAS/STAT® 9.2 to extract the data from NIS using the ARRAY statement for the specific ICD-9 codes, to format the extracted data for the analysis, to merge the different NIS databases by procedures, and to use automatic macros to generate the report.
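A minimal sketch of the ARRAY-based cohort flagging described above (the ICD-9 code lists and variable names are hypothetical placeholders, not the paper's actual code sets):

    data cohorts;
       set nis_core;
       array prcode {15} pr1-pr15;      /* NIS procedure code fields */
       robotic = 0;
       laparoscopic = 0;                /* cohort flags              */
       do i = 1 to dim(prcode);
          if prcode{i} in ('1741' '1742' '1744') then robotic = 1;       /* hypothetical ICD-9 codes */
          if prcode{i} in ('5421' '6851')        then laparoscopic = 1;  /* hypothetical ICD-9 codes */
       end;
       drop i;
    run;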
Read the paper (PDF).
Ravi Tejeshwar Reddy Gaddameedi, California State University, East Bay
Usha Kreaden, Intuitive Surgical
Paper 3050-2015:
Automation of Statistics Summary for Census Data in SAS®
Census data, such as education and income, has been extensively used for various purposes. The data is usually collected in percentages of census unit levels, based on the population sample. Such presentation of the data makes it hard to interpret and compare. A more convenient way of presenting the data is to use the geocoded percentage to produce counts for a pseudo-population. We developed a very flexible SAS® macro to automatically generate the descriptive summary tables for the census data as well as to conduct statistical tests to compare the different levels of the variable by groups. The SAS macro is not only useful for census data but can be used to generate summary tables for any data with percentages in multiple categories.
Read the paper (PDF).
Janet Lee, Kaiser Permanente Southern California
B
Paper 3643-2015:
Before and After Models in Observational Research Using Random Slopes and Intercepts
In observational data analyses, it is often helpful to use patients as their own controls by comparing their outcomes before and after some signal event, such as the initiation of a new therapy. It might be useful to have a control group that does not have the event but that is instead evaluated before and after some arbitrary point in time, such as their birthday. In this context, the change over time is a continuous outcome that can be modeled as a (possibly discontinuous) line, with the same or different slope before and after the event. Mixed models can be used to estimate random slopes and intercepts and compare patients between groups. A specific example published in a peer-reviewed journal is presented.
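A hedged sketch of one way to set up such a model in PROC MIXED, assuming hypothetical variables TIME (time from the signal event), POST (0/1 indicator for after the event), and TIME_POST (time since the event, 0 beforehand):

    proc mixed data=outcomes;
       class patid group;
       model y = group time post time_post
                 group*time group*post group*time_post / solution;
       random intercept time / subject=patid type=un;  /* random intercepts and slopes */
    run;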
Read the paper (PDF).
David Pasta, ICON Clinical Research
Paper 3022-2015:
Benefits from the Conversion of Medicaid Encounter Data Reporting System to SAS® Enterprise Guide®
Kaiser Permanente Northwest is contractually obligated for regulatory submissions to Oregon Health Authority, Health Share of Oregon, and Molina Healthcare in Washington. The submissions consist of Medicaid Encounter data for medical and pharmacy claims. SAS® programs are used to extract claims data from Kaiser's claims data warehouse, process the data, and produce output files in HIPAA ASC X12 and NCPDP format. Prior to April 2014, programs were written in SAS® 8.2 running on a VAX server. Several key drivers resulted in the conversion of the existing system to SAS® Enterprise Guide® 5.1 running on UNIX. These drivers were: the need to have a scalable system in preparation for the Affordable Care Act (ACA); performance issues with the existing system; incomplete process reporting and notification to business owners; and a highly manual, labor-intensive process of running individual programs. The upgraded system addressed these drivers. The estimated cost reduction was from $1.30 per reported encounter to $0.13 per encounter. The converted system provides for better preparedness for the ACA. One expected result of ACA is significant Medicaid membership growth. The program has already increased in size by 50% in the preceding 12 months. The updated system allows for the expected growth in membership.
Read the paper (PDF).
Eric Sather, Kaiser Permanente
Paper 3162-2015:
Best Practices: Subset without Getting Upset
You've worked for weeks or even months to produce an analysis suite for a project. Then, at the last moment, someone wants a subgroup analysis, and they inform you that they need it yesterday. This should be easy to do, right? So often, the programs that we write fall apart when we use them on subsets of the original data. This paper takes a look at some of the best practice techniques that can be built into a program at the beginning, so that users can subset on the fly without losing categories or creating errors in statistical tests. We review techniques for creating tables and corresponding titles with BY-group processing so that minimal code needs to be modified when more groups are created. And we provide a link to sample code and sample data that can be used to get started with this process.
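One of the simplest techniques along these lines is BY-group processing with #BYVAL titles, sketched here with hypothetical data set and variable names:

    options nobyline;
    title "Adverse events for treatment group: #byval(trtgrp)";

    proc freq data=adae;
       by trtgrp;                       /* one table per subgroup, no code changes   */
       tables aebodsys*aesev / missing; /* MISSING treats missing as a valid level   */
    run;

    options byline;
    title;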
Read the paper (PDF).
Mary Rosenbloom, Edwards Lifesciences, LLC
Kirk Paul Lafler, Software Intelligence Corporation
C
Paper 1329-2015:
Causal Analytics: Testing, Targeting, and Tweaking to Improve Outcomes
This session is an introduction to predictive analytics and causal analytics in the context of improving outcomes. The session covers the following topics: 1) Basic predictive analytics vs. causal analytics; 2) The causal analytics framework; 3) Testing whether the outcomes improve because of an intervention; 4) Targeting the cases that have the best improvement in outcomes because of an intervention; and 5) Tweaking an intervention in a way that improves outcomes further.
Read the paper (PDF).
Jason Pieratt, Humana
Paper 3291-2015:
Coding Your Own MCMC Algorithm
In Bayesian statistics, Markov chain Monte Carlo (MCMC) algorithms are an essential tool for sampling from probability distributions. PROC MCMC is useful for running these algorithms. However, it is often desirable to code an algorithm from scratch, especially in academia, where students are expected to be able to understand and code an MCMC algorithm. The ability of SAS® to accomplish this is relatively unknown, yet quite straightforward. We use SAS/IML® to demonstrate methods for coding an MCMC algorithm, with examples of a Gibbs sampler and a Metropolis-Hastings random walk.
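A self-contained sketch of a Metropolis-Hastings random walk in SAS/IML®, targeting a standard normal log posterior as a stand-in for whatever posterior is of interest:

    proc iml;
       call randseed(27);

       /* log of the target density: a standard normal placeholder posterior */
       start logpost(theta);
          return( -0.5 * theta # theta );
       finish;

       n     = 10000;                 /* number of MCMC draws  */
       sigma = 0.5;                   /* random-walk step size */
       chain = j(n, 1, .);
       theta = 0;                     /* starting value        */

       do i = 1 to n;
          prop = theta + sigma * randfun(1, "Normal", 0, 1);  /* propose a move       */
          logr = logpost(prop) - logpost(theta);              /* log acceptance ratio */
          if log(randfun(1, "Uniform")) < logr then
             theta = prop;                                    /* accept the proposal  */
          chain[i] = theta;
       end;

       create post from chain[colname="theta"];
       append from chain;
       close post;
    quit;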
Read the paper (PDF).
Chelsea Lofland, University of California Santa Cruz
Paper 1424-2015:
Competing Risk Survival Analysis Using SAS®: When, Why, and How
Competing risks arise in time-to-event data when the event of interest cannot be observed because a preceding event, that is, a competing event, occurs first. For example, if the event of interest is a specific cause of death, then death from any other cause is a competing event; if the focus is on relapse, then death before relapse constitutes a competing event. It is well documented that in the presence of competing risks, the standard product-limit methods yield biased results because their basic assumption is violated. The effect of competing events on parameter estimation depends on their distribution and frequency. Fine and Gray's subdistribution hazard model can be used in the presence of competing events and is available in PROC PHREG with the release of version 9.4 of SAS® software.
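A minimal sketch of the Fine and Gray model in PROC PHREG (SAS/STAT® 13.1 and later), assuming a hypothetical data set in which STATUS is 0 for censored, 1 for the event of interest, and 2 for the competing event:

    proc phreg data=relapse;
       class trt / param=ref;
       model time*status(0) = trt age / eventcode=1;  /* subdistribution hazard for event 1 */
    run;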
Read the paper (PDF).
Lovedeep Gondara, University of Illinois Springfield
Paper SAS1754-2015:
Count Series Forecasting
Many organizations need to forecast large numbers of time series that are discretely valued. These series, called count series, fall approximately between continuously valued time series, for which there are many forecasting techniques (ARIMA, UCM, ESM, and others), and intermittent time series, for which there are a few forecasting techniques (Croston's method and others). This paper proposes a technique for large-scale automatic count series forecasting and uses SAS® Forecast Server and SAS/ETS® software to demonstrate this technique.
Read the paper (PDF).
Michael Leonard, SAS
Paper 2242-2015:
Creative Uses of Vector Plots Using SAS®
Hysteresis loops occur because of a time lag between input and output. In the pharmaceutical industry, hysteresis plots are tools to visualize the time lag between drug concentration and drug effects. Before SAS® 9.2, SAS annotations were used to generate such plots. One of the criticisms was that SAS programmers had to write complex macros to automate the process; code management and validation tasks were not easy. With SAS 9.2, SAS programmers are able to generate such plots with ease. This paper demonstrates the generation of such plots with both Base SAS and SAS/GRAPH® software.
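A hedged sketch of one way to build such a plot with the SGPLOT VECTOR statement (data set and variable names are hypothetical), drawing an arrow from each observation to the next in time order so the direction of the loop is visible:

    proc sort data=pk out=pk_sorted;
       by subject time;
    run;

    data hyst;
       set pk_sorted;
       by subject;
       xo = lag(conc);                      /* origin = previous time point */
       yo = lag(effect);
       if first.subject then call missing(xo, yo);
    run;

    proc sgplot data=hyst;
       vector x=conc y=effect / xorigin=xo yorigin=yo arrowheadshape=open;
       xaxis label="Drug concentration";
       yaxis label="Drug effect";
    run;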
Read the paper (PDF).
Deli Wang, REGENERON
Paper 3249-2015:
Cutpoint Determination Methods in Survival Analysis Using SAS®: Updated %FINDCUT Macro
Statistical analyses that use data from clinical or epidemiological studies include continuous variables such as patient age, blood pressure, and various biomarkers. Over the years, there has been an increase in studies that focus on assessing associations between biomarkers and diseases of interest. Many of the biomarkers are measured as continuous variables. Investigators seek to identify a possible cutpoint to classify patients as high risk versus low risk based on the value of the biomarker. Several data-oriented techniques, such as the median and upper quartile, and outcome-oriented techniques based on score, Wald, and likelihood ratio tests are commonly used in the literature. Contal and O'Quigley (1999) presented a technique that uses the log-rank test statistic to estimate the cutpoint. Their method was computationally intensive and hence was overlooked due to the unavailability of built-in options in standard statistical software. In 2003, we provided the %FINDCUT macro, which uses Contal and O'Quigley's approach to identify a cutpoint when the outcome of interest is measured as time to event. Over the past decade, demand for this macro has continued to grow, which has led us to update the %FINDCUT macro to incorporate new tools and procedures from SAS® such as array processing, the Graph Template Language, and the REPORT procedure. New and updated features include results presented in a much cleaner report format, user-specified cutpoints, macro parameter error checking, temporary data set cleanup, preservation of current option settings, and increased processing speed. We demonstrate the utility and added options of the revised %FINDCUT macro using a real-life data set. In addition, we critically compare this method to some existing methods and discuss the use and misuse of categorizing a continuous covariate.
Read the paper (PDF).
Jay Mandrekar, Mayo Clinic
Jeffrey Meyers, Mayo Clinic
D
Paper 3104-2015:
Data Management Techniques for Complex Healthcare Data
Data sharing through healthcare collaboratives and national registries creates opportunities for secondary data analysis projects. These initiatives provide data for quality comparisons as well as endless research opportunities to external researchers across the country. The possibilities are bountiful when you join data from diverse organizations and look for common themes related to illnesses and patient outcomes. With these great opportunities comes great pain for data analysts and health services researchers tasked with compiling these data sets according to specifications. Patient care data is complex, and, particularly at large healthcare systems, might be managed with multiple electronic health record (EHR) systems. Matching data from separate EHR systems while simultaneously ensuring the integrity of the details of that care visit is challenging. This paper demonstrates how data management personnel can use traditional SAS PROCs in new and inventive ways to compile, clean, and complete data sets for submission to healthcare collaboratives and other data sharing initiatives. Traditional data matching methods such as SPEDIS are uniquely combined with iterative SQL joins using the SAS® functions INDEX, COMPRESS, CATX, and SUBSTR to yield the most out of complex patient and physician name matches. Recoding, correcting missing items, and formatting data can be efficiently achieved by using traditional functions such as MAX, PROC FORMAT, and FIND in new and inventive ways.
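A minimal sketch of the kind of fuzzy name match described above, combining SPEDIS, COMPRESS, and UPCASE inside a PROC SQL join (data set and variable names are hypothetical; the threshold of 20 is illustrative):

    proc sql;
       create table physician_match as
       select a.visit_id, a.md_name as ehr1_name,
              b.md_name as ehr2_name, b.provider_id
       from ehr_one as a inner join ehr_two as b
         on a.service_date = b.service_date
        and spedis( upcase(compress(a.md_name, ' .,-')),
                    upcase(compress(b.md_name, ' .,-')) ) <= 20;
    quit;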
Read the paper (PDF).
Gabriela Cantu, Baylor Scott & White Health
Christopher Klekar, Baylor Scott and White Health
Paper 2660-2015:
Deriving Adverse Event Records Based on Study Treatments
This paper puts forward an approach for creating duplicate records from a single Adverse Event (AE) record based on study treatments. To accomplish this task, a flag is created to check whether duplicate records need to be produced, depending on whether an existing AE spans different treatment periods. If so, the duplicate records are created and their AE dates are derived based on the treatment periods or discontinuation dates.
Read the paper (PDF).
Jonson Jiang, inVentiv Health
E
Paper 2124-2015:
Efficient Ways to Build Up Chains and Waves for Respondent-Driven Sampling Data
Respondent-driven sampling (RDS) is both a sampling method and a data analysis technique. As a sampling method, RDS is a chain-referral technique with strategic recruitment quotas and specific data-gathering requirements. As with other chain-referral techniques (for example, snowball sampling), the chains and waves are the starting point for analysis. But building the chains and waves can still be a daunting task because it involves many transpositions and merges. This paper provides an efficient method of using Base SAS® to build chains and waves.
Read the paper (PDF).
Wen Song, ICF International
Paper SAS1900-2015:
Establishing a Health Analytics Framework
Medicaid programs are the second largest line item in each state's budget. In 2012, they contributed $421.2 billion, or 15 percent of total national healthcare expenditures. With US health care reform at full speed, state Medicaid programs must establish new initiatives that will reduce the cost of healthcare, while providing coordinated, quality care to the nation's most vulnerable populations. This paper discusses how states can implement innovative reform through the use of data analytics. It explains how to establish a statewide health analytics framework that can create novel analyses of health data and improve the health of communities. With solutions such as SAS® Claims Analytics, SAS® Episode Analytics, and SAS® Fraud Framework, state Medicaid programs can transform the way they make business and clinical decisions. Moreover, new payment structures and delivery models can be successfully supported through the use of healthcare analytics. A statewide health analytics framework can support initiatives such as bundled and episodic payments, utilization studies, accountable care organizations, and all-payer claims databases. Furthermore, integrating health data into a single analytics framework can provide the flexibility to support a unique analysis that each state can customize with multiple solutions and multiple sources of data. Establishing a health analytics framework can significantly improve the efficiency and effectiveness of state health programs and bend the healthcare cost curve.
Read the paper (PDF).
Krisa Tailor, SAS
Jeremy Racine
F
Paper SAS1580-2015:
Functional Modeling of Longitudinal Data with the SSM Procedure
In many studies, a continuous response variable is repeatedly measured over time on one or more subjects. The subjects might be grouped into different categories, such as cases and controls. The study of resulting observation profiles as functions of time is called functional data analysis. This paper shows how you can use the SSM procedure in SAS/ETS® software to model these functional data by using structural state space models (SSMs). A structural SSM decomposes a subject profile into latent components such as the group mean curve, the subject-specific deviation curve, and the covariate effects. The SSM procedure enables you to fit a rich class of structural SSMs, which permit latent components that have a wide variety of patterns. For example, the latent components can be different types of smoothing splines, including polynomial smoothing splines of any order and all L-splines up to order 2. The SSM procedure efficiently computes the restricted maximum likelihood (REML) estimates of the model parameters and the best linear unbiased predictors (BLUPs) of the latent components (and their derivatives). The paper presents several real-life examples that show how you can fit, diagnose, and select structural SSMs; test hypotheses about the latent components in the model; and interpolate and extrapolate these latent components.
Read the paper (PDF). | Download the data file (ZIP).
Rajesh Selukar, SAS
G
Paper 1886-2015:
Getting Started with Data Governance
While there has been tremendous progress in technologies related to data storage, high-performance computing, and advanced analytic techniques, organizations have only recently begun to comprehend the importance of parallel strategies that help manage the cacophony of concerns around access, quality, provenance, data sharing, and use. While data governance is not new, the drumbeat around it, along with master data management and data quality, is approaching a crescendo. Intensified by the increase in consumption of information, expectations about ubiquitous access, and highly dynamic visualizations, these factors are also circumscribed by security and regulatory constraints. In this paper, we provide a summary of what data governance is and its importance. We go beyond the obvious and provide practical guidance on what it takes to build out a data governance capability appropriate to the scale, size, and purpose of the organization and its culture. Moreover, we discuss best practices in the form of requirements that highlight what we think is important to consider as you provide that tactical linkage between people, policies, and processes to the actual data lifecycle. To that end, our focus includes the organization and its culture, people, processes, policies, and technology. Further, we include discussions of organizational models as well as the role of the data steward, and provide guidance on how to formalize data governance into a sustainable set of practices within your organization.
Read the paper (PDF). | Watch the recording.
Greg Nelson, ThotWave
Lisa Dodson, SAS
H
Paper 3485-2015:
Health Services Research Using Electronic Health Record Data: A Grad Student How-To Paper
Graduate students encounter many challenges when conducting health services research using real world data obtained from electronic health records (EHRs). These challenges include cleaning and sorting data, summarizing and identifying present-on-admission diagnosis codes, identifying appropriate metrics for risk-adjustment, and determining the effectiveness and cost effectiveness of treatments. In addition, outcome variables commonly used in health service research are not normally distributed. This necessitates the use of nonparametric methods in statistical analyses. This paper provides graduate students with the basic tools for the conduct of health services research with EHR data. We will examine SAS® tools and step-by-step approaches used in an analysis of the effectiveness and cost-effectiveness of the ABCDE (Awakening and Breathing Coordination, Delirium monitoring/management, and Early exercise/mobility) bundle in improving outcomes for intensive care unit (ICU) patients. These tools include the following: (1) ARRAYS; (2) lookup tables; (3) LAG functions; (4) PROC TABULATE; (5) recycled predictions; and (6) bootstrapping. We will discuss challenges and lessons learned in working with data obtained from the EHR. This content is appropriate for beginning SAS users.
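As one example of the listed tools, a hedged sketch of a percentile bootstrap for a cost outcome using PROC SURVEYSELECT (data set and variable names are hypothetical):

    /* draw 1,000 with-replacement resamples of the original data */
    proc surveyselect data=icu_costs out=boot seed=2015
                      method=urs samprate=1 reps=1000 outhits;
    run;

    /* mean cost within each resample */
    proc means data=boot noprint;
       by replicate;
       var total_cost;
       output out=boot_means mean=mean_cost;
    run;

    /* percentile bootstrap confidence limits */
    proc univariate data=boot_means noprint;
       var mean_cost;
       output out=boot_ci pctlpts=2.5 97.5 pctlpre=ci_;
    run;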
Read the paper (PDF).
Ashley Collinsworth, Baylor Scott & White Health/Tulane University
Elisa Priest, Texas A&M University Health Science Center
Paper 3214-2015:
How is Your Health? Using SAS® Macros, ODS Graphics, and GIS Mapping to Monitor Neighborhood and Small-Area Health Outcomes
With the constant need to inform researchers about neighborhood health data, the Santa Clara County Health Department created socio-demographic and health profiles for 109 neighborhoods in the county. Data was pulled from many public and county data sets, compiled, analyzed, and automated using SAS®. With over 60 indicators and 109 profiles, an efficient set of macros was used to automate the calculation of percentages, rates, and mean statistics for all of the indicators. Macros were also used to automate individual census tracts into pre-decided neighborhoods to avoid data entry errors. Simple SQL procedures were used to calculate and format percentages within the macros, and output was pushed out using Output Delivery System (ODS) Graphics. This output was exported to Microsoft Excel, which was used to create a sortable database for end users to compare cities and/or neighborhoods. Finally, the automated SAS output was used to map the demographic data using geographic information system (GIS) software at three geographies: city, neighborhood, and census tract. This presentation describes the use of simple macros and SAS procedures to reduce resources and time spent on checking data for quality assurance purposes. It also highlights the simple use of ODS Graphics to export data to an Excel file, which was used to mail merge the data into 109 unique profiles. The presentation is aimed at intermediate SAS users at local and state health departments who might be interested in finding an efficient way to run and present health statistics given limited staff and resources.
Read the paper (PDF).
Roshni Shah, Santa Clara County
Paper 3143-2015:
How to Cut the Time of Processing of Ryan White Services (RSR) Data by 99% and More.
The widely used method to convert RSR XML data to some standard, ready-to-process database uses a Visual Basic mapper as a buffer tool when reading XML data (for example, into an MS Access database). This paper describes the shortcomings of this method with respect to the different schemas of RSR data and offers a SAS® macro that enables users to read any schema of RSR data directly into a SAS relational database. This macro entirely eliminates the step of creating an MS Access database. Using our macro, the user can cut the time of processing of Ryan White data by 99% and more, depending on the number of files that need to be processed in one run.
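A hedged sketch of reading an RSR XML file directly into SAS with the XMLV2 LIBNAME engine and an automatically generated XMLMap (the file path is hypothetical; the paper's macro wraps this kind of step so that any RSR schema can be handled):

    filename rsrmap temp;
    libname rsrxml xmlv2 "/data/rsr/provider_report.xml"
                   xmlmap=rsrmap automap=replace;

    /* land every XML table as a SAS data set in WORK */
    proc copy in=rsrxml out=work;
    run;

    libname rsrxml clear;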
Read the paper (PDF).
Michael Costa, Abt Associates
Fizza Gillani, Brown University and Lifespan/Tufts/Brown Center for AIDS Research.
Paper 3252-2015:
How to Use SAS® for GMM Logistic Regression Models for Longitudinal Data with Time-Dependent Covariates
In longitudinal data, it is important to account for the correlation due to repeated measures and time-dependent covariates. Generalized method of moments can be used to estimate the coefficients in longitudinal data, although there are currently limited procedures in SAS® to produce GMM estimates for correlated data. In a recent paper, Lalonde, Wilson, and Yin provided a GMM model for estimating the coefficients in this type of data. SAS PROC IML was used to generate equations that needed to be solved to determine which estimating equations to use. In addition, this study extended classifications of moment conditions to include a type IV covariate. Two data sets were evaluated using this method, including re-hospitalization rates from a Medicare database as well as body mass index and future morbidity rates among Filipino children. Both examples contain binary responses, repeated measures, and time-dependent covariates. However, while this technique is useful, it is tedious and can also be complicated when determining the matrices necessary to obtain the estimating equations. We provide a concise and user-friendly macro to fit GMM logistic regression models with extended classifications.
Read the paper (PDF).
Katherine Cai, Arizona State University
I
Paper 3411-2015:
Identifying Factors Associated with High-Cost Patients
Research has shown that the top five percent of patients can account for nearly fifty percent of the total healthcare expenditure in the United States. Using SAS® Enterprise Guide® and PROC LOGISTIC, a statistical methodology was developed to identify factors (for example, patient demographics, diagnostic symptoms, comorbidity, and the type of procedure code) associated with the high cost of healthcare. Analyses were performed using the FAIR Health National Private Insurance Claims (NPIC) database, which contains information about healthcare utilization and cost in the United States. The analyses focused on treatments for chronic conditions, such as trans-myocardial laser revascularization for the treatment of coronary heart disease (CHD) and pressurized inhalation for the treatment of asthma. Furthermore, bubble plots and heat maps were created using SAS® Visual Analytics to provide key insights into potentially high-cost treatments for heart disease and asthma patients across the nation.
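A minimal sketch of the modeling step described above, with hypothetical predictor names standing in for the patient demographic, diagnostic, and procedure attributes:

    proc logistic data=npic_claims;
       class gender region proc_type / param=ref;
       model high_cost(event='1') = age gender region proc_type
                                    comorbidity_count num_er_visits;
    run;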
Read the paper (PDF). | Download the data file (ZIP).
Jeff Dang, FAIR Health
Paper 3356-2015:
Improving the Performance of Two-Stage Modeling Using the Association Node of SAS® Enterprise Miner™ 12.3
Over the years, very few published studies have discussed ways to improve the performance of two-stage predictive models. This study, based on 10 years (1999-2008) of data from 130 US hospitals and integrated delivery networks, is an attempt to demonstrate how we can leverage the Association node in SAS® Enterprise Miner™ to improve the classification accuracy of the two-stage model. We prepared the data with imputation operations and data cleaning procedures. Variable selection methods and domain knowledge were used to choose 43 key variables for the analysis. The prominent association rules revealed interesting relationships between prescribed medications and patient readmission/no-readmission. The rules with lift values greater than 1.6 were used to create dummy variables for use in the subsequent predictive modeling. Next, we used two-stage sequential modeling, where the first stage predicted if the diabetic patient was readmitted and the second stage predicted whether the readmission happened within 30 days. The backward logistic regression model outperformed competing models for the first stage. After including dummy variables from an association analysis, many fit indices improved, such as the validation ASE to 0.228 from 0.238, cumulative lift to 1.56 from 1.40. Likewise, the performance of the second stage was improved after including dummy variables from an association analysis. Fit indices such as the misclassification rate improved to 0.240 from 0.243 and the final prediction error to 0.17 from 0.18.
Read the paper (PDF).
Girish Shirodkar, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Ankita Chaudhari, Oklahoma State University
Paper 3024-2015:
Introduction to SAS® Hash Objects
The SAS® hash object is an incredibly powerful technique for integrating data from two or more data sets based on a common key. This session describes the basic methodology for defining, populating, and using a hash object to perform lookups within the DATA step and provides examples of situations in which the performance of SAS programs is improved by their use. Common problems encountered when using hash objects are explained, and tools and techniques for optimizing hash objects within your SAS program are demonstrated.
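A minimal sketch of the define-populate-find pattern described above, using hypothetical CLAIMS and MEMBERS data sets keyed by MEMBER_ID:

    data claims_with_demog;
       if _n_ = 1 then do;
          if 0 then set members(keep=member_id birth_dt sex); /* define host variables */
          declare hash m(dataset:'members');                  /* load the lookup table */
          m.definekey('member_id');
          m.definedata('birth_dt', 'sex');
          m.definedone();
       end;
       set claims;
       if m.find() = 0;   /* keep only claims with a matching member */
    run;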
Read the paper (PDF).
Chris Schacherer, Clinical Data Management Systems, LLC
K
Paper 2480-2015:
Kaplan-Meier Survival Plotting Macro %NEWSURV
The research areas of pharmaceuticals and oncology clinical trials greatly depend on time-to-event endpoints such as overall survival and progression-free survival. One of the best graphical displays of these analyses is the Kaplan-Meier curve, which can be simple to generate with the LIFETEST procedure but difficult to customize. Journal articles generally prefer that statistics such as median time-to-event, number of patients, and time-point event-free rate estimates be displayed within the graphic itself, and this was previously difficult to do without an external program such as Microsoft Excel. The macro %NEWSURV takes advantage of the Graph Template Language (GTL) that was added with the SG graphics engine to create this level of customizability without the need for back-end manipulation. Taking this one step further, the macro was improved to be able to generate a lattice of multiple unique Kaplan-Meier curves for side-by-side comparisons or for condensing figures for publications. This paper describes the functionality of the macro and describes how the key elements of the macro work.
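For context, a minimal PROC LIFETEST request with built-in ODS Graphics output (variable names are hypothetical); %NEWSURV goes well beyond this by embedding medians, patient counts, and time-point estimates in the figure via GTL:

    ods graphics on;
    proc lifetest data=adtte plots=survival(atrisk cb=hw);
       time months * censor(1);   /* 1 = censored */
       strata trt;
    run;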
Read the paper (PDF).
Jeffrey Meyers, Mayo Clinic
L
Paper SAS1748-2015:
Lost in the Forest Plot? Follow the Graph Template Language AXISTABLE Road!
A forest plot is a common visualization for meta-analysis. Some popular versions that use subgroups with indented text and bold fonts can seem outright daunting to create. With SAS® 9.4, the Graph Template Language (GTL) has introduced the AXISTABLE statement, specifically designed for including text data columns into a graph. In this paper, we demonstrate the simplicity of creating various forest plots using AXISTABLE statements. Come and see how to create forest plots as clear as day!
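A hedged, minimal sketch of the idea (column names STUDY, LABEL, ODDSRATIO, LCL, and UCL are hypothetical; real forest plots in the paper add subgroups, indentation, and additional text columns):

    proc template;
       define statgraph forest_sketch;
          begingraph;
             entrytitle "Forest plot sketch";
             layout lattice / columns=2 columnweights=(0.35 0.65) rowdatarange=union;
                layout overlay / walldisplay=none
                                 xaxisopts=(display=none)
                                 yaxisopts=(display=none reverse=true);
                   axistable y=study value=label;   /* text column tied to the y axis */
                endlayout;
                layout overlay / yaxisopts=(display=none reverse=true)
                                 xaxisopts=(label="Odds ratio" type=log);
                   scatterplot y=study x=oddsratio / xerrorlower=lcl xerrorupper=ucl;
                   referenceline x=1;
                endlayout;
             endlayout;
          endgraph;
       end;
    run;

    proc sgrender data=meta template=forest_sketch;
    run;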
Read the paper (PDF). | Download the data file (ZIP).
Prashant Hebbar, SAS
M
Paper 2240-2015:
Member-Level Regression Using SAS® Enterprise Guide® and SAS® Forecast Studio
The need to measure slight changes in healthcare costs and utilization patterns over time is vital in predictive modeling, forecasting, and other advanced analytics. At BlueCross BlueShield of Tennessee, a method for developing member-level regression slopes creates a better way of identifying these changes across various time spans. The goal is to create multiple metrics at the member level that will indicate when an individual is seeking more or less medical or pharmacy services. Significant increases or decreases in utilization and cost are used to predict the likelihood of acquiring certain conditions, seeking services at particular facilities, and self-engaging in health and wellness. Data setup and compilation consists of calculating a member's eligibility with the health plan and then aggregating cost and utilization of particular services (for example, primary care visits, Rx costs, ER visits, and so on). A member must have at least six months of eligibility for a valid regression slope to be calculated. Linear regression is used to build single-factor models for 6-, 12-, 18-, and 24-month time spans if the appropriate amount of data is available for the member. Models are built at the member-metric-time-period level, resulting in the possibility of over 75 regression coefficients per member per monthly run. The computing power needed to execute such a vast amount of calculations requires in-database processing of various macro processes. SAS® Enterprise Guide® is used to structure the data and SAS® Forecast Studio is used to forecast trends at a member level. Algorithms are run on the first of each month. Data is stored so that each metric and corresponding slope is appended on a monthly basis. Because the data is set up for the member regression algorithm, slopes are interpreted in the following manner: a positive value for -1*slope indicates an increase in utilization/cost; a negative value for -1*slope indicates a decrease in utilization/cost. The actual slope value indicates the intensity of the change in cost and utilization. The insight provided by this member-level regression methodology replaces subjective methods that used arbitrary thresholds of change to measure differences in cost and utilization.
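A minimal sketch of generating per-member slopes with BY-group regression (hypothetical data set and metric names; the production process repeats this across many metrics and time spans):

    proc sort data=member_monthly;
       by member_id month_index;
    run;

    /* one regression per member; the MONTH_INDEX coefficient in the
       OUTEST= data set is that member's utilization slope            */
    proc reg data=member_monthly outest=member_slopes noprint;
       by member_id;
       model pcp_visits = month_index;
    run;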
Read the paper (PDF).
Leigh McCormack, BCBST
Prudhvidhar Perati, BlueCross BlueShield of TN
Paper 3760-2015:
Methodological and Statistical Issues in Provider Performance Assessment
With the move to value-based benefit and reimbursement models, it is essential to quantify the relative cost, quality, and outcome of a service. Accurately measuring the cost and quality of doctors, practices, and health systems is critical when you are developing a tiered network, a shared savings program, or a pay-for-performance incentive. Limitations in claims payment systems require developing methodological and statistical techniques to improve the validity and reliability of provider's scores on cost and quality of care. This talk discusses several key concepts in the development of a measurement system for provider performance, including measure selection, risk adjustment methods, and peer group benchmark development.
Read the paper (PDF). | Watch the recording.
Daryl Wansink, Qualmetrix, Inc.
Paper 2400-2015:
Modeling Effect Modification and Higher-Order Interactions: A Novel Approach for Repeated Measures Design Using the LSMESTIMATE Statement in SAS® 9.4
Effect modification occurs when the association between a predictor of interest and the outcome is differential across levels of a third variable--the modifier. Effect modification is statistically tested as the interaction effect between the predictor and the modifier. In repeated measures studies (with more than two time points), higher-order (three-way) interactions must be considered to test effect modification by adding time to the interaction terms. Custom fitting and constructing these repeated measures models are difficult and time consuming, especially with respect to estimating post-fitting contrasts. With the advancement of the LSMESTIMATE statement in SAS®, a simplified approach can be used to custom test for higher-order interactions with post-fitting contrasts within a mixed model framework. This paper provides a simulated example with tips and techniques for using an application of the nonpositional syntax of the LSMESTIMATE statement to test effect modification in repeated measures studies. This approach, which is applicable to exploring modifiers in randomized controlled trials (RCTs), goes beyond the treatment effect on outcome to a more functional understanding of the factors that can enhance, reduce, or change this relationship. Using this technique, we can easily identify differential changes for specific subgroups of individuals or patients that subsequently impact treatment decision making. We provide examples of conventional approaches to higher-order interaction and post-fitting tests using the ESTIMATE statement and compare and contrast this to the nonpositional syntax of the LSMESTIMATE statement. The merits and limitations of this approach are discussed.
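A hedged sketch of the nonpositional syntax for a 2 x 2 x 2 design (hypothetical factors TRT, MODIFIER, and TIME; each bracket is [coefficient, trt-level modifier-level time-level]):

    proc mixed data=repeated;
       class patid trt modifier time;
       model y = trt|modifier|time / solution;
       repeated time / subject=patid type=un;

       /* difference-in-differences of the TRT x TIME effect across MODIFIER levels */
       lsmestimate trt*modifier*time 'effect modification over time'
          [ 1, 1 1 1] [-1, 1 1 2] [-1, 2 1 1] [ 1, 2 1 2]
          [-1, 1 2 1] [ 1, 1 2 2] [ 1, 2 2 1] [-1, 2 2 2] / e;
    run;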
Read the paper (PDF). | Download the data file (ZIP).
Pronabesh DasMahapatra, PatientsLikeMe Inc.
Ryan Black, NOVA Southeastern University
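A minimal sketch of the nonpositional LSMESTIMATE syntax for a three-way (treatment-by-modifier-by-time) contrast in a repeated measures mixed model; the data set, variable names, and number of levels are assumed for illustration and do not reproduce the authors' models.

    proc mixed data=trial_long;
       class id trt modifier time;
       model outcome = trt|modifier|time / ddfm=kr;
       repeated time / subject=id type=un;
       /* Nonpositional syntax: each bracket is [coefficient, trt-level modifier-level time-level]. */
       /* Difference of treatment effects between modifier levels at the third time point.          */
       lsmestimate trt*modifier*time 'effect modification at time 3'
                   [ 1, 1 1 3] [-1, 2 1 3]
                   [-1, 1 2 3] [ 1, 2 2 3];
    run;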
Paper 2900-2015:
Multiple Ways to Detect Differential Item Functioning in SAS®
Differential item functioning (DIF), as an assessment tool, has been widely used in quantitative psychology, educational measurement, business management, insurance, and health care. The purpose of DIF analysis is to detect response differences of items in questionnaires, rating scales, or tests across different subgroups (for example, gender) and to ensure the fairness and validity of each item for those subgroups. The goal of this paper is to demonstrate several ways to conduct DIF analysis by using different SAS® procedures (PROC FREQ, PROC LOGISTIC, PROC GENMOD, PROC GLIMMIX, and PROC NLMIXED) and their applications. There are three general methods to examine DIF: generalized Mantel-Haenszel (MH), logistic regression, and item response theory (IRT). The SAS® System provides flexible procedures for all these approaches. There are two types of DIF: uniform DIF, which remains consistent across ability levels, and non-uniform DIF, which varies across ability levels. Generalized MH is a nonparametric method and is often used to detect uniform DIF, while the other two are parametric methods and examine both uniform and non-uniform DIF. In this study, I first describe the underlying theories and mathematical formulations for each method. Then I show the SAS statements, input data format, and SAS output for each method, followed by a detailed demonstration of the differences among the three methods. Specifically, PROC FREQ is used to calculate generalized MH only for dichotomous items. PROC LOGISTIC and PROC GENMOD are used to detect DIF by using logistic regression. PROC NLMIXED and PROC GLIMMIX are used to examine DIF by applying an exploratory item response theory model. Finally, I use SAS/IML® to call two R packages (that is, difR and lordif) to conduct DIF analysis and then compare the results between SAS procedures and R packages. An example data set, the Verbal Aggression assessment, which includes 316 subjects and 24 items, is used in this study. Following the general DIF analysis, the male group is used as the reference group, and the female group is used as the focal group. All the analyses are conducted in SAS® 9.3 and R 2.15.3. The paper closes with the conclusion that the SAS System provides flexible and efficient ways to conduct DIF analysis. However, it is essential for SAS users to understand the underlying theories and assumptions of different DIF methods and apply them appropriately in their DIF analyses.
Read the paper (PDF).
Yan Zhang, Educational Testing Service
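A minimal sketch of the logistic regression approach for a single item; the data set, item, and scoring variables are hypothetical. A significant group main effect suggests uniform DIF, and a significant score-by-group interaction suggests non-uniform DIF.

    /* item1: scored 0/1 response; total: total (or rest) score; gender: grouping variable */
    proc logistic data=verbal;
       class gender (param=ref ref='Male');
       model item1(event='1') = total gender total*gender;
    run;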
N
Paper 3253-2015:
Need Additional Statistics in That Report? ODS OUTPUT to the Rescue!
You might be familiar with or experienced in writing or running reports using PROC REPORT, PROC TABULATE, or other methods of report generation. These reporting methods are often very flexible, but they can be limited in the statistics that are available as options for inclusion in the resulting output. SAS® provides the capability to produce a variety of statistics through Base SAS® and SAS/STAT® procedures by using ODS OUTPUT. These procedures include statistics from PROC CORR, PROC FREQ, and PROC UNIVARIATE in Base SAS, as well as PROC GLM, PROC LIFETEST, PROC MIXED, PROC LOGISTIC, and PROC TTEST in SAS/STAT. A number of other procedures can also produce useful ODS OUTPUT objects. Commonly requested statistics for reports include p-values, confidence intervals, and test statistics. These values can be computed with the appropriate procedure, and ODS OUTPUT can then be used to write the desired information to a data set so that it can be combined with the other data used to produce the report. Examples that demonstrate how to easily generate the desired statistics and include them in the requested final reports are provided and discussed.
Read the paper (PDF).
Debbie Buck, inVentiv Health Clinical
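A minimal sketch of the ODS OUTPUT workflow using PROC TTEST; the data set and variable names are placeholders. ODS TRACE reveals the table names a procedure creates, and ODS OUTPUT writes a named table to a data set that can then be merged with the reporting data.

    ods trace on;                        /* lists ODS table names in the log */

    ods output TTests=ttest_stats;       /* capture the t-test table as a data set */
    proc ttest data=labs;
       class trt;
       var chol ldl;
    run;

    ods trace off;

    /* ttest_stats now holds Variable, Method, tValue, DF, and Probt,       */
    /* ready to be sorted and merged with the summary data for PROC REPORT. */
    proc sort data=ttest_stats;
       by Variable;
    run;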
Paper 3455-2015:
Nifty Uses of SQL Reflexive Join and Subquery in SAS®
SAS® SQL is so powerful that you hardly miss using Oracle PL/SQL. One SAS SQL forte can be found in using the SQL reflexive join. Another area of SAS SQL strength is the SQL subquery concept. The focus of this paper is to show alternative approaches to data reporting and to show how to surface data quality problems using reflexive join and subquery SQL concepts. The target audience for this paper is the intermediate SAS programmer or the experienced ANSI SQL programmer new to SAS programming.
Read the paper (PDF).
Cynthia Trinidad, Theorem Clinical Research
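A minimal sketch of the two SQL techniques named above, applied to hypothetical clinical data sets; the table and column names are illustrative only.

    proc sql;
       /* Reflexive (self) join: subjects entered more than once with conflicting birth dates */
       select a.subject_id, a.birth_dt as birth_dt_1, b.birth_dt as birth_dt_2
          from demog as a, demog as b
          where a.subject_id = b.subject_id
            and a.birth_dt   < b.birth_dt;

       /* Subquery: visit records for subjects missing from the demographics table */
       select v.subject_id, v.visit_dt
          from visits as v
          where v.subject_id not in (select subject_id from demog);
    quit;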
O
Paper 3296-2015:
Out of Control! A SAS® Macro to Recalculate QC Statistics
SAS/QC® provides procedures, such as PROC SHEWHART, to produce control charts with centerlines and control limits. When quality improvement initiatives shift a process out of control in the direction of improvement, centerlines and control limits need to be recalculated. While this is not a complicated process, producing many charts with multiple centerline shifts can quickly become difficult. This paper illustrates the use of a macro to efficiently compute centerlines and control limits when one or more recalculations are needed for multiple charts.
Read the paper (PDF).
Jesse Pratt, Cincinnati Children's Hospital Medical Center
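A minimal sketch of one way to recalculate limits after a known shift, estimating them separately for each phase of the process; the data set, variable names, and phase assignments are assumed, and the paper's macro automates this bookkeeping across many charts and shift points.

    proc sort data=measures;
       by phase batch;
    run;

    /* Estimate and plot centerlines and control limits separately for each phase */
    proc shewhart data=measures;
       by phase;
       xschart result*batch / outlimits=limits;   /* saves the recalculated limits */
    run;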
P
Paper 1884-2015:
Practical Implications of Sharing Data: A Primer on Data Privacy, Anonymization, and De-Identification
Researchers, patients, clinicians, and other health-care industry participants are forging new models for data-sharing in hopes that the quantity, diversity, and analytic potential of health-related data for research and practice will yield new opportunities for innovation in basic and translational science. Whether we are talking about medical records (for example, EHR, lab, notes), administrative data (claims and billing), social (on-line activity), behavioral (fitness trackers, purchasing patterns), contextual (geographic, environmental), or demographic data (genomics, proteomics), it is clear that as health-care data proliferates, threats to security grow. Beginning with a review of the major health-care data breaches in our recent history, we highlight some of the lessons that can be gleaned from these incidents. In this paper, we talk about the practical implications of data sharing and how to ensure that only the right people have the right access to the right level of data. To that end, we explore not only the definitions of concepts like data privacy, but we discuss, in detail, methods that can be used to protect data--whether inside our organization or beyond its walls. In this discussion, we cover the fundamental differences between encrypted data, 'de-identified', 'anonymous', and 'coded' data, and methods to implement each. We summarize the landscape of maturity models that can be used to benchmark your organization's data privacy and protection of sensitive data.
Read the paper (PDF). | Watch the recording.
Greg Nelson, ThotWave
Paper 3326-2015:
Predicting Hospitalization of a Patient Using SAS® Enterprise Miner™
Inpatient treatment is the most common type of treatment ordered for patients who have a serious ailment and need immediate attention. Using a data set about diabetes patients downloaded from the UCI Network Data Repository, we built a model to predict the probability that the patient will be rehospitalized within 30 days of discharge. The data has about 100,000 rows and 51 columns. In our preliminary analysis, a neural network turned out to be the best model, followed closely by the decision tree model and regression model.
Nikhil Kapoor, Oklahoma State University
Ganesh Kumar Gangarajula, Oklahoma State University
Paper 3254-2015:
Predicting Readmission of Diabetic Patients Using the High-Performance Support Vector Machine Algorithm of SAS® Enterprise Miner™ 13.1
Diabetes is a chronic condition that affects people of all ages and is estimated to affect around 25.8 million people in the U.S. The objective of this research is to predict the probability of a diabetic patient being readmitted. The results from this research will help hospitals design a follow-up protocol to ensure that patients having a higher re-admission probability are doing well in order to promote a healthy doctor-patient relationship. The data was obtained from the Center for Machine Learning and Intelligent Systems at the University of California, Irvine. The data set contains over 100,000 instances and 55 variables, such as insulin and length of stay. The data set was split into training and validation to provide an honest assessment of models. Various variable selection techniques such as stepwise regression, forward regression, LARS, and LASSO were used. Using LARS, prominent factors were identified in determining the patient readmission rate. Numerous predictive models were built: Decision Tree, Logistic Regression, Gradient Boosting, MBR, SVM, and others. The model comparison algorithm in SAS® Enterprise Miner™ 13.1 recognized that the High-Performance Support Vector Machine outperformed the other models, having the lowest misclassification rate of 0.363. The chosen model has a sensitivity of 49.7% and a specificity of 75.1% in the validation data.
Read the paper (PDF).
Hephzibah Munnangi, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Paper 3247-2015:
Privacy, Transparency, and Quality Improvement in the Era of Big Data and Health Care Reform
The era of big data and health care reform is an exciting and challenging time for anyone whose work involves data security, analytics, data visualization, or health services research. This presentation examines important aspects of current approaches to quality improvement in health care based on data transparency and patient choice. We look at specific initiatives related to the Affordable Care Act (for example, the qualified entity program of section 10332 that allows the Centers for Medicare and Medicaid Services (CMS) to provide Medicare claims data to organizations for multi-payer quality measurement and reporting, the open payments program, and state-level all-payer claims databases to inform improvement and public reporting) within the context of a core issue in the era of big data: security and privacy versus transparency and openness. In addition, we examine an assumption that underlies many of these initiatives: data transparency leads to improved choices by health care consumers and increased accountability of providers. For example, recent studies of one component of data transparency, price transparency, show that, although health plans generally offer consumers an easy-to-use cost calculator tool, only about 2 percent of plan members use it. Similarly, even patients with high-deductible plans (presumably those with an increased incentive to do comparative shopping) seek prices for only about 10 percent of their services. Anyone who has worked in analytics, reporting, or data visualization recognizes the importance of understanding the intended audience, and that methodological transparency is as important as the public reporting of the output of the calculation of cost or quality metrics. Although widespread use of publicly reported health care data might not be a realistic goal, data transparency does offer a number of potential benefits: data-driven policy making, informed management of cost and use of services, as well as public health benefits through, for example, the recognition of patterns of disease prevalence and immunization use. Looking at this from a system perspective, we can distinguish five main activities: data collection, data storage, data processing, data analysis, and data reporting. Each of these activities has important components (such as database design for data storage and de-identification and aggregation for data reporting) as well as overarching requirements such as data security and quality assurance that are applicable to all activities. A recent Health Affairs article by CMS leaders noted that the big-data revolution could not have come at a better time, but it also recognizes that challenges remain. Although CMS is the largest single payer for health care in the U.S., the challenges it faces are shared by all organizations that collect, store, analyze, or report health care data. In turn, these challenges are opportunities for database developers, systems analysts, programmers, statisticians, data analysts, and those who provide the tools for public reporting to work together to design comprehensive solutions that inform evidence-based improvement efforts.
Read the paper (PDF).
Paul Gorrell, IMPAQ International
R
Paper 1341-2015:
Random vs. Fixed Effects: Which Technique More Effectively Addresses Selection Bias in Observational Studies
Retrospective case-control studies are frequently used to evaluate health care programs when it is not feasible to randomly assign members to a respective cohort. Without randomization, observational studies are more susceptible to selection bias where the characteristics of the enrolled population differ from those of the entire population. When the participant sample is different from the comparison group, the measured outcomes are likely to be biased. Given this issue, this paper discusses how propensity score matching and random effects techniques can be used to reduce the impact selection bias has on observational study outcomes. All results shown are drawn from an ROI analysis using a participant (cases) versus non-participant (controls) observational study design for a fitness reimbursement program aiming to reduce health care expenditures of participating members.
Read the paper (PDF). | Download the data file (ZIP).
Jess Navratil-Strawn, Optum
Paper 2382-2015:
Reducing the Bias: Practical Application of Propensity Score Matching in Health-Care Program Evaluation
To stay competitive in the marketplace, health-care programs must be capable of reporting the true savings to clients. This is a tall order, because most health-care programs are set up to be available to the client's entire population and thus cannot be conducted as a randomized control trial. In order to evaluate the performance of the program for the client, we use an observational study design that has inherent selection bias due to its inability to randomly assign participants. To reduce the impact of bias, we apply propensity score matching to the analysis. This technique is beneficial to health-care program evaluations because it helps reduce selection bias in the observational analysis and in turn provides a clearer view of the client's savings. This paper explores how to develop a propensity score, evaluate the use of inverse propensity weighting versus propensity matching, and determine the overall impact of the propensity score matching method on the observational study population. All results shown are drawn from a savings analysis using a participant (cases) versus non-participant (controls) observational study design for a health-care decision support program aiming to reduce emergency room visits.
Read the paper (PDF).
Amber Schmitz, Optum
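A minimal sketch of the propensity score step and an inverse-propensity weighting alternative; the data set, covariates, and participation flag are hypothetical, and the full evaluation (matching, balance diagnostics, outcome models) is described in the paper.

    /* Step 1: model the propensity to participate from baseline characteristics */
    proc logistic data=study;
       class gender region / param=ref;
       model participant(event='1') = age gender region baseline_cost chronic_ct;
       output out=ps_scored p=pscore;
    run;

    /* Step 2: inverse-propensity weights (participants: 1/p; non-participants: 1/(1-p)) */
    data ps_weighted;
       set ps_scored;
       if participant = 1 then ipw = 1 / pscore;
       else ipw = 1 / (1 - pscore);
    run;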
Paper 2601-2015:
Replication Techniques for Variance Approximation
Replication techniques such as the jackknife and the bootstrap have become increasingly popular in recent years, particularly within the field of complex survey data analysis. The premise of these techniques is to treat the data set as if it were the population and repeatedly sample from it in some systematic fashion. From each sample, or replicate, the estimate of interest is computed, and the variability of the estimate from the full data set is approximated by a simple function of the variability among the replicate-specific estimates. An appealing feature is that there is generally only one variance formula per method, regardless of the underlying quantity being estimated. The entire process can be efficiently implemented after appending a series of replicate weights to the analysis data set. As will be shown, the SURVEY family of SAS/STAT® procedures can be exploited to facilitate both the task of appending the replicate weights and approximating variances.
Read the paper (PDF).
Taylor Lewis, University of Maryland
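A minimal sketch of variance estimation from replicate weights that have already been appended to the analysis file; the data set, weight names, and number of replicates are placeholders.

    proc surveymeans data=pubfile varmethod=jackknife;
       weight finalwgt;                  /* full-sample analysis weight    */
       repweights repwgt1-repwgt80;      /* 80 jackknife replicate weights */
       var totexp;
    run;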
Paper 3740-2015:
Risk-Adjusting Provider Performance Utilization Metrics
Pay-for-performance programs are putting increasing pressure on providers to better manage patient utilization through care coordination, with the philosophy that good preventive services and routine care can prevent the need for some high-resource services. Evaluation of provider performance frequently includes measures such as acute care events (ER and inpatient), imaging, and specialist services, yet rarely are these indicators adjusted for the underlying risk of providers' patient panel. In part, this is because standard patient risk scores are designed to predict costs, not the probability of specific service utilization. As such, Blue Cross Blue Shield of North Carolina has developed a methodology to model our members' risk of these events in an effort to ensure that providers are evaluated fairly and to prevent our providers from adverse selection practices. Our risk modeling takes into consideration members' underlying health conditions and limited demographic factors during the previous 12 month period, and employs two-part regression models using SAS® software. These risk-adjusted measures will subsequently be the basis of performance evaluation of primary care providers for our Accountable Care Organizations and medical home initiatives.
Read the paper (PDF).
Stephanie Poley, Blue Cross Blue Shield of North Carolina
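A minimal sketch of a two-part specification for one utilization measure; the data set, covariates, and outcome names are assumed and do not reproduce BCBSNC's models.

    /* Part 1: probability of any ER use during the 12-month period */
    proc logistic data=panel;
       class sex cond_grp / param=ref;
       model any_er(event='1') = age sex cond_grp;
    run;

    /* Part 2: expected number of visits among members with any use */
    proc genmod data=panel;
       where any_er = 1;
       class sex cond_grp;
       model er_visits = age sex cond_grp / dist=negbin link=log;
    run;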
S
Paper SAS1972-2015:
Social Media and Open Data Integration through SAS® Visual Analytics and SAS® Text Analytics for Public Health Surveillance
A leading killer in the United States is smoking. Moreover, over 8.6 million Americans live with a serious illness caused by smoking or second-hand smoking. Despite this, over 46.6 million U.S. adults smoke tobacco, cigars, and pipes. The key analytic question in this paper is, How would e-cigarettes affect this public health situation? Can monitoring public opinions of e-cigarettes using SAS® Text Analytics and SAS® Visual Analytics help provide insight into the potential dangers of these new products? Are e-cigarettes an example of Big Tobacco up to its old tricks or, in fact, a cessation product? The research in this paper was conducted on thousands of tweets from April to August 2014. It includes API sources beyond Twitter--for example, indicators from the Health Indicators Warehouse (HIW) of the Centers for Disease Control and Prevention (CDC)--that were used to enrich Twitter data in order to implement a surveillance system developed by SAS® for the CDC. The analysis is especially important to The Office of Smoking and Health (OSH) at the CDC, which is responsible for tobacco control initiatives that help states to promote cessation and prevent initiation in young people. To help the CDC succeed with these initiatives, the surveillance system also: 1) automates the acquisition of data, especially tweets; and 2) applies text analytics to categorize these tweets using a taxonomy that provides the CDC with insights into a variety of relevant subjects. Twitter text data can help the CDC look at the public response to the use of e-cigarettes, and examine general discussions regarding smoking and public health, and potential controversies (involving tobacco exposure to children, increasing government regulations, and so on). SAS® Content Categorization helps health care analysts review large volumes of unstructured data by categorizing tweets in order to monitor and follow what people are saying and why they are saying it. Ultimately, it is a solution intended to help the CDC monitor the public's perception of the dangers of smoking and e-cigarettes; in addition, it can identify areas where OSH can focus its attention in order to fulfill its mission and track the success of CDC health initiatives.
Read the paper (PDF).
Manuel Figallo, SAS
Emily McRae, SAS
Paper 3274-2015:
Statistical Analysis of Publicly Released Survey Data with SAS/STAT® Software SURVEY Procedures
Several U.S. Federal agencies conduct national surveys to monitor health status of residents. Many of these agencies release their survey data to the public. Investigators might be able to address their research objectives by conducting secondary statistical analyses with these available data sources. This paper describes the steps in using the SAS SURVEY procedures to analyze publicly released data from surveys that use probability sampling to make statistical inference to a carefully defined population of elements (the target population).
Read the paper (PDF). | Watch the recording.
Donna Brogan, Emory University, Atlanta, GA
Paper 3490-2015:
Survey Sample Designs for Auditing, Fraud, and Forensics.
Sampling for audits and forensics presents special challenges: Each survey/sample item requires examination by a team of professionals, so sample size must be contained. Surveys involve estimating--not hypothesis testing--so power is not a helpful concept. Stratification and modeling are often required to keep sampling distributions from being skewed. A precision of alpha is not required to create a confidence interval of 1-alpha, but how small a sample is supportable? Many times replicated sampling is required to prove the applicability of the design. Given the robust, programming-oriented approach of SAS®, the random selection, stratification, and optimizing techniques built into SAS can be used to bring transparency and reliability to the sample design process. While a sample that is used in a published audit or as a measure of financial damages must endure special scrutiny, it can be a rewarding process to design a sample whose performance you truly understand and which will stand up under a challenge.
Read the paper (PDF).
Turner Bond, HUD-Office of Inspector General
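A minimal sketch of a stratified random selection of audit items with PROC SURVEYSELECT; the frame, strata, and sample sizes are illustrative only.

    proc sort data=claims_frame;
       by payment_band;
    run;

    /* Select 60 claims from each payment band, with a fixed seed for reproducibility */
    proc surveyselect data=claims_frame method=srs n=60 seed=20150331
                      out=audit_sample;
       strata payment_band;
    run;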
Paper 2040-2015:
Survival Analysis with Survey Data
Surveys are designed to elicit information about population characteristics. A survey design typically combines stratification and multistage sampling of intact clusters, sub-clusters, and individual units with specified probabilities of selection. A survey sample can produce valid and reliable estimates of population parameters at a fraction of the cost of carrying out a census of the entire population, with clear logistical efficiencies. For analyses of survey data, SAS® software provides a suite of procedures from SURVEYMEANS and SURVEYFREQ for generating descriptive statistics and conducting inference on means and proportions to regression-based analysis through SURVEYREG and SURVEYLOGISTIC. For longitudinal surveys and follow-up studies, SURVEYPHREG is designed to incorporate aspects of the survey design for analysis of time-to-event outcomes based on the Cox proportional hazards model, allowing for time-varying explanatory variables. We review the salient features of the SURVEYPHREG procedure with application to survey data from the National Health and Nutrition Examination Survey (NHANES III) Linked Mortality File.
Read the paper (PDF).
Joseph Gardiner, Michigan State University
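A minimal sketch of the SURVEYPHREG syntax for a design-based Cox model; the design variables (strata, PSU, weight) and analysis variables are placeholders that would be replaced by the names released with the public-use NHANES III linked mortality file.

    proc surveyphreg data=nh3_mortality;
       strata  design_strata;
       cluster design_psu;
       weight  exam_wt;
       class   sex;
       model   futime*died(0) = age sex smoker;
    run;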
T
Paper SAS2604-2015:
Take a Bite Out of Crime with SAS® Visual Scenario Designer
The vast and increasing demands of fraud detection and description have promoted the broad application of statistics and machine learning in fields as diverse as banking, credit card application and usage, insurance claims, trader surveillance, health care claims, and government funding and allowance management. SAS® Visual Scenario Designer enables you to derive interactive business rules, along with descriptive and predictive models, to detect and describe fraud. This paper focuses on building interactive decision trees to classify fraud. Attention to optimizing the feature space (candidate predictors) prior to modeling is also covered. Because big data plays an increasingly vital role in fraud detection and description, SAS Visual Scenario Designer leverages the in-memory, parallel, and distributed computing abilities of SAS® LASR™ Analytic Server as a back end to support real-time performance on massive amounts of data.
Read the paper (PDF).
Yue Qi, SAS
Paper 3488-2015:
Text Analytics on Electronic Medical Record Data
This session describes our journey from data acquisition to text analytics on clinical, textual data.
Mark Pitts, Highmark Health
Paper 2920-2015:
Text Mining Kaiser Permanente Member Complaints with SAS® Enterprise Miner™
This presentation details the steps involved in using SAS® Enterprise Miner™ to text mine a sample of member complaints. Specifically, it describes how the Text Parsing, Text Filtering, and Text Topic nodes were used to generate topics that described the complaints. Text mining results are reviewed (slightly modified for confidentiality), as well as conclusions and lessons learned from the project.
Read the paper (PDF).
Amanda Pasch, Kaiser Permanente
Paper 3504-2015:
The %ic_mixed Macro: A SAS Macro to Produce Sorted Information Criteria (AIC and BIC) List for PROC MIXED for Model Selection
PROC MIXED is one of the most popular SAS procedures to perform longitudinal analysis or multilevel models in epidemiology. Model selection is one of the fundamental questions in model building. One of the most popular and widely used strategies is model selection based on information criteria, such as the Akaike Information Criterion (AIC) and the Sawa Bayesian Information Criterion (BIC). This strategy considers both fit and complexity, and enables multiple models to be compared simultaneously. However, there is no existing SAS procedure to perform model selection automatically based on information criteria for PROC MIXED, given a set of covariates. This paper provides information about using the SAS %ic_mixed macro to select a final model with the smallest value of AIC and BIC. Specifically, the %ic_mixed macro will do the following: 1) produce a complete list of all possible model specifications given a set of covariates, 2) use a DO loop to read in one model specification at a time and save it in a macro variable, 3) execute PROC MIXED and use the Output Delivery System (ODS) to output AICs and BICs, 4) append all outputs and use the DATA step to create a sorted list of information criteria with model specifications, and 5) run PROC REPORT to produce the final summary table. Based on the sorted list of information criteria, researchers can easily identify the best model. This paper includes the macro programming language, as well as examples of the macro calls and outputs.
Read the paper (PDF).
Qinlei Huang, St Jude Children's Research Hospital
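A minimal sketch of the building block the macro automates: capturing AIC and BIC for one candidate PROC MIXED model through ODS and stacking the results; the data set and covariates are hypothetical.

    ods output FitStatistics=fit1;
    proc mixed data=cohort method=ml;      /* ML so models with different fixed effects are comparable */
       class id;
       model y = age sex bmi / solution;
       random intercept / subject=id;
    run;

    /* Repeat for each candidate model, then stack and sort the information criteria */
    data ic_list;
       length model $40;
       set fit1(in=in1) /* fit2(in=in2) ... */;
       if in1 then model = 'age sex bmi';
       if scan(Descr, 1, ' ') in ('AIC', 'BIC');
    run;

    proc sort data=ic_list;
       by Descr Value;
    run;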
Paper SAS1745-2015:
The SEM Approach to Longitudinal Data Analysis Using the CALIS Procedure
Researchers often use longitudinal data analysis to study the development of behaviors or traits. For example, they might study how an elderly person's cognitive functioning changes over time or how a therapeutic intervention affects a certain behavior over a period of time. This paper introduces the structural equation modeling (SEM) approach to analyzing longitudinal data. It describes various types of latent curve models and demonstrates how you can use the CALIS procedure in SAS/STAT® software to fit these models. Specifically, the paper covers basic latent curve models, such as unconditional and conditional models, as well as more complex models that involve multivariate responses and latent factors. All illustrations use real data that were collected in a study that looked at maternal stress and the relationship between mothers and their preterm infants. This paper emphasizes the practical aspects of longitudinal data analysis. In addition to illustrating the program code, it shows how you can interpret the estimation results and revise the model appropriately. The final section of the paper discusses the advantages and disadvantages of the SEM approach to longitudinal data analysis.
Read the paper (PDF).
Xinming An, SAS
Yiu-Fai Yung, SAS
Paper 3741-2015:
The Spatio-Temporal Impact of Urgent Care Centers on Physician and ER Use
The unsustainable trend in healthcare costs has led to efforts to shift some healthcare services to less expensive sites of care. In North Carolina, the expansion of urgent care centers introduces the possibility that non-emergent and non-life-threatening conditions can be treated in a less intensive care setting. BCBSNC conducted a longitudinal study of the density of urgent care centers, primary care providers, and emergency departments, and of the differences in how members access care near those locations. This talk focuses on several analytic techniques that were considered for the analysis. The model needed to account for the complex relationship between the changes in the population (including health conditions and health insurance benefits) and the changes in the types of services and supply of services offered by healthcare providers proximal to them. Results for the chosen methodology are discussed.
Read the paper (PDF).
Laurel Trantham, Blue Cross and Blue Shield North Carolina
Paper 1600-2015:
Tips for Publishing in Health Care Journals with the Medical Expenditure Panel Survey (MEPS) Data Using SAS®
This presentation provides an in-depth analysis, with example SAS® code, of the health care use and expenditures associated with depression among individuals with heart disease using the 2012 Medical Expenditure Panel Survey (MEPS) data. A cross-sectional study design was used to identify differences in health care use and expenditures between depressed (n = 601) and nondepressed (n = 1,720) individuals among patients with heart disease in the United States. Multivariate regression analyses using the SAS survey analysis procedures were conducted to estimate the incremental health services and direct medical costs (inpatient, outpatient, emergency room, prescription drugs, and other) attributable to depression. The prevalence of depression among individuals with heart disease in 2012 was estimated at 27.1% (6.48 million persons) and their total direct medical costs were estimated at approximately $110 billion in 2012 U.S. dollars. Younger adults (< 60 years), women, unmarried, poor, and sicker individuals with heart disease were more likely to have depression. Patients with heart disease and depression had more hospital discharges (relative ratio (RR) = 1.06, 95% confidence interval (CI) [1.02 to 1.09]), office-based visits (RR = 1.27, 95% CI [1.15 to 1.41]), emergency room visits (RR = 1.08, 95% CI [1.02 to 1.14]), and prescribed medicines (RR = 1.89, 95% CI [1.70, 2.11]) than their counterparts without depression. Finally, among individuals with heart disease, overall health care expenditures for individuals with depression was 69% higher than that for individuals without depression (RR = 1.69, 95% CI [1.44, 1.99]). The conclusion is that depression in individuals with heart disease is associated with increased health care use and expenditures, even after adjusting for differences in age, gender, race/ethnicity, marital status, poverty level, and medical comorbidity.
Read the paper (PDF).
Seungyoung Hwang, Johns Hopkins University Bloomberg School of Public Health
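A minimal sketch of a design-corrected expenditure model with the MEPS survey procedures; VARSTR, VARPSU, PERWT12F, and TOTEXP12 follow the MEPS 2012 public-use file conventions, while the depression and heart disease flags are analyst-constructed and shown here only for illustration.

    proc surveyreg data=meps2012;
       strata  varstr;
       cluster varpsu;
       weight  perwt12f;
       domain  heartdz;                 /* restrict inference to the heart disease subpopulation */
       class   dep;
       model   totexp12 = dep age sex / solution;
    run;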
U
Paper 3333-2015:
Understanding Patient Populations in New Hampshire using SAS® Visual Analytics
The NH Citizens Health Initiative and the University of New Hampshire Institute for Health Policy and Practice, in collaboration with Accountable Care Project (ACP) participants, have developed a set of analytic reports to provide systems undergoing transformation a capacity to compare performance on the measures of quality, utilization, and cost across systems and regions. The purpose of these reports is to provide data and analysis on which our ACP learning collaborative can share knowledge and develop action plans that can be adopted by health-care innovators in New Hampshire. This breakout session showcases the claims-based reports, powered by SAS® Visual Analytics and driven by the New Hampshire Comprehensive Health Care Information System (CHIS), which includes commercial, Medicaid, and Medicare populations. With the power of SAS Visual Analytics, hundreds of pages of PDF files were distilled down to a manageable, dynamic, web-based portal that allows users to target information most appealing to them. This streamlined approach reduces barriers to obtaining information, offers that information in a digestible medium, and creates a better user experience. For more information about the ACP or to access the public reports, visit http://nhaccountablecare.org/.
Read the paper (PDF).
Danna Hourani, SAS
Paper 3408-2015:
Understanding Patterns in the Utilization and Costs of Elbow Reconstruction Surgeries: A Healthcare Procedure that is Common among Baseball Pitchers
Athletes in sports, such as baseball and softball, commonly undergo elbow reconstruction surgeries. There is research that suggests that the rate of elbow reconstruction surgeries among professional baseball pitchers continues to rise by leaps and bounds. Given the trend found among professional pitchers, the current study reviews patterns of elbow reconstruction surgery among the privately insured population. The study examined trends (for example, cost, age, geography, and utilization) in elbow reconstruction surgeries among privately insured patients using analytic tools such as SAS® Enterprise Guide® and SAS® Visual Analytics, based on the medical and surgical claims data from the FAIR Health National Private Insurance Claims (NPIC) database. The findings of the study suggested that there are discernable patterns in the prevalence of elbow reconstruction surgeries over time and across specific geographic regions.
Read the paper (PDF). | Download the data file (ZIP).
Jeff Dang, FAIR Health
Paper 2740-2015:
Using Heat Maps to Compare Clusters of Ontario DWI Drivers
SAS® PROC FASTCLUS generates five clusters for the group of repeat clients of Ontario's Remedial Measures program. Heat map tables are shown for selected variables such as demographics, scales, factor scores, and drug use to visualize the differences between clusters.
Rosely Flam-Zalcman, CAMH
Robert Mann, CAMH
Rita Thomas, CAMH
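A minimal sketch of the clustering step; the client measures shown are placeholders for the demographic, scale, factor, and drug-use variables used in the study.

    /* Standardize the inputs, then request a five-cluster solution */
    proc stdize data=clients method=std out=clients_std;
       var age_first num_prior audit_score drug_use_score;
    run;

    proc fastclus data=clients_std maxclusters=5 maxiter=100 out=clustered;
       var age_first num_prior audit_score drug_use_score;
    run;

    /* Cluster-by-variable means feed the heat map tables */
    proc means data=clustered mean;
       class cluster;
       var age_first num_prior audit_score drug_use_score;
    run;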
Paper 3320-2015:
Using PROC SURVEYREG and PROC SURVEYLOGISTIC to Assess Potential Bias
The Behavioral Risk Factor Surveillance System (BRFSS) collects data on health practices and risk behaviors via telephone survey. This study focuses on the question, On average, how many hours of sleep do you get in a 24-hour period? Recall bias is a potential concern in interviews and questionnaires, such as BRFSS. The 2013 BRFSS data is used to illustrate the proper methods for implementing PROC SURVEYREG and PROC SURVEYLOGISTIC, using the complex weighting scheme that BRFSS provides.
Read the paper (PDF).
Lucy D'Agostino McGowan, Vanderbilt University
Alice Toll, Vanderbilt University
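A minimal sketch of the two procedures with the 2013 BRFSS design variables (_STSTR, _PSU, _LLCPWT); the covariates and the derived short-sleep indicator are assumptions for illustration.

    /* Linear model for reported hours of sleep (SLEPTIM1) */
    proc surveyreg data=brfss2013;
       strata  _ststr;
       cluster _psu;
       weight  _llcpwt;
       model   sleptim1 = age sex income;
    run;

    /* Logistic model for an analyst-defined indicator of short sleep */
    proc surveylogistic data=brfss2013;
       strata  _ststr;
       cluster _psu;
       weight  _llcpwt;
       model   short_sleep(event='1') = age sex income;
    run;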
Paper 3101-2015:
Using SAS® Enterprise Miner™ to Predict Breast Cancer at an Early Stage
Breast cancer is the leading cause of cancer-related deaths among women worldwide, and its early detection can reduce the mortality rate. Using a data set containing information about breast screening provided by the Royal Liverpool University Hospital, we constructed a model that can provide early indication of a patient's tendency to develop breast cancer. This data set has information about breast screening from patients who were believed to be at risk of developing breast cancer. The most important aspect of this work is that we excluded variables that are in one way or another associated with breast cancer, while keeping the variables that are less likely to be associated with breast cancer or whose associations with breast cancer are unknown as input predictors. The target variable is a binary variable with two values, 1 (indicating a type of cancer is present) and 0 (indicating a type of cancer is not present). SAS® Enterprise Miner™ 12.1 was used to perform data validation and data cleansing, to identify potentially related predictors, and to build models that can be used to predict at an early stage the likelihood of patients developing breast cancer. We compared two models: the first model was built with an interactive node and a cluster node, and the second was built without an interactive node and a cluster node. Classification performance was compared using a receiver operating characteristic (ROC) curve and average squared error. Interestingly, we found significantly improved model performance by using only variables that have a lesser or unknown association with breast cancer. The result shows that the logistic model with an interactive node and a cluster node has better performance with a lower average squared error (0.059614) than the model without an interactive node and a cluster node. Among other benefits, this model will assist inexperienced oncologists in saving time in disease diagnosis.
Read the paper (PDF).
Gibson Ikoro, Queen Mary University of London
Beatriz de la Iglesia, University of East Anglia, Norwich, UK
Paper 1340-2015:
Using SAS® Macros to Flag Claims Based on Medical Codes
Many epidemiological studies use medical claims to identify and describe a population. But finding out who was diagnosed, and who received treatment, isn't always simple. Each claim can have dozens of medical codes, with different types of codes for procedures, drugs, and diagnoses. Even a basic definition of treatment could require a search for any one of 100 different codes. A SAS® macro may come to mind, but generalizing the macro to work with different codes and types allows it to be reused in a variety of different scenarios. We look at a number of examples, starting with a single code type and variable. Then we consider multiple code variables, multiple code types, and multiple flag variables. We show how these macros can be combined and customized for different data with minimal rework. Macro flexibility and reusability are also discussed, along with ways to keep our list of medical codes separate from our program. Finally, we discuss time-dependent medical codes, codes requiring database lookup, and macro performance.
Read the paper (PDF). | Download the data file (ZIP).
Andy Karnopp, Fred Hutchinson Cancer Research Center
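A minimal sketch of a generalized flagging macro in the spirit of the paper; the parameter defaults, variable prefix, and diagnosis codes are hypothetical.

    %macro flag_codes(data=, out=, prefix=dx, nvars=25, codes=, flag=hit);
       data &out;
          set &data;
          array _dx{*} &prefix.1-&prefix.&nvars;
          &flag = 0;
          do _i = 1 to dim(_dx);
             if _dx{_i} in (&codes) then &flag = 1;
          end;
          drop _i;
       run;
    %mend flag_codes;

    /* Example call: flag claims carrying either of two illustrative diagnosis codes */
    %flag_codes(data=claims, out=claims_flagged, codes=%str('4280','42821'), flag=chf_flag)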
Paper SAS1951-2015:
Using SAS® Text Analytics to Examine Labor and Delivery Sentiments on the Internet
In today's society, where seemingly unlimited information is just a mouse click away, many turn to social media, forums, and medical websites to research and understand how mothers feel about the birthing process. Mining the data in these resources helps provide an understanding of what mothers value and how they feel. This paper shows the use of SAS® Text Analytics to gather, explore, and analyze reports from mothers to determine their sentiment about labor and delivery topics. Results of this analysis could aid in the design and development of a labor and delivery survey and be used to understand what characteristics of the birthing process yield the highest levels of importance. These resources can then be used by labor and delivery professionals to engage with mothers regarding their labor and delivery preferences.
Read the paper (PDF).
Michael Wallis, SAS
Paper 3281-2015:
Using SAS® to Create Episodes-of-Hospitalization for Health Services Research
An essential part of health services research is describing the use and sequencing of a variety of health services. One of the most frequently examined health services is hospitalization. A common problem in describing hospitalizations is that a patient might have multiple hospitalizations to treat the same health problem. Specifically, a hospitalized patient might be (1) sent to and returned from another facility in a single day for testing, (2) transferred from one hospital to another, and/or (3) discharged home and re-admitted within 24 hours. In all cases, these hospitalizations are treating the same underlying health problem and should be considered as a single episode. If examined without regard for the episode, a patient would be identified as having 4 hospitalizations (the initial hospitalization, the testing hospitalization, the transfer hospitalization, and the readmission hospitalization). In reality, they had one hospitalization episode spanning multiple facilities. IMPORTANCE: Failing to account for multiple hospitalizations in the same episode has implications for many disciplines including health services research, health services planning, and quality improvement for patient safety. HEALTH SERVICES RESEARCH: Hospitalizations will be counted multiple times, leading to an overestimate of the number of hospitalizations a person had. For example, a person can be identified as having 4 hospitalizations when in reality they had one episode of hospitalization. This will result in a person appearing to be a higher user of health care than is true. RESOURCE PLANNING FOR HEALTH SERVICES: The average time and resources needed to treat a specific health problem may be underestimated. To illustrate, if a patient spends 10 days each in 3 different hospitals in the same episode, the total number of days needed to treat the health problem is 30 days, but each hospital will believe it is only 10, and planned resourcing may be inadequate. QUALITY IMPROVEMENT FOR PATIENT SAFETY: Hospital-acquired infections are a serious concern and a major cause of extended hospital stays, morbidity, and death. As a result, many hospitals have quality improvement programs that monitor the occurrence of infections in order to identify ways to reduce them. If episodes of hospitalizations are not considered, an infection acquired in a hospital that does not manifest until a patient is transferred to a different hospital will incorrectly be attributed to the receiving hospital. PROPOSAL: We have developed SAS® code to identify episodes of hospitalizations, the sequence of hospitalizations within each episode, and the overall duration of the episode. The output clearly displays the data in an intuitive and easy-to-understand format. APPLICATION: The method we will describe and the associated SAS code will be useful to not only health services researchers, but also anyone who works with temporal data that includes nested, overlapping, and subsequent events.
Read the paper (PDF).
Meriç Osman, Health Quality Council
Jacqueline Quail, Saskatchewan Health Quality Council
Nianping Hu, Saskatchewan Health Quality Council
Nedeene Hudema, Saskatchewan Health Quality Council
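A minimal sketch of the episode-building logic: a stay that begins within one day of the previous discharge (a transfer, same-day testing visit, or 24-hour readmission) continues the current episode. The data set and variable names are illustrative, and the paper's code also derives the sequence and duration of each episode.

    proc sort data=hosp_stays;
       by patient_id admit_dt disch_dt;
    run;

    data episodes;
       set hosp_stays;
       by patient_id;
       retain episode_num prev_disch;
       if first.patient_id then do;
          episode_num = 1;
          prev_disch  = disch_dt;
       end;
       else if admit_dt > prev_disch + 1 then do;   /* gap of more than one day: new episode */
          episode_num + 1;
          prev_disch = disch_dt;
       end;
       else prev_disch = max(prev_disch, disch_dt); /* transfer or quick readmission: same episode */
       format prev_disch date9.;
    run;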
Paper 1335-2015:
Using the GLIMMIX and GENMOD Procedures to Analyze Longitudinal Data from a Department of Veterans Affairs Multisite Randomized Controlled Trial
Many SAS® procedures can be used to analyze longitudinal data. This study employed a multisite randomized controlled trial design to demonstrate the effectiveness of two SAS procedures, GLIMMIX and GENMOD, to analyze longitudinal data from five Department of Veterans Affairs Medical Centers (VAMCs). Older male veterans (n = 1222) seen in VAMC primary care clinics were randomly assigned to two behavioral health models, integrated (n = 605) and enhanced referral (n = 617). Data were collected at baseline and at 3-, 6-, and 12-month follow-up. A mixed-effects repeated measures model was used to examine the dependent variable, problem drinking, which was defined as both a count and a dichotomous outcome from baseline to 12-month follow-up. Sociodemographics and depressive symptoms were included as covariates. First, bivariate analyses included general linear model and chi-square tests to examine covariates by group and group by problem drinking outcomes. All significant covariates were included in the GLIMMIX and GENMOD models. Then, multivariate analysis included mixed models with generalized estimating equations (GEEs). The effect of group, time, and the interaction effect of group by time were examined after controlling for covariates. Multivariate results were inconsistent for GLIMMIX and GENMOD using Lognormal, Gaussian, Weibull, and Gamma distributions. SAS is a powerful statistical program for the analysis of data from longitudinal studies.
Read the paper (PDF).
Abbas Tavakoli, University of South Carolina/College of Nursing
Marlene Al-Barwani, University of South Carolina
Sue Levkoff, University of South Carolina
Selina McKinney, University of South Carolina
Nikki Wooten, University of South Carolina
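A minimal sketch of the two modeling routes for the count form of the outcome: a subject-specific mixed model in GLIMMIX and a population-averaged GEE model in GENMOD. The data set, outcome, and covariate names are placeholders, not the trial's analysis code.

    /* Subject-specific (random intercept) model */
    proc glimmix data=vamc method=laplace;
       class id group time;
       model drinks = group time group*time age depress_score / dist=poisson link=log solution;
       random intercept / subject=id;
    run;

    /* Population-averaged model via generalized estimating equations */
    proc genmod data=vamc;
       class id group time;
       model drinks = group time group*time age depress_score / dist=poisson link=log;
       repeated subject=id / withinsubject=time type=un corrw;
    run;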
Paper SAS1855-2015:
Using the PHREG Procedure to Analyze Competing-Risks Data
Competing risks arise in studies in which individuals are subject to a number of potential failure events and the occurrence of one event might impede the occurrence of other events. For example, after a bone marrow transplant, a patient might experience a relapse or might die while in remission. You can use one of the standard methods of survival analysis, such as the log-rank test or Cox regression, to analyze competing-risks data, whereas other methods, such as the product-limit estimator, might yield biased results. An increasingly common practice of assessing the probability of a failure in competing-risks analysis is to estimate the cumulative incidence function, which is the probability subdistribution function of failure from a specific cause. This paper discusses two commonly used regression approaches for evaluating the relationship of the covariates to the cause-specific failure in competing-risks data. One approach models the cause-specific hazard, and the other models the cumulative incidence. The paper shows how to use the PHREG procedure in SAS/STAT® software to fit these models.
Read the paper (PDF).
Ying So, SAS
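A minimal sketch of the two regression approaches the paper contrasts, using a hypothetical transplant data set in which status is 0 for censored, 1 for relapse, and 2 for death in remission.

    /* Cause-specific hazard of relapse: deaths in remission are treated as censored */
    proc phreg data=bmt;
       class disease_grp / param=ref;
       model t*status(0, 2) = disease_grp age;
    run;

    /* Fine-Gray model for the cumulative incidence of relapse (event of interest = 1) */
    proc phreg data=bmt;
       class disease_grp / param=ref;
       model t*status(0) = disease_grp age / eventcode=1;
    run;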
Paper 3376-2015:
Using the SAS-PIRT Macro for Estimating the Parameters of Polytomous Items
Polytomous items have been widely used in educational and psychological settings. As a result, the demand for statistical programs that estimate the parameters of polytomous items has been increasing. For this purpose, Samejima (1969) proposed the graded response model (GRM), in which category characteristic curves are characterized by the difference of the two adjacent boundary characteristic curves. In this paper, we show how the SAS-PIRT macro (a SAS® macro written in SAS/IML®) was developed based on the GRM and how it performs in recovering the parameters of polytomous items using simulated data.
Read the paper (PDF).
Sung-Hyuck Lee, ACT, Inc.
V
Paper SAS1888-2015:
Visualizing Clinical Trial Data: Small Data, Big Insights
Data visualization is synonymous with big data, for which billions of records and millions of variables are analyzed simultaneously. But that does not mean data scientists analyzing clinical trial data that include only thousands of records and hundreds of variables cannot take advantage of data visualization methodologies. This paper presents a comprehensive process for loading standard clinical trial data into SAS® Visual Analytics, an interactive analytic solution. The process implements template reporting for a wide variety of point-and-click visualizations. Data operations required to support this reporting are explained and examples of the actual visualizations are presented so that users can implement this reporting using their own data.
Read the paper (PDF).
Michael Drutar, SAS
Elliot Inman, SAS
W
Paper 3390-2015:
Working with PROC FEDSQL in SAS® 9.4
Working with multiple data sources in SAS® was not a straightforward task until PROC FEDSQL was introduced in the SAS® 9.4 release. Federated Query Language, or FEDSQL, is a vendor-independent language that provides a common SQL syntax to communicate across multiple relational databases without having to worry about vendor-specific SQL syntax. PROC FEDSQL is a SAS implementation of the FEDSQL language. PROC FEDSQL enables us to write federated queries that perform joins on tables from different databases with a single query, without having to load the tables into SAS individually and combine them using DATA steps and PROC SQL statements. The objective of this paper is to demonstrate the use of PROC FEDSQL to fetch data from multiple data sources such as a Microsoft SQL Server database, a MySQL database, and a SAS data set, and to run federated queries on all the data sources. Other powerful features of PROC FEDSQL, such as transactions and the FEDSQL pass-through facility, are discussed briefly.
Read the paper (PDF).
Zabiulla Mohammed, Oklahoma State University
Ganesh Kumar Gangarajula, Oklahoma State University
Pradeep Reddy Kalakota, Federal Home Loan Bank of Des Moines
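A minimal sketch of a single federated query; it assumes the SQL Server, MySQL, and SAS librefs (sqlsrv, mysq, work) have already been assigned through SAS/ACCESS LIBNAME statements, and the table and column names are illustrative.

    proc fedsql;
       create table work.member_claims as
       select m.member_id, m.plan_type, c.claim_id, c.paid_amt, e.er_flag
          from sqlsrv.members m
               join mysq.claims c on m.member_id = c.member_id
               left join work.er_visits e on c.claim_id = e.claim_id;
    quit;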