Administrative databases have become increasingly important for understanding real-world practice patterns, an essential capability in the current health-care environment, and the Affordable Care Act has heightened the need to understand the current use of technology and of different approaches to surgery. This paper describes a method for extracting information about specific surgical procedures from the Healthcare Cost and Utilization Project (HCUP) database, also referred to as the National (Nationwide) Inpatient Sample (NIS). The analyses provide a framework for comparing the different modalities of the surgical procedures of interest. Using an NIS database for a single year, we identify cohorts based on surgical approach by locating the ICD-9 codes specific to robotic, laparoscopic, and open surgery. An ARRAY statement scans the procedure variables for the appropriate codes, and a similar array is built for the diagnosis ICD-9 codes; any minimally invasive procedure (robotic or laparoscopic) that results in a conversion is flagged as a conversion. Comorbidities are identified by the ICD-9 codes that represent each subject's severity and are merged with the NIS inpatient core file. Using a FORMAT statement for all diagnosis variables, we create macros that can be regenerated for each type of complication. These macros are compiled in SAS® and stored in a macro library; the four stored macros are called by the table programs with different macro-variable values, generate the frequencies for all cohorts, and build each table's structure along with its title and number. This paper describes a systematic method in SAS/STAT® 9.2 to extract the data from the NIS using the ARRAY statement for the specific ICD-9 codes, to format the extracted data for analysis, to merge the different NIS files by procedure, and to use automated macros to generate the report.
Ravi Tejeshwar Reddy Gaddameedi, California State University, East Bay
Usha Kreaden, Intuitive Surgical
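A minimal sketch of the ARRAY-based cohort extraction described in the abstract above (not the authors' production code) might look as follows. The procedure and diagnosis variable names (PR1-PR15, DX1-DX25) follow the NIS core-file layout for recent years but vary by data year, and the ICD-9 code lists shown (17.42-17.44 for robotic assistance, 54.21 for laparoscopy, V64.41 for conversion to open) are abbreviated examples only.

data cohorts;
   set nis.core;                            /* NIS inpatient core file        */
   array prc {15} $ pr1-pr15;               /* ICD-9 procedure codes, no dots */
   array dxc {25} $ dx1-dx25;               /* ICD-9 diagnosis codes          */
   length approach $12;
   approach = 'OPEN';                       /* default until a MIS code found */
   conversion = 0;
   do i = 1 to dim(prc);
      if prc{i} in ('1742','1743','1744') then approach = 'ROBOTIC';
      else if prc{i} = '5421' and approach = 'OPEN' then approach = 'LAPAROSCOPIC';
   end;
   do i = 1 to dim(dxc);
      /* V64.41: laparoscopic procedure converted to open procedure */
      if dxc{i} = 'V6441' and approach ne 'OPEN' then conversion = 1;
   end;
   drop i;
run;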
Data sharing through healthcare collaboratives and national registries creates opportunities for secondary data analysis projects. These initiatives provide data for quality comparisons as well as endless research opportunities for external researchers across the country. The possibilities are bountiful when you join data from diverse organizations and look for common themes related to illnesses and patient outcomes. With these great opportunities comes great pain for the data analysts and health services researchers tasked with compiling these data sets according to specifications. Patient care data is complex and, particularly at large healthcare systems, might be managed in multiple electronic health record (EHR) systems. Matching data from separate EHR systems while simultaneously ensuring the integrity of the details of each care visit is challenging. This paper demonstrates how data management personnel can use traditional SAS PROCs in new and inventive ways to compile, clean, and complete data sets for submission to healthcare collaboratives and other data sharing initiatives. Traditional data matching tools such as the SPEDIS function are uniquely combined with iterative SQL joins that use the SAS® functions INDEX, COMPRESS, CATX, and SUBSTR to get the most out of complex patient and physician name matches. Recoding, correcting missing items, and formatting data can be achieved efficiently by using traditional tools such as the MAX and FIND functions and PROC FORMAT in new and inventive ways.
Gabriela Cantu, Baylor Scott & White Health
Christopher Klekar, Baylor Scott & White Health
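As an illustration of the matching technique described in the abstract above (a hedged sketch rather than the authors' exact code), the PROC SQL step below joins two hypothetical EHR extracts, EHR_A and EHR_B. COMPRESS strips punctuation, CATX assembles a comparable full name, SUBSTR blocks on the first letter of the last name to keep the join manageable, and SPEDIS scores spelling distance; the threshold of 15 is purely illustrative and should be tuned to the data.

proc sql;
   create table name_matches as
   select a.patient_id as id_a,
          b.patient_id as id_b,
          spedis(catx(' ', compress(a.lastname , "',.-"),
                           compress(a.firstname, "',.-")),
                 catx(' ', compress(b.lastname , "',.-"),
                           compress(b.firstname, "',.-"))) as name_dist
   from ehr_a as a, ehr_b as b
   where substr(a.lastname, 1, 1) = substr(b.lastname, 1, 1)   /* blocking  */
     and calculated name_dist <= 15;                           /* tune this */
quit;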
For the Research Data Centers (RDCs) of the United States Census Bureau, the demand for disk space increases substantially with each passing year, and SAS® data views can help address these disk space challenges. This paper discusses the usage and benefits of SAS data views for saving disk space and for reducing the time and effort required to manage large data sets. The ability and efficiency of SAS data views in processing regular ASCII, compressed ASCII, and other commonly used file formats are analyzed and evaluated in detail, and the authors discuss ways in which using SAS data views is more efficient than traditional methods for processing and deploying the large census and survey data held in the RDCs.
Shigui Weng, US Bureau of the Census
Shy Degrace, US Bureau of the Census
Ya Jiun Tsai, US Bureau of the Census
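A minimal sketch of the technique, with an invented file layout rather than an actual Census data set: a DATA step view stores only the instructions for reading the raw file, so no observations land on disk until the view is referenced, and a compressed ASCII file can be read the same way through a pipe (the zcat example assumes a UNIX host).

filename raw pipe 'zcat /rdc/data/survey2014.dat.gz';    /* compressed ASCII  */

data rdc.survey_v / view=rdc.survey_v;                   /* view, not a table */
   infile raw lrecl=120 truncover;
   input hhid $ 1-10 state $ 11-12 income 13-20 wgt 21-28;
run;

proc means data=rdc.survey_v n mean sum;                 /* records are read  */
   var income;                                           /* only now, as the  */
   weight wgt;                                           /* view is consumed  */
run;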
Proper management of master data is a critical component of any enterprise information system. However, effective master data management (MDM) requires that both IT and Business understand the life cycle of master data and the fundamental principles of entity resolution (ER). This presentation provides a high-level overview of current practices in data matching, record linking, and entity information life cycle management that are foundational to building an effective strategy to improve data integration and MDM. Particular areas of focus are: 1) The need for ongoing ER analytics--the systematic and quantitative measurement of ER performance; 2) Investing in clerical review and asserted resolution for continuous improvement; and 3) Addressing the large-scale ER challenge through distributed processing.
John Talburt, Black Oak Analytics, Inc
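To make the "ER analytics" point concrete, here is a small hypothetical sketch (all table and variable names are invented for illustration). TRUTH holds clerically reviewed record pairs with MATCH=1 or 0, LINKS holds the pairs the ER process asserted as co-referent, and the queries score the run's pairwise precision and recall over the reviewed pairs.

proc sql;
   create table scored as
   select t.match,
          case when l.id_a is not null then 1 else 0 end as linked
   from truth as t
        left join links as l
        on t.id_a = l.id_a and t.id_b = l.id_b;

   select sum(match=1 and linked=1) / sum(linked=1) as precision format=6.3,
          sum(match=1 and linked=1) / sum(match=1)  as recall    format=6.3
   from scored;
quit;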
While there has been tremendous progress in technologies related to data storage, high-performance computing, and advanced analytic techniques, organizations have only recently begun to comprehend the importance of parallel strategies that help manage the cacophony of concerns around access, quality, provenance, data sharing, and use. Although data governance is not new, the drumbeat around it, along with master data management and data quality, is approaching a crescendo. Intensified by the increase in consumption of information, expectations about ubiquitous access, and highly dynamic visualizations, these factors are also circumscribed by security and regulatory constraints. In this paper, we provide a summary of what data governance is and why it is important. We go beyond the obvious and provide practical guidance on what it takes to build out a data governance capability appropriate to the scale, size, and purpose of the organization and its culture. Moreover, we discuss best practices in the form of requirements that highlight what we think is important to consider as you provide the tactical linkage of people, policies, and processes to the actual data lifecycle. To that end, our focus includes the organization and its culture, people, processes, policies, and technology. Further, we include discussions of organizational models as well as the role of the data steward, and provide guidance on how to formalize data governance into a sustainable set of practices within your organization.
Greg Nelson, ThotWave
Lisa Dodson, SAS
It's well known that SAS® is the leader in advanced analytics, but the intelligent data preparation that combines information from disparate sources to enable confident creation and deployment of compelling models is often overlooked. Improving data-based decision making is among the top reasons why organizations decide to embark on master data management (MDM) projects and why you should consider incorporating MDM functionality into your analytics-based processes. MDM is a discipline that includes the people, processes, and technologies for creating an authoritative view of core data elements in enterprise operational and analytic systems. This paper demonstrates why MDM functionality is a natural fit for many SAS solutions that need access to timely, clean, and unique master data. Because MDM shares many of the same technologies that power SAS analytic solutions, it has never been easier to add MDM capabilities to your advanced analytics projects.
Ron Agresta, SAS
In this session, I discuss an overall approach to governing Big Data. I begin with an introduction to Big Data governance and the governance framework. Then I address the disciplines of Big Data governance: data ownership, metadata, privacy, data quality, and master and reference data management. Finally, I discuss the reference architecture of Big Data, and how SAS® tools can address Big Data governance.
Sunil Soares, Information Asset