Disseminating data to potential collaborators can be essential to developing models, algorithms, and innovative research opportunities. However, it is often time-consuming to get approval to access sensitive data such as health data. An alternative to sharing the real data is to use synthetic data, which has properties similar to the original data but does not disclose sensitive information. Collaborators can use the synthetic data to build preliminary models or to work out bugs in their code while waiting for approval to access the original data. A data owner can also use the synthetic data to crowdsource solutions from the public through competitions like Kaggle and then test those solutions on the original data. This paper implements, as a SAS® macro, a method that generates fully synthetic data whose statistical moments match those of the true data up to a specified order. Variables in the synthetic data set have the same data types as their counterparts in the true data (for example, integer, binary, continuous). The implementation uses the linear programming solver within a column generation algorithm and the mixed integer linear programming solver from the OPTMODEL procedure in SAS/OR® software. The COFOR statement in PROC OPTMODEL automatically parallelizes a portion of the algorithm. The paper demonstrates the method by using the Sashelp.Heart data set to generate fully synthetic copies of the data.
Brittany Bogle, University of North Carolina at Chapel Hill
Jared Erickson, SAS
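To illustrate the parallelization mechanism the abstract credits to the COFOR statement, the following is a minimal, hypothetical sketch of a COFOR loop in PROC OPTMODEL that runs several independent LP solves concurrently. It is not the authors' macro or their moment-matching model; the toy objective, constraint, and array names are invented for illustration.

proc optmodel;
   /* Toy LP: minimize a weighted sum subject to a demand constraint */
   var x {i in 1..4} >= 0;
   min TotalCost = sum {i in 1..4} (i+1)*x[i];
   con Demand: sum {i in 1..4} x[i] >= 10;

   num objval {1..3};   /* objective value recorded from each solve */

   /* COFOR runs the loop iterations, and their LP solves, concurrently */
   cofor {k in 1..3} do;
      fix x[1] = k;                               /* vary one input per iteration */
      solve with lp;
      objval[k] = sum {i in 1..4} (i+1)*x[i].sol; /* value of the optimal solution */
   end;
   print objval;
quit;

In a column generation setting, a pattern like this could be used to solve independent subproblems concurrently while a master LP is updated between rounds, but the details of the authors' algorithm are given in the paper itself.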
With the aim of improving rural healthcare, the Oklahoma State University (OSU) Center for Health Systems Innovation (CHSI) conducted a study with primary care clinics (n=35) in rural Oklahoma to identify possible impediments to clinic workflows. The study entailed semi-structured personal interviews (n=241) and an online survey administered on an iPad (n=190). Respondents encompassed all consenting clinic constituents (physicians, nurses, practice managers, and schedulers). Quantitative data from the surveys revealed that electronic medical records (EMRs) are well accepted and contribute to increased workflow efficiency. However, the qualitative data from the interviews revealed IT-related barriers such as Internet connectivity, hardware problems, and inefficiencies in information systems. Interview responses identified six IT-related response categories (computer, connectivity, EMR-related, fax, paperwork, and phone calls) that routinely affect clinic workflow. Together, these categories account for more than 50% of all the routine workflow-related problems faced by the clinics. Text mining was performed on the transcribed interviews using SAS® Text Miner to validate these six categories and to identify concept links for quantifiable insight. Two variables (Redundancy Reduction and Idle Time Generation) were derived from survey questions; they had low scores of -129 and -64, respectively, out of 384. Finally, ANOVA was run using SAS® Enterprise Guide® 6.1 to determine whether the six qualitative categories affect the two quantitative variables differently.
Ankita Srivastava, Oklahoma State University
Ipe Paramel, Oklahoma State University
Onkar Jadhav, Oklahoma State University
Jennifer Briggs, Oklahoma State University
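For readers unfamiliar with how such an analysis is typically set up, the following is a minimal sketch of a one-way ANOVA in SAS using PROC GLM. The data set, category values, and score variables are hypothetical and are not the study's actual data or code.

/* Hypothetical data: one row per response, with its IT-related category
   and the two derived scores */
data clinic_responses;
   length category $ 12;
   input category $ redundancy_score idle_time_score;
   datalines;
computer      -2  -1
connectivity  -3  -2
emr           -1   0
fax           -2  -1
paperwork     -3  -2
phone         -1  -1
computer      -1   0
connectivity  -2  -1
;
run;

/* One-way ANOVA: do mean scores differ across the categories? */
proc glm data=clinic_responses;
   class category;
   model redundancy_score idle_time_score = category;
run;
quit;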
Data is your friend. This presentation discusses the use of data for quality improvement (QI). Measurement over time is integral to quality improvement, and statistical process control charts (also known as Shewhart or SPC charts) are a good way to learn from how measures change over time in response to improvement efforts. The presentation explains what an SPC chart is, how to choose the correct type of chart, how to create and update a chart using SAS®, and how to learn from the chart. The examples come from QI projects in health care, and the material is based on the Institute for Healthcare Improvement's Model for Improvement. However, the material is applicable to other fields, including manufacturing and business. The presentation is intended for people newly considering a QI project, people who want to graph their data and need help getting started, and anyone interested in interpreting SPC charts created by someone else.
Ruth Croxford, Institute for Clinical Evaluative Sciences
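One common way to produce SPC charts in SAS is PROC SHEWHART in SAS/QC. The sketch below plots an individuals and moving range chart for a hypothetical monthly wait-time measure; the data set and variable names are invented, and the presentation may well use different chart types or procedures.

/* Hypothetical QI data: one measurement (average wait time) per month */
data wait_times;
   input month $ wait_minutes;
   datalines;
2015-01 34
2015-02 31
2015-03 36
2015-04 29
2015-05 33
2015-06 40
2015-07 38
2015-08 35
;
run;

/* Individuals and moving range chart (requires SAS/QC) */
proc shewhart data=wait_times;
   irchart wait_minutes * month;
run;

Choosing the chart type (for example, an individuals chart, a p-chart for proportions, or a u-chart for rates) depends on the kind of measure being tracked, which is one of the topics the presentation addresses.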
Every organization, from the most mature to a day-one start-up, needs to grow organically. A deep understanding of internal customer and operational data is the single biggest catalyst for developing and sustaining that growth. Advanced analytics and big data feed directly into this, and there are best practices that any organization (across the entire growth curve) can adopt to drive success. Analytics teams can be drivers of growth, but to be truly effective they need to implement key best practices. These practices range from in-the-weeds details, such as the approach to data hygiene, to strategic practices, such as team structure and model governance. When executed poorly, business leadership and the analytics team are unable to communicate with each other: they talk past each other and do not work together toward a common goal. When executed well, the analytics team is part of the business solution, aligned with the needs of business decision-makers, and drives the organization forward. Through our engagements, we have discovered best practices in three key areas, all of which are critical to analytics team effectiveness: 1) data hygiene, 2) complex statistical modeling, and 3) team collaboration.
Aarti Gupta, Bain & Company
Paul Markowitz, Bain & Company
Conferences for SAS® programming are replete with the newest software capabilities and clever programming techniques. However, discussion of quality control (QC) is lacking. QC is fundamental to ensuring both correct results and sound interpretation of data. It is not industry specific, and it simply makes sense. Most QC procedures are a function of regulatory requirements, industry standards, and corporate philosophies. Good QC goes well beyond just reviewing results; it should also consider the underlying data, and it should be driven by a thoughtful consideration of relevance and impact. Although programmers strive to produce correct results, it is no wonder that programming mistakes are common, despite rigid QC processes, in an industry where expedited deliverables and a lean workforce are the norm. These errors lead to a lack of trust in team members and an overall increase in resource requirements as they are corrected, particularly when SAS programming is outsourced. Is it possible to produce results with a high degree of accuracy, even when time and budget are limited? Thorough QC is easy to overlook in a high-pressure environment with increased workload expectations and expedited deliverables. Does this suggest that QC programming is becoming a lost art, or does it simply suggest that we need to evolve with technology? The focus of the presentation is to review the who, what, when, where, why, and how of QC programming implementation.
Amber Randall, Axio Research
Bill Coar, Axio Research
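As one concrete illustration of a QC technique commonly used with SAS, though not necessarily the approach the presenters describe, independent double programming can be checked with PROC COMPARE. The data sets below are hypothetical.

/* Hypothetical production and independently QC-programmed summary data sets */
data prod_summary;
   input trt $ n mean;
   datalines;
A 50 12.3
B 48 11.7
;
run;

data qc_summary;
   input trt $ n mean;
   datalines;
A 50 12.3
B 48 11.9
;
run;

/* PROC COMPARE reports any differences between the two programmers' results */
proc compare base=prod_summary compare=qc_summary listall;
   id trt;
run;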