In the regulatory world of patient safety and pharmacovigilance, whether during clinical trials or post-market surveillance, serious adverse events (SAEs) that affect participants must be collected and, if certain criteria are met, reported to the FDA and other regulatory authorities. SAEs are often entered into multiple databases by different users, resulting in possible data discrepancies and quality loss. Efforts have been made to reconcile SAE data between databases, but there is no industry standard for the methodology or tool employed for this task. Some organizations still reconcile the data manually, with visual inspection and verbal verification. Not only is this laborious and error-prone, but it becomes prohibitive when the data reach hundreds of records. We devised an efficient SAS® algorithm to compare two data sources automatically. Our algorithm identifies matched, discrepant, and unpaired SAE records. Additionally, it employs a user-supplied list of synonyms to find non-identical but relevant matches. First, the two data sources are combined and sorted by key fields such as Subject ID, Onset Date, Stop Date, and Event Term. Record counts and Levenshtein edit distances are calculated within certain groups to assist with sorting and matching. The combined record list is then fed into a DATA step that decides whether each record is paired or unpaired. For an unpaired record, a stub record with all fields set to "?" is generated as a matching placeholder. Each record is written to one of two data sets. The data sets are then tagged and pulled into comparison logic that uses hash objects, which enable field-by-field comparison and display discrepancies in a clean format for each field. Identical fields or columns are cleared or removed for clarity. The result is a streamlined, user-friendly process that allows for fast and easy SAE reconciliation.
Zhou (Tom) Hui, AstraZeneca
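To make the matching step concrete, here is a minimal sketch of two building blocks the abstract describes: the COMPLEV function, which computes the Levenshtein edit distance, and a hash object used to pair records from the two sources. The data set and variable names (sae_clin, sae_safety, subject_id, event_term, and so on) are hypothetical placeholders, not the authors' actual code.

    /* Pair clinical-database SAEs with safety-database SAEs by subject, then
       measure how close the reported event terms are. */
    data paired unpaired;
       length term_saf $200;
       if _n_ = 1 then do;
          declare hash h(dataset: "sae_safety(rename=(event_term=term_saf))");
          h.defineKey("subject_id");
          h.defineData("term_saf");
          h.defineDone();
          call missing(term_saf);
       end;
       set sae_clin;
       if h.find() = 0 then do;
          edit_dist = complev(event_term, term_saf);   /* Levenshtein edit distance */
          output paired;
       end;
       else output unpaired;                           /* later filled with a "?" stub record */
    run;

COMPLEV returns 0 for identical terms, so small positive values flag likely synonyms or typos that a user-supplied synonym list can then resolve.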
Managing large data sets comes with the task of providing a certain level of quality assurance, no matter what the data is used for. We present here the fundamental SAS® procedures to run when determining the completeness of a data set. Even though each data set is unique and has its own variables that need more detailed examination, it is important to first examine the size, time, range, interactions, and purity (STRIP) of a data set to determine its readiness for any use. This paper covers the first steps you should always take, regardless of whether you are dealing with health, financial, demographic, or environmental data.
Michael Santema, Kaiser Permanente
Fagen Xie, Kaiser Permanente
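As an illustration of the kind of first-pass checks the paper advocates, the sketch below runs three base SAS procedures against a placeholder data set (mydata); the variable handling is generic and would be tailored to each data set in practice.

    proc contents data=mydata;                  /* Size: observations, variables, attributes */
    run;

    proc means data=mydata n nmiss min max;     /* Range and completeness of numeric variables */
    run;

    proc freq data=mydata;
       tables _character_ / missing;            /* Purity: unexpected or misspelled category levels */
    run;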
Looking for new ways to improve your business? Try mining your own data! Event log data is a by-product of information systems, generated for audit and security purposes, and is seldom analyzed, especially in combination with business data. With the growth of cloud computing, more event log data has accumulated, and analysts are searching for innovative ways to take advantage of all data resources in order to get valuable insights. Process mining, a new field for discovering business patterns from event log data, has recently proved useful for business applications. Process mining shares some algorithms with data mining but is more focused on interpreting the detected patterns than on prediction. Analyzing these patterns can lead to improvements in the efficiency of common existing and planned business processes. Through process mining, analysts can uncover hidden relationships between resources and activities and make changes to improve organizational structure. This paper shows you how to use SAS® Analytics to gain insights from real event log data.
Emily (Yan) Gao, SAS
Robert Chu, SAS
Xudong Sun, SAS
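As a small illustration of what process mining can extract from an event log, the sketch below counts activity-to-activity transitions per case, the raw material of a process map. The event log layout (case_id, activity, timestamp) is a hypothetical example, not the data used in the paper.

    proc sort data=eventlog;
       by case_id timestamp;                     /* order each case's events chronologically */
    run;

    data transitions;
       set eventlog;
       by case_id;
       length prev_activity $40;
       prev_activity = lag(activity);            /* activity that immediately preceded this one */
       if first.case_id then prev_activity = ""; /* do not link events across cases */
       if prev_activity ne "";                   /* keep only genuine transitions */
    run;

    proc freq data=transitions order=freq;       /* the most common paths through the process */
       tables prev_activity*activity / norow nocol nopercent;
    run;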
Statistical quality improvement is based on understanding process variation, which falls into two categories: variation that is natural and inherent to a process, and unusual variation due to specific causes that can be addressed. If you can distinguish between natural and unusual variation, you can take action to fix a broken process and avoid disrupting a stable process. A control chart is a tool that enables you to distinguish between the two types of variation. In many health care activities, carefully designed processes are in place to reduce variation and limit adverse events. The types of traditional control charts that are designed to monitor defect counts are not applicable to monitoring these rare events, because these charts tend to be overly sensitive, signaling unusual variation each time an event occurs. In contrast, specialized rare events charts are well suited to monitoring low-probability events. These charts have gained acceptance in health care quality improvement applications because of their ease of use and their suitability for processes that have low defect rates. The RAREEVENTS procedure, which is new in SAS/QC® 14.1, produces rare events charts. This paper presents an overview of PROC RAREEVENTS and illustrates how you can use rare events charts to improve health care quality.
Bucky Ransdell, SAS
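As a hedged sketch only: the steps below compute the number of days between successive events, which is the kind of measure a rare events chart monitors, and then chart it. The data set and variable names are hypothetical, and the PROC RAREEVENTS invocation is our assumption about the syntax; consult the SAS/QC 14.1 documentation for the exact statements.

    proc sort data=adverse_events;
       by event_date;
    run;

    data days_between;
       set adverse_events;
       days_between = event_date - lag(event_date);   /* interval since the previous event */
       if _n_ > 1;                                     /* the first event has no prior interval */
    run;

    proc rareevents data=days_between;                 /* assumed invocation; see the SAS/QC docs */
       chart days_between;
    run;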
Memory-based reasoning (MBR) is an empirical classification method that works by comparing cases in hand with similar examples from the past and then applying that information to the new case. MBR modeling is based on the assumptions that the input variables are numeric, orthogonal to each other, and standardized. The latter two assumptions are addressed by applying a principal components transformation to the raw variables and using the components, instead of the raw variables, as inputs to MBR. To satisfy the first assumption, categorical variables are often dummy coded. This raises issues such as increased dimensionality and overfitting to the training data, because dummy coding introduces discontinuities in the response surface relating the inputs to the target variable. The Weight of Evidence (WOE) method overcomes this challenge. The method measures the relative response of the target for each level of a categorical variable, and each level is then replaced by the pattern of target response within that category. The SAS® Enterprise Miner™ Interactive Grouping node is used to achieve this, converting the categorical variables into numeric variables. This paper demonstrates the improvement in performance of an MBR model when categorical variables are WOE coded. A credit screening data set obtained from SAS® Education, comprising 25 attributes for 3,000 applicants, is used for this study. Three different MBR models were built using the SAS Enterprise Miner MBR node to check the improvement in performance. The results show that the MBR model with WOE-coded categorical variables is the best, based on misclassification rate. With WOE coding, the misclassification rate decreases from 0.382 to 0.344 and the sensitivity increases from 0.552 to 0.572.
Vinoth Kumar Raja, West Corporation
Goutam Chakraborty, Oklahoma State University
Vignesh Dhanabal, Oklahoma State University
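For readers unfamiliar with WOE coding, a level's weight of evidence is the log of the ratio of its share of non-events to its share of events: WOE = log((non-events in level / total non-events) / (events in level / total events)). The PROC SQL sketch below computes and applies it for a single hypothetical categorical input (purpose) and a binary target (bad); in the paper this grouping is performed by the Interactive Grouping node rather than hand-written code.

    proc sql;
       create table woe_map as                          /* one WOE value per category level */
       select purpose,
              log( (sum(bad = 0) / (select sum(bad = 0) from credit)) /
                   (sum(bad = 1) / (select sum(bad = 1) from credit)) ) as woe
       from credit
       group by purpose;

       create table credit_woe as                       /* replace the level with its WOE value */
       select a.*, b.woe as purpose_woe
       from credit as a left join woe_map as b
         on a.purpose = b.purpose;
    quit;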
The North Carolina Community College System office can now quickly and easily enable colleges to compare the success of their programs with programs at other colleges. Institutional researchers can spend their days examining trends, anomalies, and comparisons with other colleges rather than digging for data to load into a Microsoft Excel spreadsheet. We look at performance measures and how programs are graded using SAS® Visual Analytics.
Bill Schneider, North Carolina Community College System
Automation of everyday activities holds the promise of consistency, accuracy, and relevancy. When applied to business operations, the additional benefits of governance, adaptability, and risk avoidance are realized. Prescriptive analytics empowers both systems and front-line workers to take the desired company action each and every time. And with data streaming from transactional systems, from the Internet of Things (IoT), and from any other source, doing the right thing with exceptional processing speed produces the responsiveness that customers depend on. This talk describes how SAS® and Teradata are enabling prescriptive analytics in current business environments and in the emerging IoT.
Tho Nguyen, Teradata
Fiona McNeill, SAS
Educational systems at the district, state, and national levels all report possessing amazing student-level longitudinal data systems (LDS). Are these systems improving educational outcomes for students? Are they guiding the development of effective instructional practices? Are the standardized exams measuring student knowledge relative to the learning expectations? Many questions exist about the effective use of LDS and educational data, but data architecture and analytics (including the products developed by SAS®) are not, by themselves, designed to answer these questions. However, the ability to develop more effective educational interfaces, improve the use of data at the classroom level, and improve student outcomes might only be realized through the use of SAS. The purpose of this session and paper is to demonstrate an integrated use of SAS tools to guide the transformation of data into analytics that improve educational outcomes for all students.
Sean Mulvenon, University of Arkansas
The LACE readmission risk score is a methodology used by Kaiser Permanente Northwest (KPNW) to target and customize readmission prevention strategies for patients admitted to the hospital. This presentation shares how KPNW used SAS® in combination with Epic's Datalink to integrate the LACE score into its electronic health record (EHR) for use in real time. The LACE score is an objective measure composed of four components: L) length of stay; A) acuity of admission; C) pre-existing co-morbidities; and E) emergency department (ED) visits in the prior six months. SAS was used to perform complex calculations and combine data from multiple sources (which was not possible in the EHR alone), and then to calculate a score that was integrated back into the EHR. The technical approach includes a trigger macro to kick off the process once the database ETL completes, several explicit and implicit PROC SQL statements, a volatile temp table for filtering, and a series of SORT, MEANS, TRANSPOSE, and EXPORT procedures. We walk through the technical approach taken to generate and integrate the LACE score into Epic, as well as the challenges we faced, how we overcame them, and the benefits we have gained throughout the process.
Delilah Moore, Kaiser Permanente
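To show the shape of the scoring step, here is a minimal sketch that sums the four LACE components after upstream PROC SQL steps have assembled one row per admission. All data set and variable names are hypothetical, and the point assignments are illustrative textbook-style values, not necessarily the exact weights KPNW implemented.

    data lace_scores;
       set admissions;                           /* one row per admission, built upstream */

       if los_days < 1        then l_pts = 0;    /* L: length of stay */
       else if los_days <= 3  then l_pts = los_days;
       else if los_days <= 6  then l_pts = 4;
       else if los_days <= 13 then l_pts = 5;
       else                        l_pts = 7;

       a_pts = 3 * (acute_admit = 1);            /* A: acuity (acute/emergent) of admission */
       c_pts = min(comorbidity_index, 5);        /* C: pre-existing co-morbidities, capped */
       e_pts = min(ed_visits_6mo, 4);            /* E: ED visits in the prior six months, capped */

       lace_score = sum(l_pts, a_pts, c_pts, e_pts);
    run;

    proc export data=lace_scores outfile="lace_scores.csv"
       dbms=csv replace;                         /* hand the scores back for EHR integration */
    run;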