Base SAS® Papers A-Z

Paper 1815-2014:
A Case Study: Performance Analysis and Optimization of SAS® Grid Computing Scaling on a Shared Storage
SAS® Grid Computing is a scale-out SAS® solution that enables SAS applications, which are extremely I/O and compute intensive, to better utilize computing resources. It requires high-performance shared storage (SS) that allows all servers to access the same file systems. SS may be implemented via traditional NFS NAS or clustered file systems (CFS) like GPFS. This paper uses the Lustre file system, a parallel, distributed CFS, for a case study of the performance scalability of SAS Grid Computing nodes on SS. The paper qualifies the performance of a standardized SAS workload running on Lustre at scale. Lustre has traditionally been used for large, sequential I/O. We record and present the tuning changes necessary to optimize Lustre for SAS applications. In addition, results from the scaling of SAS Cluster jobs running on Lustre are presented.
Suleyman Sair, Intel Corporation
Brett Lee, Intel Corporation
Ying M. Zhang, Intel Corporation
Paper 1548-2014:
A Framework Based on SAS® for ETL and Reporting
Nowadays, most corporations build and maintain their own data warehouse, and an ETL (Extract, Transform, and Load) process plays a critical role in managing the data. Some people might create one large program and execute it from top to bottom. Others might generate a SAS® driver with several programs included, and then execute this driver. If some programs can be run in parallel, then developers must write extra code to handle these concurrent processes. If one program fails, then users can either rerun the entire process or comment out the successful programs and resume the job from where the program failed. Usually the programs are deployed in production with read and execute permission only, so users do not have the privilege of modifying code on the fly. In this case, how do you comment out the programs if the job terminates abnormally? This paper illustrates an approach for managing ETL process flows. The approach uses a framework based on SAS, on a UNIX platform. This is a high-level infrastructure discussion with some explanation of the SAS code that is used to implement the framework. The framework supports rerunning or partially running the entire process without changing any source code. It also supports concurrent processes, so no extra code is needed.
Kevin Chung, Fannie Mae
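The framework itself is not reproduced here, but as a minimal sketch of one capability it provides (running independent ETL programs concurrently on UNIX), the SYSTASK statement can launch and synchronize parallel SAS sessions; the program paths and task names below are hypothetical:

    /* Launch two independent ETL programs in parallel, then wait for both. */
    systask command "sas /etl/load_accounts.sas" taskname=t1 status=rc1;
    systask command "sas /etl/load_balances.sas" taskname=t2 status=rc2;
    waitfor _all_ t1 t2;
    %put rc1=&rc1 rc2=&rc2;   /* a nonzero status flags a step to rerun */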
Paper 1876-2014:
A Mental Health and Risk Behavior Analysis of American Youth Using PROC FACTOR and SURVEYLOGISTIC
The current study looks at recent health trends and behavior analyses of youth in America. Data used in this analysis was provided by the Centers for Disease Control and Prevention and gathered using the Youth Risk Behavior Surveillance System (YRBSS). A factor analysis was performed to identify and define latent mental health and risk behavior variables. A series of logistic regression analyses were then performed using the risk behavior and demographic variables as potential contributing factors to each of the mental health variables. Mental health variables included disordered eating and depression/suicidal ideation data, while the risk behavior variables included smoking, consumption of alcohol and drugs, violence, vehicle safety, and sexual behavior data. Implications derived from the results of this research are a primary focus of this study. The risks and benefits of using factor analysis with logistic regression in social science research are also discussed in depth. Results include differences reported between 1991 and 2011. All results are discussed in relation to current youth health trend issues. Data was analyzed using SAS® 9.3.
Deanna Schreiber-Gregory, North Dakota State University
Paper 1752-2014:
A Note on Type Conversions and Numeric Precision in SAS®: Numeric to Character and Back Again
One of the first lessons that SAS® programmers learn on the job is that numeric and character variables do not play well together, and that type mismatches are one of the more common sources of errors in their otherwise flawless SAS programs. Luckily, converting variables from one type to another in SAS (that is, casting) is not difficult, requiring only the judicious use of either the input() or put() function. There remains, however, the danger of data being lost in the conversion process. This type of error is most likely to occur in cases of character-to-numeric conversion, especially when the user does not fully understand the data contained in the data set. This paper reviews the basics of data storage for character and numeric variables in SAS, the use of formats and informats for conversions, and how to ensure accurate type conversion of even high-precision numeric values.
Andrew Clapson, Statistics Canada
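A minimal sketch of the two casting directions the paper reviews, using the input() and put() functions (the values are illustrative; the paper's central caution is that digit strings beyond about 15 significant digits can lose precision when stored as numerics):

    data _null_;
       charval = '00123.45';
       numval  = input(charval, best32.);   /* character-to-numeric */
       charout = put(numval, 8.2);          /* numeric-to-character */
       put numval= charout=;
    run;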
Paper 1797-2014:
A Paradigm Shift: Complex Data Manipulations with DS2 and In-Memory Data Structures
Complex data manipulations can be resource intensive, both in terms of development time and processing duration. However, in recent years SAS has introduced a number of new technologies that, when used together, can produce a dramatic increase in performance while simultaneously simplifying program development and maintenance. This paper presents a development paradigm that utilizes the problem decomposition capabilities of DS2, the flexibility of SQL, and the performance benefits of in-memory storage using hash objects.
Shaun Kaufmann, Farm Credit Canada
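The paper combines DS2, SQL, and hash objects; as one self-contained illustration of the in-memory hash lookups it relies on, here is the classic DATA step hash-object idiom (data set and variable names are hypothetical):

    data joined;
       length attr $20;
       if _n_ = 1 then do;
          declare hash h(dataset: 'work.lookup');   /* load table into memory */
          h.defineKey('id');
          h.defineData('attr');
          h.defineDone();
          call missing(attr);
       end;
       set work.main;          /* MAIN supplies the key variable ID */
       if h.find() = 0;        /* keep rows with a matching key     */
    run;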
Paper 1793-2014:
A Poor/Rich SAS® User's PROC EXPORT
Have you ever wished that with one click you could copy any SAS® data set, including variable names, so that you could paste the text into a Microsoft Word file, Microsoft PowerPoint slide, or spreadsheet? You can and, with just Base SAS®, there are some little-known but easy-to use methods that are available for automating many of your (or your users ) common tasks.
Arthur Tabachneck, myQNA, Inc.
Tom Abernathy, Pfizer, Inc.
Matthew Kastin, I-Behavior, Inc.
Paper 1442-2014:
A Risk Score Calculator for Short-Term Morbidity Following Hip Fracture Surgery
Hip fractures are a common source of morbidity and mortality among the elderly. While multiple prior studies have identified risk factors for poor outcomes, few studies have presented a validated method for stratifying patient risk. The purpose of this study was to develop a simple risk score calculator tool predictive of 30-day morbidity after hip fracture. To achieve this, we prospectively queried a database maintained by The American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) to identify all cases of hip fracture between 2005 and 2010, based on primary Current Procedural Terminology (CPT) codes. Patient demographics, comorbidities, laboratory values, and operative characteristics were compared in a univariate analysis, and a multivariate logistic regression analysis was then used to identify independent predictors of 30-day morbidity. Weighted values were assigned to each independent risk factor and were used to create predictive models of 30-day complication risk. The models were internally validated with randomly partitioned 80%/20% cohort groups. We hypothesized that significant predictors of morbidity could be identified and used in a predictive model for a simple risk score calculator. All analyses were performed with SAS® software.
Yubo Gao, University of Iowa Hospitals and Clinics
Paper 1657-2014:
A SAS® Macro for Complex Sample Data Analysis Using Generalized Linear Models
This paper shows users how they can use a SAS® macro named %SURVEYGLM to incorporate survey design information into generalized linear models (GLMs). The R function svyglm (Lumley, 2004) was used to verify the suitability of the %SURVEYGLM macro estimates. The results show that the estimates are close to those from the R function and that new distributions can be easily added to the algorithm.
Paulo Dourado, University of Brasilia
Alan Silva, Universidade de Brasilia
Paper 1461-2014:
A SAS® Macro to Diagnose Influential Subjects in Longitudinal Studies
Influence analysis in statistical modeling looks for observations that unduly influence the fitted model. Cook's distance is a standard tool for influence analysis in regression. It works by measuring the difference in the fitted parameters as individual observations are deleted. You can apply the same idea to examining the influence of groups of observations (for example, the multiple observations for subjects in longitudinal or clustered data), but you need to adapt it to the fact that different subjects can have different numbers of observations. Such an adaptation is discussed by Zhu, Ibrahim, and Cho (2012), who generalize the subject size factor as the so-called degree of perturbation, and correspondingly generalize Cook's distances as the scaled Cook's distance. This paper presents the %SCDMixed SAS® macro, which implements these ideas for analyzing influence in mixed models for longitudinal or clustered data. The macro calculates the degree of perturbation and scaled Cook's distance measures of Zhu et al. (2012) and presents the results with useful tabular and graphical summaries. The underlying theory is discussed, as well as some of the programming tricks useful for computing these influence measures efficiently. The macro is demonstrated using both simulated and real data to show how you can interpret its results for analyzing influence in your longitudinal modeling.
Grant Schneider, The Ohio State University
Randy Tobias, SAS Institute
Paper 1822-2014:
A Stepwise Algorithm for Generalized Linear Mixed Models
Stepwise regression includes regression models in which the predictive variables are selected by an automated algorithm. The stepwise method involves two approaches: backward elimination and forward selection. Currently, SAS® has three procedures capable of performing stepwise regression: REG, LOGISTIC, and GLMSELECT. PROC REG handles the linear regression model but does not support a CLASS statement. PROC LOGISTIC handles binary responses and allows for logit, probit, and complementary log-log link functions; it also supports a CLASS statement. The GLMSELECT procedure performs selection in the framework of general linear models. It allows for a variety of model selection methods, including the LASSO method of Tibshirani (1996) and the related LAR method of Efron et al. (2004). PROC GLMSELECT also supports a CLASS statement. We present a stepwise algorithm for generalized linear mixed models, for both marginal and conditional models. We illustrate the algorithm using data from a longitudinal epidemiology study aimed at investigating parents' beliefs, behaviors, and feeding practices that are positively or negatively associated with indices of sleep quality.
Nagaraj Neerchal, University of Maryland Baltimore County
Jorge Morel, Procter and Gamble
Xuang Huang, University of Maryland Baltimore County
Alain Moluh, University of Maryland Baltimore County
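The authors' GLMM stepwise algorithm is their own contribution and is not shown here; for reference, a minimal sketch of the built-in stepwise selection with a CLASS variable in PROC GLMSELECT, which the abstract describes (data set and variable names are hypothetical):

    proc glmselect data=sleep_study;
       class site;
       model sleep_quality = site age bmi stress
             / selection=stepwise(select=sl sle=0.15 sls=0.15);
    run;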
Paper 1636-2014:
A Way to Fetch User Reviews from iTunes Using SAS®
This paper develops a new SAS® macro that allows you to scrape users' textual reviews of iPhone applications from the Apple iTunes Store. It not only can help you understand your customers' experiences and needs, but also can help you stay aware of your competitors' user experiences. The macro uses the iTunes API and PROC HTTP in SAS to extract the data and create data sets. This paper also shows how you can use the application ID and country code to extract user reviews.
Jiawen Liu, Qualex Consulting Services, Inc.
Mantosh Kumar Sarkar, Verizon
Meizi Jin, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
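A minimal sketch of the PROC HTTP call at the heart of this approach; the feed URL shown is illustrative only, with XXXXXXXXX standing in for a real application ID:

    filename resp temp;
    proc http
       url="https://itunes.apple.com/us/rss/customerreviews/id=XXXXXXXXX/json"
       method="GET"
       out=resp;
    run;
    /* The JSON response held in RESP is then parsed into SAS data sets. */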
Paper 1881-2014:
Absolute_Pixel_Width? Taming Column Widths in the ExcelXP Tagset
The ExcelXP tagset offers several options for controlling column widths, including Width_Points, Width_Fudge, and Absolute_Column_Width. Although Absolute_Column_Width might seem unpredictable at first, it is possible to fix the first two options so that Absolute_Column_Width gives the exact column width in pixels. This poster presents these settings and suggests how to create and manage the integer string of column widths.
Dylan Ellis, Mathematica Policy Research
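A minimal sketch in the spirit of the poster; pinning Width_Points and Width_Fudge to 1 is one plausible configuration under which the Absolute_Column_Width list drives the column widths directly (the width values are hypothetical):

    ods tagsets.excelxp file='widths.xml'
        options(width_points='1' width_fudge='1'
                absolute_column_width='120,80,60');
    proc print data=sashelp.class noobs;
    run;
    ods tagsets.excelxp close;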
Paper 1277-2014:
Adding Serial Numbers to SQL Data
Structured Query Language (SQL) does not recognize the concept of row order. Instead, query results are thought of as unordered sets of rows. Most workarounds involve including serial numbers, which can then be compared or subtracted. This presentation illustrates and compares five techniques for creating serial numbers.
Howard Schreier, Howles Informatics
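Two of the better-known workarounds, sketched with a hypothetical input table: the DATA step automatic variable _N_, and the undocumented (and unsupported) MONOTONIC() function in PROC SQL:

    data numbered;        /* DATA step: _N_ as a serial number */
       set unordered;
       serial = _n_;
    run;

    proc sql;             /* PROC SQL: undocumented MONOTONIC() */
       create table numbered2 as
       select monotonic() as serial, u.*
       from unordered as u;
    quit;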
Paper 1850-2014:
Adding the Power of DataFlux® to SAS® Programs Using the DQMATCH Function
The SAS® Data Quality Server allows SAS® programmers to integrate the power of DataFlux® into their data cleaning programs. The power of SAS Data Quality Server enables programmers to efficiently identify matching records across different datasets when exact matches are not present. During a recent educational research project, the DQMATCH function proved very capable when trying to link records from disparate data sources. Two key insights led to even greater success in linking records. The first insight was acknowledging that the hierarchical structure of data can greatly improve success in matching records. The second insight was that the names of individuals can be restructured to improve the chances of successful matches. This paper provides an overview of how these insights were implemented using the DQMATCH function to link educational data from multiple sources.
Pat Taylor, University of Houston
Lee Branum-Martin, Georgia State University
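A minimal sketch of generating match codes with the DQMATCH function; the data set, sensitivity value, and match definition shown are illustrative, and the ENUSA locale must already be loaded (for example, via the %DQLOAD autocall macro):

    data matched;
       set work.names;
       length mcode $40;
       /* equal match codes at sensitivity 85 indicate a fuzzy match */
       mcode = dqMatch(fullname, 'Name', 85, 'ENUSA');
    run;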
Paper 1570-2014:
Adjusting Clustering: Minimize Your Suffering!
Cluster (group) randomization in trials is increasingly used over patient-level randomization. There are many reasons for this, including more pragmatic trials associated with comparative effectiveness research. Examples of clusters that could be randomized for study are clinics or hospitals, counties within a state, and other geographical areas such as communities. In many of these trials, the number of clusters is relatively small. This can be a problem if there are important covariates at the cluster level that are not balanced across the intervention and control groups. For example, if we randomize eight counties, a simple randomization could put all counties with high socioeconomic status in one group or the other, leaving us without good comparison data. There are strategies to prevent an unlucky cluster randomization. These include matching, stratification, minimization and covariate-constrained randomization. Each method is discussed, and a county-level Health Economics example of covariate-constrained randomization is shown for intermediate SAS® users working with SAS® Foundation for Release 9.2 and SAS/STAT® on a Windows operating system.
Brenda Beaty, University of Colorado
L. Miriam Dickinson, University of Colorado
Paper 1720-2014:
An Ensemble Approach for Integrating Intuition and Models
Finding groups with similar attributes is at the core of knowledge discovery. To this end, cluster analysis automatically locates groups of similar observations. Despite successful applications, many practitioners are uncomfortable with the degree of automation in cluster analysis, which causes intuitive knowledge to be ignored. This is even more true in text mining applications, since individual words have meaning beyond the data set. Discovering groups with similar text is extremely insightful. However, blind applications of clustering algorithms ignore intuition and hence are unable to group similar text categories. The challenge is to integrate the power of clustering algorithms with the knowledge of experts. We demonstrate how SAS/STAT® 9.2 procedures and the SAS® Macro Language are used to ensemble the opinion of domain experts with multiple clustering models to arrive at a consensus. The method has been successfully applied to a large data set with structured attributes and unstructured opinions. The result is the ability to discover observations with similar attributes and opinions by capturing the wisdom of the crowds, whether man or model.
Masoud Charkhabi, Canadian Imperial Bank of Commerce (CIBC)
Ling Zhu, Canadian Imperial Bank of Commerce (CIBC)
Paper 1869-2014:
An Intermediate Primer to Estimating Linear Multilevel Models Using SAS® PROC MIXED
This paper expands upon "A Multilevel Model Primer Using SAS® PROC MIXED," in which we presented an overview of estimating two- and three-level linear models via PROC MIXED. In our earlier paper, however, we relied for the most part on simple options available in PROC MIXED. In this paper, we present a more advanced look at common PROC MIXED options used in the analysis of social and behavioral science data, as well as introduce users to two different SAS macros previously developed for use with PROC MIXED: one to examine model fit (MIXED_FIT) and the other to examine distributional assumptions (MIXED_DX). Specific statistical options presented in the current paper include (a) PROC MIXED statement options for estimating the statistical significance of variance estimates (COVTEST, including problems with using this option) and estimation methods (METHOD=), (b) the MODEL statement option for degrees-of-freedom estimation (DDFM=), and (c) the RANDOM statement option for specifying the variance/covariance structure to be used (TYPE=). Given the importance of examining model fit, we also present methods for estimating changes in model fit through an illustration of the SAS macro MIXED_FIT. Likewise, the SAS macro MIXED_DX is introduced to remind users to examine the distributional assumptions associated with two-level linear models. To maintain continuity with the 2013 introductory PROC MIXED paper, thus providing users with a set of comprehensive guides for estimating multilevel models using PROC MIXED, we use the same real-world data sources that we used in our earlier primer paper.
Bethany Bell, University of South Carolina
Whitney Smiley, University of South Carolina
Mihaela Ene, University of South Carolina
Genine Blue, University of South Carolina
Paper 1842-2014:
An Investigation of the Kolmogorov-Smirnov Nonparametric Test Using SAS®
The Kolmogorov-Smirnov (K-S) test is one of the most useful and general nonparametric methods for comparing two samples. It is sensitive to all types of differences between two populations (shift, scale, shape, and so on). In this paper, we present a thorough investigation of the K-S test, including derivation of the formal test procedure, a practical demonstration of the test, large-sample approximation of the test, and its ease of use in SAS® via the NPAR1WAY procedure.
Tison Bolen, Cardinal Health
Dawit Mulugeta, Cardinal Health
Jason Greenfield, Cardinal Health
Lisa Conley, Cardinal Health
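A minimal sketch of the two-sample K-S test in PROC NPAR1WAY, as the paper demonstrates (data set and variable names are hypothetical):

    proc npar1way data=samples edf;   /* EDF requests Kolmogorov-Smirnov */
       class group;                   /* two-level grouping variable     */
       var score;
    run;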
Paper 1507-2014:
An Implementation of MapReduce in Base SAS®
Big data! Hadoop! MapReduce! These are all buzzwords that you've probably already heard mentioned at SAS® Global Forum 2014. But what exactly is MapReduce, and what has it got to do with SAS®? This talk explains how a simple processing framework (created by Google and more recently popularized by the open-source technology Hadoop) can be replicated using cornerstone SAS technologies such as Base SAS®, SAS macros, and SAS/CONNECT®. The talk explains how, out of the box, the SAS DATA step can replicate the map function. It looks at how well-established SAS procedures can be used to create reduce-like functionality. We look at how processing data in parallel across multiple machines using MPCONNECT can replicate MapReduce's shared-nothing approach to data processing.
David Moors, Whitehound Limited
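A minimal word-count sketch of the map and reduce phases in Base SAS, under the paper's framing (data set and variable names are hypothetical):

    /* "map": emit one key-value pair per word */
    data mapped;
       set documents;
       length key $32;
       do i = 1 to countw(text, ' ');
          key = lowcase(scan(text, i, ' '));
          value = 1;
          output;
       end;
       keep key value;
    run;

    /* "reduce": aggregate the values by key */
    proc summary data=mapped nway;
       class key;
       var value;
       output out=reduced(drop=_type_ _freq_) sum=count;
    run;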
Paper 1565-2014:
Analyzing U.S. Healthcare Cost and Use with SAS®
A central component of discussions of healthcare reform in the U.S. is the estimation of healthcare cost and use at the national or state level, as well as subpopulation analyses for individuals with certain demographic properties or medical conditions. For example, a striking but persistent observation is that just 1% of the U.S. population accounts for more than 20% of total healthcare costs, and 5% accounts for almost 50% of total costs. In addition to describing the specific data sources underlying this type of observation, we demonstrate how to use SAS® to generate these estimates and to extend the analysis in various ways; that is, to investigate costs for specific subpopulations. The goal is to provide SAS programmers and healthcare analysts with sufficient data-source background and analytic resources to independently conduct analyses on a wide variety of topics in healthcare research. For selected examples, such as the estimates above, we concretely show how to download the data from federal web sites, replicate published estimates, and extend the analysis. An added plus is that most of the data sources we describe are available as free downloads.
Paul Gorrell, IMPAQ International
Paper 1877-2014:
Answer Frequently Asked SAS® Usage Questions with the Help of RTRACE
A SAS® license for any organization consists of a variety of SAS components such as SAS/STAT®, SAS/GRAPH®, SAS/OR®, and so on. SAS administrators do not have any automated tool supplied with Base SAS® software to find out how many licensed copies are being actively used, how many SAS users are actively utilizing the SAS server, and how many SAS data sets are being referenced. These questions help a SAS administrator make important decisions such as controlling SAS licenses, removing inactive SAS users, purging SAS data sets that have not been referenced for a long time, and so on. With the help of the RTRACE system option provided by SAS, these questions can be answered. The goal of this paper is to explain the setup of the RTRACE option and its use in making the SAS administrator's life easy. This paper is based on SAS® 9.2 running on the AIX operating system.
Airaha Chelvakkanthan Manickam, Cognizant Technology Solutions
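A minimal sketch, assuming a UNIX invocation: RTRACE writes every file the session opens to the location named by RTRACELOC, and the resulting flat log can then be filtered for data set references (paths are hypothetical):

    sas /jobs/etl_job.sas -rtrace all -rtraceloc /logs/rtrace_etl.log

    /* Afterward, keep only the SAS data set references */
    data referenced;
       infile '/logs/rtrace_etl.log' truncover;
       input path $256.;
       if lowcase(scan(path, -1, '.')) = 'sas7bdat';
    run;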
Paper 1630-2014:
Application of Survey Sampling for Quality Control
Sampling is widely used in different fields for quality control, population monitoring, and modeling. However, the purposes of sampling might be justified by the business scenario, such as legal or compliance needs. This paper uses one probability sampling method, stratified sampling, combined with the business cost of quality control review to determine an optimized sampling procedure that satisfies both statistical selection criteria and business needs. The first step is to determine the total number of strata by grouping strata that have small numbers of sample units, treating box-and-whisker-plot outliers as a single group. Then, the cost of reviewing the sample in each stratum is quantified by a corresponding business measure, the human working hour. Lastly, using the determined number of strata and the sample review cost, optimal allocation of the predetermined total sample population is applied to allocate the sample across the strata.
Yi Du, Freddie Mac
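A minimal sketch of the stratified selection step, with hypothetical per-stratum sample sizes (in the paper, the cost-based optimal allocation would determine these values):

    proc surveyselect data=population out=qc_sample
         method=srs sampsize=(12 8 20) seed=2014;
       strata risk_group;   /* one sample size per stratum, in order */
    run;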
Paper 1779-2014:
Assessing the Impact of Factors in a Facebook Post that Influence the EdgeRank Metric of Facebook Using the Power Ratio
Most marketers today are trying to use Facebook's network of 1.1 billion plus registered users for social media marketing. Local television stations and newspapers are no exception. This paper investigates what makes a post effective. A Facebook page that is owned by a brand has fans, or people who like the page and follow the stories posted on that page. The posts on a brand page, however, do not appear on all the fans' News Feeds. This is determined by EdgeRank, a proprietary Facebook algorithm that determines what content users see and how it is prioritized on their News Feeds. If marketers can understand how EdgeRank works, then they can develop more impactful posts and ultimately produce more effective social marketing using Facebook. The objective of this paper is to find the characteristics of a Facebook post that enhance the efficacy of a news outlet's page among its fans, using the Facebook Power Ratio as the target variable. Power Ratio, a surrogate for EdgeRank, was developed by experts at Frank N. Magid Associates, a research-based media consulting firm. Seventeen variables that describe the characteristics of a post were extracted from more than 8,000 posts, which were encoded by 10 media experts at Magid. Numerous models were built and compared to predict Power Ratio. The most useful model is a polynomial regression whose top three factors are whether a post asks fans to like the post, the content category of the post (news, weather, and so on), and the number of fans of the page.
Dinesh Yadav Gaddam, Oklahoma State University
Yogananda Domlur Seetharama, Oklahoma State University
Paper 1605-2014:
Assigning Agents to Districts under Multiple Constraints Using PROC CLP
The Challenge: assigning outbound calling agents in a telemarketing campaign to geographic districts. The districts have a variable number of leads, and each agent needs to be assigned entire districts with the total number of leads being as close as possible to a specified number for each of the agents (usually, but not always, an equal number). In addition, there are constraints concerning the distribution of assigned districts across time zones in order to maximize productivity and availability. Our Solution: use the SAS/OR® procedure PROC CLP to formulate the challenge as a constraint satisfaction problem (CSP) since the objective is not necessarily to minimize a cost function, but rather to find a feasible solution to the constraint set. The input consists of the number of agents, the number of districts, the number of leads in each district, the desired number of leads per agent, the amount by which the actual number of leads can differ from the desired number, and the time zone for each district.
Kevin Gillette, Accenture
Stephen Sloan, Accenture
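A toy two-agent sketch of the CSP formulation in PROC CLP; X1-X4 are 0/1 indicators for assigning each of four districts (with 30, 20, 25, and 25 leads) to agent 1, and the lead counts and tolerance are hypothetical:

    proc clp out=feasible;
       var (X1-X4) = [0, 1];                         /* 1 = district to agent 1 */
       lincon 30*X1 + 20*X2 + 25*X3 + 25*X4 >= 45;   /* keep agent 1 within     */
       lincon 30*X1 + 20*X2 + 25*X3 + 25*X4 <= 55;   /* 5 of the 50-lead target */
    run;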
Paper 1545-2014:
Association Mining of Brain Data: An EEG Study
Many different neuroscience researchers have explored how various parts of the brain are connected, but no one has performed association mining using brain data. In this study, we used SAS® Enterprise Miner 7.1 for association mining of brain data collected by a 14-channel EEG device. An application of the association mining technique is presented in this novel context of brain activities and by linking our results to theories of cognitive neuroscience. The brain waves were collected while a user processed information about Facebook, the most well-known social networking site. The data was cleaned using Independent Component Analysis via an open source MATLAB package. Next, by applying the LORETA algorithm, activations at every fraction of a second were recorded. The data was codified into transactions to perform association mining. Results showing how various parts of the brain get excited while processing the information are reported. This study provides preliminary insights into how brain wave data can be analyzed by widely available data mining techniques to enhance researchers' understanding of brain activation patterns.
Pankush Kalgotra, Oklahoma State University
Ramesh Sharda, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Paper 1746-2014:
Automatic Detection of Section Membership for SAS® Conference Paper Abstract Submissions: A Case Study
Do you have an abstract for an idea that you want to submit as a proposal to SAS® conferences, but you are not sure which section is the most appropriate one? In this paper, we discuss a methodology for automatically identifying the most suitable section or sections for your proposal content. We use SAS® Text Miner 12.1 and SAS® Content Categorization Studio 12.1 to develop a rule-based categorization model. This model is used to automatically score your paper abstract to identify the most relevant and appropriate conference sections to submit to for a better chance of acceptance.
Goutam Chakraborty, Oklahoma State University
Murali Pagolu, SAS
Paper 1732-2014:
Automatic and Efficient Post-Campaign Analyses By Using SAS® Macro Programs
In our previous work, we often needed to perform large numbers of repetitive and data-driven post-campaign analyses to evaluate the performance of marketing campaigns in terms of customer response. These routine tasks were usually carried out manually by using Microsoft Excel, which was tedious, time-consuming, and error-prone. In order to improve the work efficiency and analysis accuracy, we managed to automate the analysis process with SAS® programming and replace the manual Excel work. Through the use of SAS macro programs and other advanced skills, we successfully automated the complicated data-driven analyses with high efficiency and accuracy. This paper presents and illustrates the creative analytical ideas and programming skills for developing the automatic analysis process, which can be extended to apply in a variety of business intelligence and analytics fields.
Justin Jia, Canadian Imperial Bank of Commerce (CIBC)
Amanda Lin, Bell Canada
Paper 1617-2014:
Basic Concepts for Documenting SAS® Projects: Documentation Styles for SAS Projects, Programs, and Variables
This paper kicks off a project to write a comprehensive book of best practices for documenting SAS® projects. The presenter's existing documentation styles are explained. The presenter wants to discuss and gather current best practices used by the SAS user community. The presenter shows documentation styles at three different levels of scope: the first is a style used for project documentation, the second a style for program documentation, and the third a style for variable documentation. This third style enables researchers to repeat the modeling in SAS, in an alternative language, or conceptually.
Peter Timusk, Statistics Canada
Paper 1449-2014:
Basic SAS® PROCedures for Producing Quick Results
As IT professionals, saving time is critical. Delivering timely and quality-looking reports and information to management, end users, and customers is essential. SAS® provides numerous 'canned' PROCedures for generating quick results to take care of these needs ... and more. In this hands-on workshop, attendees acquire basic insights into the power and flexibility offered by SAS PROCedures using PRINT, FORMS, and SQL to produce detail output; FREQ, MEANS, and UNIVARIATE to summarize and create tabular and statistical output; and data sets to manage data libraries. Additional topics include techniques for informing SAS which data set to use as input to a procedure, how to subset data using a WHERE statement (or WHERE= data set option), and how to perform BY-group processing.
Kirk Paul Lafler, Software Intelligence Corporation
Paper 1722-2014:
Bayesian Framework in Early Phase Drug Development with SAS® Examples
There is an ever-increasing number of study designs and analyses of clinical trials that use Bayesian frameworks to interpret treatment effects. Many research scientists prefer to understand the power and probability of taking a new drug forward across the whole range of possible true treatment effects, rather than focusing on one particular value to power the study. Examples are used in this paper to show how to compute Bayesian probabilities using the SAS/STAT® MIXED procedure and UNIVARIATE procedure. Particular emphasis is given to applications in efficacy analysis, including the comparison of new drugs to placebos and to standard drugs on the market.
Howard Liang, inVentiv health Clinical
Paper 1444-2014:
Before You Get Started: A Macro Language Preview in Three Parts. Part 1: What the Language Is, What It Does, and What It Can Do
As complicated as the macro language is to learn, there are very strong reasons for doing so. At its heart, the macro language is a code generator. In its simplest uses, it can substitute simple bits of code, like variable names and the names of data sets that are to be analyzed. In more complex situations, it can be used to create entire statements and steps based on information that may even be unavailable to the person writing or executing the macro. At the time of execution, it can be used to make queries of the SAS® environment as well as the operating system, and utilize the gathered information to make informed decisions about how it is to further function and execute.
Art Carpenter, California Occidental Consultants
Paper 1445-2014:
Before You Get Started: A Macro Language Preview in Three Parts. Part 2: It's All about the Timing.Why the Macro Language Comes First
Because the macro language is primarily a code generator, it makes sense that the code that it creates must be generated before it can be executed. This implies that execution of the macro language comes first. Simple as this is in concept, timing issues and conflicts are often not so simple to recognize in application. As we use the macro language to take on more complex tasks, it becomes even more critical that we have an understanding of these issues.
Art Carpenter, California Occidental Consultants
Paper 1447-2014:
Before You Get Started: A Macro Language Preview in Three Parts. Part 3: Creating Macro Variables and Demystifying Their Scope
Macro variables and their values are stored in symbol tables, which in turn are held in memory. Not only are there a number of ways to create macro variables, but they can be created in a wide variety of situations. How they are created, and under what circumstances, affects the variable's scope: how and where the macro variable is stored and retrieved. There are a number of misconceptions about macro variable scope and about how macro variables are assigned to symbol tables. These misconceptions can cause problems that the new, and sometimes even the experienced, macro programmer does not anticipate. Understanding the basic rules for macro variable assignment can help the macro programmer solve some of these problems that are otherwise quite mystifying.
Art Carpenter, California Occidental Consultants
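A minimal sketch of the local-versus-global distinction the paper demystifies:

    %let scope = global;           /* stored in the global symbol table      */
    %macro inner;
       %local scope;               /* a new SCOPE in INNER's local table     */
       %let scope = local;         /* masks, but does not touch, the global  */
       %put Inside: &scope;
    %mend inner;
    %inner
    %put Outside: &scope;          /* the global value is unchanged          */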
Paper 1682-2014:
Big Data Analysis for Resource-Constrained Surgical Scheduling
The scheduling of surgical operations in a hospital is a complex problem, with each surgical specialty having to satisfy their demand while competing for resources with other hospital departments. This project extends the construction of a weekly timetable, the Master Surgery Schedule, which assigns surgical specialties to operating theater sessions by taking into account the post-surgery resource requirements, primarily post-operative beds on hospital wards. Using real data from the largest teaching hospital in Wales, UK, this paper describes how SAS® has been used to analyze large data sets to investigate the relationship between the operating theater schedule and the demand for beds on wards in the hospital. By understanding this relationship, a more well-informed and robust operating theater schedule can be produced that delivers economic benefit to the hospital and a better experience for the patients by reducing the number of cancelled operations caused by the unavailability of beds on hospital wards.
Elizabeth Rowse, Cardiff University
Paul Harper, Cardiff University
Paper 1792-2014:
Big Data/Metadata Governance
The emerging discipline of data governance encompasses data quality assurance, data access and use policy, security risks and privacy protection, and longitudinal management of an organization's data infrastructure. In the interests of forestalling another bureaucratic solution to data governance issues, this presentation features database programming tools that provide rapid access to big data and make selective access to and restructuring of metadata practical.
Sigurd Hermansen, Westat
Paper 1549-2014:
Build your Metadata with PROC CONTENTS and ODS OUTPUT
Simply using an ODS destination to replay PROC CONTENTS output does not provide the user with attractive, usable metadata. Harness the power of SAS® and ODS output objects to create designer multi-tab metadata workbooks with the click of a mouse!
Louise Hadden, Abt Associates Inc.
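A minimal sketch of capturing a PROC CONTENTS output object as data, the starting point for building designer metadata workbooks:

    ods output variables=var_metadata;   /* capture the Variables object */
    proc contents data=sashelp.class order=varnum;
    run;
    /* VAR_METADATA now holds one row per variable (name, type, length,
       format, label) and can be styled and exported, e.g. via ExcelXP. */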
Paper 1809-2014:
CMS Core Measures, the Affordable Care Act, and SAS® Visual Analytics
The Affordable Care Act (ACA) contains provisions that have stimulated interest in analytics among health care providers, especially those provisions that address quality of outcomes. High Impact Technologies (HIT) has been addressing these issues since before passage of the ACA and has a Health Care Data Model recognized by Gartner and implemented at several health care providers. Recently, HIT acquired SAS® Visual Analytics, and this paper reports our successful efforts to use SAS Visual Analytics for visually exploring Big Data for health care providers. Health care providers can suffer significant financial penalties for readmission rates above a certain threshold and other penalties related to quality of care. We have been able to use SAS Visual Analytics, coupled with our experience gained from implementing the HIT Healthcare Data Model at a number of Healthcare providers, to identify clinical measures that are significant predictors for readmission. As a result, we can help health care providers reduce the rate of 30-day readmissions.
Joe Whitehurst, High Impact Technologies
Diane Hatcher, SAS
Paper 1825-2014:
Calculate All Kappa Statistics in One Step
The use of Cohen's kappa has enjoyed growing popularity in the social sciences as a way of evaluating rater agreement on a categorical scale. The kappa statistic can be calculated as Cohen first proposed it in his 1960 paper or by using any one of a variety of weighting schemes, the most popular being the linear weighted kappa and the quadratic weighted kappa. Currently, SAS® users can produce the kappa statistic of their choice through PROC FREQ and the relevant AGREE options. Complications arise, however, when the data set does not contain a completely square cross-tabulation of data; that is, this method requires that both raters have at least one data point for every available category. Many solutions have been offered for this predicament, most involving the insertion of dummy records into the data and the assignment of zero weight to those records through an additional class variable. The result is a multi-step macro, extraneous variable assignments, and potential data integrity issues. The author offers a more elegant solution: a segment of code that uses brute force to calculate Cohen's kappa as well as all popular variants. The code uses nested PROC SQL statements to provide a single conceptual step that generates kappa statistics of all types, even those that users wish to define for themselves.
Matthew Duchnowski, Educational Testing Service (ETS)
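For reference, the standard PROC FREQ approach that the paper's PROC SQL solution replaces; it breaks down when the rater1*rater2 table is not square (data set and variable names are hypothetical):

    proc freq data=ratings;
       tables rater1*rater2 / agree;   /* simple and weighted kappa */
       test kappa wtkap;
    run;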
Paper 1861-2014:
Case Control Matching: Comparing Simple Distance- and Propensity Score-Based Methods
A case control study, in its most basic form, compares a case series to a matched control series; such studies are commonly implemented in the field of public health. While matching is intended to eliminate confounding, the main potential benefit of matching in case control studies is a gain in efficiency. There are many known methods for selecting a potential match or matches (in the case of 1:n studies) per case, the most prominent being the distance-based approach and matching on propensity scores. In this paper, we go through both, compare their results, and present a macro capable of performing both.
Lovedeep Gondara, BC Cancer Agency
Colleen Mcgahan, BC Cancer Agency
Paper 1661-2014:
Challenges of Processing Questionnaire Data from Collection to SDTM to ADaM and Solutions Using SAS®
Often in a clinical trial, measures are needed to describe pain, discomfort, or physical constraints that are visible but not measurable through lab tests or other vital signs. In these cases, researchers turn to questionnaires to provide documentation of improvement or statistically meaningful change in support of safety and efficacy hypotheses. For example, in studies (like Parkinson's studies) where pain or depression are serious non-motor symptoms of the disease, these questionnaires provide primary endpoints for analysis. Questionnaire data presents unique challenges in both collection and analysis in the world of CDISC standards. The questions are usually aggregated into scale scores, as the underlying questions by themselves provide little additional usefulness. SAS® is a powerful tool for extracting the raw data from the collection databases and transposing columns into the basic SDTM data structure, which is vertical. The data is then processed further per the instructions in the Statistical Analysis Plan (SAP). This involves translating the originally collected values into sums, and reversing the values of some questions. Missing values can be computed as means of the remaining questions. These scores are then saved as new rows in the ADaM (analysis-ready) data sets. This paper describes the types of questionnaires, how data collection takes place, the basic CDISC rules for storing raw data in SDTM, and how to create analysis data sets with derived records using ADaM standards, while maintaining traceability to the original question.
Karin LaPann, PRA International
Terek Peterson, PRA International
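A minimal sketch of the transposition from collected (wide) question columns to the vertical SDTM QS structure, with hypothetical variable names:

    proc transpose data=qs_wide
                   out=qs_vert(rename=(_name_=qstestcd col1=qsorres));
       by usubjid visitnum;
       var q1-q10;   /* one row per subject, visit, and question */
    run;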
Paper 1762-2014:
Chasing the Log File While Running the SAS® Program
Usually, log files are checked by users only after SAS® completes the execution of a program. If SAS finds an error in the current step, it skips that step and executes the next one, so errors come to light only once the whole program has run. Some programs take more than a day to complete. In that case, the user opens the log file in Read-Only mode frequently to check for errors, warnings, and unexpected notes, and manually terminates the execution of the program if any potential messages are identified. Otherwise, the user is notified of the errors in the log file only at the end of the execution. Our suggestion is to run a parallel utility program alongside the production program to check the log file of the currently running program and to notify the user through an e-mail when an error, warning, or unexpected note is found in the log file. The execution can also be terminated automatically, with the user notified, when potential messages are identified.
Harun Rasheed, Cognizant Technology Solutions
Amarnath Vijayarangan, Genpact
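A minimal sketch of the monitoring idea, assuming FILENAME EMAIL is configured at your site; the authors' utility also terminates the offending job, which is not shown here (paths and addresses are hypothetical):

    filename alert email to='analyst@example.com'
                         subject='Problem found in running SAS log';
    data _null_;
       infile '/jobs/production.log' truncover;
       input;
       if index(_infile_, 'ERROR:') or index(_infile_, 'WARNING:') then do;
          file alert;
          put 'Found: ' _infile_;
          stop;                     /* notify once, then quit */
       end;
    run;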
Paper 1717-2014:
Collaborative Problem Solving in the SAS® Community
When a SAS® user asked for help scanning words in textual data and then matching them to pre-scored keywords, it struck a chord with SAS programmers! They contributed code that solved the problem using hash structures, SQL, informats, arrays, and PRX routines. Of course, the next question was: which program is fastest? This paper compares the different approaches and evaluates the performance of the programs on varying amounts of data. The code for each program is provided to show how SAS has a variety of tools available to solve common problems. While this won't make you an expert on any of these programming techniques, you'll see each of them in action on a common problem.
Tom Kari, Tom Kari Consulting
Paper 1499-2014:
Combined SAS® ODS Graphics Procedures with ODS to Create Graphs of Individual Data
The graphical display of the individual data is important in understanding the raw data and the relationship between the variables in the data set. You can explore your data to ensure statistical assumptions hold by detecting and excluding outliers if they exist. Since you can visualize what actually happens to individual subjects, you can make your conclusions more convincing in statistical analysis and interpretation of the results. SAS® provides many tools for creating graphs of individual data. In some cases, multiple tools need to be combined to make a specific type of graph that you need. Examples are used in this paper to show how to create graphs of individual data using the SAS® ODS Graphics procedures (SG procedures).
Howard Liang, inVentiv health Clinical
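A minimal sketch of one such graph, a spaghetti plot of individual subjects over time drawn with PROC SGPLOT (data set and variable names are hypothetical):

    proc sgplot data=trial;
       series x=visit y=response / group=subjid;   /* one line per subject */
       xaxis integer;
    run;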
Paper 1613-2014:
Combining Multiple Date-Ranged Historical Data Sets with Dissimilar Date Ranges into a Single Change History Data Set
This paper describes a method that uses some simple SAS® macros and SQL to merge data sets containing related data that contains rows with varying effective date ranges. The data sets are merged into a single data set that represents a serial list of snapshots of the merged data, as of a change in any of the effective dates. While simple conceptually, this type of merge is often problematic when the effective date ranges are not consecutive or consistent, when the ranges overlap, or when there are missing ranges from one or more of the merged data sets. The technique described was used by the Fairfax County Human Resources Department to combine various employee data sets (Employee Name and Personal Data, Personnel Assignment and Job Classification, Personnel Actions, Position-Related data, Pay Plan and Grade, Work Schedule, Organizational Assignment, and so on) from the County's SAP-HCM ERP system into a single Employee Action History/Change Activity file for historical reporting purposes. The technique currently is used to combine fourteen data sets, but is easily expandable by inserting a few lines of code using the existing macros.
James Moon, County of Fairfax, Virginia
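A minimal sketch of the core overlap join for two of the merged data sets, with hypothetical names; each output row covers the intersection of the two effective date ranges:

    proc sql;
       create table merged as
       select a.emp_id, a.job_class, b.pay_grade,
              max(a.eff_start, b.eff_start) as eff_start format=date9.,
              min(a.eff_end,   b.eff_end)   as eff_end   format=date9.
       from job_hist as a
            inner join pay_hist as b
            on  a.emp_id = b.emp_id
            and a.eff_start <= b.eff_end
            and b.eff_start <= a.eff_end;
    quit;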
Paper 1543-2014:
Combining Type-III Analyses from Multiple Imputations
Missing data commonly occurs in medical, psychiatric, and social research. The SAS® MI and MIANALYZE procedures are often used to generate multiple imputations and then provide valid statistical inferences based on them. However, MIANALYZE cannot be used to combine type-III analyses obtained from multiply imputed data sets. In this manuscript, we write a macro to combine the type-III analyses generated by the SAS MIXED procedure based on multiple imputations. The proposed method can be extended to other procedures that report type-III analyses, such as GENMOD and GLM.
Binhuan Wang, New York University School of Medicine
Yixin Fang, New York University School of Medicine
Man Jin, Forest Research Institute
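A minimal sketch of the per-imputation step whose output the macro then combines; the Tests3 ODS table holds the type-III tests from each imputed data set (analysis variables are hypothetical):

    ods output Tests3=t3_by_imp;
    proc mixed data=mi_imputed;
       by _Imputation_;            /* created by PROC MI */
       class trt;
       model y = trt x trt*x;
    run;
    /* The macro pools the type-III F statistics across imputations. */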
Paper 1464-2014:
Communication-Effective Data Visualization: Design Principles, Widely Usable Graphic Examples, and Code for Visual Data Insights
Graphic software users are confronted with what I call Options Over-Choice, and with defaults that are designed to easily give you a result, but not necessarily the best result. This presentation and paper focus on guidelines for communication-effective data visualization. It demonstrates their practical implementation, using graphic examples likely to be adaptable to your own work. Code is provided for the examples. Audience members will receive the latest update of my tip sheet compendium of graphic design principles. The examples use SAS® tools (traditional SAS/GRAPH® or the newer ODS graphics procedures that are available with Base SAS®), but the design principles are really software independent. Come learn how to use data visualization to inform and influence, to reveal and persuade, using tips and techniques developed and refined over 34 years of working to get the best out of SAS® graphic software tools.
LeRoy Bessler, Bessler Consulting and Research
Paper 1798-2014:
Comparison of Five Analytic Techniques for Two-Group, Pre-Post Repeated Measures Designs Using SAS®
There has been debate regarding which method to use to analyze repeated measures continuous data when the design includes only two measurement times. Five different techniques can be applied and give similar results when there is little to no correlation between pre- and post-test measurements and when data at each time point are complete: 1) analysis of variance on the difference between pre- and post-test, 2) analysis of covariance on the differences between pre- and post-test controlling for pre-test, 3) analysis of covariance on post-test controlling for pre-test, 4) multiple analysis of variance on post- test and pre-test, and 5) repeated measures analysis of variance. However, when there is missing data or if a moderate to high correlation between pre- and post-test measures exists under an intent-to-treat analysis framework, bias is introduced in the tests for the ANOVA, ANCOVA, and MANOVA techniques. A comparison of Type III sum of squares, F-tests, and p-values for a complete case and an intent-to-treat analysis are presented. The analysis using a complete case data set shows that all five methods produce similar results except for the repeated measures ANOVA due to a moderate correlation between pre- and post-test measures. However, significant bias is introduced for the tests using the intent-to-treat data set.
J. Madison Hyer, Georgia Regents University
Jennifer Waller, Georgia Regents University
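A minimal sketch of technique 3, ANCOVA on the post-test controlling for the pre-test (data set and variable names are hypothetical):

    proc glm data=study;
       class group;
       model post = group pre;     /* pre-test as the covariate */
       lsmeans group / stderr pdiff;
    run;
    quit;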
Paper 1693-2014:
Conditional Execution "Switch Path" Logic in SAS® Data Integration Studio 4.6
With the growth in size and complexity of organizations investing in SAS® platform technologies, the size and complexity of ETL subsystems and data integration (DI) jobs is growing at a rapid rate. Developers are pushed to come up with new and innovative ways to improve process efficiency in their DI jobs to meet increasingly demanding service level agreements (SLAs). The ability to conditionally execute or switch paths in a DI job is an extremely useful technique for improving process efficiency. How can a SAS® Data Integration developer design a job to best suit conditional execution? This paper discusses a technique for providing a parameterized dynamic execution custom transformation that can be easily incorporated into SAS® Data Integration Studio jobs to provide process path switching capabilities. The aim of any data integration task is to ensure that all sources of business data are integrated as efficiently as possible. It is concerned with the repurposing of data via transformation, should be a value-adding process, and also should be the product of collaboration. Modularization of common or repeatable processes is a fundamental part of the collaboration process in DI design and development. Switch Path, a custom transformation built to conditionally execute branches or nodes in SAS Data Integration Studio, provides a reusable module for solving the conditional execution limitations of standard SAS Data Integration Studio transformations and jobs. Switch Path logic in SAS Data Integration Studio can serve many purposes in the day-to-day business needs of a SAS data integration developer, as it is completely reusable.
Prajwal Shetty, Tesco
Paper SAS2401-2014:
Confessions of a SAS® Dummy
People from all over the world are using SAS® analytics to achieve great things, such as to develop life-saving medicines, detect and prevent financial fraud, and ensure the survival of endangered species. Chris Hemedinger is not one of those people. Instead, Chris has used SAS to optimize his baby name selections, evaluate his movie rental behavior, and analyze his Facebook friends. Join Chris as he reviews some of his personal triumphs over the little problems in life, and learn how these exercises can help to hone your skills for when it really matters.
Chris Hemedinger, SAS
Paper 1651-2014:
Confirmatory Factor Analysis and Structural Equation Modeling of Non-cognitive Assessments Using PROC CALIS
Non-cognitive assessments, which measure constructs such as time management, goal-setting, and personality, are becoming more prevalent today in research within the domains of academic performance and workforce readiness. Many instruments that are used for this purpose contain a large number of items that can each be assigned to specific facets of the larger construct. The factor structure of each instrument emerges from a mixture of psychological theory and empirical research, often by doing exploratory factor analysis (EFA) using the SAS® procedure PROC FACTOR. Once an initial model is established, it is important to perform confirmatory factor analysis (CFA) to confirm that the hypothesized model provides a good fit to the data. If outcome data such as grades are collected, structural equation modeling (SEM) should also be employed to investigate how well the assessment predicts these measures. This paper demonstrates how the SAS procedure PROC CALIS is useful for performing confirmatory factor analysis and structural equation modeling. Examples of these methods are demonstrated and proper interpretation of the fit statistics and resulting output is illustrated.
Steven Holtzman, Educational Testing Service
Paper 1451-2014:
Converting Clinical Database to SDTM: The SAS® Implementation
The CDISC Study Data Tabulation Model (SDTM) provides a standardized structure and specification for a broad range of human and animal study data in pharmaceutical research, and is widely adopted in the industry for the submission of clinical trial data. Because SDTM requires additional variables and datasets that are not normally available in the clinical database, further programming is required to convert the clinical database into the SDTM datasets. This presentation introduces the concept and general requirements of SDTM, and the different approaches in the SDTM data conversion process. The author discusses database design considerations, implementation procedures, and SAS® macros that can be used to maximize the efficiency of the process. The creation of the DEFINE.XML metadata and the final SDTM dataset validation are also discussed.
Hong Chen, McDougall Scientific Ltd.
Paper 1719-2014:
Counting Days of the Week - The INTCK Approach
The INTCK function is used to obtain the number of time intervals between two dates. The INTCK function comes with arguments and argument modifiers that enable us to perform a variety of date-related manipulations. This paper deals with a simple real-world use of the INTCK function: calculating the frequency of each day of the week between the start and end dates of a trip. The INTCK function with its arguments can directly calculate the number of days of the week, as illustrated in this paper. The same use of the INTCK function through PROC SQL is also presented. All the code executed and presented in this paper involves Base SAS® Release 9.3 only.
Jinson Erinjeri, D.K. Shifflet & Associates
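A minimal sketch of the approach: with the WEEK.2 interval (weeks beginning on Monday), INTCK counts the Monday boundaries crossed between the two dates; shifting the interval index counts other weekdays (the dates are illustrative):

    data _null_;
       start  = '01JAN2014'd;
       finish = '31JAN2014'd;
       n_mondays = intck('week.2', start, finish);
       put n_mondays=;
    run;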
Paper 1749-2014:
Creating Define.xml v2 Using SAS® for FDA Submissions
When submitting clinical data to the Food and Drug Administration (FDA), besides the usual trial results, we need to submit information that helps the FDA understand the data. The FDA requires the CDISC Case Report Tabulation Data Definition Specification (Define-XML), which is based on the CDISC Operational Data Model (ODM), for submissions using the Study Data Tabulation Model (SDTM). Electronic submission to the FDA is therefore a process of following the guidelines from CDISC and the FDA. This paper illustrates how to create an FDA-guidance-compliant define.xml v2 from metadata by using SAS®.
Qinghua (Kathy) Chen, Exelixis Inc.
James Lenihan, Exelixis Inc.
Paper 1838-2014:
Creating Formats on the Fly
The Census Bureau conducts the Common Core of Data surveys for the National Center for Education Statistics annually. We have written SAS® programs to automate the database documentation. We try to avoid including hard-coded values in the programs. Thanks to a record layout spreadsheet, the analysts can quickly update the survey metadata outside the SAS programs. This paper explains how SAS can read the record layout spreadsheet to create formats on the fly. The analysts can update the values as changes occur over time without having to worry about writing correct SAS syntax. Behind the scenes, SAS is using dictionary views, macros, ODS OUTPUT, PROC TEMPLATE, PROC FORMAT, the ODS Report Writing Interface, and RTF to create the desired results. This paper uses syntax for SAS® 9.2, written for programmers at the intermediate level.
Suzanne Dorinski, US Census Bureau
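A minimal sketch of the CNTLIN= idiom that builds a format from data rather than hard-coded VALUE statements; the layout columns shown are hypothetical:

    data cntlin;
       retain fmtname 'anscode' type 'N';
       set layout(rename=(code=start description=label));   /* from spreadsheet */
    run;

    proc format cntlin=cntlin;   /* the ANSCODE. format now exists */
    run;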
Paper 2033-2014:
Creating Journal Ready Tables with Special Characters Using ODS LaTeX
LaTeX is a free document creation package that is often used to create journal articles. It provides the capability to create very specific formatting and to write a wide variety of formulas. Using ODS, SAS® can write documents to a LaTeX file, which can then be compiled through LaTeX into PDF files. This paper briefly reviews the basic syntax and options to produce these files. Then, we look at how to create a new tagset to make changes to the standard ODS LaTeX templates to create the non-gridded table appearance that is typically seen in journal articles. We also explore how to write special characters and equations not otherwise available through ODS LaTeX.
Steven Feder, Federal Reserve Board of Governors
Paper SAS050-2014:
Creating Multi-Sheet Microsoft Excel Workbooks with SAS®: The Basics and Beyond Part 1
This presentation explains how to use Base SAS® 9 software to create multi-sheet Microsoft Excel workbooks. You learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output using the ExcelXP ODS tagset. The techniques can be used regardless of the platform on which your SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on demand and in real time using SAS server technology is discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.
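A minimal sketch of the basic pattern (file name illustrative): one worksheet is created per BY group.

    ods tagsets.excelxp file='class.xml' style=printer
        options(sheet_interval='bygroup' suppress_bylines='yes');
    proc print data=sashelp.class noobs;
       by sex;
    run;
    ods tagsets.excelxp close;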
Vince DelGobbo, SAS
Paper 1555-2014:
Creating a Journal-Ready Group Comparison Table in Microsoft Word with SAS®
The first table in many research journal articles is a statistical comparison of demographic traits across study groups. It might not be exciting, but it's necessary. And although SAS® calculates these numbers with ease, it is a time-consuming chore to transfer these results into a journal-ready table. Introducing the time-saving deluxe %MAKETABLE SAS macro: it does the heavy work for you. It creates a Microsoft Word table of up to four comparative groups reporting t-tests, chi-square, ANOVA, or median test results, including a p-value. You specify only a one-line macro call for each line in the table, and the macro takes it from there. The result is a tidily formatted journal-ready Word table that you can easily include in a manuscript, report, or Microsoft PowerPoint presentation. For statisticians and researchers needing to summarize group comparisons in a table, this macro saves time and relieves you from the drudgery of trying to make your output neat and pretty. And after all, isn't that what we want computing to do for us?
Alan Elliott, Southern Methodist University
Paper 1488-2014:
Custom BI Tools Using SAS® Stored Processes
Business Intelligence platforms provide a bridge between expert data analysts and decision-makers and other end-users. But what do you do when you can identify no system that meets both your needs and your budget? If you are the Consolidated Data Analysis Center in the HHS Office of Inspector General, you use SAS® Enterprise BI Server and the SAS® Stored Process Web Application to build your own. This presentation covers the inception, design, and implementation of the PAYment by Geographic Area (PAYGAR) system, which uses only SAS® Enterprise BI tools, namely the SAS Stored Process Web Application, PROC GMAP, and HTML/JAVA embedded in a DATA step, to create an interactive platform for presenting and exploring data that has a geographic component. In particular, the presentation reviews how we created a system of chained stored processes to enable a user to select the data to be presented, navigate through different geographic levels, and display companion reports related to the current data and geographic selections. It also covers the creation of the HTML front-end that sits over and manages the system. Throughout, the presentation emphasizes the scalability of PAYGAR, which the SAS Stored Process Web Application facilitates.
Scott Hutchison, HHS Office of Inspector General
John Venturini, Piper Enterprise Solutions
Paper 1564-2014:
Dashboards: A Data Lifeline for the Business
The Washington D.C. aqueduct was completed in 1863, carrying desperately needed clean water to its many residents. Just as the aqueduct was vital and important to its residents, a lifeline if you will, so too is the supply of data to the business. Without the flow of vital information, many businesses would not be able to make important decisions. The task of building my company's first dashboard was brought before us by our CIO; the business had not asked for it. In this poster, I discuss how we were able to bring fresh ideas and data to our business units by converting the data they saw on a daily basis in reports to dashboards. The road to success was long, with plenty of struggles: creating our own business requirements, building data marts, synching SQL to SAS®, and using information maps and SAS® Enterprise Guide® projects to move data around, all while dealing with technology and other I.T. team roadblocks. Then it was on to designing what would become our real-time dashboards, fighting for SharePoint single sign-on, and, oh yeah, user adoption. My story of how dashboards revitalized the business is a refreshing tale for all levels.
Jennifer McBride, Virginia Credit Union
Paper 1314-2014:
Data Cleaning: Longitudinal Study Cross-Visit Checks
Cross-visit checks are a vital part of data cleaning for longitudinal studies. The nature of longitudinal studies encourages repeatedly collecting the same information. Sometimes, these variables are expected to remain static, go away, increase, or decrease over time. This presentation reviews both naïve and better approaches to handling one-variable and two-variable consistency checks. For a single-variable check, the better approach features the ALLCOMB function, introduced in SAS® 9.2. For a two-variable check, the better approach uses the FIRST.variable automatic variable to flag inconsistencies. This presentation will provide you with the tools to enhance your longitudinal data cleaning process.
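A minimal sketch of a cross-visit check in this spirit (data set and variable names hypothetical; HEIGHT is expected never to decrease across visits):

    proc sort data=visits;
       by subject_id visit_date;
    run;

    data problems;
       set visits;
       by subject_id;
       prev = lag(height);
       if not first.subject_id and height < prev then output;  /* flag decrease */
       drop prev;
    run;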
Lauren Parlett, Johns Hopkins University
Paper 1603-2014:
Data Coarsening and Data Swapping Algorithms
With increased concern about privacy and simultaneous pressure to make survey data available, statistical disclosure control (SDC) treatments are performed on survey microdata to reduce disclosure risk prior to dissemination to the public. This situation is all the more problematic in the push to provide data online for immediate user query. Two SDC approaches are data coarsening, which reduces the information collected, and data swapping, which is used to adjust data values. Data coarsening includes recodes, top-codes and variable suppression. Challenges related to creating a SAS® macro for data coarsening include providing flexibility for conducting different coarsening approaches, and keeping track of the changes to the data so that variable and value labels can be assigned correctly. Data swapping includes selecting target records for swapping, finding swapping partners, and swapping data values for the target variables. With the goal of minimizing the impact on resulting estimates, challenges for data swapping are to find swapping partners that are close matches in terms of both unordered categorical and ordered categorical variables. Such swapping partners ensure that enough change is made to the target variables, that data consistency between variables is retained, and that the pool of potential swapping partners is controlled. An example is presented using each algorithm.
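As a minimal sketch, two common coarsening treatments look like this (variables and cutoffs hypothetical):

    data coarsened;
       set survey;
       if income > 200000 then income = 200000;  /* top-code */
       age_band = 5 * floor(age / 5);            /* recode into 5-year bands */
       drop ssn;                                 /* variable suppression */
    run;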
Tom Krenzke, Westat
Katie Hubbell, Westat
Mamadou Diallo, Westat
Amita Gopinath, Westat
Sixia Chen, Westat
Paper 1568-2014:
Data Quality Governance for Analytics Teams
Having data that are consistent, reliable, and well linked is one of the biggest challenges faced by financial institutions. The paper describes how the SAS® Data Management offering helps to connect people, processes, and technology to deliver consistent results for data sourcing and analytics teams, and minimizes the cost and time involved in the development life cycle. The paper concludes with best practices learned from various enterprise data initiatives.
Anand Jagarapu, Arunam Technologies LLC
Paper 2044-2014:
Dataset Matching and Clustering with PROC OPTNET
We used PROC OPTNET to link hedge fund data sets from four vendors, covering overlapping populations but sharing no universal identifier. This quick tip shows how to treat data records as nodes, use pairwise identifiers to generate distance measures, and get PROC OPTNET to assign clusters of records from all sources to each hedge fund. This proved to be far faster, and easier, than doing the same task in PROC SQL.
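A minimal sketch of the clustering step (link data hypothetical; each row records a pairwise identifier match between two records):

    proc optnet data_links=pairwise_matches out_nodes=clusters;
       data_links_var from=record_a to=record_b;
       concomp;   /* connected components = clusters of matched records */
    run;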
Mark Keintz, Wharton Research Data Services
Paper 1302-2014:
Debugging SAS® Code in a Macro
Debugging SAS® code contained in a macro can be frustrating because the SAS error messages refer only to the line in the SAS log where the macro was invoked. This can make it difficult to pinpoint the problem when the macro contains a large amount of SAS code. Using a macro that contains one small DATA step, this paper shows how to use the MPRINT and MFILE options along with the fileref MPRINT to write just the SAS code generated by a macro to a file. The 'de-macroified' SAS code can be easily executed and debugged.
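A minimal sketch of the technique (the macro call is hypothetical); note that the fileref must be named MPRINT:

    filename mprint 'demacroed.sas';   /* MFILE writes the generated code here */
    options mprint mfile;
    %mymacro(data=sashelp.class)       /* hypothetical macro invocation */
    options nomfile;
    /* demacroed.sas now holds plain SAS code that can be run and debugged */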
Bruce Gilsen, Federal Reserve Board
Paper 1721-2014:
Deploying a User-Friendly SAS® Grid on Microsoft Windows
Your company's chronically overloaded SAS® environment, adversely impacted user community, and the resultant lackluster productivity have finally convinced your upper management that it is time to upgrade to a SAS® grid to eliminate all the resource problems once and for all. But after the contract is signed and implementation begins, you as the SAS administrator suddenly realize that your company-wide standard mode of SAS operations, that is, using the traditional SAS® Display Manager on a server machine, runs counter to the expectation of the SAS grid: your users are now supposed to switch to SAS® Enterprise Guide® on a PC. This is utterly unacceptable to the user community because almost everything has to change in a big way. If you like to play a hero in your little world, this is your opportunity. There are a number of things you can do to make the transition to the SAS grid as smooth and painless as possible, and your users get to keep their favorite SAS Display Manager.
Houliang Li, HL SASBIPros Inc
Paper 1807-2014:
Develop Highly Interactive Web Charts with SAS®
Very often, there is a need to present analysis output from SAS® through web applications. On these occasions, it makes a lot of difference to have highly interactive charts rather than static image charts and graphs. Not only is this visually appealing, but with features like zooming and filtering it enables consumers to better understand the output. There are many charting libraries available in the market that enable us to develop good-looking charts without much effort, such as Highcharts, Highstock, and KendoUI. They are developed in JavaScript and use the latest HTML5 components, and they support a variety of chart types such as line, spline, area, area spline, column, bar, pie, scatter, angular gauges, area range, area spline range, column range, bubble, box plot, error bars, funnel, waterfall, and polar charts. This paper demonstrates how we can combine the data processing and analytic powers of SAS with the visualization abilities of these charting libraries. Since most of them consume JSON-formatted data, the emphasis is on the JSON-producing capabilities of SAS, both with PROC JSON and with other custom programming methods. The example shows how easy it is to develop a stored process that produces JSON data to be consumed by the charting library, with minimal changes to the sample program.
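As a minimal sketch of the PROC JSON route (available since SAS 9.4; file name illustrative):

    proc json out='class.json' pretty;
       export sashelp.class;   /* JSON ready for a JavaScript charting library */
    run;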
Rajesh Inbasekaran, Kavi Associates
Naren Mudivarthy, Kavi Associates
Neetha Sindhu, Kavi Associates
Paper 2030-2014:
Developing the Code to Execute Particle Swarm Optimization in SAS®
Particle swarm optimization is a heuristic global optimization method introduced by James Kennedy and Russell C. Eberhart in 1995. This paper develops code for particle swarm optimization in SAS® 9.2.
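A minimal DATA step sketch of the algorithm (constants illustrative; minimizing f(x) = x**2 with a 10-particle swarm):

    data _null_;
       array x[10]; array v[10]; array pb[10]; array pbf[10];
       gbf = constant('big');
       do i = 1 to 10;                          /* initialize the swarm */
          x[i] = -10 + 20*ranuni(123);
          v[i] = 0;
          pb[i] = x[i];  pbf[i] = x[i]**2;
          if pbf[i] < gbf then do; gbf = pbf[i]; gb = pb[i]; end;
       end;
       do iter = 1 to 100;                      /* main PSO loop */
          do i = 1 to 10;
             v[i] = 0.7*v[i] + 2*ranuni(123)*(pb[i] - x[i])
                             + 2*ranuni(123)*(gb - x[i]);
             x[i] = x[i] + v[i];
             f = x[i]**2;
             if f < pbf[i] then do; pbf[i] = f; pb[i] = x[i]; end;
             if f < gbf    then do; gbf = f; gb = x[i]; end;
          end;
       end;
       put 'Best solution found: ' gb= gbf=;
    run;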
Anurag Srivastava, Decision Quotient
Sangita Kumbharvadiya, Decision Quotient
Paper 1784-2014:
Dining with the Data: The Case of New York City and Its Restaurants
New York City boasts a wide variety of cuisine owing to the rich tourism and the vibrant immigrant population. The quality of food and hygiene maintained at the restaurants serving different cuisines has a direct impact on the people dining in them. The objective of this paper is to build a model that predicts the grade of the restaurants in New York City. It also provides deeper statistical insights into the distribution of restaurants, cuisine categories, grades, criticality of violations, etc., and concludes with the sequence analysis performed on the complete set of violations recorded for the restaurants at different time periods over the years 2012 and 2013. The data for 2013 is used to test the model. The data set consists of 15 variables that capture restaurant location-specific and violation details. The target is an ordinal variable with three levels, A, B, and C, in descending order of quality. Various SAS® Enterprise Miner models (logistic regression, decision trees, neural networks, and ensemble models) are built and compared using the validation misclassification rate. The stepwise regression model appears to be the best model, with a prediction accuracy of 75.33%. The regression model is trained at step 3. The number of critical violations at 8.5 gives the root node for the split of the target levels, and the rest of the tree splits are guided by predictor variables such as the number of critical and non-critical violations, the number of critical violations for the year 2011, cuisine group, and borough.
Pruthvi Bhupathiraju Venkata, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Paper 1510-2014:
Direct Memory Data Management Using the APP Tools
APP is an unofficial collective abbreviation for the SAS® functions ADDR, PEEK, PEEKC, the CALL POKE routine, and their so-called LONG 64-bit counterparts: the SAS tools designed to directly read from and write to physical memory in the DATA step. APP functions have long been a SAS dark horse. First, the examples of APP usage in SAS documentation amount to a few technical report tidbits intended for mainframe system programming, with nary a hint of how the functions can be used for data management programming. Second, the documentation note on the CALL POKE routine is so intimidating in tone that many potentially receptive folks might decide to avoid the allegedly precarious route altogether. However, little can stand in the way of an inquisitive SAS programmer daring to take a close look, and it turns out that APP functions are very simple and useful tools! They can be used to explore how things really work, to make code more concise, and to implement en masse data movement, and they can often dramatically improve execution efficiency. The author and many other SAS experts (notably Peter Crawford, Koen Vyverman, Richard DeVenezia, Toby Dunn, and the fellow masked by his 'Puddin' Man' sobriquet) have been poking around the SAS APP realm on SAS-L and in their own practices since 1998, occasionally letting the SAS community at large peek at their findings. This opus is an attempt to circumscribe the results in a systematic manner. Welcome to the APP world! You are in for a few glorious surprises.
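A minimal taste of the LONG variants (reading a character variable's own memory back):

    data _null_;
       length str $ 5 copy $ 5 addr $ 8;
       str  = 'Hello';
       addr = addrlong(str);        /* binary address of STR's storage */
       copy = peekclong(addr, 5);   /* read 5 bytes back from that address */
       put copy=;                   /* prints: copy=Hello */
    run;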
Paul Dorfman, Dorfman Consulting
Paper 1777-2014:
Disease Prevention to Reduce New Hampshire Health-Care Claims and Costs: A Data Mining Approach
The health-care industry in the United States is going through a paradigm shift, moving away from its focus on treating diseases and toward promoting health, wellness, and preventive public health programs, so that both individuals and the government can maintain a healthy bottom line. The high-level business problem is to reduce the expected medical costs and number of medical services required by the people of New Hampshire by implementing successful disease prevention programs. The objective is to identify which among the six prevention programs will successfully improve the health of the residents of New Hampshire over nine future years (2012 to 2020). The business scenario of the case is to identify the preventive programs that are most effective in reducing the costs in New Hampshire and to invest the money in those programs so that the overall health-care overhead costs can be reduced or controlled. The effectiveness of implementing the preventive programs was evaluated using SAS® Enterprise Guide® 5.1 and SAS® Enterprise Miner 12. Time series analysis, in particular forecasting, is used to project the future health-care services and costs for the years from 2012 to 2020. Our analysis showed that all the preventive programs should be implemented concurrently. The minimum anticipated savings in cost is approximately $572,111, or 3.3% of the expected baseline cost of $17,297,931. Therefore, our recommendation is to use this cost reduction figure, $572,111, as the initial funding investment toward initiating the six prevention programs concurrently, so that tangible results can be noticed by 2020.
Rakesh Karn, Oklahoma State University
Rom Khattri, Oklahoma State University
Pradeep Podila, Oklahoma State University
Linda Schumacher, Oklahoma State University
Paper 1615-2014:
Don't Get Blindsided by PROC COMPARE
'NOTE: No unequal values were found. All values compared are exactly equal.' Do your eyes automatically drop to the end of your PROC COMPARE output in search of these words? Do you then conclude that your data sets match? Be careful here! Major discrepancies might still lurk in the shadows, and you'll never know about them if you make this common mistake. This paper describes several of PROC COMPARE's blind spots and how to steer clear of them. Watch in horror as PROC COMPARE glosses over important differences while boldly proclaiming that all is well. See the gruesome truth about what PROC COMPARE does, and what it doesn't do! Learn simple techniques that allow you to peer into these blind spots and avoid getting blindsided by PROC COMPARE!
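One such blind spot: by default, PROC COMPARE matches observations by position and quietly tolerates records that exist in only one data set. A minimal, safer sketch (key variable hypothetical):

    proc compare base=old compare=new listall;  /* LISTALL reports one-sided obs */
       id subject_id;   /* align records by key rather than by position */
    run;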
Josh Horstman, Nested Loop Consulting
Roger Muller
Paper 1618-2014:
Effectively Utilizing Loops and Arrays in the DATA Step
The implicit loop refers to the DATA step repetitively reading data and creating observations, one at a time. The explicit loop, which uses the iterative DO, DO WHILE, or DO UNTIL statements, is used to repetitively execute certain SAS® statements within each iteration of the DATA step execution. Explicit loops are often used to simulate data and to perform a certain computation repetitively. However, when an explicit loop is used along with array processing, the applications are extended widely, which includes transposing data, performing computations across variables, and so on. To be able to write a successful program that uses loops and arrays, one needs to know the contents in the program data vector (PDV) during the DATA step execution, which is the fundamental concept of DATA step programming. This workshop covers the basic concepts of the PDV, which is often ignored by novice programmers, and then illustrates how to use loops and arrays to transform lengthy code into more efficient programs.
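A minimal sketch of an explicit loop over an array (item variables hypothetical; one pass recodes across variables and accumulates a total):

    data scored;
       set responses;                  /* q1-q5: hypothetical item responses */
       array q[5] q1-q5;
       total = 0;
       do i = 1 to dim(q);
          if q[i] = 9 then q[i] = .;   /* apply the same rule across variables */
          total + coalesce(q[i], 0);
       end;
       drop i;
    run;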
Arthur Li, City of Hope
Paper 1872-2014:
Efficiency Estimation Using a Hybrid of Data Envelopment Analysis and Linear Regression
Literature suggests two main approaches, parametric and non-parametric, for constructing efficiency frontiers on which efficiency scores of other units can be based. Parametric functions can be either deterministic or stochastic in nature. However, when multiple inputs and outputs are encountered, Data Envelopment Analysis (DEA), a non-parametric approach, is a powerful tool used for decades in measurement of productivity/efficiency with a wide range of applications. Both approaches have advantages and limitations. This paper attempts to further explore and validate a hybrid approach, taking the best of both the DEA and the parametric approach, in order to estimate efficiency of Decision Making Units (DMUs) in an even better way.
John Dilip Raj, GE
Paper 1673-2014:
Enhance the ODS HTML Output with JavaScript
For data analysts, one of the most important steps after manipulating and analyzing a data set is to create a report for it. Nowadays, many statistics tables and reports are generated as HTML files that can be easily accessed through the Internet. However, the SAS® Output Delivery System (ODS) HTML output offers limited interactivity. In this paper, we introduce a method to enhance the traditional ODS HTML output by using jQuery (a JavaScript library). A macro was developed to implement this idea. Compared to the standard HTML output, this macro can add sorting, pagination, search, and even dynamic drill-down functions to the ODS HTML output file.
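A minimal sketch of the starting point (script path illustrative); the paper's macro goes further by injecting the table-enhancing jQuery code itself:

    ods html file='report.html'
        headtext='<script src="jquery.min.js"></script>';
    proc print data=sashelp.class;
    run;
    ods html close;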
Yu Fu, Oklahoma State Department of Health
Chao Huang, Oklahoma State University
Paper 1826-2014:
Enhancing SAS® Piping Through Dynamic Port Allocation
Pipeline parallelism, an extension of MP Connect, is an effective way to speed processing. Piping allows the typical programming sequence of DATA step followed by PROC to execute in parallel. Piping uses TCP ports to pass records directly from the DATA step to the PROC immediately as each individual record is processed. The DATA step in effect becomes a data transformation filter for the PROC, running in parallel and incurring no additional disk storage or related I/O lag. Establishing a pipe with MP Connect typically requires specifying a physical TCP port to be used by the writing and by the reading processes. Coding in this style opens the possibility for users to generate systems conflicts by inadvertently requesting ports that are in use. SAS® Metadata Server allows one to allocate ports dynamically; that is, users can use a symbolic name for the port, with the server dynamically determining an unused port to temporarily assign to the SAS® job. While this capability is attractive, implementing SAS Metadata Server on a system that does not use any of the other SAS BI technology can be inefficient from a cost perspective. To enable dynamic port allocation without the added cost, we created a UNIX script that can be called from within SAS to ascertain which ports are available at runtime. The script returns a list of available ports, which is captured in a SAS macro variable and subsequently used in establishing pipeline parallelism.
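A minimal sketch of the idea (find_free_port.sh is a hypothetical script that prints one unused TCP port):

    filename portsh pipe './find_free_port.sh';
    data _null_;
       infile portsh;
       input port;
       call symputx('pipeport', port);   /* port number into a macro variable */
    run;
    /* writer and reader sessions then address the pipe via SASESOCK */
    libname outpipe sasesock ":&pipeport";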
Piyush Singh, TATA Consultancy Services Ltd.
Gerhardt Pohl, Eli Lilly and Company
Paper 1468-2014:
Errors, Warnings, and Notes (Oh, My): A Practical Guide to Debugging SAS® Programs
This paper is based on the belief that debugging your programs is not only necessary, but also a good way to gain insight into how SAS® works. Once you understand why you got an error, a warning, or a note, you'll be better able to avoid problems in the future. In other words, people who are good debuggers are good programmers. This paper covers common problems including missing semicolons and character-to-numeric conversions, and the tricky problem of a DATA step that runs without suspicious messages but, nonetheless, produces the wrong results. For each problem, the message is deciphered, possible causes are listed, and how to fix the problem is explained.
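For instance, explicit conversion functions silence the automatic-conversion notes and make the intent clear (variable names hypothetical):

    data fixed;
       set raw;
       amount = input(amount_char, comma12.);  /* character to numeric */
       id_txt = put(id_num, z8.);              /* numeric to character */
    run;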
Lora Delwiche, University of California
Susan Slaughter, Avocet Solutions
Paper 2042-2014:
Estimating Ordinal Reliability Using SAS®
In evaluation instruments and tests, individual items are often collected using an ordinal measurement or Likert-type scale. Typically, measures such as Cronbach's alpha are estimated using the standard Pearson correlation. Gadderman and Zumbo (2012) illustrate how using the standard Pearson correlations may yield biased estimates of reliability when the data are ordinal, and they present methodology for using the polychoric correlation in reliability estimates as an alternative. This session shows how to implement the methods of Gadderman and Zumbo using SAS® software. An example will be presented that incorporates these methods in the estimation of the reliability of an active learning post-occupancy evaluation instrument developed by Steelcase Education Solutions researchers.
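As a sketch, a pairwise polychoric correlation can be requested from PROC FREQ with the PLCORR option (item names hypothetical); the full method assembles these pairwise estimates into a correlation matrix before computing the reliability coefficient:

    proc freq data=items;
       tables item1*item2 / plcorr;   /* polychoric correlation for one pair */
    run;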
Laura Kapitula, Grand Valley State University
Paper 1814-2014:
Evaluating School Attendance Data Using SAS®
The worst part of going to school is having to show up. However, data shows that those who do show up are the ones who are going to be the most successful (Johnson, 2000). As shown in a study done in Minneapolis, students who were in class at least 95% of the time were twice as likely to pass state tests (Johnson, 2000). Studies have shown that school districts that show interest in attendance have higher achievement among students (Reeves, 2008). The goal in doing research on student attendance is to find out the patterns of when people are missing class and why they are absent. The data comes directly from the Phillip O'Berry High School Attendance Office; with around 1,600 students, there is plenty of data to be used from the 2012-2013 school year. Using Base SAS® 9.3, after importing the data from Microsoft Excel, a series of PROC FORMAT and PROC GCHART steps were used to output and analyze the data. The output showed the days of the week and periods that students missed the most, depending on grade level. The data shows that freshmen and seniors were the most likely to be absent on a given day. Based on the data, attendance continues to be an issue; therefore, school districts need to take an active role in developing attendance policies.
Jacob Foard, Phillip O. Berry Academy of Technology
Thomas Nix, Phillip O. Berry Academy of Technology
Rachel Simmons, Phillip O. Berry Academy of Technology
Paper 1764-2014:
Excel with SAS® and Microsoft Excel
SAS® is an outstanding suite of software, but not everyone in the workplace speaks SAS. However, almost everyone speaks Excel. Often, the data you are analyzing, the data you are creating, and the report you are producing take the form of a Microsoft Excel spreadsheet. Every year at SAS® Global Forum, there are SAS and Excel presentations, not just because Excel is so pervasive in the workplace, but because there's always something new to learn (or re-learn)! This paper summarizes and references (and pays homage to!) previous SAS Global Forum presentations, as well as examines some of the latest Excel capabilities with the latest versions of SAS® 9.4 and SAS® Visual Analytics.
Andrew Howell, ANJ Solutions
Paper 1493-2014:
Experiences in Using Academic Data for SAS® BI Dashboard Development
Business Intelligence (BI) dashboards serve as an invaluable, high-level, visual reference tool for decision-making processes in many business industries. A request was made to our department to develop some BI dashboards that could be incorporated in an academic setting. These dashboards would aim to serve various undergraduate executive and administrative staff at the university. While most business data may lend itself to work very well and easily in the development of dashboards, academic data is typically modeled differently and, therefore, faces unique challenges. In this paper, the authors detail and share the design and development process of creating dashboards for decision making in an academic environment utilizing SAS® BI Dashboard 4.3 and other SAS® Enterprise Business Intelligence 9.2 tools. The authors also provide lessons learned as well as recommendations for future implementations of BI dashboards utilizing academic data.
Evangeline Collado, University of Central Florida
Michelle Parente, University of Central Florida
Paper 1450-2014:
Exploring DATA Step Merges and PROC SQL Joins
Explore the various DATA step merge and PROC SQL join processes. This presentation examines the similarities and differences between merges and joins, and provides examples of effective coding techniques. Attendees examine the objectives and principles behind merges and joins, one-to-one merges (joins), and match-merge (equi-join), as well as the coding constructs associated with inner and outer merges (joins) and PROC SQL set operators.
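A minimal sketch of the two equivalent constructs (data sets and key hypothetical; the merge assumes both inputs are sorted by the key):

    data both;                        /* match-merge (equi-join) */
       merge demog(in=a) visits(in=b);
       by subject_id;
       if a and b;                    /* keep matches only: inner-join behavior */
    run;

    proc sql;                         /* the equivalent inner join */
       create table both_sql as
       select d.*, v.visit_date
         from demog d inner join visits v
           on d.subject_id = v.subject_id;
    quit;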
Kirk Paul Lafler, Software Intelligence Corporation
Paper 1854-2014:
Exporting Formulas to Microsoft Excel Using the ODS ExcelXP Tagset
SAS® can easily perform calculations and export the result to Microsoft Excel in a report. However, sometimes you need Excel to have a formula or a function in a cell and not just a number. Whether it's for a boss who wants to see a SUM formula in the total cell or to have automatically updating reports that can be sent to people who don't use SAS to be completed, exporting formulas to Excel can be very powerful. This paper illustrates how, by using PROC REPORT and PROC PRINT along with the ExcelXP tagset, you can easily export formulas and functions into Excel directly from SAS. The method outlined in this paper requires Base SAS® 9.1 or higher and Excel 2002 or later and requires a basic understanding of the ExcelXP tagset.
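A minimal sketch using a TAGATTR style override so that Excel itself computes the total (the SALES data set is hypothetical, and the R[-3]C reference assumes three detail rows above the summary line):

    ods tagsets.excelxp file='sales.xml';
    proc report data=sales nowd;
       column region amount;
       define amount / analysis sum;
       rbreak after / summarize
          style={tagattr='formula:=SUM(R[-3]C:R[-1]C)'};
    run;
    ods tagsets.excelxp close;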
Joseph Skopic, Federal Government
Paper 1753-2014:
Extracting the Needles from the Haystacks
When you want to know the details about a small subset of a much larger data set, it can take a long time to select the records you need. This paper shows you how to create a user-defined SAS® format to pull only the observations that you want out of a big data source. Even when selecting a million records out of data sets that can have more than 100 million records, this method is much quicker than either a PROC SQL join or a SAS merge.
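A minimal sketch (KEYS holds the character IDs of interest; the OTHER range labels everything else 'N' so the WHERE clause can test the format directly):

    data ctrl;
       retain fmtname '$want' type 'C' label 'Y';
       set keys(rename=(id=start)) end=last;
       output;
       if last then do;
          hlo = 'O'; label = 'N'; output;   /* catch-all OTHER range */
       end;
    run;
    proc format cntlin=ctrl; run;

    data subset;
       set big.transactions;                /* hypothetical large source */
       where put(id, $want.) = 'Y';
    run;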
Sara Boltman, Butterfly Projects
Paper 2029-2014:
Five Things to Do when Using SAS® BI Web Services
Traditionally, web applications interact with back-end databases by means of JDBC/ODBC connections to retrieve and update data. With the growing need for real-time charting and complex analysis types of data representation on these web applications, SAS computing power can be put to use by adding a SAS web service layer between the application and the database. With the experience that we have with integrating these applications to SAS® BI Web Services, this is our attempt to point out five things to do when using SAS BI Web Services. 1) Input Data Sources: always enable Allow rewinding stream while creating the stored process. 2) Use LIBNAME statements to define XML filerefs for the Input and Output Streams (Data Sources). 3) Define input prompts and output parameters as global macro variables in the stored process if the stored process calls macros that use these parameters. 4) Make sure that all of the output parameter values are set correctly as defined (data type) before the end of the stored process. 5) The Input Streams (if any) should have a consistent data type; essentially, every instance of the stream should have the same structure. This paper consists of examples and illustrations of errors and warnings associated with the previously mentioned cases.
Neetha Sindhu, Kavi Associates
Vimal Raj, Kavi Associates
Paper 1448-2014:
From Providing Support to Driving Decisions: Improving the Value of Institutional Research
For almost two decades, Western Kentucky University's Office of Institutional Research (WKU-IR) has used SAS® to help shape the future of the institution by providing faculty and administrators with information they can use to make a difference in the lives of their students. This presentation provides specific examples of how WKU-IR has shaped the policies and practices of our institution and discusses how WKU-IR moved from a support unit to a key strategic partner. In addition, the presentation covers the following topics: How the WKU Office of Institutional Research developed over time; Why WKU abandoned reactive reporting for a more accurate, convenient system using SAS® Enterprise Intelligence Suite for Education; How WKU shifted from investigating what happened to predicting outcomes using SAS® Enterprise Miner and SAS® Text Miner; How the office keeps the system relevant and utilized by key decision makers; What the office has accomplished and key plans for the future.
Tuesdi Helbig, Western Kentucky University
Gina Huff, Western Kentucky University
Paper 1668-2014:
Generate Cloned Output with a Loop or Splitter Transformation
Based on selection criteria, the SAS® Data Integration Studio loop or splitter transformations can be used to generate multiple output files. The ETL developer or SAS® administrator can decide which transformation is better suited for the design, priorities, and SAS configuration at their site. Factors to consider are the setup, maintenance, and performance of the ETL job. The loop transformation requires an understanding of macros and a control table. The splitter transformation is more straightforward and self-documenting. If time allows, creating and running a job with each transformation can provide benchmarking to measure performance. For a comparison of these two options, this paper shows an example of the same job using the loop or splitter transformation. For added testing metrics, one can adapt the LOGPARSE SAS macro to parse the job logs.
Laura Liotus, Community Care Behavioral Health
Paper 1594-2014:
Generating Dynamic Tables Using PROC SQL and PROC TABULATE
Along with PROC REPORT, PROC TABULATE is one of the most widely used reporting tools in SAS®. Any kind of report with the desired statistics can be produced by PROC TABULATE. When we need to report summary statistics like the mean, median, or range in a heading, we either have to edit it outside SAS in word-processing software or enter it manually. In this paper, we discuss how we can automate this to be dynamic by using PROC SQL and some simple macros.
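A minimal sketch (data set and variables hypothetical): PROC SQL computes the statistic into a macro variable, which the TABULATE BOX= option then displays in the heading:

    proc sql noprint;
       select put(mean(los), 5.1) into :mlos trimmed
       from stays;
    quit;

    proc tabulate data=stays;
       class unit;
       var los;
       table unit, los*(n mean) / box="Length of stay (overall mean: &mlos)";
    run;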
Lovedeep Gondara, BC Cancer Agency
Paper 1765-2014:
Geo Reporting: Integrating ArcGIS Maps in SAS® Reports
This paper shares our experience integrating two leading data analytics and Geographic Information Systems (GIS) software products, SAS® and ArcGIS, to provide integrated reporting capabilities. SAS is a powerful tool for data manipulation and statistical analysis. ArcGIS is a powerful tool for analyzing data spatially and presenting complex cartographic representations. Combining statistical data analytics and GIS provides increased insight into data and allows for new and creative ways of visualizing the results. Although products exist to facilitate the sharing of data between SAS and ArcGIS, there are no ready-made solutions for integrating the output of these two tools in a dynamic and automated way. Our approach leverages the individual strengths of SAS and ArcGIS, as well as the report delivery infrastructure of SAS® Information Delivery Portal.
Nathan Clausen, CACI
Aaron House, CACI
Paper SAS2221-2014:
Getting Started with the SAS/IML® Language
Do you need a statistic that is not computed by any SAS® procedure? Reach for the SAS/IML® language! Many statistics are naturally expressed in terms of matrices and vectors. For these, you need a matrix-vector language. This hands-on workshop introduces the SAS/IML language to experienced SAS programmers. The workshop focuses on statements that create and manipulate matrices, read and write data sets, and control the program flow. You will learn how to write user-defined functions, interact with other SAS procedures, and recognize efficient programming techniques. Programs are written using the SAS/IML® Studio development environment. This course covers Chapters 2 through 4 of Statistical Programming with SAS/IML Software (Wicklin, 2010).
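A small taste of the matrix-vector style:

    proc iml;
       x = {1 2 3, 4 5 6};    /* 2 x 3 matrix literal */
       colMeans = x[:,];      /* subscript reduction: mean of each column */
       s = sum(x);            /* functions operate on whole matrices */
       print colMeans s;
    quit;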
Rick Wicklin, SAS
Paper 1316-2014:
Getting the Warm and Fuzzy Feeling with Inexact Matching
With the ever-increasing proliferation of disparate, complex data being collected and stored, it has never been more important that this information is accurate, clean, integrated, and often in compliance with an expanding set of government regulations. This means that the data must be cleaned and standardized, duplicates must be identified and removed, and the individual data must be able to be joined or merged together in some way. However, it is often the case that this data does not have the same variables or values to make this possible with a simple join or merge. To that end, one has to employ fuzzy logic, or fuzzy matching. Simply put, fuzzy matching is the implementation of algorithmic processes (fuzzy logic) to determine the similarity between elements of data such as business names, people's names, or address information. Fuzzy logic is used to predict the probability of matching data with non-exact values to help in data cleansing, deduplication, or matching of disparate data sets. This paper shows the basics of fuzzy matching by using SAS® functions such as COMPLEV, multiple-variable matches, and a modified Porter stemming algorithm.
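A minimal sketch of the COMPLEV piece (pair data and threshold hypothetical):

    data scored_pairs;
       set candidate_pairs;          /* one row per name pair to compare */
       dist = complev(upcase(strip(name_a)), upcase(strip(name_b)));
       likely_match = (dist <= 2);   /* small edit distance = probable match */
    run;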
Toby Dunn, Dunn Consulting
Paper 1601-2014:
Graphs Useful for Variable Selection in Predictive Modeling
This paper illustrates some SAS® graphs that can be useful for variable selection in predictive modeling. Analysts are often confronted with hundreds of candidate variables available for use in predictive models, and this paper illustrates some simple SAS graphs that are easy to create and that are useful for visually evaluating candidate variables for inclusion or exclusion in predictive models. The graphs illustrated in this paper are bar charts with confidence intervals using the GCHART procedure and comparative histograms using the UNIVARIATE procedure. The graphs can be used for most combinations of categorical or continuous target variables with categorical or continuous input variables. This paper assumes the reader is familiar with the basic process of creating predictive models using multiple (linear or logistic) regression.
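A minimal sketch of the comparative-histogram piece (modeling data hypothetical; one panel is drawn per target level):

    proc univariate data=modeling;
       class target;           /* binary target variable */
       var credit_score;       /* candidate input variable */
       histogram credit_score;
    run;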
Bob Moore, Thrivent Financial
Paper 1646-2014:
Hash It Out with a Web Service and Get the Data You Need
Have you ever needed additional data that was only accessible via a web service in XML or JSON? In some situations, the web service is set up to only accept parameter values that return data for a single observation. To get the data for multiple values, we need to iteratively pass the parameter values to the web service in order to build the necessary dataset. This paper shows how to combine the SAS® hash object with the FILEVAR= option to iteratively pass a parameter value to a web service and input the resulting JSON or XML formatted data.
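A minimal sketch of the FILEVAR= part, assuming the URL device type on the INFILE statement (the service URL and ID list are hypothetical, and the hash object the paper uses to organize the returned values is omitted):

    data raw_lines;
       length urlpath $ 256 line $ 1000;
       set work.ids;                        /* one row per parameter value */
       urlpath = cats('http://example.com/svc?id=', id);
       infile resp url filevar=urlpath end=done;
       do until (done);
          input line $char1000.;
          output;                           /* keep the raw response lines */
       end;
    run;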
John Vickery, North Carolina State University
Paper 1658-2014:
Healthcare Services Data Distribution, Transformation, and Model Fitting
Healthcare services data on products and services come in different shapes and forms. Data cleaning, characterization, massaging, and transformation are essential precursors to any statistical model-building efforts. In addition, data size, quality, and distribution influence model selection, model life cycle, and the ease with which business insights are extracted from data. Analysts need to examine data characteristics and determine the right data transformation and methods of analysis for valid interpretation of results. In this presentation, we demonstrate the common data distribution types for a typical healthcare services company such as Cardinal Health and their salient features. In addition, we use Base SAS® and SAS/STAT® for data transformation of both the response (Y) and the explanatory (X) variables in four combinations [RR (Y and X as raw data), TR (only Y transformed), RT (only X transformed), and TT (Y and X transformed)] and discuss the practical significance of interpreting linear, logistic, and completely randomized design model results using the original and the transformed data values for decision-making processes. The reality of dealing with diverse forms of data, the ramifications of data transformation, and the challenge of interpreting model results of transformed data are discussed. Our analysis showed that the magnitude of data variability is an overriding factor in the success of data transformation and the subsequent tasks of model building and interpretation of model parameters. Although data transformation provided some benefits, it complicated analysis and subsequent interpretation of model results.
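As a sketch, the TT combination might look like this (data and variables hypothetical):

    data tt;
       set services;
       log_cost  = log(cost);    /* transformed response (T) */
       log_units = log(units);   /* transformed explanatory variable (T) */
    run;
    proc reg data=tt;
       model log_cost = log_units;
    run;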
Dawit Mulugeta, Cardinal Health
Jason Greenfield, Cardinal Health
Tison Bolen, Cardinal Health
Lisa Conley, Cardinal Health
Paper 2345-2014:
Helpful Tips for Novice Mainframe Programmers
Do you create reports via a mainframe? If so, you can use SAS® as a one-stop shop for all of your data manipulations. SAS can efficiently read data, create data sets without the need for multiple DATA steps, and produce Excel reports without the need for edits. This poster helps novice mainframe programmers by providing tips to efficiently create reports using SAS in the mainframe environment. Topics covered are replacing JCL with SAS for reading data, efficient merging for efficient programming, and using PROC FREQ for data quality and PROC TABULATE for superior reporting.
Rahul Pillay, Northrop Grumman
Paper 1385-2014:
How Predictive Analytics Turns Mad Bulls into Predictable Animals
Portfolio segmentation is key in all forecasting projects. Not all products are equally predictable. Nestlé uses animal names for its segmentation, and the animal behavior translates well into how the planners should plan these products. Mad Bulls are those products that are tough to predict if we don't know what is causing their unpredictability. The Horses are easier to deal with. Modern time-series-based statistical forecasting methods can tame Mad Bulls, as they allow explanatory variables to be added to the models. Nestlé now complements its Demand Planning solution based on SAP with predictive analytics technology provided by SAS®, to overcome these issues in an industry that is highly promotion-driven. In this talk, we will provide an overview of the relationship Nestlé is building with SAS, and provide concrete examples of how modern statistical forecasting methods available in SAS® Demand-Driven Planning and Optimization help us to increase forecasting performance, and therefore to provide high service to our customers with optimized stock, the primary goal of Nestlé's supply chains.
Marcel Baumgartner, Nestlé SA
Paper 1486-2014:
How to Be A Data Scientist Using SAS®
The role of the Data Scientist is the viral job description of the decade. And like LOLcats, there are many types of Data Scientists. What is this new role? Who is hiring them? What do they do? What skills are required to do their job? What does this mean for the SAS® programmer and the statistician? Are they obsolete? And finally, if I am a SAS user, how can I become a Data Scientist? Come learn about this job of the future and what you can do to be part of it.
Chuck Kincaid, Experis Business Analytics
Paper 1729-2014:
How to Interpret SVD Units in Predictive Models?
Recent studies suggest that unstructured data, such as customer comments or feedback, can enhance the power of existing predictive models. SAS® Text Miner can generate singular value decomposition (SVD) units from text documents, which provide a vectorial representation of the terms in those documents. These SVDs, when used as additional inputs alongside the existing structured input variables, often prove to capture the response better. However, SVD units are black-box variables that are not easy to interpret or explain. This is a big hindrance when trying to win over decision makers in organizations to incorporate these derived textual data components in their models. In this paper, we demonstrate a new and powerful feature in SAS® Text Miner 12.1 that helps explain the SVDs, or text cluster components. We discuss two important methods that are useful for interpreting them. For this purpose, we used data from a television network company that has transcripts of its call center notes from three prior calls of each customer. We are able to extract the key terms from the call center notes in the form of Boolean rules, which have contributed to the prediction of customer churn. These rules provide an intuitive sense of which set of terms, when occurring in either the presence or absence of another set of terms in the call center notes, might lead to a churn. It also provides insights into which customers are at a bigger risk of churning from the company's services and, more importantly, why.
Murali Pagolu, SAS
Goutam Chakraborty, Oklahoma State University
Paper 1623-2014:
How to Read, Write, and Manipulate SAS® Dates
No matter how long you've been programming in SAS®, using and manipulating dates still seems to require effort. Learn all about SAS dates, the different ways they can be presented, and how to make them useful. This paper includes excellent examples for dealing with raw input dates, functions to manage dates, and outputting SAS dates into other formats. Included is all the date information you will need: date and time functions, informats, formats, and arithmetic operations.
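A minimal sketch of the essentials:

    data _null_;
       d = input('14MAR2014', date9.);            /* raw text to a SAS date */
       put d= date9. d= yymmdd10. d= worddate.;   /* one value, many displays */
       next_mo  = intnx('month', d, 1, 'same');   /* date arithmetic */
       days_out = intck('day', d, '31DEC2014'd);
       put next_mo= date9. days_out=;
    run;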
Jenine Milum, Equifax Inc.
Paper 1283-2014:
I Object: SAS® Does Objects with DS2
The DATA step has served SAS® programmers well over the years, and although it is powerful, it has not fundamentally changed. With DS2, SAS has introduced a significant alternative to the DATA step by introducing an object-oriented programming environment. In this paper, we share our experiences with getting started with DS2 and learning to use it to access, manage, and share data in a scalable, threaded, and standards-based way.
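A minimal sketch of the DS2 flavor (output table name illustrative):

    proc ds2;
       data bmi (overwrite=yes);
          dcl double bmi;
          method run();
             set sashelp.class;
             bmi = (weight / height**2) * 703;
          end;
       enddata;
    run;
    quit;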
Peter Eberhardt, Fernwood Consulting Group Inc.
Xue Yao, Winnipeg Regional Health Authority
Paper 1544-2014:
Implementing Multiple Comparisons on Pearson Chi-square Test for an R×C Contingency Table in SAS®
This paper illustrates a permutation method for implementing multiple comparisons on Pearson's chi-square test for an R×C contingency table, using the SAS® FREQ procedure and a newly developed SAS macro called CHISQ_MC. This method is analogous to the Tukey-type multiple comparison method for one-way analysis of variance.
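As an unadjusted sketch of the building block (count data hypothetical; the permutation-based multiplicity adjustment supplied by CHISQ_MC is not shown):

    proc freq data=counts;        /* overall R x C test */
       tables row*col / chisq;
       weight count;
    run;
    proc freq data=counts(where=(row in (1 2)));
       tables row*col / chisq;    /* one pairwise comparison of rows */
       weight count;
    run;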
Man Jin, Forest Research Institute
Binhuan Wang, New York University School of Medicine
Paper 1638-2014:
Institutional Research: Serving University Deans and Department Heads
Administrators at Western Kentucky University rely on the Institutional Research department to perform detailed statistical analyses to deepen the understanding of issues associated with enrollment management, student and faculty performance, and overall program operations. This paper presents several instances of analyses performed for the university to help it identify and recruit suitable candidates, uncover root causes in grade and enrollment trends, evaluate faculty effectiveness, and assess the impact of student characteristics, programs, or student activities on retention and graduation rates. The paper briefly discusses the data infrastructure created and used by Institutional Research. For each analysis performed, it reviews the SAS® program and key components of the SAS code involved. The studies presented include the use of SAS® Enterprise Miner to create a retention model incorporating dozens of student background variables. It shows an examination of grade trends in the same courses taught by different faculty and subsequent student behavior and success, providing insights into the nuances and subtleties of evaluating faculty performance. Another analysis uncovers the possible influence of fraternities and sororities in freshmen algebra courses. Two investigations explore the impact of programs on student retention and graduation rates. Each example and its findings illustrate how Institutional Research can support the administration of university operations. The target audience is any SAS professional interested in learning more about Institutional Research in higher education and how SAS software is used by an Institutional Research department to serve its organization.
Matthew Foraker, Western Kentucky University
Paper 1828-2014:
Integrated Big Data: Hadoop + DBMS + Discovery for SAS® High-Performance Analytics
SAS® High-Performance Analytics is a significant step forward in the area of high-speed, analytic processing in a scalable clustered environment. However, Big Data problems generally come with data from lots of data sources, at varying levels of maturity. Teradata's innovative Unified Data Architecture (UDA) represents a significant improvement in the way that large companies can think about Enterprise Data Management, including the Teradata Database, Hortonworks Hadoop, and the Aster Data Discovery platform in a seamless integrated platform. Together, the two platforms provide business users, analysts, and data scientists with the ideally suited data management platforms, targeted specifically to their analytic needs, based upon analytic use cases, managed in a single integrated enterprise data management environment. The paper will focus on how several companies today are using Teradata's Integrated Hardware and Software UDA Platform to manage a single enterprise analytic environment, fight the ongoing proliferation of analytic data marts, and speed their operational analytic processes.
John Cunningham, Teradata Corporation
Paper 1863-2014:
Internet Gambling Behavioral Markers: Using the Power of SAS® Enterprise Miner 12.1 to Predict High-Risk Internet Gamblers
By understanding the actual gambling behavior of an individual over the Internet, we develop markers that identify behavioral patterns, which in turn can be used to predict the level of risk a subscriber is prone to while gambling. The data set contains 4,056 subscribers. Using SAS® Enterprise Miner 12.1, a set of models are run to predict which subscribers are likely to become high-risk Internet gamblers. The data contains 114 variables, such as the first active date and first active product used on the website, as well as the characteristics of the games played, such as fixed odds, poker, casino, games, etc. Other measures of a subscriber's activity, such as the money put at stake and the odds being bet, are also included. These variables provide a comprehensive view of a subscriber's behavior while gambling on the website. The target variable is modeled as a binary variable, 0 indicating a risky gambler and 1 indicating a controlled gambler. The data is a typical example of real-world data with many missing values, and hence it had to be transformed and imputed before being considered for analysis. The model comparison algorithm of SAS Enterprise Miner 12.1 was used to determine the best model. The stepwise regression performs the best among a set of 25 models that were run using over 100 permutations of each model. The stepwise regression model predicts a high-risk Internet gambler with an accuracy of 69.63%, using variables such as the wk4frequency and wk3frequency of bets.
Sai Vijay Kishore Movva, Oklahoma State University
Vandana Reddy, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Paper 1567-2014:
Iterative Programming In-Database Using SAS® Enterprise Guide® Query Builder
Traditional SAS® programs typically consist of a series of SAS DATA steps, which refine input data sets until the final data set or report is reached. SAS DATA steps do not run in-database. However, SAS® Enterprise Guide® users can replicate this kind of iterative programming and have the resulting process flow run in-database by linking a series of SAS Enterprise Guide Query Builder tasks that output SAS views pointing at data that resides in a Teradata database, right up to the last Query Builder task, which generates the final data set or report. This session both explains and demonstrates this functionality.
Frank Capobianco, Teradata
Paper 1495-2014:
Jazz It Up a Little with Formats
Formats are an often under-valued tool in the SAS® toolbox. They can be used in just about all domains to improve the readability of a report, or they can be used as a look-up table to recode your data. Out of the box, SAS includes a multitude of ready-defined formats that can be applied without modification to address most recode and redisplay requirements. And if that's not enough, there is also a FORMAT procedure for defining your own custom formats. This paper looks at using some of the formats supplied by SAS in some innovative ways, but it primarily focuses on the techniques we can apply in creating our own custom formats.
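A minimal sketch of a custom format driving a report grouping (data hypothetical):

    proc format;
       value agegrp low-<18 = 'Minor'
                    18-<65  = 'Adult'
                    65-high = 'Senior';
    run;

    proc freq data=members;
       tables age;
       format age agegrp.;   /* the format defines the grouping */
    run;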
Brian Bee, The Knowledge Warehouse Ltd
Paper 1702-2014:
Let SAS® Handle Your Job While You Are Not at Work!
Report automation and scheduling are very hot topics in many industries. They confer many advantages, including a reduced workload, the elimination of repetitive tasks, the generation of accurate results, and better performance. This paper illustrates how to design an appropriate program to automate and schedule reports in SAS® 9.1 and SAS® Enterprise Guide® 5.1 using a SAS® server as well as the Windows Scheduler. The automation part covers formatting Microsoft Excel tables using XML or VBA coding or other formats, as well as conditional automatic e-mailing with file attachments. We systematically walk through each step with a clear flow diagram from the data source to the final destination. We also discuss details of server-side and PC-side schedulers and how these schedulers invoke batch programs.
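A minimal sketch of the e-mail piece (recipient and attachment path hypothetical):

    filename outbox email
       to='analyst@example.com'
       subject='Scheduled daily report'
       attach='/reports/daily.xlsx';
    data _null_;
       file outbox;
       put 'Attached is the report produced by the scheduled SAS job.';
    run;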
Anjan Matlapudi, AmerihealthCaritas
Paper 1823-2014:
Let SAS® Power Your .NET GUI
Despite its popularity in recent years, .NET development has yet to enjoy the quality, level, and depth of statistical support that has always been provided by SAS®. And yet, many .NET applications could benefit greatly from the power of SAS and, likewise, some SAS applications could benefit from friendly graphical user interfaces (GUIs) supported by Microsoft's .NET Framework. What the author sets out to do here is to 1) outline the basic mechanics of automating SAS with .NET, 2) provide a framework and specific strategies for maintaining parallelism between the two platforms at runtime, and 3) sketch out some simple applications that provide an exciting combination of powerful SAS analytics and highly accessible GUIs. The mechanics of automating SAS with .NET will be covered briefly. Attendees will learn the required objects and methods needed to pass information between the two platforms. The attendees will learn some strategies for organizing their projects and for writing SAS code that lends itself to automation. This will include embedding SAS scripts within a .NET project and managing communications between the two platforms. Specifically, the log and listing output will be captured and handled by .NET, and user actions will be interpreted and sent to the SAS engine. Example applications used throughout the session include a tool that converts between SAS variable types through simple drag-and-drop and an application that analyzes the growth of the user's computer hard drive.
Matthew Duchnowski, Educational Testing Service (ETS)
Paper 1845-2014:
Let the CAT Catch a STYLE
Being flexible and highlighting important details in your output is critical. The use of ODS ESCAPECHAR allows the SAS® programmer to insert inline formatting functions into variable values through the DATA step, and it makes for a quick and easy way to highlight specific data values or modify the style of the table cells in your output. What is an easier and more efficient way to concatenate those inline formatting functions to the variable values? This paper shows how the CAT functions can simplify this task.
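A minimal sketch of the idea (lab data hypothetical): CATS assembles the inline style function and the value in one call.

    ods escapechar='^';
    data flagged;
       set labs;
       length result_disp $ 100;
       if result > upper_limit then
          result_disp = cats('^{style [color=red fontweight=bold]',
                             put(result, 8.2), '}');
       else result_disp = strip(put(result, 8.2));
    run;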
Yanhong Liu, Cincinnati Children's Hospital Medical Center
Justin Bates, Cincinnati Children's Hospital Medical Center
Paper 1426-2014:
Leveraging Publicly Available Data in the Classroom Using SAS® PROC SURVEYLOGISTIC
The soaring number of publicly available data sets across disciplines has allowed for increased access to real-life data for use in both research and educational settings. These data often leverage cost-effective complex sampling designs, including stratification and clustering, which allow for increased efficiency in survey data collection and analysis. Weighting becomes a necessary component of these survey data in order to properly calculate variance estimates and arrive at sound inferences through statistical analysis. Generally speaking, these weights are included with the variables provided in the public-use data, though an explanation of how and when to use them is often lacking. This paper presents an analysis using the California Health Interview Survey to compare weighted and non-weighted results using SAS® PROC LOGISTIC and PROC SURVEYLOGISTIC.
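A minimal sketch of the design-adjusted analysis (variable names hypothetical; the design variables and weight ship with the public-use file):

    proc surveylogistic data=chis;
       strata  strata_id;     /* design strata */
       cluster cluster_id;    /* primary sampling units */
       weight  sample_wt;     /* survey weight */
       model current_smoker(event='1') = age sex income;
    run;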
Tyler Smith, National University
Besa Smith, Analydata
Paper 1899-2014:
Macro Design
This paper provides a set of ideas about design elements of SAS® macros. This paper is a checklist for programmers who write or test macros.
Ronald Fehd, Stakana Analytics
Paper 1309-2014:
Make It Possible: Create Customized Graphs with Graph Template Language
Effective graphs are indispensable for modern statistical analysis. They reveal tendencies that are not readily apparent in simple tables and add visual clarity to reports. My client is a big graph fan; he always shows me a lot of high-quality, complex sample graphs created by other software and asks me, "Can SAS® duplicate these outputs?" Often, by leveraging the capabilities of the ODS Graph Template Language and the SGRENDER procedure, the answer is "Yes." Graph Template Language offers SAS users a more direct approach to customizing the output and to overlaying graphs at different levels. This paper uses cases drawn from a real work situation to demonstrate how to get these seemingly unattainable results with the power of Graph Template Language: utilizing bubble plots as distribution density bars; creating fresh-looking linear regression graphics with the slope information in the legend; and overlaying different plots to create sophisticated analytical bottleneck test output.
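A minimal sketch of the template-plus-SGRENDER pattern (the overlay and legend shown here are illustrative):

    proc template;
       define statgraph overlayfit;
          begingraph;
             entrytitle 'Customized Overlay with GTL';
             layout overlay;
                scatterplot x=height y=weight;
                regressionplot x=height y=weight / name='fit'
                   legendlabel='Linear fit';
                discretelegend 'fit';
             endlayout;
          endgraph;
       end;
    run;
    proc sgrender data=sashelp.class template=overlayfit;
    run;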
Wen Song, ICF International
Ge Wu, Johns Hopkins University
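A minimal GTL template of the overlay kind discussed above, rendered with PROC SGRENDER; this is a sketch on SASHELP data, not one of the paper's examples:

   proc template;
      define statgraph overlay_fit;
         begingraph;
            entrytitle 'Scatter with Overlaid Regression Fit';
            layout overlay;
               scatterplot x=height y=weight;
               regressionplot x=height y=weight / name='fit'
                              legendlabel='Linear fit';
               discretelegend 'fit';
            endlayout;
         endgraph;
      end;
   run;

   proc sgrender data=sashelp.class template=overlay_fit;
   run;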
Paper 1769-2014:
Making It Happen: A Novel Way to Combine, Construct, Customize, and Implement Your Big Data with a SAS® Data Discovery, Analytics, and Fraud Framework in Medicare
A common complaint from users working on identifying fraud and abuse in Medicare is that teams focus on operational applications, static reports, and high-level outliers. But when faced with the need to constantly evaluate changing Medicare provider and beneficiary or enrollee dynamics, users are clamoring for more dynamic and accurate detection approaches. Providing these organizations with a data discovery and predictive analytics framework that leverages Hadoop and other big data approaches, while giving teams a clear path to making more fact-based decisions more quickly, is very important in pre- and post-fraud and abuse analysis. Organizations that pursue a reusable, services-based data discovery and analytics framework and architecture enjoy greater success in supporting data management, reporting, and analytics demands: they can quickly turn models into prioritized alerts and avoid improper or fraudulent payments. A successful framework should enable organizations to build efficient fraud, waste, and abuse models to address complex schemes; identify fraud, waste, and abuse vulnerabilities; and shorten triage efforts using a variety of data sourced from big data platforms like Hadoop and other relational database management systems. This paper discusses the data management, data discovery, predictive analytics, and social network analysis capabilities that are included in the SAS fraud framework, and how a unified approach can significantly reduce the lifecycle of building and deploying fraud models. We hope this paper will provide IT leaders with a clear path for resolving issues from the simple to the incredibly complex, through a measured and scalable approach for delivering value for fraud, waste, and abuse models by providing deep insights to support evidence-based investigations.
Vivek Sethunatesan, Northrop Grumman Corp
Paper 1556-2014:
Making the Log a Forethought Rather Than an Afterthought
When we start programming, we simply hope that the log comes out with no errors or warnings. Yet once we have programmed for a while, especially in the area of pharmaceutical research, we realize that having a log with specific, useful information in it improves quality and accountability. We discuss clearing the log, sending the log to an output file, helpful information to put in the log, which messages are permissible, automated log checking, adding messages regarding data changes, whether or not we want to see source code, and a few other log-related ideas. Hopefully, the log will become something that we keep in mind from the moment we start programming.
Emmy Pahmer, inVentiv Health Clinical
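A small sketch of the forethought approach: route the log to a permanent file and write informative custom notes. The file name and messages are illustrative:

   /* Route the log to a dated file for later review */
   proc printto log="etl_&sysdate9..log" new;
   run;

   %put NOTE: Job started by &sysuserid at %sysfunc(datetime(), datetime20.);

   /* ... main program steps, with messages about data changes ... */
   %put NOTE: 15 records were recoded in DM prior to the merge.;

   /* Restore the default log destination */
   proc printto;
   run;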
Paper 2043-2014:
Managing Opt-Out Risk
Email is an important marketing channel for digital marketers. We can stay connected with our subscribers and attract them with relevant content as long as they remain subscribed to our email communication. In this session, we discuss why it is important to manage opt-out risk, how we predicted opt-out risk, and how we proactively manage opt-out using the models we developed.
Jia Lei (Carol) Li, Gilt Groupe
Paper 1862-2014:
Managing the Organization of SAS® Format and Macro Code Libraries in Complex Environments Including PC SAS, SAS® Enterprise Guide®, and UNIX SAS
The capabilities of SAS® have been extended by the use of macros and custom formats. SAS macro code libraries and custom format libraries can be stored in various locations, some of which may not always be easily and efficiently accessed from other operating environments. Code can be in various states of development, ranging from global organization-wide approved libraries to very elementary just-getting-started code. Formalized yet flexible file structures for storing code are needed. SAS user environments range from standalone systems such as PC SAS or SAS on a server/mainframe to much more complex installations using multiple platforms. Strictest attention must be paid to (1) file location for macros and formats and (2) management of the lack of cross-platform portability of formats. Macros are relatively easy to run from their native locations. This paper covers methods of doing this with emphasis on (a) the SASAUTOS option to define the location and the search order for identifying macros being called, and (b) even more importantly, the little-known SAS option MAUTOLOCDISPLAY to identify in the SAS log the location of the macro actually called. Format libraries are more difficult to manage, and they cannot be used in an operating system other than the one in which they were created. This paper discusses the export, copying, and importing of format libraries to provide cross-platform capability. A SAS macro used to identify the source of a format being used will be presented.
Roger Muller, Data-To-Events, Inc.
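A sketch of the two option settings the paper emphasizes, plus the CNTLOUT/CNTLIN route for moving formats across platforms; the directory paths are hypothetical:

   /* Search project macros first, then shared macros, then the shipped autocall */
   options mautolocdisplay
           sasautos=('/project/macros', '/shared/macros', sasautos);

   /* Format catalogs are not portable across platforms: export to a data set,
      move the data set, and rebuild the catalog on the target platform */
   libname library '/project/formats';
   proc format library=library cntlout=work.fmtdata;   /* source platform */
   run;

   proc format library=library cntlin=work.fmtdata;    /* target platform */
   run;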
Paper 1674-2014:
Matching Rules: Too Loose, Too Tight, or Just Right?
This paper describes a technique for calibrating street address match logic to maximize the match rate without introducing excessive erroneous matching.
Richard Cadieux, Towers Watson
Dan Bretheim, Towers Watson
Paper 1485-2014:
Measures of Fit for Logistic Regression
One of the most common questions about logistic regression is "How do I know if my model fits the data?" There are many approaches to answering this question, but they generally fall into two categories: measures of predictive power (like R-squared) and goodness-of-fit tests (like the Pearson chi-square). This presentation looks first at R-squared measures, arguing that the optional R-squares reported by PROC LOGISTIC might not be optimal. Measures proposed by McFadden and Tjur appear to be more attractive. As for goodness of fit, the popular Hosmer and Lemeshow test is shown to have some serious problems. Several alternatives are considered.
Paul Allison, University of Pennsylvania
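Tjur's R-square, one of the measures the presentation favors, is easy to compute by hand: it is the mean predicted probability among events minus the mean among non-events. A sketch with hypothetical data and variable names:

   proc logistic data=mydata;
      model y(event='1') = x1 x2;
      output out=preds p=phat;
   run;

   /* Tjur's R-square = mean(phat | y=1) - mean(phat | y=0) */
   proc sql;
      select mean(case when y = 1 then phat end)
           - mean(case when y = 0 then phat end) as tjur_r2
      from preds;
   quit;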
Paper 1593-2014:
Modeling Loss Given Default in SAS/STAT®
Predicting loss given default (LGD) is playing an increasingly crucial role in quantitative credit risk modeling. In this paper, we apply mixed effects models, along with other widely used LGD models, to predict corporate bond LGD. The empirical results show that mixed effects models are able to explain the unobservable heterogeneity and to make better predictions compared with linear regression and fractional response regression. All the statistical models are performed in SAS/STAT®, SAS® 9.2, specifically using PROC REG and PROC NLMIXED, and the model evaluation metrics are calculated in PROC IML. This paper gives a detailed description of how to use PROC NLMIXED to build and estimate generalized linear models and mixed effects models.
Xiao Yao, The University of Edinburgh
Jonathan Crook, The University of Edinburgh
Galina Andreeva, The University of Edinburgh
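A hedged sketch of a random-intercept model of the general kind the paper estimates with PROC NLMIXED; the data set, covariate, and grouping variable names are hypothetical:

   proc nlmixed data=lgd_data;
      parms b0=0 b1=0 s2e=0.05 s2u=0.05;
      pred = b0 + b1*seniority + u;      /* fixed part plus random intercept */
      model lgd ~ normal(pred, s2e);
      random u ~ normal(0, s2u) subject=issuer;
      predict pred out=lgd_pred;
   run;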
Paper 1873-2014:
Modeling Ordinal Responses for a Better Understanding of Drivers of Customer Satisfaction
While survey researchers make great attempts to standardize their questionnaires including the usage of ratings scales in order to collect unbiased data, respondents are still prone to introducing their own interpretation and bias to their responses. This bias can potentially affect the understanding of commonly investigated drivers of customer satisfaction and limit the quality of the recommendations made to management. One such problem is scale use heterogeneity, in which respondents do not employ a panoramic view of the entire scale range as provided, but instead focus on parts of the scale in giving their responses. Studies have found that bias arising from this phenomenon was especially prevalent in multinational research, e.g., respondents of some cultures being inclined to use only the neutral points of the scale. Moreover, personal variability in response tendencies further complicates the issue for researchers. This paper describes an implementation that uses a Bayesian hierarchical model to capture the distribution of heterogeneity while incorporating the information present in the data. More specifically, SAS® PROC MCMC is used to carry out a comprehensive modeling strategy of ratings data that account for individual level scale usage. Key takeaways include an assessment of differences between key driver analyses that ignore this phenomenon versus the one that results from our implementation. Managerial implications are also emphasized in light of the prevalent use of more simplistic approaches.
Jorge Alejandro, Market Probe
Sharon Kim, Market Probe
Paper 1790-2014:
Money Basketball: Optimizing Basketball Player Selection Using SAS®
Over the past decade, sports analytics has seen an explosion in research and model development to calculate wins, reaching cult popularity with the release of the film 'Moneyball.' The purpose of this paper is to explore the methodology of solving a real-life Moneyball problem in basketball. An optimal basketball lineup will be selected in an attempt to maximize the total points per game while maximizing court coverage. We will briefly review some of the literature that has explored this type of problem, traditionally called the maximum coverage problem (MCP) in operations research. An exploratory data analysis will be performed, including visualizations and clustering in order to prep the modeling dataset for optimization. Finally, SAS® will be used to formulate an MCP problem, and additional constraints will be added to run different business scenarios.
Sabah Sadiq, Deloitte Consulting
Jing Zhao, Deloitte Consulting
Paper 1628-2014:
Non-Empirical Modeling: Incorporating Expert Judgment as a Model Input
In business environments, a common obstacle to effective data-informed decision making occurs when key stakeholders are reluctant to embrace statistically derived predicted values or forecasts. If concerns regarding model inputs, underlying assumptions, and limitations are not addressed, decision makers might choose to trust their gut and reject the insight offered by a statistical model. This presentation explores methods for converting potential critics into partners by proactively involving them in the modeling process and by incorporating simple inputs derived from expert judgment, focus groups, market research, or other directional qualitative sources. Techniques include biasing historical data, what-if scenario testing, and Monte Carlo simulations.
John Parker, GSK
Paper 1829-2014:
Nonnegative Least Squares Regression in SAS®
It is often the case that parameters in a predictive model should be restricted to an interval that is either reasonable or necessary given the model's application. A simple and classic example of such a restriction is a regression model that requires all parameters to be positive. In the case of multiple least squares (MLS) regression, the resulting model is therefore strictly additive and, in certain applications, not only appropriate but also intuitive. This special case of an MLS model is commonly referred to as a nonnegative least squares regression. While Base SAS® contains a multitude of ways to perform a multiple least squares regression (PROC REG and PROC GLM, to name two), there exists no native SAS® procedure to conduct a nonnegative least squares regression. The author offers a concise way to conduct the nonnegative least squares analysis by using PROC NLIN (nonlinear regression), which allows the user to place restrictions on parameter estimates. By fashioning a linear model in the framework of a nonlinear procedure, the end result can be achieved. As an additional corollary, the author shows how to calculate the _RSQUARE_ statistic for the resulting model, which has been left out of the PROC NLIN output because it is invalid in most cases (though not ours).
Matthew Duchnowski, Educational Testing Service (ETS)
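A minimal sketch of the PROC NLIN device the author describes: state a linear model inside the nonlinear procedure and bound the coefficients at zero. The data set and variables are illustrative:

   proc nlin data=mydata;
      parms b0=0 b1=0.1 b2=0.1;
      bounds b1 >= 0, b2 >= 0;       /* nonnegativity restriction */
      model y = b0 + b1*x1 + b2*x2;
   run;

An R-square can then be computed by hand from the PROC NLIN ANOVA table as one minus the ratio of the residual sum of squares to the corrected total sum of squares.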
Paper 1751-2014:
Ordering Columns in a SAS® Data Set: Should You Really RETAIN That?
When viewing and working with SAS® data sets, especially wide ones, it's often instinctive to rearrange the variables (columns) into some intuitive order. The RETAIN statement is one of the most commonly cited methods for ordering variables. Though RETAIN can perform this task, its use as an ordering clause can cause a host of easily missed problems due to its intended function of retaining values across DATA step iterations. This risk is especially great for the more novice SAS programmer. Instead, two equally effective and less risky ways to order data set variables are recommended, namely, the FORMAT and SQL SELECT statements.
Andrew Clapson, Statistics Canada
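The two recommended alternatives in miniature, using SASHELP.CLASS:

   /* SQL SELECT: column order follows the select list */
   proc sql;
      create table class_ordered as
      select name, sex, age, height, weight
      from sashelp.class;
   quit;

   /* FORMAT statement before SET: variables enter the PDV in listed order.
      Caveat: naming a variable without a format removes any existing format */
   data class_ordered2;
      format name sex age height weight;
      set sashelp.class;
   run;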
Paper 1738-2014:
PROC STREAM and SAS® Server Pages: Generating Custom HTML Reports
ODS is a powerful tool for generating HTML-based reports. Quite often, however, there are exacting requirements for report content, layout, and placement that can be met with HTML (and especially HTML5) but can't be met with ODS. This presentation shows several examples that use PROC STREAM and SAS® Server Pages in a batch environment (for example, scheduled tasks, using SAS® Display Manager, or using SAS® Enterprise Guide®) to generate such custom reports. And yes, despite the name SAS Server Pages, this technology, including the use of jQuery widgets, does apply to batch environments. This paper describes and shows several examples similar to those presented in the SAS® Press book SAS Server Pages: Generating Dynamic Content and on the author's blog Jurassic SAS in the BI/EBI World: creating a custom calendar; a sample mail-merge application; generating a custom Microsoft Excel-based report; and generating an expanding drill-down table.
Don Henderson, Henderson Consulting Services
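A minimal PROC STREAM sketch: free-form HTML with macro references, resolved and streamed to a file. The fileref and content are illustrative:

   filename rpt temp;
   %let region = Northeast;

   proc stream outfile=rpt;
   BEGIN
   <html><body>
   <h1>Sales Summary: &region</h1>
   <p>Generated &sysdate9 by &sysuserid.</p>
   </body></html>
   ;;;;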
Paper 1737-2014:
PROC STREAM and SAS® Server Pages: Generating Custom User Interfaces
Quite often when building web applications that use either the SAS® Stored Process Server or the SAS/IntrNet® Application Dispatcher, it is necessary to create a custom user interface to prompt for the needed parameters. Generating a custom user interface can be accomplished by chaining stored processes together: the first stored process uses PROC STREAM to process SAS® Server Pages that display the user interface where the user selects the desired options, and the second (or a later) stored process in the chain generates the desired output. This paper describes and shows several examples similar to those presented in the SAS® Press book SAS Server Pages: Generating Dynamic Content and on the author's blog Jurassic SAS in the BI/EBI World.
Don Henderson, Henderson Consulting Services
Paper 1730-2014:
PROC TABULATE: Extending This Powerful Tool Beyond Its Limitations
PROC TABULATE is a powerful tool for creating tabular summary reports. Its advantages over PROC REPORT are that it requires less code, allows for more convenient table construction, and uses syntax that makes it easier to modify a table's structure. However, its inability to compute the sum, difference, product, and ratio of column sums has hindered its use in many circumstances. This paper illustrates and discusses some creative approaches and methods for overcoming these limitations, enabling users to produce needed reports while still enjoying the simplicity and convenience of PROC TABULATE. These methods and skills have prominent applications in a variety of business intelligence and analytics fields.
Justin Jia, Canadian Imperial Bank of Commerce (CIBC)
Amanda Lin, Bell Canada
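One workaround pattern in the spirit of the paper (not necessarily the authors' exact method): pre-compute the ratio that PROC TABULATE cannot form from column sums, then tabulate the derived variable. The data set and variables are hypothetical:

   /* Pre-compute totals and the derived ratio */
   proc summary data=sales nway;
      class region;
      var revenue cost;
      output out=totals(drop=_:) sum=;
   run;

   data totals;
      set totals;
      margin = (revenue - cost) / revenue;   /* ratio of column sums */
   run;

   proc tabulate data=totals;
      class region;
      var revenue cost margin;
      table region, revenue*sum cost*sum margin*mean*f=percent8.1;
   run;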
Paper 1766-2014:
Parameter Estimation of Cognitive Attributes Using the Crossed Random-Effects Linear Logistic Test Model with PROC GLIMMIX
The linear logistic test model (LLTM), which incorporates cognitive task characteristics into the Rasch model, has been widely used for various purposes in educational contexts. However, the LLTM assumes that the variance of item difficulties is completely accounted for by cognitive attributes. To overcome the disadvantages of the LLTM, Janssen and colleagues (2004) proposed the crossed random-effects (CRE) LLTM by adding an error term on item difficulty. This study examines the accuracy and precision of the CRE-LLTM in terms of parameter estimation for cognitive attributes. The effect of different factors (for example, sample size, population distributions, sparse or dense matrices, and test length) is examined. PROC GLIMMIX was used for the analysis, and SAS/IML® software was used to generate the data.
Chunhua Cao, University of South Florida
Yan Wang, University of South Florida
Yi-hsin Chen, University of South Florida
Isaac Li, University of South Florida
Paper 1902-2014:
Plotting Differences Among LS-means in Generalized Linear Models
The effectiveness of visual interpretation of the differences between pairs of LS-means in a generalized linear model depends on the graph's ability to support four inferential and two perceptual tasks. Among the types of graphs that support some or all of these tasks are the forest plot, the mean-mean scatter plot (diffogram), and, closely related to it, the mean-mean multiple comparison (MMC) plot. These graphs provide essential visual perspectives for interpreting the differences among pairs of LS-means from a generalized linear model (GLM). The diffogram is a graphical option now available through ODS Statistical Graphics with linear model procedures such as GLIMMIX. By combining the ODS output data sets of the LS-means and their differences, the SGPLOT procedure can efficiently produce forest and MMC plots.
Robin High, University of Nebraska Medical Center
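A minimal example of requesting a diffogram from GLIMMIX, with SASHELP.CARS standing in for real data:

   ods graphics on;

   proc glimmix data=sashelp.cars;
      class origin;
      model mpg_city = origin;
      lsmeans origin / pdiff adjust=tukey plots=diffplot;
   run;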
Paper 1240-2014:
Powerful and Hard-to-find PROC SQL Features
The SQL procedure contains many powerful and elegant language features for intermediate and advanced SQL users. This presentation discusses topics that will help SAS® users unlock the many powerful features, options, and other gems found in the SQL universe. Topics include CASE logic; a sampling of summary (statistical) functions; dictionary tables; PROC SQL and the SAS macro language interface; joins and join algorithms; PROC SQL statement options _METHOD, MAGIC=101, MAGIC=102, and MAGIC=103; and key performance (optimization) issues.
Kirk Paul Lafler, Software Intelligence Corporation
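Two of the listed gems in miniature, CASE logic and dictionary tables, with the _METHOD option printing the chosen query plan to the log:

   proc sql _method;
      select name,
             case when age >= 13 then 'Teen' else 'Child' end as age_group
      from sashelp.class;

      select memname, nobs, nvar
      from dictionary.tables
      where libname = 'SASHELP' and memname = 'CLASS';
   quit;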
Paper 1506-2014:
Practical Considerations in the Development of a Suite of Predictive Models for Population Health Management
The use of predictive models in healthcare has steadily increased over the decades. Statistical models now are assumed to be a necessary component in population health management. This session will review practical considerations in the choice of models to develop, criteria for assessing the utility of the models for production, and challenges with incorporating the models into business process flows. Specific examples of models will be provided based upon work by the Health Economics team at Blue Cross Blue Shield of North Carolina.
Daryl Wansink, Blue Cross Blue Shield of North Carolina
Paper 1851-2014:
Predicting a Child Smoker Using SAS® Enterprise Miner 12.1
Over the years, there has been growing concern about the consumption of tobacco among youth, but few concrete studies have examined what exactly leads children to start consuming tobacco. This study attempts to identify those potential reasons. Through our analysis, we have also tried to build a model to predict whether a child will smoke next year. This study is based on the 2011 National Youth Tobacco Survey data of 18,867 observations. In order to prepare the data for insightful analysis, imputation was performed using tree-based methods. From a pool of 197 variables, 48 key variables were selected using variable selection methods, partial least squares, and decision tree models. Logistic regression and decision tree models were built to predict whether a child will smoke in the next year. Comparing the models using misclassification rate as the selection criterion, we found that the stepwise logistic regression model outperformed the other models with a validation misclassification rate of 0.028497, 47.19% sensitivity, and 95.80% specificity. Factors such as the company of friends, cigarette brand ads, accessibility to tobacco products, and passive smoking turned out to be the most important predictors of a child smoker. The study also yielded important findings such as: the odds of a child taking up smoking are 2.17 times higher when his or her close friends also smoke.
Goutam Chakraborty, Oklahoma State University
Jin Ho Jung, Oklahoma State University
Gaurav Pathak, Oklahoma State University
Paper 1859-2014:
Prediction of the Rise in Sea Level by the Memory-Based Reasoning Model Using SAS® Enterprise Miner 12.1
An increase in sea levels is a potential problem affecting the human race and the marine ecosystem. Many models are being developed to find out the factors responsible for it. In this research, the Memory-Based Reasoning model looks more effective than most other models because it takes previous solutions and predicts the solutions for forthcoming cases. The data was collected from NASA and contains 1,072 observations and 10 variables, such as emissions of carbon dioxide, temperature, and other contributing factors like electric power consumption, total number of industries established, and so on. Results of Memory-Based Reasoning models (using RD-tree and scan methods) are compared with those of neural networks, decision trees, and logistic regression. Fit statistics, such as misclassification rate and average squared error, are used to evaluate model performance. This analysis is used to predict the rise in sea levels in the near future and to take the necessary actions to protect the environment from global warming and natural disasters.
Prasanna K S Sailaja Bhamidi, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Paper 1634-2014:
Productionalizing SAS® for Enterprise Efficiency At Kaiser Permanente
In this session, you learn how Kaiser Permanente has taken a centralized production support approach to using SAS® Enterprise Guide® 4.3 in the healthcare industry. Kaiser Permanente Northwest (KPNW) has designed standardized processes and procedures that have allowed KPNW to streamline the support of production content, enabling KPNW analytical resources to focus more on new content development than on maintenance and support of steady-state programs and processes. We started with over 200 individual SAS® processes across four different SAS platforms (SAS Enterprise Guide, mainframe SAS®, PC SAS®, and SAS® Data Integration Studio) in order to standardize our development approach on SAS Enterprise Guide and build efficient and scalable processes within our department and across the region. We walk through the need for change and how the team was set up, provide an overview of the UNIX SAS platform, walk through the standard production requirements (developer pack), and review lessons learned.
Ryan Henderson, Kaiser Permanente
Karl Petith, Kaiser Permanente
Paper 2036-2014:
Programmatic Challenges of Dose Tapering Using SAS®
In a good clinical study, statisticians and various stakeholders are interested in assessing and isolating the effect of non-study drugs. One common practice in clinical trials is for clinical investigators to follow the protocol and taper certain concomitant medications in an attempt to prevent or resolve adverse reactions and/or to minimize the number of subject withdrawals due to lack of efficacy or adverse events. Assessing the impact of tapering these medications during the study is of high interest to clinical scientists and the study statistician. This paper presents the challenges and caveats of assessing the impact of tapering a certain type of concomitant medication using SAS® 9.3, based on a hypothetical case. The paper also presents the advantages of visual graphs in facilitating communication between clinical scientists and the study statistician.
Iuliana Barbalau, Santen Inc.
Chen Shi, Santen Inc
Yang Yang, Santen Inc.
Paper 1772-2014:
Programming in a Distributed Data Network Environment: A Perspective from the Mini-Sentinel Pilot Project
Multi-site health science-related, distributed data networks are becoming increasingly popular, particularly at a time where big data and privacy are often competing priorities. Distributed data networks allow individual-level data to remain behind the firewall of the data holder, permitting the secure execution of queries against those local data and the return of aggregated data produced from those queries to the requester. These networks allow the use of multiple, varied sources of data for study purposes ranging from public health surveillance to comparative effectiveness research, without compromising data holders' concerns surrounding data security, patient privacy, or proprietary interests. This paper focuses on the experiences of the Mini-Sentinel pilot project as a case study for using SAS® to design and build infrastructure for a successful multi-site, collaborative, distributed data network. Mini-Sentinel is a pilot project sponsored by the U.S. Food and Drug Administration (FDA) to create an active surveillance system (the Sentinel System) to monitor the safety of FDA-regulated medical products. The paper focuses on the data and programming aspects of distributed data networks but also visits governance and administrative issues as they relate to the maintenance of a technical network.
Jennifer Popovic, Harvard Pilgrim Health Care Institute/Harvard Medical School
Paper 1459-2014:
Quick Hits: My Favorite SAS® Tricks
Are you time-poor and code-heavy? It's easy to get into a rut with your SAS® code, and it can be time-consuming to spend your time learning and implementing improved techniques. This presentation is designed to share quick improvements that take five minutes to learn and about the same time to implement. The quick hits are applicable across versions of SAS and require only Base SAS® knowledge. Included topics are: simple macro tricks; little-known functions that get rid of messy coding; dynamic conditional logic; data summarization; tips to reduce data and processing time; and testing and space utilization tips. This presentation has proven valuable to beginner through experienced SAS users.
Marje Fecht, Prowerk Consulting
Paper 1401-2014:
Reading In Data Directly from Microsoft Word Questionnaire Forms
If someone comes to you with hundreds of questionnaire forms in Microsoft Word file format and asks you to extract the data from the forms into a SAS® data set, you might have several ways to handle this. However, as a SAS programmer, the shortcut is to write a SAS program to read the data directly from Word files into a SAS data set. This paper shows how it can be done with simple SAS programming skills, such as using FILENAME with the PIPE option, DDE, function call EXECUTE( ), and so on.
Sijian Zhang, VA Pittsburgh Healthcare System
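A sketch of the FILENAME PIPE piece of the approach: collect the questionnaire file names so that each file can then be read in turn. The directory path is hypothetical (Windows syntax):

   filename flist pipe 'dir /b "C:\forms\*.doc"';

   data form_files;
      infile flist truncover;
      input fname $100.;
   run;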
Paper 1886-2014:
Recommending News Articles Using Cosine Similarity Function
Predicting the news articles that customers are likely to view or read next provides a distinct advantage to news sites, and collaborative filtering is a widely used technique for this purpose. This paper details an approach within collaborative filtering that uses the cosine similarity function to achieve this. The paper further details two different approaches, customized targeting and article-level targeting, that can be used in marketing campaigns. Please note that this presentation connects with Session ID 1887, which happens immediately following this session.
Rajendra Ledalla Venkata Naga, GE Capital Retail Finance
Qing Wang, Warwick Business School
John Dilip Raj, GE Capital Retail Finance
Paper 1887-2014:
Recommending TV Programs Using Correlation
Personalized recommender systems are being used in many industries to increase customer engagement. In the TV industry, this is primarily used to increase viewership, which in turn increases market share, revenue, and profit. This paper attempts to develop a recommender system using the correlation procedure under the collaborative filtering methodology. The only data requirement for this recommendation system is past viewership of customers for a given time period. Please note that this session connects with Session ID 1886, which happens immediately prior to this session.
Rajendra Ledalla Venkata Naga, GE Capital Retail Finance
Qing Wang, Warwick Business School
John Dilip Raj, GE Capital Retail Finance
Paper 1502-2014:
Regression Analysis of Duration and Severity Data: New Capabilities with SAS® Software
Duration and severity data arise in several fields, including biostatistics, demography, economics, engineering, and sociology. The SAS® procedures LIFETEST, LIFEREG, and PHREG are the workhorses for the analysis of time-to-event data in biostatistics applications. Similar methods apply to the magnitude or severity of a random event, where the outcome might be right-, left-, or interval-censored and/or right- or left-truncated. All combinations of types of censoring and truncation could be present in the data set. Regression models such as the accelerated failure time model, the Cox model, and the non-homogeneous Poisson model have extensions to address time-varying covariates in the analysis of clustered outcomes, multivariate outcomes of mixed types, and recurrent events. We present an overview of new capabilities available in the procedures QLIM, QUANTLIFE, RELIABILITY, and SEVERITY, with examples illustrating their application using empirical data sets drawn from easily accessible sources.
Joseph Gardiner, Michigan State University
Paper 1489-2014:
Reporting Healthcare Data: Understanding Rates and Adjustments
In healthcare, we often express our analytics results as being "adjusted." For example, you might have read a study in which the authors reported the data as age-adjusted or risk-adjusted. The concept of adjustment is widely used in program evaluation, comparing quality indicators across providers and systems, forecasting incidence rates, and cost-effectiveness research. In order to make reasonable comparisons across time, place, or population, we need to account for small sample sizes and case-mix variation; in other words, we need to level the playing field and account for differences in health status and for uniqueness in a given population. If you are new to healthcare, what it really means to adjust the data in order to make comparisons might not be obvious. In this paper, we explore the methods by which we control for potentially confounding variables in our data. We do so through a series of examples from the healthcare literature in both primary care and health insurance. In this survey of methods, we discuss the concepts of rates and how they can be adjusted for demographic strata (such as age, gender, and race), as well as for health risk factors such as case mix.
Greg Nelson, ThotWave
Paper 1846-2014:
Revealing Human Mobility Behavior and Trips Prediction Based on Mobile Data Records
This paper reveals human mobility behavior in the metropolitan area of Rio de Janeiro, Brazil, based on mobile phone data provided by one of the largest mobile carriers in the country. Mobile phone data comprises a reasonable variety of information, including the time and location of call activity throughout urban areas. This information can be used to build users' trajectories over time, describing the major characteristics of urban mobility within the city. A variety of distribution analyses is presented, aiming to clearly describe the most relevant characteristics of overall mobility in the metropolitan area of Rio de Janeiro. In addition, methods from physics for describing trends in trips, such as gravity and radiation models, were computed and compared in terms of the granularity of the geographic scales, and also against traditional data mining approaches such as linear regression. A brief comparison of their performance in predicting the number of trips between pairs of locations is presented at the end.
Carlos Andre Reis Pinheiro, KU Leuven
Paper 1783-2014:
Revealing Unwarranted Access to Sensitive Data: A Scenario-based Approach
The project focuses on using analytics to reveal unwarranted use of access to medical records, i.e., employees in health organizations who access information about neighbours, friends, celebrities, etc., without a sound reason to do so. The method is based on the natural assumption that the vast majority of lookups are legitimate; lookups that differ from a statistically defined normal behavior will be subject to manual investigation. The work was carried out in collaboration between SAS Institute Norway and the largest Norwegian hospital, Oslo University Hospital (OUS), and was aimed at establishing whether the method is suitable for unveiling unwarranted lookups in medical records. A number of so-called scenarios are used to indicate adverse behaviour, each responsible for looking at one particular aspect of journal access data. For instance, one scenario determines the timeliness of a lookup relative to the patient's admission history; another judges whether the medical competency of the employee is relevant to the situation of the patient at the time of the lookup. We have so far designed and developed a library of around 20 scenarios that together are used in weighted combination to render a final judgment of the appropriateness of the lookup. The approach has proven highly successful, and a further development of these ideas is currently under way, the aim of which is to establish a joint Norwegian solution to the problem of unwarranted access. Furthermore, we believe that the approach and the framework may be utilised in many other industries where sensitive data is processed, such as financial, police, tax, and social services. In this paper, the method is outlined, as well as the results of its application to data from OUS.
Heidi Thorstensen, Oslo University Hospital
Torulf Mollestad, SAS
Paper 1650-2014:
Risk Factors and Outcome of Spinal Epidural Abscess from Incident Hemodialysis Patients from the United States Renal Data System between 2005 and 2008
Spinal epidural abscess (SEA) is a serious complication in hemodialysis (HD) patients, yet there is little medical literature that discusses it. This analysis identified risk factors and co-morbidities associated with SEA, as well as risk factors for mortality following the diagnosis. All incident HD cases from the United States Renal Data System for calendar years 2005-2008 were queried for a diagnosis of SEA. Potential clinical covariates, survival, and risk factors were recovered using ICD-9 diagnosis codes. Log-binomial regressions were performed using PROC GENMOD to assess the relative risks, and Cox regression models were run using PROC PHREG to estimate hazard ratios for mortality. For the 4-year study period, 660 of 355,084 (0.19%) HD patients were identified with SEA, the largest cohort to date. Older age (RR=1.625), infectious comorbidities including bacteremia (RR=7.7976), methicillin-resistant Staphylococcus aureus infection (RR=2.6507), and hepatitis C (RR=1.545), and non-infectious factors including diabetes (RR=1.514) and the presence of vascular catheters (RR=1.348) were identified as significant risk factors for SEA. SEA in HD patients was associated with an increased risk of death (HR=1.20). Older age (HR=2.269), the presence of dialysis catheters (HR=1.884), cirrhosis (HR=1.715), decubitus ulcers (HR=1.669), bacteremia (HR=1.407), and total parenteral nutrition (HR=1.376) constitute the greatest risk factors for death after SEA diagnosis and thus necessitate a comprehensive approach to management.
Chan Jin, Georgia Regents University
Jennifer White, Georgia Regents University
Rhonda Colombo, Georgia Regents University
Stephanie Baer, Georgia Regents University and Augusta VAMC
Usman Afza, Georgia Regents University
M. Kheda, Georgia Regents University
Lu Huber, Georgia Regents University
Puja Chebrolu, Georgia Regents University
N. Stanley Nahman, Georgia Regents University and Augusta VAMC
Kristina Kintziger, Georgia Regents University
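A hedged sketch of a log-binomial relative-risk model of the kind described above; the data set and variable names are hypothetical:

   /* Exponentiated coefficients from a log link on binomial data are RRs */
   proc genmod data=hd_cohort;
      class bacteremia(ref='No') / param=ref;
      model sea(event='1') = age_cat bacteremia / dist=binomial link=log;
      estimate 'RR: bacteremia' bacteremia 1 / exp;
   run;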
Paper 2026-2014:
SAS® Data Mining for Predictor Identification: Developing Strategies for High School Dropout Prevention
The high school dropout problem has been called a national crisis (Heppen & Therriault, 2008). Almost one-third of all high school students leave the public school system before graduating (Swanson, 2004), and the problem is particularly severe among minority students (Greene & Winters, 2005; U.S. Department of Education, 2006). Educators, researchers, and policymakers continue to work to identify effective dropout prevention strategies. One effective approach is to identify high-risk students at an early stage and then provide corresponding interventions to keep them in school. One of the strengths of educational data mining is its ability to reveal hidden patterns and predict future performance by analyzing accessible student data. The predictive algorithms generated by such modeling can serve as an early warning system. However, because individual schools and districts have various combinations of race, gender, and socioeconomic status, we cannot use a set of standardized predictors and obtain satisfactory predictive results. Analyzing a limited number of variables and limited historical data does not generate accurate models, and the resulting models might not consider interactions among predictors. The strength of data mining is the capability to analyze large amounts of data and many variables, and multiple analytic strategies (including model comparisons) can be applied to maximize model performance. Looking ahead, we propose a data mining framework to construct an early warning and trend analysis system with components for data warehousing, data mining, and reporting at the levels of individual students, schools, school districts, and the entire state.
Wendy Dickinson, Ringling College of Art + Design
Morgan Wang, University of Central Florida
Paper 2126-2014:
SAS® Enterprise Guide® 5.1: A Powerful Environment for Programmers, Too!
Have you been programming in SAS® for a while and just aren't sure how SAS® Enterprise Guide® can help you? This presentation demonstrates how SAS programmers can use SAS Enterprise Guide 5.1 as their primary interface to SAS, while maintaining the flexibility of writing their own customized code. We explore: navigating and customizing the SAS Enterprise Guide environment; using SAS Enterprise Guide to access existing programs and enhance processing; exploiting the enhanced development environment, including syntax completion and built-in function help; using SAS® Code Analyzer, Report Builder, and Document Builder; adding Project Parameters to generalize the usability of programs and processes; and leveraging built-in capabilities available in SAS Enterprise Guide to further enhance the information you deliver. Our audience is SAS users who understand the basics of SAS programming and want to learn how to use SAS Enterprise Guide. This paper is also appropriate for users of earlier versions of SAS Enterprise Guide who want to try the enhanced features available in SAS Enterprise Guide 5.1.
Marje Fecht, Prowerk Consulting
Rupinder Dhillon, Dhillon Consulting
Paper 1559-2014:
SAS® Grid Manager I/O: Optimizing SAS® Application Data Availability for the Grid
As organizations deploy SAS® applications to produce the analytical results that are critical for solid decision making, they are turning to distributed grid computing operated by SAS® Grid Manager. SAS Grid Manager provides a flexible, centrally managed computing environment for processing large volumes of data for analytical applications. Exceptional storage performance is one of the most critical components of implementing SAS in a distributed grid environment. When the storage subsystem is not designed properly or implemented correctly, SAS applications do not perform well, thereby reducing a key advantage of moving to grid computing. Therefore, a well-architected SAS environment with a high-performance storage environment is integral to clients getting the most out of their investment. This paper introduces concepts from software storage virtualization in the cloud for the generalized SAS Grid Manager architecture, highlights platform and enterprise architecture considerations, and uses a widely deployed distributed file system, IBM GPFS, as an example. File system scalability considerations, configuration details, and tuning suggestions are provided in a manner that can be applied to a client's own environment. A summary checklist of important factors to consider when architecting and deploying a shared, distributed file system is provided.
Gregg Rohaly, IBM
Harry Seifert, IBM
Paper 1724-2014:
SAS® Macros 101
You've been coding in Base SAS® for a while. You've seen it, maybe even run code written by someone else, but there is something about the SAS® Macro Language that is preventing you from fully embracing it. Could it be that % sign that appears everywhere, that &, that &&, or even that dreaded &&&? Fear no more. This short presentation will make everything clearer and encourage you to start coding your own SAS macros.
Alex Chaplin, Bank of America
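The dreaded ampersands in one small example: double ampersands delay resolution so that a macro variable name can itself be built from other macro variables:

   %let site1 = Boston;
   %let site2 = Toronto;
   %let i = 2;

   /* &&site&i resolves first to &site2, then to Toronto */
   %put Selected site: &&site&i;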
Paper 1622-2014:
SAS® Solutions to Identifying Hospital Readmissions
Hospital readmission rates have become a key indicator for measuring the quality of health care. Use of these rates has been adopted by major healthcare stakeholders, including the Centers for Medicare & Medicaid Services (CMS), the Agency for Healthcare Research and Quality (AHRQ), and the National Committee for Quality Assurance (NCQA). In calculating the readmission rate, it is often a challenging task to identify eligible hospital readmissions from convoluted administrative claims data. By taking advantage of the flexibility and power of SAS® programming tools, this paper proposes three different solutions, using both the DATA step and PROC SQL, to help identify 30-day hospital readmissions more efficiently and accurately. Solution 1 (DATA step, vertical) employs the LAG function to calculate the gap between the current admission date and the immediately previous discharge date. This vertical thinking process is straightforward and does not require additional data management. Solution 2 (DATA step, horizontal) uses the TRANSPOSE procedure, ARRAYs, and DO loops to transform the claims data from long to wide and examines each patient's hospitalization experience in just one line. A similar horizontal thinking process has been discussed in previous SAS papers for calculating medication utilization. Solution 3 (PROC SQL) takes advantage of a special table join (a self-join), creating a Cartesian product further subsetted by a join condition and WHERE statements. All three solutions achieve the same results by correctly identifying 30-day hospital readmissions, and they can be handily applied to similar programming challenges in research projects.
Weifeng Fan, UMWA Health and Retirement Funds
Maryam Sarfarazi, UMWA Health and Retirement Funds
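A sketch of Solution 1, the vertical LAG approach; the variable names are hypothetical:

   proc sort data=claims;
      by patient_id admit_date;
   run;

   data readmit30;
      set claims;
      by patient_id;
      prev_disch = lag(disch_date);         /* call LAG unconditionally */
      if first.patient_id then prev_disch = .;
      /* flag admissions within 30 days of the prior discharge */
      readmit_30 = (0 <= admit_date - prev_disch <= 30);
      format prev_disch date9.;
   run;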
Paper 1663-2014:
SAS® Visual Analytics Delivers Insights into the UK University League Tables
Universities in the UK are now subject to league table reporting by a range of providers, and the criteria used by each league table differ. Universities, their faculties, and individual subject areas want to understand how the different tables are constructed and calculated, and what is required to maximize their position in each league table, in order to attract the best students to their institution, thereby maximizing recruitment and student-related income streams. The School of Computing and Maths at the University of Derby is developing the use of SAS® Visual Analytics to analyse each league table, providing actionable insights into the steps that can be taken to improve their relative standing in the league tables, as well as insights into feasible target levels relative to peer groups of institutions. This paper outlines the approaches taken and some of the critical insights developed that will be of value to other higher education institutions in the UK, and suggests approaches that might be valuable in other countries.
Richard Self, University of Derby
Stuart Berry, University of Derby
Claire Foyle, University of Derby
Dave Voorhis, University of Derby
Paper 1282-2014:
SAS® XML Programming Techniques
Due to XML's growing role in data interchange, it is increasingly important for SAS® programmers to become proficient with SAS technologies and techniques for creating and consuming XML. The current work expands on a SAS® Global Forum 2013 presentation that dealt with these topics, providing additional examples of using XML maps to read and write XML files and of using the Output Delivery System (ODS) to create custom tagsets for generating XML.
Chris Schacherer, Clinical Data Management Systems, LLC
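A minimal sketch of the XML map approach for reading XML; the file names are illustrative, and the map file itself would be built, for example, with SAS XML Mapper:

   filename resp 'responses.xml';
   filename map  'responses.map';
   libname  resp xmlv2 xmlmap=map access=readonly;

   /* Each mapped element becomes a member that can be copied or read directly */
   proc copy in=resp out=work;
   run;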
Paper 2027-2014:
SAS® and Java Application Integration for Dummies
Traditionally, Java web applications interact with back-end databases by means of JDBC/ODBC connections to retrieve and update data. With the growing need for real-time charting and complex analysis in these types of web applications, SAS® computing power can be put to use by adding a SAS web service layer between the application and the database. This paper shows how a SAS web service layer can be used to render data to a Java application in a summarized form using SAS® Stored Processes. This paper also demonstrates how inputs can be passed to a SAS Stored Process, based on which computations and summarizations are made before output parameters and/or output data streams are returned to the Java application. SAS Stored Processes are then deployed as SAS® BI Web Services using SAS® Management Console, which are available to the Java application as a URL. We use the SOAP method to interact with the web services, with XML as the communication medium. We then illustrate how RESTful web services can be used with JSON objects as the communication medium between the Java application and SAS in SAS® 9.3. Once this pipeline communication between the application, the SAS engine, and the database is set up, any complex manipulation or analysis supported by SAS can be incorporated into the SAS Stored Process. We then illustrate how graphs and charts can be passed as outputs to the application.
Neetha Sindhu, Kavi Associates
Hari Hara Sudhan, Kavi Associates
Mingming Wang, Kavi Associates
Paper 1631-2014:
SAS® as a Code Manipulation Language: An Example of Writing a Music Exercise Book with Lilypond and SAS.
Using Lilypond typesetting software, you can write publication-grade music scores. The input for Lilypond is a text file that can be written once and then transferred to SAS® for patterned repetition, so that you can cycle through patterns that occur in music. The author plays a sequence of notes and then writes this into Lilypond code. The sequence starts in the key of C with only a two-note sequence, and is then extended to three-, four-, and five-note sequences, always contained in one octave. SAS is then used to write the same code for all eleven other keys and in seven scale modes. The method is very simple and does not require advanced programming. Lookup files are used in the programming, demonstrating efficient lookup techniques. The result is a lengthy book of exercises for practicing music, produced as a PDF file, along with a sound source file in MIDI format that you can hear. This method shows how various programming languages can be used to write other programming languages.
Peter Timusk, Statistics Canada
Paper 1569-2014:
SAS® for Bayesian Mediation Analysis
Statistical mediation analysis is common in business, social sciences, epidemiology, and related fields because it explains how and why two variables are related. For example, mediation analysis is used to investigate how product presentation affects liking the product, which then affects the purchase of the product. Mediation analysis evaluates the mechanism by which a health intervention changes norms that then change health behavior. Research on mediation analysis methods is an active area of research. Some recent research in statistical mediation analysis focuses on extracting accurate information from small samples by using Bayesian methods. The Bayesian framework offers an intuitive solution to mediation analysis with small samples; namely, incorporating prior information into the analysis when there is existing knowledge about the expected magnitude of mediation effects. Using diffuse prior distributions with no prior knowledge allows researchers to reason in terms of probability rather than in terms of (or in addition to) statistical power. Using SAS® PROC MCMC, researchers can choose one of two simple and effective methods to incorporate their prior knowledge into the statistical analysis, and can obtain the posterior probabilities for quantities of interest such as the mediated effect. This project presents four examples of using PROC MCMC to analyze a single mediator model with real data using: (1) diffuse prior information for each regression coefficient in the model, (2) informative prior distributions for each regression coefficient, (3) diffuse prior distribution for the covariance matrix of variables in the model, and (4) informative prior distribution for the covariance matrix.
Milica Miočević, Arizona State University
David MacKinnon, Arizona State University
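A hedged sketch of a single-mediator model in PROC MCMC with diffuse priors, in the spirit of the authors' first example; the data set and variable names (x, m, y) are hypothetical, and the mediated effect is computed afterward from the saved posterior draws:

   proc mcmc data=med nbi=5000 nmc=50000 seed=27513
             outpost=post monitor=(a1 b1);
      parms a0 0 a1 0 b0 0 b1 0 b2 0;
      parms s2m 1 s2y 1;
      prior a: b: ~ normal(0, var=1e6);           /* diffuse priors */
      prior s2m s2y ~ igamma(0.01, scale=0.01);
      mu_m = a0 + a1*x;                           /* mediator equation */
      mu_y = b0 + b1*m + b2*x;                    /* outcome equation  */
      model m ~ normal(mu_m, var=s2m);
      model y ~ normal(mu_y, var=s2y);
   run;

   /* Posterior distribution of the mediated effect a1*b1 */
   data mediated;
      set post;
      ab = a1*b1;
   run;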
Paper 1503-2014:
Scatter Plot Smoothing Using PROC LOESS and Restricted Cubic Splines
SAS® has a number of procedures for smoothing scatter plots. In this tutorial, we review the nonparametric technique called LOESS, which estimates local regression surfaces. We review the LOESS procedure and then compare it to a parametric regression methodology that employs restricted cubic splines to fit nonlinear patterns in the data. Not only do these two methods fit scatterplot data, but they can also be used to fit multivariate relationships.
Jonas Bilenas, Barclays UK&E RBB
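Both techniques in miniature, with SASHELP.CARS standing in for real data; the spline here uses the EFFECT statement in PROC GLMSELECT as one convenient way to build a restricted (natural) cubic basis:

   ods graphics on;

   /* Nonparametric LOESS fits at several smoothing values */
   proc loess data=sashelp.cars;
      model mpg_city = horsepower / smooth=0.2 0.4 0.6;
   run;

   /* Parametric alternative: restricted cubic spline basis */
   proc glmselect data=sashelp.cars;
      effect sp = spline(horsepower / naturalcubic basis=tpf(noint)
                         knotmethod=percentiles(5));
      model mpg_city = sp / selection=none;
   run;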
Paper 1760-2014:
Scenarios Where Utilizing a Spline Model in Developing a Regression Model Is Appropriate
Linear regression has been a widely used approach in social and medical sciences to model the association between a continuous outcome and the explanatory variables. Assessing the model assumptions, such as linearity, normality, and equal variance, is a critical step for choosing the best regression model. If any of the assumptions are violated, one can apply different strategies to improve the regression model, such as performing transformation of the variables or using a spline model. SAS® has been commonly used to assess and validate the postulated model and SAS® 9.3 provides many new features that increase the efficiency and flexibility in developing and analyzing the regression model, such as ODS Statistical Graphics. This paper aims to demonstrate necessary steps to find the best linear regression model in SAS 9.3 in different scenarios where variable transformation and the implementation of a spline model are both applicable. A simulated data set is used to demonstrate the model developing steps. Moreover, the critical parameters to consider when evaluating the model performance are also discussed to achieve accuracy and efficiency.
Ning Huang, University of Southern California
Paper 1279-2014:
Selecting Peer Institutions with Cluster Analysis
Universities strive to be competitive in the quality of education as well as cost of attendance. Peer institutions are selected to make comparisons pertaining to academics, costs, and revenues. These comparisons lead to strategic decisions and long-range planning to meet goals. The process of finding comparable institutions could be completed with cluster analysis, a statistical technique. Cluster analysis places universities with similar characteristics into groups or clusters. A process to determine peer universities will be illustrated using PROC STANDARD, PROC FASTCLUS, and PROC CLUSTER.
Diana Suhr, University of Northern Colorado
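The three-procedure pipeline in outline, with hypothetical institutional variables:

   /* Standardize so no single measure dominates the distance metric */
   proc standard data=institutions out=inst_std mean=0 std=1;
      var enrollment tuition research_expend grad_rate;
   run;

   /* Nonhierarchical (k-means style) clustering */
   proc fastclus data=inst_std out=inst_clus maxclusters=8 maxiter=50;
      var enrollment tuition research_expend grad_rate;
   run;

   /* Hierarchical clustering for comparison */
   proc cluster data=inst_std method=ward outtree=tree;
      var enrollment tuition research_expend grad_rate;
   run;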
Paper 1689-2014:
Simple ODS Tips to Get RWI (Really Wonderful Information)
SAS® continues to expand and improve its reporting capability. With new SAS® 9.4 enhancements in ODS (Output Delivery System), the opportunity to create stunning reports has expanded even further. If you are charged with creating relevant, informative, easy-to-read reports for clients or administrators, then the ODS Report Writing Interface, ODS LAYOUT enhancements, and the new ODSTEXT procedure are important tools to use. These tools allow you to create reports in a smart, eye-catching format that can be turned around quite quickly and programmed to provide optimum flexibility. How many times have you worked hours to tweak and fine-tune a report directly in Microsoft Excel, Microsoft Word, Microsoft PowerPoint, or some other similar software, only to be asked for a "quick update," which would then take hours to recreate because you are manually transferring data? Do you ever dread receiving the compliment "This is really wonderful information!!!!" because you know it will be followed by "Can you run this for EVERY region?" Well, dread no more, because when you harness the power of SAS® ODS, you can create first-rate, flexible, fabulous reports! Join me as I share with you two real-world examples of ODS capabilities using (1) a marketing piece I designed to help the president of our university spotlight county- and region-specific data as he recruited across the state and (2) our academic program review form, a multi-page report that outputs to Word so that program coordinators can add personalized commentary to support their program's effectiveness.
Gina Huff, Western Kentucky University
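A taste of one of the tools named above, the SAS 9.4 ODSTEXT procedure, which places styled free-text paragraphs into any open ODS destination; the text, file name, and style here are illustrative:

   ods rtf file='review.rtf';

   proc odstext;
      p 'Academic Program Review' / style=[fontsize=14pt fontweight=bold];
      p 'Coordinator commentary may be added below each table.';
   run;

   ods rtf close;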
Paper 1748-2014:
Simulation of MapReduce with the Hash-of-Hashes Technique
Big data is all the rage these days, with the proliferation of data-accumulating electronic gadgets and instrumentation. At the heart of big data analytics is the MapReduce programming model. As a framework for distributed computing, MapReduce uses a divide-and-conquer approach to allow large-scale parallel processing of massive data. As the name suggests, the model consists of a Map function, which first splits data into key-value pairs, and a Reduce function, which then carries out the final processing of the mapper outputs. It is not hard to see how these functions can be simulated with the SAS® hash objects technique, and in reality, implemented in the new SAS® DS2 language. This paper demonstrates how hash object programming can handle data in a MapReduce fashion and shows some potential applications in physics, chemistry, biology, and finance.
Joseph Hinson, Accenture Life Sciences
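A word-count sketch of the map/reduce idea using a DATA step hash object; the input data set SENTENCES with a character variable TEXT is hypothetical:

   data _null_;
      length word $32 count 8;
      if _n_ = 1 then do;
         declare hash h();
         h.defineKey('word');
         h.defineData('word', 'count');
         h.defineDone();
      end;
      set sentences end=last;
      /* Map: split TEXT into words; Reduce: sum counts per key */
      i = 1;
      do while (scan(text, i, ' ') ne '');
         word = lowcase(scan(text, i, ' '));
         if h.find() = 0 then count = count + 1;
         else count = 1;
         rc = h.replace();
         i + 1;
      end;
      if last then rc = h.output(dataset: 'word_counts');
   run;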
Paper 1610-2014:
Something for Nothing! Converting Plots from SAS/GRAPH® to ODS Graphics
All the documentation about the creation of graphs with SAS® software states that ODS Graphics is not intended to replace SAS/GRAPH®. However, ODS Graphics is included in the Base SAS® license from SAS® 9.3, but SAS/GRAPH still requires an additional component license, so there is definitely a financial incentive to convert to ODS Graphics. This paper gives examples that can be used to replace commonly created SAS/GRAPH plots, and highlights the small number of plots that are still very difficult, or impossible, to create in ODS Graphics.
Philip Holland, Holland Numerics Ltd
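A flavor of the conversions: a SAS/GRAPH scatter plot and its ODS Graphics replacement, which needs no SAS/GRAPH license:

   /* SAS/GRAPH original */
   proc gplot data=sashelp.class;
      plot weight*height;
   run;
   quit;

   /* ODS Graphics equivalent */
   proc sgplot data=sashelp.class;
      scatter x=height y=weight;
   run;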
Paper 1645-2014:
Speed Dating: Looping Through a Table Using Dates
Have you ever needed to use dates as values to loop through a table? For example, how many events occurred 1, 2, 3, ... n months ahead? Maybe you just changed the dates manually and re-ran the query n times? This is a common need in the economic and behavioral sciences. This presentation demonstrates how to create a table of dates that can be used with SAS® macro variables to loop through a table. Using this dates table in combination with the SAS DO loop ensures accuracy and saves time.
Scott Fawver, Arch Mortgage Insurance Company
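A sketch of the dates-table pattern; %count_events is a hypothetical macro that accepts a cutoff date:

   /* Control table of month-end cutoff dates */
   data cutoffs;
      do i = 1 to 12;
         cutoff = intnx('month', '01jan2014'd, i) - 1;
         output;
      end;
      format cutoff date9.;
   run;

   /* Generate one macro call per date */
   data _null_;
      set cutoffs;
      call execute(cats('%nrstr(%count_events)(cutoff=',
                        put(cutoff, date9.), ')'));
   run;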
Paper 1586-2014:
Stylish Waterfall Graphs Using SAS® 9.3 and SAS® 9.4 Graph Template Language
One beautiful graph provides visual clarity of data summaries reported in tables and listings. Waterfall graphs show, at a glance, the increase or decrease of data analysis results from various industries. The introduction of SAS® 9.2 ODS Statistical Graphics enables SAS® programmers to produce high-quality results with less coding effort. Also, SAS programmers can create sophisticated graphs in stylish custom layouts using the SAS® 9.3 Graph Template Language and ODS style template. This poster presents two sets of example waterfall graphs in the setting of clinical trials using SAS® 9.3 and later. The first example displays colorful graphs using new SAS 9.3 options. The second example displays simple graphs with gray-scale color coding and patterns. SAS programmers of all skill levels can create these graphs on UNIX or Windows.
Setsuko Chiba, Exelixis Inc.
Paper 1443-2014:
Summarizing Data for a Systematic Review
Systematic reviews have become increasingly important in healthcare, particularly when there is a need to compare new treatment options and to justify clinical effectiveness versus cost. This paper describes a method in SAS/STAT® 9.2 for computing weighted averages and weighted standard deviations of clinical variables across treatment options while correctly using these summary measures to make accurate statistical inference. The analyses of data from systematic reviews typically involve computations of weighted averages and comparisons across treatment groups. However, the application of the TTEST procedure does not currently take into account weighted standard deviations when computing p-values. The use of a default non-weighted standard deviation can lead to incorrect statistical inference. This paper introduces a method for computing correct p-values using weighted averages and weighted standard deviations. Given a data set containing variables for three treatment options, we want to make pairwise comparisons of three independent treatments. This is done by creating two temporary data sets using PROC MEANS, which yields the weighted means and weighted standard deviations. We then perform a t-test on each temporary data set. The resulting data sets containing all comparisons of the treatment options are merged and then transposed to obtain the necessary statistics. The resulting output provides pairwise comparisons of each treatment option and uses the weighted standard deviations to yield the correct p-values in a desired format. This method allows the use of correct weighted standard deviations using PROC MEANS and PROC TTEST in summarizing data from a systematic review while providing correct p-values.
Ravi Gaddameedi, California State University
Usha Kreaden, Intuitive Surgical
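The flavor of the computation, as a minimal sketch with hypothetical names (REVIEW, TREAT, OUTCOME, weight W): PROC MEANS supplies the weighted means and standard deviations, and a DATA step computes a Satterthwaite-style t and p-value from them.

    proc means data=review noprint;
       class treat;
       var outcome;
       weight w;
       output out=stats n=n mean=mean std=std;   /* weighted mean and std */
    run;

    data ttest(keep=t df p);
       set stats(where=(_type_=1));
       retain n1 m1 s1;
       if _n_ = 1 then do; n1 = n; m1 = mean; s1 = std; end;
       else do;
          se2 = s1**2/n1 + std**2/n;
          t   = (m1 - mean) / sqrt(se2);
          df  = se2**2 / ((s1**2/n1)**2/(n1-1) + (std**2/n)**2/(n-1));
          p   = 2*(1 - probt(abs(t), df));        /* two-tailed p-value */
          output;
       end;
    run;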
Paper 1505-2014:
Supporting SAS® Software in a Research Organization
Westat utilizes SAS® software as a core capability for providing clients in government and private industry with analysis and characterization of survey data. Staff programmers, analysts, and statisticians use SAS to manage, store, and analyze client data, as well as to produce tabulations, reports, graphs, and summary statistics. Because SAS is so widely used at Westat, the organization has built a comprehensive infrastructure to support its deployment and use. This paper provides an overview of Westat's SAS support infrastructure, which supplies resources that are aimed at educating staff, strengthening their SAS skills, providing SAS technical support, and keeping the staff on the cutting edge of SAS programming techniques.
Michael Raithel, Westat
Paper 1892-2014:
Survival of Your Heart: Analyzing the Effect of Stress on a Cardiac Event and Predicting the Survival Chances
One in every four people dies of heart disease in the United States, and stress is an important factor that contributes to a cardiac event. As the condition of the heart gradually worsens with age, the factors that lead to a myocardial infarction when patients are subjected to stress are analyzed. The data used for this project was obtained from a survey conducted through the Department of Biostatistics at Vanderbilt University. The objective of this poster is to predict the chance of survival of a patient after a cardiac event. Using decision trees, neural networks, regression models, bootstrap decision trees, and ensemble models, we predict the target, which is modeled as a binary variable indicating whether a person is likely to survive or die. The top 15 models, each with an accuracy of over 70%, were considered. The model gives important survival characteristics of a patient, including history of diabetes, smoking, hypertension, and angioplasty.
Yogananda Domlur Seetharama, Oklahoma State University
Sai Vijay Kishore Movva, Oklahoma State University
Paper 1893-2014:
Text Analytics: Predicting the Success of Newly Released Free Android Apps Using SAS® Enterprise Miner and SAS® Sentiment Analysis Studio
With the smartphone and mobile apps market developing so rapidly, expectations about the effectiveness of mobile applications are high. Marketers and app developers need to analyze the huge amount of data available well before an app's release, not only to better market the app, but also to avoid costly mistakes. The purpose of this poster is to build models to predict the success rate of an app to be released in a particular category. Data was collected for 540 Android apps under the 'Top free newly released apps' category. The SAS® Enterprise Miner Text Mining node and SAS® Sentiment Analysis Studio are used to parse and tokenize the collected customer reviews and also to calculate the average customer sentiment score for each app. Linear regression, neural, and auto-neural network models have been built to predict the rank of an app by considering average rating, number of installations, total number of reviews, number of 1-5 star ratings, app size, category, content rating, and average customer sentiment score as independent variables. A linear regression model with the least Average Squared Error is selected as the best model, and the number of installations and app maturity content are the significant model variables. App category, user reviews, and average customer sentiment score are also important variables in deciding the success of an app. The poster summarizes the app success trends across various factors and also introduces a new SAS® macro, %getappdata, which we developed for web crawling and text parsing.
Vandana Reddy, Oklahoma State University
Chinmay Dugar, Oklahoma State University
Paper 1834-2014:
Text Mining Economic Topic Sentiment for Time Series Modeling
Global businesses must react to daily changes in market conditions over multiple geographies and industries. Consuming reputable daily economic reports assists in understanding these changing conditions, but requires both a significant human time commitment and a subjective assessment of each topic area of interest. To combat these constraints, Dow's Advanced Analytics team has constructed a process to calculate sentence-level topic frequency and sentiment scoring from unstructured economic reports. Daily topic sentiment scores are aggregated to weekly and monthly intervals and used as exogenous variables to model external economic time series data. These models serve both to validate the relationship between our sentiment scores and the economic indices and to provide near-term forecasts where daily or weekly variables are unavailable. This paper will first describe our process of using SAS® Text Miner to import and discover economic topics and sentiment from unstructured economic reports. The next section describes sentiment variable selection techniques that use SAS/STAT®, SAS/ETS®, and SAS® Enterprise Miner to generate similarity measures to economic indices. Our process then uses ARIMAX modeling in SAS® Forecast Studio to create economic index forecasts with topic sentiments. Finally, we show how the sentiment model components are used as a matrix of economic key performance indicators by topic and geography.
Michael P. Dessauer, The Dow Chemical Company
Justin Kauhl, Tata Consultancy Services
Paper 1483-2014:
The Armchair Quarterback: Writing SAS® Code for the Perfect Pivot (Table, That Is)
'Can I have that in Excel?' This is a request that makes many of us shudder. Now your boss has discovered Microsoft Excel pivot tables. Unfortunately, he has not discovered how to make them. So you get to extract the data, massage the data, put the data into Excel, and then spend hours rebuilding pivot tables every time the corporate data is refreshed. In this workshop, you learn to be the armchair quarterback and build pivot tables without leaving the comfort of your SAS® environment. You learn the basics of Excel pivot tables and, through a series of exercises, how to augment basic pivot tables, first in Excel and then using SAS. No prior knowledge of Excel pivot tables is required.
Peter Eberhardt, Fernwood Consulting Group Inc.
Paper 1504-2014:
The Power of PROC FORMAT
The FORMAT procedure in SAS® is a very powerful and productive tool, yet many beginning programmers rarely make use of it. The FORMAT procedure provides a convenient way to do a table lookup in SAS. User-generated FORMATS can be used to assign descriptive labels to data values, create new variables, and find unexpected values. PROC FORMAT can also be used to generate data extracts and to merge data sets. This paper provides an introductory look at PROC FORMAT for the beginning user and provides sample code that illustrates the power of PROC FORMAT in a number of applications. Additional examples and applications of PROC FORMAT can be found in the SAS® Press book titled 'The Power of PROC FORMAT.'
Jonas Bilenas, Barclays UK&E RBB
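For example, a user-written format can act as a table lookup, with the OTHER category flagging unexpected values (data set and codes hypothetical):

    proc format;
       value $regfmt 'NE'  = 'Northeast'
                     'SE'  = 'Southeast'
                     other = 'CHECK';             /* surfaces unexpected codes */
    run;

    data scored;
       set customers;
       region_name = put(region, $regfmt.);       /* lookup via the PUT function */
    run;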
Paper 1627-2014:
The RAKE-TRIM Algorithm: Reducing Variance and Bias in Sampling Weights
Raking (iterative proportional fitting) is a procedure that takes sampling weights from complex sample surveys and adjusts them so that they add to known control totals. This process reduces variance and adjusts for undercoverage. But raking in multiple dimensions can lead to extreme weights, which increase variance. Trimming is another sample weighting procedure that reduces extreme weights to cutoffs, thereby improving variance properties while potentially introducing bias. The RAKE-TRIM macro combines raking and trimming in an iterative algorithm to achieve these two goals simultaneously. The raking reduces the bias potential from trimming, and the trimming reduces the variance inflation from raking. When convergence occurs, the final weights aggregate to the control totals, as well as respect the trimming limits. SAS® macros are well suited for this kind of envelope program: the larger macro consists of the integration of component macros that were developed for other applications. A parameter specification sheet enables users to provide all of the parameters needed to define the algorithm for their particular situation, and, if necessary, to alter the parameters to facilitate convergence. Diagnostics are included when convergence fails. Microsoft Excel tables are imported to provide the cell structure and are exported to provide statistics for the algorithm's results. This RAKE-TRIM macro was first developed in 2010 for the 2009 National Household Transportation Survey and has been used in other studies as well. The paper describes the algorithm and discusses our experiences with it.
Louis Rizzo, Westat
Paper 1713-2014:
The Role of Customer Response Models in Customer Solicitation Center's Direct Marketing Campaign
Direct marketing is the practice of delivering promotional messages directly to potential customers on an individual basis rather than by using mass media. In this project, we build a finely tuned response model that helps a financial services company to select high-quality receptive customers for their future campaigns and to identify the important factors that influence marketing to effectively manage their resources. This study was based on the customer solicitation center's marketing campaign data (45,211 observations and 18 variables) available on UC Irvine's web site with attributes of present and past campaign information (communication type, contact duration, previous campaign outcome, and so on) and the customer's personal and banking information. As part of data preparation, we performed mean imputation to handle missing values and categorical recoding to reduce the levels of class variables. In this study, we built several predictive models using the SAS® Enterprise Miner models Decision Tree, Neural Network, Logistic Regression, and SVM to predict whether the customer responds to the loan offer by subscribing. The results showed that the Stepwise Logistic Regression model was the best when chosen based on the misclassification rate criterion. When the top 3 decile customers were selected based on the best model, the cumulative response rate was 14.5% in contrast to the baseline response rate of 5%. Further analysis showed that the customers are more likely to subscribe to the loan offer if they have the following characteristics: never been contacted in the past, no default history, and provided cell phone as primary contact information.
Arun Mandapaka, Oklahoma State University
Amit Kushwah, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Paper 1482-2014:
The SAS® Hash Object: It's Time to .find() Your Way Around
"This is the way I have always done it, and it works fine for me." Have you heard yourself or others say this when someone suggests a new technique to help solve a problem? Most of us have a set of tricks and techniques from which we draw when starting a new project. Over time we might overlook newer techniques because our old toolkit works just fine. Sometimes we actively avoid new techniques because our initial foray leaves us daunted by the steep learning curve to mastery. For me, the PRX functions and the SAS® hash object fell into this category. In this workshop, we address possible objections to learning to use the SAS hash object. We start with the fundamentals of setting up the hash object and work through a variety of practical examples to help you master this powerful technique.
Peter Eberhardt, Fernwood Consulting Group Inc.
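To give a taste of the fundamentals covered in the workshop, here is a minimal lookup sketch with hypothetical data set names: RATES, keyed by STATE, is loaded into a hash and matched against POLICIES.

    data matched;
       if 0 then set rates;                   /* define RATE in the PDV */
       if _n_ = 1 then do;
          declare hash lk(dataset: 'rates');
          lk.defineKey('state');
          lk.defineData('rate');
          lk.defineDone();
       end;
       set policies;
       if lk.find() = 0;                      /* keep only policies with a matching rate */
    run;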
Paper 1883-2014:
The Soccer Oracle: Predicting Soccer Game Outcomes Using SAS® Enterprise Miner
Applying models to analyze sports data has always been done by teams across the globe. The film Moneyball has generated much hype about how a sports team can use data and statistics to build a winning team. The objective of this poster is to use the model comparison algorithm of SAS® Enterprise Miner to pick the best model that can predict the outcome of a soccer game. It is hence important to determine which factors influence the results of a game. The data set used contains input variables about a team's offensive and defensive abilities, and the outcome of a game is modeled as a target variable. Using SAS Enterprise Miner, multinomial regression, neural networks, decision trees, ensemble models, and gradient boosting models are built. Over 100 different versions of these models are run. The data contains statistics from the 2012-13 English Premier League season. The competition has 20 teams playing each other in a home and away format. The season has a total of 380 games; the first 283 games are used to predict the outcome of the last 97 games. The target variable is treated as both a nominal and an ordinal variable with 3 levels for home win, away win, and tie. The gradient boosting model is the winning model, predicting games with 65% accuracy and identifying factors such as goals scored and ball possession as more important than fouls committed or red cards received.
Vandana Reddy, Oklahoma State University
Sai Vijay Kishore Movva, Oklahoma State University
Paper 1837-2014:
The Use of Analytics for Insurance Claim Fraud Detection: A Unique Challenge
Identifying claim fraud using predictive analytics represents a unique challenge. 1. Predictive analytics generally requires that you have a target variable that can be analyzed. Fraud is unique in this regard in that there is a lot of fraud that has occurred historically but has not been identified. Therefore, the definition of the target variable is difficult. 2. There is also a natural assumption that the past will bear some resemblance to the future. In the case of fraud, methods of defrauding insurance companies change quickly and can make the analysis of a historical database less valuable for identifying future fraud. 3. In an underlying database of claims that may have been determined to be fraudulent by an insurance company, there is often inconsistency between different claim adjusters regarding which claims are referred for investigation. This inconsistency can lead to erroneous model results due to data that is not homogeneous. This paper demonstrates how analytics can be used in several ways to help identify fraud: 1. more consistent referral of suspicious claims; 2. better identification of new types of suspicious claims; 3. incorporating claim adjuster insight into the analytics results. As part of this paper, we demonstrate the application of several approaches to fraud identification: 1. clustering; 2. association analysis; 3. PRIDIT (Principal Component Analysis of RIDIT scores).
Roosevelt C. Mosley, Pinnacle Actuarial Resources, Inc.
Nick Kucera, Pinnacle Actuarial Resources, Inc.
Paper 1422-2014:
Time Series Mapping with SAS®: Visualizing Geographic Change over Time in the Health Insurance Industry
Changes in health insurance and other industries often have a spatial component. Maps can be used to convey this type of information to the user more quickly than tabular reports and other non-graphical formats. SAS® provides programmers and analysts with the tools to not only create professional and colorful maps, but also the ability to display spatial data on these maps in a meaningful manner that aids in the understanding of the changes that have transpired. This paper illustrates the creation of a number of different maps for displaying change over time with examples from the health insurance arena.
Barbara Okerson, WellPoint
Paper 1890-2014:
Tips for Moving from Base SAS® 9.3 to SAS® Enterprise Guide® 5.1
For a longtime Base SAS® programmer, whether to use a different application for programming is a constant question when powerful applications such as SAS® Enterprise Guide® are available. This paper provides some important tips for a programmer, such as the best way to use the code window and how to take advantage of system-generated code in SAS Enterprise Guide 5.1. This paper also explains the differences between some of the functions and procedures in Base SAS and SAS Enterprise Guide. It highlights features in SAS Enterprise Guide such as process flow, data access management, and report automation, including formatting using XML tag sets.
Anjan Matlapudi, AmerihealthCaritas
Paper 1786-2014:
Tips to Use Character String Functions in Record Lookup
This paper gives you a better idea of how and where to use record lookup functions to locate observations where a variable has some characteristic. Various related functions are illustrated to search numeric and character values in this process. Code is shown with time comparisons. I discuss three possible ways to retrieve records: using the SAS® DATA step, PROC SQL, and Perl regular expressions. Real and CPU time processing issues are highlighted when comparing record retrieval with these methods. Although the programs were written for the PC using SAS® 9.2 in a Windows XP 32-bit environment, all the functions are applicable to any system. All the tools discussed are in Base SAS®. The typical attendee or reader will have some experience in SAS, but not a lot of experience dealing with large amounts of data.
Anjan Matlapudi, AmerihealthCaritas
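The three approaches, as minimal sketches against a hypothetical CLAIMS data set with a DIAG_TEXT variable:

    /* 1. DATA step character function */
    data hits1;
       set claims;
       if find(diag_text, 'fracture', 'i') then output;
    run;

    /* 2. PROC SQL */
    proc sql;
       create table hits2 as
       select * from claims
       where upcase(diag_text) like '%FRACTURE%';
    quit;

    /* 3. Perl regular expression */
    data hits3;
       set claims;
       if prxmatch('/fracture/i', diag_text) then output;
    run;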
Paper 1640-2014:
Tools of the SAS® Trade: A Centralized Macro-Based Reporting System
This paper introduces basic-to-advanced strategies and syntax, the tools of the SAS® trade, that enable client-quality PDF output to be delivered through a production system of macro programs. A variety of PROC REPORT output with proven client value serves to illustrate a discussion of the fundamental syntax used to create and share formats, macro programs, PROC REPORT output, inline styles, and style templates. The syntax is integrated into basic macro programs that demonstrate the core functionality of the reporting system. Later sections of the paper describe in detail the macro programs used to start and end a PDF: (a) programs to save all current titles, footnotes, and option settings, establish standard titles, footnotes, and option settings, and initially create the PDF document; and (b) programs to create a final standard data documentation page, end the PDF, and restore all original titles, footnotes, and option settings. The paper also shows how macro programs enable the setting of inline styles at the global, macro program, and macro program call levels. The paper includes the style template syntax and the complete PROC REPORT syntax generated by the macro programs, and is designed for the intermediate to advanced SAS programmer using Foundation SAS® for Release 9.2 on a Windows operating system.
Patrick Thornton, SRI International
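One readily available way to capture and restore settings of the kind described (a sketch, not the paper's macros): PROC OPTSAVE and PROC OPTLOAD handle system options, and the current titles and footnotes can be read from DICTIONARY.TITLES.

    proc optsave out=work.saved_options;          /* snapshot option settings */
    run;

    proc sql;                                     /* snapshot titles and footnotes */
       create table work.saved_titles as
       select * from dictionary.titles;
    quit;

    /* ... create the PDF report here ... */

    proc optload data=work.saved_options;         /* restore option settings */
    run;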
Paper 1561-2014:
Top 10 SQL Tricks in SAS®
One of the most striking features separating SAS® from other statistical languages is that SAS has native SQL (Structured Query Language) capacity. In addition to the merging or querying that a SAS user commonly applies in daily practice, SQL significantly enhances the power of SAS in descriptive statistics and data management. In this paper, we show reproducible examples to introduce 10 useful tips for the SQL procedure in Base SAS®.
Chao Huang, Oklahoma State University
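One representative trick of this kind is SAS SQL's automatic remerge of a summary statistic back onto the detail rows, sketched here with hypothetical names:

    proc sql;
       create table above_avg as
       select dept, name, score,
              mean(score) as dept_avg             /* remerged group mean */
       from employees
       group by dept
       having score > calculated dept_avg;        /* rows above their group average */
    quit;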
Paper 1660-2014:
Trimmed_t: A SAS® Macro for the Trimmed T-Test
The independent means t-test is commonly used for testing the equality of two population means. However, this test is very sensitive to violations of the population normality and homogeneity of variance assumptions. In such situations, Yuen's (1974) trimmed t-test is recommended as a robust alternative. The purpose of this paper is to provide a SAS® macro that allows easy computation of Yuen's symmetric trimmed t-test. The macro output includes a table with trimmed means for each of two groups, Winsorized variance estimates, degrees of freedom, and obtained value of t (with two-tailed p-value). In addition, the results of a simulation study are presented and provide empirical comparisons of the Type I error rates and statistical power of the independent samples t-test, Satterthwaite's approximate t-test, and the trimmed t-test when the assumptions of normality and homogeneity of variance are violated.
Patricia Rodriguez de Gil, University of South Florida
Anh P. Kellermann, University of South Florida
Diep T. Nguyen, University of South Florida
Eun Sook Kim, University of South Florida
Jeffrey D. Kromrey, University of South Florida
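The macro computes Yuen's statistic directly; for a quick look at trimmed means alone, the TRIMMED= option of PROC UNIVARIATE is one convenient starting point (data set and variable names hypothetical):

    ods output TrimmedMeans=tm;
    proc univariate data=scores trimmed=0.2;      /* 20% symmetric trimming */
       class group;
       var y;
    run;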
Paper 1598-2014:
Turn Your SAS® Macros into Microsoft Excel Functions with the SAS® Integrated Object Model and ADO
As SAS® professionals, we often wish our clients would make more use of the many excellent SAS tools at their disposal. However, it remains an indisputable fact that for many business users, Microsoft Excel is still their go-to application when it comes to carrying out any form of data analysis. There have been many attempts to integrate SAS and Excel, but none of these has up to now been entirely seamless. This paper addresses that problem by showing how, with a minimum of VBA (Visual Basic for Applications) code and by using the SAS Integrated Object Model (IOM) together with Microsoft's ActiveX Data Objects (ADO), we can create an Excel User Defined Function (UDF) that can accept parameters, carry out all data manipulations in SAS, and return the result to the spreadsheet in a way that is completely invisible to the user. They can nest or link these functions together just as if they were native Excel functions. We then go on to demonstrate how, using the same techniques, we can create small Excel applications that can perform sophisticated data analyses in SAS while not forcing users out of their Excel comfort zones.
Chris Brooks, Melrose Analytics Ltd
Paper 1614-2014:
Tying It All Together: A Story of Size Optimization at DSW
As a retailer, your bottom line is determined by supply and demand. Are you supplying what your customer is demanding? Or do they have to go look somewhere else? Accurate allocation and size optimization mean your customer will find what they want more often. And that means more sales, higher profits, and fewer losses for your organization. In this session, Linda Canada will share how DSW went from static allocation models without size capability to precision allocation using intelligent, dynamic models that incorporate item plans and size optimization.
Linda Canada, DSW Inc.
Paper 1619-2014:
Understanding and Applying the Logic of the DOW-Loop
The DOW-loop is not official terminology that one can find in SAS® documentation, but it has been well known and widely used among experienced SAS programmers. The DOW-loop was developed over a decade ago by a few SAS gurus, including Don Henderson, Paul Dorfman, and Ian Whitlock. A common construction of the DOW-loop consists of a DO-UNTIL loop with a SET and a BY statement within the loop. This construction isolates actions that are performed before and after the loop from the action within the loop, which results in eliminating the need for retaining or resetting the newly created variables to missing in the DATA step. In this talk, in addition to explaining the DOW-loop construction, we review how to apply the DOW-loop to various applications.
Arthur Li, City of Hope
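A minimal sketch of the construction, assuming a hypothetical DAILY data set sorted by ID: the total accumulates inside the loop without a RETAIN statement, and exactly one observation per group is output after it.

    data totals;
       do until (last.id);
          set daily;
          by id;
          total = sum(total, amount);   /* no RETAIN or reset to missing needed */
       end;
       output;                          /* one observation per ID */
    run;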
Paper 1624-2014:
Using Arrays for Epidemic Modeling in SAS®
Epidemic modeling is an increasingly important tool in the study of infectious diseases. As technology advances and more and more parameters and data are incorporated into models, it is easy for programs to get bogged down and become unacceptably slow. The use of arrays for importing real data and collecting generated model results in SAS® can help to streamline the process so results can be obtained and analyzed more efficiently. This paper describes a stochastic mathematical model for transmission of influenza among residents and healthcare workers in long-term care facilities (LTCFs) in New Mexico. The purpose of the model was to determine to what extent herd immunity among LTCF residents could be induced by varying the vaccine coverage among LTCF healthcare workers. Using arrays in SAS made it possible to efficiently incorporate real surveillance data into the model while also simplifying analyses of the results, which ultimately held important implications for LTCF policy and practice.
Carl Grafe, University of Utah
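Not the authors' model, but a toy sketch of the mechanics with hypothetical names: real data are loaded into a temporary array once, and generated results are collected in a second array.

    data results;
       array beta[52] _temporary_;           /* weekly rates from real surveillance data */
       do i = 1 to 52;
          set surveillance point=i;
          beta[i] = weekly_rate;
       end;
       array cases[52];
       susceptible = 100;
       cases[1] = 1;
       do week = 2 to 52;                    /* toy chain-binomial transmission step */
          p = 1 - exp(-beta[week] * cases[week-1] / 100);
          if susceptible > 0 then cases[week] = rand('binomial', p, susceptible);
          else cases[week] = 0;
          susceptible = susceptible - cases[week];
       end;
       output;
       stop;                                 /* required when reading with POINT= */
    run;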
Paper 2037-2014:
Using Java to Harness the Power of SAS®
Are you a Java programmer who has been asked to work with SAS®, or a SAS programmer who has been asked to provide an interface to your IT colleagues? Let's face it, not a lot of Java programmers are heavy SAS users. If this is the case in your company, then you are in luck because SAS provides a couple of really slick features to allow Java programmers to access both SAS data and SAS programming from within a Java program. This paper walks beginner Java or SAS programmers through the simple task of accessing SAS data and SAS programs from a Java program. All that you need is a Java environment and access to a running SAS process, such as a SAS server. This SAS server can be either a SAS/SHARE® server or an IOM server. However, if you do not have either of these two servers, that is okay; with the tools that are provided by SAS, you can start up a remote SAS session within Java and harness the power of SAS.
Jeremy Palbicki, Mayo Clinic
Paper 1667-2014:
Using PROC GPLOT and PROC REG Together to Make One Great Graph
Regression is a helpful statistical tool for showing relationships between two or more variables. However, many users can find the barrage of numbers at best unhelpful, and at worst undecipherable. Using the shipments and inventories historical data from the U.S. Census Bureau's office of Manufacturers' Shipments, Inventories, and Orders (M3), we can create a graphical representation of two time series with PROC GPLOT and map out reported and expected results. By combining this output with results from PROC REG, we are able to highlight problem areas that might need a second look. The resulting graph shows which dates have abnormal relationships between our two variables and presents the data in an easy-to-use format that even users unfamiliar with SAS® can interpret. This graph is ideal for analysts finding problematic areas such as outliers and trend-breakers or for managers to quickly discern complications and the effect they have on overall results.
William Zupko II, DHS
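A minimal sketch of the combination, with hypothetical names (an M3 extract with SHIPMENTS and INVENTORIES): PROC REG writes predicted values, and PROC GPLOT overlays them on the observed points.

    proc reg data=m3 noprint;
       model inventories = shipments;
       output out=fit p=predicted;
    run; quit;

    symbol1 value=dot  interpol=none;
    symbol2 value=none interpol=join;
    proc gplot data=fit;
       plot inventories*shipments=1 predicted*shipments=2 / overlay;
    run; quit;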
Paper 1882-2014:
Using PROC MCMC for Bayesian Item Response Modeling
The new Markov chain Monte Carlo (MCMC) procedure introduced in SAS/STAT® 9.2 and further enhanced in SAS/STAT® 9.3 enables Bayesian computations to run efficiently in SAS®. The MCMC procedure allows one to carry out complex statistical modeling within Bayesian frameworks across a wide spectrum of scientific research; in psychometrics, for example, the estimation of item and ability parameters is one such application. This paper describes how to use PROC MCMC for Bayesian inferences of item and ability parameters under a variety of popular item response models. This paper also covers how the results from SAS PROC MCMC differ from or resemble the results from WinBUGS. For those who are interested in the Bayesian approach to item response modeling, it is exciting and beneficial to shift to SAS, based on its flexibility in data management and its power in data analysis. Using the resulting item parameter estimates, one can continue with test form construction, test equating, and so on, with all these test development processes being accomplished in SAS!
Yi-Fang Wu, Department of Educational Measurement and Statistics, Iowa Testing Programs, University of Iowa
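The paper covers a range of models; purely as a flavor, a Rasch-type sketch in PROC MCMC might look like the following, assuming hypothetical long-format data RESP with variables PERSON, ITEM (1-5), and a binary response Y:

    proc mcmc data=resp nbi=2000 nmc=20000 seed=1 monitor=(b);
       array b[5];                                       /* item difficulties */
       parms b: 0;
       prior b: ~ normal(0, var=4);
       random theta ~ normal(0, var=1) subject=person;   /* person ability */
       p = 1 / (1 + exp(-(theta - b[item])));
       model y ~ binary(p);
    run;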
Paper 2342-2014:
Using SAS® Graph Template Language with SAS® 9.3 Updates to Visualize Data When There is Too Much Data to Visualize
Developing a good graph with ODS statistical graphics becomes a challenge when the input data maps to crowded displays with overlapping points or lines. Such is the case with the Framingham Heart Study of 5209 subjects captured in the Sashelp.Heart data set, a series of 100 booking curves for the airline industry, and three interleaving series plots that capture closing stock values over a twenty-year period for three giants in the computer industry. In this paper, transparency, layering, data point rounding, and color coding are evaluated for their effectiveness in adding visual clarity to graphics output. SAS® Graph Template Language plotting statements (compatible with SAS® 9.2) that are referenced in this paper include HISTOGRAM, SCATTERPLOT, BANDPLOT, and SERIESPLOT, as well as the layout statements OVERLAY, DATAPANEL, LATTICE, and GRIDDED, which produce single or multiple-panel graphs. SAS Graph Template Language is chosen over ODS Graphics procedures because of its greater graphics capability. While the original version of the paper used SAS 9.2, the latest version incorporates SAS® 9.3 updates such as HEATMAPPARM for heat maps that add a third dimension to a graph via color, and the RANGEATTRMAP statement for grouping continuous data in a legend. If you have a license for SAS 9.3, you automatically have access to Graph Template Language. Since this is not a tutorial, you will get more out of this presentation if you have read introductory papers or Warren Kuhfeld's book 'Statistical Graphics in SAS®: An Introduction to the Graph Template Language and the Statistical Graphics Procedures.'
Nate Derby, Nordstrom
Perry Watts, Stakana Analytics
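As a small taste of the approach, transparency alone declutters the Sashelp.Heart scatter; a minimal Graph Template Language sketch:

    proc template;
       define statgraph heart_scatter;
          begingraph;
             layout overlay;
                scatterplot x=systolic y=diastolic / datatransparency=0.9;
             endlayout;
          endgraph;
       end;
    run;

    proc sgrender data=sashelp.heart template=heart_scatter;
    run;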
Paper 1487-2014:
Using SAS® ODS Graphics
This presentation teaches the audience how to use SAS® ODS Graphics. Now part of Base SAS®, ODS Graphics is a great way to easily create clear graphics that enable any user to tell their story well. SGPLOT and SGPANEL are two of the procedures that can be used to produce powerful graphics that used to require a lot of work. The core features of these procedures are explained, as well as the options available. Furthermore, we explore ways to combine the individual statements to make more complex graphics that tell the story better. Any user of Base SAS on any platform will find great value in the SAS ODS Graphics procedures.
Chuck Kincaid, Experis Business Analytics
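For example, a grouped scatter plot with an overlaid regression fit takes only a few statements:

    proc sgplot data=sashelp.class;
       scatter x=height y=weight / group=sex;
       reg x=height y=weight;
    run;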
Paper 1785-2014:
Using SAS® Software to Shrink the Data Used in Apache Flex® Applications
This paper discusses the techniques I used at the Census Bureau to overcome the issue of dealing with large amounts of data while modernizing some of their public-facing web applications by using service oriented architecture (SOA) to deploy Flex web applications powered by SAS®. The paper covers techniques that resulted in reducing 142,293 XML lines (3.6 MB) down to 15,813 XML lines (1.8 MB), a 50% size reduction on the server side (HTTP Response), and 196,167 observations down to 283 observations, a reduction of 99.8% in summarized data on the client side (XML Lookup file).
Ahmed Al-Attar, AnA Data Warehousing Consulting, LLC
Paper 1481-2014:
Using SAS® Stored Processes To Build a Calibration Tool
In the past, calibration was done by using extremely complicated macros in Base SAS® to create a Microsoft Excel workbook with multiple linked spreadsheets. This process made it hard to audit, was not reliably replicable, and was open to user error. The task was to create a replicable, auditable, and locked-down application that allowed the user to change certain parameters and see the impact of those changes without needing to code. SAS® Stored Processes are used to generate a screen that is split into three sections: one shows static reporting, the second is a data-driven custom input form, and the third shows test results. The initial screen uses a standard stored process that enables the user to select the model and time period. Macro variables are passed through to subset data. The static reports are created by a stored process that executes two REPORT procedures that subset the data based on the passed parameters. The form is built using SAS® to generate HTML and is data driven. The Update button at the end of the form executes a stored process that collects the data that the user has entered into the form and updates a database. After the rates have been updated, they are used to generate test results using PROC REPORT.
Anita Measey, Bank of Montreal
Paper 1731-2014:
Using SAS® to Analyze the Impact of the Affordable Care Act
The Affordable Care Act that is being implemented now is expected to fundamentally reshape the health care industry. All current participants--providers, subscribers, and payers--will operate differently under a new set of key performance indicators (KPIs). This paper uses public data and SAS® software to establish a baseline for the health care industry today so that structural changes can be measured in the future to establish the impact of the new laws.
John Cohen, Advanced Data Concepts LLC
Meenal (Mona) Sinha, Independence Blue Cross
Paper 1707-2014:
Using SAS® to Examine Internal Consistency and to Develop Community Engagement Scores
Comprehensive cancer centers have been mandated to engage communities in their work; thus, measurement of community engagement is a priority area. Siteman Cancer Center's Program for the Elimination of Cancer Disparities (PECaD) projects seek to align with 11 Engagement Principles (EPs) previously developed in the literature. Participants in a PECaD pilot project were administered a survey with questions on community engagement in order to evaluate how well the project aligns with the EPs. Internal consistency is examined using PROC CORR with the ALPHA option to calculate Cronbach's alpha for questions that relate to the same EP. This allows items that lack internal consistency to be identified and to be edited or removed from the assessment. EP-specific scores are developed on quantity and quality scales. Lack of internal consistency was found for the items of six of the 16 EPs examined (alpha<.70). After editing the items, all EP question groups had strong internal consistency (alpha>.85). There was a significant positive correlation between quantity and quality scores (r=.918, P<.001). Average EP-specific scores ranged from 6.87 to 8.06; this suggests researchers adhered to the 11 EPs between 'sometimes' and 'most of the time' on the quantity scale and between 'good' and 'very good' on the quality scale. Examining internal consistency is necessary to develop measures that accurately determine how well PECaD projects align with EPs. Using SAS® to determine internal consistency is an integral step in the development of community engagement scores.
Renee Gennarelli, Washington University
Melody Goodman, Washington University
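A minimal sketch of the internal consistency check, with hypothetical item names Q1-Q6 standing in for the questions that relate to one EP:

    proc corr data=survey alpha nomiss;
       var q1-q6;        /* items intended to measure the same EP */
    run;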
Paper 1431-2014:
Using SAS® to Get More for Less
Especially in this current financial climate, many of us are being asked to do more with less. For several years, the Office of Institutional Research and Testing at Baylor University has been using SAS® software to increase the efficiency of the office and of the University as a whole. Reports that were once prepared manually have been automated. Data quality processes have been implemented in order to reduce the number of duplicate mailings. Predictive modeling is used to focus recruiting efforts on those prospective students most likely to respond. A web-based portal has been created to provide self-service report generation for many administrators across campus. Along with this, a number of data processing functions have been centralized, eliminating the need for additional programming skills and software support. This presentation discusses these improvements in more detail and provides examples of the end results.
Faron Kincheloe, Baylor University
Paper 1697-2014:
Using SAS® to Support the Implementation of a Patient-Centered Outcomes Research Institute Grant Funded by the Affordable Care Act
The Patient-Centered Outcomes Research Institute (PCORI) was created as part of the Affordable Care Act. PCORI is authorized by Congress to conduct research to provide information about the best available evidence to help patients and their health care providers make more informed decisions. Community Care Behavioral Health Organization in Pittsburgh, Pennsylvania was awarded a PCORI research grant to investigate health care system improvements for adults with serious mental illness. The grant, titled "Optimizing Behavioral Health Homes by Focusing on Outcomes that Matter Most for Adults with Serious Mental Illness," began in January of 2013 and is ongoing. Information Technology staff at Community Care have leveraged SAS® solutions in providing real-time data extraction and reports to support the development and implementation of this research project. SAS tools have been used to merge data from multiple platforms and database sources, including web data sources. SAS has also enabled the formatting and traffic lighting of multiple Microsoft Excel data sets and files, in addition to the creation of many operational reports and data files needed for study implementation, administration, and maintenance. The challenges faced and the SAS solutions employed are the subject of this paper.
Michele Mesiano, Community Care Behavioral Health Organization
Meghna Parthasarathy, Community Care Behavioral Health Organization
Lauren Terhorst, Community Care Behavioral Health Organization
Paper 2025-2014:
Using Sorting Algorithms to Create Sorted Lists
When providing lengthy cost and utilization data to medical providers, it is ideal to sort the report by descending cost (or utilization) so that the important categories are at the top. This task can be easily solved using PROC SORT. However, when you need other variables (such as unit cost per procedure or national average) to follow the sort but not be sorted themselves, the solution is not as intuitive. This paper looks at several sorting algorithms to solve this problem. First, we look at the basic bubble sort, which is still effective for smaller data sets: it sets up arrays for each variable and then sorts on just one of them. Next, we discuss the quicksort algorithm, which is effective for large data sets, too. The results of the sorts provide sorted data that is easy to read and makes for effective analysis.
Matthew Neft, Highmark Inc.
Chelle Pronko, Highmark Inc.
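A minimal bubble sort sketch with hypothetical variables: COST drives the descending sort, and the companion arrays are swapped in step so that unit cost follows the sort without being sorted itself.

    data sorted(keep=category total_cost unit_cost);
       length tc $40;
       set report end=last;
       array c[100]     _temporary_;        /* total cost (the sort key) */
       array u[100]     _temporary_;        /* unit cost (follows the key) */
       array l[100] $40 _temporary_;        /* category label */
       c[_n_] = total_cost;  u[_n_] = unit_cost;  l[_n_] = category;
       if last then do;
          do i = 1 to _n_ - 1;              /* bubble sort, descending on cost */
             do j = 1 to _n_ - i;
                if c[j] < c[j+1] then do;
                   tn = c[j]; c[j] = c[j+1]; c[j+1] = tn;
                   tn = u[j]; u[j] = u[j+1]; u[j+1] = tn;
                   tc = l[j]; l[j] = l[j+1]; l[j+1] = tc;
                end;
             end;
          end;
          do i = 1 to _n_;
             category = l[i]; total_cost = c[i]; unit_cost = u[i];
             output;
          end;
       end;
    run;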
Paper 1744-2014:
VFORMAT Lets SAS® Do the Format Searching
When reading data files or writing SAS® programs, we are often hunting for the right format or informat. There are so many to choose from! Does it seem like too many to search through in the manual? Let SAS help you find the right one! We use the SAS dictionary table VFORMAT and a very small SAS program. This presentation demonstrates how two simple functions unlock the potential of this great resource: SASHELP.VFORMAT.
Peter Crawford, Crawford Software Consultancy Limited
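For instance, a quick query against the view lists every format whose name mentions dates (the pattern is just an example):

    proc sql;
       select libname, memname, fmtname, source
       from sashelp.vformat
       where fmtname contains 'DATE';
    quit;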
Paper 1789-2014:
Visualizing Lake Michigan Wind with SAS® Software
The world's first wind resource assessment buoy, residing in Lake Michigan, uses a pulsing laser wind sensor to accurately measure wind speed, direction, and turbulence offshore up to wind turbine hub-height and across the blade span every second. Understanding wind behavior would be tedious and fatiguing with such large data sets. However, SAS/GRAPH® 9.4 helps the user grasp wind characteristics over time and at different altitudes by exploring the data visually. This paper covers graphical approaches to evaluate wind speed validity, seasonal wind speed variation, and storm systems to inform engineers on the candidacy of Lake Michigan offshore wind farms.
Aaron Clark, Grand Valley State University
Paper 1296-2014:
What's on My Mainframe? A Macro That Gives You a Solid Overview of Your SAS® Data on z/OS
In connection with the consolidation work at Nykredit, the data stored on the Nykredit z/OS SAS® installation had to be migrated (copied) to the new x64 Windows SAS platform storage. However, getting an overview of these data on the z/OS mainframe can be difficult, and a series of questions arise during the process. For example: Who is responsible? How many bytes? How many rows and columns? When were the data created? And so on. With extensive use of the FILENAME FTP access method, looping, and metadata extraction, it is possible to get an overview of the data on the host, presented in a Microsoft Excel spreadsheet.
Jesper Michelsen, Nykredit
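The core of the technique, sketched with a hypothetical host, credentials, and high-level qualifier: the LS option of the FTP access method returns a list of names that a DATA step can read and then loop over.

    filename dirlist ftp '' ls
             host='zos.example.com' user='myuser' pass='XXXXXXXX'
             cd="'PROD.SASDATA'";           /* hypothetical high-level qualifier */

    data datasets;
       infile dirlist truncover;
       input dsname $44.;                   /* one data set name per line */
    run;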
Paper 1440-2014:
What You're Missing About Missing Values
Do you know everything you need to know about missing values? Do you know how to assign a missing value to multiple variables with one statement? Can you display missing values as something other than "." or blank? How many types of missing numeric values are there? This paper reviews techniques for assigning, displaying, referencing, and summarizing missing values for numeric variables and character variables.
Christopher Bost, MDRC
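Three of the techniques in brief, using hypothetical data: the MISSING statement admits special missing values on input, CALL MISSING assigns several variables at once, and a format displays missing codes as text.

    data tests;
       missing A;                              /* read 'A' as special missing .A */
       input id score q1-q3;
       if id = 3 then call missing(of q1-q3);  /* set several variables at once */
       datalines;
    1 95 1 2 3
    2 A  4 5 6
    3 .  7 8 9
    ;
    run;

    proc format;
       value scorefmt .A    = 'Absent'
                      .     = 'Not recorded'
                      other = [8.];
    run;

    proc print data=tests;
       format score scorefmt.;
    run;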
Paper 2023-2014:
Working with Character Data
The DATA step allows one to read, write, and manipulate many types of data. As data evolves to a more free-form state, the ability of SAS® to handle character data becomes increasingly important. This paper addresses character data from multiple vantage points. For example, what is the default length of a character string, and why does it appear to change under different circumstances? What type of formatting is available for character data? How can we examine and manipulate character data? The audience for this paper is beginner to intermediate, and the goal is to provide an introduction to the numerous character functions available in SAS, including the basic LENGTH and SUBSTR functions, plus many others.
Andrew Kuligowski, HSN
Swati Agarwal, Optum
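A small sketch of the paper's themes with hypothetical values: an explicit LENGTH avoids truncation, while SUBSTR, CATX, and LENGTHN do the manipulation.

    data demo;
       length full $40;                 /* set the length before first use */
       first = 'Ada';                   /* default length $3, set by first assignment */
       last  = 'Lovelace';
       full  = catx(' ', first, last);  /* 'Ada Lovelace' */
       initial = substr(last, 1, 1);    /* 'L' */
       len = lengthn(full);             /* 12: trailing blanks are ignored */
    run;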
Paper 1692-2014:
You Can Have It All: Building Cumulative Data Sets
We receive a daily file with information about patients who use our drug. It's updated every day so that we have the most current information. Nearly every variable on a patient's record can be different from one day to the next. But what if you wanted to capture information that changed? For example, what if a patient switched doctors sometime along the way, and the original prescribing doctor is different from the patient's present doctor? With this type of daily file, that information is lost. To avoid losing these changes, you have to build a cumulative data set. I'll show you how to build it.
Myra Oltsik, Acorda Therapeutics
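Two sketches of the idea with hypothetical names (both files sorted by PATIENT_ID): a MERGE captures a changed value before it is lost, and an UPDATE then rolls the daily file into the cumulative master.

    /* capture doctor switches before overwriting */
    data changes;
       merge master(in=old rename=(doctor=prior_doctor)) daily(in=new);
       by patient_id;
       if old and new and doctor ne prior_doctor then output;
    run;

    /* roll today's file into the cumulative master */
    data master;
       update master daily;
       by patient_id;
    run;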