SAS Base Papers A-Z

A
Paper 3184-2015:
A Configurable SAS® Framework for Managing a Reporting System Based on SAS® OLAP Cube Studio
This paper illustrates a high-level infrastructure discussion with some explanation of the SAS® code used to implement a configurable batch framework for managing and updating the data rows and row-level permissions in SAS® OLAP Cube Studio. The framework contains a collection of reusable, parameter-driven Base SAS® macros, Base SAS custom programs, and UNIX or Linux shell scripts. This collection manages the typical steps and processes used for manipulating SAS files and for executing SAS statements. The Base SAS macro collection contains a group of utility macros that includes concurrent/parallel processing macros, SAS® Metadata Repository macros, SAS® Scalable Performance Data Engine table macros, table lookup macros, table manipulation macros, and other macros. There is also a group of OLAP-related macros that includes OLAP utility macros and OLAP permission table processing macros.
Read the paper (PDF).
Ahmed Al-Attar, AnA Data Warehousing Consulting, LLC
Paper 3257-2015:
A Genetic Algorithm for Data Reduction
When large amounts of data are available, choosing the variables for inclusion in model building can be problematic. In this analysis, a subset of variables was required from a larger set. This subset was to be used in a later cluster analysis with the aim of extracting dimensions of human flourishing. A genetic algorithm (GA), written in SAS®, was used to select the subset of variables from a larger set in terms of their association with the dependent variable life satisfaction. Life satisfaction was selected as a proxy for an as yet undefined quantity, human flourishing. The data were divided into subject areas (health, environment). The GA was applied separately to each subject area to ensure adequate representation from each in the future analysis when defining the human flourishing dimensions.
Read the paper (PDF). | Download the data file (ZIP).
Lisa Henley, University of Canterbury
Paper 3103-2015:
A Macro for Computing the Best Transformation
This session is intended to assist analysts in generating the best variables, such as monthly amount paid, daily number of received customer service calls, weekly hours worked on a project, or annual total sales for a specific product, by using simple arithmetic operators (square root, log, loglog, exp, and rcp). During a statistical data modeling process, analysts are often confronted with the task of computing derived variables from the existing variables. The advantage of this methodology is that the new variables might be more significant than the original ones. This paper provides a new way to compute all the possible variables using a set of math transformations. The code includes many SAS® features that are very useful tools for SAS programmers to incorporate in their future code, such as %SYSFUNC, SQL, %INCLUDE, CALL SYMPUT, %MACRO, SORT, CONTENTS, MERGE, DATA _NULL_, %DO...%TO loops, and many more.
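For flavor, a minimal sketch of one such transformation pass is shown below (the data set and variable names are hypothetical, and the paper's actual macro is more general):

    %macro xform(ds=, var=);
      data &ds._xf;
        set &ds;
        /* guard each transformation against invalid arguments */
        if &var >= 0 then &var._sqrt   = sqrt(&var);
        if &var >  0 then &var._log    = log(&var);
        if &var >  1 then &var._loglog = log(log(&var));
        &var._exp = exp(&var);
        if &var ne 0 then &var._rcp    = 1/&var;
      run;
    %mend xform;

    %xform(ds=work.payments, var=amt_paid)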
Read the paper (PDF). | Download the data file (ZIP).
Nancy Hu, Discover
Paper 3422-2015:
A Macro to Easily Generate a Calendar Report
This paper introduces a macro that generates a calendar report in two different formats. The first format displays the entire month in one plot, which is called a month-by-month calendar report. The second format displays the entire month in one row and is called an all-in-one calendar report. To use the macro, you just need to prepare a simple data set that has three columns: one column identifies the ID, one column contains the date, and one column specifies the notes for the dates. On the generated calendar reports, you can include notes and add different styles to certain dates. Also, the macro provides the option for you to decide whether those months in your data set that do not contain data should be shown on the reports.
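A hypothetical example of the three-column input data set the macro expects (the column names here are assumptions, not the macro's required names):

    data cal_input;
      infile datalines dsd;   /* comma-delimited in-stream data */
      input id :$8. date :mmddyy10. note :$40.;
      format date mmddyy10.;
      datalines;
    A001,01/05/2015,Site visit scheduled
    A001,01/12/2015,Follow-up call
    A002,01/20/2015,Dose adjustment
    ;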
Read the paper (PDF).
Ting Sa, Cincinnati Children's Hospital Medical Center
Paper 3218-2015:
A Mathematical Model for Optimizing Product Mix and Customer Lifetime Value
Companies that offer subscription-based services (such as telecom and electric utilities) must evaluate the tradeoff between month-to-month (MTM) customers, who yield a high margin at the expense of lower lifetime, and customers who commit to a longer-term contract in return for a lower price. The objective, of course, is to maximize the Customer Lifetime Value (CLV). This tradeoff must be evaluated not only at the time of customer acquisition, but throughout the customer's tenure, particularly for fixed-term contract customers whose contract is due for renewal. In this paper, we present a mathematical model that optimizes the CLV against this tradeoff between margin and lifetime. The model is presented in the context of a cohort of existing customers, some of whom are MTM customers and others who are approaching contract expiration. The model optimizes the number of MTM customers to be swapped to fixed-term contracts, as well as the number of contract renewals that should be pursued, at various term lengths and price points, over a period of time. We estimate customer life using discrete-time survival models with time-varying covariates related to contract expiration and product changes. Thereafter, an optimization model is used to find the optimal tradeoff between margin and customer lifetime. Although we specifically present the contract expiration case, this model can easily be adapted for customer acquisition scenarios as well.
Read the paper (PDF). | Download the data file (ZIP).
Atul Thatte, TXU Energy
Goutam Chakraborty, Oklahoma State University
Paper 3314-2015:
A Metadata to Physical Table Comparison Tool
When promoting metadata in large packages from SAS® Data Integration Studio between environments, metadata and the underlying physical data can become out of sync. This can result in metadata items that cannot be opened by users because SAS® has thrown an error. It often falls to the SAS administrator to resolve the synchronization issues when they might not have been responsible for promoting the metadata items in the first place. In this paper, we will discuss a simple macro that can be used to compare the table metadata to that of the physical tables, and any anomalies will be noted.
Read the paper (PDF). | Download the data file (ZIP).
David Moors, Whitehound Limited
Paper 2641-2015:
A New Method of Using Polytomous Independent Variables with Many Levels for the Binary Outcome of Big Data Analysis
In big data, many variables are polytomous with many levels. When the outcome is binary, the common way to handle a polytomous independent variable is a series of design variables, which correspond to the CLASS statement in PROC LOGISTIC. If big data has many polytomous independent variables with many levels, using design variables makes the analysis very complicated in both computation time and results, and might provide little help in predicting the outcome. This paper presents a new, simple method for logistic regression with polytomous independent variables in big data analysis. The first step of the proposed method is an iterative statistical analysis run from a SAS® macro program. Similar to the algorithm used in the creation of spline variables, this analysis searches for the proper aggregation groups, with statistically significant differences among all levels of a polytomous independent variable. The iteration starts from level 1, the level with the smallest outcome mean, which is tested against level 2, the level with the second smallest outcome mean. If these two groups have a statistically significant difference, testing proceeds with the level 2 group against the level 3 group. If level 1 and level 2 do not have a statistically significant difference, they are combined into a new level group 1, which is then tested against level 3. The processing continues until all the levels have been tested, and the original level values of the polytomous variable are then replaced by the new level values that carry statistically significant differences. The polytomous variable with new levels can be described by the means of all new levels because of the one-to-one equivalence relationship of a piecewise function in logit from the variable's levels to the outcome means. It is easy to show that the conditional mean of an outcome y given a polytomous variable x is a very good approximation based on maximum likelihood analysis. Compared with design variables, the new piecewise variable, based on the information of all levels and entering as a single independent variable, captures the impact of all levels in a much simpler way. We have used this method in predictive models of customer attrition on polytomous variables such as state, business type, and customer claim type. All of these polytomous variables show significantly better prediction of customer attrition than models that omit them or that use design variables.
Read the paper (PDF).
Jian Gao, Constant Contact
Jesse Harriot, Constant Contact
Lisa Pimentel, Constant Contact
Paper 1980-2015:
A Practical Guide to SAS® Extended Attributes
All SAS® data sets and variables have standard attributes. These include items such as creation date, engine, compression and sort information for data sets, and format and length information for variables. However, for the first time in SAS 9.4, the developer can add their own customized attributes to both data sets and variables. This paper shows how these extended attributes can be created, modified, and maintained. It suggests the sort of items that might be candidates for use as extended attributes and explains in what circumstances they can be used. It also provides a worked example of how they can be used to inform and aid the SAS programmer in creating SAS applications.
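For example, extended attributes are managed through the XATTR statement in PROC DATASETS; a minimal sketch (the data set and attribute names here are made up):

    proc datasets library=work nolist;
      modify patients;
      /* data set level attribute */
      xattr set ds SourceSystem='EDW extract, March 2015';
      /* variable-level attributes */
      xattr set var weight (Units='kg' Derivation='Measured at baseline');
    quit;

    /* extended attributes are displayed by PROC CONTENTS */
    proc contents data=work.patients;
    run;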
Read the paper (PDF).
Chris Brooks, Melrose Analytics Ltd
Paper 3465-2015:
A Quick View of SAS® Views®
Looking for a handy technique to have in your toolkit? Consider SAS® Views®, especially if you work with large data sets. After a brief introduction to SAS Views, I'll show you several cool ways to use them that will streamline your code and save workspace.
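A minimal sketch of the idea (library and data set names are hypothetical):

    /* the view stores only the program; no rows are written */
    data work.east2014 / view=work.east2014;
      set warehouse.transactions;
      where region = 'EAST' and year(txn_date) = 2014;
    run;

    /* the subsetting code runs only when the view is read */
    proc means data=work.east2014 sum;
      var amount;
    run;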
Read the paper (PDF).
Elizabeth Axelrod, Abt Associates Inc.
Paper 3560-2015:
A SAS Macro to Calculate the PDC Adjustment of Inpatient Stays
The Centers for Medicare & Medicaid Services (CMS) uses the Proportion of Days Covered (PDC) to measure medication adherence, and there is also some PDC-related research based on Medicare Part D Event (PDE) data. However, "under Medicare rules, beneficiaries who receive care at an inpatient (IP) facility may receive Medicare-covered medications directly from the IP, rather than by filling prescriptions through their Part D contracts; thus, their medication fills during an IP stay would not be included in the PDE claims used to calculate the Patient Safety adherence measures" (Medicare 2014 Part C&D Star Rating technical notes). The previous PDC calculation method therefore underestimated the true PDC value. Starting with the 2013 Star Rating, the PDC calculation was adjusted for IP stays. That is, when a patient has an inpatient admission during the measurement period, the inpatient stay is censored for the PDC calculation. If the patient also has measured drug coverage during the inpatient stay, the drug supplied during the stay is shifted to after the stay; this shifting can in turn cause a chain of further shifts. This paper presents a SAS® macro that uses the SAS hash object to match inpatient stays, censor them, shift the drug start and end dates, and calculate the adjusted PDC.
Read the paper (PDF). | Download the data file (ZIP).
Anping Chang, IHRC Inc.
Paper 2141-2015:
A SAS® Macro to Compare Predictive Values of Diagnostic Tests
Medical tests are used for various purposes, including diagnosis, prognosis, risk assessment, and screening. Statistical methodology is often used to evaluate such tests; the most frequently used measures for binary data are sensitivity, specificity, and positive and negative predictive values. An important goal in diagnostic medicine research is to estimate and compare the accuracies of such tests. In this paper I give a gentle introduction to measures of diagnostic test accuracy and introduce a SAS® macro to calculate the generalized score statistic and the weighted generalized score statistic for comparison of predictive values, using the formulas generalized and proposed by Andrzej S. Kosinski.
Read the paper (PDF).
Lovedeep Gondara, University of Illinois Springfield
Paper 2980-2015:
A Set of SAS® Macros for Generating Survival Analysis Reports for Lifetime Data with or without Competing Risks
This paper introduces a set of SAS® macros, %LIFETEST and %LIFETESTEXPORT, that generate survival analysis reports for data with or without competing risks. The macros provide a wrapper for PROC LIFETEST and an enhanced version of the SAS autocall macro %CIF to give users an easy-to-use interface for reporting both survival estimates and cumulative incidence estimates in a unified way. The macros also provide a number of parameters that enable users to flexibly adjust how the final reports look, without the need to manually input or format them.
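The macros wrap calls along these lines (a minimal PROC LIFETEST sketch with hypothetical data set and variable names, not the macros' own syntax):

    proc lifetest data=transplant plots=survival(atrisk);
      time months * status(0);          /* 0 = censored */
      strata donor_type / test=logrank;
    run;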
Read the paper (PDF). | Download the data file (ZIP).
Zhen-Huan Hu, Medical College of Wisconsin
Paper 3194-2015:
A Tool That Uses the SAS® PRX Functions to Fix Delimited Text Files
Delimited text files are often plagued by appended and/or truncated records. Writing customized SAS® code to import such a text file and break it out into fields can be challenging. If only there were a way to fix the file before importing it. Enter the file_fixing_tool, a SAS® Enterprise Guide® project that uses the SAS PRX functions to import, fix, and export a delimited text file. This fixed file can then be easily imported and broken out into fields.
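A minimal sketch of the kind of PRX-based repair involved, assuming a pipe-delimited file with ten fields per record (the file names, delimiter, and field count are hypothetical):

    data fixed;
      infile 'claims.txt' truncover;
      input line $char200.;
      /* normalize stray whitespace around delimiters */
      line = prxchange('s/\s*\|\s*/|/', -1, line);
      /* flag records that do not have exactly 9 delimiters */
      bad = (countc(line, '|') ne 9);
      file 'claims_fixed.txt';
      if not bad then put line;
    run;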
Read the paper (PDF). | Download the data file (ZIP).
Paul Genovesi, Henry Jackson Foundation for the Advancement of Military Medicine, Inc.
Paper 3197-2015:
All Payer Claims Databases (APCDs) in Data Transparency and Quality Improvement
Since Maine established the first All Payer Claims Database (APCD) in 2003, 10 additional states have established APCDs and 30 others are in development or show strong interest in establishing APCDs. APCDs are generally mandated by legislation, though voluntary efforts exist. They are administered through various agencies, including state health departments or other governmental agencies and private not-for-profit organizations. APCDs receive funding from various sources, including legislative appropriations and private foundations. To ensure sustainability, APCDs must also consider the sale of data access and reports as a source of revenue. With the advent of the Affordable Care Act, there has been an increased interest in APCDs as a data source to aid in health care reform. The call for greater transparency in health care pricing and quality, development of Patient-Centered Medical Homes (PCMHs) and Accountable Care Organizations (ACOs), expansion of state Medicaid programs, and establishment of health insurance and health information exchanges have increased the demand for the type of administrative claims data contained in an APCD. Data collection, management, analysis, and reporting issues are examined with examples from implementations of live APCDs. The development of data intake, processing, warehousing, and reporting standards is discussed in light of achieving the triple aim of improving the individual experience of care; improving the health of populations; and reducing the per capita costs of care. APCDs are compared and contrasted with other sources of state-level health care data, including hospital discharge databases, state departments of insurance records, and institutional and consumer surveys. The benefits and limitations of administrative claims data are reviewed. Specific issues addressed with examples include implementing transparent reporting of service prices and provider quality, maintaining master patient and provider identifiers, validating APCD data and comparing them with other state health care data available to researchers and consumers, defining data suppression rules to ensure patient confidentiality and HIPAA-compliant data release and reporting, and serving multiple end users, including policy makers, researchers, and consumers with appropriately consumable information.
Read the paper (PDF). | Watch the recording.
Paul LaBrec, 3M Health Information Systems
Paper 3140-2015:
An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG
At Scotia-Colpatria Bank, the retail segment is very important. The volume of lending applications makes it necessary to use statistical models and analytic tools for an initial selection of good customers, whom our credit analysts then study in depth before finally approving or denying a credit application. Constructing target vintages using the Cox model generates past-due alerts in a shorter time, so mitigation measures can be applied one or two months earlier than currently; this can reduce losses by 100 bps in the new vintages. This paper estimates a Cox proportional hazards model, compares the results with a logit model for a specific product of the bank, and estimates the objective vintage for the product.
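A minimal sketch of the kind of PROC PHREG call involved (the data set and variable names are hypothetical):

    proc phreg data=vintage;
      class product / param=ref;
      model months_on_book*past_due(0) = credit_score ltv product;
    run;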
Read the paper (PDF). | Download the data file (ZIP).
Ivan Atehortua Rojas, Scotia - Colpatria Bank
Paper 3439-2015:
An Innovative Method of Customer Clustering
This session will describe an innovative way to identify groupings of customer offerings using SAS® software. The authors investigated the customer enrollments in nine different programs offered by a large energy utility. These programs included levelized billing plans, electronic payment options, renewable energy, energy efficiency programs, a home protection plan, and a home energy report for managing usage. Of the 640,788 residential customers, 374,441 had been solicited for a program and had adequate data for analysis. Nearly half of these eligible customers (49.8%) enrolled in some type of program. To examine the commonality among programs based on characteristics of customers who enroll, cluster analysis procedures and correlation matrices are often used. However, the value of these procedures was greatly limited by the binary nature of enrollments (enroll or no enroll), as well as the fact that some programs are mutually exclusive (limiting cross-enrollments for correlation measures). To overcome these limitations, PROC LOGISTIC was used to generate predicted scores for each customer for a given program. Then, using the same predictor variables, PROC LOGISTIC was used on each program to generate predictive scores for all customers. This provided a broad range of scores for each program, under the assumption that customers who are likely to join similar programs would have similar predicted scores for these programs. PROC FASTCLUS was used to build k-means cluster models based on these predicted logistic scores. Two distinct clusters were identified from the nine programs. These clusters not only aligned with the hypothesized model, but were generally supported by correlations (using PROC CORR) among program predicted scores as well as program enrollments.
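A condensed sketch of the two-stage approach (the data set and variable names are hypothetical):

    /* 1: score every customer for one program; repeat per program */
    proc logistic data=customers;
      model enroll_p1(event='1') = income home_age usage;
      output out=scored p=p_prog1;
    run;

    /* 2: cluster customers on the predicted scores for all programs */
    proc fastclus data=all_scores maxclusters=2 out=clusters;
      var p_prog1-p_prog9;
    run;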
Read the paper (PDF).
Brian Borchers, PhD, Direct Options
Ashlie Ossege, Direct Options
Paper SAS1836-2015:
An Insider's Guide to ODS LAYOUT Using SAS® 9.4
Whenever you travel, whether it's to a new destination or to your favorite vacation spot, it's nice to have a guide to assist you with planning and setting expectations. The ODS LAYOUT statement became production in SAS® 9.4. For those intrepid programmers who used ODS LAYOUT in an earlier release of SAS®, this paper contains tons of information about changes you need to know about. Programmers new to SAS 9.4 (or new to ODS LAYOUT) need to understand the basics. This paper reviews some common issues customers have reported to SAS Technical Support when migrating to the LAYOUT destination in SAS 9.4 and explores the basics for those who are making their first foray into the adventure that is ODS LAYOUT. This paper discusses some tips and tricks to ensure that your trip through the ODS LAYOUT statement will be a fun and rewarding one.
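A basic gridded layout, of the kind the paper's tips and tricks build on:

    ods pdf file='layout_demo.pdf';
    ods layout gridded columns=2;
    ods region;
    proc print data=sashelp.class(obs=5);
    run;
    ods region;
    proc sgplot data=sashelp.class;
      scatter x=height y=weight;
    run;
    ods layout end;
    ods pdf close;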
Read the paper (PDF).
Scott Huntley, SAS
Paper 2123-2015:
Anatomy of a Merge Gone Wrong
The merge is one of the SAS® programmer's most commonly used tools. However, it can be fraught with pitfalls to the unwary user. In this paper, we look under the hood of the DATA step and examine how the program data vector works. We see what's really happening when data sets are merged and how to avoid subtle problems.
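One classic example of a merge gone wrong, when a non-BY variable exists in both data sets:

    data visits;                 /* two visits for subject 1       */
      input id weight;
      datalines;
    1 150
    1 152
    ;
    data demog;                  /* one baseline row for subject 1 */
      input id weight;
      datalines;
    1 148
    ;
    data merged;                 /* WEIGHT from DEMOG overwrites   */
      merge visits demog;        /* only the FIRST record of the   */
      by id;                     /* BY group (148); the second     */
    run;                         /* re-reads 152 from VISITS       */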
Read the paper (PDF).
Joshua Horstman, Nested Loop Consulting
Paper 2220-2015:
Are You a Control Freak? Control Your SAS® Programs - Don't Let Them Control You!
You know that you want to control the process flow of your program. When your program is executed multiple times, with slight variations, you will need to control the changes from iteration to iteration, the timing of the execution, and the maintenance of output and logs. Unfortunately, in order to achieve the control that you know that you need to have, you will need to make frequent, possibly time-consuming and potentially error-prone, manual corrections and edits to your program. Fortunately, the control you seek is available and it does not require the use of time-intensive manual techniques. List processing techniques are available that give you control and peace of mind and enable you to be a successful control freak. These techniques are not new, but there is often hesitancy on the part of some programmers to take full advantage of them. This paper reviews these techniques and demonstrates them through a series of examples.
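A small example of the list-processing style in question, driving repeated execution from data rather than from hand-edited code (the data set and variable names are hypothetical):

    proc sql noprint;
      select distinct region into :regions separated by ' '
      from sales;
    quit;

    %macro by_region;
      %local i region;
      %do i = 1 %to %sysfunc(countw(&regions));
        %let region = %scan(&regions, &i);
        title "Totals for &region";
        proc means data=sales(where=(region="&region")) sum;
          var amount;
        run;
      %end;
    %mend by_region;
    %by_region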
Read the paper (PDF).
Mary Rosenbloom, Edwards Lifesciences, LLC
Art Carpenter, California Occidental Consultants
Paper SAS1803-2015:
Ask Vince: Moving SAS® Data and Analytical Results to Microsoft Excel
This presentation is an open-ended discussion about techniques for transferring data and analytical results from SAS® to Microsoft Excel. There are some introductory comments, but this presentation does not have any set content. Instead, the topics discussed are dictated by attendee questions. Come prepared to ask and get answers to your questions. To submit your questions or suggestions for discussion in advance, go to http://support.sas.com/surveys/askvince.html.
Vince DelGobbo, SAS
Paper 3401-2015:
Assessing the Impact of Communication Channel on Behavior Changes in Energy Efficiency
With governments and commissions increasingly incentivizing electric utilities to get consumers to save energy, there has been a large increase in the number of energy-saving programs. Some are structural, incentivizing consumers to make improvements to their home that result in energy savings. Some, called behavioral programs, are designed to get consumers to change their behavior to save energy. Within behavioral programs, Home Energy Reports are a good method to achieve behavioral savings as well as to educate consumers on structural energy savings. This paper examines the different Home Energy Report communication channels (direct mail and e-mail) and the marketing channel effect on energy savings, using SAS® for linear models. For consumer behavioral change, we often hear the questions: 1) Are the people who responded via direct mail solicitation saving at a higher rate than people who responded via an e-mail solicitation? 1a) Hypothesis: Because e-mail is easy to respond to, the type of customers who enroll through this channel will exert less effort for the behavior changes that require more time and investment toward energy efficiency changes and thus will save less. 2) Does the mode of that ongoing dialog (mail versus e-mail) impact the amount of consumer savings? 2a) Hypothesis: E-mail is more likely to be ignored and thus these recipients will save less. As savings is most often calculated by comparing the treatment group to a control group (to account for weather and economic impact over time), and by definition you cannot have a dialog with a control group, the answers are not a simple PROC FREQ away. Also, people who responded to mail look very different demographically than people who responded to e-mail. So, is the driver of savings differences the channel, or is it the demographics of the customers that happen to use those chosen channels? This study used clustering (PROC FASTCLUS) to segment the consumers by mail versus e-mail and append cluster assignments to the respective control group. This study also used DID (Difference-in-Differences) as well as Billing Analysis (PROC GLM) to calculate the savings of these groups.
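For reference, a minimal difference-in-differences sketch in PROC GLM (the variable names are hypothetical; the treatment effect is the interaction term):

    proc glm data=usage;
      class treated post;
      model kwh = treated post treated*post;
    run;
    quit;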
Read the paper (PDF).
Angela Wells, Direct Options
Ashlie Ossege, Direct Options
Paper 3327-2015:
Automated Macros to Extract Data from the National (Nationwide) Inpatient Sample (NIS)
The use of administrative databases for understanding practice patterns in the real world has become increasingly apparent, and it is essential in the current health-care environment. The Affordable Care Act has helped us to better understand the current use of technology and different approaches to surgery. This paper describes a method for extracting specific information about surgical procedures from the Healthcare Cost and Utilization Project (HCUP) database, also referred to as the National (Nationwide) Inpatient Sample (NIS). The analyses provide a framework for comparing the different modalities of surgical procedures of interest. Using an NIS database for a single year, we identify cohorts based on surgical approach by identifying the ICD-9 codes specific to robotic surgery, laparoscopic surgery, and open surgery. After we identify the appropriate codes using an ARRAY statement, a similar array is created based on the ICD-9 codes. Any minimally invasive procedure (robotic or laparoscopic) that results in a conversion is flagged as a conversion. Comorbidities are identified by ICD-9 codes representing the severity of each subject and are merged with the NIS inpatient core file. Using a FORMAT statement for all diagnosis variables, we create macros that can be regenerated for each type of complication. These macros are compiled in SAS® and stored in a library of four macros that are called to build the tables; invoked with different macro variables, they produce the frequencies for all cohorts and create the table structure with each table's title and number. This paper describes a systematic method in SAS/STAT® 9.2 to extract the data from the NIS using the ARRAY statement for the specific ICD-9 codes, to format the extracted data for analysis, to merge the different NIS databases by procedure, and to use automated macros to generate the report.
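A simplified sketch of the ARRAY-based cohort flagging (the data set names and procedure codes shown are illustrative, not the study's actual code lists):

    data cohorts;
      set nis.core;                    /* hypothetical NIS core file */
      array prc {15} $ pr1-pr15;       /* ICD-9 procedure codes      */
      robotic = 0;
      lap     = 0;
      do i = 1 to 15;
        if prc{i} =: '174' then robotic = 1;  /* e.g., 17.4x codes   */
        if prc{i} = '5421' then lap     = 1;  /* illustrative only   */
      end;
      drop i;
    run;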
Read the paper (PDF).
Ravi Tejeshwar Reddy Gaddameedi, California State University, East Bay
Usha Kreaden, Intuitive Surgical
Paper SAS1887-2015:
Automating a SAS® 9.4 Installation without Using Provisioning Software: A Case Study Involving the Setup of Machines for SAS Regional Users Groups
Whether you manage computer systems in a small-to-medium environment (for example, in labs, workshops, or corporate training groups) or in a large-scale deployment, the ability to automate SAS® 9.4 installations is important to the efficiency and success of your software deployments. For large-scale deployments, you can automate the installation process by using third-party provisioning software such as Microsoft System Center Configuration Manager (SCCM) or Symantec Altiris. But what if you have a small-to-medium environment and you do not have provisioning software to package deployment jobs? No worries! There is a solution. This paper presents a case study of just such a situation where a process was developed for SAS regional users groups (RUGs). Along with the case study, the paper offers a process for automating SAS 9.4 installations in workshop, lab, and corporate training (small-to-medium sized) environments. This process incorporates the new -srwonly option with the SAS® Deployment Wizard, deployment-wizard commands that use response files, and batch-file implementation. This combination results in easy automation of an installation, even without provisioning software.
Read the paper (PDF).
Max Blake, SAS
Paper 3050-2015:
Automation of Statistics Summary for Census Data in SAS®
Census data, such as education and income, has been extensively used for various purposes. The data is usually collected in percentages of census unit levels, based on the population sample. Such presentation of the data makes it hard to interpret and compare. A more convenient way of presenting the data is to use the geocoded percentage to produce counts for a pseudo-population. We developed a very flexible SAS® macro to automatically generate the descriptive summary tables for the census data as well as to conduct statistical tests to compare the different levels of the variable by groups. The SAS macro is not only useful for census data but can be used to generate summary tables for any data with percentages in multiple categories.
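The geocoded conversion amounts to a one-line calculation per category (the variable names are hypothetical):

    data tract_counts;
      set census_pct;
      /* turn a tract-level percentage into a pseudo-population count */
      n_college = round(pct_college/100 * population);
      n_poverty = round(pct_poverty/100 * population);
    run;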
Read the paper (PDF).
Janet Lee, Kaiser Permanente Southern California
B
Paper 3466-2015:
Bare-bones SAS® Hash Objects
We have to pull data from several data files in creating our working databases. The simplest use of SAS® hash objects greatly reduces the time required to draw data from many sources when compared to the use of multiple proc sorts and merges.
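The simplest use looks like this, replacing a sort/sort/merge sequence with one in-memory lookup (the data set and variable names are hypothetical):

    data matched;
      if _n_ = 1 then do;
        if 0 then set lookup;            /* define host variables   */
        declare hash h(dataset:'lookup');
        h.definekey('patient_id');
        h.definedata('dob','sex');
        h.definedone();
      end;
      set encounters;
      if h.find() = 0;                   /* keep only matched rows  */
    run;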
Read the paper (PDF). | Download the data file (ZIP).
Andrew Dagis, City of Hope
Paper 3120-2015:
"BatchStats": SAS® Batch Statistics, A Click Away!
Over the years, the SAS® Business Intelligence platform has proved its importance in this big data world with its suite of applications that enable us to efficiently process, analyze, and transform huge amounts of business data. Within the data warehouse universe, 'batch execution' sits in the heart of SAS Data Integration technologies. On a day-to-day basis, batches run, and the current status of the batch is generally sent out to the team or to the client as a 'static' e-mail or as a report. From experience, we know that they don't provide much insight into the real 'bits and bytes' of a batch run. Imagine if the status of the running batch is automatically captured in one central repository and is presented on a beautiful web browser on your computer or on your iPad. All this can be achieved without asking anybody to send reports and with all 'post-batch' queries being answered automatically with a click. This paper aims to answer the same with a framework that is designed specifically to automate the reporting aspects of SAS batches and, yes, it is all about collecting statistics of the batch, and we call it - 'BatchStats.'
Prajwal Shetty, Tesco HSC
Paper SAS1501-2015:
Best Practices for Configuring Your I/O Subsystem for SAS®9 Applications
The power of SAS®9 applications allows information and knowledge creation from very large amounts of data. Analysis that used to consist of 10s-100s of gigabytes (GBs) of supporting data has rapidly grown into the 10s to 100s of terabytes (TBs). This data expansion has resulted in more and larger SAS data stores. Setting up file systems to support these large volumes of data with adequate performance, as well as ensuring adequate storage space for the SAS® temporary files, can be very challenging. Technology advancements in storage and system virtualization, flash storage, and hybrid storage management require continual updating of best practices to configure I/O subsystems. This paper presents updated best practices for configuring the I/O subsystem for your SAS®9 applications, ensuring adequate capacity, bandwidth, and performance for your SAS®9 workloads. We have found that very few storage systems work ideally with SAS with their out-of-the-box settings, so it is important to convey these general guidelines.
Read the paper (PDF).
Tony Brown, SAS
Margaret Crevar, SAS
Paper SAS1801-2015:
Best Practices for Upgrading from SAS® 9.1.3 to SAS® 9.4
We regularly speak with organizations running established SAS® 9.1.3 systems that have not yet upgraded to a later version of SAS®. Often this is because their current SAS 9.1.3 environment is working fine, and no compelling event to upgrade has materialized. Now that SAS 9.1.3 has moved to a lower level of support and some very exciting technologies (Hadoop, cloud, ever-better scalability) are more accessible than ever using SAS® 9.4, the case for migrating from SAS 9.1.3 is strong. Upgrading a large SAS ecosystem with multiple environments, an active development stream, and a busy production environment can seem daunting. This paper aims to demystify the process, suggesting outline migration approaches for a variety of the most common scenarios in SAS 9.1.3 to SAS 9.4 upgrades, and a scalable template project plan that has been proven at a range of organizations.
Read the paper (PDF).
David Stern, SAS
Paper 3162-2015:
Best Practices: Subset without Getting Upset
You've worked for weeks or even months to produce an analysis suite for a project. Then, at the last moment, someone wants a subgroup analysis, and they inform you that they need it yesterday. This should be easy to do, right? So often, the programs that we write fall apart when we use them on subsets of the original data. This paper takes a look at some of the best practice techniques that can be built into a program at the beginning, so that users can subset on the fly without losing categories or creating errors in statistical tests. We review techniques for creating tables and corresponding titles with BY-group processing so that minimal code needs to be modified when more groups are created. And we provide a link to sample code and sample data that can be used to get started with this process.
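One such technique: let BY-group processing build subgroup output and titles for you, so new subgroups need no code changes (the data set and variable names are hypothetical):

    options nobyline;                 /* suppress the default BY line */
    title 'Adverse events for treatment group #byval(trtgrp)';
    proc freq data=adverse;
      by trtgrp;
      tables ae_term;
    run;
    options byline;
    title;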
Read the paper (PDF).
Mary Rosenbloom, Edwards Lifesciences, LLC
Kirk Paul Lafler, Software Intelligence Corporation
Paper 3458-2015:
Better Metadata Through SAS®: %SYSFUNC, PROC DATASETS, and Dictionary Tables
SAS® provides a wealth of resources for creating useful, attractive metadata tables, including PROC CONTENTS listing output (to ODS destinations), the PROC CONTENTS OUT= SAS data set, and PROC CONTENTS ODS Output Objects. This paper and presentation explores some less well-known resources to create metadata such as %SYSFUNC, PROC DATASETS, and Dictionary Tables. All these options in conjunction with the use of the ExcelXP tagset (and, new in the second maintenance release for SAS® 9.4, the Excel tagset) enable the creation of multi-tab metadata workbooks at the click of a mouse.
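A small taste of the approach, querying DICTIONARY.COLUMNS and routing the result through the ExcelXP tagset:

    ods tagsets.excelxp file='metadata.xml'
        options(sheet_name='Columns');
    proc sql;
      select memname, name, type, length, format, label
        from dictionary.columns
        where libname = 'SASHELP' and memname = 'CLASS';
    quit;
    ods tagsets.excelxp close;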
Read the paper (PDF).
Louise Hadden, Abt Associates Inc.
Paper 3410-2015:
Building Credit Modeling Dashboards
Dashboards are an effective tool for analyzing and summarizing the large volumes of data required to manage loan portfolios. Effective dashboards must highlight the most critical drivers of risk and performance within the portfolios and must be easy to use and implement. Developing dashboards often requires integrating data, analysis, or tools from different software platforms into a single, easy-to-use environment. FI Consulting has developed a Credit Modeling Dashboard in Microsoft Access that integrates complex SAS-based models into an easy-to-use, point-and-click interface. The dashboard integrates, prepares, and executes the back-end SAS models using command-line programming in Microsoft Access with Visual Basic for Applications (VBA). The Credit Modeling Dashboard developed by FI Consulting represents a simple and effective way to supply critical business intelligence in an integrated, easy-to-use platform without requiring investment in new software or rebuilding existing SAS tools already in use.
Read the paper (PDF).
Jeremy D'Antoni, FI Consulting
Paper 2988-2015:
Building a Template from the Ground Up with Graph Template Language
This presentation focuses on building a graph template in an easy-to-follow, step-by-step manner. The presentation begins with using Graph Template Language to re-create a simple series plot, and then moves on to include a secondary y-axis as well as multiple overlaid block plots to tell a more complex and complete story than would be possible using only the SGPLOT procedure.
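The end point of such a step-by-step build resembles the following skeleton (the data set and variable names are hypothetical):

    proc template;
      define statgraph twoaxis;
        begingraph;
          entrytitle 'Units and Revenue by Month';
          layout overlay / yaxisopts=(label='Units')
                           y2axisopts=(label='Revenue');
            seriesplot x=month y=units;
            seriesplot x=month y=revenue / yaxis=y2
                       lineattrs=(pattern=dash);
          endlayout;
        endgraph;
      end;
    run;

    proc sgrender data=monthly template=twoaxis;
    run;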
Read the paper (PDF).
Jed Teres, Verizon Wireless
Paper SAS1824-2015:
Bust Open That ETL Black Box and Apply Proven Techniques to Successfully Modernize Data Integration
So you are still writing SAS® DATA steps and SAS macros and running them through a command-line scheduler. When work comes in, there is only one person who knows that code, and they are out--what to do? This paper shows how SAS applies extract, transform, load (ETL) modernization techniques with SAS® Data Integration Studio to gain resource efficiencies and to break down the ETL black box. We are going to share the fundamentals (metadata foldering and naming standards) that ensure success, along with steps to ease into the pool while iteratively gaining benefits. Benefits include self-documenting code visualization, impact analysis on jobs and tables impacted by change, and being supportable by interchangeable bench resources. We conclude with demonstrating how SAS® Visual Analytics is being used to monitor service-level agreements and provide actionable insights into job-flow performance and scheduling.
Read the paper (PDF).
Brandon Kirk, SAS
C
Paper 3224-2015:
CHARACTER DATA: Acquisition, Manipulation, and Analysis
The DATA step enables you to read, write, and manipulate many types of data. As data evolves to a more free-form state, the ability of SAS® to handle character data becomes increasingly important. This presentation, expanded and enhanced from an earlier version, addresses character data from multiple vantage points. For example, what is the default length of a character string, and why does it appear to change under different circumstances? Special emphasis is given to the myriad functions that can facilitate the processing and manipulation of character data. This paper is targeted at a beginning to intermediate audience.
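For instance, the default-length question comes down to when the compiler first sees the variable:

    data demo;
      set sashelp.class;
      /* STATUS is compiled with length 5 (from 'Minor'),   */
      /* so 'Teenager' is silently truncated to 'Teena'     */
      if age < 13 then status = 'Minor';
      else             status = 'Teenager';
    run;

    data demo_fixed;
      length status $ 8;     /* declare before first use    */
      set sashelp.class;
      if age < 13 then status = 'Minor';
      else             status = 'Teenager';
    run;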
Read the paper (PDF).
Andrew Kuligowski, HSN
Swati Agarwal, OptumInsight
Paper 2080-2015:
Calculate Decision Consistency Statistics for a Single Administration of a Test
Many certification programs classify candidates into performance levels. For example, the SAS® Certified Base Programmer breaks down candidates into two performance levels: Pass and Fail. It is important to note that because all test scores contain measurement error, the performance level categorizations based on those test scores are also subject to measurement error. An important part of psychometric analysis is to estimate the decision consistency of the classifications. This study helps fill a gap in estimating decision consistency statistics for a single administration of a test using SAS.
Read the paper (PDF).
Fan Yang, The University of Iowa
Yi Song, University of Illinois at Chicago
Paper 3210-2015:
"Can You Read This into SAS® for Me?" Using INFILE and INPUT to Load Data into SAS
With all the talk of 'big data' and 'visual analytics' we sometimes forget how important it is, and often how hard it is, to get external data into SAS®. In this paper, we review some common data sources such as delimited sources (for example, CSV), as well as structured flat files, and the programming steps needed to successfully load these files into SAS. In addition to examining the INFILE and INPUT statements, we look at some methods for dealing with bad data. This paper assumes only basic SAS skills, although the topic can be of interest to anyone who needs to read external files.
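A typical CSV-reading pattern of the kind the paper walks through (the file name and layout are hypothetical):

    data survey;
      infile 'survey.csv' dlm=',' dsd firstobs=2 truncover;
      input id
            name  :$30.
            visit :mmddyy10.
            score;
      format visit date9.;
    run;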
Read the paper (PDF).
Peter Eberhardt, Fernwood Consulting Group Inc.
Audrey Yeo, Athlene
Paper 3367-2015:
Colon(:)izing My Programs
There is a plethora of uses of the colon (:) in SAS® programming. The colon is used as a data or variable name wild-card, a macro variable creator, an operator modifier, and so forth. The colon helps you write clear, concise, and compact code. The main objective of this paper is to encourage the effective use of the colon in writing crisp code. This paper presents real-time applications of the colon in day-to-day programming. In addition, this paper presents cases where the colon limits programmers' wishes.
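A few of the colon's guises in one place (the data set and variable names are hypothetical):

    data subset;
      set master(keep=id visit:);   /* name wild card: visit1, visitdt, ... */
      if id =: 'A';                 /* operator modifier: truncated compare */
    run;

    proc sql noprint;
      select max(score) into :topscore   /* macro variable creation */
      from subset;
    quit;

    data readin;
      infile 'scores.txt' truncover;
      input name :$20. score;       /* informat modifier */
    run;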
Read the paper (PDF).
Jinson Erinjeri, EMMES Corporation
Paper 1424-2015:
Competing Risk Survival Analysis Using SAS®: When, Why, and How
Competing risks arise in time-to-event data when the event of interest cannot be observed because a competing event occurs first. For example, if the event of interest is a specific cause of death, death from any other cause is a competing event; if the focus is relapse, death before relapse constitutes a competing event. It is well established that in the presence of competing risks, the standard product-limit methods yield biased results because their basic assumption is violated. The effect of competing events on parameter estimation depends on their distribution and frequency. Fine and Gray's subdistribution hazard model can be used in the presence of competing events and is available in PROC PHREG with the release of SAS® 9.4.
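In SAS® 9.4 the Fine and Gray model is requested with the EVENTCODE= option (the data set and variable names are hypothetical):

    proc phreg data=bmt;
      class disease / param=ref;
      model t*status(0) = disease / eventcode=1;
      /* status: 0 = censored, 1 = event of interest, 2 = competing event */
    run;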
Read the paper (PDF).
Lovedeep Gondara, University of Illinois Springfield
Paper 1442-2015:
Confirmatory Factor Analysis Using PROC CALIS: A Practical Guide for Survey Researchers
Survey research can provide a straightforward and effective means of collecting input on a range of topics. Survey researchers often like to group similar survey items into construct domains in order to make generalizations about a particular area of interest. Confirmatory Factor Analysis is used to test whether this pre-existing theoretical model underlies a particular set of responses to survey questions. Based on Structural Equation Modeling (SEM), Confirmatory Factor Analysis provides the survey researcher with a means to evaluate how well the actual survey response data fits within the a priori model specified by subject matter experts. PROC CALIS now provides survey researchers the ability to perform Confirmatory Factor Analysis using SAS®. This paper provides a survey researcher with the steps needed to complete Confirmatory Factor Analysis using SAS. We discuss and demonstrate the options available to survey researchers in the handling of missing and not applicable survey responses using an ARRAY statement within a DATA step and imputation of item non-response. A simple demonstration of PROC CALIS is then provided with interpretation of key portions of the SAS output. Using recommendations provided by SAS from the PROC CALIS output, the analysis is then modified to provide a better fit of survey items into survey domains.
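A bare-bones CFA specification in PROC CALIS using the FACTOR statement (the factor and item names are hypothetical):

    proc calis data=survey_responses;
      factor
        Engagement   ===> q1-q4,   /* items loading on factor 1 */
        Satisfaction ===> q5-q8;   /* items loading on factor 2 */
    run;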
Read the paper (PDF).
Lindsey Brown Philpot, Baylor Scott & White Health
Sunni Barnes, Baylor Scott & White Health
Crystal Carel, Baylor Scott & White Health Care System
Paper 2686-2015:
Converting Annotate to ODS Graphics. Is It Possible?
In previous papers I have described how many standard SAS/GRAPH® plots can be converted easily to ODS Graphics by using simple PROC SGPLOT or SGPANEL code. SAS/GRAPH Annotate code would appear, at first sight, to be much more difficult to convert to ODS Graphics, but by using its layering features, many Annotate plots can be replicated in a more flexible and repeatable way. This paper explains how to convert many of your Annotate plots, so they can be reproduced using Base SAS®.
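The layering idea in miniature: statements in SGPLOT stack much as Annotate layers once did (the data set and variable names are hypothetical):

    proc sgplot data=trial;
      series  x=week y=mean_resp / group=arm;
      refline 12 / axis=x label='End of titration';
      inset   'Interim data' / position=topright;
    run;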
Read the paper (PDF).
Philip Holland, Holland Numerics Limited
Paper SAS1700-2015:
Creating Multi-Sheet Microsoft Excel Workbooks with SAS®: The Basics and Beyond, Part 2
This presentation explains how to use Base SAS®9 software to create multi-sheet Excel workbooks. You learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output using the ExcelXP Output Delivery System (ODS) tagset. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on-demand and in real time using SAS server technology is discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.
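The core pattern is brief (the file and data set names are hypothetical); the presentation develops it much further:

    ods tagsets.excelxp file='sales.xml' style=printer
        options(sheet_interval='bygroup'
                suppress_bylines='yes'
                sheet_label='Region');
    proc print data=sales noobs;
      by region;
      var product units revenue;
    run;
    ods tagsets.excelxp close;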
Read the paper (PDF). | Download the data file (ZIP).
Vince DelGobbo, SAS
Paper 2242-2015:
Creative Uses of Vector Plots Using SAS®
Hysteresis loops occur because of a time lag between input and output. In the pharmaceutical industry, hysteresis plots are tools to visualize the time lag between drug concentration and drug effects. Before SAS® 9.2, SAS annotations were used to generate such plots. One of the criticisms was that SAS programmers had to write complex macros to automate the process; code management and validation tasks were not easy. With SAS 9.2, SAS programmers are able to generate such plots with ease. This paper demonstrates the generation of such plots with both Base SAS and SAS/GRAPH® software.
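A minimal hysteresis-loop sketch using the SGPLOT VECTOR statement, drawing an arrow from each concentration/effect point to the next (the data set and variable names are hypothetical):

    data loop;
      set pkpd;
      x0 = lag(conc);      /* previous point becomes the origin */
      y0 = lag(effect);
    run;

    proc sgplot data=loop;
      vector x=conc y=effect / xorigin=x0 yorigin=y0
                               arrowheadshape=open;
    run;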
Read the paper (PDF).
Deli Wang, REGENERON
Paper 3217-2015:
Credit Card Holders' Behavior Modeling: Transition Probability Prediction with Multinomial and Conditional Logistic Regression in SAS/STAT®
Because of the variety of card holders' behavior patterns and income sources, each consumer account can change to different states, such as non-active, transactor, revolver, delinquent, and defaulted, and each account requires an individual model for generated income prediction. Estimating transition probabilities between states at the account level helps to avoid the lack of memory in the Markov decision process (MDP) approach. The key question is which approach gives more accurate results: multinomial logistic regression or a multistage decision tree with binary logistic regressions. This paper investigates approaches to credit card profitability estimation at the account level based on multistate conditional probabilities, using the SAS/STAT® procedure PROC LOGISTIC. Both models show moderate, but not strong, predictive power. Prediction accuracy for the decision tree depends on the order of stages for the conditional binary logistic regressions. Current development is concentrated on discrete choice models, such as nested logit, with PROC MDC.
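The multinomial alternative is a one-option change in PROC LOGISTIC (the data set and variable names are hypothetical):

    proc logistic data=accounts;
      class current_state / param=ref;
      model next_state = current_state utilization payment_ratio
            / link=glogit;
    run;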
Read the paper (PDF).
Denys Osipenko, the University of Edinburgh
Jonathan Crook
Paper 3249-2015:
Cutpoint Determination Methods in Survival Analysis Using SAS®: Updated %FINDCUT Macro
Statistical analyses that use data from clinical or epidemiological studies include continuous variables such as patient age, blood pressure, and various biomarkers. Over the years, there has been an increase in studies that focus on assessing associations between biomarkers and the disease of interest. Many biomarkers are measured as continuous variables, and investigators seek to identify a possible cutpoint to classify patients as high risk versus low risk based on the value of the biomarker. Several data-oriented techniques, such as the median and upper quartile, and outcome-oriented techniques based on score, Wald, and likelihood ratio tests are commonly used in the literature. Contal and O'Quigley (1999) presented a technique that uses the log-rank test statistic to estimate the cutpoint. Their method was computationally intensive and hence was overlooked because of the unavailability of built-in options in standard statistical software. In 2003, we provided the %FINDCUT macro, which uses Contal and O'Quigley's approach to identify a cutpoint when the outcome of interest is measured as time to event. Over the past decade, demand for this macro has continued to grow, which has led us to update %FINDCUT to incorporate new SAS® tools and procedures such as array processing, Graph Template Language, and the REPORT procedure. New and updated features include results presented in a much cleaner report format, user-specified cutpoints, macro parameter error checking, temporary data set cleanup, preservation of current option settings, and increased processing speed. We present the utility and added options of the revised %FINDCUT macro using a real-life data set. In addition, we critically compare this method to existing methods and discuss the use and misuse of categorizing a continuous covariate.
Read the paper (PDF).
Jay Mandrekar, Mayo Clinic
Jeffrey Meyers, Mayo Clinic
D
Paper 3451-2015:
DATU - Data Automated Transfer Utility: A Program to Migrate Files from Mainframe to Windows
The Centers for Disease Control and Prevention (CDC) went through a large migration from a mainframe to a Windows platform. This e-poster will highlight the Data Automated Transfer Utility (DATU) that was developed to migrate historic files between the two file systems using SAS® macros and SAS/CONNECT®. We will demonstrate how this program identifies the type of file, transfers the file appropriately, verifies the successful transfer, and provides the details in a Microsoft Excel report. SAS/CONNECT code, special system options, and mainframe code will be shown. In 2009, the CDC made the decision to retire a mainframe that was used for years of primarily SAS work. The replacement platform is a SAS grid system, based on Windows, which is referred to as the Consolidated Statistical Platform (CSP). The change from mainframe to Windows required the migration of over a hundred thousand files totaling approximately 20 terabytes. To minimize countless man hours and human error, an automated solution was developed. DATU was developed for users to migrate their files from the mainframe to the new Windows CSP or other Windows destinations. Approximately 95% of the files on the CDC mainframe were one of three file types: SAS data sets, sequential text files, and partitioned data set (PDS) libraries. DATU dynamically determines the file type and uses the appropriate method to transfer the file to the assigned Windows destination. Variations of files are detected and handled appropriately; these variations include SAS data sets from multiple SAS versions and sequential files that contain binary values such as packed decimal fields. To mitigate the loss of numeric precision during the migration, SAS numeric variables are identified and promoted to account for architectural differences between the mainframe and Windows platforms. To aid users in verifying the accuracy of the file transfer, the program compares file information of the source and destination files. When a SAS file is downloaded, PROC CONTENTS is run on both files, and the PROC CONTENTS output is compared. For sequential text files, a checksum is generated for both files and the checksum files are compared. A PDS file transfer creates a list of the members in the PDS and in the destination Windows folder, and the file lists are compared. The development of this program and the file migration were a daunting task. This paper will share some of our lessons learned along the way and the method of our implementation.
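The downloads rest on standard SAS/CONNECT® machinery, presumably along these lines (the session and library names are hypothetical):

    options comamid=tcp;
    signon mainframe;       /* assumes a defined remote session */
    rsubmit;
      proc download data=prodlib.claims out=winlib.claims;
      run;
    endrsubmit;
    signoff;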
Read the paper (PDF).
Jim Brittain, National Center for Health Statistics (CDC)
Robert Schwartz, Centers for Disease Control and Prevention
Paper 2523-2015:
DS2 with Both Hands on the Wheel
The DATA step has served SAS® programmers well over the years, and although it is handy, the new, exciting, and powerful DS2 is a significant alternative because it introduces an object-oriented programming environment. It enables users to effectively manipulate complex data and efficiently manage programming through additional data types, programming structure elements, user-defined methods, and shareable packages, as well as threaded execution. This tutorial is based on our experience getting started with DS2 and learning to use it to access, manage, and share data in a scalable and standards-based way. It helps SAS users of all levels easily get started with DS2 and understand its basic functionality by practicing its features.
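A first DS2 step, for flavor (the data set and variable names are hypothetical):

    proc ds2;
      data work.rates(overwrite=yes);
        declare double rate;          /* explicit data type */
        method run();
          set work.loans;
          if balance > 0 then rate = interest / balance;
        end;
      enddata;
    run;
    quit;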
Read the paper (PDF).
Peter Eberhardt, Fernwood Consulting Group Inc.
Xue Yao, Winnipeg Regional Health Authority
Paper 3336-2015:
Dallas County Midterm Election Exit Polling Research
Texas is one of about 30 states that have recently passed laws requiring voters to produce valid IDs in an effort to prevent illegal voting. This new regulation, however, might negatively affect voting opportunities for students, low-income people, and minorities. To determine the actual effects of the regulation in Dallas County, voters were surveyed as they exited polling offices during the November midterm election about difficulties they might have encountered in the voting process. The database of the voting history of each registered voter in the county was examined, and the data set was converted into an analyzable structure prior to stratification. All of the polling offices were stratified by the residents' degree of involvement in the past three general elections, namely, the proportion of people who have used early voting and who have voted at least once. A two-phase sampling design was adopted for stratification. On election day, pollsters were sent to selected polling offices and interviewed 20 voters in a selected time period. The number of people having difficulties was estimated after the data were collected.
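Stratified selection of this kind can be sketched with PROC SURVEYSELECT (a minimal sketch under assumed stratum names and counts, not necessarily the study's exact design):

    /* input must be sorted by the strata variable */
    proc surveyselect data=polling_places out=phase1
         method=srs n=(5 5 5);      /* per-stratum sample sizes */
      strata involvement_group;
    run;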
Read the paper (PDF).
Yusun Xia, Southern Methodist University
Paper 2000-2015:
Data Aggregation Using the SAS® Hash Object
Soon after the advent of the SAS® hash object in SAS® 9.0, its early adopters realized that the potential functionality of the new structure is much broader than basic O(1)-time lookup and file matching. Specifically, they went on to invent methods of data aggregation based on the ability of the hash object to quickly store and update key summary information. They also demonstrated that the DATA step aggregation using the hash object offered significantly lower run time and memory utilization compared to the SUMMARY/MEANS or SQL procedures, coupled with the possibility of eliminating the need to write the aggregation results to interim data files and the programming flexibility that allowed them to combine sophisticated data manipulation and adjustments of the aggregates within a single step. Such developments within the SAS user community did not go unnoticed by SAS R&D, and for SAS® 9.2 the hash object had been enriched with tag parameters and methods specifically designed to handle aggregation without the need to write the summarized data to the PDV host variable and update the hash table with new key summaries, thus further improving run-time performance. As more SAS programmers applied these methods in their real-world practice, they developed aggregation techniques fit to various programmatic scenarios and ideas for handling the hash object memory limitations in situations calling for truly enormous hash tables. This paper presents a review of the DATA step aggregation methods and techniques using the hash object. The presentation is intended for all situations in which the final SAS code is either a straight Base SAS DATA step or a DATA step generated by any other SAS product.
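The aggregation pattern in its simplest form (the data set and variable names are hypothetical):

    data _null_;
      declare hash agg(ordered:'a');
      agg.definekey('state');
      agg.definedata('state','total');
      agg.definedone();
      do until (eof);
        set sales end=eof;
        if agg.find() ne 0 then total = 0;  /* new key: start at 0 */
        total + amount;                     /* accumulate          */
        agg.replace();                      /* store back          */
      end;
      agg.output(dataset:'state_totals');
      stop;
    run;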
Read the paper (PDF).
Paul Dorfman, Dorfman Consulting
Don Henderson, Henderson Consulting Services
Paper 3104-2015:
Data Management Techniques for Complex Healthcare Data
Data sharing through healthcare collaboratives and national registries creates opportunities for secondary data analysis projects. These initiatives provide data for quality comparisons as well as endless research opportunities to external researchers across the country. The possibilities are bountiful when you join data from diverse organizations and look for common themes related to illnesses and patient outcomes. With these great opportunities comes great pain for data analysts and health services researchers tasked with compiling these data sets according to specifications. Patient care data is complex, and, particularly at large healthcare systems, might be managed with multiple electronic health record (EHR) systems. Matching data from separate EHR systems while simultaneously ensuring the integrity of the details of that care visit is challenging. This paper demonstrates how data management personnel can use traditional SAS® procedures in new and inventive ways to compile, clean, and complete data sets for submission to healthcare collaboratives and other data-sharing initiatives. Traditional data-matching methods such as SPEDIS are uniquely combined with iterative SQL joins using the SAS functions INDEX, COMPRESS, CATX, and SUBSTR to get the most out of complex patient and physician name matches. Recoding, correcting missing items, and formatting data can be achieved efficiently by using traditional tools such as the MAX and FIND functions and PROC FORMAT in new and inventive ways.
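One of the fuzzy-matching building blocks, in miniature (the table and variable names, plus the distance threshold, are hypothetical):

    proc sql;
      create table name_match as
      select v.visit_id, p.provider_id
        from visits as v, providers as p
        where spedis(upcase(compress(v.doc_name)),
                     upcase(compress(p.full_name))) le 15;
    quit;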
Read the paper (PDF).
Gabriela Cantu, Baylor Scott & White Health
Christopher Klekar, Baylor Scott & White Health
Paper 3261-2015:
Data Quality Scorecard
Many users would like to check the quality of data after the data integration process has loaded the data into a data set or table. The approach in this paper shows users how to develop a process that scores columns based on rules judged against a set of standards set by the user. Each rule has a standard that determines whether it passes, fails, or needs review (a green, red, or yellow score). A rule can be as simple as: Is the value for this column missing, or is this column within a valid range? Further, it includes comparing a column to one or more other columns, or checking for specific invalid entries. It also includes rules that compare a column value to a lookup table to determine whether the value is in the lookup table. Users can create their own rules and each column can have any number of rules. For example, a rule can be created to measure a dollar column against a range of acceptable values. The user can determine that it is expected that up to two percent of the values are allowed to be out of range. If two to five percent of the values are out of range, then the data should be reviewed. And, if over five percent of the values are out of range, the data is not acceptable. The entire table has a color-coded scorecard showing each rule and its score. Summary reports show columns by score and distributions of key columns. The scorecard enables the user to quickly assess whether the SAS data set is acceptable, or whether specific columns need to be reviewed. Drill-down reports enable the user to drill into the data to examine why the column scored as it did. Based on the scores, the data set can be accepted or rejected, and the user will know where and why the data set failed. The process can store each scorecard's data in a data mart. This data mart enables the user to review the quality of their data over time. It can answer questions such as: Is the quality of the data improving overall? Are there specific columns that are improving or declining over time? What can we do to improve the quality of our data? This scorecard is not intended to replace the quality control of the data integration or ETL process. It is a supplement to the ETL process. The programs are written using only Base SAS® and Output Delivery System (ODS), macro variables, and formats. This presentation shows how to: (1) use ODS HTML; (2) color code cells with the use of formats; (3) use formats as lookup tables; (4) use INCLUDE statements to make use of template code snippets to simplify programming; and (5) use hyperlinks to launch stored processes from the scorecard.
Read the paper (PDF). | Download the data file (ZIP).
Tom Purvis, Qualex Consulting Services, Inc.
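A hedged illustration of the color-coding technique in that list (a sketch only; the data set RULE_RESULTS, its variables, and the score labels are hypothetical):

   proc format;
      value $score 'PASS'   = 'lightgreen'
                   'REVIEW' = 'yellow'
                   'FAIL'   = 'salmon';
   run;

   ods html file='scorecard.html';
   proc report data=rule_results;
      columns column_name rule_desc result;
      define result / display;
      compute result;
         call define(_col_, 'style',
            'style=[backgroundcolor=' || strip(put(result, $score.)) || ']');
      endcomp;
   run;
   ods html close;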
Paper 3321-2015:
Data Summarization for a Dissertation: A Grad Student How-To Paper
Graduate students often need to explore data and summarize multiple statistical models into tables for a dissertation. The challenges of data summarization include coding multiple similar statistical models and summarizing these models into meaningful tables for review. The default method is to type (or copy and paste) results into tables. This often takes longer than creating and running the analyses. Students might spend hours creating tables, only to have to start over when a change or correction in the underlying data requires the analyses to be updated. This paper gives graduate students the tools to efficiently summarize the results of statistical models in tables. These tools include macro-based SAS/STAT® analyses and the ODS OUTPUT statement to summarize statistics into meaningful tables. Specifically, we summarize PROC GLM and PROC LOGISTIC output. We convert an analysis of hospital-acquired delirium from hundreds of pages of output into three formatted Microsoft Excel files. This paper is appropriate for users familiar with basic macro language.
Read the paper (PDF).
Elisa Priest, Texas A&M University Health Science Center
Ashley Collinsworth, Baylor Scott & White Health/Tulane University
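A minimal sketch of the core trick (the model and data set names are hypothetical, and PROC EXPORT to XLSX assumes SAS/ACCESS® Interface to PC Files is available):

   ods output ParameterEstimates=pe;     /* capture the printed table as data */
   proc logistic data=analysis;
      class group / param=ref;
      model outcome(event='1') = group age;
   run;

   proc export data=pe outfile='results.xlsx'
      dbms=xlsx replace;
   run;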
Paper 3305-2015:
Defensive Coding by Example: Kick the Tires, Pump the Brakes, Check Your Blind Spots, and Merge Ahead!
As SAS® programmers and statisticians, we rarely write programs that are run only once and then set aside. Instead, we are often asked to develop programs very early in a project, on immature data, following specifications that may be little more than a guess as to what the data is supposed to look like. These programs will then be run repeatedly on periodically updated data through the duration of the project. This paper offers strategies for not only making those programs more flexible, so they can handle some of the more commonly encountered variations in that data, but also for setting traps to identify unexpected data points that require further investigation. We will also touch upon some good programming practices that can benefit both the original programmer and others who might have to touch the code. In this paper, we will provide explicit examples of defensive coding that will aid in kicking the tires, pumping the brakes, checking your blind spots, and merging ahead for quality programming from the beginning.
Read the paper (PDF).
Donna Levy, inVentiv Health Clinical
Nancy Brucken, inVentiv Health Clinical
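In that spirit, one common defensive pattern is to trap unexpected values instead of letting them slip through silently (a sketch; the data set RAW and the variable SEX are hypothetical):

   data clean suspect;
      set raw;
      if sex in ('F', 'M') then output clean;
      else do;
         putlog 'WARN' 'ING: unexpected SEX value ' sex= _n_=;
         output suspect;
      end;
   run;

Splitting the literal 'WARNING' keeps log-scanning tools from flagging the program source itself while still coloring the log at run time.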
Paper 2460-2015:
Demystifying Date and Time Intervals
Intervals have been a feature of Base SAS® for a long time, enabling SAS users to work with commonly (and not-so-commonly) defined periods of time such as years, months, and quarters. With the release of SAS®9, there are more options and capabilities for intervals and their functions. This paper first discusses the basics of intervals in detail, and then discusses several of the enhancements to the interval feature, such as the ability to select how the INTCK function defines interval boundaries and the ability to create your own custom intervals beyond multipliers and shift operators.
Read the paper (PDF).
Derek Morgan
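For example, the newer 'continuous' method changes how INTCK counts boundaries (a small sketch you can verify in the log):

   data _null_;
      d1 = '31JAN2015'd;
      d2 = '01FEB2015'd;
      discrete   = intck('month', d1, d2);               /* 1: a month boundary was crossed */
      continuous = intck('month', d1, d2, 'continuous'); /* 0: a full month has not elapsed */
      put discrete= continuous=;
   run;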
Paper 2660-2015:
Deriving Adverse Event Records Based on Study Treatments
This paper puts forward an approach for creating duplicate records from a single adverse event (AE) record based on study treatments. To fulfill this task, a flag was created to check whether an existing AE spans different treatment periods and therefore requires duplicate records. If so, we create the duplicate records and derive their AE dates based on treatment periods or discontinuation dates.
Read the paper (PDF).
Jonson Jiang, inVentiv Health
Paper 3201-2015:
Designing Big Data Analytics Undergraduate and Postgraduate Programs for Employability by Using National Skills Frameworks
There is a widely forecast skills gap developing between the numbers of Big Data Analytics (BDA) graduates and the predicted jobs market. Many universities are developing innovative programs to increase the numbers of BDA graduates and postgraduates. The University of Derby has recently developed two new programs that aim to be unique and offer the applicants highly attractive and career-enhancing programs of study. One program is an undergraduate Joint Honours program that pairs analytics with a range of alternative subject areas; the other is a Master's program that has specific emphasis on governance and ethics. A critical aspect of both programs is the synthesis of a Personal Development Planning Framework that enables the students to evaluate their current status, identifies the steps needed to develop toward their career goals, and that provides a means of recording their achievements with evidence that can then be used in job applications. In the UK, we have two sources of skills frameworks that can be synthesized to provide a self-assessment matrix for the students to use as their Personal Development Planning (PDP) toolkit. These are the Skills Framework for the Information Age (SFIA-Plus) framework developed by the SFIA Foundation, and the Student Employability Profiles developed by the Higher Education Academy. A new set of National Occupational Skills (NOS) frameworks (Data Science, Data Management, and Data Analysis) have recently been released by the organization e-Skills UK for consultation. SAS® UK has had significant input to this new set of NOSs. This paper demonstrates how curricula have been developed to meet the Big Data Analytics skills shortfall by using these frameworks and how these frameworks can be used to guide students in their reflective development of their career plans.
Read the paper (PDF).
Richard Self, University of Derby
Paper 1762-2015:
Did Your Join Work Properly?
We have many options for performing merges or joins these days, each with various advantages and disadvantages. Depending on how you perform your joins, different checks can help you verify whether the join was successful. In this presentation, we look at some sample data, use different methods, and see what kinds of tests can be done to ensure that the results are correct. If the join is performed with PROC SQL and two criteria are fulfilled (the number of observations in the primary data set has not changed [presuming a one-to-one or many-to-one situation], and a variable that should be populated is not missing), then the merge was successful.
Read the paper (PDF).
Emmy Pahmer, inVentiv Health
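A sketch of the two checks described above (table and variable names are hypothetical):

   proc sql noprint;
      select count(*) into :n_before from primary;

      create table joined as
      select a.*, b.region
      from primary as a
           left join lookup as b
           on a.id = b.id;

      select count(*), nmiss(region)
         into :n_after, :n_unmatched
      from joined;
   quit;

   %put NOTE: before=&n_before after=&n_after unmatched=&n_unmatched;

For a one-to-one or many-to-one join, N_BEFORE and N_AFTER should match, and N_UNMATCHED flags rows whose lookup variable came back missing.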
Paper 3457-2015:
Documentation as You Go: aka Dropping Breadcrumbs!
Your project ended a year ago, and now you need to explain what you did, or rerun some of your code. Can you remember the process? Can you describe what you did? Can you even find those programs? Visually presented here are examples of tools and techniques that follow best practices to help us, as programmers, manage the flow of information from source data to a final product.
Read the paper (PDF).
Elizabeth Axelrod, Abt Associates Inc.
Paper 3061-2015:
Does Class Rank Align with Future Potential?
Today, employers and universities alike must choose the most talented individuals from a large pool. However, it is difficult to determine whether a student's A+ in English means that he or she can write as proficiently as another student who writes as a hobby. As a result, there are now dozens of ways to place individuals on one spectrum or another. For example, the ACT and SAT enable universities to view a student's performance on a test given to all applicants in order to help determine whether that student will be successful. High schools use students' GPAs in order to compare them to one another for academic opportunities. The WorkKeys Exam enables employers to rate prospective employees on their abilities to perform day-to-day business activities. Rarely do standardized tests and in-class performance get compared to each other. We used SAS® to analyze the GPAs and WorkKeys Exam results of 285 seniors who attend the Phillip O Berry Academy. The purpose was to compare class standing to what a student can prove he or she knows in a standardized environment. Emphasis is on the use of PROC SQL in SAS® 9.3 rather than DATA step processing.
Read the paper (PDF). | Download the data file (ZIP).
Jonathan Gomez Martinez, Phillip O Berry Academy of Technology
Christopher Simpson, Phillip O Berry Academy of Technology
Paper SAS1573-2015:
Don't Be a Litterbug: Best Practices for Using Temporary Files in SAS®
This paper explains best practices for using temporary files in SAS® programs. These practices include using the TEMP access method, writing to the WORK directory, and ensuring that you leave no litter files behind. An additional special temporary file technique is described for mainframe users.
Read the paper (PDF).
Rick Langston, SAS
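The TEMP access method in miniature (a sketch; the file vanishes when the fileref is cleared or the session ends, so no litter is left behind):

   filename scratch temp;

   data _null_;
      file scratch;
      put 'intermediate text, cleaned up automatically';
   run;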
Paper 2442-2015:
Don't Copy and Paste--Use BY Statement Processing with ODS to Make Your Summary Tables
Most manuscripts in medical journals contain summary tables that combine simple summaries and between-group comparisons. These tables typically combine estimates for categorical and continuous variables. The statistician generally summarizes the data using the FREQ procedure for categorical variables and compares percentages between groups using a chi-square or a Fisher's exact test. For continuous variables, the MEANS procedure is used to summarize data as either means and standard deviation or medians and quartiles. Then these statistics are generally compared between groups by using the GLM procedure or NPAR1WAY procedure, depending on whether one is interested in a parametric test or a non-parametric test. The outputs from these different procedures are then combined and presented in a concise format ready for publications. Currently there is no straightforward way in SAS® to build these tables in a presentable format that can then be customized to individual tastes. In this paper, we focus on presenting summary statistics and results from comparing categorical variables between two or more independent groups. The macro takes the dataset, the number of treatment groups, and the type of test (either chi-square or Fisher's exact) as input and presents the results in a publication-ready table. This macro automates summarizing data to a certain extent and minimizes risky typographical errors when copying results or typing them into a table.
Read the paper (PDF).
Jeff Gossett, University of Arkansas for Medical Sciences
Mallikarjuna Rettiganti, UAMS
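The underlying mechanism in miniature (data set and variable names are hypothetical): ODS OUTPUT stacks the statistics for every BY group into one data set that a macro can then reshape into a publication table.

   proc sort data=adsl;
      by paramcd;
   run;

   ods output ChiSq=chisq_all;   /* one data set across all BY groups */
   proc freq data=adsl;
      by paramcd;
      tables trt01p*resp / chisq;
   run;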
Paper SAS1561-2015:
Driving SAS® with Lua
Programming SAS® has just been made easier, now that SAS 9.4 has incorporated the Lua programming language into the heart of the SAS System. With its elegant syntax, modern design, and support for data structures, Lua offers you a fresh way to write SAS programs, getting you past many of the limitations of the SAS macro language. This paper shows how you can get started using Lua to drive SAS, via a quick introduction to Lua and a tour through some of the features of the Lua and SAS combination that make SAS programming easier. SAS macro programming is also compared with Lua, so that you can decide where you might benefit most by using either language.
Read the paper (PDF).
Paul Tomas, SAS
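A tiny sketch of the pattern (the data set choice is arbitrary): Lua drives SAS through the sas.submit interface, with @name@ placeholders resolved from Lua values.

   proc lua;
   submit;
      local ds = "sashelp.class"
      -- generate and run ordinary SAS code from Lua
      sas.submit([[ proc means data=@ds@; run; ]], {ds = ds})
   endsubmit;
   run;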
Paper 3487-2015:
Dynamic Dashboards Using SAS®
Dynamic, interactive visual displays known as dashboards are most effective when they show essential graphs, tables, statistics, and other information where data is the star. The first rule for creating an effective dashboard is to keep it simple. Striking a balance between content and style, a dashboard should be devoid of excessive clutter so as not to distract from and obscure the information displayed. The second rule of effective dashboard design involves displaying data that meets one or more business or organizational objectives. To accomplish this, the elements in a dashboard should be presented in a format easily understood by its intended audience. Attendees learn how to create dynamic, interactive user- and data-driven dashboards, graphical and table-driven dashboards, statistical dashboards, and drill-down dashboards with a purpose.
Read the paper (PDF).
Kirk Paul Lafler, Software Intelligence Corporation
E
Paper 2124-2015:
Efficient Ways to Build Up Chains and Waves for Respondent-Driven Sampling Data
Respondent Driven Sampling (RDS) is both a sampling method and a data analysis technique. As a sampling method, RDS is a chain referral technique with strategic recruitment quotas and specific data gathering requirements. As with other chain referral techniques (for example, snowball sampling), the chains and waves are the starting point for analysis. But building the chains and waves can be a daunting task because it involves many transpositions and merges. This paper provides an efficient method of using Base SAS® to build up chains and waves.
Read the paper (PDF).
Wen Song, ICF International
Paper 3329-2015:
Efficiently Using SAS® Data Views
For the Research Data Centers (RDCs) of the United States Census Bureau, the demand for disk space substantially increases with each passing year. Efficiently using the SAS® data view might successfully address the concern about disk space challenges within the RDCs. This paper discusses the usage and benefits of the SAS data view to save disk space and reduce the time and effort required to manage large data sets. The ability and efficiency of the SAS data view to process regular ASCII, compressed ASCII, and other commonly used file formats are analyzed and evaluated in detail. The authors discuss ways in which using SAS data views is more efficient than the traditional methods in processing and deploying the large census and survey data in the RDCs.
Read the paper (PDF).
Shigui Weng, US Bureau of the Census
Shy Degrace, US Bureau of the Census
Ya Jiun Tsai, US Bureau of the Census
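The core idea in a sketch (the libref and input file are hypothetical): the view stores the program, not the rows, so no disk space is spent until the data is actually read.

   data rdclib.claims_v / view=rdclib.claims_v;
      infile 'claims_extract.txt' dsd truncover;
      input id :$9. amount date :yymmdd10.;
   run;

   proc means data=rdclib.claims_v sum;   /* rows materialize only here */
      var amount;
   run;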
Paper 3242-2015:
Entropy-Based Measures of Weight of Evidence and Information Value for Variable Reduction and Segmentation for Continuous Dependent Variables
My SAS® Global Forum 2013 paper 'Variable Reduction in SAS® by Using Weight of Evidence (WOE) and Information Value (IV)' has become the most sought-after online article on variable reduction in SAS since its publication. But the methodology provided by the paper is limited to reduction of numeric variables for logistic regression only. Built on a similar process, the current paper adds several major enhancements: 1) The use of WOE and IV has been expanded to the analytics and modeling for continuous dependent variables. After the standardization of a continuous outcome, all records can be divided into two groups: positive performance (outcome y above sample average) and negative performance (outcome y below sample average). This treatment is rigorously consistent with the concept of entropy in Information Theory: the juxtaposition of two opposite forces in one equation, where a stronger contrast between the two suggests a higher intensity, that is, more information delivered by the variable in question. As the standardization keeps the outcome variable continuous and quantified, the revised formulas for WOE and IV can be used in the analytics and modeling for continuous outcomes such as sales volume, claim amount, and so on. 2) Categorical and ordinal variables can be assessed together with numeric ones. 3) Users of big data usually need to evaluate hundreds or thousands of variables, but it is not uncommon that over 90% of variables contain little useful information. We have added a SAS macro that trims these variables efficiently in a broad-brushed manner without a thorough examination. Afterward, we examine the retained variables more carefully on their behavior against the target outcome. 4) We add chi-square analysis for categorical/ordinal variables and Gini coefficients for numeric variables in order to provide additional suggestions for segmentation and regression. With the above enhancements added, a SAS macro program is provided at the end of the paper as a complete suite for variable reduction/selection that efficiently evaluates all variables together. The paper provides a detailed explanation for how to use the SAS macro and how to read the SAS outputs that provide useful insights for subsequent linear regression, logistic regression, or scorecard development.
Read the paper (PDF).
Alec Zhixiao Lin, PayPal Credit
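For reference, the classical binary-outcome definitions that the paper generalizes are, with $g_i$ and $b_i$ the counts of positive- and negative-performance records in bin $i$ and $G$, $B$ their totals:

$$\mathrm{WOE}_i = \ln\frac{g_i/G}{b_i/B}, \qquad \mathrm{IV} = \sum_i \left(\frac{g_i}{G}-\frac{b_i}{B}\right)\mathrm{WOE}_i.$$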
Paper 3047-2015:
Examination of Three SAS® Tools for Solving a Complicated Data Summary Problem
When faced with a difficult data reduction problem, a SAS® programmer has many options for how to solve it. In this presentation, three different methods are reviewed and compared in terms of processing time, debugging, and ease of understanding. The three methods include linearizing the data, using SQL Cartesian joins, and using sequential data processing. Inconsistencies in the raw data made the data linearization problematic. The large number of records and the need for many-to-many merges resulted in a long run time for the SQL code. The sequential data processing, although older technology, provided the most time-efficient and error-free results.
Read the paper (PDF).
Carry Croghan, US-EPA
Paper 3335-2015:
Experimental Approaches to Marketing and Pricing Research
Design of experiments (DOE) is an essential component of laboratory, greenhouse, and field research in the natural sciences. It has also been an integral part of scientific inquiry in diverse social science fields such as education, psychology, marketing, pricing, and social work. The principles and practices of DOE are among the oldest and most advanced tools within the realm of statistics. DOE classification schemes, however, are diverse and, at times, confusing. In this presentation, we provide a simple conceptual classification framework in which experimental methods are grouped into classical and statistical approaches. The classical approach is further divided into pre-, quasi-, and true experiments. The statistical approach is divided into one-factor, two-factor, and more-than-two-factor experiments. Within these broad categories, we review several contemporary and widely used designs and their applications. The optimal use of Base SAS® and SAS/STAT® to analyze, summarize, and report these diverse designs is demonstrated. The prospects and challenges of such diverse and critically important analytics tools for business insight extraction in marketing and pricing research are discussed.
Read the paper (PDF).
Max Friedauer
Jason Greenfield, Cardinal Health
Yuhan Jia, Cardinal Health
Joseph Thurman, Cardinal Health
Paper 3384-2015:
Exploring Number Theory Using Base SAS®
Long before analysts began mining large data sets, mathematicians sought truths hidden within the set of natural numbers. This exploration was formalized into the mathematical subfield known as number theory. Though this discipline has proven fruitful for many applied fields, number theory delights in numerical truth for its own sake. The austere and abstract beauty of number theory prompted nineteenth century mathematician Carl Friedrich Gauss to dub it 'The Queen of Mathematics.' This session and the related paper demonstrate that the analytical power of the SAS® engine is well-suited for exploring concepts in number theory. In Base SAS®, students and educators will find a powerful arsenal for exploring, illustrating, and visualizing the following: prime numbers, perfect numbers, the Euclidean algorithm, the prime number theorem, Euler's totient function, Chebyshev's theorem, the Chinese remainder theorem, the Gauss circle problem, and more! The paper delivers custom SAS procedures and code segments that generate data sets relevant to the exploration of topics commonly found in elementary number theory texts. The efficiency of these algorithms is discussed and an emphasis is placed on structuring data sets to maximize flexibility and ease in exploring new conjectures and illustrating known theorems. Last, the power of SAS plotting is put to use in unexpected ways to visualize and convey number-theoretic facts. This session and the related paper are geared toward the academic user or SAS user who welcomes and revels in mathematical diversions.
Read the paper (PDF).
Matthew Duchnowski, Educational Testing Service
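As a taste of the approach, a trial-division DATA step that generates the primes up to 100 (a sketch, not one of the paper's custom procedures):

   data primes;
      do n = 2 to 100;
         is_prime = 1;
         do d = 2 to floor(sqrt(n)) while (is_prime);
            if mod(n, d) = 0 then is_prime = 0;
         end;
         if is_prime then output;
      end;
      keep n;
   run;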
F
Paper SAS1750-2015:
Feeling Anxious about Transitioning from Desktop to Server? Key Considerations to Diminish Your Administrators' and Users' Jitters
As organizations strive to do more with fewer resources, many modernize their disparate PC operations to centralized server deployments. Administrators and users share many concerns about using SAS® on a Microsoft Windows server. This paper outlines key guidelines, plus architecture and performance considerations, that are essential to making a successful transition from PC to server. It covers five key considerations for SAS customers who will change their configuration from PC-based SAS to SAS on a Windows server: 1) Data and directory references; 2) Interactive and surrounding applications; 3) Usability; 4) Performance; 5) SAS Metadata Server.
Read the paper (PDF).
Kate Schwarz, SAS
Donna Bennett, SAS
Margaret Crevar, SAS
Paper 3049-2015:
Filling your SAS® Efficiency Toolbox: Creating a Stored Process to Interact with Your Shared SAS® Server Using the X and SYSTASK Commands
SAS® Enterprise Guide® is a great interface for businesses running SAS® in a shared server environment. However, interacting with the shared server outside of SAS can require costly third-party software and knowledge of specific server programming languages. This can create a barrier between the SAS program and the server, which can be frustrating for even the best SAS programmers. This paper reviews the X and SYSTASK commands and creates a template of SAS code to pass commands from SAS to the server. By writing the server log to a text file, we demonstrate how to display critical server information in the code results. Using macros and the prompt functionality of SAS Enterprise Guide, we form stored processes, allowing SAS users of all skill levels to interact with the server environment. These stored processes can improve programming efficiency by providing a quick in-program solution to complete common server tasks such as copying folders or changing file permissions. They might also reduce the need for third-party programs to communicate with the server, which could potentially reduce software costs.
Read the paper (PDF).
Cody Murray, Medica Health Plans
Chad Stegeman, Medica
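The essence of the template (a sketch; the paths are hypothetical, and the session must allow operating system commands via XCMD):

   systask command "ls -l /shared/project > /tmp/dirlist.txt 2>&1"
           wait status=lsrc;
   %put NOTE: server command returned &lsrc;

   data _null_;                 /* surface the server output in the results */
      infile '/tmp/dirlist.txt';
      input;
      put _infile_;
   run;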
Paper 1752-2015:
Fixing Lowercase Acronyms Left Over from the PROPCASE Function
The PROPCASE function is useful when you are cleansing a database of names and addresses in preparation for mailing. But it does not know the difference between a proper name (in which initial capitalization should be used) and an acronym (which should be all uppercase). This paper explains an algorithm that determines with reasonable accuracy whether a word is an acronym and, if it is, converts it to uppercase.
Read the paper (PDF). | Download the data file (ZIP).
Joe DeShon, Boehringer Ingelheim Vetmedica
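One simple heuristic in the same spirit (a sketch, not the author's algorithm: short words with no vowels are treated as acronyms; MAILING_LIST and NAME are hypothetical):

   data fixed;
      set mailing_list;
      name = propcase(name);
      do _i = 1 to countw(name, ' ');
         _w = scan(name, _i, ' ');
         if length(_w) le 4 and countc(_w, 'AEIOUaeiou') = 0 then
            name = tranwrd(name, strip(_w), upcase(strip(_w)));
      end;
      drop _i _w;
   run;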
Paper 3419-2015:
Forest Plotting Analysis Macro %FORESTPLOT
A powerful tool for visually analyzing regression analysis is the forest plot. Model estimates, ratios, and rates with confidence limits are graphically stacked vertically in order to show how they overlap with each other and to show values of significance. The ability to see whether two values are significantly different from each other or whether a covariate has a significant meaning on its own is made much simpler in a forest plot rather than sifting through numbers in a report table. The amount of data preparation needed in order to build a high-quality forest plot in SAS® can be tremendous because the programmer needs to run analyses, extract the estimates to be plotted, structure the estimates in a format conducive to generating a forest plot, and then run the correct plotting procedure or create a graph template using the Graph Template Language (GTL). While some SAS procedures can produce forest plots using Output Delivery System (ODS) Graphics automatically, the plots are not generally publication-ready and are difficult to customize even if the programmer is familiar with GTL. The macro %FORESTPLOT is designed to perform all of the steps of building a high-quality forest plot in order to save time for both experienced and inexperienced programmers, and is currently set up to perform regression analyses common to the clinical oncology research areas, Cox proportional hazards and logistic, as well as calculate Kaplan-Meier event-free rates. To improve flexibility, the user can specify a pre-built data set to transform into a forest plot if the automated analysis options of the macro do not fit the user's needs.
Read the paper (PDF).
Jeffrey Meyers, Mayo Clinic
Qian Shi, Mayo Clinic
Paper SAS1500-2015:
Frequently Asked Questions Regarding Storage Configurations
The SAS® Global Forum paper 'Best Practices for Configuring Your I/O Subsystem for SAS®9 Applications' provides general guidelines for configuring I/O subsystems for your SAS® applications. The paper reflects updated storage and virtualization technology. This companion paper ('Frequently Asked Questions Regarding Storage Configurations') is commensurately updated, including new storage technologies such as storage virtualization, storage tiers (including automated tier management), and flash storage. The subject matter is voluminous, so a frequently asked questions (FAQ) format is used. Our goal is to continually update this paper as additional field needs arise and technology dictates.
Read the paper (PDF).
Tony Brown, SAS
Margaret Crevar, SAS
Paper 3434-2015:
From Backpacks to Briefcases: Making the Transition from Grad School to a Paying Job
During grad school, students learn SAS® in class or on their own for a research project. Time is limited, so faculty have to focus on what they know are the fundamental skills that students need to successfully complete their coursework. However, real-world research projects are often multifaceted and require a variety of SAS skills. When students transition from grad school to a paying job, they might find that in order to be successful, they need more than the basic SAS skills that they learned in class. This paper highlights 10 insights that I've had over the past year during my transition from grad school to a paying SAS research job. I hope this paper will help other students make a successful transition. Top 10 insights: 1. You still get graded, but there is no syllabus. 2. There isn't time for perfection. 3. Learn to use your resources. 4. There is more than one solution to every problem. 5. Asking for help is not a weakness. 6. Working with a team is required. 7. There is more than one type of SAS®. 8. The skills you learned in school are just the basics. 9. Data is complicated and often frustrating. 10. You will continue to learn both personally and professionally.
Read the paper (PDF).
Lauren Hall, Baylor Scott & White Health
Elisa Priest, Texas A&M University Health Science Center
Paper 3380-2015:
From a One-Horse to a One-Stoplight Town: A Base SAS® Solution to Preventing Data Access Collisions Using Shared and Exclusive File Locks
Data access collisions occur when two or more processes attempt to gain concurrent access to a single data set. Collisions are a common obstacle to SAS® practitioners in multi-user environments. As SAS instances expand to infrastructures and ultimately empires, the inherent increased complexities must be matched with commensurately higher code quality standards. Moreover, permanent data sets will attract increasingly more devoted users and automated processes clamoring for attention. As these dependencies increase, so too does the likelihood of access collisions that, if unchecked or unmitigated, lead to certain process failure. The SAS/SHARE® module offers concurrent file access capabilities, but causes a (sometimes dramatic) reduction in processing speed, must be licensed and purchased separately from Base SAS®, and is not a viable solution for many organizations. Previously proposed solutions in Base SAS use a busy-wait spinlock cycle to repeatedly attempt file access until process success or timeout. While effective, these solutions are inefficient because they generate only read-write locked data sets that unnecessarily prohibit access by subsequent read-only requests. This presentation introduces the %LOCKITDOWN macro that advances previous solutions by affording both read-write and read-only lock testing and deployment. Moreover, recognizing the responsibility for automated data processes to be reliable, robust, and fault tolerant, %LOCKITDOWN is demonstrated in the context of a macro-based exception handling paradigm.
Read the paper (PDF).
Troy Hughes, Datmesis Analytics
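The busy-wait core that %LOCKITDOWN generalizes, in miniature (a sketch only; the read-only lock testing that distinguishes the macro is not shown):

   %macro trylock(ds, tries=5, wait=2);
      %local i;
      %do i = 1 %to &tries;
         lock &ds;
         %if &syslckrc le 0 %then %do;
            %put NOTE: exclusive lock on &ds acquired.;
            %return;
         %end;
         data _null_;
            call sleep(&wait, 1);   /* seconds between attempts */
         run;
      %end;
      %put WARNING: could not lock &ds after &tries attempts.;
   %mend trylock;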
Paper 3142-2015:
Fuzzy Matching
Quality measurement is increasingly important in the health-care sphere for both performance optimization and reimbursement. Treatment of chronic conditions is a key area of quality measurement. However, medication compendiums change frequently, and health-care providers often free text medications into a patient's record. Manually reviewing a complete medications database is time consuming. In order to build a robust medications list, we matched a pharmacist-generated list of categorized medications to a raw medications database that contained names, name-dose combinations, and misspellings. The matching technique we used relies on the COMPGED function. We were able to combine a truncation function and an upcase function to optimize the output of COMPGED. Using these combinations and manipulating the scoring metric of COMPGED enabled us to narrow the database list to medications that were relevant to our categories. This process transformed a tedious task for PROC COMPARE or an Excel macro into a quick and efficient method of matching. The task of sorting through relevant matches was still conducted manually, but the time required to do so was significantly decreased by the fuzzy match in our application of COMPGED.
Read the paper (PDF).
Arti Virkud, NYC Department of Health
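The scoring step in miniature (a sketch; the data sets, variables, and the cutoff of 500 are hypothetical):

   proc sql;
      create table candidate_matches as
      select a.raw_med,
             b.med_name,
             b.category,
             compged(upcase(strip(a.raw_med)),
                     upcase(strip(b.med_name)), 500) as score
      from raw_meds as a, med_list as b
      where calculated score < 500
      order by a.raw_med, calculated score;
   quit;

The third argument to COMPGED is a cutoff that stops scoring hopeless pairs early, which keeps the Cartesian comparison affordable.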
G
Paper 2787-2015:
GEN_OMEGA2: A SAS® Macro for Computing the Generalized Omega- Squared Effect Size Associated with Analysis of Variance Models
Researchers are strongly encouraged to report effect sizes in addition to statistical significance, and to consider them in evaluating the results of a study. The choice of an effect size for ANOVA models can be confusing because indices might differ depending on the research design as well as the magnitude of the effect. Olejnik and Algina (2003) proposed the generalized eta-squared and omega-squared effect sizes, which are comparable across a wide variety of research designs. This paper provides a SAS® macro for computing the generalized omega-squared effect size associated with analysis of variance models by using data from PROC GLM ODS tables. The paper provides the macro programming language, as well as results from an executed example of the macro.
Read the paper (PDF).
Anh Kellermann, University of South Florida
Yi-hsin Chen, USF
Jeffrey Kromrey, University of South Florida
Thanh Pham, USF
Patrice Rasmussen, USF
Patricia Rodriguez de Gil, University of South Florida
Jeanine Romano, USF
Paper 1886-2015:
Getting Started with Data Governance
While there has been tremendous progress in technologies related to data storage, high-performance computing, and advanced analytic techniques, organizations have only recently begun to comprehend the importance of parallel strategies that help manage the cacophony of concerns around access, quality, provenance, data sharing, and use. While data governance is not new, the drumbeat around it, along with master data management and data quality, is approaching a crescendo. Intensified by the increase in consumption of information, expectations about ubiquitous access, and highly dynamic visualizations, these factors are also circumscribed by security and regulatory constraints. In this paper, we provide a summary of what data governance is and its importance. We go beyond the obvious and provide practical guidance on what it takes to build out a data governance capability appropriate to the scale, size, and purpose of the organization and its culture. Moreover, we discuss best practices in the form of requirements that highlight what we think is important to consider as you provide that tactical linkage between people, policies, and processes to the actual data lifecycle. To that end, our focus includes the organization and its culture, people, processes, policies, and technology. Further, we include discussions of organizational models as well as the role of the data steward, and provide guidance on how to formalize data governance into a sustainable set of practices within your organization.
Read the paper (PDF). | Watch the recording.
Greg Nelson, ThotWave
Lisa Dodson, SAS
Paper 3432-2015:
Getting Your Hands on Reproducible Graphs
Learning the Graph Template Language (GTL) might seem like a daunting task. However, creating customized graphics with SAS® is quite easy using many of the tools offered with Base SAS® software. The point-and-click interface of ODS Graphics Designer provides us with a tool that can be used to generate highly polished graphics and to store the GTL-based code that creates them. This opens the door for users who would like to create canned graphics that can be used on various data sources, variables, and variable types. In this hands-on training, we explore the use of ODS Graphics Designer to create sophisticated graphics and to save the template code. We then discuss modifications using basic SAS macros in order to create stored graphics code that is flexible enough to accommodate a wide variety of situations.
Read the paper (PDF).
Rebecca Ottesen, City of Hope and Cal Poly SLO
Leanne Goldstein, City of Hope
Paper 3474-2015:
Getting your SAS® Program to Do Your Typing for You!
Do you have a SAS® program that requires adding filenames to the input every time you run it? Aren't you tired of having to check for the files, check the names, and type them in? Check out how my SAS® Enterprise Guide® project checks for files, figures out the file names, and saves me from having to type in the file names for the input data files!
Read the paper (PDF). | Download the data file (ZIP).
Nancy Wilson, Ally
Paper 2441-2015:
Graphing Made Easy with SGPLOT and SGPANEL Procedures
When ODS Graphics was introduced a few years ago, it gave SAS users an array of new ways to generate graphs. One of those ways is with Statistical Graphics procedures. Now, with just a few lines of simple code, you can create a wide variety of high-quality graphs. This paper shows how to produce single-celled graphs using PROC SGPLOT and paneled graphs using PROC SGPANEL. This paper also shows how to send your graphs to different ODS destinations, how to apply ODS styles to your graphs, and how to specify properties of graphs, such as format, name, height, and width.
Read the paper (PDF).
Susan Slaughter, Avocet Solutions
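Two of the paper's staples in their simplest form:

   proc sgplot data=sashelp.class;
      scatter x=height y=weight / group=sex;
   run;

   proc sgpanel data=sashelp.class;
      panelby sex / columns=2;
      histogram weight;
   run;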
Paper SAS1780-2015:
Graphs Are Easy with SAS® 9.4
Axis tables, polygon plots, text plots, and more features have been added to the Statistical Graphics (SG) procedures and Graph Template Language (GTL) for SAS® 9.4. These additions are a direct result of your feedback and are designed to make creating graphs easier. Axis tables let you add multiple tables of data to your graphs that align correctly with the axis values, with the right colors for group values in your data. Text plots can have rotated and aligned text anywhere in the graph. You can overlay jittered markers on box plots, use images and font glyphs as markers, specify group attributes without making style changes, and create entirely new custom graphs using the polygon plot. All this without using the annotation facility, which is now supported both for SG procedures and GTL. This paper guides you through these exciting new features now available in SG procedures and GTL.
Read the paper (PDF). | Download the data file (ZIP).
Sanjay Matange, SAS
H
Paper 2506-2015:
Hands-on SAS® Macro Programming Essentials for New Users
The SAS® Macro Language is a powerful tool for extending the capabilities of the SAS® System. This hands-on workshop teaches essential macro coding concepts, techniques, tips, and tricks to help beginning users learn the basics of how the macro language works. Using a collection of proven macro language coding techniques, attendees learn how to write and process macro statements and parameters; replace text strings with macro (symbolic) variables; generate SAS code using macro techniques; manipulate macro variable values with macro functions; create and use global and local macro variables; construct simple arithmetic and logical expressions; interface the macro language with the SQL procedure; store and reuse macros; troubleshoot and debug macros; and develop efficient and portable macro language code.
Read the paper (PDF).
Kirk Paul Lafler, Software Intelligence Corporation
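A small example in the workshop's spirit, combining parameters, %LOCAL, %DO, %SYSFUNC, and %SCAN (a sketch):

   %macro freqit(ds=, vars=);
      %local i v;
      %do i = 1 %to %sysfunc(countw(&vars));
         %let v = %scan(&vars, &i);
         proc freq data=&ds;
            tables &v / nocum;
         run;
      %end;
   %mend freqit;

   %freqit(ds=sashelp.class, vars=sex age)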
Paper 3485-2015:
Health Services Research Using Electronic Health Record Data: A Grad Student How-To Paper
Graduate students encounter many challenges when conducting health services research using real world data obtained from electronic health records (EHRs). These challenges include cleaning and sorting data, summarizing and identifying present-on-admission diagnosis codes, identifying appropriate metrics for risk-adjustment, and determining the effectiveness and cost effectiveness of treatments. In addition, outcome variables commonly used in health service research are not normally distributed. This necessitates the use of nonparametric methods in statistical analyses. This paper provides graduate students with the basic tools for the conduct of health services research with EHR data. We will examine SAS® tools and step-by-step approaches used in an analysis of the effectiveness and cost-effectiveness of the ABCDE (Awakening and Breathing Coordination, Delirium monitoring/management, and Early exercise/mobility) bundle in improving outcomes for intensive care unit (ICU) patients. These tools include the following: (1) ARRAYS; (2) lookup tables; (3) LAG functions; (4) PROC TABULATE; (5) recycled predictions; and (6) bootstrapping. We will discuss challenges and lessons learned in working with data obtained from the EHR. This content is appropriate for beginning SAS users.
Read the paper (PDF).
Ashley Collinsworth, Baylor Scott & White Health/Tulane University
Elisa Priest, Texas A&M University Health Science Center
Paper SAS1747-2015:
Helping You C What You Can Do with SAS®
SAS® users are already familiar with the FCMP procedure and the flexibility it provides them in writing their own functions and subroutines. However, did you know that FCMP also allows you to call functions written in C? Did you know that you can create and populate complex C structures and use C types in FCMP? With the PROTO procedure, you can define function prototypes, structures, enumeration types, and even small bits of C code. This paper gets you started on how to use the PROTO procedure and, in turn, how to call your C functions from within FCMP and SAS.
Read the paper (PDF). | Download the data file (ZIP).
Andrew Henrick, SAS
Karen Croft, SAS
Donald Erdman, SAS
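A compact sketch of the round trip (the package and function names are hypothetical; check the PROTO and FCMP documentation for the exact linkage options):

   proc proto package=work.protos.cfuns;
      double cube(double x);
      externc cube;
         double cube(double x) { return x * x * x; }
      externcend;
   run;

   proc fcmp outlib=work.funcs.math inlib=work.protos;
      function cube_plus_one(x);
         return (cube(x) + 1);
      endsub;
   run;

   options cmplib=(work.protos work.funcs);
   data _null_;
      y = cube_plus_one(2);   /* 9 */
      put y=;
   run;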
Paper 3181-2015:
How Best to Effectively Teach the SAS® Language
Learning a new programming language is not an easy task, especially for someone who does not have any programming experience. Learning the SAS® programming language can be even more challenging. One of the reasons is that the SAS System consists of a variety of languages, such as the DATA step language, the SAS macro language, Structured Query Language for the SQL procedure, and so on. Furthermore, each of these languages has its own unique characteristics, and simply learning the syntax is not sufficient to grasp the essence of the language. Thus, it is not unusual to hear about someone who has been learning SAS for several years and has never become a SAS programming expert. Using the DATA step language as an example, I would like to share some of my experiences on effectively teaching the SAS language.
Read the paper (PDF).
Arthur Li, City of Hope National Medical Center
Paper 2685-2015:
How Do You Use Lookup Tables?
No matter what type of programming you do in a pharmaceutical environment, there will eventually be a need to combine your data with a lookup table. This lookup table could be a code list for adverse events, a list of names for visits, one of your own summary data sets containing totals that you will use to calculate percentages, or something else entirely. This paper describes and discusses the reasons for using five different simple ways to merge data sets with lookup tables, so that when you take over the maintenance of a new program, you will be ready for anything!
Read the paper (PDF).
Philip Holland, Holland Numerics Limited
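One of the simplest of those ways, a format built from the lookup table itself (a sketch; the data sets and variables are hypothetical):

   data cntlin;
      set visit_names(rename=(visit_code=start visit_label=label));
      retain fmtname '$visit' type 'C';
   run;

   proc format cntlin=cntlin;
   run;

   data reported;
      set visits;
      visit_name = put(visit_code, $visit.);   /* the 'merge' is a lookup */
   run;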
Paper 3431-2015:
How Latent Analyses within Survey Data Can Be Valuable Additions to Any Regression Model
This study looks at several ways to investigate latent variables in longitudinal surveys and their use in regression models. Three different analyses for latent variable discovery are briefly reviewed and explored. The procedures explored in this paper are PROC LCA, PROC LTA, PROC CATMOD, PROC FACTOR, PROC TRAJ, and PROC SURVEYLOGISTIC. The analyses defined through these procedures are latent profile analyses, latent class analyses, and latent transition analyses. The latent variables are included in three separate regression models. The effect of the latent variables on the fit and use of the regression model compared to a similar model using observed data is briefly reviewed. The data used for this study was obtained via the National Longitudinal Study of Adolescent Health, a study distributed and collected by Add Health. Data was analyzed using SAS® 9.3. This paper is intended for any level of SAS® user. This paper is also aimed at an audience with a background in behavioral science or statistics.
Read the paper (PDF).
Deanna Schreiber-Gregory, National University
Paper SAS1708-2015:
How SAS® Uses SAS to Analyze SAS Blogs
SAS® blogs (hosted at http://blogs.sas.com/content) attract millions of page views annually. With hundreds of authors, thousands of posts, and constant chatter within the blog comments, it's impossible for one person to keep track of all of the activity. In this paper, you learn how SAS technology is used to gather data and report on SAS blogs from the inside out. The beneficiaries include personnel from all over the company, including marketing, technical support, customer loyalty, and executives. The author describes the business case for tracking and reporting on the activity of blogging. You learn how SAS tools are used to access the WordPress database and how to create a 'blog data mart' for reporting and analytics. The paper includes specific examples of the insight that you can gain from examining the blogs analytically, and which techniques are most useful for achieving that insight. For example, the blog transactional data are combined with social media metrics (also gathered by using SAS) to show which blog entries and authors yield the most engagement on Twitter, Facebook, and LinkedIn. In another example, we identified the growing trend of 'blog comment spam' on the SAS blog properties and measured its cost to the business. These metrics helped to justify the investment in a solution. Many of the tools used are part of SAS® Foundation, including SAS/ACCESS®, the DATA step and SQL, PROC REPORT, PROC SGPLOT, and more. The results are shared in static reports, automated daily email summaries, dynamic reports hosted in SAS/IntrNet®, and even a corporate dashboard hosted in SAS® Visual Analytics.
Read the paper (PDF).
Chris Hemedinger, SAS
Paper 3214-2015:
How is Your Health? Using SAS® Macros, ODS Graphics, and GIS Mapping to Monitor Neighborhood and Small-Area Health Outcomes
With the constant need to inform researchers about neighborhood health data, the Santa Clara County Health Department created socio-demographic and health profiles for 109 neighborhoods in the county. Data was pulled from many public and county data sets, compiled, analyzed, and automated using SAS®. With over 60 indicators and 109 profiles, an efficient set of macros was used to automate the calculation of percentages, rates, and mean statistics for all of the indicators. Macros were also used to aggregate individual census tracts into predefined neighborhoods to avoid data entry errors. Simple SQL procedures were used to calculate and format percentages within the macros, and output was pushed out using Output Delivery System (ODS) Graphics. This output was exported to Microsoft Excel, which was used to create a sortable database for end users to compare cities and/or neighborhoods. Finally, the automated SAS output was used to map the demographic data using geographic information system (GIS) software at three geographies: city, neighborhood, and census tract. This presentation describes the use of simple macros and SAS procedures to reduce resources and time spent on checking data for quality assurance purposes. It also highlights the simple use of ODS Graphics to export data to an Excel file, which was used to mail merge the data into 109 unique profiles. The presentation is aimed at intermediate SAS users at local and state health departments who might be interested in finding an efficient way to run and present health statistics given limited staff and resources.
Read the paper (PDF).
Roshni Shah, Santa Clara County
Paper 3273-2015:
How to Create a UNIX Space Management Report Using SAS®
Storage space on a UNIX platform is a costly and finite resource to maintain, even under ideal conditions. By regularly monitoring and promptly responding to space limitations that might occur during production, an organization can mitigate the risk of wasted expense, time, and effort caused by this problem. SAS® programmers at Truven Health Analytics have designed a reporting tool to measure space usage by a number of distinct factors over time. Using tabular and graphical output, the tool provides a full picture of what often contributes to critical reductions of available hardware space. It enables managers and users to respond appropriately and effectively whenever this occurs. It also helps to identify ways to encourage more efficient practices, thereby minimizing the likelihood of this occurring in the future. Environment: RHEL 5.4 (Red Hat Enterprise Linux) on an Oracle Sun Fire X4600 M2, running SAS® 9.3 TS1M1.
Matthew Shevrin, Truven Health Analytics
Paper 3143-2015:
How to Cut the Time of Processing of Ryan White Services (RSR) Data by 99% and More.
The widely used method to convert RSR XML data to a standard, ready-to-process database uses a Visual Basic mapper as a buffer tool when reading XML data (for example, into an MS Access database). This paper describes the shortcomings of this method with respect to the different schemas of RSR data and offers a SAS® macro that enables users to read any schema of RSR data directly into a SAS relational database. This macro entirely eliminates the step of creating an MS Access database. Using our macro, the user can cut the time of processing of Ryan White data by 99% or more, depending on the number of files that need to be processed in one run.
Read the paper (PDF).
Michael Costa, Abt Associates
Fizza Gillani, Brown University and Lifespan/Tufts/Brown Center for AIDS Research
Paper SAS1480-2015:
How to Maintain Happy SAS® Users: A SAS Environment Primer for IT Professionals
Today's SAS® environment has large numbers of concurrent SAS processes and ever-growing data volumes. To help SAS users remain productive, SAS administrators must ensure that SAS applications have sufficient computer resources that are properly configured and monitored often. Understanding how all the components of SAS work and how they will be used by your users is the first step. The guidance offered in this paper will help SAS administrators evaluate hardware, operating system, and infrastructure options for a SAS environment that will keep their SAS applications running at optimal performance and their user community happy.
Read the paper (PDF).
Margaret Crevar, SAS
Paper 3252-2015:
How to Use SAS® for GMM Logistic Regression Models for Longitudinal Data with Time-Dependent Covariates
In longitudinal data, it is important to account for the correlation due to repeated measures and time-dependent covariates. The generalized method of moments (GMM) can be used to estimate the coefficients in longitudinal data, although there are currently limited procedures in SAS® to produce GMM estimates for correlated data. In a recent paper, Lalonde, Wilson, and Yin provided a GMM model for estimating the coefficients in this type of data. SAS PROC IML was used to generate equations that needed to be solved to determine which estimating equations to use. In addition, this study extended classifications of moment conditions to include a type IV covariate. Two data sets were evaluated using this method, including re-hospitalization rates from a Medicare database as well as body mass index and future morbidity rates among Filipino children. Both examples contain binary responses, repeated measures, and time-dependent covariates. However, while this technique is useful, it is tedious and can also be complicated when determining the matrices necessary to obtain the estimating equations. We provide a concise and user-friendly macro to fit GMM logistic regression models with extended classifications.
Read the paper (PDF).
Katherine Cai, Arizona State University
I
Paper 2522-2015:
I Object: SAS® Does Objects with DS2
The DATA step has served SAS® programmers well over the years, and although it is powerful, it has not fundamentally changed. With DS2, SAS introduced a significant alternative to the DATA step by providing an object-oriented programming environment. In this paper, we share our experiences with getting started with DS2 and learning to use it to access, manage, and share data in a scalable, threaded, and standards-based way.
Read the paper (PDF).
Peter Eberhardt, Fernwood Consulting Group Inc.
Xue Yao, Winnipeg Regional Health Authority
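A first DS2 step for comparison with the classic DATA step (a sketch; the BMI calculation is just an example):

   proc ds2;
      data bmi_out (overwrite=yes);
         dcl double bmi;
         method run();
            set sashelp.class;
            bmi = (weight / (height * height)) * 703;
         end;
      enddata;
      run;
   quit;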
Paper 3411-2015:
Identifying Factors Associated with High-Cost Patients
Research has shown that the top five percent of patients can account for nearly fifty percent of the total healthcare expenditure in the United States. Using SAS® Enterprise Guide® and PROC LOGISTIC, a statistical methodology was developed to identify factors (for example, patient demographics, diagnostic symptoms, comorbidity, and the type of procedure code) associated with the high cost of healthcare. Analyses were performed using the FAIR Health National Private Insurance Claims (NPIC) database, which contains information about healthcare utilization and cost in the United States. The analyses focused on treatments for chronic conditions, such as trans-myocardial laser revascularization for the treatment of coronary heart disease (CHD) and pressurized inhalation for the treatment of asthma. Furthermore, bubble plots and heat maps were created using SAS® Visual Analytics to provide key insights into potentially high-cost treatments for heart disease and asthma patients across the nation.
Read the paper (PDF). | Download the data file (ZIP).
Jeff Dang, FAIR Health
Paper 2320-2015:
Implementing a Discrete Event Simulation Using the American Community Survey and the SAS® University Edition
SAS® University Edition is a great addition to the world of freely available analytic software, and this 'how-to' presentation shows you how to implement a discrete event simulation using Base SAS® to model future US Veterans population distributions. Features include generating a slideshow using ODS output to PowerPoint.
Read the paper (PDF). | Download the data file (ZIP).
Michael Grierson
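The slideshow piece is essentially a one-line destination switch (a sketch; SIM_OUT and its variables are hypothetical):

   ods powerpoint file='simulation_results.pptx';
   proc sgplot data=sim_out;
      series x=year y=veteran_pop / group=scenario;
   run;
   ods powerpoint close;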
Paper 3295-2015:
Imputing Missing Data using SAS®
Missing data is an unfortunate reality of statistics. However, there are various ways to estimate and deal with missing data. This paper explores the pros and cons of traditional imputation methods versus maximum likelihood estimation, as well as single versus multiple imputation. These differences are displayed by comparing parameter estimates of a known data set and simulating random missing data of different severity. In addition, this paper uses PROC MI and PROC MIANALYZE and shows how to use these procedures in a longitudinal data set.
Read the paper (PDF).
Christopher Yim, Cal Poly San Luis Obispo
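The multiple-imputation pipeline in miniature (a sketch; HAVE and its variables are hypothetical):

   proc mi data=have nimpute=5 seed=20150426 out=mi_out;
      var x1 x2 y;
   run;

   proc reg data=mi_out outest=est covout noprint;
      model y = x1 x2;
      by _imputation_;
   run;
   quit;

   proc mianalyze data=est;
      modeleffects intercept x1 x2;
   run;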
Paper 2120-2015:
Inside the DATA Step: Pearls of Wisdom for the Novice SAS® Programmer
Why did my merge fail? How did that variable get truncated? Why am I getting unexpected results? Understanding how the DATA step actually works is the key to answering these and many other questions. In this paper, two independent consultants with a combined three decades of SAS® programming experience share a treasure trove of knowledge aimed at helping the novice SAS programmer take his or her game to the next level by peering behind the scenes of the DATA step. We touch on a variety of topics, including compilation versus execution, the program data vector, and proper merging techniques, with a focus on good programming practices that help the programmer steer clear of common pitfalls.
Read the paper (PDF).
Joshua Horstman, Nested Loop Consulting
Britney Gilbert, Juniper Tree Consulting
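One of those proper merging techniques in its canonical form (a sketch; DEMOG and VISITS are hypothetical):

   proc sort data=demog;  by id; run;
   proc sort data=visits; by id; run;

   data merged;
      merge demog(in=a) visits(in=b);
      by id;           /* without BY, this becomes a one-to-one merge */
      if a;            /* keep only IDs present in DEMOG              */
   run;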
Paper 3461-2015:
Integrating Data and Business Rules with a Control Data Set in SAS®
In SAS® software development, data specifications and process requirements can be built into a user-defined control data set that functions as a component of ETL routines. A control data set provides a comprehensive definition of the source, relationships, logic, description, and metadata of each data element. This approach facilitates auto-generating SAS code during program execution to perform data ingestion, transformation, and loading procedures based on rules defined in the table. This paper demonstrates the application of a control data set for the following: (1) data table initialization and integration; (2) validation and quality control; (3) element transformation and creation; (4) data loading; and (5) documentation. SAS programmers and business analysts will find programming development and maintenance of business rules more efficient with this standardized method.
Read the paper (PDF).
Edmond Cheng, CACI International Inc
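A stripped-down sketch of the idea, assuming a hypothetical control table that stores one derivation rule per row:

    data work.control;
      length target $32 rule $200;
      target = 'bmi';      rule = 'weight / (height**2)'; output;
      target = 'pct_paid'; rule = 'paid / billed * 100';  output;
    run;

    /* generate a DATA step from the rules at execution time */
    data _null_;
      set work.control end=last;
      if _n_ = 1 then call execute('data work.derived; set work.raw;');
      call execute(cats(target, '=', rule, ';'));
      if last then call execute('run;');
    run;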
Paper 2500-2015:
Integrating SAS® and the R Language with Microsoft SharePoint
Microsoft SharePoint has been adopted by a number of companies today as their content management tool because of its ability to create and manage documents, records, and web content. It is described as an enterprise collaboration platform with a variety of capabilities, and thus it stands to reason that this platform should also be used to surface content from analytical applications such as SAS® and the R language. SAS provides various methods for surfacing SAS content through SharePoint. This paper describes one such methodology that is both simple and elegant, requiring only SAS Foundation. It also explains how SAS and R can be used together to form a robust solution for delivering analytical results. The paper outlines the approach for integrating both languages into a single security model that uses Microsoft Active Directory as the primary authentication mechanism for SharePoint. It also describes how to extend the authorization to SAS running on a Linux server where LDAP is used. Users of this system are blissfully ignorant of the back-end technology components, as we offer up a seamless interface where they simply authenticate to the SharePoint site and the rest is, as they say, magic.
Read the paper (PDF).
Piyush Singh, Tata Consultancy Services Limited
Prasoon Sangwan, Tata Consultancy Services
Shiv Govind Yadav
Paper 3052-2015:
Introduce a Linear Regression Model by Using the Variable Transformation Method
This paper explains how to build a linear regression model using the variable transformation method. Testing the assumptions, which is required for linear modeling and testing the fit of a linear model, is included. This paper is intended for analysts who have limited exposure to building linear models. This paper uses the REG, GLM, CORR, UNIVARIATE, and GPLOT procedures.
Read the paper (PDF). | Download the data file (ZIP).
Nancy Hu, Discover
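As a flavor of the approach, a transformed predictor can be created and tested like this (variable names hypothetical):

    data work.model_ready;
      set work.raw;
      log_income = log(income + 1);   /* +1 guards against log(0) */
    run;

    proc reg data=work.model_ready;
      model spend = log_income;
    run;
    quit;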
Paper 2986-2015:
Introduction to Output Delivery System (ODS)
This presentation teaches the audience how to use ODS Graphics. Now part of Base SAS®, ODS Graphics are a great way to easily create clear graphics that enable any user to tell their story well. SGPLOT and SGPANEL are two of the procedures that can be used to produce powerful graphics that used to require a lot of work. The core of the procedures is explained, as well as some of the many options available. Furthermore, we explore the ways to combine the individual statements to make more complex graphics that tell the story better. Any user of Base SAS on any platform will find great value in the SAS ODS Graphics procedures.
Chuck Kincaid, Experis
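A small, self-contained example of the kind of plot the presentation covers, using a data set shipped with SAS:

    proc sgplot data=sashelp.class;
      scatter x=height y=weight / group=sex;
      reg x=height y=weight;
      xaxis label="Height (inches)";
      yaxis label="Weight (pounds)";
    run;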
Paper 3024-2015:
Introduction to SAS® Hash Objects
The SAS® hash object is an incredibly powerful technique for integrating data from two or more data sets based on a common key. This session describes the basic methodology for defining, populating, and using a hash object to perform lookups within the DATA step and provides examples of situations in which the performance of SAS programs is improved by their use. Common problems encountered when using hash objects are explained, and tools and techniques for optimizing hash objects within your SAS program are demonstrated.
Read the paper (PDF).
Chris Schacherer, Clinical Data Management Systems, LLC
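A minimal sketch of the lookup pattern described (table and variable names hypothetical):

    data work.matched;
      length region $20;                          /* host variable for hash data  */
      if _n_ = 1 then do;
        declare hash h(dataset: 'work.lookup');   /* small key table              */
        h.defineKey('id');
        h.defineData('region');
        h.defineDone();
        call missing(region);
      end;
      set work.transactions;                      /* large fact table             */
      if h.find() = 0;                            /* 0 = key found; region filled */
    run;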
K
Paper 2480-2015:
Kaplan-Meier Survival Plotting Macro %NEWSURV
The research areas of pharmaceuticals and oncology clinical trials greatly depend on time-to-event endpoints such as overall survival and progression-free survival. One of the best graphical displays of these analyses is the Kaplan-Meier curve, which can be simple to generate with the LIFETEST procedure but difficult to customize. Journal articles generally prefer that statistics such as median time-to-event, number of patients, and time-point event-free rate estimates be displayed within the graphic itself, and this was previously difficult to do without an external program such as Microsoft Excel. The macro %NEWSURV takes advantage of the Graph Template Language (GTL) that was added with the SG graphics engine to create this level of customizability without the need for back-end manipulation. Taking this one step further, the macro was improved to be able to generate a lattice of multiple unique Kaplan-Meier curves for side-by-side comparisons or for condensing figures for publications. This paper describes the functionality of the macro and describes how the key elements of the macro work.
Read the paper (PDF).
Jeffrey Meyers, Mayo Clinic
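For orientation, the baseline display that %NEWSURV customizes comes from PROC LIFETEST, roughly as follows (variable names hypothetical):

    ods graphics on;
    proc lifetest data=work.adtte plots=survival(atrisk=0 to 24 by 6);
      time aval * cnsr(1);   /* time-to-event; 1 flags a censored record */
      strata trt;            /* treatment arm                            */
    run;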
Paper 3023-2015:
Killing Them with Kindness: Policies Not Based on Data Might Do More Harm Than Good
Educational administrators sometimes have to make decisions based on what they believe is in the best interest of their students because they do not have the data they need at the time. Some administrators do not even know that the data exist to help them make their decisions. However, well-intentioned policies that are not based on facts can sometimes do more harm than good for the students and the institution. This presentation discusses the results of the policy analyses conducted by the Office of Institutional Research at Western Kentucky University using Base SAS®, SAS/STAT®, SAS® Enterprise Miner™, and SAS® Visual Analytics. The researchers analyzed Western Kentucky University's math course placement procedure for incoming students and assessed the criteria used for admissions decisions, including those for first-time first-year students, transfer students, and students readmitted to the University after being dismissed for unsatisfactory academic progress--procedures and criteria previously designed with the students' best interests at heart. The presenters discuss the statistical analyses used to evaluate the policies and the use of SAS Visual Analytics to present their results to administrators in a visual manner. In addition, the presenters discuss subsequent changes in the policies, and where possible, the results of the policy changes.
Read the paper (PDF).
Tuesdi Helbig, Western Kentucky University
Matthew Foraker, Western Kentucky University
L
Paper 3297-2015:
Lasso Regularization for Generalized Linear Models in Base SAS® Using Cyclical Coordinate Descent
The cyclical coordinate descent method is a simple algorithm that has been used for fitting generalized linear models with lasso penalties by Friedman et al. (2007). The coordinate descent algorithm can be implemented in Base SAS® to perform efficient variable selection and shrinkage for GLMs with the L1 penalty (the lasso).
Read the paper (PDF).
Robert Feyerharm, Beacon Health Options
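For reference, the coordinate-wise update at the heart of the algorithm, for a linear model with standardized predictors, is the soft-thresholding rule

    beta_j <- S(z_j, lambda) = sign(z_j) * max(|z_j| - lambda, 0)

where z_j is the average cross-product of predictor j with the partial residuals computed without predictor j's current contribution; for GLMs with the L1 penalty, the same update is applied inside an iteratively reweighted least squares loop.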
Paper SAS1955-2015:
Latest and Greatest: Best Practices for Migrating to SAS® 9.4
SAS® customers benefit greatly when they are using the functionality, performance, and stability available in the latest version of SAS. However, the task of moving all SAS collateral such as programs, data, catalogs, metadata (stored processes, maps, queries, reports, and so on), and content to SAS® 9.4 can seem daunting. This paper provides an overview of the steps required to move all SAS collateral from systems based on SAS® 9.2 and SAS® 9.3 to the current release of SAS® 9.4.
Read the paper (PDF).
Alec Fernandez, SAS
Paper 3372-2015:
Leads and Lags: Static and Dynamic Queues in the SAS® DATA Step
From stock price histories to hospital stay records, analysis of time series data often requires the use of lagged (and occasionally lead) values of one or more analysis variables. For the SAS® user, the central operational task is typically getting lagged (lead) values for each time point in the data set. Although SAS has long provided a LAG function, it has no analogous lead function--an especially significant problem in the case of large data series. This paper reviews the LAG function (in particular, the powerful but non-intuitive implications of its queue-oriented basis), demonstrates efficient ways to generate leads with the same flexibility as the LAG function (but without the common and expensive recourse of data re-sorting), and shows how to dynamically generate leads and lags through the use of the hash object.
Read the paper (PDF). | Download the data file (ZIP).
Mark Keintz, Wharton Research Data Services
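The classic re-merge trick for leads that the paper builds upon can be sketched as follows (single series; grouped data needs additional BY-group handling):

    data work.both;
      merge work.prices
            work.prices(firstobs=2 keep=price rename=(price=lead_price));
      lag_price = lag(price);   /* queue-based: execute it unconditionally */
      /* the final observation gets a missing lead_price automatically     */
    run;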
Paper 1408-2015:
Learn Hidden Ideas in Base SAS® to Impress Colleagues
Across the languages of SAS® are many golden nuggets--functions, formats, and programming features just waiting to impress your friends and colleagues. Learning SAS over 30+ years, I have collected a few, and I offer them to you in this presentation.
Read the paper (PDF).
Peter Crawford, Crawford Software Consultancy Limited
Paper 3514-2015:
Learn How Slalom Consulting and Celebrity Cruises Bridge the Marketing Campaign Attribution Divide
Although today's marketing teams enjoy large-scale campaign relationship management systems, many are still left with the task of bridging the well-known gap between campaigns and customer purchasing decisions. During this session, we discuss how Slalom Consulting and Celebrity Cruises decided to take a bold step and bridge that gap. We show how marketing efforts are distorted when a team considers only the last campaign sent to a customer that later booked a cruise. Then we lay out a custom-built SAS 9.3 solution that scales to process thousands of campaigns per month using a stochastic attribution technique. This approach considers all of the campaigns that touch the customer, assigning a single campaign or a set of campaigns that contributed to their decision.
Christopher Byrd, Slalom Consulting
Paper 3203-2015:
Learning Analytics to Evaluate and Confirm Pedagogic Choices
There are many pedagogic theories and practices that academics research and follow as they strive to ensure excellence in their students' achievements. In order to validate the impact of different approaches, there is a need to apply analytical techniques to evaluate the changing levels of achievement that occur as a result of changes in applied pedagogy. The analytics used should be easily accessible to all academics, with minimal overhead in terms of the collection of new data. This paper is based on a case study of the changing pedagogical approaches of the author over the past five years, using grade profiles from a wide range of modules taught by the author in both the School of Computing and Maths and the Business School at the University of Derby. Base SAS® and SAS® Studio were used to evaluate and demonstrate the impact of the change from a pedagogical position of 'Academic as Domain Expert' to one of 'Academic as Learning-to-Learn Expert'. This change resulted in greater levels of research that supported learning, along with better writing skills. The application of Learning Analytics in this case study demonstrates a very significant improvement of between 15% and 20% in the grade profiles of all students. More surprisingly, it also eliminates a significant grade deficit in the black and minority ethnic student population, which is typically about 15% at a large number of UK universities.
Read the paper (PDF).
Richard Self, University of Derby
Paper SAS1748-2015:
Lost in the Forest Plot? Follow the Graph Template Language AXISTABLE Road!
A forest plot is a common visualization for meta-analysis. Some popular versions that use subgroups with indented text and bold fonts can seem outright daunting to create. With SAS® 9.4, the Graph Template Language (GTL) has introduced the AXISTABLE statement, specifically designed for including text data columns into a graph. In this paper, we demonstrate the simplicity of creating various forest plots using AXISTABLE statements. Come and see how to create forest plots as clear as day!
Read the paper (PDF). | Download the data file (ZIP).
Prashant Hebbar, SAS
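A bare-bones GTL skeleton of the idea, with data set and column names hypothetical (real forest plots add error bars, headers, and formatting):

    proc template;
      define statgraph forest;
        begingraph;
          layout lattice / columns=2 columnweights=(0.3 0.7);
            layout overlay / yaxisopts=(reverse=true display=none);
              axistable y=study value=label;          /* text data column */
            endlayout;
            layout overlay / yaxisopts=(reverse=true display=none);
              scatterplot y=study x=hr /
                xerrorlower=lcl xerrorupper=ucl;      /* estimate and CI  */
              referenceline x=1;
            endlayout;
          endlayout;
        endgraph;
      end;
    run;

    proc sgrender data=work.meta template=forest; run;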
M
Paper 2180-2015:
Maintaining a 'Look and Feel' throughout a Reporting Package Created with Diverse SAS® Products
SAS® provides a number of tools for creating customized professional reports. While SAS provides point-and-click interfaces through products such as SAS® Web Report Studio, SAS® Visual Analytics, or even SAS® Enterprise Guide®, many users do not have access to these high-end tools and require customization beyond the SAS Enterprise Guide point-and-click interface. Fortunately, Base SAS® procedures such as the REPORT procedure, combined with graphics procedures, macros, ODS, and Annotate, can be used to create very customized professional reports. When tying together different solutions such as SAS Statistical Graphics, the REPORT procedure, ODS, and SAS/GRAPH®, different techniques need to be used to keep the same look and feel throughout the report package. This presentation looks at solutions that can be used to keep a consistent look and feel in a report package created with different SAS products.
Read the paper (PDF).
Barbara Okerson, Anthem
Paper 2481-2015:
Managing Extended Attributes With a SAS® Enterprise Guide® Add-In
SAS® 9.4 introduced extended attributes, which are name-value pairs that can be attached to either the data set or to individual variables. Extended attributes are managed through PROC DATASETS and can be viewed through PROC CONTENTS or through Dictionary.XATTRS. This paper describes the development of a SAS® Enterprise Guide® custom add-in that allows for the entry and editing of extended attributes, with the possibility of using a controlled vocabulary. The controlled vocabulary used in the initial application is derived from the lifecycle branch of the Data Documentation Initiative metadata standard (DDI-L).
Read the paper (PDF).
Larry Hoyle, IPSR, Univ. of Kansas
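In outline, attaching and viewing extended attributes looks like this (the attribute names are hypothetical examples, not the DDI-L vocabulary itself):

    proc datasets lib=work nolist;
      modify survey;
      xattr add ds source='2015 intake file';              /* data set level */
      xattr add var income (unit='USD' concept='Income');  /* variable level */
    quit;

    /* the attributes appear in PROC CONTENTS and Dictionary.XATTRS */
    proc contents data=work.survey; run;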
Paper 2884-2015:
Managing Files by Click and Drag?
File management is a tedious process that can be automated by using SAS® to create and execute a Windows command script. The macro in this paper copies files from one location to another, identifies obsolete files by the version number, and then moves them to an archive folder. Assuming that some basic conditions are met, this macro is intended to be easy to use and robust. Windows users who run routine programs for projects with rework might want to consider this solution.
Read the paper (PDF).
Jason Wachsmuth, Public Policy Center at The University of Iowa
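The underlying pattern, reduced to essentials (all paths hypothetical; Windows only):

    filename script 'C:\temp\archive.cmd';

    data _null_;
      file script;
      put 'copy /Y "C:\reports\*.pdf" "C:\archive\"';
    run;

    options noxwait xsync;
    x 'C:\temp\archive.cmd';    /* execute the generated command script */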
Paper 3399-2015:
Managing Qualtrics Survey Distributions and Response Data with SAS®
Qualtrics is an online survey tool that offers a variety of features useful to researchers. In this paper, we show you how to implement the different options available for distributing surveys and downloading survey responses. We use the FILENAME statement (URL access method) and process the API responses with SAS® XML Mapper. In addition, we show an approach for how to keep track of active and inactive respondents.
Read the paper (PDF).
Faith Parsons, Columbia University Medical Center
Sean Mota, Columbia University Medical Center
Yan Quan, Columbia University
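The download side rests on the URL access method; a heavily simplified sketch follows (the endpoint and token are placeholders, not the real Qualtrics API):

    filename resp url
      'https://yourdc.qualtrics.com/API/...&apiToken=YOURTOKEN';

    data work.survey_raw;                  /* capture the raw API response */
      infile resp lrecl=32767 truncover;
      input line $char32767.;
    run;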
Paper 3193-2015:
Mapping out SG Procedures and Using PROC SGPLOT for Mapping
This paper describes the main functions of the SAS® SG procedures and their relations, and it also offers a way to create data-colored maps using these procedures. First, the basics: for a few years now, the SG procedures (PROC SGPLOT, PROC SGSCATTER, PROC SGPANEL, and so on) have been part of Base SAS® and thus available to everybody. SG originated as 'Statistical Graphics', but nowadays the procedures are often referred to as SAS® ODS Graphics. With the syntax spread across a 1,000+ page document, it is quite a challenge to start using them. Also, SAS® Enterprise Guide® currently has no graphics tasks that generate code for the SG procedures (except those in the statistical arena). For a long time, SAS/GRAPH® has been the vehicle for producing presentation-ready graphs of your data, and SAS users who have experience with the SAS/GRAPH procedures in particular will hesitate to change over. But the SG procedures continue to be enhanced with new features, and because the appearance of many elements is governed by the ODS styles, they are very well suited to providing a consistent style across all your output, text and graphics alike. The paper first describes the basic procedure that a user will start with: PROC SGPLOT. Then the more elaborate possibilities of PROC SGPANEL and PROC SGSCATTER are described. Both of these procedures can create a matrix or panel of graphs, and their different goals will be explained: comparing a group of variables versus comparing the levels of two variables. PROC SGPLOT can create many different graphs: histograms, time series, scatter plots, and so on. PROC SGPANEL has essentially the same possibilities, while the nature of PROC SGSCATTER (as the name says) limits it to scatter-like graphs. Many statements and options are common to lots of graph types, and this paper groups them logically, making clear what the procedures have in common and where they differ. Related to the SG procedures are two utilities (the Graphics Editor and the Graphics Designer), which are delivered as SAS® Foundation applications; the paper describes the relations between these utilities, the objects they produce, and the relevant SG procedures. Finally, creating a map: for virtually all tasks that can be performed with the well-known SAS/GRAPH procedures, a counterpart in the SG procedures is easily pointed out, often with more extensive features, but this is not the case for the maps produced with PROC GMAP. This paper shows the few steps that are necessary to convert the data sets that contain your data and your map coordinates into data sets that enable you to use the power and features of PROC SGPLOT to create your map in any projection system and any coordinate window.
Read the paper (PDF). | Download the data file (ZIP).
Frank Poppe, PW Consulting
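The heart of the map technique is the SGPLOT POLYGON statement; once the data and map coordinates are joined, a sketch might look like this (data set and column names hypothetical):

    proc sgplot data=work.mapready noautolegend;
      polygon x=x y=y id=state / colorresponse=rate fill outline;
    run;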
Paper 3375-2015:
Maximizing a Churn Campaign's Profitability with Cost-sensitive Predictive Analytics
Predictive analytics has been widely studied in recent years, and it has been applied to solve a wide range of real-world problems. Nevertheless, current state-of-the-art predictive analytics models are not well aligned with managers' requirements in that the models fail to include the real financial costs and benefits during the training and evaluation phases. Churn predictive modeling is one of those examples in which evaluating a model based on a traditional measure such as accuracy or predictive power does not yield the best results when measured by investment per subscriber in a loyalty campaign and the financial impact of failing to detect a real churner versus wrongly predicting a non-churner as a churner. In this paper, we propose a new financially based measure for evaluating the effectiveness of a voluntary churn campaign, taking into account the available portfolio of offers, their individual financial cost, and the probability of acceptance depending on the customer profile. Then, using a real-world churn data set, we compared different cost-insensitive and cost-sensitive predictive analytics models and measured their effectiveness based on their predictive power and cost optimization. The results show that using a cost-sensitive approach yields an increase in profitability of up to 32.5%.
Alejandro Correa Bahnsen, University of Luxembourg
Darwin Amezquita, DIRECTV
Juan Camilo Arias, Smartics
Paper 3760-2015:
Methodological and Statistical Issues in Provider Performance Assessment
With the move to value-based benefit and reimbursement models, it is essential to quantify the relative cost, quality, and outcome of a service. Accurately measuring the cost and quality of doctors, practices, and health systems is critical when you are developing a tiered network, a shared savings program, or a pay-for-performance incentive. Limitations in claims payment systems require developing methodological and statistical techniques to improve the validity and reliability of providers' scores on cost and quality of care. This talk discusses several key concepts in the development of a measurement system for provider performance, including measure selection, risk adjustment methods, and peer group benchmark development.
Read the paper (PDF). | Watch the recording.
Daryl Wansink, Qualmetrix, Inc.
Paper 1381-2015:
Model Risk and Corporate Governance of Models with SAS®
Banks can create a competitive advantage in their business by using business intelligence (BI) and by building models. In the credit domain, the best practice is to build risk-sensitive models (Probability of Default, Exposure at Default, Loss-given Default, Unexpected Loss, Concentration Risk, and so on) and implement them in decision-making, credit granting, and credit risk management. There are models and tools on the next level built on these models and that are used to help in achieving business targets, risk-sensitive pricing, capital planning, optimizing of ROE/RAROC, managing the credit portfolio, setting the level of provisions, and so on. It works remarkably well as long as the models work. However, over time, models deteriorate and their predictive power can drop dramatically. Since the global financial crisis in 2008, we have faced a tsunami of regulation and accelerated frequency of changes in the business environment, which cause models to deteriorate faster than ever before. As a result, heavy reliance on models in decision-making (some decisions are automated following the model's results--without human intervention) might result in a huge error that can have dramatic consequences for the bank's performance. In my presentation, I share our experience in reducing model risk and establishing corporate governance of models with the following SAS® tools: model monitoring, SAS® Model Manager, dashboards, and SAS® Visual Analytics.
Read the paper (PDF).
Boaz Galinson, Bank Leumi
Paper SAS1715-2015:
More Data, Less Chatter: Improving Performance on z/OS via IBM zHPF
This paper describes how we reduced elapsed time for the third maintenance release for SAS® 9.4 by as much as 22% by using the High Performance FICON for IBM System z (zHPF) facility to perform I/O for SAS® files on IBM mainframe systems. The paper details the performance improvements, internal testing to quantify improvements, and the customer actions needed to enable zHPF on their system. The benefits of zHPF are discussed within the larger context of other techniques that a customer can use to accelerate processing of SAS files.
Read the paper (PDF).
Lewis King, SAS
Fred Forst
Paper 2720-2015:
Multinomial Logistic Model for Long-Term Value
Customer Long-Term Value (LTV) is a concept that is readily explained at a high level to marketing management of a company, but its analytic development is complex. This complexity involves the need to forecast customer behavior well into the future. This behavior includes the timing, frequency, and profitability of a customer's future purchases of products and services. This paper describes a method for computing LTV. First, a multinomial logistic regression provides probabilities for time-of-first-purchase, time-of-second-purchase, and so on, for each customer. Then the profits for the first purchase, second purchase, and so on, are forecast but only after adjustment for non-purchaser selection bias. Finally, these component models are combined in the LTV formula.
Read the paper (PDF).
Bruce Lund, Marketing Associates, LLC
Paper 3264-2015:
Multiple Product Affinity Makes Much More Sense
Retailers proactively seek a data-driven approach to provide customized product recommendations that increase sales and customer loyalty. Product affinity models have been recognized as one of the vital tools for this purpose. The algorithm assigns a customer to a product affinity group when the likelihood of purchasing is the highest and the likelihood meets a minimum, absolute requirement. In practice, however, valuable customers, up to 30% of the total universe, who buy across multiple product categories with two or more balanced product affinity likelihoods, remain unassigned and cannot be given effective product recommendations. This paper presents multiple product affinity models, developed using the SAS® macro language, to address the problem. We demonstrate how the innovative assignment algorithm successfully assigns these previously undefined customers to appropriate multiple product affinity groups using nationwide retailer transactional data. In addition, the results show that customers establish loyalty through migration from a single product affinity group to multiple groups. This comprehensive and insightful business solution is shared in this paper, along with a clustering algorithm and nonparametric tree model used for model building. The SAS macro code for customer assignment is provided in an appendix.
Read the paper (PDF).
Hsin-Yi Wang, Alliance Data Systems
Paper 2900-2015:
Multiple Ways to Detect Differential Item Functioning in SAS®
Differential item functioning (DIF), as an assessment tool, has been widely used in quantitative psychology, educational measurement, business management, insurance, and health care. The purpose of DIF analysis is to detect response differences of items in questionnaires, rating scales, or tests across different subgroups (for example, gender) and to ensure the fairness and validity of each item for those subgroups. The goal of this paper is to demonstrate several ways to conduct DIF analysis by using different SAS® procedures (PROC FREQ, PROC LOGISTIC, PROC GENMOD, PROC GLIMMIX, and PROC NLMIXED) and their applications. There are three general methods to examine DIF: generalized Mantel-Haenszel (MH), logistic regression, and item response theory (IRT). The SAS® System provides flexible procedures for all these approaches. There are two types of DIF: uniform DIF, which remains consistent across ability levels, and non-uniform DIF, which varies across ability levels. Generalized MH is a nonparametric method and is often used to detect uniform DIF, while the other two are parametric methods that examine both uniform and non-uniform DIF. In this study, I first describe the underlying theories and mathematical formulations for each method. Then I show the SAS statements, input data format, and SAS output for each method, followed by a detailed demonstration of the differences among the three methods. Specifically, PROC FREQ is used to calculate generalized MH only for dichotomous items. PROC LOGISTIC and PROC GENMOD are used to detect DIF by using logistic regression. PROC NLMIXED and PROC GLIMMIX are used to examine DIF by applying an exploratory item response theory model. Finally, I use SAS/IML® to call two R packages (that is, difR and lordif) to conduct DIF analysis and then compare the results between SAS procedures and R packages. An example data set, the Verbal Aggression assessment, which includes 316 subjects and 24 items, is used in this study. Following the general DIF analysis, the male group is used as the reference group, and the female group is used as the focal group. All the analyses are conducted by SAS® 9.3 and R 2.15.3. The paper closes with the conclusion that the SAS System provides different flexible and efficient ways to conduct DIF analysis. However, it is essential for SAS users to understand the underlying theories and assumptions of different DIF methods and apply them appropriately in their DIF analyses.
Read the paper (PDF).
Yan Zhang, Educational Testing Service
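Two of the approaches, reduced to single statements (the item, group, and score variables are hypothetical):

    /* generalized Mantel-Haenszel: stratify by total score */
    proc freq data=work.verbal;
      tables total_score*gender*item1 / cmh;
    run;

    /* logistic regression: main effect = uniform DIF,   */
    /* interaction term   = non-uniform DIF              */
    proc logistic data=work.verbal;
      model item1(event='1') = total_score gender total_score*gender;
    run;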
N
Paper 3253-2015:
Need Additional Statistics in That Report? ODS OUTPUT to the Rescue!
You might be familiar with or experienced in writing or running reports using PROC REPORT, PROC TABULATE, or other methods of report generation. These reporting methods are often very flexible, but they can be limited in the statistics that are available as options for inclusion in the resulting output. SAS® can produce a wide variety of statistics through Base SAS® and SAS/STAT® procedures, and ODS OUTPUT can capture them. These procedures include PROC CORR, PROC FREQ, and PROC UNIVARIATE in Base SAS, as well as PROC GLM, PROC LIFETEST, PROC MIXED, PROC LOGISTIC, and PROC TTEST in SAS/STAT. A number of other procedures can also produce useful ODS OUTPUT objects. Commonly requested statistics for reports include p-values, confidence intervals, and test statistics. These values can be computed with the appropriate procedure; ODS OUTPUT then writes the desired information to a data set, where it can be combined with the other data used to produce the report. Examples demonstrate how to easily generate the desired statistics and include them in the requested final reports.
Read the paper (PDF).
Debbie Buck, inVentiv Health Clinical
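The core pattern is small; this example captures the t-test table from a data set shipped with SAS:

    ods output TTests=work.ttest_stats;   /* the procedure's ODS table name */
    proc ttest data=sashelp.class;
      class sex;
      var height;
    run;

    proc print data=work.ttest_stats;     /* p-values now live in a data set */
    run;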
Paper SAS1575-2015:
New Macro Features Added in SAS® 9.3 and SAS® 9.4
This paper describes the new features added to the macro facility in SAS® 9.3 and SAS® 9.4. New features described include the /READONLY option for macro variables, the %SYSMACEXIST macro function, the %PUT &= feature, and new automatic macro variables such as &SYSTIMEZONEOFFSET.
Read the paper (PDF).
Rick Langston, SAS
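Three of the features in a few lines each (names and values hypothetical):

    %let study = ABC-123;
    %put &=study;                        /* log shows: STUDY = ABC-123    */

    %global / readonly version = 1.0;    /* later %LET attempts will fail */

    %put %sysmacexist(mymacro);          /* 1 if compiled, 0 otherwise    */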
Paper 3455-2015:
Nifty Uses of SQL Reflexive Join and Subquery in SAS®
SAS® SQL is so powerful that you hardly miss using Oracle PL/SQL. One SAS SQL forte can be found in using the SQL reflexive join. Another area of SAS SQL strength is the SQL subquery concept. The focus of this paper is to show alternative approaches to data reporting and to show how to surface data quality problems using reflexive join and subquery SQL concepts. The target audience for this paper is the intermediate SAS programmer or the experienced ANSI SQL programmer new to SAS programming.
Read the paper (PDF).
Cynthia Trinidad, Theorem Clinical Research
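Two compact illustrations of the concepts, with table names hypothetical:

    proc sql;
      /* reflexive join: consecutive visits where the dose changed */
      select a.id, a.visit, a.dose as dose_before, b.dose as dose_after
        from work.meds a
             join work.meds b
               on a.id = b.id and a.visit + 1 = b.visit
        where a.dose ne b.dose;

      /* subquery: surface IDs missing from the demographics table */
      select id
        from work.meds
        where id not in (select id from work.demog);
    quit;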
O
Paper 2680-2015:
Object Oriented Program Design in DS2
The DS2 programming language was introduced as part of the SAS® 9.4 release. Although this new language introduced many significant advancements, one of the most overlooked features is the addition of object-oriented programming constructs. Specifically, the addition of user-defined packages and methods enables programmers to create their own objects, greatly increasing the opportunity for code reuse and decreasing both development and QA duration. In addition, using this object-oriented approach provides a powerful design methodology where objects closely resemble the real-world entities that they model, leading to programs that are easier to understand and maintain. This paper introduces the object-oriented programming paradigm in a three-step manner. First, the key object-oriented features found in the DS2 language are introduced, and the value each provides is discussed. Next, these object-oriented concepts are demonstrated through the creation of a blackjack simulation where the players, the dealer, and the deck are modeled and coded as objects. Finally, a credit risk scoring object is presented to demonstrate the application of this approach in a real-world setting.
Read the paper (PDF).
Shaun Kaufmann, Farm Credit Canada
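A toy version of the blackjack-style object idea (the RAND function is assumed available in DS2 here):

    proc ds2;
      package die / overwrite=yes;
        method roll() returns double;
          return ceil(rand('uniform') * 6);   /* a value from 1 to 6 */
        end;
      endpackage;

      data _null_;
        method run();
          dcl package die d();                /* instantiate the object */
          put 'You rolled:' d.roll();
        end;
      enddata;
      run;
    quit;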
Paper SAS1520-2015:
Operations Integration, Audits, and Performance Analysis: Getting the Most Out of SAS® Environment Manager
The SAS® Environment Manager Service Architecture expands on the core monitoring capabilities of SAS® Environment Manager delivered in SAS® 9.4. Multiple sources of data available in the SAS® Environment Manager Data Mart--traditional operational performance metrics, events, and ARM, audit, and access logs--together with built-in and custom reports put powerful capabilities into the hands of IT operations. This paper introduces the concept of service-oriented event identification and discusses how to use the new architecture and tools effectively, as well as the wealth of data available in the SAS Environment Manager Data Mart. In addition, extensions for importing new data, writing custom reports, instrumenting batch SAS® jobs, and leveraging and extending auditing capabilities are explored.
Read the paper (PDF).
Bob Bonham, SAS
Bryan Ellington, SAS
Paper 3296-2015:
Out of Control! A SAS® Macro to Recalculate QC Statistics
SAS/QC® provides procedures, such as PROC SHEWHART, to produce control charts with centerlines and control limits. When quality improvement initiatives shift a process out of control relative to its historical limits, centerlines and control limits need to be recalculated. While this is not a complicated process, producing many charts with multiple centerline shifts can quickly become difficult. This paper illustrates the use of a macro to efficiently compute centerlines and control limits when one or more recalculations are needed for multiple charts.
Read the paper (PDF).
Jesse Pratt, Cincinnati Children's Hospital Medical Center
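The recalculation idea in miniature: derive limits from one period and apply them to another (data set and variable names hypothetical):

    /* compute limits from the baseline period only */
    proc shewhart data=work.baseline;
      xschart weight * batch / outlimits=work.limits nochart;
    run;

    /* chart current data against the stored limits */
    proc shewhart data=work.current limits=work.limits;
      xschart weight * batch;
    run;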
P
Paper 3459-2015:
PROC CATALOG, the Wish Book SAS® Procedure
SAS® data sets have PROC DATASETS, and SAS catalogs have PROC CATALOG. Find out what the little-known PROC CATALOG can do for you!
Read the paper (PDF).
Louise Hadden, Abt Associates Inc.
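A two-statement taste of what it can do:

    proc catalog catalog=work.formats;
      contents;                   /* list every entry in the catalog */
      copy out=work.fmtbackup;    /* copy entries to another catalog */
    run;
    quit;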
Paper 3370-2015:
PROC RANK, PROC SQL, PROC FORMAT, and PROC GMAP Team Up and a (Map) Legend Is Born!
The task was to produce a figure legend that gave the quintile ranges of a continuous measure corresponding to each color on a five-color choropleth US map. Actually, we needed to produce the figures and associated legends for several dozen maps for several dozen different continuous measures and time periods, as well as create the associated alt text for compliance with Section 508. So, the process needed to be automated. A method was devised using PROC RANK to generate the quintiles, PROC SQL to get the data value ranges within each quintile, and PROC FORMAT (with the CNTLIN= option) to generate and store the legend labels. The resulting data files and format catalogs were used to generate both the maps (with legends) and associated alt text. Then, these processes were rolled into a macro to apply the method for the many different maps and their legends. Each part of the method is quite simple--even mundane--but together, these techniques enabled us to standardize and automate an otherwise very tedious process. The same basic strategy could be used whenever you need to dynamically generate data buckets and keep track of the bucket boundaries (for producing labels, map legends, or alt text or for benchmarking future data against the stored categories).
Read the paper (PDF).
Christianna Williams, Self-Employed
Louise Hadden, Abt Associates Inc.
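A skeleton of the quintile-to-legend pipeline (measure and data set names hypothetical):

    proc rank data=work.measures out=work.ranked groups=5;
      var rate;
      ranks rate_q;                       /* quintile number, 0..4 */
    run;

    proc sql;                             /* value range within each quintile */
      create table work.legend as
      select rate_q, min(rate) as lo, max(rate) as hi
        from work.ranked
        group by rate_q;
    quit;

    data work.ctrl;                       /* CNTLIN data set for PROC FORMAT */
      set work.legend;
      fmtname = 'rateq';
      start   = rate_q;
      label   = catx(' - ', put(lo, 8.2), put(hi, 8.2));
    run;

    proc format cntlin=work.ctrl; run;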
Paper 3154-2015:
PROC SQL for PROC SUMMARY Stalwarts
One of the fascinating features of SAS® is that the software often provides multiple ways to accomplish the same task. A perfect example of this is the aggregation and summarization of data across multiple rows or BY groups of interest. These groupings can be study participants, time periods, geographical areas, or just about any type of discrete classification that you want. While many SAS programmers might be accustomed to accomplishing these aggregation tasks with PROC SUMMARY (or equivalently, PROC MEANS), PROC SQL can also do a bang-up job of aggregation--often with less code and fewer steps. This step-by-step paper explains how to use PROC SQL for a variety of summarization and aggregation tasks. It uses a series of concrete, task-oriented examples to do so. The presentation style is similar to that used in the author's previous paper, 'PROC SQL for DATA Step Die-Hards.'
Read the paper (PDF).
Christianna Williams, Self-Employed
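The flavor of the comparison, using a data set shipped with SAS:

    /* PROC SUMMARY version */
    proc summary data=sashelp.class nway;
      class sex;
      var height weight;
      output out=work.stats1 mean= n= / autoname;
    run;

    /* equivalent PROC SQL version */
    proc sql;
      create table work.stats2 as
      select sex,
             mean(height) as height_mean,
             mean(weight) as weight_mean,
             count(*)     as n
        from sashelp.class
        group by sex;
    quit;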
Paper 3340-2015:
Performing Efficient Transposes on Large Teradata Tables Using SQL Explicit Pass-Through
It is a common task to reshape your data from long to wide for the purpose of reporting or analytical modeling and PROC TRANSPOSE provides a convenient way to accomplish this. However, when performing the transpose action on large tables stored in a database management system (DBMS) such as Teradata, the performance of PROC TRANSPOSE can be significantly compromised. In this case, it is more efficient for the DBMS to perform the transpose task. SAS® provides in-database processing technology in PROC SQL, which allows the SQL explicit pass-through method to push some or all of the work to the DBMS. This technique has facilitated integration between SAS and a wide range of data warehouses and databases, including Teradata, EMC Greenplum, IBM DB2, IBM Netezza, Oracle, and Aster Data. This paper uses the Teradata database as an example DBMS and explains how to transpose a large table that resides in it using the SQL explicit pass-through method. The paper begins with comparing the execution time using PROC TRANSPOSE with the execution time using SQL explicit pass-through. From this comparison, it is clear that SQL explicit pass-through is more efficient than the traditional PROC TRANSPOSE when transposing Teradata tables, especially large tables. The paper explains how to use the SQL explicit pass-through method and discusses the types of data columns that you might need to transpose, such as numeric and character. The paper presents a transpose solution for these types of columns. Finally, the paper provides recommendations on packaging the SQL explicit pass-through method by embedding it in a macro. SAS programmers who are working with data stored in an external DBMS and who would like to efficiently transpose their data will benefit from this paper.
Read the paper (PDF).
Tao Cheng, Accenture
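A condensed sketch of the pass-through transpose (credentials, table, and column names are placeholders):

    proc sql;
      connect to teradata (user=myuser password=XXXX server=mytdp);
      create table work.wide as
      select * from connection to teradata (
        select id,
               max(case when month = 1 then sales end) as sales_m1,
               max(case when month = 2 then sales end) as sales_m2,
               max(case when month = 3 then sales end) as sales_m3
          from sales_long
          group by id
      );
      disconnect from teradata;
    quit;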
Paper 2440-2015:
Permit Me to Permute: A Basic Introduction to Permutation Tests with SAS/IML® Software
If your data do not meet the assumptions for a standard parametric test, you might want to consider using a permutation test. By randomly shuffling the data and recalculating a test statistic, a permutation test can calculate the probability of getting a value equal to or more extreme than an observed test statistic. With the power of matrices, vectors, functions, and user-defined modules, the SAS/IML® language is an excellent option. This paper covers two examples of permutation tests: one for paired data and another for repeated measures analysis of variance. For those new to SAS/IML® software, this paper offers a basic introduction and examples of how effective it can be.
Read the paper (PDF).
John Vickery, North Carolina State University
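A compact paired-data example in the spirit of the paper: a sign-flip permutation test on hypothetical paired differences.

    proc iml;
      use work.paired;  read all var {diff} into d;  close work.paired;

      obs  = mean(d);                       /* observed mean difference */
      nrep = 10000;
      stats = j(nrep, 1, .);
      call randseed(20150101);
      do i = 1 to nrep;
        s = j(nrow(d), 1, .);
        call randgen(s, 'Bernoulli', 0.5);
        s = 2#s - 1;                        /* random +1/-1 sign flips  */
        stats[i] = mean(s # d);
      end;
      pval = mean(abs(stats) >= abs(obs));  /* two-sided p-value        */
      print obs pval;
    quit;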
Paper 3516-2015:
Piecewise Linear Mixed Effects Models Using SAS
Evaluation of the impact of critical or high-risk events or periods in longitudinal studies of growth might provide clues to the long-term effects of life events and the efficacy of preventive and therapeutic interventions. Conventional linear longitudinal models typically involve a single growth profile to represent linear changes in an outcome variable across time, which sometimes does not fit the empirical data. Piecewise linear mixed-effects models allow different linear functions of time corresponding to the pre- and post-critical time point trends. This presentation shows: 1) how to fit piecewise linear mixed-effects models in SAS, step by step, in the context of a clinical trial with two-arm interventions and a predictive covariate of interest; 2) how to obtain the slopes and corresponding p-values for intervention and control groups during the pre- and post-critical periods, conditional on different values of the predictive covariate; and 3) how to make meaningful comparisons and present results in a scientific manuscript. A SAS macro to generate summary tables assisting the interpretation of the results is also provided.
Qinlei Huang, St Jude Children's Research Hospital
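The key coding device is splitting time at the knot; a sketch with a hypothetical knot at month 6 (data set and variable names hypothetical):

    data work.long2;
      set work.long;                 /* one row per subject-month */
      t_pre  = min(month, 6);        /* slope before the event    */
      t_post = max(month - 6, 0);    /* extra slope after it      */
    run;

    proc mixed data=work.long2;
      class id arm;
      model y = arm t_pre t_post arm*t_pre arm*t_post / solution;
      random intercept / subject=id;
    run;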
Paper 3241-2015:
Pin the SAS® Tail on the Microsoft Excel Donkey: Automatically Sizing and Positioning SAS Graphics for Excel
Okay, you've read all the books, manuals, and papers and can produce graphics with SAS/GRAPH® and Output Delivery System (ODS) Graphics with the best of them. But how do you handle the Final Mile problem--getting your images generated in SAS® sized just right and positioned just so in Microsoft Excel? This paper presents a method of doing so that employs SAS Integration Technologies and Excel Visual Basic for Applications (VBA) to produce SAS graphics and automatically embed them in Excel worksheets. This technique might be of interest to all skill levels. It uses Base SAS®, SAS/GRAPH, ODS Graphics, the SAS macro facility, SAS® Integration Technologies, Microsoft Excel, and VBA.
Read the paper (PDF).
Ted Conway, Self
Paper 1884-2015:
Practical Implications of Sharing Data: A Primer on Data Privacy, Anonymization, and De-Identification
Researchers, patients, clinicians, and other health-care industry participants are forging new models for data-sharing in hopes that the quantity, diversity, and analytic potential of health-related data for research and practice will yield new opportunities for innovation in basic and translational science. Whether we are talking about medical records (for example, EHR, lab, notes), administrative data (claims and billing), social (on-line activity), behavioral (fitness trackers, purchasing patterns), contextual (geographic, environmental), or demographic data (genomics, proteomics), it is clear that as health-care data proliferates, threats to security grow. Beginning with a review of the major health-care data breaches in our recent history, we highlight some of the lessons that can be gleaned from these incidents. In this paper, we talk about the practical implications of data sharing and how to ensure that only the right people have the right access to the right level of data. To that end, we explore not only the definitions of concepts like data privacy, but we discuss, in detail, methods that can be used to protect data--whether inside our organization or beyond its walls. In this discussion, we cover the fundamental differences between encrypted data, 'de-identified', 'anonymous', and 'coded' data, and methods to implement each. We summarize the landscape of maturity models that can be used to benchmark your organization's data privacy and protection of sensitive data.
Read the paper (PDF). | Watch the recording.
Greg Nelson, ThotWave
Paper 3307-2015:
Preparing Output from Statistical Procedures for Publication, Part 1: PROC REG to APA Format
Many scientific and academic journals require that statistical tables be created in a specific format, with one of the most common formats being that of the American Psychological Association (APA). The APA publishes a substantial guide book to writing and formatting papers, including an extensive section on creating tables (Nichol 2010). However, the output generated by SAS® procedures does not match this style. This paper discusses techniques to change the SAS procedure output to match the APA guidelines using SAS ODS (Output Delivery System).
Read the paper (PDF).
Vince DelGobbo, SAS
Peter Flom, Peter Flom Consulting
Paper 2103-2015:
Preparing Students for the Real World with SAS® Studio
A common complaint of employers is that educational institutions do not prepare students for the types of messy data and multi-faceted requirements that occur on the job. No organization has data that resembles the perfectly scrubbed data sets in the back of a statistics textbook. The objective of the Annual Report Project is to quickly bring new SAS® users to a level of competence where they can use real data to meet real business requirements. Many organizations need annual reports for stockholders, funding agencies, or donors. Or, they need annual reports at the department or division level for an internal audience. Being tapped as part of the team creating an annual report used to mean weeks of tedium, poring over columns of numbers in 8-point font in (shudder) Excel spreadsheets, but no more. No longer painful, using a few SAS procedures and functions, reporting can be easy and, dare I say, fun. All analyses are done using SAS® Studio (formerly SAS® Web Editor) in SAS® OnDemand for Academics. This paper uses an example with actual data for a report prepared to comply with federal grant funding requirements as proof that, yes, it really is that simple.
Read the paper (PDF). | Watch the recording.
AnnMaria De Mars
Paper 2863-2015:
"Puck Pricing": Dynamic Hockey Ticket Price Optimization
Dynamic pricing is a real-time strategy where corporations attempt to alter prices based on varying market demand. The hospitality industry has been doing this for quite a while, altering prices significantly during the summer months or weekends when demand for rooms is at a premium. In recent years, the sports industry has started to catch on to this trend, especially within Major League Baseball (MLB). The purpose of this paper is to explore the methodology of applying this type of pricing to the hockey ticketing arena.
Read the paper (PDF).
Christopher Jones, Deloitte Consulting
Sabah Sadiq, Deloitte Consulting
Jing Zhao, Deloitte Consulting LLP
Paper 3258-2015:
Put Data in the Driver's Seat: A Primer on Data-Driven Programming Using SAS®
One of the hallmarks of a good or great SAS® program is that it requires only a minimum of upkeep. Especially for code that produces reports on a regular basis, it is preferable to minimize user and programmer input and instead have the input data drive the outputs of a program. Data-driven SAS programs are more efficient and reliable, require less hardcoding, and result in less downtime and fewer user complaints. This paper reviews three ways of building a SAS program to create regular Microsoft Excel reports; one method using hardcoded variables, another using SAS keyword macros, and the last using metadata to drive the reports.
Andrew Clapson, MD Financial Management
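As an example of the metadata-driven flavor, a control table can generate one report per row via CALL EXECUTE (the control table and its columns are hypothetical):

    data _null_;
      length code $1000;
      set work.report_control;        /* columns: dsname, outfile */
      code = 'ods excel file="' || strip(outfile) || '"; '
          || 'proc print data=' || strip(dsname) || ' noobs; run; '
          || 'ods excel close;';
      call execute(code);             /* runs after this step ends */
    run;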
Paper 3382-2015:
Put Your Data on the Map
A bubble map can be a useful tool for identifying trends and visualizing the geographic proximity and intensity of events. This session shows how to use PROC GEOCODE and PROC GMAP to turn a data set of addresses and events into a map of the United States with scaled bubbles depicting the location and intensity of the events.
Read the paper (PDF).
Caroline Cutting, Warren Rogers Associates
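In outline, with the projection step elided and names hypothetical, the flow is:

    /* 1. addresses (here, ZIP codes) to coordinates */
    proc geocode method=zip data=work.events out=work.events_geo
                 lookup=sashelp.zipcode;
    run;

    /* 2. project the points to match the map (for example, with
          PROC GPROJECT), then build Annotate circles scaled by count */
    data work.anno;
      set work.events_proj;          /* projected x/y assumed present */
      retain xsys ysys '2' when 'A' function 'pie' rotate 360 style 'psolid';
      size = sqrt(n_events) / 20;    /* bubble size tracks event count */
    run;

    /* 3. draw the map with the bubbles on top */
    proc gmap data=maps.us map=maps.us anno=work.anno all;
      id state;
      choro state / levels=1 nolegend;
    run;
    quit;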
Q
Paper 2221-2015:
Quotes within Quotes: When Single (') and Double (") Quotes Are Not Enough
Although it does not happen every day, it is not unusual to need to place a quoted string within another quoted string. Fortunately, SAS® recognizes both single and double quote marks and either can be used within the other, which gives you the ability to have two-deep quoting. There are situations, however, where two kinds of quotes are not enough. Sometimes you need a third layer or, more commonly, you need to use a macro variable within the layers of quotes. Macro variables can be especially problematic, because they generally do not resolve when they are inside single quotes. However, this is SAS and that implies that there are several things going on at once and that there are several ways to solve these types of quoting problems. The primary goal of this presentation is to assist the programmer with solutions to the quotes-within-quotes problem with special emphasis on the presence of macro variables. The various techniques are contrasted as are the likely situations that call for these types of solutions. A secondary goal of this presentation is to help you understand how SAS works with quote marks and how it handles quoted strings. Without going into the gory details, a high-level understanding can be useful in a number of situations.
Read the paper (PDF).
Art Carpenter, California Occidental Consultants
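A three-line demonstration of the central point, including the %' mask for an unmatched apostrophe:

    %let name = %str(O%'Hara);   /* %' masks the unbalanced quote       */

    title1 'Hello &name';        /* single quotes: prints Hello &name   */
    title2 "Hello &name";        /* double quotes: prints Hello O'Hara  */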
R
Paper SAS1927-2015:
REST At Ease with SAS®: How to Use SAS to Get Your REST
REST is being used across the industry for designing networked applications to provide lightweight and powerful alternatives to web services such as SOAP and Web Services Description Language (WSDL). Since REST is based entirely on HTTP, SAS® provides everything you need to make REST calls and to process structured and unstructured data alike. Learn how PROC HTTP and other SAS language features provide everything you need to simply and securely make use of REST.
Read the paper (PDF).
Joseph Henry, SAS
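The essentials fit in a few lines (the URL is a placeholder):

    filename resp temp;

    proc http
      url='https://example.com/api/items'
      method='GET'
      out=resp;
    run;

    data _null_;                 /* dump the response to the log */
      infile resp;
      input;
      put _infile_;
    run;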
Paper 1341-2015:
Random vs. Fixed Effects: Which Technique More Effectively Addresses Selection Bias in Observational Studies
Retrospective case-control studies are frequently used to evaluate health care programs when it is not feasible to randomly assign members to a respective cohort. Without randomization, observational studies are more susceptible to selection bias where the characteristics of the enrolled population differ from those of the entire population. When the participant sample is different from the comparison group, the measured outcomes are likely to be biased. Given this issue, this paper discusses how propensity score matching and random effects techniques can be used to reduce the impact selection bias has on observational study outcomes. All results shown are drawn from an ROI analysis using a participant (cases) versus non-participant (controls) observational study design for a fitness reimbursement program aiming to reduce health care expenditures of participating members.
Read the paper (PDF). | Download the data file (ZIP).
Jess Navratil-Strawn, Optum
Paper 2382-2015:
Reducing the Bias: Practical Application of Propensity Score Matching in Health-Care Program Evaluation
To stay competitive in the marketplace, health-care programs must be capable of reporting the true savings to clients. This is a tall order, because most health-care programs are set up to be available to the client's entire population and thus cannot be conducted as a randomized controlled trial. In order to evaluate the performance of the program for the client, we use an observational study design that has inherent selection bias due to its inability to randomly assign participants. To reduce the impact of bias, we apply propensity score matching to the analysis. This technique is beneficial to health-care program evaluations because it helps reduce selection bias in the observational analysis and in turn provides a clearer view of the client's savings. This paper explores how to develop a propensity score, evaluate the use of inverse propensity weighting versus propensity matching, and determine the overall impact of the propensity score matching method on the observational study population. All results shown are drawn from a savings analysis using a participant (cases) versus non-participant (controls) observational study design for a health-care decision support program aiming to reduce emergency room visits.
Read the paper (PDF).
Amber Schmitz, Optum
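Step one of the technique, estimating each member's propensity score, might look like this (variable names hypothetical); matching or weighting on the score then follows:

    proc logistic data=work.members;
      class gender region / param=ref;
      model participant(event='1') = age gender region baseline_cost;
      output out=work.scored p=pscore;   /* propensity score per member */
    run;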
Paper SAS1871-2015:
Regulatory Compliance Reporting Using SAS® XML Mapper
As a part of regulatory compliance requirements, banks are required to submit reports based on Microsoft Excel, as per templates supplied by the regulators. This poses several challenges, including the high complexity of templates, the fact that implementation using ODS can be cumbersome, and the difficulty in keeping up with regulatory changes and supporting dynamic report content. At the same time, you need the flexibility to customize and schedule these reports as per your business requirements. This paper discusses an approach to building these reports using SAS® XML Mapper and the Excel XML spreadsheet format. This approach provides an easy-to-use framework that can accommodate template changes from the regulators without needing to modify the code. It is implemented using SAS® technologies, providing you the flexibility to customize to your needs. This approach also provides easy maintainability.
Read the paper (PDF).
Sarita Kannarath, SAS
Phil Hanna, SAS
Amitkumar Nakrani, SAS
Nishant Sharma, SAS
Paper 2102-2015:
Reversing the IN Operator
The IN operator within the DATA step is used for searching a specific variable for one or more values, either numeric or character (for example, IF X IN (1, 2) THEN ...). This brief note explains how the opposite situation can be managed. That is, it explains how to search several variables for a specific value by applying an array and the IN operator together.
Read the paper (PDF).
Can Tongur, Statistics Sweden
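The trick in full (the IN operator accepts an array name on its right-hand side):

    data work.flagged;
      set work.have;
      array x {*} x1-x10;
      flag = (2 in x);    /* 1 if any of x1-x10 equals 2, else 0 */
    run;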
S
Paper 3444-2015:
S-M-U (Set, Merge, and Update) Revisited
It is a safe assumption that almost every SAS® user learns how to use the SET statement not long after they're taught the concept of a DATA step. Further, it would probably be reasonable to guess that almost every one of those people covered the MERGE statement soon afterwards. Many, maybe most, also got to try the UPDATE and/or MODIFY statements eventually. It would also be a safe assumption that very few people have taken the time to review the manual since they learned about those statements. That is most unfortunate, because there are so many options available that can assist them in their tasks of obtaining and combining data sets. This presentation is designed to build onto the basic understanding of SET, MERGE, and UPDATE. It assumes that the attendee or reader has a basic knowledge of those statements, and it introduces various options and usages that extend the utility of these basic commands.
Read the paper (PDF).
Andrew Kuligowski, HSN
Paper 3155-2015:
SAS® Formats Top Ten
SAS® formats can be used in so many different ways! Even the most basic SAS format use (modifying the way a SAS data value is displayed without changing the underlying data value) holds a variety of nifty tricks, such as nesting formats, formats that affect various style attributes, and conditional formatting. Add in picture formats, multi-label formats, using formats for data cleaning, and formats for joins and table look-ups, and we have quite a bag of tricks for the humble SAS format and PROC FORMAT, which are used to generate them. This paper describes a few very useful programming techniques that employ SAS formats. While this paper will be appropriate for the newest SAS user, it will also focus on some of the lesser-known features of formats and PROC FORMAT and so should be useful for even quite experienced users of SAS.
Read the paper (PDF).
Christianna Williams, Self-Employed
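One of the niftier tricks, a nested format, in miniature:

    proc format;
      value agegrp
        low - 12  = 'Child'
        13 - 19   = 'Teen'
        20 - high = [3.];    /* nested format: display the actual age */
    run;

    proc freq data=sashelp.class;
      tables age;
      format age agegrp.;
    run;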
Paper 3290-2015:
SAS® Analytics on IBM FlashSystem Storage: Deployment Scenarios and Best Practices
SAS® Analytics enables organizations to tackle complex business problems using big data and to provide insights needed to make critical business decisions. A well-architected enterprise storage infrastructure is needed to realize the full potential of SAS Analytics. However, as the need for big data analytics and rapid response times increases, the performance gap between server speeds and traditional hard disk drive (HDD) based storage systems can be a significant concern. The growing performance gap can have detrimental effects, particularly when it comes to critical business applications. As a result, organizations are looking for newer, smarter, faster storage systems to accelerate business insights. IBM FlashSystem Storage systems store the data in flash memory. They are designed for dramatically faster access times and support incredible amounts of input/output operations per second (IOPS) and throughput, with significantly lower latency than HDD-based solutions. Due to their macro-efficiency design, FlashSystem Storage systems consume less power and have significantly lower cooling and space requirements, while allowing server processors to run SAS Analytics more efficiently. Being an all-flash storage system, IBM FlashSystem provides consistent low latency response across IOPS range, as the analytics workload scales. This paper introduces the benefits of IBM FlashSystem Storage for deploying SAS Analytics and highlights some of the deployment scenarios and architectural considerations. This paper also describes best practices and tuning guidelines for deploying SAS Analytics on FlashSystem Storage systems, which would help SAS Analytics customers in architecting solutions with FlashSystem Storage.
Read the paper (PDF).
David Gimpl, IBM
Matt Key, IBM
Narayana Pattipati, IBM
Harry Seifert, IBM
Paper 2620-2015:
SAS® Certification as a Tool for Professional Development
In today's competitive job market, both recent graduates and experienced professionals are looking for ways to set themselves apart from the crowd. SAS® certification is one way to do that. SAS Institute Inc. offers a range of exams to validate your knowledge level. In writing this paper, we have drawn upon our personal experiences, remarks shared by new and longtime SAS users, and conversations with experts at SAS. We discuss what certification is and why you might want to pursue it. Then we share practical tips you can use to prepare for an exam and do your best on exam day.
Read the paper (PDF).
Andra Northup, Advanced Analytic Designs, Inc.
Susan Slaughter, Avocet Solutions
Paper 3195-2015:
SAS® Code for Making Microsoft Excel Files Section 508 Compliant
Can you create hundreds of great-looking Microsoft Excel tables all within SAS® and make them all Section 508 compliant at the same time? This paper examines how to use the ODS TAGSETS.EXCELXP statement and other Base SAS® features to create fantastic-looking Excel worksheet tables that are all Section 508 compliant. This paper demonstrates that there is no need for any outside intervention or pre- or post-meddling with the Excel files to make them Section 508 compliant. We do it all with simple Base SAS code.
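A minimal TAGSETS.EXCELXP skeleton of the kind the paper builds on might look like this (the file name and options are illustrative, not the authors' 508-specific setup):
   ods tagsets.excelxp file='tables.xml'
       options(sheet_name='Summary' embedded_titles='yes');
   proc print data=sashelp.class noobs label;
   run;
   ods tagsets.excelxp close;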
Read the paper (PDF).
Chris Boniface, U.S. Census Bureau
Paper 3102-2015:
SAS® Enterprise Guide® 5.1 and PROC GPLOT--the Power, the Glory and the PROC-tical Limitations
Customer expectations are set high when Microsoft Excel and Microsoft PowerPoint are used to design reports. Using SAS® for reporting has benefits because it generates plots directly from prepared data sets, automates the plotting process, minimizes labor-intensive manual construction using Microsoft products, and does not compromise the presentation value. SAS® Enterprise Guide® 5.1 has a powerful point-and-click method that is quick and easy to use. However, it is limited in its ability to customize the output to mimic manually created Microsoft graphics. This paper demonstrates why SAS Enterprise Guide is the perfect starting point for creating initial code for plots using SAS/GRAPH® point-and-click features and how the code can be enhanced using established PROC GPLOT, ANNOTATE, and ODS options to re-create the look and feel of plots generated by Excel and PowerPoint. Examples show the generation of plots and tables using PROC TABULATE to embed the plot data into the graphical output. Also included are tips for overcoming the ODS limitation of SAS® 9.3, which is used by SAS Enterprise Guide 5.1, to transfer the SAS graphical output to PowerPoint files. These SAS® 9.3 tips are contrasted with the new SAS® 9.4 ODS POWERPOINT statement that enables direct PowerPoint file creation from a SAS program.
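The SAS 9.4 ODS POWERPOINT statement mentioned at the end works along these lines (a generic sketch using ODS Graphics, not the paper's GPLOT code):
   ods powerpoint file='plots.pptx';
   proc sgplot data=sashelp.class;
      scatter x=height y=weight;
   run;
   ods powerpoint close;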
Read the paper (PDF).
Christopher Klekar, Baylor Scott and White Health
Gabriela Cantu, Baylor Scott & White Health
Paper 3109-2015:
SAS® Enterprise Guide® for Managers and Executives
SAS® Enterprise Guide® is an extremely valuable tool for programmers. But it should also be leveraged by managers and executives to do data exploration, get information on the fly, and take advantage of the powerful analytics and reporting that SAS® has to offer. This can all be done without learning to program. This paper provides an overview of how SAS Enterprise Guide can improve the process of turning real-time data into real-time business decisions by managers.
Jennifer First-Kluge, Systems Seminar Consultants
Paper 2683-2015:
SAS® Enterprise Guide® or SAS® Studio: Which is Best for You?
SAS® Studio (previously known as SAS® Web Editor) was introduced in the first maintenance release of SAS® 9.4 as an alternative programming environment to SAS® Enterprise Guide® and SAS® Display Manager. SAS Studio is different in many ways from SAS Enterprise Guide and SAS Display Manager. As a programmer, I currently use SAS Enterprise Guide to help me code, test, maintain, and organize my SAS® programs. I have SAS Display Manager installed on my PC, but I still prefer to write my programs in SAS Enterprise Guide because I know it saves my log and output whenever I run a program, even if that program crashes and takes the SAS session with it! So should I now be using SAS Studio instead, and should you be using it, too?
Read the paper (PDF).
Philip Holland, Holland Numerics Limited
Paper SAS1757-2015:
SAS® University Edition--Connecting SAS® Software in New Ways to Build the Next Generation of SAS Users
Are you a SAS® software user hoping to convince your organization to move to the latest product release? Has your management team asked how your organization can hire new SAS users familiar with the latest and greatest procedures and techniques? SAS® Studio and SAS® University Edition might provide the answers for you. SAS University Edition was created for teaching and learning. It's a new downloadable package of selected SAS products (Base SAS®, SAS/STAT®, SAS/IML®, SAS/ACCESS® Interface to PC Files, and SAS Studio) that runs on Windows, Linux, and Mac. With the exploding demand for analytical talent, SAS launched this package to grow the next generation of SAS users. Part of the way SAS is helping grow that next generation of users is through the interface to SAS University Edition: SAS Studio. SAS Studio is a developmental web application for SAS that you access through your web browser and--since the first maintenance release of SAS 9.4--is included in Base SAS at no additional charge. The connection between SAS University Edition and commercial SAS means that it's easier than ever to use SAS for teaching, research, and learning, from high schools to community colleges to universities and beyond. This paper describes the product, as well as the intent behind it and other programs that support it, and then talks about some successes in adopting SAS University Edition to grow the next generation of users.
Read the paper (PDF).
Polly Mitchell-Guthrie, SAS
Amy Peters, SAS
Paper SAS1856-2015:
SAS® and SAP Business Warehouse on SAP HANA--What's in the Handshake?
Is your company using or considering using SAP Business Warehouse (BW) powered by SAP HANA? SAS® provides various levels of integration with SAP BW in an SAP HANA environment. This integration enables you to not only access SAP BW components from SAS, but to also push portions of SAS analysis directly into SAP HANA, accelerating predictive modeling and data mining operations. This paper explains the SAS toolset for different integration scenarios, highlights the newest technologies contributing to integration, and walks you through examples of using SAS with SAP BW on SAP HANA. The paper is targeted at SAS and SAP developers and architects interested in building a productive analytical environment with the help of the latest SAS and SAP collaborative advancements.
Read the paper (PDF).
Tatyana Petrova, SAS
Paper 2984-2015:
SAS® for Six Sigma--An Introduction
Six Sigma is a business management strategy that seeks to improve the quality of process outputs by identifying and removing the causes of defects (errors) and minimizing variability in manufacturing and business processes. Each Six Sigma project carried out within an organization follows a defined sequence of steps and has quantified financial targets. All Six Sigma project methodologies include an extensive analysis phase in which SAS® software can be applied. JMP® software is widely used for Six Sigma projects. However, this paper demonstrates how Base SAS® (and a bit of SAS/GRAPH® and SAS/STAT® software) can be used to address a wide variety of Six Sigma analysis tasks. The reader is assumed to have a basic knowledge of Six Sigma methodology. Therefore, the focus of the paper is the use of SAS code to produce outputs for analysis.
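For instance, a basic process capability index can be computed with Base SAS alone; in this hypothetical sketch the data set and specification limits are invented:
   proc means data=process noprint;
      var measurement;
      output out=stats std=sigma;
   run;
   data capability;
      set stats;
      usl = 10.5;  lsl = 9.5;            /* assumed specification limits */
      cp  = (usl - lsl) / (6 * sigma);   /* classic Cp formula */
   run;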
Read the paper (PDF).
Dan Bretheim, Towers Watson
Paper SAS1947-2015:
SAS® vApps 101
SAS® vApps (virtual applications) are a SAS® construct designed to logically and physically encapsulate a single- or multi-tier software solution into a virtual machine (or sometimes into multiple virtual machines). In this paper, we examine the conceptual, logical, and physical design perspectives that comprise a vApp, giving you a high-level understanding of both the technical and business benefits of vApps and the design decisions that go into envisioning and constructing SAS vApps. These are described in the context of the user roles involved in the life cycle of a vApp, and how those roles interact with a vApp at various points along its continuum.
Read the paper (PDF).
Gary Kohan, SAS
Danny Hamrick, SAS
Connie Robison, SAS
Rob Stephens, SAS
Peter Villiers, SAS
Paper 2687-2015:
Selection and Transformation of Continuous Predictors for Logistic Regression
This paper discusses the selection and transformation of continuous predictor variables for the fitting of binary logistic models. The paper has two parts: (1) A procedure and associated SAS® macro are presented that can screen hundreds of predictor variables and 10 transformations of these variables to determine their predictive power for a logistic regression. The SAS macro passes the training data set twice to prepare the transformations and one more time through PROC TTEST. (2) The FSP (function selection procedure) and a SAS implementation of FSP are discussed. The FSP tests all transformations from among a class of FSP transformations and finds the one with maximum likelihood when fitting the binary target. In a 2008 book, Patrick Royston and Willi Sauerbrei popularized the FSP.
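The screening idea can be pictured as follows (variable names are hypothetical; the paper's macro automates this over hundreds of predictors and ten transformations):
   data trans;
      set train;
      x_log  = log(max(x, 1e-4));   /* guard against log of zero */
      x_sqrt = sqrt(max(x, 0));
   run;
   proc ttest data=trans;
      class y;                      /* binary target */
      var x x_log x_sqrt;
   run;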
Read the paper (PDF).
Bruce Lund, Marketing Associates, LLC
Paper 3316-2015:
Sending Text Messages to your Phone via the DATA Step
Text messages (SMS) are a convenient way to receive notifications away from your computer screen. In SAS®, text messages can be sent to mobile phones via the DATA step. This paper briefly describes several methods for sending text messages from SAS and explores possible applications.
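One common method uses the FILENAME EMAIL engine with a carrier's SMS-to-email gateway; the address below is a made-up example:
   filename mobile email to='5551234567@vtext.com'   /* hypothetical carrier SMS gateway */
                         subject='SAS job status';
   data _null_;
      file mobile;
      put 'Nightly job finished without errors.';
   run;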
Read the paper (PDF).
Matthew Slaughter, Coalition for Compassionate Care of California
Paper 3309-2015:
Snapshot SNAFU: Preventative Measures to Safeguard Deliveries
Little did you know that your last delivery ran on incomplete data. To make matters worse, the client realized the issue first. Sounds like a horror story, no? A few preventative measures can go a long way in ensuring that your data are up-to-date and progressing normally. At the data set level, metadata comparisons between the current and previous data cuts will help identify observation and variable discrepancies. Comparisons will also uncover attribute differences at the variable level. At the subject level, they will identify missing subjects. By compiling these comparison results into a comprehensive scheduled e-mail, a data facilitator need only skim the report to confirm that the data is good to go--or in need of some corrective action. This paper introduces a suite of checks contained in a macro that will compare data cuts at the data set, variable, and subject levels and produce an e-mail report. The wide use of this macro will help all SAS® users create better deliveries while avoiding rework.
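A metadata comparison of two data cuts can be sketched with PROC CONTENTS and PROC COMPARE (library names are hypothetical; the paper's macro wraps checks like this plus the e-mail report):
   proc contents data=old._all_ out=meta_old noprint; run;
   proc contents data=new._all_ out=meta_new noprint; run;
   proc compare base=meta_old compare=meta_new
                out=diffs outnoequal noprint;
      id memname name;   /* CONTENTS output is already sorted this way */
   run;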
Read the paper (PDF). | Download the data file (ZIP).
Spencer Childress, Rho, Inc.
Alexandra Buck, Rho, Inc.
Paper SAS1972-2015:
Social Media and Open Data Integration through SAS® Visual Analytics and SAS® Text Analytics for Public Health Surveillance
A leading killer in the United States is smoking. Moreover, over 8.6 million Americans live with a serious illness caused by smoking or second-hand smoking. Despite this, over 46.6 million U.S. adults smoke tobacco, cigars, and pipes. The key analytic question in this paper is: how would e-cigarettes affect this public health situation? Can monitoring public opinions of e-cigarettes using SAS® Text Analytics and SAS® Visual Analytics help provide insight into the potential dangers of these new products? Are e-cigarettes an example of Big Tobacco up to its old tricks or, in fact, a cessation product? The research in this paper was conducted on thousands of tweets from April to August 2014. It includes API sources beyond Twitter--for example, indicators from the Health Indicators Warehouse (HIW) of the Centers for Disease Control and Prevention (CDC)--that were used to enrich Twitter data in order to implement a surveillance system developed by SAS® for the CDC. The analysis is especially important to the Office on Smoking and Health (OSH) at the CDC, which is responsible for tobacco control initiatives that help states to promote cessation and prevent initiation in young people. To help the CDC succeed with these initiatives, the surveillance system also: 1) automates the acquisition of data, especially tweets; and 2) applies text analytics to categorize these tweets using a taxonomy that provides the CDC with insights into a variety of relevant subjects. Twitter text data can help the CDC look at the public response to the use of e-cigarettes, examine general discussions regarding smoking and public health, and explore potential controversies (involving tobacco exposure to children, increasing government regulations, and so on). SAS® Content Categorization helps health care analysts review large volumes of unstructured data by categorizing tweets in order to monitor and follow what people are saying and why they are saying it. Ultimately, it is a solution intended to help the CDC monitor the public's perception of the dangers of smoking and e-cigarettes; in addition, it can identify areas where OSH can focus its attention in order to fulfill its mission and track the success of CDC health initiatives.
Read the paper (PDF).
Manuel Figallo, SAS
Emily McRae, SAS
Paper 2264-2015:
Some of the Most Useful SAS® Functions
SAS® functions provide amazing power to your DATA step programming. Specific functions are essential--they save you from writing volumes of unnecessary code. This presentation covers some of the most useful SAS functions. A few might be new to you and they can all change how you program and approach common programming tasks. The majority of these functions work with character data. There are functions that search for strings, others that find and replace strings, and some that join strings together. Furthermore, certain functions can measure the spelling distance between two strings (useful for fuzzy matching). Some of the newest and most incredible functions are not functions at all--they are call routines. Did you know that you can sort values within an observation? Did you know that not only can you identify the largest or smallest value in a list of variables, but you can identify the second or third or nth largest or smallest value? A knowledge of these functions will make you a better SAS programmer.
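A few of the functions and call routines alluded to, in one hypothetical DATA step:
   data _null_;
      array scores[5] (42 17 88 3 65);
      call sortn(of scores[*]);             /* sort values within the observation */
      second = largest(2, of scores[*]);    /* second-largest value in the list */
      fuzz   = complev('Smith', 'Smyth');   /* spelling distance for fuzzy matching */
      put second= fuzz=;
   run;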
Read the paper (PDF).
Ron Cody, Camp Verde Consulting
Paper 3496-2015:
Something Old, Something New... Flexible Reporting with DATA Step-based Tools
The report looks simple enough--a bar chart and a table, like something created with GCHART and REPORT procedures. But, there are some twists to the reporting requirements that make those procedures not quite flexible enough. The solution was to mix 'old' and 'new' DATA step-based techniques to solve the problem. Annotate data sets are used to create the bar chart and the Report Writing Interface (RWI) to create the table. Without a whole lot of additional code, a great deal of flexibility is gained.
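The Report Writing Interface drives table creation from a DATA step; a minimal sketch (not the paper's report) looks like this:
   ods html file='report.html';
   data _null_;
      dcl odsout ob();
      ob.table_start();
      ob.row_start();
      ob.format_cell(data: 'Region');   /* one cell per FORMAT_CELL call */
      ob.format_cell(data: 'Sales');
      ob.row_end();
      ob.table_end();
   run;
   ods html close;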
Read the paper (PDF).
Pete Lund, Looking Glass Analytics
Paper 1978-2015:
Something for Nothing? Adding Group Descriptive Statistics Using SAS® PROC SQL Subqueries II
Can you actually get something for nothing? With the SAS® PROC SQL subquery and remerging features, yes, you can. When working with categorical variables, you often need to add group descriptive statistics such as group counts and minimum and maximum values for further BY-group processing. Instead of first creating the group count and minimum or maximum values and then merging the summarized data set to the original data set, why not take advantage of PROC SQL to complete two steps in one? With the PROC SQL subquery and summary functions by the group variable, you can easily remerge the new group descriptive statistics with the original data set. Now with a few DATA step enhancements, you too can include percent calculations.
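The remerging feature in one hypothetical query: the grouped summary values are attached back to every detail row automatically, with no separate merge step:
   proc sql;
      create table with_stats as
      select *,
             count(*)    as grp_n,
             min(salary) as grp_min,
             max(salary) as grp_max
      from employees
      group by department;   /* SAS remerges the summaries onto each row */
   quit;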
Read the paper (PDF).
Sunil Gupta, Gupta Programming
Paper 3209-2015:
Standardizing the Standardization Process
A prevalent problem surrounding Extract, Transform, and Load (ETL) development is the ability to apply consistent logic and manipulation of source data when migrating to target data structures. Certain inconsistencies that add a layer of complexity include, but are not limited to, naming conventions and data types associated with multiple sources, numerous solutions applied by an array of developers, and multiple points of updating. In this paper, we examine the evolution of implementing a best practices solution during the process of data delivery, with respect to standardizing data. The solution begins with injecting the transformations of the data directly into the code at the standardized layer via Base SAS® or SAS® Enterprise Guide®. A more robust method that we explore is to apply these transformations with SAS® macros. This provides the capability to apply these changes in a consistent manner across multiple sources. We further explore this solution by implementing the macros within SAS® Data Integration Studio processes on the DataFlux® Data Management Platform. We consider these issues within the financial industry, but the proposed solution can be applied across multiple industries.
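The macro-based layer can be as simple as one rule applied identically to every source; a hypothetical sketch:
   %macro std_char(var, len=40);
      length &var._std $&len;
      &var._std = upcase(strip(&var));   /* one consistent standardization rule */
   %mend std_char;

   data target;
      set source_a;            /* any of the multiple sources */
      %std_char(customer_name)
   run;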
Read the paper (PDF).
Avery Long, Financial Risk Group
Frank Ferriola, Financial Risk Group
Paper 3150-2015:
Statistical Evaluation of the Doughnut Clustering Method for Product Affinity Segmentation
Product affinity segmentation is a powerful technique for marketers and sales professionals to gain a good understanding of customers' needs, preferences, and purchase behavior. Performing product affinity segmentation is quite challenging in practice because product-level data usually has high skewness, high kurtosis, and a large percentage of zero values. The Doughnut Clustering method has been shown to be effective using real data, and was presented at SAS® Global Forum 2013 in the paper titled 'Product Affinity Segmentation Using the Doughnut Clustering Approach.' However, the Doughnut Clustering method is not a panacea for addressing the product affinity segmentation problem. There is a clear need for a comprehensive evaluation of this method in order to be able to develop generic guidelines for practitioners about when to apply it. In this paper, we meet this need by evaluating the Doughnut Clustering method on simulated data with different levels of skewness, kurtosis, and percentage of zero values. We developed a five-step approach based on Fleishman's power method to generate synthetic data with prescribed parameters. Subsequently, we designed and conducted a set of experiments to run the Doughnut Clustering method as well as the traditional K-means method as a benchmark on simulated data. We draw conclusions on the performance of the Doughnut Clustering method by comparing the clustering validity metric (the ratio of between-cluster variance to within-cluster variance) as well as the relative proportion of cluster sizes against those of K-means.
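Fleishman's power method transforms a standard normal variate with a cubic polynomial, y = a + b*z + c*z**2 + d*z**3. The coefficients below are placeholders rather than values solved for a particular skewness and kurtosis:
   data sim;
      call streaminit(2015);
      a = -0.1; b = 0.9; c = 0.1; d = 0.02;   /* placeholder coefficients */
      do i = 1 to 10000;
         z = rand('normal');
         y = a + b*z + c*z**2 + d*z**3;       /* Fleishman transformation */
         output;
      end;
   run;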
Read the paper (PDF). | Download the data file (ZIP).
Darius Baer, SAS
Goutam Chakraborty, Oklahoma State University
Paper SAS1880-2015:
Staying Relevant in a Competitive World: Using the SAS® Output Delivery System to Enhance, Customize, and Render Reports
Technology is always changing. To succeed in this ever-evolving landscape, organizations must embrace the change and look for ways to use it to their advantage. Even standard business tasks such as creating reports are affected by the rapid pace of technology. Reports are key to organizations and their customers. Therefore, it is imperative that organizations employ current technology to provide data in customized and meaningful reports across a variety of media. The SAS® Output Delivery System (ODS) gives you that edge by providing tools that enable you to package, present, and deliver report data in more meaningful ways, across the most popular desktop and mobile devices. To begin, the paper illustrates how to modify styles in your reports using the ODS CSS style engine, which incorporates the use of cascading style sheets (CSS) and the ODS document object model (DOM). You also learn how you can use SAS ODS to customize and generate reports in the body of e-mail messages. Then the paper discusses methods for enhancing reports and rendering them in desktop and mobile browsers by using the HTML and HTML5 ODS destinations. To conclude, the paper demonstrates the use of selected SAS ODS destinations and features in practical, real-world applications.
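Attaching a cascading style sheet to a report is a one-option change; the style sheet name here is hypothetical:
   ods html5 file='report.html' cssstyle='corporate.css';
   proc print data=sashelp.class noobs;
   run;
   ods html5 close;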
Read the paper (PDF).
Chevell Parker, SAS
Paper 1565-2015:
Strategies for Error Handling and Program Control: Concepts
SAS® provides a complex ecosystem with multiple tools and products that run in a variety of environments and modes. SAS provides numerous error-handling and program control statements, options, and features. These features can function differently according to the run environment, and many of them have constraints and limitations. In this presentation, we review a number of potential error-handling and program control strategies that can be employed, along with some of the inherent limitations of each. The bottom line is that there is no single strategy that will work in all environments, all products, and all run modes. Instead, programmers need to consider the underlying program requirements and choose the optimal strategy for their situation.
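One batch-oriented strategy in the spirit of the presentation (a sketch, not a universal recipe) checks the cumulative condition code between steps:
   data work.step1;
      set work.input;              /* hypothetical step */
   run;
   data _null_;
      if &syscc > 4 then do;       /* SYSCC accumulates across steps */
         put "ERROR: SYSCC=&syscc - halting downstream steps.";
         abort cancel;
      end;
   run;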
Read the paper (PDF). | Download the data file (ZIP).
Thomas Billings, MUFG Union Bank, N.A.
Paper 3478-2015:
Stress Testing for Mid-Sized Banks
In 2014, for the first time, mid-market banks (consisting of banks and bank holding companies with $10-$50 billion in consolidated assets) were required to submit Capital Stress Tests to the federal regulators under the Dodd-Frank Act Stress Testing (DFAST). This is a process large banks have been going through since 2011. However, mid-market banks are not positioned to commit as many resources to their annual stress tests as their largest peers. Limited human and technical resources, incomplete or non-existent detailed historical data, lack of enterprise-wide cross-functional analytics teams, and limited exposure to rigorous model validations are all challenges mid-market banks face. While there are fewer deliverables required from the DFAST banks, the scrutiny the regulators are placing on the analytical models is just as high as their expectations for Comprehensive Capital Analysis and Review (CCAR) banks. This session discusses the differences in how DFAST and CCAR banks execute their stress tests, the challenges facing DFAST banks, and potential ways DFAST banks can leverage the analytics behind this exercise.
Read the paper (PDF).
Charyn Faenza, F.N.B. Corporation
Paper 3187-2015:
Structuring your SAS® Applications for Long-Term Survival: Reproducible Methods in Base SAS® Programming
SAS® users organize their applications in a variety of ways. However, some approaches are more successful than others. In particular, the need to run only some of a program's code some of the time can be challenging to manage. Reproducible research methods require that SAS applications be understandable by the author and other staff members. In this presentation, you learn how to organize and structure your SAS application to manage the process of data access, data analysis, and data presentation. The approach to structuring applications requires that tasks in the process of data analysis be compartmentalized. This can be done using a well-defined program. The author presents his structuring algorithm, and discusses the characteristics of good structuring methods for SAS applications. Reproducible research methods are becoming more centrally important, and SAS users must keep up with the current developments.
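A compartmentalized driver program, in the spirit described (file names are hypothetical):
   %let root = /project/study1;
   %include "&root/10_access.sas";    /* data access       */
   %include "&root/20_analyze.sas";   /* data analysis     */
   %include "&root/30_present.sas";   /* data presentation */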
Read the paper (PDF). | Download the data file (ZIP).
Paul Thomas, ASUP Ltd
T
Paper 2219-2015:
Table Lookup Techniques: From the Basic to the Innovative
One of the more commonly needed operations in SAS® programming is to determine the value of one variable based on the value of another. A series of techniques and tools have evolved over the years to make the matching of these values go faster, smoother, and easier. A majority of these techniques require operations such as sorting, searching, and comparing. As it turns out, these types of techniques are some of the more computationally intensive. Consequently, an understanding of the operations involved and a careful selection of the specific technique can often save the user a substantial amount of computing resources. Many of the more advanced techniques can require substantially fewer resources. It is incumbent on the user to have a broad understanding of the issues involved and a more detailed understanding of the solutions available. Even if you do not currently have a BIG data problem, you should at the very least have a basic knowledge of the kinds of techniques that are available for your use.
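Among the more advanced techniques is the DATA step hash object; a minimal lookup sketch with hypothetical table names:
   data matched;
      if 0 then set lookup;                  /* define lookup variables at compile time */
      if _n_ = 1 then do;
         declare hash h(dataset: 'lookup');
         h.defineKey('code');
         h.defineData('description');
         h.defineDone();
      end;
      set transactions;
      if h.find() = 0;                       /* keep rows whose code matched */
   run;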
Read the paper (PDF).
Art Carpenter, California Occidental Consultants
Paper 3352-2015:
Tactical Marketing with SAS® Visual Analytics--Aligning a Customer's Online Journey with In-Store Purchases
Marketers often face a cross-channel challenge in making sense of the behavior of web visitors who spend considerable time researching an item online, even putting the item in a wish list or checkout basket, but failing to follow up with an actual purchase online, instead opting to purchase the item in the store. This research shows the use of SAS® Visual Analytics to address this challenge. This research uses a large data set of simulated web transactional data, combines it with common IDs to attach the data to in-store retail data, and studies it in SAS Visual Analytics. In this presentation, we go over tips and tricks for using SAS Visual Analytics on a non-distributed server. The loaded data set is analyzed step by step to show how to draw correlations in the web browsing behavior of customers and how to link the data to their subsequent in-store behavior. It shows how we can draw inferences between web visits and in-store visits by department. You'll change your marketing strategy as a result of the research.
Read the paper (PDF).
Tricia Aanderud, Zencos
Johann Pasion, 89 Degrees
Paper SAS1831-2015:
Teach Them to Fish--How to Use Tasks in SAS® Studio to Enable Co-Workers to Run Your Reports Themselves
How many times has this happened to you? You create a really helpful report and share it with others. It becomes popular and you find yourself running it over and over. Then they start asking, But can't you re-run it and just change ___? (Fill in the blank with whatever simple request you can think of.) Don't you want to just put the report out as a web page with some basic parameters that users can choose themselves and run when they want? Consider writing your own task in SAS® Studio! SAS Studio includes several predefined tasks, which are point-and-click user interfaces that guide the user through an analytical process. For example, tasks enable users to create a bar chart, run a correlation analysis, or rank data. When a user selects a task option, SAS® code is generated and run on the SAS server. Because of the flexibility of the task framework, you can make a copy of a predefined task and modify it or create your own. Tasks use the same common task model and the Velocity Template Language--no Java programming or ActionScript programming is required. Once you have the interface set up to generate the SAS code you need, then you can publish the task for other SAS Studio users to use or you can use a straight URL. Now that others can generate the output themselves, you actually might have time to go fishing!
Read the paper (PDF).
Christie Corcoran, SAS
Amy Peters, SAS
Paper 3042-2015:
Tell Me What You Want: Conjoint Analysis Made Simple Using SAS®
The measurement of factors influencing consumer purchasing decisions is of interest to all manufacturers of goods, retailers selling these goods, and consumers buying these goods. In the past decade, conjoint analysis has become one of the commonly used statistical techniques for analyzing the decisions or trade-offs consumers make when they purchase products. Although recent years have seen increased use of conjoint analysis and conjoint software, there is limited work that has spelled out a systematic procedure on how to do a conjoint analysis or how to use conjoint software. This paper reviews basic conjoint analysis concepts, describes the mathematical and statistical framework on which conjoint analysis is built, and introduces the TRANSREG and PHREG procedures, their syntax, and the output they generate, using simplified real-life data examples. This paper concludes by highlighting some of the substantive issues related to the application of conjoint analysis in a business environment and the available autocall macros in SAS/STAT®, SAS/IML®, and SAS/QC® software that can handle more complex conjoint designs and analyses. The paper will benefit the basic SAS user, and statisticians and research analysts in every industry, especially those in marketing and advertisement.
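A bare-bones conjoint call to PROC TRANSREG might look like this (the attributes and data set are invented):
   proc transreg data=ratings utilities;   /* UTILITIES prints part-worth utilities */
      model identity(rating) = class(brand price / zero=sum);
      output out=utils;
   run;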
Read the paper (PDF).
Delali Agbenyegah, Alliance Data Systems
Paper SAS1387-2015:
Ten Tips for Simulating Data with SAS®
Data simulation is a fundamental tool for statistical programmers. SAS® software provides many techniques for simulating data from a variety of statistical models. However, not all techniques are equally efficient. An efficient simulation can run in seconds, whereas an inefficient simulation might require days to run. This paper presents 10 techniques that enable you to write efficient simulations in SAS. Examples include how to simulate data from a complex distribution and how to use simulated data to approximate the sampling distribution of a statistic.
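One of the best-known efficiency ideas is to simulate all samples in a single DATA step and analyze them with a BY statement instead of a macro loop; a sketch:
   data sim;
      call streaminit(123);
      do sample = 1 to 1000;       /* 1,000 samples ...   */
         do i = 1 to 50;           /* ... of size 50 each */
            x = rand('exponential');
            output;
         end;
      end;
   run;
   proc means data=sim noprint;
      by sample;                   /* one pass, no macro loop */
      var x;
      output out=sampling_dist mean=xbar;
   run;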
Read the paper (PDF). | Download the data file (ZIP).
Rick Wicklin, SAS
Paper 3504-2015:
The %ic_mixed Macro: A SAS Macro to Produce Sorted Information Criteria (AIC and BIC) List for PROC MIXED for Model Selection
PROC MIXED is one of the most popular SAS procedures to perform longitudinal analysis or multilevel models in epidemiology. Model selection is one of the fundamental questions in model building. One of the most popular and widely used strategies is model selection based on information criteria, such as the Akaike Information Criterion (AIC) and the Sawa Bayesian Information Criterion (BIC). This strategy considers both fit and complexity, and enables multiple models to be compared simultaneously. However, there is no existing SAS procedure to perform model selection automatically based on information criteria for PROC MIXED, given a set of covariates. This paper provides information about using the SAS %ic_mixed macro to select a final model with the smallest value of AIC and BIC. Specifically, the %ic_mixed macro will do the following: 1) produce a complete list of all possible model specifications given a set of covariates, 2) use a DO loop to read in one model specification at a time and save it in a macro variable, 3) execute PROC MIXED and use the Output Delivery System (ODS) to output AICs and BICs, 4) append all outputs and use the DATA step to create a sorted list of information criteria with model specifications, and 5) run PROC REPORT to produce the final summary table. Based on the sorted list of information criteria, researchers can easily identify the best model. This paper includes the macro programming language, as well as examples of the macro calls and outputs.
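The core ODS capture inside the macro's loop can be pictured like this (the model terms are hypothetical):
   proc mixed data=study ic;            /* IC requests the Information Criteria table */
      class id;
      model y = age sex / solution;
      random intercept / subject=id;
      ods output InfoCrit=ic_one;       /* AIC and BIC for this specification */
   run;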
Read the paper (PDF).
Qinlei Huang, St Jude Children's Research Hospital
Paper 3278-2015:
The Analytics Behind an NBA Name Change
For the past two academic school years, our SAS® Programming 1 class had a classroom discussion about the Charlotte Bobcats. We wondered aloud: if the Bobcats changed their team name, would the dwindling fan base return? As a class, we created a survey that consisted of 10 questions asking people if they liked the name Bobcats, did they attend basketball games, and if they bought merchandise. Within a one-hour class period, our class surveyed 981 out of 1,733 students at Phillip O. Berry Academy of Technology. After collecting the data, we performed advanced analytics using Base SAS® and concluded that 75% of students and faculty at Phillip O. Berry would prefer any other name except the Bobcats. In other results, 80% of the student body liked basketball, and the most preferred name was the Hornets, followed by the Royals, Flight, Dragons, and finally the Bobcats. The following school year, we conducted another survey to discover if people's opinions had changed since the previous survey and if people were happy with the Bobcats changing their name. During this time period, the Bobcats had recently reported that they were granted the opportunity to change the team name to the Hornets. Once more, we collected and analyzed the data and concluded that 77% of people surveyed were thrilled with the name change. In addition, around 50% of respondents were interested in purchasing merchandise. Through the work of this project, SAS® Analytics was applied in the classroom to a real-world scenario. The ability to see how SAS could be applied to a question of interest and create change inspired the students in our class. This project is significantly important to show the economic impact that sports can have on a city. This project, in particular, focused on the nostalgia that people of the city of Charlotte felt for the name Hornets. The project opened the door for more analysis and questions and continues to spark interest. This is the case because when people have a connection to the team, the more the team flourishes, the more Charlotte benefits.
Read the paper (PDF). | Download the data file (ZIP).
Lauren Cook, Charlotte Mecklenburg School System
Paper 3106-2015:
The Best DATA Step Debugging Feature
Many languages, including the SAS DATA step, have extensive debuggers that can be used to detect logic errors in programs. Another easy way to detect logic errors is to simply display messages and variable content at strategic times. The PUTLOG statement will be discussed and examples given that show how using this statement is probably the easiest and most flexible way to detect and correct errors in your DATA step logic.
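A typical PUTLOG use: write a labeled diagnostic to the log only when something looks wrong (the data set and variables are invented):
   data clean;
      set claims;
      if charge < 0 then
         putlog 'WARNING: negative charge ' id= charge= _n_=;
   run;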
Read the paper (PDF).
Steven First, Systems Seminar Consultants
Paper 1334-2015:
The Essentials of SAS® Dates and Times
The first thing that you need to know is that SAS® software stores dates and times as numbers. However, this is not the only thing that you need to know, and this presentation gives you a solid base for working with dates and times in SAS. It also introduces you to functions and features that enable you to manipulate your dates and times with surprising flexibility. This paper also shows you some of the possible pitfalls with dates (and with times and datetimes) in your SAS code, and how to avoid them. We show you how SAS handles dates and times through examples, including the ISO 8601 formats and informats, and how to use dates and times in TITLE and FOOTNOTE statements. We close with a brief discussion of Microsoft Excel conversions.
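A few of the building blocks, in one sketch:
   data _null_;
      d = '15MAR2015'd;                       /* a date literal is just a number */
      nextm = intnx('month', d, 1, 'same');   /* same day, one month later */
      gap   = intck('day', d, nextm);
      put d= yymmdd10. nextm= yymmdd10. gap=;
      put d= e8601da.;                        /* ISO 8601 date format */
   run;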
Read the paper (PDF). | Download the data file (ZIP).
Derek Morgan
Paper 3060-2015:
The Knight's Tour in Chess--Implementing a Heuristic Solution
The knight's tour is a sequence of moves on a chess board such that a knight visits each square only once. Using a heuristic method, it is possible to find a complete path, beginning from any arbitrary square on the board and landing on the remaining squares only once. However, the implementation poses challenging programming problems. For example, it is necessary to discern viable knight moves, which change throughout the tour. Even worse, the heuristic approach does not guarantee a solution. This paper explains a SAS® solution that finds a knight's tour beginning from every initial square on a chess board...well, almost.
Read the paper (PDF).
John R Gerlach, Dataceutics, Inc.
Paper SAS1642-2015:
The REPORT Procedure: A Primer for the Compute Block
It is well-known in the world of SAS® programming that the REPORT procedure is one of the best procedures for creating dynamic reports. However, you might not realize that the compute block is where all of the action takes place! Its flexibility enables you to customize your output. This paper is a primer for using a compute block. With a compute block, you can easily change values in your output with the proper assignment statement and add text with the LINE statement. With the CALL DEFINE statement, you can adjust style attributes such as color and formatting. Through examples, you learn how to apply these techniques for use with any style of output. Understanding how to use the compute-block functionality empowers you to move from creating a simple report to creating one that is more complex and informative, yet still easy to use.
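A compute block in miniature: conditional styling with CALL DEFINE plus a LINE statement:
   proc report data=sashelp.class;
      column name age height;
      define age / display;
      compute age;
         if age > 14 then
            call define(_col_, 'style', 'style=[backgroundcolor=yellow]');
      endcomp;
      compute after;
         line 'Ages above 14 are highlighted.';
      endcomp;
   run;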
Read the paper (PDF).
Jane Eslinger, SAS
Paper SAS1956-2015:
The SAS® Scalable Performance Data Engine: Moving Your Data to Hadoop without Giving Up the SAS Features You Depend On
If you are one of the many customers who want to move your SAS® data to Hadoop, one decision you will encounter is what data storage format to use. There are many choices, and all have their pros and cons. One factor to consider is how you currently store your data. If you currently use the Base SAS® engine or the SAS® Scalable Performance Data Engine, then using the SPD Engine with Hadoop will enable you to continue accessing your data with as little change to your existing SAS programs as possible. This paper discusses the enhancements, usage, and benefits of the SPD Engine with Hadoop.
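The continuity argument rests on the LIBNAME statement being the only change; a hedged sketch (the HDFS path and host setup are site-specific):
   libname hdp spde '/user/sasdata' hdfshost=default;   /* hypothetical HDFS path */
   data hdp.sales;
      set work.sales;       /* existing program logic is unchanged */
   run;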
Read the paper (PDF).
Lisa Brown, SAS
Paper 3338-2015:
Time Series Modeling and Forecasting--An Application to Stress-Test Banks
Did you ever wonder how large US bank holding companies (BHCs) perform stress testing? I had the pleasure to be a part of this process on the model building end, and now I perform model validation. As with everything that is new and uncertain, there is much room for the discovery process. This presentation explains how banks in general perform time series modeling of different loans and credits to establish the bank's position during simulated stress. You learn the basic process behind model building and validation for Comprehensive Capital Analysis and Review (CCAR) purposes, which includes, but is not limited to, back testing, sensitivity analysis, scenario analysis, and model assumption testing. My goal is to gain your interest in the areas of challenging current modeling techniques and looking beyond standard model assumption testing to assess the true risk behind the formulated model and its consequences. This presentation examines the procedures that happen behind the scenes of any code's syntax to better explore statistics that play crucial roles in assessing model performance and forecasting. Forecasting future periods is the process that needs more attention and a better understanding because this is what the CCAR is really all about. In summary, this presentation engages professionals and students to dig deeper into every aspect of time series forecasting.
Read the paper (PDF).
Ania Supady, KeyCorp
Paper 3003-2015:
Tips and Tricks for SAS® Program Automation
There are many gotchas when you are trying to automate a well-written program. The details differ depending on the way you schedule the program and the environment you are using. This paper covers system options, error-handling logic, and best practices for logging. Save time and frustration by using these tips as you schedule programs to run.
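A logging skeleton along the lines discussed (file names and the job body are examples):
   proc printto log='/logs/etl_run.log' new; run;   /* route the log to a file */
   options errorabend;                              /* halt batch jobs on the first error */
   data work.load;
      set work.raw;                                 /* hypothetical job body */
   run;
   proc printto; run;                               /* restore the default log */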
Read the paper (PDF).
Adam Hood, Slalom Consulting
Paper 3196-2015:
Tips for Managing SAS® Work Libraries
The Work library is at the core of most SAS® programs, but programmers tend to ignore it unless something breaks. This paper first discusses the USER= system option for saving the Work files in a directory. Then, we cover a similar macro-controlled method for saving the files in your Work library, and the interaction of this method with OPTIONS NOREPLACE and the syntax check options. A number of SAS system options that help you to manage Work libraries are discussed: WORK=, WORKINIT, WORKTERM; these options might be restricted by SAS Administrators. Additional considerations in managing Work libraries are discussed: handling large files, file compression, programming style, and macro-controlled deletion of redundant files in SAS® Enterprise Guide®.
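The USER= option in action (the directory is hypothetical): one-level data set names then resolve to the designated library instead of Work:
   libname mywork '/projects/study1/work';
   options user=mywork;
   data results;            /* creates MYWORK.RESULTS, not WORK.RESULTS */
      set mywork.raw;
   run;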
Read the paper (PDF). | Download the data file (ZIP).
Thomas Billings, MUFG Union Bank, N.A.
Avinash Kalwani, Oklahoma State University
Paper 1600-2015:
Tips for Publishing in Health Care Journals with the Medical Expenditure Panel Survey (MEPS) Data Using SAS®
This presentation provides an in-depth analysis, with example SAS® code, of the health care use and expenditures associated with depression among individuals with heart disease using the 2012 Medical Expenditure Panel Survey (MEPS) data. A cross-sectional study design was used to identify differences in health care use and expenditures between depressed (n = 601) and nondepressed (n = 1,720) individuals among patients with heart disease in the United States. Multivariate regression analyses using the SAS survey analysis procedures were conducted to estimate the incremental health services and direct medical costs (inpatient, outpatient, emergency room, prescription drugs, and other) attributable to depression. The prevalence of depression among individuals with heart disease in 2012 was estimated at 27.1% (6.48 million persons) and their total direct medical costs were estimated at approximately $110 billion in 2012 U.S. dollars. Younger adults (< 60 years), women, unmarried, poor, and sicker individuals with heart disease were more likely to have depression. Patients with heart disease and depression had more hospital discharges (relative ratio (RR) = 1.06, 95% confidence interval (CI) [1.02, 1.09]), office-based visits (RR = 1.27, 95% CI [1.15, 1.41]), emergency room visits (RR = 1.08, 95% CI [1.02, 1.14]), and prescribed medicines (RR = 1.89, 95% CI [1.70, 2.11]) than their counterparts without depression. Finally, among individuals with heart disease, overall health care expenditures for individuals with depression were 69% higher than those for individuals without depression (RR = 1.69, 95% CI [1.44, 1.99]). The conclusion is that depression in individuals with heart disease is associated with increased health care use and expenditures, even after adjusting for differences in age, gender, race/ethnicity, marital status, poverty level, and medical comorbidity.
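Survey-design-adjusted estimation of this kind typically supplies the MEPS strata, cluster, and weight variables to a survey procedure. A hedged sketch follows; the variable names reflect common 2012 MEPS file conventions but should be verified against the actual file, and DEPRESSED is an assumed 0/1 indicator:
   proc surveyreg data=meps2012;
      strata  varstr;      /* variance estimation stratum */
      cluster varpsu;      /* variance estimation PSU     */
      weight  perwt12f;    /* 2012 person-level weight    */
      model totexp12 = depressed age sex;
   run;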
Read the paper (PDF).
Seungyoung Hwang, Johns Hopkins University Bloomberg School of Public Health
Paper 3268-2015:
To %Bquote or Not to %Bquote? That Is the Question (Which Drives SAS® Macro Programmers around the Bend)
The SAS® macro processor is a powerful ally, but it requires respect. There are a myriad of macro functions available, most of which emulate DATA step functions, but some of which require special consideration to fully use their capabilities. Questions to be answered include the following: When should you protect the macro variable? During macro compilation, during macro execution? (What do those phrases even mean?) How do you know when to use which macro function? %BQUOTE(), %NBRQUOTE(), %UNQUOTE(), %SUPERQ(), and so on? What's with the %Q prefix of some macro functions? And more: %SYSFUNC(), %SYSCALL, and so on. Macro developers will no longer be daunted by the complexity of choices. With a little clarification, the power of these macro functions will open up new possibilities.
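A tiny demonstration of the compile-time versus execution-time distinction:
   %let company = Johnson %str(&) Johnson;   /* %STR masks the & at macro compilation */
   %let tricky  = %nrstr(AT&T);              /* %NRSTR keeps &T from ever resolving   */
   %put |&company| |%superq(tricky)|;        /* %SUPERQ retrieves without resolving   */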
Read the paper (PDF).
Andrew Howell, ANJ Solutions
Paper 2785-2015:
Transpose Data Sets by Merge
Using PROC TRANSPOSE to make wide files wider requires running separate PROC TRANSPOSE steps for each variable that you want transposed, as well as a DATA step using a MERGE statement to combine all of the transposed files. In addition, if you want the variables in a specific order, an extra DATA step is needed to rearrange the variable ordering. This paper presents a method that accomplishes the task in a simpler manner using less code and requiring fewer steps, and which runs n times faster than PROC TRANSPOSE (where n=the number of variables to be transposed).
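The general shape of a merge-based transposition (the paper's macro generalizes and speeds this up; the names here are invented):
   data wide;
      merge long(where=(year=2013) rename=(sales=sales2013))
            long(where=(year=2014) rename=(sales=sales2014));
      by id;
      drop year;
   run;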
Read the paper (PDF). | Download the data file (ZIP).
Keshan Xia, 3GOLDEN Beijing Technologies Co. Ltd., Beijing, China
Matthew Kastin, I-Behavior
Arthur Tabachneck, AnalystFinder, Inc.
Paper 3081-2015:
Tweet-O-Matic: An Automated Approach to Batch Processing of Tweets
Currently, there are several methods for reading JSON formatted files into SAS® that depend on the version of SAS and which products are licensed. These methods include user-defined macros, visual analytics, PROC GROOVY, and more. The user-defined macro %GrabTweet, in particular, provides a simple way to directly read JSON-formatted tweets into SAS® 9.3. The main limitation of %GrabTweet is that it requires the user to repeatedly run the macro in order to download large amounts of data over time. Manually downloading tweets while conforming to the Twitter rate limits might cause missing observations and is time-consuming overall. Imagine having to sit by your computer the entire day to continuously grab data every 15 minutes, just to download a complete data set of tweets for a popular event. Fortunately, the %GrabTweet macro can be modified to automate the retrieval of Twitter data based on the rate that the tweets are coming in. This paper describes the application of the %GrabTweet macro combined with batch processing to download tweets without manual intervention. Users can specify the phrase parameters they want, run the batch processing macro, leave their computer to automatically download tweets overnight, and return to a complete data set of recent Twitter activity. The batch processing implements an automated retrieval of tweets through an algorithm that assesses the rate of tweets for the specified topic in order to make downloading large amounts of data simpler and effortless for the user.
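The automation amounts to wrapping the download macro in a timed loop. A sketch that assumes %GrabTweet is already compiled and accepts the search phrase as its only argument (check the original macro's signature):
   %macro poll_tweets(phrase=, cycles=);
      %do i = 1 %to &cycles;
         %GrabTweet(&phrase)           /* the paper's user-defined macro */
         data _null_;
            rc = sleep(900, 1);        /* pause 15 minutes between pulls */
         run;
      %end;
   %mend poll_tweets;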
Read the paper (PDF).
Isabel Litton, California Polytechnic State University, SLO
Rebecca Ottesen, City of Hope and Cal Poly SLO
Paper 3518-2015:
Twelve Ways to Better Graphs
If you are looking for ways to make your graphs more communication-effective, this tutorial can help. It covers both the new ODS Graphics SG (Statistical Graphics) procedures and the traditional SAS/GRAPH® software G procedures. The focus is on management reporting and presentation graphs, but the principles are relevant for statistical graphs as well. Important features unique to SAS® 9.4 are included, but most of the designs and construction methods apply to earlier versions as well. The principles of good graphic design are actually independent of your choice of software.
Read the paper (PDF).
LeRoy Bessler, Bessler Consulting and Research
U
Paper 1640-2015:
Understanding Characteristics of Insider Threats by Using Feature Extraction
This paper explores feature extraction from unstructured text variables using Term Frequency-Inverse Document Frequency (TF-IDF) weighting algorithms coded in Base SAS®. Data sets with unstructured text variables can often hold a lot of potential to enable better predictive analysis and document clustering. Each of these unstructured text variables can be used as inputs to build an enriched data set-specific inverted index, and the most significant terms from this index can be used as single word queries to weight the importance of the term to each document from the corpus. This paper also explores the usage of hash objects to build the inverted indices from the unstructured text variables. We find that hash objects provide a considerable increase in algorithm efficiency, and our experiments show that a novel weighting algorithm proposed by Paik (2013) best enables meaningful feature extraction. Our TF-IDF implementations are tested against a publicly available data breach data set to understand patterns specific to insider threats to an organization.
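The classic weighting underneath all TF-IDF variants is weight(t,d) = tf(t,d) * log(N/df(t)); as a one-line sketch over a prepared term-count table (the columns are assumed):
   data tfidf;
      set termcounts;                  /* one row per (document, term) pair */
      weight = tf * log(ndocs / df);   /* term frequency x inverse document frequency */
   run;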
Read the paper (PDF). | Watch the recording.
Ila Gokarn, Singapore Management University
Clifton Phua, SAS
Paper 3408-2015:
Understanding Patterns in the Utilization and Costs of Elbow Reconstruction Surgeries: A Healthcare Procedure that is Common among Baseball Pitchers
Athletes in sports, such as baseball and softball, commonly undergo elbow reconstruction surgeries. There is research that suggests that the rate of elbow reconstruction surgeries among professional baseball pitchers continues to rise by leaps and bounds. Given the trend found among professional pitchers, the current study reviews patterns of elbow reconstruction surgery among the privately insured population. The study examined trends (for example, cost, age, geography, and utilization) in elbow reconstruction surgeries among privately insured patients using analytic tools such as SAS® Enterprise Guide® and SAS® Visual Analytics, based on the medical and surgical claims data from the FAIR Health National Private Insurance Claims (NPIC) database. The findings of the study suggested that there are discernable patterns in the prevalence of elbow reconstruction surgeries over time and across specific geographic regions.
Read the paper (PDF). | Download the data file (ZIP).
Jeff Dang, FAIR Health
Paper 3285-2015:
Unraveling the Knot of Ampersands
We've all heard it before: 'If two ampersands don't work, add a third.' But how many of us really know how ampersands work behind the scenes? We show the function of multiple ampersands by going through examples of the common two- and three-ampersand scenarios, and expand to show four, five, six, and even seven ampersands, and explain when they might be (rarely) useful.
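The two rescans in miniature:
   %let fruit  = apple;
   %let fruit1 = banana;
   %let apple1 = Fuji;
   %let i      = 1;
   %put &&fruit&i;    /* pass 1: && -> &, &i -> 1; pass 2: &fruit1 -> banana */
   %put &&&fruit&i;   /* pass 1: && -> &, &fruit -> apple, &i -> 1; pass 2: &apple1 -> Fuji */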
Read the paper (PDF).
Joe Matise, NORC at the University of Chicago
Paper 3141-2015:
Unstructured Data Mining to Improve Customer Experience in Interactive Voice Response Systems
Interactive Voice Response (IVR) systems are likely one of the best and worst gifts to the world of communication, depending on who you ask. Businesses love IVR systems because they take out hundreds of millions of dollars of call center costs in automation of routine tasks, while consumers hate IVRs because they want to talk to an agent! It is a delicate balancing act to manage an IVR system that saves money for the business, yet is smart enough to minimize consumer abrasion by knowing who they are, why they are calling, and providing an easy automated solution or a quick route to an agent. There are many aspects to designing such IVR systems, including engineering, application development, omni-channel integration, user interface design, and data analytics. For larger call volume businesses, IVRs generate terabytes of data per year, with hundreds of millions of rows per day that track all system and customer-facing events. The data is stored in various formats and is often unstructured (lengthy character fields that store API return information or text fields containing consumer utterances). The focus of this talk is the development of a data mining framework based on SAS® that is used to parse and analyze IVR data in order to provide insights into usability of the application across various customer segments. Certain use cases are also provided.
Read the paper (PDF).
Dmitriy Khots, West Corp
Paper 3279-2015:
Update Microsoft Excel While Maintaining Your Formats
There have been many SAS® Global Forum papers written about getting your data into SAS® from Microsoft Excel and getting your data back out to Excel after using SAS to manipulate it. But sometimes you have to update Excel files with special formatting and formulas that would be too hard to replicate with Dynamic Data Exchange (DDE) or that change too often to make DDE worthwhile. But we can still use the output prowess of SAS and a sprinkling of Visual Basic for Applications (VBA) to maintain the existing formatting and formulas in your Excel file! This paper focuses on the possibilities you have in updating Excel files by reading in the data, using SAS to modify it as needed, and then using DDE and a simple Excel VBA macro to output back to Excel and use a formula while maintaining the existing formatting that is present in your source Excel file.
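The DDE leg of the round trip writes straight into a cell range of an open workbook; a sketch that assumes Excel is already running with the (hypothetical) workbook open:
   filename xlrng dde 'excel|[Report.xlsx]Sheet1!r2c1:r100c2' notab;
   data _null_;
      set updates;             /* hypothetical refreshed values */
      file xlrng;
      put region '09'x sales;  /* '09'x tabs to the next cell */
   run;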
Read the paper (PDF). | Download the data file (ZIP).
Brian Wrobel, Pearson
Paper 3339-2015:
Using Analytics To Help Win The Presidential Election
In 2012, the Obama campaign used advanced analytics to target voters, especially in social media channels. Millions of voters were scored on models each night to predict their voting patterns. These models were used as the driver for all campaign decisions, including TV ads, budgeting, canvassing, and digital strategies. This presentation covers how the Obama campaign strategies worked, what's in store for analytics in future elections, and how these strategies can be applied in the business world.
Read the paper (PDF). | Watch the recording.
Peter Tanner, Capital One
Paper 2740-2015:
Using Heat Maps to Compare Clusters of Ontario DWI Drivers
SAS® PROC FASTCLUS generates five clusters for the group of repeat clients of Ontario's Remedial Measures program. Heat map tables are shown for selected variables such as demographics, scales, factor, and drug use to visualize the difference between clusters.
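The clustering call itself is compact (the inputs are invented):
   proc fastclus data=clients maxclusters=5 out=clustered;
      var age severity drug_use;   /* standardized inputs assumed */
   run;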
Rosely Flam-Zalcman, CAMH
Robert Mann, CAMH
Rita Thomas, CAMH
Paper SAS1681-2015:
Using SAS/OR® to Optimize the Layout of Wind Farm Turbines
A Chinese wind energy company designs several hundred wind farms each year. An important step in its design process is micrositing, in which it creates a layout of turbines for a wind farm. The amount of energy that a wind farm generates is affected by geographical factors (such as elevation of the farm), wind speed, and wind direction. The types of turbines and their positions relative to each other also play a critical role in energy production. Currently the company is using an open-source software package to help with its micrositing. As the size of wind farms increases and the pace of their construction speeds up, the open-source software is no longer able to support the design requirements. The company wants to work with a commercial software vendor that can help resolve scalability and performance issues. This paper describes the use of the OPTMODEL and OPTLSO procedures on the SAS® High-Performance Analytics infrastructure together with the FCMP procedure to model and solve this highly nonlinear optimization problem. Experimental results show that the proposed solution can meet the company's requirements for scalability and performance.
Read the paper (PDF).
Sherry (Wei) Xu, SAS
Steven Gardner, SAS
Joshua Griffin, SAS
Baris Kacar, SAS
Jinxin Yi, SAS
Paper 3208-2015:
Using SAS/STAT® Software to Implement a Multivariate Adaptive Outlier Detection Approach to Distinguish Outliers from Extreme Values
Hawkins (1980) defines an outlier as 'an observation that deviates so much from other observations as to arouse the suspicion that it was generated by a different mechanism.' To identify data outliers, a classic multivariate outlier detection approach implements the Robust Mahalanobis Distance Method by splitting the distribution of distance values into two subsets (within-the-norm and out-of-the-norm), with the threshold value usually set to the 97.5% quantile of the Chi-Square distribution with p (number of variables) degrees of freedom; items whose distance values are beyond it are labeled out-of-the-norm. This threshold value is an arbitrary number, however, and it might flag as out-of-the-norm a number of items that are actually extreme values of the baseline distribution rather than outliers. Therefore, it is desirable to identify an additional threshold, a cutoff point that divides the set of out-of-norm points in two subsets--extreme values and outliers. One way to do this--in particular for larger databases--is to increase the threshold value to another arbitrary number, but this approach requires taking into consideration the size of the data set since size affects the threshold separating outliers from extreme values. A 2003 article by Gervini (Journal of Multivariate Statistics) proposes an adaptive threshold that increases with the number of items n if the data is clean but remains bounded if there are outliers in the data. In 2005, Filzmoser, Garrett, and Reimann (Computers & Geosciences) built on Gervini's contribution to derive by simulation a relationship between the number of items n, the number of variables in the data p, and a critical ancillary variable for the determination of outlier thresholds. This paper implements the Gervini adaptive threshold value estimator by using PROC ROBUSTREG and the SAS® Chi-Square functions CINV and PROBCHI, available in the SAS/STAT® environment. It also provides data simulations to illustrate the reliability and the flexibility of the method in distinguishing true outliers from extreme values.
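The classic (non-adaptive) cutoff that the paper starts from can be written directly with the Chi-Square quantile function (the distance data set and p = 4 variables are hypothetical):
   data flagged;
      set distances;             /* one squared robust distance per item */
      cutoff = cinv(0.975, 4);   /* 97.5% Chi-Square quantile, p = 4 assumed */
      out_of_norm = (rd_sq > cutoff);
   run;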
Read the paper (PDF).
Paulo Macedo, Integrity Management Services, LLC
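As a starting point, the sketch below shows the classic (non-adaptive) step of this approach: robust leverage diagnostics from PROC ROBUSTREG and the conventional 97.5% chi-square cutoff computed with CINV. The data set and variable names are hypothetical, and Gervini's adaptive threshold itself is not implemented here.

/* Robust diagnostics; MYDATA, Y, and X1-X3 are hypothetical */
proc robustreg data=mydata method=lts;
   model y = x1 x2 x3 / leverage;
   output out=diag leverage=is_leverage;   /* 1 = beyond the robust cutoff */
run;

data _null_;
   p = 3;                           /* number of covariates */
   cutoff = sqrt(cinv(0.975, p));   /* classic threshold on robust distance */
   put 'Robust distance cutoff: ' cutoff;
run;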
Paper 1340-2015:
Using SAS® Macros to Flag Claims Based on Medical Codes
Many epidemiological studies use medical claims to identify and describe a population. But finding out who was diagnosed, and who received treatment, isn't always simple. Each claim can have dozens of medical codes, with different types of codes for procedures, drugs, and diagnoses. Even a basic definition of treatment could require a search for any one of 100 different codes. A SAS® macro may come to mind, but generalizing the macro to work with different codes and types allows it to be reused in a variety of different scenarios. We look at a number of examples, starting with a single code type and variable. Then we consider multiple code variables, multiple code types, and multiple flag variables. We show how these macros can be combined and customized for different data with minimal rework. Macro flexibility and reusability are also discussed, along with ways to keep our list of medical codes separate from our program. Finally, we discuss time-dependent medical codes, codes requiring database lookup, and macro performance.
Read the paper (PDF). | Download the data file (ZIP).
Andy Karnopp, Fred Hutchinson Cancer Research Center
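A simplified sketch of the kind of reusable flagging macro described above; the data set, code variables, and code list are all hypothetical:

%macro flag_codes(data=, out=, codevars=, codelist=, flag=);
   data &out;
      set &data;
      array _codes{*} &codevars;
      &flag = 0;
      do _i = 1 to dim(_codes);
         if _codes{_i} in (&codelist) then &flag = 1;
      end;
      drop _i;
   run;
%mend flag_codes;

/* Flag any claim carrying one of the target diagnosis codes in any of ten slots */
%flag_codes(data=claims, out=claims_flagged,
            codevars=diag1-diag10,
            codelist=%str('E11','E12','E13'),
            flag=diabetes_flag);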
Paper 3202-2015:
Using SAS® Mapping Functionality to Measure and Present the Veracity of Location Data
Crowdsourcing of data is growing rapidly, enabled by smart devices equipped with assisted GPS location, tagging of photos, and mapping of other aspects of users' lives and activities. When such data is used, a fundamental assumption is made that the reported locations are accurate within the usual GPS limitation of approximately 10 m. However, as a result of a wide range of technical issues, it turns out that the accuracy of the reported locations is highly variable and cannot be relied on; some locations are accurate, but many are highly inaccurate, and that can affect many of the decisions being made based on the data. An analysis of a set of data is presented that demonstrates that this assumption is flawed, and examples are provided of levels of inaccuracy that have significant consequences in a range of contexts. Using Base SAS®, the paper demonstrates the quality and veracity of the data and the scale of the errors that can be present. This analysis has critical significance in fields such as mobile location-based marketing, forensics, and law.
Read the paper (PDF).
Richard Self, University of Derby
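A minimal Base SAS sketch of this kind of accuracy measurement, assuming a hypothetical data set with reported and trusted reference coordinates, using the GEODIST function:

data gps_error;
   set reported_locations;   /* hypothetical: rep_lat rep_lon ref_lat ref_lon */
   error_m = 1000 * geodist(rep_lat, rep_lon, ref_lat, ref_lon, 'K');   /* km to m */
   accurate = (error_m <= 10);   /* within the usual 10 m GPS assumption */
run;

proc means data=gps_error n mean p50 p95 max;
   var error_m;
run;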
Paper 3281-2015:
Using SAS® to Create Episodes-of-Hospitalization for Health Services Research
An essential part of health services research is describing the use and sequencing of a variety of health services. One of the most frequently examined health services is hospitalization. A common problem in describing hospitalizations is that a patient might have multiple hospitalizations to treat the same health problem. Specifically, a hospitalized patient might be (1) sent to and returned from another facility in a single day for testing, (2) transferred from one hospital to another, and/or (3) discharged home and re-admitted within 24 hours. In all cases, these hospitalizations are treating the same underlying health problem and should be considered as a single episode. If examined without regard for the episode, a patient would be identified as having 4 hospitalizations (the initial hospitalization, the testing hospitalization, the transfer hospitalization, and the readmission hospitalization). In reality, they had one hospitalization episode spanning multiple facilities. IMPORTANCE: Failing to account for multiple hospitalizations in the same episode has implications for many disciplines, including health services research, health services planning, and quality improvement for patient safety. HEALTH SERVICES RESEARCH: Hospitalizations will be counted multiple times, leading to an overestimate of the number of hospitalizations a person had. For example, a person can be identified as having 4 hospitalizations when in reality they had one episode of hospitalization. This will result in a person appearing to be a higher user of health care than is true. RESOURCE PLANNING FOR HEALTH SERVICES: The average time and resources needed to treat a specific health problem may be underestimated. To illustrate, if a patient spends 10 days each in 3 different hospitals in the same episode, the total number of days needed to treat the health problem is 30 days, but each hospital will believe it is only 10, and planned resourcing may be inadequate. QUALITY IMPROVEMENT FOR PATIENT SAFETY: Hospital-acquired infections are a serious concern and a major cause of extended hospital stays, morbidity, and death. As a result, many hospitals have quality improvement programs that monitor the occurrence of infections in order to identify ways to reduce them. If episodes of hospitalizations are not considered, an infection acquired in a hospital that does not manifest until a patient is transferred to a different hospital will incorrectly be attributed to the receiving hospital. PROPOSAL: We have developed SAS® code to identify episodes of hospitalizations, the sequence of hospitalizations within each episode, and the overall duration of the episode. The output clearly displays the data in an intuitive and easy-to-understand format. APPLICATION: The method we will describe and the associated SAS code will be useful not only to health services researchers, but also to anyone who works with temporal data that includes nested, overlapping, and subsequent events.
Read the paper (PDF).
Meriç Osman, Health Quality Council
Jacqueline Quail, Saskatchewan Health Quality Council
Nianping Hu, Saskatchewan Health Quality Council
Nedeene Hudema, Saskatchewan Health Quality Council
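A minimal sketch of the linkage logic (not the authors' full code): after sorting by patient and admission date, a new admission is chained to the current episode when it starts within 1 day of the latest discharge. All names are hypothetical, and dates are assumed to be SAS date values.

proc sort data=hospitalizations out=hosp_sorted;
   by patient_id admit_date;
run;

data episodes;
   set hosp_sorted;
   by patient_id;
   retain prev_disch;
   if first.patient_id then do;
      episode_id + 1;                /* start a new episode for each patient */
      prev_disch = disch_date;
   end;
   else do;
      if admit_date > prev_disch + 1 then episode_id + 1;   /* gap > 1 day: new episode */
      prev_disch = max(prev_disch, disch_date);             /* handles nested stays */
   end;
   drop prev_disch;
run;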
Paper 3452-2015:
Using SAS® to Increase Profitability through Cluster Analysis and Simplicity: Follow the Money
Developing a quality product or service while improving cost management and maximizing profit is a challenging goal for any company. Finding the optimal balance between efficiency and profitability is not easy. The same can be said about the development of a predictive statistical model. On the one hand, the model should predict as accurately as possible. On the other hand, having too many predictors can end up costing the company money. One of the purposes of this project is to explore the cost of simplicity. When is it worth having a simpler model, and what are some of the costs of using a more complex one? The answer to that question leads us to another one: How can a predictive statistical model be maximized in order to increase a company's profitability? Using data from the consumer credit risk domain provided by CompuCredit (now Atlanticus), we used logistic regression to build binary classification models to predict the likelihood of default. This project compares two of these models. Although the original data set had several hundred predictor variables and more than a million observations, I chose to use rather simple models. My goal was to develop a model with as few predictors as possible, while not going below a concordant level of 80%. Two models were evaluated and compared based on efficiency, simplicity, and profitability. Using the selected model, cluster analysis was then performed in order to maximize the estimated profitability. Finally, the analysis was taken one step further through a supervised segmentation process, in order to target the most profitable segment of the best cluster.
Read the paper (PDF).
Sherrie Rodriguez, Kennesaw State University
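A compact sketch of the modeling flow described above, with hypothetical data set and predictor names: a small logistic model (PROC LOGISTIC reports the concordant percentage in its association statistics by default), followed by clustering of the scored records.

proc logistic data=credit;
   model default(event='1') = util_ratio num_delinq acct_age;
   output out=scored p=p_default;
run;

/* Variables would typically be standardized before clustering */
proc fastclus data=scored maxclusters=5 out=clustered;
   var p_default util_ratio num_delinq;
run;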
Paper 1335-2015:
Using the GLIMMIX and GENMOD Procedures to Analyze Longitudinal Data from a Department of Veterans Affairs Multisite Randomized Controlled Trial
Many SAS® procedures can be used to analyze longitudinal data. This study used a multisite randomized controlled trial design to demonstrate the use of two SAS procedures, GLIMMIX and GENMOD, to analyze longitudinal data from five Department of Veterans Affairs Medical Centers (VAMCs). Older male veterans (n = 1222) seen in VAMC primary care clinics were randomly assigned to two behavioral health models, integrated (n = 605) and enhanced referral (n = 617). Data were collected at baseline and at 3-, 6-, and 12-month follow-up. A mixed-effects repeated measures model was used to examine the dependent variable, problem drinking, which was defined both as a count and as a dichotomous variable, from baseline to 12-month follow-up. Sociodemographics and depressive symptoms were included as covariates. First, bivariate analyses included general linear model and chi-square tests to examine covariates by group and group by problem drinking outcomes. All significant covariates were included in the GLIMMIX and GENMOD models. Then, multivariate analysis included mixed models with generalized estimating equations (GEEs). The effect of group, time, and the interaction effect of group by time were examined after controlling for covariates. Multivariate results were inconsistent between GLIMMIX and GENMOD using lognormal, Gaussian, Weibull, and gamma distributions. SAS is a powerful statistical program for analyzing data from longitudinal studies.
Read the paper (PDF).
Abbas Tavakoli, University of South Carolina/College of Nursing
Marlene Al-Barwani, University of South Carolina
Sue Levkoff, University of South Carolina
Selina McKinney, University of South Carolina
Nikki Wooten, University of South Carolina
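A compact sketch of the two approaches compared in the paper, using hypothetical variable names and showing only the Poisson case: a mixed-effects model in PROC GLIMMIX and a GEE model in PROC GENMOD.

proc glimmix data=drinking;
   class subject group time;
   model drinks = group time group*time age depress / dist=poisson link=log solution;
   random intercept / subject=subject;   /* subject-level random effect */
run;

proc genmod data=drinking;
   class subject group time;
   model drinks = group time group*time age depress / dist=poisson link=log;
   repeated subject=subject / type=cs;   /* GEE with compound symmetry */
run;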
Paper 2182-2015:
Using the SAS® Output Delivery System (ODS) and the TEMPLATE Procedure to Replace Dynamic Data Exchange (DDE)
Many papers have been written over the years that describe how to use Dynamic Data Exchange (DDE) to pass data from SAS® to Excel. This presentation aims to show you how to do the same exchange with the SAS Output Delivery System (ODS) and the TEMPLATE Procedure.
Read the paper (PDF). | Download the data file (ZIP).
Peter Timusk, Statistics Canada
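One simple ODS route, shown here as a sketch rather than the presenter's exact approach, assuming a SAS 9.4 release with ODS EXCEL and a hypothetical output path:

ods excel file='/tmp/class.xlsx' options(sheet_name='Class');
proc print data=sashelp.class noobs;
run;
ods excel close;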
V
Paper 3475-2015:
Video Killed the Radio Star
How do you serve 25 million video ads a day to Internet users in 25 countries, while ensuring that you target the right ads to the right people on the right websites at the right time? With a lot of help from math, that's how! Come hear how Videology, an Internet advertising company, combines mathematical programming, predictive modeling, and big data techniques to meet the expectations of advertisers and online publishers, while respecting the privacy of online users and combatting fraudulent Internet traffic.
Kaushik Sinha, Videology
W
Paper SAS1440-2015:
Want an Early Picture of the Data Quality Status of Your Analysis Data? SAS® Visual Analytics Shows You How
When you are analyzing your data and building your models, you often find out that the data cannot be used in the intended way. Systematic patterns, incomplete data, and inconsistencies from a business point of view are often the reasons. You wish you could get a complete picture of the quality status of your data much earlier in the analytic lifecycle. SAS® analytics tools like SAS® Visual Analytics help you profile and visualize the quality status of your data in an easy and powerful way. In this session, you learn advanced methods for analytic data quality profiling. You will see case studies based on real-life data, where we look at time series data from a bird's-eye view and interactively profile GPS trackpoint data from a sail race.
Read the paper (PDF). | Download the data file (ZIP).
Gerhard Svolba, SAS
Paper SAS1832-2015:
What's New in SAS® Studio?
If you have not had a chance to explore SAS® Studio yet, or if you're anxious to see what's new, this paper gives you an introduction to this new browser-based interface for SAS® programmers and a peek at what's coming. With SAS Studio, you can access your data files, libraries, and existing programs, and you can write new programs while using SAS software behind the scenes. SAS Studio connects to a SAS server in order to process SAS programs. The SAS server can be a hosted server in a cloud environment, a server in your local environment, or a copy of SAS on your local machine. It's a comfortable environment for those used to the traditional SAS windowing environment (SAS® Display Manager), but new features like a query window, process flow diagrams, and tasks have been added to appeal to traditional SAS® Enterprise Guide® users.
Read the paper (PDF).
Mike Porter, SAS
Michael Monaco, SAS
Amy Peters, SAS
Paper SAS1572-2015:
When I'm 64-bit: How to Still Use 32-bit DLLs in Microsoft Windows
Now that SAS® users are moving to 64-bit Microsoft Windows platforms, some are discovering that vendor-supplied DLLs might still be 32-bit. Since 64-bit applications cannot use 32-bit DLLs, this would present serious technical issues. This paper explains how the MODULE routines in SAS can be used to call into 32-bit DLLs successfully, using new features added in SAS® 9.3.
Read the paper (PDF).
Rick Langston, SAS
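A minimal sketch of the MODULE facility in general (the 32-bit bridging features the paper covers are not shown): a routine is declared in a SASCBTBL attribute table and then called with MODULEN from a DATA step.

filename sascbtbl temp;
data _null_;
   file sascbtbl;
   put 'routine GetTickCount minarg=0 maxarg=0 stackpop=called returns=long module=kernel32;';
run;

data _null_;
   ticks = modulen('GetTickCount');   /* milliseconds since Windows booted */
   put ticks=;
run;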
Paper 3216-2015:
Where Did My Students Go?
Many freshmen leave their first college and go on to attend another institution. Some of these students are even successful in earning degrees elsewhere. As there is more focus on college graduation rates, this paper shows how the power of SAS® can pull in data from many disparate sources, including the National Student Clearinghouse, to answer questions on the minds of many institutional researchers. How do we use the data to answer questions such as 'What would my graduation rate be if these students graduated at my institution instead of at another one?', 'What types of schools do students leave to attend?', and 'Are there certain characteristics of students who leave, and are they concentrated in certain programs?' The data-handling capabilities of SAS are perfect for this type of analysis, and this presentation walks you through the process.
Read the paper (PDF).
Stephanie Thompson, Datamum
Paper 3387-2015:
Why Aren't Exception Handling Routines Routine? Toward Reliably Robust Code through Increased Quality Standards in Base SAS®
A familiar adage in firefighting--if you can predict it, you can prevent it--rings true in many circles of accident prevention, including software development. If you can predict that a fire, however unlikely, someday might rage through a structure, it's prudent to install smoke detectors to facilitate its rapid discovery. Moreover, the combination of smoke detectors, fire alarms, sprinklers, fire-retardant building materials, and rapid intervention might not prevent a fire from starting, but it can prevent the fire from spreading and facilitate its immediate and sometimes automatic extinguishment. Thus, as fire codes have grown to incorporate increasingly more restrictions and regulations, and as fire suppression gear, tools, and tactics have continued to advance, even the harrowing business of firefighting has become more reliable, efficient, and predictable. As operational SAS® data processes mature over time, they too should evolve to detect, respond to, and overcome dynamic environmental challenges. Erroneous data, invalid user input, disparate operating systems, network failures, memory errors, and other challenges can surprise users and cripple critical infrastructure. Exception handling describes both the identification of and response to adverse, unexpected, or untimely events that can cause process or program failure, as well as anticipated events or environmental attributes that must be handled dynamically through prescribed, predetermined channels. Rapid suppression and automatic return to functioning is the hopeful end state but, when catastrophic events do occur, exception handling routines can terminate a process or program gracefully while providing meaningful execution and environmental metrics to developers both for remediation and future model refinement. This presentation introduces fault-tolerant Base SAS® exception handling routines that facilitate robust, reliable, and responsible software design.
Read the paper (PDF).
Troy Hughes, Datmesis Analytics
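A minimal sketch of one such defensive pattern, with hypothetical macro and data set names: verify that a data set exists and is not locked before processing, and inspect &SYSCC afterward.

%macro safe_sort(ds=, out=, by=);
   %if not %sysfunc(exist(&ds)) %then %do;
      %put ERROR: [safe_sort] &ds does not exist - step skipped.;
      %return;
   %end;
   lock &ds;
   %if &syslckrc ne 0 %then %do;
      %put ERROR: [safe_sort] &ds is locked by another session - step skipped.;
      %return;
   %end;
   lock &ds clear;   /* release the test lock */
   proc sort data=&ds out=&out;
      by &by;
   run;
   %if &syscc > 4 %then %put ERROR: [safe_sort] PROC SORT failed - SYSCC=&syscc;
%mend safe_sort;

%safe_sort(ds=work.mydata, out=work.mydata_sorted, by=id);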
Paper 3322-2015:
Why Two Good SAS® Programmers Are Better Than One Great SAS® Programmer
The experiences of the programmer role in a large SAS® shop are shared. Shortages in SAS programming talent tend to result in one SAS programmer doing all of the production programming within a unit in a shop. In a real-world example, management realized the problem and brought in new programmers to help do the work. The new programmers actually improved the existing programmers' programs. It became easier for the experienced programmers to complete other programming assignments within the unit. And, the different programs in the shop had a standard structure. As a result, all of the programmers had a clearer picture of the work involved and knowledge hoarding was eliminated. Experienced programmers were now available when great SAS code needed to be written. Yet, they were not the only programmers who could do the work! With multiple programmers able to do the same tasks, vacations were possible and didn't threaten deadlines. It was even possible for these programmers to be assigned other tasks outside of the unit and broaden their own skills in statistical production work.
Read the paper (PDF).
Peter Timusk, Statistics Canada
Paper 3388-2015:
Will You Smell Smoke When Your Data Is on Fire? The SAS® Smoke Detector: A Scalable Quality Control Dashboard for Transactional and Persistent Data
Smoke detectors operate by comparing actual air quality to expected air quality standards and immediately alerting occupants when smoke or particle levels exceed established thresholds. Just as rapid identification of smoke (that is, poor air quality) can detect harmful fire and facilitate its early extinguishment, rapid detection of poor quality data can highlight data entry or ingestion errors, faulty logic, insufficient or inaccurate business rules, or process failure. Aspects of data quality--such as availability, completeness, correctness, and timeliness--should be assessed against stated requirements that account for the scope, objective, and intended use of data products. A single outlier, an accidentally locked data set, or even subtle modifications to a data structure can cause a robust extract-transform-load (ETL) infrastructure to grind to a halt or produce invalid results. Thus, a mature data infrastructure should incorporate quality assurance methods that facilitate robust processing and quality data products, as well as quality control methods that monitor and validate data products against their stated requirements. The SAS® Smoke Detector represents a scalable, generalizable solution that assesses the availability, completeness, and structure of persistent SAS data sets, ideal for finished data products or transactional data sets received with standardized frequency and format. Like a smoke detector, the quality control dashboard is not intended to discover the source of the blaze, but rather to sound an alarm to stakeholders that data have been modified, locked, deleted, or otherwise corrupted. Through rapid detection and response, the fidelity of data is increased as well as the responsiveness of developers to threats to data quality and validity.
Read the paper (PDF).
Troy Hughes, Datmesis Analytics
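In the same spirit, a minimal sketch of one such quality control check (names are hypothetical): confirm that a data set exists, can be opened, and holds at least the expected number of rows.

%macro check_table(ds=, min_rows=1);
   %local dsid nobs rc;
   %if not %sysfunc(exist(&ds)) %then %put ERROR: &ds is missing.;
   %else %do;
      %let dsid = %sysfunc(open(&ds));
      %if &dsid = 0 %then %put ERROR: &ds exists but cannot be opened (locked?).;
      %else %do;
         %let nobs = %sysfunc(attrn(&dsid, nlobs));
         %let rc = %sysfunc(close(&dsid));
         %if &nobs < &min_rows %then %put WARNING: &ds has only &nobs rows.;
         %else %put NOTE: &ds passed (&nobs rows).;
      %end;
   %end;
%mend check_table;

%check_table(ds=work.daily_feed, min_rows=100);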
Paper 3390-2015:
Working with PROC FEDSQL in SAS® 9.4
Working with multiple data sources in SAS® was not straightforward until PROC FEDSQL was introduced in the SAS® 9.4 release. Federated Query Language, or FEDSQL, is a vendor-independent language that provides a common SQL syntax for communicating across multiple relational databases without having to worry about vendor-specific SQL syntax. PROC FEDSQL is a SAS implementation of the FEDSQL language. PROC FEDSQL enables us to write federated queries that perform joins on tables from different databases with a single query, without having to load the tables into SAS individually and combine them using DATA steps and PROC SQL statements. The objective of this paper is to demonstrate how PROC FEDSQL fetches data from multiple data sources--such as a Microsoft SQL Server database, a MySQL database, and a SAS data set--and runs federated queries on all the data sources. Other powerful features of PROC FEDSQL, such as transactions and the FEDSQL pass-through facility, are discussed briefly.
Read the paper (PDF).
Zabiulla Mohammed, Oklahoma State University
Ganesh Kumar Gangarajula, Oklahoma State University
Pradeep Reddy Kalakota, Federal Home Loan Bank of Des Moines
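A minimal sketch of such a federated join; the librefs, connection options, and tables are hypothetical and assume the corresponding SAS/ACCESS engines are licensed:

libname mssql sqlsvr dsn=sales_dsn user=me password=XXXX;
libname mysqldb mysql server=dbhost database=orders user=me password=XXXX;

proc fedsql;
   create table work.combined as
   select c.cust_id, c.region, o.order_id, s.segment
   from mssql.customers c
   join mysqldb.orders o on c.cust_id = o.cust_id
   join work.segments s on c.cust_id = s.cust_id;
quit;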
Y
Paper 3262-2015:
Yes, SAS® Can Do! Manage External Files with SAS Programming
Managing and organizing external files and directories play an important part in our data analysis and business analytics work. A good file management system can streamline project management and file organization and significantly improve work efficiency. Therefore, under many circumstances, it is necessary to automate and standardize the file management processes through SAS® programming. Compared with managing SAS files via PROC DATASETS, managing external files is a much more challenging task that requires advanced programming skills. This paper presents and discusses various methods and approaches to managing external files with SAS programming. The illustrated methods and skills can have important applications in a wide variety of analytic work fields.
Read the paper (PDF).
Justin Jia, Trans Union
Amanda Lin, CIBC
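Two of the simpler techniques in this vein, sketched with hypothetical paths: creating a directory with DCREATE and deleting a file with FDELETE, all without issuing operating system commands.

%let newdir = %sysfunc(dcreate(archive, /project/output/));
%put NOTE: DCREATE returned: &newdir;   /* blank means the create failed */

filename oldfile '/project/output/stale_report.txt';
%let rc = %sysfunc(fdelete(oldfile));
%put NOTE: FDELETE return code: &rc;    /* 0 means the file was deleted */
filename oldfile clear;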
Paper 3259-2015:
You Deserve ARRAYS: How to Be More Efficient Using SAS®!
Everyone likes getting a raise, and using ARRAYs in SAS® can help you do just that! Using ARRAYs simplifies processing, allowing for reading and analyzing repetitive data with minimal coding. Improving the efficiency of your coding and, in turn, your SAS productivity is easier than you think! ARRAYs simplify coding by identifying a group of related variables that can then be referred to later in a DATA step. In this quick tip, you learn how to define an ARRAY using an ARRAY statement that establishes the ARRAY name, length, and elements. You also learn the criteria and limitations for the ARRAY name, the requirements for array elements to be processed as a group, and how to call an ARRAY and specific array elements within a DATA step. This quick tip reveals useful functions and operators, such as the DIM function and the OF operator within existing SAS functions, that make using ARRAYs an efficient and productive way to process data. This paper takes you through an example of how to do the same task with and without ARRAYs in order to illustrate the ease and benefits of using them. Your coding will be more efficient and your data analysis will be more productive, meaning you will deserve ARRAYs!
Read the paper (PDF).
Kate Burnett-Isaacs, Statistics Canada
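A minimal sketch of these techniques with hypothetical monthly variables: an ARRAY defined over twelve columns, processed with DIM, and summarized with the OF operator.

data yearly;
   set monthly_sales;                /* hypothetical: sales1-sales12 */
   array sales{*} sales1-sales12;
   do i = 1 to dim(sales);
      if sales{i} = . then sales{i} = 0;   /* recode missing months */
   end;
   total = sum(of sales{*});
   best  = max(of sales{*});
   drop i;
run;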
Paper 3407-2015:
You Say Day-ta, I Say Dat-a: Measuring Coding Differences, Considering Integrity and Transferring Data Quickly
We all know there are multiple ways to use SAS® language components to generate the same values in data sets and output (for example, using the DATA step versus PROC SQL, If-Then-Elses versus Format table conversions, PROC MEANS versus PROC SQL summarizations, and so on). However, do you compare those different ways to determine which are the most efficient in terms of computer resources used? Do you ever consider the time a programmer takes to develop or maintain code? In addition to determining efficient syntax, do you validate your resulting data sets? Do you ever check data values that must remain the same after being processed by multiple steps and verify that they really don't change? We share some simple coding techniques that have proven to save computer and human resources. We also explain some data validation and comparison techniques that ensure data integrity. In our distributed computing environment, we show a quick way to transfer data from a SAS server to a local client by using PROC DOWNLOAD and then PROC EXPORT on the client to convert the SAS data set to a Microsoft Excel file.
Read the paper (PDF).
Jason Beene, Wells Fargo Bank
Mary Katz, Wells Fargo Bank
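A minimal sketch of that transfer, assuming a SAS/CONNECT session is already signed on, SAS/ACCESS to PC Files is available, and the library and path are hypothetical:

rsubmit;
   proc download data=serverlib.results out=work.results;
   run;
endrsubmit;

proc export data=work.results
   outfile='C:\temp\results.xlsx'
   dbms=xlsx replace;
run;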
Paper 2004-2015:
Your Database Can Do SAS®, Too!
How often have you pulled oodles of data out of the corporate data warehouse down into SAS® for additional processing? This additional processing, sometimes thought to be uniquely SAS, might include FIRST. logic, cumulative totals, lag functionality, specialized summarization, or advanced date manipulation. Using the analytical (or OLAP) and windowing functionality available in many databases (for example, Teradata and IBM Netezza), all of this processing can be performed directly in the database without moving and reprocessing detail data unnecessarily. This presentation illustrates how to increase your coding and execution efficiency by using the database's power through your SAS environment.
Read the paper (PDF).
Harry Droogendyk, Stratia Consulting Inc.
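A minimal sketch of pushing such work into the database via explicit pass-through; the connection details and table are hypothetical, and the window function runs in Teradata rather than in a DATA step:

proc sql;
   connect to teradata (user=me password=XXXX server=tdprod);
   create table work.cumul as
   select * from connection to teradata (
      select acct_id, tx_date, amount,
             sum(amount) over (partition by acct_id
                               order by tx_date
                               rows unbounded preceding) as running_total
      from dw.transactions
   );
   disconnect from teradata;
quit;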