SAS Global Forum 2017 Proceedings

A C F I O R S U W Y

A

Session 1223-2017:

A SAS^® Macro for Covariate Specification in Linear, Logistic, and Survival Regression

Specifying the functional form of a covariate is a fundamental part of developing a regression model. The choice to include a variable as continuous, categorical, or as a spline can be determined by model fit. This paper offers an efficient and user-friendly SAS^® macro (%SPECI) to help analysts determine how best to specify the appropriate functional form of a covariate in a linear, logistic, and survival analysis model. For each model, our macro provides a graphical and statistical single-page comparison report of the covariate as a continuous, categorical, and restricted cubic spline variable so that users can easily compare and contrast results. The report includes the residual plot and distribution of the covariate. You can also include other covariates in the model for multivariable adjustment. The output displays the likelihood ratio statistic, the Akaike Information Criterion (AIC), as well as other model-specific statistics. The %SPECI macro is demonstrated using an example data set. The macro includes the PROC REG, PROC LOGISTIC, PROC PHREG, PROC REPORT, PROC SGPLOT, and more procedures in SAS^® 9.4.

Read the paper (PDF) | Download the data file (ZIP)

Sai Liu, Stanford University

Session 1090-2017:

A Step-by-Step Solution to Create a Customized Graph for Grouped Data Using Graph Template Language

Visual+D2:D18ization is a critical part of turning data into knowledge. A customized graph is essential to make data visualization meaningful, powerful, and interpretable. Furthermore, customizing grouped data into a desired layout with specific requirements such as clusters, colors, symbols, and patterns for each group can be challenging. This paper provides a start-from-scratch, step-by-step solution to create a customized graph for grouped data using SAS^® Graph Template Language (GTL). By analyzing the data and target graph with the available tools and options that GTL provided, this paper demonstrates GTL is a powerful and flexible tool to create a customized, complex graph.

Read the paper (PDF)

Elva Chen, Pharmacyclics

Session 0839-2017:

Accelerate Your Data Prep with SAS^® Code Accelerator

Accelerate your data preparation by having your DS2 execute without translation inside the Teradata database or on the Hadoop platform with SAS^® Code Accelerator. This presentation shows how easy it is to use SAS Code Accelerator via a live demonstration.

Read the paper (PDF)

Paul Segal, Teradata

Session 1165-2017:

Advanced, Dynamic, and Effective Dashboarding with SAS^® Visual Analytics

SAS^® Visual Analytics provides a robust platform to perform business intelligence through a high-end and advanced dashboarding style. In today's technology era, dashboards not only help in gaining insight into an organization's operations, but they also are a key performance indicator. In this paper, I discuss five important and frequently used objects in SAS Visual Analytics. These objects are used to get the most out of dashboards in an effective and efficient way. This paper covers the use of dates (as a format) in the date slider and gauges, cascading filters, custom graphs, linking reports within sections of the same report or with other reports, and associating buttons with graphs for dynamic functionality.

Read the paper (PDF)

Abhilasha Tiwari, Accenture

C

Session 0142-2017:

Create a Unique Datetime Stamp for Filenames or Many Other Purposes

This paper shows how to use Base SAS^® to create unique datetime stamps that can be used for naming external files. These filenames provide automatic versioning for systems and are intuitive and completely sortable. In addition, they provide enhanced flexibility compared to generation data sets, which can be created by SAS^® or by the operating system.

Read the paper (PDF)

Joe DeShon, Boehringer Ingelheim Animal Health

F

Session 1047-2017:

Find Daylight Saving Time and Forget It

Finding daylight saving time (DST) is a common task for manipulating time series data. The date of daylight saving time changes every year. If SAS^® programmers depend on manually entering the value of daylight saving time in their programs, the maintenance of the program becomes tedious. Using a SAS function can make finding the value easy. This paper discusses several ways to capture and use daylight saving time.

Read the paper (PDF)

Chao-Ying Hsieh, Southern Company Services, Inc.

Session 1108-2017:

Fitting a Cumulative Logistic Regression

Cumulative logistic regression models are used to predict an ordinal response. They have the assumption of proportional odds. Proportional odds means that the coefficients for each predictor category must be consistent or have parallel slopes across all levels of the response. This paper uses a sample data set to demonstrate how to test the proportional odds assumption. It shows how to use the UNEQUALSLOPES option when the assumption is violated. A cumulative logistic regression model is built, and then the performance of the model on a test set is compared to the performance of a generalized multinomial model. This shows the utility and necessity of the UNEQUALSLOPES option when building a cumulative logistic regression model. The procedures shown are produced using SAS^® Enterprise Guide^® 7.1.

Read the paper (PDF)

Shana Kelly, Spectrum Health

Session 0959-2017:

Formats Are Your Friends

Formats can be used for more than just making your data look nice. They can be used in memory lookup tables and to help you create data-driven code. This paper shows you how to build a format from a data set, how to write a format out as a data set, and how to use formats to make programs data driven. Examples are provided.

Read the paper (PDF)

Anita Measey, BMO Financial Group

Lei Sun, BMO Financial Group

I

Session 0971-2017:

Instant Formats in a Blink with PROC FORMAT CNTLIN=

Do you need to create a format instantly? Does the format have a lot of labels, and it would take a long time to type in all the codes and labels by hand? Sometimes, a SAS^® programmer needs to create a user-defined format for hundreds or thousands of codes, and he needs an easy way to accomplish this without having to type in all of the codes. SAS provides a way to create a user-defined format without having to type in any codes. If the codes and labels are in a text file, SAS data set, Excel file, or in any file that can be converted to a SAS data set, then a SAS user-defined format can be created on the fly. The CNTLIN=option of PROC FORMAT allows a user to create a user-defined format or informat from raw data or from a SAS file. This paper demonstrates how to create two user-defined formats instantly from a raw text file on our Census Bureau website. It explains how to use these user-defined formats for the final report and final output data set from PROC TABULATE. The paper focuses on the CNTLIN= option of PROC FORMAT, not the CNTLOUT= option.

Read the paper (PDF)

Christopher Boniface, U.S. Census Bureau

O

Session 0973-2017:

ODS TAGSETS.EXCELXP and ODS EXCEL Showdown

Do you create Excel files from SAS^®? Do you use the ODS EXCELXP tagset or the ODS EXCEL destination? In this presentation, the EXCELXP tagset and the ODS EXCEL destination are compared face to face. There's gonna be a showdown! We give quick tips for each and show how to create Excel files for our Special Census program. Pros of each method are explored. We show the added benefits of the ODS EXCEL destination. We display how to create XML files with the EXCELXP tagset. We present how to use TAGATTR formats with the EXCELXP tagset to ensure that leading and trailing zeros in Excel are preserved. We demonstrate how to create the same Excel file with the ODS EXCEL destination with SAS formats instead of with TAGATTR formats. We show how the ODS EXCEL destination creates native Excel files. One of the drawbacks of an XML file created with the EXCELXP tagset is that a pop-up message is displayed in Excel each time you open it. We present differences using the ABSOLUTE_COLUMN_WIDTH= option in both methods.

Read the paper (PDF)

Christopher Boniface, U.S. Census Bureau

R

Session 1093-2017:

Run It in Parallel: Improving the Flow of Windows Services

SAS^® job flows created by Windows services have a problem. Currently, they can execute only jobs in a series (one at a time). This can slow down job processing, and it limits the utility of the flows. This paper shows how you can alter the flow of Windows services after they have been generated to enable jobs to run in parallel (side by side). A high-level overview of PROC GROOVY, which automates these changes, is provided, as well as a summary of the positives and negatives of running jobs in parallel.

Read the paper (PDF) | Download the data file (ZIP)

David Kratz, D-Wise Technologies Inc.

S

Session 1232-2017:

SAS^® Abbreviations: Shortcuts for Remembering Complicated Syntax

One of the many difficulties for a SAS^® programmer is remembering how to accurately use SAS syntax, especially syntax that includes many parameters. Not mastering basic syntax parameters definitely makes coding inefficient because the programmer has to check reference manuals constantly to ensure that syntax is correct. One of the more useful but somewhat unknown tools in SAS is the use of SAS abbreviations. This feature enables users to store text strings (such as the syntax of a DATA step function, a SAS procedure, or a complete DATA step) in a user-defined and easy-to-remember abbreviation. When this abbreviation is entered in the enhanced editor, SAS automatically brings up the corresponding stored syntax. Knowing how to use SAS abbreviations is beneficial to programmers with varying levels of SAS expertise. In this paper, various examples of using SAS abbreviations are demonstrated.

Yaorui Liu, USC

Session 0969-2017:

SAS^® Macros for Binning Predictors with a Binary Target

Binary logistic regression models are widely used in CRM (customer relationship management) or credit risk modeling. In these models, it is common to use nominal, ordinal, or discrete (NOD) predictors. NOD predictors typically are binned (reducing the number of their levels) before usage in a logistic model. The primary purpose of binning is to obtain parsimony without greatly reducing the strength of association of the predictor X to the binary target Y. In this paper, two SAS^® macros are discussed. The %NOD_BIN macro bins predictors with nominal values (and ordinal and discrete values) by collapsing levels to maximize information value (IV). The %ORDINAL_BIN macro is applied to predictors that are ordered and in which collapsing can occur only for levels that are adjacent in the ordering of X. The %ORDINAL_BIN macro finds all possible binning solutions by complete enumeration. Solutions are ranked by IV, and monotonic solutions are identified.

Read the paper (PDF)

Bruce Lund, Magnify Analytic Solutions

Session 1060-2017:

Snail Mail to Auto-Emails: Generating Emails from SAS^® with Attachments

Are you tired of constantly creating new emails each and every time you run a report, frantically searching for the reports, attaching said reports, and writing emails, all the while thinking there has to be a better way? Then, have I got some code to share with you! This session provides you with code to flee from your old ways of emailing data and reports. Instead, you set up your SAS^® code to send an email to your recipients. The email attaches the most current files each and every time the code is run. You do not have to do anything manually after you run your SAS code. This session provides SAS programmers with instructions about how to create their own email in a macro that is based on their current reports. We demonstrate different options to customize the code to add the email body (and to change the body) and to add attachments (such as PDF and Excel). We show you an additional macro that checks whether a file exists and adds a note in the SAS log if it is missing so that you won't get a warning message. Using SAS code, you will become more efficient and effective by automating a tedious process and reducing errors in email attachments, wording, and recipient lists.

Read the paper (PDF)

Crystal Carel, Baylor Scott & White Health

Session 0385-2017:

Some Tricks in Graph Template Language

SAS^® 9.4 Graph Template Language: Reference has more than 1300 pages and hundreds of options and statements. It is no surprise that programmers sometimes experience unexpected twists and turns when using the graph template language (GTL) to draw figures. Understandably, it is easy to become frustrated when your program fails to produce the desired graphs despite your best effort. Although SAS needs to continue improving GTL, this paper offers several tricks that help overcome some of the roadblocks in graphing.

Read the paper (PDF)

Amos Shu, AstraZeneca

Session 0984-2017:

Something for Nothing? Adding Flag Variables Using Group Descriptive Statistics and PROC SQL

Can you actually get something for nothing? With PROC SQL's subquery and remerging features, then yes, you can. When working with categorical variables, there is often a need to add flag variables based on group descriptive statistics, such as group counts and minimum and maximum values. Instead of first creating the group count or minimum or maximum values, and then merging the summarized data set with the original data set with conditional statements creating a flag variable, why not take advantage of PROC SQL to complete three steps in one? With PROC SQL's subquery, CASE-WHEN clause, and summary functions by the group variable, you can easily remerge the new flag variable back with the original data set.

Read the paper (PDF)

Sunil Gupta, Cytel

Session 1419-2017:

Stored Processes or How to Make You Use SAS^® Without Even Knowing It!

Dealing with analysts and managers who do not know how to or want to use SAS^® can be quite tricky if everything you are doing uses SAS. This is where stored processes using SAS^® Enterprise Guide^® comes in handy. Once you know what they want to get out of the code, prompts can be defined in a smart and flexible way to give all users (whether they are SAS or not) full control over the output of the code. The key is having code that requires minimal maintenance and for you to be very flexible so that you can accommodate anything that the user comes up with. This session provides examples of credit risk stress testing where loss forecasting results were presented using different levels. Results were driven by a stored process prompt using a simple DATA step, PROC SQL, and PROC REPORT. This functionality can be used in other industries where data is shown using different levels of granularity.

Read the paper (PDF)

Edmund Lee, Bank of Montreal

Session 1229-2017:

String Search in SAS^® Visual Analytics Records

In SAS^® Visual Analytics, we demonstrate a search functionality that enables users to filter a LASR table for records containing a search string. The search is performed on selected character fields that are defined for the table. The search string can be portions of words. Each additional string to search for narrows the search results.

Read the paper (PDF)

robbert rahamat, Accenture

U

Session 1074-2017:

Use Internal SAS Metadata User and Authentication Domain to Connect to an FTP Server

Within a SOX-compliant environment, a batch job is run. During the process, an FTP server needs to be accessed. The batch user password is not known and the FTP credentials are not known either. How safely and securely can we achieve this? The approach is to have an authentication domain within the SAS metadata created that has the FTP credentials. Create an internal SAS user within the SAS metadata. This user exists only within the SAS metadata, so it does not pose any risk. Create an FTP server within the SAS metadata. Add and link everything together within the SAS metadata. Within the SAS batch job, the SAS internal user will be used (with the use of the hashed password) to connect to the metadata to get the FTP credentials stored within the authentication domain and retrieve or upload the data.

Read the paper (PDF)

Sebastian Scanzi, S4S Consulting

Session 1122-2017:

Using Hash Tables for Creating Electronic Code Books

Hash tables are powerful tools when building an electronic code book, which often requires a lot of match-merging between the SAS^® data sets. In projects that span multiple years (e.g., longitudinal studies), there are usually thousands of new variables introduced at the end of every year or at the end of each phase of the project. These variables usually have the same stem or core as the previous year's variables. However, they differ only in a digit or two that usually signifies the year number of the project. So, every year, there is this extensive task of comparing thousands of new variables to older variables for the sake of carrying forward key database elements corresponding to the previously defined variables. These elements can include the length of the variable, data type, format, discrete or continuous flag, and so on. In our SAS program, hash objects are efficiently used to cut down not only time, but also the number of DATA and PROC steps used to accomplish the task. Clean and lean code is much easier to understand. A macro is used to create the data set containing new and older variables. For a specific new variable, the FIND method in hash objects is used in a loop to find the match to the most recent older variable. What was taking around a dozen PROC SQL steps is now a single DATA step using hash tables.

Read the paper (PDF)

Raghav Adimulam, Westat

Session 0169-2017:

Using the New ODS EXCEL Destination in SAS^® 9.4 When Working with Remote Servers

The ODS EXCEL destination has made sharing SAS^® reports and graphs much easier. What is even more exciting is that this destination is available for use regardless of the platform. This is extremely useful when reporting is performed on remote servers. This presentation goes through the basics of using the ODS EXCEL destination and shows specific examples of how to use this in a remote environment. Examples for both SAS^® on Windows and in SAS^® Enterprise Guide^® are provided.

Read the paper (PDF)

Thomas Bugg, Wells Fargo Home Mortgage

Session 0923-2017:

Using the SYMPUT Function to Automatically Choose Reference for Bivariate Cox Proportional Models

Bivariate Cox proportional models are used when we test the association between a single covariate and the outcome. The test repeats for each covariate of interest. SAS^® uses the last category as the default reference. This raises problems when we want to keep using 0 as our reference for each covariate. The reference group can be changed in the CLASS statement. But, if a format is associated with a covariate, we have to use the corresponding format instead of raw numeric data. This problem becomes even worse when we have to repeat the test and manually enter the reference every single time. This presentation demonstrates one way of fixing the problem using the MACRO function and SYMPUT function.

Read the paper (PDF) | Download the data file (ZIP)

Zhongjie Cai, University of Southern California

W

Session 1436-2017:

Writing SAS^® Code on the Fly Using SAS Code as Character Variables

A SAS^® program with the extension .SAS is simply a text file. This fact opens the door to many powerful results. You can read a typical SAS program into a SAS data set as a text file with a character variable, with one line of the program being one record in the data set. The program's code can be changed, and a new program can be written as a simple text file with a .SAS extension. This presentation shows an example of dynamically editing SAS code on the fly and generating statistics about SAS programs.

Read the paper (PDF)

Peter Timusk, Statistics Canada

Y

Session 1429-2017:

Yet Another Lookup Method: Updatable Indexed Data Sets

SAS^® has many methods of doing table lookups in DATA steps: formats, arrays, hash objects, the SASMSG function, indexed data sets, and so on. Of these methods, hash objects and indexed data sets enable you to specify multiple lookup keys and to return multiple table values. Both methods can be updated dynamically in the middle of a DATA step as you obtain new information (such as reading new keys from an input file or creating new synthetic keys). Hash objects are very flexible, fast, and fairly easy to use, but they are limited by the amount of data that can be held in memory. Indexed data sets can be slower, but they are not limited by what can be held in memory. As a result, they might be your only option in some circumstances. This presentation discusses how to use an indexed data set for table lookup and how to update it dynamically using the MODIFY statement and its allies.

Read the paper (PDF)

Jack Hamilton, Kaiser Permanente