All Papers A-Z

#
Session 4113-2020
%SURVEYCORRCOV Macro: Complex Survey Data Correlations for Multivariate Analysis and Model Building
SAS® SURVEY procedures cover the main topics of descriptive statistics (MEANS, FREQ) and regression (REG, LOGISTIC, PHREG). But as the use of complex surveys evolves, particularly among students who often use this data due to its high quality and low price, adding even more analytics that are suitable for this data further opens its horizons. Twelve SAS/STAT® procedures can use special SAS® data sets with the CORR and COV options as input data for analyses such as PRINCOMP, FACTOR, and VARCLUS. Having this functionality as our motivation, we extended Jessica Hampton's "PROC SURVEYCORR" approach to create a %SURVEYCORRCOV macro to include features of the CORR procedure. For example, rather than a vector of correlations, %SURVEYCORRCOV provides a matrix of correlations and their p-values, for both the observed values and the within-domain ranks. In addition, %SURVEYCORRCOV generates standard deviations, which can be used to create covariance matrices. The output data sets from %SURVEYCORRCOV can be used directly in procedures that use CORR and COV. We review the parameters for using %SURVEYCORRCOV and examples for use in multivariable analyses such as principal components and factor analysis, and how variable clustering can be part of a regression modeling approach. We also provide practical advice for all data users, such as when to use the correlation or covariance matrix, and orthogonal or non-orthogonal factors.
David R. Nelson and Siew Wong-Jacobson, Eli Lilly & Company
Session 5188-2020
20 in 20: Quick Tips for SAS® Enterprise Guide® Users
There are many time-saving and headache-saving tips and tricks you can use to make working in SAS® Enterprise Guide® a breeze. Did you know that you can create a process flow from a program with the click of a button? You will learn 20 tips and tricks for working in SAS Enterprise Guide in 20 minutes - tips for both SAS Enterprise Guide 7.x and 8.x. One tip per minute, and out of the twenty, you are guaranteed to find at least one nugget that will make your life easier.
Kelly Gray, SAS Institute
A
Session 4419-2020
A Beginner's Guide to Using ARRAYs and DO Loops
If you are copying and pasting code over and over to perform the same operation on multiple variables in a SAS® DATA step, you need to learn about arrays and DO loops. Arrays and DO loops are efficient and powerful data manipulation tools that you should have in your programmer's tool box. Arrays list the variables that you want to perform the same operation on and can be specified with or without the number of elements/variables in the array. DO loops are used to specify the operation across the elements in the array. This workshop will show you how to create an array and utilize DO loops to perform operations on the elements in an array, create new variables, and change a short, wide data structure to a long, skinny one.
Jennifer L. Waller, Augusta University
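The workshop's core idea can be sketched in a few lines of DATA step code; the data set and variable names below are hypothetical, not the workshop's own:

```sas
/* Hypothetical example: clean four score variables and reshape
   from short-and-wide to long-and-skinny with an array and DO loop */
data long;
  set wide;                          /* assumes wide has score1-score4 */
  array scores{4} score1-score4;     /* the array lists the variables  */
  do i = 1 to dim(scores);           /* loop over the elements         */
    if scores{i} = 999 then scores{i} = .;  /* same operation on each  */
    value = scores{i};
    output;                          /* one row per element: long data */
  end;
  drop score1-score4 i;
run;
```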
Session 4209-2020
A Beginners Guide to Consuming RESTful Web Services in SAS®
Web services are a method to exchange data between applications or systems using web technology like HTTP and machine-readable file formats like XML and JSON. Representational State Transfer (REST) is the most popular architecture used to implement web services. Web services using the REST architecture are called RESTful web services. In recent years, SAS has included procedures and libname engines to support consuming RESTful web services. This paper presents how web services can be consumed in SAS. It explores PROC HTTP and discusses the different options that must be set correctly to consume a web service. It shows how parameters can be generated from existing SAS data using PROC STREAM and submitted when calling a web service. And finally, it describes how the output from a web service can be read into SAS using the JSON and XML libname engines.
Laurent de Walick, PW Consulting
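A minimal sketch of the pattern the paper describes, with a placeholder URL (no real endpoint is implied):

```sas
/* Call a RESTful web service and read the JSON reply into SAS */
filename resp temp;

proc http
  url="https://api.example.com/v1/items"   /* hypothetical endpoint */
  method="GET"
  out=resp;
  headers "Accept"="application/json";
run;

/* The JSON libname engine exposes the response as SAS data sets;
   data set names (such as ROOT) depend on the JSON structure */
libname items json fileref=resp;

proc print data=items.root;
run;
```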
Session 4958-2020
A Close Look at How DOSUBL Handles Macro Variable Scope
The macro variable scoping rules of the SAS® macro language are complex and well-documented. The DOSUBL function, introduced in SAS 9.3M2, adds an additional layer of complexity to these scoping rules, as the macro programmer needs to understand how code executing in the DOSUBL side-session will create or update macro variables, and what impact this will have on macro variables stored in the main session symbol tables. Unfortunately, the current SAS documentation does not provide a clear definition of the DOSUBL scoping rules. This paper presents a series of test cases designed to illustrate DOSUBL's handling of macro variable scopes, and infers a set of DOSUBL macro variable scoping rules. The intended audience is experienced macro programmers interested in learning how DOSUBL manages macro variable scopes.
Quentin McMullen, Siemens Healthineers
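One flavor of test case in the spirit of the paper (an illustrative sketch, not one of the paper's actual cases):

```sas
/* Does a %LET executed in the DOSUBL side-session update the
   %LOCAL variable X in the calling macro, or some other scope? */
%macro outer;
  %local x;
  %let x = main;
  %let rc = %sysfunc(dosubl(%nrstr(
    %let x = side;
  )));
  %put In OUTER after DOSUBL, x resolves to: &x;
%mend outer;
%outer
```

Running variations of this pattern (with and without %LOCAL, with GLOBAL variables, with nested macros) is how a set of scoping rules can be inferred empirically.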
Session 4851-2020
A Data-driven Approach to Understand Mild Cognitive Impairment from Imbalanced Data Using SAS
Cognitive decline has emerged as a significant threat to both public health and personal welfare, and mild cognitive decline/impairment (MCI) can further develop into dementia or Alzheimer's disease. While treatment of Alzheimer's disease can be expensive and sometimes ineffective, the prevention of MCI by identifying modifiable risk factors is an effective complementary strategy. Using a data-driven approach to understand the factors of MCI has recently become a crucial research question. However, a fundamental problem needs to be addressed: most healthcare data sets are imbalanced. To solve this problem, we employed multiple strategies to deal with imbalanced data, such as random oversampling, random undersampling, SMOTE, and SMOTEENN. After that, to examine the effects of using multiple strategies and different machine learning algorithms, we used three machine learning algorithms: decision tree (DT), neural networks (NN), and gradient boosting (GB). In this study, we not only compare different balancing strategies and machine learning algorithms but also investigate the most important factors that contribute to MCI. With the SMOTEENN strategy, we increased recall from 0.007 to 0.87.
Liyuan Liu, Meng Han, Yiyun Zhou, and Gita Taasoobshirazi, Kennesaw State University
Session 4687-2020
A Doctor's Dilemma: How Propensity Scores Can Help Control for Selection Bias in Medical Education
An important strength of observational studies is the ability to estimate a key behavior's or treatment's effect on a specific health outcome. This is a crucial strength as most health outcomes research studies are unable to use experimental designs due to ethical and other constraints. With this in mind, one drawback of observational studies (that experimental studies naturally control for) is that they lack the ability to randomize their participants into treatment groups. This can result in the unwanted inclusion of a selection bias. One way to adjust for a selection bias is by using a propensity score analysis. In this paper, we explore an example of how to use these types of analyses. In order to demonstrate this technique, we seek to explore whether clerkship order has an effect on National Board of Medical Examiners (NBME) and United States Medical Licensing Examination (USMLE) exam scores for 3rd year military medical students. In order to conduct this analysis, a selection bias was identified and adjustment was sought through three common forms of propensity scoring: stratification, matching, and regression adjustment. Each form is separately conducted, reviewed, and assessed as to its effectiveness in improving the model. Data for this study was intended to imitate data gathered between 2014 and 2019 from students attending Uniformed Services University of Health Sciences (USUHS). This presentation is designed for any level of statistician, SAS® programmer, or data scientist or analyst with an interest in controlling for selection bias.
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Session 4415-2020
A Hands-on Introduction to SAS® Hash Programming Techniques
SAS® users are always interested in learning techniques that will help them improve the performance of table lookup, search, and sort operations. SAS software supports a DATA step programming technique known as a hash object to associate a key with one or more values. This presentation introduces what a hash object is, how it works, the syntax required, and simple applications of its use. Essential programming techniques will be illustrated to sort data and search memory-resident data using a simple key to find a single value.
Kirk Paul Lafler, Software Intelligence Corporation
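A minimal sketch of the single-key lookup the presentation covers; the table and variable names are hypothetical:

```sas
/* Load a lookup table into a memory-resident hash object, then
   search it by key for each row of the incoming data */
data lookup_result;
  if _n_ = 1 then do;
    declare hash h(dataset:"products"); /* load the lookup table      */
    h.defineKey("product_id");          /* simple key                 */
    h.defineData("product_name");       /* single value to retrieve   */
    h.defineDone();
    call missing(product_name);         /* host variable for the data */
  end;
  set sales;                            /* rows to look up            */
  if h.find() = 0 then output;          /* 0 means the key was found  */
run;
```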
Session 4911-2020
A Level Headed Approach to the METHOD=FASTQUAD Option in the GLIMMIX Procedure
The Gaussian quadrature algorithm approximates the likelihood of a generalized linear model with multiple nested levels using the Gauss-Hermite quadrature method of integration. While the approximations are close to exact, calculating them in SAS is computationally demanding in both memory and time. To address these intense requirements, the SAS/STAT 14.1 update added the multilevel adaptive Gaussian quadrature algorithm of Pinheiro and Chao (2006) to PROC GLIMMIX using the METHOD = FASTQUAD option. Pinheiro and Chao's algorithm reduces the number of integrations required while SAS processes the estimation of the conditional log-likelihoods required for this modeling technique. The reduction of integrations reduces computation memory and time. Using a public data set, this paper will examine the process of creating a multilevel model using adaptive Gaussian quadrature while comparing the run times of single-level and multilevel adaptive Gaussian quadrature.
Sidney J. Hann, Grand Valley State University and Spectrum Health Office of Research and Education
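The option the paper examines can be requested as sketched below; the data set, class, and model variables are hypothetical, and note that in PROC GLIMMIX syntax FASTQUAD is a suboption of METHOD=QUAD:

```sas
/* Hypothetical two-level binary model: schools nested in districts,
   fit with the multilevel adaptive quadrature of Pinheiro and Chao */
proc glimmix data=scores method=quad(fastquad);
  class district school;
  model pass(event='1') = hours / dist=binary link=logit solution;
  random intercept / subject=district;          /* top level    */
  random intercept / subject=school(district);  /* nested level */
run;
```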
Session 4991-2020
A Program to Compare Two SAS Format Catalogs
SAS programming professionals are sometimes faced with the task of determining the differences between two SAS format catalogs. Perhaps they received an updated format catalog from a collaborating organization; or maybe a colleague updated a format catalog to reflect changes in the underlying data. Either way, how can programmers tell which catalog entries and value/label pairs have been modified? If the two catalogs being compared are relatively small, then the tried-and-true method of outputting each of them via the FMTLIB option of PROC FORMAT and then manually comparing the listings may suffice. But this method is laborious and error-prone when there are a large number of formats and format value/label pairs. This paper presents a SAS program that compares two SAS format catalogs and reports the differences between them. It identifies mismatches in the format name, start value, end value, and label between the two catalogs being compared. Because the comparisons are done programmatically, this method eliminates tedious manual reviews and directly identifies all differences. Readers can immediately begin using this program to compare their own SAS format catalogs.
Michael A. Raithel, Westat
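The paper's program itself is not reproduced here, but the general idea can be sketched with CNTLOUT data sets and PROC COMPARE; the catalog names are hypothetical:

```sas
/* Not the paper's program: a minimal sketch of the approach.
   Dump each catalog to a data set, then compare by key fields. */
proc format library=work.cat1 cntlout=fmts1; run;
proc format library=work.cat2 cntlout=fmts2; run;

proc sort data=fmts1; by fmtname start; run;
proc sort data=fmts2; by fmtname start; run;

proc compare base=fmts1 compare=fmts2;
  id fmtname start;                /* match rows on name and start  */
  var end label;                   /* flag mismatched ends / labels */
run;
```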
Session 4772-2020
A program to keep track of the number of records in a project
When linking multiple data sets, we face problems like dropped or added records, duplicates, etc. The SAS® log is available, but it doesn't give detailed information unless you write additional code, and for bigger projects, you need to scour the massive log, which is not an easy read for a non-programmer. The program we have developed tracks the number of records/cases/controls in a data set and how the numbers change over the course of the complete project. It also reports if there are any duplicates in the data. The program was developed for case-control studies but has been used across different industries. It automatically takes the last data set from the SYSLAST macro variable along with user-provided info like the primary key and what defines a record, case, or control; for example, is_case = 1 defines a case, and id and date could be the primary keys. It creates output in both data set and text file format that displays the name of the data set it ran against, the number of records/cases/controls in the data set, the change in the number of records from the last macro call (and the reason for that change), and whether there are any duplicates. By using this program, you may never again need to examine the SAS log after each PROC or DATA step. This presentation explains the macro parameters and then uses an example to illustrate the functionality of the macro.
Gurpreet Pabla, University of Manitoba
Session 4284-2020
A SAS® Macro for Calibration of Survey Weights
In survey sampling, calibration is commonly used for adjusting weights to ensure that estimates for covariates in the sample match known auxiliary information, such as marginal totals from census data. Calibration can also be used to adjust for unit nonresponse. This paper discusses a macro for calibration that was developed using SAS/STAT® 15.1 and SAS/IML® software. The macro enables you to input the design information, the controls for the auxiliary variables, and your preferred calibration method, including either a linear or exponential method. Because an unbounded calibration method can result in extreme calibration weights, this macro also supports bounded versions of both linear and exponential calibration methods. The macro creates calibration replication weights according to the sample design and the specified calibration method. Examples are given to illustrate how to use the macro.
Tony An, SAS Institute Inc.
Session 4287-2020
A Survey of Methods in Variable Selection and Penalized Regression
Statistical learning often deals with the problem of finding a best predictive model from a set of possible models on the basis of the observed data. "Best" often means most parsimonious; thus a sparse model that is composed of a subset of variables is usually preferable to a full model that uses all input variables because of its better interpretability and higher prediction accuracy. To this extent, systematic approaches such as variable selection methods for choosing good interpretable and predictive models have been developed. This paper reviews variable selection methods in linear regression, grouped into two categories: sequential methods, such as forward selection, backward elimination, and stepwise regression; and penalized methods, also called shrinkage or regularization methods, including the LASSO, elastic net, and so on. In addition to covering mathematical properties of the methods, the paper presents practical examples using SAS/STAT® software and SAS® Viya®.
Yingwei Wang, SAS Institute
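As a small taste of the penalized methods the paper reviews, LASSO selection is available in PROC GLMSELECT; the data set and variable names here are hypothetical:

```sas
/* Sketch: LASSO selection with cross-validation choosing the
   final model along the coefficient path */
proc glmselect data=train plots=coefficients;
  model y = x1-x20 / selection=lasso(choose=cv stop=none);
run;
```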
Session 5088-2020
A Warning about Wald Tests
In mixed models, tests for variance components can be of interest. In some instances, several tests are available. For example, in standard balanced experiments like blocked designs, split plots and other nested designs, and random effect factorials, an F test for variance components is available along with the Wald test, the Wald test being based on large-sample theory. In some cases, only the Wald test is available, so it is the default output when the COVTEST option is invoked in analyzing mixed models. However, one must be very careful in using this test. Because the Wald test is a large-sample test, it is important to know exactly what is meant by large sample. Does a field trial with 4 blocks and 80 entries (genetic crosses) in each block satisfy the "large sample" criterion? The answer is no, because, for testing the random block effects, it is the number of blocks (4) that needs to be large, not the overall sample size (320). Surprisingly, it is not even possible to find significant block effects with the Wald test in this example, no matter how large the true block variance is. This problem is not shared by the F test when it is available, as it is in this example. A careful look at the relationship between the F test and the Wald test is presented in this paper, through which the details of the above phenomenon are made clear. The exact nature of this problem, while important to practitioners, is apparently not well known.
David A. Dickey, NC State University
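The setting the paper describes can be reproduced in outline as follows; the data set and variable names are hypothetical:

```sas
/* 4 blocks, 80 entries per block: COVTEST prints the Wald Z test
   for the block variance, which rests on only 4 blocks rather
   than on the overall sample size of 320 */
proc mixed data=field covtest;
  class block entry;
  model yield = entry;
  random block;
run;
```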
Session 4579-2020
Accelerate DATA Step BY-Group Processing in SAS® Viya®
BY-group processing is a method of processing observations from one or more data sets so that the observations are grouped by common variable values. SAS® Cloud Analytic Services (CAS) on SAS® Viya® enables you to code your DATA steps such that BY groups are operated on in parallel. Parallel processing can improve performance. In this presentation, learn how to combine tables in CAS with the DATA step SET and MERGE statements. Also, learn how to use FIRST. and LAST. variables in the context of CAS to perform operations at the start or end of a BY group. We also cover issues you might run into as you convert your SAS®9 DATA steps to run in CAS. Come see how to accelerate your DATA steps in CAS.
Jason Secosky and David Bultman, SAS Institute Inc.
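The pattern described above can be sketched as follows, assuming an active CAS session and a caslib-backed libref named mycas (all names here are hypothetical):

```sas
/* DATA step run in CAS: BY groups are processed in parallel,
   with FIRST./LAST. logic applied within each group */
data mycas.totals;
  set mycas.sales;              /* both tables live in CAS          */
  by region;                    /* groups operated on in parallel   */
  if first.region then total = 0;
  total + amount;               /* accumulate within the BY group   */
  if last.region then output;   /* one summary row per region       */
run;
```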
Session 4632-2020
Accelerating SAS Using High-Performing File Systems on Amazon Web Services
When running efficient, high-performing SAS® applications in the cloud is critical to your business, deploying the right storage architecture could be the difference between success and failure. However, running a high-performing parallel, distributed file system is difficult and often very expensive to set up, run, and maintain. What if you could offload this to someone else, someone with expertise and with virtually unlimited resources? You can! With Amazon Web Services (AWS), you have the flexibility to run different file systems that are compatible with SAS in the cloud, including Amazon Elastic File System and Amazon FSx for Lustre, giving you a variety of performance and cost options to choose from, based on the needs and demands of your customers. This paper guides you through choosing the best storage platforms when running SAS applications on AWS. It compares and contrasts different AWS file service offerings and gives recommendations on when and where to use these services, and how to configure them. It also discusses the different types of cloud servers available on AWS and which are best suited to access these high-performing file systems. This in-depth analysis considers the needs of the different SAS tiers and gives recommendations based on compute, memory, and network performance configurations, plus tips and tricks to help you get the most out of your investment.
Darryl Osborne, Amazon Web Services
Session 4285-2020
Accessing Medicare Data at the Centers for Medicare and Medicaid Services using SAS®
CMS is certainly no stranger to complex data. With billions of Medicare records loaded each year, Fee-for-Service (FFS) databases like the National Claims History (NCH), Standard Analytical Files (SAFs), Integrated Data Repository (IDR), and Chronic Conditions Warehouse (CCW) are getting more involved every day. This paper will discuss some of the nuances among the various systems and will offer tips for accessing each of these environments using SAS/ACCESS® Interface to Teradata and Oracle and the SAS® Grid; utilizing tools such as SAS Enterprise Guide®, SAS Studio, and the IBM® mainframe.
Rick Andrews, Office of the Actuary, Centers for Medicare and Medicaid Services
Session 4714-2020
Achieving Net-Zero: Forecasting Power Usage to Improve Sustainability on Clemson University's Campus
Clemson Energy Visualization and Analytics Center (CEVAC) is helping lead Clemson University toward a more sustainable future by using analytics and modeling of campus utility data. Leveraging access to multiple data stores and web-based data streams, the CEVAC team is deploying visualization and analytic solutions at a building and campus level. CEVAC is working with an initial ten buildings across campus, using SAS® Visual Analytics 9.4 (VA) reports for each building. The CEVAC student team developed data pipelines and databases for a diverse data environment that ingests data from campus facilities databases, daily and hourly email attachments, and third-party APIs. Metrics include power, temperature, water, indoor air quality, and occupancy, which are available in each VA building report. SAS® LASR (LSR) Server tables are updated and are subsequently available in VA building reports. A python middleware solution pushes data in LSR tables at regular and irregular intervals. Included in each building report is a custom alert system designed for the Clemson campus. Each dashboard reports alerts for building sensors that include issues varying from failure to report, to a minimum and maximum value check. Current models include carbon footprint analyses and a 24-hour deep learning neural network forecast of total power use for a campus building.
David L. White, Watt Family Innovation Center, Clemson University; Tim Howard, Watt Family Innovation Center, Clemson University; Snowil Lopes, Clemson University, Facilities
Session 4497-2020
Achieving Optimal Performance with the SAS® Data Connect Accelerator for Hadoop
The SAS® Data Connect Accelerator for Hadoop uses the SAS® Embedded Process to improve performance when moving data between Apache™ Hadoop® sources and SAS® Cloud Analytic Services. Achieving optimal performance during data movement can be challenging. Some of the variables to consider include cluster size, number of cores available, size of the data, and number of splits in a file. This paper explains how to optimize the SAS Embedded Process and how to take advantage of the new Apache™ Spark™ Continuous Processing mode available in the SAS Embedded Process.
David Ghazaleh, SAS Institute
Session 4989-2020
Add SGSCATTER, SGPLOT, SGPANEL and SGRENDER Procedures to Your SAS® Toolbelt
The SGSCATTER, SGPLOT, SGPANEL, and SGRENDER procedures provide a range of graphical data visualization options to help any data analyst tell the story of the data. If you are a SAS Enterprise Guide® user, be aware that SAS Enterprise Guide generates graphs using the GPLOT procedure, the predecessor to PROC SGPLOT. To really bring your data alive, you will need to be familiar with the expanded capabilities in PROC SGPLOT and introduce yourself to PROC SGPANEL.
Alex Chaplin, Bank of America
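Two quick sketches using the SASHELP.CLASS sample data that ships with SAS:

```sas
/* Grouped scatter plot with an overlaid regression fit */
proc sgplot data=sashelp.class;
  scatter x=height y=weight / group=sex;
  reg x=height y=weight;
run;

/* Paneled histograms: one cell per value of the PANELBY variable */
proc sgpanel data=sashelp.class;
  panelby sex;
  histogram height;
run;
```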
Session 4721-2020
Adopting SAS® Viya® into Your Business: Data Success Tales from a Central Bank's Perspective
Data maturity is an essential element of a successful business. To support business growth and capacity, companies like the Eastern Caribbean Central Bank (ECCB) seek to adopt high-performance, user-friendly data and analytical solutions as they outgrow tools such as Microsoft Excel. This paper shows how ECCB matured from a legacy system that relied primarily on Microsoft Excel for analytics and reporting to a customized, self-service, robust system powered by SAS® Viya®. SAS Viya's ability to connect Python, an open-source interface, and REST APIs to SAS® Cloud Analytic Services (CAS), along with web-based analytics and reporting capabilities, made it an optimal analytical solution that supported ECCB's vision for growth and complex business processes. In this paper, we walk you through the business issues that inspired the change, the specific techniques used to solve the data issues, and how the solution was integrated into ECCB's processes. All solutions have their wins and losses, so we share the practices that worked well and those that could have gone better.
Ben Zenick, Zencos; Leah Sahely, Eastern Caribbean Central Bank
Session 5057-2020
Advertising Effectiveness - Do Super Bowl Ads Impact Stock Prices?
For many years, the Super Bowl has been seen as a great platform for companies to market their products through a 30- or 60-second commercial. These commercials are famous not only for their entertainment value but also for their skyrocketing cost. The cost of a Super Bowl ad has gone up by approximately 11% in the last five years, hitting an all-time high of $5 million in 2019. A significant question is whether this multimillion-dollar investment has a substantial impact on the growth of a company that advertises in this way. Companies from various sectors, like automobiles, entertainment, technology, etc., invest in these advertisements. Since their goal is not only to improve sales but also to increase brand awareness, the return on investment of these ads can be found by examining their impact on the stock price of these companies. The scope of this presentation is to analyze the stock data of the companies that are frequent investors in Super Bowl ads. This is done by first finding the top investing sectors and then listing the companies in those sectors that have aired their commercials multiple times during the Super Bowl in the last five years. SAS® 9.4 and SAS® Viya® are used to perform time series analysis on the stock data. This is followed by cluster analysis to find the sector that earns the maximum benefit from these commercials.
Jaya Bhatia, Oklahoma State University
Session 5163-2020
An Astounding Lack of Data
In today's world, where there are quintillions of bytes of information generated every day, we still suffer from a lack of data. Data is driving the decision-making in business, but what about governments? Why are cities behind when it comes to data-driven decision-making? This paper looks at the availability of city-level data and how it can be used in decision-making at the local level. The topic is further extended with a discussion of existing barriers to using data for informed decision-making in local governments.
Christopher Hooks, North Carolina State University
Session 5042-2020
An Efficient Way to Create Descriptive Tables with Pairwise Comparison
This paper illustrates a SAS macro for descriptive tables, which provides Chi-square and Fisher exact tests for categorical variables, and parametric and nonparametric statistical tests for continuous variables. A formatted output table includes mean ± standard deviation, median (25th percentile, 75th percentile), non-missing N, count (%), and a statistical test for each p-value. It also provides pairwise p-values if the comparison involves more than two groups. A permutation method is used in the macro to allow a large number of variables to be processed automatically. This approach is designed to create and update descriptive tables in an efficient way.
Ang Gao and Chul Ahn, University of Texas Southwestern Medical Center
Session 4103-2020
An Insider's Guide to SAS/ACCESS® Interface to Snowflake
Snowflake is an exciting new data warehouse built for the cloud. SAS/ACCESS® Interface to Snowflake enables SAS® to take advantage of this exciting technology. This paper describes Snowflake and details how it is different from other databases that you might have used in the past. Using examples, we discuss the following topics: the differences between using SAS/ACCESS Interface to Snowflake and SAS/ACCESS® Interface to ODBC; how to configure your SAS® environment for SAS/ACCESS Interface to Snowflake; tricks that you can use to discover what the software is doing; how to effectively move data into Snowflake using the interface; and performance tuning your SAS and Snowflake environment. This paper uses an example-driven approach to explore these topics. Using the examples provided, you can apply what you learn from this paper to your environment.
Jeff Bailey, SAS Institute
Session 4306-2020
An Introduction to Multiple Time Series Analysis and the VARMAX Procedure
To understand the past, update the present, and forecast the future of a time series, you must often use information from other time series. This is why simultaneously modeling multiple time series plays a critical role in many fields. This paper shows how easy it is to use the VARMAX procedure to estimate and interpret several popular and powerful multivariate time series models, including the vector autoregressive (VAR) model, the vector error correction model (VECM), and the multivariate GARCH model. Simple examples illustrate Granger causality tests for identifying predictive causality, impulse response analysis for finding the effect of shocks, cointegration and its importance in forecasting, model selection for dealing with the trade-off between bias and variance, and volatility forecasting for risk management and portfolio optimization.
Xilong Chen, SAS Institute
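A minimal sketch of the kind of model the paper covers; the data set and series names are hypothetical:

```sas
/* Fit a VAR(1) to two series, test Granger causality, and forecast */
proc varmax data=rates;
  model y1 y2 / p=1;                 /* vector autoregression, lag 1 */
  causal group1=(y1) group2=(y2);    /* does y2 Granger-cause y1?    */
  output out=fcst lead=8;            /* forecast 8 steps ahead       */
run;
```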
Session 5146-2020
Analyzing Non-normal Data: Application to Missing Data Problems
In many applications, the response variable is neither continuous nor normally distributed. If the outcome variable is binary, count, multinomial, or ordinal, the relationship between predictors and response variable is non-linear, and using ordinary linear models developed for continuous data is inappropriate. Therefore, more advanced models adopting classification algorithms should be applied. Binomial, multinomial, and ordinal logistic models, as well as Poisson regression, for low-dimensional data and classification random forest for high-dimensional data are among robust predictive methods discussed for such scenarios. Starting with the simplest case of binary outcomes, through count, multinomial, and ordinal response variables, this study discusses various modeling options for low- and high-dimensional data while handling non-normal responses and missing data issues in SAS®. Three missing data techniques, including multiple imputation, are considered to appropriately account for the high percentages of missing observations, which are present in the majority of applied studies. Various techniques are discussed that can be applied to data with non-normal outcomes and missing observations. This paper discusses different options within SAS 9.4 for the aforementioned models using procedures such as PROC LOGISTIC, PROC GENMOD, PROC HPFOREST, PROC STANDARD, PROC MI, and PROC MIANALYZE.
Niloofar Ramezani, George Mason University
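The multiple-imputation flow the paper discusses follows a standard three-step pattern, sketched below with hypothetical data set and variable names:

```sas
/* Step 1: create 5 imputed copies of the data */
proc mi data=study out=imputed nimpute=5 seed=2020;
  var age bmi outcome;
run;

/* Step 2: fit the analysis model once per imputation */
proc logistic data=imputed outest=est covout;
  by _imputation_;
  model outcome(event='1') = age bmi;
run;

/* Step 3: combine the estimates across imputations */
proc mianalyze data=est;
  modeleffects Intercept age bmi;
run;
```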
Session 5201-2020
Analyzing the Factors Impacting Suicidal Behavior in American Youth
Suicide is the second leading cause of death in youth aged 12-19. Estimates from the data indicate that nearly 7.4% of American youth attempted suicide in 2017 alone, while 17.2% of teens considered attempting suicide. The last decade has seen an increase in the number of suicide attempts in teenagers. Given the seriousness of this issue, it is important to identify the factors that are leading to this increase in suicide attempts. This paper closely examines the 2017 Youth Risk Behavior Survey (YRBS), in order to understand characteristics of teenagers who attempted suicide. Latent class analysis using SAS® Enterprise Guide® found that among teens who attempted suicide, three distinct classes were evident, characterized mainly by sexual assault, bullying, and depression, respectively. Moreover, using PROC SURVEYLOGISTIC, it was seen that the odds of attempting suicide were four times higher for teens who were sexually assaulted and three times higher for teens who were bullied or abused drugs. Furthermore, it was seen that teenagers who were sexually assaulted had a high co-occurrence of other risky behaviors, and ultimately a higher percentage of multiple suicide attempts. The outcome of the paper highlights the importance of early intervention in preventing teenagers from slipping down a "rabbit hole" of risky behaviors that ultimately lead them to take their own lives.
Ashlesha Sampathi, Mounica Mandapati, Nitesh Maruthi, and Venkat Ram Reddy, Oklahoma State University
Session 4742-2020
Any Time Zone to Any Time Zone: A Macro to Convert Anything
This paper describes a flexible and fully configurable macro that converts a datetime stamp from any time zone to any other time zone. The macro can be configured to operate with any and all of the more than 24 time zones in the world.
Joe DeShon
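For readers outside SAS, the macro's core operation, interpreting a datetime stamp in one zone and re-expressing it in another, can be sketched with Python's standard zoneinfo module (the zone names and times here are illustrative):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def convert(dt_naive, from_tz, to_tz):
    """Interpret a naive datetime stamp in from_tz and express it in to_tz."""
    return dt_naive.replace(tzinfo=ZoneInfo(from_tz)).astimezone(ZoneInfo(to_tz))

# 09:30 in New York expressed in Tokyo time (14 hours ahead in January)
ny = datetime(2020, 1, 15, 9, 30)
tokyo = convert(ny, "America/New_York", "Asia/Tokyo")
```

A SAS implementation would instead build on the TZONEOFF time zone function or explicit offsets; the point here is only the interpret-then-convert pattern.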
Session 5026-2020
Application of Google Distance Matrix API to investigate healthcare access: a study on Hokkaido, Japan
Recently, web Application Programming Interfaces (APIs) have attracted the attention of academic researchers for evaluating healthcare access by estimating travel times and distances. In particular, the Google Maps API is now being adopted as a new and simple tool for spatial analyses. The Google Maps API requires information on the search criteria but provides calculation results based on Google's own data and calculation algorithms. However, most researchers have used R or Python for this purpose. In this paper, we introduce the key steps for conducting similar studies using SAS® by providing an overview of the Google Maps API (especially the Google Distance Matrix API), Uniform Resource Identifiers (URIs), the JavaScript Object Notation (JSON) format, and JSON map files. An analytical example is also provided to investigate healthcare access for children with cancer in Hokkaido, Japan. By using the Google Distance Matrix API, we obtained the estimated travel times to one childhood cancer hub hospital from residences by considering the locations of local government offices. The estimated departure times were then derived, and the results were visualized on a choropleth map. Our study suggests that healthcare access can be investigated by using SAS and web APIs, especially the Google Distance Matrix API. To assess healthcare access properly, researchers need large-scale real-world data and software that enables such a series of processing steps. SAS is one such software.
Anna Tsutsui and Yuko Ohno, Graduate School of Medicine, Osaka University
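The request-and-response shape the authors describe can be sketched without a live call; the coordinates and API key below are placeholders, and the JSON is a shortened illustrative response following the Distance Matrix API's documented rows/elements structure:

```python
import json
from urllib.parse import urlencode

# Build a Distance Matrix request URI (key and coordinates are placeholders)
base = "https://maps.googleapis.com/maps/api/distancematrix/json"
params = {"origins": "43.0618,141.3545",        # a residence (example)
          "destinations": "43.0646,141.3468",   # a hub hospital (example)
          "mode": "driving",
          "key": "YOUR_API_KEY"}
uri = base + "?" + urlencode(params)

# Parse the travel time out of a shortened, illustrative JSON response
sample = '''{"rows": [{"elements": [{"status": "OK",
              "duration": {"value": 540, "text": "9 mins"}}]}],
             "status": "OK"}'''
resp = json.loads(sample)
seconds = resp["rows"][0]["elements"][0]["duration"]["value"]
```

In SAS, the same two steps map to PROC HTTP for the request and a JSON libname engine or parsing step for the response.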
Session 4916-2020
Applications of DOW Loop: Extension to PROC REPORT
A double DOW-loop in conjunction with PROC REPORT can be used to present order-variable information across pages when vertical space is needed to separate blocks of related rows. When creating reports in the PDF ODS destination, the SPANROWS option is used to repeat the values of a GROUP or ORDER variable across pages. However, sometimes the values are not repeated as expected, especially when a BREAK or LINE statement is used. The proposed solution uses SPANROWS and a compute block to change the height of the last row in a block of records. Preprocessing is needed since there is no LAST dot (LAST.variable) option in PROC REPORT. A double DOW-loop can be used for this processing. A double DOW-loop consists of two DO-UNTIL loops: the first DO-UNTIL calculates the number of records in each break group (LAST.BREAK), and the second DO-UNTIL attaches the calculated number of records to each record in the break group. When used with a compute block, the vertical spacing can be adjusted. We present an example that demonstrates the problem and applies the proposed solution using Base SAS® 9.4 in a Windows environment. It is appropriate for programmers in any industry with a basic understanding of PROC REPORT. The discussion focuses on the use of PDF, though limitations of RTF are also presented.
Neha Yadav and Pranathi Salla, Axio, a Cytel Company
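The double DO-UNTIL logic described above, one pass to count the records in a break group and a second pass to attach that count to every record, has a simple analogue outside SAS; a hypothetical Python sketch over sorted records:

```python
def attach_group_counts(records, key):
    """Two passes per group, mirroring a double DOW-loop: count the break
    group, then re-read it and attach the count to each record."""
    out, i = [], 0
    while i < len(records):
        # First pass: advance to the LAST. record of this break group, counting
        j = i
        while j < len(records) and records[j][key] == records[i][key]:
            j += 1
        n = j - i
        # Second pass: emit each record with the group count attached
        for rec in records[i:j]:
            out.append({**rec, "n_in_group": n})
        i = j
    return out

rows = [{"trt": "A"}, {"trt": "A"}, {"trt": "B"}]
counted = attach_group_counts(rows, "trt")
```

In the SAS version, the group count carried on each record is what the compute block later uses to size the last row of the block.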
Session 4076-2020
Apply the Key Ideas from Andrew Ng's Machine Learning Certification (Coursera) in SAS® Viya®
As a SAS® administrator or architect, you are expected to know everything related to SAS. And there's a lot to know. What is available to you, and where do you find all this information? Attend this discussion and you'll walk away with a multitude of resources to help you and your customers succeed with SAS.
Petri Roine, SAS Institute
Session 4813-2020
Approaching SAS® Viya® Backup and Restore in the Cloud
Backups are an important part of any application, and the ability to recover quickly with the least data loss is critical. We discuss how to use some cloud-native options to back up and restore SAS® Viya® from backups using Backup Manager from SAS, the command-line interface (CLI), and different cloud offerings. We also discuss the considerations before deployment and installation to recover SAS Viya using these cloud-native features. We have performed multiple backup and restore tests, and we were able to restore SAS Viya to a running state. This presentation guides you through the specific use cases we have tested and the process involved. The presentation is useful for SAS® architects to design a well-architected environment of SAS Viya in the cloud and in turn to have a robust system of backup and restore for SAS Viya. We discuss cloud-based features, so attendees should already have an understanding of the different cloud offerings of public cloud providers. We have performed in-depth testing on Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure. The presentation is divided into three sections based on these cloud service providers. In each section, we list the storage options that these cloud providers offer, along with the advantages and disadvantages of using these technologies for SAS Viya backup and restore.
Sounak Nag and Suvadeep Chatterjee, Core Complete
Session 4656-2020
Architecting SAS® Analytics Edge-to-Cloud Solutions on Hewlett Packard Enterprise Infrastructure
We live in a world where everything computes, where technology, apps, and data are driving digital transformation, reshaping markets, and disrupting every industry. IT needs to reach beyond the traditional data center and the public cloud to form and manage a hybrid connected system stretching from the edge to the cloud. Listen to experts on how Hewlett Packard Enterprise (HPE) and SAS help organizations rethink and modernize their infrastructure, deploy an edge-to-cloud architecture, and analyze data at the edge quickly. Join us in this session for a deep dive and demo on: the HPE Elastic Platform for Analytics (EPA) architecture for SAS and its use cases; SAS Event Stream Processing (ESP) and SAS Visual Analytics (VA) deployment at the edge; and Cloudera Data Platform and NiFi.
Mark Barnum and Kannan Mani, Hewlett Packard Enterprise
Session 5112-2020
Assessing Performance of Risk-based Testing
Trends in the regulatory landscape point to risk-based approaches to ensure high quality data and reporting for clinical trials. Risk-based methods for validation of production programming code which assign testing methods of varying robustness based on an assessment of risk have been evaluated and accepted by some industry leaders, yet they have not been fully adopted. Some view risk-based testing as simply an attempt to save money or compensate for limited resources while claiming a minimal impact on overall quality. While that may sometimes be the case, the intent should rather be to focus finite resources on what matters most. Even with the robust gold standard of full independent reproduction, mistakes still happen. Humans make errors. Therefore, risk and consequence should be considered in choosing verification methods with a resource emphasis on those areas with greatest impact. However, the assessment of these decisions must be regularly and consistently evaluated to ensure that they are appropriate and effective. Metrics both within and across projects can be implemented to aid in this evaluation. They can report the incidence, type, and method of identification of issues found at various timepoints in the production process. This includes issues found internally prior to the completion of output verification (i.e., during testing), internally during final package review, and during external review. These data are crucial for the effective evaluation of the performance of risk-based testing methods and decisions.
Amber Randall, SCHARP - Fred Hutchinson Cancer Research Center
William Coar, Axio Research
Session 4169-2020
Automate Patient Safety Survey PDF Report Production
The Agency for Healthcare Research and Quality (AHRQ) Surveys on Patient Safety Culture (SOPS) assess staff's views about their organizational culture for patient safety every two years. The results of these surveys are used by BJC HealthCare leadership to determine areas of improvement and monitor the organization's safety culture. At BJC, the survey results are summarized for 15+ provider organizations, at many human resources (HR) hierarchical levels, and distributed to leadership throughout the system. Therefore, the reports require complex formatting, customization, and data validation. During 2016, about 500 PDF reports were developed using a manual process that was error prone and time consuming. In 2018, based on lessons learned from 2016, our team created SAS Enterprise Guide® programs to automatically produce 855 PDF reports. This paper presents techniques used for the report production process including macros, ODS PDF, ODS LAYOUT, PROC REPORT, DO loop, and more. Additionally, ODS formatting techniques that have been discussed in existing publications will be summarized and referenced. The focus will be the strategies for designing program structure, creating PDF templates, and building macros. Techniques that improve efficiencies, such as the DO loop and the automated PDF output validation process, are discussed extensively. With the improved and automated report production process used in 2018, the resources spent for this project were reduced significantly compared to 2016.
Lan Luong, Yao Zhang, and Michelle Simkins, BJC HealthCare
Session 5048-2020
Automatically Loading CAS Tables from SAS® Data Integration Studio Using SAS® Viya® REST APIs
Your SAS® Viya installation may have many SAS® Visual Analytics reports in several folders that also need many source data tables. Assuring that the information for each of those reports is loaded at the moment the users want to use them is not an easy task. We developed a comprehensive SAS® Data Integration Studio process that automatically finds the data source needed for each SAS® Visual Analytics report, extracts the tables from the database management system (DBMS), loads them as SAS® Cloud Analytic Services (CAS) tables, and checks that everything is correct for the reports' execution. This is possible thanks to the use of the SAS® Viya® REST API, integrating http requests that refer to the folders on which we want to automate the reports and the relationships between these and their CAS tables, along with the corresponding authorization to access the data. With this integration of SAS® Viya with SAS® Data Integration Studio, we achieve a result in which both the developers and end users of the reports don't need to update the information or submit a request to their respective technical support, making the workflow of data analytics faster and more continuous.
Oscar Simionati and Ervin Wilneder, Buenos Aires City Government.
Session 4912-2020
Automating Your Multi-Environment Hotfix Management
Managing SAS® hot fixes in a SAS® 9.4 multi-server environment can be a bit of a challenge for SAS administrators. It becomes even more of a struggle when checking for available hot fixes across multiple SAS environments with different SAS versions, or worse, when you don't have access to the SAS website from your SAS servers. The SAS Hot Fix Analysis, Download and Deployment Tool (SASHFADD) creates a customized report that lists available hot fixes for the installed SAS products. It also generates scripts that automate hot fix downloads, as well as a script to install the hot fixes; SAS administrators then use SAS Deployment Manager to install them. The goal of this paper is to provide helpful information on how to use SASHFADD to automate the SAS 9.4 hot fix download and deployment process when you are dealing with multiple SAS environments. It also provides a workaround solution for SAS environments that cannot connect to SAS update servers.
Pierre Dupuis
Session 4485-2020
Automation in SAS® Visual Data Mining and Machine Learning
Automated machine learning can help every data scientist, from the novice to the most experienced practitioner. This paper demonstrates the different levels of automation available in the Model Studio environment of SAS® Visual Data Mining and Machine Learning software. You can choose to have features automatically constructed or to automate the process of algorithm selection and hyperparameter tuning by using dedicated Model Studio nodes in the pipeline that represents your machine learning process. You can build on or edit a pipeline that includes these nodes, inserting your domain expertise into the process. Alternatively, you can ask the software to automatically build an entire pipeline that includes various feature engineering steps and predictive models, optimized for your specific data according to the assessment criterion of your choice. The included models are determined using hyperparameter tuning across multiple modeling algorithms. Not only do these automation techniques aid and accelerate the modeling process for beginning users, but they also relieve expert data scientists of the burden of iterating through various feature engineering steps, model hyperparameter values, and modeling algorithms, enabling them to focus on solving the problem at hand.
Wendy Czika, Christian Medins, and Radhikha Myneni, SAS Institute
B
Session 4692-2020
Bayesian Sequential Monitoring of Clinical Trials Using SAS
In this paper, we provide an overview of Bayesian sequential monitoring for clinical trials. In such trials, patients are continually enrolled and their data are analyzed as often as is desired or feasible until a hypothesis has been proven or disproven, or until the allocated resources for the trial have been exhausted (that is, the maximum sample size or study duration has been reached). Such an approach is particularly appealing in the case of difficult-to-enroll populations such as pediatric populations or for rare disease populations. A Bayesian sequentially monitored trial does not require a pre-specified sample size or number of analyses. For proving efficacy in a sequentially monitored trial, the Bayesian collects data until the evidence in favor of the investigational treatment is substantial from the perspective of an a priori skeptical judge who doubts the possibility of large treatment effects. The Bayesian approach naturally allows for the incorporation of prior information when determining when to stop patient accrual and ultimately in evidence evaluation once the complete data are available. We give easy-to-understand examples for how Bayesian methods can be applied in the setting of pediatric trials where it is of interest to extrapolate information from adult trials. We discuss SAS/IML® software that can be used to efficiently perform design simulations without high-powered computing infrastructure.
Matthew A. Psioda, Department of Biostatistics, University of North Carolina at Chapel Hill
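The stop-for-efficacy rule described above can be illustrated with the simplest conjugate case: a binary endpoint, a Beta prior, and a rule that stops once the posterior probability of a response rate above some threshold is large. This sketch is illustrative only (a uniform rather than skeptical prior, and simple quadrature in place of the paper's SAS/IML simulations):

```python
def post_prob_exceeds(successes, n, threshold, a0=1.0, b0=1.0, grid=20000):
    """P(response rate > threshold | data) under a Beta(a0, b0) prior,
    approximated with a midpoint rule; the normalizing constant cancels."""
    a, b = a0 + successes, b0 + n - successes
    xs = [(i + 0.5) / grid for i in range(grid)]                # midpoints on (0, 1)
    dens = [x ** (a - 1) * (1 - x) ** (b - 1) for x in xs]      # unnormalized Beta density
    tail = sum(d for x, d in zip(xs, dens) if x > threshold)
    return tail / sum(dens)

# After 10 patients with 8 responders: is P(rate > 50%) past a 0.95 bar?
prob = post_prob_exceeds(successes=8, n=10, threshold=0.5)
stop_for_efficacy = prob >= 0.95
```

In a sequential design this check would run at each interim look, with the prior chosen to encode the skeptical judge (or adult-trial information) the paper discusses.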
Session 4597-2020
Behind the Front Door: Authentication Options with SAS® Viya®
If you are tasked with deploying and administering SAS® Viya®, one of your top concerns is how users will be authenticated. Will you use an LDAP directory, or is there a requirement to implement single sign-on with your company's existing security infrastructure? Learn what the authentication options are in SAS Viya 3.5 and what information you need to obtain from your IT security department to configure them. Learn how to limit concurrent logins and see examples of how to customize the login page. If you are administering an existing deployment, find out what new authentication options have been added since SAS Viya 3.4.
Mike Roda, SAS Institute
Session 4147-2020
Best Practices for Converting SAS Code to Leverage CAS
This is an introductory paper to help you understand how to convert SAS® code to leverage the distributed computing engine in SAS® Cloud Analytic Services (CAS). As customers adopt SAS® Viya®, they have access to many new visual interfaces, procedures, and CAS action sets that execute code in CAS. But what about existing SAS routines? How does one convert these routines to leverage CAS? In this session, we will cover terminology, the sweet spot in CAS, the biggest SAS®9 gotchas in the conversion of SAS code to SAS Viya and CAS, best practices in doing the conversion, workarounds, case studies, coding examples, and reading material. Additionally, you will learn how to engage with SAS to run your SAS code through the SAS Viya readiness assessment utility, which produces reports to assist you in your conversion of SAS code to leverage CAS.
Steven Sober and Brian Kinnebrew, SAS Institute
Session 4126-2020
Best Practices for Effective Infographics in SAS Visual Analytics
SAS® Visual Analytics enables you to build impressive infographics into your report. Learn how to effectively convey your message using the out-of-the box visualizations and layouts delivered with SAS Visual Analytics. In this paper, the author will analyze Earth's surface temperatures using daily and monthly NASA/GISS data for over 25,000 stations across the globe. We will learn how temperatures changed over the years and detect trends and hotspots. You will learn how to import and prepare data in SAS Viya, as well as discover the latest visualization capabilities. We will discuss best practices for designing and developing simple but effective data visualizations targeted for specific audiences.
Falko Schulz, SAS Institute
Session 4631-2020
Best Practices for Enabling SAS® Analytics in the Cloud - at Scale
We know cloud is the future, but it is quickly becoming our present! An increasing number of companies, large and small, are turning to the cloud to deliver advanced analytic solutions that will scale. In fact, the cloud can be the perfect environment for evolving your SAS® analytics to solve real-world problems quickly, easily, and affordably. But to do that, you need to understand how to make it work for your business. Whether you're an administrator, business user, architect, or engineer, this presentation has something for you. We discuss everything from broad best practices to specific technical tips and tricks to show how to optimize your cloud solution. We illustrate how companies are using SAS®9, SAS® Viya® and Teradata Vantage in public and private clouds to achieve the following: reduce and avoid cost; better manage and prepare data; increase operational efficiency; leverage cloud marketplaces to streamline setup and configuration; and quickly scale SAS solutions to production.
Heather Burnette, Teradata Corporation
Session 4248-2020
Best Practices for Scheduling in SAS® Viya®
To help you harness the power of the SAS® Platform, SAS® Viya® contains numerous scheduling capabilities. The SAS Viya documentation provides detailed instructions about how to schedule a job or flow. However, to give you a better understanding of which method might be best for you, this paper goes into depth about the five scheduling methods available in SAS Viya. It discusses the advantages, disadvantages, alternate approaches, and best practices for each method. This paper details the scheduling types available and helps you take full advantage of the SAS Viya environment.
Ursula H. Polo, SAS Institute
Session 4929-2020
Best Practices in SAS® Programming to Support Ever-Increasing Data Loads
In most large organizations, SAS® serves a pivotal role in data processing for warehousing, reporting, analytics and more. The SAS language provides multiple tools and options to streamline these data processing needs that may be unfamiliar to many developers. This paper will present some well-known and lesser-known SAS methods for efficient data handling. The focal point of the paper is more geared towards what is available and the use cases for each technique, rather than a detailed how-to on a specific solution.
John Schmitz, Luminare Data LLC
Session 4184-2020
Better Student Application Reporting: A Slowly Changing Dimension and SAS® Data Integration Studio
For a selective university, shaping the incoming freshman class requires current operational data on the student applications that have been received, the admissions decisions that have been made, and the students that have committed in response throughout the admissions cycle. A point-in-time comparison with historical data serves as a benchmark to better understand and anticipate the makeup of the incoming students. To meet these demands, we shifted student application reporting from a Base SAS® program (run on an ad hoc basis about twice a week, creating a data set of that day's application information saved to a drive) to a scheduled SAS® Data Integration Studio job that creates a slowly changing dimension of year-over-year daily application information, stored as an Oracle table in the university's enterprise database. Changing how the data was stored reduced storage space, as only new applications and change records are added to the student application table instead of accumulating data sets with a repeated record for every day the application was in the system. Scheduling a SAS Data Integration Studio job removed the task of running the program from a person's workload and standardized the collection of student application data. Together these changes facilitated more frequent snapshots of student application data as well as increased processing efficiency. A daily summary is loaded to SAS® Visual Analytics as the basis for a daily year-over-year analysis of student application decisions.
Lauren Schoenheit and Alexander K. Fantroy, NC State University
Session 4720-2020
Bits and Bytes - A Mix for High Volumes of Data
Building a system is like building a house - both relate to the construction realm and have a myriad of similarities. Handling high volumes of data is like building a tower - or a castle. Every step is critical in order to have a great building! Big towers take sound planning, big cranes, and specific foundations, materials, and supplies. Some buildings are trendier than others; some are copied from others; however, we are all familiar with the main plans, with so many buildings built in the world. This paper reviews the roadmap of high-volume data processing - in a fun fashion! The first part shows how to tackle big data efficiently and presents techniques to reduce the amount of data processed; the second part explains how to reduce the amount of data stored on disk; and the final part demonstrates techniques to efficiently process large data sets. Throughout this article, we follow one principle: Keep It Simple, Stupid (KISS) - no need to be complicated. À la mode towers can be down-to-earth!
Karine Desilets, Canada Revenue Agency
Session 4432-2020
Bringing Computer Vision to the Edge: An Overview of Real-Time Image Analysis with SAS®
In the Internet of Things (IoT) era, when large amounts of streaming data are generated continuously, it is extremely impractical and inefficient to store all these data in a data center. Furthermore, the majority of these data are irrelevant; for example, only streaming data that contain anomaly events are worth storing or transmitting for further investigation. For real-time image or video processing, moving the analytics to edge devices not only saves a device-to-cloud data round trip but also improves data privacy and governance. This paper presents the real-time image analytics solutions offered in SAS® software for image processing, image classification, object detection, and segmentation. It also describes the general workflow of real-time image analytics, from preprocessing images, to training deep learning models by using SAS® Viya®, to deploying an image analytics pipeline on edge devices by using SAS® Event Stream Processing. The paper discusses the following applications: real-time semantic segmentation analysis, with an example of autonomous driving; real-time defect detection for quality inspection in the manufacturing industry specific to surface mount technology (SMT); and loose ballast detection in railway tracks for monitoring track health in the transportation industry.
Maggie Du, Juthika Khargharia, Shunping Huang, and Xunlei Wu, SAS Institute
Session 4894-2020
Bringing SURVEYMEANS Up to Speed: Improving This Procedure for Estimating Domain Level Ratios
Domain-level estimation plays an important role in the field of survey sampling statistics. The implementation of domain estimation for means, totals, and ratios that SAS® has developed is available in the SURVEYMEANS procedure. Through The Lewin Group's work on statistical audits of Medicare and Medicaid claims data, two weaknesses of PROC SURVEYMEANS have been identified. First, there is no calibration weight adjustment using auxiliary variables beyond rudimentary post-stratification. Second, the algorithm for estimating domains iterates through one domain level at a time, which can be time- and resource-intensive. To overcome these limitations, The Lewin Group developed the Hybrid Ratio Estimator (HRE) for use in statistical audits. The HRE is a hybrid of two well-known survey estimators: the Separate Ratio Estimator (SRE) and the Combined Ratio Estimator (CRE). The HRE allows for domain-level ratio estimation with calibration weight adjustment (Deville and Särndal, 1992) and uses domain-level statistics formulas described in the groundbreaking book Sampling Techniques (Cochran, 1977). Leveraging these techniques, the HRE produces results for all domain levels in one iteration, which leads to vast time and resource improvement compared to PROC SURVEYMEANS. Using real-world Medicare audit data, the HRE was over seven times faster when estimating 80 domains, with time efficiency increasing drastically as the number of domain levels increased.
Brian Simonson, Cami Sorenson, Tyler Hamashima, Michael LeFew, and Soumita Lahiri, The Lewin Group
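The one-pass, all-domains bookkeeping that motivates the HRE can be illustrated with a plain weighted ratio estimator; this sketch omits the HRE's calibration weighting and variance formulas entirely, and the claims records are invented:

```python
from collections import defaultdict

def domain_ratios(records):
    """Weighted ratio estimate sum(w*y) / sum(w*x) for every domain,
    accumulated in a single pass over the data."""
    num, den = defaultdict(float), defaultdict(float)
    for r in records:
        num[r["domain"]] += r["w"] * r["y"]   # weighted audited amounts
        den[r["domain"]] += r["w"] * r["x"]   # weighted paid amounts
    return {d: num[d] / den[d] for d in num}

claims = [{"domain": "East", "w": 2.0, "y": 50.0, "x": 100.0},
          {"domain": "East", "w": 1.0, "y": 30.0, "x": 40.0},
          {"domain": "West", "w": 1.5, "y": 20.0, "x": 80.0}]
ratios = domain_ratios(claims)
```

The contrast with a per-domain loop (one full pass of the data for each of 80 domains) is where the paper's reported speedup comes from.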
Session 4260-2020
Build an HTML5 Web App Using SAS®
With SAS® you have the "Power to Know"; HTML5 provides the "Power to Show." A workshop to kickstart the development of new web apps on SAS platforms.
Allan Bowe
Session 4594-2020
Building an Expert's Toolbox: Essential Tools for Generating the Perfect Microsoft Excel Worksheet
When you have a home building or renovation project to accomplish, you need expert tools for the job. The same is true when you want to build (create) or modify (renovate) Microsoft Excel worksheets. You need a variety of expert tools in your SAS® software toolbox to accomplish these tasks. You have a choice of many tools that enable you to create and fully customize your worksheets. For example, you can use the SAS® Output Delivery System (ODS) Excel destination and the SAS EXPORT procedure. But you can also complement the standard tools with more specialized ones (for example, SAS macros and the open-source Python language) to further extend the capabilities of your worksheets. This paper explains how to use all of these tools to create fully functional Microsoft Excel worksheets. The discussion is divided into two main sections. The first section explains how to generate Excel worksheets and perform various tasks in SAS and SAS® Viya®. For each task in this section, the paper demonstrates how to accomplish the task by using current functionality (for example, the ODS Excel destination, PROC REPORT, and so on) that is available in SAS and SAS Viya. This section also explains how you can enhance that functionality by using the custom %Excel_Enhance macro. The second section illustrates how you can further extend worksheet functionality in all environments by using the open-source tools Python and Java.
Chevell Parker, SAS Institute
Session 5081-2020
Building an Internal Application with the SAS® Stored Process Web Application
Using the SAS Stored Process Web Application and SAS® prompts you can take existing SAS code and modify it to accommodate different variations of your most common requests. You can use a custom HTML front end to provide an intuitive, modern-looking user interface, ODS PDF to present a summary of your results, ODS Excel for supporting data, and you can deliver it all by email. Your customers will love getting results on-demand and you'll love that they can fish for themselves, freeing you from repetitive queries and allowing you to focus on more interesting and productive work.
Mara Werner, Department of Health and Human Services
Session 5147-2020
Building Two Regression Lines Is Better Than Building One
Regression assumes a linear relationship among all data points. But what if there are two linear relationships? In that case, building one regression line would not be ideal. This is where you can use PROC UCM to help find the breakpoint of the data and build two regression lines that better represent the data points. These lines help you produce better predictions, as they account for the shift in trend. Traditionally, this type of breakpoint analysis is more common for time series data. However, in this presentation, we introduce three different examples where this type of analysis can produce knowledge that you might not otherwise gain by fitting only one regression line. We start by introducing a common breakpoint example using the Nile River water level data before and after a dam was built. We then move on to financial data, where we use this type of analysis to identify whether a stock's uptrend has ended and it's time to sell the stock. Finally, we explore the utility of this method with traditional non-time-series data.
G Liu, University of Heidelberg
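The idea of fitting two regression lines around an estimated breakpoint can be sketched by brute force: try every admissible split and keep the one minimizing the combined squared error. PROC UCM does something more sophisticated; this toy, with invented data, is only the intuition:

```python
def ols(xs, ys):
    """Least-squares intercept, slope, and SSE for one segment."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def best_breakpoint(xs, ys, min_seg=3):
    """Try every split point; keep the one whose two fitted lines
    minimize the combined sum of squared errors."""
    best = None
    for k in range(min_seg, len(xs) - min_seg + 1):
        sse = ols(xs[:k], ys[:k])[2] + ols(xs[k:], ys[k:])[2]
        if best is None or sse < best[1]:
            best = (k, sse)
    return best[0]

# Level data with a clear shift in trend at x = 5
x = list(range(10))
y = [1.0, 2.1, 2.9, 4.0, 5.1, 5.0, 4.1, 2.9, 2.0, 1.1]
k = best_breakpoint(x, y)
```

Once the breakpoint is located, each segment gets its own regression line, which is exactly the "two lines beat one" point of the abstract.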
Session 4437-2020
Building Web Applications with SAS® Has Never Been So Easy
Boemska AppFactory helps you complete the "last mile of analytics" by empowering users to innovate with highly tailored productivity apps. When combined with the SAS Viya Platform you can create sophisticated apps from your data and increase access to analytics by enabling more of your organization to harness the power of SAS by deploying modern, intuitive, and relevant apps.
Matthew Kastin, NORC at the University of Chicago; Nikola Markovic, Boemska
C
Session 5096-2020
Calculating Exposure Hours for Contact Investigations
Tuberculosis (TB) is a disease on the National Notifiable Disease List that continues to threaten the health of our nation. Epidemiologists and public health professionals work to detect, prevent, and treat TB patients. TB is caused by a bacterium called Mycobacterium tuberculosis. The bacteria usually attack the lungs but can also attack other parts of the body. Not everyone infected with TB bacteria becomes sick, but if it is not treated properly, the disease can be fatal. TB control programs use contact investigations (CIs) to assign priorities to individuals known as contacts of TB cases. As characterized by Cook, Shah, and Gardy (2012), "the closeness of contact has been defined by the amount of time spent in a shared airspace per week with minimal emphasis on specific environmental or social factors." Disease investigators use this information to perform targeted screening in a timely manner on individuals who may have become infected. The purpose of this presentation is to demonstrate how to calculate exposure hours for a CI incident that has occurred in a classroom setting. Using SAS® Enterprise Guide® software, we cleaned, formatted, and prepared data to be analyzed. With the results from our calculation, the disease investigator can derive the information needed to evaluate the individuals for targeted testing.
Cecine Nguyen, Los Angeles County Department of Public Health TB Control Program
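The exposure-hours calculation reduces to interval overlap: the time a contact shared airspace with the case, clipped to the case's infectious period. A minimal sketch with hypothetical times:

```python
from datetime import datetime

def exposure_hours(contact_start, contact_end, case_start, case_end):
    """Hours of shared airspace: overlap of the contact's time in the room
    with the case's infectious presence, zero if the intervals miss."""
    start = max(contact_start, case_start)
    end = min(contact_end, case_end)
    return max((end - start).total_seconds(), 0.0) / 3600.0

# A student in class 08:00-12:00 while the case was present 10:30-15:00
h = exposure_hours(datetime(2019, 9, 3, 8, 0), datetime(2019, 9, 3, 12, 0),
                   datetime(2019, 9, 3, 10, 30), datetime(2019, 9, 3, 15, 0))
```

Summing this quantity over each shared class session in a week yields the weekly exposure hours on which contact prioritization is based.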
Session 4613-2020
CAS Sessions: Understanding Connection Options to Ensure Data Access Security
In SAS® Viya®, there are a variety of ways to connect to the SAS® Cloud Analytic Services (CAS) analytics engine, or CAS server. The resulting CAS sessions might run under the CAS instance's service account or the end user's identity. This is further complicated by whether the CAS client is using the Kerberos authentication protocol, which results in differences depending on whether CAS is running on Linux or Windows. The various SAS Viya applications default to certain connection methods, which drive each CAS session's process owner and, finally, the resulting physical security contexts. Understanding the default CAS session behavior for each application, and how those defaults can be overridden and engineered to support data access policy, is the focus of this paper.
Philip Hopkins, SAS Institute Inc.
Session 5117-2020
Case Study: Design a High-Performance SAS® Grid Infrastructure
Come and learn how SAS and Stratacent helped a large financial client design and upgrade their existing SAS® Grid infrastructure to handle the workload from a new accounting program. The client's existing SAS Grid environment was not designed to handle the increased program load. One of the client's requirements was faster execution of the models, so that executives have ample time to preview the results before submitting them. SAS and Stratacent, together with the customer, reviewed the current load and the performance that the existing system was providing. Our team extrapolated the workload that the new accounting program would add to the system and how performance would be affected. In this presentation, we discuss how the new performance metrics were created, various design options and their impact on the new SAS Grid environment, and the final specifications.
Sandeep Paul, Stratacent
Session 4454-2020
CASL, a Language Specifically Designed for Interacting with SAS® Viya®
CASL is a new language designed to run SAS® Cloud Analytic Services (CAS) actions and process responses to generate a report. To make it easy for SAS users, the language syntax mimics the syntax of the DATA step. CASL is not just a language, but a programming environment that is embeddable into any program. CASL is available through the CAS procedure or as the action runCasl in the CASL Server action set. Learn how to use the CASL language to pipeline actions one after another to produce a report. You will learn how to use CASL to create your own result tables from action results. Allow me to introduce you to a very powerful language that is simple to use.
Jerry Pendergrass, SAS Institute Inc.
Session 4322-2020
Causal Effect Estimands: Interpretation, Identification, and Computation
In modern statistics and data science, there is growing attention on estimating causal effects by using data from nonrandomized or imperfectly randomized studies. This task arises in applications such as post-approval analysis of medical treatments, evaluation of public policies, and assessment of marketing campaign efficacy. One challenge of these applications is the variety of causal effects that you can estimate. For example, you might need to determine whether to estimate the average treatment effect (ATE), the average treatment effect for the treated (ATT), a mediated effect, or other conditional effects. Identifying the causal effect most relevant to your application can have important implications for determining what approach to causal inference is most appropriate. This paper provides an overview of different types of causal estimands, a comparison of how the different estimands are interpreted, and guidance on how identifying an appropriate estimand can help you determine an appropriate causal analysis workflow. The CAUSALGRAPH, CAUSALMED, CAUSALTRT, and PSMATCH procedures in SAS/STAT® software are used to demonstrate the workflow. The paper also includes a review of the assumptions that are required for identifying and estimating causal effects.
Clay Thompson, Michael Lamm, and Yiu-Fai Yung, SAS Institute
Session 4791-2020
CECL/IFRS9 Implementation: What Is Driving My Quarter-Over-Quarter Change in Allowance?
Ford Motor Credit Company collaborated with SAS Institute Inc. to develop a quantitative process to explain the drivers in variability of issue-to-issue change in Expected Credit Loss (ECL), based on the Current Expected Credit Loss (CECL) / International Financial Reporting Standards 9 (IFRS9) framework. The underlying methodology performs attribution of the ECL change at the contract level. Results from this framework are compared to other ECL attribution approaches. This paper provides an analysis of the findings and compares the benefits and downsides of each of the approaches investigated.
Usha Srinivasan, Maria Mutuc-Hensley, and Jill Bewick, Ford Motor Credit Company LLC
Gaurav Singh and Danny Noal, SAS Institute Inc.
Session 4636-2020
Child Maltreatment Report Outcomes: Do Reports from Mandated and Non-mandated Reporters Differ?
Before the age of 18, 1 in 8 children in the US will be confirmed to have experienced child maltreatment (CM), and 1 in 17 children will be placed in foster care. Mandated reporting laws, which require certain professionals to report suspected CM, constitute a core policy response to CM in the US. Reports from mandated sources have consistently been more likely to be substantiated (confirmed as CM). Recent work, however, has suggested that substantiation might not be fully informative as an outcome measure, bearing little relationship to subsequent risk of CM. Our study attempts to evaluate differences between reports from mandated and non-mandated sources at the initial report and over time. Our research questions are: Do reports from mandated and non-mandated sources have different outcomes at the initial report (substantiation and foster care entry) and different trajectories beyond the initial report (re-report, later foster care entry)? Using a detailed cross-sector, longitudinal data set collected in a midwestern metropolitan area from 1993 to 2009 (N=7,302) and a larger (N=2 million) but less detailed national universal CM and foster care data set, logistic regression and survival analysis were conducted. For both data sets, mandated reporters were more likely to have their reports substantiated and to transition into foster care after the initial report. In stark contrast, we found that reporter status did not predict report recurrence or later foster care entry.
Maria Gandarilla Ocampo and Brett Drake, Washington University
Session 4536-2020
Choose Your Own Adventure: Manage Model Development Via a Python Integrated Development Environment
Data scientists often need to work with multiple languages and in multiple analytic environments to solve a problem. SAS® provides a complete end-to-end environment, but it has traditionally been accessible to users only through GUIs and SAS languages. This paper introduces a new tool enabling data scientists to manage components of the analytics life cycle from within any Python environment. We first demonstrate how to register a model developed with Python using SAS® Model Manager, before exploring methods for managing, deploying, and tracking the model. In addition, we show how to accomplish supporting tasks such as rendering visualizations and extending the existing functionality.
Jon Walker, SAS Institute
Session 5003-2020
Classification of Regional Characteristics Using Population Composition Data and POS Data
In recent years, along with the development of information technology, means of transmitting information have evolved from uniform to individually customized. However, in the case of the promotional flyers that many firms in Japan create, the reality is that they provide the same content for their consumers nationwide. In our study, we focused on a drugstore chain with the second-highest sales in the Japanese drugstore industry. This firm has about 2,000 stores in Japan. The problem with this firm is that although it prioritizes store management rooted in local residents, it distributes flyers with the same products and layout all over Japan. Using population data from Tsukuba, Ibaraki Prefecture, we created a Huff model to identify the characteristics of each store in the city. In Tsukuba, for example, the area where young people live and the area where elderly people live are relatively separated, so you might expect the drugstores in those two areas to have different characteristics. Using SAS® software, we analyzed whether there is a difference in sales between stores in different categories based on store characteristics and point-of-sale (POS) data. With this approach, we were able to find what seemed to be regional characteristics in a certain product group.
Kazuma Bannai, Yutaka Ishibashi, and Yuki Ukeba, University of Tsukuba
Session 4906-2020
Cloud, On-Premises, and DevOps: Implementing SAS® Customer Intelligence 360 in a Banking Environment
SAS® Customer Intelligence 360 is a SaaS platform that integrates with your on-premises customer data and marketing processes. This hybrid setup might raise a few questions when applied in a corporate environment, especially in a tightly regulated one such as a bank: How do we authenticate users? How do we make the system secure? How do cloud and on-premises components communicate? Which data is persisted in the cloud? On the other hand, keeping your data in-house while benefiting from the advantages that cloud-based software offers, like continuous development of new features, elastic scalability, and reduced maintenance, is wonderful. This paper answers those questions with a real-world example: the implementation of SAS 360 Engage: Direct at ING BE and ING DBNL. You get insights into the data flows of SAS Customer Intelligence 360, how APIs are used to integrate into ING's IT landscape for user management and the four-eyes principle, and how DevOps software tools and methodology can be used to automate the setup and configuration of SAS. Also, given that this implementation used an early release of SAS 360 Engage: Direct, this paper shows how the flexibility of SAS software allows for workarounds for features not yet available.
Andrea Defronzo, ING
Roel Van Assche, SAS Institute
Session 5332-2020
Coding in SAS® Viya®
This hands-on workshop is for users who want to take advantage of the boost in processing speed for Base SAS programs executing in SAS Viya. This paper covers using the power of SAS Cloud Analytic Services (CAS) to access, manage, and manipulate in-memory tables. The purpose of this paper is to support users in their ability to get started with coding in SAS Viya as they transition from coding in Base SAS programs. Users learn to perform three simple yet important tasks to get comfortable with the language of SAS Viya: use the CAS LIBNAME engine for data transfer between SAS and CAS; load data to a caslib and process data in CAS; and modify SAS programs to run in SAS Viya.
Charu Shankar, SAS Institute Inc.
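The three tasks described in the abstract above can be sketched in a few lines of SAS code. This is a minimal illustration, not code from the workshop; the session name mysess, the casuser caslib, and the use of sashelp.cars are assumptions.

```sas
/* A minimal sketch of the three tasks: connect, load/process, modify. */

/* 1. Start a CAS session and assign a CAS engine libref */
cas mysess;                       /* connect to the CAS server         */
libname mycas cas caslib=casuser; /* CAS LIBNAME engine                */

/* 2. Load a SAS data set into CAS and process it in memory */
data mycas.cars;                  /* this DATA step runs in CAS        */
   set sashelp.cars;
run;

proc mdsummary data=mycas.cars;   /* in-memory summarization           */
   var msrp;
   output out=mycas.cars_summary;
run;

/* 3. A Base SAS program modified for SAS Viya: point the libref at   */
/*    CAS and much of the existing code runs as is                     */
proc print data=mycas.cars_summary; run;

cas mysess terminate;             /* end the CAS session               */
```

Once the libref points at CAS, many familiar steps and procedures operate on the in-memory table without further changes.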
Session 5062-2020
Combination Weighted Log-rank Tests for Survival Analysis with Non-proportional Hazards
The statistical methods most commonly used to test the equality of survival curves in time-to-event analysis rely on the assumption of proportional hazards. In oncology drug development, non-proportional hazards between investigational treatments are often observed, but statistical methods that properly account for these situations are rarely used in practice. The use of combinations of Fleming-Harrington weighted log-rank statistics is one relatively straightforward way to perform hypothesis testing in the presence of non-proportional hazards. In this approach, the maximum test statistic of several weighted log-rank statistics (Zmax) is calculated from Z-statistics obtained using the G(ρ,γ) family. A combination test can simultaneously detect equally weighted, early, late, or middle departures from the null hypothesis and can thus robustly handle several non-proportional hazard types with no a priori knowledge of the hazard functions. Although the LIFETEST procedure allows for testing with Fleming-Harrington weighted log-rank statistics, there is no built-in functionality in SAS® to test combinations of weighted tests. We discuss the development of a SAS macro that implements combination testing, including estimation of the variance-covariance matrix of the joint distribution of the Z-statistics and calculation of the p-value for Zmax.
Andrea Knezevic and Sujata Patil, Memorial Sloan Kettering Cancer Center
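The built-in piece that the abstract mentions, individual Fleming-Harrington weighted log-rank tests in PROC LIFETEST, can be sketched as follows. The data set, variable names, and the particular (ρ,γ) choices are illustrative assumptions; combining the resulting Z-statistics into Zmax requires the macro the paper describes, not this code.

```sas
/* Sketch: request the log-rank test plus several Fleming-Harrington   */
/* G(rho,gamma) weighted tests, which weight early, late, or middle    */
/* departures differently.                                             */
proc lifetest data=trial;
   time os_months * censor(1);          /* censor=1 flags censoring    */
   strata arm / test=(logrank fleming(0,1)   /* late differences       */
                      fleming(1,0)           /* early differences      */
                      fleming(1,1));         /* middle differences     */
run;
```

Each TEST= keyword produces its own chi-square statistic and p-value; the combination approach then takes the maximum of the standardized statistics and adjusts the p-value for their joint distribution.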
Session 4521-2020
Common Points of Compromise Detection Using Network Analysis
Common points of compromise (POC) are entities such as merchants and websites that suffer from a security breach that results in the compromise of a multitude of cards, online credentials, and so on. The form of such breaches can range from sophisticated security attacks on large merchants that are well publicized to an opportunistic staff member harvesting details on a regular basis. Such forms of fraud continue to thrive, and therefore POC detection continues to be a key tool in combating various forms of banking fraud such as card fraud and online banking fraud. In this paper, we introduce a process that is designed to identify POCs by combining techniques from network analysis and machine learning, engineered completely within the SAS® ecosystem.
Prathaban Mookiah, Hamoon Azizsoltani, Ian Holmes, and Tom O'Connell, SAS Institute
Session 4091-2020
Common Tasks Done with CASL
Some tasks are so common that they are used in almost every program: tasks such as loading data tables and then either deleting or saving those tables. Other tasks include looping through the data, summarizing the data, and sorting the data. As a SAS® programmer, you know the code that you need to accomplish these common tasks. CASL, the language created to communicate with the SAS® Cloud Analytic Services (CAS) server, might be new to you. This paper demonstrates how to perform those common tasks with CASL. Understanding how to perform common tasks and having the code to do them empowers you to develop CASL programs and take full advantage of the CAS server.
Jane Eslinger, SAS Institute
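A few of the common tasks named above can be sketched in CASL through PROC CAS. The caslib, file path, and table names are assumptions for illustration.

```sas
/* Sketch: load, summarize, and drop an in-memory table with CASL. */
proc cas;
   /* load a data file into memory as a CAS table */
   table.loadTable / path="cars.sashdat" caslib="casuser"
                     casOut={name="cars", caslib="casuser"};

   /* summarize the data and capture the action result in r */
   simple.summary result=r / table={name="cars", caslib="casuser"};
   print r.Summary;                 /* work with the result table  */

   /* drop the in-memory table when finished */
   table.dropTable / name="cars" caslib="casuser";
quit;
```

The result= syntax is what makes CASL more than a command runner: action results come back as dictionaries of tables that subsequent CASL statements can reshape into a report.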
Session 4960-2020
Computer-Aided Diagnosis System for Breast Ultrasound Images Using Deep Learning
The purpose of this study was to develop a computer-aided diagnosis (CAD) system for the classification of breast malignant and benign masses using ultrasonography based on a convolutional neural network (CNN). We explored the reasons for the classification by generating a heat map capable of presenting the important regions used by the CNN for the classification. From the clinical data obtained in a previously conducted large-scale clinical trial, we collected images of 1,536 breast masses (897 malignant and 639 benign), with each breast mass captured from various angles. We constructed an ensemble network by combining two CNN models (VGG19 and ResNet152) fine-tuned on augmented training data and used the mass-level classification method to enable the CNN to classify a given mass using all views. To visualize the regions detected by the CNN models to classify breast masses, we performed heat map analysis. CNN models and heat map analysis were implemented in SAS® Viya® 3.4. For an independent test set consisting of 154 masses (77 malignant and 77 benign), our network showed outstanding classification performance, with a sensitivity of 90.9%, a specificity of 87.0%, and an area under the curve of 0.951. In addition, our study indicated that over half of the breast masses were not important regions for correct classification. Collectively, this CNN-based CAD system is expected to assist doctors and improve the diagnosis of breast cancer in clinical practice.
Hiroki Tanaka, Japan Tobacco Inc.
Session 4158-2020
Create Impactful Data Journeys for "Informavores" with SAS® Visual Analytics on SAS® Viya®
Have you ever thought about what makes one dashboard or infographic more impactful than another? This paper provides all you need to take your current information products from "huh?" to "aha!" Creating data applications with SAS® Visual Analytics is important to ensure that the audience gets immediate and long-term value from your output. Competing for audience attention today is harder than ever, and the term informavore refers to today's insatiable need for information: we are all informavores. Humans need answers, or we start filling in the answers without the correct context or data. Our job as data and business analysts is to create data applications that provide a journey from one infofragment to another. This paper steps you through creating impactful output that resonates with your audience, no matter how data literate they are. SAS Visual Analytics on SAS® Viya® provides so many options to craft the precise journey for all users in an organization. You will gain insight about how to leverage these features in SAS Visual Analytics and how to modernize the way you communicate data stories to your audience. The journey you design for your audience determines how long they stay and whether they keep coming back to your dashboards and reports in the future. This paper provides you with the ideas and skills needed to create and refine your information products to keep informavores hooked and wanting more tomorrow.
Travis Murphy, SAS Institute
Session 4450-2020
Creating a SAS® Technical Platform Standard for IFRS 17
By 2022, IFRS 17 from the International Financial Reporting Standards Foundation has the potential to unsettle the entire insurance industry. It not only brings new regulatory requirements; the technical capability needed to support this new regulation can also introduce new costs for insurance providers. As an insurance company, are you ready to support the technical requirements of an end-to-end IFRS 17 platform, including architecture, installation, verification, and validation? Does your current IT organization have the capacity to scale to support this new solution? How ready are you to support an IFRS 17 platform? Is your IT team prepared to implement and support a highly available IFRS 17 platform? If the answer is "NO," then this session is for you! This session touches on the following topics: architectural decisions, security, and best practices (such as why it's important to have a platform standard and how to leverage DevOps tools like Puppet). Also included in this session are specific tips and tricks for preparing your institution to use SAS® software to implement your IFRS 17 platform.
Sumit Kumar, Amjad Ghori, and Melissa Cooper, SAS Institute
Session 4243-2020
Creating Custom Microsoft Excel Workbooks Using the SAS® Output Delivery System, Part 1
This paper explains how to use Base SAS® software to create custom multi-sheet Microsoft Excel workbooks. You learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output by using the SAS® Output Delivery System (ODS) Report Writing Interface (RWI) and the ODS destination for Excel. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on demand and in real time using SAS server technology is discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.
Vincent DelGobbo, SAS Institute
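The ODS destination for Excel that the abstract above describes can be sketched as follows. The file path, sheet names, and the use of sashelp.cars are illustrative assumptions, not examples from the paper.

```sas
/* Sketch: a two-sheet Excel workbook via the ODS destination for Excel. */
ods excel file="/tmp/cars.xlsx"
          options(sheet_name="Summary" embedded_titles="yes");
title "MSRP by Origin";
proc means data=sashelp.cars mean min max;
   class origin;
   var msrp;
run;

ods excel options(sheet_name="Detail");   /* start a second worksheet   */
title;
proc print data=sashelp.cars(obs=20) noobs;
   var make model msrp;
run;
ods excel close;                          /* write the .xlsx file       */
```

Each procedure's output lands on the current worksheet, and resetting OPTIONS(SHEET_NAME=) between steps is one simple way to control the multi-sheet layout.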
Session 4185-2020
Creating Custom Web Applications with SAS Viya Jobs
In prior versions of SAS®, an oft-overlooked capability was using SAS® Stored Processes to stream HTML content and essentially create full-blown web applications that take advantage of the SAS® Platform. Using a combination of HTML, CSS, JavaScript, and SAS code, the possibilities were endless. From workflow, to data entry, to geographic mapping applications: if you could envision it, you could probably build it with SAS Stored Processes. With the introduction of SAS® Viya®, the concept of SAS Stored Processes was initially sunset. Thankfully, the SAS® Job Execution Web Application enables much of the same custom application development functionality that SAS Stored Processes did, while enabling you to leverage the full power of SAS Viya. Whether you're familiar with the old approach based on SAS Stored Processes or not, this breakout session shows you the easiest ways to leverage SAS and HTML with the SAS Job Execution Web Application. The agenda for this session includes the following topics: what a SAS Viya job is; how to create and register a job in SAS® Studio; how to associate a form with a job for parameter selection; incorporating HTML into your job; and creating a full-blown web application by using jobs.
Robert L. Anderson II, SAS Institute
Session 5155-2020
Creating Episodes of Care and Finding Coverage Gaps with Administrative Medical Records
Analysis and research on administrative healthcare data often involve the manipulation of multiple records with contiguous or overlapping date fields. Claims data contain transaction records with multiple records for each individual across multiple dates of service, while enrollment data often contain multiple records per person reflecting that individual's enrollment change history. In many cases, it is necessary to roll up the date fields in these types of files in order to establish continuous enrollment, episodes of care, and the gaps in coverage for those periods. In this presentation, a flexible SAS® program is introduced that rolls up multi-record date fields to establish continuous segments, segment lengths, and segment gap lengths and to calculate the number of segments and gaps for an individual ID variable. Using SAS language elements such as the LAG and INTCK functions and the RETAIN statement, along with procedures such as PROC SQL and PROC EXPAND, the target data are transformed from long to wide, enabling the user to more easily process hospitalization data into episodes of care or evaluate claims and enrollment records against standardized performance measures such as the Healthcare Effectiveness Data and Information Set (HEDIS).
Stephen Lein, Arkansas Center for Health Improvement
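The core rollup idea described above can be sketched with a BY-group DATA step. This is not the paper's program; the input data set, variable names, and the one-day allowable gap are assumptions for illustration.

```sas
/* Sketch: collapse per-person date records into continuous segments,  */
/* treating records that start within 1 day of the running segment     */
/* end as a continuation rather than a new segment.                    */
proc sort data=enroll; by id start_dt; run;

data segments;
   set enroll;
   by id;
   retain seg_num seg_start seg_end;
   if first.id then do;
      seg_num = 0; seg_start = .; seg_end = .;
   end;
   /* new segment if this record starts after the permitted gap */
   if seg_end = . or intck('day', seg_end, start_dt) > 1 then do;
      if seg_num > 0 then output;    /* emit the finished segment */
      seg_num + 1;
      seg_start = start_dt;
      seg_end   = end_dt;
   end;
   else seg_end = max(seg_end, end_dt);  /* extend current segment */
   if last.id then output;               /* emit the final segment */
   format seg_start seg_end date9.;
   keep id seg_num seg_start seg_end;
run;
```

Segment lengths and gap lengths then fall out of simple INTCK calls on the resulting seg_start and seg_end values.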
Session 4543-2020
Creating Reports that Comply with Section 508 Using SAS 9.4
When reports need to meet Section 508 and other accessibility compliance requirements, those needs are often only addressed in the last stages of creating and designing reports. At that point in the process, decisions have already been made that either limit the ability to make reports accessible or require significant revisions to output that has already been designed and generated. The result is a report that either does not fully meet accessibility requirements or that requires a significant investment in additional resources to fix the problem. This session details the issues report authors need to consider before they write their first line of code when planning for how they will create and deliver their report. This includes the technologies they choose to use (for example, HTML vs. PDF vs. Microsoft Excel), the way data is presented in both tabular and graph form, and what technologies and techniques are available in the SAS® 9.4M6 Output Delivery System (ODS) to achieve these goals. The end result is that the often tedious act of remediating for accessibility turns into an easy task that emerges naturally as you create reports.
Greg Kraus, SAS Institute
Session 4265-2020
Customize Your Table of Contents with the ODS Destination for Word
The Base SAS® Output Delivery System (ODS) destination for Word enables customers to deliver SAS® reports as native Microsoft Word documents. ODS WORD generates reports in the Office Open XML Document (.docx) format, which has been standard in Microsoft Word since 2007. The .docx format uses ZIP compression, which makes for a smaller storage footprint and speedier downloading. ODS WORD is preproduction in the sixth maintenance release of SAS 9.4. This paper shows you how to make a custom table of contents (TOC) with ODS WORD. You will learn how to control the placement, text, and style of your TOC. Place your TOC anywhere in the body of your document. Make your TOC title and entry text anything you want it to be. Assign custom colors or fonts to your TOC text. Even put a stylish border around your TOC if you like! Adding a TOC to your document makes it more navigable, whether it is in digital or hardcopy format. Adding a custom TOC makes your document smarter, which makes you look smarter too! Whatever your SAS programming experience level may be, you will benefit from this session.
David W. Kelley, SAS Institute
Session 5170-2020
Customized Output with the SAS® Config File: Always Useful but Seldom Used
The SAS® config file is powerful and useful. In this example, we show how to customize the names of the log and listing files produced by the code we run, obtaining a useful new name as the result. The new name of a log file might look like this: 20180904_hhmm_userID_name-of-code-that-was-run.log. (This is a mock-up of the proposed file name.) The benefit is that when one user or a team of users builds the code, everyone can see its progress. In this way, collaboration is simplified, and results are easier for colleagues to share and consume. The macro is intended to inspire you to see what additional SAS System values you can add in your own style.
Zeke Torres, Code629
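One way to build a file name like the one shown above is with a small macro and PROC PRINTTO, rather than the config file itself. This is a sketch in that spirit; the /logs directory and the macro name are assumptions, and the paper's own macro may differ.

```sas
/* Sketch: route the log to a timestamped, user-stamped file name of   */
/* the form yyyymmdd_hhmm_userID_program.log.                          */
%macro route_log(pgm);
   %local t today hh mm;
   %let t     = %sysfunc(time());
   %let today = %sysfunc(date(), yymmddn8.);     /* e.g., 20180904     */
   %let hh    = %sysfunc(hour(&t), z2.);
   %let mm    = %sysfunc(minute(&t), z2.);
   proc printto
      log="/logs/&today._&hh.&mm._&sysuserid._&pgm..log" new;
   run;
%mend route_log;

%route_log(monthly_report)

/* ... program code runs here, logging to the new file ... */

proc printto; run;   /* restore the default log destination */
```

The same naming pattern can be applied at invocation time through the -log option in the config file or command line, which is the approach the paper proposes.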
D
Session 4295-2020
Data Breaches Are Deadly - Let's Move Our Data with Cloud Data Exchange
Data preparation involves collecting, processing, and cleansing data for use in analytics and business intelligence. These tasks include accessing, loading, structuring, purging, unifying (joining), adjusting data types, verifying that only valid values are present, and checking for duplicates and uniform data (for example, two birthdates for one person). When we look specifically at accessing data, we bump into the first real issue: dealing with sensitive and personal data. Many industries (for example, banks) have very strict rules concerning the movement, handling, and storage of data. Data preparation is playing the role of self-service data management. Traditional data management processes can produce data up to a point, but dynamic fine-tuning and last-minute work is being done in a self-service way, using data preparation tools. But how can we feed this process with sensitive or personal data without potentially compromising it? Cloud Data Exchange is an intrinsic part of SAS® Data Preparation. With Cloud Data Exchange, we can give users secure, easy access to data, help them easily manage access to remote data, securely transport the data to any location (including the cloud), enable them to control read and write access, and let them control and monitor data usage. Let's look at how Cloud Data Exchange can enable secure movement of data from source to on-premises, public cloud, private cloud, and hybrid cloud locations.
Ivor G. Moan, SAS Institute
Session 4356-2020
Data Catalog: The Binding Governance Tool for a Successful Digital Transformation
No one today challenges the power of data as a driver for growth and innovation. One after another, companies are starting their digital transformation to add value to their data and to build a data-driven strategy. Unfortunately, most organizations govern their data in an ad hoc or firefighting manner across different parts of the business, and most of the time only within IT. Mapping data by building a data catalog is one of the first steps toward more governance and sustainability. A data catalog is a repository of metadata, centralizing information about data sources, schemas, tables, and columns. In this paper, we discuss the importance of data catalogs and how to build a data catalog on the SAS®9 Platform using open-source technologies.
Vincent Rejany, SAS Institute
Session 4138-2020
Data Entry in SAS® Visual Analytics: Is It Possible?
Achieving data entry within SAS® Visual Analytics can be a challenge for many customers. SAS Visual Analytics offers a way to achieve it using the data-driven content object and SAS® Viya® jobs. This session presents the available techniques that can be used to integrate a data-entry form into a SAS Visual Analytics report. It covers the basic principles of data-driven content objects and the different approaches to storing the data. At the end of the session, you should be able to create a simple data-entry form, insert it in a SAS Visual Analytics report, and store the data in SAS® Cloud Analytic Services (CAS) for later use.
Xavier Bizoux, SAS Institute
Session 5192-2020
Data Preparation Techniques for Cloud-Based Data Sources
Traditional data warehousing continues to evolve as the industry continues to embrace more and more cloud-based transactional systems. The days of accessing on-premises source system databases to extract data for your data warehouse, report, or analytics model are becoming more and more rare. In order to be ready to successfully navigate this new world, this session focuses on techniques and alterations needed to update your data preparation and data warehousing strategies. These strategic changes address the many differences that are presented by cloud-based source systems.
Jennifer L. Nenadic, SAS Institute
Session 4416-2020
Data Standardization Using SAS® Health Data Mapper
Businesses across all sectors and domains are combining their vast holdings of big data with high-powered analytics to find solutions to the complex and critical business problems in their industries. A challenge in moving to this data-driven paradigm is first incorporating the wide variety of data sources that companies have available. The health and life sciences industry is no different, as it encompasses everything from clinical trials, which include patient demographics and lab results, to insurance claims. SAS® Health: Data Mapper on SAS® Viya® provides a modern analytics and reporting platform for health-care data, and it gives you the power to achieve the following: define standards, using a flexible definition format, to explicitly outline the expected format of your final data structures; explore and visualize your data sources to better determine how to harmonize them with one another; map your data via a guided mapping process; export transformed data sets or the SAS® code generated by the mapping process; and take advantage of existing work to repeat the same transformations on different data sources. This paper includes a description of some of the common steps you need to perform during the transformation process, including data merging, matching, and ensuring quality. The examples provided are applicable to anyone who needs to transform data to a known model.
Eric Bolender, SAS Institute
Session 4700-2020
Data-Driven Robotics - Leveraging SAS and Python Software to Virtually Build LEGO MINDSTORMS Gear Trains for the EV3 Brick
LEGO MINDSTORMS Evolution 3 (EV3) represents the third-generation programmable "Brick," a handheld computer developed by the LEGO Group that intelligently drives and forms the cornerstone of LEGO robotics. Released in 2013, EV3 leverages LEGO Group-built sensors (including haptic, gyroscopic, ultrasonic, infrared, and others) and servomotors (rotary motors that track speed, degrees, and angle of rotation) to interpret, interact with, and respond to environmental and user stimuli. Although EV3 robotics locomotion begins with large and medium LEGO servomotors, gears and gear trains facilitate complex actions, movements, and the increase of speed or torque. To this end, this paper introduces LEGO gears and simple gear trains, and includes SAS® code that programmatically identifies how (and how well) LEGO gears mesh with each other in a two-dimensional (2D) plane. Data-driven software evaluates a table of 41 LEGO gears and programmatically determines where on a virtual 9x9 LEGO stud plane the gears can be placed to mesh. Moreover, by modifying additional tables, the 9x9 stud plane can be replaced with other LEGO Technic beams (or other bricks) to demonstrate where gears can be placed. Additionally, a FUZZ parameter enables the user to specify the number of millimeters of gap or overlap permitted between gears. This data-driven design maximizes software flexibility and configurability, providing dynamic output to meet the needs of different users by modifying tables and parameters only, not code. Finally, the interoperability of data-driven design is showcased in that equivalent SAS and Python code are included, both of which rely on the same parameters and control tables. For more information, please consult the unabridged 69-page text (https://communities.sas.com/t5/SAS-Communities-Library/Data-Driven-Robotics-Leveraging-SAS-and-Python-Software-to/ta-p/641990) and the 30-minute 4K video (https://youtu.be/rvFS0rj6ml4).
Troy Martin Hughes
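The meshing test the paper describes can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: it assumes LEGO gears are module-1 (pitch radius roughly teeth/2 mm), that adjacent studs sit 8 mm apart, and that FUZZ is the permitted gap or overlap in millimeters.

```python
from math import hypot

STUD_MM = 8.0  # center-to-center stud spacing, in millimeters

def pitch_radius(teeth):
    """Approximate pitch radius of a module-1 LEGO gear, in millimeters."""
    return teeth / 2.0

def gears_mesh(teeth_a, pos_a, teeth_b, pos_b, fuzz=0.1):
    """True if gears at stud coordinates pos_a and pos_b mesh within fuzz mm."""
    (xa, ya), (xb, yb) = pos_a, pos_b
    dist_mm = hypot(xb - xa, yb - ya) * STUD_MM
    return abs(dist_mm - (pitch_radius(teeth_a) + pitch_radius(teeth_b))) <= fuzz

# An 8-tooth and a 24-tooth gear (pitch radii 4 + 12 = 16 mm) mesh 2 studs apart.
print(gears_mesh(8, (0, 0), 24, (2, 0)))  # True
print(gears_mesh(8, (0, 0), 24, (3, 0)))  # False
```

Sweeping this check over every gear pair and every pair of positions on a 9x9 stud grid reproduces the kind of table-driven evaluation the abstract outlines.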
Session 5072-2020
Dealing with Missing Data in Epidemiological and Clinical Research
Missing values can have a surprising impact on the way data is analyzed and processed. Epidemiological and clinical research typically involve complex data and large databases that frequently contain missing data. The impact of missing data on data analysis and research findings can be significant, so it is important to develop a sound methodology to deal with it. Fortunately, there are powerful tools to represent and reference the missing data in SAS® analytics. There are a number of SAS functions and procedures that enable differentiated approaches for handling missing data. However, dealing with missing data can still be a bit of a minefield. This paper presents an introduction to categories of missing data and demonstrates some techniques that researchers can use to deal with missing data.
Andrew T. Kuligowski, Independent Consultant
Lida Gharibvand, Loma Linda University
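Two of the simplest techniques in this area, complete-case analysis and mean imputation, can be sketched as follows. This is an illustrative Python analogue, not the paper's SAS code; the data and variable names are invented, and neither technique is appropriate when data are missing not at random.

```python
def complete_case(rows):
    """Keep only rows with no missing (None) values."""
    return [r for r in rows if None not in r.values()]

def mean_impute(rows, var):
    """Replace missing values of var with the mean of the observed values."""
    observed = [r[var] for r in rows if r[var] is not None]
    fill = sum(observed) / len(observed)
    return [{**r, var: fill if r[var] is None else r[var]} for r in rows]

data = [{"age": 40, "bmi": 22.0}, {"age": 55, "bmi": None}, {"age": 61, "bmi": 30.0}]
print(len(complete_case(data)))            # 2
print(mean_impute(data, "bmi")[1]["bmi"])  # 26.0
```

Complete-case analysis discards information and can bias results unless data are missing completely at random, which is one reason the paper surveys more differentiated approaches.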
Session 4236-2020
Delay Analysis in the SAS® Middle Tier
Investigation of performance delays in large distributed software environments like SAS® 9.4 can be quite complex. This paper reviews the aspects of the SAS environment that make it so challenging to analyze. The paper goes on to explain some of the techniques that SAS Technical Support uses to diagnose and solve these issues. For example, the discussion demonstrates how to use log analysis to rule in, or rule out, contributors to the source of the problem. In addition, you will discover how to use high-frequency thread dumps to identify specific problematic components. The discussion also covers how you can use network analysis to measure and identify specific inter-process delays in specific components. Case studies illustrate all the techniques that are presented. This paper is aimed at system administrators, network engineers, and others who might be involved in the investigation of performance issues on the SAS middle tier.
Bob Celestino, SAS Institute
Session 4694-2020
Deploying Computer Vision by Combining Deep Learning Action Sets with Open-Source Technology
Whilst early computer vision dates back as far as 1927, it has gained momentum in recent years due to developments in the fields of deep learning and artificial intelligence. With the desire to apply computer vision in ever more general and flexible contexts, the challenge arises of how to push image processing models to production in a robust way. This paper focuses on doing Automatic Meter Reading (AMR) using customer-submitted photos of their meters. The challenges in this context are two-fold. First, we need to localise the box containing the digits, and then we must classify each digit with the correct label, 0-9. Many approaches are available, but both of these challenges can be addressed by combining the Deep Learning action set in SAS® Visual Data Mining and Machine Learning (VDMML) with DLPy and Keras. However, when applying these models in a real-world business context, an additional challenge arises, which is how to deploy and keep track of these models in a consistent way. This paper shows how SAS® and open source tools can be used together to provide a consistent approach both to creating as well as managing and deploying models.
Jonny McElhinney and Duncan Bain, ScottishPower Energy Retail Ltd
Haidar Altaie, SAS Institute Inc., UK
Session 4553-2020
Deploying Machine Learning Models in an Anti-Money Laundering (AML) Program
As expectations of artificial intelligence (AI) and deep learning have peaked in the financial services industry, anti-money laundering (AML) professionals are exploring advanced methods to more accurately identify suspicious activities that impact their institutions. Fueled by regulatory guidance in the 2018 Joint Statement on Innovative Efforts to Combat Money Laundering and Terrorist Financing, many have set up pilot programs, but few have moved into production. Creating a modern AML platform that can support rapid and automated deployment of models and traditional rule-based scenarios ensures that banks will evolve to keep pace with sophisticated financial crimes. This paper explores use cases for machine learning models in AML and provides examples of how clients are promoting advanced analytics from the sandbox into their production SAS® AntiMoney Laundering software environment.
Beth Herron and Saurabh Duggal, SAS Institute
Session 4323-2020
Designing Dashboards for Multiple Target Audiences with SAS Visual Analytics
There is no doubt that organizations have a vast amount of data available at their disposal. Yet, they often fall into the trap of creating a single dashboard for a wide variety of audiences to derive meaningful insights from the available data. Different audiences expect different insights from the data. Even the dashboard design thinking differs based on the target audience. This presentation focuses on three key audiences (executives, middle management, and individual contributors) and how to use SAS® Visual Analytics to create dashboards that generate powerful insights for each of these. The presentation also covers dashboard design best practices and recommendations from field consultants to help you weave a compelling story from your organizational data.
Kunal Shah, SAS Institute
Session 5202-2020
Determining College Student Success
Student loan debt is an extremely hot topic in the media and is continuously on the rise, totaling over 1.5 trillion dollars in the United States alone. The overall goal of this project was to determine a student's best financial option based on the degree they intend to seek, the university they attend, and the average salary per degree after graduation. In order to narrow the scope, this idea was applied to schools in North Carolina within the UNC public school system. Private universities in North Carolina were excluded because tuition rates at many private institutions exceed the amount of funding typically available to a student through federal loans.
Carl Palombaro, Kayla Barrier, and Katherine Floyd, UNC Wilmington
Session 5175-2020
Developing an Automotive Safety Ontology through Concept Map and Text Analytics
Vehicle safety is an important area in the automotive sector; however, there is no standard terminology that all the stakeholders can use. Ontologies have long been argued as one approach for capturing and representing domain knowledge. Ontologies define the terminology of a domain by specifying the relevant hierarchical concepts and their relationships. Ontology development is an expensive and time-consuming process. This paper proposes a concept map-based approach for automotive safety ontology development by first semi-automatically creating detailed-level entities/concepts as a keyword list by applying natural language processing, including word dependency and POS tagging. Specifically, SAS Text Miner 15.1 will be used for analyzing the customer complaint dataset published by NHTSA. The ontology development workflow will include standard text mining nodes such as Import, Parsing, Filter, Topic, and Cluster for processing the customer complaint text and deriving safety-related terms and relationships. This is then used to extract appropriate entities/concepts and develop the concept map and eventually an ontology for the automotive safety domain. Having a unified ontology will greatly help in minimizing the miscommunication between various stakeholders and ensure that the designers, suppliers, manufacturers, dealers, and repair shops are all on the same page with respect to automotive safety related issues. The intended audience for this presentation is SAS users who are working in the area of text analytics and automotive safety professionals.
Sadikshya Basnet, Oakland University
Session 4831-2020
Developing Credit Scores with Telco Data Using Machine Learning and Agile Methodology
The mobile phone market in Brazil is one of the largest in the world, and this huge volume is concentrated in prepaid lines. This distribution reflects a population with lower incomes, lower credit profiles, and lower rates of bank account ownership. However, in recent years the prepaid market has been falling while the postpaid market has grown rapidly, driven by the increase in customer data usage. This scenario makes selecting customers for upgrade a key point in the strategy of telecommunications companies in Brazil.
Luciano Diettrich, Fábio de Souza, and André Guerreiro; Claro Brazil
Session 4296-2020
Diagnosing the Most Common SAS® Viya® Performance Problems
Applications and the scalable computing environments in which they run have grown in complexity with more advanced technologies. With the mixture of virtual machine, cloud, and emerging container environments, diagnosing the causes of performance issues can be difficult. Relying on significant experience from the SAS® Performance Lab, this paper presents the most common SAS® Viya® performance problems and methods for diagnosing and correcting them.
Jim Kuell, SAS Institute
Session 5173-2020
Digital Trust Index
Every six months, the Digital Confidence Indicator (DCI) measures the intention of use of technological devices. The current measurement brings a consolidation of views already perceived in previous ones, pointing out that, apparently, the better people's understanding of the digital environment, the greater is their perception that digital devices are both good and evil. There is a perception, for example, that at the same time as new technologies bring anguish, they also create countless facilities, becoming indispensable in everyday life. The last version pointed to a general decrease of the optimism regarding technology-the smallest number ever seen on the DCI. People are optimistic, but less so than they were before. Further evaluation shows that this sentiment is possibly a reflection of political polarization and of discussions that took place through digital devices. Hacking, social engineering, and lack of privacy in general also contributed to the decrease in optimism. The younger members of the population have shown a greater understanding of what the use of such devices means and what they represent. This audience has the lowest DCI. Yet again, people above 65 years old continue to be the most optimistic group, crediting technology with minimizing the limitations that age imposes, in addition to recognizing the facilitation of relationships with groups of friends and family through social networks. All of these insights were gained by using SAS®.
Andre L. Miceli, Infobase
Session 4881-2020
Do More with Less: Programming Efficiently with Keyboard Macros
Time spent searching through previous programs for recycled code can be repurposed through the utilization of keyboard abbreviations. Keyboard abbreviations are a means of creating keystroke shortcuts or abbreviations that tie directly to a curated library of code. This library can then be distributed through the use of keyboard macro files, enabling an increase in coding efficiency among teams.
Breanne Young, Life Time Inc.
Session 4907-2020
Don't Just Survive, Thrive with This Multi-Stratified Cox Proportional Hazards Model Macro
Survival analysis is a commonly used set of techniques for applied data analysis where the outcome variable is the time until an event. One of the most frequently used techniques for modeling this type of data is the Cox proportional hazards model, which can be implemented in SAS® with the PHREG procedure. This model assumes that the ratio of hazards for any two individuals is constant over time, that is, proportional. If this assumption is not met for a particular covariate, stratification is one method to still include and control for its effects in the model. It then must be decided whether there is an interaction between the stratum levels and the predictors in the model. However, it can be cumbersome to manually code the interactions with all levels of strata in SAS when multiple stratum variables are involved. The new and improved lrt_strat_cox_ph macro simplifies this procedure by allowing for multiple stratum variables in its strata_vars parameter. The new feature makes the macro a powerful tool in applied survival analysis. This paper discusses the use of multiple stratifying variables and how they are implemented in SAS, and includes a practical example using the macro to tie it all together.
Katelyn J. Ware, Spectrum Health Office of Research and Education & Grand Valley State University
Paul Egeler, Spectrum Health Office of Research and Education
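The combinatorial burden the macro removes is easy to see: with several stratum variables, the number of stratum-level-by-predictor interaction terms grows multiplicatively. A hypothetical sketch (the function and argument names here are illustrative, not the macro's parameters):

```python
from itertools import product

def interaction_terms(strata_levels, predictors):
    """strata_levels: dict of stratum variable -> list of level labels.
    Returns one interaction term per stratum-level combination per predictor."""
    combos = list(product(*strata_levels.values()))
    return ["*".join(combo + (p,)) for combo in combos for p in predictors]

terms = interaction_terms({"site": ["A", "B"], "sex": ["M", "F"]}, ["age", "bmi"])
print(len(terms))  # 8: 2 sites x 2 sexes x 2 predictors
print(terms[0])    # A*M*age
```

Even this toy case produces eight terms to write out by hand; with three or four stratum variables the count quickly reaches dozens, which is exactly the bookkeeping a macro can automate.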
Session 5092-2020
Don't Overwrite Me! A SAS® Macro to Identify Variables That Exist in More Than One Data Set
In the DATA step, merging data sets with common variables that are not included as BY variables can yield undesirable results. Specifically, the value of a common variable can be overwritten with an incorrect value. To prevent this from happening, you must ensure that the variable is read from only one master data set, by either dropping or renaming the variable in the other data sets. When working with data sets with just a few variables, you can quickly check which variables appear in more than one data set. However, as the number of data sets and variables increases, the chance of missing a common variable also increases. The SAS® macro CHECK_VAR_EXIST was written to identify variables that exist in more than one data set more efficiently and accurately. The macro prints all common variables, which data sets they appear in, and other pertinent information. You can then use the list to drop or rename variables where they are not relevant, thereby reducing the chance of unintentionally overwriting a large number of variables.
Andrea Barbo, Yale New Haven Health Services Corporation/Center for Outcomes Research and Evaluation (CORE)
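A Python analogue makes the idea behind CHECK_VAR_EXIST concrete (illustrative only; the real macro works on SAS data set metadata):

```python
from collections import defaultdict

def common_variables(datasets):
    """datasets: dict of data-set name -> iterable of variable names.
    Returns {variable: sorted data sets containing it} for every variable
    that appears in more than one data set."""
    seen = defaultdict(list)
    for name, variables in datasets.items():
        for var in set(variables):          # de-duplicate within a data set
            seen[var].append(name)
    return {v: sorted(ds) for v, ds in seen.items() if len(ds) > 1}

overlap = common_variables({
    "demog":  ["id", "age", "sex"],
    "labs":   ["id", "age", "hgb"],
    "visits": ["id", "visit_dt"],
})
print(sorted(overlap))  # ['age', 'id']
```

Any variable in the returned map that is not a BY variable is a candidate for dropping or renaming before the merge.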
Session 5178-2020
Dynamic Programming With The SAS® Hash Object
The original impetus behind the development of the SAS hash object was to provide SAS programmers with a fast and simple-to-use memory-resident storage and retrieval medium that could be used for matching data sets via table lookup. However, the forward-looking SAS R&D team ended up with a product that goes far beyond this simple functionality by creating the first truly run-time dynamic SAS data structure. At the basic level, its dynamic nature is manifested by the ability of a hash object table to grow and shrink at run time, ridding the programmer of the need to know its size in advance. At a more advanced level, the very structure and behavior of a hash table can be affected at run time via argument tags that accept general SAS expressions. Finally, a hash object can be made to store pointers to other hash objects and used to manipulate them as pieces of data. Combined, these features entail possibilities for dynamic programming unfathomed in the pre-hash SAS era. This paper is an attempt to illustrate, via a series of examples, at least some of these possibilities and hopefully sway some programmers away from hard-coding habits by showing what the SAS hash object can offer in terms of truly dynamic DATA step programming.
Paul Dorfman, Independent Consultant
Don Henderson, Henderson Consulting
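For readers unfamiliar with hash semantics, a Python dictionary is a reasonable mental model for the two behaviors described: run-time growth without a declared size, and tables that store references to other tables. This analogue is conceptual only and is not SAS syntax:

```python
lookup = {}                      # grows and shrinks at run time; no size declared
lookup["NC"] = "North Carolina"  # analogous to the hash object's add() method
lookup.pop("NC")                 # analogous to remove()

# A "hash of hashes": one table storing pointers to per-group tables,
# created on demand as the data stream is read.
by_region = {}
for region, key, value in [("east", 1, 10), ("west", 2, 20), ("east", 3, 30)]:
    by_region.setdefault(region, {})[key] = value

print(by_region["east"])  # {1: 10, 3: 30}
```

In the SAS hash object, the same pattern is achieved by declaring hash objects whose data portion holds references to other hash objects, which is what enables the data-driven partitioning techniques the paper demonstrates.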
Session 4090-2020
Dynamic Programming: Using Files to Change the Behavior of Your Program
So many times we have to create code to handle changes from our suppliers or customers. Usually this requires minor changes to code we already have. If you find yourself in this situation, think about using a configuration file to handle these small changes. Once this is in place, these small changes are easy to accommodate. You can pull in a CSV file of your suppliers' business rules, and you can have a configuration file for each customer's preferences. As you get accustomed to this paradigm, programming changes will become less and less necessary.
Bryan K. Merrell, Merrick Bank
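One way to realize the configuration-file idea (the file layout and field names here are invented for illustration): read per-customer preferences from a CSV file and let the data, not the code, drive the program's behavior.

```python
import csv
import io

# Illustrative configuration: one row per customer, read at startup.
CONFIG_CSV = """customer,delimiter,date_format
acme,|,%Y-%m-%d
globex,;,%d/%m/%Y
"""

def load_config(text):
    """Parse the configuration CSV into a dict keyed by customer name."""
    return {row["customer"]: row for row in csv.DictReader(io.StringIO(text))}

config = load_config(CONFIG_CSV)
print(config["acme"]["delimiter"])  # |
```

Onboarding a new customer then means adding one row to the file; the program itself does not change.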
E
Session 5205-2020
Economic Impacts of Sea Level Rise on Coastal Real Estate
As a result of the industrialized global economy, an increase of carbon and methane gas concentrations in the atmosphere has contributed to an overall warming climate and, in turn, global sea level rise. Such a change has vast implications for coastal communities. This study models the potential loss in value that coastal properties in New Hanover County, North Carolina, will experience due to sea level rise. The methods used were linear regression, to predict the future value of properties, and ArcGIS analysis, to find the parcels affected by different increments of sea level rise. This analysis could be used to inform coastal residents about the potential loss in value of their property and what to expect in future years.
Austin Willoughby, Emma Delehanty, and Victoria Mullin, University of North Carolina at Wilmington
Session 4857-2020
Efficient Release Management with SAS®
Release management is a process for managing different software versions between environments. This paper examines how you can improve the quality, speed, and efficiency of building or updating software and have a reliable release management process. By creating a web application with SAS® Stored Processes as the back end, we can easily create, export, import, and promote releases through the different environments in an automated and highly configurable dynamic way. Different content can be included in the release such as SAS® metadata, configuration files and tables, SAS macros, migration jobs, and so on.
Lars Tynelius, Infotrek
Session 4326-2020
Empower and Inspire - Designing Reports for Mobile Experiences
What is the impact of mobile on business intelligence and reporting? According to Gartner, approximately one third of business intelligence users consume reports, dashboards, and KPIs on mobile devices. Since this content is being viewed on non-traditional devices with less screen real estate, dashboards and reports must be designed to deliver information in an easy and consumable manner. Users expect content to look great regardless of the mobile device screen size, orientation, or functionality. Hence, mobile reports need to be optimized to display data effectively so that users can easily access, interact with, and modify visualizations. SAS® Visual Analytics Apps empower users with reports and dashboards anywhere and on any device. In this paper, you will learn design and usability best practices informed by user experience research and cognitive science to help create beautiful reports that guide interaction in SAS Visual Analytics Apps. This paper presents tips for how to reduce information density with space-saving features, as well as do's and don'ts for selecting objects, fonts, and colors that enable complex analytics to be processed quickly. Come learn how to create effective reports that scale from your desktop to your mobile device screen.
Khaliah H. Cothran, Ph.D., SAS Institute
Session 4602-2020
Enabling Real-Time Stability Monitoring using SAS® Viya® and SAS® Event Stream Processing
Today with the Internet of Things (IoT), devices generate vast amounts of data that can be used to derive insights into the health and operation of an asset. To realize value from this data, analytics are needed in order to provide context for the state of the asset. In industrial applications, one such type of analytic is stability monitoring. These models provide an intelligent way to track the ongoing health of an asset where the metric of interest is checked against the bounds generated by the model. Automating such models to function in real time delivers immediate business value beyond detecting abnormal behavior because, in many cases, they provide the first step in identifying an invaluable window of opportunity to course-correct in mission-critical situations. To that end, these models need to be deployed against streaming data where they can continuously analyze the data. Once abnormalities that merit actions are identified, these alerts need to be raised to a person who can take action. Supporting evidence of the data related to the alarm is also captured to confirm that this is a real issue and what action needs to be taken.
Sanjeev Heda, Sathish Gangichetty, and Steve Sparano, SAS Institute
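The bounds-checking step described above can be sketched as a sliding-window monitor; the window size and the mean-plus-three-sigma rule here are illustrative choices, not the models used in the paper.

```python
from collections import deque
from statistics import mean, stdev

class StabilityMonitor:
    """Flag a streaming reading that falls outside bounds learned from
    recent history (mean +/- k standard deviations over a sliding window)."""

    def __init__(self, window=20, k=3.0):
        self.history = deque(maxlen=window)
        self.k = k

    def check(self, value):
        """Return True (alert) if value violates the current bounds."""
        alert = False
        if len(self.history) >= 2:          # need history before judging
            m, s = mean(self.history), stdev(self.history)
            alert = abs(value - m) > self.k * s
        self.history.append(value)
        return alert

mon = StabilityMonitor(window=5)
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0]
print([mon.check(r) for r in readings])  # [False, False, False, False, False, True]
```

Deployed in an event stream processing engine, each incoming measurement would pass through a check like this, with only the alert events escalated to an operator.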
Session 4724-2020
Enhance Customer Risk Scoring by Quantifying the Opportunity to Commit Financial Crime
Analytical scoring models help organizations quantify risk and make appropriate data-driven decisions across many areas. Fraud and financial crime related scoring models consist of complex formulas and mathematical equations designed to measure an outcome or target activity using multiple inputs and dimensions. Traditional scoring models measure commonly accepted dimensions across known activity, demographics and product behavior but often do not consider the means and opportunity for a criminal to transact through other entities or networks of bad actors. This paper will present a methodology and propensity modeling approach which enhances financial crime risk scoring models by analyzing the framework and infrastructure available to commit financial crime.
Steve Overton, 6 Degree Intelligence
Session 4917-2020
Enhancing Academic Teaching of Data Science with SAS® Viya® for Learners
SAS® Viya® for Learners offers many advanced tools for studying data science. I found the fast cloud suite especially helpful to overcome obstacles in academic teaching that hinder novice learners. As a newcomer to data science lacking strong statistics knowledge, it was difficult to promptly read large textbooks or absorb in-depth knowledge from slides while in school, with limited hours for consultations with lecturers and complicated software. My paper discusses the features of SAS Viya for Learners that helped me to overcome these obstacles: interactive courses in statistics, programming, and advanced analytics, videos, summaries, examples, practices, case studies, papers, documentation, online chat with educational experts, webinars, and communities. The advantages of SAS Viya for Learners over IBM SPSS® Modeler and R made it much easier to comprehend machine learning and appreciate the entire analytics life cycle thanks to Model Studio and other solutions. I found it enabled students' collaboration, combining well-structured visual analytics workflow with SAS® and open-source code, parameters that are easy to set up, automated hyperparameter tuning, assessment visualizations, and best model selection. By including SAS Viya for Learners in academic curriculums, I strongly feel that educators could teach more creatively to speed up students' deep data science comprehension, providing better-prepared professionals to bridge the gap in data science skills. Join me for my presentation to find out more.
Stefan Dimitrov Stoyanov, University of Surrey
Session 4408-2020
Essential Performance Tips for SAS® Visual Analytics
Troubleshooting performance-related issues across distributed systems is a challenging task, and it requires a step-by-step approach to identify the cause of the bottleneck. When it comes to performance, there are a myriad of factors involved, with each one contributing in its own way to the overall problem. Because SAS® Viya® is a broad system with many distributed layers, the variance and complexity is what makes the problem hard to solve, and often causes approaches to fail on given architectures. This paper takes a holistic look at SAS Viya and explores the various processes and methodology involved in diagnosing performance problems with SAS® Visual Analytics. Using a customer use case, we will demonstrate to you the diagnostic steps, forensic process, and the tools and methods involved in this process. In order to get the best user experience and optimum performance with SAS Visual Analytics, thought and effort should be put into architecting the application and the underlying infrastructure layers. The various components of the system should be built to be highly responsive on all the different components of the application service. As if peeling an onion, we'll examine each layer of SAS Visual Analytics until we arrive at its core. Each layer has a rich set of performance-related information to offer along the way, and we promise that it won't bring tears to your eyes.
Meera Venkataramani, SAS Institute
Session 5079-2020
Estimating Jump Diffusion and VaR Models of REITs Returns Using High-Frequency Data in SAS®
This paper examines the roles of jumps in the time series of Real Estate Investment Trust (REIT) returns. Using realized variance, bipower variation, and high-frequency REITs data, the paper shows how SAS® programming can be used to estimate the intensity and magnitude of jumps in the returns and the volatility of the returns of a portfolio of REITs securities. The paper also shows how value at risk (VaR) can be estimated for a portfolio of REITs securities when jumps or discontinuity exist in the data. The forecasting accuracy of jump augmented VaR specifications is also compared with other VaR specifications, including those generated from some built-in procedures in SAS, such as the SEVERITY and SURVEYSELECT procedures.
Tunde Odusami, PhD, CFA, Widener University
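The jump-detection building blocks named above have compact standard forms: realized variance RV = sum(r_i^2), bipower variation BV = (pi/2) * sum(|r_i| * |r_{i-1}|), with the jump component taken as max(RV - BV, 0), since BV is robust to jumps while RV is not. A sketch on made-up intraday returns:

```python
from math import pi

def realized_variance(returns):
    """Sum of squared intraday returns."""
    return sum(r * r for r in returns)

def bipower_variation(returns):
    """Jump-robust variation: scaled sum of products of adjacent |returns|."""
    return (pi / 2) * sum(abs(a) * abs(b) for a, b in zip(returns, returns[1:]))

def jump_component(returns):
    """Nonnegative difference RV - BV attributes excess variation to jumps."""
    return max(realized_variance(returns) - bipower_variation(returns), 0.0)

smooth = [0.01, -0.01, 0.01, -0.01]
with_jump = [0.01, -0.01, 0.10, -0.01]   # one large intraday move
print(jump_component(smooth) < jump_component(with_jump))  # True
```

In practice these estimators are computed day by day over high-frequency returns, and a formal test statistic decides whether the RV-BV gap is large enough to declare a jump.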
Session 4739-2020
Estimating Unknown Change Points and Variation Using SAS
A change point (also called a knot, joint, or turning point) is defined as "the time when development switches from one phase to another". The piecewise growth curve model (PGCM) is often used to estimate the underlying growth process. When fitting a PGCM, the conventional practice is to specify the change points a priori. However, the change points are often unknown, and misspecification of the turning points can bias the growth trait estimates. There is also individual variation in the change points. To estimate individual-specific change points, several estimation methods (e.g., profile likelihood, first-order Taylor expansion, and Bayesian estimation) have been proposed, and some R packages have been developed to estimate unknown change points. In SAS, the NLMIXED procedure is used to fit nonlinear random effects models and can be used to estimate change points. We present PGCMs that allow individual-specific change points as a function of a time-varying predictor. We illustrate these models with an empirical example to demonstrate the use of SAS in estimating unknown change points and nonlinear growth curve models. The implications of and challenges in fitting these models are discussed.
Depeng Jiang and Lin Yan, University of Manitoba
Zhongyuan Zhang, University of Toronto
Shaoping Jiang, Yunnan Minzi University
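The profile-likelihood idea can be illustrated with a least-squares profile search: fit separate lines on either side of each candidate change point and keep the candidate with the smallest total residual sum of squares. The data below are synthetic with a true knot at t=5; the paper's PGCMs with random effects are far richer than this sketch.

```python
def fit_line_sse(ts, ys):
    """Residual sum of squares of the least-squares line through (ts, ys)."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    sxx = sum((t - mt) ** 2 for t in ts)
    b = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sxx if sxx else 0.0
    a = my - b * mt
    return sum((y - (a + b * t)) ** 2 for t, y in zip(ts, ys))

def profile_change_point(ts, ys, min_seg=3):
    """Candidate change point minimizing total SSE of two separate line fits."""
    best = None
    for i in range(min_seg, len(ts) - min_seg + 1):
        sse = fit_line_sse(ts[:i], ys[:i]) + fit_line_sse(ts[i:], ys[i:])
        if best is None or sse < best[1]:
            best = (ts[i], sse)
    return best[0]

# Synthetic growth data: slope 1 until t=5, then flat.
ts = list(range(11))
ys = [min(t, 5) for t in ts]
print(profile_change_point(ts, ys))  # 5
```

PROC NLMIXED generalizes this by treating the change point as a model parameter (possibly varying by individual) and maximizing the likelihood directly rather than scanning a grid.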
Session 5135-2020
Evolutionary Feature Selection for Machine Learning
We propose an evolutionary feature selection technique for the machine learning predictive modeling task, involving two conflicting goals of minimizing the number of features and maximizing the prediction accuracy of the applied machine learning algorithm, in a multi-objective Pareto-based dominance form. The evolutionary feature selection approach involves the steps of population initialization, crossover, mutation, and selection, based on a genetic algorithm mimicking the natural evolutionary process. The machine learning problem is thereby defined as a multi-objective optimization model involving the simultaneous optimization of the given objectives, producing a set of optimal solutions, called the Pareto set, where each solution of this set has a different trade-off between the two objectives. We compare accuracy and run times across different feature selection approaches on real-life datasets to show how the proposed evolutionary multi-objective feature selection approach outperforms the rest, along with theoretical justification based on combinatorics and optimization.
Nandini Rakala, Munevver Mine Subasi, and Ersoy Subasi, Florida Institute of Technology
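The Pareto selection step can be stated directly in code. With objectives (number of features, minimized; accuracy, maximized), a solution survives unless another solution is at least as good on both objectives and strictly better on one. The candidate tuples below are invented for illustration.

```python
def dominates(a, b):
    """a, b are (n_features, accuracy): minimize the first, maximize the second."""
    return (a[0] <= b[0] and a[1] >= b[1]) and (a[0] < b[0] or a[1] > b[1])

def pareto_front(solutions):
    """Keep every solution not dominated by any other."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

candidates = [(5, 0.90), (8, 0.92), (5, 0.85), (12, 0.91), (3, 0.80)]
print(sorted(pareto_front(candidates)))  # [(3, 0.8), (5, 0.9), (8, 0.92)]
```

In the genetic algorithm, this dominance filter is what drives selection each generation, so the population converges toward the trade-off curve rather than a single compromise solution.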
Session 4947-2020
Evolve Your SAS® Administration Skills with SAS Viya®
Released in 2016, SAS Viya is the newest evolution of the SAS Platform, offering the latest high-performance analytics. SAS Viya brings a fundamental change to the SAS Platform, which has led to changes in how the platform is administered. To an experienced SAS 9 administrator, the thought of having to deal with these changes can seem daunting, especially after learning that there is no metadata server or SAS Management Console in SAS Viya. Fortunately, SAS Viya has been designed with the administrator in mind. A single web administration user interface, SAS Environment Manager (an HTML5 web application), centralizes the administration tasks. This paper provides examples of common administration tasks in SAS Viya by relating them to the corresponding tasks in SAS 9. By comparing these tasks, administrators familiar with SAS 9 will not only gain confidence with SAS Viya but will also see the benefits that SAS Viya can offer them. The intended audience for this paper is SAS 9 administrators who are interested in learning about SAS Viya 3.4 on both Microsoft Windows and Linux operating systems.
Jack McGuire, Amadeus Software Ltd.
Session 5085-2020
Examining the Impact of Discussion Activities on Student Understanding in Introductory Statistics
This study aims to examine the impact that voluntary participation in online discussion activities has on students' understanding of statistical concepts in an undergraduate statistics course. A study of 90 undergraduate students enrolled in an introductory statistics course was conducted. The Levels of Conceptual Understanding in Statistics (LOCUS) assessment was utilized to measure students' conceptual understanding in statistics. Form 1 of the 23-question Intermediate/Advanced online version of LOCUS was administered as a pre-test at the start of the 16-week course. Form 2 of the 23-question Intermediate/Advanced online version of LOCUS was utilized as the post-test after completion of the course. A statistical analysis of the difference between pre- and post-test data was completed in SAS® using propensity score matching techniques.
Rachael Becker, Southern Methodist University
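The matching step can be sketched as greedy 1:1 nearest-neighbor matching on the propensity score within a caliper (the scores and caliper below are illustrative; the study's SAS implementation may differ):

```python
def greedy_match(treated, control, caliper=0.05):
    """treated/control: dicts of subject id -> propensity score.
    Greedily pair each treated subject with the nearest unused control
    whose score lies within the caliper; return the matched pairs."""
    pairs, available = [], dict(control)
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]             # each control is used at most once
    return pairs

treated = {"t1": 0.30, "t2": 0.70}
control = {"c1": 0.32, "c2": 0.27, "c3": 0.90}
print(greedy_match(treated, control))  # [('t1', 'c1')]
```

Here t2 goes unmatched because no control score falls within the caliper, which is the usual price of caliper matching: better balance at the cost of discarding some subjects.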
Session 6007-2020
Expanding SAS® Grid Manager for Platform: Lessons from the Field
We live in a rapidly changing world. It is no longer one size fits all, and IT environments are being forced to adapt to accommodate a plethora of analytical tools and to be more agile, while trying to avoid creating application silos. SAS® Grid Manager for Platform provides a powerful multi-tenant computing environment that provides high availability and accelerates processing for analytical workloads, and is in turn built on the IBM Spectrum LSF family of products. This same underlying technology manages the Summit (#1), Sierra (#2), Lassen (#10), and PangeaIII (#11) supercomputers on the Top500 list, and the extensive compute grids of many of the world's largest semiconductor, automotive, and financial companies. To address these changing needs, IBM has developed the Suite for High Performance Analytics, enabling SAS Grid Manager for Platform to be combined with other analytical frameworks such as Python, R, Julia, Stata, Matlab, and Dask. It surfaces powerful capabilities in areas such as scheduling policies that ensure critical SLAs are met; comprehensive support for GPGPUs; integral container support for Docker, Singularity, Shifter, and Kubernetes; and hybrid cloud for on-demand bursting capacity with data management. These will be discussed using client examples from use cases where SAS Grid Manager for Platform was extended because of shifts in how IT resources are being consumed or in business unit analytics workload needs.
Bill McMillan and Qingda Want, IBM
Session 5193-2020
Expedite Analytics Delivery and Empowerment with a Common Analytics Base Table
One of the most time consuming and challenging aspects of an analytics project is the data preparation, and with each new project, that data preparation effort often starts at ground zero once again. It doesn't always have to be that way! Learn from how the data and analytics organization at SAS has taken the concept of an Analytical Base Table (ABT) to the next level by creating what we call a Common ABT. This Common ABT can be used to jumpstart many different analytics projects, significantly reducing the time to delivery while also creating the foundation for analytics empowerment. This technique will help you shift out of the reactive approach to your data preparation and get more proactive and strategic. During this session, we talk about both the technical aspects of the Common ABT approach as well as process-related and people-related recommendations to set your organization up to be successful.
Jennifer L. Nenadic, SAS Institute
Session 4463-2020
Explanatory Machine Learning Model Plots for Epidemiological and Real-World Evidence Studies
For real-world evidence (RWE) and epidemiological studies, comparative effectiveness research (CER) provides actionable information for drug regulatory and health care decision-making. CER relies on white box statistical and machine learning (ML) models for estimating treatment effect and drug safety evaluation. Black box ML models are also powerful for generating better predictions, but how to interpret the model results is not straightforward. How should ML model results be presented to regulators to assure them that the results are accurate, fair, and unbiased? How should ML model findings be presented to end users to overcome the stigma of black box model bias? We provide a standardized interpretability plots framework for evaluating and explaining patient-level ML models using observational health-care data. The paper shows how to use SAS® Cloud Analytic Services action sets and model-agnostic interpretability tools available in SAS® Visual Data Mining and Machine Learning to explain the functional relationship between model features and the target outcome variable. In addition to partial dependence and individual conditional expectation plots, we present use cases and example code demonstrating how plots representing interactions between time-varying and non-time-dependent variables, cohort-period-feature effects, gender-age group stratification, beneficial and untoward effects of drug exposure, drug-disease interactions, and confounding-by-indication effects can be used to explain ML model results.
David Olaleye, SAS Institute
Session 4649-2020
Exploring DataOps in the Brave New World of Agile and Cloud Delivery
DataOps is the new black, providing a combination of DevOps and agile practices to automate the creation and management of data, analytics, and visualization on cloud platforms. DataOps is revolutionizing the way we deploy and manage data platforms, and SAS® Viya® is embracing DataOps within its core. In this session, I outline some of the key components required to adopt the new DataOps way of working and provide examples of the benefits each component will deliver. If you want to understand what DataOps is and how you should leverage it to build and manage better data platforms and processes, then this is the session for you.
Shane Gibson, PitchBlack
Session 4809-2020
Exploring Online Drug Reviews using Text Analytics, Sentiment Analysis, and Data Mining Models
Drug reviews play a very significant role in providing crucial medical care information for both healthcare professionals and consumers. Customers are using online review sites to voice opinions and express sentiments about the drugs they have used. However, a potential buyer typically finds it very hard to go through all the comments before making a purchase decision. Another big challenge is the unstructured and textual nature of the reviews, which makes it difficult for readers to classify comments into meaningful insights. For these reasons, this paper primarily aims to classify the side effect level and effectiveness level of prescribed drugs by using text analytics and predictive models within SAS® Enterprise Miner. Additionally, the paper explores the specific effectiveness and potential side effects of each prescription drug through sentiment analysis and text mining within SAS® Visual Text Analytics. The study's results show that the best-performing model for side effect level classification is the rule-based model, with a validation misclassification rate of 27.1%. For effectiveness level classification, the text rule builder model also works best, with a 22.4% validation misclassification rate. These models are further validated by using a transfer learning algorithm to evaluate performance and generalization. The results can be used to develop practical guidelines and useful references that help prospective patients make better-informed purchase decisions.
Thu Dinh, Goutam Chakraborty, and Miriam McGaugh, Oklahoma State University
Session 4330-2020
Expressions in Graph Template Language and Other Tips
This paper is a reprise of the SAS® Global Forum 2013 paper entitled "Free Expressions and Other GTL Tips". SAS® Graph Template Language (GTL) provides many powerful features for creating versatile graphs. The statistical graphics capability in GTL lets you use expressions directly in the templates in order to simplify data preparation. This presentation covers some ways of using DATA step functions to create grouped plots based on conditions and to select a subset of the observations. It also illustrates how to use the SAS Function Compiler (FCMP) procedure functions in GTL expressions. Tips on using non-breaking spaces for creating "chunked" graphs and indenting text are discussed. Workarounds for rendering Unicode characters using data column variables are also discussed.
Prashant Hebbar, SAS Institute
F
Session 5039-2020
Face Recognition Using SAS® Viya®: Guess Who the Person Is!
Humans can take a look at an image and instantly recognize the object in the image, the person in the image, or the location of the photo. The human cognitive system is fast and reliable, allowing people to perform very complex tasks like driving or operating a machine with little conscious thought. For a computer, performing these tasks would be very tough. But fast, accurate, and reliable algorithms could enable computers to drive cars with sensors or enable them to recognize humans, operate machines, and even perform surgeries. The digital universe is expected to reach 44 zettabytes by 2020 because of the growth of the Internet of Things (IoT). This shows us the massive opportunity we have in terms of digital content analytics. Facial recognition and classification algorithms like deep learning and neural networks can extract information from photos or videos and classify them almost instantaneously after they are posted online. In this project, the remaining image for each person was used for validation. Several models were designed using deep learning techniques like deep fully connected neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN). All the work was done using SAS® DLPy from a Jupyter notebook connected to SAS® Viya®. Through this project, the objective of automatically detecting who the celebrity is was achieved. This approach can be further used to segregate the images into different folders.
Pratyush Dash, Oklahoma State University
Anvesh Reddy Minukuri, Comcast
Session 4981-2020
Fast Deployments and Cost Reductions: SAS® in the Azure Cloud with HDInsight and the Azure Data Lake
Cost reduction is often a key objective of businesses, but reducing costs without compromising on performance can be a challenge. The objective of this project was to reduce costs while improving performance. We found that combining SAS® with Microsoft Azure Data Lake Storage Gen2 could be part of the solution. It encompasses the power of a file system that is compatible with Apache Hadoop, with an integrated hierarchical namespace, along with the massive scale and economy of Microsoft Azure Blob storage. With the use of SAS® Grid Manager for Platform, our Kerberized platform talks to multiple Microsoft Azure HDInsight clusters, enabling us to distribute the different demands of users while reading from the same data lake. There are some key requirements that you need to be aware of prior to deployment, but once they are in place, the deployment time for a new SAS and Hadoop cluster is hours rather than weeks or months. The advantage of having the data physically stored separately from the Hadoop cluster is that once you have destroyed the cluster, the data is still there. In this paper, we discuss this further and explain the steps required to make the most of integrating SAS® with Azure HDInsight on Azure Data Lake Storage Gen2.
Andrew Williams, Vanquis Bank Ltd
Session 5185-2020
Finding Your Seat at the Table
Have you ever found yourself wondering how to take that next step, wishing you had more of an impact, or wanting to participate in discussions about how to move your organization's ideas forward? Do you think you warrant a seat at the strategic table? The answer is likely yes, but the opportunity and respect will need to be earned. The programming profession is dominated by technical expertise, but this alone will only advance you so far. Leadership skills must be learned and applied to make a broader impact within your organization. There are many definitions of leadership, but most are variations of a common theme. We view leadership as the ability to influence others to take specific action when they have the freedom to choose to do otherwise. This fundamentally differs from management. Many leadership principles are well established and are not industry specific. Our technical training rarely emphasized or even taught these principles. Further, many of us are introverts and may not feel that the publicly visible skills required for leadership come easily. These are not prohibitive factors. Leadership principles can be learned and practiced every day, but a strategy must be developed; your efforts must be intentional. In this paper, we apply leadership and acumen principles to our programming profession to help you transition from an individual contributor to an organizational contributor and find your seat at the table.
William Coar, Axio Research - A Cytel Company
Session 4745-2020
Fitting Statistical Models with PROC MCMC
Bayesian inference, in particular Markov chain Monte Carlo (MCMC), is one of the most important statistical tools for analysis. Although there is free access to many powerful statistical software tools for Bayesian analysis, it is still challenging both to learn and to apply to real-life research. SAS® provides many procedures for Bayesian analysis that make it much easier to use, particularly for SAS users. This presentation demonstrates various examples, from 'one-sample proportion', 'two-sample proportion', and 'two-sample t-test' to more advanced models via Bayesian analysis. The results are compared with non-Bayesian models. Many real-life examples in medicine, clinical trials, and meta-analysis are given.
Mark Ghamsary, Keiji Oda, and Larry Beeson, Loma Linda University
Session 4737-2020
Five Simple Ways to Know If Variables in a Table Are All Missing
Before doing data analysis, we sometimes need to know whether certain variables in a table are entirely missing, so that these all-missing variables can be dropped. This paper introduces five simple ways to achieve this goal. Unlike a DATA step approach, which requires considerably more code, each of these five methods takes only a few SAS® statements.
Xia Ke Shan, iFRE Inc.
Kurt Bremser, Allianz Technology Austria
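The kind of check the paper describes can be illustrated with a short sketch (not necessarily one of the paper's five ways; the data set and variable names here are hypothetical). In PROC SQL, COUNT(column) counts non-missing values for both numeric and character variables, so a zero count means the variable is entirely missing:

```sas
/* Hedged sketch: count non-missing values per variable.
   mylib.mytable and the variable names are made up for illustration. */
proc sql;
   select count(age)    as n_age,
          count(income) as n_income,
          count(city)   as n_city
   from mylib.mytable;
quit;
```

Any variable whose count comes back as zero can then be listed in a DROP statement (or DROP= data set option) when the table is next read.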
Session 4785-2020
Forecasting Healthcare Statistics at the Cleveland Clinic
A thorough understanding of key volumes is of utmost importance to effectively manage a hospital system. As the Cleveland Clinic's analytical culture matures, we see a shift from reactive to proactive decision making. This shift in emphasis calls for the increased application of statistical forecast models inserted into stakeholder workflows and at the fingertips of leaders and key decision makers. However, model development and deployment are not enough. Active end-user engagement and buy-in are critical in order for forecasting methods to be accepted and used to make decisions. This requires consistent interaction between our data scientists and end users beyond just the modeling and validation process. This presentation covers the Cleveland Clinic's complete journey from model inception through end-user acceptance and adoption for weekly business management. We discuss forecast model building; the application of Monte Carlo simulations and programming; output visualization; and the challenges of achieving end-user adoption.
Michael Bromley, Michael Lewis, Colleen Houlahan, and Eric Hixson, Cleveland Clinic
Session 4743-2020
Forecasting Hourly Electricity Prices
Forecasting power prices can be more of an art than a science. Locational Marginal Prices (LMPs) exhibit high volatility, a phenomenon known as volatility clustering, as with many financial time series. This paper does not discuss the mean-reverting properties of power prices and assumes the reader has a basic understanding of the autoregressive nature of electricity prices. The method forecasts LMPs by preserving past volatility in the form of ratios and assigning them to the corresponding load in the forecast. The purpose is to "shape" the forecasted series by using historical profiles while holding to supply and demand fundamentals: high electricity usage tends to be associated with higher energy prices.
Joseph Perez, American Electric Power
Session 4672-2020
From AI to XI: It all starts with 'HI'
"HI" is a universal way to greet family, friends, and acquaintances. It is a common word we say many times every day. "HI" can also be very powerful in the context of human intelligence. Humans have always been the core for delivering intelligence, and we as humans provide the intelligence that feeds many XI platforms such as business intelligence, data intelligence, customer intelligence, machine intelligence, artificial intelligence, and augmented intelligence. Humans have the brain power that is the platform for providing pervasive intelligence.
Tho H. Nguyen, Teradata
Session 4658-2020
From Device Text Data to a Quality Data Set
Data quality in research is important. Data from a device may need to be used in a research project. Often the data is read manually from an external file and entered onto a case report form (CRF); then it is manually read from the CRF and entered into a database. This process introduces many opportunities for data quality to be compromised. The quality of device data used in a study can be greatly improved if the data can be read from the device's output file directly into a data set. If the device outputs results into a file that can be saved electronically, SAS® can be used to read the needed data from the results and save it directly into a data set. Quite often, device data is saved in separate files per subject, and it can be difficult to import each separate file into SAS without great effort. If the data is organized with the subject ID as the folder name and each subject's data in the corresponding folder, SAS can also be used to read the data from a general location, importing all data within each location. In addition to improving data quality, data collection and monitoring time can be reduced by taking advantage of these electronic files instead of recapturing the data on a CRF.
Laurie Smith, Cincinnati Children's Hospital Medical Center
Session 4747-2020
From Santiago to the Spirit Lake Nation: 30 Things I Learned in 30 Years as a Statistical Consultant
All consultants face a few common problems: getting clients, getting the data, getting to know the data, fixing data problems, finally getting to the statistical analysis, and communicating the results to an audience. After getting paid, communication is the most important part of the process. If your client doesn't understand your analysis, the findings aren't going to be applied and you're not going to get repeat business. Fortunately, SAS® procedures from statistical graphics to PROC FORMAT ease the translation of results, leaving both you and the client happy. A major difference between mathematical statistics and statistical consulting is the messiness of the data. The ability to recognize and rectify data problems is a major factor in success as a consultant. Fortunately, again, SAS offers a wealth of possibilities for testing and improving data quality. Your clients count on you to have performed the right statistical procedures in the right way. SAS offers an enormous range of sophisticated procedures, but beware of the simple things that can trip you up.
AnnMaria De Mars, The Julia Group
Session 4662-2020
From Silos to Ecosystem - Evolution of Analytical Platforms at Elisa Corporation
SAS at Elisa was originally a standalone product where metadata, data, and users were siloed in three separate environments for SAS Enterprise Guide, SAS Enterprise Miner, and SAS Visual Analytics. Data tasks were manually executed, and the data produced by SAS was not available to other environments. The open-source team, relying on R and Python, was not able to use the SAS data sets directly. In late 2018, as part of the enterprise BI architectural strategy, Elisa ventured to build a new SAS Analytics & Data Ecosystem: a collection of infrastructure, analytics, and applications used to capture and analyze data. By deploying modern SAS Viya and SAS 9.4 platforms, Elisa implemented standardized approaches, automation, tools, and processes.
Prasanna Pandian, Elisa Corporation
Jarno Lindqvist, SAS Institute
G
Session 4615-2020
Getting from Governance Practice to Data Awareness
If you want a good practice of governance over the assets your organization's employees work with, then it is vital to put good policies, procedures, measures, and rules in place to manage those assets. Good management of assets alone is not enough in the connected and information-rich world we engage in; we must strive to have good practices for people and processes to follow in order to ensure compliance. These policies and practices need to be fed and managed by good information sources and tools. These assets are typically described or managed with a system such as a catalog. The catalog enables your systems and people to collect and manage information about your information, or the metadata of your systems. The catalog enables you to add information, relate assets, assign responsibilities, and find the information necessary to uphold and implement the policies you designate in your governance practice. The practice of cataloging assets such as data, processes, analytical models, digital locations, and intellectual property gives you the ability not only to inventory the assets but, through their relationships, to gain insight into how they are connected. This connection or relationship among assets can provide you with knowledge of how assets are created, consumed, produced, processed, classified, or are influential within your systems. This knowledge or awareness of your assets leads to more insight into how relevant your information is; in other words, you have data awareness.
Chris Replogle, SAS Institute
Session 4934-2020
Getting the Time Right Across Time Zones
Suppose you enter in your digital agenda the date and time for a presentation in, say, Washington, while living in another time zone. When in Washington, you look at your agenda and wonder: in which time zone is the time displayed? The local time at home, or the current local time? These kinds of problems also occur when computers in different time zones work together, or when daylight saving time (DST) is involved.
Frank Poppe, PW Consulting
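One common remedy for the agenda problem above is to store datetimes in UTC and convert only for display. As a hedged sketch (the meeting value and time-zone IDs are made up; TZONEU2S is the SAS 9.4 function that converts a UTC datetime to local time for a given time-zone ID):

```sas
/* Hedged sketch: keep the appointment in UTC, convert per zone for display */
data _null_;
   meeting_utc = '09MAR2020:14:00:00'dt;                  /* stored as UTC */
   dc_time = tzoneu2s(meeting_utc, 'America/New_York');   /* Washington   */
   nl_time = tzoneu2s(meeting_utc, 'Europe/Amsterdam');   /* home zone    */
   put dc_time= datetime20. nl_time= datetime20.;
run;
```

Because DST transitions differ by zone, converting from a single UTC value per zone avoids the ambiguity of a "local" time whose zone is unknown.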
Session 4197-2020
Git for the SAS® Programmer: Using Source Control to Organize Your Code and Collaborate with Others
Are you tired of using elaborate comments in your code or saving multiple copies of your files to manage changes as you make them? Wish you could go back in time to that version of your program that worked? Do you live in terror of clobbering someone else’s work? Version control can help, and the front runner in the version control world is Git. Git is a free and open source distributed version control system that you can use on your own or in collaboration with others. It can also be used with a central, shared repository such as GitHub or Bitbucket. Learn Git concepts such as clone, commit, and merge, and how to execute them using the Git interfaces in SAS® Studio and SAS® Enterprise Guide® or in code using SAS® functions.
Amy Peters, Danny Zimmerman, Grace Whiteis, Joe Flynn, and Stan Polanski, SAS Institute Inc.
Session 5194-2020
Git it? Got it? Good: Migrating from SAS® to SAS® Viya® Using Git
You have established your workflows on SAS®9 but are excited about the possibilities of moving to SAS® Viya®. What does it look like to do SAS® in-memory, and in a clientless world? And what are you going to do with your current code? Let us help you get there with some ideas about how to migrate your code using the power of Git!
Cameron Estroff and Weston McManus, SAS Institute
H
Session 5204-2020
Hate Speech Classification of Social Media Posts Using Text Analysis and Machine Learning
Hate crimes are on the rise in the United States and other parts of the world. Hate speech is one tool that a person or group uses to let out feelings of bias, hatred, and prejudice towards a religion, race, ethnicity, ancestry, sexual orientation, gender, or disability, thereby spreading hatred. This paper focuses on how the text analytics capabilities of SAS® Enterprise Miner were used to develop a model that categorizes tweets based on their content, specifically hateful versus normal. After sampling and cleaning the data and breaking the tweets down into quantifiable components, different models were built and compared. The best-performing model was used to score unseen data, achieving reasonable accuracy in classification. This paper also touches upon how text analytics could be harnessed by organizations like Twitter to encourage civic responsibility in their users. By providing a feature at the user level that allows tweets to be labeled as a particular category as they are typed, users might be given an opportunity to review and possibly modify any hateful tweets before posting them.
Venkateshwarlu Konduri, Sarada Padathula, Asish Pamu and Sravani Sigadam, Oklahoma State University
Session 5035-2020
History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies
Many programming tasks require merging time series of varying frequency. For instance, you might have three data sets (YEAR, QTR, and MONTH) of data, each with eponymous frequency and sorted by common ID and date variables. Producing a monthly file with the most recent quarterly and yearly data is a hierarchical last-observation-carried-forward (LOCF) task. Or, you might have three irregular time series (ADMISSIONS, SERVICES, TESTRESULTS), from which you want to capture the latest data from each source at every date encountered (event-based LOCF). These tasks are often left poorly optimized by most SQL-based languages, in which row order is ignored in the interest of optimizing table manipulation. This presentation shows how to use conditional SET statements in the SAS® DATA step to update specific portions of the program data vector (that is, the YEAR variables or the QTR variables) to carry low-frequency data forward to multiple subsequent high-frequency records. A similar approach works just as well for carrying forward data from irregular time series. We also show how to use "sentinel variables" to control the maximum time span over which data is carried forward; that is, how to remove historical data that has become "stale." Finally, we demonstrate how to modify these techniques to carry future observations backward (NOCB), without re-sorting the data.
Mark Keintz, Wharton Research Data Service
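Event-based LOCF of the kind described above can also be sketched with a simpler cousin of the conditional-SET technique: interleaving the sources with a SET/BY and carrying each source's latest values in retained variables. This is a hedged illustration, not the paper's optimized approach; the data set and variable names are hypothetical, and all three sources are assumed to contain ID and DATE and be sorted by them.

```sas
/* Hedged sketch: event-based LOCF by interleaving three irregular series */
data combined;
   set admissions(in=in_a) services(in=in_s) testresults(in=in_t);
   by id date;
   retain last_adm last_svc last_test;          /* carried-forward copies */
   if first.id then call missing(last_adm, last_svc, last_test);
   if in_a then last_adm  = adm_value;   /* each source's value persists  */
   if in_s then last_svc  = svc_value;   /* in the PDV until that source  */
   if in_t then last_test = test_value;  /* contributes its next record   */
run;
```

Because the retained variables are distinct from the input variables, each record in COMBINED carries the most recent value seen so far from every source for that ID.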
Session 5114-2020
How a SAS® Function and the ODS PACKAGE Statement Will Help You Save Money and Space
In these times when data storage is costly, organizations often keep their data in compressed files on their servers to save both money and space. It can be tedious and time-consuming to take data out of compressed files, use the data, and then delete the extracted files. However, with the GETOPTION function, the contents of a data set in a compressed file can be read and processed in the WORK library. This paper demonstrates how to unzip files into the WORK library and how to create new zip files using the ODS PACKAGE destination. This keeps your network space in good shape.
Ricardo Rosales, Westat Inc.
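The round trip described above can be sketched as follows. This is a hedged illustration, not necessarily the paper's exact method: the archive and member names are made up, GETOPTION returns the WORK library's physical path, the FILENAME ZIP access method (SAS 9.4) reads a member straight from the archive, and the ODS PACKAGE statements build a new zip.

```sas
/* Hedged sketch: extract a table from a zip into WORK, then re-zip it */
%let workpath = %sysfunc(getoption(work));

filename inzip zip "/data/archive.zip" member="mydata.sas7bdat";
filename outds "&workpath./mydata.sas7bdat";

/* byte-for-byte copy of the member into the WORK directory */
data _null_;
   infile inzip recfm=n;
   file outds recfm=n;
   input byte $char1. @@;
   put byte $char1. @@;
run;

/* the extracted table is now usable as work.mydata */
proc print data=work.mydata(obs=5); run;

/* bundle the file into a fresh zip with the ODS PACKAGE destination */
ods package(newzip) open nopf;
ods package(newzip) add file="&workpath./mydata.sas7bdat";
ods package(newzip) publish archive
    properties(archive_name="mydata.zip" archive_path="/data/");
ods package(newzip) close;
```

Since everything lands in WORK, the extracted copy is cleaned up automatically when the session ends, which is what keeps permanent network storage small.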
Session 4307-2020
How Many Shades of Guide: SAS Enterprise Guide to 8.1 and SAS Studio to 3.8 with SAS 9.4
I have been using SAS® Enterprise Guide® since version 1.1.1 in 2001, and SAS® Studio since version 3.1 in 2013. SAS Enterprise Guide version 8.1 and SAS Studio version 3.8 have developed immensely from their original releases. This paper tracks their progress and convergence towards the perfect SAS development environment. If you are not yet using SAS Enterprise Guide or SAS Studio, then your IT department may be unnecessarily overworked, and you are missing out on some very useful programming features.
Philip R Holland, Holland Numerics Limited
Session 4891-2020
How SAS® Viya® Provides a Way to Deliver Analytics to Everyone: Image Classification Examples
We are living in an era when machine learning (ML) helps to solve complex tasks for enterprises and improve productivity. ML techniques such as image recognition are continually improving, but they are of no use if an enterprise cannot easily implement them. The scarcity of data science experts and the high costs associated with hiring them can be a significant barrier, especially for small and midsize enterprises. In this presentation, we argue that modern tools like SAS® Viya® provide an opportunity even for small enterprises to build complex models with high accuracy. For this purpose, we use MNIST and FashionMNIST data sets that have been used to test image classification methods. We show that using SAS Viya with default settings and without coding, you can get a model that misclassifies about 3.5-6.0% and 11.1-11.9% of images for the first and second data sets, respectively. Simple tuning with a graphical user interface enables you to achieve misclassification rates as low as 2.3-2.7% and 9.8-10.7%. In many cases, such rates are satisfactory for enterprises, and they can benefit from machine learning techniques even if for any reason they lack data scientists. SAS Viya makes the process of building complex models simple and time-efficient and opens opportunities for all kinds of people within enterprises to perform data analysis. This presentation is intended for a technical audience with a basic knowledge of SAS® as well as for non-experts in data science who want to start using ML.
Pavel Rogatch, Analytium BY
Session 4266-2020
How To Access and Manage Microsoft Azure Cloud Data using SAS
The popularity of cloud data storage has grown exponentially over the last decade as more and more organizations transition from on-premises to cloud data storage and data management. Microsoft Azure is one of the big players accelerating the move to the cloud. In this paper, we cover the following topics: an overview of using SAS® to access and manage data in Azure Storage, and SAS best practices and options for working with big data and relational databases in Azure Storage. This paper contains data access examples and use cases that explore Azure cloud data using SAS.
Kumar Thangamuthu, SAS Institute
Session 4718-2020
How to Become a Lazy Programmer
As a programmer, how often do you find yourself writing the same skeleton of code again and again? Quite often, right? What if you could cut down that time just by using the necessary fillers? This paper walks you through how to implement and share abbreviation macros using the SAS® Enhanced Editor on Windows. This is simply an approach to becoming a lazy programmer in SAS® Enterprise Guide® (SAS EG).
Balakumar Marappa Gounder Ponnusamy, Element Technologies Inc
Session 4442-2020
How to Build a Text Analytics Model in SAS® Viya® with Python
Python is widely noted as one of the most important languages influencing the development of machine learning and artificial intelligence. SAS® has made seamless integration with Python one of its recent focal points. With the introduction of the SAS® Scripting Wrapper for Analytics Transfer (SWAT) package, Python users can now easily take advantage of the power of SAS® Viya®. This paper is designed for Python users who want to learn more about getting started with SAS® Cloud Analytic Services (CAS) actions for text analytics. It walks them through the process of building a text analytics model from end to end by using a Jupyter Notebook as the Python client to connect to SAS Viya. Areas that are covered include loading data into CAS, manipulating CAS tables by using Python libraries, text parsing, converting unstructured text into input variables used in a predictive model, and scoring models. The ease of use of SWAT to interact with SAS Viya using Python is showcased throughout the text analytics model building process.
Nate Gilmore, Vinay Ashokkumar, and Russell Albright, SAS Institute
Session 4502-2020
How to Explain Your Black-Box Models in SAS® Viya®
SAS® Visual Data Mining and Machine Learning in SAS® Viya® offers a number of algorithms for training powerful predictive models, such as gradient boosting, forest, and deep learning models. Although these models are powerful, they are often too complex for people to understand by directly inspecting the model parameters. The "black-box" nature of these models limits their use in highly regulated industries such as banking, insurance, and health care. This paper introduces various model-agnostic interpretability techniques available in SAS Viya that enable you to explain and understand machine learning models. Methods include partial dependency (PD) plots, independent conditional expectation (ICE) plots, local interpretable model-agnostic explanations (LIME), and Shapley values. This paper introduces these methods and demonstrates their use in two scenarios: a business-centered modeling task and a health-care modeling task. Also shown are the two different interfaces to these methods in SAS Viya: Model Studio and the SAS Viya programming interface.
Funda Güneş, Ricky Tharrington, Ralph Abbey, and Xin Hunt, SAS Institute Inc.
Session 4313-2020
How to Maintain Happy SAS® Viya® Users
Today’s SAS® Viya® environments support many concurrent processes, using ever-growing data volumes. To help SAS Viya users remain productive, SAS® administrators must ensure that the SAS Viya applications have both sufficient and properly configured compute resources that are continuously monitored. Understanding how all the SAS components work and how they are employed by your users is the first step. The guidance offered in this paper helps SAS administrators configure and tune SAS Viya hardware, operating systems, and infrastructure. This tuning and configuration will keep their SAS Viya applications running at optimal performance and keep their user community happy.
Margaret Crevar, SAS Institute
Session 4678-2020
How to Make Your First Impressive Web Application with Stored Processes and a Web Browser
Many organizations are experiencing the value of applications that can be run from a web browser. We all know that virtually anything is possible using this technology, which will usually consist of some HTML, CSS, and JavaScript running in the browser, with various other software running on a server. SAS® has provided the Stored Process Web Application that lets us connect the web browser to the SAS server, which opens up an enormous range of potential applications. In the simplest form, we can use a stored process to prompt the user for some info, run SAS code using that info, and return results to the web browser. From this simple starting point, this paper shows you how to make a more powerful and flexible web application. The emphasis is on showing those who know SAS (and nothing about web technologies) as simply as possible the steps to build a generic web application that they can use to start building their own web applications.
Philip Mason, Wood Street Consultants Ltd.
Session 4761-2020
How To Make Your Reports Fly! The Impact of Visualization Layer Calculations on Performance
SAS® Visual Analytics reports at the University of North Carolina Chapel Hill can be written against tables containing 500 GB of data. Reporting in an enterprise-sized environment has a specific set of requirements that were not present in the smaller environments of the past. Not paying attention to these differences can slow report visualization to an unusable crawl. However, there is one easily addressed methodology that gets the most out of your reporting performance. This paper examines the impact of visualization layer calculations (which is common in non-enterprise-sized reporting) on enterprise-sized reports. This paper also goes into the details of how some of our reports had performance increases from minutes to seconds when going against our largest tables.
Jessica Fraley, University of North Carolina at Chapel Hill
Session 4629-2020
How to Master a Risk Data Vault Using SAS® Data Integration Studio
When implementing a new risk analysis platform, the choice was to use SAS® for both extract, transform, load (ETL) processing and credit scoring. Part of the solution is a risk operational data store (ODS), in which a data vault plays a central role. SAS® Data Integration Studio proved up to the task. We go through the data architecture and data modeling issues, and sum up our experiences of populating and querying the ODS using SAS Data Integration Studio, as well as using SQL Server as the ODS database.
Linus Hjorth, Infotrek
Session 4506-2020
Human Bias in Machine Learning: How Well Do You Really Know Your Model?
Artificial intelligence (AI) and machine learning are going to solve the world's problems. Many people believe that by letting an "objective algorithm" make decisions, bias in the results has been eliminated. Instead of ushering in a utopian era of fair decisions, AI and machine learning have the potential to exacerbate the impact of human biases. As innovations help with everything from the identification of tumors in lungs to predicting whom to hire, it is important to understand whether some groups are being left behind. This talk explores the history of bias in models, discusses how to use SAS® Viya® to check for bias, and examines different ways of eliminating bias from our models. Furthermore, we look at how advanced model interpretability available in SAS Viya can help end users better understand model output. Ultimately, the promise of AI and machine learning is still a reality, but human oversight of the modeling process is vital.
Jim Box, Elena Snavely, and Hiwot Tesfaye, SAS Institute
Session 4208-2020
Hybrid Marketing, Campaign Management, and Analytics' Last Mile Using SAS® Customer Intelligence 360
The marketing industry has never had greater access to data than it does today. The more we know the customer, the more we understand the essence of customer experience. The more we understand customer experience, the more we can shape it, develop it, and better serve the customer. According to Daniel Newman (Futurum Research) and Wilson Raj (SAS Institute) in the October 2019 research study Experience 2030: "Brands must reinvent their operating models to act in the moment. They need a holistic data and technology strategy that they can individualize at scale, customer journey capabilities that can adapt in real time, and intelligent decisioning to automate the self-reinforcing cycle of tailored experiences. And that's just today. Tomorrow's customer journeys and personalization will be even smarter, more immersive, and more trust-enabling. More customer experience initiatives will be run by AI and machine learning algorithms embedded into automated software applications." The question is: Are brands ready?
Suneel Grover, SAS Institute and The George Washington University
I
Session 4695-2020
IBM Power Systems for SAS® Empowers Advanced Analytics
For more than 40 years of partnership between IBM and SAS®, clients have benefited from the added value brought by IBM's infrastructure platforms to deploy SAS analytics, and now SAS® Viya®'s evolution of modern analytics. IBM Power® Systems and IBM Storage empower SAS environments with infrastructure that does not make tradeoffs among performance, cost, and reliability. The unified solution stack, comprising server, storage, and services, reduces compute time, controls costs, and maximizes the resilience of SAS environments with ultra-high bandwidth and the highest availability.
Harry Seifert and Laurent Montaron, IBM Corporation
Session 4680-2020
Identifying "One-Offs": Tuberculosis Genotype Cluster Detection Using the COMPGED Function
In the surveillance and epidemiologic functions of public health departments, SAS® software programs are leveraged to identify clusters and outbreaks. Tuberculosis (TB) genotyping is a method used to identify TB patients with closely related strains, indicating potential recent transmission. The genotyping process produces a GENType, composed of a 15-character spoligotype string and a 24-character mycobacterial interspersed repetitive units (MIRU) string. The TB Control Program (TBCP) in Los Angeles County (LAC) receives a weekly update of genotyping results for TB isolates. TBCP monitors additions to high-priority genotype clusters (that is, matching GENTypes) and looks for patients with a result string that differs by one alphanumeric character from one of the high-priority GENTypes. Using a macro %DO loop, the COMPGED function quantifies differences between a patient's TB isolate and a reference library of genotypes for high-priority clusters in the county. The quantified differences identify exact matches, and results that are "one-off" are further investigated by the outbreak team to interrupt TB transmission.
Edward Lan, County of Los Angeles Department of Public Health Tuberculosis Control Program
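The one-off screening described above can be sketched in SAS. The data set and variable names (patients, ref_lib, gentype, cluster_gentype) are hypothetical, and a PROC SQL cross join stands in for the paper's macro %DO loop; with default costs, COMPGED returns 0 for an exact match and on the order of 100 for a single-character substitution.

```sas
/* Sketch only: compare each patient's GENType string to every
   high-priority cluster GENType in a reference library. */
proc sql;
  create table one_offs as
  select p.patient_id,
         p.gentype,
         r.cluster_gentype,
         compged(p.gentype, r.cluster_gentype) as ged
  from patients as p, ref_lib as r
  where calculated ged between 1 and 100;  /* 0 would be an exact match */
quit;
```

Rows with ged = 0 are exact cluster matches; the small positive scores are the "one-off" candidates handed to the outbreak team.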
Session 5087-2020
If You Build it Well, They Will Come! Responding to a Growing Analytics Program with SAS® Viya®
This breakout session presents strategies, tips, and lessons learned while implementing SAS® Viya® for use in a business analytics curriculum. The University of Arkansas Walton College of Business, located in Fayetteville, Arkansas, is accredited by the Association to Advance Collegiate Schools of Business (AACSB). SAS has been instrumental in helping to design our business analytics programs and has recognized the value of our program by presenting graduate and undergraduate students with an endorsed certificate upon completion. The initial portion of this session outlines the journey to our first iteration of SAS Viya on a single-node server running all our licensed SAS Viya components. The subsequent portion discusses the growth in the analytics program, which created the need for a far more versatile solution. Details include the process of deploying a six-node distributed SAS Viya solution, including planning, documentation, unforeseen issues and how they were overcome, and other steps that have led to confidence by faculty in teaching SAS Viya as well as a solid learning experience for students.
Michael Gibbs and Dr. Ronald Freeze, University of Arkansas, Sam M. Walton College of Business
Session 4957-2020
If You Need These OBS and These VARS, Then Drop IF and Keep WHERE
Reading data effectively in the DATA step requires knowing the implications of various methods and of DATA step mechanics: the observation loop and the program data vector (PDV). The impact is especially pronounced when you are working with large data sets. Individual techniques for subsetting data have varying levels of efficiency and implications for input/output time. Using the WHERE statement or WHERE= data set option to subset observations consumes fewer resources than using the subsetting IF statement. Also, using the DROP= and KEEP= options to select variables to include or exclude can be efficient, depending on how they're used.
Jay Iyengar, Data Systems Consultants LLC
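The distinction drawn above can be seen in a short sketch (the library, data set, and variable names are hypothetical): WHERE= filters observations before they reach the PDV, whereas a subsetting IF discards each record only after it has been read in.

```sas
/* Efficient: WHERE= and KEEP= limit both the observations and the
   variables that are brought into the PDV. */
data females;
  set big.patients(keep=id sex age where=(sex = 'F'));
run;

/* Less efficient: every observation is loaded into the PDV first,
   and non-matching records are discarded by the subsetting IF. */
data females_if;
  set big.patients(keep=id sex age);
  if sex = 'F';
run;
```

Both steps produce the same subset; on a large input table, the first typically does far less work.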
Session 4312-2020
Important Performance Considerations when Moving SAS to the Public Cloud
When choosing a hardware infrastructure for your SAS® applications, you need a solid understanding of all the layers and components of the SAS infrastructure. You also need to not just successfully run the software but to optimize its performance. Finally, you need an administrator to configure and manage the infrastructure. This paper discusses important performance considerations for SAS® 9 (both SAS® Foundation and SAS® Grid Manager) and for SAS® Viya® when hosted in any of the available public clouds (Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, and so on). It also provides guidance on how to configure the cloud infrastructure to get the best performance with SAS. Disclaimer: We strongly encourage you to take the advice in this paper and work with your local public cloud teams to make sure the instances you decide to use are available in the closest region and that you understand their costs. In addition, any advice in this paper is based on the information we have at the time of publishing this paper (March 2020).
Margaret Crevar, SAS Institute
Session 4175-2020
Improving Facility Management through Machine Learning
The facility management industry is experiencing an influx of smart technologies and a retiring workforce. There are not enough qualified replacements coming into the industry to offset the retiring engineers and maintenance staff. Facility owners are faced with adopting technology to augment their workforce shortages and are constantly being asked to do more with less. More facilities, more square footage, more complex systems, while being given fewer resources and tighter budgets. Technology, specifically advanced analytics and machine learning, must be adopted and operationalized to augment the current tools and staff. The primary business drivers are enhanced productivity through root cause analysis, energy reduction, and streamlined operations, while establishing and exceeding sustainability goals. This case study explains the journey from a local government perspective on how they added an initiative to the City's Strategic Plan, piloted the technology to prove the value, and created the business case to get approval from the City Manager for expansion.
Chris Beall, Joel Urban, and Sean Murphy, Building Clarity
Session 4311-2020
Incorporating Auxiliary Information into Your Model Using Bayesian Methods in SAS Econometrics
In addition to data, analysts often have available useful auxiliary information about inputs into their model: for example, knowledge that high prices typically decrease demand or that sunny weather increases outdoor mall foot traffic. If used and incorporated correctly into the analysis, the auxiliary information can significantly improve the quality of the analysis. But this information is often ignored. Bayesian analysis provides a principled means of incorporating this information into the model through the prior distribution, but it does not provide a road map for translating auxiliary information into a useful prior. This paper reviews the basics of Bayesian analysis and provides a framework for turning auxiliary information into prior distributions for parameters in your model by using SAS® Econometrics software. It discusses common pitfalls and gives several examples of how to use the framework.
Matthew Simpson, SAS Institute
Session 5116-2020
Interpretation Methods for Black Box Machine Learning Methods in Insurance Rating-Type Applications
Traditional generalized linear models (GLMs) are often favored in the insurance realm for rating because of their interpretive simplicity and intuitive distributional assumptions. This is partly driven by insurance regulation needs and partly by customers' demand for explanation of their rates. In insurance rating, there is often a need to provide reason codes and customer feedback identifying the factors that adversely impact policyholders' individual premiums. Black-box machine learning models, on the other hand, often lack transparency and interpretability but have powerful predictive potential. Actuarially, modelers have to balance accurate, confident pricing against model interpretability, for both the collective customer population and individual customers. In this presentation, we demonstrate the value of interpretation methods at the global and local level for a gradient boosting decision tree model using simulated auto insurance data. We review partial dependence (PD), individual conditional expectation (ICE), and accumulated local effects (ALE) plots for global variable-level interpretation as a substitute for parameter estimate and variable significance analysis. We also demonstrate the use of local interpretable model-agnostic explanations (LIME) and Shapley values for local prediction explainability. LIME and Shapley values can be used independently or together to provide feedback at an individual customer level. This research was done in SAS® Viya®.
Gabe Taylor, Sunish Menon, and Huimin Ru, State Farm
Ray Wright, SAS Institute
Session 4281-2020
Introducing PROC SIMSYSTEM for Systematic Nonnormal Simulation
Simulation is a key element in how decisions are made in a wide variety of fields, from finance and insurance to engineering and statistics. Analysts use simulation to estimate the probability that a business process will succeed, that an insurance portfolio will fail, or that a statistical method will control Type I error. Often, the base case that is used to devise such processes depends on assuming that random variation is normal (Gaussian). Normal variation is common and is often easier to compute with. But it is still an assumption, so robust simulation studies should explore what happens when that assumption is violated. The SIMSYSTEM procedure, new in SAS® Visual Statistics 8.5, enables you to simulate random variation from a wide class of distributions. You can use PROC SIMSYSTEM to generate random variates with any possible values for skewness and kurtosis, which define the asymmetry and tail behavior of the underlying distribution. Key tools are computational facilities for handling the Pearson and Johnson families of distributions, whose parameters span this skewness/kurtosis space. This paper introduces PROC SIMSYSTEM and demonstrates its utility with examples that simulate complex processes in science and industry.
Bucky Ransdell and Randy Tobias, SAS Institute
Session 4283-2020
Introducing the GAMSELECT Procedure for Generalized Additive Model Selection
Model selection is an important area in statistical learning. Both SAS/STAT® software and SAS® Visual Statistics software provide a rich set of tools for performing model selection over linear models, generalized linear models, and Cox proportional hazards models. With these tools, you can build parsimonious predictive models by constructing linear or fixed nonlinear effects to describe the dependency structure. But what if the dependency structure is nonlinear and the nonlinearity is unknown? And what if covariates are nonlinearly correlated? The new GAMSELECT procedure, available in SAS Visual Statistics, addresses these questions by using spline terms to approximate the nonlinear dependency, then selecting important variables in appropriate nonlinear transformations by using the boosting method or the shrinkage method. The procedure builds models for response variables in the exponential family, so that you can use it for continuous, count, or binary responses. This paper introduces the GAMSELECT procedure and provides a brief comparison to related SAS® procedures.
Michael Lamm and Weijie Cai, SAS Institute
Session 4626-2020
Introduction to SAS/ACCESS® Interface to Google BigQuery
Google BigQuery is a service running on the Google Cloud Platform that facilitates analysis of massive data sets working in tandem with Google Cloud Storage. SAS/ACCESS® Interface to Google BigQuery enables SAS® to take advantage of this exciting technology. This paper describes Google BigQuery and discusses how it is different from other databases that you might have used in the past. Using examples, we discuss the following topics: how to get started connecting your SAS software to Google BigQuery; tricks to get the most out of SAS/ACCESS Interface to Google BigQuery; how to effectively move data into and out of Google BigQuery; and potential gotchas (or idiosyncrasies) of Google BigQuery. This paper uses an example-driven approach to explore these topics. Using the examples provided, you can apply what you learn from this paper to your environment.
Joel Odom, SAS Institute
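A minimal connection sketch follows; the project, schema, and credential values are placeholders, and the exact LIBNAME options should be verified against the SAS/ACCESS documentation for your release.

```sas
/* Hypothetical values throughout; CRED_PATH= points to a Google
   service-account key file for authentication. */
libname gbq bigquery
  project='my-gcp-project'
  schema='my_dataset'
  cred_path='/path/to/service-account-key.json';

/* Once assigned, BigQuery tables behave like SAS data sets. */
proc sql;
  select count(*) from gbq.sales;
quit;
```

From there, ordinary SAS steps and procedures can read from and write to the BigQuery library reference.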
Session 4870-2020
Is it possible to reduce smoking by increasing taxes? An analysis on Danish data using SAS® software
In Denmark, all ideas to reduce smoking are discussed. It is commonly accepted that increasing prices is an efficient tool to reach this goal. In a Danish context, taxes are frequently used for such purposes: all tobacco products are already heavily taxed, just as alcohol and fuel are, and some years ago there was even an extra tax on the fat content of goods like meat, cheese, and chocolate. The question, however, is whether it is true that smoking can be reduced by increasing taxes. This paper presents several statistical time series analyses using several procedures (VARMAX, X12, ARIMA, and UCM) in SAS/ETS® software to answer this question. The data are series for monthly and yearly sales of cigarettes, combined with series for the price of cigarettes and the overall consumer price index, together with detailed information on previous changes in the consumer tax on cigarettes. The results indicate a price elasticity of at least 0.6, but the shock effect of a sudden price increase caused by a tax increase fades out rather fast.
Anders Milhoj, University of Copenhagen
Session 4092-2020
It's All about the Base-Procedures, Part 2
"It's All about the Base-Procedures" (a SAS Global Forum 2019 paper) explored the strengths and challenges of commonly used Base SAS® procedures. It also compared each procedure to others that could accomplish similar tasks. This paper takes the comparison further, focusing on the FREQ, MEANS, TABULATE, REPORT, PRINT, and SQL procedures. As a programmer, whether novice or advanced, it is helpful to know when to choose which procedure. The first paper provided best-use cases, and this paper goes a step further in discussing when to choose one procedure over another. It also provides example code to demonstrate how to get the most out of the procedure that you choose.
Jane Eslinger, SAS Institute
J
Session 5169-2020
Joinless Join: The Impossible Dream Come True Using SAS® Enterprise Guide®, PROC SQL, and DATA Step
SAS® Enterprise Guide® and Base SAS® can easily combine data from tables or data sets by using a PROC SQL join to match on like columns or by using a DATA step merge to match on the same variable name. However, what do you do when tables or data sets do not contain like columns or the same variable name, so that a join or merge cannot be used? We invite you to attend our exciting Joinless Join session, where we will empower you to expand the power of SAS Enterprise Guide and Base SAS in new ways by creatively overcoming the limits of a standard join or merge.
Kent and Ronda Phelps, Illuminator Coaching, Inc
L
Session 4704-2020
Language Lessons for Data-Driven Decisions: Achieving Data Literacy
As they collect more and more data, organizations everywhere are pushing for more data-driven decision-making. They are hiring data scientists, data engineers, and data analysts to help them harness that data and reach this goal. However, while there is no question that technical programming and statistical skills are critical inputs in this endeavor, they are not enough. When you travel abroad, a phrase book or Google Translate® can help you navigate, as can learning a few key phrases, but you would not confuse this with knowing the language. The language of data literacy is no different. Speaking the language of data is a skill you need to cultivate like any other. Whether you are the traveler (aka an analytical or data professional) trying to make yourself understood, or the person trying to help those visiting, learning the language is critical. And it is especially so when you are trying to make key business decisions; getting it wrong will result in more than just the wrong sandwich or a detour. Gartner describes this ability to derive meaningful information from data as Data Literacy. This paper aims to explain why Data Literacy matters now and what leaders should do about it.
Emma Warrillow, Data Insight Group Inc.
Session 4871-2020
Lessons Learned from Managing SAS® Viya® at Scale in the Cloud
Starting in 2015, SaasNow has been managing SAS® products in a private cloud in a unique monthly, fully self-service cloud model. Starting with SAS® vApp technology, SaasNow was one of the first providers to offer a true cloud experience for SAS® Visual Analytics. With the release of SAS Viya in 2016, SaasNow has been able to support many more SAS products, including SAS® Visual Data Mining and Machine Learning. Currently, all SAS Viya products and bundles can be automatically deployed on SaasNow within a few hours. Over the years, we have deployed and managed hundreds of environments, ranging from small four-core demo environments to multi-server production setups of up to 128 cores, with a wide variety of customer-specific customizations and integrations, such as customized authentication (Kerberos or SSO integrations) and external data source connections (SQL and Apache Hadoop). We have developed best practices for sizing, deploying, updating, upgrading, and monitoring many instances of SAS Viya at the same time. This enables us to make SAS Viya available to our customers and partners within several hours and to have new releases available within days after they have been released by SAS. In this way, we offer a true cloud experience for SAS Viya. In this paper, we share the lessons learned from deploying and managing SAS Viya at scale.
Jelle Daemen and Tom Dogger, SAASNOW
Session 4778-2020
Let's See Further than One Observation: Applying Hash Objects for Typical Clinical Programming Tasks
It is commonly known that SAS® software executes the DATA step iteratively for each observation. A common method used to compare the current record with data from previous records is to sort and then use the FIRST./LAST. automatic variables and the RETAIN statement. This method is often time-consuming; it requires additional DATA steps, variables, and sorting, and it is not always intuitive or easy to update. This presentation demonstrates an alternative approach to typical clinical programming tasks through the use of hash objects: defining baseline, detecting repeated adverse events, flagging all records from an interval, and fuzzy merging. Hash objects allow you to load a whole data set into memory and analyze combinations of observations. The presentation explains techniques for navigating the hash table and provides a comparison of execution speed between code containing hash objects and traditional code.
Valeriia Oreshko, IQVIA
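As one instance of the approach, a hash object can flag repeated adverse events in a single pass, with no sorting and no FIRST./LAST. processing. This is a sketch under assumed names: the data set and variables (adae, usubjid, aeterm) follow CDISC conventions but are otherwise illustrative.

```sas
data ae_flagged;
  if _n_ = 1 then do;
    declare hash seen();               /* keys seen so far */
    seen.defineKey('usubjid', 'aeterm');
    seen.defineDone();
  end;
  set adae;
  /* The first occurrence of a (subject, term) pair is stored in the
     hash table; any later occurrence is flagged as a repeat. */
  if seen.check() = 0 then repeat_fl = 'Y';
  else do;
    repeat_fl = 'N';
    rc = seen.add();
  end;
  drop rc;
run;
```

Because the lookup table lives in memory, the input order of adae does not matter beyond defining which record counts as the first occurrence.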
Session 5208-2020
Like, Learn to Love SAS® Like
How do I like SAS®? Let me count the ways.... There are numerous instances where the LIKE operator can be used in SAS, and all of them are useful. This paper walks through such uses of LIKE as: searches and joins with that smooth LIKE operator (and the NOT LIKE operator); the SOUNDS LIKE operator; using the LIKE condition to perform pattern matching and create variables in PROC SQL; and PROC SQL's CREATE TABLE ... LIKE.
Louise S. Hadden, Abt Associates Inc.
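A few of those uses can be sketched against the SASHELP.CLASS sample table that ships with SAS:

```sas
proc sql;
  /* In a LIKE pattern, % matches any number of characters
     and _ matches exactly one. */
  select name from sashelp.class
    where name like 'J%';        /* names beginning with J */

  select name from sashelp.class
    where name not like '_a%';   /* second letter is not "a" */

  /* CREATE TABLE ... LIKE clones a table's structure (columns
     and attributes) without copying any rows. */
  create table work.class_shell like sashelp.class;
quit;
```

The same LIKE patterns work in WHERE clauses throughout SAS, not only in PROC SQL.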
Session 4517-2020
Look Ma, No Hands: How to Train Your Dragon for SAS®
Have you ever experienced pain in your hands while using a keyboard? Have you wondered how to write SAS® code without using your hands? Well, you will love this paper. After a SAS accessibility developer received a repetitive strain injury (RSI) diagnosis, she started using Dragon® NaturallySpeaking (a speech-to-text product) to write SAS code hands-free. In this presentation, she describes tips for getting started with Dragon, demonstrates techniques for training Dragon to understand SAS language vocabulary, and shares a case study for using Dragon with her favorite SAS programming client.
Julianna Langston and Elizabeth Langston, SAS Institute Inc.
M
Session 5171-2020
Machine Learning Approach to Combat the Opioid Epidemic
The ever-increasing practice of using opioids or street drugs in the United States has caused the rates of mortality from drug abuse to hit the roof. Although prescribed opioids are mainly used for pain relief, there is a prevalence of illegal use of opioids, and the likelihood of becoming dependent on an opioid long-term spikes after just five days of use. Unfortunately, the epidemic has affected almost every age group in every U.S. demographic, but opioid addiction has been disproportionately affecting older adults living in rural America. The impact is so serious that every day more than 130 Americans die from an opioid overdose. Considering the severity of the problem and the inability of traditional approaches thus far to solve it, we are aiming to engage in the battle against the opioid epidemic using machine learning techniques. In this study, we examine whether there is any correlation between the prescription of opioids and deaths from overdose. Further, our objective is to find correlations among risk factors and identify statistically significant factors in individuals or groups who are susceptible to opioid abuse. Finally, we aim to develop a risk model to predict a patient's risk of opioid abuse or death from future opioid use.
Navjot Kaur, Goutam Chakraborty, and Miriam Mcgaugh, Oklahoma State University
Session 4869-2020
Machine Learning Data Analysis for the ERP Adoption and Enterprise Performance with SAS® Enterprise Miner and Python Scikit-learn
Enterprise resource planning (ERP) is one of the most critical IT investments an organization can make. Previous studies based on a resource-based view or balanced scorecard revealed positive relationships between firms' ERP adoption and their after-adoption performance. However, what exogenous factors affect ERP adoption has not been researched much. Thus, predictive modeling for ERP adoption (as the target variable) with various business performance indices (as input variables) still needs to be tested. The purpose of this research is to conduct a machine learning data analysis using SAS® Enterprise Miner and Python scikit-learn and to compare the results of these methods. We tried various predictive models, including neural networks and others, by using both tools. By searching for the best predictive model, we reveal which business performance or operating factors affect ERP adoption. We obtained the data from the South Korea government statistics organization. From 2006 to 2014, the panel data consist of 256 variables and 102,746 rows. We used 113 input variables. In the early stage of analysis, the best model proved to be the neural network model with input variables selected by a decision tree model. The analysis results are compared year by year. Then, the whole process performed with SAS Enterprise Miner is cloned and executed with Python scikit-learn. Through this approach, we shed light on how business entities can use these two tools complementarily rather than exclusively.
Sunjip Yim and Dr. HoChang Chae, University of Central Oklahoma
Session 5036-2020
Machine Learning Models for Forecasting New and Existing Product Sales Using SAS® Viya®
Sales forecasting is an essential activity for the day-to-day operations of a retailer. The product forecasts are used not only for the purchase and allocation decisions by the merchants but also as an input in the assortment optimization process for the stores. The goal of this presentation is to introduce a nontraditional forecasting approach using machine learning techniques to generate the annual store- and product-level sales forecast for a household merchandise and home furnishing retail chain. SAS® Viya® is leveraged for developing the machine learning models, and the serverless technology of Google BigQuery is leveraged for the data exploration and input preparation from the large volume of raw data. In this presentation, we lay out a modeling approach using store and product attributes that is not constrained by the availability of historical sales or time series data. Machine learning algorithms such as decision trees, random forests, and gradient boosting, which can be easily autotuned using Model Studio in SAS Viya, were evaluated and applied to the sales prediction. SAS® Visual Data Mining and Machine Learning, along with SAS® Data Preparation, SAS® Visual Analytics, and SAS® Model Manager, was instrumental in orchestrating the forecast generation process.
Mallika Dey, Core Compete
Session 4171-2020
Make Beautiful Vertically Scrolling Dashboards in SAS® Visual Analytics
Tabs are a natural way of creating content-rich business intelligence dashboards. They visually separate distinct ideas and concepts, typically flowing from left to right to tell a story hidden in gigabytes of tables full of numbers, words, codes, and acronyms. As users ask for additional views and insights into their data, designers may find themselves pressed for space. Some users may want to add large crosstabs, while others may want detailed bar charts with 25 categories. Before SAS® Viya®, designers had the following options: (1) make a new tab with additional visualizations, (2) try resizing all graphs to accommodate the new graph, or (3) move visualizations into stacked containers. Adding too many tabs and densely packed stacked containers can make for a difficult and frustrating user experience. With SAS® Viya® and SAS® Visual Analytics 7.5, designers can increase their dashboard real estate by creating vertically scrolling dashboards at the click of a button. This paper aims to show designers how to break beyond the vertical pixel barrier, allowing dashboards to flow naturally not only from left to right, but also from top to bottom.
Stu Sztukowski, SAS Institute Inc.
Session 5001-2020
Make Your DO Loop More Efficient
In certain cases, a DO loop does not need to run all of its iterations in order to obtain the correct result. This paper examines three examples to illustrate when we can or cannot make the DO loop more efficient, with no loss of accuracy, through the use of the LEAVE statement.
G Liu, University of Heidelberg
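A minimal illustration of the idea (the lookup itself is hypothetical): once a linear search finds its target, LEAVE exits the loop so the remaining iterations never run, and the result is unchanged.

```sas
data _null_;
  array codes[5] $ 8 _temporary_ ('A1' 'B2' 'C3' 'D4' 'E5');
  target = 'C3';
  found = 0;
  do i = 1 to dim(codes);
    if codes[i] = target then do;
      found = i;
      leave;       /* answer is known; skip the remaining elements */
    end;
  end;
  put found=;      /* writes found=3 to the log */
run;
```

Without LEAVE, the loop would scan all five elements even after the match at position 3; with it, iterations 4 and 5 are never executed.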
Session 4576-2020
Making Administration Easier by Scripting the SAS® Viya®
SAS® Viya® offers a set of rich command-line interfaces (CLIs) to directly interact with SAS Viya without using a graphical interface. This paper describes how these CLIs can be scripted and chained together to accomplish administrative tasks. Examples include backing up and then changing the logging levels for SAS® Cloud Analytic Services (CAS), auditing failed logins, updating themes across a set of SAS Viya reports, batch creating caslibs and adding access controls to them, as well as many other tasks. All of these tasks can be done remotely as well, without having to physically be on a SAS Viya system.
James Holman, SAS Institute
Session 4836-2020
Making Data Actionable: From Concept to Reality
Anyone can make a pretty bar graph, but can you make sound decisions based on what your development staff currently provides? How do you turn a flashy concept into an actionable visualization? Can you see the end result? Will your ideas become reality? Do you have the vision and drive to figure out how to get from today to tomorrow before it becomes yesterday? American mathematician John Tukey once said, "The greatest value of a picture is when it forces us to notice what we never expected to see." What value do you see in your data? And what ideas do you have when you see it? Learn how you can capitalize on your ideas and turn them into reality by blending the internal with the external, leveraging them into a cohesive strategy for both the short term AND the long term. In this session, see the five "Stages of the Spectrum" in action while discovering the difference between impact and influence, and how that plays into making data actionable.
Dr. Joseph W. Perez, NC Department of Health & Human Services
Session 4699-2020
Making Life Easier on the SAS® End: Best Practices for
Data reports can be produced using SAS to convey the progress and potential needs of an ongoing study to research staff and investigators. It is critical that this information be accurate and timely. Tailoring these reports to your team's needs allows for maximum communication. It is imperative to begin by knowing your data: the structure, variable formats, and meanings. What you report is highly dependent on the audience; this paper reviews basic concepts of report production. We cover the most requested types of information for current field studies: cumulative counts over time, frequencies, percentages, missing values, descriptive statistics, and plots. Presentation can be just as important as content. Creating visually appealing output can be done with a few extra lines of code. Including titles, formatting, and system options, as well as utilizing the SAS Output Delivery System (ODS), can improve appearance and clarity.
Julie Plano and Keli Sorrentino, Yale University
Session 4630-2020
Making Long Data Wide with a Flexible Macro
When you need to make a wide dataset from multiple sources where the data are long instead, this macro approach is for you. This macro has several variations that allow for customization and the addition of dynamic prefixes to the new variable names. The purpose of the macro was to take period-based data and put it into a format to be used for predictive modeling. Over 10 different data sources were used, and this macro was easily applied to each. See how macros can make your life easier!
Stephanie R. Thompson, Jenzabar / Datamum
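The core long-to-wide reshaping that such a macro wraps can be sketched with PROC TRANSPOSE. This is a generic illustration under assumed names (a long table LONG with variables id, period, and value; the prefix amt_ is arbitrary), not the macro from the paper:

```sas
/* Sketch: one row per id, one column per period, with a dynamic prefix */
proc sort data=long;
   by id period;
run;

proc transpose data=long out=wide (drop=_name_) prefix=amt_;
   by id;
   id period;     /* the period value becomes part of each new column name */
   var value;
run;
/* Result: columns amt_1, amt_2, ..., one row per id */
```

A macro version would typically parameterize the input data set, the BY and ID variables, and the PREFIX= value so the same step can be applied to each of the many sources.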
Session 5082-2020
Managers: Adapt the SDPDC to Plan Employees' Professional Development
One of the many tasks managers face is professional development (PD) of employees. Professional development matters because it can improve retention, create a promotion pipeline, and decrease overall company expenses. However, determining appropriate PD for SAS® professionals is especially difficult due to their diverse backgrounds and skill levels. At SAS® Global Forum 2019, the Self-Determined Professional Development Cycle (SDPDC) was presented as a method for SAS professionals to work through their current skills, identify desired learning, select an appropriate learning option, and evaluate their learning experience. Managers can also use the SDPDC to craft personalized development plans for their employees. This presentation shows managers how to adapt the SDPDC to maximize employees' potential and enhance company success.
Kelly D. Smith, Central Piedmont Community College
Session 4764-2020
Managing "IDLE" Grid-Launched SAS® Workspace Servers
In SAS Grid Computing, jobs may be running for hours or even days. It can be difficult to determine if these jobs are active or idle. Batch jobs, like those submitted using the SAS Grid Manager Client Utility, have their status accurately reflected by the Load Sharing Facility (LSF) because batch jobs do not wait for input during execution. Grid-Launched Workspace Servers, however, will appear in a RUNNING status to LSF and its associated commands such as bjobs, despite sitting idle waiting on a user's input. The code and techniques given in this paper describe how to check how long a Grid-Launched Workspace Server has been "idle", waiting on user input. Based on the idle time of the SAS session, a decision can be made to kill the respective LSF job ID to free up the job slots on the SAS Grid platform. This technique can be even more helpful for the user who forgets to close their SAS® Enterprise Guide session before leaving for the day, keeping a job slot busy throughout the night.
Piyush Singh, TATA Consultancy Services Ltd.
Session 4145-2020
Measuring Analytic Performance in Financial Terms for Executives
Better use of analytics is clearly a priority for executives, but approving funding for new tools or resources is not. Part of the challenge is justifying the need for even more investment when they don't immediately understand how analytics contributes to the bottom line. Developing the financial justification for analytics that executives respond to starts with measuring analytic results in real-world dollars. It's not about making new discoveries or gaining access to new data. It's about driving financial value. This presentation walks you through the process of developing the business and financial justification for advanced analytics by using examples gathered from customers across numerous industries. Focus is given to building the formulas necessary to measure improvements in operational performance, as well as addressing how these companies are able to achieve the extra bandwidth necessary to make these improvements a reality.
Tho Nguyen, Teradata
Session 4491-2020
Medical Image Analyses in SAS Viya with Applications in Automatic Tissue Morphometry in the Clinic
Imaging and image analytics are indispensable tools in clinical medicine today. Among the various metrics that doctors routinely derive from images, measures of the morphology of tissue structures, including their shape and size, are of key significance. Quantifying tissue morphology and linking those quantities to other clinical data enable clinicians to diagnose diseases and plan treatment strategies. Image segmentation, which classifies image pixels into regions of interest, is an important step in such tissue morphology quantification. However, common segmentation methods involve a process that is either fully or partially manual. Accordingly, these methods can be extremely arduous when you process very large amounts of data. This paper illustrates how to build end-to-end pipelines for automatically deriving clinically significant tissue morphology metrics from raw medical images by using powerful tools that were introduced in SAS® Viya® 3.5. Specifically, it shows how you can load medical images and metadata, preprocess the loaded data, build convolutional neural network models for automatic segmentation, and postprocess the results to compute clinically significant 2-D and 3-D morphological metrics. The examples include colorectal liver metastases morphometry in collaboration with the Amsterdam University Medical Center, and normal spinal cord morphometry with data available from the Cancer Imaging Archive, both based on 3-D CT scans.
Courtney Ambrozic, Joost Huiskens, and Fijoy Vadakkumpadan, SAS Institute
Session 4755-2020
Metrics and Medical Writing - We Don't Just Count Words
This presentation introduces the use of various document-specific metrics in support of business objectives and productivity. Topics address within-document metrics as well as group and departmental approaches to quantifying the value added to the drug development process through medical writing expertise.
Daniel Augusto Muriel Ramirez, MSD, Merck & Co.
Dominic De Bellis, PhD, Merck & Co., Inc.
Session 4457-2020
Migrating from SAS® 9.4 to SAS® Viya® 3.5
SAS® Viya® brings with it a wealth of new analytic and data management capabilities, as well as a resilient, scalable, and open architecture. Users of SAS® 9.4 need an easy path to migrate their workloads to SAS Viya in order to leverage these capabilities. In this paper, we discuss your options for adopting SAS Viya to complement, and in some cases replace, a SAS 9.4 environment. The options fall into three general categories: interoperability, content promotion, and replacement. We discuss how SAS 9.4 clients can interoperate with SAS Viya analytic servers. Further, we describe how you can promote content like library definitions and SAS Visual Analytics reports from SAS 9.4 to SAS Viya, all the while taking advantage of backward compatibility as you bring SAS programs forward. Finally, we cover scenarios in which you can completely replace SAS 9.4 workloads with equivalent SAS Viya product support. Along the way, we highlight SAS® tools that can ease the migration process. For example, the content assessment tool profiles your SAS 9.4 metadata and helps you determine which content is ready for promotion to SAS Viya. SAS® Management Console creates packages that feed into the SAS Viya transfer framework. And, if you've already deployed SAS Viya, its backup and restore utility supports migration from one SAS Viya environment to another.
Mark Schneider and Susan Pearsall, SAS Institute
Session 4336-2020
Minimizing Environmental Disturbances: Applying Leave No Trace Principles to SAS® Programming
It is essential that all components behave appropriately when integrating multiple programs and macros. Although they might have worked fine in their original context, issues can come up in a reuse scenario as the result of either expectations of certain state conditions in the SAS® environment that are suddenly not in effect, or because some other program has altered the state of the SAS environment. The "Leave No Trace Seven Principles", which focus on outdoor ethics, behavior, and cleanliness, can also be applied to SAS programming. These principles help establish a coding standard that contributes positively to the business continuum, to a business process modernization, or to a migration strategy within a customer environment. This paper presents this rubric, as well as a macro that can be used to create and compare environmental snapshots, in order to help you quickly identify, isolate, and understand the before and after state of your environment.
Carl Sommer, SAS Institute Inc.
Session 4937-2020
Mixed-Effects Models and Complex Survey Data with the GLIMMIX Procedure
Longitudinal data with repeated measures are of great interest in clinical and epidemiology research and are often analyzed using a mixed-effects model. These data are frequently collected through complex survey methods. Thus, the survey features (probability weights, clusters, and strata) should be included in the analysis to obtain accurate population-based estimates. Fitting mixed models with survey data is still an area of active research, and currently there is no survey analysis procedure in SAS® for mixed models. As a workaround, researchers can use some of the options available in the SAS GLIMMIX procedure. However, when there are many strata or clusters, the model tends to be computationally intensive and complicated to optimize. In this paper, we explore different methods to account for survey features in mixed models using PROC GLIMMIX. We compare the parameter estimates, 95% confidence intervals, and running time among models, and make some recommendations on the specification of the final model. We illustrate the proposed methods with the Health and Retirement Study survey data. The data analysis for this paper was generated using SAS/STAT® 15.1 for Microsoft Windows.
L. Grisell Diaz-Ramirez, Kenneth E. Covinsky, and W. John Boscardin, University of California, San Francisco
Bocheng Jing, Northern California Institute for Research and Education
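The kind of workaround the abstract alludes to can be sketched with GLIMMIX's WEIGHT and RANDOM statements. This is a generic, simplified illustration (the data set and variable names are hypothetical, and it is not one of the specific models compared in the paper):

```sas
/* Sketch: weighted mixed model with a cluster-level random intercept.
   hrs, outcome, age, wave, normwt, and psu are assumed names. */
proc glimmix data=hrs method=rspl empirical=classical;
   class psu;
   weight normwt;                   /* normalized probability weight */
   model outcome = age wave / solution dist=normal;
   random intercept / subject=psu;  /* random effect for each cluster */
run;
```

The EMPIRICAL= option requests sandwich (robust) standard errors, one common device for acknowledging the survey design; the paper's point is that choices like these must be compared carefully, since GLIMMIX is not a survey procedure.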
Session 4290-2020
Model Governance for AI: Rethinking Our Approach to Validating and Using Black Box Models
Model governance, including model validation, is a well-practiced discipline. The traditional approaches have served the industry well over the years. These typically involve activities such as backtesting, performing methodology and documentation reviews, benchmarking models, and performing qualitative assessments. A central part of this governance process is model explainability, where knowing the exact variables and their weights is considered key. In the emerging world of artificial intelligence (AI) and machine learning models, it is often very difficult to understand model components, and traditional approaches might not be ideal. Such new models, known as black box models, require a rethink of how we approach model governance. This paper proposes a new framework for model governance that does not depend on knowing specific details about what is inside the model. Such an approach can enable institutions, particularly regulated financial ones, to use black box models with confidence.
Naeem Siddiqi, SAS Institute
Session 4697-2020
Model-Based Risk-Adjustment in Clinical Outcome Research
To compare clinical outcomes across patient groups or health providers, we need to account for the variations from patient-level demographics and clinical treatments. The model-based risk-adjustment approach provides a strategy for controlling these variations and is commonly used in hospital report cards. In this paper, we focus on the concept of model-based risk-adjusted outcomes using different regression models, including the multilevel hierarchical model, according to the type of outcome.
Jiming Fang and Feng Qiu, ICES
Session 4181-2020
Modeling of Between- and Within-Subject Variances Using Mixed-Effects Location Scale (MELS) Models
Intensive longitudinal data are increasingly obtained in health studies to examine subjective experiences within changing environmental contexts. Such studies often use ecological momentary assessment (EMA) and/or experience sampling methods to obtain up to 30 or 40 observations for each subject within a period of a week or so. In this presentation, we focus on data from an adolescent smoking study using ecological momentary assessment in which there was interest in examining mood variation associated with smoking. We describe the mixed-effects location scale (MELS) model, which allows covariates to influence both the within-subject (WS) and between-subject (BS) variances, in addition to their influence on the mean. The model also adds a subject-level random effect to the within-subject variance specification. This permits subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their mood responses. Additionally, we allow the location and scale random effects to be correlated. These mixed-effects location scale models have useful applications in many research areas where interest centers on the joint modeling of the mean and variance structure. In this presentation, we describe how the NLMIXED and MCMC procedures can be used to estimate the parameters of the MELS model.
Qianheng Ma and Donald Hedeker, University of Chicago
Session 5021-2020
Modelling Imbalanced Classes
In this study, separate sampling was applied to various modeling procedures to assist in the identification of the most important variables describing mobile phone users who are security compliant. Initial analysis of the data found that only 7% of mobile phone users reported applying security measures to protect their phones and/or their personal information stored on their devices. Due to the class imbalance in the target variable, predictive modeling procedures failed to produce accurate models. Separate sampling proportions were introduced to establish whether classification accuracy could be improved. This study tested target class oversampling ratios of 20%, 30%, 40%, and 50% and compared the results of the models fitted on these data sets to those fitted on the original data where no separate sampling was applied. Models fitted included decision trees, five-fold cross-validated decision trees, logistic regression, neural networks, and gradient boosted decision trees. The results showed that the logistic regression, neural network, and gradient boosted decision tree models produced unstable models regardless of the target class ratios. More stable models, however, were reported for the decision trees and five-fold cross-validated decision trees. Variables found to influence mobile security compliance included age, gender, and various security/privacy-related behaviors.
Humphrey Brydon, University of the Western Cape
Session 4359-2020
Modernize your SAS® Visual Analytics Environment
In recent releases of SAS Visual Analytics on SAS® Viya®, many reporting and dashboarding capabilities have been introduced that make the move to the current version of SAS Visual Analytics very attractive. Because the underlying architecture has changed, this move has to be prepared carefully. We take a closer look not only at how you can promote SAS Visual Analytics reports from earlier releases to the current version, but also at which other object types can be converted to SAS Viya. These include SAS® Visual Data Builder queries, information maps, SAS® Stored Processes, and SAS® OLAP cubes. After attending this session, you should have a comprehensive overview of which parts of your SAS®9 applications are ready to be promoted to SAS Visual Analytics on SAS Viya.
Gregor Herrmann, SAS Institute
Session 4179-2020
Modernizing Credit Risk Analytics: From Risk Management for Banking into Stress Testing
Due to the continuous evolution of regulations issued around the world and the constant advancements in methodology, the analyses of credit, market, and operational risk are complex topics that require tools to be updated during development, implementation, and even during the evaluation phase itself. These tools need to be customized to the specific needs of a variety of financial institutions, and yet they need to leverage a common platform containing common components needed for the analysis. SAS® Solution for Stress Testing is a powerful and highly customizable environment where several areas of interest (expected credit loss, risk-weighted assets, economic capital, and more) can be evaluated. This paper presents some of the details about the evolution of credit risk analytics from SAS® Risk Management for Banking into SAS Solution for Stress Testing.
Christian Macaro, Rocco Cannizzaro, and Satish Garla, SAS Institute
Session 4950-2020
Monitor Assignment for Students with Disabilities with SAS: Boston Public Schools
A subset of students with disabilities in the Boston Public Schools (BPS) system require a designated monitor (supervisor) to ride the school bus with them. Monitors can ride several bus routes in a given day as long as predetermined rules are satisfied. BPS constructs packages of routes for monitors to make their bids at the beginning of the academic year, with a goal of maximizing the number of routes per package. For a given academic year, BPS manages approximately 3,500 routes scheduled for 625 buses and about 1,350 students requiring different types of monitors. Given the high complexity of this combinatorial problem, an automated system is required in order to solve the problem optimally while satisfying all rules. In this presentation, we discuss the different mathematical formulations and algorithms that were implemented to solve this problem, and we assess the pros and cons of each approach.
Angela Zhang, Boston Public Schools; Nabaruna Karmakar, Natalia Summerville, Rob Pratt, and Golbarg Tutunchi, SAS Institute
Session 5012-2020
MOVIE REVIEWS: TO READ OR NOT TO READ! Spoiler Detection with Applied Machine Learning
Would you still watch The Avengers if you found out that Iron Man dies at the end? Would you watch the final season of Game of Thrones if somebody told you Bran was going to be the king of the North? Highly unlikely! Given that many invest a great deal of time and emotion into movies and TV shows, having the experience ruined by otherwise harmless scrolling through Facebook can be frustrating. Having always been a movie lover myself, I have experienced first-hand how spoilers can take away from the enjoyment of a movie. Along with being detrimental to viewer experience, these revelations may also hurt the entertainment industry by causing people to lose interest in a particular film, resulting in the loss of revenue for the filmmakers.
Sreejita Biswas and Goutam Chakraborty, Oklahoma State University
Session 5142-2020
Moving from Messy Data to a Clean Analytic Dataset: Common Data Management Techniques Using SAS®
Despite the amount of quantitative research that exists in the social and behavioral sciences, many graduate programs do not offer classes focused on the multitude of steps necessary to manage quantitative data sets. Instead, this skill is often learned through trial and error, with the beginning SAS® user having to use multiple resources, including but not limited to the plethora of proceedings papers from SAS® Global Forum and regional users groups, SAS publications, and other SAS-friendly resources such as UCLA's Institute for Digital Research and Education. Although these resources are incredibly useful when a SAS user knows what procedure he or she needs, they are less useful for the novice analyst who does not know where to begin. The focus of this presentation is to help guide the novice user through common data management techniques to transform raw, messy data into an analytic data set. The presentation discusses data management processes as basic as getting an external data set into SAS and techniques as advanced as using a macro to examine missing data mechanisms and using PROC SURVEYSELECT to split the data into an exploratory sample and a holdout sample. We illustrate the various processes using Wave 1 public use data from the National Longitudinal Study of Adolescent to Adult Health (Add Health).
Bethany Bell, Raymond B. Smith, and Jason A. Schoeneberger, ICF International; University of South Carolina
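The exploratory/holdout split the abstract mentions can be sketched with PROC SURVEYSELECT's OUTALL option, which flags every observation rather than subsetting. This is a generic illustration with assumed names (ANALYTIC as the input data set; the seed and 70/30 rate are arbitrary):

```sas
/* Sketch: flag a 70% simple random sample, then split on the flag */
proc surveyselect data=analytic out=split
                  method=srs samprate=0.7 outall seed=20200329;
run;

data explore holdout;
   set split;
   if selected then output explore;   /* Selected=1: exploratory sample */
   else output holdout;               /* Selected=0: holdout sample */
run;
```

Setting a SEED= value makes the split reproducible, which matters when the exploratory and holdout analyses are run at different times.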
Session 4180-2020
Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data
Sentiment analysis is a widely studied natural language processing task, whose goal is to determine users' opinions, emotions, and evaluations of a product, entity, or service that they review. One of the biggest challenges for sentiment analysis is that it is highly language-dependent. Word embeddings, sentiment lexicons, and even annotated data are language-specific. Furthermore, optimizing models for each language is time-consuming and labor-intensive, especially for recurrent neural network (RNN) models. From a resource perspective, it is very challenging to collect data for different languages. In this paper, we look for an answer to the following research question: Can a sentiment analysis model that is trained on one language be reused for sentiment analysis in other languages where the data are more limited? Our goal is to build a single model in the language with the largest data set available for the task and to reuse that model for languages that have limited resources. For this purpose, we use reviews in English to train a sentiment analysis model by using recurrent neural networks. We then translate those reviews into other languages and reuse the model to evaluate the sentiments. Experimental results show that our robust approach of training a single model on English-language reviews outperforms the baselines in several different languages.
Ethem Can and Aysu Ezen-Can, SAS Institute
Session 4853-2020
Multinomial vs. Ordinal. Does model selection make a difference?
Regression modeling, a foundational component of data analysis and machine learning, is one of the most highly sought-after skills by employers seeking new data scientists. While most data science curricula tend to include regression modeling techniques, the conceptual nuances between theoretical and practical applications can be nebulous. In this paper, we use SAS 9.4 and the 2016 Monitoring Futures Survey to demonstrate the utility and effects of model selection on context, that is, the ability of the model to properly communicate the user's intended information. A cross-sectional secondary analysis using Pearson's chi-square test statistic for independence was conducted on the 2016 Monitoring Futures Study dataset to determine the association between "behavior risk" (v7335) and each of the three predictor variables: "parental communication" (v7254), "time spent alone after school" (loner), and student letter grades (v7221). Next, two logistic models were computed using these predictors. Statistically significant associations were identified, and adjusted odds ratios were produced using both multinomial and ordinal regression models. Output from both models was evaluated and compared to demonstrate the utility of ordinal modeling (and output) as a "higher-view" and generalizing procedure, whereas multinomial models are more appropriate for a more detailed view of group-level comparisons.
Corey Leadbeater, 8People, Inc. and National University
Session 4703-2020
My Sharky Secrets for Telling Fabulous Data Stories
One of the hardest things about presenting data is capturing people's attention! Numbers might prove your point, but how do you get others to care about them? Simple: use data storytelling! Data storytelling enables you to mix narratives with data so that you can maximize your impact. Data storytelling is a popular way to present data. However, many data professionals don't understand the value of using these methods when presenting data. By keeping the message focused, considering the audience, and using a convincing narrative, data storytellers engage and move listeners to act. This powerful technique will help you clearly communicate the business insights found in your data. And, in doing so, you can enable decision-making and create a lasting, positive impact on your organization. Let's make you a data storyteller today!
Tricia Aanderud, Boston Scientific
Jaime D'Agord, Zencos
N
Session 4493-2020
Neural-Network Based Forecasting Strategies in SAS® Viya®
Recent literature indicates that hybrids of machine learning and classical time series models are among the top contenders in accurately forecasting the future. Classical linear models are parsimonious and often perform well, but they are unable to capture nonlinear relationships in the data. On the other hand, machine learning models such as neural networks (NNs) are very good at modeling nonlinear effects. Knowing when and how to use machine learning models might seem difficult, but these decisions can be distilled down to best practices that any analyst can use with little experience. This paper discusses several NN-based modeling strategies available in SAS® Visual Forecasting software and the important factors to consider in choosing and training a model. The discussion includes key features of the data that inform the decision to use machine learning models, feature generation options to augment the training process, and best practices to fit a robust model. This knowledge will enable you to leverage the advantages of both NN and linear models to achieve more powerful forecasts.
Steven C. Mills, SAS Institute
Session 4546-2020
Next Steps: Important Considerations for Moving Your Data and Formats into CAS
Now that your SAS® Cloud Analytic Services (CAS) environment is available, you are ready to start moving your SAS data sets and formats from SAS® 9.4 to CAS. However, there are a number of considerations to think about before, during, and after you move your data and formats. This paper, written for the beginner CAS user, focuses on three main considerations in the context of examples that illustrate how to load SAS data sets and formats. The first consideration that is discussed is the likely change in character encoding from LATIN1 or WLATIN1 to UTF-8 and how to prevent possible truncation of data. Next, the advantages of loading data into CAS in parallel versus serially is discussed. After you load your data, you might see that the size of the loaded table is larger than the source data set. This paper explains how the increase in size affects memory and how to reduce the size of a loaded table. The most effective way to discuss these considerations is to put them in the context of examples that show how to move your SAS data sets and formats to CAS. This paper covers two main examples. The first example shows how to use the CASUTIL procedure to load a data set. The second example illustrates how to move a format catalog to CAS by using the CNTLOUT= and CNTLIN= options in the FORMAT procedure statement. This example also shows how to save the format so that the format is available for subsequent CAS sessions.
Kevin Russell, SAS Institute
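The two examples the abstract describes can be sketched as follows. This is a simplified, hedged illustration (the session name mysess, the caslib casuser, and the format library name casfmts are assumptions), not the paper's full treatment of encoding, parallel loading, or table size:

```sas
/* Start a CAS session and assign librefs for existing caslibs */
cas mysess;
caslib _all_ assign;

/* Example 1: serial load of a SAS data set into CAS with PROC CASUTIL */
proc casutil;
   load data=sashelp.cars outcaslib="casuser" casout="cars" replace;
quit;

/* Example 2: move a format catalog to CAS via CNTLOUT= and CNTLIN= */
proc format library=work.formats cntlout=work.fmtout;  /* catalog -> data set */
run;
proc format cntlin=work.fmtout casfmtlib="casfmts";    /* data set -> CAS formats */
run;
```

After defining formats in a CAS format library, the paper notes they can be saved so that they remain available to subsequent CAS sessions; consult the paper for the encoding and truncation caveats that apply before loading.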
Session 4429-2020
NLP with BERT: Sentiment Analysis Using SAS® Deep Learning and DLPy
A revolution is taking place in natural language processing (NLP) as a result of two ideas. The first idea is that pretraining a deep neural network as a language model is a good starting point for a range of NLP tasks. These networks can be augmented (layers can be added or dropped) and then fine-tuned with transfer learning for specific NLP tasks. The second idea involves a paradigm shift away from traditional recurrent neural networks (RNNs) and toward deep neural networks based on Transformer building blocks. One architecture that embodies these ideas is Bidirectional Encoder Representations from Transformers (BERT). BERT and its variants have been at or near the top of the leaderboard for many traditional NLP tasks, such as the general language understanding evaluation (GLUE) benchmarks. This paper provides an overview of BERT and shows how you can create your own BERT model by using SAS® Deep Learning and the SAS DLPy Python package. It illustrates the effectiveness of BERT by performing sentiment analysis on unstructured product reviews submitted to Amazon.
Doug Cairns and Xiangxiang Meng, SAS Institute
Session 4509-2020
Noninvasive Beehive Monitoring through Acoustic Data Using SAS® Event Stream Processing and SAS® Viya®
Honey bees are critical pollinators; their demise would probably be disastrous for human beings. Thus, maintaining healthy bee colonies is vitally important. Beekeepers usually monitor the status of beehives by performing manual examinations in order to check whether the queen bee is missing or to look for any other potential problems. But not only are manual hive inspections time-consuming, they are also disruptive to the colony. Research in computational bioacoustics has discovered connections between the sounds in the hive and different behaviors of bees. Yet, to date, no automatic acoustic monitoring system has been completely successful. Acoustic data from beehives are messy. With many thousands of bees performing an array of time-varying tasks, and with external sounds from birds, crickets, cars, trains, and other sources, it is difficult to make the link between sound recorded in a beehive and hive health. This paper shares our progress in developing a bioacoustic monitoring system. Along the way, we illustrate the usefulness of the digital signal processing tools and machine learning algorithms available in SAS® Event Stream Processing software and SAS® Viya® to noninvasively monitor the real-time condition of a beehive. The technical aspects of the acoustic monitoring system described here are part of a larger effort at SAS to monitor the four beehives at the Cary, North Carolina, campus headquarters with many different sensors.
Yuwei Liao, Anya McGuirk, Byron Biggs, Arin Chaudhuri, Allen Langlois, and Vince Deters, SAS Institute
O
Session 4653-2020
Obtaining National Readmission Estimates: Examining Readmission Rates in the United States
In the past several years, there has been increased focus from healthcare stakeholders in reducing the rate of inpatient hospital readmissions. Policies, including the Hospital Readmissions Reduction Program (HRRP) from the Centers for Medicare & Medicaid Services (CMS), have placed financial penalties on hospitals with excessive readmission rates. The Nationwide Readmissions Database (NRD) is a comprehensive nationwide database that enables users to obtain population-based national estimates of readmissions. With a claims-based data structure, this database allows for analysis of readmission trends, analysis of primary cause of readmission, or even analysis of time to readmission. This E-Poster shows how to obtain population-based estimates and trends by using both PROC SQL and the SAS/STAT® survey sampling and analysis procedures. Emphasis was placed on working with the claims-based data structure of the NRD and obtaining accurate estimates of standard errors for readmission rates. Examples involve readmissions in thyroidectomies, one of the most commonly performed procedures, as well as transcatheter aortic valve replacement (TAVR), a newly developed approach to heart valve replacement.
Patrick Karabon, Alexander Balinski, and Rachel Patterson, Oakland University William Beaumont School of Medicine
Session 4526-2020
ODS: It's Not Just for Tables Anymore. Using Formatted Text and Lists in Your Reports
You're used to seeing tables in your ODS reports, but maybe you need that extra something to drive your results home. Did you know that you can add formatted text and lists to your report? With PROC ODSLIST and PROC ODSTEXT, you can add richer descriptions to your reports, or even generate data-driven lists and blocks of text. The examples in this paper demonstrate several ways to display your output. Whether you need to create a form letter or an infographic, PROC ODSLIST and PROC ODSTEXT can help you generate the report that you want.
Scott Huntley, SAS Institute
Session 4708-2020
Open Source Python & R Lang on our SAS Shared Grid (SSG)
In November 2018, we received a requirement to enable open-source Python and R on our SAS Shared Grid environment to provide enhanced functionality for our data science community. Our platform is one of the biggest SAS environments in the world. In May 2019, the functionality was enabled on our SAS platform following successful user acceptance testing. This functionality, along with SAS's interoperability with other data source types, has provided new opportunities for our business users.
Thomas Ball, Christopher Hughes, NatWest Group
Session 4402-2020
Open-Source Model Management with SAS® Model Manager
Open-source models that are developed in Python, R, TensorFlow, and so on, are increasingly important to organizations that produce and deploy analytical and machine learning models. Not only are the models created using open-source tools, they are deployed to open-source environments that use Docker and Kubernetes in place of more traditional environments. SAS® Model Manager is evolving to be a management platform that handles traditional SAS models and open-source models as equal partners. This paper discusses strategies for managing the life cycles of Python, R, and TensorFlow models using SAS Model Manager.
Glenn Clingroth, Hongjie Xin, and Scott Lindauer, SAS Institute
Session 4978-2020
Optimal Use of Extended Data Types, Memory Mapping in SAS Viya 3.3
The major challenge for business users these days is optimally ingesting data in terms of size, time, and performance. This presentation focuses on the optimal use of SAS® Cloud Analytic Services (CAS) and how SAS® Viya® users can benefit from a set of rules to achieve their goals in the most efficient manner. A cognitive approach to data ingestion was used in SAS Viya, which can handle the data bloating and load time and in effect improves the performance of SAS® Visual Analytics in SAS Viya. To overcome data-related challenges, we leveraged capabilities of SAS Viya, such as powerful new table indexing, block mapping, memory mapping, extended data types, and native data formats, that can significantly improve performance for data handling and for analytics actions submitted to CAS. This presentation highlights the performance benefits obtained using these approaches: for example, data are compressed from 180 GB to 60 GB; CPU and elapsed time reductions are reported by measuring report performance; and CPU times without mapping can be up to three orders of magnitude greater than with mapping, which in effect reduces operational costs. The following examples are included: optimal use of extended data types; indexing of appropriate variables while loading data into CAS; optimal use of hdat conversion; memory mapping and block mapping; handling special characters in the data; indexing in SAS Viya versus SAS® 9.4; and performance comparisons.
Saurabh Tripathi, Pranay Barua, Ayush Tiwari, Core Compete
Session 4582-2020
Optimize SAS® Viya®: Dividing Load through the Use of Multiple SAS® Cloud Analytic Services Servers
Analytic workloads mixed with data preparation and visualization can put SAS® Viya® to the test. Learn how to optimize your investment and orchestrate different workloads by leveraging multiple SAS® Cloud Analytic Services (CAS) servers. In this paper, we explain setting up and administrating additional CAS servers to keep processes optimized. Once the CAS servers are set up, we cover data administration and securing the data for the intended workload. Orchestrating the workload on SAS Viya keeps your general SAS® Visual Analytics report consumers and machine learning data scientists from stepping on each other's toes.
Brandon Kirk, Jerry Read, Jason Shoffner, SAS Institute
Session 4301-2020
Optimizing Collection Strategies with SAS Intelligent Decisioning
Debt collection is an important part of asset resolution and credit loss mitigation for many institutions. Traditionally, collection decisions are based primarily on delinquency status, product type, and collateral type. In many cases, this can be a complicated and time-consuming process. As a result, recovery rates can be lower than expected due to lower collection revenue and high recovery costs. In this paper, we introduce a new collection decision process that identifies the optimal collection channel and action using recovery action scores. The recovery scores estimate the "expected payoff" based on both customer-level risk models and channel propensity models that incorporate cost variables. Micro-segments for customers are further developed to assist the mapping of specific actions to different customers. The approach is simple and intuitive, and we illustrate it with a case study, applicable to both financial services and telco, in the SAS® Intelligent Decisioning solution environment. This new action/channel scoring approach can help an institution design and optimize its collection strategies in order to improve recovery rates and collection revenue while reducing collection time and costs.
Luiz Kauffmann and Sunny Zhang, SAS Institute
Session 4535-2020
Optimizing Supply Chain Robustness through Simulation and Machine Learning
This paper introduces a supply chain simulator that has been built using SAS® Simulation Studio. The key features of the SAS® simulation technology, which enable the development of digital supply chains and the analysis of thousands of scenarios to perform risk-and-return tradeoff, are discussed. The paper concludes with a description of how computational efficiencies can be achieved through an integrated use of SAS Simulation Studio and SAS® Visual Data Mining and Machine Learning.
Bahar Biller and Jinxin Yi, SAS Institute
Session 4821-2020
Organize and Manage Files by Using SAS® Programming
In daily work, we often save different types of files on physical servers or hard disks. Statistics show that 7.5 percent of an organization's documents are lost entirely, while another 3 percent are misfiled. As time goes by, the servers or hard disks get cluttered, and we may want to regularly organize, archive, or delete some old files to release space. Many applications can do this, SAS® software among them. SAS is not only analytics, business intelligence, and data management software, but also a powerful tool that can make your housekeeping work more efficient. Together with SAS® Management Console Scheduler or UNIX crontab, you can set up an automated process to regularly scan your physical server or hard disk. The process can send you a report of all the files in a directory. It can delete files over a size threshold or files that have not been modified for some number of days. It can remove duplicate files. It can archive files. It can split and merge files. In a nutshell, it can do everything that file management software can do.
Weibo Wang, TELUS Communications
Heyang Li, Montclair State University
P
Session 4230-2020
Parsing - Using SAS When the Data Are Hiding in a Non-Standard Format
Sequential files? Spreadsheets? Databases? There are numerous tutorials that instruct the SAS® user in techniques to extract data from standard sources. Sometimes, however, the desired data is hidden inside a non-standard source; information may be found within the flow of a text document, for example. This presentation will address some techniques that can be used when not dealing with cleanly formatted data, through use of an example where data are found within a free-form text file. It will deal with identifying what can be considered useful data and what can be discarded, then tackle techniques to extract the data for further analysis, reporting, or whatever is the desired result. Please note that this paper covers many of the items covered in "Parsing Useful Data Out of Unusual Formats Using SAS", which the author first wrote and presented approximately 10 years ago. This paper uses a new example for use as a learning tool. It has been modified to use a new approach; this paper introduces concepts and commands when they are needed to deal with the example, rather than the prior technique of providing numerous commands, and then utilizing some of them to solve the example.
Andrew T. Kuligowski, Independent Consultant
Session 4357-2020
Performance Do's and Don'ts for SAS® Visual Analytics
When you ask users of business intelligence applications which factors determine whether they accept a solution, usability and performance are frequent answers. While usability is predefined by the design of the application, performance is determined by a variety of aspects. These range from tuning the compute server and optimizing data structures for reporting to adjusting report design and choosing the browser that runs SAS® Visual Analytics. In this paper, I describe the most important levers that report authors and administrators should look at if they want to improve the overall performance of SAS Visual Analytics.
Gregor Herrmann, SAS Institute
Session 5162-2020
Piecewise Latent Growth Curve Models to Test for Discontinuities in Disease Prevalence Trends
The International Classification of Diseases (ICD) is a standard coding and classification system of diagnoses and health conditions developed by the World Health Organization (WHO). The ICD has changed over time; different versions of the ICD are used in health data repositories of hospital and physician service records that contain diagnosis codes. These changes can result in discontinuities in prevalence estimates for health conditions, such as hypertension, arthritis, and dementia, as the meaning of a condition changes over time. Our research purpose is to demonstrate piecewise latent growth curve models to test for discontinuities in ICD-coded data. We apply these models to health data from one Canadian province (population 1.2 million) for cardiorespiratory health conditions captured in ICD versions 8, 9, and 10. Two transition points are considered as a potential source of discontinuity: the transition from ICD-8 to ICD-9 (Clinical Modification) in 1980 in both hospital and physician records and the transition from ICD-9 to ICD-10 in 2004 in hospital data. We investigated piecewise latent growth curve models with the following characteristics: linear or nonlinear trend within each time segment before and after the transition points, discontinuity with or without an elevation (i.e., change in slope) at the transition point. We also show how SAS® generalized linear mixed model procedures can be used to fit piecewise latent growth curve models and identify the best fitting model. Testing for unexpected changes in chronic disease prevalence estimates over time is important to produce accurate regional and national chronic disease surveillance.
Lin Yan, Stephanie Goguen, Lisa M. Lix, University of Manitoba
Session 4425-2020
Pies and Donuts: A New SAS® ODS Graphics Procedure Dessert
Pie charts are a very common graphical display, particularly when showing part-to-whole comparisons. Previously, you needed to use the Graph Template Language (GTL) to create this display in the SAS® ODS Graphics system. But now, with the SGPIE procedure, pie and donut charts are a "piece of cake" to create! This paper focuses not only on the features of this new procedure, but also on the effective use of pie charts in general.
Dan Heath, SAS Institute
Session 4668-2020
Play More, Achieve More: Training the Next Generation of Data Analysts Through Strategic Gaming
How did you learn to program? Most of us would answer this question by pointing to on-the-job experience. Documentation and instructional videos have ample value, but we all know that the best way to learn something is to jump in and do it. However, building a comprehensive programming skill set can take years' worth of individual experiences, and, with the ever-changing landscape of technical innovation, the expectation for analytical programmers to learn faster is increasing. Recent research indicates that gaming can be an effective way to learn new skills because games encourage an emotional investment that traditional training methods do not. This paper and presentation demonstrate a gaming solution based on SAS® for introducing a new data analyst to the most common tasks they will encounter on the job, including skills every analytical programmer should learn: simulating a data set by following a specification document, the basics of visualizing data, implementing common statistical modeling techniques, and using macros to streamline an analysis, all while being transported to the different 'worlds' of R and Spotfire along the way.
Hillary Graham, Eli Lilly and Company
Session 4368-2020
Network Pattern Match for Identifying Fraud with SAS® Visual Investigator
Network analytics plays an increasingly important part in detecting and investigating criminal activity and regulatory compliance issues. SAS® Visual Investigator provides a user interface that, among other things, facilitates the investigation of network activity. These networks are often a combination of relationships formed through common entities referred to in the data (for example, people, locations, and so on) as well as events that tie these entities together. If complex organized activity is detected, a key question an investigator might have is, "Does this activity occur elsewhere in my data?". SAS® provides the NETWORK procedure to analyze graph data, and at SAS® Global Forum 2019, the SAS Cloud Analytic Services (CAS) patternMatch action was introduced, which executes graph queries. Its functionality enables you to search for copies of a query graph within a larger graph, with the option of respecting node or link attributes (or both). This paper presents a way for SAS Visual Investigator users to dynamically identify a pattern of interest within their existing investigations and, from that pattern, generate alerts for other occurrences of this activity, thereby putting the ability to answer the key question in the hands of the investigator.
James Morris and Nicholas Ablitt, SAS Institute
Session 4676-2020
Post-9/11 GI Bill: Teasing Out Insights about Veterans'
This paper (1) examines the challenges researchers face in using publicly available databases to track veterans' use of GI Bill benefits and their success in earning postsecondary credentials, and (2) presents the methodology and findings of several Veterans Education Success reports that, despite these challenges, teased out important insights on veteran outcomes.
Walter Ochinko, Veterans Education Success
Session 4492-2020
Power to the Report Viewers
In this paper, we explore how you can take the guesswork out of creating reports by giving report consumers the ability to make reports their own. Based on the controls that you give consumers, they can change what they see by doing things like toggling legends and changing chart types. With even more power, they can create self-service queries like filters, ranks, and custom groups, and even change business metrics, all with undo and redo. You'll learn the different levels of control that report authors can provide and all the great ways you can customize a report you're reading just for you.
Atrin Assa, SAS Institute
Session 4293-2020
Practical Geospatial Analysis of Open and Public-Use Data
As a geospatial data scientist, you might want to analyze publicly available data sets from several sources, including kaggle.com, data.gov, and other academic or scientific organizations. Analyzing open and public-use data sets such as these poses many challenges, including disparate data formats (ESRI shapefile, NetCDF, GeoJSON, and so on), disparate geographic references, and the use of street address data instead of actual spatial units. This paper presents a variety of real-world examples of public-use geospatial data to show you how to transform the data into a prespecified geographic reference, import the data into SAS® Viya® via a Python interface, and select suitable subsets that can be convenient for data analysis. These subsets are then analyzed by using SAS/STAT® procedures KRIGE2D, SPP, and GLIMMIX. The paper also demonstrates how to map the estimates from these procedures by using the Base SAS® mapping procedure SGMAP invoked from Python. Examples in the paper show you how to use Python as a common programming environment to perform data management functions by using the SAS Scripting Wrapper for Analytics Transfer (SWAT) interface to SAS Viya and how to perform statistical analysis by submitting procedure calls via the SASPy interface to SAS software.
Pradeep Mohan, SAS Institute Inc.
Session 4992-2020
Predict Freezes in Natural Gas Pipelines by Using SAS® Machine Learning and Python Scikit-Learn
In 2018, natural gas was the largest source of energy production in the United States, accounting for 31.8% of all energy production. Midstream companies are responsible for transportation and processing of natural gas. One of the biggest problems facing midstream companies is meter freezing. A meter freeze is caused by the presence of natural gas hydrates (a high-pressure form of frozen water and natural gas molecules) in the tap valves, gauge lines, or manifold so that the differential sensing element is effectively no longer connected to the flow line. Because one side of the manifold is likely to freeze before the other, small changes in line pressure on either side of the orifice can create large changes in the indicated differential pressure record. The differential may read either high or low as a result of meter freeze, and the effect may cause substantial measurement error. Currently, data analysts spend hundreds of man-hours looking through data for possible meter freezes, and this manual process is time-consuming, tedious, and error-prone. Thus, the goal of our project is to use advanced machine learning models such as random forests and neural networks to identify meter freezes. We use SAS® Enterprise Miner and Python scikit-learn and compare the results from these two applications. Successful models should be able to correctly classify meter freezes, a result that will eventually save midstream companies millions of dollars in revenue and man-hours.
Bingi Arnold Kanagwa, MS, and Ho-Chang Chae, PhD, University of Central Oklahoma
Session 4944-2020
Predicting Malware Persistence through Windows Registry Behavioral Advanced Cybersecurity Analysis
SAS® Viya®) to evaluate how likely the malware is to survive the restart of an infected operating system. Malware might use genetic and polymorphic obfuscation or code packing, so the behavioral-based approach is an effective way to connect the cybersecurity and data science domains in order to increase the overall level of security and awareness.
Krystian Matusz
Session 4101-2020
Pricing Life Insurance Contracts with Early Exercise Features Using Neural Networks
This paper describes an algorithm based on neural networks to price life insurance contracts embedding American options. We focus on equity-linked contracts with surrender options and terminal guarantees on benefits payable upon death, survival, and surrender. We use the Monte Carlo approach to generate artificial sample paths of the processes. We then use least squares neural network regression estimates, rather than ordinary least squares (OLS) estimates, to estimate the so-called continuation values from these data. To fit the neural network, we use the NEURAL procedure in SAS/IML®. The results from this investigation show that neural network regression estimates provide adequate improvements over the OLS estimates and are thus useful additions for pricing option-embedded insurance products.
Helgard Raubenheimer, Neels Geldenhuys, and Jason Tappan, Centre for Business Mathematics and Informatics, North-West University
Session 5110-2020
Principal Component Analysis Demystified
Have you used or thought of using Principal Component Analysis (PCA) as a feature extraction method in your machine learning pipelines, but wished for a better understanding of what a principal component is and how it's obtained? We take a deep dive into a small dimensional data set, present a visual explanation of the role played by eigenvalues and eigenvectors when PCA is applied, and illustrate how what you start with leads to what you end with, what the advantages are, and what could get lost along the way.
Caroline Walker, Warren Rogers LLC
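The abstract above promises a visual walk through the role of eigenvalues and eigenvectors in PCA on a small data set. As a rough sketch of those same mechanics, here in Python with NumPy rather than the presentation's own materials, and with a made-up two-variable data set:

```python
import numpy as np

# Made-up two-variable data set (10 observations)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
              [1.5, 1.6], [1.1, 0.9]])

Xc = X - X.mean(axis=0)           # center each variable
cov = np.cov(Xc, rowvar=False)    # 2x2 covariance matrix
vals, vecs = np.linalg.eigh(cov)  # eigenvalues/eigenvectors (ascending)
order = np.argsort(vals)[::-1]    # reorder by variance explained
vals, vecs = vals[order], vecs[:, order]

scores = Xc @ vecs                # project data onto principal components
explained = vals / vals.sum()     # proportion of variance per component
print(explained)
```

With two strongly correlated variables, the first component captures nearly all the variance, which is exactly the "what you start with leads to what you end with" point the abstract makes about feature extraction.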
Session 4922-2020
Programmatic Tools for Parallel Processing in SAS®
This poster details three programmatic tools for increasing efficiency while creating consistent output in the SAS®Grid environment. The utility of these tools is detailed with examples from processing administrative healthcare claims data. The first tool that provides consistent output from consistent input data is the SAS macro facility. Macro functions can increase efficiency by reducing both the need to write or copy/paste code and the risk of typographical errors when reusing code. In the SAS Grid environment, users must manually create and save all programs that reference a separate program that contains a macro function. The second tool is another way to leverage the SAS macro facility to create a single program that writes a number of other programs via loops. As with the traditional use of SAS macro functions, this tool enables users to generate consistent results with consistent processes. Autogenerating multiple SAS programs in this manner considerably reduces time otherwise spent manually saving new programs with minor parameter changes. The third tool is a SAS program that can extract pertinent information from multiple logs at once. Several published articles include code that extracts important messages from multiple .log files at the same time. This presentation also features code that extracts timings from batch log files to identify bottlenecks in processing and document program durations for time estimation purposes.
Leah Durbak, The Urban Institute
Session 4942-2020
Propensity Score Matching with Survey Data
Clinicians and healthcare researchers often use nationally representative surveys to conduct observational studies. Propensity score methods are a popular way to draw estimates for key exposures, treatments, and interventions. It is surprising that propensity score methods have not been ubiquitously incorporated in survey-based observational studies. Some authors have proposed incorporating sampling weights as a covariate in the propensity score model, while others have used sampling weights as survey weights, that is, a weighted model. Thus, there are still inconclusive results about which method best accounts for complex survey design features. In our paper, we used five different methods to calculate propensity scores that account differently for survey design features. We used both matching and weighting methods to adjust the outcome model. It was unclear which method of propensity score calculation was better. For binary or survival outcomes, the key exposure appeared to be insignificant in both matching and weighting outcome models.
Bocheng Jing, Northern California Institute for Research and Education L. Grisell Diaz-Ramirez, W. John Boscardin, and Michael A. Steinman, University of California, San Francisco and the San Francisco VA Medical Center
Session 4157-2020
Python and the SAS® Quality Knowledge Base for Better Data Quality and Entity Resolution
Python coders can now leverage the power of the SAS® Quality Knowledge Base and dramatically improve their data quality and data matching results. This session explores the capabilities now available to Python coders and gives coding examples and demonstrations showing how to leverage SAS Quality Knowledge Base capabilities such as parsing, standardization, and match-coding to better prepare data for analytics. Techniques for entity resolution and duplicate elimination are also explored.
Arnold Toporowski, SAS Institute (Canada)
R
Session 5027-2020
Raising the Bar! Developing World-Class Reports with SAS® Visual Analytics®
SAS® Visual Analytics 8.x provides around 40 different visuals out of the box. This gives all users, both novice and advanced, a lot of potential ways to present their data. However, as reporting requirements and dashboards develop, the default visuals might not always be enough. This paper covers three exciting ways to enhance and extend the default visuals in SAS Visual Analytics. The first method starts with a default visual and explores options in SAS Visual Analytics to customize its appearance. The second method uses the custom graph builder to extend the default visuals to represent your data in the required way. The last method showcases how to add a completely new custom visual to SAS Visual Analytics, using the D3.js JavaScript framework for visualization. This paper is for SAS Visual Analytics users of any ability, and the techniques covered are applicable in SAS Visual Analytics 8.x on SAS® Viya®.
Lucy Smith, Amadeus Software Ltd
Session 4654-2020
Rare Events or Non-Convergence with a Binary Outcome? The Power of Firth Regression in PROC LOGISTIC
Rare events and separation are both common analytical challenges encountered when working with a binary variable. Failure of a logistic regression model to converge due to complete separation is a particular challenge. Firth's penalized likelihood is a simple solution that can mitigate the bias caused by rare events in a data set. Invoked by the FIRTH option in PROC LOGISTIC, this method will converge even when there is complete separation in a data set and traditional maximum likelihood (ML) logistic regression cannot be run. The implementation of the Firth method is straightforward in SAS® and has advantages compared to other potential methods, including Fisher's exact test, traditional ML logistic regression, and exact logistic regression. This paper briefly introduces the Firth method and discusses its advantages compared to other methods. In addition, multiple data applications show SAS® users when Firth's penalized likelihood method might be a good analytic strategy. The applications also show how to apply Firth's method and provide comparisons between Firth's method and other methods.
Patrick Karabon, Oakland University, William Beaumont School of Medicine
Session 4657-2020
Reaching Your Potential as a Statistician and Leader
In years past, a popular question in interviews, mentoring sessions, or goal-setting discussions was "Where do you hope to be in five or ten years?" Many statistical leaders will tell you that the path of their career was not what they planned, if there even was a plan. But they can tell you some skills and/or experiences that were instrumental in helping them achieve advancement and success. So, perhaps a better question is "How will you advance your skills so that you can grow as a statistician, take on new roles and challenges, and continue to contribute to your organization and/or profession in a way that is fulfilling and rewarding?" Simply put, "How will you reach your potential as a statistician and leader?" This presentation will provide insights and guidance on this question through personal experiences and leadership study. Concepts that will be discussed include networking, business acumen, strategic thinking, and teamwork. The presentation will also provide ideas for actions you can take to move forward in reaching your potential as a statistical leader.
Gary R. Sullivan, Espirer Consulting
Session 4839-2020
Read Before You Read: Reading, Rewriting & Re-Reading Difficult Delimited Data in a Data Step
Loading delimited data within the DATA step can get interesting/frustrating quickly if you have quirky data. The abrupt appearance of a delimiter in an unquoted character field shifts all following fields to the right; the random removal of one shifts fields to the left. Left unchecked, the line will not get processed correctly, as SAS® attempts to read each separated field using its neighbour's INFORMAT. Thankfully, there is potentially a way to spot issues like these, namely via the "INPUT @" statement. What's more, it may also be possible to correct them on-the-fly by directly modifying the "_INFILE_" automatic variable. This additional coding can be injected into the existing DATA step code such that the original INPUT statement(s) can continue to function properly even when faced with the difficult delimited data. This paper provides an in-depth exploration of the approach outlined above. Readers can immediately test out this concept using the supplied code. Other potential workarounds are also touched upon. After digesting this information, readers will possess another method to ingest raw data elegantly into a SAS dataset.
Michael Chu, TD Bank
Session 4923-2020
Reading in a Comma Delimited File with a Data Dictionary
Comma-separated values (CSV) files are among the most common files that SAS programmers import to create SAS data sets. When using PROC IMPORT, user-defined formats and labels are lost, and the programmer needs to assign informats, formats, and labels to each variable. This process is typically done manually, and it is highly time-consuming if the data set has many variables. It can be automated by storing metadata in an Excel file and writing a simple SAS program to combine the metadata with the raw data. A dynamic approach has been developed for reading large raw CSV files using simple DATA step statements understandable to beginner programmers.
Kalyani Telu, Henry Jackson Foundation
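The metadata-driven idea described above (keep variable types and labels in a data dictionary and apply them while reading the raw file, rather than hand-coding them per variable) can be sketched outside SAS as well. A minimal Python illustration, where the three-variable dictionary and the sample records are invented for this example:

```python
import csv
import io

# Hypothetical data dictionary: variable name -> (type converter, label),
# standing in for the informats/formats/labels stored in the metadata file
data_dict = {
    "id":     (int,   "Subject ID"),
    "weight": (float, "Weight (kg)"),
    "site":   (str,   "Study site"),
}

# Stand-in for the raw CSV file
raw = io.StringIO("id,weight,site\n1,72.5,A\n2,80.1,B\n")

rows = []
for rec in csv.DictReader(raw):
    # Apply each variable's converter from the dictionary,
    # much as an informat governs how a raw field is read
    rows.append({var: conv(rec[var]) for var, (conv, _label) in data_dict.items()})

print(rows)
```

The payoff is the same as in the paper's SAS approach: adding or retyping a variable means editing one metadata entry, not the reading code.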
Session 4940-2020
Real-Time Call Analytics Using SAS® Viya® and Open-Source Platforms
Interactive voice response (IVR) calls in call centers are one of the primary channels through which customers interact with service and product provider companies. Call data is very rich and can play a big role in improving customer experience, for example, by providing real-time agent recommendations and customer feedback. This data can also be used for post-call analytics to help drive effective and efficient handling of future calls and improve agent performance, thereby meeting company service level agreements (SLAs) such as first-call resolution. Customer call interactions can be captured by converting audio transcripts to text. However, very few systems are capable of converting audio to text in real time. SAS® Viya® provides a variety of new capabilities, one of these being voice transcription and its execution in the cloud in real time. SAS Viya also supports interactions with other open-source technologies, which extends machine learning solution development capabilities and supports real-time API integration with ease.
Ranjit Jangam and Anvesh Reddy Minukuri, Comcast
Session 4286-2020
Recent Developments in Survival Analysis with SAS® Software
Are you interested in analyzing lifetime and survival data in SAS® software? SAS/STAT® and SAS® Visual Statistics offer a suite of procedures and survival analysis methods that enable you to overcome a variety of challenges that are frequently encountered in time-to-event data. This paper brings you up to date on new approaches and procedures in SAS software and provides an overview on when to use these procedures to overcome the challenges inherent in conducting survival analysis in today's world. Procedures and methods that you will learn about include performing model selection and model comparison with the PHSELECT procedure, fitting the Fine and Gray model and fitting Bayesian frailty models with the PHREG procedure, analyzing accelerated failure time models with the LIFEREG procedure, and handling interval-censored data with the ICLIFETEST and ICPHREG procedures. You will see how to deal with nonproportional hazards by using the LIFETEST procedure or the RMSTREG procedure.
G. Gordon Brown, SAS Institute
Session 5207-2020
Reducing Ecological Footprint: Examining Panel Data
Degradation of the global environment is occurring at an alarming rate. It is feared by many that the current path of economic development is unsustainable. In this paper, worldwide panel data is explored to develop a model that can be used to find characteristics that are associated with a nation's ecological impact, as measured by Ecological Footprint Accounting. Through panel regression in SAS®, several factors are found that are associated with a reduction in Ecological Footprint among OECD countries after controlling for GDP and population size. These findings point to general policy objectives that can be used by industrialized and industrializing countries to reduce their ecological impact.
Neil Belford, Jordan Humes, Manasi Murde, and Bruce Rehburg, Oklahoma State University
Session 5172-2020
RegExing in SAS for Pattern Matching and Replacement
SAS® has numerous character functions which are very useful for manipulating character fields, but knowing Perl Regular Expressions (RegEx) will help anyone implement complex pattern matching and search-and-replace operations in their programs. Moreover, this skill can be easily portable to other popular languages such as Perl, Python, JavaScript, PHP and more. This presentation will cover the basics of character classes and metacharacters, using them to build regular expressions in simple examples. These samples range from finding simple literals to finding complex string patterns and replacing them, demonstrating that regular expressions are powerful, convenient and easily implemented.
Pratap Singh Kunwar, The EMMES Company, LLC
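As a taste of the portability the abstract describes, the same character classes and metacharacters carry over to Python's `re` module almost unchanged. This illustrative sketch (not from the paper; the phone-number pattern is an assumed example) demonstrates a find-and-replace operation:

```python
import re

# Character classes and metacharacters at work: \d matches a digit,
# {n} repeats it, [-. ] is a class of allowed separators, and the
# parenthesized groups are captured for reuse in the replacement.
phone_pattern = re.compile(r"(\d{3})[-. ](\d{3})[-. ](\d{4})")

text = "Call 919-555-0100 or 800.555.0199 for details."

# Search-and-replace: normalize both numbers to (XXX) XXX-XXXX.
normalized = phone_pattern.sub(r"(\1) \2-\3", text)
print(normalized)  # Call (919) 555-0100 or (800) 555-0199 for details.
```

The equivalent SAS code would use PRXPARSE and PRXCHANGE with the same pattern string, which is the portability point the presentation makes.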
Session 5043-2020
Relationship analysis and customer churn forecasting in a financial cooperative institution
In the new era of information access technologies, the analysis of financial institutions has opened several avenues to explore, in particular for increasing understanding of clients' needs and improving the relationship with customers. In view of this, the objective of the present paper is to analyze client attrition in a financial cooperative, in order to understand the behaviors related to churn and identify its possible occurrence in advance. The analysis followed the steps of the SEMMA methodology, which SAS® Institute defines as a data mining process of Sampling, Exploring, Modifying, Modeling, and Assessing large amounts of data. The analyzed clients were initially separated into three clusters using the K-means algorithm. In each group, predictive models were built with the Random Forest, Decision Tree, and Logistic Regression algorithms. Predictive modeling provided greater understanding of the reasons behind churn incidence. One clear finding in this regard is that constant use of the products offered decreases the probability of a client leaving the institution. In addition, business rules were created and applied in conjunction with the best results achieved with the predictive modeling in order to provide better churn classification.
Carolina Silva, Sicoob
Brunno Sousa Ramos, Brazilian Air Force
Session 4070-2020
Resources for SAS Admins and Architects: What You Need to Know to Get the Job Done Hassle-Free
As a SAS® administrator or architect, your customers expect you to know everything related to SAS. And there's a lot to know. What is available to you, and where do you find all this information? Attend this discussion and you'll walk away with a multitude of resources to help you and your customers succeed with SAS.
Shelley Sessoms, SAS Institute Inc.
Session 4426-2020
REST Just Got Easy with SAS® and PROC HTTP
Hypertext Transfer Protocol (HTTP) is the lifeblood of the web. It is used every time you upload a photo or refresh a web page, and in almost every modern application you use today, from your phone to your TV. A big reason HTTP has become so ubiquitous in modern technology is a software architectural style known as Representational State Transfer (REST). REST has become the standard for interacting with independent systems across the web, which means that the consumption of REST APIs is, or soon will be, mandatory. The HTTP procedure in SAS® has enabled developers to use REST APIs simply and securely for many years now, but the demands of current systems keep increasing. PROC HTTP has received quite a few updates that make this great procedure even easier to use than ever before.
Joseph Henry, SAS Institute
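The anatomy of a REST call that PROC HTTP expresses with its URL= and METHOD= options and a HEADERS statement is the same in any language. This hedged Python sketch (the endpoint and bearer token are hypothetical, not from the paper) shows that shape, constructing the request without sending it:

```python
import json
import urllib.request

# The REST pattern: a resource URL, an HTTP verb, headers, and (for
# writes) a JSON body. Everything below mirrors what PROC HTTP would
# specify declaratively.
body = json.dumps({"name": "example"}).encode("utf-8")

req = urllib.request.Request(
    url="https://api.example.com/items",   # hypothetical endpoint
    data=body,
    method="POST",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <token>",  # placeholder credential
    },
)
print(req.method, req.full_url)  # POST https://api.example.com/items
```

Sending the request (urllib's `urlopen(req)`, or PROC HTTP's execution) would then return a status code and response body to inspect.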
Session 4318-2020
Risk Modeling on the Fast-Track: Pythonistas (and Others) Harness the Power of the SAS® Risk Engine
With the availability of the SAS® Risk Engine on SAS® Viya®, SAS® provides a first-class platform for developers to create and deploy risk models. The SAS Risk Engine is a complete, scalable solution that delivers a comprehensive set of risk models and risk metrics. The SAS Viya version of the engine accelerates development of custom risk applications using Python and other languages. Developers can focus their effort on the specific value-added algorithms related to their business context while leveraging key features from SAS, like risk simulation and stress testing. With the SAS Risk Engine on SAS Viya, the path from designing your model to deploying it in a high-performance environment has never been easier. In this paper, we demonstrate the power of developing risk applications in Python based on the SAS Risk Engine with examples from both market and credit risk.
Dave Stonehouse, Joshua Johnstone, and Katherine Taylor, SAS Institute
Session 4435-2020
RSUB, CLI for SAS® Server Environments
RSUB is a command line interface, written in Java, which takes advantage of SAS® Integration Technologies to fill a gap in SAS 9. NORC at the University of Chicago performed a migration from PC SAS to a clustered SAS server platform without SAS/Grid. This left many users accustomed to performing batch processing without their local SAS executable or the alternatives provided in SAS Grid, such as the bsub and sasgsub commands. Creating the RSUB utility fulfilled the needs of these users, providing them a CLI to use for batch processing and scheduling in third-party enterprise schedulers.
Matthew Kastin, NORC at the University of Chicago
S
Session 4398-2020
Sailing Your Data Lake with Self-Service Data Prep
Did the crystal clear analytics promised by your data lake never materialize? Have you been left with a murky data swamp, fearing what might be lurking beneath the surface? There are many reasons that the promise of data lakes hasn't come to fruition. One of those reasons is the lack of a single comprehensive tool to connect, integrate, transform, and govern the data in a data lake. This tool should be usable by citizen data scientists to deliver true self-service data preparation. It's easy to understand why there hasn't been a single tool that connects, integrates, transforms, and governs data in a data lake. By their nature, data lakes are not homogeneous environments. Any tool working with data in a data lake needs to be able to work with structured, semi-structured, unstructured, and binary data. Integration is the next challenge. Baseline data integration should include basic table manipulation such as joining, appending, subsetting, and so on. Integration should be GUI-driven and not require advanced coding. During and after the data integration process, the data should be transformed to meet data quality standards. Both integration and transformation should be able to leverage artificial intelligence (AI) and machine learning to facilitate and streamline the processes. Finally, the tool should have data governance so that users can know what data they are working with. Self-service data preparation is the tool that can help make data lake dreams a reality.
Peter Baquero, SAS Institute
Session 4675-2020
Sample Size Calculations Using SAS, R, and nQuery Software
A prospective determination of the sample size enables researchers to conduct a study that has the statistical power needed to detect the minimum clinically important difference between treatment groups. With knowledge or assumptions about the study design, dropout rate, variation of the outcome measure, and desired power and alpha levels, the required sample size for a study can be calculated. This paper discusses methods for calculating sample size by hand and through the use of statistical software. It walks through the method for computing sample size using the POWER procedure and the GLMPOWER procedure in SAS® and compares the commands and user interfaces of SAS with R and nQuery software for sample size calculations.
Jenna Cody, Johnson & Johnson
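The by-hand calculation the paper walks through can be sketched with the standard normal-approximation formula for a two-sample comparison. This Python sketch (standard library only, not from the paper; PROC POWER's exact t-based answer will be slightly larger) shows the arithmetic:

```python
from math import ceil
from statistics import NormalDist

def two_sample_n(delta, sd, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison
    of means, using the normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{1-power}) * sd / delta) ** 2
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for two-sided alpha
    z_beta = z(power)            # quantile corresponding to power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Detect a difference of 0.5 standard deviations with 80% power
# at a two-sided alpha of 0.05:
print(two_sample_n(delta=0.5, sd=1.0))  # 63 per group
```

Inflating this n for an expected dropout rate (dividing by the expected completion proportion) is the usual next step, as the abstract notes.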
Session 4439-2020
SAS and Microsoft Office 365: A Programming Approach to Integration
Many of us are now using cloud-based productivity tools like Microsoft Office 365. And some of us are using SAS® software in the cloud, too. For those of us who use SAS to read and create Microsoft Excel documents, cloud-based files can add an extra wrinkle when we automate the process. It also adds some exciting possibilities! The Microsoft Office 365 suite offers APIs to discover, fetch, and update our documents using code. In this paper, I show you how to use SAS programs to reach into your Microsoft OneDrive and SharePoint cloud to read and update your files. And I show you how to leverage the collaborative features of Microsoft Teams to publish useful updates to your colleagues and yourself. The approach relies on the REST APIs in Microsoft Office 365 and on the HTTP procedure in SAS. The paper covers each step, including: • registering a new application in your Microsoft Office 365 account • authentication and access using OAuth2 • using SAS to explore your document folders in OneDrive and SharePoint and import into SAS data sets • using SAS to create new documents in OneDrive and SharePoint • sending rich messages to your teammates in Microsoft Teams. In addition to the detailed steps, this paper references a GitHub repository with code that you can adapt for your own use.
Chris Hemedinger, SAS Institute
Session 4702-2020
SAS Data-Driven Design - How to Develop More Flexible, Configurable, Reusable Software through Data Independence of Control Data
Data-driven design describes software design in which the control logic, program flow, business rules, data models, data mappings, and other dynamic and configurable elements are abstracted to control data that are interpreted by (rather than contained within) code. Thus, data-driven design leverages parameterization and external data structures (including configuration files, control tables, decision tables, data dictionaries, business rules repositories, and other control files) to produce dynamic software functionality. This hands-on workshop introduces real-world scenarios in which the flexibility, configurability, reusability, and maintainability of SAS® software are improved through data-driven design methods, as introduced in the author's 2019 book: SAS® Data-Driven Development: From Abstract Design to Dynamic Functionality. This white paper highlights one of these scenarios, in which malleable comma-separated values (CSV) files, where the variables and their order can vary, are ingested with the aid of a data dictionary control table.
Troy Martin Hughes
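The data-dictionary-driven CSV scenario can be sketched outside SAS as well. This minimal Python analogue (the dictionary and file contents are hypothetical, not from the book) reads columns by name rather than by position, so their order can vary without any code change:

```python
import csv
import io

# Hypothetical data dictionary control table: variable name -> type.
# Because rows are read by header name, not position, the incoming
# columns can arrive in any order (with extras) and the code is unchanged.
data_dictionary = {"id": int, "name": str, "amount": float}

# Simulated malleable CSV: columns reordered, plus an unexpected one.
raw = io.StringIO("name,amount,id,comment\nWidget,9.99,1,ignore me\n")

rows = []
for record in csv.DictReader(raw):
    # Keep only dictionary-defined variables, cast per the dictionary.
    rows.append({var: cast(record[var]) for var, cast in data_dictionary.items()})

print(rows)  # [{'id': 1, 'name': 'Widget', 'amount': 9.99}]
```

The dictionary plays the role of the control table: adding a variable or changing a type is a data edit, not a code edit, which is the essence of data-driven design.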
Session 4347-2020
SAS Intelligent Decisioning: An Approach to High Availability for Real Time Integration
Are you thinking about real-time analytics, integrating your analytics with your business applications? Do you understand the deployment patterns for SAS® Micro Analytic Service? Do you understand high availability with SAS® Viya®? This paper outlines two approaches to implementing high availability for SAS Viya, which is used for real-time analytics using SAS® Intelligent Decisioning. In order to implement high availability, you need to understand the deployment patterns and you need to understand that SAS Intelligent Decisioning can be used for batch and real-time processing. Learn about the architecture considerations, deployment patterns, and an approach to high availability using a shared-nothing deployment architecture pattern.
Michael Goddard, SAS Institute
Session 4941-2020
SAS Stored Process Web Services and Dynamic Data Exchange
The need for a web user interface is becoming more apparent in today's applications. At Prime Therapeutics LLC we paired a Pega® web interface with SAS® through web services and SAS Stored Processes to build an application named Benefit Modeling Tool (BMT). This paper describes the steps we took in building and supporting the BMT application.
Greg Dorfner, Prime Therapeutics LLC; Dwight Buffum, SAS Silver Circle Solutions; Danni Luo, Prime Therapeutics LLC
Session 4102-2020
SAS System Elasticity in Amazon Web Services Using SAS Grid Manager
In many systems, demand for computational resources varies over time. Wouldn't it be nice if the resources available at any given time to a SAS® system, and the associated cost, were scaled based on the demand for resources at that time? Now that can happen when the SAS system lives in Amazon Web Services (AWS), where you pay for only what you use. SAS® Grid Manager for Platform knows how busy your system is and can dynamically grow the system when demand gets close to exceeding availability. And we can shrink the system when demand decreases. The system can grow to a finite upper limit or unbounded. And all this is based on system attributes you define. This presentation describes how a SAS system is configured to make this happen, the AWS infrastructure requirements, and the system attributes used to grow and shrink resources.
Glenn Horton, SAS Institute
Session 4094-2020
SAS Visual Analytics: "Where" Can Tell You "Why"
Everything happens somewhere, and much of our data includes location information. Internet of Things (IoT) sensors often include x,y coordinates in their data, and actions we take using mobile apps can include location information. How are you gaining insight from location information in your dashboards and data explorations? Gain deeper insight with location analytics in SAS® Visual Analytics, which includes capabilities out of the box, as well as access to advanced location analytics capabilities through integration with Esri. With location analytics, you can do the following: enhance your existing data sets by geocoding or enhancing with Esri demographics data; gain insights through dense data; perform travel-time and travel-distance visual analysis; include your multi-layer web maps with your data in SAS Visual Analytics; and more. Attend this session to see how "where" can tell you "why" using location analytics in SAS Visual Analytics.
Robby Powell, SAS Institute
Session 4468-2020
SAS® and Amazon Redshift: Overview of Current Capabilities
Amazon Redshift has become a very popular database management system in modern cloud infrastructures. It provides flexibility and agility, and it can scale easily and save costs. More and more companies are using Redshift as their new data warehousing tool. How does SAS® fit in this context? How do SAS users work efficiently with Redshift data? This paper gives an overview of the current access and integration capabilities between SAS products and Amazon Redshift. We cover this from both SAS® Viya® and SAS® 9 standpoints, including news from the latest releases.
Nicolas Robert, SAS Institute
Session 4837-2020
SAS® BI Platform to Improve Use of Analytics in Higher Education: Do the Stats Match the Intent?
A national trend for all university and college administrators is increased pressure from legislators and other education advocates to increase persistence and graduation rates of all students. Learning analytics (LA) models have been implemented nationwide with the use of Learning Management Systems (LMS) software to identify access patterns of students to support materials, identify those who might be performing below expectations, and to use this information to be more effective in providing educational support. A quick review of data structures and the common metrics employed revealed numerous statistical anomalies that might be problematic in the use of LA and LMS in higher education. For example, one dubious metric identified was "duration", which was an aggregate of the time a student used to complete practice exams. If a student required 50 minutes to complete three practice exams, the information provided was 50 minutes and the final score on the last exam, with no information provided on the number of attempts. A student requiring 20 minutes to score 5/10, then 15 minutes to score 7/10, and finally 5 minutes to obtain 9/10 reflects a very different pattern of progression, one that is important to understand for student persistence. This session presents an integrated data model using SAS® Business Intelligence Platform to improve the accuracy and interpretation of analytics in order to improve student persistence and graduation rates in higher education.
Sean W. Mulvenon, University of Nevada, Las Vegas
Session 4795-2020
SAS® DataFlux® Matchcodes: A Practical Use at the Canada Revenue Agency for Tax Filing Status
To streamline the "data analytics supply chain," the Canada Revenue Agency (CRA), in partnership with SAS®, has established an automated matchcodes program in SAS® DataFlux software, which takes collected web domain records and matches them against existing taxpayer databases in the data lake, using combinations of company or trade name, phone number, and/or postal code. Ultimately, this helps to reveal any non-filers as well as taxpayer risk/audit history, which can be used to derive predictive analytics algorithms, and is much more efficient than manual lookup of company contact info from web pages against the operational taxpayer database. This case study can conceivably be applied in the abstract to multiple types of government functions, such as investigations, law enforcement, or national security, but in this presentation it is from a taxation enforcement perspective.
Jason A. Oliver, Canada Revenue Agency
Session 4994-2020
SAS® Enterprise Miner vs. Scikit-Learn - How do they recommend me good songs?
Making good recommendations is essential for the retail and wholesale markets, and in many cases these suggestions become a competitive differential when aligned with marketing and sales campaigns. Driven by those market trends, big companies like Netflix with its streaming platform, and giants like Amazon and Airbnb, are working hard to improve their recommendations, always seeking customer satisfaction. A recommender system (RS) is software whose models provide item suggestions for its users. Like thousands of Spotify® subscribers, we are used to getting a good custom playlist recommendation every week, but every wrongly recommended song makes us wonder how to make better suggestions, so we decided to consume our data from the Spotify API and recommend our own songs. In this scenario, we develop two recommender systems, the first using SAS® Enterprise Miner and the second using Python scikit-learn, and evaluate the accuracy of both modeling tools. The results were amazing!
Raphael Lima
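At its core, a content-based recommender of the kind described reduces to a similarity score over item features. This toy Python sketch (the feature values are made up, standing in for audio features such as those the Spotify API returns) ranks candidate tracks by cosine similarity to a liked track:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical per-track audio features (e.g., danceability, energy,
# valence); real values would come from the Spotify API.
tracks = {
    "liked_song":  [0.80, 0.90, 0.70],
    "candidate_a": [0.82, 0.88, 0.72],   # similar profile
    "candidate_b": [0.10, 0.20, 0.90],   # very different profile
}

# Score every candidate against the liked track; recommend the best.
scores = {name: cosine(tracks["liked_song"], feats)
          for name, feats in tracks.items() if name != "liked_song"}
best = max(scores, key=scores.get)
print(best)  # candidate_a
```

scikit-learn's `cosine_similarity` computes the same quantity vectorized over a whole feature matrix, which is the practical route for thousands of tracks.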
Session 4673-2020
SAS® Event Stream Processing at the Edge: Reduce or Eliminate the Need to Transmit Data for Analysis
The new frontier is the Intelligent Edge. The intersection of people, places, sensors, things, and computers defines the intelligent edge. HPE tested SAS ESP using NVIDIA Tesla T4 GPUs installed in an HPE EL4000, along with m510 server cartridges. This allows video processing to be performed right at the edge, which reduces or eliminates the need to transmit large amounts of raw data to the data center or cloud for analysis. There will still be times when a subset of summarized data needs to be transmitted to the data center for further analysis. For example, immediate analysis can happen at the edge, but summarized data transmitted from the edge to the data center can then be combined with other like sources to look for trends across a larger sample. These trends could reveal issues where a design update may be required, or show how environmental factors affect equipment in the field. Also, we will show how visual data captured by drones can be used to determine when maintenance is required, rather than performing maintenance on a time basis.
Bob Augustin and Sri Raghavan, Hewlett Packard Enterprise
Session 4765-2020
SAS® Grid Execution with Amazon Web Services (AWS)
Because of cost and maintenance, industries are moving toward cloud adoption. Day by day, companies are pushing their data and processing into the cloud. Having data and processing close together makes data processing efficient. Currently, AWS is one of the key cloud providers in the market. The increasing use of the AWS environment in industry makes it important to establish the connection between AWS and SAS® as well. This paper explains the techniques and configurations that SAS users should consider to access and process AWS files in the SAS Grid environment. Though there are SAS procedures like S3 that can be used in SAS code to read and write files, we need to make other configuration changes, proxy settings, and access checks before we process AWS files on the SAS Grid platform. This paper explains techniques that can help the SAS user process cloud data easily and efficiently in the SAS Grid environment.
Piyush Singh and Gyanendra Shukla, TATA Consultancy Services Ltd.; David Glemaker, SAS Institute, Cary, NC
Session 4577-2020
SAS® Grid Manager and SAS® Viya®: A Strong Relationship
SAS® Grid Manager and SAS® Viya® implement distributed computing according to different computational patterns. They complement each other in providing a highly available and scalable environment to process large volumes of data and produce rapid results. From finding the best way to allocate tens of jobs on multiple machines, to allocating huge amounts of data to quickly analyze in parallel, this paper shows how to architect and implement SAS Grid Manager and SAS Viya to effectively support your business.
Edoardo Riva, SAS Institute
Session 4131-2020
SAS® Grid Manager: Inside Look into Grid-Launched Servers
As a SAS® Grid Manager administrator, you might wonder whether you can improve utilization of compute resources so that your organization has better return on investment. You start SAS® server processes in your grid environment by running server start-up scripts and through application invocation. SAS server configuration in SAS metadata determines how a server is launched. This paper presents best practices for instantiating SAS servers within a grid environment. Grid-launched servers benefit from SAS Grid Manager features including workload balancing, parallel processing, high availability, and flexibility with scaling the architecture.
Murali Srinivasan, SAS Institute
Session 4383-2020
SAS® Knows Good Manufacturing Practice
The two key system requirements for pharmaceutical manufacturers are electronic data storage and statistical process control, and both must comply with the electronic records requirements of Title 21 CFR Part 11. SAS® Life Science Analytics Framework is specifically designed for electronic records and electronic signatures. SAS Life Science Analytics Framework efficiently manages the transformation, analysis, and reporting of life science data. SAS/QC® software, operating within SAS Life Science Analytics Framework, provides the specialized tools required for analysis of manufacturing data. SAS/QC is designed for process and manufacturing engineers, as well as for directors of quality improvement. This presentation shows how these two SAS® products, working together, satisfy the business needs of the pharmaceutical manufacturing industry.
Lois Wright, Andrea Coombs, and Ben Bocchicchio, SAS Institute
Session 4634-2020
SAS® on an Amazon Web Services Data Lake: Enabling Access to Data Lakes on SAS
Modern business applications produce a large variety of data at different volumes and speeds. Traditional data warehouses were not designed to consume and process such large amounts of data. Due to their tightly coupled storage and compute nature, customers end up having to pay for data at rest even if that data is not frequently queried or used. Data lakes provide the ability to store data in raw format, which enables a wide variety of use cases such as real-time analytics, machine learning, and batch processing from different sources such as log files, clickstream events, and so on. SAS® and its integration with the AWS data lake solution (Amazon Simple Storage Service, or Amazon S3) provides users with the flexibility to access data in disparate formats using AWS Glue. AWS Glue is a fully managed service that automates the time-consuming steps of data preparation by automatically discovering and profiling your data into a metadata catalog and transforming it into the target schemas/destinations. This paper provides an overview of data lake components within AWS and how they can be used with SAS for various use cases.
Dilip Rajan, Amazon Web Services
Session 4952-2020
SAS® Options Precedence in SAS® Grid Environment
This paper helps you understand the precedence of SAS® options specified in different configuration files. Multiple configuration script files are deployed with a SAS® Grid installation, and this paper explains the purpose of these scripts and how to edit them to meet new business needs for SAS options that are not set in the default SAS Grid deployment. The code and techniques given in this paper describe how to edit these configuration script files to set the desired options or environment variables. Different kinds of jobs are executed in SAS Grid, and the platform administrator creates different SAS application server contexts for different business requirements or user groups. It becomes important to know how these different SAS application servers are being used (with different SAS jobs). This paper explains how you can use the SAS configuration files to get such information in an easy way by understanding the precedence of SAS options on the SAS Grid platform.
Piyush Singh, Sumit Bhati, and Lavanya Erisetti, TATA Consultancy Services Ltd
Session 4725-2020
SAS® Packages: The Way to Share
When you are working on Base SAS® code, especially when it becomes complex, there is a point in time when you decide to break it into small pieces. You create separate files for macros, formats and informats, and for functions or data, too. You have the code ready and tested, and now it is time to deploy. The thing is, you made it on a local Microsoft Windows machine, and the deployment is on a remote Linux server. Folders and files have to be moved in the proper structure, and code has to be run in the right order and not mixed up. Moreover, it is not you who is deploying... a small challenge, isn't it? How nice it would be to have it all wrapped up in a single file, a portable package, which could be copied and deployed with a one-liner like %loadPackage(MyPackage). In my presentation (and paper) I share an idea of how to create such a SAS package in a fast and convenient way. I present: 1) a concept of how to build a package; 2) the tools required to do so; and 3) a live demo of how the process works on real examples (generating packages, loading them, and using them). The intended audience for the presentation is intermediate SAS users with good knowledge of Base SAS and practice in macro programming who want to learn how to share their code with others.
Bartosz Jablonski, Warsaw University of Technology / Citibank Europe PLC
Session 4666-2020
SAS® Software in a Crisis: Spatial Insights from Mapping a Humanitarian Disaster
Base SAS® software includes powerful tools for spatial analytics that can be used in a variety of circumstances. This case study examines the unfolding of the Ebola crisis in 2018, and the efforts of the Humanitarian OpenStreetMap Team (HOT) community to support aid workers in reaching impacted communities. Using SAS® Output Delivery System graphics procedures to analyze the OpenStreetMap® data, the work undertaken by global volunteers and mapping communities to identify terrain, infrastructure, and housing can be analyzed to show the impact both spatially and over time. This analysis demonstrates techniques that are applicable for any organization using the open data from OpenStreetMap or other mapping data sources.
Michael Matthews, Innotwist Pty Ltd
Session 4537-2020
SAS® Studio Custom Tasks: Tips and Tricks for the Adventurous Task Author
SAS® Studio provides built-in point-and-click tasks for generating and executing complex SAS® code. SAS Studio also enables users to embark on the journey of creating their own interface for their own SAS code, known as a custom task. Building a custom task is easier than you might think. There are great resources available for getting started writing custom tasks: SAS® Communities articles, GitHub examples, free e-learning, and previous SAS® Global Forum papers. But what about those adventurous task authors who have progressed out of the "getting started" phase? This paper focuses on more daring custom task concepts that aren't covered in introductory material. Examples include writing the optional requirements and dependencies sections, creating a multi-step (multiple-task) workflow, incorporating Apache Velocity Template Language code beyond the #foreach, and working with SAS® Cloud Analytic Services (CAS) tables. Join me on this quest to create advanced custom tasks that push the limits and incorporate the features provided by SAS Studio and Apache Velocity Template Language.
Olivia Wright, SAS Institute
Session 4309-2020
SAS® Visual Analytics SDK: Embed SAS Visual Analytics Insights in Your Web Pages and Web Apps
Embedding SAS® Visual Analytics insights in your web pages and web apps lets you share insights through the portals your users regularly access. You already use SAS Visual Analytics to gain insights through drag-and-drop interactions and rich visualizations of your analytics results. And you share those insights with others to collaborate and solve challenges. The next step is embedding your insights in your corporate portal to increase the number of eyes that see what you see. Welcome to the SAS® Visual Analytics SDK! The SAS Visual Analytics SDK is a collection of JavaScript libraries that web developers can use to embed SAS Visual Analytics insights within their web pages and web apps. Add to your web pages and web apps entire SAS Visual Analytics reports or individual objects from your reports. Join this session to learn about the SAS Visual Analytics SDK and see live demos of how easy it is to embed live, interactive SAS Visual Analytics content in your web pages and web apps.
Brad Morris and Robby Powell, SAS Institute
Session 4315-2020
SAS® Viya® 3.5 Kerberos Constrained Delegation: Putting the Dog on a Leash
Do you maintain a SAS® Viya® 3.5 environment leveraging Kerberos for authentication? Are you struggling with the strict security requirements of your IT department? SAS Viya 3.5 introduces support for Kerberos constrained delegation. This paper outlines the additional steps you must complete to correctly implement Kerberos constrained delegation with your SAS Viya 3.5 environment. This implementation enables you to leverage technologies such as Microsoft Windows Defender Credential Guard on your client desktops. Learn about the configuration changes you need to make and learn some effective troubleshooting techniques to get your environment working and make your users happy.
Stuart J Rogers, SAS Institute
Session 4214-2020
SAS® Viya® Monitoring Using Open-Source Tools
Did you know that SAS® Viya® has an event-driven infrastructure that gives you access to a continuous flow of logs, metrics, and other activity? This paper discusses the event-driven architecture of SAS Viya and demonstrates how you can leverage it to send logs, metrics, and events to leading open-source tools. See how to export metrics to Prometheus and set up custom alerts that are triggered when specified thresholds are met or exceeded. Understand how to use Grafana to visualize metrics on dashboards that are customized to your needs. Learn how to send logs and events to Elasticsearch, and how to efficiently filter, search, and report on this data using Kibana.
Bryan Ellington, SAS Institute
Session 818-2020
SAS® Viya®: Autoscaling SAS® Micro Analytic Service (MAS) Nodes Using Compute Virtual Machines on the Google Cloud Platform
So many times we have to create code to handle changes for our suppliers or customers. Usually this requires minor changes to code we already have. If you find yourself in this situation, think about using a configuration file to handle these small changes. Once this is in place, these small changes are easy to accommodate. You can pull in a CSV file of your suppliers' business rules, and you can have a configuration file for each customer's preferences. As you get accustomed to this paradigm, handling these programming changes will become business as usual.
Suvadeep Chatterjee and Sounak Nag, Core Compete
Session 4440-2020
Scalable Cloud-Based Time Series Analysis and Forecasting Using Open Source Software
Many organizations need to process large numbers of time series for analysis, decomposition, forecasting, monitoring, and data mining. The TSMODEL procedure, available in SAS® Visual Forecasting and SAS® Econometrics software, provides a resilient, distributed, and optimized generic time series analysis environment for cloud computing. PROC TSMODEL offers capabilities such as automatic forecast model generation, automatic variable and event selection, automatic model selection, and parameter optimization. It also provides advanced support for time series analysis (in the time domain or in the frequency domain), time series decomposition, time series modeling, signal analysis and anomaly detection (for IoT), and temporal data mining. In addition, PROC TSMODEL supports open-source integration with the external languages Python and R. This paper describes the scripting language that supports cloud-based open-source integration between SAS® software and external languages; examples that demonstrate this use case are provided.
Javier Delgado, Thiago Quirino, and Michael Leonard, SAS Institute
Session 4589-2020
Secret to Successful CECL Model Implementation
The deadline is coming, the deadline is coming! Are your Current Expected Credit Loss (CECL) models ready to go? Financial institutions are required to have CECL models implemented as soon as 2020 or as late as 2023. Whether your model development process just started or your models have been finalized for a while now, the implementation process is still a large project to take on. This process can be just as long and complex as the model development process. So what's a financial institution to do? Fortunately, this paper is written by two model implementation experts. They have seen a wide array of models implemented for a wide range of financial institutions, and they're here to give you the tips and tricks for the most successful model implementation possible, using the SAS® Model Implementation Platform (MIP) component of the SAS® Expected Credit Loss (ECL) Solution.
Monica Wang and Jackie Yanchuck, SAS Institute
Session 4542-2020
Section 508 and Maps: Breaking Down Barriers for People with Visual Impairments or Blindness
Independent access to maps and geospatial data has always been a systemic barrier for people with visual impairments or blindness (VIB). That barrier inhibits our participation in the classroom, on the job, and within our communities. SAS® is working to solve that problem. This paper defines accessibility as it relates to maps and geospatial data, describes initial support for accessible maps in SAS® Graphics Accelerator, and explains how you can use SAS® 9.4 to create maps that are accessible for everyone, including people with VIB.
Ed Summers and Sean Mealin, SAS Institute
Session 5006-2020
Securing SAS® Viya® Access with Single Sign-on and Two-Factor Authentication
A quick look at data breach trends shows that most security breaches involve weak, default, or stolen passwords. Two-factor authentication (2FA) strengthens access security by requiring two methods (also referred to as factors) to verify your identity. These factors can include something you know, like a username and password, plus something you have, like a smartphone app to approve authentication requests. 2FA protects against phishing, social engineering, and password brute-force attacks, and it secures your logins from attackers exploiting weak or stolen credentials. Single sign-on (SSO) is a session and user authentication service that permits an end user to enter one set of login credentials (such as a name and password) and access multiple applications.
Sandeep Grande, Core Compete Inc.
Session 4843-2020
Security Considerations in SAS® Cloud Analytic Services for Google Cloud Platform Data Sources (Google BigQuery, Google Cloud Storage, Google Bigtable)
Customers who are moving into cloud infrastructure, such as Google Cloud Platform (GCP), and are deploying SAS® Viya® often say that they already have a well-established way to manage their users and want to keep using their existing identity management system. In this presentation, we walk you through implementing that approach and discuss the best ways to provision or sync users when using your existing identity management system with GCP and to integrate with SAS Viya through SAML authentication. This will help you consider the security in SAS® Cloud Analytic Services (CAS) for GCP data sources such as Google BigQuery, Google Cloud Storage, and Google Bigtable.
Sanket Mitra, Core Compete
Session 4438-2020
Semi-automatic Feature Engineering from Transactional Data
Transactional data are ubiquitous: whether you are looking at point-of-sale data, weblog data, social media data, genomic sequencing data, text, or even a standard relational database, these data generally come in transactional form. But data mining assumes that all the data are neatly packaged into a single record for each individual being observed. With experts noting that 80% of a data scientist's time is spent in data preparation, is there a way to automate this process to make that job easier? We have developed a toolkit that can be deployed through SAS® Studio and SAS® Data Management Studio software to address this conundrum. Given transactional tables, which represent both categorical and numerical data, you can use this toolkit to pipeline a series of steps (including steps to reduce the cardinality of categorical data, to roll columns up to a subject level, and to generate "embeddings" of those columns at the subject level) in order to prepare the data for the modeling task. In addition, the process automatically creates a scoring script so that you can replicate that pipeline for any additional data that come in.
James A. Cox, Biruk Gebremariam, and Tao Wang, SAS Institute
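The roll-up idea described above is language-agnostic. As a minimal sketch (not the authors' toolkit; the data, category names, and rarity threshold are hypothetical), reducing cardinality and rolling transactions up to one record per subject might look like this in Python:

```python
from collections import Counter, defaultdict

# hypothetical transactional rows: (subject_id, category, amount)
txns = [(1, "grocery", 20.0), (1, "grocery", 35.0), (1, "opera", 90.0),
        (2, "grocery", 12.0), (2, "fuel", 40.0), (2, "fuel", 30.0)]

# step 1: reduce cardinality -- lump rare categories (seen once) into "OTHER"
counts = Counter(cat for _, cat, _ in txns)
lump = {cat: (cat if n >= 2 else "OTHER") for cat, n in counts.items()}

# step 2: roll up to the subject level (sum of amount per lumped category)
rolled = defaultdict(lambda: defaultdict(float))
for subj, cat, amt in txns:
    rolled[subj][lump[cat]] += amt
```

After the roll-up, each subject has a single record of per-category totals, which is the "one record per individual" shape that data mining procedures expect.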
Session 4705-2020
Show Me the Money! Preparing Economics Students for Data Science Careers
Economists make great data scientists. As a discipline, economics offers many valuable skills, such as problem-solving ability and storytelling. When economic theory and deep business acumen combine with applied econometric analytics, plus an obsession with understanding the data-generating process and methods of dealing with dirty data, you have a lot of research savvy. Add a layer of deep SAS® programming and analytics, and you have the beginnings of a great data scientist. We partnered in 2015 with the SAS Global Academic Program to offer a joint certificate in economic data analytics. Every graduate and undergraduate student since then has received the certificate. This paper discusses the methodologies taught and pedagogies used in our program. Students learn programming in a team- and problem-based learning environment, working on real data and real problems. A revision of the undergraduate program in 2003 made a significant commitment to bringing the success of MA students to undergraduates, with much success. Starting in fall 2019, we accept students into an even more powerful curriculum leading to a BBA in Business Data Analytics, requiring analytic courses in economics and business. The paper discusses the state of the saturated field of analytics in higher education and how the University of Akron is adapting to remain competitive and to produce students who are in high demand. The talk highlights student success stories and provides a guide to others on how to prepare students for data science careers.
Steven C. Myers, Ph.D., University of Akron
Session 4647-2020
Simple and Efficient Bootstrap Validation of Predictive Models Using SAS/STAT® Software
Validation is essential for assessing a predictive model's performance with respect to optimism or overfitting. While traditional sample-splitting techniques like cross validation require us to divide our data between model building and model assessment, bootstrap validation enables us to use the full sample for both. This paper demonstrates a simple method for efficiently calculating bootstrap-corrected measures of predictive model performance using SAS/STAT® procedures. While several SAS® procedures have options for automatic cross validation, bootstrap validation requires a more manual process. Examples focus on logistic regression using the LOGISTIC procedure, but these techniques can be readily extended to other procedures and statistical models.
Isaiah Lankham, University of California Office of the President
Matthew Slaughter, Kaiser Permanente Center for Health Research
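The optimism-corrected bootstrap the paper demonstrates with SAS/STAT procedures is not SAS-specific. A minimal Python sketch of the recipe, with a deliberately simple stand-in "model" (a one-threshold classifier, not the paper's logistic regression) so the example stays self-contained:

```python
import random
random.seed(1)

def fit(xs, ys):
    # stand-in "model": choose the threshold maximizing training accuracy
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(t, xs, ys):
    return sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)

# toy data: larger x tends to go with y = 1
ys = [0] * 50 + [1] * 50
xs = [random.gauss(0, 1) + y for y in ys]

apparent = accuracy(fit(xs, ys), xs, ys)   # performance on the full sample

B, optimism = 200, 0.0
for _ in range(B):
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
    t = fit(bx, by)                        # refit on the bootstrap sample
    # optimism = bootstrap-sample performance minus original-sample performance
    optimism += accuracy(t, bx, by) - accuracy(t, xs, ys)
optimism /= B

corrected = apparent - optimism            # bootstrap-corrected performance
```

The key point mirrored from the paper: the full sample is used both to build and to assess the model, and the average optimism across bootstrap refits is subtracted from the apparent performance.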
Session 4400-2020
Simulating Data for Complex Linear Models
One of the core tools of any statistician is working with linear models, from simple or multiple regression models to more complex, generalized linear mixed models. Data simulation enables you to be more comfortable with new types of models, by providing data to a model that will give known results. This paper introduces you to the random number functions in the SAS® DATA step and shows you how to construct programs to simulate data for a variety of models. The paper briefly discusses basic areas of data simulation, such as linear regression. The paper then shows how to generate data for more complex mixed models, including repeated measures models under a variety of within-subject covariance structures. Also, the paper covers generalized linear mixed models like logistic and Poisson. Finally, nonlinear mixed models are discussed. After reading this paper, you should be able to better understand models with difficult parameterizations, especially in the covariance structure.
Phil Gibbs and Kathleen Kiernan, SAS Institute
Session 4116-2020
Simulating Time Series Analysis Using SAS® - Part II
Time series data analysis has applications in many areas, including studying the relationship between wages and house prices, between profits and dividends, and between consumption and GDP. Many analysts erroneously use the framework of linear regression (OLS) models to predict change over time or extrapolate from present conditions to future conditions. Extreme caution is needed in interpreting the results of regression models that are estimated using time series data. Statisticians and analysts working with time series data have uncovered a serious problem with standard analysis techniques applied to time series: the estimation of the parameters of ordinary least squares models. This series of presentations discusses a simple SAS® framework to assist SAS programmers in understanding and modeling univariate time series data. Part II continues the discussion of how to move beyond ADF testing and focuses on examining the long-term relationships (cointegration) among time series variables. A third part of this series will discuss how to develop an error correction model (ECM), a mechanism and concept discussed by many authors, including Granger (1983) and Banerjee et al. (1993), that is used by many analysts to determine short-term deviations of a time series from its long-term equilibrium.
Ismail Mohamed, Federal Housing Finance Agency
Session 4838-2020
Slinging Hash: The HASHING Functions Available in SAS®
This paper provides an overview of the HASHING* functions introduced in release 9.4M6 of the SAS® System. These functions can perform hashing using MD5, SHA1, SHA256, SHA384, SHA512, and even old-school CRC methods, along with HMAC computations.
Rick Langston
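Because the algorithms named above are published standards, SAS HASHING output can be cross-checked against any other implementation. As a language-agnostic illustration (the message and key here are hypothetical), the equivalent computations in Python's standard library:

```python
import hashlib, hmac, zlib

msg = b"abc"

# the same standard digests the SAS HASHING functions compute
digests = {
    "MD5":    hashlib.md5(msg).hexdigest(),
    "SHA1":   hashlib.sha1(msg).hexdigest(),
    "SHA256": hashlib.sha256(msg).hexdigest(),
    "SHA384": hashlib.sha384(msg).hexdigest(),
    "SHA512": hashlib.sha512(msg).hexdigest(),
    "CRC32":  format(zlib.crc32(msg), "08x"),   # old-school CRC method
}

# keyed-hash message authentication code (HMAC) over the same message
mac = hmac.new(b"secret-key", msg, hashlib.sha256).hexdigest()
```

For instance, the MD5 digest of "abc" is the well-known test vector 900150983cd24fb0d6963f7d28e17f72, so matching it confirms two implementations agree.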
Session 4617-2020
Smarter and Faster Self-Service Data Preparation
In this paper, you'll learn about the latest and greatest self-service data preparation capabilities of SAS® Visual Analytics. You will understand how smart suggestions can help you improve the quality of your data, how the new interface can help you work faster, and how better-prepared data can help you build better visualizations, better reports, and tell a more compelling data story.
Atrin Assa, SAS Institute
Session 4609-2020
Smarter and Faster with SAS® Visual Analytics
SAS® Visual Analytics is the smartest business intelligence tool available. Automated explanation, the new name for automated analysis, has been rewritten and redesigned to give you smarter and clearer insights, more interactivity, and easier-to-read explanations. In seconds, you can get the analytical story for the business intelligence hidden in your data that would take you hours to do manually. On top of that, you can automatically see suggested visualizations and identify related measures. For more advanced analytical visualizations, like decision trees, you get human-friendly natural language descriptions, drawing out insights that are easy to digest.
Rick Styll, SAS Institute
Session 4683-2020
Solving Health Problems on the Academic Frontier: Training Students to Use SAS® Through Research
SAS® is a valuable tool for data management and analysis in a variety of settings, and academia is no exception. As an educator of future public health professionals and a researcher with a data-driven epidemiology laboratory, I find it necessary to integrate various analytic tools into my courses and my lab. Students in my research laboratory complete data analysis to help solve large problems related to human and animal health issues, but many do not join the group with backgrounds in data analytics or statistics. It is absolutely imperative that these students get a well-rounded education, and quickly, through real-world opportunities. This presentation discusses our methodology for training students, describes some of the benefits and struggles of training students in this manner, and demonstrates some of the overall competencies our students acquire through this experience.
Charlotte Baker, Virginia Tech
Session 4098-2020
Solving the VAST Challenge: Three Case Studies Using SAS® Visual Analytics
The VAST Challenge is an annual contest that provides a synthetic data set and a series of questions for teams from around the world to test their existing tools against and to push boundaries by developing novel tools and methods. Teams within SAS have dedicated their spare time to create entries to the VAST Challenge for several years. This paper explores three of those entries, showing how SAS® Visual Analytics, other SAS® tools, and open-source software were combined to tackle the challenges, and shows how the challenge has helped uncover areas for growth.
Riley Benson, Lisa Everdyke, and Karl Prewo, SAS Institute
Session 5103-2020
Some_FILE_Magic
The use of the SAS® automatic variable _INFILE_ has been the subject of several published papers. However, discussion of possible uses of the automatic variable _FILE_ has been limited to postings on the SAS-L listserv and on the SAS® Support Communities website. This paper shows several uses of the variable _FILE_, including creating a new variable in a data set by concatenating the formatted values of other variables and recoding variable values.
Mike Zdeb, FSL, University at Albany School of Public Health
Session 4434-2020
Sound Insights: A Pipeline for Information Extraction from Audio Files
Audio files, like other unstructured data, present special challenges for analytics but also an opportunity to discover valuable new insights. For example, technical support or call center recordings can be used for quickly prioritizing product or service improvements based on the voice of the customer. Similarly, audio portions of video recordings can be mined for common topics and widespread concerns. To uncover the value hidden in audio files, you can use a pipeline that starts with the speech-to-text capabilities of SAS® Visual Data Mining and Machine Learning and continues with analysis of unstructured text using SAS® Visual Text Analytics software. This pipeline can be illustrated with data from the Big Ideas talk series at SAS, which gives employees the opportunity to share their ideas in short, TED Talk-type presentations that are recorded on video. If you ever wondered what SAS employees are thinking about when they're not thinking of ways to make SAS products better, the answers lie in a pipeline for information extraction from audio files. You can use this versatile pipeline to discover sound insights from your own audio data.
Dr. Biljana Belamarić Wilsey and Xiaozhuo Cheng, SAS Institute
Session 4648-2020
Statistical Programming in the Pharmaceutical Industry: Advancing and Accelerating Drug Development
In pharmaceutical product development, statistical programming and the analysis of clinical trial data have always played a key role in helping to deliver medicines to patients. Over the last decade, the industry landscape and analysis needs have changed dramatically. Most organizations have shifted from a one-size-fits-all approach to personalized healthcare, where genetics and other biological information influence individual treatment. Clinical trial design and analysis have become more innovative, and more complex, with the use of synthetic control arms, patient-reported outcomes, real-world evidence, and biomarker data. Industry data standards under CDISC have created guidance for most data sources, but novel endpoints might require additional considerations. Statistical programmers and analysts need to ensure that data is FAIR (findable, accessible, interoperable, and reusable); this is critical to the longevity of an organization that prioritizes its data as an asset. SAS continues to lead the industry in data analysis, but its use has also evolved: it can now be used with other programming languages, contributing to interactive visualizations and automation. This influences how and what we use to analyze data. This presentation provides an overview of statistical programming and analysis in the pharmaceutical industry, including how skills and responsibilities have adapted and advanced the impact of bringing medicines to patients sooner.
Harper Forbes, Hoffmann-La Roche Limited
Session 4223-2020
Steer Your Hybrid SAS® Viya®/SAS®9 Ship Toward the 'Governed Data' Port
Imagine you are in charge of a ship marked SAS® Viya® on starboard and SAS®9 on port. The water is data. Navigation challenges: stay afloat, avoid the rocky shores of regulatory scrutiny, and circumvent the shallow waters of customer complaints. This session guides you safely toward the "Governed Data" port. Learn when to steer to SAS®9, and use SAS® Business Data Network as a chart and SAS® Lineage as your tracking system. Find out which SAS® Data Management REST APIs and command-line interface scripts can give you the extra steering. Turn to SAS Viya and accelerate data processing, preparation, and discovery. Improve decisions made with SAS® Intelligent Decisioning. Enter key data governance events in your ship's log, a SAS® Cloud Analytic Services (CAS) table. Watch key indicators on your SAS® Visual Analytics dashboards. Learn the most useful CAS actions and SAS Viya REST APIs. Finally, understand the protection and remediation kits available, in case of an emergency.
Bogdan Teleuca, SAS Institute
Session 5167-2020
Step-by-Step SQL Procedure
PROC SQL is a powerful query language that can sort, summarize, subset, join, and print results all in one step. Users who are continuously improving their analytical processing will benefit from this hands-on workshop. In this paper, participants learn the following elements to master PROC SQL: (1) understand the syntax order in which to submit queries to PROC SQL; (2) summarize data using Boolean operations; (3) manage metadata using dictionary tables; (4) join tables using join conditions such as the inner join and reflexive join; (5) internalize the logical order in which PROC SQL processes queries.
Charu Shankar, SAS Institute
Session 5149-2020
Stepping Up Your SAS® Output Delivery System Graphics Game
Do you want to create eye-catching and informative data displays, but don't have access to SAS® Viya® or SAS® Visual Analytics? Not to worry! With a little bit of extra work and ingenuity, you can use SAS® Output Delivery System (ODS) Graphics, available in Base SAS®, to turn mediocre data displays into high-impact communication tools. This paper provides ideas for combining tables, charts, and formatting elements to step up your ODS Graphics game. It demonstrates how SGPLOT procedure output can be combined with REPORT procedure output into a single visual element and gives examples of how SG annotate functions can enhance standard graphs. These code examples will empower the applied statistician to create more advanced data visualization elements.
Sara Richter, Professional Data Analysts
Session 4635-2020
Survey Data Analysis Made Easy with SAS®
Population-based, representative surveys often incorporate complex methods in data collection, such as oversampling, weighting, stratification, or clustering. Analysis of these data sets using standard procedures (such as the FREQ procedure) results in incorrect estimates and might overstate the statistical significance of results due to the complex survey design factors. However, SAS® survey procedures, such as the SURVEYFREQ and SURVEYMEANS procedures, make it easy to adjust for the complex sample design and weighting of representative surveys. This hands-on workshop (HOW) provides an overview of complex survey design and explains how SAS survey procedures can adjust for complex survey design factors. Attendees learn how to easily generate accurate frequencies, percentages, means, and odds ratios from survey data sets using SAS survey procedures. The workshop provides information about obtaining accurate standard errors and confidence intervals, and demonstrates how to statistically test for differences using chi-square or t-tests. The course also explains how to interpret the output data from the survey procedures and provides examples of SAS code and output. This workshop uses publicly available data from the National Health and Nutrition Examination Survey (NHANES) and the California Health Interview Survey (CHIS) as examples. Attendees have the opportunity to practice using SAS survey procedures on these data sets.
Melanie Dove, UC Davis
Katherine Heck, UC San Francisco
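Why a standard procedure misestimates weighted survey data can be seen with weights alone. A toy Python sketch (hypothetical values) contrasting a naive mean with the design-weighted mean of the kind the SURVEY procedures compute:

```python
# toy respondent records: (value y, sampling weight w); a respondent's
# weight is how many people in the population they stand in for
records = [(1, 10.0), (0, 1.0), (1, 1.0), (0, 1.0)]

# naive estimate, ignoring the survey design (what an ordinary
# FREQ/MEANS-style analysis would effectively compute)
unweighted_mean = sum(y for y, _ in records) / len(records)

# design-weighted estimate: sum(w*y) / sum(w)
weighted_mean = (sum(y * w for y, w in records)
                 / sum(w for _, w in records))
```

With these numbers the naive estimate is 0.5 while the weighted estimate is 11/13 (about 0.85): ignoring oversampling and weighting can badly distort a prevalence estimate, which is exactly why the SURVEY procedures exist.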
Session 4908-2020
Surviving the Cox Proportional Hazards Model with the POWER Procedure
Prior to the release of SAS/STAT® 14.2, power analyses for survival methods were limited to a single option, TWOSAMPLESURVIVAL, within the POWER procedure. The weakness of TWOSAMPLESURVIVAL is its inability to address aspects of survival analysis such as the Cox proportional hazards model. Modeling is a key tool for understanding the association of a given number of covariates with an outcome and thus requires a high level of statistical power. A type II error occurs when a study fails to detect an association that is truly present; statistical power is the probability of avoiding a type II error. The appropriate sample size ensures a low probability of a type II error, equating to a high level of statistical power. With the new features in SAS/STAT 14.2, researchers are now able to conduct power analyses for the Cox proportional hazards model in survival statistics by using the COXREG option in PROC POWER. Researchers can now prospectively power survival models to maintain a high level of confidence that the association between the covariates and survival time is reliable and robust.
Rachel R. Baxter, Grand Valley State University
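For reference, the standard definitions behind the power discussion above (these are textbook definitions, not specific to PROC POWER), with $\beta$ denoting the type II error rate:

```latex
% Type II error: failing to reject a false null hypothesis
\beta = \Pr\bigl(\text{fail to reject } H_0 \mid H_1 \text{ is true}\bigr),
\qquad
\text{power} = 1 - \beta
```

A larger sample size lowers $\beta$ and therefore raises power, which is what a prospective power analysis is designed to guarantee.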
Session 4671-2020
Synchronized Multivariate Resampling to Designated Distribution and Population Level with PROC SURVEYSELECT
PROC SURVEYSELECT is a powerful SAS® procedure for random resampling. With the SAMPSIZE and STRATA options, the population level can be altered in the resampled data for designated variables. To further extend the functionality of PROC SURVEYSELECT, we developed an innovative approach that performs synchronized multivariate resampling with PROC SURVEYSELECT. The approach first prepares a cross-bin flag by crossing all involved variables, then calculates the expected percent for each level of the created cross-bin flag by multiplying the designated percents of the corresponding levels of the involved variables. Based on the derived percents, the sample number for each cross-bin level is calculated and finally applied in PROC SURVEYSELECT for resampling.
Zhiyong Chen, Zem Data Science LLC, Previous Pharmerit International Inc
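The cross-bin computation described above can be sketched directly. A minimal Python illustration (the variables, target margins, and total are hypothetical; in practice the resulting counts would drive SAMPSIZE= with STRATA in PROC SURVEYSELECT):

```python
# designated target margins for two variables (hypothetical example)
sex_pct = {"F": 0.6, "M": 0.4}
age_pct = {"young": 0.3, "mid": 0.5, "old": 0.2}
n_total = 1000

# cross the variables into a "cross-bin" flag and multiply the designated
# marginal percents to get each cross-bin's expected percent
cross_pct = {(s, a): sex_pct[s] * age_pct[a]
             for s in sex_pct for a in age_pct}

# convert expected percents into per-stratum sample numbers
sampsize = {bin_: round(n_total * p) for bin_, p in cross_pct.items()}
```

For example, the (F, mid) stratum gets 0.6 × 0.5 × 1000 = 300 records, and the six stratum counts sum back to the designated total of 1000.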
T
Session 4915-2020
Take Control of Your Data!
As organizations embrace analytics, data is increasingly taking center stage in the modern enterprise. Users are finding innovative ways to build analytics into their daily processes, using data from warehouses, data lakes, and external sources to feed into their models. Often, the data sets and insights they fuel are stored in sandboxes and personal workspaces, inaccessible to others who might benefit from them. The proliferation of these data islands inhibits collaboration and creates storage and governance concerns. In this emerging Age of Analytics, organizations need to understand, share, and gain control over their data and analytics ecosystem. Powered by SAS® Visual Investigator, the Analytics-Code and Data Registry (ADR) is a solution that helps businesses do exactly that. The solution provides the framework for a knowledge base of analytics and data assets, supported by integrated analytics project governance, compute environment provisioning, and application-aware workflow and alerting tools. The ADR solution is a collaboration between SAS and Core Compete that enables businesses to build a data-driven, knowledge-sharing culture of governance and collaboration to empower better analytics and drive innovation.
Diane Hatcher, Core Compete
Session 4611-2020
Teaching SAS® Coding with Self-Checking Exercises
Programming is an essential skill for many people and for most STEM jobs. Teaching students to code is a standard part of statistics and computer science curriculums. In my opinion, students need individual hands-on practice to master the skill of writing code, regardless of the programming language. Classroom lectures help explain concepts and design patterns, but they cannot substitute for students spending time practicing. Designing meaningful practice with timely feedback is difficult for faculty. Manual grading is either non-specific or has a significant lag between assignment completion and student feedback. In this paper, I outline how professors can teach SAS® using SAS® Analytics Cloud with self-grading assignments to help students get immediate feedback on programming exercises. This case study leverages SAS Analytics Cloud and Jupyter Notebooks along with the SAS kernel and nbgrader.
Jared Dean, SAS Institute
Session 4640-2020
Tell Me A Different Data Story
The easiest data story to write involves an event, something in real life, changing over time. This change can be measured with the right data metric, comparing its value before and after the event. The more dramatic the change, the more interesting the data story. But insight projects are usually more complicated than a single event. Many involve the analysis of multiple events or the output of different segmentation profiles. This insight also needs to be effectively communicated to initiate change. This paper builds on knowledge in the paper Tell Me A Data Story (3168-2019), which Kat presented at SAS® Global Forum 2019. This paper explains how to write different data stories, depending on the insight you have to communicate, and how to effectively tell them using data visualization. All of these learnings can be applied in SAS® Visual Analytics to enhance your data storytelling.
Kat Greenbrook, Rogue Penguin
Session 4278-2020
Ten Minutes to Your First Hello World: REST APIs
Like many other software applications, your time to first hello world (TtFHW) with the SAS® Viya® REST APIs should be seamless and simple. This paper demonstrates a sample playbook with a standard workflow that works out of the box in SAS Viya. Once you have used the playbook, you'll be able to envision how you can extend it for your own use cases. The playbook uses standard Python modules to connect to the SAS Viya REST APIs and is designed with easy-to-read variable declarations and helper functions for ease of use. The GitHub project includes all dependencies and is designed to work with SAS Viya 3.4 (and will be upgraded as needed to work with SAS Viya 3.5).
Andy Bouts, SAS Institute
Session 4641-2020
Testing Hypotheses for Equivalence and Non-inferiority with Binary and Survival Outcomes
The classic superiority test of comparison of two treatment groups seeks to show that they differ on a measure of their efficacy. However, when a new treatment has a similar therapeutic action to an existing standard treatment, there might be little difference in their efficacy. An equivalence study is designed to show that the difference in a measure of efficacy between the new and standard treatments is within a pre-specified margin of clinical indifference. In another context, the new treatment might offer lower cost and/or better patient compliance but might have a lower efficacy than the standard treatment. A non-inferiority study is designed to show that the new treatment is not less effective than the standard treatment to within a pre-specified margin of clinical indifference. In this paper, we discuss the formulation of tests of hypotheses for equivalence and non-inferiority studies in the context of a two-sample design for binary and survival outcomes. For a binary outcome, the comparison between the new and standard treatments can be assessed by the difference in the probability of response to treatment, the relative risk, or the odds ratio. For a survival outcome, the comparative assessment can be made using survival probabilities or, where appropriate, the hazard ratio. The SAS POWER and FREQ procedures offer options for performing these tests of hypotheses and for assessing statistical power and sample size when conducting equivalence and non-inferiority studies.
Joseph C. Gardiner, Department of Epidemiology and Biostatistics, Michigan State University
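In terms of response probabilities $p_N$ (new treatment) and $p_S$ (standard treatment) and a pre-specified margin of clinical indifference $\delta > 0$, the two formulations discussed above can be written as follows (assuming a larger response probability is better; these are the standard textbook forms):

```latex
% Non-inferiority: reject H_0 to conclude the new treatment is not
% unacceptably worse than the standard
H_0 : p_N - p_S \le -\delta
\quad \text{vs.} \quad
H_1 : p_N - p_S > -\delta

% Equivalence: reject H_0 to conclude the treatments differ by less
% than the margin (in practice via two one-sided tests)
H_0 : \lvert p_N - p_S \rvert \ge \delta
\quad \text{vs.} \quad
H_1 : \lvert p_N - p_S \rvert < \delta
```

Note that in both cases the roles of the null and alternative are reversed relative to a superiority test: similarity is the conclusion to be demonstrated, so it sits in the alternative hypothesis.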
Session 4427-2020
That's How I Like It: Personalize Reports in SAS® Visual Analytics Apps for iOS, Android, and Microsoft Windows 10
The report you are viewing on your phone is both beautiful and informative. You can drill, filter, and sort the data, but there's a burning question you need to answer. If only you could make a few modifications to the report, you could gain the insight you are searching for. Viewer customizations, a collection of capabilities recently introduced in SAS® Visual Analytics, put the power in the hands of report viewers to gain their own insights. You can personalize your report-viewing experience and answer the questions that used to lie just out of reach. Customize filters, groups, and ranks for your data to get a different view and gain new insights. Change data items and visualizations to dig a bit deeper. While viewing your report, you can make decisions more quickly by customizing the report according to your needs; there is no need to bother the report designer with change requests. You can easily change the data being analyzed from revenue to profit, if you like, or filter out regions of the world that are not relevant to your data questions. You can do all of this in the SAS® Visual Analytics Apps, using your preferred platform: iOS, Android, or Windows 10.
Cheryl L. Coyle, Rich Hogan, and Robby Powell, SAS Institute
Session 4321-2020
The 4 Ws: A Strategy Session on SAS® Customer Intelligence 360 Personalized Consumer Experiences
Now more than ever, consumers are looking for brands, products, and experiences that fit their wants and needs. They no longer take the first product offered to them. With information at their fingertips, they can research and make more knowledgeable decisions regarding their purchases. Marketers are faced with the challenge of answering the four Ws: who, what, when, and where. Who is your target consumer? What are they going to purchase? When are they going to purchase? Lastly, where is the best channel to make and/or continue contact? Marketers are responsible for strategizing, creating, and executing complex, omni-channel personalized campaigns for their consumers. SAS® Customer Intelligence 360 is a multi-module system that enables marketers not only to execute campaigns across connected marketing channels, but also to gain actionable insights and continually optimize campaigns. This presentation examines the ability to leverage SAS Customer Intelligence 360 to personalize a consumer's experience across multiple marketing channels and to take the guesswork out of the four Ws.
Jaclyn Coate, SAS Institute
Session 5123-2020
The Curious CAT (Q, S, T, X) Functions
The CAT family of functions (CAT, CATQ, CATS, CATT, and CATX) in SAS is very useful for concatenating strings. These functions not only join strings, but also help you write compact code that is easier to read. The CAT family achieves this by eliminating the need for the complex, extended logic associated with combining multiple functions. In this paper, we present the various CAT functions with relevant examples of their application. The main objective of this paper is to demystify the CAT family and encourage programmers to add these functions to their toolbox.
Jinson J. Erinjeri, Independent SAS Programmer
Saritha Bathi, Bristol-Myers Squibb
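A minimal sketch of how the CAT family differs in blank handling (variable names and values are illustrative):

```sas
data _null_;
   first = 'John ';                   /* trailing blank                          */
   last  = '  Smith';                 /* leading blanks                          */
   cat_ex  = cat(first, last);        /* CAT keeps all blanks:   'John   Smith'  */
   cats_ex = cats(first, last);       /* CATS strips both sides: 'JohnSmith'     */
   catt_ex = catt(first, last);       /* CATT trims trailing only: 'John  Smith' */
   catx_ex = catx(' ', first, last);  /* CATX strips and delimits: 'John Smith'  */
   put cat_ex= cats_ex= catt_ex= catx_ex=;
run;
```

A single CATX call replaces the TRIM, LEFT, and || combinations that older code typically strings together.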
Session 4397-2020
The Essentials of SAS Dates and Times
The first thing you need to know is that SAS® software stores dates and times as numbers. However, this is not the only thing you need to know. This paper is designed to give you a solid base for working with dates and times in SAS. It introduces you to functions and features that enable you to manipulate your dates and times with surprising flexibility. This paper shows you some of the possible pitfalls with dates (and times and datetimes) in your SAS code and how to avoid them. We show you how SAS handles dates and times through examples, including the ISO 8601 formats and informats, and how to use dates and times in TITLE and FOOTNOTE statements. The paper closes with a brief discussion of Excel conversions.
Derek Morgan, PAREXEL International, Billerica, MA
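A brief sketch of the storage scheme the paper starts from (the ISO 8601 format shown is one of several available):

```sas
data _null_;
   d  = '01JAN2020'd;             /* a date is stored as days since 01JAN1960     */
   t  = '12:30:00't;              /* a time is seconds since midnight             */
   dt = '01JAN2020:12:30:00'dt;   /* a datetime is seconds since 01JAN1960        */
   put d=;                        /* 21915, the raw number of days                */
   put d= date9. d= yymmdd10.;    /* the same value, displayed two ways           */
   put dt= e8601dt.;              /* ISO 8601 output: 2020-01-01T12:30:00         */
run;
```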
Session 4762-2020
The Face Behind That Fraud: Humanizing Data
There were over $905 million in total fraud losses in 2017, and there is a human behind every record. Businesses must do all they can to protect new and existing customers. When that fails, someone is ultimately impacted, whether through a breach of organizational data or the theft of someone's identity to commit a crime. The Identity Theft Resource Center (ITRC), a victim assistance non-profit, aids anyone with a U.S. identity credential by providing access to the information and guidance they need, whether emotional or financial. The impact of the increase in data breaches, which exposed over 147.4 million sensitive records from January to September 2019 alone, is astronomical. As the ITRC sees an influx of calls and live chats, the strain on the existing resources available to assist consumers increases exponentially. During this session, we talk about the human element of identity theft and how the ITRC has partnered with SAS to employ its AI and analytics to maximize access to resources without impacting operational needs. Through the implementation of its new AI-driven chatbot, the Virtual Victim Assistance Network (ViViAN), the ITRC can assist more victims in real time by automatically responding with the right resources to get them started on a remediation plan or to answer quick questions. We showcase the capability of the chatbot as well as other solutions that SAS has worked on to aid in the fight against identity theft and help humankind feel safer in their communities.
Eva Velasquez and Charity Lacey, Identity Theft Resource Center
John Watkins and Jared Peterson, SAS Institute Inc.
Session 4949-2020
The Hack-asaurus Guide to SAS® Grid: Saving Time with Parallel Database Queries
Have you ever worked overtime or missed a deadline because it took hours (even days) to extract all of the data you needed from multiple sources? And heaven help us if there is a flaw in the query, a change in data structures, or a change in customer requirements. SAS® Grid supports parallel processing, enabling a single program to query those data sources in parallel and making retrieval and aggregation as simple as F3. Unlike other parallel processing solutions, SAS Grid does not require learning a completely new syntax (I'm looking at you, DS2!). Unfortunately, it is not as simple as standing up a SAS Grid server. Do you live in the WORK library? Sorry, you cannot use that at all. Like using global variables or inheriting local macro variables? Sorry, those break too. Do you use %INCLUDEs to create macro libraries? Sorry, SAS Grid sessions don't know about your local %INCLUDE statements. But do not despair: all of these issues (and others) can be addressed to prevent massive rewriting. This session will save you those tears and overtime hours by giving you the solutions to these problems. It is like getting into a prestigious university without your parents having to bribe administrators!
Ted Williams, Magellan Method
Session 4141-2020
The History and Evolution of SASPy, Including an Overview of What It Can Do and How to Use It
SASPy is one of the most popular SAS® open-source packages, available on the SAS GitHub site (https://github.com/sassoftware/saspy) and also installable via the Python Package Index (PyPI) and Anaconda. But how did it come to be? How and why did SAS create an open-source Python interface to your existing SAS® 9.4 systems, without even needing any updates to your existing SAS servers for it to be able to connect to them? Often, having an understanding of the design criteria and constraints of a software package helps you better understand and use it, so you can be more productive in what you are really trying to accomplish. This session blends in this insight while giving an overview of SASPy, including the various ways it can connect to your different SAS installations and how to configure it for each one, a walk-through of its core functionality, and a look at some of the more advanced features. The goal is that you end up with a much better understanding of how to use SASPy to accomplish even better combined Python and SAS workflows for your projects.
Tom Weber, SAS Institute Inc.
Session 4476-2020
The Increasing Use of Artificial Intelligence in the Intelligence Community
The management of intelligence data within a law enforcement environment has traditionally been a very manual process. Reviewing the contents of intelligence reports and creating structured records has traditionally been the responsibility of intelligence professionals. With the massive increase in the amount of data being processed and the advancements in technology, the intelligence community is increasingly turning to artificial intelligence to help automate some of the tasks. This paper discusses some of the areas of progress and some of the challenges being faced.
Gordon Robinson, SAS Institute
Session 4494-2020
The New solveBlackbox Action in SAS® Optimization 8.5
In SAS® Viya® 3.5, SAS® Optimization 8.5 introduces the solveBlackbox action. This action greatly expands both the means of access to this essential optimization technique and the range of its potential applications. Black box optimization uses innovative methods to find solutions to some of the most challenging optimization problems, in which the functions involved might be nonsmooth, discontinuous, and computationally expensive to evaluate. You can call the solveBlackbox action from programs in the CASL, Python, Lua, Java, and R languages. And because this action uses CASL to define objective functions, there is almost no function it cannot work with; any CASL script can become an objective function of the solveBlackbox action. In addition, this action makes extensive use of multilevel parallelism, enabling you to make the most effective use of distributed computational resources. Distributed computational tasks that are called by this action can in turn call other distributed tasks. In SAS Viya, black box optimization has been available through the OPTMODEL optimization modeling language in SAS Optimization and is the foundation of the Autotune action set, which is used by SAS® Visual Data Mining and Machine Learning to perform automated algorithm selection and hyperparameter tuning for machine learning models. This paper reviews the capabilities of the solveBlackbox action and explores several examples of its use in a variety of settings.
Ed Hughes, Steve Gardner, Josh Griffin, and Oleg Golovidov, SAS Institute Inc.
Session 4652-2020
The Past, Present, and Future of Training SAS® Professionals in a University Program
University students who are fortunate enough to encounter SAS® typically do so in a variety of courses and contexts. A few programs offer courses designed specifically to build SAS programming fundamentals before moving on to courses with specific applications. However, many curricula merely include offerings that target the applications of immediate interest without building a broad skill base to prepare students for careers that rely heavily on programming. This session discusses past and current practices at several institutions, the philosophies behind them, and where they succeed and fail. As SAS and its role in the marketplace evolve, a university curriculum must evolve with it. Therefore, this session closes with a discussion of how SAS and universities can work together to provide the training necessary to continue preparing high-quality programmers.
Jonathan W. Duggins, NC State University
Jim Blum, UNC Wilmington
Session 5206-2020
The Plight of the Honeybee
A decline in the honeybee population is raising concerns worldwide. Honeybees are an important part of agricultural industries. Is it possible that the neonicotinoid pesticides used to protect crops from damaging insects are also harming the insects necessary for pollinating the same crops? This study explores the effects of these pesticides on honeybees by comparing annual honey production yields and honeybee colony counts in the United States from 1995 through 2015.
Rachael Bishop, DaMarkus Green, and Karin Kolb, Kennesaw State University
Session 4561-2020
The SAS® Encoding Journey: A Byte at a Time
UTF-8 is becoming the most dominant character encoding for the World Wide Web. It supports a great number of characters from many languages, including English, and is compatible with ASCII characters. While all characters are one byte in WLATIN1, the same familiar characters might take several bytes in UTF-8. This major difference in data representation imposes several challenges for SAS programmers. Transcoding errors, truncation, or garbage characters such as � might appear unexpectedly when your data is processed. To help you in this endeavor to overcome these challenges, this paper explains the basics of character encoding and how it is handled in SAS®. It defines some common terms such as ASCII, single-byte character set (SBCS), multi-byte character set (MBCS), and Unicode. It also introduces SAS functions and macros that are available to detect potentially problematic characters and to make it easier to fix the problem.
Mickaël Bouedo, SAS Institute
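A small sketch of the byte-versus-character distinction the paper addresses, assuming the SAS session runs in UTF-8 encoding:

```sas
data _null_;
   s = 'café';                 /* the é occupies two bytes in UTF-8        */
   bytes = length(s);          /* byte-oriented length: 5                  */
   chars = klength(s);         /* character-oriented (K) function: 4       */
   put bytes= chars=;
run;
```

The K functions (KLENGTH, KSUBSTR, KINDEX, and so on) operate on characters rather than bytes, which keeps multibyte data intact.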
Session 5033-2020
The Secret to Finding the Right Talent in Tech: Why Artists Can Be Your Greatest Asset
We are all looking for talent: talent who will stay with our companies and help them grow. We are looking for a great cultural fit: people who communicate well, work hard, and get the job done. If you think finding people with more STEM is your answer, you might want to shift your perspective. Learn why and how dancers, actors, artists, and performers will change the face of your team for the better. We will consider how empathy, teamwork, and communication skills are developed through the arts and how they can help your existing technical team of developers, analytics professionals, and managers grow. We will outline why adding people with significant arts training can help your team thrive, bring unique perspective, and disrupt the "way things have always been done." We will also consider what creative, active learning looks like in a traditional business classroom and why you should ask about learning styles and varied experience while recruiting. The arts are the original immersive and experiential pedagogy and leaders in active learning. Contemplate how adding "A" to STEM could be your secret weapon in building perspective, keeping clients happy, and rounding out your workspace!
Miranda Wickett, Ivey Business School
Session 4218-2020
The Seven Most Popular Machine Learning Algorithms for Online Fraud Detection and Their Use in SAS®
Today, illegal activities regarding online financial transactions have become increasingly complex and borderless. This unlawful activity results in substantial economic losses for both customers and organizations. Many techniques have been proposed for fraud prevention and detection in the online environment. All these techniques have the same goal of identifying and combating fraudulent online transactions. However, each machine learning technique comes with its own characteristics, advantages, and disadvantages. This session reviews the use of the most common machine learning algorithms used in online fraud detection, the strengths and weaknesses of these techniques, and how these algorithms are developed and deployed in SAS®. Types of fraud discussed include credit card fraud, financial fraud, and e-commerce fraud. Algorithms reviewed include neural networks, decision trees, support vector machines, K-nearest neighbor, logistic regression, random forest, and naïve Bayes.
Patrick Maher, SAS Institute
Session 4160-2020
Think Globally, Act Locally: Understanding the Global and Local Macro Symbol Tables
One of the most missed questions on the Advanced SAS® Certification exam during Beta testing dealt with global and local macro variables. Even people who got nearly every other question right missed these questions. In this paper, you learn when a local symbol table gets created, when a macro variable gets placed in the local table, what happens if you have macro variables with the same name in both tables, and why any of this matters. %LET, CALL SYMPUT, CALL SYMPUTX, and the INTO clause in SQL each have their own rules for addressing how they interact with the local and global tables. Techniques for debugging and displaying symbol tables are also discussed. Learn them here, live them when you leave.
Michelle Buchecker, Independent Consultant
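A minimal sketch of the scoping behavior the paper examines:

```sas
%let x = global;             /* %LET in open code writes to the global table     */

%macro inner;
   %local x;                 /* creates X in the macro's local symbol table      */
   %let x = local;           /* updates the local X; the global X is untouched   */
   %put Inside: x=&x;        /* resolves to "local"                              */
%mend inner;

%inner
%put Outside: x=&x;          /* still resolves to "global"                       */
%put _USER_;                 /* lists all user macro variables with their scopes */
```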
Session 4085-2020
Three Reports on a Page and 4-Page Layouts on the Fly with SAS® and the Output Delivery System
This paper details the creation of a ready-to-publish document with multiple data sources and differing page layouts of the same type of information based on page fit. Each set of information is placed on one, two, or three pages depending on fit, and requires three calls to the REPORT procedure using three different data sets. This solution does not involve ODS DOCUMENT. Instead, we calculate how much horizontal and vertical space the report requires and use the macro language to execute code conditionally. This allows us to dynamically determine which page layout to apply, including setting or suppressing page breaks with ODS STARTPAGE=. This solution makes extensive use of inline formatting as well. The output was produced as an RTF file using SAS® University Edition running on a Microsoft Windows 10 machine.
Derek Morgan, PAREXEL International, Billerica, MA
Session 4674-2020
Time After Time: D-I-D and ITS Models in SAS
Healthcare and other epidemiological researchers are increasingly turning to difference-in-differences (D-I-D) and interrupted time series (ITS) models to analyze pre- and post-intervention changes in outcomes. These models are often used in quasi-experimental studies with non-randomized exposures using retrospective observational data, and they allow a causal interpretation and adjustment for secular trends in the outcome of interest. D-I-D models compare the rate of change of an outcome measure before and after an exposure in exposed and control groups based on a single measure in each period. D-I-D data can be modeled with repeated-measures generalized linear models with an interaction term between time period and the exposure variable. The ITS, an extension of the D-I-D design, compares trends in an outcome over multiple pre- and post-exposure measures, allowing for a discontinuity (an "interruption") in both control and exposure average rates during the study period. Models of this type account for situations where rates of an outcome of interest shift in both exposed and control groups at a specific time point, such as the change from ICD9 to ICD10 diagnosis codes during 2015. We discuss the statistical basis of these methods and illustrate data structure, modeling methods, power calculations, and interpretation of model estimates for both models with data from studies performed in a large integrated healthcare system.
E Margaret Warton, Kaiser Permanente Northern California Division of Research
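A hedged sketch of a repeated-measures D-I-D specification of the kind described above; the data set, variable names, and distribution are illustrative, not from the paper:

```sas
/* EXPOSED*PERIOD is the difference-in-differences term: it estimates how
   much more (or less) the outcome changed pre-to-post in the exposed group
   than in the control group                                               */
proc genmod data=study;
   class exposed(ref='0') period(ref='0') id;
   model events = exposed period exposed*period
         / dist=negbin link=log offset=log_persontime;
   repeated subject=id;   /* GEE accounts for repeated measures on each unit */
run;
```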
Session 4801-2020
Time Is on Your Side. Yes It Is. Using Base SAS to Manipulate Time
Time can be tricky. In relational database management systems (RDBMS) there can be many time-based (temporal) components (for example, datetime, date, and time fields). They are typically stored as numeric data types in the native RDBMS format for temporal components. Sometimes they are stored as character fields in the RDBMS. The way that the SAS® software interprets and reads temporal components depends on several things. The challenge of reading and converting them into useful variables for data analysis in SAS requires some finesse, but any SAS programmer or analyst can master this with a few tips and tricks. This presentation helps you understand how temporal components are stored and formatted in RDBMS and subsequently how to take advantage of all that Base SAS® has to offer for working with temporal components. The use of SAS INFORMATs, FORMATs, automatic macro variables (like &SYSDATE9), and useful SAS functions (like INTCK and/or INTNX) are demonstrated to inspire programmers and analysts to tackle any temporal component manipulation needed to render useful analyses of their data.
Carole Jesse, Idaho National Laboratory
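A short sketch of the two interval functions mentioned above (dates are illustrative):

```sas
data _null_;
   start = '15MAR2019'd;
   end   = '02JAN2020'd;
   /* INTCK counts interval boundaries crossed between two dates            */
   months_apart = intck('month', start, end);     /* 10                     */
   /* INTNX advances a date by n intervals; 'b' aligns to the interval start */
   next_qtr = intnx('qtr', start, 1, 'b');        /* 01APR2019              */
   put months_apart= next_qtr= date9.;
run;
```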
Session 4219-2020
Tips for Building Rich Interaction in Your SAS® Visual Analytics Reports
Your data has a lot to say. How do you make these insights available to your audience? Enhance the report viewing experience by taking advantage of features in SAS® Visual Analytics to build reports that engage your audience in exploration. This paper highlights features including automated analytical objects, object interactions, drilling hierarchies, as well as report and page controls. The paper also describes techniques to make the most out of these features. With these tips, you can create an interactive report that helps you and your audience dig deeper into the data to gain insights.
Jeanne Marie Tan and Sierra Shell, SAS Institute
Session 4376-2020
Tips for Writing Custom SAS® Studio Tasks That Use CAS Actions for Advanced Analytics
The SAS® Studio programming interface includes point-and-click user interfaces, called tasks, that enable you to use SAS® software without writing any code. As you navigate a task's menus and options, the underlying SAS code is automatically generated for you in real time, making SAS analytics accessible even to users who are not familiar with SAS code. SAS Studio tasks cover a wide range of analytical areas, but you can also create custom tasks to suit your needs. For example, the code that a task generates can include the CASL programming language that you can use to interact with SAS® Cloud Analytic Services (CAS). CAS is a cloud-enabled, in-memory analytics engine from SAS that uses distributed computing for highly scalable, very fast execution. Because CASL is a relatively new language, custom tasks that generate CASL code make it easier for you to use CAS actions and learn the CASL syntax as you migrate your analytics to CAS. This paper starts by briefly presenting some background on SAS Studio tasks and CAS. Then it uses analytical actions in SAS® Econometrics software to provide several tips and tricks for writing CASL-generating custom tasks.
Brian R. Gaines, SAS Institute
Session 4956-2020
Tips, Traps, and Techniques in Base SAS® for Vertically Combining SAS Data Sets
Although not as frequent as merging, a data manipulation task which SAS® programmers are required to perform is vertically combining SAS data sets. The SAS System provides multiple techniques for appending SAS data sets, otherwise known as concatenating or stacking. There are pitfalls and adverse data quality consequences of using traditional approaches to appending data sets. There are also efficiency implications of using different methods to append SAS data files. In this paper, with practical examples, I examine the technical procedures that are necessary to follow to prepare data to be appended. I also compare different methods that are available in Base SAS to append SAS data sets, based on efficiency criteria.
Jay Iyengar, Data Systems Consultants
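A brief sketch of two common approaches the paper compares (data set names are illustrative):

```sas
/* SET concatenation rewrites every observation into a new data set */
data all_sales;
   set q1_sales q2_sales;
run;

/* PROC APPEND adds rows to BASE in place without re-reading it, which is
   faster for a large BASE table; variable attributes must match, and
   FORCE relaxes that check at the risk of truncation or dropped variables */
proc append base=q1_sales data=q2_sales force;
run;
```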
Session 4445-2020
Forecasting Made Easy: A Macro to Select an Optimal Exponential Smoothing Model Using SAS® PROC ESM
An accurate forecast is an invaluable tool for anticipating changes that may require a policy, budget, or other response. In health care, there are many potential applications, including enrollment, utilization, cost, and operational processes. The ESM procedure in SAS® software generates forecasts with the option to use a variety of different exponential smoothing methods. However, deciding on which method to use is a challenge. Using publicly available Medicare enrollment data, we demonstrate a macro that makes selecting the best-performing forecast model easy and intuitive, so that users are able to create reliable forecasts to inform their decision-making.
Bil Westerfield, Magpie Health Analytics
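A hedged sketch of the kind of candidate fit such a macro automates (data set and variable names are illustrative, not from the paper):

```sas
/* Hold back the last 12 periods, fit one smoothing method, and write fit
   statistics; repeating this over the available methods (simple, double,
   winters, addwinters, ...) and comparing a statistic such as MAPE is the
   basis for selecting the best-performing model */
proc esm data=enrollment back=12 lead=12 outstat=fitstats;
   id month interval=month;
   forecast enrollees / model=addwinters;
run;
```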
Session 4655-2020
Transdermal Medicine - Challenges and Solutions with SAS®
In the 21st century, the transdermal drug delivery system (TDDS or TDS) has become the fastest-growing division in the pharmaceutical industry because of innovative technologies and various advantages over oral medicine. Fast growth has come along with multiple challenges. Challenging dermal safety studies must be conducted to determine transdermal system performance, such as adhesion, discomfort, irritation, and sensitization. Non-inferiority tests are also required to justify acceptable adhesion and irritation of a new product in comparison to an active reference. The rationale for data analyses and clinically meaningful choices for non-inferiority margins is still an area of research and development. This paper concentrates on the analysis and reporting of transdermal data utilizing SAS®. The author will: 1) review the existing measurement scales for transdermal safety; 2) highlight the differences in scales and requirements between regulatory agencies; 3) concentrate on planning and conducting non-inferiority studies; and 4) demonstrate how p-values facilitate conclusions of superiority, non-inferiority, and bioequivalence of treatments. Sample size and power calculations will be demonstrated. The author is convinced that TDDS has not been fully appreciated as an effective alternative to oral and injection drug delivery methods, and hopes this paper helps to convince others that efficient transdermal studies can be conducted with confidence.
Marina Komaroff, Noven Pharmaceuticals, Inc
Session 5029-2020
Transfer Learning for Mining Digital Phenotype by SAS® Viya®
With the remarkable development of various AI and machine learning methods in recent years, advanced technologies based on collected data have emerged in every industry. One of the leading technologies is deep learning; image recognition, speech recognition, and natural language processing are its well-known applications. Computers automatically extract the important features that affect the results from data, without human intervention, achieving recognition and identification accuracy much higher than that of conventional methods and not inferior to humans. Applied to data collected from the digital devices that are now widespread worldwide, such as posts to social media (SNS and blogs), call logs from smartphones, and accelerometer data obtained from sensors, deep learning may make it possible to discover and quantify the human phenotypes, so-called digital phenotypes, that characterize the device's owner and cannot easily be discovered by hand. However, the implementation of deep learning requires a large amount of training data. Although the barriers to obtaining big data have fallen over the past several years, there are still many cases where it is difficult to obtain a sufficient amount of data, depending on the target problem and environment. One way to solve such a problem is transfer learning. In this paper, we implement transfer learning in SAS® Viya® as a method that can be applied to digital phenotyping when the amount of data is insufficient. In particular, we examine the usefulness of transfer learning in a case where deep learning is used to determine whether or not a sticky note is attached in a document image. Furthermore, we try to find out the characteristics of the author and the story by applying transfer learning to a case that identifies the author of a book from one of its sentences. This imitates phenotype mining.
Satoki Fujita,Ryo Kiguchi, Yuki Yoshida, Katsunari Hirano, and Yoshitake Kitanishi, Shionogi & Co., Ltd.
Session 4109-2020
Translating SQL to SAS® and Back: Performing Basic Functions Using DATA Steps vs. PROC SQL
Many beginning SAS® software users know how to write SQL code or SAS code, but not both. This quick tip is designed to teach users how to complete basic data manipulation and visualization processes using both DATA steps and PROC SQL.
Katie Haring, Highmark Health
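A minimal side-by-side sketch of the kind of translation the tip covers (table and column names are illustrative):

```sas
/* DATA step: subset rows and derive a new column */
data high_cost;
   set claims;
   where paid_amount > 1000;
   cost_category = 'HIGH';
run;

/* The same result with PROC SQL */
proc sql;
   create table high_cost as
   select *, 'HIGH' as cost_category
   from claims
   where paid_amount > 1000;
quit;
```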
Session 4232-2020
Tricks of the Trade: Streamlining and Enhancing Your SAS® Sessions
We are all here today because we have chosen SAS®; coding in SAS is likely a part of daily life, whether as programmers, database managers, or statisticians. Some of us with options at our companies have chosen SAS over other coding languages, such as R or Python. SAS is incredibly powerful and flexible; its PROCedures are validated and its code tested. It provides an impressive number of options within PROCedures, and by default, the output contains all the relevant information for model fitting, statistical tests, and parameter estimates. Unfortunately, SAS, by design, requires more code to be written to perform the same operation compared to other languages, and much of the output is valuable only for review and not necessarily for reporting. We live and die by the semicolon, RUN statements, and procedure calls. For my R-using colleagues, the sheer difference in the amount of writing is enough to turn them off. However, coding in SAS need not be so cumbersome nor reporting so tiresome. Features exist to make our code easier to read and to reduce the amount of code we write. Today, I'm going to cover a range of streamlining methods in SAS, including abbreviations, shortcut key macros, and using macros with ODS output to combine and present tables. This talk is intended for SAS Enhanced Editor users who want to have easier navigation, cleaner code, and streamlined output.
Debra A. Goldman, Memorial Sloan Kettering Cancer Center
Session 4530-2020
Troubleshooting Encoding Issues When Integrating Data from Multiple Systems
Symptoms, diagnoses, and cure for encoding issues: When integrating multiple data sources in a real-world environment, especially with legacy databases, it is not uncommon to encounter data sets with different encodings. What types of errors can this cause? How can you recognize that what you are seeing is a result of mixing various encodings? What are some of the methods that can be used to determine what the native encoding of your data sources and SAS® session are? This session walks through troubleshooting a real-life example and its resolution.
Lydia Sbityakov and Breanna Barton, SAS Institute
Session 4645-2020
Turn Yourself into a SAS® Internationalization Detective in One Day: A Macro for Analyzing SAS Data of Any Encoding
With the growing popularity of using Unicode in SAS®, it is more and more likely that you will be handed SAS data sets encoded in UTF-8. Or at least they claim those are genuine UTF-8 SAS data. How can you tell? What happens if you are actually required to generate output in plain old ASCII encoding from data sources of whatever encoding? This paper provides a solution that gives you all the details about the source data regarding potential multibyte Unicode and other non-ASCII characters in addition to ASCII control characters. Armed with the report, you can go back to your data provider and ask them to fix the problems that they did not realize they have been causing you. If that is not feasible, you can at least run your production jobs successfully against cleaned data after removing all the exotic characters.
Houliang Li, HL SASBIPROS INC
Session 4612-2020
Turning the Crank. A Simulation of Optimizing Model Retraining
Model retraining is a common practice in the advanced model life cycle. However, the critical question is how do you know when you need to retrain the model? Once the model is retrained, how do we determine when we need to redeploy the model? Can we predict how long the model will be relevant? The answers can depend on one or more of many factors including calendar fluctuations, business cycles, data drift, model performance, expected benefit, and many others. Given those factors, we want to find the optimal points in time to retrain and redeploy a predictive model. This paper presents a simulation study of different strategies and techniques for optimizing model retraining with the goal of maintaining optimal business performance.
David R. Duling, SAS Institute
Session 4713-2020
Twenty Ways to Run Your SAS Programs Faster and Use Less Space
When we run SAS® programs that use large amounts of data or have complicated algorithms, we often are frustrated by the amount of time it takes for the programs to run and by the large amount of space required for the program to run to completion. Even experienced SAS programmers sometimes run into this situation, perhaps through the need to produce results quickly, through a change in the data source, through inheriting someone else's programs, or for some other reason. This paper outlines twenty techniques that can reduce the time and space required for a program without requiring an extended period of time for the modifications. The twenty techniques are a mixture of space-saving and time-saving techniques, and many are a combination of the two approaches. They do not require advanced knowledge of SAS, only a reasonable familiarity with Base SAS® and a willingness to delve into the details of the programs. By applying some or all of these techniques, people can gain significant reductions in the space used by their programs and the time it takes them to run. The two concerns are often linked, as programs that require large amounts of space often require more paging to use the available space, and that increases the run time for these programs.
Stephen Sloan, Accenture
U
Session 5099-2020
U-can-NIX Leaving Your SAS® Session to Do That
SAS® programmers often play the role of "electronic storage space steward," requiring them to monitor the electronic space their programs and data sets take up as well as the electronic space their colleagues utilize. With tight deadlines, it's easy for programmers to write programs or create data sets without putting a lot of thought into where the data sets are stored or even how large those data sets are. As a result, electronic space and organization can quickly get out of hand. This presentation provides examples of how to utilize UNIX commands to locate files that you may be able to delete in order to free up electronic space. For example, data sets with "old" or "temp" at the end of the file name are typically intermediate data sets created before a final data set is created or updated. The presentation also provides examples of reading in files that are not SAS files, such as Excel files, by using the FILENAME statement with the PIPE option, and of utilizing the UNIX find command with the -maxdepth option to monitor electronic space.
Lorelle Benetti, Division of Biomedical Statistics and Informatics, Mayo Clinic
Session 6015-2020
Unlock the Business Value of IoT with Analytics
The impact of Internet of Things (IoT) Analytics might be greater—and taking hold more quickly—than many observers expect. Organizations that are pursuing an IoT strategy are finding they can’t compete effectively without using analytics. Despite the importance of IoT Analytics, many organizations lack a clear vision to execute their initiatives. According to a recent study by PwC, “more than six out of ten organizations have failed to take operational IoT initiatives past proof-of-concept stage or beyond implementation.” This paper provides an example IoT setup using SAS® software, open source software, and a Raspberry Pi® to build a project that can be utilized for demonstrating IoT Analytics. Why would we do this? Although not covered in detail in this paper, IoT Analytics can help with a number of practical use cases. It’s time to make your IoT data work for you!
Jon Klopfer, Elizabeth McGlone, and Donald L. Penix, Jr. (DJ), Pinnacle Solutions, Inc.
Session 5203-2020
Unsupervised Contextual Clustering of Abstracts
This study utilizes publicly available data from the National Science Foundation (NSF) Web Application Programming Interface (API). In this paper, various machine learning techniques are demonstrated to explore, analyze, and recommend similar proposal abstracts to aid the NSF or awardees with the Merit Review Process. These techniques extract textual context and group it with similar context. The goal of the analysis was to use the Doc2Vec unsupervised learning algorithm to embed NSF funding proposal abstract text into a vector space. Once vectorized, the abstracts were grouped together using k-means clustering. Together, these techniques proved successful at grouping similar proposals and could be used to find proposals similar to newly submitted NSF funding proposals. To perform the text analysis, SAS® University Edition is used, which supports SASPy, SAS® Studio, and Python JupyterLab. Gensim Doc2Vec is used to generate document vectors for the proposal abstracts. Afterward, the document vectors were used to cluster similar abstracts using the SAS® Studio KMeans Clustering Module. For visualization, the abstract embeddings were reduced to two dimensions using principal component analysis (PCA) within SAS® Studio. This was then compared to the t-Distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction technique from the Scikit-learn machine learning toolkit for Python. In conclusion, NSF proposal abstract text analysis can help an awardee improve a proposal by identifying similar proposal abstracts from the last 24 years. It could also help NSF evaluators identify similar existing proposals, which indirectly provides insight into whether a new proposal is likely to be fruitful.
Jacob Noble and Himanshu Gamit, University of St Thomas, St Paul
Session 5184-2020
UPDATE to the Rescue
The UPDATE statement provides a unique method of combining two SAS® data sets. That method is invaluable when needed; on the other hand, it is rarely needed. This paper explores a variety of ways to expand the usefulness of the UPDATE statement, making it the simplest solution to a broad variety of programming problems.
Robert Virgile
Session 4936-2020
Upgrading Clinical Trial Reports from ODS LISTING to ODS TAGSETS.RTF
Several pharmaceutical and biotechnology companies are still using ODS LISTING as the primary method for producing outputs. While there is nothing inherently wrong with this approach, using either ODS RTF or ODS TAGSETS.RTF can provide more aesthetically pleasing outputs in the same font as the main clinical study report. In addition, it can provide a more harmonized approach between tables, listings and figures by having a file format compatible with graphics procedures. Lastly, it lays the foundation to have a more robust validation process by having the option to read the actual RTF file back into a SAS® data set. This paper provides examples of typical clinical tables and listings and how to create them in the TAGSETS.RTF destination. Several commonly used options found within the REPORT procedure such as HEADLINE, HEADSKIP, SPLIT=, SKIP, and many others are only available in the LISTING destination. We will provide alternative approaches including STYLE override options, inline formatting, and COMPUTE BLOCK statements when upgrading to using the TAGSETS.RTF destination. We will also touch on STYLE template basics in the TEMPLATE procedure to avoid lengthy PROC REPORT procedures. After covering these topics, the programmer will be equipped to produce any table or listing typically needed for a clinical trial report. SAS 9.4 M6 was used in the examples presented. The intended audience for this paper is beginner to intermediate SAS users with basic knowledge of Base SAS and PROC REPORT.
Christopher J. Smith, Cytel
Joshua M. Horstman, Nested Loop Consulting
Session 5145-2020
Urge to MERGE? Maybe You Should UPDATE Instead
The DATA step's UPDATE statement is similar to MERGE, but it has some helpful built-in logic with which many SAS users may not be familiar. In most cases, this built-in logic can yield much simpler DATA steps. This paper sheds light on some of these built-in features and takes a step-by-step approach to showing you how to take advantage of this power that is already there.
Ben Cochran, The Bedford Group
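As a minimal sketch of the built-in logic the paper describes (the data set and variable names here are illustrative, not from the paper), UPDATE applies transactions to a master data set without letting missing transaction values overwrite existing values:

```sas
/* Master and transaction data sets, both sorted by the BY variable */
data master;
   input id name $ score;
   datalines;
1 Ann 85
2 Bob 90
;

data trans;
   input id name $ score;
   datalines;
1 . 95
;

/* Unlike MERGE, UPDATE does not overwrite NAME with the missing value:
   ID 1 keeps the name "Ann" and gets the new score 95 */
data updated;
   update master trans;
   by id;
run;
```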
Session 4925-2020
Use SAS to Model Non-Linearity
The relationship between an outcome and a continuous predictor is often not linear. However, linearity is one of the assumptions of a multivariable regression model. Many options, such as transformations and restricted cubic splines, are available to handle non-linear relationships; however, these models are often hard to interpret. Linear splines are a simple approach to accounting for non-linearity that can provide interpretable results. This paper illustrates the use of linear splines to describe the relationship between a continuous variable and a binary outcome in a regression model. We used the relationship between hematocrit and blood transfusion as an example. Hematocrit is a continuous measurement of the volume percentage of red blood cells in whole blood, and low hematocrit is known to be associated with blood transfusion in a clinical setting. We first assessed the non-linearity using the LOESS, GAM, and SGPLOT procedures. We then constructed the linear splines using Base SAS® programming. Lastly, the LOGISTIC procedure was used to estimate the linear splines. Emphasis is given to model interpretation to demonstrate the value of linear splines.
Xiaoting Wu and Donald S. Likosky, Department of Cardiac Surgery, Michigan Medicine
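As an illustration of the approach (the knot location and variable names are our own assumptions, not values from the paper), a linear spline term can be built in a DATA step and passed to PROC LOGISTIC:

```sas
/* Create a linear spline basis for hematocrit with one knot at 30 */
data spline;
   set patients;               /* assumes variables HCT and TRANSFUSION */
   hct_sp = max(hct - 30, 0);  /* 0 below the knot, linear above it */
run;

/* Below the knot, the log-odds slope is the HCT coefficient;
   above the knot, it is the sum of the HCT and HCT_SP coefficients */
proc logistic data=spline;
   model transfusion(event='1') = hct hct_sp;
run;
```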
Session 4228-2020
Using a Heat Map or the GTILE Procedure: Does Size Matter in Your Graphics?
With SAS® 9.2 and beyond, ODS Graphics brings a new way of generating high-quality graphs. Many users still find themselves at a crossroads, trying to decide which path to follow: traditional SAS/GRAPH® or ODS Graphics. Both can produce most of the common types of graphs, such as scatter plots, regression and box plots, line graphs, bar charts, and histograms. In this paper we share examples that generate a heat map using SGPLOT in ODS Graphics and a GTILE graph with SAS/GRAPH and discuss the advantages of each. With heat maps, you can display patterns in the data for a chosen response variable for one-dimensional data (in a map) or two-dimensional data (in a table or graph). Heat maps make tables and graphs easier to interpret by shading the background color based on the frequency of observations in each cell of the graph or table. With GTILE, the response values are shown by both color gradient and area. We also show how to customize a heat map by making use of the SAS® Graph Template Language (GTL) and the SGRENDER procedure. Users will appreciate how quick and easy it is to generate sophisticated graphs with ODS Graphics.
Devi Sekar, RTI International, North Carolina
Session 5109-2020
Using a Microsimulation to Evaluate Population and Policy Changes on Program Integrity Priorities
The Centers for Medicare & Medicaid Services (CMS) conducts Risk Adjustment Data Validation (RADV) to ensure the accuracy and integrity of data submitted for Medicare Advantage (MA) payments. CMS requires a simulation to forecast future program integrity trends. The RADV Synthetic Plan Microsimulation makes individual-level projections with a series of machine learning models and uses an agent-based simulation framework to forecast outcomes. SAS Viya and Python are used together in a streamlined framework, from data manipulation, to modeling, to simulation, and finally to visualization. Machine learning models simulate beneficiary data, including social determinants of health, for events such as enrollment, health conditions, and payment error. Bootstrapping enables CMS to estimate variability in future payment error via point estimates and confidence intervals, allowing CMS to compare various sampling strategies. A combination of code optimization and work parallelization in SAS Viya deployed in a cloud environment improves and distributes the required computation, reducing the run time of the bootstrapping process for multiple sampling strategies from nearly a year to only a few hours. Displaying the results of the sampling strategies through an interactive dashboard in SAS Visual Analytics provides CMS with high-level information to help future decision making. The dynamic interactions and filters in this dashboard are comprehensive and user-friendly. Stakeholders can make policy and program integrity decisions with this interactive, scenario-based approach.
Jonathan Smith, Meghan Beckowski, Michael Greene, Andrew Kemple, Bil Westerfield
Session 4701-2020
Using Analytics to Predict Tax Recovery and Prioritize Audits and Investigations in Canada
The Canada Revenue Agency (CRA) has made tremendous inroads in the last two years by leveraging the power of predictive analytics, notably by using web domain and E-Commerce data for corporate taxpayers. This session leverages the capabilities of SAS® Enterprise Guide® and SAS® Enterprise Miner in unearthing predictive patterns of interest with the clear objective of strengthening a feedback loop between tax risk assessment and the corresponding accrual of tax via audit. We examine powerful data learning techniques, as they apply to tax-based analytics, such as neural networks, decision trees, and regression analysis.
Jason A. Oliver, Canada Revenue Agency (CRA)
Session 4081-2020
Using Base SAS® Code to Dynamically Load Data to the LASR Analytic Server
This paper serves as an introduction to loading data into the LASR Analytic Server for use in Visual Analytics (VA) using SAS code. Many organizations have robust solutions built in SAS that create reports. Some of these reports are in output files such as XLSX or PDF, others are in the form of OLAP Cubes. As Visual Analytics is swiftly moving into many organizations as the premier reporting tool, the need to automatically and dynamically load data into the platform becomes more crucial. Currently many users utilize the Data Preparation tab inside of Visual Analytics to load their data in. The purpose of this paper is to show how to use base SAS code to dynamically load data into the LASR Analytic Server for use in the Visual Analytics environment.
Andrew Gannon, The Financial Risk Group
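A hedged sketch of the kind of code the paper discusses (the host, port, and tag values below are placeholders for your environment): the SASIOLA LIBNAME engine lets a Base SAS program load a table directly into LASR memory:

```sas
/* Connect to the LASR Analytic Server; values below are placeholders */
libname valasr sasiola host="lasr.example.com" port=10010 tag="vapublic";

/* Loading is just a DATA step (or PROC APPEND) against the LASR libref */
data valasr.sales;
   set work.sales;
run;
```

Because this is ordinary Base SAS code, it can be scheduled or embedded in existing reporting jobs to refresh Visual Analytics data automatically.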
Session 5008-2020
Using Factor Analysis and MANOVA to Explore Academic Achievement in the 2016 Monitoring the Future Survey Data
The 2016 Monitoring the Future survey is part of an annual, long-term study of American adolescents and adult high school graduates conducted by the University of Michigan's Institute for Social Research. This secondary data analysis uses the FACTOR procedure in SAS® Studio software to perform factor analysis to extract latent structures describing academic achievement, environment, and student delinquency. A total of 17,719 observations were used to perform multivariate analysis of variance (MANOVA) via the GLM procedure to explore the relationships between the extracted factors and demographic variables for ethnicity, gender, and population density. The SAS® code and results are presented here, along with a discussion of the necessary data cleaning steps, data quality assessment, and post hoc analyses. Population density explains 2% (Pillai's trace = 0.022, p < 0.0001) of the variance in academic achievement, academic environment, and at-risk behaviors. Gender explains 4% (Pillai's trace = 0.044, p < 0.0001), and race explains 14% (Pillai's trace = 0.145, p < 0.0001) of the variance. The academic environment for 8th- and 10th-grade students was described by an extracted factor with high loadings for the variables for parental education, college preparatory program, and remedial schooling (negative loading) and was shown to vary significantly by race.
Stefanie Kairs, National University
Session 4732-2020
Using Jupyter to Boost Your Data Science Workflow
From state-of-the-art research to routine analytics, the Jupyter Notebook offers an unprecedented reporting medium. Historically, tables, graphics, and other types of output had to be created separately and then integrated into a report piece by piece, amidst the drafting of text. The Jupyter Notebook interface enables you to create code cells and markdown cells in any arrangement. Markdown cells allow all typical formatting. Code cells can run code in the document. As a result, report creation happens naturally and in a completely reproducible way. Handing a colleague a Jupyter Notebook file to be re-run or revised is much easier and simpler than passing along, at a minimum, two files: one for the code and one for the text. Traditional reports become dynamic documents that include both text and living SAS®, R, Python, or other code that is run during document creation. With Jupyter, you have the power to create these computational narratives and much more!
Hunter Glanz, Cal Poly, San Luis Obispo
Session 4167-2020
Using Lag and Other Function in SAS® to Create Final Datasets Temperature Study
Temperature control is important for the health of premature babies. One way to monitor body temperature in premature infants is through continuous measurement of central (abdominal) and peripheral (foot) skin temperature. Fourteen premature infants, born at ≤ 32 weeks gestational age with birthweights < 1500 grams, were enrolled in the study after Institutional Review Board approval and parental consent for participation. Each infant had one skin temperature probe (thermistor) attached to the abdomen and one attached to the sole of one foot. The data downloaded from the incubator had several problems, such as extra rows, missing values for several minutes, and so forth. The data for each day included about 86,400 rows. The final data set for each infant included data for all variables averaged over every minute for 28 days. The LAG function and several other SAS functions were used to prepare the data for analysis. Several programs were used to delete unnecessary rows, create minutes from the time each infant was born to the time data collection was completed, replace missing minutes, and combine and merge different data sets. Several SAS procedures were used to analyze the data, including MEANS, FREQ, UNIVARIATE, GPLOT, and SGPLOT. All data analyses were performed using SAS/STAT® statistical software, version 9.4.
Abbas S. Tavakoli, DrPH, MPH, ME, Thomas Best, BS Student, and Robin B. Dail, PhD, RN, FAAN, University of South Carolina
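For illustration (the variable names are our own, not from the study), LAG can flag gaps in minute-by-minute data such as the missing minutes the authors describe:

```sas
/* Assumes TEMPS is sorted by INFANT_ID and MINUTE */
data gaps;
   set temps;
   by infant_id;
   prev_minute = lag(minute);          /* call LAG unconditionally */
   if first.infant_id then prev_minute = .;
   gap = minute - prev_minute;         /* a value > 1 marks missing minutes */
run;
```

Calling LAG outside any IF condition matters here: LAG maintains its own queue, and conditional calls would return values from the wrong rows.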
Session 4405-2020
Using Machine Learning and Demand Sensing to Enhance Short-Term Forecasting for CPGs
Consumer packaged goods companies (CPGs) account for some of the biggest industries in the world, providing essential items on a regular basis. Supply chain management at CPGs is complex because several products are supplied through multiple channels and distribution methods. Products follow complex order patterns characterized by promotional events, seasonal influences, natural disasters, and so on. Given this complexity, it is crucial to generate accurate short-term forecasts of order quantities that reflect the realistic demand for products. Such forecasts enable companies to drive an efficient supply chain response to improve customer service. This paper uses machine learning along with traditional time-series forecasting models to generate enhanced weekly and daily forecasts by using historical-demand signal data and point-of-sale data. The model first creates enhanced weekly forecasts, and then breaks down enhanced weekly forecasts into daily forecasts. For weekly forecasts, a combination of a traditional time-series forecasting model and a neural network is used to create a productwise forecast. This model combination allows for capturing the complex weekly order patterns and provides an accurate forecast of product demand. Weekly forecasts are divided into daily forecasts using an ensemble of three models: a seasonal model, a trend model, and a neural network model. The paper discusses the methodology behind this approach, along with short-term forecasting results.
Kedar Prabhudesai, Varunraj Valsaraj, Dan Woo, Jinxin Yi, and Roger Baldridge, SAS Institute
Session 4384-2020
Using Python with Model Studio for SAS® Visual Data Mining and Machine Learning
There are many benefits to using Python with Model Studio. It enables you to use the SAS® Scripting Wrapper for Analytics Transfer (SWAT) package with the SAS Code node. It makes interacting with SAS® Viya® easier for nontraditional SAS programmers within Model Studio. It is used within a SAS Code node for deployable data preparation and machine learning (with autogenerated reporting). It enables you to use packages built on top of SWAT, such as the SAS Deep Learning Python (DLPy) package, for deep learning within a SAS Code node. It gives you the ability to call Python in an existing open source environment from Model Studio to authenticate and transfer data to and from native Python structures. It lets you use your preferred Python environment for either data preparation or model building and call it through a SAS Code node for use or assessment within a pipeline. It enables you to use Python with the Open Source Code node. And it provides native Python integration for data preparation and model building. This paper discusses all these benefits and presents examples to show how you can take full advantage of them.
Jagruti Kanjia, Dominique Latour, and Jesse Luebbert, SAS Institute
Session 4638-2020
Using SAS PRX functions for medical records
SAS® programmers analyzing complex free-text data, like that present in the electronic medical record, likely find basic string functions insufficient or unwieldy for extracting meaningful data points from highly variable text sources. In addition to coded data like diagnoses, a patient's medical record consists of doctors' notes, test results, and reports with immense provider variation in text: cryptic abbreviations and typos are only the beginning. Perl regular expressions tackle many text problems like those encountered by programmers analyzing the medical record, and SAS provides this powerful tool through a suite of functions and call routines. This paper reviews the basics of implementing Perl regular expressions in SAS using functions and routines like PRXPARSE, CALL PRXSUBSTR, PRXMATCH, PRXCHANGE, and CALL PRXNEXT. Text examples draw from several medical specialties. Topics covered include managing variation within single words (like typos), locating multiple keywords with varying distance between words, finding multiple iterations of a target phrase, breaking up text by words or sentences, dealing with negation, and managing very long regular expressions. Practical considerations for getting started with a natural language processing project and for balancing false positives against false negatives are discussed.
Amy Alabaster and Mary Anne Armstrong, Kaiser Permanente
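A small example in the spirit of the paper (the pattern and the notes are invented for illustration): PRXPARSE compiles a pattern once, and PRXMATCH scans each note, here tolerating a simple transposition typo in "pain":

```sas
data notes;
   length note $200;
   input note $char200.;
   datalines;
Pt denies chest pain.
Chest pian noted on exertion.
;

data matched;
   set notes;
   retain re;
   /* (?i) makes the match case-insensitive; p[ia]{2}n also catches "pian" */
   if _n_ = 1 then re = prxparse('/(?i)chest\s+p[ia]{2}n/');
   has_cp = (prxmatch(re, note) > 0);
run;
```

Both notes match the pattern; deciding whether "denies chest pain" should count is exactly the negation problem the paper discusses.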
Session 4152-2020
Using SAS® and R Integration to Manage and Create a Multilevel Complex Database: A Case Study of the U. S. Department of Defense Military Health System Data Repository
As the most established statistical analysis system, SAS® is widely used by researchers and programmers in diverse fields. R, an open-source statistical programming language, has widespread usage among small and medium enterprises as well as both researchers and programmers. Despite widespread use, both SAS® and R have advantages and limitations. Consequently, integrating SAS® and R provides an efficient solution to practical data management and analysis problems. Using the U. S. Department of Defense's Military Health System Data Repository, we present a case study integrating SAS and R to manage and analyze complex, multi-sourced administrative and medical claims databases. SAS code to utilize the R interface within SAS is also provided.
Akhtar Hossain and Nikki R. Wooten, University of South Carolina
Laura A. Hopkins, Kennell & Associates, Incorporated
Session 4601-2020
Using SAS® Customer Intelligence 360 Multivariate Testing to Determine Incremental Campaign Revenue
Do you want to determine which campaign would be the most financially successful at increasing revenue for each customer segment? Perhaps your business has different customer segments, but you want to determine which single campaign will be the most relevant at driving the most business value. Most companies want to identify their most impactful campaigns. Accenture found that "91% of consumers are more likely to shop brands who provide relevant offers and recommendations," but concluding which campaign resonates the most with your visitors can be overwhelming and time-consuming. In this paper, we discuss how SAS® 360 Discover, SAS® 360 Engage Digital, and SAS® Studio solutions can be leveraged by an e-commerce business to determine which campaign and functional effort was best at delivering incremental revenue and conversions per customer segment. We deployed A/B and multivariate tests using SAS 360 Engage, captured the data that delivered targeted personalized campaigns, and then analyzed each marketing campaign to determine the most successful based on conversion rate, average order value, revenue per session, and conversion rate lift. Finally, we calculated the statistical significance of each metric.
Mia Patterson-Brennan and Kate Davies, SAS Institute
Session 4782-2020
Using SAS® for Generating and Sending Certificates by Email
This paper presents a %certificate macro that creates the basic structure of a certificate and then sends it by email in PDF format. To use the %certificate macro, all that is required is a list (or a file) with the names and email addresses of the participants.
Alan Ricardo da Silva, Universidade de Brasília, Departamento de Estatística
Session 4679-2020
Using SAS® Macro Variable Lists to Create Dynamic Data-Driven Programs
The SAS® macro facility is an amazing tool for creating dynamic, flexible, reusable programs that can automatically adapt to change. In this hands-on workshop, you'll learn how to create and use macro variable lists, a simple but powerful mechanism for creating data-driven programming logic. Don't hardcode data values into your programs. Eliminate data dependencies forever and let the macro facility write your SAS code for you!
Joshua M. Horstman, Nested Loop Consulting
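As a simple sketch of the technique (using the SASHELP.CARS sample data, not an example from the workshop), PROC SQL builds the macro variable list and %SCAN iterates over it, so the data values themselves drive the generated code:

```sas
/* Gather the distinct values into one space-delimited macro variable */
proc sql noprint;
   select distinct origin into :origins separated by ' '
   from sashelp.cars;
quit;

/* Generate one PROC MEANS step per value in the list */
%macro report_by_origin;
   %local i org;
   %let i = 1;
   %do %while (%scan(&origins, &i) ne );
      %let org = %scan(&origins, &i);
      title "Cars with origin: &org";
      proc means data=sashelp.cars mean maxdec=0;
         where origin = "&org";
         var msrp;
      run;
      %let i = %eval(&i + 1);
   %end;
%mend report_by_origin;
%report_by_origin
```

If a new origin value appears in the data, the program picks it up automatically with no code change.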
Session 5069-2020
Using SAS® OpRisk Global Data to Improve Decision-Making at a Bank
The management of financial losses is crucial as banks are required to set aside regulatory capital to absorb unexpected losses. Banks also need to calculate economic capital to ensure solvency according to their own risk profile. The main financial risks faced by banks are market, credit, and operational risk. Operational risk, the focus here, includes fraud, improper business practices, and so on. Barings Bank's loss of over USD 1 billion due to rogue trading activities is an extreme example of such risk. In order to calculate capital to withstand this risk, the aggregate distribution of expected losses for the next year is determined. The extreme quantiles of this distribution are of specific interest. For instance, a bank should hold capital to survive a one-in-a-thousand-year aggregate operational loss (the 99.9% VaR of the distribution). Companies often have only limited internal data available to accurately model the distribution and therefore use external sources and scenario assessments to supplement their data. Combining the internal data of a given bank with external data is challenging, as such data is collected from differently sized institutions in various regions. This might impact the estimated loss distribution. In this paper, we use SAS® OpRisk Global Data to show how external and internal data can be integrated for use in the capital modeling process. We also suggest measures to challenge experts to adjust scenario assessments based on historical data.
Mentje Gericke and Helgard Raubenheimer, Centre for Business Mathematics and Informatics®, North-West University
Session 5077-2020
Using SAS® to Pick the Right Report
The equity project at Central Piedmont Community College originally consisted of one report per new student cohort; the report was generated from historical information obtained over a three-year period. In order to offer information on a more frequent and timely basis, the original report has been split into a series of mini-reports: first term, first year, first term in second year, second year, and third year outcome. Selecting the appropriate report for a cohort depends on both the cohort start date and the date of code execution. SAS® software can be used to automatically analyze, select, and produce the appropriate report(s) through the use of macro functions (%IF %THEN, %DO %END, and %INCLUDE) within a larger macro. This presentation includes a logic map and sample output in addition to the code.
Kelly Smith, Central Piedmont Community College
Session 4270-2020
Using SAS® Visual Analytics Reports for Operational Reporting in Clinical Trials
Our presentation shows how to use SAS® Visual Analytics to implement data visualization for operational reports in clinical trials. In clinical trials, operational reports are often used to analyze and monitor site characteristics and performance metrics as well as patients' enrollment, safety, and follow-up. Data visualization can track the site and enrollment information in a dynamic and interactive way, and provide significant time savings with no decrease in accuracy compared to traditional static reporting. At the Duke Clinical Research Institute, we have successfully implemented data visualization for operational reports using SAS Visual Analytics. In our operational reports, we use SAS Visual Analytics Geo Maps to monitor site enrollment distribution. In our Geo Maps, we add display rules to identify at-risk sites, and time series to show site activation/enrollment over time. We add a hierarchy (patients' status) on pie charts so we can track patients' status from screening to end of study by drilling down into the pie charts. We also add drop-down lists to subset enrollment/site reports, so the reports can be used to compare sites by milestones, or enrollment by desired frequency (monthly/weekly). We use interactions to link aggregated reports (line chart, bar chart, pie chart, and so on) and detailed listing tables, so for any interesting summary information, we can easily identify the individual patient/site level information.
Jun Wen and Jack Shostak, Duke Clinical Research Institute
Session 4901-2020
Using SAS9 API and R to Create Violin Plots, Interactive 3D Plots, and a Shiny App for SAS® Data Sets
Open-source tools are extremely popular within the data science community, and the R language is one of them. While SAS® Viya® allows easy collaboration between SAS® and open-source languages like R and Python using the HTTP protocol, SAS® 9 lacks this feature. To address this issue, we designed our SAS9API solution. Among other options, it enables you to load SAS data into R. In this presentation, we look at different open-source data visualization techniques: violin plots, interactive 3D plots, and even more interactive Shiny apps. Step by step, we guide you through how to visualize SAS data sets using basic R code.
Olga Pavlova, Analytium
Session 5141-2020
Using the ODS Report Writing Interface to Streamline Publication of Existing Reports with Complex Tables
This paper examines the SAS® Output Delivery System (ODS) Report Writing Interface (RWI) to streamline the publication process of "Health, United States", an annual report on health and health care in the United States. The report’s tables contain multiple nested categories within rows and columns and are suitable for printing (PDF) and downloading (Microsoft Excel). The publication process from SAS datasets currently involves several labor-intensive steps. In addition, Section 508 of the Rehabilitation Act of 1973 requires all federal agency website content to be accessible to people with disabilities. The process of manually labeling content for screen readers ("tagging") presents an additional challenge for these complex nested tables. The publishing process is being automated and the output delivery is being enhanced using the RWI. The ODS templates and Cascading Style Sheets are used to standardize the format of the output and varied formatting using conditional processing. This approach maintains the existing output formats while limiting the manual processing, decreasing production time, and increasing quality. This approach allows automated production of a variety of file formats including PDF, HTML5, and Excel. Using the RWI and SAS accessibility features, most of the manual Section 508 compliance task of the output should be eliminated. This paper will highlight these proposed changes for production of “Health, United States 2017.”
James Brittain and Barnali Das, National Center for Health Statistics (CDC)
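A minimal RWI sketch (the cell contents here are invented; the report's real tables are far more complex): the ODSOUT object builds a table cell by cell inside a DATA _NULL_ step, which is what makes the conditional formatting described above possible:

```sas
ods pdf file="example.pdf";

data _null_;
   declare odsout t();            /* instantiate the RWI object */
   t.table_start();
     t.row_start();
       t.format_cell(data: "Measure", overrides: "fontweight=bold");
       t.format_cell(data: "Value",   overrides: "fontweight=bold");
     t.row_end();
     t.row_start();
       t.format_cell(data: "Life expectancy at birth (years)");
       t.format_cell(data: 78.6, format: "5.1");
     t.row_end();
   t.table_end();
run;

ods pdf close;
```

Because each cell is emitted programmatically, DATA step logic can vary formats, spanning, and styles row by row instead of relying on manual post-processing.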
Session 4156-2020
Using the Optgraph Procedure to Construct Closed Knight's Tours on Standard and Variant Chessboards
A closed knight's tour of a chessboard uses legal moves of the knight to visit every square exactly once and return to its starting position. The closed knight's tour is a recreational variation of the well-known Traveling Salesman Problem (TSP). Gerlach uses Warnsdorff's heuristic technique and SAS® to produce closed knight's tours for the traditional 8 × 8 board. Generalizing the same approach, Gerlach and Gerlach produce closed knight's tours for the n × 8 × 8 chessboard for n = 2, 3, 4, 5, 6 and theorize that the technique works for larger values of n as well. In 1991, Schwenk completely classified the n × m rectangular chessboards that admit a closed knight's tour. In 2011, DeMaio and Mathew completely classified the i × j × k rectangular prism chessboards that admit a closed knight's tour. This paper demonstrates how to utilize PROC OPTGRAPH with the TSP option to construct closed knight's tours for rectangles, rectangular prisms, hexagons, and other variant chessboards.
Joe DeMaio and Md Shafiul Alam, Kennesaw State University
Session 4170-2020
Using the R Interface in SAS® to Call R Functions and Transfer Data
Starting in SAS® 9.3, the R interface enables SAS users on Windows and Linux who license SAS/IML® software to call R functions and transfer data between SAS and R from within SAS. Potential users include SAS/IML users and other SAS users who can use PROC IML simply as a wrapper to transfer data between SAS and R and call R functions. This paper provides a basic introduction and some simple examples. The focus is on SAS users who are not PROC IML users, but who want to take advantage of the R interface.
Bruce Gilsen, Federal Reserve Board, Washington, DC
V
Session 4826-2020
Variable Selection Using Random Forests in SAS®
Random forests are an increasingly popular statistical method of classification and regression. The method was introduced by Leo Breiman in 2001. A good prediction model begins with a great feature selection process. This paper proposes ways of selecting important variables to be included in the model using random forests. The variables to be included in the model are indexed, or ranked, according to the importance score of each variable. The comparison of performance between random forest models (variables selected by the random forest method) and logistic regression models (variables selected by the stepwise method) is demonstrated.
Denis Nyongesa, Kaiser Permanente Center for Health Research
Session 5000-2020
Visual Data Insights: Image and Ranked Precise Values for Quick, Easy, and Reliable Inference
This paper provides eight widely applicable graphic examples and the embodied design principles for effective visual communication that apply to any graph and any software. Learn how to deliver the best results with Output Delivery System (ODS) Graphics when doing trend or time series plots, horizontal or vertical bar charts, pie charts, and choropleth maps (maps in which geographic unit areas are color-coded to identify their category or data range). As extras, two easy-to-create and easy-to-interpret special cases are included with exhibits and code in a handout, along with a reference list of design guidelines. Even if you are just a requestor or viewer of other people's data visualization, come to learn what to ask for and what to expect. This paper assumes no prior knowledge of ODS Graphics.
LeRoy Bessler, Bessler Consulting and Research
Session 5118-2020
Visualizing America: SAS Graphical Depictions utilizing the American Community Survey (ACS 2018)
SAS provides an array of powerful tools to analytically examine and graphically depict large-scale data sets. This project explored the PROC FREQ functionality to produce mosaic plots for categorical data. Mosaic plots, as described by Hartigan and Kleiner (1981), are formed when numbers in a contingency table are represented by rectangles of areas proportional to the numbers, with shape and position rendered to expose deviations from independence models. The resulting visual depiction (a collection of rectangles for the contingency table) is called a mosaic (Hartigan and Kleiner 1981; Friendly 2001). The colors and patterns displayed within the mosaic plots illustrate and define the relationships displayed by the categorical variable values. The American Community Survey (ACS) is conducted annually by the United States Census Bureau. The four major survey sections of the ACS consist of social, housing, economic, and demographic data. Within these four sections, the ACS yields rich, personal data covering more than 40 topic areas, such as educational level of attainment, commute time to work, type of housing, and ethnicity. By combining SAS® functionality with varying combinations of categorical information from a large-scale, national data set, mosaic plots were generated to create a visual portrait-in-time of the ACS respondents. This work thus provides a contemporary window into the daily lives of millions of Americans across our country.
Wendy Dickinson, University of South Florida
Session 4335-2020
Visualizing Geographical Data with a Tile Grid Map in SAS
The tile grid map is an increasingly popular tool to visualize statistical data, such as the US population change by state, on a map. It mimics an actual map with a set of equal squares in a rectangular grid. For example, you can make a tile grid map of the United States with each square representing an individual state. Unlike the choropleth map, the tile grid map does not show the perception bias that favors larger regions. Furthermore, the squares are well-suited for laying out "rubber stamp" graphs with subsetted data to create map-based small multiples. You can use the small multiples to effectively compare and analyze data in different regions of the map. Although the Graph Template Language (GTL) does not directly support this visualization, you can easily make one in SAS with a combination of the DATA step, the SQL and SUMMARY procedures, and the SAS® ODS Graphics procedures (often called the Statistical Graphics procedures). This paper uses examples to show you how to systematically create a variety of tile grid maps that include time series and infographics.
Lingxiao Li, SAS Institute
Session 4403-2020
Vroom Vroom! Fast-Track Your Organization onto SAS® Visual Analytics in SAS Viya®
As a business leader new to SAS® Visual Analytics in SAS® Viya®, you might be wondering, "How do I get my team up and running quickly so that we can uncover the insights that we need?" You might have some of these questions: How is data accessed and refreshed? How should the data be structured? What security is available? How would I implement a development process and workflow? How do I convert my team from what we are using now? What else do I need to be thinking about? Classes, documentation, and videos teach how to use the SAS Visual Analytics software, much like driver's education teaches how to drive a car. This session builds off that content. You learn how to plan your route, organize your trip, set up traffic rules, and even make your car "self-driving." Your driving instructor is a SAS® consultant who is on the road with new users every day. Although these best practices have been developed with government and public agencies, they apply to any businesses new to SAS Visual Analytics, especially those in small to medium enterprises.
Cathy Warner, SAS
W
Session 5200-2020
What Happens After Police Shootings?
In the United States, the use of lethal force by police officers has come under extreme scrutiny. Police Departments have instituted new programs such as the wearing of body cameras and recruiting social workers to work together with on-duty officers. Due to limited resources, these cannot be instituted by all departments; therefore, it is important to make data-driven decisions to make the most of those resources. This paper explores the possibility of underlying biases and prejudices that are leading to officers not being held accountable. To accomplish this, police shooting data and the results of the follow-on investigations were examined. The goal was to identify if there is a predictive model to evaluate if a police officer will face Grand Jury Indictment based on the attributes within the dataset. The police disposition (which specifies if an officer faced a Grand Jury Indictment or not) was used as the target variable. The Decision Tree model that performed best identified the most important variables which were the victim's State, County, Department Involved, Mental Impaired, Age group, Race, and Gender. The result of the Decision Tree model had a 35.86% misclassification rate. This model can be used to predict the outcomes of police shootings and possibly be used by a government oversight committee, such as the US Justice Department, to investigate those counties and states that have abnormally high perceived unjustified shootings.
Aravind Dhanabal, Mason Kopasz, Alex Lindsay, Oklahoma State University
Session 4104-2020
What's Bugging You? Find Out with the Interactive SAS Code Debugger
Life's too short to put up with buggy code. Join us as we show you how to use the new SAS® Code Debugger to debug SAS® code in the SAS Function Compiler (FCMP) procedure and in the SAS® Cloud Analytic Services FCMP action set. SAS Code Debugger is a full-featured, interactive debugger that runs on SAS® Viya® 3.5 and SAS® 9.4M6. The debugger is available as a stand-alone tool, and it is also built into SAS products such as SAS® Model Implementation Platform. With SAS Code Debugger, you can discover programming problems, improve code quality, and debug complex models with ease. You can also set breakpoints, watch variables, and step into nested functions and subroutines. This paper introduces SAS Code Debugger, shows you tips for debugging SAS code, and gets you up and running quickly.
Jenna Austin, Toshiba Burns-Johnson, Aaron Mays, and Mike Whitcher, SAS Institute
Session 4079-2020
What's New in SAS Data Management
The latest releases of SAS® Data Management software provide a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types including Apache Hadoop, cloud data sources, relational database management system (RDBMS), files, unstructured data, images, and streaming data, with the ability to perform extract, transform, load (ETL) and extract, load, transform (ELT) transformations in diverse run-time environments including SAS®, Hadoop, Spark, SAS® Analytics, cloud, and data virtualization environments. The SAS Data Management offering has been enhanced to include integration with SAS® Studio, including a new ETL flow building capability that can be used to build reusable data flow processes. Enhancements have also been added to leverage analytics and artificial intelligence (AI) to help automate data management tasks. This paper provides an overview of the latest features of the SAS Data Management product suite and includes use cases and examples for leveraging product capabilities.
Nancy Rausch, SAS Institute
Session 4449-2020
What's New in SAS® Drive?
Do you use SAS® Drive to organize and share content with your colleagues? Did you know you can customize SAS Drive so that the items you want to see are easy to find? In all versions of SAS Drive, you can add items to Quick Access. In SAS Drive 2.2, you can reorder the items in Quick Access through a simple drag-n-drop, and you can get recommendations for what to add to your Quick Access area. In all versions of SAS Drive, you can hide and show object tabs. In SAS Drive 2.2, you can make any folder in your organization's folder structure into a tab - so the information you care about most is at your fingertips. To help you visually identify your favorite items, we've added stars to the tiles, and to help you adjust SAS Drive so that it's most useful to you, you can now resize the Information Pane. Along with these personal modifications, you can close all your search tabs with one action, you can preview a PDF file, you can copy a report link, and you can see who shared an item with you. We've also extended the SAS Drive experience to SAS Visual Analytics. When SAS Visual Analytics has no open reports, you see a mini version of SAS Drive, limited to reports. You can sort, view, and search for reports in SAS Visual Analytics the same way you do in SAS Drive. And finally, SAS Drive comes with a product tour that is offered to you the first time you sign in and available to you at any time.
Cheryl L. Coyle and Scott P. Leslie, SAS Institute
Session 4078-2020
What's New in SAS® Visual Analytics? Smart Business Intelligence, Smart Analytics
The new SAS® Visual Analytics provides a richer reporting experience with the combined editor and viewer. It provides better suggestive user assistance that is embedded in the user's flow of working with data and content. It gives the report author the ability to provide exploring features for report consumers. There are more ways to interact with the visuals. Sharing and reuse are made easier. The automated explanation object allows for enhanced storytelling with analytics.
Rajiv Ramarajan and Mark Malek, SAS Institute
Session 4572-2020
What's New in SAS® Visual Data Mining and Machine Learning: From a Programmer's Perspective
The latest releases of SAS® Data Mining and Machine Learning software provide a comprehensive set of data mining and machine learning capabilities. The latest features in this product suite include enhanced principal component analysis through the new Kernel PCA action set, the new SEMISUPLEARN procedure to impute missing target labels in your training data automatically, and the new SPARSEML procedure for discovering insights from sparse data that now exist in many business domains. There are also many new features in existing action sets and procedures, such as the Image action set and the SVMACHINE procedure, and much more. This paper provides an overview of some hot topics and the latest features of the SAS Data Mining and Machine Learning product suite and includes use cases and examples that demonstrate how to take full advantage of product capabilities from the programmer's perspective.
Tao Wang, SAS Institute
Session 4660-2020
What's Your Favorite Color? Controlling the Appearance of a Graph
The appearance of a graph produced by the Graph Template Language (GTL) is controlled by Output Delivery System (ODS) style elements. These elements include fonts and line and marker properties as well as colors. A number of procedures, including the Statistical Graphics (SG) procedures, produce graphics using a specific ODS style template. This paper provides a very basic background of the different style templates and the elements associated with them. However, the default style associated with a particular destination sometimes does not produce the desired appearance. Instead of using the default, you can control which style is used by indicating the desired style on the ODS destination statement. Sometimes, though, not a single one of the 50-plus styles provided by SAS® achieves the desired look. Luckily, you can modify an ODS style template to meet your own needs. One such style modification is to control which colors are used in the graph. Different approaches to modifying a style template to specify the colors used are discussed in depth in this paper.
Richann Watson, DataRich Consulting
Session 5009-2020
Who Are You When Playing World of Warcraft? An Analysis of Player Demographics and Social Behavior
Online, you can become whoever you want to be. Some take this as an opportunity to completely rewrite their identity, while others are no different than they would be in person. However, we as a species are fond of looking for patterns, and so this leads to a culture, particularly in video games, of stereotyping in an attempt to better know the people with whom you are socializing. Avid gamers and data scientists naturally wonder about the amount of truth in these assumptions. World of Warcraft (WoW) is one of the largest massively multiplayer online role-playing games (MMORPGs) of all time, an ever-expanding experience and social environment made for long-term playing and not limited to a single storyline or campaign that is completed. This means that players can use the same character for years, which is a more serious endeavor than playing a character for only 20 to 50 hours. World of Warcraft can be a very social game, with guilds as an in-game association of characters, controlled by players and formed to make finding groups for in-game achievements easier, as well as to form social relationships with other players. This study goes beyond simple demographics of gender and age (which are stereotypically thought to be skewed toward young, single males) to examine the way people play WoW and the potential correlations and differences with players' deeper demographics, such as relationship status, sexual orientation, and relationships in the real world with other WoW players.
Alyssa Venn and Joe DeMaio, Kennesaw State University
Session 4716-2020
Why is it taking so long!? Utilizing query optimization to pull from external sources
Simply receiving data from a big data repository can be rewarding, but designing that query to run quickly is an art. Here at IDEXX Laboratories, we've developed a three-tiered system to optimize our queries to run quickly and consistently when pulling from external data sources: 1) Utilizing pass-through queries: Pushing the processing load to the external database means that we don't need to pull the breadth of the table into SAS® if we only want a small subsection of data. 2) Partitioning queries: Rather than pulling data for a large range of time (such as a year), breaking the query down into bite-sized pieces can help speed up processing time and avoid crashing your environment. 3) Implementing sub-queries: If you don't need every column in a table, why bring it into the query? Minimizing the number of columns required by building sub-queries will create a lean solution to speed up your query. Implementing these three solutions has helped our company drastically improve our query run time, minimized the load on our servers, and allowed us to get more data to our business partners faster. We are drastically improving productivity by using some of these query optimization techniques everywhere we can.
Kurt Stultz, IDEXX
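The partitioning tactic described above (tip 2) can be sketched generically. The helper below is a hypothetical illustration in Python, not code from the paper, of breaking a large date range into month-sized windows that can each drive one bite-sized pass-through query:

```python
from datetime import date

def month_partitions(start, end):
    """Split the half-open range [start, end) into month-sized (lo, hi)
    sub-ranges, one per query partition."""
    parts = []
    lo = start
    while lo < end:
        # First day of the following month (rolling the year over in December)
        hi = date(lo.year + (lo.month == 12), lo.month % 12 + 1, 1)
        parts.append((lo, min(hi, end)))
        lo = hi
    return parts

# Each (lo, hi) pair would parameterize one pass-through query,
# e.g. WHERE event_date >= lo AND event_date < hi
windows = month_partitions(date(2020, 1, 1), date(2020, 4, 1))
```

Each window then bounds one small query instead of a single year-long pull, which is the "bite-sized pieces" idea the abstract describes.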
Session 4121-2020
Wildcarding in WHERE Clauses
Wildcarding is allowed within WHERE clause LIKE expressions by using special characters. But if you want to search for those specific characters themselves, you have to escape them. If you forget to escape the character, you'll get unexpected matches.
David Horvath, CCP, MS
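The escaping pitfall described above is not unique to SAS; as a minimal sketch, here it is reproduced with SQLite from Python (the table and values are hypothetical, standing in for a SAS WHERE ... LIKE expression):

```python
import sqlite3

# Tiny in-memory table of names to match against
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT)")
conn.executemany("INSERT INTO files VALUES (?)",
                 [("report_2020",), ("reportX2020",), ("summary",)])

# '_' is a single-character wildcard, so this pattern unintentionally
# matches 'reportX2020' as well as the literal 'report_2020'
broad = conn.execute(
    "SELECT name FROM files WHERE name LIKE 'report_2020'").fetchall()

# Escaping the underscore restricts the match to the literal character
exact = conn.execute(
    "SELECT name FROM files WHERE name LIKE 'report\\_2020' ESCAPE '\\'"
).fetchall()
```

The unescaped pattern returns two rows; the escaped one returns only the literal match, which is exactly the "unexpected matches" problem the abstract warns about.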
Session 4693-2020
Winning Tactics for SAS® Visual Data Mining and Machine Learning: Why it's the Only Skill You Need
The ability to rapidly identify actionable data insights determines today's business success. SAS® Visual Data Mining and Machine Learning on SAS® Viya® is a one-stop shop that enables both technical and non-technical stakeholders to complete that task with ease. From data exploration to machine learning, SAS Visual Data Mining and Machine Learning enables analysts to load data and find the needed answers quickly and efficiently. In this workshop, users will gain hands-on experience using SAS Visual Data Mining and Machine Learning as you walk through an analytics project from start to finish. Users will learn how to import and explore the data, build several predictive models, and then determine a champion model to answer business questions. Users will walk away with a baseline knowledge of the application and the various control mechanisms it provides. For a fun twist on common everyday business problems, this workshop uses college basketball data to predict the outcomes for tournament games.
Chris St. Jeor, Zencos
Session 5115-2020
Work Smarter, Not Harder: Learning to Live without Your X
Within the SAS® community, a great deal of the population are experts in their own fields, but those aren't always technical roles. After supporting hundreds of users over a number of years, we realized there is one question that always comes up - how do we live without our X? With X-commands, pretty much anything is possible, and that unlimited potential makes both administrators and auditors nervous. As environments evolve and controls tighten, the loss of X-commands is all but inevitable for most teams; but never fear, there are alternatives! This paper will discuss how to replace the following: getting file listings; making and deleting directories and files; changing file permissions; Microsoft Windows compression. By harnessing the power of built-in SAS functions as well as other functionality, we are able to replicate much of what users have lost out on when faced with X-commands being disabled, enabling users to finally move on from their X.
Amit Patel and Lewis Mitchell
Session 4319-2020
Write Custom Parallel Programs by Using the iml Action
This paper introduces the iml action, which is available in SAS® Viya® 3.5. The iml action supports most of the same syntax and functionality as the SAS/IML® matrix language that SAS® software has supported for decades. With minimal changes, most programs that run in the IML procedure can also run in the iml action. In addition, the iml action supports new programming features for parallel programming. The iml action is different from most actions, which perform a specific task. The iml action provides a set of general programming tools that you can use to implement a custom parallel algorithm. The programmer can control the computation itself and can control how the computation is distributed among nodes and threads on a cluster of machines (or threads on a single machine). The parallel programming features are demonstrated by using examples of simulation, power estimates, and scoring regression models.
Rick Wicklin and Arash Dehghan Banadaki, SAS Institute
Session 4659-2020
Write SAS to Generate SAS: Three Code-Generating Techniques for Many Automation Solutions
While SAS has no shortage of automation techniques, it does not always occur to SAS programmers to write SAS code that generates SAS code. This paper introduces three SAS techniques capable of generating SAS code from driver data sets. Driver data sets can be external specification files, the readily available SAS dictionary metadata, or even the data values in the original data sets. Through real-world examples, we demonstrate that code generators can be effective tools in many applications that involve either repetitive programming or immature and changing data. Additionally, this paper provides programming tips and compares the three techniques on simplicity, flexibility, and efficiency.
Yun (Julie) Zhuo, PRA Health Sciences
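The driver-data-set idea described above can be sketched in miniature. The following Python snippet is a hypothetical illustration (variable names and the emitted SAS-like statements are invented, not taken from the paper): a small specification table drives a generator that writes one statement per row, so new rows in the driver produce new code without hand-editing:

```python
# Hypothetical driver "data set": one row per recode to generate
driver = [
    {"var": "sex", "old": "'M'", "new": "1"},
    {"var": "sex", "old": "'F'", "new": "2"},
]

def generate_recode_statements(driver):
    """Emit one IF/THEN-style statement per driver row."""
    return [f"if {d['var']} = {d['old']} then {d['var']}_n = {d['new']};"
            for d in driver]

# The joined text is the generated program, ready to be included/executed
code = "\n".join(generate_recode_statements(driver))
```

Adding a row to `driver` adds a statement to `code`, which is the core of the repetitive-programming use case the abstract describes.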
Session 4726-2020
Writing Data Quality rules using SAS Meta-programming
Data quality and data governance are important for any organization's data office, considering the major impact of bad data on all downstream applications like regulatory reporting and advanced analytics. There are many tools in the market to assess and govern the data quality of data warehouses and data marts. This presentation explains how to implement a data quality check for dimensional risk data marts using the SAS® meta-programming technique. Meta-programming is a way of programming in which a program is used to write another program. In SAS this can be easily achieved by using SAS macros and metadata tables. In our presentation, we demonstrate how you can store and maintain the data quality rules in one metadata table and run the rules on various tables or extracts in a data mart using a SAS metadata-driven approach. Data quality rules are applied to multiple extracts that are created by joining dimension and fact tables. The final report provides a list of variables with rule name, description, and percentage of error as per the rule. With all rules in a single table, you can achieve more flexibility in rule writing, maintenance, and process automation. This presentation is intended for data science and data engineering professionals with an intermediate level of SAS expertise.
Shreyas Dalvi, Suntrust Banks
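The pattern described above (rules stored in one metadata table, applied to each extract, reported as percentage of error per rule) can be sketched outside SAS. The Python snippet below is a hypothetical illustration under invented rule and column names, not the presentation's macro code:

```python
# Hypothetical rules "metadata table": each row names a rule, the column
# it checks, and a predicate that returns True for a passing value
rules = [
    {"rule": "non_negative_balance", "column": "balance",
     "check": lambda v: v is None or v >= 0},
    {"rule": "rating_populated", "column": "rating",
     "check": lambda v: v not in (None, "")},
]

# Hypothetical extract (in SAS this would come from joining
# dimension and fact tables)
extract = [
    {"balance": 100.0, "rating": "AA"},
    {"balance": -5.0, "rating": ""},
]

def run_rules(rows, rules):
    """Return one report row per rule: rule name and percentage of error."""
    report = []
    for r in rules:
        errors = sum(1 for row in rows if not r["check"](row[r["column"]]))
        report.append({"rule": r["rule"],
                       "pct_error": 100.0 * errors / len(rows)})
    return report

report = run_rules(extract, rules)
```

Because the rules live in one table, adding or amending a check means editing a data row rather than the checking program, which is the flexibility the abstract claims for the metadata-driven approach.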
Session 5019-2020
Writing Reliable SAS® Programs
This presentation offers simple, reliable techniques that can help SAS® programmers write reliable code. Before diving into the techniques, let me tell you why this is important. Reliable code can be depended upon to deliver expected functionality and performance in a consistent, repeatable manner for a specified duration. Reliability is application-specific and depends largely on the code written and very little on the language itself. Because unreliable code is considered worthless or untrustworthy, reliability is considered paramount among coding requirements such as portability, efficiency, and performance. Reliable code handles failure very well. In this presentation, we discuss a few possible scenarios of failure and how to handle them. Summary of scenarios: archiving data sets of the previous cycle before executing the current cycle; not running the job if required input files are not available; when an error is encountered, aborting the program and sending an email to the user with the log file; handling a change of column type while reading data from an Excel file using PROC IMPORT; checking the observation count when performing joins (if observations = 0, aborting the program or implementing custom logic); using AuthDomain instead of user IDs and passwords when connecting to ODBC; using global macro variables without initialization; not generating a report if the data is not available and continuing with the next report; replacing special characters in column names with underscores; and replacing "." with 0 when performing division.
Balraj Pitlola and Venkata Karimiddela, Core Complete
Y
Session 4224-2020
You are using PROC GLM too much (and what you should be using instead)
Ordinary least squares regression is one of the most widely used statistical methods. However, it is a parametric model and relies on assumptions that are often not met. Alternative methods of regression for continuous dependent variables relax these assumptions in various ways. This paper explores procedures such as QUANTREG, ADAPTIVEREG, and TRANSREG for these data.
Peter Flom
Session 4195-2020
Your Data Will Go On: Practice for Character Data Migration
With the rapid advancement of technology, you inevitably face continual upgrades and changes in your working platforms. No matter how greatly the platform evolves, your work can maintain high continuity and reliability if your data is successfully ported without being destroyed. Character data can be at risk during data migration because of its internationalization features, such as encoding sensitivity and semantics dependence. This paper discusses the potential issues when moving your character data across environments, such as a migration to SAS® with UTF-8 or other encoding environments. It also demonstrates the use of common tools during the migration, such as the character variable padding (CVP) engine. By using these tools flexibly, no matter how the environment changes, your data will go on.
Edwin (You) Xie, SAS Institute
Z
Session 4123-2020
Zen and the Art of Problem Solving
Although software development is taught as a STEM discipline out of science or engineering schools, it is as much an art or craft - a creative process - as a science. This presentation focuses on innovative problem-solving techniques: the tools and techniques to use when your normal process just doesn't seem to get you to a solution. Much of the information in this talk is based on Robert Pirsig's Zen and the Art of Motorcycle Maintenance, which, although it focuses on motorcycles, applies to all kinds of problem spaces (and Pirsig was a tech writer for IBM). These techniques have served me well over the years. The difference between art and science approaches is actually supported by the way the brain works.
David B. Horvath, MS, CCP