SAS Global Forum 2017 Proceedings

Analyzing big data and visualizing trends in social media is a challenge that many companies face as large sources of publicly available data become accessible. While the sheer size of usable data can be staggering, knowing how to find trends in unstructured textual data is just as important an issue. At a big data conference, data scientists from several companies were invited to participate in tackling this challenge by identifying trends in cancer using unstructured data from Twitter users and presenting their results. This paper explains how our approach using SAS^® analytical methods was superior to other big data approaches in investigating these trends.

Read the paper (PDF)

Migration to a SAS^® Grid Computing environment provides many advantages. However, such migration might not be free from challenges especially considering users' pre-migration routines and programming practices. While SAS^® provides good graphical user interface solutions (for example, SAS^® Enterprise Guide^®) to develop and submit the SAS code to SAS Grid Computing, some situations might need command-line batch submission of a group of related SAS programs in a particular sequence. Saving individual log and output files for each program might also be a favorite routine in many organizations. SAS has provided the SAS Grid Manager Client Utility and SASGSUB commands to enable command-line submission of SAS programs to the grid. However, submitting a sequence of SAS programs in a conventional batch program style and getting individual logs and outputs for them needs a customized approach. This paper presents such an approach. In addition, an HTML and JavaScript tool developed in-house is introduced. This tool automates the generation of a SAS program that almost emulates a conventional scenario of submitting a batch program in a command-line shell using SASGSUB commands.

Read the paper (PDF)

This poster shows how to predict a past-due amount using traditional and machine learning techniques: logistic analysis, k-nearest neighbors, and random forest. The data set that was analyzed is about real-world commerce. It contains 305 categories of financial information from more than 11,787,287 unique businesses, from 2006 to 2014. The big challenge is how to handle the big and noisy real-world data sets. The first step of any model-building exercise is to define the outcome. A common prediction method in the financial services industry is to use binary outcomes, such as Good and Bad. For our research problem, we reduced past-due amounts into two cases, Good and Bad. Next, we built a two-stage model using the logistic regression method; that is, the first stage predicts the likelihood of a Bad outcome, and the second predicts a past-due amount, given a Bad outcome. Logistic analysis as a traditional statistical technique is commonly used for prediction and classification in the financial services industry. However, for analyzing big, noisy, or complex data sets, machine learning techniques are typically preferred to detect hard-to-discern patterns. To compare the traditional and machine learning techniques, we use predictive accuracy, ROC index, sensitivity, and specificity as criteria.
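
Three of the comparison criteria named above (accuracy, sensitivity, and specificity) can be illustrated with a short Python sketch. This is not the poster's SAS code: the function name `classification_metrics`, the toy labels, and the choice of Bad as the positive class are assumptions made for illustration only.

```python
def classification_metrics(actual, predicted, positive="Bad"):
    """Compute accuracy, sensitivity, and specificity for a binary outcome.

    Assumes both classes occur in `actual`; otherwise a division by zero
    would occur in the sensitivity or specificity calculation.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true Bad cases caught
        "specificity": tn / (tn + fp),  # share of true Good cases kept
    }


# Hypothetical outcomes for eight businesses.
actual = ["Bad", "Bad", "Good", "Good", "Good", "Bad", "Good", "Good"]
predicted = ["Bad", "Good", "Good", "Good", "Bad", "Bad", "Good", "Good"]
metrics = classification_metrics(actual, predicted)
```

Running each candidate model's predictions through the same function gives a like-for-like comparison on identical criteria.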

Automatic loading, tracking, and visualization of data readiness in SAS^® Visual Analytics is easy when you combine SAS^® Data Integration Studio with the DATASET and LASR procedures. This paper illustrates the simple method that the University of North Carolina at Chapel Hill (Enterprise Reporting and Departmental Systems) uses to automatically load tables into the SAS^® LASR Analytic Servers, and then store reportable data about the HDFS tables created, the LASR tables loaded, and the ETL job execution times. This methodology gives the department the ability to longitudinally visualize system loading performance and identify changes in system behavior, as well as providing a means of measuring how well we are serving our customers over time.

Read the paper (PDF)

A propensity score is the probability that an individual will be assigned to a condition or group, given a set of baseline covariates when the assignment is made. For example, the type of drug treatment given to a patient in a real-world setting might be non-randomly based on the patient's age, gender, geographic location, and socioeconomic status when the drug is prescribed. Propensity scores are used in many different types of observational studies to reduce selection bias. Subjects assigned to different groups are matched based on these propensity score probabilities, rather than matched based on the values of individual covariates. Although the underlying statistical theory behind the use of propensity scores is complex, implementing propensity score matching with SAS^® is relatively straightforward. An output data set of each subject's propensity score can be generated with SAS using PROC LOGISTIC. And, a generalized SAS macro can generate optimized N:1 propensity score matching of subjects assigned to different groups using the radius method. Matching can be optimized either for the number of matches within the maximum allowable radius or by the closeness of the matches within the radius. This presentation provides the general PROC LOGISTIC syntax to generate propensity scores, provides an overview of different propensity score matching techniques, and discusses how to use the SAS macro for optimized propensity score matching using the radius method.

Read the paper (PDF)
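
As a rough illustration of radius matching on propensity scores, the Python sketch below pairs each treated subject with up to N controls whose scores fall within a maximum radius, taking the closest controls first. It is not the SAS macro the presentation describes: the function name `radius_match`, the data layout, and the greedy without-replacement strategy are assumptions, and the macro's optimization is more elaborate.

```python
def radius_match(treated, controls, radius, n_per_treated=1):
    """Greedy N:1 radius matching without replacement.

    treated, controls: dicts mapping subject id -> propensity score.
    Returns a dict mapping each treated id -> list of matched control ids.
    """
    available = dict(controls)  # controls that have not been used yet
    matches = {}
    for t_id, t_score in treated.items():
        # Keep only controls inside the radius, ordered by closeness.
        candidates = sorted(
            (abs(c_score - t_score), c_id)
            for c_id, c_score in available.items()
            if abs(c_score - t_score) <= radius
        )
        chosen = [c_id for _, c_id in candidates[:n_per_treated]]
        for c_id in chosen:
            del available[c_id]  # each control is matched at most once
        matches[t_id] = chosen
    return matches


# Hypothetical propensity scores.
treated = {"T1": 0.30, "T2": 0.62}
controls = {"C1": 0.28, "C2": 0.33, "C3": 0.60, "C4": 0.90}
result = radius_match(treated, controls, radius=0.05, n_per_treated=2)
```

Here T1 receives two controls and T2 only one, because only one control lies within the radius of T2. A greedy pass like this depends on the order in which treated subjects are processed, which is one reason an optimized approach can do better.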

Creating sophisticated, visually stunning reports is imperative in today's business environment, but is your fancy report really accessible to all? Let's explore some simple enhancements that the fourth maintenance release of SAS^® 9.4 made to Output Delivery System (ODS) layout and the Report Writing Interface that will truly empower you to accommodate people who use assistive technology. ODS now provides the tools for you to meet Section 508 compliance and to create an engaging experience for all who consume your reports.

Read the paper (PDF)

When a large and important project with a strict deadline hits your desk, it's easy to revert to those tried-and-true SAS^® programming techniques that have been successful for you in the past. In fact, trying to learn new techniques at such a time can prove to be distracting and a waste of precious time. However, the lull after a project's completion is the perfect time to reassess your approach and see whether there are any new features added to the SAS arsenal since the last time you looked that could be of great use the next time around. Such a post-project post-mortem has provided me with the opportunity to learn about several new features that will prove to be hugely valuable in the next release of my project. For example: 1) The PRESENV option and procedure 2) Fuzzy matching with the COMPGED function 3) The ODS POWERPOINT statement 4) SAS^® Enterprise Guide^® enhancements, including copying and pasting process flows and the SAS Macro Variable Viewer

Read the paper (PDF)

In this paper, a SAS^® macro is introduced that can search and replace any string in a SAS program. To use the macro, the user needs only to supply the search string and a folder. If the user wants to use the replacement function, the user also needs to pass the replacement string. The macro checks all of the SAS programs in the folder and subfolders to find out which files contain the search string. The macro generates new SAS files for replacements so that the old files are not affected. An HTML report is generated by the macro to include the original file locations, the line numbers of the SAS code that contain the search string, and the SAS code with search strings highlighted in yellow. If you use the replacement function, the HTML report also includes the location information for the new SAS files. The location information in the HTML report is created with hyperlinks so that the user can directly open the files from the report.

Read the paper (PDF) | View the e-poster or slides (PDF)
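
The search-and-replace workflow described above can be sketched in Python. This is an illustration under stated assumptions, not the paper's macro: the function name `search_and_replace`, its parameters, and the `.sas` file pattern are hypothetical, and the sketch returns a list of hits instead of producing an HTML report.

```python
import tempfile
from pathlib import Path


def search_and_replace(folder, search, replacement=None, out_dir=None):
    """Scan *.sas files under `folder` (recursively) for a search string.

    Returns a list of (path, line_number, line_text) hits. When both a
    replacement and an out_dir are given, modified copies are written under
    out_dir (which should lie outside `folder`); originals stay untouched.
    """
    hits = []
    folder = Path(folder)
    for path in sorted(folder.rglob("*.sas")):
        text = path.read_text(encoding="utf-8")
        file_hits = [
            (str(path), number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if search in line
        ]
        hits.extend(file_hits)
        if file_hits and replacement is not None and out_dir is not None:
            target = Path(out_dir) / path.relative_to(folder)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text.replace(search, replacement), encoding="utf-8")
    return hits


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    (src / "sub").mkdir(parents=True)
    (src / "a.sas").write_text("data x;\n  set old.lib;\nrun;\n", encoding="utf-8")
    (src / "sub" / "b.sas").write_text("proc print data=old.lib; run;\n", encoding="utf-8")
    out = Path(tmp) / "out"

    hits = search_and_replace(src, "old.lib", "new.lib", out)
    hit_lines = [(Path(p).name, n) for p, n, _ in hits]
    original_after = (src / "a.sas").read_text(encoding="utf-8")
    rewritten = (out / "sub" / "b.sas").read_text(encoding="utf-8")
```

Writing the modified copies to a separate directory mirrors the abstract's point that the old files are not affected.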

This paper introduces a macro that can generate the keyhole markup language (KML) files for U.S. states and counties. The generated KML files can be used directly by Google Maps to add customized state and county layers with user-defined colors and transparencies. When someone clicks on the state and county layers in Google Maps, customized information is shown. To use the macro, the user needs to prepare only a simple SAS^® input data set. The paper includes all the SAS code for the macro and provides examples that show you how to use the macro as well as how to display the KML files in Google Maps.

Read the paper (PDF)

Disseminating data to potential collaborators can be essential in the development of models, algorithms, and innovative research opportunities. However, it is often time-consuming to get approval to access sensitive data such as health data. An alternative to sharing the real data is to use synthetic data, which has similar properties to the original data but does not disclose sensitive information. The collaborators can use the synthetic data to make preliminary models or to work out bugs in their code while waiting to get approval to access the original data. A data owner can also use the synthetic data to crowdsource solutions from the public through competitions like Kaggle and then test those solutions on the original data. This paper implements, as a SAS^® macro, a method that generates fully synthetic data in a way that matches the statistical moments of the true data up to a specified moment order. Variables in the synthetic data set are of the same data type as the true data (for example, integer, binary, continuous). The implementation uses the linear programming solver within a column generation algorithm and the mixed integer linear programming solver from the OPTMODEL procedure in SAS/OR^® software. The COFOR statement in PROC OPTMODEL automatically parallelizes a portion of the algorithm. This paper demonstrates the method by using the Sashelp.Heart data set to generate fully synthetic data copies.

Read the paper (PDF)

Microsoft Windows 10 is a new operating system that is increasingly being adopted by enterprises around the world. SAS has planned to expand SAS^® Mobile BI, which is currently available on Apple iOS and Google Android, to the Microsoft Windows 10 platform. With this new application, customers can download business reports from SAS^® Visual Analytics to their desktop, laptop, or Microsoft Surface device, and use these reports both online and offline in their day-to-day business life. With Windows 10, users have the option of pinning a report to the desktop for quick access. This paper demonstrates this new SAS mobile application. We also demonstrate the cool new functionality on iOS and Android platforms, and compare them with the Windows 10 application.

Read the paper (PDF)

Specifying the functional form of a covariate is a fundamental part of developing a regression model. The choice to include a variable as continuous, categorical, or as a spline can be determined by model fit. This paper offers an efficient and user-friendly SAS^® macro (%SPECI) to help analysts determine how best to specify the appropriate functional form of a covariate in a linear, logistic, and survival analysis model. For each model, our macro provides a graphical and statistical single-page comparison report of the covariate as a continuous, categorical, and restricted cubic spline variable so that users can easily compare and contrast results. The report includes the residual plot and distribution of the covariate. You can also include other covariates in the model for multivariable adjustment. The output displays the likelihood ratio statistic, the Akaike Information Criterion (AIC), as well as other model-specific statistics. The %SPECI macro is demonstrated using an example data set. The macro uses PROC REG, PROC LOGISTIC, PROC PHREG, PROC REPORT, PROC SGPLOT, and other procedures in SAS^® 9.4.

Read the paper (PDF) | Download the data file (ZIP)

Visualization is a critical part of turning data into knowledge. A customized graph is essential to make data visualization meaningful, powerful, and interpretable. Furthermore, customizing grouped data into a desired layout with specific requirements such as clusters, colors, symbols, and patterns for each group can be challenging. This paper provides a start-from-scratch, step-by-step solution to create a customized graph for grouped data using SAS^® Graph Template Language (GTL). By analyzing the data and target graph with the available tools and options that GTL provides, this paper demonstrates that GTL is a powerful and flexible tool to create a customized, complex graph.

Read the paper (PDF)

Visualization is a critical part of turning data into knowledge. A customized graph is essential to make data visualization meaningful, powerful, and interpretable. Furthermore, customizing grouped data into a desired layout with specific requirements such as clusters, colors, symbols, and patterns for each group can be challenging. This paper provides a start-from-scratch, step-by-step solution to create a customized graph for grouped data using the Graph Template Language (GTL). From analyzing the data to creating the target graph with the tools and options that are available with GTL, this paper demonstrates that GTL is a powerful and flexible tool for creating a customized, complex graph.

Read the paper (PDF)

SAS^® functions provide amazing power to your DATA step programming. Some of these functions are essential; some of them help you avoid writing volumes of unnecessary code. This talk covers some of the most useful SAS functions. Some of these functions might be new to you, and they will change the way you program and approach common programming tasks. The majority of the functions described in this talk work with character data. There are functions that search for strings, and others that can find and replace strings or join strings together. Still others can measure the spelling distance between two strings (useful for 'fuzzy' matching). Some of the newest and most amazing functions are not functions at all, but CALL routines. Did you know that you can sort values within an observation? Did you know that not only can you identify the largest or smallest value in a list of variables, but you can identify the second- or third- or nth-largest or smallest value? A knowledge of the functions described here will make you a much better SAS programmer.

Read the paper (PDF)
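
To make the idea of a spelling distance between two strings concrete, here is a Python sketch of the classic Levenshtein edit distance (the number of single-character insertions, deletions, and substitutions needed to turn one string into the other). It is a generic stand-in chosen for illustration; the talk's SAS functions may weight edits differently, so their values need not agree with this one. The function name `levenshtein` and the sample strings are assumptions.

```python
def levenshtein(a, b):
    """Return the Levenshtein edit distance between strings a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute (or keep) the character
                )
            )
        prev = curr
    return prev[-1]


d_names = levenshtein("smith", "smyth")
d_words = levenshtein("kitten", "sitting")
```

In fuzzy matching, two values are typically treated as a candidate match when the distance falls below a threshold chosen for the data at hand.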

Accelerate your data preparation by having your DS2 execute without translation inside the Teradata database or on the Hadoop platform with SAS^® Code Accelerator. This presentation shows how easy it is to use SAS Code Accelerator via a live demonstration.

Read the paper (PDF)

Many organizations that use SAS^® Visual Analytics must conform to accessibility requirements such as Section 508, the Americans with Disabilities Act, and the Accessibility for Ontarians with Disabilities Act. SAS Visual Analytics provides a number of different ways to view reports, including the SAS^® Report Viewer and SAS^® Mobile BI native applications for Apple iOS and Google Android. Each of these options has its own strengths and weaknesses when it comes to accessibility; a one-size-fits-all approach is unlikely to work well for the people in your audience who have disabilities. This paper provides a comprehensive assessment of the latest versions of all SAS Visual Analytics report viewers, using Web Content Accessibility Guidelines (WCAG) 2.0 as a benchmark to evaluate accessibility. You can use this paper to direct the end users of your reports to the viewer that best meets their individual needs.

Read the paper (PDF) | Download the data file (ZIP)

SAS/ACCESS^® software grants access to data in third-party database management systems (DBMS), but how do you access data in DBMS not supported by SAS/ACCESS products? The introduction of the GROOVY procedure in SAS^® 9.3 lets you retrieve this formerly inaccessible data through a JDBC connection. Groovy is an object-oriented, dynamic programming language executed on the Java Virtual Machine (JVM). Using Microsoft Azure HDInsight as an example, this paper demonstrates how to access and read data into a SAS data set using PROC GROOVY and a JDBC connection.

Read the paper (PDF)

Monitoring server events to proactively identify future outages. Looking at financial transactions to check for money laundering. Analyzing insurance claims to detect fraud. These are all examples of the many applications that can use the power of SAS^® analytics to identify threats to a business. Using SAS^® Visual Investigator, users can now add a workflow to control how these threats are managed. Using the administrative tools provided, users can visually design the workflow that the threat would be routed through. In this way, the administrator can control the tasks within the workflow, as well as which users or groups those tasks are assigned to. This presentation walks through an example of using the administrative tools of SAS Visual Investigator to create a ticketing system in response to threats to a business. It shows how SAS Visual Investigator can easily be adapted to meet the changing nature of the threats the business faces.

Read the paper (PDF)

Hierarchical models, also known as random-effects models, are widely used for data that consist of collections of units and are hierarchically structured. Bayesian methods offer flexibility in modeling assumptions that enable you to develop models that capture the complex nature of real-world data. These flexible modeling techniques include choice of likelihood functions or prior distributions, regression structure, multiple levels of observational units, and so on. This paper shows how you can fit these complex, multilevel hierarchical models by using the MCMC procedure in SAS/STAT^® software. PROC MCMC easily handles models that go beyond the single-level random-effects model, which typically assumes the normal distribution for the random effects and estimates regression coefficients. This paper shows how you can use PROC MCMC to fit hierarchical models that have varying degrees of complexity, from frequently encountered conditional independent models to more involved cases of modeling intricate interdependence. Examples include multilevel models for single and multiple outcomes, nested and non-nested models, autoregressive models, and Cox regression models with frailty. Also discussed are repeated measurement models, latent class models, spatial models, and models with nonnormal random-effects prior distributions.

Read the paper (PDF)

Location information plays a big role in business data. Everything that happens in a business happens somewhere, whether it's sales of products in different regions or crimes that happened in a city. Business analysts typically use the historic data that they have gathered for years for analysis. One of the most important pieces of data that can help answer more questions qualitatively is demographic data used along with the business data. An analyst can match the sales or the crimes with the population metrics like gender, age groups, family income, race, and other pieces of information, which are part of the demographic data, for better insight. This paper demonstrates how a business analyst can bring the demographic and lifestyle data from Esri into SAS^® Visual Analytics and join the data with business data. The integration of SAS Visual Analytics with Esri allows this to happen. We demonstrate different methods of accessing Esri demographic data from SAS Visual Analytics. We also demonstrate how you can use custom shape files and integrate with Esri Portal for ArcGIS.

Read the paper (PDF)

The SQL procedure has a number of powerful and elegant language features for SQL users. This hands-on workshop emphasizes highly valuable and widely usable advanced programming techniques that will help users of Base SAS^® harness the power of PROC SQL. Topics include using PROC SQL to identify FIRST.row, LAST.row, and Between.rows in BY-group processing; constructing and searching the contents of a value-list macro variable for a specific value; data validation operations using various integrity constraints; data summary operations to process down rows and across columns; and using the MSGLEVEL= system option and _METHOD SQL option to capture vital processing information and the algorithm selected and used by the optimizer when processing a query.

Read the paper (PDF) | Download the data file (ZIP)

SAS^® Visual Analytics provides a robust platform to perform business intelligence through a high-end and advanced dashboarding style. In today's technology era, dashboards not only help in gaining insight into an organization's operations, but they also are a key performance indicator. In this paper, I discuss five important and frequently used objects in SAS Visual Analytics. These objects are used to get the most out of dashboards in an effective and efficient way. This paper covers the use of dates (as a format) in the date slider and gauges, cascading filters, custom graphs, linking reports within sections of the same report or with other reports, and associating buttons with graphs for dynamic functionality.

Read the paper (PDF)

Would you agree that the value of SAS^® for your organization comes from transforming data into actionable information, using well-prepared human resources? This paper presents seven areas where this potential SAS value can be lost by inefficient data access, limited reporting and visualization, poor data cleansing, obsolete predictive analytics, incomplete SAS solutions, limited hardware use, and lack of governance. This paper also suggests what to do to overcome these issues.

Read the paper (PDF)

To determine whether there is a correlation between the repetitiveness of a song's lyrics and its popularity, the top 10 songs from the Billboard Hot 100 songs chart from 2006 to 2015 were collected. Song lyrics were assessed to determine the count of the top 10 words used. Word counts were used to predict the number of weeks the song was on the chart. The prediction model was analyzed to determine the quality of the model and whether word count was a significant predictor of a song's popularity. To investigate whether song lyrics are becoming more simplistic over time, several tests were performed to see whether the average word count has been changing over the years. All analysis was completed in SAS^® using various procedures.

View the e-poster or slides (PDF)
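
The word-count step described above can be sketched in Python. This is an illustration, not the poster's SAS procedures: the function name `top_word_counts`, the tokenization rule, and the placeholder text are assumptions, and the "share" figure is an added convenience measure of repetitiveness that the abstract does not mention.

```python
import re
from collections import Counter


def top_word_counts(lyrics, k=10):
    """Return the k most frequent words and their share of all words."""
    # Lowercase and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(words)
    top = counts.most_common(k)
    share = sum(n for _, n in top) / len(words) if words else 0.0
    return top, share


# Placeholder text standing in for a song's lyrics.
sample = "la la la hey hey you"
top, share = top_word_counts(sample, k=2)
```

For the placeholder text, the two most frequent words account for five of the six words, which is the kind of per-song feature that could then feed a prediction model.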

Are you tired of copying PROC FREQ or PROC MEANS output and pasting it into your tables? Do you need to produce summary tables repeatedly? Are you spending a lot of your time generating the same summary tables for different subpopulations? This paper introduces an easy-to-use macro to generate a descriptive statistics table. The table reports counts and percentages for categorical variables, and means, standard deviations, medians, and quantiles for continuous variables. For variables with missing values, the table also includes the count and percentage missing. Customization options allow for the analysis of stratified data, specification of variable output order, and user-defined formats. In addition, this macro incorporates the SAS^® Output Delivery System (ODS) to automatically produce a Rich Text Format (RTF) file, which can be further edited by a word processor for the purpose of publication.

Read the paper (PDF) | View the e-poster or slides (PDF)

Significant growth of the Internet has created an enormous volume of unstructured text data. In recent years, the amount of this type of data that is available for analysis has exploded. While the amount of textual data is increasing rapidly, an ability to obtain key pieces of information from such data in a fast, flexible, and efficient way is still posing challenges. This paper introduces SAS^® Contextual Analysis In-Database Scoring for Hadoop, which integrates SAS^® Contextual Analysis with the SAS^® Embedded Process. SAS^® Contextual Analysis enables users to customize their text analytics models in order to realize the value of their text-based data. The SAS^® Embedded Process enables users to take advantage of SAS^® Scoring Accelerator for Hadoop to run scoring models. By using these key SAS^® technologies, the overall experience of analyzing unstructured text data can be greatly improved. The paper also provides guidelines and examples on how to publish and run category, concept, and sentiment models for text analytics in Hadoop.

Read the paper (PDF)

The SAS^® code looks perfect. You submit it and to your amazement, there is a problem with the CREATE TABLE statement. You need to change the table definition, ever so slightly, but how? Explicit pass-through? That's not an option. Fortunately, there are a handful of SAS options that can save the day. This presentation covers everything you need to know in order to adjust the SAS CREATE TABLE statements using SAS options. This presentation covers the following SAS options: DBCREATE_TABLE_OPTS=, POST_STMT_OPTS=, POST_TABLE_OPTS=, PRE_STMT_OPTS=, and PRE_TABLE_OPTS=. We use Hadoop and Oracle examples to show why these options can make your life easier. From there, we use real code to show you how to use them.

Read the paper (PDF)

Whether you are an existing SAS^® Visual Analytics user or you are exploring SAS Visual Analytics for the first time, the first release of SAS^® Visual Analytics 8.1 on SAS^® Viya has something exciting for everyone. The latest version is a clean, modern HTML5 interface. SAS^® Visual Analytics Designer, SAS^® Visual Analytics Explorer, and SAS^® Visual Statistics are merged into a single web application. Whether you are designing reports, exploring data, or running interactive, predictive models, everything is integrated into one seamless experience. The application delivers on the same basic promise: get pertinent answers from any-size data. The paper walks you through key features that you have come to count on, from auto charting, to display rules, and more. It acclimates you to the new interface and highlights a few exciting new features like web content and donut pie charts. Finally, the paper touches upon the ability to promote your existing reports to the new environment.

Read the paper (PDF)

Interactively redeploying SAS^® Data Integration Studio jobs can be a slow and tedious process. The updated batch deployment utility gives the ETL Tech Lead a more efficient and repeatable method for administering batch jobs. This improved tool became available in SAS^® Data Integration Studio 4.901.

Read the paper (PDF)

UNIX and Linux SAS^® administrators, have you ever been greeted by one of these statements as you walk into the office before you have gotten your first cup of coffee? Power outage! SAS servers are down. I cannot access my reports. Have you frantically tried to restart the SAS servers to avoid loss of productivity and missed one of the steps in the process, causing further delays while other work continues to pile up? If you have had this experience, you understand the benefit to be gained from a utility that automates the management of these multi-tiered deployments. Until recently, there was no method for automatically starting and stopping multi-tiered services in an orchestrated fashion. Instead, you had to use time-consuming manual procedures to manage SAS services. These procedures were also prone to human error, which could result in corrupted services and additional time lost, debugging and resolving issues injected by this process. To address this challenge, SAS Technical Support created the SAS Local Services Management (SAS_lsm) utility, which provides automated, orderly management of your SAS^® multi-tiered deployments. The intent of this paper is to demonstrate the deployment and usage of the SAS_lsm utility. Now, go grab a coffee, and let's see how SAS_lsm can make life less chaotic.

Read the paper (PDF)

Machine learning is in high demand. Whether you are a citizen data scientist who wants to work interactively or you are a hands-on data scientist who wants to code, you have access to the latest analytic techniques with SAS^® Visual Data Mining and Machine Learning on SAS^® Viya. This offering surfaces in-memory machine learning techniques such as gradient boosting, factorization machines, neural networks, and much more through its interactive visual interface, SAS^® Studio tasks, procedures, and a Python client. Learn about this multi-faceted new product and see it in action.

Read the paper (PDF)

This presentation illustrates new technology that has been added to the SAS^® Customer Intelligence 360 analytics testing suite for digital campaign marketing. SAS Customer Intelligence 360 now has a multivariate testing tool (MVT). In digital marketing, MVT has become an increasingly popular process by which multiple components of a campaign can be tested in a live environment with the goal of finding an optimal mix, which drives a defined response metric. In simple terms, MVT is equivalent to running numerous A/B tests simultaneously. In theory, MVT can test the effectiveness of limitless combinations of factors. The number of factor-level combinations determines the test duration and the number of samples required to make statistically sound predictions for all permutations of a full factorial design. SAS^® applies experimental design analytics, driven by an interactive process that considers constraints and control cell definition, to guide the user toward an optimal reduced design of the test that can still adequately predict all factor-level combinations, given available resources. The results of the analysis enable the marketer to compare the responses for the observed factor-level combinations with the predicted responses for the untested combinations.

Read the paper (PDF)
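
The way factor-level combinations multiply in a full factorial design can be shown with a short Python sketch. The factor names and levels below are hypothetical, and the sketch only enumerates the full design; it does not attempt the reduced-design selection that the presentation describes.

```python
from itertools import product
from math import prod

# Hypothetical campaign factors and their levels.
factors = {
    "headline": ["A", "B", "C"],
    "image": ["photo", "illustration"],
    "button": ["green", "blue"],
}

# The number of cells is the product of the level counts: 3 * 2 * 2.
n_cells = prod(len(levels) for levels in factors.values())

# Enumerate every factor-level combination of the full factorial design.
design = [dict(zip(factors, combo)) for combo in product(*factors.values())]
```

Adding one more two-level factor would double the number of cells, which is why test duration and sample requirements grow quickly and why a reduced design becomes attractive.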

Customer churn is an important area of concern that affects not just the growth of your company, but also the profit. Conventional survival analysis can provide a customer's likelihood to churn in the near term, but it does not take into account the lifetime value of the higher-risk churn customers you are trying to retain. Not all customers are equally important to your company. Recency, frequency, and monetary (RFM) analysis can help companies identify customers that are most important and most likely to respond to a retention offer. In this paper, we use the IML and PHREG procedures to combine the RFM analysis and survival analysis in order to determine the optimal number of higher-risk and higher-value customers to retain.

Read the paper (PDF)

The Consumer Financial Protection Bureau (CFPB) collects tens of thousands of complaints against companies each year, many of which result in the companies in question taking action, including making payouts to the individuals who filed the complaints. Given the volume of the complaints, how can an overseeing organization quantitatively assess the data for various trends, including the areas of greatest concern for consumers? In this presentation, we propose applying a repeatable model of text analytics techniques to the publicly available CFPB data. Specifically, we use SAS^® Contextual Analysis to explore sentiment, and machine learning techniques to model the natural language available in each free-form complaint against a disposition code for the complaint, primarily focusing on whether a company paid out money. This process generates a taxonomy in an automated manner. We also explore methods to structure and visualize the results, showcasing how areas of concern are made available to analysts using SAS^® Visual Analytics and SAS^® Visual Statistics. Finally, we discuss the applications of this methodology for overseeing government agencies and financial institutions alike.

Read the paper (PDF)

Many organizations are using SAS^® Visual Analytics for their daily reporting. But as more users gain access to the visual tool, it is easy to lose track of what data is being used, what reports are being accessed, and what elements of the system are classified as critical. With SAS Visual Analytics comes a governance exercise that all organizations should provision for; neglecting it jeopardizes the maintenance and performance of the environment. This paper explores the three different auditing areas that can be configured with SAS Visual Analytics and the different metrics that are associated with them. It presents how to configure the auditing, the data sources that are being populated in the background, and how to exploit them to expand your reports beyond the pre-created audit reports. Consideration is also given to the IT and infrastructure side of enabling auditing mechanisms, with data volumes and archiving practices being at the heart of the discussion.

Read the paper (PDF)

Machine learning predictive modeling algorithms are governed by hyperparameters that have no clear defaults agreeable to a wide range of applications. A few examples of quantities that must be prescribed for these algorithms are the depth of a decision tree, number of trees in a random forest, number of hidden layers and neurons in each layer in a neural network, and degree of regularization to prevent overfitting. Not only do ideal settings for the hyperparameters dictate the performance of the training process, but more importantly they govern the quality of the resulting predictive models. Recent efforts to move from a manual or random adjustment of these parameters have included rough grid search and intelligent numerical optimization strategies. This paper presents an automatic tuning implementation that uses SAS/OR^® local search optimization for tuning hyperparameters of modeling algorithms in SAS^® Visual Data Mining and Machine Learning. The AUTOTUNE statement in the NNET, TREESPLIT, FOREST, and GRADBOOST procedures defines tunable parameters, default ranges, user overrides, and validation schemes to avoid overfitting. Given the inherent expense of training numerous candidate models, the paper addresses efficient distributed and parallel paradigms for training and tuning in SAS^® Viya. It also presents sample tuning results that demonstrate improved model accuracy over default configurations and offers recommendations for efficient and effective model tuning.
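
For orientation, the rough grid search that the abstract mentions as a baseline can be sketched in a few lines of Python. This is not the AUTOTUNE implementation and does not use SAS/OR local search; the `validation_error` function is a hypothetical stand-in for training a model and scoring it on validation data.

```python
from itertools import product

def validation_error(depth, n_trees):
    """Hypothetical stand-in for 'train a model, return its validation error'."""
    return (depth - 6) ** 2 * 0.01 + (n_trees - 100) ** 2 * 0.00001 + 0.10

def grid_search(objective, grid):
    """Evaluate every combination in the grid and keep the lowest-error one."""
    names = list(grid)
    best = None
    for values in product(*grid.values()):
        params = dict(zip(names, values))
        score = objective(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best

grid = {"depth": [2, 4, 6, 8], "n_trees": [50, 100, 200]}
best_score, best_params = grid_search(validation_error, grid)
print(best_params, best_score)
```

Every grid cell costs one full training run, which is the expense that motivates smarter search strategies and parallel evaluation.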

Read the paper (PDF)

The singular spectrum analysis (SSA) method of time series analysis applies nonparametric techniques to decompose time series into principal components. SSA is particularly valuable for long time series, in which patterns (such as trends and cycles) are difficult to visualize and analyze. An important step in SSA is determining the spectral groupings; this step can be automated by analyzing the w-correlations of the spectral components. This paper provides an introduction to singular spectrum analysis and demonstrates how to use SAS/ETS^® software to perform it. To illustrate, monthly data on temperatures in the United States over the last century are analyzed to discover significant patterns.

Read the paper (PDF)

I have come up with a way to use the output of the SCAPROC procedure to produce DOT directives, which are then put through the Graphviz engine to produce a diagram. This allows the production of flowcharts of SAS^® code automatically. I have enhanced the charts to also show the longest steps by run time, so even if you look at thousands of steps in a complex program, you can easily see the structure and flow of it, and where most of the time is spent, just by having a look for a few seconds. Great for documentation, benchmarking, tuning, understanding, and more.
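
The general shape of the DOT output can be sketched as follows. This Python example does not parse real SCAPROC output; the step names, run times, and dependencies are hypothetical inputs, and the red fill for the slowest step is only one possible way to highlight it.

```python
def to_dot(steps, edges):
    """Build Graphviz DOT text from step run times and step-to-step dependencies."""
    lines = ["digraph sas_flow {"]
    slowest = max(steps, key=steps.get)
    for name, seconds in steps.items():
        attrs = f'label="{name}\\n{seconds:.1f}s"'
        if name == slowest:
            # Highlight the step with the longest run time.
            attrs += ", style=filled, fillcolor=red"
        lines.append(f'  "{name}" [{attrs}];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical steps (seconds) and data dependencies between them.
steps = {"import_raw": 2.0, "sort_sales": 41.5, "merge_all": 7.3}
edges = [("import_raw", "sort_sales"), ("sort_sales", "merge_all")]

dot_text = to_dot(steps, edges)
print(dot_text)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command, for example `dot -Tpng flow.dot -o flow.png`.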

Read the paper (PDF)

A lot of time and effort goes into creating presentations or dashboards for the purposes of management business reviews. Data for the presentation is produced from a variety of tools, and the output is cut and pasted into Microsoft PowerPoint or Microsoft Excel. Time is spent not only on the data preparation and reporting, but also on the finishing and touching up of these presentations. In previous years, SAS^® Global Forum authors have described the automation capabilities of SAS^® and Microsoft Office. The default look and feel of SAS output in Microsoft PowerPoint and Microsoft Excel is not always adequate for the more polished requirement of an executive presentation. This paper focuses on how to combine the capabilities of SAS^® Enterprise Guide^®, SAS^® Visual Analytics, and Microsoft PowerPoint into a finished, professional presentation. We will build and automate a beautiful finished end product that can be refreshed by anyone with the click of a mouse.

Read the paper (PDF)

Let's walk through an example of communicating from the SAS^® client to SAS^® Viya. The demonstration focuses on how to use the SAS^® language to establish a session, transport and persist data, and receive results. Learn how to establish communication with SAS Viya. Explore topics such as: What is a session? How do I make requests? What does my SAS log tell me? Get a deeper understanding of data location on the client and the server side. Learn about applying existing user formats, how to get listings or reports, and how to query sessions, data, and properties.

Read the paper (PDF)

You have SAS^® software. You have databases or data platforms like Hadoop, possibly with some large distributed data. If you already know how to make SAS code talk to your data platforms, you have already taken a solid step toward a successful integration. But you might also want to know how to take this communication to a different level. If your data platform is one that is built for massively parallel processing, chances are that SAS has already created the SAS^® Embedded Process framework that allows SAS tasks to be embedded next to your data sources for execution. SAS^® In-Database Technologies is a family of products that use this framework and provide an accelerated level of integration. This paper explains core principles behind these technologies and presents application scenarios for each of these products. We use a variety of examples to highlight the specifics of individual SAS accelerators (SAS^® Scoring Accelerator, SAS^® In-Database Code Accelerator, and others) across the data platforms.

Read the paper (PDF)

Connecting database schemas to libraries in the SAS^® metadata is a very important part of setting up a functional and useful environment for business users. This task can be quite difficult for the untrained administrator. This paper addresses the key configuration items that often go unnoticed but that can make a big difference. Using the wrong options can lead to poor database performance or even to a total lockdown, depending on the number of connections to the database.

Read the paper (PDF)

It's essential that SAS^® users enhance their skills to implement best-practice programming techniques when using Base SAS^® software. This presentation illustrates core concepts with examples to ensure that code is readable, clearly written, understandable, structured, portable, and maintainable. Attendees learn how to apply good programming techniques including implementing naming conventions for data sets, variables, programs, and libraries; code appearance and structure using modular design, logic scenarios, controlled loops, subroutines and embedded control flow; code compatibility and portability across applications and operating platforms; developing readable code and program documentation; applying statements, options, and definitions to achieve the greatest advantage in the program environment; and implementing program generality into code to enable its continued operation with little or no modifications.

Read the paper (PDF)

Nearly every SAS^® program includes logic that causes certain code to be executed only when specific conditions are met. This is commonly done using the IF-THEN/ELSE syntax. This paper explores various ways to construct conditional SAS logic, some of which might provide advantages over the IF statement in certain situations. Topics include the SELECT statement, the IFC and IFN functions, and the CHOOSE and WHICH families of functions, as well as some more esoteric methods. We also discuss the intricacies of the subsetting IF statement and explain the difference between a regular IF and the %IF macro statement.

Read the paper (PDF)

Soon after the advent of the SAS^® hash object in SAS^®9, its early adopters realized that its potential functionality is much broader than merely using its fast table lookup capability for file matching. This is because in reality, the hash object is a versatile data storage structure with a roster of standard table operations such as create, drop, insert, delete, clear, search, retrieve, update, order, and enumerate. Since it is memory-resident and its key-access operations execute in O(1) time, it runs them as fast as or faster than other canned SAS techniques, with the added bonus of not having to code around their inherent limitations. Another advantage of the hash object as compared to the methods that had existed before its implementation is its dynamic, run-time nature and the ability to handle I/O all by itself, independently of the intrinsic statements of a DATA step or DS2 program calling its methods. The hash object operations, or combinations thereof, lend themselves to diverse SAS programming functionalities well beyond the original focus on data search and retrieval. In this paper, which can be thought of as a preview of a SAS book being written by the authors, we aim to present this logical connection using the power of example.
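
The table-lookup use case can be pictured with an analogy in Python, whose built-in `dict` is also a memory-resident keyed table with average constant-time key access. This is an analogy only, not SAS hash object syntax; the SKUs, prices, and orders are hypothetical.

```python
# Small lookup table held in memory: key -> data.
prices = {"A100": 9.99, "B200": 24.50}

# Larger "file" to be matched against the lookup table: (key, quantity).
orders = [("A100", 3), ("C300", 1), ("B200", 2)]

def match_orders(orders, prices):
    """Split orders into those with a matching key and those without."""
    matched, unmatched = [], []
    for sku, qty in orders:
        price = prices.get(sku)  # keyed search, no sorting of either input
        if price is None:
            unmatched.append(sku)
        else:
            matched.append((sku, qty, round(qty * price, 2)))
    return matched, unmatched

matched, unmatched = match_orders(orders, prices)
print(matched, unmatched)

# Other table operations from the abstract's list have direct counterparts.
prices["C300"] = 5.00        # insert
prices["A100"] = 10.49       # update
del prices["B200"]           # delete
print(sorted(prices))        # enumerate in key order
```

Neither input needs to be sorted for the match, which is the same property that makes keyed in-memory lookup attractive for file matching.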

Read the paper (PDF)

Data that are gathered in modern data collection processes are often large and contain geographic information that enables you to examine how spatial proximity affects the outcome of interest. For example, in real estate economics, the price of a housing unit is likely to depend on the prices of housing units in the same neighborhood or nearby neighborhoods, either because of their locations or because of some unobserved characteristics that these neighborhoods share. Understanding spatial relationships and being able to represent them in a compact form are vital to extracting value from big data. This paper describes how to glean analytical insights from big data and discover their big value by using spatial econometric methods in SAS/ETS^® software.

Read the paper (PDF)

As in any analytical process, data sampling is a definitive step for unstructured data analysis. Sampling is of paramount importance if your data is fed from social media reservoirs such as Twitter, Facebook, Amazon, and Reddit, where information proliferation happens minute by minute. Unless you have a sophisticated analytical engine and robust physical servers to handle billions of pieces of data, you can't use all your data for analysis without sampling. So, how do you sample textual data? The standard method is to generate either a simple random sample, or a stratified random sample if a stratification variable exists in the data. Neither of these two methods can reliably produce a representative sample of documents from the population data simply because the process does not encompass a step to ensure that the distribution of terms between the population and sample sets remains similar. This shortcoming can cause the supervised or unsupervised learning to yield inaccurate results. If the generated sample is not representative of the population data, it is difficult to train and validate categories or sub-categories for those rare events during taxonomy development. In this paper, we show you new methods for sampling text data. We rely on a term-by-document matrix and SAS^® macros to generate representative samples. Using these methods helps generate sufficient samples to train and validate every category in rule-based modeling approaches using SAS^® Contextual Analysis.

Read the paper (PDF)

Often the burden of productionisation of analytical models falls upon the analyst, so every Monday morning the analyst comes in and presses the Run button. Now this is obviously fraught with danger (for example, the source data isn't available, the analyst goes on holidays, or the analyst resigns), and might lead to invalid results being consumed by downstream systems. There are many reasons that this might occur, but the most common one is that it takes IT too long to put a model into full production (especially if that model contains new data sources). In this presentation, I show a tested architecture that allows for the typical rapid development of models (and in fact it actually significantly speeds up the discovery phase), as well as allows for an orderly handover to IT for them to productionise without disrupting the regular run of the models. This allows for notification of downstream users if there is a delay in the arrival of data, as well as rapid IT Operations response if there is a problem during the loading and creation.

Whether you are calculating a credit risk, a health risk, or something entirely different, you need instant, on-the-fly risk score calculation across multiple industries. This paper demonstrates how you can produce individualized risk scores through interactive dashboards. Your risk scores are backed by powerful SAS^® analytics because they leverage score code that you produce in SAS^® Visual Statistics. Advanced topics, including the use of calculated items and parameters in your dashboards, as well as how to develop SAS^® Stored Processes capable of accepting parameters that are passed through your SAS^® Visual Analytics Dashboard, are covered in detail.

Read the paper (PDF)

SAS^® is perfect for building enterprise apps. Think about it: SAS speaks to almost any database you can think of and is probably already hooked in to most of the data sources in your organization. A full-fledged metadata security layer happens to already be integrated with your single sign-on authentication provider, and every time a user interacts with the system, their permissions are checked and the data their app asks for is automatically encrypted. SAS ticks all the boxes required by IT, and the skills required to start developing apps already sit within your department. Your team most likely already knows what your app needs to do, so instead of writing lists of requirements, give them an HTML5 resource, and together they can write and deploy the back-end code themselves. The apps run in the browser, the server-side logic is deployed using SAS .spk packages, and permissions are managed via SAS^® Management Console. Best of all, the infrastructure that would normally take months to integrate is already there, eliminating barriers to entry and letting you demonstrate the value of your solution to internal customers with zero up-front investment. This paper shows how SAS integrates with open-source tools like H54S, AngularJS, and PostGIS, together with next-generation developer-centric analytical platforms like SAS^® Viya, to build secure, enterprise-class apps that can support thousands of users. This presentation includes lots of app demos. This presentation was included at SAS^® Forum UK 2016.

Read the paper (PDF)

Cascading Style Sheets (CSS) frameworks like Bootstrap, and JavaScript libraries such as jQuery and h54s, have made it faster than ever before to develop enterprise-grade web apps on top of the SAS^® platform. Hailing the benefits of using SAS as a back end (authentication, security, ease of data access), this paper navigates the configuration issues to consider for maximum responsiveness to client web requests (pooled sessions, load balancing, multibridge connections). Cherry picking from the whirlwind of front end technologies and approaches, the author presents a framework that enables the novice programmer to build a simple web app in minutes. The exact steps necessary to achieve this are described, alongside a hurricane of practical tips like the following: dealing with CORS; logging in SAS; debugging AJAX calls; and SAS HTTP responses. Beware: this approach is likely to cause a storm of demand in your area! Server requirements: SAS^® Business Intelligence Platform (SAS^® 9.2 or later); SAS^® Stored Process Web Application (SAS^® Integration Technologies). Client requirements: HTML5 browser (Microsoft Internet Explorer 8 or later); access to open-source libraries (which can be hosted on-premises if Internet access is an issue).

Read the paper (PDF)

A Bayesian network is a directed acyclic graphical model that represents probability relationships and conditional independence structure between random variables. SAS^® Enterprise Miner implements a Bayesian network primarily as a classification tool; it includes naïve Bayes, tree-augmented naïve Bayes, Bayesian-network-augmented naïve Bayes, parent-child Bayesian network, and Markov blanket Bayesian network classifiers. The HPBNET procedure uses a score-based approach and a constraint-based approach to model network structures. This paper compares the performance of Bayesian network classifiers to other popular classification methods, such as classification tree, neural network, logistic regression, and support vector machines. The paper also shows some real-world applications of the implemented Bayesian network classifiers and a useful visualization of the results.

Read the paper (PDF)

The SAS^® Macro Language gives you the power to create tools that, to a large extent, think for themselves. How often have you used a macro that required your input, and you thought to yourself, Why do I need to provide this information when SAS^® already knows it? SAS might already know most of this information, but how does SAS direct your macro programs to self-discern the information that they need? Fortunately, there are a number of functions and tools in SAS that can intelligently enable your programs to find and use the information that they require. If you provide a variable name, SAS should know its type and length. If you provide a data set name, SAS should know its list of variables. If you provide a library or libref, SAS should know the full list of data sets that it contains. In each of these situations, functions can be used by the macro language to determine and return information. By providing a libref, functions can determine the library's physical location and the list of data sets it contains. By providing a data set, they can return the names and attributes of any of the variables that it contains. These functions can read and write data, create directories, build lists of files in a folder, and build lists of folders. Maximize your macro's intelligence; learn and use these functions.

Read the paper (PDF)

When new technologies, workflows, or processes are implemented, an organization and its employees must embrace changes in order to ensure long-term success. This paper provides guidelines and best practices in change management that the SAS Advanced Analytics Division uses with customers when it implements prescriptive analytics solutions (provided by SAS/OR^® software). Highlights include engaging technical leaders in defining project scope and providing functional design documents. The paper also highlights SAS' approach in engaging business leaders on business scope, garnering executive-level project involvement, establishing steering committees, defining use cases, developing an effective communication strategy, training, and implementing SAS/OR solutions.

Read the paper (PDF)

Rapid advances in technology have empowered musicians all across the globe to share their music easily, resulting in intensified competition in the music industry. For this reason, musicians and record labels need to be aware of factors that can influence the popularity of their songs. The focus of our study is to determine how themes, topics, and terms within song lyrics have changed over time and how these changes might have influenced the popularity of songs. Moreover, we plan to run time series analysis on the numeric attributes of Billboard Top 100 songs in order to determine the appropriate combination of relevant attributes that influences a song's popularity. The findings of our study can potentially benefit musicians and record labels in understanding the necessary lyrical construction, overall themes, and topics that might enable a song to reach the highest chart position on the Billboard Top 100. The Billboard Top 100 is an optimal source of data, as it is an objective measure of popularity. Our data has been collected from open sources. Our data set consists of all 334,784 Billboard Top 100 observations for the years 1955-2015, with metadata covering all 26,869 unique songs that have appeared on the chart for that period. Our expanding lyric data set currently contains 18,002 of those songs, which were used to conduct our analysis. SAS^® Enterprise Miner and SAS^® Sentiment Analysis Studio were the primary tools of our analysis.

View the e-poster or slides (PDF)

SAS^® Output Delivery System (ODS) Graphics started appearing in SAS^® 9.2. Collectively these new tools were referred to as 'ODS Graphics,' 'SG Graphics' and 'Statistical Graphics'. When first starting to use these tools, the traditional SAS/GRAPH^® software user might come upon some very significant challenges in learning the new way to do things. This is further complicated by the lack of simple demonstrations of capabilities. Most graphs in training materials and publications are rather complicated graphs that, while useful, are not good teaching examples for starting purposes. This paper contains many examples of very simple ways to get very simple things accomplished. Many different graphs are developed using only a few lines of code each, using data from the SASHELP data sets. The use of the SGPLOT, SGPANEL, and SGSCATTER procedures is shown. In addition, the paper addresses those situations in which the user must alternatively use a combination of the TEMPLATE and SGRENDER procedures to accomplish the task at hand. Most importantly, the use of the 'ODS Graphics Designer' as a teaching tool and a generator of sample graphs and code is covered. This tool makes use of the TEMPLATE and SGRENDER Procedures, generating Graphics Template Language (GTL) code. Users get extremely productive fast. The emphasis in this paper is the simplicity of the learning process. Users will be able to take the generated code and run it immediately on their personal machines.

Read the paper (PDF) | View the e-poster or slides (PDF)

In the pharmaceutical industry, we find ourselves having to re-run our programs repeatedly for each deliverable. These programs can be run individually in an interactive SAS^® session, which enables us to review the logs as we execute the programs. We could run the individual programs in batch and open each individual log to review for unwanted log messages, such as ERROR, WARNING, uninitialized, have been converted to, and so on. Both of these approaches are fine if there are only a handful of programs to execute. But what do you do if you have hundreds of programs that need to be re-run? Do you want to open every single one of the programs and search for unwanted messages? This manual approach could take hours and is prone to accidental oversight. This paper discusses a macro that searches a specified directory and checks either all the logs in the directory, only logs with a specific naming convention, or only the files listed. The macro then produces a report that lists all the files checked and indicates whether issues were found.
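
The checking logic can be sketched outside of SAS as follows. This Python example is not the paper's macro; it scans log text held in a dictionary rather than a directory, and both the search patterns and the sample log lines are assumptions based on the messages named above.

```python
import re

# Messages named in the abstract; the exact patterns are an assumption.
UNWANTED = re.compile(r"^(ERROR|WARNING)\b|uninitialized|have been converted to")

def scan_log(text):
    """Return (line_number, line) pairs for lines containing an unwanted message."""
    return [(n, line) for n, line in enumerate(text.splitlines(), start=1)
            if UNWANTED.search(line)]

def report(logs):
    """Map each log name to the number of flagged lines found in it."""
    return {name: len(scan_log(text)) for name, text in logs.items()}

# Hypothetical log contents keyed by file name.
logs = {
    "demog.log": "NOTE: The data set WORK.DEMOG has 10 observations.\n",
    "labs.log": (
        "NOTE: Variable visit is uninitialized.\n"
        "WARNING: Multiple lengths were specified for the variable site.\n"
        "NOTE: Character values have been converted to numeric values.\n"
    ),
}

for name, issues in report(logs).items():
    print(name, "clean" if issues == 0 else f"{issues} issue(s)")
```

To check files on disk, the dictionary could be built by reading each matching file in a directory; filtering by naming convention or by an explicit file list would happen at that step.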

Read the paper (PDF)

SAS^® is often deployed in a client/server architecture in which SAS^® Foundation is installed on a server and is accessed from each user's workstation. Many system administrators prefer that users not log on directly to the server to run SAS, nor do they want to set up a complex Citrix environment. SAS client applications are an attractive alternative for this type of architecture. But with the advent of multiple SAS^® Studio editions and ongoing enhancements to SAS^® Enterprise Guide^®, choosing the most suitable client application presents a challenge for many system administrators. To help guide you in this choice, this paper compares the administration of three SAS Foundation client applications that can be used in a client/server architecture: SAS Enterprise Guide, SAS^® Studio Basic, and SAS^® Studio Mid-Tier. The usage differences between SAS Studio and SAS Enterprise Guide have been addressed elsewhere. In this paper, we focus on differences that pertain specifically to system administration, including deployment, maintenance, and authentication. The information presented here will help system administrators determine which application best fits the needs of their users and their environment.

Read the paper (PDF)

Today it is vital for an organization to manage, distribute, and secure content for its employees. In most cases, different groups of employees are interested in different content, and some content should not be available to everyone. It is the SAS^® administrator's job to design a metadata group structure that makes managing content easier. SAS enables you to create any metadata group organizational structure imaginable, and it is common to define a metadata group structure that mimics the organization's hierarchy. Circular group memberships are frequently the cause of unexpected issues with SAS web applications. A circular group relationship can be as simple as two groups being members of one another. You might not be aware that you have defined this type of recursive association between groups. The paper identifies some problems that are caused by recursive group memberships and provides tools to investigate your metadata group structure that help identify recursive metadata group relationships. We explain the process of extracting group associations from the SAS^® Metadata Server, and we show how to organize this data to investigate group relationships. We use a stored process to generate a report and SAS^® Visual Analytics to generate a network diagram that provides a graphical representation of an organization's group relationship structure, to easily identify circular group structures.
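
Once group associations have been extracted, spotting a circular membership is a cycle search in a directed graph. The Python sketch below shows one standard way to do that with a depth-first search; it does not query the SAS Metadata Server, and the group names are hypothetical.

```python
def find_cycle(members):
    """members maps a group to the groups that are its members.

    Returns one circular membership chain as a list, or None if there is none.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    color = {}
    path = []

    def visit(group):
        color[group] = GRAY
        path.append(group)
        for child in members.get(group, ()):
            state = color.get(child, WHITE)
            if state == GRAY:
                # child is already on the current path: a cycle is closed.
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        color[group] = BLACK
        return None

    for group in members:
        if color.get(group, WHITE) == WHITE:
            found = visit(group)
            if found:
                return found
    return None

# Two groups that are members of one another (the simplest circular case).
circular = {"Finance": ["Analysts"], "Analysts": ["Finance"]}
# A plain hierarchy with no circular membership.
hierarchy = {"All Staff": ["Finance", "HR"], "Finance": ["Analysts"]}

print(find_cycle(circular))
print(find_cycle(hierarchy))
```

The recursive search is adequate for group structures of ordinary depth; a very deep hierarchy would call for an iterative version.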

Read the paper (PDF)

A/B testing is a form of statistical hypothesis testing on two business options (A and B) to determine which is more effective in the modern Internet age. The challenge for startups or new product businesses leveraging A/B testing is twofold: a small number of customers and poor understanding of their responses. This paper shows you how to use the IML and POWER procedures to deal with the reassessment of sample size for adaptive multiple business stage designs based on conditional power arguments, using the data observed at the previous business stage.
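
As background for why a small customer base is a problem, the Python sketch below computes the classical fixed-design sample size per group for comparing two conversion rates with a two-sided test. This is the textbook normal-approximation formula, not the adaptive, conditional-power reassessment that the paper describes, and the conversion rates are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Hypothetical conversion rates for options A and B.
print(n_per_group(0.10, 0.12))
print(n_per_group(0.10, 0.11))
```

Halving the difference to be detected roughly quadruples the required sample, which is why re-estimating the sample size from interim data can matter when customers are scarce.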

Read the paper (PDF)

The LUA procedure is a relatively new SAS^® procedure, having been available since SAS^® 9.4. It allows for the Lua language to be used as an interface to SAS, as an alternative scripting language to the SAS macro facility. This paper compares and contrasts PROC LUA with the SAS macro facility, showing examples of approaches and highlighting the advantages and disadvantages of each.

Read the paper (PDF)

This presentation discusses the options for including continuous covariates in regression models. In his book, 'Clinical Prediction Models,' Ewout Steyerberg presents a hierarchy of procedures for continuous predictors, starting with dichotomizing the variable and moving to modeling the variable using restricted cubic splines or using a fractional polynomial model. This presentation discusses all of the choices, with a focus on the last two. Restricted cubic splines express the relationship between the continuous covariate and the outcome using a set of cubic polynomials, which are constrained to meet at pre-specified points, called knots. Between the knots, each curve can take on the shape that best describes the data. A fractional polynomial model is another flexible method for modeling a relationship that is possibly nonlinear. In this model, polynomials with noninteger and negative powers are considered, along with the more conventional square and cubic polynomials, and the small subset of powers that best fits the data is selected. The presentation describes and illustrates these methods at an introductory level intended to be useful to anyone who is familiar with regression analyses.
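
To show what the spline terms look like, the Python sketch below builds a restricted cubic spline basis using one common truncated-power parameterization, in which the curve is constrained to be linear beyond the outermost knots. The knot locations are hypothetical, and this is an illustration rather than the code used in the presentation.

```python
def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)] for k knots.

    Each s_j is a combination of truncated cubics chosen so that the
    cubic and quadratic terms cancel beyond the last knot.
    """
    def cube_plus(u):
        return u ** 3 if u > 0 else 0.0

    t = sorted(knots)
    t_last, t_prev = t[-1], t[-2]
    gap = t_last - t_prev
    terms = [x]
    for t_j in t[:-2]:
        terms.append(
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / gap
            + cube_plus(x - t_last) * (t_prev - t_j) / gap
        )
    return terms

knots = [1, 3, 5, 7]  # hypothetical knot locations
for x in (0, 4, 10):
    print(x, rcs_basis(x, knots))
```

With k knots the covariate contributes k - 1 regression terms (the linear term plus k - 2 spline terms), so four knots cost three degrees of freedom.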

Read the paper (PDF)

Learn how SAS^® works with analytics, big data, and the cloud all in one product: SAS^® Analytics for Containers. This session describes the architecture of containers running in the public, private, or hybrid cloud. The reference architecture also shows how SAS leverages the distributed compute of Hadoop. Topics include how SAS products such as Base SAS^®, SAS/STAT^® software, and SAS/GRAPH^® software can all run in a container in the cloud. This paper discusses how to work with a SAS container running in a variety of Infrastructure as a Service (IaaS) models, including Amazon Web Services and OpenStack cloud. Additional topics include provisioning web-browser-based clients via Jupyter Notebooks and SAS^® Studio to provide data scientists with the tool of their choice. A customer use case is discussed that describes how SAS Analytics for Containers enables an IT department to meet the ad hoc, compute-intensive, and scaling demands of the organization. An exciting differentiator for the data scientist is the ability to send some or all of the analytic workload to run inside their Hadoop cluster by using the SAS accelerators for Hadoop. Doing so enables data scientists to dive inside the data lake and harness the power of all the data.

Read the paper (PDF)

An interactive voice response (IVR) system is a powerful tool that automates routine inbound call tasks. Companies leverage this system and make substantial savings by cutting down call center costs while customers from do-it-yourselfers to non-tech-savvies take advantage of this technology rather than wait in line to speak to a Customer Care Representative (CSR). The flip side of the coin is that customers often see IVR as a barrier to overcome in order to talk to a real person. So it is important that IVR is managed in such a way that it is beneficial for both a business and its customers. If managing IVR is critical, then measuring Customer Satisfaction (CSAT) scores is paramount as it helps in understanding customers better. The first section of this paper discusses analysis of different use cases on how CSAT scores correlate with customers' journeys inside IVR. West Corporation's leading financial services client offers a survey to their customers, and customers rate questions on a scale of 1 to 10 based on their IVR experience (10 being extremely satisfied). Analysis of survey ratings using SAS^® helped Operations understand challenges faced by customers traversing different sections of the IVR. The second section of the paper discusses how the research helped Operations to identify a population specification error that occurred while surveying customers. The error was rectified, and IVR CSAT scores improved by 3%.

Read the paper (PDF)

This paper shows how to use Base SAS^® to create unique datetime stamps that can be used for naming external files. These filenames provide automatic versioning for systems and are intuitive and completely sortable. In addition, they provide enhanced flexibility compared to generation data sets, which can be created by SAS^® or by the operating system.
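
The naming idea can be sketched in Python as follows. This is not the Base SAS code from the paper; the file stem, extension, and the `YYYYMMDDTHHMMSS` layout are assumptions chosen so that alphabetical order matches chronological order.

```python
from datetime import datetime

def stamped_name(stem, ext, when=None):
    """Return a file name such as report_20170402T090507.csv."""
    when = when or datetime.now()
    return f"{stem}_{when:%Y%m%dT%H%M%S}.{ext}"

# Fixed datetimes are used here so the result is reproducible.
older = stamped_name("report", "csv", datetime(2017, 4, 2, 9, 5, 7))
newer = stamped_name("report", "csv", datetime(2017, 11, 30, 18, 0, 0))
print(older)
print(newer)
print(sorted([newer, older]))
```

A stamp at one-second resolution is unique only if no two files with the same stem are written within the same second; a finer-grained stamp would be needed otherwise.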

Read the paper (PDF)

Clinical research study enrollment data consists of subject identifiers and enrollment dates that are used by investigators to monitor enrollment progress. Meeting study enrollment targets is critical to ensuring there will be enough data and end points to enable the statistical power of the study. For clinical trials that do not experience heavy, nearly daily enrollment, there will be a number of dates on which no subjects were enrolled. Therefore, plots of cumulative enrollment represented by a smoothed line can give a false impression, or imprecise reading, of study enrollment. A more accurate display would be a step function plot that would include dates where no subjects were enrolled. Rolling average plots often start with summing the data by month and creating a rolling average from the monthly sums. This session shows how to use the EXPAND procedure, along with the SQL and GPLOT procedures and the INTNX function, to create plots that display cumulative enrollment and rolling 6-month averages for each day. This includes filling in the dates with no subject enrollment and creating a rolling 6-month average for each date. This allows analysis of day-to-day variation as well as the short- and long-term impacts of changes, such as adding an enrollment center or initiatives to increase enrollment. This technique can be applied to any data that has gaps in dates. Examples include service history data and installation rates for a newly launched product.
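
The gap-filling and rolling-average steps can be illustrated with a small Python sketch. It does not use the EXPAND, SQL, or GPLOT procedures; the enrollment dates are hypothetical, and a 2-day trailing window stands in for the 6-month window so that the arithmetic is easy to follow.

```python
from collections import Counter
from datetime import date, timedelta

def daily_series(enroll_dates):
    """Return (day, enrolled_that_day, cumulative) for every calendar day,
    including days on which nobody was enrolled."""
    counts = Counter(enroll_dates)
    day, last = min(counts), max(counts)
    rows, total = [], 0
    while day <= last:
        n = counts.get(day, 0)
        total += n
        rows.append((day, n, total))
        day += timedelta(days=1)
    return rows

def rolling_mean(rows, window):
    """Trailing average of daily enrollment over the last `window` days."""
    out = []
    for i in range(len(rows)):
        chunk = rows[max(0, i - window + 1): i + 1]
        out.append(sum(n for _, n, _ in chunk) / len(chunk))
    return out

# Hypothetical enrollment dates with a two-day gap.
dates = [date(2017, 1, 1), date(2017, 1, 1), date(2017, 1, 4)]
rows = daily_series(dates)
averages = rolling_mean(rows, window=2)

for (day, n, total), avg in zip(rows, averages):
    print(day, n, total, avg)
```

Because the zero-enrollment days are present in the series, a step plot of the cumulative column stays flat across the gap instead of interpolating through it.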

Read the paper (PDF)

Computer and video games are complex these days. Events in video games are in some cases recorded automatically in text files, creating a history or story of game play. There are countable items in these event records that can be used as data for statistics and other types of modeling. This E-Poster shows you how to statistically analyze text files for video game events using SAS^®. Two games are analyzed. EVE Online, a massive multi-user online role-playing spaceship game, is one. The other game is Ingress, a cell phone game that combines exercise with a GPS and real-world environments. In both examples, the techniques involve parsing large amounts of text data to examine recurring patterns in text that describe events in the game play.

View the e-poster or slides (PDF)

This presentation describes an ongoing effort to standardize and simplify SAS^® coding across a rapidly growing analytics team in the health care industry. The number of SAS analysts in Kaiser Permanente's Data and Information Management Enhancement (DIME) department has nearly doubled in the past two years, going from approximately 20 to 40 analysts. The level of experience and technical skill varies greatly within the department. Analysts are required to provide quick turn-around on a large volume of analytical requests in this dynamic and high-demand environment. An effort was initiated in 2016 to create a SAS^® Enterprise Guide^® Template to standardize and simplify SAS coding across the department. The SAS Enterprise Guide^® template is designed to be a standard project file containing predefined code shells and examples that can be used as a basis for all new SAS Enterprise Guide^® projects. The primary goals of the template are to: 1) Effectively onboard new analysts to department standards; 2) Increase the efficiency of SAS development; 3) Bring consistency to how SAS is used; and 4) Simplify the transitioning of SAS jobs to the department's Production Support team. This presentation focuses on the process in which the template was initiated, drafted, and socialized across a large and diverse team of SAS analysts. It also highlights plans for ongoing maintenance of and improvements to the original template.

Read the paper (PDF)

We've learned a great deal about how to develop great reports and about business intelligence (BI) tools and how to use them to create reports, but have we figured out how to create true BI reports? Not every report that comes out of a BI tool provides business intelligence! In pursuit of the perfect BI report, this paper explores how we can combine the best of lessons learned about developing and running traditional reports and about applying business analytics in order to create true BI reports that deliver integrated analytics and intelligence.

Read the paper (PDF)

The DATA step is the familiar and powerful data processing language in SAS^® and now SAS Viya. The DATA step's simple syntax provides row-at-a-time operations to edit, restructure, and combine data. New to the DATA step in SAS Viya are a varying-size character data type and parallel execution. Varying-size character data enables intuitive string operations that go beyond the 32KB limit of current DATA step operations. Parallel execution speeds the processing of big data by starting the DATA step on multiple machines and dividing data processing among threads on these machines. To avoid multi-threaded programming errors, the run-time environment for the DATA step is presented along with potential programming pitfalls. Come see how the DATA step in SAS Viya makes your data processing simpler and faster.

Read the paper (PDF)

The DATA step has served SAS^® programmers well over the years. Although the DATA step is handy, the new, exciting, and powerful DS2 provides a significant alternative to the DATA step by introducing an object-oriented programming environment. It enables users to effectively manipulate complex data and efficiently manage the programming through additional data types, programming structure elements, user-defined methods, and shareable packages, as well as providing threaded execution. This tutorial was developed based on our experiences with getting started with DS2 and learning to use it to access, manage, and share data in a scalable and standards-based way. It helps SAS users of all levels get started with DS2 and understand its basic functionality by practicing with its features.

Read the paper (PDF) | Download the data file (ZIP)

For all business analytics projects big or small, the results are used to support business or managerial decision-making processes, and many of them eventually lead to business actions. However, executives or decision makers are often confused and feel uninformed about contents when presented with complicated analytics steps, especially when multi-processes or environments are involved. After many years of research and experiment, a web reporting framework based on SAS^® Stored Processes was developed to smooth the communication among data analysts, researchers, and business decision makers. This web reporting framework uses a storytelling style to present essential analytical steps to audiences, with dynamic HTML5 content and drill-down and drill-through functions in text, graph, table, and dashboard formats. No special skills other than SAS^® programming are needed for implementing a new report. The model-view-controller (MVC) structure in this framework significantly reduced the time needed for developing high-end web reports for audiences not familiar with SAS. Additionally, the report contents can be fed to tablet or smartphone users. A business analytical example is demonstrated during this session. By using this web reporting framework based on SAS Stored Processes, many existing SAS results can be delivered more effectively and persuasively on a SAS^® Enterprise BI platform.

Read the paper (PDF)

Do your reports effectively communicate the message you intended? Are your reports aesthetically pleasing? An attractive report does not ensure the accurate delivery of a data story, nor does a logical data story guarantee visual appeal. This paper provides guidance for SAS^® Visual Analytics Designer users to facilitate the creation of compelling data stories. The primary goal of a report is to enable readers to quickly and easily get answers to their questions. Achieving this goal is strongly influenced by the choice of visualizations for the data, the quantity and arrangement of the information that is included, and the use or misuse of color. This paper describes how to guide readers' movement through a report to support comprehension of the data story; provides tips on how to express quantitative data using the most appropriate graphs; suggests ways to organize content through the use of visual and interactive design techniques; and instructs report designers about the meaning of colors, presenting the notion that even subtle changes in color can evoke feelings that are different from those intended. A thoughtfully designed report can educate the viewer without compromising visual appeal. Included in this paper are recommendations and examples which, when applied to your own work, will help you create reports that are both informative and beautiful.

Read the paper (PDF)

Users want more power. SAS^® delivers. Data grids are a new data type available to users of SAS^® Business Rules Manager and SAS^® Decision Manager. These data grids can be deployed to both batch and web service scoring for data mining models and business decisions. Users will learn how to construct data with grid data types, create business rules using high-level expressions, and deploy decisions to both batch and web services for scoring.

Read the paper (PDF)

SAS^® Visual Analytics is a very powerful tool for users to visually explore data, but in some organizations not all data should be available for everybody. And although it is relatively easy to scale up a SAS Visual Analytics environment when the need for data increases, it still would be beneficial to set up a structure where the organization can keep control over who actually has the right to load data versus providing everybody the right to load data into a SAS Visual Analytics environment. Within this breakout session a potential solution is shown by providing a high-level overview of the SAS Visual Analytics data access management solution at ING bank in the Netherlands for the Risk Services Organization.

Read the paper (PDF)

As an information security or data professional, you have seen and heard about how advanced analytics has impacted nearly every business domain. You recognize the potential of insights derived from advanced analytics to improve the information security of your organization. You want to realize these benefits, and to understand their pitfalls. To successfully apply advanced analytics to the information security business problem, proper application of data management processes and techniques is of paramount importance. Based on professional services experience in implementing SAS^® Cybersecurity, this session teaches you about the data sources used, the activities involved in properly managing this data, and the means by which these processes address information security business problems. You will come to appreciate how using advanced analytics in the information security domain requires more than just the application of tools or modeling techniques. Using a data management regime for information security concerns can benefit your organization by providing insights into IT infrastructure, enabling successful data science activities, and providing greater resilience by way of improved information security investigations.

Read the paper (PDF)

The discipline of data science has seen an unprecedented evolution from primordial darkness to becoming the academic equivalent of an apex predator on university campuses across the country. But, survival of the discipline is not guaranteed. This session explores the genetic makeup of programs that are likely to survive, the genetic makeup of those that are likely to become extinct, and the role that the business community plays in that evolutionary process.

Read the paper (PDF)

Microsoft SharePoint is a popular web application framework and platform that is widely used for content and document management by companies and organizations. Connecting SAS^® with SharePoint combines the power of these two into one. As a continuation of my SAS^® Global Forum Paper 11520-2016 titled Releasing the Power of SAS^® into Microsoft SharePoint, this paper expands on how to implement data visualization from SAS to SharePoint. This paper shows users how to use SAS/GRAPH^® software procedures, Output Delivery System (ODS), and emails to create and send visualization output files from SAS to SharePoint Document Library. Several SAS code examples are included to show how to create tables, bar charts (with PROC GCHART), line plots (with PROC SGPLOT) and maps (with PROC GMAP) from SAS to SharePoint. The paper also demonstrates how to create data visualization based on JavaScript by feeding SAS data into HTML pages on SharePoint. A couple of examples on how to export SAS data to JSON formats and create data visualization in SharePoint based on JavaScript are provided.

Read the paper (PDF)

The SAS^® 9.4 SGPLOT procedure is a great tool for creating all types of graphs, from business graphs to complex clinical graphs. The goal for such graphs is to convey the data in a simple and direct manner with minimal distractions. But often, you need to grab the attention of a reader in the midst of a sea of data and graphs. For such cases, you need a visual that can stand out above the rest of the noise. Such visuals insert a decorative flavor into the graph to attract the eye of the reader and to encourage them to spend more time studying the visual. This presentation discusses how you can create such attention-grabbing visuals using the SGPLOT procedure.

Read the paper (PDF)

This paper presents considerations for deploying SAS^® Foundation across software-defined storage (SDS) infrastructures, and within virtualized storage environments. There are many new offerings on the market that offer easy, point-and-click creation of storage entities, with simplified management. Internal storage area network (SAN) virtualization also removes much of the hands-on management for defining storage device pools. Automated tier software further attempts to optimize data placement across performance tiers without manual intervention. Virtual storage provisioning and automated tier placement have many time-saving and management benefits. In some cases, they have also caused serious unintended performance issues with heavy large-block workloads, such as those found in SAS Foundation. You must follow best practices to get the benefit of these new technologies while still maintaining performance. For SDS infrastructures, this paper offers specific considerations for the performance of applications in SAS Foundation, workload management and segregation, replication, high availability, and disaster recovery. Architecture and performance ramifications and advice are offered for virtualized and tiered storage systems. General virtual storage pros and cons are also discussed in detail.

Read the paper (PDF)

As a report designer using SAS^® Visual Analytics, your goal is to create effective data visualizations that quickly communicate key information to report readers. But what makes a dashboard or report effective? How do you ensure that key points are understood quickly? One of the most common questions asked about SAS Visual Analytics is: what are the best practices for designing a report? Experts like Stephen Few and Edward Tufte have written extensively about successful visual design and data visualization. This paper focuses mainly on a different aspect of visual reports: the speed with which online reports render. In today's world, instant results are almost always expected. And the faster your report renders, the sooner decisions can be made and actions taken. Based on proven best practices and existing customer implementations, this paper focuses on server-side performance, client-side performance, and design performance. The end result is a set of design techniques that you can put into practice immediately to optimize your report performance.

Read the paper (PDF)

This paper describes specific actions to be taken to increase the usability, data consistency, and performance of an advanced SAS^® Customer Intelligence solution for marketing and analytic purposes. In addition, the paper focuses on the establishment of a data governance program to support the processes that take place within this environment. This paper presents our experiences developing a data governance light program for the enterprise data warehouse and its sources as well as for the data marts created downstream to address analytic and campaign management purposes. The challenge was to design a data governance program for this system in 90 days.

Read the paper (PDF)

The ever-growing volume of data challenges us to keep pace in ensuring that we use it to its full advantage. Unfortunately, often our response to new data sources, data types, and applications is somewhat reactionary. There exists a misperception that organizations have precious little time to consider a purposeful strategy without disrupting business continuity. Strategy is a phrase that is often misused and ill-defined. However, it is nothing more than a set of integrated choices that help position an initiative for future success. This presentation covers the key elements defining data strategy. The following key topics are included: What data should we keep or toss? How should we structure data (warehouse versus data lake versus real-time event streaming)? How do we store data (cloud, virtualization, federation, Hadoop)? What is the approach we use to integrate and cleanse data (ETL versus cognitive/automated profiling)? How do we protect and share data? These topics ensure that the organization gets the most value from its data. They explore how we prioritize and adapt our strategy to meet unanticipated needs in the future. As with any strategy, we need to make sure that we have a roadmap or plan for execution, so we talk specifically about the tools, technologies, methods, and processes that are useful as we design a data strategy that is both relevant and actionable to your organization.

Read the paper (PDF)

Standard SAS^® Studio tasks already include many advanced analytic procedures for data mining and other high-performance models, enabling point-and-click generation and execution of SAS^® code. However, you can extend the power of tasks by creating tasks of your own to enable point-and-click access to the latest SAS statistical procedures, to your own default model definitions, or to your previously developed SAS/STAT^® or SAS macro code. Best of all, these point-and-click tasks can be developed directly in SAS Studio without the need to compile binaries or build DLL files using third-party software. In this paper, we demonstrate three approaches to developing custom tasks. First, we build a custom task to provide point-and-click access to PROC IRT, including recently added functionality to PROC IRT used to analyze educational test and opinion survey data. Second, we build a custom task that calls a macro for previously developed SAS code, and we show how point-and-click options can be set up to allow users to guide the execution of complex macro code. Third, we demonstrate just enough of the underlying Apache Velocity Template Language code to enable developers to take advantage of the benefits of that language to support their SAS process. Finally, we show how these tasks can easily be shared with a user community, increasing the efficiency of analytic modeling across the organization.

Read the paper (PDF)

Hash objects have been supported in the DATA step and in the FCMP procedure for a while, but have you ever felt that hash objects could do a little more? For example, what if you needed to store more than doubles and character strings? Introducing PROC FCMP dictionaries. Dictionaries allow you to create references not only to numeric and character data, but they also give you fast in-memory hashing to arrays, other dictionaries, and even PROC FCMP hash objects. This paper gets you started using PROC FCMP dictionaries, describes usage syntax, and explores new programming patterns that are now available to your PROC FCMP programs, functions, and subroutines in the new SAS^® Viya platform environment.

Read the paper (PDF)

In highly competitive markets, the response rates to economically reasonable marketing campaigns are as low as a few percentage points or less. In that case, the direct measure of the delta between the average key performance indicators (KPIs) of the treated and control groups is heavily 'contaminated' by non-responders. This paper focuses on measuring promotional marketing campaigns with two properties: (1) price discounts or other benefits, which change the profitability of the targeted group for at least the promotion periods, and (2) impact of self-responders. The paper addresses the decomposition of the KPI measurement between responders and non-responders for both groups. Assuming that customers who rejected promotional offers will not change their behavior and that non-responders of both treated and control groups are not biased, the delta of the average KPIs for non-responders should be equal to zero. In practice, this component might deviate significantly from zero. The deviation might be caused by an initial nonzero delta of KPI values despite a random split between groups or by the existence of outliers, especially for non-balanced campaigns. Addressing the deviation of the delta from zero might require running additional statistical tests that compare not just the means but also the distributions of the KPIs. The decomposition of the measurement between responders and non-responders for both groups can then be used in differential modeling.
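
The decomposition can be made concrete with a small sketch. It rests on the identity that a group's mean KPI equals the response rate times the responders' mean plus one minus the response rate times the non-responders' mean. The function name `decompose` and the toy numbers are hypothetical and are not taken from the paper; the sketch also assumes each group contains at least one responder and one non-responder.

```python
def mean(values):
    return sum(values) / len(values)


def decompose(kpi, responded):
    """Split one group's KPI values into (response rate, responder mean, non-responder mean)."""
    resp = [k for k, r in zip(kpi, responded) if r]
    non = [k for k, r in zip(kpi, responded) if not r]
    return len(resp) / len(kpi), mean(resp), mean(non)


# Toy data: one responder (or self-responder) out of four customers in each group.
treated_kpi, treated_resp = [10.0, 2.0, 2.0, 2.0], [1, 0, 0, 0]
control_kpi, control_resp = [6.0, 2.0, 2.0, 2.0], [1, 0, 0, 0]

rate_t, resp_t, non_t = decompose(treated_kpi, treated_resp)
rate_c, resp_c, non_c = decompose(control_kpi, control_resp)

overall_delta = mean(treated_kpi) - mean(control_kpi)  # diluted by non-responders
non_responder_delta = non_t - non_c                    # expected to be near zero
responder_delta = resp_t - resp_c                      # the undiluted effect

print(overall_delta, non_responder_delta, responder_delta)
```

With these numbers the overall delta is 1.0 while the responder delta is 4.0, which shows how a low response rate dilutes the direct comparison of group means.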

Read the paper (PDF)

SAS^® has a very efficient and powerful way to get distances between an event and a customer. Using the tables and code located at http://support.sas.com/rnd/datavisualization/mapsonline/html/geocode.html#street, you can load latitude and longitude to addresses that you have for your events and customers. Once you have the tables downloaded from SAS, and you have run the code to get them into SAS data sets, this paper helps guide you through the rest using PROC GEOCODE and the GEODIST function. This can help you determine to whom to market an event. And, you can see how far a client is from one of your facilities.
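
Once latitude and longitude are attached to each address, the distance step reduces to a calculation on two coordinate pairs. The sketch below uses the haversine great-circle formula as a simple stand-in; it is not claimed to reproduce the method or the results of the SAS GEODIST function. The function name `great_circle_km` and the mean Earth radius of 6371 km are assumptions of this sketch.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# A quarter of the equator: from (0, 0) to (0, 90).
print(round(great_circle_km(0.0, 0.0, 0.0, 90.0), 1))
```

Applying such a function to every customer and event pair, then filtering on a distance threshold, gives the list of customers within reach of an event.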

Read the paper (PDF) | View the e-poster or slides (PDF)

Creating an effective style for your graphics can make the difference between clearly conveying your message to your audience and hiding your message in a sea of lines, markers, and text. A number of books explain the concepts of effective graphics, but you need an understanding of how styles work in your environment to correctly apply those principles. The goal of this paper is to give you an in-depth discussion of how styles are applied to Output Delivery System (ODS) graphics, from the ODS style level all the way down to the graph syntax. This discussion includes information about differences in grouped versus non-grouped plots, precedence order of style application, using style references, and much more. Don't forget your scuba gear!

Read the paper (PDF)

Are you prepared if a disaster happens? If your company relies on SAS^® applications to stay in business, you should have a Disaster Recovery Plan (DRP) in place. By a DRP, we mean documentation of the process to recover and protect your SAS infrastructure (SAS binaries, the operating system that is tuned to run your SAS applications, and all the pertinent data that the SAS applications require) in the event of a disaster. This paper discusses what needs to be in this plan to ensure that your SAS infrastructure not only works after it is recovered, but is able to be maintained on the recovery hardware infrastructure.

Read the paper (PDF)

Discover how to document your SAS^® programs, data sets, and catalogs with a few lines of code that include SAS functions, macro code, and SAS metadata. Do you start every project with the best of intentions to document all of your work, and then fall short of that aspiration when deadlines loom? Learn how SAS system macro variables can provide valuable information embedded in your programs, logs, lists, catalogs, data sets and ODS output; how your programs can automatically update a processing log; and how to generate two different types of codebooks.

Read the paper (PDF) | View the e-poster or slides (PDF)

Learn neat new (and not so new) methods for joining text, graphs, and tables in a single document. This paper shows how you can create a report that includes all three with a single solution: SAS^®. The text portion is taken from a Microsoft Word document and joined with output from the GPLOT and REPORT procedures.

Read the paper (PDF)

Whether you have been programming in SAS^® for years, or you are new to it, or you have dabbled with SAS^® Enterprise Guide^® before, this hands-on workshop sheds some light on the depth, breadth, and power of the SAS Enterprise Guide environment. With all the demands on your time, you need powerful tools that are easy to learn and that deliver end-to-end support for your data exploration, reporting, and analytics needs. Included in this workshop are data exploration tools; formatting code (cleaning up after your coworkers); enhanced programming environment (and how to calm it down); easily creating reports and graphics; producing the output formats you need (XLS, PDF, RTF, HTML); workspace layout; and productivity tips. This workshop uses SAS Enterprise Guide 7.1, but most of the content is applicable to earlier versions.

Read the paper (PDF) | Download the data file (ZIP)

Some data is best visualized in a polar orientation, particularly when the data is directional or cyclical. Although the SG procedures and Graph Template Language (GTL) do not directly support polar coordinates, they are quite capable of drawing such graphs with a little bit of data processing. We demonstrate how to convert your data from polar coordinates to Cartesian coordinates and use the power of SG procedures to create graphs that retain the polar nature of your data. Stop going around in circles: let us show you the way out with SG procedures!
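
The coordinate conversion at the heart of this approach is x = r cos(theta) and y = r sin(theta). The sketch below is a minimal Python illustration of that step only; the function name `polar_to_cartesian` is hypothetical, and it assumes the mathematical convention of angles in degrees measured counterclockwise from the positive x-axis. Compass-style data measured clockwise from north would need the angle remapped first.

```python
from math import cos, radians, sin


def polar_to_cartesian(r, theta_deg):
    """Convert a radius and an angle in degrees to (x, y)."""
    t = radians(theta_deg)
    return r * cos(t), r * sin(t)


# Four points on a circle of radius 2, one every 90 degrees.
for angle in (0, 90, 180, 270):
    x, y = polar_to_cartesian(2.0, angle)
    print(angle, round(x, 6), round(y, 6))
```

The resulting x and y columns can then be drawn with any ordinary scatter or series plot, with circular reference lines generated the same way from a fixed radius and a sweep of angles.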

Read the paper (PDF)

Customer feedback is a critical aspect of businesses in today's world as it is invaluable in determining what customers like and dislike about the business' service. This loop of regularly listening to customers' voice through survey comments and improving services based on it leads to better business, and more importantly, to an enhancement in customer experience. The challenge is to classify and analyze these unstructured text comments to gain insights and to focus on areas of improvement. The purpose of this paper is to illustrate how text mining in SAS^® Enterprise Miner 14.1 helped one of our clients, a leading financial services company, convert their customers' problems into opportunities. The customers' feedback pertaining to their experience with an Interactive Voice Response (IVR) system is collected by an enterprise feedback management (EFM) company. The comments are then split into two groups, which helps us differentiate customer opinions. This grouping is based on customers who have given a rating of 0 to 6 and a rating of 9 to 10 on a Likert scale of 0 to 10 (10 being extremely satisfied) in the survey questionnaire. Text mining is performed on both these groups, and an algorithm creates clusters that are subsequently used to segment customers based on opinions they are interested in voicing. Furthermore, sentiment scores are calculated for each one of the segments. The scores classify the polarity of customer feedback and prioritize the problems the client needs to focus on.

Read the paper (PDF)

Sometimes it might be beneficial to share a BI environment with multiple tenants within an enterprise, but at the same time this might also introduce additional complexity with regard to the administration of data access. In this breakout session, one possible setup is shown by sharing a high-level overview of such an environment within the ING bank in the Netherlands for the Risk Services organization.

Read the paper (PDF)

The Base SAS^® 9.4 Output Delivery System (ODS) EPUB destination enables users to deliver SAS^® reports as e-books on Apple mobile devices. ODS EPUB e-books are truly mobile: you don't need an Internet connection to read them. Just install Apple's free iBooks app, and you're good to go. This paper shows you how to create an e-book with ODS EPUB and sideload it onto your Apple device. You will learn new SAS^® 9.4 techniques for including text, images, audio, and video in your ODS EPUB e-books. You will understand how to customize your e-book's table of contents (TOC) so that readers can easily navigate the e-book. And you will learn how to modify the ODS EPUB style to create specialized presentation effects. This paper provides beginning to intermediate instruction for writing e-books with ODS EPUB. Please bring your iPad, iPhone, or iPod to the presentation so that you can download and read the examples.

Read the paper (PDF)

Creating an environment that enables and empowers self-service and agile analytic capabilities requires a tremendous amount of working together and extensive agreements between IT and the business. Business and IT users are struggling to know what version of the data is valid, where they should get the data from, and how to combine and aggregate all the data sources to apply analytics and deliver results in a timely manner. All the while, IT is struggling to supply the business with more and more data that is becoming available through many different data sources such as the Internet, sensors, the Internet of Things, and others. In addition, once they start trying to join and aggregate all the different types of data, the manual coding can be very complicated and tedious, can demand extraneous resources and processing, and can negatively impact the overhead on the system. If IT enables agile analytics in a data lab, it can alleviate many of these issues, increase productivity, and deliver an effective self-service environment for all users. This self-service environment using SAS^® analytics in Teradata has decreased the time required to prepare the data and develop the statistical data model, and delivered faster results in minutes compared to days or even weeks. This session discusses how you can enable agile analytics in a data lab, leverage SAS analytics in Teradata to increase performance, and learn how hundreds of organizations have adopted this concept to deliver self-service capabilities in a streamlined process.

Randomized control trials have long been considered the gold standard for establishing causal treatment effects. Can causal effects be reasonably estimated from observational data too? In observational studies, you observe treatment T and outcome Y without controlling confounding variables that might explain the observed associations between T and Y. Estimating the causal effect of treatment T therefore requires adjustments that remove the effects of the confounding variables. The new CAUSALTRT (causal-treat) procedure in SAS/STAT^® 14.2 enables you to estimate the causal effect of a treatment decision by modeling either the treatment assignment T or the outcome Y, or both. Specifically, modeling the treatment leads to the inverse probability weighting methods, and modeling the outcome leads to the regression methods. Combined modeling of the treatment and outcome leads to doubly robust methods that can provide unbiased estimates for the treatment effect even if one of the models is misspecified. This paper reviews the statistical methods that are implemented in the CAUSALTRT procedure and includes examples of how you can use this procedure to estimate causal effects from observational data. This paper also illustrates some other important features of the CAUSALTRT procedure, including bootstrap resampling, covariate balance diagnostics, and statistical graphics.
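
The inverse probability weighting idea can be illustrated with a few lines of arithmetic. The sketch below computes a basic, unnormalized IPW estimate of the average treatment effect from propensity scores that are simply taken as given; it does not fit a treatment model and is not a reproduction of what the CAUSALTRT procedure computes. The function name `ipw_ate` and the toy data are hypothetical.

```python
def ipw_ate(treated, outcome, propensity):
    """Unnormalized IPW estimate of the average treatment effect.

    treated    -- 1 if the subject received treatment T, else 0
    outcome    -- observed outcome Y
    propensity -- assumed-known probability of treatment for each subject
    """
    n = len(outcome)
    # Weight each treated subject by 1/p and each control subject by 1/(1 - p).
    mu1 = sum(t * y / p for t, y, p in zip(treated, outcome, propensity)) / n
    mu0 = sum((1 - t) * y / (1 - p) for t, y, p in zip(treated, outcome, propensity)) / n
    return mu1 - mu0


t = [1, 0, 1, 0]
y = [3.0, 1.0, 3.0, 1.0]
p = [0.5, 0.5, 0.5, 0.5]
print(ipw_ate(t, y, p))
```

The weighting makes subjects who were unlikely to receive the treatment they actually got count for more, which is how confounding through the treatment assignment is adjusted for when the propensity scores are correct.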

Read the paper (PDF)

Pooling two or more cross-sectional survey data sets (such as stacking the data sets on top of one another) is a strategy often used by researchers for one of two purposes: (1) to more efficiently conduct significance tests on point estimate changes observed over time or (2) to increase the sample size in hopes of improving the precision of a point estimate. The latter purpose is especially common when making inferences on a subgroup, or domain, of the target population insufficiently represented by a single survey data set. Using data from the National Survey of Family Growth (NSFG), the aim of this paper is to walk through a series of practical estimation objectives that can be tackled by analyzing data from two or more pooled survey data sets. Where applicable, we comment on the resulting interpretive nuances.

Read the paper (PDF)

Model validation is an important step in the model building process because it provides opportunities to assess the reliability of models before their deployment. Predictive accuracy measures the ability of the models to predict future risks, and significant developments have been made in recent years in the evaluation of survival models. SAS/STAT^® 14.2 includes updates to the PHREG procedure with a variety of techniques to calculate overall concordance statistics and time-dependent receiver operating characteristic (ROC) curves for right-censored data. This paper describes how to use these criteria to validate and compare fitted survival models and presents examples to illustrate these applications.
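
To make the notion of a concordance statistic for right-censored data concrete, the sketch below computes Harrell's C, one common choice, by brute force over all pairs. It is chosen here only for illustration and is not presented as the PHREG implementation; the function name `harrell_c` is hypothetical, and the sketch assumes at least one usable pair exists.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index.

    times  -- observed follow-up time for each subject
    events -- 1 if the event was observed, 0 if the time is right-censored
    risk   -- model risk score (higher means the event is expected sooner)
    """
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when subject i has an observed event before time j.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / usable


times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
print(harrell_c(times, events, [3.0, 2.0, 1.0]))  # scores ordered with the data
print(harrell_c(times, events, [1.0, 2.0, 3.0]))  # scores ordered against the data
```

A value of 0.5 corresponds to scores that carry no ranking information, so comparing two fitted models on the same data amounts to comparing how far each model's value lies above 0.5.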

Read the paper (PDF)

The British Airways (BA) revenue management team is responsible for surfacing prices made available in the market with the objective of maximizing revenue from our 40,000,000 passenger journeys. BA is currently working to understand how competitor data can be exploited to help facilitate better decision making. Due to the low level of aggregation, competitor data is too large (and consequently too expensive) to store on conventional relational databases. Therefore, it has been stored on a small Hadoop installation at BA. Thanks to SAS/ACCESS^® Interface to Hadoop, we have been able to run our complex algorithms on these large data sets without changing the way we work and whilst exploiting the full capabilities of SAS^®.

Read the paper (PDF)

Traditional analytical modeling, with roots in statistical techniques, works best on structured data. Structured data enables you to impose certain standards and formats in which to store the data values. For example, a variable indicating gas mileage in miles per gallon should always be a number (for example, 25). However, with unstructured data analysis, the free-form text no longer limits you to expressing this information in only one way (25 mpg, twenty-five mpg, and 25M/G). The nuances of language, context, and subjectivity of text make it more complex to fit generalized models. Although statistical methods using supervised learning prove efficient and effective in some cases, sometimes you need a different approach. These are the situations in which rule-based models with Natural Language Processing capabilities can add significant value. In what context would you choose a rule-based modeling approach versus a statistical approach? How do you assess the tradeoffs of choosing a rule-based modeling approach with higher interpretability versus a statistical model that is black-box in nature? How can we develop rule-based models that optimize model performance without compromising accuracy? How can we design, construct, and maintain a complex rule-based model? What is a data-driven approach to rule writing? What are the common pitfalls to avoid? In this paper, we discuss all these questions based on our experiences working with SAS^® Contextual Analysis and SAS^® Sentiment Analysis.
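
The gas mileage example above can be sketched as a single hand-written rule. The Python snippet below is only an illustration of what a rule looks like; it is not taken from SAS Contextual Analysis, and the pattern and the name `extract_mpg` are hypothetical.

```python
import re

# One rule: a number followed by a known spelling of the unit.
MPG_RULE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:mpg|m/g|miles per gallon)\b",
    re.IGNORECASE,
)

def extract_mpg(text):
    """Return the mileage as a float, or None when the rule does not fire."""
    match = MPG_RULE.search(text)
    return float(match.group(1)) if match else None
```

The rule handles "25 mpg" and "25M/G" but returns `None` for "twenty-five mpg", which shows why rule sets tend to grow and need maintenance as new phrasings appear in the data.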

Read the paper (PDF)

The DOCUMENT procedure is a little known procedure that can save you vast amounts of time and effort when managing the output of your SAS^® programming efforts. This procedure is deeply associated with the mechanism by which SAS controls output in the Output Delivery System (ODS). Have you ever wished you didn't have to modify and rerun the report-generating program every time there was some tweak in the desired report? PROC DOCUMENT enables you to store one version of the report as an ODS Document Object and then call it out in many different output forms, such as PDF, HTML, listing, RTF, and so on, without rerunning the code. Have you ever wished you could extract those pages of the output that apply to certain BY variables such as State, StudentName, or CarModel? With PROC DOCUMENT, you have where capabilities to extract these. Do you want to customize the table of contents that assorted SAS procedures produce when you make frames for the table of contents with HTML, or use the facilities available for PDF? PROC DOCUMENT enables you to get to the inner workings of ODS and manipulate them. This paper addresses PROC DOCUMENT from the viewpoint of end results, rather than provide a complete technical review of how to do the task at hand. The emphasis is on the benefits of using the procedure, not on detailed mechanics.

Read the paper (PDF) | View the e-poster or slides (PDF)

Factorization machines are a new type of model that is well suited to very high-cardinality, sparsely observed transactional data. This paper presents the new FACTMAC procedure, which implements factorization machines in SAS^® Visual Data Mining and Machine Learning. This powerful and flexible model can be thought of as a low-rank approximation of a matrix or a tensor, and it can be efficiently estimated when most of the elements of that matrix or tensor are unknown. Thanks to a highly parallel stochastic gradient descent optimization solver, PROC FACTMAC can quickly handle data sets that contain tens of millions of rows. The paper includes examples that show you how to use PROC FACTMAC to recommend movies to users based on tens of millions of past ratings, predict whether fine food will be highly rated by connoisseurs, restore heavily damaged high-resolution images, and discover shot styles that best fit individual basketball players.
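
For readers unfamiliar with the model, the following Python sketch shows the prediction equation of a generic second-order factorization machine: a global bias, per-feature weights, and pairwise interactions whose strengths are inner products of low-rank factor vectors. It is not PROC FACTMAC; the function names are hypothetical, and the parameters are assumed to be already estimated.

```python
def fm_predict_naive(x, w0, w, V):
    """Direct evaluation: loop over every pair of features."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total

def fm_predict_fast(x, w0, w, V):
    """Same value in O(n * k) time, using the identity
    sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    k = len(V[0])
    total = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += 0.5 * (s * s - s_sq)
    return total
```

The second form is what makes the model practical on sparse data, because only the nonzero entries of `x` contribute to the sums.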

Read the paper (PDF)

Implementation of state transition models for loan-level portfolio evaluation was an arduous task until now. Several features have been added to the SAS^® High-Performance Risk engine that greatly enhance the ability of users to implement and execute these complex, loan-level models. These new features include model methods, model groups, and transition matrix functions. These features eliminate unnecessary and redundant calculations; enable the user to seamlessly interconnect systems of models; and automatically handle the bulk of the process logic in model implementation that users would otherwise need to code themselves. These added features reduce both the time and effort needed to set up model implementation processes, as well as significantly reduce model run time. This paper describes these new features in detail. In addition, we show how these powerful models can be easily implemented by using SAS^® Model Implementation Platform with SAS^® 9.4. This implementation can help many financial institutions take a huge leap forward in their modeling capabilities.

Read the paper (PDF)

Credit card fraud. Loan fraud. Online banking fraud. Money laundering. Terrorism financing. Identity theft. The strains that modern criminals are placing on financial and government institutions demand new approaches to detecting and fighting crime. Traditional methods of analyzing large data sets on a periodic, batch basis are no longer sufficient. SAS^® Event Stream Processing provides a framework and run-time architecture for building and deploying analytical models that run continuously on streams of incoming data, which can come from virtually any source: message queues, databases, files, TCP/IP sockets, and so on. SAS^® Visual Scenario Designer is a powerful tool for developing, testing, and deploying aggregations, models, and rule sets that run in the SAS^® Event Stream Processing Engine. This session explores the technology architecture, data flow, tools, and methodologies that are required to build a solution based on SAS Visual Scenario Designer that enables organizations to fight crime in real time.

Read the paper (PDF)

Finding daylight saving time (DST) is a common task for manipulating time series data. The date of daylight saving time changes every year. If SAS^® programmers depend on manually entering the value of daylight saving time in their programs, the maintenance of the program becomes tedious. Using a SAS function can make finding the value easy. This paper discusses several ways to capture and use daylight saving time.
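
The idea of computing the dates instead of hard-coding them can be sketched in Python as follows. The sketch assumes the United States rules in effect since 2007 (DST starts on the second Sunday in March and ends on the first Sunday in November); the helper names are hypothetical, and this is not the SAS code from the paper.

```python
from datetime import date, timedelta

def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday in a month (Monday=0 ... Sunday=6)."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7   # days until the first such weekday
    return first + timedelta(days=offset + 7 * (n - 1))

def us_dst_range(year):
    """Start and end dates of US daylight saving time (post-2007 rules assumed)."""
    sunday = 6
    start = nth_weekday(year, 3, sunday, 2)    # second Sunday in March
    end = nth_weekday(year, 11, sunday, 1)     # first Sunday in November
    return start, end
```

Because the dates are derived from the year, a program that calls `us_dst_range` needs no yearly maintenance as long as the assumed rules stay in force.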

Read the paper (PDF)

This paper discusses format enumeration (via the DICTIONARY.FORMATS view) and the new FMTINFO function that gives information about a format, such as whether it is a date or currency format.

Read the paper (PDF)

SAS/STAT^® software has several procedures that estimate parameters from generalized linear models designed for both continuous and discrete response data (including proportions and counts). Procedures such as LOGISTIC, GENMOD, GLIMMIX, and FMM, among others, offer a flexible range of analysis options to work with data from a variety of distributions and also with correlated or clustered data. SAS^® procedures can also model zero-inflated and truncated distributions. This paper demonstrates how statements from PROC NLMIXED can be written to match the output results from these procedures, including the LS-means. Situations arise where the flexible programming statements of PROC NLMIXED are needed for other situations such as zero-inflated or hurdle models, truncated counts, or proportions (including legitimate zeros) that have random effects, and also for probability distributions not available elsewhere. A useful application of these coding techniques is that programming statements from NLMIXED can often be directly transferred into PROC MCMC with little or no modification to perform analyses from a Bayesian perspective with these various types of complex models.

Read the paper (PDF)

Cumulative logistic regression models are used to predict an ordinal response. They have the assumption of proportional odds. Proportional odds means that the coefficients for each predictor category must be consistent or have parallel slopes across all levels of the response. This paper uses a sample data set to demonstrate how to test the proportional odds assumption. It shows how to use the UNEQUALSLOPES option when the assumption is violated. A cumulative logistic regression model is built, and then the performance of the model on a test set is compared to the performance of a generalized multinomial model. This shows the utility and necessity of the UNEQUALSLOPES option when building a cumulative logistic regression model. The procedures shown are produced using SAS^® Enterprise Guide^® 7.1.

Read the paper (PDF)

Longitudinal count data arise when a subject's outcomes are measured repeatedly over time. Repeated measures count data have an inherent within subject correlation that is commonly modeled with random effects in the standard Poisson regression. A Poisson regression model with random effects is easily fit in SAS^® using existing options in the NLMIXED procedure. This model allows for overdispersion via the nature of the repeated measures; however, departures from equidispersion can also exist due to the underlying count process mechanism. We present an extension of the cross-sectional COM-Poisson (CMP) regression model established by Sellers and Shmueli (2010) (a generalized regression model for count data in light of inherent data dispersion) to incorporate random effects for analysis of longitudinal count data. We detail how to fit the CMP longitudinal model via a user-defined log-likelihood function in PROC NLMIXED. We demonstrate the model flexibility of the CMP longitudinal model via simulated and real data examples.
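
As a small illustration of what a user-defined log-likelihood for this model involves, the Python sketch below evaluates the log probability mass function of the COM-Poisson distribution, P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)), with Z(λ, ν) = Σ_j λ^j / (j!)^ν. It covers a single observation with no random effects, the name `cmp_log_pmf` is hypothetical, and truncating the infinite series at a fixed number of terms is an assumption that can fail when ν is small and λ is large.

```python
import math

def cmp_log_pmf(y, lam, nu, max_terms=200):
    """Log pmf of the COM-Poisson distribution at count y.

    lam -- rate-like parameter (lambda > 0)
    nu  -- dispersion parameter (nu = 1 gives the Poisson distribution,
           nu > 1 underdispersion, nu < 1 overdispersion)
    The normalizing constant is approximated by the first max_terms
    terms of its series, summed on the log scale for stability.
    """
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(max_terms)]
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return y * math.log(lam) - nu * math.lgamma(y + 1) - log_z
```

Setting `nu=1` reproduces the Poisson log pmf, which is a convenient check before adding random effects to the linear predictor.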

Read the paper (PDF)

Do you ever wonder how to create a report with weighted averages, or one that displays the last day of the month by default? Do you want to take advantage of the one-click relative-time calculations available in SAS^® Visual Analytics, or learn a few other creative ways to enhance your report? If your answer is yes, then this paper is for you. We not only teach you some new tricks, but the techniques covered here will also help you expand the way you think about SAS Visual Analytics the next time you are challenged to create a report.

Read the paper (PDF)

The increasing complexity of data in research and business analytics requires versatile, robust, and scalable methods of building explanatory and predictive statistical models. Quantile regression meets these requirements by fitting conditional quantiles of the response with a general linear model that assumes no parametric form for the conditional distribution of the response; it gives you information that you would not obtain directly from standard regression methods. Quantile regression yields valuable insights in applications such as risk management, where answers to important questions lie in modeling the tails of the conditional distribution. Furthermore, quantile regression is capable of modeling the entire conditional distribution; this is essential for applications such as ranking the performance of students on standardized exams. This expository paper explains the concepts and benefits of quantile regression, and it introduces you to the appropriate procedures in SAS/STAT^® software.
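
A quick way to see what fitting a conditional quantile means is the check (pinball) loss that quantile regression minimizes. The Python sketch below handles only the simplest case, a model with no covariates, where the minimizer is a sample quantile; the function names are hypothetical, and this is not how the SAS/STAT procedures solve the general problem.

```python
def check_loss(residual, tau):
    """Check loss: weight tau on positive residuals, 1 - tau on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def total_check_loss(values, q, tau):
    """Sum of check losses when every value is predicted by the constant q."""
    return sum(check_loss(v - q, tau) for v in values)

def best_constant_quantile(values, tau):
    """Constant that minimizes the total check loss.

    A minimizer is always attained at one of the data points,
    so it is enough to search over the observed values.
    """
    return min(sorted(values), key=lambda q: total_check_loss(values, q, tau))
```

With `tau=0.5` the result is the median, which is unaffected by an extreme value in the data, whereas a large `tau` targets the upper tail directly.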

Read the paper (PDF)

The macro language is both powerful and flexible. With this power, however, comes complexity, and this complexity often makes the language more difficult to learn and use. Fortunately, one of the key elements of the macro language is its use of macro variables, and these are easy to learn and easy to use. You can create macro variables using a number of different techniques and statements. However, the five most commonly used methods are not only the most useful, but also among the easiest to master. Since macro variables are used in so many ways within the macro language, learning how they are created can also serve as an excellent introduction to the language itself. These methods include: 1) the %LET statement; 2) macro parameters (named and positional); 3) the iterative %DO statement; 4) using the INTO clause in PROC SQL; and 5) using the CALL SYMPUTX routine.

Read the paper (PDF) | Download the data file (ZIP)

Formats can be used for more than just making your data look nice. They can be used in memory lookup tables and to help you create data-driven code. This paper shows you how to build a format from a data set, how to write a format out as a data set, and how to use formats to make programs data driven. Examples are provided.

Read the paper (PDF)

SAS^® Environment Manager is the predominant tool for managing your SAS^® environment. Its popularity is increasing quickly as evidenced by the increased technical support requests from our customers. This paper identifies the most frequently asked questions from customers by reviewing the support work completed by the development and technical support teams over the last few years. The questions range across topics such as web interface usage; alerts, controls, and resource discovery; Agent issues; and security issues. Questions discussed in the paper include: What resources need to be configured after we install SAS Environment Manager? What Control Actions are available, what is their purpose, and when do I use them? Why does SAS Environment Manager show all resources as (!) (Down)? What is the best way to enable an alert for a resource? How do I configure HTTPs? Can we configure the Agents with certificates other than the default? What is the combination of roles needed to see the Resources Tab? This paper presents detailed answers to the questions and also points out where you can find more information. We believe that by understanding these answers, SAS^® administrators will be more knowledgeable about SAS Environment Manager, and can better implement and manage their SAS environment.

Read the paper (PDF)

Are you a marketing analyst who speaks SAS^®? Congratulations, you are in high demand! Or are you? Marketing analysts with programming skills are critical today. The ability to extract large volumes of data, massage it into a manageable format, and display it simply are necessary skills in the world of big data. However, programming skills are not nearly enough. In fact, some marketing managers are putting less and less weight on them and are focusing more on the softer skills that they require. This session will help ensure that you are not left out. In this session, Emma Warrillow shares why being a good programmer is only the beginning. She provides practical tips on moving from being someone who is good at coding to becoming a true collaborator with marketing, taking your marketing analytics to the next level. In 2016, Emma Warrillow's presentation at SAS^® Global Forum was very well received (http://blogs.sas.com/content/sgf/2016/04/21/always-be-yourself-unless-you-can-be-a-unicorn/). In this follow-up, she revisits some of the highlights from 2016 and shares some new ideas. You can be sure of an engaging code-free session!

Read the paper (PDF)

In the quest for valuable analytics, access to business data through message queues provides near real-time access to the entire data life cycle. This in turn enables our analytical models to perform accurately. What does the item a user temporarily put in the shopping basket indicate, and what can be done to motivate the user? How do you recover the user who has now unsubscribed, given that the user had previously unsubscribed and re-subscribed quickly? User behavior can be captured completely and efficiently using a message queue, which causes minimal load on production systems and allows for distributed environments. There are some technical issues encountered when attempting to populate a data warehouse using events from a message queue. The presentation outlines a solution to the following issues: the message queue connection, how to ensure that messages aren't lost in transit, and how to efficiently process messages with SAS^®; message definition and metadata, and how to react to changes in message structure; data architecture and which data architecture is appropriate for storing message data and other business data; late arrival of messages and how late arriving data can be loaded into slowly changing dimensions; and analytical processing and how transactional message data can be reformatted for analytical modeling. Ultimately, populating a data warehouse with message queue data can require less development than accessing source databases; however a robust architecture

Read the paper (PDF)

Having crossed the spectrum from an epidemiologist and researcher (where ad hoc is a way of life and where research is the main focus) to a SAS^® programmer (writing reusable code for automation and batch jobs, which require no manual interventions), I have learned a few things that I wish I had known as a researcher. These things would not only have helped me to be a better SAS programmer, but they also would have saved me time and effort as a researcher by enabling me to have well-organized, accurate code (that I didn't accidentally remove) and code that would work when I ran it again on another date. This poster presents five SAS tips that are common practice among SAS programmers. I provide researchers who use SAS with tips that are handy and useful, and I provide code (where applicable) that they can try out at home. Using the tips provided will make any SAS programmer smile when they are presented with your code (not guaranteed, but your results should not vary by using these tips).

View the e-poster or slides (PDF)

This paper demonstrates how to use the capabilities of SAS^® Data Integration Studio to extract, load, and transform your data within a Hadoop environment. Which transformations can be used in each layer of the ELT process is illustrated using a sample use case, and the functionality of each is described. The use case steps through the process from source to target.

Read the paper (PDF)

The acquisition of new customers is fundamental to the success of every business. Data science methods can greatly improve the effectiveness of acquiring prospective customers and can contribute to the profitability of business operations. In a business-to-business (B2B) setting, a predictive model might target business prospects as individual firms listed by, for example, Dun & Bradstreet. A typical acquisition model can be defined using a binary response with the values categorizing a firm's customers and non-customers, for which it is then necessary to identify which of the prospects are actually customers of the firm. The methods of fuzzy logic, for example, based on the distance between strings, might help in matching customers' names and addresses with the overall universe of prospects. However, two errors can occur: false positives (when the prospect is incorrectly classified as a firm's customer), and false negatives (when the prospect is incorrectly classified as a non-customer). In the current practice of building acquisition models, these errors are typically ignored. In this presentation, we assess how these errors affect the performance of the predictive model as measured by a lift. In order to improve the model's performance, we suggest using a pre-determined sample of correct matches and calibrating the model's predicted probabilities based on actual take-up rates. The presentation is illustrated with real B2B data and includes elements of SAS^® code that was used in the research.
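
Matching based on the distance between strings can be sketched with the Levenshtein edit distance, shown below in Python. This is a generic illustration, not the code used in the research: the function names and the threshold parameter are hypothetical, and real name matching would usually normalize case, punctuation, and legal suffixes first.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def is_match(name_a, name_b, max_distance=2):
    """Declare two names the same firm when their edit distance is small."""
    return levenshtein(name_a.lower(), name_b.lower()) <= max_distance
```

The threshold drives both error types described above: a loose `max_distance` produces false positives, and a strict one produces false negatives.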

Read the paper (PDF)

When creating statistical models that include multiple covariates (for example, Cox proportional hazards models or multiple linear regression), it is important to address which variables are categorical and continuous for proper analysis and interpretation in SAS^®. Categorical variables, regardless of SAS data type, should be added in the MODEL statement with an additional CLASS statement. In larger models containing many continuous or categorical variables, it is easy to overlook variables that should be added to the CLASS statement. To solve this problem, we have created a macro that uses simple input from the model variables, with PROC CONTENTS and additional logic checks, to create the necessary CLASS statement and to run the desired model. With this macro, variables are evaluated on multiple conditions to see whether they should be considered class variables. Then, they are added automatically to the CLASS statement.

Read the paper (PDF) | View the e-poster or slides (PDF)

The presentation will give a brief introduction to Bayesian Analysis within SAS. Participants will learn the difference between Bayesian and Classical Statistics and be introduced to PROC MCMC.

SAS^® has been installed at your organization. Now what? How do you approach configuring groups, roles, folders, and permissions in your environment? This presentation is built on best practices used within the U.S. SAS^® Professional Services and Delivery division and aims to equip new and seasoned SAS administrators with the knowledge and tools necessary to design and implement a SAS metadata and file system security model. We start by covering the basic building blocks of the SAS^® Intelligence Platform metadata and security framework. We discuss the SAS metadata architecture, and highlight the differences between groups and roles, permissions and capabilities, access control entries and access control templates, and what content can be stored within metadata folders versus in file system folders. We review the various authorization layers in a SAS deployment that must work together to create a secure environment, including the metadata layer, the file system, and the data layer. Then, we present a 10-step best practice approach for how to design your SAS metadata security model. We provide an introduction to basic metadata security design and file system security design templates that have been used extensively by SAS Professional Services and Delivery in helping customers secure their SAS environments.

Read the paper (PDF)

Machine Learning algorithms have been available in SAS software since 1979. This session provides practical examples of machine learning applications. The evolution of machine learning at SAS is illustrated with examples of nearest-neighbor discriminant analysis in SAS/STAT PROC DISCRIM to advanced predictive modeling in SAS Enterprise Miner. Machine learning techniques addressed include memory based reasoning, decision trees, neural networks, and gradient boosting algorithms.

In this presentation you will learn the basics of working with nested data, such as students within classes, customers within households, or patients within clinics through the use of multilevel models. Multilevel models can accommodate correlation among nested units through random intercepts and slopes, and generalize easily to 2, 3, or more levels of nesting. These models represent a statistically efficient and powerful way to test your key hypotheses while accounting for the hierarchical nesting of the design. The GLIMMIX procedure is used to demonstrate analyses in SAS.

Allowing SAS^® users to leverage SAS prompts when running programs is a very powerful tool. Using SAS prompts makes it easier for SAS users to submit parameter-driven programs and for developers to create robust, data-driven programs. This presentation demonstrates how to create SAS prompts from SAS^® Enterprise Guide^® and shows how to roll them out to users so that they can take advantage of them from SAS Enterprise Guide, the SAS^® Add-In for Microsoft Office, and the SAS^® Stored Process Web Application.

Read the paper (PDF)

Getting Started with ARIMA Models introduces the basic features of time series variation and the model components used to accommodate them: stationary (ARMA), trend and seasonal (the 'I' in ARIMA), and exogenous (input variable related). The Identify, Estimate, and Forecast framework for building ARIMA models is illustrated with two demonstrations.

SAS^® 9.4 provides three ways to upgrade: upgrade in place, automated migration with the SAS^® Migration Utility, and partial promotion. This session focuses primarily on the different techniques and best practices for each. We also discuss the pros and cons of using the SAS Migration Utility and what is required for migrating users' content like projects, data, and code.

Read the paper (PDF)

When you look at examples of the REPORT procedure, you see code that tests _BREAK_ and _RBREAK_, but you wonder: what's the breakdown of the COMPUTE block? And, sometimes, you need more than one break line on a report, or you need a customized or adjusted number at the break. Everything in PROC REPORT that is advanced seems to involve a COMPUTE block. This paper provides examples of advanced PROC REPORT output that uses _BREAK_ and _RBREAK_ to customize the extra break lines that you can request with PROC REPORT. Examples include how to get custom percentages with PROC REPORT, how to get multiple break lines at the bottom of the report, how to customize break lines, and how to customize LINE statement output. This presentation is aimed at the intermediate to advanced report writer who knows something about PROC REPORT, but wants to get the breakdown of how to do more with PROC REPORT and the COMPUTE block.

Read the paper (PDF) | Download the data file (ZIP)

This paper shows how you can reduce the computing footprint of your SAS^® applications without compromising your end products. The paper presents the 15 axioms of going green with your SAS applications. The axioms are proven, real-world techniques for reducing the computer resources used by your SAS programs. When you follow these axioms, your programs run faster, use less network bandwidth, use fewer desktop or shared server computing resources, and create more compact SAS data sets.

Read the paper (PDF)

Many SAS^® users are working across multiple platforms, commonly combining Microsoft Windows and UNIX environments. Often, SAS code developed on one platform (for example, on a PC) might not work on another platform (for example, on UNIX). Portability is not just working across multi-platform environments; it is also about making programs easier to use across projects, across companies, or across clients and vendors. This paper examines some good programming practices to address common issues that occur when you work across SAS on a PC and SAS on UNIX. They include: 1) avoid explicitly defining file paths in LIBNAME, FILENAME, and %INCLUDE statements that require platform-specific syntax such as forward slash (in UNIX) or backslash (in PC SAS); 2) avoid using X commands in SAS code to execute statements on the operating system, which work only on Windows and not on UNIX; 3) use the appropriate SAS rounding function for numeric variables to avoid different results when dealing with 64-bit operating systems and 32-bit systems (the difference between rounding before or after calculations and derivations is discussed); 4) develop portable SAS code to import or export Microsoft Excel spreadsheets across PC SAS and UNIX SAS, especially when dealing with multiple worksheets within one Excel file; and 5) use SAS^® Enterprise Guide^® to access and run PC SAS programs in UNIX effectively.

Read the paper (PDF)

Because many SAS^® users either work for or own companies that house big data, the threat that malicious software poses becomes even more extreme. Malicious software, often abbreviated as malware, includes many different classifications, ways of infection, and methods of attack. This E-Poster highlights the types of malware, detection strategies, and removal methods. It provides guidelines to secure essential assets and prevent future malware breaches.

Read the paper (PDF) | View the e-poster or slides (PDF)

In this course you will learn how to access and manage SAS and Excel data in SAS^® Viya.

This workshop provides hands-on experience with using SAS Enterprise Miner. Workshop participants will learn to do the following: open a project; create and explore a data source; build and compare models; and produce and examine score code that can be used for deployment.

This workshop provides hands-on experience with some basic functionality of SAS Data Loader for Hadoop. You will learn how to copy data to Hadoop, profile data in Hadoop, and cleanse data in Hadoop.

This workshop provides hands-on experience with SAS^® Studio. Workshop participants will use SAS's new web-based interface to access data, write SAS programs, and generate SAS code through predefined tasks. This workshop is intended for SAS programmers from all experience levels.

This workshop provides hands-on experience with SAS Viya Data Mining and Machine Learning through the programming interface to SAS Viya. Workshop participants will learn how to start and stop a CAS session; move data into CAS; prepare data for machine learning; use SAS Studio tasks for supervised learning; and evaluate the results of analyses.

This workshop provides hands-on experience performing statistical analysis with the Statistics tasks in SAS Studio. Workshop participants will learn to perform statistical analyses using tasks, evaluate which tasks are ideal for different kinds of analyses, edit the generated code, and customize a task.

This workshop provides hands-on experience using SAS^® Text Miner. For a collection of documents, workshop participants will learn how to: read and convert documents for use by SAS Text Miner; retrieve information from the collection using query features of the software; identify the dominant themes and concepts in the collection; and classify documents having pre-assigned categories.

When modeling time series data, we often use a LAG of the dependent variable. The LAG function works great for this, until you try to predict out into the future and need the model's predicted value from one record as an independent value for a future record. This paper examines exactly how the LAG function works, and explains why it doesn't in this case. It also explains how to create a hash object that will accomplish a LAG of any value, how to load the initial data, how to add predicted values to the hash object, and how to extract those values when needed as an independent variable for future observations.
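
The behavior described above follows from the fact that LAG keeps a queue that is updated only when the function executes, so it returns the value from its previous call rather than from the previous record. The Python sketch below mimics that queue for a single call site; the class name `Lag` is hypothetical, `None` stands in for a SAS missing value, and the paper's hash object solution is not reproduced here.

```python
from collections import deque

class Lag:
    """Mimics one LAGn call site: a FIFO queue that changes only when called."""

    def __init__(self, n=1):
        self.queue = deque([None] * n)   # starts filled with missing values

    def __call__(self, value):
        self.queue.append(value)         # push the current value
        return self.queue.popleft()      # return the value pushed n calls ago

def lag_every_row(values):
    """Calling the lag on every record gives the previous record's value."""
    lag1 = Lag(1)
    return [lag1(v) for v in values]

def lag_on_even_rows_only(values):
    """Calling the lag conditionally returns the value from the last call,
    which is not the previous record."""
    lag1 = Lag(1)
    return [lag1(v) if i % 2 == 1 else None for i, v in enumerate(values)]
```

For the series 10, 20, 30, 40, the conditional version returns 20 on the fourth record instead of 30, which is the same kind of surprise that arises when a lagged value is needed for a record whose predecessor has not yet been scored.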

Read the paper (PDF)

Data processing can sometimes require complex logic to match and rank record associations across events. This paper presents an efficient solution to generating these complex associations using the DATA step and data hash objects. The solution applies to multiple business needs including subsequent purchases, repayment of loan advance, or hospital readmits. The logic demonstrates how to construct a hash process to identify a qualifying initial event and append linking information with various rank and analysis factors, through the example of a specific use case of the process.

Read the paper (PDF) | Download the data file (ZIP)

Heat maps use colors to communicate numeric data by varying the underlying values that represent red, green, and blue (RGB) as a linear function of the data. You can use heat maps to display spatial data, plot big data sets, and enhance tables. You can use colors on the spectrum from blue to red to show population density in a US map. In fields such as epidemiology and sociology, colors and maps are used to show spatial data, such as how rates of disease or crime vary with location. With big data sets, patterns that you would hope to see in scatter plots are hidden in dense clouds of points. In contrast, patterns in heat maps are clear, because colors are used to display the frequency of observations in each cell of the graph. Heat maps also make tables easier to interpret. For example, when displaying a correlation matrix, you can vary the background color from white to red to correspond to the absolute correlation range from 0 to 1. You can shade the cell behind a value, or you can replace the table with a shaded grid. This paper shows you how to make a variety of heat maps by using PROC SGPLOT, the Graph Template Language, and SG annotation.
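
The linear mapping from data to RGB values can be sketched in a few lines of Python. The example follows the correlation matrix case above, shading from white at 0 to red at 1; the function name `lerp_color` and its defaults are hypothetical, and the paper itself uses PROC SGPLOT, the Graph Template Language, and SG annotation rather than code like this.

```python
def lerp_color(value, low=0.0, high=1.0,
               start=(255, 255, 255), end=(255, 0, 0)):
    """Map a number to a hex color by linear interpolation in RGB space.

    Values at or below `low` get the start color (white by default),
    values at or above `high` get the end color (red by default).
    """
    t = (value - low) / (high - low)
    t = max(0.0, min(1.0, t))                      # clamp to [0, 1]
    rgb = tuple(round(s + t * (e - s)) for s, e in zip(start, end))
    return "#{:02X}{:02X}{:02X}".format(*rgb)
```

For a correlation matrix, you would pass the absolute correlation, for example `lerp_color(abs(r))`, so that strong negative and strong positive correlations receive the same intensity.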

Read the paper (PDF)

How would you answer this question? Most of us struggle to articulate the value of the tools, techniques, and teams we use when using analytics. How do you help the new director understand the value of SAS^® to you, your job, and the company? In this interactive session, you will discover the components that make up total cost of ownership (TCO) as they apply to the analytics lifecycle. What should you consider when you evaluate total cost of ownership and why should you measure it? How can you help your management team understand the value that SAS provides?

Read the paper (PDF)

Another year implementing, validating, securing, optimizing, migrating, and adopting the Hadoop platform. What have been the top 10 accomplishments with Hadoop seen over the last year? We also review issues, concerns, and resolutions from the past year. We discuss where implementations are and some best practices for moving forward with Hadoop and SAS^® releases.

Read the paper (PDF)

The openness of SAS^® Viya, the new cloud analytic platform that uses SAS^® Cloud Analytic Services (CAS), emphasizes a unified experience for data scientists. You can now execute the analytics capabilities of SAS^® in different programming languages including Python, Java, and Lua, as well as use a RESTful endpoint to execute CAS actions directly. This paper provides an introduction to these programming languages. For each language, we illustrate how the API is surfaced from the CAS server, the types of data that you can upload to a CAS server, and the result tables that are returned. This paper also provides a comprehensive comparison of using these programming languages to build a common analytical process, including loading data to a CAS server; exploring, manipulating, and visualizing data; and building statistical and machine learning models.

Read the paper (PDF)

The clock is ticking! Is your company ready for May 25, 2018 when the General Data Protection Regulation that affects data privacy laws across Europe comes into force? If companies fail to comply, they incur very large fines and might lose customer trust if sensitive information is compromised. With data streaming in from multiple channels in different formats, sizes, and wavering quality, it is increasingly difficult to keep track of personal data so that you can protect it. SAS^® Data Management helps companies on their journey toward governance and compliance involving tasks such as detection, quality assurance, and protection of personal data. This paper focuses on using SAS^® Federation Server and SAS^® Data Management Studio in the SAS^® data management suite of products to surface and manage that hard-to-find personal data. SAS Federation Server provides you with a universal way to access data in Hadoop, Teradata, SQL Server, Oracle, SAP HANA, and other types of data without data movement during processing. The advanced data masking and encryption capabilities of SAS Federation Server can be used when virtualizing data for users. Purpose-built data quality functions are used to perform identification analysis, parsing, and matching and extraction of personal data in real time. We also provide insight into how the exploratory data analysis capability of SAS^® Data Management Studio enables you to scan through your investigation hub to identify and categorize personal data.

Read the paper (PDF)

What if you had analytics near the edge for your Internet of Things (IoT) devices that would tell you whether a piece of equipment is operating within its normal range? And what if those same analytics could help you intelligently determine what data you should keep and what data should be filtered at the edge? This session focuses on classifying streaming data near the edge by showcasing a demo that implements a single-class classification model within a gateway device. The model identifies observations that are normal and abnormal to help determine possible machine issues and preventative maintenance opportunities. The classification also helps to provide a method for filtering data at the edge by capturing all abnormal data but taking only a sample of the normal operating data. The model is developed using SAS^® Viya and implemented on a gateway device using SAS^® Event Stream Processing. By using a single-class classification technique, the demo also illustrates how to avoid issues with binary classification that would require failure observations in order to build an accurate model. Problems that this demo addresses include: identifying potential and future equipment failures in near real time; filtering sensor data near the edge to prevent unnecessary transport and storage of less valuable data; and building a classification model for failure that doesn't require observations relating to failures.
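
The filtering rule described above (keep every abnormal observation, keep only a sample of the normal ones) can be sketched in Python. The `is_abnormal` callback stands in for the single-class model; the temperature range used here is a made-up placeholder, and nothing in the sketch reflects the SAS Event Stream Processing API.

```python
import random


def filter_at_edge(events, is_abnormal, normal_sample_rate, rng=None):
    """Keep all abnormal events and roughly normal_sample_rate of the normal ones."""
    rng = rng or random.Random()
    kept = []
    for event in events:
        # Abnormal data always passes; normal data passes with the given probability.
        if is_abnormal(event) or rng.random() < normal_sample_rate:
            kept.append(event)
    return kept


def out_of_range(reading):
    """Hypothetical stand-in for the classifier: normal means 15 <= reading <= 25."""
    return not 15.0 <= reading <= 25.0


readings = [20.1, 20.4, 35.0, 19.8, 41.2]
only_abnormal = filter_at_edge(readings, out_of_range, normal_sample_rate=0.0)
print(only_abnormal)  # [35.0, 41.2]
```

Setting `normal_sample_rate` between 0 and 1 trades transport and storage cost against how much normal operating data is retained for later model refreshes.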

Read the paper (PDF)

Many SAS^® environments are set up for single-byte character sets (SBCS). But many organizations now have to process names of people and companies with characters outside that set. You can solve this problem by changing the configuration to the UTF-8 encoding, which is a multi-byte character set (MBCS). But the commonly used text manipulating functions like SUBSTR, INDEX, FIND, and so on, act on bytes, and should not be used anymore. SAS has provided new functions to replace these (K-functions). Also, character fields have to be enlarged to make room for multi-byte characters. This paper describes the problems and gives guidelines for a strategy to change. It also presents code to analyze existing code for functions that might cause problems. For those interested, a short historic background and a description of UTF-8 encoding is also provided. Conclusions focus on the positioning of SAS environments configured with UTF-8 versus single-byte encodings, the strategy of organizations faced with a necessary change, and the documentation.
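
The difference between counting bytes and counting characters can be seen in a few lines of Python. This is a language-neutral illustration of the problem, not SAS code, and the sample name is invented.

```python
name = "Zoë Müller"
encoded = name.encode("utf-8")

print(len(name))     # 10 characters
print(len(encoded))  # 12 bytes: "ë" and "ü" each occupy two bytes in UTF-8


def byte_prefix_is_valid(data, n):
    """Return True if the first n bytes of data form complete UTF-8 text."""
    try:
        data[:n].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Cutting after 3 bytes splits "ë" in half; cutting by characters does not.
print(byte_prefix_is_valid(encoded, 3))  # False
print(name[:3])                          # Zoë
```

The same effect explains why character fields need more room after the switch: a field sized for 10 single-byte characters cannot hold the 12 bytes of this name.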

Read the paper (PDF)

Capacity management is concerned with managing, controlling, and optimizing the hardware resources on a technology platform. Its primary goal is to ensure that IT resources are right-sized to meet current and future business requirements in a cost-effective manner. In other words, keeping those hardware vendors at bay! A SAS^® LASR Analytic Server, with its dependence on in-memory resources, necessitates a revisit to the traditional IT server capacity management practices. A major UK-based financial services institution operates a multi-tenanted Enterprise SAS^® platform. The tenants share platform resources and as such, require quotas enforced with system limits and costs for their resource utilization, aligned to the business outcomes and agreed-upon service level agreements (SLAs). This paper discusses the implementation of system, operational, and development policies applicable in a multi-tenanted SAS platform, in order to optimize an investment in the analytic platform provided by SAS LASR Analytic Server and to be in control as to when capacity uplifts are required.

Read the paper (PDF)

Traditionally, role-based access control is implemented as group memberships. Access to SAS^® data sets or metadata libraries requires membership in the group that 'owns' the resources. From the point of view of a SAS process, these authorizations are additive. If a user is a member in two distinct groups, her SAS processes have access to the data resources of both groups simultaneously. This happens every time the user runs a SAS process; even when the code in question is meant to be used with only one group's resources. As a consequence, having a master data source defining data flows between groups becomes futile, as any SAS process of the user can bypass said definitions. In addition, as it is not possible to reduce the user's authorizations to match those of only the relevant group, it becomes challenging to determine whether other members of the group have sufficient authorization. Furthermore, it becomes difficult to audit statistics production, as it cannot be automatically determined which of the groups owns a certain log file. All these problems can be avoided by using role-based access control with dynamic separation of duties (RBAC DSoD). In DSoD, the user is able to activate only one group membership at a time. This paper describes one way to implement an RBAC with DSoD schema in a UNIX server environment.

Read the paper (PDF)

In the past 10 years, SAS^® Enterprise Guide^® has developed into the go-to application to access the power of SAS^®. With each new release, SAS continues to add functionality that makes the SAS user's life easier. We take a closer look at some of the built-in features within SAS Enterprise Guide and how they can make your life easier. One of the most exciting and powerful features we explore is allowing parallel execution on the same server. This gives you the ability to run multiple SAS processes at the same time regardless of whether you have a SAS^® Grid Computing environment. Some other topics we cover include conditional processing within SAS Enterprise Guide, how to securely store database login and password information, setting up autoexec files in SAS Enterprise Guide, exploiting process flows, and much more.

Read the paper (PDF)

SAS^® Enterprise Guide^® continues to add easy-to-use features that enable you to work more efficiently. For example, you can now debug your DATA step code with a DATA step debugger tool; upload data to SAS^® Viya with a point-and-click task; control process flow execution behavior when an error occurs; export results to Microsoft Excel and Microsoft PowerPoint destinations with the click of a button; zoom views; filter the data grid with your own WHERE clause; easily define case-insensitive filters; and automatically get the latest product updates. Come see these and more new features and enhancements in SAS Enterprise Guide 7.11, 7.12, and 7.13.

Read the paper (PDF)

Do you need to create a format instantly? Does the format have a lot of labels, and it would take a long time to type in all the codes and labels by hand? Sometimes, a SAS^® programmer needs to create a user-defined format for hundreds or thousands of codes, and he needs an easy way to accomplish this without having to type in all of the codes. SAS provides a way to create a user-defined format without having to type in any codes. If the codes and labels are in a text file, SAS data set, Excel file, or in any file that can be converted to a SAS data set, then a SAS user-defined format can be created on the fly. The CNTLIN= option of PROC FORMAT allows a user to create a user-defined format or informat from raw data or from a SAS file. This paper demonstrates how to create two user-defined formats instantly from a raw text file on our Census Bureau website. It explains how to use these user-defined formats for the final report and final output data set from PROC TABULATE. The paper focuses on the CNTLIN= option of PROC FORMAT, not the CNTLOUT= option.

Read the paper (PDF)

SAS^® Visual Analytics has two offerings, SAS^® Visual Statistics and SAS^® Visual Data Mining and Machine Learning, that provide knowledge workers and data scientists an interactive interface for data partition, data exploration, feature engineering, and rapid modeling. These offerings are powered by the SAS^® Viya platform, thus enabling big data and big analytic problems to be solved. This paper focuses on the steps a user would perform during an interactive modeling session.

Read the paper (PDF)

This paper will build on the knowledge gained in the Intro to SAS^® ODS Graphics. The capabilities in ODS Graphics grow with every release as both new paradigms and smaller tweaks are introduced. After talking with the ODS developers, a selection of the many wonderful capabilities was selected. This paper will look at that selection of both types of capabilities and provide the reader with more tools for their belt. Visualization of data is an important part of telling the story seen in the data. And while the standards and defaults in ODS Graphics are very well done, sometimes the user has specific nuances for characters in the story or additional plot lines they want to incorporate. Almost any possibility, from drama to comedy to mystery, is available in ODS Graphics if you know how. We will explore tables, annotation and changing attributes, as well as the BLOCK plot. Any user of Base SAS on any platform will find great value from the SAS ODS Graphics procedures. Some experience with these procedures is assumed, but not required.

Read the paper (PDF) | Download the data file (ZIP)

How can we run traditional SAS^® jobs, including SAS^® Workspace Servers, on Hadoop worker nodes? The answer is SAS^® Grid Manager for Hadoop, which is integrated with the Hadoop ecosystem to provide resource management, high availability and enterprise scheduling for SAS customers. This paper provides an introduction to the architecture, configuration, and management of SAS Grid Manager for Hadoop. Anyone involved with SAS and Apache Hadoop should find the information in this paper useful. The first area covered is a breakdown of each required SAS and Hadoop component. From the Hadoop ecosystem, we define the role of Hadoop YARN, Hadoop Distributed File System (HDFS) storage, and Hadoop client services. We review SAS metadata definitions for SAS Grid Manager, SAS^® Object Spawner, and SAS^® Workspace Servers. We cover required Kerberos security, as well as SAS^® Enterprise Guide^® and the SAS^® Grid Manager Client Utility. YARN queues and the SAS Grid Policy file for optimizing job scheduling are also reviewed. And finally, we discuss traditional SAS math running on a Hadoop worker node, and how it can take advantage of high-performance math to accelerate job execution. By leveraging SAS Grid Manager for Hadoop, sites are moving SAS jobs inside a Hadoop cluster. This will ultimately cut down on data movement and provide more consistent job execution. Although this paper is written for SAS and Hadoop administrators, SAS users can also benefit from this session.

Read the paper (PDF)

This presentation teaches the audience how to use ODS Graphics. Now part of Base SAS^®, ODS Graphics are a great way to easily create clear graphics that enable any user to tell their story well. SGPLOT and SGPANEL are two of the procedures that can be used to produce powerful graphics that used to require a lot of work. The core of the procedures is explained, as well as some of the many options available. Furthermore, we explore the ways to combine the individual statements to make more complex graphics that tell the story better. Any user of Base SAS on any platform will find great value in the SAS ODS Graphics procedures.

Read the paper (PDF) | Download the data file (ZIP)

For many years now you have learned the ins and outs of using SAS/ACCESS^® software to move data into SAS^® to do your analytics. With the new open, cloud-ready SAS^® Viya platform comes a new set of data access technologies known as SAS data connectors and SAS data connect accelerators. This paper describes what these new data access products are and how they integrate with the SAS Viya platform. After reading this paper, you will have the foundation needed to load data from third-party data sources into SAS Viya.

Read the paper (PDF)

Statistical analysis is like detective work, and a data set is like the crime scene. The data set contains unorganized clues and patterns that can, with proper analysis, ultimately lead to meaningful conclusions. Using SAS^® tools, a statistical analyst (like any good crime scene investigator) performs a preliminary analysis of the data set through visualization and descriptive statistics. Based on the preliminary analysis, followed by a detailed analysis, both the crime scene investigator (CSI) and the statistical analyst (SA) can use scientific or analytical tools to answer the key questions: What happened? What were the causes and effects? Why did this happen? Will it happen again? Applying the CSI analogy, this paper presents an example case study using a two-step process to investigate a big-data crime scene. Part I shows the general procedures that are used to identify clues and patterns and to obtain preliminary insights from those clues. Part II narrows the focus on the specific statistical analyses that provide answers to different questions.

Read the paper (PDF)

In 1993, Erin Brockovich, a legal clerk to Edward L. Masry, began a lengthy manual investigation after discovering a link between elevated clusters of cancer cases in Hinkley, CA, and contaminated water in the same area due to the disposal of chemicals from a utility company. In this session, we combine disparate data sources - cancer cases and chemical spillages - to identify connections between the two data sets using SAS^® Visual Investigator. Using the map and network functionalities, we visualize the contaminated areas and their link to cancer clusters. What took Erin Brockovich months and months to investigate, we can do in minutes with SAS Visual Investigator.

Read the paper (PDF)

As technology expands, we have the need to create programs that can be handed off to clients, to regulatory agencies, to parent companies, or to other projects, and handed off with little or no modification by the recipient. Minimizing modification by the recipient often requires the program itself to self-modify. To some extent the program must be aware of its own operating environment and what it needs to do to adapt to it. There are a great many tools available to the SAS^® programmer that will allow the program to self-adjust to its own surroundings. These include location-detection routines, batch files based on folder contents, the ability to detect the version and location of SAS, programs that discern and adjust to the current operating system and the corresponding folder structure, the use of automatic and user defined environmental variables, and macro functions that use and modify system information. Need to create a portable program? We can hand you the tools.

Read the paper (PDF)

Data with a location component is naturally displayed on a map. Base SAS^® 9.4 provides libraries of map data sets to assist in creating these images. Sometimes, a particular sub-region is all that needs to be displayed. SAS/GRAPH^® software can create a new subset of the map using the GPROJECT procedure minimum and maximum latitude and longitude options. However, this method is capable only of cutting out a rectangular area. This paper presents a polygon clipping algorithm that can be used to create arbitrarily shaped custom map regions. Maps are nothing more than sets of polygons, defined by sets of border points. Here, a custom polygon shape overlays the map polygons and saves the intersection of the two. The DATA step hash object is used for easier bookkeeping of the added and deleted points needed to maintain the correct shape of the clipped polygons.
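
To give a feel for polygon clipping, here is a Python sketch of the classic Sutherland-Hodgman algorithm. It is a swapped-in textbook technique, not the paper's algorithm: it assumes the clipping shape is convex with counter-clockwise vertices, whereas the paper targets arbitrarily shaped regions and uses the DATA step hash object for bookkeeping.

```python
def clip_polygon(subject, clip):
    """Sutherland-Hodgman: clip `subject` against a convex, counter-clockwise `clip`."""

    def inside(p, a, b):
        # True when p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p, q, a, b):
        # Point where segment p-q crosses the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        points, output = output, []
        for j in range(len(points)):
            current, previous = points[j], points[j - 1]
            if inside(current, a, b):
                if not inside(previous, a, b):
                    output.append(intersection(previous, current, a, b))  # entering
                output.append(current)
            elif inside(previous, a, b):
                output.append(intersection(previous, current, a, b))      # leaving
        if not output:
            break  # nothing left of the subject polygon
    return output


region = [(0, 0), (4, 0), (4, 4), (0, 4)]   # a map polygon
window = [(2, 2), (6, 2), (6, 6), (2, 6)]   # the custom clipping shape
clipped = clip_polygon(region, window)
print(clipped)  # [(2.0, 2.0), (4, 2.0), (4, 4), (2.0, 4.0)]
```

Each pass adds intersection points where the border is cut and drops points outside the clip edge, which is the added-and-deleted-point bookkeeping the abstract refers to.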

Read the paper (PDF)

How do you enable strong authentication across different parts of your organization in a safe and secure way? We know that Kerberos provides us with a safe and secure strong authentication mechanism, but how does it work across different domains or realms? In this paper, we examine how Kerberos cross-realm authentication works and the different parts that you need ready in order to use Kerberos effectively. Understanding the principals and applying the ideas we present will make you successful at improving the security of your authentication system.

Read the paper (PDF)

When analyzing data with SAS^®, we often use the SAS DATA step and the SQL procedure to explore and manipulate data. Though they both are useful tools in SAS, many SAS users do not fully understand their differences, advantages, and disadvantages and thus have numerous unnecessary biased debates on them. Therefore, this paper illustrates and discusses these aspects with real work examples, which give SAS users deep insights into using them. Using the right tool for a given circumstance not only provides an easier and more convenient solution, it also saves time and work in programming, thus improving work efficiency. Furthermore, the illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.

Read the paper (PDF)

Managing your career future involves learning outside the box at all stages. The next step is not always on the path we planned as opportunities develop and must be taken when we are ready. Prepare with this paper, which explains important features of Base SAS^® that support teams. In this presentation, you learn about the following: concatenating team shared folders with personal development areas; creating consistent code; guidelines for a team (not standards); knowing where the documentation will provide the basics; thinking of those who follow (a different interface); creating code for use by others; and how code can learn about the SAS environment.

Read the paper (PDF)

Programming for others involves new disciplines not called for when we write to provide results. There are many additional facilities in the languages of SAS^® to ensure the processes and programs you provide for others will please your customers. Not all are obvious and some seem hidden. The never-ending search to please your friends, colleagues, and customers could start in this presentation.

Read the paper (PDF)

Making sure that you have saved all the necessary information to replicate a deliverable can be a cumbersome task. You want to make sure that all the raw data sets and all the derived data sets, whether they are Study Data Tabulation Model (SDTM) data sets or Analysis Data Model (ADaM) data sets, are saved. You prefer that the date/time stamps are preserved. Not only do you need the data sets, you also need to keep a copy of all programs that were used to produce the deliverable, as well as the corresponding logs from when the programs were executed. Any other information that was needed to produce the necessary outputs also needs to be saved. You must do all of this for each deliverable, and it can be easy to overlook a step or some key information. Most people do this process manually. It can be a time-consuming process, so why not let SAS^® do the work for you?

Read the paper (PDF)

Developing software using agile methodologies has become the common practice in many organizations. We use the SCRUM methodology to prepare, plan, and implement changes in our analytics environment. Preparing for the deployment of a new release usually took two days of creating packages, promoting them, deploying jobs, creating migration scripts, and correcting errors made in the first attempt. A sprint that originally took 10 working days (two weeks) was effectively reduced to barely seven. By automating this process, we were able to reduce the time needed to prepare our deployment to less than half a day, increasing the time we can spend developing by 25%. In this paper, we present the process and system prerequisites for automating the deployment process. We also describe the process, code, and scripts required for automating metadata promotion and physical table comparison and update.

Read the paper (PDF)

String externalization is the key to making your SAS^® applications speak multiple languages, even if you can't. Using the new features in SAS^® 9.3 for internationalization, your SAS applications can be written to adapt to whatever environment they are found in. String externalization is the process of identifying and separating translatable strings from your SAS program. This paper outlines the four steps of string externalization: create a Microsoft Excel spreadsheet for messages (optional), create SMD files, convert SMD files, and create the final SAS data set. Furthermore, it briefly shows you a real-world project on applying the concept. Using the Excel spreadsheet message text approach, professional translators can work more efficiently translating text in a friendlier and more comfortable environment. Subsequently, a programmer can also fully concentrate on developing and maintaining SAS code when your application is traveling to a new country.

View the e-poster or slides (PDF)

Geofencing is one of the most promising and exciting concepts that has developed with the advent of the internet of things. Like John Anderton in the 2002 movie Minority Report, you can now enter a mall and immediately receive commercial ads and offers based on your personal taste and past purchases. Authorities can track vessels' positions and detect when a ship is not in the area it should be, or they can forecast and optimize harbor arrivals. When a truck driver breaks from the route, the dispatcher can be alerted and can act immediately. And there are countless examples from manufacturing, industry, security, or even households. All of these applications are based on the core concept of geofencing, which consists of detecting whether a device's position is within a defined geographical boundary. Geofencing requires real-time processing in order to react appropriately. In this session, we explain how to implement real-time geofencing on streaming data with SAS^® Event Stream Processing and achieve high-performance processing, in terms of millions of events per second, over hundreds of millions of geofences.
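
The core test, whether a position falls inside a boundary, can be sketched with the standard ray-casting method in Python. This is a minimal illustration under simplifying assumptions (planar coordinates, one polygon, no spatial index); it is not how SAS Event Stream Processing implements geofencing.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray to the right of (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside   # an odd number of crossings means inside
    return inside


fence = [(0, 0), (4, 0), (4, 4), (0, 4)]   # hypothetical geofence
print(point_in_polygon(2, 2, fence))  # True
print(point_in_polygon(5, 2, fence))  # False
```

Running this check against hundreds of millions of fences per event would be far too slow, which is why a production system needs indexing and streaming infrastructure on top of the basic test.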

Read the paper (PDF)

In this panel session, professors from three geographically diverse universities explain what makes for an effective partnership with private sector companies. Specific examples are discussed from health care, insurance, financial services, and retail. The panelists discuss what works, what doesn't, and what both parties need to be prepared to bring to the table for a long-term, mutually beneficial partnership.

The days of comparing paper copies of graphs on light boxes are long gone, but the problems associated with validating graphical reports still remain. Many recent graphs created using SAS/GRAPH^® software include annotations, which complicate an already complex problem. In ODS Graphics, only a single input data set should be used. Because annotation can be more easily added by overlaying an additional graph layer, it is now more practical to use that single input data set for validation, which removes all of the scaling, platform, and font issues that got in the way before. This paper guides you through the techniques to simplify validation while you are creating your perfect graph.

Read the paper (PDF)

Every organization, from the most mature to a day-one start-up, needs to grow organically. A deep understanding of internal customer and operational data is the single biggest catalyst to develop and sustain the data. Advanced analytics and big data directly feed into this, and there are best practices that any organization (across the entire growth curve) can adopt to drive success. Analytics teams can be drivers of growth. But to be truly effective, key best practices need to be implemented. These practices include in-the-weeds details, like the approach to data hygiene, as well as strategic practices, like team structure and model governance. When executed poorly, business leadership and the analytics team are unable to communicate with each other: they talk past each other and do not work together toward a common goal. When executed well, the analytics team is part of the business solution, aligned with the needs of business decision-makers, and drives the organization forward. Through our engagements, we have discovered best practices in three key areas, all of which are critical to analytics team effectiveness: 1) Data Hygiene, 2) Complex Statistical Modeling, and 3) Team Collaboration.

Read the paper (PDF)

You're in the business of performing complex analyses on large amounts of data. This data changes quickly and often, so you've invested in a powerful high-performance analytics engine with the speed to respond to a real-time data stream. However, you realize an immediate problem upon the implementation of your software solution: your analytics engine wants to process many records of data at once, but your streaming engine wants to send individual records. How do you store this streaming data? How do you tell the analytics engine about the updates? This paper explains how to manage real-time streaming data in a batch-processing analytics engine. The problem of managing streaming data in analytics engines comes up in many industries: energy, finance, health care, and marketing to name a few. The solution described in this paper can be applied in any industry, using features common to most analytics engines. You learn how to store and manage streaming data in such a way as to guarantee that the analytics engine has only current information, limit interruptions to data access, avoid duplication of data, and maintain a historical record of events.

Read the paper (PDF)

How many environments does your organization have: three (Dev/Test/Prod), five (Dev/SIT/UAT/Pre-Prod/Prod), or maybe only one? Once you've built your SAS^® process (an ETL job, a model, an exploration, or a report), how should you promote it across these environments? If you have only one environment, is a development life cycle still possible? (Yes, it is.) Historically, the traditional systems development life cycle (SDLC) spans multiple environments (for example, Dev/Test/Prod). This approach has benefits, primarily ensuring that change in one environment does not adversely impact others, but costs and release time-frames mean this is not always practicable. Some sites now adopt a two-platform approach: Non-Production and Production. Non-Prod exists for technology change, such as new software, hot fixes, database connections, and so on. At these sites, the business runs wholly within the Production environment, yet still requires a business-specific life-cycle management within the Production environment. And, of course, all promotion must include thorough testing. Other questions to consider are: 1) Can this promotion process be automated? 2) Can this process extend beyond business content to include configuration settings? This presentation investigates the SAS tools available to promote content between environments or between functional areas of a single environment, and how to automate and test the promotion process. Just imagine: a weekly automated and tested promotion process? Let's see.

Read the paper (PDF)

One of the first maps of the present United States was John White's 1585 map of the Albemarle Sound and Roanoke Island, the site of the Lost Colony and the site of my present home. This presentation looks at advances in mapping through the ages, from the early surveys and hand-painted maps, through lithographic and photochemical processes, to digitization and computerization. Inherent difficulties in including small pieces of coastal land (often removed from map boundary files and data sets to smooth a boundary) are also discussed. The paper concludes with several current maps of Roanoke Island created with SAS^®.

Read the paper (PDF)

Meta-analysis is a method for combining multiple independent studies on the same subject or question, producing a single large study with increased accuracy and enhanced ability to detect overall trends and smaller effects. This is done by treating the results of each study as a single observation and performing analysis on the set, while controlling for differences between individual studies. These differences can be treated as either fixed or random effects, depending on context. This paper demonstrates the process and techniques used in meta-analysis using human trafficking studies. This problem has seen increasing interest in the past few years, and there are now a number of localized studies for one state or a metropolitan area. This meta-analysis combines these to begin development of a comprehensive analytic understanding of human trafficking across the United States. Both fixed and random effects are described. All elements of this analysis were performed using SAS^® University Edition.
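
The pooling step can be sketched in Python using inverse-variance weights for the fixed-effect estimate and the DerSimonian-Laird estimator for the between-study variance. The choice of DerSimonian-Laird is an assumption, since the abstract does not say which random-effects estimator the paper uses, and the input numbers are invented.

```python
def fixed_effect(effects, variances):
    """Inverse-variance pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, 1.0 / sum(weights)


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects estimate: (pooled, variance, tau2)."""
    if len(effects) < 2:
        raise ValueError("need at least two studies")
    weights = [1.0 / v for v in variances]
    pooled_fixed, _ = fixed_effect(effects, variances)
    # Cochran's Q measures how much the studies disagree beyond sampling error.
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-study variance
    pooled, variance = fixed_effect(effects, [v + tau2 for v in variances])
    return pooled, variance, tau2


# Two hypothetical studies that disagree strongly.
print(fixed_effect([0.0, 2.0], [1.0, 1.0]))    # (1.0, 0.5)
print(random_effects([0.0, 2.0], [1.0, 1.0]))  # (1.0, 1.0, 1.0)
```

In this example the random-effects variance is twice the fixed-effect variance, because the disagreement between studies is treated as an extra source of uncertainty.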

Read the paper (PDF)

Many practitioners of machine learning are familiar with support vector machines (SVMs) for solving binary classification problems. Two established methods of using SVMs in multinomial classification are the one-versus-all approach and the one-versus-one approach. This paper describes how to use SAS^® software to implement these two methods of multinomial classification, with emphasis on both training the model and scoring new data. A variety of data sets are used to illustrate the pros and cons of each method.
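
As a rough illustration of how the two multinomial strategies differ, the following Python sketch shows only the decision rules and the number of binary classifiers each strategy needs. It assumes the binary SVMs have already been trained; the function names and input shapes are hypothetical and are not taken from the paper.

```python
from collections import Counter

def one_vs_all_predict(scores):
    """Pick the class whose 'this class vs. the rest' classifier gives
    the largest decision value.

    scores: dict mapping class label -> decision value (hypothetical input).
    """
    return max(scores, key=scores.get)

def one_vs_one_predict(pairwise_winners):
    """Pick the class that wins the most pairwise contests.

    pairwise_winners: dict mapping (class_a, class_b) -> winning label.
    Ties are not handled specially in this sketch.
    """
    votes = Counter(pairwise_winners.values())
    return votes.most_common(1)[0][0]

def classifier_counts(k):
    """Number of binary classifiers each strategy trains for k classes."""
    return {"one_vs_all": k, "one_vs_one": k * (k - 1) // 2}
```

For k classes, one-versus-all trains k classifiers on the full data, whereas one-versus-one trains k(k-1)/2 classifiers, each on the subset of data belonging to one pair of classes.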

Read the paper (PDF)

SAS^® BI Dashboard is an important business intelligence and data visualization product used by many customers worldwide. They still rely on SAS BI Dashboard for performance monitoring and decision support. SAS^® Visual Analytics is a new-generation product, which empowers customers to explore huge volumes of data very quickly and view visualized results with web browsers and mobile devices. Since SAS Visual Analytics is used by more and more regular customers, some SAS BI Dashboard customers might want to migrate existing dashboards to SAS Visual Analytics to take advantage of new technologies. In addition, some customers might hope to deploy the two products in parallel and keep everyone on the same page. Because the two products use different data models and formats, a special conversion tool is developed to convert SAS BI Dashboard dashboards into SAS Visual Analytics dashboards and reports. This paper comprehensively describes the guidelines, methods, and detailed steps to migrate dashboards from SAS BI Dashboard to SAS Visual Analytics. Then the converted dashboards can be shown in supported viewers of SAS Visual Analytics including mobile devices and modern browsers.

Read the paper (PDF)

SAS^® migrations are the number one reason why SAS architects and administrators are fired. Even though this bold statement is not universally true, it has been at the epicenter of many management and technical discussions at UnitedHealth Group. The competing business forces between the desire to innovate and to provide platform stability drive difficult discussions between business leaders and IT partners that tend to result in a frustrated user-base, flustered IT professionals, and a stale SAS environment. Migrations are the antagonist of any IT professional because of the disruption, long hours, and stress that typically ensues. This paper addresses the lessons learned from a SAS migration from the first maintenance release of SAS^® 9.4 to the third maintenance release of SAS^® 9.4 on a technically sophisticated enterprise SAS platform including clustered metadata servers, clustered middle-tier, SSL, an IBM Platform Load Sharing Facility (LSF) grid, and SAS^® Visual Analytics.

Read the paper (PDF)

This paper presents a case study in which social media posts by individuals related to public transport companies in the United Kingdom were collected from social media sites such as Twitter and Facebook and also from forums using SAS^® and Python. The posts were then further processed by SAS^® Text Miner and SAS^® Visual Analytics to retrieve brand names, means of public transport (underground, trains, buses), and any mentioned attributes. Relevant concepts and topics are identified using text mining techniques and visualized using concept maps and word clouds. Later, we aim to identify and categorize sentiments against public transport in the corpus of the posts. Finally, we create an association map/mind-map of the different service dimensions/topics and the brands of public transport, using correspondence analysis.

Read the paper (PDF)

Prince Niccolo Machiavelli said things on the order of, "The promise given was a necessity of the past: the word broken is a necessity of the present." His utilitarian philosophy can be summed up by the phrase, "The ends justify the means." As a personality trait, Machiavellianism is characterized by the drive to pursue one's own goals at the cost of others. In 1970, Richard Christie and Florence L. Geis created the MACH-IV test to assign a MACH score to an individual, using 20 Likert-scaled questions. The purpose of this study was to build a regression model that can be used to predict the MACH score of an individual using fewer factors. Such a model could be useful in screening processes where personality is considered, such as in job screening, offender profiling, or online dating. The research was conducted on a data set from an online personality test similar to the MACH-IV test. It was hypothesized that a statistically significant model exists that can predict an average MACH score for individuals with similar factors. This hypothesis was accepted.

View the e-poster or slides (PDF)

Dynamic social networks can be used to monitor the constantly changing nature of interactions and relationships between people and groups. The size and complexity of modern dynamic networks can make this task extremely challenging. Using the combination of SAS/IML^®, SAS/QC^®, and R, we propose a fast approach to monitor dynamic social networks. A discrepancy score at edge level was developed to measure the unusualness of the observed social network. Then, multivariate and univariate change-point detection methods were applied on the aggregated discrepancy score to identify the edges and vertices that have experienced changes. Stochastic block model (SBM) networks were simulated to demonstrate this method using SAS/IML and R. PROC SHEWHART and PROC CUSUM in SAS/QC and PROC SGRENDER heat maps were applied on the aggregated discrepancy score to monitor the dynamic social network. The combination of SAS/IML, SAS/QC, and R makes it an ideal tool to monitor dynamic social networks.

View the e-poster or slides (PDF)

Microsoft Excel worksheets enable you to explore data that answers the difficult questions that you face daily in your work. When you combine the SAS^® Output Delivery System (ODS) with the capabilities of Excel, you have a powerful toolset that you can use to manipulate data in various ways, including highlighting data, using formulas to answer questions, and adding a pivot table or graph. In addition, ODS and Excel give you many methods for enhancing the appearance of your tables and graphs. This paper, written for the beginning analyst to the most advanced programmer, illustrates first how to manipulate styles and presentation elements in your worksheets by controlling text wrapping, highlighting and exploring data, and specifying Excel templates for data. Then, the paper explains how to use the TableEditor tagset and other tools to build and manipulate both basic and complex pivot tables that can help you answer all of the questions about your data. You will also learn techniques for sorting, filtering, and summarizing pivot-table data.

Read the paper (PDF)

The SAS/IML^® language excels in handling matrices and performing matrix computations. A new feature in SAS/IML 14.2 is support for nonmatrix data structures such as tables and lists. In a matrix, all elements are of the same type: numeric or character. Furthermore, all rows have the same length. In contrast, SAS/IML 14.2 enables you to create a structure that contains many objects of different types and sizes. For example, you can create an array of matrices in which each matrix has a different dimension. You can create a table, which is an in-memory version of a data set. You can create a list that contains matrices, tables, and other lists. This paper describes the new data structures and shows how you can use them to emulate other structures such as stacks, associative arrays, and trees. It also presents examples of how you can use collections of objects as data structures in statistical algorithms.

Read the paper (PDF)

The TABULATE procedure has long been a central workhorse of our organization's reporting processes, given that it offers a uniquely concise syntax for obtaining descriptive statistics on deeply grouped and nested categories within a data set. Given the diverse output capabilities of SAS^®, it often then suffices to simply ship the procedure's completed output elsewhere via the Output Delivery System (ODS). Yet there remain cases in which we want to not only obtain a formatted result, but also to acquire the full nesting tree and logic by which the computations were made. In these cases, we want to treat the details of the Tabulate statements as data, not merely as presentation. I demonstrate how we have solved this problem by parsing our Tabulate statements into a nested tree structure in JSON that can be transferred and easily queried for deep values elsewhere beyond the SAS program. Along the way, this provides an excellent opportunity to walk through the nesting logic of the procedure's statements and explain how to think about the axes, groupings, and set computations that make it tick. The source code for our syntax parser is also available on GitHub for further use.

Read the paper (PDF)

Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables in a multiple regression model are highly correlated. The presence of this phenomenon can have a negative impact on the analysis as a whole and can severely limit the conclusions of the research study. This paper reviews and provides examples of the different ways in which multicollinearity can affect a research project, and tells how to detect multicollinearity and how to reduce it once it is found. In order to demonstrate the effects of multicollinearity and how to combat it, this paper explores the proposed techniques by using the Behavioral Risk Factor Surveillance System data set. This paper is intended for any level of SAS^® user. This paper is also written to an audience with a background in behavioral science or statistics.
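
One common diagnostic for the phenomenon described above is the variance inflation factor (VIF). The abstract does not say which diagnostics the paper uses, so the Python sketch below is only an assumed illustration, restricted to the two-predictor case, where the VIF reduces to 1 / (1 - r^2) with r the Pearson correlation between the predictors. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression.

    With more predictors, r^2 is replaced by the R-squared from regressing
    one predictor on all the others; that case is not covered here.
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)
```

A VIF of 1 means the predictors are uncorrelated. The value grows without bound as the correlation approaches 1 or -1.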

Read the paper (PDF)

Recent advances in computing technology, monitoring systems, and data collection mechanisms have prompted renewed interest in multivariate time series analysis. In contrast to univariate time series models, which focus on temporal dependencies of individual variables, multivariate time series models also exploit the interrelationships between different series, thus often yielding improved forecasts. This paper focuses on cointegration and long memory, two phenomena that require careful consideration and are observed in time series data sets from several application areas, such as finance, economics, and computer networks. Cointegration of time series implies a long-run equilibrium between the underlying variables, and long memory is a special type of dependence in which the impact of a series' past values on its future values dies out slowly with the increasing lag. Two examples illustrate how you can use the new features of the VARMAX procedure in SAS/ETS^® 14.1 and 14.2 to glean important insights and obtain improved forecasts for multivariate time series. One example examines cointegration by using the Granger causality tests and the vector error correction models, which are the techniques frequently applied in the Federal Reserve Board's Comprehensive Capital Analysis and Review (CCAR), and the other example analyzes the long-memory behavior of US inflation rates.

Read the paper (PDF) | Download the data file (ZIP)

No Batch Scheduler? No problem! This paper describes the use of a SAS^® Data Integration Studio job that can be started by a time-dependent scheduler like Windows Scheduler (or crontab in UNIX) to mimic a batch scheduler using SAS^® Grid Manager.

Read the paper (PDF)

The SAS^® DATA step is one of the best (if not the best) data manipulators in the programming world. One of the areas that gives the DATA step its power is the wealth of functions that are available to it. This paper takes a PEEK at some of the functions whose names have more than one MEANing. While the subject matter is very serious, the material is presented in a humorous way that is guaranteed not to BOR the audience. With so many functions available, we have to TRIM our list so that the presentation can be made within the TIME allotted. This paper also discusses syntax and shows several examples of how these functions can be used to manipulate data.

Read the paper (PDF)

A new ODS destination for creating Microsoft Excel workbooks is available starting in the third maintenance release for SAS^® 9.4. This destination creates native Microsoft Excel XLSX files, supports graphic images, and offers other advantages over the older ExcelXP tagset. In this presentation, you learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS^® output. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on demand and in real time using SAS server technology is discussed. Using earlier versions of SAS to create multi-sheet workbooks is also discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.

Read the paper (PDF) | Download the data file (ZIP)

Creating your first suite of reports using SAS^® Visual Analytics is like being a kid in a candy store: with so many options for data visualization, it is difficult to know where to start. Having a plan for implementation can save you a lot of time in development and beyond, especially when you are wrangling big data. This paper helps you make sure that you are parallelizing work (where possible), maximizing your data insights, and creating a polished end product. We provide guidelines for common questions, such as "How many objects are too many?" or "When should I use multiple tabs versus report linking?" to start any data visualizer off on the right foot.

Read the paper (PDF)

Confidence intervals are critical to understanding your survey data. If your intervals are too narrow, you might inadvertently judge a result to be statistically significant when it is not. While many familiar SAS^® procedures, such as PROC MEANS and PROC REG, provide statistical tests, they rely on the assumption that the data comes from a simple random sample. However, almost no real-world survey uses such sampling. Learn how to use the SURVEYMEANS procedure and its SURVEY cousins to estimate confidence intervals and perform significance tests that account for the structure of the underlying survey, including the replicate weights now supplied by some statistical agencies. Learn how to extract the results you need from the flood of output that these procedures deliver.

Read the paper (PDF)

Do you create Excel files from SAS^®? Do you use the ODS EXCELXP tagset or the ODS EXCEL destination? In this presentation, the EXCELXP tagset and the ODS EXCEL destination are compared face to face. There's gonna be a showdown! We give quick tips for each and show how to create Excel files for our Special Census program. Pros of each method are explored. We show the added benefits of the ODS EXCEL destination. We display how to create XML files with the EXCELXP tagset. We present how to use TAGATTR formats with the EXCELXP tagset to ensure that leading and trailing zeros in Excel are preserved. We demonstrate how to create the same Excel file with the ODS EXCEL destination with SAS formats instead of with TAGATTR formats. We show how the ODS EXCEL destination creates native Excel files. One of the drawbacks of an XML file created with the EXCELXP tagset is that a pop-up message is displayed in Excel each time you open it. We present differences using the ABSOLUTE_COLUMN_WIDTH= option in both methods.

Read the paper (PDF)

In order to display data visually, our audience preferred charts and graphs generated by Microsoft Excel over those generated by SAS^®. However, to make the necessary 30 graphs in Excel took 2 to 3 hours of manual work, even though the chart templates had already been created, and led to mistakes due to human error. SAS graphs took much less time to create, but lacked key functionality that the audience preferred and that was available in Excel graphs. Thanks to SAS, the answer came in Excel 4 Macro Language (X4ML) programming. SAS can actually submit coding to Excel in order to create customized data reporting, to create graphs or to update templates' data series, and even to populate Microsoft Word documents for finalized reports. This paper explores how SAS can be used to create presentation-ready graphs in a proven process that takes less than one minute, compared to the earlier process that took hours. The following code is used and discussed: %macro(macro_var), filename, rc commands, Output Delivery System (ODS), X4ML, and Microsoft Visual Basic for Applications (VBA).

Read the paper (PDF)

When first learning SAS^®, programmers often see the proprietary DATA step as a foreign and nonstandard concept. The introduction of the SAS^® 9.4 DS2 language eases the transition for traditional programmers delving into SAS for the first time. Object Oriented Programming (OOP) has been an industry mainstay for many years, and the DS2 procedure provides an object-oriented environment for the DATA step. In this poster, we go through a business case to show how DS2 can be used to define a reusable package following object-oriented principles.

View the e-poster or slides (PDF)

As a data scientist, you need analytical tools and algorithms, whether commercial or open source, and you have some favorites. But how do you decide when to use what? And how can you integrate their use to your maximum advantage? This presentation provides several best practices for deploying both SAS^® and open-source analytical tools to increase productivity and efficiency in your enterprise ecosystem. See an example of a marketing analysis using SAS and R algorithms in SAS^® Enterprise Miner to develop a predictive model, and then operationalize that model for performance monitoring and in-database scoring. Also learn about using Python and SAS integration for developing predictive models from a Jupyter Notebook environment. Seeing these cases will help you decide how to improve your analytics with similar integration of SAS and open source.

Read the paper (PDF)

Many communication channels exist for customers to engage with businesses, yet an interactive voice response (IVR) system remains the most critical of them. The reason is that IVR acts as the front end to consumer interaction and is the most effective method for customers to do business with companies in order to resolve their issues before talking to an agent. If the IVR interface is not designed properly, customers can be stuck in an endless loop of pressing buttons that can lead to consumer annoyance. The bottom line is: An IVR system should be set up to quickly resolve as many routine inbound inquiries as possible and to allow customers to speak to an agent when necessary. In order to accomplish this, the IVR interface has to be optimized so that it is fully effective and provides a great customer experience. This paper demonstrates how SAS^® tools helped optimize the IVR system of a book publishing company. The data set used in this study was obtained from a telecom services company and contained IVR logs of more than 300,000 calls with 1.4 million observations. To gain insights into customer behaviors, path analysis was performed on this data using SAS^® Enterprise Miner and obstacles faced by customers were identified. This helped in determining underperforming prompts, and analysis using SAS procedures was conducted on such prompts. Prompts tuning was recommended and new self-service areas were identified that avoid transfers and can save clients thousands of dollars in investments in call centers.

Read the paper (PDF)

Making optimal use of SAS^® Grid Computing relies on the ability to spread the workload effectively across all of the available nodes. With SAS^® Scalable Performance Data Server (SPD Server), it is possible to partition your data and spread the processing across the SAS Grid Computing environment. In an ideal world it would be possible to adjust the size and number of partitions according to the data volumes being processed on any given day. This paper discusses a technique that enables the processing performed in the SAS Grid Computing environment to be dynamically reconfigured, automatically at run time, to optimize the use of SAS Grid Computing, and to provide significant performance benefits.

Read the paper (PDF)

Today, companies are increasingly using analytics to discover new revenue and cost-saving opportunities. Many business professionals turn to SAS, a leader in business analytics software and service, to help them improve performance and make better decisions faster. Analytics is also being used in risk management, fraud detection, life sciences, sports, and many more emerging markets. However, to maximize the value to the business, analytics solutions need to be deployed quickly and cost-effectively, while also providing the ability to readily scale without degrading performance. Of course, in today's demanding environments, where budgets are still shrinking and mandates to reduce carbon footprints are growing, the solution must deliver excellent hardware utilization, power efficiency, and return on investment. To help solve some of these challenges, Red Hat and SAS have collaborated to recommend the best practices for configuring SAS^®9 running on Red Hat Enterprise Linux. The scope of this document covers Red Hat Enterprise Linux 6 and 7. Areas researched include the I/O subsystem, file system selection, and kernel tuning, both in bare metal and virtualized (KVM) environments. Additionally, we now include grid-based configurations running with Red Hat Resilient Storage Add-On (Global File System 2 [GFS2] clusters).

Read the paper (PDF)

The analytical data life cycle consists of 4 stages: data exploration, preparation, model development, and model deployment. Traditionally, these stages can consume 80% of the time and resources within your organization. With innovative techniques such as in-database and in-memory processing, managing data and analytics can be streamlined, with an increase in performance, economics, and governance. This session explores how you can optimize the analytical data life cycle with some best practices and tips using SAS^® and Teradata.

Outliers, such as unusual, violated, unexpected, or rare events, have been studied intensively by researchers and practitioners, given their impact on estimated statistics and developed models. Today, some business disciplines are focusing primarily on outliers such as defaults of credit, operational risks, quality nonconformities, fraud, or even the results of marketing initiatives in highly competitive environments with low response rates of a couple percent or even less. This paper discusses the importance of detecting, isolating, and categorizing business outliers to discover their root causes and to monitor them dynamically. Detecting outliers not only through extreme values or multivariable densities, but also through distributions, patterns, clusters, combinations of items, and sequences of events, creates opportunities for business improvement. SAS^® Enterprise Miner can be used to perform such detections. Thus, creating special business segments or running specialized outlier-oriented data mining processes, such as decision trees, allows for isolation of business-important outliers, which are normally masked in traditional statistical techniques. This process, combined with 'What-If' scenario generation, prepares businesses for possible future surges even when no outliers of a specific type currently exist. Furthermore, analyzing some specific outliers may play a role in assessing business stability under corresponding stress tests.

Read the paper (PDF)

The DATASETS procedure provides the most diverse selection of capabilities and features of any of the SAS^® procedures. It is the prime tool that programmers can use to manage SAS data sets, indexes, catalogs, and so on. Many SAS programmers are only familiar with a few of PROC DATASETS's many capabilities. Most often, they only use the data set updating, deleting, and renaming capabilities. However, there are many more features and uses that should be in a SAS programmer's toolkit. This paper highlights many of the major capabilities of PROC DATASETS. It discusses how it can be used as a tool to update variable information in a SAS data set; provide information about data set and catalog contents; delete data sets, catalogs, and indexes; repair damaged SAS data sets; rename files; create and manage audit trails; add, delete, and modify passwords; add and delete integrity constraints; and more. The paper contains examples of the various uses of PROC DATASETS that programmers can cut and paste into their own programs as a starting point. After reading this paper, a SAS programmer will have practical knowledge of the many different facets of this important SAS procedure.

Read the paper (PDF)

In this paper, we explore advantages of the DS2 procedure over the DATA step programming in SAS^®. DS2 is a new SAS proprietary programming language appropriate for advanced data manipulation. We explore the use of PROC DS2 to execute queries in databases using SAS FedSQL. Several DS2 language elements accept embedded FedSQL syntax, and the run-time generated queries can exchange data interactively between DS2 and the supported database. This action enables SQL preprocessing of input tables, which effectively allows processing data from multiple tables in different databases within the same query, thereby drastically reducing processing times and improving performance. We explore use of DS2 for creating tables, bulk loading tables, manipulating tables, and querying data in an efficient manner. We explore advantages of using PROC DS2 over DATA step programming such as support for different data types, ANSI SQL types, programming structure elements, and benefits of using new expressions or writing one's own methods or packages available in the DS2 system. The DS2 procedure enables requests to be processed by the DS2 data access technology that supports a scalable, threaded, high-performance, and standards-based way to access, manage, and share relational data. In the end, we empirically measure performance benefits of using PROC DS2 over PROC SQL for processing queries in-database by taking advantage of threaded processing in supported databases such as Oracle.

Read the paper (PDF)

Multicategory logit models extend the techniques of logistic regression to response variables with three or more categories. For ordinal response variables, a cumulative logit model assumes that the effect of an explanatory variable is identical for all modeled logits (known as the assumption of proportional odds). Past research supports the finding that as the sample size and number of predictors increase, it is unlikely that proportional odds can be assumed across all predictors. An emerging method to effectively model this relationship uses a partial proportional odds model, fit with unique parameter estimates at each level of the modeled relationship only for the predictors in which proportionality cannot be assumed. First used in SAS/STAT^® 12.1, PROC LOGISTIC in SAS^® 9.4 now extends this functionality for variable selection methods in a manner in which all equal and unequal slope parameters are available for effect selection. Previously, the statistician was required to assess predictor non-proportionality a priori through likelihood tests or subjectively through graphical diagnostics. Following a review of statistical methods and limitations of other commercially available software to model data exhibiting non-proportional odds, a public-use data set is used to examine the new functionality in PROC LOGISTIC using stepwise variable selection methods. Model diagnostics and the improvement in prediction compared to a general cumulative model are noted.
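
For reference, the two model forms described above can be written as follows. The notation is an assumption for illustration (sign and parameterization conventions vary between texts and software): Y is an ordinal response with J categories, x is the full predictor vector, and z is the subset of predictors for which proportionality is not assumed.

```latex
\begin{align}
\text{Proportional odds:}\quad
  & \operatorname{logit} P(Y \le j \mid \mathbf{x})
    = \alpha_j + \boldsymbol{\beta}^{\top}\mathbf{x},
    \qquad j = 1, \dots, J-1 \\
\text{Partial proportional odds:}\quad
  & \operatorname{logit} P(Y \le j \mid \mathbf{x})
    = \alpha_j + \boldsymbol{\beta}^{\top}\mathbf{x}
      + \boldsymbol{\gamma}_j^{\top}\mathbf{z},
    \qquad j = 1, \dots, J-1
\end{align}
```

In the first form, only the intercepts vary across the J-1 cumulative logits. In the second, the predictors in z receive an additional logit-specific slope, while the remaining predictors keep a single common slope.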

Read the paper (PDF) | Download the data file (ZIP) | View the e-poster or slides (PDF)

Real workflow dependencies exist when the completion or output of one data process is a prerequisite for subsequent data processes. For example, in extract, transform, load (ETL) systems, the extract must precede the transform and the transform must precede the load. This serialization is common in SAS^® data analytic development but should be implemented only when actual dependencies exist. A false dependency, by contrast, exists when the workflow itself does not require serialization but is coded in a manner that forces a process to wait unnecessarily for some unrelated process to complete. For example, an ETL system might extract, transform, and load one data set, and then extract, transform, and load a second data set, causing processing of the second data set to wait unnecessarily for the first to complete. This hands-on session demonstrates three common patterns of false dependencies, teaching SAS practitioners how to recognize and remedy false dependencies through parallel processing paradigms. Groups of participants are pitted against each other, as the class simultaneously runs both serialized software and distributed software that runs in parallel. Participants execute exercises in unison, and then watch their machines race to the finish as the tremendous performance advantages of parallel processing are demonstrated in one exercise after another. The session is ideal for anyone seeking to walk away with proven techniques that can measurably increase your performance and bonus.
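
The two-data-set example above can be sketched in Python rather than SAS. Everything in this sketch is hypothetical (the stage functions, the data set names, and the toy data). It only shows that the real dependency (extract, then transform, then load) stays serialized inside each pipeline, while the two independent pipelines no longer wait on each other.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real extract/transform/load stages.
def extract(name):
    return [1, 2, 3] if name == "first" else [10, 20]

def transform(rows):
    return [r * 2 for r in rows]

def load(rows):
    return sum(rows)  # stands in for writing to a target table

def etl(name):
    """Real dependency: each stage needs the previous stage's output."""
    return load(transform(extract(name)))

def run_serial(names):
    """False dependency: the second data set waits for the first."""
    return [etl(n) for n in names]

def run_parallel(names):
    """Independent pipelines run concurrently; result order is preserved."""
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(etl, names))
```

Threads are used here for brevity and suit I/O-bound stages. For CPU-bound work in Python, a process pool would be the more usual choice.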

Read the paper (PDF)

JavaScript Object Notation (JSON) has quickly become the de facto standard for data transfer on the Internet due to an increase in web data and the usage of full-stack JavaScript. JSON has become dominant in the emerging technologies of the web today, such as in the Internet of Things and in the mobile cloud. JSON offers a light and flexible format for data transfer. It can be processed directly from JavaScript without the need for an external parser. This paper discusses several abilities within SAS^® to process JSON files, the new JSON LIBNAME, and several procedures. This paper compares all of these in detail.

Read the paper (PDF)

Moving a workforce in a new direction takes a lot of energy. Your planning should include four pillars: culture, technology, process, and people. These pillars assist small and large SAS^® rollouts with a successful implementation and an eye toward future proofing. Boston Scientific is a large multi-national corporation that recently grew SAS from a couple of desktops to a global implementation. Boston Scientific's real world experiences reflect on each pillar, both in what worked and in lessons learned.

Read the paper (PDF)

Installing and configuring a SAS^® Enterprise BI platform to meet the requirements of today's world requires knowledge of a wide variety of subjects. Security requirements are growing, the number of involved components is growing, time to delivery should be shorter, and the quality must be increased. The expectations of the customers are based on a cloud experience where automated deployments with ready-to-use applications are state of the art. This paper describes an approach to address the challenges of deploying SAS^® 9.4 on Linux to meet today's customer expectations.

Read the paper (PDF)

For customers providing SAS^® reporting to the public, the ability to use a Social login opens up a number of possibilities to provide richer services. Instead of everybody using generic Guest access and being limited to a common subset of reports or other functionality, previously unknown users can seamlessly log in and access SAS web content while SAS administrators can continue to apply best-practice security. This paper focuses on integrating Google Sign-In, Microsoft Account Sign-In, and Facebook Sign-In as alternative methods to log in from the SAS Logon Manager, as well as registering any new users in SAS metadata automatically.

Read the paper (PDF)

SAS^® Decision Manager includes a hidden gem: a web service for high-speed online scoring of business events. The fourth maintenance release of SAS^® 9.4 represents the third release of the SAS^® Micro Analytics Service for scoring SAS^® DS2 code decisions in a standard JSON web service. Users will learn how to create decisions, deploy modules to the web service, test the service, and record business events.

Read the paper (PDF)

The most commonly reported model evaluation metric is accuracy. This metric can be misleading when the data are imbalanced. In such cases, other evaluation metrics should be considered in addition to accuracy. This study reviews alternative evaluation metrics for assessing the effectiveness of a model in highly imbalanced data. We used credit card clients in Taiwan as a case study. The data set contains 30,000 instances (22.12% risky and 77.88% non-risky) assessing the likelihood of a customer defaulting on a payment. Three different techniques were used during the model building process. The first technique involved down-sampling the majority class in the training subset. The second used the original imbalanced data, whereas in the third technique prior probabilities were set to account for oversampling. The same sets of predictive models were then built for each technique, after which the evaluation metrics were computed. The results suggest that model evaluation metrics might reveal more about the distribution of classes than they do about the actual performance of models when the data are imbalanced. Moreover, some of the predictive models were found to be very sensitive to imbalance. The final decision in model selection should consider a combination of different measures instead of relying on one measure. To minimize imbalance-biased estimates of performance, we recommend reporting both the obtained metric values and the degree of imbalance in the data.
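
The point about accuracy on imbalanced data can be illustrated with a small Python sketch. It is not taken from the paper: the class counts are derived from the figures quoted above (30,000 instances, 22.12% risky), and the "model" is an assumed trivial baseline that labels every client non-risky.

```python
# Illustrative sketch only; the baseline classifier and helper names are hypothetical.

def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    return tp, fp, tn, fn


def evaluate(actual, predicted):
    """Compute accuracy, sensitivity, and specificity from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / len(actual)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity}


n_total = 30_000
n_risky = round(n_total * 0.2212)            # 6,636 risky clients (class 1)
actual = [1] * n_risky + [0] * (n_total - n_risky)
always_non_risky = [0] * n_total             # assumed baseline: never flags anyone

metrics = evaluate(actual, always_non_risky)
print(metrics)
```

The baseline reaches an accuracy of 77.88% purely because of the class distribution, while its sensitivity is zero: it never identifies a single risky client. This is why the degree of imbalance should be reported alongside any metric.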

Read the paper (PDF)

Predictive modeling might just be the single most thrilling aspect of data science. Who among us can deny the allure: to observe a naturally occurring phenomenon, conjure a mathematical model to explain it, and then use that model to make predictions about the future? Though many SAS^® users are familiar with using a data set to generate a model, they might not use the awesome power of SAS to store the model and score other data sets. In this paper, we distinguish between parametric and nonparametric models and discuss the tools that SAS provides for storing and scoring each. Along the way, you come to know the STORE statement and the SCORE procedure. We conclude with a brief overview of the PLM procedure and demonstrate how to effectively load and evaluate models that have been stored during the model building process.

Read the paper (PDF)

Predictive analytics has been evolving in property and casualty insurance for the past two decades. This paper first provides a high-level overview of predictive analytics in each of the following core business operations in the property and casualty (P&C) insurance industry: marketing, underwriting, actuarial pricing, actuarial reserving, and claims. Then, a common P&C insurance predictive modeling technical process in SAS^® dealing with large data sets is introduced. The steps of this process include data acquisition, data preparation, variable creation, variable selection, model building (also known as model fitting), model validation, model testing, and so on. Finally, some successful models are introduced. Base SAS^®, SAS/STAT^® software, SAS^® Enterprise Guide^®, and SAS^® Enterprise Miner are presented as the main tools for this process. This predictive modeling process could be tweaked or directly used in many other industries as the statistical foundations of predictive analytics have large overlaps across P&C insurance, health care, life insurance, banking, pharmaceutical, genetics industries, and so on. This paper is intended for any level of SAS^® user or business people from different industries who are interested in learning about general predictive analytics.

Read the paper (PDF)

Face it: your data can occasionally contain characters that wreak havoc on your macro code, such as the ampersand in at&t or the apostrophe in McDonald's. This paper is designed for programmers who know most of the ins and outs of SAS^® macro code already. Now let's take your macro skills a step further by adding to your skill set, specifically, %BQUOTE, %STR, %NRSTR, and %SUPERQ. What is up with all these quoting functions? When do you use one over the other? And why would you need %UNQUOTE? The macro language is full of subtleties and nuances, and the quoting functions represent the epitome of all of this. This paper shows you in which instances you would use the different quoting functions. Specifically, we show you the difference between the compile-time and the execution-time functions. In addition to looking at the traditional quoting functions, you learn how to use %QSCAN and %QSYSFUNC among other functions that apply the regular function and quote the result.

Read the paper (PDF)

Conferences for SAS^® programming are replete with the newest software capabilities and clever programming techniques. However, discussion about quality control (QC) is lacking. QC is fundamental to ensuring both correct results and sound interpretation of data. It is not industry specific, and it simply makes sense. Most QC procedures are a function of regulatory requirements, industry standards, and corporate philosophies. Good QC goes well beyond just reviewing results, and should also consider the underlying data. It should be driven by a thoughtful consideration of relevance and impact. While programmers strive to produce correct results, it is no wonder that programming mistakes are common despite rigid QC processes in an industry where expedited deliverables and a lean workforce are the norm. This leads to a lack of trust in team members and an overall increase in resource requirements as these errors are corrected, particularly when SAS programming is outsourced. Is it possible to produce results with a high degree of accuracy, even when time and budget are limited? Thorough QC is easy to overlook in a high-pressure environment with increased expectations of workload and expedited deliverables. Does this suggest that QC programming is becoming a lost art, or does it simply suggest that we need to evolve with technology? The focus of the presentation is to review the who, what, when, how, why, and where of QC programming implementation.

Read the paper (PDF)

SQL is a universal language that allows you to access data stored in relational databases or tables. This hands-on workshop presents core concepts and features of using PROC SQL to access data stored in relational database tables. Attendees learn how to define, access, and manipulate data from one or more tables using PROC SQL quickly and easily. Numerous code examples are presented on how to construct simple queries, subset data, produce simple and effective output, join two tables, summarize data with summary functions, construct BY-groups, identify FIRST. and LAST. rows, and create and use virtual tables.

Read the paper (PDF) | Download the data file (ZIP)

SAS^® Enterprise Guide^® empowers organizations, programmers, business analysts, statisticians, and end users with all the capabilities that SAS has to offer. This hands-on workshop presents the SAS Enterprise Guide graphical user interface (GUI). It covers access to multi-platform enterprise data sources, various data manipulation techniques that do not require you to learn complex coding constructs, built-in wizards for performing reporting and analytical tasks, the delivery of data and results to a variety of mediums and outlets, and support for data management and documentation requirements. Attendees learn how to use the graphical user interface to access SAS^® data sets and tab-delimited and Microsoft Excel input files; to subset and summarize data; to join (or merge) two tables together; to flexibly export results to HTML, PDF, and Excel; and to visually manage projects using flow diagrams.

Read the paper (PDF)

SAS^® Enterprise Guide^® empowers organizations, programmers, business analysts, statisticians, and users with all the capabilities that SAS^® has to offer. This hands-on workshop presents the SAS Enterprise Guide graphical user interface (GUI), access to multi-platform enterprise data sources, various data manipulation techniques without the need to learn complex coding constructs, built-in wizards for performing reporting and analytical tasks, the delivery of data and results to a variety of mediums and outlets, and support for data management and documentation requirements. Attendees learn how to use the GUI to access SAS data sets and tab-delimited and Excel input files; how to subset and summarize data; how to join (or merge) two tables together; how to flexibly export results to HTML, PDF, and Excel; and how to visually manage projects using flow diagrams.

Read the paper (PDF) | Download the data file (ZIP)

The announcement of SAS Institute's free SAS^® University Edition is an exciting development for SAS users and learners around the world! The software bundle includes Base SAS^®, SAS/STAT^® software, SAS/IML^® software, SAS^® Studio (user interface), and SAS/ACCESS^® for Windows, with all the popular features found in the licensed SAS versions. This is an incredible opportunity for users, statisticians, data analysts, scientists, programmers, students, and academics everywhere to use (and learn) for career opportunities and advancement. Capabilities include data manipulation, data management, comprehensive programming language, powerful analytics, high-quality graphics, world-renowned statistical analysis capabilities, and many other exciting features. This paper illustrates a variety of powerful features found in the SAS University Edition. Attendees will be shown a number of tips and techniques on how to use the SAS^® Studio user interface, and they will see demonstrations of powerful data management and programming features found in this exciting software bundle.

Read the paper (PDF)

Getting speedy results from your SAS^® programs when you're working with bulky data sets takes more than elegant coding techniques. There are several approaches to improving performance when working with biggish data. Although you can upgrade your hardware, this just helps you to run inefficient code and bloated tables quicker. So, you should also consider the results that tuning your database and adjusting your SAS platform can bring. In this paper, we review the various options available to give you some ideas about things you can do better.

Read the paper (PDF)

The United States Access Board will soon refresh the Section 508 accessibility standards. The new requirements are based on Web Content Accessibility Guidelines (WCAG) 2.0 and include a total of 38 testable success criteria (16 more than the current requirements). Is your organization ready? Don't worry, the fourth maintenance release for SAS^® 9.4 Output Delivery System (ODS) HTML5 destination has you covered. This paper describes the new accessibility features in the ODS HTML5 destination, explains how to use them, and shows you how to test your output for compliance with the new Section 508 standards.

Read the paper (PDF)

A random forest is an ensemble of decision trees that often produces more accurate results than a single decision tree. The predictions of the individual trees in the forest are averaged to produce a final prediction. The question now arises whether a more accurate final prediction can be obtained by a more intelligent use of the trees in the forest. In particular, in the way random forests are currently defined, every tree contributes the same fraction to the final result (for example, if there are 50 trees, each tree contributes 1/50th to the final result). This ignores model uncertainty, as less accurate trees are treated exactly like more accurate trees. Replacing averaging with Bayesian Model Averaging gives better trees the opportunity to contribute more to the final result, which might lead to more accurate predictions. However, there are several complications to this approach that have to be resolved, such as the computation of an SBC value for a decision tree. Two novel approaches to solving this problem are presented, and the results are compared to those obtained with the standard random forest approach.
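
The contrast between equal weighting and model-averaged weighting can be sketched in Python. This is a minimal illustration, not the paper's method: it assumes the common approximation in which a model's weight is proportional to exp(-SBC/2), and it takes the per-tree SBC values as given. How to compute an SBC for a decision tree is exactly the complication the paper addresses and is not reproduced here. All numbers and function names are hypothetical.

```python
import math


def bma_weights(sbc_values):
    """Approximate model weights from SBC values: w_i proportional to exp(-SBC_i / 2).

    Subtracting the smallest SBC first keeps the exponentials numerically stable
    and does not change the normalized weights.
    """
    best = min(sbc_values)
    raw = [math.exp(-0.5 * (s - best)) for s in sbc_values]
    total = sum(raw)
    return [r / total for r in raw]


def combine(predictions, weights):
    """Weighted combination of per-tree predictions."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical per-tree predictions for one observation and hypothetical SBC values.
tree_predictions = [0.62, 0.55, 0.71, 0.40]
tree_sbc = [101.3, 103.0, 100.2, 108.9]

equal = [1 / len(tree_predictions)] * len(tree_predictions)
weighted = bma_weights(tree_sbc)

simple_average = combine(tree_predictions, equal)    # standard random forest vote
bma_average = combine(tree_predictions, weighted)    # trees with lower SBC count more

print(simple_average, bma_average)
print([round(w, 3) for w in weighted])
```

With equal weights every tree contributes one quarter of the result; with the SBC-based weights, the tree with the lowest SBC dominates and the tree with the highest SBC contributes very little.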

Read the paper (PDF)

SAS^® Management Console has been a key tool to interact with SAS^® Metadata Server. But sometimes users need much more than what SAS Management Console can do. This paper contains a couple of SAS^® macros that can be used in SAS^® Enterprise Guide^® and PC SAS to read SAS metadata. These macros read users, roles, and groups registered in metadata. This paper explains how these macros can be executed in SAS Enterprise Guide and how to change these macros to meet other business requirements. There might be some tools available in the market that can be used to read SAS metadata, but this paper helps you accomplish most of the same tasks within a SAS client such as PC SAS or SAS Enterprise Guide, without requiring any additional plug-ins.

Read the paper (PDF) | View the e-poster or slides (PDF)

The intrepid Mars Rovers have inspired awe and curiosity and dreams of mapping Mars using SAS/GRAPH^® software. This presentation demonstrates how to import Esri shapefile (SHP) data (using the MAPIMPORT procedure) from sources other than SAS^® and GfK GeoMarketing map data to produce useful (and sometimes creative) maps. Examples include mapping neighborhoods, ZCTA5 areas, postal codes, and of course, Mars. Products used are Base SAS^® and SAS/GRAPH^®. SAS programmers of any skill level will benefit from this presentation.

Read the paper (PDF)

We live in a world of data: small data, big data, and data in every conceivable size between small and big. In today's world, data finds its way into our lives wherever we are. We talk about data, create data, read data, transmit data, receive data, and save data constantly during any given hour in a day, and we still want and need more. So, we collect even more data at work, in meetings, at home, on our smartphones, in emails, in voice messages, sifting through financial reports, analyzing profits and losses, watching streaming videos, playing computer games, comparing sports teams and favorite players, and countless other ways. Data is growing and being collected at astounding rates, all in the hope of being able to better understand the world around us. For SAS^® professionals, the world of data offers many new and exciting opportunities, but it also presents a frightening realization that data sources might very well contain a host of integrity issues that need to be resolved first. This presentation describes the available methods to remove duplicate observations (or rows) from data sets (or tables) based on the row's values and keys using SAS.
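
The difference between removing duplicates by a row's values and by its keys can be shown with a short Python sketch. It is an illustration under assumed data, not the SAS methods the presentation covers; the sample rows and function names are hypothetical.

```python
# Hypothetical sample table: one exact duplicate row and one duplicate key.
rows = [
    {"id": 1, "name": "Ann", "city": "Cary"},
    {"id": 1, "name": "Ann", "city": "Cary"},     # exact duplicate of the first row
    {"id": 2, "name": "Bob", "city": "Raleigh"},
    {"id": 2, "name": "Bob", "city": "Durham"},   # same key, different values
]


def drop_duplicate_rows(records):
    """Keep the first occurrence of each row, comparing all column values."""
    seen = set()
    kept = []
    for rec in records:
        signature = tuple(sorted(rec.items()))
        if signature not in seen:
            seen.add(signature)
            kept.append(rec)
    return kept


def drop_duplicate_keys(records, key_fields):
    """Keep the first occurrence of each key, ignoring the non-key columns."""
    seen = set()
    kept = []
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


by_row = drop_duplicate_rows(rows)           # removes only the exact duplicate
by_key = drop_duplicate_keys(rows, ["id"])   # keeps one row per id

print(len(by_row), len(by_key))
```

Deduplicating by values leaves three rows, because the two rows for id 2 differ in city; deduplicating by key leaves two rows and silently discards the Durham record, which is why the choice between the two approaches matters.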

Read the paper (PDF)

Do you ever feel like you email the same reports to the same people over and over and over again? If your customers are anything like mine, you create reports, and lots of them. Our office is using macros, SAS^® email capabilities, and other programming techniques, in conjunction with our trusty contact list, to automate report distribution. Customers now receive the data they need, and only the data they need, on the schedule they have requested. In addition, not having to send these emails out manually saves our office valuable time and resources that can be used for other initiatives. In this session, we walk through a few of the SAS techniques we are using to provide better service to our internal and external partners and, hopefully, make us look a little more like rock stars.

Read the paper (PDF)

If you've got an iPhone, you might have noticed that the Health app is hard at work collecting data on every step you take. And, of course, the data scientist inside you is itching to analyze that data with SAS^®. This paper and an accompanying E-Poster show you how to get step data out of your iPhone Health app and into SAS. Once it's there, you can have at it with all things SAS. In this presentation, we show you how a (what else?) step plot can be used to visualize the 73,000+ steps the author took at SAS^® Global Forum 2016.

Read the paper (PDF) | View the e-poster or slides (PDF)

From state-of-the-art research to routine analytics, the Jupyter Notebook offers an unprecedented reporting medium. Historically, tables, graphics, and other types of output had to be created separately, and then integrated into a report piece by piece, amidst the drafting of text. The Jupyter Notebook interface enables you to create code cells and markdown cells in any arrangement. Markdown cells allow all typical formatting. Code cells can run code in the document. As a result, report creation happens naturally and in a completely reproducible way. Handing a colleague a Jupyter Notebook file to be re-run or revised is much easier and simpler for them than passing along, at a minimum, two files: one for the code and one for the text. Traditional reports become dynamic documents that include both text and living SAS^® code that is run during document creation. With the new SAS kernel for Jupyter, all of this is possible and more!

Read the paper (PDF)

SAS^® job flows created by Windows services have a problem. Currently, they can execute only jobs in a series (one at a time). This can slow down job processing, and it limits the utility of the flows. This paper shows how you can alter the flow of Windows services after they have been generated to enable jobs to run in parallel (side by side). A high-level overview of PROC GROOVY, which automates these changes, is provided, as well as a summary of the positives and negatives of running jobs in parallel.

Read the paper (PDF) | Download the data file (ZIP)

There are so many ways for SAS/ACCESS^® users to read and write data from and to Microsoft Excel files: SAS^® PC Files Server, XLS and XLSX engines, the SAS IMPORT and EXPORT procedures, various Excel file formats (.xls, .xlsx, .xlsb, .xlsm), and more. Many users ask, 'Which is best for me?' This paper explores the requirements and limitations of each engine, along with performance considerations and some of the not-so-obvious things to consider. It also includes a brief analogous discussion on Microsoft Access databases, which share some of the same mechanisms.

Read the paper (PDF)

SAS^® has an amazing arsenal of tools for using and displaying geographic information that are relatively unknown and underused. High-quality GfK GeoMarketing maps have been provided by SAS since the second maintenance release for SAS^® 9.3, as sources for inexpensive map data dried up. SAS has been including both GfK and traditional SAS map data sets with licenses for SAS/GRAPH^® software for some time, recognizing there will need to be an extended transitional period. However, for those of us who have been putting off converting our SAS/GRAPH mapping programs to use the new GfK maps, the time has come, as the traditional SAS map data sets are no longer being updated. If you visit SAS^® Maps Online, you can find only GfK maps in current maps. The GfK maps are updated once a year. This presentation walks through the conversion of a long-standing SAS program that produces multiple US maps for a data compendium so that it takes advantage of GfK maps. Products used are Base SAS^® and SAS/GRAPH^®. SAS programmers of any skill level will benefit from this presentation.

Read the paper (PDF)

One of the many difficulties for a SAS^® programmer is remembering how to accurately use SAS syntax, especially syntax that includes many parameters. Not mastering basic syntax parameters definitely makes coding inefficient because the programmer has to check reference manuals constantly to ensure that syntax is correct. One of the more useful but somewhat unknown tools in SAS is the use of SAS abbreviations. This feature enables users to store text strings (such as the syntax of a DATA step function, a SAS procedure, or a complete DATA step) in a user-defined and easy-to-remember abbreviation. When this abbreviation is entered in the enhanced editor, SAS automatically brings up the corresponding stored syntax. Knowing how to use SAS abbreviations is beneficial to programmers with varying levels of SAS expertise. In this paper, various examples of using SAS abbreviations are demonstrated.

Have you heard of SAS^® Customer Intelligence 360, the program for creating a digital marketing SaaS offering on a multi-tenant SAS cloud? Were you mesmerized by it but found it overwhelming? Did you tell yourself, 'I wish someone would show me how to do this'? This paper is for you. This paper provides you with an easy, step-by-step procedure on how to create a successful digital web, mobile, and email marketing campaign. In addition to these basics, the paper points to resources that allow you to get deeper into the application and customize each object to satisfy your marketing needs.

Read the paper (PDF)

SAS^® Data Integration Studio jobs are not always linear. While Loop transformations have been part of SAS Data Integration Studio for ages, only more recently has SAS Data Integration Studio included the Conditional Control transformations to control logic flow within a job. This paper demonstrates the use of both the Loop and Conditional transformations in a real world example.

Read the paper (PDF)

A common issue in data integration is that often the documentation and the SAS^® data integration job source code start to diverge and eventually become out of sync. At Capgemini, working for a specific client, we developed a solution to rectify this challenge. We proposed moving all necessary documentation into the SAS^® Data Integration Studio job itself. In this way, all documentation then becomes part of the metadata we have created, with the possibility of automatically generating Job and Release documentation from the metadata. This presentation therefore focuses on the metadata documentation generator. Specifically, this presentation: 1) looks at how to use programming and documentation standards in SAS data integration jobs to enable the generation of documentation from the metadata; and 2) shows how the documentation is generated from the metadata, and the challenges that were encountered creating the code. I draw on our hands-on experience; Capgemini has implemented this for a customer in the Netherlands, and we are rolling this out as an accelerator in other SAS data integration projects worldwide. I share examples of the generated documentation, which contains functional and technical designs, including a list with all source tables, a list with the target tables, all transformations with their own documentation, job dependencies, and more.

Read the paper (PDF)

The hash object provides an efficient method for quick data storage and data retrieval. Using a common set of lookup keys, hash objects can be used to retrieve data, store data, merge or join tables of data, and split a single table into multiple tables. This paper explains what a hash object is and why you should use hash objects, and provides basic programming instructions associated with the construction and use of hash objects in a DATA step.
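
The key-based lookup idea behind the hash object can be sketched with a Python dictionary. This is only an analogy under assumed data, not SAS hash object syntax: the tables, keys, and variable names are hypothetical.

```python
# Hypothetical lookup table (key -> product name) and transaction table (key, quantity).
lookup_rows = [("A1", "Widget"), ("B2", "Gadget")]
transactions = [("A1", 3), ("B2", 5), ("C3", 1), ("A1", 2)]

# Store: load the lookup table into memory once, keyed on the lookup key.
lookup = {key: name for key, name in lookup_rows}

# Retrieve / join: each transaction finds its match by key, without sorting either table.
joined = [(key, qty, lookup[key]) for key, qty in transactions if key in lookup]
unmatched = [(key, qty) for key, qty in transactions if key not in lookup]

# Split: route rows of a single table into multiple tables according to the key.
split_tables = {}
for key, qty in transactions:
    split_tables.setdefault(key, []).append((key, qty))

print(joined)
print(unmatched)
print(sorted(split_tables))
```

Because the lookup table is held in memory and addressed by key, each retrieval avoids a scan or a prior sort, which is the efficiency argument made for hash objects above.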

Read the paper (PDF)

Binary logistic regression models are widely used in CRM (customer relationship management) or credit risk modeling. In these models, it is common to use nominal, ordinal, or discrete (NOD) predictors. NOD predictors typically are binned (reducing the number of their levels) before usage in a logistic model. The primary purpose of binning is to obtain parsimony without greatly reducing the strength of association of the predictor X to the binary target Y. In this paper, two SAS^® macros are discussed. The %NOD_BIN macro bins predictors with nominal values (and ordinal and discrete values) by collapsing levels to maximize information value (IV). The %ORDINAL_BIN macro is applied to predictors that are ordered and in which collapsing can occur only for levels that are adjacent in the ordering of X. The %ORDINAL_BIN macro finds all possible binning solutions by complete enumeration. Solutions are ranked by IV, and monotonic solutions are identified.
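
The trade-off between parsimony and information value can be made concrete with a small Python sketch. It uses the standard definition IV = sum over levels of (g_i - b_i) * ln(g_i / b_i), where g_i and b_i are the level's shares of all goods and all bads. It is not the %NOD_BIN or %ORDINAL_BIN macro: it evaluates only a single merge of adjacent levels rather than a complete enumeration, and the counts and function names are hypothetical.

```python
import math


def information_value(bins):
    """IV for a list of (goods, bads) counts per level; all counts must be positive."""
    total_goods = sum(g for g, _ in bins)
    total_bads = sum(b for _, b in bins)
    iv = 0.0
    for goods, bads in bins:
        g = goods / total_goods      # share of all goods falling in this level
        b = bads / total_bads        # share of all bads falling in this level
        iv += (g - b) * math.log(g / b)
    return iv


def collapse_adjacent(bins, i):
    """Merge level i with level i + 1, as allowed for an ordered predictor."""
    merged = (bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1])
    return bins[:i] + [merged] + bins[i + 2:]


# Hypothetical (goods, bads) counts for four ordered levels of a predictor X.
levels = [(400, 20), (350, 30), (300, 60), (150, 90)]

iv_full = information_value(levels)
candidates = {
    i: information_value(collapse_adjacent(levels, i))
    for i in range(len(levels) - 1)
}
best_pair = max(candidates, key=candidates.get)   # merge that loses the least IV

print(round(iv_full, 4))
print({i: round(v, 4) for i, v in candidates.items()}, best_pair)
```

Every merge yields an IV no larger than the unbinned value, so binning amounts to choosing the merges that give up the least IV for each level removed.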

Read the paper (PDF)

The SAS^® macro language provides a powerful tool to write a program once and reuse it many times in multiple places. A repeatedly executed section of a program can be wrapped into a macro, which can then be shared among many users. A practical example of a macro can be a utility that takes in a set of input parameters, performs some calculations, and sends back a result (such as an interest calculator). In general, a macro modularizes a program into smaller and more manageable sections, and encapsulates repetitive tasks into reusable code. Modularization can help the code to be tested independently. This paper provides an introduction to writing macros. It introduces the user to the basic macro constructs and statements. This paper covers the following advanced macro subjects: 1) using multiple &s to retrieve/resolve the value of a macro variable; 2) creating a macro variable from the value of another macro variable; 3) handling special characters; 4) the EXECUTE statement to pass a DATA step variable to a macro; 5) using the EXECUTE statement to invoke a macro; and 6) using %RETURN to return a variable from a macro.

Read the paper (PDF)

The purpose of this paper is to provide an overview of SAS^® metadata security for new or inexperienced SAS administrators. The focus of the discussion is on identifying the most common metadata security objects such as access control entries (ACEs), access control templates (ACTs), metadata folders, authentication domains, and so on, and on describing how these objects work together to secure the SAS environment. Based on a standard SAS^® Enterprise Office Analytics for Midsize Business installation in a Windows environment, this paper walks through a simple example of securing a metadata environment, which demonstrates how security is prioritized, the impact of each security layer, and how conflicts are resolved.

Read the paper (PDF)

You have got your SAS^® environments installed, configured, and running smoothly. Time to relax and put your feet up, right? Not so fast! There is still one more leg to go on your security journey. After the deployment of your initial security plan, the security audit process provides active and regular monitoring and ensures that your environment remains secure. There are many reasons to carry out security audits: to ensure regulatory compliance, to maintain business confidence, and to keep your SAS platform aligned with its design specifications. This paper looks at some of the available ways to regularly review your environment to ensure that protected resources are not at risk, to comply with security auditing requirements, and to quickly and easily answer the question 'Who has access to what?' through efficient SAS metadata security management using Metacoda software.

Read the paper (PDF)

After you know the basics of SAS^® Visual Analytics, you realize that there are some situations that require unique strategies. Sometimes tables are not structured correctly or become too large for the environment. Maybe creating the right custom calculation for a dashboard can be confusing. Geospatial data is hard to work with if you haven't ever used it before. We studied hundreds of SAS^® Communities posts for the most common questions. These solutions (and a few extras) were extracted from the newly released book titled 'An Introduction to SAS^® Visual Analytics: How to Explore Numbers, Design Reports, and Gain Insight into Your Data'.

Read the paper (PDF)

Not only does the new SAS^® Viya platform bring exciting advancements in high-performance analytics, it also takes a revolutionary step forward in the area of administration. The new SAS^® Cloud Analytic Services is accompanied by new platform management tools and techniques that are designed to ease the administrative burden while leveraging the open programming and visual interfaces that are standard among SAS Viya applications. Learn about the completely rewritten SAS^® Environment Manager 3.2, which supports the SAS Viya platform. It includes a cleaner HTML5-based user interface, more flexible and intuitive authorization windows, and user and group management that is integrated with your corporate Lightweight Directory Access Protocol (LDAP). Understand how authentication works in SAS Viya without metadata identities. Discover the key differences between SAS^®9 and SAS Viya deployments, including installation and automated update-in-place strategies orchestrated by Ansible for hot fixes, maintenance, and new product versions alike. See how the new microservices and stateful servers are managed and monitored. In general, gain a better understanding of the components of the SAS Viya architecture, and how they can be collectively managed to keep your environment available, secure, and performant for the users and processes you support.

Read the paper (PDF)

The fourth maintenance release for SAS^® 9.4 and the new SAS^® Viya platform bring even more progress with respect to the interoperability between SAS^® and Hadoop, the industry standard for big data. This talk brings you up-to-date with where we are: more distributions, more data types, more options, and then there is the cloud. Come and learn about the exciting new developments for blending your SAS processing with your shared Hadoop cluster.

Read the paper (PDF)

The SAS^® platform with Unicode's UTF-8 encoding is ready to help you tackle the challenges of dealing with data in multiple languages. In today's global economy, software needs are changing. Companies are globalizing and consolidating systems from various parts of the world. Software must be ready to handle data from social media, international web pages, and databases that have characters in many different languages. SAS makes migrating your data to Unicode a snap! This paper helps you move smoothly from your legacy SAS environment to the powerful SAS Unicode environment with UTF-8 support. Along the way, you will uncover secrets to successfully manipulate your characters, so that all of your data remains intact.

Read the paper (PDF)

SAS^® Visual Analytics provides a complete platform for analytics visualization and exploration of the data. There are several interactive visualizations such as charts, histograms, heat maps, decision trees, and Sankey diagrams. A Sankey diagram helps in performing path analytics and offers a better understanding of complex data. It is a graphic illustration of flows from one set of values to another as a series of paths, where the width of each flow represents the quantity. It is a better and more efficient way to illustrate which flows represent advantages and which flows are responsible for the disadvantages or losses. Sankey diagrams are named after Matthew Henry Phineas Riall Sankey, who first used one in a publication on the energy efficiency of a steam engine in 1898. This paper begins with information regarding the essential parts of a Sankey diagram: nodes, links, drop-off links, and paths. Later, the paper explains the method for creating a meaningful visualization (with the help of examples) with a Sankey diagram by looking into the data roles and properties, describing ways to manage the path selection, exploring the transaction identifier values for a path selection, and using the spotlight tool to view multiple data tips in SAS Visual Analytics. Finally, the paper provides recommendations and tips for working effectively and efficiently with the Sankey diagram.
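
The relationship between transaction identifiers, paths, and link widths can be sketched in Python. This is an illustration of the underlying aggregation only, not of how SAS Visual Analytics builds the diagram; the event records and function names are hypothetical.

```python
from collections import Counter

# Hypothetical event log: (transaction identifier, sequence order, event).
events = [
    ("t1", 1, "Home"), ("t1", 2, "Search"), ("t1", 3, "Checkout"),
    ("t2", 1, "Home"), ("t2", 2, "Search"),
    ("t3", 1, "Home"), ("t3", 2, "Checkout"),
]


def build_links(records):
    """Group events into one ordered path per transaction, then count each step.

    Returns (links, paths): links maps (source node, target node) to the number
    of transactions flowing along that link, which would set the link's width.
    """
    paths = {}
    for txn, _order, event in sorted(records):   # sort by transaction, then order
        paths.setdefault(txn, []).append(event)

    links = Counter()
    for steps in paths.values():
        for source, target in zip(steps, steps[1:]):
            links[(source, target)] += 1
    return links, paths


links, paths = build_links(events)

for (source, target), width in sorted(links.items()):
    print(f"{source} -> {target}: width {width}")
```

Each distinct event becomes a node, each counted transition becomes a link, and the count determines how wide the flow is drawn.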

Read the paper (PDF)

With the proliferation of analytics expanding across every function of the enterprise, the need for broader access to data by experienced data scientists and non-technical users to produce reports and do discovery is growing exponentially. The unintended consequence of this trend is a bottleneck within IT to deliver the necessary data while still maintaining the necessary governance and data security standards required to safeguard this critical corporate asset. This presentation illustrates how organizations are solving this challenge and enabling users to both access larger quantities of existing data and add new data to their own models without negatively impacting the quality, security, or cost to store that data. It also highlights some of the cost and performance benefits achieved by enabling self-service data management.

Imagine if you will a program, a program that loves its data, a program that loves its data to be in the same directory as the program itself. Together, in the same directory. True love. The program loves its data so much, it just refers to it by filename. No need to say what directory the data is in; it is the same directory. Now imagine that program being thrust into the world of the server. The server knows not what directory this program resides in. The server is an uncaring, but powerful, soul. Yet, when the program is executing, and the program refers to the data just by filename, the server bellows nay, no path, no data. A knight in shining armor emerges, in the form of a SAS^® macro, who says lo, with the help of the SAS^® Enterprise Guide^® macro variable minions, I can gift you with the location of the program directory and send that with you to yon mighty server. And there was much rejoicing. Yay. This paper shows you a SAS macro that you can include in your SAS Enterprise Guide pre-code to automatically set your present working directory to the same directory where your program is saved on your UNIX or Linux operating system. This is applicable to submitting to any type of server, including a SAS Grid Server. It gives you the flexibility of moving your code and data to different locations without having to worry about modifying the code. It also helps save time by not specifying complete pathnames in your programs. And can't we all use a little more time?

Read the paper (PDF)

As a SAS^® administrator, have you ever wanted to look at the data in SAS^® Environment Manager spanning a longer length of time? Has your manager asked for access to the data so that they can use it to spot trends and make predictions? This paper shows you how to share that wealth of information found in the SAS Environment Manager log data. It explains how to save and store the data for use in SAS^® Visual Analytics. You will find tips on structuring the data for easy analysis and examples of using the data to make business decisions.

Read the paper (PDF)

If you are planning to deploy SAS^® Grid Manager or SAS^® Enterprise BI (or other distributed SAS^® Foundation applications) with load-balanced servers on multiple operating system instances, a shared file system is required. In order to determine the best shared file system choice for a given deployment, it is important to understand how the file system is used, the SAS^® I/O workload characteristics performed on it, and the stressors that SAS Foundation applications produce on the file system. For the purposes of this paper, we use the term shared file system to mean both a clustered file system and a shared file system, even though shared can also denote a network file system or a distributed file system that is not clustered. This paper examines the shared file systems that are most commonly used with SAS and reviews their strengths and weaknesses.

Read the paper (PDF)

In 2012, US Customs scanned nearly 4% and physically inspected less than 1% of the 11.5 million cargo containers that entered the United States. Laundering money through trade is one of the three primary methods used by criminals and terrorists. The other two methods used to launder money are using financial institutions and physically moving money via cash couriers. The Financial Action Task Force (FATF) roughly defines trade-based money laundering (TBML) as disguising proceeds from criminal activity by moving value through the use of trade transactions in an attempt to legitimize their illicit origins. This method of money laundering receives far less attention than those that use financial institutions and couriers. As countries have budget shortfalls and realize the potential loss of revenue through fraudulent trade, they are becoming more interested in TBML. As with many problems, applying detection methods against relevant data can result in meaningful insights, and can result in the ability to investigate and bring to justice those perpetrating fraud. In this paper, we apply TBML red flag indicators, as defined by John A. Cassara, against shipping and trade data to detect and explore potentially suspicious transactions. (John A. Cassara is an expert in anti-money laundering and counter-terrorism, and author of the book Trade-Based Money Laundering.) We use the latest detection tool in SAS^® Viya, along with SAS^® Visual Investigator.

View the e-poster or slides (PDF)

Web services are becoming more and more relied upon for serving up vast amounts of data. With such a heavy reliance on the web, and security threats increasing every day, security is a big concern. OAuth 2.0 has become a go-to way for websites to allow secure access to the services they provide. But with increased security, comes increased complexity. Accessing web services that use OAuth 2.0 is not entirely straightforward, and can cause a lot of users plenty of trouble. This paper helps clarify the basic uses of OAuth and shows how you can easily use Base SAS^® to access a few of the most popular web services out there.

Read the paper (PDF)

One often uses an iterative %DO loop to execute a section of a macro repetitively. An alternative method is to use the implicit loop in the DATA step with the EXECUTE routine to generate a series of macro calls. One of the advantages in the latter approach is eliminating the need of using indirect referencing. To better understand the use of the CALL EXECUTE routine, it is essential for programmers to understand the mechanism and the timing of macro processing to avoid programming errors. These technical issues are discussed in detail in this paper.

Read the paper (PDF)

Are you tired of constantly creating new emails each and every time you run a report, frantically searching for the reports, attaching said reports, and writing emails, all the while thinking there has to be a better way? Then, have I got some code to share with you! This session provides you with code to flee from your old ways of emailing data and reports. Instead, you set up your SAS^® code to send an email to your recipients. The email attaches the most current files each and every time the code is run. You do not have to do anything manually after you run your SAS code. This session provides SAS programmers with instructions about how to create their own email in a macro that is based on their current reports. We demonstrate different options to customize the code to add the email body (and to change the body) and to add attachments (such as PDF and Excel). We show you an additional macro that checks whether a file exists and adds a note in the SAS log if it is missing so that you won't get a warning message. Using SAS code, you will become more efficient and effective by automating a tedious process and reducing errors in email attachments, wording, and recipient lists.

Read the paper (PDF)

The syntax to combine SAS^® data sets is simple: use the SET statement to concatenate, and use the MERGE and BY statements to merge. The data sets themselves, however, might be complex. Combining the data sets might not result in what you need. This paper reviews techniques to perform before you combine data sets, including checking for the following: common variables; common variables with different attributes; duplicate identifiers; duplicate observations; and acceptable match rates.
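Several of the checks listed above can be sketched in a language-neutral way. The following Python sketch is an illustration only, not code from the paper: it represents each data set as a list of dictionaries, and the function name `premerge_report` and the sample records are hypothetical. It covers common variables, differing attributes (here reduced to value types), duplicate identifiers, and the match rate.

```python
from collections import Counter


def premerge_report(left, right, key):
    """Summarize checks worth running before merging two tables on `key`."""
    left_vars = set().union(*(row.keys() for row in left))
    right_vars = set().union(*(row.keys() for row in right))
    # Non-key variables present in both tables: one side would overwrite the other.
    common = (left_vars & right_vars) - {key}

    def types_of(rows, var):
        return {type(row[var]).__name__ for row in rows if var in row}

    # Common variables whose value types differ between the two tables.
    mismatched = sorted(
        var for var in common if types_of(left, var) != types_of(right, var)
    )

    def duplicate_keys(rows):
        counts = Counter(row[key] for row in rows)
        return sorted(k for k, n in counts.items() if n > 1)

    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}
    # Share of distinct left-hand identifiers that find a partner on the right.
    match_rate = len(left_keys & right_keys) / len(left_keys) if left_keys else 0.0

    return {
        "common_variables": sorted(common),
        "type_mismatches": mismatched,
        "duplicate_keys_left": duplicate_keys(left),
        "duplicate_keys_right": duplicate_keys(right),
        "match_rate": match_rate,
    }


# Hypothetical sample data: `site` is character in one table and numeric in the other.
patients = [
    {"id": 1, "age": 34, "site": "A"},
    {"id": 2, "age": 51, "site": "B"},
    {"id": 2, "age": 51, "site": "B"},
    {"id": 3, "age": 47, "site": "A"},
]
visits = [
    {"id": 1, "visit_date": "2017-01-05", "site": 10},
    {"id": 2, "visit_date": "2017-02-11", "site": 20},
    {"id": 4, "visit_date": "2017-03-02", "site": 10},
]
report = premerge_report(patients, visits, key="id")
```

For this sample, the report flags `site` as a common variable with mismatched types, identifier 2 as duplicated in the first table, and a match rate of two out of three distinct identifiers.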

Read the paper (PDF)

SAS^® 9.4 Graph Template Language: Reference has more than 1300 pages and hundreds of options and statements. It is no surprise that programmers sometimes experience unexpected twists and turns when using the graph template language (GTL) to draw figures. Understandably, it is easy to become frustrated when your program fails to produce the desired graphs despite your best effort. Although SAS needs to continue improving GTL, this paper offers several tricks that help overcome some of the roadblocks in graphing.

Read the paper (PDF)

Can you actually get something for nothing? With PROC SQL's subquery and remerging features, then yes, you can. When working with categorical variables, there is often a need to add flag variables based on group descriptive statistics, such as group counts and minimum and maximum values. Instead of first creating the group count or minimum or maximum values, and then merging the summarized data set with the original data set with conditional statements creating a flag variable, why not take advantage of PROC SQL to complete three steps in one? With PROC SQL's subquery, CASE-WHEN clause, and summary functions by the group variable, you can easily remerge the new flag variable back with the original data set.
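The summarize, merge, and flag logic that the abstract describes collapsing into one step can be sketched generically. The Python below is an illustration under stated assumptions rather than the paper's PROC SQL code; the function name `flag_group_max`, the output column names, and the sample rows are hypothetical.

```python
from collections import defaultdict


def flag_group_max(rows, group_var, value_var):
    """Attach the group count and an is-group-maximum flag to every row."""
    stats = defaultdict(lambda: {"n": 0, "max": None})
    # Pass 1: compute per-group summary statistics.
    for row in rows:
        s = stats[row[group_var]]
        s["n"] += 1
        if s["max"] is None or row[value_var] > s["max"]:
            s["max"] = row[value_var]
    # Pass 2: "remerge" the statistics onto the detail rows and derive the flag.
    return [
        {
            **row,
            "group_n": stats[row[group_var]]["n"],
            "is_group_max": int(row[value_var] == stats[row[group_var]]["max"]),
        }
        for row in rows
    ]


# Hypothetical detail rows.
sales = [
    {"region": "East", "amount": 100},
    {"region": "East", "amount": 250},
    {"region": "West", "amount": 80},
]
flagged = flag_group_max(sales, "region", "amount")
```

Every detail row is kept, which is the point of remerging: the group statistic is carried alongside each original observation instead of replacing it with one summary row per group.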

Read the paper (PDF)

This presentation brings together experiences from SAS^® professionals working as volunteers for organizations, charities, and in academic research. Pro bono work, much like that done by physicians, attorneys, and professionals in other areas, is rapidly growing in statistical practice as an important part of a statistical career, offering the opportunity to use your skills in places where they are badly needed but cannot be supported in a for-pay position. Statistical volunteers also gain important learning experiences, mentoring, networking, and other opportunities for professional development. The presenter shares experiences from volunteering for local charities, non-governmental organizations (NGOs) and other organizations and causes, both in the US and around the world. The mission, methods, and focus of some organizations are presented, including DataKind, Statistics Without Borders, Peacework, and others.

Read the paper (PDF)

Have you ever run SAS^® code with a DATA step and the results were not what you expected? Tracking down the problem can be a time-consuming task. To assist you in this common scenario, SAS^® Enterprise Guide^® 7.13 and beyond has a DATA step debugger tool. The simple and interactive DATA step debugger enables you to visually walk through the execution of your DATA step program. You can control the DATA step execution, view the variables, and set breakpoints to quickly identify data and logic errors. Come see the full capabilities of the new SAS Enterprise Guide DATA step debugger. You'll be squashing bugs in no time!

Read the paper (PDF)

Has the rapid pace of SAS/STAT^® releases left you unaware of powerful enhancements that could make a difference in your work? Are you still using PROC REG rather than PROC GLMSELECT to build regression models? Do you understand how the GENMOD procedure compares with the newer GEE and HPGENSELECT procedures? Have you grasped the distinction between PROC PHREG and PROC ICPHREG? This paper will increase your awareness of modern alternatives to well-established tools in SAS/STAT by using succinct, high-level comparisons rather than detailed descriptions to explain the relative benefits of procedures and methods. The paper focuses on alternatives in the areas of regression modeling, mixed models, generalized linear models, and survival analysis. When you see the advantages of these newer tools, you will want to put them into practice. This paper points you to helpful resources for getting started.

Read the paper (PDF)

Dealing with analysts and managers who do not know how to use SAS^®, or do not want to, can be quite tricky if everything you are doing uses SAS. This is where stored processes using SAS^® Enterprise Guide^® come in handy. Once you know what they want to get out of the code, prompts can be defined in a smart and flexible way to give all users (whether they are SAS users or not) full control over the output of the code. The key is having code that requires minimal maintenance and is flexible enough to accommodate anything that the user comes up with. This session provides examples of credit risk stress testing where loss forecasting results were presented using different levels. Results were driven by a stored process prompt using a simple DATA step, PROC SQL, and PROC REPORT. This functionality can be used in other industries where data is shown using different levels of granularity.

Read the paper (PDF)

There is an industry-wide push toward making workflows seamless and reproducible. Incorporating reproducibility into the workflow has many benefits; among them are increased transparency, time savings, and accuracy. We walk through how to seamlessly integrate SAS^®, LaTeX, and R into a single reproducible document. We also discuss best practices for general principles such as literate programming and version control.

Read the paper (PDF)

The SAS^® LOCK statement was introduced in SAS^®7 with great pomp and circumstance, as it enabled SAS^® software to lock data sets exclusively. In a multiuser or networked environment, an exclusive file lock prevents other users and processes from accessing and accidentally corrupting a data set while it is in use. Moreover, because file lock status can be tested programmatically with the LOCK statement return code (&SYSLCKRC), data set accessibility can be validated before attempted access, thus preventing file access collisions and facilitating more reliable, robust software. Notwithstanding the intent of the LOCK statement, stress testing demonstrated in this session illustrates vulnerabilities in the LOCK statement that render its use inadvisable due to its inability to lock data sets reliably outside of the SAS/SHARE^® environment. To overcome this limitation and enable reliable data set locking, a methodology is demonstrated that uses semaphores (flags) that indicate whether a data set is available or is in use, and mutually exclusive (mutex) semaphores that restrict data set access to a single process at one time. With Base SAS^® file locking capabilities now restored, this session further demonstrates control table locking to support process synchronization and parallel processing. The LOCKSAFE macro demonstrates a busy-waiting (or spinlock) design that tests data set availability repeatedly until file access is achieved or the process times out.
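The busy-waiting (spinlock) design mentioned above can be illustrated in general terms. The Python sketch below is not the LOCKSAFE macro; it is a minimal analogue that assumes a flag file acts as the semaphore and that exclusive file creation (`O_CREAT | O_EXCL`) is atomic on the file system in use. The function names, lock file name, and timeout values are hypothetical.

```python
import os
import tempfile
import time


def acquire_lock(lock_path, timeout=5.0, poll_interval=0.05):
    """Retry creating the flag file until it succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Exclusive creation fails if the flag file already exists,
            # so only one process at a time can hold the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False  # timed out: the resource is still in use
            time.sleep(poll_interval)


def release_lock(lock_path):
    """Remove the flag file so that other processes can acquire the lock."""
    os.remove(lock_path)


workdir = tempfile.mkdtemp()
lock_file = os.path.join(workdir, "mydata.lock")

first = acquire_lock(lock_file, timeout=1.0)   # lock is free, so this succeeds
second = acquire_lock(lock_file, timeout=0.2)  # lock is held, so this times out
release_lock(lock_file)
third = acquire_lock(lock_file, timeout=0.2)   # free again after release
release_lock(lock_file)
```

The timeout is what keeps a busy-waiting process from spinning forever when the holder of the lock never releases it.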

Read the paper (PDF)

In SAS^® Visual Analytics, we demonstrate a search functionality that enables users to filter a LASR table for records containing a search string. The search is performed on selected character fields that are defined for the table. The search string can be portions of words. Each additional string to search for narrows the search results.
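The narrowing behavior described above amounts to requiring every search string to match. The following Python sketch is an illustration and not the paper's implementation; the function name `search_rows`, the case-insensitive matching, and the sample records are assumptions.

```python
def search_rows(rows, fields, query):
    """Return rows in which every term of `query` appears in a selected field."""
    terms = query.lower().split()
    results = []
    for row in rows:
        # Only the selected character fields are searched.
        haystack = " ".join(str(row.get(f, "")) for f in fields).lower()
        # Substring matching allows portions of words; all() makes each
        # additional term narrow the result set.
        if all(term in haystack for term in terms):
            results.append(row)
    return results


# Hypothetical table; `notes` is deliberately left out of the searched fields.
customers = [
    {"name": "John Smith", "city": "Boston", "notes": "gold"},
    {"name": "Jane Smithers", "city": "Denver", "notes": "silver"},
    {"name": "Ann Lee", "city": "Boston", "notes": "gold"},
]
searched_fields = ["name", "city"]

one_term = search_rows(customers, searched_fields, "smi")
two_terms = search_rows(customers, searched_fields, "smi bos")
unsearched = search_rows(customers, searched_fields, "gold")
```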

Read the paper (PDF)

At the University of Central Florida (UCF), Student Development and Enrollment Services (SDES) combined efforts with Institutional Knowledge Management (IKM), which is the official source of data at UCF, to venture in a partnership to bring to life an electronic version of the SDES Dashboard at UCF. Previously, SDES invested over two months in a manual process to create a booklet with graphs and data that was not vetted by IKM; upon review, IKM detected many data errors plus inconsistencies in the figures that had been manually collected by multiple staff members over the years. The objective was to redesign this booklet using SAS^® Web Report Studio. The result was a collection of five major reports. IKM reports use SAS^® Business Intelligence (BI) tools to surface the official UCF data, which is provided to the State of Florida. Now it just takes less than an hour to refresh these reports for the next academic year cycle. Challenges in the design, implementation, usage, and performance are presented.

Read the paper (PDF)

As a SAS^® programmer, how often do you want to submit some code without waiting around for it to finish? SAS^® Studio has a way to achieve this and much more! This paper covers how to submit and execute SAS code in the background using SAS Studio. Background submit in the SAS Studio interface allows you to submit code and continue with your work. You receive a notification when it is finished, or you can even disconnect from your browser session and check the status of the submitted code later. Or you can choose to use SAS Studio to submit your code without bringing up the SAS Studio interface at all. This paper also covers the ability to use a command-line executable program that uses SAS Studio to execute SAS code in the background and generate log and result files without having to create a new SAS Studio session. These techniques make it much easier to spin up long-running jobs, while still being able to get your other work done in the meantime.

Read the paper (PDF)

Did you know that you could leverage the statistical power of the FREQ procedure and still be able to control the appearance of your output? Many people think they have to use procedures such as REPORT and TABULATE to be able to apply style options and control formats and headings for their output. However, if you pair PROC FREQ with a TEMPLATE procedure step, you can customize the appearance of your output and make enhancements to tables, such as adding colors and controlling headings. If you are a statistician, you know the many PROC FREQ options that produce high-level statistics. But did you also know that PROC FREQ can generate a graphical representation of those statistics? PROC FREQ can generate the graphs, and then you can use ODS Graphics and the Graph Template Language (GTL) to improve the appearance of the graphs. Written for intermediate users, this paper demonstrates how you can enhance the default output for PROC FREQ one-way and multi-way tables by modifying colors, formats, and labels. This paper also describes the syntax for creating graphs for multiple statistics, and it uses examples to show how you can customize these graphs.

Read the paper (PDF)

In the game of tag, being it is bad, but where accessibility compliance is concerned, being tagged is good! Tagging is required for PDF files to comply with accessibility standards such as Section 508 and the Web Content Accessibility Guidelines (WCAG). In the fourth maintenance release for SAS^® 9.4, the preproduction option in the ODS PDF statement, ACCESSIBLE, creates a tagged PDF file. We look at how this option changes the file that is created and focus on the SAS^® programming techniques that work best with the new option. You'll then have the opportunity to try it yourself in your own code and provide feedback to SAS.

Read the paper (PDF)

In 32 years as a SAS^® consultant at the Federal Reserve Board, I have seen some questions about common SAS tasks surface again and again. This paper collects the most common questions related to basic DATA step processing from my previous 'Tales from the Help Desk' papers, and provides code to explain and resolve them. The following tasks are reviewed: using the LAG function with conditional statements; avoiding character variable truncation; surrounding a macro variable with quotes in SAS code; handling missing values (arithmetic calculations versus functions); incrementing a SAS date value with the INTNX function; converting a variable from character to numeric or vice versa and keeping the same name; converting character or numeric values to SAS date values; using an array definition in multiple DATA steps; using values of a variable in a data set throughout a DATA step by copying the values into a temporary array; and writing data to multiple external files in a DATA step, determining file names dynamically from data values. In the context of discussing these tasks, the paper provides details about SAS processing that can help users employ SAS more effectively. See the references for seven previous papers that contain additional common questions.

Read the paper (PDF)

SAS^® Macro Language can be used to enhance many report-generating processes. This presentation showcases the potential that macros have in populating predesigned RTF templates. If you have multiple report templates saved, SAS^® can choose and populate the correct ones based on macro programming and DATA _NULL_ using the TRANSTRN function. The autocall macro %TRIM, combined with a macro variable (for example, &TEMPLATE), can be attached to the output RTF template name. You can design and save as many templates as you like or need. When SAS assigns the macro variable TEMPLATE a value, the %TRIM(&TEMPLATE) call in the output pathway correctly populates the appropriate template. This can make life easy if you create multiple different reports based on one data set. All that's required are stored templates on accessible pathways.

View the e-poster or slides (PDF)

Temporal text mining (TTM) is the discovery of temporal patterns in documents that are collected over time. It involves discovery of latent themes, construction of a thematic evolution graph, and analysis of thematic patterns. This paper uses text mining and time series analysis techniques to explore Don Quixote de la Mancha, a two-volume master work of Western literature. First, it uses singular value decomposition in SAS^® Text Miner to discover 25 key themes that characterize the two volumes. Then it treats the chapters of the two books as time-ordered documents and creates a semiautomated visual summary of the two volumes. It also explores the trajectory of individual themes over the course of the chapters and identifies episodes, recurring themes, and climaxes. Finally, it uses time series clustering in SAS^® Enterprise Miner to group chapters that have similar themes and to group themes that have similar trajectories. The TTM methods demonstrated in this paper lend themselves to business applications such as monitoring changes in customer sentiment and summarizing research and legislative trends.

Read the paper (PDF)

This paper discusses a set of practical recommendations for optimizing the performance and scalability of your Hadoop system using SAS^®. Topics include recommendations gleaned from actual deployments from a variety of implementations and distributions. Techniques cover tips for improving performance and working with complex Hadoop technologies such as Kerberos, techniques for improving efficiency when working with data, methods to better leverage the SAS in Hadoop components, and other recommendations. With this information, you can unlock the power of SAS in your Hadoop system.

Read the paper (PDF)

Testing is a weak spot in many data warehouse environments. A lot of the testing is focused on the correct implementation of requirements. But due to the complex nature of analytics environments, a change in a data integration process can lead to unexpected results in totally different and untouched areas. We developed a method to identify unexpected changes often and early by doing a nightly regression test. The test does a full ETL run, compares all output from the test to a baseline, and reports all the changes. This paper describes the process and the SAS^® code needed to back up existing data, trigger ETL flows, compare results, and restore situations after a nightly regression test. We also discuss the challenges we experienced while implementing the nightly regression test framework.
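The compare-to-baseline step can be sketched in a few lines. The Python below is a generic illustration, not the SAS code from the paper; it assumes each output table has a unique key column, and the function name `compare_to_baseline` and the sample tables are hypothetical.

```python
def compare_to_baseline(baseline, current, key):
    """Report rows added, removed, or changed relative to a baseline table."""
    base = {row[key]: row for row in baseline}
    curr = {row[key]: row for row in current}
    added = sorted(curr.keys() - base.keys())
    removed = sorted(base.keys() - curr.keys())
    changed = {}
    for k in sorted(base.keys() & curr.keys()):
        # Record (baseline value, current value) for every differing column.
        diffs = {
            col: (base[k].get(col), curr[k].get(col))
            for col in sorted(set(base[k]) | set(curr[k]))
            if base[k].get(col) != curr[k].get(col)
        }
        if diffs:
            changed[k] = diffs
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical ETL output: last night's baseline versus tonight's test run.
baseline_table = [
    {"cust_id": 1, "segment": "A", "balance": 100},
    {"cust_id": 2, "segment": "B", "balance": 200},
    {"cust_id": 3, "segment": "A", "balance": 300},
]
test_table = [
    {"cust_id": 1, "segment": "A", "balance": 100},
    {"cust_id": 2, "segment": "C", "balance": 200},
    {"cust_id": 4, "segment": "B", "balance": 50},
]
diff = compare_to_baseline(baseline_table, test_table, key="cust_id")
```

An empty report means the change under test left this table untouched; any entry is an unexpected side effect to investigate.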

Read the paper (PDF)

SAS offers a generation data set structure as a language feature that many users are familiar with. They use it in their organizations and manage it using keywords such as GENMAX and GENNUM. When SAS operates in a mainframe environment, users also have the ability to tap into the GDG (generation data group) feature available on z/OS, OS/390, OS/370, IBM 3070, or IBM 3090 machines. With cost-saving initiatives across businesses and due to some scaling factors, many organizations are in the process of migrating to cheaper mid-tier operating platforms such as UNIX and AIX. Because Linux is open source and is a cheaper alternative, several organizations have opted for the UNIX distribution of SAS that can work in UNIX and AIX environments. While this might be a viable alternative, the migration effort presents certain nuances to the technical conversion teams. On UNIX, the concept of GDGs does not exist. While SAS offers generation data sets, they are good only for SAS data sets. If the business organization needs to house and operate with a GDG-like structure for text data sets, there isn't one available. When my organization undertook a similar initiative to migrate programs used to run subprime mortgage analytic, incentive, and regulatory reporting, we identified a paucity of literature and research on this topic. Hence, I ended up developing a utility that addresses this need. This is a simple macro that helps us closely simulate a GDG/GDS.
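To make the GDG-like behavior concrete, here is a minimal Python sketch of generation handling for text files. It is not the author's macro: the `.Gnnnn` suffix, the roll-off limit, and the function names are hypothetical simplifications of how generations are numbered, retained, and referenced relatively (0 for the current generation, -1 for the previous one).

```python
import os
import re
import tempfile


def list_generations(directory, base):
    """Return the existing generation numbers for `base`, oldest first."""
    pattern = re.compile(re.escape(base) + r"\.G(\d{4})$")
    gens = []
    for name in os.listdir(directory):
        m = pattern.match(name)
        if m:
            gens.append(int(m.group(1)))
    return sorted(gens)


def new_generation(directory, base, text, limit=3):
    """Write `text` as the next generation and roll off the oldest ones."""
    gens = list_generations(directory, base)
    next_gen = gens[-1] + 1 if gens else 1
    path = os.path.join(directory, f"{base}.G{next_gen:04d}")
    with open(path, "w") as f:
        f.write(text)
    # Keep only the newest `limit` generations.
    for old in (gens + [next_gen])[:-limit]:
        os.remove(os.path.join(directory, f"{base}.G{old:04d}"))
    return path


def resolve(directory, base, relative=0):
    """Map a relative reference (0 = current, -1 = previous) to a file path."""
    gens = list_generations(directory, base)
    return os.path.join(directory, f"{base}.G{gens[relative - 1]:04d}")


workdir = tempfile.mkdtemp()
for day in range(1, 5):
    new_generation(workdir, "report.txt", f"run {day}\n", limit=3)

kept = list_generations(workdir, "report.txt")
current_path = resolve(workdir, "report.txt", 0)
previous_path = resolve(workdir, "report.txt", -1)
with open(current_path) as f:
    current_text = f.read()
```

After four runs with a limit of three, the first generation has been rolled off and the relative references point at the two most recent files.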

Read the paper (PDF) | View the e-poster or slides (PDF)

SAS^® Cloud Analytic Services (CAS) is the cloud-based run-time environment for data management and analytics in SAS^®. By run-time environment, we refer to the combination of hardware and software where data management and analytics take place. In a sense, CAS is just another SAS platform to do things. CAS is a platform for high-performance analytics and distributed computing. The CAS server provides data management and an analytics framework that can run in the cloud, that can act as a cloud, and that provides the best-in-class analytics that SAS is known for. This new architecture functions as a public API, allowing access from many different clients such as Lua, Python, Java, REST, and yes, even SAS. The CAS server is designed to provide user-level sessions, to share data between sessions, and to provide fault tolerance, which allows a worker node to crash without losing data and allows the user action to continue running to completion. The isolation provided to each session allows one session to crash without affecting other sessions. The concept of 'always in memory' in CAS means that an action is not aware of what the server does to allow the action to access the data. The entire file might be in memory or just pieces of the file might be mapped into memory, just in time for the action to access the data. This allows CAS tables to be loaded that are larger than the memory available across the grid. Hadoop can be used to provide data redundancy. The server is elastic and can add or remove nodes as needed. Users can specify how many nodes they want their session to use, so that the session fits their needs.

Read the paper (PDF)

SAS^® provides an extensive set of graphs for different needs. But as a SAS programmer or someone who uses SAS^® Visual Analytics Designer to create reports, you will find that the number of possible scenarios you have to address outnumbers the available graphs. This paper demonstrates how to create your own advanced graphs by intelligently combining existing graphs. This presentation explains how you can create the following types of graphs by combining existing graphs: a line-based graph that shows a line for each group such that each line is partly solid and partly dashed to show the actual and predicted values respectively; a control chart (which is currently not available as a standard graph) that lets you show how values change within multiple upper and lower limits; a line-based graph that gives you more control over attributes (color, symbol, and so on) of specific markers to depict special conditions; a visualization that shows the user only a part of the data at a given instant, and lets them move a window to see other parts of the data; a chart that lets the user compare the data values to a specific value in a visual way to make the comparison more intuitive; and a visualization that shows the overall data and at the same time shows the detailed data for the selected category. This paper demonstrates how to use the technique of combining graphs to create such advanced charts in SAS^® Visual Analytics and SAS^® Graph Builder as well as by using SAS procedures like the SGRENDER procedure.

Read the paper (PDF)

You've all seen the job posting that looks more like an advertisement for the ever-elusive unicorn. It begins by outlining the required skills that include a mixture of tools, technologies, and masterful things that you should be able to do. Unfortunately, many such postings begin with restrictions to those with advanced degrees in math, science, statistics, or computer science and experience in your specific industry. They must be able to perform predictive modeling, natural language processing, and, for good measure, candidates should apply only if they know artificial intelligence, cognitive computing, and machine learning. The candidate should be proficient in SAS^®, R, Python, Hadoop, ETL, real-time, in-cloud, in-memory, in-database and must be a master storyteller. I know of no one who would be able to fit that description and still be able to hold a normal conversation with another human. In our work, we have developed a competency model for analytics, which describes nine performance domains that encompass the knowledge, skills, behaviors, and dispositions that today's analytic professional should possess in support of a learning, analytically driven organization. In this paper, we describe the model and provide specific examples of job families and career paths that can be followed based on the domains that best fit your skills and interests. We also share with participants a self-assessment tool so that they can see where they stack up!

Read the paper (PDF)

As computer technology advances, SAS^® continually pursues opportunities to implement state-of-the-art systems that solve problems in data preparation and analysis faster and more efficiently. In this pursuit, we have extended the TRANSPOSE procedure to operate in a distributed fashion within both Teradata and Hadoop, using dynamically generated DS2 executed by the SAS^® Embedded Process, and within SAS^® Viya, using its native transpose action. With its new ability to work within these environments, PROC TRANSPOSE provides you with access to its parallel processing power and produces results that are compatible with your existing SAS programs.

Read the paper (PDF)

Have you ever had your macro code not work and you couldn't figure out why? Maybe even something as simple as %if &sysscp=WIN %then LIBNAME libref 'c:\temp'; ? This paper is designed for programmers who know %LET and can write basic macro definitions already. Now let's take your macro skills a step farther by adding to your skill set. The %IF statement can be a deceptively tricky statement due to how IF statements are processed in a DATA step and how that differs from how %IF statements are processed by the macro processor. Focus areas of this paper are: 1) emphasizing the importance of the macro facility as a code-generation facility; 2) how an IF statement in a DATA step differs from a macro %IF statement and when to use which; 3) why semicolons can be misinterpreted in an %IF statement.

Read the paper (PDF)

JSON is quickly becoming the industry standard for data interchanges, especially in supporting REST APIs. But until now, importing JSON content into SAS^® software and leveraging it in SAS has required significant custom code. Developing that code can be laborious, requiring transcoding, manual text parsing, and creating handlers for unexpected structure changes. Fortunately, the new JSON LIBNAME engine (in the fourth maintenance release for SAS^® 9.4 and later) delivers a robust, efficient method for importing JSON content into SAS data structures. This paper demonstrates several real-world examples of the JSON LIBNAME using open data APIs. The first example contrasts the traditional custom code and JSON LIBNAME approach using big data from the United Nations Comtrade Database. The two approaches are compared in terms of complexity of code, time to execute, and the resulting data structures. The same method is applied to data from Google and the US Census Bureau's APIs. Finally, to demonstrate the ability of the JSON LIBNAME to handle unexpected changes to a JSON data structure, we use the SAS JSON procedure to write a JSON file and then simulate changes to that structure to show how one JSON LIBNAME process can easily adjust the import to handle those changes.

Read the paper (PDF)

You might scream in pain or cry with joy that SAS^® software can directly produce output in Microsoft Excel as .xlsx workbooks. Excel is an excellent vehicle for delivering large amounts of summary information that needs to be partitioned for human review, exploratory filtering, and sorting. SAS supports ODS EXCEL as a production destination. This paper discusses using the ODS EXCEL statement and the TABULATE and REPORT procedures in the domain of summarizing cross-sectional data extracted from a medical claims database. The discussion covers data preparation, report preparation, and tabulation statements such as CLASS, CLASSLEV, and TABLE. The effects of STYLE options and the TAGATTR suboption for inserting features that are specific to Excel such as formulas, formats, and alignment are covered in detail. A short discussion of reusing these concepts in PROC REPORT statements such as DEFINE, COMPUTE, and CALL DEFINE is also included.

Read the paper (PDF)

Does your job require you to create reports in Microsoft Excel on a quarterly, monthly, or even weekly basis? Are you creating all or part of these reports by hand, referencing another sheet containing rows and rows and rows of data? If so, stop! There is a better way! The new ODS destination for Excel enables you to create native Excel files directly from SAS^®. Now you can include just the data you need, create great-looking tabular output, and do it all in a fraction of the time! This paper shows you how to use the REPORT procedure to create polished tables that contain formulas, colored cells, and other customized formatting. Also presented in the paper are the destination options used to create various workbook structures, such as multiple tables per worksheet. Using these techniques to automate the creation of your Excel reports will save you hours of time and frustration, enabling you to pursue other endeavors.

Read the paper (PDF)

You might encounter people who used SAS^® long ago (perhaps in university), or people who had a very limited use of SAS in a job. Some of these people with limited knowledge and experience think that SAS is just a statistics package or just a GUI. Those who think of it as a GUI usually reference SAS^® Enterprise Guide^® or, if it was a really long time ago, SAS/AF^® or SAS/FSP^®. The reality is that the modern SAS system is a very large and complex ecosystem, with hundreds of software products and diversified tools for programmers and users. This poster provides diagrams and tables that illustrate the complexity of the SAS system from the perspective of a programmer. Diagrams and illustrations include the functional scope and operating systems in the ecosystem; different environments that program code can run in; cross-environment interactions and related tools; SAS^® Grid Computing: parallel processing; how SAS can run with files in memory (the legacy SASFILE statement and big data and Hadoop); and how some code can run in-database. We end with a tabulation of the many programming languages and SQL dialects that are directly or indirectly supported within SAS. This poster should enlighten those who think that SAS is an old, dated statistics package or just a GUI.

View the e-poster or slides (PDF)

As a SAS^® Visual Analytics administrator, how do you efficiently manage your SAS^® LASR environment? How do you ensure reliable data availability to your end users? How do you ensure that your users have the proper permissions to perform their tasks in SAS Visual Analytics? This paper covers some common management issues in SAS Visual Analytics, why and how they might arise, and how to resolve them. It discusses methods of programmatically managing your SAS^® LASR Analytic Server and tables, as well as using SAS^® Visual Analytics Administrator. Furthermore, it provides a better understanding of the roles in SAS Visual Analytics and demonstrates how to set up appropriate user permissions. Using the methods discussed in this paper can help you improve the end-user experience as well as system performance.

Read the paper (PDF)

SAS^® environments are evolving in multiple directions. Modern web interfaces such as SAS^® Studio are replacing the traditional SAS^® Display Manager system. At the same time, distributed analytic computing, centrally managed by SAS^® Grid Manager, is becoming the standard topology for many enterprises. SAS administrators are faced with the task of providing business users properly configured, tuned, and monitored applications. The tips included in this paper provide SAS administrators with best practices to centrally manage SAS Studio options and repositories, proper grid tuning, effective monitoring of user sessions, high-availability considerations and more.

Read the paper (PDF)

The advent of robust and thorough data collection has resulted in the term big data. With Census data becoming richer, more nationally representative, and voluminous, we need methodologies that are designed to handle the manifold survey designs that Census data sets implement. The relatively nascent PROC SURVEYLOGISTIC, an experimental procedure in SAS^®9 and fully supported in SAS 9.1, addresses some of these methodologies, including clusters, strata, and replicate weights. PROC SURVEYLOGISTIC handles data that is not a straightforward random sample. Using Census data sets, this paper provides examples highlighting the appropriate use of survey weights to calculate various estimates, as well as the calculation and interpretation of odds ratios between categorical variable interactions when predicting a binary outcome.

Read the paper (PDF)

Time series analysis and forecasting have always been popular as businesses realize the power and impact they can have. Getting students to learn effective and correct ways to build their models is key to having successful analyses as more graduates move into the business world. Using SAS^® University Edition is a great way for students to learn analysis, and this talk focuses on the time series tasks. A brief introduction to time series is provided, as well as other important topics that are key to building strong models.

Read the paper (PDF)

Many organizations need to analyze large numbers of time series that have time-varying or frequency-varying properties (or both). The time-varying properties can include time-varying trends, and the frequency-varying properties can include time-varying periodic cycles. Time-frequency analysis simultaneously analyzes both time and frequency; it is particularly useful for monitoring time series that contain several signals of differing frequency. These signals are commonplace in data that are associated with the internet of things. This paper introduces techniques for large-scale time-frequency analysis and uses SAS^® Forecast Server and SAS/ETS^® software to demonstrate these techniques.

Read the paper (PDF)

Predictive analytics is powerful in its ability to predict likely future behavior, and is widely used in marketing. However, the old adage "timing is everything" continues to hold true: the right customer and the right offer at the wrong time is less than optimal. In life, timing matters a great deal, but predictive analytics seldom takes timing into account explicitly. We should be able to do better. Financial service consumption changes often have precursor changes in behavior, and a behavior change can lead to multiple subsequent consumption changes. One way to improve our awareness of the customer situation is to detect significant events that have meaningful consequences that warrant proactive outreach. This session presents a simple time series approach to event detection that has proven to work successfully. Real case studies are discussed to illustrate the approach and implementation. Adoption of this practice can augment and enhance predictive analytics practice to elevate our game to the next level.

Read the paper (PDF)

Using SAS^® to query relational databases can be challenging, even for seasoned SAS programmers. SAS/ACCESS^® software makes it easy to directly access data on nearly any platform, but there is a lot of under-the-hood functionality that takes time to learn. Here are tips that will get you on your way fast, including understanding and mastering SQL pass-through; efficiently bulk-loading data from SAS into other databases; tuning your SQL queries; and when to use native database versus SAS functionality.

View the e-poster or slides (PDF)

Trends in predictive modeling, specifically in machine learning, are moving toward automated approaches where well-known predictive algorithms are applied and the best one is chosen according to some evaluation metric. This approach's efficacy relies on the underlying data preparation in general, and on data preprocessing in particular. Commonly used data preprocessing techniques include missing value imputation, outlier detection and treatment, functional transformation, discretization, and nominal grouping. Excluding toy problems, the composition of the best data preprocessing step depends on the modeling task at hand. This necessitates an iterative generation and evaluation of predictive pipelines, which consist of a mix of data preprocessing techniques and predictive algorithms. This is a combinatorial problem that can be a bottleneck in the analytics workflow. In this paper, we discuss the SAS^® Cloud Analytic Services (CAS) actions in SAS^® Viya that can be used to effect this end-to-end predictive pipeline in a scalable way, with special emphasis on CAS actions for data exploration, preprocessing, and feature transformation. In addition, we discuss how the whole process can be automated.

Knowing which SAS^® products are being used in your organization, by whom, and how often helps you decide whether you have the right mix and quantity licensed. These questions are not easy to answer. We present an innovative technique using three SAS utilities to answer these questions. This paper includes example code written for Linux that can easily be modified for Windows and other operating systems.

Read the paper (PDF) | View the e-poster or slides (PDF)

Transport Layer Security (TLS) configuration for SAS^® components is essential to protect data in motion. All necessary encryption arrangements are established through a TLS handshake between the client and the server side. Many SAS^® 9.4 and SAS^® Viya components can be a client side, a server side, or both. SAS documentation primarily provides how-to steps for the configuration. This paper examines the X.509 certificate and the TLS handshake protocol, which are the basic building blocks of secure communication. The paper focuses on the logic behind the setup and how various types of certificates are used in the configuration. Many unique client and server combinations of SAS components are illustrated and explained with best-practice suggestions.

Read the paper (PDF)

We are always looking for ways to improve the performance, efficiency, and availability of our investment in SAS^® solutions. To address those needs, SAS offers the ability to cluster many of its constituent software components. A cluster is a set of systems that work together with the goal of providing a single service. This session identifies 12 different technologies to create clusters of SAS software components and describes how they are designed to boost the capabilities of SAS to function in the enterprise.

Read the paper (PDF)

SAS^® Embedded Process enables user-written DS2 code and scoring models to run inside Hadoop. It taps into the massively parallel processing (MPP) architecture of Hadoop for scalable performance. SAS Embedded Process explores and complies with many Hadoop components. This paper explains how SAS Embedded Process interacts with existing Hadoop security technologies, such as Apache Sentry and RecordServices.

Read the paper (PDF)

Within a SOX-compliant environment, a batch job is run. During the process, an FTP server needs to be accessed. The batch user password is not known and the FTP credentials are not known either. How can we achieve this safely and securely? The approach is to have an authentication domain within the SAS metadata created that has the FTP credentials. Create an internal SAS user within the SAS metadata. This user exists only within the SAS metadata, so it does not pose any risk. Create an FTP server within the SAS metadata. Add and link everything together within the SAS metadata. Within the SAS batch job, the SAS internal user will be used (with the use of the hashed password) to connect to the metadata to get the FTP credentials stored within the authentication domain and retrieve or upload the data.

Read the paper (PDF)

Machine learning is not just for data scientists. Business analysts can use machine learning to discover rules from historical decision data or from historical performance data. Decision tree learning and logistic regression scorecard learning are available for standard data tables, and Associations Analysis is available for transactional event tables. These rules can be edited and optimized for changing business conditions and policies, and then deployed into automated decision-making systems. Users will see demonstrations using real data and will learn how to apply machine learning to business problems.

Read the paper (PDF)

The traditional model of SAS^® source-code production is for all code to be directly written by users or indirectly written (that is, generated by user-written macros, Lua code, or with DATA steps). This model was recently extended to enable SAS macro code to operate on arbitrary text (for example, on HTML) using the STREAM procedure. In contrast, SAS includes many products that operate in the client/server environment and function as follows: 1) the user interacts with the product via a GUI to specify the processing desired; 2) the product saves the user-specifications in metadata and generates SAS source code for the target processing; 3) the source code is then run (per user directions) to perform the processing. Many of these products give users the ability to modify the generated code and/or insert their own user-written code. Also, the target code (system-generated plus optional user-written) can be exported or deployed to be run as a stored process, in batch, or in another SAS environment. In this paper, we review the SAS ecosystem contexts where source code is produced, the pros and cons of each approach, discuss why some system-generated code is inelegant, and make some suggestions for determining when to write the code manually, and when and how to use system-generated code.

Read the paper (PDF)

Visualizing the movement of people over time in an animation can provide insights that tables and static graphs cannot. There are many options, but what if you want to base the visualization on large amounts of data from several sources? SAS^® is a great tool for this type of project. This paper summarizes how visualizing movement is accomplished using several data sets, large and small, and using various SAS procedures to pull it together. The use of a custom shape file is also highlighted. The end result is a GIF, which can be shared, that provides insights not available with other methods.

Read the paper (PDF)

Hash tables are powerful tools when building an electronic code book, which often requires a lot of match-merging between the SAS^® data sets. In projects that span multiple years (e.g., longitudinal studies), there are usually thousands of new variables introduced at the end of every year or at the end of each phase of the project. These variables usually have the same stem or core as the previous year's variables. However, they differ only in a digit or two that usually signifies the year number of the project. So, every year, there is this extensive task of comparing thousands of new variables to older variables for the sake of carrying forward key database elements corresponding to the previously defined variables. These elements can include the length of the variable, data type, format, discrete or continuous flag, and so on. In our SAS program, hash objects are efficiently used to cut down not only time, but also the number of DATA and PROC steps used to accomplish the task. Clean and lean code is much easier to understand. A macro is used to create the data set containing new and older variables. For a specific new variable, the FIND method in hash objects is used in a loop to find the match to the most recent older variable. What was taking around a dozen PROC SQL steps is now a single DATA step using hash tables.
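The lookup loop described above can be sketched outside SAS. The following Python sketch is an illustration only, not the paper's code: a dictionary stands in for the hash object, the function and variable names are hypothetical, and it assumes that each variable name is a stem followed by an unpadded year number.

```python
import re

def split_name(name):
    """Split a name into (stem, year). Assumes a trailing numeric suffix."""
    m = re.fullmatch(r"(.*?)(\d+)", name)
    if m is None:
        return name, None
    return m.group(1), int(m.group(2))

def carry_forward(new_vars, old_attrs):
    """For each new variable, find the most recent older variable with the
    same stem and return its (name, attributes), or None if none exists.

    old_attrs maps an older variable name to its attributes and plays the
    role of the hash table keyed on variable name.
    """
    result = {}
    for name in new_vars:
        stem, year = split_name(name)
        result[name] = None
        if year is None:
            continue
        # Probe year-1, year-2, ... until a key is found in the table.
        for y in range(year - 1, 0, -1):
            candidate = f"{stem}{y}"
            if candidate in old_attrs:
                result[name] = (candidate, old_attrs[candidate])
                break
    return result

# Hypothetical code book entries from earlier years.
old = {
    "bmi_y1": {"length": 8, "type": "num"},
    "bmi_y2": {"length": 8, "type": "num"},
    "wt_y1": {"length": 8, "type": "num"},
}
matches = carry_forward(["bmi_y3", "wt_y3", "ht_y3"], old)
```

Each probe is a constant-time key lookup, which is the property that lets a single pass replace repeated joins.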

Read the paper (PDF)

If you run SAS^® and Teradata software with default application and database client encodings, some operations with international character sets will appear to work because you are actually treating character strings as streams of bytes instead of streams of characters. As long as no one in the chain of custody tries to interpret the data as anything other than a stream of bytes, then data can sometimes flow in and out of a database without being altered, giving the appearance of correct operation. But when you need to compare a particular character to an international character, or when your data approaches the maximum size of a character field, then you will run into trouble. To correctly handle international character sets, every layer of software that touches the data needs to agree on the encoding of the data. UTF-8 encoding is a flexible way to handle many single-byte and multi-byte international character sets. To use UTF-8 encoding with international character sets, we need to configure the SAS session encoding, Teradata client encoding, and Teradata server encoding to all agree, so that they are handling UTF-8 encoded data. This paper shows you how to configure SAS and Teradata so that your applications run successfully with international characters.
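The difference between a stream of bytes and a stream of characters can be shown with a few lines of Python. This is a generic illustration rather than SAS or Teradata configuration; the sample string and the helper `fits_in_field` are made up for the example.

```python
# "Zoe" with a diaeresis on the e: three characters.
text = "Zo\u00eb"
encoded = text.encode("utf-8")

char_count = len(text)      # counts characters
byte_count = len(encoded)   # counts bytes; the accented letter takes two

def fits_in_field(value, max_bytes):
    """True if the UTF-8 form of value fits a field sized in bytes."""
    return len(value.encode("utf-8")) <= max_bytes

# A layer that assumes a single-byte encoding reads the same bytes as
# different characters, so comparisons against the original string fail.
misread = encoded.decode("latin-1")
```

Here the string has three characters but four bytes, so it does not fit a three-byte field, and the Latin-1 reading no longer equals the original text. Passing the bytes through untouched hides both problems until a comparison or a length limit is involved.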

Read the paper (PDF)

Do you have a complex report involving multiple tables, text items, and graphics that could best be displayed in a multi-tabbed spreadsheet format? The Output Delivery System (ODS) destination for Excel, introduced in SAS^® 9.4, enables you to create Microsoft Excel workbooks that easily integrate graphics, text, and tables, including column labels, filters, and formatted data values. In this paper, we examine the syntax used to generate a multi-tabbed Excel report that incorporates output from the REPORT, PRINT, SGPLOT, and SGPANEL procedures.

Read the paper (PDF)

With SAS^® Viya and SAS^® Cloud Analytic Services (CAS), SAS is moving into a new territory where SAS^® Analytics is accessible to popular scripting languages using open APIs. Python is one of those client languages. We demonstrate how to connect to CAS, run CAS actions, explore data, build analytical models, and then manipulate and visualize the results using standard Python packages such as Pandas and Matplotlib. We cover a wide variety of topics to give you a bird's eye view of what is possible when you combine the best of SAS with the best of open source.

Read the paper (PDF)

Are you frustrated with manually setting options to control your SAS^® Display Manager sessions but become daunted every time you look at all the places you can set options and window layouts? In this paper, we look at various files SAS^® accesses when starting, what can (and cannot) go into them, and what takes precedence after all are executed. We also look at the SAS registry and how to programmatically change settings. By the end of the paper, you will be comfortable in knowing where to make the changes that best fit your needs.

Read the paper (PDF)

This paper discusses the techniques I used at the US Census Bureau to overcome the issue of dealing with large amounts of data while modernizing some of their public-facing web applications by using service-oriented architecture (SOA) to deploy JavaScript web applications powered by SAS^®. The paper covers techniques that resulted in reducing 1,753,926 records (82 MB) down to 58 records (328 KB), a 99.6% size reduction in summarized data on the server side.
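The quoted reduction can be checked with a short calculation. The sketch below assumes binary units (1 MB = 1024 KB); with decimal units the size figure rounds to the same percentage.

```python
def reduction(before, after):
    """Fractional reduction going from before to after."""
    return 1 - after / before

size_before_kb = 82 * 1024   # 82 MB expressed in KB (assumed binary units)
size_after_kb = 328

size_reduction = reduction(size_before_kb, size_after_kb)
record_reduction = reduction(1_753_926, 58)

print(f"size reduction: {size_reduction:.1%}")
```

The 99.6% figure refers to size in bytes; measured in record counts the reduction is larger still.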

Read the paper (PDF)

Your SAS^® Visual Analytics users begin to create and share reports. As an administrator, you want to track performance of the reports over time, analyzing timing metrics for key tasks such as data query and rendering, relative to total user workload for the system. Logging levels can be set for the SAS Visual Analytics reporting services that provide timing metrics for each report execution. The log files can then be mined to create a data source for a time series plot in SAS Visual Analytics. You see report performance over time with peak workloads and how this impacts the user experience. Isolation on key metrics can identify performance bottlenecks for improvement. First we look at how logging levels are modified for the reporting services and focus on tracking a single user viewing a report. Next, we extract data from a long running log file to create a report performance data source. Using SAS Visual Analytics, we analyze the data with a time series plot, looking at times of peak work load and how the user experience changes.

Read the paper (PDF)

Chemical incidents involving irritant chemicals such as chlorine pose a significant threat to life and require rapid assessment. Data from the Validating Triage for Chemical Mass Casualty Incidents: A First Step R01 grant was used to determine the most predictive signs and symptoms (S/S) for a chlorine mass casualty incident. SAS^® 9.4 was used to estimate sensitivity, specificity, positive and negative predictive values, and other statistics of irritant gas syndrome agent S/S for two existing systems designed to assist emergency responders in hazardous material incidents (Wireless Information System for Emergency Responders (WISER) and CHEMM Intelligent Syndrome Tool (CHEMM-IST)). The results for WISER showed the sensitivity was .72 to 1.0; specificity .25 to .47; and the positive predictive value and negative predictive value were .04 to .87 and .33 to 1.0, respectively. The results for CHEMM-IST showed the sensitivity was .84 to .97; specificity .29 to .45; and the positive predictive value and negative predictive value were .18 to .42 and .86 to .97, respectively.
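For readers unfamiliar with the four statistics reported above, the following Python sketch shows their standard definitions from a 2x2 table. The counts are invented for illustration and are not taken from the study, which used SAS 9.4.

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Standard 2x2 diagnostic statistics.

    tp/fn: exposed patients flagged / missed by the tool
    fp/tn: unexposed patients flagged / correctly cleared
    Assumes every denominator is nonzero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of exposed who are flagged
        "specificity": tn / (tn + fp),  # share of unexposed who are cleared
        "ppv": tp / (tp + fp),          # share of flagged who are exposed
        "npv": tn / (tn + fn),          # share of cleared who are unexposed
    }

# Hypothetical counts, for illustration only.
stats = diagnostic_stats(tp=18, fp=30, fn=2, tn=10)
```

With these made-up counts the sensitivity is 0.90 and the specificity is 0.25, the same general pattern as the reported ranges: most true cases are caught, at the cost of many false alarms.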

Read the paper (PDF) | View the e-poster or slides (PDF)

What will your customer do next? Customers behave differently; they are not all average. Segmenting your customers into different groups enables you to build more powerful and meaningful predictive models. You can use SAS^® Visual Analytics to instantaneously visualize and build your segments identified by a decision tree or cluster analysis with respect to customer attributes. Then, you can save the cluster/segment membership, and use that as a separate predictor or as a group variable for building stratified predictive models. Dividing your customer population into segments is useful because what drives one group of people to exhibit a behavior can be quite different from what drives another group. By analyzing the segments separately, you are able to reduce the overall error variance or noise in the models. As a result, you improve the overall performance of the predictive models. This paper covers the building and use of segmentation in predictive models and demonstrates how SAS Visual Analytics, with its point-and-click functionality and in-memory capability, can be used for an easy and comprehensive understanding of your customers, as well as predicting what they are likely to do next.

Read the paper (PDF)

Using shared accounts to access third-party database servers is a common architecture in SAS^® environments. SAS software can support seamless user access to shared accounts in databases such as Oracle and MySQL, via group definitions and outbound authentication domains in metadata. However, the configurations necessary to leverage shared accounts in Kerberized Hadoop clusters are more complicated. Kerberos tickets must often be generated and maintained in order to simply access the Hadoop environment, and those tickets must allow access as the shared account instead of as an individual user's account. In all cases, key prerequisites and configurations must be put into place in order for seamless Hadoop access to function with the shared account. Methods for implementing these arrangements in SAS environments can be non-intuitive. This paper starts by outlining general architectures of shared accounts in third-party database environments. It then presents several methods of managing remote access to shared accounts in Kerberized Hadoop environments using SAS, including specific implementation details, code samples, and security implications.

Read the paper (PDF)

Panel data, which are collected on a set (panel) of individuals over several time points, are ubiquitous in economics and other analytic fields because their structure allows for individuals to act as their own control groups. The PANEL procedure in SAS/ETS^® software models panel data that have a continuous response, and it provides many options for estimating regression coefficients and their standard errors. Some of the available estimation methods enable you to estimate a dynamic model by using a lagged dependent variable as a regressor, thus capturing the autoregressive nature of the underlying process. Including lagged dependent variables introduces correlation between the regressors and the residual error, which necessitates using instrumental variables. This paper guides you through the process of using the typical estimation method for this situation, the generalized method of moments (GMM), and the process of selecting the optimal set of instrumental variables for your model. Your goal is to achieve unbiased, consistent, and efficient parameter estimates that best represent the dynamic nature of the model.
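The source of the correlation can be seen in the standard textbook form of a dynamic panel model. The notation below is an assumption for illustration (individual $i$, time $t$, individual effect $\alpha_i$); the paper's own specification may differ.

```latex
\begin{align*}
y_{it} &= \gamma\, y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it} \\
\Delta y_{it} &= \gamma\, \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\varepsilon_{it}
  && \text{(first differences remove } \alpha_i\text{)} \\
E\!\left[\, y_{i,t-s}\, \Delta\varepsilon_{it} \,\right] &= 0, \quad s \ge 2
  && \text{(moment conditions, assuming no serial correlation in } \varepsilon_{it}\text{)}
\end{align*}
```

Because $\Delta y_{i,t-1}$ contains $y_{i,t-1}$, which depends on $\varepsilon_{i,t-1}$, it is correlated with $\Delta\varepsilon_{it}$. Lags of the response from period $t-2$ and earlier are the usual candidate instruments, and choosing how many of them to use is the instrument-selection question the abstract refers to.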

Read the paper (PDF)

In many healthcare settings, patients are like customers: they have a choice. One example is whether to participate in a procedure. In population-based screening in which the goal is to reduce deaths, the success of a program hinges on the patient's choice to accept and comply with the procedure. As in many other industries, this not only relies on the program to attract new eligible patients to attend for the first time, but it also relies on the ability of the program to retain existing customers. The success of a new customer retention strategy within a breast screening environment is examined by applying a population-averaged model (also known as a marginal model), which uses generalized estimating equations (GEEs) to account for the lack of independence of the observations. Arguments for why a population-averaged model was applied instead of a mixed effects model (or random effects model) are provided. This business case provides a great introductory session for people to better understand the difference between mixed effects and marginal models, and illustrates how to implement a population-averaged model within SAS^® by using the GENMOD procedure.

Read the paper (PDF)

It is often necessary to assess multi-rater agreement for multiple-observation categories in case-controlled studies. The Kappa statistic is one of the most common agreement measures for categorical data. The purpose of this paper is to show an approach for using SAS^® 9.4 procedures and the SAS^® Macro Language to estimate Kappa with 95% CI for pairs of nurses that used two different triage systems during a computer-simulated chemical mass casualty incident (MCI). Data from the Validating Triage for Chemical Mass Casualty Incidents: A First Step R01 grant was used to assess the performance of a typical hospital triage system called the Emergency Severity Index (ESI), compared with an Irritant Gas Syndrome Agent (IGSA) triage algorithm being developed from this grant, to quickly prioritize the treatment of victims of IGSA incidents. Six different pairs of nurses used ESI triage, and seven pairs of nurses used the IGSA triage prototype to assess 25 patients exposed to an IGSA and 25 patients not exposed. Of the 13 pairs of nurses in this study, two pairs were randomly selected to illustrate the use of the SAS Macro Language for this paper. If the data was not square for two nurses, a square-form table for observers using pseudo-observations was created. A weight of 1 for real observations and a weight of .0000000001 for pseudo-observations were assigned. Several macros were used to reduce programming. In this paper, we show only the results of one pair of nurses for ESI.
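The effect of the tiny pseudo-observation weight can be illustrated with a small Python sketch of unweighted Cohen's Kappa computed from weighted pairs. This is not the paper's macro code: the ratings are invented, the function name is hypothetical, and only the point estimate is computed, not the 95% CI.

```python
from collections import defaultdict

def cohen_kappa(pairs, weights=None):
    """Cohen's Kappa from (rater_a, rater_b) category pairs.

    weights gives the count contributed by each pair (default 1 each).
    """
    if weights is None:
        weights = [1.0] * len(pairs)
    categories = sorted({c for pair in pairs for c in pair})
    total = sum(weights)
    cell = defaultdict(float)
    for (a, b), w in zip(pairs, weights):
        cell[(a, b)] += w
    # Observed agreement: mass on the diagonal of the table.
    p_obs = sum(cell[(c, c)] for c in categories) / total
    # Chance agreement: product of the two raters' marginal proportions.
    row = {c: sum(cell[(c, d)] for d in categories) / total for c in categories}
    col = {c: sum(cell[(d, c)] for d in categories) / total for c in categories}
    p_exp = sum(row[c] * col[c] for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)

# Invented ratings: rater A uses levels 1-3, rater B never uses level 3,
# so the cross-tabulation of real observations is not square.
real = [(1, 1)] * 10 + [(2, 2)] * 8 + [(3, 2)] * 2 + [(1, 2)] * 3 + [(2, 1)] * 2
kappa_real = cohen_kappa(real)

# Add one pseudo-observation in the missing column with a negligible weight.
pseudo = real + [(3, 3)]
pseudo_weights = [1.0] * len(real) + [1e-10]
kappa_pseudo = cohen_kappa(pseudo, pseudo_weights)
```

The pseudo-observation supplies the missing category so the table becomes square, while its weight is small enough that the estimate is essentially unchanged.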

Read the paper (PDF) | View the e-poster or slides (PDF)

The challenge is to assign outbound calling agents in a telemarketing campaign to geographic districts. The districts have a variable number of leads, and each agent needs to be assigned entire districts with the total number of leads being as close as possible to a specified number for each of the agents (usually, but not always, an equal number). In addition, there are constraints concerning the distribution of assigned districts across time zones, in order to maximize productivity and availability. The SAS/OR^® CLP procedure solves the problem by formulating the challenge as a constraint satisfaction problem (CSP). Our use of PROC CLP places the actual leads within a specified percentage of the target number.

Read the paper (PDF)

The ODS EXCEL destination has made sharing SAS^® reports and graphs much easier. What is even more exciting is that this destination is available for use regardless of the platform. This is extremely useful when reporting is performed on remote servers. This presentation goes through the basics of using the ODS EXCEL destination and shows specific examples of how to use this in a remote environment. Examples for both SAS^® on Windows and in SAS^® Enterprise Guide^® are provided.

Read the paper (PDF)

Students now have access to a SAS^® learning tool called SAS^® University Edition. This online tool is freely available to all, for non-commercial use. This means it is basically a free version of SAS that can be used to teach yourself or someone else how to use SAS. Since a large part of my body of writings has focused upon moving data between SAS and Microsoft Excel, I thought I would take some time to highlight the tasks that permit movement of data between SAS and Excel using SAS University Edition. This paper is directed toward sending graphs to Excel using the new ODS EXCEL destination.

Read the paper (PDF)

Graphs are mathematical structures capable of representing networks of objects and their relationships. Clustering is an area in graph theory where objects are split into groups based on their connections. Depending on the application domain, object clusters have various meanings (for example, in market basket analysis, clusters are families of products that are frequently purchased together). This paper provides a SAS^® macro featuring PROC OPTGRAPH, which enables the transformation of transactional data, or any data with a many-to-many relationship between two entities, into graph data, allowing for the generation and application of the co-occurrence graph and the probability graph.
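The first transformation, from transactional rows to a co-occurrence graph, can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's macro: the input is taken to be (transaction ID, item) pairs, the function name is hypothetical, and the probability graph is not attempted here.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(rows):
    """Build weighted undirected edges from (transaction_id, item) rows.

    An edge (a, b) with weight n means items a and b appeared together
    in n transactions. Each edge key is stored with a < b.
    """
    baskets = {}
    for tid, item in rows:
        baskets.setdefault(tid, set()).add(item)
    edges = Counter()
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            edges[(a, b)] += 1
    return edges

# Invented market-basket rows.
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "eggs"),
    (3, "milk"), (3, "eggs"),
]
edges = cooccurrence_graph(rows)
```

The resulting edge list is the kind of link data a graph clustering routine takes as input; the same construction applies to any many-to-many relationship between two entities.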

Read the paper (PDF)

JMP^® integrates very nicely with SAS^® software, so you can do some pretty amazing things by combining the power of JMP and SAS. You can submit some code to run something on a SAS server and bring the results back as a JMP table. Then you can do lots of things with the JMP table to analyze the data returned. This workshop shows you how to access data via SAS servers, run SAS code and bring data back to JMP, and use JMP to do many things very quickly and easily. Explore the synergies between these tools; having both is a powerful combination that far outstrips just having one, or not using them together.

Read the paper (PDF) | Download the data file (ZIP)

More than ever, customers are demanding consistent and relevant interaction across all channels. Businesses are having to develop omnichannel marketing capabilities to please these customers. Implementing omnichannel marketing is often difficult, especially when using digital channels. Most products designed solely for digital channels lack capabilities to integrate with traditional channels that have on-premises processes and data. SAS^® Customer Intelligence 360 is a new offering that enables businesses to leverage both cloud and on-premises channels and data. This is possible due to the solution's hybrid cloud architecture. This paper discusses the SAS Customer Intelligence 360 approach to the hybrid cloud, and covers key capabilities on security, throughput, and integration.

Read the paper (PDF)

SAS^® Theme Designer provides a rich set of colors and graphs that enables customers to create custom application and report themes. Users can also preview their work within SAS^® Visual Analytics. The features of SAS Theme Designer enable users to bring a new look and feel to their entire application and to their reports. Users can customize their reports to use a unique theme across the organization, yet they have the ability to customize these reports based on their individual business requirements. Providing this capability involves meeting customers' demands from the theming perspectives of customization, branding, and logo, and making them seamless within their application. This paper walks users through the process of using SAS Theme Designer in SAS Visual Analytics. It further highlights the following features of SAS Theme Designer: creating and modifying application and report themes, previewing output in SAS Visual Analytics, and importing and exporting themes for reuse.

Read the paper (PDF)

Weight of evidence (WOE) coding of a nominal or discrete variable X is widely used when preparing predictors for use in binary logistic regression models. The concept of WOE is extended to ordinal logistic regression for the case of the cumulative logit model. If the target (dependent) variable has L levels, then L-1 WOE variables are needed to recode X. The appropriate setting for implementing WOE coding is the cumulative logit model with partial proportional odds. As in the binary case, it is important to bin X to achieve parsimony before the WOE coding. SAS^® code to perform this binning is discussed. An example is given that shows the implementation of WOE coding and binning for a cumulative logit model with the target variable having three levels.
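
To make the "L-1 WOE variables" idea concrete, here is a minimal Python sketch (not the paper's SAS code). It assumes one natural construction: for each of the L-1 cumulative splits of the target, collapse the target to "at or below the cut" versus "above the cut" and compute the usual binary WOE per bin of X. The paper's exact definition and sign convention may differ. The function name `cumulative_woe` and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def cumulative_woe(x_values, y_values, levels):
    """Return {cut_level: {bin: woe}} for each of the L-1 cumulative splits.

    For each cut level k (every level except the last), the target is
    collapsed to "y <= k" versus "y > k" and a binary WOE is computed per bin:
        woe = ln( P(bin | y <= k) / P(bin | y > k) )
    Assumes every bin has at least one observation on each side of each split.
    """
    rank = {level: i for i, level in enumerate(levels)}
    bins = sorted(set(x_values))
    result = {}
    for k in range(len(levels) - 1):
        low = defaultdict(int)
        high = defaultdict(int)
        for x, y in zip(x_values, y_values):
            if rank[y] <= k:
                low[x] += 1
            else:
                high[x] += 1
        n_low = sum(low.values())
        n_high = sum(high.values())
        result[levels[k]] = {
            b: math.log((low[b] / n_low) / (high[b] / n_high)) for b in bins
        }
    return result


# Hypothetical data: X has two bins, the target has three ordered levels.
x = ["a"] * 6 + ["b"] * 6
y = [1, 1, 1, 2, 2, 3] + [1, 2, 2, 3, 3, 3]

woe = cumulative_woe(x, y, levels=[1, 2, 3])
for cut, by_bin in woe.items():
    print(cut, {b: round(v, 3) for b, v in by_bin.items()})
```

With three target levels, the result holds two WOE codings of X, one per cumulative split, which matches the L-1 count stated in the abstract.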

Read the paper (PDF)

In the last few years, machine learning and statistical learning methods have gained increasing popularity among data scientists and analysts. Statisticians have sometimes been reluctant to embrace these methodologies, partly due to a lack of familiarity, and partly due to concerns with interpretability and usability. In fact, statisticians have a lot to gain by using these modern, highly computational tools. For certain types of problems, machine learning methods can be much more accurate predictors than traditional methods for regression and classification, and some of these methods are particularly well suited for the analysis of big and wide data. Many of these methods have origins in statistics or at the boundary of statistics and computer science, and some are already well established in statistical procedures, including LASSO and elastic net for model selection and cross validation for model evaluation. In this talk, I go through some examples illustrating the application of machine learning methods and interpretation of their results, and show how these methods are similar to, and differ from, traditional statistical methodology for the same problems.

Read the paper (PDF)

It has become a need-it-now world, and many managers and decision-makers need their reports and information quicker than ever before to compete. As SAS^® developers, we need to acknowledge this fact and write code that gets us the results we need in seconds or minutes, rather than in hours. SAS is a great tool for extracting, transforming, and loading data, but as with any tool, it is most efficient when used in the most appropriate way. Using the SQL pass-through techniques presented in this paper can reduce run time by up to 90% by passing the processing to the database instead of moving the data back to SAS to be consumed. You can reap these benefits with only a minor increase in coding difficulty.

Read the paper (PDF) | View the e-poster or slides (PDF)

The latest releases of SAS^® Data Management software provide a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types including Hadoop, cloud data sources, RDBMS, files, unstructured data, streaming, and others, and the ability to perform ETL and ELT transformations in diverse run-time environments including SAS^®, database systems, Hadoop, Spark, SAS^® Analytics, cloud, and data virtualization environments. There are also new capabilities for lineage, impact analysis, clustering, and other data governance features for enhancements to master data and support metadata management. This paper provides an overview of the latest features of the SAS^® Data Management product suite and includes use cases and examples for leveraging product capabilities.

Read the paper (PDF)

SAS^® Visual Analytics gives customers the power to quickly and easily make sense of any data that matters to them. SAS^® Visual Analytics 7.4 delivers requested enhancements to familiar features. These enhancements include dynamic text, custom geographical regions, improved PDF printing, and enhanced prompted filter controls. There are also enhancements to report parameters and calculated data items. This paper provides an overview of the latest features of SAS Visual Analytics 7.4, including use cases and examples for leveraging these new capabilities.

Whether you are a new SAS^® administrator or you are switching to a Linux environment, you have a complex mission. This job becomes even more formidable when you are working with a system like SAS^® Visual Analytics that requires multiple users loading data daily. Eventually a user has data issues or creates a disruption that causes the system to malfunction. When that happens, what do you do next? In this paper, we go through the basics of a SAS Visual Analytics Linux environment and how to troubleshoot the system when issues arise.

Read the paper (PDF)

Have you ever been working on a task and wondered whether there might be a SAS^® function that could save you some time, or even one that might do the work for you? Data review and validation tasks can be time-consuming efforts. Any gain in efficiency is highly beneficial, especially if you can achieve a standard level where the data itself can drive parts of the process. The ANY and NOT functions can help alleviate some of the manual work in many tasks such as data review of variable values, data compliance, data formats, and derivation or validation of a variable's data type. The list goes on. In this poster, we cover the functions and their details and use them in an example of handling date and time data and mapping it to ISO 8601 date and time formats.

Read the paper (PDF) | View the e-poster or slides (PDF)

For the past couple of years, it seems that big data has been a buzzword in the industry. We have more and more data coming in from more and more places, and it is our job to figure out how best to handle it. One way to attempt to organize data is with arrays, but what do you do when the array you are attempting to populate is so large that it cannot be handled in memory? How do you handle a large array when most of the elements are missing? This paper and presentation deal with the concept of a sparse matrix. A sparse matrix is a large array with relatively few actual elements. We address methods for handling such a construct while keeping memory, CPU, clock, and programmer time to their respective minimums.
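
The sparse-matrix idea can be sketched in a few lines of Python (not SAS). The abstract does not say which techniques the paper uses, so this is only one common approach: a dictionary-of-keys layout that stores the populated cells and treats everything else as missing. The class name `SparseMatrix` and the sample values are hypothetical.

```python
class SparseMatrix:
    """Dictionary-of-keys sparse matrix: only non-missing cells use memory."""

    def __init__(self, n_rows, n_cols, missing=None):
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.missing = missing
        self._cells = {}  # (row, col) -> value, populated cells only

    def _check(self, row, col):
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            raise IndexError(f"({row}, {col}) is outside the matrix")

    def __setitem__(self, key, value):
        row, col = key
        self._check(row, col)
        if value == self.missing:
            # Assigning the missing value frees the cell instead of storing it.
            self._cells.pop(key, None)
        else:
            self._cells[key] = value

    def __getitem__(self, key):
        row, col = key
        self._check(row, col)
        return self._cells.get(key, self.missing)

    def stored(self):
        """Number of cells that actually occupy memory."""
        return len(self._cells)


# A logical 1,000,000 x 1,000,000 array with only two populated cells.
m = SparseMatrix(1_000_000, 1_000_000)
m[12, 999_999] = 3.5
m[500_000, 7] = -1.0

print(m[12, 999_999])  # a populated cell
print(m[0, 0])         # an unpopulated cell returns the missing marker
print(m.stored())      # count of populated cells
```

Memory grows with the number of populated cells rather than with the declared dimensions, which is the property that makes a mostly missing array tractable.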

High-quality analytics works best with the best-quality data. Preparing your data ranges from activities like text manipulation and filtering to creating calculated items and blending data from multiple tables. This paper covers the range of activities you can easily perform to get your data ready. High-performance analytics works best with in-memory data. Getting your data into an in-memory server, as well as keeping it fresh and secure, are considerations for in-memory data management. This paper covers how to make small or large data available and how to manage it for analytics. You can choose to perform these activities in a graphical user interface or via batch scripts. This paper describes both ways to perform these activities. You'll be well prepared to get your data wrangled into shape for analytics!

Read the paper (PDF)

A SAS^® program with the extension .SAS is simply a text file. This fact opens the door to many powerful results. You can read a typical SAS program into a SAS data set as a text file with a character variable, with one line of the program being one record in the data set. The program's code can be changed, and a new program can be written as a simple text file with a .SAS extension. This presentation shows an example of dynamically editing SAS code on the fly and generating statistics about SAS programs.

Read the paper (PDF)

SAS^® has many methods of doing table lookups in DATA steps: formats, arrays, hash objects, the SASMSG function, indexed data sets, and so on. Of these methods, hash objects and indexed data sets enable you to specify multiple lookup keys and to return multiple table values. Both methods can be updated dynamically in the middle of a DATA step as you obtain new information (such as reading new keys from an input file or creating new synthetic keys). Hash objects are very flexible, fast, and fairly easy to use, but they are limited by the amount of data that can be held in memory. Indexed data sets can be slower, but they are not limited by what can be held in memory. As a result, they might be your only option in some circumstances. This presentation discusses how to use an indexed data set for table lookup and how to update it dynamically using the MODIFY statement and its allies.

Read the paper (PDF)

The SAS RAKING macro, introduced in 2000, has been implemented by countless survey researchers worldwide. The authors receive messages from users who tirelessly rake survey data using all three generations of the macro. In this poster, we present the fourth generation of the macro, cleaning up remnants from the previous versions, and resolving user-reported confusion. Most important, we introduce a few helpful enhancements including: 1) An explicit indicator for trimming (or not trimming) the weight that substantially saves run time when no trimming is needed. 2) Two methods of weight trimming, AND and OR, that enable users to overcome a stubborn non-convergence. When AND is indicated, weight trimming occurs only if both (individual and global) high weight cap values are true. Conversely, weight increase occurs only if both low weight cap values are true. When OR is indicated, weight trimming occurs if either of the two (individual or global) high weight cap values is true. Conversely, weight increase occurs if either of the two low weight cap values is true. 3) Summary statistics related to the number of cases with trimmed or increased weights have been expanded. 4) We introduce parameters that enable users to use different criteria of convergence for different raking marginal variables. We anticipate that these innovations will be enthusiastically received and implemented by the survey research community.

View the e-poster or slides (PDF)

Because databases often lack the extensive string-handling capabilities available in SAS^®, SAS users are often forced to extract complex character data from the database into SAS for string manipulation. As database vendors make regular expression functionality more widely available for use in SQL, the need to move data into SAS for pattern matching, string replacement, and character extraction arises less often. This paper covers enough regular expression patterns to make you dangerous, demonstrates the various REGEXP SQL functions, and provides practical applications for each.

Read the paper (PDF)

How often have you pulled oodles of data out of the corporate data warehouse down into SAS^® for additional processing? Additional processing, sometimes thought to be unique to SAS, includes FIRST. logic, cumulative totals, lag functionality, specialized summarization, and advanced date manipulation. Using the Analytical/OLAP and Windowing functionality available in many databases (for example, Teradata and Netezza), all of this processing can be performed directly in the database without moving and reprocessing detail data unnecessarily. This presentation illustrates how to increase your coding and execution efficiency by using the database's power through your SAS environment.
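
As a small illustration of in-database windowing, the Python sketch below uses SQLite as a stand-in for a warehouse such as Teradata or Netezza. It is not the presentation's code and does not show how the query would be submitted from SAS. The `sales` table and its rows are hypothetical, and the window functions require SQLite 3.25 or later. One query produces a FIRST.-style flag, a cumulative total, and a lagged value per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("A", "2017-01-05", 100.0),
        ("A", "2017-02-10", 50.0),
        ("A", "2017-03-15", 25.0),
        ("B", "2017-01-20", 200.0),
        ("B", "2017-02-25", 75.0),
    ],
)

# One pass in the database: first-row flag, running total, and previous value,
# each computed within a customer and ordered by sale date.
query = """
SELECT customer,
       sale_date,
       amount,
       (ROW_NUMBER() OVER w = 1) AS is_first,
       SUM(amount) OVER w        AS running_total,
       LAG(amount) OVER w        AS previous_amount
FROM sales
WINDOW w AS (PARTITION BY customer ORDER BY sale_date)
ORDER BY customer, sale_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the summarized rows leave the database; the detail-level sorting and accumulation happen where the data lives.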