SAS Global Forum 2017 Proceedings

Automatic loading, tracking, and visualization of data readiness in SAS^® Visual Analytics is easy when you combine SAS^® Data Integration Studio with the DATASET and LASR procedures. This paper illustrates the simple method that the University of North Carolina at Chapel Hill (Enterprise Reporting and Departmental Systems) uses to automatically load tables into the SAS^® LASR Analytic Servers, and then store reportable data about the HDFS tables created, the LASR tables loaded, and the ETL job execution times. This methodology gives the department the ability to longitudinally visualize system loading performance and identify changes in system behavior, as well as providing a means of measuring how well we are serving our customers over time.

Read the paper (PDF)

Creating sophisticated, visually stunning reports is imperative in today's business environment, but is your fancy report really accessible to all? Let's explore some simple enhancements that the fourth maintenance release of SAS^® 9.4 made to Output Delivery System (ODS) layout and the Report Writing Interface that will truly empower you to accommodate people who use assistive technology. ODS now provides the tools for you to meet Section 508 compliance and to create an engaging experience for all who consume your reports.

Read the paper (PDF)

When a large and important project with a strict deadline hits your desk, it's easy to revert to those tried-and-true SAS^® programming techniques that have been successful for you in the past. In fact, trying to learn new techniques at such a time can prove to be distracting and a waste of precious time. However, the lull after a project's completion is the perfect time to reassess your approach and see whether there are any new features added to the SAS arsenal since the last time you looked that could be of great use the next time around. Such a post-project post-mortem has provided me with the opportunity to learn about several new features that will prove to be hugely valuable in the next release of my project. For example: 1) The PRESENV option and procedure 2) Fuzzy matching with the COMPGED function 3) The ODS POWERPOINT statement 4) SAS^® Enterprise Guide^® enhancements, including copying and pasting process flows and the SAS Macro Variable Viewer

Read the paper (PDF)

This paper introduces a macro that can generate the Keyhole Markup Language (KML) files for U.S. states and counties. The generated KML files can be used directly by Google Maps to add customized state and county layers with user-defined colors and transparencies. When someone clicks on the state and county layers in Google Maps, customized information is shown. To use the macro, the user needs to prepare only a simple SAS^® input data set. The paper includes all the SAS code for the macro and provides examples that show you how to use the macro as well as how to display the KML files in Google Maps.

Read the paper (PDF)

Time-to-event (survival) analysis is used frequently, especially in the biomedical and pharmaceutical fields. SAS^® provides the LIFETEST procedure to calculate Kaplan-Meier estimates of the survival function and to draw a survival plot. The PHREG procedure is used in Cox regression models to estimate the effect of predictors on hazard rates. Programs that use the ODS tables defined by PROC LIFETEST and PROC PHREG can provide more statistical information from the generated data sets. This paper provides a macro that uses PROC LIFETEST and PROC PHREG with ODS. It helps users to produce a survival plot with estimates that include the subjects at risk, events and total subject number, survival rate with median and 95% confidence interval, and hazard ratio estimates with 95% confidence interval. Some of these estimates are optional in the macro, so users can select what they need to display in the output. (Subjects at risk, events, and subject number are not optional.) Users can also specify the tick marks for the X-axis and the subjects-at-risk table, for example, every 10 or 20 units. The macro dynamically calculates the maximum for the X-axis and uses the interval that the user specified. Finally, the macro uses ODS, so its output can be written to a variety of file formats, including JPG, PDF, and RTF.

Read the paper (PDF) | View the e-poster or slides (PDF)

Duplicates in a clinical trial or survey database could jeopardize data quality and integrity, and they can induce biased analysis results. These complications often happen in clinical trials, meta-analyses, and registry and observational studies. Common practice to identify possible duplicates involves sensitive personal information, such as name, Social Security number (SSN), date of birth, address, telephone number, etc. However, access to this sensitive information is limited. Sometimes, it is even restricted. As a measure of data quality control, a SAS^® program was developed to identify duplicated individuals using non-sensitive information, such as age, gender, race, medical history, vital signs, and laboratory measurements. A probabilistic approach was used by calculating weights for data elements used to identify duplicates based on two probabilities (probability of agreement for an element among matched pairs and probability of agreement purely by chance among non-matched pairs). For elements with categorical values, agreement was defined as matching pairs sharing the same value. For elements with interval values, agreement was defined as matching values within 1% of measurement precision range. Probabilities used to compute matching element weights were estimated using an expectation-maximization (EM) algorithm. The method was then tested on survey and clinical trial data from hypertension studies.
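
The abstract above does not give the weight formula, so the following is only a minimal sketch, written in Python rather than SAS, of one common way such weights are formed from the two probabilities it describes (the log-ratio form usually attributed to Fellegi and Sunter). The function names and the example probabilities are hypothetical; in the paper the probabilities are estimated with an EM algorithm, which is not shown here.

```python
import math


def agreement_weights(m: float, u: float) -> tuple[float, float]:
    """Return (agree_weight, disagree_weight) for one data element.

    m: probability that the element agrees among truly matched pairs.
    u: probability that the element agrees purely by chance among
       non-matched pairs.
    Assumption: log2 ratio weights, not confirmed by the abstract.
    """
    if not (0.0 < m < 1.0 and 0.0 < u < 1.0):
        raise ValueError("m and u must be strictly between 0 and 1")
    return math.log2(m / u), math.log2((1.0 - m) / (1.0 - u))


def pair_score(agreements: dict[str, bool],
               probs: dict[str, tuple[float, float]]) -> float:
    """Sum the element weights for one candidate pair of records."""
    total = 0.0
    for element, agrees in agreements.items():
        m, u = probs[element]
        w_agree, w_disagree = agreement_weights(m, u)
        total += w_agree if agrees else w_disagree
    return total


# Hypothetical probabilities for two non-sensitive elements.
example_probs = {"gender": (0.98, 0.50), "race": (0.95, 0.30)}
example_score = pair_score({"gender": True, "race": False}, example_probs)
```

A higher total score suggests that the two records are more likely to describe the same individual; where to place the cutoff is a separate decision that this sketch does not address.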

View the e-poster or slides (PDF)

Visualization is a critical part of turning data into knowledge. A customized graph is essential to make data visualization meaningful, powerful, and interpretable. Furthermore, customizing grouped data into a desired layout with specific requirements such as clusters, colors, symbols, and patterns for each group can be challenging. This paper provides a start-from-scratch, step-by-step solution to create a customized graph for grouped data using the Graph Template Language (GTL). From analyzing the data to creating the target graph with the tools and options that are available with GTL, this paper demonstrates that GTL is a powerful and flexible tool for creating a customized, complex graph.

Read the paper (PDF)

SAS^® functions provide amazing power to your DATA step programming. Some of these functions are essential; some of them help you avoid writing volumes of unnecessary code. This talk covers some of the most useful SAS functions. Some of these functions might be new to you, and they will change the way you program and approach common programming tasks. The majority of the functions described in this talk work with character data. There are functions that search for strings, and others that can find and replace strings or join strings together. Still others can measure the spelling distance between two strings (useful for 'fuzzy' matching). Some of the newest and most amazing functions are not functions at all, but CALL routines. Did you know that you can sort values within an observation? Did you know that not only can you identify the largest or smallest value in a list of variables, but you can identify the second- or third- or nth-largest or smallest value? A knowledge of the functions described here will make you a much better SAS programmer.
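
To make "spelling distance" and "nth-largest value" concrete, here is a small sketch in Python rather than SAS. It computes the plain Levenshtein edit distance, which is one simple spelling-distance measure; the abstract does not say which SAS functions the talk covers, and the function names below are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def nth_largest(values: list[float], n: int) -> float:
    """Return the nth-largest value (n = 1 is the maximum)."""
    if not 1 <= n <= len(values):
        raise ValueError("n is out of range")
    return sorted(values, reverse=True)[n - 1]
```

A small distance between two names, relative to their lengths, is a hint that they may be the same value spelled differently.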

Read the paper (PDF)

Accelerate your data preparation by having your DS2 execute without translation inside the Teradata database or on the Hadoop platform with SAS^® Code Accelerator. This presentation shows how easy it is to use SAS Code Accelerator via a live demonstration.

Read the paper (PDF)

Are you tired of copying PROC FREQ or PROC MEANS output and pasting it into your tables? Do you need to produce summary tables repeatedly? Are you spending a lot of your time generating the same summary tables for different subpopulations? This paper introduces an easy-to-use macro to generate a descriptive statistics table. The table reports counts and percentages for categorical variables, and means, standard deviations, medians, and quantiles for continuous variables. For variables with missing values, the table also includes the count and percentage missing. Customization options allow for the analysis of stratified data, specification of variable output order, and user-defined formats. In addition, this macro incorporates the SAS^® Output Delivery System (ODS) to automatically produce a Rich Text Format (RTF) file, which can be further edited by a word processor for the purpose of publication.

Read the paper (PDF) | View the e-poster or slides (PDF)

UNIX and Linux SAS^® administrators, have you ever been greeted by one of these statements as you walk into the office before you have gotten your first cup of coffee? "Power outage!" "SAS servers are down." "I cannot access my reports." Have you frantically tried to restart the SAS servers to avoid loss of productivity and missed one of the steps in the process, causing further delays while other work continues to pile up? If you have had this experience, you understand the benefit to be gained from a utility that automates the management of these multi-tiered deployments. Until recently, there was no method for automatically starting and stopping multi-tiered services in an orchestrated fashion. Instead, you had to use time-consuming manual procedures to manage SAS services. These procedures were also prone to human error, which could result in corrupted services and additional time lost debugging and resolving issues introduced by this process. To address this challenge, SAS Technical Support created the SAS Local Services Management (SAS_lsm) utility, which provides automated, orderly management of your SAS^® multi-tiered deployments. The intent of this paper is to demonstrate the deployment and usage of the SAS_lsm utility. Now, go grab a coffee, and let's see how SAS_lsm can make life less chaotic.

Read the paper (PDF)

Correlated data is extensively used across disciplines when modeling data with any type of correlation that might exist among observations due to clustering or repeated measurements. When modeling clustered data, hierarchical linear modeling (HLM) is a popular multilevel modeling technique that is widely used in different fields such as education and health studies (Gibson and Olejnik, 2003). A typical example of multilevel data involves students nested within classrooms that behave similarly due to shared situational factors. Ignoring their correlation might result in underestimated standard errors and inflated type-I error (Raudenbush and Bryk, 2002). When modeling longitudinal data, many studies have been conducted on continuous outcomes. However, fewer studies on discrete responses over time have been completed. These studies require models within conditional, transitional, and marginal models (Fitzmaurice et al., 2009). Examples of such models that enable researchers to account for the autocorrelation among repeated observations include generalized linear mixed model (GLMM), generalized estimating equations (GEE), alternating logistic regression (ALR), and fixed effects with conditional logit analysis. This study explores the aforementioned methods as well as several other correlated modeling options for longitudinal and hierarchical data within SAS^® 9.4 using real data sets. These procedures include PROC GLIMMIX, PROC GENMOD, PROC NLMIXED, PROC GEE, PROC PHREG, and PROC MIXED.

Read the paper (PDF)

I have come up with a way to use the output of the SCAPROC procedure to produce DOT directives, which are then put through the Graphviz engine to produce a diagram. This allows the production of flowcharts of SAS^® code automatically. I have enhanced the charts to also show the longest steps by run time, so even if you look at thousands of steps in a complex program, you can easily see the structure and flow of it, and where most of the time is spent, just by having a look for a few seconds. Great for documentation, benchmarking, tuning, understanding, and more.
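
As a rough illustration of the idea, the sketch below (in Python, not SAS) turns a list of steps and their dependencies into DOT text that Graphviz can render, and highlights the longest-running step. The input structure is hypothetical and does not reflect the actual output format of PROC SCAPROC or the author's implementation.

```python
def steps_to_dot(steps: dict[str, float],
                 edges: list[tuple[str, str]]) -> str:
    """Build a DOT digraph.

    steps: step name -> run time in seconds (hypothetical input shape).
    edges: (upstream step, downstream step) pairs.
    The longest-running step is filled with a highlight color.
    """
    longest = max(steps.values()) if steps else 0.0
    lines = ["digraph sas_flow {"]
    for name, seconds in steps.items():
        style = ""
        if longest > 0 and seconds == longest:
            style = ', style=filled, fillcolor="lightcoral"'
        # "\\n" emits a literal backslash-n, which DOT reads as a line break.
        lines.append(f'  "{name}" [label="{name}\\n{seconds:.1f}s"{style}];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = steps_to_dot({"load": 2.0, "report": 10.5}, [("load", "report")])
```

The resulting text can be saved to a file and passed to the Graphviz `dot` command to produce an image.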

Read the paper (PDF)

Let's walk through an example of communicating from the SAS^® client to SAS^® Viya. The demonstration focuses on how to use SAS^® language to establish a session, transport and persist data, and receive results. Learn how to establish communication with SAS Viya. Explore topics such as: What is a session? How do I make requests? What does my SAS log tell me? Get a deeper understanding of data location on the client and the server side. Learn about applying existing user formats, how to get listings or reports, and how to query sessions, data, and properties.

Read the paper (PDF)

Nearly every SAS^® program includes logic that causes certain code to be executed only when specific conditions are met. This is commonly done using the IF-THEN/ELSE syntax. This paper explores various ways to construct conditional SAS logic, some of which might provide advantages over the IF statement in certain situations. Topics include the SELECT statement, the IFC and IFN functions, and the CHOOSE and WHICH families of functions, as well as some more esoteric methods. We also discuss the intricacies of the subsetting IF statement and explain the difference between a regular IF and the %IF macro statement.

Read the paper (PDF)

Soon after the advent of the SAS^® hash object in SAS^®9, its early adopters realized that its potential functionality is much broader than merely using its fast table lookup capability for file matching. This is because in reality, the hash object is a versatile data storage structure with a roster of standard table operations such as create, drop, insert, delete, clear, search, retrieve, update, order, and enumerate. Since it is memory-resident and its key-access operations execute in O(1) time, it runs them as fast as or faster than other canned SAS techniques, with the added bonus of not having to code around their inherent limitations. Another advantage of the hash object as compared to the methods that had existed before its implementation is its dynamic, run-time nature and the ability to handle I/O all by itself, independently of the intrinsic statements of a DATA step or DS2 program calling its methods. The hash object operations, or combinations thereof, lend themselves to diverse SAS programming functionalities well beyond the original focus on data search and retrieval. In this paper, which can be thought of as a preview of a SAS book being written by the authors, we aim to present this logical connection using the power of example.

Read the paper (PDF)

SAS^® is perfect for building enterprise apps. Think about it: SAS speaks to almost any database you can think of and is probably already hooked in to most of the data sources in your organization. A full-fledged metadata security layer happens to already be integrated with your single sign-on authentication provider, and every time a user interacts with the system, their permissions are checked and the data their app asks for is automatically encrypted. SAS ticks all the boxes required by IT, and the skills required to start developing apps already sit within your department. Your team most likely already knows what your app needs to do, so instead of writing lists of requirements, give them an HTML5 resource, and together they can write and deploy the back-end code themselves. The apps run in the browser, the server-side logic is deployed using SAS .spk packages, and permissions are managed via SAS^® Management Console. Best of all, the infrastructure that would normally take months to integrate is already there, eliminating barriers to entry and letting you demonstrate the value of your solution to internal customers with zero up-front investment. This paper shows how SAS integrates with open-source tools like H54S, AngularJS, and PostGIS, together with next-generation developer-centric analytical platforms like SAS^® Viya, to build secure, enterprise-class apps that can support thousands of users. This presentation includes lots of app demos. This presentation was included at SAS^® Forum UK 2016.

Read the paper (PDF)

Cascading Style Sheets (CSS) frameworks like Bootstrap, and JavaScript libraries such as jQuery and h54s, have made it faster than ever before to develop enterprise-grade web apps on top of the SAS^® platform. Hailing the benefits of using SAS as a back end (authentication, security, ease of data access), this paper navigates the configuration issues to consider for maximum responsiveness to client web requests (pooled sessions, load balancing, multibridge connections). Cherry picking from the whirlwind of front end technologies and approaches, the author presents a framework that enables the novice programmer to build a simple web app in minutes. The exact steps necessary to achieve this are described, alongside a hurricane of practical tips like the following: dealing with CORS; logging in SAS; debugging AJAX calls; and SAS http responses. Beware this approach is likely to cause a storm of demand in your area! Server requirements: SAS^® Business Intelligence Platform (SAS^® 9.2 or later); SAS^® Stored Process Web Application (SAS^® Integration Technologies). Client requirements: HTML5 browser (Microsoft Internet Explorer 8 or later); access to open-source libraries (which can be hosted on-premises if Internet access is an issue).

Read the paper (PDF)

The SAS^® Macro Language gives you the power to create tools that, to a large extent, think for themselves. How often have you used a macro that required your input, and you thought to yourself, "Why do I need to provide this information when SAS^® already knows it?" SAS might already know most of this information, but how does SAS direct your macro programs to self-discern the information that they need? Fortunately, there are a number of functions and tools in SAS that can intelligently enable your programs to find and use the information that they require. If you provide a variable name, SAS should know its type and length. If you provide a data set name, SAS should know its list of variables. If you provide a library or libref, SAS should know the full list of data sets that it contains. In each of these situations, functions can be used by the macro language to determine and return information. By providing a libref, functions can determine the library's physical location and the list of data sets it contains. By providing a data set, they can return the names and attributes of any of the variables that it contains. These functions can read and write data, create directories, build lists of files in a folder, and build lists of folders. Maximize your macro's intelligence; learn and use these functions.

Read the paper (PDF)

In the pharmaceutical industry, we find ourselves having to re-run our programs repeatedly for each deliverable. These programs can be run individually in an interactive SAS^® session, which enables us to review the logs as we execute the programs. We could run the individual programs in batch and open each individual log to review for unwanted log messages, such as "ERROR", "WARNING", "uninitialized", "have been converted to", and so on. Both of these approaches are fine if there are only a handful of programs to execute. But what do you do if you have hundreds of programs that need to be re-run? Do you want to open every single one of the programs and search for unwanted messages? This manual approach could take hours and is prone to accidental oversight. This paper discusses a macro that searches a specified directory and checks either all the logs in the directory, only logs with a specific naming convention, or only the files listed. The macro then produces a report that lists all the files checked and indicates whether issues were found.
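
The core of such a log check can be sketched in a few lines. The example below is in Python rather than the SAS macro language and works on log text held in memory; reading the files from a directory and filtering by naming convention are left out. The pattern list mirrors the messages named in the abstract, and everything else (function names, report shape) is a hypothetical simplification.

```python
import re

# Messages named in the abstract; extend the list as needed.
PATTERNS = [r"^ERROR", r"^WARNING", r"uninitialized", r"have been converted to"]
UNWANTED = re.compile("|".join(PATTERNS))


def check_log(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for every line with an unwanted message."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if UNWANTED.search(line)]


def report(logs: dict[str, str]) -> dict[str, int]:
    """Map each log name to the number of flagged lines (0 means clean)."""
    return {name: len(check_log(text)) for name, text in logs.items()}


sample_log = "\n".join([
    "NOTE: The data set WORK.A has 10 observations.",
    "ERROR: File WORK.B.DATA does not exist.",
    "NOTE: Variable x is uninitialized.",
    "NOTE: Character values have been converted to numeric values.",
    "WARNING: The data set WORK.C may be incomplete.",
])
summary = report({"a.log": sample_log, "b.log": "NOTE: all good"})
```

Listing every checked file, including the clean ones, matches the abstract's point that the report should show what was checked as well as where issues were found.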

Read the paper (PDF)

The LUA procedure is a relatively new SAS^® procedure, having been available since SAS^® 9.4. It allows for the Lua language to be used as an interface to SAS, as an alternative scripting language to the SAS macro facility. This paper compares and contrasts PROC LUA with the SAS macro facility, showing examples of approaches and highlighting the advantages and disadvantages of each.

Read the paper (PDF)

This paper shows how to use Base SAS^® to create unique datetime stamps that can be used for naming external files. These filenames provide automatic versioning for systems and are intuitive and completely sortable. In addition, they provide enhanced flexibility compared to generation data sets, which can be created by SAS^® or by the operating system.
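
The key property is that a fixed-width, year-first timestamp sorts alphabetically in the same order as it sorts chronologically. The sketch below shows that idea in Python rather than Base SAS; the name layout (`stem_YYYYMMDDTHHMMSS.ext`) is an assumption for illustration, not the format used in the paper.

```python
from datetime import datetime


def stamped_name(stem: str, ext: str, when: datetime) -> str:
    """Build a file name whose alphabetical order matches time order.

    Year-first, zero-padded fields keep every stamp the same width.
    """
    return f"{stem}_{when:%Y%m%dT%H%M%S}.{ext}"


earlier = stamped_name("report", "csv", datetime(2017, 4, 2, 9, 5, 7))
later = stamped_name("report", "csv", datetime(2017, 11, 30, 18, 0, 0))
```

If two files can be written within the same second, a finer-grained field or a sequence suffix would be needed to keep the names unique.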

Read the paper (PDF)

Clinical research study enrollment data consists of subject identifiers and enrollment dates that are used by investigators to monitor enrollment progress. Meeting study enrollment targets is critical to ensuring there will be enough data and end points to enable the statistical power of the study. For clinical trials that do not experience heavy, nearly daily enrollment, there will be a number of dates on which no subjects were enrolled. Therefore, plots of cumulative enrollment represented by a smoothed line can give a false impression, or imprecise reading, of study enrollment. A more accurate display would be a step function plot that would include dates where no subjects were enrolled. Rolling average plots often start with summing the data by month and creating a rolling average from the monthly sums. This session shows how to use the EXPAND procedure, along with the SQL and GPLOT procedures and the INTNX function, to create plots that display cumulative enrollment and rolling 6-month averages for each day. This includes filling in the dates with no subject enrollment and creating a rolling 6-month average for each date. This allows analysis of day-to-day variation as well as the short- and long-term impacts of changes, such as adding an enrollment center or initiatives to increase enrollment. This technique can be applied to any data that has gaps in dates. Examples include service history data and installation rates for a newly launched product.
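
The gap-filling step is the part that makes a step plot possible: every calendar day gets a row, even when nobody enrolled. The sketch below illustrates this in Python rather than with PROC EXPAND, PROC SQL, and INTNX. It uses a fixed trailing window measured in days as a stand-in for the 6-month window in the abstract, which is an assumption made to keep the example short.

```python
from datetime import date, timedelta


def daily_enrollment(enroll_dates: list[date],
                     window_days: int = 182) -> list[tuple[date, int, float]]:
    """Return one (day, cumulative count, trailing daily average) row per
    calendar day from the first to the last enrollment date.

    Days with no enrollment are filled in with a daily count of zero.
    The trailing average covers at most window_days days, ending on the
    current day.
    """
    if not enroll_dates:
        return []
    counts: dict[date, int] = {}
    for d in enroll_dates:
        counts[d] = counts.get(d, 0) + 1

    rows = []
    daily: list[int] = []
    cumulative = 0
    day, last = min(counts), max(counts)
    while day <= last:
        enrolled_today = counts.get(day, 0)  # zero on gap days
        cumulative += enrolled_today
        daily.append(enrolled_today)
        window = daily[-window_days:]
        rows.append((day, cumulative, sum(window) / len(window)))
        day += timedelta(days=1)
    return rows


example_rows = daily_enrollment(
    [date(2017, 1, 1), date(2017, 1, 1), date(2017, 1, 4)], window_days=2)
```

Plotting the cumulative column as a step function then shows flat segments on days with no enrollment instead of a misleading smooth line.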

Read the paper (PDF)

The DATA step is the familiar and powerful data processing language in SAS^® and now SAS Viya. The DATA step's simple syntax provides row-at-a-time operations to edit, restructure, and combine data. New to the DATA step in SAS Viya are a varying-size character data type and parallel execution. Varying-size character data enables intuitive string operations that go beyond the 32KB limit of current DATA step operations. Parallel execution speeds the processing of big data by starting the DATA step on multiple machines and dividing data processing among threads on these machines. To avoid multi-threaded programming errors, the run-time environment for the DATA step is presented along with potential programming pitfalls. Come see how the DATA step in SAS Viya makes your data processing simpler and faster.

Read the paper (PDF)

The DATA Step has served SAS^® programmers well over the years. Although the DATA step is handy, the new, exciting, and powerful DS2 provides a significant alternative to the DATA step by introducing an object-oriented programming environment. It enables users to effectively manipulate complex data and efficiently manage the programming through additional data types, programming structure elements, user-defined methods, and shareable packages, as well as providing threaded execution. This tutorial was developed based on our experiences with getting started with DS2 and learning to use it to access, manage, and share data in a scalable and standards-based way. It facilitates SAS users of all levels to easily get started with DS2 and understand its basic functionality by practicing how to use the features of DS2.

Read the paper (PDF) | Download the data file (ZIP)

Users want more power. SAS^® delivers. Data grids are a new data type available to users of SAS^® Business Rules Manager and SAS^® Decision Manager. These data grids can be deployed to both batch and web service scoring for data mining models and business decisions. Users will learn how to construct data with grid data types, create business rules using high-level expressions, and deploy decisions to both batch and web services for scoring.

Read the paper (PDF)

The SAS^® 9.4 SGPLOT procedure is a great tool for creating all types of graphs, from business graphs to complex clinical graphs. The goal for such graphs is to convey the data in a simple and direct manner with minimal distractions. But often, you need to grab the attention of a reader in the midst of a sea of data and graphs. For such cases, you need a visual that can stand out above the rest of the noise. Such visuals insert a decorative flavor into the graph to attract the eye of the reader and to encourage them to spend more time studying the visual. This presentation discusses how you can create such attention-grabbing visuals using the SGPLOT procedure.

Read the paper (PDF)

Standard SAS^® Studio tasks already include many advanced analytic procedures for data mining and other high-performance models, enabling point-and-click generation and execution of SAS^® code. However, you can extend the power of tasks by creating tasks of your own to enable point-and-click access to the latest SAS statistical procedures, to your own default model definitions, or to your previously developed SAS/STAT^® or SAS macro code. Best of all, these point-and-click tasks can be developed directly in SAS Studio without the need to compile binaries or build DLL files using third-party software. In this paper, we demonstrate three approaches to developing custom tasks. First, we build a custom task to provide point-and-click access to PROC IRT, including recently added functionality to PROC IRT used to analyze educational test and opinion survey data. Second, we build a custom task that calls a macro for previously developed SAS code, and we show how point-and-click options can be set up to allow users to guide the execution of complex macro code. Third, we demonstrate just enough of the underlying Apache Velocity Template Language code to enable developers to take advantage of the benefits of that language to support their SAS process. Finally, we show how these tasks can easily be shared with a user community, increasing the efficiency of analytic modeling across the organization.

Read the paper (PDF)

Hash objects have been supported in the DATA step and in the FCMP procedure for a while, but have you ever felt that hash objects could do a little more? For example, what if you needed to store more than doubles and character strings? Introducing PROC FCMP dictionaries. Dictionaries allow you to create references not only to numeric and character data, but they also give you fast in-memory hashing to arrays, other dictionaries, and even PROC FCMP hash objects. This paper gets you started using PROC FCMP dictionaries, describes usage syntax, and explores new programming patterns that are now available to your PROC FCMP programs, functions, and subroutines in the new SAS^® Viya platform environment.

Read the paper (PDF)

Until recently, psychometric analyses of test data within the Item Response Theory (IRT) framework were conducted using specialized, commercial software. However, with the inclusion of the IRT procedure in the suite of SAS^® statistical tools, SAS users can explore the psychometric properties of test items using modern test theory or IRT. Considering the item as the unit of analysis, the relationship between test items and the constructs they measure can be modeled as a function of an unobservable or latent variable. This latent variable or trait (for example, ability or proficiency) varies in the population. However, when examinees having the same trait level do not have the same probability of answering correctly or endorsing an item, we say that such an item might be functioning differently or exhibiting differential item functioning or DIF (Thissen, Steinberg, and Wainer, 2012). This study introduces the implementation of PROC IRT for conducting a DIF analysis for graded responses, using Samejima's graded response model (GRM; Samejima, 1969, 2010). The effectiveness of PROC IRT for evaluation of DIF items is assessed in terms of the Type I error and statistical power of the likelihood ratio test for testing DIF in graded responses.

Creating an effective style for your graphics can make the difference between clearly conveying your message to your audience and hiding your message in a sea of lines, markers, and text. A number of books explain the concepts of effective graphics, but you need an understanding of how styles work in your environment to correctly apply those principles. The goal of this paper is to give you an in-depth discussion of how styles are applied to Output Delivery System (ODS) graphics, from the ODS style level all the way down to the graph syntax. This discussion includes information about differences in grouped versus non-grouped plots, precedence order of style application, using style references, and much more. Don't forget your scuba gear!

Read the paper (PDF)

Some data is best visualized in a polar orientation, particularly when the data is directional or cyclical. Although the SG procedures and Graph Template Language (GTL) do not directly support polar coordinates, they are quite capable of drawing such graphs with a little bit of data processing. We demonstrate how to convert your data from polar coordinates to Cartesian coordinates and use the power of SG procedures to create graphs that retain the polar nature of your data. Stop going around in circles: let us show you the way out with SG procedures!
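
The conversion itself is the standard one: x = r cos(theta) and y = r sin(theta). The sketch below shows it in Python rather than in a SAS DATA step. The `compass` option, which treats 0 degrees as north and measures clockwise (as is common for directional data such as wind bearings), is an added illustration and is not taken from the paper.

```python
import math


def polar_to_cartesian(r: float, theta_degrees: float,
                       compass: bool = False) -> tuple[float, float]:
    """Convert a polar point to Cartesian (x, y).

    compass=False: mathematical convention, 0 degrees on the positive
                   x-axis, angles measured counterclockwise.
    compass=True:  bearing convention, 0 degrees is north (positive
                   y-axis), angles measured clockwise.
    """
    theta = math.radians(theta_degrees)
    if compass:
        return r * math.sin(theta), r * math.cos(theta)
    return r * math.cos(theta), r * math.sin(theta)
```

Once every observation has x and y values, an ordinary scatter or series plot on equated axes retains the circular shape of the data.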

Read the paper (PDF)

The Base SAS^® 9.4 Output Delivery System (ODS) EPUB destination enables users to deliver SAS^® reports as e-books on Apple mobile devices. ODS EPUB e-books are truly mobile: you don't need an Internet connection to read them. Just install Apple's free iBooks app, and you're good to go. This paper shows you how to create an e-book with ODS EPUB and sideload it onto your Apple device. You will learn new SAS^® 9.4 techniques for including text, images, audio, and video in your ODS EPUB e-books. You will understand how to customize your e-book's table of contents (TOC) so that readers can easily navigate the e-book. And you will learn how to modify the ODS EPUB style to create specialized presentation effects. This paper provides beginning to intermediate instruction for writing e-books with ODS EPUB. Please bring your iPad, iPhone, or iPod to the presentation so that you can download and read the examples.

Read the paper (PDF)

The DOCUMENT procedure is a little-known procedure that can save you vast amounts of time and effort when managing the output of your SAS^® programming efforts. This procedure is deeply associated with the mechanism by which SAS controls output in the Output Delivery System (ODS). Have you ever wished you didn't have to modify and rerun the report-generating program every time there was some tweak in the desired report? PROC DOCUMENT enables you to store one version of the report as an ODS Document Object and then call it out in many different output forms, such as PDF, HTML, listing, RTF, and so on, without rerunning the code. Have you ever wished you could extract those pages of the output that apply to certain BY variables such as State, StudentName, or CarModel? With PROC DOCUMENT, you have WHERE capabilities to extract these. Do you want to customize the table of contents that assorted SAS procedures produce when you make frames for the table of contents with HTML, or use the facilities available for PDF? PROC DOCUMENT enables you to get to the inner workings of ODS and manipulate them. This paper addresses PROC DOCUMENT from the viewpoint of end results, rather than providing a complete technical review of how to do the task at hand. The emphasis is on the benefits of using the procedure, not on detailed mechanics.

Read the paper (PDF) | View the e-poster or slides (PDF)

Finding daylight saving time (DST) is a common task for manipulating time series data. The date of daylight saving time changes every year. If SAS^® programmers depend on manually entering the value of daylight saving time in their programs, the maintenance of the program becomes tedious. Using a SAS function can make finding the value easy. This paper discusses several ways to capture and use daylight saving time.
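
The kind of rule-based lookup this abstract describes can be sketched as follows. This is a Python illustration, not the SAS code from the paper. It assumes the United States rule in force since 2007 (DST starts on the second Sunday in March and ends on the first Sunday in November), which the abstract does not state, and the function names `nth_weekday_of_month` and `us_dst_bounds` are hypothetical.

```python
from datetime import date, timedelta

SUNDAY = 6  # Python's date.weekday() numbers Monday as 0 and Sunday as 6


def nth_weekday_of_month(year, month, weekday, n):
    """Return the date of the n-th occurrence of a weekday in a given month."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first occurrence of the weekday.
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def us_dst_bounds(year):
    """Return (start, end) dates of US daylight saving time.

    Assumption: the post-2007 US rule applies (second Sunday in March,
    first Sunday in November). Other countries and earlier years differ.
    """
    start = nth_weekday_of_month(year, 3, SUNDAY, 2)
    end = nth_weekday_of_month(year, 11, SUNDAY, 1)
    return start, end
```

Calling `us_dst_bounds(year)` for each year in a time series removes the need to hard-code the changeover dates.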

Read the paper (PDF)

U.S. stock exchanges (currently there are 12) are tracked in real time via the Consolidated Trade System (CTS) and the Consolidated Quote System (CQS). CQS contains every updated quote from each of these exchanges, covering some 8,500 stock tickers. It provides the basis by which brokers can honor their fiduciary obligation to investors to execute transactions at the best price, that is, at the National Best Bid or Best Offer (NBBO). With the advent of electronic exchanges and high-frequency trading (timestamps are published to the nanosecond), data set size (approaching 1 billion quotes requiring 80 gigabytes of storage for a normal trading day) has become a major operational consideration for market behavior researchers re-creating NBBO values from quotes. This presentation demonstrates a straightforward use of hash tables for tracking constantly changing quotes for each ticker/exchange combination to provide the NBBO for each ticker at each time point in the trading day.
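
As a rough illustration of the idea in this abstract, the sketch below keeps the latest quote for each ticker/exchange combination in a hash table and recomputes the NBBO after every update. It is written in Python rather than the SAS hash object, and the record layout `(time, ticker, exchange, bid, ask)` and the name `nbbo_stream` are assumptions made for the example, not details from the presentation.

```python
def nbbo_stream(quotes):
    """Yield (time, ticker, best_bid, best_offer) after each quote update.

    quotes: iterable of (time, ticker, exchange, bid, ask) in time order.
    The tuple layout is hypothetical; real CQS records carry more fields.
    """
    latest = {}  # ticker -> {exchange: (bid, ask)}; dicts act as hash tables
    for time, ticker, exchange, bid, ask in quotes:
        book = latest.setdefault(ticker, {})
        book[exchange] = (bid, ask)  # the newest quote replaces the old one
        # National best bid is the highest bid across exchanges;
        # national best offer is the lowest ask across exchanges.
        best_bid = max(b for b, _ in book.values())
        best_offer = min(a for _, a in book.values())
        yield time, ticker, best_bid, best_offer
```

Only the most recent quote per exchange is held in memory, so storage grows with the number of ticker/exchange pairs rather than with the number of quotes.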

This paper discusses format enumeration (via the DICTIONARY.FORMATS view) and the new FMTINFO function that gives information about a format, such as whether it is a date or currency format.

Read the paper (PDF)

The macro language is both powerful and flexible. With this power, however, comes complexity, and this complexity often makes the language more difficult to learn and use. Fortunately, one of the key elements of the macro language is its use of macro variables, and these are easy to learn and easy to use. You can create macro variables using a number of different techniques and statements. However, the five most commonly used methods are not only the most useful, but also among the easiest to master. Since macro variables are used in so many ways within the macro language, learning how they are created can also serve as an excellent introduction to the language itself. These methods include: 1) the %LET statement; 2) macro parameters (named and positional); 3) the iterative %DO statement; 4) using the INTO clause in PROC SQL; and 5) using the CALL SYMPUTX routine.

Read the paper (PDF) | Download the data file (ZIP)

Formats can be used for more than just making your data look nice. They can be used as in-memory lookup tables and to help you create data-driven code. This paper shows you how to build a format from a data set, how to write a format out as a data set, and how to use formats to make programs data driven. Examples are provided.

Read the paper (PDF)

Tracking gains or losses from the purchase and sale of diverse equity holdings depends in part on whether stocks sold are assumed to be from the earliest lots acquired (a first-in, first-out queue, or FIFO queue) or the latest lots acquired (a last-in, first-out queue, or LIFO queue). Other inventory tracking applications have a similar need for application of either FIFO or LIFO rules. This presentation shows how a collection of simple ordered hash objects, in combination with a hash-of-hashes, is a made-to-order technique for easy data-step implementation of FIFO, LIFO, and other less likely rules (for example, HIFO [highest-in, first-out] and LOFO [lowest-in, first-out]).
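
The FIFO and LIFO rules in this abstract can be sketched with one queue of purchase lots per ticker. The Python below is an illustration under assumptions, not the presentation's SAS code: a dictionary of deques stands in for the hash-of-hashes, and the `LotTracker` class and its method names are hypothetical.

```python
from collections import defaultdict, deque


class LotTracker:
    """Track purchase lots per ticker and realize gains under FIFO or LIFO."""

    def __init__(self, rule="FIFO"):
        if rule not in ("FIFO", "LIFO"):
            raise ValueError("rule must be 'FIFO' or 'LIFO'")
        self.rule = rule
        # ticker -> deque of [shares, price] lots in order of acquisition
        self.lots = defaultdict(deque)

    def buy(self, ticker, shares, price):
        """Record a newly acquired lot at the back of the ticker's queue."""
        self.lots[ticker].append([shares, price])

    def sell(self, ticker, shares, price):
        """Sell shares and return the realized gain (negative for a loss)."""
        queue = self.lots[ticker]
        if shares > sum(lot[0] for lot in queue):
            raise ValueError("cannot sell more shares than are held")
        gain = 0
        while shares > 0:
            # FIFO consumes the earliest lot; LIFO consumes the latest lot.
            lot = queue[0] if self.rule == "FIFO" else queue[-1]
            used = min(shares, lot[0])
            gain += used * (price - lot[1])
            lot[0] -= used
            shares -= used
            if lot[0] == 0:  # lot exhausted, so remove it from the queue
                if self.rule == "FIFO":
                    queue.popleft()
                else:
                    queue.pop()
        return gain
```

Rules such as HIFO or LOFO would need lots ordered by price rather than by acquisition time, which this sketch does not attempt.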

Read the paper (PDF)

When you look at examples of the REPORT procedure, you see code that tests _BREAK_ and _RBREAK_, but you wonder what's the breakdown of the COMPUTE block? And, sometimes, you need more than one break line on a report, or you need a customized or adjusted number at the break. Everything in PROC REPORT that is advanced seems to involve a COMPUTE block. This paper provides examples of advanced PROC REPORT output that uses _BREAK_ and _RBREAK_ to customize the extra break lines that you can request with PROC REPORT. Examples include how to get custom percentages with PROC REPORT, how to get multiple break lines at the bottom of the report, how to customize break lines, and how to customize LINE statement output. This presentation is aimed at the intermediate to advanced report writer who knows something about PROC REPORT, but wants to get the breakdown of how to do more with PROC REPORT and the COMPUTE block.

Read the paper (PDF) | Download the data file (ZIP)

This paper shows how you can reduce the computing footprint of your SAS^® applications without compromising your end products. The paper presents the 15 axioms of going green with your SAS applications. The axioms are proven, real-world techniques for reducing the computer resources used by your SAS programs. When you follow these axioms, your programs run faster, use less network bandwidth, use fewer desktop or shared server computing resources, and create more compact SAS data sets.

Read the paper (PDF)

Many SAS^® users are working across multiple platforms, commonly combining Microsoft Windows and UNIX environments. Often, SAS code developed on one platform (for example, on a PC) might not work on another platform (for example, on UNIX). Portability is not just working across multi-platform environments; it is also about making programs easier to use across projects, across companies, or across clients and vendors. This paper examines some good programming practices to address common issues that occur when you work across SAS on a PC and SAS on UNIX. They include: 1) avoid explicitly defining file paths in LIBNAME, filename, and %include statements that require platform-specific syntax such as forward slash (in UNIX) or backslash (in PC SAS); 2) avoid using X commands in SAS code to execute statements on the operating system, which works only on Windows but not on UNIX; 3) use the appropriate SAS rounding function for numeric variables to avoid different results when dealing with 64-bit operating systems and 32-bit systems. The difference between rounding before or after calculations and derivations is discussed; 4) develop portable SAS code to import or export Microsoft Excel spreadsheets across PC SAS and UNIX SAS, especially when dealing with multiple worksheets within one Excel file; and 5) use SAS^® Enterprise Guide^® to access and run PC SAS programs in UNIX effectively.

Read the paper (PDF)

Would you like to be more confident in producing graphs and figures? Do you understand the differences between the OVERLAY, GRIDDED, LATTICE, DATAPANEL, and DATALATTICE layouts? Finally, would you like to learn the fundamental Graph Template Language methods in a relaxed environment that fosters questions? Great, this topic is for you! In this hands-on workshop, you are guided through the fundamental aspects of the GTL procedure, and you can try fun and challenging SAS^® graphics exercises to enable you to more easily retain what you have learned.

Read the paper (PDF) | Download the data file (ZIP)

This hands-on workshop explores the power of SAS Macro, a text substitution facility for extending and customizing SAS programs. Examples will range from simple macro variables to advanced macro programs. As a participant, you will add macro syntax to existing programs to dynamically enhance your programming experience.

This workshop provides hands-on experience with SAS^® Studio. Workshop participants will use SAS's new web-based interface to access data, write SAS programs, and generate SAS code through predefined tasks. This workshop is intended for SAS programmers from all experience levels.

Do you need to add annotations to your graphs? Do you need to specify your own colors on the graph? Would you like to add Unicode characters to your graph, or would you like to create templates that can also be used by non-programmers to produce the required figures? Great, then this topic is for you! In this hands-on workshop, you are guided through the more advanced features of the GTL procedure. There are also fun and challenging SAS^® graphics exercises to enable you to more easily retain what you have learned.

Read the paper (PDF) | Download the data file (ZIP)

Data processing can sometimes require complex logic to match and rank record associations across events. This paper presents an efficient solution to generating these complex associations using the DATA step and data hash objects. The solution applies to multiple business needs including subsequent purchases, repayment of loan advance, or hospital readmits. The logic demonstrates how to construct a hash process to identify a qualifying initial event and append linking information with various rank and analysis factors, through the example of a specific use case of the process.

Read the paper (PDF) | Download the data file (ZIP)

Heat maps use colors to communicate numeric data by varying the underlying values that represent red, green, and blue (RGB) as a linear function of the data. You can use heat maps to display spatial data, plot big data sets, and enhance tables. You can use colors on the spectrum from blue to red to show population density in a US map. In fields such as epidemiology and sociology, colors and maps are used to show spatial data, such as how rates of disease or crime vary with location. With big data sets, patterns that you would hope to see in scatter plots are hidden in dense clouds of points. In contrast, patterns in heat maps are clear, because colors are used to display the frequency of observations in each cell of the graph. Heat maps also make tables easier to interpret. For example, when displaying a correlation matrix, you can vary the background color from white to red to correspond to the absolute correlation range from 0 to 1. You can shade the cell behind a value, or you can replace the table with a shaded grid. This paper shows you how to make a variety of heat maps by using PROC SGPLOT, the Graph Template Language, and SG annotation.
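
The linear mapping from data to RGB values that this abstract describes can be written out in a few lines. The sketch below is Python, not the PROC SGPLOT or Graph Template Language code from the paper, and the names `lerp_color` and `correlation_color` are hypothetical. It follows the abstract's correlation-matrix example, shading from white at 0 to red at an absolute correlation of 1.

```python
def lerp_color(value, lo, hi, color_lo, color_hi):
    """Map a value in [lo, hi] linearly onto an RGB color between two endpoints."""
    if hi == lo:
        raise ValueError("lo and hi must differ")
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the range
    # Interpolate each of the red, green, and blue channels independently.
    return tuple(round(a + t * (b - a)) for a, b in zip(color_lo, color_hi))


WHITE = (255, 255, 255)
RED = (255, 0, 0)


def correlation_color(r):
    """Background color for a correlation r, based on its absolute value."""
    return lerp_color(abs(r), 0.0, 1.0, WHITE, RED)
```

Because the mapping uses the absolute correlation, strong negative and strong positive correlations receive the same shade.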

Read the paper (PDF)

The openness of SAS^® Viya, the new cloud analytic platform that uses SAS^® Cloud Analytic Services (CAS), emphasizes a unified experience for data scientists. You can now execute the analytics capabilities of SAS^® in different programming languages including Python, Java, and Lua, as well as use a RESTful endpoint to execute CAS actions directly. This paper provides an introduction to these programming languages. For each language, we illustrate how the API is surfaced from the CAS server, the types of data that you can upload to a CAS server, and the result tables that are returned. This paper also provides a comprehensive comparison of using these programming languages to build a common analytical process, including loading data to a CAS server; exploring, manipulating, and visualizing data; and building statistical and machine learning models.

Read the paper (PDF)

In the past 10 years, SAS^® Enterprise Guide^® has developed into the go-to application to access the power of SAS^®. With each new release, SAS continues to add functionality that makes the SAS user's life easier. We take a closer look at some of the built-in features within SAS Enterprise Guide and how they can make your life easier. One of the most exciting and powerful features we explore is allowing parallel execution on the same server. This gives you the ability to run multiple SAS processes at the same time regardless of whether you have a SAS^® Grid Computing environment. Some other topics we cover include conditional processing within SAS Enterprise Guide, how to securely store database login and password information, setting up autoexec files in SAS Enterprise Guide, exploiting process flows, and much more.

Read the paper (PDF)

Do you need to create a format instantly? Does the format have a lot of labels, and it would take a long time to type in all the codes and labels by hand? Sometimes, a SAS^® programmer needs to create a user-defined format for hundreds or thousands of codes, and he needs an easy way to accomplish this without having to type in all of the codes. SAS provides a way to create a user-defined format without having to type in any codes. If the codes and labels are in a text file, SAS data set, Excel file, or in any file that can be converted to a SAS data set, then a SAS user-defined format can be created on the fly. The CNTLIN= option of PROC FORMAT allows a user to create a user-defined format or informat from raw data or from a SAS file. This paper demonstrates how to create two user-defined formats instantly from a raw text file on our Census Bureau website. It explains how to use these user-defined formats for the final report and final output data set from PROC TABULATE. The paper focuses on the CNTLIN= option of PROC FORMAT, not the CNTLOUT= option.

Read the paper (PDF)

As technology expands, we have the need to create programs that can be handed off to clients, to regulatory agencies, to parent companies, or to other projects, and handed off with little or no modification by the recipient. Minimizing modification by the recipient often requires the program itself to self-modify. To some extent the program must be aware of its own operating environment and what it needs to do to adapt to it. There are a great many tools available to the SAS^® programmer that will allow the program to self-adjust to its own surroundings. These include location-detection routines, batch files based on folder contents, the ability to detect the version and location of SAS, programs that discern and adjust to the current operating system and the corresponding folder structure, the use of automatic and user defined environmental variables, and macro functions that use and modify system information. Need to create a portable program? We can hand you the tools.

Read the paper (PDF)

Data with a location component is naturally displayed on a map. Base SAS^® 9.4 provides libraries of map data sets to assist in creating these images. Sometimes, a particular sub-region is all that needs to be displayed. SAS/GRAPH^® software can create a new subset of the map using the GPROJECT procedure minimum and maximum latitude and longitude options. However, this method is capable only of cutting out a rectangular area. This paper presents a polygon clipping algorithm that can be used to create arbitrarily shaped custom map regions. Maps are nothing more than sets of polygons, defined by sets of border points. Here, a custom polygon shape overlays the map polygons and saves the intersection of the two. The DATA step hash object is used for easier bookkeeping of the added and deleted points needed to maintain the correct shape of the clipped polygons.

Read the paper (PDF)

From stock price histories to hospital stay records, analysis of time series data often requires use of lagged (and occasionally lead) values of one or more analysis variables. For the SAS^® user, the central operational task is typically getting lagged (lead) values for each time point in the data set. Although SAS has long provided a LAG function, it has no analogous lead function, which is an especially significant problem in the case of large data series. This paper 1) reviews the LAG function, in particular, the powerful but non-intuitive implications of its queue-oriented basis; 2) demonstrates efficient ways to generate leads with the same flexibility as the LAG function, but without the common and expensive recourse to data re-sorting; and 3) shows how to dynamically generate leads and lags through use of the hash object.
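
The queue-oriented behavior mentioned in this abstract is easier to see in a small model. The Python sketch below is an illustration and not the paper's SAS code: the hypothetical `Lag` class returns the value stored n calls earlier, so when it is called only on some rows it returns the value from the last call, not from the previous row. The hypothetical `with_lead` helper pairs each value with a later one without re-sorting the data.

```python
from collections import deque


class Lag:
    """Queue-based lag: each call stores its argument and returns the
    argument from n calls ago (None until the queue has filled)."""

    def __init__(self, n=1):
        self.queue = deque([None] * n)

    def __call__(self, value):
        self.queue.append(value)
        return self.queue.popleft()


def with_lead(values, n=1):
    """Pair each value with the value n positions ahead, or None near the end."""
    values = list(values)
    return [
        (v, values[i + n] if i + n < len(values) else None)
        for i, v in enumerate(values)
    ]


# Calling the lag on every row gives the previous row's value.
every_row = Lag(1)
unconditional = [every_row(x) for x in [1, 2, 3, 4]]

# Calling it only on even rows gives the previous *call's* value:
# at x = 4 the result is 2, not 3.
even_only = Lag(1)
conditional = [even_only(x) if x % 2 == 0 else None for x in [1, 2, 3, 4]]
```

The contrast between `unconditional` and `conditional` is the non-intuitive consequence of updating the queue only when the function executes.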

Read the paper (PDF)

Programming for others involves new disciplines not called for when we write to provide results. There are many additional facilities in the languages of SAS^® to ensure the processes and programs you provide for others will please your customers. Not all are obvious and some seem hidden. The never-ending search to please your friends, colleagues, and customers could start in this presentation.

Read the paper (PDF)

Making sure that you have saved all the necessary information to replicate a deliverable can be a cumbersome task. You want to make sure that all the raw data sets and all the derived data sets, whether they are Study Data Tabulation Model (SDTM) data sets or Analysis Data Model (ADaM) data sets, are saved. You prefer that the date/time stamps are preserved. Not only do you need the data sets, you also need to keep a copy of all programs that were used to produce the deliverable, as well as the corresponding logs from when the programs were executed. Any other information that was needed to produce the necessary outputs also needs to be saved. You must do all of this for each deliverable, and it can be easy to overlook a step or some key information. Most people do this process manually. It can be a time-consuming process, so why not let SAS^® do the work for you?

Read the paper (PDF)

String externalization is the key to making your SAS^® applications speak multiple languages, even if you can't. Using the new features in SAS^® 9.3 for internationalization, your SAS applications can be written to adapt to whatever environment they are found in. String externalization is the process of identifying and separating translatable strings from your SAS program. This paper outlines the four steps of string externalization: create a Microsoft Excel spreadsheet for messages (optional), create SMD files, convert SMD files, and create the final SAS data set. Furthermore, it briefly shows you a real-world project on applying the concept. Using the Excel spreadsheet message text approach, professional translators can work more efficiently translating text in a friendlier and more comfortable environment. Subsequently, a programmer can also fully concentrate on developing and maintaining SAS code when your application is traveling to a new country.

View the e-poster or slides (PDF)

The days of comparing paper copies of graphs on light boxes are long gone, but the problems associated with validating graphical reports still remain. Many recent graphs created using SAS/GRAPH^® software include annotations, which complicate an already complex problem. In ODS Graphics, only a single input data set should be used. Because annotation can be more easily added by overlaying an additional graph layer, it is now more practical to use that single input data set for validation, which removes all of the scaling, platform, and font issues that got in the way before. This paper guides you through the techniques to simplify validation while you are creating your perfect graph.

Read the paper (PDF)

This presentation describes a methodology for interest rates, life tables, and actuarial calculations using generational mortality tables and the forward structure of interest rates for pension funds, analyzing long-term actuarial projections and their impacts on the actuarial liability. It was developed as a computational algorithm in SAS^® Enterprise Guide^® and Base SAS^® for structuring the actuarial projections, and it analyzes the impacts of this new methodology. There is heavy use of the IML and SQL procedures.

Read the paper (PDF)

Microsoft Excel worksheets enable you to explore data that answers the difficult questions that you face daily in your work. When you combine the SAS^® Output Delivery System (ODS) with the capabilities of Excel, you have a powerful toolset that you can use to manipulate data in various ways, including highlighting data, using formulas to answer questions, and adding a pivot table or graph. In addition, ODS and Excel give you many methods for enhancing the appearance of your tables and graphs. This paper, written for the beginning analyst to the most advanced programmer, illustrates first how to manipulate styles and presentation elements in your worksheets by controlling text wrapping, highlighting and exploring data, and specifying Excel templates for data. Then, the paper explains how to use the TableEditor tagset and other tools to build and manipulate both basic and complex pivot tables that can help you answer all of the questions about your data. You will also learn techniques for sorting, filtering, and summarizing pivot-table data.

Read the paper (PDF)

The SAS/IML^® language excels in handling matrices and performing matrix computations. A new feature in SAS/IML 14.2 is support for nonmatrix data structures such as tables and lists. In a matrix, all elements are of the same type: numeric or character. Furthermore, all rows have the same length. In contrast, SAS/IML 14.2 enables you to create a structure that contains many objects of different types and sizes. For example, you can create an array of matrices in which each matrix has a different dimension. You can create a table, which is an in-memory version of a data set. You can create a list that contains matrices, tables, and other lists. This paper describes the new data structures and shows how you can use them to emulate other structures such as stacks, associative arrays, and trees. It also presents examples of how you can use collections of objects as data structures in statistical algorithms.

Read the paper (PDF)

The TABULATE procedure has long been a central workhorse of our organization's reporting processes, given that it offers a uniquely concise syntax for obtaining descriptive statistics on deeply grouped and nested categories within a data set. Given the diverse output capabilities of SAS^®, it often then suffices to simply ship the procedure's completed output elsewhere via the Output Delivery System (ODS). Yet there remain cases in which we want to not only obtain a formatted result, but also to acquire the full nesting tree and logic by which the computations were made. In these cases, we want to treat the details of the Tabulate statements as data, not merely as presentation. I demonstrate how we have solved this problem by parsing our Tabulate statements into a nested tree structure in JSON that can be transferred and easily queried for deep values elsewhere beyond the SAS program. Along the way, this provides an excellent opportunity to walk through the nesting logic of the procedure's statements and explain how to think about the axes, groupings, and set computations that make it tick. The source code for our syntax parser is also available on GitHub for further use.

Read the paper (PDF)

The EXPAND procedure is very useful when handling time series data and is commonly used in fields such as finance or economics, but it can also be applied to medical encounter data within a health research setting. Medical encounter data consists of detailed information about healthcare services provided to a patient by a managed care entity and is a rich resource for epidemiologic research. Specific data items include, but are not limited to, dates of service, procedures performed, diagnoses, and costs associated with services provided. Drug prescription information is also available. Because epidemiologic studies generally focus on a particular health condition, a researcher using encounter data might wish to distinguish individuals with the health condition of interest by identifying encounters with a defining diagnosis and/or procedure. In this presentation, I provide two examples of how cases can be identified from a medical encounter database. The first uses a relatively simple case definition, and then I EXPAND the example to a more complex case definition.

View the e-poster or slides (PDF)

The SAS^® DATA step is one of the best (if not the best) data manipulators in the programming world. One of the areas that gives the DATA step its power is the wealth of functions that are available to it. This paper takes a PEEK at some of the functions whose names have more than one MEANing. While the subject matter is very serious, the material is presented in a humorous way that is guaranteed not to BOR the audience. With so many functions available, we have to TRIM our list so that the presentation can be made within the TIME allotted. This paper also discusses syntax and shows several examples of how these functions can be used to manipulate data.

Read the paper (PDF)

Graphics are an excellent way to display results from multiple statistical analyses and get a visual message across to the correct audience. Scientific journals often have very precise requirements for graphs that are submitted with manuscripts. While authors often find themselves using tools other than SAS^® to create these graphs, the combination of the SGPLOT procedure and the Output Delivery System enables authors to create what they need in the same place as they conducted their analysis. This presentation focuses on two methods for creating a publication quality graphic in SAS^® 9.4 and provides solutions for some issues encountered when doing so.

Read the paper (PDF)

A new ODS destination for creating Microsoft Excel workbooks is available starting in the third maintenance release for SAS^® 9.4. This destination creates native Microsoft Excel XLSX files, supports graphic images, and offers other advantages over the older ExcelXP tagset. In this presentation, you learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS^® output. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on demand and in real time using SAS server technology is discussed. Using earlier versions of SAS to create multi-sheet workbooks is also discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.

Read the paper (PDF) | Download the data file (ZIP)

Do you create Excel files from SAS^®? Do you use the ODS EXCELXP tagset or the ODS EXCEL destination? In this presentation, the EXCELXP tagset and the ODS EXCEL destination are compared face to face. There's gonna be a showdown! We give quick tips for each and show how to create Excel files for our Special Census program. Pros of each method are explored. We show the added benefits of the ODS EXCEL destination. We display how to create XML files with the EXCELXP tagset. We present how to use TAGATTR formats with the EXCELXP tagset to ensure that leading and trailing zeros in Excel are preserved. We demonstrate how to create the same Excel file with the ODS EXCEL destination with SAS formats instead of with TAGATTR formats. We show how the ODS EXCEL destination creates native Excel files. One of the drawbacks of an XML file created with the EXCELXP tagset is that a pop-up message is displayed in Excel each time you open it. We present differences using the ABSOLUTE_COLUMN_WIDTH= option in both methods.

Read the paper (PDF)

In order to display data visually, our audience preferred charts and graphs generated by Microsoft Excel over those generated by SAS^®. However, to make the necessary 30 graphs in Excel took 2 to 3 hours of manual work, even though the chart templates had already been created, and led to mistakes due to human error. SAS graphs took much less time to create, but lacked key functionality that the audience preferred and that was available in Excel graphs. Thanks to SAS, the answer came in Excel 4 Macro Language (X4ML) programming. SAS can actually submit coding to Excel in order to create customized data reporting, to create graphs or to update templates' data series, and even to populate Microsoft Word documents for finalized reports. This paper explores how SAS can be used to create presentation-ready graphs in a proven process that takes less than one minute, compared to the earlier process that took hours. The following code is used and discussed: %macro(macro_var), filename, rc commands, Output Delivery System (ODS), X4ML, and Microsoft Visual Basic for Applications (VBA).

Read the paper (PDF)

When first learning SAS^®, programmers often see the proprietary DATA step as a foreign and nonstandard concept. The introduction of the SAS^® 9.4 DS2 language eases the transition for traditional programmers delving into SAS for the first time. Object Oriented Programming (OOP) has been an industry mainstay for many years, and the DS2 procedure provides an object-oriented environment for the DATA step. In this poster, we go through a business case to show how DS2 can be used to define a reusable package following object-oriented principles.

View the e-poster or slides (PDF)

Making optimal use of SAS^® Grid Computing relies on the ability to spread the workload effectively across all of the available nodes. With SAS^® Scalable Performance Data Server (SPD Server), it is possible to partition your data and spread the processing across the SAS Grid Computing environment. In an ideal world it would be possible to adjust the size and number of partitions according to the data volumes being processed on any given day. This paper discusses a technique that enables the processing performed in the SAS Grid Computing environment to be dynamically reconfigured, automatically at run time, to optimize the use of SAS Grid Computing, and to provide significant performance benefits.

Read the paper (PDF)

The DATASETS procedure provides the most diverse selection of capabilities and features of any of the SAS^® procedures. It is the prime tool that programmers can use to manage SAS data sets, indexes, catalogs, and so on. Many SAS programmers are only familiar with a few of PROC DATASETS's many capabilities. Most often, they only use the data set updating, deleting, and renaming capabilities. However, there are many more features and uses that should be in a SAS programmer's toolkit. This paper highlights many of the major capabilities of PROC DATASETS. It discusses how it can be used as a tool to update variable information in a SAS data set; provide information about data set and catalog contents; delete data sets, catalogs, and indexes; repair damaged SAS data sets; rename files; create and manage audit trails; add, delete, and modify passwords; add and delete integrity constraints; and more. The paper contains examples of the various uses of PROC DATASETS that programmers can cut and paste into their own programs as a starting point. After reading this paper, a SAS programmer will have practical knowledge of the many different facets of this important SAS procedure.

Read the paper (PDF)

Multicategory logit models extend the techniques of logistic regression to response variables with three or more categories. For ordinal response variables, a cumulative logit model assumes that the effect of an explanatory variable is identical for all modeled logits (known as the assumption of proportional odds). Past research supports the finding that as the sample size and number of predictors increase, it is unlikely that proportional odds can be assumed across all predictors. An emerging method to effectively model this relationship uses a partial proportional odds model, fit with unique parameter estimates at each level of the modeled relationship only for the predictors in which proportionality cannot be assumed. First used in SAS/STAT^® 12.1, PROC LOGISTIC in SAS^® 9.4 now extends this functionality for variable selection methods in a manner in which all equal and unequal slope parameters are available for effect selection. Previously, the statistician was required to assess predictor non-proportionality a priori through likelihood tests or subjectively through graphical diagnostics. Following a review of statistical methods and limitations of other commercially available software to model data exhibiting non-proportional odds, a public-use data set is used to examine the new functionality in PROC LOGISTIC using stepwise variable selection methods. Model diagnostics and the improvement in prediction compared to a general cumulative model are noted.

Read the paper (PDF) | Download the data file (ZIP) | View the e-poster or slides (PDF)

Real workflow dependencies exist when the completion or output of one data process is a prerequisite for subsequent data processes. For example, in extract, transform, load (ETL) systems, the extract must precede the transform and the transform must precede the load. This serialization is common in SAS^® data analytic development but should be implemented only when actual dependencies exist. A false dependency, by contrast, exists when the workflow itself does not require serialization but is coded in a manner that forces a process to wait unnecessarily for some unrelated process to complete. For example, an ETL system might extract, transform, and load one data set, and then extract, transform, and load a second data set, causing processing of the second data set to wait unnecessarily for the first to complete. This hands-on session demonstrates three common patterns of false dependencies, teaching SAS practitioners how to recognize and remedy false dependencies through parallel processing paradigms. Groups of participants are pitted against each other, as the class simultaneously runs both serialized software and distributed software that runs in parallel. Participants execute exercises in unison, and then watch their machines race to the finish as the tremendous performance advantages of parallel processing are demonstrated in one exercise after another--ideal for anyone seeking to walk away with proven techniques that can measurably increase your performance and bonus.

Read the paper (PDF)

JavaScript Object Notation (JSON) has quickly become the de facto standard for data transfer on the Internet due to an increase in web data and the usage of full-stack JavaScript. JSON has become dominant in the emerging technologies of the web today, such as in the Internet of Things and in the mobile cloud. JSON offers a light and flexible format for data transfer. It can be processed directly from JavaScript without the need for an external parser. This paper discusses several abilities within SAS^® to process JSON files, the new JSON LIBNAME, and several procedures. This paper compares all of these in detail.

Read the paper (PDF)

Predictive modeling might just be the single most thrilling aspect of data science. Who among us can deny the allure: to observe a naturally occurring phenomenon, conjure a mathematical model to explain it, and then use that model to make predictions about the future? Though many SAS^® users are familiar with using a data set to generate a model, they might not use the awesome power of SAS to store the model and score other data sets. In this paper, we distinguish between parametric and nonparametric models and discuss the tools that SAS provides for storing and scoring each. Along the way, you come to know the STORE statement and the SCORE procedure. We conclude with a brief overview of the PLM procedure and demonstrate how to effectively load and evaluate models that have been stored during the model building process.
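
The store-then-score workflow described above might look like the following sketch. It assumes a parametric model fit with PROC GLM (one of several procedures that support the STORE statement); the data sets TRAIN and NEW_OBS and the item store WEIGHT_MODEL are hypothetical names, and the paper's own examples may differ.

```sas
/* Hypothetical split of SASHELP.CLASS into training and scoring data */
data work.train work.new_obs;
   set sashelp.class;
   if mod(_n_, 4) = 0 then output work.new_obs;
   else output work.train;
run;

/* Fit a parametric model and save it in an item store */
proc glm data=work.train;
   model weight = height age;
   store work.weight_model;
run;
quit;

/* Later, restore the stored model and score new observations */
proc plm restore=work.weight_model;
   score data=work.new_obs out=work.scored predicted=pred_weight;
run;
```

The item store keeps the fitted model separate from the training data, so the scoring step can run in a different session as long as the item store is saved in a permanent library.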

Read the paper (PDF)

This paper explores the utilization of medical services, which has a characteristic exponential distribution. Because of this characteristic, a variable generalized linear model can be applied to it to obtain self-managed health plan rates. This approach is different from what is generally used to set the rates of health plans. This new methodology is characterized by capturing qualitative elements of exposed participants that old rate-making methods are not able to capture. Moreover, this paper also uses generalized linear models to estimate the number of days that individuals remain hospitalized. The method is expanded in a project in SAS^® Enterprise Guide^®, in which the utilization of medical services by the base during the years 2012, 2013, 2014, and 2015 (the last year of the base) is compared with the Hospital Cost Index of Variation. The results show that, among the variables chosen for the model, the income variable has an inverse relationship with the risk of health care expenses. Individuals with higher earnings tend to use fewer services offered by the health plan. Male individuals have a higher expenditure than female individuals, and this is reflected in the rate statistically determined. Finally, the model is able to generate tables with rates that can be charged to plan participants for health plans that cover all average risks.

Read the paper (PDF)

Face it: your data can occasionally contain characters that wreak havoc on your macro code. Characters such as the ampersand in AT&T, or the apostrophe in McDonald's, for example. This paper is designed for programmers who know most of the ins and outs of SAS^® macro code already. Now let's take your macro skills a step farther by adding to your skill set, specifically, %BQUOTE, %STR, %NRSTR, and %SUPERQ. What is up with all these quoting functions? When do you use one over the other? And why would you need %UNQUOTE? The macro language is full of subtleties and nuances, and the quoting functions represent the epitome of all of this. This paper shows you in which instances you would use the different quoting functions. Specifically, we show you the difference between the compile-time and the execution-time functions. In addition to looking at the traditional quoting functions, you learn how to use %QSCAN and %QSYSFUNC among other functions that apply the regular function and quote the result.
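
To make the two problem characters concrete, here is a small sketch of three of the quoting functions named above. The macro variable names (COMPANY, RESTAURANT, REPORT_LABEL) are hypothetical, and the example shows only one common use of each function rather than the paper's full treatment.

```sas
/* Load a value that contains an ampersand; single quotes in the DATA */
/* step keep the macro processor from touching it                      */
data _null_;
   call symputx('company', 'AT&T');
run;

/* SUPERQ takes the variable name without an ampersand and masks the  */
/* value at execution time, so no further resolution is attempted     */
%put Company: %superq(company);

/* STR masks at compile time; an unmatched apostrophe is marked with  */
/* a preceding percent sign                                            */
%let restaurant = %str(McDonald%'s);
%put Restaurant: &restaurant;

/* NRSTR also masks ampersands and percent signs in constant text     */
%let report_label = %nrstr(Profit&Loss);
%put Label: &report_label;
```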

Read the paper (PDF)

SAS^® Enterprise Guide^® empowers organizations, programmers, business analysts, statisticians, and users with all the capabilities that SAS^® has to offer. This hands-on workshop presents the SAS Enterprise Guide graphical user interface (GUI), access to multi-platform enterprise data sources, various data manipulation techniques without the need to learn complex coding constructs, built-in wizards for performing reporting and analytical tasks, the delivery of data and results to a variety of mediums and outlets, and support for data management and documentation requirements. Attendees learn how to use the GUI to access SAS data sets and tab-delimited and Excel input files; how to subset and summarize data; how to join (or merge) two tables together; how to flexibly export results to HTML, PDF, and Excel; and how to visually manage projects using flow diagrams.

Read the paper (PDF) | Download the data file (ZIP)

The United States Access Board will soon refresh the Section 508 accessibility standards. The new requirements are based on Web Content Accessibility Guidelines (WCAG) 2.0 and include a total of 38 testable success criteria, 16 more than the current requirements. Is your organization ready? Don't worry, the fourth maintenance release for SAS^® 9.4 Output Delivery System (ODS) HTML5 destination has you covered. This paper describes the new accessibility features in the ODS HTML5 destination, explains how to use them, and shows you how to test your output for compliance with the new Section 508 standards.

Read the paper (PDF)

We live in a world of data; small data, big data, and data in every conceivable size between small and big. In today's world, data finds its way into our lives wherever we are. We talk about data, create data, read data, transmit data, receive data, and save data constantly during any given hour in a day, and we still want and need more. So, we collect even more data at work, in meetings, at home, on our smartphones, in emails, in voice messages, sifting through financial reports, analyzing profits and losses, watching streaming videos, playing computer games, comparing sports teams and favorite players, and countless other ways. Data is growing and being collected at such astounding rates, all in the hope of being able to better understand the world around us. As SAS^® professionals, the world of data offers many new and exciting opportunities, but it also presents a frightening realization that data sources might very well contain a host of integrity issues that need to be resolved first. This presentation describes the available methods to remove duplicate observations (or rows) from data sets (or tables) based on the row's values and keys using SAS.
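
The distinction between removing duplicates by row values and by keys can be sketched with PROC SORT, which is only one of the available methods and not necessarily the one the presentation emphasizes. The VISITS data set and its variables are hypothetical.

```sas
/* Hypothetical input: rows 1 and 2 are identical; row 3 shares their key */
data work.visits;
   input id visit_date :date9. amount;
   format visit_date date9.;
   datalines;
1 01JAN2017 10
1 01JAN2017 10
1 01JAN2017 25
2 15FEB2017 40
;

/* Duplicates by row values: BY _ALL_ makes every variable part of the key, */
/* so only rows that match on all values are dropped (3 rows remain)        */
proc sort data=work.visits out=work.unique_rows nodupkey;
   by _all_;
run;

/* Duplicates by key: keep the first row per ID and VISIT_DATE (2 rows      */
/* remain) and write the discarded rows to a separate data set for review   */
proc sort data=work.visits out=work.unique_keys
          nodupkey dupout=work.dropped_rows;
   by id visit_date;
run;
```

Writing the discarded rows to a DUPOUT= data set makes it easier to check whether the "duplicates" carried conflicting values, as the third input row does here.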

Read the paper (PDF)

Do you ever feel like you email the same reports to the same people over and over and over again? If your customers are anything like mine, you create reports, and lots of them. Our office is using macros, SAS^® email capabilities, and other programming techniques, in conjunction with our trusty contact list, to automate report distribution. Customers now receive the data they need, and only the data they need, on the schedule they have requested. In addition, not having to send these emails out manually saves our office valuable time and resources that can be used for other initiatives. In this session, we walk through a few of the SAS techniques we are using to provide better service to our internal and external partners and, hopefully, make us look a little more like rock stars.

Read the paper (PDF)

If you've got an iPhone, you might have noticed that the Health app is hard at work collecting data on every step you take. And, of course, the data scientist inside you is itching to analyze that data with SAS^®. This paper and an accompanying E-Poster show you how to get step data out of your iPhone Health app and into SAS. Once it's there, you can have at it with all things SAS. In this presentation, we show you how a (what else?) step plot can be used to visualize the 73,000+ steps the author took at SAS^® Global Forum 2016.

Read the paper (PDF) | View the e-poster or slides (PDF)

One of the many difficulties for a SAS^® programmer is remembering how to accurately use SAS syntax, especially syntax that includes many parameters. Not mastering basic syntax parameters definitely makes coding inefficient because the programmer has to check reference manuals constantly to ensure that syntax is correct. One of the more useful but somewhat unknown tools in SAS is the use of SAS abbreviations. This feature enables users to store text strings (such as the syntax of a DATA step function, a SAS procedure, or a complete DATA step) in a user-defined and easy-to-remember abbreviation. When this abbreviation is entered in the enhanced editor, SAS automatically brings up the corresponding stored syntax. Knowing how to use SAS abbreviations is beneficial to programmers with varying levels of SAS expertise. In this paper, various examples of using SAS abbreviations are demonstrated.

The hash object provides an efficient method for quick data storage and data retrieval. Using a common set of lookup keys, hash objects can be used to retrieve data, store data, merge or join tables of data, and split a single table into multiple tables. This paper explains what a hash object is and why you should use hash objects, and provides basic programming instructions associated with the construction and use of hash objects in a DATA step.
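
A minimal sketch of a hash-object lookup in a DATA step follows. The LOOKUP and TRANSACTIONS data sets and their variables are hypothetical and serve only to show the construction steps (declare, define keys, define data, find).

```sas
/* Hypothetical lookup table and transaction table */
data work.lookup;
   input id region $;
   datalines;
1 East
2 West
;

data work.transactions;
   input id amount;
   datalines;
1 100
2 250
3 75
;

data work.enriched;
   if _n_ = 1 then do;
      /* Never executes; it only adds the lookup variables to the PDV */
      if 0 then set work.lookup;
      /* Load the lookup table into memory once, keyed by ID */
      declare hash h(dataset: 'work.lookup');
      h.defineKey('id');
      h.defineData('region');
      h.defineDone();
   end;
   set work.transactions;
   /* FIND returns 0 on a match and copies REGION into the PDV;       */
   /* on a miss, clear the value retained from the previous row       */
   if h.find() ne 0 then call missing(region);
run;
```

Unlike a MERGE, this join does not require either table to be sorted, which is one reason hash lookups are often faster for large transaction tables.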

Read the paper (PDF)

SAS^® In-Memory Analytics for Hadoop is an analytical programming environment that enables a user to use many components of an analytics project in a single environment, rather than switching between different applications. Users can easily prepare raw data for different types of analytics procedures. These techniques explore the data to enhance the information extractions. They can apply a large variety of statistical and machine learning techniques to the data to compare different analytical approaches. The model comparison capabilities let them quickly find the best model, which they can deploy and score in the Hadoop environment. All of these different components of the analytics project are supported in a distributed in-memory environment for lightning-fast processing. This paper highlights tips for working with the interaction between Hadoop data and for dealing with SAS^® LASR Analytic Server. It contains multiple scenarios with elementary but pragmatic approaches that enable SAS^® programmers to work efficiently within the SAS^® In-Memory Analytics environment.

Read the paper (PDF) | View the e-poster or slides (PDF)

The SAS^® macro language provides a powerful tool to write a program once and reuse it many times in multiple places. A repeatedly executed section of a program can be wrapped into a macro, which can then be shared among many users. A practical example of a macro can be a utility that takes in a set of input parameters, performs some calculations, and sends back a result (such as an interest calculator). In general, a macro modularizes a program into smaller and more manageable sections, and encapsulates repetitive tasks into reusable code. Modularization can help the code to be tested independently. This paper provides an introduction to writing macros. It introduces the user to the basic macro constructs and statements. This paper covers the following advanced macro subjects: 1) using multiple &s to retrieve/resolve the value of a macro variable; 2) creating a macro variable from the value of another macro variable; 3) handling special characters; 4) the EXECUTE statement to pass a DATA step variable to a macro; 5) using the EXECUTE statement to invoke a macro; and 6) using %RETURN to return a variable from a macro.

Read the paper (PDF)

The SAS^® platform with Unicode's UTF-8 encoding is ready to help you tackle the challenges of dealing with data in multiple languages. In today's global economy, software needs are changing. Companies are globalizing and consolidating systems from various parts of the world. Software must be ready to handle data from social media, international web pages, and databases that have characters in many different languages. SAS makes migrating your data to Unicode a snap! This paper helps you move smoothly from your legacy SAS environment to the powerful SAS Unicode environment with UTF-8 support. Along the way, you will uncover secrets to successfully manipulate your characters, so that all of your data remains intact.

Read the paper (PDF)

With the proliferation of analytics expanding across every function of the enterprise, the need for broader access to data by experienced data scientists and non-technical users to produce reports and do discovery is growing exponentially. The unintended consequence of this trend is a bottleneck within IT to deliver the necessary data while still maintaining the necessary governance and data security standards required to safeguard this critical corporate asset. This presentation illustrates how organizations are solving this challenge and enabling users to both access larger quantities of existing data and add new data to their own models without negatively impacting the quality, security, or cost to store that data. It also highlights some of the cost and performance benefits achieved by enabling self-service data management.

Imagine, if you will, a program, a program that loves its data, a program that loves its data to be in the same directory as the program itself. Together, in the same directory. True love. The program loves its data so much, it just refers to it by filename. No need to say what directory the data is in; it is the same directory. Now imagine that program being thrust into the world of the server. The server knows not what directory this program resides in. The server is an uncaring, but powerful, soul. Yet, when the program is executing, and the program refers to the data just by filename, the server bellows, "Nay, no path, no data." A knight in shining armor emerges, in the form of a SAS^® macro, who says, "Lo, with the help of the SAS^® Enterprise Guide^® macro variable minions, I can gift you with the location of the program directory and send that with you to yon mighty server." And there was much rejoicing. Yay. This paper shows you a SAS macro that you can include in your SAS Enterprise Guide pre-code to automatically set your present working directory to the same directory where your program is saved on your UNIX or Linux operating system. This is applicable to submitting to any type of server, including a SAS Grid Server. It gives you the flexibility of moving your code and data to different locations without having to worry about modifying the code. It also helps save time by not specifying complete pathnames in your programs. And can't we all use a little more time?

Read the paper (PDF)

Web services are becoming more and more relied upon for serving up vast amounts of data. With such a heavy reliance on the web, and security threats increasing every day, security is a big concern. OAuth 2.0 has become a go-to way for websites to allow secure access to the services they provide. But with increased security, comes increased complexity. Accessing web services that use OAuth 2.0 is not entirely straightforward, and can cause a lot of users plenty of trouble. This paper helps clarify the basic uses of OAuth and shows how you can easily use Base SAS^® to access a few of the most popular web services out there.

Read the paper (PDF)

One often uses an iterative %DO loop to execute a section of a macro repetitively. An alternative method is to use the implicit loop in the DATA step with the EXECUTE routine to generate a series of macro calls. One of the advantages in the latter approach is eliminating the need of using indirect referencing. To better understand the use of the CALL EXECUTE routine, it is essential for programmers to understand the mechanism and the timing of macro processing to avoid programming errors. These technical issues are discussed in detail in this paper.
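
The data-driven alternative to a %DO loop can be sketched as follows. The macro PRINT_SUBSET and the driver data set SEX_VALUES are hypothetical; wrapping the call in %NRSTR is a widely used way to handle the timing issue mentioned above, shown here as an assumption rather than as the paper's recommendation.

```sas
/* Hypothetical macro to be called once per value */
%macro print_subset(sex);
   title "Students with Sex = &sex";
   proc print data=sashelp.class;
      where sex = "&sex";
   run;
   title;
%mend print_subset;

/* Driver data set: one row per distinct value */
proc sort data=sashelp.class(keep=sex) out=work.sex_values nodupkey;
   by sex;
run;

/* The implicit DATA step loop generates one macro call per row.     */
/* NRSTR delays macro execution until after the DATA step ends, so   */
/* the generated steps do not run while this step is still running   */
data _null_;
   set work.sex_values;
   call execute(cats('%nrstr(%print_subset(', sex, '))'));
run;
```

No &&VAR&I style indirect references are needed because each value is passed straight from the DATA step variable into the macro call.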

Read the paper (PDF)

Are you tired of constantly creating new emails each and every time you run a report, frantically searching for the reports, attaching said reports, and writing emails, all the while thinking there has to be a better way? Then, have I got some code to share with you! This session provides you with code to flee from your old ways of emailing data and reports. Instead, you set up your SAS^® code to send an email to your recipients. The email attaches the most current files each and every time the code is run. You do not have to do anything manually after you run your SAS code. This session provides SAS programmers with instructions about how to create their own email in a macro that is based on their current reports. We demonstrate different options to customize the code to add the email body (and to change the body) and to add attachments (such as PDF and Excel). We show you an additional macro that checks whether a file exists and adds a note in the SAS log if it is missing so that you won't get a warning message. Using SAS code, you will become more efficient and effective by automating a tedious process and reducing errors in email attachments, wording, and recipient lists.

Read the paper (PDF)

SAS^® 9.4 Graph Template Language: Reference has more than 1300 pages and hundreds of options and statements. It is no surprise that programmers sometimes experience unexpected twists and turns when using the graph template language (GTL) to draw figures. Understandably, it is easy to become frustrated when your program fails to produce the desired graphs despite your best effort. Although SAS needs to continue improving GTL, this paper offers several tricks that help overcome some of the roadblocks in graphing.

Read the paper (PDF)

Can you actually get something for nothing? With PROC SQL's subquery and remerging features, then yes, you can. When working with categorical variables, there is often a need to add flag variables based on group descriptive statistics, such as group counts and minimum and maximum values. Instead of first creating the group count or minimum or maximum values, and then merging the summarized data set with the original data set with conditional statements creating a flag variable, why not take advantage of PROC SQL to complete three steps in one? With PROC SQL's subquery, CASE-WHEN clause, and summary functions by the group variable, you can easily remerge the new flag variable back with the original data set.
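
A short sketch of the remerging idea follows, using SASHELP.CLASS with SEX as the group variable. The output names GROUP_N and TALLEST_FLAG are hypothetical, and this example uses remerging alone rather than a subquery.

```sas
proc sql;
   create table work.flagged as
   select name,
          sex,
          height,
          /* Summary functions are computed per group ...               */
          count(*) as group_n,
          /* ... and remerged with the detail rows, so each row can be  */
          /* compared with its own group maximum                        */
          case when height = max(height) then 1 else 0 end as tallest_flag
   from sashelp.class
   group by sex;
quit;
```

SAS writes a note to the log saying that the query requires remerging summary statistics back with the original data; in this pattern that note is expected.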

Read the paper (PDF)

Have you ever run SAS^® code with a DATA step and the results were not what you expected? Tracking down the problem can be a time-consuming task. To assist you in this common scenario, SAS^® Enterprise Guide^® 7.13 and beyond has a DATA step debugger tool. The simple and interactive DATA step debugger enables you to visually walk through the execution of your DATA step program. You can control the DATA step execution, view the variables, and set breakpoints to quickly identify data and logic errors. Come see the full capabilities of the new SAS Enterprise Guide DATA step debugger. You'll be squashing bugs in no time!

Read the paper (PDF)

The SAS^® LOCK statement was introduced in SAS^®7 with great pomp and circumstance, as it enabled SAS^® software to lock data sets exclusively. In a multiuser or networked environment, an exclusive file lock prevents other users and processes from accessing and accidentally corrupting a data set while it is in use. Moreover, because file lock status can be tested programmatically with the LOCK statement return code (&SYSLCKRC), data set accessibility can be validated before attempted access, thus preventing file access collisions and facilitating more reliable, robust software. Notwithstanding the intent of the LOCK statement, stress testing demonstrated in this session illustrates vulnerabilities in the LOCK statement that render its use inadvisable due to its inability to lock data sets reliably outside of the SAS/SHARE^® environment. To overcome this limitation and enable reliable data set locking, a methodology is demonstrated that uses semaphores (flags) that indicate whether a data set is available or is in use, and mutually exclusive (mutex) semaphores that restrict data set access to a single process at one time. With Base SAS^® file locking capabilities now restored, this session further demonstrates control table locking to support process synchronization and parallel processing. The LOCKSAFE macro demonstrates a busy-waiting (or spinlock) design that tests data set availability repeatedly until file access is achieved or the process times out.

Read the paper (PDF)

As a SAS^® programmer, how often does it happen where you would like to submit some code but not wait around for it to finish? SAS^® Studio has a way to achieve this and much more! This paper covers how to submit and execute SAS code in the background using SAS Studio. Background submit in the SAS Studio interface allows you to submit code and continue with your work. You receive a notification when it is finished, or you can even disconnect from your browser session and check the status of the submitted code later. Or you can choose to use SAS Studio to submit your code without bringing up the SAS Studio interface at all. This paper also covers the ability to use a command-line executable program that uses SAS Studio to execute SAS code in the background and generate log and result files without having to create a new SAS Studio session. These techniques make it much easier to spin up long-running jobs, while still being able to get your other work done in the meantime.

Read the paper (PDF)

Did you know that you could leverage the statistical power of the FREQ procedure and still be able to control the appearance of your output? Many people think they have to use procedures such as REPORT and TABULATE to be able to apply style options and control formats and headings for their output. However, if you pair PROC FREQ with a TEMPLATE procedure step, you can customize the appearance of your output and make enhancements to tables, such as adding colors and controlling headings. If you are a statistician, you know the many PROC FREQ options that produce high-level statistics. But did you also know that PROC FREQ can generate a graphical representation of those statistics? PROC FREQ can generate the graphs, and then you can use ODS Graphics and the Graph Template Language (GTL) to improve the appearance of the graphs. Written for intermediate users, this paper demonstrates how you can enhance the default output for PROC FREQ one-way and multi-way tables by modifying colors, formats, and labels. This paper also describes the syntax for creating graphs for multiple statistics, and it uses examples to show how you can customize these graphs.

Read the paper (PDF)

In the game of tag, being "it" is bad, but where accessibility compliance is concerned, being tagged is good! Tagging is required for PDF files to comply with accessibility standards such as Section 508 and the Web Content Accessibility Guidelines (WCAG). In the fourth maintenance release for SAS^® 9.4, the preproduction option in the ODS PDF statement, ACCESSIBLE, creates a tagged PDF file. We look at how this option changes the file that is created and focus on the SAS^® programming techniques that work best with the new option. You'll then have the opportunity to try it yourself in your own code and provide feedback to SAS.

Read the paper (PDF)

In 32 years as a SAS^® consultant at the Federal Reserve Board, I have seen some questions about common SAS tasks surface again and again. This paper collects the most common questions related to basic DATA step processing from my previous 'Tales from the Help Desk' papers, and provides code to explain and resolve them. The following tasks are reviewed: using the LAG function with conditional statements; avoiding character variable truncation; surrounding a macro variable with quotes in SAS code; handling missing values (arithmetic calculations versus functions); incrementing a SAS date value with the INTNX function; converting a variable from character to numeric or vice versa and keeping the same name; converting character or numeric values to SAS date values; using an array definition in multiple DATA steps; using values of a variable in a data set throughout a DATA step by copying the values into a temporary array; and writing data to multiple external files in a DATA step, determining file names dynamically from data values. In the context of discussing these tasks, the paper provides details about SAS processing that can help users employ SAS more effectively. See the references for seven previous papers that contain additional common questions.
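
The first task on the list, using the LAG function with conditional statements, can be illustrated with the sketch below. The SERIES data set is hypothetical, and the code shows one common way to avoid the problem rather than the paper's own solution.

```sas
/* Hypothetical input, already sorted by ID */
data work.series;
   input id value;
   datalines;
1 10
1 15
2 7
2 9
;

data work.changes;
   set work.series;
   by id;
   /* Call LAG on every row: it returns the value from its previous   */
   /* call, so calling it only inside an IF skips rows in its queue   */
   prev_value = lag(value);
   /* Apply the condition afterward, to the result                    */
   if first.id then prev_value = .;
   else change = value - prev_value;
run;
```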

Read the paper (PDF)

SAS^® Macro Language can be used to enhance many report-generating processes. This presentation showcases the potential that macros have in populating predesigned RTF templates. If you have multiple report templates saved, SAS^® can choose and populate the correct ones based on macro programming and DATA _NULL_ using the TRANSTRN function. The autocall macro %TRIM, combined with a macro (for example, &TEMPLATE), can be attached to the output RTF template name. You can design and save as many templates as you like or need. When SAS assigns the macro variable TEMPLATE a value, the %TRIM(&TEMPLATE) statement in the output pathway correctly populates the appropriate template. This can make life easy if you create multiple different reports based on one data set. All that's required are stored templates on accessible pathways.

View the e-poster or slides (PDF)

SAS offers generation data sets as a language feature that many users are familiar with. They use it in their organizations and manage it using keywords such as GENMAX and GENNUM. When SAS operates in a mainframe environment, users also have the ability to tap into the GDG (generation data group) feature available on z/OS, OS/390, OS/370, IBM 3070, or IBM 3090 machines. With cost-saving initiatives across businesses and due to some scaling factors, many organizations are in the process of migrating to cheaper mid-tier platforms such as UNIX and AIX. Because Linux is open source and is a cheaper alternative, several organizations have opted for the UNIX distribution of SAS that can work in UNIX and AIX environments. While this might be a viable alternative, there are certain nuances that the migration effort brings to the technical conversion teams. On UNIX, the concept of GDGs does not exist. While SAS offers generation data sets, they are good only for SAS data sets. If the business organization needs to house and operate with a GDG-like structure for text data sets, there isn't one available. When my organization undertook a similar initiative to migrate programs used to run the subprime mortgage analytic, incentive, and regulatory reporting, we found a paucity of literature and research on this topic. Hence, I ended up developing a utility that addresses this need. This is a simple macro that helps us closely simulate a GDG/GDS.

Read the paper (PDF) | View the e-poster or slides (PDF)

As computer technology advances, SAS^® continually pursues opportunities to implement state-of-the-art systems that solve problems in data preparation and analysis faster and more efficiently. In this pursuit, we have extended the TRANSPOSE procedure to operate in a distributed fashion within both Teradata and Hadoop, using dynamically generated DS2 executed by the SAS^® Embedded Process, and within SAS^® Viya, using its native transpose action. With its new ability to work within these environments, PROC TRANSPOSE provides you with access to its parallel processing power and produces results that are compatible with your existing SAS programs.

Read the paper (PDF)

Have you ever had your macro code not work and you couldn't figure out why? Maybe even something as simple as %if &sysscp=WIN %then LIBNAME libref 'c:\temp'; ? This paper is designed for programmers who know %LET and can write basic macro definitions already. Now let's take your macro skills a step farther by adding to your skill set. The %IF statement can be a deceptively tricky statement due to how IF statements are processed in a DATA step and how that differs from how %IF statements are processed by the macro processor. Focus areas of this paper are: 1) emphasizing the importance of the macro facility as a code-generation facility; 2) how an IF statement in a DATA step differs from a macro %IF statement and when to use which; 3) why semicolons can be misinterpreted in an %IF statement.
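
In the one-line example from the abstract, the semicolon after 'c:\temp' ends the %IF statement itself, so the generated LIBNAME statement is left without a terminating semicolon. One common remedy, sketched below under the assumption that the code lives inside a macro definition (the name ASSIGN_LIB is hypothetical), is to bracket the generated text with %DO and %END.

```sas
/* The DO ... END block brackets the generated text, so the semicolon */
/* inside it belongs to the LIBNAME statement and not to the IF       */
%macro assign_lib;
   %if &sysscp = WIN %then %do;
      libname libref 'c:\temp';
   %end;
%mend assign_lib;

/* On Windows this generates the complete LIBNAME statement;          */
/* on other hosts it generates nothing                                 */
%assign_lib
```

The macro processor only generates text; it is the text left over after the macro statements are consumed that SAS compiles, which is why the placement of the semicolon matters.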

Read the paper (PDF)

JSON is quickly becoming the industry standard for data interchanges, especially in supporting REST APIs. But until now, importing JSON content into SAS^® software and leveraging it in SAS has required significant custom code. Developing that code can be laborious, requiring transcoding, manual text parsing, and creating handlers for unexpected structure changes. Fortunately, the new JSON LIBNAME engine (in the fourth maintenance release for SAS^® 9.4 and later) delivers a robust, efficient method for importing JSON content into SAS data structures. This paper demonstrates several real-world examples of the JSON LIBNAME using open data APIs. The first example contrasts the traditional custom code and JSON LIBNAME approach using big data from the United Nations Comtrade Database. The two approaches are compared in terms of complexity of code, time to execute, and the resulting data structures. The same method is applied to data from Google and the US Census Bureau's APIs. Finally, to demonstrate the ability of the JSON LIBNAME to handle unexpected changes to a JSON data structure, we use the SAS JSON procedure to write a JSON file and then simulate changes to that structure to show how one JSON LIBNAME process can easily adjust the import to handle those changes.
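
For readers who have not yet seen the engine, a minimal sketch of the JSON LIBNAME follows. It reads a tiny hypothetical JSON document written to a temporary file instead of calling any of the APIs mentioned above; the fileref RESP and libref FEED are assumptions, and the engine requires the fourth maintenance release for SAS 9.4 or later.

```sas
/* Write a small hypothetical JSON document to a temporary file */
filename resp temp;

data _null_;
   file resp;
   put '{"items":[{"id":1,"name":"alpha"},{"id":2,"name":"beta"}]}';
run;

/* Point the JSON engine at the file; it maps the structure to tables */
libname feed json fileref=resp;

/* List the tables that the engine created from the JSON structure */
proc contents data=feed._all_ nods;
run;

/* The array under the "items" key is exposed as a table of that name */
proc print data=feed.items;
run;

libname feed clear;
filename resp clear;
```

In a real job the fileref would typically be filled by PROC HTTP with the response from a REST API instead of by a DATA step.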

Read the paper (PDF)

Does your job require you to create reports in Microsoft Excel on a quarterly, monthly, or even weekly basis? Are you creating all or part of these reports by hand, referencing another sheet containing rows and rows and rows of data? If so, stop! There is a better way! The new ODS destination for Excel enables you to create native Excel files directly from SAS^®. Now you can include just the data you need, create great-looking tabular output, and do it all in a fraction of the time! This paper shows you how to use the REPORT procedure to create polished tables that contain formulas, colored cells, and other customized formatting. Also presented in the paper are the destination options used to create various workbook structures, such as multiple tables per worksheet. Using these techniques to automate the creation of your Excel reports will save you hours of time and frustration, enabling you to pursue other endeavors.
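
A brief sketch of the approach follows, combining the ODS EXCEL destination with PROC REPORT. The output file name, sheet name, and highlighting rule are hypothetical, and the file is written to the current working directory, which might need to be changed to a writable path on a server.

```sas
/* Open the Excel destination; OPTIONS() controls the workbook layout */
ods excel file="class_report.xlsx"
          options(sheet_name="Class" embedded_titles="yes");

title "Class Roster";

proc report data=sashelp.class;
   column name sex height weight;
   define name   / display 'Student';
   define sex    / display 'Sex';
   define height / display format=5.1;
   define weight / display format=5.1;
   /* Color the weight cell when it exceeds a hypothetical threshold */
   compute weight;
      if weight > 100 then
         call define(_col_, 'style', 'style={background=lightyellow}');
   endcomp;
run;

title;
ods excel close;
```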

Read the paper (PDF)

You might encounter people who used SAS^® long ago (perhaps in university), or people who had a very limited use of SAS in a job. Some of these people with limited knowledge and experience think that SAS is just a statistics package or just a GUI. Those who think of it as a GUI usually reference SAS^® Enterprise Guide^® or, if it was a really long time ago, SAS/AF^® or SAS/FSP^®. The reality is that the modern SAS system is a very large and complex ecosystem, with hundreds of software products and diversified tools for programmers and users. This poster provides diagrams and tables that illustrate the complexity of the SAS system from the perspective of a programmer. Diagrams and illustrations include the functional scope and operating systems in the ecosystem; different environments that program code can run in; cross-environment interactions and related tools; SAS^® Grid Computing: parallel processing; how SAS can run with files in memory (the legacy SASFILE statement and big data and Hadoop); and how some code can run in-database. We end with a tabulation of the many programming languages and SQL dialects that are directly or indirectly supported within SAS. This poster should enlighten those who think that SAS is an old, dated statistics package or just a GUI.

View the e-poster or slides (PDF)

As the IT industry moves to further embrace cloud computing and the benefits it enables, many companies have been slow to adopt these changes due to concerns around data compliance. Compliance with state and federal law and the relevant regulations often leads decision makers to insist that systems dealing with protected health information or similarly sensitive data remain on-premises, as the risks of non-compliance are so high. In this session, we detail BNL Consulting's standard practices for transitioning solutions that are compliant with the Health Insurance Portability and Accountability Act (HIPAA) from on-premises to a cloud-based environment hosted by Amazon Web Services (AWS). We explain that by following best practices and doing plenty of research, HIPAA compliance in a cloud environment is no more challenging than compliance in an on-premises environment. We discuss the role of best-in-practice dev-ops tools like Docker, Consul, ELK Stack, and others, which improve the reliability and the repeatability of your HIPAA-compliant solutions. We tie these recommendations to the use of common SAS tools and show how they can work in concert to stabilize and improve the performance of the solution over the on-premises alternatives. Although this presentation is focused on health care and HIPAA-specific examples, many of the described practices and processes apply to any sensitive-data solutions that are being considered for the cloud.

Read the paper (PDF)

SAS^® Embedded Process enables user-written DS2 code and scoring models to run inside Hadoop. It taps into the massively parallel processing (MPP) architecture of Hadoop for scalable performance. SAS Embedded Process explores and complies with many Hadoop components. This paper explains how SAS Embedded Process interacts with existing Hadoop security technologies, such as Apache Sentry and RecordService.

Read the paper (PDF)

This presentation explores the steps taken by a large public research institution to develop a five-year enrollment forecasting model to support its critical enrollment management process. A key component of the process is providing university stakeholders with a self-service, secure, and flexible tool that enables them to quickly generate different enrollment projections in Microsoft Excel using the most up-to-date information possible. The presentation shows how we integrated both SAS^® Enterprise Guide^® and the SAS^® Add-In for Microsoft Office to support this critical process, which had very specific stakeholder requirements and expectations.

Read the paper (PDF)

Do you have a complex report involving multiple tables, text items, and graphics that could best be displayed in a multi-tabbed spreadsheet format? The Output Delivery System (ODS) destination for Excel, introduced in SAS^® 9.4, enables you to create Microsoft Excel workbooks that easily integrate graphics, text, and tables, including column labels, filters, and formatted data values. In this paper, we examine the syntax used to generate a multi-tabbed Excel report that incorporates output from the REPORT, PRINT, SGPLOT, and SGPANEL procedures.

Read the paper (PDF)

With SAS^® Viya and SAS^® Cloud Analytic Services (CAS), SAS is moving into a new territory where SAS^® Analytics is accessible to popular scripting languages using open APIs. Python is one of those client languages. We demonstrate how to connect to CAS, run CAS actions, explore data, build analytical models, and then manipulate and visualize the results using standard Python packages such as Pandas and Matplotlib. We cover a wide variety of topics to give you a bird's eye view of what is possible when you combine the best of SAS with the best of open source.

Read the paper (PDF)

Are you frustrated with manually setting options to control your SAS^® Display Manager sessions, but daunted every time you look at all the places where you can set options and window layouts? In this paper, we look at the various files SAS^® accesses when starting, what can (and cannot) go into them, and what takes precedence after all are executed. We also look at the SAS registry and how to programmatically change settings. By the end of the paper, you will be comfortable in knowing where to make the changes that best fit your needs.

Read the paper (PDF)

This paper discusses the techniques I used at the US Census Bureau to overcome the issue of dealing with large amounts of data while modernizing some of their public-facing web applications by using service-oriented architecture (SOA) to deploy JavaScript web applications powered by SAS^®. The paper covers techniques that resulted in reducing 1,753,926 records (82 MB) down to 58 records (328 KB), a 99.6% size reduction in summarized data on the server side.

Read the paper (PDF)

It is often necessary to assess multi-rater agreement for multiple-observation categories in case-controlled studies. The Kappa statistic is one of the most common agreement measures for categorical data. The purpose of this paper is to show an approach for using SAS^® 9.4 procedures and the SAS^® Macro Language to estimate Kappa with 95% CI for pairs of nurses that used two different triage systems during a computer-simulated chemical mass casualty incident (MCI). Data from the "Validating Triage for Chemical Mass Casualty Incidents: A First Step" R01 grant was used to assess the performance of a typical hospital triage system called the Emergency Severity Index (ESI), compared with an Irritant Gas Syndrome Agent (IGSA) triage algorithm being developed from this grant, to quickly prioritize the treatment of victims of IGSA incidents. Six different pairs of nurses used ESI triage, and seven pairs of nurses used the IGSA triage prototype to assess 25 patients exposed to an IGSA and 25 patients not exposed. Of the 13 pairs of nurses in this study, two pairs were randomly selected to illustrate the use of the SAS Macro Language for this paper. If the data was not square for two nurses, a square-form table for observers using pseudo-observations was created. A weight of 1 for real observations and a weight of .0000000001 for pseudo-observations were assigned. Several macros were used to reduce programming. In this paper, we show only the results of one pair of nurses for ESI.
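
For readers unfamiliar with the statistic, the following is a minimal Python sketch of unweighted Cohen's Kappa for one pair of raters. It is not the authors' SAS macro and does not compute the 95% CI. The function name `cohen_kappa` and the sample ratings are hypothetical. Taking the union of both raters' categories plays a role similar to squaring the table, because a category used by only one rater simply contributes zero to the expected agreement.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's Kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    # Union of categories, so a category used by only one rater is still counted.
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: share of cases where both raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical triage levels assigned by two nurses to four patients.
nurse_1 = [1, 1, 2, 2]
nurse_2 = [1, 1, 2, 3]
kappa = cohen_kappa(nurse_1, nurse_2)
```

In this made-up example the nurses agree on three of four patients (observed agreement 0.75) and chance agreement is 0.375, so Kappa is 0.6.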

Read the paper (PDF) | View the e-poster or slides (PDF)

The ODS EXCEL destination has made sharing SAS^® reports and graphs much easier. What is even more exciting is that this destination is available for use regardless of the platform. This is extremely useful when reporting is performed on remote servers. This presentation goes through the basics of using the ODS EXCEL destination and shows specific examples of how to use this in a remote environment. Examples for both SAS^® on Windows and in SAS^® Enterprise Guide^® are provided.

Read the paper (PDF)

Students now have access to a SAS^® learning tool called SAS^® University Edition. This online tool is freely available to all, for non-commercial use. This means it is basically a free version of SAS that can be used to teach yourself or someone else how to use SAS. Since a large part of my body of writings has focused upon moving data between SAS and Microsoft Excel, I thought I would take some time to highlight the tasks that permit movement of data between SAS and Excel using SAS University Edition. This paper is directed toward sending graphs to Excel using the new ODS EXCEL destination.

Read the paper (PDF)

JMP^® integrates very nicely with SAS^® software, so you can do some pretty amazing things by combining the power of JMP and SAS. You can submit some code to run something on a SAS server and bring the results back as a JMP table. Then you can do lots of things with the JMP table to analyze the data returned. This workshop shows you how to access data via SAS servers, run SAS code and bring data back to JMP, and use JMP to do many things very quickly and easily. Explore the synergies between these tools; having both is a powerful combination that far outstrips just having one, or not using them together.

Read the paper (PDF) | Download the data file (ZIP)

It has become a need-it-now world, and many managers and decision-makers need their reports and information quicker than ever before to compete. As SAS^® developers, we need to acknowledge this fact and write code that gets us the results we need in seconds or minutes, rather than in hours. SAS is a great tool for extracting, transforming, and loading data, but as with any tool, it is most efficient when used in the most appropriate way. Using the SQL pass-through techniques presented in this paper can reduce run time by up to 90% by passing the processing to the database instead of moving the data back to SAS to be consumed. You can reap these benefits with only a minor increase in coding difficulty.

Read the paper (PDF) | View the e-poster or slides (PDF)

Have you ever been working on a task and wondered whether there might be a SAS^® function that could save you some time, or even one that might be able to do the work for you? Data review and validation tasks can be time-consuming efforts. Any gain in efficiency is highly beneficial, especially if you can achieve a standard level where the data itself can drive parts of the process. The ANY and NOT functions can help alleviate some of the manual work in many tasks such as data review of variable values, data compliance, data formats, and derivation or validation of a variable's data type. The list goes on. In this poster, we cover the functions and their details and use them in an example of handling date and time data and mapping it to ISO 8601 date and time formats.

Read the paper (PDF) | View the e-poster or slides (PDF)

For the past couple of years, it seems that big data has been a buzzword in the industry. We have more and more data coming in from more and more places, and it is our job to figure out how best to handle it. One way to attempt to organize data is with arrays, but what do you do when the array you are attempting to populate is so large that it cannot be handled in memory? How do you handle a large array when most of the elements are missing? This paper and presentation deal with the concept of a sparse matrix. A sparse matrix is a large array with relatively few actual elements. We address methods for handling such a construct while keeping memory, CPU, clock, and programmer time to their respective minimums.
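
The idea of a sparse matrix can be illustrated with a minimal Python sketch that stores only the populated cells in a dictionary keyed by (row, column). This is one common representation and not necessarily the method the paper presents. The class name `SparseMatrix` and its methods are hypothetical.

```python
class SparseMatrix:
    """Dictionary-of-keys sparse matrix: only non-missing cells use memory."""

    def __init__(self, n_rows, n_cols, default=None):
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.default = default   # value reported for cells never populated
        self._cells = {}         # maps (row, col) -> stored value

    def _check(self, row, col):
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            raise IndexError(f"cell ({row}, {col}) is outside the matrix")

    def __setitem__(self, key, value):
        row, col = key
        self._check(row, col)
        if value == self.default:
            # Writing the default back frees the cell instead of storing it.
            self._cells.pop((row, col), None)
        else:
            self._cells[(row, col)] = value

    def __getitem__(self, key):
        row, col = key
        self._check(row, col)
        return self._cells.get((row, col), self.default)

    def stored(self):
        """Number of cells that actually occupy memory."""
        return len(self._cells)


# A nominally 1,000,000 x 1,000,000 array that holds a single real value.
m = SparseMatrix(1_000_000, 1_000_000)
m[5, 7] = 2.5
```

Memory use grows with the number of populated cells rather than with the nominal dimensions, which is the property that makes arrays of this size workable.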

A SAS^® program with the extension .SAS is simply a text file. This fact opens the door to many powerful results. You can read a typical SAS program into a SAS data set as a text file with a character variable, with one line of the program being one record in the data set. The program's code can be changed, and a new program can be written as a simple text file with a .SAS extension. This presentation shows an example of dynamically editing SAS code on the fly and generating statistics about SAS programs.

Read the paper (PDF)

SAS^® has many methods of doing table lookups in DATA steps: formats, arrays, hash objects, the SASMSG function, indexed data sets, and so on. Of these methods, hash objects and indexed data sets enable you to specify multiple lookup keys and to return multiple table values. Both methods can be updated dynamically in the middle of a DATA step as you obtain new information (such as reading new keys from an input file or creating new synthetic keys). Hash objects are very flexible, fast, and fairly easy to use, but they are limited by the amount of data that can be held in memory. Indexed data sets can be slower, but they are not limited by what can be held in memory. As a result, they might be your only option in some circumstances. This presentation discusses how to use an indexed data set for table lookup and how to update it dynamically using the MODIFY statement and its allies.

Read the paper (PDF)

The SAS RAKING macro, introduced in 2000, has been implemented by countless survey researchers worldwide. The authors receive messages from users who tirelessly rake survey data using all three generations of the macro. In this poster, we present the fourth generation of the macro, cleaning up remnants from the previous versions, and resolving user-reported confusion. Most important, we introduce a few helpful enhancements including: 1) An explicit indicator for trimming (or not trimming) the weight that substantially saves run time when no trimming is needed. 2) Two methods of weight trimming, AND and OR, that enable users to overcome a stubborn non-convergence. When AND is indicated, weight trimming occurs only if both (individual and global) high weight cap values are true. Conversely, weight increase occurs only if both low weight cap values are true. When OR is indicated, weight trimming occurs if either of the two (individual or global) high weight cap values is true. Conversely, weight increase occurs if either of the two low weight cap values is true. 3) Summary statistics related to the number of cases with trimmed or increased weights have been expanded. 4) We introduce parameters that enable users to use different criteria of convergence for different raking marginal variables. We anticipate that these innovations will be enthusiastically received and implemented by the survey research community.
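
The AND and OR rules described in item 2 amount to a small truth table, sketched below in Python. This is an illustration of the rule as stated in the abstract, not code from the macro. The function name `cap_triggered` and its arguments are hypothetical, and how each cap is evaluated is left to the caller. The same function covers both high caps (trimming) and low caps (weight increase).

```python
def cap_triggered(individual_hit, global_hit, method):
    """Decide whether a weight adjustment fires under the AND or OR rule.

    individual_hit, global_hit: booleans saying whether the case's weight
    crossed the individual cap and the global cap (hypothetical inputs).
    method: "AND" requires both caps to be crossed; "OR" requires either.
    """
    rule = method.upper()
    if rule == "AND":
        return individual_hit and global_hit
    if rule == "OR":
        return individual_hit or global_hit
    raise ValueError("method must be 'AND' or 'OR'")


# A case that crosses only the individual high cap:
trim_under_and = cap_triggered(True, False, "AND")   # no trimming
trim_under_or = cap_triggered(True, False, "OR")     # trimming occurs
```

OR fires the adjustment in every situation where AND does, plus the cases where only one cap is crossed, so the two methods treat the same set of weights differently.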