All SAS® data sets and variables have standard attributes. These include items such as creation date, engine, compression and sort information for data sets, and format and length information for variables. However, for the first time in SAS 9.4, the developer can add their own customized attributes to both data sets and variables. This paper shows how these extended attributes can be created, modified, and maintained. It suggests the sort of items that might be candidates for use as extended attributes and explains in what circumstances they can be used. It also provides a worked example of how they can be used to inform and aid the SAS programmer in creating SAS applications.
Chris Brooks, Melrose Analytics Ltd
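To make the feature concrete, here is a minimal sketch of my own (the data set, variable, and attribute names are hypothetical): in SAS 9.4, extended attributes are created and modified with the XATTR statement in PROC DATASETS.

    proc datasets library=work nolist;
      modify claims;
      /* custom attributes on the data set itself */
      xattr set ds SourceSystem='Billing extract' ValidatedBy='J. Smith';
      /* custom attributes on an individual variable */
      xattr set var charge (DerivationRule='Sum of line items' Units='USD');
    quit;

    /* PROC CONTENTS reports extended attributes alongside the standard ones */
    proc contents data=work.claims;
    run;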
Looking for a handy technique to have in your toolkit? Consider SAS® views, especially if you work with large data sets. After a brief introduction to SAS views, I'll show you several cool ways to use them that will streamline your code and save workspace.
Elizabeth Axelrod, Abt Associates Inc.
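As a quick taste of the technique (my own sketch, with a hypothetical source library): a DATA step view stores only the program, so the subset below consumes no workspace until a later step actually reads it.

    data work.recent_claims / view=work.recent_claims;
      set bigdata.claims;              /* hypothetical large source table */
      where service_year = 2014;
    run;

    proc freq data=work.recent_claims; /* rows are generated on demand */
      tables claim_type;
    run;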
Delimited text files are often plagued by appended and/or truncated records. Writing customized SAS® code to import such a text file and break it out into fields can be challenging. If only there were a way to fix the file before importing it. Enter the file_fixing_tool, a SAS® Enterprise Guide® project that uses the SAS PRX functions to import, fix, and export a delimited text file. This fixed file can then be easily imported and broken out into fields.
Paul Genovesi, Henry Jackson Foundation for the Advancement of Military Medicine, Inc.
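A simplified sketch of the general approach, not the author's file_fixing_tool: non-printable characters are scrubbed with PRXCHANGE, and truncated records are re-joined until the expected delimiter count is reached (a pipe-delimited file with five fields is assumed).

    data _null_;
      infile 'broken.txt' truncover;
      file 'fixed.txt';
      length rec line $32767;
      retain rec '';
      input line $char32767.;
      line = prxchange('s/[^ -~]//', -1, line); /* strip non-printable characters */
      rec = cats(rec, line);                    /* re-join a truncated record */
      if countc(rec, '|') >= 4 then do;         /* 5 fields = 4 delimiters */
        put rec;
        rec = '';
      end;
    run;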
Whenever you travel, whether it's to a new destination or to your favorite vacation spot, it's nice to have a guide to assist you with planning and setting expectations. The ODS LAYOUT statement became production in SAS® 9.4. For those intrepid programmers who used ODS LAYOUT in an earlier release of SAS®, this paper contains tons of information about changes you need to know about. Programmers new to SAS 9.4 (or new to ODS LAYOUT) need to understand the basics. This paper reviews some common issues customers have reported to SAS Technical Support when migrating to the LAYOUT destination in SAS 9.4 and explores the basics for those who are making their first foray into the adventure that is ODS LAYOUT. This paper discusses some tips and tricks to ensure that your trip through the ODS LAYOUT statement will be a fun and rewarding one.
Scott Huntley, SAS
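For travelers new to the destination, the production SAS 9.4 syntax is compact; this example of mine arranges a table and a graph side by side in a PDF.

    ods pdf file='layout.pdf';
    ods layout gridded columns=2;
    ods region;
    proc print data=sashelp.class(obs=5); run;
    ods region;
    proc sgplot data=sashelp.class;
      scatter x=height y=weight;
    run;
    ods layout end;
    ods pdf close;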
In the very near future you will likely encounter Hadoop. It is rapidly displacing database management systems in the corporate world and is rearing its head in the SAS® world. If you think now is the time to learn how to use SAS with Hadoop, you are in luck. This workshop is the jump start you need. This workshop introduces Hadoop and shows you how to access it by using SAS/ACCESS® Interface to Hadoop. During the workshop, we show you how to do the following: configure your SAS environment so that you can access Hadoop data; use the Hadoop FILENAME statement; use the HADOOP procedure; and use SAS/ACCESS Interface to Hadoop (including performance tuning).
Diane Hatcher, SAS
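A hedged sketch of two of the access methods covered; the server names, paths, and credentials below are placeholders, and your site's Hadoop configuration files will differ.

    /* FILENAME access method: read a file stored in HDFS */
    filename hdfile hadoop '/user/demo/sales.csv' cfg='/opt/hadoop/conf/hadoop.xml';
    data sales;
      infile hdfile dsd dlm=',' truncover;
      input region :$10. amount :comma12.;
    run;

    /* SAS/ACCESS Interface to Hadoop: treat Hive tables as a SAS library */
    libname hdp hadoop server='hivenode01' user=demo password='xxxxxx';
    proc means data=hdp.sales_hive;
      var amount;
    run;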
The merge is one of the SAS® programmer's most commonly used tools. However, it can be fraught with pitfalls to the unwary user. In this paper, we look under the hood of the DATA step and examine how the program data vector works. We see what's really happening when data sets are merged and how to avoid subtle problems.
Joshua Horstman, Nested Loop Consulting
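As a reference point for that discussion, a match-merge done safely looks like this (my example): sorted inputs, a BY statement, and IN= flags to control which observations survive.

    proc sort data=demog;  by subjid; run;
    proc sort data=visits; by subjid; run;

    data merged;
      merge demog(in=in_demog) visits(in=in_visits);
      by subjid;
      if in_demog;        /* keep only subjects present in DEMOG */
    run;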
You know that you want to control the process flow of your program. When your program is executed multiple times, with slight variations, you will need to control the changes from iteration to iteration, the timing of the execution, and the maintenance of output and logs. Unfortunately, in order to achieve the control that you know you need, you will have to make frequent, possibly time-consuming and potentially error-prone, manual corrections and edits to your program. Fortunately, the control you seek is available, and it does not require the use of time-intensive manual techniques. List processing techniques are available that give you control and peace of mind and enable you to be a successful control freak. These techniques are not new, but there is often hesitancy on the part of some programmers to take full advantage of them. This paper reviews these techniques and demonstrates them through a series of examples.
Mary Rosenbloom, Edwards Lifesciences, LLC
Art Carpenter, California Occidental Consultants
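One staple of the list-processing toolbox the authors review is data-driven code generation with CALL EXECUTE; in this sketch of mine, the studies data set and the %runreport macro are hypothetical.

    data _null_;
      set studies;   /* one row per study */
      /* %NRSTR delays macro execution until the generated code runs */
      call execute(cats('%nrstr(%runreport)(study=', study_id, ')'));
    run;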
This presentation is an open-ended discussion about techniques for transferring data and analytical results from SAS® to Microsoft Excel. There are some introductory comments, but this presentation does not have any set content. Instead, the topics discussed are dictated by attendee questions. Come prepared to ask and get answers to your questions. To submit your questions or suggestions for discussion in advance, go to http://support.sas.com/surveys/askvince.html.
Vince DelGobbo, SAS
We have to pull data from several data files when creating our working databases. Even the simplest use of SAS® hash objects greatly reduces the time required to draw data from many sources, compared to the use of multiple PROC SORT steps and merges.
Andrew Dagis, City of Hope
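The pattern in miniature (my sketch; the libref, data sets, and variables are hypothetical): the lookup table is held in memory, and a single pass through the transaction file replaces a sort-and-merge pair.

    data claims_coded;
      if _n_ = 1 then do;
        declare hash dx(dataset:'lookup.diagnosis');
        dx.defineKey('dx_code');
        dx.defineData('dx_desc');
        dx.defineDone();
      end;
      set claims;
      length dx_desc $80;
      if dx.find() ne 0 then dx_desc = '';  /* no match in the lookup table */
    run;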
Over the years, the SAS® Business Intelligence platform has proved its importance in this big data world with its suite of applications that enable us to efficiently process, analyze, and transform huge amounts of business data. Within the data warehouse universe, 'batch execution' sits at the heart of SAS Data Integration technologies. On a day-to-day basis, batches run, and the current status of the batch is generally sent out to the team or to the client as a 'static' e-mail or as a report. From experience, we know that these don't provide much insight into the real 'bits and bytes' of a batch run. Imagine if the status of the running batch were automatically captured in one central repository and presented in a beautiful web browser on your computer or on your iPad. All this can be achieved without asking anybody to send reports, and with all 'post-batch' queries answered automatically with a click. This paper presents a framework designed specifically to automate the reporting aspects of SAS batches by collecting batch statistics in one central place; we call it 'BatchStats.'
Prajwal Shetty, Tesco HSC
Kaiser Permanente Northwest is contractually obligated for regulatory submissions to Oregon Health Authority, Health Share of Oregon, and Molina Healthcare in Washington. The submissions consist of Medicaid Encounter data for medical and pharmacy claims. SAS® programs are used to extract claims data from Kaiser's claims data warehouse, process the data, and produce output files in HIPAA ASC X12 and NCPDP format. Prior to April 2014, programs were written in SAS® 8.2 running on a VAX server. Several key drivers resulted in the conversion of the existing system to SAS® Enterprise Guide® 5.1 running on UNIX. These drivers were: the need to have a scalable system in preparation for the Affordable Care Act (ACA); performance issues with the existing system; incomplete process reporting and notification to business owners; and a highly manual, labor-intensive process of running individual programs. The upgraded system addressed these drivers. The estimated cost reduction was from $1.30 per reported encounter to $0.13 per encounter. The converted system provides for better preparedness for the ACA. One expected result of ACA is significant Medicaid membership growth. The program has already increased in size by 50% in the preceding 12 months. The updated system allows for the expected growth in membership.
Eric Sather, Kaiser Permanente
You've worked for weeks or even months to produce an analysis suite for a project. Then, at the last moment, someone wants a subgroup analysis, and they inform you that they need it yesterday. This should be easy to do, right? So often, the programs that we write fall apart when we use them on subsets of the original data. This paper takes a look at some of the best practice techniques that can be built into a program at the beginning, so that users can subset on the fly without losing categories or creating errors in statistical tests. We review techniques for creating tables and corresponding titles with BY-group processing so that minimal code needs to be modified when more groups are created. And we provide a link to sample code and sample data that can be used to get started with this process.
Mary Rosenbloom, Edwards Lifesciences, LLC
Kirk Paul Lafler, Software Intelligence Corporation
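One technique in this family (a sketch of mine with a hypothetical adverse-event data set): let BY-group processing drive both the tables and the titles, so new subgroups appear without any code changes.

    options nobyline;
    title 'Adverse Events by Body System - Treatment: #byval(trtgroup)';
    proc freq data=adae;
      by trtgroup;
      tables aebodsys;
    run;
    options byline;
    title;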
SAS® provides a wealth of resources for creating useful, attractive metadata tables, including PROC CONTENTS listing output (to ODS destinations), the PROC CONTENTS OUT= SAS data set, and PROC CONTENTS ODS output objects. This paper and presentation explore some less well-known resources for creating metadata, such as %SYSFUNC, PROC DATASETS, and dictionary tables. All these options, in conjunction with the ExcelXP tagset (and, new in the second maintenance release of SAS® 9.4, the Excel tagset), enable the creation of multi-tab metadata workbooks at the click of a mouse.
Louise Hadden, Abt Associates Inc.
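A compact sketch combining two of those resources (the file path is hypothetical): dictionary tables queried with PROC SQL, written to a multi-tab workbook with the ExcelXP tagset.

    ods tagsets.excelxp file='metadata.xml'
        options(sheet_interval='proc' sheet_label='Metadata');
    proc sql;
      select memname, name, type, length, label
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS';
    quit;
    ods tagsets.excelxp close;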
So you are still writing SAS® DATA steps and SAS macros and running them through a command-line scheduler. When work comes in, there is only one person who knows that code, and they are out--what to do? This paper shows how SAS applies extract, transform, load (ETL) modernization techniques with SAS® Data Integration Studio to gain resource efficiencies and to break down the ETL black box. We share the fundamentals (metadata foldering and naming standards) that ensure success, along with steps to ease into the pool while iteratively gaining benefits. Benefits include self-documenting code visualization, impact analysis showing which jobs and tables are affected by a change, and supportability by interchangeable bench resources. We conclude by demonstrating how SAS® Visual Analytics is being used to monitor service-level agreements and provide actionable insights into job-flow performance and scheduling.
Brandon Kirk, SAS
The DATA step enables you to read, write, and manipulate many types of data. As data evolves to a more free-form state, the ability of SAS® to handle character data becomes increasingly important. This presentation, expanded and enhanced from an earlier version, addresses character data from multiple vantage points. For example, what is the default length of a character string, and why does it appear to change under different circumstances? Special emphasis is given to the myriad functions that can facilitate the processing and manipulation of character data. This paper is targeted at a beginning to intermediate audience.
Andrew Kuligowski, HSN
Swati Agarwal, OptumInsight
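A small demonstration of the default-length behavior the paper opens with (my own example): lengths are fixed by the first reference the compiler sees, and some functions default to 200.

    data _null_;
      a = 'Hello';                /* A gets length 5 from the literal    */
      b = upcase(a);              /* B inherits length 5 from A          */
      c = catx(' ', a, 'World');  /* CATX gives C the default length 200 */
      la = vlength(a);  lb = vlength(b);  lc = vlength(c);
      put la= lb= lc=;
    run;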
With all the talk of 'big data' and 'visual analytics', we sometimes forget how important it is, and often how hard it is, to get external data into SAS®. In this paper, we review some common data sources, such as delimited files (for example, CSV) and structured flat files, and the programming steps needed to successfully load these files into SAS. In addition to examining the INFILE and INPUT statements, we look at some methods for dealing with bad data. This paper assumes only basic SAS skills, although the topic can be of interest to anyone who needs to read external files.
Peter Eberhardt, Fernwood Consulting Group Inc.
Audrey Yeo, Athlene
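A hedged sketch of the kind of code the paper walks through (the file name and layout are hypothetical), including the ?? modifier of the INPUT function to tolerate bad data without flooding the log:

    data sales;
      infile 'sales.csv' dsd dlm=',' firstobs=2 truncover;
      input region :$10. saledate_c :$10. amount_c :$15.;
      saledate = input(saledate_c, ?? mmddyy10.); /* ?? suppresses bad-data notes */
      amount   = input(amount_c, ?? comma15.);
      if missing(saledate) then put 'NOTE: bad date at line ' _n_ ': ' _infile_;
      format saledate date9.;
      drop saledate_c amount_c;
    run;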
There is a plethora of uses of the colon (:) in SAS® programming. The colon is used as a data set or variable name wildcard, a macro variable creator, an operator modifier, and so forth. The colon helps you write clear, concise, and compact code. The main objective of this paper is to encourage the effective use of the colon in writing crisp code. This paper presents real-world applications of the colon in day-to-day programming. In addition, this paper presents cases where the colon falls short of programmers' wishes.
Jinson Erinjeri, EMMES Corporation
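Several of the colon's roles collected in one sketch of mine (the data set names are hypothetical):

    data labs_subset;
      set labs(keep=lb: visit);   /* name-prefix wildcard on KEEP        */
      if visit =: 'WEEK';         /* =: compares only leading characters */
    run;

    proc sql noprint;
      select count(*) into :n_subj  /* the colon creates a macro variable */
      from demog;
    quit;
    %put Subjects: &n_subj;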
In previous papers I have described how many standard SAS/GRAPH® plots can be converted easily to ODS Graphics by using simple PROC SGPLOT or SGPANEL code. SAS/GRAPH Annotate code would appear, at first sight, to be much more difficult to convert to ODS Graphics, but by using its layering features, many Annotate plots can be replicated in a more flexible and repeatable way. This paper explains how to convert many of your Annotate plots, so they can be reproduced using Base SAS®.
Philip Holland, Holland Numerics Limited
This presentation explains how to use Base SAS®9 software to create multi-sheet Excel workbooks. You learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output using the ExcelXP Output Delivery System (ODS) tagset. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on-demand and in real time using SAS server technology is discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.
Vince DelGobbo, SAS
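The basic pattern behind such workbooks, sketched by me (the output path is hypothetical): one worksheet per BY group, with a style suited to printing.

    ods tagsets.excelxp file='regions.xml' style=printer
        options(sheet_interval='bygroup' sheet_label='Region');
    proc report data=sashelp.shoes;
      by region;
      column product sales returns;
    run;
    ods tagsets.excelxp close;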
Hysteresis loops occur because of a time lag between input and output. In the pharmaceutical industry, hysteresis plots are tools to visualize the time lag between drug concentration and drug effects. Before SAS® 9.2, SAS annotations were used to generate such plots. One of the criticisms was that SAS programmers had to write complex macros to automate the process; code management and validation tasks were not easy. With SAS 9.2, SAS programmers are able to generate such plots with ease. This paper demonstrates the generation of such plots with both Base SAS and SAS/GRAPH® software.
Deli Wang, REGENERON
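The essence of the post-9.2 approach is a SERIES plot drawn in time order, so the loop traces itself; a hedged sketch with hypothetical concentration-effect data:

    proc sort data=pkpd; by subject time; run;

    proc sgplot data=pkpd;
      series x=conc y=effect / group=subject markers;
      xaxis label='Drug Concentration';
      yaxis label='Drug Effect';
    run;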
The Centers for Disease Control and Prevention (CDC) went through a large migration from a mainframe to a Windows platform. This e-poster will highlight the Data Automated Transfer Utility (DATU) that was developed to migrate historic files between the two file systems using SAS® macros and SAS/CONNECT®. We will demonstrate how this program identifies the type of file, transfers the file appropriately, verifies the successful transfer, and provides the details in a Microsoft Excel report. SAS/CONNECT code, special system options, and mainframe code will be shown. In 2009, the CDC made the decision to retire a mainframe that was used for years of primarily SAS work. The replacement platform is a SAS grid system, based on Windows, which is referred to as the Consolidated Statistical Platform (CSP). The change from mainframe to Windows required the migration of over a hundred thousand files totaling approximately 20 terabytes. To minimize countless man-hours and human error, an automated solution was developed. DATU was developed for users to migrate their files from the mainframe to the new Windows CSP or other Windows destinations. Approximately 95% of the files on the CDC mainframe were one of three file types: SAS data sets, sequential text files, and partitioned data set (PDS) libraries. DATU dynamically determines the file type and uses the appropriate method to transfer the file to the assigned Windows destination. Variations of files are detected and handled appropriately. File variations include multiple SAS versions of SAS data sets and sequential files that contain binary values such as packed decimal fields. To mitigate the loss of numeric precision during the migration, SAS numeric variables are identified and promoted to account for architectural differences between mainframe and Windows platforms. To aid users in verifying the accuracy of the file transfer, the program compares file information of the source and destination files. When a SAS file is downloaded, PROC CONTENTS is run on both files, and the PROC CONTENTS output is compared. For sequential text files, a checksum is generated for both files, and the checksum files are compared. A PDS file transfer creates a list of the members in the PDS and in the destination Windows folder, and the file lists are compared. The development of this program and the file migration were a daunting task. This paper will share some of our lessons learned along the way and the method of our implementation.
Jim Brittain, National Center for Health Statistics (CDC)
Robert Schwartz, Centers for Disease Control and Prevention
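The transfer step at the heart of such a migration looks roughly like this (my sketch; the session definition and host names are hypothetical, and DATU's full verification logic is omitted):

    signon mainframe;   /* SAS/CONNECT session to the remote host */
    rsubmit;
      libname mf 'PROD.STUDY.SASLIB';
      proc download data=mf.visits out=work.visits; run;
    endrsubmit;
    signoff;

    /* verify: compare PROC CONTENTS output from both sides */
    proc contents data=work.visits; run;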
The DATA step has served SAS® programmers well over the years, and although it is handy, the new, exciting, and powerful DS2 language is a significant alternative, introducing an object-oriented programming environment. It enables users to effectively manipulate complex data and efficiently manage programming through additional data types, programming structure elements, user-defined methods, and shareable packages, as well as threaded execution. This tutorial is based on our experiences with getting started with DS2 and learning to use it to access, manage, and share data in a scalable and standards-based way. It helps SAS users of all levels get started with DS2 and understand its basic functionality by practicing the features of DS2.
Peter Eberhardt, Fernwood Consulting Group Inc.
Xue Yao, Winnipeg Regional Health Authority
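A first DS2 program in the 'hello world' spirit (mine, not the authors'): a declared variable, a run() method, and a data set read directly with SET.

    proc ds2;
    data work.bmi (overwrite=yes);
      dcl double bmi;
      method run();
        set sashelp.class;
        bmi = (weight / (height ** 2)) * 703;
      end;
    enddata;
    run;
    quit;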
Soon after the advent of the SAS® hash object in SAS® 9.0, its early adopters realized that the potential functionality of the new structure is much broader than basic O(1)-time lookup and file matching. Specifically, they went on to invent methods of data aggregation based on the ability of the hash object to quickly store and update key summary information. They also demonstrated that DATA step aggregation using the hash object offered significantly lower run time and memory utilization compared to the SUMMARY/MEANS or SQL procedures, coupled with the possibility of eliminating the need to write the aggregation results to interim data files and the programming flexibility that allowed them to combine sophisticated data manipulation and adjustments of the aggregates within a single step. Such developments within the SAS user community did not go unnoticed by SAS R&D, and for SAS® 9.2 the hash object was enriched with tag parameters and methods specifically designed to handle aggregation without the need to write the summarized data to the PDV host variables and update the hash table with new key summaries, thus further improving run-time performance. As more SAS programmers applied these methods in their real-world practice, they developed aggregation techniques fit to various programmatic scenarios and ideas for handling the hash object memory limitations in situations calling for truly enormous hash tables. This paper presents a review of the DATA step aggregation methods and techniques using the hash object. The presentation is intended for all situations in which the final SAS code is either a straight Base SAS DATA step or a DATA step generated by any other SAS product.
Paul Dorfman, Dorfman Consulting
Don Henderson, Henderson Consulting Services
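The basic DATA step aggregation idiom described above, in miniature (my sketch using a SASHELP table):

    data _null_;
      length total 8;
      if _n_ = 1 then do;
        declare hash agg(ordered:'a');
        agg.defineKey('region');
        agg.defineData('region', 'total');
        agg.defineDone();
      end;
      set sashelp.shoes end=last;
      if agg.find() ne 0 then total = 0;  /* first time this key is seen */
      total = sum(total, sales);
      agg.replace();                      /* store the updated running sum */
      if last then agg.output(dataset:'work.region_totals');
    run;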
Many users would like to check the quality of data after the data integration process has loaded the data into a data set or table. The approach in this paper shows users how to develop a process that scores columns based on rules judged against a set of standards set by the user. Each rule has a standard that determines whether it passes, fails, or needs review (a green, red, or yellow score). A rule can be as simple as: Is the value for this column missing, or is this column within a valid range? Further, it includes comparing a column to one or more other columns, or checking for specific invalid entries. It also includes rules that compare a column value to a lookup table to determine whether the value is in the lookup table. Users can create their own rules, and each column can have any number of rules. For example, a rule can be created to measure a dollar column against a range of acceptable values. The user can determine that up to two percent of the values are allowed to be out of range. If two to five percent of the values are out of range, then the data should be reviewed. And, if over five percent of the values are out of range, the data is not acceptable. The entire table has a color-coded scorecard showing each rule and its score. Summary reports show columns by score and distributions of key columns. The scorecard enables the user to quickly assess whether the SAS data set is acceptable, or whether specific columns need to be reviewed. Drill-down reports enable the user to drill into the data to examine why a column scored as it did. Based on the scores, the data set can be accepted or rejected, and the user will know where and why the data set failed. The process can store each scorecard's data in a data mart. This data mart enables the user to review the quality of their data over time. It can answer questions such as: Is the quality of the data improving overall? Are there specific columns that are improving or declining over time? What can we do to improve the quality of our data? This scorecard is not intended to replace the quality control of the data integration or ETL process; it is a supplement to the ETL process. The programs are written using only Base SAS® and the Output Delivery System (ODS), macro variables, and formats. This presentation shows how to: (1) use ODS HTML; (2) color-code cells with the use of formats; (3) use formats as lookup tables; (4) use INCLUDE statements to make use of template code snippets to simplify programming; and (5) use hyperlinks to launch stored processes from the scorecard.
Tom Purvis, Qualex Consulting Services, Inc.
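Item (2) in that list, color-coding cells with formats, reduces to a few lines; this illustrative sketch of mine assumes a hypothetical rule-score data set with one row per rule.

    proc format;
      value score_bg low-<0.02  = 'lightgreen' /* pass   */
                     0.02-<0.05 = 'yellow'     /* review */
                     0.05-high  = 'salmon';    /* fail   */
    run;

    ods html file='scorecard.html';
    proc report data=rule_scores;
      column rule pct_out_of_range;
      define pct_out_of_range / display format=percent8.1
             style(column)={background=score_bg.};
    run;
    ods html close;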
As SAS® programmers and statisticians, we rarely write programs that are run only once and then set aside. Instead, we are often asked to develop programs very early in a project, on immature data, following specifications that may be little more than a guess as to what the data is supposed to look like. These programs will then be run repeatedly on periodically updated data through the duration of the project. This paper offers strategies for not only making those programs more flexible, so they can handle some of the more commonly encountered variations in that data, but also for setting traps to identify unexpected data points that require further investigation. We also touch upon some good programming practices that can benefit both the original programmer and others who might have to touch the code. In this paper, we provide explicit examples of defensive coding that will aid in kicking the tires, pumping the brakes, checking your blind spots, and merging ahead for quality programming from the beginning.
Donna Levy, Inventiv Health Clinical
Nancy Brucken, inVentiv Health Clinical
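One simple trap of the kind the authors describe (my sketch): write unexpected category values to the log, where routine log checks will catch them, instead of letting them silently vanish from tables.

    data adsl_checked;
      set adsl;
      /* literal split so scanners flag the log, not the program source */
      if sex not in ('F', 'M', '') then
        put 'WAR' 'NING: unexpected SEX value: ' sex= usubjid=;
    run;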
Using geocoded addresses from FDIC Summary of Deposits data together with Census geospatial data, including TIGER boundary files and population-weighted centroid shapefiles, we were able to calculate a reasonable distance threshold by metropolitan statistical area (MSA) (or metropolitan division (MD), where applicable) through a series of SAS® DATA steps and SQL joins. We first used a Cartesian join in PROC SQL on the data set containing population-weighted centroid coordinates. (The joined data set contained geocoded coordinates of approximately 91,000 full-service bank branches.) Using the GEODIST function in SAS, we were able to calculate the distance from the population-weighted centroid of each Census tract to the nearest bank branch. The tract data set was then grouped by MSA/MD and sorted within each grouping in ascending order of distance to the nearest bank branch. Using the RETAIN statement, we calculated the cumulative population and cumulative population percent for each MSA/MD. The reasonable threshold distance is established where the cumulative population percent is closest (in either direction, +/-) to 90%.
Sarah Campbell, Federal Deposit Insurance Corporation
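A condensed sketch of the computation (mine, with hypothetical tract and branch tables; GEODIST takes coordinates in degrees with the 'DM' option and returns miles):

    proc sql;
      create table tract_nearest as
      select t.tract_id, t.msa_md, t.pop,
             min(geodist(t.lat, t.lon, b.lat, b.lon, 'DM')) as miles_to_branch
      from tract_centroids t, branches b      /* Cartesian join */
      group by t.tract_id, t.msa_md, t.pop;
    quit;

    proc sort data=tract_nearest; by msa_md miles_to_branch; run;

    data cumulate;
      set tract_nearest;
      by msa_md;
      if first.msa_md then cum_pop = 0;
      cum_pop + pop;   /* the sum statement implies RETAIN */
    run;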
Intervals have been a feature of Base SAS® for a long time, enabling SAS users to work with commonly (and not-so-commonly) defined periods of time such as years, months, and quarters. With the release of SAS®9, there are more options and capabilities for intervals and their functions. This paper first discusses the basics of intervals in detail, and then discusses several of the enhancements to the interval feature, such as the ability to select how the INTCK function defines interval boundaries and the ability to create your own custom intervals beyond multipliers and shift operators.
Derek Morgan
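Two of the enhancements mentioned, in a sketch of mine: choosing how INTCK counts (discrete boundary crossings versus continuous spans) and shifting dates with INTNX.

    data _null_;
      d1 = '15JAN2015'd;
      d2 = '03MAR2015'd;
      months_d = intck('month', d1, d2);       /* boundaries crossed: 2     */
      months_c = intck('month', d1, d2, 'c');  /* continuous full months: 1 */
      next_qtr = intnx('qtr', d1, 1, 'begin'); /* first day of next quarter */
      put months_d= months_c= next_qtr= date9.;
    run;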
This paper puts forward an approach for creating duplicate records from a single Adverse Event (AE) record based on study treatments. To fulfill this task, a flag was created to check whether an existing AE spans different treatment periods and therefore requires duplicate records. If it does, we create the duplicate records and derive AE dates for them based on treatment periods or discontinuation dates.
Jonson Jiang, inVentiv Health
We have many options for performing merges or joins these days, each with various advantages and disadvantages. Depending on how you perform your joins, different checks can help you verify whether the join was successful. In this presentation, we look at some sample data, use different methods, and see what kinds of tests can be done to ensure that the results are correct. If the join is performed with PROC SQL and two criteria are fulfilled (the number of observations in the primary data set has not changed [presuming a one-to-one or many-to-one situation], and a variable that should be populated is not missing), then the merge was successful.
Emmy Pahmer, inVentiv Health
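Those two checks are easy to automate; in this sketch of mine the data sets are hypothetical and the counts are simply written to the log.

    proc sql;
      create table joined as
      select a.*, b.dose
      from subjects as a left join dosing as b
        on a.subjid = b.subjid;
    quit;

    proc sql noprint;
      select count(*) into :n_before from subjects;
      select count(*), count(*) - count(dose)
        into :n_after, :n_nodose from joined;
    quit;
    %put NOTE: rows before=&n_before, after=&n_after, missing dose=&n_nodose;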
Being able to split SAS® processing over multiple SAS processors on a single machine or over multiple machines running SAS, as in the case of SAS® Grid Manager, enables you to get more done in less time. This paper looks at the methods of using SAS/CONNECT® to process SAS code in parallel, including the SAS statements, macros, and PROCs available to make this processing easier for the SAS programmer. SAS products that automatically generate parallel code are also highlighted.
Doug Haigh, SAS
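The MP CONNECT pattern at the center of the topic, sketched with hypothetical jobs: two independent sorts run in parallel, and the parent session waits for both (each task has its own WORK library, a detail the paper addresses).

    options autosignon sascmd='!sascmd';

    rsubmit task1 wait=no;
      proc sort data=perm.claims out=work.claims_s; by id; run;
    endrsubmit;

    rsubmit task2 wait=no;
      proc sort data=perm.members out=work.members_s; by id; run;
    endrsubmit;

    waitfor _all_ task1 task2;
    signoff _all_;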
Your project ended a year ago, and now you need to explain what you did, or rerun some of your code. Can you remember the process? Can you describe what you did? Can you even find those programs? Visually presented here are examples of tools and techniques that follow best practices to help us, as programmers, manage the flow of information from source data to a final product.
Elizabeth Axelrod, Abt Associates Inc.
This paper explains best practices for using temporary files in SAS® programs. These practices include using the TEMP access method, writing to the WORK directory, and ensuring that you leave no litter files behind. An additional special temporary file technique is described for mainframe users.
Rick Langston, SAS
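The TEMP access method in brief (my sketch): the file lives in a system temporary location and disappears when the fileref is cleared or the session ends, leaving no litter.

    filename scratch temp;

    data _null_;
      file scratch;
      put 'intermediate results go here';
    run;

    data _null_;
      infile scratch;
      input;
      put _infile_;
    run;

    filename scratch clear;  /* the physical file is deleted */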
Programming SAS® has just been made easier, now that SAS 9.4 has incorporated the Lua programming language into the heart of the SAS System. With its elegant syntax, modern design, and support for data structures, Lua offers you a fresh way to write SAS programs, getting you past many of the limitations of the SAS macro language. This paper shows how you can get started using Lua to drive SAS, via a quick introduction to Lua and a tour through some of the features of the Lua and SAS combination that make SAS programming easier. SAS macro programming is also compared with Lua, so that you can decide where you might benefit most by using either language.
Paul Tomas, SAS
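A taste of the combination (my sketch, following the documented submit-block pattern): Lua variables are substituted into generated SAS code through @name@ placeholders.

    proc lua;
    submit;
      local ds = "sashelp.class"
      -- generate and run a SAS step from Lua
      sas.submit([[
        proc means data=@ds@;
          var height weight;
        run;
      ]], {ds = ds})
    endsubmit;
    run;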
Dynamic, interactive visual displays known as dashboards are most effective when they show essential graphs, tables, statistics, and other information where data is the star. The first rule for creating an effective dashboard is to keep it simple. Striking a balance between content and style, a dashboard should be devoid of excessive clutter so as not to distract from and obscure the information displayed. The second rule of effective dashboard design involves displaying data that meets one or more business or organizational objectives. To accomplish this, the elements in a dashboard should be presented in a format easily understood by its intended audience. Attendees learn how to create dynamic, interactive user- and data-driven dashboards, graphical and table-driven dashboards, statistical dashboards, and drill-down dashboards with a purpose.
Kirk Paul Lafler, Software Intelligence Corporation
Programmers can create keyboard macros to perform common editing tasks in SAS® Enterprise Guide®. This paper introduces how to record keystrokes, save a keyboard macro, edit the commands, and assign a shortcut key. Sample keyboard macros are included. Techniques to share keyboard macros are also covered.
Christopher Bost, MDRC
Respondent Driven Sampling (RDS) is both a sampling method and a data analysis technique. As a sampling method, RDS is a chain referral technique with strategic recruitment quotas and specific data gathering requirements. Like other chain referral techniques (for example, snowball sampling), the chains and waves are the starting point for analysis. But building the chains and waves can still be a daunting task because it involves many transpositions and merges. This paper provides an efficient method of using Base SAS® to build the chains and waves.
Wen Song, ICF International
For the Research Data Centers (RDCs) of the United States Census Bureau, the demand for disk space substantially increases with each passing year. Efficient use of SAS® data views might successfully address disk space challenges within the RDCs. This paper discusses the usage and benefits of SAS data views to save disk space and reduce the time and effort required to manage large data sets. The ability and efficiency of SAS data views to process regular ASCII, compressed ASCII, and other commonly used file formats are analyzed and evaluated in detail. The authors discuss ways in which using SAS data views is more efficient than traditional methods for processing and deploying the large census and survey data in the RDCs.
Shigui Weng, US Bureau of the Census
Shy Degrace, US Bureau of the Census
Ya Jiun Tsai, US Bureau of the Census
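The core disk-saving device, sketched by me with a hypothetical ASCII extract: the view stores only the input program, so the raw file is never duplicated as a permanent SAS data set.

    data rdc.survey_v / view=rdc.survey_v;
      infile '/data/extract/survey2014.txt' lrecl=400 truncover;
      input hh_id $9. state $2. income 12.;
    run;

    proc means data=rdc.survey_v;  /* the file is read only now */
      var income;
    run;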
As many leadership experts suggest, growth happens only when it is intentional. Growth is vital to the immediate and long-term success of our employees as well as our employers. As SAS® programming leaders, we have a responsibility to encourage individual growth and to provide the opportunity for it. With an increased workload yet fewer resources, initial and ongoing training seem to be deemphasized as we are pressured to meet project timelines. The current workforce continues to evolve with time and technology. More important than simply providing the opportunity for training, individual trainees need the motivation for any training program to be successful. Although many existing principles for growth remain true, how such principles are applied needs to evolve with the current generation of SAS programmers. The primary goal of this poster is to identify the critical components that we feel are necessary for the development of an effective training program, one that meets the individual needs of the current workforce. Rather than proposing a single 12-step program that works for everyone, we think that identifying key components for enhancing the existing training infrastructure is a step in the right direction.
Amber Randall, Axio Research
When faced with a difficult data reduction problem, a SAS® programmer has many options for how to solve the problem. In this presentation, three different methods are reviewed and compared in terms of processing time, debugging, and ease of understanding. The three methods include linearizing the data, using SQL Cartesian joins, and using sequential data processing. Inconsistencies in the raw data caused the data linearization to be problematic. The large number of records and the need for many-to-many merges resulted in a long run time for the SQL code. The sequential data processing, although older technology, provided the most time-efficient and error-free results.
Carry Croghan, US-EPA
Building and maintaining a data warehouse can require a complex series of jobs. Having an ETL flow that is reliable and well integrated is one big challenge. An ETL process might need some pre- and post-processing operations on the database to be well integrated and reliable. Some might handle this via maintenance windows. Others like us might generate custom transformations to be included in SAS® Data Integration Studio jobs. Custom transformations in SAS Data Integration Studio can be used to speed ETL process flows and reduce the database administrator's intervention after ETL flows are complete. In this paper, we demonstrate the use of custom transformations in SAS Data Integration Studio jobs to handle database-specific tasks for improving process efficiency and reliability in ETL flows.
Emre Saricicek, University of North Carolina at Chapel Hill
Dean Huff, UNC
Are you looking to track changes to your SAS® programs? Do you wish you could easily find errors, warnings, and notes in your SAS logs? Looking for a convenient way to find point-and-click tasks? Want to search your SAS® Enterprise Guide® project? How about a point-and-click way to view SAS system options and SAS macro variables? Or perhaps you want to upload data to the SAS® LASR™ Analytics Server, view SAS® Visual Analytics reports, or run SAS® Studio tasks, all from within SAS Enterprise Guide? You can find these capabilities and more in SAS Enterprise Guide. Knowing what tools are at your disposal and how to use them will put you a step ahead of the rest. Come learn about some of the newer features in SAS Enterprise Guide 7.1 and how you can leverage them in your work.
Casey Smith, SAS
The PROPCASE function is useful when you are cleansing a database of names and addresses in preparation for mailing. But it does not know the difference between a proper name (in which initial capitalization should be used) and an acronym (which should be all uppercase). This paper explains an algorithm that determines with reasonable accuracy whether a word is an acronym and, if it is, converts it to uppercase.
Joe DeShon, Boehringer Ingelheim Vetmedica
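A simplified stand-in for the paper's algorithm (my own heuristic, not the author's): treat a word with no vowels as an acronym and uppercase it, otherwise apply PROPCASE.

    data mail_fixed;
      set mail;                      /* hypothetical name-and-address file */
      length fixed $100 word $40;
      fixed = '';
      do i = 1 to countw(name, ' ');
        word = scan(name, i, ' ');
        if not prxmatch('/[AEIOUYaeiouy]/', word) then word = upcase(word);
        else word = propcase(word);
        fixed = catx(' ', fixed, word);
      end;
      drop i word;
    run;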
Data access collisions occur when two or more processes attempt to gain concurrent access to a single data set. Collisions are a common obstacle to SAS® practitioners in multi-user environments. As SAS instances expand to infrastructures and ultimately empires, the inherent increased complexities must be matched with commensurately higher code quality standards. Moreover, permanent data sets will attract increasingly more devoted users and automated processes clamoring for attention. As these dependencies increase, so too does the likelihood of access collisions that, if unchecked or unmitigated, lead to certain process failure. The SAS/SHARE® module offers concurrent file access capabilities, but causes a (sometimes dramatic) reduction in processing speed, must be licensed and purchased separately from Base SAS®, and is not a viable solution for many organizations. Previously proposed solutions in Base SAS use a busy-wait spinlock cycle to repeatedly attempt file access until process success or timeout. While effective, these solutions are inefficient because they generate only read-write locked data sets that unnecessarily prohibit access by subsequent read-only requests. This presentation introduces the %LOCKITDOWN macro that advances previous solutions by affording both read-write and read-only lock testing and deployment. Moreover, recognizing the responsibility for automated data processes to be reliable, robust, and fault tolerant, %LOCKITDOWN is demonstrated in the context of a macro-based exception handling paradigm.
Troy Hughes, Datmesis Analytics
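For contrast, here is the classic busy-wait pattern the paper improves upon, adapted from commonly published examples (this is not the author's %LOCKITDOWN):

    %macro trylock(member=, timeout=60, retry=3);
      %local start;
      %let start = %sysfunc(datetime());
      %do %until (&syslckrc = 0
                  or %sysevalf(%sysfunc(datetime()) > &start + &timeout));
        lock &member;
        %if &syslckrc ne 0 %then %let rc = %sysfunc(sleep(&retry, 1));
      %end;
    %mend trylock;

    %trylock(member=perm.claims)
    /* ... update perm.claims ... */
    lock perm.claims clear;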
Quality measurement is increasingly important in the health-care sphere for both performance optimization and reimbursement. Treatment of chronic conditions is a key area of quality measurement. However, medication compendiums change frequently, and health-care providers often free-text medications into a patient's record. Manually reviewing a complete medications database is time consuming. In order to build a robust medications list, we matched a pharmacist-generated list of categorized medications to a raw medications database that contained names, name-dose combinations, and misspellings. The matching relies on the COMPGED function, which computes a generalized edit distance between two strings. We combined a truncation function and an upcase function to optimize the output of COMPGED. Using these combinations and manipulating the scoring metric of COMPGED enabled us to narrow the database list to medications that were relevant to our categories. This process transformed a tedious task for PROC COMPARE or an Excel macro into a quick and efficient method of matching. The task of sorting through relevant matches was still conducted manually, but the time required to do so was significantly decreased by the fuzzy matching in our application of COMPGED.
Arti Virkud, NYC Department of Health
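The heart of the method in miniature (my sketch; the tables, truncation length, and cutoff of 100 are illustrative rather than the authors' tuned values):

    proc sql;
      create table candidate_matches as
      select r.raw_name, s.std_name,
             compged(upcase(substr(r.raw_name, 1, 10)),
                     upcase(substr(s.std_name, 1, 10))) as ged_score
      from raw_meds as r, std_meds as s
      where calculated ged_score <= 100
      order by r.raw_name, calculated ged_score;
    quit;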
When ODS Graphics was introduced a few years ago, it gave SAS users an array of new ways to generate graphs. One of those ways is with Statistical Graphics procedures. Now, with just a few lines of simple code, you can create a wide variety of high-quality graphs. This paper shows how to produce single-celled graphs using PROC SGPLOT and paneled graphs using PROC SGPANEL. This paper also shows how to send your graphs to different ODS destinations, how to apply ODS styles to your graphs, and how to specify properties of graphs, such as format, name, height, and width.
Susan Slaughter, Avocet Solutions
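Both procedures in their simplest form (my examples, using a SASHELP table):

    proc sgplot data=sashelp.class;
      scatter x=height y=weight / group=sex;
    run;

    proc sgpanel data=sashelp.class;
      panelby sex / columns=2;
      histogram weight;
    run;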
Axis tables, polygon plots, text plots, and more features have been added to the Statistical Graphics (SG) procedures and the Graph Template Language (GTL) for SAS® 9.4. These additions are a direct result of your feedback and are designed to make creating graphs easier. Axis tables let you add multiple tables of data to your graphs, correctly aligned with the axis values and using the right colors for group values in your data. Text plots can have rotated and aligned text anywhere in the graph. You can overlay jittered markers on box plots, use images and font glyphs as markers, specify group attributes without making style changes, and create entirely new custom graphs using the polygon plot. All this without using the annotation facility, which is now supported for both SG procedures and GTL. This paper guides you through these exciting new features now available in SG procedures and GTL.
Sanjay Matange, SAS
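For instance, an axis table can line group counts up under a bar chart's categories; a small sketch of mine using a pre-summarized SASHELP table:

    proc means data=sashelp.class noprint nway;
      class age;
      var height;
      output out=stats mean=mean_ht n=n_obs;
    run;

    proc sgplot data=stats;
      vbarparm category=age response=mean_ht;
      xaxistable n_obs;   /* aligned under each age category */
    run;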
The SAS® Macro Language is a powerful tool for extending the capabilities of the SAS® System. This hands-on workshop teaches essential macro coding concepts, techniques, tips, and tricks to help beginning users learn the basics of how the macro language works. Using a collection of proven macro language coding techniques, attendees learn how to write and process macro statements and parameters; replace text strings with macro (symbolic) variables; generate SAS code using macro techniques; manipulate macro variable values with macro functions; create and use global and local macro variables; construct simple arithmetic and logical expressions; interface the macro language with the SQL procedure; store and reuse macros; troubleshoot and debug macros; and develop efficient and portable macro language code.
Kirk Paul Lafler, Software Intelligence Corporation
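A small macro touching several of those topics at once (parameters, a macro function, conditional logic, and generated SAS code); it is my sketch rather than workshop material.

    %macro summarize(data=, var=);
      %if %length(&data) = 0 or %length(&var) = 0 %then
        %put ERROR: both DATA= and VAR= are required.;
      %else %do;
        title "Summary of %upcase(&var) in %upcase(&data)";
        proc means data=&data n mean std min max;
          var &var;
        run;
        title;
      %end;
    %mend summarize;

    %summarize(data=sashelp.class, var=height)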
Graduate students encounter many challenges when conducting health services research using real world data obtained from electronic health records (EHRs). These challenges include cleaning and sorting data, summarizing and identifying present-on-admission diagnosis codes, identifying appropriate metrics for risk-adjustment, and determining the effectiveness and cost effectiveness of treatments. In addition, outcome variables commonly used in health service research are not normally distributed. This necessitates the use of nonparametric methods in statistical analyses. This paper provides graduate students with the basic tools for the conduct of health services research with EHR data. We will examine SAS® tools and step-by-step approaches used in an analysis of the effectiveness and cost-effectiveness of the ABCDE (Awakening and Breathing Coordination, Delirium monitoring/management, and Early exercise/mobility) bundle in improving outcomes for intensive care unit (ICU) patients. These tools include the following: (1) ARRAYS; (2) lookup tables; (3) LAG functions; (4) PROC TABULATE; (5) recycled predictions; and (6) bootstrapping. We will discuss challenges and lessons learned in working with data obtained from the EHR. This content is appropriate for beginning SAS users.
Ashley Collinsworth, Baylor Scott & White Health/Tulane University
Elisa Priest, Texas A&M University Health Science Center
SAS® users are already familiar with the FCMP procedure and the flexibility it provides them in writing their own functions and subroutines. However, did you know that FCMP also allows you to call functions written in C? Did you know that you can create and populate complex C structures and use C types in FCMP? With the PROTO procedure, you can define function prototypes, structures, enumeration types, and even small bits of C code. This paper gets you started on how to use the PROTO procedure and, in turn, how to call your C functions from within FCMP and SAS.
Andrew Henrick, SAS
Karen Croft, SAS
Donald Erdman, SAS
No matter what type of programming you do in a pharmaceutical environment, there will eventually be a need to combine your data with a lookup table. This lookup table could be a code list for adverse events, a list of names for visits, one of your own summary data sets containing totals that you will be using to calculate percentages, or any number of other possibilities. This paper describes and discusses the reasons for using five different simple ways to merge data sets with lookup tables, so that when you take over the maintenance of a new program, you will be ready for anything!
Philip Holland, Holland Numerics Limited
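One member of that family, the format-based lookup, in sketch form (hypothetical code-list and AE data sets): the lookup table becomes a format, so no sort or merge is needed.

    data cntlin;
      set codelist(rename=(ae_code=start ae_term=label));
      retain fmtname '$aedecod' type 'C';
    run;
    proc format cntlin=cntlin; run;

    data ae_decoded;
      set ae;
      ae_term = put(ae_code, $aedecod.);
    run;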
With the constant need to inform researchers about neighborhood health data, the Santa Clara County Health Department created socio-demographic and health profiles for 109 neighborhoods in the county. Data was pulled from many public and county data sets, compiled, analyzed, and automated using SAS®. With over 60 indicators and 109 profiles, an efficient set of macros was used to automate the calculation of percentages, rates, and mean statistics for all of the indicators. Macros were also used to automate individual census tracts into pre-decided neighborhoods to avoid data entry errors. Simple SQL procedures were used to calculate and format percentages within the macros, and output was pushed out using Output Delivery System (ODS) Graphics. This output was exported to Microsoft Excel, which was used to create a sortable database for end users to compare cities and/or neighborhoods. Finally, the automated SAS output was used to map the demographic data using geographic information system (GIS) software at three geographies: city, neighborhood, and census tract. This presentation describes the use of simple macros and SAS procedures to reduce resources and time spent on checking data for quality assurance purposes. It also highlights the simple use of ODS Graphics to export data to an Excel file, which was used to mail merge the data into 109 unique profiles. The presentation is aimed at intermediate SAS users at local and state health departments who might be interested in finding an efficient way to run and present health statistics given limited staff and resources.
Roshni Shah, Santa Clara County
Storage space on a UNIX platform is a costly--and finite--resource to maintain, even under ideal conditions. By regularly monitoring and promptly responding to space limitations that might occur during production, an organization can mitigate the risk of wasted expense, time, and effort caused by this problem. SAS® programmers at Truven Health Analytics have designed a reporting tool to measure space usage by a number of distinct factors over time. Using tabular and graphical output, the tool provides a full picture of what often contributes to critical reductions of available hardware space. It enables managers and users to respond appropriately and effectively whenever this occurs. It also helps to identify ways to encourage more efficient practices, thereby minimizing the likelihood of this occurring in the future. Environment: Red Hat Enterprise Linux (RHEL) 5.4 on an Oracle Sun Fire X4600 M2, running SAS® 9.3 TS1M1.
Matthew Shevrin, Truven Health Analytics
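The collection step of such a tool can be as simple as reading operating-system output through a pipe; a hedged sketch (df output layout varies by platform, so the INPUT statement below is illustrative):

    filename df pipe 'df -k /sasdata';

    data space_usage;
      infile df firstobs=2 truncover;
      input filesystem :$40. kb_total kb_used kb_avail pct_used :$5. mount :$60.;
      snapshot = datetime();
      format snapshot datetime20.;
    run;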
The widely used method to convert RSR XML data to some standard, ready-to-process database uses a Visual Basic mapper as a buffer tool when reading XML data (for example, into an MS Access database). This paper describes the shortcomings of this method with respect to the different schemas of RSR data and offers a SAS® macro that enables users to read any schema of RSR data directly into a SAS relational database. This macro entirely eliminates the step of creating an MS Access database. Using our macro, the user can cut the time of processing Ryan White data by 99% or more, depending on the number of files that need to be processed in one run.
Michael Costa, Abt Associates
Fizza Gillani, Brown University and Lifespan/Tufts/Brown Center for AIDS Research
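For orientation, the SAS-native route that such a macro builds on looks like this (my sketch; the XML file and XMLMap are placeholders): the XMLV2 LIBNAME engine reads the XML directly, with no Access database in between.

    libname rsr xmlv2 'client_2014.xml' xmlmap='rsr_schema.map' access=readonly;

    proc copy in=rsr out=work;  /* materialize the XML tables as SAS data sets */
    run;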
The DATA step has served SAS® programmers well over the years, and although it is powerful, it has not fundamentally changed. With DS2, SAS introduced a significant alternative to the DATA step by providing an object-oriented programming environment. In this paper, we share our experiences with getting started with DS2 and learning to use it to access, manage, and share data in a scalable, threaded, and standards-based way.
Peter Eberhardt, Fernwood Consulting Group Inc.
Xue Yao, Winnipeg Regional Health Authority
Why did my merge fail? How did that variable get truncated? Why am I getting unexpected results? Understanding how the DATA step actually works is the key to answering these and many other questions. In this paper, two independent consultants with a combined three decades of SAS® programming experience share a treasure trove of knowledge aimed at helping the novice SAS programmer take his or her game to the next level by peering behind the scenes of the DATA step. We touch on a variety of topics, including compilation versus execution, the program data vector, and proper merging techniques, with a focus on good programming practices that help the programmer steer clear of common pitfalls.
Joshua Horstman, Nested Loop Consulting
Britney Gilbert, Juniper Tree Consulting
Joshua Horstman, Nested Loop Consulting
In SAS® software development, data specifications and process requirements can be built into a user-defined control data set that functions as a component of ETL routines. A control data set provides a comprehensive definition of the data source, relationship, logic, description, and metadata of each data element. This approach facilitates auto-generation of SAS code during program execution to perform data ingestion, transformation, and loading procedures based on rules defined in the table. This paper demonstrates the application of a control data set for the following: (1) data table initialization and integration; (2) validation and quality control; (3) element transformation and creation; (4) data loading; and (5) documentation. SAS programmers and business analysts will find programming development and maintenance of business rules more efficient with this standardized method.
Edmond Cheng, CACI International Inc
Microsoft SharePoint has been adopted by a number of companies today as their content management tool because of its ability to create and manage documents, records, and web content. It is described as an enterprise collaboration platform with a variety of capabilities, and thus it stands to reason that this platform should also be used to surface content from analytical applications such as SAS® and the R language. SAS provides various methods for surfacing SAS content through SharePoint. This paper describes one such methodology that is both simple and elegant, requiring only SAS Foundation. It also explains how SAS and R can be used together to form a robust solution for delivering analytical results. The paper outlines the approach for integrating both languages into a single security model that uses Microsoft Active Directory as the primary authentication mechanism for SharePoint. It also describes how to extend the authorization to SAS running on a Linux server where LDAP is used. Users of this system are blissfully ignorant of the back-end technology components, as we offer up a seamless interface where they simply authenticate to the SharePoint site and the rest is, as they say, magic.
Piyush Singh, Tata Consultancy Services Limited
Prasoon Sangwan, Tata Consultancy Services
Shiv Govind Yadav
The SAS® hash object is an incredibly powerful technique for integrating data from two or more data sets based on a common key. This session describes the basic methodology for defining, populating, and using a hash object to perform lookups within the DATA step and provides examples of situations in which the performance of SAS programs is improved by their use. Common problems encountered when using hash objects are explained, and tools and techniques for optimizing hash objects within your SAS program are demonstrated.
Chris Schacherer, Clinical Data Management Systems, LLC
From stock price histories to hospital stay records, analysis of time series data often requires the use of lagged (and occasionally lead) values of one or more analysis variables. For the SAS® user, the central operational task is typically getting lagged (lead) values for each time point in the data set. Although SAS has long provided a LAG function, it has no analogous lead function--an especially significant problem in the case of large data series. This paper reviews the LAG function (in particular, the powerful but non-intuitive implications of its queue-oriented basis), demonstrates efficient ways to generate leads with the same flexibility as the LAG function (but without the common and expensive recourse of data re-sorting), and shows how to dynamically generate leads and lags through the use of the hash object.
Mark Keintz, Wharton Research Data Services
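The merge-with-offset idiom for leads, shown in a sketch of mine: the data set is merged with itself starting from its second observation, producing a one-step lead with no re-sorting.

    data with_lead;
      merge prices
            prices(firstobs=2 keep=close rename=(close=close_lead));
      /* the final observation's lead is missing, as expected */
    run;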
Across the languages of SAS® are many golden nuggets--functions, formats, and programming features just waiting to impress your friends and colleagues. Learning SAS over 30+ years, I have collected a few, and I offer them to you in this presentation.
Peter Crawford, Crawford Software Consultancy Limited
A forest plot is a common visualization for meta-analysis. Some popular versions that use subgroups with indented text and bold fonts can seem outright daunting to create. With SAS® 9.4, the Graph Template Language (GTL) has introduced the AXISTABLE statement, specifically designed for including text data columns into a graph. In this paper, we demonstrate the simplicity of creating various forest plots using AXISTABLE statements. Come and see how to create forest plots as clear as day!
Prashant Hebbar, SAS
SAS® provides a number of tools for creating customized professional reports. SAS provides point-and-click interfaces through products such as SAS® Web Report Studio, SAS® Visual Analytics, and even SAS® Enterprise Guide®; unfortunately, many users do not have access to the high-end tools and require customization beyond the SAS Enterprise Guide point-and-click interface. Fortunately, Base SAS® procedures such as the REPORT procedure, combined with graphics procedures, macros, ODS, and Annotate, can be used to create very customized professional reports. When combining different solutions such as SAS Statistical Graphics, the REPORT procedure, ODS, and SAS/GRAPH®, different techniques need to be used to keep the same look and feel throughout the report package. This presentation looks at solutions that can be used to keep a consistent look and feel in a report package created with different SAS products.
Barbara Okerson, Anthem
You might be familiar with or experienced in writing or running reports using PROC REPORT, PROC TABULATE, or other methods of report generation. These reporting methods are often very flexible, but they can be limited in the statistics that are available as options for inclusion in the resulting output. SAS® provides the capability to produce a variety of statistics through Base SAS® and SAS/STAT® procedures by using ODS OUTPUT. These procedures include statistics from PROC CORR, PROC FREQ, and PROC UNIVARIATE in Base SAS, as well as PROC GLM, PROC LIFETEST, PROC MIXED, PROC LOGISTIC, and PROC TTEST in SAS/STAT. A number of other procedures can also produce useful ODS OUTPUT objects. Commonly requested statistics for reports include p-values, confidence intervals, and test statistics. These values can be computed with the appropriate procedure; ODS OUTPUT can then be used to write the desired information to a data set and combine it with the other data used to produce the report. Examples that demonstrate how to easily generate the desired statistics and include them in the requested final reports are provided and discussed.
Debbie Buck, inVentiv Health Clinical
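The pattern in brief (my example): the TTests table from PROC TTEST is captured as a data set, ready to be merged into the report data.

    ods output ttests=work.tt;
    proc ttest data=sashelp.class;
      class sex;
      var height;
    run;

    proc print data=work.tt;  /* contains t values, DF, and p-values */
    run;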
This paper describes the new features added to the macro facility in SAS® 9.3 and SAS® 9.4. New features described include the /READONLY option for macro variables, the %SYSMACEXIST macro function, the %PUT &= feature, and new automatic macro variables such as &SYSTIMEZONEOFFSET.
Rick Langston, SAS
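Each feature in roughly one line (my examples; the %PUT results noted in comments assume the values set here):

    %global / readonly study=ABC123;  /* SAS 9.3+: protected macro variable */
    %put &=study;                     /* writes: STUDY=ABC123 */
    %macro hello; %put Hi.; %mend;
    %put %sysmacexist(hello);         /* writes: 1 */
    %put &systimezoneoffset;          /* SAS 9.4: offset from UTC, in seconds */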
SAS® SQL is so powerful that you hardly miss using Oracle PL/SQL. One SAS SQL forte can be found in using the SQL reflexive join. Another area of SAS SQL strength is the SQL subquery concept. The focus of this paper is to show alternative approaches to data reporting and to show how to surface data quality problems using reflexive join and subquery SQL concepts. The target audience for this paper is the intermediate SAS programmer or the experienced ANSI SQL programmer new to SAS programming.
Cynthia Trinidad, Theorem Clinical Research
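A reflexive join and a subquery put to data-quality work (my sketch, with hypothetical clinical tables):

    proc sql;
      /* reflexive join: the same subject keyed in with two birth dates */
      select a.subjid, a.brthdat, b.brthdat as brthdat2
      from demog as a join demog as b
        on a.subjid = b.subjid and a.brthdat < b.brthdat;

      /* subquery: visits recorded for subjects missing from demog */
      select subjid, visit
      from visits
      where subjid not in (select subjid from demog);
    quit;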
The DS2 programming language was introduced as part of the SAS® 9.4 release. Although this new language introduced many significant advancements, one of the most overlooked features is the addition of object-oriented programming constructs. Specifically, the addition of user-defined packages and methods enables programmers to create their own objects, greatly increasing the opportunity for code reuse and decreasing both development and QA duration. In addition, using this object-oriented approach provides a powerful design methodology where objects closely resemble the real-world entities that they model, leading to programs that are easier to understand and maintain. This paper introduces the object-oriented programming paradigm in a three-step manner. First, the key object-oriented features found in the DS2 language are introduced, and the value each provides is discussed. Next, these object-oriented concepts are demonstrated through the creation of a blackjack simulation where the players, the dealer, and the deck are modeled and coded as objects. Finally, a credit risk scoring object is presented to demonstrate the application of this approach in a real-world setting.
Shaun Kaufmann, Farm Credit Canada
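A toy user-defined package (mine, far simpler than the paper's blackjack and credit-scoring objects) showing the define, instantiate, and call cycle:

    proc ds2;
    package greeter / overwrite=yes;
      method hello(varchar(20) name) returns varchar(40);
        return cat('Hello, ', name, '!');
      end;
    endpackage;

    data _null_;
      method init();
        dcl package greeter g();
        dcl varchar(40) msg;
        msg = g.hello('DS2');
        put msg;
      end;
    enddata;
    run;
    quit;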
SAS® data sets have PROC DATASETS, and SAS catalogs have PROC CATALOG. Find out what the little-known PROC CATALOG can do for you!
Louise Hadden, Abt Associates Inc.
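A flavor of what it can do (my sketch, with a hypothetical catalog): list the entries, then copy the whole catalog elsewhere.

    proc catalog cat=work.formats;
      contents;             /* list all entries in the catalog */
    run;
      copy out=work.backup; /* copy entries to another catalog */
    run;
    quit;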
The task was to produce a figure legend that gave the quintile ranges of a continuous measure corresponding to each color on a five-color choropleth US map. Actually, we needed to produce the figures and associated legends for several dozen maps for several dozen different continuous measures and time periods, as well as create the associated alt text for compliance with Section 508. So, the process needed to be automated. A method was devised using PROC RANK to generate the quintiles, PROC SQL to get the data value ranges within each quintile, and PROC FORMAT (with the CNTLIN= option) to generate and store the legend labels. The resulting data files and format catalogs were used to generate both the maps (with legends) and associated alt text. Then, these processes were rolled into a macro to apply the method for the many different maps and their legends. Each part of the method is quite simple--even mundane--but together, these techniques enabled us to standardize and automate an otherwise very tedious process. The same basic strategy could be used whenever you need to dynamically generate data buckets and keep track of the bucket boundaries (for producing labels, map legends, or alt text or for benchmarking future data against the stored categories).
Christianna Williams, Self-Employed
Louise Hadden, Abt Associates Inc.
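A hedged sketch of the core steps, with hypothetical data set and variable names:

   proc rank data=have out=ranked groups=5;
      var measure;
      ranks quintile;                 /* 0-4 group number for each record */
   run;

   proc sql;
      create table legend as          /* data value range within each quintile */
      select quintile + 1 as start,
             cats(put(min(measure), 8.1), ' - ', put(max(measure), 8.1)) as label
        from ranked
        group by quintile;
   quit;

   data fmt;                          /* CNTLIN= control data set */
      retain fmtname 'qntl' type 'n';
      set legend;
   run;

   proc format cntlin=fmt;
   run;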
One of the fascinating features of SAS® is that the software often provides multiple ways to accomplish the same task. A perfect example of this is the aggregation and summarization of data across multiple rows or BY groups of interest. These groupings can be study participants, time periods, geographical areas, or just about any type of discrete classification that you want. While many SAS programmers might be accustomed to accomplishing these aggregation tasks with PROC SUMMARY (or equivalently, PROC MEANS), PROC SQL can also do a bang-up job of aggregation--often with less code and fewer steps. This step-by-step paper explains how to use PROC SQL for a variety of summarization and aggregation tasks. It uses a series of concrete, task-oriented examples to do so. The presentation style is similar to that used in the author's previous paper, 'PROC SQL for DATA Step Die-Hards.'
Christianna Williams, Self-Employed
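For instance, a single hedged PROC SQL step produces group counts and summary statistics in one pass:

   proc sql;
      create table class_stats as
      select sex,
             count(*)    as n,
             avg(height) as mean_height,
             max(weight) as max_weight
        from sashelp.class
        group by sex;          /* one step, no PROC SORT or PROC MEANS needed */
   quit;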
It is a common task to reshape your data from long to wide for the purpose of reporting or analytical modeling and PROC TRANSPOSE provides a convenient way to accomplish this. However, when performing the transpose action on large tables stored in a database management system (DBMS) such as Teradata, the performance of PROC TRANSPOSE can be significantly compromised. In this case, it is more efficient for the DBMS to perform the transpose task. SAS® provides in-database processing technology in PROC SQL, which allows the SQL explicit pass-through method to push some or all of the work to the DBMS. This technique has facilitated integration between SAS and a wide range of data warehouses and databases, including Teradata, EMC Greenplum, IBM DB2, IBM Netezza, Oracle, and Aster Data. This paper uses the Teradata database as an example DBMS and explains how to transpose a large table that resides in it using the SQL explicit pass-through method. The paper begins with comparing the execution time using PROC TRANSPOSE with the execution time using SQL explicit pass-through. From this comparison, it is clear that SQL explicit pass-through is more efficient than the traditional PROC TRANSPOSE when transposing Teradata tables, especially large tables. The paper explains how to use the SQL explicit pass-through method and discusses the types of data columns that you might need to transpose, such as numeric and character. The paper presents a transpose solution for these types of columns. Finally, the paper provides recommendations on packaging the SQL explicit pass-through method by embedding it in a macro. SAS programmers who are working with data stored in an external DBMS and who would like to efficiently transpose their data will benefit from this paper.
Tao Cheng, Accenture
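A hedged sketch of the pass-through pattern (connection options and table names are hypothetical); the CASE/MAX trick performs the transpose inside Teradata:

   proc sql;
      connect to teradata (server=tdserv user=myuser password=XXXXX);
      create table wide as
      select * from connection to teradata
      (  select id,
                max(case when month = 1 then amount end) as amt_jan,
                max(case when month = 2 then amount end) as amt_feb,
                max(case when month = 3 then amount end) as amt_mar
           from mydb.sales_long
           group by id );
      disconnect from teradata;
   quit;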
Do you have reports based on SAS/GRAPH® procedures, customized with multiple GOPTIONS? Do you dream of those same graphs existing in a GOPTIONS and ANNOTATE-free world? Re-creating complex graphs using statistical graphics (SG) procedures is not only possible, but much easier than you think! Using before and after examples, I discuss how the graphs were created using the combination of Graph Template Language (GTL) and the SG procedures. This method produces graphs that are nearly indistinguishable from the original. This method simplifies the code required to make complex graphs, allows for maximum re-usability for graphics code, and enables changes to cascade to multiple reports simultaneously.
Julie VanBuskirk, Nurtur
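As a hedged flavor of the 'after' code, a scatter plot with a fit line needs no GOPTIONS or ANNOTATE:

   proc sgplot data=sashelp.class;
      scatter x=height y=weight / group=sex;
      reg x=height y=weight;           /* overlay a regression line */
      xaxis label='Height (in)';
      yaxis label='Weight (lb)';
   run;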
Although it does not happen every day, it is not unusual to need to place a quoted string within another quoted string. Fortunately, SAS® recognizes both single and double quote marks and either can be used within the other, which gives you the ability to have two-deep quoting. There are situations, however, where two kinds of quotes are not enough. Sometimes you need a third layer or, more commonly, you need to use a macro variable within the layers of quotes. Macro variables can be especially problematic, because they generally do not resolve when they are inside single quotes. However, this is SAS and that implies that there are several things going on at once and that there are several ways to solve these types of quoting problems. The primary goal of this presentation is to assist the programmer with solutions to the quotes-within-quotes problem with special emphasis on the presence of macro variables. The various techniques are contrasted as are the likely situations that call for these types of solutions. A secondary goal of this presentation is to help you understand how SAS works with quote marks and how it handles quoted strings. Without going into the gory details, a high-level understanding can be useful in a number of situations.
Art Carpenter, California Occidental Consultants
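A minimal hedged illustration of the core behavior:

   %let name = World;
   data _null_;
      msg1 = 'Hello, &name';                /* single quotes: not resolved */
      msg2 = "Hello, &name";                /* double quotes: resolved */
      msg3 = "He said ""Hello, &name""";    /* doubled quotes nest a quote */
      put msg1= / msg2= / msg3=;
   run;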
REST is being used across the industry for designing networked applications to provide lightweight and powerful alternatives to web services such as SOAP and Web Services Description Language (WSDL). Since REST is based entirely around HTTP, SAS® provides everything you need to make REST calls and process structured and unstructured data alike. Learn how PROC HTTP and other SAS language features provide everything you need to simply and securely make use of REST.
Joseph Henry, SAS
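A minimal hedged REST call (the URL is a public test endpoint, shown only for illustration):

   filename resp temp;
   proc http
      url="https://httpbin.org/get"
      method="GET"
      out=resp;
   run;

   data _null_;            /* echo the JSON response to the log */
      infile resp;
      input;
      put _infile_;
   run;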
The IN operator within the DATA step is used to search a specific variable for particular values, either numeric or character (for example, if X in (2, 5) then ...). This brief note explains how the opposite situation can be managed: that is, how to search several variables for a specific value by applying an array and the IN operator together.
Can Tongur, Statistics Sweden
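The technique in a hedged nutshell (the input data set and variables are hypothetical):

   data flagged;
      set have;
      array qs {*} q1-q10;          /* the variables to search */
      if 2 in qs then found = 1;    /* is the value 2 in any of q1-q10? */
      else found = 0;
   run;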
It is a safe assumption that almost every SAS® user learns how to use the SET statement not long after they're taught the concept of a DATA step. Further, it would probably be reasonable to guess that almost every one of those people covered the MERGE statement soon afterwards. Many, maybe most, also got to try the UPDATE and/or MODIFY statements eventually, as well. It would also be a safe assumption that very few people have taken the time to review the manual since they learned about those statements. That is most unfortunate, because there are so many options available to users that can assist them in their tasks of obtaining and combining data sets. This presentation is designed to build onto the basic understanding of SET, MERGE, and UPDATE. It assumes that the attendee or reader has a basic knowledge of those statements, and it introduces various options and usages that extend the utility of these basic commands.
Andrew Kuligowski, HSN
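Two hedged examples of the kinds of options the paper covers (data set names are hypothetical):

   data matched;
      merge master(in=inm) trans(in=int);   /* IN= flags each record's source */
      by id;
      if inm and int;                       /* keep matches only */
   run;

   data stacked;
      set jan feb mar indsname=src;   /* INDSNAME= names the contributing data set */
      source = src;
   run;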
Join us for lunch as we discuss the benefits of being part of the elite group that is SAS Certified Professionals. The SAS Global Certification program has awarded more than 79,000 credentials to SAS users across the globe. Come listen to Terry Barham, Global Certification Manager, give an overview of the SAS Certification program, explain the benefits of becoming SAS certified and discuss exam preparation tips. This session will also include a Q&A section where you can get answers to your SAS Certification questions.
SAS® formats can be used in so many different ways! Even the most basic SAS format use (modifying the way a SAS data value is displayed without changing the underlying data value) holds a variety of nifty tricks, such as nesting formats, formats that affect various style attributes, and conditional formatting. Add in picture formats, multi-label formats, using formats for data cleaning, and formats for joins and table look-ups, and we have quite a bag of tricks for the humble SAS format and PROC FORMAT, which are used to generate them. This paper describes a few very useful programming techniques that employ SAS formats. While this paper will be appropriate for the newest SAS user, it will also focus on some of the lesser-known features of formats and PROC FORMAT and so should be useful for even quite experienced users of SAS.
Christianna Williams, Self-Employed
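Two hedged examples of format tricks from this bag:

   proc format;
      value agegrp                    /* conditional display with a nested format */
         low -< 13 = 'Child'
         13  -  19 = 'Teen'
         20 - high = [3.];            /* nested: fall back to a numeric format */
      value $sex
         'M' = 'Male'
         'F' = 'Female'
         other = 'Unknown';           /* data cleaning catch-all */
   run;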
For the many relational database products that SAS/ACCESS® supports (Oracle, Teradata, DB2, MySQL, SQL Server, Hadoop, Greenplum, PC Files, to name but a few), there are a myriad of RDBMS-specific options at your disposal, but how do you know the right options for any given situation? How much data should you transfer at a time? Which SAS® functions can be passed through to the database, and which cannot? How do you verify that your processes are running efficiently? How do you test and validate any changes? The answer lies with the feedback capabilities of the SASTRACE system option.
Andrew Howell, ANJ Solutions
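The basic hedged recipe (the Oracle libref is hypothetical):

   options sastrace=',,,d' sastraceloc=saslog nostsuffix;  /* log the SQL sent to the DBMS */

   proc sql;
      select count(*) from ora.big_table;
   quit;

   options sastrace=off;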
In today's competitive job market, both recent graduates and experienced professionals are looking for ways to set themselves apart from the crowd. SAS® certification is one way to do that. SAS Institute Inc. offers a range of exams to validate your knowledge level. In writing this paper, we have drawn upon our personal experiences, remarks shared by new and longtime SAS users, and conversations with experts at SAS. We discuss what certification is and why you might want to pursue it. Then we share practical tips you can use to prepare for an exam and do your best on exam day.
Andra Northup, Advanced Analytic Designs, Inc.
Susan Slaughter, Avocet Solutions
Can you create hundreds of great-looking Microsoft Excel tables all within SAS® and make them all Section 508 compliant at the same time? This paper examines how to use the ODS TAGSETS.EXCELXP statement and other Base SAS® features to create fantastic-looking Excel worksheet tables that are all Section 508 compliant. This paper demonstrates that there is no need for any outside intervention or pre- or post-meddling with the Excel files to make them Section 508 compliant. We do it all with simple Base SAS code.
Chris Boniface, U.S. Census Bureau
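A minimal hedged skeleton of the approach (the accessibility options from the paper are not reproduced here):

   ods tagsets.excelxp file='class.xml' style=printer
       options(sheet_name='Class' embedded_titles='yes');
   title 'Class Listing';
   proc print data=sashelp.class noobs label;
   run;
   ods tagsets.excelxp close;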
Customer expectations are set high when Microsoft Excel and Microsoft PowerPoint are used to design reports. Using SAS® for reporting has benefits because it generates plots directly from prepared data sets, automates the plotting process, minimizes labor-intensive manual construction using Microsoft products, and does not compromise the presentation value. SAS® Enterprise Guide® 5.1 has a powerful point-and-click method that is quick and easy to use. However, it is limited in its ability to customize the output to mimic manually created Microsoft graphics. This paper demonstrates why SAS Enterprise Guide is the perfect starting point for creating initial code for plots using SAS/GRAPH® point-and-click features and how the code can be enhanced using established PROC GPLOT, ANNOTATE, and ODS options to re-create the look and feel of plots generated by Excel and PowerPoint. Examples show the generation of plots and tables using PROC TABULATE to embed the plot data into the graphical output. Also included are tips for overcoming the ODS limitation of SAS® 9.3, which is used by SAS Enterprise Guide 5.1, to transfer the SAS graphical output to PowerPoint files. These SAS® 9.3 tips are contrasted with the new SAS® 9.4 ODS POWERPOINT statement that enables direct PowerPoint file creation from a SAS program.
Christopher Klekar, Baylor Scott and White Health
Gabriela Cantu, Baylor Scott & White Health
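For comparison, the SAS 9.4 route in a hedged nutshell:

   ods powerpoint file='report.pptx' style=powerpointlight;
   title 'IBM Closing Price';
   proc sgplot data=sashelp.stocks;
      where stock = 'IBM';
      series x=date y=close;
   run;
   ods powerpoint close;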
SAS® Studio (previously known as SAS® Web Editor) was introduced in the first maintenance release of SAS® 9.4 as an alternative programming environment to SAS® Enterprise Guide® and SAS® Display Manager. SAS Studio is different in many ways from SAS Enterprise Guide and SAS Display Manager. As a programmer, I currently use SAS Enterprise Guide to help me code, test, maintain, and organize my SAS® programs. I have SAS Display Manager installed on my PC, but I still prefer to write my programs in SAS Enterprise Guide because I know it saves my log and output whenever I run a program, even if that program crashes and takes the SAS session with it! So should I now be using SAS Studio instead, and should you be using it, too?
Philip Holland, Holland Numerics Limited
Once you have a SAS® Visual Analytics environment up and running, the next important piece of the puzzle is to keep your users happy by keeping their data loaded and refreshed on a consistent basis. Loading data from the SAS Visual Analytics UI is a great first step and well suited to ad hoc data exploration. But automating this data load so that users can focus on exploring the data and creating reports is where the power of SAS Visual Analytics comes into play. By using tried-and-true SAS® Data Integration Studio techniques (both out-of-the-box and custom transforms), you can easily make this happen. Proven techniques such as sweeping from a source library and stacking similar Hadoop Distributed File System (HDFS) tables into SAS® LASR™ Analytic Server for consumption by SAS Visual Analytics are presented using SAS Visual Analytics and SAS Data Integration Studio.
Jason Shoffner, SAS
Brandon Kirk, SAS
Six Sigma is a business management strategy that seeks to improve the quality of process outputs by identifying and removing the causes of defects (errors) and minimizing variability in manufacturing and business processes. Each Six Sigma project carried out within an organization follows a defined sequence of steps and has quantified financial targets. All Six Sigma project methodologies include an extensive analysis phase in which SAS® software can be applied. JMP® software is widely used for Six Sigma projects. However, this paper demonstrates how Base SAS® (and a bit of SAS/GRAPH® and SAS/STAT® software) can be used to address a wide variety of Six Sigma analysis tasks. The reader is assumed to have a basic knowledge of Six Sigma methodology. Therefore, the focus of the paper is the use of SAS code to produce outputs for analysis.
Dan Bretheim, Towers Watson
Text messages (SMS) are a convenient way to receive notifications away from your computer screen. In SAS®, text messages can be sent to mobile phones via the DATA step. This paper briefly describes several methods for sending text messages from SAS and explores possible applications.
Matthew Slaughter, Coalition for Compassionate Care of California
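One common method, sketched here with a hypothetical carrier gateway address, is FILENAME EMAIL pointed at an email-to-SMS gateway (a configured EMAILSYS/EMAILHOST setup is assumed):

   filename sms email to='5551234567@txt.example.com'
                      subject='SAS job alert';

   data _null_;
      file sms;
      put 'Nightly job finished without errors.';
   run;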
Little did you know that your last delivery ran on incomplete data. To make matters worse, the client realized the issue first. Sounds like a horror story, no? A few preventative measures can go a long way in ensuring that your data are up-to-date and progressing normally. At the data set level, metadata comparisons between the current and previous data cuts will help identify observation and variable discrepancies. Comparisons will also uncover attribute differences at the variable level. At the subject level, they will identify missing subjects. By compiling these comparison results into a comprehensive scheduled e-mail, a data facilitator need only skim the report to confirm that the data are good to go--or in need of some corrective action. This paper introduces a suite of checks contained in a macro that compares data cuts at the data set, variable, and subject levels and produces an e-mail report. The wide use of this macro will help all SAS® users create better deliveries while avoiding rework.
Spencer Childress, Rho, Inc.
Alexandra Buck, Rho, Inc.
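A hedged sketch of the metadata-comparison idea (the e-mail step is omitted; the data cut names are hypothetical):

   proc contents data=cut_old out=meta_old noprint; run;
   proc contents data=cut_new out=meta_new noprint; run;

   proc sort data=meta_old; by name; run;
   proc sort data=meta_new; by name; run;

   proc compare base=meta_old compare=meta_new
                out=meta_diff outnoequal noprint;  /* keep only discrepancies */
      id name;
      var type length label;
   run;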
SAS® functions provide amazing power to your DATA step programming. Specific functions are essential--they save you from writing volumes of unnecessary code. This presentation covers some of the most useful SAS functions. A few might be new to you and they can all change how you program and approach common programming tasks. The majority of these functions work with character data. There are functions that search for strings, others that find and replace strings, and some that join strings together. Furthermore, certain functions can measure the spelling distance between two strings (useful for fuzzy matching). Some of the newest and most incredible functions are not functions at all--they are call routines. Did you know that you can sort values within an observation? Did you know that not only can you identify the largest or smallest value in a list of variables, but you can identify the second or third or nth largest or smallest value? A knowledge of these functions will make you a better SAS programmer.
Ron Cody, Camp Verde Consulting
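A hedged taste of three of these (the values are illustrative):

   data _null_;
      array x {5} x1-x5 (3 1 4 1 5);
      call sortn(of x{*});                  /* sort values within the observation */
      second = largest(2, of x{*});         /* second-largest value in the list */
      dist = spedis('Johnson', 'Jonson');   /* spelling distance for fuzzy matching */
      put x1= x5= second= dist=;
   run;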
The report looks simple enough--a bar chart and a table, like something created with GCHART and REPORT procedures. But, there are some twists to the reporting requirements that make those procedures not quite flexible enough. The solution was to mix 'old' and 'new' DATA step-based techniques to solve the problem. Annotate datasets are used to create the bar chart and the Report Writing Interface (RWI) to create the table. Without a whole lot of additional code, an extreme amount of flexibility is gained.
Pete Lund, Looking Glass Analytics
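A tiny hedged taste of the Report Writing Interface:

   ods html file='rwi.html';
   data _null_;
      dcl odsout obj();
      obj.table_start();
         obj.row_start();
            obj.format_cell(data: 'Metric');
            obj.format_cell(data: 'Value');
         obj.row_end();
         obj.row_start();
            obj.format_cell(data: 'N');
            obj.format_cell(data: 19);
         obj.row_end();
      obj.table_end();
   run;
   ods html close;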
Can you actually get something for nothing? With the SAS® PROC SQL subquery and remerging features, yes, you can. When working with categorical variables, you often need to add group descriptive statistics such as group counts and minimum and maximum values for further BY-group processing. Instead of first creating the group count and minimum or maximum values and then merging the summarized data set to the original data set, why not take advantage of PROC SQL to complete two steps in one? With the PROC SQL subquery and summary functions by the group variable, you can easily remerge the new group descriptive statistics with the original data set. Now with a few DATA step enhancements, you too can include percent calculations.
Sunil Gupta, Gupta Programming
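The pattern in a hedged nutshell: mixing detail columns with summary functions under GROUP BY makes PROC SQL remerge the statistics onto every row.

   proc sql;
      create table enriched as
      select name, sex, weight,
             max(weight) as max_wt,                  /* group maximum, repeated per row */
             weight / sum(weight) * 100 as pct_wt    /* percent of the group total */
        from sashelp.class
        group by sex;
   quit;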
Every day, companies all over the world are moving their data into the Cloud. While there are many options available, much of this data will wind up in Amazon Redshift. As a SAS® user, you are probably wondering, 'What is the best way to access this data using SAS?' This paper discusses the many ways that you can use SAS/ACCESS® to get to Amazon Redshift. We compare and contrast the various approaches and help you decide which is best for you. Topics that are discussed are building a connection, choosing appropriate data types, and SQL functions.
James (Ke) Wang, SAS
Salman Maher, SAS
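A hedged sketch of one approach, the SAS/ACCESS LIBNAME engine (all connection values are hypothetical):

   libname rs redshift server='myredshift.example.com' port=5439
           database=mydb user=myuser password=XXXXX;

   proc sql;
      select count(*) from rs.sales;   /* work is passed to Redshift where possible */
   quit;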
SAS® provides a complex ecosystem with multiple tools and products that run in a variety of environments and modes. SAS provides numerous error-handling and program control statements, options, and features. These features can function differently according to the run environment, and many of them have constraints and limitations. In this presentation, we review a number of potential error-handling and program control strategies that can be employed, along with some of the inherent limitations of each. The bottom line is that there is no single strategy that will work in all environments, all products, and all run modes. Instead, programmers need to consider the underlying program requirements and choose the optimal strategy for their situation.
Thomas Billings, MUFG Union Bank, N.A.
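As one hedged example of such a strategy, an automatic macro variable can be checked between steps (note that %ABORT behavior itself varies by run mode):

   data work.stage;
      set mylib.source;     /* hypothetical input */
   run;

   %macro halt_on_error;
      %if &syserr > 4 %then %do;     /* prior step failed; 4 alone means warnings */
         %put ERROR: Step failed with SYSERR=&syserr.. Stopping.;
         %abort cancel;
      %end;
   %mend halt_on_error;
   %halt_on_error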
SAS® users organize their applications in a variety of ways. However, some approaches are more successful than others. In particular, the need to run only some of the code in a file, only some of the time, can be challenging. Reproducible research methods require that SAS applications be understandable by the author and other staff members. In this presentation, you learn how to organize and structure your SAS application to manage the process of data access, data analysis, and data presentation. This approach to structuring applications requires that the tasks in the data analysis process be compartmentalized, which can be done using a well-defined program. The author presents his structuring algorithm and discusses the characteristics of good structuring methods for SAS applications. Reproducible research methods are becoming more centrally important, and SAS users must keep up with the current developments.
Paul Thomas, ASUP Ltd
One of the more commonly needed operations in SAS® programming is to determine the value of one variable based on the value of another. A series of techniques and tools have evolved over the years to make the matching of these values go faster, smoother, and easier. A majority of these techniques require operations such as sorting, searching, and comparing. As it turns out, these types of techniques are some of the more computationally intensive. Consequently, an understanding of the operations involved and a careful selection of the specific technique can often save the user a substantial amount of computing resources. Many of the more advanced techniques can require substantially fewer resources. It is incumbent on the user to have a broad understanding of the issues involved and a more detailed understanding of the solutions available. Even if you do not currently have a BIG data problem, you should at the very least have a basic knowledge of the kinds of techniques that are available for your use.
Art Carpenter, California Occidental Consultants
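One of the more resource-friendly techniques, sketched here with hypothetical lookup and detail tables, is the DATA step hash object:

   data result;
      length descr $40;
      if _n_ = 1 then do;
         declare hash h(dataset: 'work.codes');  /* load the lookup table into memory once */
         h.defineKey('code');
         h.defineData('descr');
         h.defineDone();
      end;
      set work.claims;
      call missing(descr);
      rc = h.find();        /* 0 = key found; DESCR is filled in */
      drop rc;
   run;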
How many times has this happened to you? You create a really helpful report and share it with others. It becomes popular and you find yourself running it over and over. Then they start asking, "But can't you re-run it and just change ___?" (Fill in the blank with whatever simple request you can think of.) Don't you want to just put the report out as a web page with some basic parameters that users can choose themselves and run when they want? Consider writing your own task in SAS® Studio! SAS Studio includes several predefined tasks, which are point-and-click user interfaces that guide the user through an analytical process. For example, tasks enable users to create a bar chart, run a correlation analysis, or rank data. When a user selects a task option, SAS® code is generated and run on the SAS server. Because of the flexibility of the task framework, you can make a copy of a predefined task and modify it or create your own. Tasks use the same common task model and the Velocity Template Language--no Java programming or ActionScript programming is required. Once you have the interface set up to generate the SAS code you need, then you can publish the task for other SAS Studio users to use or you can use a straight URL. Now that others can generate the output themselves, you actually might have time to go fishing!
Christie Corcoran, SAS
Amy Peters, SAS
Many languages, including the SAS DATA step, have extensive debuggers that can be used to detect logic errors in programs. Another easy way to detect logic errors is to simply display messages and variable content at strategic times. The PUTLOG statement will be discussed and examples given that show how using this statement is probably the easiest and most flexible way to detect and correct errors in your DATA step logic.
Steven First, Systems Seminar Consultants
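A hedged flavor of the statement in action:

   data check;
      set sashelp.class;
      bmi = (weight / height**2) * 703;
      if bmi > 25 then
         putlog 'WARNING: high BMI: ' name= bmi= 5.1;
      if _n_ = 1 then
         putlog 'NOTE: PDV at first record: ' _all_;
   run;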
The first thing that you need to know is that SAS® software stores dates and times as numbers. However, this is not the only thing that you need to know, and this presentation gives you a solid base for working with dates and times in SAS. It also introduces you to functions and features that enable you to manipulate your dates and times with surprising flexibility. This paper also shows you some of the possible pitfalls with dates (and with times and datetimes) in your SAS code, and how to avoid them. We show you how SAS handles dates and times through examples, including the ISO 8601 formats and informats, and how to use dates and times in TITLE and FOOTNOTE statements. We close with a brief discussion of Microsoft Excel conversions.
Derek Morgan
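A small hedged demonstration of the basics:

   data _null_;
      d = '14mar2016'd;                /* stored as a count of days from 01JAN1960 */
      put d=;                          /* the raw number, e.g. d=20527 */
      put d= date9. d= e8601da.;       /* two displays, including ISO 8601 */
      nm = intnx('month', d, 1, 'same');   /* same day, next month */
      put nm= date9.;
   run;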
It is well-known in the world of SAS® programming that the REPORT procedure is one of the best procedures for creating dynamic reports. However, you might not realize that the compute block is where all of the action takes place! Its flexibility enables you to customize your output. This paper is a primer for using a compute block. With a compute block, you can easily change values in your output with the proper assignment statement and add text with the LINE statement. With the CALL DEFINE statement, you can adjust style attributes such as color and formatting. Through examples, you learn how to apply these techniques for use with any style of output. Understanding how to use the compute-block functionality empowers you to move from creating a simple report to creating one that is more complex and informative, yet still easy to use.
Jane Eslinger, SAS
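A small hedged example combining a compute-block test, CALL DEFINE, and LINE:

   proc report data=sashelp.class nowd;
      column name sex weight;
      define weight / analysis mean format=5.1;
      compute weight;
         if weight.mean > 110 then
            call define(_col_, 'style', 'style={background=yellow}');
      endcomp;
      compute after;
         line 'Weights above 110 lb are highlighted.';
      endcomp;
   run;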
If you are one of the many customers who want to move your SAS® data to Hadoop, one decision you will encounter is what data storage format to use. There are many choices, and all have their pros and cons. One factor to consider is how you currently store your data. If you currently use the Base SAS® engine or the SAS® Scalable Performance Data Engine, then using the SPD Engine with Hadoop will enable you to continue accessing your data with as little change to your existing SAS programs as possible. This paper discusses the enhancements, usage, and benefits of the SPD Engine with Hadoop.
Lisa Brown, SAS
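In hedged outline (the HDFS path is hypothetical), pointing the SPD Engine at Hadoop looks like any other LIBNAME:

   libname spdat spde '/user/me/spde' hdfshost=default;

   data spdat.cars;        /* existing DATA step code needs little or no change */
      set sashelp.cars;
   run;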
There are many 'gotchas' when you are trying to automate a well-written program. The details differ depending on the way you schedule the program and the environment you are using. This paper covers system options, error-handling logic, and best practices for logging. Save time and frustration by using these tips as you schedule programs to run.
Adam Hood, Slalom Consulting
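A minimal hedged pattern for a scheduled job (the paths are hypothetical):

   options errabend noovp fullstimer;   /* stop on error; full timing in the log */

   proc printto log="/logs/nightly_%sysfunc(date(), yymmddn8.).log" new;
   run;

   /* ... production steps go here ... */

   proc printto;     /* restore the default log */
   run;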
The Work library is at the core of most SAS® programs, but programmers tend to ignore it unless something breaks. This paper first discusses the USER= system option for saving the Work files in a directory. Then, we cover a similar macro-controlled method for saving the files in your Work library, and the interaction of this method with OPTIONS NOREPLACE and the syntax check options. A number of SAS system options that help you to manage Work libraries are discussed: WORK=, WORKINIT, WORKTERM; these options might be restricted by SAS Administrators. Additional considerations in managing Work libraries are discussed: handling large files, file compression, programming style, and macro-controlled deletion of redundant files in SAS® Enterprise Guide®.
Thomas Billings, MUFG Union Bank, N.A.
Avinash Kalwani, Oklahoma State University
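The USER= idea in a hedged nutshell (the directory is hypothetical):

   libname scratch '/projects/keep';
   options user=scratch;      /* one-level names now resolve here, not to WORK */

   data mydata;               /* lands in SCRATCH and survives the session */
      x = 1;
   run;

   options user=work;         /* point one-level names back at WORK */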
The SAS® macro processor is a powerful ally, but it requires respect. There are a myriad of macro functions available, most of which emulate DATA step functions, but some of which require special consideration to fully use their capabilities. Questions to be answered include the following: When should you protect the macro variable? During macro compilation? During macro execution? (What do those phrases even mean?) How do you know when to use which macro function: %BQUOTE(), %NRBQUOTE(), %UNQUOTE(), %SUPERQ(), and so on? What's with the %Q prefix on some macro functions? And more: %SYSFUNC(), %SYSCALL, and so on. Macro developers will no longer be daunted by the complexity of choices. With a little clarification, the power of these macro functions will open up new possibilities.
Andrew Howell, ANJ Solutions
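A small hedged taste of why these functions matter:

   %let company = AT%nrstr(&T);      /* mask the ampersand at assignment */
   %put Value: %superq(company);     /* retrieve without trying to resolve &T */
   %put Upcase: %qupcase(%superq(company));   /* %Q functions keep the masking */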
Using PROC TRANSPOSE to make wide files wider requires running separate PROC TRANSPOSE steps for each variable that you want transposed, as well as a DATA step using a MERGE statement to combine all of the transposed files. In addition, if you want the variables in a specific order, an extra DATA step is needed to rearrange the variable ordering. This paper presents a method that accomplishes the task in a simpler manner using less code and requiring fewer steps, and which runs n times faster than PROC TRANSPOSE (where n=the number of variables to be transposed).
Keshan Xia, 3GOLDEN Beijing Technologies Co. Ltd., Beijing, China
Matthew Kastin, I-Behavior
Arthur Tabachneck, AnalystFinder, Inc.
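The general flavor of a single-step, multi-variable transpose (not necessarily the authors' exact method; the data names are hypothetical and at most three records per ID are assumed):

   proc sort data=long; by id; run;

   data wide;
      set long;
      by id;
      array h {3} height1-height3;
      array w {3} weight1-weight3;
      retain height1-height3 weight1-weight3;
      if first.id then do;
         call missing(of h{*}, of w{*});
         seq = 0;
      end;
      seq + 1;                      /* sum statement: retained counter */
      h{seq} = height;
      w{seq} = weight;
      if last.id then output;
      keep id height1-height3 weight1-weight3;
   run;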
Currently, there are several methods for reading JSON formatted files into SAS® that depend on the version of SAS and which products are licensed. These methods include user-defined macros, visual analytics, PROC GROOVY, and more. The user-defined macro %GrabTweet, in particular, provides a simple way to directly read JSON-formatted tweets into SAS® 9.3. The main limitation of %GrabTweet is that it requires the user to repeatedly run the macro in order to download large amounts of data over time. Manually downloading tweets while conforming to the Twitter rate limits might cause missing observations and is time-consuming overall. Imagine having to sit by your computer the entire day to continuously grab data every 15 minutes, just to download a complete data set of tweets for a popular event. Fortunately, the %GrabTweet macro can be modified to automate the retrieval of Twitter data based on the rate that the tweets are coming in. This paper describes the application of the %GrabTweet macro combined with batch processing to download tweets without manual intervention. Users can specify the phrase parameters they want, run the batch processing macro, leave their computer to automatically download tweets overnight, and return to a complete data set of recent Twitter activity. The batch processing implements an automated retrieval of tweets through an algorithm that assesses the rate of tweets for the specified topic in order to make downloading large amounts of data simpler and effortless for the user.
Isabel Litton, California Polytechnic State University, SLO
Rebecca Ottesen, City of Hope and Cal Poly SLO
This paper explores feature extraction from unstructured text variables using Term Frequency-Inverse Document Frequency (TF-IDF) weighting algorithms coded in Base SAS®. Data sets with unstructured text variables can often hold a lot of potential to enable better predictive analysis and document clustering. Each of these unstructured text variables can be used as inputs to build an enriched data set-specific inverted index, and the most significant terms from this index can be used as single word queries to weight the importance of the term to each document from the corpus. This paper also explores the usage of hash objects to build the inverted indices from the unstructured text variables. We find that hash objects provide a considerable increase in algorithm efficiency, and our experiments show that a novel weighting algorithm proposed by Paik (2013) best enables meaningful feature extraction. Our TF-IDF implementations are tested against a publicly available data breach data set to understand patterns specific to insider threats to an organization.
Ila Gokarn, Singapore Management University
Clifton Phua, SAS
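For reference, the classic weighting underlying this family of algorithms (Paik's 2013 variant refines it) scores a term t in document d, drawn from an N-document corpus, as

   tf-idf(t, d) = tf(t, d) * log(N / df(t))

where tf(t, d) is the frequency of t in d and df(t) is the number of documents containing t.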
We've all heard it before: 'If two ampersands don't work, add a third.' But how many of us really know how ampersands work behind the scenes? We show the function of multiple ampersands by going through examples of the common two- and three-ampersand scenarios, and expand to show four, five, six, and even seven ampersands, and explain when they might be (rarely) useful.
Joe Matise, NORC at the University of Chicago
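The two classic cases in hedged miniature:

   %let n    = 2;
   %let var2 = height;
   %put &&var&n;    /* pass 1: && -> & and &n -> 2, leaving &var2; pass 2: height */

   %let height = 60;
   %put &&&var2;    /* pass 1: && -> & and &var2 -> height, leaving &height; pass 2: 60 */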
There have been many SAS® Global Forum papers written about getting your data into SAS® from Microsoft Excel and getting your data back out to Excel after using SAS to manipulate it. But sometimes you have to update Excel files with special formatting and formulas that would be too hard to replicate with Dynamic Data Exchange (DDE) or that change too often to make DDE worthwhile. But we can still use the output prowess of SAS and a sprinkling of Visual Basic for Applications (VBA) to maintain the existing formatting and formulas in your Excel file! This paper focuses on the possibilities you have in updating Excel files by reading in the data, using SAS to modify it as needed, and then using DDE and a simple Excel VBA macro to output back to Excel and use a formula while maintaining the existing formatting that is present in your source Excel file.
Brian Wrobel, Pearson
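The DDE side of the approach in hedged outline (Excel must already be running; the workbook and VBA macro names are hypothetical):

   filename xlcmd dde 'excel|system';

   data _null_;
      file xlcmd;
      put '[OPEN("C:\reports\template.xlsx")]';
      put '[RUN("template.xlsx!RefreshData")]';   /* run the workbook's VBA macro */
      put '[SAVE()]';
   run;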
Becoming one of the best memorizers in the world doesn't happen overnight. With hard work, dedication, a bit of obsession, and the assistance of some clever analytics metrics, Nelson Dellis was able to climb to the top of the memory rankings in under a year to become the now 3x USA Memory Champion. In this talk, he explains what it takes to become the best at memory, what is involved in such grueling memory competitions, and how analytics helped him get there.
Nelson Dellis, Climb for Memory
Many epidemiological studies use medical claims to identify and describe a population. But finding out who was diagnosed, and who received treatment, isn't always simple. Each claim can have dozens of medical codes, with different types of codes for procedures, drugs, and diagnoses. Even a basic definition of treatment could require a search for any one of 100 different codes. A SAS® macro may come to mind, but generalizing the macro to work with different codes and types allows it to be reused in a variety of different scenarios. We look at a number of examples, starting with a single code type and variable. Then we consider multiple code variables, multiple code types, and multiple flag variables. We show how these macros can be combined and customized for different data with minimal rework. Macro flexibility and reusability are also discussed, along with ways to keep our list of medical codes separate from our program. Finally, we discuss time-dependent medical codes, codes requiring database lookup, and macro performance.
Andy Karnopp, Fred Hutchinson Cancer Research Center
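A hedged skeleton of such a macro (all names and codes are hypothetical):

   %macro flag_codes(data=, var=, codes=, flag=);
      /* set &flag to 1 when &var matches any code in &codes */
      data &data;
         set &data;
         if &var in (&codes) then &flag = 1;
      run;
   %mend flag_codes;

   %flag_codes(data=claims, var=proc_cd,
               codes=%str('99213','99214'), flag=office_visit)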
Many papers have been written over the years that describe how to use Dynamic Data Exchange (DDE) to pass data from SAS® to Excel. This presentation aims to show you how to do the same exchange with the SAS Output Delivery System (ODS) and the TEMPLATE Procedure.
Peter Timusk, Statistics Canada
If you have not had a chance to explore SAS® Studio yet, or if you're anxious to see what's new, this paper gives you an introduction to this new browser-based interface for SAS® programmers and a peek at what's coming. With SAS Studio, you can access your data files, libraries, and existing programs, and you can write new programs while using SAS software behind the scenes. SAS Studio connects to a SAS server in order to process SAS programs. The SAS server can be a hosted server in a cloud environment, a server in your local environment, or a copy of SAS on your local machine. It's a comfortable environment for those used to the traditional SAS windowing environment (SAS® Display Manager) but new features like a query window, process flow diagrams, and tasks have been added to appeal to traditional SAS® Enterprise Guide® users.
Mike Porter, SAS
Michael Monaco, SAS
Amy Peters, SAS
Now that SAS® users are moving to 64-bit Microsoft Windows platforms, some are discovering that vendor-supplied DLLs might still be 32-bit. Since 64-bit applications cannot use 32-bit DLLs, this would present serious technical issues. This paper explains how the MODULE routines in SAS can be used to call into 32-bit DLLs successfully, using new features added in SAS® 9.3.
Rick Langston, SAS
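The basic MODULE mechanics in hedged form (this classic GetTickCount example shows the plumbing; the 32-bit bridging described in the paper builds on it):

   filename sascbtbl temp;     /* attribute table describing the DLL routine */
   data _null_;
      file sascbtbl;
      put 'routine GetTickCount minarg=0 maxarg=0 stackpop=called returns=long module=kernel32;';
   run;

   data _null_;
      ms = modulen('GetTickCount');   /* call the Windows DLL function */
      put ms=;
   run;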
A familiar adage in firefighting--if you can predict it, you can prevent it--rings true in many circles of accident prevention, including software development. If you can predict that a fire, however unlikely, someday might rage through a structure, it's prudent to install smoke detectors to facilitate its rapid discovery. Moreover, the combination of smoke detectors, fire alarms, sprinklers, fire-retardant building materials, and rapid intervention might not prevent a fire from starting, but it can prevent the fire from spreading and facilitate its immediate and sometimes automatic extinguishment. Thus, as fire codes have grown to incorporate increasingly more restrictions and regulations, and as fire suppression gear, tools, and tactics have continued to advance, even the harrowing business of firefighting has become more reliable, efficient, and predictable. As operational SAS® data processes mature over time, they too should evolve to detect, respond to, and overcome dynamic environmental challenges. Erroneous data, invalid user input, disparate operating systems, network failures, memory errors, and other challenges can surprise users and cripple critical infrastructure. Exception handling describes both the identification of and response to adverse, unexpected, or untimely events that can cause process or program failure, as well as anticipated events or environmental attributes that must be handled dynamically through prescribed, predetermined channels. Rapid suppression and automatic return to functioning is the hopeful end state but, when catastrophic events do occur, exception handling routines can terminate a process or program gracefully while providing meaningful execution and environmental metrics to developers both for remediation and future model refinement. This presentation introduces fault-tolerant Base SAS® exception handling routines that facilitate robust, reliable, and responsible software design.
Troy Hughes, Datmesis Analytics
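One hedged example of such a routine: testing &SYSCC so that a failing job cleans up and stops gracefully.

   %macro guard;
      %if &syscc > 4 %then %do;       /* something upstream has failed */
         %put ERROR: SYSCC=&syscc -- terminating gracefully.;
         ods _all_ close;             /* clean up open destinations */
         libname _all_ clear;
         %abort cancel;
      %end;
   %mend guard;
   %guard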
This paper shares the experiences of the programmer role in a large SAS® shop. Shortages in SAS programming talent tend to result in one SAS programmer doing all of the production programming within a unit in a shop. In a real-world example, management realized the problem and brought in new programmers to help do the work. The new programmers actually improved the existing programmers' programs. It became easier for the experienced programmers to complete other programming assignments within the unit. And the different programs in the shop had a standard structure. As a result, all of the programmers had a clearer picture of the work involved, and knowledge hoarding was eliminated. Experienced programmers were now available when great SAS code needed to be written. Yet they were not the only programmers who could do the work! With multiple programmers able to do the same tasks, vacations were possible and didn't threaten deadlines. It was even possible for these programmers to be assigned other tasks outside of the unit and broaden their own skills in statistical production work.
Peter Timusk, Statistics Canada
Working with multiple data sources in SAS® was not straightforward until PROC FEDSQL was introduced in the SAS® 9.4 release. Federated Query Language, or FEDSQL, is a vendor-independent language that provides a common SQL syntax to communicate across multiple relational databases without having to worry about vendor-specific SQL syntax. PROC FEDSQL is a SAS implementation of the FEDSQL language. PROC FEDSQL enables us to write federated queries that can be used to perform joins on tables from different databases with a single query, without having to load the tables into SAS individually and combine them using DATA steps and PROC SQL statements. The objective of this paper is to demonstrate how PROC FEDSQL fetches data from multiple data sources such as a Microsoft SQL Server database, a MySQL database, and a SAS data set, and runs federated queries on all of them. Other powerful features of PROC FEDSQL, such as transactions and the FEDSQL pass-through facility, are discussed briefly.
Zabiulla Mohammed, Oklahoma State University
Ganesh Kumar Gangarajula, Oklahoma State University
Pradeep Reddy Kalakota, Federal Home Loan Bank of Des Moines
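A hedged sketch of a federated join (both connections are hypothetical):

   libname sqlsrv odbc dsn=SQLServerDSN;                  /* Microsoft SQL Server */
   libname mydb mysql server=dbhost user=me password=XXXXX database=sales;

   proc fedsql;
      create table work.combined as
      select o.order_id, o.amount, c.region
        from sqlsrv.orders o
        join mydb.customers c
          on o.cust_id = c.cust_id;     /* one query spanning both sources */
   quit;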
Managing and organizing external files and directories plays an important part in our data analysis and business analytics work. A good file management system can streamline project management and file organization and significantly improve work efficiency. Therefore, under many circumstances, it is necessary to automate and standardize the file management processes through SAS® programming. Compared with managing SAS files via PROC DATASETS, managing external files is a much more challenging task, which requires advanced programming skills. This paper presents and discusses various methods and approaches to managing external files with SAS programming. The illustrated methods and skills can have important applications in a wide variety of analytic work fields.
Justin Jia, Trans Union
Amanda Lin, CIBC
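A hedged taste of the file-management functions involved (the directory is hypothetical):

   data files;
      length fname $256;
      rc  = filename('dir', '/data/incoming');
      did = dopen('dir');              /* open the directory */
      do i = 1 to dnum(did);
         fname = dread(did, i);        /* i-th member name */
         output;
      end;
      rc = dclose(did);
      keep fname;
   run;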
How often have you pulled oodles of data out of the corporate data warehouse down into SAS® for additional processing? This additional processing, sometimes thought to be uniquely SAS, might include FIRST. logic, cumulative totals, lag functionality, specialized summarization, or advanced date manipulation. Using the analytical (or OLAP) and windowing functionality available in many databases (for example, in Teradata and IBM Netezza), all of this processing can be performed directly in the database without moving and reprocessing detail data unnecessarily. This presentation illustrates how to increase your coding and execution efficiency by using the database's power through your SAS environment.
Harry Droogendyk, Stratia Consulting Inc.
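A hedged sketch of pushing a cumulative total into Teradata with a window function (the connection values are hypothetical):

   proc sql;
      connect to teradata (server=tdserv user=me password=XXXXX);
      create table cumsum as
      select * from connection to teradata
      (  select acct_id, tran_dt, amount,
                sum(amount) over (partition by acct_id
                                  order by tran_dt
                                  rows unbounded preceding) as running_total
           from mydb.transactions );
      disconnect from teradata;
   quit;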