General Programming Papers A-Z

A
Session SAS3801-2016:
A Second Look at the ODS Destination for PowerPoint
This paper demonstrates how to use the ODS destination for PowerPoint to create attractive presentations from your SAS® output. Packed with examples, this paper gives you a behind-the-scenes tour of how ODS creates Microsoft PowerPoint presentations. You get an in-depth look at how to customize the ODS PowerPoint style templates that control the appearance of your presentation. With this information you can quickly turn your SAS output into an engaging and informative presentation. This paper is a follow-on to the SAS® Global Forum 2013 paper A First Look at the ODS Destination for PowerPoint.
Read the paper (PDF) | Download the data file (ZIP)
Tim Hunter, SAS
Session 2280-2016:
A Survey of Some Useful SAS® Functions
SAS® functions provide amazing power to your DATA step programming. Some of these functions are essential--they save you from writing volumes of unnecessary code. This talk covers a number of the most useful SAS functions. Some may be new to you, and they will change the way you program and approach common programming tasks. The majority of the functions discussed in this talk work with character data. Some functions search for strings, and others find and replace strings or join strings together. Still others measure the spelling distance between two strings (useful for fuzzy matching). Some of the newest and most amazing functions are not functions at all, but call routines. Did you know that you can sort values within an observation? Did you know that you can identify not only the largest or smallest value in a list of variables, but the second- or third- or nth-largest or smallest value? Knowledge of the functions described here will make you a much better SAS programmer.
Read the paper (PDF) | Watch the recording
Ron Cody, Camp Verde Associates
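A minimal sketch (not from the paper) of a few of the function families the abstract mentions; the names and values are invented for illustration:
  data _null_;
     length first last full $ 20;
     first = 'John';
     last  = 'Smith';
     full  = catx(' ', first, last);      /* join strings with a delimiter */
     pos   = findw(full, 'Smith');        /* locate a word within a string */
     dist  = spedis('Smith', 'Smyth');    /* spelling distance for fuzzy matching */
     put full= pos= dist=;
  run;

  data _null_;
     array x[5] (7 3 9 1 5);
     call sortn(of x[*]);                 /* sort values within an observation */
     nth = largest(2, of x[*]);           /* second-largest value in the list */
     put nth= x1= x2= x3= x4= x5=;
  run;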
Session 9622-2016:
Accessing PubMed Data Using SAS® and the Entrez Programming Utilities
In health sciences and biomedical research we often search for scientific journal abstracts and publications within MEDLINE, which is a suite of indexed databases developed and maintained by the National Center for Biotechnology Information (NCBI) at the United States National Library of Medicine (NLM). PubMed is a free search engine within MEDLINE and has become one of the standard databases to search for scientific abstracts. Entrez is the information retrieval system that gives you direct access to the 40 databases with over 1.3 billion records within the NCBI. You can access these records by using the eight e-utilities (einfo, esearch, esummary, efetch, elink, espell, epost, and egquery), which are the NCBI application programming interfaces (APIs). In this paper I will focus on using three of the e-utilities to retrieve data from PubMed as opposed to the standard method of manually searching and exporting the information from the PubMed website. I will demonstrate how to use the SAS HTTP procedure along with the XML mapper to develop a SAS® macro that generates all the required data from PubMed based on search term parameters. Using SAS to extract information from PubMed searches and save it directly into a data set allows the process to be automated (which is good for running routine updates) and eliminates manual searching and exporting from PubMed.
View the e-poster or slides (PDF)
Craig Hansen, South Australian Health and Medical Research Institute
Session 9342-2016:
All Aboard! Next Stop is the Destination Excel
Over the last few years both Microsoft Excel file formats and the SAS® interfaces to those Excel formats have changed. SAS® has worked hard to make the interface between the two systems easier to use. The options started with comma-separated values (CSV) files and moved to PROC IMPORT and PROC EXPORT, LIBNAME processing, SQL processing, SAS® Enterprise Guide®, JMP®, and then on to HTML and XML tagsets like MSOFFICE2K and EXCELXP. Well, there is now a new entry into the processes available for SAS users to send data directly to Excel. This new entry into the ODS arena of data transfer to Excel is the ODS destination called EXCEL. This process is included within SAS ODS and produces native-format Excel files for version 2007 of Excel and later. It was first shipped as an experimental version with the first maintenance release of SAS® 9.4. This ODS destination has many features similar to the EXCELXP tagset.
Read the paper (PDF) | Download the data file (ZIP) | Watch the recording
William E Benjamin Jr, Owl Computer Consultancy LLC
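A minimal sketch of the destination in action, assuming SAS® 9.4 with the ODS EXCEL destination available; the file name and options shown are illustrative:
  ods excel file='class_report.xlsx'
            options(sheet_name='Class Report' embedded_titles='yes');
  title 'Heights and Weights by Sex';
  proc means data=sashelp.class n mean min max;
     class sex;
     var height weight;
  run;
  ods excel close;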
Session 6406-2016:
An Introduction to SAS® Arrays
So you've heard about SAS® arrays, but you're not sure when or why you would use them. This presentation provides some background on SAS arrays, from explaining what occurs during compile time to explaining how to use them programmatically. It also includes a discussion about how DO loops and macro variables can enhance array usability. Specific examples, including Fahrenheit-to-Celsius temperature conversion, salary adjustments, and data transposition and counting, demonstrate how you can use SAS arrays effectively in your own work and also provide a few caveats about their use.
Read the paper (PDF) | Watch the recording
Andrew Kuligowski, HSN
Lisa Mendez, IMS Government Solutions
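A minimal sketch of the Fahrenheit-to-Celsius example mentioned in the abstract, with hypothetical variable names:
  data celsius;
     input f_mon f_tue f_wed;
     array f[3] f_mon f_tue f_wed;          /* existing Fahrenheit variables */
     array c[3] c_mon c_tue c_wed;          /* new Celsius variables */
     do i = 1 to dim(f);
        c[i] = round((f[i] - 32) * 5 / 9, 0.1);
     end;
     drop i;
     datalines;
  32 68 212
  50 77 95
  ;
  run;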
Session 2380-2016:
Analyzing the Hospital Episode Statistics Data Set: How Much Does SAS® Help?
Hospital Episode Statistics (HES) is a data warehouse that contains records of all admissions, outpatient appointments, and accident and emergency (A&E) attendances at National Health Service (NHS) hospitals in England. Each year it processes over 125 million admitted patient, outpatient, and A&E records. Such a large data set gives endless research opportunities for researchers and health-care professionals. However, patient care data is complex and might be difficult to manage. This paper demonstrates the flexibility and power of SAS® programming tools such as the DATA step, the SQL procedure, and macros to help to analyze HES data.
Read the paper (PDF)
Violeta Balinskaite, Imperial College London
Session 3900-2016:
Assessing SAS® Skill Levels during Interviews
This paper provides tools and guidelines to assess SAS® skill level during the interview process. Included are interview questions for Base SAS® developers, statisticians, and SAS administration candidates. Whether the interviewer has a deep understanding of or just some familiarity with these areas, the questions provided will allow discussions to uncover skills and usage experience for the skills your company requires.
Read the paper (PDF)
Jenine Milum, Citi
Session 3660-2016:
Automating Load for Multiple Tables in SAS® Visual Analytics
The SAS® Visual Analytics autoload feature loads any files placed in the autoload location into the public SAS® LASR™ Analytic Server. But what if we want to autoload the tables into a private SAS LASR Analytic Server, load database tables into SAS Visual Analytics, and also reload them every day so that the tables reflect any changes in the database? One option is to create a query in SAS® Visual Data Builder and schedule the table for load into the SAS LASR Analytic Server, but in this case, a query has to be created for each table that has to be loaded. It would be far more efficient if all tables could be loaded simultaneously; that would make the job a lot easier. This paper provides a custom solution for autoloading the tables in one step, so that tables in SAS Visual Analytics don't have to be reloaded manually and multiple queries don't have to be rerun whenever there is a change in the source data. This solution automatically unloads and reloads all the tables listed in a text file every day so that the data available in SAS Visual Analytics is always up-to-date. This paper focuses on autoloading the database tables into the private SAS LASR Analytic Server in SAS Visual Analytics hosted in a UNIX environment and uses the SASIOLA engine to perform the load. The process involves four steps: initial load, unload, reload, and scheduling the unload and reload. This paper provides the scripts associated with each step. The initial load needs to be performed only once. Unload and reload have to be performed periodically to refresh the data source, so these steps have to be scheduled. Once the tables are loaded, a description is also added to the tables detailing the name of the table, time of the load, and user ID performing the load, which is similar to the description added when loading the tables manually into the SAS LASR Analytic Server using SAS Visual Analytics Administrator.
Read the paper (PDF) | Watch the recording
Swetha Vuppalanchi, SAS Institute Inc.
B
Session 11521-2016:
Building Macros for Quick Survey Scoring
Surveys are a critical research component when trying to gather information about a population of people and their knowledge, attitudes, and experiences. Researchers often implement surveys after a population has participated in a class or educational program. Survey developers often create sets of items, which when analyzed as a group, can measure a construct that describes the underlying behavior, attribute, or characteristic of a study participant. After testing for the reliability and validity of individual items, the group of items can be combined to create a scale score that measures a predefined construct. For example, in education research, a construct can measure students' use of global reading strategies or teachers' confidence in literacy instruction. The number of items that compose a construct can vary widely. Construct scores can be used as outcome measures to assess the impact of a program on its participants. Our example is taken from a project evaluating a teacher's professional development program aimed at improving students' literacy in target subject classes. The programmer is tasked with creating such scores quickly. With an unlimited amount of time to spend, a programmer could operationalize the creation of these constructs by manually entering predefined formulas in the DATA step. More often, time and resources are limited. Therefore, the programmer is more efficient by automating this process by using macro programs. In this paper, we present a technique that uses an externally created key data set that contains the construct specifications and corresponding items of each construct. By iterating through macro variables created from this key data set, we can create construct score variables for varying numbers of items. This technique can be generalized to other processing tasks that involve any number of mathematical combinations of variables to create one single score.
Read the paper (PDF)
Vincent Chan, IMPAQ International
Lorena Ortiz, IMPAQ International
Session 4021-2016:
Building and Using User-Defined Formats
Formats are powerful tools within the SAS® System. They can be used to change how information is brought into SAS, to modify how it is displayed, and even to reshape the data itself. Base SAS® comes with a great many predefined formats, and it is even possible for you to create your own specialized formats. This paper briefly reviews the use of formats in general and then covers a number of aspects of user-generated formats. Since formats themselves have a number of uses that are not at first apparent to the new user, we also look at some of the broader applications of formats. Topics include building formats from data sets, using picture formats, transformations using formats, value translations, and using formats to perform table lookups.
Read the paper (PDF) | Download the data file (ZIP)
Art Carpenter, California Occidental Consultants
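A minimal sketch of building a format from a data set and then using it as a table lookup; the site codes and names are invented for illustration:
  /* Build a character format from a lookup data set (CNTLIN=) */
  data cntlin;
     retain fmtname '$site' type 'C';
     input start $ label :$20.;
     datalines;
  01 Boston
  02 Chicago
  03 Denver
  ;
  run;

  proc format cntlin=cntlin;
  run;

  /* Use the format as a table lookup */
  data visits;
     input site_code $ n_visits;
     site_name = put(site_code, $site.);
     datalines;
  01 12
  03 7
  ;
  run;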
C
Session 11667-2016:
Can You Decipher the Code? If You Can, Maybe You Can Break It
You would think that training as a code breaker, like those employed during the Second World War, wouldn't be necessary to perform the routine SAS® programming duties of your job, such as debugging code. However, if the author of the code doesn't incorporate good elements of style in his or her program, the task of reviewing code becomes arduous and tedious for others. Style touches upon very specific aspects of writing code--indentation for code and comments, casing of keywords, variables, and data set names, and spacing between PROC and DATA steps. Paying attention to these specific issues enhances the reusability and lifespan of your code. By using style to make your code readable, you'll impress your superiors and grow as a programmer.
Read the paper (PDF) | Watch the recording
Jay Iyengar, Data Systems Consultants
Session 11840-2016:
Can You Read This into SAS® for Me: Using INFILE and INPUT to Load Data into SAS®
With all the talk of big data and visual analytics, we sometimes forget how important, and often difficult, it is to get external data into SAS®. In this paper, we review some common data sources such as delimited sources (for example, comma-separated values format [CSV]) as well as structured flat files and the programming steps needed to successfully load these files into SAS. In addition to examining the INFILE and INPUT statements, we look at some methods for dealing with bad data. This paper assumes only basic SAS skills, although the topic can be of interest to anyone who needs to read external files.
Read the paper (PDF) | Watch the recording
Peter Eberhardt, Fernwood Consulting Group Inc.
Audrey Yeo, Athene
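A minimal sketch of reading a delimited file, assuming a hypothetical sample.csv laid out as id,name,visit_date,amount with a header row:
  data imported;
     infile 'sample.csv' dsd firstobs=2 truncover;
     input id
           name       :$30.
           visit_date :yymmdd10.
           amount;
     format visit_date date9.;
  run;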
Session 9541-2016:
Cleaning Up Your SAS® Log: Note Messages
As a SAS® programmer, you probably spend some of your time reading and possibly creating specifications. Your job also includes writing and testing SAS code to produce the final product, whether it is Study Data Tabulation Model (SDTM) data sets, Analysis Data Model (ADaM) data sets, or statistical outputs such as tables, listings, or figures. You reach the point where you have completed the initial programming, removed all obvious errors and warnings from your SAS log, and checked your outputs for accuracy. You are almost done with your programming task, but one important step remains. It is considered best practice to check your SAS log for any questionable messages generated by the SAS system. In addition to messages that begin with the words WARNING or ERROR, there are also messages that begin with the words NOTE or INFO. This paper will focus on five different types of NOTE messages that commonly appear in the SAS log and will present ways to remove these messages from your log.
Read the paper (PDF) | Watch the recording
Jennifer Srivastava, Quintiles
Session 11778-2016:
Comparative Study of PROC EXPORT and Output Delivery System
Suppose that you have a very large data set with specific values in one of its columns, and you want to split the data set into separate comma-separated values (CSV) files based on the values in that column. You might write SAS® code with IF/THEN and ELSE conditions along with some OUTPUT statements, but converting each of the separated data sets into a CSV file through the conventional, manual process is frustrating. This paper presents a comparative study of two approaches that automate the whole tedious process: a SAS macro built around PROC EXPORT, and the Output Delivery System (ODS) used with PROC TABULATE.
Read the paper (PDF) | Watch the recording
Saurabh Nandy, Oklahoma State University
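A minimal sketch of the macro-plus-PROC EXPORT approach, using SASHELP.CARS and a placeholder output path (/tmp) purely for illustration:
  %macro split_csv(ds=, byvar=, outdir=);
     proc sql noprint;
        select distinct &byvar into :val1-:val999
        from &ds;
        %let nvals = &sqlobs;
     quit;
     %do i = 1 %to &nvals;
        proc export data=&ds(where=(&byvar="&&val&i"))
                    outfile="&outdir./&&val&i...csv"
                    dbms=csv replace;
        run;
     %end;
  %mend split_csv;

  %split_csv(ds=sashelp.cars, byvar=origin, outdir=/tmp)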
Session 11887-2016:
Considerations in Organizing the Structure of SAS® Macro Libraries
SAS® macros offer a very flexible way of developing, organizing, and running SAS code. In many systems, programming components are stored as macros in files and called as macros via a variety of means so that they can perform the task at hand. Consideration must be given to the organization of these files in a manner consistent with proper use and retention. An example might be the development of some macros that are company-wide in potential application, others that are departmental, and still others that might be for personal or individual use. Superimposed on this structure in a factorial fashion is the need to have areas that are considered (1) production (validated), (2) archived (retired), and (3) developmental. Several proposals for accomplishing this are discussed, as well as how this structure might interrelate with stored processes available in some SAS systems. The use of these macros in systems ranging from simple background batch processing to highly interactive SAS and BI processing is discussed. Specifically, the drivers or driver files are addressed. Pros and cons of all approaches are considered.
Read the paper (PDF) | View the e-poster or slides (PDF)
Roger Muller, Data-To-Events, Inc.
Session 2080-2016:
Creating a Q-gram Algorithm to Determine the Similarity of Two Character Strings
This paper shows how to program a powerful Q-gram algorithm for measuring the similarity of two character strings. Along with built-in SAS® functions--such as SOUNDEX, SPEDIS, COMPGED, and COMPLEV--Q-gram can be a valuable tool in your arsenal of string comparators. Q-gram is especially useful when measuring the similarity of strings that are intuitively identical, but which have a different word order, such as John Smith and Smith, John.
Read the paper (PDF) | Watch the recording
Joe DeShon, Boehringer-Ingelheim
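For context, a minimal sketch of the built-in comparators the abstract mentions, applied to the reordered-name example:
  data _null_;
     a = 'John Smith';
     b = 'Smith, John';
     ged = compged(a, b);    /* generalized edit distance */
     lev = complev(a, b);    /* Levenshtein edit distance */
     spe = spedis(a, b);     /* spelling distance */
     put ged= lev= spe=;
  run;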
D
Session SAS6462-2016:
Data Analysis with User-Written DS2 Packages
The DATA step and DS2 both offer the user a built-in general purpose hash object that has become the go-to tool for many data analysis problems. However, there are occasions where the best solution would require a custom object specifically tailored to the problem space. The DS2 Package syntax allows the user to create custom objects that can form linked structures in memory. With DS2 Packages it is possible to create lists or tree structures that are custom tailored to the problem space. For data that can describe a great many more states than actually exist, dynamic structures can provide an extremely compact way to manipulate and analyze the data. The SAS® In-Database Code Accelerator allows these custom packages to be deployed in parallel on massive data grids.
Read the paper (PDF)
Robert Ray, SAS
Session SAS2802-2016:
Dealing with Nanoseconds in SAS® Datetime Values in Transaction Processing
This presentation describes a technique for dealing with the precision problems inherent in datetime values containing nanosecond data. Floating-point values cannot store sufficient precision for this, and this limitation can be a problem for transactional data where nanoseconds are pertinent. Methods discussed include separation of variables and using the special GROUPFORMAT feature of the BY statement with MERGE in the DATA step.
Read the paper (PDF) | Watch the recording
Rick Langston, SAS
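A minimal sketch of the separation-of-variables idea, assuming the raw timestamp arrives as an ISO 8601 character value; the GROUPFORMAT/MERGE technique itself is covered in the paper:
  data txn;
     raw = '2016-04-18T10:15:30.123456789';
     dt_sec = input(substr(raw, 1, 19), e8601dt19.);  /* whole seconds as a datetime */
     nano   = input(substr(raw, 21), 9.);             /* nanoseconds kept exactly as an integer */
     format dt_sec datetime19.;
  run;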
Session SAS6422-2016:
Deep Dive with SAS® Studio into SAS® Grid Manager 9.4
Do you know how many different ways SAS® Studio can run your programs with SAS® Grid Manager? SAS Studio is the latest and coolest interface to SAS® software. As such, we want to use it in most situations, including sites that leverage SAS Grid Manager. Are you new to SAS and want to be guided by a modern GUI? SAS Studio is here to help you. Are you a code fanatic who wants total control of how your program runs to harness the full power of SAS Grid Manager? Sure, SAS Studio is for you, too. This paper covers all the different SAS Studio editions. You can learn how to connect each of them to SAS Grid Manager and discover best practices for harnessing a high-performance SAS analytics environment, while avoiding potential pitfalls.
Read the paper (PDF)
Edoardo Riva, SAS
Session SAS6140-2016:
Developing SAS® Studio Repositories
This paper covers developing SAS® Studio repositories. SAS Studio introduced a new way of developing custom tasks using an XML markup specification and the Apache Velocity templating language. SAS Studio repositories build on this framework and provide a flexible way to package custom tasks and snippets. After tasks and snippets are packaged into a repository, they can be shared with users inside your organization or outside your organization. This paper uses several examples to help you create your first repository.
Read the paper (PDF)
Swapnil Ghan, SAS
Michael Monaco, SAS
Amy Peters, SAS
Session 8300-2016:
Document and Enhance Your SAS® Code, Data Sets, and Catalogs with SAS Functions, Macros, and SAS Metadata
Discover how to document your SAS® programs, data sets, and catalogs with a few lines of code that include SAS functions, macro code, and SAS metadata. Do you start every project with the best of intentions to document all of your work, and then fall short of that aspiration when deadlines loom? Learn how your programs can automatically update your processing log. If you have ever wondered who ran a program that overwrote your data, SAS has the answer! And If you don't want to be tracing back through a year's worth of code to produce a codebook for your client at the end of a contract, SAS has the answer!
Read the paper (PDF) | Download the data file (ZIP)
Roberta Glass, Abt Associates
Louise Hadden, Abt Associates
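A minimal sketch of the kind of self-documentation the abstract describes, using dictionary tables and automatic macro variables; the data set names are placeholders:
  /* Append one row of run metadata to a processing log */
  data runlog_entry;
     length program $ 200 userid $ 40;
     program  = getoption('SYSIN');     /* batch program name; blank in interactive sessions */
     userid   = "&sysuserid";
     runstamp = datetime();
     format runstamp datetime20.;
  run;

  proc append base=work.runlog data=runlog_entry;
  run;

  /* Capture variable attributes for a simple codebook */
  proc sql;
     create table codebook as
     select libname, memname, name, type, length, label
     from dictionary.columns
     where libname = 'SASHELP' and memname = 'CLASS';
  quit;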
Session 7480-2016:
"Don't You SAS® Me!": Fostering SAS® Adoption in an Excel-based Organization
In any organization where people work with data, it is extremely unlikely that there will be only one way of doing things. Things like divisional silos and differences in education, training, and subject matter often result in a diversity of tools and levels of expertise. One situation that frequently arises is that of 'code versus click.' Let's call it the difference between code-based, 'power user' data tools and simpler, purely graphic point-and-click tools such as Microsoft Excel. Even though the work itself might be quite similar, differences in analysis tools often mean differences in vocabulary and experience, and it can be difficult to convert users of one tool to another. This discussion will highlight the potential challenges of SAS® adoption in an Excel-based workplace and propose strategies to gain new SAS advocates in your organization.
Read the paper (PDF)
Andrew Clapson, MD Financial Management
E
Session 11662-2016:
Effective Ways of Handling Various File Types and Importing Techniques Using SAS®9.4
Data-driven decision making is critical for any organization to thrive in this fiercely competitive world. The decision-making process has to be accurate and fast in order to stay a step ahead of the competition. One major problem organizations face is huge data load times in loading or processing the data. Reducing the data loading time can help organizations perform faster analysis and thereby respond quickly. In this paper, we compared the methods that can import data of a particular file type in the shortest possible time and thereby increase the efficiency of decision making. SAS® takes input from various file types (such as XLS, CSV, XLSX, ACCESS, and TXT) and converts that input into SAS data sets. To perform this task, SAS provides multiple solutions (such as the IMPORT procedure, the INFILE statement, and the LIBNAME engine) to import the data. We observed the processing times taken by each method for different file types with a data set containing 65,535 observations and 11 variables. We executed each method multiple times to check for variation in processing time. From these tests, we recorded the minimum processing time for each combination of method and file type. From our analysis, we observed that the shortest processing times are achieved by the INFILE statement for CSV and TXT files, the LIBNAME engine for XLS and XLSX files, and PROC IMPORT for ACCESS files.
View the e-poster or slides (PDF)
Divya Dadi, Oklahoma State University
Rahul Jhaver, Oklahoma State University
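A minimal sketch of the three techniques compared, with placeholder file names; the Access and XLSX examples assume SAS/ACCESS® Interface to PC Files is licensed:
  /* 1) PROC IMPORT for a Microsoft Access table */
  proc import datatable='claims' out=work.claims_acc dbms=accdb replace;
     database='claims.accdb';
  run;

  /* 2) INFILE/INPUT for a CSV file */
  data work.claims_csv;
     infile 'claims.csv' dsd firstobs=2 truncover;
     input claim_id member_id :$12. amount;
  run;

  /* 3) LIBNAME engine for an XLSX workbook */
  libname xl xlsx 'claims.xlsx';
  data work.claims_xlsx;
     set xl.claims;
  run;
  libname xl clear;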
Session 9180-2016:
Efficiently Create Rates over Different Time Periods (PROC MEANS and PROC EXPAND)
This session illustrates how to quickly create rates over a specified period of time, using the MEANS and EXPAND procedures. For example, do you want to know how to use the power of SAS® to create a year-to-date, rolling 12-month, or monthly rate? At Kaiser Permanente, we use this technique to develop Emergency Department (ED) use rates, ED admit rates, patient day rates, readmission rates, and more. A powerful function of PROC MEANS, given a database table with several dimensions and one or more facts, is to perform a mathematical calculation on fact columns across several different combinations of dimensions. For example, if a membership database table exists with the dimensions member ID, year-month, line of business, medical office building, and age grouping, PROC MEANS can easily determine and output the count of members by every possible dimension combination into a SAS data set. Likewise, if a hospital visit database table exists with the same dimensions and facts, PROC MEANS can output the number of hospital visits by the dimension combinations into a second SAS data set. With the power of PROC EXPAND, each of the data sets above, once sorted properly, can have columns added, which calculate total members and total hospital visits by a time dimension of the analyst's choice. Common time dimensions used for Kaiser Permanente's utilization rates are monthly, rolling 12-months, and year-to-date. The resulting membership and hospital visit data sets can be joined with a MERGE statement, and simple division produces a rate for the given dimensions.
Read the paper (PDF) | Watch the recording
Thomas Gant, Kaiser Permanente
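A minimal sketch of the two-step pattern, assuming a hypothetical WORK.VISITS table with a numeric VISIT flag and a YEARMONTH variable holding first-of-month SAS dates; PROC EXPAND requires SAS/ETS®:
  /* Monthly visit counts by line of business */
  proc means data=work.visits nway noprint;
     class lob yearmonth;
     var visit;
     output out=visit_counts (drop=_type_ _freq_) sum=visits;
  run;

  proc sort data=visit_counts;
     by lob yearmonth;
  run;

  /* Rolling 12-month total within each line of business */
  proc expand data=visit_counts out=visit_rolling from=month method=none;
     by lob;
     id yearmonth;
     convert visits = visits_12m / transformout=(movsum 12);
  run;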
Session SAS5060-2016:
Exploring SAS® Embedded Process Technologies on Hadoop
SAS® Embedded Process offers a flexible, efficient way to leverage increasing amounts of data by injecting the processing power of SAS® directly where the data lives. SAS Embedded Process can tap into the massively parallel processing (MPP) architecture of Hadoop for scalable performance. Using SAS® In-Database Technologies for Hadoop, you can run scoring models generated by SAS® Enterprise Miner™ or, with SAS® In-Database Code Accelerator for Hadoop, user-written DS2 programs in parallel. With SAS Embedded Process on Hadoop you can also perform data quality operations, and extract and transform data using SAS® Data Loader. This paper explores key SAS technologies that run inside the Hadoop parallel processing framework and prepares you to get started with them.
Read the paper (PDF)
David Ghazaleh, SAS
F
Session 9260-2016:
FASHION, STYLE "GOTTA HAVE IT" COMPUTE DEFINE BLOCK
Do you create complex reports using PROC REPORT? Are you confused by the COMPUTE BLOCK feature of PROC REPORT? Are you even aware of it? Maybe you already produce reports using PROC REPORT, but suddenly your boss needs you to modify some of the values in one or more of the columns. Maybe your boss needs to see the values of some rows in boldface and others highlighted in a stylish yellow. Perhaps one of the columns in the report needs to display a variety of fashionable formats (some with varying decimal places and some without any decimals). Maybe the customer needs to see a footnote in specific cells of the report. Well, if this sounds familiar then come take a look at the COMPUTE BLOCK of PROC REPORT. This paper shows a few tips and tricks of using the COMPUTE DEFINE block with conditional IF/THEN logic to make your reports stylish and fashionable. The COMPUTE BLOCK allows you to use DATA step code within PROC REPORT to provide customization and style to your reports. We'll see how the Census Bureau produces a stylish demographic profile for customers of its Special Census program using PROC REPORT with the COMPUTE BLOCK. The paper focuses on how to use the COMPUTE BLOCK to create this stylish Special Census profile. The paper shows quick tips and simple code to handle multiple formats within the same column, make the values in the Total rows boldface, apply traffic-lighting, and add footnotes to any cell based on the column or row. The Special Census profile report is an Excel table created with ODS tagsets.ExcelXP that is stylish and fashionable, thanks in part to the COMPUTE BLOCK.
Read the paper (PDF) | Watch the recording
Chris Boniface, Census Bureau
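A minimal sketch of a COMPUTE block that adds a computed column and applies traffic-lighting, using SASHELP.CLASS and an arbitrary threshold:
  proc report data=sashelp.class nowd;
     column name sex age height flag;
     define name   / display;
     define sex    / display;
     define age    / display;
     define height / analysis format=6.1;
     define flag   / computed 'Note';
     compute flag / character length=20;
        if height.sum > 65 then flag = 'Tall';
        else flag = ' ';
     endcomp;
     compute height;
        if height.sum > 65 then
           call define(_col_, 'style', 'style={background=yellow font_weight=bold}');
     endcomp;
  run;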
Session 8780-2016:
File Management Using Pipes and X Commands in SAS®
SAS® for Windows is extremely powerful software, not only for analyzing data, but also for organizing and maintaining output and permanent data sets. By using pipes and operating system (x) commands within a SAS session, you can easily and effectively manage files of all types stored on your local network.
Read the paper (PDF) | Download the data file (ZIP) | Watch the recording
Emily Sisson, Boston University School of Public Health Data Coordinating Center
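A minimal sketch for Windows, assuming the session permits operating system commands (the XCMD option) and using placeholder paths:
  /* Read a directory listing through an unnamed pipe */
  filename dirlist pipe 'dir /b "C:\project\output\*.sas7bdat"';

  data output_files;
     infile dirlist truncover;
     input fname $256.;
  run;

  /* Issue an operating system command directly */
  x 'copy "C:\project\output\report.xlsx" "C:\project\archive\"';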
Session SAS6661-2016:
Finding the One: Using SAS® Code and SAS® Data Quality Server to Create a Matched Record
In this era of data analytics, you are often faced with a challenge of joining data from multiple legacy systems. When the data systems share a consistent merge key, such as ID or SSN, the solution is straightforward. However, what do you do when there is no common merge key? If one data system has a character value ID field, another has an alphanumeric field, and the only common fields are the names or addresses or dates of birth, a standard merge query does not work. This paper demonstrates fuzzy matching methods that can overcome this obstacle and build your master record through Base SAS® coding. The paper also describes how to leverage the SAS® Data Quality Server in SAS® code.
Read the paper (PDF)
Elena Shtern, SAS
Kim Hare, SAS
Session 10821-2016:
Four Lines of Code: Using Merge and Colons to Construct Historical Data from Status Tables
Collection of customer data is often done in status tables or snapshots, where, for example, for each month, the values for a handful of variables are recorded in a new status table whose name is marked with the value of the month. In this QuickTip, we present how to construct a table of last occurrence times for customers using a DATA step merge of such status tables and the colon (':') wildcard. If the status tables are sorted, this can be accomplished in four lines of code (where RUN; is the fourth). Also, we look at how to construct delta tables (for example, from one time period to another, or which customers have arrived or left) using a similar method of merge and colons.
Read the paper (PDF) | Watch the recording
Jingyu She, Danica Pension
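A minimal sketch of the idea, assuming sorted monthly status tables named STATUS_201601 through STATUS_201612 keyed by a hypothetical CUSTOMER_ID:
  data last_occurrence;
     merge status_2016: ;   /* colon wildcard picks up every STATUS_2016* table */
     by customer_id;        /* later tables overwrite earlier values, leaving the last occurrence */
  run;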
Session 7760-2016:
Fuzzy Matching: Where Is It Appropriate and How Is It Done? SAS® Can Help.
When attempting to match names and addresses from different files, we often run into a situation where the names are similar, but not exactly the same. Sometimes there are additional words in the names, sometimes there are different spellings, and sometimes the businesses have the same name but are located thousands of miles apart. The files that contain the names might have numeric keys that cannot be matched. Therefore, we need to use a process called fuzzy matching to match the names from different files. The SAS® function COMPGED, combined with SAS character-handling functions, provides a straightforward method of applying business rules and testing for similarity.
Read the paper (PDF)
Stephen Sloan, Accenture
Daniel Hoicowitz, Accenture
H
Session 4342-2016:
High-Performance Data Access with FedSQL and DS2
SAS® Federated Query Language (FedSQL) is a SAS proprietary implementation of the ANSI SQL:1999 core standard capable of providing a scalable, threaded, high-performance way to access relational data in multiple data sources. This paper explores the FedSQL language in a three-step approach. First, we introduce the key features of the language and discuss the value each feature provides. Next, we present the FEDSQL procedure (the Base SAS® implementation of the language) and compare it to PROC SQL in terms of both syntax and performance. Finally, we examine how FedSQL queries can be embedded in DS2 programs to merge the power of these two high-performance languages.
Read the paper (PDF)
Shaun Kaufmann, Farm Credit Canada
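Two minimal sketches: PROC FEDSQL aggregating a Base SAS table, and a FedSQL query embedded in a DS2 SET statement; the threshold is arbitrary:
  proc fedsql;
     create table work.totals as
     select make, count(*) as n, avg(msrp) as avg_msrp
     from sashelp.cars
     group by make;
  quit;

  proc ds2;
     data work.big_spenders (overwrite=yes);
        method run();
           set {select make, model, msrp from sashelp.cars where msrp > 80000};
        end;
     enddata;
  run;
  quit;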
Session 9540-2016:
How to Create Data-Driven Lists
As SAS® programmers we often want our code or program logic to be driven by the data at hand, rather than be directed by us. Such dynamic code enables the data to drive the logic or sequence of execution. This type of programming can be very useful when creating lists of observations, variables, or data sets from ever-changing data. Whether these lists are used to verify the data at hand or are to be used in later steps of the program, dynamic code can write our lists once and ensure that the values change in tandem with our data. This Quick Tip paper will present the concepts of creating data-driven lists for observations, variables, and data sets, the code needed to execute these tasks, and examples to better explain the process and results of the programs we will create.
Read the paper (PDF) | Watch the recording
Kate Burnett-Isaacs, Statistics Canada
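A minimal sketch of a data-driven variable list built with PROC SQL and a dictionary table:
  proc sql noprint;
     select name into :numvars separated by ' '
     from dictionary.columns
     where libname = 'SASHELP' and memname = 'CLASS' and type = 'num';
  quit;

  %put NOTE: numeric variables are &numvars;

  proc means data=sashelp.class;
     var &numvars;      /* the list changes automatically with the data */
  run;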
Session 10940-2016:
How to Move Data among Client Hard Disk, the Hadoop File System, and SAS® LASR™ Analytic Server
In SAS® LASR™ Analytic Server, data can reside in three types of environments: client hard disk (for example, a laptop), the Hadoop File System (HDFS) and the memory of the SAS LASR Analytic Server. Moving the data efficiently among these is critical for getting insights from the data on time. In this paper, we illustrate all the possible ways to move the data, including 1) moving data from client hard disk to HDFS; 2) moving data from HDFS to client hard disk; 3) moving data from HDFS to SAS LASR Analytic Server; 4) moving data from SAS LASR Analytic Server to HDFS; 5) moving data from client hard disk to SAS LASR Analytic Server; and 6) moving data from SAS LASR Analytic Server to client hard disk.
Read the paper (PDF) | Watch the recording
Yue Qi, SAS
Session 11763-2016:
How to Use PERL Functions in SAS®: Some Basic Examples
PERL is a good language for working with text, which in SAS® means character variables. PERL was widely used in website programming in the early days of the World Wide Web. SAS provides some PERL functions in DATA step statements. PERL is useful for finding text patterns or simple strings of text and returning the matched strings or their positions within a character variable. I often use PERL to locate a string of text in a character variable; knowing this location, I know where to start a SUBSTR function. The poster covers a number of PERL functions available in SAS® 9.2 and later. Each function is explained in writing, and then a brief code demonstration shows how I use it. The examples draw on both SAS documentation and my own best practices.
View the e-poster or slides (PDF)
Peter Timusk, Independent
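A minimal sketch of the locate-then-SUBSTR pattern the poster describes, using an invented text value:
  data _null_;
     text = 'Visit date: 2016-04-18, site 07';
     pos = prxmatch('/\d{4}-\d{2}-\d{2}/', text);   /* position of an ISO date pattern */
     if pos > 0 then date_str = substr(text, pos, 10);
     put pos= date_str=;
  run;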
I
Session SAS2800-2016:
Implementing Hashing Techniques in SAS®
Although hashing methods such as SHA256 and MD5 are already available in SAS®, other methods may be needed by SAS users. This presentation shows how hashing methods can be implemented using SAS DATA steps and macros, with judicious use of the bitwise functions (BAND, BOR, and so on) and by referencing public domain sources.
Read the paper (PDF) | Watch the recording
Rick Langston, SAS
Session 6463-2016:
Importing Metadata Programmatically Using the SAS® Batch Import Tool
Importing metadata can be a time-consuming, painstaking, and fraught task when you have multiple SAS packages to deal with. Fortunately SAS® software provides a batch facility to import (and export) metadata by using the command line, thereby saving you from many a mouse click and potential repetitive strain injury when you use the Metadata Import Wizard. This paper aims to demystify the SAS batch import tool. It also introduces a template-driven process for programmatically building a batch file that contains the syntax needed to import metadata and then automatically executing the batch import scripts.
Read the paper (PDF) | Download the data file (ZIP)
David Moors, Whitehound Limited
Session 10240-2016:
Instant Interactive SAS® Log Window Analyzer
An interactive SAS® environment is preferred for developing programs, as it gives the flexibility of instantly viewing the log in the Log window. The programmer must review the Log window to ensure that every line of the program ran successfully without producing any messages defined by SAS that are potential errors. Reviewing the Log window every time is not only time consuming but also prone to manual error for programmers at any level. Currently, the interactive SAS environment provides no instant notification indicating whether the generated log contains any such messages. This paper introduces an instant approach to analyzing the Log window using the SAS macro %ICHECK, which displays its reports instantly in the same SAS environment. The report produces a summary of all the messages defined by SAS in the Log window. The programmer does not need to add %ICHECK at the end of the program; whether a single DATA step, a single PROC step, or the whole program is submitted, the %ICHECK macro is automatically executed at the end of every submission. It might surprise you to learn how a compiled macro can be executed without calling it in the Editor window, but it is possible with %ICHECK, and you can develop it using only SAS products. It can be used in a Windows or UNIX interactive SAS environment without requiring any user input. The proposed approach significantly streamlines the log review process and saves time for programmers at all levels by confirming that the log is free from errors. Similar functionality could be introduced in the SAS product itself.
Read the paper (PDF) | Watch the recording
Amarnath Vijayarangan, Emmes Services Pvt Ltd, India
Palanisamy Mohan, ICON Clinical Research India Pvt Ltd
K
Session 9000-2016:
Kicking and Screaming Your Way to SAS® Enterprise Guide®
You are a skilled SAS® programmer. You can code circles around those newbies who point and click in SAS® Enterprise Guide®. And yet, there are tasks you struggle with on a regular basis, such as 'Is the name of that data set DRUG or DRUGS?' and 'What intern wrote this code? It's not formatted well at all and is hard to read.' In this seminar you learn how to program, yes program, more efficiently. You learn the benefits of autocomplete and inline help, as well as how to easily format the code that intern wrote that you inherited. In addition, you learn how to create a process flow of a program to identify any dead ends, i.e., data sets that get created but are not used in that program.
Read the paper (PDF)
Michelle Buchecker, ThotWave Technologies
L
Session 11666-2016:
Let the CAT Out of the Bag: String Concatenation in SAS®9
Are you still using TRIM, LEFT, and vertical bar operators to concatenate strings? It's time to modernize and streamline that clumsy code by using the string concatenation functions introduced in SAS®9. This paper is an overview of the CAT, CATS, CATT, and CATX functions introduced in SAS®9, and the new CATQ function added in SAS® 9.2. In addition to making your code more compact and readable, this family of functions offers some new tricks for accomplishing previously cumbersome tasks.
Read the paper (PDF)
Josh Horstman, Nested Loop Consulting
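A minimal sketch contrasting the old idiom with the SAS®9 concatenation functions; the values are invented:
  data _null_;
     length a b $ 10;
     a = '  Jane ';
     b = ' Doe  ';
     old_way = trim(left(a)) || ' ' || trim(left(b));   /* pre-SAS9 idiom */
     cat_x   = catx(' ', a, b);    /* strips blanks and inserts a delimiter */
     cat_s   = cats(a, b);         /* strips blanks, no delimiter */
     cat_t   = catt(a, b);         /* trims trailing blanks only */
     put old_way= cat_x= cat_s= cat_t=;
  run;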
M
Session 5580-2016:
Macro Variables in SAS® Enterprise Guide®
For SAS® Enterprise Guide® users, sometimes macro variables and their values need to be brought over to the local workspace from the server, especially when multiple data sets or outputs need to be written to separate files in a local drive. Manually retyping the macro variables and their values in the local workspace after they have been created on the server workspace would be time-consuming and error-prone, especially when we have quite a number of macro variables and values to bring over. Instead, this task can be achieved in an efficient manner by using dictionary tables and the CALL SYMPUT routine, as illustrated in more detail below. The same approach can also be used to bring macro variables and their values from the local to the server workspace.
Read the paper (PDF) | Download the data file (ZIP) | Watch the recording
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
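A minimal sketch of the dictionary-table-plus-CALL SYMPUTX pattern; the RPT prefix is a hypothetical naming convention:
  /* Server session: save selected global macro variables to a data set */
  data work.mvars;
     set sashelp.vmacro;
     where scope = 'GLOBAL' and name like 'RPT%';
     keep name value;
  run;

  /* Local session: recreate each macro variable */
  data _null_;
     set work.mvars;
     call symputx(name, value, 'G');
  run;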
Session 6481-2016:
Mastering Data Summarization with PROC SQL
The SQL procedure is extremely powerful when it comes to summarizing and aggregating data, but it can be a little daunting for programmers who are new to SAS® or for more experienced programmers who are more familiar with using the SUMMARY or MEANS procedure for aggregating data. This hands-on workshop demonstrates how to use PROC SQL for a variety of summarization and aggregation tasks. These tasks include summarizing multiple measures for groupings of interest, combining summary and detail information (via several techniques), nesting summary functions by using inline views, and generating summary statistics for derived or calculated variables. Attendees will learn how to use a variety of PROC SQL summary functions, how to effectively use WHERE and HAVING clauses in constructing queries, and how to exploit the PROC SQL remerge. The examples become progressively more complex over the course of the workshop as you gain mastery of using PROC SQL for summarizing data.
Read the paper (PDF)
Christianna Williams, Self-Employed
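A minimal sketch of combining detail and summary information through an inline view:
  proc sql;
     create table height_vs_avg as
     select c.name, c.age, c.height, a.avg_height,
            c.height - a.avg_height as diff
     from sashelp.class as c
          inner join
          (select age, avg(height) as avg_height
           from sashelp.class
           group by age) as a
     on c.age = a.age
     order by c.age, c.name;
  quit;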
Session 10460-2016:
Missing Values: They Are NOT Nothing
When analyzing data with SAS®, we often encounter missing or null values in data. Missing values can arise from the availability, collectibility, or other issues with the data. They represent the imperfect nature of real data. Under most circumstances, we need to clean, filter, separate, impute, or investigate the missing values in data. These processes can take up a lot of time, and they are annoying. For these reasons, missing values are usually unwelcome and need to be avoided in data analysis. There are two sides to every coin, however. If we can think outside the box, we can take advantage of the negative features of missing values for positive uses. Sometimes, we can create and use missing values to achieve our particular goals in data manipulation and analysis. These approaches can make data analyses convenient and improve work efficiency for SAS programming. This kind of creative and critical thinking is the most valuable quality for data analysts. This paper exploits real-world examples to demonstrate the creative uses of missing values in data analysis and SAS programming, and discusses the advantages and disadvantages of these methods and approaches. The illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
Read the paper (PDF)
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
Session 2120-2016:
More Hidden Base SAS® Features to Impress Your Colleagues
Across the languages of SAS® are many golden nuggets--functions, formats, and programming features just waiting to impress your friends and colleagues. While learning SAS for over 30 years, I have collected a few of these nuggets, and I offer a dozen more of them to you in this presentation. I presented the first dozen in a similar paper at SAS Global Forum 2015.
Read the paper (PDF) | Watch the recording
Peter Crawford, Crawford Software Consultancy limited
N
Session 3442-2016:
Name That Tune--Writing Music with SAS®
Writing music with SAS® is a simple process. Including a snippet of music in a program is a great way to signal completion of processing. It's also fun! This paper illustrates a method for translating music into SAS code using the CALL SOUND routine.
Read the paper (PDF) | Watch the recording
Dan Bretheim, Towers Watson
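A minimal sketch of the routine; CALL SOUND is host-dependent (Windows), the second argument is the duration, and the frequencies are approximate note pitches:
  data _null_;
     call sound(523, 200);   /* C */
     call sound(587, 200);   /* D */
     call sound(659, 200);   /* E */
     call sound(784, 200);   /* G */
  run;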
Session 9140-2016:
No FREQ'n Way
In the consumer credit industry, privacy is key and the scrutiny increases every day. When returning files to a client, we must ensure that they are depersonalized so that the client cannot match back to any personally identifiable information (PII). This means that we must locate any values for a variable that occur on a limited number of records and null them out (i.e., replace them with missing values). When you are working with large files that have more than one million observations and thousands of variables, locating variables with few unique values is a difficult task. While the FREQ procedure and the DATA step merging can accomplish the task, using first./last. BY variable processing to locate the suspect values and hash objects to merge the data set back together might offer increased efficiency.
Read the paper (PDF)
Renee Canfield, Experian
Session 7260-2016:
No More Bad Dates!: A Guide to SAS® Dates in Macro Language
The SAS® macro language is an efficient and effective way to handle repetitive processing tasks. One such task is conducting the same DATA steps or procedures for different time periods, such as every month, quarter, or year. SAS® dates are based on the number of days between January 1st, 1960, and the target date value, which, while very simple for a computer, is not a human-friendly representation of dates. SAS dates and macro processing are not simple concepts themselves, so mixing the two can be particularly challenging, and it can be very easy to end up working with bad dates! Understanding how macros and SAS dates work individually and together can greatly improve the efficiency of your coding and data processing tasks. This paper covers the basics of SAS macro processing, SAS dates, and the benefits and difficulties of using SAS dates in macros, to ensure that you never have a bad date again.
Read the paper (PDF)
Kate Burnett-Isaacs, Statistics Canada
Andrew Clapson, MD Financial Management
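A minimal sketch of building a SAS date in macro language; WORK.CLAIMS and CLAIM_DATE are placeholders:
  %let prev_month_start = %sysfunc(intnx(month,%sysfunc(today()),-1,b));

  %put NOTE: previous month starts on %sysfunc(putn(&prev_month_start,date9.));

  proc print data=work.claims;
     where claim_date >= &prev_month_start;   /* raw SAS date value, no quotes needed */
  run;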
O
Session 7720-2016:
Omitting Records with Invalid Default Values
Many databases include default values that are set inappropriately. For example, a top-level Yes/No question might be followed by a group of check boxes, to which a respondent might indicate multiple answers. When creating the database, a programmer might choose to set a default value of 0 for each box that is not checked. One should interpret these default values with caution, however, depending on whether the top-level question is answered Yes or No or is missing. A similar scenario occurs with a Yes/No question where No is the default value if the question is not answered (but actually the value should be missing). These default values might be scattered throughout the database; there might be no pattern to their occurrence. Records without valid information should be omitted from statistical analysis. Programmers should understand the difference between missing values and invalid values (that is, incorrectly set defaults) because it is important to handle these records differently. Choosing the best method to omit records with invalid values can be difficult. Manual corrections are often prohibitively time-consuming. SAS® offers a useful approach: a combined DATA step and SQL procedure. This paper provides a step-by-step method to accomplish the task.
Read the paper (PDF) | Download the data file (ZIP) | Watch the recording
Lily Yu, Statistics Collaborative, Inc.
P
Session 3440-2016:
PROC DATASETS: The Swiss Army Knife of SAS® Procedures
This paper highlights many of the major capabilities of the DATASETS procedure. It discusses how it can be used as a tool to update variable information in a SAS® data set; to provide information on data set and catalog contents; to delete data sets, catalogs, and indexes; to repair damaged SAS data sets; to rename files; to create and manage audit trails; to add, delete, and modify passwords; to add and delete integrity constraints; and more. The paper contains examples of the various uses of PROC DATASETS that programmers can cut and paste into their own programs as a starting point. After reading this paper, a SAS programmer will have practical knowledge of the many different facets of this important SAS procedure.
Read the paper (PDF) | Watch the recording
Michael Raithel, Westat
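A minimal sketch of a few of these capabilities in one step; the member and variable names are placeholders:
  proc datasets library=work nolist;
     change old_name = new_name;          /* rename a data set */
     delete scratch1 scratch2;            /* delete data sets */
     modify new_name;                     /* update attributes in place */
        rename amt = amount;
        label amount = 'Claim Amount (USD)';
        format amount dollar12.2;
  quit;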
Session 7540-2016:
PROC SQL for SQL DieHards
Inspired by Christianna Williams's paper on transitioning to PROC SQL from the DATA step, this paper aims to help SQL programmers transition to SAS® by using PROC SQL. SAS adapted the Structured Query Language (SQL) by means of PROC SQL back in SAS® 6. PROC SQL syntax closely resembles SQL. However, there are some SQL features that are not available in SAS. Throughout this paper, we outline common SQL tasks and how they might differ in PROC SQL. We also introduce useful SAS features that are not available in SQL. Topics covered are appropriate for novice SAS users.
Read the paper (PDF)
Barbara Ross, NA
Jessica Bennett, Snap Finance
Session 2480-2016:
Performing Pattern Matching by Using Perl Regular Expressions
SAS® software provides many DATA step functions that search and extract patterns from a character string, such as SUBSTR, SCAN, INDEX, TRANWRD, etc. Using these functions to perform pattern matching often requires you to use many function calls to match a character position. However, using the Perl regular expression (PRX) functions or routines in the DATA step improves pattern-matching tasks by reducing the number of function calls and making the program easier to maintain. This talk, in addition to discussing the syntax of Perl regular expressions, demonstrates many real-world applications.
Read the paper (PDF) | Download the data file (ZIP)
Arthur Li, City of Hope
R
Session SAS6363-2016:
REST at Ease with SAS®: How to Use SAS to Get Your REST
Representational State Transfer (REST) is being used across the industry for designing networked applications to provide lightweight and powerful alternatives to web services such as SOAP and Web Services Description Language (WSDL). Since REST is based entirely on HTTP, SAS® provides everything you need to make REST calls and to process structured and unstructured data alike. This paper takes a look at how some enhancements in the third maintenance release of SAS® 9.4 can benefit you in this area. Learn how the HTTP procedure and other SAS language features provide everything you need to simply and securely use REST.
Read the paper (PDF)
Joseph Henry, SAS
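A minimal sketch of a GET request with PROC HTTP; the URL is a public test endpoint used purely for illustration:
  filename resp temp;

  proc http
     url='https://httpbin.org/get'
     method='GET'
     out=resp;
  run;

  /* Echo the raw response to the log */
  data _null_;
     infile resp;
     input;
     put _infile_;
  run;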
Session 10701-2016:
Running Projects for the Average Joe
This paper explores some proven methods used to automate complex SAS® Enterprise Guide® projects so that the average Joe can run them with little or no prior experience. There are often times when a programmer is requested to extract data and dump it into Microsoft Excel for a user. Often these data extracts are very similar and can be run with previously saved code. However, the user quite often has to wait for the programmer to have the time to simply run the code. By automating the code, the programmer regains control over their data requests. This paper discusses the benefits of establishing macro variables and creating stored procedures, among other tips.
Read the paper (PDF) | Watch the recording
Jennifer Davies, Department of Education
S
Session 2748-2016:
SAS® Debugging 101
SAS® users are always surprised to discover that their programs contain bugs (or errors). In fact, when asked, users will emphatically stand by their programs and logic by saying they are error free. But the vast body of experience, along with the realities of writing code, says otherwise. Errors can appear anywhere in program code, accidentally introduced by developers and programmers as they write it. No matter where an error occurs, the overriding sentiment among most users is that debugging SAS programs can be a daunting and humbling task. This presentation explores the world of SAS errors, providing essential information about the various error types. Attendees learn how errors are created, their symptoms, identification techniques, and how to apply effective techniques to better understand, repair, and enable program code to work as intended.
Read the paper (PDF)
Kirk Paul Lafler, Software Intelligence Corporation
Session 11768-2016:
SAS® Spontaneous Combustion: Securing Software Portability through Self-Extracting Code
Spontaneous combustion describes combustion that occurs without an external ignition source. With the right combination of fire tetrahedron components--including fuel, oxidizer, heat, and chemical reaction--it can be a deadly yet awe-inspiring phenomenon, and differs from traditional combustion that requires a fire source, such as a match, flint, or spark plugs (in the case of combustion engines). SAS® code, too, often requires a 'spark' the first time it is run or when it is run within a new environment. Thus, SAS® programs might operate correctly in an organization's original development environment, but might fail in its production environment until folders are created, SAS® libraries are assigned, control tables are constructed, or configuration files are built or modified. And, if software portability is problematic within a single organization, imagine the complexities that exist when SAS code is imported from a blog, white paper, textbook, or other external source into one's own infrastructure. The lack of software portability and the complexities of initializing new code often compel development teams to build code from scratch rather than attempting to rehabilitate or customize existent code to run in their environment. A solution is to develop SAS code that flexibly builds and validates its environment and required components during execution. To that end, this text describes techniques that increase the portability, reusability, and maintainability of SAS code by demonstrating self-extracting, spontaneously combustible code that requires no spark.
Read the paper (PDF)
Troy Hughes, Datmesis Analytics
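A minimal sketch of the self-configuring idea, assuming a UNIX-style path; the folder and libref names are placeholders:
  %let projdir = /data/project;

  %macro ensure_env;
     /* create the output folder if it does not already exist */
     %if %sysfunc(fileexist(&projdir/output)) = 0 %then %do;
        %let newdir = %sysfunc(dcreate(output, &projdir));
     %end;
     /* assign the library only if it is not already assigned */
     %if %sysfunc(libref(proj)) ne 0 %then %do;
        libname proj "&projdir/output";
     %end;
  %mend ensure_env;

  %ensure_env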
Session 11741-2016:
Secrets of Efficient SAS® Coding Techniques
Just as there are many ways to solve any problem in any facet of life, most SAS® programming problems have more than one potential solution. Each solution has tradeoffs; a complex program might execute very quickly but prove challenging to maintain, while a program designed for ease of use might require more resources for development, execution, and maintenance. Too often, it seems that those tasked with producing the results are given a delivery date and an estimated development time in advance, but no guidelines for efficiency expectations. This paper provides ways to improve the efficiency of your SAS® programs. It suggests coding techniques, provides guidelines for their use, and shows the results of experiments comparing various coding techniques, with examples of acceptable and improved ways to accomplish the same task.
Read the paper (PDF) | Watch the recording
Andrew Kuligowski, HSN
Swati Agarwal, Optum
Session 11700-2016:
Solving the 1,001-Piece Puzzle in 10 (or Fewer) Easy Steps: Using SAS®9 .cfg, autoexec.sas, SAS Registry, and Options to Set Up Base SAS®
Are you frustrated with manually setting options to control your SAS® Display Manager sessions but become daunted every time you look at all the places you can set options and window layouts? In this paper, we look at various files SAS accesses when starting, what can (and cannot) go into them, and what takes precedence after all are executed. We also look at the SAS Registry and how to programmatically change settings. By the end of the paper, you will be comfortable in knowing where to make the changes that best fit your needs.
Read the paper (PDF)
Peter Eberhardt, Fernwood Consulting Group Inc.
Session 2100-2016:
Super Boost Data Transpose Puzzle
This paper compares different solutions to a data transpose puzzle presented to the SAS® User Group at the United States Census Bureau. The presented solutions range from a SAS 101 multi-step solution to an advanced solution using techniques that are not widely known, which yields run-time savings of 85 percent!
Read the paper (PDF) | Download the data file (ZIP)
Ahmed Al-Attar, AnA Data Warehousing Consulting, LLC
T
Session 6124-2016:
The Fantastic Four: Running Your Report Using the TABULATE, TRANSPOSE, REPORT, or SQL Procedure
Like all skilled tradespeople, SAS® programmers have many tools at their disposal. Part of their expertise lies in knowing when to use each tool. In this paper, we use a simple example to compare several common approaches to generating the requested report: the TABULATE, TRANSPOSE, REPORT, and SQL procedures. We investigate the advantages and disadvantages of each method and consider when applying it might make sense. A variety of factors are examined, including the simplicity, reusability, and extensibility of the code in addition to the opportunities that each method provides for customizing and styling the output. The intended audience is beginning to intermediate SAS programmers.
Read the paper (PDF)
Josh Horstman, Nested Loop Consulting
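To make the comparison concrete, here is a small illustration (not the paper's example) of the same summary, mean Height by Sex in SASHELP.CLASS, produced by two of the four procedures.

proc tabulate data=sashelp.class;
   class sex;
   var height;
   table sex, height*mean;         /* one row per Sex, mean Height in the cell */
run;

proc sql;
   select sex, mean(height) as mean_height
      from sashelp.class
      group by sex;                /* same figures, returned as a query result */
quit;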
Session 10641-2016:
The Path Length: Parent-Child De-lineage with PROC TREE and ODS
The SAS® procedure PROC TREE sketches parent-child lineage--also known as trees--from hierarchical data. Hierarchical relationships can be difficult to flatten into a data set, but with PROC TREE, its accompanying ODS table TREELISTING, and some creative yet simple handcrafting, a de-lineage of parent-child variables can be derived. Because the focus of PROC TREE is to present the tree structure in graphical form, it does not explicitly output the depth of the tree, even though the depth is visualized in the accompanying graph. Perhaps unknown to most, the path length variable, or simply the height of the tree, can be extracted from PROC TREE merely by capturing it from the ODS output, as demonstrated in this paper.
Read the paper (PDF)
Can Tongur, Statistics Sweden
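A hedged sketch of the capture step the abstract describes, assuming a tree built by PROC CLUSTER from SASHELP.CLASS; the ODS table name TREELISTING is taken from the abstract, and the author's exact options are not reproduced here.

proc cluster data=sashelp.class method=average outtree=work.tree noprint;
   var height weight;
   id name;
run;

ods output treelisting=work.tree_listing;   /* route the LIST output to a data set */
proc tree data=work.tree list;
run;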
Session 9200-2016:
The Roads We Take: Let's Hash It Out
Time flies. Thirteen years have passed since SAS® first introduced the hash object. Dozens of papers have been written offering new (sometimes weird) applications for hashing. Even beyond look-ups, the hash object can be used for summation, splitting files, sorting files, fuzzy matching, and other data processing problems. In SAS, almost any task can be solved by more than one method. For the class of problems where hashing can be applied, is it the best tool to use? We present a variety of problems and provide several solutions, using both hash tables and more traditional methods. When comparing the solutions, you must consider computing time as well as your own time to code and validate each approach. Which solution is better? You decide!
View the e-poster or slides (PDF)
David Izrael, Abt Associates
Elizabeth Axelrod, Abt Associates
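For readers new to the technique, here is a generic hash look-up in a DATA step, one of the 'roads' the paper weighs against more traditional methods; the two small input tables are made up for the illustration.

data lookup;                       /* small lookup table keyed by ID */
   input id region $;
   datalines;
1 East
2 West
3 North
;

data facts;                        /* transactional rows to be enriched */
   input id sales;
   datalines;
1 100
2 250
3 175
2 300
;

data combined;
   if _n_ = 1 then do;
      length region $8;
      declare hash h(dataset:'lookup');   /* load the lookup table once */
      h.defineKey('id');
      h.defineData('region');
      h.defineDone();
      call missing(region);
   end;
   set facts;
   if h.find() = 0;                /* keep matched rows; REGION is filled in */
run;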
Session 11340-2016:
The SAS® Test from H*ll: How Well Do You REALLY Know SAS?
This SAS® test begins where the SAS® Advanced Certification test leaves off. It has 25 questions to challenge even the most experienced SAS programmers. Most questions are about the DATA step, but there are a few macro and SAS/ACCESS® software questions thrown in. The session presents each question on a slide, then the answer on the next. You WILL be challenged.
View the e-poster or slides (PDF)
Glen Becker, USAA
Session 9341-2016:
The Three I's of SAS® Log Messages, IMPORTANT, INTERESTING, and IRRELEVANT
I like to think that SAS® log messages come in three flavors: IMPORTANT, INTERESTING, and IRRELEVANT. SAS calls its messages NOTES, WARNINGS, and ERRORS. I intend to show you that not all NOTES are IRRELEVANT, nor are all ERRORS IMPORTANT. This paper walks through many different scenarios and explains in detail the meaning and impact of messages presented in the SAS log. I show you how to locate, classify, analyze, and resolve many different SAS message types. And for those brave enough, I go on to teach you how to both generate and suppress messages sent to the SAS log. This paper presents an overview of messages that can often be found in a SAS log window or output file. The intent of this presentation is to familiarize you with common messages, what they mean, and how to determine the best way to react to them. Notice I said react, not necessarily correct. Code examples and log output are presented to aid in the explanations.
Read the paper (PDF) | Download the data file (ZIP)
William E Benjamin Jr, Owl Computer Consultancy LLC
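As a small, illustrative taste of the 'generate and suppress' portion (these are not the author's examples):

data _null_;
   put 'NOTE: this note was generated by my own PUT statement.';
   put 'WARNING: so was this warning.';
   putlog 'ERROR: and this error message, which does not stop the step.';
run;

options nonotes;                           /* silence NOTEs around a noisy step */
proc print data=sashelp.class(obs=2); run;
options notes;                             /* restore normal logging immediately */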
Session 7280-2016:
Tips and Techniques for User-Defined Packages in SAS® DS2
SAS® DS2 is a powerful new object-oriented programming language that was introduced with SAS® 9.4. Designed for advanced data manipulation, it enables the programmer not only to use a much wider range of data types than the traditional Base SAS® language but also to create custom methods and packages. These packages are analogous to classes in other object-oriented languages such as C# or Ruby and can be used to massively improve your programming effectiveness. This paper demonstrates a number of techniques that take full advantage of DS2's object-oriented user-defined package capabilities. Examples include creating new data types, storing and reusing packages, using simple packages as building blocks for more complex packages, overcoming DS2's lack of support for inheritance, and building lightweight packages that facilitate method overloading and simplify parameter passing.
Read the paper (PDF)
Chris Brooks, Melrose Analytics Ltd
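A minimal sketch of a user-defined DS2 package and its use from a DS2 data program; the package name MATHUTIL and the method CUBE are hypothetical.

proc ds2;
   package mathutil / overwrite=yes;
      method cube(double x) returns double;
         return x * x * x;
      end;
   endpackage;
   run;

   data _null_;
      method init();
         dcl package mathutil m();   /* instantiate the stored package */
         dcl double y;
         y = m.cube(3);
         put y=;                     /* writes y=27 to the log */
      end;
   enddata;
   run;
quit;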
Session 11822-2016:
To Macro or Not to Macro: That Is the Question
Do you need a macro for your program? This paper provides some guidelines, based on user experience, and explores whether it is worth the time to create a macro (for example, a parameter-driven macro) or whether a simple macro variable will do. The paper is geared toward new users and toward experienced users who do not use macros.
Read the paper (PDF)
Claudine Lougee, Dualenic
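A small illustration of the choice the paper discusses; the macro name PRINT_OVER is hypothetical, and SASHELP data sets stand in for real ones.

/* often a single macro variable is all you need */
%let cutoff = 13;
proc print data=sashelp.class;
   where age > &cutoff;
run;

/* a parameter-driven macro earns its keep once the same logic repeats */
%macro print_over(dsn=, var=, cutoff=);
   proc print data=&dsn;
      where &var > &cutoff;
   run;
%mend print_over;
%print_over(dsn=sashelp.class, var=age, cutoff=13)
%print_over(dsn=sashelp.heart, var=weight, cutoff=250)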
Session 4500-2016:
Transformers: Revenge of the PROCs
Sometimes data are not arranged the way you need them to be for the purpose of reporting or combining with other data. This presentation examines just such a scenario. We focus on the TRANSPOSE procedure and its ability to transform data to meet our needs. We explore this method as an alternative to using longhand code involving arrays and OUTPUT statements in a SAS® DATA step.
Read the paper (PDF)
Faron Kincheloe, Baylor University
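A compact illustration of the reshaping choice the abstract describes, using SASHELP.CLASS rather than the paper's data:

/* PROC TRANSPOSE: a few statements turn the measures into rows */
proc transpose data=sashelp.class out=long(rename=(col1=value)) name=measure;
   by name sex notsorted;
   var height weight;
run;

/* the longhand alternative: an array plus OUTPUT statements */
data long2;
   set sashelp.class;
   length measure $8;
   array m{*} height weight;
   do i = 1 to dim(m);
      measure = vname(m{i});
      value   = m{i};
      output;
   end;
   keep name sex measure value;
run;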
U
Session 11863-2016:
Uncommon Techniques for Common Variables
If data sets are being merged and a variable occurs in more than one of the data sets, the last value (from the variable in the right-most data set listed in the MERGE statement) is kept. It is critical, therefore, to know the name and sources of any common variables. This paper reviews simple techniques for noting overwriting in the log, summarizing the names and sources of all variables, and identifying common variables before merging files.
Read the paper (PDF) | Watch the recording
Christopher Bost, MDRC
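Two of the simple checks the abstract alludes to, sketched with hypothetical data sets WORK.ONE and WORK.TWO: MSGLEVEL=I makes MERGE note each overwritten variable in the log, and DICTIONARY.COLUMNS lists variables that appear in more than one input data set.

options msglevel=i;   /* MERGE now writes a note whenever a variable is overwritten */

proc sql;
   select name
      from dictionary.columns
      where libname = 'WORK' and memname in ('ONE', 'TWO')
      group by name
      having count(distinct memname) > 1;   /* variables common to both data sets */
quit;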
Session 1780-2016:
Universal File Flattener
This session describes the construction of a program that converts any set of relational tables into a single flat file, using Base SAS® in SAS® 9.3. The program gets the information it needs from the data tables themselves, with a minimum of user configuration. It automatically detects one-to-many relationships and creates sets of replicate variables in the flat file. The program illustrates the use of macro programming and the SQL, DATASETS, and TRANSPOSE procedures, among others. The output is sent to a Microsoft Excel spreadsheet using the Output Delivery System (ODS).
Read the paper (PDF) | Download the data file (ZIP)
Dav Vandenbroucke, HUD
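One piece of the approach the abstract outlines, shown in isolation and with hypothetical table and variable names: turning a one-to-many child table into sets of replicate variables keyed on the parent ID.

proc sort data=work.child out=child_sorted;
   by household_id;
run;

proc transpose data=child_sorted out=child_wide(drop=_name_) prefix=member_name;
   by household_id;
   var member_name;
run;
/* child_wide now holds member_name1, member_name2, ... with one row per      */
/* household, ready to merge onto the parent table by household_id.           */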
Session SAS6600-2016:
Using SASIOTEST to Measure I/O Throughput
Have you ever questioned the read throughput or write throughput of your Windows system drive? What about the input/output (I/O) throughput of a non-system drive? One solution is to use SASIOTEST.EXE to measure the read or write throughput of any drive connected to your system. Since SAS® 9.2, SASIOTEST.EXE has been included with each release of SAS for Windows. This paper explains the options supported by SASIOTEST and the various ways to use it, and describes how this measured I/O relates to SAS I/O on Windows.
Read the paper (PDF)
Mike Jones, SAS
Session 7820-2016:
Using a SAS® Hash Object to Speed and Simplify the Survey Cell Collapsing Process
This paper introduces an extremely fast and simple implementation of the survey cell collapsing process. Prior implementations had used either several SQL queries or numerous DATA step arrays, with multiple data reads. This new approach uses a single hash object with a maximum of two data reads. The hash object provides an efficient and convenient mechanism for quick data storage and retrieval (sub-second total run time).
Read the paper (PDF) | Download the data file (ZIP)
Ahmed Al-Attar, AnA Data Warehousing Consulting, LLC
V
Session 8140-2016:
Virtual Accessing of a SAS® Data Set Using OPEN, FETCH, and CLOSE Functions with %SYSFUNC and %DO Loops
One of the truths about SAS® is that there are, at a minimum, three approaches to achieving any intended task, and each approach has its own pros and cons. Identifying and using efficient SAS programming techniques is always recommended, and efficient programming becomes mandatory when larger data sets are accessed. This paper describes the efficiency of virtual access to data sets, and the situations that call for it, using the OPEN, FETCH, and CLOSE functions with %SYSFUNC and %DO loops. There are several ways to get data set attributes such as the number of observations, the number of variables, and the variable types; for larger data sets, accessing them with the OPEN function and %SYSFUNC is among the most efficient. The FETCH function, used with OPEN, provides access to the values of variables in a SAS data set for conditional and iterative execution with %DO loops. Virtual access of a SAS data set is especially efficient for larger data sets in the following situations. Situation 1: It is often necessary to split a master data set into several pieces according to some criterion for batch submission, because the resulting data sets must be independent. Situation 2: For many reports, dashboards, and array processes, it is necessary to keep related variables together rather than maintain the original data set order. Situation 3: Much statistical analysis, particularly correlation and regression, relies on widely used SAS procedures that require a dynamic variable list to be passed; creating a single macro variable list might run into macro variable length limits on larger transactional data sets, so it is recommended that you prepare the list of variables as a data set and access it with the proposed approach wherever it is needed, instead of creating several macro variables.
Read the paper (PDF) | View the e-poster or slides (PDF)
Amarnath Vijayarangan, Emmes Services Pvt Ltd, India
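A condensed sketch of the pattern behind Situation 3: fetching variable names from a control data set and emitting them as a dynamic VAR list, with no intermediate macro variable. The control table VARNAMES and the macro GET_VALUES are hypothetical.

data varnames;                    /* hypothetical control table of variable names */
   length name $32;
   input name $;
   datalines;
Height
Weight
Systolic
;

%macro get_values(dsn=, var=);
   %local dsid varnum rc;
   %let dsid   = %sysfunc(open(&dsn, i));
   %let varnum = %sysfunc(varnum(&dsid, &var));
   %do %while (%sysfunc(fetch(&dsid)) = 0);   /* advance through the observations */
      %sysfunc(getvarc(&dsid, &varnum))       /* emit each character value as text */
   %end;
   %let rc = %sysfunc(close(&dsid));
%mend get_values;

proc corr data=sashelp.heart;
   var %get_values(dsn=work.varnames, var=name);
run;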
W
Session 8301-2016:
What to Expect When You Need to Make a Data Delivery. . . Helpful Tips and Techniques
Making a data delivery to a client is a complicated endeavor. Many aspects must be carefully considered and planned for: de-identification, public use versus restricted access, documentation, ancillary files (such as programs and formats), and methods of data transfer, among others. This paper provides a blueprint for planning and executing your data delivery.
Read the paper (PDF) | View the e-poster or slides (PDF)
Louise Hadden, Abt Associates
Session 11742-2016:
When Reliable Programs Fail: Designing for Timely, Efficient, Push-Button Recovery
Software quality comprises a combination of functional and performance requirements that specify not only what software should accomplish, but also how well it should accomplish it. Recoverability--a common performance objective--represents the timeliness and efficiency with which software or a system can resume functioning following a catastrophic failure. Thus, requirements for high-availability software often specify the recovery time objective (RTO), the maximum amount of time that software might be down following an unplanned failure or a planned outage. While systems demanding high or near-perfect availability require redundant hardware, network resources, and additional infrastructure, the software itself must also facilitate rapid recovery. And in environments in which system or hardware redundancy is infeasible, recoverability can be improved only through effective software development practices. Because even the most robust code can fail under duress or due to unavoidable or unpredictable circumstances, software reliability must incorporate recoverability principles and methods. This text introduces the TEACH mnemonic, whose guiding principles state that software recovery should be timely, efficient, autonomous, constant, and harmless. It also introduces the SPICIER mnemonic, which describes discrete phases in the recovery period, each of which can benefit from and be optimized with the TEACH principles. Software failure is inevitable, but its negative impacts can be minimized through SAS® development best practices.
Read the paper (PDF)
Troy Hughes, Datmesis Analytics
Session 9440-2016:
Who's Your Neighbor? A SAS® Algorithm for Finding Nearby Zip Codes
Even if you're not a GIS mapping pro, it pays to have some geographic problem-solving techniques in your back pocket. In this paper, we illustrate a general approach to finding the closest location to any given US zip code, with a specific, user-accessible example that uses only Base SAS®. We also suggest a method for implementing the solution in a production environment and demonstrate how parallel processing can cut down on computing time when hardware constraints exist.
Read the paper (PDF) | Download the data file (ZIP)
Andrew Clapson, MD Financial Management
Annmarie Smith, HomeServe USA
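A hedged sketch of the basic idea (not the paper's algorithm): compute the distance from a target zip code to a set of candidate locations with the ZIPCITYDISTANCE function and keep the nearest; the candidate table and zip codes are made up.

data locations;                    /* hypothetical candidate sites */
   input site $ zip;
   datalines;
A 10001
B 60601
C 94105
;

data distances;
   set locations;
   target_zip = 27513;                        /* hypothetical zip code of interest */
   miles = zipcitydistance(target_zip, zip);  /* centroid-to-centroid distance in miles */
run;

proc sort data=distances;
   by miles;
run;

data nearest;
   set distances(obs=1);           /* the closest candidate after sorting */
run;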
Session 6503-2016:
Writing Reusable Macros
I get annoyed when macros tread all over my SAS® environment, macro variables, and data sets. So how do you write macros that play nicely and clean up after themselves? This paper describes techniques for writing macros that do just that.
Read the paper (PDF) | Watch the recording
Philip Holland, Holland Numerics Ltd
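A small sketch of the 'clean up after yourself' habits the abstract refers to: keep macro variables local and delete scratch data sets before the macro exits. The macro and data set names are hypothetical.

%macro tidy_example(dsn=sashelp.class, cutoff=13);
   %local work_ds;
   %let work_ds = _tmp_subset;     /* scratch name kept in a LOCAL macro variable */

   data &work_ds;
      set &dsn;
      if age > &cutoff;
   run;

   proc print data=&work_ds;
   run;

   proc datasets lib=work nolist;  /* remove the scratch data set before exiting */
      delete &work_ds;
   quit;
%mend tidy_example;
%tidy_example()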
Y
Session 10600-2016:
You Can Bet on It: Missing Observations Are Preserved with the PRELOADFMT and COMPLETETYPES Options
Do you write reports that sometimes have missing categories across all class variables? Some programmers write all sorts of additional DATA step code to show the zeros for the missing rows or columns. Did you ever wonder whether there is an easier way? PROC MEANS and PROC TABULATE, in conjunction with PROC FORMAT, can handle this situation with a couple of powerful options. With PROC TABULATE, we can use the PRELOADFMT and PRINTMISS options together with a user-defined format from PROC FORMAT to accomplish this task. With PROC SUMMARY, we can use the COMPLETETYPES option to get all the rows, including those with zeros. This paper uses examples from Census Bureau tabulations to illustrate how these procedures and options preserve missing rows or columns.
Read the paper (PDF) | Watch the recording
Chris Boniface, Census Bureau
Janet Wysocki, U.S. Census Bureau
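A compact illustration of the two options the abstract names, using a made-up region format and sales table rather than Census data.

proc format;
   value $regfmt 'N' = 'North'  'S' = 'South'
                 'E' = 'East'   'W' = 'West';
run;

data sales;
   input region $ amount;
   datalines;
N 100
S 250
N 175
;

/* PROC TABULATE: PRELOADFMT on CLASS plus PRINTMISS on TABLE shows East and West */
proc tabulate data=sales;
   class region / preloadfmt;
   format region $regfmt.;
   var amount;
   table region, amount*(n sum) / printmiss;
run;

/* PROC SUMMARY: COMPLETETYPES produces the zero-frequency rows for the unused categories */
proc summary data=sales completetypes nway;
   class region / preloadfmt;
   format region $regfmt.;
   var amount;
   output out=sales_summary sum=total_amount;
run;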