In the clinical research world, data accuracy plays a significant role in delivering quality results. Various validation methods are available to confirm data accuracy. Of these, double programming is the most highly recommended and commonly used method to demonstrate a perfect match between production and validation output. PROC COMPARE is one of the SAS® procedures used to compare two data sets and confirm the accuracy. In current practice, whenever a program is rerun, the programmer must manually review the output file to ensure an exact match. This is tedious, time-consuming, and error-prone because there are many LST files to review and manual intervention is required. The proposed approach programmatically validates the output of PROC COMPARE in all the programs and generates an HTML output file with a Pass/Fail status flag for each output file, along with the following: (1) data set name, label, and number of variables and observations; (2) number of observations not in both compared and base data sets; (3) number of variables not in both compared and base data sets. The status is flagged as Pass whenever the output file meets the following 3N criteria: (1) the output contains "NOTE: No unequal values were found. All values compared are exactly equal."; (2) the numbers of observations in the base and compared data sets are equal; (3) the numbers of variables in the base and compared data sets are equal. The SAS® macro %threeN efficiently validates all the output files generated from PROC COMPARE in a short time and also expedites validation with accuracy. It reduces the time spent on the manual review of the PROC COMPARE output files by up to 90%.
Amarnath Vijayarangan, Emmes Services Pvt Ltd, India
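The 3N criteria described in the abstract above can be illustrated with a minimal sketch. This is not the %threeN macro itself. It is written in Python purely for illustration, and it assumes the observation and variable counts have already been extracted from the PROC COMPARE listing. The names CompareResult and three_n_status are hypothetical.

```python
from dataclasses import dataclass

# The note text quoted in the abstract as the first of the 3N criteria.
EQUAL_NOTE = "NOTE: No unequal values were found. All values compared are exactly equal."


@dataclass
class CompareResult:
    """Hypothetical container for values taken from one PROC COMPARE listing."""
    listing_text: str  # raw text of the LST file
    base_nobs: int     # observations in the base data set
    comp_nobs: int     # observations in the compared data set
    base_nvar: int     # variables in the base data set
    comp_nvar: int     # variables in the compared data set


def three_n_status(result: CompareResult) -> str:
    """Return "Pass" only when all three criteria hold, otherwise "Fail"."""
    # Collapse whitespace so a note wrapped across listing lines still matches.
    flat = " ".join(result.listing_text.split())
    checks = (
        EQUAL_NOTE in flat,                     # criterion 1: the equality note
        result.base_nobs == result.comp_nobs,   # criterion 2: observation counts
        result.base_nvar == result.comp_nvar,   # criterion 3: variable counts
    )
    return "Pass" if all(checks) else "Fail"
```

Requiring all three checks mirrors the abstract's wording: equal values alone are not enough if one data set has extra observations or variables.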
There are currently thousands of Jamaican citizens who lack access to basic health care. In order to improve the health-care system, I collect and analyze data from two clinics in remote locations of the island. This report analyzes data collected from Clarendon Parish, Jamaica. In order to create a descriptive analysis, I use SAS® Studio 9.4. The procedures I use include PROC IMPORT, PROC MEANS, PROC FREQ, and PROC GCHART. After conducting the aforementioned procedures, I am able to produce a descriptive analysis of the health issues plaguing the island.
Verlin Joseph, Florida A&M University
Disability rights groups are using the court system to exact change. As a result, the enforcement of Section 508 and similar laws around the world has become a priority. When you generate SAS® output, you need to protect your organization from litigation. This paper describes concrete steps that help you use SAS® 9.4 Output Delivery System (ODS) to create SAS output that complies with accessibility standards. It also provides recommendations and code samples that are aligned with the accessibility standards defined by Section 508 and the Web Content Accessibility Guidelines (WCAG 2.0).
Glen Walker, SAS
Common tasks that we need to perform are merging or appending SAS® data sets. During this process, we sometimes get error or warning messages saying that the same fields in different SAS data sets have different lengths or different types. If the problems involve a lot of fields and data sets, we need to spend a lot of time identifying those fields and writing extra SAS code to solve the issues. However, the macro in this paper can help you identify the fields that have inconsistent data type or length issues. It also solves the length issues automatically by finding the maximum field length among the current data sets and assigning that length to the field. After the macro runs, an HTML report is generated that shows which fields' lengths have been changed and which fields have inconsistent data type issues.
Ting Sa, Cincinnati Children's Hospital Medical Center
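The length-harmonization logic described in the abstract above can be sketched as follows. This is an illustration in Python, not the author's SAS macro. The function name harmonize_lengths and the input layout (a dictionary mapping each data set name to its fields, each with a type and a length) are assumptions made for the example.

```python
def harmonize_lengths(schemas):
    """Find length changes and type conflicts across data sets.

    schemas: {dataset_name: {field_name: (field_type, length)}}
    Returns (changes, type_conflicts), where changes maps
    (dataset_name, field_name) to (old_length, new_length) and
    type_conflicts is a sorted list of field names whose type differs
    between data sets.
    """
    max_len = {}
    types_seen = {}
    # First pass: record every type seen and the maximum length per field.
    for fields in schemas.values():
        for name, (ftype, length) in fields.items():
            types_seen.setdefault(name, set()).add(ftype)
            max_len[name] = max(max_len.get(name, 0), length)

    # Fields with more than one type cannot be fixed by changing a length.
    type_conflicts = sorted(n for n, t in types_seen.items() if len(t) > 1)

    # Second pass: list each field that must grow to the maximum length.
    changes = {}
    for ds, fields in schemas.items():
        for name, (ftype, length) in fields.items():
            if name not in type_conflicts and length < max_len[name]:
                changes[(ds, name)] = (length, max_len[name])
    return changes, type_conflicts
```

The two return values correspond to the two parts of the report the abstract mentions: fields whose lengths were changed and fields with inconsistent data types.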
Two of the powerful features of ODS Graphics procedures are the ability to create forest plots and the ability to add inner margin tables to graphics output. The drawback, however, is that the PROC TEMPLATE syntax required of the programmer is complex and tedious. A prompted application or even a parameterized stored process that connects PROC TEMPLATE code to a point-and-click application definitely makes life easier for coders in many industries who frequently create these types of graphic output.
Ted Durie, SAS
With the advent of the exciting new hybrid field of Data Science, programming and data management skills are in greater demand than ever and have never been easier to attain. Online resources like codecademy and w3schools offer a host of tutorials and assistance to those looking to develop their programming abilities and knowledge. Though their content is limited to languages and tools suited mostly for web developers, the value and quality of these sites are undeniable. To this end, similar tutorials for other free-to-use software applications are springing up. The interactivity of these tutorials elevates them above most, if not all, other out-of-classroom learning tools. The process of learning programming or a new language can be quite disjointed when trying to pair a textbook or similar walk-through material with matching coding tasks and problems. These sites unify these pieces for users by presenting them with a series of short, simple lessons that always require the user to demonstrate their understanding in a coding exercise before progressing. After teaching SAS® in a classroom environment, I became fascinated by the potential for a similar student-driven approach to learning SAS. This could afford me more time to provide individualized attention, as well as open up additional class time to more advanced topics. In this talk, I discuss my development of a series of SAS scripts that walk the user through learning the basics of SAS and that involve programming at every step of the process. This collection of scripts should serve as a self-contained, pseudo-interactive course in SAS basics that students could be asked to complete on their own in a few weeks, leaving the remainder of the term to be spent on more challenging, realistic tasks.
Hunter Glanz, California Polytechnic State University
With increasing data needs, it becomes more and more unwieldy to ensure that all scheduled jobs are running successfully and on time. Worse, maintaining your reputation as an information provider becomes a precarious prospect as the likelihood increases that your customers alert you to reporting issues before you are even aware of them yourself. By combining various SAS® capabilities and tying them together with concise visualizations, it is possible to track jobs actively and alert customers to issues before they become a problem. This paper introduces a report tracking framework that helps achieve this goal and improve customer satisfaction. The report tracking starts by obtaining table and job statuses and then color-codes them by severity levels. Based on the job status, it then goes deeper into the log to search for potential errors or other relevant information to help assess the processes. The last step is to send proactive alerts to users informing them of possible delays or potential data issues.
Jia Heng, Wyndham Destination Network
The new and highly anticipated SAS® Output Delivery System (ODS) destination for Microsoft Excel is finally here! Available as a production feature in the third maintenance release of SAS® 9.4 (TS1M3), this new destination generates native Excel (XLSX) files that are compatible with Microsoft Office 2010 or later. This paper is written for anyone, from entry-level programmers to business analysts, who uses the SAS® System and Microsoft Excel to create reports. The discussion covers features and benefits of the new Excel destination, differences between the Excel destination and the older ExcelXP tagset, and functionality that exists in the ExcelXP tagset that is not available in the Excel destination. These topics are all illustrated with meaningful examples. The paper also explains how you can bridge the gap that exists as a result of differences in the functionality between the destination and the tagset. In addition, the discussion outlines when it is beneficial for you to use the Excel destination versus the ExcelXP tagset, and vice versa. After reading this paper, you should be able to make an informed decision about which tool best meets your needs.
Chevell Parker, SAS
This work presents a macro for generating a ternary graph that can be used to solve a problem in the ceramics industry. The ceramics industry uses mixtures of clay types to generate a product with good properties that can be assessed according to breakdown pressure after incineration, the porosity, and water absorption. Beyond these properties, the industry is concerned with managing geological reserves of each type of clay. Thus, it is important to seek alternative compositions that present properties similar to the default mixture. This can be done by analyzing the surface response of these properties according to the clay composition, which is easily done in a ternary graph. SAS® documentation does not describe how to adjust an analysis grid in nonrectangular forms on graphs. A macro triaxial, however, can generate ternary graphical analysis by creating a special grid, using an annotated database, and making linear transformations of the original data. The GCONTOUR procedure is used to generate three-dimensional analysis.
Igor Nascimento, UNB
Zacarias Linhares, IFPI
Roberto Soares, IFPI
This paper demonstrates how to use the ODS destination for PowerPoint to create attractive presentations from your SAS® output. Packed with examples, this paper gives you a behind-the-scenes tour of how ODS creates Microsoft PowerPoint presentations. You get an in-depth look at how to customize the ODS PowerPoint style templates that control the appearance of your presentation. With this information you can quickly turn your SAS output into an engaging and informative presentation. This paper is a follow-on to the SAS® Global Forum 2013 paper A First Look at the ODS Destination for PowerPoint.
Tim Hunter, SAS
Longitudinal and repeated measures data are seen in nearly all fields of analysis. Examples of this data include weekly lab test results of patients or test scores of children from the same class. Statistics students and analysts alike might be overwhelmed when it comes to repeated measures or longitudinal data analyses. They might try to educate themselves by diving into textbooks or taking semester-long or intensive weekend courses, resulting in even more confusion. Some might try to ignore the repeated nature of data and take shortcuts such as analyzing all data as independent observations or analyzing summary statistics such as averages or changes from first to last points and ignoring all the data in between. This hands-on presentation introduces longitudinal and repeated measures analyses without heavy emphasis on theory. Students in the workshop will have the opportunity to get hands-on experience graphing longitudinal and repeated measures data. They will learn how to approach these analyses with tools like PROC MIXED and PROC GENMOD. Emphasis will be on continuous outcomes, but categorical outcomes will briefly be covered.
Leanne Goldstein, City of Hope
SAS® functions provide amazing power to your DATA step programming. Some of these functions are essential--they save you from writing volumes of unnecessary code. This talk covers a number of the most useful SAS functions. Some may be new to you, and they will change the way you program and approach common programming tasks. The majority of the functions discussed in this talk work with character data. Some functions search for strings, and others find and replace strings or join strings together. Still others measure the spelling distance between two strings (useful for fuzzy matching). Some of the newest and most amazing functions are not functions at all, but CALL routines. Did you know that you can sort values within an observation? Did you know that you can identify not only the largest or smallest value in a list of variables, but also the second-, third-, or nth-largest or smallest value? Knowledge of the functions described here will make you a much better SAS programmer.
Ron Cody, Camp Verde Associates
The Waze application, purchased by Google in 2013, alerts millions of users about traffic congestion, collisions, construction, and other complexities of the road that can stymie motorists' attempts to get from A to B. From jackknifed rigs to jackalope carcasses, roads can be gnarled by gridlock or littered with obstacles that impede traffic flow and efficiency. Waze algorithms automatically reroute users to more efficient routes based on user-reported events as well as on historical norms that demonstrate typical road conditions. Extract, transform, load (ETL) infrastructures often represent serialized process flows that can mimic highways and that can become similarly snarled by locked data sets, slow processes, and other factors that introduce inefficiency. The LOCKITDOWN SAS® macro, introduced at the Western Users of SAS® Software Conference 2014, detects and prevents data access collisions that occur when two or more SAS processes or users simultaneously attempt to access the same SAS data set. Moreover, the LOCKANDTRACK macro, introduced at the conference in 2015, provides real-time tracking of and historical performance metrics for locked data sets through a unified control table, enabling developers to hone processes in order to optimize efficiency and data throughput. This paper demonstrates the implementation of LOCKANDTRACK and its lock performance metrics to create data-driven, fuzzy logic algorithms that preemptively reroute program flow around inaccessible data sets. Thus, rather than needlessly waiting for a data set to become available or for a process to complete, the software actually anticipates the wait time based on historical norms, performs other (independent) functions, and returns to the original process when it becomes available.
Troy Hughes, Datmesis Analytics
In health sciences and biomedical research we often search for scientific journal abstracts and publications within MEDLINE, which is a suite of indexed databases developed and maintained by the National Center for Biotechnology Information (NCBI) at the United States National Library of Medicine (NLM). PubMed is a free search engine within MEDLINE and has become one of the standard databases to search for scientific abstracts. Entrez is the information retrieval system that gives you direct access to the 40 databases with over 1.3 billion records within the NCBI. You can access these records by using the eight e-utilities (einfo, esearch, esummary, efetch, elink, espell, epost, and egquery), which are the NCBI application programming interfaces (APIs). In this paper I focus on using three of the e-utilities to retrieve data from PubMed as opposed to the standard method of manually searching and exporting the information from the PubMed website. I demonstrate how to use the SAS® HTTP procedure along with the XML Mapper to develop a SAS macro that generates all the required data from PubMed based on search term parameters. Using SAS to extract information from PubMed searches and save it directly into a data set allows the process to be automated (which is good for running routine updates) and eliminates manual searching and exporting from PubMed.
Craig Hansen, South Australian Health and Medical Research Institute
Over the last few years both Microsoft Excel file formats and the SAS® interfaces to those Excel formats have changed. SAS® has worked hard to make the interface between the two systems easier to use, starting with comma-separated values (CSV) files and moving to PROC IMPORT and PROC EXPORT, LIBNAME processing, SQL processing, SAS® Enterprise Guide®, JMP®, and then on to the HTML and XML tagsets like MSOFFICE2K and EXCELXP. Well, there is now a new entry into the processes available for SAS users to send data directly to Excel. This new entry into the ODS arena of data transfer to Excel is the ODS destination called EXCEL. This process is included within SAS ODS and produces native format Excel files for version 2007 of Excel and later. It was first shipped as an experimental version with the first maintenance release of SAS® 9.4. This ODS destination has many features similar to the EXCELXP tagset.
William E Benjamin Jr, Owl Computer Consultancy LLC
In an effort to increase transparency and accountability in the US health care system, the Obama administration mandated the Centers for Medicare & Medicaid Services (CMS) to make data available for use by researchers and interested parties from the general public. Among the more well-known uses of this data are analyses published by the Wall Street Journal showing a large, and in some cases shocking, discrepancy between what hospitals potentially charge the uninsured and what they are paid by Medicare for the same procedure. Analyses such as these highlight both potential inequities in the US health care system and, more importantly, potential opportunities for its reform. However, while capturing the public imagination, analyses such as these are but one means to capitalize on the remarkable wealth of information this data provides. Specifically, the publicly distributed CMS data can help both researchers and the public better understand the burden specific conditions and medical treatments place on the US health care system. It was this simple but important objective that motivated the present study. Our specific analyses focus on what we believe to be two important questions. First, using the total number of hospital discharges as a proxy for incidence of a condition or treatment, which conditions or treatments have the highest incidence rates nationally? Does their incidence remain stable, or is it increasing or decreasing? And is there variability in these incidence rates across states? Second, as psychologists, we are necessarily interested in understanding the state of mental health care. To date, and to the best of our knowledge, there has been no study utilizing the public inpatient Medicare provider utilization and payment data set to explore the utilization of mental illness services funded by Medicare.
Joo Ann Lee, York University
Michael Friendly, York University
Cathy Labrish, York University
So you've heard about SAS® arrays, but you're not sure when or why you would use them. This presentation provides some background on SAS arrays, from explaining what occurs during compile time to explaining how to use them programmatically. It also includes a discussion about how DO loops and macro variables can enhance array usability. Specific examples, including Fahrenheit-to-Celsius temperature conversion, salary adjustments, and data transposition and counting, demonstrate how you can use SAS arrays effectively in your own work and also provide a few caveats about their use.
Andrew Kuligowski, HSN
Lisa Mendez, IMS Government Solutions
Hospital Episode Statistics (HES) is a data warehouse that contains records of all admissions, outpatient appointments, and accident and emergency (A&E) attendances at National Health Service (NHS) hospitals in England. Each year it processes over 125 million admitted patient, outpatient, and A&E records. Such a large data set gives endless research opportunities for researchers and health-care professionals. However, patient care data is complex and might be difficult to manage. This paper demonstrates the flexibility and power of SAS® programming tools such as the DATA step, the SQL procedure, and macros to help to analyze HES data.
Violeta Balinskaite, Imperial College London
For some users, having an annotation facility is an integral part of creating polished graphics for their work. To meet that need, we created a new annotation facility for the ODS Graphics procedures in SAS® 9.3. Now, with SAS® 9.4, the Graph Template Language (GTL) supports annotation as well! In fact, the GTL annotation facility has some unique features not available in the ODS Graphics procedures, such as using multiple sets of annotation in the same graph and the ability to bind annotation to a particular cell in the graph. This presentation covers some basic concepts of annotating that are common to both GTL and the ODS Graphics procedures. I apply those concepts to demonstrate some unique abilities of GTL annotation. Come see annotation in action!
Dan Heath, SAS
Arrays and DO loops have been used for years by SAS® programmers who work with diagnosis fields in health-care data, and the opportunity to use these techniques has only grown since the launch of the Affordable Care Act (ACA) in the United States. Users new to SAS or to the health-care field may find an overview of existing (as well as new) applications helpful. Risk-adjustment software, including the publicly available Health and Human Services (HHS) risk software that uses SAS and was released as part of the ACA implementation, is one example of code that is significantly improved by the use of arrays. Similar projects might include evaluations of diagnostic persistence, comparisons of diagnostic frequency or intensity between providers, and checks for unusual clusters of diagnosed conditions. This session reviews examples suitable for intermediate SAS users, including the application of two-dimensional arrays to diagnosis fields.
Ryan Ferland, Blue Cross Blue Shield of Arizona
This presentation is an open-ended discussion about techniques for creating graphical reports using SAS® ODS Graphics components. After some introductory comments, this presentation does not have any set content. Instead, the topics discussed are dictated by attendee questions. Come prepared to ask and get answers to your questions.
This paper provides tools and guidelines to assess SAS® skill level during the interview process. Included are interview questions for Base SAS® developers, statisticians, and SAS administration candidates. Whether the interviewer has a deep understanding of or just some familiarity with these areas, the questions provided will allow discussions to uncover skills and usage experience for the skills your company requires.
Jenine Milum, Citi
Traditional approaches to assessing undergraduate assignments in the field of software-related courses, including Analytics and Data Science courses, involve the course tutors in reading the students' code and getting the students to physically demonstrate their artifacts. However, this approach tends to assess only the technical skills of solving the set task. It generally fails to assess the many soft skills that industry is looking for, as identified in the e-skills UK (Tech Partnership)/SAS® report of November 2014 and the associated infographic poster. This presentation describes and evaluates the effectiveness of a different approach to defining the assessment task and summatively assessing the work of the students in order to effectively evaluate and mark both the soft skills, including creativity, curiosity, storytelling, problem solving, and communication, and the technical skills. This approach works effectively at all levels of undergraduate and masters courses. The session is structured to provide adequate time for audience participation to discuss the approach and its applicability.
Richard Self, University of Derby
For me, it's all about avoiding manual effort and repetition. Whether your work involves data exploration, reporting, or analytics, you probably find yourself repeating steps and tasks with each new program, project, or analysis. That repetition adds time to the delivery of results and also contributes to a lack of standardization. This presentation focuses on productivity tips and tricks to help you create a standard and efficient environment for your SAS® work so you can focus on the results and not the processes. Included are the following: setting up your programming environment (comment blocks, environment cleanup, easy movement between test and production, and modularization); sharing easily with your team (format libraries, macro libraries, and common code modules); and managing files and results (date and time stamps for logs and output, project IDs, and titles and footnotes).
Marje Fecht, Prowerk Consulting
The power of SAS®9 applications allows information and knowledge creation from very large amounts of data. Analysis that used to consist of 10s to 100s of gigabytes (GBs) of supporting data has rapidly grown into the 10s to 100s of terabytes (TBs). This data expansion has resulted in more and larger SAS® data stores. Setting up file systems to support these large volumes of data with adequate performance, as well as ensuring adequate storage space for the SAS temporary files, can be very challenging. Technology advancements in storage and system virtualization, flash storage, and hybrid storage management require continual updating of best practices to configure IO subsystems. This paper presents updated best practices for configuring the IO subsystem for your SAS®9 applications, ensuring adequate capacity, bandwidth, and performance for your SAS®9 workloads. We have found that very few storage systems work ideally with SAS with their out-of-the-box settings, so it is important to convey these general guidelines.
Tony Brown, SAS
Surveys are a critical research component when trying to gather information about a population of people and their knowledge, attitudes, and experiences. Researchers often implement surveys after a population has participated in a class or educational program. Survey developers often create sets of items, which when analyzed as a group, can measure a construct that describes the underlying behavior, attribute, or characteristic of a study participant. After testing for the reliability and validity of individual items, the group of items can be combined to create a scale score that measures a predefined construct. For example, in education research, a construct can measure students' use of global reading strategies or teachers' confidence in literacy instruction. The number of items that compose a construct can vary widely. Construct scores can be used as outcome measures to assess the impact of a program on its participants. Our example is taken from a project evaluating a teacher's professional development program aimed at improving students' literacy in target subject classes. The programmer is tasked with creating such scores quickly. With an unlimited amount of time to spend, a programmer could operationalize the creation of these constructs by manually entering predefined formulas in the DATA step. More often, time and resources are limited, so the programmer can work more efficiently by automating this process with macro programs. In this paper, we present a technique that uses an externally created key data set that contains the construct specifications and corresponding items of each construct. By iterating through macro variables created from this key data set, we can create construct score variables for varying numbers of items. This technique can be generalized to other processing tasks that involve any number of mathematical combinations of variables to create one single score.
Vincent Chan, IMPAQ International
Lorena Ortiz, IMPAQ International
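The key-driven scoring technique described in the abstract above can be sketched in a few lines. This is an illustration in Python rather than the authors' SAS macro. The key layout (pairs of construct name and item name) and the choice of the mean of the non-missing items as the scale score are assumptions made for the example, as is the function name construct_scores.

```python
from statistics import mean


def construct_scores(key_rows, responses):
    """Compute one score per construct for a single respondent.

    key_rows:  iterable of (construct_name, item_name) pairs, standing in
               for the external key data set
    responses: {item_name: numeric answer, or None when missing}
    """
    # Group item names under their construct, keeping the key's order.
    items_by_construct = {}
    for construct, item in key_rows:
        items_by_construct.setdefault(construct, []).append(item)

    scores = {}
    for construct, items in items_by_construct.items():
        # Assumption: the score is the mean of the non-missing items.
        values = [responses[i] for i in items if responses.get(i) is not None]
        scores[construct] = mean(values) if values else None
    return scores
```

Because the constructs and their items live in the key rather than in the code, adding a construct or changing its item count requires no change to the scoring logic, which is the point the abstract makes about handling varying numbers of items.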
Nowadays, the recommender system is a popular tool for online retailer businesses to predict customers' next-product-to-buy (NPTB). Based on statistical techniques and the information collected by the retailer, an efficient recommender system can suggest a meaningful NPTB to customers. A useful suggestion can reduce the customer's searching time for a wanted product and improve the buying experience, thus increasing the chance of cross-selling for online retailers and helping them build customer loyalty. Within a recommender system, the combination of advanced statistical techniques with available information (such as customer profiles, product attributes, and popular products) is the key element in using the retailer's database to produce a useful suggestion of an NPTB for customers. This paper illustrates how to create a recommender system with the SAS® RECOMMEND procedure for online business. Using the recommender system, we can produce predictions, compare the performance of different predictive models (such as decision trees or multinomial discrete-choice models), and make business-oriented recommendations from the analysis.
Miao Nie, ABN AMRO Bank
Shanshan Cong, SAS Institute
Formats are powerful tools within the SAS® System. They can be used to change how information is brought into SAS, to modify how it is displayed, and even to reshape the data itself. Base SAS® comes with a great many predefined formats, and it is even possible for you to create your own specialized formats. This paper briefly reviews the use of formats in general and then covers a number of aspects of user-generated formats. Since formats themselves have a number of uses that are not at first apparent to the new user, we also look at some of the broader applications of formats. Topics include building formats from data sets, using picture formats, transformations using formats, value translations, and using formats to perform table lookups.
Art Carpenter, California Occidental Consultants
You would think that training as a code breaker, similar to those who were employed during the Second World War, wouldn't be necessary to perform the routine SAS® programming duties of your job, such as debugging code. However, if the author of the code doesn't incorporate good elements of style in his or her program, the task of reviewing code becomes arduous and tedious for others. Style touches upon very specific aspects of writing code--indentation for code and comments, casing of keywords, variables, and data set names, and spacing between PROC and DATA steps. Paying attention to these specific issues enhances the reusability and lifespan of your code. By using style to make your code readable, you'll impress your superiors and grow as a programmer.
Jay Iyengar, Data Systems Consultants
With all the talk of big data and visual analytics, we sometimes forget how important, and often difficult, it is to get external data into SAS®. In this paper, we review some common data sources such as delimited sources (for example, comma-separated values format [CSV]) as well as structured flat files and the programming steps needed to successfully load these files into SAS. In addition to examining the INFILE and INPUT statements, we look at some methods for dealing with bad data. This paper assumes only basic SAS skills, although the topic can be of interest to anyone who needs to read external files.
Peter Eberhardt, Fernwood Consulting Group Inc.
Audrey Yeo, Athene
Even though marketing is inevitable in every business, every year the marketing budget is limited and prudent fund allocations are required to optimize marketing investment. In many businesses, the marketing fund is allocated based on the marketing manager's experience, departmental budget allocation rules, and sometimes 'gut feelings' of business leaders. Those traditional ways of budget allocation yield suboptimal results and in many cases lead to money being wasted on irrelevant marketing efforts. Marketing mixed models can be used to understand the effects of marketing activities and identify the key marketing efforts that drive the most sales among a group of competing marketing activities. The results can be used in marketing budget allocation to take out the guesswork that typically goes into the budget allocation. In this paper, we illustrate practical methods for developing and implementing marketing mixed modeling using SAS® procedures. Real-life challenges of marketing mixed model development and execution are discussed, and several recommendations are provided to overcome some of those challenges.
Delali Agbenyegah, Alliance Data Systems
In the biopharmaceutical industry, biostatistics plays an important and essential role in the research and development of drugs, diagnostics, and medical devices. Familiarity with biostatistics combined with knowledge of SAS® software can lead to a challenging and rewarding career that also improves patients' lives. This paper provides a broad overview of the different types of jobs and career paths available, discusses the education and skill sets needed for each, and presents some ideas for overcoming entry barriers to careers in biostatistics and clinical SAS programming.
Justina Flavin, Independent Consultant
As a SAS® programmer, you probably spend some of your time reading and possibly creating specifications. Your job also includes writing and testing SAS code to produce the final product, whether it is Study Data Tabulation Model (SDTM) data sets, Analysis Data Model (ADaM) data sets, or statistical outputs such as tables, listings, or figures. You reach the point where you have completed the initial programming, removed all obvious errors and warnings from your SAS log, and checked your outputs for accuracy. You are almost done with your programming task, but one important step remains. It is considered best practice to check your SAS log for any questionable messages generated by the SAS system. In addition to messages that begin with the words WARNING or ERROR, there are also messages that begin with the words NOTE or INFO. This paper will focus on five different types of NOTE messages that commonly appear in the SAS log and will present ways to remove these messages from your log.
Jennifer Srivastava, Quintiles
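As a rough illustration of the log check described in the abstract above, the following Python sketch scans log text for NOTE lines that are often treated as questionable. The abstract does not list the paper's five message types, so the patterns in `QUESTIONABLE_NOTES` are an assumption, as are the function name and the sample log.

```python
import re

# Assumed patterns for NOTE messages commonly treated as questionable.
# The paper's own five categories are not listed in the abstract.
QUESTIONABLE_NOTES = [
    r"is uninitialized",
    r"have been converted to",
    r"Missing values were generated",
    r"MERGE statement has more than one data set with repeats of BY values",
    r"Division by zero detected",
]


def find_questionable_notes(log_text):
    """Return (line_number, line) pairs for NOTE lines matching a pattern."""
    pattern = re.compile("|".join(QUESTIONABLE_NOTES))
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if line.startswith("NOTE:") and pattern.search(line):
            hits.append((number, line))
    return hits


# Hypothetical log excerpt used only for demonstration.
sample_log = (
    "NOTE: Variable total is uninitialized.\n"
    "NOTE: There were 10 observations read from the data set WORK.A.\n"
    "NOTE: Character values have been converted to numeric values at the places given by:\n"
    "WARNING: Some other message.\n"
)

flagged = find_questionable_notes(sample_log)
```

Only lines that begin with NOTE: are inspected here; WARNING and ERROR lines would need their own check.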
Graphs are essential for many clinical and health care domains, including analysis of clinical trials safety data and analysis of the efficacy of the treatment, such as change in tumor size. Creating such graphs is a breeze with procedures from SAS® 9.4 ODS Graphics. This paper shows how to create many industry-standard graphs such as Lipid Profile, Swimmer Plot, Survival Plot, Forest Plot with Subgroups, Waterfall Plot, and Patient Profile using Study Data Tabulation Model (SDTM) data with just a few lines of code.
Sanjay Matange, SAS
Managing and coordinating various aspects of a report can be challenging. This is especially true when the structure and composition of the report are data driven. For complex reports, the inclusion of attributes such as color, labeling, and the ordering of items complicates the coding process. Fortunately we have some powerful reporting tools in SAS® that allow the process to be automated to a great extent. In the example presented in this paper, we are tasked with generating an Excel spreadsheet that ranks types of injuries within age groups. A given injury type is to be assigned a constant color regardless of its rank, and the labeling is to include not only the injury label but the actual count as well. Of course the user needs to be able to control such things as the age groups, color selection and order, and number of desired ranks.
Art Carpenter, California Occidental Consultants
Suppose that you have a very large data set and you want to split it into separate comma-separated values (CSV) files based on the values in one specific column. You might write IF/THEN and ELSE conditions in SAS®, along with some OUTPUT statements, to divide the data set. Even then, converting each of the resulting data sets into a CSV file by the conventional, manual process is frustrating. This paper compares two approaches: a SAS macro that uses PROC EXPORT, and the Output Delivery System (ODS) used with PROC TABULATE. In both approaches, the whole tedious process is done automatically by the SAS code.
Saurabh Nandy, Oklahoma State University
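The splitting task in the abstract above can be sketched outside SAS as follows. This Python version is not the paper's macro or ODS approach; it only shows the idea of grouping rows by one column and producing one CSV per distinct value. The column names and sample rows are hypothetical, and the CSV text is built in memory rather than written to disk.

```python
import csv
import io
from collections import defaultdict


def split_rows_by_column(rows, column):
    """Group dictionaries by the value found in one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


def to_csv_text(rows, fieldnames):
    """Render a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical data: split on the "region" column.
rows = [
    {"id": "1", "region": "East"},
    {"id": "2", "region": "West"},
    {"id": "3", "region": "East"},
]

csv_by_region = {
    value: to_csv_text(group, ["id", "region"])
    for value, group in split_rows_by_column(rows, "region").items()
}
```

Each entry of `csv_by_region` could then be written to its own file, named after the key.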
SAS® macros offer a very flexible way of developing, organizing, and running SAS code. In many systems, programming components are stored as macros in files and called as macros via a variety of means so that they can perform the task at hand. Consideration must be given to the organization of these files in a manner consistent with proper use and retention. An example might be the development of some macros that are company-wide in potential application, others that are departmental, and still others that might be for personal or individual use. Superimposed on this structure in a factorial fashion is the need to have areas that are considered (1) production - validated, (2) archived - retired, and (3) developmental. Several proposals for accomplishing this are discussed, as well as how this structure might interrelate with stored processes available in some SAS systems. The use of these macros in systems ranging from simple background batch processing to highly interactive SAS and BI processing is discussed. Specifically, the drivers or driver files are addressed. Pros and cons of all approaches are considered.
Roger Muller, Data-To-Events, Inc.
Within SAS® literally millions of colors are available for use in our charts, graphs, and reports. We can name these colors using techniques that include color wheels, RGB (Red, Green, Blue) HEX codes, and HLS (Hue, Lightness, Saturation) HEX codes. But sometimes I just want to use a color by name. When I want purple, I want to be able to ask for purple, not CX703070 or H03C5066. But am I limiting myself to just one purple? What about light purple or pinkish purple? Do those colors have names or must I use the codes? It turns out that they do have names. Names that we can use. Names that we can select, names that we can order, names that we can use to build our graphs and reports. This paper shows you how to gather color names and manipulate them so that you can take advantage of your favorite purple, be it purple, grayish purple, vivid purple, or pale purplish blue.
Art Carpenter, California Occidental Consultants
You might already know that SAS® Studio tasks provide you with prompts to fill in the blanks in SAS® code and help you navigate SAS syntax. But did you know that SAS Studio tasks are not only designed to allow you to modify them, but there is an entire common task model (CTM) provided for you to build your own? You can build basic utilities or complex reports. You can just run SAS code or you can have elaborate prompts to dynamically generate code. And since SAS Studio runs in a web browser, your tasks are browser-based and can easily be shared with others. In this paper, you learn how to take a typical SAS program and create a SAS Studio task from it. When the task is run, you see web-based prompts for the dynamic portions of the program and HTML output delivered to your browser.
Kris Kiser, SAS
Christie Corcoran, SAS
Marie Dexter, SAS
Amy Peters, SAS
This paper shows how to program a powerful Q-gram algorithm for measuring the similarity of two character strings. Along with built-in SAS® functions--such as SOUNDEX, SPEDIS, COMPGED, and COMPLEV--Q-gram can be a valuable tool in your arsenal of string comparators. Q-gram is especially useful when measuring the similarity of strings that are intuitively identical, but which have a different word order, such as John Smith and Smith, John.
Joe DeShon, Boehringer-Ingelheim
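To make the Q-gram idea in the abstract above concrete, here is a minimal Python sketch. It is not the paper's SAS implementation: the choice of q = 2, the removal of spaces and punctuation, and the Dice-style score are all assumptions made for illustration.

```python
from collections import Counter


def qgrams(text, q=2):
    """Count the overlapping q-character substrings of a cleaned string."""
    # Assumption: ignore case, spaces, and punctuation before splitting.
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum())
    return Counter(cleaned[i:i + q] for i in range(len(cleaned) - q + 1))


def qgram_similarity(a, b, q=2):
    """Dice-style score: 2 * shared q-grams / total q-grams, from 0.0 to 1.0."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    shared = sum((grams_a & grams_b).values())
    return 2 * shared / total


score = qgram_similarity("John Smith", "Smith, John")
```

Because the score depends on which short substrings the two strings share rather than on where they occur, reordering the words costs only the few q-grams that span the word boundary.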
You've heard that SAS® Output Delivery System (ODS) Graphics provides a powerful and detailed syntax for creating custom graphs, but for whatever reason you still haven't added them to your bag of SAS® tricks. Let's change that! We will also present a code playground based on Microsoft Office that will enable you to quickly try out a variety of prepared SAS ODS Graphics examples, tweak the code, and see the results--all directly from Microsoft Excel. More experienced users will also find the code playground (which is similar in spirit to Google Code Playground or JSFiddle) useful for compiling SAS ODS Graphics code snippets for themselves and for sharing with colleagues, as well as for creating dashboards hosted by Microsoft Excel or Microsoft PowerPoint that contain precisely sized and placed SAS graphics.
Ted Conway, Discover Financial Services
Ezequiel Torres, MPA Healthcare Solutions, Inc.
Customer retention is a primary concern for businesses that rely on a subscription revenue model. It is common for marketers of subscription-based offerings to develop predictive models that are aimed at identifying subscribers who have the highest risk of attrition. With these likely unsubscribes identified, marketers then attempt to forestall customer termination by using a variety of retention enhancement tactics, which might include free offers, customer training, satisfaction surveys, or other measures. Although customer retention is always a worthy pursuit, it is often expensive to retain subscribers. In many cases, associated retention programs simply prove unprofitable over time because the overall cost of such programs frequently exceeds the lifetime value of the cohort of unsubscribed customers. Generally, it is more profitable to focus resources on identifying and marketing to a targeted prospective customer. When the target marketing strategy focuses on identifying prospects who are most likely to subscribe over the long term, the need for special retention marketing efforts decreases sharply. This paper describes results of an analytically driven targeting approach that is aimed at inviting new customers to a milk and grocery home-delivery service, with the promise of attracting only those prospects who are expected to exhibit high long-term retention rates.
The DATA step and DS2 both offer the user a built-in general purpose hash object that has become the go-to tool for many data analysis problems. However, there are occasions where the best solution would require a custom object specifically tailored to the problem space. The DS2 Package syntax allows the user to create custom objects that can form linked structures in memory. With DS2 Packages it is possible to create lists or tree structures that are custom tailored to the problem space. For data that can describe a great many more states than actually exist, dynamic structures can provide an extremely compact way to manipulate and analyze the data. The SAS® In-Database Code Accelerator allows these custom packages to be deployed in parallel on massive data grids.
Robert Ray, SAS
During the course of a clinical trial study, large numbers of new and modified data records are received on an ongoing basis. Providing end users with an approach to continuously review and monitor study data, while enabling them to focus reviews on new or modified (incremental) data records, allows for greater efficiency in identifying potential data issues. In addition, supplying data reviewers with a familiar machine-readable output format (for example, Microsoft Excel) allows for greater flexibility in filtering, highlighting, and retention of data reviewers' comments. In this paper, we outline an approach using SAS® in a Windows server environment and a shared folder structure to automatically refresh data review listings. Upon each execution, the listings are compared against previously reviewed data to flag new and modified records, as well as carry forward any data reviewers' comments made during the previous review. In addition, we highlight the use and capabilities of the SAS® ExcelXP tagset, which enables greater control over data formatting, including management of Microsoft Excel's sometimes undesired automatic formatting. Overall, this approach provides a significantly improved end-user experience above and beyond the more traditional approach of performing cumulative or incremental data reviews using PDF listings.
Victor Lopez, Samumed, LLC
Heli Ghandehari, Samumed, LLC
Bill Knowlton, Samumed, LLC
Christopher Swearingen, Samumed, LLC
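The comparison step described in the abstract above (flag new and modified records, carry reviewer comments forward) can be sketched as follows. This is a Python illustration under assumed names: the `record_id` key, the `comment` field, and the status labels are hypothetical and are not taken from the paper.

```python
def flag_records(previous, current, key="record_id"):
    """Compare current rows with previously reviewed rows.

    Each returned row has a 'status' of NEW, MODIFIED, or UNCHANGED and
    carries forward any reviewer 'comment' stored with the previous row.
    """
    previous_by_key = {row[key]: row for row in previous}
    flagged = []
    for row in current:
        old = previous_by_key.get(row[key])
        if old is None:
            status, comment = "NEW", ""
        else:
            # Compare data fields only; ignore review bookkeeping fields.
            old_data = {k: v for k, v in old.items()
                        if k not in ("comment", "status")}
            status = "MODIFIED" if old_data != row else "UNCHANGED"
            comment = old.get("comment", "")
        flagged.append({**row, "status": status, "comment": comment})
    return flagged


# Hypothetical listings from two successive refreshes.
previous = [
    {"record_id": 1, "value": "120", "comment": "query sent"},
    {"record_id": 2, "value": "80", "comment": ""},
]
current = [
    {"record_id": 1, "value": "125"},
    {"record_id": 2, "value": "80"},
    {"record_id": 3, "value": "95"},
]

review = flag_records(previous, current)
```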
This presentation describes a technique for dealing with the precision problems inherent in datetime values containing nanosecond data. Floating-point values cannot store sufficient precision for this, and this limitation can be a problem for transactional data where nanoseconds are pertinent. Methods discussed include separation of variables and using the special GROUPFORMAT feature of the BY statement with MERGE in the DATA step.
Rick Langston, SAS
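The precision problem mentioned in the abstract above can be demonstrated in a few lines. The Python sketch below is not the paper's SAS technique; it only shows why a single double-precision value cannot hold a large seconds count plus nanoseconds, and how keeping the two parts in separate integer variables avoids the loss. The specific values are arbitrary.

```python
# A datetime stored as seconds in a double: about 1.8 billion seconds
# (arbitrary example value) plus a nanosecond fraction.
seconds = 1_800_000_000
nanos = 123_456_789

# Combined into one floating-point value, the fraction is rounded to the
# nearest representable double, which is far coarser than 1 nanosecond.
as_float = seconds + nanos / 1e9
recovered_nanos = round((as_float - seconds) * 1e9)

# Adding a single nanosecond does not change the stored value at all.
one_nanosecond_is_lost = (float(seconds) + 1e-9) == float(seconds)

# Separation of variables: keep whole seconds and nanoseconds as integers.
# Tuples compare element by element, so ordering is still exact.
earlier = (seconds, nanos)
later = (seconds, nanos + 1)
ordering_is_exact = earlier < later
```

At this magnitude, adjacent doubles are roughly 0.24 microseconds apart, so `recovered_nanos` differs from the original `nanos`.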
Phishing is the attempt of a malicious entity to acquire personal, financial, or otherwise sensitive information such as user names and passwords from recipients through the transmission of seemingly legitimate emails. By quickly alerting recipients of known phishing attacks, an organization can reduce the likelihood that a user will succumb to the request and unknowingly provide sensitive information to attackers. Methods to detect phishing attacks typically require the body of each email to be analyzed. However, most academic institutions do not have the resources to scan individual emails as they are received, nor do they wish to retain and analyze message body data. Many institutions simply rely on the education and participation of recipients within their network. Recipients are encouraged to alert information security (IS) personnel of potential attacks as they are delivered to their mailboxes. This paper explores a novel and more automated approach that uses SAS® to examine email header and transmission data to determine likely phishing attempts that can be further analyzed by IS personnel. Previously, a collection of 2,703 emails from an external filtering appliance was examined with moderate success. This paper focuses on the gains from analyzing an additional 50,000 emails, with the inclusion of an additional 30 known attacks. Real-time email traffic is exported from Splunk Enterprise into SAS for analysis. The resulting model aids in determining the effectiveness of alerting IS personnel to potential phishing attempts faster than a user simply forwarding a suspicious email to IS personnel.
Taylor Anderson, University of Alabama
Denise McManus, University of Alabama
This paper covers developing SAS® Studio repositories. SAS Studio introduced a new way of developing custom tasks using an XML markup specification and the Apache Velocity templating language. SAS Studio repositories build on this framework and provide a flexible way to package custom tasks and snippets. After tasks and snippets are packaged into a repository, they can be shared with users inside your organization or outside your organization. This paper uses several examples to help you create your first repository.
Swapnil Ghan, SAS
Michael Monaco, SAS
Amy Peters, SAS
Today's employment and business marketplace is highly competitive. As a result, it is necessary for SAS® professionals to differentiate themselves from the competition. Success depends on a number of factors, including positioning yourself with the necessary technical skills in relation to the competition. This presentation illustrates how SAS professionals can acquire a wealth of knowledge and enhance their skills by accessing valuable and free web content related to SAS. With the aid of a web browser and the Internet, anyone can access published PDF papers, Microsoft Word documents, Microsoft PowerPoint presentations, comprehensive student notes, instructor lesson plans, hands-on exercises, webinars, audios, videos, a comprehensive technical support website maintained by SAS, and more to acquire the essential expertise that is needed to cut through all the marketplace noise and begin differentiating yourself to secure desirable opportunities with employers and clients.
Kirk Paul Lafler, Software Intelligence Corporation
While cardiac revascularization procedures like cardiac catheterization, percutaneous transluminal angioplasty, and cardiac artery bypass surgery have become standard practices in restorative cardiology, the practice is not evenly prescribed or subscribed to. We analyzed Florida hospital discharge records for the period 1992 to 2010 to determine the odds of receipt of any of these procedures by Hispanics and non-Hispanic Whites. Covariates (potential confounders) were age, insurance type, gender, and year of discharge. Additional covariates considered included comorbidities such as hypertension, diabetes, obesity, and depression. The results indicated that even after adjusting for covariates, Hispanics in Florida during the time period 1992 to 2010 were consistently less likely to receive these procedures than their White counterparts. Reasons for this phenomenon are discussed.
C. Perry Brown, Florida A&M University
Jontae Sanders, Florida Department of Health
Discover how to document your SAS® programs, data sets, and catalogs with a few lines of code that include SAS functions, macro code, and SAS metadata. Do you start every project with the best of intentions to document all of your work, and then fall short of that aspiration when deadlines loom? Learn how your programs can automatically update your processing log. If you have ever wondered who ran a program that overwrote your data, SAS has the answer! And if you don't want to be tracing back through a year's worth of code to produce a codebook for your client at the end of a contract, SAS has the answer!
Roberta Glass, Abt Associates
Louise Hadden, Abt Associates
In any organization where people work with data, it is extremely unlikely that there will be only one way of doing things. Things like divisional silos and differences in education, training, and subject matter often result in a diversity of tools and levels of expertise. One situation that frequently arises is that of 'code versus click.' Let's call it the difference between code-based, 'power user' data tools and simpler, purely graphic point-and-click tools such as Microsoft Excel. Even though the work itself might be quite similar, differences in analysis tools often mean differences in vocabulary and experience, and it can be difficult to convert users of one tool to another. This discussion will highlight the potential challenges of SAS® adoption in an Excel-based workplace and propose strategies to gain new SAS advocates in your organization.
Andrew Clapson, MD Financial Management
Dynamic interactive visual displays known as dashboards are most effective when they show essential graphs, tables, statistics, and other information where data is the star. The first rule for creating an effective SAS® dashboard is to keep it simple. Striking a balance between content and style, a dashboard should be void of clutter so as not to distract from or obscure the information displayed. The second rule of effective dashboard design is to display data that meets one or more business or organizational objectives. To accomplish this, the elements in a dashboard should convey a format easily understood by its intended audience. Attendees learn how to create dynamic interactive user- and data-driven dashboards, graphical and table-driven dashboards, statistical dashboards, and drill-down dashboards with a purpose by using Base SAS® programming techniques including the DATA step, PROC FORMAT, PROC PRINT, PROC MEANS, PROC SQL, ODS, statistical graphics, and HTML.
Kirk Paul Lafler, Software Intelligence Corporation
Data-driven decision making is critical for any organization to thrive in this fiercely competitive world. The decision-making process has to be accurate and fast in order to stay a step ahead of the competition. One major problem organizations face is huge data load times in loading or processing the data. Reducing the data loading time can help organizations perform faster analysis and thereby respond quickly. In this paper, we compared the methods that can import data of a particular file type in the shortest possible time and thereby increase the efficiency of decision making. SAS® takes input from various file types (such as XLS, CSV, XLSX, ACCESS, and TXT) and converts that input into SAS data sets. To perform this task, SAS provides multiple solutions (such as the IMPORT procedure, the INFILE statement, and the LIBNAME engine) to import the data. We observed the processing times taken by each method for different file types with a data set containing 65,535 observations and 11 variables. We executed the procedure multiple times to check for variation in processing time. From these tests, we recorded the minimum processing time for each combination of procedure and file type. From our analysis of the processing times taken by each importing technique, we observed that the shortest processing times were achieved by the INFILE statement for CSV and TXT files, the LIBNAME engine for XLS and XLSX files, and PROC IMPORT for ACCESS files.
Divya Dadi, Oklahoma State University
Rahul Jhaver, Oklahoma State University
Now that you have deployed SAS®, what should you do to ensure it continues to meet your SAS users' performance expectations? This paper discusses how to proactively monitor your SAS infrastructure, with tools that should be used on a regular basis to keep tabs on infrastructure performance and housekeeping. Basic SAS administration concepts are discussed, with references for blogs, communities, and the new visual learning sites.
Margaret Crevar, SAS
The use of logistic models for independent binary data has relied first on asymptotic theory and later on exact distributions for small samples, as discussed by Troxler, Lalonde, and Wilson (2011). While the use of logistic models for dependent analysis based on exact analyses is not common, it is usually presented in the case of one-stage clustering. We present a SAS® macro that allows the testing of hypotheses using exact methods in the case of one-stage and two-stage clustering for small samples. The accuracy of the method and the results are compared to results obtained using an R program.
Kyle Irimata, Arizona State University
Jeffrey Wilson, Arizona State University
Do you create complex reports using PROC REPORT? Are you confused by the COMPUTE BLOCK feature of PROC REPORT? Are you even aware of it? Maybe you already produce reports using PROC REPORT, but suddenly your boss needs you to modify some of the values in one or more of the columns. Maybe your boss needs to see the values of some rows in boldface and others highlighted in a stylish yellow. Perhaps one of the columns in the report needs to display a variety of fashionable formats (some with varying decimal places and some without any decimals). Maybe the customer needs to see a footnote in specific cells of the report. Well, if this sounds familiar then come take a look at the COMPUTE BLOCK of PROC REPORT. This paper shows a few tips and tricks of using the COMPUTE DEFINE block with conditional IF/THEN logic to make your reports stylish and fashionable. The COMPUTE BLOCK allows you to use DATA step code within PROC REPORT to provide customization and style to your reports. We'll see how the Census Bureau produces a stylish demographic profile for customers of its Special Census program using PROC REPORT with the COMPUTE BLOCK. The paper focuses on how to use the COMPUTE BLOCK to create this stylish Special Census profile. The paper shows quick tips and simple code to handle multiple formats within the same column, make the values in the Total rows boldface, apply traffic lighting, and add footnotes to any cell based on the column or row. The Special Census profile report is an Excel table created with ODS tagsets.ExcelXP that is stylish and fashionable, thanks in part to the COMPUTE BLOCK.
Chris Boniface, Census Bureau
Road safety is a major concern for all citizens of the United States of America. According to the National Highway Traffic Safety Administration, 30,000 deaths are caused by automobile accidents annually. Oftentimes fatalities occur due to a number of factors such as driver carelessness, speed of operation, impairment due to alcohol or drugs, and road environment. Some studies suggest that car crashes are solely due to driver factors, while other studies suggest car crashes are due to a combination of roadway and driver factors. However, other factors not mentioned in previous studies may be contributing to automobile accident fatalities. The objective of this project was to identify the significant factors that lead to multiple fatalities in the event of a car crash.
Bill Bentley, Value-Train
Gina Colaianni, Kennesaw State University
Cheryl Joneckis, Kennesaw State University
Sherry Ni, Kennesaw State University
Kennedy Onzere, Kennesaw State University
SAS® for Windows is extremely powerful software, not only for analyzing data, but also for organizing and maintaining output and permanent data sets. By using pipes and operating system (x) commands within a SAS session, you can easily and effectively manage files of all types stored on your local network.
Emily Sisson, Boston University School of Public Health Data Coordinating Center
U.S. stock exchanges (currently there are 12) are tracked in real time via the Consolidated Tape System (CTS) and the Consolidated Quotation System (CQS). CQS contains every updated quote (buyer's bid price and seller's offer price) from each exchange, covering some 8,500 stock tickers. This is the basis by which brokers can honor their obligation to investors, mandated by the U.S. Securities and Exchange Commission, to execute transactions at the best price, that is, at the National Best Bid and Offer (NBBO). With the advent of electronic exchanges and high-frequency trading (timestamps are published to the microsecond), data set size has become a major operational consideration for market researchers re-creating NBBO values (over 1 billion quotes requiring 80 gigabytes of storage for a normal trading day). This presentation demonstrates a straightforward use of hash tables for tracking constantly changing quotes for each ticker/exchange combination, in tandem with an efficient means of determining changes in NBBO with every new quote.
Mark Keintz, Wharton Research Data Services
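The hash-table approach in the abstract above can be illustrated with Python dictionaries standing in for SAS hash objects. This is a simplified sketch, not the presenter's method: the quote layout `(ticker, exchange, bid, offer)`, the function name, and the sample quotes are assumptions, and real quote data carries sizes, timestamps, and condition codes that are ignored here.

```python
def track_nbbo(quotes):
    """Return the list of NBBO changes produced by a stream of quotes.

    quotes: iterable of (ticker, exchange, bid, offer) tuples.
    Each change is reported as (ticker, best_bid, best_offer).
    """
    book = {}   # ticker -> {exchange: (bid, offer)}, latest quote only
    nbbo = {}   # ticker -> (best_bid, best_offer)
    changes = []
    for ticker, exchange, bid, offer in quotes:
        # A new quote replaces the previous quote from the same exchange.
        book.setdefault(ticker, {})[exchange] = (bid, offer)
        live = book[ticker].values()
        best = (max(b for b, _ in live), min(o for _, o in live))
        if nbbo.get(ticker) != best:
            nbbo[ticker] = best
            changes.append((ticker, best[0], best[1]))
    return changes


# Hypothetical quotes for one ticker on two exchanges.
quotes = [
    ("ABC", "N", 10.00, 10.05),
    ("ABC", "Q", 10.01, 10.06),
    ("ABC", "N", 9.99, 10.04),
    ("ABC", "Q", 10.01, 10.06),
]

nbbo_changes = track_nbbo(quotes)
```

Recomputing the best bid and offer over one ticker's exchanges is cheap because the number of exchanges is small; only the per-ticker table, not the full quote history, is held in memory.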
In this era of data analytics, you are often faced with a challenge of joining data from multiple legacy systems. When the data systems share a consistent merge key, such as ID or SSN, the solution is straightforward. However, what do you do when there is no common merge key? If one data system has a character value ID field, another has an alphanumeric field, and the only common fields are the names or addresses or dates of birth, a standard merge query does not work. This paper demonstrates fuzzy matching methods that can overcome this obstacle and build your master record through Base SAS® coding. The paper also describes how to leverage the SAS® Data Quality Server in SAS® code.
Elena Shtern, SAS
Kim Hare, SAS
Collection of customer data is often done in status tables or snapshots, where, for example, for each month, the values for a handful of variables are recorded in a new status table whose name is marked with the value of the month. In this QuickTip, we present how to construct a table of last occurrence times for customers using a DATA step merge of such status tables and the colon (':') wildcard. If the status tables are sorted, this can be accomplished in four lines of code (where RUN; is the fourth). Also, we look at how to construct delta tables (for example, from one time period to another, or which customers have arrived or left) using a similar method of merge and colons.
Jingyu She, Danica Pension
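The QuickTip above relies on a DATA step merge, which is not reproduced here. As a language-neutral illustration of the two results it describes, this Python sketch builds a last-occurrence table and a delta between two periods from monthly snapshots. The month labels and customer IDs are hypothetical.

```python
def last_occurrence(tables):
    """Map each customer to the latest month in which they appear.

    tables: dict of month label -> set of customer IDs. Month labels are
    assumed to sort chronologically (for example, 'YYYYMM' strings).
    """
    last_seen = {}
    for month in sorted(tables):
        for customer in tables[month]:
            last_seen[customer] = month   # later months overwrite earlier
    return last_seen


def delta(tables, earlier, later):
    """Return (arrived, left) customer sets between two months."""
    arrived = tables[later] - tables[earlier]
    left = tables[earlier] - tables[later]
    return arrived, left


# Hypothetical monthly status tables.
tables = {
    "201601": {"A", "B"},
    "201602": {"B", "C"},
    "201603": {"C"},
}

last_seen = last_occurrence(tables)
arrived, left = delta(tables, "201601", "201602")
```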
When attempting to match names and addresses from different files, we often run into a situation where the names are similar, but not exactly the same. Sometimes there are additional words in the names, sometimes there are different spellings, and sometimes the businesses have the same name but are located thousands of miles apart. The files that contain the names might have numeric keys that cannot be matched. Therefore, we need to use a process called fuzzy matching to match the names from different files. The SAS® function COMPGED, combined with SAS character-handling functions, provides a straightforward method of applying business rules and testing for similarity.
Stephen Sloan, Accenture
Daniel Hoicowitz, Accenture
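As a rough analogue of the matching process in the abstract above, the Python sketch below normalizes business names and scores them with plain Levenshtein edit distance. Levenshtein distance is a simpler stand-in for COMPGED, which computes a generalized edit distance with its own cost scheme. The list of ignored words and the distance threshold are hypothetical business rules, not the authors'.

```python
import re

# Hypothetical business rule: words that carry no matching information.
IGNORED_WORDS = {"INC", "CORP", "LLC", "CO", "THE"}


def normalize(name):
    """Uppercase, strip punctuation, and drop ignored words."""
    words = re.sub(r"[^A-Za-z0-9 ]", " ", name).upper().split()
    return " ".join(w for w in words if w not in IGNORED_WORDS)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_match(name_a, name_b, max_distance=2):
    """Treat names as a match when their normalized forms are close."""
    return levenshtein(normalize(name_a), normalize(name_b)) <= max_distance


same_business = is_match("Acme Hardware, Inc.", "ACME Hardwear")
```

Names alone cannot separate businesses that share a name but are thousands of miles apart; a rule like this would need to be combined with an address or location check.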
There are many methods to randomize participants in randomized control trials. If it is important to have approximately balanced groups throughout the course of the trial, simple randomization is not a suitable method. Perhaps the most common alternative method that provides balance is the blocked randomization method. A less well-known method called the treatment adaptive randomized design also achieves balance. This paper shows you how to generate an entire randomization sequence to randomize participants in a two-group clinical trial using the adaptive biased coin randomization design (ABCD), prior to recruiting any patients. Such a sequence could be used in a central randomization server. A unique feature of this method allows the user to determine the extent to which imbalance is permitted to occur throughout the sequence while retaining the probabilistic nature that is essential to randomization. Properties of sequences generated by the ABCD approach are compared to those generated by simple randomization, a variant of simple randomization that ensures balance at the end of the sequence, and by blocked randomization.
Gary Foster, St Joseph's Healthcare
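The abstract above does not give the allocation rule, so the following Python sketch uses one possible adaptive biased coin rule purely for illustration. With D equal to the number of participants in group A minus the number in group B, it assumes the next participant goes to A with probability 1/2 when D = 0, 1/(D^a + 1) when D > 0, and |D|^a/(|D|^a + 1) when D < 0. The parameter `a`, the function names, and the seed are hypothetical; larger values of `a` push harder toward balance.

```python
import random


def allocation_probability(imbalance, a=2):
    """Probability of assigning the next participant to group A.

    imbalance = (number in A) - (number in B). Assumed rule, see lead-in.
    """
    if imbalance == 0:
        return 0.5
    weight = abs(imbalance) ** a
    if imbalance > 0:
        return 1 / (weight + 1)       # A is ahead: favor B
    return weight / (weight + 1)      # B is ahead: favor A


def generate_sequence(n, a=2, seed=None):
    """Generate a full two-group allocation sequence before recruitment."""
    rng = random.Random(seed)
    imbalance = 0
    sequence = []
    for _ in range(n):
        if rng.random() < allocation_probability(imbalance, a):
            sequence.append("A")
            imbalance += 1
        else:
            sequence.append("B")
            imbalance -= 1
    return sequence


sequence = generate_sequence(20, a=2, seed=1)
```

Every assignment remains random, but the further the groups drift apart, the less likely the drift is to continue.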
SAS® Output Delivery System (ODS) Graphics started appearing in SAS® 9.2. When first starting to use these tools, the traditional SAS/GRAPH® software user might come upon some very significant challenges in learning the new way to do things. This is further complicated by the lack of simple demonstrations of capabilities. Most graphs in training materials and publications are rather complicated graphs that, while useful, are not good teaching examples. This paper contains many examples of very simple ways to get very simple things accomplished. Over 20 different graphs are developed using only a few lines of code each, using data from the SASHELP data sets. The use of the SGPLOT, SGPANEL, and SGSCATTER procedures is shown. In addition, the paper addresses those situations in which the user must alternatively use a combination of the TEMPLATE and SGRENDER procedures to accomplish the task at hand. Most importantly, the use of ODS Graphics Designer as a teaching tool and a generator of sample graphs and code is covered. The emphasis in this paper is the simplicity of the learning process. Users will be able to take the included code and run it immediately on their personal machines to achieve an instant sense of gratification.
Roger Muller, Data-To-Events, Inc.
In today's world of torrential data, plotting large amounts of data has its own challenges. The SGPLOT procedure in ODS Graphics offers a simple yet powerful tool for your graphing needs. In this paper we present some graphs that work better for large data sets. We create heat maps, box plots, and parallel coordinate plots that visualize large data. You can make your graphs resilient to your growing data with ODS Graphics!
Prashant Hebbar, SAS Institute Inc
Lingxiao Li, SAS
Sanjay Matange, SAS
There are many ways to customize data presentation and graphics for an area as specialized as oncology. This paper touches on various approaches to presenting oncological data and on ways that each of the graphics can be customized and individualized based on the needs of a study. The topics range from using SGPLOT and SGPANEL, which are a simple but visually effective way to present stratified information, to using a complex integration of a series of HIGHLOW plots to depict duration of patient responses on a clinical trial. We also take a detailed look at how to generate specialized and customized waterfall plots used to visualize tumor growth patterns, as well as linear graphics that display a different approach to viewing the same endpoint. In addition to specifying the syntax to graph the data, instruction will be given on how to construct data sets in order to achieve the desired graphics.
Nora Ruel, City of Hope
Project management is a hot topic across many industries, and there are multiple commercial software applications for managing projects available. The reality, however, is that the majority of project management software is not applicable for daily usage. SAS® has a solution for this issue that can be used for managing projects graphically in real time. This paper introduces a new paradigm for project management using the SAS® Graph Template Language (GTL). SAS clients, in real time, can use GTL to visualize resource assignments, task plans, delivery tracking, and project status across multiple project levels for more efficient project management.
Zhouming (Victor) Sun, Medimmune
SAS® programs can be very I/O intensive. SAS data sets with inappropriate variable attributes can degrade the performance of SAS programs. Using SAS compression offers some relief but does not eliminate the issues caused by inappropriately defined SAS variables. This paper examines the problems that inappropriate SAS variable attributes can cause and introduces a macro to tackle the problem of minimizing the footprint of a SAS data set.
Brian Varney, Experis BI & Analytics Practice
Would you like to be more confident in producing graphs and figures? Do you understand the differences between the OVERLAY, GRIDDED, LATTICE, DATAPANEL, and DATALATTICE layouts? Would you like to know how to easily create life sciences industry standard graphs such as adverse event timelines, Kaplan-Meier plots, and waterfall plots? Finally, would you like to learn all these methods in a relaxed environment that fosters questions? Great--this topic is for you! In this hands-on workshop, you will be guided through the Graph Template Language (GTL). You will also complete fun and challenging SAS graphics exercises to enable you to more easily retain what you have learned. This session is structured so that you will learn how to create the standard plots that your manager requests, how to easily create simple ad hoc plots for your customers, and also how to create complex graphics. You will be shown different methods to annotate your plots, including how to add Unicode characters to your plots. You will find out how to create reusable templates, which can be used by your team. Years of information have been carefully condensed into this 90-minute hands-on, highly interactive session. Feel free to bring some of your challenging graphical questions along!
Kriss Harris, SAS Specialists Ltd
Annotating a blank Case Report Form (blankcrf.pdf, which is required for a New Drug Application submission to the US Food and Drug Administration) by hand is an arduous process that can take many hours of precious time. Once again SAS® comes to the rescue! You can have SAS use the Study Data Tabulation Model (SDTM) specifications data to create all the necessary annotations and place them on the appropriate pages of the blankcrf.pdf. In addition, you can dynamically color and adjust the font size of these annotations. This approach uses SAS, Adobe Acrobat's forms definition format (FDF) language, and Adobe Reader to complete the process. In this paper I go through each of the steps needed and explain in detail exactly how to accomplish each task.
Steven Black, Agility Clinical
SAS® provides some powerful, flexible tools for creating reports, like the REPORT and TABULATE procedures. With the advent of the Output Delivery System (ODS), you have almost total control over how the output from those procedures looks. But there are still times where you need (or want) just a little more and that's where the Report Writing Interface (RWI) can help. The Report Writing Interface is just a fancy way of saying you're using the ODSOUT object in a DATA step. This object allows you to lay out the page, create tables, embed images, add titles and footnotes, and more--all from within a DATA step, using whatever DATA step logic you need. Also, all the style capabilities of ODS are available to you so that the output created by your DATA step can have fonts, sizes, colors, backgrounds, and borders to make your report look just like you want. This presentation quickly covers some of the basics of using the ODSOUT object and then walks through some of the techniques to create four real-world examples. Who knows, you might even go home and replace some of your PROC REPORT code--I know I have!
Pete Lund, Looking Glass Analytics
Since Atul Gawande popularized the term in describing the work of Dr. Jeffrey Brenner in a New Yorker article, hot-spotting has been used in health care to describe the process of identifying super-utilizers of health care services, then defining intervention programs to coordinate and improve their care. According to Brenner's data from Camden, New Jersey, 1% of patients generate 30% of payments to hospitals, while 5% of patients generate 50% of payments. Analyzing administrative health care claims data, which contains information about diagnoses, treatments, costs, charges, and patient sociodemographic data, can be a useful way to identify super-users, as well as those who may be receiving inappropriate care. Both groups can be targeted for care management interventions. In this paper, techniques for patient outlier identification and prioritization are discussed using examples from private commercial and public health insurance claims data. The paper also describes techniques used with health care claims data to identify high-risk, high-cost patients and to generate analyses that can be used to prioritize patients for various interventions to improve their health.
Paul LaBrec, 3M Health Information Systems
A group tasked with testing SAS® software from the customer perspective has gathered a number of helpful hints for SAS® 9.4 that will smooth the transition to its new features and products. These hints will help with the 'huh?' moments that crop up when you are getting oriented and will provide short, straightforward answers. We also share insights about changes in your order contents. Gleaned from extensive multi-tier deployments, SAS® Customer Experience Testing shares insiders' practical tips to ensure that you are ready to begin your transition to SAS 9.4 and guidance for after you are at SAS 9.4. The target audience for this paper is primarily system administrators who will be installing, configuring, or administering the SAS 9.4 environment. This paper was first published in 2012; it has been revised each year since with new information.
Lisa Cook, SAS
Edith Jeffries, SAS
Cindy Taylor, SAS
SAS® Federated Query Language (FedSQL) is a SAS proprietary implementation of the ANSI SQL:1999 core standard capable of providing a scalable, threaded, high-performance way to access relational data in multiple data sources. This paper explores the FedSQL language in a three-step approach. First, we introduce the key features of the language and discuss the value each feature provides. Next, we present the FEDSQL procedure (the Base SAS® implementation of the language) and compare it to PROC SQL in terms of both syntax and performance. Finally, we examine how FedSQL queries can be embedded in DS2 programs to merge the power of these two high-performance languages.
Shaun Kaufmann, Farm Credit Canada
You can use annotation, modify templates, and change dynamic variables to customize graphs in SAS®. Standard graph customization methods include template modification (which most people use to modify graphs that analytical procedures produce) and SG annotation (which most people use to modify graphs that procedures such as PROC SGPLOT produce). However, you can also use SG annotation to modify graphs that analytical procedures produce. You begin by using an analytical procedure, ODS Graphics, and the ODS OUTPUT statement to capture the data that go into the graph. You use the ODS document to capture the values that the procedure sets for the dynamic variables, which control many of the details of how the graph is created. You can modify the values of the dynamic variables, and you can modify graph and style templates. Then you can use PROC SGRENDER along with the ODS output data set, the captured or modified dynamic variables, the modified templates, and SG annotation to create highly customized graphs. This paper shows you how and provides examples.
Warren Kuhfeld, SAS
As SAS® programmers we often want our code or program logic to be driven by the data at hand, rather than be directed by us. Such dynamic code enables the data to drive the logic or sequence of execution. This type of programming can be very useful when creating lists of observations, variables, or data sets from ever-changing data. Whether these lists are used to verify the data at hand or are to be used in later steps of the program, dynamic code can write our lists once and ensure that the values change in tandem with our data. This Quick Tip paper will present the concepts of creating data-driven lists for observations, variables, and data sets, the code needed to execute these tasks, and examples to better explain the process and results of the programs we will create.
Kate Burnett-Isaacs, Statistics Canada
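The idea of a data-driven list in the abstract above can be illustrated with a short sketch. This is one common approach, not necessarily the one the paper uses. It assumes the SASHELP.CLASS sample data set is available; the macro variable names numvars and nvars are hypothetical.

```sas
/* Build a space-separated list of the numeric variables in
   SASHELP.CLASS from the DICTIONARY.COLUMNS metadata view,
   so that the list changes whenever the data set changes. */
proc sql noprint;
  select name
    into :numvars separated by ' '
    from dictionary.columns
    where libname = 'SASHELP'
      and memname = 'CLASS'
      and type = 'num';
quit;

/* SQLOBS holds the number of rows the last query selected. */
%let nvars = &sqlobs;
%put NOTE: &nvars numeric variables found: &numvars;

/* Use the generated list in a later step. */
proc means data=sashelp.class n mean;
  var &numvars;
run;
```

Because the list is rebuilt from metadata on every run, adding or dropping a numeric variable requires no change to the PROC MEANS step.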
Today's SAS® environment has large numbers of concurrent SAS processes that have to process ever-growing data volumes. To help SAS users remain productive, SAS administrators must ensure that SAS applications have sufficient computer resources that are properly configured and monitored often. Understanding how all of the components of SAS work and how they are used by your users is the first step. The guidance offered in this paper helps SAS administrators evaluate hardware, operating system, and infrastructure options for a SAS environment that will keep their SAS applications running at optimal performance and keep their user community happy.
Margaret Crevar, SAS
Managing large data sets comes with the task of providing a certain level of quality assurance, no matter what the data is used for. We present here the fundamental SAS® procedures to perform when determining the completeness of a data set. Even though each data set is unique and has its own variables that need more examination in detail, it is important to first examine the size, time, range, interactions, and purity (STRIP) of a data set to determine its readiness for any use. This paper covers first steps you should always take, regardless of whether you're dealing with health, financial, demographic, or environmental data.
Michael Santema, Kaiser Permanente
Fagen Xie, Kaiser Permanente
This paper provides tips and techniques to speed up the validation process without and with automation. For validation without automation, it introduces both standard use and clever use of options and statements to be implemented in the COMPARE procedure that can speed up the validation process. For validation with automation, a macro named %QCDATA is introduced for individual data set validation, and a macro named %QCDIR is introduced for comparison of data sets in two different directories. Also introduced in this section is a definition of &SYSINFO and an explanation of how it can be of use to interpret the result of the comparison.
Alice Cheng, Portola Pharmaceuticals
Justina Flavin, Independent Consultant
Michael Wise, Experis BI & Analytics Practice
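As a rough illustration of how &SYSINFO can be read after PROC COMPARE, consider the sketch below. It does not reproduce the %QCDATA or %QCDIR macros from the paper above. The data set names work.prod and work.qc and the macro name report_compare are hypothetical, and SASHELP.CLASS is assumed to be available.

```sas
/* Two small data sets to compare; the QC copy differs in one value. */
data work.prod;
  set sashelp.class;
run;

data work.qc;
  set sashelp.class;
  if name = 'Alfred' then height = height + 1;
run;

proc compare base=work.prod compare=work.qc noprint;
run;

/* Capture &SYSINFO immediately, because later steps reset it. */
%let rc = &sysinfo;

%macro report_compare(rc);
  %if &rc = 0 %then %put NOTE: Data sets match exactly.;
  %else %do;
    %put WARNING: PROC COMPARE return code is &rc..;
    /* The 4096 bit flags at least one unequal value. */
    %if %sysfunc(band(&rc, 4096)) %then
      %put WARNING: Unequal values were found.;
  %end;
%mend report_compare;

%report_compare(&rc)
```

The return code is a sum of bit flags, so testing individual bits with BAND distinguishes, for example, value differences from attribute differences.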
Perl regular expressions are a good tool for working with text, which in SAS® means character variables. Perl was widely used in website programming in the early days of the World Wide Web. SAS provides a set of Perl regular expression functions for use in DATA step statements. They are useful for finding text patterns, or simply strings of text, and returning either the strings or their positions in a character variable. I often use them to locate a string of text in a character variable; knowing this location, I know where to start a SUBSTR function. The poster covers a number of the Perl regular expression functions available in SAS® 9.2 and later. Each function is explained in writing, and then a brief code demonstration shows how I use the function. The explanations draw on both SAS documentation and my own best practices.
Peter Timusk, Independent
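The locate-then-SUBSTR pattern mentioned in the abstract above might look like the following sketch. The sample text and the pattern are invented for illustration and do not come from the poster.

```sas
data _null_;
  length text $40 code $7;
  text = 'Order ref: AB-1234 shipped';

  /* PRXMATCH returns the start position of the first match,
     or 0 when the pattern is not found. */
  pos = prxmatch('/[A-Z]{2}-\d{4}/', text);

  /* The pattern always matches exactly 7 characters,
     so a fixed length is safe for SUBSTR here. */
  if pos > 0 then code = substr(text, pos, 7);

  put pos= code=;
run;
```

When the match length is not fixed, CALL PRXSUBSTR returns both position and length and is a better fit than a hard-coded length.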
Have you ever wondered how to get the most from Web 2.0 technologies in order to visualize SAS® data? How to make those graphs dynamic, so that users can explore the data in a controlled way, without needing prior knowledge of SAS products or data science? Wonder no more! In this session, you learn how to turn basic sashelp.stocks data into a snazzy HighCharts stock chart in which a user can review any time period, zoom in and out, and export the graph as an image. All of these features with only two DATA steps and one SORT procedure, for 57 lines of SAS code.
Vasilij Nevlev, Analytium Ltd
Although hashing methods such as SHA256 and MD5 are already available in SAS®, other methods may be needed by SAS users. This presentation shows how hashing methods can be implemented using SAS DATA steps and macros, with judicious use of the bitwise functions (BAND, BOR, and so on) and by referencing public domain sources.
Rick Langston, SAS
One of the tedious but necessary things that SAS® programmers must do is to trace and monitor SAS runs: counting observations and columns, checking performance, drawing flowcharts, and diagramming data flows. This is required in order to audit results, find errors, find long-running steps, identify large data files, and to simply understand what a particular job does. SAS® Enterprise Guide® includes functionality to help with some of this, on some SAS jobs. This paper presents an innovative custom tool that automatically produces flowcharts--showing all SAS steps in a job, file counts in and out, variable counts, recommendations and generated code for performance improvements, and more. The system was written completely in SAS using SAS source programs, SAS logs, PROC SCAPROC output, and more as input. This tool can eliminate much of the manual work needed in job optimization, space issues, debugging, change management, documentation, and job check-out. As has been said many times before: We have computers, let's use them.
Steven First, Systems Seminar Consultants, Inc.
Jennifer First-Kluge, Systems Seminar Consultants, Inc.
An interactive SAS® environment is preferred for developing programs as it gives the flexibility of instantly viewing the log in the log window. The programmer must review the log window to ensure that each and every single line of a written program is running successfully without displaying any messages defined by SAS that are potential errors. Reviewing the log window every time is not only time consuming but also prone to manual error for any level of programmer. Just to confirm that the log is free from error, the programmer must check the log. Currently in the interactive SAS environment there is no way to get an instant notification about the generated log from the Log window, indicating whether there have been any messages defined by SAS that are potential errors. This paper introduces an instant approach to analyzing the Log window using the SAS macro %ICHECK that displays the reports instantly in the same SAS environment. The report produces a summary of all the messages defined by SAS in the Log window. The programmer does not need to add %ICHECK at the end of the program. Whether a single DATA step, a single PROC step, or the whole program is submitted, the %ICHECK macro is automatically executed at the end of every submission. It might be surprising to you to learn how a compiled macro can be executed without calling it in the Editor window. But it is possible with %ICHECK, and you can develop it using only SAS products. It can be used in a Windows or UNIX interactive SAS environment without requiring any user inputs. With the proposed approach, there is a significant benefit in the log review process and a 100% gain in time saved for all levels of programmers because the log is free from error. Similar functionality can be introduced in the SAS product itself.
Amarnath Vijayarangan, Emmes Services Pvt Ltd, India
Palanisamy Mohan, ICON Clinical Research India Pvt Ltd
Microsoft Visual Basic Scripting Edition (VBScript) and SAS® software are each powerful tools in their own right. These two technologies can be combined so that SAS code can call a VBScript program or vice versa. This gives a programmer the ability to automate SAS tasks; traverse the file system; send emails programmatically via Microsoft Outlook or SMTP; manipulate Microsoft Word, Microsoft Excel, and Microsoft PowerPoint files; get web data; and more. This paper presents example code to demonstrate each of these capabilities.
Christopher Johnson, BrickStreet Insurance
This paper builds on the knowledge gained in the paper 'Introduction to ODS Graphics.' The capabilities in ODS Graphics grow with every release as both new paradigms and smaller tweaks are introduced. After talking with the ODS developers, I have chosen a selection of the many wonderful capabilities to highlight here. This paper provides the reader with more tools for his or her tool belt. Visualization of data is an important part of telling the story seen in the data. And while the standards and defaults in ODS Graphics are very well done, sometimes the user has specific nuances for characters in the story or additional plot lines they want to incorporate. Almost any possibility, from drama to comedy to mystery, is available in ODS Graphics if you know how. We explore tables, annotation and changing attributes, as well as the block plot. Any user of Base SAS® on any platform will find great value from the SAS® ODS Graphics procedures. Some experience with these procedures is assumed, but not required.
Chuck Kincaid, Experis BI & Analytics Practice
This presentation teaches the audience how to use ODS Graphics. Now part of Base SAS®, ODS Graphics are a great way to easily create clear graphics that enable users to tell their stories well. SGPLOT and SGPANEL are two of the procedures that can be used to produce powerful graphics that used to require a lot of work. The core of the procedures is explained, as well as some of the many options available. Furthermore, we explore the ways to combine the individual statements to make more complex graphics that tell the story better. Any user of Base SAS on any platform will find great value in the SAS® ODS Graphics procedures.
Chuck Kincaid, Experis BI & Analytics Practice
The SAS® Output Delivery System (ODS) ExcelXP tagset offers users the ability to export a SAS data set, with all of the accompanying functions and calculations, to a Microsoft Excel spreadsheet. For industries, this is particularly useful because although not everyone is a SAS programmer, they would like to have access to and manipulate data from SAS. The ExcelXP tagset is one of several built-in templates for exporting data in SAS. The tagset gives programmers the ability to export functions and calculations into the cells of an Excel spreadsheet. Several options within the tagset enable the programmer to customize the Excel file. Some of these options enable the programmer to name the worksheet, style each table, embed titles and footnotes, and export multiple data tables to the same Excel worksheet.
Veronica Renauldo, Grand Valley State University
While there is no shortage of MS and MA programs in data science, PhD programs have been slow to develop. Is there a role for a PhD in data science? How would this differ from an advanced degree in statistics? Computer science? This session explores the issues related to the curriculum content for a PhD in data science and what those students would be expected to do after graduation. Come join the conversation.
Jennifer Priestley, Kennesaw State University
High-quality effective graphs not only enhance understanding of the data but also facilitate regulators in the review and approval process. In recent SAS® releases, SAS has made significant progress toward more efficient graphing in ODS Statistical Graphics (SG) procedures and Graph Template Language (GTL). A variety of graphs can be quickly produced using convenient built-in options in SG procedures. With graphical examples and comparison between SG procedures and traditional SAS/GRAPH® procedures in reporting clinical trial data, this paper highlights several key features in ODS Graphics to efficiently produce sophisticated statistical graphs with more flexible and dynamic control of graphical presentation including: 1) Better control of axes in different scales and intervals; 2) Flexible ways to control graph appearance; 3) Plots overlay in single-cell or multi-cell graphs; 4) Enhanced annotation; 5) Classification panel of multiple plots with individualized labeling.
Yuxin (Ellen) Jiang, Biogen
The current study looks at several ways to investigate latent variables in longitudinal surveys and their use in regression models. Three different analyses for latent variable discovery are briefly reviewed and explored. The latent analysis procedures explored in this paper are PROC LCA, PROC LTA, PROC TRAJ, and PROC CALIS. The latent variables are then included in separate regression models. The effect of the latent variables on the fit and use of the regression model compared to a similar model using observed data is briefly reviewed. The data used for this study were obtained from the National Longitudinal Study of Adolescent Health, a study distributed and collected by Add Health. Data were analyzed using SAS® 9.4. This paper is intended for any level of SAS® user. It is also written for an audience with a background in behavioral science, statistics, or both.
Deanna Schreiber-Gregory, National University
From stock price histories to hospital stay records, analysis of time series data often requires use of lagged (and occasionally lead) values of one or more analysis variables. For the SAS® user, the central operational task is typically getting lagged (lead) values for each time point in the data set. While SAS has long provided a LAG function, it has no analogous lead function--an especially significant problem in the case of large data series. This paper reviews the LAG function, in particular the powerful, but non-intuitive implications of its queue-oriented basis. The paper demonstrates efficient ways to generate leads with the same flexibility as the LAG function, but without the common and expensive recourse to data re-sorting. It also shows how to dynamically generate leads and lags through use of the hash object.
Mark Keintz, Wharton Research Data Services
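One well-known way to obtain a lead without re-sorting is to read the same data set a second time, offset by one observation. The sketch below illustrates that idea next to the LAG function. It is not presented as the paper's own code; the data set and variable names (have, want, x, lag_x, lead_x, last_lead) are hypothetical, and the input is assumed to have at least two observations.

```sas
data have;
  input t x;
  datalines;
1 10
2 20
3 30
;

data want;
  set have;

  /* Call LAG unconditionally: each call pushes the current value
     onto a queue, so skipping calls would return stale values. */
  lag_x = lag(x);

  /* Read the same data set one row ahead to get the lead.
     After the look-ahead stream is exhausted, set the lead to
     missing instead of retaining the last value read. */
  if not last_lead then
    set have(firstobs=2 keep=x rename=(x=lead_x)) end=last_lead;
  else call missing(lead_x);
run;
```

Both streams are read sequentially in the original order, so no sort or second pass over the output is needed.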
In this paper, a SAS® macro is introduced that can help users find and access their folders and files very easily. By providing a path to the macro and letting the macro know which folders and files you are looking for under this path, the macro creates an HTML report that lists the matched folders and files. The best part of this HTML report is that it also creates a hyperlink for each folder and file so that when a user clicks the hyperlink, it directly opens the folder or file. Users can also ask the macro to find certain folders or files by providing part of the folder or file name as the search criterion. The results shown in the report can be sorted in different ways so that it can further help users quickly find and access their folders and files.
Ting Sa, Cincinnati Children's Hospital Medical Center
Are you still using TRIM, LEFT, and vertical bar operators to concatenate strings? It's time to modernize and streamline that clumsy code by using the string concatenation functions introduced in SAS®9. This paper is an overview of the CAT, CATS, CATT, and CATX functions introduced in SAS®9, and the new CATQ function added in SAS® 9.2. In addition to making your code more compact and readable, this family of functions offers some new tricks for accomplishing previously cumbersome tasks.
Josh Horstman, Nested Loop Consulting
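The contrast drawn in the abstract above can be seen in a short sketch. The variable names and values are invented for illustration.

```sas
data _null_;
  length first last $12 old_way new_way $30;
  first = '  Ada';
  last  = 'Lovelace';

  /* Older style: strip blanks by hand, then concatenate. */
  old_way = trim(left(first)) || ' ' || trim(left(last));

  /* CATX strips leading and trailing blanks from each argument
     and inserts the delimiter between nonblank arguments. */
  new_way = catx(' ', first, last);

  put old_way= new_way=;
run;
```

Both variables should hold the same value, but the CATX call states the intent in a single function and also skips arguments that are entirely blank.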
Propensity scores produced by logit models have been used intensively to assist direct marketing name selections. As a result, only customers with an absolutely higher likelihood to respond are mailed offers in order to reduce cost. Thus, event ROI is increased. There is a fly in the ointment, however. Compared to the model-building performance time window, usually 6 months to 12 months, a marketing event time period is usually much shorter. As such, this approach lacks the ability to deselect those who have a high propensity score but are unlikely to respond to an upcoming campaign. Dynamically building a complete propensity model for every upcoming campaign is nearly impossible. But incorporating time to respond has been of great interest to marketers as another dimension for enhancing response prediction. Hence, this paper presents an inventive modeling technique combining logistic regression and the Cox Proportional Hazards Model. The objective of the fusion approach is to allow a customer's shorter time to next repurchase to compensate for his or her insignificantly lower propensity score in winning selection opportunities. The method is accomplished using PROC LOGISTIC, PROC LIFETEST, PROC LIFEREG, and PROC PHREG on the fusion model, which is built in a SAS® environment. This paper also touches on how to use the results to predict repurchase response by demonstrating a case of repurchase time-shift prediction on the 12-month inactive customers of a big box store retailer. The paper also shares a results comparison between the fusion approach and logit alone. Comprehensive SAS macros are provided in the appendix.
Hsin-Yi Wang, Alliance Data Systems
When I help users design or debug their SAS® programs, they are sometimes unable to provide relevant SAS data sets because they contain confidential information. Sometimes, confidential data values are intrinsic to their problem, but often the problem could still be identified or resolved with innocuous data values that preserve some of the structure of the confidential data. Or the confidential values are in variables that are unrelated to the problem. While techniques for masking or disguising data exist, they are often complex or proprietary. In this paper, I describe a very simple macro, REVALUE, that can change the values in a SAS data set. REVALUE preserves some of the structure of the original data by ensuring that for a given variable, observations with the same real value have the same replacement value, and if possible, observations with a different real value have a different replacement value. REVALUE enables the user to specify which variables to change and whether to order the replacement values for each variable by the sort order of the real values or by observation order. I discuss the REVALUE macro in detail and provide a copy of the macro.
Bruce Gilsen, Federal Reserve Board
The SQL procedure is extremely powerful when it comes to summarizing and aggregating data, but it can be a little daunting for programmers who are new to SAS® or for more experienced programmers who are more familiar with using the SUMMARY or MEANS procedure for aggregating data. This hands-on workshop demonstrates how to use PROC SQL for a variety of summarization and aggregation tasks. These tasks include summarizing multiple measures for groupings of interest, combining summary and detail information (via several techniques), nesting summary functions by using inline views, and generating summary statistics for derived or calculated variables. Attendees will learn how to use a variety of PROC SQL summary functions, how to effectively use WHERE and HAVING clauses in constructing queries, and how to exploit the PROC SQL remerge. The examples become progressively more complex over the course of the workshop as you gain mastery of using PROC SQL for summarizing data.
Christianna Williams, Self-Employed
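Two of the techniques named in the workshop abstract above, filtering groups with HAVING and combining summary with detail rows through the remerge, can be sketched as follows. The queries are illustrative only and assume the SASHELP.CLASS sample data set is available.

```sas
proc sql;
  /* Group-level summary: keep only groups with at least 5 rows. */
  select sex,
         count(*)     as n,
         mean(height) as avg_height format=6.2
    from sashelp.class
    group by sex
    having count(*) >= 5;

  /* Remerge: selecting detail columns alongside a summary function
     makes PROC SQL merge the group statistic back onto every row. */
  select name,
         sex,
         height,
         mean(height)          as avg_height format=6.2,
         height - mean(height) as diff       format=6.2
    from sashelp.class
    group by sex
    order by sex, name;
quit;
```

The second query writes a note to the log saying that summary statistics were remerged with the original data, which is expected here.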
In 1965, nearly half of all Americans 65 and older had no health insurance. Now, 50 years later, only 2% lack health insurance. The difference, of course, is Medicare. Medicare now covers 55 million people, about 17% of the US population, and is the single largest purchaser of personal health care. Despite this success, the rising costs of health care in general and Medicare in particular have become a growing concern. Medicare policies are important not only because they directly affect large numbers of beneficiaries, payers, and providers, but also because they affect private-sector policies as well. Analyses of Medicare policies and their consequences are complicated both by the effects of an aging population that has changing cost drivers (such as less smoking and more obesity) and by different Medicare payment models. For example, the average age of the Medicare population will initially decrease as the baby-boom generation reaches eligibility, but then increase as that generation grows older. Because younger beneficiaries have lower costs, these changes will affect cost trends and patterns that need to be interpreted within the larger context of demographic shifts. This presentation examines three Medicare payment models: fee-for-service (FFS), Medicare Advantage (MA), and Accountable Care Organizations (ACOs). FFS, originally based on payment methods used by Blue Cross and Blue Shield in the mid-1960s, pays providers for individual services (for example, physicians are paid based on the fees they charge). MA is a capitated payment model in which private plans receive a risk-adjusted rate. ACOs are groups of providers who are given financial incentives for reducing cost and maintaining quality of care for specified beneficiaries. Each model has strengths and weaknesses in specific markets. We examine each model, in addition to new data sources and more recent, innovative payment models that are likely to affect future trends.
Paul Gorrell, IMPAQ International
When analyzing data with SAS®, we often encounter missing or null values in data. Missing values can arise from the availability, collectibility, or other issues with the data. They represent the imperfect nature of real data. Under most circumstances, we need to clean, filter, separate, impute, or investigate the missing values in data. These processes can take up a lot of time, and they are annoying. For these reasons, missing values are usually unwelcome and need to be avoided in data analysis. There are two sides to every coin, however. If we can think outside the box, we can take advantage of the negative features of missing values for positive uses. Sometimes, we can create and use missing values to achieve our particular goals in data manipulation and analysis. These approaches can make data analyses convenient and improve work efficiency for SAS programming. This kind of creative and critical thinking is the most valuable quality for data analysts. This paper exploits real-world examples to demonstrate the creative uses of missing values in data analysis and SAS programming, and discusses the advantages and disadvantages of these methods and approaches. The illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
Across the languages of SAS® are many golden nuggets--functions, formats, and programming features just waiting to impress your friends and colleagues. While learning SAS for over 30 years, I have collected a few of these nuggets, and I offer a dozen more of them to you in this presentation. I presented the first dozen in a similar paper at SAS Global Forum 2015.
Peter Crawford, Crawford Software Consultancy Limited
The impact of missing data is well documented in the literature: data loss (Kim & Curry, 1977) and consequently loss of power (Raaijmakers, 1999), and biased estimates (Roth, Switzer, & Switzer, 1999). If left untreated, missing data is a threat to the validity of inferences made from research results. However, the application of a missing data treatment does not prevent researchers from reaching spurious conclusions. In addition to considering the research situation at hand and the type of missing data present, another important factor that researchers should consider when selecting and implementing a missing data treatment is the structure of the data. In the context of educational research, multiple-group structured data is not uncommon. Assessment data gathered from distinct subgroups of students according to relevant variables (e.g., socio-economic status and demographics) might not be independent. Thus, when comparing the test performance of subgroups of students in the presence of missing data, it is important to preserve any underlying group effect by applying separate multiple imputation with multiple groups before any data analysis. Using attitudinal (Likert-type) data from the Civics Education Study (1999), this simulation study evaluates the performance of multiple-group imputation and total-group multiple imputation in terms of item parameter invariance within structural equation modeling using the SAS® procedure CALIS and item response theory using the SAS procedure MCMC.
Patricia Rodriguez de Gil, University of South Florida
Jeff Kromrey, University of South Florida
We asked for it, and once again, SAS delivers! Now production in SAS® 9.4M3, the ODS EXCEL statement provides users with an extremely easy way to export output to Microsoft Excel. Using existing ODS techniques that you may already be familiar with (like predefined styles, traffic-lighting, and custom formatting) in tandem with common Excel features (like formulas, frozen headers and footers, and page setup options), you can create a publication-ready document in a snap. As a matter of fact, the new ODS EXCEL statement is such a handy tool, you may find yourself using it for more than large-scope production programs. This statement is so straightforward that it may be a good choice for outputting quick ad hoc requests as well. Join me on my adventure as I explore how to use the ODS EXCEL statement to create institutional research reports. I offer tips and techniques you can use to make your output excel lent, too.
Gina Huff, Western Kentucky University
Writing music with SAS® is a simple process. Including a snippet of music in a program is a great way to signal completion of processing. It's also fun! This paper illustrates a method for translating music into SAS code using the CALL SOUND routine.
Dan Bretheim, Towers Watson
A new ODS destination for creating Microsoft Excel workbooks is available starting in the third maintenance release of SAS® 9.4. This destination creates native Microsoft Excel XLSX files, supports graphic images, and offers other advantages over the older ExcelXP tagset. In this presentation you learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on-demand and in real time using SAS server technology is discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.
Vince DelGobbo, SAS Institute Inc.
In the consumer credit industry, privacy is key and the scrutiny increases every day. When returning files to a client, we must ensure that they are depersonalized so that the client cannot match back to any personally identifiable information (PII). This means that we must locate any values for a variable that occur on a limited number of records and null them out (i.e., replace them with missing values). When you are working with large files that have more than one million observations and thousands of variables, locating variables with few unique values is a difficult task. While the FREQ procedure and the DATA step merging can accomplish the task, using first./last. BY variable processing to locate the suspect values and hash objects to merge the data set back together might offer increased efficiency.
Renee Canfield, Experian
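The depersonalization rule described above (find values that occur on only a few records and replace them with missing values) can be sketched as follows. This is an illustration in Python, not the paper's SAS code: the function name `null_rare_values` and the `min_count` threshold are assumptions, and the dictionary lookup only stands in loosely for the hash-object approach the abstract mentions.

```python
from collections import Counter

def null_rare_values(rows, min_count):
    """Replace rare values with None (hypothetical helper).

    rows is a list of dicts, one per record. A value is kept only if it
    occurs on at least min_count records for that variable.
    """
    # First pass: count how often each non-missing value occurs per variable.
    counts = {}
    for row in rows:
        for var, value in row.items():
            if value is not None:
                counts.setdefault(var, Counter())[value] += 1
    # Second pass: null out values that fall below the threshold.
    return [
        {
            var: value if value is not None and counts[var][value] >= min_count else None
            for var, value in row.items()
        }
        for row in rows
    ]
```

The two passes mirror the idea of first locating suspect values and then applying the result back to the full file.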
The SAS® macro language is an efficient and effective way to handle repetitive processing tasks. One such task is conducting the same DATA steps or procedures for different time periods, such as every month, quarter, or year. SAS® dates are based on the number of days between January 1st, 1960, and the target date value, which, while very simple for a computer, is not a human-friendly representation of dates. SAS dates and macro processing are not simple concepts themselves, so mixing the two can be particularly challenging, and it can be very easy to end up working with bad dates! Understanding how macros and SAS dates work individually and together can greatly improve the efficiency of your coding and data processing tasks. This paper covers the basics of SAS macro processing, SAS dates, and the benefits and difficulties of using SAS dates in macros, to ensure that you never have a bad date again.
Kate Burnett-Isaacs, Statistics Canada
Andrew Clapson, MD Financial Management
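To make the date representation concrete: a SAS date value is a count of days from January 1, 1960, so converting to and from a calendar date is simple arithmetic. The sketch below uses Python purely for illustration; the function names are hypothetical and are not part of SAS.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # the calendar date that SAS stores as 0

def sas_date_to_date(sas_value):
    """Convert a SAS date value (days since 1960-01-01) to a calendar date."""
    return SAS_EPOCH + timedelta(days=sas_value)

def date_to_sas_date(d):
    """Convert a calendar date to the corresponding SAS date value."""
    return (d - SAS_EPOCH).days
```

Dates before 1960 come out as negative numbers, which is one reason raw date values are hard to read without a format.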
Many databases include default values that are set inappropriately. For example, a top-level Yes/No question might be followed by a group of check boxes, to which a respondent might indicate multiple answers. When creating the database, a programmer might choose to set a default value of 0 for each box that is not checked. One should interpret these default values with caution, however, depending on whether the top-level question is answered Yes or No or is missing. A similar scenario occurs with a Yes/No question where No is the default value if the question is not answered (but actually the value should be missing). These default values might be scattered throughout the database; there might be no pattern to their occurrence. Records without valid information should be omitted from statistical analysis. Programmers should understand the difference between missing values and invalid values (that is, incorrectly set defaults) because it is important to handle these records differently. Choosing the best method to omit records with invalid values can be difficult. Manual corrections are often prohibitively time-consuming. SAS® offers a useful approach: a combined DATA step and SQL procedure. This paper provides a step-by-step method to accomplish the task.
Lily Yu, Statistics Collaborative, Inc.
In today's fast-paced work environment, time management is crucial to the success of the project. Sending requests to SAS® programmers to run reports every time you need the most current data can strain an already tight schedule. Why bother to contact the programmer? Why not build the execution of the SAS program into the report itself so that when the report is launched, real-time data is retrieved and the report shows the most recent data? This paper demonstrates that by opening an existing SAS report in Microsoft Word or Microsoft Excel, the data in the report refreshes automatically. Simple Visual Basic for Applications (VBA) code is written in Word or Excel. When an existing SAS report is opened, this VBA code calls the SAS programs that create the report from within a Microsoft Office product and overwrites the existing report data with the most current data.
Ron Palanca, Mathematica Policy Research
Today, companies are increasingly using analytics to discover new revenue-increasing and cost-saving opportunities. Many business professionals turn to SAS, a leader in business analytics software and service, to help them improve performance and make better decisions faster. Analytics are also being used in risk management, fraud detection, life sciences, sports, and many more emerging markets. To maximize their value to a business, analytics solutions need to be deployed quickly and cost-effectively, while also providing the ability to scale readily without degrading performance. Of course, in today's demanding environments, where budgets are shrinking and the number of mandates to reduce carbon footprints is growing, the solution must deliver excellent hardware utilization, power efficiency, and return on investment. To address some of these challenges, Red Hat and SAS have collaborated to recommend the best practices for configuring SAS® 9 running on Red Hat Enterprise Linux. The scope of this document includes Red Hat Enterprise Linux 6 and 7. Researched areas include the I/O subsystem, file system selection, and kernel tuning, in both bare-metal and kernel-based virtual machine (KVM) environments. In addition, we include grid configurations that run with the Red Hat Resilient Storage Add-On, which includes Global File System 2 (GFS2) clusters.
Barry Marson, Red Hat, Inc
This paper highlights many of the major capabilities of the DATASETS procedure. It discusses how it can be used as a tool to update variable information in a SAS® data set; to provide information on data set and catalog contents; to delete data sets, catalogs, and indexes; to repair damaged SAS data sets; to rename files; to create and manage audit trails; to add, delete, and modify passwords; to add and delete integrity constraints; and more. The paper contains examples of the various uses of PROC DATASETS that programmers can cut and paste into their own programs as a starting point. After reading this paper, a SAS programmer will have practical knowledge of the many different facets of this important SAS procedure.
Michael Raithel, Westat
The DATA step has served SAS® programmers well over the years. Now there is a new, exciting, and powerful alternative: the DS2 procedure. By introducing an object-oriented programming environment, PROC DS2 enables users to effectively manipulate complex data and efficiently manage programming through additional data types, programming structure elements, user-defined methods, and shareable packages, as well as threaded execution. This hands-on workshop was developed based on our experiences with getting started with PROC DS2 and learning to use it to access, manage, and share data in a scalable and standards-based way. It will help SAS users of all levels easily get started with PROC DS2 and understand its basic functionality by practicing with its features.
Peter Eberhardt, Fernwood Consulting Group Inc.
Xue Yao, Winnipeg Regional Health Authority
Inspired by Christianna Williams's paper on transitioning to PROC SQL from the DATA step, this paper aims to help SQL programmers transition to SAS® by using PROC SQL. SAS adopted the Structured Query Language (SQL) by means of PROC SQL back in SAS® 6. PROC SQL syntax closely resembles SQL. However, there are some SQL features that are not available in SAS. Throughout this paper, we outline common SQL tasks and how they might differ in PROC SQL. We also introduce useful SAS features that are not available in SQL. Topics covered are appropriate for novice SAS users.
Barbara Ross, NA
Jessica Bennett, Snap Finance
Today many analysts in information technology are facing challenges working with large amounts of data. Most analysts are smart enough to write queries using correct table join conditions, text mining, index keys, and hash objects for quick data retrieval. The SAS® system is now able to work with Hadoop data storage to provide efficient processing power to SAS analytics. In this paper we demonstrate differences in data retrieval between the SQL procedure, the DATA step, and the Hadoop environment. To test data retrieval time, we use code that randomly generates ten million observations with character and numeric variables using the RANUNI function. DO LOOP=1 TO 1e7 generates ten million records; however, this code can generate any number of records by changing the exponent. We use the most commonly used functions and procedures to retrieve records on a test data set and we illustrate real- and CPU-time processing. All PROC SQL and Hadoop queries are run in SAS® Enterprise Guide® 6.1 to record processing time. This paper includes an overview of Hadoop data architecture and describes how to connect to a Hadoop environment through SAS. It provides sample queries for data retrieval, explains differences using Hadoop pass-through versus Hadoop LIBNAME, and covers the big challenges for real-time and CPU-time comparison using the most commonly used SAS functions and procedures.
Anjan Matlapudi, AmeriHealth Caritas Family of Companies
SAS® software provides many DATA step functions that search and extract patterns from a character string, such as SUBSTR, SCAN, INDEX, TRANWRD, etc. Using these functions to perform pattern matching often requires you to use many function calls to match a character position. However, using the Perl regular expression (PRX) functions or routines in the DATA step improves pattern-matching tasks by reducing the number of function calls and making the program easier to maintain. This talk, in addition to discussing the syntax of Perl regular expressions, demonstrates many real-world applications.
Arthur Li, City of Hope
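The point about reducing function calls can be illustrated with a single pattern that both locates and extracts the pieces of a phone number, a task that would otherwise need several position-based calls. The sketch uses Python's `re` module, whose syntax is Perl-style but which is not the SAS PRX interface; the pattern and the function name `extract_phone` are assumptions chosen for illustration.

```python
import re

# One pattern captures area code, exchange, and line number in a single pass.
PHONE = re.compile(r"\((\d{3})\)\s*(\d{3})-(\d{4})")

def extract_phone(text):
    """Return the ten digits of the first '(ddd) ddd-dddd' number, or None."""
    match = PHONE.search(text)
    return "".join(match.groups()) if match else None
```

Changing the accepted format means editing one pattern rather than reworking a chain of substring and index calls.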
Financial institutions are working hard to optimize credit and pricing strategies at adjudication and for ongoing account and customer management. For cards and other personal lending products, there is intense competitive pressure, together with relentless revenue challenges, that creates a huge requirement for sophisticated credit and pricing optimization tools. Numerous credit and pricing optimization applications are available on the market to satisfy these needs. We present a relatively new approach that relies heavily on the effect modeling (uplift or net lift) technique for continuous target metrics--revenue, cost, losses, and profit. Examples of effect modeling to optimize the impact of marketing campaigns are known. We discuss five essential steps on the credit and pricing optimization path: (1) setting up critical credit and pricing champion/challenger tests, (2) performance measurement of specific test campaigns, (3) effect modeling, (4) defining the best effect model, and (5) moving from the effect model to the optimal solution. These steps require specific applications that are not easily available in SAS®. Therefore, necessary tools have been developed in SAS/STAT® software. We go through numerous examples to illustrate credit and pricing optimization solutions.
Yuri Medvedev, Bank of Montreal
This session discusses challenges and considerations typically faced in credit risk scoring as well as options and practical ways to address them. No two problems are ever the same even if the general approach is clear. Every model has its own unique characteristics and creative ways to address them. Successful credit scoring modeling projects are always based on a combination of both advanced analytical techniques and data, and a deep understanding of the business and how the model will be applied. Different aspects of the process are discussed, including feature selection, reject inferencing, sample selection and validation, and model design questions and considerations.
Regina Malina, Equifax
Horizontal data sorting is a very useful SAS® technique in advanced data analysis when you are using SAS programming. Two years ago (SAS® Global Forum Paper 376-2013), we presented and illustrated various methods and approaches to perform horizontal data sorting, and we demonstrated its valuable application in strategic data reporting. However, this technique can also be used as a creative analytic method in advanced business analytics. This paper presents and discusses its innovative and insightful applications in product purchase sequence analyses such as product opening sequence analysis, product affinity analysis, next best offer analysis, time-span analysis, and so on. Compared to other analytic approaches, the horizontal data sorting technique has the distinct advantages of being straightforward, simple, and convenient to use. This technique also produces easy-to-interpret analytic results. Therefore, the technique can have a wide variety of applications in customer data analysis and business analytics fields.
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
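As a rough picture of horizontal sorting applied to purchase sequence analysis: the values within one customer's record are ordered across columns, rather than ordering records down the data set. The sketch below is in Python with hypothetical names and data layout; it is not the paper's SAS method.

```python
def purchase_sequence(open_dates):
    """Order one customer's products by opening date (hypothetical helper).

    open_dates maps product name to an ISO date string ('YYYY-MM-DD'),
    or to None if the customer never opened that product.
    """
    # ISO date strings sort chronologically, so a plain sort is enough.
    held = [(opened, product) for product, opened in open_dates.items() if opened is not None]
    return [product for opened, product in sorted(held)]
```

For a customer with chequing opened in 2012, a loan in 2013, and a card in 2014, the result is the list chequing, loan, card, which is the kind of sequence a next-best-offer analysis would start from.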
A bubble map is a useful tool for identifying trends and visualizing the geographic proximity and intensity of events. This session describes how to use readily available map data sets in SAS® along with PROC GEOCODE and PROC GMAP to turn a data set of addresses and events into a bubble map of the United States with scaled bubbles depicting the location and intensity of events.
Caroline Walker, Warren Rogers Associates
Representational State Transfer (REST) is being used across the industry for designing networked applications to provide lightweight and powerful alternatives to web services such as SOAP and Web Services Description Language (WSDL). Since REST is based entirely on HTTP, SAS® provides everything you need to make REST calls and to process structured and unstructured data alike. This paper takes a look at how some enhancements in the third maintenance release of SAS® 9.4 can benefit you in this area. Learn how the HTTP procedure and other SAS language features provide everything you need to simply and securely use REST.
Joseph Henry, SAS
JavaScript Object Notation (JSON) has quickly become the de facto standard for data transfer on the Internet, due to an increase in both web data and the usage of full-stack JavaScript. JSON has become dominant in the emerging technologies of the web today, such as the Internet of Things and the mobile cloud. JSON offers a light and flexible format for data transfer, and can be processed directly from JavaScript without the need for an external parser. The SAS® JSON procedure lacks the ability to read in JSON. However, the addition of the GROOVY procedure in SAS® 9.3 allows for execution of Java code from within SAS, allowing for JSON data to be read into a SAS data set through XML conversion. This paper demonstrates the method for parsing JSON into data sets with Groovy and the XML LIBNAME Engine, all within Base SAS®.
John Kennedy, Mesa Digital
The success of any marketing promotion is measured by the incremental response and revenue generated by the targeted population known as Test in comparison with the holdout sample known as Control. An unbiased random Test and Control sampling ensures that the incremental revenue is in fact driven by the marketing intervention. However, isolating the true incremental effect of any particular marketing intervention becomes increasingly challenging in the face of overlapping marketing solicitations. This paper demonstrates how a look-alike model can be applied using the GMATCH algorithm on a SAS® platform to design a truly comparable control group to accurately measure and isolate the impact of a specific marketing intervention.
Mou Dutta, Genpact LLC
Arjun Natarajan, Genpact LLC
As credit unions market themselves to increase their market share against the big banks, they understandably focus on gaining new members. However, they must also retain their existing members. Otherwise, the new members they gain can easily be offset by existing members who leave. Happily, by using predictive analytics as described in this paper, keeping (and further engaging) existing members can actually be much easier and less expensive than enlisting new members. This paper provides a step-by-step overview of a relatively simple but comprehensive approach to reduce member attrition. We first prepare the data for a statistical analysis. With some basic predictive analytics techniques, we can then identify those members who have the highest chance of leaving and the highest value. For each of these members, we can also identify why they would leave, thus suggesting the best way to intervene to retain them. We then make suggestions to improve the model for better accuracy. Finally, we provide suggestions to extend this approach to further engaging existing members and thus increasing their lifetime value. This approach can also be applied to many other organizations and industries. Code snippets are shown for any version of SAS® software; they also require SAS/STAT® software.
Nate Derby, Stakana Analytics
Mark Keintz, Wharton Research Data Services
SharePoint is a web application framework and platform developed by Microsoft, mostly used for content and document management by mid-size businesses and large departments. Linking SAS® with SharePoint combines the power of these two into one. This paper shows users how to send PDF reports and files in other formats (such as Microsoft Excel files, HTML files, JPEG files, zipped files, and so on) from SAS to a SharePoint Document Library. The paper demonstrates how to configure SharePoint Document Library settings to receive files from SAS. A couple of SAS code examples are included to show how to send files from SAS to SharePoint. The paper also introduces a framework for creating data visualization on SharePoint by feeding SAS data into HTML pages on SharePoint. An example of how to create an infographics SharePoint page with data from SAS is also provided.
Xiaogang Tang, Wyndham Worldwide
High-quality documentation of SAS® code is standard practice in multi-user environments for smoother group collaborations. One of the documentation items that facilitate program sharing and retrospective review is a header section at the beginning of a SAS program highlighting the main features of the program, such as the program's name, its creation date, the program's aims, the programmer's identification, and the project title. In this header section, it is helpful to keep a list of the inputs and outputs of the SAS program (for example, SAS data sets and files that the program used and created). This paper introduces SAS-IO, a browser-based HTML/JavaScript tool that can automate production of such an input/output list. This can save the programmers' time, especially when working with long SAS programs.
Mohammad Reza Rezai, Institute for Clinical Evaluative Sciences
SAS® users are always surprised to discover their programs contain bugs (or errors). In fact, when asked, users will emphatically stand by their programs and logic by saying they are error free. But the vast number of experiences, along with the realities of writing code, say otherwise. Errors can appear anywhere in program code, accidentally introduced by developers or programmers as they write it. No matter where an error occurs, the overriding sentiment among most users is that debugging SAS programs can be a daunting and humbling task. This presentation explores the world of SAS errors, providing essential information about the various error types. Attendees learn how errors are created, their symptoms, identification techniques, and how to apply effective techniques to better understand, repair, and enable program code to work as intended.
Kirk Paul Lafler, Software Intelligence Corporation
Spontaneous combustion describes combustion that occurs without an external ignition source. With the right combination of fire tetrahedron components--including fuel, oxidizer, heat, and chemical reaction--it can be a deadly yet awe-inspiring phenomenon, and differs from traditional combustion that requires a fire source, such as a match, flint, or spark plugs (in the case of combustion engines). SAS® code, too, often requires a 'spark' the first time it is run or when it is run within a new environment. Thus, SAS® programs might operate correctly in an organization's original development environment, but might fail in its production environment until folders are created, SAS® libraries are assigned, control tables are constructed, or configuration files are built or modified. And, if software portability is problematic within a single organization, imagine the complexities that exist when SAS code is imported from a blog, white paper, textbook, or other external source into one's own infrastructure. The lack of software portability and the complexities of initializing new code often compel development teams to build code from scratch rather than attempting to rehabilitate or customize existent code to run in their environment. A solution is to develop SAS code that flexibly builds and validates its environment and required components during execution. To that end, this text describes techniques that increase the portability, reusability, and maintainability of SAS code by demonstrating self-extracting, spontaneously combustible code that requires no spark.
Troy Hughes, Datmesis Analytics
The third maintenance release of SAS® 9.4 was a huge release with respect to the interoperability between SAS® and Hadoop, the industry standard for big data. This talk brings you up-to-date with where we are: more distributions, more data types, more options. Come and learn about the exciting new developments for blending your SAS processing with your shared Hadoop cluster. Grid processing. Check. SAS data sets on HDFS. Check.
Paul Kent, SAS
This paper examines the various sampling options that are available in SAS® through PROC SURVEYSELECT. We do not cover all of the possible sampling methods or options that PROC SURVEYSELECT features. Instead, we look at Simple Random Sampling, Stratified Random Sampling, Cluster Sampling, Systematic Sampling, and Sequential Random Sampling.
Rachael Becker, University of Central Florida
Drew Doyle, University of Central Florida
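Two of the designs named above, systematic sampling and stratified random sampling, can be sketched in a few lines. The code is Python for illustration only and does not reproduce how PROC SURVEYSELECT implements these methods; the function names, the fixed integer interval, and the equal allocation per stratum are all simplifying assumptions.

```python
import random

def systematic_sample(units, n, seed=None):
    """Take every k-th unit after a random start, with k = len(units) // n."""
    rng = random.Random(seed)
    k = len(units) // n          # sampling interval (assumes n <= len(units))
    start = rng.randrange(k)     # random start within the first interval
    return [units[start + i * k] for i in range(n)]

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of equal size within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    # rng.sample draws without replacement inside each stratum.
    return {s: rng.sample(members, n_per_stratum) for s, members in strata.items()}
```

The contrast is in where the randomness enters: systematic sampling randomizes only the starting point, while stratified sampling randomizes every draw but guarantees each stratum is represented.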
Just as there are many ways to solve any problem in any facet of life, most SAS® programming problems have more than one potential solution. Each solution has tradeoffs; a complex program might execute very quickly but prove challenging to maintain, while a program designed for ease of use might require more resources for development, execution, and maintenance. Too often, it seems like those tasked to produce the results are advised on delivery date and estimated development time in advance, but are given no guidelines for efficiency expectations. This paper provides ways for improving the efficiency of your SAS® programs. It suggests coding techniques, provides guidelines for their use, and shows the results of experimentation to compare various coding techniques, with examples of acceptable and improved ways to accomplish the same task.
Andrew Kuligowski, HSN
Swati Agarwal, Optum
Whether it is a question of security or a question of centralizing the SAS® installation to a server, the need to phase out SAS in the PC environment has never been so common. On the surface, this type of migration seems very simple and smooth. However, migrating SAS from a PC environment to a SAS server environment (SAS® Enterprise Guide®) is really easy to underestimate. This paper presents a way to set up the winning conditions to achieve this goal without falling into the common traps. Based on a successful conversion with a previous employer, I have identified a high-level roadmap with specific objectives that will guide people in this important task.
Mathieu Gaouette, Videotron
Sampling, whether with or without replacement, is an important component of the hypothesis testing process. In this paper, we demonstrate the mechanics and outcome of sampling without replacement, where sample values are not independent. In other words, what we get in the first sample affects what we can get for the second sample, and so on. We use the popular variant of poker known as No Limit Texas Hold'em to illustrate.
Dan Bretheim, Towers Watson
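The dependence between draws is easy to see in a deal: once a card is dealt it cannot appear again. The sketch below, in Python for illustration, deals hole cards and a five-card board from one 52-card deck without replacement. The function name and card encoding are assumptions, and burn cards are ignored for simplicity.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]  # 52 distinct cards

def deal_holdem(num_players, seed=None):
    """Deal two hole cards per player plus a five-card board."""
    rng = random.Random(seed)
    needed = 2 * num_players + 5
    # random.sample draws without replacement, so no card can repeat.
    cards = rng.sample(DECK, needed)
    hands = [cards[2 * i: 2 * i + 2] for i in range(num_players)]
    board = cards[2 * num_players:]
    return hands, board
```

Because every card comes from the same shrinking deck, what one player holds changes the probabilities for every other hand and for the board.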
Are you frustrated with manually setting options to control your SAS® Display Manager sessions but become daunted every time you look at all the places you can set options and window layouts? In this paper, we look at various files SAS accesses when starting, what can (and cannot) go into them, and what takes precedence after all are executed. We also look at the SAS Registry and how to programmatically change settings. By the end of the paper, you will be comfortable in knowing where to make the changes that best fit your needs.
Peter Eberhardt, Fernwood Consulting Group Inc.
The report looks simple enough--a bar chart and a table, like something created with GCHART and REPORT procedures. But there are some twists to the reporting requirements that make those procedures not quite flexible enough. It's often the case that the programming tools and techniques we envision using for a project or are most comfortable with aren't necessarily the best to use. Fortunately, SAS® can provide many ways to get results. Rather than procedure-based output, the solution here was to mix 'old' and 'new' DATA step-based techniques to solve the problem. Annotate data sets are used to create the bar chart and the Report Writing Interface (RWI) is used to create the table. Without a whole lot of additional code, you gain an extreme amount of flexibility.
Pete Lund, Looking Glass Analytics
Big data is often distinguished as encompassing high volume, high velocity, or high variability of data. While big data can signal big business intelligence and big business value, it can also wreak havoc on systems and software ill-prepared for its profundity. Scalability describes the ability of a system or software to adequately meet the needs of additional users or its ability to use additional processors or resources to fulfill those added requirements. Scalability also describes the adequate and efficient response of a system to increased data throughput. Because sorting data is one of the most common and most resource-intensive operations in any software language, inefficiencies or failures caused by big data often are first observed during sorting routines. Much SAS® literature has been dedicated to optimizing big data sorts for efficiency, including minimizing execution time and, to a lesser extent, minimizing resource usage (that is, memory and storage consumption). However, less attention has been paid to implementing big data sorting that is reliable and robust even when confronted with resource limitations. To that end, this text introduces the SAFESORT macro, which facilitates a priori exception-handling routines (which detect environmental and data set attributes that could cause process failure) and post hoc exception-handling routines (which detect actual failed sorting routines). If exception handling is triggered, SAFESORT automatically reroutes program flow from the default sort routine to a less resource-intensive routine, thus sacrificing execution speed for reliability. Moreover, macro modularity enables developers to select their favorite sort procedure and, for data-driven disciples, to build fuzzy logic routines that dynamically select a sort algorithm based on environmental and data set attributes.
Troy Hughes, Datmesis Analytics
Teaching online courses is very different from teaching in the classroom setting. Developing and delivering an effective online class entails more than just transferring traditional course materials into written documents and posting them in a course shell. This paper discusses the author's experience in converting a traditional hands-on introductory SAS® programming class into an online course and presents some ideas for promoting successful learning and knowledge transfer when teaching online.
Justina Flavin, Independent Consultant
This paper compares different solutions to a data transpose puzzle presented to the SAS® User Group at the United States Census Bureau. The presented solutions range from a SAS 101 multi-step solution to an advanced solution using techniques that are not widely known, which yields run-time savings of 85 percent!
Ahmed Al-Attar, AnA Data Warehousing Consulting, LLC
Universities, government agencies, and non-profits all require various levels of data analysis, data manipulation, and data management. This requires a workforce that knows how to use statistical packages. The server-based SAS® OnDemand for Academics: Studio offerings are excellent tools for teaching SAS in both postsecondary education and in professional continuing education settings. SAS OnDemand for Academics: Studio can be used to share resources; teach users using Windows, UNIX, and Mac OS computers; and teach in various computing environments. This paper discusses why one might use SAS OnDemand for Academics: Studio instead of SAS® University Edition, SAS® Enterprise Guide®, or Base SAS® for education and provides examples of how SAS OnDemand for Academics: Studio has been used for in-person and virtual training.
Charlotte Baker, Florida A&M University
C. Perry Brown, Florida Agricultural and Mechanical University
Are you living in Heartbreak Hotel because your boss wants different statistics in the SAME column on your report? Need a currency symbol in one cell of your pre-summarized data, but a percent sign in another cell? Large blocks of text on your report have you all shook up because they wrap badly on your report? Have you hit the wall with PROC PRINT? Well, rock out of your jailhouse with ODS, DATA step, and PROC REPORT. This paper is a sequel to the popular 2008 paper Creating Complex Reports. The paper presents a nuts-and-bolts look at more complex report examples gleaned from SAS® Community Forum questions and questions from students. Examples will include use of DATA step manipulation to produce PROC REPORT and PROC SGPLOT output as well as examples of ODS LAYOUT and the new report writing interface. And PROC TEMPLATE makes a special guest appearance. Even though the King of Rock 'n' Roll won't be there for the presentation, perhaps we'll hear his ghost say Thank you very much, I always wanted to know how to do that at the end of this presentation.
Cynthia Zender, SAS
VBA has been described as a glue language, and has been widely used in exchanging data between Microsoft products such as Excel and Word or PowerPoint. How to trigger the VBA macro from SAS® via DDE has been widely discussed in recent years. However, using SAS to send parameters to a VBA macro was seldom reported. This paper provides a solution for this problem. Copying Excel tables to PowerPoint using the combination of SAS and VBA is illustrated as an example. The SAS program rapidly scans all Excel files that are contained in one folder, passes the file information to VBA as parameters, and triggers the VBA macro to write PowerPoint files in a loop. As a result, a batch of PowerPoint files can be generated by just one mouse-click.
Zhu Yanrong, Medtronic
Like a good pitcher and catcher in baseball, ODS layout and the ODS destination for PowerPoint are a winning combination in SAS® 9.4. With this dynamic duo, you can go straight from performing data analysis to creating a quality presentation. The ODS destination for PowerPoint produces native PowerPoint files from your output. When you pair it with ODS layout, you are able to dynamically place your output on each slide. Through code examples this paper shows you how to create a custom title slide, as well as place the desired number of graphs and tables on each slide. Don't be relegated to the sidelines--increase your winning percentage by learning how ODS layout works with the ODS destination for PowerPoint.
Jane Eslinger, SAS
Like all skilled tradespeople, SAS® programmers have many tools at their disposal. Part of their expertise lies in knowing when to use each tool. In this paper, we use a simple example to compare several common approaches to generating the requested report: the TABULATE, TRANSPOSE, REPORT, and SQL procedures. We investigate the advantages and disadvantages of each method and consider when applying it might make sense. A variety of factors are examined, including the simplicity, reusability, and extensibility of the code in addition to the opportunities that each method provides for customizing and styling the output. The intended audience is beginning to intermediate SAS programmers.
Josh Horstman, Nested Loop Consulting
The HPSUMMARY procedure provides data summarization tools to compute basic descriptive statistics for variables in a SAS® data set. It is a high-performance version of the SUMMARY procedure in Base SAS®. Though PROC SUMMARY is popular with data analysts, PROC HPSUMMARY is still a new kid on the block. The purpose of this paper is to provide an introduction to PROC HPSUMMARY by comparing it with its well-known counterpart, PROC SUMMARY. The comparison focuses on differences in syntax, options, and performance in terms of execution time and memory usage. Sample code, outputs, and SAS log snippets are provided to illustrate the discussion. Simulated large data is used to observe the performance of the two procedures. Using SAS® 9.4 installed on a single-user machine with four cores available, preliminary experiments that examine performance of the two procedures show that HPSUMMARY is more memory-efficient than SUMMARY when the data set is large. (For example, SUMMARY failed due to insufficient memory, whereas HPSUMMARY finished successfully). However, there is no evidence of a performance advantage of the HPSUMMARY over the SUMMARY procedures in this single-user machine.
Anh Kellermann, University of South Florida
Jeff Kromrey, University of South Florida
SAS customers have a growing need for accurate and quality SAS® maps. SAS has licensed the map data from a third party to satisfy this need. This presentation explores the new map data by discussing the problems and limitations with the old map data, and the features and examples for using the new data.
Darrell Massengill, SAS
The SAS® procedure PROC TREE sketches parent-child lineage--also known as trees--from hierarchical data. Hierarchical relationships can be difficult to flatten out into a data set, but with PROC TREE, its accompanying ODS table TREELISTING, and some creative yet simple handcrafting, a de-lineage of parent-children variables can be derived. Because the focus of PROC TREE is to provide the tree structure in graphical form, it does not explicitly output the depth of the tree, although the depth is visualized in the accompanying graph. Perhaps unknown to most, the path length variable, or simply the height of the tree, can be extracted from PROC TREE merely by capturing it from the ODS output, as demonstrated in this paper.
Can Tongur, Statistics Sweden
Time flies. Thirteen years have passed since the first introduction of the hash method by SAS®. Dozens of papers have been written offering new (sometimes weird) applications for hash. Even beyond look-ups, we can use the hash object for summation, splitting files, sorting files, fuzzy matching, and other data processing problems. In SAS, almost any task can be solved by more than one method. For a class of problems where hash can be applied, is it the best tool to use? We present a variety of problems and provide several solutions, using both hash tables and more traditional methods. Comparing the solutions, you must consider computing time as well as your time to code and validate your responses. Which solution is better? You decide!
David Izrael, Abt Associates
Elizabeth Axelrod, Abt Associates
This SAS® test begins where the SAS® Advanced Certification test leaves off. It has 25 questions to challenge even the most experienced SAS programmers. Most questions are about the DATA step, but there are a few macro and SAS/ACCESS® software questions thrown in. The session presents each question on a slide, then the answer on the next. You WILL be challenged.
Glen Becker, USAA
The Scheduler is an innovative tool that maps linear data (that is, time stamps) to an intuitive three-dimensional representation. This transformation enables the user to identify relationships, conflicts, gaps, and so on, that were not readily apparent in the data's native form. This tool has applications in operations research and litigation-related analyses. This paper discusses why the Scheduler was created, the data that is available to analyze the issue, and how this code can be used in other types of applications. Specifically, this paper discusses how the Scheduler can be used by supervisors to maintain the presence of three or more employees at all times to ensure that all federal- and state- mandated work breaks are taken. Additional examples of the Scheduler include assisting construction foremen to create schedules that visualize the presence of contractors in a logical sequence while eliminating overlap and ensuring cushions of time between contractors and matching items to non-uniform information.
Ariel Kumpinsky, The Claro Group, LLC
I like to think that SAS® error messages come in three flavors, IMPORTANT, INTERESTING, and IRRELEVANT. SAS calls its messages NOTES, WARNINGS, and ERRORS. I intend to show you that not all NOTES are IRRELEVANT nor are all ERRORS IMPORTANT. This paper walks through many different scenarios and explains in detail the meaning and impact of messages presented in the SAS log. I show you how to locate, classify, analyze, and resolve many different SAS message types. And for those brave enough, I go on to teach you how to both generate and suppress messages sent to the SAS log. This paper presents an overview of messages that can often be found in a SAS Log window or output file. The intent of this presentation is to familiarize you with common messages, the meaning of the messages, and how to determine the best way to react to the messages. Notice I said react, not necessarily correct. Code examples and log output will be presented to aid in the explanations.
William E Benjamin Jr, Owl Computer Consultancy LLC
Specifying colors based on group value is a quite popular practice in visualizing data, but it is not so easy to do, especially when there are multiple group values. This paper explores three different methods to dynamically assign colors to plots based on their group values. They are combining EVAL and IFN functions in the plot statements; bringing the DISCRETEATTRMAP block into the plot statements; and using the macro from the SAS® sample 40255.
Amos Shu, MedImmune
This paper shows how SAS® can be used to obtain a Time Series Analysis of data regarding World War II. This analysis tests whether Truman's justification for the use of atomic weapons was valid. Truman believed that by using the atomic weapons, he would prevent unacceptable levels of U.S. casualties that would be incurred in the course of a conventional invasion of the Japanese islands.
Rachael Becker, University of Central Florida
SAS® DS2 is a powerful new object-oriented programming language that was introduced with SAS® 9.4. Having been designed for advanced data manipulation, it not only enables the programmer to use a much wider range of data types than the traditional Base SAS® language does but also allows for the creation of custom methods and packages. These packages are analogous to classes in other object-oriented languages such as C# or Ruby and can be used to massively improve your programming effectiveness. This paper demonstrates a number of techniques that can be used to take full advantage of DS2's object-oriented user-defined package capabilities. Examples include creating new data types, storing and reusing packages, using simple packages as building blocks in the creation of more complex packages, a technique to overcome DS2's lack of support for inheritance, and building lightweight packages to facilitate method overloading and make parameter passing simpler.
Chris Brooks, Melrose Analytics Ltd
Do you need a macro for your program? This paper provides some guidelines, based on user experience, and explores whether it's worth the time to create a macro (for example, a parameter-driven macro or just a simple macro variable). This paper is geared toward new users and experienced users who do not use macros.
Claudine Lougee, Dualenic
Sometimes data are not arranged the way you need them to be for the purpose of reporting or combining with other data. This presentation examines just such a scenario. We focus on the TRANSPOSE procedure and its ability to transform data to meet our needs. We explore this method as an alternative to using longhand code involving arrays and OUTPUT statements in a SAS® DATA step.
Faron Kincheloe, Baylor University
If data sets are being merged and a variable occurs in more than one of the data sets, the last value (from the variable in the right-most data set listed in the MERGE statement) is kept. It is critical, therefore, to know the name and sources of any common variables. This paper reviews simple techniques for noting overwriting in the log, summarizing the names and sources of all variables, and identifying common variables before merging files.
Christopher Bost, MDRC
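The pre-merge check described in the abstract above (listing the names and sources of variables shared by more than one data set) can be sketched in a few lines. This is a language-neutral illustration in Python, not the paper's SAS technique. The data set names, variable lists, and the `common_variables` helper are all hypothetical.

```python
from collections import defaultdict

def common_variables(datasets):
    """Map each variable found in more than one data set to the data sets that contain it.

    `datasets` maps a data set name to its list of variable names, in the
    order the data sets would appear in the MERGE statement.
    """
    sources = defaultdict(list)
    for name, variables in datasets.items():
        for var in variables:
            # SAS variable names are case-insensitive, so compare in lowercase
            sources[var.lower()].append(name)
    return {var: names for var, names in sources.items() if len(names) > 1}

# Hypothetical data sets, listed in MERGE-statement order
datasets = {
    "demog": ["id", "age", "site"],
    "visits": ["id", "visit_date", "site"],
    "labs": ["ID", "result"],
}
by_vars = {"id"}  # assumed BY variable; sharing it is intentional

for var, names in common_variables(datasets).items():
    if var not in by_vars:
        # Per the abstract, the right-most data set's value is the one kept
        print(f"{var}: in {names}; value from {names[-1]} would be kept")
```

Excluding the BY variables from the report keeps the output focused on variables that would be overwritten unintentionally.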
This session describes the construction of a program that converts any set of relational tables into a single flat file, using Base SAS® in SAS® 9.3. The program gets the information it needs from the data tables themselves, with a minimum of user configuration. It automatically detects one-to-many relationships and creates sets of replicate variables in the flat file. The program illustrates the use of macro programming and the SQL, DATASETS, and TRANSPOSE procedures, among others. The output is sent to a Microsoft Excel spreadsheet using the Output Delivery System (ODS).
Dav Vandenbroucke, HUD
Administrative health databases, including hospital and physician records, are frequently used to estimate the prevalence of chronic diseases. Disease-surveillance information is used by policy makers and researchers to compare the health of populations and develop projections about disease burden. However, not all cases are captured by administrative health databases, which can result in biased estimates. Capture-recapture (CR) models, originally developed to estimate the sizes of animal populations, have been adapted for use by epidemiologists to estimate the total sizes of disease populations for such conditions as cancer, diabetes, and arthritis. Estimates of the number of cases are produced by assessing the degree of overlap among incomplete lists of disease cases captured in different sources. Two- and three-source CR models are most commonly used, often with covariates. Two important assumptions--independence of capture in each data source and homogeneity of capture probabilities, which underlie conventional CR models--are unlikely to hold in epidemiological studies. Failure to satisfy these assumptions biases the model results. Log-linear, multinomial logistic regression, and conditional logistic regression models, if used properly, can incorporate dependency among sources and covariates to model the effect of heterogeneity in capture probabilities. However, none of these models is optimal, and researchers might be unfamiliar with how to use them in practice. This paper demonstrates how to use SAS® to implement the log-linear, multinomial logistic regression, and conditional logistic regression CR models. Methods to address the assumptions of independence between sources and homogeneity of capture probabilities for a three-source CR model are provided. The paper uses a real numeric data set about Parkinson's disease involving physician claims, hospital abstract, and prescription drug records from one Canadian province. Advantages and disadvantages of each model are discussed.
Lisa Lix, University of Manitoba
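To make the overlap idea in the abstract above concrete, here is a minimal sketch of the simplest two-source capture-recapture estimator (Lincoln-Petersen), written in Python rather than SAS. It relies on exactly the independence and homogeneity assumptions that the abstract says are unlikely to hold, so it shows only the baseline, not the paper's log-linear or logistic models. The counts are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Two-source estimate of the total number of cases: N = n1 * n2 / m.

    n1, n2: cases found in source 1 and source 2
    m:      cases found in both sources
    Assumes independent sources and equal capture probability for every case.
    """
    if m <= 0:
        raise ValueError("need at least one case found in both sources")
    if m > min(n1, n2):
        raise ValueError("overlap cannot exceed either source count")
    return n1 * n2 / m

# Hypothetical counts, not taken from the paper
physician_claims = 400
hospital_records = 250
in_both = 100

estimate = lincoln_petersen(physician_claims, hospital_records, in_both)
observed = physician_claims + hospital_records - in_both
print(f"estimated total cases: {estimate:.0f}; observed in either source: {observed}")
```

The gap between the estimate and the observed count is the number of cases presumed missed by both sources.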
The Statistical Graphics (SG) procedures and the Graph Template Language (GTL) are capable of generating powerful individual data displays. What if one wanted to visualize how a distribution changes with different parameters, or to view multiple aspects of a three-dimensional plot, just as two examples? By using macros to generate a graph for each frame, combined with the ODS PRINTER destination, it is possible to create GIF files to create effective animated data displays. This paper outlines the syntax and strategy necessary to generate these displays and provides a handful of examples. Intermediate knowledge of PROC SGPLOT, PROC TEMPLATE, and the SAS® macro language is assumed.
Jesse Pratt, Cincinnati Children's Hospital Medical Center
Stephen Curry, James Harden, and LeBron James are considered to be three of the most gifted professional basketball players in the National Basketball Association (NBA). Each year the Kia Most Valuable Player (MVP) award is given to the best player in the league. Stephen Curry currently holds this title, followed by James Harden and LeBron James, the first two runners-up. The decision for MVP was made by a panel of judges comprised of 129 sportswriters and broadcasters, along with fans who were able to cast their votes through NBA.com. Did the judges make the correct decision? Is there statistical evidence that indicates that Stephen Curry is indeed deserving of this prestigious title over James Harden and LeBron James? Is there a significant difference between the two runners-up? These are some of the questions that are addressed through this project. Using data collected from NBA.com for the 2014-2015 season, a variety of parametric and nonparametric k-sample methods were used to test 20 quantitative variables. In an effort to determine which of the three players is the most deserving of the MVP title, post-hoc comparisons were also conducted on the variables that were shown to be significant. The time-dependent variables were standardized, because there was a significant difference in the number of minutes each athlete played. These variables were then tested and compared with those that had not been standardized. This led to significantly different outcomes, indicating that the results of the tests could be misleading if the time variable is not taken into consideration. Using the standardized variables, the results of the analyses indicate that there is a statistically significant difference in the overall performances of the three athletes, with Stephen Curry outplaying the other two players. However, the difference between James Harden and LeBron James is not so clear.
Sherrie Rodriguez, Kennesaw State University
Most businesses have benefited from using advanced analytics for marketing and other decision making. Applying analytical techniques to pharmaceutical marketing, however, is a challenging and still-emerging practice, because it is critical to ensure that the analysis makes sense from the medical side. Which drug is finally consumed for a specific disease is directly or indirectly influenced by many factors, including the disease origins, health-care system policy, physicians' clinical decisions, and the patients' perceptions and behaviors. The key to pharmaceutical marketing is to identify the targeted populations for specific diseases and to focus on those populations. Because the health-care environment changes constantly, predictive models are important for predicting how the targeted population changes over time based on the patient journey and epidemiology. Time series analysis is used to forecast the number of cases of infectious diseases; correspondingly, demand for over-the-counter and prescribed medicines for the specific disease can be predicted. Accurate prediction provides valuable information for the strategic planning of campaigns. For different diseases, different analytical techniques are applied. By taking the medical features of the disease and epidemiology into account, the prediction of the potential and total addressable markets can reveal more insightful marketing trends. And by simulating the important factors and quantifying how they impact the patient journey within the typical health-care system, the most accurate estimate of demand for specific medicines or treatments can be obtained. By monitoring the parameters in the dynamic simulation, smart decisions can be made using what-if comparisons to optimize the marketing result.
Xue Yao, Winnipeg Regional Health Authority
An important strength of observational studies is the ability to estimate a key behavior's or treatment's effect on a specific health outcome. This is a crucial strength as most health outcomes research studies are unable to use experimental designs due to ethical and other constraints. Keeping this in mind, one drawback of observational studies (that experimental studies naturally control for) is that they lack the ability to randomize their participants into treatment groups. This can result in the unwanted inclusion of a selection bias. One way to adjust for a selection bias is through the use of a propensity score analysis. In this study, we provide an example of how to use these types of analyses. Our concern is whether recent substance abuse has an effect on an adolescent's identification of suicidal thoughts. In order to conduct this analysis, a selection bias was identified and adjustment was sought through three common forms of propensity scoring: stratification, matching, and regression adjustment. Each form is separately conducted, reviewed, and assessed as to its effectiveness in improving the model. Data for this study was gathered through the Youth Risk Behavior Surveillance System, an ongoing nation-wide project of the Centers for Disease Control and Prevention. This presentation is designed for any level of statistician, SAS® programmer, or data analyst with an interest in controlling for selection bias, as well as for anyone who has an interest in the effects of substance abuse on mental illness.
Deanna Schreiber-Gregory, National University
Have you questioned the Read throughput or Write throughput for your Windows system drive? What about the input/output (I/O) throughput of a non-system drive? One solution is to use SASIOTEST.EXE to measure the Read or Write throughput for any drive connected to your system. Since SAS® 9.2, SASIOTEST.EXE has been included with each release of SAS for Windows. This paper explains the options supported by SASIOTEST and the various ways to use SASIOTEST. It also describes how the I/O relates to SAS I/O on Windows.
Mike Jones, SAS
The increasing popularity and affordability of wearable devices, together with their ability to provide granular physical activity data down to the minute, have enabled researchers to conduct advanced studies on the effects of physical activity on health and disease. This provides statistical programmers the challenge of processing data and translating it into analyzable measures. One such measure is the number of time-specific bouts of moderate to vigorous physical activity (MVPA) (similar to exercise), which is needed to determine whether the participant meets current physical activity guidelines (for example, 150 minutes of MVPA per week performed in bouts of at least 20 minutes). In this paper, we illustrate how we used SAS® arrays to calculate the number of 20-minute bouts of MVPA per day. We provide working code on how we processed Fitbit Flex data from 63 healthy volunteers whose physical activities were monitored daily for a period of 12 months.
Faith Parsons, Columbia University Medical Center
Keith M Diaz, Columbia University Medical Center
Jacob E Julian, Columbia University Medical Center
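The bout measure described in the abstract above can be illustrated with a short sketch. It is written in Python rather than with SAS arrays, and it assumes a simple definition that the abstract does not spell out: a bout is an uninterrupted run of at least 20 consecutive MVPA minutes, and a longer run still counts as one bout. The minute-level flags are hypothetical.

```python
def count_bouts(mvpa_flags, min_length=20):
    """Count runs of at least `min_length` consecutive MVPA minutes.

    mvpa_flags: one truthy/falsy value per minute of the day.
    Assumption: any non-MVPA minute ends the current run (no tolerance
    for brief interruptions).
    """
    bouts = 0
    run = 0
    for flag in mvpa_flags:
        if flag:
            run += 1
            if run == min_length:
                bouts += 1  # count each qualifying run once, when it reaches the threshold
        else:
            run = 0
    return bouts

# Hypothetical day: 25 MVPA minutes, 5 rest, 19 MVPA, 10 rest, 40 MVPA
day = [1] * 25 + [0] * 5 + [1] * 19 + [0] * 10 + [1] * 40
print(count_bouts(day))  # prints 2: the 19-minute run falls short
```

A study protocol that tolerates short interruptions within a bout would need a different reset rule than the one assumed here.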
Our daily work in SAS® involves manipulation of many independent data sets, and a lot of time can be saved if independent data sets can be manipulated simultaneously. This paper presents our interface RunParallel, which opens multiple SAS sessions and controls which SAS procedures and DATA steps to run on which sessions by parsing comments such as /*EP.SINGLE*/ and /*EP.END*/. The user can easily parallelize any code by simply wrapping procedure steps and DATA steps in such comments and executing in RunParallel. The original structure of the SAS code is preserved so that it can be developed and run in serial regardless of the RunParallel comments. When applied in SAS programs with many external data sources and heavy computations, RunParallel can give major performance boosts. Among our examples we include a simulation that demonstrates how to run DATA steps in parallel, where the performance gain greatly outweighs the single minute it takes to add RunParallel comments to the code. In a world full of big data, a lot of time can be saved by running in parallel in a comprehensive way.
Jingyu She, Danica Pension
Tomislav Kajinic, Danica Pension
The Add Health Parent Study is using a new and innovative method to augment our other interview verification strategies. Typical verification strategies include calling respondents to ask questions about their interview, recording pieces of interaction (CARI - Computer Aided Recorded Interview), and analyzing timing data to see that each interview was within a reasonable length. Geocoding adds another tool to the toolbox for verifications. By applying street-level geocoding to the address where an interview is reported to be conducted and comparing that to a captured latitude/longitude reading from a GPS tracking device, we are able to compute the distance between two points. If that distance is very small and time stamps are close to each other, then the evidence points to the field interviewer being present at the respondent's address during the interview. For our project, the street-level geocoding to an address is done using SAS® PROC GEOCODE. Our paper describes how to obtain a US address database from the SAS website and how it can be used in PROC GEOCODE. We also briefly compare this technique to using the Google Map API and Python as an alternative.
Chris Carson, RTI International
Lilia Filippenko, RTI International
Mai Nguyen, RTI International
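The distance check described in the abstract above (geocoded interview address versus captured GPS reading) can be sketched with the haversine great-circle formula. This is a Python illustration and does not use PROC GEOCODE. The coordinates, the 0.1 km threshold, and the helper names are hypothetical, and the time-stamp comparison the abstract also mentions is left out.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def interviewer_likely_present(address, gps, max_km=0.1):
    """True when the GPS reading falls within max_km of the geocoded address."""
    return haversine_km(*address, *gps) <= max_km

# Hypothetical coordinates: geocoded address and GPS reading during the interview
geocoded_address = (35.9049, -78.8648)
gps_reading = (35.9052, -78.8650)

distance = haversine_km(*geocoded_address, *gps_reading)
print(f"distance: {distance * 1000:.0f} m, "
      f"likely present: {interviewer_likely_present(geocoded_address, gps_reading)}")
```

In practice the threshold would have to allow for both street-level geocoding error and GPS error, so it is a tuning choice rather than a fixed constant.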
SAS® Studio includes tasks that can be used to generate SAS® programs to process SAS data sets. The graph tasks generate SAS programs that use ODS Graphics to produce a range of plots and charts. SAS® Enterprise Guide 7.1 and SAS® Add-In for Microsoft Office 7.1 can also make use of these SAS Studio tasks to generate graphs with ODS Graphics, even though their built-in tasks use SAS/GRAPH®. This paper describes these SAS Studio graph tasks.
Philip Holland, Holland Numerics Ltd
Marriage laws have changed in 38 states, including the District of Columbia, permitting same-sex couples to marry since 2004. However, since 1996, 13 states have banned same-sex marriages, some of which are under legal challenge. A U.S. animated map, made using the SAS® GMAP procedure in SAS® 9.4, shows the changes in the acceptance, rejection, or undecided legal status of this issue, year-by-year. In this study, we also present other SAS data visualization techniques to show how concentrations (percentages) of married same-sex couples and their relationships have evolved over time, thereby instigating changes in marriage laws. Using a SAS GRAPH display, we attempt to show how both the total number of same-sex-couple households and the percent that are married have changed on a state-by-state basis since 2005. These changes are also compared to the year in which marriage laws permitting same-sex marriages were enacted, and are followed over time since that year. It is possible to examine trends on an annual basis from 2005 to 2013 using the American Community Survey one-year estimates. The SAS procedures and SAS code used to create the data visualization and data manipulation techniques are provided.
Bula Ghose, HHS/ASPE
Susan Queen, National Center for Health Statistics
Joan Turek, Health and Human Services
The objective of this study is to use the GLM procedure in SAS® to solve a complex linkage problem with multiple test forms in educational research. Typically, the ABSORB option in the GLM procedure makes this task relatively easy to implement. Note that for educational assessments, to apply one-dimensional combinations of two-parameter logistic (2PL) models (Hambleton, Swaminathan, and Rogers 1991, ch. 1) and generalized partial credit models (Muraki 1997) to a large-scale high-stakes testing program with very frequent administrations requires a practical approach to link test forms. Haberman (2009) suggested a pragmatic solution of simultaneous linking to solve the challenging linking problem. In this solution, many separately calibrated test forms are linked by the use of least-squares methods. In SAS, the GLM procedure can be used to implement this algorithm by the use of the ABSORB option for the variable that specifies administrations, as long as the data are sorted by order of administration. This paper presents the use of SAS to examine the application of this proposed methodology to a simple case of real data.
Lili Yao, Educational Testing Service
The LACE readmission risk score is a methodology used by Kaiser Permanente Northwest (KPNW) to target and customize readmission prevention strategies for patients admitted to the hospital. This presentation shares how KPNW used SAS® in combination with Epic's Datalink to integrate the LACE score into its electronic health record (EHR) for usage in real time. The LACE score is an objective measure, composed of four components including: L) length of stay; A) acuity of admission; C) pre-existing co-morbidities; and E) Emergency department (ED) visits in the prior six months. SAS was used to perform complex calculations and combine data from multiple sources (which was not possible for the EHR alone), and then to calculate a score that was integrated back into the EHR. The technical approach includes a trigger macro to kick off the process once the database ETL completes, several explicit and implicit proc SQL statements, a volatile temp table for filtering, and a series of SORT, MEANS, TRANSPOSE, and EXPORT procedures. We walk through the technical approach taken to generate and integrate the LACE score into Epic, as well as describe the challenges we faced, how we overcame them, and the beneficial results we have gained throughout the process.
Delilah Moore, Kaiser Permanente
This paper introduces an extremely fast and simple implementation of the survey cell collapsing process. Prior implementations had used either several SQL queries or numerous DATA step arrays, with multiple data reads. This new approach uses a single hash object with a maximum of two data reads. The hash object provides an efficient and convenient mechanism for quick data storage and retrieval (sub-second total run time).
Ahmed Al-Attar, AnA Data Warehousing Consulting, LLC
The DOCUMENT procedure is a little known procedure that can save you vast amounts of time and effort when managing the output of your SAS® programming efforts. This procedure is deeply associated with the mechanism by which SAS controls output in the Output Delivery System (ODS). Have you ever wished you didn't have to modify and rerun the report-generating program every time there was some tweak in the desired report? PROC DOCUMENT enables you to store one version of the report as an ODS Document Object and then call it out in many different output forms, such as PDF, HTML, listing, RTF, and so on, without rerunning the code. Have you ever wished you could extract those pages of the output that apply to certain BY variables such as State, StudentName, or CarModel? With PROC DOCUMENT, you have where capabilities to extract these. Do you want to customize the table of contents that assorted SAS procedures produce when you make frames for the table of contents with HTML, or use the facilities available for PDF? PROC DOCUMENT enables you to get to the inner workings of ODS and manipulate them. This paper addresses PROC DOCUMENT from the viewpoint of end results, rather than provide a complete technical review of how to do the task at hand. The emphasis is on the benefits of using the procedure, not on detailed mechanics.
Roger Muller, Data-To-Events, Inc.
A number of SAS® tools can be used to report data, such as the PRINT, MEANS, TABULATE, and REPORT procedures. The REPORT procedure is a single tool that can produce many of the same results as other SAS tools. Not only can it create detailed reports like PROC PRINT can, but it can summarize and calculate data like the MEANS and TABULATE procedures do. Unfortunately, despite its power, PROC REPORT seems to be used less often than the other tools, possibly due to its seemingly complex coding. This paper uses PROC REPORT and the Output Delivery System (ODS) to export a big data set into a customized XML file that a user who is not familiar with SAS can easily read. Several options for the COLUMN, DEFINE, and COMPUTE statements are shown that enable you to present your data in a more colorful way. We show how to control the format of the selected columns and rows, make column headings more meaningful, and how to color selected cells differently to bring attention to the most important data.
Guihong Chen, TCF Bank
By default, the SAS® hash object permits only entries whose keys, defined in its key portion, are unique. While in certain programming applications this is a rather utile feature, there are also others for which being able to insert and manipulate entries with duplicate keys is imperative. Such an ability, facilitated in SAS since SAS® 9.2, was a welcome development: It vastly expanded the functionality of the hash object and eliminated the necessity to work around the distinct-key limitation using custom code. However, nothing comes without a price, and the ability of the hash object to store duplicate key entries is no exception. In particular, additional hash object methods had to be--and were--developed to handle specific entries sharing the same key. The extra price is that using these methods is surely not quite as straightforward as the simple corresponding operations on distinct-key tables, and the documentation alone is a rather poor help for making them work in practice. Rather extensive experimentation and investigative coding is necessary to make that happen. This paper is a result of such endeavor, and hopefully, it will save those who delve into it a good deal of time and frustration.
Paul Dorfman, Dorfman Consulting
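To make the duplicate-key idea in the abstract above concrete, here is a minimal sketch of a hash object that accepts several entries per key and then walks through all entries sharing one key. The variable names and the sample values are hypothetical, and the FIND/FIND_NEXT loop is only one of several ways to traverse a same-key group; it is not necessarily the approach the paper recommends.

```sas
data _null_;
  length id 8 item $8;

  /* MULTIDATA:'Y' allows more than one entry per key value */
  declare hash h(multidata: 'y');
  h.defineKey('id');
  h.defineData('id', 'item');
  h.defineDone();

  /* Load three entries; key 1 is used twice */
  id = 1; item = 'apple';  rc = h.add();
  id = 1; item = 'banana'; rc = h.add();
  id = 2; item = 'cherry'; rc = h.add();

  /* Retrieve every entry stored under key 1 */
  id = 1;
  rc = h.find();              /* first entry for the key, if any */
  do while (rc = 0);
    put id= item=;
    rc = h.find_next();       /* nonzero when the group is exhausted */
  end;
run;
```

Without the MULTIDATA argument, the second ADD call for key 1 would fail with a nonzero return code, which is the default distinct-key behavior the abstract refers to.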
One of the truths about SAS® is that there are, at a minimum, three approaches to achieve any intended task, and each approach has its own pros and cons. Identifying and using efficient SAS programming techniques is recommended, and efficient programming becomes mandatory when larger data sets are accessed. This paper describes the efficiency of virtual access and various situations in which to use virtual access of data sets using the OPEN, FETCH, and CLOSE functions with %SYSFUNC and %DO loops. There are several ways to get data set attributes such as the number of observations, the number of variables, the types of variables, and so on. It becomes more efficient to access larger data sets using the OPEN function with %SYSFUNC. The FETCH function, used with OPEN, makes it possible to access the values of the variables in a SAS data set for conditional and iterative execution with %DO loops. In the following situations, virtual access of the SAS data set becomes more efficient on larger data sets. Situation 1: It is often required to split the master data set into several pieces depending upon a certain criterion for batch submission, because the data sets must be independent. Situation 2: For many reports and dashboards and for array processing, it is required to keep the relevant variables together instead of maintaining the original data set order. Situation 3: For most statistical analysis, particularly correlation and regression, a widely and frequently used SAS procedure requires a dynamic variable list to be passed. Creating a single macro variable list might run into an issue with the macro variable length on larger transactional data sets. It is recommended that you prepare the list of variables as a data set and access it using the proposed approach in many places, instead of creating several macro variables.
Amarnath Vijayarangan, Emmes Services Pvt Ltd, India
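The OPEN, FETCH, and CLOSE pattern described in the abstract above might look something like the following sketch. The macro name `%show_values`, its parameters, and the use of SASHELP.CLASS are hypothetical choices for illustration, and the sketch assumes the requested variable is character (GETVARC); a numeric variable would need GETVARN instead.

```sas
%macro show_values(ds=sashelp.class, var=name);
  %local dsid nobs varnum rc value;

  /* Open the data set without a DATA step; 0 means the open failed */
  %let dsid = %sysfunc(open(&ds, i));
  %if &dsid = 0 %then %do;
    %put ERROR: Could not open &ds..;
    %return;
  %end;

  /* Read an attribute directly from the data set header */
  %let nobs = %sysfunc(attrn(&dsid, nlobs));
  %put NOTE: &ds has &nobs observations.;

  /* FETCH returns 0 while an observation is read, nonzero at the end */
  %let varnum = %sysfunc(varnum(&dsid, &var));
  %do %while (%sysfunc(fetch(&dsid)) = 0);
    %let value = %sysfunc(getvarc(&dsid, &varnum));
    %put &=value;
  %end;

  /* Always close the data set to release the lock */
  %let rc = %sysfunc(close(&dsid));
%mend show_values;

%show_values()
```

Inside the %DO %WHILE loop, each fetched value could drive a conditional step or a macro call instead of a simple %PUT, which is the kind of iterative use the abstract describes.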
Few data visualizations are as striking or as useful as heat maps, and fortunately there are many applications for them. A heat map displaying eye-tracking data is a particularly potent example: the intensity of a viewer's gaze is quantified and superimposed over the image being viewed, and the resulting data display is often stunning and informative. This paper shows how to use Base SAS® to prepare, smooth, and transform eye-tracking data and ultimately render it on top of a corresponding image. By customizing a graphical template and using specific Graph Template Language (GTL) options, a heat map can be drawn precisely so that the user maintains pixel-level control. In this talk, and in the related paper, eye-tracking data is used primarily, but the techniques provided are easily adapted to other fields such as sports, weather, and marketing.
Matthew Duchnowski, Educational Testing Service
Making a data delivery to a client is a complicated endeavor. There are many aspects that must be carefully considered and planned for: de-identification, public use versus restricted access, documentation, ancillary files such as programs, formats, and so on, and methods of data transfer, among others. This paper provides a blueprint for planning and executing your data delivery.
Louise Hadden, Abt Associates
Software quality comprises a combination of both functional and performance requirements that specify not only what software should accomplish, but also how well it should accomplish it. Recoverability--a common performance objective--represents the timeliness and efficiency with which software or a system can resume functioning following a catastrophic failure. Thus, requirements for high availability software often specify the recovery time objective (RTO), or the maximum amount of time that software might be down following an unplanned failure or a planned outage. While systems demanding high or near perfect availability require redundant hardware and network resources, and additional infrastructure, software must also facilitate rapid recovery. And, in environments in which system or hardware redundancy is infeasible, recoverability can be improved only through effective software development practices. Because even the most robust code can fail under duress or due to unavoidable or unpredictable circumstances, software reliability must incorporate recoverability principles and methods. This text introduces the TEACH mnemonic that describes guiding principles that software recovery should be timely, efficient, autonomous, constant, and harmless. Moreover, the text introduces the SPICIER mnemonic that describes discrete phases in the recovery period, each of which can benefit from and be optimized with TEACH principles. Software failure is inevitable, but negative impacts can be minimized through SAS® development best practices.
Troy Hughes, Datmesis Analytics
Even if you're not a GIS mapping pro, it pays to have some geographic problem-solving techniques in your back pocket. In this paper we illustrate a general approach to finding the closest location to any given US zip code, with a specific, user-accessible example of how to do it, using only Base SAS®. We also suggest a method for implementing the solution in a production environment, as well as demonstrate how parallel processing can be used to cut down on computing time if there are hardware constraints.
Andrew Clapson, MD Financial Management
Annmarie Smith, HomeServe USA
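One possible way to approach the closest-location problem from the abstract above with Base SAS alone is sketched below. It assumes the SASHELP.ZIPCODE data set is installed (with latitude in Y and longitude in X, in degrees) and uses the GEODIST function for distances in miles. The `stores` data set, its coordinates, and the zip code 19104 are made-up inputs, and this is not necessarily the method the authors present.

```sas
/* Hypothetical candidate locations with latitude and longitude in degrees */
data stores;
  input store $ lat lon;
  datalines;
A 40.71 -74.01
B 41.88 -87.63
;
run;

proc sql;
  /* Pair the chosen zip code centroid with every store and rank by distance */
  create table nearest as
  select z.zip,
         s.store,
         geodist(z.y, z.x, s.lat, s.lon, 'M') as miles
  from sashelp.zipcode(where=(zip = 19104)) as z,
       stores as s
  order by miles;
quit;
```

The first row of `nearest` holds the closest store. For many zip codes and many locations the cross join grows quickly, which is where splitting the work into parallel batches, as the abstract suggests, could help.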
I get annoyed when macros tread all over my SAS® environment, macro variables and data sets. So how do you write macros that play nicely and then clean up afterwards? This paper describes techniques for writing macros that do just that.
Philip Holland, Holland Numerics Ltd
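As a small sketch of what a macro that cleans up after itself could look like (in the spirit of the abstract above), the example below keeps its macro variables local and deletes its temporary data set before it ends. The macro name `%show_mean`, its parameters, and the `_tmp_` prefix are hypothetical, and the paper may recommend different or additional techniques.

```sas
%macro show_mean(ds=sashelp.class, var=height);
  /* %LOCAL keeps helper variables out of the caller's symbol tables */
  %local tmp;

  /* &SYSINDEX differs for each macro invocation, so the name is unlikely
     to collide with an existing WORK data set */
  %let tmp = _tmp_&sysindex;

  proc means data=&ds noprint;
    var &var;
    output out=work.&tmp mean=mean_value;
  run;

  proc print data=work.&tmp noobs;
    var mean_value;
  run;

  /* Remove the temporary data set so WORK is left as it was found */
  proc datasets library=work nolist;
    delete &tmp;
  quit;
%mend show_mean;

%show_mean()
```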
Do you write reports that sometimes have missing categories across all class variables? Some programmers write all sorts of additional DATA step code in order to show the zeros for the missing rows or columns. Did you ever wonder whether there is an easier way to accomplish this? PROC MEANS and PROC TABULATE, in conjunction with PROC FORMAT, can handle this situation with a couple of powerful options. With PROC TABULATE, we can use the PRELOADFMT and PRINTMISS options in conjunction with a user-defined format in PROC FORMAT to accomplish this task. With PROC SUMMARY, we can use the COMPLETETYPES option to get all the rows with zeros. This paper uses examples from Census Bureau tabulations to illustrate the use of these procedures and options to preserve missing rows or columns.
Chris Boniface, Census Bureau
Janet Wysocki, U.S. Census Bureau
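The options named in the abstract above can be tried on a small example such as the following sketch. It uses SASHELP.CLASS and a hypothetical format `$sexfmt` that includes a category ('U') that never occurs in the data, so the zero row has to come from the preloaded format. The Census Bureau tabulations used in the paper are not reproduced here.

```sas
proc format;
  /* 'U' does not occur in SASHELP.CLASS, so it is a missing category */
  value $sexfmt 'F' = 'Female'
                'M' = 'Male'
                'U' = 'Unknown';
run;

/* PROC TABULATE: PRELOADFMT plus PRINTMISS shows the empty category */
proc tabulate data=sashelp.class;
  class sex / preloadfmt;
  format sex $sexfmt.;
  table sex, n / printmiss misstext='0';
run;

/* PROC SUMMARY: COMPLETETYPES plus PRELOADFMT writes a row with _FREQ_ = 0 */
proc summary data=sashelp.class completetypes nway;
  class sex / preloadfmt;
  format sex $sexfmt.;
  output out=counts;
run;
```

In both procedures PRELOADFMT only takes effect when a format is assigned to the class variable and a companion option (PRINTMISS in the TABLE statement, or COMPLETETYPES on the PROC SUMMARY statement) is present.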