Migration to a SAS® Grid Computing environment provides many advantages. However, such migration might not be free from challenges especially considering users' pre-migration routines and programming practices. While SAS® provides good graphical user interface solutions (for example, SAS® Enterprise Guide®) to develop and submit the SAS code to SAS Grid Computing, some situations might need command-line batch submission of a group of related SAS programs in a particular sequence. Saving individual log and output files for each program might also be a favorite routine in many organizations. SAS has provided the SAS Grid Manager Client Utility and SASGSUB commands to enable command-line submission of SAS programs to the grid. However, submitting a sequence of SAS programs in a conventional batch program style and getting individual logs and outputs for them needs a customized approach. This paper presents such an approach. In addition, an HTML and JavaScript tool developed in-house is introduced. This tool automates the generation of a SAS program that almost emulates a conventional scenario of submitting a batch program in a command-line shell using SASGSUB commands.
Mohammad-Reza Rezai, Institute for Clinical Evaluative Sciences
Mahmoud Azimaee, Institute for Clinical Evaluative Sciences
Jiming Fang, Institute for Clinical Evaluative Sciences
Jason Chai-Onn, Institute for Clinical Evaluative Sciences
The Clinical Data Interchange Standards Consortium (CDISC) encompasses a variety of standards for medical research. Amongst the several standards developed by the CDISC organization are standards for data collection (Clinical Data Acquisition Standards Harmonization CDASH), data submission (Study Data Tabulation Model SDTM), and data analysis (Analysis Data Model ADaM). These standards were originally developed with drug development in mind. Therapeutic Area User Guides (TAUGs) have been a recent focus to provide advice, examples, and explanations for collecting and submitting data for a specific disease. Non-subjects even have a way to collect data using the Associated Persons Implementation Guide (APIG). SDTM domains for medical devices were published in 2012. Interestingly, the use of device domains in the TAUGs occurs in 14 out of 18 TAUGs, providing examples of the use of various device domains. Drug-device studies also provide a contrast on adoption of CDISC standards for drug submissions versus device submissions. Adoption of SDTM in general and the seven device SDTM domains by the medical device industry has been slow. Reasons for the slow adoption are discussed in this paper.
Carey Smoak, DataCeutics
Creating sophisticated, visually stunning reports is imperative in today s business environment, but is your fancy report really accessible to all? Let s explore some simple enhancements that the fourth maintenance release of SAS® 9.4 made to Output Delivery System (ODS) layout and the Report Writing Interface that will truly empower you to accommodate people who use assistive technology. ODS now provides the tools for you to meet Section 508 compliance and to create an engaging experience for all who consume your reports.
Daniel OConnor, SAS
When a large and important project with a strict deadline hits your desk, it's easy to revert to those tried-and-true SAS® programming techniques that have been successful for you in the past. In fact, trying to learn new techniques at such a time can prove to be distracting and a waste of precious time. However, the lull after a project's completion is the perfect time to reassess your approach and see whether there are any new features added to the SAS arsenal since the last time you looked that could be of great use the next time around. Such a post-project post-mortem has provided me with the opportunity to learn about several new features that will prove to be hugely valuable in the next release of my project. For example: 1) The PRESENV option and procedure 2) Fuzzy matching with the COMPGED function 3) The ODS POWERPOINT statement 4) SAS® Enterprise Guide® enhancements, including copying and pasting process flows and the SAS Macro Variable Viewer
Lisa Horwitz, SAS
In this paper, a SAS® macro is introduced that can search and replace any string in a SAS program. To use the macro, the user needs only to pass the search string to a folder. If the user wants to use the replacement function, the user also needs to pass the replacement string. The macro checks all of the SAS programs in the folder and subfolders to find out which files contain the search string. The macro generates new SAS files for replacements so that the old files are not affected. An HTML report is generated by the macro to include the original file locations, the line numbers of the SAS code that contain the search string, and the SAS code with search strings highlighted in yellow. If you use the replacement function, the HTML report also includes the location information for the new SAS files. The location information in the HTML report is created with hyperlinks so that the user can directly open the files from the report.
Ting Sa, Cincinnati Children's Hospital Medical Center
We have a lot of chances to use time-to-event (survival) analysis, especially in the biomedical and pharmaceutical fields. SAS® provides the LIFETEST procedure to calculate Kaplan-Meier estimates for survival function and to delineate a survival plot. The PHREG procedure is used in Cox regression models to estimate the effect of predictors in hazard rates. Programs with ODS tables that are defined by PROC LIFETEST and PROC PHREG can provide more statistical information from the generated data sets. This paper provides a macro that uses PROC LIFETEST and PROC PHREG with ODS. It helps users to have a survival plot with estimates that include the subject at risk, events and total subject number, survival rate with median and 95% confidence interval, and hazard ratio estimates with 95% confidence interval. Some of these estimates are optional in the macro, so users can select what they need to display in the output. (Subject at risk and event and subject number are not optional.) Users can also specify the tick marks in the X-axis and subject at risk table, for example, every 10 or 20 units. The macro dynamic calculates the maximum for the X-axis and uses the interval that the user specified. Finally, the macro uses ODS and can be output in any document files, including JPG, PDF, and RTF formats.
Chia-Ling Wu, University of Southern California
SAS® functions provide amazing power to your DATA step programming. Some of these functions are essential some of them help you avoid writing volumes of unnecessary code. This talk covers some of the most useful SAS functions. Some of these functions might be new to you, and they will change the way you program and approach common programming tasks. The majority of the functions described in this talk work with character data. There are functions that search for strings, and others that can find and replace strings or join strings together. Still others can measure the spelling distance between two strings (useful for 'fuzzy' matching). Some of the newest and most amazing functions are not functions at all, but call routines. Did you know that you can sort values within an observation? Did you know that not only can you identify the largest or smallest value in a list of variables, but you can identify the second- or third- or nth-largest or smallest value? A knowledge of the functions described here will make you a much better SAS programmer.
Ron Cody, Camp Verde Associates
The SQL procedure has a number of powerful and elegant language features for SQL users. This hands-on workshop emphasizes highly valuable and widely usable advanced programming techniques that will help users of Base SAS® harness the power of PROC SQL. Topics include using PROC SQL to identify FIRST.row, LAST.row, and Between.rows in BY-group processing; constructing and searching the contents of a value-list macro variable for a specific value; data validation operations using various integrity constraints; data summary operations to process down rows and across columns; and using the MSGLEVEL= system option and _METHOD SQL option to capture vital processing and the algorithm selected and used by the optimizer when processing a query.
Kirk Paul Lafler, Software Intelligence Corporation
Significant growth of the Internet has created an enormous volume of unstructured text data. In recent years, the amount of this type of data that is available for analysis has exploded. While the amount of textual data is increasing rapidly, an ability to obtain key pieces of information from such data in a fast, flexible, and efficient way is still posing challenges. This paper introduces SAS® Contextual Analysis In-Database Scoring for Hadoop, which integrates SAS® Contextual Analysis with the SAS® Embedded Process. SAS® Contextual Analysis enables users to customize their text analytics models in order to realize the value of their text-based data. The SAS® Embedded Process enables users to take advantage of SAS® Scoring Accelerator for Hadoop to run scoring models. By using these key SAS® technologies, the overall experience of analyzing unstructured text data can be greatly improved. The paper also provides guidelines and examples on how to publish and run category, concept, and sentiment models for text analytics in Hadoop.
Seung Lee, SAS
Xu Yang, SAS
Saratendu Sethi, SAS
The SAS® code looks perfect. You submit it and to your amazement, there is a problem with the CREATE TABLE statement. You need to change the table definition, ever so slightly, but how? Explicit pass-through? That's not an option. Fortunately, there are a handful of SAS options that can save the day. This presentation covers everything you need to know in order to adjust the SAS CREATE TABLE statements using SAS options. This presentation covers the following SAS options: DBCREATE_TABLE_OPTS=, POST_STMT_OPTS=, POST_TABLE_OPTS=, PRE_STMT_OPTS=, and PRE_TABLE_OPTS=. We use Hadoop and Oracle examples to show why these options can make your life easier. From there, we use real code to show you how to use them.
Jeff Bailey, SAS
There are many good validation tools for Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) such as Pinnacle 21. However, the power and customizability of SAS® provide an effective tool for validating SDTM data sets used in clinical trials FDA submissions. This paper presents three distinct methods of using SAS to validate the transformation from Electronic Data Capture (EDC) data into CDISC SDTM format. This includes: duplicate programming, an independent SAS program used to transform EDC data with PROC COMPARE; rules checker, a SAS program to verify a specific SDTM or regulatory rules applied to SDTM SAS data sets; and transformation validation, a SAS macro used to compare EDC data and SDTM using PROC FREQ to identify outliers. The three examples illustrate the diverse approaches to applying SAS programs to catch errors in data standard compliance or identify inconsistencies that would otherwise be missed by other general purpose utilities. The stakes are high when preparing for an FDA submission. Catching errors in SDTM during validation prior to a submission can mean the difference between success or failure for a drug or medical device.
Sy Truong, Pharmacyclics
I have come up with a way to use the output of the SCAPROC procedure to produce DOT directives, which are then put through the Graphviz engine to produce a diagram. This allows the production of flowcharts of SAS® code automatically. I have enhanced the charts to also show the longest steps by run time, so even if you look at thousands of steps in a complex program, you can easily see the structure and flow of it, and where most of the time is spent, just by having a look for a few seconds. Great for documentation, benchmarking, tuning, understanding, and more.
Philip Mason, Wood Street Consultants Ltd.
In the pharmaceutical industry, the Clinical Data Interchange Standards Consortium s (CDISC) Study Data Tabulation Model (SDTM) is required by the US Food and Drug Administration (FDA) as the standard data structure for regulatory submission of clinical data. Manually mapping raw data to SDTM domains can be time consuming and error prone, considering the increasing complexity of clinical data. However, this process can be much more efficient if the raw data is collected using the Clinical Data Acquisition Standards Harmonization (CDASH) standard, allowing for the automatic conversion to the SDTM data structure. This paper introduces a macro that can automatically create a SAS® program for each SDTM domain (for example, dm.sas for Demography [DM]), that maps CDASH data to SDTM data. The macro compares the attributes of CDASH raw data sets with SDTM domains to generate SAS code that performs the mapping. Each SAS program does the following: 1) sets up variables and assigns their proper order; 2) converts date and time to ISO8601 standard; 3) converts numeric variables to character variables; and 4) transposes the data sets from wide to long for the Findings and Events domains. This macro, which sets up the basic frame of SDTM mapping, can minimize the manual work for SAS programmers or, in some cases, completely handle some simple domains without any further modifications. This greatly increases the efficiency and speed of the SDTM conversion process.
Hao Xu, McDougall Scientific
Hong Chen, McDougall Scientific
Let's walk through an example of communicating from the SAS® client to SAS® Viya . The demonstration focuses on how to use SAS® language to establish a session, transport and persist data, and receive results. Learn how to establish communication with SAS Viya. Explore topics such as: What is a session? How do I make requests? What does my SAS log tell me? Get a deeper understanding of data location on the client and the server side. Learn about applying existing user formats, how to get listings or reports, and how to query sessions, data, and properties.
Denise Poll, SAS
It's essential that SAS® users enhance their skills to implement best-practice programming techniques when using Base SAS® software. This presentation illustrates core concepts with examples to ensure that code is readable, clearly written, understandable, structured, portable, and maintainable. Attendees learn how to apply good programming techniques including implementing naming conventions for data sets, variables, programs, and libraries; code appearance and structure using modular design, logic scenarios, controlled loops, subroutines and embedded control flow; code compatibility and portability across applications and operating platforms; developing readable code and program documentation; applying statements, options, and definitions to achieve the greatest advantage in the program environment; and implementing program generality into code to enable its continued operation with little or no modifications.
Kirk Paul Lafler, Software Intelligence Corporation
Nearly every SAS® program includes logic that causes certain code to be executed only when specific conditions are met. This is commonly done using the IF-THEN/ELSE syntax. This paper explores various ways to construct conditional SAS logic, some of which might provide advantages over the IF statement in certain situations. Topics include the SELECT statement, the IFC and IFN functions, and the CHOOSE and WHICH families of functions, as well as some more esoteric methods. We also discuss the intricacies of the subsetting IF statement and explain the difference between a regular IF and the %IF macro statement.
Joshua Horstman, Nested Loop Consulting
Soon after the advent of the SAS® hash object in SAS®9, its early adopters realized that its potential functionality is much broader than merely using its fast table lookup capability for file matching. This is because in reality, the hash object is a versatile data storage structure with a roster of standard table operations such as create, drop, insert, delete, clear, search, retrieve, update, order, and enumerate. Since it is memory-resident and its key-access operations execute in O(1) time, it runs them as fast as or faster than other canned SAS techniques, with the added bonus of not having to code around their inherent limitations. Another advantage of the hash object as compared to the methods that had existed before its implementation is its dynamic, run-time nature and the ability to handle I/O all by itself, independently of the intrinsic statements of a DATA step or DS2 program calling its methods. The hash object operations, or their combination thereof, lend themselves to diverse SAS programming functionalities well beyond the original focus on data search and retrieval. In this paper, which can be thought of as a preview of a SAS book being written by the authors, we aim to present this logical connection using the power of example.
Paul Dorfman, Dorfman Consulting
Don Henderson, Henderson Consulting Services, LLC
The SAS® Macro Language gives you the power to create tools that, to a large extent, think for themselves. How often have you used a macro that required your input, and you thought to yourself, Why do I need to provide this information when SAS® already knows it? SAS might already know most of this information, but how does SAS direct your macro programs to self-discern the information that they need? Fortunately, there are a number of functions and tools in SAS that can intelligently enable your programs to find and use the information that they require. If you provide a variable name, SAS should know its type and length. If you provide a data set name, SAS should know its list of variables. If you provide a library or libref, SAS should know the full list of data sets that it contains. In each of these situations, functions can be used by the macro language to determine and return information. By providing a libref, functions can determine the library's physical location and the list of data sets it contains. By providing a data set, they can return the names and attributes of any of the variables that it contains. These functions can read and write data, create directories, build lists of files in a folder, and build lists of folders. Maximize your macro's intelligence; learn and use these functions.
Art Carpenter, California Occidental Consultants
In the pharmaceutical industry, we find ourselves having to re-run our programs repeatedly for each deliverable. These programs can be run individually in an interactive SAS® session, which enables us to review the logs as we execute the programs. We could run the individual programs in batch and open each individual log to review for unwanted log messages, such as ERROR, WARNING, uninitialized, have been converted to, and so on. Both of these approaches are fine if there are only a handful of programs to execute. But what do you do if you have hundreds of programs that need to be re-run? Do you want to open every single one of the programs and search for unwanted messages? This manual approach could take hours and is prone to accidental oversight. This paper discusses a macro that searches a specified directory and checks either all the logs in the directory, only logs with a specific naming convention, or only the files listed. The macro then produces a report that lists all the files checked and indicates whether issues were found.
Richann Watson, Experis
The LUA procedure is a relatively new SAS® procedure, having been available since SAS® 9.4. It allows for the Lua language to be used as an interface to SAS, as an alternative scripting language to the SAS macro facility. This paper compares and contrasts PROC LUA with the SAS macro facility, showing examples of approaches and highlighting the advantages and disadvantages of each.
Anand Vijayaraghavan, SAS
This paper shows how to use Base SAS® to create unique datetime stamps that can be used for naming external files. These filenames provide automatic versioning for systems and are intuitive and completely sortable. In addition, they provide enhanced flexibility compared to generation data sets, which can be created by SAS® or by the operating system.
Joe DeShon, Boehringer Ingelheim Animal Health
Clinical research study enrollment data consists of subject identifiers and enrollment dates that are used by investigators to monitor enrollment progress. Meeting study enrollment targets is critical to ensuring there will be enough data and end points to enable the statistical power of the study. For clinical trials that do not experience heavy, nearly daily enrollment, there will be a number of dates on which no subjects were enrolled. Therefore, plots of cumulative enrollment represented by a smoothed line can give a false impression, or imprecise reading, of study enrollment. A more accurate display would be a step function plot that would include dates where no subjects were enrolled. Rolling average plots often start with summing the data by month and creating a rolling average from the monthly sums. This session shows how to use the EXPAND procedure, along with the SQL and GPLOT procedures and the INTNX function, to create plots that display cumulative enrollment and rolling 6-month averages for each day. This includes filling in the dates with no subject enrollment and creating a rolling 6-month average for each date. This allows analysis of day-to-day variation as well as the short- and long-term impacts of changes, such as adding an enrollment center or initiatives to increase enrollment. This technique can be applied to any data that has gaps in dates. Examples include service history data and installation rates for a newly launched product.
Susan Schleede, University of Rochester
The DATA step is the familiar and powerful data processing language in SAS® and now SAS Viya . The DATA step's simple syntax provides row-at-a-time operations to edit, restructure, and combine data. New to the DATA step in SAS Viya are a varying-size character data type and parallel execution. Varying-size character data enables intuitive string operations that go beyond the 32KB limit of current DATA step operations. Parallel execution speeds the processing of big data by starting the DATA step on multiple machines and dividing data processing among threads on these machines. To avoid multi-threaded programming errors, the run-time environment for the DATA step is presented along with potential programming pitfalls. Come see how the DATA step in SAS Viya makes your data processing simpler and faster.
Jason Secosky, SAS
The SAS® 9.4 SGPLOT procedure is a great tool for creating all types of graphs, from business graphs to complex clinical graphs. The goal for such graphs is to convey the data in a simple and direct manner with minimal distractions. But often, you need to grab the attention of a reader in the midst of a sea of data and graphs. For such cases, you need a visual that can stand out above the rest of the noise. Such visuals insert a decorative flavor into the graph to attract the eye of the reader and to encourage them to spend more time studying the visual. This presentation discusses how you can create such attention-grabbing visuals using the SGPLOT procedure.
Sanjay Matange, SAS
The Analysis Data Model (ADaM) Basic Data Structure (BDS) can be used for many analysis needs. We all know that the SAS® DATA step is a very flexible and powerful tool for data processing. In fact, the DATA step is very useful in the creation of a non-trivial BDS data set. This paper walks through a series of examples showing use of the SAS DATA step when deriving rows in BDS. These examples include creating new parameters, new time points, and changes from multiple baselines.
Sandra Minjoe
Hash objects have been supported in the DATA step and in the FCMP procedure for a while, but have you ever felt that hash objects could do a little more? For example, what if you needed to store more than doubles and character strings? Introducing PROC FCMP dictionaries. Dictionaries allow you to create references not only to numeric and character data, but they also give you fast in-memory hashing to arrays, other dictionaries, and even PROC FCMP hash objects. This paper gets you started using PROC FCMP dictionaries, describes usage syntax, and explores new programming patterns that are now available to your PROC FCMP programs, functions, and subroutines in the new SAS® Viya platform environment.
Andrew Henrick, SAS
Mike Whitcher, SAS
Karen Croft, SAS
SAS® has a very efficient and powerful way to get distances between an event and a customer. Using the tables and code located at http://support.sas.com/rnd/datavisualization/mapsonline/html/geocode.html#street, you can load latitude and longitude to addresses that you have for your events and customers. Once you have the tables downloaded from SAS, and you have run the code to get them into SAS data sets, this paper helps guide you through the rest using PROC GEOCODE and the GEODIST function. This can help you determine to whom to market an event. And, you can see how far a client is from one of your facilities.
Jason O'Day, US Bank
Creating an effective style for your graphics can make the difference between clearly conveying your message to your audience and hiding your message in a sea of lines, markers, and text. A number of books explain the concepts of effective graphics, but you need an understanding of how styles work in your environment to correctly apply those principles. The goal of this paper is to give you an in-depth discussion of how styles are applied to Output Delivery System (ODS) graphics, from the ODS style level all the way down to the graph syntax. This discussion includes information about differences in grouped versus non-grouped plots, precedence order of style application, using style references, and much more. Don't forget your scuba gear!
Dan Heath, SAS
Discover how to document your SAS® programs, data sets, and catalogs with a few lines of code that include SAS functions, macro code, and SAS metadata. Do you start every project with the best of intentions to document all of your work, and then fall short of that aspiration when deadlines loom? Learn how SAS system macro variables can provide valuable information embedded in your programs, logs, lists, catalogs, data sets and ODS output; how your programs can automatically update a processing log; and how to generate two different types of codebooks.
Louise Hadden, Abt Associates
Roberta Glass, Abt Associates
Some data is best visualized in a polar orientation, particularly when the data is directional or cyclical. Although the SG procedures and Graph Template Language (GTL) do not directly support polar coordinates, they are quite capable of drawing such graphs with a little bit of data processing. We demonstrate how to convert your data from polar coordinates to Cartesian coordinates and use the power of SG procedures to create graphs that retain the polar nature of your data. Stop going around in circles: let us show you the way out with SG procedures!
Prashant Hebbar, SAS
Sanjay Matange, SAS
The Base SAS® 9.4 Output Delivery System (ODS) EPUB destination enables users to deliver SAS® reports as e-books on Apple mobile devices. ODS EPUB e-books are truly mobile you don't need an Internet connection to read them. Just install Apple's free iBooks app, and you're good to go. This paper shows you how to create an e-book with ODS EPUB and sideload it onto your Apple device. You will learn new SAS® 9.4 techniques for including text, images, audio, and video in your ODS EPUB e-books. You will understand how to customize your e-book's table of contents (TOC) so that readers can easily navigate the e-book. And you will learn how to modify the ODS EPUB style to create specialized presentation effects. This paper provides beginning to intermediate instruction for writing e-books with ODS EPUB. Please bring your iPad, iPhone, or iPod to the presentation so that you can download and read the examples.
David Kelley, SAS
Finding daylight saving time (DST) is a common task for manipulating time series data. The date of daylight saving time changes every year. If SAS® programmers depend on manually entering the value of daylight saving time in their programs, the maintenance of the program becomes tedious. Using a SAS function can make finding the value easy. This paper discusses several ways to capture and use daylight saving time.
Chao-Ying Hsieh, Southern Company Services, Inc.
This paper discusses format enumeration (via the DICTIONARY.FORMATS view) and the new FMTINFO function that gives information about a format, such as whether it is a date or currency format.
Richard Langston, SAS
The macro language is both powerful and flexible. With this power, however, comes complexity, and this complexity often makes the language more difficult to learn and use. Fortunately, one of the key elements of the macro language is its use of macro variables, and these are easy to learn and easy to use. You can create macro variables using a number of different techniques and statements. However, the five most commonly methods are not only the most useful, but also among the easiest to master. Since macro variables are used in so many ways within the macro language, learning how they are created can also serve as an excellent introduction to the language itself. These methods include: 1) the %LET statement; 2) macro parameters (named and positional); 3) the iterative %DO statement; 4) using the INTO clause in PROC SQL; and 5) using the CALL SYMPUTX routine.
Art Carpenter, California Occidental Consultants
Formats can be used for more than just making your data look nice. They can be used in memory lookup tables and to help you create data-driven code. This paper shows you how to build a format from a data set, how to write a format out as a data set, and how to use formats to make programs data driven. Examples are provided.
Anita Measey, BMO Financial Group
Lei Sun, BMO Financial Group
In the quest for valuable analytics, access to business data through message queues provides near real-time access to the entire data life cycle. This in turn enables our analytical models to perform accurately. What does the item a user temporarily put in the shopping basket indicate, and what can be done to motivate the user? How do you recover the user who has now unsubscribed, given that the user had previously unsubscribed and re-subscribed quickly? User behavior can be captured completely and efficiently using a message queue, which causes minimal load on production systems and allows for distributed environments. There are some technical issues encountered when attempting to populate a data warehouse using events from a message queue. The presentation outlines a solution to the following issues: the message queue connection, how to ensure that messages aren't lost in transit, and how to efficiently process messages with SAS®; message definition and metadata, and how to react to changes in message structure; data architecture and which data architecture is appropriate for storing message data and other business data; late arrival of messages and how late arriving data can be loaded into slowly changing dimensions; and analytical processing and how transactional message data can be reformatted for analytical modeling. Ultimately, populating a data warehouse with message queue data can require less development than accessing source databases; however a robust architecture
Bronwen Fairbairn, Collection House Group
This paper demonstrates how to use the capabilities of SAS® Data Integration Studio to extract, load, and transform your data within a Hadoop environment. Which transformations can be used in each layer of the ELT process is illustrated using a sample use case, and the functionality of each is described. The use case steps through the process from source to target.
Darryl Yewchin, SAS
Todd Foreman, SAS
Tracking gains or losses from the purchase and sale of diverse equity holdings depends in part on whether stocks sold are assumed to be from the earliest lots acquired (a first-in, first-out queue, or FIFO queue) or the latest lots acquired (a last-in, first-out queue, or LIFO queue). Other inventory tracking applications have a similar need for application of either FIFO or LIFO rules. This presentation shows how a collection of simple ordered hash objects, in combination with a hash-of-hashes, is a made-to-order technique for easy data-step implementation of FIFO, LIFO, and other less likely rules (for example, HIFO [highest-in, first-out] and LOFO [lowest-in, first-out]).
Mark Keintz, Wharton Research Data Services
Allowing SAS® users to leverage SAS prompts when running programs is a very powerful tool. Using SAS prompts makes it easier for SAS users to submit parameter-driven programs and for developers to create robust, data-driven programs. This presentation demonstrates how to create SAS prompts from SAS® Enterprise Guide® and shows how to roll them out to users so that they can take advantage of them from SAS Enterprise Guide, the SAS® Add-In for Microsoft Office, and the SAS® Stored Process Web Application.
Brian Varney, Experis
When you look at examples of the REPORT procedure, you see code that tests _BREAK_ and _RBREAK_, but you wonder what s the breakdown of the COMPUTE block? And, sometimes, you need more than one break line on a report, or you need a customized or adjusted number at the break. Everything in PROC REPORT that is advanced seems to involve a COMPUTE block. This paper provides examples of advanced PROC REPORT output that uses _BREAK_ and _RBREAK_ to customize the extra break lines that you can request with PROC REPORT. Examples include how to get custom percentages with PROC REPORT, how to get multiple break lines at the bottom of the report, how to customize break lines, and how to customize LINE statement output. This presentation is aimed at the intermediate to advanced report writer who knows some about PROC REPORT, but wants to get the breakdown of how to do more with PROC REPORT and the COMPUTE block.
Cynthia Zender, SAS
This paper shows how you can reduce the computing footprint of your SAS® applications without compromising your end products. The paper presents the 15 axioms of going green with your SAS applications. The axioms are proven, real-world techniques for reducing the computer resources used by your SAS programs. When you follow these axioms, your programs run faster, use less network bandwidth, use fewer desktop or shared server computing resources, and create more compact SAS data sets.
Michael Raithel, Westat
Many SAS® users are working across multiple platforms, commonly combining Microsoft Windows and UNIX environments. Often, SAS code developed on one platform (for example, on a PC) might not work on another platform (for example, on UNIX). Portability is not just working across multi-platform environments; it is also about making programs easier to use across projects, across companies, or across clients and vendors. This paper examines some good programming practices to address common issues that occur when you work across SAS on a PC and SAS on UNIX. They include: 1) avoid explicitly defining file paths in LIBNAME, filename, and %include statements that require platform-specific syntax such as forward slash (in UNIX) or backslash (in PC SAS); 2) avoid using X commands in SAS code to execute statements on the operating system, which works only on Windows but not on UNIX; 3) use the appropriate SAS rounding function for numeric variables to avoid different results when dealing with 64-bit operating systems and 32-bit systems. The difference between rounding before or after calculations and derivations is discussed; 4) develop portable SAS code to import or export Microsoft Excel spreadsheets across PC SAS and UNIX SAS, especially when dealing with multiple worksheets within one Excel file; and 5) use SAS® Enterprise Guide® to access and run PC SAS programs in UNIX effectively.
James Zhao, Merck & Co. Inc.
Because many SAS® users either work for or own companies that house big data, the threat that malicious software poses becomes even more extreme. Malicious software, often abbreviated as malware, includes many different classifications, ways of infection, and methods of attack. This E-Poster highlights the types of malware, detection strategies, and removal methods. It provides guidelines to secure essential assets and prevent future malware breaches.
Ryan Lafler
Would you like to be more confident in producing graphs and figures? Do you understand the differences between the OVERLAY, GRIDDED, LATTICE, DATAPANEL, and DATALATTICE layouts? Finally, would you like to learn the fundamental Graph Template Language methods in a relaxed environment that fosters questions? Great this topic is for you! In this hands-on workshop, you are guided through the fundamental aspects of the GTL procedure, and you can try fun and challenging SAS® graphics exercises to enable you to more easily retain what you have learned.
Kriss Harris
Session SAS2003-2017:
Hands-On Workshop: Macro Coding by Example
This hands-on-workshop explores the power of SAS Macro, a text substitution facility for extending and customizing SAS programs. Examples will range from simple macro variables to advanced macro programs. As a participant, you will add macro syntax to existing programs to dynamically enhance your programming experience.
Michele Ensor, SAS
Session SAS2008-2017:
Hands-On Workshop: Python and CAS Integration on SAS® Viya™
Jay Laramore, SAS
Session SAS2006-2017:
Hands-On Workshop: SAS® Visual Data Mining and Machine Learning on SAS® Viya™
This workshop provides hands-on experience with SAS Viya Data Mining and Machine Learning through the programming interface to SAS Viya. Workshop participants will learn how to start and stop a CAS session; move data into CAS; prepare data for machine learning; use SAS Studio tasks for supervised learning; and evaluate the results of analyses.
Carlos Andre Reis Pinheiro, SAS
Do you need to add annotations to your graphs? Do you need to specify your own colors on the graph? Would you like to add Unicode characters to your graph, or would you like to create templates that can also be used by non-programmers to produce the required figures? Great, then this topic is for you! In this hands-on workshop, you are guided through the more advanced features of the GTL procedure. There are also fun and challenging SAS® graphics exercises to enable you to more easily retain what you have learned.
Kriss Harris
Data processing can sometimes require complex logic to match and rank record associations across events. This paper presents an efficient solution to generating these complex associations using the DATA step and data hash objects. The solution applies to multiple business needs including subsequent purchases, repayment of loan advance, or hospital readmits. The logic demonstrates how to construct a hash process to identify a qualifying initial event and append linking information with various rank and analysis factors, through the example of a specific use case of the process.
John Schmitz, Luminare Data LLC
Heat maps use colors to communicate numeric data by varying the underlying values that represent red, green, and blue (RGB) as a linear function of the data. You can use heat maps to display spatial data, plot big data sets, and enhance tables. You can use colors on the spectrum from blue to red to show population density in a US map. In fields such as epidemiology and sociology, colors and maps are used to show spatial data, such as how rates of disease or crime vary with location. With big data sets, patterns that you would hope to see in scatter plots are hidden in dense clouds of points. In contrast, patterns in heat maps are clear, because colors are used to display the frequency of observations in each cell of the graph. Heat maps also make tables easier to interpret. For example, when displaying a correlation matrix, you can vary the background color from white to red to correspond to the absolute correlation range from 0 to 1. You can shade the cell behind a value, or you can replace the table with a shaded grid. This paper shows you how to make a variety of heat maps by using PROC SGPLOT, the Graph Template Language, and SG annotation.
Warren Kuhfeld, SAS
Another year implementing, validating, securing, optimizing, migrating, and adopting the Hadoop platform. What have been the top 10 accomplishments with Hadoop seen over the last year? We also review issues, concerns, and resolutions from the past year as well. We discuss where implementations are and some best practices for moving forward with Hadoop and SAS® releases.
Howard Plemmons, SAS
Mauro Cazzari, SAS
Investors usually trade stocks or exchange-traded funds (ETFs) based on a methodology, such as a theory, a model, or a specific chart pattern. There are more than 10,000 securities listed on the US stock market. Picking the right one based on a methodology from so many candidates is usually a big challenge. This paper presents the methodology based on the CANSLIM1 theorem and momentum trading (MT) theorem. We often hear of the cup and handle shape (C&H), double bottoms and multiple bottoms (MB), support and resistance lines (SRL), market direction (MD), fundamental analyses (FA), and technical analyses (TA). Those are all covered in CANSLIM theorem. MT is a trading theorem based on stock moving direction or momentum. Both theorems are easy to learn but difficult to apply without an appropriate tool. The brokers' application system usually cannot provide such filtering due to its complexity. For example, for C&H, where is the handle located? For the MB, where is the last bottom you should trade at? Now, the challenging task can be fulfilled through SAS®. This paper presents the methods on how to apply the logic and graphically present them though SAS. All SAS users, especially those who work directly on capital market business, can benefit from reading this document to achieve their investment goals. Much of the programming logic can also be adopted in SAS finance packages for clients.
Brian Shen, Merlin Clinical Service LLC
XML documents are becoming increasingly popular for transporting data from different operating systems. In the pharmaceutical industry, the Food and Drug Administration (FDA) requires pharmaceutical companies to submit certain types of data in XML format. This paper provides insights into XML documents and summarizes different methods of importing and exporting XML documents with SAS®, including: using the XML LIBNAME engine to translate between the XML markup and the SAS format; creating an XML Map and using the XML92 LIBNAME engine to read in XML documents and create SAS data sets; and using Clinical Data Interchange Standards Consortium (CDISC) procedures to import and export XML documents. An example of importing OpenClinica data into SAS by implementing these methods is provided.
Fei Wang, McDougall Scientific
SAS® Enterprise Guide® continues to add easy-to-use features that enable you to work more efficiently. For example, you can now debug your DATA step code with a DATA step debugger tool; upload data to SAS® Viya with a point-and-click task; control process flow execution behavior when an error occurs; export results to Microsoft Excel and Microsoft PowerPoint destinations with the click of a button; zoom views; filter the data grid with your own WHERE clause; easily define case-insensitive filters; and automatically get the latest product updates. Come see these and more new features and enhancements in SAS Enterprise Guide 7.11, 7.12, and 7.13.
Casey Smith, SAS
Do you need to create a format instantly? Does the format have a lot of labels, and it would take a long time to type in all the codes and labels by hand? Sometimes, a SAS® programmer needs to create a user-defined format for hundreds or thousands of codes, and he needs an easy way to accomplish this without having to type in all of the codes. SAS provides a way to create a user-defined format without having to type in any codes. If the codes and labels are in a text file, SAS data set, Excel file, or in any file that can be converted to a SAS data set, then a SAS user-defined format can be created on the fly. The CNTLIN=option of PROC FORMAT allows a user to create a user-defined format or informat from raw data or from a SAS file. This paper demonstrates how to create two user-defined formats instantly from a raw text file on our Census Bureau website. It explains how to use these user-defined formats for the final report and final output data set from PROC TABULATE. The paper focuses on the CNTLIN= option of PROC FORMAT, not the CNTLOUT= option.
Christopher Boniface, U.S. Census Bureau
This paper will build on the knowledge gained in the Intro to SAS® ODS Graphics. The capabilities in ODS Graphics grow with every release as both new paradigms and smaller tweaks are introduced. After talking with the ODS developers, a selection of the many wonderful capabilities was selected. This paper will look at that selection of both types of capabilities and provide the reader with more tools for their belt. Visualization of data is an important part of telling the story seen in the data. And while the standards and defaults in ODS Graphics are very well done, sometimes the user has specific nuances for characters in the story or additional plot lines they want to incorporate. Almost any possibility, from drama to comedy to mystery, is available in ODS Graphics if you know how. We will explore tables, annotation and changing attributes, as well as the BLOCK plot. Any user of Base SAS on any platform will find great value from the SAS ODS Graphics procedures. Some experience with these procedures is assumed, but not required.
Chuck Kincaid, Experis Business Analytics
This presentation teaches the audience how to use ODS Graphics. Now part of Base SAS®, ODS Graphics are a great way to easily create clear graphics that enable any user to tell their story well. SGPLOT and SGPANEL are two of the procedures that can be used to produce powerful graphics that used to require a lot of work. The core of the procedures is explained, as well as some of the many options available. Furthermore, we explore the ways to combine the individual statements to make more complex graphics that tell the story better. Any user of Base SAS on any platform will find great value in the SAS ODS Graphics procedures.
Chuck Kincaid, Experis Business Analytics
For many years now you have learned the ins and outs of using SAS/ACCESS® software to move data into SAS® to do your analytics. With the new open, cloud-ready SAS® Viya platform comes a new set of data access technologies known as SAS data connectors and SAS data connect accelerators. This paper describes what these new data access products are and how they integrate with the SAS Viya platform. After reading this paper, you will have the foundation needed to load data from third-party data sources into SAS Viya.
Salman Maher, SAS
Chris DeHart, SAS
Barbara Kemper, SAS
As technology expands, we have the need to create programs that can be handed off to clients, to regulatory agencies, to parent companies, or to other projects, and handed off with little or no modification by the recipient. Minimizing modification by the recipient often requires the program itself to self-modify. To some extent the program must be aware of its own operating environment and what it needs to do to adapt to it. There are a great many tools available to the SAS® programmer that will allow the program to self-adjust to its own surroundings. These include location-detection routines, batch files based on folder contents, the ability to detect the version and location of SAS, programs that discern and adjust to the current operating system and the corresponding folder structure, the use of automatic and user defined environmental variables, and macro functions that use and modify system information. Need to create a portable program? We can hand you the tools.
Art Carpenter, California Occidental Consultants
Mary Rosenbloom, Alcon, a Novartis Division
When analyzing data with SAS®, we often use the SAS DATA step and the SQL procedure to explore and manipulate data. Though they both are useful tools in SAS, many SAS users do not fully understand their differences, advantages, and disadvantages and thus have numerous unnecessary biased debates on them. Therefore, this paper illustrates and discusses these aspects with real work examples, which give SAS users deep insights into using them. Using the right tool for a given circumstance not only provides an easier and more convenient solution, it also saves time and work in programming, thus improving work efficiency. Furthermore, the illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
Justin Jia, TransUnion
Managing your career future involves learning outside the box at all stages. The next step is not always on the path we planned as opportunities develop and must be taken when we are ready. Prepare with this paper, which explains important features of Base SAS® that support teams. In this presentation, you learn about the following: concatenating team shared folders with personal development areas; creating consistent code; guidelines for a team (not standards); knowing where the documentation will provide the basics; thinking of those who follow (a different interface); creating code for use by others; and how code can learn about the SAS environment.
Peter Crawford, Crawford Software Consultancy Limited
Programming for others involves new disciplines not called for when we write to provide results. There are many additional facilities in the languages of SAS® to ensure the processes and programs you provide for others will please your customers. Not all are obvious and some seem hidden. The never-ending search to please your friends, colleagues, and customers could start in this presentation.
Peter Crawford, Crawford Software Consultancy Limited
Making sure that you have saved all the necessary information to replicate a deliverable can be a cumbersome task. You want to make sure that all the raw data sets and all the derived data sets, whether they are Study Data Tabulation Model (SDTM) data sets or Analysis Data Model (ADaM) data sets, are saved. You prefer that the date/time stamps are preserved. Not only do you need the data sets, you also need to keep a copy of all programs that were used to produce the deliverable, as well as the corresponding logs from when the programs were executed. Any other information that was needed to produce the necessary outputs also needs to be saved. You must do all of this for each deliverable, and it can be easy to overlook a step or some key information. Most people do this process manually. It can be a time-consuming process, so why not let SAS® do the work for you?
Richann Watson, Experis
String externalization is the key to making your SAS® applications speak multiple languages, even if you can't. Using the new features in SAS® 9.3 for internationalization, your SAS applications can be written to adapt to whatever environment they are found in. String externalization is the process of identifying and separating translatable strings from your SAS program. This paper outlines the four steps of string externalization: create a Microsoft Excel spreadsheet for messages (optional), create SMD files, convert SMD files, and create the final SAS data set. Furthermore, it briefly shows you a real-world project on applying the concept. Using the Excel spreadsheet message text approach, professional translators can work more efficiently translating text in a friendlier and more comfortable environment. Subsequently, a programmer can also fully concentrate on developing and maintaining SAS code when your application is traveling to a new country.
Lihsin Hwang, Statistics Canada
The days of comparing paper copies of graphs on light boxes are long gone, but the problems associated with validating graphical reports still remain. Many recent graphs created using SAS/GRAPH® software include annotations, which complicate an already complex problem. In ODS Graphics, only a single input data set should be used. Because annotation can be more easily added by overlaying an additional graph layer, it is now more practical to use that single input data set for validation, which removes all of the scaling, platform, and font issues that got in the way before. This paper guides you through the techniques to simplify validation while you are creating your perfect graph.
Philip Holland, Holland Numerics
No Batch Scheduler? No problem! This paper describes the use of a SAS® Data Integration Studio job that can be started by a time-dependent scheduler like Windows Scheduler (or crontab in UNIX) to mimic a batch scheduler using SAS® Grid Manager.
Patrick Cuba, Cuba BI Consulting
Data has to be a moderate size when you estimate parameters with machine learning. For data with huge numbers of records like healthcare big data (for example, receipt data) or super multi-dimensional data like genome big data, it is important to follow a procedure in which data is cleaned first and then selection of data or variables for modeling is performed. Big data often consists of macroscopic and microscopic groups. With these groups, it is possible to increase the accuracy of estimation by following the above procedure in which data is cleaned from a macro perspective and the selection of data or variables for modeling is performed from a micro perspective. This kind of step-wise procedure can be expected to help reduce bias. We also propose a new analysis algorithm with N-stage machine learning. For simplicity, we assume N =2. Note that different machine learning approaches should be applied; that is, a random forest method is used at the first stage for data cleaning, and an elastic net method is used for the selection of data or variables. For programming N-stage machine learning, we use the LUA procedure that is not only efficient, but also enables an easily readable iteration algorithm to be developed. Note that we use well-known machine learning methods that are implementable with SAS® 9.4, SAS® In-Memory Statistics, and so on.
Ryo Kiguchi, Shionogi & Co., LTD
Eri Sakai, Shionogi & Co., LTD
Yoshitake Kitanishi, Shionogi & Co., LTD
Akio Tsuji, Shionogi & Co., LTD
A new ODS destination for creating Microsoft Excel workbooks is available starting in the third maintenance release for SAS® 9.4. This destination creates native Microsoft Excel XLSX files, supports graphic images, and offers other advantages over the older ExcelXP tagset. In this presentation, you learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on demand and in real time using SAS server technology is discussed. Using earlier versions of SAS to create multi-sheet workbooks is also discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.
Vince DelGobbo, SAS
Do you create Excel files from SAS®? Do you use the ODS EXCELXP tagset or the ODS EXCEL destination? In this presentation, the EXCELXP tagset and the ODS EXCEL destination are compared face to face. There's gonna be a showdown! We give quick tips for each and show how to create Excel files for our Special Census program. Pros of each method are explored. We show the added benefits of the ODS EXCEL destination. We display how to create XML files with the EXCELXP tagset. We present how to use TAGATTR formats with the EXCELXP tagset to ensure that leading and trailing zeros in Excel are preserved. We demonstrate how to create the same Excel file with the ODS EXCEL destination with SAS formats instead of with TAGATTR formats. We show how the ODS EXCEL destination creates native Excel files. One of the drawbacks of an XML file created with the EXCELXP tagset is that a pop-up message is displayed in Excel each time you open it. We present differences using the ABSOLUTE_COLUMN_WIDTH= option in both methods.
Christopher Boniface, U.S. Census Bureau
When first learning SAS®, programmers often see the proprietary DATA step as a foreign and nonstandard concept. The introduction of the SAS® 9.4 DS2 language eases the transition for traditional programmers delving into SAS for the first time. Object Oriented Programming (OOP) has been an industry mainstay for many years, and the DS2 procedure provides an object-oriented environment for the DATA step. In this poster, we go through a business case to show how DS2 can be used to define a reusable package following object-oriented principles.
Ryan Kumpfmiller, Zencos
Maria Nicholson, Zencos
Making optimal use of SAS® Grid Computing relies on the ability to spread the workload effectively across all of the available nodes. With SAS® Scalable Performance Data Server (SPD Server), it is possible to partition your data and spread the processing across the SAS Grid Computing environment. In an ideal world it would be possible to adjust the size and number of partitions according to the data volumes being processed on any given day. This paper discusses a technique that enables the processing performed in the SAS Grid Computing environment to be dynamically reconfigured, automatically at run time, to optimize the use of SAS Grid Computing, and to provide significant performance benefits.
Andy Knight, SAS
Real workflow dependencies exist when the completion or output of one data process is a prerequisite for subsequent data processes. For example, in extract, transform, load (ETL) systems, the extract must precede the transform and the transform must precede the load. This serialization is common in SAS® data analytic development but should be implemented only when actual dependencies exist. A false dependency, by contrast, exists when the workflow itself does not require serialization but is coded in a manner that forces a process to wait unnecessarily for some unrelated process to complete. For example, an ETL system might extract, transform, and load one data set, and then extract, transform, and load a second data set, causing processing of the second data set to wait unnecessarily for the first to complete. This hands-on session demonstrates three common patterns of false dependencies, teaching SAS practitioners how to recognize and remedy false dependencies through parallel processing paradigms. Groups of participants are pitted against each other, as the class simultaneously runs both serialized software and distributed software that runs in parallel. Participants execute exercises in unison, and then watch their machines race to the finish as the tremendous performance advantages of parallel processing are demonstrated in one exercise after another--ideal for anyone seeking to walk away with proven techniques that can measurably increase your performance and bonus.
Troy Hughes, Datmesis Analytics
JavaScript Object Notation (JSON) has quickly become the de facto standard for data transfer on the Internet due to an increase in web data and the usage of full-stack JavaScript. JSON has become dominant in the emerging technologies of the web today, such as in the Internet of Things and in the mobile cloud. JSON offers a light and flexible format for data transfer. It can be processed directly from JavaScript without the need for an external parser. This paper discusses several abilities within SAS® to process JSON files, the new JSON LIBNAME, and several procedures. This paper compares all of these in detail.
John Kennedy, Mesa Digital LLC
Organizations that create and store personally identifiable information (PII) are often required to de-identify sensitive data to protect an individual s privacy. There are multiple methods in SAS® that can be used to de-identify PII depending on data types and encryption needs. The first method is to apply crosswalk mapping by linking a data set with PII to a secured data set that contains the PII and its corresponding surrogate. Then, the surrogate replaces the PII in the original data set. A second method is SAS encryption, which involves translating PII into an encrypted string using SAS functions. This could be a one-byte-to-one-byte swap or a one-byte-to-two-byte swap. The third method is in-database encryption, which encrypts the PII in a data warehouse, such as Oracle and Teradata, using SAS tools before any information is imported into SAS for users to see. This paper discusses the advantages and disadvantages of these three methods, provides sample SAS code, and describes the corresponding methods to decrypt the encrypted data.
Shuhua Liang, Kaiser Permanente
Zoe Bider-Canfield, Kaiser Permanente
This paper compiles information from documents produced by the U.S. Food and Drug Administration (FDA), the Clinical Data Interchange Standards Consortium (CDISC), and Computational Sciences Symposium (CSS) workgroups to identify what analysis data and other documentation is to be included in submissions and where it all needs to go. It not only describes requirements, but also includes recommendations for things that aren't so cut-and-dried. It focuses on the New Drug Application (NDA) submissions and a subset of Biologic License Application (BLA) submissions that are covered by the FDA binding guidance documents. Where applicable, SAS® tools are described and examples given.
Sandra Minjoe
John Troxell, Accenture Accelerated R&D Services
Face it your data can occasionally contain characters that wreak havoc on your macro code. Characters such as the ampersand in at&t, or the apostrophe in McDonald's, for example. This paper is designed for programmers who know most of the ins and outs of SAS® macro code already. Now let's take your macro skills a step farther by adding to your skill set, specifically, %BQUOTE, %STR, %NRSTR, and %SUPERQ. What is up with all these quoting functions? When do you use one over the other? And why would you need %UNQUOTE? The macro language is full of subtleties and nuances, and the quoting functions represent the epitome of all of this. This paper shows you in which instances you would use the different quoting functions. Specifically, we show you the difference between the compile-time and the execution-time functions. In addition to looking at the traditional quoting functions, you learn how to use %QSCAN and %QSYSFUNC among other functions that apply the regular function and quote the result.
Michelle Buchecker, ThotWave Technologies, LLC.
Conferences for SAS® programming are replete with the newest software capabilities and clever programming techniques. However, discussion about quality control (QC) is lacking. QC is fundamental to ensuring both correct results and sound interpretation of data. It is not industry specific, and it simply makes sense. Most QC procedures are a function of regulatory requirements, industry standards, and corporate philosophies. Good QC goes well beyond just reviewing results, and should also consider the underlying data. It should be driven by a thoughtful consideration of relevance and impact. While programmers strive to produce correct results, it is no wonder that programming mistakes are common despite rigid QC processes in an industry where expedited deliverables and a lean workforce are the norm. This leads to a lack of trust in team members and an overall increase in resource requirements as these errors are corrected, particularly when SAS programming is outsourced. Is it possible to produce results with a high degree of accuracy, even when time and budget are limited? Thorough QC is easy to overlook in a high-pressure environment with increased expectations of workload and expedited deliverables. Does this suggest that QC programming is becoming a lost art, or does it simply suggest that we need to evolve with technology? The focus of the presentation is to review the who, what, when, how, why, and where of QC programming implementation.
Amber Randall, Axio Research
Bill Coar, Axio Research
SQL is a universal language that allows you to access data stored in relational databases or tables. This hands-on workshop presents core concepts and features of using PROC SQL to access data stored in relational database tables. Attendees learn how to define, access, and manipulate data from one or more tables using PROC SQL quickly and easily. Numerous code examples are presented on how to construct simple queries, subset data, produce simple and effective output, join two tables, summarize data with summary functions, construct BY-groups, identify FIRST. and LAST. rows, and create and use virtual tables.
Kirk Paul Lafler, Software Intelligence Corporation
SAS® Enterprise Guide® empowers organizations, programmers, business analysts, statisticians, and end users with all the capabilities that SAS has to offer. This hands-on workshop presents the SAS Enterprise Guide graphical user interface (GUI). It covers access to multi-platform enterprise data sources, various data manipulation techniques that do not require you to learn complex coding constructs, built-in wizards for performing reporting and analytical tasks, the delivery of data and results to a variety of mediums and outlets, and support for data management and documentation requirements. Attendees learn how to use the graphical user interface to access SAS® data sets and tab-delimited and Microsoft Excel input files; to subset and summarize data; to join (or merge) two tables together; to flexibly export results to HTML, PDF, and Excel; and to visually manage projects using flow diagrams.
Kirk Paul Lafler, Software Intelligence Corporation
Ryan Lafler
The United States Access Board will soon refresh the Section 508 accessibility standards. The new requirements are based on Web Content Accessibility Guidelines (WCAG) 2.0 and include a total of 38 testable success criteria-16 more than the current requirements. Is your organization ready? Don't worry, the fourth maintenance release for SAS® 9.4 Output Delivery System (ODS) HTML5 destination has you covered. This paper describes the new accessibility features in the ODS HTML5 destination, explains how to use them, and shows you how to test your output for compliance with the new Section 508 standards.
Glen Walker, SAS
The intrepid Mars Rovers have inspired awe and curiosity and dreams of mapping Mars using SAS/GRAPH® software. This presentation demonstrates how to import Esri shapefile (SHP) data (using the MAPIMPORT procedure) from sources other than SAS® and GfK GeoMarketing map data to produce useful (and sometimes creative) maps. Examples include mapping neighborhoods, ZCTA5 areas, postal codes, and of course, Mars. Products used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will benefit from this presentation.
Louise Hadden, Abt Associates
We live in a world of data; small data, big data, and data in every conceivable size between small and big. In today's world, data finds its way into our lives wherever we are. We talk about data, create data, read data, transmit data, receive data, and save data constantly during any given hour in a day, and we still want and need more. So, we collect even more data at work, in meetings, at home, on our smartphones, in emails, in voice messages, sifting through financial reports, analyzing profits and losses, watching streaming videos, playing computer games, comparing sports teams and favorite players, and countless other ways. Data is growing and being collected at such astounding rates, all in the hope of being able to better understand the world around us. As SAS® professionals, the world of data offers many new and exciting opportunities, but it also presents a frightening realization that data sources might very well contain a host of integrity issues that need to be resolved first. This presentation describes the available methods to remove duplicate observations (or rows) from data sets (or tables) based on the row's values and keys using SAS.
Kirk Paul Lafler, Software Intelligence Corporation
If you've got an iPhone, you might have noticed that the Health app is hard at work collecting data on every step you take. And, of course, the data scientist inside you is itching to analyze that data with SAS®. This paper and an accompanying E-Poster show you how to get step data out of your iPhone Health app and into SAS. Once it's there, you can have at it with all things SAS. In this presentation, we show you how a (what else?) step plot can be used to visualize the 73,000+ steps the author took at SAS® Global Forum 2016.
Ted Conway, Self
There are so many ways for SAS/ACCESS® users to read and write data from and to Microsoft Excel files: SAS® PC Files Server, XLS and XLSX engines, the SAS IMPORT and EXPORT procedures, various Excel file formats (.xls, .xlsx, .xlsb, .xlsm), and more. Many users ask, 'Which is best for me?' This paper explores the requirements and limitations of each engine, along with performance considerations and some of the not-so-obvious things to consider. It also includes a brief analogous discussion on Microsoft Access databases, which share some of the same mechanisms.
Joe Schluter, SAS
Henry Feldman, SAS
SAS® has an amazing arsenal of tools for using and displaying geographic information that are relatively unknown and underused. High-quality GfK GeoMarketing maps have been provided by SAS since the second maintenance release for SAS® 9.3, as sources for inexpensive map data dried up. SAS has been including both GfK and traditional SAS map data sets with licenses for SAS/GRAPH® software for some time, recognizing there will need to be an extended transitional period. However, for those of us who have been putting off converting our SAS/GRAPH mapping programs to use the new GfK maps, the time has come, as the traditional SAS map data sets are no longer being updated. If you visit SAS® Maps Online, you can find only GfK maps in current maps. The GfK maps are updated once a year. This presentation walk through the conversion of a long-standing SAS program to produce multiple US maps for a data compendium to take advantage of GfK maps. Products used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will benefit from this presentation.
Louise Hadden, Abt Associates
Session 1232-2017:
SAS® Abbreviations: Shortcuts for Remembering Complicated Syntax
One of the many difficulties for a SAS® programmer is remembering how to accurately use SAS syntax, especially syntax that includes many parameters. Not mastering basic syntax parameters definitely makes coding inefficient because the programmer has to check reference manuals constantly to ensure that syntax is correct. One of the more useful but somewhat unknown tools in SAS is the use of SAS abbreviations. This feature enables users to store text strings (such as the syntax of a DATA step function, a SAS procedure, or a complete DATA step) in a user-defined and easy-to-remember abbreviation. When this abbreviation is entered in the enhanced editor, SAS automatically brings up the corresponding stored syntax. Knowing how to use SAS abbreviations is beneficial to programmers with varying levels of SAS expertise. In this paper, various examples of using SAS abbreviations are demonstrated.
Yaorui Liu, USC
SAS® Data Integration Studio jobs are not always linear. While Loop transformations have been part of SAS Data Integration Studio for ages, only more recently has SAS Data Integration Studio included the Conditional Control transformations to control logic flow within a job. This paper demonstrates the use of both the Loop and Conditional transformations in a real world example.
Harry Droogendyk, Stratia Consulting Inc
The hash object provides an efficient method for quick data storage and data retrieval. Using a common set of lookup keys, hash objects can be used to retrieve data, store data, merge or join tables of data, and split a single table into multiple tables. This paper explains what a hash object is and why you should use hash objects, and provides basic programming instructions associated with the construction and use of hash objects in a DATA step.
Dari Mazloom, USAA
The SAS® macro language provides a powerful tool to write a program once and reuse it many times in multiple places. A repeatedly executed section of a program can be wrapped into a macro, which can then be shared among many users. A practical example of a macro can be a utility that takes in a set of input parameters, performs some calculations, and sends back a result (such as an interest calculator). In general, a macro modularizes a program into smaller and more manageable sections, and encapsulates repetitive tasks into re-usable code. Modularization can help the code to be tested independently. This paper provides an introduction to writing macros. It introduces the user to the basic macro constructs and statements. This paper covers the following advanced macro subjects: 1) using multiple &s to retrieve/resolve the value of a macro variable; 2) creating a macro variable from the value of another macro variable; 3) handling special characters; 4) the EXECUTE statement to pass a DATA step variable to a macro; 5) using the Execute statement to invoke a macro; and 6) using %RETURN to return a variable from a macro.
Dari Mazloom, USAA
The fourth maintenance release for SAS® 9.4 and the new SAS® Viya platform bring even more progress with respect to the interoperability between SAS® and Hadoop the industry standard for big data. This talk brings you up-to-date with where we are: more distributions, more data types, more options and then there is the cloud. Come and learn about the exciting new developments for blending your SAS processing with your shared Hadoop cluster.
Paul Kent, SAS
The SAS® platform with Unicode's UTF-8 encoding is ready to help you tackle the challenges of dealing with data in multiple languages. In today's global economy, software needs are changing. Companies are globalizing and consolidating systems from various parts of the world. Software must be ready to handle data from social media, international web pages, and databases that have characters in many different languages. SAS makes migrating your data to Unicode a snap! This paper helps you move smoothly from your legacy SAS environment to the powerful SAS Unicode environment with UTF-8 support. Along the way, you will uncover secrets to successfully manipulate your characters, so that all of your data remains intact.
Elizabeth Bales, SAS
Wei Zheng, SAS
Imagine if you will a program, a program that loves its data, a program that loves its data to be in the same directory as the program itself. Together, in the same directory. True love. The program loves its data so much, it just refers to it by filename. No need to say what directory the data is in; it is the same directory. Now imagine that program being thrust into the world of the server. The server knows not what directory this program resides in. The server is an uncaring, but powerful, soul. Yet, when the program is executing, and the program refers to the data just by filename, the server bellows nay, no path, no data. A knight in shining armor emerges, in the form of a SAS® macro, who says lo, with the help of the SAS® Enterprise Guide® macro variable minions, I can gift you with the location of the program directory and send that with you to yon mighty server. And there was much rejoicing. Yay. This paper shows you a SAS macro that you can include in your SAS Enterprise Guide pre-code to automatically set your present working directory to the same directory where your program is saved on your UNIX or Linux operating system. This is applicable to submitting to any type of server, including a SAS Grid Server. It gives you the flexibility of moving your code and data to different locations without having to worry about modifying the code. It also helps save time by not specifying complete pathnames in your programs. And can't we all use a little more time?
Michelle Buchecker, ThotWave Technologies, LLC.
Web services are becoming more and more relied upon for serving up vast amounts of data. With such a heavy reliance on the web, and security threats increasing every day, security is a big concern. OAuth 2.0 has become a go-to way for websites to allow secure access to the services they provide. But with increased security, comes increased complexity. Accessing web services that use OAuth 2.0 is not entirely straightforward, and can cause a lot of users plenty of trouble. This paper helps clarify the basic uses of OAuth and shows how you can easily use Base SAS® to access a few of the most popular web services out there.
Joseph Henry, SAS
One often uses an iterative %DO loop to execute a section of a macro repetitively. An alternative method is to use the implicit loop in the DATA step with the EXECUTE routine to generate a series of macro calls. One of the advantages in the latter approach is eliminating the need of using indirect referencing. To better understand the use of the CALL EXECUTE routine, it is essential for programmers to understand the mechanism and the timing of macro processing to avoid programming errors. These technical issues are discussed in detail in this paper.
Arthur Li, City of Hope
Are you tired of constantly creating new emails each and every time you run a report, frantically searching for the reports, attaching said reports, and writing emails, all the while thinking there has to be a better way? Then, have I got some code to share with you! This session provides you with code to flee from your old ways of emailing data and reports. Instead, you set up your SAS® code to send an email to your recipients. The email attaches the most current files each and every time the code is run. You do not have to do anything manually after you run your SAS code. This session provides SAS programmers with instructions about how to create their own email in a macro that is based on their current reports. We demonstrate different options to customize the code to add the email body (and to change the body) and to add attachments (such as PDF and Excel). We show you an additional macro that checks whether a file exists and adds a note in the SAS log if it is missing so that you won't get a warning message. Using SAS code, you will become more efficient and effective by automating a tedious process and reducing errors in email attachments, wording, and recipient lists.
Crystal Carel, Baylor Scott & White Health
The syntax to combine SAS® data sets is simple: use the SET statement to concatenate, and use the MERGE and BY statements to merge. The data sets themselves, however, might be complex. Combining the data sets might not result in what you need. This paper reviews techniques to perform before you combine data sets, including checking for the following: common variables; common variables with different attributes; duplicate identifiers; duplicate observations; and acceptable match rates.
Christopher Bost, Independent SAS Consultant
SAS® 9.4 Graph Template Language: Reference has more than 1300 pages and hundreds of options and statements. It is no surprise that programmers sometimes experience unexpected twists and turns when using the graph template language (GTL) to draw figures. Understandably, it is easy to become frustrated when your program fails to produce the desired graphs despite your best effort. Although SAS needs to continue improving GTL, this paper offers several tricks that help overcome some of the roadblocks in graphing.
Amos Shu, AstraZeneca
Can you actually get something for nothing? With PROC SQL's subquery and remerging features, then yes, you can. When working with categorical variables, there is often a need to add flag variables based on group descriptive statistics, such as group counts and minimum and maximum values. Instead of first creating the group count or minimum or maximum values, and then merging the summarized data set with the original data set with conditional statements creating a flag variable, why not take advantage of PROC SQL to complete three steps in one? With PROC SQL's subquery, CASE-WHEN clause, and summary functions by the group variable, you can easily remerge the new flag variable back with the original data set.
Sunil Gupta, Cytel
Session 0846-2017:
Spawning SAS® Sleeper Cells and Calling Them into Action: SAS® University Parallel Processing
With the 2014 launch of SAS® University Edition, the reach of SAS® was greatly expanded to educators, students, researchers, non-profits, and the curious, who for the first time could use a full version of Base SAS® software for free. Because SAS University Edition allows a maximum of two CPUs, however, performance is curtailed sharply from more substantial SAS environments that can benefit from parallel and distributed processing, such as environments that implement SAS® Grid Manager, Teradata, or Hadoop solutions. Even when comparing performance of SAS University Edition against the most straightforward implementation of the SAS windowing environment, the SAS windowing environment demonstrates greater performance when run on the same computer. With parallel processing and distributed computing becoming the status quo in SAS production environments, SAS University Edition will unfortunately fall behind counterpart SAS solutions if it cannot harness parallel processing best practices and performance. To curb this disparity, this session introduces groundbreaking programmatic methods that enable commodity hardware to be networked so that multiple instances of SAS University Edition can communicate and work collectively to divide and conquer complex tasks. With parallel processing facilitated, a SAS practitioner can now harness an endless number of computers to produce blitzkrieg solutions with the SAS University Edition that rival the performance of more costly, complex infrastructure.
Troy Hughes, Datmesis Analytics
Have you ever run SAS® code with a DATA step and the results were not what you expected? Tracking down the problem can be a time-consuming task. To assist you in this common scenario, SAS® Enterprise Guide® 7.13 and beyond has a DATA step debugger tool. The simple and interactive DATA step debugger enables you to visually walk through the execution of your DATA step program. You can control the DATA step execution, view the variables, and set breakpoints to quickly identify data and logic errors. Come see the full capabilities of the new SAS Enterprise Guide DATA step debugger. You'll be squashing bugs in no time!
Joe Flynn, SAS
The SAS® LOCK statement was introduced in SAS®7 with great pomp and circumstance, as it enabled SAS® software to lock data sets exclusively. In a multiuser or networked environment, an exclusive file lock prevents other users and processes from accessing and accidentally corrupting a data set while it is in use. Moreover, because file lock status can be tested programmatically with the LOCK statement return code (&SYSLCKRC), data set accessibility can be validated before attempted access, thus preventing file access collisions and facilitating more reliable, robust software. Notwithstanding the intent of the LOCK statement, stress testing demonstrated in this session illustrates vulnerabilities in the LOCK statement that render its use inadvisable due to its inability to lock data sets reliably outside of the SAS/SHARE® environment. To overcome this limitation and enable reliable data set locking, a methodology is demonstrated that uses semaphores (flags) that indicate whether a data set is available or is in use, and mutually exclusive (mutex) semaphores that restrict data set access to a single process at one time. With Base SAS® file locking capabilities now restored, this session further demonstrates control table locking to support process synchronization and parallel processing. The LOCKSAFE macro demonstrates a busy-waiting (or spinlock) design that tests data set availability repeatedly until file access is achieved or the process times out.
Troy Hughes, Datmesis Analytics
As a SAS® programmer, how often does it happen where you would like to submit some code but not wait around for it to finish? SAS® Studio has a way to achieve this and much more! This paper covers how to submit and execute SAS code in the background using SAS Studio. Background submit in the SAS Studio interface allows you to submit code and continue with your work. You receive a notification when it is finished, or you can even disconnect from your browser session and check the status of the submitted code later. Or you can choose to use SAS Studio to submit your code without bringing up the SAS Studio interface at all. This paper also covers the ability to use a command-line executable program that uses SAS Studio to execute SAS code in the background and generate log and result files without having to create a new SAS Studio session. These techniques make it much easier to spin up long-running jobs, while still being able to get your other work done in the meantime.
Jennifer Jeffreys-Chen, SAS
Amy Peters, SAS
Swapnil Ghan, SAS
Did you know that you could leverage the statistical power of the FREQ procedure and still be able to control the appearance of your output? Many people think they have to use procedures such as REPORT and TABULATE to be able to apply style options and control formats and headings for their output. However, if you pair PROC FREQ with a TEMPLATE procedure step, you can customize the appearance of your output and make enhancements to tables, such as adding colors and controlling headings. If you are a statistician, you know the many PROC FREQ options that produce high-level statistics. But did you also know that PROC FREQ can generate a graphical representation of those statistics? PROC FREQ can generate the graphs, and then you can use ODS Graphics and the Graph Template Language (GTL) to improve the appearance of the graphs. Written for intermediate users, this paper demonstrates how you can enhance the default output for PROC FREQ one-way and multi-way tables by modifying colors, formats, and labels. This paper also describes the syntax for creating graphs for multiple statistics, and it uses examples to show how you can customize these graphs.
Kathryn McLawhorn, SAS
In the game of tag, being it is bad, but where accessibility compliance is concerned, being tagged is good! Tagging is required for PDF files to comply with accessibility standards such as Section 508 and the Web Content Accessibility Guidelines (WCAG). In the fourth maintenance release for SAS® 9.4, the preproduction option in the ODS PDF statement, ACCESSIBLE, creates a tagged PDF file. We look at how this option changes the file that is created and focus on the SAS® programming techniques that work best with the new option. You ll then have the opportunity to try it yourself in your own code and provide feedback to SAS.
Glen Walker, SAS
In 32 years as a SAS® consultant at the Federal Reserve Board, I have seen some questions about common SAS tasks surface again and again. This paper collects the most common questions related to basic DATA step processing from my previous 'Tales from the Help Desk' papers, and provides code to explain and resolve them. The following tasks are reviewed: using the LAG function with conditional statements; avoiding character variable truncation; surrounding a macro variable with quotes in SAS code; handling missing values (arithmetic calculations versus functions); incrementing a SAS date value with the INTNX function; converting a variable from character to numeric or vice versa and keeping the same name; converting character or numeric values to SAS date values; using an array definition in multiple DATA steps; using values of a variable in a data set throughout a DATA step by copying the values into a temporary array; and writing data to multiple external files in a DATA step, determining file names dynamically from data values. In the context of discussing these tasks, the paper provides details about SAS processing that can help users employ SAS more effectively. See the references for seven previous papers that contain additional common questions.
Bruce Gilsen, Federal Reserve Board
SAS® Macro Language can be used to enhance many report-generating processes. This presentation showcases the potential that macros have in populating predesigned RTF templates. If you have multiple report templates saved, SAS® can choose and populate the correct ones based on macro programming and DATA _NULL_ using the TRANSTRN function. The autocall macro %TRIM, combined with a macro (for example, &TEMPLATE), can be attached to the output RTF template name. You can design and save as many templates as you like or need. When SAS assigns the macro variable TEMPLATE a value, the %TRIM(&TEMPLATE) statement in the output pathway correctly populates the appropriate template. This can make life easy if you create multiple different reports based on one data set. All that's required are stored templates on accessible pathways.
Patrick Leon, University of Southern California
This paper discusses a set of practical recommendations for optimizing the performance and scalability of your Hadoop system using SAS®. Topics include recommendations gleaned from actual deployments from a variety of implementations and distributions. Techniques cover tips for improving performance and working with complex Hadoop technologies such as Kerberos, techniques for improving efficiency when working with data, methods to better leverage the SAS in Hadoop components, and other recommendations. With this information, you can unlock the power of SAS in your Hadoop system.
Wilbram Hazejager, SAS
Nancy Rausch, SAS
SAS offers generation data set structure as part of the language feature that many users are familiar with. They use it in their organizations and manage it using keywords such as GENMAX and GENNUM. While SAS operates in a mainframe environment, users also have the ability to tap into the GDG (generation data group) feature available on z/OS, OS/390, OS/370, IBM 3070, or IBM 3090 machines. With cost-saving initiatives across businesses and due to some scaling factors, many organizations are in the process of migrating to mid-tier platforms to cheaper operating platforms such as UNIX and AIX. Because Linux is open source and is a cheaper alternative, several organizations have opted for the UNIX distribution of SAS that can work in UNIX and AIX environments. While this might be a viable alternative, there are certain nuances that the migration effort brings to the technical conversion teams. On UNIX, the concept of GDGs does not exist. While SAS offers generation data sets, they are good only for SAS data sets. If the business organization needs to house and operate with a GDG-like structure for text data sets, there isn't one available. While my organization had a similar initiative to migrate programs used to run the subprime mortgage analytic, incentive, and regulatory reporting, we identified the paucity of literature and research on this topic. Hence, I ended up developing the utility that addresses this need. This is a simple macro that helps us closely simulate a GDG/GDS.
Dr. Kannan Deivasigamani, HSBC
SAS® Cloud Analytic Services (CAS) is the cloud-based run-time environment for data management and analytics in SAS®. By run-time environment, we refer to the combination of hardware and software where data management and analytics take place. In a sense, CAS is just another SAS platform to do things. CAS is a platform for high-performance analytics and distributed computing. The CAS server provides data management and an analytics framework that can run in the cloud, that can act as a cloud, and that provides the best-in-class analytics that SAS is known for. This new architecture functions as a public API, allowing access from many different clients such as Lua, Python, Java, REST, and yes, even SAS. The CAS server is designed to provide user-level sessions, to share data between sessions, and to provide fault tolerance, which allows a worker node to crash without losing data and allows the user action to continue running to completion. The isolation provided to each session allows one session to crash without affecting other sessions. The concept of 'always in memory' in CAS means that an action is not aware of what the server does to allow the action to access the data. The entire file might be in memory or just pieces of the file might be mapped into memory, just in time for the action to access the data. This allows CAS tables to be loaded that are larger than the memory available across the grid. Hadoop can be used to provide data redundancy. The server is elastic and can add or remove nodes as needed. Users can specify how many nodes they want their session to use, so that the session fits their needs.
Jerry Pendergrass, SAS
SAS® provides an extensive set of graphs for different needs. But as a SAS programmer or someone who uses SAS® Visual Analytics Designer to create reports, the number of possible scenarios you have to address outnumber the available graphs. This paper demonstrates how to create your own advanced graphs by intelligently combining existing graphs. This presentation explains how you can create the following types of graphs by combining existing graphs: a line-based graph that shows a line for each group such that each line is partly solid and partly dashed to show the actual and predicted values respectively; a control chart (which is currently not available as a standard graph) that lets you show how values change within multiple upper and lower limits; a line-based graph that gives you more control over attributes (color, symbol, and so on) of specific markers to depict special conditions; a visualization that shows the user only a part of the data at a given instant, and lets him move a window to see other parts of the data; a chart that lets the user compare the data values to a specific value in a visual way to make the comparison more intuitive; and a visualization that shows the overall data and at the same time shows the detailed data for the selected category. This paper demonstrates how to use the technique of combining graphs to create such advanced charts in SAS® Visual Analytics and SAS® Graph Builder as well as by using SAS procedures like the SGRENDER procedure.
Vineet Raina, SAS
As computer technology advances, SAS® continually pursues opportunities to implement state-of-the-art systems that solve problems in data preparation and analysis faster and more efficiently. In this pursuit, we have extended the TRANSPOSE procedure to operate in a distributed fashion within both Teradata and Hadoop, using dynamically generated DS2 executed by the SAS® Embedded Process and within SAS® Viya , using its native transpose action. With its new ability to work within these environments, PROC TRANSPOSE provides you with access to its parallel processing power and produces results that are compatible with your existing SAS programs.
Scott Mebust, SAS
Have you ever had your macro code not work and you couldn't figure out why? Maybe even something as simple as %if &sysscp=WIN %then LIBNAME libref 'c:\temp'; ? This paper is designed for programmers who know %LET and can write basic macro definitions already. Now let's take your macro skills a step farther by adding to your skill set. The %IF statement can be a deceptively tricky statement due to how IF statements are processed in a DATA step and how that differs from how %IF statements are processed by the macro processor. Focus areas of this paper are: 1) emphasizing the importance of the macro facility as a code-generation facility; 2) how an IF statement in a DATA step differs from a macro %IF statement and when to use which; 3) why semicolons can be misinterpreted in an %IF statement.
Michelle Buchecker, ThotWave Technologies, LLC.
JSON is quickly becoming the industry standard for data interchanges, especially in supporting REST APIs. But until now, importing JSON content into SAS® software and leveraging it in SAS has required significant custom code. Developing that code can be laborious, requiring transcoding, manual text parsing, and creating handlers for unexpected structure changes. Fortunately, the new JSON LIBNAME engine (in the fourth maintenance release for SAS® 9.4 and later) delivers a robust, efficient method for importing JSON content into SAS data structures. This paper demonstrates several real-world examples of the JSON LIBNAME using open data APIs. The first example contrasts the traditional custom code and JSON LIBNAME approach using big data from the United Nations Comtrade Database. The two approaches are compared in terms of complexity of code, time to execute, and the resulting data structures. The same method is applied to data from Google and the US Census Bureau's APIs. Finally, to demonstrate the ability of the JSON LIBNAME to handle unexpected changes to a JSON data structure, we use the SAS JSON procedure to write a JSON file and then simulate changes to that structure to show how one JSON LIBNAME process can easily adjust the import to handle those changes.
Michael Drutar, SAS
Eric Thies, SAS
Does your job require you to create reports in Microsoft Excel on a quarterly, monthly, or even weekly basis? Are you creating all or part of these reports by hand, referencing another sheet containing rows and rows and rows of data? If so, stop! There is a better way! The new ODS destination for Excel enables you to create native Excel files directly from SAS®. Now you can include just the data you need, create great-looking tabular output, and do it all in a fraction of the time! This paper shows you how to use the REPORT procedure to create polished tables that contain formulas, colored cells, and other customized formatting. Also presented in the paper are the destination options used to create various workbook structures, such as multiple tables per worksheet. Using these techniques to automate the creation of your Excel reports will save you hours of time and frustration, enabling you to pursue other endeavors.
Jane Eslinger, SAS
You might encounter people who used SAS® long ago (perhaps in university), or people who had a very limited use of SAS in a job. Some of these people with limited knowledge and experience think that SAS is just a statistics package or just a GUI. Those that think of it as a GUI usually reference SAS® Enterprise Guide® or, if it was a really long time ago, SAS/AF® or SAS/FSP®. The reality is that the modern SAS system is a very large and complex ecosystem, with hundreds of software products and diversified tools for programmers and users. This poster provides diagrams and tables that illustrate the complexity of the SAS system from the perspective of a programmer. Diagrams and illustrations include the functional scope and operating systems in the ecosystem; different environments that program code can run in; cross-environment interactions and related tools; SAS® Grid Computing: parallel processing; how SAS can run with files in memory (the legacy SAFILE statement and big data and Hadoop); and how some code can run in-database. We end with a tabulation of the many programming languages and SQL dialects that are directly or indirectly supported within SAS. This poster should enlighten those who think that SAS is an old, dated statistics package or just a GUI.
Thomas Billings, MUFG Union Bank
Using SAS® to query relational databases can be challenging, even for seasoned SAS programmers. SAS/ACCESS® software makes it easy to directly access data on nearly any platform, but there is a lot of under-the-hood functionality that takes time to learn. Here are tips that will get you on your way fast, including understanding and mastering SQL pass-through; efficiently bulk-loading data from SAS into other databases; tuning your SQL queries; and when to use native database versus SAS functionality.
Andrew Clapson, MD Financial Management
SAS® Embedded Process enables user-written DS2 code and scoring models to run inside Hadoop. It taps into the massively parallel processing (MPP) architecture of Hadoop for scalable performance. SAS Embedded Process explores and complies with many Hadoop components. This paper explains how SAS Embedded Process interacts with existing Hadoop security technologies, such as Apache Sentry and RecordServices.
David Ghazaleh, SAS
The traditional model of SAS® source-code production is for all code to be directly written by users or indirectly written (that is, generated by user-written macros, Lua code, or with DATA steps). This model was recently extended to enable SAS macro code to operate on arbitrary text (for example, on HTML) using the STREAM procedure. In contrast, SAS includes many products that operate in the client/server environment and function as follows: 1) the user interacts with the product via a GUI to specify the processing desired; 2) the product saves the user-specifications in metadata and generates SAS source code for the target processing; 3) the source code is then run (per user directions) to perform the processing. Many of these products give users the ability to modify the generated code and/or insert their own user-written code. Also, the target code (system-generated plus optional user-written) can be exported or deployed to be run as a stored process, in batch, or in another SAS environment. In this paper, we review the SAS ecosystem contexts where source code is produced, the pros and cons of each approach, discuss why some system-generated code is inelegant, and make some suggestions for determining when to write the code manually, and when and how to use system-generated code.
Thomas Billings, MUFG Union Bank
If you run SAS® and Teradata software with default application and database client encodings, some operations with international character sets will appear to work because you are actually treating character strings as streams of bytes instead of streams of characters. As long as no one in the chain of custody tries to interpret the data as anything other than a stream of bytes, then data can sometimes flow in and out of a database without being altered, giving the appearance of correct operation. But when you need to compare a particular character to an international character, or when your data approaches the maximum size of a character field, then you will run into trouble. To correctly handle international character sets, every layer of software that touches the data needs to agree on the encoding of the data. UTF-8 encoding is a flexible way to handle many single-byte and multi-byte international character sets. To use UTF-8 encoding with international character sets, we need to configure the SAS session encoding, Teradata client encoding, and Teradata server encoding to all agree, so that they are handling UTF-8 encoded data. This paper shows you how to configure SAS and Teradata so that your applications run successfully with international characters.
Greg Otto, Teradata
Salman Maher, SAS
Austin Swift, SAS
Are you frustrated with manually setting options to control your SAS® Display Manager sessions but become daunted every time you look at all the places you can set options and window layouts? In this paper, we look at various files SAS® accesses when starting, what can (and cannot) go into them, and what takes precedence after all are executed. We also look at the SAS registry and how to programmatically change settings. By the end of the paper, you will be comfortable in knowing where to make the changes that best fit your needs.
Peter Eberhardt, Fernwood Consulting Group Inc.
Mengting Wang, Qualcomm
The ODS EXCEL destination has made sharing SAS® reports and graphs much easier. What is even more exciting is that this destination is available for use regardless of the platform. This is extremely useful when reporting is performed on remote servers. This presentation goes through the basics of using the ODS EXCEL destination and shows specific examples of how to use this in a remote environment. Examples for both SAS® on Windows and in SAS® Enterprise Guide® are provided.
Thomas Bugg, Wells Fargo Home Mortgage
JMP® integrates very nicely with SAS® software, so you can do some pretty amazing things by combining the power of JMP and SAS. You can submit some code to run something on a SAS server and bring the results back as a JMP table. Then you can do lots of things with the JMP table to analyze the data returned. This workshop shows you how to access data via SAS servers, run SAS code and bring data back to JMP, and use JMP to do many things very quickly and easily. Explore the synergies between these tools; having both is a powerful combination that far outstrips just having one, or not using them together.
Philip Mason, Wood Street Consultants Ltd.
Visualization of complex data can be a valuable tool for researchers and policy makers, and Base SAS® has powerful tools for such data exploration. In particular, SAS/GRAPH® software is a flexible tool that enables the analyst to create a wide variety of data visualizations. This paper uses SAS® to visualize complex demographic data related to the membership of a large American healthcare provider. Kaiser Permanente (KP) has demographic data about 4 million active members in Southern California. We use SAS to create a number of geographic visualizations of KP demographic data related to membership at the census-block level of detail and higher. Demographic data available from the US Census' American Community Survey (ACS) at the same level of geographic organization are also used as comparators to show how the KP membership differs from the demographics of the geographies from which it draws. In addition, we use SAS to create a number of visualizations of KP demographic data related to utilizations (inpatient and outpatient) at the medical center area level through time. As with the membership data, data available from the ACS is used as a comparator to show how patterns of KP utilizations at various medical centers compare to the demographics of the populations that these medical centers serve. The paper will be of interest to programmers learning how to use SAS to visualize data and to researchers interested in the demographics of one of the largest health care providers in the US.
Don McCarthy, Kaiser Permanente
Michael Santema, Kaiser Permanente
It has become a need-it-now world, and many managers and decision-makers need their reports and information quicker than ever before to compete. As SAS® developers, we need to acknowledge this fact and write code that gets us the results we need in seconds or minutes, rather than in hours. SAS is a great tool for extracting, transferring, and loading data, but as with any tool, it is most efficient when used in the most appropriate way. Using the SQL pass-through techniques presented in this paper can reduce run time by up to 90% by passing the processing to the database instead of moving the data back to SAS to be consumed. You can reap these benefits with only a minor increase in coding difficulty.
Jason O'Day, US Bank
The latest releases of SAS® Data Management software provide a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types including Hadoop, cloud data sources, RDBMS, files, unstructured data, streaming, and others, and the ability to perform ETL and ELT transformations in diverse run-time environments including SAS®, database systems, Hadoop, Spark, SAS® Analytics, cloud, and data virtualization environments. There are also new capabilities for lineage, impact analysis, clustering, and other data governance features for enhancements to master data and support metadata management. This paper provides an overview of the latest features of the SAS® Data Management product suite and includes use cases and examples for leveraging product capabilities.
Nancy Rausch, SAS
Have you ever been working on a task and wondered whether there might be a SAS® function that could save you some time? Let alone, one that might be able to do the work for you? Data review and validation tasks can be time-consuming efforts. Any gain in efficiency is highly beneficial, especially if you can achieve a standard level where the data itself can drive parts of the process. The ANY and NOT functions can help alleviate some of the manual work in many tasks such as data review of variable values, data compliance, data formats, and derivation or validation of a variable's data type. The list goes on. In this poster, we cover the functions and their details and use them in an example of handling date and time data and mapping it to ISO 8601 date and time formats.
Richann Watson, Experis
Karl Miller, inVentiv Health
Session 1488-2017:
Working with Sparse Matrices in SAS®
For the past couple of years, it seems that big data has been a buzzword in the industry. We have more and more data coming in from more and more places, and it is our job to figure out how best to handle it. One way to attempt to organize data is with arrays, but what do you do when the array you are attempting to populate is so large that it cannot be handled in memory? How do you handle a large array when most of the elements are missing? This paper and presentation deals with the concept of a sparse matrix. A sparse matrix is a large array with relatively few actual elements. We address methods for handling such a construct while keeping memory, CPU, clock, and programmer time to their respective minimums.
Andrew Kuligowski, HSN
Lisa Mendez, QuintilesIMS
A SAS® program with the extension .SAS is simply a text file. This fact opens the door to many powerful results. You can read a typical SAS program into a SAS data set as a text file with a character variable, with one line of the program being one record in the data set. The program's code can be changed, and a new program can be written as a simple text file with a .SAS extension. This presentation shows an example of dynamically editing SAS code on the fly and generating statistics about SAS programs.
Peter Timusk, Statistics Canada
SAS® has many methods of doing table lookups in DATA steps: formats, arrays, hash objects, the SASMSG function, indexed data sets, and so on. Of these methods, hash objects and indexed data sets enable you to specify multiple lookup keys and to return multiple table values. Both methods can be updated dynamically in the middle of a DATA step as you obtain new information (such as reading new keys from an input file or creating new synthetic keys). Hash objects are very flexible, fast, and fairly easy to use, but they are limited by the amount of data that can be held in memory. Indexed data sets can be slower, but they are not limited by what can be held in memory. As a result, they might be your only option in some circumstances. This presentation discusses how to use an indexed data set for table lookup and how to update it dynamically using the MODIFY statement and its allies.
Jack Hamilton, Kaiser Permanente
The SAS RAKING macro, introduced in 2000, has been implemented by countless survey researchers worldwide. The authors receive messages from users who tirelessly rake survey data using all three generations of the macro. In this poster, we present the fourth generation of the macro, cleaning up remnants from the previous versions, and resolving user-reported confusion. Most important, we introduce a few helpful enhancements including: 1) An explicit indicator for trimming (or not trimming) the weight that substantially saves run time when no trimming is needed. 2) Two methods of weight trimming, AND and OR, that enable users to overcome a stubborn non-convergence. When AND is indicated, weight trimming occurs only if both (individual and global) high weight cap values are true. Conversely, weight increase occurs only if both low weight cap values are true. When OR is indicated, weight trimming occurs if either of the two (individual or global) high weight cap values is true. Conversely, weight increase occurs if either of the two low weight cap values is true. 3) Summary statistics related to the number of cases with trimmed or increased weights have been expanded. 4) We introduce parameters that enable users to use different criteria of convergence for different raking marginal variables. We anticipate that these innovations will be enthusiastically received and implemented by the survey research community.
David Izrael, Abt Associates
Michael Battaglia, Battaglia Consulting Group, LLC.
Ann Battaglia, Battaglia Consulting Group, LLC.
Sarah Ball, Abt Associates
Since databases often lack the extensive string-handling capabilities available in SAS®, SAS users are often forced to extract complex character data from the database into SAS for string manipulation. As database vendors make regular expression functionality more widely available for use in SQL, the need to move data into SAS for pattern matching, string replacement, and character extraction is necessary less often. This paper covers enough regular expression patterns to make you dangerous, demonstrates the various REGEXP SQL functions, and provides practical applications for each.
Harry Droogendyk, Stratia Consulting Inc
How often have you pulled oodles of data out of the corporate data warehouse down into SAS® for additional processing? Additional processing, sometimes thought to be unique to SAS, includes FIRST. logic, cumulative totals, lag functionality, specialized summarization, and advanced date manipulation. Using the Analytical/OLAP and Windowing functionality available in many databases (for example, Teradata and Netezza) all of this processing can be performed directly in the database without moving and reprocessing detail data unnecessarily. This presentation illustrates how to increase your coding and execution efficiency by using the database's power through your SAS environment.
Harry Droogendyk, Stratia Consulting Inc