The Clinical Data Interchange Standards Consortium (CDISC) encompasses a variety of standards for medical research. Among them are standards for data collection (Clinical Data Acquisition Standards Harmonization, CDASH), data submission (Study Data Tabulation Model, SDTM), and data analysis (Analysis Data Model, ADaM). These standards were originally developed with drug development in mind. A recent focus has been Therapeutic Area User Guides (TAUGs), which provide advice, examples, and explanations for collecting and submitting data for specific diseases. Data can even be collected for non-subjects using the Associated Persons Implementation Guide (APIG). SDTM domains for medical devices were published in 2012. Interestingly, 14 of the 18 TAUGs use device domains, providing examples of their use. Drug-device studies also provide a contrast in the adoption of CDISC standards for drug submissions versus device submissions. Adoption of SDTM in general, and of the seven device SDTM domains in particular, by the medical device industry has been slow. Reasons for the slow adoption are discussed in this paper.
Carey Smoak, DataCeutics
A propensity score is the probability that an individual will be assigned to a condition or group, given a set of baseline covariates when the assignment is made. For example, the type of drug treatment given to a patient in a real-world setting might be non-randomly based on the patient's age, gender, geographic location, and socioeconomic status when the drug is prescribed. Propensity scores are used in many different types of observational studies to reduce selection bias. Subjects assigned to different groups are matched based on these propensity score probabilities, rather than matched based on the values of individual covariates. Although the underlying statistical theory behind the use of propensity scores is complex, implementing propensity score matching with SAS® is relatively straightforward. An output data set of each subject's propensity score can be generated with SAS using PROC LOGISTIC. In addition, a generalized SAS macro can generate optimized N:1 propensity score matching of subjects assigned to different groups using the radius method. Matching can be optimized either for the number of matches within the maximum allowable radius or by the closeness of the matches within the radius. This presentation provides the general PROC LOGISTIC syntax to generate propensity scores, provides an overview of different propensity score matching techniques, and discusses how to use the SAS macro for optimized propensity score matching using the radius method.
Kathy Fraeman, Evidera
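A minimal sketch of the PROC LOGISTIC step the abstract describes, with hypothetical data set and covariate names:

   proc logistic data=patients;
      class gender region / param=ref;
      model treated(event='1') = age gender region ses;
      output out=ps_scores pred=pscore;   /* pscore holds each subject's propensity score */
   run;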
Creating sophisticated, visually stunning reports is imperative in today's business environment, but is your fancy report really accessible to all? Let's explore some simple enhancements that the fourth maintenance release of SAS® 9.4 made to Output Delivery System (ODS) layout and the Report Writing Interface that will truly empower you to accommodate people who use assistive technology. ODS now provides the tools for you to meet Section 508 compliance and to create an engaging experience for all who consume your reports.
Daniel O'Connor, SAS
When a large and important project with a strict deadline hits your desk, it's easy to revert to those tried-and-true SAS® programming techniques that have been successful for you in the past. In fact, trying to learn new techniques at such a time can prove to be distracting and a waste of precious time. However, the lull after a project's completion is the perfect time to reassess your approach and see whether there are any new features added to the SAS arsenal since the last time you looked that could be of great use the next time around. Such a post-project post-mortem has provided me with the opportunity to learn about several new features that will prove to be hugely valuable in the next release of my project. For example: 1) The PRESENV option and procedure 2) Fuzzy matching with the COMPGED function 3) The ODS POWERPOINT statement 4) SAS® Enterprise Guide® enhancements, including copying and pasting process flows and the SAS Macro Variable Viewer
Lisa Horwitz, SAS
In this paper, a SAS® macro is introduced that can search and replace any string in a SAS program. To use the macro, the user needs only to pass the search string and a folder location; to use the replacement function, the user also passes the replacement string. The macro checks all of the SAS programs in the folder and its subfolders to find the files that contain the search string. The macro generates new SAS files for replacements so that the old files are not affected. An HTML report is generated by the macro that includes the original file locations, the line numbers of the SAS code that contain the search string, and the SAS code with search strings highlighted in yellow. If the replacement function is used, the HTML report also includes the locations of the new SAS files. The location information in the HTML report is hyperlinked so that the user can open the files directly from the report.
Ting Sa, Cincinnati Children's Hospital Medical Center
This paper introduces a macro that can generate keyhole markup language (KML) files for U.S. states and counties. The generated KML files can be used directly by Google Maps to add customized state and county layers with user-defined colors and transparencies. When someone clicks on the state and county layers in Google Maps, customized information is shown. To use the macro, the user needs to prepare only a simple SAS® input data set. The paper includes all the SAS code for the macro and provides examples that show how to use the macro and how to display the KML files in Google Maps.
Ting Sa, Cincinnati Children's Hospital Medical Center
Time-to-event (survival) analysis is widely used, especially in the biomedical and pharmaceutical fields. SAS® provides the LIFETEST procedure to calculate Kaplan-Meier estimates of the survival function and to produce a survival plot. The PHREG procedure fits Cox regression models to estimate the effect of predictors on hazard rates. The ODS tables defined by PROC LIFETEST and PROC PHREG make additional statistics from these procedures available as data sets. This paper provides a macro that uses PROC LIFETEST and PROC PHREG with ODS. It produces a survival plot with estimates that include the subjects at risk, event and total subject counts, survival rate with median and 95% confidence interval, and hazard ratio estimates with 95% confidence interval. Some of these estimates are optional in the macro, so users can select what to display in the output. (Subjects at risk and event and subject counts are not optional.) Users can also specify the tick marks on the X axis and in the subjects-at-risk table, for example, every 10 or 20 units. The macro dynamically calculates the maximum for the X axis and uses the interval that the user specified. Finally, because the macro uses ODS, output can be written to many file formats, including JPG, PDF, and RTF.
Chia-Ling Wu, University of Southern California
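A minimal sketch of capturing such estimates through ODS, assuming ADaM-style variable names (aval, cnsr, trtp) for illustration:

   ods output Quartiles=surv_median;          /* median survival with 95% CI */
   proc lifetest data=adtte plots=survival(atrisk);
      time aval*cnsr(1);                      /* 1 = censored */
      strata trtp;
   run;

   proc phreg data=adtte;
      class trtp / ref=first;
      model aval*cnsr(1) = trtp / risklimits; /* hazard ratio with 95% CI */
   run;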
The hospital Medicare readmission rate has become a key indicator for measuring the quality of health care in the US. This rate is currently used by major health-care stakeholders including the Centers for Medicare and Medicaid Services (CMS), the Agency for Healthcare Research and Quality (AHRQ), and the National Committee for Quality Assurance (NCQA) (Fan and Sarfarazi, 2014). Although many papers have been written about how to calculate readmissions, this paper provides updated code that includes ICD-10 (International Classification of Diseases) codes and offers a novel and comprehensive approach using SAS® DATA step options and PROC SQL. We discuss: 1) De-identifying patient data 2) Calculating sequential admissions 3) Subsetting criteria required to report for CMS 30-day readmissions. In addition, this paper demonstrates: 1) Using the Output Delivery System (ODS) to create a labeled and de-identified data set 2) Macro variables to examine data quality 3) Summary statistics for further reporting and analysis.
Karen Wallace, Centene Corporation
Duplicates in a clinical trial or survey database can jeopardize data quality and integrity and induce biased analysis results. These complications often arise in clinical trials, meta-analyses, and registry and observational studies. Common practice to identify possible duplicates involves sensitive personal information, such as name, Social Security number (SSN), date of birth, address, and telephone number. However, access to this sensitive information is limited, and sometimes it is even restricted. As a measure of data quality control, a SAS® program was developed to identify duplicated individuals using non-sensitive information, such as age, gender, race, medical history, vital signs, and laboratory measurements. A probabilistic approach was used by calculating weights for the data elements used to identify duplicates, based on two probabilities: the probability of agreement for an element among matched pairs, and the probability of agreement purely by chance among non-matched pairs. For elements with categorical values, agreement was defined as matching pairs sharing the same value. For elements with interval values, agreement was defined as matching values within 1% of the measurement precision range. The probabilities used to compute matching element weights were estimated using an expectation-maximization (EM) algorithm. The method was then tested on survey and clinical trial data from hypertension studies.
Xiaoli Lu, VA CSPCC
Visualization is a critical part of turning data into knowledge. A customized graph is essential to make data visualization meaningful, powerful, and interpretable. Furthermore, customizing grouped data into a desired layout with specific requirements such as clusters, colors, symbols, and patterns for each group can be challenging. This paper provides a start-from-scratch, step-by-step solution for creating a customized graph for grouped data using the Graph Template Language (GTL). From analyzing the data to creating the target graph with the tools and options that are available with GTL, this paper demonstrates that GTL is a powerful and flexible tool for creating a customized, complex graph.
Elva Chen, Pharmacyclics
SAS® functions provide amazing power to your DATA step programming. Some of these functions are essential; others help you avoid writing volumes of unnecessary code. This talk covers some of the most useful SAS functions. Some of these functions might be new to you, and they will change the way you program and approach common programming tasks. The majority of the functions described in this talk work with character data. There are functions that search for strings, and others that can find and replace strings or join strings together. Still others can measure the spelling distance between two strings (useful for 'fuzzy' matching). Some of the newest and most amazing functions are not functions at all, but call routines. Did you know that you can sort values within an observation? Did you know that you can identify not only the largest or smallest value in a list of variables, but also the second-, third-, or nth-largest or smallest value? A knowledge of the functions described here will make you a much better SAS programmer.
Ron Cody, Camp Verde Associates
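A few of the functions and call routines the talk alludes to, in a short sketch (data set and variable names are hypothetical):

   data demo;
      set scores;                           /* assumes numeric variables x1-x5 */
      call sortn(of x1-x5);                 /* sorts values within the observation */
      second_high = largest(2, of x1-x5);   /* second-largest value in the list */
      second_low  = smallest(2, of x1-x5);  /* second-smallest value */
      dist = spedis('Jonson', 'Johnson');   /* spelling distance for fuzzy matching */
   run;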
SAS/ACCESS® software grants access to data in third-party database management systems (DBMS), but how do you access data in a DBMS that is not supported by SAS/ACCESS products? The introduction of the GROOVY procedure in SAS® 9.3 lets you retrieve this formerly inaccessible data through a JDBC connection. Groovy is an object-oriented, dynamic programming language executed on the Java Virtual Machine (JVM). Using Microsoft Azure HDInsight as an example, this paper demonstrates how to access and read data into a SAS data set using PROC GROOVY and a JDBC connection.
Lilyanne Zhang, SAS
The SQL procedure has a number of powerful and elegant language features for SQL users. This hands-on workshop emphasizes highly valuable and widely usable advanced programming techniques that will help users of Base SAS® harness the power of PROC SQL. Topics include using PROC SQL to identify FIRST.row, LAST.row, and Between.rows in BY-group processing; constructing and searching the contents of a value-list macro variable for a specific value; data validation operations using various integrity constraints; data summary operations to process down rows and across columns; and using the MSGLEVEL= system option and _METHOD SQL option to capture vital processing information and the algorithm selected and used by the optimizer when processing a query.
Kirk Paul Lafler, Software Intelligence Corporation
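For example, the value-list macro variable technique mentioned above can be sketched as follows:

   proc sql noprint;
      select distinct type
         into :typelist separated by ' '    /* value-list macro variable */
         from sashelp.cars;
   quit;
   %put &=typelist;    /* search the list with %index or %sysfunc(findw(...)) */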
Technology plays an integral role in every aspect of daily life. As a result, educators should leverage technology-based learning to ensure that students are provided with authentic, engaging, and meaningful learning experiences (Pringle, Dawson, and Ritzhaupt, 2015). The significance and value of computer science understanding continue to increase. A major resource that can be credited with spreading support for computer science is the site Code.org. Its mission is to enable every student in every school to have the opportunity to learn computer science (https://code.org/about). Two years ago, our mentor partnered with Code.org to conduct workshops within the Charlotte, NC area to educate teachers on how to teach computer science activities and concepts in their classrooms. We had the opportunity to assist during the workshops to provide student perspectives and opinions. As we look back on the workshops, we wondered, "How are the teachers who attended the workshops implementing the concepts they were taught?" After each workshop, a survey was distributed to the attendees to gather workshop feedback and to follow up. We collected the data from the surveys sent to participants and analyzed it using SAS® University Edition. The survey results indicated that the workshops were beneficial and that the educators had implemented concepts that they learned. We believe that computer science activity implementations will assist students across the curriculum.
Lauren Cook, University of North Carolina at Charlotte
Talazia Moore, North Carolina State University
To determine whether there is a correlation between the repetitiveness of a song's lyrics and its popularity, the top 10 songs from the Billboard Hot 100 songs chart from 2006 to 2015 were collected. Song lyrics were assessed to determine the count of the top 10 words used. Word counts were used to predict the number of weeks the song was on the chart. The prediction model was analyzed to determine the quality of the model and whether word count was a significant predictor of a song's popularity. To investigate whether song lyrics are becoming more simplistic over time, several tests were performed to see whether the average word count has been changing over the years. All analysis was completed in SAS® using various procedures.
Drew Doyle, University of Central Florida
Are you tired of copying PROC FREQ or PROC MEANS output and pasting it into your tables? Do you need to produce summary tables repeatedly? Are you spending a lot of your time generating the same summary tables for different subpopulations? This paper introduces an easy-to-use macro to generate a descriptive statistics table. The table reports counts and percentages for categorical variables, and means, standard deviations, medians, and quantiles for continuous variables. For variables with missing values, the table also includes the count and percentage missing. Customization options allow for the analysis of stratified data, specification of variable output order, and user-defined formats. In addition, this macro incorporates the SAS® Output Delivery System (ODS) to automatically produce a Rich Text Format (RTF) file, which can be further edited by a word processor for the purpose of publication.
Yuanchao Zheng, Stanford University
Jin Long, Stanford University
Maria Montez-Rath, Stanford University
UNIX and Linux SAS® administrators, have you ever been greeted by one of these statements as you walk into the office, before you have gotten your first cup of coffee? "Power outage! SAS servers are down. I cannot access my reports." Have you frantically tried to restart the SAS servers to avoid loss of productivity and missed one of the steps in the process, causing further delays while other work continues to pile up? If you have had this experience, you understand the benefit to be gained from a utility that automates the management of these multi-tiered deployments. Until recently, there was no method for automatically starting and stopping multi-tiered services in an orchestrated fashion. Instead, you had to use time-consuming manual procedures to manage SAS services. These procedures were also prone to human error, which could result in corrupted services and additional time lost debugging and resolving issues introduced by the process. To address this challenge, SAS Technical Support created the SAS Local Services Management (SAS_lsm) utility, which provides automated, orderly management of your SAS® multi-tiered deployments. The intent of this paper is to demonstrate the deployment and usage of the SAS_lsm utility. Now, go grab a coffee, and let's see how SAS_lsm can make life less chaotic.
Clifford Meyers, SAS
As you know, real-world data (RWD) provides highly valuable and practical insights. But as valuable as RWD is, it still has limitations. It is encounter-based, and we are largely blind to what happens between encounters in the health-care system. The encounters generally occur in a clinical setting that might not reflect actual patient experience. Many of the encounters are subjective interviews, observations, or self-reports rather than objective data. Information flow can be slow (even real time is not fast enough in health care anymore). And some data that could be transformative cannot be captured currently. Select Internet of Things (IoT) data can fill the gaps in our current RWD for certain key conditions and provide missing components that are key to conducting Analytics of Healthcare Things (AoHT), such as direct, objective measurements; data collected in usual patient settings rather than artificial clinical settings; data collected continuously in a patient's setting; insights that carry greater weight in regulatory and payer decision-making; and insights that lead to greater commercial value. Teradata has partnered with an IoT company whose technology generates unique data for conditions impacted by mobility or activity. This data can fill important gaps and provide new insights that can help distinguish your value in your marketplace. Join us to hear details of successful pilots that have been conducted as well as ongoing case studies.
Joy King, Teradata
Correlated data arise in many disciplines when observations are related due to clustering or repeated measurements. When modeling clustered data, hierarchical linear modeling (HLM) is a popular multilevel modeling technique that is widely used in different fields such as education and health studies (Gibson and Olejnik, 2003). A typical example of multilevel data involves students nested within classrooms that behave similarly due to shared situational factors. Ignoring this correlation might result in underestimated standard errors and inflated type I error (Raudenbush and Bryk, 2002). When modeling longitudinal data, many studies have been conducted on continuous outcomes; fewer studies have addressed discrete responses over time. These studies require models from the conditional, transitional, and marginal families (Fitzmaurice et al., 2009). Examples of such models, which enable researchers to account for the autocorrelation among repeated observations, include the generalized linear mixed model (GLMM), generalized estimating equations (GEE), alternating logistic regression (ALR), and fixed effects with conditional logit analysis. This study explores the aforementioned methods as well as several other correlated modeling options for longitudinal and hierarchical data within SAS® 9.4 using real data sets. These procedures include PROC GLIMMIX, PROC GENMOD, PROC NLMIXED, PROC GEE, PROC PHREG, and PROC MIXED.
Niloofar Ramezani, University of Northern Colorado
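A minimal sketch of one of the listed approaches, a random-intercept logistic model in PROC GLIMMIX (data set and variable names are hypothetical):

   proc glimmix data=school_data method=laplace;
      class school_id;
      model passed(event='1') = hours ses / dist=binary link=logit solution;
      random intercept / subject=school_id;   /* students nested within schools */
   run;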
Data from an extensive survey conducted by the National Center for Education Statistics (NCES) is used for predicting qualified secondary school teachers across public schools in the U.S. The sample data includes socioeconomic data at the county level, which is used as a predictor for hiring a qualified teacher. The resultant model is used to score other regions and is presented on a heat map of the U.S. The survey family of procedures that SAS® offers, such as PROC SURVEYFREQ and PROC SURVEYLOGISTIC, are used in the analyses since the data involves replicate weights. In looking at residuals from a logistic regression, since all the outcomes (observed values) are either 0 or 1, the residuals do not necessarily follow the normal distribution that is so often assumed in residual analysis. Furthermore, in dealing with survey data, the weights of the observations must be accounted for, as these affect the variance of the observations. To adjust for this, rather than looking at the difference in the observed and predicted values, the difference between the expected and actual counts is calculated using the weights on each observation and the predicted probability from the logistic model for the observation. Three types of residuals are analyzed: Pearson, Deviance, and Studentized residuals. The purpose is to identify which type of residuals best satisfies the assumption of normality when investigating residuals from a logistic regression.
Bogdan Gadidov, Kennesaw State University
Chronic obstructive pulmonary disease (COPD) is the third leading cause of death in the US. An estimated 24 million people suffer from COPD, and the associated medical cost stands at a whopping $36 billion. Besides the emotional and physical impact, a patient with COPD bears a severe economic burden to pay for the medication, and hospitals are subject to heavy penalties for high readmissions. Identifying the best medicine combinations to treat COPD benefits patients and hospitals. This paper analyzes the effectiveness of three popular drugs prescribed for COPD patients in terms of mortality rates and readmission within 30 days of discharge. The data from Cerner Health Facts consists of over 1 million anonymized patient records collected in a real-world health environment. Base SAS® is used to perform statistical analysis and data processing; readmission of patients is analyzed using the LAG function. The preliminary results show a readmission rate of 5.96% and a mortality rate of 3.3% among all patients. The odds ratios computed using logistic regression show mortality odds 2.4 times higher for patients using Symbicort than for those using Spiriva or Advair. This paper also uses text mining of social media, drug portals, and blogs to gauge the sentiments of patients using these drugs. The results obtained through sentiment analysis are then compared with the statistical analysis to determine the effectiveness of drugs prescribed to COPD patients.
Indra Kiran Chowdavarapu, Oklahoma State University
Dursun Delen, Oklahoma State University
Vivek Manikandan Damodaran, Oklahoma State University
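A minimal sketch of a LAG-based readmission flag like the one described above, with hypothetical variable names:

   proc sort data=admits;
      by patient_id admit_date;
   run;

   data readmits;
      set admits;
      by patient_id;
      prev_disch = lag(disch_date);               /* discharge date of the prior stay */
      if first.patient_id then prev_disch = .;    /* first stay has no prior discharge */
      readmit30 = (. < admit_date - prev_disch <= 30);  /* 30-day readmission flag */
   run;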
Research frequently shows that exposure to sunlight contributes to non-melanoma skin cancer, but it also shows that sunlight might protect against multiple sclerosis and breast, ovarian, prostate, and colon cancer. In my study, I explored whether mortality from skin cancer, myocardial infarction, atrial fibrillation, and stroke is associated with exposure to sunlight. I used SAS® 9.4 and RStudio to conduct the entire study. I collected mortality data, including cause of death, for Los Angeles from 2000 to 2003, along with sunlight data for Los Angeles for the same period. There are three types of sunlight in my data: global, diffuse, and direct. Data was collected at three different times of day: morning, midday, and afternoon. I used two models, a Poisson time series regression model and a logistic regression model, to investigate the association, considering one-year and two-year lags of sunlight in relation to the types of diseases. I adjusted for age, sex, race, education, temperature, and day of week. Results show that stroke is significantly associated with a one-year lag of sunlight (p<0.001). Previous epidemiological studies have found that sunlight exposure can ameliorate osteoporosis in stroke patients, and my study supports the protective effects of sunlight for stroke patients.
Wei Xiong, University of Southern California
There are many good validation tools for Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) data, such as Pinnacle 21. However, the power and customizability of SAS® make it an effective tool for validating SDTM data sets used in FDA submissions of clinical trials. This paper presents three distinct methods of using SAS to validate the transformation from Electronic Data Capture (EDC) data into CDISC SDTM format: duplicate programming, an independent SAS program used to transform EDC data, checked with PROC COMPARE; a rules checker, a SAS program that verifies specific SDTM or regulatory rules applied to SDTM SAS data sets; and transformation validation, a SAS macro that compares EDC data and SDTM using PROC FREQ to identify outliers. The three examples illustrate diverse approaches to applying SAS programs to catch errors in data standard compliance or identify inconsistencies that would otherwise be missed by general-purpose utilities. The stakes are high when preparing for an FDA submission. Catching errors in SDTM during validation prior to a submission can mean the difference between success and failure for a drug or medical device.
Sy Truong, Pharmacyclics
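The duplicate-programming method reduces to a comparison such as this sketch (library and data set names are hypothetical):

   proc compare base=sdtm.dm compare=qc.dm listall;
      id usubjid;        /* compare records by subject identifier */
   run;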
I have come up with a way to use the output of the SCAPROC procedure to produce DOT directives, which are then put through the Graphviz engine to produce a diagram. This makes it possible to produce flowcharts of SAS® code automatically. I have enhanced the charts to also show the longest-running steps, so even in a complex program with thousands of steps, a few seconds' look reveals the structure and flow of the program and where most of the time is spent. Great for documentation, benchmarking, tuning, understanding, and more.
Philip Mason, Wood Street Consultants Ltd.
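The recording side of this approach looks like the following sketch; the DOT generation is the author's own post-processing:

   proc scaproc;
      record 'flow.txt' attr expandmacros;   /* start recording step information */
   run;

   /* ... the SAS program to be diagrammed executes here ... */

   proc scaproc;
      write;                                 /* flush the recorded steps to flow.txt */
   run;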
In the pharmaceutical industry, the Clinical Data Interchange Standards Consortium's (CDISC) Study Data Tabulation Model (SDTM) is required by the US Food and Drug Administration (FDA) as the standard data structure for regulatory submission of clinical data. Manually mapping raw data to SDTM domains can be time-consuming and error-prone, considering the increasing complexity of clinical data. However, this process can be much more efficient if the raw data is collected using the Clinical Data Acquisition Standards Harmonization (CDASH) standard, allowing for automatic conversion to the SDTM data structure. This paper introduces a macro that can automatically create a SAS® program for each SDTM domain (for example, dm.sas for Demography [DM]) that maps CDASH data to SDTM data. The macro compares the attributes of CDASH raw data sets with SDTM domains to generate SAS code that performs the mapping. Each SAS program does the following: 1) sets up variables and assigns their proper order; 2) converts date and time to the ISO 8601 standard; 3) converts numeric variables to character variables; and 4) transposes the data sets from wide to long for the Findings and Events domains. This macro, which sets up the basic frame of SDTM mapping, can minimize the manual work for SAS programmers or, in some cases, completely handle simple domains without any further modification. This greatly increases the efficiency and speed of the SDTM conversion process.
Hao Xu, McDougall Scientific
Hong Chen, McDougall Scientific
It's essential that SAS® users enhance their skills to implement best-practice programming techniques when using Base SAS® software. This presentation illustrates core concepts with examples to ensure that code is readable, clearly written, understandable, structured, portable, and maintainable. Attendees learn how to apply good programming techniques including implementing naming conventions for data sets, variables, programs, and libraries; code appearance and structure using modular design, logic scenarios, controlled loops, subroutines and embedded control flow; code compatibility and portability across applications and operating platforms; developing readable code and program documentation; applying statements, options, and definitions to achieve the greatest advantage in the program environment; and implementing program generality into code to enable its continued operation with little or no modifications.
Kirk Paul Lafler, Software Intelligence Corporation
Nearly every SAS® program includes logic that causes certain code to be executed only when specific conditions are met. This is commonly done using the IF-THEN/ELSE syntax. This paper explores various ways to construct conditional SAS logic, some of which might provide advantages over the IF statement in certain situations. Topics include the SELECT statement, the IFC and IFN functions, and the CHOOSE and WHICH families of functions, as well as some more esoteric methods. We also discuss the intricacies of the subsetting IF statement and explain the difference between a regular IF and the %IF macro statement.
Joshua Horstman, Nested Loop Consulting
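Two of the constructs discussed, in a brief sketch with hypothetical variables:

   data graded;
      set scores;
      select;                                    /* alternative to IF-THEN/ELSE chains */
         when (score >= 90) grade = 'A';
         when (score >= 80) grade = 'B';
         otherwise          grade = 'F';
      end;
      bonus  = ifn(score >= 90, 500, 0);         /* numeric result chosen by condition */
      status = ifc(score >= 60, 'Pass', 'Fail'); /* character result chosen by condition */
   run;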
Soon after the advent of the SAS® hash object in SAS® 9, its early adopters realized that its potential functionality is much broader than fast table lookup for file matching. This is because, in reality, the hash object is a versatile data storage structure with a roster of standard table operations such as create, drop, insert, delete, clear, search, retrieve, update, order, and enumerate. Since it is memory-resident and its key-access operations execute in O(1) time, it runs them as fast as or faster than other canned SAS techniques, with the added bonus of not having to code around their inherent limitations. Another advantage of the hash object over the methods that existed before its implementation is its dynamic, run-time nature and its ability to handle I/O all by itself, independently of the intrinsic statements of the DATA step or DS2 program calling its methods. The hash object operations, or combinations thereof, lend themselves to diverse SAS programming functionalities well beyond the original focus on data search and retrieval. In this paper, which can be thought of as a preview of a SAS book being written by the authors, we aim to present this logical connection using the power of example.
Paul Dorfman, Dorfman Consulting
Don Henderson, Henderson Consulting Services, LLC
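The canonical lookup operation, as a minimal sketch (work.lookup and work.transactions are hypothetical):

   data matched;
      length descr $40;                          /* host variable for retrieved data */
      if _n_ = 1 then do;
         declare hash h(dataset: 'work.lookup'); /* load the lookup table into memory once */
         h.defineKey('id');
         h.defineData('descr');
         h.defineDone();
         call missing(descr);
      end;
      set work.transactions;
      if h.find() = 0;                           /* keep rows whose key is found, O(1) per lookup */
   run;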
The SAS® Macro Language gives you the power to create tools that, to a large extent, think for themselves. How often have you used a macro that required your input, and you thought to yourself, "Why do I need to provide this information when SAS® already knows it?" SAS might already know most of this information, but how do you direct your macro programs to discern the information that they need? Fortunately, there are a number of functions and tools in SAS that can intelligently enable your programs to find and use the information that they require. If you provide a variable name, SAS should know its type and length. If you provide a data set name, SAS should know its list of variables. If you provide a library or libref, SAS should know the full list of data sets that it contains. In each of these situations, functions can be used by the macro language to determine and return information. By providing a libref, functions can determine the library's physical location and the list of data sets it contains. By providing a data set, they can return the names and attributes of any of the variables that it contains. These functions can read and write data, create directories, build lists of files in a folder, and build lists of folders. Maximize your macro's intelligence; learn and use these functions.
Art Carpenter, California Occidental Consultants
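A small sketch of the pattern: a hypothetical macro that discovers a variable's type and length from nothing more than a data set name and a variable name:

   %macro varinfo(ds, var);
      %local dsid vnum rc;
      %let dsid = %sysfunc(open(&ds));               /* open the data set */
      %let vnum = %sysfunc(varnum(&dsid, &var));     /* position of the variable */
      %put NOTE: &var is type %sysfunc(vartype(&dsid, &vnum)),
           length %sysfunc(varlen(&dsid, &vnum));
      %let rc = %sysfunc(close(&dsid));
   %mend varinfo;

   %varinfo(sashelp.class, name)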
SAS® Output Delivery System (ODS) Graphics started appearing in SAS® 9.2. Collectively, these new tools were referred to as 'ODS Graphics,' 'SG Graphics,' and 'Statistical Graphics.' When first starting to use these tools, the traditional SAS/GRAPH® software user might face some significant challenges in learning the new way of doing things. This is further complicated by the lack of simple demonstrations of capabilities: most graphs in training materials and publications are rather complicated and, while useful, are not good starting examples. This paper contains many examples of very simple ways to get very simple things accomplished. Many different graphs are developed, each using only a few lines of code and data from the SASHELP data sets. The use of the SGPLOT, SGPANEL, and SGSCATTER procedures is shown. In addition, the paper addresses those situations in which the user must instead use a combination of the TEMPLATE and SGRENDER procedures to accomplish the task at hand. Most importantly, the use of the ODS Graphics Designer as a teaching tool and a generator of sample graphs and code is covered. This tool makes use of the TEMPLATE and SGRENDER procedures, generating Graphics Template Language (GTL) code. Users become productive extremely fast. The emphasis in this paper is the simplicity of the learning process; users will be able to take the generated code and run it immediately on their personal machines.
Roger Muller, Data-to-Events
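In the spirit of the paper's simple examples, two short graphs using SASHELP data:

   proc sgplot data=sashelp.class;
      scatter x=height y=weight / group=sex;   /* grouped scatter plot */
   run;

   proc sgpanel data=sashelp.cars;
      panelby origin;                          /* one cell per group */
      histogram mpg_city;
   run;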
In the pharmaceutical industry, we find ourselves having to re-run our programs repeatedly for each deliverable. These programs can be run individually in an interactive SAS® session, which enables us to review the logs as we execute the programs. We could run the individual programs in batch and open each individual log to review for unwanted log messages, such as "ERROR," "WARNING," "uninitialized," "have been converted to," and so on. Both of these approaches are fine if there are only a handful of programs to execute. But what do you do if you have hundreds of programs that need to be re-run? Do you want to open every single log and search for unwanted messages? This manual approach could take hours and is prone to accidental oversight. This paper discusses a macro that searches a specified directory and checks either all the logs in the directory, only logs with a specific naming convention, or only the files listed. The macro then produces a report that lists all the files checked and indicates whether issues were found.
Richann Watson, Experis
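The core of such a log check can be sketched in a few lines (the log path is hypothetical):

   data log_issues;
      infile 'C:\project\logs\demog.log' truncover;
      input line $char256.;
      if index(line, 'ERROR') or index(line, 'WARNING') or
         index(line, 'uninitialized') or
         index(line, 'have been converted to') then output;   /* keep offending lines */
   run;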
The LUA procedure is a relatively new SAS® procedure, having been available since SAS® 9.4. It allows for the Lua language to be used as an interface to SAS, as an alternative scripting language to the SAS macro facility. This paper compares and contrasts PROC LUA with the SAS macro facility, showing examples of approaches and highlighting the advantages and disadvantages of each.
Anand Vijayaraghavan, SAS
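A minimal PROC LUA example showing the shape of the interface, including a callback into SAS:

   proc lua;
   submit;
      -- ordinary Lua code
      for i = 1, 3 do
         print("pass " .. i)
      end
      -- run SAS code from inside Lua
      sas.submit([[proc means data=sashelp.class; run;]])
   endsubmit;
   run;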
Epidemiologists and other health scientists are often tasked with solving health problems but find collecting original data prohibitive for a multitude of reasons. For this reason, it is common to instead use secondary data such as that from emergency departments (ED) or inpatient hospital stays. In order to use some of these secondary data sets to study problems over time, it is necessary to link them together using common identifiers and still keep all the unique information about each ED visit or hospitalization. This paper discusses a method that was used to combine five years' worth of individual ED visits and five years' worth of individual hospitalizations to create a single and (much) larger data set for longitudinal analysis.
Charlotte Baker, Florida A&M University
This paper shows how to use Base SAS® to create unique datetime stamps that can be used for naming external files. These filenames provide automatic versioning for systems and are intuitive and completely sortable. In addition, they provide enhanced flexibility compared to generation data sets, which can be created by SAS® or by the operating system.
Joe DeShon, Boehringer Ingelheim Animal Health
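One way to build such a stamp, sketched with the ISO 8601 basic datetime format:

   data _null_;
      stamp = put(datetime(), B8601DT15.);               /* e.g., 20170402T143015 */
      call symputx('outfile', cats('extract_', stamp, '.csv'));
   run;
   %put &=outfile;    /* sortable, effectively unique file name */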
Clinical research study enrollment data consists of subject identifiers and enrollment dates that are used by investigators to monitor enrollment progress. Meeting study enrollment targets is critical to ensuring there will be enough data and endpoints to achieve the planned statistical power of the study. For clinical trials that do not experience heavy, nearly daily enrollment, there will be a number of dates on which no subjects were enrolled. Therefore, plots of cumulative enrollment represented by a smoothed line can give a false impression, or imprecise reading, of study enrollment. A more accurate display is a step-function plot that includes dates on which no subjects were enrolled. Rolling average plots often start by summing the data by month and creating a rolling average from the monthly sums. This session shows how to use the EXPAND procedure, along with the SQL and GPLOT procedures and the INTNX function, to create plots that display cumulative enrollment and rolling 6-month averages for each day. This includes filling in the dates with no subject enrollment and creating a rolling 6-month average for each date, which allows analysis of day-to-day variation as well as the short- and long-term impacts of changes, such as adding an enrollment center or initiatives to increase enrollment. This technique can be applied to any data that has gaps in dates, such as service history data and installation rates for a newly launched product.
Susan Schleede, University of Rochester
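The rolling-average piece might look like this sketch, assuming one observation per calendar day with zero-filled enrollment counts (SAS/ETS is required for PROC EXPAND):

   proc expand data=daily_enroll out=rolling from=day;
      id enroll_date;
      convert enrolled = avg180 / method=none transformout=(movave 180);  /* 180-day moving average */
   run;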
Computer and video games are complex these days. Events in video games are in some cases recorded automatically in text files, creating a history, or story, of game play. These event records contain countable items that can be used as data for statistics and other types of modeling. This E-Poster shows you how to statistically analyze text files of video game events using SAS®. Two games are analyzed. One is EVE Online, a massively multiplayer online role-playing spaceship game. The other is Ingress, a cell phone game that combines exercise with GPS and real-world environments. In both examples, the techniques involve parsing large amounts of text data to examine recurring patterns in the text that describe events in the game play.
Peter Timusk, Statistics Canada
We've learned a great deal about how to develop great reports and about business intelligence (BI) tools and how to use them to create reports, but have we figured out how to create true BI reports? Not every report that comes out of a BI tool provides business intelligence! In pursuit of the perfect BI report, this paper explores how we can combine the best of lessons learned about developing and running traditional reports and about applying business analytics in order to create true BI reports that deliver integrated analytics and intelligence.
Lisa Eckler, Lisa Eckler Consulting Inc.
The DATA step is the familiar and powerful data processing language in SAS® and now SAS® Viya. The DATA step's simple syntax provides row-at-a-time operations to edit, restructure, and combine data. New to the DATA step in SAS Viya are a varying-size character data type and parallel execution. Varying-size character data enables intuitive string operations that go beyond the 32KB limit of current DATA step operations. Parallel execution speeds the processing of big data by starting the DATA step on multiple machines and dividing data processing among threads on these machines. To avoid multi-threaded programming errors, the run-time environment for the DATA step is presented along with potential programming pitfalls. Come see how the DATA step in SAS Viya makes your data processing simpler and faster.
Jason Secosky, SAS
The DATA step has served SAS® programmers well over the years. Although the DATA step is handy, the new, exciting, and powerful DS2 provides a significant alternative by introducing an object-oriented programming environment. It enables users to effectively manipulate complex data and efficiently manage programming through additional data types, programming structure elements, user-defined methods, and shareable packages, as well as threaded execution. This tutorial was developed based on our experiences getting started with DS2 and learning to use it to access, manage, and share data in a scalable and standards-based way. It helps SAS users of all levels easily get started with DS2 and understand its basic functionality by practicing how to use its features.
Xue Yao, Winnipeg Regional Health Authority
Peter Eberhardt, Fernwood Consulting Group Inc.
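A first DS2 program, to show the basic structure (the computed column is illustrative):

   proc ds2;
      data work.bmi (overwrite=yes);
         dcl double bmi;                        /* DS2 declaration */
         method run();
            set sashelp.class;
            bmi = (weight / (height**2)) * 703;
         end;
      enddata;
   run;
   quit;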
Data validation plays a key role as an organization engages in a data governance initiative. Better data leads to better decisions. This applies to public schools as well as business entities. Each Local Educational Agency (LEA) in Pennsylvania reports children with disabilities to the Pennsylvania Department of Education (PDE) in compliance with IDEA (Individuals with Disabilities Education Act). PDE provides a Comparison Report to each LEA to assist in their data validation process. This Comparison Report provides counts of various categories for the most recent and previous year. LEAs use the Comparison Report to validate data submitted to PDE. This paper discusses how the Base SAS® SORT procedure and MERGE statement extract hidden information behind the counts to assist LEAs in their data validation process.
Barry Frye, Appalachia Intermediate Unit 8
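The underlying technique reduces to a match-merge with IN= flags (data set and variable names are hypothetical):

   proc sort data=current;  by lea_id category; run;
   proc sort data=previous; by lea_id category; run;

   data comparison;
      merge current (in=incur  rename=(count=cur_count))
            previous(in=inprev rename=(count=prev_count));
      by lea_id category;
      change = cur_count - prev_count;          /* year-to-year difference */
      only_one_year = not (incur and inprev);   /* category reported in just one year */
   run;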
Microsoft SharePoint is a popular web application framework and platform that is widely used for content and document management by companies and organizations. Connecting SAS® with SharePoint combines the power of these two into one. As a continuation of my SAS® Global Forum Paper 11520-2016, "Releasing the Power of SAS® into Microsoft SharePoint," this paper expands on how to implement data visualization from SAS to SharePoint. It shows users how to use SAS/GRAPH® software procedures, the Output Delivery System (ODS), and email to create and send visualization output files from SAS to a SharePoint Document Library. Several SAS code examples show how to create tables, bar charts (with PROC GCHART), line plots (with PROC SGPLOT), and maps (with PROC GMAP) from SAS to SharePoint. The paper also demonstrates how to create JavaScript-based data visualization by feeding SAS data into HTML pages on SharePoint, including examples of exporting SAS data to JSON format.
Xiaogang (Isaac) Tang, Wyndham Worldwide
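The JSON export step mentioned at the end can be as small as this sketch (the output path is hypothetical):

   proc json out='C:\site\data\class.json' pretty;
      export sashelp.class;    /* write the data set as JSON */
   run;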
The SAS® 9.4 SGPLOT procedure is a great tool for creating all types of graphs, from business graphs to complex clinical graphs. The goal for such graphs is to convey the data in a simple and direct manner with minimal distractions. But often, you need to grab the attention of a reader in the midst of a sea of data and graphs. For such cases, you need a visual that can stand out above the rest of the noise. Such visuals insert a decorative flavor into the graph to attract the eye of the reader and to encourage them to spend more time studying the visual. This presentation discusses how you can create such attention-grabbing visuals using the SGPLOT procedure.
Sanjay Matange, SAS
The Analysis Data Model (ADaM) Basic Data Structure (BDS) can be used for many analysis needs. We all know that the SAS® DATA step is a very flexible and powerful tool for data processing. In fact, the DATA step is very useful in the creation of a non-trivial BDS data set. This paper walks through a series of examples showing use of the SAS DATA step when deriving rows in BDS. These examples include creating new parameters, new time points, and changes from multiple baselines.
Sandra Minjoe
For all healthcare systems, considerable attention and resources are directed at gauging and improving patient satisfaction. Dignity Health has made considerable efforts to improve most areas of patient satisfaction. However, improving metrics around physician interaction with patients has been challenging. Failure to improve these publicly reported scores can result in reimbursement penalties, damage to Dignity's brand, and an increased risk of patient harm. One possible way to improve these scores is to better identify the physicians that present the best opportunity for positive change. Currently, the survey tool mandated by the Centers for Medicare and Medicaid Services (CMS), the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS), has three questions centered on patient experience with providers, specifically concerning listening, respect, and clarity of conversation. For purposes of relating patient satisfaction scores to physicians, Dignity Health has assigned scores based on the attending physician at discharge. A manual record review determined that this method rarely corresponds to the review itself (PPV = 20.7%, 95% CI: 9.9%-38.4%). Using a variety of SAS® tools and predictive modeling programs, we developed a logistic regression model that had better agreement with chart abstractors (PPV = 75.9%, 95% CI: 57.9%-87.8%). By attributing providers based on this predictive model, opportunities for improvement can be more accurately targeted, resulting in improved patient satisfaction and outcomes while protecting fiscal health.
Ken Ferrell, Dignity Health
SAS® has a very efficient and powerful way to get distances between an event and a customer. Using the tables and code located at http://support.sas.com/rnd/datavisualization/mapsonline/html/geocode.html#street, you can assign latitude and longitude to the addresses that you have for your events and customers. Once you have downloaded the tables from SAS and run the code to get them into SAS data sets, this paper guides you through the rest using PROC GEOCODE and the GEODIST function. This can help you determine to whom to market an event, and you can see how far a client is from one of your facilities.
Jason O'Day, US Bank
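The distance calculation itself is a single function call; a sketch with hypothetical coordinate variables:

   data distance;
      set customers;      /* assumes cust_lat, cust_long, event_lat, event_long in degrees */
      miles = geodist(cust_lat, cust_long, event_lat, event_long, 'M');   /* 'M' = statute miles */
   run;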
Creating an effective style for your graphics can make the difference between clearly conveying your message to your audience and hiding your message in a sea of lines, markers, and text. A number of books explain the concepts of effective graphics, but you need an understanding of how styles work in your environment to correctly apply those principles. The goal of this paper is to give you an in-depth discussion of how styles are applied to Output Delivery System (ODS) graphics, from the ODS style level all the way down to the graph syntax. This discussion includes information about differences in grouped versus non-grouped plots, precedence order of style application, using style references, and much more. Don't forget your scuba gear!
Dan Heath, SAS
Are you prepared if a disaster happens? If your company relies on SAS® applications to stay in business, you should have a Disaster Recovery Plan (DRP) in place. By a DRP, we mean documentation of the process to recover and protect your SAS infrastructure (SAS binaries, the operating system that is tuned to run your SAS applications, and all the pertinent data that the SAS applications require) in the event of a disaster. This paper discusses what needs to be in this plan to ensure that your SAS infrastructure not only works after it is recovered, but is able to be maintained on the recovery hardware infrastructure.
Margaret Crevar, SAS
Discover how to document your SAS® programs, data sets, and catalogs with a few lines of code that include SAS functions, macro code, and SAS metadata. Do you start every project with the best of intentions to document all of your work, and then fall short of that aspiration when deadlines loom? Learn how SAS system macro variables can provide valuable information embedded in your programs, logs, lists, catalogs, data sets and ODS output; how your programs can automatically update a processing log; and how to generate two different types of codebooks.
Louise Hadden, Abt Associates
Roberta Glass, Abt Associates
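A few of the system macro variables the paper draws on, written to the log:

   %put NOTE: Program run by &sysuserid on &sysdate9 at &systime;
   %put NOTE: SAS release &sysvlong under &sysscpl;
   %put NOTE: Most recently created data set: &syslast;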
Learn neat new (and not so new) methods for joining text, graphs, and tables in a single document. This paper shows how you can create a report that includes all three with a single solution: SAS®. The text portion is taken from a Microsoft Word document and joined with output from the GPLOT and REPORT procedures.
Ben Cochran, The Bedford Group, Inc.
Some data is best visualized in a polar orientation, particularly when the data is directional or cyclical. Although the SG procedures and Graph Template Language (GTL) do not directly support polar coordinates, they are quite capable of drawing such graphs with a little bit of data processing. We demonstrate how to convert your data from polar coordinates to Cartesian coordinates and use the power of SG procedures to create graphs that retain the polar nature of your data. Stop going around in circles: let us show you the way out with SG procedures!
Prashant Hebbar, SAS
Sanjay Matange, SAS
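The conversion at the heart of the approach, as a minimal sketch (r and theta are hypothetical variable names, theta in degrees):

   data cartesian;
      set polar;
      x = r * cos(theta * constant('pi') / 180);   /* polar to Cartesian */
      y = r * sin(theta * constant('pi') / 180);
   run;

   proc sgplot data=cartesian aspect=1;            /* equal axes keep circles round */
      series x=x y=y;
   run;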
The Base SAS® 9.4 Output Delivery System (ODS) EPUB destination enables users to deliver SAS® reports as e-books on Apple mobile devices. ODS EPUB e-books are truly mobile: you don't need an Internet connection to read them. Just install Apple's free iBooks app, and you're good to go. This paper shows you how to create an e-book with ODS EPUB and sideload it onto your Apple device. You will learn new SAS® 9.4 techniques for including text, images, audio, and video in your ODS EPUB e-books. You will understand how to customize your e-book's table of contents (TOC) so that readers can easily navigate the e-book. And you will learn how to modify the ODS EPUB style to create specialized presentation effects. This paper provides beginning to intermediate instruction for writing e-books with ODS EPUB. Please bring your iPad, iPhone, or iPod to the presentation so that you can download and read the examples.
David Kelley, SAS
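The destination follows the usual ODS open/close pattern; a minimal sketch:

   ods epub file='example.epub';      /* open the e-book destination */
   proc print data=sashelp.class;
   run;
   ods epub close;                    /* finish writing the .epub file */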
The DOCUMENT procedure is a little-known procedure that can save you vast amounts of time and effort when managing the output of your SAS® programming efforts. This procedure is deeply associated with the mechanism by which SAS controls output in the Output Delivery System (ODS). Have you ever wished you didn't have to modify and rerun the report-generating program every time there was some tweak in the desired report? PROC DOCUMENT enables you to store one version of the report as an ODS document object and then call it out in many different output forms, such as PDF, HTML, listing, RTF, and so on, without rerunning the code. Have you ever wished you could extract those pages of the output that apply to certain BY variables such as State, StudentName, or CarModel? PROC DOCUMENT provides WHERE capabilities to extract them. Do you want to customize the table of contents that assorted SAS procedures produce when you make frames for the table of contents with HTML, or use the facilities available for PDF? PROC DOCUMENT enables you to get to the inner workings of ODS and manipulate them. This paper addresses PROC DOCUMENT from the viewpoint of end results, rather than providing a complete technical review of how to do the task at hand. The emphasis is on the benefits of using the procedure, not on detailed mechanics.
Roger Muller, Data-to-Events
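The store-once, replay-anywhere pattern can be sketched as:

   ods document name=work.rpt(write);   /* capture the output once */
   proc means data=sashelp.class;
      class sex;
   run;
   ods document close;

   ods pdf file='report.pdf';           /* later: replay without rerunning PROC MEANS */
   proc document name=work.rpt;
      replay;
   run;
   quit;
   ods pdf close;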
Finding daylight saving time (DST) is a common task when manipulating time series data. The dates on which daylight saving time begins and ends change every year. If SAS® programmers depend on manually entering the value of daylight saving time in their programs, maintenance of the program becomes tedious. Using a SAS function makes finding the value easy. This paper discusses several ways to capture and use daylight saving time.
Chao-Ying Hsieh, Southern Company Services, Inc.
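One such function is NWKDOM; under the U.S. rules in effect since 2007, the DST boundaries can be computed like this:

   data dst_dates;
      do year = 2016 to 2018;
         dst_start = nwkdom(2, 1, 3, year);    /* 2nd Sunday (1=Sunday) of March */
         dst_end   = nwkdom(1, 1, 11, year);   /* 1st Sunday of November */
         output;
      end;
      format dst_start dst_end date9.;
   run;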
Session 1281-2017: Finding National Best Bid and Best Offer: Quote by Quote
U.S. stock exchanges (currently there are 12) are tracked in real time via the Consolidated Trade System (CTS) and the Consolidated Quote System (CQS). CQS contains every updated quote from each of these exchanges, covering some 8,500 stock tickers. It provides the basis by which brokers can honor their fiduciary obligation to investors to execute transactions at the best price, that is, at the National Best Bid or Best Offer (NBBO). With the advent of electronic exchanges and high-frequency trading (timestamps are published to the nanosecond), data set size (approaching 1 billion quotes requiring 80 gigabytes of storage for a normal trading day) has become a major operational consideration for market behavior researchers re-creating NBBO values from quotes. This presentation demonstrates a straightforward use of hash tables for tracking constantly changing quotes for each ticker/exchange combination to provide the NBBO for each ticker at each time point in the trading day.
Mark Keintz, Wharton Research Data Services
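A simplified sketch of the hash technique, tracking the latest quote per ticker/exchange and scanning a small illustrative list of exchange codes (real CQS data has 12 exchanges and different variable names):

   data nbbo(keep=symbol time best_bid best_ask);
      array exchs[3] $1 _temporary_ ('A' 'B' 'C');   /* illustrative exchange codes */
      if _n_ = 1 then do;
         declare hash q();                  /* latest quote per ticker/exchange */
         q.defineKey('symbol', 'exch');
         q.defineData('bid', 'ask');
         q.defineDone();
      end;
      set quotes;                           /* symbol exch bid ask time, in time order */
      q.replace();                          /* insert or update this exchange's quote */
      best_bid = .;
      best_ask = .;
      do i = 1 to dim(exchs);               /* scan exchanges for the current best */
         exch = exchs[i];
         if q.find() = 0 then do;           /* find() fills bid/ask from the hash */
            best_bid = max(best_bid, bid);
            best_ask = min(best_ask, ask);
         end;
      end;
   run;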
This paper discusses format enumeration (via the DICTIONARY.FORMATS view) and the new FMTINFO function that gives information about a format, such as whether it is a date or currency format.
Richard Langston, SAS
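A small sketch of both features; the FMTINFO call assumes a recent SAS 9.4 release that includes the function:

   proc sql;
      select fmtname, source            /* enumerate available formats */
         from dictionary.formats
         where fmtname like 'DAT%';
   quit;

   data _null_;
      cat = fmtinfo('DATE', 'cat');     /* category of the DATE format, e.g., date */
      put cat=;
   run;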
The macro language is both powerful and flexible. With this power, however, comes complexity, and this complexity often makes the language more difficult to learn and use. Fortunately, one of the key elements of the macro language is its use of macro variables, and these are easy to learn and easy to use. You can create macro variables using a number of different techniques and statements. However, the five most commonly used methods are not only the most useful, but also among the easiest to master. Since macro variables are used in so many ways within the macro language, learning how they are created can also serve as an excellent introduction to the language itself. These methods include: 1) the %LET statement; 2) macro parameters (named and positional); 3) the iterative %DO statement; 4) using the INTO clause in PROC SQL; and 5) using the CALL SYMPUTX routine.
Art Carpenter, California Occidental Consultants
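The five methods, compressed into one illustrative sketch:

   %let cutoff = 2017;                             /* 1: %LET */

   %macro rpt(dsn, year=&cutoff);                  /* 2: positional and named parameters */
      %do i = 1 %to 3;                             /* 3: iterative %DO index variable */
         %put Running &dsn for &year, pass &i;
      %end;
   %mend rpt;

   proc sql noprint;
      select count(*) into :nobs trimmed           /* 4: INTO clause */
         from sashelp.class;
   quit;

   data _null_;
      call symputx('today', put(date(), date9.));  /* 5: CALL SYMPUTX */
   run;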
Session 1422-2017: Flags Flying: Avoiding Consistently Penalized Defenses in NFL Fantasy Football
In fantasy football, it is a relatively common strategy to rotate which team's defense a player uses based on some combination of favorable/unfavorable player matchups, recent performance, and projection of expected points. However, there is danger in this strategy because defensive scoring volatility is high, and any team can turn in a statistically bad performance in a given week. This paper uses data mining techniques to identify which National Football League (NFL) teams give up high numbers of defensive penalties on a week-to-week basis, and to what degree those high-penalty games correlate with poor team defensive fantasy scores. Examining penalty counts and penalty yards allowed, we can narrow down which teams are consistently hurt by poor technique and measure the correlation between games with high penalty totals and the corresponding fantasy football scores. By doing so, we seek to find which teams should be avoided in fantasy football because of their likelihood of poor performance.
Robert Silverman, Franklin & Marshall College
Formats can be used for more than just making your data look nice. They can be used as in-memory lookup tables and can help you create data-driven code. This paper shows you how to build a format from a data set, how to write a format out as a data set, and how to use formats to make programs data driven. Examples are provided.
Anita Measey, BMO Financial Group
Lei Sun, BMO Financial Group
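A minimal sketch of the first technique, assuming hypothetical names: build a format from a data set with CNTLIN= and use it as an in-memory lookup via PUT.

data ctrl;
   retain fmtname '$prod' type 'C';       /* control data set for PROC FORMAT */
   input start $ label $;
datalines;
A01 Widgets
B02 Gadgets
;
proc format cntlin=ctrl;
run;

data priced;
   set sales;                             /* assumed to contain product_code  */
   product_name = put(product_code, $prod.);   /* in-memory table lookup     */
run;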
Innovation in teaching and assessment has become critical for many reasons. This is especially true in the fields of data science and big data analytics. Reasons range from the need to significantly improve the development of soft skills (as reported in an e-skills UK and SAS® joint report from November 2014), to the rapidly changing software standards of products used by students, to the rapidly increasing range of functionality and product set, to the need to develop lifelong learning skills to learn new software and functionality. And these are just a few of the reasons. In some educational institutions, it is easy to be extremely innovative. However, in many institutions and countries, there are numerous constraints on the levels of innovation that can be implemented. This presentation captures the author's developing pedagogic practice at the University of Derby. He suggests fundamental changes to the classic approaches to teaching and assessing data science and big data analytics. These changes have resulted in significant improvements in student engagement, achievement, and soft skills. Improvements are illustrated by innovations in teaching SAS to first-year students and teaching IBM Bluemix and Watson Analytics to final-year students. Students have successfully developed both technical and soft skills and experienced excellent levels of achievement.
Richard Self, University of Derby
In the quest for valuable analytics, access to business data through message queues provides near real-time access to the entire data life cycle. This in turn enables our analytical models to perform accurately. What does the item a user temporarily put in the shopping basket indicate, and what can be done to motivate the user? How do you recover the user who has now unsubscribed, given that the user had previously unsubscribed and re-subscribed quickly? User behavior can be captured completely and efficiently using a message queue, which causes minimal load on production systems and allows for distributed environments. There are some technical issues encountered when attempting to populate a data warehouse using events from a message queue. The presentation outlines solutions to the following issues: the message queue connection, how to ensure that messages aren't lost in transit, and how to efficiently process messages with SAS®; message definition and metadata, and how to react to changes in message structure; data architecture and which data architecture is appropriate for storing message data and other business data; late arrival of messages and how late-arriving data can be loaded into slowly changing dimensions; and analytical processing and how transactional message data can be reformatted for analytical modeling. Ultimately, populating a data warehouse with message queue data can require less development than accessing source databases; however, a robust architecture is essential.
Bronwen Fairbairn, Collection House Group
Having crossed the spectrum from an epidemiologist and researcher (where ad hoc is a way of life and where research is the main focus) to a SAS® programmer (writing reusable code for automation and batch jobs, which require no manual interventions), I have learned a few things that I wish I had known as a researcher. These things would not only have helped me to be a better SAS programmer, but they also would have saved me time and effort as a researcher by enabling me to have well-organized, accurate code (that I didn't accidentally remove) and code that would work when I ran it again on another date. This poster presents five SAS tips that are common practice among SAS programmers. I provide researchers who use SAS with tips that are handy and useful, and I provide code (where applicable) that they can try out at home. Using the tips provided will make any SAS programmer smile when they are presented with your code (not guaranteed, but your results should not vary by using these tips).
Crystal Carel, Baylor Scott & White Health
Tracking gains or losses from the purchase and sale of diverse equity holdings depends in part on whether stocks sold are assumed to be from the earliest lots acquired (a first-in, first-out queue, or FIFO queue) or the latest lots acquired (a last-in, first-out queue, or LIFO queue). Other inventory tracking applications have a similar need for application of either FIFO or LIFO rules. This presentation shows how a collection of simple ordered hash objects, in combination with a hash-of-hashes, is a made-to-order technique for easy data-step implementation of FIFO, LIFO, and other less likely rules (for example, HIFO [highest-in, first-out] and LOFO [lowest-in, first-out]).
Mark Keintz, Wharton Research Data Services
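A minimal sketch of the FIFO idea using a hash object keyed by an increasing sequence number (names hypothetical; the paper's hash-of-hashes, one queue per ticker, is omitted for brevity). A LIFO queue would consume from the tail instead of the head.

data gains(keep=side qty price gain);
   if _n_ = 1 then do;
      declare hash lots();                /* the queue: integer seq keys        */
      lots.definekey('seq');
      lots.definedata('lotqty','lotcost');
      lots.definedone();
   end;
   retain head 1 tail 0;
   set trades;                            /* assumed: side ('B'/'S'), qty, price */
   gain = 0;
   if side = 'B' then do;                 /* enqueue the purchased lot           */
      tail + 1;
      seq = tail; lotqty = qty; lotcost = price;
      rc = lots.add();
   end;
   else do;                               /* sell: consume oldest lots (FIFO)    */
      to_sell = qty;
      do while (to_sell > 0 and head <= tail);
         rc = lots.find(key: head);       /* loads lotqty, lotcost               */
         take = min(to_sell, lotqty);
         gain + take * (price - lotcost);
         to_sell = to_sell - take;
         if take = lotqty then do;        /* lot exhausted: advance the head     */
            rc = lots.remove(key: head);
            head + 1;
         end;
         else do;                         /* partial sale: shrink the lot        */
            lotqty = lotqty - take;
            rc = lots.replace(key: head, data: lotqty, data: lotcost);
         end;
      end;
   end;
run;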
When creating statistical models that include multiple covariates (for example, Cox proportional hazards models or multiple linear regression), it is important to identify which variables are categorical and which are continuous for proper analysis and interpretation in SAS®. Categorical variables, regardless of SAS data type, should appear in the MODEL statement and also in a CLASS statement. In larger models containing many continuous or categorical variables, it is easy to overlook variables that should be added to the CLASS statement. To solve this problem, we have created a macro that uses simple input from the model variables, with PROC CONTENTS and additional logic checks, to create the necessary CLASS statement and to run the desired model. With this macro, variables are evaluated on multiple conditions to see whether they should be considered class variables. Then, they are added automatically to the CLASS statement.
Erica Goodrich, Brigham and Women's Hospital
Daniel Sturgeon, Brigham and Women's Hospital
Kathryn Schurr, Quest Diagnostics
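A simplified sketch of the same idea (not the authors' macro): treat character variables among the covariates as CLASS variables, using metadata from DICTIONARY.COLUMNS. It assumes a one-level data set name in WORK; all names are hypothetical.

%macro auto_class(data=, y=, xvars=);
   %local classvars;
   proc sql noprint;
      select name into :classvars separated by ' '
      from dictionary.columns
      where libname = 'WORK'
        and memname = %upcase("&data")
        and type = 'char'                              /* crude class test  */
        and indexw(%upcase("&xvars"), upcase(name)) > 0;
   quit;
   proc glm data=&data;
      %if %length(&classvars) %then %do;
         class &classvars;
      %end;
      model &y = &xvars / solution;
   run; quit;
%mend auto_class;

%auto_class(data=mydata, y=outcome, xvars=age sex_cat treatment_arm)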
When you look at examples of the REPORT procedure, you see code that tests _BREAK_ and _RBREAK_, but you wonder: what's the breakdown of the COMPUTE block? And sometimes you need more than one break line on a report, or you need a customized or adjusted number at the break. Everything in PROC REPORT that is advanced seems to involve a COMPUTE block. This paper provides examples of advanced PROC REPORT output that uses _BREAK_ and _RBREAK_ to customize the extra break lines that you can request with PROC REPORT. Examples include how to get custom percentages with PROC REPORT, how to get multiple break lines at the bottom of the report, how to customize break lines, and how to customize LINE statement output. This presentation is aimed at the intermediate to advanced report writer who knows something about PROC REPORT but wants to get the breakdown of how to do more with PROC REPORT and the COMPUTE block.
Cynthia Zender, SAS
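A small sketch of customized break lines, assuming sashelp.class (a simplified flavor of the techniques in the paper):

proc report data=sashelp.class nowd;
   column sex name height;
   define sex    / order;
   define height / analysis mean format=6.1;
   break after sex / summarize style={font_weight=bold};
   compute after sex;
      name = 'Mean';                  /* customize the group break line  */
   endcomp;
   rbreak after / summarize;
   compute after;
      name = 'Overall';               /* customize the report break line */
   endcomp;
run;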
This paper shows how you can reduce the computing footprint of your SAS® applications without compromising your end products. The paper presents the 15 axioms of going green with your SAS applications. The axioms are proven, real-world techniques for reducing the computer resources used by your SAS programs. When you follow these axioms, your programs run faster, use less network bandwidth, use fewer desktop or shared server computing resources, and create more compact SAS data sets.
Michael Raithel, Westat
Many SAS® users are working across multiple platforms, commonly combining Microsoft Windows and UNIX environments. Often, SAS code developed on one platform (for example, on a PC) might not work on another platform (for example, on UNIX). Portability is not just working across multi-platform environments; it is also about making programs easier to use across projects, across companies, or across clients and vendors. This paper examines some good programming practices to address common issues that occur when you work across SAS on a PC and SAS on UNIX. They include: 1) avoid explicitly defining file paths in LIBNAME, FILENAME, and %INCLUDE statements that require platform-specific syntax such as forward slash (in UNIX) or backslash (in PC SAS); 2) avoid using X commands in SAS code to execute operating system statements, because the commands they invoke are platform-specific; 3) use the appropriate SAS rounding function for numeric variables to avoid different results when dealing with 64-bit operating systems and 32-bit systems. The difference between rounding before or after calculations and derivations is discussed; 4) develop portable SAS code to import or export Microsoft Excel spreadsheets across PC SAS and UNIX SAS, especially when dealing with multiple worksheets within one Excel file; and 5) use SAS® Enterprise Guide® to access and run PC SAS programs in UNIX effectively.
James Zhao, Merck & Co. Inc.
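As a small sketch of point 4: DBMS=XLSX (requiring SAS/ACCESS® Interface to PC Files) reads a named worksheet identically in Windows and UNIX sessions. The path is hypothetical.

proc import datafile="/data/project/lab_results.xlsx"
     out=work.lab_results dbms=xlsx replace;
     sheet="Visit1";        /* read one worksheet from a multi-sheet file */
     getnames=yes;
run;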
Because many SAS® users either work for or own companies that house big data, the threat that malicious software poses becomes even more extreme. Malicious software, often abbreviated as malware, includes many different classifications, ways of infection, and methods of attack. This E-Poster highlights the types of malware, detection strategies, and removal methods. It provides guidelines to secure essential assets and prevent future malware breaches.
Ryan Lafler
Would you like to be more confident in producing graphs and figures? Do you understand the differences between the OVERLAY, GRIDDED, LATTICE, DATAPANEL, and DATALATTICE layouts? Finally, would you like to learn the fundamental Graph Template Language (GTL) methods in a relaxed environment that fosters questions? Great! This topic is for you! In this hands-on workshop, you are guided through the fundamental aspects of GTL, and you can try fun and challenging SAS® graphics exercises to enable you to more easily retain what you have learned.
Kriss Harris
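A minimal sketch of the GTL workflow covered here: define a LAYOUT OVERLAY template with PROC TEMPLATE, then render it with PROC SGRENDER.

proc template;
   define statgraph scatterfit;
      begingraph;
         entrytitle 'Height by Weight';
         layout overlay;                        /* one of the layouts discussed */
            scatterplot x=weight y=height;
            regressionplot x=weight y=height;   /* overlay a fit line           */
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=sashelp.class template=scatterfit;
run;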
Session SAS2003-2017:
Hands-On Workshop: Macro Coding by Example
This hands-on workshop explores the power of the SAS® macro facility, a text substitution facility for extending and customizing SAS programs. Examples range from simple macro variables to advanced macro programs. As a participant, you add macro syntax to existing programs to dynamically enhance your programming experience.
Michele Ensor, SAS
Do you need to add annotations to your graphs? Do you need to specify your own colors on the graph? Would you like to add Unicode characters to your graph, or would you like to create templates that can also be used by non-programmers to produce the required figures? Great, then this topic is for you! In this hands-on workshop, you are guided through the more advanced features of the Graph Template Language (GTL). There are also fun and challenging SAS® graphics exercises to enable you to more easily retain what you have learned.
Kriss Harris
When modeling time series data, we often use a LAG of the dependent variable. The LAG function works great for this, until you try to predict out into the future and need the model's predicted value from one record as an independent value for a future record. This paper examines exactly how the LAG function works, and explains why it doesn't work in this case. It also explains how to create a hash object that will accomplish a LAG of any value, how to load the initial data, how to add predicted values to the hash object, and how to extract those values when needed as an independent variable for future observations.
Andrea Wainwright-Zimmerman, Experis
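A minimal sketch of the hash-based "lag" that can also see predicted values (all names, and the fitted coefficients, are hypothetical):

data forecast;
   if _n_ = 1 then do;
      declare hash h();
      h.definekey('t');                   /* time index                          */
      h.definedata('y');                  /* actual or predicted value           */
      h.definedone();
   end;
   set series_with_future;                /* assumed: t, y_actual (missing beyond history) */
   rc = h.find(key: t - 1);               /* retrieve the lag, actual or predicted */
   if rc = 0 then y_lag1 = y;
   else y_lag1 = .;
   if not missing(y_actual) then y = y_actual;      /* within history            */
   else y = 0.2 + 0.9 * y_lag1;                     /* assumed fitted model      */
   rc = h.replace();                      /* store y so the next row can lag it  */
run;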
Data processing can sometimes require complex logic to match and rank record associations across events. This paper presents an efficient solution to generating these complex associations using the DATA step and data hash objects. The solution applies to multiple business needs including subsequent purchases, repayment of loan advance, or hospital readmits. The logic demonstrates how to construct a hash process to identify a qualifying initial event and append linking information with various rank and analysis factors, through the example of a specific use case of the process.
John Schmitz, Luminare Data LLC
Secondary use of administrative claims data, EHRs and EMRs, registry data, and other data sources within the health data ecosystem provides rich opportunity and potential to study topics ranging from public health surveillance to comparative effectiveness research. Data sourced from individual sites can be limited in their scope, coverage, and statistical power. Sharing and pooling data from multiple sites and sources, however, present administrative, governance, analytic, and patient-privacy challenges. Distributed data networks represent a paradigm shift in health-care data sharing. They have evolved at a critical time when big data and patient privacy are often competing priorities. A distributed data network is one that has no central repository of data. Data reside behind the firewall of each data-contributing partner in a network. Each partner transforms its source data in accordance with a common data model and allows indirect access to data through a standard query approach using flexibly designed informatics tools. This presentation discusses how distributed data networks have matured to make important contributions to the health-care data ecosystem and the evolving Learning Healthcare System. The presentation focuses on 1) the distributed data network and its purpose, concept, guiding principles, and benefits; 2) common data models and their concepts, designs, and benefits; 3) analytic tool development and its design and implementation considerations; and 4) analytic challenges.
Jennifer Popovic, Harvard Medical School / Harvard Pilgrim Health Care Institute
Heat maps use colors to communicate numeric data by varying the underlying values that represent red, green, and blue (RGB) as a linear function of the data. You can use heat maps to display spatial data, plot big data sets, and enhance tables. You can use colors on the spectrum from blue to red to show population density in a US map. In fields such as epidemiology and sociology, colors and maps are used to show spatial data, such as how rates of disease or crime vary with location. With big data sets, patterns that you would hope to see in scatter plots are hidden in dense clouds of points. In contrast, patterns in heat maps are clear, because colors are used to display the frequency of observations in each cell of the graph. Heat maps also make tables easier to interpret. For example, when displaying a correlation matrix, you can vary the background color from white to red to correspond to the absolute correlation range from 0 to 1. You can shade the cell behind a value, or you can replace the table with a shaded grid. This paper shows you how to make a variety of heat maps by using PROC SGPLOT, the Graph Template Language, and SG annotation.
Warren Kuhfeld, SAS
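A short sketch of the big-data use case: a binned heat map of a dense scatter with PROC SGPLOT (the HEATMAP statement requires a recent SAS 9.4 release).

proc sgplot data=sashelp.heart;
   heatmap x=weight y=height / colormodel=(white yellow red);  /* color = cell frequency */
   gradlegend;
run;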
Session 1512-2017:
Hot Topics for Analytics in Higher Education
This panel discusses a wide range of topics related to analytics in higher education. Panelists are from diverse institutions and represent academic research, information technology, and institutional research. Challenges related to data acquisition and quality, system support, and meeting customer needs are covered. Topics such as effective dashboards and reporting, big data, predictive analytics, and more are on the agenda.
Stephanie Thompson, Datamum
Glenn James, Tennessee Tech University
Robert Jackson, University of Memphis
Sean Mulvenon, University of Arkansas
Carlos Piemonti, University of Central Florida
Richard Dirmyer, Rochester Institute of Technology
Another year implementing, validating, securing, optimizing, migrating, and adopting the Hadoop platform. What were the top 10 accomplishments with Hadoop over the last year? We also review issues, concerns, and resolutions from the past year. We discuss where implementations are and some best practices for moving forward with Hadoop and SAS® releases.
Howard Plemmons, SAS
Mauro Cazzari, SAS
Investors usually trade stocks or exchange-traded funds (ETFs) based on a methodology, such as a theory, a model, or a specific chart pattern. There are more than 10,000 securities listed on the US stock market. Picking the right one based on a methodology from so many candidates is usually a big challenge. This paper presents a methodology based on the CANSLIM method and the momentum trading (MT) method. We often hear of the cup and handle shape (C&H), double bottoms and multiple bottoms (MB), support and resistance lines (SRL), market direction (MD), fundamental analyses (FA), and technical analyses (TA). Those are all covered by CANSLIM. MT is a trading method based on stock moving direction or momentum. Both methods are easy to learn but difficult to apply without an appropriate tool. The brokers' application systems usually cannot provide such filtering due to its complexity. For example, for C&H, where is the handle located? For the MB, where is the last bottom you should trade at? Now, the challenging task can be fulfilled through SAS®. This paper presents methods for applying the logic and presenting the results graphically through SAS. All SAS users, especially those who work directly on capital market business, can benefit from reading this document to achieve their investment goals. Much of the programming logic can also be adopted in SAS finance packages for clients.
Brian Shen, Merlin Clinical Service LLC
XML documents are becoming increasingly popular for transporting data between different operating systems. In the pharmaceutical industry, the Food and Drug Administration (FDA) requires pharmaceutical companies to submit certain types of data in XML format. This paper provides insights into XML documents and summarizes different methods of importing and exporting XML documents with SAS®, including: using the XML LIBNAME engine to translate between the XML markup and the SAS format; creating an XML Map and using the XMLV2 LIBNAME engine to read in XML documents and create SAS data sets; and using Clinical Data Interchange Standards Consortium (CDISC) procedures to import and export XML documents. An example of importing OpenClinica data into SAS by implementing these methods is provided.
Fei Wang, McDougall Scientific
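A minimal sketch of the two LIBNAME engine approaches; file paths and the map file are hypothetical.

libname rawxml xmlv2 "/data/study/odm_export.xml"
        xmlmap="/data/study/odm_export.map" access=readonly;

proc copy in=rawxml out=work;            /* materialize the mapped tables */
run;

libname outxml xml "/data/study/results.xml";   /* export a data set as XML */
data outxml.demographics;
   set work.demographics;
run;
libname outxml clear;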
Do you need to create a format instantly? Does the format have a lot of labels, and it would take a long time to type in all the codes and labels by hand? Sometimes, a SAS® programmer needs to create a user-defined format for hundreds or thousands of codes and needs an easy way to accomplish this without having to type in all of the codes. SAS provides a way to create a user-defined format without having to type in any codes. If the codes and labels are in a text file, SAS data set, Excel file, or in any file that can be converted to a SAS data set, then a SAS user-defined format can be created on the fly. The CNTLIN= option of PROC FORMAT allows a user to create a user-defined format or informat from raw data or from a SAS file. This paper demonstrates how to create two user-defined formats instantly from a raw text file on our Census Bureau website. It explains how to use these user-defined formats for the final report and final output data set from PROC TABULATE. The paper focuses on the CNTLIN= option of PROC FORMAT, not the CNTLOUT= option.
Christopher Boniface, U.S. Census Bureau
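A condensed sketch of the technique: read code/label pairs from a text file into a CNTLIN= control data set and use the resulting format in PROC TABULATE. The file layout and data set names are hypothetical.

data cntl;
   infile '/data/county_codes.txt' dlm=',' truncover;
   length fmtname $8 type $1 start $5 label $40;
   retain fmtname '$cnty' type 'C';       /* required control variables */
   input start label;
run;

proc format cntlin=cntl;                  /* build the format on the fly */
run;

proc tabulate data=survey;
   class county;
   format county $cnty.;                  /* use the new format directly */
   table county, n;
run;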
This paper builds on the knowledge gained in an introductory presentation on SAS® ODS Graphics. The capabilities in ODS Graphics grow with every release as both new paradigms and smaller tweaks are introduced. After talking with the ODS developers, a selection of the many wonderful capabilities was made. This paper looks at that selection of both types of capabilities and provides the reader with more tools for their belt. Visualization of data is an important part of telling the story seen in the data. And while the standards and defaults in ODS Graphics are very well done, sometimes the user has specific nuances for characters in the story or additional plot lines they want to incorporate. Almost any possibility, from drama to comedy to mystery, is available in ODS Graphics if you know how. We explore tables, annotation and changing attributes, as well as the BLOCK plot. Any user of Base SAS on any platform will find great value in the SAS ODS Graphics procedures. Some experience with these procedures is assumed, but not required.
Chuck Kincaid, Experis Business Analytics
This presentation teaches the audience how to use ODS Graphics. Now part of Base SAS®, ODS Graphics are a great way to easily create clear graphics that enable any user to tell their story well. SGPLOT and SGPANEL are two of the procedures that can be used to produce powerful graphics that used to require a lot of work. The core of the procedures is explained, as well as some of the many options available. Furthermore, we explore the ways to combine the individual statements to make more complex graphics that tell the story better. Any user of Base SAS on any platform will find great value in the SAS ODS Graphics procedures.
Chuck Kincaid, Experis Business Analytics
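Two tiny sketches of the procedures discussed: a single-cell SGPLOT graph and a paneled SGPANEL version of the same plot.

proc sgplot data=sashelp.class;
   scatter x=weight y=height / group=sex;
   reg x=weight y=height;            /* overlay a regression fit */
run;

proc sgpanel data=sashelp.class;
   panelby sex / columns=2;          /* one cell per BY value    */
   scatter x=weight y=height;
run;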
As technology expands, we have the need to create programs that can be handed off to clients, to regulatory agencies, to parent companies, or to other projects, and handed off with little or no modification by the recipient. Minimizing modification by the recipient often requires the program itself to self-modify. To some extent the program must be aware of its own operating environment and what it needs to do to adapt to it. There are a great many tools available to the SAS® programmer that will allow the program to self-adjust to its own surroundings. These include location-detection routines, batch files based on folder contents, the ability to detect the version and location of SAS, programs that discern and adjust to the current operating system and the corresponding folder structure, the use of automatic and user defined environmental variables, and macro functions that use and modify system information. Need to create a portable program? We can hand you the tools.
Art Carpenter, California Occidental Consultants
Mary Rosenbloom, Alcon, a Novartis Division
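A small sketch of one self-adjusting technique named above, environment detection via automatic macro variables; the folder names are hypothetical.

%macro set_paths;
   %global basepath;
   %if &sysscp = WIN %then %let basepath = C:\projects\study1;
   %else %let basepath = /projects/study1;
   %put NOTE: Running SAS &sysvlong under &sysscpl (&sysscp);
%mend set_paths;
%set_paths

libname study "&basepath";    /* the program adapts to its surroundings */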
Data with a location component is naturally displayed on a map. Base SAS® 9.4 provides libraries of map data sets to assist in creating these images. Sometimes, a particular sub-region is all that needs to be displayed. SAS/GRAPH® software can create a new subset of the map using the GPROJECT procedure minimum and maximum latitude and longitude options. However, this method is capable only of cutting out a rectangular area. This paper presents a polygon clipping algorithm that can be used to create arbitrarily shaped custom map regions. Maps are nothing more than sets of polygons, defined by sets of border points. Here, a custom polygon shape overlays the map polygons and saves the intersection of the two. The DATA step hash object is used for easier bookkeeping of the added and deleted points needed to maintain the correct shape of the clipped polygons.
Seth Hoffman, GEICO
Throughout history, the phrase know thyself has been the aspiration of many. The trend of wearable technologies has certainly provided the opportunity to collect personal data. These technologies enable individuals to know thyself on a more sophisticated level. Specifically, wearable technologies that can track a patient's medical profile in a web-based environment, such as continuous blood glucose monitors, are saving lives. The main goal for diabetics is to replicate the functions of the pancreas in a manner that allows them to live a normal, functioning lifestyle. Many diabetics have access to a visual analytics website to track their blood glucose readings. However, these sites are often unreadable and overloaded with information. By analyzing the readings from the glucose monitor and insulin pump with SAS®, diabetics can parse their own information into simpler, more readable graphs. This presentation demonstrates the ease of creating these visualizations. Not only is this beneficial for diabetics, but also for the doctors who prescribe the necessary basal and bolus levels of insulin for a patient's insulin pump.
Taylor Larkin, The University of Alabama
Denise McManus, The University of Alabama
When analyzing data with SAS®, we often use the SAS DATA step and the SQL procedure to explore and manipulate data. Though they are both useful tools in SAS, many SAS users do not fully understand their differences, advantages, and disadvantages, and thus engage in unnecessary debates about them. Therefore, this paper illustrates and discusses these aspects with real work examples, which give SAS users deep insights into using them. Using the right tool for a given circumstance not only provides an easier and more convenient solution, it also saves time and work in programming, thus improving work efficiency. Furthermore, the illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
Justin Jia, TransUnion
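A small sketch of the same lookup done both ways (data set and variable names assumed):

proc sort data=accounts; by id; run;      /* match-merge requires sorted input */
proc sort data=balances; by id; run;
data merged;
   merge accounts(in=a) balances(in=b);
   by id;
   if a and b;                            /* keep matches only                 */
run;

proc sql;                                 /* inner join: no pre-sorting needed */
   create table joined as
   select a.id, a.region, b.balance
   from accounts as a
        inner join balances as b
        on a.id = b.id;
quit;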
From stock price histories to hospital stay records, analysis of time series data often requires use of lagged (and occasionally lead) values of one or more analysis variable. For the SAS® user, the central operational task is typically getting lagged (lead) values for each time point in the data set. Although SAS has long provided a LAG function, it has no analogous lead function, which is an especially significant problem in the case of large data series. This paper 1) reviews the lag function, in particular, the powerful but non-intuitive implications of its queue-oriented basis; 2) demonstrates efficient ways to generate leads with the same flexibility as the LAG function, but without the common and expensive recourse to data re-sorting; and 3) shows how to dynamically generate leads and lags through use of the hash object.
Mark Keintz, Wharton Research Data Services
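A minimal sketch of one lead technique without re-sorting: a self-merge with FIRSTOBS=2 (single series shown; the last row gets a missing lead, and BY-group handling is omitted).

data with_lead;
   merge prices
         prices(firstobs=2 keep=price rename=(price=price_lead1));
run;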
Managing your career involves learning outside the box at all stages. The next step is not always on the path we planned; opportunities develop and must be taken when we are ready. Prepare with this paper, which explains important features of Base SAS® that support teams. In this presentation, you learn about the following: concatenating team shared folders with personal development areas; creating consistent code; guidelines (not standards) for a team; knowing where the documentation will provide the basics; thinking of those who follow (a different interface); creating code for use by others; and how code can learn about the SAS environment.
Peter Crawford, Crawford Software Consultancy Limited
Programming for others involves new disciplines not called for when we write to provide results. There are many additional facilities in the languages of SAS® to ensure the processes and programs you provide for others will please your customers. Not all are obvious and some seem hidden. The never-ending search to please your friends, colleagues, and customers could start in this presentation.
Peter Crawford, Crawford Software Consultancy Limited
Making sure that you have saved all the necessary information to replicate a deliverable can be a cumbersome task. You want to make sure that all the raw data sets and all the derived data sets, whether they are Study Data Tabulation Model (SDTM) data sets or Analysis Data Model (ADaM) data sets, are saved. You prefer that the date/time stamps are preserved. Not only do you need the data sets, you also need to keep a copy of all programs that were used to produce the deliverable, as well as the corresponding logs from when the programs were executed. Any other information that was needed to produce the necessary outputs also needs to be saved. You must do all of this for each deliverable, and it can be easy to overlook a step or some key information. Most people do this process manually. It can be a time-consuming process, so why not let SAS® do the work for you?
Richann Watson, Experis
Linear regression, which is widely used, can be improved by the inclusion of a penalizing parameter. This helps reduce variance (at the cost of a slight increase in bias) and improves prediction accuracy and model interpretability. The regularized model is implemented on a sample data set, and recommendations for practice are included.
Shashank Hebbar, Kennesaw State University
Lili Zhang, Kennesaw State University
Dhiraj Gharana, Kennesaw State University
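For flavor, a brief sketch of one penalized approach (LASSO, with the penalty chosen by cross-validation) using PROC GLMSELECT, which also supports elastic net in recent releases; this is not necessarily the authors' implementation, and the variable names are hypothetical.

proc glmselect data=train plots=coefficients;
   model y = x1-x20 / selection=lasso(choose=cv stop=none)
                      cvmethod=random(5);   /* 5-fold cross-validation */
run;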
String externalization is the key to making your SAS® applications speak multiple languages, even if you can't. Using the new internationalization features in SAS® 9.3, your SAS applications can be written to adapt to whatever environment they are found in. String externalization is the process of identifying and separating translatable strings from your SAS program. This paper outlines the four steps of string externalization: create a Microsoft Excel spreadsheet for messages (optional), create SMD files, convert SMD files, and create the final SAS data set. It also briefly describes a real-world project that applies the concept. Using the Excel spreadsheet message-text approach, professional translators can work more efficiently, translating text in a friendlier and more comfortable environment. Subsequently, a programmer can also fully concentrate on developing and maintaining SAS code when your application travels to a new country.
Lihsin Hwang, Statistics Canada
Session 1432-2017:
Make a University Partnership Your Secret Weapon for Finding Data Science Talent
In this panel session, professors from three geographically diverse universities explain what makes for an effective partnership with private sector companies. Specific examples are discussed from health care, insurance, financial services, and retail. The panelists discuss what works, what doesn't, and what both parties need to be prepared to bring to the table for a long-term, mutually beneficial partnership.
Jennifer Priestley, Kennesaw State University
The days of comparing paper copies of graphs on light boxes are long gone, but the problems associated with validating graphical reports still remain. Many recent graphs created using SAS/GRAPH® software include annotations, which complicate an already complex problem. In ODS Graphics, only a single input data set should be used. Because annotation can be more easily added by overlaying an additional graph layer, it is now more practical to use that single input data set for validation, which removes all of the scaling, platform, and font issues that got in the way before. This paper guides you through the techniques to simplify validation while you are creating your perfect graph.
Philip Holland, Holland Numerics
SAS® education is a mainstay across disciplines and educational levels in the United States. Along with other courses that are relevant to the jobs students want, independent SAS courses or SAS education integrated into additional courses can help a student be more interesting to a potential employer. The multitude of SAS offerings (SAS® University Edition, Base SAS®, SAS® Enterprise Guide®, SAS® Studio, and the SAS® OnDemand offerings) provide the tools for education, but reaching students where they are is the greatest key for making the education count. This presentation discusses several roadblocks to learning SAS® syntax or point-and-click from the student perspective and several solutions developed jointly by students and educators in one graduate educational program.
Charlotte Baker, Florida A&M University
Matthew Dutton, Florida A&M University
You're in the business of performing complex analyses on large amounts of data. This data changes quickly and often, so you've invested in a powerful high-performance analytics engine with the speed to respond to a real-time data stream. However, you realize an immediate problem upon the implementation of your software solution: your analytics engine wants to process many records of data at once, but your streaming engine wants to send individual records. How do you store this streaming data? How do you tell the analytics engine about the updates? This paper explains how to manage real-time streaming data in a batch-processing analytics engine. The problem of managing streaming data in analytics engines comes up in many industries: energy, finance, health care, and marketing to name a few. The solution described in this paper can be applied in any industry, using features common to most analytics engines. You learn how to store and manage streaming data in such a way as to guarantee that the analytics engine has only current information, limit interruptions to data access, avoid duplication of data, and maintain a historical record of events.
Katherine Taylor, SAS
One of the first maps of the present United States was John White's 1585 map of the Albemarle Sound and Roanoke Island, the site of the Lost Colony and the site of my present home. This presentation looks at advances in mapping through the ages, from the early surveys and hand-painted maps, through lithographic and photochemical processes, to digitization and computerization. Inherent difficulties in including small pieces of coastal land (often removed from map boundary files and data sets to smooth a boundary) are also discussed. The paper concludes with several current maps of Roanoke Island created with SAS®.
Barbara Okerson, Anthem
In the increasingly competitive environment for banks and credit unions, every potential advantage should be pursued. One of these advantages is to market additional products to your existing customers rather than to new customers, since your existing customers already know (and hopefully trust) you, and you have so much data on them. But how can this best be done? How can you market the right products to the right customers at the right time? Predictive analytics can do this by forecasting which customers have the highest chance of purchasing a given financial product. This paper provides a step-by-step overview of a relatively simple but comprehensive approach to maximize cross-sell opportunities among your customers. We first prepare the data for a statistical analysis. With some basic predictive analytics techniques, we can then identify those members who have the highest chance of buying a financial product. For each of these members, we can also gain insight into why they would purchase, thus suggesting the best way to market to them. We then make suggestions to improve the model for better accuracy.
Nate Derby
Meta-analysis is a method for combining multiple independent studies on the same subject or question, producing a single large study with increased accuracy and enhanced ability to detect overall trends and smaller effects. This is done by treating the results of each study as a single observation and performing analysis on the set, while controlling for differences between individual studies. These differences can be treated as either fixed or random effects, depending on context. This paper demonstrates the process and techniques used in meta-analysis using human trafficking studies. This problem has seen increasing interest in the past few years, and there are now a number of localized studies for one state or a metropolitan area. This meta-analysis combines these to begin development of a comprehensive analytic understanding of human trafficking across the United States. Both fixed and random effects are described. All elements of this analysis were performed using SAS® University Edition.
David Corliss, Peace-Work
Heather Hill, Peace-Work
This presentation describes a methodology for interest rates, life tables, and actuarial calculations for pension funds, using generational mortality tables and the forward structure of interest rates, and analyzes long-term actuarial projections and their impacts on the actuarial liability. The methodology was developed as a computational algorithm in SAS® Enterprise Guide® and Base SAS® to structure the actuarial projections and analyze the impacts of the new approach. It makes heavy use of the IML and SQL procedures.
Luiz Carlos Leao, Universidade Federal Fluminense (UFF)
Niccolò Machiavelli, author of The Prince, said things on the order of, The promise given was a necessity of the past: the word broken is a necessity of the present. His utilitarian philosophy can be summed up by the phrase, The ends justify the means. As a personality trait, Machiavellianism is characterized by the drive to pursue one's own goals at the cost of others. In 1970, Richard Christie and Florence L. Geis created the MACH-IV test to assign a MACH score to an individual, using 20 Likert-scaled questions. The purpose of this study was to build a regression model that can be used to predict the MACH score of an individual using fewer factors. Such a model could be useful in screening processes where personality is considered, such as in job screening, offender profiling, or online dating. The research was conducted on a data set from an online personality test similar to the MACH-IV test. It was hypothesized that a statistically significant model exists that can predict an average MACH score for individuals with similar factors. This hypothesis was accepted.
Patrick Schambach, Kennesaw State University
Microsoft Excel worksheets enable you to explore data that answers the difficult questions that you face daily in your work. When you combine the SAS® Output Delivery System (ODS) with the capabilities of Excel, you have a powerful toolset that you can use to manipulate data in various ways, including highlighting data, using formulas to answer questions, and adding a pivot table or graph. In addition, ODS and Excel give you many methods for enhancing the appearance of your tables and graphs. This paper, written for the beginning analyst to the most advanced programmer, illustrates first how to manipulate styles and presentation elements in your worksheets by controlling text wrapping, highlighting and exploring data, and specifying Excel templates for data. Then, the paper explains how to use the TableEditor tagset and other tools to build and manipulate both basic and complex pivot tables that can help you answer all of the questions about your data. You will also learn techniques for sorting, filtering, and summarizing pivot-table data.
Chevell Parker, SAS
The TABULATE procedure has long been a central workhorse of our organization's reporting processes, given that it offers a uniquely concise syntax for obtaining descriptive statistics on deeply grouped and nested categories within a data set. Given the diverse output capabilities of SAS®, it often then suffices to simply ship the procedure's completed output elsewhere via the Output Delivery System (ODS). Yet there remain cases in which we want to not only obtain a formatted result, but also to acquire the full nesting tree and logic by which the computations were made. In these cases, we want to treat the details of the Tabulate statements as data, not merely as presentation. I demonstrate how we have solved this problem by parsing our Tabulate statements into a nested tree structure in JSON that can be transferred and easily queried for deep values elsewhere beyond the SAS program. Along the way, this provides an excellent opportunity to walk through the nesting logic of the procedure's statements and explain how to think about the axes, groupings, and set computations that make it tick. The source code for our syntax parser is also available on GitHub for further use.
Jason Phillips, The University of Alabama
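A tiny example of the nesting syntax whose structure such a parser must capture: a row dimension with nested groupings and a column dimension with a set of statistics.

proc tabulate data=sashelp.class;
   class sex age;
   var height;
   table sex * age,                 /* row dimension: nested grouping  */
         height * (n mean std);     /* column dimension: set of stats  */
run;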
The EXPAND procedure is very useful when handling time series data and is commonly used in fields such as finance or economics, but it can also be applied to medical encounter data within a health research setting. Medical encounter data consists of detailed information about healthcare services provided to a patient by a managed care entity and is a rich resource for epidemiologic research. Specific data items include, but are not limited to, dates of service, procedures performed, diagnoses, and costs associated with services provided. Drug prescription information is also available. Because epidemiologic studies generally focus on a particular health condition, a researcher using encounter data might wish to distinguish individuals with the health condition of interest by identifying encounters with a defining diagnosis and/or procedure. In this presentation, I provide two examples of how cases can be identified from a medical encounter database. The first uses a relatively simple case definition, and then I EXPAND the example to a more complex case definition.
Rayna Matsuno, Henry M. Jackson Foundation
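A minimal sketch of the kind of PROC EXPAND step (SAS/ETS) the presentation builds on, here aggregating daily encounter costs to monthly totals per patient; all names are hypothetical.

proc expand data=encounters out=monthly from=day to=month;
   by patient_id;
   id service_date;
   convert cost / observed=total;   /* sum daily values within each month */
run;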
Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables in a multiple regression model are highly correlated. The presence of this phenomenon can have a negative impact on the analysis as a whole and can severely limit the conclusions of the research study. This paper reviews and provides examples of the different ways in which multicollinearity can affect a research project, and explains how to detect multicollinearity and how to reduce it once it is found. In order to demonstrate the effects of multicollinearity and how to combat it, this paper explores the proposed techniques by using the Behavioral Risk Factor Surveillance System data set. This paper is intended for any level of SAS® user. It is also written for an audience with a background in behavioral science or statistics.
Deanna Schreiber-Gregory, National University
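A short sketch of common collinearity diagnostics in PROC REG (data set and variable names assumed):

proc reg data=brfss;
   model outcome = age bmi income education / vif tol collin;  /* variance inflation,
      tolerance, and collinearity diagnostics */
run; quit;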
The SAS® DATA step is one of the best (if not the best) data manipulators in the programming world. One of the areas that gives the DATA step its power is the wealth of functions that are available to it. This paper takes a PEEK at some of the functions whose names have more than one MEANing. While the subject matter is very serious, the material is presented in a humorous way that is guaranteed not to BOR the audience. With so many functions available, we have to TRIM our list so that the presentation can be made within the TIME allotted. This paper also discusses syntax and shows several examples of how these functions can be used to manipulate data.
Ben Cochran, The Bedford Group, Inc.
Art Carpenter, California Occidental Consultants
Graphics are an excellent way to display results from multiple statistical analyses and get a visual message across to the correct audience. Scientific journals often have very precise requirements for graphs that are submitted with manuscripts. While authors often find themselves using tools other than SAS® to create these graphs, the combination of the SGPLOT procedure and the Output Delivery System enables authors to create what they need in the same place as they conducted their analysis. This presentation focuses on two methods for creating a publication-quality graphic in SAS® 9.4 and provides solutions for some issues encountered when doing so.
Charlotte Baker, Florida A&M University
A new ODS destination for creating Microsoft Excel workbooks is available starting in the third maintenance release for SAS® 9.4. This destination creates native Microsoft Excel XLSX files, supports graphic images, and offers other advantages over the older ExcelXP tagset. In this presentation, you learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on demand and in real time using SAS server technology is discussed. Using earlier versions of SAS to create multi-sheet workbooks is also discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.
Vince DelGobbo, SAS
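A minimal sketch of the multi-sheet idea: one worksheet per BY group, with tab names taken from the BY value (the file path is hypothetical).

proc sort data=sashelp.class out=class; by sex; run;

ods excel file="/output/class_report.xlsx"
    options(sheet_interval="bygroup" sheet_label="Sex" embedded_titles="yes");
proc print data=class noobs;
   by sex;                      /* each BY group becomes its own worksheet */
run;
ods excel close;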
Do you create Excel files from SAS®? Do you use the ODS EXCELXP tagset or the ODS EXCEL destination? In this presentation, the EXCELXP tagset and the ODS EXCEL destination are compared face to face. There's gonna be a showdown! We give quick tips for each and show how to create Excel files for our Special Census program. Pros of each method are explored. We show the added benefits of the ODS EXCEL destination. We display how to create XML files with the EXCELXP tagset. We present how to use TAGATTR formats with the EXCELXP tagset to ensure that leading and trailing zeros in Excel are preserved. We demonstrate how to create the same Excel file with the ODS EXCEL destination with SAS formats instead of with TAGATTR formats. We show how the ODS EXCEL destination creates native Excel files. One of the drawbacks of an XML file created with the EXCELXP tagset is that a pop-up message is displayed in Excel each time you open it. We present differences using the ABSOLUTE_COLUMN_WIDTH= option in both methods.
Christopher Boniface, U.S. Census Bureau
When first learning SAS®, programmers often see the proprietary DATA step as a foreign and nonstandard concept. The introduction of the SAS® 9.4 DS2 language eases the transition for traditional programmers delving into SAS for the first time. Object Oriented Programming (OOP) has been an industry mainstay for many years, and the DS2 procedure provides an object-oriented environment for the DATA step. In this poster, we go through a business case to show how DS2 can be used to define a reusable package following object-oriented principles.
Ryan Kumpfmiller, Zencos
Maria Nicholson, Zencos
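A minimal sketch (not the authors' business case) of a reusable DS2 package and its use in a DS2 data program:

proc ds2;
   package finance / overwrite=yes;       /* reusable package, stored in WORK */
      method net_price(double price, double discount) returns double;
         return price * (1 - discount);   /* encapsulated business rule       */
      end;
   endpackage;

   data _null_;
      dcl package finance f();            /* instantiate the package          */
      dcl double np;
      method init();
         np = f.net_price(100, 0.15);
         put np=;
      end;
   enddata;
run;
quit;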
People typically invest in more than one stock to help diversify their risk. These stock portfolios are a collection of assets that each have their own inherent risk. If you know the future risk of each of the assets, you can optimize how much of each asset to keep in the portfolio. The real challenge is trying to evaluate the potential future risk of these assets. Different techniques provide different forecasts, which can drastically change the optimal allocation of assets. This talk presents a case study of portfolio optimization under three different scenarios: historical standard deviation estimation, the capital asset pricing model (CAPM), and GARCH-based volatility modeling. The structure and results of these three approaches are discussed.
Aric LaBarr, Institute for Advanced Analytics at NC State University
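A tiny sketch of the third approach (SAS/ETS): estimating GARCH(1,1) conditional volatility for a return series with PROC AUTOREG; data set and variable names are hypothetical.

proc autoreg data=returns;
   model ret = / garch=(p=1, q=1);
   output out=vol cev=cond_var;     /* CEV= writes the conditional error variance */
run;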
Making optimal use of SAS® Grid Computing relies on the ability to spread the workload effectively across all of the available nodes. With SAS® Scalable Performance Data Server (SPD Server), it is possible to partition your data and spread the processing across the SAS Grid Computing environment. In an ideal world it would be possible to adjust the size and number of partitions according to the data volumes being processed on any given day. This paper discusses a technique that enables the processing performed in the SAS Grid Computing environment to be dynamically reconfigured, automatically at run time, to optimize the use of SAS Grid Computing, and to provide significant performance benefits.
Andy Knight, SAS
Today, companies are increasingly using analytics to discover new revenue and cost-saving opportunities. Many business professionals turn to SAS, a leader in business analytics software and service, to help them improve performance and make better decisions faster. Analytics is also being used in risk management, fraud detection, life sciences, sports, and many more emerging markets. However, to maximize the value to the business, analytics solutions need to be deployed quickly and cost-effectively, while also providing the ability to readily scale without degrading performance. Of course, in today's demanding environments, where budgets are still shrinking and mandates to reduce carbon footprints are growing, the solution must deliver excellent hardware utilization, power efficiency, and return on investment. To help solve some of these challenges, Red Hat and SAS have collaborated to recommend the best practices for configuring SAS®9 running on Red Hat Enterprise Linux. The scope of this document covers Red Hat Enterprise Linux 6 and 7. Areas researched include the I/O subsystem, file system selection, and kernel tuning, both in bare metal and virtualized (KVM) environments. Additionally, we now include grid-based configurations running with Red Hat Resilient Storage Add-On (Global File System 2 [GFS2] clusters).
Barry Marson, Red Hat, Inc
The DATASETS procedure provides the most diverse selection of capabilities and features of any of the SAS® procedures. It is the prime tool that programmers can use to manage SAS data sets, indexes, catalogs, and so on. Many SAS programmers are only familiar with a few of PROC DATASETS's many capabilities. Most often, they only use the data set updating, deleting, and renaming capabilities. However, there are many more features and uses that should be in a SAS programmer's toolkit. This paper highlights many of the major capabilities of PROC DATASETS. It discusses how it can be used as a tool to update variable information in a SAS data set; provide information about data set and catalog contents; delete data sets, catalogs, and indexes; repair damaged SAS data sets; rename files; create and manage audit trails; add, delete, and modify passwords; add and delete integrity constraints; and more. The paper contains examples of the various uses of PROC DATASETS that programmers can cut and paste into their own programs as a starting point. After reading this paper, a SAS programmer will have practical knowledge of the many different facets of this important SAS procedure.
Michael Raithel, Westat
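A short sketch of several of the management tasks named above, combined in one PROC DATASETS run (member and variable names hypothetical):

proc datasets library=work nolist;
   change old_name = new_name;           /* rename a data set            */
   delete scratch1 scratch2;             /* remove temporary data sets   */
   modify master;
      rename cust_id = customer_id;      /* rename a variable            */
      index create customer_id;          /* add a simple index           */
   run;
quit;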
Real workflow dependencies exist when the completion or output of one data process is a prerequisite for subsequent data processes. For example, in extract, transform, load (ETL) systems, the extract must precede the transform and the transform must precede the load. This serialization is common in SAS® data analytic development but should be implemented only when actual dependencies exist. A false dependency, by contrast, exists when the workflow itself does not require serialization but is coded in a manner that forces a process to wait unnecessarily for some unrelated process to complete. For example, an ETL system might extract, transform, and load one data set, and then extract, transform, and load a second data set, causing processing of the second data set to wait unnecessarily for the first to complete. This hands-on session demonstrates three common patterns of false dependencies, teaching SAS practitioners how to recognize and remedy false dependencies through parallel processing paradigms. Groups of participants are pitted against each other, as the class simultaneously runs both serialized software and distributed software that runs in parallel. Participants execute exercises in unison, and then watch their machines race to the finish as the tremendous performance advantages of parallel processing are demonstrated in one exercise after another--ideal for anyone seeking to walk away with proven techniques that can measurably increase your performance and bonus.
Troy Hughes, Datmesis Analytics
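A minimal sketch of one way to remove a false dependency: run two independent ETL jobs in parallel SAS sessions with SYSTASK (requires the XCMD system option; paths are hypothetical, and the command is shown for UNIX).

systask command "sas /etl/extract_sales.sas"  taskname=t1;
systask command "sas /etl/extract_claims.sas" taskname=t2;
waitfor _all_ t1 t2;          /* resume only when both jobs complete */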
JavaScript Object Notation (JSON) has quickly become the de facto standard for data transfer on the Internet, due to an increase in web data and the usage of full-stack JavaScript. JSON has become dominant in the emerging technologies of the web today, such as in the Internet of Things and in the mobile cloud. JSON offers a light and flexible format for data transfer and can be processed directly from JavaScript without the need for an external parser. This paper discusses several capabilities within SAS® for processing JSON files, including the new JSON LIBNAME engine and several procedures, and compares them in detail.
John Kennedy, Mesa Digital LLC
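A minimal sketch of the JSON LIBNAME engine (available in recent SAS 9.4 releases); the file path is hypothetical.

libname orders json "/data/orders.json";

proc datasets lib=orders; quit;           /* list the tables the engine built */

proc print data=orders.alldata(obs=10);   /* generic key/value view of the file */
run;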
As a programmer specializing in tracking systems for research projects, I was recently given the task of implementing a newly developed, web-based tracking system for a complex field study. This tracking system uses an SQL database on the back end to hold a large set of related tables. As I learned about the new system, I found that there were deficiencies to overcome to make the system work on the project. Fortunately, I was able to develop a set of utilities in SAS® to bridge the gaps in the system and to integrate the system with other systems used for field survey administration on the project. The utilities helped to do the following: 1) connect schemas and compare cases across subsystems; 2) compare the statuses of cases across multiple tracked processes; 3) generate merge input files to be used to initiate follow-up activities; 4) prepare and launch SQL stored procedures from a running SAS job; and 5) develop complex queries in Microsoft SQL Server Management Studio and make them run in SAS. This paper puts each of these needs into a larger context by describing the need that is not addressed by the tracking system. Then, each program is explained and documented, with comments.
Chris Carson, RTI International
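A brief sketch of points 4 and 5: explicit SQL pass-through to SQL Server from SAS. The DSN, schema, table, and stored procedure names are hypothetical.

proc sql;
   connect to odbc (dsn=TrackDB);
   execute (exec dbo.usp_refresh_case_status) by odbc;   /* launch a stored procedure */
   create table work.case_status as                      /* run native T-SQL in SAS   */
   select * from connection to odbc
      (select case_id, status, updated_at
       from dbo.case_tracking
       where status <> 'CLOSED');
   disconnect from odbc;
quit;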
Moving a workforce in a new direction takes a lot of energy. Your planning should include four pillars: culture, technology, process, and people. These pillars assist small and large SAS® rollouts with a successful implementation and an eye toward future proofing. Boston Scientific is a large multi-national corporation that recently grew SAS from a couple of desktops to a global implementation. Boston Scientific's real world experiences reflect on each pillar, both in what worked and in lessons learned.
Brian Bell, Boston Scientific
Tricia Aanderud, Zencos
Are secondary schools in the United States hiring enough qualified math teachers? In which regions is there a disparity of qualified teachers? Data from an extensive survey conducted by the National Center for Education Statistics (NCES) was used for predicting qualified secondary school teachers across public schools in the US. The three criteria examined to determine whether a teacher is qualified to teach a given subject are: 1) Whether the teacher has a degree in the subject he or she is teaching 2) Whether he or she has a teaching certification in the subject 3) Whether he or she has five years of experience in the subject. A qualified teacher is defined as one who has all three of the previous qualifications. The sample data included socioeconomic data at the county level, which was used as predictors for hiring a qualified teacher. Data such as the number of students on free or reduced lunch at the school was used to assign schools as high-needs or low-needs schools. Other socioeconomic factors included were the income and education levels of working adults within a given school district. Some of the results show that schools with higher-needs students (schools where more than 40% of students are on some form of reduced lunch program) have less-qualified teachers. The resultant model is used to score other regions and is presented on a heat map of the US. SAS® procedures such as PROC SURVEYFREQ and PROC SURVEYLOGISTIC are used.
Bogdan Gadidov, Kennesaw State University
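A small sketch of the kind of survey-weighted model named above; the design and predictor variables are hypothetical.

proc surveylogistic data=nces;
   strata region;                            /* survey design information */
   cluster school_id;
   weight samp_wt;
   class high_needs (ref='0') / param=ref;
   model qualified(event='1') = high_needs median_income pct_college;
run;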
Predictive modeling might just be the single most thrilling aspect of data science. Who among us can deny the allure: to observe a naturally occurring phenomenon, conjure a mathematical model to explain it, and then use that model to make predictions about the future? Though many SAS® users are familiar with using a data set to generate a model, they might not use the awesome power of SAS to store the model and score other data sets. In this paper, we distinguish between parametric and nonparametric models and discuss the tools that SAS provides for storing and scoring each. Along the way, you come to know the STORE statement and the SCORE procedure. We conclude with a brief overview of the PLM procedure and demonstrate how to effectively load and evaluate models that have been stored during the model building process.
Matthew Duchnowski, Educational Testing Service (ETS)
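A minimal sketch of the parametric path: persist a model with the STORE statement, then score new data later with PROC PLM (names hypothetical).

proc logistic data=train;
   class segment / param=ref;
   model bought(event='1') = age income segment;
   store work.buy_model;                 /* save the fitted model as an item store */
run;

proc plm restore=work.buy_model;
   score data=new_customers out=scored predicted / ilink;  /* predicted probabilities */
run;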
This paper compiles information from documents produced by the U.S. Food and Drug Administration (FDA), the Clinical Data Interchange Standards Consortium (CDISC), and Computational Sciences Symposium (CSS) workgroups to identify what analysis data and other documentation is to be included in submissions and where it all needs to go. It not only describes requirements, but also includes recommendations for things that aren't so cut-and-dried. It focuses on the New Drug Application (NDA) submissions and a subset of Biologic License Application (BLA) submissions that are covered by the FDA binding guidance documents. Where applicable, SAS® tools are described and examples given.
Sandra Minjoe
John Troxell, Accenture Accelerated R&D Services
Face it: your data can occasionally contain characters that wreak havoc on your macro code, such as the ampersand in at&t or the apostrophe in McDonald's. This paper is designed for programmers who already know most of the ins and outs of SAS® macro code. Now let's take your macro skills a step further by adding to your skill set, specifically, %BQUOTE, %STR, %NRSTR, and %SUPERQ. What is up with all these quoting functions? When do you use one over the other? And why would you need %UNQUOTE? The macro language is full of subtleties and nuances, and the quoting functions represent the epitome of all of this. This paper shows you in which instances you would use the different quoting functions. Specifically, we show you the difference between the compile-time and the execution-time functions. In addition to looking at the traditional quoting functions, you learn how to use %QSCAN and %QSYSFUNC among other functions that apply the regular function and quote the result.
Michelle Buchecker, ThotWave Technologies, LLC.
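To make the distinction concrete, here is a minimal sketch: %NRSTR masks & and % at compile time, while %SUPERQ masks a macro variable's value at execution time without trying to resolve it.

  /* compile time: mask the & in a literal so AT&T is not treated as a reference */
  %let company = %nrstr(AT&T);
  %put Company is &company;

  /* execution time: mask a value that contains an unmatched apostrophe */
  data _null_;
    call symputx('store', "McDonald's");
  run;
  %put Store is %superq(store);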
A recurring problem with large research databases containing sensitive information about an individual's health, financial, and personal information is how to make meaningful extracts available to qualified researchers without compromising the privacy of the individuals whose data is in the database. This problem is exacerbated when a large number of extracts need to be made from the database. In addition to using statistical disclosure control methods, this paper recommends limiting the variables included in each extract to the minimum needed and implementing a method of assigning request-specific randomized IDs to each extract that is both secure and self-documenting.
Stanley Legum, Westat
Conferences for SAS® programming are replete with the newest software capabilities and clever programming techniques. However, discussion about quality control (QC) is lacking. QC is fundamental to ensuring both correct results and sound interpretation of data. It is not industry specific, and it simply makes sense. Most QC procedures are a function of regulatory requirements, industry standards, and corporate philosophies. Good QC goes well beyond just reviewing results, and should also consider the underlying data. It should be driven by a thoughtful consideration of relevance and impact. Although programmers strive to produce correct results, it is no wonder that programming mistakes are common in an industry where expedited deliverables and a lean workforce are the norm, even under rigid QC processes. This leads to a lack of trust in team members and an overall increase in resource requirements as these errors are corrected, particularly when SAS programming is outsourced. Is it possible to produce results with a high degree of accuracy, even when time and budget are limited? Thorough QC is easy to overlook in a high-pressure environment with increased expectations of workload and expedited deliverables. Does this suggest that QC programming is becoming a lost art, or does it simply suggest that we need to evolve with technology? The focus of the presentation is to review the who, what, when, how, why, and where of QC programming implementation.
Amber Randall, Axio Research
Bill Coar, Axio Research
SQL is a universal language that allows you to access data stored in relational databases or tables. This hands-on workshop presents core concepts and features of using PROC SQL to access data stored in relational database tables. Attendees learn how to define, access, and manipulate data from one or more tables using PROC SQL quickly and easily. Numerous code examples are presented on how to construct simple queries, subset data, produce simple and effective output, join two tables, summarize data with summary functions, construct BY-groups, identify FIRST. and LAST. rows, and create and use virtual tables.
Kirk Paul Lafler, Software Intelligence Corporation
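A small taste of the workshop material, using the SASHELP.CLASS sample table: one query subsets and orders rows, and a second summarizes with a group function.

  proc sql;
    create table work.tall_students as
    select name, sex, height
    from sashelp.class
    where height > 60
    order by height desc;

    select sex,
           count(*)     as n,
           mean(height) as avg_height
    from sashelp.class
    group by sex;
  quit;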
The announcement of SAS Institute's free SAS® University Edition is an exciting development for SAS users and learners around the world! The software bundle includes Base SAS®, SAS/STAT® software, SAS/IML® software, SAS® Studio (user interface), and SAS/ACCESS® Interface to PC Files, with all the popular features found in the licensed SAS versions. This is an incredible opportunity for users, statisticians, data analysts, scientists, programmers, students, and academics everywhere to use (and learn) SAS for career opportunities and advancement. Capabilities include data manipulation, data management, comprehensive programming language, powerful analytics, high-quality graphics, world-renowned statistical analysis capabilities, and many other exciting features. This paper illustrates a variety of powerful features found in the SAS University Edition. Attendees will be shown a number of tips and techniques on how to use the SAS® Studio user interface, and they will see demonstrations of powerful data management and programming features found in this exciting software bundle.
Ryan Lafler
Session 0844-2017:
Quickly Tackle Business Problems with Automated Model Development, Ensuring Accuracy and Performance
This session introduces how Equifax uses SAS® to develop a streamlined model automation process, including development and performance monitoring. The tool increases modeling efficiency, improves model accuracy, reduces error, and generates insights. More advanced analytics tools can be integrated in a later phase. The process can be applied to any business problem in risk or marketing, helping leaders make precise and accurate business decisions.
Vickey Chang, Equifax
The United States Access Board will soon refresh the Section 508 accessibility standards. The new requirements are based on Web Content Accessibility Guidelines (WCAG) 2.0 and include a total of 38 testable success criteria, 16 more than the current requirements. Is your organization ready? Don't worry, the fourth maintenance release for SAS® 9.4 Output Delivery System (ODS) HTML5 destination has you covered. This paper describes the new accessibility features in the ODS HTML5 destination, explains how to use them, and shows you how to test your output for compliance with the new Section 508 standards.
Glen Walker, SAS
The intrepid Mars Rovers have inspired awe and curiosity and dreams of mapping Mars using SAS/GRAPH® software. This presentation demonstrates how to import Esri shapefile (SHP) data (using the MAPIMPORT procedure) from sources other than SAS® and GfK GeoMarketing map data to produce useful (and sometimes creative) maps. Examples include mapping neighborhoods, ZCTA5 areas, postal codes, and of course, Mars. Products used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will benefit from this presentation.
Louise Hadden, Abt Associates
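A minimal sketch of the import-and-map workflow described above; the shapefile path and the ID variable are hypothetical and depend entirely on the shapefile's attributes.

  proc mapimport datafile='/data/shapefiles/neighborhoods.shp' out=work.hoods;
  run;

  proc gmap data=work.rates map=work.hoods;
    id neighborhood;              /* hypothetical ID variable from the shapefile */
    choro event_rate / levels=5;
  run;
  quit;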
We live in a world of data; small data, big data, and data in every conceivable size between small and big. In today's world, data finds its way into our lives wherever we are. We talk about data, create data, read data, transmit data, receive data, and save data constantly during any given hour in a day, and we still want and need more. So, we collect even more data at work, in meetings, at home, on our smartphones, in emails, in voice messages, sifting through financial reports, analyzing profits and losses, watching streaming videos, playing computer games, comparing sports teams and favorite players, and countless other ways. Data is growing and being collected at such astounding rates, all in the hope of being able to better understand the world around us. As SAS® professionals, the world of data offers many new and exciting opportunities, but it also presents a frightening realization that data sources might very well contain a host of integrity issues that need to be resolved first. This presentation describes the available methods to remove duplicate observations (or rows) from data sets (or tables) based on the row's values and keys using SAS.
Kirk Paul Lafler, Software Intelligence Corporation
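Two common approaches, sketched on hypothetical data sets: NODUPKEY removes rows that share the same key values, while sorting by _ALL_ with NODUPRECS removes rows that are duplicated in their entirety.

  /* remove rows with duplicate key values, keeping the duplicates for review */
  proc sort data=work.raw out=work.unique nodupkey dupout=work.dups;
    by customer_id;
  run;

  /* remove rows that are exact duplicates across every column */
  proc sort data=work.raw out=work.clean noduprecs;
    by _all_;
  run;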
At the end of a project, many institutional review boards (IRBs) require project directors to certify that no personally identifiable information (PII) is retained by a project. This paper briefly reviews what information is considered PII and explores how to identify variables containing PII in a given project. It then shows a comprehensive way to ensure that all SAS® variables containing PII have their values set to NULL and how to use SAS to document that this has been done.
Stanley Legum, Westat
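A minimal sketch of the nulling-and-documenting idea, with hypothetical library and variable names standing in for a project's actual PII fields.

  data clean.analysis;
    set proj.analysis;
    call missing(of name ssn phone);   /* hypothetical PII variables */
  run;

  /* document the result: each PII variable should show zero nonmissing levels */
  ods select nlevels;
  proc freq data=clean.analysis nlevels;
    tables name ssn phone / noprint;
  run;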
Do you ever feel like you email the same reports to the same people over and over and over again? If your customers are anything like mine, you create reports, and lots of them. Our office is using macros, SAS® email capabilities, and other programming techniques, in conjunction with our trusty contact list, to automate report distribution. Customers now receive the data they need, and only the data they need, on the schedule they have requested. In addition, not having to send these emails out manually saves our office valuable time and resources that can be used for other initiatives. In this session, we walk through a few of the SAS techniques we are using to provide better service to our internal and external partners and, hopefully, make us look a little more like rock stars.
Jacob Price, Baylor University
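The core of such automation is the FILENAME EMAIL access method; a minimal sketch follows, with hypothetical addresses and paths, and it assumes your site's email system options (such as EMAILHOST) are already configured.

  filename mail email
    to='team@example.com'
    subject='Weekly enrollment report'
    attach='/reports/enrollment.pdf';

  data _null_;
    file mail;
    put 'Attached is this week''s enrollment report.';
  run;

  filename mail clear;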
If you've got an iPhone, you might have noticed that the Health app is hard at work collecting data on every step you take. And, of course, the data scientist inside you is itching to analyze that data with SAS®. This paper and an accompanying E-Poster show you how to get step data out of your iPhone Health app and into SAS. Once it's there, you can have at it with all things SAS. In this presentation, we show you how a (what else?) step plot can be used to visualize the 73,000+ steps the author took at SAS® Global Forum 2016.
Ted Conway, Self
Using zero-inflated beta regression and gradient boosting, a solution to forecast the gross revenue of credit card products was developed. This solution was based on: 1) a set of attributes from invoice information; 2) zero-inflated beta regression to forecast interchange and revolving revenue (using PROC NLMIXED and data processing routines built with attributes and a target variable); 3) gradient boosting models for other product forecasts (annuity, insurance, etc.) using PROC TREEBOOST, exploring its parameters, and creating a routine for selecting and adjusting models; and 4) construction of revenue ranges for policies and monitoring. This presentation introduces this credit card revenue forecasting solution.
Marc Witarsa, Serasa Experian
Paulo Di Cellio Dias, Serasa Experian
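For readers curious how a zero-inflated beta likelihood can be expressed in PROC NLMIXED, here is a bare-bones sketch, not the authors' production model: the response y is assumed scaled to the [0,1) interval, and the predictor is hypothetical.

  proc nlmixed data=work.revenue;
    parms a0=0 b0=0 b1=0 c0=1;
    p0  = logistic(a0);               /* probability of exactly zero revenue */
    mu  = logistic(b0 + b1*spend);    /* mean of the beta component */
    phi = exp(c0);                    /* precision of the beta component */
    if y = 0 then ll = log(p0);
    else ll = log(1 - p0)
            + lgamma(phi) - lgamma(mu*phi) - lgamma((1-mu)*phi)
            + (mu*phi - 1)*log(y) + ((1-mu)*phi - 1)*log(1 - y);
    model y ~ general(ll);
  run;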
From state-of-the-art research to routine analytics, the Jupyter Notebook offers an unprecedented reporting medium. Historically, tables, graphics, and other types of output had to be created separately, and then integrated into a report piece by piece, amidst the drafting of text. The Jupyter Notebook interface enables you to create code cells and markdown cells in any arrangement. Markdown cells allow all typical formatting. Code cells can run code in the document. As a result, report creation happens naturally and in a completely reproducible way. Handing a colleague a Jupyter Notebook file to be re-run or revised is much easier and simpler for them than passing along, at a minimum, two files: one for the code and one for the text. Traditional reports become dynamic documents that include both text and living SAS® code that is run during document creation. With the new SAS kernel for Jupyter, all of this is possible and more!
Hunter Glanz
SAS® has an amazing arsenal of tools for using and displaying geographic information that are relatively unknown and underused. High-quality GfK GeoMarketing maps have been provided by SAS since the second maintenance release for SAS® 9.3, as sources for inexpensive map data dried up. SAS has been including both GfK and traditional SAS map data sets with licenses for SAS/GRAPH® software for some time, recognizing there will need to be an extended transitional period. However, for those of us who have been putting off converting our SAS/GRAPH mapping programs to use the new GfK maps, the time has come, as the traditional SAS map data sets are no longer being updated. If you visit SAS® Maps Online, you will find only GfK maps among the current maps. The GfK maps are updated once a year. This presentation walks through the conversion of a long-standing SAS program that produces multiple US maps for a data compendium so that it takes advantage of GfK maps. Products used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will benefit from this presentation.
Louise Hadden, Abt Associates
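The essential change when converting is to point PROC GMAP at a MAPSGFK data set and use its ID variables; a minimal sketch, assuming MAPSGFK.US_STATES and its two-letter STATECODE key, follows.

  proc gmap data=work.state_rates map=mapsgfk.us_states all;
    id statecode;          /* GfK maps key on ID/STATECODE, not the old STATE number */
    choro rate / levels=4;
  run;
  quit;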
Session 1232-2017:
SAS® Abbreviations: Shortcuts for Remembering Complicated Syntax
One of the many difficulties for a SAS® programmer is remembering how to accurately use SAS syntax, especially syntax that includes many parameters. Not mastering basic syntax parameters definitely makes coding inefficient because the programmer has to check reference manuals constantly to ensure that syntax is correct. One of the more useful but somewhat unknown tools in SAS is the use of SAS abbreviations. This feature enables users to store text strings (such as the syntax of a DATA step function, a SAS procedure, or a complete DATA step) in a user-defined and easy-to-remember abbreviation. When this abbreviation is entered in the enhanced editor, SAS automatically brings up the corresponding stored syntax. Knowing how to use SAS abbreviations is beneficial to programmers with varying levels of SAS expertise. In this paper, various examples of using SAS abbreviations are demonstrated.
Yaorui Liu, USC
The hash object provides an efficient method for quick data storage and data retrieval. Using a common set of lookup keys, hash objects can be used to retrieve data, store data, merge or join tables of data, and split a single table into multiple tables. This paper explains what a hash object is and why you should use hash objects, and provides basic programming instructions associated with the construction and use of hash objects in a DATA step.
Dari Mazloom, USAA
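A minimal lookup sketch with hypothetical data sets: the hash object is loaded once from a lookup table, and FIND retrieves the matching value for each incoming row.

  data work.matched;
    if 0 then set work.rates;           /* adds STATE and RATE to the PDV at compile time */
    if _n_ = 1 then do;
      declare hash h(dataset:'work.rates');
      h.defineKey('state');
      h.defineData('rate');
      h.defineDone();
    end;
    set work.members;
    if h.find() = 0 then output;        /* keep rows whose state has a rate */
  run;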
Binary logistic regression models are widely used in CRM (customer relationship management) or credit risk modeling. In these models, it is common to use nominal, ordinal, or discrete (NOD) predictors. NOD predictors typically are binned (reducing the number of their levels) before usage in a logistic model. The primary purpose of binning is to obtain parsimony without greatly reducing the strength of association of the predictor X to the binary target Y. In this paper, two SAS® macros are discussed. The %NOD_BIN macro bins predictors with nominal values (and ordinal and discrete values) by collapsing levels to maximize information value (IV). The %ORDINAL_BIN macro is applied to predictors that are ordered and in which collapsing can occur only for levels that are adjacent in the ordering of X. The %ORDINAL_BIN macro finds all possible binning solutions by complete enumeration. Solutions are ranked by IV, and monotonic solutions are identified.
Bruce Lund, Magnify Analytic Solutions
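For orientation, information value sums the weight-of-evidence contributions across levels; the sketch below computes the per-level terms from a hypothetical table of good/bad counts. It is not the authors' %NOD_BIN or %ORDINAL_BIN macro.

  proc sql;
    select level,
           good / (select sum(good) from work.binned) as pct_good,
           bad  / (select sum(bad)  from work.binned) as pct_bad,
           (calculated pct_good - calculated pct_bad)
             * log(calculated pct_good / calculated pct_bad) as iv_term
    from work.binned;      /* IV is the sum of iv_term over all levels */
  quit;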
The SAS® macro language provides a powerful tool to write a program once and reuse it many times in multiple places. A repeatedly executed section of a program can be wrapped into a macro, which can then be shared among many users. A practical example of a macro can be a utility that takes in a set of input parameters, performs some calculations, and sends back a result (such as an interest calculator). In general, a macro modularizes a program into smaller and more manageable sections, and encapsulates repetitive tasks into re-usable code. Modularization can help the code to be tested independently. This paper provides an introduction to writing macros. It introduces the user to the basic macro constructs and statements. This paper covers the following advanced macro subjects: 1) using multiple &s to retrieve/resolve the value of a macro variable; 2) creating a macro variable from the value of another macro variable; 3) handling special characters; 4) using the CALL EXECUTE routine to pass a DATA step variable to a macro; 5) using CALL EXECUTE to invoke a macro; and 6) using %RETURN to return a value from a macro.
Dari Mazloom, USAA
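As a taste of topic 1, the double-ampersand resolution works like this (a minimal sketch):

  %let product1 = checking;
  %let product2 = savings;
  %let i = 2;
  %put &&product&i;   /* && resolves to &, &i resolves to 2, so &product2 yields savings */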
The fourth maintenance release for SAS® 9.4 and the new SAS® Viya platform bring even more progress with respect to the interoperability between SAS® and Hadoop, the industry standard for big data. This talk brings you up to date with where we are: more distributions, more data types, more options, and then there is the cloud. Come and learn about the exciting new developments for blending your SAS processing with your shared Hadoop cluster.
Paul Kent, SAS
The SAS® platform with Unicode's UTF-8 encoding is ready to help you tackle the challenges of dealing with data in multiple languages. In today's global economy, software needs are changing. Companies are globalizing and consolidating systems from various parts of the world. Software must be ready to handle data from social media, international web pages, and databases that have characters in many different languages. SAS makes migrating your data to Unicode a snap! This paper helps you move smoothly from your legacy SAS environment to the powerful SAS Unicode environment with UTF-8 support. Along the way, you will uncover secrets to successfully manipulate your characters, so that all of your data remains intact.
Elizabeth Bales, SAS
Wei Zheng, SAS
Hospital Information Technologists are faced with a dilemma: how to get the many pharmacy databases, dynamic data sets, and software systems to communicate with each other and generate useful, automated, real-time output. SAS® serves as a unifying tool for our hospital pharmacy. It brings together data from multiple sources, generates output in multiple formats, analyzes trends, and generates summary reports to meet workload, quality, and regulatory requirements. Data sets originate from multiple sources, including drug and device wholesalers, web-based drug information systems, dumb machine output, pharmacy drug-dispensing platforms, hospital administration systems, and others. SAS output includes CSV files that can be read by dispensing machines, report output for Pharmacy and Therapeutics committees, graphs to summarize year-to-year dispensing and quality trends, emails to customers with inventory and expiry date notifications, investigational drug information summaries for hospital staff, inventory trending with restock alerts, and quality assurance summary reports. For clinical trial support, additional output includes randomization codes, data collection forms, blinded enrollment summaries, study subject assignment lists, and others. For business operations, output includes invoices, shipping documents, and customer metrics. SAS brings our pharmacy information systems together and supports an efficient, cost-effective, flexible, and reliable workflow.
Robert MacArthur, Rockefeller University
Arman Altincatal, Evidera
Imagine if you will a program, a program that loves its data, a program that loves its data to be in the same directory as the program itself. Together, in the same directory. True love. The program loves its data so much, it just refers to it by filename. No need to say what directory the data is in; it is the same directory. Now imagine that program being thrust into the world of the server. The server knows not what directory this program resides in. The server is an uncaring, but powerful, soul. Yet, when the program is executing, and the program refers to the data just by filename, the server bellows nay, no path, no data. A knight in shining armor emerges, in the form of a SAS® macro, who says lo, with the help of the SAS® Enterprise Guide® macro variable minions, I can gift you with the location of the program directory and send that with you to yon mighty server. And there was much rejoicing. Yay. This paper shows you a SAS macro that you can include in your SAS Enterprise Guide pre-code to automatically set your present working directory to the same directory where your program is saved on your UNIX or Linux operating system. This is applicable to submitting to any type of server, including a SAS Grid Server. It gives you the flexibility of moving your code and data to different locations without having to worry about modifying the code. It also helps save time by not specifying complete pathnames in your programs. And can't we all use a little more time?
Michelle Buchecker, ThotWave Technologies, LLC.
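A minimal sketch of the idea, assuming the SAS Enterprise Guide automatic macro variable _SASPROGRAMFILE holds the program's full path (and ignoring edge cases such as embedded blanks):

  %macro set_pgm_dir;
    %local fullpath;
    %global pgmdir;
    %let fullpath = %sysfunc(dequote(&_SASPROGRAMFILE));
    /* drop everything after the last slash to get the directory */
    %let pgmdir = %substr(&fullpath, 1,
                  %length(&fullpath) - %length(%scan(&fullpath, -1, /)) - 1);
    libname here "&pgmdir";
  %mend set_pgm_dir;
  %set_pgm_dir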
If you are planning to deploy SAS® Grid Manager or SAS® Enterprise BI (or other distributed SAS® Foundation applications) with load-balanced servers on multiple operating system instances, a shared file system is required. In order to determine the best shared file system choice for a given deployment, it is important to understand how the file system is used, the SAS® I/O workload characteristics performed on it, and the stressors that SAS Foundation applications produce on the file system. For the purposes of this paper, we use the term shared file system to cover both clustered and shared file systems, even though shared can also denote a network file system or a distributed file system that is not clustered. This paper examines the shared file systems that are most commonly used with SAS and reviews their strengths and weaknesses.
Margaret Crevar, SAS
Web services are becoming more and more relied upon for serving up vast amounts of data. With such a heavy reliance on the web, and security threats increasing every day, security is a big concern. OAuth 2.0 has become a go-to way for websites to allow secure access to the services they provide. But with increased security, comes increased complexity. Accessing web services that use OAuth 2.0 is not entirely straightforward, and can cause a lot of users plenty of trouble. This paper helps clarify the basic uses of OAuth and shows how you can easily use Base SAS® to access a few of the most popular web services out there.
Joseph Henry, SAS
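A bare-bones sketch of one common OAuth 2.0 exchange (refreshing an access token) with PROC HTTP; the endpoint, client ID, and token values are hypothetical placeholders.

  filename resp temp;

  proc http
    url='https://api.example.com/oauth2/token'
    method='POST'
    in='grant_type=refresh_token&client_id=myapp&refresh_token=XYZ'
    out=resp;
  run;

  /* read the JSON response with the JSON libname engine (SAS 9.4M4) */
  libname token json fileref=resp;
  data _null_;
    set token.root;
    call symputx('access_token', access_token);   /* field name depends on the service */
  run;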
One often uses an iterative %DO loop to execute a section of a macro repetitively. An alternative method is to use the implicit loop in the DATA step with the CALL EXECUTE routine to generate a series of macro calls. One of the advantages of the latter approach is eliminating the need for indirect referencing. To better understand the use of the CALL EXECUTE routine, it is essential for programmers to understand the mechanism and the timing of macro processing to avoid programming errors. These technical issues are discussed in detail in this paper.
Arthur Li, City of Hope
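The timing issue is the crux: without %NRSTR, the macro would execute while the DATA step is still running. A minimal sketch with a hypothetical %RUN_REPORT macro and driver data set:

  data _null_;
    set work.regions;               /* one row per region */
    call execute(cats('%nrstr(%run_report)(region=', region, ')'));
  run;
  /* %NRSTR defers each generated macro call until after the DATA step finishes */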
Are you tired of constantly creating new emails each and every time you run a report, frantically searching for the reports, attaching said reports, and writing emails, all the while thinking there has to be a better way? Then, have I got some code to share with you! This session provides you with code to flee from your old ways of emailing data and reports. Instead, you set up your SAS® code to send an email to your recipients. The email attaches the most current files each and every time the code is run. You do not have to do anything manually after you run your SAS code. This session provides SAS programmers with instructions about how to create their own email in a macro that is based on their current reports. We demonstrate different options to customize the code to add the email body (and to change the body) and to add attachments (such as PDF and Excel). We show you an additional macro that checks whether a file exists and adds a note in the SAS log if it is missing so that you won't get a warning message. Using SAS code, you will become more efficient and effective by automating a tedious process and reducing errors in email attachments, wording, and recipient lists.
Crystal Carel, Baylor Scott & White Health
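A condensed sketch of the pattern described, combining a FILEEXIST check with an email macro; the addresses, paths, and macro name are hypothetical.

  %macro send_report(file, to);
    %if %sysfunc(fileexist(&file)) %then %do;
      filename mail email to="&to" subject="Monthly report" attach="&file";
      data _null_;
        file mail;
        put "Please find the latest report attached.";
      run;
      filename mail clear;
    %end;
    %else %put NOTE: Attachment &file not found, so no email was sent.;
  %mend send_report;

  %send_report(/reports/census_2017.pdf, team@example.com)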
The syntax to combine SAS® data sets is simple: use the SET statement to concatenate, and use the MERGE and BY statements to merge. The data sets themselves, however, might be complex. Combining the data sets might not result in what you need. This paper reviews techniques to perform before you combine data sets, including checking for the following: common variables; common variables with different attributes; duplicate identifiers; duplicate observations; and acceptable match rates.
Christopher Bost, Independent SAS Consultant
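For example, a duplicate-identifier check like the sketch below (on a hypothetical table) catches keys that would turn a one-to-one merge into a many-to-many surprise.

  proc sql;
    select id, count(*) as n
    from work.one
    group by id
    having count(*) > 1;   /* any rows returned signal duplicate identifiers */
  quit;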
SAS® 9.4 Graph Template Language: Reference has more than 1300 pages and hundreds of options and statements. It is no surprise that programmers sometimes experience unexpected twists and turns when using the graph template language (GTL) to draw figures. Understandably, it is easy to become frustrated when your program fails to produce the desired graphs despite your best effort. Although SAS needs to continue improving GTL, this paper offers several tricks that help overcome some of the roadblocks in graphing.
Amos Shu, AstraZeneca
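For readers new to GTL, the basic shape of a template is worth seeing once; this minimal sketch (hypothetical data and variables) compiles a STATGRAPH template and renders it with PROC SGRENDER.

  proc template;
    define statgraph trendplot;
      begingraph;
        entrytitle 'Mean Response over Time';
        layout overlay;
          seriesplot x=visit y=mean_response / group=treatment;
        endlayout;
      endgraph;
    end;
  run;

  proc sgrender data=work.summary template=trendplot;
  run;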
Can you actually get something for nothing? With PROC SQL's subquery and remerging features, yes, you can. When working with categorical variables, there is often a need to add flag variables based on group descriptive statistics, such as group counts and minimum and maximum values. Instead of first creating the group count or minimum or maximum values, and then merging the summarized data set with the original data set with conditional statements creating a flag variable, why not take advantage of PROC SQL to complete three steps in one? With PROC SQL's subquery, CASE-WHEN clause, and summary functions by the group variable, you can easily remerge the new flag variable back with the original data set.
Sunil Gupta, Cytel
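Here is the three-steps-in-one pattern in miniature, using the SASHELP.CLASS sample table: the summary function remerges against the detail rows, and the CASE expression builds the flag in the same query.

  proc sql;
    create table work.flagged as
    select *,
           case when weight = max(weight) then 1 else 0 end as heaviest_in_group
    from sashelp.class
    group by sex;        /* SAS notes in the log that summary values are remerged */
  quit;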
Session 0846-2017:
Spawning SAS® Sleeper Cells and Calling Them into Action: SAS® University Parallel Processing
With the 2014 launch of SAS® University Edition, the reach of SAS® was greatly expanded to educators, students, researchers, non-profits, and the curious, who for the first time could use a full version of Base SAS® software for free. Because SAS University Edition allows a maximum of two CPUs, however, performance is curtailed sharply from more substantial SAS environments that can benefit from parallel and distributed processing, such as environments that implement SAS® Grid Manager, Teradata, or Hadoop solutions. Even when comparing performance of SAS University Edition against the most straightforward implementation of the SAS windowing environment, the SAS windowing environment demonstrates greater performance when run on the same computer. With parallel processing and distributed computing becoming the status quo in SAS production environments, SAS University Edition will unfortunately fall behind counterpart SAS solutions if it cannot harness parallel processing best practices and performance. To curb this disparity, this session introduces groundbreaking programmatic methods that enable commodity hardware to be networked so that multiple instances of SAS University Edition can communicate and work collectively to divide and conquer complex tasks. With parallel processing facilitated, a SAS practitioner can now harness an endless number of computers to produce blitzkrieg solutions with the SAS University Edition that rival the performance of more costly, complex infrastructure.
Troy Hughes, Datmesis Analytics
This presentation brings together experiences from SAS® professionals working as volunteers for organizations, charities, and in academic research. Pro bono work, much like that done by physicians, attorneys, and professionals in other areas, is rapidly growing in statistical practice as an important part of a statistical career, offering the opportunity to use your skills in a places where they are so needed but cannot be supported in a for-pay position. Statistical volunteers also gain important learning experiences, mentoring, networking, and other opportunities for professional development. The presenter shares experiences from volunteering for local charities, non-governmental organizations (NGOs) and other organizations and causes, both in the US and around the world. The mission, methods, and focus of some organizations are presented, including DataKind, Statistics Without Borders, Peacework, and others.
David Corliss, Peace-Work
The SAS® LOCK statement was introduced in SAS® 7 with great pomp and circumstance, as it enabled SAS® software to lock data sets exclusively. In a multiuser or networked environment, an exclusive file lock prevents other users and processes from accessing and accidentally corrupting a data set while it is in use. Moreover, because file lock status can be tested programmatically with the LOCK statement return code (&SYSLCKRC), data set accessibility can be validated before attempted access, thus preventing file access collisions and facilitating more reliable, robust software. Notwithstanding the intent of the LOCK statement, stress testing demonstrated in this session illustrates vulnerabilities in the LOCK statement that render its use inadvisable due to its inability to lock data sets reliably outside of the SAS/SHARE® environment. To overcome this limitation and enable reliable data set locking, a methodology is demonstrated that uses semaphores (flags) that indicate whether a data set is available or is in use, and mutually exclusive (mutex) semaphores that restrict data set access to a single process at one time. With Base SAS® file locking capabilities now restored, this session further demonstrates control table locking to support process synchronization and parallel processing. The LOCKSAFE macro demonstrates a busy-waiting (or spinlock) design that tests data set availability repeatedly until file access is achieved or the process times out.
Troy Hughes, Datmesis Analytics
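To give a flavor of the busy-waiting design (a simplified sketch, not the author's LOCKSAFE macro):

  %macro trylock(ds, maxtries=5);
    %local try rc;
    %do try = 1 %to &maxtries;
      lock &ds;
      %if &syslckrc = 0 %then %return;   /* lock acquired */
      %let rc = %sysfunc(sleep(1, 1));   /* wait one second and retry */
    %end;
    %put WARNING: Could not lock &ds after &maxtries attempts.;
  %mend trylock;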
Survival analysis differs from other types of statistical analysis, including graphical summaries and regression modeling procedures, because data is almost always censored. The purpose of this project is to apply survival analysis techniques in SAS® to practical survival data, aiming to understand the effects of gender and age on lung cancer patient survival at different cancer sites. Results show that both gender and age are significant variables in predicting lung cancer patient survival using the Cox proportional hazards model. Females have better survival than males when other variables in the model are fixed (p-value 0.0254). Moreover, the hazard of patients who are over 65 is 1.385 times that of patients who are under 65 (p-value 0.0145).
Yan Wang, Kennesaw State University
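The kind of Cox model reported above can be fit with PROC PHREG; a minimal sketch with hypothetical variable names follows.

  proc phreg data=work.lungca;
    class gender(ref='Male') agegrp(ref='Under 65') / param=ref;
    model surv_months*death(0) = gender agegrp;   /* 0 flags censored observations */
    hazardratio agegrp;
  run;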
Did you know that you could leverage the statistical power of the FREQ procedure and still be able to control the appearance of your output? Many people think they have to use procedures such as REPORT and TABULATE to be able to apply style options and control formats and headings for their output. However, if you pair PROC FREQ with a TEMPLATE procedure step, you can customize the appearance of your output and make enhancements to tables, such as adding colors and controlling headings. If you are a statistician, you know the many PROC FREQ options that produce high-level statistics. But did you also know that PROC FREQ can generate a graphical representation of those statistics? PROC FREQ can generate the graphs, and then you can use ODS Graphics and the Graph Template Language (GTL) to improve the appearance of the graphs. Written for intermediate users, this paper demonstrates how you can enhance the default output for PROC FREQ one-way and multi-way tables by modifying colors, formats, and labels. This paper also describes the syntax for creating graphs for multiple statistics, and it uses examples to show how you can customize these graphs.
Kathryn McLawhorn, SAS
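As a starting point, the sketch below requests both statistics and a grouped frequency plot from PROC FREQ on the SASHELP.HEART sample data; the paper then shows how PROC TEMPLATE and GTL refine the defaults.

  ods graphics on;
  proc freq data=sashelp.heart;
    tables bp_status*sex / chisq plots=freqplot(twoway=grouped);
  run;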
In the game of tag, being "it" is bad, but where accessibility compliance is concerned, being tagged is good! Tagging is required for PDF files to comply with accessibility standards such as Section 508 and the Web Content Accessibility Guidelines (WCAG). In the fourth maintenance release for SAS® 9.4, the preproduction option in the ODS PDF statement, ACCESSIBLE, creates a tagged PDF file. We look at how this option changes the file that is created and focus on the SAS® programming techniques that work best with the new option. You'll then have the opportunity to try it yourself in your own code and provide feedback to SAS.
Glen Walker, SAS
In 32 years as a SAS® consultant at the Federal Reserve Board, I have seen some questions about common SAS tasks surface again and again. This paper collects the most common questions related to basic DATA step processing from my previous 'Tales from the Help Desk' papers, and provides code to explain and resolve them. The following tasks are reviewed: using the LAG function with conditional statements; avoiding character variable truncation; surrounding a macro variable with quotes in SAS code; handling missing values (arithmetic calculations versus functions); incrementing a SAS date value with the INTNX function; converting a variable from character to numeric or vice versa and keeping the same name; converting character or numeric values to SAS date values; using an array definition in multiple DATA steps; using values of a variable in a data set throughout a DATA step by copying the values into a temporary array; and writing data to multiple external files in a DATA step, determining file names dynamically from data values. In the context of discussing these tasks, the paper provides details about SAS processing that can help users employ SAS more effectively. See the references for seven previous papers that contain additional common questions.
Bruce Gilsen, Federal Reserve Board
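Two of those recurring answers in miniature (hypothetical data and variable names): converting a character variable to numeric while keeping its name, and advancing a date with INTNX.

  data work.new;
    set work.old(rename=(visit_date=visit_date_c));
    visit_date = input(visit_date_c, yymmdd10.);   /* character to SAS date, same name */
    next_month = intnx('month', visit_date, 1, 'same');
    format visit_date next_month date9.;
    drop visit_date_c;
  run;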
SAS® Macro Language can be used to enhance many report-generating processes. This presentation showcases the potential that macros have in populating predesigned RTF templates. If you have multiple report templates saved, SAS® can choose and populate the correct ones based on macro programming and DATA _NULL_ steps using the TRANSTRN function. The autocall macro %TRIM, combined with a macro variable (for example, &TEMPLATE), can be attached to the output RTF template name. You can design and save as many templates as you like or need. When SAS assigns the macro variable TEMPLATE a value, the %TRIM(&TEMPLATE) reference in the output pathway correctly populates the appropriate template. This can make life easy if you create multiple different reports based on one data set. All that's required are stored templates on accessible pathways.
Patrick Leon, University of Southern California
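A skeletal sketch of the technique: read the chosen template line by line, swap a placeholder with TRANSTRN, and write the populated report. The paths, placeholder text, and template name are hypothetical.

  %let template = quarterly;
  data _null_;
    length line $32767;
    infile "/templates/%trim(&template).rtf" lrecl=32767 truncover;
    file "/reports/%trim(&template)_report.rtf" lrecl=32767;
    input line $char32767.;
    line = transtrn(line, '<<TITLE>>', 'Q1 Enrollment Summary');
    put line;
  run;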
SAS® offers a generation data set structure as part of the language, one that many users are familiar with and manage using keywords such as GENMAX and GENNUM. When SAS operates in a mainframe environment, users can also tap into the GDG (generation data group) feature available on z/OS, OS/390, OS/370, IBM 3070, or IBM 3090 machines. With cost-saving initiatives across businesses and due to some scaling factors, many organizations are migrating from mainframes to cheaper mid-tier platforms such as UNIX and AIX. Because Linux is open source and a cheaper alternative, several organizations have opted for the UNIX distribution of SAS, which works in UNIX and AIX environments. While this might be a viable alternative, the migration effort brings certain nuances to the technical conversion teams. On UNIX, the concept of GDGs does not exist. While SAS offers generation data sets, they work only for SAS data sets. If a business organization needs to house and operate a GDG-like structure for text data sets, there isn't one available. When my organization undertook a similar initiative to migrate programs used to run subprime mortgage analytic, incentive, and regulatory reporting, we identified a paucity of literature and research on this topic. Hence, I developed a utility that addresses this need: a simple macro that helps us closely simulate a GDG/GDS.
Dr. Kannan Deivasigamani, HSBC
You've all seen the job posting that looks more like an advertisement for the ever-elusive unicorn. It begins by outlining the required skills that include a mixture of tools, technologies, and masterful things that you should be able to do. Unfortunately, many such postings begin with restrictions to those with advanced degrees in math, science, statistics, or computer science and experience in your specific industry. They must be able to perform predictive modeling, natural language processing, and, for good measure, candidates should apply only if they know artificial intelligence, cognitive computing, and machine learning. The candidate should be proficient in SAS®, R, Python, Hadoop, ETL, real-time, in-cloud, in-memory, in-database and must be a master storyteller. I know of no one who would be able to fit that description and still be able to hold a normal conversation with another human. In our work, we have developed a competency model for analytics, which describes nine performance domains that encompass the knowledge, skills, behaviors, and dispositions that today's analytic professional should possess in support of a learning, analytically driven organization. In this paper, we describe the model and provide specific examples of job families and career paths that can be followed based on the domains that best fit your skills and interests. We also share with participants a self-assessment tool so that they can see where they stack up!
Greg Nelson, Thotwave Technologies, LLC.
As computer technology advances, SAS® continually pursues opportunities to implement state-of-the-art systems that solve problems in data preparation and analysis faster and more efficiently. In this pursuit, we have extended the TRANSPOSE procedure to operate in a distributed fashion within both Teradata and Hadoop, using dynamically generated DS2 executed by the SAS® Embedded Process and within SAS® Viya , using its native transpose action. With its new ability to work within these environments, PROC TRANSPOSE provides you with access to its parallel processing power and produces results that are compatible with your existing SAS programs.
Scott Mebust, SAS
Have you ever had your macro code not work and you couldn't figure out why? Maybe even something as simple as %if &sysscp=WIN %then LIBNAME libref 'c:\temp'; ? This paper is designed for programmers who know %LET and can write basic macro definitions already. Now let's take your macro skills a step further by adding to your skill set. The %IF statement can be a deceptively tricky statement due to how IF statements are processed in a DATA step and how that differs from how %IF statements are processed by the macro processor. Focus areas of this paper are: 1) emphasizing the importance of the macro facility as a code-generation facility; 2) how an IF statement in a DATA step differs from a macro %IF statement and when to use which; 3) why semicolons can be misinterpreted in an %IF statement.
Michelle Buchecker, ThotWave Technologies, LLC.
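For reference, the opening example behaves much more predictably once the generated statements are grouped with %DO/%END; a minimal sketch:

  %macro settemp;
    %if &sysscp = WIN %then %do;
      libname mylib 'c:\temp';
    %end;
    %else %do;
      libname mylib '/tmp';
    %end;
  %mend settemp;
  %settemp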
You might scream in pain or cry with joy that SAS® software can directly produce output in Microsoft Excel as .xlsx workbooks. Excel is an excellent vehicle for delivering large amounts of summary information that needs to be partitioned for human review, exploratory filtering, and sorting. SAS supports ODS EXCEL as a production destination. This paper discusses using the ODS EXCEL statement and the TABULATE and REPORT procedures in the domain of summarizing cross-sectional data extracted from a medical claims database. The discussion covers data preparation, report preparation, and tabulation statements such as CLASS, CLASSLEV, and TABLE. The effects of STYLE options and the TAGATTR suboption for inserting features that are specific to Excel such as formulas, formats, and alignment are covered in detail. A short discussion of reusing these concepts in PROC REPORT statements such as DEFINE, COMPUTE, and CALL DEFINE are also covered.
Richard DeVenezia, Johnson & Johnson
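A compact sketch of the ODS EXCEL plus PROC TABULATE pairing, with a hypothetical claims data set; the TAGATTR suboption is shown passing an Excel-specific number format.

  ods excel file='/reports/claims.xlsx'
      options(sheet_name='Summary' embedded_titles='yes');
  title 'Paid Claims by Region';
  proc tabulate data=work.claims;
    class region;
    var paid;
    table region all,
          paid*sum*{style={tagattr='format:$#,##0'}};
  run;
  ods excel close;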
Does your job require you to create reports in Microsoft Excel on a quarterly, monthly, or even weekly basis? Are you creating all or part of these reports by hand, referencing another sheet containing rows and rows and rows of data? If so, stop! There is a better way! The new ODS destination for Excel enables you to create native Excel files directly from SAS®. Now you can include just the data you need, create great-looking tabular output, and do it all in a fraction of the time! This paper shows you how to use the REPORT procedure to create polished tables that contain formulas, colored cells, and other customized formatting. Also presented in the paper are the destination options used to create various workbook structures, such as multiple tables per worksheet. Using these techniques to automate the creation of your Excel reports will save you hours of time and frustration, enabling you to pursue other endeavors.
Jane Eslinger, SAS
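In the same spirit, a PROC REPORT sketch (hypothetical summary data) that pushes a live Excel formula through TAGATTR so the ratio recalculates inside the workbook:

  ods excel file='/reports/ratio.xlsx';
  proc report data=work.claims_sum;
    column region paid target ratio;
    define region / group;
    define paid   / analysis sum format=dollar12.;
    define target / analysis sum format=dollar12.;
    define ratio  / computed format=percent8.1
                    style(column)={tagattr='formula:RC[-2]/RC[-1]'};
    compute ratio;
      ratio = paid.sum / target.sum;   /* SAS value; Excel recomputes via the formula */
    endcomp;
  run;
  ods excel close;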
In the 2015-2016 season of the National Basketball Association (NBA), the Golden State Warriors achieved a record-breaking 73 regular-season wins. This accomplishment would not have been possible without their reigning Most Valuable Player (MVP) champion Stephen Curry and his historic shooting performance. Shattering his previous NBA record of 286 three-point shots made during the 2014-2015 regular season, he accrued an astounding 402 in the next season. With an increased emphasis on the advantages of the three-point shot and guard-heavy offenses in the NBA today, organizations are naturally eager to investigate player statistics related to shooting at long ranges, especially for the best of shooters. Furthermore, the addition of more advanced data-collecting entities such as SportVU creates an incredible opportunity for data analysis, moving beyond simply using aggregated box scores. This work uses quantile regression within SAS® 9.4 to explore the relationships between the three-point shot and other relevant advanced statistics, including some SportVU player-tracking data, for the top percentile of three-point shooters from the 2015-2016 NBA regular season.
Taylor Larkin, The University of Alabama
Denise McManus, The University of Alabama
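Quantile regression for the top of the distribution looks like the following sketch in PROC QUANTREG; the predictors are hypothetical stand-ins for the SportVU measures.

  proc quantreg data=work.shooters;
    model three_pt_made = attempts catch_shoot_pct touches / quantile=0.95;
  run;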
You might encounter people who used SAS® long ago (perhaps in university), or people who had a very limited use of SAS in a job. Some of these people with limited knowledge and experience think that SAS is just a statistics package or just a GUI. Those that think of it as a GUI usually reference SAS® Enterprise Guide® or, if it was a really long time ago, SAS/AF® or SAS/FSP®. The reality is that the modern SAS system is a very large and complex ecosystem, with hundreds of software products and diversified tools for programmers and users. This poster provides diagrams and tables that illustrate the complexity of the SAS system from the perspective of a programmer. Diagrams and illustrations include the functional scope and operating systems in the ecosystem; different environments that program code can run in; cross-environment interactions and related tools; SAS® Grid Computing: parallel processing; how SAS can run with files in memory (the legacy SAFILE statement and big data and Hadoop); and how some code can run in-database. We end with a tabulation of the many programming languages and SQL dialects that are directly or indirectly supported within SAS. This poster should enlighten those who think that SAS is an old, dated statistics package or just a GUI.
Thomas Billings, MUFG Union Bank
Predictive analytics is powerful in its ability to predict likely future behavior, and is widely used in marketing. However, the old adage "timing is everything" continues to hold true: the right customer and the right offer at the wrong time is less than optimal. In life, timing matters a great deal, but predictive analytics seldom takes timing into account explicitly. We should be able to do better. Financial service consumption changes often have precursor changes in behavior, and a behavior change can lead to multiple subsequent consumption changes. One way to improve our awareness of the customer situation is to detect significant events that have meaningful consequences that warrant proactive outreach. This session presents a simple time series approach to event detection that has proven to work successfully. Real case studies are discussed to illustrate the approach and implementation. Adoption of this practice can augment and enhance predictive analytics practice to elevate our game to the next level.
Daymond Ling, Seneca College
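A toy sketch of the idea: compute period-over-period change per customer and flag jumps beyond a threshold (the variable names and threshold are hypothetical).

  data work.events;
    set work.monthly;
    by customer_id;
    change = dif(balance);
    if first.customer_id then change = .;   /* keep DIF from crossing customers */
    event_flag = (abs(change) > 50000);
  run;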
Public water supplies can contain disease-causing microorganisms in the water or distribution ducts. To kill off these pathogens, a disinfectant, such as chlorine, is added to the water. Chlorine is the most widely used disinfectant in all US water treatment facilities. Chlorine is known to be one of the most powerful disinfectants to restrict harmful pathogens from reaching the consumer. In the interest of obtaining a better understanding of what variables affect the levels of chlorine in the water, this presentation analyzed a particular set of water samples randomly collected from locations in Orange County, Florida. Thirty water samples were collected and their chlorine level, temperature, and pH were recorded. A linear regression analysis was performed on the data collected with several qualitative and quantitative variables. Water storage time, temperature, time of day, location, pH, and dissolved oxygen level were the independent variables collected from each water sample. All data collected was analyzed using various SAS® procedures. Partial residual plots were used to determine possible relationships between the chlorine level and the independent variables. A stepwise selection was used to eliminate possible insignificant predictors. From there, several possible models for the data were selected. F-tests were conducted to determine which of the models appeared to be the most useful.
Drew Doyle, University of Central Florida
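The stepwise screening described can be requested directly in PROC REG; a minimal sketch with the study's variables follows (the entry/stay significance levels are illustrative).

  proc reg data=work.water;
    model chlorine = temperature ph storage_time dissolved_oxygen
          / selection=stepwise slentry=0.15 slstay=0.15;
  run;
  quit;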
Knowing which SAS® products are being used in your organization, by whom, and how often helps you decide whether you have the right mix and quantity licensed. These questions are not easy to answer. We present an innovative technique using three SAS utilities to answer these questions. This paper includes example code written for Linux that can easily be modified for Windows and other operating systems.
Victor Andruskevitch, Consultant
The traditional model of SAS® source-code production is for all code to be directly written by users or indirectly written (that is, generated by user-written macros, Lua code, or with DATA steps). This model was recently extended to enable SAS macro code to operate on arbitrary text (for example, on HTML) using the STREAM procedure. In contrast, SAS includes many products that operate in the client/server environment and function as follows: 1) the user interacts with the product via a GUI to specify the processing desired; 2) the product saves the user-specifications in metadata and generates SAS source code for the target processing; 3) the source code is then run (per user directions) to perform the processing. Many of these products give users the ability to modify the generated code and/or insert their own user-written code. Also, the target code (system-generated plus optional user-written) can be exported or deployed to be run as a stored process, in batch, or in another SAS environment. In this paper, we review the SAS ecosystem contexts where source code is produced, the pros and cons of each approach, discuss why some system-generated code is inelegant, and make some suggestions for determining when to write the code manually, and when and how to use system-generated code.
Thomas Billings, MUFG Union Bank
Visualizing the movement of people over time in an animation can provide insights that tables and static graphs cannot. There are many options, but what if you want to base the visualization on large amounts of data from several sources? SAS® is a great tool for this type of project. This paper summarizes how visualizing movement is accomplished using several data sets, large and small, and using various SAS procedures to pull it together. The use of a custom shape file is also highlighted. The end result is a GIF, which can be shared, that provides insights not available with other methods.
Stephanie Thompson, Datamum
Do you have a complex report involving multiple tables, text items, and graphics that could best be displayed in a multi-tabbed spreadsheet format? The Output Delivery System (ODS) destination for Excel, introduced in SAS® 9.4, enables you to create Microsoft Excel workbooks that easily integrate graphics, text, and tables, including column labels, filters, and formatted data values. In this paper, we examine the syntax used to generate a multi-tabbed Excel report that incorporates output from the REPORT, PRINT, SGPLOT, and SGPANEL procedures.
Caroline Walker, Warren Rogers Associates
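The multi-tab behavior comes from the SHEET_INTERVAL option; a minimal sketch with hypothetical data sets, where each procedure lands on its own worksheet:

  ods excel file='/reports/multi.xlsx' options(sheet_interval='proc');
  proc print data=work.summary noobs;
  run;
  proc sgplot data=work.trend;
    series x=month y=total;
  run;
  ods excel close;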
Are you frustrated with manually setting options to control your SAS® Display Manager sessions, and daunted every time you look at all the places where options and window layouts can be set? In this paper, we look at the various files SAS® accesses when starting, what can (and cannot) go into them, and what takes precedence after all are executed. We also look at the SAS registry and how to programmatically change settings. By the end of the paper, you will be comfortable in knowing where to make the changes that best fit your needs.
Peter Eberhardt, Fernwood Consulting Group Inc.
Mengting Wang, Qualcomm
This paper discusses the techniques I used at the US Census Bureau to overcome the issue of dealing with large amounts of data while modernizing some of their public-facing web applications by using service-oriented architecture (SOA) to deploy JavaScript web applications powered by SAS®. The paper covers techniques that resulted in reducing 1,753,926 records (82 MB) down to 58 records (328 KB), a 99.6% size reduction in summarized data on the server side.
Ahmed Al-Attar, AnA Data Warehousing Consulting, LLC
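Once the data is summarized server-side, PROC JSON can hand it to the JavaScript layer; a minimal sketch with hypothetical names:

  proc json out='/web/data/summary.json' pretty;
    export work.summary / nosastags;
  run;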
One of the research goals in public health is to estimate the burden of diseases on the US population. We describe burden of disease by analyzing the statistical association of various diseases with hospitalizations, emergency department (ED) visits, ambulatory/outpatient (doctors' offices) visits, and deaths. In this short paper, we discuss the use of large, nationally representative databases, such as those offered by the National Center for Health Statistics (NCHS) or the Agency for Healthcare Research and Quality (AHRQ), to produce reliable estimates of diseases for studies. In this example, we use SAS® and SUDAAN to analyze the Nationwide Emergency Department Sample (NEDS), offered by AHRQ, to estimate ED visits for hand, foot, and mouth disease (HFMD) in children less than five years old.
Jessica Rudd, Kennesaw State University
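A skeletal sketch of a design-based estimate from NEDS: the strata, cluster, and weight variables follow AHRQ's documented names, while the HFMD indicator is a hypothetical flag derived from diagnosis codes.

  proc surveyfreq data=work.neds_under5;
    strata neds_stratum;
    cluster hosp_ed;
    weight discwt;
    tables hfmd_flag / cl;   /* weighted national estimate with confidence limits */
  run;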
Chemical incidents involving irritant chemicals such as chlorine pose a significant threat to life and require rapid assessment. Data from the "Validating Triage for Chemical Mass Casualty Incidents: A First Step" R01 grant was used to determine the most predictive signs and symptoms (S/S) for a chlorine mass casualty incident. SAS® 9.4 was used to estimate sensitivity, specificity, positive and negative predictive values, and other statistics of irritant gas syndrome agent S/S for two existing systems designed to assist emergency responders in hazardous material incidents (Wireless Information System for Emergency Responders (WISER) and CHEMM Intelligent Syndrome Tool (CHEMM-IST)). The results for WISER showed the sensitivity was .72 to 1.0; specificity .25 to .47; and the positive predictive value and negative predictive value were .04 to .87 and .33 to 1.0, respectively. The results for CHEMM-IST showed the sensitivity was .84 to .97; specificity .29 to .45; and the positive predictive value and negative predictive value were .18 to .42 and .86 to .97, respectively.
Abbas Tavakoli, University of South Carolina
Joan Culley, University of South Carolina
Jane Richter, University of South Carolina
Sara Donevant, University of South Carolina
Jean Craig, Medical University of South Carolina
It is often necessary to assess multi-rater agreement for multiple-observation categories in case-controlled studies. The Kappa statistic is one of the most common agreement measures for categorical data. The purpose of this paper is to show an approach for using SAS® 9.4 procedures and the SAS® Macro Language to estimate Kappa with 95% CI for pairs of nurses that used two different triage systems during a computer-simulated chemical mass casualty incident (MCI). Data from the "Validating Triage for Chemical Mass Casualty Incidents: A First Step" R01 grant was used to assess the performance of a typical hospital triage system called the Emergency Severity Index (ESI), compared with an Irritant Gas Syndrome Agent (IGSA) triage algorithm being developed from this grant, to quickly prioritize the treatment of victims of IGSA incidents. Six different pairs of nurses used ESI triage, and seven pairs of nurses used the IGSA triage prototype to assess 25 patients exposed to an IGSA and 25 patients not exposed. Of the 13 pairs of nurses in this study, two pairs were randomly selected to illustrate the use of the SAS Macro Language for this paper. If the data was not square for two nurses, a square-form table for observers using pseudo-observations was created. A weight of 1 for real observations and a weight of .0000000001 for pseudo-observations were assigned. Several macros were used to reduce programming. In this paper, we show only the results of one pair of nurses for ESI.
Abbas Tavakoli, University of South Carolina
Joan Culley, University of South Carolina
Jane Richter, University of South Carolina
Sara Donevant, University of South Carolina
Jean Craig, Medical University of South Carolina
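The mechanics reduce to a weighted AGREE analysis; a condensed sketch, where the pseudo-observations that square the table carry a near-zero weight:

  data work.square;
    set work.real(in=is_real) work.pseudo;
    wt = ifn(is_real, 1, 1e-10);    /* pseudo rows barely influence the estimate */
  run;

  proc freq data=work.square;
    weight wt;
    tables nurse1*nurse2 / agree;
    test kappa;
  run;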
The ODS EXCEL destination has made sharing SAS® reports and graphs much easier. What is even more exciting is that this destination is available for use regardless of the platform. This is extremely useful when reporting is performed on remote servers. This presentation goes through the basics of using the ODS EXCEL destination and shows specific examples of how to use this in a remote environment. Examples for both SAS® on Windows and in SAS® Enterprise Guide® are provided.
Thomas Bugg, Wells Fargo Home Mortgage
Visualization of complex data can be a valuable tool for researchers and policy makers, and Base SAS® has powerful tools for such data exploration. In particular, SAS/GRAPH® software is a flexible tool that enables the analyst to create a wide variety of data visualizations. This paper uses SAS® to visualize complex demographic data related to the membership of a large American healthcare provider. Kaiser Permanente (KP) has demographic data about 4 million active members in Southern California. We use SAS to create a number of geographic visualizations of KP demographic data related to membership at the census-block level of detail and higher. Demographic data available from the US Census' American Community Survey (ACS) at the same level of geographic organization are also used as comparators to show how the KP membership differs from the demographics of the geographies from which it draws. In addition, we use SAS to create a number of visualizations of KP demographic data related to utilizations (inpatient and outpatient) at the medical center area level through time. As with the membership data, data available from the ACS is used as a comparator to show how patterns of KP utilizations at various medical centers compare to the demographics of the populations that these medical centers serve. The paper will be of interest to programmers learning how to use SAS to visualize data and to researchers interested in the demographics of one of the largest health care providers in the US.
Don McCarthy, Kaiser Permanente
Michael Santema, Kaiser Permanente
Have you ever been working on a task and wondered whether there might be a SAS® function that could save you some time? Better yet, one that might be able to do the work for you? Data review and validation tasks can be time-consuming efforts. Any gain in efficiency is highly beneficial, especially if you can achieve a standard level where the data itself can drive parts of the process. The ANY and NOT functions can help alleviate some of the manual work in many tasks such as data review of variable values, data compliance, data formats, and derivation or validation of a variable's data type. The list goes on. In this poster, we cover the functions and their details and use them in an example of handling date and time data and mapping it to ISO 8601 date and time formats.
Richann Watson, Experis
Karl Miller, inVentiv Health
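A minimal sketch of the date-handling idea from the poster above, assuming a character variable DATEC that should contain an 8-digit date such as 20170315; all names are illustrative.

   data checked;
      set raw_dates;
      /* NOTDIGIT returns 0 only when every character is a digit */
      if notdigit(strip(datec)) = 0 and lengthn(strip(datec)) = 8 then do;
         dt       = input(datec, yymmdd8.);
         iso_date = put(dt, e8601da10.);   /* ISO 8601: yyyy-mm-dd */
      end;
      else review_flag = 1;                /* route to manual data review */
      format dt date9.;
   run;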
Working with Sparse Matrices in SAS®
For the past couple of years, big data has been a buzzword in the industry. We have more and more data coming in from more and more places, and it is our job to figure out how best to handle it. One way to organize data is with arrays, but what do you do when the array you are attempting to populate is so large that it cannot be held in memory? How do you handle a large array when most of its elements are missing? This paper and presentation deal with the concept of a sparse matrix: a large array in which relatively few elements actually contain values. We address methods for handling such a construct while keeping memory, CPU, clock, and programmer time to their respective minimums.
Andrew Kuligowski, HSN
Lisa Mendez, QuintilesIMS
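One way to realize the sparse-matrix idea above is a hash object that stores only the cells that actually have values, keyed by row and column; the data set name and probe keys below are assumptions, not the authors' implementation.

   data _null_;
      length row col value 8;
      call missing(row, col, value);

      /* load one record per non-missing cell; empty cells are */
      /* never stored, so memory tracks the data, not the grid */
      declare hash cells(dataset: 'sparse_input');
      cells.defineKey('row', 'col');
      cells.defineData('value');
      cells.defineDone();

      /* probe one cell: FIND returns 0 only when the key exists */
      row = 1048; col = 23;
      if cells.find() = 0 then put 'Cell value: ' value=;
      else put 'Cell is implicitly missing.';
   run;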
A SAS® program with the extension .SAS is simply a text file. This fact opens the door to many powerful techniques. You can read a typical SAS program into a SAS data set as a text file, with one line of the program becoming one record holding a character variable. The program's code can then be changed, and a new program can be written out as a simple text file with a .SAS extension. This presentation shows an example of dynamically editing SAS code on the fly and generating statistics about SAS programs.
Peter Timusk, Statistics Canada
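A minimal sketch of the round trip the presentation above describes; file and variable names are illustrative.

   /* read a program into a data set, one line per record */
   data code;
      infile "old_program.sas" truncover;
      input line $char256.;
   run;

   /* edit the code with ordinary character functions */
   data code_edited;
      set code;
      line = tranwrd(line, "libname old", "libname new");
   run;

   /* write the result back out as a runnable .SAS text file */
   data _null_;
      set code_edited;
      file "new_program.sas";
      put line;
   run;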
SAS® has many methods of doing table lookups in DATA steps: formats, arrays, hash objects, the SASMSG function, indexed data sets, and so on. Of these methods, hash objects and indexed data sets enable you to specify multiple lookup keys and to return multiple table values. Both methods can be updated dynamically in the middle of a DATA step as you obtain new information (such as reading new keys from an input file or creating new synthetic keys). Hash objects are very flexible, fast, and fairly easy to use, but they are limited by the amount of data that can be held in memory. Indexed data sets can be slower, but they are not limited by what can be held in memory. As a result, they might be your only option in some circumstances. This presentation discusses how to use an indexed data set for table lookup and how to update it dynamically using the MODIFY statement and its allies.
Jack Hamilton, Kaiser Permanente
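A hedged sketch of both halves of the pattern the presentation above covers, assuming an indexed data set LOOKUP keyed on ID; the variable names are illustrative.

   /* lookup: read the matching observation through the index */
   data result;
      set transactions;
      set lookup key=id / unique;
      if _iorc_ ne 0 then do;        /* key not found            */
         _error_ = 0;                /* suppress the error flag  */
         call missing(descr);        /* clear stale match values */
      end;
   run;

   /* dynamic update: MODIFY revises the table as keys arrive */
   data lookup;
      set new_info;
      modify lookup key=id;
      if _iorc_ = 0 then replace;    /* existing key: overwrite */
      else do;
         _error_ = 0;
         output;                     /* new key: append          */
      end;
   run;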
The SAS® RAKING macro, introduced in 2000, has been implemented by countless survey researchers worldwide. The authors still receive messages from users who tirelessly rake survey data using all three generations of the macro. In this poster, we present the fourth generation of the macro, which cleans up remnants of the previous versions and resolves points of user-reported confusion. Most importantly, we introduce several helpful enhancements: 1) An explicit indicator for trimming (or not trimming) the weights, which substantially saves run time when no trimming is needed. 2) Two methods of weight trimming, AND and OR, that enable users to overcome stubborn non-convergence. When AND is specified, weight trimming occurs only if both the individual and the global high weight caps are exceeded; conversely, weights are increased only if both low weight caps are crossed. When OR is specified, weight trimming occurs if either of the two high weight caps is exceeded, and weights are increased if either of the two low weight caps is crossed. 3) Expanded summary statistics on the number of cases with trimmed or increased weights. 4) New parameters that enable users to apply different convergence criteria to different raking margins. We anticipate that these innovations will be enthusiastically received and implemented by the survey research community.
David Izrael, Abt Associates
Michael Battaglia, Battaglia Consulting Group, LLC.
Ann Battaglia, Battaglia Consulting Group, LLC.
Sarah Ball, Abt Associates
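The AND/OR high-weight logic described above can be illustrated with a fragment like the following; CAP_IND and CAP_GLOB stand in for the individual and global caps, and none of these names are the macro's actual parameters.

   data trimmed;
      set weights;
      cap_ind  = 6;   /* hypothetical individual high-weight cap */
      cap_glob = 5;   /* hypothetical global high-weight cap     */

      /* AND: trim only when BOTH caps are exceeded; */
      /* OR:  trim when EITHER cap is exceeded.      */
      if method = 'AND' then
         trim_high = (weight > cap_ind) and (weight > cap_glob);
      else if method = 'OR' then
         trim_high = (weight > cap_ind) or (weight > cap_glob);

      if trim_high then weight = min(cap_ind, cap_glob);  /* assumed target */
   run;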