Nowadays, most corporations build and maintain their own data warehouse, and an ETL (Extract, Transform, and Load) process plays a critical role in managing the data. Some people might create one large program and execute it from top to bottom. Others might generate a SAS® driver that includes several programs, and then execute this driver. If some programs can run in parallel, developers must write extra code to handle the concurrent processes. If one program fails, users can either rerun the entire process or comment out the successful programs and resume the job from the point of failure. Usually the programs are deployed in production with read and execute permission only, so users do not have the privilege of modifying code on the fly. In that case, how do you comment out programs when the job terminates abnormally? This paper illustrates an approach for managing ETL process flows. The approach uses a framework based on SAS, on a UNIX platform. This is a high-level infrastructure discussion with some explanation of the SAS code used to implement the framework. The framework supports rerunning or partially rerunning the entire process without changing any source code. It also supports concurrent processes, so no extra code is needed.
Kevin Chung, Fannie Mae
Stepwise regression includes regression models in which the predictive variables are selected by an automated algorithm. The stepwise method involves two approaches: backward elimination and forward selection. Currently, SAS® has three procedures capable of performing stepwise regression: REG, LOGISTIC, and GLMSELECT. PROC REG handles the linear regression model, but does not support a CLASS statement. PROC LOGISTIC handles binary responses and allows for logit, probit, and complementary log-log link functions. It also supports a CLASS statement. The GLMSELECT procedure performs selections in the framework of general linear models. It allows for a variety of model selection methods, including the LASSO method of Tibshirani (1996) and the related LAR method of Efron et al. (2004). PROC GLMSELECT also supports a CLASS statement. We present a stepwise algorithm for generalized linear mixed models for both marginal and conditional models. We illustrate the algorithm using data from a longitudinal epidemiology study aimed at investigating parents' beliefs, behaviors, and feeding practices that are associated positively or negatively with indices of sleep quality.
Nagaraj Neerchal, University of Maryland Baltimore County
Jorge Morel, Procter and Gamble
Xuang Huang, University of Maryland Baltimore County
Alain Moluh, University of Maryland Baltimore County
SAS® functions provide amazing power to your DATA step programming. Some of these functions are essential; others save you from writing volumes of unnecessary code. This paper covers some of the most useful SAS functions. Some of these functions might be new to you, and they will change the way you program and approach common programming tasks.
Ron Cody, Camp Verde Associates
This paper develops a new SAS® macro that allows you to scrape user textual reviews for iPhone applications from the Apple iTunes Store. It not only can help you understand your customers' experiences and needs, but also can help you stay aware of your competitors' user experiences. The macro uses the iTunes API and PROC HTTP in SAS to extract the reviews and create data sets. This paper also shows how you can use the application ID and country code to extract user reviews.
Jiawen Liu, Qualex Consulting Services, Inc.
Mantosh Kumar Sarkar, Verizon
Meizi Jin, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
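As a minimal sketch of the approach the abstract above describes, PROC HTTP can fetch a review feed into a fileref for later parsing. The RSS URL pattern and the application ID shown here are assumptions for illustration only:

```sas
/* Hypothetical app ID and country code; the iTunes RSS URL pattern is an assumption */
filename resp temp;

proc http
   url="https://itunes.apple.com/us/rss/customerreviews/id=123456789/json"
   method="GET"
   out=resp;
run;

/* Inspect the raw JSON response; parsing it into a data set would follow */
data _null_;
   infile resp;
   input;
   put _infile_;
run;
```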
The Base SAS® 9.4 Output Delivery System (ODS) EPUB destination enables users to deliver SAS® reports as e-books on Apple mobile devices. The first maintenance release of SAS® 9.4 adds the ODS EPUB3 destination, which offers powerful new multimedia and presentation features to report writers. This paper shows you how to include images, audio, and video in your ODS EPUB3 e-book reports. You learn how to use publishing presentation techniques such as sidebars and multicolumn layouts. You become familiar with best practices for accessibility when employing these new features in your reports. This paper provides advanced instruction for writing e-books with ODS EPUB. Please bring your iPad, iPhone, or iPod to the presentation so that you can download and read the examples.
David Kelley, SAS
This paper is an introduction to SAS® Studio and covers how to perform basic programming tasks in SAS Studio. Many people program in the SAS® language by using SAS Display Manager or SAS® Enterprise Guide®. SAS Studio is different because it enables you to write and run SAS code by using the most popular web browsers, without requiring a SAS® 9.4 installation on your machine. With SAS Studio, you can access your data files, libraries, and existing programs, and write new programs while using SAS software behind the scenes. SAS Studio connects to a SAS server in order to process SAS programs. The SAS server can be a hosted server in a cloud environment, a server in your local environment, or a copy of SAS on your local machine.
Michael Monaco, SAS
Marie Dexter, SAS
Jennifer Tamburro, SAS
Big data! Hadoop! MapReduce! These are all buzzwords that you've probably already heard mentioned at SAS® Global Forum 2014. But what exactly is MapReduce, and what has it got to do with SAS®? This talk explains how a simple processing framework (created by Google and more recently popularized by the open-source technology Hadoop) can be replicated using cornerstone SAS technologies such as Base SAS®, SAS macros, and SAS/CONNECT®. The talk explains how, out of the box, the SAS DATA step can replicate the MAP function. It looks at how well-established SAS procedures can be used to create reduce-like functionality. We look at how parallel processing data across multiple machines using MPCONNECT can replicate MapReduce's shared-nothing approach to data processing.
David Moors, Whitehound Limited
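The map-and-reduce analogy sketched above can be illustrated with a toy example: a DATA step emits (key, value) pairs (the "map"), and a summarizing procedure aggregates by key (the "reduce"). The use of SASHELP.CLASS is purely illustrative:

```sas
/* Map: emit one (key, value) pair per input row */
data mapped;
   set sashelp.class;
   key   = sex;
   value = 1;
   keep key value;
run;

/* Reduce: aggregate the values for each key */
proc means data=mapped noprint nway;
   class key;
   var value;
   output out=reduced (drop=_type_ _freq_) sum=count;
run;
```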
A central component of discussions of healthcare reform in the U.S. is the estimation of healthcare cost and use at the national or state level, as well as subpopulation analyses for individuals with certain demographic properties or medical conditions. For example, a striking but persistent observation is that just 1% of the U.S. population accounts for more than 20% of total healthcare costs, and 5% account for almost 50% of total costs. In addition to descriptions of specific data sources underlying this type of observation, we demonstrate how to use SAS® to generate these estimates and to extend the analysis in various ways; that is, to investigate costs for specific subpopulations. The goal is to provide SAS programmers and healthcare analysts with sufficient data-source background and analytic resources to independently conduct analyses on a wide variety of topics in healthcare research. For selected examples, such as the estimates above, we concretely show how to download the data from federal web sites, replicate published estimates, and extend the analysis. An added plus is that most of the data sources we describe are available as free downloads.
Paul Gorrell, IMPAQ International
A SAS® license for any organization consists of a variety of SAS components such as SAS/STAT®, SAS/GRAPH®, SAS/OR®, and so on. SAS administrators do not have any automated tool supplied with Base SAS® software to find out how many licensed components are being actively used, how many SAS users are actively using the SAS server, and how many SAS data sets are being referenced. Answering these questions helps a SAS administrator make important decisions such as controlling SAS licenses, removing inactive SAS users, purging SAS data sets that have not been referenced for a long time, and so on. With the help of a SAS system parameter called RTRACE, these questions can be answered. The goal of this paper is to explain the setup of the RTRACE parameter and its use in making the SAS administrator's life easy. This paper is based on SAS® 9.2 running on the AIX operating system.
Airaha Chelvakkanthan Manickam, Cognizant Technology Solutions
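A minimal sketch of the kind of setup the abstract describes: the RTRACE options are specified at SAS invocation (or in the configuration file), and the resulting trace log is then read back with a DATA step. The file paths are placeholders, and the exact option syntax should be checked against the SAS system options documentation for your release:

```sas
/* In the SAS configuration file or on the command line (UNIX):
     -RTRACE ALL
     -RTRACELOC /sas/logs/rtrace_userid.log
*/

/* Afterwards, read the trace log to see which files were referenced */
data rtrace;
   infile '/sas/logs/rtrace_userid.log' truncover;
   input filename $200.;
run;
```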
Sampling is widely used in different fields for quality control, population monitoring, and modeling. However, the purposes of sampling might be justified by the business scenario, such as legal or compliance needs. This paper uses one probability sampling method, stratified sampling, combined with the business cost of quality control review to determine an optimized sampling procedure that satisfies both statistical selection criteria and business needs. The first step is to determine the total number of strata by grouping the strata that have a small number of sample units, using box-and-whisker plot outliers as a whole. Then, the cost to review the sample in each stratum is justified by its business counterpart, measured in human working hours. Lastly, using the determined number of strata and the sample review cost, optimal allocation of the predetermined total sample is applied to allocate the sample across the strata.
Yi Du, Freddie Mac
Paper SAS051-2014:
Ask Vince: Moving SAS® Data and Analytical Results to Microsoft Excel
This presentation is an open-ended discussion about techniques for transferring data and analytical results from SAS® to Microsoft Excel. There will be some introductory comments, but this presentation does not have any set content. Instead, the topics discussed are dictated by attendee questions. Come prepared to ask and get answers to your questions. To submit your questions or suggestions for discussion in advance, go to http://support.sas.com/surveys/askvince.html.
Vince DelGobbo, SAS
In our previous work, we often needed to perform large numbers of repetitive and data-driven post-campaign analyses to evaluate the performance of marketing campaigns in terms of customer response. These routine tasks were usually carried out manually by using Microsoft Excel, which was tedious, time-consuming, and error-prone. In order to improve the work efficiency and analysis accuracy, we managed to automate the analysis process with SAS® programming and replace the manual Excel work. Through the use of SAS macro programs and other advanced skills, we successfully automated the complicated data-driven analyses with high efficiency and accuracy. This paper presents and illustrates the creative analytical ideas and programming skills for developing the automatic analysis process, which can be extended to apply in a variety of business intelligence and analytics fields.
Justin Jia, Canadian Imperial Bank of Commerce (CIBC)
Amanda Lin, Bell Canada
This paper kicks off a project to write a comprehensive book of best practices for documenting SAS® projects. The presenter's existing documentation styles are explained. The presenter wants to discuss and gather current best practices used by the SAS user community. The presenter shows documentation styles at three different levels of scope: the first is a style used for project documentation, the second a style for program documentation, and the third a style for variable documentation. This third style enables researchers to repeat the modeling in SAS research, in an alternative language, or conceptually.
Peter Timusk, Statistics Canada
As IT professionals, saving time is critical. Delivering timely and quality-looking reports and information to management, end users, and customers is essential. SAS® provides numerous 'canned' PROCedures for generating quick results to take care of these needs ... and more. In this hands-on workshop, attendees acquire basic insights into the power and flexibility offered by SAS PROCedures using PRINT, FORMS, and SQL to produce detail output; FREQ, MEANS, and UNIVARIATE to summarize and create tabular and statistical output; and data sets to manage data libraries. Additional topics include techniques for informing SAS which data set to use as input to a procedure, how to subset data using a WHERE statement (or WHERE= data set option), and how to perform BY-group processing.
Kirk Paul Lafler, Software Intelligence Corporation
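The workshop topics above can be sketched with a short, self-contained example: a detail listing subset with a WHERE statement, followed by BY-group summary statistics (which require the input to be sorted by the BY variable). SASHELP.CLASS is used only for illustration:

```sas
/* Detail listing of a subset using a WHERE statement */
proc print data=sashelp.class noobs;
   where age >= 14;
   var name sex age;
run;

/* BY-group processing requires sorted input */
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

/* Summary statistics for each BY group */
proc means data=class_sorted n mean min max;
   by sex;
   var height weight;
run;
```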
The emerging discipline of data governance encompasses data quality assurance, data access and use policy, security risks and privacy protection, and longitudinal management of an organization s data infrastructure. In the interests of forestalling another bureaucratic solution to data governance issues, this presentation features database programming tools that provide rapid access to big data and make selective access to and restructuring of metadata practical.
Sigurd Hermansen, Westat
The life of a SAS® program can be broken down into sets of changes made over time. Programmers are generally focused on the future, but when things go wrong, a look into the past can be invaluable. Determining what changes were made, why they were made, and by whom can save both time and headaches. This paper discusses version control and the current options available to SAS® Enterprise Guide® users. It then highlights the upcoming Program History feature of SAS Enterprise Guide. This feature enables users to easily track changes made to SAS programs. Properly managing the life cycle of your SAS programs will enable you to develop with peace of mind.
Joe Flynn, SAS
Casey Smith, SAS
Alex Song, SAS
When a SAS® user asked for help scanning words in textual data and then matching them to pre-scored keywords, it struck a chord with SAS programmers! They contributed code that solved the problem using hash structures, SQL, informats, arrays, and PRX routines. Of course, the next question was which program is fastest! This paper compares the different approaches and evaluates the performance of the programs on varying amounts of data. The code for each program is provided to show how SAS has a variety of tools available to solve common problems. While this won't make you an expert on any of these programming techniques, you'll see each of them in action on a common problem.
Tom Kari, Tom Kari Consulting
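One of the approaches mentioned above, a hash-object lookup, can be sketched as follows. The data set names KEYWORDS (with variables WORD and SCORE) and TEXT (with variable LINE) are hypothetical:

```sas
/* Score words in free text against a pre-scored keyword table */
data scored;
   if _n_ = 1 then do;
      declare hash kw (dataset:'keywords');  /* hypothetical KEYWORDS: WORD, SCORE */
      kw.defineKey('word');
      kw.defineData('score');
      kw.defineDone();
   end;
   set text;                                 /* hypothetical TEXT with LINE variable */
   length word $ 32;
   score = .;
   do i = 1 to countw(line, ' ');
      word = lowcase(scan(line, i, ' '));
      if kw.find() = 0 then output;          /* word matched a pre-scored keyword */
   end;
   drop i;
run;
```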
This paper describes a method that uses some simple SAS® macros and SQL to merge data sets containing related data with rows that have varying effective date ranges. The data sets are merged into a single data set that represents a serial list of snapshots of the merged data, as of a change in any of the effective dates. While simple conceptually, this type of merge is often problematic when the effective date ranges are not consecutive or consistent, when the ranges overlap, or when there are missing ranges in one or more of the merged data sets. The technique described was used by the Fairfax County Human Resources Department to combine various employee data sets (Employee Name and Personal Data, Personnel Assignment and Job Classification, Personnel Actions, Position-Related data, Pay Plan and Grade, Work Schedule, Organizational Assignment, and so on) from the County's SAP-HCM ERP system into a single Employee Action History/Change Activity file for historical reporting purposes. The technique currently is used to combine fourteen data sets, but is easily expandable by inserting a few lines of code using the existing macros.
James Moon, County of Fairfax, Virginia
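The core of such an effective-date merge can be sketched in PROC SQL: join on the key, keep only overlapping ranges, and intersect the ranges to form the snapshot window. The table and column names (JOB_HIST, PAY_HIST, EFF_START, EFF_END) are hypothetical stand-ins for the employee data sets described above:

```sas
/* Hypothetical inputs JOB_HIST and PAY_HIST, each with
   EMP_ID plus EFF_START/EFF_END date-range columns */
proc sql;
   create table snapshots as
   select j.emp_id,
          max(j.eff_start, p.eff_start) as snap_start format=date9.,
          min(j.eff_end,   p.eff_end)   as snap_end   format=date9.,
          j.job_class,
          p.pay_grade
   from job_hist as j
        inner join pay_hist as p
        on  j.emp_id = p.emp_id
        and j.eff_start <= p.eff_end    /* the two ranges overlap */
        and p.eff_start <= j.eff_end;
quit;
```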
With the growth in size and complexity of organizations investing in SAS® platform technologies, the size and complexity of ETL subsystems and data integration (DI) jobs is growing at a rapid rate. Developers are pushed to come up with new and innovative ways to improve process efficiency in their DI jobs to meet increasingly demanding service level agreements (SLAs). The ability to conditionally execute or switch paths in a DI job is an extremely useful technique for improving process efficiency. How can a SAS® Data Integration developer design a job to best suit conditional execution? This paper discusses a technique for providing a parameterized dynamic execution custom transformation that can be easily incorporated into SAS® Data Integration Studio jobs to provide process path switching capabilities. The aim of any data integration task is to ensure that all sources of business data are integrated as efficiently as possible. It is concerned with the repurposing of data via transformation, should be a value-adding process, and also should be the product of collaboration. Modularization of common or repeatable processes is a fundamental part of the collaboration process in DI design and development. Switch Path, a custom transformation built to conditionally execute branches or nodes in SAS Data Integration Studio, provides a reusable module for overcoming the conditional execution limitations of standard SAS Data Integration Studio transformations and jobs. Switch Path logic in SAS Data Integration Studio can serve many purposes in the day-to-day business needs of a SAS data integration developer, as it is completely reusable.
Prajwal Shetty, Tesco
This presentation explains how to use Base SAS®9 software to create multi-sheet Microsoft Excel workbooks. You learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output using the ExcelXP ODS tagset. The techniques can be used regardless of the platform on which your SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on-demand and in real time using SAS server technology is discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.
Vince DelGobbo, SAS
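A minimal sketch of the ExcelXP technique described above, writing each BY group to its own worksheet (the output path and style name are placeholders; available styles vary by installation):

```sas
/* Route output to the ExcelXP tagset; one worksheet per BY group */
ods _all_ close;
ods tagsets.excelxp file='/tmp/report.xml'
    options(sheet_interval='bygroup' sheet_label='Sex');

proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

proc print data=class_sorted noobs;
   by sex;
run;

ods tagsets.excelxp close;
ods listing;
```

Opening the resulting XML file in Microsoft Excel shows one worksheet per value of SEX.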
Business Intelligence platforms provide a bridge between expert data analysts and decision-makers and other end-users. But what do you do when you can identify no system that meets both your needs and your budget? If you are the Consolidated Data Analysis Center in the HHS Office of Inspector General, you use SAS® Enterprise BI Server and the SAS® Stored Process Web Application to build your own. This presentation covers the inception, design, and implementation of the PAYment by Geographic Area (PAYGAR) system, which uses only SAS® Enterprise BI tools, namely the SAS Stored Process Web Application, PROC GMAP, and HTML/JAVA embedded in a DATA step, to create an interactive platform for presenting and exploring data that has a geographic component. In particular, the presentation reviews how we created a system of chained stored processes to enable a user to select the data to be presented, navigate through different geographic levels, and display companion reports related to the current data and geographic selections. It also covers the creation of the HTML front-end that sits over and manages the system. Throughout, the presentation emphasizes the scalability of PAYGAR, which the SAS Stored Process Web Application facilitates.
Scott Hutchison, HHS Office of Inspector General
John Venturini, Piper Enterprise Solutions
The Washington D.C. aqueduct was completed in 1863, carrying desperately needed clean water to its many residents. Just as the aqueduct was vital and important to its residents, a lifeline if you will, so too is the supply of data to the business. Without the flow of vital information, many businesses would not be able to make important decisions. The task of building my company's first dashboard was brought before us by our CIO; the business had not asked for it. In this poster, I discuss how we were able to bring fresh ideas and data to our business units by converting the data they saw on a daily basis in reports to dashboards. The road to success was long, with plenty of struggles: creating our own business requirements, building data marts, syncing SQL to SAS®, and using information maps and SAS® Enterprise Guide® projects to move data around, all while dealing with technology and other I.T. team roadblocks. Then it was on to designing what would become our real-time dashboards, fighting for SharePoint single sign-on, and, oh yeah, user adoption. My story of how dashboards revitalized the business is a refreshing tale for all levels.
Jennifer McBride, Virginia Credit Union
Cross-visit checks are a vital part of data cleaning for longitudinal studies. The nature of longitudinal studies encourages repeatedly collecting the same information. Sometimes, these variables are expected to remain static, go away, increase, or decrease over time. This presentation reviews the naïve and the better approaches at handling one-variable and two-variable consistency checks. For a single-variable check, the better approach features the ALLCOMB function, introduced in SAS® 9.2. For a two-variable check, the better approach uses the FIRST. automatic variable to flag inconsistencies. This presentation will provide you with the tools to enhance your longitudinal data cleaning process.
Lauren Parlett, Johns Hopkins University
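The FIRST.-based consistency check mentioned above can be sketched as follows, flagging visits where a variable that should be static (here SEX) changes within a subject. The data set and variable names are hypothetical:

```sas
/* Hypothetical longitudinal data: one row per subject visit */
proc sort data=visits out=visits_sorted;
   by subject_id visit_date;
run;

data inconsistent;
   set visits_sorted;
   by subject_id;
   retain first_sex;
   if first.subject_id then first_sex = sex;      /* remember the baseline value */
   else if sex ne first_sex then output;          /* value changed across visits */
   drop first_sex;
run;
```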
Having data that are consistent, reliable, and well linked is one of the biggest challenges faced by financial institutions. The paper describes how the SAS® Data Management offering helps to connect people, processes, and technology to deliver consistent results for data sourcing and analytics teams, and minimizes the cost and time involved in the development life cycle. The paper concludes with best practices learned from various enterprise data initiatives.
Anand Jagarapu, Arunam Technologies LLC
Your company's chronically overloaded SAS® environment, adversely impacted user community, and the resultant lackluster productivity have finally convinced your upper management that it is time to upgrade to a SAS® grid to eliminate all the resource problems once and for all. But after the contract is signed and implementation begins, you as the SAS administrator suddenly realize that your company-wide standard mode of SAS operations, that is, using the traditional SAS® Display Manager on a server machine, runs counter to the expectation of the SAS grid: your users are now supposed to switch to SAS® Enterprise Guide® on a PC. This is utterly unacceptable to the user community because almost everything has to change in a big way. If you like to play a hero in your little world, this is your opportunity. There are a number of things you can do to make the transition to the SAS grid as smooth and painless as possible, so that your users get to keep their favorite SAS Display Manager.
Houliang Li, HL SASBIPros Inc
New, innovative analytical techniques are necessary to extract patterns in big data that have temporal and geo-spatial attributes. Such an approach is required when geo-spatial time series data sets, which have billions of rows with exact latitude and longitude precision, make it extremely difficult to locate patterns of interest. The usual temporal bins of years, months, days, hours, and minutes often do not give the analyst control of the precision necessary to find patterns of interest. Geohashing is a string representation of two-dimensional geometric coordinates. Time hashing is a similar representation, which maps time to preserve all temporal aspects of the date and time of the data into a one-dimensional set of data points. Geohashing and time hashing are both forms of a Z-order curve, which maps multidimensional data into a single dimension and preserves the locality of the data points. This paper explores the use of a multidimensional Z-order curve, combining both geohashing and time hashing, that is known as geo-temporal hashing or space-time boxes, using SAS®. This technique provides a foundation for reducing the data into bins that can yield new methods for pattern discovery and detection in big data.
Richard La Valley, Leidos
Abraham Usher, Human Geo Group
Don Henderson, Henderson Consulting Services
Paul Dorfman, Dorfman Consulting
Very often, there is a need to present analysis output from SAS® through web applications. On these occasions, it makes a great deal of difference to have highly interactive charts rather than static image charts and graphs. Not only is this visually appealing, but with features like zooming and filtering, it enables consumers to better understand the output. There are many charting libraries available in the market that enable us to develop attractive charts without much effort, including Highcharts, Highstock, KendoUI, and so on. They are developed in JavaScript and use the latest HTML5 components, and they support a variety of chart types such as line, spline, area, area spline, column, bar, pie, scatter, angular gauge, area range, area spline range, column range, bubble, box plot, error bar, funnel, waterfall, and polar. This paper demonstrates how we can combine the data processing and analytic power of SAS with the visualization abilities of these charting libraries. Because most of them consume JSON-formatted data, the emphasis is on the JSON-producing capabilities of SAS, both with PROC JSON and with other custom programming methods. The example shows how easy it is to develop a stored process that produces JSON data to be consumed by the charting library, with minimal change to the sample program.
Rajesh Inbasekaran, Kavi Associates
Naren Mudivarthy, Kavi Associates
Neetha Sindhu, Kavi Associates
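The JSON-producing step described above can be sketched with PROC JSON (available in SAS® 9.4): summarize the data, then export it as JSON for a charting library to consume. The output path is a placeholder, and SASHELP.CLASS stands in for real analysis output:

```sas
/* Summarize the data to be charted */
proc means data=sashelp.class noprint nway;
   class sex;
   var height;
   output out=summary (drop=_type_ _freq_) mean=avg_height;
run;

/* Emit the summary as JSON for the charting library */
proc json out='/tmp/chartdata.json' pretty;
   export summary / nosastags;
run;
```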
APP is an unofficial collective abbreviation for the SAS® functions ADDR, PEEK, PEEKC, the CALL POKE routine, and their so-called LONG 64-bit counterparts: the SAS tools designed to directly read from and write to physical memory in the DATA step. APP functions have long been a SAS dark horse. First, the examples of APP usage in SAS documentation amount to a few technical report tidbits intended for mainframe system programming, with nary a hint of how the functions can be used for data management programming. Second, the documentation note on the CALL POKE routine is so intimidating in tone that many potentially receptive folks might decide to avoid the allegedly precarious route altogether. However, little can stand in the way of an inquisitive SAS programmer daring to take a close look, and it turns out that APP functions are very simple and useful tools! They can be used to explore how things really work, to make code more concise, to implement en masse data movement, and they can often dramatically improve execution efficiency. The author and many other SAS experts (notably Peter Crawford, Koen Vyverman, Richard DeVenezia, Toby Dunn, and the fellow masked by his 'Puddin' Man' sobriquet) have been poking around the SAS APP realm on SAS-L and in their own practices since 1998, occasionally letting the SAS community at large peek at their findings. This opus is an attempt to circumscribe the results in a systematic manner. Welcome to the APP world! You are in for a few glorious surprises.
Paul Dorfman, Dorfman Consulting
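A tiny, hedged taste of the LONG counterparts mentioned above: ADDRLONG returns the memory address of a DATA step variable, and PEEKCLONG reads raw bytes from that address. This sketch merely copies a variable's own bytes, which is safe but illustrates the mechanism:

```sas
/* Read the raw bytes of a character variable via its own address */
data _null_;
   length str copy $ 8;
   str  = 'SASRULES';
   copy = peekclong(addrlong(str), 8);   /* byte-for-byte copy from memory */
   put copy=;
run;
```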
The implicit loop refers to the DATA step repetitively reading data and creating observations, one at a time. The explicit loop, which uses the iterative DO, DO WHILE, or DO UNTIL statements, is used to repetitively execute certain SAS® statements within each iteration of the DATA step execution. Explicit loops are often used to simulate data and to perform a certain computation repetitively. However, when an explicit loop is used along with array processing, the applications are extended widely, which includes transposing data, performing computations across variables, and so on. To be able to write a successful program that uses loops and arrays, one needs to know the contents in the program data vector (PDV) during the DATA step execution, which is the fundamental concept of DATA step programming. This workshop covers the basic concepts of the PDV, which is often ignored by novice programmers, and then illustrates how to use loops and arrays to transform lengthy code into more efficient programs.
Arthur Li, City of Hope
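One of the loop-plus-array applications described above, transposing a wide record into multiple observations, can be sketched as follows. The input data set WIDE (with variables ID and Q1-Q4) is hypothetical:

```sas
/* Collapse quarterly columns into one observation per quarter */
data long;
   set wide;                     /* hypothetical WIDE with ID and Q1-Q4 */
   array q{4} q1-q4;
   do quarter = 1 to 4;
      sales = q{quarter};
      output;                    /* one observation per loop iteration */
   end;
   keep id quarter sales;
run;
```

Each OUTPUT statement writes the current contents of the program data vector, which is why one input row yields four output rows.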
Typically, it takes a system administrator to understand the graphic data results that are generated in the Microsoft Windows Performance Monitor. However, using SAS/GRAPH® software, you can customize performance results in such a way that makes the data easier to read and understand than the data that appears in the default performance monitor graphs. This paper uses a SAS® data set that contains a subset of the most common performance counters to show how SAS programmers can create an improved, easily understood view of the key performance counters by using SAS/GRAPH software. This improved view can help your organization reduce resource bottlenecks on systems that range from large servers to small workstations. The paper begins with a concise explanation of how to collect data with Windows Performance Monitor. Next, examples are used to illustrate the following topics in detail: converting and formatting a subset of the performance-monitor data into a data set; using a SAS program to generate clearly labeled graphs that summarize performance results; and analyzing results in different combinations that illustrate common resource bottlenecks.
John Maxwell, SAS
SAS® is an outstanding suite of software, but not everyone in the workplace speaks SAS. However, almost everyone speaks Excel. Often, the data you are analyzing, the data you are creating, and the report you are producing is a form of a Microsoft Excel spreadsheet. Every year at SAS® Global Forum, there are SAS and Excel presentations, not just because Excel is so pervasive in the workplace, but because there's always something new to learn (or re-learn)! This paper summarizes and references (and pays homage to!) previous SAS Global Forum presentations, as well as examines some of the latest Excel capabilities with the latest versions of SAS® 9.4 and SAS® Visual Analytics.
Andrew Howell, ANJ Solutions
Explore the various DATA step merge and PROC SQL join processes. This presentation examines the similarities and differences between merges and joins, and provides examples of effective coding techniques. Attendees examine the objectives and principles behind merges and joins, one-to-one merges (joins), and match-merge (equi-join), as well as the coding constructs associated with inner and outer merges (joins) and PROC SQL set operators.
Kirk Paul Lafler, Software Intelligence Corporation
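The merge-versus-join comparison described above can be sketched side by side. The data sets ONE and TWO (with key ID and variables X and Y) are hypothetical:

```sas
/* Match-merge in the DATA step (inputs must be sorted by ID) */
data merged;
   merge one (in=a) two (in=b);
   by id;
   if a and b;                  /* keep matches only, like an inner join */
run;

/* Equivalent inner join in PROC SQL (no prior sort required) */
proc sql;
   create table joined as
   select one.id, one.x, two.y
   from one
        inner join two
        on one.id = two.id;
quit;
```

Changing `if a and b` to `if a` approximates a left join; PROC SQL expresses the same with `left join`.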
Each month, our project team delivers updated 5-Star ratings for 15,700+ nursing homes across the United States to the Centers for Medicare and Medicaid Services. There is a wealth of data (and processing) behind the ratings, and this data is longitudinal in nature. A prior paper in this series, 'Programming the Provider Previews: Extreme SAS® Reporting,' discussed one aspect of the processing involved in maintaining the Nursing Home Compare website. This paper discusses two other aspects of our processing: creating an annual data Compendium and extending the 5-Star processing to accommodate several different output formats for different purposes. Products used include Base SAS®, SAS/STAT®, ODS Graphics procedures, and SAS/GRAPH®. New annotate facilities in both SAS/GRAPH and the ODS Graphics procedures are discussed. This paper and presentation will be of most interest to SAS programmers with medium to advanced SAS skills.
Louise Hadden, Abt Associates Inc.
Traditionally, web applications interact with back-end databases by means of JDBC/ODBC connections to retrieve and update data. With the growing need for real-time charting and complex analysis types of data representation on these web applications, SAS computing power can be put to use by adding a SAS web service layer between the application and the database. Based on our experience integrating these applications with SAS® BI Web Services, this is our attempt to point out five things to do when using SAS BI Web Services. 1) Input Data Sources: always enable 'Allow rewinding stream' while creating the stored process. 2) Use LIBNAME statements to define XML filerefs for the Input and Output Streams (Data Sources). 3) Define input prompts and output parameters as global macro variables in the stored process if the stored process calls macros that use these parameters. 4) Make sure that all of the output parameter values are set correctly as defined (data type) before the end of the stored process. 5) The Input Streams (if any) should have a consistent data type; essentially, every instance of the stream should have the same structure. This paper consists of examples and illustrations of errors and warnings associated with the previously mentioned cases.
Neetha Sindhu, Kavi Associates
Vimal Raj, Kavi Associates
Data is often stored in highly normalized ('tall and skinny') structures that are not convenient for analysis. The SAS® programmer frequently needs to transform the data to arrange relevant variables together in a single row. Sometimes this is a simple matter of using the TRANSPOSE procedure to flip the values of a single variable into separate variables. However, when there are multiple variables to be transposed to a single row, it might require multiple transpositions to obtain the desired result. This paper describes five different ways to achieve this flip-flop, explains how each method works, and compares the usefulness of each method in various situations. Emphasis is given to achieving a data-driven solution that minimizes hard-coding based on prior knowledge of the possible values each variable can have and that improves maintainability and reusability of the code. The intended audience is novice and intermediate SAS programmers who have a basic understanding of the DATA step and the TRANSPOSE procedure.
Josh Horstman, Nested Loop Consulting
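By way of illustration, one classic approach to this flip-flop (among the five the paper compares) is the double transpose: lengthen all the measures into a single column, then widen by subject. A minimal sketch using hypothetical data; variable and data set names are illustrative only:

```sas
/* Hypothetical input: one row per subject-visit, two measures per row */
data long;
   input subject visit weight pulse;
   datalines;
1 1 150 72
1 2 148 70
2 1 200 80
2 2 198 78
;

/* Step 1: lengthen - one row per subject/visit/measure */
proc transpose data=long out=tall;
   by subject visit;
   var weight pulse;
run;

/* Step 2: widen - multiple ID variables concatenate into names
   like weight1, pulse1, weight2, pulse2 */
proc transpose data=tall out=wide(drop=_name_);
   by subject;
   id _name_ visit;
   var col1;
run;
```

Because the second ID statement builds variable names from the data, the result is data-driven: no hard-coding of visit numbers is required.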
PROC TABULATE, along with PROC REPORT, is one of the most widely used reporting tools in SAS®. Almost any kind of report with the desired statistics can be produced by PROC TABULATE. But when we need to report summary statistics such as the mean, median, and range in a heading, we either have to edit the output outside SAS in word-processing software or enter the values manually. In this paper, we discuss how to make such headings dynamic by using PROC SQL and some simple macros.
Lovedeep Gondara, BC Cancer Agency
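The core of the technique the abstract describes can be sketched as follows: compute the statistic with PROC SQL, capture it in a macro variable, and reference that variable in the table heading. A minimal sketch using SASHELP.CLASS; the heading text and format are illustrative assumptions:

```sas
/* Compute the statistic once and store it in a macro variable */
proc sql noprint;
   select put(mean(height), 5.1) into :meanht trimmed
   from sashelp.class;
quit;

/* Resolve the macro variable inside the table heading */
proc tabulate data=sashelp.class;
   class sex;
   var height;
   table sex, height*(n mean)
         / box="Height (overall mean = &meanht)";
run;
```

If the data change, the heading updates automatically on the next run, with no manual editing of the output.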
This paper shares our experience integrating two leading data analytics and Geographic Information Systems (GIS) software products, SAS® and ArcGIS, to provide integrated reporting capabilities. SAS is a powerful tool for data manipulation and statistical analysis. ArcGIS is a powerful tool for analyzing data spatially and presenting complex cartographic representations. Combining statistical data analytics and GIS provides increased insight into data and allows for new and creative ways of visualizing the results. Although products exist to facilitate the sharing of data between SAS and ArcGIS, there are no ready-made solutions for integrating the output of these two tools in a dynamic and automated way. Our approach leverages the individual strengths of SAS and ArcGIS, as well as the report delivery infrastructure of SAS® Information Delivery Portal.
Nathan Clausen, CACI
Aaron House, CACI
Beginning with SAS® 9.2, ODS Graphics introduces a whole new way of generating graphs using SAS®. With just a few lines of code, you can create a wide variety of high-quality graphs. This paper covers the three basic ODS Graphics procedures: SGPLOT, SGPANEL, and SGSCATTER. SGPLOT produces single-celled graphs. SGPANEL produces multi-celled graphs that share common axes. SGSCATTER produces multi-celled graphs that might use different axes. This paper shows how to use each of these procedures in order to produce different types of graphs, how to send your graphs to different ODS destinations, how to access individual graphs, and how to specify properties of graphs, such as format, name, height, and width.
Lora Delwiche, University of California, Davis
Susan Slaughter, Avocet Solutions
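To give a flavor of how little code is needed, here is a minimal SGPLOT sketch using SASHELP.CLASS; the sizing options and labels are illustrative assumptions:

```sas
/* Control graph properties such as size and image name */
ods graphics on / width=5in height=3in imagename="heightweight";

proc sgplot data=sashelp.class;
   scatter x=height y=weight / group=sex;  /* grouped scatter */
   reg x=height y=weight;                  /* overlaid fit line */
   xaxis label="Height (inches)";
   yaxis label="Weight (pounds)";
run;
```

The same data could be paneled by SEX with PROC SGPANEL by adding a PANELBY statement, which is the main difference between the two procedures.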
Have you ever needed additional data that was only accessible via a web service in XML or JSON? In some situations, the web service is set up to only accept parameter values that return data for a single observation. To get the data for multiple values, we need to iteratively pass the parameter values to the web service in order to build the necessary data set. This paper shows how to combine the SAS® hash object with the FILEVAR= option to iteratively pass a parameter value to a web service and input the resulting JSON or XML formatted data.
John Vickery, North Carolina State University
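The FILEVAR= half of the pattern can be sketched as follows: each observation supplies a parameter value, the URL is built on the fly, and the INFILE statement reopens the connection for every new value of the file variable. This is a minimal sketch (the hash object lookup is omitted), and the endpoint, data set, and variable names are hypothetical:

```sas
/* One row per parameter value in WORK.IDS (variable ID, hypothetical) */
data raw_json;
   set ids;
   length url $256 line $32767;
   url = cats('http://example.com/api/item?id=', id);  /* hypothetical endpoint */
   /* DUMMY is a placeholder fileref; URL is the access method;
      FILEVAR= switches the input file whenever URL changes */
   infile dummy url filevar=url end=done;
   do until (done);
      input line $char32767.;
      output;      /* one observation per line of the JSON/XML response */
   end;
run;
```

The collected response lines can then be parsed, for example with an XML map or JSON parsing logic, to build the final analysis data set.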
SAS® provides some powerful, flexible tools for creating reports, like PROC REPORT and PROC TABULATE. With the advent of the Output Delivery System (ODS), you have almost total control over how the output from those procedures looks. But there are still times when you need (or want) just a little more, and that's where the Report Writing Interface (RWI) can help. The RWI is just a fancy way of saying that you are using the ODSOUT object in a DATA step. This object enables you to lay out the page, create tables, embed images, add titles and footnotes, and more, all from within a DATA step, using whatever DATA step logic you need. Also, all the style capabilities of ODS are available to you so that the output created by your DATA step can have fonts, sizes, colors, backgrounds, and borders that make your report look just like you want. This presentation quickly covers some of the basics of using the ODSOUT object and then walks through some of the techniques to create four real-world examples. Who knows, you might even go home and replace some of your PROC REPORT code. I know I have!
Pete Lund, Looking Glass Analytics
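The basic ODSOUT pattern can be sketched in a few lines: instantiate the object once, then call its table and row methods as the DATA step iterates. A minimal sketch using SASHELP.CLASS; the output file name is an illustrative assumption:

```sas
ods html file='rwi_demo.html';

data _null_;
   set sashelp.class end=last;
   if _n_ = 1 then do;
      declare odsout obj();   /* instantiate the RWI object once */
      obj.table_start();
   end;
   obj.row_start();
   obj.format_cell(data: name);   /* one cell per method call */
   obj.format_cell(data: age);
   obj.row_end();
   if last then obj.table_end();
run;

ods html close;
```

Because ordinary DATA step logic surrounds the method calls, conditional formatting, custom page layout, and embedded images all become straightforward.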
When first presented with SAS® Enterprise Guide®, many existing SAS® programmers don't know where to begin. They want to understand, 'What's in it for me?' if they switch over. These longtime users of SAS are accustomed to typing all of their code into the Program Editor window and clicking Submit. This beginning tutorial introduces SAS Enterprise Guide 6.1 to old and new users of SAS who need to code. It points out advantages and tips that demonstrate why a user should be excited about the switch. This tutorial focuses on the key points of a session involving coding and introduces new features. It covers the top three items for a user to consider when switching over to a server-based environment. Attendees will return to the office with a new motivation and confidence to start coding with SAS Enterprise Guide.
Andy Ravenna, SAS
No matter how long you've been programming in SAS®, using and manipulating dates still seems to require effort. Learn all about SAS dates, the different ways they can be presented, and how to make them useful. This paper includes excellent examples for dealing with raw input dates, functions to manage dates, and outputting SAS dates into other formats. Included is all the date information you will need: date and time functions, informats, formats, and arithmetic operations.
Jenine Milum, Equifax Inc.
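A small sketch of the kinds of date handling the paper covers; the variable names are illustrative:

```sas
data dates;
   raw       = '15MAR2024'd;           /* date literal: a SAS date value */
   today     = date();                 /* current date                   */
   firstofmo = intnx('month', raw, 0); /* snap to first day of the month */
   age_days  = today - raw;            /* date arithmetic is in days     */
   format raw firstofmo date9. today yymmdd10.;
   put raw= firstofmo= today= age_days=;
run;
```

The same value can be displayed many ways simply by changing the format (DATE9., YYMMDD10., WORDDATE., and so on) without touching the underlying number.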
The DATA step has served SAS® programmers well over the years, and although it is powerful, it has not fundamentally changed. With DS2, SAS has introduced a significant alternative to the DATA step by introducing an object-oriented programming environment. In this paper, we share our experiences with getting started with DS2 and learning to use it to access, manage, and share data in a scalable, threaded, and standards-based way.
Peter Eberhardt, Fernwood Consulting Group Inc.
Xue Yao, Winnipeg Regional Health Authority
Controlled vocabularies define a common set of concepts that retain their meaning across contexts, supporting consistent use of terms to annotate, integrate, retrieve, and interpret information. Controlled vocabularies are large hierarchical structures that cannot be represented using typical SAS® practices (e.g., SAS format statements and hash objects). This paper compares and contrasts three models for representing hierarchical structures using SAS data sets: adjacency list, path enumeration, and nested set (Celko, 2004; Mackey, 2002). Specific controlled vocabularies include a university organizational structure and several biological vocabularies (MeSH, NCBI Taxonomy, and GO). The paper presents data models and SAS code for populating tables and performing queries. The paper concludes with a discussion of implications for data warehouse implementation and future work related to efficiency of update and delete operations.
Glenn Colby, University of Colorado Boulder
Traditional SAS® programs typically consist of a series of SAS DATA steps, which refine input data sets until the final data set or report is reached. SAS DATA steps do not run in-database. However, SAS® Enterprise Guide® users can replicate this kind of iterative programming and have the resulting process flow run in-database by linking a series of SAS Enterprise Guide Query Builder tasks that output SAS views pointing at data that resides in a Teradata database, right up to the last Query Builder task, which generates the final data set or report. This session both explains and demonstrates this functionality.
Frank Capobianco, Teradata
Being flexible and highlighting important details in your output is critical. The use of ODS ESCAPECHAR allows the SAS® programmer to insert inline formatting functions into variable values through the DATA step, and it makes for a quick and easy way to highlight specific data values or modify the style of the table cells in your output. What is an easier and more efficient way to concatenate those inline formatting functions to the variable values? This paper shows how the CAT functions can simplify this task.
Yanhong Liu, Cincinnati Children's Hospital Medical Center
Justin Bates, Cincinnati Children's Hospital Medical Center
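The core idea can be sketched in a few lines: define an escape character, then use a CAT function to wrap the value in an inline style function. A minimal sketch using SASHELP.CLASS; the threshold and style attributes are illustrative assumptions:

```sas
ods escapechar='~';

data highlight;
   set sashelp.class;
   length wt_disp $80;
   /* CATS concatenates the inline style function around the value;
      the styling renders in ODS destinations such as HTML or PDF */
   if weight > 120 then
      wt_disp = cats('~{style [color=red fontweight=bold]', weight, '~}');
   else
      wt_disp = cats(weight);
run;

proc print data=highlight noobs;
   var name wt_disp;
run;
```

CATS trims and concatenates in one call (converting the numeric WEIGHT automatically), which is considerably tidier than chained TRIM(LEFT(PUT(...))) expressions.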
Traditional merchandise planning processes have been primarily product and location focused, with decisions about assortment selection, breadth and depth, and distribution based on the historical performance of merchandise in stores. However, retailers are recognizing that in order to compete and succeed in an increasingly complex marketplace, assortments must become customer-centric. Advanced analytics can be leveraged to generate actionable insights into the relevance of merchandise to a retailer's various customer segments and purchase channel preferences. These insights enrich the merchandise and assortment planning process. This paper describes techniques for using advanced analytics to impact customer-centric assortments. Topics covered include approaches for scoring merchandise based on customer relevance and preferences, techniques for gaining insight into customer relevance without customer data, and an overall approach to a customer-driven merchandise planning process.
Christopher Matz, SAS
U.S. educators face a critical new imperative: to prepare all students for work and civic roles in a globalized environment in which success increasingly requires the ability to compete, connect, and cooperate on an international scale. The Asia Society and the Longview Foundation are collaborating on a project to show both the need for and supply of globally competent graduates. This presentation shows you how SAS assisted these organizations with a solution that leverages SAS® visualization technologies in order to produce a heatmap application. The application surfaces data from over 300 indicators, more than a quarter million data points in all, in a highly interactive heatmap. The application features a drillable map that shows data at the state level as well as at the county level for all 50 states. This endeavor involves new SAS® 9.4 technology to both combine the data and to create the interface. You'll see how SAS procedures, such as PROC JSON, which came out in SAS 9.4, were used to prepare the data for the web application. The user interface demonstrates how SAS/GRAPH® output can be combined with popular JavaScript frameworks like Dojo and Twitter Bootstrap to create an HTML5 application that works on desktop, mobile, and tablet devices.
Jim Bauer, SAS
The capabilities of SAS® have been extended by the use of macros and custom formats. SAS macro code libraries and custom format libraries can be stored in various locations, some of which may or may not always be easily and efficiently accessed from other operating environments. Code can be in various states of development, ranging from global organization-wide approved libraries to very elementary just-getting-started code. Formalized yet flexible file structures for storing code are needed. SAS user environments range from standalone systems such as PC SAS or SAS on a server/mainframe to much more complex installations using multiple platforms. Strictest attention must be paid to (1) file location for macros and formats and (2) management of the lack of cross-platform portability of formats. Macros are relatively easy to run from their native locations. This paper covers methods of doing this with emphasis on: (a) the SASAUTOS option to define the location and the search order for identifying macros being called, and (b) even more importantly, the little-known SAS option MAUTOLOCDISPLAY to identify in the SAS log the location of the macro actually called. Format libraries are more difficult to manage: a format catalog cannot be used under a different operating system than the one in which it was created. This paper will discuss the exporting, copying, and importing of format libraries to provide cross-platform capability. A SAS macro used to identify the source of a format being used will be presented.
Roger Muller, Data-To-Events, Inc.
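The two options the paper emphasizes can be sketched in a few lines; the library paths and macro name below are hypothetical:

```sas
/* Define the autocall search order: project macros first, then
   site macros, then the SAS-supplied autocall library */
options sasautos=('/project/macros' '/site/macros' sasautos);

/* Write the source location of each autocall macro to the SAS log */
options mautolocdisplay;

/* Hypothetical macro call: the log now shows which file it came from */
%mymacro(arg=1)
```

With MAUTOLOCDISPLAY in effect, the log note identifies exactly which copy of a macro was executed, which is invaluable when the same macro name exists in more than one library.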
Bootstrapped Decision Tree is a variable selection method used to identify and eliminate unintelligent variables from a large number of initial candidate variables. Candidates for subsequent modeling are identified by selecting variables consistently appearing at the top of decision trees created using a random sample of all possible modeling variables. The technique is best used to reduce hundreds of potential fields to a short list of 30 to 50 fields to be used in developing a model. This method for variable selection has recently become available in JMP® under the name BootstrapForest; this paper presents an implementation in Base SAS®9. The method does accept but does not require a specific outcome to be modeled and will therefore work for nearly any type of model, including segmentation, MCMC, and multiple discrete choice, in addition to standard logistic regression. Keywords: Bootstrapped Decision Tree, Variable Selection
David Corliss, Magnify Analytic Solutions
While survey researchers make great attempts to standardize their questionnaires, including the usage of ratings scales, in order to collect unbiased data, respondents are still prone to introducing their own interpretation and bias to their responses. This bias can potentially affect the understanding of commonly investigated drivers of customer satisfaction and limit the quality of the recommendations made to management. One such problem is scale use heterogeneity, in which respondents do not employ a panoramic view of the entire scale range as provided, but instead focus on parts of the scale in giving their responses. Studies have found that bias arising from this phenomenon was especially prevalent in multinational research, e.g., respondents of some cultures being inclined to use only the neutral points of the scale. Moreover, personal variability in response tendencies further complicates the issue for researchers. This paper describes an implementation that uses a Bayesian hierarchical model to capture the distribution of heterogeneity while incorporating the information present in the data. More specifically, SAS® PROC MCMC is used to carry out a comprehensive modeling strategy of ratings data that accounts for individual-level scale usage. Key takeaways include an assessment of differences between key driver analyses that ignore this phenomenon versus the one that results from our implementation. Managerial implications are also emphasized in light of the prevalent use of more simplistic approaches.
Jorge Alejandro, Market Probe
Sharon Kim, Market Probe
Most programmers are familiar with the directive 'Know your data.' But not everyone knows about all the data and metadata that a SAS® data set holds or understands what to do with this information. This presentation talks about the majority of these attributes, how to obtain them, why they are important, and what you can do with them. For example, data sets that have been around for a while might have an inordinate number of deleted observations that you are carrying around unnecessarily. Or you might be able to quickly check to determine whether the data set is indexed and if so, by what variables in order to increase your program's performance. Also, engine-dependent data such as owner name and file size is found in PROC CONTENTS output, which is useful for understanding and managing your data. You can also use ODS output in order to use the values of these many attributes programmatically. This presentation shows you how.
Diane Olson, SAS
This poster shows the audience step-by-step how to connect to a database without registering the connection in either the Windows ODBC Administrator tool or in the Windows Registry database. This poster also shows how the connection can be more flexible and better managed by building it into a SAS® macro.
Jesper Michelsen, Nykredit
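A DSN-less connection of this kind can be sketched as a macro that carries the full connection string, so nothing needs to be registered in the ODBC Administrator or the registry. This is a minimal sketch; the driver name, server, and schema are illustrative assumptions that depend on the site's database:

```sas
/* DSN-less ODBC connection wrapped in a macro for flexibility */
%macro db_connect(server=, database=, libref=db);
   libname &libref odbc noprompt=
      "Driver={SQL Server};Server=&server;Database=&database;Trusted_Connection=yes;"
      schema=dbo;
%mend db_connect;

/* Hypothetical usage: connection details become simple parameters */
%db_connect(server=myserver, database=sales)
```

Centralizing the connection string in one macro also means that a server move or driver change is a one-line edit rather than a hunt through every program.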
Have you ever asked, 'Why doesn't my PDF output look just like my HTML output?' This paper explains the power and differences of each destination. You'll learn how each destination works and understand why the output looks the way it does. Learn tips and tricks for how to modify your SAS® code to make each destination look more like the other. The tips span from beginner to advanced in all areas of reporting. Each destination is like a superhero, helping you transform your reports to meet all your needs. Learn how to use each ODS destination to the fullest extent of its powers.
Scott Huntley, SAS
Cynthia Zender, SAS
PD_Calibrate is a macro that standardizes the calibration of our predictive credit-scoring models at Nykredit. The macro is activated with an input data set, variables, anchor point, specification of method, number of buckets, kink-value, and so on. The output consists of graphs, HTML, and two data sets containing key values for the model being calibrated and values for the use of graphics.
Keld Asnæs, Nykredit a/s
Jesper Michelsen, Nykredit
PROC TABULATE is a powerful tool for creating tabular summary reports. Its advantages, over PROC REPORT, are that it requires less code, allows for more convenient table construction, and uses syntax that makes it easier to modify a table's structure. However, its inability to compute the sum, difference, product, and ratio of column sums has hindered its use in many circumstances. This paper illustrates and discusses some creative approaches and methods for overcoming these limitations, enabling users to produce needed reports and still enjoy the simplicity and convenience of PROC TABULATE. These methods and skills can have prominent applications in a variety of business intelligence and analytics fields.
Justin Jia, Canadian Imperial Bank of Commerce (CIBC)
Amanda Lin, Bell Canada
A time-consuming part of statistical analysis is building an analytic data set for statistical procedures. Whether it is recoding input values, transforming variables, or combining data from multiple data sources, the work to create an analytic data set can take time. The DS2 programming language in SAS® 9.4 simplifies and speeds data preparation with user-defined methods, storing methods and attributes in shareable packages, and threaded execution on multi-core SMP and MPP machines. Come see how DS2 makes your job easier.
Jason Secosky, SAS
Robert Ray, SAS
Greg Otto, Teradata Corporation
Graphs in oncology studies are essential for gaining more insight into the clinical data. This presentation demonstrates how ODS Graphics can be effectively and easily used to create graphs used in oncology studies. We discuss some examples and illustrate how to create plots like drug concentration versus time plots, waterfall charts, comparative survival plots, and other graphs using Graph Template Language and ODS Graphics procedures. These can be easily incorporated into a clinical report.
Debpriya Sarkar, SAS
The SQL procedure contains many powerful and elegant language features for intermediate and advanced SQL users. This presentation discusses topics that will help SAS® users unlock the many powerful features, options, and other gems found in the SQL universe. Topics include CASE logic; a sampling of summary (statistical) functions; dictionary tables; PROC SQL and the SAS macro language interface; joins and join algorithms; PROC SQL statement options _METHOD, MAGIC=101, MAGIC=102, and MAGIC=103; and key performance (optimization) issues.
Kirk Paul Lafler, Software Intelligence Corporation
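Two of the gems mentioned, dictionary tables and CASE logic, can be sketched briefly; the examples below use SASHELP data and are illustrative only:

```sas
proc sql;
   /* Dictionary tables: metadata about every data set in a library */
   select memname, nobs
   from dictionary.tables
   where libname = 'SASHELP' and memtype = 'DATA';

   /* CASE logic with a summary function and CALCULATED reuse */
   select case when age < 13 then 'child'
               else 'teen'
          end as agegroup,
          count(*) as n
   from sashelp.class
   group by calculated agegroup;
quit;
```

The CALCULATED keyword lets the GROUP BY clause reference the CASE expression by its alias instead of repeating the whole expression.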
An increase in sea levels is a potential problem that is affecting the human race and marine ecosystem. Many models are being developed to find out the factors that are responsible for it. In this research, the Memory-Based Reasoning model looks more effective than most other models. This is because this model takes the previous solutions and predicts the solutions for forthcoming cases. The data was collected from NASA. The data contains 1,072 observations and 10 variables such as emissions of carbon dioxide, temperature, and other contributing factors like electric power consumption, total number of industries established, and so on. Results of Memory-Based Reasoning models like RD tree, scan tree, neural networks, decision tree, and logistic regression are compared. Fit statistics, such as misclassification rate and average squared error, are used to evaluate the model performance. This analysis is used to predict the rise in sea levels in the near future and to take the necessary actions to protect the environment from global warming and natural disasters.
Prasanna K S Sailaja Bhamidi, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
The steady expansion of electronic health records (EHR) over the past decade has increased the use of observational healthcare data for analysis. One of the challenges with EHR data is to combine information from different domains (diagnosis, procedures, drugs, adverse events, labs, quality of life scores, and so on) onto a single timeline to get a longitudinal view of the patient. This enables the physician or researcher to visualize a patient's health profile, thereby revealing anomalies, trends, and responses graphically, thus empowering them to treat more effectively. This paper attempts to provide a composite view of a patient by using SAS® Graph Template Language to create a profile graph using the following data elements: key event dates, drugs, adverse events, and Quality of Life (QoL) scores. For visualization, the GTL graph uses X and X2 axes for dates, vertical reference lines to represent key dates (for example, when the disease is first diagnosed), horizontal bar plots for duration of drugs taken and adverse events reported, and a series plot at the bottom to show the QoL score.
Radhikha Myneni, SAS
Eric Brinsfield, SAS
Sparse data sets are common in applications of text and data mining, social network analysis, and recommendation systems. In SAS® software, sparse data sets are usually stored in the coordinate list (COO) transactional format. Two major drawbacks are associated with this sparse data representation: First, most SAS procedures are designed to handle dense data and cannot consume data that are stored transactionally. In that case, the options for analysis are significantly limited. Second, a sparse data set in transactional format is hard to store and process in distributed systems. Most techniques require that all transactions for a particular object be kept together; this assumption is violated when the transactions of that object are distributed to different nodes of the grid. This paper presents some different ideas about how to package all transactions of an object into a single row. Approaches include storing the sparse matrix densely, doing variable selection, doing variable extraction, and compressing the transactions into a few text variables by using Base64 encoding. These simple but effective techniques enable you to store and process your sparse data in better ways. This paper demonstrates how to use SAS® Text Miner procedures to process sparse data sets and generate output data sets that are easy to store and can be readily processed by traditional SAS modeling procedures. The output of the system can be safely stored and distributed in any grid environment.
Zheng Zhao, SAS
Russell Albright, SAS
James Cox, SAS
In this session, you learn how Kaiser Permanente has taken a centralized production support approach to using SAS® Enterprise Guide® 4.3 in the healthcare industry. Kaiser Permanente Northwest (KPNW) has designed standardized processes and procedures that have allowed KPNW to streamline the support of production content, which enabled KPNW analytical resources to focus more on new content development rather than on maintenance and support of steady state programs and processes. We started with over 200 individual SAS® processes across four different SAS platforms (SAS Enterprise Guide, mainframe SAS®, PC SAS®, and SAS® Data Integration Studio) and worked to standardize our development approach on SAS Enterprise Guide and to build efficient and scalable processes within our department and across the region. We walk through the need for change and how the team was set up, provide an overview of the UNIX SAS platform, walk through the standard production requirements (developer pack), and review lessons learned.
Ryan Henderson, Kaiser Permanente
Karl Petith, Kaiser Permanente
In a good clinical study, statisticians and various stakeholders are interested in assessing and isolating the effect of non-study drugs. One common practice in clinical trials is that clinical investigators follow the protocol to taper certain concomitant medications in an attempt to prevent or resolve adverse reactions and/or to minimize the number of subject withdrawals due to lack of efficacy or adverse events. Assessing the impact of tapering those medications during the study is of high interest to clinical scientists and the study statistician. This paper presents the challenges and caveats of assessing the impact of tapering a certain type of concomitant medications using SAS® 9.3 based on a hypothetical case. The paper also presents the advantages of visual graphs in facilitating communications between clinical scientists and the study statistician.
Iuliana Barbalau, Santen Inc.
Chen Shi, Santen Inc.
Yang Yang, Santen Inc.
The ZIP access method is new with SAS® 9.4. This paper provides several examples of reading from and writing to ZIP files using this access method, including the use of the DATA step directory management macros and the new MEMVAR= option.
Rick Langston, SAS
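The basic pattern can be sketched in a few lines: a FILENAME statement with the ZIP access method and the MEMVAR= companion option MEMBER= names a member inside the archive, and ordinary DATA step I/O does the rest. The archive path below is an illustrative assumption:

```sas
/* Write a text member into a ZIP file (SAS 9.4 ZIP access method) */
filename out zip '/tmp/archive.zip' member='hello.txt';

data _null_;
   file out;
   put 'Hello from the ZIP access method';
run;

/* Read the same member back */
filename in zip '/tmp/archive.zip' member='hello.txt';

data _null_;
   infile in;
   input;
   put _infile_;
run;
```

Omitting MEMBER= and listing the fileref with a directory-style INFILE lets you enumerate the members of an archive, which pairs naturally with the directory management macros the paper covers.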
The Department of Market Monitoring (DMM) at California ISO is responsible for promoting a robust, competitive, and nondiscriminatory electric power market in California by keeping a close watch on the efficiency and effectiveness of the ancillary service, congestion management, and real-time spot markets. We monitor the potential of market participants to exercise undue market power, the behavior of market participants that is consistent with attempts to exercise market power, and the market performance that results from the interaction of market structure with participant behavior. In order to perform monitoring activities effectively, DMM collects available data and designs and implements reporting dashboards that track key market metrics. We are using various SAS® BI tools to develop and employ metrics and analytic tools applicable to market structure, participant behavior, and market performance. This paper provides details about the effective use of various SAS BI tools to implement automated real-time market monitoring functionality.
Amol Deshmukh, California ISO Corp.
Jeff McDonald, California ISO Corp.
Healthcare expenditure growth continues to be a prominent healthcare policy issue, and the uncertain impact of the Affordable Care Act (ACA) has put increased pressure on payers to find ways to exercise control over costs. Fueled by provider performance analytics, BCBSNC has developed innovative strategies that recognize, reward, and assist providers delivering high-quality and efficient care. A leading strategy has been the introduction of a new tiered network product called Blue Select, which was launched in 2013 and will be featured in the State Health Exchange. Blue Select is a PPO with differential member cost-sharing for tiered providers. Tier status of providers is determined by comparing providers to their peers on the efficiency and quality of the care they deliver. Providers who meet or exceed the standard for quality, measured using Healthcare Effectiveness Data and Information Set (HEDIS) adherence rates and potentially avoidable complication rates for certain procedures, are then evaluated on their case-mix adjusted costs for total episodic costs. Each practice's performance is compared, through indirect standardization, to expected performance, given the patients and conditions treated within a practice. A ratio of observed to expected performance is calculated for both cost and quality to use in determining the tier status of providers for Blue Select. While the primary goal of provider tiering is cost containment through member steerage, the initiative has also resulted in new and strengthened collaborative relationships between BCBSNC and providers. The strategy offers the opportunity to bend the cost curve and provide meaningful change in the quality of healthcare delivery.
Stephanie Poley, BCBSNC
Are you wondering what is causing your valuable machine asset to fail? What could those drivers be, and what is the likelihood of failure? Do you want to be proactive rather than reactive? Answers to these questions have arrived with SAS® Predictive Asset Maintenance. The solution provides an analytical framework to reduce the amount of unscheduled downtime and optimize maintenance cycles and costs. An all new (R&D-based) version of this offering is now available. Key aspects of this paper include: a discussion of the key business drivers for, and capabilities of, SAS Predictive Asset Maintenance; a detailed analysis of the solution, covering the data model, explorations, data selections, Path I (analysis workbench: maintenance analysis and stability monitoring), and Path II (analysis workbench: JMP®, SAS® Enterprise Guide®, and SAS® Enterprise Miner™); analytical case development using SAS Enterprise Miner, SAS® Model Manager, and SAS® Data Integration Studio; and the SAS Predictive Asset Maintenance Portlet for reports. A realistic business example in the oil and gas industry is used.
George Habek, SAS
Due to XML's growing role in data interchange, it is increasingly important for SAS® programmers to become proficient with SAS technologies and techniques for creating and consuming XML. The current work expands on a SAS® Global Forum 2013 presentation that dealt with these topics, providing additional examples of using XML maps to read and write XML files and using the Output Delivery System (ODS) to create custom tagsets for generating XML.
Chris Schacherer, Clinical Data Management Systems, LLC
Traditionally, Java web applications interact with back-end databases by means of JDBC/ODBC connections to retrieve and update data. With the growing need for real-time charting and complex analysis types of data representation on these types of web applications, SAS® computing power can be put to use by adding a SAS web service layer between the application and the database. This paper shows how a SAS web service layer can be used to render data to a Java application in a summarized form using SAS® Stored Processes. This paper also demonstrates how inputs can be passed to a SAS Stored Process based on which computations/summarizations are made before output parameter and/or output data streams are returned to the Java application. SAS Stored Processes are then deployed as SAS® BI Web Services using SAS® Management Console, which are available to the Java application as a URL. We use the SOAP method to interact with the web services. XML data representation is used as a communication medium. We then illustrate how RESTful web services can be used with JSON objects being the communication medium between the Java application and SAS in SAS® 9.3. Once this pipeline communication between the application, SAS engine, and database is set up, any complex manipulation or analysis as supported by SAS can be incorporated into the SAS Stored Process. We then illustrate how graphs and charts can be passed as outputs to the application.
Neetha Sindhu, Kavi Associates
Hari Hara Sudhan, Kavi Associates
Mingming Wang, Kavi Associates
Statistical mediation analysis is common in business, social sciences, epidemiology, and related fields because it explains how and why two variables are related. For example, mediation analysis is used to investigate how product presentation affects liking the product, which then affects the purchase of the product. Mediation analysis evaluates the mechanism by which a health intervention changes norms that then change health behavior. Research on mediation analysis methods is an active area of research. Some recent research in statistical mediation analysis focuses on extracting accurate information from small samples by using Bayesian methods. The Bayesian framework offers an intuitive solution to mediation analysis with small samples; namely, incorporating prior information into the analysis when there is existing knowledge about the expected magnitude of mediation effects. Using diffuse prior distributions with no prior knowledge allows researchers to reason in terms of probability rather than in terms of (or in addition to) statistical power. Using SAS® PROC MCMC, researchers can choose one of two simple and effective methods to incorporate their prior knowledge into the statistical analysis, and can obtain the posterior probabilities for quantities of interest such as the mediated effect. This project presents four examples of using PROC MCMC to analyze a single mediator model with real data using: (1) diffuse prior information for each regression coefficient in the model, (2) informative prior distributions for each regression coefficient, (3) diffuse prior distribution for the covariance matrix of variables in the model, and (4) informative prior distribution for the covariance matrix.
Milica Miočević, Arizona State University
David MacKinnon, Arizona State University
To get the full benefit from PROC REPORT, the savvy programmer needs to master ACROSS usage and the COMPUTE block. Understanding timing issues and absolute column references can unlock the power of PROC REPORT. This presentation shows how to make the most of ACROSS usage with PROC REPORT. Use PROC REPORT instead of multiple TRANSPOSE steps. Find out how to use character variables with ACROSS. Learn how to control the column headings for ACROSS usage items. Learn how to use aliases. Find out how to perform row-wise trafficlighting and trafficlighting based on multiple conditions.
Cynthia Zender, SAS
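As a minimal sketch of the ACROSS idea this abstract describes (using the SASHELP.CLASS sample data shipped with SAS; the labels are illustrative), a single PROC REPORT step can replace a TRANSPOSE-plus-report sequence:

```sas
/* One column of mean heights per SEX value, without a PROC TRANSPOSE step */
proc report data=sashelp.class nowd;
    column age sex,height;
    define age    / group 'Age';
    define sex    / across 'Mean Height by Sex';
    define height / mean format=5.1 ' ';
run;
```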
The scatter plot is a basic tool for examining the relationship between two variables. While the basic plot is good, enhancements can make it better. In addition, there might be problems of overplotting. In this paper, I cover ways to create basic and enhanced scatter plots and to deal with overplotting.
Peter Flom, Peter Flom Consulting
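One common remedy for overplotting, sketched here against the SASHELP.CARS sample data (variable choices are illustrative), is to draw semi-transparent markers with PROC SGPLOT so that dense regions appear darker:

```sas
/* Semi-transparent markers let overlapping points show as darker regions */
proc sgplot data=sashelp.cars;
    scatter x=horsepower y=mpg_city / transparency=0.7;
run;
```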
Business analysts commonly use Microsoft Excel with the SAS® System to answer difficult business questions. While you can use these applications independently of each other to obtain the information you need, you can also combine the power of those applications, using the SAS Output Delivery System (ODS) tagsets, to completely automate the process. This combination delivers a more efficient process that enables you to create fully functional and highly customized Excel worksheets within SAS. This paper starts by discussing common questions and problems that SAS Technical Support receives from users when they try to generate Excel worksheets. The discussion continues with methods for automating Excel worksheets using ODS tagsets and customizing your worksheets using the CSS style engine and extended tagsets. In addition, the paper discusses tips and techniques for moving from the current MSOffice2K and ExcelXP tagsets to the new Excel destination, which generates output in the native Excel 2010 format.
Chevell Parker, SAS
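A minimal sketch of the tagset approach the paper discusses (the file name and tagset options shown are illustrative):

```sas
/* Route procedure output to an Excel-readable XML workbook via the ExcelXP tagset */
ods tagsets.excelxp file='class.xml'
    options(sheet_name='Class' autofilter='all');

proc print data=sashelp.class noobs;
run;

ods tagsets.excelxp close;
```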
Big data is all the rage these days, with the proliferation of data-accumulating electronic gadgets and instrumentation. At the heart of big data analytics is the MapReduce programming model. As a framework for distributed computing, MapReduce uses a divide-and-conquer approach to allow large-scale parallel processing of massive data. As the name suggests, the model consists of a Map function, which first splits data into key-value pairs, and a Reduce function, which then carries out the final processing of the mapper outputs. It is not hard to see how these functions can be simulated with the SAS® hash object technique and, indeed, implemented in the new SAS® DS2 language. This paper demonstrates how hash object programming can handle data in a MapReduce fashion and shows some potential applications in physics, chemistry, biology, and finance.
Joseph Hinson, Accenture Life Sciences
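As a rough sketch of the idea (the data set WORK.MAPPED and its KEY and AMOUNT variables are hypothetical mapper output), a DATA step hash object can play the reducer, aggregating values by key:

```sas
/* "Reduce" mapper key-value pairs: sum AMOUNT within each KEY */
data _null_;
    if _n_ = 1 then do;
        declare hash h();
        h.defineKey('key');
        h.defineData('key', 'total');
        h.defineDone();
    end;
    set work.mapped end=last;
    if h.find() ne 0 then total = 0;   /* first time this key is seen */
    total + amount;                    /* accumulate the value */
    h.replace();
    if last then h.output(dataset: 'work.reduced');
run;
```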
One beautiful graph provides visual clarity of data summaries reported in tables and listings. Waterfall graphs show, at a glance, the increase or decrease of data analysis results from various industries. The introduction of SAS® 9.2 ODS Statistical Graphics enables SAS® programmers to produce high-quality results with less coding effort. Also, SAS programmers can create sophisticated graphs in stylish custom layouts using the SAS® 9.3 Graph Template Language and ODS style template. This poster presents two sets of example waterfall graphs in the setting of clinical trials using SAS® 9.3 and later. The first example displays colorful graphs using new SAS 9.3 options. The second example displays simple graphs with gray-scale color coding and patterns. SAS programmers of all skill levels can create these graphs on UNIX or Windows.
Setsuko Chiba, Exelixis Inc.
Systematic reviews have become increasingly important in healthcare, particularly when there is a need to compare new treatment options and to justify clinical effectiveness versus cost. This paper describes a method in SAS/STAT® 9.2 for computing weighted averages and weighted standard deviations of clinical variables across treatment options while correctly using these summary measures to make accurate statistical inferences. The analysis of data from systematic reviews typically involves computing weighted averages and comparing them across treatment groups. However, the TTEST procedure does not currently take weighted standard deviations into account when computing p-values, and using a default unweighted standard deviation can lead to incorrect statistical inference. This paper introduces a method for computing correct p-values from weighted averages and weighted standard deviations. Given a data set containing variables for three treatment options, we want to make pairwise comparisons of the three independent treatments. This is done by creating two temporary data sets using PROC MEANS, which yields the weighted means and weighted standard deviations. We then perform a t-test on each temporary data set. The resultant data sets containing all comparisons of each treatment option are merged and then transposed to obtain the necessary statistics. The resulting output provides pairwise comparisons of each treatment option and uses the weighted standard deviations to yield the correct p-values in the desired format. This method enables the correct use of weighted standard deviations with PROC MEANS and PROC TTEST when summarizing data from a systematic review, while providing correct p-values.
Ravi Gaddameedi, California State University
Usha Kreaden, Intuitive Surgical
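A sketch of the general PROC MEANS-to-PROC TTEST pattern the paper builds on, for a single two-group comparison (the data set TRIALS, response OUTCOME, group TRT, and weight W are assumptions; which N to carry into the summary data set is a methodological choice the paper addresses — the unweighted count is used here only for illustration):

```sas
/* Weighted summary statistics per group */
proc means data=trials noprint;
    class trt;
    var outcome;
    weight w;
    output out=wstats n=n mean=mean std=std;
run;

/* Reshape to the _STAT_ summary layout that PROC TTEST accepts as input */
data summ;
    set wstats;
    where _type_ = 1;
    length _stat_ $8;
    _stat_ = 'N';    outcome = n;    output;
    _stat_ = 'MEAN'; outcome = mean; output;
    _stat_ = 'STD';  outcome = std;  output;
    keep trt _stat_ outcome;
run;

proc ttest data=summ;
    class trt;
    var outcome;
run;
```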
Westat utilizes SAS® software as a core capability for providing clients in government and private industry with analysis and characterization of survey data. Staff programmers, analysts, and statisticians use SAS to manage, store, and analyze client data, as well as to produce tabulations, reports, graphs, and summary statistics. Because SAS is so widely used at Westat, the organization has built a comprehensive infrastructure to support its deployment and use. This paper provides an overview of Westat's SAS support infrastructure, which supplies resources that are aimed at educating staff, strengthening their SAS skills, providing SAS technical support, and keeping the staff on the cutting edge of SAS programming techniques.
Michael Raithel, Westat
In a clinical study, multiple hypotheses are often set up to make the most of the cost of obtaining the study results. However, the multiplicity problem arises immediately when these hypotheses are tested in a univariate manner. Several widely applied methods for controlling the overall Type I error rate are discussed in this paper. In addition to the methodology, we introduce its application in a study case and provide the SAS® code.
Lixiang Yao, ICON
For decades, SAS® has been the cornerstone of many organizations for business reporting. In more recent times, the ability to quickly determine the performance of an organization through the use of dashboards has become a requirement. Different ways of providing dashboard capabilities are discussed in this paper: using out-of-the-box solutions such as SAS® Visual Analytics and SAS® BI Dashboard, through to alternative solutions using SAS® Stored Processes, batch processes, and SAS® Integration Technologies. Extending the available indicators is also discussed, using Graph Template Language and KPI indicators provided with Base SAS®, as well as alternatives such as Google Charts and Flash objects. Real-world field experience, problem areas, solutions, and tips are shared, along with live examples of some of the different methods.
Mark Bodt, The Knowledge Warehouse (Knoware)
This new SAS® tool is a two-dimensional color chart for visualizing changes in a population or in a system over time. Data for one point in time appear as a thin horizontal band of color. Bands for successive periods are stacked up to make a two-dimensional plot, with the vertical direction showing changes over time. As a system evolves over time, different kinds of events have different characteristic patterns. Creation of Time Contour plots is explained step-by-step. Examples are given in astrostatistics, biostatistics, econometrics, and demographics.
David Corliss, Magnify Analytic Solutions
For a longtime Base SAS® programmer, whether to use a different application for programming is a constant question when powerful applications such as SAS® Enterprise Guide® are available. This paper provides some important tips for a programmer, such as the best way to use the code window and how to take advantage of system-generated code in SAS Enterprise Guide 5.1. This paper also explains the differences between some of the functions and procedures in Base SAS and SAS Enterprise Guide. It highlights features in SAS Enterprise Guide such as process flow, data access management, and report automation, including formatting using XML tag sets.
Anjan Matlapudi, AmerihealthCaritas
This paper gives you a better idea of how and where to use record lookup functions to locate observations where a variable has some characteristic. Various related functions for searching numeric and character values are illustrated in the process, and code is shown with time comparisons. I discuss three possible ways to retrieve records: the SAS® DATA step, PROC SQL, and Perl regular expressions. Real and CPU processing times are highlighted when comparing these retrieval methods. Although the programs were written for the PC using SAS® 9.2 in a Windows XP 32-bit environment, all the functions are applicable to any system, and all the tools discussed are in Base SAS®. The typical attendee or reader will have some experience with SAS, but not a lot of experience dealing with large amounts of data.
Anjan Matlapudi, AmerihealthCaritas
Two new production features offered in the Output Delivery System (ODS) in SAS® 9.4 are ODS LAYOUT and the ODS Report Writing Interface. This one-two punch gives you power and flexibility in structuring your SAS® output. What are the strengths for each? How do they differ? How do they interact? This paper highlights the similarities and differences between the two and illustrates the advantages of using them together. Why go twelve rounds? Make your report a knockout with ODS LAYOUT and the Report Writing Interface.
Daniel Kummer, SAS
Do you often create SAS® web applications? Do you need to update or retrieve values from a SAS data set and display them in a browser? Do you need to show the results of a SAS® Stored Process in a browser? Are you finding it difficult to figure out how to pass parameters from a web page to a SAS Stored Process? If you answered yes to any of these questions, then look no further. Techniques shown in this paper include: How to take advantage of JavaScript and minimize PUT statements. How to call a SAS Stored Process from your web page by using JavaScript and XMLHTTPRequest. How to pass parameters from a web page to a SAS Stored Process and from a SAS Stored Process back to the web page. How to use simple Ajax to refresh and update a specific part of a web page without the need to reload the entire page. How to apply Cascading Style Sheets (CSS) on your web page. How to use some of the latest HTML5 features, like drag and drop. How to display run-time graphs in your web page by using STATGRAPH and PROC SGRENDER. This paper contains sample code that demonstrates each of the techniques.
Yogendra Joshi, SAS
The independent means t-test is commonly used for testing the equality of two population means. However, this test is very sensitive to violations of the population normality and homogeneity of variance assumptions. In such situations, Yuen's (1974) trimmed t-test is recommended as a robust alternative. The purpose of this paper is to provide a SAS® macro that allows easy computation of Yuen's symmetric trimmed t-test. The macro output includes a table with trimmed means for each of two groups, Winsorized variance estimates, degrees of freedom, and the obtained value of t (with its two-tailed p-value). In addition, the results of a simulation study are presented, providing empirical comparisons of the Type I error rates and statistical power of the independent samples t-test, Satterthwaite's approximate t-test, and the trimmed t-test when the assumptions of normality and homogeneity of variance are violated.
Patricia Rodriguez de Gil, University of South Florida
Anh P. Kellermann, University of South Florida
Diep T. Nguyen, University of South Florida
Eun Sook Kim, University of South Florida
Jeffrey D. Kromrey, University of South Florida
As SAS® professionals, we often wish our clients would make more use of the many excellent SAS tools at their disposal. However, it remains an indisputable fact that for many business users, Microsoft Excel is still their go-to application when it comes to carrying out any form of data analysis. There have been many attempts to integrate SAS and Excel, but none of these has up to now been entirely seamless. This paper addresses that problem by showing how, with a minimum of VBA (Visual Basic for Applications) code and by using the SAS Integrated Object Model (IOM) together with Microsoft's ActiveX Data Objects (ADO), we can create an Excel User Defined Function (UDF) that can accept parameters, carry out all data manipulations in SAS, and return the result to the spreadsheet in a way that is completely invisible to the user. Users can nest or link these functions together just as if they were native Excel functions. We then go on to demonstrate how, using the same techniques, we can create small Excel applications that can perform sophisticated data analyses in SAS while not forcing users out of their Excel comfort zones.
Chris Brooks, Melrose Analytics Ltd
You have built the simple bar chart and mastered the art of layering multiple plot statements to create complex graphs like the Survival Plot using the SGPLOT procedure. You know all about how to use plot statements creatively to get what you need and how to customize the axes to achieve the look and feel you want. Now it's time to up your game and step into the realm of the Graphics Wizard. Behold the magical powers of Graph Template Language Layouts! Here you will learn the esoteric art of creating complex multi-cell graphs using LAYOUT LATTICE. This is the incantation that gives you the power to build complex, multi-cell graphs like the Forest plot, Stock plots with multiple indicators like MACD and Stochastics, Adverse Events by Relative Risk graphs, and more. If you ever wondered how the Diagnostics panel in the REG procedure was built, this paper is for you. Be warned, this is not the realm for the faint of heart!
Sanjay Matange, SAS
When deploying SAS® code into a production environment, a programmer should ensure that the code satisfies the following key criteria: The code runs without errors. The code performs operations consistent with the agreed upon business logic. The code is not dependent on manual human intervention. The code performs necessary checks in order to provide sufficient quality control of the deployment process. Base SAS® programming offers a wide range of techniques to support the last two aforementioned criteria. This presentation demonstrates the use of SAS® macro variables in combination with simple macro programs to perform a number of routine automated tasks that are often part of the production-ready code. Some of the examples to be demonstrated include the following topics: How to check that required key parameters for a successful program run are populated in the parameters file. How to automatically copy the content of the permanent folder to the newly created backup folder. How to automatically update the log file with new run information. How to check whether a data set already exists in the library.
Elena Shtern, SAS
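For instance, a data set existence check of the kind the abstract lists can be written as a small macro (the macro name REQUIRE_DS and the data set checked are illustrative):

```sas
/* Halt a production run early if a required input data set is absent */
%macro require_ds(ds);
    %if not %sysfunc(exist(&ds)) %then %do;
        %put ERROR: Required data set &ds was not found. Stopping.;
        %abort cancel;
    %end;
%mend require_ds;

%require_ds(work.parameters)
```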
This session demonstrates how to use Base SAS® tools to add functional, reusable extensions to the SAS® system. Learn how to do the following: Write user-defined macro functions that can be used inline with any other SAS code. Use PROC FCMP to write and store user-defined functions that can be used in other SAS programs. Write DS2 user-defined methods and store them in packages for easy reuse in subsequent DS2 programs.
Mark Jordan, SAS
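A minimal PROC FCMP sketch of the second technique, storing a user-defined function for reuse (the function name and output library are illustrative):

```sas
/* Define and store a reusable function, then call it like a built-in */
proc fcmp outlib=work.funcs.demo;
    function f_to_c(f);
        return ((f - 32) * 5 / 9);
    endsub;
run;

options cmplib=work.funcs;   /* tell SAS where to find stored functions */

data _null_;
    c = f_to_c(212);
    put c=;                  /* 100 */
run;
```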
Have you found OS file permissions to be insufficient to tailor access controls to meet your SAS® data security requirements? Have you found metadata permissions on tables useful for restricting access to SAS data, but then discovered that SAS programmers can avoid the permissions by issuing LIBNAME statements that do not use the metadata? Would you like to ensure that users have access to only particular rows or columns in SAS data sets, no matter how they access the SAS data sets? Metadata-bound libraries provide the ability to authorize access to SAS data by authenticated Metadata User and Group identities that cannot be bypassed by SAS programmers who attempt to avoid the metadata with direct LIBNAME statements. They also provide the ability to limit the rows and columns in SAS data sets that an authenticated user is allowed to see. The authorization decision is made in the bowels of the SAS® I/O system, where it cannot be avoided when data is accessed. Metadata-bound libraries were first implemented in the second maintenance release of SAS® 9.3 and were enhanced in SAS® 9.4. This paper overviews the feature and discusses best practices for administering libraries bound to metadata and user experiences with bound data. It also discusses enhancements included in the first maintenance release of SAS 9.4.
Jack Wallace, SAS
Some 2.35 million road accident cases are recorded yearly in the U.S. Among them, 37,000 are considered fatal. Road crashes cost USD 230.6 billion per year, or an average of USD 820 per person. Our efforts are to identify the important factors that lead to vehicle collisions and to predict the injury risk involved in them. Data were collected from the National Automotive Sampling System (NASS), containing 20,247 cases with 19 variables. Input variables describe the factors involved in an accident, such as height, age, weight, gender, vehicle model year, speed limit, energy absorption in collision, and deformation location. The target variable is nominal, showing levels of injury. Missing values in interval variables were imputed using the mean, and those in class variables using the count method. Multivariate analysis suggests high correlation between tire footprint and wheelbase (Corr=0.97, P<0.0001) and between the original weight of the car and its curb weight (Corr=0.79, P<0.0001). Variables with high kurtosis values were transformed using range standardization. Variables were ranked by variable importance using decision tree analysis. Models such as multiple regression, polynomial regression, neural networks, and decision trees were applied to the data set to identify the factors most significant in predicting injury risk. A multilayer perceptron neural network came out to be the best model for predicting the injury risk index, with the least average squared error (0.086) in the validation data set.
Prateek Khare, Oklahoma State University
Vandana Reddy, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
This paper discusses the techniques I used at the Census Bureau to overcome the issue of dealing with large amounts of data while modernizing some of their public-facing web applications by using service oriented architecture (SOA) to deploy Flex web applications powered by SAS®. The paper covers techniques that resulted in reducing 142,293 XML lines (3.6 MB) down to 15,813 XML lines (1.8 MB), a 50% size reduction on the server side (HTTP Response), and 196,167 observations down to 283 observations, a reduction of 99.8% in summarized data on the client side (XML Lookup file).
Ahmed Al-Attar, AnA Data Warehousing Consulting, LLC
When reading data files or writing SAS® programs, we are often hunting for the right format or informat. There are so many to choose from! Does it seem like too many to search the manual? Let SAS help find the right one! We use the SAS dictionary table VFORMAT and a very small SAS program. This presentation demonstrates how two simple functions unlock the potential of this great resource: SASHELP.VFORMAT.
Peter Crawford, Crawford Software Consultancy Limited
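The lookup itself can be as small as one PROC SQL query against the dictionary view (the search pattern shown is illustrative):

```sas
/* List formats and informats whose names contain DATE */
proc sql;
    select fmtname, fmttype
    from sashelp.vformat
    where upcase(fmtname) like '%DATE%';
quit;
```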
In connection with the consolidation work at Nykredit, the data stored on the Nykredit z/OS SAS® installation had to be migrated (copied) to the new x64 Windows SAS platform storage. However, getting an overview of these data on the z/OS mainframe can be difficult, and a series of questions arises during the process. For example: Who is responsible? How many bytes? How many rows and columns? When were the data created? And so on. With extensive use of FILENAME FTP, looping, and metadata extraction, it is possible to get an overview of the data on the host, presented in a Microsoft Excel spreadsheet.
Jesper Michelsen, Nykredit
Do you know everything you need to know about missing values? Do you know how to assign a missing value to multiple variables with one statement? Can you display missing values as something other than . or blank? How many types of missing numeric values are there? This paper reviews techniques for assigning, displaying, referencing, and summarizing missing values for numeric variables and character variables.
Christopher Bost, MDRC
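A few of the techniques the abstract alludes to, in one short sketch (the variable names and data values are illustrative):

```sas
/* One CALL MISSING for several variables, a special missing value,
   and a format that prints missing as text instead of '.' */
data demo;
    length name $8;
    score = 87; name = 'Ann'; visit = 3;
    call missing(of score name visit);   /* all three set to missing at once */
    score = .A;                          /* special missing value .A */
run;

proc format;
    value scoref .A = 'Absent'
                 .  = 'Missing';
run;

proc print data=demo;
    format score scoref.;
run;
```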
SAS® has an amazing arsenal of tools for using and displaying geographic information that is relatively unknown and underutilized. This presentation highlights both new and existing capabilities for creating stunning, informative maps, as well as for using geographic data in other ways. SAS-provided map data files, functions, format libraries, and other geographic data files are explored in detail. Custom mapping of geographic areas is discussed. The maps produced use both the Annotate facility (including some new functions) and PROC GREPLAY. Products used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will benefit from this presentation.
Louise Hadden, Abt Associates Inc.