SAS Global Forum 2014 Proceedings

Nowadays, most corporations build and maintain their own data warehouse, and an ETL (Extract, Transform, and Load) process plays a critical role in managing the data. Some people might create a large program and execute this program from top to bottom. Others might generate a SAS^® driver with several programs included, and then execute this driver. If some programs can be run in parallel, then developers must write extra code to handle these concurrent processes. If one program fails, then users can either rerun the entire process or comment out the successful programs and resume the job from where the program failed. Usually the programs are deployed in production with read and execute permission only. Users do not have the privilege of modifying code on the fly. In this case, how do you comment out the programs if the job terminated abnormally? This paper illustrates an approach for managing ETL process flows. The approach uses a framework based on SAS, on a UNIX platform. This is a high-level infrastructure discussion with some explanation of the SAS code that is used to implement the framework. The framework supports the rerun or partial run of the entire process without changing any source code. It also supports concurrent processes, and therefore no extra code is needed.

Complex data manipulations can be resource intensive, both in terms of development time and processing duration. However, in recent years SAS has introduced a number of new technologies that, when used together, can produce a dramatic increase in performance while simultaneously simplifying program development and maintenance. This paper presents a development paradigm that utilizes the problem decomposition capabilities of DS2, the flexibility of SQL, and the performance benefits of in-memory storage using hash objects.

Hip fractures are a common source of morbidity and mortality among the elderly. While multiple prior studies have identified risk factors for poor outcomes, few studies have presented a validated method for stratifying patient risk. The purpose of this study was to develop a simple risk score calculator tool predictive of 30-day morbidity after hip fracture. To achieve this, we prospectively queried a database maintained by The American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) to identify all cases of hip fracture between 2005 and 2010, based on primary Current Procedural Terminology (CPT) codes. Patient demographics, comorbidities, laboratory values, and operative characteristics were compared in a univariate analysis, and a multivariate logistic regression analysis was then used to identify independent predictors of 30-day morbidity. Weighted values were assigned to each independent risk factor and were used to create predictive models of 30-day complication risk. The models were internally validated with randomly partitioned 80%/20% cohort groups. We hypothesized that significant predictors of morbidity could be identified and used in a predictive model for a simple risk score calculator. All analyses are performed via SAS^® software.

This paper shows users how they can use a SAS^® macro named %SURVEYGLM to incorporate information about survey design into Generalized Linear Models (GLM). The R function svyglm (Lumley, 2004) was used to verify the suitability of the %SURVEYGLM macro estimates. The results show that the estimates are close to those of the R function and that new distributions can be easily added to the algorithm.

Influence analysis in statistical modeling looks for observations that unduly influence the fitted model. Cook's distance is a standard tool for influence analysis in regression. It works by measuring the difference in the fitted parameters as individual observations are deleted. You can apply the same idea to examining influence of groups of observations (for example, the multiple observations for subjects in longitudinal or clustered data), but you need to adapt it to the fact that different subjects can have different numbers of observations. Such an adaptation is discussed by Zhu, Ibrahim, and Cho (2012), who generalize the subject size factor as the so-called degree of perturbation, and correspondingly generalize Cook's distances as the scaled Cook's distance. This paper presents the %SCDMixed SAS^® macro, which implements these ideas for analyzing influence in mixed models for longitudinal or clustered data. The macro calculates the degree of perturbation and scaled Cook's distance measures of Zhu et al. (2012) and presents the results with useful tabular and graphical summaries. The underlying theory is discussed, as well as some of the programming tricks useful for computing these influence measures efficiently. The macro is demonstrated using both simulated and real data to show how you can interpret its results for analyzing influence in your longitudinal modeling.

The ExcelXP tagset offers several options for controlling column widths, including Width_Points, Width_Fudge, and Absolute_Column_Width. Although Absolute_Column_Width might seem unpredictable at first, it is possible to fix the first two options so that the Absolute_Column_Width is the exact column width in pixels. This poster presents these settings and suggests how to create and manage the integer string of column widths.

The Base SAS^® 9.4 Output Delivery System (ODS) EPUB destination enables users to deliver SAS^® reports as e-books on Apple mobile devices. The first maintenance release of SAS^® 9.4 adds the ODS EPUB3 destination, which offers powerful new multimedia and presentation features to report writers. This paper shows you how to include images, audio, and video in your ODS EPUB3 e-book reports. You learn how to use publishing presentation techniques such as sidebars and multicolumn layouts. You become familiar with best practices for accessibility when employing these new features in your reports. This paper provides advanced instruction for writing e-books with ODS EPUB. Please bring your iPad, iPhone, or iPod to the presentation so that you can download and read the examples.

SAS^® 9.4 has overhauled web authentication schemes, and the integration with enterprise security infrastructure is quite different from that of SAS^® 9.3. This paper examines advanced security features such as Secure Sockets Layer (SSL) configuration, single sign-on (SSO) support through Integrated Windows authentication (IWA), and third-party security packages like CA SiteMinder and IBM Tivoli Access Manager and WebSEAL. FIPS 140-2 compliance efforts that enforce the use of a stronger encryption algorithm for web communication and the SAS^® system itself are also described. The authentication support for mobile devices such as the iPad is different. The secure Wi-Fi connection from a mobile device to the IT internal resources, as well as how it can be safely integrated into the enterprise security configuration by using the same user repository as the SAS web applications, is explained. The configuration example is shown with SAS^® Visual Analytics 6.2.

SAS^® 9.4 and SAS^® Visual Analytics support a wide list of authentication protocols such as Integrated Windows authentication (IWA), client certificate, IBM WebSEAL, CA SiteMinder, and Security Assertion Markup Language (SAML) 2.0. However, advanced customers might want to use some of these protocols together and also have the flexibility to select which protocols to use. In this paper, we focus on a fallback authentication framework that supports IWA as the primary authentication method. When IWA fails, it uses the X.509 client certificate as the secondary authentication method, and when the client certificate fails, it uses the form-based username/password as the last option. The paper first introduces the security architecture of SAS^® 9.4 and SAS Visual Analytics. It then reviews the three above-mentioned security protocols. Further, it introduces the detailed fallback authentication framework and discusses how to configure it. Finally, we discuss a customer scenario: implementing the fallback authentication framework in a customer's SAS^® 9.4 and SAS Visual Analytics environment.

This paper is an introduction to SAS^® Studio and covers how to perform basic programming tasks in SAS Studio. Many people program in the SAS^® language by using SAS Display Manager or SAS^® Enterprise Guide^®. SAS Studio is different because it enables you to write and run SAS code by using the most popular web browsers, without requiring a SAS^® 9.4 installation on your machine. With SAS Studio, you can access your data files, libraries, and existing programs, and write new programs while using SAS software behind the scenes. SAS Studio connects to a SAS server in order to process SAS programs. The SAS server can be a hosted server in a cloud environment, a server in your local environment, or a copy of SAS on your local machine.

The Kolmogorov-Smirnov (K-S) test is one of the most useful and general nonparametric methods for comparing two samples. It is sensitive to all types of differences between two populations (shift, scale, shape, and so on). In this paper, we present a thorough investigation into the K-S test, including derivation of the formal test procedure, practical demonstration of the test, large sample approximation of the test, and ease of use in SAS^® using the NPAR1WAY procedure.
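
As an illustration of the statistic behind the two-sample K-S test, the following minimal Python sketch computes D, the largest absolute gap between the two empirical distribution functions. It is not the NPAR1WAY implementation and does not compute a p-value; the function name `ks_two_sample_statistic` is chosen here for illustration only.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Return D = max |F_a(x) - F_b(x)| over all observed values x.

    F_a and F_b are the empirical distribution functions of the two samples.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every point of the pooled sample.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # share of sample A that is <= x
        f_b = bisect_right(b, x) / len(b)  # share of sample B that is <= x
        d = max(d, abs(f_a - f_b))
    return d
```

Because D depends only on the ordering of the pooled observations, it responds to shift, scale, and shape differences alike, which is the sensitivity the abstract describes.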

Big data! Hadoop! MapReduce! These are all buzzwords that you've probably already heard mentioned at SAS^® Global Forum 2014. But what exactly is MapReduce and what has it got to do with SAS^®? This talk explains how a simple processing framework (created by Google and more recently popularized by the open-source technology Hadoop) can be replicated using cornerstone SAS technologies such as Base SAS^®, SAS macros, and SAS/CONNECT^®. The talk explains how, out of the box, the SAS DATA step can replicate the MAP function. It looks at how well-established SAS procedures can be used to create reduce-like functionality. We look at how parallel processing data across multiple machines using MPCONNECT can replicate MapReduce's shared-nothing approach to data processing.
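
For readers unfamiliar with the pattern, here is a minimal single-machine Python sketch of the map, shuffle, and reduce phases, using word counting as the task. It is a generic illustration, not the SAS approach from the talk; all function names are hypothetical. In the talk's analogy, a row-by-row DATA step would play the role of `map_phase` and a summarizing procedure the role of `reduce_phase`.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one input record into (key, value) pairs, one per word."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run the three phases in sequence on an in-memory list of records."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Each call to `map_phase` touches only its own record and each call to `reduce_phase` touches only its own key, which is what allows the work to be spread across machines that share nothing.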

The proliferation of textual data in business is overwhelming. Unstructured textual data is being constantly generated via call center logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, and so on. While the amount of textual data is increasing rapidly, businesses' ability to summarize, understand, and make sense of such data for making better business decisions remains challenging. This presentation takes a quick look at how to organize and analyze textual data for extracting insightful customer intelligence from a large collection of documents and for using such information to improve business operations and performance. Multiple case studies using real data that demonstrate business applications of text analytics and sentiment mining using SAS^® Text Miner and SAS^® Sentiment Analysis Studio are presented. While SAS^® products are used as tools for demonstration only, the topics and theories covered are generic (not tool specific).

In randomized experiments, it is generally assumed that the hierarchical structures and variances are the same in the treatment and control groups. In some situations, however, these structures and variance components can differ. Consider a randomized experiment in which individuals randomized to the treatment condition are further assigned to clusters in which the intervention is administered, but no such clustering occurs in the control condition. Such a structure can occur, for example, when the individuals in the treatment condition are randomly assigned to group therapy sessions or to mathematics tutoring groups; individuals in the control condition do not receive group therapy or mathematics tutoring and therefore do not have that level of clustering. In this example, individuals in the treatment condition have a hierarchical structure, but individuals in the control condition do not. If the therapists or tutors differ in efficacy, the clustering in the treatment condition induces an extra source of variability in the data that needs to be accounted for in the analysis. We show how special features of SAS^® PROC MIXED and PROC GLIMMIX can be used to analyze data in which one or more treatment groups have a hierarchical structure that differs from that in the control group. We also discuss how to code variables in order to increase the computational efficiency for estimating parameters from these designs.

Hierarchical data are common in many fields, from pharmaceuticals to agriculture to sociology. As data sizes and sources grow, information is likely to be observed on nested units at multiple levels, calling for the multilevel modeling approach. This paper describes how to use the GLIMMIX procedure in SAS/STAT^® to analyze hierarchical data that have a wide variety of distributions. Examples are included to illustrate the flexibility that PROC GLIMMIX offers for modeling within-unit correlation, disentangling explanatory variables at different levels, and handling unbalanced data. Also discussed are enhanced weighting options, new in SAS/STAT 13.1, for both the MODEL and RANDOM statements. These weighting options enable PROC GLIMMIX to handle weights at different levels. PROC GLIMMIX uses a pseudolikelihood approach to estimate parameters, and it computes robust standard error estimators. This new feature is applied to an example of complex survey data that are collected from multistage sampling and have unequal sampling probabilities.

A central component of discussions of healthcare reform in the U.S. is the estimation of healthcare cost and use at the national or state level, as well as for subpopulation analyses for individuals with certain demographic properties or medical conditions. For example, a striking but persistent observation is that just 1% of the U.S. population accounts for more than 20% of total healthcare costs, and 5% account for almost 50% of total costs. In addition to descriptions of specific data sources underlying this type of observation, we demonstrate how to use SAS^® to generate these estimates and to extend the analysis in various ways; that is, to investigate costs for specific subpopulations. The goal is to provide SAS programmers and healthcare analysts with sufficient data-source background and analytic resources to independently conduct analyses on a wide variety of topics in healthcare research. For selected examples, such as the estimates above, we concretely show how to download the data from federal web sites, replicate published estimates, and extend the analysis. An added plus is that most of the data sources we describe are available as free downloads.

A SAS^® license of any organization consists of a variety of SAS components such as SAS/STAT^®, SAS/GRAPH^®, SAS/OR^®, and so on. SAS administrators do not have any automated tool supplied with Base SAS^® software to find how many licensed copies are being actively used, how many SAS users are actively utilizing the SAS server, and how many SAS data sets are being referenced. The answers to these questions help a SAS administrator to make important decisions such as controlling SAS licenses, removing inactive SAS users, purging long-time non-referenced SAS data sets, and so on. With the help of a system parameter that is provided by SAS and called RTRACE, these questions can be answered. The goal of this paper is to explain the setup of the RTRACE parameter and to explain its use in making the SAS administrator's life easy. This paper is based on SAS^® 9.2 running on the AIX operating system.

Many different neuroscience researchers have explored how various parts of the brain are connected, but no one has performed association mining using brain data. In this study, we used SAS^® Enterprise Miner^™ 7.1 for association mining of brain data collected by a 14-channel EEG device. An application of the association mining technique is presented in this novel context of brain activities, and our results are linked to theories of cognitive neuroscience. The brain waves were collected while a user processed information about Facebook, the most well-known social networking site. The data was cleaned using Independent Component Analysis via an open source MATLAB package. Next, by applying the LORETA algorithm, activations at every fraction of a second were recorded. The data was codified into transactions to perform association mining. Results showing how various parts of the brain get excited while processing the information are reported. This study provides preliminary insights into how brain wave data can be analyzed by widely available data mining techniques to enhance researchers' understanding of brain activation patterns.

With a growing enterprise analytics environment that comprises global users and a variety of sensitive data sources, a system administrator is faced with the challenge of knowing who logs into the system, how often, and what applications and what data sources are being consumed. This information is necessary for auditing the consumers of data as well as for monitoring the growth of data sources for hardware expansion. With the use of SAS^® Audit, Performance and Measurement Package, along with some additional middle-tier logging and SAS^® code, information about the major consumers of the environment can be loaded into LASR tables and analyzed with SAS^® Visual Analytics reporting tools.

In our previous work, we often needed to perform large numbers of repetitive and data-driven post-campaign analyses to evaluate the performance of marketing campaigns in terms of customer response. These routine tasks were usually carried out manually by using Microsoft Excel, which was tedious, time-consuming, and error-prone. In order to improve the work efficiency and analysis accuracy, we managed to automate the analysis process with SAS^® programming and replace the manual Excel work. Through the use of SAS macro programs and other advanced skills, we successfully automated the complicated data-driven analyses with high efficiency and accuracy. This paper presents and illustrates the creative analytical ideas and programming skills for developing the automatic analysis process, which can be extended to apply in a variety of business intelligence and analytics fields.

You can provide access and visibility to SAS^® BI Dashboards, SAS^® Stored Processes, and SAS^® Visual Analytics through the use of SAS^® Web Parts for Microsoft SharePoint. In many organizations, the administrators who are responsible for SharePoint and SAS^® are different. This paper provides best practices for the deployment of SAS Web Parts for Microsoft SharePoint. Bridging the gap between SharePoint and SAS is especially important for people who are not familiar with SharePoint administration. This paper also provides tips for co-existence between SAS Web Parts for Microsoft SharePoint 6.1 and 5.1. (The 5.1 release is available in SAS^® 9.3. The 6.1 release is available in SAS^® 9.4.) Finally, this paper provides some guidance on DNS, permissions, and installation techniques: the fine points that make or break your deployment!

Digital data has manifested into a classic BIG DATA challenge for marketers who want to push past the retroactive analysis limitations of traditional web analytics. The current groundswell of digital device adoption and variety of digital interactions grows larger year after year. The opportunity for 'digital intelligence' has arrived, as traditional web analytic techniques were not designed for the breadth of channels, devices, and pace that fuels consumer experiences. In parallel, today's landscape for data visualization, advanced analytics, and our ability to process very large amounts of multi-channel information is changing. The democratization of analytics for the masses is upon us, and marketers have the opportunity to take advantage of descriptive, predictive, and (most importantly) prescriptive data-driven insights. This presentation describes how organizations can use SAS^® products, specifically SAS^® Visual Analytics and SAS^® Adaptive Customer Experience, to overcome the limitations of web analytics, and support data-driven integrated marketing objectives.

The Affordable Care Act (ACA) contains provisions that have stimulated interest in analytics among health care providers, especially those provisions that address quality of outcomes. High Impact Technologies (HIT) has been addressing these issues since before passage of the ACA and has a Health Care Data Model recognized by Gartner and implemented at several health care providers. Recently, HIT acquired SAS^® Visual Analytics, and this paper reports our successful efforts to use SAS Visual Analytics for visually exploring Big Data for health care providers. Health care providers can suffer significant financial penalties for readmission rates above a certain threshold and other penalties related to quality of care. We have been able to use SAS Visual Analytics, coupled with our experience gained from implementing the HIT Health Care Data Model at a number of health care providers, to identify clinical measures that are significant predictors for readmission. As a result, we can help health care providers reduce the rate of 30-day readmissions.

Inference of variance components in linear mixed effect models (LMEs) is not always straightforward. I introduce and describe a flexible SAS^® macro (%COVTEST) that uses the likelihood ratio test (LRT) to test covariance parameters in LMEs by means of the parametric bootstrap. Users must supply the null and alternative models (as macro strings), and a data set name. The macro calculates the observed LRT statistic and then simulates data under the null model to obtain an empirical p-value. The macro also creates graphs of the distribution of the simulated LRT statistics. The program takes advantage of processing accomplished by PROC MIXED and some SAS/IML^® functions. I demonstrate the syntax and mechanics of the macro using three examples.
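
The final step of a parametric bootstrap test, turning simulated LRT statistics into an empirical p-value, can be sketched in a few lines of Python. This is an illustration only and not code from %COVTEST: the function name is hypothetical, and the add-one correction in the numerator and denominator is one common convention that is assumed here, not taken from the paper.

```python
def empirical_p_value(observed_lrt, simulated_lrts):
    """Share of null-model simulations at least as extreme as the observed LRT.

    Assumption: uses the (count + 1) / (B + 1) convention, which counts the
    observed statistic as one draw and never returns exactly zero.
    """
    if not simulated_lrts:
        raise ValueError("at least one simulated statistic is required")
    exceed = sum(1 for stat in simulated_lrts if stat >= observed_lrt)
    return (exceed + 1) / (len(simulated_lrts) + 1)
```

In a full run, each simulated statistic would come from refitting the null and alternative models to one data set generated under the null model.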

A case-control study, in its most basic form, compares a case series to a matched control series; such studies are commonly implemented in the field of public health. While matching is intended to eliminate confounding, the main potential benefit of matching in case-control studies is a gain in efficiency. There are many known methods for selecting a potential match or matches (in the case of 1:n studies) per case, the most prominent being the distance-based approach and matching on propensity scores. In this paper, we go through both approaches, compare their results, and present a macro capable of performing both.

This paper demonstrates the new case-level residuals in the CALIS procedure and how they differ from classic residuals in structural equation modeling (SEM). Residual analysis has a long history in statistical modeling for finding unusual observations in the sample data. However, in SEM, case-level residuals are considerably more difficult to define because of 1) latent variables in the analysis and 2) the multivariate nature of these models. Historically, residual analysis in SEM has been confined to residuals obtained as the difference between the sample and model-implied covariance matrices. Enhancements to the CALIS procedure in SAS/STAT^® 12.1 enable users to obtain case-level residuals as well. This enables a more complete residual and influence analysis. Several examples showing mean/covariance residuals and case-level residuals are presented.

With the growth in size and complexity of organizations investing in SAS^® platform technologies, the size and complexity of ETL subsystems and data integration (DI) jobs are growing at a rapid rate. Developers are pushed to come up with new and innovative ways to improve process efficiency in their DI jobs to meet increasingly demanding service level agreements (SLAs). The ability to conditionally execute or switch paths in a DI job is an extremely useful technique for improving process efficiency. How can a SAS^® Data Integration developer design a job to best suit conditional execution? This paper discusses a technique for providing a parameterized dynamic execution custom transformation that can be easily incorporated into SAS^® Data Integration Studio jobs to provide process path switching capabilities. The aim of any data integration task is to ensure that all sources of business data are integrated as efficiently as possible. It is concerned with the repurposing of data via transformation, should be a value-adding process, and also should be the product of collaboration. Modularization of common or repeatable processes is a fundamental part of the collaboration process in DI design and development. Switch Path, a custom transformation built to conditionally execute branches or nodes in SAS Data Integration Studio, provides a reusable module for solving the conditional execution limitations of standard SAS Data Integration Studio transformations and jobs. Switch Path logic in SAS Data Integration Studio can serve many purposes in day-to-day business needs for a SAS data integration developer, as it is completely reusable.

If you have an existing SAS^® Business Intelligence environment and you want to add SAS^® Visual Analytics, you need to make some architectural choices. SAS Visual Analytics and SAS Business Intelligence can share certain components, such as a SAS^® Metadata Server and the SAS^® Web Infrastructure Platform. Sharing metadata eliminates the need to create and maintain duplicate information, and it enables your users to take advantage of functionality that can be shared between SAS Visual Analytics and SAS Business Intelligence. Sharing the SAS Web Infrastructure Platform enables SAS middle-tier applications such as SAS^® Visual Analytics Services and SAS^® Web Report Studio to communicate with each other. Intended for SAS architects and administrators, this paper explores supported architecture for SAS Visual Analytics and SAS Business Intelligence. The paper then identifies areas where the architecture can be shared as well as where resources should be kept separate. In addition, the paper offers recommendations and other considerations to keep in mind when you are managing shared resources.

The big questions in consumer research lead to statistical methods appropriate to them. 'What do consumers say?' is all about analyzing surveys and finding relationships between preferences and background attributes. 'What do consumers think?' is about looking at higher-level structures like preference mappings that can be derived from ratings. 'What will consumers pay?' is about conducting choice experiments to pin down the way consumers trade off among features and with prices, with the willingness to pay. 'How do you trigger purchases?' is about experiments that determine which interventions work, and how to target them to potential consumers, with uplift modeling. The SAS product JMP^® version 11 was released last fall with a new group of modeling tools to address these and other questions in consumer research. Traditionally JMP has specialized in engineering tools, but consumer research is an important part of engineering, in product planning, to make sure you produce the products with the attributes consumers want.

The Census Bureau conducts the Common Core of Data surveys for the National Center for Education Statistics annually. We have written SAS^® programs to automate the database documentation. We try to avoid including hard-coded values in the programs. Thanks to a record layout spreadsheet, the analysts can quickly update the survey metadata outside the SAS programs. This paper explains how SAS can read the record layout spreadsheet to create formats on the fly. The analysts can update the values as changes occur over time without having to worry about writing correct SAS syntax. Behind the scenes, SAS is using dictionary views, macros, ODS OUTPUT, PROC TEMPLATE, PROC FORMAT, the ODS Report Writing Interface, and RTF to create the desired results. This paper uses syntax for SAS^® 9.2, written for programmers at the intermediate level.

LaTeX is a free document creation package that is often used to create journal articles. It provides the capability to create very specific formatting and to write a wide variety of formulas. Using ODS, SAS^® can write documents to a LaTeX file, which can then be compiled through LaTeX into PDF files. This paper briefly reviews the basic syntax and options to produce these files. Then, we look at how to create a new tagset to make changes to the standard ODS LaTeX templates to create the non-gridded table appearance that is typically seen in journal articles. We also explore how to write special characters and equations not otherwise available through ODS LaTeX.

Business Intelligence platforms provide a bridge between expert data analysts and decision-makers and other end-users. But what do you do when you can identify no system that meets both your needs and your budget? If you are the Consolidated Data Analysis Center in the HHS Office of Inspector General, you use SAS^® Enterprise BI Server and the SAS^® Stored Process Web Application to build your own. This presentation covers the inception, design, and implementation of the PAYment by Geographic Area (PAYGAR) system, which uses only SAS^® Enterprise BI tools, namely the SAS Stored Process Web Application, PROC GMAP, and HTML/JAVA embedded in a DATA step, to create an interactive platform for presenting and exploring data that has a geographic component. In particular, the presentation reviews how we created a system of chained stored processes to enable a user to select the data to be presented, navigate through different geographic levels, and display companion reports related to the current data and geographic selections. It also covers the creation of the HTML front-end that sits over and manages the system. Throughout, the presentation emphasizes the scalability of PAYGAR, which the SAS Stored Process Web Application facilitates.

In this new era of healthcare reform, health insurance companies have heightened their efforts to pinpoint who their customers are, what their characteristics are, what they look like today, and how this impacts business in today's and tomorrow's healthcare environment. The passing of the Healthcare Reform policies led insurance companies to focus and prioritize their projects on understanding who the members in their current population were. The goal was to provide an integrated single view of the customer that could be used for retention, increased market share, balancing population risk, improving customer relations, and providing programs to meet the members' needs. By understanding the customer, a marketing strategy could be built for each customer segment classification, as predefined by specific attributes. This paper describes how SAS^® was used to perform the analytics that were used to characterize their insured population. The high-level discussion of the project includes regression modeling, customer segmentation, variable selection, and propensity scoring using claims, enrollment, and third-party psychographic data.

Merging or joining data sets is an integral part of the data consolidation process. Within SAS^®, there are numerous methods and techniques that can be used to combine two or more data sets. We commonly think that within the DATA step the MERGE statement is the only way to join these data sets, while in fact, the MERGE is only one of numerous techniques available to us to perform this process. Each of these techniques has advantages, and some have disadvantages. The informed programmer needs to have a grasp of each of these techniques if the correct technique is to be applied. This paper covers basic merging concepts and options within the DATA step, as well as a number of techniques that go beyond the traditional MERGE statement. These include fuzzy merges, double SET statements, and the use of key indexing. The discussion will include the relative efficiencies of these techniques, especially when working with large data sets.

Cross-visit checks are a vital part of data cleaning for longitudinal studies. The nature of longitudinal studies encourages repeatedly collecting the same information. Sometimes, these variables are expected to remain static, go away, increase, or decrease over time. This presentation reviews the naïve and the better approaches to handling one-variable and two-variable consistency checks. For a single-variable check, the better approach features the new ALLCOMB function, introduced in SAS^® 9.2. For a two-variable check, the better approach uses the .first pseudo-class to flag inconsistencies. This presentation will provide you with the tools to enhance your longitudinal data cleaning process.

Having data that are consistent, reliable, and well linked is one of the biggest challenges faced by financial institutions. The paper describes how the SAS^® Data Management offering helps to connect people, processes, and technology to deliver consistent results for data sourcing and analytics teams, and minimizes the cost and time involved in the development life cycle. The paper concludes with best practices learned from various enterprise data initiatives.

New innovative, analytical techniques are necessary to extract patterns in big data that have temporal and geo-spatial attributes. An approach to this problem is required when geo-spatial time series data sets, which have billions of rows and the precision of exact latitude and longitude data, make it extremely difficult to locate patterns of interest. The usual temporal bins of years, months, days, hours, and minutes often do not allow the analyst to have control of the precision necessary to find patterns of interest. Geohashing is a string representation of two-dimensional geometric coordinates. Time hashing is a similar representation, which maps time to preserve all temporal aspects of the date and time of the data into a one-dimensional set of data points. Geohashing and time hashing are both forms of a Z-order curve, which maps multidimensional data into single dimensions and preserves the locality of the data points. This paper explores the use of a multidimensional Z-order curve, combining both geohashing and time hashing, that is known as geo-temporal hashing or space-time boxes using SAS^®. This technique provides a foundation for reducing the data into bins that can yield new methods for pattern discovery and detection in big data.
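
The Z-order idea can be illustrated with a minimal Python sketch. This is not the paper's SAS implementation: the function names, the 8-bit resolution, and the caller-supplied time range are assumptions made for illustration. Each coordinate is quantized to an integer cell index, and the bits of the indices are interleaved into a single key.

```python
def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)  # clamp so value == hi stays in range


def interleave(indices, bits):
    """Interleave the bits of several cell indices into one Z-order key."""
    key = 0
    for bit in range(bits - 1, -1, -1):  # most significant bit first
        for idx in indices:
            key = (key << 1) | ((idx >> bit) & 1)
    return key


def space_time_key(lat, lon, t, t_min, t_max, bits=8):
    """Hypothetical geo-temporal key: Z-order over (lon, lat, time)."""
    cells = (
        quantize(lon, -180.0, 180.0, bits),
        quantize(lat, -90.0, 90.0, bits),
        quantize(t, t_min, t_max, bits),
    )
    return interleave(cells, bits)
```

In this sketch, dropping the three lowest bits of a key gives the key of the enclosing box at the next coarser resolution, which is one way the analyst could control bin precision.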

This paper outlines the techniques that I have used with my clients over the last five years to build powerful applications that run from a web browser. The user interface is presented using HTML and JavaScript, which is generated by SAS^® Stored Processes. A JavaScript framework called Ext JS is used to build components such as tables and graphs, which have a lot of functionality built in. A range of SAS^® macros are used for building HTML and JavaScript, so the generation of the user interface is simplified. This technique has been used to create a medical monitoring system, the UK Census MIS, and a bank's risk management application. I also discuss some techniques involved with integrating a system like this with SAS^® Portal, cubes, and web reports.

Particle swarm optimization is a heuristic global optimization method that was proposed by James Kennedy and Russell C. Eberhart in 1995. The purpose of this paper is to develop code for particle swarm optimization in SAS^® 9.2.
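
For readers unfamiliar with the method, the following is a minimal Python sketch of a particle swarm minimizer, not the SAS code developed in the paper. It assumes box constraints and uses an inertia weight `w`, a commonly used later refinement of the 1995 formulation. The parameter defaults and the function name `pso` are illustrative assumptions.

```python
import random


def pso(objective, bounds, n_particles=30, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

A call such as `pso(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 2)` searches for the minimum of a two-dimensional sphere function inside the given box.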

APP is an unofficial collective abbreviation for the SAS^® functions ADDR, PEEK, PEEKC, the CALL POKE routine, and their so-called LONG 64-bit counterparts: the SAS tools designed to directly read from and write to physical memory in the DATA step. APP functions have long been a SAS dark horse. First, the examples of APP usage in SAS documentation amount to a few technical report tidbits intended for mainframe system programming, with nary a hint how the functions can be used for data management programming. Second, the documentation note on the CALL POKE routine is so intimidating in tone that many potentially receptive folks might decide to avoid the allegedly precarious route altogether. However, little can stand in the way of an inquisitive SAS programmer daring to take a close look, and it turns out that APP functions are very simple and useful tools! They can be used to explore how things really work, to make code more concise, to implement en masse data movement, and they can often dramatically improve execution efficiency. The author and many other SAS experts (notably Peter Crawford, Koen Vyverman, Richard DeVenezia, Toby Dunn, and the fellow masked by his 'Puddin' Man' sobriquet) have been poking around the SAS APP realm on SAS-L and in their own practices since 1998, occasionally letting the SAS community at large peek at their findings. This opus is an attempt to circumscribe the results in a systematic manner. Welcome to the APP world! You are in for a few glorious surprises.

The health-care industry in the United States is going through a paradigm shift, moving away from its focus on treating diseases and toward promoting health, wellness, and preventive public health programs, so that both the individuals and the government can maintain a healthy bottom line. The high-level business problem is to reduce the expected medical costs and number of medical services required by the people of New Hampshire by implementing successful disease prevention programs. The objective is to identify which among the six prevention programs will successfully improve the health of the residents of New Hampshire over nine future years (2012-2020). The business scenario of the case is to identify the preventive programs that are most effective in reducing the costs in New Hampshire and to invest the money in those programs so that the overall health-care overhead costs can be reduced or controlled. The effectiveness of implementing the preventive programs was evaluated using SAS^® Enterprise Guide^® 5.1 and SAS^® Enterprise Miner^™ 12. Time series analysis, in particular, forecasting, is used to project the future health-care services and costs for the years from 2012 to 2020. Our analysis showed that all the preventive programs should be implemented concurrently. The minimum anticipated savings in cost is approximately $572,111 or 3.3% of the expected baseline cost of $17,297,931. Therefore, our recommendation is to use this cost reduction figure, $572,111, as the initial funding investment toward initiating the six prevention programs concurrently, so that tangible results can be noticed by 2020.

Vistaprint saw the opportunity in the printing market to get more out of high-volume printing by grouping similar orders in large groups. They heavily rely on technology to handle design, printing, and order handling and use the Internet as a medium. With their successful expansion across the world, the issue they were facing was a lot of one-time buyers and a lot of registered users who didn't finish the check-out. The need to implement a retention strategy was the next logical step, for which they chose SAS^® Campaign Management. In this session, Vistaprint explains how they use campaign management for retention and how the project was addressed. They will also touch on how the concept of high performance could open up new possibilities for them.

With the introduction of new features in SAS^® 9.4 Grid Manager, administrators of SAS solutions have even better capabilities for effectively managing the use of SAS^® Enterprise Guide^® in a grid environment. In this paper, we explain and demonstrate proven practices for configuring the SAS 9.4 Grid Manager environment, leveraging grid options sets and grid-spawned SAS^® Workspace Servers. We walk through the options provided by SAS Enterprise Guide that make the most effective use of the grid environment.

Literature suggests two main approaches, parametric and non-parametric, for constructing efficiency frontiers on which efficiency scores of other units can be based. Parametric functions can be either deterministic or stochastic in nature. However, when multiple inputs and outputs are encountered, Data Envelopment Analysis (DEA), a non-parametric approach, is a powerful tool used for decades in measurement of productivity/efficiency with a wide range of applications. Both approaches have advantages and limitations. This paper attempts to further explore and validate a hybrid approach, taking the best of both the DEA and the parametric approach, in order to estimate efficiency of Decision Making Units (DMUs) in an even better way.

Typically, it takes a system administrator to understand the graphic data results that are generated in the Microsoft Windows Performance Monitor. However, using SAS/GRAPH^® software, you can customize performance results in such a way that makes the data easier to read and understand than the data that appears in the default performance monitor graphs. This paper uses a SAS^® data set that contains a subset of the most common performance counters to show how SAS programmers can create an improved, easily understood view of the key performance counters by using SAS/GRAPH software. This improved view can help your organization reduce resource bottlenecks on systems that range from large servers to small workstations. The paper begins with a concise explanation of how to collect data with Windows Performance Monitor. Next, examples are used to illustrate the following topics in detail: converting and formatting a subset of the performance-monitor data into a data set; using a SAS program to generate clearly labeled graphs that summarize performance results; and analyzing results in different combinations that illustrate common resource bottlenecks.

Pipeline parallelism, an extension of MP Connect, is an effective way to speed processing. Piping allows the typical programming sequence of DATA step followed by PROC to execute in parallel. Piping uses TCP ports to pass records directly from the DATA step to the PROC immediately as each individual record is processed. The DATA step in effect becomes a data transformation filter for the PROC, running in parallel and incurring no additional disk storage or related I/O lag. Establishing a pipe with MP Connect typically requires specifying a physical TCP port to be used by the writing and by the reading processes. Coding in this style opens the possibility for users to generate system conflicts by inadvertently requesting ports that are in use. SAS^® Metadata Server allows one to allocate ports dynamically; that is, users can use a symbolic name for the port with the server dynamically determining an unused port to temporarily assign to the SAS^® job. While this capability is attractive, implementing SAS Metadata Server on a system which does not use any of the other SAS BI technology can be inefficient from a cost perspective. To enable dynamic port allocation without the added cost, we created a UNIX script which can be called from within SAS to ascertain which ports are available at runtime. The script returns a list of available ports which is captured in a SAS macro variable and subsequently used in establishing pipeline parallelism.
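
The paper's UNIX script is not reproduced here. As a stand-in, the following minimal Python sketch shows one common way to discover unused TCP ports at runtime: binding to port 0 so that the operating system chooses a free port. The function name `find_free_ports` and the loopback host are assumptions for illustration.

```python
import socket


def find_free_ports(count=1, host="127.0.0.1"):
    """Return `count` distinct TCP ports that were unused at call time."""
    sockets, ports = [], []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind((host, 0))      # port 0: let the OS pick an unused port
            sockets.append(s)      # keep it open so the next pick differs
            ports.append(s.getsockname()[1])
    finally:
        for s in sockets:
            s.close()              # release the ports for the real job
    return ports
```

A port returned this way is only known to have been free at the moment of the check; another process could claim it before the pipe is established, so a caller might retry with a fresh port if the connection fails.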

Given a time series data set, you can use automatic time series modeling software to select an appropriate time series model. You can use various statistics to judge how well each candidate model fits the data (in-sample). Likewise, you can use various statistics to select an appropriate model from a list of candidate models (in-sample or out-of-sample or both). Finally, you can use rolling simulations to evaluate ex-ante forecast performance over several forecast origins. This paper demonstrates how you can use SAS^® Forecast Server Procedures and SAS^® Forecast Studio software to perform the statistical analyses that are related to rolling simulations.

Potential of One, Power of All. That has a really nice ring to it, especially as it pertains to accessing all of your corporate data through one single data access point. It means the potential of having a single source for all of your data connections from throughout the enterprise. It also means that the complexities of connecting to these data assets from the various source systems throughout the enterprise are hidden from the end user. With this, however, comes the possibility of placing personally identifiable information in the hands of a user who should not have access to it. The bottom line is that there is risk and uncertainty with allowing users to have access to data that is disallowed by your existing data governance strategy. Blocking these data elements from specific users or groups of users is a challenge that many corporations face today, whether it is secure financial information, confidential personnel records, or personal medical information protected by strict regulations. How do you surface All necessary data to All necessary users, while at the same time maintaining the security of the data? SAS^® Federation Server Manager is an easy-to-use interface that allows the data administrator to manage your data assets in such a way that it alleviates this risk by controlling access to critical data elements and maintaining the proper level of data disclosure control. This session focuses on how to employ various data access control strategies from within SAS Federation Server Manager.

When you want to know the details about a small subset of a much larger data set, it can take a long time to select the records you need. This paper shows you how to create a user-defined SAS^® format to pull only the observations that you want out of a big data source. Even when selecting a million records out of data sets that can have more than 100 million records, this method is much quicker than either a PROC SQL join or a SAS merge.

Each month, our project team delivers updated 5-Star ratings for 15,700+ nursing homes across the United States to Centers for Medicare and Medicaid Services. There is a wealth of data (and processing) behind the ratings, and this data is longitudinal in nature. A prior paper in this series, 'Programming the Provider Previews: Extreme SAS^® Reporting,' discussed one aspect of the processing involved in maintaining the Nursing Home Compare website. This paper will discuss two other aspects of our processing: creating an annual data Compendium and extending the 5-star processing to accommodate several different output formats for different purposes. Products used include Base SAS^®, SAS/STAT^®, ODS Graphics procedures, and SAS/GRAPH^®. New annotate facilities in both SAS/GRAPH and the ODS Graphics procedures will be discussed. This paper and presentation will be of most interest to SAS programmers with medium to advanced SAS skills.

In this interconnected world, it is becoming ever more important to understand not just details about your data, but also how different parts of your data are related to each other. From social networks to supply chains to text analytics, network analysis is becoming a critical requirement and network visualization is one of the best ways to understand the results. The new SAS^® Visual Analytics network visualization shows links between related nodes as well as additional attributes such as color, size, or labels. This paper explains the basic concepts of networks as well as provides detailed background information on how to use network visualizations within SAS Visual Analytics.

PROC TABULATE is the most widely used reporting tool in SAS^®, along with PROC REPORT. Any kind of report with the desired statistics can be produced by PROC TABULATE. When we need to report some summary statistics like mean, median, and range in the heading, either we have to edit it outside SAS in word processing software or enter it manually. In this paper, we discuss how we can automate this to be dynamic by using PROC SQL and some simple macros.

This paper shares our experience integrating two leading data analytics and Geographic Information Systems (GIS) software products, SAS^® and ArcGIS, to provide integrated reporting capabilities. SAS is a powerful tool for data manipulation and statistical analysis. ArcGIS is a powerful tool for analyzing data spatially and presenting complex cartographic representations. Combining statistical data analytics and GIS provides increased insight into data and allows for new and creative ways of visualizing the results. Although products exist to facilitate the sharing of data between SAS and ArcGIS, there are no ready-made solutions for integrating the output of these two tools in a dynamic and automated way. Our approach leverages the individual strengths of SAS and ArcGIS, as well as the report delivery infrastructure of SAS^® Information Delivery Portal.

With the ever-increasing proliferation of disparate complex data being collected and stored, it has never been more important that this information is accurate, clean, integrated, and oftentimes in compliance with an expanding set of government regulations. This means that the data must be cleaned and standardized, duplicates must be identified and removed, and the individual data must be able to be joined or merged together in some way. However, it is often the case that this data does not have the same variables or values to make this possible with a simple Join or Merge. To that end, one has to employ fuzzy logic or fuzzy matching. Simply put, fuzzy matching is the implementation of algorithmic processes (fuzzy logic) to determine the similarity between elements of data such as business names, people names, or address information. Fuzzy logic is used to predict the probability of data with non-exact matches to help in data cleansing, deduplication, or matching of disparate data sets. This paper shows the basics of using fuzzy logic by using SAS^® functions, COMPLEV, multiple variables matches, and a modified Porter stemming algorithm.
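
The COMPLEV function returns the Levenshtein edit distance between two strings. The following minimal Python sketch shows that distance and one way it could drive a match decision. The helper `is_fuzzy_match`, the normalization steps, and the 0.25 threshold are illustrative assumptions, not the paper's rules.

```python
def levenshtein(a, b):
    """Levenshtein edit distance: insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a                      # keep the working row short
    prev = list(range(len(b) + 1))       # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def is_fuzzy_match(a, b, max_ratio=0.25):
    """Hypothetical rule: match when distance is small relative to length."""
    a, b = a.strip().upper(), b.strip().upper()
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return levenshtein(a, b) / longest <= max_ratio
```

Scaling the distance by the longer string's length keeps a single-character difference from counting equally in a short code and a long business name; the right threshold depends on the data and should be calibrated against known matches.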

It is not uncommon to find models with random components like location, clinic, teacher, etc., not just the single error term we think of in ordinary regression. This paper uses several examples to illustrate the underlying ideas. In addition, the response variable might be Poisson or binary rather than normal, thus taking us into the realm of generalized linear mixed models. These too will be illustrated with examples.

Healthcare services data on products and services come in different shapes and forms. Data cleaning, characterization, massaging, and transformation are essential precursors to any statistical model-building efforts. In addition, data size, quality, and distribution influence model selection, model life cycle, and the ease with which business insights are extracted from data. Analysts need to examine data characteristics and determine the right data transformation and methods of analysis for valid interpretation of results. In this presentation, we demonstrate the common data distribution types for a typical healthcare services industry such as Cardinal Health and their salient features. In addition, we use Base SAS^® and SAS/STAT^® for data transformation of both the response (Y) and the explanatory (X) variables in four combinations [RR (Y and X as raw data), TR (only Y transformed), RT (only X transformed), and TT (Y and X transformed)], and we examine the practical significance of interpreting linear, logistic, and completely randomized design model results using the original and the transformed data values for decision-making processes. The reality of dealing with diverse forms of data, the ramification of data transformation, and the challenge of interpreting model results of transformed data are discussed. Our analysis showed that the magnitude of data variability is an overriding factor to the success of data transformation and the subsequent tasks of model building and interpretation of model parameters. Although data transformation provided some benefits, it complicated analysis and subsequent interpretation of model results.

Recent studies suggest that unstructured data, such as customer comments or feedback, can enhance the power of existing predictive models. SAS^® Text Miner can generate singular value decomposition (SVD) units from text documents, which is a vectorial representation of terms in documents. These SVDs, when used as additional inputs along with the existing structured input variables, often prove to capture the response better. However, SVD units are sort of black box variables and are not easy to interpret or explain. This is a big hindrance to winning over the decision makers in the organizations to incorporate these derived textual data components in the models. In this paper, we demonstrate a new and powerful feature in SAS^® Text Miner 12.1 that helps in explaining the SVDs or the text cluster components. We discuss two important methods that are useful for interpreting them. For this purpose, we used data from a television network company that has transcripts of its call center notes from three prior calls of each customer. We are able to extract the key terms from the call center notes in the form of Boolean rules, which have contributed to the prediction of customer churn. These rules provide an intuitive sense of which set of terms, when occurring in either the presence or absence of another set of terms in the call center notes, might lead to a churn. It also provides insights into which customers are at a bigger risk of churning from the company's services and, more importantly, why.

This presentation is for users who are familiar with SAS^® Enterprise Guide^® but might not be aware of the many useful new features added in versions 4.2 and beyond. For example, SAS Enterprise Guide allows you to: Format your SAS^® source code to make it easier to read. Easily schedule a project to run at a given time. Work with OLAP data in your enterprise. We will overview these and other features to help you become even more productive using this powerful application.

This paper illustrates a permutation method for implementing multiple comparisons on Pearson's chi-square test for an R×C contingency table, using the SAS^® FREQ procedure and a newly developed SAS macro called CHISQ_MC. This method is analogous to the Tukey-type multiple comparison method for one-way analysis of variance.

SAS^® solutions are tightly integrated with the scheduling capabilities provided by SAS^® Grid Manager and Platform Suite for SAS^®. Many organizations require that their corporate scheduler be used to control SAS processing within the enterprise. Historically this has been a laborious process, requiring duplication of job and flow information using manual forms and cumbersome change management. This paper provides proven techniques and methods that enable tight integration between the corporate scheduler and SAS without the administrative overhead. Platform Suite for SAS can be used to create flows which are then executed by the corporate scheduler. The business unit can tweak the flow without reference to the enterprise scheduling team. The approaches discussed are: using the corporate scheduler to trigger SAS flows and respond to flow return codes, to restart a SAS flow that has exited due to error conditions, and to enable and disable LSF queues, allowing jobs that have been queued up to run within a time window that is managed on external dependencies rather than time; how to configure your SAS environment to leverage the provided capabilities; and real-world use cases to highlight the features and benefits of this approach. The contents of this paper are of interest to SAS administrators and IT personnel responsible for enterprise scheduling. Full code and deployment instructions will be made available.

Despite its popularity in recent years, .NET development has yet to enjoy the quality, level, and depth of statistical support that has always been provided by SAS^®. And yet, many .NET applications could benefit greatly from the power of SAS and, likewise, some SAS applications could benefit from friendly graphical user interfaces (GUIs) supported by Microsoft's .NET Framework. What the author sets out to do here is to 1) outline the basic mechanics of automating SAS with .NET, 2) provide a framework and specific strategies for maintaining parallelism between the two platforms at runtime, and 3) sketch out some simple applications that provide an exciting combination of powerful SAS analytics and highly accessible GUIs. The mechanics of automating SAS with .NET will be covered briefly. Attendees will learn the required objects and methods needed to pass information between the two platforms. The attendees will learn some strategies for organizing their projects and for writing SAS code that lends itself to automation. This will include embedding SAS scripts within a .NET project and managing communications between the two platforms. Specifically, the log and listing output will be captured and handled by .NET, and user actions will be interpreted and sent to the SAS engine. Example applications used throughout the session include a tool that converts between SAS variable types through simple drag-and-drop and an application that analyzes the growth of the user's computer hard drive.

The soaring number of publicly available data sets across disciplines has allowed for increased access to real-life data for use in both research and educational settings. These data often leverage cost-effective complex sampling designs including stratification and clustering, which allow for increased efficiency in survey data collection and analyses. Weighting becomes a necessary component in these survey data in order to properly calculate variance estimates and arrive at sound inferences through statistical analysis. Generally speaking, these weights are included with the variables provided in the public use data, though an explanation for how and when to use these weights is often lacking. This paper presents an analysis using the California Health Interview Survey to compare weighted and non-weighted results using SAS^® PROC LOGISTIC and PROC SURVEYLOGISTIC.

This paper provides a set of ideas about design elements of SAS^® macros. This paper is a checklist for programmers who write or test macros.

Effective graphs are indispensable for modern statistical analysis. They reveal tendencies that are not readily apparent in simple tables and add visual clarity to reports. My client is a big graph fan; he always shows me a lot of high-quality and complex sample graphs that were created by other software and asks me, "Can SAS^® duplicate these outputs?" Often, by leveraging the capabilities of the ODS Graph Template Language and the SGRENDER procedure, the answer is "Yes." Graph Template Language offers SAS users a more direct approach to customize the output and to overlay graphs in different levels. This paper uses cases drawn from a real work situation to demonstrate how to get the seemingly unattainable results with the power of Graph Template Language: utilizing bubble plots as your distribution density bars; creating refreshing-looking linear regression graphics with the slope information in the legend; and overlaying different plots together to create sophisticated analytical bottleneck test output.

If you have been programming SAS^® for years, you have probably made Display Manager your own: customized window layout, program text colors, bookmarks, and abbreviations/keyboard macros. Now you are using SAS^® Enterprise Guide^®. Did you know you can have almost all the same modifications you had in Base SAS^® in SAS Enterprise Guide, plus more?

How do you compare group responses when the data are unbalanced or when covariates come into play? Simple averages will not do, but LS-means are just the ticket. Central to postfitting analysis in SAS/STAT^® linear modeling procedures, LS-means generalize the simple average for unbalanced data and complicated models. They play a key role both in standard treatment comparisons and Type III tests and in newer techniques such as sliced interaction effects and diffograms. This paper reviews the definition of LS-means, focusing on their interpretation as predicted population marginal means, and it illustrates their broad range of use with numerous examples.

Email is an important marketing channel for digital marketers. We can stay connected with our subscribers and attract them with relevant content as long as they are still subscribed to our email communication. In this session, we discuss why it is important to manage opt-out risk, how we predicted opt-out risk, and how we proactively manage opt-out using the models we developed.

Data governance combines the disciplines of data quality, data management, data policy management, business process management, and risk management into a methodology that ensures important data assets are formally managed throughout an enterprise. SAS^® has developed a cohesive suite of technologies that can be used to implement efficient and effective data governance initiatives, thereby improving an enterprise's overall data management efficiency. This paper discusses data governance use cases and challenges, and provides an example of how to manage the data governance lifecycle to ensure success.

The capabilities of SAS^® have been extended by the use of macros and custom formats. SAS macro code libraries and custom format libraries can be stored in various locations, some of which might not always be easily and efficiently accessed from other operating environments. Code can be in various states of development ranging from global organization-wide approved libraries to very elementary just-getting-started code. Formalized yet flexible file structures for storing code are needed. SAS user environments range from standalone systems such as PC SAS or SAS on a server/mainframe to much more complex installations using multiple platforms. Strictest attention must be paid to (1) file location for macros and formats and (2) management of the lack of cross-platform portability of formats. Macros are relatively easy to run from their native locations. This paper covers methods of doing this with emphasis on: (a) the option SASAUTOS to define the location and the search order for identifying macros being called, and (b) even more importantly the little-known SAS option MAUTOLOCDISPLAY to identify the location of the macro actually called in the SAS log. Format libraries are more difficult to manage and cannot be run in an operating system different from the one in which they were created. This paper will discuss the export, copying and importing of format libraries to provide cross-platform capability. A SAS macro used to identify the source of a format being used will be presented.

This paper describes a technique for calibrating street address match logic to maximize the match rate without introducing excessive erroneous matching.

One of the most common questions about logistic regression is "How do I know if my model fits the data?" There are many approaches to answering this question, but they generally fall into two categories: measures of predictive power (like R-squared) and goodness of fit tests (like the Pearson chi-square). This presentation looks first at R-squared measures, arguing that the optional R-squares reported by PROC LOGISTIC might not be optimal. Measures proposed by McFadden and Tjur appear to be more attractive. As for goodness of fit, the popular Hosmer and Lemeshow test is shown to have some serious problems. Several alternatives are considered.
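
To make the two measures concrete, here is a minimal Python sketch using their standard definitions. McFadden's measure is one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model. Tjur's measure is the mean predicted probability among events minus the mean among non-events. The function names are assumptions, and the predicted probabilities `p` are assumed to come from an already fitted model (for example, scored output from a logistic regression) and to lie strictly between 0 and 1.

```python
import math


def mcfadden_r2(y, p):
    """1 - lnL(model) / lnL(null), where the null model is intercept-only."""
    n, events = len(y), sum(y)
    ybar = events / n  # fitted probability under the intercept-only model
    ll_model = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
                   for yi, pi in zip(y, p))
    ll_null = events * math.log(ybar) + (n - events) * math.log(1.0 - ybar)
    return 1.0 - ll_model / ll_null


def tjur_r2(y, p):
    """Mean predicted probability for events minus that for non-events."""
    ev = [pi for yi, pi in zip(y, p) if yi == 1]
    non = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(ev) / len(ev) - sum(non) / len(non)
```

Both functions require at least one event and one non-event in `y`; a model that assigns every observation the overall event rate scores zero on both measures.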

In applied statistical practice, incomplete measurement sequences are the rule rather than the exception. Fortunately, in a large variety of settings, the stochastic mechanism governing the incompleteness can be ignored without hampering inferences about the measurement process. While ignorability only requires the relatively general missing at random assumption for likelihood and Bayesian inferences, this result cannot be invoked when non-likelihood methods are used. We will first sketch the framework used for contemporary missing-data analysis. Apart from revisiting some of the simpler but problematic methods, attention will be paid to direct likelihood and multiple imputation. Because popular non-likelihood-based methods do not enjoy the ignorability property in the same circumstances as likelihood and Bayesian inferences, weighted versions have been proposed. This holds true in particular for generalized estimating equations (GEE). Even so-called doubly-robust versions have been derived. Apart from GEE, pseudo-likelihood-based strategies can also be adapted appropriately. We describe a suite of corrections to the standard form of pseudo-likelihood, to ensure its validity under missingness at random. Our corrections follow both single and double robustness ideas, and are relatively simple to apply.

Mobile devices are taking over conventional ways of sharing and presenting information in today's businesses and working environments. Accessibility to this information is a key factor for companies and institutions in order to reach wider audiences more efficiently. SAS^® software provides a powerful set of tools that allows developers to fulfill the increasing demand for mobile reporting without needing to upgrade to the latest version of the platform. Here at University of Central Florida (UCF), we were able to create reports targeting our iPad consumers at our executive level by using the SAS^® 9.2 Enterprise Business Intelligence environment, specifically SAS^® Web Report Studio 4.3. These reports provide them with the relevant data for their decision-making process. At UCF, the goal is to provide executive consumers with reports that fit on one screen, in order to avoid the need for scrolling, and that are easily exportable to PDF. This is done to accommodate their increasing use of portable technology to share sensitive data in a timely manner. The technical challenge is to provide specific data to those executive users requesting access through their iPad devices. Compatibility issues arise but are successfully bypassed. We are able to provide reports that fit on one screen and that can be opened as a PDF if needed. These enhanced capabilities were requested and well received by our users. This paper presents techniques we use in order to create mobile reports.

For most practitioners, ordinary least squares (OLS) regression with a Gaussian distributional assumption might be the top choice for modeling fractional outcomes in many business problems. However, it is conceptually flawed to assume a Gaussian distribution for a response variable in the [0, 1] interval. In this paper, several modeling methodologies for fractional outcomes with their implementations in SAS^® are discussed through a data analysis exercise in predicting corporate financial leverage ratios. Various empirical and conceptual methods for the model evaluation and comparison are also discussed throughout the example. This paper provides a comprehensive survey about how to model fractional outcomes.

Predicting loss given default (LGD) is playing an increasingly crucial role in quantitative credit risk modeling. In this paper, we propose to apply mixed effects models to predict corporate bond LGD, alongside other widely used LGD models. The empirical results show that mixed effects models are able to explain the unobservable heterogeneity and to make better predictions compared with linear regression and fractional response regression. All the statistical models are fitted in SAS/STAT^®, SAS^® 9.2, specifically using PROC REG and PROC NLMIXED, and the model evaluation metrics are calculated in PROC IML. This paper gives a detailed description of how to use PROC NLMIXED to build and estimate generalized linear models and mixed effects models.

More organizations are understanding the importance of geo-tagged data and the need for tools that can successfully combine location data with business metrics to provide intelligent outputs that are beyond a simple map. SAS^® Visual Analytics provides a robust and powerful platform for achieving location intelligence performed with a combination of SAS^® Analytics and GIS mapping technologies such as that offered by Esri. This paper describes the essentials for achieving location intelligence and demonstrates with industry examples how SAS Visual Analytics makes it possible.

This paper considers the %MRE macro for estimating multivariate ratio estimates. Also, we use PROC REG to estimate multivariate regression estimates and to show that regression estimates are superior to the ratio estimates.

Two examples of Vector Autoregressive Moving Average modeling with exogenous variables are given in this presentation. Data is from the real world. One example is about a two-dimensional time series for wages and prices in Denmark that spans more than a hundred years. The other is about the market for agricultural products, especially eggs! These examples give a general overview of the many possibilities offered by PROC VARMAX, such as handling of seasonality, causality testing and Bayesian modeling, and so on.

SAS/OR^® software for operations research includes mathematical optimization, discrete-event simulation, and project and resource scheduling capabilities. This paper surveys a number of its new features that better equip you to address decision-making challenges such as planning, resource management, and asset allocation. Optimization performance improvements help you solve larger, more detailed problems more quickly. Improvements encompass linear, mixed integer linear, and nonlinear optimization, and include multithreading of the mixed integer linear solver and major improvements in the performance and functionality of the decomposition algorithm for linear and mixed integer linear optimization. The OPTMODEL procedure for optimization modeling adds direct access to the same set of efficient network optimization algorithms available via the OPTNET procedure in SAS/OR, enabling you to embed network optimization as a component of larger solution processes. Other new features enable you to execute multiple optimizations in parallel and use the FCMP procedure to define functions. The OPTLSO procedure for global and local search optimization adds the ability to work with multiple objective functions and produce a set of Pareto-optimal solutions. This approach enables you to manage the trade-offs that arise between competing objectives and adds to the range of optimization problems that you can solve using PROC OPTLSO. Another new feature is support for the READ_ARRAY function in PROC FCMP, with which you can much more easily input array-structured data to be used in function definitions. Finally, SAS^® Simulation Studio for discrete-event simulation enhances its graphical interface to better support customization and increase ease of use.

This poster shows the audience step-by-step how to connect to a database without registering the connection in either the Windows ODBC Administrator tool or in the Windows Registry database. This poster also shows how the connection can be more flexible and better managed by building it into a SAS^® macro.

Have you ever asked, "Why doesn't my PDF output look just like my HTML output?" This paper explains the power and differences of each destination. You'll learn how each destination works and understand why the output looks the way it does. Learn tips and tricks for how to modify your SAS^® code to make each destination look more like the other. The tips span from beginner to advanced in all areas of reporting. Each destination is like a superhero, helping you transform your reports to meet all your needs. Learn how to use each ODS destination to the fullest extent of its powers.

PD_Calibrate is a macro that standardizes the calibration of our predictive credit-scoring models at Nykredit. The macro is activated with an input data set, variables, anchor point, specification of method, number of buckets, kink-value, and so on. The output consists of graphs, HTML, and two data sets containing key values for the model being calibrated and values for the use of graphics.

PROC TABULATE is a powerful tool for creating tabular summary reports. Its advantages over PROC REPORT are that it requires less code, allows for more convenient table construction, and uses syntax that makes it easier to modify a table's structure. However, its inability to compute the sum, difference, product, and ratio of column sums has hindered its use in many circumstances. This paper illustrates and discusses some creative approaches and methods for overcoming these limitations, enabling users to produce needed reports and still enjoy the simplicity and convenience of PROC TABULATE. These methods and skills can have prominent applications in a variety of business intelligence and analytics fields.

The effectiveness of visual interpretation of the differences between pairs of LS-means in a generalized linear model includes the graph's ability to display four inferential and two perceptual tasks. Among the types of graphs which display some or all of these tasks are the forest plot, the mean-mean scatter plot (diffogram), and closely related to it, the mean-mean multiple comparison (MMC) plot. These graphs provide essential visual perspectives for interpretation of the differences among pairs of LS-means from a generalized linear model (GLM). The diffogram is a graphical option now available through ODS statistical graphics with linear model procedures such as GLIMMIX. Through combining ODS output files of the LS-means and their differences, the SGPLOT procedure can efficiently produce forest and MMC plots.

Power analysis helps you plan a study that has a controlled probability of detecting a meaningful effect, giving you conclusive results with maximum efficiency. SAS/STAT^® provides two procedures for performing sample size and power computations: the POWER procedure provides analyses for a wide variety of different statistical tests, and the GLMPOWER procedure focuses on power analysis for general linear models. In SAS/STAT 13.1, the GLMPOWER procedure has been updated to enable power analysis for multivariate linear models and repeated measures studies. Much of the syntax is similar to the syntax of the GLM procedure, including both the new MANOVA and REPEATED statements and the existing MODEL and CONTRAST statements. In addition, PROC GLMPOWER offers flexible yet parsimonious options for specifying the covariance. One such option is the two-parameter linear exponent autoregressive (LEAR) correlation structure, which includes other common structures such as AR(1), compound symmetry, and first-order moving average as special cases. This paper reviews the new repeated measures features of PROC GLMPOWER, demonstrates their use in several examples, and discusses the pros and cons of the MANOVA and repeated measures approaches.

The SQL procedure contains many powerful and elegant language features for intermediate and advanced SQL users. This presentation discusses topics that will help SAS^® users unlock the many powerful features, options, and other gems found in the SQL universe. Topics include CASE logic; a sampling of summary (statistical) functions; dictionary tables; PROC SQL and the SAS macro language interface; joins and join algorithms; PROC SQL statement options _METHOD, MAGIC=101, MAGIC=102, and MAGIC=103; and key performance (optimization) issues.

The steady expansion of electronic health records (EHR) over the past decade has increased the use of observational healthcare data for analysis. One of the challenges with EHR data is to combine information from different domains (diagnosis, procedures, drugs, adverse events, labs, quality of life scores, and so on) onto a single timeline to get a longitudinal view of the patient. This enables the physician or researcher to visualize a patient's health profile, thereby revealing anomalies, trends, and responses graphically, thus empowering them to treat more effectively. This paper attempts to provide a composite view of a patient by using SAS^® Graph Template Language to create a profile graph using the following data elements: key event dates, drugs, adverse events, and Quality of Life (QoL) scores. For visualization, the GTL graph uses X and X2 axes for dates, vertical reference lines to represent key dates (for example, when the disease is first diagnosed), a horizontal bar plot for duration of drugs taken and adverse events reported, and a series plot at the bottom to show the QoL score.

In a good clinical study, statisticians and various stakeholders are interested in assessing and isolating the effect of non-study drugs. One common practice in clinical trials is that clinical investigators follow the protocol to taper certain concomitant medications in an attempt to prevent or resolve adverse reactions and/or to minimize the number of subject withdrawals due to lack of efficacy or adverse event. To assess the impact of those tapering medicines during study is of high interest to clinical scientists and the study statistician. This paper presents the challenges and caveats of assessing the impact of tapering a certain type of concomitant medications using SAS^® 9.3 based on a hypothetical case. The paper also presents the advantages of visual graphs in facilitating communications between clinical scientists and the study statistician.

The ZIP access method is new with SAS^® 9.4. This paper provides several examples of reading from and writing to ZIP files using this access method, including the use of the DATA step directory management macros and the new MEMVAR= option.

The Department of Market Monitoring (DMM) at California ISO is responsible for promoting a robust, competitive, and nondiscriminatory electric power market in California by keeping a close watch on the efficiency and effectiveness of the ancillary service, congestion management, and real-time spot markets. We monitor the potential of market participants to exercise undue market power, the behavior of market participants that is consistent with attempts to exercise market power, and the market performance that results from the interaction of market structure with participant behavior. In order to perform monitoring activities effectively, DMM collects available data and designs and implements reporting dashboards that track key market metrics. We are using various SAS^® BI tools to develop and employ metrics and analytic tools applicable to market structure, participant behavior, and market performance. This paper provides details about the effective use of various SAS BI tools to implement automated real-time market monitoring functionality.

Predicting news articles that customers are likely to view or read next provides a distinct advantage to news sites. Collaborative filtering is a widely used technique for this task. This paper details an approach within collaborative filtering that uses the cosine similarity function to achieve this purpose. The paper further details two different approaches, customized targeting and article level targeting, that can be used in marketing campaigns. Please note that this presentation connects with Session ID 1887. Session ID 1887 happens immediately following this session.
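
To make the cosine similarity idea concrete, here is a minimal Python sketch of article-to-article similarity computed from a customer-by-article view matrix. It is not the paper's implementation, which is not shown in the abstract; the customer IDs, article IDs, and the helper `most_similar_article` are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no views at all: treat as "no similarity"
    return dot / (norm_u * norm_v)


# Hypothetical view matrix: one row per customer, one column per article,
# 1 = the customer viewed the article, 0 = did not.
articles = ["art_1", "art_2", "art_3", "art_4"]
views = {
    "cust_a": [1, 1, 0, 0],
    "cust_b": [1, 1, 1, 0],
    "cust_c": [0, 0, 1, 1],
}


def article_vector(j):
    """Column j of the view matrix, in a fixed customer order."""
    return [views[c][j] for c in sorted(views)]


def most_similar_article(j):
    """Return (article, score) for the article most similar to article j."""
    target = article_vector(j)
    scores = {
        k: cosine_similarity(target, article_vector(k))
        for k in range(len(articles))
        if k != j
    }
    best = max(scores, key=scores.get)
    return articles[best], scores[best]


print(most_similar_article(0))
```

In this toy data, customers who viewed art_1 also viewed art_2, so art_2 would be the natural article to suggest next to a reader of art_1.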

Personalized recommender systems are being used in many industries to increase customer engagement. In the TV industry, this is primarily used to increase viewership, which in turn increases market share, revenue, and profit. This paper attempts to develop a recommender system using the correlation procedure under collaborative filtering methodology. The only data requirement for this recommendation system would be past viewership of customers for a given time period. Please note that this session connects with Session ID 1886. Session ID 1886 happens immediately prior to this session.

Duration and severity data arise in several fields including biostatistics, demography, economics, engineering, and sociology. SAS^® procedures LIFETEST, LIFEREG, and PHREG are the workhorses for analysis of time to event data in applications in biostatistics. Similar methods apply to the magnitude or severity of a random event, where the outcome might be right, left, or interval censored and/or right or left truncated. All combinations of types of censoring and truncation could be present in the data set. Regression models such as the accelerated failure time model, the Cox model, and the non-homogeneous Poisson model have extensions to address time-varying covariates in the analysis of clustered outcomes, multivariate outcomes of mixed types, and recurrent events. We present an overview of new capabilities that are available in the procedures QLIM, QUANTLIFE, RELIABILITY, and SEVERITY with examples illustrating their application using empirical data sets drawn from easily accessible sources.

In healthcare, we often express our analytics results as being "adjusted." For example, you might have read a study in which the authors reported the data as age-adjusted or risk-adjusted. The concept of adjustment is widely used in program evaluation, comparing quality indicators across providers and systems, forecasting incidence rates, and in cost-effectiveness research. In order to make reasonable comparisons across time, place, or population, we need to account for small sample sizes and case-mix variation; in other words, we need to level the playing field and account for differences in health status and for uniqueness in a given population. If you are new to healthcare, what it really means to adjust the data in order to make comparisons might not be obvious. In this paper, we explore the methods by which we control for potentially confounding variables in our data. We do so through a series of examples from the healthcare literature in both primary care and health insurance. In this survey of methods, we discuss the concepts of rates and how they can be adjusted for demographic strata (such as age, gender, and race), as well as health risk factors such as case mix.

SAS^® 9.4 introduces several new software products to better support SAS^® web applications. These products include SAS^® Web Server, SAS^® Web Application Server (with the availability of out-of-the-box clustering), and SAS^® Environment Manager. Even though these products have been tuned and tested for SAS 9.4 web applications, advanced users might want to know the tools and techniques that they can use to further monitor, manage, tune, and improve the performance of their environment. This paper discusses how customers can achieve that by exploring the following concepts, activities, techniques, and tools: using SAS Environment Manager to monitor run-time performance of middle-tier components; using additional tools to monitor middle-tier components (Apache server-status, Java VisualVM, Java command-line tools, Java GC logging); identifying the potential bottlenecks and tuning suggestions; identifying appropriate clustering strategy (single-server vs. multi-server for homogeneous or heterogeneous clustering); suggesting the data to collect when analyzing performance (GC data, thread dumps, heap dumps, system resource utilization information, log files); and discussing in-depth performance analysis tools (Thread Dump Analyzer, HPjmeter, Eclipse Memory Analyzer (MAT), and the IBM Support Assistant tools: GC and Memory Visualizer, Memory Analyzer, and Thread and Monitor Dump Analyzer).

The high school dropout problem has been called a national crisis (Heppen & Therriault, 2008). Almost one-third of all high school students leave the public school system before graduating (Swanson, 2004), and the problem is particularly severe among minority students (Greene & Winters, 2005; U.S. Department of Education, 2006). Educators, researchers, and policymakers continue to work to identify effective dropout prevention strategies. One effective approach is to identify high-risk students at an early stage, and then provide corresponding interventions to keep them in school. One of the strengths of Educational Data Mining is to reveal hidden patterns and predict future performance by analyzing accessible student data. These predictive algorithms generated by predictive modeling can serve as an early warning system. However, because individual schools and districts have various combinations of race, gender, and socioeconomic status, we cannot use a set of standardized predictors and obtain satisfactory predictive results. Analyzing a limited number of variables and limited historical data does not generate accurate models. Additionally, the predictive model might not consider interactions among predictors. The strength of data mining is the capability to analyze a large amount of data and variables. Multiple analytic strategies (including model comparisons) can be applied to maximize model performance. For future goals, we propose a debuted data mining framework to construct an early warning and trend analysis system with components of data warehousing, data mining, and reporting at the levels of individual students, schools, school districts, and the entire state.

Changes in default behavior in the last few SAS^® releases have enabled faster processing of SAS formats, especially for SAS/ACCESS^® customers. But, as with any performance enhancement, your results may vary. This presentation teaches you the differences between two important SAS format optimizations, how to tell which optimization is in effect, and a simple method to get the behavior you want. The target audience for this presentation is SAS/ACCESS customers, particularly those who have also licensed SAS^® In-Database Code Accelerator for Teradata or SAS^® In-Database Code Accelerator for Greenplum.

There are exciting new capabilities available from SAS^® High-Performance Analytics and SAS^® Visual Analytics. Current customers seek a deployment strategy that enables gradual migration to the new technologies. Such a strategy would mitigate the need for 'rip and replace' and would enable resource utilization to evolve along a continuum rather than partitioning resources, which would result in underused computing or storage hardware. New customers who deploy a combination of SAS^® Grid Manager, SAS High-Performance Analytics, and SAS Visual Analytics seek to reduce the cost of computing resources and reduce data duplication and data movement by deploying these solutions on the same pool of hardware. When sharing hardware, it is important to implement resource management in order to help guarantee that resources are available for critical applications and processes. This session discusses various methods for managing hardware resources in a multi-application environment. Specific strategies are suggested, along with implementation suggestions.

This discussion uses SAS^® Office Analytics as an example to demonstrate the importance of preparing for the SAS^® installation. There are many nuances as well as requirements that need to be addressed before you do an installation. These requirements are basically similar, yet they differ according to the target installation operating system. In other words, there are some differences in preparation routines for Windows and *Nix flavors. Our discussion focuses on these three topics: 1. Pre-installation considerations such as sizing, storage, proper credentials, and third-party requirements; 2. Installation steps and requirements; and 3. Post-installation configuration. In addition to preparation, this paper also discusses potential issues and pitfalls to watch out for, as well as best practices.

Are you wondering what is causing your valuable machine asset to fail? What could those drivers be, and what is the likelihood of failure? Do you want to be proactive rather than reactive? Answers to these questions have arrived with SAS^® Predictive Asset Maintenance. The solution provides an analytical framework to reduce the amount of unscheduled downtime and optimize maintenance cycles and costs. An all new (R&D-based) version of this offering is now available. Key aspects of this paper include a discussion of the key business drivers for and capabilities of SAS Predictive Asset Maintenance and a detailed analysis of the solution, including: data model; explorations; data selections; Path I: analysis workbench maintenance analysis and stability monitoring; Path II: analysis workbench JMP^®, SAS^® Enterprise Guide^®, and SAS^® Enterprise Miner^™; analytical case development using SAS Enterprise Miner, SAS^® Model Manager, and SAS^® Data Integration Studio; and the SAS Predictive Asset Maintenance Portlet for reports. A realistic business example in the oil and gas industry is used.

Hospital readmission rates have become a key indicator for measuring the quality of health care. Currently, use of these rates has been adopted by major healthcare stakeholders, including the Centers for Medicare & Medicaid Services (CMS), the Agency for Healthcare Research and Quality (AHRQ), and the National Committee for Quality Assurance (NCQA). In the calculation of the readmission rate, it is often a challenging task to identify eligible hospital readmissions from the convoluted administrative claims data. By taking advantage of the flexibility and power of SAS^® programming tools, this paper proposes three different solutions using both DATA step and PROC SQL to help identify 30-day hospital readmissions more efficiently and accurately. Solution 1 (DATA STEP vertically) employs the LAG function to calculate the gap between the current admission date and the immediately preceding discharge date. This vertical thinking process is straightforward and does not require additional data management. Solution 2 (DATA STEP horizontally) uses PROC TRANSPOSE, ARRAYs, and DO loops to transform claims data from long to wide, and examines each patient's hospitalization experiences in just one line. A similar horizontal thinking process has been discussed in previous SAS papers for calculating medication utilization. Solution 3 (PROC SQL) takes advantage of a special table joining (self-join) by creating a Cartesian product further subsetted by a joining condition and WHERE statements. All three solutions have achieved the same results by correctly identifying 30-day hospital readmissions, and they can be handily applied to tackle similar programming challenges in research projects.
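
The "vertical" logic of Solution 1 can be sketched outside SAS as follows. This Python example is an illustration under stated assumptions, not the paper's code: the record layout (patient ID, admission date, discharge date), the function name `flag_readmissions`, and the sample claims are hypothetical, and real eligibility rules (transfers, planned readmissions, overlapping stays) are deliberately left out. Carrying the previous row's discharge date forward plays the role that the LAG function plays in a DATA step.

```python
from datetime import date


def flag_readmissions(claims, window_days=30):
    """Flag admissions that occur within window_days of the same
    patient's previous discharge.

    claims: iterable of (patient_id, admit_date, discharge_date).
    Returns a list of (patient_id, admit_date, discharge_date, gap, flag),
    where gap is None for a patient's first stay.
    """
    # Sort by patient and admission date so each row follows its predecessor.
    ordered = sorted(claims, key=lambda c: (c[0], c[1]))
    flagged = []
    prev_patient = None
    prev_discharge = None
    for patient, admit, discharge in ordered:
        gap = None
        if patient == prev_patient:
            # Days between previous discharge and current admission.
            gap = (admit - prev_discharge).days
        is_readmit = gap is not None and 0 <= gap <= window_days
        flagged.append((patient, admit, discharge, gap, is_readmit))
        prev_patient, prev_discharge = patient, discharge
    return flagged


# Hypothetical claims for two patients (illustration only).
claims = [
    ("P1", date(2013, 4, 1), date(2013, 4, 3)),
    ("P1", date(2013, 1, 1), date(2013, 1, 5)),
    ("P2", date(2013, 2, 1), date(2013, 2, 2)),
    ("P1", date(2013, 1, 20), date(2013, 1, 25)),
]

result = flag_readmissions(claims)
for row in result:
    print(row)
```

Resetting the comparison whenever the patient ID changes mirrors the usual BY-group care needed with LAG, so that one patient's discharge is never compared with another patient's admission.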

SAS^® Visual Analytics delivers the power of approachable in-memory analytics in an intuitive web interface. The scalable technology behind SAS Visual Analytics should not benefit just the analyst or data scientist in your organization but indeed everyone regardless of their analytical background. This paper outlines a framework for the creation of a cloud deployment of SAS Visual Analytics using the SAS^® 9.4 platform. Based on proven best practices and existing customer implementations, the paper focuses on architecture, processes, and design for reliability and scalable multi-tenancy. The framework enables your organization to move away from the departmental view of the world and to offer analytical capabilities for consumerization and collaboration across the enterprise.

This workshop provides hands-on experience using SAS^® Enterprise Miner^™ high-performance nodes. Workshop participants will do the following: learn the similarities and differences between high-performance nodes and standard nodes; build a project flow using high-performance nodes; and extract and save score code for model deployment.

Due to XML's growing role in data interchange, it is increasingly important for SAS^® programmers to become proficient with SAS technologies and techniques for creating and consuming XML. The current work expands on a SAS^® Global Forum 2013 presentation that dealt with these topics, providing additional examples of using XML maps to read and write XML files and using the Output Delivery System (ODS) to create custom tagsets for generating XML.

Statistical mediation analysis is common in business, social sciences, epidemiology, and related fields because it explains how and why two variables are related. For example, mediation analysis is used to investigate how product presentation affects liking the product, which then affects the purchase of the product. Mediation analysis evaluates the mechanism by which a health intervention changes norms that then change health behavior. Research on mediation analysis methods is an active area of research. Some recent research in statistical mediation analysis focuses on extracting accurate information from small samples by using Bayesian methods. The Bayesian framework offers an intuitive solution to mediation analysis with small samples; namely, incorporating prior information into the analysis when there is existing knowledge about the expected magnitude of mediation effects. Using diffuse prior distributions with no prior knowledge allows researchers to reason in terms of probability rather than in terms of (or in addition to) statistical power. Using SAS^® PROC MCMC, researchers can choose one of two simple and effective methods to incorporate their prior knowledge into the statistical analysis, and can obtain the posterior probabilities for quantities of interest such as the mediated effect. This project presents four examples of using PROC MCMC to analyze a single mediator model with real data using: (1) diffuse prior information for each regression coefficient in the model, (2) informative prior distributions for each regression coefficient, (3) diffuse prior distribution for the covariance matrix of variables in the model, and (4) informative prior distribution for the covariance matrix.

To get the full benefit from PROC REPORT, the savvy programmer needs to master ACROSS usage and the COMPUTE block. Timing issues with PROC REPORT and ABSOLUTE column references can unlock the power of PROC REPORT. This presentation shows how to make the most of ACROSS usage with PROC REPORT. Use PROC REPORT instead of multiple TRANSPOSE steps. Find out how to use character variables with ACROSS. Learn how to impact the column headings for ACROSS usage items. Learn how to use aliases. Find out how to perform rowwise trafficlighting and trafficlighting based on multiple conditions.

SAS^® has a number of procedures for smoothing scatter plots. In this tutorial, we review the nonparametric technique called LOESS, which estimates local regression surfaces. We review the LOESS procedure and then compare it to a parametric regression methodology that employs restricted cubic splines to fit nonlinear patterns in the data. Not only do these two methods fit scatterplot data, but they can also be used to fit multivariate relationships.

SAS^® OLAP technology is used to organize and present summarized data for business intelligence applications. It features flexible options for creating and storing aggregations to improve performance and brings a powerful multi-dimensional approach to querying data. This paper focuses on managing security features available to OLAP cubes through the combination of SAS metadata and MDX logic.

Security-conscious organizations have rigorous IT regulations, especially when company data is available on the move. This paper explores the options available to secure a deployment of SAS^® Mobile BI with SAS^® Visual Analytics. The setup ensures encrypted communication from remote mobile clients all the way to backend servers. Additionally, the integration of SAS Mobile BI with third-party Mobile Device Management (MDM) software and Virtual Private Network (VPN) technology enables you to place several layers of security and access control around your data. The paper also covers the out-of-the-box security features of the SAS Mobile BI and SAS Visual Analytics administration applications to help you close the loop on all possible areas of exploitation.

Even if you are familiar with security considerations for SAS^® BI deployments, such as metadata and file system permissions, there are additional security aspects to consider when securing any environment that includes SAS^® Visual Analytics. These include files and permissions to the grid machines in a distributed environment, permissions on the SAS^® LASR™ Analytic Servers, and interactions with existing metadata types. We approach these security aspects from the perspective of an administrator who is securing the environment for himself, a data builder, and a report consumer.

Universities strive to be competitive in the quality of education as well as cost of attendance. Peer institutions are selected to make comparisons pertaining to academics, costs, and revenues. These comparisons lead to strategic decisions and long-range planning to meet goals. The process of finding comparable institutions could be completed with cluster analysis, a statistical technique. Cluster analysis places universities with similar characteristics into groups or clusters. A process to determine peer universities will be illustrated using PROC STANDARD, PROC FASTCLUS, and PROC CLUSTER.

Multiple imputation, a popular strategy for dealing with missing values, usually assumes that the data are missing at random (MAR). That is, for a variable X, the probability that an observation is missing depends only on the observed values of other variables, not on the unobserved values of X. It is important to examine the sensitivity of inferences to departures from the MAR assumption, because this assumption cannot be verified using the data. The pattern-mixture model approach to sensitivity analysis models the distribution of a response as the mixture of a distribution of the observed responses and a distribution of the missing responses. Missing values can then be imputed under a plausible scenario for which the missing data are missing not at random (MNAR). If this scenario leads to a conclusion different from that of inference under MAR, then the MAR assumption is questionable. This paper reviews the concepts of multiple imputation and explains how you can apply the pattern-mixture model approach in the MI procedure by using the MNAR statement, which is new in SAS/STAT^® 13.1. You can specify a subset of the observations to derive the imputation model, which is used for pattern imputation based on control groups in clinical trials. You can also adjust imputed values by using specified shift and scale parameters for a set of selected observations, which are used for sensitivity analysis with a tipping-point approach.
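
The shift-and-scale adjustment behind a tipping-point analysis can be sketched in a few lines. The Python below is a language-neutral illustration, not the MI procedure's implementation: the function names, the toy data, the decision rule (group mean at or below a threshold), and the order of operations (scale first, then shift) are all assumptions made for this sketch.

```python
from statistics import mean

def adjust_imputed(values, imputed_flags, shift=0.0, scale=1.0):
    """Return a copy of values in which only imputed entries become scale * v + shift."""
    return [scale * v + shift if flag else v
            for v, flag in zip(values, imputed_flags)]

def tipping_point(values, imputed_flags, shifts, threshold):
    """Return the first shift at which the mean falls to or below threshold, else None."""
    for shift in shifts:
        adjusted = adjust_imputed(values, imputed_flags, shift=shift)
        if mean(adjusted) <= threshold:
            return shift
    return None

# Hypothetical toy data: two observed responses and two already-imputed ones.
responses = [5.0, 6.0, 7.0, 6.0]
is_imputed = [False, False, True, True]
candidate_shifts = [0.0, -1.0, -2.0, -3.0]

print(tipping_point(responses, is_imputed, candidate_shifts, threshold=5.0))
```

In a real analysis the decision rule would be the study's inferential conclusion (for example, a significance test pooled over several imputations) rather than a simple mean comparison.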

The manager of the Purchasing Department is considering contracting with your team for a new SAS^® Enterprise BI application. He's already met with SAS^® and seen the sales pitch, and he is very interested. But the manager is a tightwad and not sure about spending the money. Also, he wants his team to be the primary developers for this new application. Before investing his money in training, programming, and support, he would like a proof-of-concept. This paper will walk you through the seven steps to create a SAS Enterprise BI POC project: (1) Develop a kick-off meeting including a full demo of the SAS Enterprise BI tools. (2) Set up your UNIX file systems and security. (3) Set up your SAS metadata ACTs, users, groups, folders, and libraries. (4) Make sure the necessary SAS client tools are installed on the developers' machines. (5) Hold a SAS Enterprise BI workshop to introduce them to the basics, including SAS^® Enterprise Guide^®, SAS^® Stored Processes, SAS^® Information Maps, SAS^® Web Report Studio, SAS^® Information Delivery Portal, and SAS^® Add-In for Microsoft Office, along with supporting documentation. (6) Work with them to develop a simple project, one that highlights the benefits of SAS Enterprise BI and shows several methods for achieving the desired results. (7) Last but not least, follow up! Remember, your goal is not to launch a full-blown application. Instead, we'll strive toward helping them see the potential in your organization for applying this methodology.

SAS^® continues to expand and improve its reporting capability. With new SAS^® 9.4 enhancements in ODS (Output Delivery System), the opportunity to create stunning reports has expanded even further. If you are charged with creating relevant, informative, easy-to-read reports for clients or administrators, then the ODS Report Writing Interface, ODS LAYOUT enhancements, and the new ODSTEXT procedure are important tools to use. These tools allow you to create reports in a smart, eye-catching format that can be turned around quite quickly and programmed to provide optimum flexibility. How many times have you worked hours to tweak and fine-tune a report directly in Microsoft Excel, Microsoft Word, Microsoft PowerPoint, or some other similar software only to be asked for a "quick update," which would then take hours to recreate because you are manually transferring data? Do you ever dread receiving the compliment, "This is really wonderful information!!!!" because you know it will be followed by "Can you run this for EVERY region?" Well, dread no more, because when you harness the power of SAS^® ODS, you can create first-rate, flexible, fabulous reports! Join me as I share with you two real-world examples of ODS capabilities using (1) a marketing piece I designed to help the president of our university spotlight county- and region-specific data as he recruited across the state and (2) our academic program review form, a multi-page report that outputs to Word so that program coordinators can add personalized commentary to support their program's effectiveness.

Companies in the insurance and banking industries need to model the frequency and severity of adverse events every day. Accurate modeling of risks and the application of predictive methods ensure the liquidity and financial health of portfolios. Often, the modeling involves computationally intensive, large-scale simulation. SAS/ETS^® provides high-performance procedures to assist in this modeling. This paper discusses the capabilities of the HPCOUNTREG and HPSEVERITY procedures, which estimate count and loss distribution models in a massively parallel processing environment. The loss modeling features have been extended by the new HPCDM procedure, which simulates the probability distribution of the aggregate loss by compounding the count and severity distribution models. PROC HPCDM also analyzes the impact of various future scenarios and parameter uncertainty on the distribution of the aggregate loss. This paper steps through the entire modeling and simulation process that is useful in the insurance and banking industries.
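
The compounding idea, an aggregate loss formed as the sum of a random number of random severities, can be sketched with a small Monte Carlo simulation. The Python below is only an illustration of the concept and not how PROC HPCDM works internally: the Poisson count model, the lognormal severity model, the parameter values, and the function names are assumptions chosen for the sketch.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw a Poisson(lam) count with Knuth's multiplication method (suitable for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_aggregate_loss(n_sims, lam, mu, sigma, seed=0):
    """Simulate aggregate losses: each is the sum of N lognormal severities, N ~ Poisson(lam)."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        n_events = poisson_draw(rng, lam)
        losses.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_events)))
    return losses

# Hypothetical parameters: about 3 events per period, lognormal(0, 0.5) severities.
losses = simulate_aggregate_loss(n_sims=20000, lam=3.0, mu=0.0, sigma=0.5)
losses.sort()
mean_loss = sum(losses) / len(losses)
pct_95 = losses[int(0.95 * len(losses))]
print(f"mean aggregate loss ~ {mean_loss:.2f}, 95th percentile ~ {pct_95:.2f}")
```

For this model the theoretical mean is lam * exp(mu + sigma**2 / 2), which gives a quick sanity check on the simulated mean; scenario analysis amounts to rerunning the simulation with altered parameters.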

Big data is all the rage these days, with the proliferation of data-accumulating electronic gadgets and instrumentation. At the heart of big data analytics is the MapReduce programming model. As a framework for distributed computing, MapReduce uses a divide-and-conquer approach to allow large-scale parallel processing of massive data. As the name suggests, the model consists of a Map function, which first splits data into key-value pairs, and a Reduce function, which then carries out the final processing of the mapper outputs. It is not hard to see how these functions can be simulated with the SAS^® hash objects technique, and in reality, implemented in the new SAS^® DS2 language. This paper demonstrates how hash object programming can handle data in a MapReduce fashion and shows some potential applications in physics, chemistry, biology, and finance.
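
The Map, group-by-key, and Reduce steps described above can be sketched with an ordinary in-memory hash table. The Python below is a language-neutral illustration rather than the paper's SAS or DS2 code: the word-count task, the function names, and the sample records are assumptions, and the dictionary stands in for the role a hash object could play.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit one (key, value) pair per word."""
    for record in records:
        for word in record.lower().split():
            yield word, 1

def shuffle(pairs):
    """Group the emitted values by key in a hash table (a dict)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's values to a single result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}

records = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts["the"], counts["fox"])  # prints: 3 2
```

In a distributed setting the map and reduce functions run on many nodes and the grouping step moves data between them; here everything runs in one process, which is the sense in which a hash table can simulate the model.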

For electricity retail and distribution companies, the introduction of smart-meter technologies has been a key investment, reducing the significant costs associated with meter reading. Electricity companies continue to look for other ways to generate a dividend from them. This presentation looks at selected practical applications of smart-meter data: forecasting using smart-meter data as inputs, customer segmentation, and revenue protection. This presentation aims to show some techniques that can be used to effectively manage and analyze the large amounts of data generated by these devices in order to generate business value.

All the documentation about the creation of graphs with SAS^® software states that ODS Graphics is not intended to replace SAS/GRAPH^®. However, ODS Graphics is included in the Base SAS^® license from SAS^® 9.3, but SAS/GRAPH still requires an additional component license, so there is definitely a financial incentive to convert to ODS Graphics. This paper gives examples that can be used to replace commonly created SAS/GRAPH plots, and highlights the small number of plots that are still very difficult, or impossible, to create in ODS Graphics.

Businesses today are inundated with unstructured data, not just social media but books, blogs, articles, journals, manuscripts, and even detailed legal documents. Manually managing unstructured data can be time consuming and frustrating, and might not yield accurate results. Having an analyst read documents often introduces bias because analysts have their own experiences, and those experiences help shape how the text is interpreted. The fact that people become fatigued can also impact the way that the text is interpreted. Is the analyst as motivated at the end of the day as they are at the beginning? Data science involves using data management, analytical, and visualization strategies to uncover the story that the data is trying to tell in a more automated fashion. This is important with structured data but becomes even more vital with unstructured data. Introducing automated processes for managing unstructured data can significantly increase the value and meaning gleaned from the data. This paper outlines the data science processes necessary to ingest, transform, analyze, and visualize three Star Wars movie scripts: A New Hope, The Empire Strikes Back, and Return of the Jedi. It focuses on the need to create structure from unstructured data using SAS^® Data Management, SAS^® Text Miner, and SAS^® Content Categorization. The results are featured using SAS^® Visual Analytics.

Before you can analyze your big data, you need to prepare the data for analysis. This paper discusses capabilities and techniques for using the power of SAS^® to prepare big data for analytics. It focuses on how a SAS user can write code that will run in a Hadoop cluster and take advantage of the massive parallel processing power of Hadoop.

SAS^® platform installations are large, complex, growing, and ever-changing enterprise systems that support many diverse groups of users and content. A reliable metadata security implementation is critical for providing access to business resources in a methodical, organized, partitioned, and protected manner. With natural changes to users, groups, and folders from an organization's day-to-day activities, deviations from an original metadata security plan are very likely and can put protected resources at risk. Regular security testing can ensure compliance, but, given existing administrator commitments and the time-consuming nature of manual testing procedures, it doesn't tend to happen. This paper discusses concepts and outlines several example test specifications from an automated metadata security testing framework being developed by Metacoda. With regularly scheduled, automated testing, using a well-defined set of test rules, administrators can focus on their other work, and let alerts notify them of any deviations from a metadata security test specification.

Global businesses must react to daily changes in market conditions over multiple geographies and industries. Consuming reputable daily economic reports assists in understanding these changing conditions, but requires both a significant human time commitment and a subjective assessment of each topic area of interest. To combat these constraints, Dow's Advanced Analytics team has constructed a process to calculate sentence-level topic frequency and sentiment scoring from unstructured economic reports. Daily topic sentiment scores are aggregated to weekly and monthly intervals and used as exogenous variables to model external economic time series data. These models serve both to validate the relationship between our sentiment scoring process and the external economic series and to provide near-term forecasts where daily or weekly variables are unavailable. This paper will first describe our process of using SAS^® Text Miner to import and discover economic topics and sentiment from unstructured economic reports. The next section describes sentiment variable selection techniques that use SAS/STAT^®, SAS/ETS^®, and SAS^® Enterprise Miner^™ to generate similarity measures to economic indices. Our process then uses ARIMAX modeling in SAS^® Forecast Studio to create economic index forecasts with topic sentiments. Finally, we show how the sentiment model components are used as a matrix of economic key performance indicators by topic and geography.

Nowadays, in the Big Data era, Business Intelligence Departments collect, store, process, calculate, and monitor massive amounts of data. Nevertheless, sometimes hundreds of metrics built on the structured data are insufficient to explain why the offered deal sold better or worse than expected. The answer might be found in text data that every company owns, yet companies are often unaware of its possible uses or neglect its value. This project shows text mining methods, implemented in SAS^® Text Miner 12.1, that enable the determination of a deal's success or failure factors based on in-house or Internet-scattered customers' views and opinions. The study is conducted on data gathered from Groupon Sp. z o.o. (Polish business unit), an e-commerce company, as it is assumed that the market is by and large a customer-driven environment.

For decades, SAS^® has been the cornerstone of many organizations for business reporting. In more recent times, the ability to quickly determine the performance of an organization through the use of dashboards has become a requirement. Different ways of providing dashboard capabilities are discussed in this paper: using out-of-the-box solutions such as SAS^® Visual Analytics and SAS^® BI Dashboard, through to alternative solutions using SAS^® Stored Processes, batch processes, and SAS^® Integration Technologies. Extending the available indicators is also discussed, using Graph Template Language and KPI indicators provided with Base SAS^®, as well as alternatives such as Google Charts and Flash objects. Real-world field experience, problem areas, solutions, and tips are shared, along with live examples of some of the different methods.

Raking (iterative proportional fitting) is a procedure that takes sampling weights from complex sample surveys and adjusts them so that they add to known control totals. This process reduces variance and adjusts for undercoverage. But raking in multiple dimensions can lead to extreme weights, which increase variance. Trimming is another sample weighting procedure that reduces extreme weights to cutoffs, thereby improving variance properties while potentially introducing bias. The RAKE-TRIM macro combines raking and trimming in an iterative algorithm to achieve these two goals simultaneously. The raking reduces the bias potential from trimming, and the trimming reduces the variance inflation from raking. When convergence occurs, the final weights aggregate to the control totals, as well as respect the trimming limits. SAS^® macros are well suited for this kind of envelope program: the larger macro consists of the integration of component macros that were developed for other applications. A parameter specification sheet enables users to provide all of the parameters needed to define the algorithm for their particular situation, and, if necessary, to alter the parameters to facilitate convergence. Diagnostics are included when convergence fails. Microsoft Excel tables are imported to provide the cell structure and are exported to provide statistics for the algorithm's results. This RAKE-TRIM macro was first developed in 2010 for the 2009 National Household Transportation Survey and has been used in other studies as well. The paper describes the algorithm and discusses our experiences with it.
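
The alternation of raking and trimming can be sketched for a two-dimensional case. The Python below is a minimal illustration under stated assumptions and is not the RAKE-TRIM macro: the function name, the fixed lower and upper cutoffs, the convergence rule (largest relative deviation from any control total after trimming), and the four-unit example are all hypothetical.

```python
def rake_and_trim(weights, cats_a, cats_b, totals_a, totals_b,
                  lower, upper, tol=1e-6, max_iter=1000):
    """Alternate raking on two dimensions with trimming until the trimmed
    weights reproduce both sets of control totals within tol.
    Returns (weights, converged, iterations_used)."""
    w = list(weights)

    def margins(cats):
        sums = {}
        for wi, c in zip(w, cats):
            sums[c] = sums.get(c, 0.0) + wi
        return sums

    def rake(cats, totals):
        sums = margins(cats)
        for i, c in enumerate(cats):
            w[i] *= totals[c] / sums[c]

    def max_rel_error(cats, totals):
        sums = margins(cats)
        return max(abs(sums[c] - t) / t for c, t in totals.items())

    for iteration in range(1, max_iter + 1):
        rake(cats_a, totals_a)                                   # match dimension A totals
        rake(cats_b, totals_b)                                   # match dimension B totals
        w[:] = [min(max(wi, lower), upper) for wi in w]          # trim to the cutoffs
        error = max(max_rel_error(cats_a, totals_a),
                    max_rel_error(cats_b, totals_b))
        if error < tol:
            return w, True, iteration
    return w, False, max_iter

# Hypothetical example: four units in a 2 x 2 layout, upper cutoff of 40.
final_w, converged, n_iter = rake_and_trim(
    weights=[1.0, 1.0, 1.0, 1.0],
    cats_a=["a1", "a1", "a2", "a2"],
    cats_b=["b1", "b2", "b1", "b2"],
    totals_a={"a1": 30.0, "a2": 70.0},
    totals_b={"b1": 40.0, "b2": 60.0},
    lower=5.0, upper=40.0,
)
print(converged, n_iter, [round(x, 3) for x in final_w])
```

Without trimming, raking alone would give the last unit a weight of 42 in this example; with the cutoff at 40 the iteration redistributes the excess across the other units until both sets of control totals are met again. When no set of weights can satisfy the totals and the cutoffs together, the loop ends with `converged` set to `False`, which corresponds to the situation in which the abstract says parameters may need to be altered.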

In the traveling salesman problem, a salesman must minimize travel distance while visiting each of a given set of cities exactly once. This paper uses the SAS/OR^® OPTMODEL procedure to formulate and solve the traveling baseball fan problem, which complicates the traveling salesman problem by incorporating scheduling constraints: a baseball fan must visit each of the 30 Major League ballparks exactly once, and each visit must include watching a scheduled Major League game. The objective is to minimize the time between the start of the first attended game and the end of the last attended game. One natural integer programming formulation involves a binary decision variable for each scheduled game, indicating whether the fan attends. But a reformulation as a side-constrained network flow problem yields much better solver performance.

This new SAS^® tool is a two-dimensional color chart for visualizing changes in a population or in a system over time. Data for one point in time appear as a thin horizontal band of color. Bands for successive periods are stacked up to make a two-dimensional plot, with the vertical direction showing changes over time. As a system evolves over time, different kinds of events have different characteristic patterns. Creation of Time Contour plots is explained step-by-step. Examples are given in astrostatistics, biostatistics, econometrics, and demographics.

SAS^® Management Console was designed to control and monitor virtually all of the parts and features of the SAS^® Intelligence Platform. However, administering even a small SAS^® Business Intelligence system can be a daunting task. This paper presents a few techniques that will help you simplify your administrative tasks and enable you and your user community to get the most out of your system. The SAS^® Metadata Server stores most of the information required to maintain and run the SAS Intelligence Platform, which is obviously the heart of SAS BI. It stores information about libraries, users, database logons, passwords, stored processes, reports, OLAP cubes, and a myriad of other information. Organization of this metadata is an essential part of an optimally performing system. This paper discusses ways of organizing the metadata to serve your organization well. It also discusses some of the key features of SAS Management Console and best practices that will assist the administrator in defining roles, promoting, archiving, backing up, securing, and simply just organizing the data so that it can be found and accessed easily by administrators and users alike.

Two new production features offered in the Output Delivery System (ODS) in SAS^® 9.4 are ODS LAYOUT and the ODS Report Writing Interface. This one-two punch gives you power and flexibility in structuring your SAS^® output. What are the strengths of each? How do they differ? How do they interact? This paper highlights the similarities and differences between the two and illustrates the advantages of using them together. Why go twelve rounds? Make your report a knockout with ODS LAYOUT and the Report Writing Interface.

This paper introduces basic-to-advanced strategies and syntax, the tools of the SAS^® trade, that enable client-quality PDF output to be delivered through a production system of macro programs. A variety of PROC REPORT output with proven client value serves to illustrate a discussion of the fundamental syntax used to create and share formats, macro programs, PROC REPORT output, inline styles, and style templates. The syntax is integrated into basic macro programs that demonstrate the core functionality of the reporting system. Later sections of the paper describe in detail the macro programs used to start and end a PDF: (a) programs to save all current titles, footnotes, and option settings, establish standard titles, footnotes, and option settings, and initially create the PDF document; and (b) programs to create a final standard data documentation page, end the PDF, and restore all original titles, footnotes, and option settings. The paper also shows how macro programs enable the setting of inline styles at the global, macro program, and macro program call levels. The paper includes the style template syntax and the complete PROC REPORT syntax generated by the macro programs, and is designed for the intermediate to advanced SAS programmer using Foundation SAS^® for Release 9.2 on a Windows operating system.

One of the most striking features separating SAS^® from other statistical languages is that SAS has native SQL (Structured Query Language) capacity. In addition to the merging or the querying that a SAS user commonly applies in daily practice, SQL significantly enhances the power of SAS in descriptive statistics and data management. In this paper, we show reproducible examples to introduce 10 useful tips for the SQL procedure in the BASE module.

Do you often create SAS^® web applications? Do you need to update or retrieve values from a SAS data set and display them in a browser? Do you need to show the results of a SAS^® Stored Process in a browser? Are you finding it difficult to figure out how to pass parameters from a web page to a SAS Stored Process? If you answered yes to any of these questions, then look no further. Techniques shown in this paper include: How to take advantage of JavaScript and minimize PUT statements. How to call a SAS Stored Process from your web page by using JavaScript and XMLHttpRequest. How to pass parameters from a web page to a SAS Stored Process and from a SAS Stored Process back to the web page. How to use simple Ajax to refresh and update a specific part of a web page without the need to reload the entire page. How to apply Cascading Style Sheets (CSS) on your web page. How to use some of the latest HTML5 features, like drag and drop. How to display run-time graphs in your web page by using STATGRAPH and PROC SGRENDER. This paper contains sample code that demonstrates each of the techniques.

The independent means t-test is commonly used for testing the equality of two population means. However, this test is very sensitive to violations of the population normality and homogeneity of variance assumptions. In such situations, Yuen's (1974) trimmed t-test is recommended as a robust alternative. The purpose of this paper is to provide a SAS^® macro that allows easy computation of Yuen's symmetric trimmed t-test. The macro output includes a table with trimmed means for each of two groups, Winsorized variance estimates, degrees of freedom, and obtained value of t (with two-tailed p-value). In addition, the results of a simulation study are presented and provide empirical comparisons of the Type I error rates and statistical power of the independent samples t-test, Satterthwaite's approximate t-test, and the trimmed t-test when the assumptions of normality and homogeneity of variance are violated.
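
The quantities listed above (trimmed means, Winsorized variances, degrees of freedom, and t) can be computed with the formulas usually given for Yuen's test. The Python below is a sketch of those standard formulas, not the paper's SAS macro: the function names, the default 20% trimming proportion, and the sample data are assumptions, and the p-value is omitted because it needs a t-distribution CDF that the standard library does not provide.

```python
import math

def winsorized_variance(sorted_x, g):
    """Sample variance after replacing the g smallest and g largest values
    with the nearest retained values."""
    n = len(sorted_x)
    low, high = sorted_x[g], sorted_x[n - g - 1]
    wins = [min(max(v, low), high) for v in sorted_x]
    m = sum(wins) / n
    return sum((v - m) ** 2 for v in wins) / (n - 1)

def yuen_t(x, y, trim=0.2):
    """Return (t, df, trimmed_mean_x, trimmed_mean_y) for symmetric trimming."""
    stats = []
    for sample in (x, y):
        s = sorted(sample)
        n = len(s)
        g = int(math.floor(trim * n))   # observations trimmed from each tail
        h = n - 2 * g                   # observations kept
        tmean = sum(s[g:n - g]) / h
        d = (n - 1) * winsorized_variance(s, g) / (h * (h - 1))
        stats.append((tmean, d, h))
    (m1, d1, h1), (m2, d2, h2) = stats
    t = (m1 - m2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df, m1, m2

# Hypothetical data: the first group contains one extreme value.
t, df, mean_x, mean_y = yuen_t([1, 2, 3, 4, 100], [2, 4, 6, 8, 10], trim=0.2)
print(f"t = {t:.4f}, df = {df:.4f}, trimmed means = {mean_x}, {mean_y}")
```

With `trim=0` nothing is trimmed and the statistic and degrees of freedom reduce to those of the Satterthwaite (Welch) approximate t-test, which is a convenient check on the implementation.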

As SAS^® professionals, we often wish our clients would make more use of the many excellent SAS tools at their disposal. However, it remains an indisputable fact that for many business users, Microsoft Excel is still their go-to application when it comes to carrying out any form of data analysis. There have been many attempts to integrate SAS and Excel, but none of these has up to now been entirely seamless. This paper addresses that problem by showing how, with a minimum of VBA (Visual Basic for Applications) code and by using the SAS Integrated Object Model (IOM) together with Microsoft's ActiveX Data Objects (ADO), we can create an Excel User Defined Function (UDF) that can accept parameters, carry out all data manipulations in SAS, and return the result to the spreadsheet in a way that is completely invisible to the user. They can nest or link these functions together just as if they were native Excel functions. We then go on to demonstrate how, using the same techniques, we can create small Excel applications that can perform sophisticated data analyses in SAS while not forcing users out of their Excel comfort zones.

The DOW-loop is not official terminology that one can find in SAS^® documentation, but it has been well known and widely used among experienced SAS programmers. The DOW-loop was developed over a decade ago by a few SAS gurus, including Don Henderson, Paul Dorfman, and Ian Whitlock. A common construction of the DOW-loop consists of a DO-UNTIL loop with a SET and a BY statement within the loop. This construction isolates actions that are performed before and after the loop from the action within the loop, which results in eliminating the need for retaining or resetting the newly created variables to missing in the DATA step. In this talk, in addition to explaining the DOW-loop construction, we review how to apply the DOW-loop to various applications.

You have built the simple bar chart and mastered the art of layering multiple plot statements to create complex graphs like the Survival Plot using the SGPLOT procedure. You know all about how to use plot statements creatively to get what you need and how to customize the axes to achieve the look and feel you want. Now it's time to up your game and step into the realm of the Graphics Wizard. Behold the magical powers of Graph Template Language Layouts! Here you will learn the esoteric art of creating complex multi-cell graphs using LAYOUT LATTICE. This is the incantation that gives you the power to build complex, multi-cell graphs like the Forest plot, Stock plots with multiple indicators like MACD and Stochastics, Adverse Events by Relative Risk graphs, and more. If you ever wondered how the Diagnostics panel in the REG procedure was built, this paper is for you. Be warned, this is not the realm for the faint of heart!

When deploying SAS^® code into a production environment, a programmer should ensure that the code satisfies the following key criteria: The code runs without errors. The code performs operations consistent with the agreed-upon business logic. The code is not dependent on manual human intervention. The code performs necessary checks in order to provide sufficient quality control of the deployment process. Base SAS^® programming offers a wide range of techniques to support the last two of these criteria. This presentation demonstrates the use of SAS^® macro variables in combination with simple macro programs to perform a number of routine automated tasks that are often part of production-ready code. Some of the examples to be demonstrated include the following topics: How to check that required key parameters for a successful program run are populated in the parameters file. How to automatically copy the content of the permanent folder to the newly created backup folder. How to automatically update the log file with new run information. How to check whether a data set already exists in the library.

Everyone has heard about SAS^® Cloud. Now come learn how you can build and manage your own cloud using the same SAS^® virtual application (vApp) technology.

Epidemic modeling is an increasingly important tool in the study of infectious diseases. As technology advances and more and more parameters and data are incorporated into models, it is easy for programs to get bogged down and become unacceptably slow. The use of arrays for importing real data and collecting generated model results in SAS^® can help to streamline the process so results can be obtained and analyzed more efficiently. This paper describes a stochastic mathematical model for transmission of influenza among residents and healthcare workers in long-term care facilities (LTCFs) in New Mexico. The purpose of the model was to determine to what extent herd immunity among LTCF residents could be induced by varying the vaccine coverage among LTCF healthcare workers. Using arrays in SAS made it possible to efficiently incorporate real surveillance data into the model while also simplifying analyses of the results, which ultimately held important implications for LTCF policy and practice.

This session demonstrates how to use Base SAS^® tools to add functional, reusable extensions to the SAS^® system. Learn how to do the following: Write user-defined macro functions that can be used inline with any other SAS code. Use PROC FCMP to write and store user-defined functions that can be used in other SAS programs. Write DS2 user-defined methods and store them in packages for easy reuse in subsequent DS2 programs.

Have you found OS file permissions to be insufficient to tailor access controls to meet your SAS^® data security requirements? Have you found metadata permissions on tables useful for restricting access to SAS data, but then discovered that SAS programmers can avoid the permissions by issuing LIBNAME statements that do not use the metadata? Would you like to ensure that users have access to only particular rows or columns in SAS data sets, no matter how they access the SAS data sets? Metadata-bound libraries provide the ability to authorize access to SAS data by authenticated Metadata User and Group identities that cannot be bypassed by SAS programmers who attempt to avoid the metadata with direct LIBNAME statements. They also provide the ability to limit the rows and columns in SAS data sets that an authenticated user is allowed to see. The authorization decision is made in the bowels of the SAS^® I/O system, where it cannot be avoided when data is accessed. Metadata-bound libraries were first implemented in the second maintenance release of SAS^® 9.3 and were enhanced in SAS^® 9.4. This paper overviews the feature and discusses best practices for administering libraries bound to metadata and user experiences with bound data. It also discusses enhancements included in the first maintenance release of SAS 9.4.

Each year, 2.35 million road accident cases are recorded in the U.S. Among them, 37,000 are considered fatal. Road crashes cost USD 230.6 billion per year, or an average of USD 820 per person. Our aim is to identify the important factors that lead to vehicle collisions and to predict the injury risk involved in them. Data was collected from the National Automotive Sampling System (NASS) and contains 20,247 cases with 19 variables. Input variables describe the factors involved in an accident, such as Height, Age, Weight, Gender, Vehicle model year, Speed limit, Energy absorption in Collision & Deformation location, etc. The target variable is nominal, showing levels of injury. Missing values were imputed using the mean for interval variables and the count method for class variables. Multivariate analysis suggests high correlation between tire footprint and wheelbase (Corr=0.97, P<0.0001) and between original weight of car and curb weight of car (Corr=0.79, P<0.0001). Variables having high kurtosis values were transformed using range standardization. Variables were ranked by importance using decision tree analysis. Models such as multiple regression, polynomial regression, neural network, and decision tree were applied to the data set to identify the factors that are most significant in predicting the injury risk. A multilayer perceptron neural network proved to be the best model for predicting the injury risk index, with the lowest average squared error (0.086) on the validation data set.

In the past, calibration was done by using extremely complicated macros in Base SAS^® to create a Microsoft Excel workbook with multiple linked spreadsheets. This process was hard to audit, was not reliably replicable, and was open to user error. The task was to create a replicable, auditable, and locked-down application that allowed the user to change certain parameters and see the impact of those changes without needing to code. SAS^® Stored Processes are used to generate a screen that is split into three sections: one shows static reporting, the second is a data-driven custom input form, and the third shows test results. The initial screen uses a standard stored process that enables the user to select the model and time period. Macro variables are passed through to subset data. The static reports are created from a stored process that executes two REPORT procedures that subset the data based on the passed parameters. The form is built using SAS^® to generate HTML and is data driven. The Update button at the end of the form executes a stored process that collects the data that the user has entered into the form and updates a database. After the rates have been updated, they are used to generate test results using PROC REPORT.

The Affordable Care Act that is being implemented now is expected to fundamentally reshape the health care industry. All current participants--providers, subscribers, and payers--will operate differently under a new set of key performance indicators (KPIs). This paper uses public data and SAS^® software to establish a baseline for the health care industry today so that structural changes can be measured in the future to establish the impact of the new laws.

Health plans use wide-ranging interventions based on criteria set by nationally recognized organizations (for example, NCQA and CMS) to change health-related behavior in large populations. Evaluation of these interventions has become more important with the increased need to report patient-centered quality of care outcomes. Findings from evaluations can detect successful intervention elements and identify at-risk patients for further targeted interventions. This paper describes how SAS^® was applied to evaluate the effectiveness of a patient-directed intervention designed to increase medication adherence and a health plan's CMS Part D Star Ratings. Topics covered include querying data warehouse tables, merging pharmacy and eligibility claims, manipulating data to create outcome variables, and running statistical tests to measure pre-post intervention differences.

The Patient-Centered Outcomes Research Institute (PCORI) was created as part of the Affordable Care Act. PCORI is authorized by Congress to conduct research to provide information about the best available evidence to help patients and their health care providers make more informed decisions. Community Care Behavioral Health Organization in Pittsburgh, Pennsylvania was awarded a PCORI research grant to investigate health care system improvements for adults with serious mental illness. The grant, titled Optimizing Behavioral Health Homes by Focusing on Outcomes that Matter Most for Adults with Serious Mental Illness, began in January of 2013 and is ongoing. Information Technology staff at Community Care have leveraged SAS^® solutions in providing real-time data extraction and reports to support the development and implementation of this research project. SAS tools have been used to merge data from multiple platforms and database sources, including web data sources. SAS has also enabled the formatting and traffic lighting of multiple Microsoft Excel data sets and files, in addition to the creation of many operational reports and data files needed for study implementation, administration, and maintenance. The challenges faced and the SAS solutions employed are the subject of this paper.

Volatility estimation plays an important role in the fields of statistics and finance. Many different techniques address the problem of estimating the volatility of financial assets. Autoregressive conditional heteroscedasticity (ARCH) models and the related generalized ARCH (GARCH) models are popular models for volatility. This talk introduces the need for volatility modeling as well as the framework of ARCH and GARCH models. After a brief discussion of their structure, ARCH and GARCH models are compared to other volatility modeling techniques.
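
For readers unfamiliar with the models named in the abstract above, the standard textbook form of a GARCH(p, q) model is sketched below. The abstract gives no notation, so the symbols (r_t for the return, \mu for its mean, \varepsilon_t for the innovation, \sigma_t^2 for the conditional variance, z_t for a standardized shock) are assumptions chosen for illustration.

```latex
\begin{align}
  r_t &= \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad z_t \sim \text{i.i.d.}(0, 1), \\
  \sigma_t^2 &= \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2,
  \qquad \omega > 0, \quad \alpha_i \ge 0, \quad \beta_j \ge 0.
\end{align}
```

Setting every \beta_j to zero removes the lagged-variance terms and recovers the ARCH(q) model, in which the conditional variance depends only on past squared innovations.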

In connection with the consolidation work at Nykredit, the data stored on the Nykredit z/OS SAS^® installation had to be migrated (copied) to the new x64 Windows SAS platform storage. However, getting an overview of these data on the z/OS mainframe can be difficult, and a series of questions arise during the process. For example: Who is responsible? How many bytes? How many rows and columns? When were the data created? And so on. With extensive use of filename FTP and looping, and extracting metadata, it is possible to get an overview of the data on the host presented in a Microsoft Excel spreadsheet.

Expensive physical capital must be regularly maintained for optimal efficiency and long-term insurance against damage. The maintenance process usually consists of constantly monitoring high-frequency sensor data and performing corrective maintenance when the expected values do not match the actual values. An economic system can also be thought of as a system that requires constant monitoring and occasional maintenance in the form of monetary or fiscal policy. This paper shows how to use the SSM procedure in SAS/ETS^® to make forecasts of expected values by using high-frequency multivariate time series. The paper also demonstrates the functionality of the new SASEFRED interface engine in SAS/ETS.

A European utility company has several thousand service engineers who provide its customers with services that range from performing routine maintenance to handling emergency breakdowns. Each service engineer is assigned to a work area that consists of a set of postal sectors. The company wants to understand how it should configure its work areas to improve customer satisfaction, minimize travel time for its full-time service engineers, and minimize the costs of overtime and subcontractor hours. This paper describes the use of SAS/OR^® optimization procedures to model this problem and configure optimal work areas, and the use of SAS^® Simulation Studio to simulate how the optimal configurations might satisfy the customer service requirements. The experimental results show that the proposed solution can satisfy customer demand within the desired service-time window, with significantly less travel time for the engineers, and with lower overtime and subcontractor costs.