SAS Global Forum 2014 Proceedings

Accelerated testing is an effective tool for predicting when systems fail, where the system can be as simple as an engine gasket or as complex as a magnetic resonance imaging (MRI) scanner. In particular, you might conduct an experiment to determine how factors such as temperature and voltage impose enough stress on the system to cause failure. Because system components usually meet nominal quality standards, it can take a long time to obtain failure data under normal-use conditions. An effective strategy is to accelerate the experiment by testing under abnormally stressful conditions, such as higher temperatures. Following that approach, you obtain data more quickly, and you can then analyze the data by using the RELIABILITY procedure in SAS/QC^® software. The analysis is a three-step process: you establish a probability model, explore the relationship between stress and failure, and then extrapolate to normal-use conditions. Graphs are a key component of all three stages: you choose a model by comparing residual plots from candidate models, use graphs to examine the stress-failure relationship, and then use an appropriately scaled graph to extrapolate along a straight line. This paper guides you through the process, and it highlights features added to the RELIABILITY procedure in SAS/QC 13.1.

The use of Bayesian methods has become increasingly popular in modern statistical analysis, with applications in numerous scientific fields. In recent releases, SAS^® has provided a wealth of tools for Bayesian analysis, with convenient access through several popular procedures in addition to the MCMC procedure, which is specifically designed for complex Bayesian modeling (not discussed here). This paper introduces the principles of Bayesian inference and reviews the steps in a Bayesian analysis. It then describes the Bayesian capabilities provided in four procedures (the GENMOD, PHREG, FMM, and LIFEREG procedures), including the available prior distributions, posterior summary statistics, and convergence diagnostics. Various sampling methods that are used to sample from the posterior distributions are also discussed. The second part of the paper describes how to use the GENMOD and PHREG procedures to perform Bayesian analyses for real-world examples and how to take advantage of the Bayesian framework to address scientific questions.

SAS/STAT^® 13.1 includes the new ICLIFETEST procedure, which is specifically designed for analyzing interval-censored data. This type of data is frequently found in studies where the event time of interest is known to have occurred not at a specific time but only within a certain time period. PROC ICLIFETEST performs nonparametric survival analysis of interval-censored data and is a counterpart to PROC LIFETEST, which handles right-censored data. With similar syntax, you use PROC ICLIFETEST to estimate the survival function and to compare the survival functions of different populations. This paper introduces you to the ICLIFETEST procedure and presents examples that illustrate how you can use it to perform analyses of interval-censored data.

Do you have an abstract for an idea that you want to submit as a proposal to SAS^® conferences, but you are not sure which section is the most appropriate one? In this paper, we discuss a methodology for automatically identifying the most suitable section or sections for your proposal content. We use SAS^® Text Miner 12.1 and SAS^® Content Categorization Studio 12.1 to develop a rule-based categorization model. This model is used to automatically score your paper abstract to identify the most relevant and appropriate conference sections to submit to for a better chance of acceptance.

The creation of production reports for our organization has historically been a labor-intensive process. Each month, our team produced around 650 SAS^® graphs and 30 tables, which were then copied and pasted into 16 custom Microsoft PowerPoint presentations, each between 20 and 30 pages. To reduce the number of manual steps, we converted to using stored processes and the SAS^® Add-In for Microsoft Office. This allowed us to simply refresh those 16 PowerPoint presentations by using SAS Add-In for Microsoft Office to run SAS^® Stored Processes. The stored processes generate the graphs and tables, while SAS Add-In for Microsoft Office refreshes the document with updated graphs already sized and positioned on the slides just as we need them. With this new process, we are realizing the dream of reducing the amount of time spent on a single monthly production process. This paper will discuss the steps for creating a complex PowerPoint presentation that is simply refreshed rather than created new each month. I will discuss converting the original code to stored processes using SAS^® Enterprise Guide^®, options and style statements that are required to continue to use a custom style sheet, and how to create the PowerPoint presentation with an assortment of output types including horizontal bar charts, control charts, and tables. I will also discuss some of the challenges and solutions specific to the stored process and PowerPoint Add-In that we encountered during this conversion process.

As IT professionals, saving time is critical. Delivering timely and quality-looking reports and information to management, end users, and customers is essential. SAS^® provides numerous 'canned' PROCedures for generating quick results to take care of these needs ... and more. In this hands-on workshop, attendees acquire basic insights into the power and flexibility offered by SAS PROCedures using PRINT, FORMS, and SQL to produce detail output; FREQ, MEANS, and UNIVARIATE to summarize and create tabular and statistical output; and DATASETS to manage data libraries. Additional topics include techniques for informing SAS which data set to use as input to a procedure, how to subset data using a WHERE statement (or WHERE= data set option), and how to perform BY-group processing.

As complicated as the macro language is to learn, there are very strong reasons for doing so. At its heart, the macro language is a code generator. In its simplest uses, it can substitute simple bits of code like variable names and the names of data sets that are to be analyzed. In more complex situations, it can be used to create entire statements and steps based on information that may even be unavailable to the person writing or even executing the macro. At the time of execution, it can be used to make queries of the SAS^® environment as well as the operating system, and utilize the gathered information to make informed decisions about how it is to further function and execute.

Because the macro language is primarily a code generator, it makes sense that the code that it creates must be generated before it can be executed. This implies that execution of the macro language comes first. Simple as this is in concept, timing issues and conflicts are often not so simple to recognize in application. As we use the macro language to take on more complex tasks, it becomes even more critical that we have an understanding of these issues.

Macro variables and their values are stored in symbol tables, which in turn are held in memory. Not only are there a number of ways to create macro variables, but they can be created in a wide variety of situations. How they are created and under what circumstances affects the variable's scope: how and where the macro variable is stored and retrieved. There are a number of misconceptions about macro variable scope and about how the macro variables are assigned to symbol tables. These misconceptions can cause problems that the new, and sometimes even the experienced, macro programmer does not anticipate. Understanding the basic rules for macro variable assignment can help the macro programmer solve some of these problems that are otherwise quite mystifying.
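
As a rough illustration of the scope question, the following Python sketch (not SAS code) models symbol tables as a chain of dictionaries. It assumes the commonly documented %LET behavior, in which an assignment updates the variable in the nearest enclosing table that already holds it and otherwise creates it in the current table. The class, method, and variable names are hypothetical, and other ways of creating macro variables may follow different rules.

```python
class SymbolTable:
    """One macro symbol table; parent is the enclosing scope (None for global)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.values = {}

    def find(self, var):
        """Return the nearest table that already holds var, or None."""
        table = self
        while table is not None:
            if var in table.values:
                return table
            table = table.parent
        return None

    def let(self, var, value):
        """Assumed %LET rule: update where found, else create in this table."""
        target = self.find(var) or self
        target.values[var] = value
        return target.name  # name of the table that now holds the value

    def resolve(self, var):
        """Look the variable up through the chain of enclosing tables."""
        table = self.find(var)
        if table is None:
            raise KeyError(var)
        return table.values[var]


global_table = SymbolTable("GLOBAL")
global_table.let("dsn", "sales")

local_table = SymbolTable("MYMACRO", parent=global_table)
where_dsn = local_table.let("dsn", "returns")  # already global, so updated there
where_tmp = local_table.let("tmp", "1")        # not found anywhere, so created locally
```

In this sketch, assigning `dsn` inside the hypothetical macro silently changes the global value, which is the kind of surprise the abstract alludes to.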

SAS^® Visual Analytics and the SAS^® LASR™ Analytic Server provide many capabilities to analyze data fast. Depending on your organization, data can be loaded as a self-service operation. Or, your data can be large and shared with many people. And, as data gets large, effectively loading it and keeping it updated become important. This presentation discusses the range of data scenarios from self-service spreadsheets to very large databases, from single-subject data to large star schema topologies, and from single-use data to continually updated data that requires high levels of resilience and monitoring. Fast and easy access to big data is important to empower your organization to make better business decisions. Understanding how to have a responsive and reliable data tier on which to make these decisions is within your reach.

Discover how SAS^® leverages field marketing programs to support AllAnalytics.com, a sponsored third-party community. This paper explores the use of SAS software, including SAS^® Enterprise Guide^®, SAS^® Customer Experience Analytics, and SAS^® Marketing Automation to enable marketers to have better insight, better targeting, and better response from SAS programs.

Missing data commonly occurs in medical, psychiatric, and social research. The SAS^® MI and MIANALYZE procedures are often used to generate multiple imputations and then provide valid statistical inferences based on them. However, MIANALYZE cannot be used to combine type-III analyses obtained from multiply imputed data sets. In this manuscript, we write a macro to combine the type-III analyses generated from the SAS MIXED procedure based on multiple imputations. The proposed method can be extended to other procedures reporting type-III analyses, such as GENMOD and GLM.

There has been debate regarding which method to use to analyze repeated measures continuous data when the design includes only two measurement times. Five different techniques can be applied and give similar results when there is little to no correlation between pre- and post-test measurements and when data at each time point are complete: 1) analysis of variance on the difference between pre- and post-test, 2) analysis of covariance on the differences between pre- and post-test controlling for pre-test, 3) analysis of covariance on post-test controlling for pre-test, 4) multivariate analysis of variance on post-test and pre-test, and 5) repeated measures analysis of variance. However, when there is missing data or if a moderate to high correlation between pre- and post-test measures exists under an intent-to-treat analysis framework, bias is introduced in the tests for the ANOVA, ANCOVA, and MANOVA techniques. A comparison of Type III sum of squares, F-tests, and p-values for a complete case and an intent-to-treat analysis is presented. The analysis using a complete case data set shows that all five methods produce similar results except for the repeated measures ANOVA due to a moderate correlation between pre- and post-test measures. However, significant bias is introduced for the tests using the intent-to-treat data set.

We used OPTNET to link hedge fund datasets from four vendors, covering overlapping populations, but with no universal identifier. This quick tip shows how to treat data records as nodes, use pairwise identifiers to generate distance measures, and get PROC OPTNET to assign clusters of records from all sources to each hedge fund. This proved to be far faster, and easier, than doing the same task in PROC SQL.
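
The idea of treating records as nodes and matched identifiers as links can be sketched in a few lines of Python. This is only an illustration of one plausible reading, clustering by connected components with a union-find structure; it is not PROC OPTNET code, it treats each pairwise match as a simple yes/no link rather than a distance measure, and the record IDs and function name are hypothetical.

```python
def cluster_records(records, links):
    """Group records into clusters: the connected components of the link graph."""
    parent = {r: r for r in records}

    def find(r):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical record IDs; the leading letter stands for the vendor.
records = ["A1", "A2", "B7", "C3", "D9"]
links = [("A1", "B7"), ("B7", "C3"), ("A2", "D9")]
fund_clusters = cluster_records(records, links)
```

Here `A1` and `C3` end up in the same cluster even though no identifier links them directly, because both are linked to `B7`.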

The evolution of the mobile landscape has created a shift in the workforce that now favors mobile devices over traditional desktops. Considering that today's workforce is not always in the office or at their desks, new opportunities have been created to deliver report content through innovative mobile experiences. SAS^® Mobile BI for both iOS and Android tablets complements the SAS^® Visual Analytics offering by providing anytime, anywhere access to reports containing information that consumers need. This paper presents best practices and tips on how to optimize reports for mobile users, taking into consideration the constraints of limited screen real estate and connectivity, and answers a few frequently asked questions. Discover how SAS Mobile BI captures the power of mobile reporting to prepare for the vast growth that is predicted in the future.

New York City boasts a wide variety of cuisine owing to the rich tourism and the vibrant immigrant population. The quality of food and hygiene maintained at the restaurants serving different cuisines has a direct impact on the people dining in them. The objective of this paper is to build a model that predicts the grade of the restaurants in New York City. It also provides deeper statistical insights into the distribution of restaurants, cuisine categories, grades, criticality of violations, etc., and concludes with the sequence analysis performed on the complete set of violations recorded for the restaurants at different time periods over the years 2012 and 2013. The data for 2013 is used to test the model. The data set consists of 15 variables that capture restaurant location-specific and violation details. The target is an ordinal variable with three levels, A, B, and C, in descending order of the quality representation. Various SAS^® Enterprise Miner^™ models, logistic regression, decision trees, neural networks, and ensemble models are built and compared using validation misclassification rate. The stepwise regression model appears to be the best model, with prediction accuracy of 75.33%. The regression model is trained at step 3. The number of critical violations at 8.5 gives the root node for the split of the target levels, and the rest of the tree splits are guided by the predictor variables such as number of critical and non-critical violations, number of critical violations for the year 2011, cuisine group, and the borough.

The implicit loop refers to the DATA step repetitively reading data and creating observations, one at a time. The explicit loop, which uses the iterative DO, DO WHILE, or DO UNTIL statements, is used to repetitively execute certain SAS^® statements within each iteration of the DATA step execution. Explicit loops are often used to simulate data and to perform a certain computation repetitively. However, when an explicit loop is used along with array processing, the applications are extended widely, which includes transposing data, performing computations across variables, and so on. To be able to write a successful program that uses loops and arrays, one needs to know the contents in the program data vector (PDV) during the DATA step execution, which is the fundamental concept of DATA step programming. This workshop covers the basic concepts of the PDV, which is often ignored by novice programmers, and then illustrates how to use loops and arrays to transform lengthy code into more efficient programs.

The availability of specialized programming and analysis resources in academic medical centers is often limited, creating a significant challenge for clinical research. The current work describes how Base SAS^® and SAS^® Enterprise Guide^® are being used to empower research staff so that they are less reliant on these scarce resources.

This paper is based on the belief that debugging your programs is not only necessary, but also a good way to gain insight into how SAS^® works. Once you understand why you got an error, a warning, or a note, you'll be better able to avoid problems in the future. In other words, people who are good debuggers are good programmers. This paper covers common problems including missing semicolons and character-to-numeric conversions, and the tricky problem of a DATA step that runs without suspicious messages but, nonetheless, produces the wrong results. For each problem, the message is deciphered, possible causes are listed, and how to fix the problem is explained.

The worst part of going to school is having to show up. However, data shows that those who do show up are the ones that are going to be the most successful (Johnson, 2000). As shown in a study done in Minneapolis, students who were in class at least 95% of the time were twice as likely to pass state tests (Johnson, 2000). Studies have been conducted and show that school districts that show interest in attendance have higher achievement in students (Reeves, 2008). The goal in doing research on student attendance is to find out the patterns of when people are missing class and why they are absent. The data comes directly from the Phillip O. Berry High School Attendance Office; with around 1600 students, there is plenty of data to be used from the 2012-2013 school year. Using Base SAS^® 9.3, after importing the data from Microsoft Excel, a series of PROC FORMAT and PROC GCHART steps was used to output and analyze the data. The data showed the days of the week and period that students missed the most, depending on grade level. The data shows that freshmen and seniors were the most likely to be absent on a given day. Based on the data, attendance continues to be an issue; therefore, school districts need to take an active role in developing attendance policies.

Logistic regression is a powerful technique for predicting the outcome of a categorical response variable and is used in a wide range of disciplines. Until recently, however, this methodology was available only for data that were collected using a simple random sample. Thanks to the work of statisticians such as Binder (1983), logistic modeling has been extended to data that are collected from a complex survey design that includes strata, clusters, and weights. Through examples, this paper provides guidance on how to use PROC SURVEYLOGISTIC to apply logistic regression modeling techniques to data that are collected from a complex survey design. The examples relate to calculating odds ratios for models with interactions, scoring data sets, and producing ROC curves. As an extension of these techniques, a final example shows how to fit a Generalized Estimating Equations (GEE) logit model.

SAS^® can easily perform calculations and export the result to Microsoft Excel in a report. However, sometimes you need Excel to have a formula or a function in a cell and not just a number. Whether it's for a boss who wants to see a SUM formula in the total cell or to have automatically updating reports that can be sent to people who don't use SAS to be completed, exporting formulas to Excel can be very powerful. This paper illustrates how, by using PROC REPORT and PROC PRINT along with the ExcelXP tagset, you can easily export formulas and functions into Excel directly from SAS. The method outlined in this paper requires Base SAS^® 9.1 or higher and Excel 2002 or later and requires a basic understanding of the ExcelXP tagset.

Data is often stored in highly normalized ("tall and skinny") structures that are not convenient for analysis. The SAS^® programmer frequently needs to transform the data to arrange relevant variables together in a single row. Sometimes this is a simple matter of using the TRANSPOSE procedure to flip the values of a single variable into separate variables. However, when there are multiple variables to be transposed to a single row, it might require multiple transpositions to obtain the desired result. This paper describes five different ways to achieve this flip-flop, explains how each method works, and compares the usefulness of each method in various situations. Emphasis is given to achieving a data-driven solution that minimizes hard-coding based on prior knowledge of the possible values each variable can have and that improves maintainability and reusability of the code. The intended audience is novice and intermediate SAS programmers who have a basic understanding of the DATA step and the TRANSPOSE procedure.
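
To make the tall-to-wide idea concrete, here is a minimal Python sketch (not SAS code, and not one of the paper's five methods) that flips several variables into a single row per ID. Output column names are built from the values found in the data, which is the data-driven property the abstract emphasizes. The function name, column names, and sample rows are hypothetical.

```python
def tall_to_wide(rows, id_key, name_key, value_keys):
    """Collapse tall rows into one wide row per id.

    New column names are derived from the data (value of name_key),
    so no list of expected values has to be hard-coded.
    """
    wide = {}
    for row in rows:
        out = wide.setdefault(row[id_key], {id_key: row[id_key]})
        for value_key in value_keys:
            out[f"{value_key}_{row[name_key]}"] = row[value_key]
    return list(wide.values())


# Hypothetical tall data: one row per subject per visit.
tall = [
    {"id": 1, "visit": "v1", "weight": 70, "bp": 120},
    {"id": 1, "visit": "v2", "weight": 71, "bp": 118},
    {"id": 2, "visit": "v1", "weight": 82, "bp": 130},
]
wide = tall_to_wide(tall, "id", "visit", ["weight", "bp"])
```

Subject 2 has no second visit, so its wide row simply lacks the `_v2` columns; a fuller version would decide how to represent such gaps.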

This introductory presentation is intended for an audience new to mixed models who wants to get an overview of this useful class of models. Learn about mixed models as an extension of ordinary regression models, and see several examples of mixed models in social, agricultural, and pharmaceutical research.

For decades, mixed models have been used by researchers to account for random sources of variation in regression-type models. Now, they are gaining favor in business statistics for giving better predictions for naturally occurring groups of data, such as sales reps, store locations, or regions. Learn about how predictions based on a mixed model differ from predictions in ordinary regression and see examples of mixed models with business data.

When the dependent variable is a count, the Poisson distribution is a natural choice for fitting a regression model. This presentation is intended for an audience experienced in linear regression modeling, but new to Poisson regression modeling. Learn the basics of this useful distribution and see some examples where it is appropriate. Tips for identifying problems with fitting a Poisson regression model and some helpful alternatives are provided.

Analyzing data from a complex probability survey involves weighting observations so that inferences are correct. This introductory presentation is intended for an audience new to analyzing survey data. Learn the essentials of using the SURVEYxx procedures in SAS/STAT^®.

Have you ever seen SAS^® Visual Analytics reports that are somehow more elegant than a standard report? Which qualities make reports easier to navigate, more appealing to the eye, or reveal insights more quickly? These quick tips will reveal several SAS Visual Analytics report design characteristics to help make your reports stand out from the pack. We cover concepts like color palettes, content organization, interactions, labeling, and branding, just to name a few.

Beginning with SAS^® 9.2, ODS Graphics introduces a whole new way of generating graphs using SAS^®. With just a few lines of code, you can create a wide variety of high-quality graphs. This paper covers the three basic ODS Graphics procedures: SGPLOT, SGPANEL, and SGSCATTER. SGPLOT produces single-celled graphs. SGPANEL produces multi-celled graphs that share common axes. SGSCATTER produces multi-celled graphs that might use different axes. This paper shows how to use each of these procedures in order to produce different types of graphs, how to send your graphs to different ODS destinations, how to access individual graphs, and how to specify properties of graphs, such as format, name, height, and width.

SAS^® High-Performance Analytics is a significant step forward in the area of high-speed, analytic processing in a scalable clustered environment. However, Big Data problems generally come with data from lots of data sources, at varying levels of maturity. Teradata's innovative Unified Data Architecture (UDA) represents a significant improvement in the way that large companies can think about Enterprise Data Management, including the Teradata Database, Hortonworks Hadoop, and Aster Data Discovery platform in a seamless integrated platform. Together, the two platforms provide business users, analysts, and data scientists with the ideally suited data management platforms, targeted specifically to their analytic needs, based upon analytic use cases, managed in a single integrated enterprise data management environment. The paper will focus on how several companies today are using Teradata's Integrated Hardware and Software UDA Platform to manage a single enterprise analytic environment, fight the ongoing proliferation of analytic data marts, and speed their operational analytic processes.

Organizations today make numerous decisions within their businesses that affect almost every aspect of their daily operations. Many of these decisions are now automatically generated by sophisticated enterprise decision management systems. These decisions include what offers to make to customers, sales transaction processing, payment processing, call center interactions, industrial maintenance, transportation scheduling, and thousands of other applications that all have a significant impact on the business bottom line. Concurrently, many of these same companies have developed or are now developing analytics that provide valuable insight into their customers, their products, and their markets. Unfortunately, many of the decision systems cannot maximize the power of analytics in the business processes at the point where the decisions are made. SAS^® Decision Manager is a new product that integrates analytical models with business rules and deploys them to operational systems where the decisions are made. Analytically driven decisions can be monitored, assessed, and improved over time. This paper describes the new product and its use and shows how models and business rules can be joined into a decision process and deployed to either batch processes or to real-time web processes that can be consumed by business applications.

This session introduces frailty models and their use in biostatistics to model time-to-event or survival data. The session uses examples to review situations in which a frailty model is a reasonable modeling option, to describe which SAS^® procedures can be used to fit frailty models, and to discuss the advantages and disadvantages of frailty models compared to other modeling options.

Traditional SAS^® programs typically consist of a series of SAS DATA steps, which refine input data sets until the final data set or report is reached. SAS DATA steps do not run in-database. However, SAS^® Enterprise Guide^® users can replicate this kind of iterative programming and have the resulting process flow run in-database by linking a series of SAS Enterprise Guide Query Builder tasks that output SAS views pointing at data that resides in a Teradata database, right up to the last Query Builder task, which generates the final data set or report. This session both explains and demonstrates this functionality.

Breast cancer is the most common cancer among females globally. After being diagnosed and treated for breast cancer, patients fear the recurrence of breast cancer. Breast cancer recurrence (BCR) can be defined as the return of breast cancer after primary treatment, and it can recur within the first three to five years. BCR studies have been conducted mostly in developed countries such as the United States, Japan, and Canada. Thus, the primary aim of this study is to investigate the feasibility of building a medical scorecard to assess the risk of BCR among Malaysian women. The medical scorecard was developed using data from 454 out of 1,149 patients who were diagnosed and underwent treatment at the Department of Surgery, Hospital Kuala Lumpur from 2006 until 2011. The outcome variable is a binary variable with two values: 1 (recurrence) and 0 (remission). Based on the availability of data, only 13 categorical predictors were identified and used in this study. The predictive performance of the Breast Cancer Recurrence scorecard (BCR scorecard) model was compared to the standard logistic regression (LR) model. Both the BCR scorecard and LR model were developed using SAS^® Enterprise Miner^™ 7.1. From this exploratory study, although the BCR scorecard model has better predictive ability with a lower misclassification rate (18%) compared to the logistic regression model (23%), the sensitivity of the BCR scorecard model is still low, possibly due to the small sample size and small number of risk factors. Five important risk factors for predicting recurrence status were identified: histological type, race, stage, tumor size, and vascular invasion.

The linear logistic test model (LLTM) that incorporates the cognitive task characteristics into the Rasch model has been widely used for various purposes in educational contexts. However, the LLTM assumes that the variance of item difficulties is completely accounted for by cognitive attributes. To overcome the disadvantages of the LLTM, Janssen and colleagues (2004) proposed the crossed random-effects (CRE) LLTM by adding the error term on item difficulty. This study examines the accuracy and precision of the CRE-LLTM in terms of parameter estimation for cognitive attributes. The effect of different factors (for example, sample size, population distributions, sparse or dense matrices, and test length) is examined. PROC GLIMMIX was used to do the analysis and SAS/IML^® software was used to generate data.

The use of predictive models in healthcare has steadily increased over the decades. Statistical models now are assumed to be a necessary component in population health management. This session will review practical considerations in the choice of models to develop, criteria for assessing the utility of the models for production, and challenges with incorporating the models into business process flows. Specific examples of models will be provided based upon work by the Health Economics team at Blue Cross Blue Shield of North Carolina.

Sparse data sets are common in applications of text and data mining, social network analysis, and recommendation systems. In SAS^® software, sparse data sets are usually stored in the coordinate list (COO) transactional format. Two major drawbacks are associated with this sparse data representation: First, most SAS procedures are designed to handle dense data and cannot consume data that are stored transactionally. In that case, the options for analysis are significantly limited. Second, a sparse data set in transactional format is hard to store and process in distributed systems. Most techniques require that all transactions for a particular object be kept together; this assumption is violated when the transactions of that object are distributed to different nodes of the grid. This paper presents some different ideas about how to package all transactions of an object into a single row. Approaches include storing the sparse matrix densely, doing variable selection, doing variable extraction, and compressing the transactions into a few text variables by using Base64 encoding. These simple but effective techniques enable you to store and process your sparse data in better ways. This paper demonstrates how to use SAS^® Text Miner procedures to process sparse data sets and generate output data sets that are easy to store and can be readily processed by traditional SAS modeling procedures. The output of the system can be safely stored and distributed in any grid environment.
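
The idea of compressing all transactions of an object into a text variable can be sketched as follows. This Python example (not SAS code) groups coordinate-list triples by object, packs each object's (column, value) pairs into bytes, and Base64-encodes them into one string per row. The binary layout (a 4-byte unsigned column index followed by an 8-byte double, little-endian), the function names, and the sample data are assumptions made for illustration; the paper's actual encoding is not described in the abstract.

```python
import base64
import struct

PAIR_FORMAT = "<Id"  # assumed layout: uint32 column index + float64 value


def pack_rows(coo):
    """Turn (row_id, col_index, value) triples into one Base64 string per row_id."""
    grouped = {}
    for row_id, col, value in coo:
        grouped.setdefault(row_id, []).append((col, value))
    packed = {}
    for row_id, pairs in grouped.items():
        raw = b"".join(struct.pack(PAIR_FORMAT, col, value)
                       for col, value in sorted(pairs))
        packed[row_id] = base64.b64encode(raw).decode("ascii")
    return packed


def unpack_row(text):
    """Recover the sorted (col_index, value) pairs from one packed string."""
    raw = base64.b64decode(text)
    return list(struct.iter_unpack(PAIR_FORMAT, raw))


# Hypothetical transactions for two documents.
coo = [("doc1", 5, 2.0), ("doc2", 1, 1.0), ("doc1", 0, 3.5)]
packed = pack_rows(coo)
doc1_pairs = unpack_row(packed["doc1"])
```

Because each object becomes a single self-contained row, rows can be spread across nodes without separating the transactions that belong together.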

Many SAS^® procedures use classification variables when they are processing the data. These variables control how the procedure forms groupings, summarizations, and analysis elements. For statistics procedures, they are often used in the formation of the statistical model that is being analyzed. Classification variables can be explicitly specified with a CLASS statement, or they can be specified implicitly from their usage in the procedure. Because classification variables have such a heavy influence on the outcome of so many procedures, it is essential that the analyst have a good understanding of how classification variables are applied. Certainly there are a number of options (system and procedural) that affect how classification variables behave. While you may be aware of some of these options, a great many are new, and some of these new options and techniques are especially powerful. You really need to be open to learning how to program with CLASS.

Spinal epidural abscess (SEA) is a serious complication in hemodialysis (HD) patients, yet there is little medical literature that discusses it. This analysis identified risk factors and co-morbidities associated with SEA, as well as risk factors for mortality following the diagnosis. All incident HD cases from the United States Renal Data System for calendar years 2005-2008 were queried for a diagnosis of SEA. Potential clinical covariates, survival, and risk factors were recovered using ICD-9 diagnosis codes. Log-binomial regressions were performed using PROC GENMOD to assess the relative risks, and Cox regression models were run using PROC PHREG to estimate hazard ratios for mortality. For the 4-year study period, 660/355084 (0.19%) HD patients were identified with SEA, the largest cohort to date. Older age (RR=1.625), infectious comorbidities including bacteremia (RR=7.7976), methicillin-resistant Staphylococcus aureus infection (RR=2.6507), hepatitis C (RR=1.545), and non-infectious factors including diabetes (RR=1.514) and presence of vascular catheters (RR=1.348) were identified as significant risk factors for SEA. SEA in HD patients was associated with an increased risk of death (HR=1.20). Older age (HR=2.269), the presence of dialysis catheters (HR=1.884), cirrhosis (HR=1.715), decubitus ulcers (HR=1.669), bacteremia (HR=1.407), and total parenteral nutrition (HR=1.376) constitute the greatest risk factors for death after SEA diagnosis and thus necessitate a comprehensive approach to management.

Guidelines from the International Conference on Harmonisation (ICH) suggest that clinical trial data should be actively monitored to ensure data quality. Traditional interpretation of this guidance has often led to 100 percent source data verification (SDV) of respective case report forms through on-site monitoring. Such monitoring activities can also identify deficiencies in site training and uncover fraudulent behavior. However, such extensive on-site review is time-consuming, expensive and, as is true for any manual effort, limited in scope and prone to error. In contrast, risk-based monitoring makes use of central computerized review of clinical trial data and site metrics to determine whether sites should receive more extensive quality review through on-site monitoring visits. We demonstrate a risk-based monitoring solution within JMP^® Clinical to assess clinical trial data quality. Further, we describe a suite of tools used for identifying potentially fraudulent data at clinical sites. Data from a clinical trial of patients who experienced an aneurysmal subarachnoid hemorrhage provide illustration.

Why would a SAS^® administrator need a dashboard? With the evolution of SAS^®9, the SAS administrator's role has dramatically changed. Creating a dashboard on a SAS environment gives the SAS administrator an overview of the environment's health, ensures resources are used as predicted, and provides a way to explore. SAS^® Visual Analytics allows you to quickly explore, analyze, and visualize data. So, why not bring the two concepts together? In this session, you will learn tips for designing dashboards, loading what might seem like impossible data, and building visualizations that guide users toward the next level of analysis. Using the dashboard, SAS administrators will learn ways to determine the system health and how to take advantage of external tools, such as the Metacoda software, to find additional insights and explore problem areas.

As organizations deploy SAS^® applications to produce the analytical results that are critical for solid decision making, they are turning to distributed grid computing operated by SAS^® Grid Manager. SAS Grid Manager provides a flexible, centrally managed computing environment for processing large volumes of data for analytical applications. Exceptional storage performance is one of the most critical components of implementing SAS in a distributed grid environment. When the storage subsystem is not designed properly or implemented correctly, SAS applications do not perform well, thereby reducing a key advantage of moving to grid computing. Therefore, a well-architected SAS environment with a high-performance storage environment is integral to clients getting the most out of their investment. This paper introduces concepts from software storage virtualization in the cloud for the generalized SAS Grid Manager architecture, highlights platform and enterprise architecture considerations, and uses the most popularly selected distributed file system, IBM GPFS, as an example. File system scalability considerations, configuration details, and tuning suggestions are provided in a manner that can be applied to a client's own environment. A summary checklist of important factors to consider when architecting and deploying a shared, distributed file system is provided.

You've been coding in Base SAS^® for a while. You've seen it, maybe even run code written by someone else, but there is something about the SAS^® Macro Language that is preventing you from fully embracing it. Could it be that % sign that appears everywhere, that &, that &&, or even that dreaded &&&? Fear no more. This short presentation will make everything clearer and encourage you to start coding your own SAS macros.
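The ampersand rules mentioned above can be illustrated outside SAS. The following Python sketch is a simplified model, not the SAS macro processor itself: it assumes that on each scan a double ampersand collapses to a single ampersand, that a single ampersand followed by a known name is replaced by that name's value (consuming an optional trailing period), and that scanning repeats until nothing changes. The function names and the symbol table are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical helper: matches a macro variable name after an ampersand.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def resolve_once(text, symbols):
    """Perform one left-to-right scan of the text."""
    out = []
    i = 0
    while i < len(text):
        # Two ampersands collapse to one and are revisited on the next scan.
        if text.startswith("&&", i):
            out.append("&")
            i += 2
            continue
        # A single ampersand followed by a known name is replaced by its value.
        if text[i] == "&":
            match = _NAME.match(text, i + 1)
            if match and match.group(0).lower() in symbols:
                out.append(symbols[match.group(0).lower()])
                i = match.end()
                # A period directly after the name only delimits it.
                if i < len(text) and text[i] == ".":
                    i += 1
                continue
        out.append(text[i])
        i += 1
    return "".join(out)


def resolve(text, symbols, max_passes=20):
    """Rescan until the text stops changing (or a pass limit is reached)."""
    for _ in range(max_passes):
        new_text = resolve_once(text, symbols)
        if new_text == text:
            break
        text = new_text
    return text


# Hypothetical symbol table: x holds "y", y holds "z", i holds "1", x1 holds "foo".
symbols = {"x": "y", "y": "z", "i": "1", "x1": "foo"}
single = resolve("&x", symbols)        # direct reference
double = resolve("&&x", symbols)       # && collapses, then &x resolves
triple = resolve("&&&x", symbols)      # becomes &y, then resolves again
indexed = resolve("&&x&i", symbols)    # becomes &x1, then resolves
```

Under these assumptions, a triple ampersand reaches the value of the variable whose name is stored in x, while the `&&x&i` pattern builds a variable name from a prefix and a counter.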

Universities in the UK are now subject to League Table reporting by a range of providers. The criteria used by each League Table differ. Universities, their faculties, and individual subject areas want to understand how the different tables are constructed and calculated, and what is required to maximize their position in each league table so that they attract the best students to their institution, thereby maximizing recruitment and student-related income streams. The School of Computing and Maths at the University of Derby is developing the use of SAS^® Visual Analytics to analyse each league table, to provide actionable insights into steps that can be taken to improve relative standing in the league tables, and to gain insights into feasible levels of targets relative to the peer groups of institutions. This paper outlines the approaches taken and some of the critical insights developed that will be of value to other higher education institutions in the UK, and suggests useful approaches that might be valuable in other countries.

This workshop provides hands-on experience using tools in the SAS^® Data Management offering. Workshop participants will use the following products: SAS^® Data Integration Studio, DataFlux^® Data Management Studio, and SAS^® Data Management Console.

This workshop provides hands-on experience using SAS^® Enterprise Miner^™. Workshop participants will do the following: open a project; create and explore a data source; build and compare models; and produce and examine score code that can be used for deployment.

This workshop provides hands-on experience using SAS^® Forecast Server. Workshop participants will do the following: create a project with a hierarchy; generate multiple forecasts automatically; evaluate the accuracy of the forecasts; and build a custom model.

This workshop provides hands-on experience using SAS^® Office Analytics. Workshop participants will complete the following tasks: use SAS^® Enterprise Guide^® to access and analyze data; create a stored process that can be shared across an organization; and access and analyze data sources and stored processes using the SAS^® Add-In for Microsoft Office.

This workshop provides hands-on experience with SAS^® Visual Analytics. Workshop participants will do the following: explore data with SAS^® Visual Analytics Explorer and design reports with SAS^® Visual Analytics Designer.

This workshop provides hands-on experience using SAS^® Text Miner. Workshop participants will do the following: read a collection of text documents and convert them for use by SAS Text Miner using the Text Import node; use the simple query language supported by the Text Filter node to extract information from a collection of documents; use the Text Topic node to identify the dominant themes and concepts in a collection of documents; and use the Text Rule Builder node to classify documents that have pre-assigned categories.

SAS^® Visual Analytics enables you to conduct ad hoc data analysis, visually explore data, develop reports, and then share insights through the web and mobile tablet apps. You can now also share your insights with colleagues using the SAS^® Office Analytics integration with Microsoft Excel, Microsoft Word, Microsoft PowerPoint, Microsoft Outlook, and Microsoft SharePoint. In addition to opening and refreshing reports created using SAS Visual Analytics, a new SAS^® Central view enables you to manage and comment on your favorite and recent reports from your Microsoft Office applications. You can also view your SAS Visual Analytics results in SAS^® Enterprise Guide^®. Learn more about this integration and what's coming in the future in this breakout session.

In a clinical study, we often set up multiple hypotheses in view of the cost of obtaining study results. However, a multiplicity problem arises immediately when the hypotheses are tested one at a time in a univariate manner. Several methods for controlling the overall type I error rate are widely applied, and they are discussed in this paper. In addition to the methodology, we introduce its application in a case study and provide the SAS^® code.
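The abstract does not name the adjustment methods it covers. As an illustration only, the following Python sketch shows two commonly used adjustments that control the overall type I error rate, Bonferroni and Holm step-down; the choice of these two methods, the function names, and the example p-values are assumptions and are not taken from the paper.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three univariate tests.
raw = [0.01, 0.04, 0.03]
bonf = bonferroni(raw)
holm_adj = holm(raw)
```

A hypothesis is rejected at overall level alpha when its adjusted p-value is at most alpha. Holm adjusted values are never larger than the Bonferroni ones, so Holm rejects at least as many hypotheses while controlling the same error rate.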

The FORMAT procedure in SAS^® is a very powerful and productive tool, yet many beginning programmers rarely make use of it. The FORMAT procedure provides a convenient way to do a table lookup in SAS. User-generated FORMATS can be used to assign descriptive labels to data values, create new variables, and find unexpected values. PROC FORMAT can also be used to generate data extracts and to merge data sets. This paper provides an introductory look at PROC FORMAT for the beginning user and provides sample code that illustrates the power of PROC FORMAT in a number of applications. Additional examples and applications of PROC FORMAT can be found in the SAS^® Press book titled 'The Power of PROC FORMAT.'
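The table-lookup idea described above can be sketched in a language-neutral way. The following Python example is an analogy, not PROC FORMAT syntax: it maps value ranges to descriptive labels and routes anything outside the defined ranges to a catch-all label, which is how a lookup can also flag unexpected values. The range table, labels, and function name are hypothetical.

```python
# Hypothetical lookup table: (low, high, label), with inclusive bounds.
AGE_GROUPS = [
    (0, 17, "Child"),
    (18, 64, "Adult"),
    (65, 120, "Senior"),
]


def apply_format(value, ranges, other="Unexpected"):
    """Return the label of the first range containing value, else the catch-all."""
    for low, high, label in ranges:
        if low <= value <= high:
            return label
    return other


# Label a few hypothetical ages; -5 and 300 fall outside every range.
ages = [4, 30, 70, -5, 300]
labels = [apply_format(age, AGE_GROUPS) for age in ages]

# Unexpected values can then be isolated for review.
unexpected = [age for age, label in zip(ages, labels) if label == "Unexpected"]
```

Keeping the mapping in one table, separate from the data, means the same lookup can be reused for labeling, for creating a derived grouping variable, and for data checks.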

No need to fret, Base SAS^® programmers. Converting to SAS^® Enterprise Guide^® is a breeze, and it provides so many advantages. Coding remote connections to SAS^® servers is a thing of the past. Generate WYSIWYG prompts to increase the usage of the SAS code and to create reports and SAS^® Stored Processes to share easily with people who don't use SAS Enterprise Guide. The first and most important thing, however, is to change the default options and preferences to tame SAS Enterprise Guide, making it behave similarly to your Base SAS ways. I cover all of these topics and provide demos along the way.

As a retailer, your bottom line is determined by supply and demand. Are you supplying what your customer is demanding? Or do they have to go look somewhere else? Accurate allocation and size optimization mean your customer will find what they want more often. And that means more sales, higher profits, and fewer losses for your organization. In this session, Linda Canada will share how DSW went from static allocation models without size capability to precision allocation using intelligent, dynamic models that incorporate item plans and size optimization.

You don't have to be with the CIA to discover why your SAS^® stored process is producing clandestine results. In this talk, you will learn how to use prompts to get the results you want, work with the metadata to ensure correct results, and even pick up simple coding tricks to improve performance. You will walk away with a new decoder ring that allows you to discover the secrets of the SAS logs!

Dataprev has become the principal owner of social data on the citizens in Brazil by collecting information for over forty years in order to subsidize pension applications for the government. The use of this data can be expanded to provide new tools to aid policy and assist the government to optimize the use of its resources. Using SAS^® MDM, we are developing a solution that uniquely identifies the citizens of Brazil. Overcoming challenges with multiple government agencies and with the validation of survey records that suggest the same person requires rules for governance and a definition of what represents a particular Brazilian citizen. In short, how do you turn a repository of master data into an efficient catalyst for public policy? This is the goal for creating a repository focused on identifying the citizens of Brazil.

This presentation will teach the audience how to use SAS^® ODS Graphics. Now part of Base SAS^®, ODS Graphics is a great way to easily create clear graphics that enable any user to tell their story well. SGPLOT and SGPANEL are two of the procedures that can be used to produce powerful graphics that used to require a lot of work. The core of the procedures is explained, as well as the options available. Furthermore, we explore the ways to combine the individual statements to make more complex graphics that tell the story better. Any user of Base SAS on any platform will find great value in the SAS ODS Graphics procedures.

SAS^® Merchandise Planning introduces key changes with the recent 6.4 release and the upcoming 6.5 release. This session highlights the integration with SAS^® Visual Analytics, the analytic infrastructure that enables users to integrate analytic results into their planning decisions, as well as multiple usability enhancements. Included is a look at the first of the packaged analytics, the Recommended Assortment analytic.

SAS^® has an amazing arsenal of tools for using and displaying geographic information that is relatively unknown and underutilized. This presentation will highlight both new and existing capabilities for creating stunning, informative maps as well as using geographic data in other ways. SAS-provided map data files, functions, format libraries, and other geographic data files will be explored in detail. Custom mapping of geographic areas will be discussed. Maps produced will include use of both the annotate facility (including some new functions) and PROC GREPLAY. Products used are Base SAS^® and SAS/GRAPH^®. SAS programmers of any skill level will benefit from this presentation.

The DATA step allows one to read, write, and manipulate many types of data. As data evolves to a more free-form state, the ability of SAS^® to handle character data becomes increasingly important. This paper addresses character data from multiple vantage points. For example, what is the default length of a character string, and why does it appear to change under different circumstances? What type of formatting is available for character data? How can we examine and manipulate character data? The audience for this paper is beginner to intermediate, and the goal is to provide an introduction to the numerous character functions available in SAS, including the basic LENGTH and SUBSTR functions, plus many others.