SAS Global Forum 2014 Proceedings

Nowadays, most corporations build and maintain their own data warehouse, and an ETL (Extract, Transform, and Load) process plays a critical role in managing the data. Some people might create one large program and execute it from top to bottom. Others might write a SAS® driver program that includes several programs, and then execute this driver. If some programs can be run in parallel, then developers must write extra code to handle these concurrent processes. If one program fails, then users can either rerun the entire process or comment out the successful programs and resume the job from where it failed. Usually, the programs are deployed in production with read and execute permissions only, so users do not have the privilege of modifying code on the fly. In this case, how do you comment out the programs if the job terminated abnormally? This paper illustrates an approach for managing ETL process flows. The approach uses a SAS-based framework on a UNIX platform. This is a high-level infrastructure discussion with some explanation of the SAS code that is used to implement the framework. The framework supports rerunning or partially running the entire process without changing any source code. It also supports concurrent processes, so no extra code is needed.
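
The rerun-without-editing idea can be illustrated with a small, language-neutral sketch. It is written in Python rather than SAS and is not the paper's framework: the function names `run_flow` and `demo`, the step names, and the in-memory checkpoint set are all hypothetical. A real deployment on UNIX would presumably persist the checkpoint to a status file or table outside the read-only code, and concurrent execution is not shown.

```python
from typing import Callable, Dict, List, Set


def run_flow(steps: List[str],
             programs: Dict[str, Callable[[], None]],
             done: Set[str]) -> Set[str]:
    """Run ETL steps in order, skipping steps already recorded in `done`.

    `done` stands in for a checkpoint kept outside the production code
    (hypothetical). If a step raises, the steps completed so far stay
    recorded, so the next run resumes at the failed step and no program
    has to be commented out.
    """
    for name in steps:
        if name in done:
            continue  # succeeded in an earlier run
        programs[name]()  # may raise; prior successes remain checkpointed
        done.add(name)
    return done


def demo() -> List[str]:
    """Simulate a failed first run followed by a successful rerun."""
    calls: List[str] = []
    state = {"fail_transform": True}

    def extract() -> None:
        calls.append("extract")

    def transform() -> None:
        calls.append("transform")
        if state["fail_transform"]:
            raise RuntimeError("transform failed")

    def load() -> None:
        calls.append("load")

    programs = {"extract": extract, "transform": transform, "load": load}
    steps = ["extract", "transform", "load"]
    checkpoint: Set[str] = set()
    try:
        run_flow(steps, programs, checkpoint)  # first run stops at transform
    except RuntimeError:
        pass
    state["fail_transform"] = False  # the underlying problem is fixed
    run_flow(steps, programs, checkpoint)  # rerun skips extract
    return calls
```

Calling `demo()` runs `extract` once and `transform` twice (the failure and the retry) before `load`, which is the partial-rerun behavior described above, achieved without touching any program source.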

The current study examines recent health trends and behaviors of youth in America. Data used in this analysis was provided by the Centers for Disease Control and Prevention and gathered using the Youth Risk Behavior Surveillance System (YRBSS). A factor analysis was performed to identify and define latent mental health and risk behavior variables. A series of logistic regression analyses was then performed using the risk behavior and demographic variables as potential contributing factors to each of the mental health variables. Mental health variables included disordered eating and depression/suicidal ideation data, while the risk behavior variables included smoking, consumption of alcohol and drugs, violence, vehicle safety, and sexual behavior data. Implications derived from the results of this research are a primary focus of this study. Risks and benefits of using factor analysis with logistic regression in social science research are also discussed in depth. Results include differences between the years 1991 and 2011. All results are discussed in relation to current youth health trend issues. Data was analyzed using SAS® 9.3.

Have you ever wished that with one click you could copy any SAS® data set, including variable names, so that you could paste the text into a Microsoft Word file, Microsoft PowerPoint slide, or spreadsheet? You can! With just Base SAS®, there are some little-known but easy-to-use methods available for automating many of your (or your users') common tasks.

This paper shows how to use a SAS® macro named %SURVEYGLM to incorporate information about survey design into generalized linear models (GLMs). The R function svyglm (Lumley, 2004) was used to verify the suitability of the %SURVEYGLM macro estimates. The results show that the estimates are close to those from the R function and that new distributions can easily be added to the algorithm.

Stepwise regression includes regression models in which the predictor variables are selected by an automated algorithm. The stepwise method involves two approaches: backward elimination and forward selection. Currently, SAS® has three procedures capable of performing stepwise regression: REG, LOGISTIC, and GLMSELECT. PROC REG handles the linear regression model but does not support a CLASS statement. PROC LOGISTIC handles binary responses and allows for logit, probit, and complementary log-log link functions. It also supports a CLASS statement. The GLMSELECT procedure performs selections in the framework of general linear models. It allows for a variety of model selection methods, including the LASSO method of Tibshirani (1996) and the related LAR method of Efron et al. (2004). PROC GLMSELECT also supports a CLASS statement. We present a stepwise algorithm for generalized linear mixed models for both marginal and conditional models. We illustrate the algorithm using data from a longitudinal epidemiology study that investigated parents' beliefs, behaviors, and feeding practices that are positively or negatively associated with indices of sleep quality.

SAS® functions provide amazing power to your DATA step programming. Some of these functions are essential; others save you from writing volumes of unnecessary code. This paper covers some of the most useful SAS functions. Some of these functions might be new to you, and they will change the way you program and approach common programming tasks.

The SAS® Data Quality Server allows SAS® programmers to integrate the power of DataFlux® into their data cleaning programs. SAS Data Quality Server enables programmers to efficiently identify matching records across different data sets when exact matches are not present. During a recent educational research project, the DQMATCH function proved very capable of linking records from disparate data sources. Two key insights led to even greater success in linking records. The first insight was that the hierarchical structure of the data can greatly improve success in matching records. The second insight was that the names of individuals can be restructured to improve the chances of successful matches. This paper provides an overview of how these insights were implemented using the DQMATCH function to link educational data from multiple sources.

Finding groups with similar attributes is at the core of knowledge discovery. To this end, cluster analysis automatically locates groups of similar observations. Despite successful applications, many practitioners are uncomfortable with the degree of automation in cluster analysis, which causes intuitive knowledge to be ignored. This is especially true in text mining applications, since individual words have meaning beyond the data set. Discovering groups with similar text is extremely insightful. However, blind applications of clustering algorithms ignore intuition and hence are unable to group similar text categories. The challenge is to integrate the power of clustering algorithms with the knowledge of experts. We demonstrate how SAS/STAT® 9.2 procedures and the SAS® Macro Language are used to combine the opinions of domain experts with multiple clustering models to arrive at a consensus. The method has been successfully applied to a large data set with structured attributes and unstructured opinions. The result is the ability to discover observations with similar attributes and opinions by capturing the wisdom of the crowds, whether man or model.

This paper expands upon "A Multilevel Model Primer Using SAS® PROC MIXED," in which we presented an overview of estimating two- and three-level linear models via PROC MIXED. In that earlier paper, however, we relied mostly on simple options available in PROC MIXED. In this paper, we present a more advanced look at common PROC MIXED options used in the analysis of social and behavioral science data, as well as introduce users to two different SAS macros previously developed for use with PROC MIXED: one to examine model fit (MIXED_FIT) and the other to examine distributional assumptions (MIXED_DX). Specific statistical options presented in the current paper include (a) PROC MIXED statement options for estimating statistical significance of variance estimates (COVTEST, including problems with using this option) and estimation methods (METHOD=), (b) the MODEL statement option for degrees of freedom estimation (DDFM=), and (c) the RANDOM statement option for specifying the variance/covariance structure to be used (TYPE=). Given the importance of examining model fit, we also present methods for estimating changes in model fit through an illustration of the SAS macro MIXED_FIT. Likewise, the SAS macro MIXED_DX is introduced to remind users to examine distributional assumptions associated with two-level linear models. To maintain continuity with the 2013 introductory PROC MIXED paper, thus providing users with a set of comprehensive guides for estimating multilevel models using PROC MIXED, we use the same real-world data sources that we used in our earlier primer paper.

Overdispersion (extra variation) arises in binomial, multinomial, or count data when variances are larger than those allowed by the binomial, multinomial, or Poisson model. This phenomenon is caused by clustering of the data, lack of independence, or both. As pointed out by McCullagh and Nelder (1989), "Overdispersion is not uncommon in practice. In fact, some would maintain that over-dispersion is the norm in practice and nominal dispersion the exception." Several approaches are available for handling overdispersed data, namely quasi-likelihood and likelihood models, generalized estimating equations, and generalized linear mixed models. Some classical likelihood models are presented, among them the beta-binomial, binomial cluster (a.k.a. random clumped binomial), negative-binomial, zero-inflated Poisson, zero-inflated negative-binomial, hurdle Poisson, and hurdle negative-binomial models. We focus on how these approaches or models can be implemented in a practical way using, when appropriate, the procedures GLIMMIX, GENMOD, FMM, COUNTREG, NLMIXED, and SURVEYLOGISTIC. Some real data set examples are discussed to illustrate these applications. We also provide some guidance on how to analyze overdispersed generalized linear mixed models and on possible scenarios where we might encounter them.
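
As a quick illustration of what "variances larger than those allowed by the Poisson model" means, the sketch below computes the variance-to-mean ratio (index of dispersion) of a sample of counts. It is written in Python, the counts are made up, and the function name is hypothetical. It is only a rough screening check on raw counts that ignores covariates; it is not one of the model-based approaches (GLIMMIX, GENMOD, and so on) covered in the paper.

```python
from statistics import mean, variance
from typing import Sequence


def dispersion_index(counts: Sequence[int]) -> float:
    """Sample variance divided by sample mean for count data.

    A Poisson variable has variance equal to its mean, so a ratio near 1
    is consistent with nominal dispersion, while a ratio well above 1
    suggests overdispersion. Requires at least two counts and a
    positive mean.
    """
    m = mean(counts)
    if m <= 0:
        raise ValueError("mean must be positive")
    return variance(counts) / m


if __name__ == "__main__":
    clumped = [0, 0, 0, 1, 1, 2, 5, 9]  # hypothetical clustered counts
    print(round(dispersion_index(clumped), 2))  # prints 4.54
```

Here the sample mean is 2.25 but the sample variance is about 10.2, roughly four and a half times what a Poisson model would allow.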

The proliferation of textual data in business is overwhelming. Unstructured textual data is being constantly generated via call center logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, and so on. While the amount of textual data is increasing rapidly, businesses' ability to summarize, understand, and make sense of such data for making better business decisions remains challenging. This presentation takes a quick look at how to organize and analyze textual data to extract insightful customer intelligence from a large collection of documents and to use such information to improve business operations and performance. Multiple case studies of business applications using real data are presented to demonstrate text analytics and sentiment mining with SAS® Text Miner and SAS® Sentiment Analysis Studio. While SAS® products are used as tools for demonstration only, the topics and theories covered are generic (not tool specific).

In randomized experiments, it is generally assumed that the hierarchical structures and variances are the same in the treatment and control groups. In some situations, however, these structures and variance components can differ. Consider a randomized experiment in which individuals randomized to the treatment condition are further assigned to clusters in which the intervention is administered, but no such clustering occurs in the control condition. Such a structure can occur, for example, when the individuals in the treatment condition are randomly assigned to group therapy sessions or to mathematics tutoring groups; individuals in the control condition do not receive group therapy or mathematics tutoring and therefore do not have that level of clustering. In this example, individuals in the treatment condition have a hierarchical structure, but individuals in the control condition do not. If the therapists or tutors differ in efficacy, the clustering in the treatment condition induces an extra source of variability in the data that needs to be accounted for in the analysis. We show how special features of SAS® PROC MIXED and PROC GLIMMIX can be used to analyze data in which one or more treatment groups have a hierarchical structure that differs from that in the control group. We also discuss how to code variables in order to increase the computational efficiency for estimating parameters from these designs.

The power of social media has increased to such an extent that businesses that fail to monitor consumer responses on social networking sites are now clearly at a disadvantage. In this paper, we aim to provide some insights on the impact of Microsoft's Digital Rights Management (DRM) policies and the release of Xbox One on its customers' reactions. We have conducted preliminary research to compare the basic text mining capabilities of SAS® and R, two very different yet powerful tools. A total of 6,500 Tweets were collected to analyze the impact of Microsoft's DRM policies. The Tweets were segmented into three groups based on date: before Microsoft announced its Xbox One policies (May 18 to May 26), after the policies were announced (May 27 to June 16), and after changes were made to the announced policies (June 16 to July 1). Our results suggest that SAS works better than R when it comes to extensive analysis of textual data. In future work, customers' reactions to the release of Xbox One will be analyzed using SAS® Sentiment Analysis Studio. We will collect Tweets about Xbox posted before and after Microsoft's release of Xbox One. There will be two categories: Tweets posted between November 15 and November 21, and those posted between November 22 and November 29. Sentiment analysis will then be performed on these Tweets, and the results will be compared between the two categories.

Many neuroscience researchers have explored how various parts of the brain are connected, but no one has performed association mining using brain data. In this study, we used SAS® Enterprise Miner™ 7.1 for association mining of brain data collected by a 14-channel EEG device. We present an application of the association mining technique in the novel context of brain activity and link our results to theories of cognitive neuroscience. The brain waves were collected while a user processed information about Facebook, the most well-known social networking site. The data was cleaned using Independent Component Analysis via an open-source MATLAB package. Next, by applying the LORETA algorithm, activations at every fraction of a second were recorded. The data was codified into transactions to perform association mining. Results showing how various parts of the brain are activated while processing the information are reported. This study provides preliminary insights into how brain wave data can be analyzed by widely available data mining techniques to enhance researchers' understanding of brain activation patterns.

In our previous work, we often needed to perform large numbers of repetitive and data-driven post-campaign analyses to evaluate the performance of marketing campaigns in terms of customer response. These routine tasks were usually carried out manually in Microsoft Excel, which was tedious, time-consuming, and error-prone. To improve work efficiency and analysis accuracy, we automated the analysis process with SAS® programming, replacing the manual Excel work. Through the use of SAS macro programs and other advanced techniques, we successfully automated the complicated data-driven analyses with high efficiency and accuracy. This paper presents and illustrates the creative analytical ideas and programming techniques used to develop the automated analysis process, which can be extended to a variety of business intelligence and analytics fields.

For IT professionals, saving time is critical. Delivering timely and quality-looking reports and information to management, end users, and customers is essential. SAS® provides numerous 'canned' PROCedures for generating quick results to take care of these needs ... and more. In this hands-on workshop, attendees acquire basic insights into the power and flexibility offered by SAS PROCedures, using PRINT, FORMS, and SQL to produce detail output; FREQ, MEANS, and UNIVARIATE to summarize and create tabular and statistical output; and DATASETS to manage data libraries. Additional topics include techniques for informing SAS which data set to use as input to a procedure, how to subset data using a WHERE statement (or WHERE= data set option), and how to perform BY-group processing.

As complicated as the macro language is to learn, there are very strong reasons for doing so. At its heart, the macro language is a code generator. In its simplest uses, it can substitute simple bits of code like variable names and the names of data sets that are to be analyzed. In more complex situations, it can be used to create entire statements and steps based on information that may be unavailable to the person writing, or even executing, the macro. At the time of execution, it can be used to query the SAS® environment as well as the operating system, and to use the gathered information to make informed decisions about how it should further function and execute.

Because the macro language is primarily a code generator, it makes sense that the code that it creates must be generated before it can be executed. This implies that execution of the macro language comes first. Simple as this is in concept, timing issues and conflicts are often not so simple to recognize in application. As we use the macro language to take on more complex tasks, it becomes even more critical that we have an understanding of these issues.

Macro variables and their values are stored in symbol tables, which in turn are held in memory. Not only are there a number of ways to create macro variables, but they can be created in a wide variety of situations. How they are created and under what circumstances affects the variable's scope: how and where the macro variable is stored and retrieved. There are a number of misconceptions about macro variable scope and about how macro variables are assigned to symbol tables. These misconceptions can cause problems that the new, and sometimes even the experienced, macro programmer does not anticipate. Understanding the basic rules for macro variable assignment can help the macro programmer solve some of these otherwise quite mystifying problems.

This paper provides an overview of how to create a SAS® Enterprise Guide® process that is well designed, simple, documented, automated, modular, efficient, reliable, and easy to maintain. Topics include how to organize a SAS Enterprise Guide process, how to best document in SAS Enterprise Guide, when to leverage point-and-click functionality, and how to automate and simplify SAS Enterprise Guide processes. This paper has something for any SAS Enterprise Guide user, new or experienced!

The emerging discipline of data governance encompasses data quality assurance, data access and use policy, security risks and privacy protection, and longitudinal management of an organization's data infrastructure. In the interest of forestalling another bureaucratic solution to data governance issues, this presentation features database programming tools that provide rapid access to big data and make selective access to and restructuring of metadata practical.

Digital data has become a classic BIG DATA challenge for marketers who want to push past the retroactive analysis limitations of traditional web analytics. The current groundswell of digital device adoption and the variety of digital interactions grow larger year after year. The opportunity for 'digital intelligence' has arrived, as traditional web analytic techniques were not designed for the breadth of channels, devices, and pace that fuel consumer experiences. In parallel, today's landscape for data visualization, advanced analytics, and our ability to process very large amounts of multi-channel information is changing. The democratization of analytics for the masses is upon us, and marketers have the opportunity to take advantage of descriptive, predictive, and (most importantly) prescriptive data-driven insights. This presentation describes how organizations can use SAS® products, specifically SAS® Visual Analytics and SAS® Adaptive Customer Experience, to overcome the limitations of web analytics and to support data-driven integrated marketing objectives.

The use of Cohen's kappa has enjoyed growing popularity in the social sciences as a way of evaluating rater agreement on a categorical scale. The kappa statistic can be calculated as Cohen first proposed it in his 1960 paper or by using any one of a variety of weighting schemes. The most popular among these are the linear weighted kappa and the quadratic weighted kappa. Currently, SAS® users can produce the kappa statistic of their choice through PROC FREQ and the relevant AGREE options. Complications arise, however, when the data set does not produce a completely square cross-tabulation. That is, this method requires that both raters have at least one data point in every available category. Many solutions have been offered for this predicament. Most involve inserting dummy records into the data and then assigning a weight of zero to those records through an additional class variable. The result is a multi-step macro, extraneous variable assignments, and potential data integrity issues. The author offers a much more elegant solution: a segment of code that uses brute force to calculate Cohen's kappa as well as all popular variants. The code uses nested PROC SQL statements to provide a single conceptual step that generates kappa statistics of all types, even those that users wish to define for themselves.
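
The brute-force idea (build the full k × k agreement table over a declared list of categories, so that a category one rater never used simply becomes a row or column of zeros) can be sketched outside SAS. The Python below is not the author's PROC SQL code; the function name and the `weights` argument are hypothetical. It assumes ordered, equally spaced categories and uses linear weights 1 - |i - j|/(k - 1) and quadratic weights 1 - ((i - j)/(k - 1))^2, which correspond to the Cicchetti-Allison and Fleiss-Cohen weights (WT=CA and WT=FC) available in PROC FREQ.

```python
from typing import Hashable, Sequence


def weighted_kappa(rater1: Sequence[Hashable],
                   rater2: Sequence[Hashable],
                   categories: Sequence[Hashable],
                   weights: str = "none") -> float:
    """Cohen's kappa (unweighted, linear, or quadratic) for two raters.

    `categories` lists every possible category in order, so the table is
    always square even when a rater never used some category.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater1)
    if n == 0 or n != len(rater2):
        raise ValueError("ratings must be non-empty and paired")

    # Observed cell proportions, including all-zero rows and columns.
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        p[index[a]][index[b]] += 1.0 / n
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]

    def w(i: int, j: int) -> float:
        if weights == "none":
            return 1.0 if i == j else 0.0
        d = abs(i - j) / (k - 1)  # weighted variants need k >= 2
        if weights == "linear":
            return 1.0 - d
        if weights == "quadratic":
            return 1.0 - d * d
        raise ValueError(f"unknown weights: {weights}")

    po = sum(w(i, j) * p[i][j] for i in range(k) for j in range(k))
    pe = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return (po - pe) / (1.0 - pe)  # undefined when pe == 1
```

With categories [1, 2, 3], ratings [1, 2, 3, 3] and [1, 2, 2, 2] give an unweighted kappa of 1/3 and a linear-weighted kappa of 3/7, even though the second rater never used category 3; no dummy records are needed.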

This paper demonstrates the new case-level residuals in the CALIS procedure and how they differ from classic residuals in structural equation modeling (SEM). Residual analysis has a long history in statistical modeling for finding unusual observations in the sample data. However, in SEM, case-level residuals are considerably more difficult to define because of 1) latent variables in the analysis and 2) the multivariate nature of these models. Historically, residual analysis in SEM has been confined to residuals obtained as the difference between the sample and model-implied covariance matrices. Enhancements to the CALIS procedure in SAS/STAT® 12.1 enable users to obtain case-level residuals as well. This enables a more complete residual and influence analysis. Several examples showing mean/covariance residuals and case-level residuals are presented.

Missing data commonly occurs in medical, psychiatric, and social research. The SAS® MI and MIANALYZE procedures are often used to generate multiple imputations and then provide valid statistical inferences based on them. However, MIANALYZE cannot be used to combine Type III analyses obtained from multiply imputed data sets. In this manuscript, we present a macro that combines the Type III analyses generated by the SAS MIXED procedure based on multiple imputations. The proposed method can be extended to other procedures that report Type III analyses, such as GENMOD and GLM.

There has been debate regarding which method to use to analyze repeated measures continuous data when the design includes only two measurement times. Five different techniques can be applied, and they give similar results when there is little to no correlation between pre- and post-test measurements and when data at each time point are complete: 1) analysis of variance on the difference between pre- and post-test, 2) analysis of covariance on the difference between pre- and post-test controlling for pre-test, 3) analysis of covariance on post-test controlling for pre-test, 4) multivariate analysis of variance on post-test and pre-test, and 5) repeated measures analysis of variance. However, when there are missing data, or when a moderate to high correlation between pre- and post-test measures exists under an intent-to-treat analysis framework, bias is introduced in the tests for the ANOVA, ANCOVA, and MANOVA techniques. A comparison of Type III sums of squares, F-tests, and p-values for a complete case analysis and an intent-to-treat analysis is presented. The analysis using a complete case data set shows that all five methods produce similar results except for the repeated measures ANOVA, owing to a moderate correlation between pre- and post-test measures. However, significant bias is introduced for the tests using the intent-to-treat data set.

Non-cognitive assessments, which measure constructs such as time management, goal-setting, and personality, are becoming more prevalent today in research within the domains of academic performance and workforce readiness. Many instruments that are used for this purpose contain a large number of items that can each be assigned to specific facets of the larger construct. The factor structure of each instrument emerges from a mixture of psychological theory and empirical research, often by doing exploratory factor analysis (EFA) using the SAS® procedure PROC FACTOR. Once an initial model is established, it is important to perform confirmatory factor analysis (CFA) to confirm that the hypothesized model provides a good fit to the data. If outcome data such as grades are collected, structural equation modeling (SEM) should also be employed to investigate how well the assessment predicts these measures. This paper demonstrates how the SAS procedure PROC CALIS is useful for performing confirmatory factor analysis and structural equation modeling. Examples of these methods are demonstrated and proper interpretation of the fit statistics and resulting output is illustrated.

Merging or joining data sets is an integral part of the data consolidation process. Within SAS®, there are numerous methods and techniques that can be used to combine two or more data sets. We commonly think that within the DATA step the MERGE statement is the only way to join these data sets, while in fact, the MERGE is only one of numerous techniques available to us to perform this process. Each of these techniques has advantages, and some have disadvantages. The informed programmer needs to have a grasp of each of these techniques if the correct technique is to be applied. This paper covers basic merging concepts and options within the DATA step, as well as a number of techniques that go beyond the traditional MERGE statement. These include fuzzy merges, double SET statements, and the use of key indexing. The discussion will include the relative efficiencies of these techniques, especially when working with large data sets.
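
Key indexing, one of the techniques listed above, replaces a sort-and-merge with direct array addressing: the key value itself is used as the subscript. The Python sketch below only mimics that idea (in a DATA step it would typically be done with a temporary array); the function name, the restriction to small non-negative integer keys, and the sample data are hypothetical. It behaves like a left join from the large table onto a small lookup table, and neither input has to be sorted.

```python
from typing import List, Optional, Sequence, Tuple


def key_indexed_lookup(small: Sequence[Tuple[int, str]],
                       large_keys: Sequence[int],
                       max_key: int) -> List[Optional[str]]:
    """Look up each key of a large table in a small one by direct addressing.

    Assumes integer keys in 0..max_key. Neither input needs to be sorted.
    """
    # Load phase: the array position *is* the key (duplicate keys: last wins).
    table: List[Optional[str]] = [None] * (max_key + 1)
    for key, value in small:
        if not 0 <= key <= max_key:
            raise ValueError(f"key {key} outside 0..{max_key}")
        table[key] = value

    # Lookup phase: one pass over the large table, one array access per row;
    # keys with no match (or out of range) yield None, as in a left join.
    return [table[k] if 0 <= k <= max_key else None for k in large_keys]
```

For example, looking up keys [7, 1, 3, 12] against the pairs (3, "c") and (7, "g") with `max_key=10` returns ["g", None, "c", None]. The trade-off is memory: the array must span the whole key range, which is why the technique suits compact integer keys.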

Particle swarm optimization (PSO) is a heuristic global optimization method introduced by James Kennedy and Russell C. Eberhart in 1995. This paper develops code for particle swarm optimization in SAS® 9.2.
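
For readers unfamiliar with the method, the sketch below shows the basic global-best form of PSO: each particle's velocity combines inertia with random pulls toward its own best position and toward the swarm's best position. It is written in Python, not the paper's SAS 9.2 code. The inertia weight w = 0.7298 and acceleration constants c1 = c2 = 1.49618 are commonly used constriction-style settings chosen here as assumptions, positions are simply clipped to the bounds, and the function name is hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(f: Callable[[List[float]], float],
        bounds: Sequence[Tuple[float, float]],
        n_particles: int = 30,
        n_iter: int = 200,
        w: float = 0.7298,
        c1: float = 1.49618,
        c2: float = 1.49618,
        seed: int = 1) -> Tuple[List[float], float]:
    """Minimize f over a box with global-best particle swarm optimization."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                # The swarm best is updated as soon as any particle improves it.
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For example, `pso(lambda x: sum(t * t for t in x), [(-5.0, 5.0), (-5.0, 5.0)])` should return a point very close to the origin, the minimum of this sphere function. Results depend on the seed, and as a heuristic PSO offers no general guarantee of finding the global optimum.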

Evaluation of the efficacy of an intervention is often complicated because the intervention is not randomly assigned. Usually, interventions in marketing, such as coupons or retention campaigns, are directed at customers because their spending is below some threshold or because the customers themselves make a purchase decision. The presence of nonrandom assignment of the stimulus can lead to over- or underestimating the value of the intervention. This can cause future campaigns to be directed at the wrong customers or cause the impacts of these effects to be over- or understated. This paper gives a brief overview of selection bias, demonstrates how selection in the data can be modeled, and shows how to apply some of the important consistent methods of estimating selection models, including Heckman's two-step procedure, in an empirical example. Sample code is provided in an appendix.

APP is an unofficial collective abbreviation for the SAS® functions ADDR, PEEK, and PEEKC, the CALL POKE routine, and their so-called LONG 64-bit counterparts: the SAS tools designed to directly read from and write to physical memory in the DATA step. APP functions have long been a SAS dark horse. First, the examples of APP usage in SAS documentation amount to a few technical report tidbits intended for mainframe system programming, with nary a hint of how the functions can be used for data management programming. Second, the documentation note on the CALL POKE routine is so intimidating in tone that many potentially receptive folks might decide to avoid the allegedly precarious route altogether. However, little can stand in the way of an inquisitive SAS programmer daring to take a close look, and it turns out that APP functions are very simple and useful tools! They can be used to explore how things really work, to make code more concise, and to implement en masse data movement, and they can often dramatically improve execution efficiency. The author and many other SAS experts (notably Peter Crawford, Koen Vyverman, Richard DeVenezia, Toby Dunn, and the fellow masked by his 'Puddin' Man' sobriquet) have been poking around the SAS APP realm on SAS-L and in their own practices since 1998, occasionally letting the SAS community at large peek at their findings. This opus is an attempt to circumscribe the results in a systematic manner. Welcome to the APP world! You are in for a few glorious surprises.

The implicit loop refers to the DATA step repetitively reading data and creating observations, one at a time. The explicit loop, which uses the iterative DO, DO WHILE, or DO UNTIL statements, is used to repetitively execute certain SAS® statements within each iteration of the DATA step execution. Explicit loops are often used to simulate data and to perform a certain computation repetitively. However, when an explicit loop is used along with array processing, the range of applications widens considerably, including transposing data, performing computations across variables, and so on. To write a successful program that uses loops and arrays, one needs to know the contents of the program data vector (PDV) during DATA step execution, which is the fundamental concept of DATA step programming. This workshop covers the basic concepts of the PDV, which are often ignored by novice programmers, and then illustrates how to use loops and arrays to transform lengthy code into more efficient programs.

In evaluation instruments and tests, individual items are often collected using an ordinal measurement or Likert-type scale. Typically, measures such as Cronbach's alpha are estimated using the standard Pearson correlation. Gadermann and Zumbo (2012) illustrate how using the standard Pearson correlations may yield biased estimates of reliability when the data are ordinal, and they present methodology for using the polychoric correlation in reliability estimates as an alternative. This session shows how to implement the methods of Gadermann and Zumbo using SAS® software. An example will be presented that incorporates these methods in the estimation of the reliability of an active learning post-occupancy evaluation instrument developed by Steelcase Education Solutions researchers.
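
The two approaches differ mainly in which correlation matrix is fed to the reliability formula. The Python sketch below computes standardized alpha from any item correlation matrix: with Pearson correlations this is the usual standardized alpha, and with a polychoric matrix it gives the so-called ordinal alpha. Treat this as an assumed simplification, not the session's SAS implementation: estimating the polychoric matrix itself is not shown, and the function name is hypothetical.

```python
from typing import Sequence


def alpha_from_correlations(r: Sequence[Sequence[float]]) -> float:
    """Standardized Cronbach's alpha from a p x p item correlation matrix.

    alpha = p / (p - 1) * (1 - p / S), where S is the sum of all entries of
    the matrix (the p diagonal ones plus every off-diagonal correlation).
    Supplying a polychoric matrix instead of a Pearson one gives ordinal alpha.
    """
    p = len(r)
    if p < 2 or any(len(row) != p for row in r):
        raise ValueError("need a square matrix with at least two items")
    total = sum(sum(row) for row in r)
    return p / (p - 1) * (1.0 - p / total)
```

For three items with every inter-item correlation equal to 0.5, S = 6 and the result is 0.75, the same value as the familiar form p*rbar / (1 + (p - 1)*rbar) with mean inter-item correlation rbar = 0.5.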

The worst part of going to school is having to show up. However, data shows that those who do show up are the ones who are going to be the most successful (Johnson, 2000). As shown in a study done in Minneapolis, students who were in class at least 95% of the time were twice as likely to pass state tests (Johnson, 2000). Studies have shown that school districts that show interest in attendance have higher student achievement (Reeves, 2008). The goal of research on student attendance is to find out the patterns of when students miss class and why they are absent. The data comes directly from the Phillip O. Berry High School Attendance Office; with around 1600 students, there is plenty of data to be used from the 2012-2013 school year. Using Base SAS® 9.3, after importing the data from Microsoft Excel, a series of PROC FORMAT and PROC GCHART steps were used to output and analyze the data. The data showed the days of the week and the periods that students missed the most, depending on grade level. The data shows that freshmen and seniors were the most likely to be absent on a given day. Based on the data, attendance continues to be an issue; therefore, school districts need to take an active role in developing attendance policies.

SAS^® is an outstanding suite of software, but not everyone in the workplace speaks SAS. However, almost everyone speaks Excel. Often, the data you are analyzing, the data you are creating, and the reports you are producing are in the form of a Microsoft Excel spreadsheet. Every year at SAS^® Global Forum, there are SAS and Excel presentations, not just because Excel is so pervasive in the workplace, but because there's always something new to learn (or re-learn)! This paper summarizes and references (and pays homage to!) previous SAS Global Forum presentations, and examines some of the newest Excel capabilities available with SAS^® 9.4 and SAS^® Visual Analytics.

Business Intelligence (BI) dashboards serve as an invaluable, high-level, visual reference tool for decision-making processes in many business industries. A request was made to our department to develop BI dashboards that could be incorporated in an academic setting. These dashboards were intended to serve various undergraduate executive and administrative staff at the university. While most business data lends itself readily to dashboard development, academic data is typically modeled differently and therefore presents unique challenges. In this paper, the authors detail and share the design and development process of creating dashboards for decision making in an academic environment utilizing SAS^® BI Dashboard 4.3 and other SAS^® Enterprise Business Intelligence 9.2 tools. The authors also provide lessons learned as well as recommendations for future implementations of BI dashboards utilizing academic data.

Explore the various DATA step merge and PROC SQL join processes. This presentation examines the similarities and differences between merges and joins, and provides examples of effective coding techniques. Attendees examine the objectives and principles behind merges and joins, one-to-one merges (joins), and match-merge (equi-join), as well as the coding constructs associated with inner and outer merges (joins) and PROC SQL set operators.
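
To make the inner/outer distinction concrete, here is a language-neutral Python sketch (not SAS code) of one-to-one joins on a single key, assuming each key appears at most once in each table. By default, a DATA step match-merge keeps every BY value found in either data set, which for unique keys resembles the full outer join below; subsetting with IN= flags to keep only rows found in both inputs corresponds to the inner join. The function names, tables, and columns are hypothetical.

```python
def inner_join(left, right, key):
    """Equi-join: keep only key values present in both tables (one-to-one case)."""
    rindex = {row[key]: row for row in right}
    # Right-hand values overwrite same-named left-hand columns
    return [{**l, **rindex[l[key]]} for l in left if l[key] in rindex]


def full_outer_join(left, right, key):
    """Keep every key value from either table; unmatched columns are None (missing)."""
    lindex = {row[key]: row for row in left}
    rindex = {row[key]: row for row in right}
    all_cols = {c for row in left + right for c in row}
    out = []
    for k in sorted(lindex.keys() | rindex.keys()):
        row = dict.fromkeys(all_cols)      # start with every column missing
        row.update(lindex.get(k, {}))      # fill from the left table, if matched
        row.update(rindex.get(k, {}))      # then from the right table, if matched
        row[key] = k                       # the key is always populated
        out.append(row)
    return out


patients = [{"id": 1, "age": 40}, {"id": 2, "age": 55}]
visits = [{"id": 2, "clinic": "A"}, {"id": 3, "clinic": "B"}]
print(inner_join(patients, visits, "id"))  # [{'id': 2, 'age': 55, 'clinic': 'A'}]
print(len(full_outer_join(patients, visits, "id")))  # 3
```

Many-to-many matches behave differently in a DATA step merge than in an SQL join, so this sketch deliberately covers only the one-to-one case.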

For almost two decades, Western Kentucky University's Office of Institutional Research (WKU-IR) has used SAS^® to help shape the future of the institution by providing faculty and administrators with information they can use to make a difference in the lives of their students. This presentation provides specific examples of how WKU-IR has shaped the policies and practices of our institution and discusses how WKU-IR moved from a support unit to a key strategic partner. In addition, the presentation covers the following topics: How the WKU Office of Institutional Research developed over time; Why WKU abandoned reactive reporting for a more accurate, convenient system using SAS^® Enterprise Intelligence Suite for Education; How WKU shifted from investigating what happened to predicting outcomes using SAS^® Enterprise Miner^™ and SAS^® Text Miner; How the office keeps the system relevant and utilized by key decision makers; What the office has accomplished and key plans for the future.

Analyzing data from a complex probability survey involves weighting observations so that inferences are correct. This introductory presentation is intended for an audience new to analyzing survey data. Learn the essentials of using the SURVEYxx procedures in SAS/STAT^®.
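
The core idea of weighting shows up even in a simple point estimate. The following Python sketch, using hypothetical data, compares an unweighted mean with a weighted estimate in which each respondent's weight is treated as the number of population units that respondent represents. It covers only the point estimate; the design-based variance estimation for strata and clusters, which the SURVEYxx procedures also handle, is not shown.

```python
def weighted_mean(values, weights):
    """Weighted (ratio) estimate of a population mean: sum(w * y) / sum(w)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    total_w = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_w


# Hypothetical sample: the first three respondents come from an oversampled
# group, so each represents fewer population units than the fourth respondent.
y = [1, 1, 1, 0]       # 1 = has the outcome of interest
w = [10, 10, 10, 70]   # survey weights

print(sum(y) / len(y))       # unweighted: 0.75
print(weighted_mean(y, w))   # weighted:   0.3
```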

It is not uncommon to find models with random components such as location, clinic, or teacher, not just the single error term we think of in ordinary regression. This paper uses several examples to illustrate the underlying ideas. In addition, the response variable might be Poisson or binary rather than normal, which takes us into the realm of generalized linear mixed models. These too are illustrated with examples.

Beginning with SAS^® 9.2, ODS Graphics introduces a whole new way of generating graphs using SAS^®. With just a few lines of code, you can create a wide variety of high-quality graphs. This paper covers the three basic ODS Graphics procedures SGPLOT, SGPANEL, and SGSCATTER. SGPLOT produces single-celled graphs. SGPANEL produces multi-celled graphs that share common axes. SGSCATTER produces multi-celled graphs that might use different axes. This paper shows how to use each of these procedures in order to produce different types of graphs, how to send your graphs to different ODS destinations, how to access individual graphs, and how to specify properties of graphs, such as format, name, height, and width.

Determining what, when, and how to migrate SAS^® software from one major version to the next is a common challenge. SAS provides documentation and tools to help make the assessment, planning, and eventual deployment go smoothly. We describe some of the keys to making your migration a success, including the effective use of the SAS^® Migration Utility, both in the analysis mode and the execution mode. This utility is responsible for analyzing each machine in an existing environment, surfacing product-specific migration information, and creating packages to migrate existing product configurations to later versions. We show how it can be used to simplify each step of the migration process, including recent enhancements to flag product version compatibility and incompatibility.

Administrators at Western Kentucky University rely on the Institutional Research department to perform detailed statistical analyses to deepen the understanding of issues associated with enrollment management, student and faculty performance, and overall program operations. This paper presents several instances of analyses performed for the university to help it identify and recruit suitable candidates, uncover root causes in grade and enrollment trends, evaluate faculty effectiveness, and assess the impact of student characteristics, programs, or student activities on retention and graduation rates. The paper briefly discusses the data infrastructure created and used by Institutional Research. For each analysis performed, it reviews the SAS^® program and key components of the SAS code involved. The studies presented include the use of SAS^® Enterprise Miner^™ to create a retention model incorporating dozens of student background variables. It shows an examination of grade trends in the same courses taught by different faculty and subsequent student behavior and success, providing insights into the nuances and subtleties of evaluating faculty performance. Another analysis uncovers the possible influence of fraternities and sororities in freshman algebra courses. Two investigations explore the impact of programs on student retention and graduation rates. Each example and its findings illustrate how Institutional Research can support the administration of university operations. The target audience is any SAS professional interested in learning more about Institutional Research in higher education and how SAS software is used by an Institutional Research department to serve its organization.

To understand the actual gambling behavior of individuals on the Internet, we develop markers that identify behavioral patterns, which in turn can be used to predict a subscriber's level of gambling risk. The data set contains 4,056 subscribers. Using SAS^® Enterprise Miner^™ 12.1, a set of models is run to predict which subscribers are likely to become high-risk Internet gamblers. The data contain 114 variables, such as first active date and first active product used on the website, as well as characteristics of the game, such as fixed odds, poker, casino, games, and so on. Other measures of a subscriber's data, such as money put at stake and the odds being bet, are also included. These variables provide a comprehensive view of a subscriber's behavior while gambling on the website. The target variable is modeled as a binary variable, with 0 indicating a risky gambler and 1 indicating a controlled gambler. The data are a typical example of real-world data with many missing values and therefore had to be transformed and imputed before analysis. The model comparison algorithm of SAS Enterprise Miner 12.1 was used to determine the best model. Stepwise regression performs best among a set of 25 models, which were run using over 100 permutations of each model. The stepwise regression model predicts a high-risk Internet gambler with an accuracy of 69.63%, using variables such as wk4frequency and wk3frequency of bets.

Item response theory (IRT) is concerned with accurate test scoring and development of test items. You design test items to measure various types of abilities (such as math ability), traits (such as extroversion), or behavioral characteristics (such as purchasing tendency). Responses to test items can be binary (such as correct or incorrect responses in ability tests) or ordinal (such as degree of agreement on Likert scales). Traditionally, IRT models have been used to analyze these types of data in psychological assessments and educational testing. With the use of IRT models, you can not only improve scoring accuracy but also economize test administrations by adaptively using only the discriminative items. These features might explain why in recent years IRT models have become increasingly popular in many other fields, such as medical research, health sciences, quality-of-life research, and even marketing research. This paper describes a variety of IRT models, such as the Rasch model, two-parameter model, and graded response model, and demonstrates their application by using real-data examples. It also shows how to use the IRT procedure, which is new in SAS/STAT^® 13.1, to calibrate items, interpret item characteristics, and score respondents. Finally, the paper explains how the application of IRT models can help improve test scoring and develop better tests. You will see the value in applying item response theory, possibly in your own organization!
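
The Rasch and two-parameter models mentioned above share one item response function and differ only in whether the discrimination parameter varies by item. The following minimal Python sketch assumes the plain logistic form, without the 1.7 scaling constant that some texts include; the graded response model for ordinal items is not covered, and the function name `p_correct` is hypothetical.

```python
import math


def p_correct(theta, b, a=1.0):
    """Probability of a correct (or endorsed) response under a logistic IRT model.

    theta: person ability or trait level; b: item difficulty; a: item discrimination.
    With a = 1 for every item this is the Rasch model; letting a vary by item
    gives the two-parameter logistic (2PL) model.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# A person whose ability equals the item difficulty has a 50% chance of success.
print(p_correct(theta=0.5, b=0.5))            # 0.5
# A more discriminating item separates abilities around b more sharply.
print(round(p_correct(1.0, 0.0, a=1.0), 3))   # 0.731
print(round(p_correct(1.0, 0.0, a=2.0), 3))   # 0.881
```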

Traditional SAS^® programs typically consist of a series of SAS DATA steps, which refine input data sets until the final data set or report is reached. SAS DATA steps do not run in-database. However, SAS^® Enterprise Guide^® users can replicate this kind of iterative programming and have the resulting process flow run in-database by linking a series of SAS Enterprise Guide Query Builder tasks that output SAS views pointing at data that resides in a Teradata database, right up to the last Query Builder task, which generates the final data set or report. This session both explains and demonstrates this functionality.

Report automation and scheduling are very hot topics in many industries. They confer many advantages, including reduced workload, elimination of repetitive tasks, generation of accurate results, and better performance. This paper illustrates how to design an appropriate program to automate and schedule reports in SAS^® 9.1 and SAS^® Enterprise Guide^® 5.1 using a SAS^® server as well as the Windows Scheduler. The automation part covers useful techniques for formatting Microsoft Excel tables using XML, VBA, or other formats, as well as conditional automatic e-mailing with file attachments. We systematically walk through each step with a clear flow diagram from the data source to the final destination. We also discuss details of server-side and PC-side schedulers and how these schedulers invoke batch programs.

The soaring number of publicly available data sets across disciplines has allowed for increased access to real-life data for use in both research and educational settings. These data are often collected with cost-effective complex sampling designs, including stratification and clustering, which allow for increased efficiency in survey data collection and analyses. Weighting becomes a necessary component in analyzing these survey data in order to properly calculate variance estimates and arrive at sound inferences through statistical analysis. Generally speaking, these weights are included with the variables provided in the public use data, though an explanation of how and when to use these weights is often lacking. This paper presents an analysis using the California Health Interview Survey to compare weighted and non-weighted results using SAS^® PROC LOGISTIC and PROC SURVEYLOGISTIC.

U.S. educators face a critical new imperative: to prepare all students for work and civic roles in a globalized environment in which success increasingly requires the ability to compete, connect, and cooperate on an international scale. The Asia Society and the Longview Foundation are collaborating on a project to show both the need for and the supply of globally competent graduates. This presentation shows you how SAS assisted these organizations with a solution that leverages SAS^® visualization technologies to produce a heatmap application. The application surfaces over a quarter million data points from more than 300 indicators in a highly interactive interface. It features a drillable map that shows data at the state level as well as at the county level for all 50 states. This endeavor uses new SAS^® 9.4 technology both to combine the data and to create the interface. You'll see how SAS procedures such as PROC JSON, which was introduced in SAS 9.4, were used to prepare the data for the web application. The user interface demonstrates how SAS/GRAPH^® output can be combined with popular JavaScript frameworks like Dojo and Twitter Bootstrap to create an HTML5 application that works on desktop, mobile, and tablet devices.

This paper provides a set of ideas about design elements of SAS^® macros. This paper is a checklist for programmers who write or test macros.

One of the most common questions about logistic regression is "How do I know if my model fits the data?" There are many approaches to answering this question, but they generally fall into two categories: measures of predictive power (like R-squared) and goodness-of-fit tests (like the Pearson chi-square). This presentation looks first at R-squared measures, arguing that the optional R-squares reported by PROC LOGISTIC might not be optimal. Measures proposed by McFadden and Tjur appear to be more attractive. As for goodness of fit, the popular Hosmer and Lemeshow test is shown to have some serious problems. Several alternatives are considered.
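
Both of these alternative measures are simple to compute once fitted probabilities are available. This Python sketch assumes `p` holds fitted event probabilities from a model that includes an intercept, so that the intercept-only model (which predicts the sample event rate for everyone) is the right baseline for McFadden's measure. The data are hypothetical, and `mcfadden_r2` and `tjur_r2` are illustrative names, not PROC LOGISTIC options.

```python
import math


def mcfadden_r2(y, p):
    """McFadden's R^2 = 1 - lnL(fitted model) / lnL(intercept-only model)."""
    ll_model = sum(math.log(pi) if yi == 1 else math.log(1 - pi)
                   for yi, pi in zip(y, p))
    ybar = sum(y) / len(y)  # intercept-only fit predicts the event rate
    ll_null = sum(math.log(ybar) if yi == 1 else math.log(1 - ybar) for yi in y)
    return 1 - ll_model / ll_null


def tjur_r2(y, p):
    """Tjur's R^2: mean fitted probability for events minus that for non-events."""
    p1 = [pi for yi, pi in zip(y, p) if yi == 1]
    p0 = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(p1) / len(p1) - sum(p0) / len(p0)


y = [1, 1, 0, 0]             # hypothetical observed outcomes
p = [0.8, 0.6, 0.3, 0.1]     # hypothetical fitted probabilities
print(round(mcfadden_r2(y, p), 3))  # 0.569
print(round(tjur_r2(y, p), 3))      # 0.5
```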

Mobile devices are taking over conventional ways of sharing and presenting information in today's businesses and working environments. Accessibility to this information is a key factor for companies and institutions seeking to reach wider audiences more efficiently. SAS^® software provides a powerful set of tools that allows developers to fulfill the increasing demand for mobile reporting without needing to upgrade to the latest version of the platform. Here at the University of Central Florida (UCF), we were able to create reports targeting our executive-level iPad consumers by using the SAS^® 9.2 Enterprise Business Intelligence environment, specifically SAS^® Web Report Studio 4.3. These reports provide them with the relevant data for their decision-making process. At UCF, the goal is to provide executive consumers with reports that fit on one screen, to avoid the need for scrolling, and that are easily exportable to PDF. This responds to their increasing use of portable technology to share sensitive data in a timely manner. The technical challenge is to provide specific data to those executive users requesting access through their iPad devices. Compatibility issues arise but are successfully bypassed. We are able to provide reports that fit on one screen and that can be opened as a PDF if needed. These enhanced capabilities were requested and well received by our users. This paper presents techniques we use to create mobile reports.

This paper considers the %MRE macro for computing multivariate ratio estimates. We also use PROC REG to compute multivariate regression estimates and show that the regression estimates are superior to the ratio estimates.

It is often the case that parameters in a predictive model should be restricted to an interval that is either reasonable or necessary given the model's application. A simple and classic example of such a restriction is a regression model that requires all parameters to be nonnegative. In the case of multiple least squares (MLS) regression, the resulting model is therefore strictly additive and, in certain applications, not only appropriate but also intuitive. This special case of an MLS model is commonly referred to as nonnegative least squares regression. While SAS^® software contains a multitude of ways to perform a multiple least squares regression (PROC REG and PROC GLM, to name two), there is no native SAS^® procedure for nonnegative least squares regression. The author offers a concise way to conduct the nonnegative least squares analysis by using PROC NLIN (the nonlinear regression procedure), which allows the user to place bounds on parameter estimates. By fashioning a linear model in the framework of a nonlinear procedure, the desired result can be achieved. As an additional corollary, the author shows how to calculate the _RSQUARE_ statistic for the resulting model, which is left out of the PROC NLIN output because it is invalid in most cases (though not this one).
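
The estimation problem itself can be illustrated outside SAS. This Python sketch swaps in a different technique from the PROC NLIN approach: cyclic coordinate descent, in which each coefficient is set to its least squares value given the others and then clipped at zero. The function names and data are hypothetical, and the R-squared helper assumes the design matrix includes an intercept column, so that the corrected total sum of squares is the right baseline.

```python
def nnls_coordinate_descent(X, y, sweeps=500, tol=1e-12):
    """Minimize ||y - X b||^2 subject to b >= 0 by cyclic coordinate descent.

    X is a list of rows. Each update is the exact one-coordinate least squares
    step, projected onto b_j >= 0. Returns (coefficients, residuals).
    """
    n, k = len(X), len(X[0])
    b = [0.0] * k
    resid = list(y)  # residuals y - X b for the starting point b = 0
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(k)]
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(k):
            if col_ss[j] == 0:
                continue  # an all-zero column cannot be estimated
            grad = sum(X[i][j] * resid[i] for i in range(n))
            new = max(0.0, b[j] + grad / col_ss[j])
            delta = new - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b, resid


def r_squared(y, resid):
    """1 - SSE / corrected total SS (valid here: linear model with an intercept)."""
    ybar = sum(y) / len(y)
    sse = sum(r * r for r in resid)
    sst = sum((v - ybar) ** 2 for v in y)
    return 1 - sse / sst


# Hypothetical design: intercept, x1, x2 (columns are mutually orthogonal)
X = [[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]]
y = [4, 2, 8, 6]
b, resid = nnls_coordinate_descent(X, y)
print(b)                     # [5.0, 2.0, 0.0]
print(r_squared(y, resid))   # approximately 0.8
```

An unconstrained fit to these data gives x2 a coefficient of -1; the nonnegativity constraint pushes it to 0, and the other coefficients are unchanged only because the columns here are orthogonal.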

PROC TABULATE is a powerful tool for creating tabular summary reports. Its advantages over PROC REPORT are that it requires less code, allows for more convenient table construction, and uses syntax that makes it easier to modify a table's structure. However, its inability to compute the sum, difference, product, and ratio of column sums has hindered its use in many circumstances. This paper illustrates and discusses some creative approaches and methods for overcoming these limitations, enabling users to produce needed reports and still enjoy the simplicity and convenience of PROC TABULATE. These methods and skills can have prominent applications in a variety of business intelligence and analytics fields.

The linear logistic test model (LLTM), which incorporates cognitive task characteristics into the Rasch model, has been widely used for various purposes in educational contexts. However, the LLTM assumes that the variance of item difficulties is completely accounted for by cognitive attributes. To overcome this limitation, Janssen and colleagues (2004) proposed the crossed random-effects (CRE) LLTM, which adds an error term to item difficulty. This study examines the accuracy and precision of the CRE-LLTM in terms of parameter estimation for cognitive attributes. The effects of different factors (for example, sample size, population distributions, sparse or dense matrices, and test length) are examined. PROC GLIMMIX was used for the analysis, and SAS/IML^® software was used to generate the data.
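
In one common parameterization (logit scale; persons indexed by p, items by i; q_{ik} the weight of cognitive attribute k for item i; eta_k the difficulty contribution of attribute k), the three models differ only in how item difficulty beta_i is specified:

```latex
\begin{aligned}
\text{Rasch:}\quad & \operatorname{logit} P(Y_{pi} = 1) = \theta_p - \beta_i,
  \qquad \theta_p \sim N(0, \sigma_\theta^2) \\
\text{LLTM:}\quad & \beta_i = \sum_{k=1}^{K} q_{ik}\,\eta_k \\
\text{CRE-LLTM:}\quad & \beta_i = \sum_{k=1}^{K} q_{ik}\,\eta_k + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2)
\end{aligned}
```

Setting the item residual variance to zero recovers the LLTM. Because both theta_p and epsilon_i are random, persons and items form crossed random effects, which is why the model can be fit as a generalized linear mixed model.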

The SQL procedure contains many powerful and elegant language features for intermediate and advanced SQL users. This presentation discusses topics that will help SAS^® users unlock the many powerful features, options, and other gems found in the SQL universe. Topics include CASE logic; a sampling of summary (statistical) functions; dictionary tables; PROC SQL and the SAS macro language interface; joins and join algorithms; PROC SQL statement options _METHOD, MAGIC=101, MAGIC=102, and MAGIC=103; and key performance (optimization) issues.

Rising sea levels are a potential problem affecting the human race and marine ecosystems. Many models are being developed to identify the factors responsible. In this research, the Memory-Based Reasoning model appears more effective than most other models, because it uses the solutions of previous cases to predict solutions for new cases. The data were collected from NASA and contain 1,072 observations and 10 variables, such as carbon dioxide emissions, temperature, and other contributing factors like electric power consumption and the total number of industries established. Results of Memory-Based Reasoning models (using the RD tree and scan methods) are compared with those of neural networks, decision trees, and logistic regression. Fit statistics, such as misclassification rate and average squared error, are used to evaluate model performance. This analysis is used to predict the rise in sea levels in the near future and to take the necessary actions to protect the environment from global warming and natural disasters.
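
Memory-based reasoning in SAS Enterprise Miner is a k-nearest-neighbor method: a new case is classified according to the outcomes of the most similar stored cases. This Python sketch uses a brute-force neighbor search, Euclidean distance, and majority voting. The data, class labels, and the `knn_predict` name are hypothetical. In practice, inputs should be standardized first so that no single variable dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest stored cases (Euclidean)."""
    # Distance from x to every stored case, nearest first
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stored cases with two inputs and a categorical outcome
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_X, train_y, [0.5, 0.5]))  # low
print(knn_predict(train_X, train_y, [5.5, 5.5]))  # high
```

An odd k avoids tied votes when there are two classes; tree-based neighbor searches mainly speed up the lookup rather than change the prediction rule.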

Introduction to the PAR Framework, a non-profit, member-driven collaborative for student success providing affordable predictive analytics, innovative benchmark reports, and intervention assessment tools to colleges and universities nationwide.

Many SAS^® procedures use classification variables when they are processing the data. These variables control how the procedure forms groupings, summarizations, and analysis elements. For statistics procedures, they are often used in the formation of the statistical model that is being analyzed. Classification variables can be explicitly specified with a CLASS statement, or they can be specified implicitly from their usage in the procedure. Because classification variables have such a heavy influence on the outcome of so many procedures, it is essential that the analyst have a good understanding of how classification variables are applied. Certainly there are a number of options (system and procedural) that affect how classification variables behave. While you may be aware of some of these options, a great many are new, and some of these new options and techniques are especially powerful. You really need to be open to learning how to program with CLASS.

Can you juggle? Maybe. Can you shuffle a deck of cards? Probably. Can you do both at the same time? Welcome to the world of SAS^® and LSF! Very few SAS administrators start out learning LSF at the same time they learn SAS; most already know SAS, possibly starting out as a programmer or analyst, but now have to step up to an enterprise platform with shared resources. The biggest challenge on an enterprise platform? How to share! How do you maximize the utilization of a SAS platform, yet still ensure that everyone gets their fair share? This presentation boils down the 2000+ pages of LSF documentation to provide an introduction to various LSF concepts (hosts, clusters, nodes, queues, first-come-first-served scheduling, and fairshare) and various configuration settings (UJOB_LIMIT, PJOB_LIMIT, and so on). It also offers some insight into where all these settings are configured: which are set up by the installation process, and which can be configured by the SAS or LSF administrator. This session is definitely NOT for experts. It is for those who are about to step into an enterprise deployment of SAS and want to understand how the SAS server sessions they know so well can run on a shared platform.

Duration and severity data arise in several fields, including biostatistics, demography, economics, engineering, and sociology. The SAS^® procedures LIFETEST, LIFEREG, and PHREG are the workhorses for analysis of time-to-event data in biostatistics applications. Similar methods apply to the magnitude or severity of a random event, where the outcome might be right, left, or interval censored and/or right or left truncated. All combinations of types of censoring and truncation could be present in the data set. Regression models such as the accelerated failure time model, the Cox model, and the non-homogeneous Poisson model have extensions to address time-varying covariates in the analysis of clustered outcomes, multivariate outcomes of mixed types, and recurrent events. We present an overview of new capabilities that are available in the procedures QLIM, QUANTLIFE, RELIABILITY, and SEVERITY with examples illustrating their application using empirical data sets drawn from easily accessible sources.

In healthcare, we often express our analytics results as being "adjusted." For example, you might have read a study in which the authors reported the data as age-adjusted or risk-adjusted. The concept of adjustment is widely used in program evaluation, comparing quality indicators across providers and systems, forecasting incidence rates, and in cost-effectiveness research. In order to make reasonable comparisons across time, place, or population, we need to account for small sample sizes and case-mix variation; in other words, we need to level the playing field and account for differences in health status and for uniqueness in a given population. If you are new to healthcare, what it really means to adjust the data in order to make comparisons might not be obvious. In this paper, we explore the methods by which we control for potentially confounding variables in our data. We do so through a series of examples from the healthcare literature in both primary care and health insurance. In this survey of methods, we discuss the concepts of rates and how they can be adjusted for demographic strata (such as age, gender, and race), as well as health risk factors such as case mix.
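
Direct standardization, one of the rate-adjustment methods surveyed here, applies each population's stratum-specific rates to a common standard population. This Python sketch uses hypothetical rates and population counts for two age strata; it does not cover indirect standardization or model-based risk adjustment, and `weighted_rate` is an illustrative name.

```python
def weighted_rate(stratum_rates, pop_counts):
    """Average of stratum-specific rates, weighted by each stratum's share of pop_counts."""
    if len(stratum_rates) != len(pop_counts):
        raise ValueError("need one population count per stratum")
    total = sum(pop_counts)
    return sum(rate * n / total for rate, n in zip(stratum_rates, pop_counts))


# Hypothetical age-specific rates per 1,000 people: [young, old]
rates_a, pop_a = [2.0, 20.0], [900, 100]   # population A is mostly young
rates_b, pop_b = [3.0, 18.0], [100, 900]   # population B is mostly old
standard = [500, 500]                      # hypothetical standard population

# Crude rate: each population's rates weighted by its own age mix
crude_a, crude_b = weighted_rate(rates_a, pop_a), weighted_rate(rates_b, pop_b)
# Directly age-adjusted rate: both weighted by the same standard population
adj_a, adj_b = weighted_rate(rates_a, standard), weighted_rate(rates_b, standard)

print(round(crude_a, 1), round(crude_b, 1))  # 3.8 16.5
print(adj_a, adj_b)                          # 11.0 10.5
```

Population B's crude rate is far higher only because it is older; once both are weighted by the same age distribution, B's adjusted rate is slightly lower than A's.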

Spinal epidural abscess (SEA) is a serious complication in hemodialysis (HD) patients, yet there is little medical literature that discusses it. This analysis identified risk factors and co-morbidities associated with SEA, as well as risk factors for mortality following the diagnosis. All incident HD cases from the United States Renal Data System for calendar years 2005-2008 were queried for a diagnosis of SEA. Potential clinical covariates, survival, and risk factors were recovered using ICD-9 diagnosis codes. Log-binomial regressions were performed using PROC GENMOD to assess the relative risks, and Cox regression models were run using PROC PHREG to estimate hazard ratios for mortality. For the 4-year study period, 660 of 355,084 HD patients (0.19%) were identified with SEA, the largest cohort to date. Older age (RR=1.625), infectious comorbidities including bacteremia (RR=7.7976), methicillin-resistant Staphylococcus aureus infection (RR=2.6507), hepatitis C (RR=1.545), and non-infectious factors including diabetes (RR=1.514) and presence of vascular catheters (RR=1.348) were identified as significant risk factors for SEA. SEA in HD patients was associated with an increased risk of death (HR=1.20). Older age (HR=2.269), the presence of dialysis catheters (HR=1.884), cirrhosis (HR=1.715), decubitus ulcers (HR=1.669), bacteremia (HR=1.407), and total parenteral nutrition (HR=1.376) constitute the greatest risk factors for death after SEA diagnosis and thus necessitate a comprehensive approach to management.

The high school dropout problem has been called a national crisis (Heppen & Therriault, 2008). Almost one-third of all high school students leave the public school system before graduating (Swanson, 2004), and the problem is particularly severe among minority students (Greene & Winters, 2005; U.S. Department of Education, 2006). Educators, researchers, and policymakers continue to work to identify effective dropout prevention strategies. One effective approach is to identify high-risk students at an early stage, and then provide corresponding interventions to keep them in school. One of the strengths of Educational Data Mining is its ability to reveal hidden patterns and predict future performance by analyzing accessible student data. The predictive algorithms generated by such modeling can serve as an early warning system. However, because individual schools and districts have various combinations of race, gender, and socioeconomic status, we cannot use a set of standardized predictors and obtain satisfactory predictive results. Analyzing a limited number of variables and limited historical data does not generate accurate models. Additionally, the predictive model might not consider interactions among predictors. The strength of data mining is the capability to analyze a large amount of data and variables. Multiple analytic strategies (including model comparisons) can be applied to maximize model performance. As a future goal, we propose a debuted data mining framework to construct an early warning and trend analysis system with components of data warehousing, data mining, and reporting at the levels of individual students, schools, school districts, and the entire state.

This discussion uses SAS^® Office Analytics as an example to demonstrate the importance of preparing for the SAS^® installation. There are many nuances as well as requirements that need to be addressed before you do an installation. These requirements are basically similar, yet they differ according to the target installation operating system. In other words, there are some differences in preparation routines for Windows and *Nix flavors. Our discussion focuses on these three topics: 1. Pre-installation considerations such as sizing, storage, proper credentials, and third-party requirements; 2. Installation steps and requirements; and 3. Post-installation configuration. In addition to preparation, this paper also discusses potential issues and pitfalls to watch out for, as well as best practices.

Universities in the UK are now subject to League Table reporting by a range of providers. The criteria used by each League Table differ. Universities, their faculties, and individual subject areas want to understand how the different tables are constructed and calculated, and what is required to maximize their position in each league table, in order to attract the best students to their institution and thereby maximize recruitment and student-related income streams. The School of Computing and Maths at the University of Derby is developing the use of SAS^® Visual Analytics to analyse each league table, to provide actionable insights into steps that can be taken to improve their relative standing, and to gain insights into feasible target levels relative to peer groups of institutions. This paper outlines the approaches taken and some of the critical insights developed that will be of value to other higher education institutions in the UK, and suggests useful approaches that might be valuable in other countries.

Using Lilypond typesetting software, you can write publication-grade music scores. The input for Lilypond is a text file that can be written once and then transferred to SAS^® for patterned repetition, so that you can cycle through patterns that occur in music. The author plays a sequence of notes and then writes it in Lilypond code. The sequence starts in the key of C with only a two-note sequence. Then the sequence is extended to three-, four-, and then five-note sequences, always contained within one octave. SAS is then used to write the same code for all eleven other keys and in seven scale modes. The method is very simple and does not require advanced programming. Lookup files are used in the programming, demonstrating efficient lookup techniques. The result is a lengthy book of practice exercises in a PDF file, along with a sound file in MIDI format that you can listen to. This method shows how one programming language can be used to write programs in another.

Statistical mediation analysis is common in business, social sciences, epidemiology, and related fields because it explains how and why two variables are related. For example, mediation analysis is used to investigate how product presentation affects liking the product, which then affects the purchase of the product. Likewise, mediation analysis can evaluate the mechanism by which a health intervention changes norms that then change health behavior. Mediation analysis methodology is an active area of research. Some recent research in statistical mediation analysis focuses on extracting accurate information from small samples by using Bayesian methods. The Bayesian framework offers an intuitive solution to mediation analysis with small samples; namely, incorporating prior information into the analysis when there is existing knowledge about the expected magnitude of mediation effects. Using diffuse prior distributions with no prior knowledge allows researchers to reason in terms of probability rather than in terms of (or in addition to) statistical power. Using SAS^® PROC MCMC, researchers can choose one of two simple and effective methods to incorporate their prior knowledge into the statistical analysis, and can obtain the posterior probabilities for quantities of interest such as the mediated effect. This project presents four examples of using PROC MCMC to analyze a single mediator model with real data using: (1) diffuse prior information for each regression coefficient in the model, (2) informative prior distributions for each regression coefficient, (3) a diffuse prior distribution for the covariance matrix of variables in the model, and (4) an informative prior distribution for the covariance matrix.
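
For reference, the single mediator model is commonly written as two regression equations, with the mediated effect defined as the product of coefficients. In a Bayesian analysis, a posterior sample for the mediated effect can be formed by multiplying the sampled a and b values draw by draw (s indexes the S retained MCMC draws):

```latex
\begin{aligned}
M &= i_1 + aX + e_1 \\
Y &= i_2 + c'X + bM + e_2 \\
\text{mediated effect} &= ab, \qquad (ab)^{(s)} = a^{(s)}\, b^{(s)}, \quad s = 1, \dots, S
\end{aligned}
```

Summaries of the draws (ab)^{(s)}, such as the posterior mean, credible intervals, or the proportion of draws above zero, then give the posterior probabilities of interest for the mediated effect.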

Linear regression is widely used in the social and medical sciences to model the association between a continuous outcome and explanatory variables. Assessing the model assumptions, such as linearity, normality, and equal variance, is a critical step in choosing the best regression model. If any of the assumptions are violated, you can apply different strategies to improve the model, such as transforming variables or fitting a spline model. SAS^® is commonly used to assess and validate a postulated model, and SAS^® 9.3 provides many new features, such as ODS Statistical Graphics, that increase efficiency and flexibility in developing and analyzing regression models. This paper demonstrates the steps needed to find the best linear regression model in SAS 9.3 in scenarios where both variable transformation and a spline model are applicable. A simulated data set is used to demonstrate the model-building steps. The paper also discusses the key parameters to consider when evaluating model performance in terms of accuracy and efficiency.

SAS^® OLAP technology is used to organize and present summarized data for business intelligence applications. It features flexible options for creating and storing aggregations to improve performance and brings a powerful multi-dimensional approach to querying data. This paper focuses on managing the security features available for OLAP cubes through the combination of SAS metadata and MDX logic.

Universities strive to be competitive in the quality of education as well as the cost of attendance. Peer institutions are selected to make comparisons pertaining to academics, costs, and revenues. These comparisons lead to strategic decisions and long-range planning to meet goals. Comparable institutions can be identified with cluster analysis, a statistical technique that places universities with similar characteristics into groups, or clusters. A process for determining peer universities is illustrated using PROC STANDARD, PROC FASTCLUS, and PROC CLUSTER.
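The standardize-then-cluster workflow can be sketched in plain Python: each variable is rescaled to mean 0 and standard deviation 1 (the role PROC STANDARD plays with MEAN=0 STD=1), and the standardized rows are grouped with basic k-means (Lloyd's algorithm). PROC FASTCLUS is a k-means-type method, but its seed selection and update rules differ from this sketch; the toy university data and the choice of k = 2 are hypothetical.

```python
import math
import random


def standardize(rows):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    # A constant column carries no information, so it is mapped to 0.0.
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
            for row in rows]


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means; returns a cluster label (0..k-1) for each point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical institutions: (tuition in dollars, enrollment).
universities = [[10000, 5000], [11000, 5200], [10500, 4800],
                [40000, 20000], [42000, 21000], [41000, 19500]]
print(kmeans(standardize(universities), k=2))
```

Standardizing first matters because distance-based clustering would otherwise be dominated by whichever variable has the largest numeric range.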

The Purchasing Department manager is considering contracting with your team for a new SAS^® Enterprise BI application. He has already met with SAS^® and seen the sales pitch, and he is very interested. But the manager is a tightwad and is not sure about spending the money. He also wants his team to be the primary developers of the new application. Before investing his money in training, programming, and support, he would like a proof of concept (POC). This paper walks you through the seven steps to create a SAS Enterprise BI POC project: Hold a kick-off meeting that includes a full demo of the SAS Enterprise BI tools. Set up your UNIX file systems and security. Set up your SAS metadata ACTs, users, groups, folders, and libraries. Make sure the necessary SAS client tools are installed on the developers' machines. Hold a SAS Enterprise BI workshop to introduce them to the basics, including SAS^® Enterprise Guide^®, SAS^® Stored Processes, SAS^® Information Maps, SAS^® Web Report Studio, SAS^® Information Delivery Portal, and SAS^® Add-In for Microsoft Office, along with supporting documentation. Work with them to develop a simple project, one that highlights the benefits of SAS Enterprise BI and shows several methods for achieving the desired results. Last but not least, follow up! Remember, your goal is not to launch a full-blown application. Instead, we'll strive to help them see the potential for applying this methodology in your organization.

SAS^® continues to expand and improve its reporting capability. With the new SAS^® 9.4 enhancements to ODS (Output Delivery System), the opportunities for creating stunning reports have expanded even further. If you are charged with creating relevant, informative, easy-to-read reports for clients or administrators, then the ODS Report Writing Interface, the ODS LAYOUT enhancements, and the new ODSTEXT procedure are important tools to use. These tools let you create reports in a smart, eye-catching format that can be turned around quickly and programmed for maximum flexibility. How many times have you spent hours tweaking and fine-tuning a report directly in Microsoft Excel, Microsoft Word, Microsoft PowerPoint, or similar software, only to be asked for a quick update that then takes hours to recreate because you are transferring the data manually? Do you ever dread receiving the compliment 'This is really wonderful information!!!!' because you know it will be followed by 'Can you run this for EVERY region?' Well, dread no more: when you harness the power of SAS^® ODS, you can create first-rate, flexible, fabulous reports! Join me as I share two real-world examples of ODS capabilities: (1) a marketing piece I designed to help the president of our university spotlight county- and region-specific data as he recruited across the state, and (2) our academic program review form, a multi-page report that outputs to Word so that program coordinators can add personalized commentary to support their program's effectiveness.

With the smartphone and mobile app market developing so rapidly, expectations about the effectiveness of mobile applications are high. Marketers and app developers need to analyze the large amount of data available well before an app's release, not only to market the app better but also to avoid costly mistakes. The purpose of this poster is to build models that predict the success of an app to be released in a particular category. Data was collected from https://play.google.com/store for 540 Android apps in the 'Top free newly released apps' category. The SAS^® Enterprise Miner^™ Text Mining node and SAS^® Sentiment Analysis Studio are used to parse and tokenize the collected customer reviews and to calculate the average customer sentiment score for each app. Linear regression, neural network, and AutoNeural network models are built to predict the rank of an app, using average rating, number of installations, total number of reviews, number of 1- to 5-star ratings, app size, category, content rating, and average customer sentiment score as independent variables. The linear regression model with the lowest average squared error is selected as the best model, and the number of installations and the app's maturity content rating are identified as significant model variables. App category, user reviews, and average customer sentiment score are also identified as important variables in deciding the success of an app. The poster summarizes app success trends across various factors and introduces a new SAS^® macro, %getappdata, which we developed for web crawling and text parsing.

'Can I have that in Excel?' This is a request that makes many of us shudder. Now your boss has discovered Microsoft Excel pivot tables. Unfortunately, he has not discovered how to make them. So you get to extract the data, massage the data, put the data into Excel, and then spend hours rebuilding pivot tables every time the corporate data is refreshed. In this workshop, you learn to be the armchair quarterback and build pivot tables without leaving the comfort of your SAS^® environment. You learn the basics of Excel pivot tables and, through a series of exercises, how to augment basic pivot tables, first in Excel and then using SAS. No prior knowledge of Excel pivot tables is required.

For decades, SAS^® has been the cornerstone of many organizations for business reporting. In more recent times, the ability to quickly determine the performance of an organization through the use of dashboards has become a requirement. Different ways of providing dashboard capabilities are discussed in this paper: using out-of-the-box solutions such as SAS^® Visual Analytics and SAS^® BI Dashboard, through to alternative solutions using SAS^® Stored Processes, batch processes, and SAS^® Integration Technologies. Extending the available indicators is also discussed, using Graph Template Language and KPI indicators provided with Base SAS^®, as well as alternatives such as Google Charts and Flash objects. Real-world field experience, problem areas, solutions, and tips are shared, along with live examples of some of the different methods.

The FORMAT procedure in SAS^® is a very powerful and productive tool, yet many beginning programmers rarely make use of it. The FORMAT procedure provides a convenient way to do a table lookup in SAS. User-defined formats can be used to assign descriptive labels to data values, create new variables, and find unexpected values. PROC FORMAT can also be used to generate data extracts and to merge data sets. This paper provides an introductory look at PROC FORMAT for the beginning user and provides sample code that illustrates the power of PROC FORMAT in a number of applications. Additional examples and applications of PROC FORMAT can be found in the SAS^® Press book titled 'The Power of PROC FORMAT.'

The SAS^® Enterprise Guide^® Query Builder is one of the most powerful components of the software. It enables a user to bring in data, join, drop and add columns, compute new columns, sort, filter data, leverage the advanced expression builder, change column attributes, and more! This presentation provides an overview of the major features of this powerful tool and how to leverage it every day.

'This is the way I have always done it, and it works fine for me.' Have you heard yourself or others say this when someone suggests a new technique to help solve a problem? Most of us have a set of tricks and techniques from which we draw when starting a new project. Over time we might overlook newer techniques because our old toolkit works just fine. Sometimes we actively avoid new techniques because our initial foray leaves us daunted by the steep learning curve to mastery. For me, the PRX functions and the SAS^® hash object fell into this category. In this workshop, we address possible objections to learning to use the SAS hash object. We start with the fundamentals of setting up the hash object and work through a variety of practical examples to help you master this powerful technique.

Teams across the globe have long applied models to analyze sports data. The film Moneyball generated much hype about how a sports team can use data and statistics to build a winning team. The objective of this poster is to use the model comparison algorithm of SAS^® Enterprise Miner^™ to pick the best model for predicting the outcome of a soccer game. A related goal is to determine which factors influence the result of a game. The data set contains input variables describing each team's offensive and defensive abilities, and the outcome of a game is modeled as the target variable. Using SAS Enterprise Miner, multinomial regression, neural network, decision tree, ensemble, and gradient boosting models are built, and over 100 different versions of these models are run. The data contains statistics from the 2012-13 English Premier League season, in which 20 teams play each other in a home-and-away format. The season has a total of 380 games; the first 283 games are used to predict the outcome of the last 97. The target variable is treated as both a nominal and an ordinal variable with three levels: home win, away win, and tie. The gradient boosting model is the winning model; it appears to predict games with 65% accuracy and identifies factors such as goals scored and ball possession as more important than fouls committed or red cards received.

For a longtime Base SAS^® programmer, whether to use a different application for programming is a constant question when powerful applications such as SAS^® Enterprise Guide^® are available. This paper provides some important tips for programmers, such as the best way to use the code window and how to take advantage of system-generated code in SAS Enterprise Guide 5.1. It also explains the differences between some of the functions and procedures in Base SAS and SAS Enterprise Guide, and it highlights SAS Enterprise Guide features such as process flows, data access management, and report automation, including formatting with XML tagsets.

This paper gives you a better idea of how and where to use record lookup functions to locate observations in which a variable has some characteristic. Various related functions are illustrated for searching numeric and character values. Code is shown with time comparisons. I discuss three possible ways to retrieve records: the SAS^® DATA step, PROC SQL, and Perl regular expressions. Real time and CPU time are highlighted when comparing these methods of retrieving records. Although the program is written for the PC using SAS^® 9.2 in a Windows XP 32-bit environment, all the functions are applicable to any system. All the tools discussed are in Base SAS^®. The typical attendee or reader will have some experience with SAS, but not much experience dealing with large amounts of data.

One of the most striking features separating SAS^® from other statistical languages is that SAS has native SQL (Structured Query Language) capability. Beyond the merging and querying that SAS users commonly apply in daily practice, SQL significantly enhances the power of SAS in descriptive statistics and data management. In this paper, we use reproducible examples to introduce 10 useful tips for the SQL procedure in Base SAS.

SAS^® Add-In for Microsoft Office remains a popular tool for people who are not SAS^® programmers due to its easy interface with the SAS servers. In this session, you'll learn some of the many tricks that other organizations use for getting more value out of the tool.

The independent means t-test is commonly used for testing the equality of two population means. However, this test is very sensitive to violations of the population normality and homogeneity of variance assumptions. In such situations, Yuen's (1974) trimmed t-test is recommended as a robust alternative. The purpose of this paper is to provide a SAS^® macro that allows easy computation of Yuen's symmetric trimmed t-test. The macro output includes a table with trimmed means for each of two groups, Winsorized variance estimates, degrees of freedom, and obtained value of t (with two-tailed p-value). In addition, the results of a simulation study are presented and provide empirical comparisons of the Type I error rates and statistical power of the independent samples t-test, Satterthwaite's approximate t-test, and the trimmed t-test when the assumptions of normality and homogeneity of variance are violated.
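Yuen's statistic can be computed from the trimmed means and Winsorized variances. With trimming proportion gamma, g = floor(gamma * n) observations are removed from each tail, h = n - 2g observations remain, and d = (n - 1) * s_w^2 / (h * (h - 1)), where s_w^2 is the Winsorized variance. Then t = (trimmed mean 1 - trimmed mean 2) / sqrt(d1 + d2), with approximate degrees of freedom (d1 + d2)^2 / (d1^2 / (h1 - 1) + d2^2 / (h2 - 1)). The Python sketch below is not the paper's SAS macro; it assumes 20% trimming by default and omits the p-value, which needs a t-distribution CDF (for example, scipy.stats.t.sf).

```python
import math


def _trimmed_stats(sample, gamma):
    """Trimmed mean, Winsorized variance, effective size h, and Yuen's d for one group."""
    x = sorted(sample)
    n = len(x)
    g = math.floor(gamma * n)   # observations trimmed from each tail
    h = n - 2 * g               # observations left after trimming
    if h < 2:
        raise ValueError("too few observations left after trimming")
    trimmed_mean = sum(x[g:n - g]) / h
    # Winsorize: pull the g smallest up to x[g] and the g largest down to x[n-g-1].
    w = [x[g]] * g + x[g:n - g] + [x[n - g - 1]] * g
    w_mean = sum(w) / n
    w_var = sum((v - w_mean) ** 2 for v in w) / (n - 1)
    d = (n - 1) * w_var / (h * (h - 1))
    return trimmed_mean, w_var, h, d


def yuen_t(group1, group2, gamma=0.2):
    """Yuen's (1974) trimmed-means t statistic and its approximate degrees of freedom."""
    tm1, wv1, h1, d1 = _trimmed_stats(group1, gamma)
    tm2, wv2, h2, d2 = _trimmed_stats(group2, gamma)
    t = (tm1 - tm2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return {"trimmed_means": (tm1, tm2), "winsorized_vars": (wv1, wv2),
            "t": t, "df": df}


print(yuen_t([1, 2, 3, 4, 100], [11, 12, 13, 14, 110]))
```

In the usage line the extreme values 100 and 110 are trimmed away, so they do not inflate the variance estimate as they would in the ordinary t-test.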

The DOW-loop is not official terminology that can be found in SAS^® documentation, but it is well known and widely used among experienced SAS programmers. The DOW-loop was developed over a decade ago by a few SAS gurus, including Don Henderson, Paul Dorfman, and Ian Whitlock. A common construction of the DOW-loop consists of a DO-UNTIL loop with a SET statement and a BY statement inside the loop. This construction isolates the actions performed before and after the loop from the actions within the loop, which eliminates the need to retain newly created variables or reset them to missing in the DATA step. In this talk, in addition to explaining the DOW-loop construction, we review how to apply the DOW-loop in various applications.

You have built the simple bar chart and mastered the art of layering multiple plot statements to create complex graphs like the Survival Plot using the SGPLOT procedure. You know all about how to use plot statements creatively to get what you need and how to customize the axes to achieve the look and feel you want. Now it's time to up your game and step into the realm of the Graphics Wizard. Behold the magical powers of Graph Template Language Layouts! Here you will learn the esoteric art of creating complex multi-cell graphs using LAYOUT LATTICE. This is the incantation that gives you the power to build complex, multi-cell graphs like the Forest plot, Stock plots with multiple indicators like MACD and Stochastics, Adverse Events by Relative Risk graphs, and more. If you ever wondered how the Diagnostics panel in the REG procedure was built, this paper is for you. Be warned, this is not the realm for the faint of heart!

Regression is a helpful statistical tool for showing relationships between two or more variables. However, many users can find the barrage of numbers at best unhelpful, and at worst undecipherable. Using the shipments and inventories historical data from the U.S. Census Bureau's office of Manufacturers' Shipments, Inventories, and Orders (M3), we can create a graphical representation of two time series with PROC GPLOT and map out reported and expected results. By combining this output with results from PROC REG, we are able to highlight problem areas that might need a second look. The resulting graph shows which dates have abnormal relationships between our two variables and presents the data in an easy-to-use format that even users unfamiliar with SAS^® can interpret. This graph is ideal for analysts looking for problem areas such as outliers and trend breaks, and for managers who need to quickly discern complications and their effect on overall results.
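One simple version of the "second look" rule is to fit a least-squares line of one series on the other and flag dates whose residual exceeds k times the root mean squared error. The Python sketch below is an assumption-laden stand-in for the PROC REG step, not the paper's method: the threshold k = 2, the use of raw rather than studentized residuals, and the toy series in the usage lines are all hypothetical.

```python
import math


def flag_abnormal(dates, x, y, k=2.0):
    """Fit y = a + b*x by least squares; return (date, residual) pairs with |residual| > k*RMSE."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Root mean squared error with n - 2 degrees of freedom (two fitted parameters).
    rmse = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return [(d, r) for d, r in zip(dates, resid) if abs(r) > k * rmse]


# Hypothetical series: y tracks 2x + 1 except for one shifted point at x = 7.
x = list(range(1, 11))
y = [2 * v + 1 for v in x]
y[6] += 10
dates = [f"t{v}" for v in x]
print(flag_abnormal(dates, x, y))
```

In practice, studentized residuals from PROC REG would be a more defensible flagging criterion, because a single large outlier also inflates the RMSE used as the yardstick here.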

The new Markov chain Monte Carlo (MCMC) procedure, introduced in SAS/STAT^® 9.2 and further enhanced in SAS/STAT^® 9.3, enables Bayesian computations to run efficiently in SAS^®. The MCMC procedure allows you to carry out complex statistical modeling within a Bayesian framework across a wide spectrum of scientific research; in psychometrics, for example, the estimation of item and ability parameters is one such application. This paper describes how to use PROC MCMC for Bayesian inference of item and ability parameters under a variety of popular item response models. It also covers how the results from SAS PROC MCMC differ from or resemble the results from WinBUGS. For those interested in the Bayesian approach to item response modeling, shifting to SAS is exciting and beneficial because of its flexibility in data management and its power in data analysis. Using the resulting item parameter estimates, you can continue with test form construction, test equating, and so on, with all of these test development processes accomplished in SAS!

Especially in this current financial climate, many of us are being asked to do more with less. For several years, the Office of Institutional Research and Testing at Baylor University has been using SAS^® software to increase the efficiency of the office and of the University as a whole. Reports that were once prepared manually have been automated. Data quality processes have been implemented in order to reduce the number of duplicate mailings. Predictive modeling is used to focus recruiting efforts on those prospective students most likely to respond. A web-based portal has been created to provide self-service report generation for many administrators across campus. Along with this, a number of data processing functions have been centralized, eliminating the need for additional programming skills and software support. This presentation discusses these improvements in more detail and provides examples of the end results.

When reading data files or writing SAS^® programs, we are often hunting for the right format or informat. There are so many to choose from! Are there too many to search for in the manual? Let SAS help find the right one! We use the SAS dictionary table VFORMAT and a very small SAS program. This presentation demonstrates how two simple functions unlock the potential of this great resource: SASHELP.VFORMAT.

Volatility estimation plays an important role in the fields of statistics and finance. Many different techniques address the problem of estimating the volatility of financial assets. Autoregressive conditional heteroscedasticity (ARCH) models and the related generalized ARCH (GARCH) models are popular models for volatility. This talk will introduce the need for volatility modeling as well as the framework of ARCH and GARCH models. The structure of ARCH and GARCH models will then be briefly discussed and compared with other volatility modeling techniques.
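The GARCH(1,1) model makes the conditional variance a recursion, sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2, and ARCH(1) is the special case beta = 0. The Python sketch below only filters variances for given parameter values; it does not estimate them (in SAS that would be done with a procedure such as PROC AUTOREG). Treating the inputs as mean-zero return shocks and starting the recursion at the unconditional variance omega / (1 - alpha - beta) are assumptions.

```python
def garch11_variance(shocks, omega, alpha, beta):
    """Conditional variances sigma_t^2 of a GARCH(1,1) model for mean-zero shocks eps_t.

    Element t is the variance of shocks[t] given information up to t-1.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    # Start at the unconditional (long-run) variance of a stationary GARCH(1,1).
    sigma2 = [omega / (1 - alpha - beta)]
    for eps in shocks[:-1]:
        # Yesterday's squared shock (ARCH term) plus yesterday's variance (GARCH term).
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


print(garch11_variance([0.5, -1.2, 0.3, 2.0], omega=0.05, alpha=0.1, beta=0.85))
```

A large shock raises the next period's variance through alpha, and beta controls how slowly that elevated variance decays, which is how GARCH captures volatility clustering.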

Missing observations caused by dropouts or skipped visits present a problem in studies of longitudinal data. When the analysis is restricted to complete cases and the missing data depend on previous responses, the generalized estimating equation (GEE) approach, which is commonly used when the population-average effect is of primary interest, can lead to biased parameter estimates. The new GEE procedure in SAS/STAT^® 13.2 implements a weighted GEE method, which provides consistent parameter estimates when the dropout mechanism is correctly specified. When none of the data are missing, the method is identical to the usual GEE approach, which is available in the GENMOD procedure. This paper reviews the concepts and statistical methods. Examples illustrate how you can apply the GEE procedure to incomplete longitudinal data.
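The idea behind the weighting can be sketched for monotone dropout: each observed response is weighted by the inverse of the probability that the subject is still in the study at that visit, and that probability is the product of the conditional probabilities of not dropping out at each earlier visit. The Python sketch below assumes these conditional probabilities are already available (in practice they would come from a fitted dropout model, such as a logistic regression on previous responses) and that the first visit is always observed. It illustrates the weights only, not how PROC GEE computes them internally.

```python
def ipw_weights(stay_probs):
    """Inverse-probability weights for one subject's visits under monotone dropout.

    stay_probs[t-1] is the assumed probability that the subject is still observed
    at visit t, given that it was observed at visit t-1 (visits t = 1, 2, ...).
    Visit 0 is assumed always observed, so its weight is 1.
    """
    weights = [1.0]
    cumulative = 1.0  # probability of still being observed at the current visit
    for p in stay_probs:
        if not 0 < p <= 1:
            raise ValueError("probabilities must be in (0, 1]")
        cumulative *= p
        weights.append(1.0 / cumulative)
    return weights


# Hypothetical fitted probabilities of staying in the study at visits 1 and 2.
print(ipw_weights([0.8, 0.5]))
```

Subjects who remain despite a low probability of staying receive large weights, so they stand in for similar subjects who dropped out; this is what removes the bias of a complete-case GEE analysis when the dropout model is correct.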