SAS Global Forum 2014 Proceedings

Nowadays, most corporations build and maintain their own data warehouse, and an ETL (Extract, Transform, and Load) process plays a critical role in managing the data. Some people might create a large program and execute this program from top to bottom. Others might generate a SAS^® driver with several programs included, and then execute this driver. If some programs can be run in parallel, then developers must write extra code to handle these concurrent processes. If one program fails, then users can either rerun the entire process or comment out the successful programs and resume the job from where the program failed. Usually the programs are deployed in production with read and execute permission only. Users do not have the priviledge of modifying codes on the fly. In this case, how do you comment out the programs if the job terminated abnormally? This paper illustrates an approach for managing ETL process flows. The approach uses a framework based on SAS, on a UNIX platform. This is a high-level infrastructure discussion with some explanation of the SAS codes that are used to implement the framework. The framework supports the rerun or partial run of the entire process without changing any source codes. It also supports the concurrent process, and therefore no extra code is needed.

Have you ever wished that with one click you could copy any SAS^® data set, including variable names, so that you could paste the text into a Microsoft Word file, Microsoft PowerPoint slide, or spreadsheet? You can and, with just Base SAS^®, there are some little-known but easy-to use methods that are available for automating many of your (or your users ) common tasks.

SAS^® functions provide amazing power to your DATA step programming. Some of these functions are essential some of them save you writing volumes of unnecessary code. This paper covers some of the most useful SAS functions. Some of these functions might be new to you, and they will change the way you program and approach common programming tasks.

Accelerated testing is an effective tool for predicting when systems fail, where the system can be as simple as an engine gasket or as complex as a magnetic resonance imaging (MRI) scanner. In particular, you might conduct an experiment to determine how factors such as temperature and voltage impose enough stress on the system to cause failure. Because system components usually meet nominal quality standards, it can take a long time to obtain failure data under normal-use conditions. An effective strategy is to accelerate the experiment by testing under abnormally stressful conditions, such as higher temperatures. Following that approach, you obtain data more quickly, and you can then analyze the data by using the RELIABILITY procedure in SAS/QC^® software. The analysis is a three-step process: you establish a probability model, explore the relationship between stress and failure, and then extrapolate to normal-use conditions. Graphs are a key component of all three stages: you choose a model by comparing residual plots from candidate models, use graphs to examine the stress-failure relationship, and then use an appropriately scaled graph to extrapolate along a straight line. This paper guides you through the process, and it highlights features added to the RELIABILITY procedure in SAS/QC 13.1.

Finding groups with similar attributes is at the core of knowledge discovery. To this end, Cluster Analysis automatically locates groups of similar observations. Despite successful applications, many practitioners are uncomfortable with the degree of automation in Cluster Analysis, which causes intuitive knowledge to be ignored. This is more true in text mining applications since individual words have meaning beyond the data set. Discovering groups with similar text is extremely insightful. However, blind applications of clustering algorithms ignore intuition and hence are unable to group similar text categories. The challenge is to integrate the power of clustering algorithms with the knowledge of experts. We demonstrate how SAS/STAT^® 9.2 procedures and the SAS^® Macro Language are used to ensemble the opinion of domain experts with multiple clustering models to arrive at a consensus. The method has been successfully applied to a large data set with structured attributes and unstructured opinions. The result is the ability to discover observations with similar attributes and opinions by capturing the wisdom of the crowds whether man or model.

The proliferation of textual data in business is overwhelming. Unstructured textual data is being constantly generated via call center logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, and so on. While the amount of textual data is increasing rapidly, businesses ability to summarize, understand, and make sense of such data for making better business decisions remain challenging. This presentation takes a quick look at how to organize and analyze textual data for extracting insightful customer intelligence from a large collection of documents and for using such information to improve business operations and performance. Multiple business applications of case studies using real data that demonstrate applications of text analytics and sentiment mining using SAS^® Text Miner and SAS^® Sentiment Analysis Studio are presented. While SAS^® products are used as tools for demonstration only, the topics and theories covered are generic (not tool specific).

In randomized experiments, it is generally assumed that the hierarchical structures and variances are the same in the treatment and control groups. In some situations, however, these structures and variance components can differ. Consider a randomized experiment in which individuals randomized to the treatment condition are further assigned to clusters in which the intervention is administered, but no such clustering occurs in the control condition. Such a structure can occur, for example, when the individuals in the treatment condition are randomly assigned to group therapy sessions or to mathematics tutoring groups; individuals in the control condition do not receive group therapy or mathematics tutoring and therefore do not have that level of clustering. In this example, individuals in the treatment condition have a hierarchical structure, but individuals in the control condition do not. If the therapists or tutors differ in efficacy, the clustering in the treatment condition induces an extra source of variability in the data that needs to be accounted for in the analysis. We show how special features of SAS^® PROC MIXED and PROC GLIMMIX can be used to analyze data in which one or more treatment groups have a hierarchical structure that differs from that in the control group. We also discuss how to code variables in order to increase the computational efficiency for estimating parameters from these designs.

Sampling is widely used in different fields for quality control, population monitoring, and modeling. However, the purposes of sampling might be justified by the business scenario, such as legal or compliance needs. This paper uses one probability sampling method, stratified sampling, combined with quality control review business cost to determine an optimized procedure of sampling that satisfies both statistical selection criteria and business needs. The first step is to determine the total number of strata by grouping the strata with a small number of sample units, using box-and-whisker plots outliers as a whole. Then, the cost to review the sample in each stratum is justified by a corresponding business counter-party, which is the human working hour. Lastly, using the determined number of strata and sample review cost, optimal allocation of predetermined total sample population is applied to allocate the sample into different strata.

In our previous work, we often needed to perform large numbers of repetitive and data-driven post-campaign analyses to evaluate the performance of marketing campaigns in terms of customer response. These routine tasks were usually carried out manually by using Microsoft Excel, which was tedious, time-consuming, and error-prone. In order to improve the work efficiency and analysis accuracy, we managed to automate the analysis process with SAS^® programming and replace the manual Excel work. Through the use of SAS macro programs and other advanced skills, we successfully automated the complicated data-driven analyses with high efficiency and accuracy. This paper presents and illustrates the creative analytical ideas and programming skills for developing the automatic analysis process, which can be extended to apply in a variety of business intelligence and analytics fields.

As IT professionals, saving time is critical. Delivering timely and quality-looking reports and information to management, end users, and customers is essential. SAS^® provides numerous 'canned' PROCedures for generating quick results to take care of these needs ... and more. In this hands-on workshop, attendees acquire basic insights into the power and flexibility offered by SAS PROCedures using PRINT, FORMS, and SQL to produce detail output; FREQ, MEANS, and UNIVARIATE to summarize and create tabular and statistical output; and data sets to manage data libraries. Additional topics include techniques for informing SAS which data set to use as input to a procedure, how to subset data using a WHERE statement (or WHERE= data set option), and how to perform BY-group processing.

As complicated as the macro language is to learn, there are very strong reasons for doing so. At its heart, the macro language is a code generator. In its simplest uses, it can substitute simple bits of code like variable names and the names of data sets that are to be analyzed. In more complex situations, it can be used to create entire statements and steps based on information may even be unavailable to the person writing or even executing the macro. At the time of execution, it can be used to make queries of the SAS^® environment as well as the operating system, and utilize the gathered information to make informed decisions about how it is to further function and execute.

Because the macro language is primarily a code generator, it makes sense that the code that it creates must be generated before it can be executed. This implies that execution of the macro language comes first. Simple as this is in concept, timing issues and conflicts are often not so simple to recognize in application. As we use the macro language to take on more complex tasks, it becomes even more critical that we have an understanding of these issues.

Macro variables and their values are stored in symbol tables, which in turn are held in memory. Not only are there are a number of ways to create macro variables, but they can be created in a wide variety of situations. How they are created and under what circumstances effects the variable s scope how and where the macro variable is stored and retrieved. There are a number of misconceptions about macro variable scope and about how the macro variables are assigned to symbol tables. These misconceptions can cause problems that the new, and sometimes even the experienced, macro programmer does not anticipate. Understanding the basic rules for macro variable assignment can help the macro programmer solve some of these problems that are otherwise quite mystifying.

This paper provides an overview of how to create a SAS^® Enterprise Guide^® process that is well designed, simple, documented, automated, modular, efficient, reliable, and easy to maintain. Topics include how to organize a SAS Enterprise Guide process, how to best document in SAS Enterprise Guide, when to leverage point-and-click functionality, and how to automate and simplify SAS Enterprise Guide processes. This paper has something for any SAS Enterprise Guide user, new or experienced!

The emerging discipline of data governance encompasses data quality assurance, data access and use policy, security risks and privacy protection, and longitudinal management of an organization s data infrastructure. In the interests of forestalling another bureaucratic solution to data governance issues, this presentation features database programming tools that provide rapid access to big data and make selective access to and restructuring of metadata practical.

Digital data has manifested into a classic BIG DATA challenge for marketers who want to push past the retroactive analysis limitations of traditional web analytics. The current groundswell of digital device adoption and variety of digital interactions grows larger year after year. The opportunity for 'digital intelligence' has arrived, as traditional web analytic techniques were not designed for the breadth of channels, devices, and pace that fuels consumer experiences. In parallel, today's landscape for data visualization, advanced analytics, and our ability to process very large amounts of multi-channel information is changing. The democratization of analytics for the masses is upon us, and marketers have the oppourtunity to take advantage of descriptive, predictive, and (most importantly) prescriptive data-driven insights. This presentation describes how organizations can use SAS^® products, specifically SAS^® Visual Analytics and SAS^® Adaptive Customer Experience, to overcome the limitations of web analytics, and support data-driven integrated marketing objectives.

Merging or joining data sets is an integral part of the data consolidation process. Within SAS^®, there are numerous methods and techniques that can be used to combine two or more data sets. We commonly think that within the DATA step the MERGE statement is the only way to join these data sets, while in fact, the MERGE is only one of numerous techniques available to us to perform this process. Each of these techniques has advantages, and some have disadvantages. The informed programmer needs to have a grasp of each of these techniques if the correct technique is to be applied. This paper covers basic merging concepts and options within the DATA step, as well as a number of techniques that go beyond the traditional MERGE statement. These include fuzzy merges, double SET statements, and the use of key indexing. The discussion will include the relative efficiencies of these techniques, especially when working with large data sets.

Particle swarm optimization is a heuristic global optimization method that was given by James Kennedy and Russell C. Eberhart in 1995. (James Kennedy and Russell C. Eberhart). The purpose of this paper develops a code for particle swarm optimization in SAS^® 9.2.

Evaluation of the efficacy of an intervention is often complicated because the intervention is not randomly assigned. Usually, interventions in marketing, such as coupons or retention campaigns, are directed at customers because their spending is below some threshold or because the customers themselves make a purchase decision. The presence of nonrandom assignment of the stimulus can lead to over- or underestimating the value of the intervention. This can cause future campaigns to be directed at the wrong customers or cause the impacts of these effects to be over- or understated. This paper gives a brief overview of selection bias, demonstrates how selection in the data can be modeled, and shows how to apply some of the important consistent methods of estimating selection models, including Heckman's two-step procedure, in an empirical example. Sample code is provided in an appendix.

APP is an unofficial collective abbreviation for the SAS^® functions ADDR, PEEK, PEEKC, the CALL POKE routine, and their so-called LONG 64-bit counterparts the SAS tools designed to directly read from and write to physical memory in the DATA step. APP functions have long been a SAS dark horse. First, the examples of APP usage in SAS documentation amount to a few technical report tidbits intended for mainframe system programming, with nary a hint how the functions can be used for data management programming. Second, the documentation note on the CALL POKE routine is so intimidating in tone that many potentially receptive folks might decide to avoid the allegedly precarious route altogether. However, little can stand in the way of an inquisitive SAS programmer daring to take a close look, and it turns out that APP functions are very simple and useful tools! They can be used to explore how things really work, to make code more concise, to implement en masse data movement, and they can often dramatically improve execution efficiency. The author and many other SAS experts (notably Peter Crawford, Koen Vyverman, Richard DeVenezia, Toby Dunn, and the fellow masked by his 'Puddin' Man' sobriquet) have been poking around the SAS APP realm on SAS-L and in their own practices since 1998, occasionally letting the SAS community at large to peek at their findings. This opus is an attempt to circumscribe the results in a systematic manner. Welcome to the APP world! You are in for a few glorious surprises.

Given a time series data set, you can use automatic time series modeling software to select an appropriate time series model. You can use various statistics to judge how well each candidate model fits the data (in-sample). Likewise, you can use various statistics to select an appropriate model from a list of candidate models (in-sample or out-of-sample or both). Finally, you can use rolling simulations to evaluate ex-ante forecast performance over several forecast origins. This paper demonstrates how you can use SAS^® Forecast Server Procedures and SAS^® Forecast Studiosoftware to perform the statistical analyses that are related to rolling simulations.

SAS^® is an outstanding suite of software, but not everyone in the workplace speaks SAS. However, almost everyone speaks Excel. Often, the data you are analyzing, the data you are creating, and the report you are producing is a form of a Microsoft Excel spreadsheet. Every year at SAS^® Global Forum, there are SAS and Excel presentations, not just because Excel isso pervasive in the workplace, but because there s always something new to learn (or re-learn)! This paper summarizes and references (and pays homage to!) previous SAS Global Forum presentations, as well as examines some of the latest Excel capabilities with the latest versions of SAS^® 9.4 and SAS^® Visual Analytics.

Explore the various DATA step merge and PROC SQL join processes. This presentation examines the similarities and differences between merges and joins, and provides examples of effective coding techniques. Attendees examine the objectives and principles behind merges and joins, one-to-one merges (joins), and match-merge (equi-join), as well as the coding constructs associated with inner and outer merges (joins) and PROC SQL set operators.

Beginning with SA^®S 9.2, ODS Graphics introduces a whole new way of generating graphs using SAS^®. With just a few lines of code, you can create a wide variety of high-quality graphs. This paper covers the three basic ODS Graphics procedures SGPLOT, SGPANEL, and SGSCATTER. SGPLOT produces single-celled graphs. SGPANEL produces multi-celled graphs that share common axes. SGSCATTER produces multi-celled graphs that might use different axes. This paper shows how to use each of these procedures in order to produce different types of graphs, how to send your graphs to different ODS destinations, how to access individual graphs, and how to specify properties of graphs, such as format, name, height, and width.

Portfolio segmentation is key in all forecasting projects. Not all products are equally predictable. Nestl uses animal names for its segmentation, and the animal behavior translates well into how the planners should plan these products. Mad Bulls are those products that are tough to predict, if we don't know what is causing their unpredictability. The Horses are easier to deal with. Modern time series based statistical forecasting methods can tame Mad Bulls, as they allow to add explanatory variables into the models. Nestl now complements its Demand Planning solution based on SAP with predictive analytics technology provided by SAS^®, to overcome these issues in an industry that is highly promotion-driven. In this talk, we will provide an overview of the relationship Nestl is building with SAS, and provide concrete examples of how modern statistical forecasting methods available in SAS^® Demand-Driven Planning and Optimization help us to increase forecasting performance, and therefore to provide high service to our customers with optimized stock, the primary goal of Nestl 's supply chains.

SAS^® High-Performance Analytics is a significant step forward in the area of high-speed, analytic processing in a scalable clustered environment. However, Big Data problems generally come with data from lots of data sources, at varying levels of maturity. Teradata s innovative Unified Data Architecture (UDA) represents a significant improvement in the way that large companies can think about Enterprise Data Management, including the Teradata Database, Hortonworks Hadoop, and Aster Data Discovery platform in a seamless integrated platform. Together, the two platforms provide business users, analysts, and data scientists with the ideally suited data management platforms, targeted specifically to their analytic needs, based upon analytic use cases, managed in a single integrated enterprise data management environment. The paper will focus on how several companies today are using Teradata s Integrated Hardware and Software UDA Platform to manage a single enterprise analytic environment, fight the ongoing proliferation of analytic data marts, and speed their operational analytic processes.

The use, limits, and misuse of statistical models in different industries are propelling new techniques and best practices in forecasting. Until recently, many factors such as data collection and storage constraints, poor data synchronization capabilities, technology limitations, and limited internal analytical expertise have made it impossible to forecast intermittent demand. In addition, integrating consumer demand data (that is, point-of-sale [POS]/syndicated scanner data from ACNielsen/ Information Resources Inc. [IRI]/Intercontinental Marketing Services [IMS]) to shipment forecasts was a challenge. This presentation gives practical how-to advice on intermittent forecasting and outlines a framework, using multi-tiered causal analysis (MTCA), that links demand to supply. The framework uses a process of nesting causal models together by using data and analytics.

Report automation and scheduling are very hot topics in many industries. They confer many advantages including reduced work load, elimination of repetitive tasks, generatation of accurate results, and better performance. This paper illustrates how to design an appropriate program to automate and schedule reports in SAS^® 9.1 and SAS^® Enterprise Guide^® 5.1 using a SAS^® server as well as the Windows Scheduler. The automation part includes good aspects of formatting Microsoft Excel tables using XML or VBA coding or any other formats, and conditional auto e-mailing with file attachments. We systematically walk through each step with a clear flow diagram from the data source to the final destination. We also discuss details of server-side and PC-side schedulers and how these schedulers involve invoking batch programs.

How do you compare group responses when the data are unbalanced or when covariates come into play? Simple averages will not do, but LS-means are just the ticket. Central to postfitting analysis in SAS/STAT^® linear modeling procedures, LS-means generalize the simple average for unbalanced data and complicated models. They play a key role both in standard treatment comparisons and Type III tests and in newer techniques such as sliced interaction effects and diffograms. This paper reviews the definition of LS-means, focusing on their interpretation as predicted population marginal means, and it illustrates their broad range of use with numerous examples.

One of the most common questions about logistic regression is How do I know if my model fits the data? There are many approaches to answering this question, but they generally fall into two categories: measures of predictive power (like R-squared) and goodness of fit tests (like the Pearson chi-square). This presentation looks first at R-squared measures, arguing that the optional R-squares reported by PROC LOGISTIC might not be optimal. Measures proposed by McFadden and Tjur appear to be more attractive. As for goodness of fit, the popular Hosmer and Lemeshow test is shown to have some serious problems. Several alternatives are considered.

Bootstrapped Decision Tree is a variable selection method used to identify and eliminate unintelligent variables from a large number of initial candidate variables. Candidates for subsequent modeling are identified by selecting variables consistently appearing at the top of decision trees created using a random sample of all possible modeling variables. The technique is best used to reduce hundreds of potential fields to a short list of 30 50 fields to be used in developing a model. This method for variable selection has recently become available in JMP^® under the name BootstrapForest; this paper presents an implementation in Base SAS^®9. The method does accept but does not require a specific outcome to be modeled and will therefore work for nearly any type of model, including segmentation, MCMC, multiple discrete choice, in addition to standard logistic regression. Keywords: Bootstrapped Decision Tree, Variable Selection

PROC TABULATE is a powerful tool for creating tabular summary reports. Its advantages, over PROC REPORT, are that it requires less code, allows for more convenient table construction, and uses syntax that makes it easier to modify a table s structure. However, its inability to compute the sum, difference, product, and ratio of column sums has hindered its use in many circumstances. This paper illustrates and discusses some creative approaches and methods for overcoming these limitations, enabling users to produce needed reports and still enjoy the simplicity and convenience of PROC TABULATE. These methods and skills can have prominent applications in a variety of business intelligence and analytics fields.

The SQL procedure contains many powerful and elegant language features for intermediate and advanced SQL users. This presentation discusses topics that will help SAS^® users unlock the many powerful features, options, and other gems found in the SQL universe. Topics include CASE logic; a sampling of summary (statistical) functions; dictionary tables; PROC SQL and the SAS macro language interface; joins and join algorithms; PROC SQL statement options _METHOD, MAGIC=101, MAGIC=102, and MAGIC=103; and key performance (optimization) issues.

Many SAS^® procedures use classification variables when they are processing the data. These variables control how the procedure forms groupings, summarizations, and analysis elements. For statistics procedures, they are often used in the formation of the statistical model that is being analyzed. Classification variables can be explicitly specified with a CLASS statement, or they can be specified implicitly from their usage in the procedure. Because classification variables have such a heavy influence on the outcome of so many procedures, it is essential that the analyst have a good understanding of how classification variables are applied. Certainly there are a number of options (system and procedural) that affect how classification variables behave. While you may be aware of some of these options, a great many are new, and some of these new options and techniques are especially powerful. You really need to be open to learning how to program with CLASS.

Can you juggle? Maybe. Can you shuffle a deck of cards? Probably. Can you do both at the same time? Welcome to the world of SAS^® and LSF! Very few SAS Administrators start out learning LSF at the same time they learn SAS; most already know SAS, possibly starting out as a programmer or analyst, but now have to step up to an enterprise platform with shared resources. The biggest challenge on an enterprise platform? How to share! How to maximum the utilization of a SAS platform, yet still ensure everyone gets their fair share? This presentation will boil down the 2000+ pages of LSF documentation to provide an introduction into various LSF concepts: * Host * Clusters * Nodes * Queues * First-Come-First-Serve * Fairshare * and various configuration settings: UJOB_LIMIT, PJOB_LIMIT, etc. Plus some insight on where to configure all these settings which are set up by the installation process, and which can be configured by the SAS or LSF administrator. This session is definitely NOT for experts. It is for those about to step into an enterprise deployment of SAS, and want to understand how the SAS server sessions they know so well can run on a shared platform.

This discussion uses SAS^® Office Analytics as an example to demonstrate the importance of preparing for the SAS^® installation. There are many nuances as well as requirements that need to be addressed before you do an installation. These requirements are basically similar, yet they differ according to the target installation operating system. In other words, there are some differences in preparation routines for Windows and *Nix flavors. Our discussion focuses on these three topics: 1. Pre-installation considerations such as sizing, storage, proper credentials, and third-party requirements; 2. Installation steps and requirements; and 3. Post-installation configuration. In addition to preparation, this paper also discusses potential issues and pitfalls to watch out for, as well as best practices.

Are you wondering what is causing your valuable machine asset to fail? What could those drivers be, and what is the likelihood of failure? Do you want to be proactive rather than reactive? Answers to these questions have arrived with SAS^® Predictive Asset Maintenance. The solution provides an analytical framework to reduce the amount of unscheduled downtime and optimize maintenance cycles and costs. An all new (R&D-based) version of this offering is now available. Key aspects of this paper include: Discussing key business drivers for and capabilities of SAS Predictive Asset Maintenance. Detailed analysis of the solution, including: Data model Explorations Data selections Path I: analysis workbench maintenance analysis and stability monitoring Path II: analysis workbench JMP^®, SAS^® Enterprise Guide^®, and SAS^® Enterprise Miner^™ Analytical case development using SAS Enterprise Miner, SAS^® Model Manager, and SAS^® Data Integration Studio SAS Predictive Asset Maintenance Portlet for reports A realistic business example in the oil and gas industry is used.

Using Lilypond typesetting software, you can write publication-grade music scores. The input for Lilypond is a text file that can be written once and then transferred to SAS^® for patterned repetition, so that you can cycle through patterns that occur in music. The author plays a sequence of notes and then writes this into Lilypond code. The sequence starts in the key of C with only a two-note sequence. Then the sequence is extended to three-, four-, then five-note sequences, always contained in one octave. SAS is then used to write the same code for all other eleven keys and in seven scale modes. The method is very simple and not advanced programming. Lookup files are used in the programming, demonstrating efficient lookup techniques. The result is a lengthy book or exercise for practicing music in a PDF file, and a sound source file in midi format is created that you can hear. This method shows how various programming languages can be used to write other programming languages.

Linear regression has been a widely used approach in social and medical sciences to model the association between a continuous outcome and the explanatory variables. Assessing the model assumptions, such as linearity, normality, and equal variance, is a critical step for choosing the best regression model. If any of the assumptions are violated, one can apply different strategies to improve the regression model, such as performing transformation of the variables or using a spline model. SAS^® has been commonly used to assess and validate the postulated model and SAS^® 9.3 provides many new features that increase the efficiency and flexibility in developing and analyzing the regression model, such as ODS Statistical Graphics. This paper aims to demonstrate necessary steps to find the best linear regression model in SAS 9.3 in different scenarios where variable transformation and the implementation of a spline model are both applicable. A simulated data set is used to demonstrate the model developing steps. Moreover, the critical parameters to consider when evaluating the model performance are also discussed to achieve accuracy and efficiency.

SAS^® OLAP technology is used to organize and present summarized data for business intelligence applications. It features flexible options for creating and storing aggregations to improve performance and brings a powerful multi-dimensional approach to querying data. This paper focuses on managing security features available to OLAP cubes through the combination of SAS metadata and MDX logic.

The Purchasing Department is considering contracting with your team for a new SAS^® Enterprise BI application. He's already met with SAS^® and seen the sales pitch, and he is very interested. But the manager is a tightwad and not sure about spending the money. Also, he wants his team to be the primary developers for this new application. Before investing his money on training, programming, and support, he would like a proof-of-concept. This paper will walk you through the seven steps to create a SAS Enterprise BI POC project: Develop a kick-off meeting including a full demo of the SAS Enterprise BI tools. Set up your UNIX file systems and security. Set up your SAS metadata ACTs, users, groups, folders, and libraries. Make sure the necessary SAS client tools are installed on the developers machines. Hold a SAS Enterprise BI workshop to introduce them to the basics, including SAS^® Enterprise Guide^®, SAS^® Stored Processes, SAS^® Information Maps, SAS^® Web Report Studio, SAS^® Information Delivery Portal, and SAS^® Add-In for Microsoft Office, along with supporting documentation. Work with them to develop a simple project, one that highlights the benefits of SAS Enterprise BI and shows several methods for achieving the desired results. Last but not least, follow up! Remember, your goal is not to launch a full-blown application. Instead, we ll strive toward helping them see the potential in your organization for applying this methodology.

Global businesses must react to daily changes in market conditions over multiple geographies and industries. Consuming reputable daily economic reports assists in understanding these changing conditions, but requires both a significant human time commitment and a subjective assessment of each topic area of interest. To combat these constraints, Dow's Advanced Analytics team has constructed a process to calculate sentence-level topic frequency and sentiment scoring from unstructured economic reports. Daily topic sentiment scores are aggregated to weekly and monthly intervals and used as exogenous variables to model external economic time series data. These models serve to both validate the relationship between our sentiment scoring process and also as near-term forecasts where daily or weekly variables are unavailable. This paper will first describe our process of using SAS^® Text Miner to import and discover economic topics and sentiment from unstructured economic reports. The next section describes sentiment variable selection techniques that use SAS/STAT^®, SAS/ETS^®, and SAS^® Enterprise Miner^™ to generate similarity measures to economic indices. Our process then uses ARIMAX modeling in SAS^® Forecast Studio to create economic index forecasts with topic sentiments. Finally, we show how the sentiment model components are used as a matrix of economic key performance indicators by topic and geography.

'Can I have that in Excel?' This is a request that makes many of us shudder. Now your boss has discovered Microsoft Excel pivot tables. Unfortunately, he has not discovered how to make them. So you get to extract the data, massage the data, put the data into Excel, and then spend hours rebuilding pivot tables every time the corporate data is refreshed. In this workshop, you learn to be the armchair quarterback and build pivot tables without leaving the comfort of your SAS^® environment. In this workshop, you learn the basics of Excel pivot tables and, through a series of exercises, you learn how to augment basic pivot tables first in Excel, and then using SAS. No prior knowledge of Excel pivot tables is required.

For decades, SAS^® has been the cornerstone of many organizations for business reporting. In more recent times, the ability to quickly determine the performance of an organization through the use of dashboards has become a requirement. Different ways of providing dashboard capabilities are discussed in this paper: using out-of-the-box solutions such as SAS^® Visual Analytics and SAS^® BI Dashboard, through to alternative solutions using SAS^® Stored Processes, batch processes, and SAS^® Integration Technologies. Extending the available indicators is also discussed, using Graph Template Language and KPI indicators provided with Base SAS^®, as well as alternatives such as Google Charts and Flash objects. Real-world field experience, problem areas, solutions, and tips are shared, along with live examples of some of the different methods.

The SAS^® Enterprise Guide^® Query Builder is one of the most powerful components of the software. It enables a user to bring in data, join, drop and add columns, compute new columns, sort, filter data, leverage the advanced expression builder, change column attributes, and more! This presentation provides an overview of the major features of this powerful tool and how to leverage it every day.

This is the way I have always done it and it works fine for me. Have you heard yourself or others say this when someone suggests a new technique to help solve a problem? Most of us have a set of tricks and techniques from which we draw when starting a new project. Over time we might overlook newer techniques because our old toolkit works just fine. Sometimes we actively avoid new techniques because our initial foray leaves us daunted by the steep learning curve to mastery. For me, the PRX functions and the SAS^® hash object fell into this category. In this workshop, we address possible objections to learning to use the SAS hash object. We start with the fundamentals of setting up the hash object and work through a variety of practical examples to help you master this powerful technique.

As a longtime Base SAS^® programmer, whether to use a different application for programming is a constant question when powerful applications such as SAS^® Enterprise Guide^® are available. This paper provides some important tips for a programmer, such as the best way to use the code window and how to take advantage of system-generated code in SAS Enterprise Guide 5.1. This paper also explains the differences between some of the functions and procedures in Base SAS and SAS Enterprise Guide. It highlights features in SAS Enterprise Guide such as process flow, data access management, and report automation, including formatting using XML tag sets.

This paper gives you a better idea of how and where to use the record lookup functions to locate observations where a variable has some characteristic. Various related functions are illustrated to search numeric and character values in this process. Code is shown with time comparisons. I will discuss three possible ways to retrieve records using the SAS^® DATA step, PROC SQL, and Perl regular expressions. Real and CPU time processing issues will be highlighted when comparing to retrieve records using these methods. Although the program is written for the PC using SAS^® 9.2 in a Windows XP 32-bit environment, all the functions are applicable to any system. All the tools discussed are in Base SAS^®. The typical attendee or reader will have some experience in SAS, but not a lot of experience dealing with large amount of data.

SAS^® Add-In for Microsoft Office remains a popular tool for people who are not SAS^® programmers due to its easy interface with the SAS servers. In this session, you'll learn some of the many tricks that other organizations use for getting more value out of the tool.

You have built the simple bar chart and mastered the art of layering multiple plot statements to create complex graphs like the Survival Plot using the SGPLOT procedure. You know all about how to use plot statements creatively to get what you need and how to customize the axes to achieve the look and feel you want. Now it s time to up your game and step into the realm of the Graphics Wizard. Behold the magical powers of Graph Template Language Layouts! Here you will learn the esoteric art of creating complex multi-cell graphs using LAYOUT LATTICE. This is the incantation that gives you the power to build complex, multi-cell graphs like the Forest plot, Stock plots with multiple indicators like MACD and Stochastics, Adverse Events by Relative Risk graphs, and more. If you ever wondered how the Diagnostics panel in the REG procedure was built, this paper is for you. Be warned, this is not the realm for the faint of heart!

Regression is a helpful statistical tool for showing relationships between two or more variables. However, many users can find the barrage of numbers at best unhelpful, and at worst undecipherable. Using the shipments and inventories historical data from the U.S. Census Bureau's office of Manufacturers' Shipments, Inventories, and Orders (M3), we can create a graphical representation of two time series with PROC GPLOT and map out reported and expected results. By combining this output with results from PROC REG, we are able to highlight problem areas that might need a second look. The resulting graph shows which dates have abnormal relationships between our two variables and presents the data in an easy-to-use format that even users unfamiliar with SAS^® can interpret. This graph is ideal for analysts finding problematic areas such as outliers and trend-breakers or for managers to quickly discern complications and the effect they have on overall results.

When reading data files or writing SAS^® programs, we are often hunting for the right format or informat. There are so many to choose from! Does it seem like too many to search the manual? Let SAS help find the right one! We use the SAS dictionary table VFORMAT and a very small SAS program. This presentation demonstrates how two simple functions unlock the potential of this great resource: SASHELP.VFORMAT.

Expensive physical capital must be regularly maintained for optimal efficiency and long-term insurance against damage. The maintenance process usually consists of constantly monitoring high-frequency sensor data and performing corrective maintenance when the expected values do not match the actual values. An economic system can also be thought of as a system that requires constant monitoring and occasional maintenance in the form of monetary or fiscal policy. This paper shows how to use the SSM procedure in SAS/ETS^® to make forecasts of expected values by using high-frequency multivariate time series. The paper also demonstrates the functionality of the new SASEFRED interface engine in SAS/ETS.