In 2013, the University of North Carolina (UNC) at Chapel Hill initiated enterprise-wide use of SAS® solutions for reporting and data transformations. Just over one year later, the initial rollout was scheduled to go live to an audience of 5,500 users as part of an adoption of PeopleSoft ERP for Finance, Human Resources, Payroll, and Student systems. SAS® Visual Analytics was used for primary report delivery as an embedded resource within the UNC Infoporte, an existing portal. UNC made the date. With the SAS solutions, UNC delivered the data warehouse and initial reports on the same day that the ERP systems went live. After the success of the initial launch, UNC continues to develop and evolve the solution with additional technologies, data, and reports. This presentation touches on a few of the elements required for a medium to large size organization to integrate SAS solutions such as SAS Visual Analytics and SAS® Enterprise Business Intelligence within their infrastructure.
Jonathan Pletzke, UNC Chapel Hill
The importance of econometrics in the analytics toolkit is increasing every day. Econometric modeling helps uncover structural relationships in observational data. This paper highlights the many recent changes to the SAS/ETS® portfolio that increase your power to explain the past and predict the future. Examples show how you can use Bayesian regression tools for price elasticity modeling, use state space models to gain insight from inconsistent time series, use panel data methods to help control for unobserved confounding effects, and much more.
Mark Little, SAS
Kenneth Sanford, SAS
Soon after the advent of the SAS® hash object in SAS® 9.0, its early adopters realized that the potential functionality of the new structure is much broader than basic O(1)-time lookup and file matching. Specifically, they went on to invent methods of data aggregation based on the ability of the hash object to quickly store and update key summary information. They also demonstrated that the DATA step aggregation using the hash object offered significantly lower run time and memory utilization compared to the SUMMARY/MEANS or SQL procedures, coupled with the possibility of eliminating the need to write the aggregation results to interim data files and the programming flexibility that allowed them to combine sophisticated data manipulation and adjustments of the aggregates within a single step. Such developments within the SAS user community did not go unnoticed by SAS R&D, and for SAS® 9.2 the hash object was enriched with tag parameters and methods specifically designed to handle aggregation without the need to write the summarized data to the PDV host variable and update the hash table with new key summaries, thus further improving run-time performance. As more SAS programmers applied these methods in their real-world practice, they developed aggregation techniques fit to various programmatic scenarios and ideas for handling the hash object memory limitations in situations calling for truly enormous hash tables. This paper presents a review of the DATA step aggregation methods and techniques using the hash object. The presentation is intended for all situations in which the final SAS code is either a straight Base SAS DATA step or a DATA step generated by any other SAS product.
Paul Dorfman, Dorfman Consulting
Don Henderson, Henderson Consulting Services
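The hash-based aggregation idea in the abstract above can be illustrated outside SAS. The following is a minimal Python sketch, not the authors' DATA step code: a dictionary stands in for the hash object, and the function name `aggregate` and the sample records are hypothetical. Each input record is read once, and the summary for its key is created or updated in place, so no interim sorted or summarized file is needed.

```python
def aggregate(records):
    """Single-pass keyed aggregation: maps key -> [count, total].

    The dict plays the role of the hash table: average O(1) lookup
    per record, with the summary updated in place.
    """
    table = {}
    for key, value in records:
        entry = table.get(key)
        if entry is None:
            # First time this key is seen: create its summary entry.
            table[key] = [1, value]
        else:
            # Key already present: update count and running total.
            entry[0] += 1
            entry[1] += value
    return table


# Hypothetical sample data: (key, amount) pairs in arbitrary order.
records = [("A", 10.0), ("B", 5.0), ("A", 2.5), ("B", 1.0), ("C", 7.0)]
summary = aggregate(records)

for key in sorted(summary):
    count, total = summary[key]
    print(key, count, total)
```

Because the input does not need to be sorted by key, this pattern trades memory (one entry per distinct key) for a single pass over the data, which is the same trade-off the abstract raises when it mentions memory limits for very large hash tables.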
My SAS® Global Forum 2013 paper 'Variable Reduction in SAS® by Using Weight of Evidence (WOE) and Information Value (IV)' has become the most sought-after online article on variable reduction in SAS since its publication. But the methodology provided by the paper is limited to reduction of numeric variables for logistic regression only. Built on a similar process, the current paper adds several major enhancements: 1) The use of WOE and IV has been expanded to the analytics and modeling for continuous dependent variables. After the standardization of a continuous outcome, all records can be divided into two groups: positive performance (outcome y above sample average) and negative performance (outcome y below sample average). This treatment is rigorously consistent with the concept of entropy in Information Theory: the juxtaposition of two opposite forces in one equation, and a stronger contrast between the two suggests a higher intensity, that is, more information delivered by the variable in question. As the standardization keeps the outcome variable continuous and quantified, the revised formulas for WOE and IV can be used in the analytics and modeling for continuous outcomes such as sales volume, claim amount, and so on. 2) Categorical and ordinal variables can be assessed together with numeric ones. 3) Users of big data usually need to evaluate hundreds or thousands of variables, but it is not uncommon that over 90% of variables contain little useful information. We have added a SAS macro that trims these variables efficiently in a broad-brushed manner without a thorough examination. Afterward, we examine the retained variables more carefully on their behaviors to the target outcome. 4) We add Chi-Square analysis for categorical/ordinal variables and Gini coefficients for numeric variables in order to provide additional suggestions for segmentation and regression. With the above enhancements added, a SAS macro program is provided at the end of the paper as a complete suite for variable reduction/selection that efficiently evaluates all variables together. The paper provides a detailed explanation for how to use the SAS macro and how to read the SAS outputs that provide useful insights for subsequent linear regression, logistic regression, or scorecard development.
Alec Zhixiao Lin, PayPal Credit
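As a rough illustration of the WOE and IV idea in the abstract above, the Python sketch below splits a continuous outcome at its sample mean into positive and negative groups and then applies the classic binary definitions, WOE = ln(share of positives / share of negatives) per bin and IV = sum over bins of (share of positives - share of negatives) × WOE. This is an assumption-laden sketch, not the paper's revised formulas or its SAS macro: the function name `woe_iv`, the sample data, the choice to count outcomes equal to the mean as negative, and the choice to skip bins where WOE is undefined are all hypothetical.

```python
import math


def woe_iv(bins, outcomes):
    """Classic WOE/IV per bin after splitting outcomes at the sample mean."""
    mean_y = sum(outcomes) / len(outcomes)
    pos, neg = {}, {}
    for b, y in zip(bins, outcomes):
        # Assumption: outcomes exactly equal to the mean count as negative.
        target = pos if y > mean_y else neg
        target[b] = target.get(b, 0) + 1

    total_pos, total_neg = sum(pos.values()), sum(neg.values())
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need records both above and below the mean")

    woe, iv = {}, 0.0
    for b in sorted(set(bins)):
        p = pos.get(b, 0) / total_pos  # share of positives in this bin
        n = neg.get(b, 0) / total_neg  # share of negatives in this bin
        if p == 0 or n == 0:
            continue  # WOE is undefined for this bin; skipped in this sketch
        woe[b] = math.log(p / n)
        iv += (p - n) * woe[b]
    return woe, iv


# Hypothetical data: a binned predictor and a continuous outcome.
bins = ["low", "low", "low", "high", "high", "high"]
outcomes = [1, 2, 9, 8, 9, 3]
woe, iv = woe_iv(bins, outcomes)
print(woe, iv)
```

A larger IV indicates a stronger contrast between the two groups across the bins, which is the sense in which the abstract speaks of a variable delivering more information.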
At the University of North Carolina at Chapel Hill, we had the pleasure of rolling out a strong enterprise-wide SAS® Visual Analytics environment in 10 months, with strong support from SAS. We encountered many bumps in the road, moments of both mountain highs and worrisome lows, as we learned what we could and could not do, and new ways to accomplish our goals. Our journey started in December of 2013 when a decision was made to try SAS Visual Analytics for all reporting, and incorporate other solutions only if and when we hit an insurmountable obstacle. We still rely heavily on SAS Visual Analytics and are augmenting the tools with additional products. Along the way, we learned a number of things about the SAS Visual Analytics environment that are gems, whether one is relatively new to SAS® or an old hand. Measuring what is happening is paramount to knowing what constraints exist in the system before trying to enhance performance. Targeted improvements help if measurements can be made before and after each alteration. There are a few architectural alterations that can help in general, but we have seen that measuring is the guaranteed way to know what the problems are and whether the cures were effective.
Jonathan Pletzke, UNC Chapel Hill
Big data is quickly moving from buzzword to critical tool for today's analytics applications. It can be easy to get bogged down by Apache Hadoop terminology, but when you get down to it, big data is about empowering organizations to deliver the right message or product to the right audience at the right time. Find out how Epsilon built a next-generation marketing application that leverages Cloudera and the SAS® capabilities used by its data science/analytics team to provide its clients with a 360-degree view of their customers. Join Bob Zurek, Senior Vice President of Products at Epsilon, to hear how this new big data solution is enhancing customer service and providing a significant competitive differentiation.
Bob Zurek, Epsilon
In-database processing refers to the integration of advanced analytics into the data warehouse. With this capability, analytic processing is optimized to run where the data reside, in parallel, without having to copy or move the data for analysis. From a data governance perspective there are many good reasons to embrace in-database processing. Many analytical computing solutions and large databases use this technology because it provides significant performance improvements over more traditional methods. Come learn how Blue Cross Blue Shield of Tennessee (BCBST) uses in-database processing from SAS and Teradata.
Harold Klagstad, BlueCross BlueShield of TN
SAS® Analytics enables organizations to tackle complex business problems using big data and to provide insights needed to make critical business decisions. A well-architected enterprise storage infrastructure is needed to realize the full potential of SAS Analytics. However, as the need for big data analytics and rapid response times increases, the performance gap between server speeds and traditional hard disk drive (HDD) based storage systems can be a significant concern. The growing performance gap can have detrimental effects, particularly when it comes to critical business applications. As a result, organizations are looking for newer, smarter, faster storage systems to accelerate business insights. IBM FlashSystem Storage systems store the data in flash memory. They are designed for dramatically faster access times and support incredible amounts of input/output operations per second (IOPS) and throughput, with significantly lower latency than HDD-based solutions. Due to their macro-efficiency design, FlashSystem Storage systems consume less power and have significantly lower cooling and space requirements, while allowing server processors to run SAS Analytics more efficiently. Being an all-flash storage system, IBM FlashSystem provides consistent low latency response across IOPS range, as the analytics workload scales. This paper introduces the benefits of IBM FlashSystem Storage for deploying SAS Analytics and highlights some of the deployment scenarios and architectural considerations. This paper also describes best practices and tuning guidelines for deploying SAS Analytics on FlashSystem Storage systems, which would help SAS Analytics customers in architecting solutions with FlashSystem Storage.
David Gimpl, IBM
Matt Key, IBM
Narayana Pattipati, IBM
Harry Seifert, IBM
Is your company using or considering using SAP Business Warehouse (BW) powered by SAP HANA? SAS® provides various levels of integration with SAP BW in an SAP HANA environment. This integration enables you to not only access SAP BW components from SAS, but to also push portions of SAS analysis directly into SAP HANA, accelerating predictive modeling and data mining operations. This paper explains the SAS toolset for different integration scenarios, highlights the newest technologies contributing to integration, and walks you through examples of using SAS with SAP BW on SAP HANA. The paper is targeted at SAS and SAP developers and architects interested in building a productive analytical environment with the help of the latest SAS and SAP collaborative advancements.
Tatyana Petrova, SAS
A leading killer in the United States is smoking. Moreover, over 8.6 million Americans live with a serious illness caused by smoking or second-hand smoking. Despite this, over 46.6 million U.S. adults smoke tobacco, cigars, and pipes. The key analytic question in this paper is, How would e-cigarettes affect this public health situation? Can monitoring public opinions of e-cigarettes using SAS® Text Analytics and SAS® Visual Analytics help provide insight into the potential dangers of these new products? Are e-cigarettes an example of Big Tobacco up to its old tricks or, in fact, a cessation product? The research in this paper was conducted on thousands of tweets from April to August 2014. It includes API sources beyond Twitter--for example, indicators from the Health Indicators Warehouse (HIW) of the Centers for Disease Control and Prevention (CDC)--that were used to enrich Twitter data in order to implement a surveillance system developed by SAS® for the CDC. The analysis is especially important to the Office on Smoking and Health (OSH) at the CDC, which is responsible for tobacco control initiatives that help states to promote cessation and prevent initiation in young people. To help the CDC succeed with these initiatives, the surveillance system also: 1) automates the acquisition of data, especially tweets; and 2) applies text analytics to categorize these tweets using a taxonomy that provides the CDC with insights into a variety of relevant subjects. Twitter text data can help the CDC look at the public response to the use of e-cigarettes, and examine general discussions regarding smoking and public health, and potential controversies (involving tobacco exposure to children, increasing government regulations, and so on). SAS® Content Categorization helps health care analysts review large volumes of unstructured data by categorizing tweets in order to monitor and follow what people are saying and why they are saying it. Ultimately, it is a solution intended to help the CDC monitor the public's perception of the dangers of smoking and e-cigarettes. In addition, it can identify areas where OSH can focus its attention in order to fulfill its mission and track the success of CDC health initiatives.
Manuel Figallo, SAS
Emily McRae, SAS
The SAS® LASR™ Analytic Server acts as a back-end, in-memory analytics engine for solutions such as SAS® Visual Analytics and SAS® Visual Statistics. It is designed to exist in a massively scalable, distributed environment, often alongside Hadoop. This paper guides you through the impacts of the architecture decisions shared by both software applications and what they specifically mean for SAS®. We then present positive actions you can take to rebound from unexpected outages and resume efficient operations.
Rob Collum, SAS
This unique culture has access to lots of data, unstructured and structured; is innovative, experimental, groundbreaking, and doesn't follow convention; and has access to powerful new infrastructure technologies and scalable, industry-standard computing power never seen before. The convergence of data, an innovative spirit, and the means to process the data is what makes this a truly unique culture. In response to that, SAS® proposes The New Analytics Experience. Attend this session to hear more about the New Analytics Experience and the latest Intel technologies that make it possible.
Mark Pallone, Intel
A Chinese wind energy company designs several hundred wind farms each year. An important step in its design process is micrositing, in which it creates a layout of turbines for a wind farm. The amount of energy that a wind farm generates is affected by geographical factors (such as elevation of the farm), wind speed, and wind direction. The types of turbines and their positions relative to each other also play a critical role in energy production. Currently the company is using an open-source software package to help with its micrositing. As the size of wind farms increases and the pace of their construction speeds up, the open-source software is no longer able to support the design requirements. The company wants to work with a commercial software vendor that can help resolve scalability and performance issues. This paper describes the use of the OPTMODEL and OPTLSO procedures on the SAS® High-Performance Analytics infrastructure together with the FCMP procedure to model and solve this highly nonlinear optimization problem. Experimental results show that the proposed solution can meet the company's requirements for scalability and performance.
Sherry (Wei) Xu, SAS
Steven Gardner, SAS
Joshua Griffin, SAS
Baris Kacar, SAS
Jinxin Yi, SAS
Managing and organizing external files and directories plays an important part in our data analysis and business analytics work. A good file management system can streamline project management and file organization and significantly improve work efficiency. Therefore, under many circumstances, it is necessary to automate and standardize the file management processes through SAS® programming. Compared with managing SAS files via PROC DATASETS, managing external files is a much more challenging task, which requires advanced programming skills. This paper presents and discusses various methods and approaches to managing external files with SAS programming. The illustrated methods and skills can have important applications in a wide variety of analytic work fields.
Justin Jia, Trans Union
Amanda Lin, CIBC