SAS Visual Statistics Papers A-Z

A
Paper 3354-2015:
Applying Data-Driven Analytics to Relieve Suffering Associated with Natural Disasters
Managing the large-scale displacement of people and communities caused by a natural disaster has historically been reactive rather than proactive. Following a disaster, data is collected to inform and prompt operational responses. In many countries prone to frequent natural disasters, such as the Philippines, large amounts of longitudinal data are collected and available to apply to new disaster scenarios. However, because of the nature of natural disasters, it is difficult to analyze all of the data until long after the emergency has passed. For this reason, little research and analysis have been conducted to derive deeper analytical insight for proactive responses. This paper demonstrates the application of SAS® analytics to this data and establishes predictive alternatives that can improve conventional storm responses. Humanitarian organizations can use this data to understand displacement patterns and trends and to optimize evacuation routing and planning. Identifying the main contributing factors and leading indicators for the displacement of communities in a timely and efficient manner helps prevent detrimental incidents at disaster evacuation sites. Using quantitative and qualitative methods, responding organizations can make data-driven decisions that innovate and improve approaches to managing disaster response on a global basis. A data-driven analytical model can help reduce response time, improve the health and safety of displaced individuals, and allocate scarce resources more effectively. The International Organization for Migration (IOM), an intergovernmental organization, is one of the first-response organizations on the ground in most emergencies. IOM is the global co-lead for the Camp Coordination and Camp Management (CCCM) cluster in natural disasters.
This paper shows how SAS® Visual Analytics and SAS® Visual Statistics were used in the Philippines, in response to Super Typhoon Haiyan in November 2013, to develop increasingly accurate models for better emergency preparedness. Using data collected from IOM's Displacement Tracking Matrix (DTM), the final analysis shows how to better coordinate service delivery to evacuation centers sheltering large numbers of displaced individuals, applying accurate hindsight to develop foresight on how to better respond to emergencies and disasters. Predictive models build on patterns found in historical and transactional data to identify risks and opportunities. The capacity to predict trends and behavior patterns related to displacement and mobility has the potential to enable the IOM to respond in a more timely and targeted manner. By predicting the locations of displacement, numbers of persons displaced, number of vulnerable groups, and sites at most risk of security incidents, humanitarians can respond quickly and more effectively with the appropriate resources (material and human) from the outset. The final analysis combines the SAS® Storm Optimization model with human mobility algorithms to predict population movement.
Lorelle Yuen, International Organization for Migration
Kathy Ball, Devon Energy
D
Paper 3386-2015:
Defining and Mapping a Reasonable Distance for Consumer Access to Market Locations
Using geocoded addresses from FDIC Summary of Deposits data with Census geospatial data, including TIGER boundary files and population-weighted centroid shapefiles, we were able to calculate a reasonable distance threshold by metropolitan statistical area (MSA) or, where applicable, metropolitan division (MD) through a series of SAS® DATA steps and SQL joins. We first used a Cartesian join in PROC SQL to pair the data set of population-weighted Census tract centroid coordinates with the data set of geocoded coordinates for approximately 91,000 full-service bank branches. Using the GEODIST function in SAS, we calculated the distance from the population-weighted centroid of each Census tract to the nearest bank branch. The tract data set was then grouped by MSA/MD and sorted within each group in ascending order by distance to the nearest bank branch. Using the RETAIN statement, we calculated the cumulative population and cumulative population percent for each MSA/MD. The reasonable threshold distance is established where the cumulative population percent is closest (above or below) to 90%.
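The distance-threshold procedure above can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's SAS code: it uses the haversine great-circle formula as a stand-in for GEODIST (whose exact method and units may differ), a brute-force loop in place of the PROC SQL Cartesian join, and hypothetical field names (`lat`, `lon`, `msa`, `pop`).

```python
import math
from collections import defaultdict

# Mean Earth radius in miles; an assumption for this sketch (GEODIST may use a different model).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles, used here as a stand-in for SAS GEODIST."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_branch_distance(tract, branches):
    """Brute-force 'Cartesian join': compare one tract centroid with every branch."""
    return min(haversine_miles(tract["lat"], tract["lon"], b["lat"], b["lon"])
               for b in branches)


def reasonable_distance(tracts, branches, target=0.90):
    """Return {msa: distance} where the cumulative population share is closest to target."""
    by_msa = defaultdict(list)
    for t in tracts:
        by_msa[t["msa"]].append((nearest_branch_distance(t, branches), t["pop"]))

    thresholds = {}
    for msa, rows in by_msa.items():
        rows.sort()  # ascending distance to the nearest branch within the MSA/MD
        total = sum(pop for _, pop in rows)
        cumulative = 0
        best_gap, best_dist = None, None
        for dist, pop in rows:
            cumulative += pop  # running total, like a RETAINed variable in a DATA step
            gap = abs(cumulative / total - target)  # distance from 90%, either direction
            if best_gap is None or gap < best_gap:
                best_gap, best_dist = gap, dist
        thresholds[msa] = best_dist
    return thresholds
```

For example, with one branch at (0, 0) and four tracts along the prime meridian holding 50%, 30%, 15%, and 5% of an MSA's population in order of increasing distance, the cumulative shares are 50%, 80%, 95%, and 100%, so the threshold is the third tract's distance (95% is 5 points from 90%, versus 10 points for 80% and 100%).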
Read the paper (PDF).
Sarah Campbell, Federal Deposit Insurance Corporation
E
Paper 3190-2015:
Educating Future Business Leaders in the Era of Big Data
At NC State University, our motto is Think and Do. When it comes to educating students in the Poole College of Management, that means that we want them to not only learn to think critically but also to gain hands-on experience with the tools that will enable them to be successful in their careers. And, in the era of big data, we want to ensure that our students develop skills that will help them to think analytically in order to use data to drive business decisions. One method that lends itself well to thinking and doing is the case study approach. In this paper, we discuss the case study approach for teaching analytical skills and highlight the use of SAS® software for providing practical, hands-on experience with manipulating and analyzing data. The approach is illustrated with examples from specific case studies that have been used for teaching introductory and intermediate courses in business analytics.
Read the paper (PDF).
Tonya Balan, NC State University
Paper 3242-2015:
Entropy-Based Measures of Weight of Evidence and Information Value for Variable Reduction and Segmentation for Continuous Dependent Variables
My SAS® Global Forum 2013 paper 'Variable Reduction in SAS® by Using Weight of Evidence (WOE) and Information Value (IV)' has become the most sought-after online article on variable reduction in SAS since its publication. But the methodology provided by the paper is limited to reduction of numeric variables for logistic regression only. Built on a similar process, the current paper adds several major enhancements: 1) The use of WOE and IV has been expanded to analytics and modeling for continuous dependent variables. After a continuous outcome is standardized, all records can be divided into two groups: positive performance (outcome y above the sample average) and negative performance (outcome y below the sample average). This treatment is rigorously consistent with the concept of entropy in information theory: two opposite forces are juxtaposed in one equation, and a stronger contrast between the two suggests a higher intensity, that is, more information delivered by the variable in question. Because the standardization keeps the outcome variable continuous and quantified, the revised formulas for WOE and IV can be used in analytics and modeling for continuous outcomes such as sales volume, claim amount, and so on. 2) Categorical and ordinal variables can be assessed together with numeric ones. 3) Users of big data usually need to evaluate hundreds or thousands of variables, and it is not uncommon for over 90% of those variables to contain little useful information. We have added a SAS macro that efficiently screens out these variables in a broad-brush pass, without detailed examination. Afterward, we examine the retained variables more carefully for their behavior with respect to the target outcome. 4) We add chi-square analysis for categorical/ordinal variables and Gini coefficients for numeric variables in order to provide additional suggestions for segmentation and regression.
With the above enhancements added, a SAS macro program is provided at the end of the paper as a complete suite for variable reduction/selection that efficiently evaluates all variables together. The paper explains in detail how to use the SAS macro and how to read the SAS output, which provides useful insights for subsequent linear regression, logistic regression, or scorecard development.
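The continuous-outcome WOE/IV idea described above can be illustrated with a small Python sketch. The paper's revised formulas are not reproduced in this abstract, so this is an assumption-labeled approximation: it standardizes y, treats the summed positive and negative standardized values in each bin as that bin's share of positive and negative performance, and then applies the familiar binary formulas WOE = ln(%positive / %negative) and IV = sum over bins of (%positive - %negative) x WOE. The `eps` floor for empty cells is also an assumption.

```python
import math
from statistics import mean, pstdev


def woe_iv_continuous(bins, y, eps=1e-6):
    """Approximate WOE per bin and total IV for a continuous outcome.

    bins: bin or category label for each record
    y:    continuous outcome for each record (must not be constant)
    eps:  floor for empty cells so the log stays finite (an assumption)
    """
    mu, sd = mean(y), pstdev(y)
    z = [(v - mu) / sd for v in y]  # standardized outcome

    # Total "mass" of positive (above-average) and negative (below-average) performance.
    pos_total = sum(v for v in z if v > 0)
    neg_total = sum(-v for v in z if v < 0)

    pos, neg = {}, {}
    for b, v in zip(bins, z):
        pos.setdefault(b, 0.0)
        neg.setdefault(b, 0.0)
        if v > 0:
            pos[b] += v
        elif v < 0:
            neg[b] -= v  # store as a positive magnitude

    woe, iv = {}, 0.0
    for b in pos:
        p = max(pos[b] / pos_total, eps)  # share of positive performance in bin b
        n = max(neg[b] / neg_total, eps)  # share of negative performance in bin b
        woe[b] = math.log(p / n)
        iv += (p - n) * woe[b]
    return woe, iv
```

A bin whose records sit mostly above the mean gets a positive WOE, and a variable whose bins all mirror the overall distribution has an IV near zero, which would make it a candidate for the broad-brush trimming step.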
Read the paper (PDF).
Alec Zhixiao Lin, PayPal Credit
H
Paper SAS1704-2015:
Helpful Hints for Transitioning to SAS® 9.4
A group tasked with testing SAS® software from the customer perspective has gathered a number of helpful hints for SAS® 9.4 that will smooth the transition to its new features and products. These hints will help with the 'huh?' moments that crop up when you are getting oriented and will provide short, straightforward answers. We also share insights about changes in your order contents. Drawing on extensive multi-tier deployments, SAS® Customer Experience Testing shares insiders' practical tips to ensure that you are ready to begin your transition to SAS 9.4. The target audience for this paper is primarily system administrators who will be installing, configuring, or administering the SAS 9.4 environment. (This paper is an updated version of the paper presented at SAS Global Forum 2014 and includes new features and software changes since the original paper was delivered, plus any relevant content that still applies. This paper includes information specific to SAS 9.4 and SAS 9.4 maintenance releases.)
Read the paper (PDF).
Cindy Taylor, SAS
Paper 3185-2015:
How to Hunt for Utility Customer Electric Usage Patterns Armed with SAS® Visual Statistics with Hadoop and Hive
Your electricity usage patterns reveal a lot about your family and routines. Information collected from electrical smart meters can be mined to identify patterns of behavior that can in turn be used to help change customer behavior for the purpose of altering system load profiles. Demand Response (DR) programs represent an effective way to cope with rising energy needs and increasing electricity costs. The Federal Energy Regulatory Commission (FERC) defines demand response as changes in electric usage by end-use customers from their normal consumption patterns in response to changes in the price of electricity over time, or to incentive payments designed to lower electricity use at times of high wholesale market prices or when system reliability is jeopardized. In order to effectively motivate customers to voluntarily change their consumption patterns, it is important to identify customers whose load profiles are similar so that targeted incentives can be directed toward these customers. Hence, it is critical to use tools that can accurately cluster similar time series patterns while providing a means to profile these clusters. To solve this problem, though, hardware and software capable of storing, extracting, transforming, loading, and analyzing large amounts of data must first be in place. Utilities receive customer data from smart meters, which track and store customer energy usage. The collected data is sent to the energy companies every fifteen minutes or hourly. With millions of meters deployed, this quantity of information creates a data deluge for utilities, because each customer generates about three thousand data points monthly, and more than thirty-six billion reads are collected annually for a million customers. The data scientist is the hunter, and DR candidate patterns are the prey in this cat-and-mouse game of finding customers willing to curtail electrical usage for a program benefit.
The data scientist must connect large siloed data sources, external data, and even unstructured data to detect common customer electrical usage patterns, build dependency models, and score them against the customer population. Taking advantage of Hadoop's ability to store and process data on commodity hardware with distributed parallel processing is a game changer. With Hadoop, no data set is too large, and SAS® Visual Statistics leverages machine learning, artificial intelligence, and clustering techniques to build descriptive and predictive models. Data from disparate systems, including structured data, unstructured data, and log files, can all be used. The data scientist can use Hadoop to ingest all available data at rest and to analyze customer usage patterns, system electrical flow data, and external data such as weather. This paper uses Cloudera Hadoop with Apache Hive queries for analysis on platforms such as SAS® Visual Analytics and SAS Visual Statistics. The paper showcases the options available within Hadoop for querying large data sets with open-source tools and importing these data into SAS® for robust customer analytics: clustering customers by usage profile, modeling their propensity to respond to a demand response event, and analyzing the electrical system for demand response events.
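The data-volume figures above follow from simple arithmetic. The sketch below assumes one read per meter per interval and a 30-day month for the monthly count (both assumptions). At 15-minute intervals it gives 2,880 reads per meter per month (the abstract's "about three thousand") and roughly 35 billion reads per year for a million meters; the abstract's annual figure corresponds to 3,000 reads x 12 months x 1,000,000 customers, so the exact total depends on the interval and on what counts as a read.

```python
MINUTES_PER_DAY = 24 * 60


def reads_per_meter(interval_minutes, days):
    """Number of interval reads one meter produces over the given number of days."""
    return days * MINUTES_PER_DAY // interval_minutes


def annual_reads(customers, interval_minutes, days_per_year=365):
    """Total reads collected in a year across all customers' meters."""
    return customers * reads_per_meter(interval_minutes, days_per_year)


if __name__ == "__main__":
    print(reads_per_meter(15, 30))      # 15-minute reads in a 30-day month -> 2880
    print(reads_per_meter(60, 30))      # hourly reads in a 30-day month -> 720
    print(annual_reads(1_000_000, 15))  # one million meters, 15-minute interval -> 35040000000
```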
Read the paper (PDF).
Kathy Ball, SAS
I
Paper 3411-2015:
Identifying Factors Associated with High-Cost Patients
Research has shown that the top five percent of patients can account for nearly fifty percent of the total healthcare expenditure in the United States. Using SAS® Enterprise Guide® and PROC LOGISTIC, a statistical methodology was developed to identify factors (for example, patient demographics, diagnostic symptoms, comorbidity, and the type of procedure code) associated with the high cost of healthcare. Analyses were performed using the FAIR Health National Private Insurance Claims (NPIC) database, which contains information about healthcare utilization and cost in the United States. The analyses focused on treatments for chronic conditions, such as trans-myocardial laser revascularization for the treatment of coronary heart disease (CHD) and pressurized inhalation for the treatment of asthma. Furthermore, bubble plots and heat maps were created using SAS® Visual Analytics to provide key insights into potentially high-cost treatments for heart disease and asthma patients across the nation.
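The "top five percent" framing implies a binary target that a model such as PROC LOGISTIC can predict. As a hedged Python sketch (not the paper's SAS code), the functions below compute the share of total cost held by the top fraction of patients and build a 0/1 high-cost flag; defining "high cost" as the top 5% of per-patient cost is an assumption for illustration.

```python
import math


def top_k(n, fraction):
    """Number of patients in the top `fraction` (at least one)."""
    return max(1, math.ceil(n * fraction))


def top_share(costs, fraction=0.05):
    """Share of total expenditure accounted for by the top `fraction` of patients."""
    ranked = sorted(costs, reverse=True)
    k = top_k(len(ranked), fraction)
    return sum(ranked[:k]) / sum(ranked)


def high_cost_flags(costs, fraction=0.05):
    """0/1 flag per patient: 1 if the patient is in the top `fraction` by cost."""
    k = top_k(len(costs), fraction)
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    top = set(order[:k])
    return [1 if i in top else 0 for i in range(len(costs))]
```

For 20 patients where 19 each cost 100 and one costs 1,900, the top 5% is a single patient holding 1,900 / 3,800 = 50% of spending, and that patient is the only one flagged; the flags would then serve as the response variable for a logistic model.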
Read the paper (PDF). | Download the data file (ZIP).
Jeff Dang, FAIR Health
P
Paper SAS1774-2015:
Predictive Modeling Using SAS® Visual Statistics: Beyond the Prediction
Predictions, including regressions and classifications, are the predominant focus of many statistical and machine-learning models. However, in the era of big data, a predictive modeling process involves more than just making the final predictions. For example, a large collection of data often represents a set of small, heterogeneous populations, so identifying these subgroups is an important step in predictive modeling. In addition, big data sets are often complex and high-dimensional. Consequently, variable selection, transformation, and outlier detection are integral steps. This paper provides working examples of these critical stages using SAS® Visual Statistics, including data segmentation (supervised and unsupervised), variable transformation, outlier detection, and filtering, in addition to building the final predictive model using methods such as linear regression, decision trees, and logistic regression. The example data consist of vehicle emission testing results collected from 2010 to 2014.
Read the paper (PDF).
Xiangxiang Meng, SAS
Jennifer Ames, SAS
Wayne Thompson, SAS
S
Paper SAS1683-2015:
SAS® Visual Analytics for Fun and Profit: A College Football Case Study
SAS® Visual Analytics is a powerful tool for exploring, analyzing, and reporting on your data. Whether you understand your data well or are in need of additional insights, SAS Visual Analytics has the capabilities you need to discover trends, see relationships, and share the results with your information consumers. This paper presents a case study applying the capabilities of SAS Visual Analytics to NCAA Division I college football data from 2005 through 2014. It follows the process from reading raw comma-separated values (csv) files through processing that data into SAS data sets, doing data enrichment, and finally loading the data into in-memory SAS® LASR™ tables. The case study then demonstrates using SAS Visual Analytics to explore detailed play-by-play data to discover trends and relationships, as well as to analyze team tendencies to develop game-time strategies. Reports on player, team, conference, and game statistics can be used for fun (by fans) and for profit (by coaches, agents and sportscasters). Finally, the paper illustrates how all of these capabilities can be delivered via the web or to a mobile device--anywhere--even in the stands at the stadium. Whether you are using SAS Visual Analytics to study college football data or to tackle a complex problem in the financial, insurance, or manufacturing industry, SAS Visual Analytics provides the power and flexibility to score a big win in your organization.
Read the paper (PDF).
John Davis, SAS
Paper 3510-2015:
SAS® Visual Analytics: Emerging Trend in Institutional Research
Institutional research and effectiveness offices at most institutions are often the primary beneficiaries of data warehouse (DW) technologies. However, at many institutions, data warehousing needs for growing accountability, decision support, and institutional effectiveness remain unfulfilled, in part because of growing data volumes and the prohibitive cost of data warehouses built by UIT departments. In recent years, many institutional research offices across the country have been asked to take a leadership role in building the DW or to partner with the campus IT department to improve the efficiency and effectiveness of DW development. Within this context, the Office of Institutional Research and Effectiveness at a large public research university in the Northeast was entrusted with building the new campus data warehouse to serve growing needs such as resource allocation, competitive positioning, new program development in emerging STEM disciplines, and accountability reporting. These requirements necessitated the deployment of state-of-the-art analytical decision support applications, such as SAS® Visual Analytics (reporting and analysis) and SAS® Visual Statistics (predictive modeling), in a disparate data environment that includes PeopleSoft (student), Kuali (finance), Genesys (human resources), and a homegrown sponsored funding database. This presentation focuses on the efforts of institutional research and effectiveness offices in developing the decision support applications using SAS® Enterprise business intelligence and analytical solutions. With users ranging from nontechnical staff to advanced analysts, greater efficiency lies in the ability to get faster and more elegant reporting from those huge stores of data and to share the resulting discoveries across departments.
Most of the reporting applications were developed based on the needs of IPEDS, CUPA, the Common Data Set, U.S. News & World Report, graduation and retention, and faculty activity, and were deployed through an online web-based portal. Participants will learn how the University quickly analyzes institutional data through an easy-to-use, drag-and-drop, web-based application. This presentation demonstrates how to use SAS® Visual Analytics to quickly design reports that are attractive, interactive, and meaningful and then distribute those reports via the web or through SAS® Mobile BI on an iPad® or other tablet.
Read the paper (PDF).
Sivakumar Jaganathan, University of Connecticut
Thulasi Kumar Raghuraman, University of Connecticut
Paper SAS4081-2015:
SAS® Workshop: SAS® Visual Statistics 7.1
This workshop provides hands-on experience with SAS® Visual Statistics. Workshop participants will learn how to move between the Visual Analytics Explorer interface and Visual Statistics, fit automatic statistical models, create exploratory statistical analyses, compare models using a variety of metrics, and create score code.
Read the paper (PDF).
Catherine Truxillo, SAS
Xiangxiang Meng, SAS
Mike Jenista, SAS
Paper SAS1541-2015:
SSL Configuration Best Practices for SAS® Visual Analytics 7.1 Web Applications and SAS® LASR™ Authorization Service
One of the challenges in Secure Sockets Layer (SSL) configuration for any web application is SSL certificate management on the client and server sides. The SSL overview covers the structure of the X.509 certificate and the SSL handshake process for the client and server components. There are three distinct SSL client/server combinations within the SAS® Visual Analytics 7.1 web application configuration. The most common one is a browser accessing the web application. The second one is an internal SAS® web application accessing another SAS web application. The third one is a SAS Workspace Server executing a PROC or LIBNAME statement that accesses the SAS® LASR™ Authorization Service web application. Each SSL client/server scenario in the configuration is explained in terms of the SSL handshake and certificate arrangement. Server identity certificate generation using Microsoft Active Directory Certificate Services (AD CS) for enterprise-level organizations is showcased. The certificates, in the proper format, need to be supplied to the SAS® Deployment Wizard during the configuration process. The prerequisites and configuration steps are shown with examples.
Read the paper (PDF).
Heesun Park, SAS
Jerome Hughes, SAS
Paper SAS1844-2015:
Securing Hadoop Clusters while Still Retaining Your Sanity
The Hadoop ecosystem is vast, and there's a lot of conflicting information available about how to best secure any given implementation. It's also difficult to fix any mistakes made early on once an instance is put into production. In this paper, we demonstrate the currently accepted best practices for securing and Kerberizing Hadoop clusters in a vendor-agnostic way, review some of the not-so-obvious pitfalls one could encounter during the process, and delve into some of the theory behind why things are the way they are.
Evan Kinney, SAS
Paper SAS1388-2015:
Sensing Demand Signals and Shaping Future Demand Using Multi-tiered Causal Analysis
The two primary objectives of multi-tiered causal analysis (MTCA) are to support and evaluate business strategies based on the effectiveness of marketing actions in both a competitive and holistic environment. By tying the performance of a brand, product, or SKU at retail to internal replenishment shipments at a point in time, the outcome of making a change to the marketing mix (demand) can be simulated and evaluated to determine the full impact on supply (shipments). The key benefit of MTCA is that it captures the entire supply chain by focusing on marketing strategies to shape future demand and to link them, using a holistic framework, to shipments (supply). These relationships are what truly define the marketplace and all marketing elements within the supply chain.
Read the paper (PDF).
Charlie Chase, SAS
Paper SAS1661-2015:
Show Me the Money! Text Analytics for Decision-Making in Government Spending
Understanding organizational trends in spending can help overseeing government agencies make appropriate modifications in spending to best serve the organization and the citizenry. However, given millions of line items for organizations annually, including free-form text, it is unrealistic for these overseeing agencies to succeed by using only a manual approach to this textual data. Using a publicly available data set, this paper explores how business users can apply text analytics using SAS® Contextual Analysis to assess trends in spending for particular agencies, apply subject matter expertise to refine these trends into a taxonomy, and ultimately, categorize the spending for organizations in a flexible, user-friendly manner. SAS® Visual Analytics enables dynamic exploration, including modeling results from SAS® Visual Statistics, in order to assess areas of potentially extraneous spending, providing actionable information to the decision makers.
Read the paper (PDF).
Tom Sabo, SAS
Paper SAS1972-2015:
Social Media and Open Data Integration through SAS® Visual Analytics and SAS® Text Analytics for Public Health Surveillance
Smoking is a leading cause of death in the United States. Moreover, over 8.6 million Americans live with a serious illness caused by smoking or secondhand smoke. Despite this, over 46.6 million U.S. adults smoke cigarettes, cigars, or pipes. The key analytic question in this paper is: how would e-cigarettes affect this public health situation? Can monitoring public opinions of e-cigarettes using SAS® Text Analytics and SAS® Visual Analytics help provide insight into the potential dangers of these new products? Are e-cigarettes an example of Big Tobacco up to its old tricks or, in fact, a cessation product? The research in this paper was conducted on thousands of tweets from April to August 2014. It also draws on API sources beyond Twitter--for example, indicators from the Health Indicators Warehouse (HIW) of the Centers for Disease Control and Prevention (CDC)--that were used to enrich Twitter data in order to implement a surveillance system developed by SAS® for the CDC. The analysis is especially important to the Office on Smoking and Health (OSH) at the CDC, which is responsible for tobacco control initiatives that help states promote cessation and prevent initiation among young people. To help the CDC succeed with these initiatives, the surveillance system also: 1) automates the acquisition of data, especially tweets; and 2) applies text analytics to categorize these tweets using a taxonomy that provides the CDC with insights into a variety of relevant subjects. Twitter text data can help the CDC examine the public response to the use of e-cigarettes, general discussions regarding smoking and public health, and potential controversies (involving tobacco exposure to children, increasing government regulations, and so on). SAS® Content Categorization helps health care analysts review large volumes of unstructured data by categorizing tweets in order to monitor what people are saying and why they are saying it.
Ultimately, it is a solution intended to help the CDC monitor the public's perception of the dangers of smoking and e-cigarettes; in addition, it can identify areas where OSH can focus its attention in order to fulfill its mission and track the success of CDC health initiatives.
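The taxonomy-based tweet categorization described above can be approximated with a toy keyword matcher. This is not SAS Content Categorization or its rule syntax; it is a simplified Python stand-in, and the category names and terms in `TAXONOMY` are hypothetical examples loosely based on the subjects the abstract mentions (cessation, exposure to children, government regulation).

```python
import re
from collections import Counter

# Hypothetical taxonomy: category -> terms that signal it (illustrative only).
TAXONOMY = {
    "cessation": ["quit", "quitting", "stop smoking", "cessation"],
    "youth_exposure": ["kids", "children", "teens", "minors"],
    "regulation": ["fda", "ban", "regulation", "regulations", "tax"],
}


def categorize(tweet, taxonomy=TAXONOMY):
    """Return every category whose terms appear as whole words in the tweet."""
    text = tweet.lower()
    return [
        category
        for category, terms in taxonomy.items()
        if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms)
    ]


def category_counts(tweets, taxonomy=TAXONOMY):
    """Count how many tweets fall into each category (a tweet may hit several)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(categorize(tweet, taxonomy))
    return counts
```

For example, `categorize("The FDA should ban e-cig ads aimed at teens")` returns `['youth_exposure', 'regulation']`. Whole-word matching avoids false hits such as "ban" inside "banana".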
Read the paper (PDF).
Manuel Figallo, SAS
Emily McRae, SAS
Paper SAS1864-2015:
Statistics for Gamers--Using SAS® Visual Analytics and SAS® Visual Statistics to Analyze World of Warcraft Logs
Video games used to be child's play. Today, millions of gamers of all ages kill countless in-game monsters and villains every day. Gaming is big business, and the data it generates is even bigger. Massively multiplayer online games such as World of Warcraft by Blizzard Entertainment not only generate data that Blizzard Entertainment can use to monitor users and their environments, but they can also be set up to log player data and combat logs client-side. Many users spend time analyzing their playing 'rotations' and use the information to adjust their playing style to deal more damage or, more appropriately, to heal themselves and other players. This paper explores World of Warcraft logs by using SAS® Visual Analytics and applies statistical techniques by using SAS® Visual Statistics to discover trends.
Mary Osborne, SAS
Adam Maness
T
Paper SAS1760-2015:
The Impact of Hadoop Resiliency on SAS® LASR™ Analytic Server
The SAS® LASR™ Analytic Server acts as a back-end, in-memory analytics engine for solutions such as SAS® Visual Analytics and SAS® Visual Statistics. It is designed to exist in a massively scalable, distributed environment, often alongside Hadoop. This paper guides you through the impacts of the architecture decisions shared by both software applications and what they specifically mean for SAS®. We then present positive actions you can take to rebound from unexpected outages and resume efficient operations.
Read the paper (PDF).
Rob Collum, SAS
Y
Paper 3262-2015:
Yes, SAS® Can Do! Manage External Files with SAS Programming
Managing and organizing external files and directories plays an important part in our data analysis and business analytics work. A good file management system can streamline project management and file organization and significantly improve work efficiency. Therefore, under many circumstances, it is necessary to automate and standardize file management processes through SAS® programming. Compared with managing SAS files via PROC DATASETS, managing external files is a much more challenging task, which requires advanced programming skills. This paper presents and discusses various methods and approaches to managing external files with SAS programming. The illustrated methods and skills have important applications in a wide variety of analytic fields.
Read the paper (PDF).
Justin Jia, TransUnion
Amanda Lin, CIBC