Location information plays a central role in business data: everything that happens in a business happens somewhere, whether it is sales of products in different regions or crimes committed in a city. Business analysts typically work from historical data gathered over many years, and one of the most valuable additions to that business data is demographic data. An analyst can match sales or crimes against population metrics such as gender, age group, family income, and race for better insight. This paper demonstrates how a business analyst can bring Esri demographic and lifestyle data into SAS® Visual Analytics and join it with business data, an integration that SAS Visual Analytics supports directly. We demonstrate different methods of accessing Esri demographic data from SAS Visual Analytics, and we also demonstrate how you can use custom shapefiles and integrate with Esri Portal for ArcGIS.
Murali Nori, SAS
Himesh Patel, SAS
Data is generated every second. The term big data refers to the volume, variety, and velocity of the data being produced. Now woven into every sector, big data's size and complexity leave organizations struggling to create, manipulate, and manage it. This research identifies and reviews a range of big data techniques within SAS®, highlighting the fundamental opportunities that SAS provides for overcoming a variety of business challenges. Insurance is a data-dependent industry. This research focuses on understanding what SAS can offer to insurance companies and how it can interact with existing customer databases and online, user-generated content; a range of data sources has been identified for this purpose. The research demonstrates how models can be built on relationships found in past data and then used to identify prospective customers. Principal component analysis, cluster analysis, and neural networks are all considered (a brief sketch of the first two follows this abstract). You will learn how these techniques can help capture valuable insight, create firm relationships, and support customer feedback. Whether the analytics are prescriptive, predictive, descriptive, or diagnostic, harnessing big data can add background and depth, providing insurance companies with a more complete story. You will see that you can reduce the complexity and dimensionality of data, provide actionable intelligence, and ultimately make more informed business decisions.
Rebecca Peters, University of South Wales
Penny Holborn, University of South Wales
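As a hedged illustration of two techniques the abstract names, the following sketch reduces customer attributes to principal components with the PRINCOMP procedure and clusters the component scores with the FASTCLUS procedure. The data set customers and its variables are assumptions for illustration, not the authors' data.

   /* Reduce correlated customer attributes to three principal components */
   proc princomp data=customers out=pca_scores n=3;
      var age income tenure n_policies claim_count;
   run;

   /* Segment customers on the component scores Prin1-Prin3 */
   proc fastclus data=pca_scores maxclusters=4 out=segments;
      var Prin1-Prin3;
   run;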
The use of telematics data within the insurance industry is becoming prevalent as insurers use this data to give discounts, categorize drivers, and provide feedback to improve customers' driving. The data captured through in-vehicle or mobile devices includes acceleration, braking, speed, mileage, and many other events. Data elements are analyzed to determine high-risk events such as rapid acceleration, hard braking, quick turning, and so on. The time between these successive high-risk events is a function of the mileage driven and time in the telematics program. Our discussion highlights how we treated these high-risk events as recurrent events and analyzed them using the RELIABILITY procedure within SAS/QC® software. The RELIABILITY procedure is used to determine a nonparametric mean cumulative function (MCF) of high-risk events. We illustrate the use of the MCF for identifying and categorizing average driver behavior versus individual driver behavior. We also discuss the use of the MCF to evaluate how a loss event or driver feedback can affect future driving behavior.
Kelsey Osterloo, State Farm Insurance Company
Deovrat Kakde, SAS
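A minimal sketch of the recurrent-events setup described above, assuming a hypothetical data set with one row per high-risk event per driver plus a final end-of-observation row per driver; this mirrors the documented PROC RELIABILITY recurrent-events pattern rather than the authors' exact code.

   proc reliability data=telematics_events;
      unitid driver_id;            /* each driver contributes one event history */
      /* Nonparametric mean cumulative function of high-risk events
         versus mileage; event = 1 marks a high-risk event, and -1
         flags each driver's end-of-observation record */
      mcfplot miles*event(-1);
   run;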
Session 1484-2017:
Can Incumbents Take the Digital Curve?!
Digital transformation and analytics for incumbents aren't a question of choice or strategy. They're a question of business survival. Go analytics!
Liav Geffen, Harel Insurance & Finance
A/B testing is a form of statistical hypothesis testing that compares two business options (A and B) to determine which is more effective in the modern Internet age. The challenge for startups or new-product businesses leveraging A/B testing is twofold: a small number of customers and a poor understanding of their responses. This paper shows you how to use the IML and POWER procedures to reassess sample size for adaptive, multistage business designs based on conditional power arguments, using the data observed at the previous business stage (a minimal power-calculation sketch follows this abstract).
Bo Zhang, IBM
Liwei Wang, Pharmaceutical Product Development Inc
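The conditional-power reassessment itself lives in PROC IML; as a hedged sketch of the companion power calculation only, PROC POWER can solve for the per-group sample size of a two-proportion test. The response rates below are hypothetical.

   proc power;
      twosamplefreq test=pchi
         groupproportions = (0.10 0.15)   /* assumed conversion rates for A and B */
         alpha            = 0.05
         power            = 0.80
         npergroup        = .;            /* solve for sample size per group */
   run;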
Most studies of human disease networks have estimated the associations among disorders primarily from gene or protein information. Those studies, however, face difficulties because of the massive volume of the data and the huge computational cost. Instead, we constructed a human disease network that describes the associations between diseases by using claims data from Korean health insurance. Through several statistical analyses, we show the applicability and suitability of the disease network. Furthermore, we develop a statistical model that predicts the prevalence rate of dementia by using statistically significant associations from the network.
Jinwoo Cho, Sung Kyun Kwan University
For all business analytics projects, big or small, the results are used to support business or managerial decision-making, and many of them eventually lead to business actions. However, executives and decision makers are often confused by, and feel uninformed about, complicated analytics steps, especially when multiple processes or environments are involved. After many years of research and experimentation, a web reporting framework based on SAS® Stored Processes was developed to smooth communication among data analysts, researchers, and business decision makers. This framework uses a storytelling style to present essential analytical steps to audiences, with dynamic HTML5 content and drill-down and drill-through functions in text, graph, table, and dashboard formats. No special skills other than SAS® programming are needed to implement a new report. The model-view-controller (MVC) structure of the framework significantly reduces the time needed to develop high-end web reports for audiences not familiar with SAS, and report content can also be delivered to tablet and smartphone users. A business analytics example is demonstrated during this session. With this web reporting framework based on SAS Stored Processes, many existing SAS results can be delivered more effectively and persuasively on a SAS® Enterprise BI platform.
Qiang Li, Locfit LLC
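As a hedged sketch of the stored-process building block behind such a framework (the library, data set, and prompt parameter are hypothetical), the %STPBEGIN and %STPEND macros stream ODS output back to the web client:

   *ProcessBody;
   %global region;             /* prompt parameter supplied by the web report */
   %stpbegin;                  /* open ODS output destined for the client */

   title "Sales summary for &region";
   proc means data=sales.transactions n mean sum;
      where region = "&region";
      var amount;
   run;

   %stpend;                    /* close ODS and return the result */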
Detection and adjustment of structural breaks are an important step in modeling time series and panel data. In some cases, such as studying the impact of a new policy or an advertising campaign, structural break analysis might even be the main goal of a data analysis project. In other cases, the adjustment of structural breaks is a necessary step to achieve other analysis objectives, such as obtaining accurate forecasts and effective seasonal adjustment. Structural breaks can occur in a variety of ways during the course of a time series. For example, a series can have an abrupt change in its trend, its seasonal pattern, or its response to a regressor. The SSM procedure in SAS/ETS® software provides a comprehensive set of tools for modeling different types of sequential data, including univariate and multivariate time series data and panel data. These tools include options for easy detection and adjustment of a wide variety of structural breaks. This paper shows how you can use the SSM procedure to detect and adjust structural breaks in many different modeling scenarios. Several real-world data sets are used in the examples. The paper also includes a brief review of the structural break detection facilities of other SAS/ETS procedures, such as the ARIMA, AUTOREG, and UCM procedures.
Rajesh Selukar, SAS
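A minimal sketch of break detection with the SSM procedure, assuming a hypothetical monthly series y in data set series: the CHECKBREAK option on the TREND statement requests structural-break diagnostics for the level component.

   proc ssm data=series;
      id date interval=month;
      trend level(ll) checkbreak;   /* local level; check for breaks in the trend */
      irregular wn;                 /* observation noise component */
      model y = level wn;
   run;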
Session 1068-2017:
Establishing an Agile, Self-Service Environment to Empower Agile Analytic Capabilities
Creating an environment that enables and empowers self-service and agile analytic capabilities requires a tremendous amount of collaboration and extensive agreement between IT and the business. Business and IT users struggle to know which version of the data is valid, where they should get the data from, and how to combine and aggregate all the data sources to apply analytics and deliver results in a timely manner. All the while, IT struggles to supply the business with more and more data that is becoming available through many different sources such as the Internet, sensors, and the Internet of Things. In addition, once users start trying to join and aggregate all the different types of data, the manual coding can be complicated and tedious, can demand excessive resources and processing, and can increase overhead on the system. If IT enables agile analytics in a data lab, it can alleviate many of these issues, increase productivity, and deliver an effective self-service environment for all users. A self-service environment using SAS® analytics in Teradata has decreased the time required to prepare data and develop statistical data models, delivering results in minutes rather than days or even weeks. This session discusses how you can enable agile analytics in a data lab and leverage SAS analytics in Teradata to increase performance, and describes how hundreds of organizations have adopted this concept to deliver self-service capabilities in a streamlined process.
Bob Matsey, Teradata
David Hare, SAS
Implementation of state transition models for loan-level portfolio evaluation was an arduous task until now. Several features have been added to the SAS® High-Performance Risk engine that greatly enhance users' ability to implement and execute these complex, loan-level models. These new features include model methods, model groups, and transition matrix functions. They eliminate unnecessary and redundant calculations, enable the user to seamlessly interconnect systems of models, and automatically handle the bulk of the process logic that users would otherwise need to code themselves. These added features reduce the time and effort needed to set up model implementation processes and significantly reduce model run time. This paper describes these new features in detail. In addition, we show how these powerful models can be easily implemented by using SAS® Model Implementation Platform with SAS® 9.4. This implementation can help many financial institutions take a huge leap forward in their modeling capabilities.
Shannon Clark, SAS
Credit card fraud. Loan fraud. Online banking fraud. Money laundering. Terrorism financing. Identity theft. The strains that modern criminals place on financial and government institutions demand new approaches to detecting and fighting crime. Traditional methods of analyzing large data sets on a periodic, batch basis are no longer sufficient. SAS® Event Stream Processing provides a framework and run-time architecture for building and deploying analytical models that run continuously on streams of incoming data, which can come from virtually any source: message queues, databases, files, TCP/IP sockets, and so on. SAS® Visual Scenario Designer is a powerful tool for developing, testing, and deploying aggregations, models, and rule sets that run in the SAS® Event Stream Processing Engine. This session explores the technology architecture, data flow, tools, and methodologies required to build a solution based on SAS Visual Scenario Designer that enables organizations to fight crime in real time.
John Shipway, SAS
An important component of insurance pricing is the insured location and the associated riskiness of that location. Recently, we have experienced a large increase in the availability of external risk classification variables and associated risk factors by geospatial location. As additional geospatial data becomes available, it is prudent for insurers to take advantage of the new information to better match price to risk. Generalized additive models using penalized likelihood (GAMPL) have been explored as a way to incorporate new location-based information. This type of model can leverage the new geospatial information and incorporate it with traditional insurance rating variables in a regression-based model for rating. In our method, we propose a local regression model in conjunction with our GAMPL model. Our discussion demonstrates the use of the LOESS procedure as well as the GAMPL procedure in a combined solution. Both procedures are in SAS/STAT® software. We discuss in detail how we built a local regression model and used the predictions from this model as an offset into a generalized additive model. We compare the results of the combined approach to results of each model individually.
Kelsey Osterloo, State Farm Insurance Company
Angela Wu, State Farm Insurance Company
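A hedged sketch of the two-stage idea described above, with hypothetical data set and variable names: PROC LOESS fits a local regression over location, and its prediction enters PROC GAMPL as an offset, transformed to the log scale to match a log link.

   /* Stage 1: local regression of the loss measure over location */
   proc loess data=policies;
      model pure_premium = latitude longitude / smooth=0.3;
      output out=geo_fit predicted=geo_pred;
   run;

   /* Put the stage-1 fit on the linear-predictor (log) scale */
   data geo_fit;
      set geo_fit;
      geo_off = log(max(geo_pred, 1e-6));   /* guard against nonpositive fits */
   run;

   /* Stage 2: generalized additive model with the stage-1 offset */
   proc gampl data=geo_fit;
      model pure_premium = param(driver_age vehicle_value)
                           spline(latitude longitude)
            / dist=gamma link=log offset=geo_off;
   run;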
Do you need to add annotations to your graphs? Do you need to specify your own colors on the graph? Would you like to add Unicode characters to your graph, or create templates that can also be used by non-programmers to produce the required figures? Great, then this topic is for you! In this hands-on workshop, you are guided through the more advanced features of the Graph Template Language (GTL). There are also fun and challenging SAS® graphics exercises to help you retain what you have learned.
Kriss Harris
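For orientation before the advanced features, a minimal GTL pattern: PROC TEMPLATE compiles a statistical graph template, and PROC SGRENDER applies it to data (here the shipped SASHELP.CLASS data set).

   proc template;
      define statgraph scatter_demo;
         begingraph;
            entrytitle "Weight by Height";
            layout overlay;
               scatterplot x=height y=weight /
                  markerattrs=(symbol=circlefilled color=steelblue);
            endlayout;
         endgraph;
      end;
   run;

   proc sgrender data=sashelp.class template=scatter_demo;
   run;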
When analyzing data with SAS®, we often use the SAS DATA step and the SQL procedure to explore and manipulate data. Although both are useful tools, many SAS users do not fully understand their differences, advantages, and disadvantages, and thus engage in unnecessary, biased debates about them. This paper illustrates and discusses these aspects with real-world examples that give SAS users deeper insight into both tools (a small join example, written both ways, follows this abstract). Using the right tool for a given circumstance not only provides an easier and more convenient solution, it also saves time and effort in programming, thus improving work efficiency. Furthermore, the illustrated methods and advanced programming skills can be applied in a wide variety of data analysis and business analytics fields.
Justin Jia, TransUnion
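As one small, hedged example of the comparison (the data sets and key are hypothetical), the same inner join can be written both ways:

   /* DATA step merge: both inputs must be sorted by the key */
   proc sort data=accounts; by cust_id; run;
   proc sort data=scores;   by cust_id; run;

   data joined_ds;
      merge accounts(in=a) scores(in=b);
      by cust_id;
      if a and b;            /* keep matching rows only: an inner join */
   run;

   /* PROC SQL join: no pre-sorting required */
   proc sql;
      create table joined_sql as
      select a.*, b.risk_score
      from accounts as a
           inner join scores as b
           on a.cust_id = b.cust_id;
   quit;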
Every organization, from the most mature to a day-one start-up, needs to grow organically. A deep understanding of internal customer and operational data is the single biggest catalyst for developing and sustaining that growth. Advanced analytics and big data feed directly into this, and there are best practices that any organization, across the entire growth curve, can adopt to drive success. Analytics teams can be drivers of growth, but to be truly effective they need to implement key best practices. These range from in-the-weeds details, like the approach to data hygiene, to strategic practices, like team structure and model governance. When executed poorly, business leadership and the analytics team are unable to communicate: they talk past each other and do not work toward a common goal. When executed well, the analytics team is part of the business solution, aligned with the needs of business decision-makers, and drives the organization forward. Through our engagements, we have discovered best practices in three key areas, all critical to analytics team effectiveness: 1) data hygiene, 2) complex statistical modeling, and 3) team collaboration.
Aarti Gupta, Bain & Company
Paul Markowitz, Bain & Company
A microservice architecture prescribes the design of your software application as suites of independently deployable services. In this paper, we detail how you can design your SAS® 9.4 programs so that they adhere to a microservice architecture. We also describe how you can leverage Many-Task Computing (MTC) in your SAS® programs to gain a high level of parallelism. Under these paradigms, your SAS code will gain encapsulation, robustness, reusability, and performance. The design principles discussed in this paper are implemented in the SAS® Infrastructure for Risk Management (IRM) solution. Readers with an intermediate knowledge of Base SAS® and the SAS macro language will understand how to design their SAS code so that it follows these principles and reaps the benefits of a microservice architecture.
Henry Bequet, SAS
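The paper's many-task design is implemented inside SAS® Infrastructure for Risk Management; as a generic, hedged sketch of task parallelism available to a SAS 9.4 site with SAS/CONNECT®, independent units of work can be spawned and synchronized like this (the libref and data set names are hypothetical):

   options autosignon sascmd="!sascmd";   /* implicit sign-on of worker sessions */

   rsubmit task1 wait=no inheritlib=(work=pwork);
      /* task 1 runs concurrently with task 2 */
      proc sort data=pwork.claims out=pwork.claims_sorted;
         by policy_id;
      run;
   endrsubmit;

   rsubmit task2 wait=no inheritlib=(work=pwork);
      proc means data=pwork.exposures noprint;
         var exposure_amt;
         output out=pwork.exposure_stats mean= sum= / autoname;
      run;
   endrsubmit;

   waitfor _all_ task1 task2;   /* barrier: resume when both tasks finish */
   signoff _all_;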
Banks can create a competitive advantage by using business intelligence (BI) and by building models. In the credit domain, the best practice is to build risk-sensitive models (Probability of Default, Exposure at Default, Loss Given Default, Unexpected Loss, Concentration Risk, and so on) and implement them in decision-making, credit granting, and credit risk management. On the next level are models and tools built on these models that are used to achieve business targets: setting risk-sensitive pricing, capital planning, optimizing Return on Equity/Risk-Adjusted Return on Capital (ROE/RAROC), managing the credit portfolio, setting the level of provisions, and so on. This works remarkably well as long as the models work. However, models deteriorate over time, and their predictive power can drop dramatically. As a result, heavy reliance on models in decision-making (some decisions are automated following the model's results, without human intervention) can produce large errors, which might have dramatic consequences for the bank's performance. In my presentation, I share our experience in reducing model risk and establishing corporate governance of models with the following SAS® tools: SAS® Model Monitoring Microservice, SAS® Model Manager, dashboards, and SAS® Visual Analytics.
Boaz Galinson, Bank Leumi
Graphics are an excellent way to display results from multiple statistical analyses and to get a visual message across to the right audience. Scientific journals often have very precise requirements for graphs submitted with manuscripts. While authors often find themselves using tools other than SAS® to create these graphs, the combination of the SGPLOT procedure and the Output Delivery System enables authors to create what they need in the same place where they conducted their analysis. This presentation focuses on two methods for creating a publication-quality graphic in SAS® 9.4 and provides solutions for some issues encountered along the way.
Charlotte Baker, Florida A&M University
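One common pattern for journal-ready output, sketched here under assumed journal requirements (a 300-DPI TIFF at a fixed size; the output path is hypothetical):

   ods _all_ close;
   ods listing gpath="/project/figures" image_dpi=300;
   ods graphics / reset width=5in height=3.5in
                  imagename="figure1" imagefmt=tiff;

   proc sgplot data=sashelp.heart;
      title;                              /* journals typically add captions separately */
      scatter x=weight y=cholesterol / markerattrs=(symbol=circle);
      xaxis label="Weight (lb)";
      yaxis label="Cholesterol (mg/dL)";
   run;

   ods listing close;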
Whether you are a current SAS® Marketing Optimization user who wants to fine-tune your scenarios, a SAS® Marketing Automation user who wants to understand how SAS Marketing Optimization might improve your campaigns, or completely new to the world of marketing optimization, this session covers ideas and insights for getting the highest strategic impact out of SAS Marketing Optimization. SAS Marketing Optimization is powerful analytical software, but as with all software, what you get out is largely determined by what you put in. Building scenarios is as much an art as it is a science, and how you build those scenarios directly affects your results. What questions should you be asking to establish the best objectives? What suppressions should you consider? We develop and compare multiple what-if scenarios and discuss how to leverage SAS Marketing Optimization as a business decisioning tool to determine the best scenarios to deploy for your campaigns. We include examples from various industries, including retail, financial services, telco, and utilities. The following topics are discussed in depth: establishing high-impact objectives, with an emphasis on setting objectives that affect organizational key performance indicators (KPIs); performing and interpreting sensitivity analysis; return on investment (ROI); evaluating opportunity costs; and comparing what-if scenarios.
Erin McCarthy, SAS
This paper explores the utilization of medical services, which characteristically follows an exponential distribution. Because of this characteristic, a generalized linear model can be applied to obtain rates for self-managed health plans. This approach differs from the methods generally used to set health plan rates: it captures qualitative characteristics of the exposed participants that older rate-making methods cannot. The paper also uses generalized linear models to estimate the number of days that individuals remain hospitalized. The method is developed in a SAS® Enterprise Guide® project, in which the utilization of medical services by the insured base during the years 2012 through 2015 (the final year of the base) is compared with the Hospital Cost Index of Variation. The results show that, among the variables chosen for the model, income has an inverse relationship with the risk of health care expenses: individuals with higher earnings tend to use fewer of the services offered by the health plan. Male individuals have higher expenditures than female individuals, and this is reflected in the statistically determined rate. Finally, the model generates tables of rates that can be charged to plan participants for health plans that cover all average risks.
Luiz Carlos Leao, Universidade Federal Fluminense (UFF)
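The paper's model is developed in SAS® Enterprise Guide®; as a hedged sketch of a gamma GLM of the general kind described (the data set and variables are hypothetical), PROC GENMOD relates expenses to income, age, and gender:

   proc genmod data=utilization;
      class gender;
      /* Gamma response with log link suits right-skewed,
         strictly positive expense data */
      model expense = income age gender / dist=gamma link=log;
   run;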
This report provides a simple and intuitive overview of the behavior of technical provisions and the profitability of health insurance segments, based on the historical data of a major insurance company. The profitability analysis displays indicators built from claims, prices, and the number of insureds, with performance broken out by gender, region, and product. The report's user can simulate more accurate premiums by entering information about rising medical costs and target claims rates. The technical provision view identifies the greatest impacts on the provision, such as claims payments, legal expense estimates, and future claims payments and reports, and it compares actual health insurance costs with the provision estimated in a previous period. The report therefore gives the user a unique panorama of health insurance underwriting and helps evaluate its results in order to make strategic decisions for the future.
Janice Leal, SulAmerica Companhia Nacional de Seguros
Ensemble models have become increasingly popular for boosting prediction accuracy over the last several years. Stacked ensemble techniques combine predictions from multiple machine learning algorithms and use these predictions as inputs to a second-level learning algorithm. This paper shows how you can generate a diverse set of models by various methods (such as neural networks, extreme gradient boosting, and matrix factorizations) and then combine them with popular stacking ensemble techniques, including hill climbing, generalized linear models, gradient boosted decision trees, and neural nets, by using both the SAS® 9.4 and SAS® Visual Data Mining and Machine Learning environments. The paper analyzes the application of these techniques to real-life big data problems and demonstrates how stacked ensembles produce greater prediction accuracy than individual models and naive ensembling techniques. In addition to training a large number of models, model stacking requires the proper use of cross validation to avoid overfitting, which makes the process even more computationally expensive. The paper shows how to deal with the computational expense and efficiently manage an ensemble workflow by using parallel computation in a distributed framework. A minimal sketch of the second-level stacking step follows this abstract.
Funda Gunes, SAS
Russ Wolfinger, SAS
Pei-Yi Tan, SAS
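As a minimal, hedged sketch of the second-level step only: given out-of-fold predictions from three hypothetical base learners stored alongside the target, a logistic regression can serve as the meta-learner.

   /* train_oof holds cross-validated (out-of-fold) base-model predictions,
      which protects the meta-learner from training-data leakage */
   proc logistic data=train_oof;
      model target(event='1') = p_gradboost p_neural p_factmac;
      score data=test_scored out=stacked_pred;   /* apply the stack to new data */
   run;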
Every visualization tells a story. The effectiveness of showing data through visualization becomes clear as these visualizations tell stories about differences in US mortality, drawing on the National Longitudinal Mortality Study (NLMS) data: a Public-Use Microdata Sample (PUMS) of 1.2 million cases and 122 thousand mortality records. SAS® Visual Analytics is a versatile and flexible tool that easily displays the simple effects of differences in mortality rates between age groups, genders, races, places of birth (native or foreign), education and income levels, and so on. Sophisticated analyses, including logistic regression (with interactions), decision trees, and neural networks, displayed in a clear, concise manner, help describe more interesting relationships among the variables that influence mortality. Among the most compelling examples: males who live alone have a higher mortality rate than females, and white men have higher rates of suicide than black men.
Catherine Loveless-Schmitt, U.S. Census Bureau
You might scream in pain or cry with joy that SAS® software can directly produce output in Microsoft Excel as .xlsx workbooks. Excel is an excellent vehicle for delivering large amounts of summary information that needs to be partitioned for human review, exploratory filtering, and sorting. SAS supports ODS EXCEL as a production destination. This paper discusses using the ODS EXCEL statement and the TABULATE and REPORT procedures to summarize cross-sectional data extracted from a medical claims database. The discussion covers data preparation, report preparation, and tabulation statements such as CLASS, CLASSLEV, and TABLE. The effects of STYLE options and the TAGATTR suboption for inserting Excel-specific features such as formulas, formats, and alignment are covered in detail (a condensed sketch follows this abstract). A short discussion of reusing these concepts in PROC REPORT statements such as DEFINE, COMPUTE, and CALL DEFINE is also included.
Richard DeVenezia, Johnson & Johnson
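A condensed, hedged sketch of the pattern the paper covers (the data set and variables are hypothetical): ODS EXCEL writes the .xlsx workbook, and a TAGATTR style override passes an Excel-native number format through PROC TABULATE.

   ods excel file="claims_summary.xlsx"
             options(sheet_name="By Region" frozen_headers="1");

   proc tabulate data=claims;
      class region plan_type;
      var paid_amount;
      table region*plan_type,
            paid_amount*(n sum*{style={tagattr="format:#,##0.00"}} mean);
   run;

   ods excel close;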
As part of the Talanx Group, HDI Insurance has been one of the leading insurers in Brazil. HDI Brazil recently implemented an innovative, integrated solution to prevent fraud in the auto claims process, based on SAS® Fraud Framework and SAS® Real-Time Decision Manager. A vehicle repair or a refund is approved immediately after claim registration for customers with no suspicious information; high-scoring claims, on the other hand, are checked by inspectors using SAS® Social Network Analysis. Analytically, the solution takes a hybrid approach, working with predictive models, business rules, anomaly detection, and network relationships. The main benefits are a reduction in the amount of fraud, more accuracy in determining which claims to investigate, a decrease in the false-positive rate, and the use of a relationship network to investigate suspicious connections.
Rayani Melega, HDI SEGUROS
This paper discusses a specific example of using graph analytics or social network analysis (SNA) in predictive modeling in the life insurance industry. The methods of social network analysis are applied to agents that share compensation, and the results are used to derive input variables for a model to predict the likelihood of certain behavior by insurance agents. Both SAS® code and SAS® Enterprise Miner are used to illustrate implementing different graph analytical methods. This paper assumes that the reader is familiar with the basic process of creating predictive models using multiple (linear or logistic) regression, and, in some sections, familiarity with SAS Enterprise Miner.
Robert Moore, Thrivent Financial
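The paper derives its network metrics with SAS tools; as a minimal, hedged illustration of one such model input (the tables and columns are hypothetical), an agent's weighted degree can be computed from a compensation-sharing edge list:

   proc sql;
      /* distinct partners and total shared compensation per agent */
      create table agent_degree as
      select agent_id,
             count(distinct partner_id) as degree,
             sum(shared_comp)           as total_shared
      from comp_edges
      group by agent_id;
   quit;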