Advanced Papers A-Z

Session 1256-2017:
A Comparison of Machine Learning Methods and Logistic Analysis for the Prediction of Past-Due Amount
This poster shows how to predict a past-due amount using traditional and machine learning techniques: logistic analysis, k-nearest neighbors, and random forest. The analyzed data set comes from real-world commerce. It contains 305 categories of financial information on 11,787,287 unique businesses from 2006 to 2014. The main challenge is handling large, noisy real-world data. The first step of any model-building exercise is to define the outcome. A common prediction method in the financial services industry is to use binary outcomes, such as Good and Bad. For our research problem, we reduced past-due amounts to two classes, Good and Bad. Next, we built a two-stage model using logistic regression: the first stage predicts the likelihood of a Bad outcome, and the second predicts the past-due amount, given a Bad outcome. Logistic analysis, a traditional statistical technique, is commonly used for prediction and classification in the financial services industry. However, for analyzing big, noisy, or complex data sets, machine learning techniques are typically preferred because they can detect hard-to-discern patterns. To compare the two approaches, we use predictive accuracy, ROC index, sensitivity, and specificity as criteria.
Jie Hao, Kennesaw State University
Peter Eberhardt, Fernwood Consulting Group Inc.
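As a rough illustration of the preceding poster (Session 1256-2017), the sketch below shows how the two stages combine into an expected past-due amount and how the named comparison criteria can be computed from predicted probabilities. It is a minimal Python sketch, not the authors' SAS code; it assumes Bad is coded 1 and treated as the positive class, a 0.5 classification cutoff, and hypothetical function names. The ROC index is computed in its Mann-Whitney form: the share of (Bad, Good) pairs in which the Bad case receives the higher score, counting ties as one half.

```python
from typing import Dict, Sequence


def two_stage_expected_amount(p_bad: float, amount_if_bad: float) -> float:
    """Combine stage 1 (probability of a Bad outcome) with stage 2
    (predicted past-due amount given Bad) into an expected past-due amount."""
    return p_bad * amount_if_bad


def classification_metrics(y_true: Sequence[int], p_bad: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and ROC index (AUC).
    Assumption: Bad = 1 is the positive class; both classes must be present."""
    y_pred = [1 if p >= threshold else 0 for p in p_bad]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    pos = [s for t, s in zip(y_true, p_bad) if t == 1]
    neg = [s for t, s in zip(y_true, p_bad) if t == 0]
    # ROC index: probability that a random Bad case outranks a random Good case,
    # with ties counted as one half (Mann-Whitney formulation).
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "roc_index": wins / (len(pos) * len(neg)),
    }
```

For example, with outcomes [1, 1, 0, 0] and scores [0.9, 0.4, 0.6, 0.1], accuracy, sensitivity, and specificity are all 0.5 while the ROC index is 0.75, which is one reason threshold-based and threshold-free criteria are usually reported together.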
Session 1337-2017:
A Data Mining Approach to Predict Students at Risk
With the increasing amount of educational data, educational data mining has become increasingly important for uncovering hidden patterns in institutional data that can support institutional decision making (Luan 2012). However, few studies have applied educational data mining to institutional decision support. At the University of Connecticut (UCONN), organic chemistry is a required course for undergraduate students in STEM disciplines. It has a very high DFW rate (D=Drop, F=Failure, W=Withdraw). Take Fall 2014 as an example: the average DFW rate for the Organic Chemistry lectures at UCONN was 24%, with over 1200 students enrolled. In this study, data on undergraduate students enrolled during the 2010-2011 school year were used to build the model. The purpose of the study was to predict future student success in order to improve the quality of education at our institution. The Sample, Explore, Modify, Model, and Assess (SEMMA) method introduced by SAS was applied to develop the predictive model. Freshman SAT scores, campus, semester GPA, financial aid, and other factors were used to predict students' performance in this course. During predictive modeling, several techniques (decision tree, neural network, ensemble models, and logistic regression) were compared to find the best one for our institution.
Read the paper (PDF)
Youyou Zheng, University of Connecticut
Thanuja Sakruti, University of Connecticut
Session SAS0537-2017:
Bringing Real-Time Scoring to Your SAS® Visual Analytics Dashboards with SAS® Visual Statistics Score Code
Whether you are calculating a credit risk, a health risk, or something entirely different, many industries need instant, on-the-fly risk score calculation. This paper demonstrates how you can produce individualized risk scores through interactive dashboards. Your risk scores are backed by powerful SAS® analytics because they use score code that you produce in SAS® Visual Statistics. Advanced topics are covered in detail, including the use of calculated items and parameters in your dashboards and the development of SAS® Stored Processes that accept parameters passed from your SAS® Visual Analytics dashboard.
Read the paper (PDF)
Eli Kovick, SAS
Session 0835-2017:
Building Intelligent Macros: Using Metadata Functions with the SAS® Macro Language
The SAS® Macro Language gives you the power to create tools that, to a large extent, think for themselves. How often have you used a macro that required your input and thought to yourself, "Why do I need to provide this information when SAS® already knows it?" SAS might already know most of this information, but how can your macro programs discover the information that they need on their own? Fortunately, a number of functions and tools in SAS enable your programs to find and use the information that they require. If you provide a variable name, SAS should know its type and length. If you provide a data set name, SAS should know its list of variables. If you provide a library or libref, SAS should know the full list of data sets that it contains. In each of these situations, the macro language can use functions to determine and return the information. Given a libref, functions can determine the library's physical location and the list of data sets it contains. Given a data set, they can return the names and attributes of any of the variables that it contains. These functions can also read and write data, create directories, and build lists of files in a folder and lists of folders. Maximize your macro's intelligence; learn and use these functions.
Read the paper (PDF)
Art Carpenter, California Occidental Consultants
Session 1484-2017:
Can Incumbents Take the Digital Curve?!
Digital transformation and analytics for incumbents isn't a question of choice or strategy. It's a question of business survival. Go analytics!
Liav Geffen, Harel Insurance & Finance
Session SAS1414-2017:
Churn Prevention in the Telecom Services Industry: A Systematic Approach to Prevent B2B Churn Using SAS®
"It takes months to find a customer and only seconds to lose one." (Unknown) Though the Business-to-Business (B2B) churn problem might not be as common as Business-to-Consumer (B2C) churn, it has become crucial for companies to address it effectively as well. Using statistical methods to predict churn is the first step in the customer-retention process, which also includes model evaluation, prescriptive analytics (including outreach optimization), and performance reporting. Providing visibility into model and treatment performance enables the Data and Ops teams to tune models and adjust treatment strategy. West Corporation's Center for Data Science (CDS) has partnered with one of the company's lines of business to measure and prevent B2B customer churn. CDS has combined firmographic and demographic data with internal CRM and past outreach data to build a Propensity to Churn model using SAS®. CDS has provided the churn model output to an internal Client Success Team (CST), which focuses on high-risk/high-value customers in order to understand and resolve any concerns that such customers might express. Furthermore, CDS has automated weekly performance reporting using SAS and Microsoft Excel; the reports focus not only on model statistics but also on CST actions and their impact. This paper covers all of the steps involved in the churn-prevention process: building and reviewing the model, designing and implementing treatments, and reporting on performance.
Krutharth Peravalli, West Corporation
Dmitriy Khots, West Corporation
Session SAS0381-2017:
Circular Metadata Group Membership Can Make Us Dizzy!
Today it is vital for an organization to manage, distribute, and secure content for its employees. In most cases, different groups of employees are interested in different content, and some content should not be available to everyone. It is the SAS® administrator's job to design a metadata group structure that makes managing content easier. SAS enables you to create any metadata group organizational structure imaginable, and it is common to define a metadata group structure that mimics the organization's hierarchy. Circular group memberships are frequently the cause of unexpected issues with SAS web applications. A circular group relationship can be as simple as two groups being members of each other, and you might not be aware that you have defined this type of recursive association. The paper identifies some problems caused by recursive group memberships and provides tools for investigating your metadata group structure and identifying recursive relationships. We explain the process of extracting group associations from the SAS® Metadata Server, and we show how to organize this data to investigate group relationships. We use a stored process to generate a report, and SAS® Visual Analytics to generate a network diagram that graphically represents an organization's group relationship structure, making circular group structures easy to identify.
Read the paper (PDF)
Karen Hinkson, SAS
Greg Lehner, SAS
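The circular memberships described in the preceding paper (Session SAS0381-2017) are cycles in a directed graph whose edges point from a group to the groups it belongs to. As a minimal Python sketch, assuming the group associations have already been extracted into a dictionary (the extraction from the SAS® Metadata Server is not shown, and the function name is hypothetical), a depth-first search reports a cycle whenever it reaches a group that is already on the current search path. Every set of mutually nested groups produces at least one reported cycle.

```python
from typing import Dict, List


def find_circular_memberships(member_of: Dict[str, List[str]]) -> List[List[str]]:
    """member_of maps each group to the groups it is a direct member of
    (hypothetical input shape). Returns one cycle per DFS back edge,
    e.g. ['A', 'B', 'A'] when A is in B and B is in A."""
    WHITE, GRAY, BLACK = 0, 1, 2       # unvisited, on current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []               # groups on the current DFS path
    cycles: List[List[str]] = []

    def visit(group: str) -> None:
        color[group] = GRAY
        path.append(group)
        for parent in member_of.get(group, []):
            state = color.get(parent, WHITE)
            if state == WHITE:
                visit(parent)
            elif state == GRAY:
                # Back edge: parent is already on the path, so we closed a loop.
                start = path.index(parent)
                cycles.append(path[start:] + [parent])
        path.pop()
        color[group] = BLACK

    for group in list(member_of):
        if color.get(group, WHITE) == WHITE:
            visit(group)
    return cycles
```

For example, `{"A": ["B"], "B": ["A"]}` yields `[['A', 'B', 'A']]`, while a strict hierarchy such as two departments nested in one all-staff group yields an empty list.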
Session 1445-2017:
Complex Merging of Emergency Department and Hospitalization Data to Create a Longitudinal Data Set
Epidemiologists and other health scientists are often tasked with solving health problems but find collecting original data prohibitive for a multitude of reasons. For this reason, it is common to use secondary data instead, such as data from emergency department (ED) visits or inpatient hospital stays. To use some of these secondary data sets to study problems over time, it is necessary to link them together using common identifiers while keeping all the unique information about each ED visit or hospitalization. This paper discusses a method that was used to combine five years' worth of individual ED visits and five years' worth of individual hospitalizations into a single, (much) larger data set for longitudinal analysis.
Read the paper (PDF)
Charlotte Baker, Florida A&M University
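The linkage described in the preceding paper (Session 1445-2017) can be pictured as stacking both sources, keyed on a common person identifier, and ordering each person's encounters in time. The Python sketch below is a simplification under stated assumptions, not the author's method: it assumes the records already share a clean identifier (the field names `patient_id` and `date`, and the example records, are hypothetical), keeps every source-specific field, and numbers encounters per person. Identifier-matching problems are not handled.

```python
from datetime import date
from typing import Dict, List


def build_longitudinal(ed_visits: List[dict], hospitalizations: List[dict]) -> List[dict]:
    """Stack ED and inpatient records into one person-level timeline.
    Field names patient_id and date are hypothetical."""
    # Copy each record, keep all of its own fields, and tag its source.
    combined = ([{**r, "source": "ED"} for r in ed_visits] +
                [{**r, "source": "INPATIENT"} for r in hospitalizations])
    # Order each person's encounters chronologically.
    combined.sort(key=lambda r: (r["patient_id"], r["date"]))
    seq: Dict[str, int] = {}
    for r in combined:
        seq[r["patient_id"]] = seq.get(r["patient_id"], 0) + 1
        r["encounter_seq"] = seq[r["patient_id"]]
    return combined


# Hypothetical example records.
ed = [{"patient_id": "p1", "date": date(2012, 3, 1), "ed_dx": "J45"}]
inpatient = [{"patient_id": "p1", "date": date(2011, 5, 2), "los_days": 3},
             {"patient_id": "p2", "date": date(2010, 1, 1), "los_days": 1}]
timeline = build_longitudinal(ed, inpatient)
```

Fields that exist in only one source (here `ed_dx` or `los_days`) stay on the rows from that source, which mirrors the abstract's requirement to keep all the unique information about each visit or stay.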
Session SAS0552-2017:
Deploying SAS® on Software-Defined and Virtual Storage Systems
This paper presents considerations for deploying SAS® Foundation across software-defined storage (SDS) infrastructures and within virtualized storage environments. Many new offerings on the market provide easy, point-and-click creation of storage entities with simplified management. Internal storage area network (SAN) virtualization also removes much of the hands-on management involved in defining storage device pools. Automated tiering software further attempts to optimize data placement across performance tiers without manual intervention. Virtual storage provisioning and automated tier placement offer many time-saving and management benefits. In some cases, however, they have also caused serious unintended performance issues with heavy large-block workloads, such as those found in SAS Foundation. You must follow best practices to get the benefit of these new technologies while still maintaining performance. For SDS infrastructures, this paper offers specific considerations for the performance of applications in SAS Foundation, workload management and segregation, replication, high availability, and disaster recovery. For virtualized and tiered storage systems, it describes the architecture and performance ramifications and offers advice. General virtual storage pros and cons are also discussed in detail.
Read the paper (PDF)
Tony Brown, SAS
Margaret Crevar, SAS
Session 1170-2017:
Developing a Product Recommendation Platform for Real-Time Decisions in the Direct Sales Environment
Recommending products to end customers is already an established practice in e-commerce. Combining consumer profile information with behavioral data tends to generate results that are more than satisfactory for the business. Natura's challenge was to create the same type of solution for its sales representatives in the platform they use for ordering. The sales representatives are not buying for their own consumption but are ordering according to the demands of their customers. That is the difference: in this case, analysts have no information about the behavior or preferences of the end customer. By creating a basket product concept for its sales representatives, Natura developed a new solution. Natura developed an algorithm using association analysis (Market Basket) and implemented it directly in the sales platform using SAS® Real-Time Decision Manager. Measured by recommendation conversion (recommended products added to orders), the amount brought in by the new solution was 53% higher than that of random suggestions and 38% higher than that of recommendations based on business rules.
Read the paper (PDF)
Francisco Pigato, Natura
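The association analysis named in the preceding paper (Session 1170-2017) rests on three standard measures for a rule X → Y: support (share of baskets containing both items), confidence (share of baskets with X that also contain Y), and lift (confidence divided by the overall share of baskets containing Y). The Python sketch below computes them for item pairs only; it is not Natura's algorithm or the SAS® Real-Time Decision Manager implementation, and the function name and example products are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float, float]  # (X, Y, support, confidence, lift)


def pair_rules(baskets: List[Set[str]], min_support: float = 0.0) -> List[Rule]:
    """Support, confidence, and lift for every rule X -> Y between two items."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    # Sorting makes each unordered pair appear under a single key.
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules: List[Rule] = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):          # evaluate both rule directions
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    return rules
```

In a four-basket example where lotion appears twice, always together with soap, and soap appears three times, the rule lotion → soap has support 0.5, confidence 1.0, and lift 4/3 (about 1.33); a lift above 1 indicates that the pair occurs together more often than independence would predict.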
Session SAS0388-2017:
Factorization Machines: A New Tool for Sparse Data
Factorization machines are a new type of model that is well suited to very high-cardinality, sparsely observed transactional data. This paper presents the new FACTMAC procedure, which implements factorization machines in SAS® Visual Data Mining and Machine Learning. This powerful and flexible model can be thought of as a low-rank approximation of a matrix or a tensor, and it can be efficiently estimated when most of the elements of that matrix or tensor are unknown. Thanks to a highly parallel stochastic gradient descent optimization solver, PROC FACTMAC can quickly handle data sets that contain tens of millions of rows. The paper includes examples that show you how to use PROC FACTMAC to recommend movies to users based on tens of millions of past ratings, predict whether fine food will be highly rated by connoisseurs, restore heavily damaged high-resolution images, and discover shot styles that best fit individual basketball players.
Read the paper (PDF)
Jorge Silva, SAS
Ray Wright, SAS
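The preceding paper (Session SAS0388-2017) describes factorization machines as low-rank models for sparse data. In the standard second-order formulation (due to Rendle), the prediction is ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, where each feature i has a k-dimensional factor vector v_i. The pairwise term can be rewritten as ½ Σ_f [(Σ_i v_{i,f} x_i)² − Σ_i v_{i,f}² x_i²], which costs O(kn) instead of O(kn²). The Python sketch below shows only this prediction step with made-up parameters; it does not reproduce PROC FACTMAC's options or its stochastic gradient descent training.

```python
from typing import Sequence

Vector = Sequence[float]


def fm_predict(x: Vector, w0: float, w: Vector, V: Sequence[Vector]) -> float:
    """Second-order factorization machine prediction in O(k * n) time.
    x: feature values (mostly zeros for one-hot user/item data);
    w0, w: global bias and per-feature weights; V[i]: length-k factors of feature i."""
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))           # sum_i v_if * x_i
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))    # sum_i v_if^2 * x_i^2
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise


def fm_predict_naive(x: Vector, w0: float, w: Vector, V: Sequence[Vector]) -> float:
    """Same model with the explicit O(k * n^2) loop over feature pairs, for checking."""
    n, k = len(x), len(V[0])
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(V[i][f] * V[j][f] for f in range(k)) * x[i] * x[j]
    return total
```

With a one-hot user/item encoding such as x = [1, 0, 1, 0], only the single user-item pair contributes to the interaction term, which is why the model can learn from sparsely observed ratings: the factor vectors are shared across every pair a feature appears in.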
Session SAS0538-2017:
Fast Implementation of State Transition Models
Implementing state transition models for loan-level portfolio evaluation was an arduous task until now. Several features have been added to the SAS® High-Performance Risk engine that greatly enhance users' ability to implement and execute these complex, loan-level models. These new features include model methods, model groups, and transition matrix functions. They eliminate unnecessary and redundant calculations, enable the user to seamlessly interconnect systems of models, and automatically handle the bulk of the process logic in model implementation that users would otherwise need to code themselves. These features reduce both the time and effort needed to set up model implementation processes and significantly reduce model run time. This paper describes these new features in detail. In addition, we show how these powerful models can be easily implemented by using SAS® Model Implementation Platform with SAS® 9.4. This implementation can help many financial institutions take a huge leap forward in their modeling capabilities.
Read the paper (PDF)
Shannon Clark, SAS
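The transition matrix functions mentioned in the preceding paper (Session SAS0538-2017) are not documented here, so the Python sketch below only illustrates the underlying idea of a loan-level state transition model: a probability vector over loan states is multiplied by a transition matrix once per period. The four states and the monthly probabilities are hypothetical, and the function is not the SAS® High-Performance Risk API.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical monthly transition matrix for one loan: rows are "from" states,
# columns are "to" states; Default and Prepaid are absorbing.
STATES = ["Current", "Delinquent", "Default", "Prepaid"]
P: Matrix = [
    [0.90, 0.05, 0.00, 0.05],
    [0.30, 0.50, 0.20, 0.00],
    [0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
]


def project_states(start: List[float], P: Matrix, periods: int) -> List[List[float]]:
    """Roll a loan's state-probability vector forward: s_{t+1} = s_t P.
    Returns the vector for every period, including the starting one."""
    for row in P:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the transition matrix must sum to 1")
    path = [list(start)]
    for _ in range(periods):
        s = path[-1]
        path.append([sum(s[i] * P[i][j] for i in range(len(s)))
                     for j in range(len(P[0]))])
    return path
```

Starting from Current, two periods give roughly 0.825 Current, 0.07 Delinquent, 0.01 Default, and 0.095 Prepaid; the probabilities still sum to 1 because every row of the matrix does.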
Session 0902-2017:
Fitting Complex Statistical Models with NLMIXED and MCMC Procedures
SAS/STAT® software has several procedures that estimate parameters of generalized linear models for both continuous and discrete response data (including proportions and counts). Procedures such as LOGISTIC, GENMOD, GLIMMIX, and FMM, among others, offer a flexible range of analysis options for data from a variety of distributions and for correlated or clustered data. SAS® procedures can also model zero-inflated and truncated distributions. This paper demonstrates how statements in PROC NLMIXED can be written to match the output results from these procedures, including the LS-means. The flexible programming statements of PROC NLMIXED are also needed in other situations, such as zero-inflated or hurdle models, truncated counts, or proportions (including legitimate zeros) that have random effects, and for probability distributions not available elsewhere. A useful consequence of these coding techniques is that programming statements from NLMIXED can often be transferred directly into PROC MCMC with little or no modification to perform Bayesian analyses of these various types of complex models.
Read the paper (PDF)
Robin High, University of Nebraska Medical Center
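The zero-inflated models mentioned in the preceding paper (Session 0902-2017) are typically fitted in PROC NLMIXED by writing the log-likelihood in programming statements and passing it through the GENERAL specification on the MODEL statement. The Python sketch below writes out the same zero-inflated Poisson (ZIP) log-likelihood so that its two pieces are visible; the log link for the mean, the logit link for the zero-inflation probability, the single covariate, and the function names are assumptions for illustration, not the paper's code.

```python
import math
from typing import Sequence


def zip_loglik(y: int, lam: float, pi: float) -> float:
    """Log-likelihood of one count y under a zero-inflated Poisson model.
    pi: probability of a structural (excess) zero; lam: Poisson mean."""
    if y == 0:
        # A zero can come from the structural-zero process or from the Poisson part.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    # Positive counts come only from the Poisson part.
    return math.log(1.0 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)


def zip_total_loglik(ys: Sequence[int], xs: Sequence[float],
                     b0: float, b1: float, g0: float, g1: float) -> float:
    """Sample log-likelihood with assumed links log(lam) = b0 + b1*x
    and logit(pi) = g0 + g1*x."""
    total = 0.0
    for y, x in zip(ys, xs):
        lam = math.exp(b0 + b1 * x)
        pi = 1.0 / (1.0 + math.exp(-(g0 + g1 * x)))
        total += zip_loglik(y, lam, pi)
    return total
```

Maximizing `zip_total_loglik` over the four parameters is what an NLMIXED fit of this model does; adding a random effect to the linear predictors is the step that makes NLMIXED's programming statements necessary.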
Session 0771-2017:
From Event Queues to Analytics
In the quest for valuable analytics, access to business data through message queues provides near real-time access to the entire data life cycle. This in turn enables our analytical models to perform accurately. What does the item a user temporarily put in the shopping basket indicate, and what can be done to motivate the user? How do you recover a user who has now unsubscribed, given that the user had previously unsubscribed and quickly re-subscribed? User behavior can be captured completely and efficiently using a message queue, which places minimal load on production systems and allows for distributed environments. Several technical issues arise when populating a data warehouse with events from a message queue. The presentation outlines solutions to the following issues: the message queue connection, how to ensure that messages aren't lost in transit, and how to efficiently process messages with SAS®; message definition and metadata, and how to react to changes in message structure; data architecture, and which architecture is appropriate for storing message data and other business data; late arrival of messages, and how late-arriving data can be loaded into slowly changing dimensions; and analytical processing, and how transactional message data can be reformatted for analytical modeling. Ultimately, populating a data warehouse with message queue data can require less development than accessing source databases; however, a robust architecture
Read the paper (PDF)
Bronwen Fairbairn, Collection House Group
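One of the issues listed in the preceding presentation (Session 0771-2017) is loading late-arriving messages into slowly changing dimensions. A common way to handle this in a type 2 dimension, sketched below in Python under stated assumptions, is to split the history: the version that was in effect at the event time is closed at that time, and the new version runs until the next version that already exists. This is not the presenter's SAS® solution; the `Version` record, its field names, and the status values are hypothetical, and distinct event timestamps are assumed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Version:
    value: str                    # tracked attribute (hypothetical), e.g. a status
    valid_from: datetime
    valid_to: Optional[datetime]  # None marks the current version


def apply_event(history: List[Version], value: str, event_time: datetime) -> List[Version]:
    """Insert a change, possibly arriving late, into the type 2 history of one
    business key. Works on copies; assumes no two events share a timestamp."""
    rows = sorted((Version(v.value, v.valid_from, v.valid_to) for v in history),
                  key=lambda v: v.valid_from)
    # The first version that starts after the event bounds the new version.
    later = [v for v in rows if v.valid_from > event_time]
    new_to = later[0].valid_from if later else None
    # The version in effect at event_time is closed at event_time.
    for v in rows:
        if v.valid_from <= event_time and (v.valid_to is None or v.valid_to > event_time):
            v.valid_to = event_time
    rows.append(Version(value, event_time, new_to))
    return sorted(rows, key=lambda v: v.valid_from)
```

An on-time message is just the special case where no later version exists, so the same routine closes the current row and opens a new current row.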
Session 0864-2017:
Hands-on Graph Template Language (GTL): Part B
Do you need to add annotations to your graphs? Do you need to specify your own colors on the graph? Would you like to add Unicode characters to your graph, or would you like to create templates that non-programmers can also use to produce the required figures? Great, then this topic is for you! In this hands-on workshop, you are guided through the more advanced features of the Graph Template Language (GTL). There are also fun and challenging SAS® graphics exercises to help you retain what you have learned.
Read the paper (PDF) | Download the data file (ZIP)
Kriss Harris
Session 0340-2017:
How to Use SAS® to Filter Stock for Trade
Investors usually trade stocks or exchange-traded funds (ETFs) based on a methodology, such as a theory, a model, or a specific chart pattern. There are more than 10,000 securities listed on the US stock market, and picking the right one from so many candidates based on a methodology is usually a big challenge. This paper presents a methodology based on the CANSLIM theory and the momentum trading (MT) theory. We often hear of the cup and handle shape (C&H), double bottoms and multiple bottoms (MB), support and resistance lines (SRL), market direction (MD), fundamental analysis (FA), and technical analysis (TA). All of these are covered by the CANSLIM theory. MT is a trading theory based on a stock's direction of movement, or momentum. Both theories are easy to learn but difficult to apply without an appropriate tool. Brokers' trading systems usually cannot provide such filtering because of its complexity. For example, for C&H, where is the handle located? For MB, which is the last bottom at which you should trade? This challenging task can now be accomplished with SAS®. This paper presents methods for applying this logic and presenting the results graphically with SAS. All SAS users, especially those who work directly in the capital markets business, can benefit from this paper in pursuing their investment goals. Much of the programming logic can also be adopted in SAS finance packages for clients.
Read the paper (PDF)
Brian Shen, Merlin Clinical Service LLC
Session 1441-2017:
I'm Normal, You're Normal, but Is Your Weather Normal?
The traditional view is that a utility's long-term forecast must have a standard against which it is judged. Weather normalization is one of the industry-standard practices that utilities use to assess the efficacy of a forecasting solution. Although recent advances in probabilistic load forecasting techniques are proving beneficial, many utilities still require the benchmarking process to determine the accuracy of their long-term forecasts. Because of climatological volatility and potentially large annual variances in temperature, humidity, and other relevant weather variables, most utilities create normalized weather profiles through various processes in order to estimate what is traditionally called a weather-normalized load profile. However, new research shows that, because electric demand responds nonlinearly to weather variations, a simple normal weather profile in many cases might not equate to a normal load. In this paper, we introduce a probabilistic approach to deriving normalized load profiles and monthly peak and energy through a process we call "load normalization against the effects of weather." We compare it with the traditional weather normalization process to quantify the costs and benefits of using such a process. The proposed method has been successfully deployed at utilities for long-term operation and planning purposes and for risk management.
Read the paper (PDF)
Kyle Wood, Seminole Electric Cooperative Inc
Jason Wilson, SAS
Bradley Lawson, SAS
Rain Xie
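The claim in the preceding paper (Session 1441-2017) that normal weather need not produce a normal load follows from the nonlinear (here, convex) response of demand to temperature: for a convex load curve f, the average load over weather outcomes is at least f(average weather), which is Jensen's inequality. The Python sketch below uses a hypothetical U-shaped load curve and made-up temperatures to show the gap; it illustrates only the intuition, not the authors' probabilistic load normalization method.

```python
from statistics import mean


def load_mw(temp_f: float) -> float:
    """Hypothetical U-shaped load curve: demand rises for both heating and cooling."""
    return 1000.0 + 2.0 * (temp_f - 65.0) ** 2


# Hypothetical temperatures for the same calendar day in five different years.
temps = [45.0, 55.0, 65.0, 75.0, 85.0]

# Weather normalization: average the weather first, then compute the load.
load_at_normal_weather = load_mw(mean(temps))   # 1000.0 MW
# Averaging the load over the weather outcomes instead.
expected_load = mean(load_mw(t) for t in temps)  # 1400.0 MW
```

The 400 MW gap comes entirely from the curvature of the load response; with a linear curve the two numbers would coincide, which is why the simple normal weather profile works only when demand responds roughly linearly to weather.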
Session 0885-2017:
Implementing Role-Based Access Control and DSoD Authorization Schema on SAS®
Traditionally, role-based access control is implemented as group memberships. Access to SAS® data sets or metadata libraries requires membership in the group that 'owns' the resources. From the point of view of a SAS process, these authorizations are additive: if a user is a member of two distinct groups, her SAS processes have access to the data resources of both groups simultaneously. This happens every time the user runs a SAS process, even when the code in question is meant to be used with only one group's resources. As a consequence, a master data source defining data flows between groups becomes futile, as any SAS process of the user can bypass those definitions. In addition, because it is not possible to reduce the user's authorizations to match those of only the relevant group, it becomes challenging to determine whether other members of the group have sufficient authorization. Furthermore, auditing statistics production becomes difficult, because it cannot be automatically determined which group owns a certain log file. All these problems can be avoided by using role-based access control with dynamic separation of duties (RBAC DSoD). In DSoD, the user can activate only one group membership at a time. This paper describes one way to implement an RBAC with DSoD schema in a UNIX server environment.
Read the paper (PDF)
Perttu Muurimaki, Statistics Finland
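The rule at the heart of the preceding paper (Session 0885-2017), that a user may hold several group memberships but activate only one at a time, can be modeled in a few lines. The Python class below is a hypothetical sketch of that authorization rule only; the paper's UNIX implementation (how sessions, processes, and file permissions enforce it) is not reproduced, and the class, group, and resource names are invented for illustration.

```python
from typing import Dict, Optional, Set


class DsodSession:
    """Hypothetical model of RBAC with dynamic separation of duties:
    a user may hold several group memberships but activates one per session."""

    def __init__(self, user: str, memberships: Set[str], resources: Dict[str, str]):
        self.user = user
        self.memberships = set(memberships)
        self.resources = resources          # resource name -> owning group
        self.active_group: Optional[str] = None

    def activate(self, group: str) -> None:
        if group not in self.memberships:
            raise PermissionError(f"{self.user} is not a member of {group}")
        self.active_group = group           # replaces any previously active group

    def can_access(self, resource: str) -> bool:
        # Checked against the single active group, not the union of memberships.
        return (self.active_group is not None
                and self.resources.get(resource) == self.active_group)
```

In the same spirit, recording `active_group` alongside each run would let a log file be attributed to exactly one group, which is the auditing problem the abstract raises.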
Session 1117-2017:
Introduction to Configuring and Managing SAS® Grid Manager for Hadoop
How can we run traditional SAS® jobs, including SAS® Workspace Servers, on Hadoop worker nodes? The answer is SAS® Grid Manager for Hadoop, which is integrated with the Hadoop ecosystem to provide resource management, high availability, and enterprise scheduling for SAS customers. This paper provides an introduction to the architecture, configuration, and management of SAS Grid Manager for Hadoop. Anyone involved with SAS and Apache Hadoop should find the information in this paper useful. The first area covered is a breakdown of each required SAS and Hadoop component. From the Hadoop ecosystem, we define the role of Hadoop YARN, Hadoop Distributed File System (HDFS) storage, and Hadoop client services. We review SAS metadata definitions for SAS Grid Manager, SAS® Object Spawner, and SAS® Workspace Servers. We cover required Kerberos security, as well as SAS® Enterprise Guide® and the SAS® Grid Manager Client Utility. YARN queues and the SAS Grid Policy file for optimizing job scheduling are also reviewed. Finally, we discuss traditional SAS math running on a Hadoop worker node and how it can take advantage of high-performance math to accelerate job execution. By using SAS Grid Manager for Hadoop, sites are moving SAS jobs inside the Hadoop cluster, which ultimately cuts down on data movement and provides more consistent job execution. Although this paper is written for SAS and Hadoop administrators, SAS users can also benefit from this session.
Read the paper (PDF)
Mark Lochbihler, Hortonworks
Session 0834-2017:
I’ve Got to Hand It to You: Portable Programming Techniques
As technology expands, we need to create programs that can be handed off to clients, regulatory agencies, parent companies, or other projects, and used with little or no modification by the recipient. Minimizing modification by the recipient often requires the program itself to self-modify. To some extent, the program must be aware of its own operating environment and what it needs to do to adapt to it. A great many tools available to the SAS® programmer allow a program to adjust itself to its surroundings. These include location-detection routines, batch files based on folder contents, the ability to detect the version and location of SAS, programs that discern and adjust to the current operating system and the corresponding folder structure, the use of automatic and user-defined environment variables, and macro functions that use and modify system information. Need to create a portable program? We can hand you the tools.
Read the paper (PDF)
Art Carpenter, California Occidental Consultants
Mary Rosenbloom, Alcon, a Novartis Division
Session SAS0623-2017:
Kerberos Cross-Realm Authentication: Unraveling the Mysteries
How do you enable strong authentication across different parts of your organization in a safe and secure way? We know that Kerberos provides us with a safe and secure strong authentication mechanism, but how does it work across different domains or realms? In this paper, we examine how Kerberos cross-realm authentication works and the different parts that you need ready in order to use Kerberos effectively. Understanding the principals and applying the ideas we present will make you successful at improving the security of your authentication system.
Read the paper (PDF)
Stuart Rogers, SAS
Session 1069-2017:
Know Your Tools Before You Use
When analyzing data with SAS®, we often use the SAS DATA step and the SQL procedure to explore and manipulate data. Though both are useful tools in SAS, many SAS users do not fully understand their differences, advantages, and disadvantages, and thus engage in numerous unnecessary, biased debates about them. This paper illustrates and discusses these aspects with real work examples that give SAS users deep insight into using them. Using the right tool for a given circumstance not only provides an easier and more convenient solution but also saves time and work in programming, thus improving work efficiency. Furthermore, the illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
Read the paper (PDF)
Justin Jia, TransUnion
Session 1257-2017:
Let the System Do Repeating Work for You
Developing software using agile methodologies has become common practice in many organizations. We use the SCRUM methodology to prepare, plan, and implement changes in our analytics environment. Preparing for the deployment of a new release usually took two days of creating packages, promoting them, deploying jobs, creating migration scripts, and correcting errors made in the first attempt. As a result, a sprint of 10 working days (two weeks) was effectively reduced to barely seven. By automating this process, we were able to reduce the time needed to prepare our deployment to less than half a day, increasing the time we can spend developing by 25%. In this paper, we present the process and system prerequisites for automating the deployment process. We also describe the process, code, and scripts required for automating metadata promotion and physical table comparison and update.
Read the paper (PDF)
Laurent de Walick, PW Consulting
Bas Marsman, NN Bank
Session SAS0366-2017:
Microservices and Many-Task Computing for High-Performance Analytics
A microservice architecture prescribes the design of your software application as suites of independently deployable services. In this paper, we detail how you can design your SAS® 9.4 programs so that they adhere to a microservice architecture. We also describe how you can leverage Many-Task Computing (MTC) in your SAS® programs to gain a high level of parallelism. Under these paradigms, your SAS code will gain encapsulation, robustness, reusability, and performance. The design principles discussed in this paper are implemented in the SAS® Infrastructure for Risk Management (IRM) solution. Readers with an intermediate knowledge of Base SAS® and the SAS macro language will understand how to design their SAS code so that it follows these principles and reaps the benefits of a microservice architecture.
Read the paper (PDF)
Henry Bequet, SAS
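The preceding paper (Session SAS0366-2017) applies Many-Task Computing to SAS® programs; the paper's SAS implementation is not shown here. As a language-neutral illustration of the scheduling idea only, the Python sketch below runs a set of tasks with declared dependencies, starting each task as soon as all of its inputs have finished, so that independent tasks run in parallel. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) and `concurrent.futures`; the function name and task names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_task_graph(tasks: Dict[str, Callable[[], object]],
                   deps: Dict[str, Set[str]], max_workers: int = 4) -> Dict[str, object]:
    """Run independent tasks in parallel, starting each task once all of its
    dependencies (deps[name] = set of prerequisite task names) have finished."""
    sorter = TopologicalSorter(deps)
    for name in tasks:
        sorter.add(name)                     # include tasks that have no dependencies
    sorter.prepare()                         # raises graphlib.CycleError on cycles
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            for name in sorter.get_ready():  # every task whose inputs are complete
                running[pool.submit(tasks[name])] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()  # re-raises a failed task's exception
                sorter.done(name)
    return results
```

Keeping each task a small, self-contained unit with declared inputs is what lets a scheduler like this find the parallelism automatically, which matches the encapsulation and reusability benefits the abstract attributes to the microservice design.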
Session SAS0324-2017:
Migrating Dashboards from SAS® BI Dashboard to SAS® Visual Analytics
SAS® BI Dashboard is an important business intelligence and data visualization product used by many customers worldwide, who still rely on it for performance monitoring and decision support. SAS® Visual Analytics is a new-generation product that empowers customers to explore huge volumes of data very quickly and view visualized results in web browsers and on mobile devices. As more and more customers use SAS Visual Analytics, some SAS BI Dashboard customers might want to migrate existing dashboards to SAS Visual Analytics to take advantage of new technologies. In addition, some customers might want to deploy the two products in parallel and keep everyone on the same page. Because the two products use different data models and formats, a special conversion tool has been developed to convert SAS BI Dashboard dashboards into SAS Visual Analytics dashboards and reports. This paper comprehensively describes the guidelines, methods, and detailed steps for migrating dashboards from SAS BI Dashboard to SAS Visual Analytics. The converted dashboards can then be shown in supported SAS Visual Analytics viewers, including mobile devices and modern browsers.
Read the paper (PDF)
Roc (Yipeng) Zhang, SAS
Junjie Li, SAS
Wei Lu, SAS
Huazhang Shao, SAS
Session 1425-2017:
Migrating Large, Complex SAS® Environments: In-Place versus New Build
SAS® migrations are the number one reason why SAS architects and administrators are fired. Even though this bold statement is not universally true, it has been at the epicenter of many management and technical discussions at UnitedHealth Group. The competing business forces of the desire to innovate and the need to provide platform stability drive difficult discussions between business leaders and IT partners that tend to result in a frustrated user base, flustered IT professionals, and a stale SAS environment. Migrations are the antagonist of any IT professional because of the disruption, long hours, and stress that typically ensue. This paper addresses the lessons learned from a SAS migration from the first maintenance release of SAS® 9.4 to the third maintenance release of SAS® 9.4 on a technically sophisticated enterprise SAS platform that includes clustered metadata servers, a clustered middle tier, SSL, an IBM Platform Load Sharing Facility (LSF) grid, and SAS® Visual Analytics.
Read the paper (PDF)
Chris James, UnitedHealth Group
Session 0820-2017:
Model Risk: Learning from Others' Mistakes
Banks can create a competitive advantage in their business by using business intelligence (BI) and by building models. In the credit domain, the best practice is to build risk-sensitive models (Probability of Default, Exposure at Default, Loss Given Default, Unexpected Loss, Concentration Risk, and so on) and implement them in decision-making, credit granting, and credit risk management. At the next level, there are models and tools built on top of these models that help achieve business targets, set risk-sensitive pricing, plan capital, optimize Return on Equity/Risk Adjusted Return on Capital (ROE/RAROC), manage the credit portfolio, set the level of provisions, and so on. This works remarkably well as long as the models work. However, over time, models deteriorate, and their predictive power can drop dramatically. As a result, heavy reliance on models in decision-making (some decisions are automated, following the model's results without human intervention) can result in a huge error, which might have dramatic consequences for the bank's performance. In my presentation, I share our experience in reducing model risk and establishing corporate governance of models with the following SAS® tools: SAS® Model Monitoring Microservice, SAS® Model Manager, dashboards, and SAS® Visual Analytics.
Read the paper (PDF)
Boaz Galinson, Bank Leumi
Session 0793-2017:
Modeling Actuarial Risk using SAS® Enterprise Guide®: A Study on Mortality Tables and Interest Rates
This presentation describes a methodology for interest rates, life tables, and actuarial calculations that uses generational mortality tables and the forward structure of interest rates for pension funds, analyzing long-term actuarial projections and their impact on the actuarial liability. The methodology was developed as a computational algorithm in SAS® Enterprise Guide® and Base SAS® for structuring the actuarial projections, and the presentation analyzes the impact of this new methodology. The algorithm makes heavy use of the IML and SQL procedures.
Read the paper (PDF)
Luiz Carlos Leao, Universidade Federal Fluminense (UFF)
Session SAS0724-2017:
Modeling Best Practices: An IFRS 9 Case Study
A successful conversion to the International Financial Reporting Standards (IFRS) standard known as IFRS 9 can present many challenges for a financial institution. We discuss how leveraging best practices in project management, accounting standards, and platform implementation can overcome these challenges. Effective project management ensures that the scope of the implementation and the success criteria are well defined. It captures all major decision points and ensures thorough documentation of the platform and of how its unique configuration ties back directly to specific business requirements. Understanding the nuances of the IFRS 9 standard, specifically the impact of bucketing all financial assets according to their cash flow characteristics and business models, is crucial to designing an efficient and robust reporting platform. Credit impairment is calculated at the instrument level and can either improve or deteriorate. Changes in the level of credit impairment of individual financial assets enter the balance sheet under amortized cost, other comprehensive income, or fair value through profit and loss. This introduces more volatility into these balances, which in turn increases the volatility of key financial ratios used by regulators. A robust and highly efficient platform is essential for processing these calculations, especially under tight reporting deadlines and when unexpected challenges arise. Understanding how the system is built through the project documentation…
Read the paper (PDF)
Peter Baquero, SAS
Ling Xiang, SAS
Session 1400-2017:
More than a Report: Mapping the TABULATE Procedure as a Nested Data Object
The TABULATE procedure has long been a central workhorse of our organization's reporting processes, because it offers a uniquely concise syntax for obtaining descriptive statistics on deeply grouped and nested categories within a data set. Given the diverse output capabilities of SAS®, it often suffices to simply ship the procedure's completed output elsewhere via the Output Delivery System (ODS). Yet there remain cases in which we want not only to obtain a formatted result, but also to acquire the full nesting tree and logic by which the computations were made. In these cases, we want to treat the details of the TABULATE statements as data, not merely as presentation. I demonstrate how we have solved this problem by parsing our TABULATE statements into a nested JSON tree structure that can be transferred and easily queried for deep values outside the SAS program. Along the way, this provides an excellent opportunity to walk through the nesting logic of the procedure's statements and explain how to think about the axes, groupings, and set computations that make it tick. The source code for our syntax parser is also available on GitHub for further use.
Read the paper (PDF)
Jason Phillips, The University of Alabama
Session 1148-2017:
My SAS® Grid Scheduler
No Batch Scheduler? No problem! This paper describes the use of a SAS® Data Integration Studio job that can be started by a time-dependent scheduler like Windows Scheduler (or crontab in UNIX) to mimic a batch scheduler using SAS® Grid Manager.
Read the paper (PDF)
Patrick Cuba, Cuba BI Consulting
Session SAS0747-2017:
Open Your Mind: Use Cases for SAS® and Open-Source Analytics
As a data scientist, you need analytical tools and algorithms, whether commercial or open source, and you have some favorites. But how do you decide when to use what? And how can you integrate their use to your maximum advantage? This presentation provides several best practices for deploying both SAS® and open-source analytical tools to increase productivity and efficiency in your enterprise ecosystem. See an example of a marketing analysis using SAS and R algorithms in SAS® Enterprise Miner to develop a predictive model, and then operationalize that model for performance monitoring and in-database scoring. Also learn about using Python and SAS integration for developing predictive models from a Jupyter Notebook environment. Seeing these cases will help you decide how to improve your analytics with similar integration of SAS and open source.
Read the paper (PDF)
Tuba Islam, SAS
Session 0814-2017:
Platform a la Carte: An Assembly Line to Create SAS® Enterprise BI Server Instances with Ansible
Installing and configuring a SAS® Enterprise BI platform to meet today's requirements calls for knowledge of a wide variety of subjects. Security requirements are growing, the number of involved components is growing, time to delivery should be shorter, and quality must increase. Customer expectations are shaped by the cloud experience, where automated deployments with ready-to-use applications are state of the art. This paper describes an approach that addresses the challenges of deploying SAS® 9.4 on Linux to meet today's customer expectations.
Read the paper (PDF)
Javor Evstatiev, EVS
Andrey Turlov, AMOS
Session 1326-2017:
Price Recommendation Engine for Airbnb
Airbnb is the world's largest home-sharing company and has over 800,000 listings in more than 34,000 cities and 190 countries. Therefore, the pricing of properties, which is done by the Airbnb hosts, is crucial to the business. Setting low prices during a high-demand period might hinder profits, while setting high prices during a low-demand period might result in no bookings at all. In this paper, we suggest a price recommendation methodology for Airbnb hosts that helps overcome the problems of overpricing and underpricing. Through this methodology, we try to identify key factors related to Airbnb pricing: factors influential in determining a price for a property, the relation between the price of a property and the frequency of its booking, and similarities among successful and profitable properties. The constraints outlined in the analysis were entered into SAS® optimization procedures to achieve the best possible price. As part of this methodology, we built a scraping tool to collect New York City host user data along with their metrics. Using this data, we built a pricing model to predict the optimal price of an Airbnb home.
Read the paper (PDF)
Praneeth Guggilla, Oklahoma State University
Singdha Gutha, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Session 1130-2017:
Protecting Your Programs from Unwanted Text Using Macro Quoting Functions
Face it: your data can occasionally contain characters that wreak havoc on your macro code, such as the ampersand in at&t or the apostrophe in McDonald's. This paper is designed for programmers who already know most of the ins and outs of SAS® macro code. Now let's take your macro skills a step further by adding to your skill set, specifically %BQUOTE, %STR, %NRSTR, and %SUPERQ. What is up with all these quoting functions? When do you use one over another? And why would you need %UNQUOTE? The macro language is full of subtleties and nuances, and the quoting functions are the epitome of all of this. This paper shows you in which instances you would use the different quoting functions. Specifically, we show you the difference between the compile-time and the execution-time functions. In addition to looking at the traditional quoting functions, you learn how to use %QSCAN and %QSYSFUNC, among other functions that apply the regular function and quote the result.
Read the paper (PDF)
Michelle Buchecker, ThotWave Technologies, LLC.
Session 0242-2017:
Random Forests with Approximate Bayesian Model Averaging
A random forest is an ensemble of decision trees that often produces more accurate results than a single decision tree. The predictions of the individual trees in the forest are averaged to produce a final prediction. The question then arises whether a better, more accurate final prediction could be obtained by using the trees in the forest more intelligently. In particular, in the way random forests are currently defined, every tree contributes the same fraction to the final result (for example, if there are 50 trees, each tree contributes 1/50th to the final result). This ignores model uncertainty, because less accurate trees are treated exactly like more accurate trees. Replacing averaging with Bayesian Model Averaging gives better trees the opportunity to contribute more to the final result, which might lead to more accurate predictions. However, several complications of this approach have to be resolved, such as the computation of an SBC (Schwarz Bayesian criterion) value for a decision tree. Two novel approaches to solving this problem are presented, and the results are compared with those obtained with the standard random forest approach.
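The weighting idea can be illustrated with the common SBC (BIC) approximation to posterior model probabilities, in which each tree's weight is proportional to exp(-SBC / 2), compared with the equal 1/n weights of a standard forest. The Python sketch below assumes every tree already has an SBC value; computing that value for a decision tree is exactly the problem the paper addresses, and its two approaches are not reproduced here. The SBC values and predictions are made up, and the function names are hypothetical.

```python
import math

def bma_weights(sbc_values):
    """Approximate posterior model weights: w_k proportional to exp(-SBC_k / 2)."""
    best = min(sbc_values)  # subtract the minimum SBC for numerical stability
    raw = [math.exp(-(s - best) / 2.0) for s in sbc_values]
    total = sum(raw)
    return [r / total for r in raw]

def forest_predict(tree_predictions, weights=None):
    """Combine per-tree predictions; equal weights reproduce the usual forest average."""
    n = len(tree_predictions)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * p for w, p in zip(weights, tree_predictions))

if __name__ == "__main__":
    # Hypothetical forest of three trees; the first has the lowest (best) SBC.
    preds = [0.80, 0.40, 0.60]
    sbc = [100.0, 104.0, 102.0]
    w = bma_weights(sbc)
    print("equal-weight prediction:", forest_predict(preds))
    print("BMA weights:", [round(x, 3) for x in w])
    print("BMA prediction:", forest_predict(preds, w))
```

Because the weights decay exponentially in SBC, a difference of only a few SBC points already shifts most of the weight to the better trees; that sensitivity is one reason how SBC is computed for a tree matters so much.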
Read the paper (PDF)
Tiny Du Toit, North-West University
Andre De Waal, SAS
Session 0847-2017:
Revenue Score: Forecasting Credit Card Products with Zero Inflated Beta Regression and Gradient Boosting
Using zero inflated beta regression and gradient boosting, we developed a solution to forecast the gross revenue of credit card products. This solution was based on 1) a set of attributes derived from invoice information; 2) zero inflated beta regression for forecasts of interchange and revolving revenue, using PROC NLMIXED together with data processing routines that build the attributes and the target variable; 3) gradient boosting models for forecasts of different products (annuity, insurance, and so on), using PROC TREEBOOST, exploring its parameters, and creating a routine for selecting and adjusting models; and 4) construction of revenue ranges for policies and monitoring. This presentation introduces this credit card revenue forecasting solution.
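For reference, one common way to write a zero inflated beta model is a point mass at zero mixed with a beta distribution in its mean-precision form. The formulation below assumes the revenue target has been rescaled to a proportion in [0, 1) and uses logit links for both parts; these choices are assumptions, because the abstract does not say how the PROC NLMIXED model was specified.

```latex
f(y \mid \pi, \mu, \phi) =
\begin{cases}
  \pi, & y = 0,\\[4pt]
  (1-\pi)\,\dfrac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\big((1-\mu)\phi\big)}\,
  y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, & 0 < y < 1,
\end{cases}
\qquad
\operatorname{logit}(\pi) = \mathbf{z}'\boldsymbol{\gamma},\quad
\operatorname{logit}(\mu) = \mathbf{x}'\boldsymbol{\beta},\quad
\mathbb{E}[Y] = (1-\pi)\,\mu .
```

Because the zero part and the beta part have separate linear predictors, the log-likelihood splits into a term for zero revenue and a term for positive revenue, which is the kind of custom likelihood a general-likelihood procedure such as PROC NLMIXED can fit.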
Read the paper (PDF)
Marc Witarsa, Serasa Experian
Paulo Di Cellio Dias, Serasa Experian
Session 0268-2017:
%SURVEYGENMOD Macro: An Alternative to Deal with Complex Survey Design for the GENMOD Procedure
The purpose of this paper is to present %SURVEYGENMOD, a SAS® macro developed in SAS/IML® as an upgrade of the %SURVEYGLM macro developed by Silva and Silva (2014) to handle complex survey designs in generalized linear models (GLMs). The new capabilities are the inclusion of the negative binomial distribution, the zero-inflated Poisson (ZIP) model, and the zero-inflated negative binomial (ZINB) model, as well as the ability to obtain estimates for domains. The R function svyglm (Lumley, 2004) and Stata software were used for comparison, and the results showed that the estimates generated by the %SURVEYGENMOD macro are close to those produced by the R function and Stata.
Read the paper (PDF)
Alan Ricardo da Silva, University of Brasilia
Session 0385-2017:
Some Tricks in Graph Template Language
SAS® 9.4 Graph Template Language: Reference has more than 1300 pages and hundreds of options and statements. It is no surprise that programmers sometimes experience unexpected twists and turns when using the graph template language (GTL) to draw figures. Understandably, it is easy to become frustrated when your program fails to produce the desired graphs despite your best effort. Although SAS needs to continue improving GTL, this paper offers several tricks that help overcome some of the roadblocks in graphing.
Read the paper (PDF)
Amos Shu, AstraZeneca
Session 0846-2017:
Spawning SAS® Sleeper Cells and Calling Them into Action: SAS® University Parallel Processing
With the 2014 launch of SAS® University Edition, the reach of SAS® was greatly expanded to educators, students, researchers, non-profits, and the curious, who for the first time could use a full version of Base SAS® software for free. However, because SAS University Edition allows a maximum of two CPUs, its performance is sharply curtailed compared with more substantial SAS environments that can benefit from parallel and distributed processing, such as environments that implement SAS® Grid Manager, Teradata, or Hadoop solutions. Even the most straightforward implementation of the SAS windowing environment outperforms SAS University Edition when both are run on the same computer. With parallel processing and distributed computing becoming the status quo in SAS production environments, SAS University Edition will unfortunately fall behind its counterparts if it cannot harness parallel processing best practices and performance. To curb this disparity, this session introduces groundbreaking programmatic methods that enable commodity hardware to be networked so that multiple instances of SAS University Edition can communicate and work collectively to divide and conquer complex tasks. With parallel processing facilitated, a SAS practitioner can now harness an endless number of computers to produce blitzkrieg solutions with SAS University Edition that rival the performance of more costly, complex infrastructure.
Troy Hughes, Datmesis Analytics
Session 1465-2017:
Stress Testing and Supplanting the LOCK Statement: Using Mutex Semaphores for Reliable File Locking
The SAS® LOCK statement was introduced in SAS® 7 with great pomp and circumstance, as it enabled SAS® software to lock data sets exclusively. In a multiuser or networked environment, an exclusive file lock prevents other users and processes from accessing and accidentally corrupting a data set while it is in use. Moreover, because file lock status can be tested programmatically with the LOCK statement return code (&SYSLCKRC), data set accessibility can be validated before attempted access, thus preventing file access collisions and facilitating more reliable, robust software. Notwithstanding the intent of the LOCK statement, stress testing demonstrated in this session illustrates vulnerabilities in the LOCK statement that render its use inadvisable due to its inability to lock data sets reliably outside of the SAS/SHARE® environment. To overcome this limitation and enable reliable data set locking, a methodology is demonstrated that uses semaphores (flags) that indicate whether a data set is available or is in use, and mutually exclusive (mutex) semaphores that restrict data set access to a single process at one time. With Base SAS® file locking capabilities now restored, this session further demonstrates control table locking to support process synchronization and parallel processing. The LOCKSAFE macro demonstrates a busy-waiting (or spinlock) design that tests data set availability repeatedly until file access is achieved or the process times out.
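The LOCKSAFE macro itself is SAS code and is not reproduced here. As an illustration of the same busy-waiting (spinlock) pattern, here is a minimal Python sketch that assumes an atomically created lock file serves as the mutex semaphore for a shared data set; the function names, lock file name, and timeout values are hypothetical.

```python
import os
import tempfile
import time

def acquire_lock(lock_path, timeout=10.0, poll_interval=0.1):
    """Spin until an exclusive lock file is created or the timeout expires.

    Opening with O_CREAT | O_EXCL fails if the file already exists, so only
    one process at a time can create it: the lock file acts as a mutex.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False  # timed out: another process still holds the lock
            time.sleep(poll_interval)  # busy-wait, then test availability again
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner for troubleshooting
        return True

def release_lock(lock_path):
    """Release the mutex by deleting the lock file."""
    os.remove(lock_path)

if __name__ == "__main__":
    # Hypothetical lock file guarding a shared data set.
    lock = os.path.join(tempfile.mkdtemp(), "shared_dataset.lck")
    if acquire_lock(lock, timeout=5):
        try:
            print("lock acquired; safe to read or update the data set")
        finally:
            release_lock(lock)
    else:
        print("timed out waiting for the data set")
```

The lock file, not the data set, is the semaphore, so every cooperating process must check it before touching the data. Atomic exclusive creation can be weaker on some network file systems, which is the kind of case worth stress testing before relying on it.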
Read the paper (PDF)
Troy Hughes, Datmesis Analytics
Session 1258-2017:
Testing the Night Away
Testing is a weak spot in many data warehouse environments. Much of the testing is focused on the correct implementation of requirements. But due to the complex nature of analytics environments, a change in a data integration process can lead to unexpected results in totally different and untouched areas. We developed a method to identify unexpected changes often and early by running a nightly regression test. The test does a full ETL run, compares all output from the test to a baseline, and reports all the changes. This paper describes the process and the SAS® code needed to back up existing data, trigger ETL flows, compare results, and restore the original state after a nightly regression test. We also discuss the challenges we experienced while implementing the nightly regression test framework.
Read the paper (PDF)
Laurent de Walick, PW Consulting
Bas Marsman, NN Bank
Stephan Minnaert, PW Consulting
Session 0274-2017:
Text Generation Data Sets (Text GDS)
SAS offers generation data sets as a language feature that many users are familiar with. They use them in their organizations and manage them with keywords such as GENMAX and GENNUM. When SAS runs in a mainframe environment, users can also tap into the GDG (generation data group) feature available on z/OS, OS/390, OS/370, IBM 3070, or IBM 3090 machines. With cost-saving initiatives across businesses, and due to some scaling factors, many organizations are in the process of migrating to cheaper mid-tier operating platforms such as UNIX and AIX. Because Linux is open source and a cheaper alternative, several organizations have opted for the UNIX distribution of SAS, which can work in UNIX and AIX environments. While this might be a viable alternative, the migration effort brings certain nuances for the technical conversion teams. On UNIX, the concept of GDGs does not exist. SAS generation data sets work only for SAS data sets, so if the business needs to house and operate a GDG-like structure for text data sets, none is available. When my organization undertook a similar initiative to migrate the programs used to run subprime mortgage analytics, incentive, and regulatory reporting, we identified a paucity of literature and research on this topic. Hence, I developed a utility that addresses this need: a simple macro that closely simulates a GDG/GDS.
Read the paper (PDF) | View the e-poster or slides (PDF)
Dr. Kannan Deivasigamani, HSBC
Session SAS1407-2017:
The Benefit of Using Clustering as Input to a Propensity to Buy Predictive Model
Propensity to Buy models are among the most widely used techniques for supporting business strategy in customer segmentation and targeting. Some of the key challenges every data scientist faces in building predictive models are using all known predictor variables, uncovering any unknown signals, and adjusting for latent variable errors. Often, the business demands the inclusion of certain variables based on a previous understanding of process dynamics. To meet such client requirements, these inputs are forced into the model, resulting in either a complex model with too many inputs or a fragile model that might decay faster than expected. West Corporation's Center for Data Science (CDS) has found a workaround that strikes a balance between meeting client requirements and building a robust model by using clustering techniques. A leading telecom services provider uses West's SMS Outbound Notification Platform to notify its customers about an upcoming Pay-Per-View event. As part of the modeling process, the client identified a few variables as key business drivers, and CDS used those variables to build clusters, which were then used as inputs to the predictive model. In doing so, not only were all the effects of the client-mandated variables captured successfully, but the number of inputs to the model was also reduced, making it parsimonious. This paper illustrates how West has used clustering in the data preparation process and built a robust model.
Krutharth Peravalli, West Corporation
Sumit Sukhwani, West Corporation
Dmitriy Khots, West Corporation
Session 1482-2017:
The ODS EXCEL statement: Tips and Tricks for the TABULATE and REPORT Procedures
You might scream in pain or cry with joy that SAS® software can directly produce output in Microsoft Excel as .xlsx workbooks. Excel is an excellent vehicle for delivering large amounts of summary information that needs to be partitioned for human review, exploratory filtering, and sorting. SAS supports ODS EXCEL as a production destination. This paper discusses using the ODS EXCEL statement with the TABULATE and REPORT procedures to summarize cross-sectional data extracted from a medical claims database. The discussion covers data preparation, report preparation, and tabulation statements such as CLASS, CLASSLEV, and TABLE. The effects of STYLE options and of the TAGATTR suboption for inserting Excel-specific features such as formulas, formats, and alignment are covered in detail. The paper also briefly discusses reusing these concepts in PROC REPORT statements such as DEFINE, COMPUTE, and CALL DEFINE.
Read the paper (PDF)
Richard DeVenezia, Johnson & Johnson
Session 1354-2017:
Transitioning Health Care Data Analytic Platforms to the Cloud
As the IT industry moves to further embrace cloud computing and the benefits it enables, many companies have been slow to adopt these changes due to concerns around data compliance. Compliance with state and federal law and the relevant regulations often leads decision makers to insist that systems dealing with protected health information or similarly sensitive data remain on-premises, because the risks of non-compliance are so high. In this session, we detail BNL Consulting's standard practices for transitioning solutions that are compliant with the Health Insurance Portability and Accountability Act (HIPAA) from on-premises to a cloud-based environment hosted by Amazon Web Services (AWS). We explain that, by following best practices and doing plenty of research, HIPAA compliance in a cloud environment is no more challenging than compliance in an on-premises environment. We discuss the role of best-practice DevOps tools such as Docker, Consul, ELK Stack, and others, which improve the reliability and repeatability of your HIPAA-compliant solutions. We tie these recommendations to the use of common SAS tools and show how they can work in concert to stabilize the solution and improve its performance over on-premises alternatives. Although this presentation focuses on health care and HIPAA-specific examples, many of the described practices and processes apply to any sensitive-data solution being considered for the cloud.
Read the paper (PDF)
Jay Baker, BNL Consulting
Session 1138-2017:
User-Written versus System-Generated SAS® Source Code
The traditional model of SAS® source-code production is for all code to be written directly by users or written indirectly (that is, generated by user-written macros, Lua code, or DATA steps). This model was recently extended by the STREAM procedure, which enables SAS macro code to operate on arbitrary text (for example, HTML). In contrast, SAS includes many products that operate in the client/server environment and function as follows: 1) the user interacts with the product via a GUI to specify the desired processing; 2) the product saves the user specifications in metadata and generates SAS source code for the target processing; 3) the source code is then run (per user directions) to perform the processing. Many of these products give users the ability to modify the generated code and/or insert their own user-written code. Also, the target code (system-generated plus optional user-written) can be exported or deployed to be run as a stored process, in batch, or in another SAS environment. In this paper, we review the SAS ecosystem contexts where source code is produced and the pros and cons of each approach, discuss why some system-generated code is inelegant, and make some suggestions for determining when to write code manually, and when and how to use system-generated code.
Read the paper (PDF)
Thomas Billings, MUFG Union Bank
Session 0612-2017:
Using Big Data to Visualize People Movement Using SAS® Basics
Visualizing the movement of people over time in an animation can provide insights that tables and static graphs cannot. There are many options, but what if you want to base the visualization on large amounts of data from several sources? SAS® is a great tool for this type of project. This paper summarizes how visualizing movement is accomplished by using several data sets, large and small, and various SAS procedures to pull it together. The use of a custom shape file is also highlighted. The end result is a shareable GIF that provides insights not available with other methods.
Read the paper (PDF)
Stephanie Thompson, Datamum
Session SAS0681-2017:
Using SAS/OR® Software to Optimize the Capacity Expansion Plan of a Robust Oil Products Distribution Network
A Middle Eastern company is responsible for the daily distribution of over 230 million liters of oil products. For this distribution network, a failure scenario is defined as occurring when oil transport is interrupted or slows down, and/or when product demands fluctuate outside the normal range. Under all failure scenarios, the company plans to provide additional transport capacity at minimum cost so as to meet all point-to-point product demands. Currently, the company uses a wait-and-see strategy, which carries a high operating cost and depends on the availability of third-party transportation. This paper describes the use of the OPTMODEL procedure to formulate and solve this problem as a mixed integer programming model. Experimental results are provided to demonstrate the utility of this approach. It was discovered that larger instances of the problem, with greater numbers of potential failure scenarios, can become computationally expensive. To handle such instances efficiently, we have also implemented a Benders decomposition algorithm in PROC OPTMODEL.
Read the paper (PDF)
Dr. Shahrzad Azizzadeh, SAS
Session 0879-2017:
Using SAS® Visual Analytics to Improve a Customer Relationship Strategy: A Use Case at Oi S.A., a Brazilian Telecom Company
Oi S.A. (Oi) is a pioneer in providing convergent services in Brazil. It currently has the greatest network capillarity and Wi-Fi availability in Brazil. The company offers fixed lines, mobile services, broadband, and cable TV. To improve service to over 70 million customers, the Customer Intelligence Department manages the data generated by 40,000 call center operators. The call center produces more than a hundred million records per month, and we use SAS® Visual Analytics to collect, analyze, and distribute these results across the company. This new system changed the paradigm of data analysis in the company. SAS Visual Analytics is user-friendly and enabled the data analysis team to reduce the time spent on IT tasks, so it is now possible to focus on business analysis. Oi started developing its SAS Visual Analytics project in June 2014. The test period lasted only 15 days and involved 10 people. The project proved relevant to the company, which led to the next step, in which 30 employees and 20 executives used the tool. In the last phase, we applied it on a larger scale, with 300 users, including local managers, executives, and supervisors. The fast implementation (two months) brought many benefits: we reduced the time it takes to produce reports by 80% and the time to complete business analysis by 40%.
Radakian Lino, Oi
Joao Pedro SantAnna, Oi
Session SAS0642-2017:
Using a Dynamic Panel Estimator to Model Change in Panel Data
Panel data, which are collected on a set (panel) of individuals over several time points, are ubiquitous in economics and other analytic fields because their structure allows individuals to act as their own control groups. The PANEL procedure in SAS/ETS® software models panel data that have a continuous response, and it provides many options for estimating regression coefficients and their standard errors. Some of the available estimation methods enable you to estimate a dynamic model by using a lagged dependent variable as a regressor, thus capturing the autoregressive nature of the underlying process. Including lagged dependent variables introduces correlation between the regressors and the residual error, which necessitates using instrumental variables. This paper guides you through the process of using the typical estimation method for this situation, the generalized method of moments (GMM), and the process of selecting the optimal set of instrumental variables for your model. Your goal is to achieve unbiased, consistent, and efficient parameter estimates that best represent the dynamic nature of the model.
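The correlation problem described above can be written out for one standard case, the first-differenced (Arellano-Bond style) dynamic panel model. This is a textbook formulation offered for orientation, not necessarily the exact specification or PROC PANEL options used in the paper; the notation (the lag coefficient gamma, the individual effect alpha_i, and T time points) is assumed here.

```latex
\begin{aligned}
y_{it} &= \gamma\, y_{i,t-1} + \mathbf{x}_{it}'\boldsymbol{\beta} + \alpha_i + \varepsilon_{it},\\
\Delta y_{it} &= \gamma\, \Delta y_{i,t-1} + \Delta\mathbf{x}_{it}'\boldsymbol{\beta} + \Delta\varepsilon_{it},\\
\mathbb{E}\!\left[\, y_{i,t-s}\, \Delta\varepsilon_{it} \,\right] &= 0, \qquad s \ge 2,\; t = 3,\dots,T.
\end{aligned}
```

Differencing removes the individual effect, but the regressor Delta y_{i,t-1} contains y_{i,t-1}, which depends on epsilon_{i,t-1}, and epsilon_{i,t-1} also appears in Delta epsilon_{it}. If the errors are serially uncorrelated, levels lagged two or more periods are uncorrelated with Delta epsilon_{it}, which gives the moment conditions in the last line; GMM combines them, and choosing how many of these lags to use is the instrument-selection question the paper addresses.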
Read the paper (PDF)
Roberto Gutierrez, SAS
Session 0231-2017:
Using a SAS® Macro to Calculate Kappa and 95% CI for Several Pairs of Nurses of Chemical Triage
It is often necessary to assess multi-rater agreement for multiple-observation categories in case-controlled studies. The Kappa statistic is one of the most common agreement measures for categorical data. The purpose of this paper is to show an approach for using SAS® 9.4 procedures and the SAS® Macro Language to estimate Kappa with a 95% CI for pairs of nurses who used two different triage systems during a computer-simulated chemical mass casualty incident (MCI). Data from the Validating Triage for Chemical Mass Casualty Incidents: A First Step R01 grant were used to assess the performance of a typical hospital triage system, the Emergency Severity Index (ESI), compared with an Irritant Gas Syndrome Agent (IGSA) triage algorithm being developed under this grant to quickly prioritize the treatment of victims of IGSA incidents. Six pairs of nurses used ESI triage, and seven pairs used the IGSA triage prototype, to assess 25 patients exposed to an IGSA and 25 patients not exposed. Of the 13 pairs of nurses in this study, two pairs were randomly selected to illustrate the use of the SAS Macro Language for this paper. If the table for a pair of nurses was not square, a square-form table was created by adding pseudo-observations. A weight of 1 was assigned to real observations and a weight of .0000000001 to pseudo-observations. Several macros were used to reduce the amount of programming. In this paper, we show only the results of one pair of nurses for ESI.
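To show why the pseudo-observation trick works, here is a small Python sketch (not the authors' SAS macros). It builds the weighted cross-tabulation only over the levels each nurse actually used, so a nurse who never assigns some level yields a non-square table and kappa is undefined. Adding one diagonal pseudo-observation per level with a weight of 1e-10 squares the table while leaving kappa essentially unchanged. The ratings are made-up illustrative data, the function names are hypothetical, and only the simple kappa point estimate is computed (no 95% CI).

```python
def crosstab(pairs, weights):
    """Weighted rater-1 x rater-2 table over the levels each rater actually used."""
    rows = sorted({a for a, _ in pairs})
    cols = sorted({b for _, b in pairs})
    table = {(a, b): 0.0 for a in rows for b in cols}
    for (a, b), w in zip(pairs, weights):
        table[(a, b)] += w
    return rows, cols, table

def add_pseudo_observations(pairs, weights, levels, tiny=1e-10):
    """Append one (level, level) pair per level with a negligible weight."""
    return pairs + [(c, c) for c in levels], weights + [tiny] * len(levels)

def cohens_kappa(pairs, weights):
    """Simple Cohen's kappa computed from weighted counts of rating pairs."""
    rows, cols, table = crosstab(pairs, weights)
    if rows != cols:
        raise ValueError("table is not square; kappa is undefined")
    n = sum(table.values())
    p_obs = sum(table[(c, c)] for c in rows) / n  # observed agreement
    row_marg = {c: sum(table[(c, b)] for b in cols) / n for c in rows}
    col_marg = {c: sum(table[(a, c)] for a in rows) / n for c in cols}
    p_exp = sum(row_marg[c] * col_marg[c] for c in rows)  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

if __name__ == "__main__":
    # Made-up ESI levels from two nurses; nurse B never assigns level 4.
    nurse_a = [1, 2, 2, 3, 3, 4]
    nurse_b = [1, 2, 3, 3, 3, 3]
    pairs = list(zip(nurse_a, nurse_b))
    weights = [1.0] * len(pairs)
    pairs, weights = add_pseudo_observations(pairs, weights, levels=[1, 2, 3, 4])
    print(round(cohens_kappa(pairs, weights), 4))
```

Without the four pseudo-observations, `cohens_kappa` raises a ValueError, because level 4 appears only in nurse A's margin. With them, the table is 4 x 4 and the result differs from the unpadded agreement calculation only around the tenth decimal place.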
Read the paper (PDF) | View the e-poster or slides (PDF)
Abbas Tavakoli, University of South Carolina
Joan Culley, University of South Carolina
Jane Richter, University of South Carolina
Sara Donevant, University of South Carolina
Jean Craig, Medical University of South Carolina
Session 1185-2017:
Visualizing Market Structure Using Brand Sentiments
Increasingly, customers are using social media and other Internet-based applications such as review sites and discussion boards to voice their opinions and express their sentiments about brands. Such spontaneous and unsolicited customer feedback can provide brand managers with valuable insights about competing brands. There is a general consensus that listening and reacting to the voice of the customer is a vital component of brand management. However, the unstructured, qualitative, and textual nature of this customer data poses significant challenges for data scientists and business analysts. In this paper, we propose a methodology that can help brand managers visualize the competitive structure of a market based on an analysis of customer perceptions and sentiments obtained from blogs, discussion boards, review sites, and other similar sources. The brand map is designed to graphically represent the association of product features with brands, thus helping brand managers assess a brand's true strengths and weaknesses based on the voice of customers. Our multi-stage methodology uses the principles of topic modeling and sentiment analysis in text mining. The results of text mining are analyzed using correspondence analysis to graphically represent the differentiating attributes of each brand. We empirically demonstrate the utility of our methodology by using data collected from, a popular review site for car buyers.
Read the paper (PDF)
Praveen Kumar Kotekal, Oklahoma State University
Amit K Ghosh, Cleveland State University
Goutam Chakraborty, Oklahoma State University
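The correspondence-analysis step in the brand-map abstract above can be sketched in Python with NumPy. This is the textbook SVD formulation of simple correspondence analysis, not necessarily how the authors computed their map (in SAS this step is usually done with PROC CORRESP). It assumes the earlier text-mining stages have already produced a brand x feature table of counts (for example, positive-sentiment mentions per feature) with no all-zero rows or columns; the function name and the example counts are hypothetical.

```python
import numpy as np


def correspondence_analysis(counts, n_dims=2):
    """Simple correspondence analysis of a brand x feature count table.

    Returns principal coordinates for rows (brands) and columns (features)
    in the first n_dims dimensions, plus the principal inertias.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                 # correspondence matrix
    r = P.sum(axis=1)               # row masses (brands)
    c = P.sum(axis=0)               # column masses (features)
    expected = np.outer(r, c)
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]      # D_r^{-1/2} U Sigma
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]   # D_c^{-1/2} V Sigma
    return row_coords[:, :n_dims], col_coords[:, :n_dims], sv ** 2


# Hypothetical counts: 3 brands (rows) x 4 product features (columns).
counts = [[30, 10, 5, 15],
          [8, 25, 12, 5],
          [10, 12, 28, 20]]
brands_xy, features_xy, inertia = correspondence_analysis(counts)
```

Plotting `brands_xy` and `features_xy` on the same axes gives a brand map in which brands lie near the features they are disproportionately associated with; the total inertia equals the table's chi-square statistic divided by the grand total.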
Session 1041-2017:
War and Peace: SAS® Platform Support. Can We Make It Easier?
Over the years, the use of SAS® has grown immensely within Royal Bank of Scotland (RBS), making platform support and maintenance overly complicated and time-consuming. At RBS, we realized that we had been living 'war and peace' every day for many years and that the time had come to rethink how we support SAS platforms. Our effort to rationalize and consolidate the ways our organization uses SAS brought with it the need to review and improve the processes and procedures we have in place. This paper explains why we did it, what we changed or reinvented, and how these changes have altered the way we operate: bringing us closer to DevOps, improving our relationship with our customers, and building trust in the service we deliver.
Read the paper (PDF)
Sergey Iglov, RBS
Session 1483-2017:
Why Credit Risk Needs Advanced Analytics: A Journey from Base SAS® to SAS® High-Performance Risk
We are at a tipping point for credit risk modeling. To meet the technical and regulatory challenges of IFRS 9 and stress testing, and to strengthen model risk management, CBS aims to create an integrated, end-to-end, tools-based solution across the model lifecycle, with strong governance and controls and improved scenario testing and forecasting capabilities. SAS has been chosen as the technology partner to help CBS meet these aims. A new predictive analytics platform is being deployed that combines well-known tools such as SAS® Enterprise Miner, SAS® Model Manager, and SAS® Data Management with SAS® Model Implementation Platform, powered by SAS® High-Performance Risk. Prompted by the new technology, CBS has also reviewed the operating model for credit risk, restructuring resources around the new platform with clear lines of accountability and incorporating a dedicated data engineering function within the risk modeling team. CBS is creating a culture of collaboration across credit risk that supports the development of technology-led, innovative solutions that not only meet regulatory and model risk management requirements but also establish a platform for the effective use of predictive analytics enterprise-wide.
Chris Arthur-McGuire
Session 0935-2017:
Zeroing In on Effective Member Communication: An Rx Education Study
In 2013, the Centers for Medicare & Medicaid Services (CMS) changed the pharmacy mail-order member-acquisition process so that Humana Pharmacy may call only members whose cost savings are greater than $2.00, to educate them about the potential savings and instruct them to call back. The Rx Education call center asked for analytics work to help prioritize member outreach, improve conversions, and decrease the number of members who cannot be contacted. After a year of contacting members using this additional insight, the conversion-after-agreement rate rose from 71.5% to 77.5%, and the unable-to-contact rate fell from 30.7% to 17.4%. This case study takes you on an analytics journey from the initial problem diagnosis and analytics solution through subsequent refinements and test-and-learn campaigns.
Read the paper (PDF)
Brian Mitchell, Humana Inc.