Finance Papers A-Z

A
Paper 3140-2015:
An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG
In Scotia - Colpatria Bank, the retail segment is very important. The quantity of lending applications makes it necessary to use statistical models and analytic tools to make an initial selection of good customers, whom our credit analysts then study in depth before approving or denying a credit application. Constructing objective vintages using the Cox model generates past-due alerts sooner, so mitigation measures can be applied one or two months earlier than is currently possible. This can reduce losses by 100 bps in the new vintages. This paper estimates a Cox proportional hazards model, compares the results with a logit model for a specific product of the bank, and additionally estimates the objective vintage for the product. A minimal PROC PHREG sketch follows this entry.
Read the paper (PDF). | Download the data file (ZIP).
Ivan Atehortua Rojas, Scotia - Colpatria Bank
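What follows is a minimal, illustrative sketch of the kind of PROC PHREG call the paper describes; the data set, variable names (time from origination to the past-due event, a censoring flag, and a few origination characteristics), and reference levels are assumptions, not the authors' actual model.

/* Hypothetical inputs: months_to_delinq = months from origination to the
   past-due event; delinq_flag = 1 if the event was observed, 0 if censored */
proc phreg data=work.vintage_accounts;
   class product_type(ref='CreditCard') region / param=ref;
   model months_to_delinq*delinq_flag(0) = bureau_score ltv income product_type region;
   baseline out=work.vintage_curves survival=surv_prob;
run;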
Paper 3371-2015:
An Application of the DEA Optimization Methodology to Make More Effective and Efficient Collection Calls
In our management and collection area, there was no methodology that provided the optimal number of collection calls needed to get a customer to make the minimum payment on his or her financial obligation. We wanted to determine the optimal number of calls using the data envelopment analysis (DEA) optimization methodology. Using this methodology, we obtained results that positively impacted the way our customers were contacted. We can maintain a healthy bank-customer relationship, keep management and collection at an operational level, and obtain a more effective and efficient portfolio recovery. The DEA optimization methodology has been used successfully in various fields of manufacturing production. It has solved multi-criteria optimization problems, but it has not been commonly used in the financial sector, especially in the collection area. This methodology requires specialized software, such as SAS® Enterprise Guide® and its robust optimization capabilities. In this paper, we present PROC OPTMODEL and show how to formulate the optimization problem, write the program, and process the available data. An illustrative DEA formulation in PROC OPTMODEL follows this entry.
Read the paper (PDF).
Jenny Lancheros, Banco Colpatria of Scotiabank Group
Ana Nieto, Banco Colpatria of Scotiabank Group
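Below is a small, self-contained sketch of an input-oriented CCR DEA model in PROC OPTMODEL, evaluating one decision-making unit against its peers; the data (calls, agent hours, payments collected) are invented, and the paper's actual formulation may differ.

proc optmodel;
   /* Hypothetical data: 3 collection strategies (DMUs), 2 inputs, 1 output */
   set DMUS = 1..3;
   num calls{DMUS}    = [120 90 150];    /* input 1: calls made            */
   num hours{DMUS}    = [40 35 55];      /* input 2: agent hours           */
   num payments{DMUS} = [60 50 70];      /* output: minimum payments made  */

   num k = 1;                            /* DMU being evaluated            */
   var theta >= 0;                       /* efficiency score of DMU k      */
   var lambda{DMUS} >= 0;                /* peer weights                   */

   min Efficiency = theta;
   con InCalls: sum{j in DMUS} lambda[j]*calls[j]    <= theta*calls[k];
   con InHours: sum{j in DMUS} lambda[j]*hours[j]    <= theta*hours[k];
   con OutPay:  sum{j in DMUS} lambda[j]*payments[j] >= payments[k];

   solve;
   print theta;
   print lambda;
quit;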
Paper 3369-2015:
Analyzing Customer Answers in Calls from Collections Using SAS® Text Miner to Respond in an Efficient and Effective Way
At Multibanca Colpatria of Scotiabank, we offer a broad range of financial services and products in Colombia. In collection management, we currently manage more than 400,000 customers each month. In the call center, agents collect answers from each contact with the customer, and this information is saved in databases. However, this information has not been explored to learn more about our customers and our own operation. The objective of this paper is to develop a classification model using the words in each customer's answers to calls about receiving payment. Using a combination of text mining and clustering methodologies, we identify the possible conversations that can occur in each stage of delinquency. This knowledge makes it possible to develop specialized scripts for collection management.
Read the paper (PDF).
Oscar Ayala, Colpatria
Jenny Lancheros, Banco Colpatria of Scotiabank Group
B
Paper 3410-2015:
Building Credit Modeling Dashboards
Dashboards are an effective tool for analyzing and summarizing the large volumes of data required to manage loan portfolios. Effective dashboards must highlight the most critical drivers of risk and performance within the portfolios and must be easy to use and implement. Developing dashboards often requires integrating data, analysis, or tools from different software platforms into a single, easy-to-use environment. FI Consulting has developed a Credit Modeling Dashboard in Microsoft Access that integrates complex SAS-based models into an easy-to-use, point-and-click interface. The dashboard integrates, prepares, and executes the back-end SAS models using command-line programming in Microsoft Access with Visual Basic for Applications (VBA). The Credit Modeling Dashboard developed by FI Consulting represents a simple and effective way to supply critical business intelligence in an integrated, easy-to-use platform without requiring investment in new software or rebuilding existing SAS tools already in use.
Read the paper (PDF).
Jeremy D'Antoni, FI Consulting
C
Paper 3217-2015:
Credit Card Holders' Behavior Modeling: Transition Probability Prediction with Multinomial and Conditional Logistic Regression in SAS/STAT®
Because of the variety of card holders' behavior patterns and income sources, each consumer account can move among states such as non-active, transactor, revolver, delinquent, and defaulted, and each state requires an individual model for predicting generated income. Estimating the transition probabilities between states at the account level helps to avoid the lack of memory in the MDP approach. The key question is which approach gives more accurate results: multinomial logistic regression or a multistage decision tree with binary logistic regressions. This paper investigates approaches to credit card profitability estimation at the account level based on multistate conditional probabilities, using the SAS/STAT procedure PROC LOGISTIC. Both models show moderate, but not strong, predictive power. Prediction accuracy for the decision tree depends on the order of stages for the conditional binary logistic regressions. Current development is concentrated on discrete choice models such as the nested logit with PROC MDC. A minimal multinomial PROC LOGISTIC fragment follows this entry.
Read the paper (PDF).
Denys Osipenko, The University of Edinburgh
Jonathan Crook
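The multinomial piece of the comparison might look like the fragment below, a generalized logit model in PROC LOGISTIC; the state labels and behavioral predictors are placeholders rather than the authors' specification.

/* next_state = account state in the following month; state_now and the
   behavioral predictors are hypothetical column names                    */
proc logistic data=work.card_accounts;
   class state_now(ref='transactor') / param=ref;
   model next_state(ref='non-active') = state_now utilization payment_ratio
                                        months_on_book / link=glogit;
   output out=work.transition_probs predprobs=individual;
run;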
Paper 3511-2015:
Credit Scorecard Generation Using the Credit Scoring Node in SAS® Enterprise Miner™
In today's competitive world, acquiring new customers is crucial for businesses, but what if most of the acquired customers turn out to be defaulters? This would backfire on the business and might lead to losses. Extant statistical methods enable businesses to identify good-risk customers rather than judging them intuitively. The objective of this paper is to build a credit risk scorecard using the Credit Scoring node in SAS® Enterprise Miner™ 12.3, which a manager can use to make an instant decision on whether to accept or reject a customer's credit application. The data set used for credit scoring was extracted from the UCI Machine Learning Repository and consisted of 15 variables that capture details such as the status of the customer's existing checking account, the purpose of the credit, the credit amount, employment status, and property. To ensure generalization of the model, the data set was partitioned with the Data Partition node into training and validation groups in a 70:30 ratio. The target is a binary variable that categorizes customers into good-risk and bad-risk groups. After the key variables required to generate the credit scorecard were identified, a score was assigned to each of their subgroups. The final model generating the scorecard has a prediction accuracy of about 75%. A cumulative cut-off score of 120 was generated by SAS to demarcate good-risk from bad-risk customers. Even if the data varies in the future, model refinement is easy because the whole process is already defined and does not need to be rebuilt from scratch.
Read the paper (PDF).
Ayush Priyadarshi, Oklahoma State University
Kushal Kathed, Oklahoma State University
Shilpi Prasad, Oklahoma State University
H
Paper 3446-2015:
How to Implement Two-Phase Regression Analysis to Predict Profitable Revenue Units
Is it a better business decision to determine the profitability of all business units/kiosks and then prune the nonprofitable ones? Or does model performance improve if we first find the units that meet the break-even point and then try to calculate their profits? In our project, we used a two-stage regression process because of the highly skewed distribution of the variables. First, we performed logistic regression to predict which kiosks would be profitable. Then, we used linear regression to predict the average monthly revenue at each kiosk. We used SAS® Enterprise Guide® and SAS® Enterprise Miner™ for the modeling process. The linear regression model is much more effective at predicting the target variable for profitable kiosks than for unprofitable ones. The two-phase regression model seemed to perform better than a single linear regression, particularly when the target variable has too many levels. In real-life situations, the dependent and independent variables can have highly skewed distributions, and two-phase regression can help improve model performance and accuracy. Some results: The logistic regression model has an overall accuracy of 82.9%, sensitivity of 92.6%, and specificity of 61.1%, with comparable figures for the training data set of 81.8%, 90.7%, and 63.8%, respectively. This indicates that the logistic model predicts the profitable kiosks consistently and reasonably well. Linear regression model: For the training data set, the mean absolute percentage error (MAPE) of the predicted (not log-transformed) values of the target versus the actual values is 7.2% for kiosks that earn more than $350, whereas it is -102% for kiosks that earn less than $350. For the validation data set, the corresponding MAPE values are 7.6% and -142%. This means that average monthly revenue is better predicted for kiosks earning above the $350 threshold--that is, for kiosks with a flag variable of 1. The model predicts the target variable with lower APE for higher values of the target variable in both the training data set and the entire data set. In fact, if the threshold value for the kiosks is moved to, say, $500, the predictive power of the model in terms of APE increases substantially. The validation data set (Selection Indicator=0) has fewer data points, and, therefore, the contrast in APEs is higher and more varied. A two-phase code sketch follows this entry.
Read the paper (PDF).
Shrey Tandon, Sobeys West
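A compressed sketch of the two-phase idea, with invented kiosk variables and a 0.5 cut-off standing in for whatever rule the authors actually used:

/* Phase 1: classify kiosks as profitable (binary flag) */
proc logistic data=work.kiosks;
   model profitable(event='1') = footfall rent_cost avg_ticket tenure_months;
   output out=work.phase1 p=p_profitable;
run;

/* Phase 2: linear regression of (log) revenue, restricted to kiosks that
   the first model classifies as profitable                               */
proc reg data=work.phase1(where=(p_profitable >= 0.5));
   model log_monthly_revenue = footfall rent_cost avg_ticket tenure_months;
   output out=work.phase2 p=pred_log_revenue;
run;
quit;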
Paper 3151-2015:
How to Use Internal and External Data to Realize the Potential for Changing the Game in Handset Campaigns
The telecommunications industry is the fastest-changing business ecosystem in this century. Therefore, handset campaigning to increase loyalty is a top issue for telco companies. However, these handset campaigns carry great fraud and payment risks if companies cannot classify and assess customers properly according to their risk propensity. For many years, telco companies managed this risk with business rules such as customer tenure, until analytics solutions were launched into the market. But few business rules restrict telco companies in the sale of handsets to new customers. On the other hand, with increasing competitive pressure on telco companies, it is necessary to use external credit data to sell handsets to new customers. Credit bureau data was a good opportunity to measure and understand the behavior of applicants. But using external data required system integration and real-time decision systems. For those reasons, we needed a solution that enables us to predict risky customers and then integrate risk scores and all information into one real-time decision engine for optimized handset application vetting. After an assessment period, the SAS® Analytics platform and SAS® Real-Time Decision Manager (RTDM) were chosen as the most suitable solution because they provide a flexible, user-friendly interface, high integration, and fast deployment capability. In this project, we built a process with three main stages to transform the data into knowledge: data collection, predictive modelling, and deployment and decision optimization. a) Data collection: We designed a daily updated data mart that connects internal payment behavior, demographics, and customer experience data with external credit bureau data. In this way, we can turn data into meaningful knowledge for a better understanding of customer behavior. b) Predictive modelling: To use the company's potential, it is critically important to use an analytics approach based on state-of-the-art technologies. We built nine models to predict customer propensity to pay. As a result of better classification of customers, we obtained satisfactory results in designing collection scenarios and the decision model for handset application vetting. c) Deployment and decision optimization: Knowledge is not enough to reach success in business. It must be turned into optimized decisions and deployed in real time. For this reason, we have been using SAS® Predictive Analytics tools and SAS® Real-Time Decision Manager to turn data into knowledge and knowledge into strategy and execution. With this system, we are now able to assess customers properly and to sell handsets even to our brand-new customers as part of the application vetting process. As a result, while decreasing nonpayment risk, we generated extra revenue from brand-new contracted customers. In three months, 13% of all handset sales were concluded via RTDM. Another benefit of RTDM is a 30% cost saving in external data inquiries. Thanks to RTDM, Avea has become the first telecom operator in the Turkish telco industry to use bureau data.
Read the paper (PDF).
Hurcan Coskun, Avea
I
Paper SAS1756-2015:
Incorporating External Economic Scenarios into Your CCAR Stress Testing Routines
Since the financial crisis of 2008, banks and bank holding companies in the United States have faced increased regulation. One of the recent changes to these regulations is known as the Comprehensive Capital Analysis and Review (CCAR). At the core of these new regulations, specifically under the Dodd-Frank Wall Street Reform and Consumer Protection Act and the stress tests it mandates, are a series of what-if, or scenario, analysis requirements that involve a number of scenarios provided by the Federal Reserve. This paper proposes frequentist and Bayesian time series methods that solve this stress-testing problem using a highly practical top-down approach. The paper focuses on the value of using univariate time series methods, as well as the methodology behind these models. An illustrative scenario-driven time series sketch follows this entry.
Read the paper (PDF).
Kenneth Sanford, SAS
Christian Macaro, SAS
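One way a top-down, scenario-driven equation of the kind the paper discusses might be expressed is a transfer-function (ARIMAX) model in PROC ARIMA that relates a portfolio loss rate to supervisory scenario variables; the data set, variable names, model order, and nine-quarter horizon are assumptions, not the authors' models.

/* work.hist_and_scenario: quarterly history followed by the projected
   scenario quarters, with future loss_rate missing and the macro
   variables filled in from the supervisory scenario                      */
proc arima data=work.hist_and_scenario;
   identify var=loss_rate crosscorr=(unemployment_rate gdp_growth hpi_growth);
   estimate p=1 input=(unemployment_rate gdp_growth hpi_growth) method=ml;
   forecast lead=9 id=qtr interval=qtr out=work.stress_projection;
run;
quit;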
L
Paper 3372-2015:
Leads and Lags: Static and Dynamic Queues in the SAS® DATA Step
From stock price histories to hospital stay records, analysis of time series data often requires the use of lagged (and occasionally lead) values of one or more analysis variables. For the SAS® user, the central operational task is typically getting lagged (lead) values for each time point in the data set. Although SAS has long provided a LAG function, it has no analogous lead function--an especially significant problem in the case of large data series. This paper reviews the LAG function (in particular, the powerful but non-intuitive implications of its queue-oriented basis), demonstrates efficient ways to generate leads with the same flexibility as the LAG function (but without the common and expensive recourse of data re-sorting), and shows how to dynamically generate leads and lags through the use of the hash object. A compact lead-and-lag sketch follows this entry.
Read the paper (PDF). | Download the data file (ZIP).
Mark Keintz, Wharton Research Data Services
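A compact sketch of the two ideas in the abstract, assuming a single, already-ordered series named work.prices with a price column: the queue-based LAG function supplies the lag, and a self-merge offset by one observation supplies a lead without re-sorting (the paper's hash-object approach is not shown here).

data work.price_lag_lead;
   merge work.prices
         work.prices(firstobs=2 keep=price rename=(price=price_lead1));
   /* LAG maintains a FIFO queue: called once per iteration, it returns
      the value from the previous observation                             */
   price_lag1 = lag(price);
   /* price_lead1 comes from the same data set read one row ahead, so the
      final observation receives a missing lead, as expected              */
run;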
M
Paper 2524-2015:
Methodology of Model Creation
The goal of this session is to describe the whole process of model creation, from the business request through model specification, data preparation, iterative model creation, model tuning, implementation, and model servicing. Each phase consists of several steps, for which we describe the main goal, the expected outcome, the tools used, our own SAS code, useful nodes and settings in SAS® Enterprise Miner™, procedures in SAS® Enterprise Guide®, measurement criteria, and expected duration in man-days. For three steps, we also present deep insights with examples of practical usage and explanations of the code used, the settings, and ways of exploring and interpreting the output. During the actual model creation process, we suggest using Microsoft Excel to keep all input metadata along with information about transformations performed in SAS Enterprise Miner. To get information about model results faster, we combine an automatic SAS® code generator implemented in Excel with SAS Enterprise Guide, where the generated code builds a specific profile of results directly from the output tables of the SAS Enterprise Miner nodes. This paper also focuses on an example of checking a binary model's stability over time in SAS Enterprise Guide by measuring the optimal cut-off percentage and lift. These measurements are visualized and automated using our own code. With this methodology, users have direct contact with the transformed data and can analyze and explore any intermediate results. Furthermore, the proposed approach can be used for several types of modeling (for example, binary and nominal predictive models or segmentation models). In summary, we present our best practices for combining specific procedures in SAS Enterprise Guide, SAS Enterprise Miner, and Microsoft Excel to create and interpret models faster and more effectively.
Read the paper (PDF).
Peter Kertys, VÚB a.s.
Paper 1381-2015:
Model Risk and Corporate Governance of Models with SAS®
Banks can create a competitive advantage in their business by using business intelligence (BI) and by building models. In the credit domain, the best practice is to build risk-sensitive models (Probability of Default, Exposure at Default, Loss-given Default, Unexpected Loss, Concentration Risk, and so on) and implement them in decision-making, credit granting, and credit risk management. There are models and tools on the next level built on these models and that are used to help in achieving business targets, risk-sensitive pricing, capital planning, optimizing of ROE/RAROC, managing the credit portfolio, setting the level of provisions, and so on. It works remarkably well as long as the models work. However, over time, models deteriorate and their predictive power can drop dramatically. Since the global financial crisis in 2008, we have faced a tsunami of regulation and accelerated frequency of changes in the business environment, which cause models to deteriorate faster than ever before. As a result, heavy reliance on models in decision-making (some decisions are automated following the model's results--without human intervention) might result in a huge error that can have dramatic consequences for the bank's performance. In my presentation, I share our experience in reducing model risk and establishing corporate governance of models with the following SAS® tools: model monitoring, SAS® Model Manager, dashboards, and SAS® Visual Analytics.
Read the paper (PDF).
Boaz Galinson, Bank Leumi
Paper 3359-2015:
Modelling Operational Risk Using Extreme Value Theory and Skew t-Copulas via Bayesian Inference Using SAS®
Operational risk losses are heavy tailed and likely to be asymmetric and extremely dependent among business lines and event types. We propose a new methodology to assess, in a multivariate way, the asymmetry and extreme dependence between severity distributions and to calculate the capital for operational risk. This methodology simultaneously uses several parametric distributions and an alternative mixture distribution (the lognormal for the body of losses and the generalized Pareto distribution for the tail) via extreme value theory using SAS®; the multivariate skew t-copula, applied for the first time to operational losses; and Bayesian inference theory to estimate new n-dimensional skew t-copula models via Markov chain Monte Carlo (MCMC) simulation. This paper analyzes a new operational loss data set, SAS® Operational Risk Global Data (SAS OpRisk Global Data), to model operational risk at international financial institutions. All of the severity models are constructed in SAS® 9.2. We implement PROC SEVERITY and PROC NLMIXED, and this paper describes the implementation. A minimal PROC SEVERITY sketch follows this entry.
Read the paper (PDF).
Betty Johanna Garzon Rozo, The University of Edinburgh
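A minimal illustration of fitting candidate severity distributions, including the lognormal and the generalized Pareto, with PROC SEVERITY; the data set and variable names are placeholders, and the paper's body-tail mixture, copula, and Bayesian estimation go well beyond this step.

proc severity data=work.op_losses crit=aicc;
   loss loss_amount;     /* operational loss severity                     */
   dist logn gpd;        /* lognormal and generalized Pareto candidates   */
run;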
N
Paper SAS1866-2015:
Now That You Have Your Data in Hadoop, How Are You Staging Your Analytical Base Tables?
Well, Hadoop community, now that you have your data in Hadoop, how are you staging your analytical base tables? In my discussions with clients about this, we all agree on one thing: the size of the data stored in Hadoop prevents us from moving that data to a different platform in order to generate the analytical base tables. To address this dilemma, I want to introduce you to the SAS® In-Database Code Accelerator for Hadoop. A rough DS2 sketch follows this entry.
Read the paper (PDF).
Steven Sober, SAS
Donna DeCapite, SAS
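A rough sketch of the pattern the SAS In-Database Code Accelerator enables: a DS2 thread program pushed into the cluster so the analytical base table is built where the data lives. The libref, tables, derived column, and the DS2ACCEL= setting shown here are assumptions for illustration only.

proc ds2 ds2accel=any;
   thread work.abt_thread / overwrite=yes;
      dcl double util_ratio;
      method run();
         set hdp.card_transactions;            /* Hive table via a Hadoop libref */
         util_ratio = balance / credit_limit;  /* illustrative derived column    */
      end;
   endthread;

   data hdp.analytical_base_table (overwrite=yes);
      dcl thread work.abt_thread t;
      method run();
         set from t;
      end;
   enddata;
run;
quit;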
O
Paper 3425-2015:
Obtaining a Unique View of a Company: Reports in SAS® Visual Analytics
SAS® Visual Analytics provides users with a unique view of their company by monitoring products and identifying opportunities and threats, making it possible to make recommendations, set a pricing strategy, and accelerate or brake product growth. In SAS Visual Analytics, you can see in one report the required return, a competitor analysis, and a comparison of realized versus predicted results. Reports can be used to obtain a vision of the whole company and can include several hierarchies (for example, by business unit, by segment, by product, by region, and so on). SAS Visual Analytics enables senior executives to view information easily and quickly. You can also use tracking indicators that are used in the insurance market.
Read the paper (PDF).
Jacqueline Fraga, SulAmerica Cia Nacional de Seguros
P
Paper 3225-2015:
Portfolio Construction with OPTMODEL
Investment portfolios and investable indexes determine their holdings according to a stated mandate and methodology. Part of that process involves compliance with certain allocation constraints. These constraints are developed internally by portfolio managers and index providers, imposed externally by regulations, or both. An example of the latter is the U.S. Internal Revenue Code (25/50) concentration constraint, which relates to a regulated investment company (RIC). The code states that at the end of each quarter of a RIC's tax year, the following constraints should be met: 1) No more than 25 percent of the value of the RIC's assets may be invested in a single issuer. 2) The sum of the weights of all issuers representing more than 5 percent of the total assets should not exceed 50 percent of the fund's total assets. Although these constraints result in a non-continuous model, compliance with the concentration constraints can be formalized by reformulating the model as a series of continuous non-linear optimization problems solved using PROC OPTMODEL. The model and solution are presented in this paper. The approach discussed has been used in constructing investable equity indexes. An alternative, simplified formulation sketch follows this entry.
Read the paper (PDF).
Taras Zlupko, CRSP, University of Chicago
Robert Spatz
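For readers who want a concrete starting point, the sketch below expresses the 25/50 rule in PROC OPTMODEL as a mixed-integer linear reformulation with indicator variables; this is a simplification that differs from the paper's series of continuous nonlinear problems, and the issuer universe and expected returns are invented.

proc optmodel;
   set ISSUERS = 1..20;
   num mu{i in ISSUERS} = 0.04 + 0.002*i;   /* hypothetical expected returns */

   var w{ISSUERS} >= 0 <= 0.25;   /* 25% single-issuer cap                  */
   var z{ISSUERS} binary;         /* 1 if an issuer's weight exceeds 5%     */
   var v{ISSUERS} >= 0;           /* weight counted toward the 50% rule     */

   max ExpReturn = sum{i in ISSUERS} mu[i]*w[i];

   con FullyInvested: sum{i in ISSUERS} w[i] = 1;
   con Link{i in ISSUERS}:  w[i] <= 0.05 + 0.20*z[i];
   con Count{i in ISSUERS}: v[i] >= w[i] - (1 - z[i]);
   con FiftyPct: sum{i in ISSUERS} v[i] <= 0.5;

   solve with milp;
   print w z;
quit;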
R
Paper SAS1958-2015:
Real-Time Risk Aggregation with SAS® High-Performance Risk and SAS® Event Stream Processing Engine
Risk managers and traders know that some knowledge loses its value quickly. Unfortunately, due to the computationally intensive nature of risk, most risk managers use stale data. Knowing your positions and risk intraday can provide immense value. Imagine knowing the portfolio risk impact of a trade before you execute. This paper shows you a path to doing real-time risk analysis leveraging capabilities from SAS® Event Stream Processing Engine and SAS® High-Performance Risk. Event stream processing (ESP) offers the ability to process large amounts of data with high throughput and low latency, including streaming real-time trade data from front-office systems into a centralized risk engine. SAS High-Performance Risk enables robust, complex portfolio valuations and risk calculations quickly and accurately. In this paper, we present techniques and demonstrate concepts that enable you to more efficiently use these capabilities together. We also show techniques for analyzing SAS High-Performance data with SAS® Visual Analytics.
Read the paper (PDF).
Albert Hopping, SAS
Arvind Kulkarni, SAS
Ling Xiang, SAS
Paper SAS1871-2015:
Regulatory Compliance Reporting Using SAS® XML Mapper
As a part of regulatory compliance requirements, banks are required to submit reports based on Microsoft Excel, as per templates supplied by the regulators. This poses several challenges, including the high complexity of templates, the fact that implementation using ODS can be cumbersome, and the difficulty in keeping up with regulatory changes and supporting dynamic report content. At the same time, you need the flexibility to customize and schedule these reports as per your business requirements. This paper discusses an approach to building these reports using SAS® XML Mapper and the Excel XML spreadsheet format. This approach provides an easy-to-use framework that can accommodate template changes from the regulators without needing to modify the code. It is implemented using SAS® technologies, providing you the flexibility to customize to your needs. This approach also provides easy maintainability.
Read the paper (PDF).
Sarita Kannarath, SAS
Phil Hanna, SAS
Amitkumar Nakrani, SAS
Nishant Sharma, SAS
Paper SAS1861-2015:
Regulatory Stress Testing--A Manageable Process with SAS®
As a consequence of the financial crisis, banks are required to stress test their balance sheet and earnings based on prescribed macroeconomic scenarios. In the US, this exercise is known as the Comprehensive Capital Analysis and Review (CCAR) or Dodd-Frank Act Stress Testing (DFAST). In order to assess capital adequacy under these stress scenarios, banks need a unified view of their projected balance sheet, income, and losses. In addition, the bar for these regulatory stress tests is very high regarding governance and overall infrastructure. Regulators and auditors want to ensure that the granularity and quality of data, model methodology, and assumptions reflect the complexity of the banks. This calls for close internal collaboration and information sharing across business lines, risk management, and finance. Currently, this process is managed in an ad hoc, manual fashion. Results are aggregated from various lines of business using spreadsheets and Microsoft SharePoint. Although the spreadsheet option provides flexibility, it brings ambiguity into the process and makes it error prone and inefficient. This paper introduces a new SAS® stress testing solution that can help banks define, orchestrate, and streamline the stress-testing process for easier traceability, auditability, and reproducibility. The integrated platform provides greater control, efficiency, and transparency to the CCAR process. This will enable banks to focus on more value-added analysis such as scenario exploration, sensitivity analysis, capital planning and management, and model dependencies. Lastly, the solution was designed to leverage existing in-house platforms that banks might already have in place.
Read the paper (PDF).
Wei Chen, SAS
Shannon Clark
Erik Leaver, SAS
John Pechacek
S
Paper 4400-2015:
SAS® Analytics plus Warren Buffett's Wisdom Beats Berkshire Hathaway! Huh?
Individual investors face a daunting challenge. They must select a portfolio of securities composed of a manageable number of individual stocks, bonds, and/or mutual funds. An investor might initiate her portfolio selection process by choosing the number of unique securities to hold in her portfolio. This is both a practical matter and a matter of risk management. It is practical because there are tens of thousands of actively traded securities from which to choose, and it is impractical for an individual investor to own every available security. It is also a risk management measure because investible securities bring with them the potential of financial loss--to the point of becoming valueless in some cases. Increasing the number of securities in a portfolio decreases the probability that an investor will suffer drastically from a corporate bankruptcy, for instance. However, holding too many securities in a portfolio can restrict performance. After deciding the number of securities to hold, the investor must determine which securities to include in her portfolio and what proportion of available cash to allocate to each security. Once her portfolio is constructed, the investor must manage it over time. This generally entails periodically reassessing the proportion of each security to maintain as time advances, but it may also involve eliminating some securities and initiating positions in new ones. This paper introduces an analytically driven method for portfolio security selection based on minimizing the mean correlation of returns across the portfolio. It also introduces a method for determining the proportion of each security that should be maintained within the portfolio. The methods for portfolio selection and security weighting described herein work in conjunction to maximize expected portfolio return while minimizing the probability of loss over time. This involves a re-visioning of Harry Markowitz's Nobel Prize-winning concept known as the Efficient Frontier. Resultant portfolios are assessed via Monte Carlo simulation, and results are compared to the Standard & Poor's 500 Index and Warren Buffett's Berkshire Hathaway, which has a well-established history of beating the Standard & Poor's 500 Index over a long period. To those familiar with Dr. Markowitz's Modern Portfolio Theory, this paper may appear to be simply a repackaging of old ideas. It is not. A small correlation-screening sketch follows this entry.
Read the paper (PDF).
Bruce Bedford, Oberweis Dairy
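The screening idea at the heart of the method, ranking securities by their mean return correlation, can be prototyped with PROC CORR and a short DATA step; the data set, the ret_ variable prefix, and the simple lowest-mean-correlation ranking are assumptions that only approximate the paper's procedure.

/* work.returns: one row per period, one ret_* column per candidate security */
proc corr data=work.returns outp=work.corr_matrix noprint;
   var ret_:;
run;

data work.mean_corr;
   set work.corr_matrix(where=(_type_='CORR'));
   /* average each security's correlation with the others,
      excluding the diagonal value of 1                                    */
   mean_corr = (sum(of ret_:) - 1) / (n(of ret_:) - 1);
   keep _name_ mean_corr;
run;

proc sort data=work.mean_corr;
   by mean_corr;                  /* lowest mean correlation first */
run;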
Paper SAS1921-2015:
SAS® Model Manager: An Easy Method for Deploying SAS® Analytical Models into Relational Databases and Hadoop
SAS® Model Manager provides an easy way to deploy analytical models into various relational databases or into Hadoop using either scoring functions or the SAS® Embedded Process publish methods. This paper gives a brief introduction of both the SAS Model Manager publishing functionality and the SAS® Scoring Accelerator. It describes the major differences between using scoring functions and the SAS Embedded Process publish methods to publish a model. The paper also explains how to perform in-database processing of a published model by using SAS applications as well as SQL code outside of SAS. In addition to Hadoop, SAS also supports these databases: Teradata, Oracle, Netezza, DB2, and SAP HANA. Examples are provided for publishing a model to a Teradata database and to Hadoop. After reading this paper, you should feel comfortable using a published model in your business environment.
Read the paper (PDF).
Jifa Wei, SAS
Kristen Aponte, SAS
Paper 3240-2015:
Sampling Financial Records Using the SURVEYSELECT Procedure
This paper presents an application of the SURVEYSELECT procedure. The objective is to draw a systematic random sample from financial data for review. Topics covered in this paper include a brief review of systematic sampling, variable definitions, serpentine sorting, and an interpretation of the output. A minimal PROC SURVEYSELECT sketch follows this entry.
Read the paper (PDF). | Download the data file (ZIP).
Roger L Goodwin, US Government Printing Office
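A minimal sketch of the kind of call the paper describes, a systematic sample with serpentine control sorting; the data set, control variables, sampling rate, and seed are placeholders.

proc surveyselect data=work.financial_records out=work.review_sample
                  method=sys samprate=0.05 seed=20150331;
   control region account_type;   /* serpentine sort on the control variables before selection */
run;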
Paper SAS1661-2015:
Show Me the Money! Text Analytics for Decision-Making in Government Spending
Understanding organizational trends in spending can help overseeing government agencies make appropriate modifications in spending to best serve the organization and the citizenry. However, given millions of line items for organizations annually, including free-form text, it is unrealistic for these overseeing agencies to succeed by using only a manual approach to this textual data. Using a publicly available data set, this paper explores how business users can apply text analytics using SAS® Contextual Analysis to assess trends in spending for particular agencies, apply subject matter expertise to refine these trends into a taxonomy, and ultimately, categorize the spending for organizations in a flexible, user-friendly manner. SAS® Visual Analytics enables dynamic exploration, including modeling results from SAS® Visual Statistics, in order to assess areas of potentially extraneous spending, providing actionable information to the decision makers.
Read the paper (PDF).
Tom Sabo, SAS
Paper SAS1880-2015:
Staying Relevant in a Competitive World: Using the SAS® Output Delivery System to Enhance, Customize, and Render Reports
Technology is always changing. To succeed in this ever-evolving landscape, organizations must embrace the change and look for ways to use it to their advantage. Even standard business tasks such as creating reports are affected by the rapid pace of technology. Reports are key to organizations and their customers. Therefore, it is imperative that organizations employ current technology to provide data in customized and meaningful reports across a variety of media. The SAS® Output Delivery System (ODS) gives you that edge by providing tools that enable you to package, present, and deliver report data in more meaningful ways, across the most popular desktop and mobile devices. To begin, the paper illustrates how to modify styles in your reports using the ODS CSS style engine, which incorporates the use of cascading style sheets (CSS) and the ODS document object model (DOM). You also learn how you can use SAS ODS to customize and generate reports in the body of e-mail messages. Then the paper discusses methods for enhancing reports and rendering them in desktop and mobile browsers by using the HTML and HTML5 ODS destinations. To conclude, the paper demonstrates the use of selected SAS ODS destinations and features in practical, real-world applications. A brief HTML5 destination sketch follows this entry.
Read the paper (PDF).
Chevell Parker, SAS
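One of the techniques covered, rendering a report for desktop and mobile browsers through the HTML5 destination with a custom style, can be sketched as follows; the CSS file is hypothetical, and SASHELP.CLASS merely stands in for real report content.

ods _all_ close;
ods html5 file='quarterly_report.html' cssstyle='corporate.css';   /* hypothetical CSS file */

title 'Quarterly Summary (illustrative)';
proc print data=sashelp.class noobs;
run;

ods html5 close;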
Paper 3478-2015:
Stress Testing for Mid-Sized Banks
In 2014, for the first time, mid-market banks (consisting of banks and bank holding companies with $10-$50 billion in consolidated assets) were required to submit Capital Stress Tests to the federal regulators under the Dodd-Frank Act Stress Testing (DFAST). This is a process large banks have been going through since 2011. However, mid-market banks are not positioned to commit as many resources to their annual stress tests as their largest peers. Limited human and technical resources, incomplete or non-existent detailed historical data, lack of enterprise-wide cross-functional analytics teams, and limited exposure to rigorous model validations are all challenges mid-market banks face. While there are fewer deliverables required from the DFAST banks, the scrutiny the regulators are placing on the analytical models is just as high as their expectations for Comprehensive Capital Analysis and Review (CCAR) banks. This session discusses the differences in how DFAST and CCAR banks execute their stress tests, the challenges facing DFAST banks, and potential ways DFAST banks can leverage the analytics behind this exercise.
Read the paper (PDF).
Charyn Faenza, F.N.B. Corporation
T
Paper 3328-2015:
The Comparative Analysis of Predictive Models for Credit Limit Utilization Rate with SAS/STAT®
Credit card usage modelling is a relatively new task in client predictive analytics compared to risk modelling such as credit scoring. The credit limit utilization rate is a bounded outcome that is highly dependent on customer behavior. Proportion prediction techniques are widely used for Loss Given Default estimation in credit risk modelling (Belotti and Crook, 2009; Arsova et al., 2011; Van Berkel and Siddiqi, 2012; Yao et al., 2014). This paper investigates several regression models for the utilization rate that respect its outcome limits and provides a comparative analysis of the predictive accuracy of the methods. The regression models are fit in SAS/STAT® using PROC REG, PROC LOGISTIC, PROC NLMIXED, and PROC GLIMMIX, with SAS® macros for model evaluation. The conclusion recommends utilization rate prediction techniques based on the empirical analysis. An illustrative beta regression sketch follows this entry.
Read the paper (PDF).
Denys Osipenko, The University of Edinburgh
Jonathan Crook
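One candidate model form of the kind compared in the paper, a bounded-outcome regression, might be set up as a beta regression in PROC GLIMMIX as below; the variable names are invented, and the rate must lie strictly between 0 and 1 for this distribution.

proc glimmix data=work.card_utilization;
   model util_rate = credit_limit behavior_score months_on_book
         / dist=beta link=logit solution;
   output out=work.util_pred pred(ilink)=util_hat;   /* predictions on the rate scale */
run;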
U
Paper 3212-2015:
Using SAS® to Combine Regression and Time Series Analysis on U.S. Financial Data to Predict the Economic Downturn
During the financial crisis of 2007-2009, the U.S. labor market lost 8.4 million jobs, causing the unemployment rate to increase from 5% to 9.5%. One indicator of economic recession is negative gross domestic product (GDP) growth for two consecutive quarters. This poster combines quantitative and qualitative techniques to predict the economic downturn by forecasting recession probabilities. Data was collected from the Board of Governors of the Federal Reserve System and the Federal Reserve Bank of St. Louis, containing 29 variables and quarterly observations from 1976-Q1 to 2013-Q3. Eleven variables were selected as inputs based on their effects on recession and to limit multicollinearity: long-term Treasury yields (5-year and 10-year), mortgage rate, CPI inflation rate, prime rate, market volatility index, BBB corporate bond yield, house price index, stock market index, commercial real estate price index, and one calculated variable, the yield spread (Treasury yield-curve spread). The target variable was a binary variable indicating economic recession for each quarter (1=Recession). Data was prepared for modeling by applying imputation and transformation to the variables. A two-step analysis was used to forecast the recession probabilities for the short-term period. Predicted recession probabilities were first obtained from a backward-elimination logistic regression model that was selected on the basis of misclassification (validation misclassification=0.115). These probabilities were then forecasted using the exponential smoothing method that was selected on the basis of mean absolute error (MAE=11.04). Results show the recession periods, including the great recession of 2008, and the forecast for eight quarters (up to 2015-Q3). A compressed two-step sketch follows this entry.
Read the paper (PDF).
Avinash Kalwani, Oklahoma State University
Nishant Vyas, Oklahoma State University
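A compressed sketch of the two-step analysis under assumed variable names: backward-elimination logistic regression for the recession flag, followed by exponential smoothing of the predicted probabilities.

/* Step 1: backward elimination on the quarterly macro inputs */
proc logistic data=work.macro_quarters;
   model recession(event='1') = treas_5y treas_10y mortgage_rate cpi_inflation
                                prime_rate vix bbb_yield house_price_idx
                                stock_idx cre_price_idx yield_spread
         / selection=backward slstay=0.05;
   output out=work.rec_prob p=p_recession;
run;

/* Step 2: smooth and extend the probability series eight quarters ahead;
   qtr must be a SAS date variable carried through from the input         */
proc esm data=work.rec_prob out=work.rec_forecast lead=8;
   id qtr interval=qtr;
   forecast p_recession / model=simple;
run;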
V
Paper 1400-2015:
Validation and Monitoring of PD Models for Low Default Portfolios Using PROC MCMC
A bank that wants to use the Internal Ratings Based (IRB) methods to calculate minimum Basel capital requirements has to calculate default probabilities (PDs) for all its obligors. Supervisors are mainly concerned about credit risk being underestimated. For high-quality exposures or groups with an insufficient number of obligors, calculations based on historical data may not be sufficiently reliable because of infrequent or no observed defaults. In an effort to solve the problem of default data scarcity, modeling assumptions are made, and, to control the possibility of model risk, a high level of conservatism is applied. Banks, on the other hand, are more concerned about PDs that are too pessimistic, since this has an impact on their pricing and economic capital. In small samples, or where we have little or no defaults, the data provide very little information about the parameters of interest. Incorporating prior information or expert judgment and using Bayesian parameter estimation can be a very useful approach in such a situation. Using PROC MCMC, we show that a Bayesian approach can serve as a valuable tool for the validation and monitoring of PD models for low default portfolios (LDPs). We cover cases ranging from single-period, zero-correlation, zero-observed-default settings to multi-period, non-zero-correlation settings with few observed defaults. A minimal PROC MCMC sketch follows this entry.
Read the paper (PDF).
Machiel Kruger, North-West University
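A minimal sketch of the single-period, zero-observed-defaults case: a binomial likelihood for the default count with a conservative beta prior on the PD, estimated in PROC MCMC. The portfolio size and prior are illustrative expert-judgment assumptions, not the author's settings.

data work.ldp;
   defaults = 0;  n_obligors = 250;    /* one period, no observed defaults */
run;

proc mcmc data=work.ldp nmc=50000 nbi=5000 seed=2015 outpost=work.posterior;
   parms pd 0.01;
   prior pd ~ beta(1, 99);             /* prior mean of 1% (assumption)    */
   model defaults ~ binomial(n_obligors, pd);
run;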