With increasing regulatory emphasis on using more scientific statistical processes and procedures in the Bank Secrecy Act/Anti-Money Laundering (BSA/AML) compliance space, financial institutions are being pressured to replace their heuristic, rule-based customer risk rating models with well-established, academically supported, statistically based models. As part of their enhanced customer due diligence, firms are expected to both rate and monitor every customer for the overall risk that the customer poses. Firms with ineffective customer risk rating models can face regulatory enforcement actions such as matters requiring attention (MRAs), consent orders issued by the Office of the Comptroller of the Currency (OCC) for federally chartered banks, and similar actions taken by the Federal Deposit Insurance Corporation (FDIC) against state-chartered banks. Although there is a reasonable amount of information available that discusses the use of statistically based models and adherence to the OCC bulletin Supervisory Guidance on Model Risk Management (OCC 2011-12), there is only limited material about the specific statistical techniques that financial institutions can use to rate customer risk. This paper discusses some of these techniques; compares heuristic, rule-based models and statistically based models; and suggests ordinal logistic regression as an effective statistical modeling technique for assessing customer BSA/AML compliance risk. In discussing the ordinal logistic regression model, the paper addresses data quality and the selection of customer risk attributes, as well as the importance of following the OCC's key concepts for developing and managing an effective model risk management framework. Many statistical models can be used to assign customer risk, but logistic regression, and in this case ordinal logistic regression, is a fairly common and robust statistical method of assigning customers to ordered classifications (such as Low, Medium, High-Low, High-Medium, and High-High risk).
Using ordinal logistic regression, a financial institution can create a customer risk rating model that is effective in assigning risk, justifiable to regulators, and relatively easy to update, validate, and maintain.
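A minimal sketch of the kind of ordinal model the paper discusses, assuming a hypothetical customers data set with a risk_rating response coded 1 (Low) through 5 (High-High) and illustrative predictors; with an ordered multi-level response, PROC LOGISTIC fits the cumulative logit (proportional odds) model by default:

   proc logistic data=customers order=internal;
      class occupation geography_risk / param=ref;
      /* ordered response: 1=Low, 2=Medium, 3=High-Low, 4=High-Medium, 5=High-High */
      model risk_rating = occupation geography_risk product_count wire_volume;
   run;

The score test for the proportional odds assumption in the output offers one quick check that the ordinal specification is reasonable.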
Edwin Rivera, SAS
Jim West, SAS
In Scotia-Colpatria Bank, the retail segment is very important. The volume of lending applications makes it necessary to use statistical models and analytic tools to make an initial selection of good customers, whom our credit analysts then study in depth before finally approving or denying a credit application. Constructing target vintages using the Cox model generates past-due alerts sooner, so mitigation measures can be applied one or two months earlier than they are currently. This can reduce losses by 100 bps in the new vintages. This paper estimates a Cox proportional hazards model and compares the results with a logit model for a specific product of the bank. Additionally, we estimate the target vintage for the product.
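A minimal sketch of the comparison, assuming hypothetical variable names for the application data (a time to first past-due event in months, a 0/1 censoring flag, and a few application attributes):

   /* Cox proportional hazards model: time to first past-due event */
   proc phreg data=applications;
      class product_type / param=ref;
      model time_to_delinquency*censored(1) = score income ltv product_type;
   run;

   /* logit benchmark: ever past due within the observation window */
   proc logistic data=applications descending;
      class product_type / param=ref;
      model past_due_flag = score income ltv product_type;
   run;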
Ivan Atehortua Rojas, Scotia - Colpatria Bank
In our management and collection area, there was no methodology that provided the optimal number of collection calls needed to get a customer to make the minimum payment on his or her financial obligation. We wanted to determine the optimal number of calls using the data envelopment analysis (DEA) optimization methodology. Using this methodology, we obtained results that positively impacted the way our customers were contacted. We can maintain a healthy bank-customer relationship, keep management and collection at an operational level, and obtain a more effective and efficient portfolio recovery. The DEA optimization methodology has been used successfully in various fields of manufacturing production. It has solved multi-criteria optimization problems, but it has not been commonly used in the financial sector, especially in the collection area. This methodology requires specialized software, such as SAS® Enterprise Guide® and its robust optimization capabilities. In this presentation, we use PROC OPTMODEL and show how to formulate the optimization problem, create the program, and process the available data.
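A compact single-input, single-output CCR formulation in PROC OPTMODEL, with made-up numbers purely for illustration (the paper's actual model uses the bank's collection data and additional criteria); one LP is solved per decision-making unit:

   proc optmodel;
      set DMUS = 1..4;                        /* e.g., four collection teams */
      num calls{DMUS} = [120 150 100 180];    /* input: calls placed */
      num paid {DMUS} = [ 30  45  20  40];    /* output: customers who paid */
      num eff{DMUS};
      num o;                                  /* DMU currently being evaluated */

      var u >= 1e-6, v >= 1e-6;               /* output and input weights */
      max Efficiency = u*paid[o];
      con Normalize: v*calls[o] = 1;
      con Envelope{j in DMUS}: u*paid[j] - v*calls[j] <= 0;

      for {k in DMUS} do;
         o = k;
         solve with lp;
         eff[k] = Efficiency;
      end;
      print eff;
   quit;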
Jenny Lancheros, Banco Colpatria of Scotiabank Group
Ana Nieto, Banco Colpatria of Scotiabank Group
The importance of econometrics in the analytics toolkit is increasing every day. Econometric modeling helps uncover structural relationships in observational data. This paper highlights the many recent changes to the SAS/ETS® portfolio that increase your power to explain the past and predict the future. Examples show how you can use Bayesian regression tools for price elasticity modeling, use state space models to gain insight from inconsistent time series, use panel data methods to help control for unobserved confounding effects, and much more.
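For instance, the panel data methods mentioned above could look something like the following sketch, which fits a one-way fixed-effects model to a hypothetical firm-by-year data set to absorb unobserved firm-level confounders:

   proc panel data=firm_panel;
      id firm_id year;                        /* cross-section and time identifiers */
      model sales = price ad_spend / fixone;  /* one-way (cross-sectional) fixed effects */
   run;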
Mark Little, SAS
Kenneth Sanford, SAS
At Multibanca Colpatria of Scotiabank, we offer a broad range of financial services and products in Colombia. In collection management, we currently manage more than 400,000 customers each month. In the call center, agents record the answers from each contact with the customer, and this information is saved in databases. However, this information has not been explored to learn more about our customers and our own operation. The objective of this paper is to develop a classification model using the words in each customer's answers to calls about receiving payment. Using a combination of text mining and clustering methodologies, we identify the possible conversations that can occur in each stage of delinquency. This knowledge makes it possible to develop specialized scripts for collection management.
Oscar Ayala, Colpatria
Jenny Lancheros, Banco Colpatria of Scotiabank Group
Dashboards are an effective tool for analyzing and summarizing the large volumes of data required to manage loan portfolios. Effective dashboards must highlight the most critical drivers of risk and performance within the portfolios and must be easy to use and implement. Developing dashboards often requires integrating data, analysis, or tools from different software platforms into a single, easy-to-use environment. FI Consulting has developed a Credit Modeling Dashboard in Microsoft Access that integrates complex SAS-based models into an easy-to-use, point-and-click interface. The dashboard prepares data for and executes the back-end SAS models through command-line calls issued from Microsoft Access with Visual Basic for Applications (VBA). The Credit Modeling Dashboard developed by FI Consulting represents a simple and effective way to supply critical business intelligence in an integrated, easy-to-use platform without requiring investment in new software or rebuilding existing SAS tools already in use.
Jeremy D'Antoni, FI Consulting
Because of the variety of card holders' behavior patterns and income sources, each consumer account can move among different states, such as non-active, transactor, revolver, delinquent, and defaulted, and each account requires an individual model for predicting the income it generates. Estimating the transition probabilities between states at the account level helps to avoid the lack of memory in the Markov decision process (MDP) approach. The key question is which approach gives more accurate results: multinomial logistic regression or a multistage decision tree of binary logistic regressions. This paper investigates approaches to credit card profitability estimation at the account level based on multistate conditional probabilities, using the SAS/STAT procedure PROC LOGISTIC. Both models show moderate, but not strong, predictive power. Prediction accuracy for the decision tree depends on the order of stages for the conditional binary logistic regressions. Current development is concentrated on discrete choice models such as the nested logit with PROC MDC.
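As a sketch of the first of the two competing approaches, a multinomial (generalized logit) model for the next account state can be fit with PROC LOGISTIC; the variable names here are hypothetical:

   proc logistic data=accounts;
      class current_state / param=ref;
      model next_state(ref='non-active') = current_state utilization payment_ratio mob
            / link=glogit;
   run;

The multistage alternative replaces this single model with a sequence of binary PROC LOGISTIC fits, one for each split in the decision tree of states.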
Denys Osipenko, The University of Edinburgh
Jonathan Crook
In today's competitive world, acquiring new customers is crucial for businesses, but what if most of the acquired customers turn out to be defaulters? The decision would backfire on the business and might lead to losses. Extant statistical methods have enabled businesses to identify good-risk customers rather than judging them intuitively. The objective of this paper is to build a credit risk scorecard using the Credit Risk Node inside SAS® Enterprise Miner™ 12.3, which a manager can use to make an instant decision on whether to accept or reject a customer's credit application. The data set used for credit scoring was extracted from the UCI Machine Learning Repository and consists of 15 variables that capture details such as the status of the customer's existing checking account, purpose of the credit, credit amount, employment status, and property. To ensure generalization of the model, the data set was partitioned using the Data Partition node into training and validation groups in a 70:30 ratio. The target is a binary variable that categorizes customers into good-risk and bad-risk groups. After identifying the key variables required to generate the credit scorecard, a score was assigned to each of their subgroups. The final model generating the scorecard has a prediction accuracy of about 75%. A cumulative cut-off score of 120 was generated by SAS to mark the demarcation between good-risk and bad-risk customers. Even in the case of future variations in the data, model refinement is easy because the whole process is already defined and does not need to be rebuilt from scratch.
Ayush Priyadarshi, Oklahoma State University
Kushal Kathed, Oklahoma State University
Shilpi Prasad, Oklahoma State University
The purpose of this paper is to introduce a SAS® macro named %DOUBLEGLM that enables users to model the mean and dispersion jointly using the double generalized linear models described in Nelder (1991) and Lee (1998). The R functions FITJOINT and DGLM (R Development Core Team, 2011) were used to verify the suitability of the %DOUBLEGLM macro estimates, and the results showed that the macro's estimates closely matched those produced by the R functions.
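The macro itself is described in the paper; as background, the double GLM idea alternates a GLM for the mean with a gamma GLM for the dispersion fitted to the squared deviance residuals. One (non-iterated) pass of that scheme, sketched with PROC GENMOD and hypothetical variables y, x1, x2 (mean model) and z1, z2 (dispersion model), might look like this:

   proc genmod data=mydata;                        /* step 1: mean model */
      model y = x1 x2 / dist=normal link=identity;
      output out=step1 resdev=rdev;
   run;

   data step1;                                     /* squared deviance residuals */
      set step1;
      d = rdev**2;
   run;

   proc genmod data=step1;                         /* step 2: dispersion model */
      model d = z1 z2 / dist=gamma link=log;
      output out=step2 p=phi_hat;
   run;

   data step2;                                     /* inverse fitted dispersion as weight */
      set step2;
      w = 1/phi_hat;
   run;

   proc genmod data=step2;                         /* step 3: re-fit mean model, weighted */
      weight w;
      model y = x1 x2 / dist=normal link=identity;
   run;

In practice this alternation is iterated to convergence, which is what a joint mean-dispersion fit automates.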
Paulo Silva, Universidade de Brasilia
Alan Silva, Universidade de Brasilia
Is it a better business decision to determine the profitability of all business units/kiosks and then decide to prune the nonprofitable ones? Or does model performance improve if we first find the units that meet the break-even point and then try to calculate their profits? In our project, we used a two-stage regression process because of the highly skewed distributions of the variables. First, we performed logistic regression to predict which kiosks would be profitable. Then, we used linear regression to predict the average monthly revenue at each kiosk. We used SAS® Enterprise Guide® and SAS® Enterprise Miner™ for the modeling process. The linear regression model is much more effective at predicting the target variable for profitable kiosks than for unprofitable ones. The two-phase regression model seemed to perform better than simply performing a linear regression, particularly when the target variable has too many levels. In real-life situations, the dependent and independent variables can have highly skewed distributions, and two-phase regression can help improve model performance and accuracy. Some results: The logistic regression model has an overall accuracy of 82.9%, sensitivity of 92.6%, and specificity of 61.1%, with comparable figures for the training data set of 81.8%, 90.7%, and 63.8%, respectively. This indicates that the logistic regression model is consistently predicting the profitable kiosks at a reasonably good level. For the linear regression model, comparing the predicted values (not log-transformed) of the target with the actual values, the mean absolute percentage error (MAPE) on the training data set is 7.2% for kiosks that earn more than $350, versus -102% for kiosks that earn less than $350. For the validation data set, the MAPE is 7.6% for kiosks that earn more than $350, versus -142% for kiosks that earn less than $350. This means that the average monthly revenue figures are better predicted for kiosks earning more than the threshold value of $350--that is, for kiosks with a flag variable of 1. The model predicts the target variable with lower APE for higher values of the target variable, for both the training data set and the entire data set. In fact, if the threshold value for the kiosks is moved to, say, $500, the predictive power of the model in terms of APE increases substantially. The validation data set (Selection Indicator=0) has fewer data points, and, therefore, the contrast in APEs is higher and more varied.
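A bare-bones version of the two-stage setup, with hypothetical variable names (a profitable_flag target for stage one and a log-transformed monthly revenue for stage two):

   /* stage 1: which kiosks will be profitable? */
   proc logistic data=kiosks descending;
      model profitable_flag = footfall rent tenure;
      output out=stage1 p=p_profit;
   run;

   /* stage 2: expected monthly revenue, fit on the kiosks classified as profitable */
   proc reg data=stage1;
      where p_profit >= 0.5;
      model log_revenue = footfall rent tenure;
   run; quit;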
Shrey Tandon, Sobeys West
In data mining, data preparation is the most crucial, most difficult, and longest part of the process. Many steps are involved: examining the distributions of the variables, diagnosing and reducing the influence of multicollinearity among variables, imputing missing values, and constructing categories within variables. In this presentation, we use data mining models in different areas, such as marketing, insurance, retail, and credit risk. We show how to implement data preparation in SAS® Enterprise Miner™ using different approaches, from simple code routines to complex processes involving statistical insights, variable clustering, variable transformation, graphical analysis, decision trees, and more.
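Outside of SAS Enterprise Miner, two of the preparation steps mentioned above can be sketched in a few lines of code (variable names are hypothetical): median imputation of missing numeric values with PROC STDIZE, and variable clustering with PROC VARCLUS to tame multicollinearity.

   /* impute missing numeric values with the median, leaving non-missing values unchanged */
   proc stdize data=raw out=imputed method=median reponly;
      var income balance utilization age;
   run;

   /* cluster correlated inputs; keep one representative variable per cluster */
   proc varclus data=imputed maxeigen=0.7 short;
      var income balance utilization age tenure n_products;
   run;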
Ricardo Galante, SAS
Since the financial crisis of 2008, banks and bank holding companies in the United States have faced increased regulation. One of the recent additions to these regulations is known as the Comprehensive Capital Analysis and Review (CCAR). At the core of these new regulations, specifically under the Dodd-Frank Wall Street Reform and Consumer Protection Act and the stress tests it mandates, is a series of what-if, or scenario, analysis requirements that involve a number of scenarios provided by the Federal Reserve. This paper proposes frequentist and Bayesian time series methods that address this stress testing problem using a highly practical top-down approach. The paper focuses on the value of using univariate time series methods, as well as the methodology behind these models.
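As one illustration of a top-down, frequentist approach (not necessarily the exact specification in the paper), a portfolio-level loss rate can be regressed on the supervisory macro variables with autoregressive errors; variable names here are hypothetical.

   proc autoreg data=history;
      model chargeoff_rate = real_gdp_growth unemployment_rate hpi_growth
            / nlag=1 method=ml;
      output out=fitted p=predicted;   /* scenario quarters appended with a missing
                                          response receive forecasts in this data set */
   run;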
Kenneth Sanford, SAS
Christian Macaro, SAS
This paper explains how to build a linear regression model using the variable transformation method. It also covers testing the assumptions required for linear modeling and assessing the fit of the resulting model. This paper is intended for analysts who have limited exposure to building linear models and uses the REG, GLM, CORR, UNIVARIATE, and GPLOT procedures.
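The flavor of the workflow, with hypothetical loan-level variables: inspect the response distribution with PROC UNIVARIATE, transform if it is skewed, then fit and diagnose with PROC REG.

   proc univariate data=loans;
      var balance;
      histogram balance / normal lognormal;   /* is a log transform warranted? */
   run;

   data loans_t;
      set loans;
      log_balance = log(balance);             /* assumes positive balances */
   run;

   proc reg data=loans_t plots=diagnostics;
      model log_balance = income utilization tenure;
   run; quit;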
Nancy Hu, Discover
Banks can create a competitive advantage in their business by using business intelligence (BI) and by building models. In the credit domain, the best practice is to build risk-sensitive models (Probability of Default, Exposure at Default, Loss Given Default, Unexpected Loss, Concentration Risk, and so on) and implement them in decision-making, credit granting, and credit risk management. On the next level, there are models and tools built on top of these models that are used to help achieve business targets: risk-sensitive pricing, capital planning, optimization of ROE/RAROC, credit portfolio management, setting the level of provisions, and so on. This works remarkably well as long as the models work. However, over time, models deteriorate, and their predictive power can drop dramatically. Since the global financial crisis in 2008, we have faced a tsunami of regulation and an accelerated frequency of changes in the business environment, which cause models to deteriorate faster than ever before. As a result, heavy reliance on models in decision-making (some decisions are automated following the models' results, without human intervention) might result in a huge error that can have dramatic consequences for the bank's performance. In my presentation, I share our experience in reducing model risk and establishing corporate governance of models with the following SAS® tools: model monitoring, SAS® Model Manager, dashboards, and SAS® Visual Analytics.
Boaz Galinson, Bank Leumi
Operational risk losses are heavy tailed and likely to be asymmetric and extremely dependent among business lines and event types. We propose a new methodology to assess, in a multivariate way, the asymmetry and extreme dependence between severity distributions and to calculate the capital for operational risk. This methodology simultaneously uses several parametric distributions and an alternative mixture distribution (the lognormal for the body of losses and the generalized Pareto distribution for the tail) via extreme value theory using SAS®; the multivariate skew t-copula, applied for the first time to operational losses; and Bayesian inference theory to estimate new n-dimensional skew t-copula models via Markov chain Monte Carlo (MCMC) simulation. This paper analyzes a new operational loss data set, SAS® Operational Risk Global Data (SAS OpRisk Global Data), to model operational risk at international financial institutions. All of the severity models are constructed in SAS® 9.2 using PROC SEVERITY and PROC NLMIXED, and this paper describes that implementation.
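A stripped-down sketch of the severity-fitting step (candidate parametric distributions compared by a fit criterion), assuming a loss data set with one loss amount per record; the paper's full approach additionally splices a lognormal body with a GPD tail and applies the copula and Bayesian machinery described above.

   proc severity data=oploss crit=aicc;
      loss loss_amount;
      dist logn gpd burr weibull;   /* candidate severity distributions */
   run;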
Betty Johanna Garzon Rozo, The University of Edinburgh
Customer Long-Term Value (LTV) is a concept that is readily explained at a high level to marketing management of a company, but its analytic development is complex. This complexity involves the need to forecast customer behavior well into the future. This behavior includes the timing, frequency, and profitability of a customer's future purchases of products and services. This paper describes a method for computing LTV. First, a multinomial logistic regression provides probabilities for time-of-first-purchase, time-of-second-purchase, and so on, for each customer. Then the profits for the first purchase, second purchase, and so on, are forecast but only after adjustment for non-purchaser selection bias. Finally, these component models are combined in the LTV formula.
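For orientation, one common discounted form of an LTV calculation (an assumption for illustration, not necessarily the exact formula developed in the paper) combines the purchase-timing probabilities with the forecast purchase profits:

   LTV = sum_{t=1..T} [ p_t * m_t / (1 + d)^t ]

where p_t is the modeled probability that the customer purchases in period t, m_t is the forecast profit of that purchase after the selection-bias adjustment, and d is the discount rate.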
Bruce Lund, Marketing Associates, LLC
Risk managers and traders know that some knowledge loses its value quickly. Unfortunately, because risk calculations are computationally intensive, most risk managers work with stale data. Knowing your positions and risk intraday can provide immense value. Imagine knowing the portfolio risk impact of a trade before you execute it. This paper shows you a path to real-time risk analysis that leverages capabilities from SAS® Event Stream Processing Engine and SAS® High-Performance Risk. Event stream processing (ESP) offers the ability to process large amounts of data with high throughput and low latency, including streaming real-time trade data from front-office systems into a centralized risk engine. SAS High-Performance Risk enables robust, complex portfolio valuations and risk calculations quickly and accurately. In this paper, we present techniques and demonstrate concepts that enable you to use these capabilities together more efficiently. We also show techniques for analyzing SAS High-Performance Risk data with SAS® Visual Analytics.
Albert Hopping, SAS
Arvind Kulkarni, SAS
Ling Xiang, SAS
As part of regulatory compliance requirements, banks are required to submit Microsoft Excel-based reports that follow templates supplied by the regulators. This poses several challenges, including the high complexity of the templates, the fact that implementation using ODS can be cumbersome, and the difficulty of keeping up with regulatory changes and supporting dynamic report content. At the same time, you need the flexibility to customize and schedule these reports as per your business requirements. This paper discusses an approach to building these reports using SAS® XML Mapper and the Excel XML spreadsheet format. This approach provides an easy-to-use framework that can accommodate template changes from the regulators without needing to modify the code. It is implemented using SAS® technologies, giving you the flexibility to customize it to your needs, and it is also easy to maintain.
Sarita Kannarath, SAS
Phil Hanna, SAS
Amitkumar Nakrani, SAS
Nishant Sharma, SAS
As a consequence of the financial crisis, banks are required to stress test their balance sheet and earnings based on prescribed macroeconomic scenarios. In the US, this exercise is known as the Comprehensive Capital Analysis and Review (CCAR) or Dodd-Frank Act Stress Testing (DFAST). In order to assess capital adequacy under these stress scenarios, banks need a unified view of their projected balance sheet, incomes, and losses. In addition, the bar for these regulatory stress tests is very high regarding governance and overall infrastructure. Regulators and auditors want to ensure that the granularity and quality of data, model methodology, and assumptions reflect the complexity of the banks. This calls for close internal collaboration and information sharing across business lines, risk management, and finance. Currently, this process is managed in an ad hoc, manual fashion. Results are aggregated from various lines of business using spreadsheets and Microsoft SharePoint. Although the spreadsheet option provides flexibility, it brings ambiguity into the process and makes the process error-prone and inefficient. This paper introduces a new SAS® stress testing solution that can help banks define, orchestrate, and streamline the stress testing process for easier traceability, auditability, and reproducibility. The integrated platform provides greater control, efficiency, and transparency to the CCAR process. This will enable banks to focus on more value-added analysis such as scenario exploration, sensitivity analysis, capital planning and management, and model dependencies. Lastly, the solution was designed to leverage existing in-house platforms that banks might already have in place.
Wei Chen, SAS
Shannon Clark
Erik Leaver, SAS
John Pechacek
The era of mass marketing is over. Welcome to the new age of relevant marketing, where whispering matters far more than shouting. At ZapFi, using the combination of sponsored free Wi-Fi and real-time consumer analytics, we help businesses to better understand who their customers are. This gives businesses the opportunity to send highly relevant marketing messages based on the profile and the location of the customer. It also leads to new ways to build deeper and more intimate, one-on-one relationships between the business and the customer. During this presentation, ZapFi will use a few real-world examples to demonstrate that the future of mobile marketing is much more about data and far less about advertising.
Gery Pollet, ZapFi
In 2014, for the first time, mid-market banks (consisting of banks and bank holding companies with $10-$50 billion in consolidated assets) were required to submit capital stress tests to the federal regulators under Dodd-Frank Act Stress Testing (DFAST). This is a process that large banks have been going through since 2011. However, mid-market banks are not positioned to commit as many resources to their annual stress tests as their largest peers. Limited human and technical resources, incomplete or non-existent detailed historical data, lack of enterprise-wide cross-functional analytics teams, and limited exposure to rigorous model validations are all challenges mid-market banks face. While there are fewer deliverables required from the DFAST banks, the scrutiny the regulators are placing on the analytical models is just as high as their expectations for Comprehensive Capital Analysis and Review (CCAR) banks. This session discusses the differences in how DFAST and CCAR banks execute their stress tests, the challenges facing DFAST banks, and potential ways DFAST banks can leverage the analytics behind this exercise.
Charyn Faenza, F.N.B. Corporation
Sampling for audits and forensics presents special challenges. Each survey or sample item requires examination by a team of professionals, so sample size must be contained. Surveys involve estimation, not hypothesis testing, so power is not a helpful concept. Stratification and modeling are often required to keep sampling distributions from being skewed. A precision of alpha is not required to create a confidence interval of 1-alpha, but how small a sample is supportable? Many times, replicated sampling is required to prove the applicability of the design. Given the robust, programming-oriented approach of SAS®, the random selection, stratification, and optimization techniques built into SAS can be used to bring transparency and reliability to the sample design process. While a sample that is used in a published audit or as a measure of financial damages must endure special scrutiny, it can be a rewarding process to design a sample whose performance you truly understand and which will stand up under a challenge.
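A small sketch of how the built-in selection tools support this kind of design: a stratified simple random sample with replicates, drawn with PROC SURVEYSELECT from a hypothetical audit frame.

   proc surveyselect data=claim_frame out=audit_samples
                     method=srs rate=0.05 reps=200 seed=20158;
      strata program_office;   /* stratify to keep the design balanced */
   run;

The REPS= option produces the replicated samples used to study how the design behaves before it is fielded.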
Turner Bond, HUD-Office of Inspector General
The vast and increasing demands of fraud detection and description have promoted the broad application of statistics and machine learning in fields as diverse as banking, credit card application and usage, insurance claims, trader surveillance, health care claims, and government funding and allowance management. SAS® Visual Scenario Designer enables you to derive interactive business rules, along with descriptive and predictive models, to detect and describe fraud. This paper focuses on building interactive decision trees to classify fraud. Attention to optimizing the feature space (candidate predictors) prior to modeling is also covered. Because big data plays an increasingly vital role in fraud detection and description, SAS Visual Scenario Designer leverages the in-memory, parallel, and distributed computing abilities of SAS® LASR™ Analytic Server as a back end to support real-time performance on massive amounts of data.
Yue Qi, SAS
The measurement of factors influencing consumer purchasing decisions is of interest to all manufacturers of goods, the retailers who sell them, and the consumers who buy them. In the past decade, conjoint analysis has become one of the commonly used statistical techniques for analyzing the decisions, or trade-offs, consumers make when they purchase products. Although recent years have seen increased use of conjoint analysis and conjoint software, there is limited work that spells out a systematic procedure for doing a conjoint analysis or for using conjoint software. This paper reviews basic conjoint analysis concepts, describes the mathematical and statistical framework on which conjoint analysis is built, and introduces the TRANSREG and PHREG procedures, their syntax, and the output they generate, using simplified real-life data examples. The paper concludes by highlighting some of the substantive issues related to the application of conjoint analysis in a business environment and the autocall macros available in SAS/STAT®, SAS/IML®, and SAS/QC® software that can handle more complex conjoint designs and analyses. The paper will benefit basic SAS users as well as statisticians and research analysts in every industry, especially those in marketing and advertising.
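A minimal ratings-based conjoint fit with PROC TRANSREG, using hypothetical product attributes; part-worth utilities are requested with the UTILITIES option and the standard ZERO=SUM effects coding:

   proc transreg data=ratings utilities;
      model identity(rating) = class(brand price warranty / zero=sum);
      output out=part_worths;
   run;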
Delali Agbenyegah, Alliance Data Systems
Credit card usage modelling is a relatively new area of customer predictive analytics compared to risk modelling such as credit scoring. The credit limit utilization rate is a bounded outcome that is highly dependent on customer behavior. Proportion prediction techniques are widely used for Loss Given Default estimation in credit risk modelling (Belotti and Crook, 2009; Arsova et al., 2011; Van Berkel and Siddiqi, 2012; Yao et al., 2014). This paper investigates several regression models for the utilization rate that respect these outcome limits and provides a comparative analysis of the predictive accuracy of the methods. The regression models are fit in SAS/STAT® using PROC REG, PROC LOGISTIC, PROC NLMIXED, and PROC GLIMMIX, along with SAS® macros for model evaluation. The conclusion recommends utilization rate prediction techniques based on the empirical analysis.
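One of several candidate specifications for a bounded rate outcome, a beta regression with a logit link, can be written directly in PROC GLIMMIX (variable names hypothetical; the response must lie strictly between 0 and 1):

   proc glimmix data=cards;
      model util_rate = credit_limit income behavior_score
            / dist=beta link=logit solution;
   run;

Observations at exactly 0 or 1 are usually nudged inside the interval or handled with a separate component of the model.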
Denys Osipenko, The University of Edinburgh
Jonathan Crook
Did you ever wonder how large US bank holding companies (BHCs) perform stress testing? I had the pleasure of being part of this process on the model-building end, and now I perform model validation. As with everything that is new and uncertain, there is much room for the discovery process. This presentation explains how banks in general perform time series modeling of different loans and credits to establish the bank's position under simulated stress. You learn the basic process behind model building and validation for Comprehensive Capital Analysis and Review (CCAR) purposes, which includes, but is not limited to, back-testing, sensitivity analysis, scenario analysis, and model assumption testing. My goal is to gain your interest in challenging current modeling techniques and looking beyond standard model assumption testing to assess the true risk behind the formulated model and its consequences. This presentation examines the procedures that happen behind the scenes of any code's syntax to better explore the statistics that play crucial roles in assessing model performance and forecasting. Forecasting future periods is the process that needs more attention and better understanding, because this is what CCAR is really all about. In summary, this presentation encourages professionals and students to dig deeper into every aspect of time series forecasting.
Ania Supady, KeyCorp
During the financial crisis of 2007-2009, the U.S. labor market lost 8.4 million jobs, causing the unemployment rate to increase from 5% to 9.5%. One indicator of economic recession is negative gross domestic product (GDP) growth for two consecutive quarters. This poster combines quantitative and qualitative techniques to predict economic downturns by forecasting recession probabilities. Data was collected from the Board of Governors of the Federal Reserve System and the Federal Reserve Bank of St. Louis, containing 29 variables and quarterly observations from 1976-Q1 to 2013-Q3. Eleven variables were selected as inputs based on their effects on recession and to limit multicollinearity: long-term Treasury yields (5-year and 10-year), mortgage rate, CPI inflation rate, prime rate, market volatility index, BBB corporate yield, house price index, stock market index, commercial real estate price index, and one calculated variable, the yield spread (Treasury yield-curve spread). The target variable was a binary variable depicting the economic recession for each quarter (1=Recession). Data was prepared for modeling by applying imputation and transformation to the variables. A two-step analysis was used to forecast the recession probabilities for the short-term period. Predicted recession probabilities were first obtained from a backward-elimination logistic regression model that was selected on the basis of misclassification (validation misclassification=0.115). These probabilities were then forecasted using the exponential smoothing method that was selected on the basis of mean absolute error (MAE=11.04). Results show the recession periods, including the great recession of 2008, and the forecast for eight quarters (up to 2015-Q3).
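The two-step structure described above can be sketched as follows, with hypothetical variable names: backward-elimination logistic regression to score quarterly recession probabilities, then exponential smoothing of those probabilities with PROC ESM for an eight-quarter horizon.

   proc logistic data=macro_q descending;
      model recession = yield_spread mortgage_rate vix prime_rate hpi
            / selection=backward slstay=0.05;
      output out=scored p=p_recession;
   run;

   proc esm data=scored outfor=recession_fcst lead=8;
      id qtr interval=qtr;             /* qtr is a SAS date variable */
      forecast p_recession / model=simple;
   run;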
Avinash Kalwani, Oklahoma State University
Nishant Vyas, Oklahoma State University
Developing a quality product or service while at the same time improving cost management and maximizing profit is a challenging goal for any company. Finding the optimal balance between efficiency and profitability is not easy. The same can be said of developing a predictive statistical model. On the one hand, the model should predict as accurately as possible. On the other hand, having too many predictors can end up costing the company money. One of the purposes of this project is to explore the cost of simplicity. When is it worth having a simpler model, and what are some of the costs of using a more complex one? The answer to that question leads us to another one: How can a predictive statistical model be maximized in order to increase a company's profitability? Using data from the consumer credit risk domain provided by CompuCredit (now Atlanticus), we used logistic regression to build binary classification models to predict the likelihood of default. This project compares two of these models. Although the original data set had several hundred predictor variables and more than a million observations, I chose to use rather simple models. My goal was to develop a model with as few predictors as possible, while not going below a concordance level of 80%. Two models were evaluated and compared based on efficiency, simplicity, and profitability. Using the selected model, cluster analysis was then performed in order to maximize the estimated profitability. Finally, the analysis was taken one step further through a supervised segmentation process, in order to target the most profitable segment of the best cluster.
Sherrie Rodriguez, Kennesaw State University
A bank that wants to use the Internal Ratings Based (IRB) methods to calculate minimum Basel capital requirements has to calculate default probabilities (PDs) for all its obligors. Supervisors are mainly concerned about the credit risk being underestimated. For high-quality exposures or groups with an insufficient number of obligors, calculations based on historical data may not be sufficiently reliable due to infrequent or no observed defaults. In an effort to solve the problem of default data scarcity, modeling assumptions are made, and to control the possibility of model risk, a high level of conservatism is applied. Banks, on the other hand, are more concerned about PDs that are too pessimistic, since this has an impact on their pricing and economic capital. In small samples or where we have little or no defaults, the data provides very little information about the parameters of interest. The incorporation of prior information or expert judgment and using Bayesian parameter estimation can potentially be a very useful approach in a situation like this. Using PROC MCMC, we show that a Bayesian approach can serve as a valuable tool for validation and monitoring of PD models for low default portfolios (LDPs). We cover cases ranging from single-period, zero correlation, and zero observed defaults to multi-period, non-zero correlation, and few observed defaults.
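In its simplest form (single period, zero correlation), the Bayesian PD estimate reduces to a binomial likelihood with a prior on the default probability; a minimal PROC MCMC sketch, with an illustrative conservative prior, might read:

   data lpd;                       /* e.g., 250 obligors, zero observed defaults */
      input n_obligors n_defaults;
      datalines;
   250 0
   ;

   proc mcmc data=lpd nmc=50000 seed=2015 outpost=pd_post;
      parms pd 0.01;
      prior pd ~ beta(1, 99);      /* illustrative expert prior with mean 1% */
      model n_defaults ~ binomial(n_obligors, pd);
   run;

Posterior quantiles from the pd_post data set can then serve as conservative PD benchmarks for validation and monitoring.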
Machiel Kruger, North-West University
Network diagrams in SAS® Visual Analytics help highlight relationships in complex data by enabling users to visually correlate entire populations of values based on how they relate to one another. Network diagrams are appealing because they enable an analyst to visualize large volumes of data and their relationships and to assign multiple roles, such as node size, node color, link size, and link color, to represent key factors for analysis. SAS Visual Analytics can overlay a network diagram on top of a geographic map for an even more appealing visualization. This paper focuses specifically on how to prepare data for network diagrams and how to build network diagrams in SAS Visual Analytics. It provides two real-world examples illustrating how to visualize users and groups from SAS® metadata and how banks can visualize transaction flow using network diagrams.
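The data preparation step typically reduces to building one table of nodes and one table of links with source and target columns; a hypothetical transaction-flow example:

   data nodes;                               /* one row per account */
      set accounts(keep=account_id region balance);
   run;

   data links;                               /* one row per money movement */
      set transactions(keep=from_account to_account amount);
      rename from_account=source to_account=target amount=link_value;
   run;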
Stephen Overton, Zencos Consulting
Benjamin Zenick, Zencos
Working with multiple data sources in SAS® was not straightforward until PROC FEDSQL was introduced in the SAS® 9.4 release. Federated Query Language, or FedSQL, is a vendor-independent language that provides a common SQL syntax for communicating across multiple relational databases without having to worry about vendor-specific SQL syntax, and PROC FEDSQL is the SAS implementation of that language. PROC FEDSQL enables us to write federated queries that join tables from different databases in a single query, without having to load the tables into SAS individually and combine them using DATA steps and PROC SQL statements. The objective of this paper is to demonstrate how PROC FEDSQL can fetch data from multiple data sources, such as a Microsoft SQL Server database, a MySQL database, and a SAS data set, and run federated queries across all of them. Other powerful features of PROC FEDSQL, such as transactions and the FedSQL pass-through facility, are discussed briefly.
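A sketch of the kind of federated join the paper demonstrates, assuming librefs already assigned to the sources (for example, an ODBC connection to SQL Server as mssql and a MySQL connection as mydb); table and column names are made up:

   proc fedsql;
      create table work.cust_summary as
      select a.customer_id,
             a.credit_limit,
             sum(b.payment_amt) as total_paid
      from   mssql.accounts a
             inner join mydb.payments b
             on a.customer_id = b.customer_id
      group by a.customer_id, a.credit_limit;
   quit;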
Zabiulla Mohammed, Oklahoma State University
Ganesh Kumar Gangarajula, Oklahoma State University
Pradeep Reddy Kalakota, Federal Home Loan Bank of Des Moines