A useful and often overlooked tool released with the SAS® Business Intelligence suite to aid in ETL is SAS® Data Integration Studio. This product gives users the ability to extract, transform, join, and load data from various database management systems (DBMSs), data marts, and other data stores by using a graphical interface, without having to code different credentials for each schema. It enables seamless promotion of code to a production system without the need to alter the code. And it is quite useful for deploying and scheduling jobs by using the schedule manager in SAS® Management Console, because all code created by Data Integration Studio is optimized. Although this tool enables users to create code from scratch, one of its most useful capabilities is that it can take legacy SAS® code and, with minimal alteration, create the code's data associations and give it all the properties of a job coded from scratch.
Erik Larsen, Independent Consultant
The new and highly anticipated SAS® Output Delivery System (ODS) destination for Microsoft Excel is finally here! Available as a production feature in the third maintenance release of SAS® 9.4 (TS1M3), this new destination generates native Excel (XLSX) files that are compatible with Microsoft Office 2010 or later. This paper is written for anyone, from entry-level programmers to business analysts, who uses the SAS® System and Microsoft Excel to create reports. The discussion covers features and benefits of the new Excel destination, differences between the Excel destination and the older ExcelXP tagset, and functionality that exists in the ExcelXP tagset that is not available in the Excel destination. These topics are all illustrated with meaningful examples. The paper also explains how you can bridge the gap that exists as a result of differences in the functionality between the destination and the tagset. In addition, the discussion outlines when it is beneficial for you to use the Excel destination versus the ExcelXP tagset, and vice versa. After reading this paper, you should be able to make an informed decision about which tool best meets your needs.
Chevell Parker, SAS
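A minimal sketch of the destination the abstract describes, assuming a hypothetical output path; the destination writes a native XLSX file when it closes:

   ods excel file="C:\temp\cars.xlsx"
       options(sheet_name="Mileage" embedded_titles="yes");

   title "Average City and Highway MPG by Make";
   proc means data=sashelp.cars mean maxdec=1;
      class make;
      var mpg_city mpg_highway;
   run;

   ods excel close;   /* the XLSX file is written on close */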
For far too long, anti-money laundering and terrorist financing solutions have forced analysts to wade through oceans of transactions and alerted work items (alerts). Alert-centered analysis is both ineffective and costly. The goal of an anti-money laundering program is to reduce risk for your financial institution, and to do this most effectively, you must start with analysis at the customer level, rather than simply troll through volumes of alerts and transactions. In this session, discover how a customer-centric approach leads to increased analyst efficiency and streamlined investigations. Rather than starting with alerts and transactions, starting with a customer-centric view allows your analysts to rapidly triage suspicious activities, prioritize work, and quickly move into investigating the highest risk customer activities.
Kathy Hart, SAS
In the Collection Direction of a well-recognized Colombian financial institution, there was no methodology that provided the optimal number of collection agents needed to improve the collection task and make it possible for more customers to commit to the minimum monthly payment on their debt. The objective of this paper is to apply the Data Envelopment Analysis (DEA) optimization methodology to determine the optimal number of agents needed to maximize the bank's monthly collections. We show that the results can have a positive impact on credit portfolio behavior and reduce collection management costs. DEA has been used successfully in various fields to solve multi-criteria optimization problems, but it is not commonly used in the financial sector, mostly because the methodology requires specialized software, such as SAS® Enterprise Guide®. In this paper, we present the OPTMODEL procedure and show how to formulate the optimization problem, program the SAS® code, and adequately process the available data.
Miguel Díaz, Scotiabank - Colpatria
Oscar Javier Cortés Arrigui, Scotiabank - Colpatria
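A hedged sketch of an input-oriented CCR DEA model in PROC OPTMODEL, re-solving once per decision-making unit; the input data set (one cost input and one collection output per agent group) is hypothetical:

   proc optmodel;
      set <str> AGENTS;
      num cost{AGENTS}, collected{AGENTS};
      read data dea_in into AGENTS=[agent] cost collected;
      str k;                                 /* the unit under evaluation */
      var Theta >= 0, Lambda{AGENTS} >= 0;
      min Eff = Theta;
      con InCon : sum{j in AGENTS} Lambda[j]*cost[j]      <= Theta*cost[k];
      con OutCon: sum{j in AGENTS} Lambda[j]*collected[j] >= collected[k];
      num eff{AGENTS};
      for {a in AGENTS} do;                  /* re-solve once per unit */
         k = a;  solve;  eff[a] = Theta.sol;
      end;
      create data dea_eff from [agent] eff;  /* eff = 1 means efficient */
   quit;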
Currently Colpatria, as part of Scotiabank in Colombia, has several methodologies that give us a vision of the customer from a risk perspective. However, the current trend in the financial sector is toward a global vision that involves aspects of risk as well as profitability and utility. As part of the business strategies for developing cross-sell and customer profitability under risk conditions, it is necessary to create a customer value index that scores each customer according to different groups of key business variables describing that customer's profitability and risk. To generate the index of customer value, we propose constructing a synthetic index using principal component analysis and multiple factorial analysis.
Ivan Atehortua, Colpatria
Diana Flórez, Colpatria
At Royal Bank of Scotland, business intelligence users require sophisticated security permissions both at object level and data (row) level in order to comply with data security, audit, and regulatory requirements. When we rolled out SAS® Visual Analytics to our two main stakeholder groups, this was identified as a key requirement as data is no longer restricted to the desktop but is increasingly available on mobile devices such as tablets and smart phones. Implementing row-level security (RLS) controls, in addition to standard security measures such as authentication, is a most effective final layer in your data authorization process. RLS procedures in leading relational database management systems (RDBMSs) and business intelligence (BI) software are fairly commonplace, but with the emergence of big data and in-memory visualization tools such as SAS® Visual Analytics, those RLS procedures now need to be extended to the memory interface. Identity-driven row-level security is a specific RLS technique that enables the same report query to retrieve different sets of data in accordance with the varying security privileges afforded to the respective users. This paper discusses an automated framework approach for applying identity-driven RLS controls on SAS® Visual Analytics and our plans to implement a generic end-to-end RLS framework extended to the Teradata data warehouse.
Paul Johnson, Sopra Steria
Ekaitz Goienola, SAS
Dileep Pournami, RBS
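As a generic illustration of the identity-driven idea (not RBS's framework), the same query can be filtered per user by joining to a user-to-region mapping table keyed on the connecting identity; in SAS Visual Analytics itself, the analogous condition is expressed on the in-memory table with identity substitutions such as SUB::SAS.Userid. Table names here are hypothetical:

   /* executed in the user's own session, so &sysuserid resolves to that user */
   proc sql;
      create table work.my_rows as
      select c.*
      from dw.customer c
           inner join meta.user_region_map m
             on c.region = m.region
      where upcase(m.userid) = upcase("&sysuserid");
   quit;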
Bayesian inference for complex hierarchical models with smoothing splines is typically intractable, requiring approximate inference methods for use in practice. Markov Chain Monte Carlo (MCMC) is the standard method for generating samples from the posterior distribution. However, for large or complex models, MCMC can be computationally intensive, or even infeasible. Mean Field Variational Bayes (MFVB) is a fast deterministic alternative to MCMC. It provides an approximating distribution that has minimum Kullback-Leibler distance to the posterior. Unlike MCMC, MFVB efficiently scales to arbitrarily large and complex models. We derive MFVB algorithms for Gaussian semiparametric multilevel models and implement them in SAS/IML® software. To improve speed and memory efficiency, we use block decomposition to streamline the estimation of the large sparse covariance matrix. Through a series of simulations and real data examples, we demonstrate that the inference obtained from MFVB is comparable to that of PROC MCMC. We also provide practical demonstrations of how to estimate additional posterior quantities of interest from MFVB either directly or via Monte Carlo simulation.
Jason Bentley, The University of Sydney
Cathy Lee, University of Technology Sydney
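The optimization problem behind MFVB, restated from the abstract: the approximating distribution is the member of a factorized family that is closest to the posterior in Kullback-Leibler divergence,

   \[
   q^{\star} \;=\; \arg\min_{q \in \mathcal{Q}} \,
      \mathrm{KL}\bigl( q(\theta) \,\|\, p(\theta \mid y) \bigr),
   \qquad
   \mathcal{Q} \;=\; \Bigl\{ q : q(\theta) = \prod_{i=1}^{M} q_i(\theta_i) \Bigr\}.
   \]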
Financial institutions rely heavily on quantitative models for risk management, balance-sheet stress testing, and various business analysis and decision support functions. Investment decisions and business strategies are largely driven by estimates from models. Recent financial crises and model failures at high-profile banks have emphasized the need for better modeling practices, and regulators have stepped in to assist banks with enhanced guidance and regulations for effective model risk management. Effective model risk management is more than developing a good model. SAS® Model Risk Management provides a robust framework to capture and track model inventory. In this paper, we present best practices in model risk management learned from implementation projects and interactions with industry experts. These best practices help firms that are setting up a model risk management framework or enhancing their existing practices.
Satish Garla, SAS
Sukhbir Dhillon, SAS
The surge of data and data sources in marketing has created an analytical bottleneck in most organizations. Analytics departments have been pushed into a difficult decision: either purchase black-box analytical tools to generate efficiencies or hire more analysts, modelers, and data scientists. Knowledge gaps stemming from restrictions in black-box tools or from backlogs in the work of analytical teams have resulted in lost business opportunities. Existing big data analytics tools respond well when dealing with large record counts and small variable counts, but they fall short in bringing efficiencies when dealing with wide data. This paper discusses the importance of an agile modeling engine designed to deliver productivity, irrespective of the size of the data or the complexity of the modeling approach.
Mariam Seirafi, Cornerstone Group of Companies
Even though marketing is inevitable in every business, every year the marketing budget is limited and prudent fund allocations are required to optimize marketing investment. In many businesses, the marketing fund is allocated based on the marketing manager's experience, departmental budget allocation rules, and sometimes 'gut feelings' of business leaders. Those traditional ways of budget allocation yield suboptimal results and in many cases lead to money being wasted on irrelevant marketing efforts. Marketing mixed models can be used to understand the effects of marketing activities and identify the key marketing efforts that drive the most sales among a group of competing marketing activities. The results can be used in marketing budget allocation to take out the guesswork that typically goes into the budget allocation. In this paper, we illustrate practical methods for developing and implementing marketing mixed modeling using SAS® procedures. Real-life challenges of marketing mixed model development and execution are discussed, and several recommendations are provided to overcome some of those challenges.
Delali Agbenyegah, Alliance Data Systems
SAS® Grid Manager, like other grid computing technologies, has a set of great capabilities that we, IT professionals, love to have in our systems. This technology increases high availability, allows parallel processing, facilitates meeting increasing demand by scaling out, and offers other features that make life better for those managing and using these environments. However, even when business users take advantage of these features, they are more concerned about the business part of the problem. Most of the time business groups hold the budgets and are key stakeholders for any SAS Grid Manager project. Therefore, it is crucial to demonstrate to business users how they will benefit from the new technologies, how the features will improve their daily operations, help them be more efficient and productive, and help them achieve better results. This paper guides you through a process to create a strong and persuasive business plan that translates the technology features of SAS Grid Manager into business benefits.
Marlos Bosso, SAS
Credit card profitability prediction is a complex problem because of the variety of card holders' behavior patterns and the different sources of interest and transactional income. Each consumer account can move to a number of states such as inactive, transactor, revolver, delinquent, or defaulted. This paper i) describes an approach to credit card account-level profitability estimation based on the multistate and multistage conditional probabilities models and different types of income estimation, and ii) compares methods for the most efficient and accurate estimation. We use application, behavioral, card state dynamics, and macroeconomic characteristics, and their combinations as predictors. We use different types of logistic regression such as multinomial logistic regression, ordered logistic regression, and multistage conditional binary logistic regression with the LOGISTIC procedure for states transition probability estimation. The state transition probabilities are used as weights for interest rate and non-interest income models (which one is applied depends on the account state). Thus, the scoring model is split according to the customer behavior segment and the source of generated income. The total income consists of interest and non-interest income. Interest income is estimated with the credit limit utilization rate models. We test and compare five proportion models with the NLMIXED, LOGISTIC, and REG procedures in SAS/STAT® software. Non-interest income depends on the probability of being in a particular state, the two-stage model of conditional probability to make a point-of-sales transaction (POS) or cash withdrawal (ATM), and the amount of income generated by this transaction. We use the LOGISTIC procedure for conditional probability prediction and the GLIMMIX and PANEL procedures for direct amount estimation with pooled and random-effect panel data. The validation results confirm that traditional techniques can be effectively applied to complex tasks with many parameters and multilevel business logic. The model is used in credit limit management, risk prediction, and client behavior analytics.
Denys Osipenko, the University of Edinburgh Business School
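A minimal sketch of the multinomial piece of such a framework: state-transition probabilities via generalized logits in PROC LOGISTIC. The data set and predictor names are hypothetical:

   proc logistic data=accounts;
      class prev_state / param=ref;
      model next_state(ref='inactive') = prev_state utilization tenure
                                         unemployment_rate / link=glogit;
      output out=probs predprobs=individual;   /* one probability per state */
   run;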
The recent advances in regulatory stress testing, including stress testing regulated by the Comprehensive Capital Analysis and Review (CCAR) in the US, the Prudential Regulation Authority (PRA) in the UK, and the European Banking Authority in the EU, as well as the new international accounting requirement known as IFRS 9 (International Financial Reporting Standard), all pose new challenges to credit risk modeling. The increasing sophistication of models that are supposed to cover all the material risks in the underlying assets across various economic scenarios makes the models harder to implement. Banks are spending substantial resources on model implementation but still face long deployment times and disconnects between the model development and implementation teams. Models are also required at a more granular level, in many cases down to the trade and account levels. Efficient model execution is valuable for banks that need timely responses to analysis requests. At the same time, models are subject to more stringent internal and external scrutiny. This paper introduces a suite of risk modeling solutions from credit risk modeling leader SAS® to help banks overcome these new challenges and confidently meet the regulatory requirements.
Wei Chen, SAS
Martim Rocha, SAS
Jimmy Skoglund, SAS
There are standard risk metrics financial institutions use to assess the risk of a portfolio. These include well known measures like value at risk and expected shortfall and related measures like contribution value at risk. While there are industry-standard approaches for calculating these measures, it is often the case that financial institutions have their own methodologies. Further, financial institutions write their own measures, in addition to the common risk measures. SAS® High-Performance Risk comes equipped with over 20 risk measures that use standard methodology, but the product also allows customers to define their own risk measures. These user-defined statistics are treated the same way as the built-in measures, but the logic is specified by the customer. This paper leads the user through the creation of custom risk metrics using the HPRISK procedure.
Katherine Taylor, SAS
Steven Miles, SAS
Debt collection! The two words can trigger multiple images in one's mind--mostly harsh. However, let's try to think positively for a moment. In 2013, over $55 billion of debt was past due in the United States. What if all of this debt were left as is, and the fate of credit issuers left in the hands of goodwill payments made by defaulters? That is not the most sustainable model, to say the least. This is where debt collection comes in: a tool that is employed at multiple levels of recovery to keep credit flowing. Ranging from in-house to third-party to individual collection efforts, this industry is huge and plays an important role in keeping the engine of commerce running. In the recent past, with financial markets recovering and banks selling fewer charged-off accounts and at higher prices, debt collection has increasingly become a game of efficient operations backed by solid analytics. This paper takes you into the back alleys of all the data that is in there and gives an overview of some ways modeling can be used to impact the collection strategy and outcome. SAS® tools such as SAS® Enterprise Miner™ and SAS® Enterprise Guide® are used extensively for both data manipulation and modeling. Decision trees are given particular focus to understand what factors make the most impact. Along the way, this paper also gives an idea of how analytics teams today are slowly trying to get buy-in from other stakeholders in the company, which surprisingly is one of the most challenging aspects of our jobs.
Karush Jaggi, AFS Acceptance
Harold Dickerson, SquareTwo Financial
Thomas Waldschmidt, SquareTwo Financial
As SAS® programmers, we often develop listings, graphs, and reports that need to be delivered frequently to our customers. We might decide to manually run the program every time we get a request, or we might easily schedule an automatic task to send a report at a specific date and time. Both scenarios have some disadvantages. If the report is manual, we have to find and run the program every time someone requests an updated version of the output. It takes some time, and it is not the most interesting part of the job. If we schedule an automatic task in Windows, we still sometimes get an email from the customers because they need the report immediately. That means that we have to find and run the program for them. This paper explains how we developed an on-demand report platform using SAS® Enterprise Guide®, SAS® Web Application Server, and stored processes. We had developed many reports for different customer groups, and we were getting more and more emails from them asking for updated versions of their reports. We felt we were not using our time wisely and decided to create an infrastructure where users could easily run their programs through a web interface. The tool that we created enables SAS programmers to easily release on-demand web reports with minimal programming. It has web interfaces developed using stored processes for the administrative tasks, and it also automatically customizes the front end based on the user who connects to the website. One of the challenges of the project was that certain reports had to be available to a specific group of users only.
Romain Miralles, Genomic Health
Ensemble models are a popular class of methods for combining the posterior probabilities of two or more predictive models in order to create a potentially more accurate model. This paper summarizes the theoretical background of recent ensemble techniques and presents examples of real-world applications. Examples of these novel ensemble techniques include weighted combinations (such as stacking or blending) of predicted probabilities in addition to averaging or voting approaches that combine the posterior probabilities by adding one model at a time. Fit statistics across several data sets are compared to highlight the advantages and disadvantages of each method, and process flow diagrams that can be used as ensemble templates for SAS® Enterprise Miner™ are presented.
Wendy Czika, SAS
Ye Liu, SAS Institute
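A minimal sketch of combining posterior probabilities from two already-scored models, assuming hypothetical data sets score_tree and score_logit keyed (and sorted) by id:

   data ensemble;
      merge score_tree(rename=(p_1=p_tree))
            score_logit(rename=(p_1=p_logit));
      by id;
      p_avg   = mean(p_tree, p_logit);               /* simple average  */
      p_blend = 0.7*p_tree + 0.3*p_logit;            /* weighted blend  */
      vote    = ((p_tree > 0.5) + (p_logit > 0.5) >= 1);  /* either model flags */
   run;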
As Data Management professionals, you have to comply with new regulations and controls. One such regulation is Basel Committee on Banking Supervision (BCBS) 239. To respond to these new demands, you have to put processes and methods in place to automate metadata collection and analysis, and to provide rigorous documentation around your data flows. You also have to deal with many aspects of data management including data access, data manipulation (ETL and other), data quality, data usage, and data consumption, often from a variety of toolsets that are not necessarily from a single vendor. This paper shows you how to use SAS® technologies to support data governance requirements, including third party metadata collection and data monitoring. It highlights best practices such as implementing a business glossary and establishing controls for monitoring data. Attend this session to become familiar with the SAS tools used to meet the new requirements and to implement a more managed environment.
Jeff Stander, SAS
Scotiabank's Colombian division, Colpatria, is the national leader in credit cards, with more than 1,600,000 active cards--the equivalent of a portfolio of approximately 700 million dollars. The behavior score is used to offer credit cards through a cross-sell process, which happens only after customers have completed six months on books with their first product at the bank. This is the minimum period of time required by the behavior Artificial Neural Network (ANN) model. The six-months-on-books internal policy suggests that the maturation of the client in this period is adequate, but this has never been proven. The following research aims to evaluate this hypothesis and calculate the appropriate time to offer cross-sales to new customers using Logistic Regression (Logit), while also segmenting these sales targets by their level of seniority using Discrete-Time Markov Chains (DTMC).
Oscar Javier Cortés Arrigui, Scotiabank - Colpatria
Miguel Angel Diaz Rodriguez, Scotiabank - Colpatria
SAS® Embedded Process offers a flexible, efficient way to leverage increasing amounts of data by injecting the processing power of SAS® directly where the data lives. SAS Embedded Process can tap into the massively parallel processing (MPP) architecture of Hadoop for scalable performance. Using SAS® In-Database Technologies for Hadoop, you can run scoring models generated by SAS® Enterprise Miner™ or, with SAS® In-Database Code Accelerator for Hadoop, user-written DS2 programs in parallel. With SAS Embedded Process on Hadoop you can also perform data quality operations, and extract and transform data using SAS® Data Loader. This paper explores key SAS technologies that run inside the Hadoop parallel processing framework and prepares you to get started with them.
David Ghazaleh, SAS
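A hedged sketch of a user-written DS2 thread program of the kind the SAS In-Database Code Accelerator runs in parallel on the Hadoop nodes; the hdp library and column names are hypothetical, and DS2ACCEL=YES requests in-database execution:

   proc ds2 ds2accel=yes;
      thread score_th / overwrite=yes;
         method run();
            set hdp.transactions;            /* rows read on the Hadoop nodes */
            if amount > 10000 then output;   /* simple per-row logic */
         end;
      endthread;
      data hdp.large_txns / overwrite=yes;
         dcl thread score_th t;
         method run();
            set from t;                      /* gather the thread output */
         end;
      enddata;
   run; quit;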
Do you create complex reports using PROC REPORT? Are you confused by the COMPUTE BLOCK feature of PROC REPORT? Are you even aware of it? Maybe you already produce reports using PROC REPORT, but suddenly your boss needs you to modify some of the values in one or more of the columns. Maybe your boss needs to see the values of some rows in boldface and others highlighted in a stylish yellow. Perhaps one of the columns in the report needs to display a variety of fashionable formats (some with varying decimal places and some without any decimals). Maybe the customer needs to see a footnote in specific cells of the report. Well, if this sounds familiar, then come take a look at the COMPUTE BLOCK of PROC REPORT. This paper shows a few tips and tricks for using the COMPUTE block with conditional IF/THEN logic to make your reports stylish and fashionable. The COMPUTE block allows you to use DATA step code within PROC REPORT to provide customization and style to your reports. We'll see how the Census Bureau produces a stylish demographic profile for customers of its Special Census program using PROC REPORT with the COMPUTE block. The paper focuses on how to use the COMPUTE block to create this stylish Special Census profile. The paper shows quick tips and simple code to handle multiple formats within the same column, make the values in the Total rows boldface, apply traffic lighting, and add footnotes to any cell based on the column or row. The Special Census profile report is an Excel table created with ODS tagsets.ExcelXP that is stylish and fashionable, thanks in part to the COMPUTE block.
Chris Boniface, Census Bureau
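A minimal sketch of the two styling tricks the abstract highlights, boldface total rows and traffic lighting, shown here on a SASHELP table rather than the Census profile itself:

   proc report data=sashelp.class nowd;
      column name age height;
      define name   / display;
      define age    / analysis mean format=4.1;
      define height / analysis mean format=6.1;
      rbreak after / summarize;
      compute name;
         if _break_ = '_RBREAK_' then do;
            name = 'Total';
            call define(_row_, 'style', 'style={font_weight=bold}');
         end;
      endcomp;
      compute height;
         if height.mean < 60 then                     /* traffic lighting */
            call define(_col_, 'style', 'style={background=yellow}');
      endcomp;
   run;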
Logistic regression models are commonly used in direct marketing and consumer finance applications. In this context, the paper discusses two topics related to the fitting and evaluation of logistic regression models. Topic 1 is a comparison of two methods for finding multiple candidate models. The first method is the familiar best subsets approach. Best subsets is then compared to a proposed new method based on combining the models produced by backward and forward selection, plus the predictors considered by backward and forward. This second method uses the SAS® procedure HPLOGISTIC with selection of models by the Schwarz Bayes criterion (SBC). Topic 2 is a discussion of model evaluation statistics that measure predictive accuracy and goodness of fit in support of the choice of a final model. Base SAS® and SAS/STAT® software are used.
Bruce Lund, Magnify Analytic Solutions, Division of Marketing Associates
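A hedged sketch of the HPLOGISTIC piece of the second method, selecting and choosing by SBC; the data set and predictor list are hypothetical:

   proc hplogistic data=modeling;
      model bad(event='1') = x1-x20;
      selection method=backward(select=sbc choose=sbc);
   run;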
U.S. stock exchanges (currently there are 12) are tracked in real time via the Consolidated Tape System (CTS) and the Consolidated Quotation System (CQS). CQS contains every updated quote (buyer's bid price and seller's offer price) from each exchange, covering some 8,500 stock tickers. This is the basis by which brokers can honor their obligation to investors, mandated by the U.S. Securities and Exchange Commission, to execute transactions at the best price, that is, at the National Best Bid and Offer (NBBO). With the advent of electronic exchanges and high-frequency trading (timestamps are published to the microsecond), data set size has become a major operational consideration for market researchers re-creating NBBO values (over 1 billion quotes requiring 80 gigabytes of storage for a normal trading day). This presentation demonstrates a straightforward use of hash tables for tracking constantly changing quotes for each ticker/exchange combination, in tandem with an efficient means of determining changes in NBBO with every new quote.
Mark Keintz, Wharton Research Data Services
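A simplified sketch of the hash bookkeeping, assuming quotes have been sorted by ticker and time (the real consolidated feed is processed in time order across tickers, so treat this only as an illustration of the per-exchange upsert-and-rescan idea):

   data nbbo(keep=ticker time best_bid best_exch);
      length best_exch _xexch $8;
      if _n_ = 1 then do;
         declare hash q();                   /* latest bid per exchange */
         q.defineKey('exchange');
         q.defineData('exchange','bid');
         q.defineDone();
         declare hiter qi('q');
      end;
      set quotes;                            /* assumed sorted by ticker, time */
      by ticker;
      if first.ticker then rc = q.clear();   /* new ticker: reset the table   */
      rc = q.replace();                      /* upsert this exchange's bid    */
      _xexch = exchange;  _xbid = bid;       /* iteration overwrites host vars */
      best_bid = .;
      rc = qi.first();
      do while (rc = 0);                     /* rescan the <=12 exchange rows */
         if bid > best_bid then do;
            best_bid = bid;  best_exch = exchange;
         end;
         rc = qi.next();
      end;
      exchange = _xexch;  bid = _xbid;
   run;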
We introduce age-period-cohort (APC) models, which analyze data in which performance is measured by age of an account, account open date, and performance date. We demonstrate this flexible technique with an example from a recent study that seeks to explain the root causes of the US mortgage crisis. In addition, we show how APC models can predict website usage, retail store sales, salesperson performance, and employee attrition. We even present an example in which APC was applied to a database of tree rings to reveal climate variation in the southwestern United States.
Joseph Breeden, Prescient Models
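The canonical APC decomposition referenced here models performance (for example, the log-odds of default) as additive functions of account age a, vintage v, and calendar time t:

   \[
   \log \frac{p(a,v,t)}{1 - p(a,v,t)} \;=\; F(a) + G(v) + H(t),
   \qquad v = t - a,
   \]

where F(a) captures the lifecycle by age, G(v) the credit quality of the origination cohort, and H(t) the environment by calendar date.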
If your organization already deploys one or more software solutions via Amazon Web Services (AWS), you know the value of the public cloud. AWS provides a scalable public cloud with a global footprint, allowing users access to enterprise software solutions anywhere at any time. Although SAS® began long before AWS was even imagined, many loyal organizations driven by SAS are moving their local SAS analytics into the public AWS cloud, alongside other software hosted by AWS. SAS® Solutions OnDemand has assisted organizations in this transition. In this paper, we describe how we extended our enterprise hosting business to AWS. We describe the open source automation framework on which SAS Solutions OnDemand built its automation stack, which simplified the process of migrating a SAS implementation. We also provide the technical details of our automation and network footprint, a discussion of the technologies we chose along the way, and a list of lessons learned.
Ethan Merrill, SAS
Bryan Harkola, SAS
Project management is a hot topic across many industries, and multiple commercial software applications for managing projects are available. The reality, however, is that the majority of project management software is not suited to daily use. SAS® has a solution for this issue that can be used to manage projects graphically in real time. This paper introduces a new paradigm for project management using the SAS® Graph Template Language (GTL). With GTL, SAS clients can visualize, in real time, resource assignments, task plans, delivery tracking, and project status across multiple project levels for more efficient project management.
Zhouming(Victor) Sun, Medimmune
The SAS® Scalable Performance Data Server and SAS® Scalable Performance Data Engine are data formats from SAS® that support the creation of analytical base tables with tens of thousands of columns. These analytical base tables are used to support daily predictive analytical routines. Traditionally Storage Area Network (SAN) storage has been and continues to be the primary storage platform for the SAS Scalable Performance Data Server and SAS Scalable Performance Data Engine formats. Due to cost constraints associated with SAN storage, companies have added Hadoop to their environments to help minimize storage costs. In this paper we explore how the SAS Scalable Performance Data Server and SAS Scalable Performance Data Engine leverage the Hadoop Distributed File System.
Steven Sober, SAS
This paper provides tips and techniques to speed up the validation process, both without and with automation. For validation without automation, it introduces both standard and clever uses of options and statements in the COMPARE procedure that can speed up the validation process. For validation with automation, a macro named %QCDATA is introduced for individual data set validation, and a macro named %QCDIR is introduced for comparing the data sets in two different directories. Also introduced is the &SYSINFO automatic macro variable and an explanation of how it can be used to interpret the result of a comparison.
Alice Cheng, Portola Pharmaceuticals
Justina Flavin, Independent Consultant
Michael Wise, Experis BI & Analytics Practice
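A minimal sketch of the &SYSINFO technique (not the %QCDATA macro itself): PROC COMPARE sets &SYSINFO to a sum of bit flags, with zero meaning the data sets match; the libraries and bit assignments cited in comments are the commonly documented ones:

   proc compare base=prod.ae compare=qc.ae listall criterion=1e-10;
   run;
   %let rc = &sysinfo;    /* capture immediately; the next step resets it */

   %macro qccheck;
      %if &rc = 0 %then %put NOTE: Data sets match exactly.;
      %else %do;
         %if %sysfunc(band(&rc,4096)) %then %put WARNING: Unequal data values.;
         %if %sysfunc(band(&rc,128))  %then %put WARNING: COMPARE has obs not in BASE.;
         %if %sysfunc(band(&rc,64))   %then %put WARNING: BASE has obs not in COMPARE.;
      %end;
   %mend qccheck;
   %qccheck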
Have you ever wondered how to get the most from Web 2.0 technologies in order to visualize SAS® data? How to make those graphs dynamic, so that users can explore the data in a controlled way, without needing prior knowledge of SAS products or data science? Wonder no more! In this session, you learn how to turn basic sashelp.stocks data into a snazzy HighCharts stock chart in which a user can review any time period, zoom in and out, and export the graph as an image. All of these features with only two DATA steps and one SORT procedure, for 57 lines of SAS code.
Vasilij Nevlev, Analytium Ltd
In the aftermath of the 2008 global financial crisis, banks had to improve their risk data aggregation in order to effectively identify and manage their credit exposures and credit risk, create early warning signs, and improve the ability of risk managers to challenge the business and independently assess and address evolving changes in credit risk. My presentation focuses on using SAS® Credit Risk Dashboard to achieve all of the above. Clearly, you can use my method and principles of building a credit risk dashboard to build dashboards for other types of risk as well (market, operational, liquidity, compliance, reputation, and so on). In addition, because every bank must integrate the various risks into a holistic view, each of the risk dashboards can be the foundation for building an effective enterprise risk management (ERM) dashboard that takes into account correlation of risks, risk tolerance, risk appetite, breaches of limits, capital allocation, risk-adjusted return on capital (RAROC), and so on. This will support the actions of top management so that the bank can meet shareholder expectations in the long term.
Boaz Galinson, Leumi
Looking for new ways to improve your business? Try mining your own data! Event log data is a side product of information systems, generated for audit and security purposes, and is seldom analyzed, especially in combination with business data. With the advent of the cloud computing era, more and more event log data is accumulating, and analysts are searching for innovative ways to take advantage of all data resources in order to get valuable insights. Process mining, a new field for discovering business patterns from event log data, has recently proved useful for business applications. Process mining shares some algorithms with data mining, but it is more focused on the interpretation of the detected patterns than on prediction. Analysis of these patterns can lead to improvements in the efficiency of common existing and planned business processes. Through process mining, analysts can uncover hidden relationships between resources and activities and make changes to improve organizational structure. This paper shows you how to use SAS® Analytics to gain insights from real event log data.
Emily (Yan) Gao, SAS
Robert Chu, SAS
Xudong Sun, SAS
Real-time, integrated marketing solutions are a necessity for maintaining your competitive advantage. This presentation provides a brief overview of three SAS products (SAS® Marketing Automation, SAS® Real-Time Decision Manager, and SAS® Event Stream Processing) that form a basis for building modern, real-time, interactive marketing solutions. It presents typical (and also possible) customer-use cases that you can implement with a comprehensive real-time interactive marketing solution, in major industries like finance (banking), telco, and retail. It demonstrates typical functional architectures that need to be implemented to support business cases (how solution components collaborate with customer's IT landscape and with each other). And it provides examples of our experience in implementing these solutions--dos and don'ts, best practices, and what to expect from an implementation project.
Dmitriy Alergant, Tier One Analytics
Marje Fecht, Prowerk Consulting
Microsoft Visual Basic Scripting Edition (VBScript) and SAS® software are each powerful tools in their own right. These two technologies can be combined so that SAS code can call a VBScript program or vice versa. This gives a programmer the ability to automate SAS tasks; traverse the file system; send emails programmatically via Microsoft Outlook or SMTP; manipulate Microsoft Word, Microsoft Excel, and Microsoft PowerPoint files; get web data; and more. This paper presents example code to demonstrate each of these capabilities.
Christopher Johnson, BrickStreet Insurance
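A minimal sketch of one direction of the bridge: SAS writes a VBScript that drives Outlook, then invokes it. The path and address are hypothetical, and the XCMD system option must be enabled:

   filename vbs "C:\temp\sendmail.vbs";
   data _null_;
      file vbs;
      put 'Set ol = CreateObject("Outlook.Application")';
      put 'Set msg = ol.CreateItem(0)';          /* 0 = olMailItem */
      put 'msg.To = "analyst@example.com"';
      put 'msg.Subject = "Report ready"';
      put 'msg.Send';
   run;
   options noxwait;
   x 'cscript //nologo "C:\temp\sendmail.vbs"';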
Gone are the days when the only method of receiving a loan was by visiting your local branch and working with a loan officer. In today's economy, financial institutions increasingly rely on online channels to interact with their customers. The anonymity that is inherent in this channel makes it a prime target for fraudsters. The solution is to profile the behavior of internet banking in real time and assess each transaction for risk as it is processed in order to prevent financial loss before it occurs. SAS® Visual Scenario Designer enables you to create rules, scenarios, and models, test their impact, and inject them into real-time transaction processing using SAS® Event Stream Processing.
Sam Atassi, SAS
Jamie Hutton, SAS
Do you want to see and experience how to configure SAS® Enterprise Miner™ single sign-on? Are you looking to explore setting up Integrated Windows Authentication with SAS® Visual Analytics? This hands-on workshop demonstrates how you can configure Kerberos delegation with SAS® 9.4. You see how to validate the prerequisites, make the configuration changes, and use the applications. By the end of this workshop you will be empowered to start your own configuration.
Stuart Rogers, SAS
Considering that SAS® Grid Manager is becoming more and more popular, it is important to fulfill users' needs for a successful migration to a SAS® Grid environment. This paper focuses on key requirements and common issues for new SAS Grid users, especially those coming from a traditional environment. It describes a few common requirements, like the need for a current working directory, changes to file system navigation in SAS® Enterprise Guide® with user-specified locations, getting a job execution summary email, and so on. The GRIDWORK directory introduced in SAS Grid Manager is a bit different from the traditional SAS WORK location; this paper explains how you can use the GRIDWORK location in a more user-friendly way. Users sometimes notice data set size differences during grid migration, so a few important reasons for such differences are demonstrated. We also demonstrate how to create new custom scripts as business needs dictate and how to incorporate them with the SAS Grid Manager engine.
Piyush Singh, Tata Consultancy Services
Tanuj Gupta, Tata Consultancy Services
Prasoon Sangwan, Tata Consultancy Services
From stock price histories to hospital stay records, analysis of time series data often requires use of lagged (and occasionally lead) values of one or more analysis variables. For the SAS® user, the central operational task is typically getting lagged (lead) values for each time point in the data set. While SAS has long provided a LAG function, it has no analogous lead function--an especially significant problem in the case of large data series. This paper reviews the LAG function, in particular the powerful, but non-intuitive implications of its queue-oriented basis. The paper demonstrates efficient ways to generate leads with the same flexibility as the LAG function, but without the common and expensive recourse to data re-sorting. It also shows how to dynamically generate leads and lags through use of the hash object.
Mark Keintz, Wharton Research Data Services
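A minimal sketch of the sort-free lead technique for a single series: merge the data set with itself offset by one observation, shown next to the queue-based LAG for contrast (BY-group boundaries would need extra handling):

   data with_lead;
      merge prices
            prices(firstobs=2 keep=price rename=(price=lead_price));
      lag_price = lag(price);   /* queue-based lag, for comparison */
   run;                         /* the last observation gets a missing lead */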
Is uniqueness essential for your reports? SAS® Visual Analytics provides the ability to customize your reports to make them unique by using the SAS® Theme Designer. The SAS Theme Designer can be accessed from the SAS® Visual Analytics Hub to create custom themes to meet your branding needs and to ensure a unified look across your company. The report themes affect the colors, fonts, and other elements that are used in tables and graphs. The paper explores how to access SAS Theme Designer from the SAS Visual Analytics home page, how to create and modify report themes that are used in SAS Visual Analytics, how to create report themes from imported custom themes, and how to import and export custom report themes.
Meenu Jaiswal, SAS
Ipsita Samantarai, SAS Research & Development (India) Pvt Ltd
Business Intelligence users analyze business data in a variety of ways. Seventy percent of business data contains location information. For in-depth analysis, it is essential to combine location information with mapping. New analytical capabilities are added to SAS® Visual Analytics, leveraging the new partnership with Esri, a leader in location intelligence and mapping. The new capabilities enable users to enhance the analytical insights from SAS Visual Analytics. This paper demonstrates and discusses the new partnership with Esri and the new capabilities added to SAS Visual Analytics.
Murali Nori, SAS
Himesh Patel, SAS
For SAS® Enterprise Guide® users, sometimes macro variables and their values need to be brought over to the local workspace from the server, especially when multiple data sets or outputs need to be written to separate files in a local drive. Manually retyping the macro variables and their values in the local workspace after they have been created on the server workspace would be time-consuming and error-prone, especially when we have quite a number of macro variables and values to bring over. Instead, this task can be achieved in an efficient manner by using dictionary tables and the CALL SYMPUT routine, as illustrated in more detail below. The same approach can also be used to bring macro variables and their values from the local to the server workspace.
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
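A minimal sketch of the approach, assuming the macro variables of interest share a hypothetical RPT prefix:

   /* on the server: snapshot the macro variables into a data set */
   proc sql;
      create table work.mvars as
      select name, value
      from dictionary.macros
      where scope = 'GLOBAL' and name like 'RPT%';
   quit;

   /* after downloading work.mvars to the local workspace: re-create them */
   data _null_;
      set work.mvars;
      call symputx(name, value, 'G');   /* note: values > 200 characters   */
   run;                                 /* span rows; see the OFFSET column */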
Propensity scores produced by logistic models have been used intensively to assist name selection in direct marketing. As a result, only customers with a higher absolute likelihood to respond are mailed offers, in order to achieve cost reduction; thus, event ROI is increased. There is a fly in the ointment, however. Compared to the model-building performance time window, usually 6 to 12 months, a marketing event time period is usually much shorter. As such, this approach lacks the ability to deselect those who have a high propensity score but are unlikely to respond to an upcoming campaign. Dynamically building a complete propensity model for every upcoming campaign is nearly impossible, but incorporating time to respond has been of great interest to marketers as another dimension for enhancing response prediction. Hence, this paper presents an inventive modeling technique combining logistic regression and the Cox proportional hazards model. The objective of the fusion approach is to allow a customer's shorter time to next repurchase to compensate for an insignificantly lower propensity score in winning selection opportunities. The method is accomplished using PROC LOGISTIC, PROC LIFETEST, PROC LIFEREG, and PROC PHREG on the fusion model built in a SAS® environment. This paper also touches on how to use the results to predict repurchase response by demonstrating a case of repurchase time-shift prediction on the 12-month inactive customers of a big box store retailer. The paper also shares a comparison of results between the fusion approach and logit alone. Comprehensive SAS macros are provided in the appendix.
Hsin-Yi Wang, Alliance Data Systems
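A minimal sketch of the survival piece of the fusion approach: time to repurchase modeled in PROC PHREG with the logistic propensity score carried in as a covariate. Data set and variable names are hypothetical:

   proc phreg data=custhist;
      model months_to_repurchase*repurchased(0) = propensity_score
            recency frequency monetary;
      baseline out=surv survival=s_hat;   /* survival curve for scoring */
   run;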
SAS® Visual Analytics Explorer puts the robust power of decision trees at your fingertips, enabling you to visualize and explore how data is structured. Decision trees help analysts better understand discrete relationships within data by visually showing how combinations of variables lead to a target indicator. This paper explores the practical use of decision trees in SAS Visual Analytics Explorer through an example of risk classification in the financial services industry. It explains various parameters and implications, explores ways the decision tree provides value, and provides alternative methods to help you handle the reality of imperfect data.
Stephen Overton, Zencos Consulting LLC
Ben Murphy, Zencos Consulting LLC
Business problems have become more stratified and micro-segmentation is driving the need for mass-scale, automated machine learning solutions. Additionally, deployment environments include diverse ecosystems, requiring hundreds of models to be built and deployed quickly via web services to operational systems. The new SAS® automated modeling tool allows you to build and test hundreds of models across all of the segments in your data, testing a wide variety of machine learning techniques. The tool is completely customizable, allowing you transparent access to all modeling results. This paper shows you how to identify hundreds of champion models using SAS® Factory Miner, while generating scoring web services using SAS® Decision Manager. Immediate benefits include efficient model deployments, which allow you to spend more time generating insights that might reveal new opportunities, expose hidden risks, and fuel smarter, well-timed decisions.
Jonathan Wexler, SAS
Steve Sparano, SAS
Every day, businesses have to remain vigilant of fraudulent activity, which threatens customers, partners, employees, and financials. Normally, networks of people or groups perpetrate deviant activity. Finding these connections is now made easier for analysts with SAS® Visual Investigator, an upcoming SAS® solution that ultimately minimizes the loss of money and preserves mutual trust among its shareholders. SAS Visual Investigator takes advantage of the capabilities of the new SAS® In-Memory Server. Investigators can efficiently investigate suspicious cases across business lines, which has traditionally been difficult. However, the time required to collect, process and identify emerging fraud and compliance issues has been costly. Making proactive analysis accessible to analysts is now more important than ever. SAS Visual Investigator was designed with this goal in mind and a key component is the visual social network view. This paper discusses how the network analysis view of SAS Visual Investigator, with all its dynamic visual capabilities, can make the investigative process more informative and efficient.
Danielle Davis, SAS
Stephen Boyd, SAS Institute
Ray Ong, SAS Institute
When analyzing data with SAS®, we often encounter missing or null values in data. Missing values can arise from the availability, collectibility, or other issues with the data. They represent the imperfect nature of real data. Under most circumstances, we need to clean, filter, separate, impute, or investigate the missing values in data. These processes can take up a lot of time, and they are annoying. For these reasons, missing values are usually unwelcome and need to be avoided in data analysis. There are two sides to every coin, however. If we can think outside the box, we can take advantage of the negative features of missing values for positive uses. Sometimes, we can create and use missing values to achieve our particular goals in data manipulation and analysis. These approaches can make data analyses convenient and improve work efficiency for SAS programming. This kind of creative and critical thinking is the most valuable quality for data analysts. This paper exploits real-world examples to demonstrate the creative uses of missing values in data analysis and SAS programming, and discusses the advantages and disadvantages of these methods and approaches. The illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
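One small example of such a creative use: special missing values (.A through .Z and ._) keep the reason for missingness in the data while every analysis step still treats the value as missing. The survey coding here is illustrative:

   data survey;
      input id answer;
      if answer = -97 then answer = .R;        /* refused    */
      else if answer = -98 then answer = .D;   /* don't know */
      datalines;
   1 5
   2 -97
   3 -98
   ;
   proc format;
      value ans .R = 'Refused'  .D = "Don't know"  other = [best.];
   run;
   proc freq data=survey;
      tables answer / missing;   /* counts each missing reason separately */
      format answer ans.;
   run;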
Wouldn't it be fantastic to develop and tune scenarios in SAS® Visual Scenario Designer and then smoothly incorporate them into your SAS® Anti-Money Laundering solution with just a few clicks of your mouse? Well, now there is a way. SAS Visual Scenario Designer is the first data-driven solution for interactive rule and scenario authoring, testing, and validation. It facilitates exploration, visualization, detection, rule writing, auditing, and parameter tuning to reduce false positives; and all of these tasks are performed using point and click. No SAS® coding skills required! Using the approach detailed in this paper, we demonstrate how you can seamlessly port these SAS Visual Scenario Designer scenarios into your SAS Anti-Money Laundering solution. Rewriting the SAS Visual Scenario Designer scenarios in Base SAS® is no longer required! Furthermore, the SAS Visual Scenario Designer scenarios are executed on the lightning-speed SAS® LASR™ Analytic Server, reducing the time of the SAS Anti-Money Laundering scenario nightly batch run. The results of both the traditional SAS Anti-Money Laundering alerts and SAS Visual Scenario Designer alerts are combined and available for display on the SAS® Enterprise Case Management interface. This paper describes the different ways that the data can be explored to detect anomalous patterns and the three mechanisms for translating these patterns into rules. It also documents how to create the scenarios in SAS Visual Scenario Designer; how to test and tune the scenarios and parameters; and how alerts are ported seamlessly into the SAS Anti-Money Laundering alert generation process and the SAS Enterprise Case Management system.
Renee Palmer, SAS
Yue Chai, SAS Institute
Today's financial institutions employ tens of thousands of employees globally in the execution of manually intensive processes, from transaction processing to fraud investigation. While heuristics-based solutions have widely penetrated the industry, these solutions often work in the realm of broad generalities, lacking the nuance and experience of a human decision-maker. In today's regulatory environment, that translates to operational inefficiency and overhead as employees labor to rationalize human decisions that run counter to computed predictions. This session explores options that financial services institutions have to augment and automate day-to-day decision-making, leading to improvements in consistency, accuracy, and business efficiency. The session focuses on financial services case studies, including anti-money laundering, fraud, and transaction processing, to demonstrate real-world examples of how organizations can make the transition from predictions to decisions.
At Royal Bank of Scotland, one of our key organizational design principles is to 'share everything we can share.' In essence, this promotes the cross-departmental sharing of platform services. Historically, this was never enforced on our Business Intelligence platforms like SAS®, resulting in a diverse technology estate, which presents challenges to our platform team for maintaining software currency, software versions, and overall quality of service. Currently, we have SAS® 8.2 and SAS® 9.1.3 on the mainframe, SAS® 9.2, SAS® 9.3, and SAS® 9.4 across our Windows and Linux servers, and SAS® 9.1.3 and SAS® 9.4 on PC across the bank. One of the benefits of running a multi-tenant SAS environment is removing the need to procure, install, and configure a new environment when a new department wants to use SAS. However, the process of configuring a secure multi-tenant environment, using the default tools and procedures, can still be very labor intensive. This paper explains how we analyzed the benefits of creating a shared Enterprise Business Intelligence platform in SAS alongside the risks and organizational barriers to the approach. Several considerations are presented, as well as some insight into how we convinced our key stakeholders of the approach. We also look at the 'custom' processes and tools that RBS has implemented. Through this paper, we encourage other organizations to think about the various considerations we present to decide if sharing is right for their context to maximize the return on investment in SAS.
Dileep Pournami, RBS
Christopher Blake, RBS
Ekaitz Goienola, SAS
Sergey Iglov, RBS
You've heard all the talk about SAS® Visual Analytics--but maybe you are still confused about how the product would work in your SAS® environment. Many customers have the same points of confusion about what they need to do with their data, how to get data into the product, how SAS Visual Analytics would benefit them, and even whether they should be considering Hadoop or the cloud. In this paper, we cover the questions we are asked most often about implementation, administration, and usage of SAS Visual Analytics.
Tricia Aanderud, Zencos Consulting LLC
Ryan Kumpfmiller, Zencos Consulting
Nick Welke, Zencos Consulting
Inspired by Christianna Williams's paper on transitioning to PROC SQL from the DATA step, this paper aims to help SQL programmers transition to SAS® by using PROC SQL. SAS adapted the Structured Query Language (SQL) by means of PROC SQL back in SAS® 6. PROC SQL syntax closely resembles SQL. However, there are some SQL features that are not available in SAS. Throughout this paper, we outline common SQL tasks and how they might differ in PROC SQL. We also introduce useful SAS features that are not available in SQL. Topics covered are appropriate for novice SAS users.
Barbara Ross, NA
Jessica Bennett, Snap Finance
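A minimal sketch of one PROC SQL feature with no ANSI SQL counterpart: the CALCULATED keyword reuses a column alias within the same query:

   proc sql;
      select name,
             height*2.54                 as height_cm,
             calculated height_cm / 100  as height_m
      from sashelp.class
      where calculated height_cm > 150;
   quit;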
SAS® software provides many DATA step functions that search and extract patterns from a character string, such as SUBSTR, SCAN, INDEX, TRANWRD, etc. Using these functions to perform pattern matching often requires you to use many function calls to match a character position. However, using the Perl regular expression (PRX) functions or routines in the DATA step improves pattern-matching tasks by reducing the number of function calls and making the program easier to maintain. This talk, in addition to discussing the syntax of Perl regular expressions, demonstrates many real-world applications.
Arthur Li, City of Hope
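A minimal sketch of the compile-once, match-many pattern; the contacts data set and phone layout are hypothetical:

   data checked;
      set contacts;
      retain re;
      if _n_ = 1 then re = prxparse('/^\(\d{3}\) ?\d{3}-\d{4}$/');
      valid_phone = (prxmatch(re, strip(phone)) > 0);   /* 1 = matches */
      drop re;
   run;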
Financial institutions are working hard to optimize credit and pricing strategies at adjudication and for ongoing account and customer management. For cards and other personal lending products, there is intense competitive pressure, together with relentless revenue challenges, that creates a huge requirement for sophisticated credit and pricing optimization tools. Numerous credit and pricing optimization applications are available on the market to satisfy these needs. We present a relatively new approach that relies heavily on the effect modeling (uplift or net lift) technique for continuous target metrics--revenue, cost, losses, and profit. Examples of effect modeling to optimize the impact of marketing campaigns are known. We discuss five essential steps on the credit and pricing optimization path: (1) setting up critical credit and pricing champion/challenger tests, (2) performance measurement of specific test campaigns, (3) effect modeling, (4) defining the best effect model, and (5) moving from the effect model to the optimal solution. These steps require specific applications that are not easily available in SAS®. Therefore, necessary tools have been developed in SAS/STAT® software. We go through numerous examples to illustrate credit and pricing optimization solutions.
Yuri Medvedev, Bank of Montreal
In recent years, many companies have been trying to understand rare events that are critical in the current business environment. But a data set with rare events is always imbalanced, and models developed using such data cannot predict the rare events precisely. To overcome this issue, a data set needs to be sampled using specialized sampling techniques like over-sampling, under-sampling, or the synthetic minority over-sampling technique (SMOTE). Over-sampling randomly duplicates minority class observations, but this might bias the results. Under-sampling randomly deletes majority class observations, but this might lose information. SMOTE creates new synthetic minority observations instead of duplicating minority class observations or deleting majority class observations, and therefore can overcome the problems of biased results and lost information found in the other sampling techniques. In our research, we used an imbalanced data set containing results from a thyroid test with 3,163 observations, of which only 4.7 percent had positive test results. Using SAS® procedures like PROC SURVEYSELECT and PROC MODECLUS, we created over-sampled, under-sampled, and SMOTE-sampled data sets in SAS® Enterprise Guide®. Then we built decision tree, gradient boosting, and rule induction models using four different data sets (non-sampled; majority under-sampled; minority over-sampled with majority under-sampled; and minority SMOTE sampled with majority under-sampled) in SAS® Enterprise Miner™. Finally, based on the receiver operating characteristic (ROC) index, the Kolmogorov-Smirnov statistic, and the misclassification rate, we found that the models built using minority SMOTE sampling with majority under-sampling yield better output for this data set.
Rhupesh Damodaran Ganesh Kumar, Oklahoma State University (SAS and OSU data mining Certificate)
Kiren Raj Mohan Jagan Mohan, Zions Bancorporation
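A minimal sketch of the under-sampling step with PROC SURVEYSELECT; the thyroid data set, target coding, and sample size are simplified placeholders:

   /* draw a simple random sample of the majority (negative) class */
   proc surveyselect data=thyroid(where=(result=0)) out=neg_under
                     method=srs sampsize=150 seed=20160418;
   run;

   /* recombine with all minority (positive) observations */
   data balanced;
      set thyroid(where=(result=1)) neg_under;
   run;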
In early 2006, the United States experienced a housing bubble that affected over half of the American states. It was one of the leading causes of the 2007-2008 financial recession. Primarily, the overvaluation of housing units resulted in foreclosures and prolonged unemployment during and after the recession period. The main objective of this study is to predict the current market value of a housing unit with respect to fair market rent, census region, metropolitan statistical area, area median income, household income, poverty income, number of units in the building, number of bedrooms in the unit, utility costs, other costs of the unit, and so on, in order to determine which factors affect the market value of the housing unit. For the purpose of this study, data was collected from the Housing Affordability Data System of the US Department of Housing and Urban Development. The data set contains 20 variables and 36,675 observations. To select the best possible input variables, several variable selection techniques were tested: LARS (least angle regression), LASSO (least absolute shrinkage and selection operator), adaptive LASSO, variable selection, variable clustering, stepwise regression, principal component analysis (PCA) with only numeric variables, and PCA with all variables. After selecting input variables, numerous modeling techniques were applied to predict the current market value of a housing unit. An in-depth analysis of the findings revealed that the current market value of a housing unit is significantly affected by the fair market value, insurance and other costs, structure type, household income, and more. Furthermore, a higher household income and a higher median income of an area are associated with a higher market value of a housing unit.
Mostakim Tanjil, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
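As one hedged illustration of the selection step described above, LASSO selection can be run with PROC GLMSELECT; the data set and variable names below are hypothetical stand-ins for the HUD variables, and the option settings are only one reasonable choice.

   /* A sketch of LASSO selection, choosing the model by
      cross-validation */
   proc glmselect data=work.housing;
      class region structure_type;
      model market_value = fair_market_rent household_income
            area_median_income utility_costs other_costs
            bedrooms units_in_building region structure_type
            / selection=lasso(choose=cv stop=none);
   run;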
This session discusses challenges and considerations typically faced in credit risk scoring, as well as options and practical ways to address them. No two problems are ever the same, even if the general approach is clear. Every model has its own unique characteristics that call for creative solutions. Successful credit scoring modeling projects are always based on a combination of advanced analytical techniques and data, and a deep understanding of the business and how the model will be applied. Different aspects of the process are discussed, including feature selection, reject inferencing, sample selection and validation, and model design questions and considerations.
Regina Malina, Equifax
In a data warehousing system, change data capture (CDC) plays an important part, not just in making the data warehouse (DWH) aware of a change, but also in flowing that change through to the DWH marts and reporting tables so that we see the current and latest version of the truth. Together with slowly changing dimensions (SCD), this creates a cycle that runs the DWH and provides valuable insights into history and for future decision making. But what if the source has no CDC? It would be an ETL nightmare to identify the exact change and report the absolute truth. If these two processes can be combined into a single process, where one transform does both jobs of identifying the change and applying it to the DWH, we can save significant processing time and valuable system resources. Hence, I came up with a hybrid SCD-with-CDC approach. My paper focuses on sources that DO NOT have CDC and shows how to perform SCD Type 2 on such records without worrying about data duplication or increased processing times.
Vishant Bhat, University of Newcastle
Tony Blanch, SAS Consultant
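The paper describes the combined transform itself; as a rough sketch of the underlying idea (an assumption on my part, not the authors' code), a digest of the tracked columns can stand in for CDC when flagging SCD Type 2 changes. Table and column names are hypothetical.

   /* Compute a row hash over the tracked columns of the source */
   data work.stage;
      set work.source;
      length row_hash $32;
      row_hash = put(md5(catx('|', name, address)), $hex32.);
   run;

   proc sort data=work.stage; by cust_id; run;

   /* Compare against the current dimension rows: a new key means
      insert; a differing hash means expire and re-insert (Type 2) */
   data work.delta;
      merge work.dim_customer(in=indim where=(current_flag='Y')
                              keep=cust_id row_hash current_flag
                              rename=(row_hash=dim_hash))
            work.stage(in=instage);
      by cust_id;
      if instage and not indim then change_type = 'INSERT';
      else if instage and row_hash ne dim_hash then change_type = 'UPDATE';
      else delete;   /* unchanged rows need no action */
   run;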
Horizontal data sorting is a very useful technique in advanced data analysis with SAS® programming. Two years ago (SAS® Global Forum Paper 376-2013), we presented and illustrated various methods and approaches to perform horizontal data sorting, and we demonstrated its valuable application in strategic data reporting. However, this technique can also be used as a creative analytic method in advanced business analytics. This paper presents and discusses its innovative and insightful applications in product purchase sequence analyses such as product opening sequence analysis, product affinity analysis, next best offer analysis, time-span analysis, and so on. Compared to other analytic approaches, the horizontal data sorting technique has the distinct advantages of being straightforward, simple, and convenient to use. This technique also produces easy-to-interpret analytic results. Therefore, the technique can have a wide variety of applications in customer data analysis and business analytics fields.
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
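As a minimal sketch of the technique (with hypothetical data set and variable names), CALL SORTC orders values across a row rather than down a column, which is the building block for the sequence analyses above.

   /* One row per customer; products arrive in arbitrary order */
   data work.sorted;
      set work.cust_products;
      array prod {4} $ prod1-prod4;
      call sortc(of prod{*});   /* sort the row left to right */
   run;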
Representational State Transfer (REST) is being used across the industry for designing networked applications to provide lightweight and powerful alternatives to web services such as SOAP and Web Services Description Language (WSDL). Since REST is based entirely on HTTP, SAS® provides everything you need to make REST calls and to process structured and unstructured data alike. This paper takes a look at how some enhancements in the third maintenance release of SAS® 9.4 can benefit you in this area. Learn how the HTTP procedure and other SAS language features provide everything you need to simply and securely use REST.
Joseph Henry, SAS
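As a minimal sketch of a REST call with PROC HTTP (the endpoint is a placeholder), a GET request with a request header looks like the following; reading the reply with the JSON libname engine assumes a later SAS 9.4 maintenance release.

   filename resp temp;

   proc http
      url="https://api.example.com/v1/items"   /* placeholder endpoint */
      method="GET"
      out=resp;
      headers "Accept"="application/json";
   run;

   /* JSON libname engine, available in later 9.4 releases */
   libname items json fileref=resp;

   proc print data=items.root(obs=10);
   run;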
One of the most important factors driving the success of requirements gathering can be easily overlooked: your user community needs a clear understanding of what is possible, from different ways to represent a hierarchy, to how visualizations can drive an analysis, to newer but less common visualizations that are quickly becoming standard. Discussions about desktop access versus mobile deployment, or about which users might need more advanced statistical reporting, can lead to a serious case of option overload. One of the best cures for option overload is to provide your user community with access to template reports they can explore themselves. In this paper, we describe how you can take a single rich data set and build a suite of template reports that demonstrates the full functionality of SAS® Visual Analytics: the most common, most useful report structures, from high-level dashboards to statistically deep dynamic visualizations. We show exactly how to build a dozen template reports from a single data source, simultaneously representing options for color schemes, themes, and other choices to consider. Although this template-suite approach can apply to any industry, our example data set is publicly available, de-identified data on mortgage loan determinations from the Home Mortgage Disclosure Act. Instead of beginning requirements gathering with a blank slate, your users can begin the conversation with 'I would like something like Template #4,' greatly reducing the time and effort required to meet their needs.
Elliot Inman, SAS
Michael Drutar, SAS
The success of any marketing promotion is measured by the incremental response and revenue generated by the targeted population known as Test in comparison with the holdout sample known as Control. An unbiased random Test and Control sampling ensures that the incremental revenue is in fact driven by the marketing intervention. However, isolating the true incremental effect of any particular marketing intervention becomes increasingly challenging in the face of overlapping marketing solicitations. This paper demonstrates how a look-alike model can be applied using the GMATCH algorithm on a SAS® platform to design a truly comparable control group to accurately measure and isolate the impact of a specific marketing intervention.
Mou Dutta, Genpact LLC
Arjun Natarajan, Genpact LLC
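The paper covers the full design; as a hedged sketch of the first step (names are hypothetical), a propensity model supplies the score that a greedy matching macro such as %GMATCH consumes to build the look-alike control group.

   /* Score each customer's probability of receiving the
      marketing treatment */
   proc logistic data=work.campaign descending;
      class segment region;
      model treated = age tenure balance segment region;
      output out=work.scored p=pscore;
   run;

   /* The scores then feed a greedy matching macro, for example:
      %gmatch(data=work.scored, group=treated, id=cust_id,
              mvars=pscore, out=work.matched);
      (the parameter names shown are illustrative only)        */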
As credit unions market themselves to increase their market share against the big banks, they understandably focus on gaining new members. However, they must also retain their existing members. Otherwise, the new members they gain can easily be offset by existing members who leave. Happily, by using predictive analytics as described in this paper, keeping (and further engaging) existing members can actually be much easier and less expensive than enlisting new members. This paper provides a step-by-step overview of a relatively simple but comprehensive approach to reduce member attrition. We first prepare the data for a statistical analysis. With some basic predictive analytics techniques, we can then identify those members who have the highest chance of leaving and the highest value. For each of these members, we can also identify why they would leave, thus suggesting the best way to intervene to retain them. We then make suggestions to improve the model for better accuracy. Finally, we provide suggestions to extend this approach to further engaging existing members and thus increasing their lifetime value. This approach can also be applied to many other organizations and industries. Code snippets are shown for any version of SAS® software; they also require SAS/STAT® software.
Nate Derby, Stakana Analytics
Mark Keintz, Wharton Research Data Services
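As a minimal sketch of the modeling step described above (variable names are hypothetical; SAS/STAT is required):

   /* Score each member's probability of attrition */
   proc logistic data=work.members_prepped descending;
      model attrited = tenure_months n_products avg_balance
                       age n_transactions;
      output out=work.member_scores p=p_attrition;
   run;

   /* Rank members so retention efforts start with the riskiest,
      highest-value members */
   proc sort data=work.member_scores;
      by descending p_attrition;
   run;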
Analyzing massive amounts of big data quickly to get at answers that increase agility--an organization's ability to sense change and respond--and drive time-sensitive business decisions is a competitive differentiator in today's market. Customers gain from the deep understanding of storage architecture provided by EMC combined with the deep expertise in analytics provided by SAS. This session provides an overview of how your mixed analytics SAS® workloads can be transformed on EMC XtremIO, DSSD, and Pivotal solutions. Whether you're working with one of the three primary SAS file systems or SAS® Grid, performance and capacity scale linearly, application latency is largely eliminated, and the complexity of storage tuning is removed from the equation. SAS and EMC have partnered to meet customer challenges and deliver a modern analytic architecture. This unified approach encompasses big data management, analytics discovery, and deployment via end-to-end solutions that solve your big data problems. Learn about best practices from fellow customers and how they deliver new levels of SAS business value.
Mobile devices are an integral part of a business professional's life. These mobile devices are getting increasingly powerful in terms of processor speeds and memory capabilities. Business users can benefit from a more analytical visualization of the data along with their business context. The new SAS® Mobile BI contains many enhancements that facilitate the use of SAS® Analytics in the newest version of SAS® Visual Analytics. This paper demonstrates how to use the new analytical visualization that has been added to SAS Mobile BI from SAS Visual Analytics, for a richer and more insightful experience for business professionals on the go.
Murali Nori, SAS
In Brazil, almost 70% of all loans are made based on pre-approved limits established by the bank. Sicredi wanted to increase the number of loans granted through those limits. In addition, Sicredi wanted an application that focuses on the business user: one that enables business users to change system behavior with little or no IT involvement. The new system will be used in three major areas: in the registration of a new client for whom Sicredi has no history; at the initiative of business users, once the customer already has a relationship with Sicredi, without any customer request; and in the loan approval process, when a limit has not yet been set for the customer, where the system tries to measure a limit based on the loan request before sending the loan to the human approval system. Because of the impact of these changes, we turned the project into a program, and then split that program into three projects. The first project, which we have already finished, aimed to select an application that meets our requirements, and then to develop the credit measurement for the registration phase. SAS® Real-Time Decision Manager was selected because it fulfills our requirements, especially those that pertain to business-user operation: a drag-and-drop interface makes all the technical rules more comprehensible to the business user. So far, four months after releasing the project for implementation by the bank's branches, we have granted more than USD 20 million in pre-approved loan limits. In addition, we have reduced the process time for limit measurement in the branches by 84%. The branches can follow their results and goals through reports developed in SAS® Visual Analytics.
Felipe Lopes Boff
This paper discusses a set of practical recommendations for optimizing the performance and scalability of your Hadoop system using SAS®. Topics include recommendations gleaned from actual deployments from a variety of implementations and distributions. Techniques cover tips for improving performance and working with complex Hadoop technologies such as Kerberos, techniques for improving efficiency when working with data, methods to better leverage the SAS in Hadoop components, and other recommendations. With this information, you can unlock the power of SAS in your Hadoop system.
Nancy Rausch, SAS
Wilbram Hazejager, SAS
The recent controversy regarding former Secretary Hillary Clinton's use of a non-government, privately maintained email server provides a great opportunity to analyze real-world data using a variety of analytic techniques. This email corpus is interesting because of the challenges in acquiring and preparing the data for analysis as well as the variety of analyses that can be performed, including techniques for searching, entity extraction and resolution, natural language processing for topic generation, and social network analysis. Given the potential for politically charged discussion, rest assured there will be no discussion of politics--just fact-based analysis.
Michael Ames, SAS
VBA has been described as a glue language and has been widely used to exchange data between Microsoft products such as Excel, Word, and PowerPoint. Triggering a VBA macro from SAS® via Dynamic Data Exchange (DDE) has been widely discussed in recent years. However, using SAS to send parameters to a VBA macro has seldom been reported. This paper provides a solution to this problem, illustrated by copying Excel tables to PowerPoint using a combination of SAS and VBA. The SAS program rapidly scans all Excel files contained in one folder, passes the file information to VBA as parameters, and triggers the VBA macro to write PowerPoint files in a loop. As a result, a batch of PowerPoint files can be generated with a single mouse click.
Zhu Yanrong, Medtronic
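A hedged sketch of the mechanism (workbook, sheet, and macro names are hypothetical, and Excel must already be running): write the parameter value into a worksheet range over DDE, then trigger the macro that reads it.

   filename xlcmd dde 'excel|system';

   /* Open the macro-enabled workbook */
   data _null_;
      file xlcmd;
      put '[OPEN("C:\reports\driver.xlsm")]';
   run;

   /* Pass a parameter by writing it to a cell the macro reads */
   filename xlparm dde 'excel|[driver.xlsm]Params!r1c1:r1c1';

   data _null_;
      file xlparm;
      put 'C:\reports\tables\';   /* source folder parameter */
   run;

   /* Trigger the VBA macro */
   data _null_;
      file xlcmd;
      put '[RUN("driver.xlsm!CopyTablesToPPT")]';
   run;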
For marketers who are responsible for identifying the best customer to target in a campaign, it is often daunting to determine which media channel, offer, or campaign program is the one the customer is more apt to respond to, and therefore, is more likely to increase revenue. This presentation examines the components of designing campaigns to identify promotable segments of customers and to target the optimal customers using SAS® Marketing Automation integrated with SAS® Marketing Optimization.
Pamela Dixon, SAS
Specifying colors based on group values is a popular practice in visualizing data, but it is not so easy to do, especially when there are many group values. This paper explores three different methods to dynamically assign colors to plots based on their group values: combining the EVAL and IFN functions in plot statements; bringing the DISCRETEATTRMAP block into plot statements; and using the macro from SAS® Sample 40255.
Amos Shu, MedImmune
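As a minimal GTL sketch of the DISCRETEATTRMAP method (group values and colors are hypothetical), the map pins each group value to a fixed color regardless of the order in which the values appear in the data.

   proc template;
      define statgraph groupcolors;
         begingraph;
            discreteattrmap name="trtcolors";
               value "Drug A"  / lineattrs=(color=blue) markerattrs=(color=blue);
               value "Drug B"  / lineattrs=(color=red)  markerattrs=(color=red);
               value "Placebo" / lineattrs=(color=gray) markerattrs=(color=gray);
            enddiscreteattrmap;
            /* Bind the map to the grouping variable */
            discreteattrvar attrvar=trtcolor var=treatment attrmap="trtcolors";
            layout overlay;
               seriesplot x=visit y=response / group=trtcolor;
            endlayout;
         endgraph;
      end;
   run;

   proc sgrender data=work.trial template=groupcolors;
   run;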
Are you going to enable HTTPS for your SAS® environment? Looking to improve the security of your SAS deployment? Do you need more details about how to efficiently configure HTTPS? This paper guides you through the configuration of SAS® 9.4 with HTTPS for the SAS middle tier. We examine how best to implement site-signed Transport Layer Security (TLS) certificates and explore how far you can take the encryption. This paper presents tips and proven practices that can help you be successful.
Stuart Rogers, SAS
SAS® High-Performance Risk distributes financial risk data and big data portfolios with complex analyses across a networked Hadoop Distributed File System (HDFS) grid to support rapid in-memory queries for hundreds of simultaneous users. This data is extremely complex and must be stored in a proprietary format to guarantee data affinity for rapid access. However, customers still desire the ability to view and process this data directly. This paper demonstrates how to use the HPRISK custom file reader to directly access risk data in Hadoop MapReduce jobs, using the HPDS2 procedure and the LASR procedure.
Mike Whitcher, SAS
Stacey Christian, SAS
Phil Hanna, SAS
Don McAlister, SAS
Sensitive data demands elevated security and the flexibility to apply logic that subsets data based on user privileges. Following the instructions in SAS® Visual Analytics: Administration Guide gives you the ability to apply row-level permission conditions. After you have set the permissions, you have to prove through audits who has access and how row-level security is applied. This paper gives you the ability to easily apply, validate, report, and audit all tables that have row-level permissions, along with the groups, users, and conditions that are applied. Take the hours of maintenance and lack of visibility out of row-level secure data, and build confidence in the data and analytics that are provided to the enterprise.
Brandon Kirk, SAS
For SAS® users, PROC TABULATE and PROC REPORT (and its compute blocks) are probably among the most common procedures for calculating and displaying data. It is, however, pretty difficult to calculate and display changes from one column to another using data from other rows with just these two procedures. Compute blocks in PROC REPORT can calculate additional columns, but it would be challenging to pick up values from other rows as inputs. This presentation shows how PROC TABULATE can work with the lag(n) function to calculate rates of change from one period of time to another. This offers the flexibility of feeding into calculations the data retrieved from other rows of the report. PROC REPORT is then used to produce the desired output. The same approach can also be used in a variety of scenarios to produce customized reports.
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
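The paper embeds this logic alongside PROC TABULATE; a simplified DATA step version of the same idea, with hypothetical data set and variable names, shows how LAG retrieves the prior row so a rate of change can be computed and then displayed with PROC REPORT.

   proc sort data=work.enrollment;
      by program term;
   run;

   data work.enrollment_chg;
      set work.enrollment;
      by program;
      prev = lag(enrolled);              /* value from the prior row */
      if first.program then prev = .;    /* no prior term to compare */
      pct_change = (enrolled - prev) / prev;
   run;

   proc report data=work.enrollment_chg;
      column program term enrolled pct_change;
      define program    / group;
      define term       / display;
      define enrolled   / analysis format=comma8.;
      define pct_change / display format=percent8.1 'Change vs Prior Term';
   run;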
A number of SAS® tools can be used to report data, such as the PRINT, MEANS, TABULATE, and REPORT procedures. The REPORT procedure is a single tool that can produce many of the same results as the other SAS tools. Not only can it create detailed reports as PROC PRINT can, but it can also summarize and calculate data as the MEANS and TABULATE procedures do. Unfortunately, despite its power, PROC REPORT seems to be used less often than the other tools, possibly because of its seemingly complex coding. This paper uses PROC REPORT and the Output Delivery System (ODS) to export a large data set into a customized XML file that a user who is not familiar with SAS can easily read. Several options for the COLUMN, DEFINE, and COMPUTE statements are shown that enable you to present your data in a more colorful way. We show how to control the format of selected columns and rows, how to make column headings more meaningful, and how to color selected cells differently to bring attention to the most important data.
Guihong Chen, TCF Bank
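As a hedged sketch of the general technique (the data set, cutoff, style, and file name are hypothetical), PROC REPORT routed through an XML-based ODS destination can color cells conditionally in a compute block:

   ods tagsets.excelxp file='accounts.xml' style=meadow;

   proc report data=work.accounts;
      column branch balance;
      define branch  / group 'Branch';
      define balance / analysis sum format=dollar12. 'Total Balance';
      compute balance;
         /* Highlight negative totals */
         if balance.sum < 0 then
            call define(_col_, 'style', 'style={background=cxFFCCCC}');
      endcomp;
   run;

   ods tagsets.excelxp close;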
The latest releases of SAS® Data Integration Studio, SAS® Data Management Studio and SAS® Data Integration Server, SAS® Data Governance, and SAS/ACCESS® software provide a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types, including Hadoop, cloud, RDBMS, files, unstructured data, and streaming, and the ability to perform ETL and ELT transformations in diverse run-time environments, including SAS®, database systems, Hadoop, Spark, SAS® Analytics, cloud, and data virtualization environments. There are also new capabilities for lineage, impact analysis, clustering, and other data governance features, along with enhancements to master data and metadata management support. This paper provides an overview of the latest features of the SAS® Data Management product suite and includes use cases and examples for leveraging product capabilities.
Nancy Rausch, SAS
Each night on the news we hear the level of the Dow Jones Industrial Average along with the 'first difference,' which is today's price-weighted average minus yesterday's. It is that series of first differences that excites or depresses us each night, as it reflects whether stocks made or lost money that day. Furthermore, the differences form the data series that has the most addressable statistical features. In particular, the differences have the stationarity requirement, which justifies standard distributional results such as asymptotically normal distributions of parameter estimates. Differencing arises in many practical time series because they seem to have what are called 'unit roots,' which mathematically indicate the need to take differences. In 1976, Dickey and Fuller developed the first well-known tests to decide whether differencing is needed. These tests are part of the ARIMA procedure in SAS/ETS®, in addition to many other time series analysis products. I'll review a little of what it was like to do the development and the required computing back then, say a little about why this is an important issue, and focus on examples.
David Dickey, NC State University
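For readers who want to try the tests, a minimal sketch in PROC ARIMA (the data set and variable are hypothetical): the STATIONARITY option requests augmented Dickey-Fuller tests, here at lags 0 through 2.

   proc arima data=work.djia;
      identify var=close stationarity=(adf=(0,1,2));
   run;

   /* If a unit root is not rejected, model the first difference */
   proc arima data=work.djia;
      identify var=close(1);
      estimate p=1 q=1;
   run;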
For many organizations, the answer to whether to manage their data and analytics in a public or private cloud is going to be both. Both can be the answer for many different reasons: common sense logic not to replace a system that already works just to incorporate something new; legal or corporate regulations that require some data, but not all data, to remain in place; and even a desire to provide local employees with a traditional data center experience while providing remote or international employees with cloud-based analytics easily managed through software deployed via Amazon Web Services (AWS). In this paper, we discuss some of the unique technical challenges of managing a hybrid environment, including how to monitor system performance simultaneously for two different systems that might not share the same infrastructure or even provide comparable system monitoring tools; how to manage authorization when access and permissions might be driven by two different security technologies that make implementation of a singular protocol problematic; and how to ensure overall automation of two platforms that might be independently automated, but not originally designed to work together. In this paper, we share lessons learned from a decade of experience implementing hybrid cloud environments.
Ethan Merrill, SAS
Bryan Harkola, SAS
Do you write reports that sometimes have missing categories across all class variables? Some programmers write all sorts of additional DATA step code in order to show the zeros for the missing rows or columns. Did you ever wonder whether there is an easier way to accomplish this? PROC MEANS and PROC TABULATE, in conjunction with PROC FORMAT, can handle this situation with a couple of powerful options. With PROC TABULATE, we can use the PRELOADFMT and PRINTMISS options in conjunction with a user-defined format in PROC FORMAT to accomplish this task. With PROC SUMMARY, we can use the COMPLETETYPES option to get all the rows with zeros. This paper uses examples from Census Bureau tabulations to illustrate the use of these procedures and options to preserve missing rows or columns.
Chris Boniface, U.S. Census Bureau
Janet Wysocki, U.S. Census Bureau
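A minimal sketch of both options, using a hypothetical format and variables: the user-defined format lists every category, PRELOADFMT with PRINTMISS forces empty cells into the PROC TABULATE output, and COMPLETETYPES does the same for PROC SUMMARY.

   proc format;
      value $regfmt 'NE'='Northeast' 'MW'='Midwest'
                    'SO'='South'     'WE'='West';
   run;

   /* Show all regions, even those absent from the data */
   proc tabulate data=work.responses;
      class region / preloadfmt;
      format region $regfmt.;
      table region, n / printmiss misstext='0';
   run;

   /* Produce every class combination, with zero counts preserved */
   proc summary data=work.responses completetypes nway;
      class region / preloadfmt;
      format region $regfmt.;
      var response;
      output out=work.counts n=;
   run;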