A major issue in America today is the growing gap between the rich and the poor. Although the basic concept has entered the public consciousness, the effects of highly concentrated wealth are hotly debated and poorly understood by the general public. The goal of this paper is to present a fair picture of the wealth gap and its ill effects on American society. Before visualizing the financial gap, we carry out an exploratory and descriptive analysis. Using data on gross annual income, taxable income, and taxes paid, available from the United States Census Bureau website, we estimate the actual spending capacity of people in America and visualize the financial gap on that basis. This analysis helps us answer the following questions: Why is it important to have a fair idea of this gap? At what rate is the average wealth of the American population increasing? How does it affect the tax system? Insights generated from answering these questions will be used for further analysis.
Gaurang Margaj, Oklahoma State University
Tejaswi Jha, Oklahoma State University
Tejashree Pande, University of Nebraska Omaha
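As a rough illustration of the derivation the abstract describes, the sketch below computes a spending-capacity measure from the three Census variables and summarizes its distribution by year. The data set INCOME and its variable names are assumptions for illustration, not the authors' actual code.

  /* Hypothetical sketch: derive spending capacity from Census income data.
     INCOME and the variables GROSS_INCOME, TAXES_PAID, and YEAR are
     assumed names. */
  data spending;
    set income;
    spending_capacity = gross_income - taxes_paid;  /* income left to spend */
  run;

  /* Summarize the distribution of spending capacity by year */
  proc means data=spending mean median p10 p90;
    class year;
    var spending_capacity;
  run;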
Sovereign risk ratings and country risk ratings are conceptually distinct: the former capture the risk of a country defaulting on its commercial debt obligations using economic variables, while the latter cover the downside of a country's business environment, including political and social variables alongside economic ones. In this paper we examine the differences between these two approaches to assessing a country's creditworthiness by statistically testing the predictive power of political and social variables in determining country risk. To do this, we build two models: the first with economic variables as regressors (the sovereign risk model), and the second with economic, political, and social variables as regressors (the country risk model). We then compare the predictive power of the regressors and the model performance metrics between the two. Both are OLS regression models with the country risk rating obtained from S&P as the target variable. On the general assumption that economic variables are driven by political processes and social factors, we expect the second model to have better predictive power. The economic, political, and social indicators used as independent variables are obtained from World Bank Open Data, and the target variable (country risk rating) is obtained from S&P country risk ratings data.
Bhuvaneswari Yallabandi, Oklahoma State University
Vishwanath Srivatsa Kolar Bhaskara, Oklahoma State University
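A minimal sketch of the two-model comparison follows, assuming a data set COUNTRY in which the S&P rating has been mapped to a numeric score; the indicator names are illustrative placeholders, not the authors' actual variable set.

  /* Hypothetical sketch: a sovereign risk model (economic regressors
     only) versus a country risk model (economic + political + social
     regressors). Data set and variable names are assumed. */
  proc reg data=country;
    sovereign: model risk_rating = gdp_growth inflation debt_to_gdp;
    country:   model risk_rating = gdp_growth inflation debt_to_gdp
                                   political_stability rule_of_law
                                   social_unrest_index;
  run;
  quit;

Comparing adjusted R-square and root MSE across the two MODEL statements gives a first read on whether the added political and social regressors improve fit.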
For all business analytics projects, big or small, the results are used to support business or managerial decision-making, and many eventually lead to business actions. However, executives and decision makers are often confused and feel uninformed when presented with complicated analytics steps, especially when multiple processes or environments are involved. After many years of research and experimentation, a web reporting framework based on SAS® Stored Processes was developed to smooth communication between data analysts, researchers, and business decision makers. This web reporting framework uses a storytelling style to present essential analytical steps to audiences, with dynamic HTML5 content and drill-down and drill-through functions in text, graph, table, and dashboard formats. No special skills other than SAS® programming are needed to implement a new report. The model-view-controller (MVC) structure in this framework significantly reduces the time needed to develop high-end web reports for audiences not familiar with SAS. Additionally, the report contents can be delivered to tablet and smartphone users. A business analytics example is demonstrated during this session. By using this web reporting framework based on SAS Stored Processes, many existing SAS results can be delivered more effectively and persuasively on a SAS® Enterprise BI platform.
Qiang Li, Locfit LLC
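Although the framework itself is beyond the scope of an abstract, its core delivery mechanism can be sketched in a few lines: a SAS Stored Process can stream HTML to the client through the reserved _WEBOUT fileref. The page content below is a placeholder, not the framework's actual output.

  /* Minimal sketch of a stored process streaming HTML5 to the browser
     via the reserved _WEBOUT fileref (assumes a SAS Stored Process
     server session). The MVC framework layers on top of this idea. */
  data _null_;
    file _webout;
    put '<!DOCTYPE html>';
    put '<html><body>';
    put '<h1>Sales Summary</h1>';
    put '<p>Generated by a SAS Stored Process.</p>';
    put '</body></html>';
  run;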
Session 1068-2017:
Establishing an Agile, Self-Service Environment to Empower Agile Analytic Capabilities
Creating an environment that enables and empowers self-service and agile analytic capabilities requires extensive collaboration and agreement between IT and the business. Business and IT users struggle to know which version of the data is valid, where they should get the data from, and how to combine and aggregate all the data sources to apply analytics and deliver results in a timely manner. All the while, IT struggles to supply the business with more and more data that is becoming available through many different sources such as the Internet, sensors, the Internet of Things, and others. In addition, once they start trying to join and aggregate all the different types of data, the manual coding can be very complicated and tedious, can demand excessive resources and processing, and can add overhead to the system. If IT enables agile analytics in a data lab, it can alleviate many of these issues, increase productivity, and deliver an effective self-service environment for all users. This self-service environment, using SAS® analytics in Teradata, has decreased the time required to prepare the data and develop the statistical data model, and has delivered results in minutes rather than days or even weeks. This session discusses how you can enable agile analytics in a data lab, how you can leverage SAS analytics in Teradata to increase performance, and how hundreds of organizations have adopted this concept to deliver self-service capabilities in a streamlined process.
Bob Matsey, Teradata
David Hare, SAS
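As a hedged illustration of the SAS-in-Teradata pattern the session describes, the sketch below registers a Teradata library and runs a summary against a lab table, letting SAS push work into the database where possible; the server, credentials, and table names are placeholders.

  /* Hypothetical sketch: connect SAS to a Teradata data lab and
     summarize a lab table. All connection values are placeholders. */
  libname lab teradata server='tdprod' user=analyst password='XXXX'
          database=analytics_lab;

  proc means data=lab.transactions mean sum;
    class region;
    var sale_amount;
  run;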
Implementation of state transition models for loan-level portfolio evaluation was an arduous task until now. Several features have been added to the SAS® High-Performance Risk engine that greatly enhance the ability of users to implement and execute these complex, loan-level models. These new features include model methods, model groups, and transition matrix functions. These features eliminate unnecessary and redundant calculations; enable the user to seamlessly interconnect systems of models; and automatically handle the bulk of the process logic in model implementation that users would otherwise need to code themselves. These added features reduce the time and effort needed to set up model implementation processes and significantly reduce model run time. This paper describes these new features in detail. In addition, we show how these powerful models can be easily implemented by using SAS® Model Implementation Platform with SAS® 9.4. This implementation can help many financial institutions take a huge leap forward in their modeling capabilities.
Shannon Clark, SAS
Credit card fraud. Loan fraud. Online banking fraud. Money laundering. Terrorism financing. Identity theft. The strains that modern criminals place on financial and government institutions demand new approaches to detecting and fighting crime. Traditional methods of analyzing large data sets on a periodic, batch basis are no longer sufficient. SAS® Event Stream Processing provides a framework and run-time architecture for building and deploying analytical models that run continuously on streams of incoming data, which can come from virtually any source: message queues, databases, files, TCP/IP sockets, and so on. SAS® Visual Scenario Designer is a powerful tool for developing, testing, and deploying aggregations, models, and rule sets that run in the SAS® Event Stream Processing Engine. This session explores the technology architecture, data flow, tools, and methodologies required to build a solution based on SAS Visual Scenario Designer that enables organizations to fight crime in real time.
John Shipway, SAS
Session 1281-2017:
Finding National Best Bid and Best Offer: Quote by Quote
U.S. stock exchanges (currently there are 12) are tracked in real time via the Consolidated Trade System (CTS) and the Consolidated Quote System (CQS). CQS contains every updated quote from each of these exchanges, covering some 8,500 stock tickers. It provides the basis by which brokers can honor their fiduciary obligation to investors to execute transactions at the best price, that is, at the National Best Bid or Best Offer (NBBO). With the advent of electronic exchanges and high-frequency trading (timestamps are published to the nanosecond), data set size (approaching 1 billion quotes requiring 80 gigabytes of storage for a normal trading day) has become a major operational consideration for market behavior researchers re-creating NBBO values from quotes. This presentation demonstrates a straightforward use of hash tables for tracking constantly changing quotes for each ticker/exchange combination to provide the NBBO for each ticker at each time point in the trading day.
Mark Keintz, Wharton Research Data Services
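A condensed sketch of the hash approach described above: keep the latest quote for each ticker/exchange pair in a hash object, and rescan that ticker's exchanges after each update. The data set and variable names, and the numeric exchange codes 1-12, are assumptions for illustration.

  /* Hypothetical sketch: track the latest quote per ticker/exchange in
     a hash object and recompute the NBBO after each incoming quote.
     QUOTES, TICKER, EXCH (coded 1-12), BID, and ASK are assumed names. */
  data nbbo;
    if _n_ = 1 then do;
      declare hash q();
      q.defineKey('ticker', 'exch');
      q.defineData('bid', 'ask');
      q.defineDone();
    end;
    set quotes;                  /* one row per quote update */
    q.replace();                 /* overwrite the prior quote for this key */

    /* Scan this ticker's exchanges for the best bid and best offer */
    best_bid = .; best_ask = .;
    do e = 1 to 12;
      if q.find(key: ticker, key: e) = 0 then do;
        if bid > best_bid then best_bid = bid;
        if best_ask = . or ask < best_ask then best_ask = ask;
      end;
    end;
    keep ticker best_bid best_ask;  /* a quote timestamp would normally be kept too */
  run;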
The mean-variance model might be the most famous model in finance. It can determine an optimal portfolio if you know every asset's expected return and the covariance matrix of returns. The tangency portfolio is one such optimal portfolio: among all portfolios, it has the maximum ratio of expected excess return (mean) to risk (variance). This paper uses sample data to compute the tangency portfolio with SAS/IML® code.
Keshan Xia, 3GOLDEN Beijing Technologies Co. Ltd.
Peter Eberhardt, Fernwood Consulting Group Inc.
Matthew Kastin, NORC at the University of Chicago
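For orientation, a minimal SAS/IML sketch of the computation follows; the inputs are made-up sample values, and the weights come from the standard closed form, in which w is proportional to inv(S)*(mu - rf).

  /* Hypothetical sketch: tangency-portfolio weights in SAS/IML.
     MU, S, and RF are made-up sample inputs, not the paper's data. */
  proc iml;
    mu = {0.08, 0.12, 0.10};          /* expected returns */
    S  = {0.04 0.01 0.00,
          0.01 0.09 0.02,
          0.00 0.02 0.06};            /* covariance matrix */
    rf = 0.03;                        /* risk-free rate */
    w  = solve(S, mu - rf);           /* proportional weights: inv(S)*(mu - rf) */
    w  = w / sum(w);                  /* normalize weights to sum to 1 */
    print w;
  quit;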
Would you like to be more confident in producing graphs and figures? Do you understand the differences between the OVERLAY, GRIDDED, LATTICE, DATAPANEL, and DATALATTICE layouts? Finally, would you like to learn the fundamentals of the Graph Template Language (GTL) in a relaxed environment that fosters questions? Great! This topic is for you. In this hands-on workshop, you are guided through the fundamental aspects of GTL, and you can try fun and challenging SAS® graphics exercises to help you retain what you have learned.
Kriss Harris
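A minimal example of the kind covered in the workshop: define a template with an OVERLAY layout in PROC TEMPLATE, then render it with PROC SGRENDER. SASHELP.CLASS is used here as a stand-in data set.

  /* Minimal GTL example: an OVERLAY layout rendered with SGRENDER. */
  proc template;
    define statgraph basic;
      begingraph;
        entrytitle 'Weight by Height';
        layout overlay;
          scatterplot x=height y=weight;
          regressionplot x=height y=weight;
        endlayout;
      endgraph;
    end;
  run;

  proc sgrender data=sashelp.class template=basic;
  run;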
Do you need to add annotations to your graphs? Do you need to specify your own colors on the graph? Would you like to add Unicode characters to your graph, or create templates that non-programmers can also use to produce the required figures? Great! Then this topic is for you. In this hands-on workshop, you are guided through the more advanced features of the Graph Template Language (GTL). There are also fun and challenging SAS® graphics exercises to help you retain what you have learned.
Kriss Harris
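One advanced feature worth previewing is the DYNAMIC statement, which is what makes a template reusable by non-programmers: the same compiled template can be driven by different variables and titles at render time. A small sketch, again using SASHELP.CLASS as a stand-in:

  /* Hypothetical sketch of a reusable template: DYNAMIC parameters let
     non-programmers pick the variables and title at render time. */
  proc template;
    define statgraph reusable;
      dynamic XVAR YVAR TITLETEXT;
      begingraph;
        entrytitle TITLETEXT;
        layout overlay;
          scatterplot x=XVAR y=YVAR;
        endlayout;
      endgraph;
    end;
  run;

  proc sgrender data=sashelp.class template=reusable;
    dynamic XVAR='height' YVAR='weight' TITLETEXT='Weight by Height';
  run;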
Investors usually trade stocks or exchange-traded funds (ETFs) based on a methodology, such as a theory, a model, or a specific chart pattern. There are more than 10,000 securities listed on the US stock market. Picking the right one from so many candidates based on a methodology is usually a big challenge. This paper presents a methodology based on the CANSLIM system and the momentum trading (MT) approach. We often hear of the cup-and-handle shape (C&H), double and multiple bottoms (MB), support and resistance lines (SRL), market direction (MD), fundamental analysis (FA), and technical analysis (TA). Those are all covered by CANSLIM. MT is a trading approach based on a stock's moving direction, or momentum. Both approaches are easy to learn but difficult to apply without an appropriate tool. Brokers' application systems usually cannot provide such filtering because of its complexity. For example, for C&H, where is the handle located? For MB, where is the last bottom at which you should trade? Now this challenging task can be accomplished with SAS®. This paper presents methods for applying the logic and presenting the results graphically with SAS. All SAS users, especially those who work directly in the capital markets business, can benefit from reading this document to achieve their investment goals. Much of the programming logic can also be adopted in SAS finance packages for clients.
Brian Shen, Merlin Clinical Service LLC
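The pattern-detection logic itself is beyond an abstract, but the flavor of a momentum screen in SAS can be sketched: compute a trailing return per ticker and keep the strong movers. The data set and variable names below are assumptions, not the paper's code.

  /* Hypothetical sketch of a simple momentum screen. PRICES is assumed
     to be sorted by TICKER and DATE, with one row per trading day. */
  data momentum;
    set prices;
    by ticker;
    close50 = lag50(close);          /* close 50 trading days earlier */
    if first.ticker then nobs = 0;
    nobs + 1;
    if nobs > 50 then mom50 = close / close50 - 1;
    /* keep tickers whose latest 50-day return exceeds 10% */
    if last.ticker and mom50 > 0.10 then output;
    keep ticker date close mom50;
  run;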
When analyzing data with SAS®, we often use the SAS DATA step and the SQL procedure to explore and manipulate data. Though both are useful tools, many SAS users do not fully understand their differences, advantages, and disadvantages, and thus engage in unnecessary, biased debates about them. This paper illustrates and discusses these aspects with real-world examples, giving SAS users deeper insight into using the two tools. Using the right tool for a given circumstance not only provides an easier and more convenient solution, it also saves programming time and effort, improving work efficiency. Furthermore, the illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
Justin Jia, TransUnion
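To make the comparison concrete, here is the same group summary computed both ways, using SASHELP.CLASS as a stand-in data set; the paper's own examples are drawn from real work instead.

  /* The same group mean computed two ways, for comparison. */
  proc sql;
    create table avg_sql as
    select sex, avg(height) as mean_height
    from sashelp.class
    group by sex;
  quit;

  proc sort data=sashelp.class out=class_sorted; by sex; run;

  data avg_ds;
    set class_sorted;
    by sex;
    if first.sex then do; total = 0; n = 0; end;
    total + height;                  /* sum statement: retained across rows */
    n + 1;
    if last.sex then do;
      mean_height = total / n;
      output;
    end;
    keep sex mean_height;
  run;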
From stock price histories to hospital stay records, analysis of time series data often requires the use of lagged (and occasionally lead) values of one or more analysis variables. For the SAS® user, the central operational task is typically getting lagged (lead) values for each time point in the data set. Although SAS has long provided a LAG function, it has no analogous lead function, which is an especially significant problem in the case of large data series. This paper 1) reviews the LAG function, in particular the powerful but non-intuitive implications of its queue-oriented basis; 2) demonstrates efficient ways to generate leads with the same flexibility as the LAG function, but without the common and expensive recourse to data re-sorting; and 3) shows how to dynamically generate leads and lags through use of the hash object.
Mark Keintz, Wharton Research Data Services
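One non-sorting technique of the kind the paper covers can be previewed in a few lines: a self-merge with FIRSTOBS=2 supplies a one-step lead, while LAG supplies the lag, all in a single pass. PRICES and CLOSE are assumed names.

  /* Hypothetical sketch: one-step lag and lead in a single pass. The
     lead comes from merging the series with itself offset by one
     observation (FIRSTOBS=2); no re-sorting is required. */
  data with_lag_lead;
    merge prices
          prices(firstobs=2 keep=close rename=(close=lead_close));
    lag_close = lag(close);          /* queue-based lag of CLOSE */
  run;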
Every organization, from the most mature to a day-one start-up, needs to grow organically. A deep understanding of internal customer and operational data is the single biggest catalyst of that growth. Advanced analytics and big data feed directly into this, and there are best practices that any organization, at any point on the growth curve, can adopt to drive success. Analytics teams can be drivers of growth, but to be truly effective they need to implement key best practices. These practices include in-the-weeds details, like the approach to data hygiene, as well as strategic practices, like team structure and model governance. When executed poorly, business leadership and the analytics team are unable to communicate with each other; they talk past each other and do not work toward a common goal. When executed well, the analytics team is part of the business solution, aligned with the needs of business decision-makers, and drives the organization forward. Through our engagements, we have discovered best practices in three key areas, all critical to analytics team effectiveness: 1) data hygiene, 2) complex statistical modeling, and 3) team collaboration.
Aarti Gupta, Bain & Company
Paul Markowitz, Bain & Company