Government Papers A-Z

A
Session SAS0465-2017:
Advanced Location Analytics Using Demographic Data from Esri and SAS® Visual Analytics
Location information plays a big role in business data. Everything that happens in a business happens somewhere, whether it's sales of products in different regions or crimes committed in a city. Business analysts typically analyze the historical data that they have gathered over years. One of the most important additions that can help answer more questions qualitatively is demographic data alongside the business data. An analyst can match sales or crimes against population metrics such as gender, age group, family income, race, and other demographic attributes for better insight. This paper demonstrates how a business analyst can bring demographic and lifestyle data from Esri into SAS® Visual Analytics and join it with business data, a workflow made possible by the integration of SAS Visual Analytics with Esri. We demonstrate different methods of accessing Esri demographic data from SAS Visual Analytics, and we also show how you can use custom shapefiles and integrate with Esri Portal for ArcGIS.
Read the paper (PDF)
Murali Nori, SAS
Himesh Patel, SAS
C
Session 0829-2017:
Choose Carefully! An Assessment of Different Sample Designs on Estimates of Official Statistics
Designing a survey is a meticulous process involving a number of steps and many complex choices. For most survey researchers, the choice between a probability and a non-probability sample is relatively simple. However, throughout the sample design process there are more complex choices, each of which can introduce bias into the survey estimates. For example, the sampling statistician must decide whether to stratify the frame and, if so, how many strata to use, whether to stratify explicitly, and how a stratum should be defined. The statistician must also decide whether to use clusters and, if so, how to define a cluster and what the ideal cluster size should be. The factors affecting these choices, along with the impact of different sample designs on survey estimates, are explored in this paper. The SURVEYSELECT procedure in SAS/STAT® 14.1 is used to select a number of samples based on different designs, using data from Jamaica's 2011 Population and Housing Census. Census results are assumed to equal the true population parameters, and the estimates from each selected sample are evaluated against these parameters to assess the impact of different sample designs on point estimates. Design-adjusted survey estimates are computed using the SURVEYMEANS and SURVEYFREQ procedures in SAS/STAT 14.1, and the resultant variances are evaluated to determine the sample design that yields the most precise estimates.
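As a minimal sketch of the kind of workflow described here (the data set, stratum, and analysis variable names below are hypothetical, not taken from the paper), a stratified sample can be drawn with PROC SURVEYSELECT and then analyzed with design-adjusted weights:

/* Sketch: stratified selection plus design-adjusted estimation. */
proc sort data=census2011 out=frame;
   by parish;                          /* PROC SURVEYSELECT expects the frame sorted by strata */
run;

proc surveyselect data=frame out=sample method=srs
                  samprate=0.05 seed=12345 stats;
   strata parish;                      /* an independent simple random sample per stratum */
run;

proc surveymeans data=sample;
   strata parish;                      /* mirror the design at analysis time */
   weight SamplingWeight;              /* created by the STATS option above */
   var income;                         /* hypothetical analysis variable */
run;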
Read the paper (PDF)
Leesha Delatie-Budair, Statistical Institute of Jamaica
Jessica Campbell, Statistical Institute of Jamaica
Session SAS0611-2017:
Counter Radicalization through Investigative Insights and Data Exploitation Using SAS® Viya™
This end-to-end capability demonstration illustrates how SAS® Viya can aid intelligence, homeland security, and law enforcement agencies in counter-radicalization. There are countless examples of agencies failing to apportion significance to isolated pieces of information that, in context, are indicative of an escalating threat and require intervention. Recent terrorist acts have been carried out by radicalized individuals who should have been firmly on the organizational radar. SAS® products enable the law enforcement and homeland security community to analyze and interpret data in order to recognize and triage threats, but intelligence information must be viewed in full context. SAS Viya can rationalize previously disconnected capabilities in a single platform, empowering intelligence, security, and law enforcement agencies. SAS® Visual Investigator provides a hub for SAS® Event Stream Processing, SAS® Visual Scenario Designer, and SAS® Visual Analytics, combining network analysis, triage, and, by leveraging the mobile capability of SAS, operational case management to drive insights, leads, and investigations. This hub provides the capability to ingest relevant external data sources and to cross-reference both internally held data and, crucially, operational intelligence gained from normal policing activities. This presentation chronicles the exposure and substantiation of a radical network and informs tactical and strategic disruption.
Read the paper (PDF)
Lawrie Elder, SAS
D
Session 1172-2017:
Data Analytics and Visualization Tell Your Story with a Web Reporting Framework Based on SAS®
For all business analytics projects, big or small, the results are used to support business or managerial decision-making processes, and many of them eventually lead to business actions. However, executives or decision makers are often confused by, and feel uninformed about, the content when presented with complicated analytics steps, especially when multiple processes or environments are involved. After many years of research and experimentation, a web reporting framework based on SAS® Stored Processes was developed to smooth the communication between data analysts, researchers, and business decision makers. This web reporting framework uses a storytelling style to present essential analytical steps to audiences, with dynamic HTML5 content and drill-down and drill-through functions in text, graph, table, and dashboard formats. No special skills other than SAS® programming are needed to implement a new report. The model-view-controller (MVC) structure in this framework significantly reduces the time needed to develop high-end web reports for audiences not familiar with SAS. Additionally, the report contents can be fed to tablet or smartphone users. A business analytics example is demonstrated during this session. By using this web reporting framework based on SAS Stored Processes, many existing SAS results can be delivered more effectively and persuasively on a SAS® Enterprise BI platform.
Read the paper (PDF)
Qiang Li, Locfit LLC
E
Session 1068-2017:
Establishing an Agile, Self-Service Environment to Empower Agile Analytic Capabilities
Creating an environment that enables and empowers self-service and agile analytic capabilities requires extensive cooperation and agreement between IT and the business. Business and IT users struggle to know which version of the data is valid, where they should get the data from, and how to combine and aggregate all the data sources to apply analytics and deliver results in a timely manner. All the while, IT struggles to supply the business with more and more data that is becoming available through many different sources such as the Internet, sensors, the Internet of Things, and others. In addition, once users start trying to join and aggregate all the different types of data, the manual coding can be complicated and tedious, can demand extraneous resources and processing, and can negatively impact system overhead. If IT enables agile analytics in a data lab, it can alleviate many of these issues, increase productivity, and deliver an effective self-service environment for all users. This self-service environment using SAS® analytics in Teradata has decreased the time required to prepare the data and develop the statistical data model, delivering results in minutes compared to days or even weeks. This session discusses how you can enable agile analytics in a data lab and leverage SAS analytics in Teradata to increase performance, and describes how hundreds of organizations have adopted this concept to deliver self-service capabilities in a streamlined process.
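As a hedged sketch of the general pattern (the server, database, and table names below are placeholders, not details from the session), SAS/ACCESS® can point a libref at the Teradata data lab and push eligible procedure work into the database:

/* Sketch: in-database processing against a Teradata data lab. */
libname lab teradata server="tdprod" database=data_lab
        user=&td_user password=&td_pass;

options sqlgeneration=dbms;            /* let eligible procedures run inside Teradata */

proc means data=lab.sales_detail noprint;
   class region;
   var revenue;
   output out=lab.sales_summary sum=total_revenue;   /* summary lands back in the lab */
run;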
Bob Matsey, Teradata
David Hare, SAS
Session 0767-2017:
Estimation Strategies Involving Pooled Survey Data
Pooling two or more cross-sectional survey data sets (that is, stacking the data sets on top of one another) is a strategy often used by researchers for one of two purposes: (1) to more efficiently conduct significance tests on point estimate changes observed over time, or (2) to increase the sample size in hopes of improving the precision of a point estimate. The latter purpose is especially common when making inferences about a subgroup, or domain, of the target population insufficiently represented by a single survey data set. Using data from the National Survey of Family Growth (NSFG), this paper walks through a series of practical estimation objectives that can be tackled by analyzing data from two or more pooled survey data sets. Where applicable, we comment on the resulting interpretive nuances.
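A minimal sketch of the pooling idea follows; the cycle data set names are hypothetical, and the design variables (SEST, SECU, WGT) and the weight-halving convention should be checked against the NSFG documentation for the cycles being combined:

/* Sketch: stack two survey cycles and compute a design-adjusted estimate. */
data pooled;
   set nsfg_cycle1 (in=a) nsfg_cycle2;
   cycle = ifn(a, 1, 2);               /* keep track of the source cycle */
   pooled_wgt = wgt / 2;               /* one common convention: average over the cycles */
run;

proc surveymeans data=pooled;
   strata sest;                        /* design stratum */
   cluster secu;                       /* design PSU     */
   weight pooled_wgt;
   var evermarried;                    /* hypothetical analysis variable */
run;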
Read the paper (PDF)
Taylor Lewis, George Mason University
F
Session SAS0686-2017:
Fighting Crime in Real Time with SAS® Visual Scenario Designer
Credit card fraud. Loan fraud. Online banking fraud. Money laundering. Terrorism financing. Identity theft. The strains that modern criminals are placing on financial and government institutions demand new approaches to detecting and fighting crime. Traditional methods of analyzing large data sets on a periodic, batch basis are no longer sufficient. SAS® Event Stream Processing provides a framework and run-time architecture for building and deploying analytical models that run continuously on streams of incoming data, which can come from virtually any source: message queues, databases, files, TCP/IP sockets, and so on. SAS® Visual Scenario Designer is a powerful tool for developing, testing, and deploying aggregations, models, and rule sets that run in the SAS® Event Stream Processing Engine. This session explores the technology architecture, data flow, tools, and methodologies required to build a solution based on SAS Visual Scenario Designer that enables organizations to fight crime in real time.
Read the paper (PDF)
John Shipway, SAS
H
Session 0794-2017:
Hands-On Graph Template Language (GTL): Part A
Would you like to be more confident in producing graphs and figures? Do you understand the differences between the OVERLAY, GRIDDED, LATTICE, DATAPANEL, and DATALATTICE layouts? Finally, would you like to learn the fundamental Graph Template Language methods in a relaxed environment that fosters questions? Great! This topic is for you. In this hands-on workshop, you are guided through the fundamental aspects of GTL, and you can try fun and challenging SAS® graphics exercises to help you more easily retain what you have learned.
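The core GTL pattern the workshop builds on can be sketched in a few lines: define a STATGRAPH template with PROC TEMPLATE, then render it against a data set with PROC SGRENDER (the template name and plot choices below are illustrative):

/* Sketch: a single-cell LAYOUT OVERLAY template, rendered with SGRENDER. */
proc template;
   define statgraph basic_overlay;
      begingraph;
         entrytitle "Weight by Height";
         layout overlay;
            scatterplot x=height y=weight / group=sex;
            regressionplot x=height y=weight;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=sashelp.class template=basic_overlay;
run;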
Read the paper (PDF) | Download the data file (ZIP)
Kriss Harris
Session 0864-2017:
Hands-on Graph Template Language (GTL): Part B
Do you need to add annotations to your graphs? Do you need to specify your own colors on the graph? Would you like to add Unicode characters to your graph, or would you like to create templates that can also be used by non-programmers to produce the required figures? Great! Then this topic is for you. In this hands-on workshop, you are guided through the more advanced features of GTL. There are also fun and challenging SAS® graphics exercises to help you more easily retain what you have learned.
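One of the advanced features mentioned, templates reusable by non-programmers, can be sketched with DYNAMIC variables, along with the {UNICODE} syntax for special characters (the template and variable names below are illustrative):

/* Sketch: a parameterized template with a Unicode character in the title. */
proc template;
   define statgraph reusable;
      dynamic XVAR YVAR TITLETEXT;     /* supplied at render time */
      begingraph;
         entrytitle TITLETEXT {unicode '03B1'x};   /* appends a Greek alpha */
         layout overlay;
            scatterplot x=XVAR y=YVAR;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=sashelp.class template=reusable;
   dynamic XVAR="height" YVAR="weight" TITLETEXT="Weight by Height, alpha = ";
run;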
Read the paper (PDF) | Download the data file (ZIP)
Kriss Harris
I
Session 0826-2017:
Improving the Evaluation of Higher Education: Understanding the Myths, Methods, and Metrics
A growing need in higher education is the more effective use of analytics when evaluating the success of a postsecondary institution. The metrics currently used are simplistic measures of graduation rates, publications, and external funding. These measures offer a limited view of the effectiveness of postsecondary institutions. This paper provides a global perspective of the academic progress of students and the business of higher education. It presents innovative metrics that are more effective in evaluating postsecondary-institutional effectiveness.
Read the paper (PDF)
Sean Mulvenon, University of Arkansas
K
Session 1069-2017:
Know Your Tools Before You Use
When analyzing data with SAS®, we often use the SAS DATA step and the SQL procedure to explore and manipulate data. Although both are useful tools in SAS, many SAS users do not fully understand their differences, advantages, and disadvantages, and thus engage in unnecessary, biased debates about them. This paper illustrates and discusses these aspects with real work examples, giving SAS users deeper insight into when to use each. Using the right tool for a given circumstance not only provides an easier and more convenient solution, it also saves time and effort in programming, thus improving work efficiency. Furthermore, the illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
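To make the comparison concrete, here is the same left join written both ways (table and variable names are illustrative):

/* DATA step merge: both inputs must be sorted by the key. */
proc sort data=accounts; by id; run;
proc sort data=balances; by id; run;

data merged_ds;
   merge accounts (in=a) balances;
   by id;
   if a;                               /* keep all accounts, matched or not */
run;

/* PROC SQL: no pre-sorting, and the join is stated declaratively. */
proc sql;
   create table merged_sql as
   select a.*, b.balance
   from accounts as a
        left join balances as b
        on a.id = b.id;
quit;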
Read the paper (PDF)
Justin Jia, TransUnion
M
Session 1009-2017:
Manage Your Parking Lot! Must-Haves and Good-to-Haves for a Highly Effective Analytics Team
Every organization, from the most mature to a day-one start-up, needs to grow organically. A deep understanding of internal customer and operational data is the single biggest catalyst to develop and sustain that growth. Advanced analytics and big data directly feed into this, and there are best practices that any organization (across the entire growth curve) can adopt to drive success. Analytics teams can be drivers of growth, but to be truly effective, key best practices need to be implemented. These practices include in-the-weeds details, like the approach to data hygiene, as well as strategic practices, like team structure and model governance. When executed poorly, business leadership and the analytics team are unable to communicate with each other: they talk past each other and do not work together toward a common goal. When executed well, the analytics team is part of the business solution, aligned with the needs of business decision-makers, and drives the organization forward. Through our engagements, we have discovered best practices in three key areas, all of which are critical to analytics team effectiveness: (1) data hygiene, (2) complex statistical modeling, and (3) team collaboration.
Read the paper (PDF)
Aarti Gupta, Bain & Company
Paul Markowitz, Bain & Company
Session 1231-2017:
Modeling Machiavellianism: Predicting Scores with Fewer Factors
Niccolò Machiavelli, author of The Prince, said things on the order of, "The promise given was a necessity of the past: the word broken is a necessity of the present." His utilitarian philosophy can be summed up by the phrase, "The ends justify the means." As a personality trait, Machiavellianism is characterized by the drive to pursue one's own goals at the cost of others. In 1970, Richard Christie and Florence L. Geis created the MACH-IV test to assign a MACH score to an individual, using 20 Likert-scaled questions. The purpose of this study was to build a regression model that can be used to predict the MACH score of an individual using fewer factors. Such a model could be useful in screening processes where personality is considered, such as job screening, offender profiling, or online dating. The research was conducted on a data set from an online personality test similar to the MACH-IV test. It was hypothesized that a statistically significant model exists that can predict an average MACH score for individuals with similar factors. This hypothesis was accepted.
View the e-poster or slides (PDF)
Patrick Schambach, Kennesaw State University
Session 1400-2017:
More than a Report: Mapping the TABULATE Procedure as a Nested Data Object
The TABULATE procedure has long been a central workhorse of our organization's reporting processes, given that it offers a uniquely concise syntax for obtaining descriptive statistics on deeply grouped and nested categories within a data set. Given the diverse output capabilities of SAS®, it often suffices to simply ship the procedure's completed output elsewhere via the Output Delivery System (ODS). Yet there remain cases in which we want not only to obtain a formatted result, but also to acquire the full nesting tree and logic by which the computations were made. In these cases, we want to treat the details of the TABULATE statements as data, not merely as presentation. I demonstrate how we have solved this problem by parsing our TABULATE statements into a nested tree structure in JSON that can be transferred and easily queried for deep values elsewhere beyond the SAS program. Along the way, this provides an excellent opportunity to walk through the nesting logic of the procedure's statements and explain how to think about the axes, groupings, and set computations that make it tick. The source code for our syntax parser is also available on GitHub for further use.
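For readers unfamiliar with the nesting at issue, a small example (illustrative, not from the paper) shows the * (nesting) and comma (axis-separator) operators whose implied tree such a parser recovers:

/* Sketch: rows nest origin within type; columns nest statistics within msrp. */
proc tabulate data=sashelp.cars;
   class origin type;
   var msrp;
   table type * origin,                /* row axis tree: type > origin       */
         msrp * (mean n);              /* column axis tree: msrp > {mean, n} */
run;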
Read the paper (PDF)
Jason Phillips, The University of Alabama
N
Session 1440-2017:
Need a Graphic for a Scientific Journal? No Problem!
Graphics are an excellent way to display results from multiple statistical analyses and get a visual message across to the correct audience. Scientific journals often have very precise requirements for graphs submitted with manuscripts. While authors often find themselves using tools other than SAS® to create these graphs, the combination of the SGPLOT procedure and the Output Delivery System enables authors to create what they need in the same place where they conducted their analysis. This presentation focuses on two methods for creating a publication-quality graphic in SAS® 9.4 and provides solutions for some issues encountered when doing so.
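A hedged sketch of one common approach follows: route output through the LISTING destination with an explicit DPI, and size the image to the journal's specification (the requirements shown, a 300 DPI TIFF at 5 inches wide, are examples only):

/* Sketch: journal-style image settings for an SGPLOT graph. */
ods _all_ close;
ods listing gpath="/figures" image_dpi=300;
ods graphics / reset width=5in height=4in
               imagename="figure1" imagefmt=tiff;

proc sgplot data=sashelp.class;
   scatter x=height y=weight / group=sex;
   xaxis label="Height (in)";
   yaxis label="Weight (lb)";
run;

ods listing close;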
Read the paper (PDF)
Charlotte Baker, Florida A&M University
Session SAS0517-2017:
Nine Best Practices for Big Data Dashboards Using SAS® Visual Analytics
Creating your first suite of reports using SAS® Visual Analytics is like being a kid in a candy store: with so many options for data visualization, it is difficult to know where to start. Having a plan for implementation can save you a lot of time in development and beyond, especially when you are wrangling big data. This paper helps you make sure that you are parallelizing work (where possible), maximizing your data insights, and creating a polished end product. We provide guidelines for common questions, such as "How many objects are too many?" or "When should I use multiple tabs versus report linking?", to start any data visualizer off on the right foot.
Read the paper (PDF)
Elena Snavely, SAS
P
Session 0855-2017:
Preparing Analysis Data Model (ADaM) Data Sets and Related Files for FDA Submission with SAS®
This paper compiles information from documents produced by the U.S. Food and Drug Administration (FDA), the Clinical Data Interchange Standards Consortium (CDISC), and Computational Sciences Symposium (CSS) workgroups to identify what analysis data and other documentation is to be included in submissions and where it all needs to go. It not only describes requirements, but also includes recommendations for things that aren't so cut-and-dried. It focuses on the New Drug Application (NDA) submissions and a subset of Biologic License Application (BLA) submissions that are covered by the FDA binding guidance documents. Where applicable, SAS® tools are described and examples given.
Read the paper (PDF)
Sandra Minjoe
John Troxell, Accenture Accelerated R&D Services
S
Session 0990-2017:
SAS® Visual Analytics to Inform FDA of Potential Safety Issues for CFSAN-Regulated Products
Web Intelligence is a BusinessObjects web-based application used by the FDA for accessing and querying data files and, ultimately, creating reports from multiple databases. The system allows querying of different databases using common business terms and, in the case of the FDA's Center for Food Safety and Applied Nutrition (CFSAN), careful review of dietary supplement information. However, in order to create timely and efficient reports for the detection of safety signals leading to adverse events, a more efficient system is needed to obtain and visually display the data. Using SAS® Visual Analytics and SAS® Enterprise Guide® can assist with timely extraction of data from multiple databases commonly used by CFSAN and create a more user-friendly interface for management review, helping to make key decisions in the prevention of adverse events for the public.
Read the paper (PDF)
Manuel Kavekos, ORISE
Session SAS0672-2017:
Shipping Container Roulette: A Study in Building a Quick Application to Detect and Investigate Trade-Based Money Laundering
In 2012, US Customs scanned nearly 4% and physically inspected less than 1% of the 11.5 million cargo containers that entered the United States. Laundering money through trade is one of the three primary methods used by criminals and terrorists; the other two are using financial institutions and physically moving money via cash couriers. The Financial Action Task Force (FATF) roughly defines trade-based money laundering (TBML) as disguising proceeds from criminal activity by moving value through the use of trade transactions in an attempt to legitimize their illicit origins. This method of money laundering receives far less attention than those that use financial institutions and couriers. As countries face budget shortfalls and recognize the potential loss of revenue through fraudulent trade, they are becoming more interested in TBML. Like many problems, applying detection methods against relevant data can yield meaningful insights and enable investigators to bring to justice those perpetrating fraud. In this paper, we apply TBML red flag indicators, as defined by John A. Cassara, against shipping and trade data to detect and explore potentially suspicious transactions. (John A. Cassara is an expert in anti-money laundering and counter-terrorism, and author of the book "Trade-Based Money Laundering.") We use the latest detection tools in SAS® Viya™, along with SAS® Visual Investigator.
View the e-poster or slides (PDF)
Daniel Tamburro, SAS
Session SAS0437-2017:
Stacked Ensemble Models for Improved Prediction Accuracy
Ensemble models have become increasingly popular in boosting prediction accuracy over the last several years. Stacked ensemble techniques combine predictions from multiple machine learning algorithms and use these predictions as inputs to a second-level learning algorithm. This paper shows how you can generate a diverse set of models by various methods (such as neural networks, extreme gradient boosting, and matrix factorizations) and then combine them with popular stacking ensemble techniques, including hill climbing, generalized linear models, gradient boosted decision trees, and neural nets, by using both the SAS® 9.4 and SAS® Visual Data Mining and Machine Learning environments. The paper analyzes the application of these techniques to real-life big data problems and demonstrates how using stacked ensembles produces greater prediction accuracy than individual models and naïve ensembling techniques. In addition to training a large number of models, model stacking requires the proper use of cross-validation to avoid overfitting, which makes the process even more computationally expensive. The paper shows how to deal with the computational expense and efficiently manage an ensemble workflow by using parallel computation in a distributed framework.
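The stacking step itself can be sketched briefly. Assume a data set cv_preds of out-of-fold (cross-validated) predictions p_nn, p_gb, p_mf from three base models plus the target y (all names hypothetical, not from the paper), with a regularized GLM as the meta-learner:

/* Sketch: fit a regularized GLM meta-learner on out-of-fold predictions. */
proc glmselect data=cv_preds;
   model y = p_nn p_gb p_mf / selection=elasticnet;
   score data=holdout_preds out=stacked_scores predicted;   /* score new data */
run;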
Read the paper (PDF)
Funda Gunes, SAS
Russ Wolfinger, SAS
Pei-Yi Tan, SAS
T
Session 1450-2017:
The Effects of Socioeconomic, Demographic Variables on US Mortality Using SAS® Visual Analytics
Every visualization tells a story. The effectiveness of showing data through visualization becomes clear as these visualizations tell stories about differences in US mortality, based on the National Longitudinal Mortality Study (NLMS) Public-Use Microdata Samples (PUMS) of 1.2 million cases and 122,000 mortality records. SAS® Visual Analytics is a versatile and flexible tool that easily displays the simple effects of differences in mortality rates between age groups, genders, races, places of birth (native or foreign), education and income levels, and so on. Sophisticated analyses, including logistic regression (with interactions), decision trees, and neural networks, displayed in a clear, concise manner, help describe more interesting relationships among the variables that influence mortality. Some of the most compelling examples: males who live alone have a higher mortality rate than females; white men have higher rates of suicide than black men.
Read the paper (PDF) | View the e-poster or slides (PDF)
Catherine Loveless-Schmitt, U.S. Census Bureau
U
Session 1122-2017:
Using Hash Tables for Creating Electronic Code Books
Hash tables are powerful tools when building an electronic code book, which often requires a lot of match-merging between SAS® data sets. In projects that span multiple years (e.g., longitudinal studies), there are usually thousands of new variables introduced at the end of every year or at the end of each phase of the project. These variables usually have the same stem or core as the previous year's variables; they differ only in a digit or two that usually signifies the year number of the project. So, every year there is an extensive task of comparing thousands of new variables to older variables in order to carry forward key database elements corresponding to the previously defined variables. These elements can include the length of the variable, data type, format, discrete or continuous flag, and so on. In our SAS program, hash objects are used to cut down not only processing time, but also the number of DATA and PROC steps needed to accomplish the task, and the resulting clean, lean code is much easier to understand. A macro is used to create the data set containing new and older variables, and for each new variable, the FIND method of the hash object is used in a loop to find the match to the most recent older variable. What once took around a dozen PROC SQL steps is now a single DATA step using hash tables.
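A miniature of the technique (data set and variable names illustrative, not the paper's) shows the FIND method carrying prior-year metadata forward:

/* Sketch: look up each new variable's stem in a hash of older variables. */
data matched;
   length stem old_name old_format $32;
   if _n_ = 1 then do;
      declare hash old (dataset:"old_vars");    /* one row per older variable */
      old.defineKey("stem");
      old.defineData("old_name", "old_length", "old_format");
      old.defineDone();
      call missing(old_name, old_length, old_format);
   end;
   set new_vars;                                /* one row per new variable */
   stem = prxchange('s/\d+$//', 1, new_name);   /* strip trailing year digits */
   if old.find() = 0 then output;               /* match found: carry metadata forward */
run;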
Read the paper (PDF)
Raghav Adimulam, Westat
Y
Session SAS0603-2017:
You Imported What? Supporting International Trade with Advanced Analytics
Global trade and more people and freight moving across international borders present border and security agencies with a difficult challenge. While supporting freedom of movement, agencies must minimize risks, preserve national security, guarantee that correct duties are collected, deploy human resources to the right place, and ensure that additional checks do not result in increased delays for passengers or cargo. To meet these objectives, border agencies must make the most efficient use of their data, which is often found across disparate intelligence sources. Bringing this data together with powerful analytics can help them identify suspicious events, highlight areas of risk, process watch lists, and notify relevant agents so that they can investigate, take immediate action to intercept illegal or high-risk activities, and report findings. With SAS® Visual Investigator, organizations can use advanced analytical models and surveillance scenarios to identify and score events, and to deliver them to agents and intelligence analysts for investigation and action. SAS Visual Investigator provides analysts with a holistic view of people, cargo, relationships, social networks, patterns, and anomalies, which they can explore through interactive visualizations before capturing their decision and initiating an action.
Read the paper (PDF)
Susan Trueman, SAS