Session 3162-2019
There are many time-saving and headache-saving tips and tricks you can use to make working in SAS® Enterprise Guide® a breeze. Did you know that you can change your layout so that you can see your code and your results at the same time? You will learn 20 tips and tricks for working in SAS Enterprise Guide in 20 minutes. One tip per minute, and out of the twenty you are guaranteed to find at least one nugget that will make your life easier.
Kelly Gray, SAS
Session 3938-2019
Defining an analytics strategy and selecting a vendor for your analytics software is a big deal for any organization from start-ups to small and medium-sized enterprises (SMEs) to corporations. Making the right strategic investment decisions in the use of time, resources, and technology can be a make or break situation. In this paper, we present a high-level look at using the common questioning framework 5W1H, sharing insights on the business, technical, and external factors that can help you to make the right choices in software selection and cloud implementation. The Who, What, When, Where, Why, and How questioning method is used to identify key considerations that should be explored and that influence a good decision-making process that determines whether an investment in SAS® Viya® as a modern cloud analytics platform suits your organization's needs.
Daniel Hatton, Sopra Steria
Session 3842-2019
SAS® Viya® comes with a new command-line interface to interact with microservices. This poster attempts to embrace the openness of SAS Viya by creating a chatbot that helps the SAS administrator to perform day-to-day tasks. While there are many ways to automate administrative tasks, this poster explores the latest cloud services such as the Amazon Web Services (AWS) Lex chatbot service and AWS Lambda, which is a serverless computing platform, to create a user-interactive chatbot with a Slack app chatbot as the front end. This chatbot can be easily customized to respond to voice commands. The Lambda function uses the Python runtime environment, and we also explore the way we can interact with microservices using Python.
Sandeep Grande, Core Complete
Session 3604-2019
SAS® web deployment has traditionally been performed using a variety of techniques such as SAS® Application Dispatcher (SAS/IntrNet®), publishing in SAS® Enterprise Guide®, and others. This paper shows a very forward-looking methodology using .NET Core, ASP.NET Core, Entity Framework, SAS® Integration Technologies, and Base SAS®. It is lightweight, fast, runs on Windows and Linux, and is easy to publish. Web content is published at a service endpoint using JSON-formatted REST calls.
Alan Churchill, Savian
Session 3115-2019
If you are copying and pasting code over and over to perform the same operation on multiple variables in a SAS® DATA step, you need to learn about arrays and DO loops. Arrays and DO loops are efficient and powerful data manipulation tools that you should have in your programmer's toolbox. Arrays list the variables on which you want to perform the same operation and can be specified with or without the number of elements or variables in the array. DO loops are used to specify the operation across the elements in the array. This workshop shows you how to use ARRAY statements and DO loops with and without specifying the number of elements in the array on which to perform the operation in the DO loop.
Jennifer Waller, Augusta University
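A minimal sketch of the technique this workshop teaches, with hypothetical data set and variable names (not from the paper):

```sas
/* Recode a sentinel value of 99 to missing across five score variables. */
data recoded;
   set scores;                  /* input data set (assumed) */
   array sc{*} score1-score5;   /* array over the variables to process */
   do i = 1 to dim(sc);         /* DIM returns the element count, so the
                                   variable list can change without editing the loop */
      if sc{i} = 99 then sc{i} = .;
   end;
   drop i;
run;
```

Using `{*}` with DIM is the "without specifying the number of elements" style the abstract mentions; writing `array sc{5}` would be the explicit alternative.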
Session 3497-2019
Many requirements and guidelines need to be taken into consideration when you are setting up the compute infrastructure for SAS® 9.4. Tracking down these requirements can be a daunting task because they are located across several discipline areas of the support.sas.com website. This paper serves as a collection of the pre-installation requirements and guidelines, and presents a general overview of the topics as well as links to additional documentation.
Jim Kuell, SAS
Session 4057-2019
Algorithmic variable reduction techniques have the potential to reduce the intensive preprocessing required for regression. This paper uses algorithmic techniques as a variable selection method and an alternative to regression for identifying salient factors. The first approach used random forest as a substitute for standard pre-processing and variable reduction. However, this method generated a theoretically invalid set of predictors and was not explored further. Subsequently, a set of logistic regression and decision tree models were compared for their efficacy in identifying predictors of Type II diabetes. The optimal decision tree model (a chi-squared model with default SAS® Enterprise Miner (TM) options) performed comparably to the logistic regression model in terms of average squared error. These findings suggest that data mining methods can successfully be used to supplement traditional epidemiologic research methods in future research.
Audrey Whittle, Sharon Pearcey, Wendy Ballew, and Rebekah Fallin, Kennesaw State University
Session 3238-2019
Thanks to the welcome introduction and support of an official SASPy module over the past couple of years, it is now a trivial task to incorporate SAS® into new workflows by leveraging the simple yet presentationally elegant Jupyter Notebook coding and publication environment, along with the broader Python data science ecosystem that comes with it. This paper and presentation begin with an overview of Jupyter Notebooks for the uninitiated, then proceed to explain the essential concepts in SASPy that enable communicating seamlessly with a SAS session from Python code. Included along the way is an examination of Python DataFrames and their practical relationship to SAS data sets, as well as the unique advantages offered by bringing your SAS work into the Notebook workspace and into productive unity with the broad appeal of Python's syntax.
Jason Phillips, The University of Alabama
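For readers new to SASPy, a minimal sketch of the DataFrame-to-data-set round trip the abstract describes (this assumes a local SAS installation and a configured SASPy connection profile, so it will not run without one):

```python
import saspy
import pandas as pd

sas = saspy.SASsession()              # start a SAS session from Python

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.5, 6.1]})
sd = sas.df2sd(df, table="demo")      # pandas DataFrame -> SAS data set
print(sd.means())                     # summary statistics computed by SAS

back = sd.to_df()                     # SAS data set -> pandas DataFrame
sas.endsas()                          # shut down the SAS session
```

`df2sd` and `to_df` are the bridge between the two worlds: the heavy lifting runs in SAS, while inspection and plotting stay in the Notebook.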
Session 3163-2019
Forecasting balance sheet and income statement line items has long been a required activity, especially given the regulatory attention following the economic crisis of 2007-2010. Traditionally, financial institutions (FIs) have approached this activity through scenario and stress testing, which basically tests for outcomes arising from changes in circumstances. Unfortunately, the top-down approach and state of modeling to support line-item-level projections lag far behind other types of modeling being performed at FIs. Also, regulators have found additional weaknesses in approaches based on expert knowledge or historical evidence a priori and are increasingly advocating the exploration of tail risks that can render an FI's business model unviable, and likely cause the institution to default or become insolvent. SAS® Financial Statement Simulation Model presents a new approach to modeling an FI's line items and encompasses support for both scenario-based and outcome-based testing. Our approach avoids the typical inversion problems arising out of traditional independent line-item modeling and accounts for the ties between line items via correlations, concentrations, and migration dynamics. Further, SAS Financial Statement Simulation Model generates the distributions of balance sheets, income statements, and capital ratios that are visualized through SAS® Risk and Finance Workbench.
Shannon Clark, Chad Peterson, Sameer Padhi, and Srinivas Jonnalagadda, SAS Institute
Session 3433-2019
Hybrid vehicles let you use the best technology for a given scenario,
blending fuel efficiency with long driving range, and low emissions with
quick refueling. It's like two cars in one: two motors magically complementing
each other under the hood, invisible to the driver. And as the driver, you
have just one control panel that chooses the best engine for the
circumstances, adjusting to changes in real time. We marketers face a
similar dilemma: we have different engines for different scenarios. Not only
do we have to learn how to drive multiple systems to address the diversity
of interactions we have with our customers, we must make them work together
seamlessly to deliver delightful customer experiences. Sounds more like
Marketing: Impossible! Are you thinking what I'm thinking? Yes! What the
world needs now is a revolutionary approach: hybrid marketing! The good news
is that SAS has been working hard on this and already has an answer for you.
In this session, you learn what hybrid marketing is and why it's important to
your marketing success. You learn how it leverages your existing
investment in SAS®, and keeps your secret sauce (your analytics and
your data) safe, as well as keeps your business processes in compliance with
privacy-related regulations. We share our unique approach to this need,
showcase a few analytic tricks we've already built for marketers, and
describe in both business and technical terms how hybrid marketing can work
for you.
Andy Bober and Steve Hill, SAS
Session 3200-2019
An autocomplete feature has been present in integrated development
environments (IDEs) and source code editors for more than a decade. It
speeds up the process of coding by reducing typos and other common mistakes.
SAS® Studio, for example, suggests the next appropriate keyword
in context. Such software usually uses a declarative approach to
implement autocompletion. The main problem with this approach is the size of
the implementation, which scales with the language's grammar size, leading to
very complex logic. In this paper, we solve the autocomplete engine
construction problem by using machine learning, which automatically derives
an autocompletion engine from a language without explicitly being
programmed. The idea is to use generative models based on a recurrent neural
network (RNN) trained on SAS® source code to generate the
suggestions, like what you're going to type next, much like your phone
predicts your next words. An output of a generative model is a new piece of
something that looks like SAS source code. We say "looks like" because each
output of a generative model is unique, and we can't be sure that the
generated text is valid SAS source code. And here is where SASLint comes to
the rescue. We use SASParser from SASLint to check that generated text is
valid SAS source code and suggest it to a user as an autocompletion. This
autocompletion functionality is demonstrated in a web-based text editor
built on top of Ace.
Alona Bulana, Independent Consultant
Igor Khorlo, Syneos Health
Session 3425-2019
SAS® Visual Analytics 8.3 introduces exciting new features to make it easier to craft beautiful reports. The new release improves efficiency by enabling report authors to reuse work done for data preparation, maintain report states across sessions, and use fewer clicks to get to a report. Users can create more compelling reports by using guides to lay out the report, visual tables with graphical representations in table cells, and enhanced geo-analytics to explore geographic data. The report playback feature is a new way to present the report. Integration with the new SAS® Drive makes it easy to manage reports and share access with collaborators.
Rajiv Ramarajan and Jordan Riley Benson, SAS
Session 3536-2019
Application fraud in the finance industry is a form of fraud that is committed by obtaining various financial products via malicious acts such as identity theft, synthetic IDs, and account takeovers. In this paper, we present a real-time solution for application fraud prevention that was developed within the Fraud and Security Intelligence division at SAS. This solution leverages some of our past successes in solving real-time fraud in other spaces such as payment cards and online payments. In addition to that, application fraud detection requires that we form a network view of the entities involved in the application to assess their risk. To this end, we have engineered a solution that combines techniques such as entity resolution, network and graph theory, real-time signatures, models based on machine learning to produce a score that represents the risk of an application, and the application of business rules for decisioning in real time. This paper describes the highlights of this solution.
Prathaban Mookiah, Tom O'Connell, John Watkins, and Ian Holmes, SAS
Session 3675-2019
The Arkansas All-Payers Claims Database (APCD) contains claims from multiple payer sources, as well as other available data sources. It does not have direct personal identifiers (DPIs), such as name or date of birth (DOB). Instead, it contains a hashed version of the concatenated last name and DOB, which enables linkage of individuals across payers and other data sources in the APCD. An expected match rate was derived for matching birth and death certificates to claims in the APCD. DPIs are contained on birth and death records, as well as Medicaid claims in the Health Data Initiative (HDI) data warehouse that is housed at the Arkansas Center for Health Improvement (ACHI). Hashed IDs were created for individuals contained in these data sources. To calculate a match rate, a denominator of true linkages was determined using the known DPI contained on the HDI data sources. The match scoring algorithm compared first and last names, DOB, and gender between these sources. The SPEDIS and COMPGED functions were used to compare first and last names, and because lower SPEDIS and COMPGED scores represent good matches, exact matches for gender and DOB were set to 0. A rubric was developed based on the scores and visual inspection to determine the true linkages. The numerator consisted of linkages that had a single hashed ID for each record linkage. The rate from this numerator and denominator gave us an estimate of true linkages we could expect when linking hashed IDs in the APCD.
Nichole Sanders, Arkansas Center for Health Improvement
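A sketch of a match-scoring step in the spirit the abstract describes; data set and variable names here are hypothetical, not the paper's:

```sas
/* Score candidate record pairs: lower total_score = better match.
   SPEDIS and COMPGED return 0 for exact string matches, so exact
   gender and DOB matches are likewise coded as 0. */
data scored;
   set candidate_pairs;
   score_last   = spedis(upcase(last_a),  upcase(last_b));
   score_first  = compged(upcase(first_a), upcase(first_b));
   score_dob    = (dob_a    ne dob_b);     /* 0 if exact match, else 1 */
   score_gender = (gender_a ne gender_b);  /* 0 if exact match, else 1 */
   total_score  = score_last + score_first + score_dob + score_gender;
run;
```

A cutoff on `total_score` (chosen by rubric and visual inspection, as in the paper) would then separate likely true linkages from non-matches.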
Session 3584-2019
This paper discusses the nature and use of panel data in the economics and business literature. Various model specifications are presented, including variants of the Parks model, the Da Silva model, one-way and two-way random effects models (error component models), one-way and two-way fixed effects models, and seemingly unrelated regression (SUR) models. We illustrate these respective models with a practical example to demonstrate the use of SAS® programs to handle panel data. Emphasis is placed on the interpretation of the empirical results.
Oral Capps, Jr., Texas A&M University
Session 3358-2019
Have you ever wished that you could use Java Database Connectivity (JDBC) with SAS®? If you have, we have exciting news: SAS/ACCESS® Interface to JDBC! JDBC is one of the most versatile database connectivity APIs available. JDBC is very popular, and it is common for vendors to release JDBC drivers before they release Open Database Connectivity (ODBC) drivers. Some vendors release only JDBC drivers for their products. Now is the time to learn about this important new technology. This paper shows you how to effectively use SAS/ACCESS Interface to JDBC. You will learn how to do the following: configure the JDBC driver; connect to your database; query data from a database; and write data to a database. This paper is the jump start you need to learn how to effectively use SAS and JDBC with your database of choice.
Salman Maher, Bill Oliver, and Keith Handlon, SAS
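A hedged sketch of the connect-query-write workflow the paper covers; the driver class, URL, classpath, and credentials below are placeholders for your own environment:

```sas
/* Assign a libref through the JDBC engine (PostgreSQL shown as an example). */
libname jdbc_db jdbc
   driverclass="org.postgresql.Driver"
   url="jdbc:postgresql://dbhost:5432/mydb"
   classpath="/opt/jdbc/postgresql.jar"
   user=myuser password=mypass;

proc sql;                      /* query data from the database */
   select count(*) from jdbc_db.orders;
quit;

data jdbc_db.new_table;        /* write a SAS data set to the database */
   set work.results;
run;
```

Once the libref is assigned, database tables behave like SAS data sets in DATA steps and procedures, which is the main convenience of the LIBNAME approach.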
Session 3348-2019
Seeing is believing. Understand your data by seeing where it lives. Adding
geographic context to your categorical and quantitative data through
geographic visualizations can present patterns that can help you understand
why things happen and what you can do to encourage that behavior or change
outcomes. In this paper, you learn the geographic capabilities in SAS®
Visual Analytics 8.3, how to use them, and the value they deliver.
Robby Powell, SAS
Session 3440-2019
SAS® Decision Manager enables users to operationalize analytics in real-time settings. In these deployments, analytics can create efficiencies and optimize your business process. However, to properly leverage the full value of analytical decision-making processes, users need access to their data. SAS Decision Manager offers that ability, enabling database queries within their real-time flows. This paper tours this ability, showing how to configure a database connection within SAS® Environment Manager and write custom DS2 code that queries a database within SAS Decision Manager. The results of the query are used to populate a data grid that can be used for further processing within the decision flow.
Ernest Jessee and Dean Taplin, SAS
Session 3526-2019
Automated analysis, a new feature in SAS® Visual Analytics, uses machine learning to automatically suggest business intelligence (BI) visualizations for you. Powered by the SAS® Platform, this analysis is performed quickly so that results are interactive. Once you choose a variable you are interested in, the most relevant factors are automatically identified and easy to compare. As a business analyst or manager, you can immediately understand the results in a report by using the automated narratives and familiar BI visualizations.
Rick Styll, SAS
Session 4055-2019
This paper investigates the relative importance of the qualifying factors stated by Airbnb to promote their Superhost program and makes an effort to discover other implicit factors like property name, description, amenities, and house rules that play a significant role in awarding the Superhost status. The goal of this project is to help hosts improve the experience of their guests and provide them with a framework that will guide them to increase their chances in the Superhost program. The study uses the Airbnb listings from New York City to predict the probability of becoming a Superhost based on text mining. Various statistical models such as logistic regression, least-angle regression (LARS), support vector machine (SVM), and decision tree were built to achieve this objective. Decision tree was the best model, with a misclassification rate of 12% and sensitivity of 52%. Another set of linear and non-linear regression models were built to predict the listing price of all Airbnb listings in New York City. In this case, the stepwise regression model outperformed the other alternatives with an R-squared value of 55%.
Soumya Ranjan Kar Choudhury, Man Singh, Rohit Banerjee, and Andres Manniste, Oklahoma State University
Session 3194-2019
The time is now, and with the help of this unique product, you can achieve your ultimate goal: all tools available in one solution. In this infomercial, you will see how amazing a product this is and how it's changing the whole game. But wait, there's more! This super-demo covers the entire analytical lifecycle end-to-end on one integrated platform in no time, from SAS® Add-in for Microsoft Office through SAS® Data Integration Studio, SAS® Data Preparation on SAS® Viya®, Model Studio, SAS® Visual Analytics, SAS® Lineage, and SAS® Enterprise Guide® in one demo. You will not ever need anything else, not sold in stores anywhere.
Gert Nissen and Vegard Hansen, SAS
Session 3071-2019
Efforts to counter human trafficking internationally must assess data from a variety of sources to determine where best to devote limited resources. These varied sources include the US Department of State's Trafficking in Persons (TIP) reports, verified armed conflict events, migration patterns, and social media. How can analysts effectively tap all the relevant data to best inform decisions to counter human trafficking? This paper builds on two previous SAS® Global Forum submissions that apply SAS® text analytics to the TIP reports, as well as to the Armed Conflict Location Event Data (ACLED) project. We show a framework supporting artificial intelligence on SAS Viya for exploring all data related to counter human trafficking initiatives internationally, incorporating the TIP and ACLED sources as a starting point. The framework includes SAS rule-based and supervised machine learning text analytics results that were not available in the original data sets, providing a depth of computer-generated insight for analysts to explore. We ultimately show how this extensible framework provides decision-makers with capabilities for countering human trafficking internationally, and how it can be expanded as new techniques and sources of information become available.
Tom Sabo, SAS Institute Inc
Session 3136-2019
The Centers for Medicare and Medicaid Services (CMS) Virtual Research Data Center (VRDC) is a resource that has strict stipulations on how to access and report on data. The ETL-related macros covered in this paper help with understanding the data. Also included is a useful SUMMARY procedure utility to help with obtaining useful facts from data (not just VRDC data). This set of ETL macros also includes a way to set low volume limits and satisfy the reporting requirements of the VRDC. That low volume format and function in this toolbox allows for an easier way of reporting and downloading data and meeting the requirements of the VRDC. This paper provides an overview of each of the macros and how they can be used. The following macros are included: 1) an optimized CONTENTS procedure, made more useful for sharing output with users not using SAS® and creating a better data dictionary style; 2) a macro that checks for uniqueness and any missing values on a field like claim ID; 3) a macro for standard frequency and getting the top 15 of a variable; 4) a macro for obtaining the frequency of date fields to ensure accuracy; 5) a macro to check numeric fields, from amounts to dates, to ensure that values such as P25, minimum, maximum, and P75 are indeed accurate for the intended purpose of the field.
Zeke Torres, RedMane Technology
Session 3235-2019
The SAS® Output Delivery System (ODS) destination for Word enables customers to deliver SAS® reports as native Microsoft Word documents. The ODS WORD statement generates reports in the Office Open XML Document (.docx) format, which has been standard in Microsoft Word since 2007. The .docx format uses ZIP compression, which makes for a smaller storage footprint and speedier downloading. ODS WORD is preproduction in the sixth maintenance release of SAS 9.4. This paper shows you how to make a SAS report with ODS WORD. You learn how to create your report's content (images, tables, and text). You also learn how to customize aspects of your report's presentation (theme and styles). And you learn how to enhance your report with reader-friendly features such as a table of contents and custom page numbering. If you're cutting and pasting your SAS output into Microsoft Word documents, then this paper is especially for you!
David Kelley, SAS
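The basic pattern is a sandwich: open the destination, run the reporting steps, close it. A minimal sketch (the output path and STYLE= choice are assumptions, not the paper's):

```sas
ods word file="cars.docx" style=HTMLBlue;   /* open the Word destination */

title "Average City and Highway MPG by Origin";
proc means data=sashelp.cars mean maxdec=1;
   class origin;
   var mpg_city mpg_highway;
run;

ods word close;                             /* finish writing cars.docx */
```

Everything produced between the two ODS WORD statements lands in the .docx file as native Word tables and text.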
Session 3274-2019
The main goal of this paper is to suggest a template for the process of building a high-performing deep neural network in SAS®. A reader should be able to use the process shown in this paper as a template for their own work in deep neural networks. Even though deep neural networks are popular, not many papers discuss them in the context of a supporting, overall process of building a neural network. A second goal is to show details of several nodes in SAS® Enterprise Miner (TM), especially the powerful but often overlooked AutoNeural node on the Model tab. An additional goal is to build a network that can outperform a network in a paper by one of my professors. Deep learning is a kind of neural network and a specific kind of machine learning, a branch of artificial intelligence. Deep learning is a recent and powerful machine learning algorithm that enables a computer to build a multi-layer non-linear model.
Yuting Tian, West Chester University
Session 3793-2019
Rural health trends are of great interest to service providers so that an effective method of intervention can be implemented. Technological intervention has been implemented in various rural areas of the United States and its effectiveness has been studied to some extent. The objective of this research project is to examine recent trends in rural health services and determine whether technological implementations have made an impact on such trends. The focus area was chosen to be closer to home: the Southern United States. We highlight trends seen in rural health data, such as whether there is an expansion or shrinkage in Southern rural health clinics (RHC), federally qualified health centers (FQHC), and critical access hospitals (CAH). Further, we assess how technology, specifically telemedicine, is being used in rural areas. To answer these questions, we used HCRIS cost reports, Area Health Research Files (AHRF), Medicare Provider Utilization and Payment Data, and a review of current literature. Using these data, we were able to observe trends at the state, clinic, and provider level in these clinics over the period 2010-2016, and to determine how these trends can project the future outlook of rural health. We observed that rural health in the Deep South is expanding across all measures analyzed: number of clinics, full-time employees, patient visits, overall clinic costs, and cost per visit. We also saw that there is growth in technology use in these clinics, with a focus on telemedicine.
Jujit Kunwor, Aaron Driskell, and Courtney Hanson, University of Alabama
Session 3785-2019
The digital economy is showing tremendous growth in the 21st century, and it is having a massive impact on the current society. E-commerce is one element of the Internet of Things (IoT), and its worldwide retail sales amounted to 2.3 trillion US dollars. This amount shows the popularity of online shopping, and it indicates an evolution of retailers in this industry. A recent study conducted by GE Capital Retail Bank found that 81% of consumers perform online research before buying products. This indicates that consumers rely heavily on others' opinions and experiences in order to buy a product. Businesses need to understand customers' views of their products and of competitors' products for strategic marketing. E-commerce businesses provide a platform to generate user-experience content through customer reviews, which are vital for a buyer to choose the best product out of numerous similar products available in the market. Companies need to analyze customers' perspectives through reviews for better business, evaluate customer engagement, and devise strategies for the launch of their products. This paper focuses on analyzing customer reviews primarily on Amazon using Python and SAS® Text Miner. This project determined which product features are given high ratings or low ratings, and how the high-rated features of a best-selling product perform compared to a similar product that is sold by a different vendor.
Manideep Mellachervu, Oklahoma State University
Session 3013-2019
Survival analysis handles time-to-event data. Classical methods, such as the
log-rank test and the Cox proportional hazards model, focus on the hazard
function and are most suitable when the proportional hazards assumption
holds. When it does not hold, restricted mean survival time (RMST) methods
often apply. The RMST is the expected survival time subject to a specific
time horizon, and it is an alternative measure to summarize the survival
profile. RMST-based inference has attracted attention from practitioners for
its capability to handle non-proportionality. This paper introduces RMST
methods in SAS/STAT® software: you can now use the RMSTREG
procedure to fit linear and log-linear models, and you can use the RMST
option in PROC LIFETEST to estimate the restricted mean survival time and
make comparisons between groups. The paper discusses the rationale behind
the RMST-based approach, outlines its recent development, and uses examples
to illustrate real-world applications of the RMSTREG and LIFETEST
procedures.
Changbin Guo and So Ying, SAS
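A sketch of the two procedures the paper introduces, using a generic survival data set; the data set, variable names, and the time horizon of 24 are illustrative choices, not the paper's example:

```sas
/* Fit a linear RMST regression model with time horizon tau = 24. */
proc rmstreg data=mysurv tau=24;
   class trt;
   model time*status(0) = trt / link=linear;
run;

/* Estimate the RMST nonparametrically and compare it between groups. */
proc lifetest data=mysurv rmst(tau=24);
   time time*status(0);
   strata trt;
run;
```

In both steps, `status(0)` flags 0 as the censoring value, and TAU= sets the restriction time over which mean survival is computed.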
Session 3765-2019
Structural Equation Modeling (SEM) is a statistical technique to model hypothesized relationships among observed (manifest) and unobserved (latent) variables. SEM is not only widely applied in the social sciences, but also is suitable in other areas such as business, ecology, engineering, finance, and pharmaceutical research. Under certain assumptions, an SEM can support causal inference as a Structural Causal Model (SCM). Path diagrams, commonly used with SEM, are visual representations of the hypothesized associations and dependencies and are particularly useful when studying causality. This paper describes how to formulate and interpret structural models as causal models. Using the PATH modeling language within the CALIS procedure, we fit SEMs for causal inference; we focus on interpreting model estimates, fit statistics, and how to infer causality from direct and indirect effects. SEM is appropriate for both observational data and randomized controlled experiments. Therefore, we support our discussion with two examples: the first application analyzes observational data from the Behavioral Risk Factor Surveillance System (BRFSS) to understand the relationship between obesity and depression; and the second example uses data from a designed engineering experiment to identify the settings that yield the fastest counting of bolts.
Banoo Madhanagopal and John Amrhein, McDougall Scientific Ltd
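A hedged sketch of the PATH modeling language for a model in the spirit of the first example; the data set, variable names, and hypothesized paths below are illustrative, not the paper's actual BRFSS specification:

```sas
/* Hypothetical path model: Exercise affects Depression both directly
   and indirectly through Obesity. */
proc calis data=brfss_subset;
   path
      Obesity    <- Exercise,
      Depression <- Obesity,
      Depression <- Exercise;
   effpart Exercise -> Depression;   /* partition the total effect into
                                        direct and indirect components */
run;
```

The EFFPART statement is what makes the direct-versus-indirect decomposition, central to the causal interpretation discussed above, explicit in the output.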
Session 3734-2019
Structured Query Language (SQL) was implemented in SAS® as the SQL procedure. A benefit of PROC SQL is that you can write queries or execute SQL statements on a SAS data set or in a database. Another benefit is that the SQL language makes it possible to combine the functionality of a DATA step and multiple PROC steps all into one procedure. Although useful, PROC SQL is limited in that it can make only one database connection per query, and it is not compliant with the American National Standards Institute (ANSI) SQL syntax. Due to this non-compliance, interacting with ANSI standard-compliant databases becomes more difficult. To address these limitations, a new procedure, PROC FEDSQL, was introduced in SAS® 9.4; it offers faster performance, the ability to connect to multiple databases in one query, increased security, broader support of data types, and full compliance with the ANSI SQL:1999 core standard, among others. In this paper, we explore these acclaimed benefits. We compare PROC SQL and PROC FEDSQL in terms of syntax, performance, user friendliness, and output. We determine which procedure should be used in various use cases, and how and when to incorporate PROC FEDSQL into your workflows.
Cuyler Huffman, Matthew Lypka, and Jessica Parker, Spectrum Health
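For many queries the two procedures look nearly identical, which makes migration straightforward; a minimal side-by-side sketch (table and column names are illustrative):

```sas
/* Classic PROC SQL version. */
proc sql;
   select dept, count(*) as n
   from work.staff
   group by dept;
quit;

/* PROC FEDSQL version: ANSI SQL:1999-compliant, and able to reference
   tables from multiple database connections in a single query. */
proc fedsql;
   select dept, count(*) as n
   from work.staff
   group by dept;
quit;
```

The differences the paper explores show up at the edges: data types, cross-connection joins, and behavior that PROC SQL handles with SAS-specific extensions rather than ANSI syntax.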
Session 3800-2019
This e-poster showcases some of the potential applications of the command-line interface tools typically found within operating systems based on Linux to workflows built on SAS®. By executing SAS from the command line, a plethora of tools become available to the user. Some tools that are explored in this presentation are Bash, rsync, and Make. We demonstrate the application of these tools through individual examples for tasks such as automated code execution, scheduling, backup, and parallelization with SAS. The integration of the tools for advanced applications, such as automated software testing and dynamic parallelization, are also presented. Practical examples highlight the simplicity of implementation and potential efficiency improvements associated with these tools for typical SAS applications.
Shahriar Khosravi and Boone Tensuda, BMO Financial Group
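A sketch of the batch-execution and parallelization pattern described above; the path to the `sas` executable, the program names, and the backup destination are assumptions for illustration:

```bash
#!/bin/bash
# Run two independent SAS programs in parallel, then a dependent one.
for prg in extract.sas transform.sas; do
    sas -sysin "$prg" -log "${prg%.sas}.log" &   # launch in the background
done
wait                                             # block until both finish

sas -sysin report.sas -log report.log            # dependent step runs last

# Mirror the output directory to a backup host.
rsync -av --delete output/ backup-host:/data/backups/project/
```

Wrapping such a script in a cron entry or a Makefile rule gives the scheduling and dependency-tracking behaviors the presentation demonstrates.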
Session 3634-2019
A relatively new procedure in SAS/STAT® software, PROC PSMATCH enables users to perform propensity score methods for observational study designs. Complex survey data sets are increasingly being used in many fields. Sampling weights, strata, and clusters provided with these data sets are important to include when calculating population-based estimates. Applying propensity score methods to complex surveys is possible with PROC PSMATCH. However, additional steps are required to properly account for design elements within these data sets to generalize the results to the population of interest. This paper discusses working with complex survey data sets and propensity score methods together. An illustrative example demonstrates the use of PROC PSMATCH in conjunction with other SAS/STAT procedures to obtain population-based estimates with propensity score methods. Additional steps needed for variable balance assessment and estimation of treatment effects are highlighted.
Patrick Karabon, Oakland University William Beaumont School of Medicine
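A sketch of the two-step workflow described in this abstract, with hypothetical data set and variable names (treat, samplewt, stratum, psu, and so on), might pair PROC PSMATCH with PROC SURVEYREG like this:

```sas
/* Hypothetical names throughout: match on the propensity score first */
proc psmatch data=survey region=allobs;
   class treat;
   psmodel treat(treated='1') = age sex income;
   match method=greedy(k=1) caliper=0.25;
   output out(obs=match)=matched matchid=_mid;
run;

/* Then estimate the treatment effect while accounting for the
   strata, clusters, and sampling weights of the survey design */
proc surveyreg data=matched;
   strata stratum;
   cluster psu;
   weight samplewt;
   class treat;
   model outcome = treat;
run;
```

The key point is that the design variables must be carried through to the matched data set so that population-based estimates remain valid.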
Session 3630-2019
Federal and state governments and private industries spend countless hours performing manual reconciliations between multiple accounting and transactional systems. Pulling unstructured information from two or more systems and trying to manually determine unreconciled items was an area that Texas Parks and Wildlife Department (TPWD) targeted to improve. This paradigm shift, led by automated reconciliation, helped the agency achieve significant time efficiencies by redesigning processes and freeing up staff time to focus more on analyzing and addressing differences. Most SAS Business Intelligence (BI) implementations center on various types of data analytics. However, by expanding the SAS BI toolset into other areas such as financial reporting and automated reconciliation, TPWD further justified the cost of using SAS BI.
Alejandro Farias and Drew Turner, Texas Parks and Wildlife
Session 3578-2019
Why get certified? What's new? How do you set yourself apart from the rest? And how do you nail the SAS® Certification exams? In 2019, SAS® Certification celebrates 20 years. Just as SAS continues to evolve, so does its certification program, as demand increases for highly skilled SAS® consultants. To date, most certification examinations have been multiple-choice tests. In 2019, more practical tests are coming, not just for the predictive modeling certification but for programming certifications as well, requiring candidates to access SAS to create and execute code as part of the certification exam. In this session, we review everything you need to know to prove your worth as a SAS certified consultant: the certifications that are available in programming, data management, modeling, administration, visual analytics, data science, and more. We also look at the various ways to prepare for the certification exam: the training courses that are available (including free online training); the SAS Certification prep guides for Base SAS® programming and for advanced programming; what the SAS® Academy has to offer; how to download and run your own free installation of SAS to prepare for the SAS Certification exams; and lots more. Embrace lifelong learning, and use SAS Certification to validate that learning.
Andrew Howell, National Australia Bank
Session 4051-2019
A shortage of skilled resident US workers has resulted in the need to import workers from other countries. The H-1B visa lottery system helps fill these needs. This paper investigates sampling methods that can be used to predict the outcome of an H-1B application, with an aim to contain application processing costs for both government and private industry. It specifically investigates how the imbalanced nature of the case outcomes can adversely affect the predictive power of different models, and ways to address this issue. Reclassifying the target into different binary variables for government and private industry showed the benefits of different modeling techniques. The best model for the government focuses on forecasting withdrawals with a balanced gradient boosting model. For private industry, the best model focuses on predicting certified results with a decision tree trained on an 80% training split.
Matthew Bunker and Owen Kelly, Kennesaw State University
Session 3628-2019
With the onset of Industry 4.0 and the ubiquity of sensors leading to large volumes of data, together with the advancements made in artificial intelligence (AI), this paper aims to demonstrate the concept of predicting machine faults by applying advanced data analysis techniques and enhancing maintenance efforts through the use of augmented reality (AR). Relevant data with regard to the health and performance of the machines were collected and transmitted through an Internet of Things (IoT) gateway to a centralized location, where the factory guardians are in place to monitor in real time. The dashboards should not only enable the monitoring to be done in its entirety, but also enable simple AI diagnoses to be performed via drill-downs and ad hoc exploration, as well as automatically capture exceptions and provide notifications. Exceptions and alerts that require intervention can be broadcast remotely to the technicians in the field. The technicians tending to the machines can leverage the knowledge base and the AR diagnosis system for real-time decisions and visual assistance in diagnosing and repairing the machines. This not only reduces the need to perform intrusive troubleshooting but also greatly reduces repair time, enabling machines to be back up and running sooner.
Yee Yang Tay, UTM
Ye Sheng Koh, Universiti Sains Malaysia
Marvin Dares, Universiti Teknologi Malaysia
Session 3834-2019
Building programs that leverage the analytic and reporting powers of SAS® reduces the time to solution for critical tasks. This paper discusses, through example, how Base SAS® tools such as the FILENAME statement, macros, and the Output Delivery System (ODS), combined with the scheduler housed in SAS® Enterprise Guide®, can be used to automate the process of moving from raw source data to an insightful dashboard view quickly and efficiently. This paper is divided into two main parts. In the first part, we discuss how to use scripting language to bring data from a file location into the SAS environment, how to build programs that clean and subset the data, and how to transform the entire process into a macro. In the second part, we discuss how to use SAS procedures and ODS to transform the resulting data into a quality improvement dashboard view that can be set to run automatically and be distributed to team members at a scheduled time.
Shavonne Standifer, Truman Medical Center
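A condensed sketch of such a pipeline, with hypothetical file paths and variable names, wraps the import, summarization, and ODS output in a single macro that a scheduler can invoke:

```sas
/* Hypothetical paths and names: read the raw file, roll it up,
   and publish the dashboard view as HTML */
%macro refresh(infile);
   proc import datafile="C:\rawdata\&infile" out=work.raw
        dbms=csv replace;
   run;

   proc means data=work.raw noprint;
      class unit;
      var metric;
      output out=work.summary mean=avg_metric;
   run;

   ods html file="C:\reports\dashboard.html";
   proc sgplot data=work.summary;
      vbar unit / response=avg_metric;
   run;
   ods html close;
%mend refresh;

%refresh(daily_extract.csv)
```

Scheduling the program in SAS Enterprise Guide then makes the refresh fully hands-off.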
Session 3790-2019
The U.S. Food and Drug Administration (FDA) has published FDA Business Rules and expects sponsors to submit SDTM data sets that are compliant with the rules and with CDISC SDTMIG. These rules assess whether the data supports regulatory review and analysis. Some of them are specific to FDA internal processes rather than to CDISC SDTM standards. Pinnacle 21 is the tool most commonly used by both the industry and FDA to check compliance with both FDA Business Rules and CDISC rules. However, Pinnacle 21 is usually used at a late stage of the SDTM programming development cycle, and it cannot help users resolve its findings regarding Error and Warning messages, even if it is used at a very early stage. This paper presents a systematic approach to automating the SDTM programming process to ensure compliance with FDA Business Rules. It covers study data collection design, data collection (edit checking), a standard SDTM programming process, and in-house macros for automatically reporting and fixing the issues to address non-compliance with FDA Business Rules. It avoids inefficient use of resources for repeated verification of compliance and resolution of the findings from Pinnacle 21 for these rules. In fact, some of these non-compliance issues are often very costly or too late to fix at a late stage. This paper can assist readers in preparing SDTM data sets that are compliant with FDA Business Rules and with CDISC standards for FDA submission, to ensure FDA submission quality in addition to cost-effectiveness and efficiency.
Xiangchen (Bob) Cui, Hao Guan, Min Chen, and Letan (Cleo) Lin, Alkermes Inc
Session 3209-2019
In the pharmaceutical and contract research organization (CRO) industries, Microsoft Excel is widely used to create mapping specs and in reporting. SAS® enables you to export data into an Excel spreadsheet and do Excel formatting in many ways. But, Dynamic Data Exchange (DDE) is the only technique that provides total control over the Excel output. DDE uses a client/server relationship to enable a client application to request information from a server application. SAS is always the client. In this role, SAS requests data from server applications, sends data to server applications, or sends commands to server applications. Unfortunately, DDE is not supported on SAS® Grid Computing. This paper explores a replacement solution for DDE on a SAS grid by using a SAS® Stored Process, Microsoft Visual Basic for Applications (VBA), and SAS® Add-in for Microsoft Office. The paper also explores the automation process and extends the solution to format Microsoft Word documents.
Ajay Gupta, PPD
Session 3349-2019
Have you ever wished that you could get assistance with writing rules in LITI (language interpretation for textual information)? Now you can! Information extraction in SAS® Visual Text Analytics uses the LITI syntax, which is a powerful way to uncover useful information hidden in textual data. The extracted information can be used as new structured data for predictive models, for faceted searches, or can be added to reports to enable better business decisions. In this presentation and paper, we demonstrate how to use the rule generation feature in SAS Visual Text Analytics. We also explain how the rules are generated in the background and what you can expect from the resulting rule set. The resulting rules can be used directly in the model, edited by the user, or used in an exploratory way to find new candidate rule elements. We provide tips and best practices for using rule generation to build your information extraction models faster and easier.
Teresa Jade, Xu Yang, Christina Hsiao, and Aaron Arthur, SAS
Session 3824-2019
Many solutions based on SAS® use the SAS macro language to structure, automate, or make SAS code more dynamic. When you scale out to a large application with thousands of lines of macro code, a SAS macro usually becomes a bottleneck in development and starts to hit its limitations. Maintenance, debugging, and testing can become a challenging and time-consuming process even for experienced SAS programmers. The Lua language was designed to be embedded into applications to provide scripting functionality. Compared to alternatives like PROC GROOVY, which runs in a separate JVM process, PROC LUA (introduced in SAS® 9.4M3) runs inside a SAS process and, as a result, has access to the C function bindings that SAS has for reading and writing data sets, and so on. With that, plus its performance, small footprint, elegant syntax, and support for data structures, PROC LUA becomes a very promising replacement for the SAS macro language in big projects. A transpiler is a program that takes the source code of a program written in one programming language as its input and produces the equivalent source code in another programming language. In other words, it is a source-to-source compiler (like PROC DSTODS2 in SAS). In this paper, we consider performance differences in terms of CPU, memory, and disk I/O between PROC LUA and the SAS macro language for common coding situations, and build a transpiler that parses SAS macro code, creates an abstract syntax tree representation, and translates it to Lua source code.
Igor Khorlo, Syneos Health
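To make the comparison concrete, here is a minimal sketch of the PROC LUA side, replacing a macro %do loop with a Lua for loop (the code generated is illustrative only):

```sas
proc lua;
submit;
  -- Lua's for loop replaces a macro %do loop; sas.submit resolves
  -- @year@ from the Lua variable before running the generated SAS code
  for year = 2017, 2019 do
    sas.submit([[%put NOTE: processing year @year@;]])
  end
endsubmit;
run;
```

Unlike macro text substitution, the Lua variable `year` is a real numeric value with normal scoping, which is part of what makes large code bases easier to maintain.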
Session 3516-2019
Do you know who is visiting your website? Have they lurked here before, are they ready to purchase, or, worse yet, have they just purchased and you don't even know it? Marketers know the risk of treating customers like strangers: research shows that consumers equate non-personalized brand interaction with a poor customer experience, and that personalized recommendations have a 5.5 times higher conversion rate. Marketers strive to provide the highly personalized, relevant messaging that consumers now expect with each interaction, but they are challenged to manage consumer identity across all channels in real time, and to use the real-time interactions to improve consumer profiles and targeting. In this paper, we share how SAS® Customer Intelligence solutions helped a professional sports team identify and understand their prospects and customers better in order to improve consumer experience and marketing effectiveness across online and offline channels. SAS® solutions were used to personalize messaging and refine marketing strategy, which improved insight in real time across their internal and third-party systems in the customer journey.
Mia Patterson-Brennan and Amy Glassman, SAS
Session 3114-2019
Frequently, a programmer or analyst needs to transfer data between SAS® and Microsoft Excel. By default, both the SAS LIBNAME engine and the IMPORT and EXPORT procedures can transfer only a single file at a time between SAS and Excel. In the case of multiple files, it is too tedious to transfer files one by one. CALL EXECUTE is a SAS DATA step call routine with two magic features: 1) it can mix DATA step logic with procedures; and 2) it can pass DATA step values to SAS procedures or into an argument in a macro. These two features make it the best candidate for dynamic data processing and repeated tasks like batch file exchange between SAS and Excel, and they enable a programmer or analyst to write compact and highly efficient code. This paper focuses on solutions for batch file exchange between SAS and Excel (importing from or exporting to multiple Excel files, or multiple data sheets in one Excel file, by using either the SAS LIBNAME engine or PROC IMPORT and PROC EXPORT) through CALL EXECUTE.
Dadong Li, Regeneron Pharmaceuticals, Inc.
Eduardo Iturrate, New York University Medical Center
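A minimal sketch of the batch-import pattern, with a hypothetical directory path, generates one PROC IMPORT call per Excel file found:

```sas
/* Hypothetical directory; assumes the file names form valid
   SAS data set names. One PROC IMPORT is generated per .xlsx file. */
data _null_;
   length fname $200;
   rc = filename('dir', 'C:\data');
   did = dopen('dir');
   do i = 1 to dnum(did);
      fname = dread(did, i);
      if lowcase(scan(fname, -1, '.')) = 'xlsx' then
         call execute(cats(
            'proc import datafile="C:\data\', fname, '" out=work.',
            scan(fname, 1, '.'), ' dbms=xlsx replace; run;'));
   end;
   rc = dclose(did);
run;
```

The generated steps run after the DATA step finishes, so one short program imports an entire directory.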
Session 3020-2019
"Be Prepared" is not just a motto for the Boy Scouts of America; it is also an important concept for your data, on which you base your business decisions. SAS® Data Preparation, powered by SAS® Viya®, provides self-service capabilities for preparing your data to create more consistent and accurate reports or analytic models, which ultimately lead to better and more informed business decisions. This presentation walks through using the profiling and data cleansing features of SAS Data Preparation to show how a non-technical person can use the point-and-click interface to prepare data.
Mary Kathryn Queen, SAS
Session 3111-2019
SAS® is increasingly being used in many organizations for business-critical processes. The company I work for is a prime example. Kiwibank, New Zealand's fifth-largest bank, uses SAS for a wide range of risk-related processes, many of which are business-critical. These processes include the following: regulatory capital calculations on lending; account behavior scorecards; scorecard monitoring; IFRS 9 credit loss calculations; and loan-to-value ratio (LVR) lending restrictions monitoring. Ensuring that all these applications keep going every day, producing reliable and accurate results, is no mean feat. It starts by architecting and implementing a resilient and robust SAS environment. This environment then needs to be carefully aligned and integrated with the company's business and IT infrastructure. Next, disciplined operational practices are required, including essential housekeeping, to keep all systems running smoothly. And finally there needs to be diligent management and checking of results to ensure accuracy. Many of these results have to be publicly disclosed, so the consequences of getting them wrong could result in serious reputational damage. Come learn some best practices used by Kiwibank to keep everything on track.
Ian Munro, Kiwibank Limited
Session 3019-2019
Iconic Betty Boop in the 1930s cartoon Boop Oop A Doop tamed a lion. Nowadays, SAS® has tamed the elephant, the yellow Apache Hadoop one, and this paper shows you how it is done! Some Hadoop elephants live on land and others in clouds, and with the right SAS tools, you can sneak up really close to tame that data of yours! This paper is your easy-to-read SAS on Hadoop jungle survival guide that explains Hadoop tools for SAS®9 and SAS® Viya®, the main Hadoop landscapes, and good practices to access and turn your Hadoop data into top-notch quality information. It is showtime with SAS on Hadoop!
Cecily Hoffritz, SAS
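As a small taste of the SAS®9 side of this survival guide, a Hive connection through the Hadoop LIBNAME engine might look like the following (the server, port, and schema are placeholders):

```sas
/* Hypothetical Hive server; credentials and security options omitted */
libname hdp hadoop server='hive.example.com' port=10000 schema=sales;

proc sql;
   select count(*) as n_rows from hdp.transactions;
quit;
```

Once the libref is assigned, Hadoop tables can be queried much like ordinary SAS data sets, with eligible work pushed down to Hive.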
Session 3233-2019
The email access method has long been a powerful tool for SAS® programmers, providing them with the ability to easily generate notifications or deliver content as part of their data processing tasks. Although the method does provide a bevy of options, it has a few limitations such as the inability to specify multiple senders, custom headers, or custom formatting of recipient or sender names. This paper guides the user through the process of bypassing the built-in email access method by writing a custom email method that writes SMTP statements directly. Not only do readers gain the ability to work around the stated limitations, but they also gain a deeper understanding of advanced programming techniques available to SAS such as the SMTP protocol, sockets, encoding, and encryption.
David L. Ward, Health Data & Management Solutions, Inc.
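To make the idea concrete, here is a deliberately minimal one-way sketch of writing SMTP statements over a socket. The host and addresses are placeholders, and a real implementation must also read and check the server's response codes, which this sketch omits:

```sas
/* Placeholder mail host; production code must handle server responses,
   authentication, and encryption */
filename mailsock socket 'mail.example.com:25';

data _null_;
   file mailsock;
   put 'HELO myhost.example.com';
   put 'MAIL FROM:<reports@example.com>';
   put 'RCPT TO:<team@example.com>';
   put 'DATA';
   put 'Subject: Nightly job status';
   put 'From: "Reporting Robot" <reports@example.com>';
   put ' ';                /* blank line ends the headers */
   put 'The nightly job completed.';
   put '.';                /* lone period ends the message body */
   put 'QUIT';
run;
```

Writing the protocol directly is what makes custom headers, multiple senders, and custom name formatting possible.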
Session 3625-2019
Building a solid financial entity is the entire purpose of having a portfolio that can withstand environmental impacts, because we expect that as some assets depreciate, others simultaneously appreciate due to diversification. Here we present a model that uses time as a diversification factor to build a solid portfolio that holds its value under severe macroeconomic changes. A theoretical model was built under two very generic assumptions, and SAS® Model Implementation Platform was used to evaluate the model using empirical data taken from a real portfolio and to stress the assumptions under extreme environmental changes.
Joao Pires da Cruz, Carla Tempera, and Mariana Eiras Soares, Closer Consultoria Lda
Manuel Fortes, SAS
Session 3056-2019
Propensity score matching is an intuitive approach that is often used in estimating causal effects from observational data. However, all claims about valid causal effect estimation require careful consideration, and thus many challenging questions can arise when you use propensity score matching in practice. How to select a propensity score model is one of the most difficult questions that you are likely to encounter when you do matching. The propensity score model should consider both the tenability of the assumption of no unmeasured confounding and the covariate balance in the matched data. This paper discusses how you can use the PSMATCH procedure in conjunction with other procedures in SAS/STAT® software to tackle some of these practical challenges. In particular, the paper describes how you can use causal graphs to investigate questions that are related to unmeasured confounding and how you can use the PSDATA statement in PROC PSMATCH to incorporate propensity scores that are computed using approaches other than logistic regression. The paper also illustrates features of PROC PSMATCH that you can use to try to improve covariate balance and control properties of the final matched data set.
Michael Lamm, Clay Thompson, and Yiu-Fai Yung, SAS
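For example, the propensity scores might be estimated outside PROC PSMATCH with a spline-based model and then supplied through the PSDATA statement. The data set and variable names below are hypothetical, and PROC GAMPL is just one of several possible choices for the external model:

```sas
/* Hypothetical non-logistic propensity model via penalized GAM */
proc gampl data=mydata;
   model treat(event='1') = spline(age) param(sex income) / dist=binary;
   output out=ps predicted=psvar;
run;

/* Pass the precomputed scores to PROC PSMATCH with PSDATA */
proc psmatch data=ps region=allobs;
   class treat;
   psdata treatvar=treat(treated='1') ps=psvar;
   match method=greedy(k=1) caliper=0.25;
   assess ps var=(age income) / plots=boxplot;
   output out(obs=match)=matched matchid=_mid;
run;
```

The ASSESS statement then lets you check covariate balance in the matched data regardless of how the scores were produced.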
Session 3699-2019
This paper explains and demonstrates the many ways of calculating leads (and lags) in SAS®: the EXPAND procedure, reverse sorting, data merging, and the application of data set functions. It looks at the practicality of each approach and gives examples of where each should and should not be used. This paper also analyzes how much time and memory each method requires.
Andrew Gannon, The Financial Risk Group
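Two of the approaches can be sketched in a few lines, using hypothetical data set and variable names: the LAG function for lags, and a self-merge with FIRSTOBS=2 for leads:

```sas
/* Lag: LAG returns the value from the previous observation */
data with_lag;
   set prices;
   prev_value = lag(value);
run;

/* Lead: merge the data set with itself shifted up one observation */
data with_lead;
   merge prices
         prices(firstobs=2 keep=value rename=(value=next_value));
run;
```

The self-merge leaves the lead variable missing on the final observation, just as LAG leaves the lag missing on the first.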
Session 3179-2019
SAS® Cloud Analytic Services language (CASL) is a language specification in SAS® Cloud Analytic Services (CAS) that provides a programming environment to manage the execution of actions. The results of those actions are processed with functions and expressions to prepare the parameters for subsequent actions. CASL and the CAS actions provide the most control, flexibility, and options when interacting with CAS. In addition to the many CAS actions supplied by SAS®, as of SAS® Viya® 3.4 you can create your own actions using CASL. You specify the interface to your new action, provide a name and parameters, and supply the CASL code that implements your new action. Once created, your action is available for use by any user that has permission. Your actions look and feel just like CAS actions provided by SAS. Jump on board and find out how to create your own actions that do just what you need!
Brian Kinnebrew, SAS
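A minimal sketch of a user-defined action follows; the action set name, action name, and parameter are all made up for illustration, and the body simply wraps an existing action and returns its result:

```sas
proc cas;
   builtins.defineActionSet /
      name='demo'
      actions={
         {name='rowCount',
          desc='Return the row count of a CAS table',
          parms={{name='tbl', type='string', required=TRUE}},
          definition="
             simple.numRows result=r / table=tbl;
             send_response(r);
          "}
      };
run;
```

Once defined, demo.rowCount can be called like any SAS-supplied action by any user with permission.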
Session 3259-2019
Capture-recapture is a statistical methodology that uses repeated and independent identification from samples of the subjects of interest. The method is able to provide accurate counts of the entire population from these samples. Originally developed by ecologists to count animal populations, capture-recapture has become a critically important analytic tool for social justice, providing accurate counts of the number of people affected by a wide variety of problems, including crimes and natural disasters. In this method, careful design, development, and management of the underlying database are critical tasks. This paper demonstrates development and management of databases for capture-recapture analysis, including database organization, integrating additional data sources, addressing privacy issues, and database management and governance.
David Corliss, Peace-Work
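The arithmetic behind the basic two-sample (Lincoln-Petersen) version of the method is simple: with n1 subjects identified in the first sample, n2 in the second, and m identified in both, the population estimate is n1*n2/m. The numbers below are purely illustrative:

```sas
data estimate;
   n1 = 120;            /* identified in sample 1 */
   n2 = 150;            /* identified in sample 2 */
   m  = 30;             /* identified in both samples */
   N_hat = n1 * n2 / m; /* estimated population size: 600 */
run;
```

Reliable values of m are exactly why the careful database design, record linkage, and governance discussed in this paper matter.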
Session 3240-2019
Structural equation models (SEMs) can be configured as structural causal models (SCMs) for causal inferences. These models are applicable in most industries and lines of business, such as pharmaceutical, financial, medical, marketing, operations, and research. SCM is a statistical method that estimates causal relationships among observed and unobserved (latent) variables within a system or population. Path diagrams are often used to represent SCM to visualize relationships between variables to better understand direct and indirect effects. In this hands-on workshop, participants code path diagrams using the CALIS procedure to fit two models for causal inference. One example uses data generated from a designed experiment, and the other uses operational or observational data. Participants will learn how to do the following: represent relationships among variables by using a graphical diagram; code and fit the graphical diagram using the PATH modeling language in PROC CALIS; interpret the model fit statistics and estimates; and infer direct and indirect causal effects.
Banoo Madhanagopal and John Amrhein, McDougall Scientific Ltd.
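As a flavor of the PATH modeling language used in the workshop, a mediation-style model with made-up variable names can be coded and its direct and indirect effects partitioned like this:

```sas
/* Hypothetical variables: treatment affects outcome directly
   and indirectly through mediator */
proc calis data=study;
   path
      outcome  <- treatment mediator,
      mediator <- treatment;
   effpart treatment -> outcome;  /* partition total effect into
                                     direct and indirect parts */
run;
```

The PATH statement mirrors the arrows of the path diagram, and the EFFPART statement produces the effect decomposition used for causal interpretation.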
Session 2998-2019
Valid causal inferences are of paramount importance both in medical and social research and in public policy evaluation. Unbiased estimation of causal effects in a nonrandomized or imperfectly randomized experiment (such as an observational study) requires considerable care to adjust for confounding covariates. A graphical causal model is a powerful and convenient tool that you can use to remove such confounding influences and obtain valid causal inferences. This paper introduces the CAUSALGRAPH procedure, new in SAS/STAT® 15.1, for analyzing graphical causal models. The procedure takes as input one or more causal models, represented by directed acyclic graphs, and finds a strategy that you can use to estimate a specific causal effect. The paper provides details about using directed acyclic graphs to represent and analyze a causal model. Specific topics include sources of association and bias, the statistical implications of a causal model, and identification and estimation strategies such as adjustment and instrumental variables. Examples illustrate how to apply the procedure in various data analysis situations.
Clay Thompson and Yiu-Fai Yung, SAS
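A minimal sketch, using a made-up three-node graph in which X confounds both the treatment T and the outcome Y, might look like this:

```sas
/* Hypothetical DAG: X is a common cause of T and Y */
proc causalgraph;
   model "Example"
      T -> Y, X -> T, X -> Y;
   identify T -> Y;
run;
```

The procedure reports whether the effect of T on Y is identifiable and, if so, which covariate sets (here, adjustment for X) yield a valid estimation strategy.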
Session 3355-2019
The next generation of SAS® Enterprise Guide® is here! The redefined SAS Enterprise Guide 8.1 is both sexy and intelligent. SAS Enterprise Guide 8.1 sports a modern and attractive interface, tab-based organization of content, and flexible window management such as floating, docking, and leveraging multiple monitors. Want to view your code, log, and results at the same time? SAS Enterprise Guide 8.1 has you covered. Want to view four data sets at the same time? Sure, no problem. Want to just code, without a project? You got it. See these features and more at the big reveal of SAS Enterprise Guide!
Casey Smith, Samantha Dupont, and David Bailey, SAS
Session 3753-2019
Choice experiments help discover which product or service attributes a potential customer prefers. Choice models are an excellent tool to analyze the results of choice experiments. Consumers are presented with sets of product attributes, called profiles. Each respondent is shown a small set of profiles, called a choice set, and asked to select the profile that he or she most prefers. The choice model presented in this study aims to analyze business travelers' trip choices when multiple options are available. A randomized set of options is presented to the user from a predefined choice set. The attributes for each travel choice include Mode, Time, Cost, Privacy, Flexibility, and Productivity. Data were collected from 400 participants using an online survey tool and did not contain any missing values. A choice model was developed using JMP® Pro to analyze the choices made by survey participants. The aim of the analysis is to answer the specific research question: Do privacy, productivity, and flexibility have an influence on the travel choices made by participants? The results from the analysis include an Effect Summary and Likelihood Ratio tests that can be used to determine the validity of the model. Additionally, the Utility Profiler, Probability Profiler, and Effect Marginals provide utility and probability scores for each level of the attributes. This enables the researcher to draw conclusions about the likelihood that the user will make a choice given the set of attributes.
Swarup Jacob, Oklahoma State University
Session 3756-2019
Community Care Behavioral Health Organization (CCBH) in Pittsburgh, Pennsylvania is a behavioral health managed care organization with a large provider network that works with people to obtain medical assistance and connects them to mental health and substance use treatment options. Studying the performance of different providers in this large network is of utmost importance because of the significant role these providers play in ensuring that our patients get the best possible treatment. Community Care's Quality Department designed specifications around different levels of care delivered by these providers, and the Decision Support team at CCBH developed a report using IBM Cognos Business Intelligence. However, programming, executing, and delivering the reports via IBM Cognos proved to be challenging, and this paper addresses the various ways in which these challenges were minimized or overcome by using SAS®. SAS proved to be more efficient, more transparent, and more adaptable to possible future changes or additions to the specifications.
Ana Brito Skalos and Meghna Parthasarathy, Community Care Behavioral Health Organization, UPMC
Session 3520-2019
SAS® Viya® seems great, but how can teams collaborate with each other to analyze big data? Who owns the data? What about security? And where do the reports live? Is there a best practice? If you've found yourself asking these questions, you're not alone. Here at SAS, we experienced them first! In this paper, we walk through the steps of how we are establishing a collaborative environment so that your team can get the job done.
Weston Mcmanus and Allison Becker, SAS
Session 3398-2019
SAS® Theme Designer provides customers with a powerful set of options for customizing SAS® application and report themes. Apply sweeping changes to your theme with ease by leveraging the structure of SAS themes, and then override key features to refine the look. Strike the right color balance by mapping your brand palette to theme variables that are intelligently applied across the software. Create complementary application and report themes to unite your interface controls and data visualizations. Preview your theme on a set of general controls within SAS Theme Designer. This paper explores best practices for optimizing a custom look and feel that reflects your brand and that provides a great experience for your users.
Lora Edwards, Lisa Everdyke, and Corey Benson, SAS
Session 3755-2019
The National Center for Health Statistics (NCHS) Data Linkage Program links NCHS health survey data with vital and administrative records. Due to changes in the way personally identifiable information (PII) is collected in the National Health Interview Survey (NHIS), current linkage algorithms depend more heavily on name variables. Changes to PII collection were made in part to improve linkage consent. In NHIS, prior to 2007, all nine digits of the Social Security number (SSN) were collected as part of the linkage consent process. However, after 2007, only the last four digits of the SSN were collected. Given fewer digits of the SSN, the algorithms have been modified to rely more heavily on names. Several functions were compared in SAS® to link names from the survey and administrative data source. These functions include SOUNDEX, NYSIIS, and COMPGED. This presentation describes the methodology used to compare the functions and presents results aiming to minimize the type I and type II errors in the linkage process.
Yu Sun and Cordell Golden, CDC NCHS
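As a small illustration of two of these tools, SOUNDEX returns a phonetic code and COMPGED returns a generalized edit distance; the name pairs below are made up (NYSIIS, unlike SOUNDEX, is not among the standard DATA step functions and is typically implemented separately):

```sas
/* Made-up name pairs: compare phonetic codes and edit distances */
data compare;
   length a b $20;
   input a $ b $;
   sx_match = (soundex(a) = soundex(b));  /* 1 if phonetic codes agree */
   ged = compged(a, b);                   /* generalized edit distance */
   datalines;
Smith Smyth
Johnson Johnsen
;
run;
```

In a linkage setting, thresholds on such scores drive the trade-off between type I and type II errors discussed in the presentation.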
Session 3275-2019
Building a productive financial crime monitoring system to detect suspicious activity begins and ends with the data. Access to production data ensures that data scientists and strategy developers can design scenarios with actual behavior patterns and data profiles on hand, not by using theoretical or mocked activity that is not representative of the real-world actions to be analyzed and modeled. Architecting a specialized analytical SAS® Viya® sandbox environment, with read-only access to production data, provides a powerful toolbox to productively analyze and prepare data; design, build, and tune analytical models; and visualize results efficiently. This paper provides best practices for leveraging SAS Viya technologies within an analytical sandbox environment, including SAS® Visual Analytics, SAS® Visual Statistics, SAS® Data Preparation, and SAS® Visual Data Mining and Machine Learning. This paper also demonstrates concepts and deployment strategies to operationalize financial crime detection strategies in SAS® Visual Investigator.
Steve Overton, 6 Degree Intelligence
Session 2975-2019
This paper explains the implementation of an account voting mechanism that enables dynamic updating of data. Three implementation methods are compared: using the MODIFY statement, using temporary arrays, and using hash objects. This paper also explains how the three methods impact performance.
Richard Langston, SAS
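A minimal sketch of the hash-object approach, with hypothetical data set and variable names, looks up an account's running tally, updates it in memory, and writes the result back at the end:

```sas
/* Hypothetical names: work.tallies holds (account, votes);
   work.new_votes holds one record per incoming vote */
data _null_;
   if _n_ = 1 then do;
      if 0 then set work.tallies;        /* define host variables */
      declare hash h(dataset:'work.tallies');
      h.defineKey('account');
      h.defineData('account', 'votes');
      h.defineDone();
   end;
   set work.new_votes end=last;
   if h.find() = 0 then votes = votes + 1; /* existing account */
   else votes = 1;                         /* first vote seen  */
   rc = h.replace();                       /* add or update in memory */
   if last then rc = h.output(dataset:'work.tallies_updated');
run;
```

Because all updates happen in memory, the hash approach avoids the repeated I/O of the MODIFY statement, which is central to the performance comparison in this paper.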
Session 3884-2019
When you see an interesting data set, report, or figure, do you wonder what it would take to replicate those outputs in SAS®? This paper does just that, by using SAS to re-create outputs that were originally generated by Python. A key concept for understanding this comparison is that the starting point is the Python code. The paper associates snippets of Python with the corresponding SAS statements, attempting a reasonable apples-to-apples comparison. In other words, the resulting SAS code does not necessarily represent how it would have been written if we had started with SAS rather than Python. The paper illustrates how SAS code lines up with the widely used Python language.
Daniel R. Bretheim, Willis Towers Watson
Session 3503-2019
SAS® Viya® extends the SAS® Platform in a number of ways and has opened the door for new SAS® software to take advantage of its capabilities. SAS® 9.4 continues to be a foundational component of the SAS Platform, not only providing the backbone for a product suite that has matured over the last forty years, but also delivering direct interoperability with the next-generation analytics engine in SAS Viya. Learn about the core capabilities shared between SAS Viya 3.4 and SAS 9.4, and about how they are unique. See how the capabilities complement each other in a common environment. Understand when it makes sense to choose between the two and when it makes sense to go with both. In addition to these core capabilities, see how the SAS software product lines stack up within each one, including analytics, visualization, and data management. Some products, like SAS® Visual Analytics, have one version aligned with SAS Viya and a different version with SAS 9.4. Other products, like SAS® Econometrics, leverage the in-memory, distributed processing in SAS Viya while at the same time including SAS 9.4 functionality like Base SAS® and SAS/ETS®. Still other products target one engine or the other. Learn which products are available on each, and see functional comparisons between the two. In general, gain a better understanding of the similarities and differences between these two engines behind the SAS Platform, and the ways in which products leverage them.
Mark Schneider, SAS
Session 3141-2019
Condition-based maintenance (CBM) of critical high-valued machines is a typical Internet of Things (IoT) scenario. Key to CBM is monitoring important machine health parameters, such that maintenance can be based on the perceived condition of the machine, rather than performing preventive maintenance at intervals, where the likelihood of failures between repairs is very small, or performing corrective maintenance by running the machine until it fails and repairing it. Vibration data analysis is a key tool for understanding the internal condition of a machine, often when it is running continuously. This paper illustrates how to use the new digital filters, time-frequency analysis tools, machine learning algorithms available in SAS® Visual Data Mining and Machine Learning, and SAS® Event Stream Processing to monitor the real-time condition of a live variable-speed rotating machine by analyzing vibration sensor measurements obtained at sampling rates higher than 10,000 Hz.
Yuwei Liao, Anya McGuirk, and Byron Biggs, SAS
Session 3277-2019
Linear regression models are used to predict a response variable based on a set of independent variables (predictors). Multivariate regression is an extension of a linear regression model with more than one response variable in the model. In a linear regression model, a linear relationship between the response variable and the one or more predictors is assumed. In addition, the random errors are assumed to follow a normal distribution with a constant variance and are also assumed to be independent. In conducting a multivariate regression analysis, the assumptions are similar to the assumptions of a linear regression model but in a multivariate fashion. In this paper, we first review the concepts of multivariate regression models and tests that can be performed. In correspondence with the tests under multivariate regression analyses, we provide SAS® code for testing relationships among regression coefficients using the REG procedure. The MTEST statement in PROC REG is the key statement for conducting related tests. To correctly specify the necessary syntax, we first rewrite the hypothesis we want to test into the form LBM = 0, where L and M are matrices determined by the hypothesis and B is the parameter matrix. The matrices L and M help us to correctly specify the syntax in the MTEST statement. Various hypothesis tests from an example are used to demonstrate how L and M are determined and how the MTEST statement in PROC REG is written.
Joey Lin, San Diego State University
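As a minimal sketch of the kind of MTEST usage described above (the data set and variable names here are hypothetical, not taken from the paper):

```sas
/* Hypothetical data set STUDY with two responses (y1, y2) and three predictors */
proc reg data=study;
   model y1 y2 = x1 x2 x3;
   /* L selects the x1 row of B; M defaults to the identity:
      tests H0: the x1 coefficient is zero for every response */
   TestX1: mtest x1;
   /* Tests H0: x1 and x2 have equal coefficients across all responses */
   Equal12: mtest x1 - x2;
run;
```

Each MTEST statement produces the multivariate test statistics (Wilks' lambda, Pillai's trace, and so on) for the hypothesis implied by its L and M matrices.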
Session 3513-2019
There is often a misconception that performing risk analysis must be a major endeavor that requires your institution to have advanced quants and technical skills in house. However, with SAS® Commodity Risk Analytics, the needed expertise is built in, but the risk platform is still fully customizable. The easy-to-configure interface runs in your browser and can have you understanding your risk in minutes. Explore your VaR and other risk metrics, and drill down to the position level on the fly. Enable traders to understand the impact of their trades before they make them.
Albert Hopping and Eric Carriere, SAS
Session 3345-2019
The most successful brands focus on users, not buyers. Engage with your internal and external users through digital dashboards and mobile apps that echo the value of your brand and deliver precise information to support decisions. See how you can convey your brand identity through your dashboards and custom mobile apps. In this paper, you learn how to use the capabilities in SAS® Theme Designer, SAS® Visual Analytics Graph Builder, reusable object templates, SAS® SDK for iOS, and SAS® SDK for Android.
Robby Powell, SAS
Session 3254-2019
This paper is a collection of eleven tips and tricks using the SAS® SQL procedure. It is intended for an intermediate SAS user who would like to take a peek under the hood. Code for this workshop is on the web at www.LexJansen.com. That code has the same section organization as this paper, and an interested person can get that code and read this paper while executing the code. That process provides examples of the concepts developed in the paper and, largely, can duplicate attendance at the hands-on workshop. Topics covered are: 1) the SAS® data engine; 2) indexing; 3) the optimizer; 4) correlated and uncorrelated sub-queries; 5) placement of sub-queries in SQL syntax; 6) views; 7) fuzzy merging; 8) coalescing; 9) finding duplicates using SQL; 10) reflexive joins; 11) using SQL dictionary tables and a macro to document SAS tables. The eleventh section adds some small tweaks to Vince DelGobbo's excellent work on how to create a hyperlinked Microsoft Excel workbook. The appendix contains miscellaneous examples of SQL from which a reader can steal.
Russ Lavery, Contractor
Session 3489-2019
Models are specific units of work that have one job to perform: scoring new data to make predictions. Containers are self-contained workers that can be easily created, destroyed, and reused as needed. They are portable and easily integrate into numerous modern cloud and on-premises execution engines. SAS® users can now follow a recipe to turn advanced model functions into on-demand containers such as Docker for both on-premises and cloud deployment. SAS® Model Manager can be used to organize the model content from many sources, including SAS and open source, to create containers. This presentation covers the basics and shows you how to turn your SAS analytical models into modern containers.
Hongjie Xin, Jacky Jia, David Duling, and Chris Toth, SAS
Session 3143-2019
Traditionally, tumor response and duration of treatment information have been displayed in separate graphs in which the subjects can be sorted by different criteria. In such cases, the clinician has to work harder to associate the subject across the graphs. Recently there has been increased interest in combining this information in one visual. Displaying the data together, sorted by the tumor response with associated duration information, makes it easier for the clinician to understand the data. Three-dimensional waterfall graphs, which have both pros and cons, have been proposed for such cases. This paper shows you how to build a 3-D graph using SAS® that shows both tumor response and duration of treatment. This paper also presents alternative 2-D visuals that were created using the SGPLOT procedure. These 2-D visuals enable easier decoding of the data, which enables you to display more information in the graph.
Cleester Heath IV and Sanjay Matange, SAS
Session 3149-2019
In clinical trials, Consolidated Standards of Reporting Trials (CONSORT) flow diagrams are an important part of the randomized trials report. These diagrams present a bird's-eye view of the flow of patients through the different stages of the trial. The SG procedures do not support a statement for drawing these diagrams. But by completing some data processing steps, you can draw commonly used CONSORT diagrams by using the SGPLOT procedure. You can generate the output in all of the formats that are supported by the Output Delivery System (ODS), and not just RTF! In this paper, we show you how to harness the power of the SGPLOT procedure to create these diagrams.
Prashant Hebbar and Sanjay Matange, SAS
Session 3752-2019
The Centers for Medicare and Medicaid (CMS) Chronic Conditions Warehouse (CCW) Virtual Data Research Center (VRDC) is a large multi-user SAS® Grid environment in which most of the users are limited to SAS® Enterprise Guide® with no command-line interface. Load Sharing Facility (LSF) is a powerful true batch tool that enables a user to launch dozens of SAS® jobs and then log off of the environment, but only from the command-line interface. It is possible to make LSF batch available to users with access to only SAS Enterprise Guide with adequate security by using custom macros, script modifications, and SAS invocation options tied to Linux and UNIX group membership. This presentation describes customizing the LSF batch environment, security issues, and connecting SAS Enterprise Guide to the custom LSF environment.
Adam Hendricks and Derek Grittmann, General Dynamics Health Solutions
Session 3501-2019
The Thomson Reuters Cost of Compliance 2018 study surveyed global systemically important financial institutions about their expectations for costs associated with regulatory compliance over the next year. Seventy-six percent of participants expected an increase in focus on managing regulatory risk in the upcoming year. These companies expected that they would prioritize General Data Protection Regulation (GDPR) readiness, policy management, risk identification and control effectiveness, and the assessment of the impact that regulatory changes have on business. Since the financial crisis of 2008, this shift in focus on regulatory obligations within financial institutions has become a driving force for the creation of software that can be customized, enhanced, and easily updated by the user. These changes are often necessary to keep pace with the ever-changing regulatory efforts of governments around the world. SAS has developed software with these needs in mind. SAS® Model Risk Management, SAS® Governance and Compliance Manager, and Stress Testing Manager were built on a common framework with the idea that financial companies can plug in the software to fit their current needs and add on to it at any time as those needs evolve. Enabling risk and regulatory compliance software to work in parallel enables financial institutions to quickly solve increasing governmental demands and fosters a risk-aware business culture.
Kelly Potter, SAS
Session 3201-2019
SAS® Event Stream Processing has a set of great capabilities that both IT and business areas can easily benefit from. This technology enables seizing opportunities and spotting red flags that usually are hidden in the massive torrent of fast-moving data that flows through our business, without the need for storing any data, by applying simple business rules, pattern detection, geofencing, and real-time training and scoring of analytical models. However, it is not always easy to translate these specific features into direct business benefits for business groups, which usually hold the budget for projects like these. Therefore, it is crucial to demonstrate to business users how they will benefit from this technology and how the features will help them take faster action and achieve better results. This paper guides you through a process to create a strong and persuasive business plan that translates the technology features from SAS Event Stream Processing to business benefits, including a suggested roadmap for adopting SAS Event Stream Processing features.
Henrique Consolini Danc, SAS
Session 3266-2019
Knowledge of SAS® programming provides excellent employment prospects in a variety of industries and can lead to a challenging and rewarding career. However, learning the SAS programming language can be daunting, especially when undertaken alongside a full-time job and family obligations. This paper discusses the author's experience in developing and delivering effective online SAS programming courses and presents strategies and ideas to promote successful learning and knowledge transfer when teaching a diverse student population of working professionals.
Justina M. Flavin
Session 3506-2019
JSON is continuing to grow as the preferred data interchange format due to its simplicity and versatility. The JSON procedure gives SAS® users the ability to export SAS data sets in JSON, as well as the ability to create custom JSON output. The procedure is simple to use and gives the user a huge amount of flexibility and control of the JSON output. This paper gives an overview of how to use the JSON procedure as well as detailed use cases to highlight the most important options to get the most out of the output generated by the procedure.
Adam Linker, SAS
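A minimal sketch of the two styles the procedure supports, exporting an existing data set and building custom JSON with WRITE statements (using the SASHELP.CLASS sample table):

```sas
/* Export a SAS data set as JSON */
proc json out="class.json" pretty;
   export sashelp.class(obs=3);
run;

/* Build custom JSON one element at a time */
proc json out="custom.json" pretty;
   write open object;                     /* { */
   write values "source" "sashelp.class"; /*   "source": "sashelp.class", */
   write values "rows" 19;                /*   "rows": 19 */
   write close;                           /* } */
run;
```

The PRETTY option adds indentation for readability; without it, the procedure emits compact JSON suitable for data interchange.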
Session 3314-2019
With the introduction of the SGMAP procedure to SAS® ODS Graphics, geographic mapping has never been easier. This paper shows you how to unlock the geographic potential of your data in just a few statements to create rich, meaningful maps showing locations, regions, pathways, and labels. By mixing and combining statements, you can highlight important aspects of your data or reveal information hidden in the surrounding geography. With PROC SGMAP, the world is at your fingertips!
Kelly Mills, SAS
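To give a flavor of how few statements are needed, here is a minimal sketch (assuming SAS 9.4M5 or later and internet access for the map tiles) that plots ZIP code centroids from the SASHELP.ZIPCODE sample table:

```sas
/* Plot North Carolina ZIP code centroids on an OpenStreetMap background */
proc sgmap plotdata=sashelp.zipcode(where=(statecode='NC') obs=50);
   openstreetmap;          /* request the map tile background */
   scatter x=x y=y;        /* x = longitude, y = latitude */
run;
```

Additional statements such as BUBBLE, TEXT, and CHOROMAP can be layered in the same step to show sized markers, labels, or shaded regions.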
Session 3660-2019
Lists can be invaluable for numerous operations in SAS® including dropping or altering many columns in a data set or removing all rows containing particular values of a variable. However, typing a long list of values can be tedious and error prone. One potential solution is to use the INTO clause with the SQL procedure to assign a single value or multiple values to a macro variable and thereby reduce this manual work along with potential typographical errors or omissions. The INTO clause even has additional benefits such as the ability to retrieve and store data set characteristics into a macro variable and the ability to easily customize formats, which makes this a great tool for a wide variety of uses.
Julie Melzer, Educational Testing Service
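A minimal sketch of the technique, collecting a list of column names from a dictionary table into one macro variable (here the numeric columns of SASHELP.CLASS, which yields Age Height Weight):

```sas
/* Gather the names of all numeric columns in SASHELP.CLASS into one macro variable */
proc sql noprint;
   select name
      into :numvars separated by ' '
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS' and type = 'num';
quit;

%put NOTE: numeric columns are &numvars;
```

The resulting &numvars can then drive a DROP statement, a VAR statement, or any other place a space-delimited variable list is needed, without hand-typing the names.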
Session 3880-2019
International Organization for Migration challenge: Towards enhanced needs mapping for improved capacity and resource planning. The International Organization for Migration (IOM) of the United Nations challenged start-ups to come up with a solution for better supply chain management by utilizing data and analytics. Together, Notilyze and Elva came up with a suitable solution to make more use of already available data by using SAS® Viya®. Specifically, the project team helped to create an analytical tool that allows for a better translation of existing analytical outputs (for example, the displacements tracking site assessments) for site planning and concrete supply chain management. It strives to realize this goal by achieving the following objectives: 1) Using both existing displacement tracking site assessment data and input from humanitarian field workers within IOM and other relevant stakeholders to identify a set of common indicators on humanitarian needs, related development needs, and supply chain gaps for needs mapping and forecasting during this project; 2) Based on these standardized indicators, develop a prototype algorithm that provides automated forecasting of humanitarian and development needs and supply chain requirements; and 3) Visualize these on-the-ground needs and supply gaps in interactive dashboards, maps, and automated forecasts.
Patrick Sekgoka, South African Reserve Bank
Olufemi Adetunji, University of Pretoria
Session 3139-2019
The SAS config file is powerful and useful. In this example, we show how to customize the names of the log and list files produced by the code we run so that they receive useful, informative names. The new name of a log file might look like this: 20180904_hhmm_userID_name-of-code-that-was-run.log. The benefit is that when one user or a team of users builds the code, everyone can see its progress. In this way, collaboration is simplified, and results are easier for colleagues to share and consume.
Zeke Torres, RedMane Technology
Session 3247-2019
All the major trends call for advanced control and accountability toward the use of data. From the migration to cloud applications and storage, to the deployment of big data environments, the democratization of analytics and artificial intelligence (AI), and the increasing requirements for data privacy and data protection, data governance has changed from something that is nice to have to being a must-have, with an ever-expanding scope to address. Gone are the days when the scope was limited to marketing databases, a few ERP processes, or specific regulations such as the Solvency II Directive or BCBS 239. Most organizations came through strong challenges, aligning people and processes and trying to sustain the governance effort; progressively, this dream of enterprise data governance is fading. Organizations are now looking at more surgical initiatives to take control of their data lakes, and to ensure that their analytical processes are fed with reliable information and that their data privacy policies are enforced. They want results immediately. In this session, we look at how data governance can be smarter, how it can be automated, and how it can be fun by relying on analytics and AI.
Vincent Rejany, SAS
Session 3127-2019
Developing and maintaining an analytical environment can be hard, especially if your area of interest is complex, involves more than one subject area, and has several source systems that need to be integrated. There are several tools available to us when developing and maintaining such an environment, usually a data warehouse; examples are data and analytical processes, and documentation. One of the neatest parts of documentation (and communication) is the use of information and data models. In this paper, I discuss how to use information models and some of the data modeling techniques that can be used in a data warehouse. I also show how to exchange data models between data modeling tools and SAS® metadata, and tools like the ETL tool in SAS® Data Integration Studio.
Linus Hjorth, Infotrek
Session 3142-2019
The term data preparation summarizes all tasks that are done when collecting, processing, and cleansing data for use in analytics and business intelligence. These tasks include accessing, loading, structuring, purging, unifying (joining), adjusting data types, verifying that only valid values are present, and checking for duplicates and inconsistent data (for example, two birthdates for one person). Data preparation can be costly and complex because of increasing data quantity and the number of data sources. Data preparation is a new paradigm that is now shaping the market. Previously, data management concentrated on designing and running extract, transform, load (ETL) and data quality processes in order to feed analytic processes. This situation was all very well when data volumes were smaller and the velocity of new data was slower. We're now seeing a trend toward more dynamic data management, with data preparation playing the role of self-service data management. Traditional data management processes can produce data up to a point, but dynamic fine-tuning and last-minute work is being done in a self-service way, using data preparation tools. It's more and more important to shape the data and get it right for analytics. Data preparation has become important because many more companies are data-driven. Businesses make decisions based on data, and it is enormously important that you can access data quickly and prepare it for analysis and business intelligence.
Ivor Moan, SAS
Session 3030-2019
SAS® Cloud Analytic Services (CAS) provides advanced and powerful analytic capabilities on big data, both in-memory and in-parallel. But what do your reports, models, or decisions look like if your data is not of good quality? How do you improve the quality of big data by using distributed processing and in-memory techniques? CAS offers not only numerous actions for analyzing big data, but it also provides big data preparation and big data quality capabilities from a user interface perspective, as well as from a programming perspective. This paper focuses on the programming techniques that are available to perform common data quality tasks in parallel on in-memory (big) data.
Nicolas Robert, SAS
Session 3614-2019
Data-driven agent allocation provides immense opportunity in improving the efficiency of any process. A cost-effective system can be designed using machine learning and optimization in SAS®. In this paper, application of machine learning and optimization for agent allocation in two-stage job processing is described using a case study of a Business Process Outsourcing (BPO) organization. This organization handles credit card application screening and verification processes for a large bank. The paper provides a brief overview of unsupervised machine learning algorithms for agent and application profiling. Clusters of application and agents are created and are used in an agent allocation optimization problem. The optimization model framework is extensively discussed to solve the problem of skill-based agent allocation to loan application processing based on its complexities. This process has two stages: application verification and decision review. A mixed-integer optimization problem is modeled and is solved using the OPTMODEL procedure. The design for an end-to-end process to implement optimization for agent allocation is also discussed in this paper.
Lokendra Devangan and Padmanabha Chandu Saladi, Core Compete
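As a minimal, self-contained sketch of the kind of assignment model solved with PROC OPTMODEL (the data, dimensions, and constraints here are hypothetical, much simpler than the two-stage case study):

```sas
/* Assign each application cluster to exactly one agent cluster,
   respecting agent capacity, at minimum total processing cost */
proc optmodel;
   set AGENTS = 1..3;
   set APPS   = 1..5;
   num cost{AGENTS, APPS} = [2 4 3 5 1
                             3 2 4 1 5
                             4 3 2 2 3];
   num capacity{AGENTS} = [2 2 2];

   var Assign{AGENTS, APPS} binary;

   min TotalCost = sum{a in AGENTS, j in APPS} cost[a,j] * Assign[a,j];

   con OneAgent{j in APPS}: sum{a in AGENTS} Assign[a,j] = 1;
   con Cap{a in AGENTS}:    sum{j in APPS} Assign[a,j] <= capacity[a];

   solve with milp;
   print Assign;
quit;
```

In the full application, the cost and capacity parameters would come from the clustering step, and additional constraints would encode skills and the two processing stages.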
Session 3910-2019
Data from US federal health surveys frequently use complex survey structures, rendering traditional procedures unsuitable for analysis. The SAS survey procedures exist but have not yet become a regularly used asset in analysis. Instead, users frequently choose to use other programs or add-ons for even the most basic of analyses. This paper demonstrates why the survey procedures such as SURVEYFREQ, SURVEYLOGISTIC, and SURVEYREG should be in everyone's toolbox when using complex survey data in research or practice.
Colin Nugteren, Notilyze
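A minimal sketch of how the design variables feed the survey procedures (the data set and NHANES-style variable names here are hypothetical placeholders):

```sas
/* Design-based frequencies with confidence limits */
proc surveyfreq data=health;
   strata  sdmvstra;        /* sampling strata */
   cluster sdmvpsu;         /* primary sampling units */
   weight  wtmec2yr;        /* examination sample weight */
   tables  diabetes / cl;
run;

/* Design-based logistic regression */
proc surveylogistic data=health;
   strata  sdmvstra;
   cluster sdmvpsu;
   weight  wtmec2yr;
   class   sex / param=ref;
   model   diabetes(event='Yes') = age bmi sex;
run;
```

Omitting the STRATA, CLUSTER, and WEIGHT statements and running ordinary PROC FREQ or PROC LOGISTIC on such data yields standard errors that ignore the design, which is exactly the pitfall the paper addresses.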
Session 3037-2019
Programmers come across various binary endpoint data when working on data analysis. However, when working with binary endpoints, one of the challenges is getting the correct confidence interval (CI) for a proportion. In SAS®, most programmers get confidence intervals using the FREQ procedure. Under certain scenarios, PROC FREQ calculates the confidence interval for a proportion of another group. Hence, it turns out to be an incorrect confidence interval for the proportion of the group the programmers have requested. Here, we present a simple but accurate way to get a confidence interval of a proportion. We present a macro that calculates the confidence interval directly from its statistical derivation in a DATA step. This macro can help programmers to compact the code and avoid miscalculation of the confidence interval.
Kamleshkumar Patel, Jigar Patel, and Dilip Patel, Rang Technologies
Vaishali Patel, Independent
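To illustrate the general shape of such a macro, here is a minimal sketch that computes a Wald (normal-approximation) interval in a DATA step; this is one common derivation and is not necessarily the exact method the authors implement:

```sas
/* Wald CI for a proportion: x successes out of n trials */
%macro prop_ci(x=, n=, alpha=0.05);
   data _prop_ci;
      p     = &x / &n;                    /* observed proportion        */
      z     = probit(1 - &alpha/2);      /* normal quantile            */
      se    = sqrt(p * (1 - p) / &n);    /* standard error             */
      lower = max(0, p - z*se);          /* clip to the [0,1] interval */
      upper = min(1, p + z*se);
   run;
   proc print data=_prop_ci noobs;
      var p lower upper;
   run;
%mend prop_ci;

%prop_ci(x=34, n=120)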
Session 3215-2019
Deep learning has evolved into one of the most powerful techniques for analytics on both structured and unstructured data. As a well-adopted analytics system, SAS® has also integrated deep learning functionalities into its product family, such as SAS® Viya®, SAS® Cloud Analytic Services, and SAS® Visual Data Mining and Machine Learning. In this presentation, we conduct an in-depth comparison between SAS and Python on their deep learning modeling with different types of data, including structured, images, text, and sequential data. Our comparison highlights the main differences between SAS and Python on programming styles and model performances on deep learning.
Linh Le, Institute of Analytics and Data Science
Ying Xie, Department of Information Technology, Kennesaw State University
Session 3623-2019
The call for industry-leading analytics at financial institutions of all sizes grows every day. Rising expectations from regulators, senior leaders, associates, and customers on topics as far-reaching as "How do we measure and mitigate risk?" and "Why can't my bank send me the right targeted offer?" seem at least difficult, if not unattainable, to meet. Over the last two years, Trustmark began its transformation to bring big-bank analytics to a small, sleepy Southern bank. Driving this kind of change requires changing people, process, mindset, and technology. Like similarly sized and smaller banks, Trustmark is challenged with a limited talent pool of analysts, traditional IT-driven processes, the "it's always been that way" mentality, and deeply rooted Microsoft Excel. This customer journey explores strategies to overcome these obstacles, and how SAS served as a technology partner along the way.
Chase Ogden, Trustmark National Bank
Session 3553-2019
Global manufacturing companies generate vast amounts of data from both operations and enterprise resource planning systems that have large unstructured components associated with them. These text-based data sources can be difficult to integrate into traditional reporting and analysis workflows that require querying and numerical calculations. With the advancement and ease of entry into developing text-based analytics, Dow is employing several types of text modeling methods to increase margins, improve safety, and enhance our customer experience. In this talk, we share several examples of how Dow uses internally and externally generated text data from large document repositories to develop supervised and unsupervised models that uncover actionable insights for production, marketing, and environmental health and safety.
Michael Dessauer
Session 3368-2019
In the excitement and hype around machine learning (ML) and artificial intelligence (AI), most of the time is spent on model building. Much less energy is expended on how to take the insights from models and deploy them efficiently to create value and improve business outcomes. This paper shows a complete example using DevOps principles for building models and deploying them using SAS® in conjunction with open-source projects including Docker, Flask, Jenkins, Jupyter, and Python. The reference application is a recommendation engine on a web property with a global user base. This use case forces us to confront security, latency, scalability, and repeatability. The paper outlines the final solution but also includes some of the problems encountered along the way that informed the final solution.
Jared Dean, SAS
Session 3737-2019
When deploying SAS® products on Red Hat platforms, there are subsystems that need to be set up and configured properly from the very start. While some of these subsystems and tuning are easy to fix later on, others can have a dramatic effect on performance and scalability and can be very costly to fix, especially once your data sets have been deployed. We are here to discuss some of the places where many of the more costly mistakes are made.
Barry Marson, Red Hat, Inc.
Session 3668-2019
Student employees are an extremely valuable, but often underutilized, asset. An organization must be willing to invest in a student employment program that benefits the organization while staying focused on the responsibility to educate and prepare students for the modern workforce. The Institute for Veterans and Military Families (IVMF) at Syracuse University, higher education's first interdisciplinary academic institute focused on advancing the lives of the nation's military veterans and their families, has developed an internship program that is predominantly led and managed by the students themselves. This presentation details the development of the student internship program, the types of projects students are given, and the management and cultural elements that the IVMF attributes to its success. In the IVMF program, an experienced graduate student is selected to lead and manage the other students, encouraging their own growth as well as reducing the time costs to staff. Students are responsible for recruiting, interviewing, selecting, and training their replacements within a hiring system the IVMF has refined. Due to the reputation of this program amongst Syracuse University students, IVMF is able to select from among some of the best and brightest students on campus, creating an environment that enhances both the skills of student interns as well as IVMF staff.
Bonnie Chapman and Nick Armstrong, Institute for Veterans and Military Families, Syracuse University
Session 3306-2019
Methods of committing fraud and other financial crimes are becoming increasingly sophisticated. One method for detecting fraud is to use unsupervised machine learning techniques. In this breakout session, we explore how isolation forests can be used to detect anomalies without requiring previously labeled data. We also compare this method against other well-known anomaly detection methods to evaluate its performance on selected data sets.
Ryan Gillespie, SAS
Session 3382-2019
According to CNN, there are approximately 20 to 30 million slaves in the world today. Trafficking in Human Beings, or THB, is the third largest international crime industry, behind illegal drugs and arms trafficking. THB generates a profit of $32 billion every year. Of that amount, $15.5 billion is made in industrialized countries. The United States, possibly surprisingly, is one of the highest THB destinations in the world. Can an established anti-money laundering program assist with combating THB? The short answer is Yes. A July 2011 Financial Action Task Force (FATF) report best summarizes this issue as follows: There is no specific guidance on money laundering associated with THB/SOM [smuggling of migrants] because the channels used, the instruments and the sectors implied are the same as for other criminal activities. However, the emphasis can be different and as those activities in their first steps are mostly cash-based businesses, it is important to pay special attention to cash operations. FATF lists a number of case studies and THB red flag indicators, which are considered helpful to the financial industry to better fight money laundering resulting from THB. In this session, we discuss THB and some case studies that yield these red flag indicators. You will learn how SAS® Visual Investigator can incorporate combatting THB into scenarios, alert generation, and investigations.
Dan Tamburro, SAS
Session 3554-2019
A credit risk score is an analytical method of modeling the credit riskiness of individual borrowers (prospects and customers). While there are several generic, one-size-might-fit-all risk scores developed by vendors, there are numerous factors increasingly driving the development of in-house risk scores. This presentation introduces the audience to how to develop an in-house risk score using SAS®, reject inference methodology, and machine learning and data science methods.
Amos Odeleye, TD BANK GROUP
Session 3981-2019
For many statistics, protecting the privacy of individual response data is legally mandated. Privacy of data is not a new topic in government statistics. In fact, much of the work involved in producing tables of statistical estimates can concern making sure data from individual respondents (be they businesses or persons) is kept confidential. The consideration of confidentiality needs to occur early in the production of tables of statistical estimates. The author explains some basics of statistical confidentiality calculations and explains why early consideration of confidentiality needs to happen in projects.
Peter Timusk, Statistics Canada
Session 3160-2019
Go beyond a basic report-viewing experience with SAS® Visual Analytics viewers. Discover seven precious gems in the viewers that enrich your report-viewing experience. With the nifty present screen feature, conduct a live report presentation with your geographically dispersed colleagues worldwide from your Apple iPad or iPhone. With the new voice assistant feature, navigate, open, and interact with reports on iOS devices. Use the report playback and screenplay feature to create a customized report presentation. Create custom summaries for report objects, and use the personalized state feature to save report object selections for drilling, filtering, and other interactions. Generate URL links to reports, objects, and images that you can share or copy and paste. Refresh only the report objects that you select. This paper is suitable for users of all experience levels.
Lavanya Mandavilli, Joe Sumpter, Houda Moudouni, and Achala Kamath, SAS
Session 3944-2019
The distribution transformer is the most vital asset in any electrical distribution network. Hence, distribution transformer health monitoring and load management are critical aspects of smart grids. Transformer health monitoring becomes more challenging for smaller transformers, where attaching expensive health monitoring devices to the transformer is not economically justified. The addition of Advanced Metering Infrastructure (AMI) in smart grids offers significant visibility into the status of distribution transformers. However, leveraging the vast amount of AMI data can be daunting. This paper uses the hourly usage data collected from Ameren Illinois AMI meters to determine distribution transformer outage, failure, and overload. The proposed methodology not only detects and visualizes outage and congested areas in near real time, but also detects transformers and distribution areas with a long history of outage and congestion. This paper also offers a predictive algorithm to enhance regular equipment maintenance schedules and reduce repair truck trips for unscheduled maintenance during unplanned incidents like storms. SAS® Enterprise Guide®, SAS® Enterprise Miner™, and SAS® Visual Analytics were used to efficiently produce the information necessary for operational decision-making from gigabytes of smart meter data.
Prasenjit Shil, Ameren
Tom Anderson, SAS
Session 3719-2019
In our longitudinal health study, when certain health events occur, we restrict access to biological specimens from that time period. The reason or reasons for restricted access are maintained in a data table. Our challenge is to provide the reasons for restriction at the individual and population level for reporting purposes. When biological specimens are requested for use with a project, we must review the ID/visits requested against the list of restricted specimens. Then, the restricted ID/visits, along with the reasons, are provided to an oversight committee to determine whether the restricted specimens can be used for the project. We developed an easy method for creating the reason list in words by creating variable names that are succinct and descriptive. Using an ARRAY statement and the CATX and VNAME functions, we quickly compile the reason list to give to our oversight committee to aid in their decision. We also show how to make minor changes to this method to create a similar list when variables are not descriptively named.
Gayle Springer, Johns Hopkins Bloomberg School of Public Health
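The ARRAY, CATX, and VNAME technique the abstract describes can be sketched as follows; this is a minimal illustration, and the data set and flag variable names are hypothetical:

```sas
/* Minimal sketch of the technique described above; data set and
   variable names are hypothetical. Each restriction flag has a
   succinct, descriptive name (e.g., Deceased, Withdrawn). */
data reasons;
   set restricted;                 /* one row per restricted ID/visit */
   length ReasonList $200;
   array flags{*} Deceased Withdrawn NewCancer;
   do i = 1 to dim(flags);
      if flags{i} = 1 then
         /* VNAME returns the variable's name; CATX appends it to
            the running list with a comma-space separator */
         ReasonList = catx(', ', ReasonList, vname(flags{i}));
   end;
   drop i;
run;
```

Because the reason text comes from the variable names themselves, renaming a flag is all it takes to change the wording in the committee report.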
Session 3267-2019
SAS® is still the undisputed market leader in the commercial data science space. A strong global academic alliance provides a quality stream of graduates literate in SAS, and the educational offering of SAS is among the best in the industry. Still, almost all organizations focused on SAS face increasing challenges and complexity due to the proliferation of new programming platforms and paradigms, both in the industry and in academia. Inexpensive online learning options that mostly use alternative languages are already an established supplement, even an alternative, to academic or corporate training. They are one of the key drivers of the democratization of data science. Publicly available SAS online courses remain largely those developed by SAS, although the courseware developed by members of the SAS® Academic Alliance is mostly unavailable on leading MOOC platforms. In this session, we lay out a proactive career and professional development strategy for SAS professionals and organizations in this highly competitive environment. We look at MOOC platforms such as Coursera, Udacity, DataCamp, edX, and Udemy. We look at the success of the Kaggle concept (and some Kaggle-inspired failures). We propose an approach centered around SAS that is based on industry-focused communities such as Project Data Sphere, combined with SAS® Education, SAS Academic Alliance, and MOOC platforms. Finally, we include lessons learned as the industry sponsor of a new SAS academic program.
Jovan Marjanovic, ProWerk Consulting
Session 3958-2019
Data from US federal health surveys frequently come from complex survey designs, rendering traditional procedures unsuitable for analysis. SAS survey procedures exist but have not yet become a regularly used asset in analysis. Instead, users frequently choose other programs or add-ons for even the most basic of analyses. This paper demonstrates why survey procedures such as SURVEYFREQ, SURVEYLOGISTIC, and SURVEYREG should be in everyone's toolbox when using complex survey data in research or practice.
Charlotte Baker, Virginia Polytechnic Institute and State University
Session 3794-2019
When working with visualization of grouped data, the SAS® Output Delivery System (ODS) defines default attributes (colors, symbols, patterns) for each group in the graph. For example, if we were to plot line plots with markers for different groups in a graph, ODS styles would assign by default a line color, a marker symbol, and a marker color to each of the groups. Although this might seem convenient, if there is an update in the data, the values for the groups might change or there might be missing groups in the data. This situation can cause the attributes for the groups to differ each time the graph is run. To aid in uniformity, we can use discrete attribute maps that assign set attributes to each group value. This consistency in the patterns and attributes then makes it easier to understand and review the graphs. In this paper, we show how to implement discrete attribute map sets with examples, and how they ensure consistency in the display of outputs.
Shilpakala Vasudevan, Ephicacy Lifescience Analytics
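The discrete attribute map described above can be sketched as follows; the group values, colors, and plot variables are illustrative:

```sas
/* Sketch of a discrete attribute map (names are illustrative).
   The map data set fixes each group value's attributes, so they
   stay stable even when a group is missing from the data. */
data attrmap;
   length ID $9 Value $12 LineColor MarkerColor $10 MarkerSymbol $12;
   input ID $ Value $ LineColor $ MarkerColor $ MarkerSymbol $;
   datalines;
trt Placebo gray gray circle
trt DrugA   blue blue triangle
trt DrugB   red  red  square
;
run;

proc sgplot data=results dattrmap=attrmap;
   /* ATTRID= ties the GROUP= variable to the map rows with ID='trt' */
   series x=visit y=mean / group=treatment attrid=trt markers;
run;
```

The same map data set can be reused across every graph in a project, which is what makes outputs consistent from run to run.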
Session 3128-2019
The DS2 procedure brought object-oriented concepts to the world of SAS® programming and enabled a level of data handling that the DATA step could not reach for years. One detailed but noteworthy feature is that the hash package, a predefined package in the DS2 procedure, enables us to assign a FedSQL query to the dataset argument tag. By specifying a FedSQL query for the argument, you can store the result of the query directly in the hash object without creating an unnecessary subset data set beforehand. Despite this and other powerful advantages in data handling, the DATA step still prevails among SAS users, and in most cases we are content to continue using it. Given this situation, this paper introduces an alternative way to achieve what the DS2 procedure offers, using the traditional DATA step combined with a hash object, the SQL procedure, and the DOSUBL function, which enables the immediate execution of SAS code.
Yutaka Morioka and Jun Hasegawa, EPS Corporation
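One way the DATA step alternative described above might look is sketched below; the data set names, key, and subsetting query are hypothetical:

```sas
/* Sketch of the DATA step alternative the abstract describes:
   DOSUBL executes a PROC SQL query immediately, and the hash
   object then loads its result. Names are hypothetical. */
data matched;
   length value 8;
   if _n_ = 1 then do;
      /* Run the subsetting query right now, before the hash loads it */
      rc = dosubl("proc sql; create table work._subset as select id, value from work.master where value > 100; quit;");
      declare hash h(dataset: 'work._subset');
      h.defineKey('id');
      h.defineData('value');
      h.defineDone();
      call missing(value);
   end;
   set work.transactions;            /* assumed to contain id */
   if h.find() = 0 then output;      /* keep ids present in the subset */
   drop rc;
run;
```

The key point is that DOSUBL runs at execution time, so the query result exists by the time the hash constructor's dataset argument is evaluated.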
Session 3899-2019
The JPMorgan Chase Operations Research and Data Science Center of Excellence (ORDS CoE) started a multi-year project to provide the internal Business Resiliency team with a simulation-based application to: 1) Support strategic and tactical planning; and 2) Reduce the number of required physical shutdown tests (which increase operational costs and negatively impact customer service). In the event of an outage, Business Resiliency seeks key insights about impacted locations: What will happen to service level during an outage? How will mitigation strategies impact service level (add headcount, reduce volume, and/or processing time)? The approach leverages simulation-based modeling (via SAS/OR® software) to estimate the expected impacts to service level due to an outage. The dynamic design of the model enables users to simulate any combination of 100+ call centers and/or 50+ locations, with the ability to customize mitigation scenarios to compare with the do-nothing scenario. This presentation highlights the methodology employed through SAS, including: applying SAS/OR and PROC CPM as the simulation engine; deriving survival functions (PROC LIFETEST) to model abandonment behavior; fitting historical handle time data to probability distributions by call type and building a handle time function (PROC FCMP) for use in model parameterization; improving model performance through parallel processing with MP CONNECT; and generating output statistics through bootstrapping with PROC SURVEYSELECT.
Jay Carini, Amy Pielow, and Joel Weaver, JPMorgan Chase
Session 3963-2019
When using the TRANSPOSE procedure to transform narrow data (a single subject variable stored in many rows) into wide data (one row with subject values stored as distinct variables), the length of the original subject variable is used as the length for each new variable. As a result, variable lengths can be much larger than necessary. This paper demonstrates a method to automatically assign the smallest length necessary to each variable once the narrow data is transposed into wide data. The method is currently used on large-scale national survey data that is updated daily and accounts for the possibility that the optimal length of a variable might change from day to day.
Ethan Ritchie, Sarah Cannon
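A minimal sketch of the length-minimization step the abstract describes follows; the data set and variable names are hypothetical, and the temporary array assumes at most 100 character variables:

```sas
/* Every column created by PROC TRANSPOSE inherits the length of
   the original VAR variable, which is often far too wide. */
proc transpose data=narrow out=wide(drop=_name_);
   by subject_id;
   id question;
   var response;
run;

/* Collect the character variables of the transposed table */
proc sql noprint;
   select name into :charvars separated by ' '
   from dictionary.columns
   where libname = 'WORK' and memname = 'WIDE' and type = 'char';
quit;

/* Find the longest value actually stored in each variable and
   build a LENGTH statement from the result */
data _null_;
   set wide end=last;
   array c{*} $ &charvars;
   array mx{100} _temporary_;
   do i = 1 to dim(c);
      mx{i} = max(mx{i}, length(c{i}));
   end;
   if last then do;
      length stmt $32767;
      do i = 1 to dim(c);
         stmt = catx(' ', stmt, vname(c{i}), cats('$', max(mx{i}, 1)));
      end;
      call symputx('lenstmt', stmt);
   end;
run;

/* Rebuild the table with the minimal lengths (SAS notes the
   length change with a warning, which is expected here) */
data wide_small;
   length &lenstmt;
   set wide;
run;
```

Because the lengths are recomputed from the data on every run, this accommodates the abstract's point that the optimal length can change from day to day.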
Session 3417-2019
The electricity industry is waking up to the prospect of large-scale deployments of electric vehicles (EVs). As IoT-enabled connected devices, electric vehicles present both opportunities and risks to the electric power industry. They require a power source in order to recharge, but their batteries can also function as a source of supply. EV fleets and buses present different opportunities than individually owned cars. There is a complex and competitive ecosystem of stakeholders, some of which will be in direct competition with incumbent energy suppliers; there is little room for monopoly market thinking, even for vertically integrated utilities. Effective use of IoT data will be key to EV market optimization. How utilities build infrastructure today, both for EV charging and for data analytics, can determine their opportunity to capitalize on this unique growth opportunity. This presentation shares the global insights that SAS and Intel commissioned from Navigant Research in the area of EV readiness in a white paper titled Charging Ahead with EV Analytics.
Tim Fairchild, SAS
Session 3725-2019
Organizations often struggle to make strategic insights actionable. Powerful analytics, when available at key operational decision points, can make the difference. SAS® Viya® includes public REST APIs to all underlying functionality, so software developers can add proven SAS® analytics to enterprise applications. This presentation discusses techniques and use cases for centralizing organizational analytics on SAS Viya and embedding them in SAP and Salesforce environments.
Olivier Thierie, Wouter Travers, and Andrew Pease, Deloitte Belgium
Session 3740-2019
Generating sales forecasts for an online retailer the size of Walmart.com is a big task. Our sales pattern for the Thanksgiving-Christmas holiday season is very different from the rest of the year. In addition, sales patterns shift significantly between different lines of goods distribution (such as shipping to customers' homes versus shipping to a store for customer pickup) as well as between different departments (such as Electronics sales versus Pharmacy sales). SAS® Forecast Server provides a great infrastructure for automatically forecasting each of our hundreds of sales time series at various levels of sales and time. However, traditional time series modeling with SAS Forecast Server assumes a single periodicity value for the data. Our data are doubly periodic: a weekly periodicity of daily lag 7, and a yearly periodicity with a varying lag of 365 to 366 days. In addition, Thanksgiving holiday seasons in adjacent years are separated by an uneven number of days, and many departments have sales with multiplicative seasonality for holidays and additive seasonality for the rest of the year, making it hard for any classical time series model to forecast well in both season and off-season periods. In this presentation, we go through the data decomposition and re-assembly steps that address these challenges, significantly improve SAS Forecast Server performance on our data, and enable us to achieve daily site-total forecasting accuracies in the upper 90s.
Alexander Chobanyan and Sangita Fatnani, Walmart Labs
Session 3472-2019
Representational State Transfer (REST) architecture has become the standard for cloud communication; it defines an abstraction among service providers that enables each to evolve independently from the others. REST architecture hides the underlying implementation, so SAS® software does not need to know how any required function (such as authorization or persistence) is provided and it is not affected by changes in the cloud service. For example, authorization can change from LDAP to Kerberos and persistence can change from PostgreSQL to Oracle. SAS software can access functions by invoking the HTTP procedure to send cloud service requests and by using the LIBNAME statement with the JSON (JavaScript Object Notation) engine to parse cloud service responses. REST also defines stateless communication, which allows automatic horizontal scaling to meet load demands. This paper shows how SAS software can become a RESTful client of cloud services.
Tom Caswell and Fred Burke, SAS
Session 3351-2019
SAS® Viya® on SAS® Cloud Analytic Services (CAS) offers a highly performant and scalable computing environment. Choosing performant hardware for the CAS server is necessary to support this high software scalability and large data models. This paper offers best practices for hardware and network choices for your SAS Viya on CAS environment, from CPU and memory provisioning, to CAS_DISK_CACHE architecture and provisioning, to inter-server network bandwidth requirements.
Tony Brown, SAS
Session 3771-2019
Inference rules are a category of business rules expressed in a series of if-then statements. They are often used jointly with a modeling score for segmentation in many business decisions such as marketing, underwriting, pricing, attribution control, and so on. Each inference rule identifies a sweet or sour spot in a population whose performance deviates greatly from the population average but which a regression- or tree-based predictive model often fails to capture. A mature set of inference rules is a good compromise between maximizing business outcome and minimizing loss in business volume. Using decline rules in mortgage underwriting as an example, this paper introduces a stepwise process in SAS® to evaluate and improve the efficiency of inference rules. The final product is a series of neural-network-like rules, expressed in a white-box manner, that are easily communicated and implemented.
Alec Zhixiao Lin, Southern California Edison
Session 3189-2019
SASPy is a module developed by SAS Institute for the Python programming language, providing an alternative interface to SAS®. With SASPy, SAS procedures can be executed in Python scripts using Python syntax, and data can be transferred between SAS data sets and their pandas DataFrame equivalents. This enables SAS programmers to take advantage of the flexibility of Python for flow control, and Python programmers can incorporate SAS analytics into their scripts. In this hands-on workshop, we use the Jupyter Notebook interface for SAS® University Edition to complete common data analysis tasks using both regular SAS code and SASPy within a Python script. We also highlight important tradeoffs for each approach, emphasizing the value of being a polyglot programmer. As background, Python is an open-source language originally developed for teaching programming in the 1990s. Highly praised for its straightforward syntax, Python initially became popular as a glue language and is now widely used in many problem domains, from data science to web development. Many popular websites are Python applications, including YouTube and Instagram. This workshop is aimed at users of all skill levels, including those with no prior experience using Python or Jupyter Notebook, and assumes only basic familiarity with SAS syntax.
Isaiah Lankham, University of California Office of the President
Matthew Slaughter, Kaiser Permanente Center for Health Research
Session 3573-2019
Data sets used in a web application served by SAS® Stored Processes can experience access errors during high-frequency usage. What can you do when attempts to use LIBNAME option FILELOCKWAIT=n and SAS/SHARE® served libraries continue to generate errors? This paper demonstrates how these failures can occur in Microsoft Windows servers and how to prevent them using system mutexes. A discussion of creating a custom .dll to interact with the mutexes, macros to interact with the .dll, and systematic macro usage is included. These techniques are both academically interesting and useful in actual code.
Richard Devenezia, HITactics
Session 3116-2019
SAS® In-Database Technologies offers a flexible, efficient way to leverage increasing amounts of data by injecting the processing power of SAS® wherever the data lives. SAS In-Database Technologies can tap into the massively parallel processing (MPP) architecture of Apache Hadoop and Apache Spark for scalable performance. SAS® In-Database Code Accelerator for Hadoop enables the parallel execution of user-written DS2 programs using Spark. This paper explains how SAS In-Database Code Accelerator for Hadoop exploits Scala and the parallel processing power of Spark, and prepares you to get started with SAS In-Database Technologies.
David Ghazaleh, SAS
Session 3643-2019
The study presented in this paper looked at possible methods and processes involved in the imputation of complete missing blocks of data. A secondary aim of the study was to investigate the accuracy of various predictive models constructed on the blocks of imputed data. Hot-deck imputation resulted in less accurate predictive models, whereas single or multiple Markov chain Monte Carlo (MCMC) or Fully Conditional Specification (FCS) imputation methods resulted in more accurate predictive models. An iterative bagging technique applied to variants of the neural network, decision tree, and multiple linear regression (MLR) improved the estimates produced by the modeling procedures. A stochastic gradient boosted decision tree (SGBT) was also constructed as a comparison to the bagged decision tree. The results indicated that the choice of an imputation method as well as the selection of a predictive model is dependent on the data and hence should be a data-driven process.
Humphrey Brudon and Renette Blignaut, University of the Western Cape
Session 3317-2019
This paper describes the new object detection and semantic segmentation features in SAS Deep Learning, which are targeted to solve a wider variety of problems that are related to computer vision. The paper focuses on algorithms that are supported on SAS® Viya®, specifically Faster R-CNN and YOLO (you only look once) for object detection, and U-Net for semantic segmentation. This paper shows how to use the functionality of the Deep Learning action set in SAS® Visual Data Mining and Machine Learning in addition to DLPy, an open-source, high-level Python package for deep learning. The paper demonstrates applications of object detection and semantic segmentation on different scenarios, and it shows how to prepare data, build networks, select parameters, load or train the weights, and display results. Future development and potential applications in different areas are discussed.
Xindian Long, Maggie Du, and Xiangqian Hu, SAS
Session 3393-2019
SAS® Viya® includes a new event-driven operational infrastructure for logs, metrics, and notifications. This paper explores the components, flows, and capabilities that give you powerful new insight into the operation of a SAS Viya deployment. With a particular focus on command-line tools, the paper teaches you how to view consolidated log and metric flows, check the status of services, and validate the deployment and its components. Various third-party and enterprise integration scenarios are also explored.
Bryan Ellington, SAS
Session 4052-2019
Entrepreneurship is fast emerging as a transformational megatrend of the 21st century, given its capacity to reshape economies and industries throughout the world. This study shows that initially, societal perception of the entrepreneur was influential in starting a business in the US, while in recent years other factors, such as experience in owning a business and individual perceptions of entrepreneurship, have replaced this influence. This study explores the determinants of why people are likely to start a business in the US. In addition, it sheds light on whether such intentions differ geographically by comparing them across clusters of countries.
Surabhi Arya, Prashant Gour, Dhruv Sharma, and Jeroen Vanheeringen, Oklahoma State University
Session 3000-2019
Strong authentication using techniques such as Kerberos is becoming an IT security requirement for many organizations. SAS® Viya® 3.4 supports the option to delegate Kerberos credentials throughout the environment and onto your Apache Hadoop distribution. Doing so enables you to provide strong authentication both into and out of your SAS Viya 3.4 environment. This paper describes the steps that a SAS® administrator and IT security specialist need to complete in order to enable strong authentication both into and out of a SAS Viya 3.4 environment.
Stuart Rogers, SAS
Session 3015-2019
Accessing external FTP sites to download files is a resource-consuming manual process. The inefficiency grows when the files are large, numerous, and zipped. The common practice for acquiring this data from an external FTP site is to go to the secured site, log on with a user ID and password, find the location of the file within the directory structure, download the data to the target location, and then unzip each file to extract the data for use. The speed at which this repetitive process can be completed depends on network traffic at that day and time, distractions during the process, and other similar factors. This paper provides an automated SAS®-based process that can be scheduled in a Microsoft Windows or Linux environment to complete the entire process at any given time and day.
Shimels Afework, Ricardo Alvarez, and Sammi Nguyen, PerformRx
Session 3087-2019
Competition in customer experience management has never been as challenging as it is now. Customers spend more money in aggregate, but less per brand. The average size of a single purchase has decreased, partly because competitive offers are just one click away. Predicting offer relevance to potential (and existing) customers plays a key role in segmentation strategies, increasing macro- and micro-conversion rates, and the average order size. This session (and the associated white paper) covers the following topics: factorization machines and how they support personalized marketing; how SAS® Visual Data Mining and Machine Learning with SAS® Customer Intelligence 360 support building and deploying factorization machines with digital experiences; and a step-by-step demonstration and business use case for the sas.com bookstore.
Suneel Grover, SAS
Session 3966-2019
The suicide rate in the world is increasing, and mental disorders such as depression and mood disorders are among the main reasons. The objective of this project is to predict how likely a person is to develop suicidal intention based on their behavior, and how we can take effective measures to stop it. Many factors are considered in order to analyze the reasons a person commits suicide and how to prevent it. In the past, many studies examined the reasons why a person commits suicide based on different variables. Our research creates a model that predicts suicide tendency among employees at a workplace, so that we can prevent suicide by predicting a person's behavior and providing proper medical treatment at an early stage. Our research focuses mainly on working people because the suicide rate is high among them. We use data mining, artificial neural networks, and text mining to create a predictive model for our analysis. This model can help us decrease the suicide rate among the younger generation and help treat their mental disorders.
Shambhavi Pandey, Mohammad Bilal Khan, and Sagar Thakkar, Clark University
Session 4003-2019
There are patterns and aspects that make some students perform better than their peers academically. At Oklahoma State University, the administration wanted to identify patterns and factors leading to high academic performance. For this, data consisting of almost 40,000 records and 35 variables were obtained from Institutional Research and Information Management at Oklahoma State University. The data set consists of student demographics, admission data, email interactions before joining (including interaction messages with the admissions office), athlete data, gymnasium check-in data, student employment data, and participation data for various departmental events. This paper attempts to identify factors and predict the performance of students based on their attributes, including demographics, participation in extracurricular activities, and exercise routine. The paper also discusses the performance of various predictive models, such as logistic regression, ensemble models, neural networks, and decision trees, built using SAS® Enterprise Miner (TM). This analysis can help university personnel identify the features that help students perform better and identify areas in which proactive measures should be taken to boost student performance.
Onkar Mayekar and Miriam Mcgaugh, Oklahoma State University
Session 3607-2019
The SAS® data access functions [OPEN(), FETCH(), FETCHOBS(), GETVARN(), GETVARC(), ATTRN(), ATTRNC(), VARNAME(), and CLOSE()] are powerful and under-used elements of the SAS programming language, with broad applications in both macro and DATA step contexts. This paper demonstrates the mechanics and several common use cases of these functions in order to accomplish programming feats that would otherwise be a bit awkward or inefficient to attempt in SAS. In order to capture the most value from this paper, you should have a solid understanding of the DATA step program data vector (PDV) and typical DATA step behavior, as well as an established comfort level with the SAS macro language and the %SYSFUNC() macro function.
Chad Mukherjee, The Financial Risk Group
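A minimal sketch of these functions in a macro context follows; the macro name and the example data set are illustrative, and the macro assumes a character variable:

```sas
/* Sketch of the data access functions via %SYSFUNC: read a
   character variable's value from the first row of a data set
   without a DATA step. Macro name is illustrative. */
%macro getval(ds, var);
   %local dsid rc varnum val;
   %let dsid = %sysfunc(open(&ds));                 /* open for input */
   %if &dsid %then %do;
      %let rc     = %sysfunc(fetchobs(&dsid, 1));   /* read row 1 */
      %let varnum = %sysfunc(varnum(&dsid, &var));  /* locate column */
      %let val    = %sysfunc(getvarc(&dsid, &varnum)); /* char value */
      %let rc     = %sysfunc(close(&dsid));         /* always close */
   %end;
   &val
%mend getval;

%put First name is %getval(sashelp.class, Name);
```

The same skeleton extends naturally to GETVARN for numeric values, ATTRN for data set attributes such as NOBS, and a FETCH loop over all rows.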
Session 3099-2019
There are many algorithms that can find a function root, such as the well-known Newton method, Brent's numerical root-finding method, and so on. But all these algorithms are calculus-based techniques that need an objective function that is continuous or differentiable. These algorithms also require a reasonably good guess for the initial search point. A genetic algorithm does not need to consider these problems; just give it a fairly wide search interval and check whether the value of the objective function is near zero. This paper explains using a genetic algorithm to find a function root and demonstrates its power.
Xia Ke Shan, iFRE Inc.
Kurt Bremser, Allianz Technology GmbH
Session 2973-2019
Firms need accurate, timely, and robust forecasts so that they can better plan for the future. For this reason, many depend on SAS® software to build, execute, and maintain forecasting models. In SAS, many premade time series models are available, and custom models are easy to create and deploy. Practitioners, whether using SAS/ETS® software or SAS® Forecast Studio, can fit various models to a series and deploy the most accurate one according to a specified selection criterion. Practitioners often combine the forecasts from different models (that is, ensemble modeling), because the result can be more accurate and robust than an individual forecast alone. The ensembling process involves combining forecasts whose associated models are ranked using a single error metric. This practice is not the best option available because no single error metric is globally optimal. Instead, forecasters should combine the forecasts from champion models selected using numerous error metrics. The practitioner then sidesteps the issue of whether he or she is using the best error metric in the given context. Ensembling the forecasts from champion models selected using more than one error metric results in a more stable and robust forecast because the outcome is an average of many forecasts, each from the champion of a unique selection criterion. This process eliminates a significant amount of risk and uncertainty, both desirable from the forecaster's perspective. This paper describes the process and demonstrates its benefits.
Zachary D. Blizard, Hanesbrands Inc.
Session 3346-2019
Your Legal, IT, or Communications department said that your reports must be accessible to people with disabilities. They might have used terms like Section 508 or WCAG. Now what? This paper leads you down the path to creating accessible reports by using SAS® Visual Analytics. It includes examples of what to do and what not to do to make your reports accessible. It provides information about which types of objects to use and how to use them in order to maximize the accessibility of your reports. You can use the information in this paper to create accessible reports, comply with your organization's accessibility requirements, and enable people with disabilities to benefit from the information that you publish.
Jesse Sookne, SAS
Session 3873-2019
Currently, global warming is one of the most severe problems in the world. It is widely believed, based on much research, that CO2 is the main factor in global warming. Therefore, many countries actively promote the introduction of renewable energy so as to avoid high-emission power production methods such as thermal power generation. In this paper, the objective is to forecast future CO2 emissions by using SAS® software. I analyzed past trends in electricity demand and generation in order to predict future production quantities and CO2 emissions. Taking worldwide tendencies into consideration, I also constructed some probable eco-friendly scenarios, such as shifting drastically to solar power, and analyzed each of them. Finally, I proposed some electrical generation plans for a sustainable world based on these analyses.
Kaito Kobayashi, The University of Tokyo
Session 3922-2019
When submitting your drug benefit assessments to the German Authority or other (foreign) regulatory agencies, you need to provide your reports in a specific format. These reports are usually in a non-English language, so the characters are different from the English characters that you are used to. The characters contain accents and other special characters, and the numerical results contain commas where you expect decimal points to be, and vice versa. To provide this report, a medical writer typically uses Microsoft Word to copy the results from a Clinical Study Report (CSR) into the document, and includes the appropriate formatting and translations. The copying must be done very carefully, and this process is error-prone and exhausting. Also, the German authorities might have additional follow-up questions. Therefore, following this process is inefficient. There is a way to make the process more automated, and this paper demonstrates those methods. Firstly, this paper shows you a framework you can use for submitting a German Dossier, such as the time-to-event macros, subgroup analyses, and the processes that you can use to get your data in the right format. Secondly, this paper shows you how you can generate the results in the exact format needed for the German Dossiers. You are shown how to use encoding to read and write special characters, and how to use the REPORT procedure along with style attributes to get your outputs in the required, correct format.
Kriss Harris, SAS Specialists Limited
Session 3917-2019
Free-form text data exists in a variety of forms and can come from a variety of sources. Increasingly, the collection and analysis of raw text data is becoming imperative in order for businesses to hear the voice of the customer as well as to monitor internal communications and sentiment. Despite its importance, unstructured text data presents unique challenges to the would-be analyst. Efforts to mine text data and extract actionable insight are greatly simplified with SAS® technology. Online reviews, call center transcripts, medical records, practitioners' notes, survey responses, or any free-form text data can all yield valuable insights for any organization. This paper outlines an example of how to manage and transform free-form text data, apply advanced analytical methods (unsupervised and supervised) to extract useful patterns, and develop actionable insights. We leverage SAS® Visual Text Analytics on SAS® Viya® to mine and visually explore real stakeholder data for actionable insights.
Reid Baughman, Zencos
Session 3198-2019
In recent years, there has been a fundamental shift in the requirements of IT organizations to seamlessly manage and deploy their software estates in the public and private clouds. SAS® Event Stream Processing is at the forefront of this revolution, providing the benefits of a cloud-ready solution coupled with powerful analytics at scale when used alongside AI, machine learning, IoT, and real-time data streaming. This paper discusses advanced design, deployment, and engineering patterns for SAS Event Stream Processing in the cloud and its integration with key third-party technologies such as Apache Kafka.
Louis Katsouris, SAS
Session 3100-2019
In a competitive business environment, the ability to predict customers' behaviors is imperative to the success of every company. The probabilities of customers' behaviors are often estimated by predictive models. In this way, the models help determine who in the customer population should be targeted with the relevant offers. However, the predictive models do not explain why the targeted customers are more likely to buy. They help find the best customers to target but do not explain why or how to do it. As such, models might be ineffective in generating expected business outcomes and excluded from future deployments. In this paper, a new response-factors method is proposed that unravels customers' potential motives to buy. This multi-step approach is based on a correlation analysis between the potential factors to buy and a model's ranking, such as deciles. The highest-correlated factors are selected along with the qualifying threshold criteria for tagging the individual customers with multiple potential reasons to buy. The resulting list of high-ranking customers can then be obtained with the customized response-determining factors. Once the factors are selected, a suitable common-language explanation can be used, thus enabling a more successful relationship with the customer. In this paper, the method of response-factors is explained with an illustrative example and heuristic SAS® code is provided for generating the desired outcome.
Krzysztof Dzieciolowski, Concordia University
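A minimal sketch of the correlation step described above (the data set and variable names P_BUY, FACTOR1–FACTOR3 are invented; the paper's own heuristic code may differ): rank customers into model deciles, then correlate candidate factors with the decile.

```sas
/* Rank customers into 10 groups by model-predicted probability */
proc rank data=scored out=ranked groups=10 descending;
   var p_buy;
   ranks decile;
run;

/* Correlate candidate reasons-to-buy with the model ranking */
proc corr data=ranked outp=corrs noprint;
   var factor1 factor2 factor3;
   with decile;
run;
```

The highest-correlated factors in the CORRS output would then be the candidates for tagging individual customers.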
Session 3014-2019
In stress testing, an analysis of financial risk exposure needs to be carried out with respect to two key components: the current portfolio positions and the future portfolio positions. The risk analysis of current portfolio positions, properly projected in the future, is a well-known topic of discussion. The risk analysis of future portfolio positions (that is, positions that might originate in the future) is less studied, and it poses analytical challenges. In this paper, a random forest regression is used to predict the overall future portfolio volume. Then, conditionally on the future volume, a copula model is used to generate synthetic positions. Finally, a K-Nearest Neighbors classification method is used to fill in non-numerical attributes of the newly generated positions.
Rocco Cannizzaro and
Christian Macaro, SAS
Session 4048-2019
In this session, we tell you how to get everyone onboard into the world of data visualization. SAS® Visual Analytics can provide great insights, but how do you get your organization familiar with dashboarding? At TDC, we have had some success with live dashboards and reporting by providing education and regular problem-solving. In this way, we have made SAS Visual Analytics the primary tool for reporting and delivering insights. Data is not just data. Dashboards are not just dashboards. Getting everything to work together, focusing on enhanced performance, and getting everyone onboard is key! In this session, we give some answers and share some thoughts: How do you get maximum usage of dashboards and provide all the data?
Jais Tindborg, SAS
Session 3617-2019
The fraud risk management methodology is used very successfully by private sector organizations to curb deviations and minimize vulnerabilities. In public sector organizations, the main reference is the Fraud Risk Management Guide published in 2013 by the Committee of Sponsoring Organizations of the Treadway Commission (COSO). Started at the end of 2016, the project for the State Audit Court of Mato Grosso do Sul (TCE-MS) called E-Extractor combines the extraction of data from the databases of all city halls by using artificial intelligence (AI), business intelligence (BI), and SAS® risk and fraud management tools. In addition to the automated collection of data from public organizations, E-Extractor enables the automated collection from different databases, such as the electronic invoice database, the Ministry of Education database, and others. Integrated with BI, E-Extractor enables the auditor to view the transformed data as information for analysis and decision-making in an organized way. With AI and BI, the auditor sees the analysis on the screen and, after prioritization by risk management and fraud tools, the analysis presents all the risks and indications of existing fraud. The objective of this project is to help the TCE-MS achieve effectiveness in its actions, thus fulfilling its constitutional role, which is to judge the accounts of governments and mayors. In this way, we can develop a priority for the better application of public resources.
Douglas Avedikian, Tribunal de Contas do Estado do Mato Grosso do Sul
Session 3167-2019
Do you want to create highly customizable, publication-ready graphics in just minutes using SAS®? This workshop introduces the SGPLOT procedure, which is part of ODS Statistical Graphics, included in Base SAS®. Starting with the basic building blocks, you can construct basic plots and charts in no time. We work through several plot types, and you learn some simple ways to customize each one.
Joshua M. Horstman, Nested Loop Consulting
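To give a flavor of the basic building blocks this workshop covers, here is a minimal SGPLOT sketch using a data set shipped with SAS (the specific plots chosen here are illustrative, not the workshop's actual examples):

```sas
/* A basic bar chart from a single statement */
proc sgplot data=sashelp.class;
   vbar age;
run;

/* A grouped scatter plot with an overlaid regression fit */
proc sgplot data=sashelp.class;
   scatter x=height y=weight / group=sex;
   reg x=height y=weight;
run;
```

Each additional statement or option (titles, axis labels, styles) layers further customization onto these building blocks.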
Session 3881-2019
A large-scale SAS® Grid platform can be a complex environment to administer, and going from an out-of-the-box installation to running the service as part of business as usual (BAU) can appear daunting. Using the five pillars approach set out in this paper, your enterprise can start leveraging the power of a SAS Grid installation from day one by integrating 1) workload management; 2) alerting and monitoring; 3) logging and audit; 4) resilience and availability; and 5) disaster recovery, and backup and restore into your BAU methodology. A toolset consisting of shell scripts, SAS programs, SAS command-line interfaces (CLIs), and third-party alerting tools, combined with automation, puts these guiding principles into practice; along with proven examples, they can smooth the path to enabling analytics across your organization.
Ryan Martis, Demarq
Session 4050-2019
We have designed and examined an analytics solution to make probability predictions of earning revenue per customer visit. The motivation for this study is to help marketing teams make better use of their budgets. Specifically, we want to aid a firm in using its data as a guiding tool for decision-making. The 80/20 rule has often proven right for many firms: a significant portion of their revenue comes from a relatively small percentage of customers. Therefore, marketing teams are challenged to make appropriate investments in promotional strategies. We used customer data from Google's Merchandise Store (G-Store), available on Kaggle, along with SAS® University Edition to derive essential insights and generate models to predict the probability of earning revenue per visit. Our analysis shows that the G-Store's revenue-earning potential is highest among the customers who visited the store 100–500 times, since the mean revenues from this segment are the highest, even though the number of such transactions was limited. The features highlighted in our model can be used by Google to increase the revenues from its existing customer base rather than expending its resources to acquire new customers who might or might not make a substantial purchase per visit on G-Store.
Rohit Kaul and
Meera Govindan, Purdue University
Session 3177-2019
Cloud Analytic Services Language (CASL) is a new language that simplifies running actions in SAS® Cloud Analytic Services and processing the results of those actions. A core strength of CASL is the ability to receive the results of an action (tables, dictionaries, arrays, and scalar values) and transform those results into a final report. You can combine CASL's expressiveness for transforming results with the capabilities of the SAS® Output Delivery System to produce refined reports. We present CASL syntax to extract rows and columns by using a WHERE clause and column subsetting operators. Find out how to pipeline actions using CASL to prepare arguments for an action using the results of a previous action. Come see CASL in action, producing polished reports!
Jerry Pendergrass, SAS
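A hedged sketch of the result-processing pattern described above (it assumes an active CAS session with a table named CARS already loaded; the filter condition is invented): run an action, capture its result, and walk the result table.

```sas
proc cas;
   /* Run an action and capture its results in R */
   simple.summary result=r / table="cars";

   /* Grab the first result table from the dictionary */
   tbl = findtable(r);

   /* Iterate over rows and keep only the ones of interest */
   do row over tbl;
      if row["Mean"] > 100 then
         print row["Column"] row["Mean"];
   end;
quit;
```

The same captured result can be fed as an argument to a subsequent action, which is the pipelining idea the abstract refers to.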
Session 3170-2019
The Graph Template Language (GTL) is the backbone of the Output Delivery System (ODS) graphics produced by SAS® procedures. Procedures such as the Statistical Graphics (SG) procedures rely on pre-defined templates built with GTL. GTL generates graphs using a template definition that provides extensive control over output formats and appearance. Would you like to learn how to build your own template, make customized graphs, and create that one highly desired, unique graph that at first glance seems impossible? Then it's a Great Time to Learn GTL! This paper guides you through the GTL fundamentals while walking you through creating a graph that at first glance appears too complex but is truly simple once you understand how to build your own template.
Richann Watson, DataRich Consulting
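As an illustrative sketch of the template-then-render workflow described above (the template name is invented; the data set ships with SAS), a GTL template is defined with PROC TEMPLATE and rendered with PROC SGRENDER:

```sas
/* Define a reusable graph template */
proc template;
   define statgraph scatterfit;
      begingraph;
         entrytitle "Height vs. Weight";
         layout overlay;
            scatterplot x=height y=weight / group=sex;
            regressionplot x=height y=weight;
         endlayout;
      endgraph;
   end;
run;

/* Render the template against a data set */
proc sgrender data=sashelp.class template=scatterfit;
run;
```

Every statement inside LAYOUT OVERLAY is another layer of the graph, which is how seemingly complex graphs are built from simple pieces.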
Session 3184-2019
The SAS® macro language is a powerful tool for extending the capabilities of SAS®. This hands-on workshop teaches essential macro coding concepts, techniques, tips, and tricks to help beginning users learn the basics of how the macro language works. Using a collection of proven macro language coding techniques, attendees learn how to write and process macro statements and parameters; replace text strings with macro (symbolic) variables; generate SAS code using macro techniques; manipulate macro variable values with macro functions; create and use global and local macro variables; construct simple arithmetic and logical expressions; interface the macro language with the SQL procedure; store and reuse macros; troubleshoot and debug macros; and develop efficient and portable macro language code.
Kirk Paul Lafler, Software Intelligence Corporation
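A short sketch touching several of the techniques listed above (macro variables, parameters, conditional macro logic, and the PROC SQL interface); the macro name and data set are illustrative only:

```sas
%let dsn = sashelp.class;                 /* global macro variable */

%macro freqreport(data=, var=);           /* macro with parameters */
   %if %length(&var) = 0 %then
      %put ERROR: VAR is required.;
   %else %do;
      proc freq data=&data;
         tables &var;
      run;
   %end;
%mend freqreport;

%freqreport(data=&dsn, var=sex)

/* Interface with PROC SQL: load a value into a macro variable */
proc sql noprint;
   select count(*) into :nobs trimmed from &dsn;
quit;
%put NOTE: &dsn has &nobs rows.;
```

The &var references resolve before compilation, which is the text-substitution model the workshop teaches.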
Session 3125-2019
If you've ever used the hash object for table lookups, you're probably already a fan. Now it's time to branch out and see what else hash can do for you. This paper shows how to use hash to build tables and aggregate data. Why would you ever want to do this? How about achieving a complex process that would have taken multiple sorts and many passes through your data all in a single DATA step that flows intuitively and isn't hard to write! This is an updated and expanded version of my original paper "Hash Beyond Lookups: Your Code Will Never Be the Same!" It provides a further exploration of some useful hash techniques and offers more situations that can be elegantly solved using hash. The main message? Don't be afraid of hash!
Elizabeth Axelrod, Abt Associates Inc.
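To illustrate the aggregation idea above, here is a hedged sketch (the input data set SALES with variables PRODUCT and AMOUNT is invented) that totals values by key in a single unsorted pass:

```sas
data _null_;
   if _n_ = 1 then do;
      declare hash h(ordered:"a");
      h.defineKey("product");
      h.defineData("product", "total");
      h.defineDone();
   end;
   set sales end=last;              /* assumed: PRODUCT, AMOUNT  */
   if h.find() ne 0 then total = 0; /* unseen key: start at zero */
   total = total + amount;
   h.replace();                     /* store the running total   */
   if last then h.output(dataset:"totals");
run;
```

No PROC SORT, no PROC MEANS, and only one pass through the data: the hash object holds the running totals in memory and writes them out at the end.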
Session 3213-2019
The website Regulations.gov was launched in 2003 to provide the public with access to federal regulatory content and the ability to submit comments on federal regulations. Public participation in federal rulemaking is encouraged because it supports the legitimacy of regulatory decisions, frames public acceptance or resistance to rules under development, and shapes how the public interest is served. Manually reading thousands of comments is time-consuming and labor-intensive. It is also difficult for multiple reviewers to accurately and consistently assess content, themes, stakeholder identity, and sentiment. Given that individual proposed rules can exceed 10,000 comments, how can federal organizations quantitatively assess the data and incorporate feedback into the rulemaking process as required by law? This paper shows how SAS® Text Analytics can be used to develop transparent and accurate text models, and how SAS® Visual Analytics can quantify, summarize, and present the results of that analysis. These solutions can significantly accelerate time to value, leveraging capabilities that computers excel at while freeing up human intuition for the analysis of these results. Specifically, we address public commentary submitted in response to new product regulations by the US Food and Drug Administration. Ultimately, the application of a transparent and consistent text model to analyze these documents will support federal rule-makers and improve the health and lives of American citizens.
Emily McRae,
Tom Sabo, and
Manuel Figallo, SAS
Session 3196-2019
The advent of personal computers and then smartphones improved the lives of many people in many ways, but for disabled people it was a game-changer, allowing access to untold amounts of information that previously had not been available to them. Sadly, there is a downside: as with all data, access to the information contained within that data depends entirely on how that data has been presented. For disabled people, this understanding is even more important because if insufficient care is taken, information can be made completely inaccessible to large groups of people. I explain the nature of disability, the legality of not providing access to everybody, the standards and guidelines that are available, and how not considering inclusivity can affect the bottom line. I demonstrate some of the tools that disabled people use to access information and explain what can be done to make your data as inclusive as possible.
Terry Clarke, Blazie UK
Session 3439-2019
As a user of the Schedule Manager plug-in to SAS® Management
Console, you might not realize all the features available for managing
flows. This paper shows you how to unlock some hidden features and learn
about new features designed to help with huge flows. This paper also
provides an overview of the various supported schedulers and their niches.
Come learn how to get more out of SAS® scheduling.
Randolph S. Williams, SAS
Session 3500-2019
Generally speaking, a well-designed content security model for a SAS® software deployment is an intentional balance between authorizations and those who require them. Maintaining this balance is an ongoing task for SAS administrators. The process for implementing a security model as part of a SAS® Viya® 3.4 deployment needs to be intentional as well. Fortunately, using an existing SAS®9 environment that has evolved in accordance with the corporate security model can prove very valuable. This value is not limited to the initial implementation of the SAS Viya 3.4 security model. The SAS®9 environment's model can also be referenced when the two environments run in parallel. This paper introduces techniques and processes to identify SAS®9 metadata objects that include directly (and intentionally) applied authorizations, and it explains how to retain those authorizations in SAS Viya 3.4. One of the processes discussed is an automated solution using metadata security macros to deconstruct an Access Control Template (ACT) that is applied to SAS®9 objects. Deconstructing the ACT enables you to retain the authorizations when you promote SAS®9 content to SAS Viya 3.4. For ongoing administrative tasks, or when automation is not necessary, this paper demonstrates how to use Platform Object Framework administration tools, SAS®9 Security Report Macros, SAS® Management Console, SAS Viya 3.4 command-line interface (CLI), and SAS Viya 3.4 Environment Manager.
Mark Dragone, SAS
Session 3532-2019
For every challenge we face on the planet, there is data that can help us find the solution. The next generation of problem solvers, the digital natives, are poised to see issues such as poverty, hunger, gender equality, and climate change not as insurmountable but as solvable puzzles. With unprecedented access to technology and data, and an unquenchable thirst for digital connection, this generation holds incredible promise in solving social challenges that affect every population, every race, every gender. An interdisciplinary approach to education that incorporates science, technology, engineering, and math (STEM) and project-based learning is the best way to prepare these students for the world that awaits their contributions. GatherIQ (TM), a mobile and web app produced by SAS in conjunction with its education software division, Curriculum Pathways, invites students to join the global quest to reach the United Nations Sustainable Development Goals for 2030 through experiences in STEM and project-based learning. Using GatherIQ, students not only learn about the issues but also combine their wits to address social challenges in their own back yards and around the world.
Jennifer Sabourin and
Lucy Kosturko, SAS
Session 3717-2019
With data becoming bigger and bigger year after year, it is important to have the right tool set and platforms to meet the needs of the organization. The idea of one platform to meet all of your needs is something we all think would be the perfect solution: somewhere to bring all your business needs, thereby making the management of the platform a lot easier. Royal Bank of Scotland (RBS) is now three years down the line, and the dream has become the reality. This paper looks at how RBS implemented a multi-tenancy platform, which is internally known as Genie, and what was discovered along the three-year journey. This paper covers: building tenancies (and how using one format to fit all can work); user management; data segregation (or how many SAS® LASR (TM) Analytic Servers do you really need?); and security (using the same model to meet all requirements). One SAS® 9.4 platform, 10,000 users, and 14 tenancies: did the Genie help build our dream?
Ross Sansom, The Royal Bank of Scotland Group
Session 3674-2019
Customers around the world are increasingly demanding rapid and relevant responses from their vendors, meaning that delivering on these expectations has become a key competitive advantage. Telenor, a multinational telco headquartered in Norway, communicates with its 172 million customers in real time, thanks to data and analytics. Highly efficient, real-time communications like these make great demands on platforms for data flow, model development and maintenance, and real-time technology. Telenor uses the latest SAS® technologies to develop solutions for handling consumer and business customer interactions, such as Next Best Action. In this presentation, we show you how SAS® Real-Time Decision Manager and SAS® Event Stream Processing complement each other and contribute to meeting the increasing demands of customers. We present use cases to illustrate the motives and the operational setup of the customer dialog.
Sverre Thommesen, Amesto NextBridge
Session 3034-2019
What is data in motion? What techniques does SAS® Viya® use to protect your data in motion from unwanted inspection or tampering? Why should you care? This paper introduces the concepts behind securely moving data across networks without fear of a data breach or data falsification. First, there's an overview of Transport Layer Security (TLS), x.509 certificates, cipher algorithms, and SAS® Secret Manager. I explain how SAS® uses TLS to establish secure communications between SAS servers and clients. I describe how SAS generates, distributes, and renews certificates, and how SAS delivers and manages the list of trusted Certificate Authorities. I also explain how you can replace certificates provided by SAS with your own certificates for any servers that users access directly. From a security perspective, I show how to disable and enable network security. In summary, I describe how SAS deploys software onto multiple machines while ensuring an end-to-end secure environment by default. Come learn how your data in motion is kept safe and secure.
Alec Fernandez, SAS
Session 3009-2019
SAS® has vast capabilities in accessing and manipulating core business information within large organizations. These capabilities are coupled with powerful Output Delivery System (ODS) tools to generate PDF reports. However, there are limitations to the manipulation of an existing PDF file. Python has access to many libraries that can update PDF bookmarks, hyperlinks, and the content of an existing PDF file. This paper describes a project that combines the strengths of both SAS and Python. This includes: bookmarks (renaming and rearranging parent/child bookmarks); hyperlinks (adjusting internal page links and external PDF file links); styles (controlling zoom views of the pages and font sizes); and text editing (manipulating text content of the PDF file). Users of this PDF tool are SAS programmers. Thus, a SAS macro is used to enable users to specify the PDF inputs. The %PDFTOOLS macro described in this paper reads a Microsoft Excel file and other sources to gather business rules. After performing a series of checks on the PDF inputs, it then integrates with a Python program, which applies corrections to the PDF file. There are many tools that can be used to generate and manipulate PDF reports. The most effective solution is to use the strength of each tool in a unique way to formulate an easy-to-use yet effective method to achieve an optimal PDF report.
Sy Truong and
Jayant Solanki, Pharmacyclics LLC
Session 3103-2019
You as a statistician work effectively if you get the right things done in the right way at the right time. A statistician needs to exhibit strengths in four different areas to be effective: 1) Leadership consisting of self-leadership as well as influencing others. Leadership behaviors include driving projects proactively, having the overall goal in mind, communicating clearly, enabling smooth collaborations, and effectively delegating tasks. 2) Innovation in analyses and processes. Innovators challenge the status quo, see areas for improvement and implement solutions, and keep up-to-date with relevant statistical knowledge and apply it to solve real problems. 3) Knowledge about the data and the business environment. Knowledge comprises an understanding of how your work contributes to the bigger picture, business acumen, a solid understanding of many general statistical approaches, and decent programming skills. 4) Excellence in efficiency and quality. Excellence is exhibited by the ability to focus on details, capability of achieving high quality over an extended period of time, and delivery of quality results with minimal time and resources. This presentation provides instructions for personal improvement in these four dimensions.
Alexander Schacht PhD, UCB and The Effective Statistician
Session 3409-2019
A wide variety of clustering algorithms are available, and there are numerous possibilities for evaluating clustering solutions against a gold standard. The choice of a suitable clustering algorithm and of a suitable measure for the evaluation depends on the data type; whether separate class label information exists (supervised clustering); and on the particular distribution of the observations, including characteristics such as the number of clusters, separability of the clusters, and the shape, size, and density of the clusters. This paper provides a survey of some of the most widely used clustering evaluation criteria. In addition, the paper describes recently developed criteria that are applicable for mixed interval-categorical data and for non-Euclidean distance metrics. Notable examples of the methods covered include residual sum-of-squares, purity, the silhouette measure, the Calinski-Harabasz measure, class-based precision and recall, the normalized mutual information, variation of information, and graph-sensitive indices.
Ralph Abbey, SAS Institute Inc.
Session 3597-2019
Have you ever attended a Hands-on-Workshop and found it useful? Many people do! Being able to actually try out the things that you're learning is a wonderful way to learn. It's also a great way to teach: you can actually see people apply what they're learning. Have you ever thought that it would be fun to teach other people in a hands-on format? Maybe you aren't sure what it takes or how to approach the course. This presentation can help you with those questions and struggles. What to teach? How much to teach? How should I teach it? How is a Hands-on-Workshop different from lecture style? How much to put into Microsoft PowerPoint slides? What if they ask me something I don't know? What if they have a computer problem? All those questions that you have are answered in this presentation.
Chuck Kincaid, Experis
Session 3023-2019
By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions (McKinsey Global Institute report, May 2011). This situation represents a 50–60% gap between the supply and demand of people with deep analytical talent. So it must come as no surprise that how to find SAS® work is the question I am asked the most. Daily I am inundated with resumes, requests to connect on LinkedIn, and offers of coffee, tea, or dinner. SAS is a highly valued skill, and everyone wants the juice on what steps to take to land work. Time magazine rated SAS as the #1 career skill in the data world (http://time.com/money/4328180/most-valuable-career-skills/). SAS is also the #1 skill for landing a bigger paycheck according to this study: http://career-services.monster.com/yahooarticle/best-paid-job-skills#WT.mc_n=yta_fpt_article_best_paid_job_skills. To help fill the gap, I started helping our users to land work by creating a 21-day SAS challenge to identify their strengths. In this informative session, I share 21 tips you can take away. I also share tips from my own job search and how I landed work as a SAS professional. At the end of this session, you will receive an activity sheet with action items to do during the conference.
Charu Shankar, SAS
Session 3278-2019
With the Internet of Things (IoT), a digital twin is created to have a virtual representation of a remote device or system. The digital twin shows you the device's operating condition, no matter where it is physically located. IoT devices have a number of sensors installed on them, as well as sensors for the environment around them. Analytics can bring this sensor data together to create a true real-time digital twin. A previous paper showed how streaming analytics are used for device state estimation and anomaly detection. This paper explains how deep learning can be added to your digital twin for more understanding. Image and video analytics are used to capture operating conditions that are missed by regular sensors. Recurrent neural networks (RNN) add temporal data analysis and pattern detection in real-time data streams that are prevalent in digital twins. With these deep learning capabilities, your digital twin provides a new level of insight for your remote devices.
Brad Klenz, SAS
Session 3169-2019
Looking at websites, you can see a wide variety of ways to display data and interact with users. Much of what you see in a web browser is done by using a combination of HTML, CSS, and JavaScript. There are libraries of CSS and JavaScript code that provide a massive amount of functionality that we can use in our applications. Many of these libraries are free or very inexpensive. I describe some of the best libraries that I know of and how to use them with SAS®. SAS® Stored Processes enable SAS code to be run from a web browser using the SAS® Stored Process Web Application. They enable us to write SAS code that works with web technologies so that we can have SAS in the back end and HTML, CSS, or JavaScript in the front end. In this presentation, I show you how to get SAS to use these libraries to make impressive applications with SAS Stored Processes in the web browser.
Philip Mason, Wood Street Consultants
Session 3716-2019
Managing SAS® platforms in big organizations, you face some unique non-technical challenges, especially when you are serving many different departments. Few other tools offer the connectivity SAS does, and few are so widely used by people across the business. With new data storage types, new server environments like the cloud, and new versions of SAS arriving seemingly every month, leaders and SAS administrators find themselves at the heart of a tangled knot where each department is pulling their own thread to get data and insight relevant to them with ever bigger and more varied tasks. At Royal Bank of Scotland (RBS), our SAS® 9.4M5 platform with SAS® Visual Analytics, which we call Genie, has 10,000 users, 1500 of which manufacture data for SAS Visual Analytics reports, run complex models, and provide in-the-moment analysis. The user base represents almost every area of our company, and everyone from Commercial Banking to Human Resources has a stake in Genie and they all want different things. In this sort of environment, how do our SAS administrators and business leaders make sure we are all pulling together and headed in the right direction? It isn't simple, and the days when a SAS administrator was just the techy in the basement are long gone. To manage today's large SAS platforms, it is necessary to have a strategy for collaboration led by the administrators, helping them keep their finger on the pulse of the business. So how can you do this? It's time to meet the Socialite admins from RBS and find out.
Johnathan Kain, The Royal Bank of Scotland Group
Session 3972-2019
This paper explains how we used SAS® Viya® to investigate phone numbers used in human trafficking, using a combined sample of approximately 3 million classified ads posted on backpage.com. The contact phone numbers displayed in online sex ads might be the most important single clue for finding the traffickers and pimps who profit from these activities. Some investigators believe criminals use inexpensive burner phones and throw them away after a few months to make it difficult for law enforcement to track them. Another theory suggests traffickers use full-featured phones and keep them for much longer periods of time. In a previous study that used a sample of ads collected over one year and about 120 locations, we found that at least 50% of the numbers were still used in ads after 8–12 months. While 40% of the numbers were found in places they had previously appeared, 10% were in new locations. This suggested that some of the missing 50% of the phone numbers might have also moved to other areas that were outside of the locations included in the original sample. To address this issue, we extended the original sample by collecting over 2.5 million additional ads, encompassing more than 300 locations and 12 more months. Results from the second study are presented.
Miriam McGaugh, Oklahoma State University
James Van Scotter and Lauren Agrigento, Louisiana State University
Denise McManus, Ph.D., The University of Alabama
Tom Kari, Tom Kari Consulting
Session 3951-2019
Four hundred million reais. This is the volume of financial resources that the Court of Audit of the State of Ceará (TCE-Ceará) in Brazil identified in the year 2017 as possible irregularities. This finding was possible with the implementation of SAS® tools for the data analysis and analytical intelligence project to identify irregularities (fraud) in purchases. TCE-Ceará is the public institution responsible for the control of public assets and resources of the State of Ceará, promoting ethics in public management in order to guarantee the full exercise of citizenship for the population of Ceará. It can be observed that companies organize themselves in improper ways, such as the creation of cartels, to increase their profit margins, circumventing legislation and implementing illegal methods to win bids. This work presents the methodology, procedures, and techniques applied with SAS® Enterprise Guide®, SAS® Enterprise Miner (TM), and SAS® Social Network Analysis, which enable the crossing of data, the application of sophisticated algorithms, and the creation of networks of relationships between companies and people. All this has made it possible to identify around 400 million reais in tenders with signs of irregularities, contributing to the reduction of corruption in the state of Ceará, Brazil.
Jose Alexsandre Fonseca da Sliva, TCE-Ceará
Session 4053-2019
In the United States, political polarization has become a significant issue. News media bias and the manner in which social networks deliver news to individual users have both been implicated as playing an essential role in increasing this polarization. Many have suggested that identification and disclosure of media bias would help address this divisive issue. In this paper, a method is demonstrated for data scientists to detect media bias without the bias of the data scientist influencing the results. The paper examines only partisan bias and focuses only on the two major political parties in the United States. The method used was to build a predictive model on congressional House speeches, labeling each speech based on the speaker's party, and then use the model to score news media articles. By using a Text Topic node combined with logistic regression in SAS® Enterprise Miner (TM), an accuracy rate of just over 92% at distinguishing whether a Republican or a Democrat made a congressional speech was achieved. This method could be used to build bias checkers that flag articles, assisting readers in evaluating their content and enabling social media sites to ensure that opposing viewpoints are displayed.
Alex Lobman,
Lohit Bhandari,
Nhan Nguyen, and
Ravi Josyula, Oklahoma State University
Session 3214-2019
Can you easily tell the day of the week on which the most sales for a store item occurred, and how its sales varied over time for each day? The cycle plot (also known as month plot or seasonal subseries plot) is a popular tool for answering these types of questions. It is an effective graph for analyzing seasonal patterns and long-term trends. Initially developed by Cleveland et al. in the 1970s, it has gained great interest in the data visualization community lately. Although the Graph Template Language (GTL) does not provide a statement for this plot, you can easily produce one in SAS® with a combination of the DATA step, the SQL and SUMMARY procedures, and ODS Graphics. This paper uses examples to show you ways to generate cycle plots. The same techniques can also be used to produce similar visuals for other types of data.
Lingxiao Li, SAS
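In broad strokes, the approach the abstract describes could be sketched as follows; the data set and variable names (WORK.SALES, SaleDate, Amount) are illustrative, not taken from the paper, and PROC SGPANEL is used only as a compact stand-in for the GTL-based templates the paper develops:

```sas
/* Summarize sales by day of week and month, then panel the
   months within each day to form a cycle plot. */
data prep;
   set work.sales;
   dow   = weekday(saledate);            /* 1=Sunday ... 7=Saturday */
   month = intnx('month', saledate, 0);  /* first day of the month  */
   format month monyy7.;
run;

proc summary data=prep nway;
   class dow month;
   var amount;
   output out=cycle(drop=_type_ _freq_) sum=total;
run;

/* One panel per day of week; the series inside each panel shows
   how that day's sales evolve over time. */
proc sgpanel data=cycle;
   panelby dow / columns=7 novarname;
   series x=month y=total;
run;
```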
Session 3743-2019
Leveraging existing administrative data to better understand the course and consequences of psychiatric conditions is a potentially potent tool for medical researchers as well as those entities that manage and oversee health care for these populations (states, plans, and so on). In psychiatry, some conditions are expected to remit, such that with appropriate treatment and care the symptoms would be expected to be mitigated. For other conditions, there is an expectation of a longer, more chronic course. In addition, people might express symptomatology from different conditions at different points in time, indicating that the primary presenting concern might fluctuate over time. Therefore, people with psychiatric conditions can present with a myriad of symptoms, and categorizing the underlying conditions can be difficult. Being able to identify the most probable principal diagnosis during a specific time period using administrative records would help to identify cohorts of individuals with similar conditions, allow for the indication of the most salient presenting symptomatology, and provide the best estimate of the most probable principal psychiatric diagnosis. This presentation reviews a psychiatric diagnosis SAS® macro, reviews its steps to create a unique mental health diagnosis for a cohort of clients using mental health clinic services, and compares the results to other commonly used diagnostic specification algorithms using administrative data.
Qingxian Chen, M.S. and
Emily Leckman-Westin, Ph.D., New York State Office of Mental Health
Session 3763-2019
Published in 2014 by the International Accounting Standards Board (IASB), IFRS 9 (International Financial Reporting Standard 9) for financial instruments will be mandatory beginning January 1, 2020, in accordance with the Central Bank of Brazil. Because the impairment model in IFRS 9 is forward-looking, banks in Brazil are required to consider future economic scenarios to calculate expected credit loss (ECL), which brings new challenges to modeling teams. In particular, we have identified a set of conditions that make monetary and fiscal policy mutually influence each other in a perverse way. One of these conditions is what we call fiscal dominance: an increase in the interest rate makes public debt dynamics unsustainable, which leads to a rise in the inflation rate rather than a reduction. If this hypothesis is confirmed, the entire IFRS 9 modeling structure should be based on these assumptions, since the forward-looking probabilities of default (PDs) and the ECL calculation would include not only macroeconomic variables related to the respective sector, but also asset prices that affect the whole system. In that case, individual PDs would suffer from the systematic influence of a widespread increase in risk caused by an increased likelihood of default on sovereign debt in response to monetary policy tightening. We have developed a set of models (VAR/VECM) and related tests using SAS/ETS® and SAS® Studio to evaluate that theory.
Julio Filho, BANCO ABC BRASIL
Raphael Lima, The Analytical Sage
Session 3715-2019
Sicredi is one of the largest cooperative financial institutions in Brazil, composed of 114 credit unions working as a system to serve 4 million members. In keeping with cooperative values, Sicredi customers are members who directly take part in each unit's strategic decisions and in the distribution of financial results. Always looking for the best relationship with members, we offer financial solutions that improve their quality of life and develop the communities where we are present. In this paper, we present an overview of our Customer Relationship Management (CRM) journey, which started in 2011 with the implementation of a homemade CRM software solution and has been in constant evolution ever since. Today, our CRM process manages all interactions between account managers and customers, based on systematized actions that follow the phases of the customer life cycle: activation, maintenance, reselling, and churn. Specific business actions are taken according to the customer's phase in the cycle, always searching for the best customer experience. In order to organize our CRM offerings, many business rules and propensity models were implemented using SAS® Enterprise Guide®.
Peterson Colares,
Mathias Martens,
Juliano Silveira, and
Daniel Horn, Sicredi
Session 3363-2019
Any hardware infrastructure that is chosen by a SAS® customer to run their SAS applications requires a good understanding of all layers and components of the SAS infrastructure, as well as an administrator to configure and manage the infrastructure. The customer must also be able to meet SAS requirements: not just to run the software, but also to enable it to perform as optimally as possible. This paper discusses important performance considerations for SAS®9 (both SAS® Foundation and SAS® Grid Manager) and SAS® Viya® when hosted in any of the available public clouds, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform. It also provides guidance on how to configure the cloud infrastructure to get the best performance with SAS.
Margaret Crevar and
Jim Kuell, SAS
Session 3637-2019
Financial institutions play a critical role in keeping access to our country's financial markets free from money laundering and terrorist financing activities. However, most automated detection systems suffer from very low conversion rates and produce an abundance of low-quality investigations. According to industry statistics, over 95% of system-generated alerts are false positives, and nearly 98% never result in a Suspicious Activity Report (SAR). These false alerts cost billions of dollars annually in wasted investigation time and leave investigators with alert fatigue. The conventional rules-based detection approach cannot keep pace with today's constantly evolving money laundering typologies. By using the predictive power of SAS® Visual Data Mining and Machine Learning on SAS® Viya®, financial institutions can reduce risk by reassessing and enhancing their detection strategies with a data-driven, risk-based approach that is recommended by regulators. This presentation demonstrates how SAS Visual Data Mining and Machine Learning enables users to quickly build machine learning models that predict which alerts will generate productive investigations and elevate a company's core anti-money laundering (AML) detection process.
Chris St. Jeor, Zencos
Session 3025-2019
Users can wreak havoc on systems in many different ways, from excessive memory and CPU usage to updating tables with bad data. There are several ways to prevent individuals from doing this, namely by limiting users' access to servers and data. But there is a better way to manage permissions for users that is not server- or environment-specific. By creating a framework of metadata groups and roles, macro-level permissions can be applied at the individual and group levels. This paper explains and demonstrates how to apply macro-level permissions to users via roles and groups in metadata.
Andrew Gannon, The Financial Risk Group
Session 3738-2019
The Institute for Veterans and Military Families (IVMF) offers many nationally run programs, and the survey data we collect across our entrepreneurship programming portfolio captures business outcomes of participants. The cleaning methods in SAS® Data Management Studio, which include a number of SQL executes and data jobs with expression, standardization, concatenation, data validation, clustering, and surviving record nodes, are discussed. With the cleaned data, dashboards were built in SAS® Visual Analytics to communicate program outcomes. The presentation walks through the rationale behind the evaluation and analysis, and how to conduct each step from cleaning the raw data through to its presentation in SAS Visual Analytics. It also details the way in which the IVMF uses graduate student talent, the hallmark of our success. The IVMF is higher education's first interdisciplinary academic institute focused on advancing the lives of the nation's military veterans and their families. As a nonprofit situated on the Syracuse University campus, the IVMF is uniquely positioned to engage students across 13 schools and colleges, while providing them invaluable real-life experience.
Ankita Kalita and Bonnie Chapman, Institute for Veterans and Military Families
Session 3203-2019
Whether you have been programming in SAS® for years, are new to SAS, or have dabbled with SAS® Enterprise Guide® before, this hands-on workshop sheds some light on the depth, breadth, and power of the SAS Enterprise Guide environment. With all the demands on your time, you need powerful tools that are easy to learn and that deliver end-to-end support for your data exploration, reporting, and analytics needs. This workshop explores increasing your productivity by using SAS Enterprise Guide data exploration tools, workspace layout, and enhanced programming environment. You will also learn how to easily create reports and graphics, and produce the output formats that you need (XLS, PDF, RTF, HTML, and PPT).
Marje Fecht, Prowerk Consulting
Session 3952-2019
Sicoob is the largest Credit Union System in Brazil. It has more than 4.2 million customers and almost US$300 million in total assets. This case shows how we used SAS® Enterprise Miner (TM), SAS® Enterprise Guide®, SAS® Management Console, and SAS® Stored Process to completely change the credit scoring process of the institution, exploiting integration with product databases, automation, and reporting. Previous models were based on customer registration data and qualitative information, captured from self-filled questionnaires. The customer's probability of default had to be manually updated by employees, which took, on average, 15 minutes for each update. This process involved high operational risk. The automatic reclassification project is based on a more robust methodology and intensive use of data for decision-making. The new statistical models are now processed inside a SAS® engine, which enabled the creation of behavior score models, integration with internal and external databases, and the automation of the entire process. These models can calculate more consistent and assertive probabilities of default. Another gain of centralizing the process inside SAS tools is the creation of dashboards, built through a SAS Stored Process, to monitor the performance of the models in near real time. In addition, there are gains in operational efficiency, an increase in competitiveness, and the reduction of image and legal risk, for an estimated saving of almost US$3 million for Sicoob.
Roberta Moreira,
Leonadro Aguirre, and
Alfredo Oliveira, Sicoob
Session 2997-2019
Independent component analysis (ICA) attempts to extract from observed multivariate data independent components (also called factors or latent variables) that are as statistically independent from each other as possible. You can use ICA to reveal the hidden structure of the data in applications in many different fields, such as audio processing, biomedical signal processing, and image processing. This paper briefly covers the underlying principles of ICA and then discusses how to use the ICA procedure, available in SAS® Visual Statistics 8.3 in SAS® Viya®, to perform independent component analysis.
Ning Kang, SAS Institute Inc.
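In outline, a call to the ICA procedure might look like the sketch below. The CAS table mycas.signals and columns x1-x10 are hypothetical, and the exact procedure options should be verified against the SAS Visual Statistics documentation:

```sas
/* Extract independent components from ten observed signals
   and write the component scores to an output table. */
proc ica data=mycas.signals seed=12345;
   var x1-x10;
   output out=mycas.scores copyvars=(id);
run;
```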
Session 3856-2019
Marketing nowadays is increasingly done via influencers. Influencers are content creators on social media who have a lot of followers and who influence not only what people think but also what they consider buying. Because influencer marketing is just as costly as the more traditional marketing channels, it can be crucial for a company to select the best influencers to work with. In this paper, we use examples to analyze who the real influencers on social media are and what the extent of their influence is. The case study focuses on influencing opinions and sentiment for several brands. We analyze the posts and activities of a set of social media users with a high number of followers. This requires first downloading posts and their metadata from social media channels (we focus on Instagram and YouTube). Python libraries and code were used to complete these steps. We then analyze the network of posts, likes, and replies to identify the influencers, the communities they are influential in, and the extent of their influence on their followers' opinions and sentiment about the brands being marketed. SAS® Viya® text and network analytics actions and action sets were used to perform these analyses. Both the data scraping steps (Python) and the analytics steps [SAS Viya functionality via Python and SAS® Scripting Wrapper for Analytics Transfer (SWAT)] are surfaced using Jupyter Notebook as a client.
Tamas Bosznay, Amadeus Software
Session 3282-2019
Advanced informatics layered with customized analytics has driven major change in every discipline in the past decade. In the healthcare environment, individual level determinants of health are being leveraged to reveal personalized patient management that considers disease patterns, high-risk attributes, hospital acquired conditions, and performance measures for specialized treatment approaches. The growing need for health analytics expertise in light of these health informatics advancements during the last decade has created a critical void for higher education to fill. An abundance of new data-based opportunities that have made large public-use data sets accessible for easy download and use in the classroom has allowed for a much more applied classroom experience. More hands-on applications are positioning students for greater impact in the real world upon graduation and entry into the job market. In this presentation, we highlight the use of controlled and adaptive case studies leveraging SAS® and real-world data to provide a more realistic classroom experience.
Besa Smith, MPH, PhD, ICON, plc
Tyler Smith, MS, PhD, National University
Session 3548-2019
This paper presents a new framework for integrating Python into Base SAS® on the Microsoft Windows platform. Previous attempts at invoking Python functionality through Base SAS involved the use of a Java object in the DATA step. This paper aims to present a more comprehensive framework that includes support for the following: transferring data and parameters from SAS® to Python; transferring data from Python to SAS; transferring figures generated in Python (using Matplotlib) back to SAS reports; and transferring contents from standard output and error streams from the Python process back to SAS reports. Although the components required to invoke Python from Base SAS have been available for a while, the lack of ease-of-use could prove to be a barrier to widespread use. The proposed framework aims to improve the usability of Python functionality through Base SAS by making the process more user-friendly. The framework is implemented through helper SAS macros, helper Python scripts, and helper binaries.
Venu Gopal Lolla, Oklahoma State University
Session 3973-2019
Data scientists and data analysts at top analytic companies are looking for a way to do Agile, self-service analytics: running their business intelligence and analytics quickly in a massively parallel, in-database environment that makes it seamless for business users to manage and load all their various data types from many different systems. They need the ability to quickly explore, prototype, and test new theories with SAS®, failing or succeeding quickly, all in a self-service environment that does not depend on IT. The data types that need to be loaded range from commodity Hadoop to Microsoft Excel spreadsheets, external data sets, SAS data, and other database management systems (DBMSs). All these data types need to be loaded quickly and easily and merged with traditional data warehouse data. This presentation outlines a simple path for building a platform in Teradata that integrates SAS and other analytic capabilities with many data types, Hadoop, and enterprise data warehouse (EDW) data into a single, self-service solution, as well as strategies for querying and combining data from multiple existing systems to produce an even more powerful analytic decision-making environment.
Bob Matsey, Teradata
Session 2991-2019
This presentation explains some techniques available to you when working with SAS® and Microsoft Excel data. You learn how to import Excel data into SAS using the IMPORT procedure, the SAS DATA step, SAS® Enterprise Guide®, and other methods. Exporting data and analytical results from SAS to Excel is performed using the EXPORT procedure, the SAS DATA step, SAS Enterprise Guide, the SAS Output Delivery System (ODS), and other tools. The material is appropriate for all skill levels, and the techniques work with various versions of SAS software running on the Windows, UNIX (including Linux), and z/OS operating systems. Some techniques require only Base SAS®, and others require SAS/ACCESS® Interface to PC Files.
Vincent DelGobbo, SAS Institute Inc.
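Two of the techniques the paper covers, the IMPORT/EXPORT procedures and ODS, can be sketched as follows; file paths, sheet names, and data set names are illustrative:

```sas
/* Read a worksheet into a SAS data set. */
proc import datafile="c:\data\sales.xlsx"
            out=work.sales dbms=xlsx replace;
   sheet="Q1";
   getnames=yes;
run;

/* Write a data set back out to a new workbook. */
proc export data=work.sales
            outfile="c:\data\sales_copy.xlsx"
            dbms=xlsx replace;
   sheet="Q1";
run;

/* Or send procedure output straight to Excel with ODS. */
ods excel file="c:\data\report.xlsx";
proc print data=work.sales; run;
ods excel close;
```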
Session 2987-2019
SAS® Analytics for Containers provides the option to deploy SAS® Analytics within container-enabled infrastructures, including Docker and Kubernetes, which are often run in the cloud. Aiming to analyze massively large data from Google BigQuery through SAS® in a containerized environment, we have integrated Google BigQuery with SAS® Analytics Pro in a Docker container in a Google Cloud environment. This paper guides you through the process of configuring SAS/ACCESS® Interface to Google BigQuery in a containerized SAS® application, along with the steps for validating the configuration.
Sanket Mitra,
Srivalli Avadhanula, and
Fahad Ali, Core Compete PVT LTD
Session 3635-2019
The objective of this breakout session is to show strategies used while implementing the SAS® Platform into a Business Analytics curriculum at the University of Arkansas. The University of Arkansas Walton College of Business, located in Fayetteville, Arkansas, is accredited by the Association to Advance Collegiate Schools of Business (AACSB). The AACSB has participated in workshops with the Information Systems Department, outlining best practices for business analytics programs. SAS has been instrumental in helping to design the business analytics programs and has recognized the value of our program by presenting graduate and undergraduate students with a SAS-endorsed certificate upon completion. The initial portion of this session outlines the journey of program design and growth. The SAS solutions used in the Analytics program include SAS® Enterprise Guide®, SAS® Enterprise Miner (TM), SAS® Viya®, and SAS® Visual Analytics. The second part of the session provides an outline of how SAS® has been used in the core courses leading to the presentation of the SAS certificate. In addition, aspects of these applications are used in courses outside of the core courses in order to supplement the skill-set development of the students.
Michael Gibbs, University of Arkansas, Sam M. Walton College of Business
Session 3260-2019
For drug development, SAS® is the most powerful tool for analyzing data and producing the tables, figures, and listings (TLF) that are incorporated into a statistical analysis report as part of the Clinical Study Report (CSR) in clinical trials. On the other hand, in recent years, programming tools such as Python and R have been maturing and are widely used in the data science industry, especially for academic research. For this reason, gains in productivity and efficiency can be realized by combining these tools and having them interact. In this paper, basic data handling and visualization techniques in clinical trials with SAS and Python are introduced, including the pandas and SASPy modules that enable Python users to access SAS data sets and use other SAS functionalities.
Yohei Takanami, Takeda Pharmaceuticals
Session 3261-2019
This paper demonstrates how you can use interactive graphics in SAS® 9.4 to assess and report your safety data. The interactive visualizations that you are shown include adverse event and laboratory results. In addition, you are shown how to display details-on-demand when you hover over a point. Adding interactivity to your graphs will bring your data to life and help improve lives!
Kriss Harris, SAS Specialists Limited
Richann Watson, DataRich Consulting
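One SAS 9.4 mechanism for the hover-over details the abstract mentions is the ODS Graphics image map. A minimal sketch follows; the data set ADLB and its variables are hypothetical ADaM-style names, not taken from the paper:

```sas
/* Enable HTML image maps so each plot marker carries a tooltip. */
ods graphics / imagemap=on;
ods html path="c:\output" file="labs.html";

proc sgplot data=adlb;
   scatter x=visitnum y=aval /
      tip=(usubjid paramcd aval);  /* details shown on mouse-over */
run;

ods html close;
ods graphics / imagemap=off;
```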
Session 3394-2019
This paper provides an overview of the SAS solution offering for the new International Financial Reporting Standard IFRS 17, which includes a product bundle of SAS® Infrastructure for Risk Management, SAS® Risk and Finance Workbench, and SAS® Visual Analytics. We cover both the main features of the solution and specific project-related details that relate to items that we have already observed in project implementations.
Sebiha Sahin, SAS
Session 3353-2019
SAS® Visual Data Mining and Machine Learning 8.3 in SAS® Viya® 3.4 includes the new patternMatch action, which you can use to execute graph queries that search for copies of a query graph within a larger graph, with the option of respecting node or link attributes (or both). This feature is also available via the PATTERNMATCH statement in the NETWORK procedure. The paper presents examples of pattern matching in social network and anti-money laundering applications. It also provides a functional comparison to Neo4j's query language, Cypher, and computational comparisons to both igraph and Neo4j.
Matthew Galati,
Rob Pratt, and
Steve Harenberg, SAS
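A skeletal PROC NETWORK call of the kind the paper describes might look like the following; all table names are placeholders, and the option names should be verified against the procedure documentation:

```sas
/* Search the main graph for all subgraphs that match the query
   graph, honoring node and link attributes where present. */
proc network
   nodes      = mycas.nodes
   links      = mycas.links
   nodesQuery = mycas.nodesQuery
   linksQuery = mycas.linksQuery;
   patternMatch
      outMatchNodes = mycas.outMatchNodes
      outMatchLinks = mycas.outMatchLinks;
run;
```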
Session 3430-2019
Prior to SAS® 9.4M6, there were two options for managing your workload on a SAS® grid: Load Sharing Facility from IBM and Apache Hadoop YARN. Starting with SAS 9.4M6, SAS provides its own workload manager, SAS® Workload Orchestrator. This paper provides information about the new SAS Workload Orchestrator in the areas of design and configuration, along with a comparison to the other workload managers that are supported for SAS® Grid Computing.
Doug Haigh, SAS
Session 3042-2019
SAS/STAT® 15.1 includes PROC BGLIMM, a new, high-performance, sampling-based procedure that provides full Bayesian inference for generalized linear mixed models. PROC BGLIMM models data from exponential family distributions that have correlations or nonconstant variability; uses syntax similar to that of the MIXED and GLIMMIX procedures (the CLASS, MODEL, RANDOM, REPEATED, and ESTIMATE statements); deploys optimal sampling algorithms that are parallelized for performance; handles multilevel nested and non-nested random-effects models; and fits models to multivariate or longitudinal data that contain repeated measurements. PROC BGLIMM provides convenient access, with improved performance, to Bayesian analysis of complex mixed models that you could previously perform with the MCMC procedure. This paper describes how to use the BGLIMM procedure for estimation, inference, and prediction.
Amy Shi and
Fang Chen, SAS
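For illustration, a random-intercept logistic model in PROC BGLIMM might be specified as below; the data set CLINIC and its variables are hypothetical, not from the paper:

```sas
/* Bayesian logistic regression with a random intercept per site:
   10,000 posterior samples drawn after 2,000 burn-in iterations. */
proc bglimm data=clinic seed=976352 nmc=10000 nbi=2000;
   class site treatment;
   model outcome(event='1') = treatment age / dist=binary link=logit;
   random intercept / subject=site;
run;
```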
Session 3016-2019
Clustering has long been used to learn the structure of data and to classify individuals into distinct groups. For example, cluster analysis can be used for marketing segmentation that distinguishes potential buyers from nonbuyers. Classical clustering methods use heuristics or simple Euclidean distance to form clusters, but in some cases, the clustering is simpler and more useful if it is based on a formal likelihood. The MBC procedure, available in SAS® Visual Statistics 8.3 in SAS® Viya®, enables you to fit mixtures of multivariate Gaussian (normal) distributions to your data to learn a cluster structure in an unsupervised manner. You can use the fitted model to classify new observations. PROC MBC provides a weight of association for each new observation, enabling you to decide whether a new classification is a strong match for one cluster or needs closer expert examination. This paper describes the concepts behind model-based clustering and presents the basic mode of operation of PROC MBC. Several examples illustrate different use cases, including automated model selection through information criteria, the modeling of outliers, saving models, and applying saved models to new input data.
David Kessler, SAS
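In outline, fitting a Gaussian mixture and scoring with PROC MBC might look like the sketch below; the CAS table and variable names are hypothetical, and the options should be checked against the SAS Visual Statistics documentation:

```sas
/* Fit a three-cluster Gaussian mixture model and write each
   observation's cluster assignment to an output table. */
proc mbc data=mycas.customers nclusters=3 seed=27513;
   var recency frequency monetary;
   output out=mycas.scored copyvars=(id);
run;
```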
Session 3186-2019
Data-driven programming, or data-oriented programming (DOP), is a specific programming paradigm in which the data, data structures, or both control the flow of a program, and not the program logic. Often, data-driven programming approaches are applied in organizations with structured and unstructured data for filtering, aggregating, transforming, and calling other programs. This paper and presentation explore several data-driven programming techniques that are available to SAS® users. Topics include using metadata to capture valuable information about a SAS session, such as the librefs that are currently assigned, the names of the tables available in a session, whether a data set is empty, the number of observations in a data set, the number of character versus numeric variables in a data set, and a variable's attributes; using the CALL EXECUTE routine to process (or execute) code generated by a DATA step; constructing a user-defined format directly from data; and using the SQL procedure and the macro language to construct an automated looping process.
Kirk Paul Lafler, Software Intelligence Corporation
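The CALL EXECUTE pattern the abstract mentions can be illustrated with a small, self-contained example that generates one PROC PRINT step per distinct value found in the data; the table and variable names are illustrative:

```sas
/* Build the list of driver values from the data itself. */
proc sql noprint;
   create table drivers as
   select distinct region
   from work.sales;
quit;

/* Generate and queue one PROC PRINT step per region; the queued
   code runs after this DATA step finishes. */
data _null_;
   set drivers;
   call execute(cats(
      'title "Sales for ', region, '";',
      'proc print data=work.sales;',
      '   where region = "', region, '";',
      'run;'));
run;
```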
Session 3304-2019
SAS has partnered with Esri, the world's leading mapping technology company, to provide access to geospatial features throughout SAS® Visual Analytics. This paper shows you how to find trends and make decisions by adding location information to your data using geocoding, enriching your data by adding demographics, and analyzing your data using routing and drive-time calculations. We also show you how to incorporate your Esri shapefiles and feature services, and we give a preview of future integration.
Jeff Phillips,
Scott Hicks, and
Eric Short, SAS
Session 3581-2019
Nonlinear mixed-effects models are models in which one or more coefficients of the model enter in a nonlinear manner, such as appearing in the exponent of the growth function. This talk is intended for users already familiar with linear mixed-effects models who are interested in extending their modeling options to include more complex functions. Unlike linear mixed-effects models for longitudinal data, nonlinear mixed-effects models enable researchers to apply a wide range of nonlinear growth functions to data, including multi-phase functions. This talk reviews the syntax for the NLMIXED procedure for fitting a variety of nonlinear mixed-effects models.
Shelley Blozis, University of California, Davis
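As one example of the kind of model the talk covers, a nonlinear growth curve with a subject-specific asymptote could be coded as follows; the data set GROWTH with variables id, t, and y is hypothetical:

```sas
/* Nonlinear mixed-effects growth model: each subject gets its own
   asymptote (b2 + u); the rate is exp(b3) to keep it positive.
   At t=0 the curve equals the baseline b1. */
proc nlmixed data=growth;
   parms b1=1 b2=10 b3=-1 s2u=1 s2e=1;
   pred = b1 + (b2 + u - b1) * (1 - exp(-exp(b3) * t));
   model y ~ normal(pred, s2e);
   random u ~ normal(0, s2u) subject=id;
run;
```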
Session 3068-2019
As a Base SAS® programmer, you spend your day manipulating data and creating reports. You know there is a procedure that can give you what you want. As a matter of fact, there is probably more than one procedure to accomplish the task. Which one should you use? How do you remember which procedure is best for which task? This paper is all about the Base procedures. It explores the strengths of the commonly used, non-graphing procedures. It discusses the challenges of using each procedure and compares it to other procedures that accomplish similar tasks. The first section of the paper looks at utility procedures that gather and structure data: APPEND, COMPARE, CONTENTS, DATASETS, FORMAT, SORT, SQL, and TRANSPOSE. The next section discusses the Base SAS procedures that work with statistics: FREQ, MEANS/SUMMARY, and UNIVARIATE. The final section provides information about reporting procedures: PRINT, REPORT, and TABULATE.
Jane Eslinger, SAS Institute Inc.
Session 3175-2019
Statistical models for analyses of failure times include the proportional hazards model and the accelerated failure time model. These models can be extended to assess the influence of a longitudinally assessed biomarker (a time-varying covariate) on the survival distribution by modeling the hazard function or the scale parameter of a parametric survival distribution. If the biomarker is updated intermittently at a few time points, a straightforward approach applies the most recent values preceding the failure times. The SAS® procedures PHREG, LIFEREG, and SEVERITY can be used for these analyses. Joint models for the failure time and biomarker parse their joint distribution into conditionally independent components given random effects. Using the GLIMMIX procedure, the biomarker trajectory is constructed as a linear function of random effects and polynomials or splines of time. When incorporated into the survival model as a time-varying covariate, the joint model, called a shared parameter model, is estimated using the NLMIXED procedure. The joint model provides a more complete use of the data on failure times and the longitudinal data on the biomarker. Recent software developments, including a SAS macro, have harnessed SAS procedures to address analyses of shared parameter models. We provide a brief overview of methods and demonstrate their application with previously published biomedical data.
Joseph Gardiner, Michigan State University
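The "most recent value" approach described above can be expressed in PROC PHREG with counting-process style input, where each record covers an interval during which one biomarker value applies; the data set LONG and its variables are hypothetical:

```sas
/* One record per (subject, interval); biomarker holds the most
   recent measurement in effect over (tstart, tstop], and status=0
   marks censored intervals. */
proc phreg data=long;
   model (tstart, tstop) * status(0) = biomarker trt / ties=efron;
run;
```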
Session 3434-2019
In the 1950s, people like George Barris and Von Dutch took standard American sports cars and turned them into custom cars with rebuilt bodies and pinstripe paint, giving birth to the kustom car industry. SAS® Viya® provides a highly integrated analytics environment. Data scientists can use SAS® Studio to run point-and-click SAS Viya machine learning models and automatically access the scored data in SAS® Visual Analytics interactive reporting. That comes standard with the SAS Viya engine. But the engine provides many opportunities to create a custom workflow for analytics projects to kustomize the SAS Viya engine with additional features and a stunning new paint job. By making your own point-and-click tasks in SAS Studio and using open-source data visualization software like D3.js to develop unique graphs within SAS Visual Analytics, you can supercharge your data science platform. In this paper, we create a highly customized, end-to-end workflow for machine learning modeling using SAS Studio custom tasks to trigger multiple modeling scenarios, aggregate the resulting output, and create a JSON data structure ready for D3.js. We present D3.js graphs like streamgraphs, circle packing, and sunburst graphs that can be run from within SAS Visual Analytics to explore the results of analytic modeling. All of the code for both the SAS Studio custom tasks and JavaScript visualizations will be available on GitHub for users to kustomize their own SAS Viya ride.
Elliot Inman,
Ryan West, and
Olivia Wright, SAS
Session 3862-2019
Are you looking for large-scale reporting from your data? Do you want to empower end users with the information that data provides and help in decision-making? Do you want to serve all the reporting needs of customers without the need for any business intelligence (BI) or reporting tool? Do you want reports to be generated with a single click using a user-friendly UI? Do you have a large audience looking for reporting services? If you answer yes to any of these questions, then this paper is for you. In this paper, we discuss a use case in which every quarter, the business wants to generate more than 1,500 customized customer reports. To accomplish this, we have developed a tool that enables the business to generate customer-specific reports with a single click and share the reports with customers as an email attachment. To achieve this outcome, we use SAS® programming, SAS® Add-In for Microsoft Office, VBScript, and think-cell. The final output is a Microsoft PowerPoint and PDF report that can be shared with customers via email, which is accessible anywhere. By the end of this paper, you will have an overview of using SAS programming for large-scale reporting purposes with the help of the Output Delivery System (ODS) for Excel output and VBScript for converting Excel data to a PowerPoint and PDF report.
Lokendra Devangan,
Balraj Pitlola, and
Malleswara Sastry Kanduri, Core Compete
Session 3192-2019
If your customer has not yet chosen to upgrade to SAS® Viya®, you need to find a way to optimize your analytical models without using a full factorial grid search. Latin hypercube sampling with multidimensional uniformity is a good algorithm for sampling your factorial data set. We then use genetic optimization to search for a better set of hyperparameters.
Andrea Magatti, Business Integrator Partners BIP
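The sampling step described above can be sketched in a few lines. The following Python sketch implements standard Latin hypercube sampling; the paper's multidimensional-uniformity variant (LHSMDU) refines how strata are combined across dimensions, so the function name and bounds here are illustrative assumptions, not the author's code.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw a Latin hypercube sample: each dimension is split into
    n_samples equal strata, and exactly one point falls in each stratum."""
    rng = random.Random(seed)
    dims = len(bounds)
    samples = [[0.0] * dims for _ in range(n_samples)]
    for d, (lo, hi) in enumerate(bounds):
        # one random point per stratum, then shuffle strata across samples
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)
        for i in range(n_samples):
            samples[i][d] = lo + points[i] * (hi - lo)
    return samples

# Example: 10 hyperparameter settings over (learning rate, tree depth)
grid = latin_hypercube(10, bounds=[(0.01, 0.3), (2, 10)])
```

Because every stratum of every dimension is covered exactly once, even a small sample spans the full range of each hyperparameter; a genetic optimizer can then refine the search starting from these points.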
Session 3133-2019
One of the interfaces that SAS® University Edition includes is the popular JupyterLab interface. You can use this open-source interface to generate dynamic notebooks that easily incorporate SAS® code and results into documents such as course materials and analytical reports. The ability to seamlessly interweave code, results, narrative text, and mathematical formulas all into one document provides students with practical experience in creating reports and effectively communicating results. In addition, the use of an executable document facilitates collaboration and promotes reproducible research and analyses. After a brief overview of SAS University Edition, this paper describes JupyterLab, discusses examples of using it to learn data science with SAS, and provides tips. SAS University Edition, which is available at no charge to educators and learners for academic, noncommercial use, includes SAS® Studio, Base SAS®, SAS/STAT®, and SAS/IML® software and some other analytical capabilities.
Brian Gaines, SAS
Session 3174-2019
During 2018, I was the Technical Lead on an Analytics 2.0 project for a large New Zealand Government organization that was deploying a MapR converged data platform in Microsoft Azure and a SAS® Viya® and SAS® 9.4 Platform via SAS® Global Hosting in Amazon Web Services (AWS). This presentation covers the technical architecture that was defined for the integrated platforms and the lessons learned during the implementation of the platforms. The presentation discusses the following key areas: 1) SAS Viya multitenancy architecture design; 2) implementing SAS Viya and SAS 9.4: why you need to consider the tradeoffs; 3) implementing multi-cloud platforms and integration issues; 4) defining authentication and authorization in a managed platform environment; 5) data integration patterns between SAS Viya and a MapR data lake or data vault; and 6) the automation required to achieve a DataOps vision.
Shane Gibson, Pitch Black Ltd
Session 3410-2019
Creating useful data presentations that communicate key points and influence audiences is a mix of art and skill, much like a da Vinci masterpiece. The task always begins with a blank canvas that many data professionals find intimidating. It's not always clear where to start or how to ensure that you get the results you want. Whether you are trying to create a simple report, a dazzling dashboard, or tell an influential data story, using a standard method simplifies the process and enables your inner report artist to bloom. In this session, you learn various data presentation methods, review a common approach for creating data presentations, and then review some examples and techniques. This session features examples using SAS® tools, such as SAS® Visual Analytics and SAS® Office Analytics.
Tricia Aanderud, Zencos
Session 3181-2019
TXU Energy (TXU) is a market-leading Retail Electricity Provider (REP) in Texas. TXU is the retail arm of Vistra Energy, one of the nation's largest integrated power producer (IPP) companies. TXU serves more than 1.7 million customers with a wide range of innovative products and services that are backed by a reputable and trusted brand and a strong culture of service excellence and customer satisfaction. Unlike regulated utilities, the Texas market is fully deregulated and is one of the most dynamic and competitive retail environments in the US. Customers have a choice of service provider, and TXU Energy is proud to be the #1 preferred brand for Texans. In this presentation, Atul Thatte shares how TXU leverages advanced analytics at every stage of the customer lifecycle to deliver a best-in-class experience and maintain long-term profitable relationships with its customers. Atul walks you through the mission-critical role of advanced analytics from a profit and loss (P&L) and customer experience optimization perspective and why advanced analytics should be viewed as a business function that achieves measurable strategic and tactical business goals, as opposed to purely a technology function. Moreover, Atul also shares TXU Energy's data and analytics journey, key accomplishments, and best practices that have helped TXU maintain its key competitive advantage in a crowded marketplace of service providers.
Atul Thatte, TXU Energy
Session 2992-2019
Parameters are a set of dynamic variables in SAS® Visual Analytics that enable developers to store a selected value from prompts, buttons, sliders, and other end-user inputs in a variable or group of variables. Often underused, parameters can give developers a powerful level of control over their reports. For example, a parameter can be used to create a dynamic X or Y axis, to dynamically combine or split variables in group roles, and to increase filtering possibilities. This breakout session clarifies what parameters are and how to use them, and gives examples of ways that parameters can be used to enhance the end user's dashboard experience. The following examples are included: creating dynamic variable roles; creating buttons to dynamically combine and split columns; and using parameters in advanced filters across multiple data sources.
Jeffrey Meyers, Mayo Clinic
Session 3064-2019
Life tables are statistical tools that are typically used to portray the expectation of life at various ages. Life expectancy at birth is the most frequently cited life table statistic. A life table also provides information about the number of individuals who survive to various ages, median age at death, age-specific death rates, and the probability of dying at certain ages. The Texas Department of State Health Services creates life expectancy tables for publication in the Vital Statistics Annual Report every year. The old method demands that data be run through a Python program, then a Java script, and then back through Python, which is a tedious process. In this paper, we respond to this challenge by creating SAS® code that produces a life table in a single program. The code provides a general framework for building life tables, a fundamental tool in survival analysis.
Anna Vincent and
Suting Zheng, Texas Department of State Health Services
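The authors' single-program SAS syntax is not reproduced in the abstract; as an illustration of the life-table mechanics it automates, here is a minimal Python sketch. The column conventions are assumptions (survivors l(x), deaths d(x), person-years L(x), expectancy e(x)), with deaths spread evenly over each age interval.

```python
def life_table(qx, radix=100_000):
    """Build a single-age life table from death probabilities q(x).

    qx    : list of probabilities of dying between age x and x+1
            (the final entry should be 1.0 to close the table)
    radix : hypothetical birth cohort size, l(0)
    Returns rows of (age, lx, dx, Lx, ex).
    """
    n = len(qx)
    lx = [float(radix)]
    dx = []
    for q in qx:
        d = lx[-1] * q            # deaths in the interval
        dx.append(d)
        lx.append(lx[-1] - d)     # survivors to the next age
    # person-years lived in each interval (deaths assumed mid-interval)
    Lx = [lx[i] - 0.5 * dx[i] for i in range(n)]
    Tx = 0.0
    ex = [0.0] * n
    for i in range(n - 1, -1, -1):
        Tx += Lx[i]               # person-years remaining from age i
        ex[i] = Tx / lx[i]        # life expectancy at age i
    return [(i, lx[i], dx[i], Lx[i], ex[i]) for i in range(n)]
```

With real data, qx would come from observed age-specific death rates rather than being supplied directly.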
Session 3564-2019
Social media have become important spaces for organizing communication between citizens, candidates for elective office, parliamentarians, and government. They can foster popular participation and give greater transparency to politics by building a bridge of interaction between representatives and the population, candidates and voters. However, we know that this democratizing potential of social media is not always achieved. With the objective of giving greater clarity to the electoral debate and to this democratizing potential of social networks on the internet, this article analyzes the level of engagement of internet users with the publications on the Facebook pages of the candidates for the Presidency of the Republic of Brazil, Luiz Inácio Lula da Silva and Jair Bolsonaro. More specifically, it seeks to measure each candidate's ability to mobilize engagement on their pages and how that capability changed (or not) throughout 2018. For the purpose of this analysis, engagement is the sum total of likes, shares, reactions, and comments. The reactions are quantified from the following options: Curti (Like), Amei (Love), Haha, Wow, Sad, or Grr. Five thousand publications by page administrators, 9.5 million shares, and 29 million reactions were analyzed. The graphs show, using different metrics, the engagement of internet users with the publications on the candidates' pages.
Bruno Ferreira da Paixão, Government of the Federal District
Session 3263-2019
SAS/STAT® software and SAS® Enterprise Miner (TM) are two excellent environments for applying machine learning and other analytical procedures to a wide range of problems, from small data sets to the very large and very wide. In SAS Enterprise Miner, one can move seamlessly from data cleaning and processing, through preliminary analyses and modeling, to comparisons of predictive accuracy of several predictive methods and the scoring of new data sets. Many powerful machine learning and statistical learning tools including gradient boosting machines, artificial neural networks, and decision trees are nodes in SAS Enterprise Miner, and other SAS® procedures that are not nodes can be accessed through the SAS Code Node. In this talk, I work through some examples from business and other areas to illustrate some of the many capabilities of SAS/STAT and SAS Enterprise Miner.
David Cutler, Utah State University
Session 3377-2019
In SAS® Visual Investigator, scenario administrators can author surveillance rules to detect threats and generate highly visual alerts, which can then be investigated by analysts. Analysts then use the application to perform investigations before taking action on the alert. Wouldn't it be great if the system could learn from the investigation that an analyst has performed in order to make the alert generation process more accurate and more up-to-date as trends change? This paper describes how SAS® Adaptive Learning and Intelligent Agent System automates the surveillance authoring process and uses machine learning techniques to adapt as new threat patterns emerge. For organizations without training data sets, we also examine how unsupervised and semi-supervised learning can be used with SAS Adaptive Learning and Intelligent Agent System to provide effective surveillance solutions.
Rory Mackenzie, SAS
Session 3212-2019
Sending out individualized reports can take days if you have hundreds (or thousands) of recipients. SAS® can help you get that time back by generating and sending these emails. In this paper, we use SAS macros to send hundreds of individualized reports in a matter of minutes. This process can be fully automated with a single input table that maps recipients to their respective reports. We cover options that can be applied to the emails, including personalizing the content of the email, sending from a designated email address, and marking the emails as high priority. In addition, we discuss creating summary reports that provide you with useful information, such as invalid email addresses and a snapshot of the emails sent out in a given run.
Salma Ibrahim, SAS
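The paper's implementation uses SAS macros and the SAS email engine; to illustrate the same pattern (a single mapping table driving personalized messages, priority flags, and a summary of invalid addresses), here is a hedged Python sketch with hypothetical field names.

```python
from email.message import EmailMessage

def build_reports_mail(rows, sender="reports@example.com"):
    """Turn a recipient/report mapping into ready-to-send messages.

    rows: list of dicts with keys name, email, report_path, high_priority.
    Returns (messages, invalid) where invalid collects bad addresses
    for the summary report.
    """
    messages, invalid = [], []
    for row in rows:
        addr = row["email"].strip()
        if "@" not in addr:                 # crude validity check for the summary
            invalid.append(addr)
            continue
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = addr
        msg["Subject"] = f"Your report: {row['report_path']}"
        if row.get("high_priority"):
            msg["X-Priority"] = "1"         # mark the email as high priority
        msg.set_content(f"Dear {row['name']},\n\nPlease find your report attached.")
        messages.append(msg)
    return messages, invalid

# smtplib.SMTP(...).send_message(msg) would then deliver each message
```

In the SAS version, the same mapping table drives a macro loop over FILENAME EMAIL; the point of the sketch is only the table-driven structure.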
Session 3580-2019
SAS® macros are a powerful tool that can be used in all stages of SAS program development. Like most programmers, I have collected a set of macros that are so vital to my program development that I find that I use them just about every day. In this paper, I demonstrate the SAS macros that I developed and use myself. I hope they will serve as an inspiration for SAS programmers to develop their own set of must-have macros that they can use every day.
Joe Deshon, Boehringer Ingelheim Animal Health
Session 3825-2019
A data scientist is always needed to construct various models and to acquire the latest analysis methods for various kinds of data in order to maximize data science. Fashionable artificial intelligence (AI) is no different. The AI that we define is a system with a series of processes of recognition, learning, and action, which assists with people's activities. There are various types of data used in AI, so the models or methods that use recognition, learning, and action differ depending on the data format. However, in talking about data science, data governance is very important regardless of the data format. We have made a strategy for data governance using Python and SAS® via SAS® Viya®, and have been maximizing data science based on effective matching such as machine learning and deep learning (CNN, RNN, and so on). As one example, we introduce an AI SAS programmer system developed by our company, which semi-automatically creates SAS programs to analyze clinical data. This system is constructed from machine learning, deep learning, and so on, with the programming language selected in a data-driven manner by SAS Viya. The system led to a 33% reduction in standard work time for analysis work. Now, based on this data-driven data governance strategy, we are making a strategy to utilize data science for product innovation in new drug development, and we also introduce a part of this strategy.
Ryo Kiguchi,
Katsunari Hirano, and
Yoshitake Kitanishi, Shionogi & CO., LTD.
Session 3093-2019
A clean SAS® data set is the deliverable. It is the one and only thing the analysts are waiting for, but so much goes into the data before it is ready to be shared. Our research team has spent decades learning to design good data collection instruments and to create purposeful survey questions with practical response values. Whether the questionnaire is manually administered by a research assistant or online software is used to collect information from a respondent, it is important to begin with a clear and concise plan. Survey software is widely used in industry and academia. We use our experience in SAS to tailor the way the data is collected; the way SAS likes to see it. Our example uses Qualtrics Research Suite to design and gather data, and SAS® 9.4 to create a data set for further investigation and analysis.
Julie Plano and
Keli Sorrentino, Yale University
Session 3645-2019
SAS® metadata contains extensive information about all elements of a SAS site. Important parts of that metadata are surfaced through clients such as SAS® Management Console and SAS® Data Integration Studio. But a lot is not accessible that way. For example, SAS Data Integration Studio does not tell you if another table has a foreign key relation to a table you might delete. A good understanding of the relations between different elements in the metadata can be acquired only by browsing through the metadata. You might want a list of all logins and the persons that use them. This paper describes an easily installed SAS® Stored Process that enables the user to navigate through the entire metadata of a SAS®9 installation, using only the web browser. The metadata is also available for people who do not have any of those SAS clients (dependent on authorization, of course). Information about jobs, transformation steps, tables read and written, users and their group memberships, and so on, are presented in a structured way. This Metadata Explorer is available in the public domain, through www.pwconsulting.nl, and can be installed without any preconditions. This paper describes the way the SAS Stored Process is organized and explains several technical tricks, like the STREAM procedure to create HTML and XML files on the fly; the XSL procedure to transform the XML results of a metadata query into a more easily processed form; a new ODS tagset for the REPORT procedure to produce folding tables using Bootstrap CSS, and so on.
Frank Poppe and
Laurent de Walick, PW Consulting
Session 3620-2019
Propensity scores are commonly used in observational studies to adjust data for increased validity in exposure variable effect estimation. The PSMATCH procedure offers a number of methods for using propensity scores to this end, eliminating the need for macros. This paper highlights the use of PROC PSMATCH to perform 1:1 propensity score matching of treated and control units without replacement. As the proportion of matched cases is influenced by features of the data and PROC PSMATCH settings, this paper provides an example to illustrate these effects. The proportion of matched units is evaluated as a function of matching strategy (that is, greedy versus optimal), ratio of cases to controls in the original data set, and the number of variables in the model.
Brittany Hand,
J. Madison Hyer,
Rittal Mehta, and
Timothy Pawlik, Ohio State University, Wexner Med Ctr
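A minimal sketch of the greedy 1:1 strategy discussed above, in Python for illustration only (PROC PSMATCH implements this natively): it shows why the proportion of matched units depends on the caliper, the case-to-control ratio, and the matching order.

```python
def greedy_match(treated, controls, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on propensity score,
    without replacement.

    treated, controls: dicts of {unit_id: propensity_score}.
    caliper: maximum allowed score distance for a match.
    Returns {treated_id: control_id} for matched pairs.
    """
    available = dict(controls)
    matches = {}
    # process treated units in score order (one common greedy ordering)
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            matches[t_id] = c_id
            del available[c_id]      # without replacement
    return matches
```

Because each control is consumed as soon as it is matched, greedy matching can strand later treated units whose nearest remaining control falls outside the caliper; optimal matching instead minimizes total distance over all pairs, which is one of the contrasts the paper evaluates.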
Session 2986-2019
Mastering Parameters in SAS® Visual Analytics
Stu Sztukowski, SAS Institute Inc.
Session 3062-2019
This session provides useful hints and philosophies for starting out well as a SAS® programmer, and for improving from beginner to seasoned hand.
Kurt Bremser, Allianz Technology
Session 3621-2019
Better use of analytics is clearly a priority for executives, but approving funding for new tools or resources is not. Part of the challenge is justifying the need for even more investment when they don't immediately understand how analytics contributes to the bottom line. Developing the financial justification for analytics that executives will respond to starts with measuring analytic results in real-world dollars. It's not about making new discoveries or gaining access to new data. It's about driving financial value. This presentation walks you through the process of developing the business and financial justification for advanced analytics by using examples gathered from customers across numerous industries. Focus is given to building the formulas necessary to measure improvements in operational performance, as well as addressing how these companies are able to achieve the extra bandwidth necessary to make these improvements a reality.
Kenneth Pikulik, Teradata
Session 3568-2019
If the stability of a model's performance is important, how would you measure it before deploying the model into production? This paper discusses the use of a randomization test as a post-model-build technique to understand the potential variations in a model's performance. Awareness of the variance in model performance can influence the deployment decision and help manage performance expectations once the model is deployed. During model build, this information can also be of use in choosing among candidate models.
Daymond Ling, Seneca College of Applied Arts and Technology
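The paper's specific randomization scheme is not detailed in the abstract; the following Python sketch conveys the general idea with a bootstrap-style resampling of the holdout set, recomputing the metric many times to see the spread you might face in production. The function name and the percentile choice are assumptions.

```python
import random

def performance_spread(y_true, y_pred, metric, n_rounds=200, seed=42):
    """Estimate variability of a model's holdout performance by
    repeatedly resampling the holdout set and recomputing the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        scores.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    scores.sort()
    # approximate 2.5th and 97.5th percentiles of the metric
    return scores[len(scores) // 40], scores[-(len(scores) // 40)]
```

A wide interval warns that a single holdout number overstates how precisely the model's production performance is known, which is exactly the deployment-decision point the paper makes.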
Session 3341-2019
The powerful analytics in SAS® Viya® have recently been extended to include the ability to process medical images, the largest driver of health-care data growth. This new extension, released in SAS® Visual Data Mining and Machine Learning 8.3 in SAS® Viya® 3.4, enables customers to load, visualize, analyze, and save health-care image data and associated metadata at scale. As SAS continues to build on this foundation, several substantially new medical image processing capabilities are planned for future versions of SAS Visual Data Mining and Machine Learning. In particular, these new capabilities will enable customers to perform the following tasks: process generic data files, such as radiotherapy (DICOM-RT) files, under the Digital Imaging and Communications in Medicine (DICOM) standard; process images in a single SAS® Cloud Analytic Services (CAS) action call even when the processing parameters vary from one image to another in the input table; use highly advanced techniques to perform image segmentation; and quantify the size and shape of tissue regions in binary images. This paper demonstrates the new capabilities by applying them to colorectal liver metastases (CRLM) morphometry with CT scans to assess patients' response to chemotherapy. This study is a collaborative effort between SAS and Amsterdam University Medical Center (AUMC) for improving CRLM treatment strategies.
Joost Huiskens, SAS
Fijoy Vadakkumpadan, SAS
Session 3669-2019
There is a major update to the SAS® Grid Quick Start on Amazon Web Services (https://aws.amazon.com/quickstart/architecture/sas-grid-infrastructure/). In this session, we present the SAS Grid Quick Start 2.0 and the following key improvements: 1) Launch a fully automated end-to-end SAS® Grid environment (AWS infrastructure and SAS Grid software) that is ready for use; 2) Support for IBM Spectrum Scale file storage for SAS Grid shared storage in addition to Lustre storage; 3) SAS Grid nodes run on i3.8xlarge instances; 4) Additional enhancements for storage type and shared data backup; and 5) The SAS Grid deployment is done using Ansible scripting, which SAS administrators can easily change to add or remove nodes in an existing grid environment.
Siddharth Prasad and
Srikanth Mergu, Core Compete LLC
Session 3978-2019
SAS® offers a variety of useful tools for manipulating character data. These are very handy when the data you receive is not as clean as you would like. A character string can contain inconsistent or unwanted special characters or formats that make your reconciliation or merging job fail. This paper uses many workable examples to discuss several functions for efficiently standardizing a text variable before you use the data in the next step of analysis. For example, undesired information can be removed from variables by leveraging the three arguments of the COMPRESS function. Inconsistent formats or special characters can be standardized with a consistent character using the TRANWRD, TRANSLATE, UPCASE, LOWCASE, or PROPCASE functions. Sometimes, you might need to add or reorder a substring; the concatenation operator (||) or the CAT, CATT, CATS, CATX, SUBSTR, or SCAN functions can be used. The intended audience is beginner to intermediate SAS users with good knowledge of Base SAS®.
Guihong Chen, TCF Bank
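For readers outside SAS, the standardization chain the paper builds with COMPRESS, TRANWRD/TRANSLATE, and PROPCASE can be approximated in Python; this sketch is only an analogy for the same cleanup steps, not the paper's code.

```python
import re

def standardize(text):
    """Typical pre-merge cleanup: drop unwanted special characters
    (like COMPRESS), normalize separators (like TRANWRD/TRANSLATE),
    and fix case (like PROPCASE)."""
    text = re.sub(r"[^\w\s-]", "", text)       # remove special characters
    text = re.sub(r"[-_]+", " ", text)         # standardize separators
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text.title()                        # proper-case the result

# standardize("  smith,,  JOHN--Q. ")  ->  "Smith John Q"
```

The SAS COMPRESS function's third argument plays the role of the character class here, letting you keep or delete whole categories of characters (digits, punctuation) in one call.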
Session 4056-2019
The primary objective of this study is to create a set of detailed profiles of web advertisements known to recruit victims into human trafficking and use those profiles to classify other advertisements into the same category. The data used in this study are advertisements involving the sex trade scraped from Backpage.com. These advertisements provide evidence of certain key factors that are indicative of human trafficking. Some factors associated with human trafficking are phone numbers, locations and geographical range, frequency of ads, and specific syntax usage.
Lauren Agrigento and
William Taylor, Louisiana State University
Session 3565-2019
This paper shares how SAS and Millennium Bank partnered to bring success to their Millennium Banking Academy (MBA) program. The presentation covers the value that SAS® Education provided toward the program and how Millennium has benefited from it. Millennium has selected the SAS® certification program as the flagship training asset of its Trainees Program. With a 90% retention rate, the results are very positive. Moreover, this Trainee Program was ranked in the top three in Portugal at the prestigious HR Magazine Awards 2018.
Tarcísio Pontes, Millennium bcp
Session 3079-2019
My presentation addresses how to manage the risk embedded in banks' heavy use of credit risk models for improving decision-making. That risk is realized if there is a difference between the value received from the model and reality. Hence, the more you lean on a model for decision-making, the higher the cost of error. Lending is the core business of most banks. One of the key success factors in credit is being able to differentiate between the good borrowers and the bad ones by using credit rating models. These models might be used in other models (for example, expected loss, pricing, concentration risk, economic capital). As a result, a flawed model might have a severe impact and cost of error. Since the role of models has been increasing due to digitalization, real-time decision-making, big data, IoT, and new machine learning algorithms, most of the risk has shifted from the credit officers to the models. In addition, every model has a lot of stakeholders (the model's owner, developer, and validator; internal audit; the board; the regulator), which creates an ecosystem around it regarding its data quality, monitoring, documentation, correspondence, business uses, links to other models, and so on. As a result, managing model risk and developing strong corporate governance of models are crucial. In this presentation, I describe how to establish a framework of corporate governance of models in organizations, and how to mitigate that risk with SAS®.
Boaz Galinson, VP, Credit Model Risk Management, Leumi Bank
Session 3356-2019
In our increasingly analytical world, models are high-value assets that drive important business decisions and critical regulatory activities. They have enabled organizations to replace subjective opinion with objective and repeatable logic. However, models require attention and care; most professionals would agree that bad models are costly! Yet many companies struggle to answer basic questions about their models: How well are your models performing? Which models require attention? Are you properly allocating your resources to address your material models? SAS is the largest and most successful model risk management software vendor in the world. Software like SAS® Model Risk Management enables organizations to use a data-driven approach to assess their model risks, identify gaps, review and update their policies and procedures, and reassess their risks. This iterative approach continually improves model quality and efficiency. In this presentation, I focus on best practices in exploring model risk data, as well as fine-tuning your governance policy and approach.
David Asermely, SAS
Session 3729-2019
HIV-infected patients can experience many intermediate events, including between-event transitions, throughout the duration of their follow-up. By modeling these transitions and accounting for competing adverse events at each transition, we can gain a deeper understanding of patient trajectories and how predictors affect patient status over the entire progression pathway. Transition-specific parametric multi-state models were applied to gain this deeper understanding. Transition-specific distributions for multi-state models, including the combination of proportional hazards and accelerated failure time models, are presented and compared in this study. The analysis revealed that the probability of staying in a severe or advanced stage of the disease decreases as time progresses. The transition probability of immunological recovery increased with increasing follow-up time. We further found that while the transition probability of recurrence increased with time, it reached an optimum at a point in time and then decreased as time progressed. Multi-stage modeling of transition-specific distributions offers a flexible tool for the study of covariate effects on the various transition rates. It is hoped that this article will help applied researchers familiarize themselves with the models, including the interpretation of results. In this paper, the implementation was carried out using SAS® software (NLMIXED code).
Zelalem Dessie and
Temesgen Zewotir, University of KwaZulu-Natal
Session 3596-2019
Cigarette smoking is harmful to health and is estimated to have a yearly social cost of billions of dollars. Smoking behavior is often established during the teenage years, so understanding why teens start smoking is important in the development of smoking prevention policies. One issue in modeling smoking choice is that factors in a person's choice to smoke are often correlated with unobserved characteristics or individual circumstances. If appropriate statistical techniques are not used, this endogeneity causes biased parameter estimates and incorrect inference. This paper demonstrates how to overcome the problem of endogeneity by using techniques from SAS® Econometrics software.
Gunce E. Walton, SAS Institute Inc.
Session 3154-2019
Deep learning is attracting more and more researchers and analysts with its numerous breakthrough successes in different areas of analytics and artificial intelligence. Although SAS® is well-known for its power in data processing and manipulation, it is still relatively new as a deep learning toolbox. On the other hand, Python has become the language of deep learning due to its flexibility and great support from deep learning researchers. In this paper, we present a case study of combining SAS and Python for the task of predicting next-day stock price direction with deep recurrent architectures, including a vanilla Recurrent Neural Network, Long Short-Term Memory, Gated Recurrent Unit, and a novel model we propose named the Recurrent Embedding Kernel. To use the power of each framework, we preprocess the data in SAS and build our deep models in Python. The whole process is unified in one framework using the SASPy module, which allows access to SAS code from a Python environment.
Linh Le and
Ying Xie, Kennesaw State University
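As a companion sketch (not the authors' code), the recurrence underlying the vanilla RNN named above can be written in a few lines of pure Python, here with scalar toy weights rather than trained weight matrices:

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One vanilla RNN update: h_t = tanh(w_x * x_t + w_h * h_prev + b).

    Scalar toy version of the recurrence the deep models build on;
    real layers use weight matrices and bias vectors.
    """
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_sequence(xs, w_x=0.5, w_h=0.8, b=0.0):
    """Fold a price sequence through the recurrence; the final hidden
    state summarizes the history and would feed a classifier that
    predicts next-day direction.  Weights here are arbitrary."""
    h = 0.0
    for x in xs:
        h = rnn_step(x, h, w_x, w_h, b)
    return h
```

A real next-day-direction model would learn the weights by backpropagation and feed the final hidden state into a classifier; in the paper's workflow, SASPy supplies the SAS-preprocessed series to Python.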
Session 3730-2019
Traditional SAS® application configuration and administration involves many detailed tasks that require deep knowledge of the SAS® Platform. When these tasks are done manually, they take a lot of time and mistakes are possible. That is why we wanted to modernize the way they are done. SAS Platform administration configuration tools have been used as a quick start to automate configuration tasks. Some of the tools have been automated and integrated with the enterprise-level identity and access management process. SAS Platform user IDs, rights, and authorizations are now administered with as much automation as possible. When user groups have been added into Active Directory for a new application, the application configuration is automatically done in the development environment the next morning. The automation includes metadata folders, groups, ACTs, libraries, SAS® LASR™ Servers, and file system folders. There is also a SAS® Stored Process interface for production administrators to promote the configuration for selected applications. The benefits of this solution come in many ways. The identity and access management integration alone saves more than the development costs. The process is also faster than before, which improves the end-user experience by many days. More savings will be achieved in the configuration part. On the technical side, this solution provides top-level naming standards and best practices for a variety of SAS technologies.
Timo Blomqvist, OP-Palvelut Oy
Tapio Kalmi, SAS
Session 3375-2019
In recent releases of SAS® Visual Analytics on SAS® Viya®, many reporting and dashboarding capabilities have been introduced that make the move to the current version of SAS Visual Analytics very attractive. Because the underlying architecture has changed, you have to carefully prepare for this move. This paper describes how you can best prepare and execute the modernization and what you should take into consideration to avoid any setbacks. The presentation also includes a live demonstration showcasing how to move content from SAS Visual Analytics 7.4 to SAS Visual Analytics 8.3.
Gregor Herrmann, SAS
Session 3448-2019
This paper presents a novel approach for monitoring model performance over time. Instead of monitoring accuracy of prediction or conformity of the predictors' marginal distributions, this approach watches for changes in the joint distribution of the predictors. Mathematically, the model-predicted outcome is a function of the predictors' values. Therefore, the predicted outcomes contain intricate information about the joint distribution of the predictors. This paper proposes a simple metric, coined the Feature Contribution Index. Computing this index requires only the predicted target values and the predictors' observed values. Thus, we can assess the health of a model as soon as the scores are available, and raise our readiness for preemptive actions long before the target values are eventually observed. This index is model-neutral because it works for any type of model that contains categorical predictors, continuous predictors, or both, and outputs predicted values or probabilities. Models can be monitored in near real time since the index is computed using simple and time-matured algorithms that can be run in parallel. Finally, it is possible to provide statistical control limits on the index. These limits help foretell whether a particular predictor is a plausible culprit in causing the deterioration of a model's performance over time.
Ming-Long Lam, Ph.D., SAS
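The abstract does not give the Feature Contribution Index formula, but the monitoring idea (watch the predicted scores themselves for drift, before any target values arrive) can be illustrated with a simpler stand-in metric, the population stability index, in pure Python:

```python
import math

def bin_fractions(scores, edges):
    """Fraction of scores falling into each bin [edges[i], edges[i+1]).
    Assumes scores lie within [edges[0], edges[-1]]."""
    counts = [0] * (len(edges) - 1)
    for s in scores:
        for i in range(len(edges) - 1):
            if edges[i] <= s < edges[i + 1] or (i == len(edges) - 2 and s == edges[-1]):
                counts[i] += 1
                break
    n = max(len(scores), 1)
    return [c / n for c in counts]

def score_drift(baseline, current, edges=(0.0, 0.25, 0.5, 0.75, 1.0), eps=1e-6):
    """Population stability index between two batches of predicted scores.

    A generic drift alarm in the same spirit as the paper's index: it
    needs only predicted values, so it can run before targets are
    observed.  This is NOT the Feature Contribution Index itself,
    whose exact formula is not given in the abstract.
    """
    b = bin_fractions(baseline, edges)
    c = bin_fractions(current, edges)
    return sum((ci - bi) * math.log((ci + eps) / (bi + eps)) for bi, ci in zip(b, c))
```

Identical score distributions give a drift of zero; a shifted batch of scores produces a large positive value, flagging that the joint distribution feeding the model may have changed.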
Session 3195-2019
SAS® Credit Scoring is a flagship product from SAS that helps you reduce credit losses and boost your overall business performance by making better, data-driven credit decisions on both the origination and servicing sides of your business. One of the key benefits of the solution is the reduction in time to market, which delivers significant Return on Investment (ROI). Recently, the solution has undergone significant enhancements that reduce its implementation time from several months to several weeks, enabling financial institutions to realize their time-to-market ROI even sooner. Based on real-life project experiences, this paper outlines how implementation can be fast-tracked through an improved assessment and a phased implementation approach. Moreover, the use of an automation toolkit optimizes implementation efforts, which helps further reduce time frames. This combination of processes and a toolkit enables you to leverage the existing data infrastructure at a site, thereby avoiding long project cycles that are associated with building data warehouses or data marts that feed into the solution. This paper also provides existing and new customers of SAS Credit Scoring with an update on recent enhancements to the solution and a roadmap to faster ROI.
Abhijit Joshi, SAS
Session 3270-2019
Real-world data (such as health insurance claims, data from providers' electronic health record systems, and disease registries) is a rich source of potential evidence that is used to inform topics from public health surveillance to comparative effectiveness research. Data sourced from individual sites can be limited in its scope, coverage, and statistical power. Pooling data from multiple sites and sources, however, presents provenance, governance, analytic, and patient privacy challenges. Distributed data networks resolve many of these challenges. A distributed data network is a system for which no central repository of data exists. Data is instead maintained by and resides behind the firewall of each data-contributing partner in a network, who transforms their data into a common data model and permits indirect access to the data via a standard query approach. This paper discusses the contributions of several national-level distributed data networks, including the Sentinel Initiative, the National Patient-Centered Clinical Research Network, and the Observational Health Data Sciences and Informatics program. Focus is placed on the analytic infrastructure (the common data models and reusable analytic tools) that each network has developed to support its scientific aims. This paper considers how organizations that are not members of these networks can adopt or adapt what has been achieved at the national level to develop their own analytic infrastructures.
Jennifer R. Popovic, RTI International
Session 3322-2019
This paper introduces SAS® Viya® 3.4 multi-tenancy and considerations for its deployment. It delves into the steps required to deploy and configure multi-tenancy successfully. It highlights the reasons to use multi-tenancy, changes to make during deployment, how to onboard tenants, and pitfalls to avoid during the configuration.
Eric Davis, SAS
Session 3605-2019
Missing data is a common phenomenon in various data analyses. Imputation is a flexible method for handling missing-data problems since it efficiently uses all the available information in the data. Apart from the regression imputation approach, the MI procedure in SAS® also provides multiple imputation options, which create multiple data sets based on Markov chain Monte Carlo (MCMC) and full conditional specification (FCS) methods. However, these methods might not work very effectively for skewed multivariate data since they require the assumption of a multivariate normal distribution. To deal with such data, we introduce an approach based on the copula transformation recently introduced by Bahuguna and Khattree (2017). We combine imputation using PROC MI and copula theory using PROC COPULA to arrive at an approach that solves the missing data problem for skewed multivariate data. We implement and demonstrate the use of this method through a simulated example under the assumption that data are missing completely at random (MCAR).
Zhixin Lun and
Ravindra Khattree, Oakland University
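As a rough illustration of the marginal step in a copula-style workflow (not the Bahuguna and Khattree method itself), a rank-based normal-scores transform can pull a skewed variable toward normality before a normality-assuming imputation method is applied:

```python
from statistics import NormalDist

def normal_scores(values):
    """Rank-based transform of a skewed sample toward normality.

    Each value is replaced by the standard-normal quantile of its
    rank fraction r/(n+1), the kind of marginal transformation a
    copula approach uses before applying normality-based imputation
    such as PROC MI's MCMC method.  Illustrative only; the cited
    method is more involved, and ties are broken arbitrarily here.
    """
    nd = NormalDist()
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        out[idx] = nd.inv_cdf(rank / (n + 1))
    return out
```

For a heavily skewed triple like [1, 10, 100], the transform returns symmetric values around zero, preserving order while discarding the skewed marginal shape.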
Session 3947-2019
Each year, thousands of players from all around the world submit their names to the National Basketball Association (NBA) in hopes that their talent warrants an invitation to the upcoming draft, the process that enables teams to select the best talent to add to their rosters. One of the major obstacles for teams in this process is trying to weed out the bad players, so to speak, from the good. Traditional selection methods have no real empirical backing and currently possess no foolproof measure of ensuring that the players chosen will perform well in the league. The purpose of this research paper is to develop models that can assist scouts in selecting the optimal players and prevent the selection of players who are more prone to bust or exhibit poor performance. In this study, I have developed several predictive models, such as decision trees and neural networks, that aid in forecasting a player's first-year performance in the NBA. These models illustrate how pre-draft characteristics such as an individual's height, hand size, weight, wingspan, and other attributes serve as predictors of first-year performance. The models were developed using SAS® Enterprise Miner, and the best model will be selected based on the validation data mean squared error.
Isaiah Hartman, Walmart and Oklahoma State University
Session 3126-2019
Demand forecasting is essential for supply chain processing and for determining the income expectations of a company. Demand forecasting defines production plans for each Nestlé factory. These plans are needed to evaluate manpower, to spot resource over-utilization and under-utilization, to produce in advance, to identify potential inventory shortages, and to create purchase plans for raw and packing materials. The better the forecast, the better the sales, operational, and financial planning. Lower inventories are needed to keep a high customer-service level. There are several factors that shape the demand of a food and beverage company, such as the macro-economic context, marketing, and special events. All of this data is used by a SAS® solution for predictive analysis. Which SAS solution you choose depends on each demand and its variables. The solution needs to be able to identify everything that is related to a particular business unit because that information might be used in demand forecasting. A statistical forecast is now part of the sales and operational planning for 11 business units. SAS® Forecast Server is the baseline solution to predict when to grasp opportunities, how to close gaps between forecast and business targets, and how to identify risks. The accuracy of demand planning improved for part of the portfolio after SAS Forecast Server was implemented. SAS Forecast Server had a direct impact on the level of customer service: it reduced inventories and improved product freshness.
Marcos Borges, Nestlé
Session 3896-2019
SAS® runs on multiple platforms and has multiple procedures for pulling in data, multiple methods to create and modify data, multiple procedures for data analysis, multiple methods to generate data, multiple ways to present the data, and, to top it all off, SAS offers multiple learning options. As someone new to SAS, how do you decide the best route for your own learning experience? Three years ago, I began my own journey as a SAS rookie when I started transitioning from a career as a secondary science instructor to a SAS programmer and institutional researcher in higher education. In my own learning journey, I have used both online and face-to-face classes and successfully achieved certification as a SAS programmer. I have a doctorate in Adult and Community College Education, and I provide an overview of my own SAS journey from the perspective of both an adult learner and a professional adult educator. Based on both my own experience and my expertise, I provide participants with a step-by-step procedure for developing personalized professional development plans to fit their unique needs and situations.
Kelly Smith, Central Piedmont Community College
Session 3123-2019
A majority of analytic work cannot start until an enterprise data warehouse (EDW) exists or significant database development work occurs. This time-consuming and usually cost-prohibitive prerequisite gets in an organization's way of generating real analytical value quickly from data. Other issues with EDWs include security (a single point of failure), scalability, data transfer and resource constraints, and accessibility. What if blockchain technology could help an organization skip the EDW process altogether? And how can SAS® be leveraged to generate immediate value in ensuring valid blockchain creation as well as timely analytic results? In this paper, we explore several potential cross-industry use cases for blockchain and how SAS can assist in creating a block and generate value from the block.
Angela Hall, SAS
Session 3874-2019
The Acute Physiology and Chronic Health Evaluation (APACHE) II classification system is commonly used in intensive care units (ICUs) to classify disease severity and predict hospital mortality. Disease severity scales in the ICU are a necessary component to assist in predicting patient outcomes, comparing quality of care, and stratifying patients for clinical trials. The APACHE II score is calculated from patient demographics and physiologic variables measured in the patient's first 24 hours following ICU admission. The score comprises three components: 1) an acute physiology score (APS); 2) age; and 3) chronic health conditions. Although this information is accessible in electronic medical records (EMRs), some systems do not have a way to automate the calculation of the APACHE II score, leaving physicians to calculate it by hand, which decreases patient care time and increases calculation error rates. This paper shows how to calculate the APACHE II score based on information from EMRs using our macro.
Margaret Kline and
Jessica Parker, Spectrum Health
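The macro itself is in the paper; as a flavor of the calculation, the age component of APACHE II (per the published Knaus et al., 1985 scale) is a simple table lookup, sketched here in Python rather than SAS:

```python
def apache_age_points(age):
    """Age component of the APACHE II score (Knaus et al., 1985):
    <=44 -> 0, 45-54 -> 2, 55-64 -> 3, 65-74 -> 5, >=75 -> 6.

    The full score also adds the acute physiology score (APS) and
    chronic-health points, which are omitted in this sketch.
    """
    if age <= 44:
        return 0
    if age <= 54:
        return 2
    if age <= 64:
        return 3
    if age <= 74:
        return 5
    return 6
```

Automating lookups like this one across all score components is exactly what spares physicians the hand calculation described in the abstract.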
Session 3455-2019
As SAS® continues to push boundaries with its cloud-based analytics ecosystem, SAS® Viya®, it continues to break new ground as well! With a new initiative to become more open to developers via a robust API, and current integration with the Python package known as SWAT (Scripting Wrapper for Analytics Transfer), there are opportunities to take your in-house data science initiatives to a higher level. This session looks at incorporating open-source graphing techniques, specifically Python's matplotlib integrated with the popular D3 visualization framework, to generate interactive plots that can spur discovery of the story your data is trying to tell you. We work through some traditional statistical programming examples via calls in Jupyter Notebook to SAS Viya. Within Jupyter, we convert our static graphs into dynamic graphs using mpld3, an open-source Python library that marries D3 to Python. Finally, we demonstrate moving our sample code into a Python microservice.
Joseph T Indelicato, SAS
Session 4038-2019
Currently, tumor biopsies do not provide doctors with all the possible information that could be used in determining a treatment plan for patients with cancer. The reason for this is that biopsies remove only a very small part of the tumor. However, there are certain rare cells (those that have a very large impact on determining key facts like how aggressive the cancer is and how fast it will spread and grow) that exist scattered around the tumor, but because of the biopsy's small size and lack of targeting, it often does not pick these up. Once a data set is formulated with different data points representing different cells within the tumor as a whole, clustering methods and related algorithms available in SAS can be used to find the groups of data points with the greatest variation. This shows the area of the tumor that contains the greatest variety in cell types, which is where a surgeon would want to aim during a biopsy in order to get the most accurate representation of the tumor and therefore create the most accurate prognosis and treatment plan. This model can then be applied to tumors of different cancers, sizes, and stages. This presentation demonstrates the use of clustering procedures available in SAS to identify a biopsy sample with maximum variation in cell type. This work is an extension of research that the author did during her internship at Stanford in the summer of 2017.
Richa Sehgal, Northview High School
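A minimal stand-in for the selection step (not the presentation's clustering code): given per-region cell measurements, pick the region whose values vary most, on the premise that a high-variance region captures the widest mix of cell types:

```python
from statistics import pvariance

def most_varied_region(regions):
    """Return the tumor region whose cell measurements vary most.

    `regions` maps a region name to a list of one per-cell feature
    value (names and feature are hypothetical).  The clustering
    procedures in the presentation are far richer; this sketch only
    conveys the idea of targeting the biopsy at maximum variation.
    """
    return max(regions, key=lambda r: pvariance(regions[r]))
```

With one homogeneous region and one heterogeneous region, the function points the biopsy at the heterogeneous one.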
Session 2981-2019
Within the marketing research industry, Total Unduplicated Reach and Frequency (TURF) analysis has become an increasingly popular technique used to determine which combination of products will appeal to the greatest number of consumers. For companies that rely on optimal product assortment to help drive profitability, the output from a well-designed TURF analysis is critical for understanding product cannibalization and to evaluate the tradeoffs associated with adding or removing specific products. Conventional approaches for TURF analyses have involved calculating all possible product combinations, only to then recommend the one optimal solution or the few near-optimal solutions. This exhaustive approach is computationally inefficient and does not scale to commercial-sized problems, which can often involve dozens of products, thousands of consumers, and tens of millions of product combinations. Other approaches using the greedy heuristic fail to guarantee an optimal solution. A more accurate and commercially viable approach to TURF analysis can instead be constructed as a mixed-integer linear programming (MILP) problem using SAS/OR® software. This paper details the modeling approach, data requirements, desired output, and scalability considerations. A detailed example along with sample code using SAS/OR is provided. This paper introduces both the business problem and the analytical solution, so anyone with a background in retail analytics or market research can use this approach.
Jay Laramore, SAS Institute
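For contrast with the MILP formulation in the paper, here is the greedy TURF heuristic the abstract mentions, sketched in Python; it is fast, but as the abstract notes it does not guarantee the optimal assortment, which is what motivates the SAS/OR MILP approach:

```python
def greedy_turf(appeal, k):
    """Greedy TURF: repeatedly add the product that reaches the most
    not-yet-reached consumers.

    `appeal` maps product -> set of consumer ids who would buy it.
    Returns (chosen products, total unduplicated reach).  Ties are
    broken alphabetically for determinism.
    """
    chosen, reached = [], set()
    candidates = sorted(appeal)  # sorted list for deterministic tie-breaking
    for _ in range(min(k, len(candidates))):
        best = max(candidates, key=lambda p: len(appeal[p] - reached))
        chosen.append(best)
        reached |= appeal[best]
        candidates.remove(best)
    return chosen, len(reached)
```

On a toy problem with products A = {1,2,3}, B = {3,4}, C = {5,6} and k = 2, the heuristic picks A then C for a reach of 5 consumers; on adversarial instances, the MILP can beat the greedy choice.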
Session 3865-2019
This paper discusses the orchestration of SAS® data integration processes based on the arrival of the SAS data integration input files in Amazon Web Services (AWS) S3. Our client runs a daily process where they generate credit statements for their customers. Each customer receives their statement once a month. Every day, around 200K customers are processed, eventually reaching the entire customer base of roughly 6 million in a month. The process starts in their on-premises datacenter, is followed by the APR calculations in SAS data integration in AWS, and finally culminates with the generation of the statements in the on-premises datacenter. The entire architecture is built using microservices; small, independent, and highly decoupled components such as AWS Lambda, Simple Notification Service (SNS), Amazon Simple Storage Service (Amazon S3), and Amazon Elastic Compute Cloud (EC2). This makes it easier to troubleshoot any issues in the data pipeline. The choice of using a Lambda function for the orchestration adds a certain complexity to the process. However, it also provides the most stability, security, flexibility, and reliability for an enterprise architecture. There are simpler alternatives like S3Fs and CloudWatch SSM, but they do not fit well in an enterprise architecture.
Prasanth Reddy and
Rohit Shetty, CoreCompete
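A minimal sketch of the Lambda entry point for such an orchestration, assuming the standard S3 event-notification payload; the downstream call into the SAS data integration flow (via SNS or an EC2-hosted job, per the abstract) is omitted, and the file-name rules are hypothetical:

```python
import urllib.parse

def lambda_handler(event, context):
    """Pull the bucket and key out of an S3 put-notification event and
    decide whether the arriving file should trigger the SAS data
    integration job.  The prefix/suffix checks are illustrative;
    the real function would invoke the downstream flow via boto3.
    """
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # S3 URL-encodes keys in event payloads, so decode before matching
    key = urllib.parse.unquote_plus(record["object"]["key"])
    should_run = key.startswith("statements/") and key.endswith(".csv")
    return {"bucket": bucket, "key": key, "trigger_di_job": should_run}
```

Keeping the handler to pure event parsing plus one decision is what makes the pipeline easy to troubleshoot: each microservice does one small, testable thing.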
Session 3585-2019
RTI International made the move from PC- and server-based SAS® to SAS® Grid Computing in 2016. The decision to move to this environment was based on cost over time, scalability, room for growth, centralization of SAS administration, and improved performance. Learn how the transition occurred, including successfully transitioning more than 400 SAS users across multiple environments within a 12-month period. The process included gaining support from business units and IT, identifying an implementation partner, forming stakeholder teams, gathering requirements, and designing the system. A plan and timeline were established for communication, implementation, training, and rollout. Along the way, we addressed challenges in our environment, Federal Information Processing Standard requirements, user response to the transition, data housed at regional offices, and legacy code that required modification.
Annette Green, RTI International
Session 4600-2019
Each month it seems there is a new technology introduced that can transform your organization in unimaginable ways. Yet despite the rapid growth in analytic solutions, a recent Gartner survey revealed that almost 75% of organizations thought their analytics maturity had not reached a level that optimized business outcomes. Clearly the hype is not matching the reality. Just as with any endeavor, your organization must have a planned strategy to achieve its analytical goals. Even the best-intended projects die on the vine if not given the proper support. In this paper, you learn how to determine your organization's current analytic maturity level, ways of overcoming common blockers, and elements used in successful analytics adoption strategies. Our team has assisted multiple organizations as they transform from simple reporting to data optimization. Using this information, you can lead your organization in creating a successful roadmap that is customized to ensure success.
Tricia Aanderud,
Reid Baughman,
Ken Matz, and
Chris St. Jeor, Zencos
Session 3555-2019
Across America, prisons hold the parents of over a million children (Bureau of Justice Statistics, 2008). Nationally, prisons held approximately 744,200 fathers and 65,600 mothers (Glaze & Maruschak, 2008). Archival data (Survey of Inmates in State and Federal Correctional Facilities, 2004) was utilized to conduct a correspondence analysis inquiry regarding inmates and their minor children. Correspondence analysis (CA) shows how data deviate from expectation (observed values versus expected values) when the row and column variables are independent (Friendly, 1991; Dickinson & Hall, 2008). Correspondence analysis creates a two-dimensional visual display of observed data variation, which can be utilized for examination of variable behaviors (Wheater et al., 2003). SAS® code was written to invoke the CORRESP procedure. Variables of interest included self-reported gender, ethnicity, percentages of minor children, and their associated caregivers. Caregiver refers to the person responsible for the minor child while the parent was incarcerated: the non-incarcerated parent, grandparents, other relatives, or foster care. The resultant CA output includes a table of associated values and the CA graphical displays generated. This presentation highlights the versatility of SAS for investigating social phenomena variables within large federal data sets and generates empirical inquiry to visualize the social magnitude of parents in prison.
Wendy Dickinson, University of South Florida
Session 3589-2019
SAS® Viya® 3.4 is a new analytics architecture from SAS, based on the SAS® Cloud Analytic Services (CAS) in-memory engine. SAS Viya 3.4 changes the fundamental methodology of installing SAS by moving away from SAS Software Depot and toward industry-standard software deployment methods. This paper compares and contrasts SAS 9.4 with SAS Viya in several key areas for administrators, including pre-installation requirements, installation processes, administration tools and methods, and data source connectivity, including library definitions. The paper also discusses upgrade and migration planning.
Don Hayes, DLL Consulting Inc.
Michael Shealy, Cached Consulting, LLC
Session 3236-2019
Model Studio in SAS® Visual Data Mining and Machine Learning provides a pipeline-centric, collaborative modeling environment that enables you to chain together steps (into a pipeline) for preprocessing your data, making predictions by using supervised learning algorithms, and then using the assessment measure of your choice to compare models. After the pipeline has determined the champion model, you can deploy that model to score new data. Since the first release of SAS Visual Data Mining and Machine Learning, many enhancements have been made to Model Studio, from nodes for running code (open-source code, batch code from SAS® Enterprise Miner (TM), or score code that is generated outside Model Studio) to more integration with other visual environments to allow for seamless exploration and visualization of your data as you build models. Although it is hard to limit the number of our favorite features, we present our top 10 features in Model Studio in SAS Visual Data Mining and Machine Learning 8.3 for relieving your biggest pains and increasing your productivity.
Wendy Czika and
Peter Gerakios, SAS
Session 3388-2019
Making sense of a large amount of data is one of the most important aspects of a reporting system. Reporting helps you and others in your organization discover important insights into trends, business strengths and weaknesses, and the overall health of a company. Therefore, report output should be in a format that anyone can understand easily. To create such output, you need to use the correct reporting tools. This paper, written for data analysts, discusses techniques to power up (amplify) the effectiveness of your reporting. These techniques use SAS® Output Delivery System (ODS) destinations (especially the ODS Excel destination) to generate functional, presentation-ready Microsoft Excel worksheets. The discussion also explains how to use the ODS destinations to enhance web pages and other types of documents. Finally, the paper explains how you can use Python open-source software with the SAS® System and ODS destinations to further enhance your reporting.
Chevell Parker, SAS
Session 3879-2019
One of the challenges that business users face is handling data of great variety and size and processing it to extract information for making business decisions. SAS® Enterprise Guide® can easily handle complex data files in different file formats coming from various data sources. SAS® Visual Analytics, with its rich visuals, can be used to develop business-friendly dashboards. Have you ever felt the need to publish or update comments and insights while analyzing the results in a dashboard so that they are visible to others in real time? SAS Visual Analytics has a unique capability to address this use case using SAS stored processes that interact with external URLs. This paper explains how a user can enter comments and insights and publish them in real time in a SAS Visual Analytics dashboard. This paper also describes how to make each record in a SAS Visual Analytics list table object interactive by linking it to an external URL: each record links to a PDF dashboard [created with Microsoft Excel Visual Basic for Applications (VBA)] residing on the server, which can be downloaded by clicking the record. It also includes a process for sending customized emails to end users using SAS Enterprise Guide. These emails are customized by subject, body, and attachments (PDF dashboards).
Balraj Pitlola,
Lokendra Devangan,
Venkata Karimiddela, and
Malleswara Sastry Kanduri, Core Compete
Session 3427-2019
While it might seem as if you need to be an artist to create the kinds of beautiful, interesting, interactive visualizations you see on many commercial websites, you don't. All you need is a basic understanding of how HTML5 works and how human beings process visual information. In this paper, we provide guidelines for using SAS® Visual Analytics to create websites that are responsive and reactive. Responsive design relies on HTML5 technologies to dynamically adjust to the screen size and orientation of a web-based device. This enables websites to work well on many different devices, but it can cause problems. We present guidelines that reduce trial-and-error testing and describe common responsive design issues (resized legends, lost HTML tags, nonstandard fonts, and more). We show how to easily test the responsiveness of a report by using web developer views built into Google Chrome and Mozilla Firefox, and we provide warnings for some known issues with different browsers. Reactive design focuses on how a website responds to users' interactions, in particular the speed and sensibility of the response to human input. We describe how to implement a reactive design, creating a smooth workflow of finger swipes or mouse clicks, how to use white space and negative space to draw users' attention, how to use color in headers and graphs to associate related content, and other tips for enabling users to maintain a mental map of dynamic content, quickly accessing the information they need to know.
Elliot Inman,
Olivia Wright, and
Mark Malek, SAS
Session 3018-2019
Logistic regression modeling can result in perfect prediction of the responses in your training data, which unfortunately can correspond in practice to very imperfect prediction of the responses in a new data set. This imperfect prediction is often due to complete separation of your data, which means that there is a way to fit the model so that all the events have high predicted probabilities and all the nonevents have low predicted probabilities. In particular, there is a region that separates your event data from your nonevent data and that contains no observations. This dead zone has no information about your model, and hence you cannot evaluate the correctness of any predictions inside this region. This paper illustrates the dead zone and provides methods to find and measure it.
Robert Derr, SAS
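For a single predictor, complete separation and its dead zone are easy to check directly; a small illustrative function (not from the paper):

```python
def separation_gap(x, y):
    """Detect complete separation for a single predictor.

    If every event (y=1) has a larger x than every nonevent (y=0),
    the open interval between them is the 'dead zone' the paper
    describes: no training data falls there, so predictions inside
    it are unsupported.  Returns (lo, hi) of the gap, or None when
    the event and nonevent values overlap.
    """
    ev = [xi for xi, yi in zip(x, y) if yi == 1]
    non = [xi for xi, yi in zip(x, y) if yi == 0]
    if ev and non and min(ev) > max(non):
        return (max(non), min(ev))
    return None
```

With nonevents at x = 1, 2 and events at x = 5, 6, the gap (2, 5) is exactly the region where the logistic fit can place the decision boundary anywhere, which is why the maximum likelihood estimates diverge; the paper generalizes this idea to measure the dead zone for multivariate models.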
Session 3107-2019
Hospital-acquired conditions (HACs) are health-care complications that occur during inpatient hospital stays. Given that HACs can potentially be avoided if hospitals follow standard clinical evidence-based guidelines, Medicaid and Medicare reduce reimbursement for these conditions. This paper demonstrates how the LOGISTIC procedure can identify patients at high risk of developing HACs during an inpatient stay. PROC LOGISTIC predicts the probability of an outcome given a set of covariates by using logistic regression models and maximum likelihood estimation. Features in the procedure enable users to select covariates associated with the highest probability of having an event. This paper demonstrates how to use PROC LOGISTIC to identify patients at an elevated risk of HACs based on patients' prior medical history. Steps include: 1) identifying patient cohorts (training and test cohorts); 2) defining the HAC outcome event; 3) selecting covariates associated with developing an HAC; 4) identifying the best-fit model in the training cohort; 5) examining model fit in the test cohort; and 6) assigning patients a HAC risk score. Results can be used to flag patients at high risk of developing HACs and to improve a hospital's overall health-care quality. Risk scores applied to electronic medical data can be used to prevent HACs in real time upon patient admission.
Jennifer Hargrove and
Pamela Hipp, SAS
Session 3878-2019
More than 50% of startup companies fail within the first four years. Further, three out of every four venture-backed firms fail. The algorithm proposed in this paper can help to predict the success of a startup company based on financial and managerial variables. This prediction can help investors to get an idea of whether an investment in a startup will be successful. Apart from implementing a model consisting of all the factors mentioned below and predicting the success of a startup company, various other models were created representing various milestones achieved by the company. This paper can also help startup companies to know which factors are essential for getting an investment. The algorithm is based on data from more than 15,000 companies collected from crunchbase.com. The financial variables include: investments in each funding round; valuation after each round of funding; current market value; total funds; investments and acquisitions by the company; and the financial backgrounds of key people. The managerial variables include: number of employees; competitors; location; age of the company; founders' backgrounds; burn rate; and various news articles about the company scraped from the internet. A variety of methods is used to determine the best model, such as random forests, text parsing, logistic regression, decision trees, and survival analysis.
Vrushank Shah and
Miriam McGaugh, Oklahoma State University
Session 3877-2019
In today's data-driven world, unprecedented information related to product reliability and field quality can be furnished through warranty claims data. Predictive warranty analytics helps in unearthing potential problems early on and identifying emerging issues before they become huge, costly problems, and enables the initiation of the problem-solving process months in advance. The tools and techniques used in warranty analytics assist quality management teams in decision-making and in extracting meaningful information by analyzing structured as well as unstructured warranty data with statistical techniques. SAS® Field Quality Analytics, a suite for quality analytics, assesses data from warranty, customer service, product, and other relevant sources to detect emerging issues sooner and provide actionable insights. This paper showcases how SAS Field Quality Analytics can be used to discover hidden relationships, patterns, and future trends of product performance, claim rates, customer views, and so on. Data from various sources is manipulated to bring it into the desired format using Base SAS® and advanced coding. A combination of techniques was used, including Enterprise Analytic Early Warning (for emerging-issue identification), reliability analysis using the Weibull distribution (for claim cost and frequency forecasting), text mining on comments, and inferential analysis. OEMs can benefit from these analyses to support strategic decisions such as vehicle recalls and to maintain customer loyalty.
Saket Shubham,
Prateek Singh, and
Shiv Kumar, Mahindra & Mahindra (BristleCone)
Session 3268-2019
Working in different cultures results in a wide variety of data presentation among different regions and languages. If you have trouble showing data that complies with local culture or customer habit, SAS® National Language Support (NLS) formats are here for you! This paper shows you the power of NLS formats to translate SAS® output into meaningful results for users anywhere in the world. This paper demonstrates how using NLS formats in typical user scenarios gives you great usability and flexibility for your data presentation.
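As a small illustration of the idea (the locale and values are chosen arbitrarily), the same value renders according to the active locale when an NLS format is applied:

```sas
options locale=de_DE;            /* switch to German conventions */
data _null_;
   d   = '15MAR2019'd;
   amt = 1234.56;
   put d nldate20.;              /* locale-sensitive date text  */
   put amt nlmny15.2;            /* locale-sensitive currency   */
run;
```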
Yingying Zhao, SAS
Session 3566-2019
DS2 is a programming language, available through the DS2 procedure, that brings the concepts of methods, modularity, and encapsulation to SAS® programming. A DS2 program encapsulates variables and code. DS2 is included with Base SAS® and is based on the DATA step language. DS2 provides new data types that make it possible to have better precision for data and better interaction with external databases. DS2 includes three separate, but related, system-defined methods that run in order: INIT, RUN, and TERM. DS2 also enables SAS practitioners to define their own custom methods, which are called user-defined methods. This paper concentrates on how SAS programmers can use PROC DS2 to write modularized programs that result in easier maintenance. It introduces the basic programming instructions (syntax) associated with the construction and use of PROC DS2.
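A minimal DS2 sketch showing the three system-defined methods and a user-defined method wrapped in a package:

```sas
proc ds2;
   package mathpkg / overwrite=yes;
      method square(double x) returns double;   /* user-defined method */
         return x*x;
      end;
   endpackage;

   data _null_;
      dcl package mathpkg m();
      method init();                 /* runs once, first        */
         put 'starting';
      end;
      method run();                  /* runs once per iteration */
         dcl double y;
         y = m.square(4.0);
         put y=;
      end;
      method term();                 /* runs once, last         */
         put 'done';
      end;
   enddata;
   run;
quit;
```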
Dari Mazloom, USAA
Session 3704-2019
Managing a large volume of diagnosis and procedure codes is a common task in health services research that uses administrative data. One such collection of procedure codes is provided by the Centers for Disease Control and Prevention's National Healthcare Safety Network (NHSN), which tracks 39 unique categories of surgical procedures for the purposes of healthcare-associated infection surveillance. To identify these surgical categories in a data set, a simple programming method would be to write 39 IF-ELSE IF statements referencing a single variable containing procedure codes. The issue with this approach is that there are over 1,100 Current Procedural Terminology (CPT) codes in the most recent NHSN release. The authors' approach to reducing the written text in a program is to use a macro to write IF statements for the 39 variables using a user-defined function. The function searches the entered CPT code in a hash table holding the NHSN-specific CPT codes. The major benefits of this approach are reducing data entry errors that could be introduced when manually entering procedure codes into the program, as well as making a more concise program by mimicking an attribute table. In this case, the table holds the instructions for creating a variable instead of, for example, values for formatting a figure. Additionally, the process is scalable and has been applied to many similar challenges in our research.
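The lookup idea can be sketched with a hash object; the table and variable names here are hypothetical:

```sas
/* Load the NHSN CPT reference table once, then classify each claim
   with a single lookup instead of many IF-ELSE IF tests. */
data flagged;
   if _n_ = 1 then do;
      if 0 then set ref.nhsn_cpt;            /* declare host variables  */
      declare hash h(dataset:'ref.nhsn_cpt');
      h.defineKey('cpt_code');
      h.defineData('nhsn_category');
      h.defineDone();
   end;
   set claims;
   if h.find() ne 0 then call missing(nhsn_category);
run;
```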
Matthew Keller,
Katelin Nickel, and
Dustin Stwalley, Washington University School of Medicine in St. Louis
Session 3144-2019
The paper explains how one can best use the information stored in a history.log file that contains information about the jobs running in process manager. A SAS® program can be written to read the history.log file to identify failed jobs, long-running jobs, CPU consumption of the jobs, execution time of jobs, and so on.
Pradeep Ponnusamy, Royal Bank of Scotland
Session 3481-2019
SAS® Viya® is used for enterprise-class systems,
and customers expect a reliable system. Highly available deployments are a
key goal for SAS Viya. This paper addresses SAS Viya high-availability
considerations through different phases of the SAS® software
life cycle. After an introduction to SAS Viya, design principles, and
intra-service communication mechanisms, we present how to plan and design
your SAS Viya environment for high availability. We also describe how to
install and administer a highly available environment. Finally, we examine
what happens when services fail and how to recover.
Edoardo Riva, SAS
Session 3096-2019
Monte Carlo simulation can be used in SAS® Model Implementation Platform to estimate the average cash flow from a loan under one or more sets of economic assumptions. Some users want an independent simulation for each scenario, while others might want to use the same set of random numbers for each scenario. If you do not set up the random numbers in a specific way, then you might not get the numbers that you are expecting. SAS® 9.4M5 introduced new pseudo-random number generators (PRNGs) and new subroutines that enable you to manipulate multiple pseudo-random number streams. These enhancements enable SAS Model Implementation Platform to generate pseudo-random number streams that are well suited to a wide variety of needs. This paper describes techniques for using these features correctly, according to your specific purpose.
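A small sketch of the stream controls mentioned above (the seed and loop counts are arbitrary):

```sas
/* Each scenario draws from its own independent substream; re-selecting
   the same stream number under the same seed reproduces its draws. */
data shocks;
   call streaminit('PCG', 20190428);   /* new-style PRNGs: SAS 9.4M5+      */
   do scenario = 1 to 3;
      call stream(scenario);           /* switch to this scenario's stream */
      do t = 1 to 5;
         shock = rand('normal');
         output;
      end;
   end;
run;
```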
Joel Dood, SAS
Session 3807-2019
How do you help a community of SAS® users become more involved in the software and learn new ideas without providing direct training? We decided to answer this question by designing a series of puzzles that require the solutions to be run in SAS. This enabled us to showcase SAS features that were unused by many of our colleagues, and at the same time promote a critical-thinking mindset when solving the puzzles. Whether it's "X marks the spot" using office locations and mapping data, solving a Rubik's cube using the SGPLOT procedure, or a data set-based crossword using SAS procedure names, each puzzle requires a different technique to solve and hopefully gets users applying some newly acquired knowledge within their day-to-day work.
Amit Patel, Barclays
Lewis Mitchell, Barclays
Session 4005-2019
Programming with text strings or patterns in SAS® can be complicated without knowledge of Perl regular expressions. Just knowing the basics of regular expressions (the PRX functions) will sharpen anyone's programming skills. Having attended a few SAS conferences lately, I have noticed, first, that there are very few presentations on this topic and, second, that even experienced presenters either are not familiar with it or are not aware of how powerful the PRX functions can be in SAS. In this presentation, I present quick tips that anyone can learn very quickly.
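Two such tips fit in a few lines; the text value is made up for illustration:

```sas
data _null_;
   text = 'Visit date: 2019-04-28';
   /* PRXMATCH: does the string contain an ISO date? */
   if prxmatch('/\d{4}-\d{2}-\d{2}/', text) then put 'date found';
   /* PRXCHANGE: capture groups reorder it to DD.MM.YYYY */
   newtext = prxchange('s/(\d{4})-(\d{2})-(\d{2})/$3.$2.$1/', -1, text);
   put newtext=;
run;
```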
Pratap Kunwar, The EMMES Corporation
Session 3857-2019
Passenger comfort, optimal utilization of rolling stock, and proper allocation of operational resources are the essential aspects of efficient management for any rail infrastructure. In this paper, we address these scheduling problems using modern SAS® technologies and SAS/OR® software: 1) minimize the travel time of all the trains on a route by reducing the number of stops; and 2) optimally allocate passenger seats between every pair of stations in each train. We also provide a solution to reduce the number of trains on each route while continuing to meet existing customer demand. Mixed integer linear programming models were used to meet the two scheduling objectives. The computational results, with a real-life study of the Indian rail network, show a significant reduction in the number of stops and trains while satisfying the given demand. Learn how this solution successfully solves the optimization problems using SAS® Inventory Optimization and SAS/OR software.
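A drastically simplified covering model hints at the formulation style; the sets, demand, and objective here are illustrative only, not the paper's actual model:

```sas
proc optmodel;
   set TRAINS   = 1..5;
   set STATIONS = 1..8;
   num demand {STATIONS} = 2;              /* stops required per station */
   var stop {TRAINS, STATIONS} binary;     /* 1 if train t stops at s    */
   min TotalStops = sum {t in TRAINS, s in STATIONS} stop[t,s];
   con Cover {s in STATIONS}:
      sum {t in TRAINS} stop[t,s] >= demand[s];
   solve with milp;
   print stop;
quit;
```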
Malleswara Sastry Kanduri,
Ayush Tiwari, and
Lokendra Devangan, Core Compete
Session 3101-2019
Bitmap lookup was introduced into SAS® programming in 2000. Yet, its virtues started to gradually fade from the minds of SAS programmers after the advent of the SAS hash object a few years later. One reason is that the hash object requires no custom coding. The other is that its table lookup functionality seems to supplant the utility of bitmapping. However, this impression is false. In its niche of searching against a massive number of limited-range integer keys, a properly constructed and accessed bitmap is both much faster and has a much smaller memory footprint. In this paper, we show programming methods of allocating, mapping, and searching bitmaps based on both character and numeric arrays, and discuss their relative pros and cons in terms of lookup speed and memory usage.
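A byte-map variant of the idea (one character per key rather than one bit) shows the mechanics with the least code; the key ranges are arbitrary:

```sas
data _null_;
   array bm[4] $200 _temporary_;           /* covers keys 1..800          */
   do key = 3, 700, 799;                   /* mark keys as present        */
      substr(bm[ceil(key/200)], mod(key-1,200)+1, 1) = '1';
   end;
   do key = 3, 4, 700;                     /* O(1) membership probes      */
      hit = (substr(bm[ceil(key/200)], mod(key-1,200)+1, 1) = '1');
      put key= hit=;
   end;
run;
```

A true bitmap packs eight keys per byte with bit arithmetic, at the cost of slightly more code per probe.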
Paul Dorfman, Dorfman Consulting
Lessia Shajenko, Bank of America
Session 3361-2019
Regulatory agencies and pharmaceutical companies use real-world evidence (RWE) to generate clinical evidence derived from real-world data (RWD) for routine regulatory drug review and to monitor the usage and potential benefits or risks associated with a medical product in real-world settings. SAS® Real World Evidence is a visual RWE and visual analytics platform that enables quick discovery and creation of patient cohorts for population health analytics. We used SAS Real World Evidence to create index event cohorts to perform unsupervised and supervised signal detection analyses involving stroke events and atypical/typical antipsychotic medications. Use cases that show how SAS Real World Evidence enables intersection and application of RWE with population health analytics are provided. A population-level estimation example that used SAS® causal estimation and propensity score matching procedures to examine the association between antipsychotic drugs and stroke risk is presented in the paper.
David Olaleye and
Lina Clover, SAS
Session 3108-2019
For years, it's been common knowledge that you should normalize your data to maximize your storage space. But, of course, this involves creating and maintaining reference tables. What if the tables become out of date? Am I just causing more problems by introducing complexity like this? In this paper, I illustrate a method of automating the normalization of tables to take the work and maintenance out of the equation. I also introduce an alternative to normalization available in Teradata and see how it stacks up. Using the SAS® macro facility and Teradata SQL, I walk you through an example-driven discussion of these concepts.
Darryl Prebble, Prebble Consulting Inc.
Session 3677-2019
The efficient use of space can be very important when working with large SAS® data sets, many of which have millions of observations and hundreds of variables. We are often constrained to fit the data sets into a fixed amount of available space. Many SAS data sets are created by importing Microsoft Excel or Oracle data sets or delimited text files into SAS, and the default length of the variables in the SAS data sets can be much larger than necessary. When the data sets do not fit into the available space, we sometimes need to make choices about which variables and observations to keep, which files to zip, and which data sets to delete and re-create later. There are things that we can do to make the SAS data sets more compact and thus use our space more efficiently. These things can be done in a way that enables us to keep all the desired data sets without sacrificing any variables or observations. SAS has compression algorithms that can be used to shrink the space of the entire data set. In addition, there are tests that we can run that enable us to shrink the length of different variables and evaluate whether they are more efficiently stored as numeric or as character variables. These techniques often save a significant amount of space; sometimes as much as 90% of the original space is recouped. We can use macros so that data sets with large numbers of variables can have their space reduced by applying the above tests to all the variables in an automated fashion.
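Two of the techniques can be sketched briefly; the library, table, and variable names are placeholders:

```sas
/* 1) Data-set-level compression. */
options compress=binary;            /* or COMPRESS=CHAR for mostly-text data */

/* 2) Right-size a character variable to its longest actual value. */
proc sql noprint;
   select max(length(long_var)) into :maxlen trimmed from big.master;
quit;
data big.master_small;
   length long_var $&maxlen;
   set big.master;
run;
```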
Stephen Sloan, Accenture
Session 3109-2019
Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables are linearly related or codependent. The presence of this phenomenon can have a negative impact on an analysis as a whole. It can severely limit the conclusions of a research study. In this paper, we briefly review how to detect multicollinearity. Once it is detected, we determine which regularization techniques are the most appropriate to combat it. The nuances and assumptions of L1 regularization (the lasso), L2 regularization (ridge regression), and elastic nets are covered to provide an adequate background for appropriate analytic implementation. This paper is intended for any level of SAS® user. Although it is written with an audience grounded in theoretical and applied statistics in mind, the information is presented in such a way that a user with any level of statistical or mathematical knowledge will be able to understand the content.
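One common SAS workflow for this: check variance inflation factors, then fit a penalized model. The data set and variables are hypothetical:

```sas
/* Detect: VIFs well above 10 suggest collinear predictors. */
proc reg data=mydata;
   model y = x1-x10 / vif collin;
run; quit;

/* Remedy: lasso (L1) via GLMSELECT; SELECTION=ELASTICNET is also available. */
proc glmselect data=mydata;
   model y = x1-x10 / selection=lasso(stop=none choose=sbc);
run;
```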
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Session 3113-2019
Making HTTP requests to interface with a REST API is a common necessity in modern programming. Doing this from within a SAS® program can be cumbersome. Making a simple get request requires the user to write header and body files, parse any URL parameters, and pass this information along to PROC HTTP. Authentication further complicates the problem. Once requests are made, the response needs to be somehow read in a way that is useful to the SAS programmer. This process often requires that the user be able to parse JavaScript Object Notation (JSON) strings to extract data before storing them in a SAS data set or local database. In this paper, we show how users can create Lua modules that simplify these tasks and, by using PROC LUA, how you can use these modules to easily interface with a REST API from SAS. We also show how debugging functionality can be worked into these modules and used to troubleshoot request errors. As specific examples, we demonstrate how to retrieve online climate data from the National Oceanic and Atmospheric Administration and how to interact with SAS® Risk and Finance Workbench programmatically.
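The baseline that the Lua modules improve on looks roughly like this; the URL and token are placeholders:

```sas
filename resp temp;
proc http
   url="https://api.example.com/v1/observations?start=2019-01-01"
   method="GET"
   out=resp;
   headers "Authorization"="Bearer &token"
           "Accept"="application/json";
run;

/* The JSON libname engine maps the reply to tables. */
libname jsn json fileref=resp;
proc print data=jsn.alldata(obs=10); run;   /* ALLDATA is always created */
```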
Steven Major, SAS
Session 3695-2019
A teacher's assessment or judgment of an individual student's performance can impact other educators' expectations of that student's ability as well as the student's future academic placement. Exploring the relationship between teacher judgment and student performance in primary education is critical, as early barriers can evolve into significant academic hurdles for individual students at the middle school and high school levels. Correlation analysis and regression models have been used to analyze the longitudinal and cross-sectional relationship between students' achievements and teacher judgment in reading and mathematics across grades and years, considering students' demographics. SAS® procedures (such as PROC CORR and PROC MIXED) were used to explore a large data set from the North Carolina Education Research Data Center (NCERDC), which includes 6,511,741 students in 3rd to 8th grades from 2006 to 2013. The data set includes information such as students' End-of-Grade (EOG) test scores, demographic characteristics, and evidence of their academic performance in each grade and year. SAS provides an effective tool to explore this data set with accessible and easy-to-use analysis approaches. Results demonstrate moderate to high correlations, which are significantly higher for male students and significantly lower for minority ethnic groups. The regression models reveal that students' gender, ethnicity, and previous-grade performance significantly affect their EOG achievement scores.
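The two procedures named above might be invoked along these lines; the variable names are invented for illustration:

```sas
proc corr data=ncerdc;
   var eog_score teacher_rating;        /* judgment vs. achievement */
run;

proc mixed data=ncerdc;
   class student_id gender ethnicity;
   model eog_score = teacher_rating gender ethnicity prior_score;
   random intercept / subject=student_id;   /* repeated grades per student */
run;
```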
Julie Ivy and
Sareh Meshkinfam, North Carolina State University
Amy Reamer, University of North Carolina Wilmington
Session 3264-2019
Experiments answer specific questions by design. In order for the results to have a reasonable amount of certainty and statistical validity, a sufficient number of observations is required. To achieve this, the concepts of type I error, power, and sample size are introduced as parameters of the initial experimental design. Real-world constraints such as budget and feasibility change over time. Thus, statisticians often re-evaluate sample size under varying sets of assumptions, sometimes even on the fly. SAS® Studio includes a variety of point-and-click tasks for determining sample sizes from entered assumptions. In addition, the underlying SAS® code can be saved to facilitate code refinement or for reuse at a later time. The purpose of this Hands-On Workshop is to introduce some of the features of SAS Studio for sample size determination. A number of examples are presented, including tests associated with proportions, means, and survival analysis. Each exercise starts with a research question, proposed methodology, and list of requirements needed for estimating the sample size. The attendees then have the opportunity to work through the exercise using SAS Studio and to discuss adaptations that might be necessary for its use in their everyday programming environment.
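The code that SAS Studio saves from such a task is ordinary PROC POWER syntax; for example, a two-sample proportion comparison (the assumed rates are illustrative):

```sas
proc power;
   twosamplefreq test=pchi
      groupproportions = (0.30 0.45)   /* assumed event rates     */
      alpha            = 0.05          /* type I error            */
      power            = 0.80
      npergroup        = .;            /* solve for n per group   */
run;
```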
William Coar, Axio Research
Session 3007-2019
Containers are a popular new idea for building and deploying applications. At SAS® Global Forum 2018, I introduced a recipe for building a Docker container from a SAS® Viya® order. The container enables you to stand up an editing environment that accepts only SAS® programming and to experiment with SAS Viya and related SAS® Cloud Analytic Services (CAS) actions from SAS code or from a Jupyter (Python) notebook. This talk presents a progress report: pre-built containers from SAS, examples for different SAS access engines, and more complicated deployment patterns on Kubernetes.
Paul Kent, SAS
Session 3057-2019
Whether you work with Java, C#, or web development technologies, source control plays an important role in software development. SAS® is no different. These days the front-runner in the source control world is Git. Git is a widely used distributed source control system with one remote repository hosted somewhere such as GitHub and local repositories on each user's computer. This paper provides a look at the Git functions introduced with SAS® 9.4M6. Highlights include: an introduction to the functions and why they were developed; a glimpse into the user interfaces of SAS® Studio and SAS® Enterprise Guide®, which use the functions; a workflow scenario with usage examples for each function; and a functions section that goes over each individual function.
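A hedged sketch of the DATA step Git functions as introduced in SAS 9.4M6 (verify the exact signatures against your release's documentation; the repository URL and local path are placeholders):

```sas
data _null_;
   /* clone a remote repository to a local directory */
   rc = gitfn_clone("https://github.com/org/repo.git", "/tmp/repo");
   put rc=;                         /* 0 indicates success          */
   /* count the entries reported by git status for that clone */
   n = gitfn_status("/tmp/repo");
   put n=;
run;
```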
Danny Zimmerman, SAS
Session 3137-2019
Kerberos is a network authentication protocol commonly found in medium and large enterprises. It uses strong cryptography to prove the identities of users, hosts, and services. SAS® has long been able to integrate with Kerberos to not only support its authentication strengths, but to enable user-friendly features such as Integrated Windows Authentication (IWA). SAS delegates some key configuration and operational tasks to the Kerberos realm and operating system layers and their respective administrators. These include Kerberos principal configurations, ticket requests, ticket cache management, user authentication, and more. For this reason, it can be difficult for SAS administrators to fully see and understand the entire SAS and Kerberos integration picture. We outline the configuration of a real-world proof-of-concept environment, which includes SAS® Viya® and Hadoop over CentOS, integrated with Microsoft Active Directory Kerberos authentication, and with support for IWA from the user's desktop. With a deep understanding of how this specific environment is configured step by step for Kerberos integration from the OS layer up through the applications, SAS administrators should be able to better understand deployment and integration options in their own corporate environment. They should be able to improve their troubleshooting capabilities in existing environments with Kerberos integrations.
Spencer Hayes and
Michael Shealy, Cached Consulting, LLC
Session 3415-2019
Data scientists use a variety of tools, both commercial and open-source, to achieve key goals for their organization. For enterprise applications of analytics and artificial intelligence, it is crucial that teams can collaborate no matter which tools they are using. SAS® software provides a platform on which all users in the enterprise can create intelligence from data and operationalize the results easily. Data scientists and developers whose core programming competence is in languages such as Python and R can efficiently use SAS through a variety of APIs to increase productivity and improve time-to-value. This paper describes and demonstrates a variety of best-practice use cases to show how SAS software provides integration with open-source tools to support end-to-end analytical workflows.
Jesse Luebbert and Radhikha Myneni, SAS Institute Inc.
Session 3652-2019
If you haven't heard about SAS® with Teradata Vantage, you will soon. Teradata offerings have evolved beyond a leading enterprise data warehouse into Teradata Vantage, a modern architecture that integrates analytic engines, analytic tools, and languages with seamless data integration. Vantage extends support for data types, formats, and sources beyond traditional structured data. SAS is an integral and essential component in a Vantage deployment, as SAS delivers analytic tools and languages that extend the new Vantage platform to create a complete analytic ecosystem. In this session, we will focus on answering these questions: How does SAS® fit into the Teradata Vantage analytics architecture? What are the benefits of using SAS® with Teradata Vantage? Can I put my analytic solution in the cloud? Do I need to change my SAS® code? Or my SQL? Can you show me some example use cases?
Heather Burnette and
Greg Otto, Teradata Corporation
Session 3838-2019
We live in a world where everything computes. Where technology, apps, and data are driving digital transformation, reshaping markets, and disrupting every industry. IT needs to reach beyond the traditional data center and the public cloud to form and manage a hybrid connected system stretching from the edge to the cloud. Listen to an expert explain how Hewlett Packard Enterprise (HPE) and SAS help organizations rethink and modernize their infrastructure more comprehensively to deploy their edge-to-cloud architecture and analyze data at the edge quickly. HPE has Edgeline Converged Systems to connect to, analyze, and store data. HPE IoT gateways (HPE GL10 IoT Gateway and HPE GL20 IoT Gateway) connect, convert, and perform simple analyses. HPE Edgeline Converged Edge Systems can provide data-center-like compute that can be deployed in complex environments.
Kannan Mani, HPE
Session 3511-2019
Basic macros rely on symbolic substitution to place values in particular locations in SAS® code: they simply create code. Beyond that, the sky is the limit! We cover the following advanced techniques: create and populate macro variables; read a list of file names from a folder and use the list for processing code; utility macros that calculate new values without the need for a DATA step; and advanced macro functions. We also include a discussion about macro quoting. Some pointers for passing values when you are using SAS/CONNECT® and SAS® Grid are also discussed.
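As one example of the folder-reading technique, a sketch using the DOPEN/DREAD functions inside a macro (the directory path is a placeholder):

```sas
%macro filelist(dir);
   %local rc fref did i;
   %let rc  = %sysfunc(filename(fref, &dir));   /* assign a fileref    */
   %let did = %sysfunc(dopen(&fref));           /* open the directory  */
   %do i = 1 %to %sysfunc(dnum(&did));
      %put File &i: %sysfunc(dread(&did, &i));  /* one name per member */
   %end;
   %let rc = %sysfunc(dclose(&did));
   %let rc = %sysfunc(filename(fref));          /* clear the fileref   */
%mend filelist;
%filelist(/data/incoming)
```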
Ron Coleman, SAS
Session 3733-2019
There is a growing trend where containers have become the currency of deployment in production across enterprise IT organizations. SAS® analytic workloads usually involve lots of data crunching that can benefit from flexible and scalable container-based infrastructures. Kubernetes is a container orchestration framework and is offered as a service on Google Cloud. Kubernetes provides a flexible way to separate storage and compute. In this session, we show how to deploy SAS containers and orchestrate the analytic workloads on the Kubernetes framework to achieve the following benefits: 1) simplicity: ease of installing and managing SAS; 2) performance: the ability to compute traditional workloads in a few hours as opposed to days; 3) cost savings: provision high-performance compute and storage only when needed; 4) elasticity: the ability to scale up and down based on business needs; and 5) orchestration: workflow management of analytic jobs. Delivering the best analytics in the marketplace married to the best container orchestration management is a big win for customers. The framework presented creates just that.
Sumanth Yamala, Core Compete
Session 3311-2019
SAS® Environment Manager is a powerful web application that enables administrators to monitor and administer their entire SAS® Viya® platform. There are many tips and tricks to using this application to its fullest potential. Because SAS Technical Support Engineers support a wide variety of customer deployments, they have unique insight. This paper shares some of this insight into SAS Environment Manager through the following top-five lists: administration tips, lesser-known features, and documentation resources.
Ursula Polo and
Allison Mahaffey, SAS
Session 3626-2019
SAS/ACCESS® Interface to Hadoop is a critical component of integrating SAS® environments to the increasing amounts of data available in Apache Hadoop. Although SAS® Viya® has enhanced features for data lift to memory, there are many day-to-day tasks that will continue to leverage SAS®9 technologies for analytics. Therefore, the reliance on an optimal integration to Hadoop with SAS/ACCESS is critical, especially for larger data and user volumes. This paper highlights best practices and experiences over and above the standard SAS documentation by sharing real-life experiences from a large financial institution and close work with SAS Research and Development. Focus in this paper is on using Apache Hive and Hadoop Distributed File System (HDFS), but other SAS and Hadoop technologies are mentioned when and if better choices for performance or business requirements are warranted. The author is a long-time SAS and financial institution architect and administrator with extensive experience in SAS and Hadoop performance tuning and hardware.
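A typical starting point (the server, schema, and table names are placeholders) is a Hive libname plus tracing options to confirm that work is passed down to Hadoop:

```sas
libname hdp hadoop server="hive.example.com" port=10000
        schema=analytics user=myuser;

options sastrace=',,,d' sastraceloc=saslog nostsuffix;  /* show pushdown */

proc sql;
   create table work.summary as
   select acct_type, count(*) as n
   from hdp.transactions
   group by acct_type;          /* aggregation runs in Hive when possible */
quit;
```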
Thomas Keefer, D4T4 Solutions
Session 3860-2019
The GPS system is widely used for identifying the location of people and devices in many circumstances. However, it has been shown both by the author's final-year students and by others that GPS tends to have an accuracy of about +/- 25 meters 85% of the time, with the outliers ranging up to 1800 km. While the 25-meter accuracy is satisfactory for many situations, it is problematic when GPS is relied on to monitor the position of vehicles, especially near junctions, such as is proposed for Virtual Traffic Light systems, which need accuracies closer to 1 to 3 meters in order to manage the sequencing of movements. The requirement is to be able to use the GPS data, which is recorded in a time series at 1-second intervals, to provide some level of error estimation in order to correct the measured locations relative to the physical location. A group of students undertaking their Final Year Independent Studies projects used a range of approaches to investigate the feasibility of doing this, using both Microsoft Excel and SAS®. It is easy to visually identify the major positional errors when the data is plotted on a map or when a graph of velocity or acceleration is plotted against time. However, the challenge is to use an algorithmic approach. PROC ARIMA was developed for use on financial time series over long timescales, but it was shown that it could be applied effectively to intervals as short as 1 second. This presentation demonstrates the power of PROC ARIMA in this unusual field.
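One way the approach might look in code; the data set and variable names are invented:

```sas
/* Model 1-second speed as a differenced ARMA series; flag fixes whose
   one-step-ahead residuals are extreme as likely GPS errors. */
proc arima data=gps_track;
   identify var=speed(1) nlag=12;    /* first difference, inspect ACF/PACF */
   estimate p=1 q=1;
   forecast lead=0 out=resid_out;
run; quit;

data suspect_fixes;
   set resid_out;
   if abs(residual) > 3*std;         /* OUT= supplies RESIDUAL and STD */
run;
```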
Richard J Self, University of Derby
Session 3616-2019
SAS® Enterprise Miner (TM) and Model Studio are two solutions
that you can use to create predictive models. In this paper, we show that
although these applications have different architectures and run in
different environments, we can integrate models generated in one environment
and compare them with models produced in the other. In SAS Enterprise Miner,
we show how the SAS Viya Code node can be used to create models based on SAS
Visual Data Mining and Machine Learning and integrate them into a SAS
Enterprise Miner project. For Model Studio, we describe how models generated
in SAS Enterprise Miner can be integrated into a Model Studio pipeline for
the purpose of comparison. We also discuss how you can use the SAS®
code in Model Studio to produce user-defined models. We hope that a better
understanding of these capabilities can help users to fully use the rich
functionality and flexibility of these products.
Jagruti Kanjia, SAS
Session 3775-2019
The SAS® environment maintains many different output styles to use to enhance the visual display of your output data. The ODS destination for Excel can take advantage of these styles maintained by SAS to apply formatting and color schemes to your Excel output workbooks. I show you how to use the STYLE option in the ODS destination for Excel to enhance your output workbooks.
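A short example of the STYLE= option (the output path is arbitrary):

```sas
ods excel file="/tmp/cars.xlsx" style=HTMLBlue
    options(sheet_name="By Origin" embedded_titles="yes");
title "Average MPG by Origin";
proc means data=sashelp.cars mean maxdec=1;
   class origin;
   var mpg_city mpg_highway;
run;
ods excel close;
```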
William Benjamin, Owl Computer Consultancy LLC
Session 3044-2019
In some people, loyalty to an interface can run as deep as loyalty to a
football team, and it's been no different for SAS® users. The
good news is that a concerted effort is underway to remove the barriers
between interfaces and the differences between them so that you can just
choose what you use based on where you are. Are you on a desktop? A mobile
device? In a conference room? This paper talks you through the integrated
future that's under development. In the meantime, this paper helps you to
understand what the strengths and weaknesses are of the programming
interfaces today so that you can choose wisely. You also learn tips and
tricks for working efficiently. The paper focuses on SAS®
Studio and SAS® Enterprise Guide®, but it also
touches on several of the other SAS programming interfaces that are
available, including Jupyter.
Amy Peters and Samantha DuPont, SAS
Session 3156-2019
SAS® Viya® offers several techniques that can
maximize the speed of SAS® Visual Analytics reporting: data
partitioning, user-defined formats, and using aggregated data. However,
every SAS Visual Analytics report can be different: different data,
different graphs, and other differences in terms of filters, interactive
widgets, and more. Testing how changes to individual reports affect speed
can be laborious and might involve manually opening reports in the SAS®
Report Viewer several times and meticulously reviewing each report's
diagnostics or microservice logs. Even with this information, external
factors such as network performance can confound the diagnostics. This paper
presents a programmatic way to call a SAS Visual Analytics report to quickly
determine how long it takes the report to render using the report Images
service, available via the SAS Viya REST API. This paper provides all of the
code for an automated, end-to-end process that leverages the SAS Viya REST
API to retrieve the server-side render time of a SAS Visual Analytics
report. Code is provided for testing an individual report on demand. This
process can be repeated automatically while the report designer tests
several versions of the report. Macro code demonstrates how to test a suite
of reports for comprehensive A/B comparisons. Data gathered from these
repeated API calls enables designers to quickly determine the best
performance techniques to meet their specific reporting needs.
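A hedged sketch of the kind of on-demand test described, using PROC HTTP against the reportImages REST service. The host, report URI, and access-token macro variable are placeholders, not values from the paper:

```sas
%let base=https://viya.example.com;             /* placeholder host */
%let rpturi=/reports/reports/REPORT-ID-HERE;    /* placeholder report URI */

filename resp temp;
%let t0=%sysfunc(datetime());
proc http
   url="&base./reportImages/jobs?reportUri=&rpturi."
   method="post"
   out=resp;
   headers "Authorization"="Bearer &access_token.";   /* placeholder token */
run;
%let t1=%sysfunc(datetime());
%put NOTE: render request took %sysevalf(&t1-&t0) seconds.;
```

Repeating this call and capturing the elapsed times (or the render timings returned by the service) is the basis for the A/B comparisons the paper describes.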
Michael Drutar, SAS
Session 3986-2019
SAS® Viya® offers new and exciting capabilities for the SAS® Platform, but the SAS Viya capabilities and designs might be unfamiliar to many SAS® users. This paper highlights experiences gained from multiple SAS Viya installation and implementation projects involving SAS Viya 3.2, 3.3, and 3.4, from small to very large hardware resource allocations. Key topics include hardware and architectural concepts, leveraging of external data sources, recovery considerations, integration with SAS®9, and daily operation overviews. This paper provides valuable insights to organizations considering or in the early stages of a SAS Viya implementation.
John Schmitz, Luminare Data
Session 3153-2019
Have you ever wanted to start your own small economy, but had no idea how to handle the recordkeeping? In that case, it is time to learn about what blockchain technology can do. Using Base SAS® code examples, this paper illustrates the principles behind creating a blockchain system to provide an unalterable record of validated transactions for your project.
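A minimal sketch of the core idea, assuming nothing about the paper's actual implementation: each "block" stores a SHA-256 hash of its transaction together with the previous block's hash, so altering any earlier record changes every later hash. (The HASHING function requires SAS 9.4M6 or SAS Viya.)

```sas
data ledger;
   length prev_hash hash $64 txn $50;
   retain prev_hash "0";                      /* genesis value */
   infile datalines truncover;
   input txn $char50.;
   hash = hashing('SHA256', cats(prev_hash, txn));
   output;
   prev_hash = hash;                          /* link the next block */
datalines;
Alice pays Bob 10
Bob pays Carol 4
Carol pays Alice 1
;
```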
Seth Hoffman, GEICO
Session 3355-2019
The next generation of SAS® Enterprise Guide® is here! The redefined SAS Enterprise Guide 8.1 is both sexy and intelligent. SAS Enterprise Guide 8.1 sports a modern and attractive interface, tab-based organization of content, and flexible window management such as floating, docking, and leveraging multiple monitors. Want to view your code, log, and results at the same time? SAS Enterprise Guide 8.1 has you covered. Want to view four data sets at the same time? Sure, no problem. Want to just code, without a project? You got it. See these features and more at the big reveal of SAS Enterprise Guide!
Casey Smith, SAS
Samantha Dupont, SAS
David Bailey, SAS
Session 3465-2019
Your website tracking generates a mountain of data loaded with gems waiting to be discovered. Analyzing this data uncovers untold stories and insights: when, why, and how do people visit your website? But that's only part of the story. By combining front-end data with back-end data, beautiful gold nuggets of knowledge emerge. Uniting the unique visitor identification capabilities of SAS® Customer Intelligence 360 with customer relationship management (CRM) data empowers you to know not only when, why, and how people use your website, but also who is using it. You can see how different people and personas behave. In turn, CRM data can be fed back into SAS Customer Intelligence 360 to enable you to target each person or persona segment with more relevant content and promotions. This paper delves into how the marketing technologists behind sas.com integrate the abilities of SAS Customer Intelligence 360, SAS® Enterprise Guide®, and SAS® Visual Analytics to generate golden insights.
Mark Korey, SAS
Laura Maroglou, SAS
Session 3479-2019
SAS® Cloud Analytic Services (CAS) is the engine of our in-memory analytics offerings. It promises to deliver massively scalable performance on huge volumes of data. But then getting all of that data loaded into CAS (or saved back out later) is another scalability challenge to tackle. CAS addresses that challenge by providing support for a wide range of data sources, starting with native SAS® formats and extending to include direct loading from many third-party data providers. Along the way, we also need to carefully plan and evaluate the architecture and infrastructure to ensure that everything is moving along as fast and efficiently as intended. As of SAS® Viya® 3.3, the language we use to direct CAS in its efforts to transfer data has evolved considerably. We can explicitly configure CAS to use a serial data transfer method, but the objective we're shooting for is effectively parallel. Alternatively, we can set up CAS and other related components for a fully parallel transfer, but then CAS might deem it necessary to downgrade to a serial approach. So let's crack open this case and look at determining exactly what kind of data transfer we're asking CAS to perform, verifying that's what it did, and understanding why it worked.
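As a hedged sketch of the configuration surface involved (connection values are placeholders, and the option is honored only by data connectors that support parallel transfer):

```sas
/* Request parallel transfer from a database caslib, then load a table */
caslib tera datasource=(srctype="teradata"
                        server="tera.example.com"
                        username="user" password="pass"   /* placeholders */
                        dataTransferMode="parallel");

proc casutil;
   load casdata="big_table" incaslib="tera" outcaslib="casuser";
quit;

/* Check the session log to confirm whether the transfer actually ran in
   parallel or was downgraded to serial */
```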
Rob Collum, SAS
Session 3337-2019
Scoring new data to compute predictions for an existing model is a fundamental stage in the analytics life cycle. This task has never been easier, given recent additions to SAS/STAT® syntax. This paper shows how to use these new features in SAS/STAT software to make it simple to predict the response for new data. The paper discusses practical applications for scoring with the SCORE statement within a modeling procedure, with the PLM procedure using a stored model fit, and with the CODE statement in the DATA step. The paper also discusses tried-and-true methods of scoring using the SCORE procedure and using the missing-value-for-the-response trick.
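The three scoring routes named in the abstract can be sketched as follows, with hypothetical data set and variable names:

```sas
/* 1) SCORE statement inside a modeling procedure */
proc glmselect data=train;
   model y = x1 x2;
   score data=new_data out=scored1;
run;

/* 2) Store the fitted model, then score later with PROC PLM */
proc logistic data=train;
   model event(event='1') = x1 x2;
   store work.mymodel;
run;
proc plm restore=work.mymodel;
   score data=new_data out=scored2 predicted;
run;

/* 3) Generate DATA step scoring code with the CODE statement */
proc logistic data=train;
   model event(event='1') = x1 x2;
   code file="score.sas";
run;
data scored3;
   set new_data;
   %include "score.sas";
run;
```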
Phil Gibbs and
Randy Tobias, SAS
Session 3828-2019
After netting 65 goals in 102 league appearances between 2007 and 2011, Fernando Torres became the most sought-after player in the 2011 January transfer window. Chelsea spent a whopping £50 million to procure the services of the most lethal striker of his era. Torres' move set a British record for the most expensive transfer at that time. In the seasons to follow with the Blues, the Spaniard failed to live up to his high price tag. He scored only 19 league goals in 103 matches for Chelsea. Was it the injuries or the pressure of the huge expectations? Or was he just not the right fit for the Blues? Every year, we see major soccer clubs spend hundreds of millions of dollars to sign contracts with the players of their interest. More often than not, these deals are based on a player's skills, potential for growth, performance in prior seasons, and brand value. While often these transfer deals pay off, there is an abundance of failed transfers every season. And in the end, it all invariably depends on the right fit between the playing style of the player and the team. The primary objective of this paper is to leverage analytics to find the perfect fit for the teams. The idea is to analyze and decipher underlying quantitative patterns and make an association that could help identify players with similar skill sets; the potential market value of such players; and positions within teams where these players would be most effective.
Vishal Gaurav and
Goutam Chakraborty, Oklahoma State University
Session 3022-2019
Healthcare fraud is a big issue in the United States as well as in many other countries. Many solutions are being adopted by healthcare payers to prevent fraudulent claims from hitting their systems so as to avoid millions of dollars of leakage. SAS® has played a great role in helping payers develop analytic solutions that help organizations prevent fraud. One such solution, which I developed, counters fraud using screen scraping through SAS. Phantom providers are healthcare providers who do not exist or who do not hold valid licenses and bill false claims to payers. A phantom provider, once identified, changes their tax ID number and bills from a new address to bypass payers' edits and to fool payers. To identify all such notorious fraudulent providers, we developed analytics that use screen scraping through SAS. We identified a state government website that keeps track of all businesses and their owners in its database. We used SAS to run a screen scrape on that website and identified new addresses of providers that we had already identified as fraudulent. In this way, we were able to run our analytics in a dynamic way using screen scraping and identified thousands of other providers linked to those we had already identified as fraudulent, saving millions of dollars for our organization.
Hitesh Kharbanda, OPTUM GLOBAL SOLUTIONS
Gina Hedstrom, UHG
Session 3067-2019
The cumulative logit model is a logistic regression model where the target (or dependent variable) has two or more ordered levels. If it has only two levels, then the cumulative logit is the binary logistic model. Predictors for the cumulative logit model might be NOD (nominal, ordinal, discrete), where typically the number of levels is under 20. Alternatively, predictors might be continuous, where the predictor is numeric and has many levels. This paper discusses methods that screen and transform both NOD and continuous predictors before the stage of model fitting. Once a collection of predictors has been screened and transformed, the paper discusses predictor variable selection methods for model fitting. One focus of this paper is determining when a predictor should be allowed to have unequal slopes. If unequal slopes are allowed, then the predictor has J-1 distinct slopes corresponding to the J values of the target variable. SAS® macros are presented, which implement screening and transforming methods. Familiarity with PROC LOGISTIC is assumed.
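A minimal sketch of the unequal-slopes option the paper discusses, with hypothetical data set and variable names (the target is assumed ordinal with J levels, so the listed predictor gets J-1 distinct slopes):

```sas
proc logistic data=mydata;
   class c_pred / param=ref;
   /* c_pred keeps a common slope; x_cont is allowed unequal slopes */
   model target = c_pred x_cont / unequalslopes=(x_cont);
run;
```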
Bruce Lund, Independent Consultant
Session 3140-2019
When you work in SAS® Visual Analytics, you want to complete your reports as quickly and easily as possible. SAS® Visual Analytics 8.3 provides new self-service data manipulation capabilities that significantly increase your control over data sources. This paper examines four of these new capabilities, each of which enables you to work with data more effectively in SAS Visual Analytics. You see how to join multiple SAS® Cloud Analytic Services (CAS) tables in SAS Visual Analytics to incorporate data items from different tables in the same report object. Next, you learn how to define a new data source (an aggregated data source) that can increase efficiency by limiting the number of rows used for an object. You also learn how to define a common filter that can cascade changes to the filter across multiple report objects. And for the final treat, you learn how to capture and reuse data changes as a data view. With this approach, the data view is associated with the CAS table and can be applied anytime that the CAS table is used to create a report. An application administrator can share a data view, so it can be applied by other users who can access the same CAS table. Be prepared to be delighted and to leave this paper a more efficient report designer!
Nicole Ball, Richard Bell, Lynn Matthews, SAS
Session 3008-2019
The self-administered health survey short form (SF-12v2) is commonly used to assess health-related quality of life (HRQOL) among populations. However, there is a lack of data regarding its effectiveness among African Americans (AAs). This study's design is a secondary data analysis of a prospective cohort study. The aim of this paper is to assess the quality of life among AAs enrolled in a faith-based diabetes prevention program, Fit Body and Soul (FBAS), compared to a health education (HE) program, using SAS® software. SAS® 9.4 was used to score the data. The SF-12v2 data yield two summary component scores, the Physical Component Summary score (PCS) and the Mental Health Component Summary score (MCS), with eight sub-domains. Both PCS and MCS combine the 12 items in such a way that they compare to a national norm with a mean score of 50.0 and a standard deviation of 10. A total of 604 people were enrolled. The overall PCS for FBAS was 49 at baseline, 51 at week 12, and 50 at week 52, and for HE was 48 at baseline, 49 at week 12, and 49 at week 52. The overall MCS for FBAS was 51 at baseline, 53 at week 12, and 52 at week 52, and for HE was 51 at baseline, 52 at week 12, and 51 at week 52. Quality of life among participants at week 12 was improved from baseline but not maintained at week 52. SAS software was effective for scoring the SF-12 data. The SF-12v2 appears to be a valid survey tool for the assessment of HRQOL among AAs.
Thomas Joshua, University of Georgia
Lucy Marion, Augusta University
Lovoria Williams, University of Kentucky
Jane Garvin, University of St. Augustine for Health Sciences
Session 3276-2019
Programs that process large data sets can consume a lot of a programmer's time waiting for jobs to complete. This paper presents simple techniques that all programmers can use to speed up their programs, leaving time to get more work done (or maybe to play more golf :). Techniques include subsetting, indexes, data set compression, and in-memory data.
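The four techniques listed can be sketched as follows (data set and variable names are hypothetical):

```sas
/* Subset early: keep only the rows and columns you need */
data small;
   set big(keep=id amount where=(amount > 0));
run;

/* Index a lookup key to speed WHERE processing */
proc datasets lib=work nolist;
   modify big;
   index create id;
quit;

/* Compress to cut I/O on wide data sets */
data big_c(compress=yes);
   set big;
run;

/* Hold a table in memory for repeated passes */
sasfile work.big load;
proc means data=work.big; run;
sasfile work.big close;
```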
Steve Cavill, Infoclarity
Session 3176-2019
Leveraging smart metering solutions to support energy efficiency at the individual household level poses novel research challenges in monitoring usage and providing accurate load forecasting. Forecasting electricity usage is an especially important component that can provide intelligence to smart meters. In this presentation, we propose an enhanced approach for load forecasting at the household level. The impacts of residents' daily activities and appliance usage on the power consumption of the entire household are incorporated to improve the accuracy of the forecasting model. The contributions of the presentation are threefold. (1) We addressed short-term electricity load forecasting for 24 hours ahead, not on the aggregate but on the individual household level, which fits into the Residential Power Load Forecasting (RPLF) method. (2) For forecasting, we used a household-specific data set of behaviors that influence power consumption, which was derived using segmentation and sequence mining algorithms. (3) An extensive load forecasting study using different forecasting algorithms enhanced by household activity patterns was undertaken.
Krzysztof Gajowniczek, Warsaw University of Life Sciences
Session 3350-2019
A question frequently asked by SAS customers is how to run SAS® Environment Manager in a grid (or a shared configuration directory) environment. As of today, SAS has over seventy customer tickets that are related to this issue. You request, we listen. In SAS® 9.4M6, we re-designed the SAS Environment Manager agent to make it smart: it detects whether the request to run the SAS Environment Manager agent is from a new host or from a new grid node. If the request is from a new host, it automatically creates a new SAS Environment Manager agent instance for that host. With this design, we also introduced many capabilities to manage the smart agent to make it easier for customers. The new capabilities include support for update in place, hot fixes, other platforms, and all SAS® Deployment Manager tasks. This paper details the design, the implementation, and the process for how you can take advantage of this new smart agent feature. As a bonus, we also review what is new in the SAS® 9.4 middle-tier platform.
Zhiyong Li, SAS
Session 3178-2019
Gas usage forecasting is a difficult task. Users need an hourly forecast of the natural gas used on the internal (national) market, with very high performance (a mean absolute percentage error (MAPE) close to 3%), to ensure the correct balance of the network and to provide fair information to the gas shipper. In this paper, we explain how we reached a MAPE below 3% using multiple neural network models and an ensemble based on the history of the performance of each single model. We used good old MP CONNECT technology to achieve a high-performance forecast (using the good old SAS® 9.4 version).
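The MP CONNECT pattern mentioned can be sketched as follows; the included model programs are hypothetical stand-ins for the individual model fits:

```sas
/* Run independent model fits in parallel SAS sessions on the same machine */
options autosignon sascmd="!sascmd";

rsubmit task1 wait=no;
   %include "fit_model1.sas";    /* hypothetical model program */
endrsubmit;

rsubmit task2 wait=no;
   %include "fit_model2.sas";    /* hypothetical model program */
endrsubmit;

waitfor _all_ task1 task2;       /* block until both sessions finish */
signoff _all_;
```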
Andrea Magatti, Gabriele Oliva, Gabriella Jacoel, and Carlo Binda, Business Integration Partners
Silvia Lameri, SNAM
Session 3490-2019
Diagnosing performance issues can be a lengthy and complicated process. For many, the most difficult step is figuring out where to begin. This typically leads to a track being opened with SAS Technical Support. The SAS Performance Lab has developed a standard methodology for diagnosing performance issues based on years of experience doing so both internally and at customer sites. This process is regularly applied when assisting with performance issues in SAS Technical Support tracks. This presentation goes through the methodology used by the SAS Performance Lab to diagnose performance issues and discusses resolutions to the most common problems.
Jim Kuell, SAS
Session 3237-2019
Structured Query Language (SQL) in SAS® provides not only a powerful way to manipulate your data but also enables users to perform programming tasks in a clean and concise way that would otherwise require multiple DATA steps, SORT procedures, and other summary statistical procedures. Often, SAS users use SQL for only specific tasks with which they are comfortable. They do not explore its full capabilities due to their unfamiliarity with SQL. This presentation introduces SQL to the SQL novice in a way that attempts to overcome this barrier by comparing SQL with more familiar DATA step and PROC SORT methods, including a discussion of tasks that are done more efficiently and accurately using SQL and tasks that are best left to DATA steps.
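A small example of the kind of comparison described: the same group summary done with PROC SORT plus a DATA step, and then in a single PROC SQL step.

```sas
/* DATA step approach: sort, then accumulate within BY groups */
proc sort data=sashelp.class out=sorted; by sex; run;
data totals;
   set sorted;
   by sex;
   if first.sex then n=0;
   n+1;
   if last.sex then output;
   keep sex n;
run;

/* PROC SQL approach: one step, no presort required */
proc sql;
   create table totals_sql as
   select sex, count(*) as n
   from sashelp.class
   group by sex;
quit;
```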
Brett Jepson, Rho, Inc.
Session 3405-2019
Apache Hadoop is a fascinating landscape of distributed storage and processing. However, the environment can be a challenge for managing data. With so many robust applications available, users are treated to a virtual buffet of procedural and SQL-like languages to work with their data. Whether the data is schema-on-read or schema-on-write, Hadoop is purpose-built to handle the task. In this introductory session, learn best practices for accessing data and deploying analytics to Apache Spark from SAS®, as well as for integrating Spark and SAS® Cloud Analytic Services for powerful, distributed, in-memory optimization.
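One common access pattern from SAS is a SAS/ACCESS® LIBNAME connection to Hive, sketched here with placeholder server and schema values (this illustrates general Hadoop access, not the session's specific Spark deployment steps):

```sas
/* Connect to Hive; WHERE clauses are pushed to the cluster where possible */
libname hdp hadoop server="hive.example.com" schema="default";

proc sql;
   create table work.local as
   select * from hdp.sales where year=2019;   /* hypothetical table */
quit;
```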
Kumar Thangamuthu, SAS
Session 3102-2019
The Humanitarian OpenStreetMap Team (HOT) community consists of volunteers from around the globe. HOT tasks involve developing maps that identify communities and infrastructures based on satellite imagery. These maps are then used to assist aid organizations such as the Red Cross during humanitarian crises and for general community development in areas that are often not covered by the mapping products that most of us take for granted. This presentation examines OpenStreetMap data and introduces HOT and some of the associated mapping tasks, including assisting the aid efforts during the May 2018 Ebola outbreak in the Democratic Republic of the Congo. Analysis is performed on the data using the SAS® ODS Graphics procedures (including PROC SGMAP) to visualize the contributions to OpenStreetMap both spatially and over time.
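As a minimal sketch of the PROC SGMAP capability mentioned, plotting point data over an OpenStreetMap base layer (the ZIP-code sample data stands in for OpenStreetMap contribution data):

```sas
/* X and Y in sashelp.zipcode are longitude and latitude */
proc sgmap plotdata=sashelp.zipcode(obs=100);
   openstreetmap;          /* OpenStreetMap tiles as the background */
   scatter x=x y=y;
run;
```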
Michael Matthews, Innotwist Pty Ltd
Session 3118-2019
Although Electronic Data Capture (EDC) has improved efficiency and timeliness in data entry and analysis in clinical trials, it has reduced the safeguards inherent in double data entry performed by dedicated professionals. EDC is vulnerable to inadequate training, transcription errors, fat-finger errors, negligence, and even fraud. Moreover, recent initiatives in risk-based monitoring are moving away from 100% on-site source data verification. Thus, supplemental data monitoring strategies are essential to ensure data accuracy for statistical analysis and reporting. We have developed a suite of statistical procedures to identify suspicious data values for individual subjects and across clinical sites. They include rounding errors and digit preference checks, univariate and bivariate outlier checks, longitudinal outlier checks within subjects, and variance checks. Generally, regression models are applied to account for demographic characteristics and other important covariates. The residuals from these models are used to identify outliers. The suite of data checks is illustrated using fabricated data. The strengths of this approach are highlighted and there is discussion of its shortcomings. This suite of statistical data checks is an effective tool for supplementing current processes and ensuring data accuracy. It can focus resources on specific data fields and clinical sites for efficient risk-based monitoring strategies.
Kaitie Lawson, Rho, Inc.
Session 3262-2019
From state-of-the-art research to routine analytics, the Jupyter Notebook offers an unprecedented reporting medium. Historically, tables, graphics, and other types of output had to be created separately and then integrated into a report piece by piece, amidst the drafting of text. The Jupyter Notebook interface enables you to create code cells and markdown cells in any arrangement. Markdown cells allow all typical formatting. Code cells can run code in the document. As a result, report creation happens naturally and in a completely reproducible way. Handing a colleague a Jupyter Notebook file to be re-run or revised is much easier and simpler for them than passing along, at a minimum, two files: one for the code and one for the text. Traditional reports become dynamic documents that include both text and living SAS® code that is run during document creation. With the SAS kernel for Jupyter, you have the power to create these computational narratives and much more!
Hunter Glanz, California Polytechnic State University
Session 3171-2019
Emotional intelligence is having its moment. Research now confirms what we have long suspected: likability matters. People hire people who they like and people who they feel they can trust. It's the X factor. Years ago, having worked for one company for a long time was a sort of protection against getting laid off. That's not the case anymore. Jobs are getting outsourced overseas, and employers are hiring contract labor instead of employees. Whether you are a SAS® programmer, a statistical analyst, a data scientist, or a manager of SAS programmers, the advice covered in this presentation applies to everyone. We talk about the hidden job market, explore networking channels that work, and discuss trends in recruiting (including artificial intelligence and California Law AB 168). LinkedIn, social media, resumes, cover letters, interview tips and techniques, and all things related to upward job mobility are covered. I have been working as an executive recruiter in the SAS marketplace for almost 20 years. As such, I have seen and heard it all! Let's have a laugh and share stories (all names have been changed and are confidential). Let's look at candidates who were offered the job or promotion, often against the odds. By the end of this presentation, you should have some new tools for getting hired in today's ever-changing job market.
Molly Hall, Synchrony Solutions, Inc.
Session 4049-2019
Baloise Group is more than just a traditional insurance company. The changing security, safety, and service needs of society in the digital age lie at the heart of its business activities. This session is about how to implement the International Financial Reporting Standards IFRS 17 and IFRS 9 as far as possible in alignment with the defined target picture. To realize this vision, we divide the implementation into seven areas.
Klaus Rieger
Session 3582-2019
The FREQ procedure is a powerful tool for summarizing categorical data. Cross-tabulating variables is a common application of PROC FREQ. Cumulative frequency and percents are calculated across the entire data set that PROC FREQ is summarizing. PROC FREQ supports BY-group processing as well. Using BY-group processing, cumulative frequencies and percents are calculated within each BY group. One limitation of PROC FREQ is that this is an either/or proposition: cumulative frequencies and percents are calculated across the entire data set or within each BY group, but not both at once. Displaying multi-level cumulative frequencies in the same output table using PROC FREQ requires merging output data from multiple runs. An alternative approach that allows for displaying cumulative frequencies and percents from the full sample alongside cumulative frequencies and percents within each BY group uses PROC REPORT rather than PROC FREQ. This paper outlines an approach for creating such a summary table using PROC REPORT and highlights the advantages conferred through using this technique.
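A hedged sketch of the general idea, not the paper's exact technique: PROC REPORT can show a full-sample percent (PCTN) on grouped rows while a COMPUTE block maintains a running total, so overall and within-group figures can share one table.

```sas
proc report data=sashelp.class nowd;
   column sex age n pctn cum;
   define sex  / group;
   define age  / group;
   define n    / "Count";
   define pctn / "Pct of All" format=percent8.1;
   define cum  / computed "Cum N";
   compute cum;
      tot + n;        /* temporary variables are retained across rows */
      cum = tot;
   endcomp;
run;
```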
Jedediah Teres, MDRC
Session 3984-2019
Many generations of statisticians have studied survey data and the art and science of conducting surveys. Techniques have been developed based on variance of estimates that can indicate the quality of a survey estimate. Similarly, work continues on defining quality indicators for administrative data, such as the proportion of missing values of a variable. Big data is a new area with little study on how quality is defined. This poster explores quality indicators in these three data source domains.
Peter Timusk, Statistics Canada
Session 3629-2019
As an executive, you want to manage your organization using data. It gives you the bottom line and tells you if you are heading in the right direction. Sometimes you call in your research team, you ask them questions that are on your mind, and they say, "You'll need to do a survey to answer that one. We've lined up people who know how to write the questions, and we have someone to write the code to analyze the survey." So for each question with a typical 5-option rating, a no-basis-to-judge option, and some people leaving the question blank, there would be 7 percentages. But let's make it more informative by showing the demographics for each question. In the case study described in this article, there would be 22 rows for each of 100 questions. Let's see: 100 questions times 22 rows times 7 response options; that would mean we'd be delivering 15,400 numbers. This happened to me exactly like this. Fortunately, the boss was transparent and would take those 15,400 numbers and try, over a few weekends, to find a few hours to transcribe the relevant ones into a spreadsheet so that maybe the message would jump out. How many times have you said to yourself, "A computer can do what you are doing manually? Please stop! Let me figure out how to get the computer to do what you are doing." This article describes how to get to the message when you are faced with 15,400 percentages.
Alice Feldesman, US Government Accountability Office
Session 3900-2019
Predicting the survival rate for predictors of interest brings clinical meaning to research projects. For example, our research team is interested in modeling the one-year survival rate for geriatric patients who have had major surgeries, namely abdominal aortic aneurysm repair, coronary artery bypass grafting, and colectomy. Patients' function status, such as activities of daily living (ADL), is a key covariate for the survival model. The prediction of survival rates for patients with different levels of ADL impairment gives clinicians a better understanding of the risk associated with function status. In general, researchers use the SAS® PHREG procedure to build Cox proportional hazards models, and the BASELINE statement is specified to obtain the predicted probabilities of survival. Researchers often use survey data to obtain essential information for their studies. In our study, we used a longitudinal survey of the health and economic circumstances of a nationally representative sample of participants aged 50 and over. We used the SURVEYPHREG procedure to build the Cox model because the procedure enables specifying the sample weights and complex survey design variables. However, the SURVEYPHREG procedure currently does not contain the BASELINE statement. Hence, it is difficult to predict the survival rate when using survey data. In this paper, we discuss different methods of survival rate prediction when accounting for sample weights and complex survey design variables is necessary.
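For reference, the standard (non-survey) route the paper contrasts against can be sketched as follows, with hypothetical data set and variable names:

```sas
/* Covariate profile at which to predict survival */
data cov;
   adl_impair=2; age=75;
run;

/* PROC PHREG supports BASELINE, so predicted survival is direct */
proc phreg data=mydata;
   model time*death(0) = adl_impair age;
   baseline covariates=cov out=pred survival=surv;
run;
```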
Bocheng Jing, NCIRE
Ledif Diaz-Ramirez, University of California, San Francisco
Session 3610-2019
Python is described within the community as the second-best language for everything (Callahan, 2018). The strength of Python as a language stems from its huge range of uses including as a tool for data science. Python is taught in many university courses, and, as a result, there are a vast number of coders with Python skills in the data science industry. Last year at Amadeus we explored the use of SASpy as an initial tool to bridge the gap between Python and SAS® 9.4 (Foreman, 2018). This year, we move on to look at integration with SAS® Viya®. In July 2016, SAS released Python Scripting Wrapper for Analytics Transfer (SWAT), a Python library. It enables connections to SAS® Cloud Analytic Services (CAS), and therefore, opens up the functionality of SAS Viya to Python users. SWAT enables users who do not have a SAS® background to perform their data manipulation and analytics using more familiar pythonic syntax while still harnessing the power of CAS and SAS Viya. In this paper, we demonstrate the use of Python SWAT to first connect to a CAS session, load data from Python into CAS, and use CAS action sets to analyze it. We then make the data available to other applications such as SAS® Visual Analytics to demonstrate some of the functionality this gives.
Carrie Foreman, Amadeus Software Limited
Session 3642-2019
"For every action, there is an equal and opposite reaction." Sir Isaac
Newton was talking about physics, but the concept also applies to other
areas of life, including quotation marks in SAS® code. Unfortunately, SAS
coders do not always properly balance quotation marks while coding. SAS
detects possible occurrences of this problem, signaling its concern with
a SASLOG message: "WARNING: The quoted string currently being processed
has become more than 262 characters long. You might have unbalanced
quotation marks." This presentation contains a few coding suggestions to
identify unbalanced quotation marks in SAS code.
Andrew Kuligowski, Independent Consultant
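One simple idea along these lines is to count quotation marks as a
program file is read, flagging lines where the running count becomes odd.
This sketch is illustrative only (the file name is hypothetical), and it
will raise false positives for legitimate apostrophes in comments or
text:

```sas
data _null_;
   infile 'myprog.sas' truncover;
   input line $char256.;
   retain nquotes 0;
   /* running count of single and double quotation marks */
   nquotes + countc(line, '"') + countc(line, "'");
   if mod(nquotes, 2) = 1 then
      putlog 'NOTE: possible unbalanced quotation mark near line '
             _n_= line=;
run;
```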
Session 3727-2019
The bottom line of a business is directly influenced by a great many
factors, none more so than the profit generated or money lost from each
customer. Accessing financial performance metrics at the account level is a
crucial first step in optimizing the performance of a company; from
improving efficiency of marketing spend to identifying the most desirable
customers to retain. The SAS® Viya® REST APIs
provide the ideal route for a custom-made web application to access the
power of SAS® Cloud Analytic Services (CAS). In this paper, we
explore the options available for manipulating, analyzing, and reporting on
Account Level Profit using SAS Viya REST API calls.
Richard Carey, Demarq Limited
Session 3168-2019
In search of insight, we usually focus on data - access, preparation,
modeling, analysis, and visualization - but the value of data isn't always
apparent to the rest of the business. Data alone is not motivation enough
for someone to invest their time in a project unless the value is clearly
obvious. An effective way to engage internal business users with analytics
is with a data story, which is a story supported by data. Data stories paint
a bigger picture and evoke an empathy for what the numbers actually
represent. Contrary to popular belief, a data story is not a data
visualization. Data visualization is a great way to tell a data story, but
it relies on the story having been written first. A well-written data story,
combined with a good visualization, improves your insight communication.
Principles of graphic design are rarely taught to analysts, but an
understanding of the principles lifts any data visualization to the next
level. Kat shows you how to write a data story and effectively tell it using
data visualization. She shares an introduction to graphic design from a data
perspective and illustrates how applying simple design principles in SAS®
Visual Analytics enhances your data storytelling.
Kat Greenbrook, Rogue Penguin
Session 3462-2019
If you rely on data but aren't analytics oriented, you're missing out on
insights. Even worse, you could be sabotaging the decision-making that
relies on insights from data. In this paper, we explore how you can bring
analytics and analytical thinking into your decision-making by using SAS®
Visual Analytics. We explore everything from smart, automated tools that can
help you find insights within your data to simple charts and rules that can
help you determine the usefulness of your insights.
Atrin Assa, SAS
Session 3529-2019
The most powerful person on the planet is the storyteller. From Winston
Churchill in world politics to J.K. Rowling on the pages of a book the people
who move the world tell compelling stories. This paper explores the most
effective tools and techniques in SAS® Visual Analytics to
tell your data story and move your organization.
Atrin Assa, SAS
Session 3294-2019
SAS® Forecast Server is one of the most feature-rich
forecasting products on the market. This paper describes 10 underused
features to improve your workflow. First, start with the data: (1) Use
SAS® Time Series Studio to become familiar with your data and
define a hierarchy, (2) catch data problems through warnings about the time
ID variable, (3) look under the covers by using the SAS® log,
and (4) use adjustment variables and start-up code to fix data issues. Next,
improve the forecasts: (5) Define recurring events that influence the time
series, (6) add models by importing from an external list, (7) use rolling
simulations to evaluate forecast accuracy over the number of periods you
need to forecast, (8) evaluate the effects of independent variables by using
scenario analysis, and (9) gain insight into results by comparing models.
Finally, put your workflow into production: (10) Run the code in batch.
Evan Anderson and Michael Leonard, SAS
Session 4007-2019
Text analytics is the process of examining large collections of written
resources to generate new information; transforming unstructured text into
structured data helps us find meaningful insights from the text. It is a
subfield of Natural Language Processing (NLP). Statistical methods,
rule-based modeling, and machine learning techniques are applied in text
analytics, allowing for the extraction of topics, keywords, semantics, and
sentiments from the raw text in an effort to categorize terms. While text
mining has its vast application in customer insight (for example,
pharmaceutical drug trial improvement or media coverage analysis), this
study reports the linguistic framework observed in peer-reviewed journal
papers. My objective is to look into the published papers of International
Journal of Research in Marketing and Journal of Marketing for the year 2017
and discover patterns in the writing styles of different researchers. The
challenge is to quantify the length and depth of published papers within the
year. This first phase of a multi-phase project looks across journals to
examine lexical density, common themes, and patterns among the language.
Also, the automated readability index (ARI) of each accepted paper is
calculated to assess how coherency affects the overall structure of a
text. SAS® Viya®, SAS® Enterprise Miner™, and various rule-based models
were used for this project.
Rohit Banerjee, Oklahoma State University
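The automated readability index mentioned above is a standard formula
based on character, word, and sentence counts. Assuming those counts have
already been derived per paper (data set and variable names
hypothetical), the calculation itself is a one-liner:

```sas
data ari;
   set papers;   /* assumed to contain nchars, nwords, nsentences */
   /* ARI = 4.71*(characters/words) + 0.5*(words/sentences) - 21.43 */
   ari = 4.71*(nchars/nwords) + 0.5*(nwords/nsentences) - 21.43;
run;
```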
Session 3571-2019
This paper describes how British Airways, a leading British airline company,
has been using SAS® Text Miner and SAS®
Contextual Analysis to derive insight from textual data sources including
customer surveys, cabin crew (flight attendant) feedback, engineering
technical logs, and company-internal social media posts. This paper
describes the Topic Discovery Tool, which is an application of SAS®
Enterprise Guide® to provide a customized user interface to
SAS Text Miner. Some of the challenges of text analytics and some tips and
tricks are discussed.
Simon Cumming, British Airways PLC
Session 3705-2019
Many surveys use open-ended questions to allow participants to express their
opinions. When conducting statistical analyses of results, the text
responses are often omitted or not analyzed properly. Existing text-mining
software might not always be available or suitable for use. Therefore, we
propose a practical alternative to perform text analysis using Base SAS®
9.4. Our approach incorporates a multi-level hierarchical data mining
structure developed by the Statistical Analysis Cell, AMEDDCS HRCoE. We
start the text mining process with capturing the entire sentence associated
with the keywords of interest. That step provides us the context in which
the particular keywords were used and helps to eliminate logical errors
related to possible misunderstanding of the respondents' intentions. Based on
the initial results, we propose the next step to define a more precise
search. The number of consecutive steps in the data mining process depends
on the level of granularity required. This data mining process can be
conducted in a stratified environment and enables statistical comparisons of
results. As the output statistics, we provide information about the percent
of respondents who used particular keywords in each open-ended question,
frequency profile (number of keywords per respondent), univariate analysis
of text strings, and number and percentage of respondents with multiple
keywords. We illustrate the process with a flowchart and results from survey
data.
Brandon Hosek, Joint Base San Antonio-Fort Sam Houston
Barbara Wojcik, AMEDDC&S HRCoE
Catherine Stein, Statistical Analysis Cell
Rebecca Humphrey, Statistical Analysis Cell
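The first step described above, capturing the whole sentence around a
keyword, can be sketched in Base SAS with a Perl regular expression (the
data set, variable, and keyword are hypothetical):

```sas
data hits;
   set survey;                  /* free-text variable: response */
   length sentence $500;
   retain re;
   if _n_ = 1 then
      re = prxparse('/[^.!?]*\btraining\b[^.!?]*[.!?]?/i');
   if prxmatch(re, response) then do;
      sentence = prxposn(re, 0, response);  /* whole matched sentence */
      output;
   end;
run;
```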
Session 3232-2019
Hypertext Transfer Protocol (HTTP) is the foundation of data communication
for the World Wide Web, which has grown tremendously over the past
generation. Many applications now live entirely in the web, using web-based
services that use HTTP for communication. HTTP is not just for browsers
though, since most web services provide an HTTP REST API as a means for
clients to access data. Analysts frequently find themselves in a situation
where they need to communicate with a web service or application from within
a SAS® environment, which is where the HTTP procedure comes
in. PROC HTTP makes it possible to communicate with almost every service out
there, coupling your SAS system with the web. Like the web, PROC HTTP
continues to evolve, gaining features and functionality in every new release
of SAS. This paper dives into all the capabilities found in PROC HTTP,
enabling you to get the most out of this fabulous procedure.
Joseph Henry, SAS
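As a flavor of the procedure, a minimal GET request with a custom header
might look like this (the URL is illustrative):

```sas
filename resp temp;

proc http
      url='https://httpbin.org/get'
      method='GET'
      out=resp;
   headers 'Accept' = 'application/json';
run;

/* Read the JSON response with the JSON libname engine */
libname rj json fileref=resp;
```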
Session 3496-2019
We're making it easy to deploy your models and decisions to numerous run-time
environments. However, the model life cycle doesn't end once the model is
created. Rather, it is just the beginning of the important phases of model
monitoring and analysis. This need extends to SAS® models and
open-source and Predictive Model Markup Language (PMML) models. In this
demonstration, you learn techniques for analyzing model performance,
integration with business metrics, and root cause analysis.
David Duling and Ming-Long Lam, SAS
Session 3224-2019
Experience and good judgment are essential attributes of underwriters.
Systematic analysis of the underwriting decision-making process was expected
to increase the efficiency in developing these attributes in underwriters.
However, this approach fell short of expectations. The industry still
struggles with the pace at which knowledge and experience is delivered to
the next generation of underwriters. The solution might lie in the
development and deployment of artificial intelligence (AI) methods and
algorithms in the automation of the underwriting decision-making process.
This paper outlines the current state of performance measurement of the
underwriting decision-making process through performance metrics
(including a novel one). Further, this paper provides an in-depth
description of artificial intelligence methods and algorithms that can be
successfully used to automate the underwriting decision-making process.
Real data from one of the
leading insurance companies was used for the analysis and testing of
proposed approaches.
Tanya Kolosova, InProfix Inc
Session 3188-2019
With the computational advances over the past few decades, Bayesian analysis
approaches are starting to be fully appreciated. Forecasting and time series
have Bayesian approaches and techniques, but most people are unfamiliar with
them due to the immense popularity of the exponential smoothing and ARIMA
class of models. However, Bayesian modeling and time series analysis have a
lot in common! Both are based on using historical information to help inform
future modeling and decisions. This talk compares the classical exponential
smoothing and ARIMA class models to Bayesian models with autoregressive
components. It compares results from each of the classes of models on the
same data set and discusses how to approach Bayesian time series models in
SAS®.
Aric Labarr, Institute for Advanced Analytics at NC State
Session 3897-2019
In epidemiological science, researchers aim to associate exposures to
specific health outcomes. In psychological sciences, researchers want to
investigate how an intervention improves quality of life. In the social
sciences, policy makers want to quantify how effective work release programs
are in reducing re-incarceration rates. When investigating these
relationships, it is tempting to conclude that an independent X causes a
dependent Y. However, though significant associations might exist, the
mechanisms behind those relationships are not always apparent. Often,
additional variables influence the impact of your independent X variable. To
accurately model these additional variables in inferential statistics, you
must classify them based on their hypothesized role in the relationship. In
some situations, it might be hypothesized that X acts on a mediating
variable, M, and by proxy exerts its effect on Y. To properly attribute
the significance and influence of mediating variables when performing
analyses, researchers can use causal mediation analysis (CMA). The CAUSALMED
procedure, introduced in SAS® 9.4, allows for the use of CMA
when looking at how X indirectly affects Y via M. In this paper, we
highlight how PROC CAUSALMED, in conjunction with various other regression
and pathway modeling procedures, was used to conduct CMA on data from
publicly available data sets to investigate how stress can act as a
mediating variable in health outcomes.
Daniel Muzyka and Matthew Lypka, Spectrum Health
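A minimal sketch of the setup described, with stress as the mediator M
(the data set and variable names are hypothetical):

```sas
proc causalmed data=health;
   model    outcome = exposure | stress;  /* treatment | mediator         */
   mediator stress  = exposure;           /* mediator modeled on exposure */
   covar    age sex;                      /* pre-treatment covariates     */
run;
```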
Session 3559-2019
In this paper, we present methods that combine prediction of a target
variable Y from predictor variables (X variables) with prediction of the
next few residuals from their predecessors based on their time series
correlation properties. One simple-to-use procedure that accomplishes this
is PROC AUTOREG, which, in addition, allows a certain type of heterogeneous
variance known as the ARCH and GARCH form. PROC AUTOREG has a nice outlier
detection algorithm that is illustrated in an example. It is a surprisingly
powerful procedure given its ease of use. Although PROC AUTOREG allows only
stationary autoregressive error terms, it provides a useful initial analysis
of the data even when a user intends to later fit a more sophisticated model
that deals possibly with non-stationary errors and error correlation
structures of the autoregressive moving average type. This can be done with
PROC ARIMA or PROC MIXED. The talk illustrates the similarities and
differences among these procedures. The focus of the talk is on basic ideas
and examples rather than on rigorous theory.
David Dickey, NC State University
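A sketch of the combined approach, regression on predictors with
autoregressive errors and a GARCH variance model (the data set and
variables are hypothetical):

```sas
proc autoreg data=series;
   /* regression on X variables, AR(2) errors, GARCH(1,1) variance */
   model y = x1 x2 / nlag=2 garch=(p=1, q=1) method=ml;
   output out=pred p=predicted r=residual;
run;
```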
Session 3747-2019
The FREQ procedure is essential to anybody using Base SAS® for
analyzing categorical data. This paper presents the various utilities of
PROC FREQ that enable effective data analysis. The cases presented include a
range of utilities such as finding counts, percentages, unique levels or
records, Pearson chi-square test, Fisher's exact test, McNemar test,
Cochran-Armitage trend test, binomial proportions test, relative risk, and
odds ratio. In addition, this paper shows the Output Delivery System (ODS)
features available to effectively display your results. All the cases
presented in this paper prove the categorical might of PROC FREQ beyond
doubt.
Jinson Erinjeri, Emmes Corporation
Saritha Bathi, Celgene Corporation
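Several of the analyses listed above can be requested with TABLES-statement
options (the data set and variables are hypothetical):

```sas
proc freq data=trial;
   tables trt*response  / chisq fisher relrisk riskdiff; /* tests, RR, OR */
   tables dose*response / trend;                  /* Cochran-Armitage trend */
   tables pre*post      / agree;                  /* McNemar test           */
run;
```

For a 2x2 table, the RELRISK option produces both the relative risk and
the odds ratio.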
Session 3094-2019
The first thing you need to know is that SAS® software stores
dates and times as numbers. However, this is not the only thing that you
need to know. This presentation gives you a solid base for working with
dates and times in SAS. It introduces you to functions and features that
enable you to manipulate your dates and times with surprising flexibility.
This paper shows you some of the possible pitfalls with dates (and times and
datetimes) in your SAS code and how to avoid them. We show you how SAS
handles dates and times through examples, including the ISO 8601 formats
and informats, and how to use dates and times in TITLE and FOOTNOTE
statements.
The presentation closes with a brief discussion of Excel conversions.
Derek Morgan, PAREXEL International
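A few of the features discussed, date literals, interval functions, an
ISO 8601 format, and a date in a TITLE statement, can be sketched as
follows (all values are illustrative):

```sas
data _null_;
   d = '14feb2019'd;                        /* date literal            */
   next_month = intnx('month', d, 1, 'b');  /* start of the next month */
   days_left  = intck('day', d, '31dec2019'd);
   put d= date9. next_month= e8601da. days_left=;
run;

title "Report run on %sysfunc(today(), worddate.)";
```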
Session 3228-2019
Few data science programs and even fewer analytics programs require courses
in ethics. This has downstream consequences for businesses and for society.
Although universities do a (fairly) good job at teaching students what they
CAN do with data, they are less adept at teaching students what they SHOULD
(or SHOULD NOT) do with data. This talk goes through three case studies that
explain the unintended consequences of data science in application without
ethical guide rails. The presenter discusses the obligations that
universities have related to teaching ethical data science.
Jennifer Lewis Priestley, Ph.D., Kennesaw State University
Session 3432-2019
SAS® 9.4M6 enables SAS® programmers to create
PDF reports that fully meet the Web Content Accessibility Guidelines 2.0
(WCAG 2.0) conformance requirements. These are the guidelines that
government and industry use to determine whether electronically produced
output is usable by people with disabilities. With the accessibility
features in SAS 9.4M6, it is possible to create reports that require zero
post-processing remediation work, thus saving you time and money. By using
the PRINT, TABULATE, and REPORT procedures, and the ODS Report Writing
Interface (RWI), SAS 9.4M6 can prompt you to address certain accessibility
problems detected in your code, create tables of data that are structured to
be fully accessible to users, and add alternative text for images inserted
into your reports. This paper demonstrates how to use these reporting
procedures to create accessible PDF reports.
Greg Kraus, SAS
Session 3364-2019
Virtual Reality (VR) technology has been the next big thing for a few years
now, mainly focused on the consumer market in gaming and video playback
applications. More recently, better hardware and usability have started
making VR increasingly attractive for professional applications. Especially
in the world of big data and advanced analytics, having access to an
immersive, infinite, easily navigated visualization space opens an entire
new world of data exploration and workflow. This paper showcases one such
immersive workflow for inspecting social networks in the context of
financial fraud investigations, one that takes advantage of SAS®
Viya® to transparently interact with its advanced analytics
and machine learning components to create an intuitive, interactive, and
productive analyst experience. Buckle up and enjoy the ride!
Falko Schulz, SAS Australia
Nikola Markovic, Boemska United Kingdom
Session 3061-2019
Three-dimensional chess uses two or more chess boards so that a chess piece
can traverse several boards according to the rules for that piece. Thus, the
knight can remain on the board where it resides or move one or two steps to
a successive board, then move its remaining steps. In three-dimensional
chess, the Knight's Tour is a sequence of moves on multiple 8x8 chess
boards so that a knight visits each square only once. Thus, for three
boards, there would be 192 squares visited only once. The paper "The
Knight's Tour in Chess: Implementing a Heuristic Solution" (Gerlach 2015)
explains a SAS® solution for finding the knight's tour on a single board,
starting from any square on the board. This paper discusses several
scenarios and solutions for generating the knight's tour in
three-dimensional chess.
John R Gerlach; DataCeutics, Inc.
Scott M. Gerlach; Dartmouth College
Session 3707-2019
Missing numerical data is a common feature when working with SAS®
data sets. The extent of missing data and their relationships to other types
of numerical data, whether complete or those with missing values, are often
not evident by visual inspection of the data set alone. When the number of
observations is not too large (a maximum of a few hundred) or when the data
can be divided into subgroups based on levels of categorical data, a missing
variable plot can be constructed by assigning rows of the plot to subjects
and columns to the numerical variables. For the variables of interest
(columns), a thin horizontal line indicates whether a value for that subject
is present (printed as a shade of blue to indicate its relative magnitude)
or whether the value is missing (a bright red). The distribution of colors
in this plot helps to visually evaluate whether the data are missing at
random (MAR), a feature necessary to apply data imputation techniques with
the SAS procedures MI and MIANALYZE. Each column of the graph is also
labeled with the number and percent of the observations that are missing.
The input data set to make the graph with the SGPLOT procedure is produced
with a series of DATA steps, along with the RANK procedure. In addition, an
annotation data set prints information on the graph about the extent of
missing data.
Robin High, University of Nebraska Medical Center
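The final plotting step could be sketched with the HEATMAPPARM statement
of PROC SGPLOT, assuming a long-format input data set (its name and
variables are hypothetical) with one row per subject/variable cell. One
possible approach leaves cells with a missing color response undrawn, so
a red wall color shows through for missing values:

```sas
proc sgplot data=plotdat noautolegend;
   styleattrs wallcolor=red;   /* missing cells show the wall color */
   heatmapparm x=varname y=subject colorresponse=ranked_value /
               colormodel=(lightblue darkblue);
run;
```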
Session 2977-2019
Excellus Blue Cross Blue Shield is a regional health plan in Central and
Western New York. We run SAS® 9.4 in a grid environment. Our
enterprise provisioning processes are not optimally integrated, so acquiring
and managing data on users' access can be challenging. Given our highly
sensitive health care data, regulatory and compliance requirements, and the
current security climate, effective management and reporting on user access
is critical to our success. This paper describes how we used SAS®
macros, SAS metadata, SAS® Schedule Manager, UNIX scripting,
human resources information system data, Lightweight Directory Access
Protocol (LDAP) data, and some crafty programming to manage accounts and
generate consistent, comprehensive, and timely reporting on user access to
corporate data resources. This paper is for SAS administrators and managers
who require quick access to comprehensive and accurate information regarding
who in their enterprise has access to sensitive data stored in a complex
environment.
Hugh G. McCabe, Blue Cross Blue Shield
Session 3199-2019
Test automation is an integral component of modern software development.
Many programming languages and the communities built around them either
enjoy integrated test libraries as part of the language's standard
library or have adopted a de facto standard such as Python's unittest
module or Java's JUnit library. Many libraries that have been developed
or promoted in the
SAS® community have been adopted to varying degrees. This
includes notable projects such as SASUnit® and the SAS
Operational Quality testing tool. All of these libraries are focused on the
programmer and geared toward unit testing approaches. SUIT is a new testing
tool offered to the community that not only provides familiar JUnit-style
test assertions for the SAS programmer, but also keyword-driven test-case
development more suitable for business-focused SAS users. It implements
automated user acceptance testing. With a wide range of out-of-the-box test
cases for SAS® 9 and SAS® Viya®,
users can quickly develop robust test cases. The extendable architecture
easily enables business units to develop their own keyword libraries. Using
the power of SAS, tests can be developed across the full lifecycle of your
development, from data to user interface. Test results integrate into common
continuous integration tools such as Jenkins and Bamboo. A
modern-browser-based interface provides a seamless interface to SAS
middle-tier applications.
Cameron Lawson, Selerity
Session 3241-2019
If you have been programming in SAS® for any length of time,
you are quite familiar with the slogan "The Power to Know." After more than 15
years of programming in this robust solution, it goes without saying that I
am a firm believer in the truth of this statement. Undoubtedly though, there
is a second unspoken and critical truth tied to the success of SAS within an
organization -the power of SAS is nothing without the users Power of
Know-How. Developing a strong foundation in programming best practices;
writing stable, efficient code; and building automated solutions are all
ways in which you can begin to fully appreciate the power of SAS (and make
yourself more appealing to employers to boot). There is a difference between
having a working knowledge of SAS and being able to really program in SAS.
As someone who has grown as a SAS programmer over the years and now teaches
SAS in an online graduate certificate program, I have either lived or
witnessed many of the hurdles and stumbling blocks that folks can encounter
when they want to transition from an ad hoc user of SAS to a more technical
practitioner. This paper provides practical tips on how to add to your SAS
toolbox and how to apply that knowledge to create production-ready code. A
deeper understanding of key SAS concepts helps you become more effective in
your current organization and it might even land you that ideal job.
Gina Huff, Lumeris
Session 3026-2019
Many customers expect batch and programming approaches to manipulate SAS®
Visual Analytics reports in the areas of report creation, customization,
localization, and distribution. By using the REST API and Command-Line
Interface (CLI) with SAS® Viya® 3.4, SAS®
provides the power to meet these requirements. In this paper, we demonstrate
how to do it step-by-step with typical business scenarios. You learn how to
create or edit SAS® Visual Analytics reports programmatically,
how to localize a SAS Visual Analytics report based on the standard report,
how to customize the email content before distributing the reports in batch,
how to change the report distribution recipients automatically with a batch
job, and finally how to kick off the distribution right after the data is
refreshed automatically. This paper will be helpful for both SAS consultants
and customers who work with SAS Visual Analytics.
Rain Kang, SAS
Session 3563-2019
Decades of research have documented the effects of various social factors on
health status and health outcomes. This research project examines the
relationship between three measures of social determinants of health (SDH)
and two measures of avoidable outcomes of care. The SDH measures include a
validated, self-reported survey called Assess My Health (AMH); a subset of
ICD-10 supplemental Z Codes related to patient social and psychosocial
factors from claims data; and 3Ms SDH tool using publicly available small
area data. Avoidable outcomes of care used are 3Ms potentially preventable
readmission (PPR) method and potentially preventable emergency department
visits (PPV) method. The study population is enrollees in a Midwest Medicaid
plan. Patients are stratified according to clinical risk for outcomes using
3Ms Clinical Risk Grouping software with covariates including age and sex.
Statistical modeling of these outcomes allows for the calculation of
predictive probabilities and model diagnostics from an underlying
categorical regression model. Predictive probabilities are then compared for
each singular tool and in combination using standard model selection
processes. A training set of approximately 65% of the response data is used
to test the modeling. The remaining 35% is used as a verification data set.
Implications for risk adjustment using SDH are discussed, as well as
methodologies for comparing predictive probabilities for model selection.
Paul Labrec, Ryan Butterfield, Laura Soloway, and Melissa
Gottschalk, 3M Health Information Systems
Session 3110-2019
Most applications of the SAS® hash object focus on its table
lookup facility, mainly to combine data. While it is indeed a great lookup
tool, its functionality is much broader: As a data structure, the hash
object comes equipped with all common table operations, which means that it
can be also used for data aggregation, splitting, unduplication, and so on.
And, since these operations are executed in memory in constant time, they
tend to outperform other SAS alternatives. Furthermore, the run-time nature
of the hash object, its autonomous I/O facility, and its ability to point to
other objects make it possible to write dynamic and easily parameterized
programs without hard coding. In this paper, we outline the functionality of
the hash object well beyond table lookup as a compendium of the book Data
Management Solutions Using SAS Hash Table Operations: A Business
Intelligence Case Study, recently published by SAS® Press.
Paul Dorfman, Dorfman Consulting
Don Henderson, Henderson Consulting
Services, LLC
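As one example beyond lookup, the aggregation use case could be sketched
as a single-pass summary with no prior sorting (the data set and
variables are hypothetical):

```sas
data _null_;
   if _n_ = 1 then do;
      dcl hash h (ordered:'a');
      h.defineKey ('state');
      h.defineData('state', 'total');
      h.defineDone();
   end;
   set sales end=last;
   if h.find() ne 0 then total = 0;   /* new key: initialize aggregate */
   total = sum(total, amount);
   h.replace();                       /* store the updated total       */
   if last then h.output(dataset:'state_totals');
run;
```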
Session 3165-2019
As managers, we are often asked to ensure that our business processes are
optimized. There are many software solutions for simply tracking work. But
true process optimization can be attained only by carefully examining inputs
and studying how varying those factors influences the outcome. Without
specialized tools such as SAS® Simulation Studio, obtaining
accurate results is very challenging due to the complexity of modern
business. In this paper, we use SAS Simulation Studio to conduct experiments
of how changes in work volume and resource availability impact process
efficiency and capacity of an essential business process. Modeling a process
in a simulated environment allows for true A-B testing. The point-and-click
interface is easy to use and empowers even novice users to translate
real-world scenarios into reliable estimates. The desktop deployment makes
it easy to install for businesses of any size, and integration with JMP®
enables quick analysis of results, which promotes experimentation and
data-driven decision-making. Business process management is often called a
science and an art; analytics helps ensure that the science is solid.
Dave Manet, SAS
Session 3344-2019
There are many languages that co-exist in the ecosystem of your SAS®
toolbox. This Hands-On Workshop teaches you how to use four SAS languages -
Base SAS, PROC SQL, Perl language elements, and the SAS® Macro
Language - to help you manipulate and investigate your data. Learn to
leverage these powerful languages to check your data with simple, yet
elegant techniques such as Boolean logic in PROC SQL, operators such as the
SOUNDS-LIKE operator in the DATA step and PROC step, functions such as the
SCAN function in the DATA step, efficient checking of your data with Perl
regular expressions, and last but not least, the amazing marriage between
PROC SQL and the SAS Macro Language to hold data you just found in a
variable that you can use over and over again. This workshop focuses on
coding techniques for data investigation and manipulation using Base SAS.
Charu Shankar, SAS
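A taste of these languages working together can be sketched in a few
lines (sashelp.class ships with SAS; the output table name is arbitrary):

```sas
proc sql noprint;                  /* SQL feeding the macro language */
   select name into :teens separated by ', '
   from sashelp.class
   where age between 13 and 15;
quit;
%put NOTE: Teenagers found: &teens;

data check;                        /* DATA step, SOUNDS-LIKE, and Perl */
   set sashelp.class;
   where name =* 'Jon';            /* SOUNDS-LIKE operator */
   valid = prxmatch('/^[A-Z][a-z]+$/', strip(name)) > 0;
run;
```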
Session 3050-2019
If the goal is to improve student outcomes, intervention analytics is
required! The last few decades have seen a tremendous growth in the depth
and clarity of big data in education, including a greater emphasis on using
analytics to improve student outcomes. Despite the genesis and transition
from data-driven decision-making to actionable data to visual analytics, has
there really been a systemic transformation in the effective use of
analytics in education? This paper introduces the use of intervention
analytics to improve the educational outcomes for all students. A key
element of this process is reliance on the single-source SAS®
model that transforms data, integrates an innovative analytics model,
identifies unique educational interventions, and transitions information to
web-based platforms.
Sean Mulvenon, University of Nevada, Las Vegas (UNLV)
Session 3742-2019
Learning SAS® for the first time as a budding epidemiologist
in graduate school, I remember being amazed at how powerful and customizable
SAS was in making sense of huge amounts of data. I also remember being very
intimidated by all the moving parts involved in learning such a program! My
study and use of SAS has continued since that time, specifically now as a
Doctor of Public Health student. I have often wondered how much farther
along my research would be at this point had I learned SAS in high school.
When one of my professors recently told me that resources are available for
teaching SAS at that level, it prompted me to ask a local high school
teacher what she thought about teaching SAS to her students. When she
replied that she thought it was a great idea but did not know how she could
do this, I realized this was a gap that needed to be filled. The aim of this
presentation is to raise the awareness of educators about the importance of,
and resources available for, teaching SAS to high school students.
Jennifer Richards, Florida A & M University
Session 3295-2019
The new tmCooccur action in SAS® Viya® 3.4
detects significant co-occurrences of terms in sentences. It can identify
patterns and nuances such as the following: 1) the same word used with
different meanings (for example, "They met along the bank of a river"
versus "They made a deposit in a bank"); 2) verbs used as regular verbs
versus verb-particle combinations (for example, "shut the heck up" versus
"shut the door, please"); and 3) less common uses of negation, such as
"He liked the show very much" versus "He liked the show not at all." This paper
describes how the results of the tmCooccur action can be used to generate
word embeddings on the basis of local information (in a manner similar to
GloVe and Word2vec), and also how it can be used to create virtual terms
that represent the most significant term co-occurrences. These new composite
terms can then be used to create better topic, concept, and category
definitions. Doing so can generate significant gains in lift for predictive
modeling on some real-world data. Finally, this paper describes how to use
the tmCooccur action with other kinds of transactional data, including
IoT (Internet of Things) and genomic data.
James Cox and Russell Albright, SAS
Session 3316-2019
The last few years have been an exciting time for the map geeks at SAS. We've
added a lot of geospatial functionality throughout SAS®, and
we can't wait to show it off. This paper goes over all the additions,
including GeoEnrichment, geocoding, the SGMAP procedure, contours, WebMap
support, routing, location pins, clustering, and custom polygon support. We
also show you the enhancements in progress and detail the roadmap for the
future.
Jeff Phillips, Tony Graham, and Scott Hicks, SAS
Session 3561-2019
It's impossible to know all of SAS® or all of statistics. There
will always be some technique that you don't know. However, there are a few
techniques that anyone in biostatistics should know. If you can calculate
those with SAS, life is all the better. In this session, you learn how to
compute and interpret a dozen of these techniques, including several
statistics that are frequently confused. The following statistics are
covered: prevalence, incidence, sensitivity, specificity, attributable
fraction, population attributable fraction, risk difference, relative risk,
odds ratio, Fisher's exact test, number needed to treat, and McNemar's test.
The 13th, an extra bonus tool, is the ability to produce statistical
graphics using the Output Delivery System (ODS) in conjunction with
SAS/STAT® software. With these tools in your tool chest, even
non-statisticians or statisticians who are not specialists will be able to
answer many common questions in biostatistics. The fact that each of these
can be computed with a few statements in SAS makes the situation even
better.
AnnMaria De Mars, The Julia Group
Session 3664-2019
In this paper, we describe how to use SAS® Customer
Intelligence for campaign execution and post-campaign reporting, and how to
avoid the three most common campaign execution hurdles. SAS Customer
Intelligence is an integrated marketing management suite. It combines
multichannel campaign management, marketing resource management, and
marketing performance management with strong data management and a wide
range of analytic capabilities. Using an example of a monthly summary
campaign, this paper helps you understand SAS Customer Intelligence, what it
is, what its possibilities are, and where it can be used. We present three
challenges to address using SAS Customer Intelligence in the hospitality
industry: 1) Generating one row per guest with different offers; 2) Creating
a marketing history ID; and 3) Creating an output file with order
information. We also describe the main benefits of using SAS Customer
Intelligence in campaign execution and how it can be implemented.
Al Cordoba and Pratyush Gupta, Qualex Consulting
Services
Session 3644-2019
The SGPLOT procedure is a powerful procedure for creating various graphics.
Sometimes, just a graphic is not satisfying, and you might want to add some
text to it. This presentation talks about three powerful and easy-to-learn
ways to customize text in your figures by using special statements and
options in PROC SGPLOT.
Yaqi Jia, Johns Hopkins University
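The abstract does not name the three methods; as one illustration of adding custom text in PROC SGPLOT, the TEXT and INSET statements can place data-driven labels and free-standing annotations on a graph (a sketch using the SASHELP.CLASS sample data, not necessarily the paper's exact techniques):

```sas
proc sgplot data=sashelp.class;
   scatter x=height y=weight;
   /* place a text label at each point's data coordinates */
   text x=height y=weight text=name / position=top;
   /* add a free-standing text box inside the plot area */
   inset "Source: SASHELP.CLASS" / position=bottomright;
run;
```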
Session 3528-2019
Would you like to know how the clinical trial you design today might perform
in the future? Will your design be able to reach the target patient
enrollment fast enough while staying within budget? Come and learn how
SAS® Clinical Trial Enrollment Simulator, built on SAS/OR®
software, provides pharmaceutical and clinical research companies with the power
to predict the future performance of their clinical trial enrollment
designs.
Bahar Biller, Anup Mokashi, Jinxin Yi, Ivan Oliveira,
and Jim Box, SAS
Session 3376-2019
SAS® Visual Investigator provides users with the flexibility
to build applications that are customized to their business needs. This
flexibility includes the ability to define how data is analyzed to generate
alerts, how data is indexed to enable users to search, how data is presented
to the users, what data users can enter into the system, and what workflows
are in place with respect to data entry. As a result, SAS Visual
Investigator can be used for applications in many different markets. This
paper covers some of the best practices for building applications on top of
SAS Visual Investigator to ensure maximum performance and utilization of
available resources.
Gordon Robinson, SAS
Session 3812-2019
Starting a new SAS® project can be very stressful in the
beginning, but there are SAS functions that you can use in the initial stage
of a project to explore the data and the business processes. Like many other
consultants, I have developed a checklist that has become vital to my
personal onboarding. I routinely use it at the beginning of every project.
In this paper, I share this checklist with you, along with functions I use
myself, as well as a few examples to better demonstrate my process. I hope
they serve as an inspiration for other SAS consultants and programmers to
improve their effectiveness and success as they embark on new journeys.
Flora Fang Liu, Independent Consultant
Session 3161-2019
Although not as frequent as merging, vertically combining SAS® data sets is a
data manipulation task that SAS programmers are required to perform.
SAS provides multiple techniques for appending SAS data sets, which is
otherwise known as concatenating, or stacking. There are pitfalls and
adverse data quality consequences for using traditional approaches to
appending data sets. There are also efficiency implications with using
different methods to append SAS data files. In this paper, with practical
examples, I examine the technical procedures that are necessary to follow to
prepare data to be appended. I also compare different methods that are
available in Base SAS® to append SAS data sets, based on
efficiency criteria.
Jay Iyengar, Data Systems Consultants
Session 3543-2019
The number of vendors providing solutions for predicting student success
keeps increasing. Each vendor claims to be able to use your data and to
predict the likelihood of retention or graduation for you. This all comes
with a financial cost and required time and effort to prepare and update
data to feed their systems. Is creating an in-house system a better
solution? This paper walks through one way to make that determination. If
your institution has gone through changes impacting the continuity of data
or has made other significant structural changes, finding an off-the-shelf
solution can be more challenging. Should you include additional variables
that the commercial products do not consider? A side benefit of creating the
model in-house is validating multiple models on past students as well as
working with those who will be using the model output to generate buy-in.
Rolling out a commercial solution is of no benefit if those using it do not
trust the results. SAS® has the tools to do the modeling
whether you use Base SAS®, SAS® Enterprise Miner
(TM), or SAS® Visual Analytics. Why not leverage what you have
to support your students' success?
Stephanie Thompson, RIT / Datamum
Session 3609-2019
Are you a SAS® infrastructure architect? Do you want to know
what has worked for other SAS customers and what things should be avoided?
Attend this presentation to learn from others' experiences with SAS. We also
encourage you to share your experiences with us.
Margaret Crevar, SAS
Session 3949-2019
At least three human vaccine quality scandals broke out in China in the past
five years. The recent one that was reported in July 2018 triggered an
outcry from Chinese parents on the internet. They were eager to find out
whether the vaccines that their children got were at risk. However, there
was no complete vaccine tracing database that was available to the public
for easy searching. We gathered public vaccine sourcing information and
consolidated the data to make a comprehensive database for the general public to
check their vaccine sources. We used Base SAS® for web
scraping, data cleansing, and data consolidation. Our database is used to
provide an easy way to check the risk associated with vaccine incidents.
Yizhe Ge, Donghua University
Tracy Song-Brink, North Carolina State
University
Session 3687-2019
In binary logistic models for credit risk and marketing applications, as
well as for many social science studies, there are ordinal predictors that
are considered for use in the model. The modeler faces the question of how
to use these ordinal predictors. Choices include conversion to dummy
variables, recoding as weight of evidence, or assuming an interval scale for
the ordinal levels and entering the effect as linear (or possibly as some
transformation). The use of dummies or weight of evidence adds parameters to
the model (for weight of evidence, the added parameters are implicit). This
paper provides a procedure to aid in decision-making regarding the handling
of ordinal predictors. First, a measure is computed of the monotonic
tendency of the ordinal versus the target (the dependent variable). If the
ordinal is nearly monotonic, then a simple numeric recoding of the ordinal
is made. This imposes an interval scale. Is this recoding appropriate? To
answer this question, a model comparison test is made between the saturated
model (all dummies) and the model with the numeric (recoded ordinal)
predictor. Acceptance of the null hypothesis of no difference allows the
numeric (recoded ordinal) to be considered for inclusion in the model.
SAS® code that carries out the ordinal to numeric recoding as
well as the model comparison test is included. This code can process many
ordinal predictors using an efficient, data-driven approach that does not
require hard coding.
Bruce Lund, Independent Consultant
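The model comparison test described above can be outlined as two PROC LOGISTIC fits (a sketch only; the data set and variable names are hypothetical, and this is not the author's exact code): fit the ordinal as a CLASS (saturated) effect, fit its numeric recoding, and compare the difference in -2 Log L to a chi-square distribution.

```sas
/* Saturated model: ordinal predictor entered as dummies */
proc logistic data=have;
   class x_ord / param=ref;
   model y(event='1') = x_ord;
   ods output FitStatistics=fit_sat;
run;

/* Restricted model: numeric recoding of the ordinal */
proc logistic data=have;
   model y(event='1') = x_num;
   ods output FitStatistics=fit_num;
run;

/* The test statistic is the difference in -2 Log L between the two
   models, with df = (number of levels of x_ord) - 2. */
```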
Session 3680-2019
One of the most common concerns for large organizations is the risk of
performance degradation resulting from an increase in demand for SAS®
workloads. The most common pattern in a Linux platform is the delegation of
authentication requests to the corporate Active Directory via Pluggable
Authentication Modules (PAM) using the System Security Services Daemon
(SSSD), an area that is not yet sufficiently understood or explored. A
SAS administrator is expected to minimize the impact that an increase in
user concurrency brings to the SAS login elapsed time, which can cause
performance degradation. Did you know that such performance problems can be
addressed by enabling and configuring a brand new multithreaded SAS
authentication daemon available on the latest version of SAS? Did you also
know that the authentication process can be further improved by configuring
SSSD caching options, which will avoid multiple calls to Active Directory by
the same user session? Through a combination of both these daemons, we can
remarkably improve the performance of your SAS® Platform to
support a large number of concurrent users. It's time to face your daemons.
Sirshendu (Sirsh) Ghosh, Royal Bank of Scotland
Session 3684-2019
When running SAS® programs that use large amounts of data or
have complicated algorithms, we often are frustrated by the amount of time
it takes for the programs to run and by the large amount of space required
for the program to run to completion. Even experienced SAS programmers
sometimes run into this situation, perhaps through the need to produce
results quickly, through a change in the data source, through inheriting
someone else's programs, or for some other reason. This paper outlines twenty
techniques that can reduce the time and space required for a program without
requiring an extended period of time for the modifications.
Stephen Sloan, Accenture
Session 3426-2019
Although organizations are increasingly leveraging SAS®
Viya®, many have a substantial investment in SAS®
9.4 applications and will for some time. This paper looks at a new feature
introduced in SAS Viya 3.4 that provides single sign-on with SAS 9.4,
enabling users to move seamlessly between web applications in both
environments without having to log on again, and making it possible to use
them collaboratively (for example, displaying a SAS® Visual
Analytics report from SAS Viya in the SAS® Information
Delivery Portal).
Mike Roda, SAS
Session 3433-2019
Hybrid vehicles let you use the best technology for a given scenario,
blending fuel efficiency with long driving range, and low emissions with
quick refueling. It's like two cars in one: two motors magically complementing
each other under the hood, invisible to the driver. And as the driver, you
have just one control panel that chooses the best engine for the
circumstances, adjusting to changes in real time. We marketers face a
similar dilemma: we have different engines for different scenarios. Not only
do we have to learn how to drive multiple systems to address the diversity
of interactions we have with our customers, we must make them work together
seamlessly to deliver delightful customer experiences. Sounds more like
Marketing: Impossible! Are you thinking what I'm thinking? Yes! What the
world needs now is a revolutionary approach: hybrid marketing! The good news
is that SAS has been working hard on this and already has an answer for you.
In this session, you learn what hybrid marketing is and why it's important to
your marketing success. You learn how this approach leverages your existing
investment in SAS®, and keeps your secret-sauce analytics and
your data safe, as well as keeps your business processes in compliance with
privacy-related regulations. We share our unique approach to this need,
showcase a few analytic tricks we've already built for marketers, and
describe in both business and technical terms how hybrid marketing can work
for you.
Andy Bober and Steve Hill, SAS
Session 3459-2019
This paper shows how you can use SAS® Visual Data Mining and Machine Learning and other SAS products to build and compare various predictive models. First, you use SAS Visual Data Mining and Machine Learning to create several models and you choose one of them as your champion model. You can publish all these models to different destination types such as Hadoop, Teradata, or SAS® Cloud Analytic Services (CAS). You use SAS® Embedded Process to score the data against these published models where the data reside. You can also register the models to SAS® Model Manager and compare them against other models for final champion model selection. You can then test these models to validate them for scoring. If you notice a degradation in the model, you can retrain the model. Retraining the model triggers a run of all the pipelines in the associated Visual Data Mining and Machine Learning project, and the recalculated project champion is automatically registered back to Model Manager. In addition, you can score streaming data by using SAS® Event Stream Processing on the models in Model Manager. SAS Visual Data Mining and Machine Learning also provides a scoring API that enables you to score models directly in Model Studio by using RESTful interfaces. This paper shows how you can unleash the full power of your models by taking advantage of the model processing capabilities in all these SAS products.
Shawn Pecze,
Xin Chi,
Prasanth Kanakadandi, and
Byron Biggs, SAS
Session 3370-2019
Are you a SAS® programmer searching for insights within Salesforce data? Accessing and analyzing your information is now easier than ever before with the all-new SAS/ACCESS® Interface to Salesforce. This paper provides you with an overview of how to explore Salesforce objects as if they were native SAS data sets. Learn how to directly execute the SOQL language used by Salesforce by using the SQL procedure, and take advantage of the advanced features of implicit SQL to let SAS do some of the work for you. With the help of SAS/ACCESS Interface to Salesforce, you'll begin to understand your data in no time!
Kevin Kantesaria and
Mason Morris, SAS
Session 3622-2019
An all-too-common practice in data analytics is the use of pre-defined business groupings to drive analyses and subsequent decision-making. Product categories drive assortment layouts in retail, but are they really indicative of what shoppers buy at the same time? Geographies define divisions for sales-oriented companies, but do customers in neighboring states really behave the same? In the Compliance Technology Analytics group at Wells Fargo, account types from an existing vendor solution were driving the peer-to-peer comparisons for the anti-money laundering (AML) transaction program that covers our securities business. Accounts of the same type, or peer group, are compared to one another to identify anomalies and out-of-bounds behavior for AML alerting. However, not all accounts in the same peer group are created equal, resulting in sub-optimal comparisons and less valuable alerts for our investigative team. Using SAS® Enterprise Miner (TM), we built more meaningful account peer groups with k-means clustering. With the new clusters, we have an apples-to-apples comparison of accounts when performing transaction monitoring and anomaly detection.
Laura Rudolphi and
Chris Robinson, Wells Fargo
Session 3722-2019
The advantage of an indexed Base SAS® engine data set is that when it is queried with a WHERE clause, the query might be optimized so that index subsetting works faster than a sequential read. Unfortunately, the Base SAS engine is unable to use more than one index at a time to optimize a WHERE clause. In particular, when the WHERE clause contains an OR condition between two different (indexed) variables, the Base SAS engine does not optimize it and executes a sequential read. This presentation offers a simple, hash table-based solution to the following question: How do you handle a situation in which you have a WHERE clause with an OR condition on two different indexed variables and you want to use the advantage of indexing to subset the data faster?
Bartosz Jablonski, Warsaw University of Technology / Citibank Europe PLC
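One way to sketch the idea (the data set, key, and variable names are hypothetical, and this is not necessarily the author's exact method): run two single-index WHERE= reads, and use a hash object keyed on a unique row identifier so that rows satisfying both conditions are output only once.

```sas
/* Assumes BIG has simple indexes on K1 and K2 and a unique key ID.
   Each WHERE= involves only one indexed variable, so each read can
   use its index; the hash object de-duplicates the union. */
data want;
   if 0 then set big;                  /* define the PDV */
   declare hash seen();
   seen.defineKey('id');
   seen.defineDone();
   /* pass 1: indexed read on K1 */
   do until (done1);
      set big(where=(k1 = 'A')) end=done1;
      rc = seen.add();                 /* remember this ID */
      output;
   end;
   /* pass 2: indexed read on K2, skipping rows already written */
   do until (done2);
      set big(where=(k2 = 'B')) end=done2;
      if seen.check() ne 0 then output;
   end;
   stop;
   drop rc;
run;
```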
Session 3773-2019
Did you ever wish you could use the power of SAS® to take control of Microsoft Excel and make Excel do what you wanted when you wanted? Well, one letter is the key to doing just that: the letter X, as in the SAS X command that opens the door to all operating system commands from SAS. The Microsoft Windows operating system comes with a facility to write a series of commands called scripts. These scripts have the ability to open and reach into the internals of Excel. Scripts can load, execute, and remove Visual Basic for Applications (VBA) macro code and control Excel. This level of control enables you to make Excel do what you want, without leaving any traces of a macro behind. This is Power.
William E Benjamin Jr, Owl Computer Consultancy, LLC
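As a minimal sketch of the pattern (the file paths, workbook, and VBA macro name "FormatReport" are hypothetical), SAS can write a Windows script with a DATA _NULL_ step and then hand it to the operating system with the X command:

```sas
options noxwait noxsync;

/* write a small VBScript that reaches into Excel */
data _null_;
   file "c:\temp\drive_excel.vbs";
   put 'Set xl = CreateObject("Excel.Application")';
   put 'xl.Workbooks.Open "c:\temp\report.xlsm"';
   put 'xl.Run "FormatReport"';       /* run a VBA macro in the workbook */
   put 'xl.ActiveWorkbook.Save';
   put 'xl.Quit';
run;

/* the X command passes the script to the operating system */
x 'cscript //nologo c:\temp\drive_excel.vbs';
```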
Session 3651-2019
The United States Census Bureau (USCB) Demographic Systems Division (DSD, collectively USCB-DSD) utilizes DataFlux® Data Management Server and DataFlux® Data Management Studio for data management and cleansing. USCB-DSD accesses DataFlux Data Management Studio via a Microsoft Windows virtual desktop (vDESK) where users can perform data management tasks unique to their individual program area. Due to USCB security restrictions, users do not have access to the local Windows program files where the DataFlux Data Management Studio configuration is stored by default. Users also do not have the capability to establish a System Data Source Name (DSN) as required to make the connection to the repository. To solve these problems, USCB-DSD implemented a shared repository consisting of an Oracle database management system (DBMS) with the DataFlux data connection configuration saved on a shared file system and user credentials saved in the SAS metadata. This paper discusses how the USCB uses DataFlux Data Management Studio, the architecture of the USCB environment, the administrative problems encountered due to security restrictions, and a solution that minimizes the administrative problems.
Andre Harper and Matthew Hall, United States Census Bureau
Marc Price, Tom Gaughan, and Michael Bretz
Session 3051-2019
This case study provides a real-world example of how Base SAS® was used to read in more than 185 Excel workbooks to check the structure of more than 10,000 worksheets, and then to repeat the process quarterly. It illustrates how components such as the SASIOXLX (XLSX) Engine, PROC SQL (to create macro variables), SAS® dictionary tables, and SAS macros were used together to create exception reports exported to Excel workbooks. The structure of the worksheets, such as worksheet names and variable names, were checked against pre-loaded templates. Values in the worksheets were checked for missing and invalid data and percent differences of numeric data. And, it was verified that specific observations were included in specific worksheets. This case study describes the process from its inception to its ongoing enhancements and modifications. Follow along and see how each challenge of the process was undertaken and how SAS Global Forum papers and presentations contributed to this quality check process.
Lisa Mendez, PhD, IQVIA Government Solutions
Andrew Kuligowski
Session 3811-2019
In today's fast-moving times, when real-time decisions are needed, data quality is highly valued because high-quality predictive models are built on it. Nowadays, analytics is maturing into a value-adding technology, and a special benefit is the consideration of future costs and revenues. In my final thesis work, finalized and submitted in March 2019, I addressed the challenge of rapidly evolving digital marketing by optimizing an example online campaign. The aim of my project is to describe the application of a profit/loss-sensitive strategy, and to compare standard logistic regression and weighted logistic regression models under such a strategy. When viewing the goal of a selected digital campaign from the advertiser's point of view, namely achieving the best click-through rate (CTR), the media agency must also consider the potential cost per click (CPC) or the most effective profit. This balance (a win-win strategy) needs to be understood and reached very sensitively. The goals I tackle in my model focus on setting bids that optimize CPC. Cost-sensitive classification techniques, which optimize the predictions of the target variable by specifying the costs, were taken into account while learning the model.
Romana Sipoldova, University of Economics in Bratislava
Session 3331-2019
Are formats associated with your data? Formats that you created in SAS® using the FORMAT procedure? If you want to use SAS® Visual Analytics on SAS® Viya® with your data, you need to load your formats so that they can be found. In SAS®9, this meant adding the catalog that contained your formats to the format search path. On SAS Viya, there is an analogous task but the process is different. First, SAS Viya cannot read a SAS catalog, so you need to convert your catalog to a structure that SAS Viya understands. Next, you need to place the formats in a location where SAS Viya can find them. And finally, you need to tell SAS Viya to load the formats whenever a new session is started. This paper explores how you, as a user of SAS Visual Analytics on SAS Viya, can accomplish these steps and make your formats available. You can do this by using SAS® Environment Manager, SAS® Studio, or shell scripts. Each of these methods is described in detail, including sample code, and the benefits and limitations of each are presented.
Andrew Fagan, SAS
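As a sketch of the conversion step only (library and format-library names are hypothetical, and exact steps vary by release), a SAS 9 catalog can be unloaded to a CNTLOUT data set and the formats re-created in a CAS format library with the CASFMTLIB= option:

```sas
/* On SAS 9: dump the format catalog to a data set */
proc format library=mylib.formats cntlout=work.fmtdata;
run;

/* On SAS Viya: rebuild the formats in a CAS format library */
cas mysess;
proc format cntlin=work.fmtdata casfmtlib='userformats';
run;
```

Making the formats persist and load for every new session is the remaining step the paper covers, for example through SAS Environment Manager.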
Session 3841-2019
Cars in Denmark are heavily taxed, and the level of taxation is changed from time to time. In recent years, first an extra tax was introduced in order to reduce fuel usage, and two years later a new tax was introduced in order to make more secure cars affordable. The first intervention was in favor of small cars, while the second was in favor of large cars. Whether such changes in the taxation rate were introduced immediately or were announced some months before makes a difference in their impact. Also, changes in taxation are expected by everyone interested in the negotiations in parliament. Statistics Denmark publishes the number of cars sold in many segments. In this paper, the car segments small, medium, and large are used in order to illustrate the effect of these changes in taxation. The models applied are estimated by the ARIMA procedure with intervention components, because exogenous, deterministic effects like changes in taxation should be modeled separately by deterministic components and not as a part of the stochastic model. In this paper, models for exponentially decreasing impacts of an intervention are of interest. However, situations where impacts are increasing up to the actual date of an announced future change in taxation also exist in this data set. Such situations are easily modeled by reverting the direction of time.
Anders Milhøj, University of Copenhagen
Session 3713-2019
Do you have a complex report involving multiple tables, text items, and graphics that could best be displayed in a multi-tabbed spreadsheet format? The Output Delivery System (ODS) destination for Excel, introduced in SAS® 9.4, enables you to create Microsoft Excel workbooks that easily integrate graphics, text, and tables, including column labels, filters, and formatted data values. In this paper, we examine the syntax used to generate a multi-tabbed Excel report that incorporates output from the REPORT, PRINT, SGPLOT, and SGPANEL procedures.
Caroline Walker, Warren Rogers LLC
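A minimal sketch of a multi-tabbed workbook with the ODS destination for Excel (the file name and sheet names are illustrative; the SHEET_NAME suboption starts a new worksheet):

```sas
ods excel file="report.xlsx"
    options(sheet_name="Listing" embedded_titles="yes");

title "Class Listing";
proc print data=sashelp.class noobs;
run;

/* start a new worksheet for the graphic */
ods excel options(sheet_name="Heights");
proc sgplot data=sashelp.class;
   vbar age / response=height stat=mean;
run;

ods excel close;
```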
Session 3539-2019
Prostate cancer (PrCA) incidence and mortality rates are higher among African-American men than any other racial group. Informed decision-making about prostate cancer screening could result in early detection and potentially reduce cancer health disparities. Currently, there are some, but few, computer-based decision aids to facilitate PrCA decisions of African-American men, but no scale has been validated to assess the extent to which African-American men accept and use a computer-based PrCA screening decision aid. Using parallel analysis, this study determined the dimensionality of the Computer-Based Prostate Cancer Screening Decision Aid and Acceptance Scale using data from a purposive sample of 352 African-American men aged 40 years and older who resided in South Carolina. Exploratory factor analysis was conducted using maximum likelihood, squared multiple correlations, and Promax rotation. Internal consistency reliability was assessed using Cronbach's alpha. Parallel analysis was used to determine the dimensionality of the scale using SAS® macro language. Results showed the optimal factor structure of the Computer-Based Prostate Cancer Screening Decision Aid among African-American men was a 24-item, 3-factor model. Factor loadings ranged from 0.32 to 0.94 with 11 items loading on Factor 1, 8 items on Factor 2, and 5 items on Factor 3. Parallel analysis is a valuable method for determining the dimensionality of the Computer-Based Prostate Cancer Screening Decision Aid.
Abbas Tavakoli,
Nikki Wooten, and
Otis Owens, University of South Carolina
Session 3875-2019
The classic Traveling Salesman Problem (TSP) establishes a list of cities to visit and costs associated with travel to each location. The goal is to produce a cycle of minimum cost that visits each city and returns the salesperson to the home location. What happens if an imposed time limit on the journey makes visiting each location impossible? Assuming each location is of equal value, then the goal transforms into visiting as many locations as possible within the imposed time limit. This variation is known as the Prize-Collecting Traveling Salesman Problem (PCTSP). In this paper, we use the OPTGRAPH procedure to implement the TSP, develop an approach to implement the PCTSP, and utilize SAS® Viya® to map results. Set in a suburb of metropolitan Atlanta, our motivation stems from the need to acquire as many virtual Pokémon as possible for Alice, the nine-year-old daughter of one of the authors. Analysis of results from executing routes generated by SAS® is included.
Bryan Yockey and
Joe Demaio, Kennesaw State University
Session 3928-2019
Shift tables display the change in the frequency of subjects across specified categories from baseline to post-baseline time points. They are commonly used in clinical data to display the shift in the values of laboratory parameters, ECG interpretations, or other ordinal variables of interest across visits. The INTO clause in the SQL procedure can be used to create macro variables for the denominators used in these tables. These macro variables can be accessed throughout the program, allowing for easy computation of percentages and the ability to call the same macro variable to display the subject count value in the header. This paper outlines the steps for creating a shift table using an example with dummy data. It describes the process of creating macro variables in PROC SQL using the INTO clause, creating shift table shells using the DATA step, conducting frequency tabulations using PROC FREQ, calling the macro variables to calculate and present the incidence and percent, and using the macro variables for the subject count value in the headers. It then discusses the efficiency of the use of PROC SQL to create macro variable denominators over other methods of calculating denominators, such as in PROC FREQ. Code examples are provided to compare shift table generation techniques.
Jenna Cody, IQVIA
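The denominator technique can be sketched as follows (a minimal example; the ADSL data set and its USUBJID and TRT01A columns are CDISC-style names assumed for illustration):

```sas
/* capture a per-arm denominator once, then reuse it anywhere */
proc sql noprint;
   select count(distinct usubjid)
      into :n_trt1 trimmed
      from adsl
      where trt01a = 'Drug A';
quit;

%put NOTE: Drug A denominator is &n_trt1;
/* a column header such as "Drug A (N=&n_trt1)" resolves at run time,
   and percentages can be computed as count / &n_trt1 * 100 */
```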
Session 3053-2019
This article goes over the SAS® procedures that can fit the Rasch model, such as the LOGISTIC, GENMOD, NLIN, and MCMC procedures. It describes how to use them to fit the standard Rasch model and the Rasch model with all person or item parameters fixed.
Tianshu Pan, Pearson
Yumin Chen, Frost Bank
Session 3465-2019
Your website tracking generates a mountain of data loaded with gems waiting
to be discovered. Analyzing this data uncovers untold stories and
insights: when, why, and how do people visit your website? But that's only part
of the story. By combining front-end data with back-end data, beautiful gold
nuggets of knowledge emerge. Uniting the unique visitor identification
capabilities of SAS® Customer Intelligence 360 with customer
relationship management (CRM) data empowers you to know not only when, why,
and how people use your website, but also who is using it. You can see how
different people and personas behave. In turn, CRM data can be fed back into
SAS Customer Intelligence 360 to enable you to target each person or persona
segment with more relevant content and promotions. This paper delves into
how the marketing technologists behind sas.com integrate the abilities of
SAS Customer Intelligence 360, SAS® Enterprise
Guide®, and SAS® Visual Analytics to generate
golden insights.
Mark Korey and Laura Maroglou, SAS
Session 4054-2019
Insight into and understanding of factors associated with student success is of great significance among universities and the prospective student population, including their parents. Prospective students are interested in learning about completion rates and their financial well-being after graduation in the form of expected earnings across universities. Various SAS® procedures in SAS® Enterprise Guide® were used to import, merge, clean, and analyze a big data set of universities and colleges collected over a period of 10 years. The study uses publicly available College Scorecard data to study the factors associated with student success and financial well-being. SAS® Visual Analytics was used to visualize and present the findings of the study. In this study, we found a few institutional characteristics to be strongly related to both student success and financial well-being.
Pratik Patel,
Maham Khan,
Varsha Sharma,
Shreyas Dalvi, and
Dr. Shabnam Mehra, University of South Florida
Session 3971-2019
Effective communication of foundational statistical concepts such as null and alternative distributions, Type I and Type II error, and statistical power is crucial for the teaching and learning process. Unfortunately, demonstrating such abstract concepts is often limited to lecture and simple static visualizations. Creating aesthetically appealing, yet clear and understandable, dynamic visualizations can make teaching and learning abstract concepts accessible to everyone. This paper explores the application of SAS® Graph Template Language, data manipulation, and macro programming with dynamic parameters for creating intuitive and user-friendly visualizations to convey the relations among variables that influence statistical power. This paper provides a step-by-step review and application of the techniques used, highlighting the creation and use of macros and macro variables (for example, by using the SQL procedure), PDF functions, TEMPLATE procedure concepts such as series plots, band plots, and draw statements, precedence of statements, transparency, rich text support, droplines, dynamic anchor points, and manipulation of graph space.
Aaron Myers, University of Arkansas
Session 3586-2019
Obtaining run-time statistics and performance information from SAS® Data Integration Studio batch processes has historically been a challenge. And with growing source populations, tight batch windows, and the demand for SAS® analytics and predictive model results, being aware of negative performance trends in the essential data management layer is as important as ever. With SAS® Job Monitor, the ETL (extract, transform, and load) administrator can take advantage of an integrated set of components that provide a more complete view of job load status, historical run-time trends, and other audit-related information.
Jeff Dyson, Financial Risk Group
Session 3659-2019
This paper describes a method to apply the National Center for Health Statistics (NCHS) data presentation standards for proportions across NCHS reports and data products. This method was developed by the Data Suppression Workgroup at NCHS. The new standards replace the use of the relative standard error (RSE) for proportions, which can perform poorly for very small and very large proportions. Standards are important for publications with many estimates from multiple data sources and limited space for measures of precision such as confidence intervals or standard errors. A SAS® macro, with code for direct computation based on mathematical formulae, is used to apply the multistep data presentation standards based on minimum denominator sample size and on the absolute and relative widths of a confidence interval calculated using the Clopper-Pearson method (or the Korn-Graubard method for complex surveys). The macro provides a clear and efficient method to implement the SAS code, and is therefore useful to programmers who wish to replicate, match, or benchmark against published NCHS estimates. This macro can also be useful to programmers and analysts with their own data systems. Data from the National Health Interview Survey (NHIS) are used to demonstrate the use of the macro.
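The Clopper-Pearson limits at the heart of the standard can be requested directly from PROC FREQ; a minimal sketch (the data set and variable names are illustrative, not from the paper):

```sas
/* Exact (Clopper-Pearson) confidence limits for a proportion */
proc freq data=nhis_sample;
   tables outcome / binomial(level='1' cl=exact);
run;
```

The macro then applies the multistep presentation rules (minimum denominator sample size plus the absolute and relative widths of these limits) to decide whether an estimate may be presented.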
Mary Bush and
Nazik Elgaddal, National Center for Health Statistics (NCHS)
Session 3567-2019
Like right censoring, left truncation is commonly observed in lifetime data, which requires proper technique to analyze in order to achieve unbiased results. This paper shows how to analyze such left-truncated lifetime data using the native PHREG procedure, including generating survival estimates, testing homogeneity across samples, as well as generating survival plots. Alternatively, the paper demonstrates two new macro functions, namely %LT_LIFETEST and %LT_LOGRANKTEST, to conduct survival analysis with more flexibility for data with left truncation, such as enabling users to choose specific estimation time points and to perform weighted log-rank tests. The paper also provides updates on the previously presented %LIFETEST and %LIFETESTEXPORT macro functions for generating formatted survival analysis reports with added support for left-truncated data.
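In PROC PHREG, delayed entry can be expressed with the counting-process (entry, exit) syntax; a minimal sketch (variable names such as `entrytime`, `exittime`, and `trt` are illustrative):

```sas
/* Left truncation via counting-process (entry, exit) notation */
proc phreg data=lifetimes plots=survival;
   model (entrytime, exittime) * status(0) = trt;
   baseline out=survest survival=s;   /* survival estimates to a data set */
run;
```

Subjects contribute to the risk set only after their entry time, which is what corrects the bias that left truncation would otherwise introduce.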
Zhen-Huan Hu, Medical College of Wisconsin
Session 3117-2019
Administrative health-care data including insurance claims data, electronic medical records (EMR) data, and hospitalization data contains standardized diagnosis codes to identify diseases and other medical conditions. These codes use the short-form name of ICD, which stands for International Classification of Diseases. Much of the currently available health-care data contains the ninth version of these codes, referred to as ICD-9. However, the more recent 10th version, ICD-10, is becoming more common in health-care data. These diagnosis codes are typically saved as character variables, are often stored in arrays of multiple codes representing primary and secondary diagnoses, and can be associated with either outpatient medical visits or inpatient hospitalizations. SAS® text processing functions, array processing, and the SAS colon modifier can be used to analyze the text of these codes and to identify similar codes or ranges of ICD codes. In epidemiologic analyses, groups of multiple ICD diagnosis codes are typically used to define more general comorbidities or medical outcomes. These disease definitions based on multiple ICD diagnosis codes, also known as coding algorithms, can either be hardcoded in a SAS program or defined externally from the programming. When coding algorithm definitions based on ICD codes are stored externally, the definitions can be read into SAS, transformed to SAS format, and dynamically converted into SAS programming statements.
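The colon-modifier prefix matching described above can be sketched as follows (the data set, array, and outcome names are illustrative; ICD-9 codes beginning with 410 denote acute myocardial infarction):

```sas
/* Flag a comorbidity from an array of ICD-9 diagnosis code variables */
data flagged;
   set claims;
   array dx{10} $ dx1-dx10;       /* primary and secondary diagnoses */
   mi = 0;
   do i = 1 to dim(dx);
      if dx{i} in: ('410') then mi = 1;   /* IN: matches any 410.xx code */
   end;
   drop i;
run;
```

The `IN:` operator truncates each comparison to the length of the shorter value, so a single prefix covers an entire family of more specific codes.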
Kathy Fraeman, Evidera
Session 3872-2019
Model validation is an important step in establishing a prediction model.
The model validation process quantifies how well the model predicts future
outcomes. However, there are very few SAS® programming
examples showing the validation process. We previously developed a
generalized mixed effect model that predicts peri-operative blood
transfusion from patient characteristics. In this paper, we demonstrate the
SAS techniques that we used to validate such a model. This prediction model
was developed using the GLIMMIX procedure. The validation methods include
calibration using the SGPLOT procedure, discrimination using the ROC option
in the LOGISTIC procedure, and sensitivity analysis with bootstrapping
method using SAS macro language.
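A hedged sketch of the discrimination and calibration steps: PROC LOGISTIC can evaluate an externally computed predicted probability by specifying NOFIT and supplying the prediction on the ROC statement (the data set and variable names `validation`, `transfusion`, `phat`, `calib` are illustrative, not the authors' actual names):

```sas
/* Discrimination: c-statistic for predictions scored by the fitted model */
proc logistic data=validation;
   model transfusion(event='1') = / nofit;
   roc 'Prediction model' pred=phat;
run;

/* Calibration: observed event rate vs mean predicted risk, e.g. by decile */
proc sgplot data=calib;
   scatter x=predmean y=obsrate;
   lineparm x=0 y=0 slope=1;   /* reference line of perfect calibration */
run;
```

The closer the scatter hugs the 45-degree reference line, the better calibrated the model; the ROC statement reports the area under the curve for discrimination.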
Donald Likosky, Xiaoting Wu, and Chang
He, University of Michigan
Session 3996-2019
Human trafficking networks commonly move from city to city in an effort to both avoid the police and take advantage of certain events that might increase their business. In this study, unsupervised and unstructured textual data scraped from advertisements on Backpage.com is analyzed using SAS® Viya® textual machine learning analytics techniques to ultimately identify the types of appeals that human traffickers use in recruiting ads. We plan to use the results of this analysis to assist law enforcement in properly using its resources when identifying advertisements in which recruiting ads and their contents are properly categorized. The idea behind our analysis is to find out more about those recruiting ads by separating them from other advertisements, based on several unique features. These ads appear all over the country, but they originate in only a small number of places. Using these places and the unique features from each ad enables us to map each potential recruitment ring. Custom Concepts in SAS Viya enables us to classify indicators of the ad being the reason for the advertisement's appearance, such as certain words or phrases that might indicate whether the trafficking network moved to the area in order to benefit from the recruitment's increased demand. Assigning term densities to the Custom Concepts enables us to group similar posts based on their duration and attractor, and we can therefore identify direct links to traffickers.
Lauren Agrigento and
Nicholas Lacoste, Louisiana State University
Session 3601-2019
This session is designed for data scientists looking to develop custom analytics solutions backed by SAS® analytic capabilities. First, we describe an analytics environment based on the SAS® Viya® and SAS® Cloud Analytic Services (CAS) architectural foundation. Next, using cybersecurity as an example, we demonstrate how SAS analytics can be implemented in Python in this environment. Finally, we include a JupyterHub installation, preconfigured with Secure Sockets Layer (SSL) encryption and Lightweight Directory Access Protocol (LDAP) authentication.
Damian Herrick, SAS
Session 3471-2019
The SASEFRED interface engine is the best-kept secret in SAS/ETS® software. It dramatically reduces the amount of time and effort required to include economic indicator variables in your time series analysis. Using the SASEFRED engine in SAS® Enterprise Guide®, I can directly query the economic database of the Federal Reserve Bank of St. Louis. This public database contains over 529,000 economic time series aggregated from 86 sources. In this paper, I forecast wine demand and enrich my predictions via the inclusion of economic variables such as Retail Sales: Beer, Wine, and Liquor Stores and Producer Price Index by Industry: Beer, Wine, and Liquor Stores. Although I use a retail example, this technology is relevant to all industries. The diversity of economic variables provided in this database ensures that it is useful to virtually every time series analysis and industry. This specific example leverages SAS Enterprise Guide and SAS® Forecast Server as interfaces. However, this functionality works on SAS® 9.4 as well as on SAS® Viya® technology.
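A minimal sketch of the SASEFRED LIBNAME statement (the API key is a placeholder, the series ID is one plausible FRED identifier, and depending on your release additional XML map options such as AUTOMAP= may also be needed):

```sas
/* Query the FRED database directly from SAS/ETS */
libname fred sasefred "%sysfunc(pathname(work))"
   apikey='your-fred-api-key'
   idlist='MRTSSM4453USN'     /* a retail beer/wine/liquor sales series */
   outxml=liquor;

data work.liquor_sales;        /* read the downloaded series as a data set */
   set fred.liquor;
run;
```

Once read into a data set, the series can be merged with the demand history and passed to any forecasting procedure as an input variable.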
Catherine LaChapelle, SAS
Session 3744-2019
We are frequently asked by customers whether the local Indian currency symbol can be used in their SAS® Visual Analytics dashboards. Customers want the locale-specific Indian currency symbol (Rupee, ₹) displayed with their data so that the values are immediately recognizable as monetary amounts. Unfortunately, by default, SAS® has limitations on displaying currency symbols for locales beyond those currently available in SAS. This paper discusses the problems and limitations of the current features and provides an example of how this new currency symbol can be added in SAS Visual Analytics.
Kalind Patel, Epoch Research Institute India Pvt Ltd
Session 3885-2019
SAS has partnered with expert game developers at HEC Montreal (https://erpsim.hec.ca/) to create an Analytics Simulation Game. Much like an airline flight simulator, the objective of this game is to provide an active learning opportunity for aspiring data scientists to learn and apply analytics. Using SAS® Enterprise Miner™, students work with a special simulated data set, derived from US Census data and other sources, that features a million members of a major charity. Students are placed in the role of a data scientist for this charity, and they must unlock the value in members' data to target which individuals should be called to increase the funds raised for the charity. Students first predict the amount each individual might give, but in a second round, they predict the uplift from contacting them instead. Each student submits their decisions to a special, live web-based scoreboard, where they can see how much they raised with their models and how they compare with other classmates. Students can use SAS Enterprise Miner via an Amazon Web Services (AWS) cloud instance, on their own PC, or in laboratories. This SAS® Global Forum marks the official release of the game, in time for professors and educators to use it in their Spring or Fall 2019 curriculum.
Jean-Francois Plante, HEC Montreal
Session 3574-2019
SAS® has exceptional analytics capabilities, but to process data we often need to extract, transform, validate, and correct the data that we get from various sources to make best use of its capabilities. Suppose in an application, we process customer data where we get information from data entry monthly, with multiple records having data entry errors. Periodically, we need to identify and correct those entries in the final SAS data set as part of the data validation and correction process. It is time-consuming to manually update each record monthly. Therefore, the need for an automated process arises to produce a final corrected data set. This paper demonstrates how we can update only the incorrect values in a SAS data set by using an external file that provides only the corrected values (finder file). This process does not make any data merges or SQL joins for the data correction. The process uses the FORMAT procedure and creates the customized formats using CNTLIN for the finder file. PROC FORMAT creates the variable to be corrected and a unique master key having several variables concatenated to avoid errors in the correction process. Using this format, the code corrects the invalid values in the variable, and all remaining variables remain the same. This paper is intended for intermediate-level SAS developers who want to build data validation and data correction programs using SAS.
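The CNTLIN lookup pattern can be sketched as follows (the finder-file and customer variable names are hypothetical; the key construction would mirror whatever master key the application uses):

```sas
/* Turn the finder file into a CNTLIN data set for PROC FORMAT */
data cntlin;
   set finder(rename=(masterkey=start newvalue=label));
   retain fmtname '$fix' type 'C';
run;

proc format cntlin=cntlin;
run;

/* Apply corrections: only keys present in the format get a new value */
data customers_fixed;
   set customers;
   key = catx('|', custid, put(month, yymmn6.));
   corrected = put(key, $fix.);
   if corrected ne key then value = corrected;   /* unmatched keys return key itself */
   drop key corrected;
run;
```

Because PUT returns the input value unchanged when the key is not in the format, the same statement both detects and applies corrections with no merge or join.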
Shreyas Dalvi, University of South Florida
Session 3889-2019
JMP® has a robust set of functions that facilitate data cleaning and transformation. This discussion focuses on using the Match function in a formula to create new columns. Branched survey question responses from the MyLymeData patient registry are used to demonstrate the power of the Match function to quickly and accurately create a Likert scale describing patient-reported health status following antibiotic treatment and to further create subgroups based on their responses.
Mira Shapiro, MSc, Analytic Designers LLC
Session 3556-2019
Starting in SAS® 9.3, the R interface enables SAS users on Windows and Linux who license SAS/IML® software to call R functions and transfer data between SAS and R from within SAS. Potential users include SAS/IML users and other SAS users who can use PROC IML just as a wrapper to transfer data between SAS and R and call R functions. This paper provides a basic introduction and some simple examples. The focus is on SAS users who are not PROC IML users, but who want to take advantage of the R interface.
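A minimal sketch of the round trip (assumes R is installed and the RLANG system option is enabled; the BMI calculation is just an illustrative R step):

```sas
/* Transfer a data set to R, run R code, and bring the result back */
proc iml;
   call ExportDataSetToR('Sashelp.Class', 'df');   /* SAS data set -> R data frame */
   submit / R;
      df$BMI <- 703 * df$Weight / df$Height^2      # compute BMI in R
   endsubmit;
   call ImportDataSetFromR('work.class_bmi', 'df'); /* R data frame -> SAS data set */
quit;
```

Everything between SUBMIT / R and ENDSUBMIT is ordinary R code, so PROC IML here acts purely as the wrapper the abstract describes.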
Bruce Gilsen, Federal Reserve Board
Session 3418-2019
Have you ever wondered why the log file is one of the first items that SAS Technical Support requests? The reason is that the log file contains a wealth of information about what actually occurs in the application or platform. So, what exactly is Technical Support looking for? How can the SAS® Object Spawner log help pinpoint the underlying issue? By reading this paper, SAS administrators will gain insight into how to use the object-spawner log to identify issues more quickly and avoid downtime. The focus of this paper is on the object-spawner log, because the object spawner is the gateway to other application servers. The paper provides an overview of the object-spawner log, describes useful loggers that you should be aware of, details how to identify common issues by looking in the logs, and offers solutions for those issues. With this knowledge, you can be proactive in solving problems that arise.
Jessica Franklin, SAS
Session 3191-2019
Python is one of the most popular programming languages of recent years and is now widely used in machine learning and AI. The big advantage of Python is that it is considered a one-stop shop that performs several functions with one programming language. Although SAS® is the most powerful analytical tool in a clinical study, Python expands the reporting activity of SAS and can potentially advance a SAS programmer's career. SASPy is the new ticket: it is Python for SAS programmers. Also known as the Python library for SAS, it enables you to start a SAS session from Python. This means Python can support activities from SAS data handling to reporting. This presentation describes the basic usage of SASPy and introduces tips for handling SAS data sets. It discusses possible reporting activities using Python in a clinical study.
Yuichi Nakajima, Novartis Pharma K.K.
Session 3782-2019
Hospital readmission is an adverse but often preventable event that has been shown to be related to a hospital's quality of care (Frankl et al., 1991). To reduce unplanned 30-day readmission rates among Medicare beneficiaries, the Centers for Medicare & Medicaid Services introduced measures of risk-standardized readmission rates (RSRRs) in both its quality reporting and pay-for-performance programs. RSRRs are calculated for five medical conditions, two procedures, and one hospital-wide measure for participating hospitals. These results are publicly reported on Hospital Compare, an online tool that patients can use to guide their decisions about where to seek care. Radar plots provide a tool that can illuminate differences and similarities of rates within and across hospitals. This can be useful for patients seeking the best possible care, or for hospital quality departments trying to understand their hospital's performance compared to their peers. This technique can be implemented using the POLYGON statement in the SGPANEL procedure (Hebbar, 2013). The code can be modified to add grids, tick marks, and labels, which provide more information on the estimates. Plotting the hospitals' RSRRs against the observed national rate enables one to quickly see which hospitals perform worse or better than the national average and which perform similarly in certain measures. In summary, radar plots are an effective way to display multiple hospital RSRRs at once and make quick comparisons.
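The polar-to-Cartesian step behind the radar plot can be sketched as follows (the input data set and variables `rsrr`, `measure_num`, `n_measures`, and `rate` are hypothetical placeholders for the authors' actual structure):

```sas
/* Convert each measure's rate to (x, y) coordinates for POLYGON */
data radar;
   set rsrr;                       /* one row per hospital x measure */
   theta = 2 * constant('pi') * (measure_num - 1) / n_measures;
   x = rate * cos(theta);
   y = rate * sin(theta);
run;

proc sgpanel data=radar;
   panelby hospital;
   polygon x=x y=y id=hospital;    /* one closed polygon per panel */
run;
```

Each measure gets its own spoke angle, and the rate becomes the distance from the center, so hospitals with similar profiles produce visibly similar polygon shapes.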
Andrea Barbo,
Craig Parzynski, and
Jacqueline Grady, Yale/YNHH Center for Outcomes Research and Evaluation
Session 3576-2019
Health-care professionals who successfully turn data assets into data insights can quickly realize benefits such as reduced costs, healthier patients, and higher satisfaction rates. Some insights are discovered by comparing your data to national benchmarks, and others are found in your own data. Using SAS® Visual Analytics with various publicly available health-care data resources, we demonstrate effective techniques for trending health plan quality performance metrics, highlighting geographical variations in care, and assisting the users in making inferences about their data. Visualizing data enables decision makers and data analysts to uncover new associations, engage users, and communicate their messages successfully.
Scott Leslie, MedImpact Healthcare Systems, Inc.
Session 2990-2019
In a perfect world of reporting, all code is automated and successfully invoked without manual intervention or ever touching the SAS® production code. Unfortunately, this is not the case for all reports across a company. This paper demonstrates how to efficiently add and remove recipients on a SAS report distribution list without manipulating the SAS production code. To do this, my coworker and I developed an email distribution list macro along with a macro to send the report. Together, these two macros provide a solution to efficient reporting and can be used to create a report inventory. Using this macro enables users to get away from altering SAS production code to simply add or remove users who receive a report, which is a best practice for most companies. We demonstrate how to develop and use these macros efficiently.
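A minimal sketch of the two-macro pattern (the control data set `distlist` and its columns are hypothetical; actual delivery depends on the site's EMAILSYS configuration):

```sas
/* Look up the active recipients for a report from a control data set */
%macro get_dist(report);
   proc sql noprint;
      select email into :dist separated by '" "'
         from distlist
         where report = "&report" and active = 1;
   quit;
%mend;

/* Send the report to the resolved distribution list */
%macro send_report(report, attach);
   %get_dist(&report)
   filename mail email to=("&dist")
            subject="&report" attach="&attach";
   data _null_;
      file mail;
      put "Please find the attached &report report.";
   run;
%mend;
```

Adding or removing a recipient is then a one-row change to the control data set, so the production program itself is never touched.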
Ryan Bixler and
Travis Himes, Gateway Health Plan
Session 3069-2019
An internal graphics challenge between SAS® and R programmers at the Mayo Clinic led to the creation of a circos plot with SAS® 9.4M3 features. A circos plot is a circular visualization of the relationship between objects, positions, and other time-point-related data. The circos plot features a curved rectangle for each node or object around the perimeter of the circle and a Bezier curve for each path between nodes that is pulled toward the center of the circle. Each rectangle's size and each curve's size is determined by its proportion of the total population. The primary challenge of creating this plot with the current SAS® Statistical Graphics procedures is that these procedures do not include polar axes to simplify plotting circular or curved objects. The macro %CIRCOS is an example of overcoming these limitations creatively using trigonometry to prove that these types of graphs are still possible without polar axes.
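The trigonometric workaround can be sketched as follows (a simplified placement step, not the %CIRCOS macro itself; the `nodes` data set with `count` and `total` variables is hypothetical):

```sas
/* Place each node's arc around the circle without polar axes */
data perimeter;
   set nodes;                               /* one row per node, count > 0 */
   retain cum 0;
   theta1 = 2 * constant('pi') * cum / total;
   cum + count;                             /* arc length ~ share of total */
   theta2 = 2 * constant('pi') * cum / total;
   /* approximate the curved rectangle with small steps along the arc */
   do theta = theta1 to theta2 by (theta2 - theta1) / 20;
      x = cos(theta);
      y = sin(theta);
      output;
   end;
run;
```

The resulting (x, y) vertices can then be drawn with standard Cartesian plot statements, which is the essence of simulating polar axes with trigonometry.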
Jeffrey Meyers, Mayo Clinic
Session 3040-2019
This is an introductory paper to help you understand why one would want
to adopt the programming language CASL. SAS® Cloud Analytic
Services language (CASL) is a language specification used by both SAS®
clients and other clients to interact with SAS® Cloud Analytic
Services (CAS). CASL is a statement-based scripting language that supports
the entire analytical lifecycle (data management, analytics, and scoring),
as well as the monitoring of the CAS server. The strength of this language
can be seen in how easy it is to initialize parameters to CAS actions and to
manipulate the results of these actions. The results can then be used to
prepare the parameters for the execution of another action. The goal is to
provide an environment that enables a user to string together multiple
actions to accomplish a specific result. CASL provides access to all of your
favorite features in SAS, including Base SAS® in-database
procedures, the DATA step, DS2, FedSQL, formats, macros, and the Output
Delivery System (ODS). CASL is ideal for splicing together access to the CAS
server between the use of other SAS procedures from a SAS client or from
another client, or simply to create your own custom application.
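A minimal CASL sketch of the action-to-action flow described above (assumes a table named `cars` is already loaded into CAS):

```sas
proc cas;
   simple.summary result=r / table='cars';   /* run an action, capture result */
   describe r;                               /* inspect the result structure  */
   print r.Summary;                          /* reuse the result downstream   */
quit;
```

The `result=` variable is an ordinary CASL value, so its contents can be indexed and passed as parameters when initializing the next action in the chain.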
Steven Sober, SAS
Session 3078-2019
Report localization was first introduced in SAS® Visual Analytics Designer 7.1. This feature made it easy for a user or a language specialist to translate a single report into multiple languages for report users. Many SAS® sample reports and customer reports have been translated using the 7.x releases of SAS Visual Analytics Designer. However, a user had to have a report-designing role to translate report content, which meant that the design of a report could be altered during localization. SAS Visual Analytics 8.3 is the first release based on SAS® Viya® that lets you localize reports. Starting in the 8.3 release, report content can be translated without the possibility of altering your report design. There is a new set of command-line interfaces (CLIs) in SAS Viya. The SAS reports CLI enables you to localize report content without requiring the report to be open for editing in SAS Visual Analytics. In this presentation, we introduce the SAS reports CLI features and demonstrate how you can use them to localize your reports.
Elizabeth Bales and
James Holman, SAS
Session 3480-2019
Join us as we explore new features and functionality in the SAS® Function Compiler (FCMP). Integration with Python, support for running analytic scoring containers (ASTORE), and a new FCMP action set are the main topics we cover. Learn how to leverage your existing investment in Python by calling Python functions from an FCMP function. Get the most from your ASTORE by porting it from SAS® Viya® to SAS® 9.4 TS1M6, and then run it from an FCMP function called from the SAS® DATA step. Learn how to port your favorite user-written functions and subroutines to SAS® Cloud Analytic Services, and then use them within a computed column or another action. This paper shows you the tips and tricks you need to integrate your existing FCMP code with these new SAS technologies. Included are several examples to quickly get you started.
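A hedged sketch of calling Python from an FCMP function via the PYTHON object (assumes a SAS release and configuration with Python integration enabled; the function and names are illustrative):

```sas
proc fcmp outlib=work.funcs.py;
   function py_hypot(a, b);
      declare object py(python);
      submit into py;
         def hypot(a, b):
             "Output: result"
             import math
             return math.hypot(a, b)
      endsubmit;
      rc = py.publish();
      rc = py.call('hypot', a, b);
      return (py.results['result']);
   endfunc;
run;

options cmplib=work.funcs;
data _null_;
   x = py_hypot(3, 4);
   put x=;
run;
```

The docstring ("Output: result") tells FCMP which key of the results dictionary holds the return value, and the compiled function is then callable from the DATA step like any other FCMP function.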
Andrew Henrick,
Aaron Mays,
Stacey Christian, and
Bill McNeill, SAS
Michael Whitcher, SAS
Session 3943-2019
There are times when it is convenient to be able to compare two file system areas that are supposed to be equivalent. This could be during a migration activity involving changing servers or SAS® versions. This paper covers techniques that can help facilitate the comparison of programs, logs, Microsoft Excel files, and SAS data sets, as well as techniques for digesting the results.
Brian Varney, Experis
Session 3801-2019
The latest releases of SAS® Data Management provide a comprehensive and integrated set of capabilities for collecting, transforming, managing, and governing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types including Apache Hadoop, cloud data sources, relational database management system (RDBMS) files, unstructured data, and streaming. You also have the ability to perform extract, transform, load (ETL) and extract, load, transform (ELT) processes in diverse run-time environments such as SAS®, Hadoop, Apache Spark, SAS® Analytics, cloud, and data virtualization environments. There are also new artificial intelligence features that can provide insight into your data and help simplify the steps needed to prepare data for analytics. This paper provides an overview of the latest features of the SAS Data Management product suite and includes use cases and examples for leveraging product capabilities.
Nancy Rausch, SAS
Session 3220-2019
This paper demonstrates the challenges you might face and their solutions when you use SAS® to process large data sets. First, we demonstrate how to use SAS system options to assess query efficiency. Next, we tune SAS programs to improve big data performance, modify SQL queries to maximize implicit pass-through, and re-architect processes to improve performance. Along the way, we identify situations in which data precision might be degraded, and then leverage FedSQL and DS2 to conduct full-precision calculations on a wide variety of ANSI data types. Finally, we look at boosting speed by using parallel processing techniques. We begin by using DS2 in Base SAS® to speed up CPU bound processes, and then kick it up a notch with the SAS® In-Database Code Accelerator for in-database processing on massively parallel processing (MPP) platforms like Teradata and Apache Hadoop. And for the ultimate speed boost, we convert our programs to use distributed processing and process memory-resident data in SAS® Viya® and SAS® Cloud Analytic Services (CAS).
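A minimal FedSQL sketch of the full-precision idea (the table and column names are illustrative; explicit ANSI casts keep the calculation in the requested types):

```sas
proc fedsql;
   create table work.totals as
   select acct,
          sum(cast(amount as double)) as total
   from work.ledger
   group by acct;
quit;
```

Because FedSQL supports ANSI data types directly, casts like this avoid the silent precision loss that can occur when values pass through legacy numeric conversions.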
Mark Jordan, SAS
Session 3089-2019
In 2018, Ernestynne delivered one of the top-rated presentations at the SAS® Users New Zealand conference. The presentation talked about a future of working with both open-source and proprietary software. It covered the benefits and provided some examples of combining both. The future is now here. This 90-minute Hands-On Workshop drills down into examples that were briefly touched on during that presentation. Attendees get a taste of working with R in SAS® Enterprise MinerTM and some general best practice tips. There is an introduction to the Scripting Wrapper for Analytics Transfer (SWAT) package, which enables you to interface with SAS® Viya® in open-source languages like R. This presentation briefly covers avoiding some traps that you can fall into when working with the SWAT package. The techniques learned here can be extended to enable you to work with Python in SAS®. The emphasis of this Hands-On Workshop is how to combine SAS with open-source tools, rather than teaching attendees how to code in R.
Ernestynne Walsh, Nicholson Consulting Ltd
Session 3569-2019
SAS® procedures can convey an enormous amount of information, sometimes more than is needed. Most SAS procedures generate ODS objects behind the scenes. SAS uses these objects with style templates that have custom buckets for certain types of output to produce the output that we see in all destinations (including the SAS listing). By tracing output objects and ODS templates using ODS TRACE (DOM) and by manipulating procedural output and ODS OUTPUT objects, we can pick and choose just the information that we want to see. We can then harness the power of SAS data management and reporting procedures to coalesce the information collected and present the information accurately and attractively.
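The trace-then-capture workflow can be sketched with a simple procedure (PROC MEANS names its default output object Summary):

```sas
/* Step 1: discover which output objects the procedure creates */
ods trace on;
proc means data=sashelp.class;
run;
ods trace off;    /* the log now lists the object name: Summary */

/* Step 2: capture just that object as a data set for reporting */
ods output Summary=work.summary;
proc means data=sashelp.class;
run;
```

Once the object is a data set, ordinary data management and reporting procedures can coalesce and present exactly the information wanted.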
Louise Hadden, Abt Associates Inc
Session 3183-2019
Have you ever found a table of data on a web page and wanted to scrape that data into a SAS® data set? This paper shows you how to do it in seconds using SAS® Graphics Accelerator for Google Chrome, which is a free extension for Google Chrome. First, you open the web page in Chrome. Then, you extract data from the table into your Laboratory within SAS Graphics Accelerator for Google Chrome. SAS Graphics Accelerator for Google Chrome automatically derives variable labels from column headings and variable lengths from observations. It also derives variable types if all observations in a column match patterns for a single type. If they don't, you can choose the type of a variable manually. Finally, you generate a SAS program that creates a data set in your WORK library with correct variable labels, lengths, and formats. With SAS Graphics Accelerator for Google Chrome, you'll be the fastest data wrangler on the web!
Brice Smith,
Ed Summers, and
Sean Mealin, SAS
Session 3258-2019
SAS® Visual Forecasting, the new-generation forecasting product from SAS, includes a web-based user interface for creating and running projects that generate forecasts from historical data. It is designed to use the highly parallel and distributed architecture of SAS® Viya®, a cloud-enabled, in-memory analytics engine that is powered by SAS® Cloud Analytic Services (CAS), to effectively model and forecast time series on a large scale. SAS Visual Forecasting includes several built-in modeling strategies, which serve as ready-to-use models for generating forecasts. It also supports custom modeling nodes, where you can write and import your own code-based modeling strategies. Similar to the ready-to-use models, these custom models can also be shared with other projects and forecasters. Forecasters can use SAS Visual Forecasting to create projects by using visual flow diagrams (called pipelines), running multiple built-in or code-based models on the same data, and choosing a champion model based on the results. This paper uses a gradient boosting tree model as an example to demonstrate how you can use a custom modeling node in SAS Visual Forecasting to develop and implement your own modeling strategy.
Yue Li,
Iman Vasheghani Farahani, and
Jingrui Xie, SAS
Session 3391-2019
This paper discusses three different types of storage services available
from Amazon Web Services (AWS): Amazon S3, Amazon Aurora, and Amazon
Redshift, and how SAS® can access data from each type of storage
service for analytical purposes. Amazon S3 stores data as objects, such as
files. You can access these files by using the S3 procedure in SAS. Amazon
Aurora is a relational database that is part of Amazon Relational Database
Service. The engine for Aurora is compatible with both MySQL or Postgres.
Depending on whether the Aurora database is based on MySQL or Postgres, you
can use SAS/ACCESS® Interface to MySQL or SAS/ACCESS®
Interface to Postgres to access the data in Aurora. Amazon Redshift is a
fully managed, scalable data warehouse in the cloud. You can use
SAS/ACCESS® Interface to Amazon Redshift to access data stored
in Amazon Redshift.
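A minimal sketch of the S3 procedure route (the bucket, object key, credentials, and region are placeholders):

```sas
/* Retrieve an object from Amazon S3, then read it as usual */
proc s3 keyid='your-access-key-id'
        secret='your-secret-access-key'
        region='useast';
   get '/your-bucket/data/sales.csv' '/tmp/sales.csv';
run;

proc import datafile='/tmp/sales.csv'
            out=work.sales dbms=csv replace;
run;
```

The Aurora and Redshift paths instead use LIBNAME statements with the corresponding SAS/ACCESS engines, after which the tables behave like ordinary SAS librefs.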
Lou Galway, SAS Institute Inc.
Session 3768-2019
SAS/STAT® and SAS/ETS® software has several
procedures for working with count data based on the Poisson distribution or
the quadratic negative binomial distribution (NB-2). Count data might have
either an excess number of zeros (inflation) or the situation where zero is
not a valid outcome (truncation). The zero-inflated Poisson and negative
binomial models are available with the GENMOD and FMM procedures. The FMM
procedure also provides the zero-truncated Poisson and negative binomial
distributions. Other types of count data models such as the restricted and
unrestricted generalized Poisson, linear negative binomial (NB-1), and
Poisson-Inverse Gaussian (PIG) distributions can also serve as choices to
model count data and likewise are subject to zero-inflation or truncation.
Programming statements entered into the NLMIXED procedure in SAS/STAT can
model both zero-inflated and zero-truncated count data with other types of
distributions, and as a result improve model fit.
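A minimal sketch of the programming-statement approach for a zero-truncated Poisson model in PROC NLMIXED (the data set and variable names are illustrative):

```sas
/* Zero-truncated Poisson: renormalize the log likelihood by 1 - P(Y=0),
   since zero is not a valid outcome */
proc nlmixed data=counts;
   parms b0=0 b1=0;
   lambda = exp(b0 + b1*x);
   ll = y*log(lambda) - lambda - lgamma(y + 1) - log(1 - exp(-lambda));
   model y ~ general(ll);
run;
```

Swapping in a different log-likelihood expression (for example, NB-1, generalized Poisson, or PIG) is all it takes to fit the other count distributions the paper discusses, with inflation or truncation handled the same way.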
Robin High, University of Nebraska Medical Center