One of the first lessons that SAS® programmers learn on the job is that numeric and character variables do not play well together, and that type mismatches are one of the more common sources of errors in their otherwise flawless SAS programs. Luckily, converting variables from one type to another in SAS (that is, casting) is not difficult, requiring only the judicious use of either the input() or put() function. There remains, however, the danger of data being lost in the conversion process. This type of error is most likely to occur in character-to-numeric conversion, especially when the user does not fully understand the data contained in the data set. This paper will review the basics of data storage for character and numeric variables in SAS, the use of formats and informats for conversions, and how to ensure accurate type conversion of even high-precision numeric values.
Andrew Clapson, Statistics Canada
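The precision risk described in the abstract above arises whenever a long digit string is stored as an 8-byte floating-point number. As a rough illustration only (written in Python rather than SAS, and not taken from the paper), the sketch below checks whether a character value survives conversion to a double. The helper name round_trips and the sample values are hypothetical.

```python
from decimal import Decimal


def round_trips(text: str) -> bool:
    """Return True if the digit string is represented exactly as a double.

    Decimal(float_value) expands the double to its exact decimal value,
    so comparing it with Decimal(text) reveals any silent rounding.
    """
    as_double = float(text)
    return Decimal(as_double) == Decimal(text)


# A 15-digit identifier fits within the 53-bit significand of a double.
short_id = "123456789012345"
# A 20-digit identifier does not, so the stored value is rounded.
long_id = "12345678901234567890"

print(short_id, "exact:", round_trips(short_id))
print(long_id, "exact:", round_trips(long_id))
```

A check of this kind suggests one practical rule: long identifiers such as account numbers are usually safer kept as character values than converted to numeric ones.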
Finding groups with similar attributes is at the core of knowledge discovery. To this end, Cluster Analysis automatically locates groups of similar observations. Despite successful applications, many practitioners are uncomfortable with the degree of automation in Cluster Analysis, which causes intuitive knowledge to be ignored. This is more true in text mining applications since individual words have meaning beyond the data set. Discovering groups with similar text is extremely insightful. However, blind applications of clustering algorithms ignore intuition and hence are unable to group similar text categories. The challenge is to integrate the power of clustering algorithms with the knowledge of experts. We demonstrate how SAS/STAT® 9.2 procedures and the SAS® Macro Language are used to ensemble the opinion of domain experts with multiple clustering models to arrive at a consensus. The method has been successfully applied to a large data set with structured attributes and unstructured opinions. The result is the ability to discover observations with similar attributes and opinions by capturing the wisdom of the crowds whether man or model.
Masoud Charkhabi, Canadian Imperial Bank of Commerce (CIBC)
Ling Zhu, Canadian Imperial Bank of Commerce (CIBC)
SAS/ACCESS® Interface to ODBC has been around forever. On one level, ODBC is very easy to use. That ease hides the flexibility that ODBC offers. This presentation uses examples to show you how to increase your program's performance and troubleshoot problems. You will learn the differences between ODBC and OLE DB; what the odbc.ini file is (and why it is important); how to discover what your ODBC driver is actually doing; and the difference between a native ACCESS engine and SAS/ACCESS Interface to ODBC.
Jeff Bailey, SAS
The Kolmogorov-Smirnov (K-S) test is one of the most useful and general nonparametric methods for comparing two samples. It is sensitive to all types of differences between two populations (shift, scale, shape, and so on). In this paper, we present a thorough investigation of the K-S test, including derivation of the formal test procedure, practical demonstration of the test, large-sample approximation of the test, and ease of use in SAS® using the NPAR1WAY procedure.
Tison Bolen, Cardinal Health
Dawit Mulugeta, Cardinal Health
Jason Greenfield, Cardinal Health
Lisa Conley, Cardinal Health
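The two-sample K-S statistic discussed in the abstract above is the largest vertical distance between the two empirical distribution functions. The following is a minimal Python sketch of that definition, not the paper's derivation and not the NPAR1WAY procedure; the function name ks_two_sample_statistic is hypothetical.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Return D = max over x of |F_a(x) - F_b(x)| for two samples.

    Both empirical CDFs are right-continuous step functions that only
    change at observed values, so it is enough to evaluate them there.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


print(ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap between the samples), which is why it responds to differences in location, scale, and shape alike.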
Most marketers today are trying to use Facebook's network of 1.1 billion plus registered users for social media marketing. Local television stations and newspapers are no exception. This paper investigates what makes a post effective. A Facebook page that is owned by a brand has fans, or people who like the page and follow the stories posted on that page. The posts on a brand page, however, do not appear on all the fans' News Feeds. This is determined by EdgeRank, a Facebook proprietary algorithm that determines what content users see and how it's prioritized on their News Feed. If marketers can understand how EdgeRank works, then they can develop more impactful posts and ultimately produce more effective social marketing using Facebook. The objective of this paper is to find the characteristics of a Facebook post that enhance the efficacy of a news outlet's page among their fans using Facebook Power Ratio as the target variable. Power Ratio, a surrogate to EdgeRank, was developed by experts at Frank N. Magid Associates, a research-based media consulting firm. Seventeen variables that describe the characteristics of a post were extracted from more than 8,000 posts, which were encoded by 10 media experts at Magid. Numerous models were built and compared to predict Power Ratio. The most useful model is a polynomial regression with the top three important factors as whether a post asks fans to like the post, content category of a post (including news, weather, etc.), and number of fans of the page.
Dinesh Yadav Gaddam, Oklahoma State University
Yogananda Domlur Seetharama, Oklahoma State University
Many neuroscience researchers have explored how various parts of the brain are connected, but no one has performed association mining using brain data. In this study, we used SAS® Enterprise Miner™ 7.1 for association mining of brain data collected by a 14-channel EEG device. An application of the association mining technique is presented in this novel context of brain activities and by linking our results to theories of cognitive neuroscience. The brain waves were collected while a user processed information about Facebook, the most well-known social networking site. The data was cleaned using Independent Component Analysis via an open source MATLAB package. Next, by applying the LORETA algorithm, activations at every fraction of a second were recorded. The data was codified into transactions to perform association mining. Results showing how various parts of the brain get excited while processing the information are reported. This study provides preliminary insights into how brain wave data can be analyzed by widely available data mining techniques to enhance researchers' understanding of brain activation patterns.
Pankush Kalgotra, Oklahoma State University
Ramesh Sharda, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
In our previous work, we often needed to perform large numbers of repetitive and data-driven post-campaign analyses to evaluate the performance of marketing campaigns in terms of customer response. These routine tasks were usually carried out manually by using Microsoft Excel, which was tedious, time-consuming, and error-prone. In order to improve the work efficiency and analysis accuracy, we managed to automate the analysis process with SAS® programming and replace the manual Excel work. Through the use of SAS macro programs and other advanced skills, we successfully automated the complicated data-driven analyses with high efficiency and accuracy. This paper presents and illustrates the creative analytical ideas and programming skills for developing the automatic analysis process, which can be extended to apply in a variety of business intelligence and analytics fields.
Justin Jia, Canadian Imperial Bank of Commerce (CIBC)
Amanda Lin, Bell Canada
The creation of production reports for our organization has historically been a labor-intensive process. Each month, our team produced around 650 SAS® graphs and 30 tables, which were then copied and pasted into 16 custom Microsoft PowerPoint presentations, each between 20 and 30 pages. To reduce the number of manual steps, we converted to using stored processes and the SAS® Add-In for Microsoft Office. This allowed us to simply refresh those 16 PowerPoint presentations by using SAS Add-In for Microsoft Office to run SAS® Stored Processes. SAS Stored Processes generate the graphs and tables, while SAS Add-In for Microsoft Office refreshes the document with updated graphs already sized and positioned on the slides just as we need them. With this new process, we are realizing the dream of reducing the amount of time spent on a single monthly production process. This paper will discuss the steps to creating a complex PowerPoint presentation that is simply refreshed rather than created new each month. I will discuss converting the original code to stored processes using SAS® Enterprise Guide®, options and style statements that are required to continue to use a custom style sheet, and how to create the PowerPoint presentation with an assortment of output types including horizontal bar charts, control charts, and tables. I will also discuss some of the challenges and solutions specific to the stored process and PowerPoint Add-In that we encountered during this conversion process.
Julie VanBuskirk, Baylor Health Care System
This paper provides an overview of how to create a SAS® Enterprise Guide® process that is well designed, simple, documented, automated, modular, efficient, reliable, and easy to maintain. Topics include how to organize a SAS Enterprise Guide process, how to best document in SAS Enterprise Guide, when to leverage point-and-click functionality, and how to automate and simplify SAS Enterprise Guide processes. This paper has something for any SAS Enterprise Guide user, new or experienced!
Jennifer First-Kluge, Systems Seminar Consultants
Steven First, Systems Seminar Consultants
In the credit card industry, there is a group of people who use credit cards as an interest-free loan by transferring their balances between cards during 0% balance transfer (BT) periods in order to avoid paying interest. These people are called gamers. Gamers generate losses for banks because they pay no interest and make no purchases. It is hard to use traditional ways, such as risk scorecards, to identify them since gamers tend to have very good credit histories. This paper uses a Naive Bayes classifier to classify gamers into three segments, according to the proportion of gamers. Using this model, the targeting policy and underwriting policy can be significantly improved, and the proportion of gamers in the population can be tracked. This result has been accomplished by using logistic regression in SAS® combined with a Microsoft Excel pivot table. The procedure is described in detail in this paper.
Yang Ge, Lancaster University
A case-control study, in its most basic form, compares a case series to a matched control series; such studies are commonly implemented in the field of public health. While matching is intended to eliminate confounding, the main potential benefit of matching in case-control studies is a gain in efficiency. There are many known methods for selecting a potential match or matches (in the case of 1:n studies) per case, the most prominent being the distance-based approach and matching on propensity scores. In this paper, we go through both, compare their results, and present a macro capable of performing both.
Lovedeep Gondara, BC Cancer Agency
Colleen Mcgahan, BC Cancer Agency
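To make the 1:n matching idea in the abstract above concrete, here is a small Python sketch of greedy nearest-neighbor matching on a single score, such as a propensity score. It is an assumption-laden illustration, not the authors' macro: the function name greedy_match, the optional caliper parameter, and the sample scores are all hypothetical, and greedy matching without replacement is only one of several possible strategies.

```python
def greedy_match(case_scores, control_scores, n_per_case=1, caliper=None):
    """Greedy 1:n nearest-neighbor matching without replacement.

    Each case, in input order, takes the closest still-unused controls.
    A control farther away than `caliper` (if given) is not accepted.
    Returns a dict mapping case index to a list of control indices.
    """
    available = set(range(len(control_scores)))
    matches = {}
    for i, score in enumerate(case_scores):
        # Rank unused controls by distance; ties broken by index.
        ranked = sorted(
            available, key=lambda j: (abs(control_scores[j] - score), j)
        )
        chosen = []
        for j in ranked[:n_per_case]:
            if caliper is not None and abs(control_scores[j] - score) > caliper:
                break  # remaining candidates are at least as far away
            chosen.append(j)
        available.difference_update(chosen)
        matches[i] = chosen
    return matches


cases = [0.30, 0.70]
controls = [0.10, 0.32, 0.68, 0.90]
print(greedy_match(cases, controls, n_per_case=1))
```

Because cases are processed in order, early cases can take controls that would have suited later ones better; optimal matching algorithms avoid this at higher computational cost.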
The life of a SAS® program can be broken down into sets of changes made over time. Programmers are generally focused on the future, but when things go wrong, a look into the past can be invaluable. Determining what changes were made, why they were made, and by whom can save both time and headaches. This paper discusses version control and the current options available to SAS® Enterprise Guide® users. It then highlights the upcoming Program History feature of SAS Enterprise Guide. This feature enables users to easily track changes made to SAS programs. Properly managing the life cycle of your SAS programs will enable you to develop with peace of mind.
Joe Flynn, SAS
Casey Smith, SAS
Alex Song, SAS
With the growth in size and complexity of organizations investing in SAS® platform technologies, the size and complexity of ETL subsystems and data integration (DI) jobs are growing at a rapid rate. Developers are pushed to come up with new and innovative ways to improve process efficiency in their DI jobs to meet increasingly demanding service level agreements (SLAs). The ability to conditionally execute or switch paths in a DI job is an extremely useful technique for improving process efficiency. How can a SAS® Data Integration developer design a job to best suit conditional execution? This paper discusses a technique for providing a parameterized dynamic execution custom transformation that can be easily incorporated into SAS® Data Integration Studio jobs to provide process path switching capabilities. The aim of any data integration task is to ensure that all sources of business data are integrated as efficiently as possible. It is concerned with the repurposing of data via transformation, should be a value-adding process, and also should be the product of collaboration. Modularization of common or repeatable processes is a fundamental part of the collaboration process in DI design and development. Switch Path, a custom transformation built to conditionally execute branches or nodes in SAS Data Integration Studio, provides a reusable module for solving the conditional execution limitations of standard SAS Data Integration Studio transformations and jobs. Because it is completely reusable, Switch Path logic can serve many purposes in the day-to-day business needs of a SAS data integration developer.
Prajwal Shetty, Tesco
Paper SAS2401-2014:
Confessions of a SAS® Dummy
People from all over the world are using SAS® analytics to achieve great things, such as to develop life-saving medicines, detect and prevent financial fraud, and ensure the survival of endangered species. Chris Hemedinger is not one of those people. Instead, Chris has used SAS to optimize his baby name selections, evaluate his movie rental behavior, and analyze his Facebook friends. Join Chris as he reviews some of his personal triumphs over the little problems in life, and learn how these exercises can help to hone your skills for when it really matters.
Chris Hemedinger, SAS
Business Intelligence platforms provide a bridge between expert data analysts and decision-makers and other end-users. But what do you do when you can identify no system that meets both your needs and your budget? If you are the Consolidated Data Analysis Center in the HHS Office of Inspector General, you use SAS® Enterprise BI Server and the SAS® Stored Process Web Application to build your own. This presentation covers the inception, design, and implementation of the PAYment by Geographic Area (PAYGAR) system, which uses only SAS® Enterprise BI tools, namely the SAS Stored Process Web Application, PROC GMAP, and HTML/JAVA embedded in a DATA step, to create an interactive platform for presenting and exploring data that has a geographic component. In particular, the presentation reviews how we created a system of chained stored processes to enable a user to select the data to be presented, navigate through different geographic levels, and display companion reports related to the current data and geographic selections. It also covers the creation of the HTML front-end that sits over and manages the system. Throughout, the presentation emphasizes the scalability of PAYGAR, which the SAS Stored Process Web Application facilitates.
Scott Hutchison, HHS Office of Inspector General
John Venturini, Piper Enterprise Solutions
Energy companies that operate in a highly regulated environment and are constrained in pricing flexibility must employ a multitude of approaches to maintain high levels of customer satisfaction. Many investor-owned utilities are just starting to embrace a customer-centric business model to improve the customer experience and hold the line on costs while operating in an inflationary business setting. Faced with these challenges, it is natural for utility executives to ask: 'What drives customer satisfaction, and what is the optimum balance between influencing customer perceptions and improving actual process performance in order to be viewed as a top-tier performer by our customers?' J.D. Power, for example, cites power quality and reliability as the top influencer of overall customer satisfaction. But studies have also shown that customer perceptions of reliability do not always match actual reliability experience. This apparent gap between actual and perceived performance raises a conundrum: Should the utility focus its efforts and resources on improving actual reliability performance or would it be better to concentrate on influencing customer perceptions of reliability? How can this conundrum be unraveled with an analytically driven approach? In this paper, we explore how the design of experiment techniques can be employed to help understand the relationship between process performance and customer perception, thereby leading to important insights into the energy customer equation and higher customer satisfaction!
Mark Konya, Ameren Missouri
Kathy Ball, SAS
In this new era of healthcare reform, health insurance companies have heightened their efforts to pinpoint who their customers are, what their characteristics are, what they look like today, and how this impacts business in today's and tomorrow's healthcare environment. The passing of the Healthcare Reform policies led insurance companies to focus and prioritize their projects on understanding who the members in their current population were. The goal was to provide an integrated single view of the customer that could be used for retention, increased market share, balancing population risk, improving customer relations, and providing programs to meet the members' needs. By understanding the customer, a marketing strategy could be built for each customer segment classification, as predefined by specific attributes. This paper describes how SAS® was used to perform the analytics that were used to characterize their insured population. The high-level discussion of the project includes regression modeling, customer segmentation, variable selection, and propensity scoring using claims, enrollment, and third-party psychographic data.
MaryAnne DePesquo, BlueCross BlueShield of Arizona
The Washington D.C. aqueduct was completed in 1863, carrying desperately needed clean water to its many residents. Just as the aqueduct was vital and important to its residents, a lifeline if you will, so too is the supply of data to the business. Without the flow of vital information, many businesses would not be able to make important decisions. The task of building my company's first dashboard was brought before us by our CIO; the business had not asked for it. In this poster, I discuss how we were able to bring fresh ideas and data to our business units by converting the data they saw on a daily basis in reports to dashboards. The road to success was long with plenty of struggles from creating our own business requirements to building data marts, synching SQL to SAS®, using information maps and SAS® Enterprise Guide® projects to move data around, all while dealing with technology and other I.T. team roadblocks. Then on to designing what would become our real-time dashboards, fighting for SharePoint single sign-on, and, oh yeah, user adoption. My story of how dashboards revitalized the business is a refreshing tale for all levels.
Jennifer McBride, Virginia Credit Union
New, innovative analytical techniques are necessary to extract patterns in big data that have temporal and geo-spatial attributes. An approach to this problem is required when geo-spatial time series data sets, which have billions of rows and the precision of exact latitude and longitude data, make it extremely difficult to locate patterns of interest. The usual temporal bins of years, months, days, hours, and minutes often do not allow the analyst to have control of the precision necessary to find patterns of interest. Geohashing is a string representation of two-dimensional geometric coordinates. Time hashing is a similar representation, which maps time to preserve all temporal aspects of the date and time of the data into a one-dimensional set of data points. Geohashing and time hashing are both forms of a Z-order curve, which maps multidimensional data into a single dimension and preserves the locality of the data points. This paper explores the use of a multidimensional Z-order curve, combining both geohashing and time hashing, that is known as geo-temporal hashing or space-time boxes using SAS®. This technique provides a foundation for reducing the data into bins that can yield new methods for pattern discovery and detection in big data.
Richard La Valley, Leidos
Abraham Usher, Human Geo Group
Don Henderson, Henderson Consulting Services
Paul Dorfman, Dorfman Consulting
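The geohashing idea in the abstract above can be sketched in a few lines. The Python example below follows the widely used public geohash scheme (interleaved longitude and latitude bits, encoded five bits at a time in a base-32 alphabet); it is an illustration only and may differ from the authors' SAS implementation. It does not attempt time hashing. The function name geohash_encode and the sample coordinates are hypothetical.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string.

    Each bit halves the current longitude or latitude interval in turn
    (longitude first), which interleaves the two coordinates along a
    Z-order curve. Every 5 bits become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Shorter hashes are prefixes of longer ones, i.e. coarser bins.
print(geohash_encode(57.64911, 10.40744, precision=3))
print(geohash_encode(57.64911, 10.40744, precision=7))
```

The prefix property is what makes the hash useful for binning: truncating the string widens the cell, so the analyst controls spatial precision simply by choosing the hash length.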
New York City boasts a wide variety of cuisine owing to the rich tourism and the vibrant immigrant population. The quality of food and hygiene maintained at the restaurants serving different cuisines has a direct impact on the people dining in them. The objective of this paper is to build a model that predicts the grade of the restaurants in New York City. It also provides deeper statistical insights into the distribution of restaurants, cuisine categories, grades, criticality of violations, etc., and concludes with the sequence analysis performed on the complete set of violations recorded for the restaurants at different time periods over the years 2012 and 2013. The data for 2013 is used to test the model. The data set consists of 15 variables that capture restaurant location-specific and violation details. The target is an ordinal variable with three levels, A, B, and C, in descending order of the quality representation. Various SAS® Enterprise Miner™ models, including logistic regression, decision trees, neural networks, and ensemble models, are built and compared using validation misclassification rate. The stepwise regression model appears to be the best model, with prediction accuracy of 75.33%. The regression model is trained at step 3. The number of critical violations at 8.5 gives the root node for the split of the target levels, and the rest of the tree splits are guided by the predictor variables such as number of critical and non-critical violations, number of critical violations for the year 2011, cuisine group, and the borough.
Pruthvi Bhupathiraju Venkata, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
With the introduction of new features in SAS® 9.4 Grid Manager, administrators of SAS solutions have even better capabilities for effectively managing the use of SAS® Enterprise Guide® in a grid environment. In this paper, we explain and demonstrate proven practices for configuring the SAS 9.4 Grid Manager environment, leveraging grid options sets and grid-spawned SAS® Workspace Servers. We walk through the options provided by SAS Enterprise Guide that make the most effective use of the grid environment.
Edoardo Riva, SAS
Data quality depends on review by operational stewards of the content. Volumes of complex data disappear as e-mail attachments. Is there a critical data shift that might be missed? Embedding a summary image drives expert data review from 15% to 87%. Downstream error rate is significantly reduced. The result is increased accuracy in variable physician compensation measures.
Amy Swartz, Kaiser Permanente
The availability of specialized programming and analysis resources in academic medical centers is often limited, creating a significant challenge for clinical research. The current work describes how Base SAS® and SAS® Enterprise Guide® are being used to empower research staff so that they are less reliant on these scarce resources.
Chris Schacherer, Clinical Data Management Systems, LLC
For data analysts, one of the most important steps after manipulating and analyzing the data set is to create a report for it. Nowadays, many statistics tables and reports are generated as HTML files that can be easily accessed through the Internet. However, the SAS® Output Delivery System (ODS) HTML output has many limitations on interacting with users. In this paper, we introduce a method to enhance the traditional ODS HTML output by using jQuery (a JavaScript library). A macro was developed to implement this idea. Compared to the standard HTML output, this macro can add sort, pagination, search, and even dynamic drilldown function to the ODS HTML output file.
Yu Fu, Oklahoma State Department of Health
Chao Huang, Oklahoma State University
SAS® is an outstanding suite of software, but not everyone in the workplace speaks SAS. However, almost everyone speaks Excel. Often, the data you are analyzing, the data you are creating, and the report you are producing are some form of Microsoft Excel spreadsheet. Every year at SAS® Global Forum, there are SAS and Excel presentations, not just because Excel is so pervasive in the workplace, but because there's always something new to learn (or re-learn)! This paper summarizes and references (and pays homage to!) previous SAS Global Forum presentations, as well as examines some of the latest Excel capabilities with the latest versions of SAS® 9.4 and SAS® Visual Analytics.
Andrew Howell, ANJ Solutions
Business Intelligence (BI) dashboards serve as an invaluable, high-level, visual reference tool for decision-making processes in many business industries. A request was made to our department to develop some BI dashboards that could be incorporated in an academic setting. These dashboards would aim to serve various undergraduate executive and administrative staff at the university. While most business data may lend itself to work very well and easily in the development of dashboards, academic data is typically modeled differently and, therefore, faces unique challenges. In this paper, the authors detail and share the design and development process of creating dashboards for decision making in an academic environment utilizing SAS® BI Dashboard 4.3 and other SAS® Enterprise Business Intelligence 9.2 tools. The authors also provide lessons learned as well as recommendations for future implementations of BI dashboards utilizing academic data.
Evangeline Collado, University of Central Florida
Michelle Parente, University of Central Florida
No way. Not gonna happen. I am a real SAS® programmer. (Spoken by a Real SAS Programmer.) SAS® Enterprise Guide® sometimes gets a bad rap. It was originally promoted as a code generator for non-programmers. The truth is, however, that SAS Enterprise Guide has always allowed programmers to write their own code. In addition, it offers many features that are not included in PC SAS®. This presentation shows you the top ten features that people who like to write code care about. It will be taught by a programmer who now prefers using SAS Enterprise Guide.
Christopher Bost, MDRC
PROC TABULATE is the most widely used reporting tool in SAS®, along with PROC REPORT. Any kind of report with the desired statistics can be produced by PROC TABULATE. When we need to report some summary statistics like mean, median, and range in the heading, either we have to edit it outside SAS in word processing software or enter it manually. In this paper, we discuss how we can automate this to be dynamic by using PROC SQL and some simple macros.
Lovedeep Gondara, BC Cancer Agency
This paper shares our experience integrating two leading data analytics and Geographic Information Systems (GIS) software products, SAS® and ArcGIS, to provide integrated reporting capabilities. SAS is a powerful tool for data manipulation and statistical analysis. ArcGIS is a powerful tool for analyzing data spatially and presenting complex cartographic representations. Combining statistical data analytics and GIS provides increased insight into data and allows for new and creative ways of visualizing the results. Although products exist to facilitate the sharing of data between SAS and ArcGIS, there are no ready-made solutions for integrating the output of these two tools in a dynamic and automated way. Our approach leverages the individual strengths of SAS and ArcGIS, as well as the report delivery infrastructure of SAS® Information Delivery Portal.
Nathan Clausen, CACI
Aaron House, CACI
Missing data is an ever-present issue, and analysts should exercise proper care when dealing with it. Depending on the data and the analytical approach, this problem can be addressed by simply removing records with missing data. However, in most cases, this is not the best approach. In fact, this can potentially result in inaccurate or biased analyses. The SAS® programming language offers many DATA step processes and functions for handling missing values. However, some analysts might not like or be comfortable with programming. Fortunately, SAS® Enterprise Guide® can provide those analysts with a number of simple built-in tasks for discovering missing data and diagnosing their distribution across fields. In addition, various techniques are available in SAS Enterprise Guide for imputing missing values, varying from simple built-in tasks to more advanced tasks that might require some customized SAS code. The focus of this presentation is to demonstrate how SAS Enterprise Guide features such as Query Builder, Filter and Sort Wizard, Describe Data, Standardize Data, and Create Time Series address missing data issues through the point-and-click interface. As an example of code integration, we demonstrate the use of a code node for more advanced handling of missing data. Specifically, this demonstration highlights the power and programming simplicity of PROC EXPAND (SAS/ETS® software) in imputing missing values for time series data.
Elena Shtern, SAS
Matt Hall, SAS
When first presented with SAS® Enterprise Guide®, many existing SAS® programmers don't know where to begin. They want to understand, 'What's in it for me?' if they switch over. These longtime users of SAS are accustomed to typing all of their code into the Program Editor window and clicking Submit. This beginning tutorial introduces SAS Enterprise Guide 6.1 to old and new users of SAS who need to code. It points out advantages and tips that demonstrate why a user should be excited about the switch. This tutorial focuses on the key points of a session involving coding and introduces new features. It covers the top three items for a user to consider when switching over to a server-based environment. Attendees will return to the office with a new motivation and confidence to start coding with SAS Enterprise Guide.
Andy Ravenna, SAS
Portfolio segmentation is key in all forecasting projects. Not all products are equally predictable. Nestlé uses animal names for its segmentation, and the animal behavior translates well into how the planners should plan these products. Mad Bulls are those products that are tough to predict if we don't know what is causing their unpredictability. The Horses are easier to deal with. Modern time series based statistical forecasting methods can tame Mad Bulls, as they allow explanatory variables to be added to the models. Nestlé now complements its Demand Planning solution based on SAP with predictive analytics technology provided by SAS®, to overcome these issues in an industry that is highly promotion-driven. In this talk, we will provide an overview of the relationship Nestlé is building with SAS, and provide concrete examples of how modern statistical forecasting methods available in SAS® Demand-Driven Planning and Optimization help us to increase forecasting performance, and therefore to provide high service to our customers with optimized stock, the primary goal of Nestlé's supply chains.
Marcel Baumgartner, Nestlé SA
This case study shows how SAS® Enterprise Guide® and SAS® Enterprise BI made it possible to easily implement fraud prevention reports in BF Financial Services and to help operational areas increase efficiency through automation of information delivery. The fraud alert report was made using a program developed in SAS Enterprise Guide to detect fraud on loan applications and was later published in SAS® Web Report Studio in order to be analyzed by a team. The second example is the automation by SAS BI of a payment report that consumed 30% of the time of a six-worker staff.
Plinio Faria, Bradesco
The role of the Data Scientist is the viral job description of the decade. And like LOLcats, there are many types of Data Scientists. What is this new role? Who is hiring them? What do they do? What skills are required to do their job? What does this mean for the SAS® programmer and the statistician? Are they obsolete? And finally, if I am a SAS user, how can I become a Data Scientist? Come learn about this job of the future and what you can do to be part of it.
Chuck Kincaid, Experis Business Analytics
Do you have data in SharePoint that you would like to run analysis on with SAS®? This workshop teaches you how to create a custom task in SAS® Enterprise Guide® in order to find, retrieve, and format that data into a SAS data set for use in your SAS programs.
Bill Reid, SAS
No matter how long you've been programming in SAS®, using and manipulating dates still seems to require effort. Learn all about SAS dates, the different ways they can be presented, and how to make them useful. This paper includes excellent examples for dealing with raw input dates, functions to manage dates, and outputting SAS dates into other formats. Included is all the date information you will need: date and time functions, informats, formats, and arithmetic operations.
Jenine Milum, Equifax Inc.
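As a small illustration of the date arithmetic mentioned in the abstract above: SAS stores a date as the number of days since January 1, 1960. The following Python sketch (not SAS code, and not taken from the paper; the function names are hypothetical) mirrors that convention to show why SAS dates can be added and subtracted like ordinary numbers.

```python
from datetime import date, timedelta

# SAS date values count days from 1 January 1960, which is day 0.
SAS_EPOCH = date(1960, 1, 1)


def to_sas_date(d: date) -> int:
    """Return the SAS-style day count for a calendar date (hypothetical helper)."""
    return (d - SAS_EPOCH).days


def from_sas_date(sas_days: int) -> date:
    """Return the calendar date for a SAS-style day count (hypothetical helper)."""
    return SAS_EPOCH + timedelta(days=sas_days)


# Because a date is just a day count, date arithmetic is plain integer arithmetic.
start = to_sas_date(date(2014, 3, 23))
thirty_days_later = from_sas_date(start + 30)
print(start, thirty_days_later)
```

A format in SAS changes only how such a number is displayed; the stored value stays a day count.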
This presentation is for users who are familiar with SAS® Enterprise Guide® but might not be aware of the many useful new features added in versions 4.2 and beyond. For example, SAS Enterprise Guide allows you to: Format your SAS® source code to make it easier to read. Easily schedule a project to run at a given time. Work with OLAP data in your enterprise. We will overview these and other features to help you become even more productive using this powerful application.
Mark Allemang, SAS
This presentation addresses two main topics. The first topic focuses on the industry's norms and the best practices for building internal credit ratings (PD, EAD, and LGD). Although there is no capital relief for local US banks using internal credit ratings (the US has not adopted the Internal Ratings-Based approach of Basel II, with the exception of the top 10 banks), interest in credit ratings modeling has increased over the last two years in the US banking industry. The main reason is the added value a bank can achieve from these ratings, and that is the focus of the second part of this presentation. It describes our journey (a client story) for getting there, introducing the SAS® project. Even more importantly, it describes how we use credit ratings in order to achieve effective credit risk management and get real added value out of that investment. The key success factor for achieving it is to effectively implement ratings within the credit process and throughout decision making. Only then can ratings be used to improve risk-adjusted return on capital, which is the high-end objective of all of us.
Boaz Galinson, Bank Leumi
By understanding the actual gambling behavior of an individual over the Internet, we develop markers that identify behavioral patterns, which in turn can be used to predict the level of gambling risk to which a subscriber is prone. The data set contains 4,056 subscribers. Using SAS® Enterprise Miner™ 12.1, a set of models is run to predict which subscriber is likely to become a high-risk Internet gambler. The data contains 114 variables such as first active date and first active product used on the website as well as the characteristics of the game such as fixed odds, poker, casino, games, etc. Other measures of a subscriber's data such as money put at stake and what odds are being bet are also included. These variables provide a comprehensive view of a subscriber's behavior while gambling over the website. The target variable is modeled as a binary variable, 0 indicating a risky gambler and 1 indicating a controlled gambler. The data is a typical example of real-world data with many missing values and hence had to be transformed, imputed, and then later considered for analysis. The model comparison algorithm of SAS Enterprise Miner 12.1 was used to determine the best model. The Stepwise Regression model performs the best among a set of 25 models, which were run using over 100 permutations of each model. The Stepwise Regression model predicts a high-risk Internet gambler at an accuracy of 69.63% with variables such as wk4frequency and wk3frequency of bets.
Sai Vijay Kishore Movva, Oklahoma State University
Vandana Reddy, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Traditional SAS® programs typically consist of a series of SAS DATA steps, which refine input data sets until the final data set or report is reached. SAS DATA steps do not run in-database. However, SAS® Enterprise Guide® users can replicate this kind of iterative programming and have the resulting process flow run in-database by linking a series of SAS Enterprise Guide Query Builder tasks that output SAS views pointing at data that resides in a Teradata database, right up to the last Query Builder task, which generates the final data set or report. This session both explains and demonstrates this functionality.
Frank Capobianco, Teradata
Formats are an often under-valued tool in the SAS® toolbox. They can be used in just about all domains to improve the readability of a report, or they can be used as a look-up table to recode your data. Out of the box, SAS includes a multitude of ready-defined formats that can be applied without modification to address most recode and redisplay requirements. And if that's not enough, there is also a FORMAT procedure for defining your own custom formats. This paper looks at using some of the formats supplied by SAS in some innovative ways, but primarily focuses on the techniques we can apply in creating our own custom formats.
Brian Bee, The Knowledge Warehouse Ltd
Report automation and scheduling are very hot topics in many industries. They confer many advantages including reduced workload, elimination of repetitive tasks, generation of accurate results, and better performance. This paper illustrates how to design an appropriate program to automate and schedule reports in SAS® 9.1 and SAS® Enterprise Guide® 5.1 using a SAS® server as well as the Windows Scheduler. The automation part includes good aspects of formatting Microsoft Excel tables using XML or VBA coding or any other formats, and conditional auto e-mailing with file attachments. We systematically walk through each step with a clear flow diagram from the data source to the final destination. We also discuss details of server-side and PC-side schedulers and how these schedulers involve invoking batch programs.
Anjan Matlapudi, AmerihealthCaritas
Traditional merchandise planning processes have been primarily product and location focused, with decisions about assortment selection, breadth and depth, and distribution based on the historical performance of merchandise in stores. However, retailers are recognizing that in order to compete and succeed in an increasingly complex marketplace, assortments must become customer-centric. Advanced analytics can be leveraged to generate actionable insights into the relevance of merchandise to a retailer's various customer segments and purchase channel preferences. These insights enrich the merchandise and assortment planning process. This paper describes techniques for using advanced analytics to impact customer-centric assortments. Topics covered include approaches for scoring merchandise based on customer relevance and preferences, techniques for gaining insight into customer relevance without customer data, and an overall approach to a customer-driven merchandise planning process.
Christopher Matz, SAS
If you have been programming SAS® for years, you have probably made Display Manager your own: customized window layout, program text colors, bookmarks, and abbreviations/keyboard macros. Now you are using SAS® Enterprise Guide®. Did you know you can have almost all the same modifications you had in Base SAS® in SAS Enterprise Guide, plus more?
John Ladds, Statistics Canada
A common complaint from users working on identifying fraud and abuse in Medicare is that teams focus on operational applications, static reports, and high-level outliers. But, when faced with the need to constantly evaluate changing Medicare provider and beneficiary or enrollee dynamics, users are clamoring for more dynamic and accurate detection approaches. Providing these organizations with a data discovery and predictive analytics framework that leverages Hadoop and other big data approaches, while providing a clear path for teams to make more fact-based decisions more quickly, is very important in pre- and post-fraud and abuse analysis. Organizations that pursue a reusable, services-based data discovery and analytics framework and architecture enjoy greater success in supporting data management, reporting, and analytics demands. They can quickly turn models into prioritized alerts and avoid improper or fraudulent payments. A successful framework should enable organizations to come up with efficient fraud, waste, and abuse models to address complex schemes; identify fraud, waste, and abuse vulnerabilities; and shorten triage efforts using a variety of data sourced from big data platforms like Hadoop and other relational database management systems. This paper talks about the data management, data discovery, predictive analytics, and social network analysis capabilities that are included in the SAS fraud framework and how a unified approach can significantly reduce the lifecycle of building and deploying fraud models. We hope this paper will provide IT leaders with a clear path for resolving issues from the simple to the incredibly complex, through a measured and scalable approach for delivering value for fraud, waste, and abuse models by providing deep insights to support evidence-based investigations.
Vivek Sethunatesan, Northrop Grumman Corp
The capabilities of SAS® have been extended by the use of macros and custom formats. SAS macro code libraries and custom format libraries can be stored in various locations, some of which may not always be easily and efficiently accessed from other operating environments. Code can be in various states of development, ranging from global organization-wide approved libraries to very elementary just-getting-started code. Formalized yet flexible file structures for storing code are needed. SAS user environments range from standalone systems such as PC SAS or SAS on a server/mainframe to much more complex installations using multiple platforms. Strictest attention must be paid to (1) file location for macros and formats and (2) management of the lack of cross-platform portability of formats. Macros are relatively easy to run from their native locations. This paper covers methods of doing this with emphasis on: (a) the SASAUTOS option to define the location and the search order for identifying macros being called, and (b) even more importantly, the little-known SAS option MAUTOLOCDISPLAY to identify, in the SAS log, the location of the macro actually called. Format libraries are more difficult to manage and cannot be used in an operating system different from the one in which they were created. This paper will discuss the export, copying, and importing of format libraries to provide cross-platform capability. A SAS macro used to identify the source of a format being used will be presented.
Roger Muller, Data-To-Events, Inc.
Mobile devices are taking over conventional ways of sharing and presenting information in today's businesses and working environments. Accessibility to this information is a key factor for companies and institutions in order to reach wider audiences more efficiently. SAS® software provides a powerful set of tools that allows developers to fulfill the increasing demand in mobile reporting without needing to upgrade to the latest version of the platform. Here at University of Central Florida (UCF), we were able to create reports targeting our iPad consumers at our executive level by using the SAS® 9.2 Enterprise Business Intelligence environment, specifically SAS® Web Report Studio 4.3. These reports provide them with the relevant data for their decision-making process. At UCF, the goal is to provide executive consumers with reports that fit on one screen in order to avoid the need for scrolling and that are easily exportable to PDF. This is done to accommodate their increasing use of portable technology to share sensitive data in a timely manner. The technical challenge is to provide specific data to those executive users requesting access through their iPad devices. Compatibility issues arise but are successfully bypassed. We are able to provide reports that fit on one screen and that can be opened as a PDF if needed. These enhanced capabilities were requested and well received by our users. This paper presents techniques we use in order to create mobile reports.
Carlos Piemonti, University of Central Florida
This presentation features implementation leads from SAS® Professional Services and Health Canada's Non-Insured Health Benefits (NIHB) program, on a joint implementation of SAS® Fraud Framework for Health Care. The presentation walks through the fast-paced implementation of NIHB's Pharmacy Surveillance System that guards Canadian taxpayers from undue costs, and protects the safety of NIHB clients. This presentation is a blend of project management and technical material, and presents both the client (NIHB) and consultant (SAS) perspectives throughout the story. The presentation converges onto several core principles needed to successfully deliver analytical solutions.
Jeffrey Menzies, Health Canada
Ian Ghent, SAS
When viewing and working with SAS® data sets, especially wide ones, it's often instinctive to rearrange the variables (columns) into some intuitive order. The RETAIN statement is one of the most commonly cited methods used for ordering variables. Though RETAIN can perform this task, its use as an ordering clause can cause a host of easily missed problems due to its intended function of retaining values across DATA step iterations. This risk is especially great for the more novice SAS programmer. Instead, two equally effective and less risky ways to order data set variables are recommended, namely, the FORMAT and SQL SELECT statements.
Andrew Clapson, Statistics Canada
PROC TABULATE is a powerful tool for creating tabular summary reports. Its advantages over PROC REPORT are that it requires less code, allows for more convenient table construction, and uses syntax that makes it easier to modify a table's structure. However, its inability to compute the sum, difference, product, and ratio of column sums has hindered its use in many circumstances. This paper illustrates and discusses some creative approaches and methods for overcoming these limitations, enabling users to produce needed reports and still enjoy the simplicity and convenience of PROC TABULATE. These methods and skills can have prominent applications in a variety of business intelligence and analytics fields.
Justin Jia, Canadian Imperial Bank of Commerce (CIBC)
Amanda Lin, Bell Canada
The SQL procedure contains many powerful and elegant language features for intermediate and advanced SQL users. This presentation discusses topics that will help SAS® users unlock the many powerful features, options, and other gems found in the SQL universe. Topics include CASE logic; a sampling of summary (statistical) functions; dictionary tables; PROC SQL and the SAS macro language interface; joins and join algorithms; PROC SQL statement options _METHOD, MAGIC=101, MAGIC=102, and MAGIC=103; and key performance (optimization) issues.
Kirk Paul Lafler, Software Intelligence Corporation
Paper 1506-2014:
Practical Considerations in the Development of a Suite of Predictive Models for Population Health Management
The use of predictive models in healthcare has steadily increased over the decades. Statistical models now are assumed to be a necessary component in population health management. This session will review practical considerations in the choice of models to develop, criteria for assessing the utility of the models for production, and challenges with incorporating the models into business process flows. Specific examples of models will be provided based upon work by the Health Economics team at Blue Cross Blue Shield of North Carolina.
Daryl Wansink, Blue Cross Blue Shield of North Carolina
An increase in sea levels is a potential problem that is affecting the human race and marine ecosystem. Many models are being developed to find out the factors that are responsible for it. In this research, the Memory-Based Reasoning model looks more effective than most other models. This is because this model takes the previous solutions and predicts the solutions for forthcoming cases. The data was collected from NASA. The data contains 1,072 observations and 10 variables such as emissions of carbon dioxide, temperature, and other contributing factors like electric power consumption, total number of industries established, and so on. Results of Memory-Based Reasoning models like RD tree, scan tree, neural networks, decision tree, and logistic regression are compared. Fit statistics, such as misclassification rate and average squared error are used to evaluate the model performance. This analysis is used to predict the rise in sea levels in the near future and to take the necessary actions to protect the environment from global warming and natural disasters.
Prasanna K S Sailaja Bhamidi, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
In this session, you learn how Kaiser Permanente has taken a centralized production support approach to using SAS® Enterprise Guide® 4.3 in the healthcare industry. Kaiser Permanente Northwest (KPNW) has designed standardized processes and procedures that have allowed KPNW to streamline the support of production content, which enabled KPNW analytical resources to focus more on new content development rather than on maintenance and support of steady state programs and processes. We started with over 200 individual SAS® processes across four different SAS platforms (SAS Enterprise Guide, Mainframe SAS®, PC SAS®, and SAS® Data Integration Studio) in order to standardize our development approach on SAS Enterprise Guide and build efficient and scalable processes within our department and across the region. We walk through the need for change, how the team was set up, provide an overview of the UNIX SAS platform, walk through the standard production requirements (developer pack), and review lessons learned.
Ryan Henderson, Kaiser Permanente
Karl Petith, Kaiser Permanente
In healthcare, we often express our analytics results as being "adjusted." For example, you might have read a study in which the authors reported the data as age-adjusted or risk-adjusted. The concept of adjustment is widely used in program evaluation, comparing quality indicators across providers and systems, forecasting incidence rates, and in cost-effectiveness research. In order to make reasonable comparisons across time, place, or population, we need to account for small sample sizes and case-mix variation; in other words, we need to level the playing field and account for differences in health status and for uniqueness in a given population. If you are new to healthcare, what it really means to adjust the data in order to make comparisons might not be obvious. In this paper, we explore the methods by which we control for potentially confounding variables in our data. We do so through a series of examples from the healthcare literature in both primary care and health insurance. In this survey of methods, we discuss the concepts of rates and how they can be adjusted for demographic strata (such as age, gender, and race), as well as health risk factors such as case mix.
Greg Nelson, ThotWave
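To make the idea of an adjusted rate in the abstract above concrete, here is a minimal Python sketch of direct standardization, one common adjustment method (the abstract does not say which methods the paper uses). All counts, stratum names, and function names are invented for illustration only.

```python
def crude_rate(events, population):
    """Unadjusted rate: total events divided by total population."""
    return sum(events.values()) / sum(population.values())


def directly_standardized_rate(events, population, standard_population):
    """Weight each stratum-specific rate by that stratum's share of a standard population."""
    total_standard = sum(standard_population.values())
    adjusted = 0.0
    for stratum, standard_count in standard_population.items():
        stratum_rate = events[stratum] / population[stratum]
        adjusted += stratum_rate * (standard_count / total_standard)
    return adjusted


# Hypothetical data: both providers have identical age-specific rates
# (1% under 65, 5% aged 65 and over) but very different age mixes.
standard = {"under_65": 500, "65_plus": 500}
provider_a = {"events": {"under_65": 10, "65_plus": 50},
              "population": {"under_65": 1000, "65_plus": 1000}}
provider_b = {"events": {"under_65": 2, "65_plus": 90},
              "population": {"under_65": 200, "65_plus": 1800}}

for name, p in (("A", provider_a), ("B", provider_b)):
    print(name,
          crude_rate(p["events"], p["population"]),
          directly_standardized_rate(p["events"], p["population"], standard))
```

In this made-up example the crude rates differ only because provider B serves an older population; once both are weighted to the same standard age distribution, the adjusted rates coincide.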
The project focuses on using analytics to reveal unwarranted use of access to medical records, i.e., employees in health organizations who access information about neighbours, friends, celebrities, etc., without a sound reason to do so. The method is based on the natural assumption that the vast majority of lookups are legitimate; lookups that differ from a statistically defined normal behavior will be subject to manual investigation. The work was carried out in collaboration between SAS Institute Norway and the largest Norwegian hospital, Oslo University Hospital (OUS), and was aimed at establishing whether the method is suitable for unveiling unwarranted lookups in medical records. A number of so-called scenarios are used to indicate adverse behaviour, each responsible for looking at one particular aspect of journal access data. For instance, one scenario determines the timeliness of a lookup relative to the patient's admission history; another judges whether the medical competency of the employee is relevant to the situation of the patient at the time of the lookup. We have so far designed and developed a library of around 20 scenarios that together are used in weighted combination to render a final judgment of the appropriateness of the lookup. The approach has proven highly successful, and further development of these ideas is currently under way, the aim of which is to establish a joint Norwegian solution to the problem of unwarranted access. Furthermore, we believe that the approach and the framework may be utilised in many other industries where sensitive data is being processed, such as financial, police, tax and social services. In this paper, the method is outlined, as well as results of its application to data from OUS.
Heidi Thorstensen, Oslo University Hospital
Torulf Mollestad, SAS
SAS® Enterprise Guide® has become the place through which many SAS® users access the power of SAS. Some like it, some loathe it, some have never known anything else. In my experience, the following attitudes prevail regarding the product: 1) I don't know what SAS is, but I can use a mouse and I know what my business needs are. 2) I've used SAS before, but now my company has moved to SAS Enterprise Guide and I love it! 3) I've used SAS before, but now my company has done something really stupid. SAS Enterprise Guide offers a place to learn as well as work. The product offers environments for point-and-click for those who want that, and a type-your-code-with-semi-colons environment for those who want that. Even better, a user can mix and match, using the best of both worlds. I show that SAS Enterprise Guide is a great place for building up business solutions using a step-by-step method, how we can make the best of both environments, and how we can dip our toes into parts of SAS that might have frustrated us in the past and made us run away and cry "I'll do it in Excel!" I demonstrate that there are some very nice aspects to SAS Enterprise Guide, out of the box, that are often ignored but that can improve the overall SAS experience. We look at my personal nemeses, SAS/GRAPH® and PROC TABULATE, with a side-trip to the mysterious world that is ODS, or the Output Delivery System.
Dave Shea, Skylark Limited
Have you been programming in SAS® for a while and just aren't sure how SAS® Enterprise Guide® can help you? This presentation demonstrates how SAS programmers can use SAS Enterprise Guide 5.1 as their primary interface to SAS, while maintaining the flexibility of writing their own customized code. We explore: navigating and customizing the SAS Enterprise Guide environment; using SAS Enterprise Guide to access existing programs and enhance processing; exploiting the enhanced development environment, including syntax completion and built-in function help; using SAS® Code Analyzer, Report Builder, and Document Builder; adding Project Parameters to generalize the usability of programs and processes; and leveraging built-in capabilities available in SAS Enterprise Guide to further enhance the information you deliver. Our audience is SAS users who understand the basics of SAS programming and want to learn how to use SAS Enterprise Guide. This paper is also appropriate for users of earlier versions of SAS Enterprise Guide who want to try the enhanced features available in SAS Enterprise Guide 5.1.
Marje Fecht, Prowerk Consulting
Rupinder Dhillon, Dhillon Consulting
Are you wondering what is causing your valuable machine asset to fail? What could those drivers be, and what is the likelihood of failure? Do you want to be proactive rather than reactive? Answers to these questions have arrived with SAS® Predictive Asset Maintenance. The solution provides an analytical framework to reduce the amount of unscheduled downtime and optimize maintenance cycles and costs. An all new (R&D-based) version of this offering is now available. Key aspects of this paper include: discussing key business drivers for and capabilities of SAS Predictive Asset Maintenance; and a detailed analysis of the solution, including the data model; explorations; data selections; Path I: analysis workbench (maintenance analysis and stability monitoring); Path II: analysis workbench (JMP®, SAS® Enterprise Guide®, and SAS® Enterprise Miner™); analytical case development using SAS Enterprise Miner, SAS® Model Manager, and SAS® Data Integration Studio; and the SAS Predictive Asset Maintenance Portlet for reports. A realistic business example in the oil and gas industry is used.
George Habek, SAS
The manager of the Purchasing Department is considering contracting with your team for a new SAS® Enterprise BI application. He's already met with SAS® and seen the sales pitch, and he is very interested. But the manager is a tightwad and not sure about spending the money. Also, he wants his team to be the primary developers for this new application. Before investing his money in training, programming, and support, he would like a proof-of-concept. This paper will walk you through the seven steps to create a SAS Enterprise BI POC project: (1) Develop a kick-off meeting including a full demo of the SAS Enterprise BI tools. (2) Set up your UNIX file systems and security. (3) Set up your SAS metadata ACTs, users, groups, folders, and libraries. (4) Make sure the necessary SAS client tools are installed on the developers' machines. (5) Hold a SAS Enterprise BI workshop to introduce them to the basics, including SAS® Enterprise Guide®, SAS® Stored Processes, SAS® Information Maps, SAS® Web Report Studio, SAS® Information Delivery Portal, and SAS® Add-In for Microsoft Office, along with supporting documentation. (6) Work with them to develop a simple project, one that highlights the benefits of SAS Enterprise BI and shows several methods for achieving the desired results. (7) Last but not least, follow up! Remember, your goal is not to launch a full-blown application. Instead, we'll strive toward helping them see the potential in your organization for applying this methodology.
Sheryl Weise, Wells Fargo
Distributing SAS® software to a large number of machines can be challenging at best and exhausting at worst. Common areas of concern for installers are silent automation, network traffic, ease of setup, standardized configurations, maintainability, and simply the sheer amount of time it takes to make the software available to end users. We describe a variety of techniques for easing the pain of provisioning SAS software, including the new standalone SAS® Enterprise Guide® and SAS® Add-in for Microsoft Office installers, as well as the tried and true SAS® Deployment Wizard record and playback functionality. We also cover ways to shrink SAS Software Depots, like the new 'subsetting recipe' feature, in order to ease scenarios requiring depot redistribution. Finally, we touch on alternate methods for workstation access to SAS client software, including application streaming, desktop virtualization, and Java Web Start.
Mark Schneider, SAS
Have you ever needed to use dates as values to loop through a table? For example, how many events occurred by 1, 2, 3, ... n months ahead? Maybe you just changed the dates manually and re-ran the query n times? This is a common need in economic and behavioral sciences. This presentation demonstrates how to create a table of dates that can be used with SAS® macro variables to loop through a table. Using this dates table in combination with the SAS DO loop ensures accuracy and saves time.
Scott Fawver, Arch Mortgage Insurance Company
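The looping pattern described in the abstract above can be sketched in Python (this is an analogy, not the SAS macro approach the presentation demonstrates). The helper names, the rule that clamps a date to the month end, and the sample event dates are all assumptions made for illustration.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date forward by whole months, clamping to the last day of the target month."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def build_dates_table(start, n_months):
    """One row per horizon k = 1..n: (k, cutoff date k months after start)."""
    return [(k, add_months(start, k)) for k in range(1, n_months + 1)]


def count_events_by_horizon(event_dates, start, n_months):
    """Loop over the dates table and count events after start and on or before each cutoff."""
    counts = {}
    for k, cutoff in build_dates_table(start, n_months):
        counts[k] = sum(1 for e in event_dates if start < e <= cutoff)
    return counts


# Hypothetical event dates, for illustration only.
events = [date(2014, 2, 10), date(2014, 3, 15), date(2014, 5, 1)]
print(count_events_by_horizon(events, date(2014, 1, 1), 3))
```

Building the cutoff dates once, in a table, and looping over that table replaces editing the dates by hand and re-running the query n times.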
One in every four people dies of heart disease in the United States, and stress is an important factor which contributes towards a cardiac event. As the condition of the heart gradually worsens with age, the factors that lead to a myocardial infarction when the patients are subjected to stress are analyzed. The data used for this project was obtained from a survey conducted through the Department of Biostatistics at Vanderbilt University. The objective of this poster is to predict the chance of survival of a patient after a cardiac event. Then by using decision trees, neural networks, regression models, bootstrap decision trees, and ensemble models, we predict the target which is modeled as a binary variable, indicating whether a person is likely to survive or die. The top 15 models, each with an accuracy of over 70%, were considered. The model will give important survival characteristics of a patient which include his history with diabetes, smoking, hypertension, and angioplasty.
Yogananda Domlur Seetharama, Oklahoma State University
Sai Vijay Kishore Movva, Oklahoma State University
With the smartphone and mobile app market developing so rapidly, expectations about the effectiveness of mobile applications are high. Marketers and app developers need to analyze the huge amount of data available well before the app release, not only to better market the app, but also to avoid costly mistakes. The purpose of this poster is to build models to predict the success rate of an app to be released in a particular category. Data has been collected for 540 Android apps under the Top free newly released apps category from https://play.google.com/store. The SAS® Enterprise Miner™ Text Mining node and SAS® Sentiment Analysis Studio are used to parse and tokenize the collected customer reviews and also to calculate the average customer sentiment score for each app. Linear regression, neural, and auto-neural network models have been built to predict the rank of an app by considering average rating, number of installations, total number of reviews, number of 1-5 star ratings, app size, category, content rating, and average customer sentiment score as independent variables. A linear regression model with the least Average Squared Error is selected as the best model, and number of installations and app maturity content are considered significant model variables. App category, user reviews, and average customer sentiment score are also considered important variables in deciding the success of an app. The poster summarizes the app success trends across various factors and also introduces a new SAS® macro, %getappdata, which we have developed for web crawling and text parsing.
Vandana Reddy, Oklahoma State University
Chinmay Dugar, Oklahoma State University
Global businesses must react to daily changes in market conditions over multiple geographies and industries. Consuming reputable daily economic reports assists in understanding these changing conditions, but requires both a significant human time commitment and a subjective assessment of each topic area of interest. To combat these constraints, Dow's Advanced Analytics team has constructed a process to calculate sentence-level topic frequency and sentiment scoring from unstructured economic reports. Daily topic sentiment scores are aggregated to weekly and monthly intervals and used as exogenous variables to model external economic time series data. These models serve both to validate the relationship between our sentiment scores and the external economic series and as near-term forecasts where daily or weekly variables are unavailable. This paper will first describe our process of using SAS® Text Miner to import and discover economic topics and sentiment from unstructured economic reports. The next section describes sentiment variable selection techniques that use SAS/STAT®, SAS/ETS®, and SAS® Enterprise Miner™ to generate similarity measures to economic indices. Our process then uses ARIMAX modeling in SAS® Forecast Studio to create economic index forecasts with topic sentiments. Finally, we show how the sentiment model components are used as a matrix of economic key performance indicators by topic and geography.
Michael P. Dessauer, The Dow Chemical Company
Justin Kauhl, Tata Consultancy Services
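The aggregation step mentioned in the abstract above (daily topic sentiment scores rolled up to weekly intervals) can be illustrated with a minimal Python sketch. This is not the authors' SAS process: the function name, the use of ISO calendar weeks, the simple mean, and the sample scores are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_mean_sentiment(daily_scores):
    """Average daily sentiment scores within each ISO (year, week) bucket.

    daily_scores: iterable of (datetime.date, float) pairs for one topic.
    Returns a dict keyed by (iso_year, iso_week).
    """
    buckets = defaultdict(list)
    for day, score in daily_scores:
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        buckets[(iso[0], iso[1])].append(score)
    return {week: mean(scores) for week, scores in sorted(buckets.items())}


# Hypothetical daily scores for a single topic (values are made up).
daily = [
    (date(2014, 3, 3), 0.25),    # Monday of ISO week 10
    (date(2014, 3, 4), 0.75),    # Tuesday of ISO week 10
    (date(2014, 3, 10), -0.125), # Monday of ISO week 11
]
weekly = weekly_mean_sentiment(daily)
```

The resulting weekly series is the kind of variable that could then be supplied as an exogenous input to a time series model; a monthly roll-up would follow the same pattern with a (year, month) key.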
For decades, SAS® has been the cornerstone of many organizations for business reporting. In more recent times, the ability to quickly determine the performance of an organization through the use of dashboards has become a requirement. Different ways of providing dashboard capabilities are discussed in this paper: using out-of-the-box solutions such as SAS® Visual Analytics and SAS® BI Dashboard, through to alternative solutions using SAS® Stored Processes, batch processes, and SAS® Integration Technologies. Extending the available indicators is also discussed, using Graph Template Language and KPI indicators provided with Base SAS®, as well as alternatives such as Google Charts and Flash objects. Real-world field experience, problem areas, solutions, and tips are shared, along with live examples of some of the different methods.
Mark Bodt, The Knowledge Warehouse (Knoware)
The SAS® Enterprise Guide® Query Builder is one of the most powerful components of the software. It enables a user to bring in data, join, drop and add columns, compute new columns, sort, filter data, leverage the advanced expression builder, change column attributes, and more! This presentation provides an overview of the major features of this powerful tool and how to leverage it every day.
Jennifer First-Kluge, Systems Seminar Consultants
Steven First, Systems Seminar Consultants
Direct marketing is the practice of delivering promotional messages directly to potential customers on an individual basis rather than through a mass medium. In this project, we build a finely tuned response model that helps a financial services company select high-quality, receptive customers for its future campaigns and identify the important factors that influence marketing, so that it can manage its resources effectively. This study is based on the customer solicitation center's marketing campaign data (45,211 observations and 18 variables) available on UC Irvine's web site, with attributes covering present and past campaign information (communication type, contact duration, previous campaign outcome, and so on) and customers' personal and banking information. As part of data preparation, we performed mean imputation to handle missing values and categorical recoding to reduce the number of levels of class variables. We built several predictive models using the SAS® Enterprise Miner™ models Decision Tree, Neural Network, Logistic Regression, and SVM to predict whether the customer responds to the loan offer by subscribing. The results showed that the Stepwise Logistic Regression model was the best when chosen on the misclassification rate criterion. When customers in the top three deciles were selected using the best model, the cumulative response rate was 14.5%, in contrast to the baseline response rate of 5%. Further analysis showed that customers are more likely to subscribe to the loan offer if they have the following characteristics: never contacted in the past, no default history, and a cell phone provided as primary contact information.
Arun Mandapaka, Oklahoma State University
Amit Kushwah, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
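The two response rates reported in the abstract above imply a cumulative lift, which the following minimal Python sketch computes. It is only an arithmetic illustration, not part of the authors' SAS Enterprise Miner workflow, and the function name is hypothetical.

```python
def cumulative_lift(model_response_rate, baseline_response_rate):
    """Ratio of the response rate among model-selected customers to the
    response rate in the whole population."""
    if baseline_response_rate <= 0:
        raise ValueError("baseline response rate must be positive")
    return model_response_rate / baseline_response_rate


# Rates taken from the abstract: 14.5% in the top three deciles, 5% baseline.
lift = cumulative_lift(0.145, 0.05)
print(f"Cumulative lift in the top three deciles: {lift:.1f}")
```

A lift of roughly 2.9 means that contacting only the top three deciles reaches responders at nearly three times the rate of contacting customers at random.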
Teams across the globe have long applied models to analyze sports data. The film Moneyball generated much hype about how a sports team can use data and statistics to build a winning team. The objective of this poster is to use the model comparison algorithm of SAS® Enterprise Miner™ to pick the best model for predicting the outcome of a soccer game. It is therefore important to determine which factors influence the result of a game. The data set contains input variables describing a team's offensive and defensive abilities, and the outcome of a game is modeled as the target variable. Using SAS Enterprise Miner, multinomial regression, neural network, decision tree, ensemble, and gradient boosting models are built. Over 100 different versions of these models are run. The data contains statistics from the 2012-13 English Premier League season. The competition has 20 teams playing each other in a home-and-away format. The season has a total of 380 games; the first 283 games are used to predict the outcome of the last 97 games. The target variable is treated as both a nominal and an ordinal variable with three levels: home win, away win, and tie. The gradient boosting model is the winning model; it predicts games with about 65% accuracy and identifies factors such as goals scored and ball possession as more important than fouls committed or red cards received.
Vandana Reddy, Oklahoma State University
Sai Vijay Kishore Movva, Oklahoma State University
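The holdout scheme described in the abstract above (train on the first 283 games of the season, evaluate on the last 97) is a chronological split rather than a random one. A minimal Python sketch follows; the function name and the placeholder match numbers are assumptions, and the authors' actual partitioning was done in SAS Enterprise Miner.

```python
def chronological_split(games, n_train):
    """Split an already date-ordered sequence into (train, test) without
    shuffling, so that later games are never used to predict earlier ones."""
    if not 0 <= n_train <= len(games):
        raise ValueError("n_train must be between 0 and len(games)")
    return games[:n_train], games[n_train:]


# Placeholder match numbers 1..380 standing in for the season's games.
season = list(range(1, 381))
train, test = chronological_split(season, 283)
```

Keeping the split chronological mirrors how such a model would be used in practice, where only past results are available when a game is predicted.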
Identifying claim fraud using predictive analytics presents unique challenges. (1) Predictive analytics generally requires a target variable that can be analyzed. Fraud is unusual in this regard because much of the fraud that has occurred historically has never been identified, so the target variable is difficult to define. (2) There is also a natural assumption that the past will bear some resemblance to the future. In the case of fraud, methods of defrauding insurance companies change quickly, which can make the analysis of a historical database less valuable for identifying future fraud. (3) In an underlying database of claims that may have been determined to be fraudulent by an insurance company, claim adjusters are often inconsistent about which claims they refer for investigation. This inconsistency can lead to erroneous model results because the data is not homogeneous. This paper demonstrates how analytics can help identify fraud in several ways: (1) more consistent referral of suspicious claims, (2) better identification of new types of suspicious claims, and (3) incorporation of claim adjuster insight into the analytics results. As part of this paper, we demonstrate the application of several approaches to fraud identification: (1) clustering, (2) association analysis, and (3) PRIDIT (Principal Component Analysis of RIDIT scores).
Roosevelt C. Mosley, Pinnacle Actuarial Resources, Inc.
Nick Kucera, Pinnacle Actuarial Resources, Inc.
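The RIDIT scoring that underlies PRIDIT, named in the abstract above, can be sketched for a single ordinal indicator. The Python below assumes one formulation found in the PRIDIT literature, in which a category's score is the proportion of observations in lower categories minus the proportion in higher categories; the paper itself may use a different convention, and the function name and sample counts are hypothetical.

```python
def ridit_scores(counts):
    """Score each ordered response category as
    (proportion of observations below it) - (proportion above it).

    counts: observation counts per category, listed in ordinal order.
    Scores lie in [-1, 1] and have a frequency-weighted mean of zero.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    proportions = [c / total for c in counts]
    scores = []
    below = 0.0
    for p in proportions:
        above = 1.0 - below - p  # share of observations in higher categories
        scores.append(below - above)
        below += p
    return scores


# Made-up counts for one three-level ordinal fraud indicator.
counts = [50, 30, 20]
scores = ridit_scores(counts)
```

In a PRIDIT analysis, scores of this kind for many indicators would then be combined through principal component analysis, a step this sketch does not attempt.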
HTML5 has become the de facto standard for web applications. As a result, the lingua franca object notation of the web services that the web applications call has switched from XML to JSON. JSON is remarkably easy to parse in JavaScript, but so far SAS doesn't have any native JSON parsers. The Facebook Graph API dropped XML support a few years ago. This paper shows how we can parse the JSON in SAS by calling an external script, using PROC GROOVY to parse it inside of SAS, or by parsing the JSON manually with a DATA step. We'll extract the data from the Facebook Graph API and import it into an OLAP data mart to report and analyze a marketing campaign's effectiveness.
Philihp Busby, SAS
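To make the parsing task in the abstract above concrete, the following Python sketch flattens a nested JSON document into table-like rows, which is the shape a data mart load would need. It stands in for the "external script" option only loosely: the payload is a made-up example rather than a real Facebook Graph API response, and the field names and function name are assumptions.

```python
import json

# Hypothetical payload; NOT an actual Facebook Graph API response.
payload = """
{"data": [
  {"id": "1", "message": "Hello", "likes": {"count": 3}},
  {"id": "2", "message": "World"}
]}
"""


def flatten_posts(raw):
    """Turn a nested JSON document into a list of flat row dictionaries."""
    doc = json.loads(raw)
    rows = []
    for post in doc.get("data", []):
        rows.append({
            "id": post.get("id"),
            "message": post.get("message"),
            # Nested and optional: default to 0 when the object is absent.
            "like_count": post.get("likes", {}).get("count", 0),
        })
    return rows


rows = flatten_posts(payload)
```

Handling optional nested objects explicitly, as with the like count here, is the part that tends to need the most care whichever parsing route is chosen.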
No need to fret, Base SAS® programmers. Converting to SAS® Enterprise Guide® is a breeze, and it provides so many advantages. Coding remote connections to SAS® servers is a thing of the past. Generate WYSIWYG prompts to increase the usage of the SAS code and to create reports and SAS® Stored Processes to share easily with people who don't use SAS Enterprise Guide. The first and most important thing, however, is to change the default options and preferences to tame SAS Enterprise Guide, making it behave similarly to your Base SAS ways. I cover all of these topics and provide demos along the way.
Angela Hall, SAS
This paper gives you a better idea of how and where to use the record lookup functions to locate observations in which a variable has some characteristic. Various related functions for searching numeric and character values are illustrated. Code is shown with time comparisons. I discuss three possible ways to retrieve records: the SAS® DATA step, PROC SQL, and Perl regular expressions. Differences in real time and CPU time are highlighted when comparing record retrieval across these methods. Although the program is written for the PC using SAS® 9.2 in a Windows XP 32-bit environment, all the functions are applicable to any system. All the tools discussed are in Base SAS®. The typical attendee or reader will have some experience in SAS, but not a lot of experience dealing with large amounts of data.
Anjan Matlapudi, AmeriHealth Caritas
Have you found OS file permissions to be insufficient to tailor access controls to meet your SAS® data security requirements? Have you found metadata permissions on tables useful for restricting access to SAS data, but then discovered that SAS programmers can avoid the permissions by issuing LIBNAME statements that do not use the metadata? Would you like to ensure that users have access to only particular rows or columns in SAS data sets, no matter how they access the SAS data sets? Metadata-bound libraries provide the ability to authorize access to SAS data by authenticated Metadata User and Group identities, and this authorization cannot be bypassed by SAS programmers who attempt to avoid the metadata with direct LIBNAME statements. They also provide the ability to limit the rows and columns in SAS data sets that an authenticated user is allowed to see. The authorization decision is made in the bowels of the SAS® I/O system, where it cannot be avoided when data is accessed. Metadata-bound libraries were first implemented in the second maintenance release of SAS® 9.3 and were enhanced in SAS® 9.4. This paper gives an overview of the feature and discusses best practices for administering libraries bound to metadata and user experiences with bound data. It also discusses enhancements included in the first maintenance release of SAS 9.4.
Jack Wallace, SAS
Each year, 2.35 million road accident cases are recorded in the U.S.; of these, 37,000 are considered fatal. Road crashes cost USD 230.6 billion per year, or an average of USD 820 per person. Our aim is to identify the important factors that lead to vehicle collisions and to predict the injury risk involved in them. Data was collected from the National Automotive Sampling System (NASS) and contains 20,247 cases with 19 variables. Input variables describe the factors involved in an accident, such as height, age, weight, gender, vehicle model year, speed limit, energy absorption in collision & deformation location, etc. The target variable is nominal and shows the level of injury. Missing values were imputed using the mean for interval variables and the count method for class variables. Multivariate analysis suggests high correlation between tire footprint and wheelbase (Corr=0.97, P<0.0001) and between original weight of car and curb weight of car (Corr=0.79, P<0.0001). Variables with high kurtosis values were transformed using range standardization. Variables were ranked by importance using decision tree analysis. Models such as multiple regression, polynomial regression, neural network, and decision tree were applied to the data set to identify the factors that are most significant in predicting the injury risk. A multilayer perceptron neural network proved to be the best model for predicting the injury risk index, with the lowest average squared error (0.086) on the validation data set.
Prateek Khare, Oklahoma State University
Vandana Reddy, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
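Range standardization, the transformation named in the abstract above, rescales a variable to the interval [0, 1] using its minimum and maximum. The Python sketch below shows the usual formula, (x - min) / (max - min); it is an illustration rather than the authors' SAS Enterprise Miner transformation, and the function name and sample values are made up.

```python
def range_standardize(values):
    """Rescale values to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("range standardization is undefined for a constant variable")
    return [(v - lo) / (hi - lo) for v in values]


# Made-up values for one interval variable.
scaled = range_standardize([10.0, 15.0, 20.0])
```

Because the formula depends only on the observed extremes, a single outlier can compress the remaining values toward one end of the interval, which is worth checking before modeling.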
In the past, calibration was done by using extremely complicated macros in Base SAS® to create a Microsoft Excel workbook with multiple linked spreadsheets. This process was hard to audit, was not reliably replicable, and was open to user error. The task was to create a replicable, auditable, and locked-down application that allows the user to change certain parameters and see the impact of those changes without needing to code. SAS® Stored Processes are used to generate a screen that is split into three sections: the first shows static reporting, the second is a data-driven custom input form, and the third shows test results. The initial screen uses a standard stored process that enables the user to select the model and time period. Macro variables are passed through to subset the data. The static reports are created by a stored process that executes two REPORT procedures, which subset the data based on the passed parameters. The form is data driven and is built by using SAS® to generate HTML. The Update button at the end of the form executes a stored process that collects the data that the user has entered into the form and updates a database. After the rates have been updated, they are used to generate test results using PROC REPORT.
Anita Measey, Bank of Montreal
Health plans use wide-ranging interventions based on criteria set by nationally recognized organizations (for example, NCQA and CMS) to change health-related behavior in large populations. Evaluation of these interventions has become more important with the increased need to report patient-centered quality of care outcomes. Findings from evaluations can detect successful intervention elements and identify at-risk patients for further targeted interventions. This paper describes how SAS® was applied to evaluate the effectiveness of a patient-directed intervention designed to increase medication adherence and a health plan's CMS Part D Star Ratings. Topics covered include querying data warehouse tables, merging pharmacy and eligibility claims, manipulating data to create outcome variables, and running statistical tests to measure pre-post intervention differences.
Scott Leslie, MedImpact Healthcare Systems, Inc.
When reading data files or writing SAS® programs, we are often hunting for the right format or informat. There are so many to choose from! Does it seem like too many to search the manual? Let SAS help find the right one! We use the SAS dictionary table VFORMAT and a very small SAS program. This presentation demonstrates how two simple functions unlock the potential of this great resource: SASHELP.VFORMAT.
Peter Crawford, Crawford Software Consultancy Limited