Services Papers A-Z

A
Session SAS5642-2016:
A Ringside Seat: The ODS Excel Destination versus the ODS ExcelXP Tagset
The new and highly anticipated SAS® Output Delivery System (ODS) destination for Microsoft Excel is finally here! Available as a production feature in the third maintenance release of SAS® 9.4 (TS1M3), this new destination generates native Excel (XLSX) files that are compatible with Microsoft Office 2010 or later. This paper is written for anyone, from entry-level programmers to business analysts, who uses the SAS® System and Microsoft Excel to create reports. The discussion covers features and benefits of the new Excel destination, differences between the Excel destination and the older ExcelXP tagset, and functionality that exists in the ExcelXP tagset that is not available in the Excel destination. These topics are all illustrated with meaningful examples. The paper also explains how you can bridge the gap that exists as a result of differences in the functionality between the destination and the tagset. In addition, the discussion outlines when it is beneficial for you to use the Excel destination versus the ExcelXP tagset, and vice versa. After reading this paper, you should be able to make an informed decision about which tool best meets your needs.
Read the paper (PDF) | Watch the recording
Chevell Parker, SAS
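As a rough illustration of the two tools compared in the abstract above (not code from the paper), the sketch below sends the same PROC PRINT output to the ODS Excel destination and to the ODS ExcelXP tagset; the output paths are placeholders, and ODS Excel assumes SAS 9.4 TS1M3 or later.

   ods excel file="C:\temp\class_report.xlsx"
       options(sheet_name="Class" embedded_titles="yes");
   title "Class Listing via the ODS Excel destination";
   proc print data=sashelp.class noobs;
   run;
   ods excel close;

   ods tagsets.excelxp file="C:\temp\class_report.xml"
       options(sheet_name="Class" embedded_titles="yes");
   title "Class Listing via the ODS ExcelXP tagset";
   proc print data=sashelp.class noobs;
   run;
   ods tagsets.excelxp close;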
Session 9280-2016:
An Application of the DEA Optimization Methodology to Optimize the Collection Direction Capacity of a Bank
In the Collection Direction of a well-recognized Colombian financial institution, there was no methodology that provided an optimal number of collection agents to improve the collection task and make it possible for more customers to commit to the minimum monthly payment on their debt. The objective of this paper is to apply the Data Envelopment Analysis (DEA) optimization methodology to determine the optimal number of agents to maximize the monthly collection in the bank. We show that the results can have a positive impact on credit portfolio behavior and reduce the collection management cost. The DEA optimization methodology has been successfully used in various fields to solve multi-criteria optimization problems, but it is not commonly used in the financial sector, mostly because it requires specialized software, such as SAS® Enterprise Guide®. In this paper, we present PROC OPTMODEL and show how to formulate the optimization problem, program the SAS® code, and adequately process the available data.
Read the paper (PDF) | Download the data file (ZIP)
Miguel Díaz, Scotiabank - Colpatria
Oscar Javier Cortés Arrigui, Scotiabank - Colpatria
Session 11001-2016:
Analysis of IMDb Reviews for Movies and Television Series Using SAS® Enterprise Miner™ and SAS® Sentiment Analysis Studio
Movie reviews provide crucial information and act as an important factor when deciding whether to spend money on seeing a film in the theater. Each review reflects an individual's take on the movie, and there are often contrasting reviews for the same movie. Going through each review may create confusion in the mind of a reader. Analyzing all the movie reviews and generating a quick summary that describes the performance, direction, and screenplay, among other aspects, is helpful to readers in understanding the sentiments of movie reviewers and in deciding whether they would like to watch the movie in the theater. In this paper, we demonstrate how SAS® Enterprise Miner™ nodes enable us to generate a quick summary of the terms and their relationships with each other, which describes the various aspects of a movie. The Text Cluster and Text Topic nodes are used to generate groups with similar subjects, such as genres of movies, acting, and music. We use SAS® Sentiment Analysis Studio to build models that help us classify 10,000 reviews, where each review is a separate document and the reviews are equally split between good and bad. The Smoothed Relative Frequency and Chi-square model is found to be the best model, with an overall precision of 78.37%. As soon as the latest reviews are out, such analysis can be performed to help viewers quickly grasp the sentiments of the reviews and decide if the movie is worth watching.
Read the paper (PDF)
Ameya Jadhavar, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Prithvi Raj Sirolikar, Oklahoma State University
Session 10663-2016:
Applying Frequentist and Bayesian Logistic Regression to MOOC data in SAS®: a Case Study
Massive Open Online Courses (MOOCs) have attracted increasing attention in educational data mining research areas. MOOC platforms provide free higher education courses to Internet users worldwide. However, MOOCs have high enrollment but notoriously low completion rates. The goal of this study is to apply frequentist and Bayesian logistic regression to investigate whether and how students' engagement, intentions, education levels, and other demographics are conducive to course completion on MOOC platforms. The original data used in this study came from an online eight-week course titled Big Data in Education, taught within the Coursera platform (MOOC) by Teachers College, Columbia University. The data sets for analysis were created from three different sources--clickstream data, a pre-course survey, and homework assignment files. The SAS system provides multiple procedures to perform logistic regression, with each procedure having different features and functions. In this study, we apply two approaches, frequentist and Bayesian logistic regression, to the MOOC data. PROC LOGISTIC is used for the frequentist approach, and PROC GENMOD is used for Bayesian analysis. The results obtained from the two approaches are compared. All the statistical analyses are conducted in SAS® 9.3. Our preliminary results show that MOOC students with higher course engagement and higher motivation are more likely to complete the MOOC course.
Read the paper (PDF) | View the e-poster or slides (PDF)
Yan Zhang, Educational Testing Service
Ryan Baker, Columbia University
Yoav Bergner, Educational Testing Service
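The following sketch illustrates the general pairing described in the abstract above, not the paper's actual MOOC analysis; the data set WORK.MOOC and its variables (COMPLETED, ENGAGEMENT_SCORE, INTENT, EDUCATION_LEVEL) are hypothetical stand-ins.

   /* Frequentist fit */
   proc logistic data=work.mooc;
      class education_level / param=ref;
      model completed(event='1') = engagement_score intent education_level;
   run;

   /* Bayesian fit of the same model via PROC GENMOD with a BAYES statement */
   proc genmod data=work.mooc descending;
      class education_level;
      model completed = engagement_score intent education_level
            / dist=binomial link=logit;
      bayes seed=20160418 nmc=10000 outpost=work.posterior;
   run;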
Session 11761-2016:
Arrest Prediction and Analysis based on a Data Mining Approach
In light of past indiscretions by police departments that have led to riots in major U.S. cities, it is important to assess the factors leading to the arrest of a citizen. The police department should understand the arrest situation before it can make any decision and defuse any possibility of a riot or similar incident. Many such incidents in the past are a result of police departments failing to make the right decisions in emergencies. The primary objective of this study is to understand the key factors that impact the arrest of people in New York City and to understand the various crimes that have taken place in the region in the last two years. The study explores the different regions of New York City where the crimes have taken place and also the timing of the incidents leading to them. The data set consists of 111 variables and 273,430 observations from the years 2013 to 2014, with a binary target variable, arrest made. The percentages of Yes and No incidents of arrest made are 6% and 94%, respectively. This study analyzes the reasons for arrest, which include suspicion, timing of frisk activities, identifying threats, basis of search, whether the arrest required a physical intervention, location of the arrest, and whether the frisk officer needed any support. Decision tree, regression, and neural network models were built, and the decision tree turned out to be the best model, with a validation misclassification rate of 6.47% and a model sensitivity of 73.17%. The results from the decision tree model showed that suspicion count, search basis count, summons leading to violence, ethnicity, and use of force are some of the important factors influencing the arrest of a person.
Read the paper (PDF)
Karan Rudra, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Maitreya Kadiyala, Oklahoma State University
B
Session 3640-2016:
Big Data, Big Headaches: An Agile Modeling Solution Designed for the Information Age
The surge of data and data sources in marketing has created an analytical bottleneck in most organizations. Analytics departments have been pushed into a difficult decision: either purchase black-box analytical tools to generate efficiencies or hire more analysts, modelers, and data scientists. Knowledge gaps stemming from restrictions in black-box tools or from backlogs in the work of analytical teams have resulted in lost business opportunities. Existing big data analytics tools respond well when dealing with large record counts and small variable counts, but they fall short in bringing efficiencies when dealing with wide data. This paper discusses the importance of an agile modeling engine designed to deliver productivity, irrespective of the size of the data or the complexity of the modeling approach.
Read the paper (PDF) | Watch the recording
Mariam Seirafi, Cornerstone Group of Companies
Session 11360-2016:
Breakthroughs at Old Dominion Electric Cooperative with Energy Load Forecasting Innovation
The electrical grid has become more complex; utilities are revisiting their approaches, methods, and technology to accurately predict energy demands across all time horizons in a timely manner. With the advanced analytics of SAS® Energy Forecasting, Old Dominion Electric Cooperative (ODEC) provides data-driven load predictions from the next hour to the next year and beyond. Accurate intraday forecasts mean meeting daily peak demands, saving millions of dollars during critical seasons and events. Mid-term forecasts provide a baseline to the cooperative and its members to accurately anticipate regional growth and customer needs, in addition to signaling power marketers where, when, and how much to hedge future energy purchases to meet weather-driven demands. Long-term forecasts create defensible numbers for large capital expenditures such as generation and transmission projects. Much of the data for determining load comes from disparate systems such as supervisory control and data acquisition (SCADA) and internal billing systems, combined with external market data (PJM Energy Market), weather, and economic data. This data needs to be analyzed, validated, and shaped to fully leverage predictive methods. Business insights and planning metrics are achieved when flexible data integration capabilities are combined with advanced analytics and visualization. These increased computing demands at ODEC are being met by leveraging Amazon Web Services (AWS) for expanded business discovery and operational capacity. Flexible and scalable data and discovery environments allow ODEC analysts to efficiently develop and test models that are I/O intensive. SAS® visualization for the analyst is a graphic compute environment for information sharing that is memory intensive. Also, ODEC IT operations require deployment options tuned for process optimization to meet service level agreements that can be quickly evaluated, tested, and promoted into production. What was once very difficult for most utilities to embrace is now achievable with new approaches, methods, and technology like never before.
Read the paper (PDF) | Watch the recording
David Hamilton, ODEC
Steve Becker, SAS
Emily Forney, SAS
Session 10742-2016:
Building a Recommender System with SAS® to Improve Cross-Selling for Online Retailers
Nowadays, recommender systems are a popular tool for online retail businesses to predict a customer's next-product-to-buy (NPTB). Based on statistical techniques and the information collected by the retailer, an efficient recommender system can suggest a meaningful NPTB to customers. A useful suggestion can reduce the customer's search time for a wanted product and improve the buying experience, thus increasing the chance of cross-selling for online retailers and helping them build customer loyalty. Within a recommender system, the combination of advanced statistical techniques with available information (such as customer profiles, product attributes, and popular products) is the key element in using the retailer's database to produce a useful NPTB suggestion for customers. This paper illustrates how to create a recommender system with the SAS® RECOMMEND procedure for online business. Using the recommender system, we can produce predictions, compare the performance of different predictive models (such as decision trees or multinomial discrete-choice models), and make business-oriented recommendations from the analysis.
Read the paper (PDF) | Download the data file (ZIP)
Miao Nie, ABN AMRO Bank
Shanshan Cong, SAS Institute
C
Session SAS4240-2016:
Creating a Strong Business Case for SAS® Grid Manager: Translating Grid Computing Benefits to Business Benefits
SAS® Grid Manager, like other grid computing technologies, has a set of great capabilities that we, as IT professionals, love to have in our systems. This technology increases high availability, allows parallel processing, facilitates meeting increasing demand by scaling out, and offers other features that make life better for those managing and using these environments. However, even when business users take advantage of these features, they are more concerned with the business side of the problem. Most of the time, business groups hold the budgets and are key stakeholders for any SAS Grid Manager project. Therefore, it is crucial to demonstrate to business users how they will benefit from the new technologies, how the features will improve their daily operations, help them be more efficient and productive, and help them achieve better results. This paper guides you through a process to create a strong and persuasive business plan that translates the technology features of SAS Grid Manager into business benefits.
Read the paper (PDF) | Watch the recording
Marlos Bosso, SAS
Session 4940-2016:
Customer Acquisition: Targeting for Long-term Retention
Customer retention is a primary concern for businesses that rely on a subscription revenue model. It is common for marketers of subscription-based offerings to develop predictive models that are aimed at identifying subscribers who have the highest risk of attrition. With these likely unsubscribers identified, marketers then attempt to forestall customer termination by using a variety of retention enhancement tactics, which might include free offers, customer training, satisfaction surveys, or other measures. Although customer retention is always a worthy pursuit, it is often expensive to retain subscribers. In many cases, associated retention programs simply prove unprofitable over time because the overall cost of such programs frequently exceeds the lifetime value of the cohort of unsubscribed customers. Generally, it is more profitable to focus resources on identifying and marketing to targeted prospective customers. When the target marketing strategy focuses on identifying prospects who are most likely to subscribe over the long term, the need for special retention marketing efforts decreases sharply. This paper describes the results of an analytically driven targeting approach that is aimed at inviting new customers to a milk and grocery home-delivery service, with the promise of attracting only those prospects who are expected to exhibit high long-term retention rates.
Read the paper (PDF)
D
Session 10160-2016:
Design for Success: An Approach to Metadata Architecture for Distributed Visual Analytics
Metadata is an integral and critical part of any environment. Metadata facilitates resource discovery and provides unique identification of every single digital component of a system, simple to complex. SAS® Visual Analytics, one of the most powerful analytics visualization platforms, leverages the power of metadata to provide a plethora of functionalities for all types of users. The possibilities range from real-time advanced analytics and power-user reporting to advanced deployment features for a robust and scalable distributed platform to internal and external users. This paper explains the best practices and advanced approaches for designing and managing metadata for a distributed global SAS Visual Analytics environment. Designing and building the architecture of such an environment requires attention to important factors like user groups and roles, access management, data protection, data volume control, performance requirements, and so on. This paper covers how to build a sustainable and scalable metadata architecture through a top-down hierarchical approach. It helps SAS Visual Analytics Data Administrators to improve the platform benchmark through memory mapping, perform administrative data load (AUTOLOAD, Unload, Reload-on-Start, and so on), monitor artifacts of distributed SAS® LASR™ Analytic Servers on co-located Hadoop Distributed File System (HDFS), optimize high-volume access via FullCopies, build customized FLEX themes, and so on. It showcases practical approaches to managing distributed SAS LASR Analytic Servers, offering guest access for global users, managing host accounts, enabling Mobile BI, using power-user reporting features, customizing formats, enabling home page customization, using best practices for environment migration, and much more.
Read the paper (PDF)
Ratul Saha, Kavi Associates
Vimal Raj Arockiasamy, Kavi Associates
Vignesh Balasubramanian, Kavi Global
Session 10740-2016:
Developing an On-Demand Web Report Platform Using Stored Processes and SAS® Web Application Server
As SAS® programmers, we often develop listings, graphs, and reports that need to be delivered frequently to our customers. We might decide to manually run the program every time we get a request, or we might easily schedule an automatic task to send a report at a specific date and time. Both scenarios have some disadvantages. If the report is manual, we have to find and run the program every time someone requests an updated version of the output. It takes some time, and it is not the most interesting part of the job. If we schedule an automatic task in Windows, we still sometimes get an email from the customers because they need the report immediately. That means that we have to find and run the program for them. This paper explains how we developed an on-demand report platform using SAS® Enterprise Guide®, SAS® Web Application Server, and stored processes. We had developed many reports for different customer groups, and we were getting more and more emails from them asking for updated versions of their reports. We felt we were not using our time wisely and decided to create an infrastructure where users could easily run their programs through a web interface. The tool that we created enables SAS programmers to easily release on-demand web reports with minimal programming. It has web interfaces developed using stored processes for the administrative tasks, and it also automatically customizes the front end based on the user who connects to the website. One of the challenges of the project was that certain reports had to be available to a specific group of users only.
Read the paper (PDF)
Romain Miralles, Genomic Health
E
Session SAS3120-2016:
Ensemble Modeling: Recent Advances and Applications
Ensemble models are a popular class of methods for combining the posterior probabilities of two or more predictive models in order to create a potentially more accurate model. This paper summarizes the theoretical background of recent ensemble techniques and presents examples of real-world applications. Examples of these novel ensemble techniques include weighted combinations (such as stacking or blending) of predicted probabilities in addition to averaging or voting approaches that combine the posterior probabilities by adding one model at a time. Fit statistics across several data sets are compared to highlight the advantages and disadvantages of each method, and process flow diagrams that can be used as ensemble templates for SAS® Enterprise Miner™ are presented.
Read the paper (PDF)
Wendy Czika, SAS
Ye Liu, SAS Institute
Session SAS5246-2016:
Enterprise Data Governance across SAS® and Beyond
As Data Management professionals, you have to comply with new regulations and controls. One such regulation is Basel Committee on Banking Supervision (BCBS) 239. To respond to these new demands, you have to put processes and methods in place to automate metadata collection and analysis, and to provide rigorous documentation around your data flows. You also have to deal with many aspects of data management including data access, data manipulation (ETL and other), data quality, data usage, and data consumption, often from a variety of toolsets that are not necessarily from a single vendor. This paper shows you how to use SAS® technologies to support data governance requirements, including third party metadata collection and data monitoring. It highlights best practices such as implementing a business glossary and establishing controls for monitoring data. Attend this session to become familiar with the SAS tools used to meet the new requirements and to implement a more managed environment.
Read the paper (PDF)
Jeff Stander, SAS
F
Session 9260-2016:
FASHION, STYLE "GOTTA HAVE IT" COMPUTE DEFINE BLOCK
Do you create complex reports using PROC REPORT? Are you confused by the COMPUTE BLOCK feature of PROC REPORT? Are you even aware of it? Maybe you already produce reports using PROC REPORT, but suddenly your boss needs you to modify some of the values in one or more of the columns. Maybe your boss needs to see the values of some rows in boldface and others highlighted in a stylish yellow. Perhaps one of the columns in the report needs to display a variety of fashionable formats (some with varying decimal places and some without any decimals). Maybe the customer needs to see a footnote in specific cells of the report. Well, if this sounds familiar, then come take a look at the COMPUTE BLOCK of PROC REPORT. This paper shows a few tips and tricks for using the COMPUTE DEFINE block with conditional IF/THEN logic to make your reports stylish and fashionable. The COMPUTE BLOCK allows you to use DATA step code within PROC REPORT to provide customization and style to your reports. We'll see how the Census Bureau produces a stylish demographic profile for customers of its Special Census program using PROC REPORT with the COMPUTE BLOCK. The paper focuses on how to use the COMPUTE BLOCK to create this stylish Special Census profile. The paper shows quick tips and simple code to handle multiple formats within the same column, make the values in the Total rows boldface, apply traffic lighting, and add footnotes to any cell based on the column or row. The Special Census profile report is an Excel table created with ODS tagsets.ExcelXP that is stylish and fashionable, thanks in part to the COMPUTE BLOCK.
Read the paper (PDF) | Watch the recording
Chris Boniface, Census Bureau
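A minimal sketch of the COMPUTE block features mentioned in the abstract above (traffic lighting and a boldface summary row), using SASHELP.CLASS in place of the Census profile data; it is not the paper's code.

   proc report data=sashelp.class nowd;
      column name sex age height;
      define name   / display;
      define sex    / display;
      define age    / analysis mean format=5.1;
      define height / analysis mean format=6.1;
      rbreak after  / summarize;
      compute height;
         /* traffic lighting on one column */
         if height.mean > 65 then
            call define(_col_, 'style', 'style={background=yellow}');
         /* boldface the summary (Total) row */
         if _break_ = '_RBREAK_' then
            call define(_row_, 'style', 'style={font_weight=bold}');
      endcomp;
   run;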
Session 3940-2016:
Fantasizing about the Big Data of NFL Fantasy Football, or Time to Get a Life
With millions of users and peak traffic of thousands of requests a second for complex user-specific data, fantasy football offers many data design challenges. Not only is there a high volume of data transfers, but the data is also dynamic and of diverse types. We need to process data originating on the stadium playing field and user devices and make it available to a variety of different services. The system must be nimble and must produce accurate and timely responses. This talk discusses the strategies employed by and lessons learned from one of the primary architects of the National Football League's fantasy football system. We explore general data design considerations with specific examples of high availability, data integrity, system performance, and some other random buzzwords. We review some of the common pitfalls facing large-scale databases and the systems using them. And we cover some of the tips and best practices to take your data-driven applications from fantasy to reality.
Read the paper (PDF)
Clint Carpenter, Carpenter Programming
Session 2700-2016:
Forecasting Behavior with Age-Period-Cohort Models: How APC Predicted the US Mortgage Crisis, but Also Does So Much More
We introduce age-period-cohort (APC) models, which analyze data in which performance is measured by age of an account, account open date, and performance date. We demonstrate this flexible technique with an example from a recent study that seeks to explain the root causes of the US mortgage crisis. In addition, we show how APC models can predict website usage, retail store sales, salesperson performance, and employee attrition. We even present an example in which APC was applied to a database of tree rings to reveal climate variation in the southwestern United States.
View the e-poster or slides (PDF)
Joseph Breeden, Prescient Models
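For reference, the textbook form of the APC decomposition (not necessarily the exact parameterization used in the paper) writes the expected performance rate for account age a, vintage (open date) v, and performance date t = v + a as

   \log \lambda(a, v, t) = F(a) + G(t) + H(v), \qquad t = v + a,

so that age, period (calendar time), and cohort (vintage) effects are separated; because t is a linear combination of a and v, the three effects are identifiable only up to a shared linear trend.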
G
Session SAS5501-2016:
Getting There from Here: Lifting Enterprise SAS® to the Amazon Public Cloud
If your organization already deploys one or more software solutions via Amazon Web Services (AWS), you know the value of the public cloud. AWS provides a scalable public cloud with a global footprint, allowing users access to enterprise software solutions anywhere at any time. Although SAS® began long before AWS was even imagined, many loyal organizations driven by SAS are moving their local SAS analytics into the public AWS cloud, alongside other software hosted by AWS. SAS® Solutions OnDemand has assisted organizations in this transition. In this paper, we describe how we extended our enterprise hosting business to AWS. We describe the open-source automation framework on which SAS Solutions OnDemand built our automation stack, which simplified the process of migrating a SAS implementation. We also provide the technical details of our automation and network footprint, a discussion of the technologies we chose along the way, and a list of lessons learned.
Read the paper (PDF)
Ethan Merrill, SAS
Bryan Harkola, SAS
Session 7300-2016:
Graphing Made Easy for Project Management
Project management is a hot topic across many industries, and there are multiple commercial software applications for managing projects available. The reality, however, is that the majority of project management software is not applicable for daily usage. SAS® has a solution for this issue that can be used for managing projects graphically in real time. This paper introduces a new paradigm for project management using the SAS® Graph Template Language (GTL). SAS clients, in real time, can use GTL to visualize resource assignments, task plans, delivery tracking, and project status across multiple project levels for more efficient project management.
Read the paper (PDF)
Zhouming(Victor) Sun, Medimmune
H
Session SAS6640-2016:
How to Create Event Stream Processing Models via a Graphical User Interface
SAS® Event Stream Processing is designed to analyze and process large volumes of streaming data in motion. SAS Event Stream Processing provides a browser-based user interface that enables you to create and test event stream processing models in a visual drag-and-drop environment. This environment delivers a highly interactive and intuitive user experience. This paper describes the visual, interactive interface for building models and monitoring event stream activity. It also provides examples to demonstrate how you can easily build a model using the graphical user interface of SAS Event Stream Processing. In these examples, SAS Event Stream Processing serves as the front end to process high-velocity streams. On the back end, SAS® Real-Time Decision Manager consumes events and makes the final decision to push the best-suited offer to the customer. This paper explains the concepts of windows, retention, edges, and connectors. It also explains how SAS Event Stream Processing integrates with SAS Real-Time Decision Manager.
Read the paper (PDF)
Lei Xiao, SAS
Fang Meng, SAS
Session 9800-2016:
How to Visualize SAS® Data with JavaScript Libraries like HighCharts and D3
Have you ever wondered how to get the most from Web 2.0 technologies in order to visualize SAS® data? How to make those graphs dynamic, so that users can explore the data in a controlled way, without needing prior knowledge of SAS products or data science? Wonder no more! In this session, you learn how to turn basic sashelp.stocks data into a snazzy HighCharts stock chart in which a user can review any time period, zoom in and out, and export the graph as an image. All of these features are delivered with only two DATA steps and one SORT procedure, in 57 lines of SAS code.
Download the data file (ZIP) | View the e-poster or slides (PDF)
Vasilij Nevlev, Analytium Ltd
I
Session SAS5641-2016:
Improve Your Business through Process Mining
Looking for new ways to improve your business? Try mining your own data! Event log data is a side product of information systems generated for audit and security purposes and is seldom analyzed, especially in combination with business data. Along with the cloud computing era, more event log data has been accumulated and analysts are searching for innovative ways to take advantage of all data resources in order to get valuable insights. Process mining, a new field for discovering business patterns from event log data, has recently proved useful for business applications. Process mining shares some algorithms with data mining but it is more focused on interpretation of the detected patterns rather than prediction. Analysis of these patterns can lead to improvements in the efficiency of common existing and planned business processes. Through process mining, analysts can uncover hidden relationships between resources and activities and make changes to improve organizational structure. This paper shows you how to use SAS® Analytics to gain insights from real event log data.
Read the paper (PDF)
Emily (Yan) Gao, SAS
Robert Chu, SAS
Xudong Sun, SAS
Session 8680-2016:
Integrating Microsoft VBScript and SAS®
Microsoft Visual Basic Scripting Edition (VBScript) and SAS® software are each powerful tools in their own right. These two technologies can be combined so that SAS code can call a VBScript program or vice versa. This gives a programmer the ability to automate SAS tasks; traverse the file system; send emails programmatically via Microsoft Outlook or SMTP; manipulate Microsoft Word, Microsoft Excel, and Microsoft PowerPoint files; get web data; and more. This paper presents example code to demonstrate each of these capabilities.
Read the paper (PDF) | Download the data file (ZIP)
Christopher Johnson, BrickStreet Insurance
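As a hedged sketch of the SAS-to-VBScript direction described above (not the paper's code), the X statement or an unnamed pipe can launch a script with cscript.exe; the paths are placeholders, and the session must allow XCMD.

   options noxwait;

   /* fire and forget */
   x 'cscript //nologo "C:\scripts\refresh_workbook.vbs"';

   /* or capture the script's console output back into SAS */
   filename vbsout pipe 'cscript //nologo "C:\scripts\refresh_workbook.vbs"';
   data work.vbs_log;
      infile vbsout truncover;
      input message $200.;
   run;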
K
Session 7140-2016:
Key Requirements For SAS® Grid Users
Considering that SAS® Grid Manager is becoming more and more popular, it is important to fulfill users' needs for a successful migration to a SAS® Grid environment. This paper focuses on key requirements and common issues for new SAS Grid users, especially those coming from a traditional environment. It describes a few common requirements, such as the need for a current working directory, changes to file system navigation in SAS® Enterprise Guide® with user-specified locations, and getting a job execution summary email. The GRIDWORK directory has been introduced in SAS Grid Manager, and it is a bit different from the traditional SAS WORK location. This paper explains how you can use the GRIDWORK location in a more user-friendly way. Sometimes users experience data set size differences during grid migration; a few important reasons for these differences are demonstrated. We also demonstrate how to create new custom scripts to meet business needs and how to incorporate them with the SAS Grid Manager engine.
Read the paper (PDF) | View the e-poster or slides (PDF)
Piyush Singh, TATA Consultancy Services Ltd
Tanuj Gupta, TATA Consultancy Services
Prasoon Sangwan, Tata Consultancy Services Limited
L
Session SAS6447-2016:
Linear State Space Models in Retail and Hospitality
Retailers need critical information about the expected inventory pattern over the life of a product to make pricing, replenishment, and staffing decisions. Hotels rely on booking curves to set rates and staffing levels for future dates. This paper explores a linear state space approach to understanding these industry challenges, applying the SAS/ETS® SSM procedure. We also use the SAS/ETS SIMILARITY procedure to provide additional insight. These advanced techniques help us quantify the relationship between the current inventory level and all previous inventory levels (in the retail case). In the hospitality example, we can evaluate how current total bookings relate to historical booking levels. Applying these procedures can produce valuable new insights about the nature of the retail inventory cycle and the hotel booking curve.
Read the paper (PDF)
Beth Cubbage, SAS
Session SAS4060-2016:
Location, Location, Location--Analytics with SAS® Visual Analytics and Esri
Business Intelligence users analyze business data in a variety of ways. Seventy percent of business data contains location information. For in-depth analysis, it is essential to combine location information with mapping. New analytical capabilities are added to SAS® Visual Analytics, leveraging the new partnership with Esri, a leader in location intelligence and mapping. The new capabilities enable users to enhance the analytical insights from SAS Visual Analytics. This paper demonstrates and discusses the new partnership with Esri and the new capabilities added to SAS Visual Analytics.
Read the paper (PDF)
Murali Nori, SAS
Himesh Patel, SAS
M
Session 5580-2016:
Macro Variables in SAS® Enterprise Guide®
For SAS® Enterprise Guide® users, sometimes macro variables and their values need to be brought over to the local workspace from the server, especially when multiple data sets or outputs need to be written to separate files in a local drive. Manually retyping the macro variables and their values in the local workspace after they have been created on the server workspace would be time-consuming and error-prone, especially when we have quite a number of macro variables and values to bring over. Instead, this task can be achieved in an efficient manner by using dictionary tables and the CALL SYMPUT routine, as illustrated in more detail below. The same approach can also be used to bring macro variables and their values from the local to the server workspace.
Read the paper (PDF) | Download the data file (ZIP) | Watch the recording
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
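A minimal sketch of the idea in the abstract above (not the paper's code): DICTIONARY.MACROS lists the macro variables on the workspace where they exist, and CALL SYMPUTX re-creates them on the other workspace after the small table is copied across. The MYPRJ prefix is a made-up example.

   proc sql noprint;
      create table work.mvars as
      select name, value
      from dictionary.macros
      where scope = 'GLOBAL' and upcase(name) like 'MYPRJ%';
   quit;

   /* after WORK.MVARS is available in the target workspace */
   data _null_;
      set work.mvars;
      call symputx(name, value, 'G');
   run;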
Session 10460-2016:
Missing Values: They Are NOT Nothing
When analyzing data with SAS®, we often encounter missing or null values in data. Missing values can arise from the availability, collectibility, or other issues with the data. They represent the imperfect nature of real data. Under most circumstances, we need to clean, filter, separate, impute, or investigate the missing values in data. These processes can take up a lot of time, and they are annoying. For these reasons, missing values are usually unwelcome and need to be avoided in data analysis. There are two sides to every coin, however. If we can think outside the box, we can take advantage of the negative features of missing values for positive uses. Sometimes, we can create and use missing values to achieve our particular goals in data manipulation and analysis. These approaches can make data analyses convenient and improve work efficiency for SAS programming. This kind of creative and critical thinking is the most valuable quality for data analysts. This paper exploits real-world examples to demonstrate the creative uses of missing values in data analysis and SAS programming, and discusses the advantages and disadvantages of these methods and approaches. The illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
Read the paper (PDF)
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
N
Session 10360-2016:
Nine Frequently Asked Questions about Getting Started with SAS® Visual Analytics
You've heard all the talk about SAS® Visual Analytics--but maybe you are still confused about how the product would work in your SAS® environment. Many customers have the same points of confusion about what they need to do with their data, how to get data into the product, how SAS Visual Analytics would benefit them, and even whether they should be considering Hadoop or the cloud. In this paper, we cover the questions we are asked most often about implementation, administration, and usage of SAS Visual Analytics.
Read the paper (PDF) | Watch the recording
Tricia Aanderud, Zencos Consulting LLC
Ryan Kumpfmiller, Zencos Consulting
Nick Welke, Zencos Consulting
O
Session 10640-2016:
Optimizing Airline Pilot Connection Time Using PROC REG and PROC LOGISTIC
As any airline traveler knows, connection time is a key element of the travel experience. A tight connection time can cause angst and concern, while a lengthy connection time can introduce boredom and a longer than desired travel time. The same elements apply when constructing schedules for airline pilots. Like passengers, pilot schedules are built with connections. Delta Air Lines operates a hub and spoke system that feeds both passengers and pilots from the spoke stations and connects them through the hub stations. Pilot connection times that are tight can result in operational disruptions, whereas extended pilot connection times are inefficient and unnecessarily costly. This paper demonstrates how Delta Air Lines used SAS® PROC REG and PROC LOGISTIC to analyze historical data in order to build operationally robust and financially responsible pilot connections.
Read the paper (PDF)
Andy Hummel, Delta Air Lines
P
Session 7540-2016:
PROC SQL for SQL DieHards
Inspired by Christianna Williams's paper on transitioning to PROC SQL from the DATA step, this paper aims to help SQL programmers transition to SAS® by using PROC SQL. SAS adapted the Structured Query Language (SQL) by means of PROC SQL back in SAS® 6. PROC SQL syntax closely resembles SQL. However, there are some SQL features that are not available in SAS. Throughout this paper, we outline common SQL tasks and how they might differ in PROC SQL. We also introduce useful SAS features that are not available in SQL. Topics covered are appropriate for novice SAS users.
Read the paper (PDF)
Barbara Ross, NA
Jessica Bennett, Snap Finance
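Two small PROC SQL conveniences of the kind the paper covers, shown here as an illustrative sketch rather than the paper's own examples: the CALCULATED keyword (not part of ANSI SQL) and SELECT ... INTO : for creating macro variables.

   proc sql;
      select name,
             height*2.54 as height_cm,
             weight*0.45 as weight_kg,
             calculated weight_kg / ((calculated height_cm/100)**2) as bmi
      from sashelp.class;

      select count(*) into :n_students trimmed
      from sashelp.class;
   quit;

   %put NOTE: sashelp.class has &n_students rows.;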
Session 2480-2016:
Performing Pattern Matching by Using Perl Regular Expressions
SAS® software provides many DATA step functions that search and extract patterns from a character string, such as SUBSTR, SCAN, INDEX, TRANWRD, etc. Using these functions to perform pattern matching often requires you to use many function calls to match a character position. However, using the Perl regular expression (PRX) functions or routines in the DATA step improves pattern-matching tasks by reducing the number of function calls and making the program easier to maintain. This talk, in addition to discussing the syntax of Perl regular expressions, demonstrates many real-world applications.
Read the paper (PDF) | Download the data file (ZIP)
Arthur Li, City of Hope
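An illustrative sketch (not from the paper) of the pattern: one compiled Perl regular expression extracts the pieces of a phone number and normalizes it, replacing what would otherwise be a chain of INDEX/SUBSTR/SCAN calls. The phone-number format is just an example.

   data work.phones;
      infile datalines truncover;
      input text $char60.;
      retain re;
      if _n_ = 1 then re = prxparse('/\((\d{3})\)\s*(\d{3})-(\d{4})/');
      if prxmatch(re, text) then do;
         area     = prxposn(re, 1, text);   /* capture buffer 1 */
         exchange = prxposn(re, 2, text);
         line     = prxposn(re, 3, text);
      end;
      /* rewrite every match in the string */
      clean = prxchange('s/\((\d{3})\)\s*(\d{3})-(\d{4})/$1-$2-$3/', -1, text);
      datalines;
   Call (919) 555-0123 before noon
   No phone number here
   ;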
Session 7560-2016:
Processing CDC and SCD Type 2 for Sources without CDC: A Hybrid Approach
In a data warehousing system, change data capture (CDC) plays an important part not just in making the data warehouse (DWH) aware of a change but also in providing a means of flowing the change to the DWH marts and reporting tables so that we see the current and latest version of the truth. This, together with slowly changing dimensions (SCD), creates a cycle that runs the DWH, providing valuable insights into history and supporting future decision making. But what if the source has no CDC? It would be an ETL nightmare to identify the exact change and report the absolute truth. If these two processes can be combined into a single process, where one transform does both jobs of identifying the change and applying it to the DWH, then we can save significant processing time and valuable system resources. Hence, I came up with a hybrid SCD-with-CDC approach. My paper focuses on sources that do NOT provide CDC and on how to perform SCD Type 2 on such records without worrying about data duplication and increased processing times.
Read the paper (PDF) | Watch the recording
Vishant Bhat, University of Newcastle
Tony Blanch, SAS Consultant
Session 10481-2016:
Product Purchase Sequence Analyses by Using a Horizontal Data Sorting Technique
Horizontal data sorting is a very useful SAS® technique in advanced data analysis when you are using SAS programming. Two years ago (SAS® Global Forum Paper 376-2013), we presented and illustrated various methods and approaches to perform horizontal data sorting, and we demonstrated its valuable application in strategic data reporting. However, this technique can also be used as a creative analytic method in advanced business analytics. This paper presents and discusses its innovative and insightful applications in product purchase sequence analyses such as product opening sequence analysis, product affinity analysis, next best offer analysis, time-span analysis, and so on. Compared to other analytic approaches, the horizontal data sorting technique has the distinct advantages of being straightforward, simple, and convenient to use. This technique also produces easy-to-interpret analytic results. Therefore, the technique can have a wide variety of applications in customer data analysis and business analytics fields.
Read the paper (PDF) | View the e-poster or slides (PDF)
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
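A minimal sketch of the core horizontal-sorting step (hypothetical variable names, not the paper's code): CALL SORTC orders the values of a set of variables within each row, after which purchase-sequence or affinity comparisons become straightforward.

   data work.sorted_rows;
      input cust_id $ prod1 $ prod2 $ prod3 $ prod4 $;
      /* sort the four product variables within the row, in place */
      call sortc(of prod1-prod4);
      datalines;
   C001 savings credit loan chequing
   C002 credit savings chequing loan
   ;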
Session 12526-2016:
Proposing a Recommendation System for DonorsChoose.org
DonorsChoose.org is a nonprofit organization that allows individuals to donate directly to public school classroom projects. Teachers from public schools post a request for funding a project with a short essay describing it. Donors all around the world can look at these projects when they log in to DonorsChoose.org and donate to projects of their choice. The idea is to have a personalized recommendation web page for every donor, showing the projects that they prefer, like, and love to donate to. Implementing a recommender system for the DonorsChoose.org website will improve the user experience and help more projects meet their funding goals. It will also help us understand donors' preferences and deliver to them what they want or value. One type of recommendation system can be designed by predicting projects that are less likely to meet funding goals, segmenting and profiling the donors, and using that information to recommend the right projects when donors log in to DonorsChoose.org.
Read the paper (PDF)
Heramb Joshi, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Sandeep Chittoor, Student
Vignesh Dhanabal, Oklahoma State University
Sharat Dwibhasi, Oklahoma State University
R
Session SAS6363-2016:
REST at Ease with SAS®: How to Use SAS to Get Your REST
Representational State Transfer (REST) is being used across the industry for designing networked applications to provide lightweight and powerful alternatives to web services such as SOAP and Web Services Description Language (WSDL). Since REST is based entirely on HTTP, SAS® provides everything you need to make REST calls and to process structured and unstructured data alike. This paper takes a look at how some enhancements in the third maintenance release of SAS® 9.4 can benefit you in this area. Learn how the HTTP procedure and other SAS language features provide everything you need to simply and securely use REST.
Read the paper (PDF)
Joseph Henry, SAS
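A minimal sketch of a REST GET with PROC HTTP in SAS 9.4 (the URL is a public placeholder, and the response is simply echoed to the log):

   filename resp temp;

   proc http
      url="https://httpbin.org/get"
      method="GET"
      out=resp;
      headers "Accept"="application/json";
   run;

   data _null_;
      infile resp;
      input;
      put _infile_;
   run;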
Session SAS6444-2016:
Rapid Prototyping: Accelerating Development of Your Organization's Reports Using SAS® Visual Analytics
One of the most important factors driving the success of requirements-gathering can be easily overlooked. Your user community needs to have a clear understanding of what is possible: from different ways to represent a hierarchy, to how visualizations can drive an analysis, to newer, but less common, visualizations that are quickly becoming standard. Discussions about desktop access versus mobile deployment and/or which users might need more advanced statistical reporting can lead to a serious case of option overload. One of the best cures for option overload is to provide your user community with access to template reports they can explore themselves. In this paper, we describe how you can take a single rich data set and build a set of template reports that demonstrate the full functionality of SAS® Visual Analytics: a suite of the most common, most useful SAS Visual Analytics report structures, from high-level dashboards to statistically deep dynamic visualizations. We show exactly how to build a dozen template reports from a single data source, simultaneously representing options for color schemes, themes, and other choices to consider. Although this template suite approach can apply to any industry, our example data set is publicly available data from the Home Mortgage Disclosure Act: de-identified data on mortgage loan determinations. Instead of beginning requirements-gathering with a blank slate, your users can begin the conversation with, "I would like something like Template #4," greatly reducing the time and effort required to meet their needs.
Read the paper (PDF)
Elliot Inman, SAS
Michael Drutar, SAS
Session 10401-2016:
Responsible Gambling Model at Veikkaus
Our company, Veikkaus, is a state-owned gambling and lottery company in Finland that has a national legalized monopoly on gambling. All the profit we make goes back to Finnish society (for art, sports, science, and culture), and this distribution is handled by our government. In addition to the government's profit requirements, the state (Finland) also requires us to handle the adverse social aspects of gaming, such as problem gambling. The challenge in our business is to balance these two factors. To address problem gambling, we have used SAS® tools to create a responsible gaming tool, called VasA, based on a logistic regression model. The name VasA is derived from the Finnish words for 'Responsible Customership.' The model identifies problem gamblers from our customer database using data from identified gaming, money transfers, web behavior, and customer records. The variables used in the model are based on the theory behind problem gambling. Our actions for problem gambling include, for example, different CRM activities and personalization of a customer's website in our web service. Several companies offered responsible gambling tools for us to buy, but we wanted to create our own for two reasons. First, we wanted it to cover our whole customer database, meaning all our customers and not just those who chose to take part; the off-the-shelf tools normally include only customers who opt in. Second, we saved a ridiculous amount of money by building it ourselves compared to buying one. During this process, SAS played a big role, from gathering the data to the construction of the tool, from modeling to creating the VasA variables, then on to the database, and finally to the analyses and reporting.
Read the paper (PDF)
Tero Kallioniemi, Veikkaus
S
Session SAS5880-2016:
SAS® Mobile Analytics: Accelerate Analytical Insights on the Go
Mobile devices are an integral part of a business professional's life. These mobile devices are getting increasingly powerful in terms of processor speeds and memory capabilities. Business users can benefit from a more analytical visualization of the data along with their business context. The new SAS® Mobile BI contains many enhancements that facilitate the use of SAS® Analytics in the newest version of SAS® Visual Analytics. This paper demonstrates how to use the new analytical visualization that has been added to SAS Mobile BI from SAS Visual Analytics, for a richer and more insightful experience for business professionals on the go.
Read the paper (PDF)
Murali Nori, SAS
Session 11802-2016:
Sentiment Analysis of User Reviews for Hotels
With the ever-growing global tourism sector, the hotel industry is also flourishing by leaps and bounds, and tourists' expectations of quality service are also increasing. In today's cyber age, where so many people are part of the online community, word of mouth has steadily increased in influence over time. According to a recent survey, approximately 46% of travelers look for online reviews before traveling. A one-star rating by users influences peer consumers more than the five-star brand that the respective hotel markets itself to be. In this paper, we segment customers and create clusters based on their reviews. The next step is the creation of an online survey targeting the interests of the customers in those different clusters, which helps hoteliers identify guests' interests and their hospitality expectations from a hotel. For this paper, we use a data set of 4,096 different hotel reviews, with a total of 60,239 user reviews.
View the e-poster or slides (PDF)
Saurabh Nandy, Oklahoma State University
Neha Singh, Oklahoma State University
T
Session SAS2560-2016:
Ten Tips to Unlock the Power of Hadoop with SAS®
This paper discusses a set of practical recommendations for optimizing the performance and scalability of your Hadoop system using SAS®. Topics include recommendations gleaned from actual deployments from a variety of implementations and distributions. Techniques cover tips for improving performance and working with complex Hadoop technologies such as Kerberos, techniques for improving efficiency when working with data, methods to better leverage the SAS in Hadoop components, and other recommendations. With this information, you can unlock the power of SAS in your Hadoop system.
Read the paper (PDF)
Nancy Rausch, SAS
Wilbram Hazejager, SAS
Session 3980-2016:
Text Analytics and Brand Topic Maps
In this session, we examine the use of text analytics as a tool for strategic analysis of an organization, a leader, a brand, or a process. The software solutions SAS® Enterprise Miner™ and Base SAS® are used to extract topics from text and create visualizations that identify the relative placement of the topics next to various business entities. We review a number of case studies that identify and visualize brand-topic relationships in the context of branding, leadership, and service quality.
Read the paper (PDF) | Watch the recording
Nicholas Evangelopoulos, University of North Texas
Session 12489-2016:
The Application of Fatality Analysis Reporting System Data on the Road Safety Education of US, DC, and PR Minors
All public schools in the United States require health and safety education for their students. Furthermore, almost all states require driver education before minors can obtain a driver's license. Through extensive analysis of the Fatality Analysis Reporting System data, we have concluded that from 2011-2013 an average of 12.1% of all individuals killed in a motor vehicle accident in the United States, District of Columbia, and Puerto Rico were minors (18 years or younger). Our goal is to offer insight from our analysis in order to improve road safety education and prevent future premature deaths involving motor vehicles.
Read the paper (PDF)
Molly Funk, Bryant University
Max Karsok, Bryant University
Michelle Williams, Bryant University
Session 7120-2016:
The Combination of SAS® and VBA Makes Life Easier
VBA has been described as a glue language and has been widely used to exchange data between Microsoft products such as Excel, Word, and PowerPoint. How to trigger a VBA macro from SAS® via DDE has been widely discussed in recent years. However, using SAS to send parameters to a VBA macro has seldom been reported. This paper provides a solution for this problem. Copying Excel tables to PowerPoint using the combination of SAS and VBA is illustrated as an example. The SAS program rapidly scans all Excel files contained in one folder, passes the file information to VBA as parameters, and triggers the VBA macro to write PowerPoint files in a loop. As a result, a batch of PowerPoint files can be generated with just one mouse click.
Read the paper (PDF) | Watch the recording
Zhu Yanrong, Medtronic
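A hedged sketch of the general pattern (not the paper's program): SAS writes the parameters to a text file, then uses a DDE link to an already-running Excel session to trigger a VBA macro that reads that file. Paths, file names, and the macro name are placeholders.

   /* 1. write the parameters the VBA macro will read */
   data _null_;
      file 'C:\temp\vba_params.txt';
      put 'C:\temp\workbook1.xlsx';
      put 'C:\temp\slides1.pptx';
   run;

   /* 2. tell the open Excel session to run the macro (Excel must be running) */
   filename xlcmds dde 'excel|system';
   data _null_;
      file xlcmds;
      put '[RUN("PERSONAL.XLSB!CopyTablesToPowerPoint")]';
   run;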
Session SAS6477-2016:
The Optimization of the Optimal Customer
For marketers who are responsible for identifying the best customer to target in a campaign, it is often daunting to determine which media channel, offer, or campaign program is the one the customer is more apt to respond to, and therefore, is more likely to increase revenue. This presentation examines the components of designing campaigns to identify promotable segments of customers and to target the optimal customers using SAS® Marketing Automation integrated with SAS® Marketing Optimization.
Read the paper (PDF)
Pamela Dixon, SAS
Session 7020-2016:
Three Methods to Dynamically Assign Colors to Plots Based on Group Value
Specifying colors based on group value is quite a popular practice in visualizing data, but it is not so easy to do, especially when there are multiple group values. This paper explores three different methods to dynamically assign colors to plots based on their group values: combining the EVAL and IFN functions in the plot statements, bringing the DISCRETEATTRMAP block into the plot statements, and using the macro from SAS® Sample 40255.
Read the paper (PDF) | Watch the recording
Amos Shu, MedImmune
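A sketch of the DISCRETEATTRMAP approach mentioned in the abstract above, using SASHELP.CLASS as stand-in data: the map ties each group value to a fixed color, independent of the order in which the values appear in the data.

   proc template;
      define statgraph groupcolors;
         begingraph;
            discreteattrmap name="sexcolors" / ignorecase=true;
               value "F" / markerattrs=(color=purple symbol=circlefilled);
               value "M" / markerattrs=(color=green  symbol=trianglefilled);
            enddiscreteattrmap;
            discreteattrvar attrvar=sexmarkers var=sex attrmap="sexcolors";
            layout overlay;
               scatterplot x=height y=weight / group=sexmarkers;
            endlayout;
         endgraph;
      end;
   run;

   proc sgrender data=sashelp.class template=groupcolors;
   run;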
U
Session SAS6660-2016:
Using Metadata Queries To Build Row-Level Audit Reports in SAS® Visual Analytics
Sensitive data requires elevated security and the flexibility to apply logic that subsets data based on user privileges. Following the instructions in SAS® Visual Analytics: Administration Guide gives you the ability to apply row-level permission conditions. After you have set the permissions, you have to prove through audits who has access and which row-level security conditions apply. This paper provides you with the ability to easily apply, validate, report, and audit all tables that have row-level permissions, along with the groups, users, and conditions that are applied. Take the hours of maintenance and lack of visibility out of row-level secure data and build confidence in the data and analytics that are provided to the enterprise.
Read the paper (PDF) | Download the data file (ZIP)
Brandon Kirk, SAS
Session 5581-2016:
Using PROC TABULATE and LAG(n) Function for Rates of Change
For SAS® users, PROC TABULATE and PROC REPORT (and its compute blocks) are probably among the most common procedures for calculating and displaying data. It is, however, pretty difficult to calculate and display changes from one column to another using data from other rows with just these two procedures. Compute blocks in PROC REPORT can calculate additional columns, but it would be challenging to pick up values from other rows as inputs. This presentation shows how PROC TABULATE can work with the lag(n) function to calculate rates of change from one period of time to another. This offers the flexibility of feeding into calculations the data retrieved from other rows of the report. PROC REPORT is then used to produce the desired output. The same approach can also be used in a variety of scenarios to produce customized reports.
Read the paper (PDF) | Download the data file (ZIP) | Watch the recording
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
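A compact sketch of the idea (SASHELP.AIR as stand-in data, not the paper's code): LAG() builds the prior-period value in a DATA step, and PROC TABULATE then displays the level and the period-over-period rate of change side by side.

   data work.air_chg;
      set sashelp.air;
      prior = lag(air);
      if prior ne . then pct_change = (air - prior) / prior;
      year = year(date);
   run;

   proc tabulate data=work.air_chg;
      class year;
      var air pct_change;
      table year, air*mean*f=8. pct_change*mean*f=percent8.1;
   run;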
W
Session SAS2400-2016:
What's New in SAS® Data Management
The latest releases of SAS® Data Integration Studio, SAS® Data Management Studio and SAS® Data Integration Server, SAS® Data Governance, and SAS/ACCESS® software provide a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types, including Hadoop, cloud, RDBMS, files, unstructured data, and streaming, and the ability to perform ETL and ELT transformations in diverse run-time environments including SAS®, database systems, Hadoop, Spark, SAS® Analytics, cloud, and data virtualization environments. There are also new capabilities for lineage, impact analysis, clustering, and other data governance features, along with enhancements to master data and metadata management support. This paper provides an overview of the latest features of the SAS® Data Management product suite and includes use cases and examples for leveraging product capabilities.
Read the paper (PDF)
Nancy Rausch, SAS
Session SAS5520-2016:
When the Answer to Public or Private Is Both: Managing a Hybrid Cloud Environment
For many organizations, the answer to whether to manage their data and analytics in a public or private cloud is going to be both. Both can be the answer for many different reasons: common sense logic not to replace a system that already works just to incorporate something new; legal or corporate regulations that require some data, but not all data, to remain in place; and even a desire to provide local employees with a traditional data center experience while providing remote or international employees with cloud-based analytics easily managed through software deployed via Amazon Web Services (AWS). In this paper, we discuss some of the unique technical challenges of managing a hybrid environment, including how to monitor system performance simultaneously for two different systems that might not share the same infrastructure or even provide comparable system monitoring tools; how to manage authorization when access and permissions might be driven by two different security technologies that make implementation of a singular protocol problematic; and how to ensure overall automation of two platforms that might be independently automated, but not originally designed to work together. In this paper, we share lessons learned from a decade of experience implementing hybrid cloud environments.
Read the paper (PDF)
Ethan Merrill, SAS
Bryan Harkola, SAS
Session 9440-2016:
Who's Your Neighbor? A SAS® Algorithm for Finding Nearby Zip Codes
Even if you're not a GIS mapping pro, it pays to have some geographic problem-solving techniques in your back pocket. In this paper we illustrate a general approach to finding the closest location to any given US zip code, with a specific, user-accessible example of how to do it, using only Base SAS®. We also suggest a method for implementing the solution in a production environment, as well as demonstrate how parallel processing can be used to cut down on computing time if there are hardware constraints.
Read the paper (PDF) | Download the data file (ZIP)
Andrew Clapson, MD Financial Management
Annmarie Smith, HomeServe USA
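A compact brute-force sketch of the idea (one target zip code checked against the SASHELP.ZIPCODE lookup table); the paper's production version adds efficiency and parallel-processing considerations, and the target zip code here is arbitrary.

   %let target_zip = 27513;

   proc sql;
      create table work.nearest as
      select b.zip,
             b.city,
             b.statecode,
             geodist(a.y, a.x, b.y, b.x, 'DM') as miles
      from sashelp.zipcode as a,
           sashelp.zipcode as b
      where a.zip = &target_zip and b.zip ne a.zip
      order by miles;
   quit;

   proc print data=work.nearest(obs=5);
   run;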
Y
Session 10600-2016:
You Can Bet on It: Missing Observations Are Preserved with the PRELOADFMT and COMPLETETYPES Options
Do you write reports that sometimes have missing categories across all class variables? Some programmers write all sorts of additional DATA step code in order to show the zeros for the missing rows or columns. Did you ever wonder whether there is an easier way to accomplish this? PROC MEANS and PROC TABULATE, in conjunction with PROC FORMAT, can handle this situation with a couple of powerful options. With PROC TABULATE, we can use the PRELOADFMT and PRINTMISS options in conjunction with a user-defined format in PROC FORMAT to accomplish this task. With PROC SUMMARY, we can use the COMPLETETYPES option to get all the rows with zeros. This paper uses examples from Census Bureau tabulations to illustrate the use of these procedures and options to preserve missing rows or columns.
Read the paper (PDF) | Watch the recording
Chris Boniface, Census Bureau
Janet Wysocki, U.S. Census Bureau
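A minimal sketch of both options with made-up data (not the Census tabulations): a format lists every expected category, PROC TABULATE shows the empty ones with PRELOADFMT and PRINTMISS, and PROC SUMMARY keeps them with COMPLETETYPES.

   proc format;
      value $regfmt 'N' = 'North'  'S' = 'South'
                    'E' = 'East'   'W' = 'West';
   run;

   data work.sales;
      input region $ amount;
      datalines;
   N 100
   N 250
   S 300
   ;

   proc tabulate data=work.sales;
      class region / preloadfmt;
      var amount;
      format region $regfmt.;
      table region, amount*(n sum) / printmiss;
   run;

   proc summary data=work.sales completetypes nway;
      class region / preloadfmt;
      var amount;
      format region $regfmt.;
      output out=work.sales_sum sum=total_amount;
   run;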