Hadoop Papers A-Z

E
Session 1049-2017:
Exploiting Competitor Data Using SAS/ACCESS® Interface to Hadoop
The British Airways (BA) revenue management team is responsible for surfacing prices made available in the market with the objective of maximizing revenue from our 40,000,000 passenger journeys. BA is currently working to understand how competitor data can be exploited to help facilitate better decision making. Due to the low level of aggregation, competitor data is too large (and consequently too expensive) to store on conventional relational databases. Therefore, it has been stored on a small Hadoop installation at BA. Thanks to SAS/ACCESS® Interface to Hadoop, we have been able to run our complex algorithms on these large data sets without changing the way we work and whilst exploiting the full capabilities of SAS®.
Read the paper (PDF)
Kayne Putman, British Airways
F
Session SAS0491-2017:
From Source to Target: Hadoop Capabilities of SAS® Data Integration Studio
This paper demonstrates how to use the capabilities of SAS® Data Integration Studio to extract, load, and transform your data within a Hadoop environment. A sample use case illustrates which transformations can be used in each layer of the ELT process and describes the functionality of each. The use case steps through the process from source to target.
Read the paper (PDF)
Darryl Yewchin, SAS
Todd Foreman, SAS
H
Session 2001-2017:
Hands-On Workshop: SAS® Data Loader for Hadoop
This workshop provides hands-on experience with some basic functionality of SAS Data Loader for Hadoop. You will learn how to copy data to Hadoop, profile data in Hadoop, and cleanse data in Hadoop.
Kari Richardson, SAS
Session SAS0378-2017:
How SAS® Customers Are Using Hadoop: Year in Review
Another year has passed implementing, validating, securing, optimizing, migrating, and adopting the Hadoop platform. What have been the top 10 accomplishments with Hadoop seen over the last year? We also review issues, concerns, and resolutions from the past year. We discuss where implementations are and some best practices for moving forward with Hadoop and SAS® releases.
Read the paper (PDF)
Howard Plemmons, SAS
Mauro Cazzari, SAS
I
Session 1117-2017:
Introduction to Configuring and Managing SAS® Grid Manager for Hadoop
How can we run traditional SAS® jobs, including SAS® Workspace Servers, on Hadoop worker nodes? The answer is SAS® Grid Manager for Hadoop, which is integrated with the Hadoop ecosystem to provide resource management, high availability, and enterprise scheduling for SAS customers. This paper provides an introduction to the architecture, configuration, and management of SAS Grid Manager for Hadoop. Anyone involved with SAS and Apache Hadoop should find the information in this paper useful. The first area covered is a breakdown of each required SAS and Hadoop component. From the Hadoop ecosystem, we define the role of Hadoop YARN, Hadoop Distributed File System (HDFS) storage, and Hadoop client services. We review SAS metadata definitions for SAS Grid Manager, SAS® Object Spawner, and SAS® Workspace Servers. We cover required Kerberos security, as well as SAS® Enterprise Guide® and the SAS® Grid Manager Client Utility. YARN queues and the SAS Grid Policy file for optimizing job scheduling are also reviewed. Finally, we discuss traditional SAS math running on a Hadoop worker node, and how it can take advantage of high-performance math to accelerate job execution. By leveraging SAS Grid Manager for Hadoop, sites are moving SAS jobs inside a Hadoop cluster. This will ultimately cut down on data movement and provide more consistent job execution. Although this paper is written for SAS and Hadoop administrators, SAS users can also benefit from this session.
Read the paper (PDF)
Mark Lochbihler, Hortonworks
S
Session SAS0488-2017:
SAS® and Hadoop: The 6th Annual State of the Union
The fourth maintenance release for SAS® 9.4 and the new SAS® Viya platform bring even more progress with respect to the interoperability between SAS® and Hadoop, the industry standard for big data. This talk brings you up-to-date with where we are: more distributions, more data types, more options, and then there is the cloud. Come and learn about the exciting new developments for blending your SAS processing with your shared Hadoop cluster.
Read the paper (PDF)
Paul Kent, SAS
T
Session SAS0190-2017:
Ten Tips to Unlock the Power of Hadoop with SAS®
This paper discusses a set of practical recommendations for optimizing the performance and scalability of your Hadoop system using SAS®. The recommendations are gleaned from actual deployments across a variety of implementations and distributions. Topics include tips for improving performance and working with complex Hadoop technologies such as Kerberos, techniques for improving efficiency when working with data, methods to better leverage the SAS in Hadoop components, and other recommendations. With this information, you can unlock the power of SAS in your Hadoop system.
Read the paper (PDF)
Wilbram Hazejager, SAS
Nancy Rausch, SAS