SAS Global Forum 2015 Proceedings

In the very near future you will likely encounter Hadoop. It is rapidly displacing database management systems in the corporate world and is rearing its head in the SAS^® world. If you think now is the time to learn how to use SAS with Hadoop, you are in luck. This workshop is the jump start you need. This workshop introduces Hadoop and shows you how to access it by using SAS/ACCESS^® Interface to Hadoop. During the workshop, we show you how to do the following: how to configure your SAS environment so that you can access Hadoop data; how to use the Hadoop FILENAME statement; how to use the HADOOP procedure; and how to use SAS/ACCESS Interface to Hadoop (including performance tuning).

So you are still writing SAS^® DATA steps and SAS macros and running them through a command-line scheduler. When work comes in, there is only one person who knows that code, and they are out--what to do? This paper shows how SAS applies extract, transform, load (ETL) modernization techniques with SAS^® Data Integration Studio to gain resource efficiencies and to break down the ETL black box. We are going to share the fundamentals (metadata foldering and naming standards) that ensure success, along with steps to ease into the pool while iteratively gaining benefits. Benefits include self-documenting code visualization, impact analysis on jobs and tables impacted by change, and being supportable by interchangeable bench resources. We conclude with demonstrating how SAS^® Visual Analytics is being used to monitor service-level agreements and provide actionable insights into job-flow performance and scheduling.

Read the paper (PDF).

Building and maintaining a data warehouse can require a complex series of jobs. Having an ETL flow that is reliable and well integrated is one big challenge. An ETL process might need some pre- and post-processing operations on the database to be well integrated and reliable. Some might handle this via maintenance windows. Others like us might generate custom transformations to be included in SAS^® Data Integration Studio jobs. Custom transformations in SAS Data Integration Studio can be used to speed ETL process flows and reduce the database administrator's intervention after ETL flows are complete. In this paper, we demonstrate the use of custom transformations in SAS Data Integration Studio jobs to handle database-specific tasks for improving process efficiency and reliability in ETL flows.

Read the paper (PDF).

In this session, we discuss the advantages of SAS^® Federation Server and how it makes it easier for business users to access secure data for reports and use analytics to drive accurate decisions. This frees up IT staff to focus on other tasks by giving them a simple method of sharing data using a centralized, governed, security layer. SAS Federation Server is a data server that provides scalable, threaded, multi-user, and standards-based data access technology in order to process and seamlessly integrate data from multiple data repositories. The server acts as a hub that provides clients with data by accessing, managing, and sharing data from multiple relational and non-relational data sources as well as from SAS^® data. Users can view data in big data sources like Hadoop, SAP HANA, Netezza, or Teradata, and blend them with existing database systems like Oracle or DB2. Security and governance features, such as data masking, ensure that the right users have access to the data and reduce the risk of exposure. Finally, data services are exposed via a REST API for simpler access to data from third-party applications.

Read the paper (PDF).

Organizations are loading data into Hadoop platforms at an extraordinary rate. However, in order to extract value from these platforms, the data must be prepared for analytic exploit. As the volume of data grows, it becomes increasingly more important to reduce data movement, as well as to leverage the computing power of these distributed systems. This paper provides a cursory overview of SAS^® Data Loader, a product specifically aimed at these challenges. We cover the underlying mechanisms of how SAS Data Loader works, as well as how it's used to profile, cleanse, transform, and ultimately prepare data for analytics in Hadoop.

Read the paper (PDF).

Well, Hadoop community, now that you have your data in Hadoop, how are you staging your analytical base tables? In my discussions with clients about this, we all agree on one thing: Data sizes stored in Hadoop prevent us from moving that data to a different platform in order to generate the analytical base tables. To address this dilemma, I want to introduce to you the SAS^® In-Database Code Accelerator for Hadoop.

Read the paper (PDF).

It is a common task to reshape your data from long to wide for the purpose of reporting or analytical modeling and PROC TRANSPOSE provides a convenient way to accomplish this. However, when performing the transpose action on large tables stored in a database management system (DBMS) such as Teradata, the performance of PROC TRANSPOSE can be significantly compromised. In this case, it is more efficient for the DBMS to perform the transpose task. SAS^® provides in-database processing technology in PROC SQL, which allows the SQL explicit pass-through method to push some or all of the work to the DBMS. This technique has facilitated integration between SAS and a wide range of data warehouses and databases, including Teradata, EMC Greenplum, IBM DB2, IBM Netezza, Oracle, and Aster Data. This paper uses the Teradata database as an example DBMS and explains how to transpose a large table that resides in it using the SQL explicit pass-through method. The paper begins with comparing the execution time using PROC TRANSPOSE with the execution time using SQL explicit pass-through. From this comparison, it is clear that SQL explicit pass-through is more efficient than the traditional PROC TRANSPOSE when transposing Teradata tables, especially large tables. The paper explains how to use the SQL explicit pass-through method and discusses the types of data columns that you might need to transpose, such as numeric and character. The paper presents a transpose solution for these types of columns. Finally, the paper provides recommendations on packaging the SQL explicit pass-through method by embedding it in a macro. SAS programmers who are working with data stored in an external DBMS and who would like to efficiently transpose their data will benefit from this paper.

Read the paper (PDF).

Streaming data is becoming more and more prevalent. Everything is generating data now--social media, machine sensors, the 'Internet of Things'. And you need to decide what to do with that data right now. And 'right now' could mean 10,000 times or more per second. SAS^® Event Stream Processing provides an infrastructure for capturing streaming data and processing it on the fly--including applying analytics and deciding what to do with that data. All in milliseconds. There are some basic tenets on how SAS^® provides this extremely high-throughput, low-latency technology to meet whatever streaming analytics your company might want to pursue.

Read the paper (PDF). | Watch the recording.

For the many relational database products that SAS/ACCESS^®supports (Oracle, Teradata, DB2, MySQL, SQL Server, Hadoop, Greenplum, PC Files, to name but a few), there are a myriad of RDBMS-specific options at your disposal, but how do you know the right options for any given situation? How much data should you transfer at a time? Which SAS^® functions can be passed through to the database--which cannot? How do you verify that your processes are running efficiently? How do you test and validate any changes? The answer lies with the feedback capabilities of the SASTRACE system option.

Read the paper (PDF).

This workshop provides hands-on experience with SAS^® Data Loader for Hadoop. Workshop participants will configure SAS Data Loader for Hadoop and use various directives inside SAS Data Loader for Hadoop to interact with data in the Hadoop cluster.

Read the paper (PDF).

Is your company using or considering using SAP Business Warehouse (BW) powered by SAP HANA? SAS^® provides various levels of integration with SAP BW in an SAP HANA environment. This integration enables you to not only access SAP BW components from SAS, but to also push portions of SAS analysis directly into SAP HANA, accelerating predictive modeling and data mining operations. This paper explains the SAS toolset for different integration scenarios, highlights the newest technologies contributing to integration, and walks you through examples of using SAS with SAP BW on SAP HANA. The paper is targeted at SAS and SAP developers and architects interested in building a productive analytical environment with the help of the latest SAS and SAP collaborative advancements.

Read the paper (PDF).

Every day, companies all over the world are moving their data into the Cloud. While there are many options available, much of this data will wind up in Amazon Redshift. As a SAS^® user, you are probably wondering, 'What is the best way to access this data using SAS?' This paper discusses the many ways that you can use SAS/ACCESS^® to get to Amazon Redshift. We compare and contrast the various approaches and help you decide which is best for you. Topics that are discussed are building a connection, choosing appropriate data types, and SQL functions.

Read the paper (PDF).

A maximum harvest in farming analytics is achieved only if analytics can also be operationalized at the level of core business applications. Mapped to the use of SAS^® Analytics, the fruits of SAS be shared with Enterprise Business Applications by SAP. Learn how your SAS environment, including the latest of SAS^® In-Memory Analytics, can be integrated with SAP applications based on the SAP In-Memory Platform SAP HANA. We'll explore how a SAS^® Predictive Modeling environment can be embedded inside SAP HANA and how native SAP HANA data management capabilities such as SAP HANA Views, Smart Data Access, and more can be leveraged by SAS applications and contribute to an end-to-end in-memory data management and analytics platform. Come and see how you can extend the reach of your SAS^® Analytics efforts with the SAP HANA integration!

Read the paper (PDF).

The latest releases of SAS^® Data Integration Studio and DataFlux^® Data Management Platform provide an integrated environment for managing and transforming your data to meet new and increasingly complex data management challenges. The enhancements help develop efficient processes that can clean, standardize, transform, master, and manage your data. The latest features include capabilities for building complex job processes, new web-based development and job monitoring environments, enhanced ELT transformation capabilities, big data transformation capabilities for Hadoop, integration with the analytic platform provided by SAS^® LASR™ Analytic Server, enhanced features for lineage tracing and impact analysis, and new features for master data and metadata management. This paper provides an overview of the latest features of the products and includes use cases and examples for leveraging product capabilities.