Automatic loading, tracking, and visualization of data readiness in SAS® Visual Analytics is easy when you combine SAS® Data Integration Studio with the DATASETS and LASR procedures. This paper illustrates the simple method that the University of North Carolina at Chapel Hill (Enterprise Reporting and Departmental Systems) uses to automatically load tables into SAS® LASR Analytic Servers and then store reportable data about the HDFS tables created, the LASR tables loaded, and the ETL job execution times. This methodology gives the department the ability to longitudinally visualize system loading performance and identify changes in system behavior, and it provides a means of measuring how well we serve our customers over time.
Jessica Fraley, University of North Carolina at Chapel Hill
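A minimal sketch of the kind of load-and-track step the paper describes, assuming a LASR Analytic Server is already running on a hypothetical host and port, and using a hypothetical WORK table and log data set:

options set=GRIDHOST="lasrhead.example.com";

proc lasr add data=work.enrollment port=10010;   /* lift the table into memory */
   performance nodes=all;
run;

data work.load_log;                              /* record when the load finished */
   length table $32;
   table = "ENROLLMENT";
   loaded_at = datetime();
   format loaded_at datetime20.;
run;

The logged timestamps can then be appended to a reporting table and themselves loaded into SAS Visual Analytics to visualize load performance over time.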
UNIX and Linux SAS® administrators, have you ever been greeted by one of these statements as you walk into the office before you have gotten your first cup of coffee? "Power outage! SAS servers are down. I cannot access my reports." Have you frantically tried to restart the SAS servers to avoid loss of productivity and missed one of the steps in the process, causing further delays while other work continues to pile up? If you have had this experience, you understand the benefit to be gained from a utility that automates the management of these multi-tiered deployments. Until recently, there was no method for automatically starting and stopping multi-tiered services in an orchestrated fashion. Instead, you had to use time-consuming manual procedures to manage SAS services. These procedures were also prone to human error, which could result in corrupted services and additional time lost debugging and resolving the issues introduced by this process. To address this challenge, SAS Technical Support created the SAS Local Services Management (SAS_lsm) utility, which provides automated, orderly management of your SAS® multi-tiered deployments. The intent of this paper is to demonstrate the deployment and usage of the SAS_lsm utility. Now, go grab a coffee, and let's see how SAS_lsm can make life less chaotic.
Clifford Meyers, SAS
Users want more power. SAS® delivers. Data grids are a new data type available to users of SAS® Business Rules Manager and SAS® Decision Manager. These data grids can be deployed to both batch and web service scoring for data mining models and business decisions. Users will learn how to construct data with grid data types, create business rules using high-level expressions, and deploy decisions to both batch and web services for scoring.
Carl Sommer, SAS
Ernest Jessee, SAS
Chris Upton, SAS
The Base SAS® 9.4 Output Delivery System (ODS) EPUB destination enables users to deliver SAS® reports as e-books on Apple mobile devices. ODS EPUB e-books are truly mobile: you don't need an Internet connection to read them. Just install Apple's free iBooks app, and you're good to go. This paper shows you how to create an e-book with ODS EPUB and sideload it onto your Apple device. You will learn new SAS® 9.4 techniques for including text, images, audio, and video in your ODS EPUB e-books. You will understand how to customize your e-book's table of contents (TOC) so that readers can easily navigate the e-book. And you will learn how to modify the ODS EPUB style to create specialized presentation effects. This paper provides beginning to intermediate instruction for writing e-books with ODS EPUB. Please bring your iPad, iPhone, or iPod to the presentation so that you can download and read the examples.
David Kelley, SAS
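As a hedged illustration of the basic workflow (the file path, title, and author metadata are hypothetical), an ODS EPUB e-book is opened, populated by ordinary procedure output, and closed:

ods epub file="/folders/myfolders/class_report.epub"
         title="Class Height and Weight"
         author="Your Name";

proc means data=sashelp.class n mean min max;    /* any ODS-producing step works here */
   class sex;
   var height weight;
run;

ods epub close;

The resulting .epub file can then be sideloaded onto an Apple device and read in iBooks as described in the paper.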
The TABULATE procedure has long been a central workhorse of our organization's reporting processes, given that it offers a uniquely concise syntax for obtaining descriptive statistics on deeply grouped and nested categories within a data set. Given the diverse output capabilities of SAS®, it often then suffices to simply ship the procedure's completed output elsewhere via the Output Delivery System (ODS). Yet there remain cases in which we want to not only obtain a formatted result, but also to acquire the full nesting tree and logic by which the computations were made. In these cases, we want to treat the details of the TABULATE statements as data, not merely as presentation. I demonstrate how we have solved this problem by parsing our TABULATE statements into a nested tree structure in JSON that can be transferred and easily queried for deep values elsewhere beyond the SAS program. Along the way, this provides an excellent opportunity to walk through the nesting logic of the procedure's statements and explain how to think about the axes, groupings, and set computations that make it tick. The source code for our syntax parser is also available on GitHub for further use.
Jason Phillips, The University of Alabama
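A short example of the nesting logic discussed above, using a standard SASHELP table; the TABLE statement below is exactly the kind of expression such a parser turns into a tree, with SEX nested within AGE on the row axis and statistics nested under HEIGHT on the column axis:

proc tabulate data=sashelp.class;
   class sex age;
   var height;
   table age*sex,                 /* row dimension: sex nested within age */
         height*(n mean max)      /* column dimension: statistics under height */
         / box="Age by Sex";
run;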
Real workflow dependencies exist when the completion or output of one data process is a prerequisite for subsequent data processes. For example, in extract, transform, load (ETL) systems, the extract must precede the transform and the transform must precede the load. This serialization is common in SAS® data analytic development but should be implemented only when actual dependencies exist. A false dependency, by contrast, exists when the workflow itself does not require serialization but is coded in a manner that forces a process to wait unnecessarily for some unrelated process to complete. For example, an ETL system might extract, transform, and load one data set, and then extract, transform, and load a second data set, causing processing of the second data set to wait unnecessarily for the first to complete. This hands-on session demonstrates three common patterns of false dependencies, teaching SAS practitioners how to recognize and remedy false dependencies through parallel processing paradigms. Groups of participants are pitted against each other, as the class simultaneously runs both serialized software and distributed software that runs in parallel. Participants execute exercises in unison, and then watch their machines race to the finish as the tremendous performance advantages of parallel processing are demonstrated in one exercise after another--ideal for anyone seeking to walk away with proven techniques that can measurably increase their performance and bonus.
Troy Hughes, Datmesis Analytics
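A hedged sketch of breaking one such false dependency with SYSTASK (program paths and task names are hypothetical, and the session must allow operating-system commands via XCMD); because the two ETL programs share no inputs or outputs, they can run concurrently in separate SAS sessions instead of back to back:

systask command "sas /etl/load_claims.sas -log /etl/load_claims.log"
        nowait taskname=claims;
systask command "sas /etl/load_members.sas -log /etl/load_members.log"
        nowait taskname=members;

waitfor _all_ claims members;   /* block only where a real dependency exists */

The WAITFOR statement re-serializes the workflow only at the point where downstream processing genuinely requires both data sets.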
SAS® In-Memory Analytics for Hadoop is an analytical programming environment that enables a user to carry out many components of an analytics project in a single environment, rather than switching between different applications. Users can easily prepare raw data for different types of analytics procedures, explore the data to enhance information extraction, and apply a large variety of statistical and machine learning techniques to compare different analytical approaches. The model comparison capabilities let them quickly find the best model, which they can deploy and score in the Hadoop environment. All of these components of the analytics project are supported in a distributed in-memory environment for lightning-fast processing. This paper highlights tips for working with the interaction between Hadoop data and the SAS® LASR Analytic Server. It contains multiple scenarios with elementary but pragmatic approaches that enable SAS® programmers to work efficiently within the SAS® In-Memory Analytics environment.
Venkateswarlu Toluchuri, United HealthCare Group
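A hedged sketch of the HDFS-to-LASR interaction the paper examines, with a hypothetical grid host, install location, and HDFS path; a table is written to HDFS in SASHDAT format and then lifted into the LASR Analytic Server:

options set=GRIDHOST="hadoop-namenode.example.com" set=GRIDINSTALLLOC="/opt/TKGrid";

libname hdfs sashdat path="/user/sasdemo/tables";   /* SASHDAT engine over HDFS */

data hdfs.claims;                                    /* persist the table to HDFS in SASHDAT blocks */
   set work.claims;
run;

proc lasr add data=hdfs.claims port=10010;           /* load it into memory for analytics */
   performance nodes=all;
run;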
As the IT industry moves to further embrace cloud computing and the benefits it enables, many companies have been slow to adopt these changes due to concerns around data compliance. Compliance with state and federal law and the relevant regulations often leads decision makers to insist that systems dealing with protected health information or similarly sensitive data remain on-premises, as the risks for non-compliance are so high. In this session, we detail BNL Consulting's standard practices for transitioning solutions that are compliant with the Health Insurance Portability and Accountability Act (HIPAA) from on-premises to a cloud-based environment hosted by Amazon Web Services (AWS). We explain that, with best practices and plenty of research, HIPAA compliance in a cloud environment is no more challenging than compliance in an on-premises environment. We discuss the role of best-practice dev-ops tools like Docker, Consul, the ELK Stack, and others, which improve the reliability and repeatability of your HIPAA-compliant solutions. We tie these recommendations to the use of common SAS tools and show how they can work in concert to stabilize and improve the performance of the solution over the on-premises alternatives. Although this presentation is focused on health care and HIPAA-specific examples, many of the described practices and processes apply to any sensitive-data solutions that are being considered for the cloud.
Jay Baker, BNL Consulting