SAS Global Forum 2017 Proceedings

SAS® is often deployed in a client/server architecture in which SAS® Foundation is installed on a server and is accessed from each user's workstation. Many system administrators prefer that users not log on directly to the server to run SAS, nor do they want to set up a complex Citrix environment. SAS client applications are an attractive alternative for this type of architecture. But with the advent of multiple SAS® Studio editions and ongoing enhancements to SAS® Enterprise Guide®, choosing the most suitable client application presents a challenge for many system administrators. To help guide you in this choice, this paper compares the administration of three SAS Foundation client applications that can be used in a client/server architecture: SAS Enterprise Guide, SAS® Studio Basic, and SAS® Studio Mid-Tier. The usage differences between SAS Studio and SAS Enterprise Guide have been addressed elsewhere. In this paper, we focus on differences that pertain specifically to system administration, including deployment, maintenance, and authentication. The information presented here will help system administrators determine which application best fits the needs of their users and their environment.

Read the paper (PDF)

This paper presents considerations for deploying SAS® Foundation across software-defined storage (SDS) infrastructures, and within virtualized storage environments. Many new offerings on the market provide easy, point-and-click creation of storage entities, with simplified management. Internal storage area network (SAN) virtualization also removes much of the hands-on management for defining storage device pools. Automated tier software further attempts to optimize data placement across performance tiers without manual intervention. Virtual storage provisioning and automated tier placement have many time-saving and management benefits. In some cases, they have also caused serious unintended performance issues with heavy large-block workloads, such as those found in SAS Foundation. You must follow best practices to get the benefit of these new technologies while still maintaining performance. For SDS infrastructures, this paper offers specific considerations for the performance of applications in SAS Foundation, workload management and segregation, replication, high availability, and disaster recovery. Architecture and performance ramifications and advice are offered for virtualized and tiered storage systems. General virtual storage pros and cons are also discussed in detail.

Read the paper (PDF)

Are you prepared if a disaster happens? If your company relies on SAS® applications to stay in business, you should have a Disaster Recovery Plan (DRP) in place. By a DRP, we mean documentation of the process to recover and protect your SAS infrastructure (SAS binaries, the operating system that is tuned to run your SAS applications, and all the pertinent data that the SAS applications require) in the event of a disaster. This paper discusses what needs to be in this plan to ensure that your SAS infrastructure not only works after it is recovered, but is able to be maintained on the recovery hardware infrastructure.

Read the paper (PDF)

Sometimes it might be beneficial to share a BI environment with multiple tenants within an enterprise, but at the same time this might also introduce additional complexity with regard to the administration of data access. In this breakout session, one possible setup is shown by sharing a high-level overview of such an environment within the ING bank in the Netherlands for the Risk Services organization.

Read the paper (PDF)

In the quest for valuable analytics, access to business data through message queues provides near real-time access to the entire data life cycle. This in turn enables our analytical models to perform accurately. What does the item a user temporarily put in the shopping basket indicate, and what can be done to motivate the user? How do you recover the user who has now unsubscribed, given that the user had previously unsubscribed and re-subscribed quickly? User behavior can be captured completely and efficiently using a message queue, which causes minimal load on production systems and allows for distributed environments. There are some technical issues encountered when attempting to populate a data warehouse using events from a message queue. The presentation outlines a solution to the following issues: the message queue connection, how to ensure that messages aren't lost in transit, and how to efficiently process messages with SAS®; message definition and metadata, and how to react to changes in message structure; data architecture and which data architecture is appropriate for storing message data and other business data; late arrival of messages and how late arriving data can be loaded into slowly changing dimensions; and analytical processing and how transactional message data can be reformatted for analytical modeling. Ultimately, populating a data warehouse with message queue data can require less development than accessing source databases; however a robust architecture

Read the paper (PDF)

You're in the business of performing complex analyses on large amounts of data. This data changes quickly and often, so you've invested in a powerful high-performance analytics engine with the speed to respond to a real-time data stream. However, you realize an immediate problem upon the implementation of your software solution: your analytics engine wants to process many records of data at once, but your streaming engine wants to send individual records. How do you store this streaming data? How do you tell the analytics engine about the updates? This paper explains how to manage real-time streaming data in a batch-processing analytics engine. The problem of managing streaming data in analytics engines comes up in many industries: energy, finance, health care, and marketing to name a few. The solution described in this paper can be applied in any industry, using features common to most analytics engines. You learn how to store and manage streaming data in such a way as to guarantee that the analytics engine has only current information, limit interruptions to data access, avoid duplication of data, and maintain a historical record of events.
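
The four goals listed in this abstract (current information only, limited interruptions, no duplicates, a historical record) can be illustrated with a small micro-batching sketch. This is not the paper's method, which the abstract does not describe. It is a minimal Python illustration under assumed conditions: each record carries an entity key and an increasing sequence number, and every name below (`Event`, `MicroBatchStore`, `batch_size`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    """One streamed record (hypothetical shape, assumed for illustration)."""
    key: str      # entity the record describes
    seq: int      # sequence number; higher means newer for the same key
    value: float


class MicroBatchStore:
    """Buffers single records and publishes them to readers in batches."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._pending: List[Event] = []          # records awaiting the next batch
        self._seen: Set[Tuple[str, int]] = set()  # identities already accepted
        self.current: Dict[str, Event] = {}      # latest record per key
        self.history: List[Event] = []           # append-only record of events

    def receive(self, event: Event) -> bool:
        """Accept one record; return True if this call published a batch."""
        ident = (event.key, event.seq)
        if ident in self._seen:
            return False  # duplicate delivery: ignore it
        self._seen.add(ident)
        self._pending.append(event)
        if len(self._pending) >= self.batch_size:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Apply all pending records as one batch."""
        if not self._pending:
            return
        # Build the new view on a copy so readers of `current` never see
        # a half-applied batch.
        snapshot = dict(self.current)
        for ev in self._pending:
            prev = snapshot.get(ev.key)
            if prev is None or ev.seq > prev.seq:
                snapshot[ev.key] = ev
        self.history.extend(self._pending)
        self._pending = []
        self.current = snapshot  # publish with a single reference swap
```

In this sketch the batch consumer reads only `current`, which is replaced in one assignment after the whole batch has been applied, while `history` keeps every accepted record. A real analytics engine would use its own table-swap or append mechanism in place of the dictionary copy.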

Read the paper (PDF)

Today, companies are increasingly using analytics to discover new revenue and cost-saving opportunities. Many business professionals turn to SAS, a leader in business analytics software and services, to help them improve performance and make better decisions faster. Analytics is also being used in risk management, fraud detection, life sciences, sports, and many more emerging markets. However, to maximize the value to the business, analytics solutions need to be deployed quickly and cost-effectively, while also providing the ability to readily scale without degrading performance. Of course, in today's demanding environments, where budgets are still shrinking and mandates to reduce carbon footprints are growing, the solution must deliver excellent hardware utilization, power efficiency, and return on investment. To help solve some of these challenges, Red Hat and SAS have collaborated to recommend the best practices for configuring SAS®9 running on Red Hat Enterprise Linux. The scope of this document covers Red Hat Enterprise Linux 6 and 7. Areas researched include the I/O subsystem, file system selection, and kernel tuning, both in bare metal and virtualized (KVM) environments. Additionally, we now include grid-based configurations running with Red Hat Resilient Storage Add-On (Global File System 2 [GFS2] clusters).

Read the paper (PDF)

If you are planning to deploy SAS® Grid Manager or SAS® Enterprise BI (or other distributed SAS® Foundation applications) with load-balanced servers on multiple operating system instances, a shared file system is required. In order to determine the best shared file system choice for a given deployment, it is important to understand how the file system is used, the SAS® I/O workload characteristics performed on it, and the stressors that SAS Foundation applications produce on the file system. For the purposes of this paper, we use the term shared file system to mean both a clustered file system and a shared file system, even though shared can denote a network file system or a distributed file system that is not clustered. This paper examines the shared file systems that are most commonly used with SAS and reviews their strengths and weaknesses.

Read the paper (PDF)

SAS® Cloud Analytic Services (CAS) is the cloud-based run-time environment for data management and analytics in SAS®. By run-time environment, we refer to the combination of hardware and software where data management and analytics take place. In a sense, CAS is just another SAS platform on which to do work. CAS is a platform for high-performance analytics and distributed computing. The CAS server provides data management and an analytics framework that can run in the cloud, that can act as a cloud, and that provides the best-in-class analytics that SAS is known for. This new architecture functions as a public API, allowing access from many different clients such as Lua, Python, Java, REST, and yes, even SAS. The CAS server is designed to provide user-level sessions, to share data between sessions, and to provide fault tolerance, which allows a worker node to crash without losing data and allows the user action to continue running to completion. The isolation provided to each session allows one session to crash without affecting other sessions. The concept of 'always in memory' in CAS means that an action is not aware of what the server does to allow the action to access the data. The entire file might be in memory or just pieces of the file might be mapped into memory, just in time for the action to access the data. This allows CAS tables to be loaded that are larger than the memory available across the grid. Hadoop can be used to provide data redundancy. The server is elastic and can add or remove nodes as needed. Users can specify how many nodes they want their session to use, so that the session fits their needs.

Read the paper (PDF)

We are always looking for ways to improve the performance, efficiency, and availability of our investment in SAS® solutions. To address those needs, SAS offers the ability to cluster many of its constituent software components. A cluster is a set of systems that work together with the goal of providing a single service. This session identifies 12 different technologies to create clusters of SAS software components and describes how they are designed to boost the capabilities of SAS to function in the enterprise.