SAS Global Forum 2015 Proceedings

This paper presents a high-level infrastructure discussion with some explanation of the SAS^® code used to implement a configurable batch framework for managing and updating the data rows and row-level permissions in SAS^® OLAP Cube Studio. The framework contains a collection of reusable, parameter-driven Base SAS^® macros, Base SAS custom programs, and UNIX or Linux shell scripts. This collection manages the typical steps and processes used for manipulating SAS files and for executing SAS statements. The Base SAS macro collection contains a group of utility macros that includes: concurrent/parallel processing macros, SAS^® Metadata Repository macros, SAS^® Scalable Performance Data Engine table macros, table lookup macros, table manipulation macros, and other macros. There is also a group of OLAP-related macros that includes OLAP utility macros and OLAP permission table processing macros.

Read the paper (PDF).

When promoting metadata in large packages from SAS^® Data Integration Studio between environments, metadata and the underlying physical data can become out of sync. This can result in metadata items that cannot be opened by users because SAS^® has thrown an error. It often falls to the SAS administrator to resolve the synchronization issues when they might not have been responsible for promoting the metadata items in the first place. In this paper, we will discuss a simple macro that can be used to compare the table metadata to that of the physical tables, and any anomalies will be noted.
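The comparison this abstract describes can be pictured as a diff between two column inventories. The Python below is only an illustrative sketch, not the paper's macro: the function name `compare_columns`, the dictionary layout (column name mapped to type and length), and the sample columns are all hypothetical.

```python
def compare_columns(metadata_cols, physical_cols):
    """Compare registered (metadata) columns with physical table columns.

    Both arguments map a column name to a (type, length) tuple; this layout
    is an assumption made for the sketch. Returns a list of (column, issue)
    tuples ordered by column name.
    """
    anomalies = []
    for name in sorted(set(metadata_cols) | set(physical_cols)):
        if name not in physical_cols:
            anomalies.append((name, "in metadata only"))
        elif name not in metadata_cols:
            anomalies.append((name, "in physical table only"))
        elif metadata_cols[name] != physical_cols[name]:
            anomalies.append((name, "type or length differs"))
    return anomalies


# Hypothetical inventories for one table.
metadata = {"CUST_ID": ("num", 8), "NAME": ("char", 40), "REGION": ("char", 10)}
physical = {"CUST_ID": ("num", 8), "NAME": ("char", 60), "SIGNUP_DT": ("num", 8)}

for column, issue in compare_columns(metadata, physical):
    print(f"{column}: {issue}")
```

With the sample data, the sketch reports NAME as differing in type or length, REGION as present in metadata only, and SIGNUP_DT as present in the physical table only. In a real deployment the two inventories would come from the metadata server and from the physical library, respectively.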

Read the paper (PDF). | Download the data file (ZIP).

Modernizing SAS^® assets within an enterprise is key to reducing costs and improving productivity. Modernization implies consolidating multiple SAS environments into a single shared enterprise SAS deployment. While the benefits of modernization are clear, the management of a single-enterprise deployment is sometimes a struggle between business units who once had autonomy and IT that is now responsible for managing this shared infrastructure. The centralized management and control of a SAS deployment is based on SAS metadata. This paper provides a practical approach to the shared management of a centralized SAS deployment using SAS^® Management Console. It takes into consideration the day-to-day needs of the business and IT requirements including centralized security, monitoring, and management. This document defines what resources are contained in SAS metadata, what responsibilities should be centrally controlled, and the pros and cons of distributing the administration of metadata content across the enterprise. This document is intended as a guide for SAS administrators and assumes that you are familiar with the concepts and terminology introduced in SAS^® 9.4 Intelligence Platform: Security Administration Guide.

Read the paper (PDF).

There is a goldmine of information that is available to you in SAS^® metadata. The challenge, however, is being able to retrieve and leverage that information. While there is useful functionality available in SAS^® Management Console as well as a collection of functional macros provided by SAS to help accomplish this, getting a complete metadata picture in an automated way has proven difficult. This paper discusses the methods we have used to find core information within SAS^® 9.2 metadata and how we have been able to pull this information in a programmatic way. We used Base SAS^®, SAS^® Data Integration Studio, PC SAS^®, and SAS^® XML Mapper to build a solution that now provides daily metadata reporting about our SAS Data Integration Studio jobs, job flows, tables, and so on. This information can now be used for auditing purposes as well as for helping us build our full metadata inventory as we prepare to migrate to SAS^® 9.4.

Read the paper (PDF).

Whether you manage computer systems in a small-to-medium environment (for example, in labs, workshops, or corporate training groups) or in a large-scale deployment, the ability to automate SAS^® 9.4 installations is important to the efficiency and success of your software deployments. For large-scale deployments, you can automate the installation process by using third-party provisioning software such as Microsoft System Center Configuration Manager (SCCM) or Symantec Altiris. But what if you have a small-to-medium environment and you do not have provisioning software to package deployment jobs? No worries! There is a solution. This paper presents a case study of just such a situation where a process was developed for SAS regional users groups (RUGs). Along with the case study, the paper offers a process for automating SAS 9.4 installations in workshop, lab, and corporate training (small-to-medium sized) environments. This process incorporates the new -srwonly option with the SAS^® Deployment Wizard, deployment-wizard commands that use response files, and batch-file implementation. This combination results in easy automation of an installation, even without provisioning software.

Read the paper (PDF).

SAS^® Visual Analytics is deployed by many customers. IT departments are tasked with efficiently managing the server resources, achieving maximum usage of resources, optimizing availability, and managing costs. Business users expect the system to be available when needed and to perform to their expectations. Business executives who sponsor business intelligence (BI) and analytical projects like to see that their decision to support and finance the project meets business requirements. Business executives also like to know how different people in the organization are using SAS Visual Analytics. With the release of SAS Visual Analytics 7.1, new functionality is added to support the memory management of the SAS^® LASR™ Analytic Server. Also, new out-of-the-box usage and audit reporting is introduced. This paper covers BI-on-BI for SAS Visual Analytics. It also discusses the new functionality introduced for SAS Visual Analytics administration and addresses questions about the resource management, data compression, and out-of-the-box usage reporting of SAS Visual Analytics. Key product capabilities are demonstrated.

Read the paper (PDF).

As the SAS^® platform becomes increasingly metadata-driven, it becomes increasingly important to get the structures and controls surrounding the metadata repository correct. This presentation aims to point out some of the considerations and potential pitfalls of working with the metadata infrastructure. It also suggests some solutions that have been used with the aim of making this process as simple as possible.

Read the paper (PDF).

The power of SAS^®9 applications allows information and knowledge creation from very large amounts of data. Analysis that used to consist of 10s-100s of gigabytes (GBs) of supporting data has rapidly grown into the 10s to 100s of terabytes (TBs). This data expansion has resulted in more and larger SAS data stores. Setting up file systems to support these large volumes of data with adequate performance, as well as ensuring adequate storage space for the SAS^® temporary files, can be very challenging. Technology advancements in storage and system virtualization, flash storage, and hybrid storage management require continual updating of best practices to configure I/O subsystems. This paper presents updated best practices for configuring the I/O subsystem for your SAS^®9 applications, ensuring adequate capacity, bandwidth, and performance for your SAS^®9 workloads. We have found that very few storage systems work ideally with SAS with their out-of-the-box settings, so it is important to convey these general guidelines.

Read the paper (PDF).

We regularly speak with organizations running established SAS^® 9.1.3 systems that have not yet upgraded to a later version of SAS^®. Often this is because their current SAS 9.1.3 environment is working fine, and no compelling event to upgrade has materialized. Now that SAS 9.1.3 has moved to a lower level of support and some very exciting technologies (Hadoop, cloud, ever-better scalability) are more accessible than ever using SAS^® 9.4, the case for migrating from SAS 9.1.3 is strong. Upgrading a large SAS ecosystem with multiple environments, an active development stream, and a busy production environment can seem daunting. This paper aims to demystify the process, suggesting outline migration approaches for a variety of the most common scenarios in SAS 9.1.3 to SAS 9.4 upgrades, and a scalable template project plan that has been proven at a range of organizations.

Read the paper (PDF).

Proper management of master data is a critical component of any enterprise information system. However, effective master data management (MDM) requires that both IT and Business understand the life cycle of master data and the fundamental principles of entity resolution (ER). This presentation provides a high-level overview of current practices in data matching, record linking, and entity information life cycle management that are foundational to building an effective strategy to improve data integration and MDM. Particular areas of focus are: 1) The need for ongoing ER analytics--the systematic and quantitative measurement of ER performance; 2) Investing in clerical review and asserted resolution for continuous improvement; and 3) Addressing the large-scale ER challenge through distributed processing.

Read the paper (PDF). | Watch the recording.

From large holding companies with multiple subsidiaries to loosely affiliated state educational institutions, security domains are being federated to enable users from one domain to access applications in other domains and ultimately save money on software costs through sharing. Rather than rely on centralized security, applications must accept claims-based authentication from trusted authorities and support open standards such as Security Assertion Markup Language (SAML) instead of proprietary security protocols. This paper introduces SAML 2.0 and explains how the open source SAML implementation known as Shibboleth can be integrated with the SAS^® 9.4 security architecture to support SAML. It then describes in detail how to set up Microsoft Active Directory Federation Services (AD FS) as the SAML Identity Provider, how to set up the SAS middle tier as the relying party, and how to troubleshoot problems.

Read the paper (PDF).

As organizations strive to do more with fewer resources, many modernize their disparate PC operations to centralized server deployments. Administrators and users share many concerns about using SAS^® on a Microsoft Windows server. This paper outlines key guidelines, plus architecture and performance considerations, that are essential to making a successful transition from PC to server. This paper outlines the five key considerations for SAS customers who will change their configuration from PC-based SAS to using SAS on a Windows server: 1) Data and directory references; 2) Interactive and surrounding applications; 3) Usability; 4) Performance; 5) SAS Metadata Server.

Read the paper (PDF).

The SAS^® Global Forum paper 'Best Practices for Configuring Your I/O Subsystem for SAS^®9 Applications' provides general guidelines for configuring I/O subsystems for your SAS^® applications. The paper reflects updated storage and virtualization technology. This companion paper ('Frequently Asked Questions Regarding Storage Configurations') is commensurately updated, including new storage technologies such as storage virtualization, storage tiers (including automated tier management), and flash storage. The subject matter is voluminous, so a frequently asked questions (FAQ) format is used. Our goal is to continually update this paper as additional field needs arise and technology dictates.

Read the paper (PDF).

At the University of North Carolina at Chapel Hill, we had the pleasure of rolling out a strong enterprise-wide SAS^® Visual Analytics environment in 10 months, with strong support from SAS. We encountered many bumps in the road, moments of both mountain highs and worrisome lows, as we learned what we could and could not do, and new ways to accomplish our goals. Our journey started in December of 2013 when a decision was made to try SAS Visual Analytics for all reporting, and incorporate other solutions only if and when we hit an insurmountable obstacle. We are still strongly using SAS Visual Analytics and are augmenting the tools with additional products. Along the way, we learned a number of things about the SAS Visual Analytics environment that are gems, whether one is relatively new to SAS^® or an old hand. Measuring what is happening is paramount to knowing what constraints exist in the system before trying to enhance performance. Targeted improvements help if measurements can be made before and after each alteration. There are a few architectural alterations that can help in general, but we have seen that measuring is the guaranteed way to know what the problems are and whether the cures were effective.

Read the paper (PDF).

As a SAS^® Intelligence Platform Administrator, have your eyes ever glazed over as you performed repetitive tasks in SAS^® Management Console or some other administrative user interface? Perhaps you're setting up metadata for a new department, managing a set of backups, or promoting content between dev, test, and prod environments. Did you know there is a large library of batch utilities to help you automate many of these common administration tasks? This paper explores content reporting and management utilities, such as viewing authorizations or relationships between content, as well as administrative tasks such as analyzing, creating, or deleting metadata repositories or performing a backup of the system. The batch utilities can be incorporated into scripts so that you can run them repeatedly on either an ad hoc or scheduled basis. Give your mouse a rest and save yourself some time.

Read the paper (PDF).

A group tasked with testing SAS^® software from the customer perspective has gathered a number of helpful hints for SAS^® 9.4 that will smooth the transition to its new features and products. These hints will help with the 'huh?' moments that crop up when you are getting oriented and will provide short, straightforward answers. We also share insights about changes in your order contents. Gleaned from extensive multi-tier deployments, SAS^® Customer Experience Testing shares insiders' practical tips to ensure that you are ready to begin your transition to SAS 9.4. The target audience for this paper is primarily system administrators who will be installing, configuring, or administering the SAS 9.4 environment. (This paper is an updated version of the paper presented at SAS Global Forum 2014 and includes new features and software changes since the original paper was delivered, plus any relevant content that still applies. This paper includes information specific to SAS 9.4 and SAS 9.4 maintenance releases.)

Read the paper (PDF).

In this session, we discuss the advantages of SAS^® Federation Server and how it makes it easier for business users to access secure data for reports and use analytics to drive accurate decisions. This frees up IT staff to focus on other tasks by giving them a simple method of sharing data using a centralized, governed, security layer. SAS Federation Server is a data server that provides scalable, threaded, multi-user, and standards-based data access technology in order to process and seamlessly integrate data from multiple data repositories. The server acts as a hub that provides clients with data by accessing, managing, and sharing data from multiple relational and non-relational data sources as well as from SAS^® data. Users can view data in big data sources like Hadoop, SAP HANA, Netezza, or Teradata, and blend them with existing database systems like Oracle or DB2. Security and governance features, such as data masking, ensure that the right users have access to the data and reduce the risk of exposure. Finally, data services are exposed via a REST API for simpler access to data from third-party applications.

Read the paper (PDF).

Storage space on a UNIX platform is a costly--and finite--resource to maintain, even under ideal conditions. By regularly monitoring and promptly responding to space limitations that might occur during production, an organization can mitigate the risk of wasted expense, time, and effort caused by this problem. SAS^® programmers at Truven Health Analytics have designed a reporting tool to measure space usage by a number of distinct factors over time. Using tabular and graphical output, the tool provides a full picture of what often contributes to critical reductions of available hardware space. It enables managers and users to respond appropriately and effectively whenever this occurs. It also helps to identify ways to encourage more efficient practices, thereby minimizing the likelihood of this occurring in the future. Operating System: RHEL 5.4 (Red Hat Enterprise Linux), Oracle Sun Fire X4600 M2, SAS^® 9.3 TS1M1.
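The simplest building block of such monitoring, a point-in-time check of how full a file system is, might look like the Python sketch below. It is not the tool described in the abstract, which is written in SAS and tracks usage by several factors over time; the function name `usage_report`, the 90% warning threshold, and the monitored path are assumptions chosen for illustration.

```python
import shutil


def usage_report(paths, warn_pct=90.0):
    """Return (path, percent_used, over_threshold) for each path.

    warn_pct is a hypothetical warning threshold, not a recommended value.
    """
    report = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        pct = 100.0 * usage.used / usage.total if usage.total else 0.0
        report.append((path, round(pct, 1), pct >= warn_pct))
    return report


# "/" is only an example mount point; substitute the file systems of interest.
for path, pct, warn in usage_report(["/"]):
    flag = "WARNING" if warn else "ok"
    print(f"{path}: {pct}% used [{flag}]")
```

Running a check like this on a schedule and storing each result would give the time series that a trend report needs.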

The first task to accomplish our SAS^® 9.4 installation goal is to create an Amazon Web Services (AWS) secured EC2 (Elastic Compute Cloud) instance called a Virtual Private Cloud (VPC). Through a series of wizard-driven dialog boxes, the SAS administrator selects virtual CPUs (vCPUs, which have about a 2:1 ratio to cores), memory, storage, and network performance considerations via regional availability zones. Then, there is a prompt to create a VPC that will be housed within the EC2 instance, along with a major component called subnets. A step to create a security group is next, which enables the SAS administrator to specify all of the VPC firewall port rules required for the SAS 9.4 application. Next, the EC2 instance is reviewed and a security key pair is either selected or created. Then the EC2 launches. At this point, Internet connectivity to the EC2 instance is granted by attaching an Internet gateway and its route table to the VPC and allocating and associating an elastic IP address along with a public DNS. The second major task involves establishing connectivity to the EC2 instance and a method of download for SAS software. In the case of the Linux Red Hat instance created here, PuTTY is configured to use the EC2's security key pair (.ppk file). In order to transfer files securely to the EC2 instance, a tool such as WinSCP is installed and uses the PuTTY connection for secure FTP. The Linux OS is then updated, and then VNCServer is installed and configured so that the SAS administrator can use a GUI. Finally, a Firefox web browser is installed to download the SAS^® Download Manager. After downloading the SAS Download Manager, a SAS depot directory is created on the Linux file system and the SAS Download Manager is run once we have provided the software order number and SAS installation key. Once the SAS software depot has been loaded, we can verify the success of the SAS software depot's download by running the SAS depot checker.
The next pre-installation task is to take care of some Linux OS housekeeping. Local users (for example, the SAS installation ID, sas) and other IDs such as sassrv, lsfadmin, lsfuser, and sasdemo are created. Specific directory permissions are set for the installer ID sas. The ulimit settings for open files and max user processes are increased, and directories are created for a SAS installation home and configuration directory. Some third-party tools such as Python, which are required for SAS 9.4, are installed. Then Korn shell and other required Linux packages are installed. Finally, the SAS Deployment Manager installation wizard is launched and the multiple dialog boxes are filled out, with many defaults accepted and Next clicked. SAS administrators should consider running the SAS Deployment Manager twice, first to solely install the SAS software, and then later to configure. Finally, after SAS Deployment Manager completion, SAS post-installation tasks are completed.
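One of the housekeeping steps above, raising the ulimit settings for open files and max user processes, lends itself to a quick check before the installer is launched. The Python sketch below reads the current soft limits and compares them with minimums. The abstract does not state the required values, so the numbers in `ASSUMED_MINIMUMS` are placeholders to be replaced with those from the SAS system requirements, and the function name `check_limits` is hypothetical. The `resource` module is available on UNIX-like systems only.

```python
import resource

# Placeholder minimums: the abstract gives no figures, so these are assumptions.
ASSUMED_MINIMUMS = {
    "open files": (resource.RLIMIT_NOFILE, 20480),
    "max user processes": (resource.RLIMIT_NPROC, 10240),
}


def check_limits(minimums=ASSUMED_MINIMUMS):
    """Return {label: (soft_limit, meets_minimum)} for the current process."""
    results = {}
    for label, (limit_id, minimum) in minimums.items():
        soft, _hard = resource.getrlimit(limit_id)
        # An unlimited soft limit always satisfies the minimum.
        ok = soft == resource.RLIM_INFINITY or soft >= minimum
        results[label] = (soft, ok)
    return results


for label, (soft, ok) in check_limits().items():
    status = "ok" if ok else "too low"
    print(f"{label}: soft limit {soft} [{status}]")
```

The check reflects the limits of the account that runs it, so it would need to be run as the installer ID (sas in the abstract) to be meaningful.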

Read the paper (PDF).

Today's SAS^® environment has large numbers of concurrent SAS processes and ever-growing data volumes. To help SAS users remain productive, SAS administrators must ensure that SAS applications have sufficient computer resources, properly configured and monitored often. Understanding how all the components of SAS work and how they will be used by your users is the first step. The guidance offered in this paper will help SAS administrators evaluate hardware, operating system, and infrastructure options for a SAS environment that will keep their SAS applications running at optimal performance and their user community happy.

Read the paper (PDF).

SAS^® Grid Computing promises many benefits that the SAS^® community has been demanding for years, including workload management of SAS applications, a highly available infrastructure, higher resource utilization, flexibility for IT infrastructure, and potentially improved performance of SAS applications. But to implement these benefits, you need to have a good definition of what you need and an understanding of what is involved in enabling the SAS tasks to take advantage of all the SAS grid nodes. In addition to having this understanding of SAS, the underlying hardware infrastructure (cores to storage) must be configured and tuned correctly. This paper discusses the most important things (or misunderstandings) that SAS customers need to know before they deploy SAS^® Grid Manager.

Read the paper (PDF).

SAS^® customers benefit greatly when they are using the functionality, performance, and stability available in the latest version of SAS. However, the task of moving all SAS collateral such as programs, data, catalogs, metadata (stored processes, maps, queries, reports, and so on), and content to SAS^® 9.4 can seem daunting. This paper provides an overview of the steps required to move all SAS collateral from systems based on SAS^® 9.2 and SAS^® 9.3 to the current release of SAS^® 9.4.

Read the paper (PDF).

In-database processing refers to the integration of advanced analytics into the data warehouse. With this capability, analytic processing is optimized to run where the data reside, in parallel, without having to copy or move the data for analysis. From a data governance perspective there are many good reasons to embrace in-database processing. Many analytical computing solutions and large databases use this technology because it provides significant performance improvements over more traditional methods. Come learn how Blue Cross Blue Shield of Tennessee (BCBST) uses in-database processing from SAS and Teradata.

SAS^® Environment Manager helps SAS^® administrators and system administrators manage SAS resources and effectively monitor the environment. SAS Environment Manager provides administrators with a centralized location for accessing and monitoring the SAS^® Customer Intelligence environment. This enables administrators to identify problem areas and to maintain an in-depth understanding of the day-to-day activities on the system. It is also an excellent way to predict the usage and growth of the environment for scalability. With SAS Environment Manager, administrators can set up monitoring for CI logs (for example, SASCustIntelCore6.3.log, SASCustIntelStudio6.3.log) and other general logs from the SAS^® Intelligence Platform. This paper contains examples for administrators who support SAS Customer Intelligence to set up this type of monitoring. It provides recommendations for approaches and for how to interpret the results from SAS Environment Manager.

Read the paper (PDF).

This paper describes how we reduced elapsed time for the third maintenance release for SAS^® 9.4 by as much as 22% by using the High Performance FICON for IBM System z (zHPF) facility to perform I/O for SAS^® files on IBM mainframe systems. The paper details the performance improvements, internal testing to quantify improvements, and the customer actions needed to enable zHPF on their system. The benefits of zHPF are discussed within the larger context of other techniques that a customer can use to accelerate processing of SAS files.

Read the paper (PDF).

EBI administrators who are new to SAS^® Visual Analytics and used to the logging capability of the SAS^® OLAP Server might be wondering how they can get their SAS^® LASR™ Analytic Server to produce verbose log files. While the SAS LASR Analytic Server logs differ from those produced by the SAS OLAP Server, the SAS LASR Analytic Server log contains information about each request made to LASR tables and can be a great data source for administrators looking to learn more about how their SAS Visual Analytics deployments are being used. This session will discuss how to quickly enable logging for your SAS LASR Analytic Server in SAS Visual Analytics 6.4. You will see what information is available to a SAS administrator in these logs, how they can be parsed into data sets with SAS code, then loaded back into the SAS LASR Analytic Server to create SAS Visual Analytics explorations and reports.

Read the paper (PDF).

The SAS^® Environment Manager Service Architecture expands on the core monitoring capabilities of SAS^® Environment Manager delivered in SAS^® 9.4. Multiple sources of data available in the SAS^® Environment Manager Data Mart--traditional operational performance metrics, events, and ARM, audit, and access logs--together with built-in and custom reports put powerful capabilities into the hands of IT operations. This paper introduces the concept of service-oriented event identification and discusses how to use the new architecture and tools effectively as well as the wealth of data available in the SAS Environment Manager Data Mart. In addition, extensions for importing new data, writing custom reports, instrumenting batch SAS^® jobs, and leveraging and extending auditing capabilities are explored.

Read the paper (PDF).

Many companies use geographically dispersed data centers running SAS^® Grid Manager to provide 24/7 SAS^® processing capability with the thought that if a disaster takes out one of the data centers, another data center can take over the SAS processing. To accomplish this, careful planning must take into consideration hardware, software, and communication infrastructure along with the SAS workload. This paper looks into some of the options available, focusing on using SAS Grid Manager to manage the disaster workload shift.

Read the paper (PDF).

Sometimes you need to provide multiple administrators with the ability to manage your software. The rationale can be a need to separate roles and responsibilities (such as installer and configuration manager), changing job responsibilities, or even just covering for the primary administrator while on vacation. To meet that need, it's tempting to share the logon credentials of your SAS^® installer account, but doing so can potentially compromise your security and cause a corporate audit to fail. This paper focuses on standard IT practices and utilities, explaining how to diligently manage the administration of your SAS software to help you properly ensure that access is secured and that auditability is maintained.

Read the paper (PDF). | Watch the recording.

Given the challenges of data security in today's business environment, how can you protect the data that is used by SAS^® Visual Analytics? SAS^® has implemented security features in its widely used business intelligence platform, including row-level security in SAS Visual Analytics. Row-level security specifies who can access particular rows in a LASR table. Throughout this paper, we discuss two ways of implementing row-level security for LASR tables in SAS^® Visual Analytics--interactively and in batch. Both approaches link table-based permission conditions with identities that are stored in metadata.

Read the paper (PDF).

Join us for lunch as we discuss the benefits of being part of the elite group that is SAS Certified Professionals. The SAS Global Certification program has awarded more than 79,000 credentials to SAS users across the globe. Come listen to Terry Barham, Global Certification Manager, give an overview of the SAS Certification program, explain the benefits of becoming SAS certified and discuss exam preparation tips. This session will also include a Q&A section where you can get answers to your SAS Certification questions.

SAS^® Analytics enables organizations to tackle complex business problems using big data and to provide insights needed to make critical business decisions. A well-architected enterprise storage infrastructure is needed to realize the full potential of SAS Analytics. However, as the need for big data analytics and rapid response times increases, the performance gap between server speeds and traditional hard disk drive (HDD) based storage systems can be a significant concern. The growing performance gap can have detrimental effects, particularly when it comes to critical business applications. As a result, organizations are looking for newer, smarter, faster storage systems to accelerate business insights. IBM FlashSystem Storage systems store the data in flash memory. They are designed for dramatically faster access times and support incredible amounts of input/output operations per second (IOPS) and throughput, with significantly lower latency than HDD-based solutions. Due to their macro-efficiency design, FlashSystem Storage systems consume less power and have significantly lower cooling and space requirements, while allowing server processors to run SAS Analytics more efficiently. Being an all-flash storage system, IBM FlashSystem provides consistent low-latency response across the IOPS range as the analytics workload scales. This paper introduces the benefits of IBM FlashSystem Storage for deploying SAS Analytics and highlights some of the deployment scenarios and architectural considerations. This paper also describes best practices and tuning guidelines for deploying SAS Analytics on FlashSystem Storage systems, which can help SAS Analytics customers architect solutions with FlashSystem Storage.

Read the paper (PDF).

Wouldn't it be great if there were a way to deploy SAS^® Grid Manager in discrete building blocks that have the proper balance of compute capability, RAM, and IO throughput? Well, now you can! This paper discusses the attributes of a well-designed SAS Grid Manager deployment and why it is sometimes difficult to engineer such an environment when IT responsibilities are segregated between server administration, network administration, and storage administration. The paper presents a concrete design that will position the customer for a successful SAS Grid Manager deployment of any size and that can also scale out easily as the needs of the organization grow.

SAS^® vApps (virtual applications) are a SAS^® construct designed to logically and physically encapsulate a single- or multi-tier software solution into a virtual machine (or sometimes into multiple virtual machines). In this paper, we examine the conceptual, logical, and physical design perspectives that comprise a vApp, giving you a high-level understanding of both the technical and business benefits of vApps and the design decisions that go into envisioning and constructing SAS vApps. These are described in the context of the user roles involved in the life cycle of a vApp, and how those roles interact with a vApp at various points along its continuum.

Read the paper (PDF).

One of the challenges in Secure Sockets Layer (SSL) configuration for any web configuration is the SSL certificate management for the client and server sides. The SSL overview covers the structure of the X.509 certificate and the SSL handshake process for the client and server components. There are three distinctive SSL client/server combinations within the SAS^® Visual Analytics 7.1 web application configuration. The most common one is the browser accessing the web application. The second one is the internal SAS^® web application accessing another SAS web application. The third one is a SAS Workspace Server executing a PROC or LIBNAME statement that accesses the SAS^® LASR™ Authorization Service web application. Each SSL client/server scenario in the configuration is explained in terms of SSL handshake and certificate arrangement. Server identity certificate generation using Microsoft Active Directory Certificate Services (AD CS) for enterprise-level organizations is showcased. The certificates, in proper format, need to be supplied to the SAS^® Deployment Wizard during the configuration process. The prerequisites and configuration steps are shown with examples.

Read the paper (PDF).

The Hadoop ecosystem is vast, and there's a lot of conflicting information available about how to best secure any given implementation. It's also difficult to fix any mistakes made early on once an instance is put into production. In this paper, we demonstrate the currently accepted best practices for securing and Kerberizing Hadoop clusters in a vendor-agnostic way, review some of the not-so-obvious pitfalls one could encounter during the process, and delve into some of the theory behind why things are the way they are.

Your enterprise SAS^® Visual Analytics implementation is on its way to being adopted throughout your organization, unleashing the production of critical business content by business analysts, data scientists, and decision makers from many business units. This content is relied upon to inform decisions and provide insight into the results of those decisions. With the development of SAS Visual Analytics content decentralized into the hands of business users, the use of automated version control is essential to providing protection and recovery in the event of inadvertent changes to that content. Re-creation of complex report objects accidentally modified by a business user is time-consuming and can be eliminated by maintaining a version control repository of report (and other) objects created in SAS Visual Analytics. This paper walks through the steps for implementing an automated process for version control using SAS^®. This process can be applied to all types of metadata objects used in multiple SAS application development and analysis environments, such as reports and explorations from SAS Visual Analytics, and jobs, tables, and libraries from SAS^® Data Integration Studio. Basic concepts for the process, as well as specific techniques used for our implementation are included. So eliminate the risk of content loss for your business users and the burden of manual version control for your applications developers. Your IT shop will enjoy time savings and greater reliability.

Read the paper (PDF).

With cloud service providers such as Amazon commodifying the process to create a server instance based on desirable OS and sizing requirements for a SAS^® implementation, a definite advantage is the speed and simplicity of getting started with a SAS installation. Planning horizons are nonexistent, and initial financial outlay is economized because no server hardware procurement occurs, no data center space reserved, nor any hardware/OS engineers assigned to participate in the initial server instance creation. The cloud infrastructure seems to make the OS irrelevant, an afterthought, and even just an extension of SAS software. In addition, if the initial sizing, memory allocation, or disk space selection results later in some deficiency or errors in SAS processing, the flexibility of the virtual server instance allows the instance image to be saved and restored to a new, larger, or performance-enhanced instance at relatively low cost and minor inconvenience to production users. Once logged on with an authenticated ID, with Internet connectivity established, a SAS installer ID created, and a web browser started, it's just a matter of downloading the SAS^® Download Manager to begin the creation of the SAS software depot. Many Amazon cloud instances have download speeds that tend to be greater and processing time that is shorter to create the depot. Installing SAS via the SAS^® Deployment Wizard is not dissimilar on a cloud instance versus a server instance, and all the same challenges (for example, SSL, authentication and single sign-on, and repository migration) apply. Overall, SAS administrators have an optimal, straightforward, and low-cost opportunity to deploy additional SAS instances running different versions or more complex configurations (for example, SAS^® Grid Computing, resource-based load balancing, and SAS jobs split and run parallel across multiple nodes). 
While the main advantages of using a cloud instance to deploy a new SAS implementation tend to revolve around efficiency, speed, and affordability, its pitfalls have to do with vulnerability to intrusion and external attack. The same easy, low-cost server instance launch also has a negative flip side that includes a possible lack of experienced OS oversight and basic security precaution. At the moment, Linux administrators around the country are patching their physical and virtual systems to prevent the spread of the Shellshock vulnerability for web servers that originated in cloud instances. Cloud instances have also been targeted and credentials compromised, which, in some cases, has allowed thousands of new instances to be spun up and billed to an unsuspecting AWS account holder. Extra steps have to be taken to prevent the aforementioned attacks and, fortunately, there are cloud-based methods available. By creating a Virtual Private Cloud (VPC) instance, AWS users can restrict access by originating IP addresses, although this requires additional administration, including creating entries for application ports that require external access. Moreover, with each step toward more secure cloud implementations, there are additional complexities that arise, including making additional changes or compromises with corporate firewall policy and user authentication methods.

Read the paper (PDF).

Many industries are challenged with requirements to protect information and limit its access. In this paper, we will discuss various approaches for row-level access to LASR tables and demonstrate our implementation. Methods discussed in this paper include security joins in data queries, using a star schema with a security table as one dimension, permission conditions based on user information stored in metadata, and user IDs being associated with data as a dedicated column. The paper then identifies shortcomings and strengths of the various approaches as well as the iterations to satisfy business needs that led us to our row-level permissions implementation. In addition, the paper offers recommendations and other considerations to keep in mind while working on row-level permissions with LASR tables.

Read the paper (PDF).

The SAS^® LASR™ Analytic Server acts as a back-end, in-memory analytics engine for solutions such as SAS^® Visual Analytics and SAS^® Visual Statistics. It is designed to exist in a massively scalable, distributed environment, often alongside Hadoop. This paper guides you through the impacts of the architecture decisions shared by both software applications and what they specifically mean for SAS^®. We then present positive actions you can take to rebound from unexpected outages and resume efficient operations.

Read the paper (PDF).

SAS^® Visual Analytics is a product that easily enables the interactive analysis of data. It offers capabilities for analyzing data using a visual approach. This paper discusses architecture options for configuring a SAS Visual Analytics installation that serves multiple customers in parallel. The overall objective is to create an environment that scales with the volume of data and also with the number of customer groups. This paper explains several concepts for serving multiple customer groups and explains the pros and cons of each approach.

Read the paper (PDF).

SAS^® Visual Analytics is very responsive in analyzing historical data, and it takes advantage of in-memory data. Data query, exploration, and reports form the basis of the tool, which also has other forward-looking techniques such as star schemas and stored processes. A security model is established by defining the permissions through a web-based application that is stored in a database table. That table is brought to the SAS Visual Analytics environment as a LASR table. Typically, security is established based on the departmental access, geographic region, or other business-defined groups. This permission table is joined with the underlying base table. Security is defined by a data filter expression through a conditional grant using SAS^® metadata identities. The in-memory LASR star schema is very similar to a typical star schema. A single fact table that is surrounded by dimension tables is used to create the star schema. The star schema gives you the advantage of loading data quickly on the fly. Each of the dimension tables is joined to the fact table with a dimension key. A SAS application that gives the flexibility and the power of coding is created as a stored process that can be executed as requested by client applications such as SAS Visual Analytics. Input data sources for stored processes can be either LASR tables in the SAS^® LASR™ Analytic Server or any other data that can be reached through the stored process code logic.
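The permission-table join described above can be illustrated with a minimal sketch. This is plain Python, not the SAS conditional-grant mechanism or any LASR API. The function name `visible_rows`, the `user_id` and `region` columns, and the sample data are all hypothetical and chosen only to show the idea of filtering a fact table by the rows a user is permitted to see.

```python
def visible_rows(fact_rows, permissions, user_id, key="region"):
    """Return the fact rows whose `key` value is granted to `user_id`.

    `permissions` stands in for the permission table: one row per
    (user_id, key value) grant. All column names here are hypothetical.
    """
    # Collect the key values (for example, regions) this user may see.
    allowed = {p[key] for p in permissions if p["user_id"] == user_id}
    # Equivalent to an inner join of the fact table with the user's grants.
    return [row for row in fact_rows if row[key] in allowed]


# Hypothetical sample data.
facts = [
    {"region": "EAST", "sales": 100},
    {"region": "WEST", "sales": 250},
    {"region": "NORTH", "sales": 75},
]
grants = [
    {"user_id": "alice", "region": "EAST"},
    {"user_id": "alice", "region": "NORTH"},
    {"user_id": "bob", "region": "WEST"},
]

alice_view = visible_rows(facts, grants, "alice")
```

A user with no rows in the permission table sees nothing, which matches the usual default-deny expectation for row-level security.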

Read the paper (PDF).

So you have big data and need to know how to quickly and efficiently keep your data up-to-date and available in SAS^® Visual Analytics? One of the challenges that customers often face is how to regularly update data tables in the SAS^® LASR™ Analytic Server, the in-memory analytical platform for SAS Visual Analytics. Is appending data always the right answer? What are some of the key things to consider when automating a data update and load process? Based on proven best practices and existing customer implementations, this paper provides you with answers to those questions and more, enabling you to optimize your update and data load processes. This ensures that your organization develops an effective and robust data refresh strategy.

Read the paper (PDF).

Becoming one of the best memorizers in the world doesn't happen overnight. With hard work, dedication, a bit of obsession, and the assistance of some clever analytics metrics, Nelson Dellis was able to climb to the top of the memory rankings in under a year to become the now three-time USA Memory Champion. In this talk, he explains what it takes to become the best at memory, what is involved in such grueling memory competitions, and how analytics helped him get there.

SAS^® platform administrators always feel the pinch of not having information about how much storage space is occupied by each user on one specific file system or in the entire environment. Sometimes the platform administrator does not have access to all users' folders, so they have to plan for the worst. There are multiple approaches to tackle this problem. One of the better methods is to initiate an alert mechanism to notify a user when they are in the top 10 file system users on the system.
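One way to find the top storage consumers is to walk a file system and total file sizes by owner. The sketch below is written in Python rather than SAS and is not the paper's method. The function names `usage_by_owner` and `top_users` are hypothetical, owners are reported as numeric user IDs, and directories the caller cannot read are silently skipped, so totals may be undercounted in that case.

```python
import os
from collections import Counter


def usage_by_owner(root):
    """Sum file sizes in bytes under `root`, grouped by owning numeric user ID."""
    totals = Counter()
    # onerror swallows unreadable directories instead of raising.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable: skip it
            totals[info.st_uid] += info.st_size
    return totals


def top_users(totals, n=10):
    """Return the `n` largest (user ID, bytes) pairs, biggest first."""
    return totals.most_common(n)
```

The result of `top_users(usage_by_owner("/some/path"))` could then feed whatever notification step the site uses. Mapping numeric IDs to user names and sending the alert are left out because they depend on the platform.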

Read the paper (PDF).

Smoke detectors operate by comparing actual air quality to expected air quality standards and immediately alerting occupants when smoke or particle levels exceed established thresholds. Just as rapid identification of smoke (that is, poor air quality) can detect harmful fire and facilitate its early extinguishment, rapid detection of poor quality data can highlight data entry or ingestion errors, faulty logic, insufficient or inaccurate business rules, or process failure. Aspects of data quality--such as availability, completeness, correctness, and timeliness--should be assessed against stated requirements that account for the scope, objective, and intended use of data products. A single outlier, an accidentally locked data set, or even subtle modifications to a data structure can cause a robust extract-transform-load (ETL) infrastructure to grind to a halt or produce invalid results. Thus, a mature data infrastructure should incorporate quality assurance methods that facilitate robust processing and quality data products, as well as quality control methods that monitor and validate data products against their stated requirements. The SAS^® Smoke Detector represents a scalable, generalizable solution that assesses the availability, completeness, and structure of persistent SAS data sets, ideal for finished data products or transactional data sets received with standardized frequency and format. Like a smoke detector, the quality control dashboard is not intended to discover the source of the blaze, but rather to sound an alarm to stakeholders that data have been modified, locked, deleted, or otherwise corrupted. Through rapid detection and response, the fidelity of data is increased as well as the responsiveness of developers to threats to data quality and validity.
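The kind of check described above (availability, completeness, and structure) can be sketched in a few lines. This is not the SAS Smoke Detector itself. It is a Python illustration that uses a CSV file as a stand-in for a persistent data set, and the function name `check_dataset`, its parameters, and the alarm wording are assumptions made for the example.

```python
import csv
import os


def check_dataset(path, expected_columns, min_rows):
    """Return a list of alarm strings; an empty list means no smoke detected."""
    # Availability: the data set must exist before anything else is checked.
    if not os.path.exists(path):
        return ["availability: file not found"]

    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # first line holds column names
        row_count = sum(1 for _ in reader)  # remaining lines are data rows

    alarms = []
    # Structure: column names and order must match the stated requirement.
    if header != list(expected_columns):
        alarms.append(f"structure: columns are {header}")
    # Completeness: fewer rows than expected suggests a partial load.
    if row_count < min_rows:
        alarms.append(f"completeness: {row_count} rows, expected at least {min_rows}")
    return alarms
```

As in the abstract, the check only raises an alarm. It does not try to diagnose why the data changed.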

Read the paper (PDF).

As SAS^® products become more web-oriented and sophisticated, SAS administrators face an increased challenge to manage their SAS middle-tier environments. They want to know the answers to critical questions when planning, installing, configuring, deploying, and administering their SAS products. They also need to meet the requirements of high performance, high availability, increased security, maintainability, and more. In this paper, we identify the most common and challenging questions that our administrators and customers have asked. These questions range across topics such as SAS middle-tier architecture, clustering, performance, security, and administration using SAS^® Environment Manager. These questions come from many sources such as technical support, consultants, and internal customer experience testing teams. The specific questions include: what is new in the SAS 9.4 mid-tier infrastructure and why is that better for me; should I use the SAS Web Server or can I use another third-party web server in my deployment; where can I deploy custom dynamic web applications and static content; what are the SAS JRE, SAS Web Server, and SAS Web Application Server upgrade policy and process; how do I architect and configure to achieve high availability for EBI and VA; how do I install, update, or add my products for cluster members; how can I tune the mid-tier performance and improve the start-up time of my SAS Web Application Server; what options are available for configuring SSL; what is the security policy, what security patches are available, and how do I apply them; how can I manage my mid-tier infrastructure and applications; and how are users and accounts managed in SAS Environment Manager? The paper will present detailed answers for these questions and also point out where you can find more information.
We believe that with the answers to these questions, you, SAS administrators, can better implement and manage your SAS environment with greater confidence and satisfaction.