SAS Global Forum 2014 Proceedings

Nowadays, most corporations build and maintain their own data warehouse, and an ETL (Extract, Transform, and Load) process plays a critical role in managing the data. Some people might create a large program and execute this program from top to bottom. Others might generate a SAS^® driver with several programs included, and then execute this driver. If some programs can be run in parallel, then developers must write extra code to handle these concurrent processes. If one program fails, then users can either rerun the entire process or comment out the successful programs and resume the job from where the program failed. Usually the programs are deployed in production with read and execute permission only. Users do not have the privilege of modifying code on the fly. In this case, how do you comment out the programs if the job terminated abnormally? This paper illustrates an approach for managing ETL process flows. The approach uses a framework based on SAS, on a UNIX platform. This is a high-level infrastructure discussion with some explanation of the SAS code that is used to implement the framework. The framework supports the rerun or partial run of the entire process without changing any source code. It also supports concurrent processes, and therefore no extra code is needed.
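
The abstract does not say how the framework records progress, so the following is only a minimal Python sketch of one common way to resume a flow without editing its programs: keep the list of finished steps outside the source and skip those steps on a rerun. The names `run_flow`, `demo_step`, and the step names are hypothetical, and the in-memory set stands in for whatever persistent status store a real job would use.

```python
def run_flow(steps, completed, run_step):
    """Run `steps` in order, skipping any whose name is already in `completed`.

    `completed` stands in for state kept outside the program source
    (assumption: for example a status file the job is allowed to write).
    """
    for name in steps:
        if name in completed:
            continue              # succeeded in an earlier run; do not repeat
        run_step(name)            # an exception here aborts the flow
        completed.add(name)       # recorded only after the step succeeds
    return completed


# Hypothetical three-step flow whose second step fails on the first attempt.
attempts = []

def demo_step(name):
    attempts.append(name)
    if name == "transform" and attempts.count("transform") == 1:
        raise RuntimeError("transform failed")

state = set()
try:
    run_flow(["extract", "transform", "load"], state, demo_step)
except RuntimeError:
    pass                          # first run stops at the failed step

run_flow(["extract", "transform", "load"], state, demo_step)  # rerun resumes
```

On the rerun, `extract` is skipped because it is already recorded as complete, which is the effect that commenting out successful programs would otherwise achieve.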

SID file, SAS^® Deployment Wizard, SAS^® Migration Utility, SAS^® Environment Manager, plan file. SAS^® can seem very mysterious to IT organizations used to working with other software solutions. The more IT knows and understands about SAS (how it works, what its system requirements are, how to maintain it and back it up, and what its value is to the organization), the better IT can support the SAS shop. This paper provides an introduction to the world of SAS and sheds light on some of the unique elements of maintaining a SAS environment.

SAS^® 9.4 has overhauled web authentication schemes, and the integration with enterprise security infrastructure is quite different from that of SAS^® 9.3. This paper examines advanced security features such as Secure Sockets Layer (SSL) configuration, single sign-on (SSO) support through Integrated Windows authentication (IWA), and third-party security packages like CA SiteMinder and IBM Tivoli Access Manager and WebSEAL. FIPS 140-2 compliance efforts that enforce the use of a stronger encryption algorithm for web communication and the SAS^® system itself are also described. The authentication support for mobile devices such as the iPad is different. The secure Wi-Fi connection from a mobile device to the IT internal resources, as well as how it can be safely integrated into the enterprise security configuration by using the same user repository as the SAS web applications, is explained. The configuration example is shown with SAS^® Visual Analytics 6.2.

SAS^® 9.4 and SAS^® Visual Analytics support a wide list of authentication protocols such as Integrated Windows authentication (IWA), client certificate, IBM WebSEAL, CA SiteMinder, and Security Assertion Markup Language (SAML) 2.0. However, advanced customers might want to use some of these protocols together and also have the flexibility to select which protocols to use. In this paper, we focus on a fallback authentication framework that supports IWA as the primary authentication method. When IWA fails, it uses the X509 client certificate as the secondary authentication method, and when the client certificate fails, it uses the form-based username/password as the last option. The paper first introduces the security architecture of SAS^® 9.4 and SAS Visual Analytics. It then reviews the three above-mentioned security protocols. Further, it introduces the detailed fallback authentication framework and discusses how to configure it. Finally, we discuss the use of this framework in a customer scenario, drawing on the implementation of the fallback authentication framework in a customer's SAS^® 9.4 and SAS Visual Analytics environment.

Big data! Hadoop! MapReduce! These are all buzzwords that you've probably already heard mentioned at SAS^® Global Forum 2014. But what exactly is MapReduce and what has it got to do with SAS^®? This talk explains how a simple processing framework (created by Google and more recently popularized by the open-source technology Hadoop) can be replicated using cornerstone SAS technologies such as Base SAS^®, SAS macros, and SAS/CONNECT^®. The talk explains how, out of the box, the SAS DATA step can replicate the MAP function. It looks at how well-established SAS procedures can be used to create reduce-like functionality. We look at how parallel processing data across multiple machines using MPCONNECT can replicate MapReduce's shared-nothing approach to data processing.
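
For readers unfamiliar with the pattern itself, here is a small Python sketch of the generic map, shuffle, and reduce phases applied to a word count. It is not the talk's SAS implementation (which uses the DATA step, SAS procedures, and MPCONNECT); the function names and sample records are illustrative assumptions, and everything runs in a single process rather than across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit one (word, 1) pair for each word in a single input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["big data", "Big Hadoop", "data data"])
```

Because each call to `map_phase` touches only its own record, the map work could be split across independent workers, which is the shared-nothing property the talk refers to.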

With a growing enterprise analytics environment that comprises global users and a variety of sensitive data sources, a system administrator is faced with the challenge of knowing who logs into the system, how often, and what applications and what data sources are being consumed. This information is necessary for auditing the consumers of data as well as for monitoring the growth of data sources for hardware expansion. With the use of SAS^® Audit, Performance and Measurement Package, along with some additional middle-tier logging and SAS^® code, information about the major consumers of the environment can be loaded into LASR tables and analyzed with SAS^® Visual Analytics reporting tools.

There are many components that make up the middle tier and server tier of a SAS^® 9.4 deployment. There is also a variety of technologies that can be used to provide high availability of these components. This paper focuses on a small set of best practices recommended by SAS for a consistent high-availability strategy across the entire SAS 9.4 platform. We focus on two technologies: clustering and the high-availability features of SAS^® Grid Manager. For clustering, we detail newly introduced clustering capabilities in SAS 9.4 such as the middle-tier SAS^® Web Application Server and the server-tier SAS^® metadata clusters. We also introduce the small, medium, and large deployment scenarios or profiles, which make use of each of these technologies. These deployment scenarios reflect the typical customer's environment and address their high-availability, performance, and scalability requirements.

You can provide access and visibility to SAS^® BI Dashboards, SAS^® Stored Processes, and SAS^® Visual Analytics through the use of SAS^® Web Parts for Microsoft SharePoint. In many organizations, the administrators who are responsible for SharePoint and SAS^® are different. This paper provides best practices for the deployment of SAS Web Parts for Microsoft SharePoint. Bridging the gap between SharePoint and SAS is especially important for people who are not familiar with SharePoint administration. This paper also provides tips for co-existence between SAS Web Parts for Microsoft SharePoint 6.1 and 5.1. (The 5.1 release is available in SAS^® 9.3. The 6.1 release is available in SAS^® 9.4.) Finally, this paper provides some guidance on DNS, permissions, and installation techniques: the fine points that make or break your deployment!

Usually, log files are checked by users only when SAS^® completes the execution of programs. If SAS finds an error in the current line, it skips the current step and continues with the next line. The process completes only when the entire program has been executed. There are a few programs that take more than a day to complete. In this case, the user frequently opens the log file in read-only mode to check for errors, warnings, and unexpected notes, and terminates the execution of the program manually if any problematic messages are identified. Otherwise, the user is notified of the errors in the log file only at the end of the execution. Our suggestion is to run a parallel utility program along with the production program to check the log file of the currently running program and to notify the user by e-mail when an error, warning, or unexpected note is found in the log file. Also, the execution can be terminated automatically and the user notified when such messages are identified.
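
The core of such a utility is an incremental scan of the log. The Python sketch below illustrates that one piece under stated assumptions: flagged messages start with `ERROR` or `WARNING` (optionally followed by a numeric code), and the keywords that mark a note as "unexpected" are hypothetical examples. It is not the authors' utility, and the e-mail notification and job termination steps are left out.

```python
import re

# Assumed log conventions: message lines start with ERROR or WARNING,
# optionally followed by a numeric code such as "22-322", then a colon.
FLAGGED = re.compile(r"^(ERROR|WARNING)( \d+-\d+)?:")

# Hypothetical keywords that make a NOTE line worth reporting.
NOTE_KEYWORDS = ("uninitialized", "Invalid")


def scan_log(lines, start=0):
    """Check lines[start:] and return (findings, next_start).

    Each finding is a (line_number, text) pair with 1-based line numbers.
    Passing the returned next_start to the following call means only
    lines written since the previous check are examined.
    """
    findings = []
    for number, line in enumerate(lines[start:], start=start + 1):
        unexpected_note = line.startswith("NOTE:") and any(
            keyword in line for keyword in NOTE_KEYWORDS
        )
        if FLAGGED.match(line) or unexpected_note:
            findings.append((number, line.rstrip()))
    return findings, len(lines)


log = [
    "NOTE: The data set WORK.A has 10 observations.",
    "WARNING: Variable X already exists.",
    "NOTE: Variable y is uninitialized.",
    "ERROR 22-322: Syntax error.",
]
findings, next_start = scan_log(log)

# Later check: one new line has been appended to the log since the first scan.
log.append("ERROR: File not found.")
new_findings, next_start = scan_log(log, start=next_start)
```

A monitoring loop would call `scan_log` at a fixed interval and send a notification whenever `findings` is not empty.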

This paper describes a method that uses some simple SAS^® macros and SQL to merge data sets of related data whose rows have varying effective date ranges. The data sets are merged into a single data set that represents a serial list of snapshots of the merged data, as of a change in any of the effective dates. While simple conceptually, this type of merge is often problematic when the effective date ranges are not consecutive or consistent, when the ranges overlap, or when there are missing ranges from one or more of the merged data sets. The technique described was used by the Fairfax County Human Resources Department to combine various employee data sets (Employee Name and Personal Data, Personnel Assignment and Job Classification, Personnel Actions, Position-Related data, Pay Plan and Grade, Work Schedule, Organizational Assignment, and so on) from the County's SAP-HCM ERP system into a single Employee Action History/Change Activity file for historical reporting purposes. The technique currently is used to combine fourteen data sets, but is easily expandable by inserting a few lines of code using the existing macros.
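
To make the idea of "snapshots as of a change in any effective date" concrete, here is a minimal Python sketch for a single employee. It is an illustration of the general technique, not the paper's SAS macros or SQL: it assumes half-open ranges (start inclusive, end exclusive), and the data set names and values are invented.

```python
from datetime import date


def covering(rows, day):
    """Return the value of the first row whose [start, end) range contains day."""
    for start, end, value in rows:
        if start <= day < end:
            return value
    return None                     # gap: no row of this data set is in effect


def merge_effective(datasets):
    """Merge {name: [(start, end, value), ...]} into a serial list of snapshots.

    Every start or end date in any data set becomes a cut point, so a new
    snapshot begins whenever any one of the inputs changes.
    """
    cuts = sorted({d for rows in datasets.values()
                   for start, end, _ in rows for d in (start, end)})
    snapshots = []
    for start, end in zip(cuts, cuts[1:]):
        values = {name: covering(rows, start) for name, rows in datasets.items()}
        if any(v is not None for v in values.values()):   # skip fully empty gaps
            snapshots.append((start, end, values))
    return snapshots


# Invented example data for one employee.
history = merge_effective({
    "job": [(date(2014, 1, 1), date(2014, 7, 1), "Analyst"),
            (date(2014, 7, 1), date(2015, 1, 1), "Manager")],
    "grade": [(date(2014, 3, 1), date(2014, 9, 1), "G7")],
})
```

Here the two job rows and one grade row yield four snapshots, with `None` for the grade before March and after September. Overlapping rows within one data set are resolved by taking the first match, which is a simplification.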

With the growth in size and complexity of organizations investing in SAS^® platform technologies, the size and complexity of ETL subsystems and data integration (DI) jobs are growing at a rapid rate. Developers are pushed to come up with new and innovative ways to improve process efficiency in their DI jobs to meet increasingly demanding service level agreements (SLAs). The ability to conditionally execute or switch paths in a DI job is an extremely useful technique for improving process efficiency. How can a SAS^® Data Integration developer design a job to best suit conditional execution? This paper discusses a technique for providing a parameterized dynamic execution custom transformation that can be easily incorporated into SAS^® Data Integration Studio jobs to provide process path switching capabilities. The aim of any data integration task is to ensure that all sources of business data are integrated as efficiently as possible. It is concerned with the repurposing of data via transformation, should be a value-adding process, and also should be the product of collaboration. Modularization of common or repeatable processes is a fundamental part of the collaboration process in DI design and development. Switch Path, a custom transformation built to conditionally execute branches or nodes in SAS Data Integration Studio, provides a reusable module for solving the conditional execution limitations of standard SAS Data Integration Studio transformations and jobs. Because it is completely reusable, Switch Path logic in SAS Data Integration Studio can serve many purposes in the day-to-day business needs of a SAS data integration developer.

If you have an existing SAS^® Business Intelligence environment and you want to add SAS^® Visual Analytics, you need to make some architectural choices. SAS Visual Analytics and SAS Business Intelligence can share certain components, such as a SAS^® Metadata Server and the SAS^® Web Infrastructure Platform. Sharing metadata eliminates the need to create and maintain duplicate information, and it enables your users to take advantage of functionality that can be shared between SAS Visual Analytics and SAS Business Intelligence. Sharing the SAS Web Infrastructure Platform enables SAS middle-tier applications such as SAS^® Visual Analytics Services and SAS^® Web Report Studio to communicate with each other. Intended for SAS architects and administrators, this paper explores supported architecture for SAS Visual Analytics and SAS Business Intelligence. The paper then identifies areas where the architecture can be shared as well as where resources should be kept separate. In addition, the paper offers recommendations and other considerations to keep in mind when you are managing shared resources.

Having data that are consistent, reliable, and well linked is one of the biggest challenges faced by financial institutions. The paper describes how the SAS^® Data Management offering helps to connect people, processes, and technology to deliver consistent results for data sourcing and analytics teams, and minimizes the cost and time involved in the development life cycle. The paper concludes with best practices learned from various enterprise data initiatives.

Your company's chronically overloaded SAS^® environment, adversely impacted user community, and the resultant lackluster productivity have finally convinced your upper management that it is time to upgrade to a SAS^® grid to eliminate all the resource problems once and for all. But after the contract is signed and implementation begins, you as the SAS administrator suddenly realize that your company-wide standard mode of SAS operations, that is, using the traditional SAS^® Display Manager on a server machine, runs counter to the expectation of the SAS grid: your users are now supposed to switch to SAS^® Enterprise Guide^® on a PC. This is utterly unacceptable to the user community because almost everything has to change in a big way. If you like to play a hero in your little world, this is your opportunity. There are a number of things you can do to make the transition to the SAS grid as smooth and painless as possible, and your users get to keep their favorite SAS Display Manager.

Very often, there is a need to present the analysis output from SAS^® through web applications. On these occasions, it makes a lot of difference to have highly interactive charts rather than static image charts and graphs. Not only is this visually appealing, but features like zooming and filtering also give consumers a better understanding of the output. There are a lot of charting libraries available in the market that enable us to develop cool charts without much effort. Some of the packages are Highcharts, Highstock, KendoUI, and so on. They are developed in JavaScript and use the latest HTML5 components, and they also support a variety of chart types such as line, spline, area, area spline, column, bar, pie, scatter, angular gauges, area range, area spline range, column range, bubble, box plot, error bars, funnel, waterfall, and polar charts. This paper demonstrates how we can combine the data processing and analytic powers of SAS with the visualization abilities of these charting libraries. Since most of them consume JSON-formatted data, the emphasis is on the JSON-producing capabilities of SAS, both with PROC JSON and other custom programming methods. The example shows how easy it is to develop a stored process that produces JSON data to be consumed by the charting library with minimum change to the sample program.
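
As an illustration of the data-shaping step only, the Python sketch below turns flat rows into a JSON document with one named series per group. It is a stand-in for, not a copy of, the paper's PROC JSON and stored process code. The `series`/`name`/`data` layout follows the general style of Highcharts series, but the exact structure a given library expects is an assumption to verify against that library's documentation; the column names and figures are invented.

```python
import json
from collections import defaultdict


def rows_to_series(rows, name_key, category_key, value_key):
    """Group flat rows into one series per distinct value of `name_key`.

    Each series carries [category, value] pairs in input order.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[name_key]].append([row[category_key], row[value_key]])
    series = [{"name": name, "data": data} for name, data in grouped.items()]
    return json.dumps({"series": series})


# Invented sample rows, as they might come out of a summary step.
rows = [
    {"region": "East", "month": "Jan", "sales": 10},
    {"region": "East", "month": "Feb", "sales": 12},
    {"region": "West", "month": "Jan", "sales": 7},
]
payload = rows_to_series(rows, name_key="region",
                         category_key="month", value_key="sales")
```

The resulting string is what a web endpoint would return for the charting library to load.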

APP is an unofficial collective abbreviation for the SAS^® functions ADDR, PEEK, PEEKC, the CALL POKE routine, and their so-called LONG 64-bit counterparts: the SAS tools designed to directly read from and write to physical memory in the DATA step. APP functions have long been a SAS dark horse. First, the examples of APP usage in SAS documentation amount to a few technical report tidbits intended for mainframe system programming, with nary a hint of how the functions can be used for data management programming. Second, the documentation note on the CALL POKE routine is so intimidating in tone that many potentially receptive folks might decide to avoid the allegedly precarious route altogether. However, little can stand in the way of an inquisitive SAS programmer daring to take a close look, and it turns out that APP functions are very simple and useful tools! They can be used to explore how things really work, to make code more concise, to implement en masse data movement, and they can often dramatically improve execution efficiency. The author and many other SAS experts (notably Peter Crawford, Koen Vyverman, Richard DeVenezia, Toby Dunn, and the fellow masked by his 'Puddin' Man' sobriquet) have been poking around the SAS APP realm on SAS-L and in their own practices since 1998, occasionally letting the SAS community at large peek at their findings. This opus is an attempt to circumscribe the results in a systematic manner. Welcome to the APP world! You are in for a few glorious surprises.

With the introduction of new features in SAS^® 9.4 Grid Manager, administrators of SAS solutions have even better capabilities for effectively managing the use of SAS^® Enterprise Guide^® in a grid environment. In this paper, we explain and demonstrate proven practices for configuring the SAS 9.4 Grid Manager environment, leveraging grid options sets and grid-spawned SAS^® Workspace Servers. We walk through the options provided by SAS Enterprise Guide that make the most effective use of the grid environment.

Typically, it takes a system administrator to understand the graphic data results that are generated in the Microsoft Windows Performance Monitor. However, using SAS/GRAPH^® software, you can customize performance results in such a way that makes the data easier to read and understand than the data that appears in the default performance monitor graphs. This paper uses a SAS^® data set that contains a subset of the most common performance counters to show how SAS programmers can create an improved, easily understood view of the key performance counters by using SAS/GRAPH software. This improved view can help your organization reduce resource bottlenecks on systems that range from large servers to small workstations. The paper begins with a concise explanation of how to collect data with Windows Performance Monitor. Next, examples are used to illustrate the following topics in detail: converting and formatting a subset of the performance-monitor data into a data set; using a SAS program to generate clearly labeled graphs that summarize performance results; and analyzing results in different combinations that illustrate common resource bottlenecks.

Pipeline parallelism, an extension of MP Connect, is an effective way to speed processing. Piping allows the typical programming sequence of DATA step followed by PROC to execute in parallel. Piping uses TCP ports to pass records directly from the DATA step to the PROC immediately as each individual record is processed. The DATA step in effect becomes a data transformation filter for the PROC, running in parallel and incurring no additional disk storage or related I/O lag. Establishing a pipe with MP Connect typically requires specifying a physical TCP port to be used by the writing and by the reading processes. Coding in this style opens the possibility for users to generate system conflicts by inadvertently requesting ports that are in use. SAS^® Metadata Server allows one to allocate ports dynamically; that is, users can use a symbolic name for the port, with the server dynamically determining an unused port to temporarily assign to the SAS^® job. While this capability is attractive, implementing SAS Metadata Server on a system that does not use any of the other SAS BI technology can be inefficient from a cost perspective. To enable dynamic port allocation without the added cost, we created a UNIX script that can be called from within SAS to ascertain which ports are available at runtime. The script returns a list of available ports, which is captured in a SAS macro variable and subsequently used in establishing pipeline parallelism.
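
The authors' UNIX script is not reproduced in the abstract. As a rough illustration of runtime port discovery, the Python sketch below uses a different, widely used technique: binding a socket to port 0 so that the operating system picks an unused port. The function name `find_free_ports` and the loopback host are assumptions made for the example.

```python
import socket


def find_free_ports(count, host="127.0.0.1"):
    """Return `count` distinct TCP ports that were unused at call time.

    Binding to port 0 asks the operating system to choose a free port.
    All sockets stay open until every port has been chosen, so the same
    port cannot be handed out twice within one call.
    """
    sockets, ports = [], []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(sock)
            sock.bind((host, 0))
            ports.append(sock.getsockname()[1])
    finally:
        for sock in sockets:
            sock.close()           # release the ports for the real job to use
    return ports


ports = find_free_ports(2)
```

Any approach of this kind has a small race window: another process can claim a port between the moment it is reported as free and the moment the job opens it, so the consumer should still handle a bind failure.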

Speed, precision, reliability: these are just three of the many challenges that today's banking institutions need to face. Join Austria's ERSTE GROUP Bank on their road from monolithic processing toward a highly flexible processing infrastructure using SAS^® Grid technology. This paper focuses on the central topics and decisions that go beyond the standard material about the product that is presented initially to SAS Grid prospects. Topics covered range from how to choose the correct hardware and critical architecture considerations to the necessary adaptations of existing code and logic, all of which have proven to be a common experience for all the members of the SAS Grid community. After making the initial plans and successfully managing the initial hurdles, seeing it all come together makes you realize the endless possibilities for improving your processing landscape.

A group tasked with testing SAS^® software from the customer perspective has gathered a number of helpful hints for SAS^® 9.4 that will smooth the transition to its new features and products. These hints will help with the 'huh?' moments that crop up when you're getting oriented and will provide short, straightforward answers. And we can share insights about changes in your order contents. Gleaned from extensive multi-tier deployments, SAS^® Customer Experience Testing shares insiders' practical tips to ensure you are ready to begin your transition to SAS^® 9.4.

This case study shows how SAS^® Enterprise Guide^® and SAS^® Enterprise BI made it possible to easily implement fraud prevention reports at BF Financial Services, and how they helped operational areas increase efficiency by automating information delivery. The fraud alert report was produced by a program developed in SAS Enterprise Guide to detect fraud in loan applications and was later published in SAS^® Web Report Studio to be analyzed by a team. The second example is the automation, with SAS BI, of a payment report that had consumed 30% of the time of a six-worker staff.

SAS^® High-Performance Analytics is a significant step forward in the area of high-speed, analytic processing in a scalable clustered environment. However, Big Data problems generally come with data from lots of data sources, at varying levels of maturity. Teradata's innovative Unified Data Architecture (UDA) represents a significant improvement in the way that large companies can think about Enterprise Data Management, including the Teradata Database, Hortonworks Hadoop, and Aster Data Discovery platform in a seamless integrated platform. Together, the two platforms provide business users, analysts, and data scientists with the ideally suited data management platforms, targeted specifically to their analytic needs, based upon analytic use cases, managed in a single integrated enterprise data management environment. The paper will focus on how several companies today are using Teradata's Integrated Hardware and Software UDA Platform to manage a single enterprise analytic environment, fight the ongoing proliferation of analytic data marts, and speed their operational analytic processes.

SAS^® solutions are tightly integrated with the scheduling capabilities provided by SAS^® Grid Manager and Platform Suite for SAS^®. Many organizations require that their corporate scheduler be used to control SAS processing within the enterprise. Historically this has been a laborious process, requiring duplication of job and flow information using manual forms and cumbersome change management. This paper provides proven techniques and methods that enable tight integration between the corporate scheduler and SAS without the administrative overhead. Platform Suite for SAS can be used to create flows which are then executed by the corporate scheduler. The business unit can tweak the flow without reference to the enterprise scheduling team. The approaches discussed are: using the corporate scheduler to trigger SAS flows and respond to flow return codes, to restart a SAS flow that has exited due to error conditions, and to enable and disable LSF queues, allowing jobs that have been queued up to run within a time window that is managed on external dependencies rather than time; how to configure your SAS environment to leverage the provided capabilities; and real-world use cases to highlight the features and benefits of this approach. The contents of this paper are of interest to SAS administrators and IT personnel responsible for enterprise scheduling. Full code and deployment instructions will be made available.

Item response theory (IRT) is concerned with accurate test scoring and development of test items. You design test items to measure various types of abilities (such as math ability), traits (such as extroversion), or behavioral characteristics (such as purchasing tendency). Responses to test items can be binary (such as correct or incorrect responses in ability tests) or ordinal (such as degree of agreement on Likert scales). Traditionally, IRT models have been used to analyze these types of data in psychological assessments and educational testing. With the use of IRT models, you can not only improve scoring accuracy but also economize test administrations by adaptively using only the discriminative items. These features might explain why in recent years IRT models have become increasingly popular in many other fields, such as medical research, health sciences, quality-of-life research, and even marketing research. This paper describes a variety of IRT models, such as the Rasch model, two-parameter model, and graded response model, and demonstrates their application by using real-data examples. It also shows how to use the IRT procedure, which is new in SAS/STAT^® 13.1, to calibrate items, interpret item characteristics, and score respondents. Finally, the paper explains how the application of IRT models can help improve test scoring and develop better tests. You will see the value in applying item response theory, possibly in your own organization!
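
For orientation, the item response function behind two of the models named above can be written in a few lines. The Python sketch below computes the standard two-parameter logistic probability of a correct (or endorsed) response; fixing the discrimination at 1 gives the Rasch form. It is independent of the IRT procedure in SAS/STAT, and the function and parameter names are illustrative.

```python
import math


def irt_2pl(theta, difficulty, discrimination=1.0):
    """Probability of a correct response under the two-parameter logistic model.

    theta          -- respondent's latent ability (or trait level)
    difficulty     -- item location: the theta at which the probability is 0.5
    discrimination -- item slope; the default of 1.0 gives the Rasch form
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


p_matched = irt_2pl(theta=0.0, difficulty=0.0)     # ability equals difficulty
p_rasch = irt_2pl(theta=1.0, difficulty=0.0)       # Rasch: slope fixed at 1
p_steep = irt_2pl(theta=1.0, difficulty=0.0, discrimination=2.0)
```

A larger discrimination makes the probability rise more sharply around the item's difficulty, which is why highly discriminating items are the most informative ones to administer adaptively.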

SAS^® 9.4 has improved clustering capabilities that allow for scalability and failover for middle-tier servers and the metadata server. In this presentation, we share our experiences with high-availability and failover testing done prior to SAS 9.4 availability. We discuss what we tested and lessons learned (good and bad) while doing the testing.

SAS^® Environment Manager is included with the release of SAS^® 9.4. This exciting new product enables administrators to monitor the performance and operation of their SAS^® deployments. What very few people are aware of is that the data collected by SAS Environment Manager is stored in a centralized data mart that's designed to help administrators better understand the behavior and performance of the components of their SAS solution stack. This data mart could also be used to help organizations to meet their ITIL reporting and measurement requirements. In addition to the information about alerts, events, and performance metrics collected by the SAS Environment Manager agent technology, this data mart includes the metadata audit and content usage data previously available only from the SAS^® Audit, Performance and Measurement Package.

SAS^® has a large portfolio of Java EE applications. In releases previous to SAS^® 9.4, SAS provided support for configuring, deploying, and running these applications in Oracle WebLogic, IBM WebSphere, or Red Hat JBoss. Beginning with SAS^® 9.4, SAS has updated the middle-tier architecture to deliver and run these web applications exclusively in the SAS^® Web Application Server (a specialized, extended configuration of Pivotal tc Server), rather than the other third-party web application servers. This paper discusses the motivation, technology selections, and architecture on which this change is based. It also describes the advantages that the new approach presents to customers, including increased automation of installation and configuration tasks, and improved system administration.

This poster shows the audience step-by-step how to connect to a database without registering the connection in either the Windows ODBC Administrator tool or in the Windows Registry database. This poster also shows how the connection can be more flexible and better managed by building it into a SAS^® macro.

ODS is a power tool for generating HTML-based reports. Quite often, however, there are exacting requirements for report content, layout, and placement that can be done with HTML (and especially HTML5) that can't be done with ODS. This presentation shows several examples that use PROC STREAM and SAS^® Server Pages in a batch environment (for example, scheduled tasks, using SAS^® Display Manager, using SAS^® Enterprise Guide^®) to generate such custom reports. And yes, despite the name SAS Server Pages, this technology, including the use of jQuery widgets, does apply to batch environments. This paper describes and shows several examples that are similar to those presented in the SAS^® Press book SAS Server Pages: Generating Dynamic Content (http://support.sas.com/publishing/authors/extras/64993b.html) and on the author's blog Jurassic SAS in the BI/EBI World (http://hcsbi.blogspot.com/): creating a custom calendar; a sample mail-merge application; generating a custom Microsoft Excel-based report; and generating an expanding drill-down table.

Can you juggle? Maybe. Can you shuffle a deck of cards? Probably. Can you do both at the same time? Welcome to the world of SAS^® and LSF! Very few SAS administrators start out learning LSF at the same time they learn SAS; most already know SAS, possibly starting out as a programmer or analyst, but now have to step up to an enterprise platform with shared resources. The biggest challenge on an enterprise platform? How to share! How do you maximize the utilization of a SAS platform, yet still ensure everyone gets their fair share? This presentation boils down the 2000+ pages of LSF documentation to provide an introduction to various LSF concepts (host, clusters, nodes, queues, first-come-first-serve, and fairshare) and various configuration settings (UJOB_LIMIT, PJOB_LIMIT, etc.), plus some insight on where to configure all these settings: which are set up by the installation process, and which can be configured by the SAS or LSF administrator. This session is definitely NOT for experts. It is for those who are about to step into an enterprise deployment of SAS and want to understand how the SAS server sessions they know so well can run on a shared platform.

The Department of Market Monitoring (DMM) at California ISO is responsible for promoting a robust, competitive, and nondiscriminatory electric power market in California by keeping a close watch on the efficiency and effectiveness of the ancillary service, congestion management, and real-time spot markets. We monitor the potential of market participants to exercise undue market power, the behavior of market participants that is consistent with attempts to exercise market power, and the market performance that results from the interaction of market structure with participant behavior. In order to perform monitoring activities effectively, DMM collects available data and designs and implements reporting dashboards that track key market metrics. We are using various SAS^® BI tools to develop and employ metrics and analytic tools applicable to market structure, participant behavior, and market performance. This paper provides details about the effective use of various SAS BI tools to implement automated real-time market monitoring functionality.

SAS^® 9.4 introduces several new software products to better support SAS^® web applications. These products include SAS^® Web Server, SAS^® Web Application Server (with the availability of out-of-the-box clustering), and SAS^® Environment Manager. Even though these products have been tuned and tested for SAS 9.4 web applications, advanced users might want to know the tools and techniques that they can use to further monitor, manage, tune, and improve the performance of their environment. This paper discusses how customers can achieve that by exploring the following concepts, activities, techniques, and tools: using SAS Environment Manager to monitor run-time performance of middle-tier components; using additional tools to monitor middle-tier components (Apache server-status, Java VisualVM, Java command-line tools, Java GC logging); identifying the potential bottlenecks and tuning suggestions; identifying an appropriate clustering strategy (single-server vs. multi-server for homogeneous or heterogeneous clustering); suggesting the data to collect when analyzing performance (GC data, thread dumps, heap dumps, system resource utilization information, log files); and discussing in-depth performance analysis tools (Thread Dump Analyzer, HPjmeter, Eclipse Memory Analyzer (MAT), and the IBM Support Assistant tools GC and Memory Visualizer, Memory Analyzer, and Thread and Monitor Dump Analyzer).

As organizations deploy SAS^® applications to produce the analytical results that are critical for solid decision making, they are turning to distributed grid computing operated by SAS^® Grid Manager. SAS Grid Manager provides a flexible, centrally managed computing environment for processing large volumes of data for analytical applications. Exceptional storage performance is one of the most critical components of implementing SAS in a distributed grid environment. When the storage subsystem is not designed properly or implemented correctly, SAS applications do not perform well, thereby reducing a key advantage of moving to grid computing. Therefore, a well-architected SAS environment with a high-performance storage environment is integral to clients getting the most out of their investment. This paper introduces concepts from software storage virtualization in the cloud for the generalized SAS Grid Manager architecture, highlights platform and enterprise architecture considerations, and uses the most commonly selected distributed file system, IBM GPFS, as an example. File system scalability considerations, configuration details, and tuning suggestions are provided in a manner that can be applied to a client's own environment. A summary checklist of important factors to consider when architecting and deploying a shared, distributed file system is provided.

There are exciting new capabilities available from SAS^® High-Performance Analytics and SAS^® Visual Analytics. Current customers seek a deployment strategy that enables gradual migration to the new technologies. Such a strategy would mitigate the need for 'rip and replace' and would enable resource utilization to evolve along a continuum rather than partitioning resources, which would result in underused computing or storage hardware. New customers who deploy a combination of SAS^® Grid Manager, SAS High-Performance Analytics, and SAS Visual Analytics seek to reduce the cost of computing resources and reduce data duplication and data movement by deploying these solutions on the same pool of hardware. When sharing hardware, it is important to implement resource management in order to help guarantee that resources are available for critical applications and processes. This session discusses various methods for managing hardware resources in a multi-application environment. Specific strategies are suggested, along with implementation suggestions.

This discussion uses SAS^® Office Analytics as an example to demonstrate the importance of preparing for the SAS^® installation. There are many nuances as well as requirements that need to be addressed before you do an installation. These requirements are basically similar, yet they differ according to the target installation operating system. In other words, there are some differences in preparation routines for Windows and *Nix flavors. Our discussion focuses on these three topics: 1. Pre-installation considerations such as sizing, storage, proper credentials, and third-party requirements; 2. Installation steps and requirements; and 3. Post-installation configuration. In addition to preparation, this paper also discusses potential issues and pitfalls to watch out for, as well as best practices.

SAS^® Visual Analytics delivers the power of approachable in-memory analytics in an intuitive web interface. The scalable technology behind SAS Visual Analytics should not benefit just the analyst or data scientist in your organization but indeed everyone regardless of their analytical background. This paper outlines a framework for the creation of a cloud deployment of SAS Visual Analytics using the SAS^® 9.4 platform. Based on proven best practices and existing customer implementations, the paper focuses on architecture, processes, and design for reliability and scalable multi-tenancy. The framework enables your organization to move away from the departmental view of the world and to offer analytical capabilities for consumerization and collaboration across the enterprise.

How does the SAS^® server architecture fit within your IT infrastructure? What functional aspects does the architecture support? This session helps attendees understand the logical server topology of the SAS technology stack: resource and process management, in-memory architecture, and in-database processing. The session also discusses process flows from data acquisition through analytical information to visual insight. IT architects, data administrators, and IT managers from all industries should leave with an understanding of how SAS has evolved to better fit into the IT enterprise and to help IT's internal customers make better decisions.

All successful organizations seek ways of communicating the identity of subject matter experts to employees. This information exists as common knowledge when an organization is first starting out, but the common knowledge becomes fragmented as the organization grows. SAS^® Text Analytics can be used on an organization's internal unstructured data to reunite these knowledge fragments. This paper demonstrates how to extract and surface this valuable information from within an organization. First, the organization's unstructured textual data are analyzed by SAS^® Enterprise Content Categorization to develop a topic taxonomy that associates subject matter with subject matter experts in the organization. Then, SAS Text Analytics can be used successfully to build powerful semantic models that enhance an organization's unstructured data. This paper shows how to use those models to process and deliver real-time information to employees, increasing the value of internal company information.

Security-conscious organizations have rigorous IT regulations, especially when company data is available on the move. This paper explores the options available to secure a deployment of SAS^® Mobile BI with SAS^® Visual Analytics. The setup ensures encrypted communication from remote mobile clients all the way to backend servers. Additionally, the integration of SAS Mobile BI with third-party Mobile Device Management (MDM) software and Virtual Private Network (VPN) technology enables you to place several layers of security and access control around your data. The paper also covers the out-of-the-box security features of the SAS Mobile BI and SAS Visual Analytics administration applications to help you close the loop on all possible areas of exploitation.

Even if you are familiar with security considerations for SAS^® BI deployments, such as metadata and file system permissions, there are additional security aspects to consider when securing any environment that includes SAS^® Visual Analytics. These include files and permissions to the grid machines in a distributed environment, permissions on the SAS^® LASR™ Analytic Servers, and interactions with existing metadata types. We approach these security aspects from the perspective of an administrator who is securing the environment for himself, a data builder, and a report consumer.

Distributing SAS^® software to a large number of machines can be challenging at best and exhausting at worst. Common areas of concern for installers are silent automation, network traffic, ease of setup, standardized configurations, maintainability, and simply the sheer amount of time it takes to make the software available to end users. We describe a variety of techniques for easing the pain of provisioning SAS software, including the new standalone SAS^® Enterprise Guide^® and SAS^® Add-in for Microsoft Office installers, as well as the tried and true SAS^® Deployment Wizard record and playback functionality. We also cover ways to shrink SAS Software Depots, like the new 'subsetting recipe' feature, in order to ease scenarios requiring depot redistribution. Finally, we touch on alternate methods for workstation access to SAS client software, including application streaming, desktop virtualization, and Java Web Start.

SAS^® platform installations are large, complex, growing, and ever-changing enterprise systems that support many diverse groups of users and content. A reliable metadata security implementation is critical for providing access to business resources in a methodical, organized, partitioned, and protected manner. With natural changes to users, groups, and folders from an organization's day-to-day activities, deviations from an original metadata security plan are very likely and can put protected resources at risk. Regular security testing can ensure compliance, but, given existing administrator commitments and the time-consuming nature of manual testing procedures, it doesn't tend to happen. This paper discusses concepts and outlines several example test specifications from an automated metadata security testing framework being developed by Metacoda. With regularly scheduled, automated testing, using a well-defined set of test rules, administrators can focus on their other work and let alerts notify them of any deviations from a metadata security test specification.

We continually work with our hardware partners to establish best practices with regard to tuning the latest hardware components that are released each year. This paper goes over the latest tuning guidelines for your hardware infrastructure, including your host computer system, operating system, and complete I/O infrastructure (from the computer host and network adapters down through the physical storage). Our findings are published in SAS^® papers on the SAS website, support.sas.com, with updates posted to the SAS Administration blog.

This paper gives you a better idea of how and where to use the record lookup functions to locate observations where a variable has some characteristic. Various related functions are illustrated to search numeric and character values in this process. Code is shown with time comparisons. I will discuss three possible ways to retrieve records: the SAS^® DATA step, PROC SQL, and Perl regular expressions. Differences in real time and CPU time will be highlighted when comparing record retrieval with these methods. Although the program is written for the PC using SAS^® 9.2 in a Windows XP 32-bit environment, all the functions are applicable to any system. All the tools discussed are in Base SAS^®. The typical attendee or reader will have some experience in SAS, but not a lot of experience dealing with large amounts of data.

When assisting SAS^® customers who are experiencing performance issues, we are often asked by the SAS users at a customer site for the top 10 guidelines to share with those who have taken on the role of system administrator or SAS administrator. This paper points you to where you can get more information regarding each of the guidelines and related details on the SAS website.

Everyone has heard about SAS^® Cloud. Now come learn how you can build and manage your own cloud using the same SAS^® virtual application (vApp) technology.

Are you a Java programmer who has been asked to work with SAS^®, or a SAS programmer who has been asked to provide an interface to your IT colleagues? Let's face it, not a lot of Java programmers are heavy SAS users. If this is the case in your company, then you are in luck because SAS provides a couple of really slick features to allow Java programmers to access both SAS data and SAS programming from within a Java program. This paper walks beginner Java or SAS programmers through the simple task of accessing SAS data and SAS programs from a Java program. All that you need is a Java environment and access to a running SAS process, such as a SAS server. This SAS server can either be a SAS/SHARE^® server or an IOM server. However, if you do not have either of these two servers, that is okay; with the tools that are provided by SAS, you can start up a remote SAS session within Java and harness the power of SAS.

Data quality is at the very heart of accurate, relevant, and trusted information, but traditional techniques that require the data to be moved, cleansed, and repopulated simply can't scale up to cover the ultra-jumbo nature of big data environments. This paper describes how SAS^® Data Quality accelerators for databases like Teradata and Hadoop deliver data quality for big data by operating in situ and in parallel on each of the nodes of these clustered environments. The paper shows how data quality operations can be easily modified to leverage these technologies. It examines the results of performance benchmarks that show how in-database operations can scale to meet the demands of any use case, no matter how big a big data mammoth you have.

The latest releases of SAS^® Data Integration Studio and SAS^® Data Management provide an integrated environment for managing and transforming your data to meet new and increasingly complex data management challenges. The enhancements help develop efficient processes that can clean, standardize, transform, master, and manage your data. The latest features include: capabilities for building complex job processes; web and tablet environments for managing your data; enhanced ELT transformation capabilities; big data transformation capabilities for Hadoop; integration with the SAS^® LASR™ platform; enhanced features for lineage tracing and impact analysis; and new features for master data and metadata management. This paper provides an overview of the latest features of the products and includes use cases and examples for leveraging product capabilities.