SAS Institute. The Power to Know

FOCUS AREAS

SAS on the Grid

Exploding data volumes, increasing demands on business, and ever-shrinking windows of time are causing "analysis paralysis" for many organizations. The result is computation-intensive and data-intensive application requirements that existing IT infrastructure cannot meet, given the way that infrastructure is currently used and managed. Because of the enormous power of SAS analytics, every type of SAS customer in every industry segment uses SAS applications for data-intensive and computation-intensive analyses. As a result, many SAS applications have a natural affinity for a grid infrastructure. A SAS workflow can run in a grid environment in several ways:
  1. by distributing SAS job components and/or multiple SAS jobs to the grid

  2. by scheduling SAS workflows made up of one or more jobs to the grid

  3. by combining approaches 1 and 2



DISTRIBUTING SAS JOBS TO THE GRID

Many SAS applications consist of replicate runs of the same fundamental task, or of many independent tasks, that can be distributed to a grid environment for parallel execution so that information and results are delivered sooner. As a SAS user, you can distribute SAS applications to a grid either through automated grid capabilities in an easy-to-use, point-and-click interface or through the syntax of the powerful SAS 4GL programming language. For example, a common workflow in ETL processing is to run the same analysis (multiple iterations) over different subsets of data, such as each state in the U.S. or each sales territory in your organization. New loop transforms in SAS Data Integration Studio 3.3 allow these iterations to be distributed automatically to a grid environment for parallel, and therefore faster, execution. A common workflow in data mining is to run multiple models (independent tasks) over the same input data source. SAS Enterprise Miner 5.2 can now automatically distribute parallel nodes in a SAS Enterprise Miner workflow to a grid environment, with similar benefits. Both SAS Data Integration Studio and SAS Enterprise Miner have built-in intelligence that automatically generates the syntax the SAS program needs to run in a grid environment when appropriate. As a result, the grid infrastructure remains transparent to you as a user of these applications: you can focus on your ETL and data mining tasks while still benefiting from better application performance. The applications generated by SAS Data Integration Studio and SAS Enterprise Miner can also be saved as SAS stored processes and subsequently used by the SAS Business Intelligence (BI) components.
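
The loop-transform idea, running the same analysis over each subset of data, can be illustrated outside SAS. The Python sketch below is only an analogy under stated assumptions: the sales records, the `partition_by_key`, `analyze`, and `run_by_group` functions, and the worker pool are all hypothetical, and threads stand in for the separate SAS sessions that would run on grid nodes. It does not show the code that SAS Data Integration Studio generates.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical input: (territory, sale_amount) records.
SALES = [
    ("NC", 120.0), ("NC", 80.0),
    ("TX", 200.0), ("TX", 100.0),
    ("CA", 50.0), ("CA", 150.0),
]

def partition_by_key(rows):
    """Split the records into one subset per territory (the loop's iteration values)."""
    groups = defaultdict(list)
    for territory, amount in rows:
        groups[territory].append(amount)
    return dict(groups)

def analyze(item):
    """The same analysis applied to every subset; each call is one independent task."""
    territory, amounts = item
    return territory, {"n": len(amounts), "total": sum(amounts), "mean": mean(amounts)}

def run_by_group(rows, max_workers=4):
    """Run one task per subset concurrently and collect the results by territory."""
    groups = partition_by_key(rows)
    # Threads stand in for grid nodes in this sketch; on a real grid each
    # iteration would be a separate SAS session on a separate machine.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(analyze, groups.items()))

if __name__ == "__main__":
    for territory, stats in sorted(run_by_group(SALES).items()):
        print(territory, stats)
```

The same shape covers the data mining case: replace the subsets with a list of hypothetical model-fitting functions applied to one shared input, and each model becomes an independent task.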

In addition to the automated grid capabilities of the SAS data integration, data mining, and business intelligence components, you can also use the SAS programming language to develop applications that run in a grid environment. SAS programmers can use the same syntax that the SAS solutions generate behind the scenes to identify subtasks of a long-running application and distribute them across the grid. The syntax can also be wrapped around each user's job as a whole to load-balance SAS jobs from many users across a virtualized pool of resources. Refer to "Key Definition for Automatic Program Submission to the Grid" for an example of how this is done.
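
Wrapping whole jobs for load balancing comes down to deciding which resource each incoming job should run on. As a rough illustration only, the sketch below uses a greedy least-loaded rule; the node names, job names, and cost estimates are hypothetical, and this is not the placement policy that Platform Computing's software actually applies, which weighs many more factors.

```python
import heapq

def assign_jobs(jobs, nodes):
    """Place each job, in arrival order, on the node with the smallest current load.

    jobs:  list of (job_name, estimated_cost) pairs, in hypothetical cost units
    nodes: list of node names in the shared pool
    Returns (placement, loads): job -> node, and node -> total assigned cost.
    """
    # Heap entries are (current_load, index, node); the index breaks ties deterministically.
    heap = [(0.0, i, node) for i, node in enumerate(nodes)]
    heapq.heapify(heap)
    placement = {}
    for job, cost in jobs:
        load, i, node = heapq.heappop(heap)  # least-loaded node right now
        placement[job] = node
        heapq.heappush(heap, (load + cost, i, node))
    loads = {node: load for load, _, node in heap}
    return placement, loads

if __name__ == "__main__":
    # Hypothetical jobs submitted by three users to a two-node pool.
    user_jobs = [("alice_report", 5.0), ("bob_model", 3.0), ("carol_etl", 2.0)]
    placement, loads = assign_jobs(user_jobs, ["node1", "node2"])
    print(placement)  # {'alice_report': 'node1', 'bob_model': 'node2', 'carol_etl': 'node2'}
    print(loads)
```

Jobs are placed in arrival order rather than sorted by cost, because in a multi-user pool jobs arrive over time and the scheduler cannot wait to see the whole batch.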

SAS/CONNECT provides the syntax that enables distribution of SAS job components. The parallel processing capabilities of SAS/CONNECT have been integrated with components from Platform Computing to provide efficient distribution of the workload to grid resources, efficient management of those resources, and run-time monitoring of the SAS grid environment. Refer to "Syntax for SAS/CONNECT Grid Functions" for the complete grid syntax.


SCHEDULING SAS WORKFLOWS TO THE GRID

The SAS scheduling interface can be used to schedule a SAS workflow. It is built directly into a number of SAS products and solutions, including SAS Data Integration Studio, SAS Web Report Studio, SAS Marketing Automation, and SAS Marketing Optimization. In addition, any SAS program created by a SAS solution or a SAS programmer can be scheduled using the Schedule Manager plug-in in SAS Management Console.

Using the SAS scheduling integration with Platform Computing, you can schedule a flow to run when a trigger event occurs. A trigger can be a specific date and time, a file event, or a recurring event. A flow is typically made up of multiple jobs, two or more of which can be executed simultaneously. In this case, when the flow is triggered, its parallel jobs are distributed to resources in the grid environment. Even flows that contain only a single job, or jobs that must run serially, benefit when many users schedule them: each flow executes on the most appropriate resource, and the total multi-user workload is balanced across the grid.

The SAS grid capabilities are enabled by a new product called SAS Grid Manager. One of the components of SAS Grid Manager is the Platform Suite for SAS, which itself has three components.
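
When a flow contains jobs that can run simultaneously, the scheduler has to work out which jobs are ready at each point. One simple way to picture this is to group the flow's jobs into stages, where every job in a stage depends only on jobs in earlier stages. The sketch below assumes a hypothetical five-job flow expressed as a dependency map, and the `parallel_stages` function is illustrative only; it is not the algorithm used by the SAS or Platform Computing scheduler, and a real scheduler can also start a job as soon as its own predecessors finish rather than waiting for a whole stage.

```python
def parallel_stages(deps):
    """Group jobs into stages whose members can run concurrently.

    deps maps each job to the set of jobs that must finish before it starts.
    Jobs that appear only as prerequisites are treated as having no dependencies.
    Raises ValueError if the flow contains a dependency cycle.
    """
    jobs = set(deps)
    for prereqs in deps.values():
        jobs |= set(prereqs)
    remaining = {job: set(deps.get(job, ())) for job in jobs}
    stages = []
    while remaining:
        # Every job whose prerequisites have all completed can start now.
        ready = sorted(job for job, prereqs in remaining.items() if not prereqs)
        if not ready:
            raise ValueError("flow has a dependency cycle")
        stages.append(ready)
        for job in ready:
            del remaining[job]
        # The ready jobs are now done, so drop them from everyone else's prerequisites.
        for prereqs in remaining.values():
            prereqs.difference_update(ready)
    return stages

if __name__ == "__main__":
    # Hypothetical nightly flow: two extracts, a join, then scoring and reporting.
    flow = {
        "join_tables": {"extract_customers", "extract_orders"},
        "score_customers": {"join_tables"},
        "build_report": {"join_tables"},
    }
    for number, stage in enumerate(parallel_stages(flow), start=1):
        print(number, stage)
```

The trigger (a date and time, a file event, or a recurring schedule) decides when the first stage starts; the stage structure decides how much of the flow can be spread across the grid at once.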




COMBINING SCHEDULING AND DISTRIBUTION OF SAS JOBS TO THE GRID

Although the interfaces for scheduling a SAS workflow to the grid and for distributing the subtasks of a SAS job to the grid differ, the two can be used together for maximum flexibility and performance. A flow can contain a job, either generated by a SAS solution or written by a programmer, that uses the appropriate SAS/CONNECT syntax to identify parallel subtasks for grid execution. When the flow is triggered, LSF for SAS directs that job to a specific grid node; as the job executes, it creates parallel subtasks that run across the remaining grid resources. LSF for SAS also handles the mapping of those parallel subtasks to grid resources.
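
The two levels described above, a scheduler placing whole jobs and a job that fans out its own subtasks, can be modeled with two separate worker pools. In the Python sketch below, which is only an analogy, the outer pool stands in for flow-level dispatch and the inner pool for the SAS/CONNECT subtasks. The job names, segments, and the `score_segment`, `fan_out_job`, and `run_flow` functions are hypothetical; in a SAS grid, LSF for SAS (not Python threads) would place both kinds of work.

```python
from concurrent.futures import ThreadPoolExecutor

def score_segment(segment):
    """Hypothetical subtask: one parallel piece of a larger job."""
    return segment, len(segment)

def fan_out_job(segments, max_workers=4):
    """A job that splits its own work into parallel subtasks (the inner level)."""
    # A separate pool keeps subtasks from competing for the flow's worker slots,
    # so an outer job cannot block forever waiting on its own subtasks.
    with ThreadPoolExecutor(max_workers=max_workers) as sub_pool:
        return dict(sub_pool.map(score_segment, segments))

def run_flow(stages, jobs, max_workers=2):
    """Run each stage's jobs concurrently (the outer level); stages run in order."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as flow_pool:
        for stage in stages:
            futures = {name: flow_pool.submit(jobs[name]) for name in stage}
            for name, future in futures.items():
                results[name] = future.result()  # wait for the whole stage to finish
    return results

if __name__ == "__main__":
    jobs = {
        "extract": lambda: "extracted",
        "score": lambda: fan_out_job(["east", "west", "north"]),
        "report": lambda: "report written",
    }
    print(run_flow([["extract"], ["score"], ["report"]], jobs))
```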



We at SAS have created the Scalability Community to make you aware of the connectivity and scalability features and enhancements that you can leverage for your SAS installation. The success of this community depends on you. Send electronic mail to scalability@sas.com with your comments, requirements, and suggestions.