Many SAS applications consist of repeated runs of the same fundamental task, or of many independent tasks, that can be distributed to a grid environment for parallel execution so that results are delivered faster. As a SAS user, you can distribute SAS applications to a grid either through automated grid capabilities in an easy-to-use, point-and-click interface or by using the syntax of the SAS programming language, a powerful 4GL.
For example, a common workflow in ETL processing runs the same analysis (multiple iterations) over different subsets of data, such as each state in the U.S. or each sales territory in your organization. New loop transforms in SAS Data Integration Studio 3.3 allow these iterations to be distributed automatically to a grid environment for parallel, and therefore faster, execution. Similarly, a common workflow in data mining runs multiple models (independent tasks) over the same input data source, and SAS Enterprise Miner 5.2 can now automatically distribute parallel nodes in a SAS Enterprise Miner workflow to a grid environment with the same benefit. Both SAS Data Integration Studio and SAS Enterprise Miner have built-in intelligence that automatically generates the syntax the SAS program needs to run in a grid environment when appropriate. As a result, the grid infrastructure remains transparent to users of these applications, who can focus on their ETL and data mining tasks while still benefiting from better performance. The applications generated by SAS Data Integration Studio and SAS Enterprise Miner can also be saved as SAS stored processes and subsequently used by the SAS Business Intelligence (BI) components.
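The loop pattern described above amounts to a fan-out over data subsets followed by a gather step. The Python below is only a language-neutral illustration of that pattern, not SAS/CONNECT or SAS Data Integration Studio syntax: a thread pool stands in for the grid nodes, and the names `partition_by`, `analyze_subset`, and `run_loop_in_parallel` are hypothetical.

```python
"""Sketch of the "same analysis over each subset" pattern (illustrative only).

Assumptions (not SAS APIs): a ThreadPoolExecutor stands in for the pool of
grid nodes, and analyze_subset is a hypothetical placeholder for the
per-subset analysis, such as a summary for one state or sales territory.
"""
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by(rows, key):
    """Group rows into subsets keyed by one column, like a BY-group split."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[key]].append(row)
    return dict(subsets)


def analyze_subset(name, rows):
    """Hypothetical per-subset analysis: count, total, and mean of sales."""
    total = sum(r["sales"] for r in rows)
    return name, {"n": len(rows), "total": total, "mean": total / len(rows)}


def run_loop_in_parallel(rows, key, max_workers=4):
    """Submit one independent task per subset, then gather every result."""
    subsets = partition_by(rows, key)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(analyze_subset, name, part)
                   for name, part in subsets.items()]
        # Waiting on every future plays the role of the loop's end step,
        # which collects the outputs of all iterations.
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    sales = [
        {"state": "NC", "sales": 120.0},
        {"state": "NC", "sales": 80.0},
        {"state": "TX", "sales": 200.0},
        {"state": "CA", "sales": 50.0},
        {"state": "CA", "sales": 150.0},
    ]
    for state, stats in sorted(run_loop_in_parallel(sales, "state").items()):
        print(state, stats)
```

Because each subset is processed independently, more workers can shorten elapsed time, bounded below by the cost of the largest single subset. In CPython, threads do not speed up CPU-bound work, so a real speedup would need separate processes or, as in the text, separate grid nodes.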
In addition to the automated grid capabilities of the SAS data integration, data mining, and business intelligence components, you also have the flexibility to develop applications in the SAS programming language that run in a grid environment. SAS programmers can use the same syntax that the SAS solutions generate behind the scenes to identify subtasks of a long-running application and distribute them across the grid. This syntax can also serve as a wrapper around each user's job as a whole, load-balancing SAS jobs submitted by many users who share a virtualized pool of resources. Refer to Key Definition for Automatic Program Submission to the Grid for an example of how this is done.
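Load-balancing whole jobs across a shared pool can be illustrated with a simple greedy rule that sends each incoming job to the node with the least queued work. This Python sketch is a deliberate simplification: the function `assign_jobs`, the job names, the cost estimates, and the least-loaded rule are all hypothetical, and the actual grid scheduler applies its own, richer policies.

```python
"""Sketch of load-balancing whole jobs from many users across a shared pool.

Assumptions: node names, job costs, and the least-loaded policy below are
hypothetical simplifications, not the behavior of any real grid scheduler.
"""
import heapq


def assign_jobs(jobs, nodes):
    """Greedily send each job to the node with the least work queued so far.

    jobs  -- list of (job_id, estimated_cost) tuples, in submission order
    nodes -- list of node names forming the virtualized resource pool
    Returns a dict mapping job_id -> node name.
    """
    # Heap entries are (queued_cost, node_name): the smallest load pops first,
    # and ties are broken by node name.
    load = [(0.0, node) for node in nodes]
    heapq.heapify(load)
    placement = {}
    for job_id, cost in jobs:
        queued, node = heapq.heappop(load)
        placement[job_id] = node
        heapq.heappush(load, (queued + cost, node))
    return placement


if __name__ == "__main__":
    submitted = [("alice_etl", 5.0), ("bob_report", 1.0),
                 ("carol_model", 3.0), ("dave_query", 1.0)]
    print(assign_jobs(submitted, ["node1", "node2"]))
```

The wrapper idea in the text corresponds to treating each user's job as one opaque unit here; nothing inside the job needs to change for it to be placed on a less busy node.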
SAS/CONNECT provides the syntax for distributing the components of a SAS job. The parallel processing capabilities of SAS/CONNECT have been integrated with components from Platform Computing to provide efficient distribution of workload to grid resources, efficient management of those resources, and run-time monitoring of the SAS grid environment. Refer to Syntax for SAS/CONNECT Grid Functions for the complete grid syntax.
Although the interfaces for scheduling a SAS workflow to the grid and for distributing the subtasks of a SAS job to the grid differ, the two can be used together for maximum flexibility and performance. A flow could contain a job that was either generated or hand-written with the appropriate SAS/CONNECT syntax to identify parallel subtasks for grid execution. When the flow is triggered, LSF for SAS would direct that job to a specific grid node, and as the job executes, it would create parallel tasks to run across the remaining grid resources. LSF for SAS would also handle the mapping of those parallel subtasks to grid resources.
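The two-level arrangement just described can be viewed as two placement decisions: one node for the job itself, then the job's subtasks spread over the remaining nodes. In this hypothetical Python sketch, `schedule_flow_job`, the first-node choice, and the round-robin spread are illustrative assumptions standing in for decisions that LSF for SAS makes; no scheduler is actually called.

```python
"""Hypothetical two-level placement: a flow's job, then that job's subtasks.

The node list, the choice of the first node for the job, and the round-robin
spread of subtasks are illustrative assumptions; in a SAS grid these
decisions belong to LSF for SAS.
"""
from itertools import cycle


def schedule_flow_job(nodes, subtasks):
    """Place the job on one node and its parallel subtasks on the others.

    Returns (job_node, placement), where placement maps subtask -> node.
    """
    if not nodes:
        raise ValueError("the grid needs at least one node")
    job_node, remaining = nodes[0], nodes[1:]
    # If no other nodes are available, subtasks fall back to the job's node.
    targets = remaining or [job_node]
    placement = dict(zip(subtasks, cycle(targets)))
    return job_node, placement


if __name__ == "__main__":
    job_node, placement = schedule_flow_job(
        ["node1", "node2", "node3"], ["east", "west", "north"])
    print("job runs on", job_node)
    for subtask, node in placement.items():
        print(subtask, "->", node)
```

Keeping the two decisions separate mirrors the text: the flow scheduler only needs to know where the job starts, while the job's own parallel syntax determines how many subtasks compete for the remaining nodes.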
We at SAS have created the Scalability Community to make you aware of the connectivity and scalability features and enhancements that you can leverage for your SAS installation. The success of this community depends on you. Send electronic mail to scalability@sas.com with your comments, requirements, and suggestions.