Using SAS Data Integration Studio with a SAS Grid

Scheduling SAS Data Integration Studio Jobs on a Grid

You can schedule jobs from within SAS Data Integration Studio and have those jobs run on the grid. You deploy the job for scheduling in SAS Data Integration Studio, and then use the Schedule Manager plug-in in SAS Management Console to specify the schedule and the scheduling server. For more information, see Scheduling Jobs on a Grid. Also see Scheduling in SAS.

Multi-User Workload Balancing with SAS Data Integration Studio

SAS Data Integration Studio 4.2 enables users to submit jobs directly to a grid. Submitted jobs can then take advantage of the load balancing and job prioritization that you have configured for the grid. SAS Data Integration Studio also enables you to specify the workload that submitted jobs should use, so that users can submit jobs to the correct grid partition for their work.
To submit a job to the grid, select the SAS Grid Server component in the Server menu on the Job Editor toolbar. Click Submit in the toolbar to submit the job to the grid.
Figure: Submitting a Job to the Grid
To specify a workload value for the server, follow these steps:
  1. On the SAS Data Integration Studio menu bar, select Tools, then select Options, and then select the SAS Server tab in the Options dialog box.
  2. Select the SAS grid server in the Server field.
  3. Select the workload to use for the submitted jobs in the Grid workload specification field.
    Figure: Selecting the Workload
SAS Grid Manager uses the workload value to send the submitted job to the appropriate grid partition. For information about the other required steps, see Defining and Specifying Resources.
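To make routing by workload more concrete, here is a minimal Python sketch of the decision described above. It is a simplified model, not SAS Grid Manager's implementation: the `Host` class, the `route_job` function, and the host names are hypothetical, and real job placement is handled by SAS Grid Manager and the grid's scheduler. The sketch assumes that each host advertises a set of workload resource names and that, within a partition, a job goes to the host with the fewest running jobs.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    resources: set[str]    # workload resource names assigned to this host (hypothetical model)
    running_jobs: int = 0  # current load on the host


def route_job(hosts: list[Host], workload: str) -> Host:
    """Pick the least-loaded host that offers the requested workload resource.

    Hypothetical model only: the real decision is made by SAS Grid Manager
    and the underlying grid scheduler, not by this function.
    """
    # Partitioning: only hosts that offer this workload are eligible (exact, case-sensitive match).
    eligible = [h for h in hosts if workload in h.resources]
    if not eligible:
        raise LookupError(f"no grid host offers workload {workload!r}")
    # Load balancing within the partition: choose the host with the fewest running jobs.
    chosen = min(eligible, key=lambda h: h.running_jobs)
    chosen.running_jobs += 1
    return chosen


grid = [
    Host("node1", {"DI"}),
    Host("node2", {"DI", "EG"}),
    Host("node3", {"EG"}),
]

print(route_job(grid, "DI").name)  # node1 (both DI hosts idle; min() keeps the first)
print(route_job(grid, "DI").name)  # node2 (node1 now has one job)
print(route_job(grid, "EG").name)  # node3 (node2 is busy, node3 is idle)
```

Filtering by workload first and balancing load second mirrors the two capabilities described earlier: the workload value selects the partition, and load balancing selects a node within it.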

Parallel Workload Balancing with SAS Data Integration Studio

A common workflow in applications created by SAS Data Integration Studio is to repeatedly execute the same analysis against different subsets of data. Rather than running the process against each table in sequence, you can use a SAS grid environment to run the same process in parallel against each source table, with the processes distributed among the grid nodes. For this workflow, you can use the Loop and Loop-End transformations in SAS Data Integration Studio to automatically generate a SAS application that runs each iteration of the loop as a separate process on the grid, using SAS Grid Manager.
Figure: Loop and Loop-End Transformation Nodes
To specify options for loop processing, open the Loop Properties window and select the Loop Options tab. You can specify the workload for the job, as well as how many processes can be active at once.
Figure: Loop Properties Dialog Box
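The Loop pattern behaves like a bounded parallel map: each iteration is independent, so iterations can run concurrently, and the option that limits how many processes can be active at once caps the concurrency. The following Python sketch models that behavior on a single machine, with a thread pool standing in for grid nodes. The names `analyze`, `run_loop`, and `ConcurrencyTracker` and the table names are hypothetical; the application that SAS Data Integration Studio actually generates is SAS code that runs iterations on the grid through SAS Grid Manager.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records how many loop iterations are running at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def enter(self) -> None:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def leave(self) -> None:
        with self._lock:
            self.active -= 1


def analyze(table: str, tracker: ConcurrencyTracker) -> int:
    """Hypothetical loop body: the same analysis run against one source table."""
    tracker.enter()
    try:
        time.sleep(0.05)   # simulate work that would run on a grid node
        return len(table)  # placeholder "result" for the table
    finally:
        tracker.leave()


def run_loop(tables: list[str], max_concurrent: int) -> tuple[dict[str, int], int]:
    """Run one independent iteration per table, never more than max_concurrent at once."""
    tracker = ConcurrencyTracker()
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        values = pool.map(lambda t: analyze(t, tracker), tables)
        results = dict(zip(tables, values))  # map() preserves input order
    return results, tracker.peak


results, peak = run_loop(["sales_east", "sales_west", "sales_north", "sales_south"], max_concurrent=2)
print(results)
print("peak concurrent iterations:", peak)  # never exceeds 2
```

Raising `max_concurrent` lets more iterations overlap, at the cost of more simultaneous load; on a real grid that trade-off also depends on how many nodes serve the chosen workload.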
For more information, see SAS Data Integration Studio: User's Guide.

Updating SAS Grid Server Definitions for Partitioning

After defining resource names, you can update the grid server metadata so that SAS Data Integration Studio knows the available resource names. To update the definitions, follow these steps:
  1. In SAS Management Console, open the Server Manager plug-in and locate the logical server definition.
  2. Expand the logical Grid Server node and select the Grid Server node. Select Properties from the pop-up menu or the File menu.
  3. In the Properties window, select the Options tab.
  4. Specify the workload resource name (for example, DI) in the Workload field.
  5. Save and close the definition.
  6. Repeat this process for all workloads.

Specifying Workload for the Loop Transformation

A SAS Data Integration Studio user performs these steps to specify an LSF resource in the properties for a Loop Transformation in a SAS Data Integration Studio job. When the job is submitted for execution, it is submitted to one or more grid nodes that are associated with the resource.
These steps assume that the default SAS application server for SAS Data Integration Studio has a logical SAS Grid Server component whose definition was updated with the workload names in the metadata repository. For more information, see Defining and Specifying Resources.
  1. In SAS Data Integration Studio, open the job that contains the Loop Transformation to be updated.
  2. In the Process Designer window, right-click the metadata object for the Loop Transformation and select Properties.
  3. In the Properties window, click the Loop Options tab.
  4. On the Loop Options tab, in the Grid workload specification text box, enter the name of the desired workload, such as DI. The entry is case sensitive.
  5. Click OK to save your changes and close the Properties window.
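Because the workload entry is case sensitive, a name that differs only in letter case from a defined workload (for example, `di` instead of `DI`) does not match it. This small Python sketch models that check against the workload names entered in the grid server definition. The `check_workload` function and the example workload list are hypothetical and are not part of SAS Data Integration Studio.

```python
def check_workload(entered: str, defined: list[str]) -> str:
    """Validate a Loop Transformation workload entry against the defined workloads.

    Hypothetical helper: models the rule that the entry is case sensitive,
    so "di" does not match a workload defined as "DI".
    """
    if entered in defined:  # exact, case-sensitive comparison
        return entered
    # Give a more helpful message when the only difference is letter case.
    near = [w for w in defined if w.lower() == entered.lower()]
    if near:
        raise ValueError(
            f"workload {entered!r} is not defined; did you mean {near[0]!r}? "
            "(workload names are case sensitive)"
        )
    raise ValueError(f"workload {entered!r} is not defined; expected one of {defined}")


# Example values for names entered in the Workload field of grid server definitions.
server_workloads = ["DI", "EG"]
print(check_workload("DI", server_workloads))  # DI
```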