Loading Indexes in Parallel

A significant strength of SPD Server is efficient creation, maintenance, and use of table indexes. Indexing can greatly speed the evaluation of WHERE clause queries. The index can also be a source of sort order when performing BY clause processing. The index is also used directly by some SAS applications. For example, PROC SQL uses indexes to efficiently evaluate equijoins.

Parallel Index Creation

Parallel Index Creation Example

Parallel Index Updates

Parallel Index Creation

SPD Server supports parallel index creation using asynchronous index options. To enable asynchronous parallel index creation, either submit the SPDSIASY=YES macro variable before creating an index in SAS, or use the ASYNCINDEX=YES table option.

Both the macro variable and the table option apply to the DATA step INDEX= processing as well as to PROC DATASETS INDEX CREATE commands. Either method allows all of the declared indexes to be populated with a single scan of the table. A single scan is a substantial improvement over making multiple passes through the data to build each index serially.

As always, there is a price for parallelism. To create multiple indexes requires enough WORKPATH= disk space to create all of the key sorts at the same time. The PROC DATASETS structure has the flexibility to allow batched parallel index creation by using multiple MODIFY groups. The Parallel Index Creation example below inserts INDEX CREATE statements between two successive MODIFY statements resulting in a parallel creation group.

Parallel Index Creation Example

  DATA foo.patient_info;
     length
       last_name $10
       first_name $20
       patient_class $2
       patient_sex $1;

     patient_no=10;
       last_name="Doe";
       first_name="John";
       patient_class="XY";
       patient_age=33;
       patient_sex="M";

  run;

  %let spdsiasy=YES;
  PROC DATASETS lib=foo;
     modify patient_info;
        index create
          patient_no
          patient_class;
     modify patient_info;
        index create
          last_name
          first_name;
     modify patient_info;
        index create
          whole_name=(last_name first_name)
          class_sex=(patient_class patient_sex);
  quit;

Indexes for PATIENT_NO and PATIENT_CLASS are created in parallel, indexes for LAST_NAME and FIRST_NAME are created in parallel, and indexes for WHOLE_NAME and CLASS_SEX are created in parallel.

Parallel Index Updates

SPD Server also supports parallel index updates during table append operations. Multiple threads enable overlap of data transfer to the proxy, as well as updates of the data store and index files. SPD Server decomposes table append operations into a set of steps that can be performed in parallel. The level of parallelism attained depends on the number of indexes that are present on the table. The more indexes you have, the greater the exploitation of parallelism during the append processing. As with parallel index creation, parallel index updates use WORKPATH= disk space for the key sorts that are part of the index append processing.