The parallel range join method uses a join index to determine the
ranges of rows between the
tables that can be joined in parallel. The parallel range join method
requires you to create a join index on the columns to be joined in
the tables that you want to merge. The join index divides the two
tables into a specified number of near-equal parts, or ranges, based
on matching values between the join columns. The Parallel Join facility
recognizes the ranges of rows that contain matching values between
the join columns, and then uses concurrent join threads to join the
rows in parallel. The SPD Server parallel sort then sorts the rows
within a range.
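
The range-based approach described above can be sketched in Python. This is only an illustration of the idea, not SPD Server code: the function names (build_range_bounds, join_range, parallel_range_join), the (key, value) row layout, and the use of a thread pool are all assumptions made for the example. The sketch derives near-equal ranges from the key values that match in both tables, joins each range in its own thread, and sorts the rows within each range.

```python
from concurrent.futures import ThreadPoolExecutor


def build_range_bounds(left_keys, right_keys, parts):
    """Split the keys that match in both tables into near-equal ranges.

    Hypothetical stand-in for a join index: returns inclusive
    (low_key, high_key) pairs, one per range.
    """
    matching = sorted(set(left_keys) & set(right_keys))
    if not matching:
        return []
    parts = min(parts, len(matching))
    size, extra = divmod(len(matching), parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` ranges take one additional key each.
        end = start + size + (1 if i < extra else 0)
        bounds.append((matching[start], matching[end - 1]))
        start = end
    return bounds


def join_range(left, right, lo, hi):
    """Join the rows whose key falls in [lo, hi], then sort that range."""
    right_by_key = {}
    for key, value in right:
        if lo <= key <= hi:
            right_by_key.setdefault(key, []).append(value)
    joined = [
        (key, lval, rval)
        for key, lval in left
        if lo <= key <= hi
        for rval in right_by_key.get(key, [])
    ]
    return sorted(joined)


def parallel_range_join(left, right, parts=2):
    """Join two lists of (key, value) rows, one worker thread per range."""
    bounds = build_range_bounds(
        [key for key, _ in left], [key for key, _ in right], parts
    )
    with ThreadPoolExecutor(max_workers=max(1, len(bounds))) as pool:
        chunks = pool.map(lambda b: join_range(left, right, *b), bounds)
        # Ranges are disjoint and ordered, so concatenation keeps key order.
        return [row for chunk in chunks for row in chunk]
```

Because the ranges do not overlap, each thread works on its own slice of key values and the per-range results can simply be concatenated.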

The parallel range join method can be performed only on tables that
are in the same domain.
If either of the two tables is updated after the join index is created,
the join index must be rebuilt before the parallel range join method
can be used. The parallel range join method performs best when the
columns of the tables that are being joined are sorted. If the columns
are not relatively sorted, then the concurrent join threads can cause
processor thrashing. Processor thrashing occurs when unsorted rows
in a table force SPD Server to perform increasingly large table
row scans, which can consume processor resources at a high rate during
concurrent join operations.

More detailed information about creating join indexes is available in
Chapter 17, "SAS Scalable
Performance Data (SPD) Server Index Utility Ixutil," of the
SAS Scalable Performance Data (SPD) Server 4.5: Administrator's Guide.

How does the SPD Server Parallel Join facility choose between the
parallel sort-merge method and the parallel range join method?
If a join index is available for the tables to
be joined, the Parallel Join facility chooses the parallel range join
method. If a join index does not exist, or if the join index has not
been rebuilt since a table was updated, the Parallel Join facility
defaults to using the parallel sort-merge method.
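
The selection rule above can be expressed as a short decision function. The following Python sketch is illustrative only: the Table and JoinIndex classes, their timestamp fields, and the choose_join_method name are hypothetical and do not correspond to any SPD Server interface. Falling back to sort-merge when the tables are in different domains is also an assumption, based on the same-domain requirement of the parallel range join method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Table:
    name: str
    domain: str
    updated_at: float  # hypothetical time of the last table update


@dataclass
class JoinIndex:
    built_at: float  # hypothetical time of the last index build


def choose_join_method(left: Table, right: Table,
                       index: Optional[JoinIndex]) -> str:
    """Return "range" or "sort-merge" following the rule in the text."""
    if index is None:
        # No join index exists for the two tables.
        return "sort-merge"
    if left.domain != right.domain:
        # Assumption: a range join needs both tables in one domain.
        return "sort-merge"
    if max(left.updated_at, right.updated_at) > index.built_at:
        # A table changed after the index was built, so it is stale.
        return "sort-merge"
    return "range"
```

For example, two tables in the same domain with an index built after their last updates would select "range", while updating either table afterward would select "sort-merge" until the index is rebuilt.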