Joins
are a common and resource-intensive part of SAS Data Integration Studio.
SAS SQL implements several well-known join algorithms: sort-merge,
index, and hash. You can use common techniques to aid join performance,
irrespective of the algorithm that you choose. Conditions often cause
the SAS SQL optimizer to choose the sort-merge algorithm; techniques
that improve sort performance also improve sort-merge join performance.
However, understanding and leveraging index and hash joins enhance
performance.
You might
often perform lookups between tables in SAS Data Integration Studio.
Based on key values in one table, you look up matching keys in a second
table and retrieve associated data in the second table. SQL joins
can perform lookups. However, SAS and SAS Data Integration Studio
provide special lookup mechanisms that typically outperform a join.
The problems associated with joins are similar to the problems with
sorting:
-
Join performance seems slow.
-
You have trouble influencing the
join algorithm that SAS SQL chooses.
-
You experience higher than expected
disk space consumption.
-
You have trouble operating SAS
SQL joins with RDBMS data.