WHERE processing enables you to conditionally select a subset of rows, so that the software processes
only the rows that meet specified conditions. To optimize the performance of WHERE
processing, you can request that data subsetting be performed in the Hadoop cluster.
The data can be stored in an SPD Server dynamic cluster table. Then, when you submit
SPD Server code that includes a WHERE expression (which defines the condition that
selected rows must satisfy), SPD Server instantiates the WHERE expression as a Java
class and submits that class to the Hadoop cluster as a component of a MapReduce program.
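
The idea of a WHERE expression instantiated as a Java class and applied to rows inside the cluster can be sketched as follows. This is only an illustration under stated assumptions: the names WherePushdownSketch, Row, WhereAgeOver40InNC, and filterOnCluster are hypothetical, the sample WHERE condition (age > 40 and state = 'NC') is invented for the example, and no Hadoop API is used. The class that SPD Server actually generates and the MapReduce program that it submits are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class WherePushdownSketch {

    // Hypothetical row layout, used only for this illustration.
    static final class Row {
        final String name;
        final int age;
        final String state;

        Row(String name, int age, String state) {
            this.name = name;
            this.age = age;
            this.state = state;
        }
    }

    // Hypothetical stand-in for a generated class that represents
    // the expression: WHERE age > 40 AND state = 'NC'
    static final class WhereAgeOver40InNC implements Predicate<Row> {
        @Override
        public boolean test(Row row) {
            return row.age > 40 && "NC".equals(row.state);
        }
    }

    // Mimics a map-side filter step: rows are tested where the data
    // lives, and only the qualifying rows are emitted.
    static List<Row> filterOnCluster(List<Row> rows, Predicate<Row> where) {
        List<Row> qualifying = new ArrayList<>();
        for (Row row : rows) {
            if (where.test(row)) {
                qualifying.add(row);
            }
        }
        return qualifying;
    }

    public static void main(String[] args) {
        List<Row> table = List.of(
            new Row("Ana", 52, "NC"),
            new Row("Ben", 35, "NC"),
            new Row("Cho", 61, "VA"),
            new Row("Dee", 44, "NC"));

        // Only the subset would travel back to the client.
        List<Row> subset = filterOnCluster(table, new WhereAgeOver40InNC());
        System.out.println(subset.size() + " of " + table.size()
            + " rows returned to the client");
    }
}
```

In this sketch, the predicate class plays the role of the WHERE expression, and filterOnCluster plays the role of the filtering work that would run in the cluster, so that the client receives only the rows that qualify.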
Requesting that data subsetting be performed in the Hadoop cluster might improve
performance by taking advantage of the filtering and ordering capabilities of the
MapReduce framework. Because the rows are filtered in the cluster, only the qualifying
subset of the data is returned to the SPD Server client. Performance is often improved
for large tables when the WHERE expression qualifies only a relatively small subset of
the rows.