When using conventional processing to access data inside a Greenplum
database, SAS Enterprise Miner asks the SAS/ACCESS engine for all rows
of the table being processed. The SAS/ACCESS engine generates an SQL
SELECT * statement that is passed to the Greenplum database. That
SELECT statement fetches all the rows in the table, and the SAS/ACCESS
engine returns them to SAS Enterprise Miner. As the number of rows in
the table grows over time, data transfer time grows because more data
must be moved across the network from the Greenplum database to the
SAS scoring process.

The SAS Scoring Accelerator for Greenplum embeds the robustness of SAS
Enterprise Miner scoring models directly in the highly scalable
Greenplum database. By using the SAS In-Database technology and the
SAS Scoring Accelerator for Greenplum, scoring is performed inside the
database, so the table rows do not have to be transferred to SAS.
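
The difference in data movement can be illustrated with a minimal
sketch. It is an analogy only: an in-memory SQLite database stands in
for Greenplum, and the `customers` table, the `score` function, and the
1000.0 threshold are hypothetical names and values chosen for
illustration, not part of the SAS Scoring Accelerator.

```python
import sqlite3


def score(income, debt):
    # Hypothetical scoring rule; a real model would be developed in
    # SAS Enterprise Miner and published to the database.
    return 0.7 * income - 0.3 * debt


conn = sqlite3.connect(":memory:")  # stand-in for the Greenplum database
conn.execute("CREATE TABLE customers (id INTEGER, income REAL, debt REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(i, 1000.0 + i, 100.0 * (i % 5)) for i in range(1000)],
)

# Conventional processing: SELECT * pulls every row out of the database,
# and scoring happens on the client side.
rows = conn.execute("SELECT * FROM customers").fetchall()
high_client = sum(1 for _id, income, debt in rows if score(income, debt) > 1000.0)
rows_moved_conventional = len(rows)

# In-database processing: the scoring logic is registered with the
# database, so only the result leaves it.
conn.create_function("score", 2, score)
result = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE score(income, debt) > 1000.0"
).fetchall()
high_in_db = result[0][0]
rows_moved_in_db = len(result)

print(rows_moved_conventional, rows_moved_in_db)
```

Both approaches produce the same count, but the conventional path
moves every table row to the client while the in-database path returns
a single row. In this toy setup the gap widens as the table grows.
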

The SAS Scoring Accelerator for Greenplum takes the models that are
developed by SAS Enterprise Miner and translates them into scoring
functions that can be deployed inside Greenplum. After the scoring
functions are published, the functions extend the Greenplum SQL
language and can be used in SQL statements like other Greenplum
functions.
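
As a rough illustration of what "used in SQL statements like other
functions" means, the sketch below registers a user-defined function
and then combines it with a built-in function in the SELECT, WHERE, and
ORDER BY clauses. SQLite is used here only as a stand-in so that the
example is self-contained. The `churn_score` function, its formula,
and the `accounts` table are hypothetical, and real published scoring
functions are compiled from generated C source rather than written in
Python.

```python
import sqlite3


def churn_score(tenure_months, complaints):
    # Hypothetical scoring logic, clamped to the range [0, 1].
    raw = 0.1 * complaints - 0.02 * tenure_months
    return max(0.0, min(1.0, 0.5 + raw))


conn = sqlite3.connect(":memory:")

# "Publishing": after registration the function is callable from SQL.
conn.create_function("churn_score", 2, churn_score)

conn.execute(
    "CREATE TABLE accounts (id INTEGER, tenure_months INTEGER, complaints INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, 48, 0), (2, 3, 4), (3, 12, 1), (4, 1, 6)],
)

# The scoring function is nested inside the built-in ROUND function and
# reused in the WHERE clause, like any other SQL function.
query = """
    SELECT id, ROUND(churn_score(tenure_months, complaints), 2) AS p_churn
    FROM accounts
    WHERE churn_score(tenure_months, complaints) > 0.5
    ORDER BY p_churn DESC, id
"""
at_risk = conn.execute(query).fetchall()

for account_id, p_churn in at_risk:
    print(account_id, p_churn)
```

Because the function behaves like any other SQL function, filtering
and ordering on the score happen inside the database, and only the
selected accounts are returned to the caller.
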

The SAS Scoring Accelerator for Greenplum consists of two components:
- the Score Code Export node in SAS Enterprise Miner. This extension
  exports the model scoring logic, including metadata about the
  required input and output variables, from SAS Enterprise Miner.
- the publishing client that includes the %INDGP_PUBLISH_MODEL macro.
  This macro translates the scoring model into .c and .h files for
  creating the scoring functions and generates a script of Greenplum
  commands for registering the scoring functions. The publishing
  client then uses the SAS/ACCESS Interface to Greenplum to publish
  the scoring functions to Greenplum.