Using a DataFlux Data Service in a SAS Data Integration Studio Job

Problem

You want to include a DataFlux Data Management Studio data service in the flow for a SAS Data Integration Studio job. For example, you could create a data service that generates match codes and clustering information. You could then call that service in the flow for a SAS Data Integration Studio job, as shown in the next figure.
SAS Data Integration Studio Job That Calls a Data Service
For the purpose of illustration, the job shown above is similar in purpose to the sample job that is shown in Create Match Code Job Flow. However, you might want to use a DataFlux Data Service transformation to perform tasks that are a specialty of DataFlux software, such as profiling, monitoring, or address verification.

Solution

Create a data job in DataFlux Data Management Studio. Configure the job as a data service and deploy it to a DataFlux Data Management Server. Create a SAS Data Integration Studio job and add a DataFlux Data Service transformation to the flow. Configure this transformation so that it takes input from the SAS job, sends the input to the DataFlux data service, and then returns output from the service to the SAS job.

Tasks

Verify Prerequisites

The current version of SAS Data Integration Studio can execute only data services that were created with DataFlux Data Management Studio. To execute services that were created with DataFlux dfPower Studio, you must first migrate those services to one of the SAS data management offerings. For more information, see the DataFlux Migration Guide.

Create a Data Service in DataFlux Data Management Studio

A data service is a DataFlux Data Management Studio data job that has been configured as a real-time service and deployed to a DataFlux Data Management Server. For the current example, you would create a data service that generates match codes and cluster information. The flow for that job might look similar to the following figure.
Data Service in DataFlux Data Management Studio
The job must be deployed to a DataFlux Data Management Server so that it can be accessed from SAS Data Integration Studio. The first node in the flow (External Data Provider) takes input from the job in SAS Data Integration Studio, and the last node (Data Target (Insert)) returns output to the job in SAS Data Integration Studio. For information about creating and deploying a data service in DataFlux Data Management Studio, see the topic “Deploying a Data Job as a Real-Time Service” in the Data Job chapter of the DataFlux Data Management Studio User’s Guide.
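To make the shape of such a service concrete, the following Python sketch imitates a service that accepts rows, derives a match code for each row, and assigns a cluster number to rows that share a match code. It is only an illustration: the functions toy_match_code and toy_data_service, the field names NAME, MATCH_CODE, and CLUSTER_ID, and the matching rule (uppercase the letters and drop vowels after the first letter) are all hypothetical and are not the algorithm or interface that DataFlux software uses.

```python
import re


def toy_match_code(name: str) -> str:
    """Hypothetical stand-in for a match code (NOT the DataFlux algorithm).

    Keeps letters only, uppercases them, and removes vowels after the
    first letter so that small spelling differences collapse together.
    """
    letters = re.sub(r"[^A-Za-z]", "", name).upper()
    if not letters:
        return ""
    return letters[0] + re.sub(r"[AEIOU]", "", letters[1:])


def toy_data_service(rows):
    """Imitate a data service: rows in, rows with extra columns out.

    `rows` plays the role of the input received by the first node,
    and the returned list plays the role of the output sent back by
    the last node. Field names are assumptions for this sketch.
    """
    cluster_ids = {}  # match code -> cluster number, in order first seen
    output = []
    for row in rows:
        code = toy_match_code(row["NAME"])
        cluster = cluster_ids.setdefault(code, len(cluster_ids) + 1)
        output.append({**row, "MATCH_CODE": code, "CLUSTER_ID": cluster})
    return output


if __name__ == "__main__":
    sample = [
        {"NAME": "Robert Smith"},
        {"NAME": "Robart Smith"},
        {"NAME": "Ann Lee"},
    ]
    for record in toy_data_service(sample):
        print(record)
```

In this sketch, the two spellings of the first name produce the same match code and therefore land in the same cluster, which is the kind of result that the match code and clustering nodes in a real data service are intended to provide.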

Create and Populate a Job in SAS Data Integration Studio

For the current example, you would create a SAS Data Integration Studio job and add a DataFlux Data Service transformation to the flow, as shown in the next figure.
SAS Data Integration Studio Job That Calls a Data Service
The sources and targets in the flow are added in the usual manner. The sources and targets shown above are similar to those in the sample job that is shown in Create Match Code Job Flow. In the current example, however, a data service is used instead of the Create Match Codes transformation.

Configure the DataFlux Data Service Transformation

Open the Properties window for the DataFlux Data Service transformation. On the Data Service tab, select the DataFlux Data Management Server and select the appropriate data service that was created in DataFlux Data Management Studio. The next figure shows the values for the sample job.
Data Service Tab
In the previous figure, the Server field specifies the DataFlux Data Management Server where the data service was deployed. The Service field specifies the data service that you want to run in this step. The data service that you select here was created as described in Create a Data Service in DataFlux Data Management Studio.
On the Input Mapping tab, map one or more input columns for the transformation to the corresponding inputs in the data service, as shown in the next figure.
Input Mapping Tab
On the Output Mapping tab, map one or more output columns for the transformation to the corresponding outputs in the data service, as shown in the next figure.
Output Mapping Tab
Click OK to save your settings and close the Properties window. The job is now ready to run.
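The effect of the two mapping tabs can be pictured as a pair of column-name lookups applied around the service call. The following Python sketch is a conceptual model only, not code that SAS Data Integration Studio generates. The function call_with_mappings, the stand-in fake_service, and every column and field name shown (CUST_NAME, Name, MatchCode, NAME_OUT, NAME_MC) are hypothetical.

```python
def call_with_mappings(rows, input_map, output_map, service):
    """Model the Input Mapping and Output Mapping tabs.

    input_map:  job column name -> service input field name
    output_map: service output field name -> job target column name
    service:    any callable that takes and returns a list of dicts
    """
    # Input mapping: rename job columns to the names the service expects.
    requests = [
        {svc_field: row[job_col] for job_col, svc_field in input_map.items()}
        for row in rows
    ]
    responses = service(requests)
    # Output mapping: rename service outputs to the target column names.
    return [
        {target_col: resp[svc_field] for svc_field, target_col in output_map.items()}
        for resp in responses
    ]


def fake_service(records):
    """Hypothetical stand-in for a deployed data service."""
    return [
        {"Name": r["Name"], "MatchCode": r["Name"].upper().replace(" ", "")}
        for r in records
    ]


if __name__ == "__main__":
    source_rows = [{"CUST_NAME": "Ann Lee"}]
    target_rows = call_with_mappings(
        source_rows,
        input_map={"CUST_NAME": "Name"},
        output_map={"Name": "NAME_OUT", "MatchCode": "NAME_MC"},
        service=fake_service,
    )
    print(target_rows)
```

A column that is left out of either mapping never reaches the service or the target in this model, which is why it is worth checking both tabs before you run the job.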

Run the Job and View the Output

Perform the following steps to run the job and view the output:
  1. Run the job.
  2. If the job completes without error, go to the next step. If error messages appear, read and respond to the messages.
  3. Right-click the target table and select View Data. The following display depicts the cluster and match code columns in the target.
    Output from a DataFlux Data Service