Setting Up the SAS Data Quality Server Software

Overview of SAS Data Quality Server Software

When you run the SAS Deployment Wizard, you install the SAS Data Quality Server software and configure a SAS Application Server to read a Quality Knowledge Base from DataFlux, a SAS company. Installing a Quality Knowledge Base enables the use of the data quality transformations in the Data Quality folder in SAS Data Integration Studio. You can also use the data quality functions in the Expression Builder. The Expression Builder is available in many data integration transformations.
SAS Data Integration Studio has additional transformations that execute data quality jobs and real-time services on DataFlux Data Management Servers. The DataFlux jobs and real-time services are created using DataFlux Data Management Studio. The DataFlux client allows you to analyze and profile data quality across your enterprise, and create customized Quality Knowledge Bases that enforce your business rules.
This section explains the installed configuration of the SAS Data Quality Server software, and how to perform a simple test to ensure that the system is working. This section also covers several administrative tasks associated with data quality, including the following:
  • registering DataFlux Data Management Servers
  • downloading new locales
  • creating new schemes
  • setting data quality options in SAS Data Integration Studio
Note: If you are unfamiliar with the subject of data quality technology and terminology, refer to any of the following documents: SAS Data Quality Server: Reference, DataFlux Data Management Studio: User’s Guide, and DataFlux Data Management Server: Administrator’s Guide.

About the Data Quality Configuration

When you install the data quality software, the SAS Deployment Wizard installs and configures the SAS Data Quality Server, SAS Foundation, and SAS Data Integration Studio software. If you deploy an enterprise data integration software bundle, the SAS Deployment Wizard also installs the DataFlux client and server and creates metadata for the server. After installation, the data quality software is fully operational. The following information is provided so that you can change the default configuration.
The SAS Data Quality Server software is installed in !SASROOT\dquality.
Locales and schemes are located in directories subordinate to dquality: sasmisc\QltyKB\sample\locale and sasmisc\QltyKB\sample\scheme.
During execution, data quality jobs reference a specific Quality Knowledge Base (QKB) at a specific location. The location of the QKB can be specified by the job, by the SAS Data Integration Studio client, or by one of two system defaults.
Data quality jobs can specify the location of a Quality Knowledge Base by specifying a value for the system option DQSETUPLOC.
To create jobs that reference non-default QKBs, open SAS Data Integration Studio, select Tools, then select Options, and display the Data Quality tab.
To set a default QKB location on a SAS Application Server, open the file SAS-configuration-directory\Lev1\SASApp\sasv9_usermods.cfg and specify a new location, as shown in this example:
-dqsetuploc "C:\Program Files\DataFlux\QltyKB\CI\2010A"
When you install the server, the write-protected system default for DQSETUPLOC is specified in the file SAS-configuration-directory\Lev1\SASApp\sasv9.cfg. The default entry in that file is as follows:
DQSETUPLOC "!SASROOT\dquality\sasmisc\QltyKB\sample"
The default locale is set as follows in sasv9_usermods.cfg:
-dqlocale (ENUSA)
You can change values in the SAS configuration file sasv9_usermods.cfg without restarting servers. Any SAS Data Integration Studio clients need to be restarted in order to use new values.
Note: Do not edit sasv9.cfg. Leaving that file unchanged enables your system to revert to a known default state.
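To confirm which values a SAS session actually picked up after you edit sasv9_usermods.cfg, you can display the two system options named above. The following is a minimal sketch, assuming a SAS session started on the SAS Application Server; it reads the options and changes nothing.

```sas
/* Write the current value of each data quality option to the SAS log. */
proc options option=dqsetuploc;
run;

proc options option=dqlocale;
run;

/* GETOPTION returns the same values for use in macro code. */
%put NOTE: DQSETUPLOC is %sysfunc(getoption(dqsetuploc));
%put NOTE: DQLOCALE is %sysfunc(getoption(dqlocale));
```

If the log shows the sample QKB path from sasv9.cfg rather than your new location, the entry in sasv9_usermods.cfg was probably not read by that session.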

Test the SAS Data Quality Server Software

To verify that your SAS Data Quality Server software is working, create a job in SAS Data Integration Studio that contains a Create Match Code transformation.
Follow these steps to create such a job:
  1. From the SAS Data Integration Studio desktop, select Tools, then select Process Designer to start the New Job Wizard.
  2. In the New Job Wizard, enter a name for the job (such as Create Database Match Codes) in the Name box. Then, click Finish. A new Process Designer window appears on the right side of your workspace.
  3. From the Process Library tree, select and drag the Create Match Code template into the Process Designer.
  4. From the Inventory tree, or another tree view, select and drag the metadata object for any table to the source drop zone.
  5. From the Inventory tree, or another tree view, select and drag the metadata object for any table to the target drop zone. Both a Loader and the target table are added to the graphical representation of the job.
  6. Double-click the Create Match Code transformation to display the Properties window.
  7. As the Properties window opens, a dialog box indicates that match definitions are being loaded from the Quality Knowledge Base. This indicates that the SAS Data Quality Server software has been properly installed and configured.
  8. To see a list of the match definitions that were loaded, specify a source table for the job and display the Match Code tab of the Properties window. Double-click the Match Definitions column to show the list of match definitions.
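A comparable check can be run as SAS code outside SAS Data Integration Studio. The sketch below is an illustration rather than the documented test procedure. It assumes the QKB path from the earlier sasv9_usermods.cfg example, the ENUSA locale, and a match definition named Name in that locale; the input string and the sensitivity value of 85 are arbitrary.

```sas
/* Load the ENUSA locale from the QKB into memory.
   Assumption: the QKB path from the earlier configuration example. */
%dqload(dqlocale=(ENUSA),
        dqsetuploc="C:\Program Files\DataFlux\QltyKB\CI\2010A");

/* Write information about the loaded locale, including its
   definitions, to the SAS log. */
%dqputloc(ENUSA);

/* Generate one match code. Assumption: 'Name' is a match
   definition in the ENUSA locale. */
data _null_;
   length match_code $ 40;
   match_code = dqMatch('Robert E. Smith', 'Name', 85, 'ENUSA');
   put match_code=;
run;
```

A non-blank match code in the log, with no errors from %DQLOAD, suggests that the SAS session can read the Quality Knowledge Base.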

Register a New DataFlux Data Management Server

DataFlux Data Management Servers run jobs and real-time services that are created with DataFlux Data Management Studio. One DataFlux Data Management Server is installed and registered automatically when you install an enterprise data integration software bundle.
To register another DataFlux Data Management Server in SAS metadata, follow these steps:
  1. Open SAS Management Console, right-click the Server Manager, and select New Server.
  2. In the New Server Wizard, select Http server and click Next.
  3. Enter a name and optional description and click Next.
  4. Enter the optional server software version number and vendor name.
  5. Select or enter the network path to the DataFlux Data Management Server.
  6. For the Application Server Type, select DataFlux Data Management Server, and then click Next.
  7. Select authentication and protocol options.
  8. Specify the name of the host.
  9. Accept the default port number 2136.
  10. Specify a proxy URL if one is needed.
  11. Click Next, review your entries, and click Finish. The new server appears in the Plug-Ins tab.

Download Locales

When initially installed, the Quality Knowledge Base contains a single locale (English/USA). You can obtain additional locales from DataFlux at the following Web address: http://www.dataflux.com/QKB. DataFlux regularly updates locales, so it is important that you install the latest versions after you install the data quality software.
If you install additional locales, you need to update your data quality setup file accordingly, as indicated in the documentation that is provided with each locale. Information about locating and editing the setup file is provided in the SAS Data Quality Server: Reference.
Note that you can create new locales and edit existing locales using DataFlux Data Management Studio.
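As a hedged illustration, an additional locale could be made available to SAS sessions as follows. The locale code ENGBR is only an example of a downloaded locale, and the QKB path is the one used in the configuration example earlier in this section; substitute your own values. This sketch does not replace the setup file changes described in the locale documentation.

```sas
/* In sasv9_usermods.cfg, the default locale list could be extended
   like this (shown as a comment because it is a configuration
   entry, not program code):
      -dqlocale (ENUSA ENGBR)
*/

/* In a SAS program, load both locales from the QKB into memory.
   Assumption: ENGBR has been installed in this QKB. */
%dqload(dqlocale=(ENUSA ENGBR),
        dqsetuploc="C:\Program Files\DataFlux\QltyKB\CI\2010A");
```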

Create Schemes

A scheme is a reusable collection of match codes and standardization values that is applied to input character values for the purposes of transformation or analysis. Schemes can be created in Blue Fusion Data (BFD) format or SAS format (NOBFD). Before your data integration developers can use the Apply Lookup Standardization template in SAS Data Integration Studio, you must specify a scheme repository and a scheme repository type. You also need to create schemes by using the SAS Data Quality Server or dfPower Studio software. For information about how to create schemes, see the relevant product documentation.
Scheme repositories should be separated based on scheme type. BFD schemes and NOBFD schemes should be stored separately to ensure that standardization jobs use the appropriate schemes. Two scheme repositories are provided in the default installation. On the SAS Application Server, the default scheme directory is SAS-configuration-directory\Lev1\appServer\SASEnvironment\QltyKB\scheme. This directory contains a number of BFD schemes that are supplied with the SAS Data Quality Server software.
A second default scheme directory is provided for interactive SAS sessions that are started on the local host: ..\dquality\sasmisc\content\scheme.
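For orientation, a BFD scheme can be built with the DQSCHEME procedure in SAS Data Quality Server. The following is a minimal sketch, not a recommended production job. The input table, the scheme path C:\schemes\city.sch.bfd, and the choice of the City match definition are all hypothetical, and the ENUSA locale is assumed to be loaded already (for example, with %DQLOAD).

```sas
/* Hypothetical input: several spellings of the same city names. */
data work.cities;
   length city $ 30;
   input city $char30.;
   datalines;
Raleigh
raleigh
RALEIGH
Cary
cary
;
run;

/* Build a scheme in BFD format from the CITY variable.
   Assumptions: 'City' is a match definition in the ENUSA locale,
   and the scheme path points to your BFD scheme repository. */
proc dqscheme data=work.cities bfd;
   create matchdef='City'
          var=city
          scheme='C:\schemes\city.sch.bfd'
          locale='ENUSA';
run;
```

Writing the scheme file into the directory that you later name as the scheme repository keeps BFD and NOBFD schemes separated, as recommended above.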
In SAS Data Integration Studio, the scheme repository and the scheme repository type are specified in the Data Quality tab of the Options dialog box (select Tools, then select Options). The default scheme repository type is dfPower Scheme (BFD). No default scheme repository is specified. When you specify a scheme repository in SAS Data Integration Studio, a full or explicit path is recommended. Relative paths must be specified relative to the SAS Application Server.
If you change an existing value in the Scheme Repository Type or Scheme Repository box, then you need to replace any existing instances of the Apply Lookup Standardization transformation. Replacement is required because the scheme metadata is added to these jobs when they are run for the first time.
To update a job to use a different scheme repository:
  1. Add a new Apply Lookup Standardization transformation to the job.
  2. Configure the new transformation.
  3. Delete the old transformation.
  4. Move the new transformation into place.

Set Data Quality Options for SAS Data Integration Studio

You can set several options related to data quality by using the Data Quality tab in the Options dialog box in SAS Data Integration Studio (select Tools, then select Options).