Administering SAS Data Integration Studio
Overview of SAS Data Quality Server Software
When you run the SAS Deployment Wizard, you install the SAS Data Quality Server software and configure a SAS Application Server to read a Quality Knowledge Base. This enables data integration developers to use the data quality transformations Create Match Code and Apply Lookup Standardization. Those developers can also make use of the data quality functions that are provided in the Expression Builder, which is available in many of the data integration transformations.
Enterprise implementations of the SAS Intelligence Platform make use of the dfPower Studio software and DataFlux Integration Servers from DataFlux, a SAS company. The dfPower Studio software enables you to customize your Quality Knowledge Base. The dfPower Architect and dfPower Profile software enable you to create jobs and real-time services that run on DataFlux Integration Servers. After you register DataFlux Integration Servers in SAS metadata, you can write jobs in SAS Data Integration Studio that execute DataFlux jobs and real-time services on the DataFlux Integration Servers. The transformations that execute DataFlux jobs and real-time services are called DataFlux IS Job and DataFlux IS Service.
This section explains the installed configuration of the SAS Data Quality Server software, and how to perform a simple test to ensure that the system is working. This section also covers several administrative tasks associated with data quality, including the following:
registering DataFlux Integration Servers
downloading new locales
creating new schemes
setting data quality options in SAS Data Integration Studio
Note: If you are unfamiliar with the subject of data quality and the terminology used in this section, see the SAS Data Quality Server: Reference.
About the Data Quality Configuration
Setup is complete once the SAS Foundation software and the SAS Data Quality Server software are installed and the SAS Deployment Wizard has been run. Your data integration developers can then use the data quality transformations and functions. This automatic setup is convenient, but you need to understand it in case you later need to make changes.
During installation, the SAS Data Quality Server software is installed in ..\dquality. A typical default path might be SAS-configuration-directory\Lev1\dquality.
Locales and schemes are located in directories subordinate to dquality: sasmisc\content\locale and sasmisc\content\scheme.
A data quality configuration file is used to specify the location of the Quality Knowledge Base. When you start a local interactive SAS session, the data quality configuration is referenced by default at \dquality\sasmisc\dqsetup.txt.
When you use SAS Data Integration Studio to start a SAS Workspace Server on a SAS Application Server, the data quality configuration file is SAS-configuration-directory\Lev1\appServer\dqsetup.txt. A typical default path is C:\SAS\EGServers\Lev1\SASApp\dqsetup.txt, where SASApp is the name of the SAS Application Server.
On SAS Application Servers, the location of the data quality configuration file is specified in the SAS configuration files SAS-configuration-directory\Lev1\appServer\sasv9.cfg and sasv9_usermods.cfg. In the SAS configuration files, the line that specifies the location of the setup file (by default) is as follows:
-dqsetuploc "dqsetup.txt"
In that same SAS configuration file, the server's default data quality locale is specified as:
-dqlocale (ENUSA)
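To confirm which values a session actually picked up, a quick check along the following lines can help. This is a minimal sketch that assumes it is submitted in a SAS session started with the configuration files described above. GETOPTION is a standard SAS function; what appears in the log depends on your site.

```sas
/* Write the data quality system options that are in effect to the SAS log. */
%put NOTE: DQSETUPLOC = %sysfunc(getoption(dqsetuploc));
%put NOTE: DQLOCALE   = %sysfunc(getoption(dqlocale));
```

If the values in the log differ from what you expect, check whether a later configuration file or a client-side option is overriding them.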
You can change values in the SAS configuration file sasv9_usermods.cfg without restarting servers. However, any SAS Data Integration Studio clients must be restarted before they use the new values.
Note: Do not edit sasv9.cfg, which needs to remain unchanged to retain default settings. Instead, edit sasv9_usermods.cfg, whose values override those in sasv9.cfg.
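As an illustration, overrides in sasv9_usermods.cfg might look like the following fragment. This is a sketch only: the path reuses the typical default location mentioned earlier and is an assumption about your site, and the locale is the installed default.

```text
/* sasv9_usermods.cfg: site overrides (sketch; adjust the path for your site) */
-dqsetuploc "C:\SAS\EGServers\Lev1\SASApp\dqsetup.txt"
-dqlocale (ENUSA)
```

Specifying a full path for -dqsetuploc avoids any dependence on the directory from which the server is started.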
You can use the Properties windows of the data quality transformations to override the server defaults for data quality. You can also use the Data Quality tab of the Options dialog box in SAS Data Integration Studio to override server defaults in connection profiles. To display the Options dialog box, select Tools > Options.
Test the SAS Data Quality Server Software
To verify that your SAS Data Quality Server software is working, create a job in SAS Data Integration Studio that contains a Create Match Code transformation.
Follow these steps to create such a job:
1. From the SAS Data Integration Studio desktop, select Tools > Process Designer to start the New Job Wizard.
2. In the New Job Wizard, enter a name for the job, such as Create Database Match Codes, in the Name box. Then click Finish. A new Process Designer window appears on the right side of your workspace.
3. From the Process Library tree, select and drag the Create Match Code template into the Process Designer.
4. From the Inventory tree, or another tree view, select and drag the metadata object for any table to the source drop zone.
5. From the Inventory tree, or another tree view, select and drag the metadata object for any table to the target drop zone. Both a Loader and the target table are added to the graphical representation of the job.
6. Double-click the Create Match Code transformation to display the Properties window. As the Properties window appears, a dialog box indicates that match definitions are being loaded from the Quality Knowledge Base. This indicates that the SAS Data Quality Server software has been properly installed and configured.
7. To see a list of the match definitions that were loaded, specify a source table for the job and display the Match Code tab of the Properties window. Double-click the Match Definitions column to show the list of match definitions.
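As a complementary check outside SAS Data Integration Studio, you could generate a match code directly in a SAS session. The following is a minimal sketch, not part of the documented procedure. It assumes the ENUSA locale, its Name match definition, and the typical setup file path mentioned earlier; substitute the values for your site.

```sas
/* Load the ENUSA locale from the Quality Knowledge Base named in the setup file. */
/* The path is the typical default from this section and may differ at your site. */
%dqload(dqlocale=(ENUSA),
        dqsetuploc='C:\SAS\EGServers\Lev1\SASApp\dqsetup.txt');

/* Generate a match code for one sample value and write it to the log. */
data _null_;
   length name $ 40 mc $ 255;
   name = 'Robert Smith';                    /* sample input value */
   mc = dqMatch(name, 'Name', 85, 'ENUSA');  /* sensitivity 85 */
   put name= mc=;
run;
```

A nonblank match code in the log suggests that the locale loaded and that the Quality Knowledge Base is readable from that session.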
Register DataFlux Integration Servers
DataFlux Integration Servers run jobs and real-time services that are created with the dfPower Architect and dfPower Profile software from DataFlux, a SAS company. After you install a DataFlux Integration Server and register the server in SAS metadata, you can execute DataFlux jobs and services from other jobs that you create in SAS Data Integration Studio.
To register a DataFlux Integration Server in SAS metadata, follow these steps:
1. Open SAS Management Console, right-click the Server Manager, and select New Server.
2. In the New Server Wizard, select Http server and click Next.
3. Enter a name and optional description and click Next.
4. Enter the optional server software version number and vendor name.
5. Select or enter the network path to the DataFlux Integration Server.
6. For the Application Server Type, select DataFlux Integration Server, and then click Next.
7. Select authentication and protocol options.
8. Specify the name of the host.
9. Accept the default port number 21036.
10. Specify a proxy URL if one is needed.
11. Click Next, review your entries, and click Finish. The new server appears in the Plug-Ins tab.
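Before or after registering the server, it can be useful to confirm from SAS that the DataFlux Integration Server answers on the expected host and port. The sketch below assumes that PROC DQSRVADM from SAS Data Quality Server is available in your release. The host name is a placeholder, and a secured server would also need user credentials.

```sas
/* Ask the DataFlux Integration Server for its job log information. */
/* The host name is hypothetical; 21036 is the default port cited above. */
proc dqsrvadm host='dfis.example.com' port=21036 out=work.dfis_joblog;
run;

/* Show the first few rows that the server returned, if any. */
proc print data=work.dfis_joblog(obs=10);
run;
```

If the procedure completes without a connection error, the host and port values are reasonable candidates for the registration steps.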
Download Locales
When initially installed, the Quality Knowledge Base contains a single locale (English/USA). You can obtain additional locales from DataFlux, a SAS company, at the following Web address: http://www.dataflux.com/QKB. DataFlux regularly adds new locales for various regions and national languages.
If you install additional locales, you need to update your data quality setup file accordingly, as indicated in the documentation that is provided with each locale. Information about locating and editing the setup file is also provided in SAS Data Quality Server: Reference.
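After a new locale is installed and the setup file is updated, you might verify that SAS can load it. The following sketch assumes a hypothetical second locale, ENGBR, and the typical setup file path mentioned earlier; use the locale code and path that apply to your installation.

```sas
/* Load the default locale plus the newly installed one (ENGBR is an example). */
%dqload(dqlocale=(ENUSA ENGBR),
        dqsetuploc='C:\SAS\EGServers\Lev1\SASApp\dqsetup.txt');

/* List information about the new locale in the SAS log. */
%dqputloc(ENGBR);
```

An error from %DQLOAD at this point usually indicates that the locale is not yet visible through the setup file.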
You can also create new locales (and edit existing ones) using the dfPower Customize software from DataFlux, a SAS company.
Create Schemes
A scheme is a reusable collection of match codes and standardization values that is applied to input character values for the purposes of transformation or analysis. Schemes can be created in Blue Fusion Data (BFD) format or SAS format (NOBFD). Before your data integration developers can use the Apply Lookup Standardization template in SAS Data Integration Studio, you must specify a scheme repository and a scheme repository type. You also need to create schemes by using the SAS Data Quality Server or dfPower Studio software. For information about how to create schemes, see the relevant product documentation.
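For orientation, creating a BFD scheme with SAS Data Quality Server might look like the sketch below. It is an illustration under stated assumptions, not the documented procedure. The input data set, the output file name, and the choice of the City match definition in the ENUSA locale are all hypothetical.

```sas
/* Hypothetical input: a few spellings of the same city names. */
data work.cities;
   length city $ 30;
   input city $;
   datalines;
Cary
CARY
Raleigh
Raliegh
;
run;

/* Load the locale; the setup file path is the typical default and may differ. */
%dqload(dqlocale=(ENUSA),
        dqsetuploc='C:\SAS\EGServers\Lev1\SASApp\dqsetup.txt');

/* Build a scheme in BFD format from the CITY variable. */
proc dqscheme data=work.cities bfd;
   create matchdef='City'
          var=city
          scheme='C:\temp\city.sch.bfd'   /* hypothetical output file */
          locale='ENUSA';
run;
```

Specifying NOBFD instead of BFD on the PROC DQSCHEME statement would produce a scheme in SAS format, which belongs in a separate repository.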
Scheme repositories should be separated based on scheme type. BFD schemes and NOBFD schemes should be stored separately to ensure that standardization jobs use the appropriate schemes. Two scheme repositories are provided in the default installation. On the SAS Application Server, the default scheme directory is SAS-configuration-directory\Lev1\appServer\SASEnvironment\QltyKB\scheme. This directory contains a number of BFD schemes that are supplied with the SAS Data Quality Server software.
A second default scheme directory is provided for interactive SAS sessions that are started on the local host: ..\dquality\sasmisc\content\scheme.
In SAS Data Integration Studio, the scheme repository and the scheme repository type are specified in the Data Quality tab of the Options dialog box (select Tools > Options). The default scheme repository type is dfPower Scheme (BFD). No default scheme repository is specified. When you specify a scheme repository in SAS Data Integration Studio, a full or explicit path is recommended. Relative paths must be specified relative to the SAS Application Server.
If you change an existing value in the Scheme Repository Type or Scheme Repository box, then you need to replace any existing instances of the Apply Lookup Standardization transformation. Replacement is required because the scheme metadata is added to these jobs when they are run for the first time, so existing instances do not pick up the changed values.
To update a job to use a different scheme repository:
1. Add a new Apply Lookup Standardization transformation to the job.
2. Configure the new transformation.
3. Delete the old transformation.
4. Move the new transformation into place.
Set Data Quality Options for SAS Data Integration Studio
You can set several options related to data quality by using the Data Quality tab in the Options dialog box in SAS Data Integration Studio (select Tools > Options).
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.