SAS(R) Data Quality Server 9.2: Reference

Using the SAS Data Quality Server Software

Overview

The SAS Data Quality Server software provides a Quality Knowledge Base, along with SAS language elements that enable you to analyze, transform, and standardize your data. By cleansing your data, you increase the quality and value of the information that you derive from your data.

The language elements in the SAS Data Quality Server software can be separated into two functional groups. As shown in the following diagram, one group cleanses data in SAS. The other group runs data cleansing jobs and services on Integration Servers from DataFlux (a SAS company).

Functional Overview of the SAS Data Quality Server Software

[Overview of the SAS Data Quality Server]

The language elements in the Local Process group read data definitions from the Quality Knowledge Base in order to create match codes, apply schemes, parse text, and perform similar tasks. The language elements in the Server Process group start and stop jobs and services and manage log entries on DataFlux Integration Servers.

The DataFlux Integration Servers and the related dfPower Profile and dfPower Architect applications are made available with the SAS Data Quality Server software in various software bundles.

You can add DataFlux software after purchasing the SAS Data Quality Server software.

DataFlux Integration Servers that are purchased for use with SAS are restricted so that jobs and services can be executed only by SAS programs.


The Quality Knowledge Base

The Quality Knowledge Base contains data quality definitions for a number of locales. Definitions specify how categories of data are processed. For example, the match definition NAME defines how match codes are created for the names of individuals. The data definitions in the Quality Knowledge Base are redefined in each locale.

Locales provide data definitions for a national language and a geographical region. For example, the locale ENUSA reflects the English language as it is used in the United States.

In your data analysis and cleansing programs, and in your DataFlux jobs and services, you select the locales that apply to your data. Those locales are read into your process for reference during analysis, transformation, and standardization.
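As an illustration of selecting a locale in a SAS session, the following minimal sketch uses the %DQLOAD autocall macro, which sets the DQLOCALE= and DQSETUPLOC= system options and loads the named locales into memory. The path shown is a placeholder assumption, and the sketch assumes that the ENUSA locale is installed in your Quality Knowledge Base.

```sas
/* Load the ENUSA locale into memory so that data quality         */
/* procedures and functions can reference its definitions.        */
/* The DQSETUPLOC= value is a hypothetical path; substitute the   */
/* location of your own Quality Knowledge Base.                   */
%dqload(dqlocale=(ENUSA), dqsetuploc='/path/to/your/qkb');
```

Listing several locales inside the parentheses, separated by spaces, loads each of them for reference in the same session.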

To obtain online Help for your Quality Knowledge Base, open the dfPower Customize application and select Help > QKB Help.


The Local Process Group

The local process group of SAS language elements provides data cleansing capabilities within SAS processes. The group consists of the DQSCHEME and DQMATCH procedures, 18 functions, one CALL routine, two system options, and several autocall macros.

The DQSCHEME procedure enables you to create and apply schemes that transform similar data values into the single most common value, as shown in the following diagram.

DQSCHEME Transformation Output

[DQSCHEME Transformation Output]

PROC DQSCHEME also analyzes and reports on the quality of your data. For additional information, see The DQSCHEME Procedure.
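The scheme workflow described above can be sketched as two procedure steps: one that builds a scheme from the data, and one that applies it. This is an illustrative sketch, not a definitive program. The data set, variable, and scheme names are hypothetical, and the match definition name 'City (Scheme Build)' is an assumption that depends on the contents of your Quality Knowledge Base. The sketch also assumes that the ENUSA locale is already loaded.

```sas
/* Hypothetical input: city names with spelling variations. */
data vendors;
   input city $char20.;
   datalines;
Raleigh
Raliegh
Raleigh
Cary
Carey
Cary
;
run;

/* Step 1: build a scheme for the CITY variable.                  */
/* NOBFD stores the scheme as a SAS data set rather than in       */
/* BFD format. The match definition name is an assumption.        */
proc dqscheme data=vendors nobfd;
   create matchdef='City (Scheme Build)'
          var=city
          scheme=work.city_scheme
          locale='ENUSA';
run;

/* Step 2: apply the scheme and write the transformed values      */
/* to a new data set, leaving the input unchanged.                */
proc dqscheme data=vendors out=vendors_clean nobfd;
   apply var=city scheme=work.city_scheme;
run;
```

Keeping the build and apply steps separate lets you inspect or edit the scheme data set before it changes any values.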

The DQMATCH procedure enables you to generate match codes as a basis for standardization or transformation. Match codes are character representations of data values. You generate match codes based on a definition and sensitivity value. The match codes reflect the relative similarity of data values. Values that generate the same match codes are candidates for transformation or standardization. To learn more about PROC DQMATCH, see The DQMATCH Procedure.
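A minimal sketch of match code generation follows. The data set and variable names are hypothetical, the sensitivity value of 85 is an arbitrary choice for illustration, and the sketch assumes that the ENUSA locale is loaded and that it contains a match definition named 'Name'.

```sas
/* Hypothetical input: customer names that may refer to the       */
/* same individuals.                                              */
data customers;
   length name $ 40;
   input name $char40.;
   datalines;
Robert Smith
Bob Smith
Mr. Robert E. Smith
Jane Doe
;
run;

/* Write one match code per row into the variable NAME_MC.        */
/* Rows that receive identical match codes are candidates for     */
/* transformation or standardization.                             */
proc dqmatch data=customers out=customers_mc
             matchcode=name_mc locale='ENUSA';
   criteria matchdef='Name' var=name sensitivity=85;
run;
```

Which rows receive the same match code depends on the definition and the sensitivity value, so review the output before acting on it.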

The functions and the CALL routine in the local process group provide the following capabilities:

  • parse a text string and return the string with tokens that identify the elements in the string

  • return a token from a parsed string

  • create match codes for strings or tokens

  • standardize strings

  • apply schemes

  • analyze patterns

  • return the name of the data definition that fits the value

  • return the name of the locale that best fits the data

  • return a list of definitions in a locale

For links to specific functions, see Functions Listed by Category.
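Several of the capabilities listed above can be sketched in a single DATA step. The example below is illustrative only. It assumes that the ENUSA locale is loaded, and that the locale provides definitions named 'Name' for parsing, matching, and standardization, with parse tokens named 'Given Name' and 'Family Name'. Those definition and token names are assumptions about your Quality Knowledge Base.

```sas
data _null_;
   length parsed $ 200 given family $ 40 mc $ 40 std $ 60;

   name = 'Mrs. Sallie Mae Pravlik';

   /* Parse the string; the result carries delimited tokens. */
   parsed = dqParse(name, 'Name', 'ENUSA');

   /* Return individual tokens from the parsed string. */
   given  = dqParseTokenGet(parsed, 'Given Name', 'Name', 'ENUSA');
   family = dqParseTokenGet(parsed, 'Family Name', 'Name', 'ENUSA');

   /* Create a match code at sensitivity 85 (arbitrary value). */
   mc = dqMatch(name, 'Name', 85, 'ENUSA');

   /* Standardize the string. */
   std = dqStandardize(name, 'Name', 'ENUSA');

   /* Write the results to the SAS log for inspection. */
   put given= family= mc= std=;
run;
```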

The Server Process Group

The server process group of language elements enables you to run DataFlux jobs and services on DataFlux Integration Servers. You can also administer log files on Integration Servers.

Use the DQSRVSVC procedure to run services that were created with the dfPower Architect software. These services rapidly process and return small amounts of data. Rapid response enables real-time data cleansing at the point of data entry. For more information, see The DQSRVSVC Procedure.
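A call to a real-time service might look like the following sketch. The service name, host name, port, time-out value, and data set names are all hypothetical placeholders. The input data set must supply the fields that the service expects.

```sas
/* Send the rows of WORK.DIRTY to a hypothetical Architect        */
/* service and collect the returned rows in WORK.CLEAN.           */
/* Service, host, port, and time-out are placeholder values.      */
proc dqsrvsvc service='myService'
              host='myhost' port=21036
              data=work.dirty out=work.clean
              timeout=36;
run;
```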

Use the DQSRVADM procedure to return job log information. See The DQSRVADM Procedure.
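As a minimal sketch, the following step writes job log information from an Integration Server to a SAS data set. The host name, port, and output data set name are hypothetical.

```sas
/* Retrieve job log information from a hypothetical server        */
/* and store it in WORK.JOBREPORT for review.                     */
proc dqsrvadm out=work.jobReport host='myhost' port=21036;
run;
```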

Use the functions in the server process group to run jobs, read and manage status logs, and terminate jobs and services. See DataFlux Integration Server Functions.

Note:   The server group of procedures and functions is not available in the OpenVMS operating environment.


DataFlux Jobs and Services

On DataFlux Integration Servers, jobs and services fulfill separate needs. Use jobs to access larger data sets in batch mode, when a client application is not waiting for a response. Use services to process small data sets in real time, when clients await a response from the server.

To create jobs and services for your Integration Servers, use the dfPower Profile and dfPower Architect applications from DataFlux (a SAS company). The dfPower Profile software enables you to create jobs that analyze the quality of your data. Profile jobs are available in two types. One type runs on individual files; the other runs on repositories. Each type of job has its own trigger function: DQSRVPROFJOBFILE for files and DQSRVPROFJOBREP for repositories.

The dfPower Architect software provides a graphical interface that you use to create jobs and services. You trigger the Architect jobs with the function DQSRVARCHJOB. Run Architect services with PROC DQSRVSVC.

Jobs and services generate log information on Integration Servers. You can read the server logs using the DQSRVJOBSTATUS function or the DQSRVADM procedure. Based on the job status information that is returned in the logs, you can terminate jobs and services using the DQSRVKILLJOB function.
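The job life cycle can be sketched as follows: start an Architect job, keep its job identifier, and terminate the job later if the logs indicate a problem. The job file name, host name, and port are hypothetical, and the argument order shown (job, host, port) is an assumption to verify against the function reference for your release.

```sas
/* Start a hypothetical Architect job and save the returned       */
/* job identifier in a macro variable for later steps.            */
data _null_;
   length jobid $ 52;
   jobid = dqsrvarchjob('myJob.dmc', 'myhost', 21036);
   call symputx('jobid', jobid);
run;

/* Run this step only if the job status information shows that    */
/* the job must be stopped. The return code is written to the     */
/* SAS log.                                                       */
data _null_;
   rc = dqsrvkilljob("&jobid", 'myhost', 21036);
   put rc=;
run;
```

Storing the identifier in a macro variable keeps it available to later steps in the same session, including steps that read the job logs.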


Data Quality Software in Other SAS Products

The SAS Data Quality Server software is frequently used in tandem with the dfPower Studio software from DataFlux (a SAS company). The dfPower Studio software enables you to cleanse data with or without using a DataFlux Integration Server. The dfPower Studio software includes, among other packages, the dfPower Customize software, which enables you to create and edit definitions in your Quality Knowledge Base. dfPower Studio also includes the dfPower Profile software and the dfPower Architect software, which is used to create jobs and services through a graphical user interface. For more information on dfPower Studio, see www.dataflux.com.

The SAS Data Quality Server software is bundled into enterprise versions of the SAS Intelligence Platform. In those bundles, the SAS Data Quality Server software is made available in transformation templates in the SAS Data Integration Studio software. Also, in the SAS Enterprise Guide and SAS Data Integration Studio software, the Expression Builder enables you to add data quality functions to your logical expressions.