About the Data Quality Methods

Overview

Data quality methods implement the basic QKB definition types. Each data quality method is defined in dfs_serv_dq.xml with an associated QKB path and locale. The configuration file contains each of the data quality methods presented below. However, you can create custom wrappers for definitions that are not presented in this topic. All of the data quality methods have the same general syntax, except DQPARSE, DQEXTRACT, and DQMATCH. DQPARSE and DQEXTRACT require an additional input token that qualifies the output field. DQMATCH uses a sensitivity code that specifies the degree of similarity for the data matching.

Standardization

DQSTANDARDIZE

You can perform standardization using the DQSTANDARDIZE method:

method dqstandardize(nnvarchar(256) value, nnvarchar(256) qkb_def, 
nnvarchar(50) locale) returns nnvarchar(256);
The DQSTANDARDIZE method supports the following data types for the value parameter:
nnvarchar(256)|date|timestamp
Standardization generates a preferred standard representation of data values. Standardization definitions are provided for character content such as dates, names, and postal codes. The available standardization definitions vary from one locale to the next. The return values are provided in the appropriate case, and insignificant blank spaces and punctuation are removed. The order of the elements in the return values might differ from the order of the elements in the input character values.
Here are sample SELECT statements for standardization:
SELECT SYSPROC.DQ.DQUALITY.DQSTANDARDIZE (
            STATE, 
            'State/Province (Full Name)', 
            'ENUSA' )  AS STANDARD_STATE
FROM employee

SELECT SYSPROC.DQ.DQUALITY.DQSTANDARDIZE (
            postalCode, 
            'Postal Code', 
            'ENUSA' )  AS STANDARD_POSTAL_CODE
FROM employee

Matching

DQMATCH

You can perform matching using the DQMATCH method:

method dqmatch(nvarchar(256) value, nvarchar(256) qkb_def, int sensitivity, 
nvarchar(50) locale) returns nvarchar(256);
The DQMATCH method supports the following data types for the value parameter:
nvarchar(256)|date|timestamp
Matching analyzes the input data and generates a matchcode for the data. The matchcode represents a condensed version of the character value. Similar strings receive identical matchcodes. You can specify a sensitivity value, ranging from 0–100, indicating the degree of similarity that should be applied to consider something a match. A sensitivity value of 100 yields more information, and 0 yields less. The default recommended sensitivity value is 85. Here are sample SELECT statements for matching:
SELECT SYSPROC.DQ.DQUALITY.DQMATCH (
            postalCode, 
            'Postal Code', 85,
            'ENUSA' )  AS MATCH_POSTAL_CODE
FROM employee


SELECT SYSPROC.DQ.DQUALITY.DQMATCH (
            phone, 
            'Phone', 50,
            'ENUSA' )  AS MATCH_PHONE
FROM employee

Pattern Analysis

DQPATTERN

You can perform pattern analysis using the DQPATTERN method:

method dqpattern(nvarchar(256) value, nvarchar(256) qkb_def, 
nvarchar(50) locale) returns nvarchar(256);
Pattern analysis returns a simple representation of a text string’s character pattern, which can be used for pattern frequency analysis in profiling jobs. Pattern analysis identifies words or characters in the input data column as numeric, alphabetic, non-alphanumeric, or mixed. The choice of pattern analysis definition determines the nature of the analysis. Here are sample SELECT statements for pattern analysis:
SELECT SYSYPROC.DQ.DQUALITY.DQPATTERN (
            name, 
            'Word', 
            'ENUSA' )  AS PATTERN_WORD
FROM employee

SELECT SYSPROC.DQ.DQUALITY.DQPATTERN (
            address, 
            'City - State/Province - Postal Code', 
            'ENUSA' )  AS PATTERN_CITY_STATE_POSTAL
FROM employee

Identification Analysis

DQIDENTIFY

You can perform identification analysis using the DQIDENTIFY method:

method dqidentify(nvarchar(256) value, nvarchar(256) qkb_def, 
nvarchar(50) locale) returns nvarchar(256);
Identification analysis returns a value that indicates the category of the content in an input character string. The available categories and return values depend on your choice of identification definition and locale. Here are sample SELECT statements for identification analysis:
SELECT SYSPROC.DQ.DQUALITY.DQIDENTIFY (
            Name, 
            'Field Name', 
            'ENUSA' )  AS IDENTIFY_FIELD_NAME
FROM employee

SELECT SYSPROC.DQ.DQUALITY.DQIDENTIFY (
            email, 
            'E-mail (Country Identification)', 
            'ENUSA' )  AS IDENTIFY_EMAIL
FROM employee

Gender Analysis

DQGENDER

You can perform gender analysis using the DQGENDER method:

method dqgender(nvarchar(256) value, nvarchar(256) qkb_def, 
nvarchar(50) locale) returns nvarchar(256);
Gender analysis evaluates the name or other information about an individual to determine the gender of that individual. If the evaluation finds substantial clues that indicate gender, the function returns a value that indicates that the gender is female or male. If the evaluation is inconclusive, the stored procedure returns a value that indicates that the gender is unknown. The exact return value is determined by the specified gender analysis definition and locale. Here is a sample SELECT statement for gender analysis:
SELECT SYSPROC.DQ.DQUALITY.DQGENDER (
            NAME, 
            'Name', 
            'ENUSA' )  AS GENDER_NAME
FROM employee

Casing

DQLOWERCASE

The DQLOWERCASE method applies lowercase text:

method dqlowercase(nvarchar(256) value, nvarchar(256) qkb_def, 
nvarchar(50) locale) returns nvarchar(256);
Casing applies context-sensitive case rules to text. It operates on character content, such as names, organizations, and addresses. Here is a sample of lower casing:
SELECT SYSPROC.DQ.DQUALITY.DQLOWERCASE (
            name, 
            'Lower', 
            'ENUSA' )  AS LOWERCASE_PHONE
FROM employee

DQUPPERCASE

The DQUPPERCASE method applies uppercase text:

method dquppercase(nvarchar(256) value, nvarchar(256) qkb_def, 
nvarchar(50) locale) returns nvarchar(256);
Here is a sample of upper casing:
SELECT SYSPROC.DQ.DQUALITY.DQUPPERCASE (
            name, 
            'Upper', 
            'ENUSA' )  AS UPPERCASE_PHONE
FROM employee

DQPROPERCASE

The DQPROPERCASE method applies uppercase and lowercase text using context-sensitive rules:

method dqpropercase(nvarchar(256) value, nvarchar(256) qkb_def, 
nvarchar(50) locale) returns nvarchar(256);

Parsing

DQPARSE

You can perform parsing using the DQPARSE method:

method dqparse(nvarchar(256) value, nvarchar(256) qkb_def,
 nvarchar(256) tokener, nvarchar(50) locale) returns nvarchar(256);
The DQPARSE method supports the following data types for the value parameter:
nvarchar(256)|date|timestamp
Parsing segments a string into semantically atomic tokens. Parsing is performed with the DQPARSE method. Here are samples of the SELECT statement for parsing:
SELECT SYSPROC.DQ.DQUALITY.DQPARSE (
            address, 
            'Address', 'Street Name', 
            'ENUSA' )  AS PARSE_ADDRESS_STREET_NAME
FROM employee

SELECT SYSPROC.DQ.DQUALITY.DQPARSE (
            name, 
            'Name (Global)', 'Prefix', 
            'ENUSA' )  AS PARSE_NAME_PREFIX
FROM employee

Extraction

DQEXTRACT

You can perform extraction using the DQEXTRACT method:

method dqextract(nvarchar(256) value, nvarchar(256) qkb_def,
 nvarchar(256) tokener, nvarchar(50) locale) returns nvarchar(256);
Extraction returns one or more extracted text values, or tokens, as output.
Last updated: March 6, 2018