What's New in SAS Data Quality Server 9.0, 9.1, and 9.1.2
Overview
New capabilities, functions, and options in SAS Data Quality Server
(formerly called SAS Data Quality - Cleanse) improve your ability to:
- analyze the quality of your data
- minimize defective data
- reduce redundancies
- transform and standardize data
- merge and recombine data
Note: z/OS is the successor to the OS/390 operating system. SAS
Data Quality Server 9.1 is supported on both OS/390 and z/OS operating systems.
Throughout this document, any reference to z/OS also applies to OS/390.
General Enhancements
- SAS Data Quality Server is now available under the z/OS (formerly
OS/390) operating environment.
- The software is now closely integrated with:
- SAS ETL Studio, for data cleansing as part of enterprise-wide
extract, transform, and load (ETL) processes.
- dfPower Studio and Blue Fusion from DataFlux (a SAS company) for
locale editing and additional data cleansing functionality.
- The SAS Sample Library now contains samples for SAS Data Quality
Server 9.1.
Locales
The following locales are provided:
- DEDEU, German for Germany
- ENAUS, English for Australia
- ENGBR, English for Great Britain
- ENUSA, English for the United States
- ENZSA, English for South Africa (new in SAS 9.1.2)
- FRFRA, French for France (new in SAS 9.1.2)
- ITITA, Italian for Italy
- NLNLD, Dutch for the Netherlands
Functions
The following new functions interpret input character values that have
been parsed (and therefore contain delimiters):
- DQMATCHPARSED returns a match code from a parsed character value.
- DQGENDERPARSED returns a gender determination from the parsed
name of an individual.
The following new functions help you prepare parsed character
values:
- DQPARSETOKENPUT creates a new parsed value or adds a parsed value
to an existing parsed value.
- DQMATCHINFOGET returns the name of the parse definition that is
associated with a specified match definition.
- DQGENDERINFOGET returns the name of the parse definition that
is associated with a specified gender definition.
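To make the idea of a parsed value concrete, here is a conceptual Python model; it is not SAS code. It assumes that a parsed value is a single string in which each token occupies a named slot, separated by a delimiter. The delimiter string `/=/`, the token names, and the helpers `parse_token_put` and `parse_token_get` are hypothetical stand-ins chosen for illustration, not the internal format or implementation of DQPARSETOKENPUT.

```python
# Conceptual model only: the delimiter and token layout are assumptions,
# not the internal format used by SAS Data Quality Server.
DELIM = "/=/"  # hypothetical delimiter between token slots

# Hypothetical token order for a "Name" parse definition.
NAME_TOKENS = ["Prefix", "Given Name", "Middle Name", "Family Name", "Suffix"]


def parse_token_put(parsed, token_value, token_name, tokens=NAME_TOKENS):
    """Create a parsed value, or set one token in an existing parsed value.

    An empty `parsed` string starts a new value with every slot empty.
    """
    slots = parsed.split(DELIM) if parsed else [""] * len(tokens)
    slots[tokens.index(token_name)] = token_value
    return DELIM.join(slots)


def parse_token_get(parsed, token_name, tokens=NAME_TOKENS):
    """Read one token back out of a parsed value."""
    return parsed.split(DELIM)[tokens.index(token_name)]


value = parse_token_put("", "Mary", "Given Name")
value = parse_token_put(value, "Smith", "Family Name")
print(value)                                  # /=/Mary/=//=/Smith/=/
print(parse_token_get(value, "Family Name"))  # Smith
```

Functions such as DQMATCHPARSED and DQGENDERPARSED consume values of this general shape, which is why the delimiters must be preserved rather than stripped.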
The new function DQLOCALEINFOGET returns a list of the locales that
are currently loaded into memory.
The new function DQPATTERN returns a pattern analysis of the words or
characters in an input character value.
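As a rough illustration of what a pattern analysis produces, the following Python sketch classifies either each character or each word of a value. The symbol sets used here (A, a, and 9 for characters; A, N, and M for words) are assumptions made for the example; in SAS Data Quality Server, the pattern analysis definitions in the loaded locale determine the actual output of DQPATTERN.

```python
def char_pattern(value: str) -> str:
    """Character-level pattern using an assumed symbol set."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")            # any digit
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)             # punctuation and spaces pass through
    return "".join(out)


def word_pattern(value: str) -> str:
    """Word-level pattern using an assumed symbol set."""
    classes = []
    for word in value.split():
        if word.isdigit():
            classes.append("N")        # all digits
        elif word.isalpha():
            classes.append("A")        # all letters
        else:
            classes.append("M")        # mixed letters, digits, or punctuation
    return " ".join(classes)


print(char_pattern("Suite 12-B"))    # Aaaaa 99-A
print(word_pattern("100 Main St."))  # N A M
```

Counting how often each pattern occurs across a column is a quick way to spot values that do not fit the expected format.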
The existing function DQSCHEMEAPPLY and the existing CALL DQSCHEMEAPPLY
routine now accept the following new arguments: <scheme-lookup-method>,
<match-definition>, <sensitivity>, and <locale>. These arguments implement
the new scheme-apply capabilities that are also available in the APPLY
statement of the DQSCHEME procedure.
System Options
The following system options are new or have changed:
- DQLOCALE= specifies the locales that are to be loaded into memory.
- DQSETUPLOC= specifies the location of the setup file for SAS Data
Quality Server.
AUTOCALL Macros
The following AUTOCALL macros are new or have changed:
- %DQPUTLOC displays in the SAS log all the definitions and tokens
in a specified locale.
- %DQLOAD loads specified locales. The new DQINFO parameter generates
additional information in the SAS log for debugging purposes.
- %DQUNLOAD unloads all locales.
DQSCHEME Procedure
In the PROC DQSCHEME statement, the BFD and NOBFD options enable you
to generate schemes in either BFD format or SAS format. BFD-format schemes
can be displayed and edited using dfPower Customize from DataFlux (a SAS company).
The new CONVERT statement enables you to convert existing schemes between
SAS and BFD formats.
For the CREATE statement in the DQSCHEME procedure, the following options
are new or have been changed:
- INCLUDE_ALL enables you to fully populate the scheme. The scheme
includes all output values, including those that are not transformed.
- MODE= stores a default mode in the scheme. By default, the scheme
is applied either to the entire value of the input variable (MODE=PHRASE)
or to each element of each value (MODE=ELEMENT). The stored value can be
overridden by the MODE= option in the APPLY statement, or by the <mode>
argument of the DQSCHEMEAPPLY function or CALL routine.
- SENSITIVITY= enables you to specify the degree of complexity in
the match codes that are generated internally when the scheme is built. Higher
sensitivity values generate match codes that are more complex. Complex match
codes are useful when you want a higher degree of similarity between the DATA
values that map to a given STANDARD value.
- MATCHDEF=, which was previously known as MATCHTYPE=, specifies
the match definition that is referenced during the creation of match codes.
- LOCALE= specifies a locale.
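The effect of SENSITIVITY= can be pictured with a deliberately simplified Python model. Real match codes are produced by the match definitions in a locale; here the hypothetical `toy_match_code` helper just normalizes a value and keeps more characters as sensitivity rises. The assumed 50-95 range and the prefix lengths are illustrative choices. At higher sensitivity, fewer DATA values share a match code, so the values that map to a given STANDARD value must be more similar to one another.

```python
import re
from collections import defaultdict


def toy_match_code(value: str, sensitivity: int) -> str:
    """Hypothetical match code, NOT the SAS algorithm.

    Uppercases the value, drops everything except A-Z, and keeps a prefix
    whose length grows with sensitivity (50 -> 3 letters, 95 -> 8 letters).
    """
    if not 50 <= sensitivity <= 95:
        raise ValueError("sensitivity assumed to be in the range 50-95")
    letters = re.sub(r"[^A-Z]", "", value.upper())
    keep = 3 + (sensitivity - 50) // 9
    return letters[:keep]


def group_by_code(values, sensitivity):
    """Group values that share a toy match code."""
    groups = defaultdict(list)
    for v in values:
        groups[toy_match_code(v, sensitivity)].append(v)
    return dict(groups)


names = ["Robert Smith", "Robert Smyth", "Roberta Smith"]
print(group_by_code(names, 50))
# {'ROB': ['Robert Smith', 'Robert Smyth', 'Roberta Smith']}
print(group_by_code(names, 95))
# {'ROBERTSM': ['Robert Smith', 'Robert Smyth'], 'ROBERTAS': ['Roberta Smith']}
```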
For the APPLY statement in the DQSCHEME procedure, MODE= specifies whether
to apply the scheme to the entire input value or to each element of the input
value. The default value is determined by the value of MODE= that was stored
in the scheme when the scheme was created.
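The difference between the two modes, and how a stored default interacts with an override, can be sketched in Python. This is a conceptual model, not SAS syntax: the `SCHEME` dictionary and the `apply_scheme` helper are hypothetical, lookup is an exact case-sensitive match, and match-code-based lookup (the <scheme-lookup-method> argument) is not modeled.

```python
from typing import Optional

# Hypothetical scheme: a stored default mode plus DATA -> STANDARD pairs.
SCHEME = {
    "mode": "ELEMENT",  # default recorded when the scheme was created
    "map": {
        "Street": "St",
        "Avenue": "Ave",
        "100 Main Street": "100 Main St",
    },
}


def apply_scheme(value: str, scheme: dict, mode: Optional[str] = None) -> str:
    """Apply `scheme` to one value.

    If `mode` is None, the mode stored in the scheme is used; passing a
    mode overrides it. Values not found in the scheme pass through unchanged.
    """
    mode = mode or scheme["mode"]
    table = scheme["map"]
    if mode == "PHRASE":
        return table.get(value, value)  # look up the whole value
    if mode == "ELEMENT":
        # Look up each whitespace-separated element independently.
        return " ".join(table.get(word, word) for word in value.split())
    raise ValueError(f"unknown mode: {mode}")


print(apply_scheme("200 Oak Avenue", SCHEME))                  # 200 Oak Ave
print(apply_scheme("200 Oak Avenue", SCHEME, mode="PHRASE"))   # 200 Oak Avenue
print(apply_scheme("100 Main Street", SCHEME, mode="PHRASE"))  # 100 Main St
```

PHRASE mode only changes values that appear in the scheme as a whole, whereas ELEMENT mode can standardize parts of values the scheme has never seen in full.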
DQMATCH Procedure
For the PROC DQMATCH statement, the options DELIMITER and NODELIMITER
enable you to generate concatenated match codes with or without a delimiter.
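A short Python sketch shows one general property of a delimiter in a concatenated match code: the individual codes remain distinguishable and can be split back apart. The component codes and the `!` separator are assumptions for illustration, and the hypothetical `concatenate_codes` helper assumes that the component codes never contain the separator.

```python
def concatenate_codes(codes, use_delimiter=True, delim="!"):
    """Join per-criterion match codes into one code (assumed separator)."""
    return (delim if use_delimiter else "").join(codes)


codes_a = ["MRY", "SMTH"]  # hypothetical codes from two CRITERIA statements
codes_b = ["MRYS", "MTH"]

print(concatenate_codes(codes_a, use_delimiter=False))  # MRYSMTH
print(concatenate_codes(codes_b, use_delimiter=False))  # MRYSMTH (identical)
print(concatenate_codes(codes_a))                       # MRY!SMTH
print(concatenate_codes(codes_b))                       # MRYS!MTH
print(concatenate_codes(codes_a).split("!"))            # ['MRY', 'SMTH']
```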
For the CRITERIA statement in the DQMATCH procedure, the following options
are new or have been changed:
- SENSITIVITY= now has a maximum value of 95, which corresponds to
the maximum sensitivity in dfPower Studio.
- MATCHCODE= enables you to generate more than one match code in
a single pass through the data.
- DELIMSTR= enables you to generate match codes for parsed input
character variables.
- MATCHDEF=, which was previously known as MATCHTYPE=, specifies
the match definition that will be referenced during the creation of match
codes.
- LOCALE= specifies a locale.