Glossary
in SAS data quality, a SAS output data set that provides information about the degree of divergence in specified character values.
a file format for schemes that can be created and applied in data quality software from SAS and from DataFlux (a SAS company). Schemes in Blue Fusion data format are sometimes referred to as BFD schemes. Schemes can also be created in SAS format.
a part of a locale that is referenced during data cleansing to impose a capitalization scheme on a character variable.
to improve the consistency and accuracy of data by standardizing it, reorganizing it, and eliminating redundancy.
in SAS data quality, a set of character values that have the same match code.
a match code that consists of a concatenation of the match codes of values from two or more input character variables in the same observation. A delimiter can be specified to separate the individual match codes in the concatenation.
a match code that consists of a concatenation of match codes that are created for each token in a delimited or parsed string. Within a compound match code, individual match codes might be separated by a delimiter.
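The difference between the two concatenated forms can be illustrated with a small Python sketch. In the sketch, `toy_match_code` is a hypothetical stand-in for a QKB match definition. It uppercases the value, keeps only letters, and drops vowels after the first letter. It is not the algorithm that SAS uses. The delimiters `|` and `~` are also arbitrary choices.

```python
import re

def toy_match_code(value: str) -> str:
    """Hypothetical match definition (NOT the QKB algorithm): uppercase,
    keep letters only, drop vowels (and Y) after the first letter."""
    letters = re.sub(r"[^A-Z]", "", value.upper())
    return letters[:1] + re.sub(r"[AEIOUY]", "", letters[1:])

def composite_match_code(values, delimiter="|"):
    """Composite: one match code per input variable of the same
    observation, concatenated with a delimiter."""
    return delimiter.join(toy_match_code(v) for v in values)

def compound_match_code(value, token_sep=" ", delimiter="~"):
    """Compound: one match code per token of a single delimited string,
    concatenated with a delimiter."""
    return delimiter.join(toy_match_code(t) for t in value.split(token_sep))

# Composite: a name variable and a city variable from one observation.
print(composite_match_code(["Smith", "Cary"]))  # SMTH|CR
# Compound: the tokens of one name value.
print(compound_match_code("John Smith"))        # JHN~SMTH
```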
in SAS data quality, the process of evaluating input data sets to determine whether data cleansing is needed.
the process of eliminating inaccuracies, irregularities, and discrepancies from data.
definitions that are contained in the Quality Knowledge Base for a number of locales. Data definitions specify how categories of data are processed.
the relative value of data, which is based on the accuracy of the knowledge that can be generated using that data. High-quality data is consistent, accurate, and unambiguous, and it can be processed efficiently.
in SAS data quality, a cleansing process that applies a scheme to a specified character variable. The scheme is used to create match codes internally, and those match codes are used to form clusters. All values in each cluster are then transformed to the standardization value that the scheme specifies for that cluster.
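A scheme can be modeled as a lookup table from match code to standardization value. The Python sketch below assumes that model. The names `toy_match_code` and `apply_scheme` and the sample scheme are all hypothetical, and the sketch does not reproduce how SAS stores or applies schemes.

```python
import re

def toy_match_code(value: str) -> str:
    """Hypothetical match definition (NOT the QKB algorithm): uppercase,
    keep letters only, drop vowels (and Y) after the first letter."""
    letters = re.sub(r"[^A-Z]", "", value.upper())
    return letters[:1] + re.sub(r"[AEIOUY]", "", letters[1:])

# Hypothetical scheme: match code of a cluster -> its standardization value.
scheme = {
    toy_match_code("SAS Institute"): "SAS Institute Inc.",
}

def apply_scheme(values, scheme):
    """Replace each value whose match code appears in the scheme with that
    cluster's standardization value; leave all other values unchanged."""
    return [scheme.get(toy_match_code(v), v) for v in values]

print(apply_scheme(["sas institute", "S.A.S. Institute", "DataFlux"], scheme))
# ['SAS Institute Inc.', 'SAS Institute Inc.', 'DataFlux']
```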
a character that separates words or phrases in a text string.
a part of a locale that is referenced during data cleansing to determine the gender of individuals based on the names of those individuals.
a part of a locale that is referenced during the selection of the locale from the locale list that is the best choice for use in the analysis or cleansing of the specified character values.
a part of a locale that is referenced during data analysis or data cleansing to determine categories for specified character values.
a part of the Quality Knowledge Base that provides data definitions for a national language and geographical region. The locale reflects the language, local conventions, and culture for a geographic region. Local conventions can include specific formatting rules for dates, times, and numbers, and a currency symbol for the country or region. Collating sequences, paper sizes, and conventions for postal addresses and telephone numbers are also typically specified for each locale. Some examples of locale values are French_Canada, Portuguese_Brazil, and Chinese_Singapore.
an ordered list of locales that is loaded into memory prior to data analysis or data cleansing. The first locale in the list is the default locale.
a set of values that produce identical match codes or identical match code components. Values that have identical match codes are assigned to the same cluster. See also match code, match code component, and cluster.
an encoded version of a character value that is created as a basis for data analysis and data cleansing. Match codes are used to cluster and compare character values.
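The relationship between match codes and clusters can be sketched by grouping values on their match codes. In this sketch, `toy_match_code` is a hypothetical simplification and not a QKB match definition. Real match codes depend on the locale, the match definition, and the sensitivity.

```python
import re
from collections import defaultdict

def toy_match_code(value: str) -> str:
    """Hypothetical match definition (NOT the QKB algorithm): uppercase,
    keep letters only, drop vowels (and Y) after the first letter."""
    letters = re.sub(r"[^A-Z]", "", value.upper())
    return letters[:1] + re.sub(r"[AEIOUY]", "", letters[1:])

def cluster(values):
    """Group values that share a match code into the same cluster."""
    clusters = defaultdict(list)
    for v in values:
        clusters[toy_match_code(v)].append(v)
    return dict(clusters)

print(cluster(["Smith", "Smyth", "SMITH", "Jones"]))
# {'SMTH': ['Smith', 'Smyth', 'SMITH'], 'JNS': ['Jones']}
```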
a part of a locale that is referenced during the creation of match codes. Each match definition is specific to a category of data content. For example, in the ENUSA locale, match definitions are provided for names, e-mail addresses, and street addresses, among others. See also sensitivity.
a title of respect or a professional title that precedes a first name or an initial. For example, Mr., Mrs., and Dr. are name prefixes.
a part of a name that follows the last name. For example, Jr. and Sr. are name suffixes.
in SAS data quality, a process that inserts a series of delimiters into a character value, as determined by a specified parse definition.
a part of a locale that is referenced during the parsing of character values. The parse definition specifies the number and location of the delimiters that are inserted during parsing. The location of the delimiters depends on the content of the character values. See also token.
a named element that can be assigned a value during parsing. Tokens are assigned values based on the specified parse definition. The value can then be manipulated using the name of the token. See also token.
in SAS data quality, a text string in which a delimiter and a token name have been inserted at the beginning of each token. The string is automatically parsed by referencing a parse definition. See also delimited string.
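The structure of a parsed string can be sketched in Python. The token delimiter `|`, the name separator `=`, and the token names used here are hypothetical. They may not match the delimiter that SAS inserts or the token names that a given parse definition defines. The sketch also assumes that token values never contain either separator.

```python
TOKEN_DELIM = "|"  # hypothetical token delimiter
NAME_SEP = "="     # hypothetical separator between token name and value

def build_parsed_string(tokens):
    """Build a string in which each token is preceded by the delimiter and
    its name. `tokens` is an ordered mapping of token name -> value."""
    return "".join(f"{TOKEN_DELIM}{name}{NAME_SEP}{value}"
                   for name, value in tokens.items())

def get_token(parsed, name):
    """Return the value of the named token, or None if it is absent.
    Assumes values contain neither TOKEN_DELIM nor NAME_SEP."""
    for part in parsed.split(TOKEN_DELIM)[1:]:  # skip the empty leading piece
        token_name, _, value = part.partition(NAME_SEP)
        if token_name == name:
            return value
    return None

parsed = build_parsed_string({"Given Name": "John", "Family Name": "Smith"})
print(parsed)                            # |Given Name=John|Family Name=Smith
print(get_token(parsed, "Family Name"))  # Smith
```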
a collection of locales and other information that is referenced during data analysis and data cleansing. For example, to create match codes for a data set with addresses in Great Britain, you would reference the ADDRESS match definition in the ENGBR locale.
a file format for schemes that can be created and applied in data quality software from SAS and from DataFlux (a SAS company). Schemes in SAS data format are sometimes referred to as ??? schemes.
in SAS data quality, a reusable collection of match codes and standardization values that is applied to input character values for the purposes of transformation or analysis. Schemes can be created in Blue Fusion data format or SAS data format. See also Blue Fusion data format.
in SAS data quality, a value that specifies the amount of information in match codes. Greater sensitivity values result in match codes that contain greater amounts of information. As sensitivity values increase, character values must be increasingly similar to generate the same match codes.
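The effect of sensitivity can be illustrated with a toy match code. Its `sensitivity` argument uses an assumed range of 0 to 100 and controls two things: whether vowels are discarded and how many characters are kept. This is a hypothetical model, not how SAS computes match codes at a given sensitivity. It shows the stated trend that higher sensitivity keeps more information, so values must be more alike to share a code.

```python
import re

def toy_match_code(value: str, sensitivity: int) -> str:
    """Hypothetical sensitivity-aware match code (NOT the SAS algorithm)."""
    letters = re.sub(r"[^A-Z]", "", value.upper())
    if sensitivity < 70:
        # Lower sensitivity discards vowels, so spelling variants collapse.
        letters = letters[:1] + re.sub(r"[AEIOUY]", "", letters[1:])
    # Higher sensitivity also keeps a longer prefix of the value.
    return letters[: 2 + sensitivity // 10]

for s in (50, 90):
    same = toy_match_code("Smith", s) == toy_match_code("Smyth", s)
    print(s, toy_match_code("Smith", s), toy_match_code("Smyth", s), same)
# 50 SMTH SMTH True
# 90 SMITH SMYTH False
```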
a part of a locale that is referenced during data cleansing to impose a specified format on character values.
in SAS data quality, to impose a specified format on character values. A standardization definition is used to standardize the data.
in SAS data quality, a named word or phrase in a parsed or delimited string that can be individually analyzed and cleansed. See also parse token.
in SAS data quality, a process that converts a group of similar data values to the single value that is most commonly present in the group.
in SAS data quality, the most frequently occurring value in a cluster. In data cleansing, this value is propagated to all of the values in the cluster.
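The standardization value can be derived from the data itself. The approach is to cluster the values on a match code, take the most frequent member of each cluster, and propagate it to every value in that cluster. The names `toy_match_code` and `transform` are hypothetical. In this sketch, ties are resolved by `collections.Counter.most_common`, which lists equal counts in the order they were first encountered. SAS may break ties differently.

```python
import re
from collections import Counter, defaultdict

def toy_match_code(value: str) -> str:
    """Hypothetical match definition (NOT the QKB algorithm): uppercase,
    keep letters only, drop vowels (and Y) after the first letter."""
    letters = re.sub(r"[^A-Z]", "", value.upper())
    return letters[:1] + re.sub(r"[AEIOUY]", "", letters[1:])

def transform(values):
    """Replace every value with the most frequent value of its cluster."""
    clusters = defaultdict(list)
    for v in values:
        clusters[toy_match_code(v)].append(v)
    # Standardization value for each cluster: its most frequent member.
    standard = {code: Counter(members).most_common(1)[0][0]
                for code, members in clusters.items()}
    return [standard[toy_match_code(v)] for v in values]

print(transform(["Smith", "Smyth", "Smith", "SMITH", "Jones"]))
# ['Smith', 'Smith', 'Smith', 'Smith', 'Jones']
```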
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.