Data Masking

Overview

Data masking is a method of hiding sensitive data, or personally identifiable information (PII), within data sources. PII is any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another, such as government-issued IDs, can be considered PII. The purpose of data masking is to protect original PII data by using a functional substitute in situations where the audience is not privileged to access the original data. The primary focus of data masking is to protect sensitive data while maintaining integrity of the data so that it remains usable.
Data masking with SAS Federation Server uses a set of rules (rule types) and arguments that are run with a system function, SYSCAT.DM.MASK. Data masking rules consist of individual rule types that define the specific masking action or algorithm to apply to the data. Data masking rules are as follows:
  • ENCRYPT and DECRYPT
  • HASH
  • TRANC (Transliterated Value)
  • RANDOM
  • RANDATE (Random Date)
  • RANSTR, RANDIG (Random String)
These rules are valid for use in FedSQL queries which are applied over literal values or individual columns to hide PII.

About the SYSCAT.DM.MASK Function

The SYSCAT.DM.MASK function accepts defaults configured as package options in addition to the various arguments associated with each rule type. For example, the KEY argument for the ENCRYPT rule defaults to the value configured as the ENCRYPT_KEY package option if a key is not specified in the argument. To configure default masking parameters as package options, use the ALTER SERVER DDL command. The following example sets a default, or pre-configured, encryption key used by the ENCRYPT and HASH rule types:
ALTER SERVER {options PACKAGE(name 'DM',
     ENCRYPT_KEY ’212e8ba6b7f84796a87a985d54277f2f’)}
Configured parameters revert to a static default value if they are dropped.
Note: Column data encrypted with the static default key cannot be reversed with the DECRYPT rule type. If encrypted data will require decryption, you should set a unique encryption key within the data masking argument.
Use the SYSCAT.DM.MASK function with the specified rule types and arguments to mask a value containing PII. The rule type argument must be a string constant. Argument names are not case sensitive. Here is the syntax:
SYSCAT.DM.MASK( 'rule-type', value [, rule-arguments])
Here is the syntax for the rule arguments:
[, 'rule-arg-name1', 'rule-arg-value1',
[, 'rule-arg-name2', 'rule-arg-value2', ...]] )
Here is an example of data masking that uses the HASH rule-type. This rule masks the ‘LastName’ column in the HR.EMPLOYEES table using an HMAC-MD5 hash and aliases the result as LN:
select SYSCAT.DM.MASK('HASH', "LastName", 
                      'alg', 'MD5', 
                      'key', 'abc123!' ) as "LN"
 from HR.EMPLOYEES
Data Masking Arguments and Descriptions
Argument
Description
rule-type
Name of the rule type, for example, ENCRYPT, DECRYPT, or HASH.
value
Specifies the data value to mask, for example, a column reference or other value expression. Value is always the first position following a rule-type in a data masking statement.
rule-arg-name1
Name of the first masking rule argument.
rule-arg-value1
Value of the first masking rule argument.
rule-arg-name2, ...
Name of the second and subsequent masking rule arguments.
rule-arg-value2
Value of the second and subsequent masking rule arguments.

Data Masking Rule Types and Arguments

ENCRYPT / DECRYPT

ENCRYPT masks the values in a column by encrypting or encoding a single value using a symmetric key cipher or simple encoding algorithm. To achieve truly encrypted results, a KEY argument must be specified, or a default ENCRYPT_KEY pre-configured in SYSCAT.DM.MASK.
DECRYPT unmasks previously encrypted values using a symmetric key cipher or a simple decoding algorithm. To unmask previously encrypted or encoded values, the encryption key must be specified using the KEY argument. DECRYPT will not use the encryption key configured in SYSCAT.DM.MASK. The DECRYPT rule returns NULL for the AES algorithm if the KEY argument is omitted.
Here is the syntax for ENCRYPT and DECRYPT:
SYSCAT.DM.MASK( 'ENCRYPT', value [, rule-arguments ] ),
SYSCAT.DM.MASK( 'DECRYPT', value [, rule-arguments ] )
Note: Use HASH if uniqueness, reversal, or decryption is not a requirement.
Unique Results are unique if the input values are unique.
Deterministic Results are deterministic if DETERMINISTIC YES is specified.
Reversible Reversal is possible if key is poor quality or divulged.

Arguments for ENCRYPT and DECRYPT

Use the following values to specify a data masking function with ENCRYPT or DECRYPT:

VALUE

Specifies the data value to mask. Value can be a column or other value expression, and always follows the ENCRYPT or DECRYPT rule-type in a masking statement. Here is an example where the social security number column (“SSN”) is masked with encryption:

select SYSCAT.DM.MASK('ENCRYPT', "SSN",...

ALG

Argument type: Algorithm required. Case-insensitive string constant.

Specifies the algorithm name which is one of the following:
AES/FIPS FIPS 140-2 compliant AES encryption (FIPS 197). Symmetric key block encryption cipher using the AES (Advanced Encryption Standard) algorithm for 128 bit blocks using a 256 bit key and a 64 bit salt value. An encryption key must be specified in order for the algorithm to produce truly encrypted results. Requires installation of DataFlux Secure.
AES FIPS 140-2 compliant AES encryption (FIPS 197). Symmetric key block encryption cipher using the AES (Advanced Encryption Standard) algorithm for 128 bit blocks using a 256 bit key and a 16 bit salt value. An encryption key must be specified in order for the algorithm to produce truly encrypted results. Requires installation of DataFlux Secure.
BASE64 Base 64 encoding. Not FIPS 140-2 compliant.
SAS001 Alias for BASE64 encoding.
SAS002 SAS Proprietary encoding. Not FIPS 140-2 compliant.
SAS003 Alias for AES.
SAS004 Alias for AES/FIPS.

KEY

Argument type: Case-insensitive string constant. Valid only with AES encrypted data. Required for DECRYPT. Not valid for the SAS002 algorithm.

Symmetric encryption/decryption key. The key defaults to the pre-configured ENCRYPT_KEY option in SYSCAT.DM.MASK, or, for ENCRYPT only, the static default when none is configured. You can set a pre-configured encryption key using the ENCRYPT_KEY option in the ALTER SERVER DDL statement.
Note: DECRYPT will not use a pre-configured encryption key. Therefore, you must specify a KEY argument for decryption to work.

DETERMINISTIC

Argument type: Case-insensitive Boolean string constant or 1 or 0.

Use with ENCRYPT only.
Boolean string constant values must contain one of {YES, TRUE, ON, 1, NO, FALSE, OFF, 0 }.
Use this option when deterministic output is required. Controls whether encrypted value is DETERMINISTIC. The default value is FALSE.
Exception: Not valid with the SAS002 algorithm, which does not support a custom key.

EXPAND_PREC

Argument type: Case-insensitive Boolean string constant or 1 or 0.

Use with ENCRYPT only.
Boolean string constant values must contain one of {YES, TRUE, ON, 1, NO, FALSE, OFF, 0 }.
This option causes the precision of the output value to accommodate the possibility of data bloat when using the specified encryption or encoding method. EXPAND_PREC is active by default. If the encrypted value does not fit in the column, this option returns an empty string for VARCHARor NVARCHAR, or all-blank for CHAR or NCHAR.
  • an empty string for VARCHAR or NVARCHAR.
  • a completely padded blank for CHAR or NCHAR.

CASE

Argument type: Case-insensitive string constant.

Use with ENCRYPT only.
Controls case in output. Does not apply to the BASE64 algorithm, which produces upper and lower cased characters.
U[PPER] Use uppercase hexits A-F.
L[OWER] Use lowercase hexits a-f.

STRIP

Argument type: Case-insensitive string constant.

Use with ENCRYPT only.
Specifies whether to strip trailing whitespace characters from the input value prior to encryption. By default, values are not stripped.
Valid values are BLANKS, UNICODESP, UNICODESPACE, ANY, ALL, WS. BLANKS specifies to strip ASCII blank (0x20) characters only. UNICODESP and UNICODESPACE specify to strip Unicode whitespace characters. ANY, ALL, and WS specify to strip Unicode whitespace characters as well as certain format control characters.

HASH

The HASH rule hashes a single value into a fixed-length hash digest or HMAC string and is not reversible. Here is the syntax for HASH:
SYSCAT.DM.MASK( 'HASH', value [, rule-arguments ] )
Unique Results might be unique if the input values are unique.
Deterministic Yes
Reversible No
Algorithm
Value Type
Return Type
MD5
CHAR or VARCHAR
CHAR(32)
WCHAR or WVARCHAR
WCHAR(32)
other
BINARY(16)
SHA256
CHAR or VARCHAR
CHAR(64)
WCHAR or WVARCHAR
WCHAR(64)
other
BINARY(32)

Arguments for HASH

Use the following values to specify a HASH data masking function:

VALUE

Specifies the data value to mask. Value can be a column or other value expression, and always follows the HASH rule-type in a masking statement. Here is an example where “LastName” is the masked value.

select SYSCAT.DM.MASK('HASH', "LastName",...

ALG

Argument type: Case-insensitive string constant.

Required. Specifies the algorithm name:
MD5 Robert Rivest’s 128–bit algorithm (1991).
SHA256* NSA’S 256–bit algorithm (2001). SHA256 is the default.
Note: *SHA256 requires installation of DataFlux Secure software.

CASE

Argument type: Case-insensitive string constant.

Controls case in output. Does not apply to the BASE64 algorithm, which produces upper and lower cased characters.
U[PPER] Use uppercase hexits A-F.
L[OWER] Use lowercase hexits a-f.

KEY

Argument type: Case-insensitive string constant.

By specifying a KEY argument, or defaulting to the ENCRYPT_KEY parameter configured in the package, a ‘hash message authentication code’ (HMAC) that complies with RFC 2104 (http://tools.ietf.org/pdf/rfc2104) is computed. Otherwise, the specified raw digest is computed directly.

TRANC

TRANC masks the values in a column by transliterating characters from the input string to characters in the output string. Ensure that the mapped result contains ‘many-to-1’ character transliterations so that an inverse transliteration does not determine the original value. The TO and FROM strings are case–sensitive and should be lower cased. Here is the syntax for TRANC:
 SYSCAT.DM.MASK ('TRANC', "value" , 
            'FROM', 'lower-case string, 
            'TO', 'lower-case string' )
Unique Data dependent.
Deterministic YES
Reversible Reversal is possible depending on the character mapping provided. Reversal is appropriate when mapping multiple input characters to a single output character.
Here is the syntax for TRANC:
SELECT SYSCAT.DM.MASK('TRANC', TABLE.COLUMN."FIELD", 'FROM', 'characters_to_convert',
 'TO', 'converted_characters',
'START', 1, 'LENGTH ', 11) )

'TRANC', '123', 'FROM', '0123456789','TO', 'XXXXXXXXXX',
   'START', 1, 'LENGTH', 3

Arguments for TRANC

Use the following values to specify a data masking function with TRANC:

VALUE

Specifies the data value to mask. Value can be a column or other value expression, and always follows the TRANC rule-type in a masking statement.

FROM

Argument type: String

Specifies the characters to convert from.

TO

Argument type: String

Specifies the characters to convert to. The TO string must not have more characters than the FROM string. Additional FROM string characters are mapped to blanks.

START

Argument type: String

Specifies the starting position. The default starting position is 1.

LENGTH

Argument type: Integer

Specifies the transliteration length. The default is the length of entire input string.

RANDOM

RANDOM masks the values in a numeric column which results in a uniformly distributed pseudo-random number.
Unique Not guaranteed and dependent on the value range.
Deterministic YES unless NULL constant is specified as the value.
Reversible Not applicable.
Here is the syntax for RANDOM:
SELECT SYSCAT.DM.MASK('RANDOM', '123', 'MIN', 1239, 'MAX', 10000, 'VARY', 10.5,
'KEY', '5c39b18d77d5f297ff92e4942e5522b5')

Arguments for RANDOM

Use the following values to specify a RANDOM data masking function:

VALUE

Specifies the data value to mask. Value can be a column or other value expression, and always follows the RANDOM rule-type in a masking statement.

SEED

Argument type: 64-bit signed integer

Initial integer seed. If omitted, a default seed is supplied from the RANDOM_SEED package configuration parameter. You can set a RANDOM_SEED using the ALTER SERVER .

MIN

Argument type: Numeric

MIN specifies the minimum value and accepts either NULL or NOT NULL values. Either MIN and MAX or VARY is required in the argument.

MAX

Argument type: Numeric

MAX specifies the maximum value and accepts either NULL or NOT NULL values. Either MIN and MAX or VARY is required in the argument.

VARY

Argument type: Numeric

Vary original value by +/- variance of the amount. VARY requires a NOT NULL value. Mutually exclusive of MIN and MAX parameters. Therefore, VARY is required if MIN or MAX is not used.

KEY

Argument type: String

Specifies a secret key that is used to produce an HMAC internally from which the pseudo-random result can be computed deterministically based on the value. A quality key is necessary to prevent discovery of the value using a rainbow table attack. KEY is used in conjunction with the input value to compute a cryptographic hash message authentication code (a derived key) from which to compute the pseudo-random output. KEY works with MIN or MAX and VARY.
Note: Use KEY with a non-NULL argument to produce deterministic results. The KEY value is combined with the value passed to the function to produce deterministic values. Use SEED with a NULL argument to produce non-deterministic results. Either KEY or SEED is used, but not both. If both are specified one will be ignored, depending on whether the value is NULL.

RANDATE

RANDATE masks the values in a date column by replacing them with pseudo-random date values.
Unique Not guaranteed and dependent on the date range.
Deterministic YES unless NULL constant is specified as the value.
Reversible No.

Arguments for RANDATE

Use the following values to specify a data masking function with RANDATE:

VALUE

Specifies the data value to mask. Value can be a column or other value expression, and always follows the RANDATE rule-type in a masking statement.

SEED

Argument type: 64-bit signed integer

Initial integer seed. If omitted, a default seed is supplied from the RANDOM_SEED package configuration parameter. You can set a RANDOM_SEED using the ALTER SERVER .

VARY

Argument type: Numeric

Vary original value by +/- variance of the amount. Mutually exclusive of MIN and MAX parameters.

U[NITS]

Argument type: String

Specifies the variance units. Numeric types are treated as a SAS date value. The possible units are:
DAY, D
WEEK, WK, W
MONTH, MO
YEAR, YR, Y
HOUR, HR, H
MINUTE, MIN, M
SECOND, SEC, S
Note: Random date variances include DATE, TIME, and TIMESTAMP column types. The default unit for TIME or TIMESTAMP data types is HOUR, and MONTH for others.

KEY

Argument type: String

Specifies a secret key that is used to produce an HMAC internally from which the pseudo-random result can be computed deterministically based on the value. A quality key is necessary to prevent discovery of the value using a rainbow table attack. KEY is used in conjunction with the input value to compute a cryptographic hash message authentication code (a derived key) from which to compute the pseudo-random output.
Note: Use KEY with a non-NULL argument to produce deterministic results. The KEY value is combined with the value passed to the function to produce deterministic values. Use SEED with a NULL argument to produce non-deterministic results. Either KEY or SEED is used, but not both. If both are specified one will be ignored, depending on whether the value is NULL.

RANSTR

RANSTR masks the values in a column by replacing with random strings. Strings are generated by an algorithm that uses characters from the source string in the generation process, adding padding characters if necessary. Padding is placed to the left of the string unless RIGHT is specified. The minimum number of non-pad characters is MINPREC, and the maximum is MAXPREC. The value passed to RANSTR is used only to ensure deterministic results.
Use the following values to define a data masking function with RANSTR:
Type WCHAR(MAXPREC) if padding is off. WVARCHAR(MAXPREC) if padding is on.
Unique Not guaranteed but more so as MINPREC is increased.
Deterministic YES unless NULL constant is specified as the value.
Reversible Not applicable.
Here is the syntax for RANSTR:
SELECT SYSCAT.DM.MASK('RANSTR', NULL, 'MINPREC', 10, 'SOURCE', 'Hello World')

Arguments for RANSTR

Use the following values to specify a data masking function using RANSTR:

VALUE

Specifies the data value to mask. Value can be a column or other value expression. Value always follows the RANSTR rule-type in a masking statement.

SEED

Argument type: 64-bit signed integer

Specifies the initial seed.

MINPREC

Argument type: Integer

Specifies the minimum string precision. The default minimum precision is 0.

MAXPREC

Argument type: Integer

Specifies the maximum string precision. By default, MAXPREC=MINPREC.

SOURCE

Argument type: String

Specifies the characters from which to create the random string. The source value can be a column reference.

PAD

Argument type: String

Specifies a PAD character or NULL. If NULL is specified, no padding is added to the generated string. PAD uses ' ' as the default character if a character is not specified.

RIGHT

Argument type: Boolean

Specifies that pad is on the right side of the generated string.

KEY

Argument type: String

Specifies a secret key that is used to produce an HMAC internally from which the pseudo-random result can be computed deterministically based on the value. A quality key is necessary to prevent discovery of the value using a rainbow table attack. KEY is used in conjunction with the input value to compute a cryptographic hash message authentication code (a derived key) from which to compute the pseudo-random output.
Note: Use KEY with a non-NULL argument to produce deterministic results. The KEY value is combined with the value passed to the function to produce deterministic values. Use SEED with a NULL argument to produce non-deterministic results. Either KEY or SEED is used, but not both. If both are specified one will be ignored, depending on whether the value is NULL.

RANDIG

RANDIG masks the numeric values in a column by replacing digits them with strings of random digits. Strings are generated by an algorithm that uses digits derived from the base number system of the source value, adding padding digits if necessary. RANDIG is an alias of RANSTR with the following constraints:
  • Padding is always to the left of digits.
  • Pad character defaults to ‘0’ for bases other than 64, and ’’ for base 64.
  • SOURCE is implicit and derived from the value of BASE.
Here is the syntax for RANDIG:
SELECT SYSCAT.DM.MASK ('RANDIG', NULL, 'SEED', 100, 'BASE', 2)

Arguments for RANDIG

Use the following values to specify a data masking function with RANDIG.
RANDIG Only Parameters:

VALUE

Specifies the data value to mask. Value can be a column or other value expression, and always follows the RANDIG rule-type in a masking statement.

SEED

Argument type: 64-bit signed integer

Specifies the initial seed. Use with a NULL argument to produce non-deterministic results.

MINPREC

Argument type: Integer

Specifies the minimum string precision. The default minimum precision is 0.

MAXPREC

Argument type: Integer

Specifies the maximum string precision. By default, MAXPREC=MINPREC.

BASE

Argument type: Integer

2 Binary
8 Octal
10 Decimal (default)
16 Hexadecimal
64 Base 64

CASE

Argument type: String

U[PPER] Use uppercase hexits A-F (default).
L[OWER] Use lowercase hexits a-f.
The CASE option is ignored for bases other than 16.

KEY

Argument type: String

Specifies a secret key that is used to produce an HMAC internally from which the pseudo-random result can be computed deterministically based on the value. A quality key is necessary to prevent discovery of the value using a rainbow table attack. KEY is used in conjunction with the input value to compute a cryptographic hash message authentication code (a derived key) from which to compute the pseudo-random output.
Note: Use KEY with a non-NULL argument to produce deterministic results. The KEY value is combined with the value passed to the function to produce deterministic values. Use SEED with a NULL argument to produce non-deterministic results. Either KEY or SEED is used, but not both. If both are specified one will be ignored, depending on whether the value is NULL.
Last updated: March 6, 2018