How FedSQL Processes Nulls and SAS Missing Values

FedSQL Modes for Nonexistent Data

Nonexistent data is represented by a SAS missing value in SAS data sets and SPD Engine data sets. For all other data sources, nonexistent data is represented by an ANSI SQL null value. The SAS missing value indicators, . , ._, .A - .Z, and '␣' are known values that indicate nonexistent data. Table data with an ANSI null has no real data value; it is metadata that indicates an unknown value.
Because there are significant differences in processing null values and SAS missing values, FedSQL has two modes for processing nonexistent data: the ANSI SQL null mode (ANSI mode) and the SAS missing value mode (SAS mode).
The behavior of nonexistent data depends on how you connect to the data source:
  • By default, a client application that connects to the data source via a client-side driver, such as JDBC or ODBC, processes data using ANSI mode.
  • By default, a Base SAS session that submits PROC FEDSQL processes data using SAS mode. PROC FEDSQL provides the ANSIMODE option in order to process data in ANSI mode.
In most instances, no mode change is necessary to process nonexistent data. The following are instances of when you might want to change the mode:
  • when a client application processes SAS data sets or SPD Engine data sets and the mode for nonexistent data is in ANSI mode
  • when the processing of SAS data sets or SPD Engine data sets is complete and the client application is ready to return to ANSI mode
CAUTION:
If the mode is not set for the desired results, data is lost.
In ANSI mode, when FedSQL reads a numeric SAS missing data value from the data source, it converts it to a data type of DOUBLE. If the SAS missing data value is a special missing value, such as .A, the .A is lost when it is converted to a null. When a null value is written to a SAS data set or an SPD Engine data set, FedSQL converts it to a SAS missing value, which is a period ( . ). In SAS mode, when a null of type CHAR is read from the data source, FedSQL converts it to a blank character. When the blank character is stored in a SAS data set or an SPD Engine data set, that value can no longer be interpreted as an unknown value in ANSI mode.
For information about how to set FedSQL in ANSI mode and SAS mode, see the SAS documentation for your client environment.

Differences between Processing Null Values and SAS Missing Values

Processing SAS missing values is different from processing null values and has significant implications in these situations:
  • when filtering data (for example, in a WHERE clause, a HAVING clause, or an outer join ON clause). SAS mode interprets null values as SAS missing values, which are known values, whereas ANSI mode interprets null values as unknown values.
  • when submitting outer joins in ANSI mode, internal processing might generate nulls for intermediate result sets. FedSQL might generate SAS missing values in SAS mode for intermediate result sets. Therefore, for intermediate result sets, nulls are interpreted as unknown values in ANSI mode and in SAS mode, missing values are interpreted as known values.
  • when comparing a blank character. SAS mode interprets the blank character as a missing value. In ANSI mode, a blank character is a blank character; it has no special meaning.
The following are attribute and behavior differences between null values and SAS missing values:
Attribute and Behavior Differences between Null Values and SAS Missing Values
Attribute or Behavior
Null Values
SAS Missing Values
internal representation
metadata
floating point or character
evaluation by logical operators
is an unknown value that is compared by using three-valued logic, whose resolved values are True, False, and Unknown. For example, WHERE col1 = null returns UNKNOWN.
is a known value that, when compared, resolves to a Boolean result
collating sequence order
appears as the smallest value
appears as the smallest value
For information about the results of logical operations on null values, see <search-condition>.

Reading and Writing Nonexistent Data in ANSI Mode

Many relational databases such as Oracle and DB2 implement ANSI SQL null values. Therefore, the concept of null values using FedSQL is the same as using the SQL language for databases that support ANSI SQL. It is important to understand how FedSQL processes SAS missing values because data can be lost.
SAS missing value data types can be only DOUBLE or CHAR. Therefore, only the conversion for these data types is shown. The following table shows the value that is returned to the client application when FedSQL reads a null value or a SAS missing value from a data source in ANSI mode:
Reading Nonexistent Data Values in ANSI Mode
Column Data Type
Nonexistent Data Value
Value Returned to the Application
DOUBLE
., ._, or .A - .Z
null
DOUBLE
null
null
CHAR
'␣'
'␣'
CHAR
null
null
Note: The value '␣' is a blank space between single quotation marks, which, in ANSI mode, is a blank space, not nonexistent data.
This next table shows the value that is stored when nonexistent data values are written to data sources in ANSI mode:
Writing Nonexistent Data Values in ANSI Mode
Column Data Type
Nonexistent Data Value
Value Stored in a SAS Data Set or an SPD Engine Data Set
Value Stored in the ANSI SQL Null Supported Data Source
DOUBLE
., ._, or .A – .Z
.
null
DOUBLE
null
.
null
CHAR
'␣'
'␣'
'␣'
CHAR
null
'␣'
null
Note: The value '␣' is a blank space between single quotation marks, which, in ANSI mode, is a blank space, not nonexistent data.

Reading and Writing Nonexistent Data in SAS Mode

When the client application uses the SAS mode, nonexistent data values are treated like SAS missing values in the Base SAS environment.
The following table shows how nonexistent data values of data type DOUBLE and CHAR are read in SAS mode:
Reading Nonexistent Data Values in SAS Mode
Column Data Type
Nonexistent Data Value
Value Returned to the Application
DOUBLE
., ._, or .A –.Z
., ._, or .A - .Z
DOUBLE
null
.
CHAR
'␣'
'␣'
CHAR
null
'␣'
Note: The value '␣' is a blank space between single quotation marks, which, in SAS mode, is nonexistent data.
The next table shows how nonexistent data values are written to a data source in SAS mode:
Writing Nonexistent Data Values in SAS Mode
Column Data Type
Nonexistent Data Value
Value Stored in a SAS Data Set or an SPD Engine Data Set
Value Stored in the ANSI SQL Null Supported Data Source
DOUBLE
., ._, or .A - .Z
., ._, or .A - .Z
null
DOUBLE
null
.
null
CHAR
'␣'
'␣'
'␣'
CHAR
null
'␣'
'␣'
Note: The value '␣' is a blank space between single quotation marks, which, in SAS mode, is nonexistent data.

Testing and Modifying Nulls and SAS Missing Values

FedSQL provides the IFNULL function to test for a null value and the NULLIF expression to change a null value.
The IFNULL function takes two expressions as arguments. If the first expression is a null value, it returns the second expression. Otherwise, the function returns the first value:
IFNULL(expression, return_value_if_null_expression)
If the value of expression is null, the function returns the value of return_value_if_null_expression.
In this example, all book names are returned for the books that have an unknown value of numCopies:
select bookName when ifnull(numCopies, 'T') = 'T';
The NULLIF expression also takes two expressions as arguments. If the two expressions are equal, the value that is returned is a null value. Otherwise, the value that is returned is the first SQL expression:
NULLIF(expression, test_value_expression);
Here, if the value of numCopies is a negative value, –1, it is replaced with a null to indicate an unknown value:
update books set numCopies = nullif(numCopies, –1);
For more information, see IFNULL Function and NULLIF Expression.