SELECT Statement

Retrieves columns and rows of data from tables.

Categories:	Data Definition
Categories:	Data Manipulation
Supports:	EXECUTE Statement
Data source:	SAS data set, SPD Engine data set, SPD Server table, Aster, DB2 under UNIX and PC, Greenplum, HAWQ, HDMD, Hive, Impala, MDS, MySQL, Netezza, ODBC, Oracle, PostgreSQL, SAP, SAP HANA, Sybase IQ, Teradata

Syntax

Syntax

The main clauses of the SELECT statement can be summarized as follows.

SELECT <select-list>

FROM <table-specification>

[WHERE <search-condition>]

[GROUP BY <grouping-column>]

[HAVING <search-condition>]

[ORDER BY <sort-specification>]

[LIMIT {count | ALL}]

[OFFSET number]

;

The detailed syntax of the SELECT statement is as follows.

<query-expression>

[ORDER BY <sort-specification> [, ...<sort-specification>]];

<query-expression>::=

{<query-specification> | <query-expression>}

{UNION [ALL] | EXCEPT | INTERSECT} {<query-specification> | <query-expression>}

<query-specification>::=

SELECT [ALL | DISTINCT] <select-list> <table-expression>

<select-list>::=

| column [AS column-alias]

| expression [AS column-alias]

| table.*

| table-alias.*

| table.column [AS column-alias]

<table-expression>::=

FROM <table-specification> [, ...<table-specification>]

[WHERE <search-condition>]

[GROUP BY <grouping-column> [, ...<grouping-column>]]

[HAVING <search-condition>]

<table-specification>::=

table [[AS] alias]

| CONNECTION TO catalog (<native-syntax>) [[AS] table-alias]

| (<query-specification>) [AS] alias

| <joined-table>

<joined-table>::=

<cross-join>

| <qualified-join>

| <natural-join>

<cross-join>::=

<table-specification> CROSS JOIN <table-specification>

<qualified-join>::=

<table-specification> [<join-type>] JOIN <table-specification> <join-specification>

<natural-join>::=

<table-specification> NATURAL [<join-type>] JOIN <table-specification>

<join-type>::=

INNER

| LEFT [OUTER]

| RIGHT [OUTER]

| FULL [OUTER]

<join-specification>::=

ON <search-condition>

| USING (column [, ...column])

<search-condition>::=

{

[NOT] {<sql-expression> | (<search-condition>)}

[{AND | OR} [NOT] {<sql-expression> | (<search-condition>)}]

}

[,... {[NOT] {<sql-expression> | (<search-condition>)}
[{AND | OR} [NOT] {<sql-expression> | (<search-condition>)}]}]

<sql-expression>::=

expression {operator | predicate} expression

<sort-specification>::=

{order-by-expression [ASC | DESC]} [, ...order-by-expression [ASC | DESC]]

<grouping-column>::=

column [, ...column]

| column-position-number

| <sql-expression>

Arguments

See the following sections for syntax argument descriptions.

SELECT Clause
FROM Clause
WHERE Clause
GROUP BY Clause
HAVING Clause
ORDER BY Clause
UNION Operator
LIMIT Clause
OFFSET Clause
EXCEPT Operator
INTERSECT Operator

Details

Overview

The SELECT statement can be used in two ways.

The single row SELECT statement, which can be executed by itself, returns only one row. For example:
```
select 42;
select 42 as x;
```
The first code fragment returns a single column that contains the value 42. The column is named “column”. The second code fragment returns a similar column. However, the column is named “x”.
A query specification begins with the SELECT keyword (called a SELECT clause) and cannot be used by itself. It reads column values from one or more tables and enables you to define conditions for the data that will be returned from the tables. It must be used as a part of another SQL statement and can return more than one row. A query specification creates a virtual table. For example:
```
select column(s)
from table(s)
where condition(s);
```

The order of clauses in the SELECT statement is important. The optional clauses can be omitted but, when used, they must appear in the appropriate order. A SELECT statement can be specified within a SELECT statement (called a subquery). The ORDER BY, OFFSET, and LIMIT clauses can be used only on the outermost SELECT of a SELECT statement.

A view can be specified in the SELECT statement wherever a table can be specified. FedSQL supports native DBMS views and FedSQL views. It does not support PROC SQL views.

Note: There is no limit on the number of tables that you can reference in a FedSQL query. However, queries with a large number of table references can cause performance issues.

SELECT Clause

Description

Lists the columns that will appear in a virtual result table.

Syntax

SELECT [ALL | DISTINCT] <select-list>

<select-list>::=

| column [AS column-alias]

| <query-specification>

| <sql-expression> [AS column-alias]

| table.*

| table-alias.*

| table.column [AS column-alias]

Arguments

ALL

includes all rows, including duplicate rows in the result table.

DISTINCT

eliminates duplicate rows in the result table.

<select-list>

specifies the columns to be selected for the result table.

*

selects all columns in the table that is listed in the FROM clause.

column-alias

assigns a temporary, alternate name to the column.

column [AS column-alias]

selects a single column. When [AS column-alias] is specified, assigns the column alias to the column.

<query-specification>

specifies an embedded SELECT subquery.

See	Overview of Subqueries

<sql-expression> [AS column-alias]

derives a column name from an expression.

See	<sql-expression>

table.*

selects all columns in the table.

table-alias.*

selects all columns in the table.

See	Table Aliases

table.column [AS column-alias]

selects a single column from the specified table.

Asterisk (*) Notation

The asterisk (*) represents all columns of the table or tables that are listed in the FROM clause. When an asterisk is not prefixed with a table name, all the columns from all tables in the FROM clause are included; when it is prefixed (for example, table.* or table-alias.*), all the columns from only that table are included.

Column Aliases

A column alias is a temporary, alternate name for a column. Aliases are specified in the SELECT clause to name or rename columns in the result table in order to be clearer or easier to read. Aliases are often used to name a column that is the result of an arithmetic expression or summary function. An alias is one word only.

The keyword AS is required to distinguish a column alias from other column names.

Column aliases are optional, and each column name in the SELECT clause can have an alias. After you assign an alias to a column, you can use the alias to refer to that column in other clauses.

FROM Clause

Description

(Optional) specifies source tables.

Syntax

FROM <table-specification> [, ...<table-specification>]

<table-specification>::=

table [[AS] table-alias]

| CONNECTION TO catalog (<native-syntax>) [[AS] alias]

| (<query-specification>) [AS] alias

| <joined-table>

<joined-table>::=

<cross-join>

| <qualified-join>

| <natural-join>

<cross-join>::=

<table-specification> CROSS JOIN <table-specification>

<qualified-join>::=

<table-specification> [<join-type>] JOIN <table-specification> <join-specification>

<natural-join>::=

<table-specification> NATURAL [<join-type>] JOIN <table-specification>

<join-type>::=

INNER

| LEFT [OUTER]

| RIGHT [OUTER]

| FULL [OUTER]

<join-specification>::=

ON <search-condition>

| USING (column [, ...column])

Arguments

CONNECTION TO catalog (<native-syntax>) [[AS] alias]

specifies data from a DBMS catalog by using the SQL pass-through facility. You can use SQL syntax that the DBMS understands, even if that syntax is not valid in FedSQL. For more information, see FedSQL Explicit Pass-Through Facility.

CROSS JOIN

defines a join that is the Cartesian product of two tables.

See	Cross Joins

JOIN

defines a join that enables you to filter the data by using a search condition or by using specific columns.

See	Qualified Joins

NATURAL JOIN

defines a join that selects rows from two tables that have equal values in columns that share the same name and the same type.

See	Natural Joins

(<query-specification>) [AS] alias

specifies an embedded SELECT subquery that functions as an in-line view. alias defines a temporary name for the in-line view and is required. An in-line view saves you a programming step. Rather than creating a view and referring to it in another query, you can specify the view in-line in the FROM clause.

See	Overview of Subqueries

table

specifies the name of a table. The name can be in any of the forms described for table.

table-alias

specifies a temporary, alternate name for table. The AS keyword is optional.

INNER

specifies that only the subset of rows from the first table that matches rows from the second table are returned. Unmatched rows from both tables are discarded.

LEFT [OUTER]

specifies that matching rows and rows from the first table that do not match any row in the second table are returned.

RIGHT [OUTER]

specifies that matching rows and rows from the second table that do not match any row in the first table are returned.

FULL [OUTER]

species that all matching and unmatching rows from the first and second table are returned.

column

specifies the name of a column.

ON <search-condition>

specifies a condition join used to match rows from one table to another. If the search condition is satisfied, the matching rows are added to the result table.

See	<search-condition>

USING (column [,...column])

specifies which columns to use in an inner or outer join.

See	ON and USING Clauses

Overview

The FROM clause enables you to specify source tables. You can reference tables by specifying their table name, by submitting a query to a DBMS, by specifying an embedded SELECT subquery, or by specifying a join.

Table Aliases

A table alias is a temporary, alternate name for a table. Table aliases are used in joins to distinguish the columns of one table from those in the other table or tables and can make a query easier to read by abbreviating the table names. A table name or alias must be prefixed to a column name when you are joining tables that have matching column names. Column names in reflexive joins (joining a table with itself) must be prefixed with a table alias in order to distinguish which copy of the table the column comes from. A table alias cannot be given an alias.

Joined Tables

When multiple table specifications are listed in the FROM clause, they are processed to form one table. The result table contains data from each contributing table. These queries are referred to as joins. Joins do not alter the original table.

Conceptually, when two tables are specified, each row of table A is matched with all the rows of table B to produce an internal or intermediate table. The number of rows in the intermediate table (Cartesian ) is equal to the product of the number of rows in each of the source tables. The intermediate table becomes the input to the rest of the query in which some of its rows can be eliminated by the WHERE, ON, or USING clause or summarized by a function.

For an overview of FedSQL join operations, see Join Operations.

Specifying the Rows to Be Returned

The WHERE, ON, and USING clauses contain the conditions under which the rows in the Cartesian product are kept or eliminated in the result table. WHERE is used to select rows from inner joins. ON is used to select rows from inner or outer joins. USING is used to select specific columns to be included in the join. The condition is evaluated for each row from each table in the intermediate table described in Joined Tables. The row is considered to be a match if the result of the expression is true (a nonzero, nonmissing, or null value) for that row.

Simple Joins

The most basic type of join is simply a list of multiple tables, separated by commas, in the FROM clause of a SELECT statement. The following query joins the two tables GrainProducts and Sales that are shown in Tables Used in Examples.

/* FedSQL code for simple join */
proc fedsql;
   title 'Simple Join - GrainProducts and Sales';
   select * from grainproducts, sales;
quit;

Simple Join - GrainProducts and Sales Table

Joining tables in this way returns the Cartesian of the tables. Each row from the first table is combined with every row of the second table. The number of rows in the result table is equal to the number of rows in the first table multiplied by the number of rows in the second table.

The Cartesian product of a simple join can result in large, meaningless tables. You can subset a simple join by using a WHERE clause. This type of simple join is known as an equijoin. The following query subsets the previous table by matching the ID columns and creates the table shown in Equijoin - GrainProducts and Sales Table.

/* FedSQL code for equijoin */
proc fedsql;
   title 'Equijoin - GrainProducts and Sales';
   select * from grainproducts, sales
      where grainproducts.prodid=sales.prodid;
quit;

In an equijoin, the comparison has to be an equality comparison. Multiple match criteria (not shown here) can be specified by using the AND operator. When multiple match criteria are specified, only rows meeting all of the equality tests are returned.

Equijoin - GrainProducts and Sales Table

Cross Joins

The cross join functions the same as a simple join; it returns the product of two tables. Like a Cartesian product, a cross join's output can be limited by a WHERE clause.

The following queries produce the same result.

select * from grainproducts, sales;

select * from grainproducts cross join sales;

Note: Do not use an ON clause with a cross join. An ON clause causes a cross join to fail. However, you can use a WHERE clause to subset the output.

Qualified Joins

Qualified joins provide an easier way to control which rows appear in the result table. You can also further subset the result table with the ON or USING clause.

The two types of qualified joins are inner and outer.

Inner Joins

Inner Join Diagram

An inner join returns a result table for all the rows in one table that have one or more matching rows in another table. Using the GrainProducts and Sales tables, the following query matches the product ID columns of the two tables and creates the result table shown in Inner Join - GrainProducts and Sales Table.

proc fedsql;
   title 'Inner Join - GrainProducts and Sales';
   select *
      from grainproducts inner join sales
         on grainproducts.prodid=sales.prodid;
quit;

Inner Join - GrainProducts and Sales Table

You can use the ON or USING clause instead of the WHERE clause to specify the column or columns on which you are joining the tables. However, you can continue to use the WHERE clause to subset the query result.

Note that an inner join with an ON or USING clause can provide the same functionality as listing tables in the FROM clause and specifying join columns with a WHERE clause (an equijoin). For example, these two sets of code use the inner join construction.

select *
   from grainproducts inner join sales
      on grainproducts.prodid=sales.prodid;

select *
   from grainproducts inner join sales
      using (prodid);

This code produces the same output as the previous code but uses the inner join construction.

select *
   from grainproducts, sales
      where grainproducts.prodid=sales.prodid;

Outer Joins

Outer joins are inner joins that have been augmented with rows from one table that do not match with any row from the other table in the join. The result table includes rows that match and rows that do not match from the join's source tables. Nonmatching rows have null or missing values in the columns from the unmatched table. You can use the ON or USING clause instead of the WHERE clause to specify the column or columns on which you are joining the tables. However, you can continue to use the WHERE clause to subset the query result.

The three types of outer joins are left, right, and full.

Left Outer Joins

Left Outer Join Diagram

A left outer join lists matching rows and rows from the first table listed in the FROM clause that do not match any row in the second table listed in the FROM clause. Using the GrainProducts and Sales tables, the following code creates a table with matching rows from the GrainProducts and Sales tables and the unmatched rows from the GrainProducts table. Note that missing values are shown for Sales table data in the unmatched row from the GrainProducts table.

title 'Left Outer Join - GrainProducts and Sales';
select *
   from grainproducts left outer join sales
      on grainproducts.prodid=sales.prodid;

Left Outer Join - GrainProducts and Sales Table

Right Outer Joins

Right Outer Join Diagram

A right outer join lists matching rows and rows from the second table listed in the FROM clause that do not match any row in the first table listed in the FROM clause. Using the GrainProducts and Sales tables, the following code creates a table with matching rows from the GrainProducts and Sales tables and the unmatched rows from the Sales table. Note that missing values are shown for GrainProducts table data in the unmatched row from the Sales table.

title 'Right Outer Join - GrainProducts and Sales';
select *
   from grainproducts right outer join sales
      on grainproducts.prodid=sales.prodid;

Right Outer Join - GrainProducts and Sales Table

Full Outer Joins

Full Outer Join Diagram

A full outer join combines the left outer join and the right outer join. The result table contains both the matching and unmatching rows from the left and right tables. Using the GrainProducts and Sales tables, the following code creates a table with matching rows from the GrainProducts and Sales tables and the unmatched rows from the GrainProducts and Sales tables. Note that missing values are shown for data in the unmatched rows.

title 'Full Outer Join - GrainProducts and Sales';
select *
   from grainproducts full outer join sales
      on grainproducts.prodid=sales.prodid;

Full Outer Join - GrainProducts and Sales Table

ON and USING Clauses

You can use an ON clause with an expression that specifies a condition on which the join is based. The conditional expression can contain any predicate, although column names and comparison operators are most often used. The ON clause with an inner join is equivalent to a WHERE clause. The ON clause with an outer join (left, right, or full) is different from a WHERE clause. The ON clause with an outer join filters the rows and then includes the nonmatched rows with the null or missing values.

You can use a USING clause to specify one of two columns to include in the result table.

The difference between the ON clause and the USING clause is that you can use any conditional expression with the ON clause. The USING clause always implies an equality between the column names. For example, this ON clause eliminates

United
                           States

from the results table.

title 'Inner Join - GrainProducts and Sales Outside US';
select *
   from grainproducts inner join sales
      on sales.country <> 'United States'
         AND grainproducts.prodid=sales.prodid;

Inner Join - GrainProducts and Sales outside the US

Natural Joins

A natural join selects rows from two tables that have equal values in columns that share the same name and the same type. An error results if two columns have the same name but different types. You can perform an inner, left, right, or full natural join. If join-type is omitted when specifying a natural join, then INNER is implied. If like columns are not found, then a cross join is performed. You can use a WHERE clause to limit the output.

Using the GrainProducts and Sales tables, the following code performs a natural left outer join.

title 'Natural Left Outer Join - GrainProducts and Sales';
select *
   from grainproducts natural left outer join sales;

Natural Left Outer Join - GrainProducts and Sales Table

Notice that the prodid column appears only once in the result table.

Note: Do not use an ON clause with a natural join. An ON clause causes a natural join to fail. When using a natural join, an ON clause is implied, matching all like columns.

WHERE Clause

Description

Subsets the result table based on the specified search conditions.

Syntax

WHERE <search-condition>

Arguments

<search-condition>

specifies the conditions for the rows returned by the WHERE clause.

See	<search-condition>

Details

The WHERE clause requires a search condition (one or more expressions separated by an operand or predicate) that specifies which rows are chosen for inclusion in the result table. When a condition is met (that is, the condition resolves to true), those rows are displayed in the result table. Otherwise, no rows are displayed.

Note: You cannot use aggregate functions that specify only one column. For example, you cannot use the following code.

where max(inventory1)>10000;

However, you can use this WHERE clause.

where max(inventory1, inventory2)>10000;

Note: If a column contains REAL or DOUBLE values, avoid using a WHERE clause with the = and the <> operators. REAL and DOUBLE values are approximate numeric data types and can give inaccurate results when used in a WHERE clause with the = and the <> operators. You should limit REAL and DOUBLE columns to comparisons with the > or < operator.

GROUP BY Clause

Description

Specifies how to group the data for summarizing.

Syntax

GROUP BY <grouping-column> [, ...<grouping-column>]

<grouping-column>::=

column [, ...column]

| column-position-number

| <sql-expression>

Arguments

column

specifies the name of a column or a column alias.

column-position-number

specifies a nonnegative integer that equates to a column position.

<sql-expression>

specifies a valid SQL expression.

See	<sql-expression>

Details

The GROUP BY clause groups data by a specified column or columns.

If the column or columns on which you are grouping contain missing or null values in some rows, SAS collects all the rows with missing or null values in the grouping columns into a single group.

You can specify more than one grouping column to get more detailed reports. If more than one grouping column is specified, then the first one determines the major grouping.

Integers can be substituted for column names in the GROUP BY clause. For example, if the grouping column is 2, then the results are grouped by values in the second column. Note that if you use a floating-point value (for example, 2.3) instead of an integer, then FedSQL ignores the decimal portion.

You can group the output by the values that are returned by an expression. For example, if X is a numeric variable, then the output of the following is grouped by the values of X.

select x, sum(y)
   from table1
   group by x;

Similarly, if Y is a character variable, then the output of the following is grouped by the values of Y.

select sum(x), y
   from table1
   group by y;

When you use a GROUP BY clause, you can also use an aggregate function in the SELECT clause or in a HAVING clause to instruct SAS in how to summarize the data for each group. When you use a GROUP BY clause without an aggregate function, SAS treats the GROUP BY clause as if it were an ORDER BY clause.

You can use the ORDER BY clause to specify the order in which rows are displayed in the result table. If you do not specify the ORDER BY clause, groups returned by the GROUP BY clause are not in any particular order.

Note: For an SPD Engine data set, utility files are used for certain operations that need extra space. The GROUP BY clause requires a utility file and the Base SAS UTILLOC system option allocates space for that utility file. For more information, see SAS Scalable Performance Data Engine: Reference.

Note: FedSQL does not support remerging of summary statistics.

HAVING Clause

Description

Subsets grouped data based on specified search conditions.

Syntax

HAVING <search-condition>

Arguments

<search-condition>

specifies the conditions for the rows returned by the HAVING clause.

See	<search-condition>

Details

The HAVING clause requires a search condition (one or more expressions separated by an operand or predicate) that specifies which rows are chosen for inclusion in the result table. A HAVING clause evaluates as either true or false for each group in a query. You can use a HAVING clause with a GROUP BY clause to filter grouped data. The HAVING clause affects groups in a way that is similar to how a WHERE clause affects individual rows.

Queries that contain a HAVING clause usually also contain a GROUP BY clause, an aggregate function, or both. When you use a HAVING clause without a GROUP BY clause, SAS treats the HAVING clause as if it were a WHERE clause.

Differences between the HAVING Clause and WHERE Clause
HAVING clause attributes	WHERE clause attributes
typically used to specify conditions for including or excluding groups of rows from a table	used to specify conditions for including or excluding individual rows from a table
must follow the GROUP BY clause in a query, if used with a GROUP BY clause	must precede the GROUP BY clause in a query, if used with a GROUP BY clause
affected by a GROUP BY clause; when there is no GROUP BY clause, the HAVING clause is treated like a WHERE clause	not affected by a GROUP BY clause
processed after the GROUP BY clause and any aggregate functions	processed before a GROUP BY clause, if there is one, and before any aggregate functions

ORDER BY Clause

Description

Specifies the order in which rows are returned in a result table.

Syntax

ORDER BY <sort-specification> [, ...<sort-specification>];

<sort-specification>::=

{order-by-expression [ASC | DESC]} [, ...order-by-expression [ASC | DESC]]

| {order-by-expression [COLLATE collating-sequence-options]}
[, ...order-by-expression [COLLATE collating-sequence-options]]

Arguments

order-by-expression

specifies a column on which to sort. The sort column can be one of the following.

column

specifies the name of a column or a column alias.

column-position-number

specifies a nonnegative integer that equates to a column position.

<sql-expression>

specifies any valid SQL expression.

See	<sql-expression>

ASC

orders the data in ascending order. This is the default order; if ASC or DESC are not specified, the data is ordered in ascending order.

DESC

orders the data in descending order.

COLLATE collating-sequence-options

specifies linguistic collation, which sorts characters according to rules of the specified language. The rules and default collating sequence options are based on the language specified in the current locale setting. The implementation is provided by the International Components for Unicode (ICU) library and produces results that are largely compatible with the Unicode Collation Algorithms (UCA).

The collating-sequence-options argument can be one of the following values:

DANISH | FINNISH | ITALIAN | NORWEGIAN | POLISH | SPANISH | SWEDISH

sorts characters according to the language specified.

LINGUISTIC [collating-rules]

collating-rules can be one of the following values:

ALTERNATE_HANDLING=SHIFTED

controls the handling of variable characters like spaces, punctuation, and symbols. When this option is not specified (using the default value NON_IGNORABLE), differences among these variable characters are of the same importance as differences among letters. If the ALTERNATE_HANDLING option is specified, these variable characters are of minor importance.

Default	NON_IGNORABLE
Tip	The SHIFTED value is often used in combination with STRENGTH= set to Quaternary. In such a case, spaces, punctuation, and symbols are considered when comparing strings, but only if all other aspects of the strings (base letters, accents, and case) are identical.

CASE_FIRST=

specify order of uppercase and lowercase letters. This argument is valid for only TERTIARY, QUATERNARY, or IDENTICAL levels. The following table provides the values and information for the CASE_FIRST argument:

CASE_FIRST= Values
Value	Description
UPPER	Sorts uppercase letters first, then the lowercase letters.
LOWER	Sorts lowercase letters first, then the uppercase letters.

COLLATION=

The following table lists the available COLLATION= values. If you do not select a collation value, then the user's locale-default collation is selected.

COLLATION= Values
Value	Description
BIG5HAN	specifies Pinyin ordering for Latin and specifies big5 charset ordering for Chinese, Japanese, and Korean characters.
DIRECT	specifies a Hindi variant.
GB2312HAN	specifies Pinyin ordering for Latin and specifies gb2312han charset ordering for Chinese, Japanese, and Korean characters.
PHONEBOOK	specifies a telephone-book style for ordering of characters. Select PHONEBOOK only with the German language.
PINYIN	specifies an ordering for Chinese, Japanese, and Korean characters based on character-by-character transliteration into Pinyin. This ordering is typically used with simplified Chinese.
POSIX	is the Portable Operating System Interface. This option specifies a "C" locale ordering of characters.
STROKE	specifies a nonalphabetic writing style ordering of characters. Select STROKE with Chinese, Japanese, Korean, or Vietnamese languages. This ordering is typically used with Traditional Chinese.
TRADITIONAL	specifies a traditional style for ordering of characters. For example, select TRADITIONAL with the Spanish language.

LOCALE= locale_name

specifies the locale name in the form of a POSIX name(for example, ja_JP). For more information, see SAS National Language Support (NLS): Reference Guide

NUMERIC_COLLATION=

orders integer values within the text by the numeric value instead of characters used to represent the numbers.

NUMERIC_COLLATION= Values
Value	Description
ON	Order numbers by the numeric value. For example, "8 Main St." would sort before "45 Main St.".
OFF	Order numbers by the character value. For example, "45 Main St." would sort before "8 Main St.".

Default

OFF

STRENGTH=

The value of strength is related to the collation level. There are five collation-level values. The following table provides information about the five levels. The default value for strength is related to the locale.

STRENGTH= Values
Value	Type of Collation	Description
PRIMARY or 1	PRIMARY specifies differences between base characters (for example, "a" < "b").	It is the strongest difference. For example, dictionaries are divided into different sections by base character.
SECONDARY or 2	Accents in the characters are considered secondary differences (for example, "as" < "às" < "at").	A secondary difference is ignored when there is a primary difference anywhere in the strings. Other differences between letters can also be considered secondary differences, depending on the language.
TERTIARY or 3	Upper and lowercase differences in characters are distinguished at the tertiary level (for example, "ao" < "Ao" < "aò").	A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings. Another example is the difference between large and small Kana.
QUATERNARY or 4	When punctuation is ignored at level 1–3, an additional level can be used to distinguish words with and without punctuation (for example, "ab" < "a-b" < "aB").	The quaternary level should be used if ignoring punctuation is required or when processing Japanese text. This difference is ignored when there is a primary, secondary, or tertiary difference.
IDENTICAL or 5	When all other levels are equal, the identical level is used as a tiebreaker. The Unicode code point values of the Normalization Form D (NFD) form of each string are compared at this level, just in case there is no difference at levels 1–4.	This level should be used sparingly, as only code point values differences between two strings is an extremely rare occurrence. For example, only Hebrew cantillation marks are distinguished at this level.

Alias

LEVEL=

Restriction	Linguistic collation is not supported on platforms VMS on Itanium (VMI) or 64-bit Windows on Itanium (W64).
Tip	The collating-rules must be enclosed in parentheses. More than one collating rule can be specified.
See	ICU License - ICU 1.8.1 and later.
	The section on Linguistic Collation in SAS National Language Support (NLS): Reference Guide.
	Refer to http://www.unicode.org for the Unicode Collation Algorithm (UCA) specification.

Details

The ORDER BY clause sorts the result of a query expression according to the order specified in that query. When this clause is used, the default ordering sequence is ascending, from the lowest value to the highest.

If an ORDER BY clause is omitted, then a particular order to the output rows, such as the order in which the rows are encountered in the queried table, cannot be guaranteed. Without an ORDER BY clause, the order of the output rows is determined by the internal processing of FedSQL, the default collating sequence of SAS, and your operating environment. Therefore, if you want your result table to appear in a particular order, then use the ORDER BY clause.

If more than one order-by-expression is specified (separated by commas), then the first one determines the major sort order.

Integers can be substituted for column names in the ORDER BY clause. For example, if the order-by-expression is 2, then the results are ordered by values in the second column. Note that if you use a floating-point value (for example, 2.3) instead of an integer, then FedSQL issues an error message.

In the ORDER BY clause, you can specify any column of a table that is specified in the FROM clause of a query expression, regardless of whether that column has been included in the query's SELECT clause. However, if SELECT DISTINCT is specified, or if the SELECT statement contains a UNION operator, the sort column must appear in the query's SELECT clause.

Note: SAS missing values or null values are treated as the lowest possible values.

Note: For an SPD Engine data set, utility files are used for certain operations that need extra space. The ORDER BY clause requires a utility file and the Base SAS UTILLOC system option allocates space for that utility file. For more information, see SAS Scalable Performance Data Engine: Reference.

LIMIT Clause

Description

Specifies the number of rows that the SELECT statement returns.

Syntax

LIMIT {count | ALL}

Arguments

count

specifies the number of rows that the SELECT statement returns.

Tip	count can be an integer or any simple expression that resolves to an integer value.

ALL

specifies that all rows are returned.

Details

The LIMIT clause can be used alone or in conjunction with the OFFSET clause. The OFFSET clause specifies the number of rows to skip before the SELECT statement starts to return rows.

Note: When you use the LIMIT clause, it is recommended that you use an ORDER BY clause to create an ordered sequence. Otherwise, you can get an unpredictable subset of a query's rows.

OFFSET Clause

Description

Specifies the number of rows to skip before the SELECT statement starts to return rows.

Syntax

OFFSET number

Arguments

number

specifies the number of rows to skip.

Tip	number can be an integer or any simple expression that resolves to an integer value.

Details

The OFFSET clause can be used alone or in conjunction with the LIMIT clause. The OFFSET clause specifies the number of rows to skip before the SELECT statement starts to return rows.

Note: When you use the OFFSET clause, it is recommended that you use an ORDER BY clause to create an ordered sequence. Otherwise, you get an unpredictable subset of a query's rows.

UNION Operator

Descriptions

Combines the result of two or more queries into a single result table.

Syntax

{<query-specification> | <query-expression>}

UNION [ALL | DISTINCT] {<query-specification> | <query-expression>}

Arguments

<query-specification> | <query-expression>

specifies one or more SELECT statements that produces a virtual table.

See	SELECT Statement
	Overview of Subqueries

UNION

specifies that multiple result tables are combined and returned as a single result table.

ALL

specifies that all rows, including duplicates, are included in the result table. If not specified, all rows are returned.

DISTINCT

specifies that only unique rows can appear in the result table.

See	DISTINCT Predicate

Details

The UNION set operator produces a table that contains all the unique rows that result from both queries. That is, the result table contains rows produced by the first query, the second query, or both.

Columns are appended by position in the tables, regardless of the column names. However, the data type of the corresponding columns must match or be of a corresponding data type where length is the only differentiator, or the union does not occur. That is, CHAR can be merged with CHAR or VARCHAR, and INTEGER can be merged with BIGINT, INTEGER, SMALLINT, or TINYINT. However, CHAR or INTEGER cannot be merged with DOUBLE.

When columns of different lengths are merged, the length of the column from the left-hand position is used. For example:

INTEGER UNION ALL BIGINT results in INTEGER
BIGINT UNION ALL INTEGER results in BIGINT
CHAR(20) UNION ALL CHAR(10) results in CHAR(20)
CHAR(10) UNION ALL CHAR(20) results in CHAR(10)

To avoid data truncation, specify the wider data type in the left-hand position. If that is not possible, use the CAST function to convert the column to a wider data type.

The names of the columns in the result table are the names of the columns from the first query expression or query-specification unless a column (such as an expression) has no name in the first query expression or query-specification. In such a case, the name of that column in the result table is “column”.

The UNION set operator automatically eliminates duplicate rows from its result tables. The optional ALL keyword preserves the duplicate rows, reduces the execution by one step, and thereby improves the query's performance. You use it when you want to display all the rows resulting from the query, rather than just the unique rows. The ALL keyword is used only when a set operator is also specified.

EXCEPT Operator

Description

Combines the result of two or more queries into a single result table that contains only rows that are in the first query but not in the second query.

Syntax

{<query-specification> | <query-expression>}

EXCEPT

{

[ALL | DISTINCT]

| CORRESPONDING [BY (column [, ...column])]

}

{<query-specification> | <query-expression>}

Arguments

<query-specification> | <query-expression>

specifies one or more SELECT statements that produces a virtual table.

See	SELECT Statement
	Overview of Subqueries

EXCEPT

specifies that multiple result tables are combined and only those rows that are in the first result table and not in the second result table are included.

ALL

specifies that all rows, including duplicates, are included in the result table. If not specified, all rows are returned.

DISTINCT

specifies that only unique rows can appear in the result table.

See	DISTINCT Predicate

CORRESPONDING

specifies the columns to include in the query.

Tip	If you do not specify the BY clause, the result table will include every column that appears in both of the tables.

BY

specifies that only these columns be included in the result table.

column

specifies the name of the column.

Restriction

Every column must be a valid column in both tables.

Details

The EXCEPT set operator produces (from the first query) a result table that has unique rows that are not in the second query. If the intermediate result from the first query has at least one occurrence of a row that is not in the intermediate result of the second query, then that row (from the first query) is included in the result table.

The EXCEPT set operator automatically eliminates duplicate rows from its result tables. The optional ALL keyword preserves the duplicate rows, reduces the execution by one step, and thereby improves the query's performance. You use it when you want to display all the rows resulting from the query, rather than just the unique rows. The ALL keyword is used only when a set operator is also specified.

INTERSECT Operator

Description

Combines the result of two or more queries into a single result table that contains only rows that are common to both queries.

Syntax

{<query-specification> | <query-expression>}

INTERSECT

{

[ALL | DISTINCT]

| CORRESPONDING [BY (column [, ...column)]]

}

{<query-specification> | <query-expression>}

Arguments

<query-specification> | <query-expression>

specifies one or more SELECT statements that produces a virtual table.

See	SELECT Statement
	Overview of Subqueries

INTERSECT

specifies that multiple result tables are combined and only those rows that are common to both result tables are included.

ALL

specifies that all rows, including duplicates, are included in the result table. If not specified, all rows are returned.

DISTINCT

specifies that only unique rows can appear in the result table.

See	DISTINCT Predicate

CORRESPONDING

specifies the columns to include in the query.

Tip	If you do not specify the BY clause, the result table will include every column that appears in both of the tables.

BY

specifies that only these columns to be included in the result table.

column

specifies the name of the column.

Restriction

Every column must be a valid column in both tables.

Details

The INTERSECT operator produces a result table that has rows that are common to both tables.

The INTERSECT set operator automatically eliminates duplicate rows from its result tables. The optional ALL keyword preserves the duplicate rows, reduces the execution by one step, and thereby improves the query's performance. You use it when you want to display all the rows resulting from the query, rather than just the unique rows. The ALL keyword is used only when a set operator is also specified.

<search-condition>

Description

Is a combination of one or more operators and predicates that specifies which rows are chosen for inclusion in the result table.

Syntax

<search-condition>::=

{

[NOT] {<sql-expression> | (<search-condition>)}

[{AND | OR} [NOT] {<sql-expression> | (<search-condition>)}]

}

[, ...{[NOT] {<sql-expression> | (<search-condition>)}
[{AND | OR} [NOT] {<sql-expression> | (<search-condition>)}]}]

<sql-expression>::=

expression {operator | predicate} expression

Arguments

NOT

negates a Boolean condition. This table outlines the outcomes when you compare true and false values using the NOT operator.

Truth Table for the NOT Operator
NOT	Result
True	False
False	True
Unknown	Unknown

AND

combines two conditions by finding observations that satisfy both conditions. This table outlines the outcomes when you compare TRUE and FALSE values using the AND operator.

Truth Table for the AND Operator
AND	True	False	Unknown
True	True	False	Unknown
False	False	False	False
Unknown	Unknown	False	Unknown

OR

combines two conditions by finding observations that satisfy either condition or both. This table outlines the outcomes when you compare TRUE and FALSE values using the OR operator.

Truth Table for the OR Operator
OR	True	False	Unknown
True	True	True	True
False	True	False	Unknown
Unknown	True	Unknown	Unknown

<sql-expression>

specifies any valid SQL expression.

See	<sql-expression>

Details

The search condition specifies which rows are returned in a result table for a SELECT statement. Within the SELECT statement, the search condition is used in the WHERE clause, the HAVING clause, and the ON clause with joins.

The order of precedence for the logical operators is NOT, AND, and then OR, but you can override the order by using parentheses. Everything within the parentheses is evaluated first to yield a single value before that value can be used by any operator outside of the parentheses.

There can be precision issues when REAL values are involved in a search condition. To avoid these issues, it is recommended that you define or create a cast for the value in the condition. In other words, wherever the value is used in the original query, force a cast with this syntax.

CAST (value AS type)

Here are some examples:

select * from test where x = cast (1.0e20 as real);
select cast (1.0e20 as real) from test;
select cast (col1 as real) from test;

Note: The search condition is also used with the UPDATE Statement and the DELETE Statement. For the UPDATE statement, the search specification specifies which rows are updated. For the DELETE statement, the search specification specifies which rows are deleted.