Overview of the SELECT Statement

How to Use the SELECT Statement

This chapter shows you how to perform the following tasks:
  • retrieve data from a single table by using the SELECT statement
  • validate the correctness of a SELECT statement by using the VALIDATE statement
With the SELECT statement, you can retrieve data from tables or data that is described by SAS data views.
Note: The examples in this chapter retrieve data from tables that are SAS data sets. However, you can use all of the operations that are described here with SAS data views.
The SELECT statement is the primary tool of PROC SQL. You use it to identify, retrieve, and manipulate columns of data from a table. You can also use several optional clauses within the SELECT statement to place restrictions on a query.

SELECT and FROM Clauses

The following simple SELECT statement is sufficient to produce a useful result:
select Name
   from sql.countries;
The SELECT statement must contain a SELECT clause and a FROM clause, both of which are required in a PROC SQL query. This SELECT statement contains the following:
  • a SELECT clause that lists the Name column
  • a FROM clause that lists the table in which the Name column resides

WHERE Clause

The WHERE clause enables you to restrict the data that you retrieve by specifying a condition that each row of the table must satisfy. PROC SQL output includes only those rows that satisfy the condition. The following SELECT statement contains a WHERE clause that restricts the query output to only those countries that have a population that is greater than 5,000,000 people:
select Name
   from sql.countries
   where Population gt 5000000; 

ORDER BY Clause

The ORDER BY clause enables you to sort the output from a table by one or more columns. That is, you can put character values in either ascending or descending alphabetical order, and you can put numerical values in either ascending or descending numerical order. The default order is ascending. For example, you can modify the previous example to list the data by descending population:
select Name
   from sql.countries
   where Population gt 5000000
   order by Population desc;

GROUP BY Clause

The GROUP BY clause enables you to break query results into subsets of rows. When you use the GROUP BY clause, you use an aggregate function in the SELECT clause or a HAVING clause to instruct PROC SQL how to group the data. For details about aggregate functions, see Summarizing Data. PROC SQL calculates the aggregate function separately for each group. When you do not use an aggregate function, PROC SQL treats the GROUP BY clause as if it were an ORDER BY clause, and any aggregate functions are applied to the entire table.
The following query uses the SUM function to list the total population of each continent. The GROUP BY clause groups the countries by continent, and the ORDER BY clause puts the continents in alphabetical order:
select Continent, sum(Population) 
   from sql.countries
   group by Continent
   order by Continent;

HAVING Clause

The HAVING clause works with the GROUP BY clause to restrict the groups in a query's results based on a given condition. PROC SQL applies the HAVING condition after grouping the data and applying aggregate functions. For example, the following query restricts the groups to include only the continents of Asia and Europe:
select Continent, sum(Population)
   from sql.countries
   group by Continent
   having Continent in ('Asia', 'Europe')
   order by Continent;

Ordering the SELECT Statement

When you construct a SELECT statement, you must specify the clauses in the following order:
  1. SELECT
  2. FROM
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. ORDER BY
Note: Only the SELECT and FROM clauses are required.
The PROC SQL SELECT statement and its clauses are discussed in further detail in the following sections.