Retrieving Data from a Single Table |
The WHERE clause enables you to retrieve only rows from a table that satisfy a condition. WHERE clauses can contain any of the columns in a table, including columns that are not selected.
Using a Simple WHERE Clause |
The following example uses a WHERE clause to find all countries that are in the continent of Europe and their populations:
proc sql outobs=12; title 'Countries in Europe'; select Name, Population format=comma10. from sql.countries where Continent = 'Europe';
Countries in Europe Name Population ----------------------------------------------- Albania 3,407,400 Andorra 64,634 Austria 8,033,746 Belarus 10,508,000 Belgium 10,162,614 Bosnia and Herzegovina 4,697,040 Bulgaria 8,887,111 Channel Islands 146,436 Croatia 4,744,505 Czech Republic 10,511,029 Denmark 5,239,356 England 49,293,170
Retrieving Rows Based on a Comparison |
You can use comparison operators in a WHERE clause to select different subsets of data. The following table lists the comparison operators that you can use:
Symbol | Mnemonic Equivalent | Definition | Example |
---|---|---|---|
= | EQ | equal to | where Name = 'Asia'; |
^= or ~= or ¬= or <> | NE | not equal to | where Name ne 'Africa'; |
> | GT | greater than | where Area > 10000; |
< | LT | less than | where Depth < 5000; |
>= | GE | greater than or equal to | where Statehood >= '01jan1860'd; |
<= | LE | less than or equal to | where Population <= 5000000; |
The following example subsets the SQL.UNITEDSTATES table by including only states with populations greater than 5,000,000 people:
proc sql; title 'States with Populations over 5,000,000'; select Name, Population format=comma10. from sql.unitedstates where Population gt 5000000 order by Population desc;
Retrieving Rows Based on a Comparison
States with Populations over 5,000,000 Name Population ----------------------------------------------- California 31,518,948 New York 18,377,334 Texas 18,209,994 Florida 13,814,408 Pennsylvania 12,167,566 Illinois 11,813,091 Ohio 11,200,790 Michigan 9,571,318 New Jersey 7,957,196 North Carolina 7,013,950 Georgia 6,985,572 Virginia 6,554,851 Massachusetts 6,071,816 Indiana 5,769,553 Washington 5,307,322 Missouri 5,285,610 Tennessee 5,149,273 Wisconsin 5,087,770 Maryland 5,014,048
Retrieving Rows That Satisfy Multiple Conditions |
You can use logical, or Boolean, operators to construct a WHERE clause that contains two or more expressions. The following table lists the logical operators that you can use:
Symbol | Mnemonic Equivalent | Definition | Example |
---|---|---|---|
& | AND | specifies that both the previous and following conditions must be true | Continent = 'Asia' and Population > 5000000 |
! or | or ¦ | OR | specifies that either the previous or the following condition must be true | Population < 1000000 or Population > 5000000 |
^ or ~ or ¬ | NOT | specifies that the following condition must be false | Continent not 'Africa' |
The following example uses two expressions to include only countries that are in Africa and that have a population greater than 20,000,000 people:
proc sql; title 'Countries in Africa with Populations over 20,000,000'; select Name, Population format=comma10. from sql.countries where Continent = 'Africa' and Population gt 20000000 order by Population desc;
Retrieving Rows That Satisfy Multiple Conditions
Countries in Africa with Populations over 20,000,000 Name Population ----------------------------------------------- Nigeria 99,062,003 Egypt 59,912,259 Ethiopia 59,291,170 South Africa 44,365,873 Congo, Democratic Republic of 43,106,529 Sudan 29,711,229 Morocco 28,841,705 Kenya 28,520,558 Tanzania 28,263,033 Algeria 28,171,132 Uganda 20,055,584
Note: You can use parentheses to improve the readability of WHERE clauses that contain multiple, or compound, expressions, such as the following:
where (Continent = 'Africa' and Population gt 2000000) or (Continent = 'Asia' and Population gt 1000000)
Using Other Conditional Operators |
You can use many different conditional operators in a WHERE clause. The following table lists other operators that you can use:
Operator | Definition | Example |
---|---|---|
ANY | specifies that at least one of a set of values obtained from a subquery must satisfy a given condition | where Population > any (select Population from sql.countries) |
ALL | specifies that all of the values obtained from a subquery must satisfy a given condition | where Population > all (select Population from sql.countries) |
BETWEEN-AND | tests for values within an inclusive range | where Population between 1000000 and 5000000 |
CONTAINS | tests for values that contain a specified string | where Continent contains 'America'; |
EXISTS | tests for the existence of a set of values obtained from a subquery |
where exists (select * from
sql.oilprod);
|
IN | tests for values that match one of a list of values | where Name in ('Africa', 'Asia'); |
IS NULL or IS MISSING | tests for missing values | where Population is missing; |
LIKE | tests for values that match a specified pattern (table note 1) | where Continent like 'A%'; |
=* | tests for values that sound like a specified value | where Name =* 'Tiland'; |
TABLE NOTE 1: You can use a percent symbol (%) to match any number of characters. You can use an underscore (_) to match one arbitrary character.
Note: All of these operators can be prefixed with the NOT operator to form a negative condition.
The IN operator enables you to include values within a list that you supply. The following example uses the IN operator to include only the mountains and waterfalls in the SQL.FEATURES table:
proc sql outobs=12; title 'World Mountains and Waterfalls'; select Name, Type, Height format=comma10. from sql.features where Type in ('Mountain', 'Waterfall') order by Height;
World Mountains and Waterfalls Name Type Height --------------------------------------- Niagara Falls Waterfall 193 Yosemite Waterfall 2,425 Tugela Falls Waterfall 3,110 Angel Falls Waterfall 3,212 Kosciusko Mountain 7,310 Pico Duarte Mountain 10,417 Cook Mountain 12,349 Matterhorn Mountain 14,690 Wilhelm Mountain 14,793 Mont Blanc Mountain 15,771 Ararat Mountain 16,804 Vinson Massif Mountain 16,864
The IS MISSING operator enables you to identify rows that contain columns with missing values. The following example selects countries that are not located on a continent; that is, these countries have a missing value in the Continent column:
proc sql; title 'Countries with Missing Continents'; select Name, Continent from sql.countries where Continent is missing;
Note: The IS NULL operator is the same as, and interchangeable with, the IS MISSING operator.
Countries with Missing Continents Name Continent ------------------------------------------------------------------------ Bermuda Iceland Kalaallit Nunaat
To select rows based on a range of values, you can use the BETWEEN-AND operators. This example selects countries that have latitudes within five degrees of the Equator:
proc sql outobs=12; title 'Equatorial Cities of the World'; select City, Country, Latitude from sql.worldcitycoords where Latitude between -5 and 5;
Note: In the tables used in these examples, latitude values that are south of the Equator are negative. Longitude values that are west of the Prime Meridian are also negative.
Note: Because the BETWEEN-AND operators are inclusive, the values that you specify in the BETWEEN-AND expression are included in the results.
Using the BETWEEN-AND Operators
Equatorial Cities of the World City Country Latitude ---------------------------------------------------------- Belem Brazil -1 Fortaleza Brazil -4 Bogota Colombia 4 Cali Colombia 3 Brazzaville Congo -4 Quito Ecuador 0 Cayenne French Guiana 5 Accra Ghana 5 Medan Indonesia 3 Palembang Indonesia -3 Nairobi Kenya -1 Kuala Lumpur Malaysia 4
The LIKE operator enables you to select rows based on pattern matching. For example, the following query returns all countries in the SQL.COUNTRIES table that begin with the letter Z and are any number of characters long, or end with the letter a and are five characters long:
proc sql; title1 'Country Names that Begin with the Letter "Z"'; title2 'or Are 5 Characters Long and End with the Letter "a"'; select Name from sql.countries where Name like 'Z%' or Name like '____a';
Country Names that Begin with the Letter "Z" or Are 5 Characters Long and End with the Letter "a" Name ----------------------------------- China Ghana India Kenya Libya Malta Syria Tonga Zambia Zimbabwe
The percent sign (%) and underscore (_) are wildcard characters. For more information about pattern matching with the LIKE comparison operator, see the "SQL Procedure" chapter in the Base SAS Procedures Guide.
Using Truncated String Comparison Operators |
Truncated string comparison operators are used to compare two strings. They differ from conventional comparison operators in that, before executing the comparison, PROC SQL truncates the longer string to be the same length as the shorter string. The truncation is performed internally; neither operand is permanently changed. The following table lists the truncated comparison operators:
Symbol | Definition | Example |
---|---|---|
EQT | equal to truncated strings | where Name eqt 'Aust'; |
GTT | greater than truncated strings | where Name gtt 'Bah'; |
LTT | less than truncated strings | where Name ltt 'An'; |
GET | greater than or equal to truncated strings | where Country get 'United A'; |
LET | less than or equal to truncated strings | where Lastname let 'Smith'; |
NET | not equal to truncated strings | where Style net 'TWO'; |
The following example returns a list of U.S. states that have 'New ' at the beginning of their names:
proc sql; title '"New" U.S. States'; select Name from sql.unitedstates where Name eqt 'New ';
Using a Truncated String Comparison Operator
"New" U.S. States Name ----------------------------------- New Hampshire New Jersey New Mexico New York
Using a WHERE Clause with Missing Values |
If a column that you specify in a WHERE clause contains missing values, then a query might provide unexpected results. For example, the following query returns all features from the SQL.FEATURES table that have a depth of less than 500 feet:
/* incorrect output */ proc sql outobs=12; title 'World Features with a Depth of Less than 500 Feet'; select Name, Depth from sql.features where Depth lt 500 order by Depth;
Using a WHERE Clause with Missing Values (Incorrect Output)
World Features with a Depth of Less than 500 Feet Name Depth ------------------------- Kalahari . Nile . Citlaltepec . Lena . Mont Blanc . Borneo . Rub al Khali . Amur . Yosemite . Cook . Mackenzie-Peace . Mekong .However, because PROC SQL treats missing values as smaller than nonmissing values, features that have no depth listed are also included in the results. To avoid this problem, you could adjust the WHERE expression to check for missing values and exclude them from the query results, as follows:
/* corrected output */ proc sql outobs=12; title 'World Features with a Depth of Less than 500 Feet'; select Name, Depth from sql.features where Depth lt 500 and Depth is not missing order by Depth;
Using a WHERE Clause with Missing Values (Corrected Output)
World Features with a Depth of Less than 500 Feet Name Depth ------------------------- Baltic 180 Aral Sea 222 Victoria 264 Hudson Bay 305 North 308
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.