| Retrieving Data from a Single Table |
The WHERE clause enables you to retrieve only rows from a table that satisfy a condition. WHERE clauses can contain any of the columns in a table, including columns that are not selected.
| Using a Simple WHERE Clause |
The following example uses a WHERE clause to find all countries that are in the continent of Europe and their populations:
proc sql outobs=12; title 'Countries in Europe'; select Name, Population format=comma10. from sql.countries where Continent = 'Europe';
Countries in Europe
Name Population
-----------------------------------------------
Albania 3,407,400
Andorra 64,634
Austria 8,033,746
Belarus 10,508,000
Belgium 10,162,614
Bosnia and Herzegovina 4,697,040
Bulgaria 8,887,111
Channel Islands 146,436
Croatia 4,744,505
Czech Republic 10,511,029
Denmark 5,239,356
England 49,293,170
| Retrieving Rows Based on a Comparison |
You can use comparison operators in a WHERE clause to select different subsets of data. The following table lists the comparison operators that you can use:
| Symbol | Mnemonic Equivalent | Definition | Example |
|---|---|---|---|
| = | EQ | equal to | where Name = 'Asia'; |
| ^= or ~= or ¬= or <> | NE | not equal to | where Name ne 'Africa'; |
| > | GT | greater than | where Area > 10000; |
| < | LT | less than | where Depth < 5000; |
| >= | GE | greater than or equal to | where Statehood >= '01jan1860'd; |
| <= | LE | less than or equal to | where Population <= 5000000; |
The following example subsets the SQL.UNITEDSTATES table by including only states with populations greater than 5,000,000 people:
proc sql; title 'States with Populations over 5,000,000'; select Name, Population format=comma10. from sql.unitedstates where Population gt 5000000 order by Population desc;
Retrieving Rows Based on a Comparison
States with Populations over 5,000,000
Name Population
-----------------------------------------------
California 31,518,948
New York 18,377,334
Texas 18,209,994
Florida 13,814,408
Pennsylvania 12,167,566
Illinois 11,813,091
Ohio 11,200,790
Michigan 9,571,318
New Jersey 7,957,196
North Carolina 7,013,950
Georgia 6,985,572
Virginia 6,554,851
Massachusetts 6,071,816
Indiana 5,769,553
Washington 5,307,322
Missouri 5,285,610
Tennessee 5,149,273
Wisconsin 5,087,770
Maryland 5,014,048
| Retrieving Rows That Satisfy Multiple Conditions |
You can use logical, or Boolean, operators to construct a WHERE clause that contains two or more expressions. The following table lists the logical operators that you can use:
| Symbol | Mnemonic Equivalent | Definition | Example |
|---|---|---|---|
| & | AND | specifies that both the previous and following conditions must be true | Continent = 'Asia' and Population > 5000000 |
| ! or | or ¦ | OR | specifies that either the previous or the following condition must be true | Population < 1000000 or Population > 5000000 |
| ^ or ~ or ¬ | NOT | specifies that the following condition must be false | Continent not 'Africa' |
The following example uses two expressions to include only countries that are in Africa and that have a population greater than 20,000,000 people:
proc sql; title 'Countries in Africa with Populations over 20,000,000'; select Name, Population format=comma10. from sql.countries where Continent = 'Africa' and Population gt 20000000 order by Population desc;
Retrieving Rows That Satisfy Multiple Conditions
Countries in Africa with Populations over 20,000,000
Name Population
-----------------------------------------------
Nigeria 99,062,003
Egypt 59,912,259
Ethiopia 59,291,170
South Africa 44,365,873
Congo, Democratic Republic of 43,106,529
Sudan 29,711,229
Morocco 28,841,705
Kenya 28,520,558
Tanzania 28,263,033
Algeria 28,171,132
Uganda 20,055,584
Note: You can use parentheses to improve the readability of WHERE clauses that contain multiple, or compound, expressions, such as the following:
where (Continent = 'Africa' and Population gt 2000000) or (Continent = 'Asia' and Population gt 1000000)
![[cautionend]](../../../../common/61991/HTML/default/images/cautend.gif)
| Using Other Conditional Operators |
You can use many different conditional operators in a WHERE clause. The following table lists other operators that you can use:
| Operator | Definition | Example |
|---|---|---|
| ANY | specifies that at least one of a set of values obtained from a subquery must satisfy a given condition | where Population > any (select Population from sql.countries) |
| ALL | specifies that all of the values obtained from a subquery must satisfy a given condition | where Population > all (select Population from sql.countries) |
| BETWEEN-AND | tests for values within an inclusive range | where Population between 1000000 and 5000000 |
| CONTAINS | tests for values that contain a specified string | where Continent contains 'America'; |
| EXISTS | tests for the existence of a set of values obtained from a subquery |
where exists (select * from
sql.oilprod);
|
| IN | tests for values that match one of a list of values | where Name in ('Africa', 'Asia'); |
| IS NULL or IS MISSING | tests for missing values | where Population is missing; |
| LIKE | tests for values that match a specified pattern (table note 1) | where Continent like 'A%'; |
| =* | tests for values that sound like a specified value | where Name =* 'Tiland'; |
TABLE NOTE 1: You can use a percent symbol (%) to match any number of characters.
You can use an underscore (_) to match one arbitrary character. ![[arrow]](../../../../common/61991/HTML/default/images/fntarrow.gif)
Note: All of these operators can be prefixed with the NOT
operator to form a negative condition. ![[cautionend]](../../../../common/61991/HTML/default/images/cautend.gif)
The IN operator enables you to include values within a list that you supply. The following example uses the IN operator to include only the mountains and waterfalls in the SQL.FEATURES table:
proc sql outobs=12; title 'World Mountains and Waterfalls'; select Name, Type, Height format=comma10. from sql.features where Type in ('Mountain', 'Waterfall') order by Height;
World Mountains and Waterfalls
Name Type Height
---------------------------------------
Niagara Falls Waterfall 193
Yosemite Waterfall 2,425
Tugela Falls Waterfall 3,110
Angel Falls Waterfall 3,212
Kosciusko Mountain 7,310
Pico Duarte Mountain 10,417
Cook Mountain 12,349
Matterhorn Mountain 14,690
Wilhelm Mountain 14,793
Mont Blanc Mountain 15,771
Ararat Mountain 16,804
Vinson Massif Mountain 16,864
The IS MISSING operator enables you to identify rows that contain columns with missing values. The following example selects countries that are not located on a continent; that is, these countries have a missing value in the Continent column:
proc sql; title 'Countries with Missing Continents'; select Name, Continent from sql.countries where Continent is missing;
Note: The IS NULL operator is
the same as, and interchangeable with, the IS MISSING operator. ![[cautionend]](../../../../common/61991/HTML/default/images/cautend.gif)
Countries with Missing Continents
Name Continent
------------------------------------------------------------------------
Bermuda
Iceland
Kalaallit Nunaat
To select rows based on a range of values, you can use the BETWEEN-AND operators. This example selects countries that have latitudes within five degrees of the Equator:
proc sql outobs=12; title 'Equatorial Cities of the World'; select City, Country, Latitude from sql.worldcitycoords where Latitude between -5 and 5;
Note: In the tables used
in these examples, latitude values that are south of the Equator are negative.
Longitude values that are west of the Prime Meridian are also negative. ![[cautionend]](../../../../common/61991/HTML/default/images/cautend.gif)
Note: Because the BETWEEN-AND operators are inclusive, the values
that you specify in the BETWEEN-AND expression are included in the results. ![[cautionend]](../../../../common/61991/HTML/default/images/cautend.gif)
Using the BETWEEN-AND Operators
Equatorial Cities of the World
City Country Latitude
----------------------------------------------------------
Belem Brazil -1
Fortaleza Brazil -4
Bogota Colombia 4
Cali Colombia 3
Brazzaville Congo -4
Quito Ecuador 0
Cayenne French Guiana 5
Accra Ghana 5
Medan Indonesia 3
Palembang Indonesia -3
Nairobi Kenya -1
Kuala Lumpur Malaysia 4
The LIKE operator enables you to select rows based on pattern matching. For example, the following query returns all countries in the SQL.COUNTRIES table that begin with the letter Z and are any number of characters long, or end with the letter a and are five characters long:
proc sql; title1 'Country Names that Begin with the Letter "Z"'; title2 'or Are 5 Characters Long and End with the Letter "a"'; select Name from sql.countries where Name like 'Z%' or Name like '____a';
Country Names that Begin with the Letter "Z"
or Are 5 Characters Long and End with the Letter "a"
Name
-----------------------------------
China
Ghana
India
Kenya
Libya
Malta
Syria
Tonga
Zambia
Zimbabwe
The percent sign (%) and underscore (_) are wildcard characters. For more information about pattern matching with the LIKE comparison operator, see the "SQL Procedure" chapter in the Base SAS Procedures Guide.
| Using Truncated String Comparison Operators |
Truncated string comparison operators are used to compare two strings. They differ from conventional comparison operators in that, before executing the comparison, PROC SQL truncates the longer string to be the same length as the shorter string. The truncation is performed internally; neither operand is permanently changed. The following table lists the truncated comparison operators:
| Symbol | Definition | Example |
|---|---|---|
| EQT | equal to truncated strings | where Name eqt 'Aust'; |
| GTT | greater than truncated strings | where Name gtt 'Bah'; |
| LTT | less than truncated strings | where Name ltt 'An'; |
| GET | greater than or equal to truncated strings | where Country get 'United A'; |
| LET | less than or equal to truncated strings | where Lastname let 'Smith'; |
| NET | not equal to truncated strings | where Style net 'TWO'; |
The following example returns a list of U.S. states that have 'New ' at the beginning of their names:
proc sql; title '"New" U.S. States'; select Name from sql.unitedstates where Name eqt 'New ';
Using a Truncated String Comparison Operator
"New" U.S. States
Name
-----------------------------------
New Hampshire
New Jersey
New Mexico
New York
| Using a WHERE Clause with Missing Values |
If a column that you specify in a WHERE clause contains missing values, then a query might provide unexpected results. For example, the following query returns all features from the SQL.FEATURES table that have a depth of less than 500 feet:
/* incorrect output */ proc sql outobs=12; title 'World Features with a Depth of Less than 500 Feet'; select Name, Depth from sql.features where Depth lt 500 order by Depth;
Using a WHERE Clause with Missing Values (Incorrect Output)
World Features with a Depth of Less than 500 Feet
Name Depth
-------------------------
Kalahari .
Nile .
Citlaltepec .
Lena .
Mont Blanc .
Borneo .
Rub al Khali .
Amur .
Yosemite .
Cook .
Mackenzie-Peace .
Mekong .
However, because PROC SQL treats missing values as smaller than nonmissing
values, features that have no depth listed are also included in the results.
To avoid this problem, you could adjust the WHERE expression to check for
missing values and exclude them from the query results, as follows:
/* corrected output */ proc sql outobs=12; title 'World Features with a Depth of Less than 500 Feet'; select Name, Depth from sql.features where Depth lt 500 and Depth is not missing order by Depth;
Using a WHERE Clause with Missing Values (Corrected Output)
World Features with a Depth of Less than 500 Feet
Name Depth
-------------------------
Baltic 180
Aral Sea 222
Victoria 264
Hudson Bay 305
North 308
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.