| Retrieving Data from a Single Table |
The GROUP BY clause groups data by a specified column or columns. When you use a GROUP BY clause, you also use an aggregate function in the SELECT clause or in a HAVING clause to instruct PROC SQL in how to summarize the data for each group. PROC SQL calculates the aggregate function separately for each group.
| Grouping by One Column |
The following example sums the populations of all countries to find the total population of each continent:
proc sql; title 'Total Populations of World Continents'; select Continent, sum(Population) format=comma14. as TotalPopulation from sql.countries where Continent is not missing group by Continent;
Note: Countries for which a continent
is not listed are excluded by the WHERE clause. ![[cautionend]](../../../../common/61991/HTML/default/images/cautend.gif)
Total Populations of World Continents
Total
Continent Population
---------------------------------------------------
Africa 710,529,592
Asia 3,381,858,879
Australia 18,255,944
Central America and Caribbean 66,815,930
Europe 872,192,202
North America 384,801,818
Oceania 5,342,368
South America 317,568,801
| Grouping without Summarizing |
When you use a GROUP BY clause without an aggregate function, PROC SQL treats the GROUP BY clause as if it were an ORDER BY clause and displays a message in the log that informs you that this has happened. The following example attempts to group high and low temperature information for each city in the SQL.WORLDTEMPS table by country:
proc sql outobs=12; title 'High and Low Temperatures'; select City, Country, AvgHigh, AvgLow from sql.worldtemps group by Country;
The output and log show that PROC SQL transforms the GROUP BY clause into an ORDER BY clause.
Grouping without Aggregate Functions
High and Low Temperatures
City Country AvgHigh AvgLow
-------------------------------------------------------
Algiers Algeria 90 45
Buenos Aires Argentina 87 48
Sydney Australia 79 44
Vienna Austria 76 28
Nassau Bahamas 88 65
Hamilton Bermuda 85 59
Sao Paulo Brazil 81 53
Rio de Janeiro Brazil 85 64
Quebec Canada 76 5
Montreal Canada 77 8
Toronto Canada 80 17
Beijing China 86 17
Grouping without Aggregate Functions (Partial Log)
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because
neither the SELECT clause nor the optional HAVING clause of the
associated table-expression referenced a summary function.
| Grouping by Multiple Columns |
To group by multiple columns, separate the column names with commas within the GROUP BY clause. You can use aggregate functions with any of the columns that you select. The following example groups by both Location and Type, producing total square miles for the deserts and lakes in each location in the SQL.FEATURES table:
proc sql; title 'Total Square Miles of Deserts and Lakes'; select Location, Type, sum(Area) as TotalArea format=comma16. from sql.features where type in ('Desert', 'Lake') group by Location, Type;
Total Square Miles of Deserts and Lakes
Location Type TotalArea
--------------------------------------------------
Africa Desert 3,725,000
Africa Lake 50,958
Asia Lake 25,300
Australia Desert 300,000
Canada Lake 12,275
China Desert 500,000
Europe - Asia Lake 143,550
North America Desert 140,000
North America Lake 77,200
Russia Lake 11,780
Saudi Arabia Desert 250,000
| Grouping and Sorting Data |
You can order grouped results with an ORDER BY clause. The following example takes the previous example and adds an ORDER BY clause to change the order of the Location column from ascending order to descending order:
proc sql; title 'Total Square Miles of Deserts and Lakes'; select Location, Type, sum(Area) as TotalArea format=comma16. from sql.features where type in ('Desert', 'Lake') group by Location, Type order by Location desc;
Grouping with an ORDER BY Clause
Total Square Miles of Deserts and Lakes
Location Type TotalArea
--------------------------------------------------
Saudi Arabia Desert 250,000
Russia Lake 11,780
North America Lake 77,200
North America Desert 140,000
Europe - Asia Lake 143,550
China Desert 500,000
Canada Lake 12,275
Australia Desert 300,000
Asia Lake 25,300
Africa Desert 3,725,000
Africa Lake 50,958
| Grouping with Missing Values |
When a column contains missing values, PROC SQL treats the missing values as a single group. This can sometimes provide unexpected results.
In this example, because the SQL.COUNTRIES table contains some missing values in the Continent column, the missing values combine to form a single group that has the total area of the countries that have a missing value in the Continent column:
/* incorrect output */ proc sql outobs=12; title 'Areas of World Continents'; select Name format=$25., Continent, sum(Area) format=comma12. as TotalArea from sql.countries group by Continent order by Continent, Name;
The output is incorrect because Bermuda, Iceland, and Kalaallit Nunaat are not actually part of the same continent; however, PROC SQL treats them that way because they all have a missing character value in the Continent column.
Finding Grouping Errors Caused by Missing Values (Incorrect Output)
Areas of World Continents
Name Continent TotalArea
-----------------------------------------------------------------------
Bermuda 876,800
Iceland 876,800
Kalaallit Nunaat 876,800
Algeria Africa 11,299,595
Angola Africa 11,299,595
Benin Africa 11,299,595
Botswana Africa 11,299,595
Burkina Faso Africa 11,299,595
Burundi Africa 11,299,595
Cameroon Africa 11,299,595
Cape Verde Africa 11,299,595
Central African Republic Africa 11,299,595
To correct the query from the previous example, you can write a WHERE clause to exclude the missing values from the results:
/* corrected output */ proc sql outobs=12; title 'Areas of World Continents'; select Name format=$25., Continent, sum(Area) format=comma12. as TotalArea from sql.countries where Continent is not missing group by Continent order by Continent, Name;
Adjusting the Query to Avoid Errors Due to Missing Values (Corrected Output)
Areas of World Continents
Name Continent TotalArea
-----------------------------------------------------------------------
Algeria Africa 11,299,595
Angola Africa 11,299,595
Benin Africa 11,299,595
Botswana Africa 11,299,595
Burkina Faso Africa 11,299,595
Burundi Africa 11,299,595
Cameroon Africa 11,299,595
Cape Verde Africa 11,299,595
Central African Republic Africa 11,299,595
Chad Africa 11,299,595
Comoros Africa 11,299,595
Congo Africa 11,299,595
Note: Aggregate functions, such as the SUM function, can cause
the same calculation to repeat for every row. This occurs whenever PROC SQL
remerges data. See Remerging Summary Statistics for more information about remerging. ![[cautionend]](../../../../common/61991/HTML/default/images/cautend.gif)
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.