Previous Page | Next Page

Retrieving Data from a Single Table

Grouping Data

The GROUP BY clause groups data by a specified column or columns. When you use a GROUP BY clause, you also use an aggregate function in the SELECT clause or in a HAVING clause to instruct PROC SQL in how to summarize the data for each group. PROC SQL calculates the aggregate function separately for each group.


Grouping by One Column

The following example sums the populations of all countries to find the total population of each continent:

proc sql;
   title 'Total Populations of World Continents';
   select Continent, sum(Population) format=comma14. as TotalPopulation
      from sql.countries
      where Continent is not missing
      group by Continent;

Note:   Countries for which a continent is not listed are excluded by the WHERE clause.  [cautionend]

Grouping by One Column

                     Total Populations of World Continents

                                                            Total
              Continent                                Population
              ---------------------------------------------------
              Africa                                  710,529,592
              Asia                                  3,381,858,879
              Australia                                18,255,944
              Central America and Caribbean            66,815,930
              Europe                                  872,192,202
              North America                           384,801,818
              Oceania                                   5,342,368
              South America                           317,568,801

Grouping without Summarizing

When you use a GROUP BY clause without an aggregate function, PROC SQL treats the GROUP BY clause as if it were an ORDER BY clause and displays a message in the log that informs you that this has happened. The following example attempts to group high and low temperature information for each city in the SQL.WORLDTEMPS table by country:

proc sql outobs=12;
   title 'High and Low Temperatures';
   select City, Country, AvgHigh, AvgLow
      from sql.worldtemps
      group by Country;

The output and log show that PROC SQL transforms the GROUP BY clause into an ORDER BY clause.

Grouping without Aggregate Functions

                           High and Low Temperatures

            City                Country           AvgHigh    AvgLow
            -------------------------------------------------------
            Algiers             Algeria                90        45
            Buenos Aires        Argentina              87        48
            Sydney              Australia              79        44
            Vienna              Austria                76        28
            Nassau              Bahamas                88        65
            Hamilton            Bermuda                85        59
            Sao Paulo           Brazil                 81        53
            Rio de Janeiro      Brazil                 85        64
            Quebec              Canada                 76         5
            Montreal            Canada                 77         8
            Toronto             Canada                 80        17
            Beijing             China                  86        17

Grouping without Aggregate Functions (Partial Log)

 WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because 
         neither the SELECT clause nor the optional HAVING clause of the 
         associated table-expression referenced a summary function.

Grouping by Multiple Columns

To group by multiple columns, separate the column names with commas within the GROUP BY clause. You can use aggregate functions with any of the columns that you select. The following example groups by both Location and Type, producing total square miles for the deserts and lakes in each location in the SQL.FEATURES table:

proc sql;
   title 'Total Square Miles of Deserts and Lakes';
   select Location, Type, sum(Area) as TotalArea format=comma16.
      from sql.features
      where type in ('Desert', 'Lake')
      group by Location, Type;

Grouping by Multiple Columns

                         Total Square Miles of Deserts and Lakes

                    Location              Type               TotalArea
                    --------------------------------------------------
                    Africa                Desert             3,725,000
                    Africa                Lake                  50,958
                    Asia                  Lake                  25,300
                    Australia             Desert               300,000
                    Canada                Lake                  12,275
                    China                 Desert               500,000
                    Europe - Asia         Lake                 143,550
                    North America         Desert               140,000
                    North America         Lake                  77,200
                    Russia                Lake                  11,780
                    Saudi Arabia          Desert               250,000

Grouping and Sorting Data

You can order grouped results with an ORDER BY clause. The following example takes the previous example and adds an ORDER BY clause to change the order of the Location column from ascending order to descending order:

proc sql;
   title 'Total Square Miles of Deserts and Lakes';
   select Location, Type, sum(Area) as TotalArea format=comma16.
      from sql.features
      where type in ('Desert', 'Lake')
      group by Location, Type
      order by Location desc;

Grouping with an ORDER BY Clause

                         Total Square Miles of Deserts and Lakes

                    Location              Type               TotalArea
                    --------------------------------------------------
                    Saudi Arabia          Desert               250,000
                    Russia                Lake                  11,780
                    North America         Lake                  77,200
                    North America         Desert               140,000
                    Europe - Asia         Lake                 143,550
                    China                 Desert               500,000
                    Canada                Lake                  12,275
                    Australia             Desert               300,000
                    Asia                  Lake                  25,300
                    Africa                Desert             3,725,000
                    Africa                Lake                  50,958

Grouping with Missing Values

When a column contains missing values, PROC SQL treats the missing values as a single group. This can sometimes provide unexpected results.


Finding Grouping Errors Caused by Missing Values

In this example, because the SQL.COUNTRIES table contains some missing values in the Continent column, the missing values combine to form a single group that has the total area of the countries that have a missing value in the Continent column:

/* incorrect output */

proc sql outobs=12;
   title 'Areas of World Continents';
   select Name format=$25., 
          Continent, 
          sum(Area) format=comma12. as TotalArea
      from sql.countries
      group by Continent
      order by Continent, Name;

The output is incorrect because Bermuda, Iceland, and Kalaallit Nunaat are not actually part of the same continent; however, PROC SQL treats them that way because they all have a missing character value in the Continent column.

Finding Grouping Errors Caused by Missing Values (Incorrect Output)

                                Areas of World Continents

         Name                       Continent                          TotalArea
         -----------------------------------------------------------------------
         Bermuda                                                         876,800
         Iceland                                                         876,800
         Kalaallit Nunaat                                                876,800
         Algeria                    Africa                            11,299,595
         Angola                     Africa                            11,299,595
         Benin                      Africa                            11,299,595
         Botswana                   Africa                            11,299,595
         Burkina Faso               Africa                            11,299,595
         Burundi                    Africa                            11,299,595
         Cameroon                   Africa                            11,299,595
         Cape Verde                 Africa                            11,299,595
         Central African  Republic  Africa                            11,299,595

To correct the query from the previous example, you can write a WHERE clause to exclude the missing values from the results:

/* corrected output */

proc sql outobs=12;
   title 'Areas of World Continents';
   select Name format=$25.,
          Continent, 
          sum(Area) format=comma12. as TotalArea
      from sql.countries
      where Continent is not missing
      group by Continent
      order by Continent, Name;

Adjusting the Query to Avoid Errors Due to Missing Values (Corrected Output)

                                Areas of World Continents

         Name                       Continent                          TotalArea
         -----------------------------------------------------------------------
         Algeria                    Africa                            11,299,595
         Angola                     Africa                            11,299,595
         Benin                      Africa                            11,299,595
         Botswana                   Africa                            11,299,595
         Burkina Faso               Africa                            11,299,595
         Burundi                    Africa                            11,299,595
         Cameroon                   Africa                            11,299,595
         Cape Verde                 Africa                            11,299,595
         Central African  Republic  Africa                            11,299,595
         Chad                       Africa                            11,299,595
         Comoros                    Africa                            11,299,595
         Congo                      Africa                            11,299,595

Note:   Aggregate functions, such as the SUM function, can cause the same calculation to repeat for every row. This occurs whenever PROC SQL remerges data. See Remerging Summary Statistics for more information about remerging.  [cautionend]

Previous Page | Next Page | Top of Page