When a column contains
missing values, PROC SQL treats the missing values as a single group.
This can sometimes provide unexpected results.
In this example,
because the SQL.COUNTRIES table contains some missing values in the
Continent column, the missing values combine to form a single group
that has the total area of the countries that have a missing value
in the Continent column:
libname sql 'SAS-library';
/* unexpected output */
proc sql outobs=12;
title 'Areas of World Continents';
select Name format=$25.,
Continent,
sum(Area) format=comma12. as TotalArea
from sql.countries
group by Continent
order by Continent, Name;
The output is incorrect
because Bermuda, Iceland, and Kalaallit Nunaat are not actually part
of the same continent. However, PROC SQL treats them that way because
they all have a missing character value in the Continent column.
Finding Grouping Errors Caused by Missing Values (Unexpected
Output)
To correct the query
from the previous example, you can write a WHERE clause to exclude
the missing values from the results:
/* modified output */
proc sql outobs=12;
title 'Areas of World Continents';
select Name format=$25.,
Continent,
sum(Area) format=comma12. as TotalArea
from sql.countries
where Continent is not missing
group by Continent
order by Continent, Name;
Adjusting the Query to Avoid Errors Due to Missing Values (Modified
Output)
Note: Aggregate functions, such
as the SUM function, can cause the same calculation to repeat for
every row. This occurs whenever PROC SQL remerges data. See
Remerging Summary Statistics for more information about remerging.