The SQL procedure (PROC SQL) is a wonderful tool for summarizing (or aggregating) data. It provides a number of useful summary (or aggregate) functions to help perform calculations, descriptive statistics, and other aggregating operations in a SELECT statement or HAVING clause. These functions are designed to summarize information and not display detail about data.
Without the availability of summary functions, you would have to construct the necessary logic by using somewhat complicated SQL programming constructs. When you use a summary function without a GROUP BY clause, all the rows in a table are treated as a single group. Consequently, the results are often a single row value.
A number of summary functions are available, including facilities to count non-missing values; determine the minimum and maximum values in specific columns; return the range of values; compute the mean, standard deviation, and variance of specific values; and other aggregating functions. The following table presents an alphabetical listing of the available PROC SQL summary functions; when multiple names for the same function are available, the ANSI-approved name appears first.
Summary Function | Description |
AVG, MEAN | Average or mean of values |
COUNT, FREQ, N | Aggregate number of non-missing values |
CSS | Corrected sum of squares |
CV | Coefficient of variation |
MAX | Largest value |
MIN | Smallest value |
NMISS | Number of missing values |
PRT | Probability of a greater absolute value of Student's t |
RANGE | Difference between the largest and smallest values |
STD | Standard deviation |
STDERR | Standard error of the mean |
SUM | Sum of values |
SUMWGT | Sum of the weight variable values, which is 1 |
T | Testing the hypothesis that the population mean is zero |
USS | Uncorrected sum of squares |
VAR | Variance |
The next example uses the COUNT function with the (*) argument to produce a total number of rows, regardless of whether data is missing. The asterisk (*) is specified as the argument to the COUNT function to count all rows in the MOVIES table.
PROC SQL Code
PROC SQL; SELECT COUNT(*) AS Row_Count FROM MOVIES; QUIT;
Results
Row_Count ------------------ 22
Unlike the COUNT(*) function syntax that counts all rows, regardless of whether data is missing, the next example uses the COUNT function with the (column-name) argument to produce a total number of non-missing rows based on the column, RATING.
PROC SQL Code
PROC SQL; SELECT COUNT(RATING) AS Non_Missing_Row_Count FROM MOVIES; QUIT;
Results
Non_Missing_ Row_Count ----------------------- 22
The MIN summary function can be specified to determine which movie in the MOVIES table has the shortest running time.
PROC SQL Code
PROC SQL; SELECT MIN(length) AS Shortest Label='Shortest Length' FROM MOVIES; QUIT;
Results
Shortest Length ----------------- 97
In the next example, the RANGE function is specified to determine the difference between the largest and smallest values for a selected column. Suppose you wanted to determine the range of movie lengths for all movies by their movie rating (G, PG, PG-13, and R). You could construct the query as follows.
PROC SQL Code
PROC SQL; SELECT RANGE(length) AS Range_Length FROM MOVIES GROUP BY RATING; QUIT;
Results
Range_ Length ------ 0 32 97 72
About the Author
Kirk Paul Lafler is the author of PROC SQL: Beyond the Basics Using SAS, published by SAS Press. He also writes the
popular SAS tips column "Kirk's Korner," which appears regularly in several SAS users group newsletters, and is a frequent speaker at
SAS users group meetings.
His book is available from the online bookstore.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
Type: | Sample |
Topic: | Internal Administration ==> BBU Non SAS Authors ==> Kirk Paul Lafler SAS Reference ==> Procedures ==> SQL |
Date Modified: | 2005-04-09 03:01:52 |
Date Created: | 2005-04-06 14:58:29 |
Product Family | Product | Host | SAS Release | |
Starting | Ending | |||
SAS System | Base SAS | All | n/a | n/a |