Producing Summary Statistics

Summary statistics for the numeric variables of a SAS data set can be obtained with the SUMMARY statement. The STAT clause specifies which statistics to compute, and the CLASS clause computes them separately for subgroups of the data. The SAVE option in the OPT clause saves the computed statistics in matrices for later use. For example, consider the following statement applied to the data set CLASS:

   > summary var {height weight} class {sex} stat{mean std} opt{save};

               SEX       Nobs  Variable        MEAN         STD
               ------------------------------------------------
               F            9  HEIGHT      60.58889     5.01833
                               WEIGHT      90.11111    19.38391

               M            9  HEIGHT      64.45556     4.90742
                               WEIGHT     110.00000    23.84717

               All         18  HEIGHT      62.52222     5.20978
                               WEIGHT     100.05556    23.43382
               ------------------------------------------------

This SUMMARY statement computes the mean and standard deviation of the variables HEIGHT and WEIGHT for the two subgroups (female and male) of the data set CLASS. Because the SAVE option is specified, the statistics for each analysis variable are stored in a matrix with the same name as the variable: each row corresponds to a subgroup and each column to a statistic. Here, for example, HEIGHT is a 2 × 2 matrix whose rows correspond to F and M and whose columns hold the MEAN and STD values shown in the table. Two other vectors, SEX and _NOBS_, are also created. The vector SEX contains the two distinct values of the CLASS variable SEX that define the subgroups, and _NOBS_ contains the number of observations in each subgroup.

Note that the statistics for the combined data (the All row of the output) are displayed but not saved.
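To make the layout of the saved matrices concrete, here is a minimal sketch in Python (not SAS/IML) of what SUMMARY with STAT {MEAN STD} and OPT {SAVE} leaves behind for one CLASS variable. The function name `summary_save` and the data are hypothetical; the sketch assumes the subgroups are ordered by ascending CLASS value (as F precedes M in the displayed table) and that STD uses the n-1 divisor.

```python
from statistics import mean, stdev


def summary_save(rows, class_var, variables):
    """Hypothetical mimic of SUMMARY ... STAT {MEAN STD} OPT {SAVE}.

    Returns a dict of "matrices": one per analysis variable (one row per
    subgroup, columns [MEAN, STD]), plus the CLASS-value vector and _NOBS_.
    """
    # Distinct CLASS values, sorted ascending (assumed to match the display order).
    levels = sorted({r[class_var] for r in rows})
    saved = {class_var: levels, "_NOBS_": []}
    for var in variables:
        saved[var] = []
    for level in levels:
        group = [r for r in rows if r[class_var] == level]
        saved["_NOBS_"].append(len(group))
        for var in variables:
            values = [r[var] for r in group]
            # statistics.stdev is the sample standard deviation (divisor n-1).
            saved[var].append([mean(values), stdev(values)])
    # The combined ("All") statistics are displayed by SUMMARY but not saved,
    # so no row for them is added here.
    return saved


# Hypothetical observations, not the CLASS data set from the example.
rows = [
    {"SEX": "F", "HEIGHT": 60, "WEIGHT": 90},
    {"SEX": "F", "HEIGHT": 62, "WEIGHT": 100},
    {"SEX": "F", "HEIGHT": 64, "WEIGHT": 110},
    {"SEX": "M", "HEIGHT": 63, "WEIGHT": 100},
    {"SEX": "M", "HEIGHT": 66, "WEIGHT": 115},
    {"SEX": "M", "HEIGHT": 69, "WEIGHT": 130},
]

saved = summary_save(rows, "SEX", ["HEIGHT", "WEIGHT"])
print(saved["SEX"])     # ['F', 'M']
print(saved["_NOBS_"])  # [3, 3]
print(saved["HEIGHT"])  # row 0: F (mean, std); row 1: M (mean, std)
```

Each analysis variable maps to a list of rows, one per subgroup, mirroring the row-per-subgroup, column-per-statistic layout described above.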

More than one CLASS variable can be used, in which case a subgroup is defined by the combination of the values of the CLASS variables.
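As an illustration of subgroups formed from combinations of CLASS values, the following Python sketch (not SAS/IML) groups observations by the tuple of their CLASS values and reports the count and mean of one variable per subgroup. The second CLASS variable AGEGRP, the function `subgroup_means`, and the data are all hypothetical, and the sketch forms subgroups only for combinations that actually occur in the data.

```python
from statistics import mean


def subgroup_means(rows, class_vars, var):
    """Hypothetical sketch: one subgroup per distinct combination of CLASS values.

    Returns {combination tuple: (number of observations, mean of var)},
    with combinations in ascending order.
    """
    groups = {}
    for r in rows:
        key = tuple(r[c] for c in class_vars)  # e.g. ("F", "young")
        groups.setdefault(key, []).append(r[var])
    return {key: (len(vals), mean(vals)) for key, vals in sorted(groups.items())}


# Hypothetical observations with two CLASS variables, SEX and AGEGRP.
rows = [
    {"SEX": "F", "AGEGRP": "young", "HEIGHT": 58},
    {"SEX": "F", "AGEGRP": "young", "HEIGHT": 60},
    {"SEX": "F", "AGEGRP": "old", "HEIGHT": 63},
    {"SEX": "M", "AGEGRP": "young", "HEIGHT": 61},
    {"SEX": "M", "AGEGRP": "old", "HEIGHT": 67},
    {"SEX": "M", "AGEGRP": "old", "HEIGHT": 69},
]

result = subgroup_means(rows, ["SEX", "AGEGRP"], "HEIGHT")
for combo, (nobs, avg) in result.items():
    print(combo, nobs, avg)
```

With two levels of each CLASS variable, up to four subgroups can arise; a combination with no observations simply does not appear in this sketch.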