SAS Elementary Statistics Procedures
Simple Statistics
The base SAS procedures use a standardized set of keywords to refer to statistics. You specify these keywords in SAS statements to request the statistics to be displayed or stored in an output data set.
In the following notation, summation is over observations that contain nonmissing values of the analyzed variable and, except where shown, over nonmissing weights and frequencies of one or more:
$x_i$ is the nonmissing value of the analyzed variable for observation $i$.
$f_i$ is the frequency that is associated with $x_i$ if you use a FREQ statement. If you omit the FREQ statement, then $f_i = 1$ for all $i$.
$w_i$ is the weight that is associated with $x_i$ if you use a WEIGHT statement. The base procedures automatically exclude the values of $x_i$ with missing weights from the analysis. By default, the base procedures treat a negative weight as if it is equal to zero. However, if you use the EXCLNPWGT option in the PROC statement, then the procedure also excludes those values of $x_i$ with nonpositive weights. Note that most SAS/STAT procedures, such as PROC TTEST and PROC GLM, exclude values with nonpositive weights by default. If you omit the WEIGHT statement, then $w_i = 1$ for all $i$.
$n$ is the number of nonmissing values of $x_i$, $\sum f_i$. If you use the EXCLNPWGT option and the WEIGHT statement, then $n$ is the number of nonmissing values with positive weights.
$\bar{x}$ is the mean

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$

$s^2$ is the variance

$$s^2 = \frac{1}{d} \sum w_i (x_i - \bar{x})^2$$

where $d$ is the variance divisor (the VARDEF= option) that you specify in the PROC statement. Valid values are as follows:

When VARDEF= | $d$ equals
---|---
N | $n$
DF | $n - 1$
WEIGHT | $\sum w_i$
WDF | $\sum w_i - 1$

The default is DF.
$z_i$ is the standardized variable

$$z_i = \frac{x_i - \bar{x}}{s}$$
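The notation above can be checked numerically. The following is a minimal Python sketch, not SAS code: the function name `simple_stats` is hypothetical, and it assumes no FREQ statement (every $f_i = 1$), so $n$ is just the number of values passed in. Missing values and nonpositive weights are assumed to have been removed already.

```python
import math

def simple_stats(x, w=None, vardef="DF"):
    """Return (mean, variance, standardized values) for nonmissing x.

    Assumes no FREQ statement (f_i = 1), so n is simply len(x).
    """
    if w is None:
        w = [1.0] * len(x)          # no WEIGHT statement: w_i = 1
    n = len(x)
    sum_w = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    # Variance divisor d for each VARDEF= value
    d = {"N": n, "DF": n - 1, "WEIGHT": sum_w, "WDF": sum_w - 1}[vardef.upper()]
    variance = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / d
    s = math.sqrt(variance)
    z = [(xi - mean) / s for xi in x]   # standardized values z_i
    return mean, variance, z

mean, var_n, _ = simple_stats([2, 4, 4, 4, 5, 5, 7, 9], vardef="N")
print(mean, var_n)   # 5.0 4.0
```

The sketch does not guard against a zero divisor (for example, a single value with VARDEF=DF); a procedure would report the variance as missing in that case.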
The standard keywords and formulas for each statistic follow. Some formulas use keywords to designate the corresponding statistic.
Statistic | PROC MEANS and SUMMARY | PROC UNIVARIATE | PROC TABULATE | PROC REPORT | PROC CORR | PROC SQL
---|---|---|---|---|---|---
Number of missing values | X | X | X | X |  | X
Number of nonmissing values | X | X | X | X | X | X
Number of observations | X | X |  |  |  | X
Sum of weights | X | X | X | X | X | X
Mean | X | X | X | X | X | X
Sum | X | X | X | X | X | X
Extreme values | X | X |  |  |  |
Minimum | X | X | X | X | X | X
Maximum | X | X | X | X | X | X
Range | X | X | X | X |  | X
Uncorrected sum of squares | X | X | X | X | X | X
Corrected sum of squares | X | X | X | X | X | X
Variance | X | X | X | X | X | X
Covariance |  |  |  |  | X |
Standard deviation | X | X | X | X | X | X
Standard error of the mean | X | X | X | X |  | X
Coefficient of variation | X | X | X | X |  | X
Skewness | X | X | X |  |  |
Kurtosis | X | X | X |  |  |
Confidence limits of the mean | X | X | X |  |  |
Confidence limits of the variance |  | X |  |  |  |
Confidence limits of quantiles |  | X |  |  |  |
Median | X | X | X | X | X |
Mode | X | X | X | X |  |
Percentiles/Deciles/Quartiles | X | X | X | X |  |
t test for mean=0 | X | X | X | X |  | X
t test for mean=$\mu_0$ |  | X |  |  |  |
Nonparametric tests for location |  | X |  |  |  |
Tests for normality |  | X |  |  |  |
Correlation coefficients |  |  |  |  | X |
Cronbach's alpha |  |  |  |  | X |

Descriptive Statistics
The keywords for descriptive statistics are
CSS is the sum of squares corrected for the mean, computed as

$$\sum w_i (x_i - \bar{x})^2$$

CV is the percent coefficient of variation, computed as

$$\frac{100\, s}{\bar{x}}$$

KURTOSIS or KURT is the kurtosis, which measures heaviness of tails. When VARDEF=DF, the kurtosis is computed as

$$c_{4n} \sum z_i^4 - \frac{3(n-1)^2}{(n-2)(n-3)}, \qquad c_{4n} = \frac{n(n+1)}{(n-1)(n-2)(n-3)}$$

When VARDEF=N, the kurtosis is computed as

$$\frac{1}{n} \sum z_i^4 - 3$$

Note: PROC MEANS and PROC TABULATE do not compute weighted kurtosis.
MAX is the maximum value of $x_i$.
MEAN is the arithmetic mean $\bar{x}$.
MIN is the minimum value of $x_i$.
MODE is the most frequent value of $x_i$.
Note: When QMETHOD=P2, PROC REPORT, PROC MEANS, and PROC TABULATE do not compute MODE.
N is the number of $x_i$ values that are not missing. Observations with $f_i$ less than one and $w_i$ equal to missing or $w_i \le 0$ (when you use the EXCLNPWGT option) are excluded from the analysis and are not included in the calculation of N.
NMISS is the number of $x_i$ values that are missing. Observations with $f_i$ less than one and $w_i$ equal to missing or $w_i \le 0$ (when you use the EXCLNPWGT option) are excluded from the analysis and are not included in the calculation of NMISS.
NOBS is the total number of observations and is calculated as the sum of N and NMISS. However, if you use the WEIGHT statement, then NOBS is calculated as the sum of N, NMISS, and the number of observations excluded because of missing or nonpositive weights.
RANGE is the range and is calculated as the difference between the maximum value and the minimum value.
SKEWNESS or SKEW is skewness, which measures the tendency of the deviations to be larger in one direction than in the other. When VARDEF=DF, the skewness is computed as

$$c_{3n} \sum z_i^3, \qquad c_{3n} = \frac{n}{(n-1)(n-2)}$$

When VARDEF=N, the skewness is computed as

$$\frac{1}{n} \sum z_i^3$$

Note: PROC MEANS and PROC TABULATE do not compute weighted skewness.
STDDEV or STD is the standard deviation $s$ and is computed as the square root of the variance, $s^2$.
STDERR or STDMEAN is the standard error of the mean, computed as

$$\frac{s}{\sqrt{\sum w_i}}$$

SUM is the sum, computed as

$$\sum w_i x_i$$

SUMWGT is the sum of the weights, $W$, computed as

$$\sum w_i$$

USS is the uncorrected sum of squares, computed as

$$\sum w_i x_i^2$$

VAR is the variance $s^2$.
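As a rough check on the descriptive statistics in this section, here is a minimal Python sketch for the unweighted case ($w_i = 1$, $f_i = 1$). It is not SAS code; the function name `descriptive` and the returned dictionary layout are hypothetical. It assumes the usual forms of the statistics: CSS $= \sum (x_i - \bar{x})^2$, USS $= \sum x_i^2$, CV $= 100 s / \bar{x}$, STDERR $= s / \sqrt{n}$, and skewness and kurtosis built from the standardized values $z_i$. It also assumes at least four values, a nonzero mean, and a nonzero standard deviation.

```python
import math

def descriptive(x, vardef="DF"):
    """Unweighted CSS, USS, CV, STDERR, skewness, and kurtosis (needs n >= 4)."""
    n = len(x)
    mean = sum(x) / n
    css = sum((xi - mean) ** 2 for xi in x)      # corrected sum of squares
    uss = sum(xi ** 2 for xi in x)               # uncorrected sum of squares
    d = n - 1 if vardef.upper() == "DF" else n   # only DF and N are handled here
    s = math.sqrt(css / d)
    z = [(xi - mean) / s for xi in x]
    if vardef.upper() == "DF":
        skew = n / ((n - 1) * (n - 2)) * sum(zi ** 3 for zi in z)
        c4 = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
        kurt = c4 * sum(zi ** 4 for zi in z) - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
        stderr = s / math.sqrt(n)
    else:
        skew = sum(zi ** 3 for zi in z) / n
        kurt = sum(zi ** 4 for zi in z) / n - 3
        stderr = None                            # STDERR requires VARDEF=DF
    return {"CSS": css, "USS": uss, "STD": s, "CV": 100 * s / mean,
            "STDERR": stderr, "SKEWNESS": skew, "KURTOSIS": kurt}

stats = descriptive([1, 2, 3, 4, 5])
print(stats["CSS"], stats["USS"])   # 10.0 55
```

For the symmetric sample 1 through 5 the skewness is zero, and the kurtosis is about -1.2 with VARDEF=DF and about -1.3 with VARDEF=N, which shows how much the divisor matters for small samples.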
Quantile and Related Statistics
The keywords for quantiles and related statistics are
MEDIAN or P50 is the middle value.
P1 is the 1st percentile.
P5 is the 5th percentile.
P10 is the 10th percentile.
P90 is the 90th percentile.
P95 is the 95th percentile.
P99 is the 99th percentile.
Q1 or P25 is the lower quartile (25th percentile).
Q3 or P75 is the upper quartile (75th percentile).
QRANGE is the interquartile range and is calculated as

$$Q_3 - Q_1$$

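When you use the WEIGHT statement, the $t$th percentile is computed as

$$y = \begin{cases} \frac{1}{2}(x_i + x_{i+1}) & \text{if } \sum_{j=1}^{i} w_j = pW \\ x_{i+1} & \text{if } \sum_{j=1}^{i} w_j < pW < \sum_{j=1}^{i+1} w_j \end{cases}$$

where $x_1 \le x_2 \le \dots \le x_n$ are the ordered values, $w_j$ is the weight associated with $x_j$, $p = t/100$, and $W = \sum_{i=1}^{n} w_i$ is the sum of the weights.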
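A weighted percentile of this kind can be sketched in a few lines of Python. This is an illustration, not SAS code: the function name `weighted_percentile` is hypothetical, and the rule it implements is an assumption stated here in full. Sort the values, accumulate their weights, and compare the running total with $pW$, where $p = t/100$ and $W$ is the sum of the weights. If the running total equals $pW$ at some value, return the average of that value and the next one; otherwise return the first value whose running total exceeds $pW$.

```python
import math

def weighted_percentile(x, w, t):
    """t-th percentile (0 < t < 100) of x with positive weights w."""
    pairs = sorted(zip(x, w))                 # order by value, keep weights attached
    total = sum(wi for _, wi in pairs)        # W, the sum of the weights
    target = (t / 100.0) * total              # pW
    cum = 0.0
    for i, (xi, wi) in enumerate(pairs):
        cum += wi
        # Running total hits pW exactly: average this value and the next one
        if math.isclose(cum, target) and i + 1 < len(pairs):
            return 0.5 * (xi + pairs[i + 1][0])
        # Running total first exceeds pW: this value is the percentile
        if cum > target:
            return xi
    return pairs[-1][0]

print(weighted_percentile([10, 20, 30], [1, 2, 1], 50))   # 20
print(weighted_percentile([1, 2, 3, 4], [1, 1, 1, 1], 50))  # 2.5
```

With equal weights the result matches the familiar "average the two middle values" median, which is a quick sanity check on the rule.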
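The QNTLDEF= option specifies the method that the procedure uses to compute the percentile $y$. In the following table, $n$ is the number of nonmissing values, $x_1 \le x_2 \le \dots \le x_n$ are the ordered values, and $p = t/100$ for the $t$th percentile. The integer part $j$ and the fractional part $g$ are defined by $np = j + g$ when QNTLDEF=1, 2, 3, or 5, and by $(n+1)p = j + g$ when QNTLDEF=4.

QNTLDEF= | Description | Formula
---|---|---
1 | weighted average at $x_{np}$ | $y = (1-g)\,x_j + g\,x_{j+1}$, where $x_0$ is taken to be $x_1$
2 | observation numbered closest to $np$ | $y = x_i$ if $g \ne \frac{1}{2}$; $y = x_j$ if $g = \frac{1}{2}$ and $j$ is even; $y = x_{j+1}$ if $g = \frac{1}{2}$ and $j$ is odd; where $i$ is the integer part of $np + \frac{1}{2}$
3 | empirical distribution function | $y = x_j$ if $g = 0$; $y = x_{j+1}$ if $g > 0$
4 | weighted average aimed at $x_{(n+1)p}$ | $y = (1-g)\,x_j + g\,x_{j+1}$, where $x_{n+1}$ is taken to be $x_n$
5 | empirical distribution function with averaging | $y = \frac{1}{2}(x_j + x_{j+1})$ if $g = 0$; $y = x_{j+1}$ if $g > 0$
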
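The five percentile definitions are easier to compare side by side in code. The following Python sketch is not SAS code, and the function name `percentile` is hypothetical. It assumes the conventional reading of the definitions: with ordered values $x_1 \le \dots \le x_n$ and $p = t/100$, write $np = j + g$ (or $(n+1)p = j + g$ for definition 4), with $j$ the integer part and $g$ the fractional part. Rounding $np$ to ten decimals is a choice made for this sketch to suppress floating-point noise; it does not reproduce any tolerance that SAS applies internally.

```python
import math

def percentile(values, t, qntldef=5):
    """t-th percentile of unweighted values under definitions 1 through 5."""
    x = sorted(values)
    n = len(x)
    p = t / 100.0
    pos = round((n + 1) * p if qntldef == 4 else n * p, 10)
    j = math.floor(pos)      # integer part
    g = pos - j              # fractional part

    def at(k):
        # 1-based lookup, clamped so that x_0 -> x_1 and x_{n+1} -> x_n
        return x[min(max(k, 1), n) - 1]

    if qntldef in (1, 4):    # weighted average of neighbouring values
        return (1 - g) * at(j) + g * at(j + 1)
    if qntldef == 2:         # observation numbered closest to np
        if g != 0.5:
            return at(math.floor(pos + 0.5))
        return at(j) if j % 2 == 0 else at(j + 1)
    if qntldef == 3:         # empirical distribution function
        return at(j) if g == 0 else at(j + 1)
    if qntldef == 5:         # empirical distribution function with averaging
        return 0.5 * (at(j) + at(j + 1)) if g == 0 else at(j + 1)
    raise ValueError("qntldef must be 1, 2, 3, 4, or 5")

data = list(range(1, 11))
print([percentile(data, 25, d) for d in (1, 2, 3, 4, 5)])  # [2.5, 2, 3, 2.75, 3]
```

For the values 1 through 10, the 25th percentile ranges from 2 to 3 depending on the definition, so it is worth stating the definition whenever quartiles from different tools are compared.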
Hypothesis Testing Statistics
The keywords for hypothesis testing statistics are
T is the Student's t statistic to test the null hypothesis that the population mean is equal to $\mu_0$ and is calculated as

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{\sum w_i}}$$

By default, when you use a WEIGHT statement, the procedure counts the $x_i$ values with nonpositive weights in the degrees of freedom. Use the EXCLNPWGT option in the PROC statement to exclude values with nonpositive weights. Most SAS/STAT procedures, such as PROC TTEST and PROC GLM, automatically exclude values with nonpositive weights.
PROBT or PRT is the two-tailed p-value for Student's t statistic, T, with $n - 1$ degrees of freedom. This value is the probability under the null hypothesis of obtaining a more extreme value of T than is observed in this sample.
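The t statistic and its two-tailed p-value can be reproduced with the Python standard library alone. The sketch below is not SAS code, and the function names are hypothetical. It assumes unweighted data, so the standard error is $s/\sqrt{n}$ with $n - 1$ degrees of freedom, and it obtains the p-value by integrating the Student's t density numerically with Simpson's rule rather than by calling a statistics library.

```python
import math

def t_statistic(x, mu0=0.0):
    """Return (t, degrees of freedom) for H0: population mean = mu0."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    return (mean - mu0) / (s / math.sqrt(n)), n - 1

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_tailed_p(t, df, steps=20000):
    """P(|T| > |t|), using Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = 2 * total * h / 3          # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central)

t, df = t_statistic([1, 2, 3, 4, 5], mu0=0.0)
print(round(t, 4), df)                   # 4.2426 4
```

A convenient check is the case of one degree of freedom, where the distribution is Cauchy and `two_tailed_p(1.0, 1)` should come out at 0.5.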
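Confidence Limits for the Mean
The keywords for confidence limits are
CLM is the two-sided confidence limit for the mean. A two-sided $100(1-\alpha)$ percent confidence interval for the mean has upper and lower limits

$$\bar{x} \pm t_{(1-\alpha/2;\, n-1)} \frac{s}{\sqrt{\sum w_i}}$$

where $t_{(q;\, n-1)}$ is the $q$ critical value of Student's t distribution with $n - 1$ degrees of freedom.
LCLM is the one-sided confidence limit below the mean. The one-sided $100(1-\alpha)$ percent confidence interval for the mean has the lower limit

$$\bar{x} - t_{(1-\alpha;\, n-1)} \frac{s}{\sqrt{\sum w_i}}$$

UCLM is the one-sided confidence limit above the mean. The one-sided $100(1-\alpha)$ percent confidence interval for the mean has the upper limit

$$\bar{x} + t_{(1-\alpha;\, n-1)} \frac{s}{\sqrt{\sum w_i}}$$
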
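The confidence limits are the mean plus or minus a t critical value times the standard error. The Python sketch below is not SAS code and the function name `confidence_limits` is hypothetical. It assumes unweighted data (standard error $s/\sqrt{n}$) and takes the critical value as an argument, for example from a printed t table, instead of computing it.

```python
import math

def confidence_limits(x, t_crit):
    """Return (lower, upper) limits: mean -/+ t_crit * s / sqrt(n)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Two-sided 95% limits for n = 5: critical value for 0.975 with 4 df is about 2.776
lower, upper = confidence_limits([1, 2, 3, 4, 5], t_crit=2.776)
print(round(lower, 3), round(upper, 3))   # 1.037 4.963
```

For a one-sided limit, pass the one-sided critical value instead (about 2.132 for 95 percent with 4 degrees of freedom) and use only the lower or the upper element of the result.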
Using Weights
For more information on using weights and an example, see WEIGHT.
Data Requirements for Summarization Procedures
The following are the minimal data requirements to compute unweighted statistics and do not describe recommended sample sizes. Statistics are reported as missing if VARDEF=DF (the default) and the following requirements are not met:
N and NMISS are computed regardless of the number of missing or nonmissing observations.
SUM, MEAN, MAX, MIN, RANGE, USS, and CSS require at least one nonmissing observation.
VAR, STD, STDERR, CV, T, PRT, and PROBT require at least two nonmissing observations.
SKEWNESS requires at least three nonmissing observations.
KURTOSIS requires at least four nonmissing observations.
SKEWNESS, KURTOSIS, T, PROBT, and PRT require that STD is greater than zero.
CV requires that MEAN is not equal to zero.
CLM, LCLM, UCLM, STDERR, T, PRT, and PROBT require that VARDEF=DF.
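The requirements listed above can be encoded as a small lookup, which is handy when deciding ahead of time which statistics will come back missing. This Python sketch is not SAS code; the names `MIN_N`, `NEEDS_POSITIVE_STD`, `NEEDS_VARDEF_DF`, and `is_reportable` are hypothetical. It encodes only the rules stated in the list, so it applies no minimum count to CLM, LCLM, or UCLM because the list gives none.

```python
# Minimum number of nonmissing observations for each statistic
MIN_N = {
    **dict.fromkeys(["N", "NMISS"], 0),
    **dict.fromkeys(["SUM", "MEAN", "MAX", "MIN", "RANGE", "USS", "CSS"], 1),
    **dict.fromkeys(["VAR", "STD", "STDERR", "CV", "T", "PRT", "PROBT"], 2),
    "SKEWNESS": 3,
    "KURTOSIS": 4,
}
NEEDS_POSITIVE_STD = {"SKEWNESS", "KURTOSIS", "T", "PROBT", "PRT"}
NEEDS_VARDEF_DF = {"CLM", "LCLM", "UCLM", "STDERR", "T", "PRT", "PROBT"}

def is_reportable(stat, n, std=None, mean=None, vardef="DF"):
    """Return True if the listed requirements for `stat` are all met."""
    stat = stat.upper()
    if n < MIN_N.get(stat, 0):
        return False
    if stat in NEEDS_POSITIVE_STD and not (std is not None and std > 0):
        return False
    if stat == "CV" and (mean is None or mean == 0):
        return False
    if stat in NEEDS_VARDEF_DF and vardef.upper() != "DF":
        return False
    return True

print(is_reportable("KURTOSIS", n=3, std=1.2))   # False
print(is_reportable("KURTOSIS", n=4, std=1.2))   # True
```

The observation counts in the list are stated for VARDEF=DF, so results from this sketch for other VARDEF= values reflect only the final rule about which statistics require VARDEF=DF.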
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.