The OUTPUT statement saves statistics and BY variables in an output data set. When you use a BY statement, each observation in the output data set corresponds to one of the BY groups. Otherwise, the output data set contains only one observation.
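As a minimal sketch of the BY-group behavior described above, the following statements create one output observation per group. The data set `Scores`, the variables `Group` and `Score`, and the output names `GroupStats`, `MeanScore`, and `StdScore` are hypothetical and chosen only for illustration.

```sas
/* Hypothetical data: two BY groups and one analysis variable */
data Scores;
   input Group $ Score @@;
   datalines;
A 10 A 12 A 15 B 20 B 22 B 27
;

/* BY-group processing expects the input to be sorted by the BY variable */
proc sort data=Scores;
   by Group;
run;

proc univariate data=Scores noprint;
   by Group;
   var Score;
   /* One observation per value of Group is written to GroupStats */
   output out=GroupStats mean=MeanScore std=StdScore;
run;

proc print data=GroupStats;
run;
```

Here `GroupStats` should contain two observations, one for each value of `Group`, each holding the BY variable together with the two requested statistics. Without the BY statement, the same OUTPUT statement would produce a single observation.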
You can use any number of OUTPUT statements in the UNIVARIATE procedure. Each OUTPUT statement creates a new data set to contain the statistics specified in that statement. You must use the VAR statement with the OUTPUT statement. The OUTPUT statement must contain a specification of the form keyword=names or the PCTLPTS= and PCTLPRE= options. See Example 4.7 and Example 4.8.
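For illustration, the sketch below uses two OUTPUT statements in a single PROC UNIVARIATE step, one with keyword=names specifications and one with the PCTLPTS= and PCTLPRE= options. The data set `Lengths`, the variable `Length`, and all output names are assumptions made for this example.

```sas
/* Hypothetical data set with one analysis variable */
data Lengths;
   input Length @@;
   datalines;
4.1 4.7 5.0 5.3 5.8 6.2 6.4 7.0 7.5 8.1
;

proc univariate data=Lengths noprint;
   var Length;
   /* First data set: statistics requested with keyword=names */
   output out=Moments mean=MeanLen std=StdLen;
   /* Second data set: 20th and 40th percentiles, named with prefix PLen */
   output out=Pctls pctlpts=20 40 pctlpre=PLen;
run;
```

Each OUTPUT statement creates its own data set: `Moments` holds `MeanLen` and `StdLen`, and `Pctls` holds the percentile variables `PLen20` and `PLen40`.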
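You can use the OUT= option to specify the name of the output data set. If you omit OUT=, the data set is named DATAn, where n is the smallest integer that makes the name unique.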
A keyword=names specification selects a statistic to be included in the output data set and gives the names of the new variables that contain the statistic. Specify a keyword for each desired statistic, followed by an equal sign, followed by the names of the variables to contain the statistic. In the output data set, the first variable listed after a keyword in the OUTPUT statement contains the statistic for the first variable listed in the VAR statement, the second variable contains the statistic for the second variable in the VAR statement, and so on. If the list of names following the equal sign is shorter than the list of variables in the VAR statement, the procedure matches the names to the VAR statement variables in the order in which those variables are listed, starting with the first. The available keywords are listed in Table 4.14.
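The positional pairing between names and VAR statement variables can be sketched as follows. The data set `Fish`, its variables, and the output variable names are hypothetical.

```sas
/* Hypothetical data: two analysis variables */
data Fish;
   input Length Width @@;
   datalines;
10.2 3.1 11.5 3.4 9.8 2.9 12.1 3.8 10.9 3.3
;

proc univariate data=Fish noprint;
   var Length Width;
   output out=FishStats
          mean=MeanLength MeanWidth   /* 1st name -> Length, 2nd name -> Width */
          median=MedLength MedWidth   /* same positional pairing */
          n=NLength;                  /* shorter list: the single name pairs with Length */
run;

proc print data=FishStats;
run;
```

Because names are paired with VAR statement variables by position, reordering the VAR statement changes which analysis variable each output variable describes. Keeping both lists in the same order avoids mislabeled statistics.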
Table 4.14: Statistical Keywords
Keyword | Description |
---|---|
Descriptive Statistic Keywords | |
CSS | Corrected sum of squares |
CV | Coefficient of variation |
GEOMEAN | Geometric mean |
KURTOSIS \| KURT | Kurtosis |
MAX | Largest value |
MEAN | Sample mean |
MIN | Smallest value |
MODE | Most frequent value |
N | Sample size |
NMISS | Number of missing values |
NOBS | Number of observations |
RANGE | Range |
SKEWNESS \| SKEW | Skewness |
STD \| STDDEV | Standard deviation |
STDMEAN \| STDERR | Standard error of the mean |
SUM | Sum of the observations |
SUMWGT | Sum of the weights |
USS | Uncorrected sum of squares |
VAR | Variance |
Quantile Statistic Keywords | |
P1 | 1st percentile |
P5 | 5th percentile |
P10 | 10th percentile |
Q1 \| P25 | Lower quartile (25th percentile) |
MEDIAN \| Q2 \| P50 | Median (50th percentile) |
Q3 \| P75 | Upper quartile (75th percentile) |
P90 | 90th percentile |
P95 | 95th percentile |
P99 | 99th percentile |
QRANGE | Interquartile range (Q3 - Q1) |
Robust Statistic Keywords | |
GINI | Gini’s mean difference |
MAD | Median absolute difference about the median |
QN | $Q_n$, alternative to MAD |
SN | $S_n$, alternative to MAD |
STD_GINI | Gini’s standard deviation |
STD_MAD | MAD standard deviation |
STD_QN | $Q_n$ standard deviation |
STD_QRANGE | Interquartile range standard deviation |
STD_SN | $S_n$ standard deviation |
Hypothesis Testing Keywords | |
MSIGN | Sign statistic |
NORMALTEST | Test statistic for normality |
SIGNRANK | Signed rank statistic |
PROBM | Probability of a greater absolute value for the sign statistic |
PROBN | Probability value for the test of normality |
PROBS | Probability value for the signed rank test |
PROBT | Probability value for the Student’s t test |
T | Statistic for the Student’s t test |
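As a brief sketch, keywords from different groups of Table 4.14 can be combined in one OUTPUT statement. The data set `Times`, the variable `Minutes`, and the output names below are assumptions for illustration only.

```sas
/* Hypothetical data set with one analysis variable */
data Times;
   input Minutes @@;
   datalines;
12 15 11 19 22 14 13 17 25 16
;

proc univariate data=Times noprint;
   var Minutes;
   output out=TimeSummary
          n=NMinutes                                       /* descriptive keyword */
          q1=Q1Minutes median=MedMinutes q3=Q3Minutes      /* quantile keywords   */
          qrange=IQRMinutes
          mad=MADMinutes;                                  /* robust keyword      */
run;

proc print data=TimeSummary;
run;
```

Because no BY statement is used, `TimeSummary` should contain a single observation with one variable per requested statistic.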
The UNIVARIATE procedure automatically computes the 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and 99th percentiles for the data. These can be saved in an output data set by using keyword=names specifications. You can request additional percentiles by using the PCTLPTS= option. The following percentile-options are related to these additional percentiles: