OUTPUT Statement: CAPABILITY Procedure

Syntax: OUTPUT Statement

The syntax for the OUTPUT statement is as follows:

OUTPUT <OUT=SAS-data-set> <keyword1=names …keywordk=names> <percentile-options> ;

You can use any number of OUTPUT statements in the CAPABILITY procedure. Each OUTPUT statement creates a new data set containing the statistics specified in that statement. When you use the OUTPUT statement, you must also use the VAR statement. In addition, the OUTPUT statement must contain at least one of the following: a keyword=names specification, or the PCTLPTS= option together with the PCTLPRE= option.

You can use the OUT= option to specify the name of the output data set:

OUT=SAS-data-set

specifies the name of the output data set. To create a permanent SAS data set, specify a two-level name. See SAS Statements: Reference for more information on permanent SAS data sets. For example, OUT=Summary creates an output data set named Summary. If the OUT= option is omitted, the new data set is named by using the DATAn convention.
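As a minimal sketch of the two-level name, the following statements write the statistics to a permanent data set. The libref MyLib, its directory path, the data set Parts, and the variable Len are hypothetical names chosen for illustration; they do not come from this documentation.

```sas
/* Hypothetical libref and path: point LIBNAME at a directory you can write to. */
libname MyLib '/path/to/project/data';

/* Hypothetical input data: 50 simulated part lengths. */
data Parts;
   call streaminit(2718);
   do Part = 1 to 50;
      Len = rand('normal', 10, 0.1);
      output;
   end;
run;

proc capability data=Parts noprint;
   var Len;
   /* The two-level name MyLib.Summary makes the output data set permanent. */
   output out=MyLib.Summary mean=MeanLen std=StdLen;
run;
```

With a one-level name such as OUT=Summary, the data set would instead be written to the default (usually temporary) library.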

A keyword=names specification selects a statistic to be included in the output data set and gives names to the new variables that contain the statistics. Specify a keyword for each desired statistic, an equal sign, and the names of the variables to contain the statistic.

In the output data set, the first variable listed after a keyword in the OUTPUT statement contains the statistic for the first variable listed in the VAR statement; the second variable contains the statistic for the second variable in the VAR statement, and so on. The list of names following the equal sign can be shorter than the list of variables in the VAR statement. In this case, the procedure uses the names in the order in which the variables are listed in the VAR statement. Consider the following example:

proc capability noprint;
   var length width height;
   output out=summary mean=mlength mwidth;
run;

The variables mlength and mwidth contain the means for length and width. The mean for height is computed by the procedure but is not saved in the output data set.
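Because each OUTPUT statement creates its own data set, one PROC step can save several groups of statistics. The sketch below assumes a hypothetical data set Parts with analysis variables Len, Wid, and Hgt; all data set and variable names are illustrative.

```sas
/* Hypothetical input data: simulated dimensions for 50 parts. */
data Parts;
   call streaminit(1);
   do Part = 1 to 50;
      Len = rand('normal', 10, 0.1);
      Wid = rand('normal', 5, 0.05);
      Hgt = rand('normal', 2, 0.02);
      output;
   end;
run;

proc capability data=Parts noprint;
   var Len Wid Hgt;
   /* First data set: one name per analysis variable for each keyword. */
   output out=Location mean=MeanLen MeanWid MeanHgt
                       median=MedLen MedWid MedHgt;
   /* Second data set: only two names, so the statistic for Hgt is not saved. */
   output out=Spread std=StdLen StdWid;
run;
```

Each output data set holds only the statistics named in its own OUTPUT statement.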

Table 5.52 lists all keywords available in the OUTPUT statement grouped by type. Formulas for selected statistics are given in the section Details: CAPABILITY Procedure.

Table 5.52: OUTPUT Statement Statistic Keywords

Keyword              Description

Descriptive Statistics
CSS                  sum of squares corrected for the mean
CV                   percent coefficient of variation
KURTOSIS | KURT      kurtosis
MAX                  largest (maximum) value
MEAN                 mean
MIN                  smallest (minimum) value
MODE                 most frequent value (if not unique, the smallest mode)
N                    number of observations on which calculations are based
NMISS                number of missing values
NOBS                 number of observations
RANGE                range
SKEWNESS | SKEW      skewness
STD | STDDEV         standard deviation
STDMEAN | STDERR     standard error of the mean
SUM                  sum
SUMWGT               sum of weights
USS                  uncorrected sum of squares
VAR                  variance

Quantile Statistics
MEDIAN | P50 | Q2    median (50th percentile)
P1                   1st percentile
P5                   5th percentile
P10                  10th percentile
P90                  90th percentile
P95                  95th percentile
P99                  99th percentile
Q1 | P25             lower quartile (25th percentile)
Q3 | P75             upper quartile (75th percentile)
QRANGE               interquartile range (Q3 – Q1)

Robust Statistics
GINI                 Gini’s mean difference
MAD                  median absolute difference
QN                   2nd variation of median absolute difference
SN                   1st variation of median absolute difference
STD_GINI             standard deviation for Gini’s mean difference
STD_MAD              standard deviation for median absolute difference
STD_QN               standard deviation for the second variation of the median absolute difference
STD_QRANGE           estimate of the standard deviation, based on interquartile range
STD_SN               standard deviation for the first variation of the median absolute difference

Hypothesis Test Statistics
MSIGN                sign statistic
NORMAL               test statistic for normality. If the sample size is less than or equal to 2000, this is the Shapiro-Wilk W statistic. Otherwise, it is the Kolmogorov D statistic.
PNORMAL | PROBN      p-value for normality test
PROBM                probability of a greater absolute value for the sign statistic
PROBS                probability of a greater absolute value for the signed rank statistic
PROBT                two-tailed p-value for Student’s t statistic with $n-1$ degrees of freedom
SIGNRANK             signed rank statistic
T                    Student’s t statistic to test the null hypothesis that the population mean is equal to $\mu _0$

Specification Limits and Related Statistics
LSL                  lower specification limit
PCTGTR               percent of nonmissing observations greater than the upper specification limit
PCTLSS               percent of nonmissing observations less than the lower specification limit
TARGET               target value
USL                  upper specification limit

Capability Indices and Related Statistics
CP                   capability index $C_{p}$
CPLCL                lower confidence limit for $C_{p}$
CPUCL                upper confidence limit for $C_{p}$
CPK                  capability index $C_{pk}$ (also denoted CPK)
CPKLCL               lower confidence limit for $C_{pk}$
CPKUCL               upper confidence limit for $C_{pk}$
CPL                  capability index CPL
CPLLCL               lower confidence limit for $CPL$
CPLUCL               upper confidence limit for $CPL$
CPM                  capability index $C_{pm}$
CPMLCL               lower confidence limit for $C_{pm}$
CPMUCL               upper confidence limit for $C_{pm}$
CPU                  capability index CPU
CPULCL               lower confidence limit for $CPU$
CPUUCL               upper confidence limit for $CPU$
K                    capability index k (also denoted K)
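The specification-limit and capability-index keywords depend on specification limits being available. The sketch below assumes that the limits are supplied with a SPEC statement; the data set Parts, the variable Len, the limit values, and the output variable names are all hypothetical.

```sas
/* Hypothetical input data: 100 simulated part lengths. */
data Parts;
   call streaminit(42);
   do Part = 1 to 100;
      Len = rand('normal', 10, 0.1);
      output;
   end;
run;

proc capability data=Parts noprint;
   /* Assumed specification limits and target for Len. */
   spec lsl=9.7 usl=10.3 target=10;
   var Len;
   /* Save the limits, the out-of-spec percentages, and three indices. */
   output out=CapStats lsl=LowerSpec usl=UpperSpec target=Target
                       pctlss=PctBelow pctgtr=PctAbove
                       cp=Cp cpk=Cpk cpm=Cpm;
run;

proc print data=CapStats;
run;
```

A target value is included here because the index $C_{pm}$ is defined relative to a target.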


The CAPABILITY procedure automatically computes the 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and 99th percentiles for the data. You can save these statistics in an output data set by using keyword=names specifications. You can request additional percentiles by using the PCTLPTS= option. The following percentile-options are related to these additional percentiles:

CIPCTLDF=(cipctl-options)
CIQUANTDF=(cipctl-options)

requests distribution-free confidence limits for percentiles that are requested with the PCTLPTS= option. In other words, no specific parametric distribution such as the normal is assumed for the data. PROC CAPABILITY uses order statistics (ranks) to compute the confidence limits as described by Hahn and Meeker (1991). This option does not apply if you use a WEIGHT statement. You can specify the following cipctl-options:

ALPHA=$\alpha $

specifies the level of significance $\alpha $ for $100(1-\alpha )\% $ confidence intervals. The value $\alpha $ must be between 0 and 1. The default is the value of the ALPHA= option in the PROC statement; if that option is not specified, the default is 0.05, which results in 95% confidence intervals.

LOWERPRE=prefixes

specifies one or more prefixes that are used to create names for variables that contain the lower confidence limits. To save lower confidence limits for more than one analysis variable, specify a list of prefixes. The order of the prefixes corresponds to the order of the analysis variables in the VAR statement.

LOWERNAME=suffixes

specifies one or more suffixes that are used to create names for variables that contain the lower confidence limits. PROC CAPABILITY creates a variable name by combining the LOWERPRE= value and suffix name. Because the suffixes are associated with the requested percentiles, list the suffixes in the same order as the PCTLPTS= percentiles.

TYPE=keyword

specifies the type of confidence limit, where keyword is LOWER, UPPER, SYMMETRIC, or ASYMMETRIC. The default value is SYMMETRIC.

UPPERPRE=prefixes

specifies one or more prefixes that are used to create names for variables that contain the upper confidence limits. To save upper confidence limits for more than one analysis variable, specify a list of prefixes. The order of the prefixes corresponds to the order of the analysis variables in the VAR statement.

UPPERNAME=suffixes

specifies one or more suffixes that are used to create names for variables that contain the upper confidence limits. PROC CAPABILITY creates a variable name by combining the UPPERPRE= value and suffix name. Because the suffixes are associated with the requested percentiles, list the suffixes in the same order as the PCTLPTS= percentiles.

Note: See the entries for the PCTLPTS=, PCTLPRE=, and PCTLNAME= options for a detailed description of how variable names are created using prefixes, percentile values, and suffixes.
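As a hedged illustration of the CIPCTLDF= option, the following sketch requests 90% distribution-free limits for two percentiles. The data set Parts, the variable Len, and the prefixes P, LCL, and UCL are hypothetical names.

```sas
/* Hypothetical input data: 200 simulated part lengths. */
data Parts;
   call streaminit(7);
   do Part = 1 to 200;
      Len = rand('normal', 10, 0.1);
      output;
   end;
run;

proc capability data=Parts noprint;
   var Len;
   /* ALPHA=0.10 requests 90% limits for the 10th and 90th percentiles. */
   output out=PctlCI pctlpts=10 90 pctlpre=P
          cipctldf=(lowerpre=LCL upperpre=UCL alpha=0.10);
run;

proc print data=PctlCI;
run;
```

Under the naming rules described above, the percentiles should appear as P10 and P90, with limits named from the LCL and UCL prefixes and the percentile values.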

CIPCTLNORMAL=(cipctl-options)
CIQUANTNORMAL=(cipctl-options)

requests confidence limits based on the assumption that the data are normally distributed for percentiles that are requested with the PCTLPTS= option. The computational method is described in Section 4.4.1 of Hahn and Meeker (1991) and uses the noncentral $t$ distribution as given by Odeh and Owen (1980). This option does not apply if you use a WEIGHT statement. You can specify the following cipctl-options:

ALPHA=$\alpha $

specifies the level of significance $\alpha $ for $100(1-\alpha )\% $ confidence intervals. The value $\alpha $ must be between 0 and 1. The default is the value of the ALPHA= option in the PROC statement; if that option is not specified, the default is 0.05, which results in 95% confidence intervals.

LOWERPRE=prefixes

specifies one or more prefixes that are used to create names for variables that contain the lower confidence limits. To save lower confidence limits for more than one analysis variable, specify a list of prefixes. The order of the prefixes corresponds to the order of the analysis variables in the VAR statement.

LOWERNAME=suffixes

specifies one or more suffixes that are used to create names for variables that contain the lower confidence limits. PROC CAPABILITY creates a variable name by combining the LOWERPRE= value and suffix name. Because the suffixes are associated with the requested percentiles, list the suffixes in the same order as the PCTLPTS= percentiles.

TYPE=keyword

specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED. The default is TWOSIDED.

UPPERPRE=prefixes

specifies one or more prefixes that are used to create names for variables that contain the upper confidence limits. To save upper confidence limits for more than one analysis variable, specify a list of prefixes. The order of the prefixes corresponds to the order of the analysis variables in the VAR statement.

UPPERNAME=suffixes

specifies one or more suffixes that are used to create names for variables that contain the upper confidence limits. PROC CAPABILITY creates a variable name by combining the UPPERPRE= value and suffix name. Because the suffixes are associated with the requested percentiles, list the suffixes in the same order as the PCTLPTS= percentiles.

Note: See the entries for the PCTLPTS=, PCTLPRE=, and PCTLNAME= options for a detailed description of how variable names are created using prefixes, percentile values, and suffixes.
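The following sketch shows one way the CIPCTLNORMAL= option might be combined with prefix lists and suffixes for two analysis variables. The data set Parts, the variables Len and Wid, and every prefix and suffix shown are hypothetical choices, not names defined by the procedure.

```sas
/* Hypothetical input data: simulated lengths and widths for 100 parts. */
data Parts;
   call streaminit(11);
   do Part = 1 to 100;
      Len = rand('normal', 10, 0.1);
      Wid = rand('normal', 5, 0.05);
      output;
   end;
run;

proc capability data=Parts noprint;
   var Len Wid;
   /* Prefixes follow the VAR order; suffixes follow the PCTLPTS= order. */
   output out=NormCI pctlpts=5 95 pctlpre=Len_ Wid_
          cipctlnormal=(lowerpre=LenLo_ WidLo_
                        upperpre=LenUp_ WidUp_
                        lowername=p5 p95
                        uppername=p5 p95);
run;

proc print data=NormCI;
run;
```

Because TYPE= is omitted, the limits are two-sided. These limits rely on the normality assumption, so it is worth checking that assumption before using them.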

PCTLGROUP=BYSTAT | BYVAR

specifies the order in which variables that you request with the PCTLPTS= option are added to the OUT= data set when the VAR statement lists more than one analysis variable. By default (or if you specify PCTLGROUP=BYSTAT), all variables that are associated with a percentile value are created consecutively. If you specify PCTLGROUP=BYVAR, all variables that are associated with an analysis variable are created consecutively.

Consider the following statements:

proc capability data=Score;
   var PreTest PostTest;
   output out=ByStat pctlpts=20 40 pctlpre=Pre_ Post_;
   output out=ByVar pctlgroup=byvar pctlpts=20 40 pctlpre=Pre_ Post_;
run;

The order of variables in the data set ByStat is Pre_20, Post_20, Pre_40, Post_40. The order of variables in the data set ByVar is Pre_20, Pre_40, Post_20, Post_40.

PCTLNAME=suffixes

provides name suffixes for the new variables created by the PCTLPTS= option. These suffixes are appended to the prefixes you specify with the PCTLPRE= option, replacing the percentile values that are used as suffixes by default. List the suffixes in the same order in which you specify the percentiles. If you specify n suffixes with the PCTLNAME= option and m percentile values with the PCTLPTS= option, where $m > n$, the suffixes are used to name the first n percentiles, and the default names are used for the remaining $m - n$ percentiles. For example, consider the following statements:

proc capability;
   var length width height;
   output pctlpts  = 20 40
          pctlpre  = pl pw ph
          pctlname = twenty;
run;

The value twenty in the PCTLNAME= option is used for only the first percentile in the PCTLPTS= list. This suffix is appended to the values in the PCTLPRE= option to generate the new variable names pltwenty, pwtwenty, and phtwenty, which contain the 20th percentiles for length, width, and height, respectively. Because a second PCTLNAME= suffix is not specified, variable names for the 40th percentiles for length, width, and height are generated using the prefixes and percentile values. Thus, the output data set contains the variables pltwenty, pl40, pwtwenty, pw40, phtwenty, and ph40.

PCTLNDEC=value

specifies the number of decimal places in percentile values that are incorporated into percentile variable names. The default value is 1. For example, the following statements create two output data sets, each containing one percentile variable. The variable in data set short is named pwid85_1, while the one in data set long is named pwid85_125.

proc capability;
   var width;
   output out=short pctlpts=85.125 pctlpre=pwid;
   output out=long  pctlpts=85.125 pctlpre=pwid pctlndec=3;
run;

PCTLPRE=prefixes

specifies prefixes used to create variable names for percentiles requested with the PCTLPTS= option. The PCTLPRE= and PCTLPTS= options must be used together.

The procedure generates new variable names by using the prefix and the percentile values. If the specified percentile is an integer, the variable name is simply the prefix followed by the value. For noninteger percentiles, an underscore replaces the decimal point in the variable name, and decimal values are truncated to one decimal place by default (use the PCTLNDEC= option to keep more decimal places). For example, the following statements create the variables pwid20, pwid33_3, pwid66_6, and pwid80 for the 20th, 33.33rd, 66.67th, and 80th percentiles of width, respectively:

proc capability noprint;
   var width;
   output pctlpts=20 33.33 66.67 80 pctlpre=pwid;
run;

If you request percentiles for more than one variable, you should list prefixes in the same order in which the variables appear in the VAR statement. For example, the following statements compute the 80th and 87.5th percentiles for length and width and save the new variables plength80, plength87_5, pwidth80, and pwidth87_5 in the output data set:

proc capability noprint;
   var length width;
   output pctlpts=80 87.5 pctlpre=plength pwidth;
run;

PCTLPTS=percentiles

specifies percentiles that are not automatically computed by the procedure. The CAPABILITY procedure automatically computes the 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and 99th percentiles for the data. These can be saved in an output data set by using keyword=names specifications. The PCTLPTS= option generates additional percentiles and outputs them to a data set; these additional percentiles are not printed.

If you use the PCTLPTS= option, you must also use the PCTLPRE= option to provide a prefix for the new variable names. For example, to create variables that contain the 20th, 40th, 60th, and 80th percentiles of length, use the following statements:

proc capability noprint;
   var length;
   output pctlpts=20 40 60 80 pctlpre=plen;
run;

This creates the variables plen20, plen40, plen60, and plen80, whose values are the corresponding percentiles of length. In addition to specifying name prefixes with the PCTLPRE= option, you can also use the PCTLNAME= option to create name suffixes for the new variables created by the PCTLPTS= option.
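To tie these options together, the sketch below saves an automatically computed percentile and two additional percentiles in the same data set. The data set Parts, the variable Len, and the names MedLen, plen, lo, and hi are hypothetical.

```sas
/* Hypothetical input data: 100 simulated part lengths. */
data Parts;
   call streaminit(5);
   do Part = 1 to 100;
      Len = rand('normal', 10, 0.1);
      output;
   end;
run;

proc capability data=Parts noprint;
   var Len;
   /* MEDIAN= uses a keyword; PCTLPTS= adds the 2.5th and 97.5th percentiles. */
   output out=LenStats median=MedLen
          pctlpts=2.5 97.5 pctlpre=plen pctlname=lo hi;
run;

proc print data=LenStats;
run;
```

Following the rules described for the PCTLPRE= and PCTLNAME= options, the additional percentiles should be stored in plenlo and plenhi rather than in the default names plen2_5 and plen97_5.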