If you request a fitted parametric distribution, printed output summarizing the fit is produced in addition to the graphical display. Figure 5.16 shows the printed output for a fitted lognormal distribution requested by the following statements:
   ods graphics off;
   proc capability data=Hang;
      spec target=14 lsl=13.95 usl=14.05;
      hist / lognormal(color=black indices midpercents);
   run;
Figure 5.16: Sample Summary of Fitted Distribution
Parameters for Lognormal Distribution

Parameter   Symbol   Estimate
Threshold   Theta    0
Scale       Zeta     2.638966
Shape       Sigma    0.001497
Mean                 13.99873
Std Dev              0.020952

Goodness-of-Fit Tests for Lognormal Distribution

Test                 Statistic             DF   p Value
Kolmogorov-Smirnov   D        0.09148348        Pr > D        >0.150
Cramer-von Mises     W-Sq     0.05040427        Pr > W-Sq     >0.500
Anderson-Darling     A-Sq     0.33476355        Pr > A-Sq     >0.500
Chi-Square           Chi-Sq   2.87938822    3   Pr > Chi-Sq    0.411

Percent Outside Specifications for Lognormal Distribution

Lower Limit                  Upper Limit
LSL             13.950000    USL             14.050000
Obs Pct < LSL    2.000000    Obs Pct > USL    0
Est Pct < LSL    0.992170    Est Pct > USL    0.728125

Capability Indices Based on Lognormal Distribution

Cp    0.795463
CPL   0.776822
CPU   0.814021
Cpk   0.776822
Cpm   0.792237

Histogram Bin Percents for Lognormal Distribution

                     Percent
Bin Midpoint   Observed   Estimated
13.95             4.000       2.963
13.97            18.000      15.354
13.99            26.000      33.872
14.01            38.000      32.055
14.03            10.000      13.050
14.05             4.000       2.281

Quantiles for Lognormal Distribution

                Quantile
Percent   Observed   Estimated
 1.0       13.9440     13.9501
 5.0       13.9656     13.9643
10.0       13.9710     13.9719
25.0       13.9860     13.9846
50.0       14.0018     13.9987
75.0       14.0129     14.0129
90.0       14.0218     14.0256
95.0       14.0241     14.0332
99.0       14.0470     14.0475

The summary is organized into the following parts:
Parameters
Chi-Square Goodness-of-Fit Test
EDF Goodness-of-Fit Tests
Specifications
Indices Using the Fitted Curve
Histogram Intervals
Quantiles
These parts are described in the sections that follow.
The Parameters part of the summary lists the parameters for the fitted curve as well as the estimated mean and estimated standard deviation. See Formulas for Fitted Curves.
The chi-square goodness-of-fit statistic for a fitted parametric distribution is computed as follows:

\[ \chi^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i} \]

where
$O_i$ = observed value in $i$th histogram interval
$E_i$ = expected value in $i$th histogram interval
$m$ = number of histogram intervals
$p$ = number of estimated parameters
The degrees of freedom for the chi-square test is equal to $m - p - 1$. You can save the observed and expected interval values in the OUTFIT= data set discussed in Output Data Sets.
Note that empty intervals are not combined, and the range of intervals used to compute $\chi^2$ begins with the first interval containing observations and ends with the final interval containing observations.
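As a rough numerical check, the chi-square statistic in Figure 5.16 can be approximately reproduced from the bin percentages printed in the same figure. This is a sketch, not the procedure's implementation: the sample size `n = 50` is an assumption inferred from the printed percentages, and the p-value expression is a closed form that is valid only for 3 degrees of freedom.

```python
import math

# Observed and estimated bin percentages, copied from Figure 5.16.
observed_pct = [4.0, 18.0, 26.0, 38.0, 10.0, 4.0]
estimated_pct = [2.963, 15.354, 33.872, 32.055, 13.050, 2.281]

n = 50  # ASSUMPTION: sample size inferred from the percentages, not stated in the text
p = 2   # estimated parameters: scale and shape (the threshold is fixed at 0)

# Convert percentages to interval counts.
observed = [n * pct / 100 for pct in observed_pct]
expected = [n * pct / 100 for pct in estimated_pct]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - p - 1  # m - p - 1

# Upper-tail chi-square probability; this closed form holds for 3 DF only.
p_value = (math.erfc(math.sqrt(chi_sq / 2))
           + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2))

print(f"chi-square = {chi_sq:.3f}, DF = {df}, Pr > ChiSq = {p_value:.3f}")
```

Small differences from the printed statistic are expected because the percentages in the figure are rounded.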
When you fit a parametric distribution, the HISTOGRAM statement provides a series of goodness-of-fit tests based on the empirical distribution function (EDF). The EDF tests offer advantages over the chi-square goodness-of-fit test, including improved power and invariance with respect to the histogram midpoints. For a thorough discussion, refer to D’Agostino and Stephens (1986).
The empirical distribution function is defined for a set of n independent observations $X_1, \ldots, X_n$ with a common distribution function $F(x)$. Denote the observations ordered from smallest to largest as $X_{(1)}, \ldots, X_{(n)}$. The empirical distribution function, $F_n(x)$, is defined as

\[ F_n(x) = \begin{cases} 0, & x < X_{(1)} \\ \frac{i}{n}, & X_{(i)} \le x < X_{(i+1)}, \quad i = 1, \ldots, n-1 \\ 1, & X_{(n)} \le x \end{cases} \]

Note that $F_n(x)$ is a step function that takes a step of height $1/n$ at each observation. This function estimates the distribution function $F(x)$. At any value x, $F_n(x)$ is the proportion of observations less than or equal to x, while $F(x)$ is the probability of an observation less than or equal to x. EDF statistics measure the discrepancy between $F_n(x)$ and $F(x)$.
The computational formulas for the EDF statistics make use of the probability integral transformation $U = F(X)$. If $F(x)$ is the distribution function of X, the random variable U is uniformly distributed between 0 and 1.
Given n observations $X_{(1)}, \ldots, X_{(n)}$, the values $U_{(i)} = F(X_{(i)})$ are computed by applying the transformation, as shown in the following sections.
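The definitions above can be illustrated with a short sketch. The sample values and the hypothesized normal distribution (mean 14, standard deviation 0.02) are made up for illustration and are not taken from the Hang data.

```python
from statistics import NormalDist

def edf(sample, x):
    """Empirical distribution function: proportion of observations <= x."""
    return sum(1 for value in sample if value <= x) / len(sample)

# Hypothetical measurements, chosen only for illustration.
sample = [14.02, 13.97, 14.00, 13.99, 14.01]
ordered = sorted(sample)

# Hypothetical hypothesized distribution F.
F = NormalDist(mu=14.0, sigma=0.02)

# Probability integral transformation U_(i) = F(X_(i)).
u = [F.cdf(x) for x in ordered]

for i, (x, ui) in enumerate(zip(ordered, u), start=1):
    print(f"X({i}) = {x:.2f}   F_n = {edf(sample, x):.1f}   U({i}) = {ui:.4f}")
```

Each step of `edf` has height 1/n, and the transformed values stay in the same order as the sorted observations because a distribution function is nondecreasing.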
The HISTOGRAM statement provides three EDF tests:
Kolmogorov-Smirnov
Anderson-Darling
Cramér-von Mises
These tests are based on various measures of the discrepancy between the empirical distribution function $F_n(x)$ and the proposed parametric cumulative distribution function $F(x)$.
The following sections provide formal definitions of the EDF statistics.
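The Kolmogorov-Smirnov statistic (D) is defined as

\[ D = \sup_x \left| F_n(x) - F(x) \right| \]

The Kolmogorov-Smirnov statistic belongs to the supremum class of EDF statistics. This class of statistics is based on the largest vertical difference between $F(x)$ and $F_n(x)$.
The Kolmogorov-Smirnov statistic is computed as the maximum of $D^+$ and $D^-$, where $D^+$ is the largest vertical distance between the EDF and the distribution function when the EDF is greater than the distribution function, and $D^-$ is the largest vertical distance when the EDF is less than the distribution function. With $U_{(i)} = F(X_{(i)})$,

\[ D^+ = \max_i \left( \frac{i}{n} - U_{(i)} \right), \qquad D^- = \max_i \left( U_{(i)} - \frac{i-1}{n} \right), \qquad D = \max\left(D^+, D^-\right) \]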
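The computation of D from the transformed values can be sketched as follows. The function name `ks_statistic`, the sample, and the hypothesized normal distribution are illustrative assumptions.

```python
from statistics import NormalDist

def ks_statistic(u):
    """Kolmogorov-Smirnov D from sorted values u_(i) = F(x_(i))."""
    n = len(u)
    # Largest distance where the EDF lies above the distribution function.
    d_plus = max(i / n - ui for i, ui in enumerate(u, start=1))
    # Largest distance where the EDF lies below the distribution function.
    d_minus = max(ui - (i - 1) / n for i, ui in enumerate(u, start=1))
    return max(d_plus, d_minus)

# Hypothetical sample and hypothesized distribution, for illustration only.
sample = [14.02, 13.97, 14.00, 13.99, 14.01]
F = NormalDist(mu=14.0, sigma=0.02)
u = sorted(F.cdf(x) for x in sample)

print(f"D = {ks_statistic(u):.4f}")
```

Because the EDF is a step function, the supremum is always attained at an observation, just before or just after a step, which is why only the n sorted values need to be examined.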

The Anderson-Darling statistic and the Cramér-von Mises statistic belong to the quadratic class of EDF statistics. This class of statistics is based on the squared difference $(F_n(x) - F(x))^2$. Quadratic statistics have the following general form:

\[ Q = n \int_{-\infty}^{+\infty} \left( F_n(x) - F(x) \right)^2 \psi(x) \, dF(x) \]

The function $\psi(x)$ weights the squared difference $(F_n(x) - F(x))^2$.
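The Anderson-Darling statistic ($A^2$) is defined as

\[ A^2 = n \int_{-\infty}^{+\infty} \left( F_n(x) - F(x) \right)^2 \left[ F(x) \left( 1 - F(x) \right) \right]^{-1} dF(x) \]

Here the weight function is $\psi(x) = \left[ F(x) \left( 1 - F(x) \right) \right]^{-1}$.
The Anderson-Darling statistic is computed as follows, with $U_{(i)} = F(X_{(i)})$:

\[ A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} \left[ (2i - 1) \log U_{(i)} + (2n + 1 - 2i) \log\left( 1 - U_{(i)} \right) \right] \]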
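A minimal sketch of the Anderson-Darling computing formula follows. The function name `anderson_darling` and the input values are hypothetical; the inputs are assumed to be sorted probability-integral-transformed values strictly between 0 and 1.

```python
import math

def anderson_darling(u):
    """Anderson-Darling A-squared from sorted values u_(i) = F(x_(i))."""
    n = len(u)
    total = sum(
        (2 * i - 1) * math.log(ui) + (2 * n + 1 - 2 * i) * math.log(1.0 - ui)
        for i, ui in enumerate(u, start=1)
    )
    return -n - total / n

# Hypothetical transformed values, for illustration only.
u = [0.07, 0.31, 0.50, 0.69, 0.84]
print(f"A-Sq = {anderson_darling(u):.4f}")
```

The logarithms grow without bound as a transformed value approaches 0 or 1, which reflects the heavier weight this statistic gives to discrepancies in the tails.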

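The Cramér-von Mises statistic ($W^2$) is defined as

\[ W^2 = n \int_{-\infty}^{+\infty} \left( F_n(x) - F(x) \right)^2 dF(x) \]

Here the weight function is $\psi(x) = 1$.
The Cramér-von Mises statistic is computed as follows, with $U_{(i)} = F(X_{(i)})$:

\[ W^2 = \sum_{i=1}^{n} \left( U_{(i)} - \frac{2i - 1}{2n} \right)^2 + \frac{1}{12n} \]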
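The Cramér-von Mises computing formula can be sketched in the same way. The function name `cramer_von_mises` and the input values are hypothetical.

```python
def cramer_von_mises(u):
    """Cramer-von Mises W-squared from sorted values u_(i) = F(x_(i))."""
    n = len(u)
    return sum(
        (ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1)
    ) + 1.0 / (12 * n)

# Hypothetical transformed values, for illustration only.
u = [0.07, 0.31, 0.50, 0.69, 0.84]
print(f"W-Sq = {cramer_von_mises(u):.4f}")
```

The statistic reaches its minimum of 1/(12n) when every transformed value sits exactly at the midpoint (2i - 1)/(2n) of its step.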

Once the EDF test statistics are computed, the associated probability values (p-values) must be calculated.
For the Gumbel, inverse Gaussian, generalized Pareto, and Rayleigh distributions, the procedure computes associated probability values (p-values) by resampling from the estimated distribution. It generates k random samples of size n, where k is specified by the EDFNSAMPLES= option and n is the number of observations in the original data. EDF test statistics are computed for each sample, and the p-value is the proportion of samples whose EDF statistic is greater than or equal to the statistic computed for the original data. You can use the EDFSEED= option to specify a seed value for generating the sample values.
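The resampling idea can be sketched for a Rayleigh distribution with threshold 0 and the Kolmogorov-Smirnov statistic. This is an illustration of the general technique, not the procedure's algorithm: the data are hypothetical, the arguments `k` and `seed` only mirror the roles of EDFNSAMPLES= and EDFSEED=, and refitting the scale for each generated sample is an assumption, since the text does not say whether parameters are re-estimated.

```python
import math
import random

def rayleigh_fit(sample):
    """Maximum likelihood scale estimate for a Rayleigh distribution (threshold 0)."""
    return math.sqrt(sum(x * x for x in sample) / (2 * len(sample)))

def rayleigh_cdf(x, sigma):
    return 1.0 - math.exp(-x * x / (2 * sigma * sigma))

def ks_statistic(sample, sigma):
    u = sorted(rayleigh_cdf(x, sigma) for x in sample)
    n = len(u)
    d_plus = max(i / n - ui for i, ui in enumerate(u, start=1))
    d_minus = max(ui - (i - 1) / n for i, ui in enumerate(u, start=1))
    return max(d_plus, d_minus)

def resampled_p_value(sample, k, seed):
    """Proportion of k simulated samples whose D is >= the observed D."""
    rng = random.Random(seed)
    n = len(sample)
    sigma_hat = rayleigh_fit(sample)
    d_observed = ks_statistic(sample, sigma_hat)
    exceed = 0
    for _ in range(k):
        # Inverse-CDF draws from the estimated Rayleigh distribution.
        draw = [sigma_hat * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
                for _ in range(n)]
        # ASSUMPTION: the scale is re-estimated for each simulated sample.
        if ks_statistic(draw, rayleigh_fit(draw)) >= d_observed:
            exceed += 1
    return exceed / k

# Hypothetical positive measurements, for illustration only.
data = [0.8, 1.1, 1.9, 2.3, 0.5, 1.4, 1.7, 1.2]
p_value = resampled_p_value(data, k=200, seed=1)
print(f"resampled p-value = {p_value:.3f}")
```

Fixing the seed makes the result repeatable, and a larger k reduces the simulation error of the p-value at the cost of more computation.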
For the beta, exponential, gamma, lognormal, normal, power function, and Weibull distributions, the CAPABILITY procedure uses internal tables of probability levels similar to those given by D’Agostino and Stephens (1986). If the value of the test statistic falls between two tabulated probability levels, then linear interpolation is used to estimate the probability value. The probability value depends upon which parameters are known and which are estimated for the distribution you are fitting. Table 5.23 summarizes different combinations of estimated parameters for which EDF tests are available.
Table 5.23: Availability of EDF Tests

                          Parameters
Distribution   Threshold   Scale     Shape            Tests Available
Beta           known       known     known            all
               known       known     unknown          all
Exponential    known       known                      all
               known       unknown                    all
               unknown     known                      all
               unknown     unknown                    all
Gamma          known       known     known            all
               known       unknown   known            all
               known       known     unknown          all
               known       unknown   unknown          all
               unknown     known     known            all
               unknown     unknown   known            all
               unknown     known     unknown          all
               unknown     unknown   unknown          all
Lognormal      known       known     known            all
               known       known     unknown          $A^2$ and $W^2$
               known       unknown   known            $A^2$ and $W^2$
               known       unknown   unknown          all
               unknown     known     known            all
               unknown     known     unknown          all
               unknown     unknown   known            all
               unknown     unknown   unknown          all
Normal         known       known                      all
               known       unknown                    $A^2$ and $W^2$
               unknown     known                      $A^2$ and $W^2$
               unknown     unknown                    all
Weibull        known       known     c known          all
               known       unknown   c known          $A^2$ and $W^2$
               known       known     c unknown        $A^2$ and $W^2$
               known       unknown   c unknown        $A^2$ and $W^2$
               unknown     known     c > 2 known      all
               unknown     unknown   c > 2 known      all
               unknown     known     c > 2 unknown    all
               unknown     unknown   c > 2 unknown    all

The Specifications part is included in the summary only if you provide specification limits. It tabulates the limits as well as the observed percentages and estimated percentages outside the limits.
The estimated percentages are computed only if fitted distributions are requested. Each is the probability, under the fitted distribution, that an observed value falls outside the corresponding specification limit. The observed percentages are the percentages of observations outside the specification limits.
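The estimated percentages in Figure 5.16 can be approximately reproduced from the printed lognormal parameters. The sketch below assumes the usual lognormal distribution function with threshold 0, scale Zeta, and shape Sigma; the helper name `lognormal_cdf` is illustrative.

```python
import math
from statistics import NormalDist

# Parameter estimates and specification limits from Figure 5.16.
zeta, sigma = 2.638966, 0.001497
lsl, usl = 13.95, 14.05

def lognormal_cdf(x, zeta, sigma, theta=0.0):
    """Lognormal distribution function with threshold theta."""
    return NormalDist().cdf((math.log(x - theta) - zeta) / sigma)

est_pct_below_lsl = 100 * lognormal_cdf(lsl, zeta, sigma)
est_pct_above_usl = 100 * (1 - lognormal_cdf(usl, zeta, sigma))

print(f"Est Pct < LSL = {est_pct_below_lsl:.3f}")
print(f"Est Pct > USL = {est_pct_above_usl:.3f}")
```

The results agree with the figure to about two decimal places; the remaining difference comes from the rounding of the printed parameter estimates.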
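The Indices Using the Fitted Curve part is included in the summary only if you specify the INDICES option in parentheses after a distribution option, as in the statements that produce Figure 5.16. Standard process capability indices, such as $C_p$ and $C_{pk}$, are not appropriate if the data are not normally distributed. The INDICES option computes generalizations of the standard indices by using the fact that for the normal distribution, $3\sigma$ is both the distance from the lower 0.135 percentile to the median (or mean) and the distance from the median (or mean) to the upper 99.865 percentile. These percentiles are estimated from the fitted distribution, and the appropriate percentile-to-median distances are substituted for $3\sigma$ in the standard formulas.
Writing T for the target, LSL and USL for the lower and upper specification limits, and $P_{\alpha}$ for the $100\alpha$th percentile, the generalized capability indices are as follows:

\[ C_p = \frac{USL - LSL}{P_{0.99865} - P_{0.00135}} \]

\[ C_{pl} = \frac{P_{0.5} - LSL}{P_{0.5} - P_{0.00135}} \]

\[ C_{pu} = \frac{USL - P_{0.5}}{P_{0.99865} - P_{0.5}} \]

\[ C_{pk} = \min\left( \frac{P_{0.5} - LSL}{P_{0.5} - P_{0.00135}}, \; \frac{USL - P_{0.5}}{P_{0.99865} - P_{0.5}} \right) \]

\[ K = 2 \times \frac{\left| \frac{1}{2}(USL + LSL) - P_{0.5} \right|}{USL - LSL} \]

\[ C_{pm} = \frac{\min\left( \dfrac{T - LSL}{P_{0.5} - P_{0.00135}}, \; \dfrac{USL - T}{P_{0.99865} - P_{0.5}} \right)}{\sqrt{1 + \left( \dfrac{\mu - T}{\sigma} \right)^2}} \]

In the last formula, $\mu$ and $\sigma$ denote the mean and standard deviation of the fitted distribution.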
If the data are normally distributed, these formulas reduce to the formulas for the standard capability indices, which are given in the section Standard Capability Indices.
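As a check on the percentile-based definitions, the indices in Figure 5.16 can be approximately recomputed from the printed lognormal parameters. This is a sketch under stated assumptions: the percentiles come from the standard lognormal quantile formula with threshold 0, and the Cpm expression (a percentile-based minimum divided by a term in the fitted mean and standard deviation) is the form that reproduces the printed value, not something confirmed by the text.

```python
import math
from statistics import NormalDist

# Parameter estimates, specification limits, and target from Figure 5.16.
zeta, sigma = 2.638966, 0.001497
lsl, usl, target = 13.95, 14.05, 14.0

def lognormal_percentile(alpha):
    """100*alpha-th percentile of the fitted lognormal (threshold 0)."""
    return math.exp(zeta + sigma * NormalDist().inv_cdf(alpha))

p_low = lognormal_percentile(0.00135)
p_med = lognormal_percentile(0.5)
p_high = lognormal_percentile(0.99865)

# Mean and standard deviation of the fitted lognormal distribution.
mean = math.exp(zeta + sigma ** 2 / 2)
std = mean * math.sqrt(math.expm1(sigma ** 2))

cp = (usl - lsl) / (p_high - p_low)
cpl = (p_med - lsl) / (p_med - p_low)
cpu = (usl - p_med) / (p_high - p_med)
cpk = min(cpl, cpu)
# ASSUMPTION: form of Cpm chosen because it matches the printed value.
cpm = min((target - lsl) / (p_med - p_low),
          (usl - target) / (p_high - p_med)) / math.sqrt(1 + ((mean - target) / std) ** 2)

for name, value in [("Cp", cp), ("CPL", cpl), ("CPU", cpu), ("Cpk", cpk), ("Cpm", cpm)]:
    print(f"{name:4s}{value:.4f}")
```

The values agree with the figure to about three decimal places; the printed shape estimate has only four significant digits, which limits the precision of the check.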
The following guidelines apply to the use of generalized capability indices requested with the INDICES option:
When you choose the family of parametric distributions for the fitted curve, consider whether an appropriate family can be derived from assumptions about the process.
Whenever possible, examine the data distribution with a histogram, probability plot, or quantile-quantile plot.
Apply goodness-of-fit tests to assess how well the parametric distribution models the data.
Consider whether a generalized index has a meaningful practical interpretation in your application.
At the time of this writing, there is ongoing research concerning the application of generalized capability indices, and it is important to note that other approaches can be used with nonnormal data:
Transform the data to normality, then compute and report standard capability indices on the transformed scale.
Report the proportion of nonconforming output estimated from the fitted distribution.
If it is not possible to adequately model the data distribution with a parametric density, smooth the data distribution with a kernel density estimate and simply report the proportion of nonconforming output.
Refer to Rodriguez and Bynum (1992) for additional discussion.
The Histogram Intervals part is included in the summary only if you specify the MIDPERCENTS option in parentheses after the distribution option, as in the statements that produce Figure 5.16. This table lists the interval midpoints along with the observed and estimated percentages of the observations that lie in the interval. The estimated percentages are based on the fitted distribution.
In addition, you can specify the MIDPERCENTS option to request a table of interval midpoints with the observed percent of observations that lie in the interval. See the entry for the MIDPERCENTS option.
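The estimated bin percentages in Figure 5.16 can be approximately reproduced as differences of the fitted distribution function at the interval endpoints. The sketch assumes that each interval extends 0.01 on either side of its midpoint (inferred from the 0.02 spacing of the midpoints) and uses the printed lognormal parameters with threshold 0.

```python
import math
from statistics import NormalDist

# Parameter estimates and bin midpoints from Figure 5.16.
zeta, sigma = 2.638966, 0.001497
midpoints = [13.95, 13.97, 13.99, 14.01, 14.03, 14.05]
half_width = 0.01  # ASSUMPTION: inferred from the spacing of the midpoints

def lognormal_cdf(x):
    return NormalDist().cdf((math.log(x) - zeta) / sigma)

estimated_pct = [
    100 * (lognormal_cdf(mid + half_width) - lognormal_cdf(mid - half_width))
    for mid in midpoints
]

for mid, pct in zip(midpoints, estimated_pct):
    print(f"{mid:.2f}  {pct:7.3f}")
```

The estimated percentages do not sum to exactly 100 because the fitted distribution places a small amount of probability outside the range covered by the histogram intervals.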
The Quantiles part lists observed and estimated quantiles. You can use the PERCENTS= option to specify the list of quantiles to appear in this table. The list in Figure 5.16 is the default list. See the entry for the PERCENTS= option.
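The estimated quantiles in Figure 5.16 can be approximately reproduced by inverting the fitted distribution function. The sketch assumes the standard lognormal quantile formula with threshold 0 and uses the printed parameter estimates; the observed quantiles are not recomputed because they require the original data.

```python
import math
from statistics import NormalDist

# Parameter estimates from Figure 5.16 and the default list of percents.
zeta, sigma = 2.638966, 0.001497
percents = [1.0, 5.0, 10.0, 25.0, 50.0, 75.0, 90.0, 95.0, 99.0]

# Estimated quantile: exp(zeta + sigma * z_p), where z_p is the standard normal quantile.
estimated = {
    pct: math.exp(zeta + sigma * NormalDist().inv_cdf(pct / 100))
    for pct in percents
}

for pct, quantile in estimated.items():
    print(f"{pct:5.1f}  {quantile:.4f}")
```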