PROC SEVERITY: Statistics of Fit :: SAS/ETS(R) 9.22 User's Guide

The SEVERITY Procedure

Statistics of Fit

PROC SEVERITY computes and reports several statistics of fit that indicate how well the estimated model fits the data. The statistics fall into two categories: likelihood-based statistics (Neg2LogLike, AIC, AICC, and BIC) and EDF-based statistics (KS, AD, and CvM). The following subsections define each statistic.

Likelihood-Based Statistics

Let $y_i$, $i = 1, \dots, N$, denote the response variable values. Let $L$ be the likelihood as defined in the section Likelihood Function. Let $p$ denote the number of model parameters estimated. Note that $p = p_d + (k - k_r)$, where $p_d$ is the number of distribution parameters, $k$ is the number of regressors, if any, specified in the MODEL statement, and $k_r$ is the number of regressors found to be linearly dependent on (redundant with) other regressors. Given this notation, the likelihood-based statistics are defined as follows:

Neg2LogLike

Negative two times the log likelihood is reported as

$\text{Neg2LogLike} = -2 \log(L)$

The multiplying factor $-2$ makes the statistic easy to compare with the other likelihood-based statistics. A model with a smaller value of Neg2LogLike is deemed better.

AIC

Akaike's information criterion (AIC) is defined as

$\text{AIC} = -2 \log(L) + 2p$

A model with a smaller value of AIC is deemed better.

AICC

The corrected Akaike's information criterion (AICC) is defined as

$\text{AICC} = -2 \log(L) + \frac{2Np}{N - p - 1}$

A model with a smaller value of AICC is deemed better. AICC corrects the finite-sample bias that AIC has when $N$ is small compared to $p$. AICC is related to AIC as

$\text{AICC} = \text{AIC} + \frac{2p(p + 1)}{N - p - 1}$

As $N$ becomes large compared to $p$, AICC converges to AIC. AICC is usually recommended over AIC as a model selection criterion.

BIC

The Schwarz Bayesian information criterion (BIC) is defined as

$\text{BIC} = -2 \log(L) + p \log(N)$

A model with a smaller value of BIC is deemed better.
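The four likelihood-based statistics can be sketched in a few lines of Python. This is an illustration only, not PROC SEVERITY code: it assumes the standard definitions ($-2\log L$, $-2\log L + 2p$, $-2\log L + 2Np/(N-p-1)$, and $-2\log L + p\log N$), and the function name, argument names, and input values are hypothetical.

```python
import math

def likelihood_fit_stats(log_likelihood, n_obs, n_params):
    """Return Neg2LogLike, AIC, AICC, and BIC for a fitted model.

    log_likelihood: maximized log likelihood, log(L)
    n_obs:          number of observations, N
    n_params:       number of estimated parameters, p
    """
    # The AICC correction term is undefined unless N > p + 1.
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICC requires n_obs > n_params + 1")
    neg2ll = -2.0 * log_likelihood
    aic = neg2ll + 2.0 * n_params
    aicc = neg2ll + 2.0 * n_obs * n_params / (n_obs - n_params - 1)
    bic = neg2ll + n_params * math.log(n_obs)
    return {"Neg2LogLike": neg2ll, "AIC": aic, "AICC": aicc, "BIC": bic}

# Made-up inputs, for illustration only.
stats = likelihood_fit_stats(log_likelihood=-1234.5, n_obs=200, n_params=3)
```

All four statistics share the same $-2\log L$ term and differ only in the penalty on $p$, so comparing candidate distributions on any one of them amounts to comparing penalized log likelihoods.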

EDF-Based Statistics

This class of statistics is based on the difference between the estimate of the cumulative distribution function (CDF) and the estimate of the empirical distribution function (EDF). Let $y_i$, $i = 1, \dots, N$, denote the sample of $N$ values of the response variable. Let $r_i = \sum_{j=1}^{N} I[y_j \le y_i]$ denote the number of observations with a value less than or equal to $y_i$, where $I[\cdot]$ is an indicator function. Let $F_n(y_i)$ denote the EDF estimate that is computed by using the method specified in the EMPIRICALCDF= option. Let $Z_i = \hat{F}(y_i)$ denote the estimate of the CDF. Let $F_n(Z_i)$ denote the EDF estimate of the $Z_i$ values, computed by the same method that is used to compute the EDF of the $y_i$ values. By the probability integral transformation, if $F(y)$ is the true distribution of the random variable $Y$, then the random variable $F(Y)$ is uniformly distributed between 0 and 1 (D’Agostino and Stephens 1986, Ch. 4). Thus, comparing $F_n(y_i)$ with $\hat{F}(y_i)$ is equivalent to comparing $F_n(Z_i)$ with $\hat{F}(Z_i) = Z_i$ (uniform distribution).
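The following Python sketch illustrates the counting and transformation steps described above. It assumes the ordinary empirical estimate (the count of observations at or below a value, divided by the sample size) and uses an exponential CDF with a made-up scale as a stand-in for a fitted distribution; the helper names and sample values are hypothetical.

```python
import math
from bisect import bisect_right

def standard_edf(values):
    """Return (ranks, edf): ranks[i] counts observations <= values[i],
    and edf[i] is that count divided by the sample size."""
    ordered = sorted(values)
    n = len(ordered)
    ranks = [bisect_right(ordered, v) for v in values]
    return ranks, [r / n for r in ranks]

def exponential_cdf(y, scale):
    """CDF of an exponential distribution (stand-in for a fitted CDF)."""
    return 1.0 - math.exp(-y / scale)

sample = [0.5, 1.2, 1.2, 3.0, 4.7]          # note the tied value 1.2
ranks, edf = standard_edf(sample)

# Probability integral transform: map each observation through the CDF.
z = [exponential_cdf(y, scale=2.0) for y in sample]
z_ranks, z_edf = standard_edf(z)
```

Because a CDF is nondecreasing, the transformed values keep the ordering of the original observations, so the EDF of the transformed values matches the EDF of the original values point for point.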

Note the following two points regarding which CDF estimates are used for computing the test statistics:

If regressor variables are specified, then the CDF estimates $\hat{F}(y_i)$ used for computing the EDF test statistics are from a mixture distribution. See the section CDF and PDF Estimates with Regression Effects for details.
If left-truncation is specified without the probability of observability and the method for computing the EDF estimate is KAPLANMEIER or MODIFIEDKM, then $F_n(y_i)$ is a conditional estimate of the EDF, as noted in the section EDF Estimates and Left-Truncation. However, $\hat{F}(y)$ is an unconditional estimate of the CDF, so a conditional estimate of the CDF must be used for computing the EDF-based statistics. It is denoted by $\hat{F}^c(y)$ and defined as

$\hat{F}^c(y) = \frac{\hat{F}(y) - \hat{F}(\tau_{\min})}{1 - \hat{F}(\tau_{\min})}$

where $\tau_{\min}$ is the smallest value of the left-truncation threshold.
Note that if regressors are specified, then both $\hat{F}(y)$ and $\hat{F}(\tau_{\min})$ are computed from a mixture distribution, as indicated previously.
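As a rough illustration of the conditional CDF under left-truncation, the sketch below conditions an unconditional CDF on the value exceeding the smallest truncation threshold. It assumes the usual conditioning formula, (CDF at y minus CDF at the threshold) divided by (1 minus CDF at the threshold); the function names, the exponential CDF, and the threshold value are hypothetical choices for the example.

```python
import math

def conditional_cdf(cdf, y, tau_min):
    """CDF of Y given Y > tau_min, built from an unconditional CDF."""
    f_tau = cdf(tau_min)
    if f_tau >= 1.0:
        raise ValueError("no probability mass above tau_min")
    if y <= tau_min:
        return 0.0  # nothing is observable at or below the threshold
    return (cdf(y) - f_tau) / (1.0 - f_tau)

def exponential_cdf(y, scale=2.0):
    """Unconditional CDF used for the example (assumed, not fitted)."""
    return 1.0 - math.exp(-y / scale)

tau_min = 1.0
at_threshold = conditional_cdf(exponential_cdf, tau_min, tau_min)
above = conditional_cdf(exponential_cdf, 3.0, tau_min)
```

For the exponential distribution the conditional CDF at 3.0 equals the unconditional CDF at 3.0 - 1.0 (the memoryless property), which makes a convenient sanity check.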

In the following, it is assumed that $\hat{F}(y)$ denotes an appropriate estimate of the CDF if left-truncation or regression effects are specified.

Given this, the EDF-based statistics of fit are defined as follows:

KS

The Kolmogorov-Smirnov (KS) statistic computes the largest vertical distance between the CDF and the EDF. It is formally defined as follows:

$KS = \sup_{y} \left| F_n(y) - F(y) \right|$

If the STANDARD method is used to compute the EDF, then the following formula is used:

	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$

Note that $\text{[math]}$ is assumed to be 0.
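A minimal Python sketch of the largest vertical distance for the standard EDF follows. It works on the transformed values (each observation mapped through the estimated CDF) and uses the textbook one-sided distances, with the count before the first observation taken as 0. It returns the raw distance only; any scaling or small-sample adjustment that the procedure applies is not reproduced here, and the function name and sample values are hypothetical.

```python
from bisect import bisect_right

def ks_distance(z_values):
    """Largest vertical gap between the standard EDF of z and the
    uniform CDF on [0, 1]."""
    z = sorted(z_values)
    n = len(z)
    d_plus = 0.0    # EDF above CDF, evaluated at each jump
    d_minus = 0.0   # CDF above EDF, evaluated just before each jump
    r_prev = 0      # count before the first observation is 0
    i = 0
    while i < n:
        r = bisect_right(z, z[i])   # observations <= z[i]; handles ties
        d_plus = max(d_plus, r / n - z[i])
        d_minus = max(d_minus, z[i] - r_prev / n)
        r_prev = r
        i = r                       # jump past any tied values
    return max(d_plus, d_minus)

z = [0.1, 0.35, 0.5, 0.9]           # made-up transformed values
distance = ks_distance(z)
```

Both one-sided distances are needed because the EDF is a step function: the largest gap can occur either at the top of a step or just before it.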

If the method used to compute the EDF is any method other than the STANDARD method, then the following formula is used:

	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$

AD

The Anderson-Darling (AD) statistic is a quadratic EDF statistic that is proportional to the expected value of the weighted squared difference between the EDF and CDF. It is formally defined as follows:

$AD = N \int_{-\infty}^{\infty} \frac{\left( F_n(y) - F(y) \right)^2}{F(y) \left( 1 - F(y) \right)} \, dF(y)$

If the STANDARD method is used to compute the EDF, then the following formula is used:

$AD = -N - \frac{1}{N} \sum_{i=1}^{N} \left[ (2 r_i - 1) \log(Z_i) + (2N + 1 - 2 r_i) \log(1 - Z_i) \right]$
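The textbook computing form of the Anderson-Darling statistic for the standard EDF can be sketched as follows. The sketch assumes no tied observations (so the sorted position serves as the rank) and transformed values strictly between 0 and 1; the function name and sample values are hypothetical.

```python
import math

def anderson_darling(z_values):
    """Textbook Anderson-Darling statistic for values z = F(y),
    assuming no ties and 0 < z < 1."""
    z = sorted(z_values)
    n = len(z)
    total = 0.0
    for i, zi in enumerate(z, start=1):
        # The weights emphasize the tails through log(z) and log(1 - z).
        total += (2 * i - 1) * math.log(zi)
        total += (2 * n + 1 - 2 * i) * math.log(1.0 - zi)
    return -n - total / n

# Evenly spread values resemble a uniform sample; clustered values do not.
well_spread = [(i - 0.5) / 10 for i in range(1, 11)]
clustered = [0.90 + 0.009 * i for i in range(1, 11)]
ad_good = anderson_darling(well_spread)
ad_bad = anderson_darling(clustered)
```

A smaller value indicates a closer match between the EDF and the CDF, so the evenly spread sample yields a much smaller statistic than the clustered one.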

If the method used to compute the EDF is any method other than the STANDARD method, then the statistic can be computed by using the following two pieces of information:

The EDF estimate is a step function. In the interval $[Z_{i-1}, Z_i]$, it is equal to $F_n(Z_{i-1})$.
Using the probability integral transform $z = F(y)$, the formula simplifies to

$AD = N \int_{0}^{1} \frac{\left( F_n(z) - z \right)^2}{z (1 - z)} \, dz$

The computation formula can then be derived from the following approximation:

$\text{[math]}$

Assuming $\text{[math]}$ , $\text{[math]}$ , $\text{[math]}$ , and $\text{[math]}$ yields the following computation formula:

$\text{[math]}$

where $\text{[math]}$ and $\text{[math]}$ .

CvM

The Cramér-von-Mises (CvM) statistic is a quadratic EDF statistic that is proportional to the expected value of the squared difference between the EDF and CDF. It is formally defined as follows:

$CvM = N \int_{-\infty}^{\infty} \left( F_n(y) - F(y) \right)^2 \, dF(y)$

If the STANDARD method is used to compute the EDF, then the following formula is used:

$CvM = \frac{1}{12N} + \sum_{i=1}^{N} \left( Z_i - \frac{2 r_i - 1}{2N} \right)^2$
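A short Python sketch of the textbook computing form of the Cramér-von Mises statistic for the standard EDF is given below. It assumes no tied observations, so the sorted position serves as the rank; the function name and sample values are hypothetical.

```python
def cramer_von_mises(z_values):
    """Textbook Cramer-von Mises statistic for values z = F(y),
    assuming no ties."""
    z = sorted(z_values)
    n = len(z)
    # Each z is compared with the midpoint of its EDF step, (2i - 1) / (2N).
    squared_gaps = sum(
        (zi - (2 * i - 1) / (2 * n)) ** 2 for i, zi in enumerate(z, start=1)
    )
    return 1.0 / (12 * n) + squared_gaps

best_case = cramer_von_mises([0.25, 0.75])   # values sit on the midpoints
shifted = cramer_von_mises([0.60, 0.95])     # values pushed toward 1
```

The statistic reaches its minimum of 1/(12N) when every transformed value sits exactly on its step midpoint, and it grows as the values drift away from those midpoints.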

If the method used to compute the EDF is any method other than the STANDARD method, then the statistic can be computed by using the following two pieces of information:

The EDF estimate is a step function. In the interval $[Z_{i-1}, Z_i]$, it is equal to $F_n(Z_{i-1})$.
Using the probability integral transform $z = F(y)$, the formula simplifies to

$CvM = N \int_{0}^{1} \left( F_n(z) - z \right)^2 \, dz$

The computation formula can then be derived from the following approximation:

$\text{[math]}$

Assuming $\text{[math]}$ , $\text{[math]}$ , and $\text{[math]}$ yields the following computation formula:

$\text{[math]}$

This formula is similar to the one proposed by Koziol and Green (1976).
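For intuition about the step-function argument, the sketch below integrates the squared difference between a step EDF and the uniform CDF exactly, one interval at a time. This is direct piecewise integration, not the approximation-based computing formula described above, and it is not claimed to match the procedure's output. It assumes a right-continuous step EDF, a lower endpoint of 0 where the EDF is 0, and an upper endpoint of 1; the function name and inputs are hypothetical.

```python
def cvm_step_edf(z_sorted, edf_at_z):
    """N times the integral over [0, 1] of (F_n(z) - z)^2, where F_n is
    a step function equal to edf_at_z[i] from z_sorted[i] up to the
    next value, and equal to 0 before z_sorted[0]."""
    n = len(z_sorted)
    knots = [0.0] + list(z_sorted) + [1.0]   # interval endpoints
    levels = [0.0] + list(edf_at_z)          # EDF level on each interval
    total = 0.0
    for left, right, level in zip(knots[:-1], knots[1:], levels):
        # Antiderivative of (level - z)^2 is -(level - z)^3 / 3.
        total += ((level - left) ** 3 - (level - right) ** 3) / 3.0
    return n * total

z = [0.1, 0.35, 0.5, 0.9]                    # made-up transformed values
standard_levels = [0.25, 0.5, 0.75, 1.0]     # i / N, the ordinary EDF
value = cvm_step_edf(z, standard_levels)
```

When the levels are those of the ordinary EDF, the result agrees with the textbook closed form, 1/(12N) plus the sum of squared gaps from the step midpoints. Passing levels from another EDF method (for example, a Kaplan-Meier style estimate) reuses the same integration unchanged.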

Note: This procedure is experimental.
