The SEVERITY Procedure

Parameter Estimation Method

Likelihood Function
Probability of Observability and Likelihood
Estimating Covariance and Standard Errors

If you have not specified a custom objective function by specifying programming statements and the OBJECTIVE= option in the PROC SEVERITY statement, then PROC SEVERITY uses the maximum likelihood (ML) method to estimate the parameters of each model. A nonlinear optimization process is used to maximize the log of the likelihood function. If you have specified a custom objective function, then PROC SEVERITY uses a nonlinear optimization algorithm to estimate the parameters of each model that minimize the value of your specified objective function. For more information, see the section Custom Objective Functions.

Likelihood Function

Let $f_\Theta (x)$ and $F_\Theta (x)$ denote the PDF and CDF, respectively, evaluated at $x$ for a set of parameter values $\Theta$ . Let $Y$ denote the random response variable, and let $y$ denote its value recorded in an observation in the input data set. Let $T^l$ and $T^r$ denote the random variables for the left-truncation and right-truncation threshold, respectively, and let $t^l$ and $t^r$ denote their values for an observation, respectively. If there is no left-truncation, then $t^l = \tau ^l$ , where $\tau ^l$ is the smallest value in the support of the distribution; so $F_\Theta (t^l) = 0$ . If there is no right-truncation, then $t^r = \tau _h$ , where $\tau _h$ is the largest value in the support of the distribution; so $F_\Theta (t^r) = 1$ . Let $C^l$ and $C^r$ denote the random variables for the left-censoring and right-censoring limit, respectively, and let $c^l$ and $c^r$ denote their values for an observation, respectively. If there is no left-censoring, then $c^l = \tau _h$ , so $F_\Theta (c^l) = 1$ . If there is no right-censoring, then $c^r = \tau ^l$ , so $F_\Theta (c^r) = 0$ .

The set of input observations can be categorized into the following four subsets within each BY group:

$E$ is the set of uncensored and untruncated observations. The likelihood of an observation in $E$ is

$l_{E} = \Pr (Y=y) = f_\Theta (y)$

$E_t$ is the set of uncensored observations that are truncated. The likelihood of an observation in $E_t$ is

$l_{E_t} = \Pr (Y=y | t^l < Y \leq t^r) = \frac{f_\Theta (y)}{F_\Theta (t^r) - F_\Theta (t^l)}$

$C$ is the set of censored observations that are not truncated. The likelihood of an observation in $C$ is

$l_{C} = \Pr (c^r < Y \leq c^l) = F_\Theta (c^l) - F_\Theta (c^r)$

$C_t$ is the set of censored observations that are truncated. The likelihood of an observation in $C_t$ is

$l_{C_t} = \Pr (c^r < Y \leq c^l | t^l < Y \leq t^r) = \frac{F_\Theta (c^l) - F_\Theta (c^r)}{F_\Theta (t^r) - F_\Theta (t^l)}$

Note that $(E \cup E_t) \cap (C \cup C_t) = \emptyset$ . Also, the sets $E_t$ and $C_t$ are empty when no truncation is specified, and the sets $C$ and $C_t$ are empty when no censoring is specified.

Given this, the likelihood of the data is as follows:

$\displaystyle L = \prod _{E} f_\Theta (y) \prod _{E_t} \frac{f_\Theta (y)}{F_\Theta (t^r) - F_\Theta (t^l)} \prod _{C} \left( F_\Theta (c^l) - F_\Theta (c^r) \right) \prod _{C_t} \frac{F_\Theta (c^l) - F_\Theta (c^r)}{F_\Theta (t^r) - F_\Theta (t^l)}$
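As a minimal sketch of how the four categories combine, the following Python code evaluates $\log (L)$ one observation at a time. The exponential distribution with scale `theta`, the function names, and the dictionary layout of an observation are assumptions made for illustration only; they are not part of PROC SEVERITY.

```python
import math

def pdf(x, theta):
    """Exponential PDF with scale theta (illustrative choice of distribution)."""
    return math.exp(-x / theta) / theta

def cdf(x, theta):
    """Exponential CDF with scale theta."""
    return 1.0 - math.exp(-x / theta)

def log_lik_obs(theta, y=None, c_r=None, c_l=None, t_l=None, t_r=None):
    """Log-likelihood contribution of one observation.

    None means the feature is absent, so the corresponding CDF value
    falls back to 0 (lower end of the support) or 1 (upper end).
    """
    if c_r is not None or c_l is not None:
        # Censored: probability that Y falls in (c_r, c_l]
        upper = 1.0 if c_l is None else cdf(c_l, theta)
        lower = 0.0 if c_r is None else cdf(c_r, theta)
        numerator = upper - lower
    else:
        # Uncensored: density at the recorded value
        numerator = pdf(y, theta)
    # Truncation: condition on t_l < Y <= t_r
    f_tl = 0.0 if t_l is None else cdf(t_l, theta)
    f_tr = 1.0 if t_r is None else cdf(t_r, theta)
    return math.log(numerator) - math.log(f_tr - f_tl)

def log_likelihood(theta, data):
    """Sum of the contributions over all observations (log of the product L)."""
    return sum(log_lik_obs(theta, **obs) for obs in data)

# Hypothetical data: one observation from each of the four subsets
data = [
    {"y": 1.0},                            # in E
    {"y": 2.0, "t_l": 0.5},                # in E_t
    {"c_r": 3.0},                          # in C (right-censored)
    {"c_r": 1.0, "c_l": 2.0, "t_l": 0.5},  # in C_t (interval-censored)
]
print(log_likelihood(2.0, data))
```

Because an absent truncation threshold contributes $F_\Theta (t^r) - F_\Theta (t^l) = 1$, a single expression covers all four cases in this sketch.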

The maximum likelihood procedure used by PROC SEVERITY finds an optimal set of parameter values $\hat{\Theta }$ that maximizes $\log (L)$ subject to the boundary constraints on parameter values. For a distribution dist, such boundary constraints can be specified by using the dist_LOWERBOUNDS and dist_UPPERBOUNDS subroutines. See the section Defining a Distribution Model with the FCMP Procedure for more information. Some aspects of the optimization process can be controlled by using the NLOPTIONS statement.
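To illustrate what maximizing $\log (L)$ subject to boundary constraints means, the following Python sketch fits the scale of an exponential distribution to uncensored, untruncated data. The golden-section search is a simple stand-in chosen for this example and is not the optimizer that PROC SEVERITY uses; the function names and data values are hypothetical.

```python
import math

def log_likelihood(theta, ys):
    """Log likelihood of an exponential distribution with scale theta
    for uncensored and untruncated observations ys."""
    return sum(-math.log(theta) - y / theta for y in ys)

def maximize_bounded(func, lower, upper, tol=1e-10):
    """Golden-section search for the maximum of a unimodal function
    on the interval [lower, upper]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) > func(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0

ys = [0.8, 1.5, 2.1, 3.6]  # hypothetical loss values; sample mean is 2.0

# Wide bounds: the estimate approaches the sample mean, the known MLE
theta_hat = maximize_bounded(lambda t: log_likelihood(t, ys), 1e-6, 100.0)

# Lower bound above the unconstrained optimum: the estimate sits on the bound
theta_bounded = maximize_bounded(lambda t: log_likelihood(t, ys), 2.5, 100.0)

print(theta_hat, theta_bounded)
```

When the unconstrained optimum lies outside the bounds, the constrained estimate lands on the nearest bound, as the second call shows.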

Probability of Observability and Likelihood

If a probability of observability is specified for the left-truncation, then PROC SEVERITY uses a modified likelihood function for each truncated observation. If the probability of observability is $p \in (0.0, 1.0]$ , then for each left-truncated observation with truncation threshold $t^l$ , there exist $(1-p)/p$ observations with a response variable value less than or equal to $t^l$ . Each such observation has a probability of $\Pr (Y \leq t^l) = F_\Theta (t^l)$ . The right-truncation and censoring information does not apply to these added observations. Thus, following the notation of the section Likelihood Function, the likelihood of the data is as follows:

$\displaystyle L = \prod _{E} f_\Theta (y) \prod _{E_t, t^l = \tau ^l} \frac{f_\Theta (y)}{F_\Theta (t^r)} \prod _{E_t, t^l > \tau ^l} \frac{f_\Theta (y)}{F_\Theta (t^r)} F_\Theta (t^l)^{\frac{1-p}{p}}$
$\displaystyle \phantom{L =} \times \prod _{C} \left( F_\Theta (c^l) - F_\Theta (c^r) \right) \prod _{C_t, t^l = \tau ^l} \frac{F_\Theta (c^l) - F_\Theta (c^r)}{F_\Theta (t^r)} \prod _{C_t, t^l > \tau ^l} \frac{F_\Theta (c^l) - F_\Theta (c^r)}{F_\Theta (t^r)} F_\Theta (t^l)^{\frac{1-p}{p}}$

Note that the likelihood of the observations that are not left-truncated (observations in sets $E$ and $C$ , and observations in sets $E_t$ and $C_t$ for which $t^l=\tau ^l$ ) is not affected.
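The effect of the probability of observability on a single left-truncated, uncensored observation can be sketched as follows. The code follows the $E_t, t^l > \tau ^l$ term of the likelihood above; the exponential distribution and all function names are assumptions made for illustration.

```python
import math

def pdf(x, theta):
    """Exponential PDF with scale theta (illustrative choice of distribution)."""
    return math.exp(-x / theta) / theta

def cdf(x, theta):
    """Exponential CDF with scale theta."""
    return 1.0 - math.exp(-x / theta)

def log_lik_left_truncated(y, t_l, theta, p, t_r=None):
    """Log-likelihood contribution of an uncensored observation that is
    left-truncated at t_l, given probability of observability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # No right-truncation means F(t_r) = 1
    f_tr = 1.0 if t_r is None else cdf(t_r, theta)
    # (1 - p) / p unobserved values at or below t_l, each with probability F(t_l)
    hidden = ((1.0 - p) / p) * math.log(cdf(t_l, theta))
    return math.log(pdf(y, theta)) - math.log(f_tr) + hidden

# With p = 1 the extra factor vanishes; with p = 0.5 one hidden value is implied
full = log_lik_left_truncated(y=2.0, t_l=0.5, theta=2.0, p=1.0)
half = log_lik_left_truncated(y=2.0, t_l=0.5, theta=2.0, p=0.5)
print(full, half)
```

Smaller values of $p$ imply more unobserved values below the threshold, so the factor $F_\Theta (t^l)^{(1-p)/p}$ carries more weight in the estimation.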

If you have specified a custom objective function, then PROC SEVERITY accounts for the probability of observability only while computing the empirical distribution function estimate. The parameter estimates are affected only by your custom objective function.

Estimating Covariance and Standard Errors

PROC SEVERITY computes an estimate of the covariance matrix of the parameters by using the asymptotic theory of the maximum likelihood estimators (MLE). If $N$ denotes the number of observations used for estimating a parameter vector $\pmb {\theta }$ , then the theory states that as $N \rightarrow \infty$ , the distribution of $\hat{\pmb {\theta }}$ , the estimate of $\pmb {\theta }$ , converges to a normal distribution with mean $\pmb {\theta }$ and covariance $\hat{\mathbf{C}}$ such that $\mathbf{I}(\pmb {\theta }) \cdot \hat{\mathbf{C}} \rightarrow \mathbf{1}$ (the identity matrix), where $\mathbf{I}(\pmb {\theta }) = -E\left[ \nabla ^2 \log (L(\pmb {\theta }))\right]$ is the information matrix for the likelihood of the data, $L(\pmb {\theta })$ . The covariance estimate is obtained by using the inverse of the information matrix.

In particular, if $\mathbf{G} = \nabla ^2 (-\log (L(\pmb {\theta })))$ denotes the Hessian matrix of the negative of the log likelihood, then the covariance estimate is computed as

$\hat{\mathbf{C}} = \frac{N}{d} \mathbf{G}^{-1}$

where $d$ is a denominator that is determined by the VARDEF= option. If VARDEF=N, then $d = N$ , which yields the asymptotic covariance estimate. If VARDEF=DF, then $d = N - k$ , where $k$ is the number of parameters (the model’s degrees of freedom). The VARDEF=DF option is the default, because it attempts to correct the potential bias introduced by the finite sample.

The standard error of the parameter $\theta _i$ is computed as the square root of the $i$th diagonal element of the estimated covariance matrix; that is, $s_i = \sqrt {\hat{C}_{ii}}$ .
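The computation can be sketched in Python for a two-parameter model. The sketch assumes that the denominator is $N$ for VARDEF=N and $N - k$ for VARDEF=DF, where $k$ is the number of parameters; the Hessian values, the function names, and the restriction to a 2-by-2 matrix are hypothetical and serve only to keep the example self-contained.

```python
import math

def invert_2x2(m):
    """Invert a 2-by-2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        # Mirrors the case where no covariance estimate is available
        raise ValueError("Hessian is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def covariance(hessian, n_obs, n_params, vardef="DF"):
    """Covariance estimate (N / d) * G^{-1}, with d chosen by vardef."""
    denom = n_obs - n_params if vardef == "DF" else n_obs
    scale = n_obs / denom
    return [[scale * v for v in row] for row in invert_2x2(hessian)]

def standard_errors(cov):
    """Square roots of the diagonal elements of the covariance matrix."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]

# Hypothetical Hessian of the negative log likelihood at the optimum
G = [[50.0, 10.0], [10.0, 40.0]]

cov_n = covariance(G, n_obs=100, n_params=2, vardef="N")
cov_df = covariance(G, n_obs=100, n_params=2, vardef="DF")
print(standard_errors(cov_n), standard_errors(cov_df))
```

With VARDEF=DF every element of the covariance matrix is inflated by the factor $N/(N-k)$ relative to VARDEF=N, so the standard errors grow by the square root of that factor.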

If you have specified a custom objective function, then the covariance matrix of the parameters is still computed by inverting the information matrix, except that the Hessian matrix $\mathbf{G}$ is computed as $\mathbf{G} = \nabla ^2 \log (U(\pmb {\theta }))$ , where $U$ denotes your custom objective function that is minimized by the optimizer.

Covariance and standard error estimates might not be available if the Hessian matrix is found to be singular at the end of the optimization process. This is especially likely if the optimization process stops without converging.