The SEVERITY Procedure

Parameter Estimation Method

Unless you specify a custom objective function (by supplying programming statements and the OBJECTIVE= option in the PROC SEVERITY statement), PROC SEVERITY uses the maximum likelihood (ML) method to estimate the parameters of each model. A nonlinear optimization process is used to maximize the log of the likelihood function. If you specify a custom objective function, then PROC SEVERITY uses a nonlinear optimization algorithm to find the parameter values of each model that minimize the value of your specified objective function. For more information, see the section Custom Objective Functions.

Likelihood Function

Let $f_\Theta (x)$ and $F_\Theta (x)$ denote the PDF and CDF, respectively, evaluated at x for a set of parameter values $\Theta $. Let Y denote the random response variable, and let y denote its value recorded in an observation in the input data set. Let $T^ l$ and $T^ r$ denote the random variables for the left-truncation and right-truncation threshold, respectively, and let $t^ l$ and $t^ r$ denote their values for an observation, respectively. If there is no left-truncation, then $t^ l = \tau ^ l$, where $\tau ^ l$ is the smallest value in the support of the distribution; so $F_\Theta (t^ l)=0$. If there is no right-truncation, then $t^ r = \tau _ h$, where $\tau _ h$ is the largest value in the support of the distribution; so $F_\Theta (t^ r)=1$. Let $C^ l$ and $C^ r$ denote the random variables for the left-censoring and right-censoring limit, respectively, and let $c^ l$ and $c^ r$ denote their values for an observation, respectively. If there is no left-censoring, then $c^ l = \tau _ h$; so $F_\Theta (c^ l)=1$. If there is no right-censoring, then $c^ r = \tau ^ l$; so $F_\Theta (c^ r)=0$.

The set of input observations can be categorized into the following four subsets within each BY group:

  • E is the set of uncensored and untruncated observations. The likelihood of an observation in E is

    \[ l_{E} = \Pr (Y=y) = f_\Theta (y) \]
  • $E_ t$ is the set of uncensored observations that are truncated. The likelihood of an observation in $E_ t$ is

    \[ l_{E_ t} = \Pr (Y=y | t^ l < Y \leq t^ r) = \frac{f_\Theta (y)}{F_\Theta (t^ r) - F_\Theta (t^ l)} \]
  • C is the set of censored observations that are not truncated. The likelihood of an observation in C is

    \[ l_{C} = \Pr (c^ r < Y \leq c^ l) = F_\Theta (c^ l) - F_\Theta (c^ r) \]
  • $C_ t$ is the set of censored observations that are truncated. The likelihood of an observation in $C_ t$ is

    \[ l_{C_ t} = \Pr (c^ r < Y \leq c^ l | t^ l < Y \leq t^ r) = \frac{F_\Theta (c^ l) - F_\Theta (c^ r)}{F_\Theta (t^ r) - F_\Theta (t^ l)} \]

Note that $(E \cup E_ t) \cap (C \cup C_ t) = \emptyset $. Also, the sets $E_ t$ and $C_ t$ are empty when you do not specify truncation, and the sets C and $C_ t$ are empty when you do not specify censoring.
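The four cases above can be sketched in a few lines of Python. This is only an illustration, not PROC SEVERITY's implementation: the exponential distribution with mean `theta`, the function names, and the use of `None` to mean "not specified" are all assumptions made for the example. Because the conventions $F_\Theta (t^ l)=0$, $F_\Theta (t^ r)=1$, $F_\Theta (c^ l)=1$, and $F_\Theta (c^ r)=0$ apply when a limit is absent, the four formulas reduce to one numerator and one denominator.

```python
import math

def exp_pdf(x, theta):
    """PDF of an exponential distribution with mean theta (illustrative choice)."""
    return math.exp(-x / theta) / theta if x >= 0 else 0.0

def exp_cdf(x, theta):
    """CDF of the same distribution; returns 1.0 for x = +inf."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x / theta)

def observation_likelihood(theta, y=None, t_l=None, t_r=None, c_l=None, c_r=None):
    """Likelihood of one observation; a limit of None means 'not specified'."""
    tau_l, tau_h = 0.0, math.inf  # support of the exponential distribution
    censored = c_l is not None or c_r is not None
    # Substitute the conventions from the text for any missing limit.
    t_l = tau_l if t_l is None else t_l   # F(t_l) = 0
    t_r = tau_h if t_r is None else t_r   # F(t_r) = 1
    c_l = tau_h if c_l is None else c_l   # F(c_l) = 1
    c_r = tau_l if c_r is None else c_r   # F(c_r) = 0
    if censored:
        # Sets C and C_t: probability mass between the censoring limits.
        numerator = exp_cdf(c_l, theta) - exp_cdf(c_r, theta)
    else:
        # Sets E and E_t: density at the recorded value y.
        numerator = exp_pdf(y, theta)
    # Equals 1 when the observation is not truncated.
    denominator = exp_cdf(t_r, theta) - exp_cdf(t_l, theta)
    return numerator / denominator

l_E = observation_likelihood(2.0, y=1.0)              # set E
l_Et = observation_likelihood(2.0, y=3.0, t_l=1.0)    # set E_t
l_C = observation_likelihood(2.0, c_r=4.0)            # set C (right-censored)
l_Ct = observation_likelihood(2.0, c_r=4.0, t_l=1.0)  # set C_t
```

For a censored observation the recorded value `y` does not enter the likelihood, which is why it can be left out of the call.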

Given this, the likelihood of the data L is as follows:

\begin{equation*} L = {\displaystyle \prod _{E} f_\Theta (y)} {\displaystyle \prod _{E_ t} \frac{f_\Theta (y)}{F_\Theta (t^ r) - F_\Theta (t^ l)}} {\displaystyle \prod _{C} \left( F_\Theta (c^ l) - F_\Theta (c^ r) \right)} {\displaystyle \prod _{C_ t} \frac{F_\Theta (c^ l) - F_\Theta (c^ r)}{F_\Theta (t^ r) - F_\Theta (t^ l)}} \end{equation*}
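
Because the optimizer works with the log of the likelihood, it can help to see the product written as a sum. The following restatement follows directly from the expression for L and introduces no new quantities:

\begin{align*} \log (L) = & \sum _{E} \log f_\Theta (y) + \sum _{E_ t} \left[ \log f_\Theta (y) - \log \left( F_\Theta (t^ r) - F_\Theta (t^ l) \right) \right] \\ & + \sum _{C} \log \left( F_\Theta (c^ l) - F_\Theta (c^ r) \right) + \sum _{C_ t} \left[ \log \left( F_\Theta (c^ l) - F_\Theta (c^ r) \right) - \log \left( F_\Theta (t^ r) - F_\Theta (t^ l) \right) \right] \end{align*}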

The maximum likelihood procedure used by PROC SEVERITY finds an optimal set of parameter values $\hat{\Theta }$ that maximizes $\log (L)$ subject to the boundary constraints on parameter values. For a distribution dist, you can specify such boundary constraints by using the dist_LOWERBOUNDS and dist_UPPERBOUNDS subroutines. For more information, see the section Defining a Severity Distribution Model with the FCMP Procedure. Some aspects of the optimization process can be controlled by using the NLOPTIONS statement.
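
As a rough illustration of maximizing $\log (L)$ under bounds, the following Python sketch fits a one-parameter exponential distribution (mean `theta`) to left-truncated, uncensored losses. Everything in it is an assumption made for the example: the data, the function names, and especially the golden-section search, which merely stands in for the nonlinear optimizer that PROC SEVERITY actually uses. The search interval plays the role of the lower and upper parameter bounds.

```python
import math

def log_likelihood(theta, data):
    """log L for exponential(mean=theta) losses, each observed only above its t_l."""
    total = 0.0
    for y, t_l in data:
        log_pdf = -math.log(theta) - y / theta  # log f(y)
        log_denominator = -t_l / theta          # log(F(t_r) - F(t_l)) with F(t_r) = 1
        total += log_pdf - log_denominator
    return total

def maximize(func, lower, upper, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal function on [lower, upper]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if func(c) > func(d):
            b = d  # the maximizer lies in [a, d]
        else:
            a = c  # the maximizer lies in [c, b]
    return (a + b) / 2.0

# Hypothetical (loss, left-truncation threshold) pairs.
data = [(3.0, 1.0), (5.5, 1.0), (2.2, 2.0), (9.1, 2.0), (4.0, 0.0)]

theta_hat = maximize(lambda th: log_likelihood(th, data), lower=0.01, upper=100.0)

# For this particular model the maximizer has a closed form: the mean excess over t_l.
closed_form = sum(y - t_l for y, t_l in data) / len(data)
```

The closed form is available only because the exponential distribution is so simple; it is included as a check on the numerical search, not as something the procedure relies on.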

Probability of Observability and Likelihood

If you specify the probability of observability for the left-truncation, then PROC SEVERITY uses a modified likelihood function for each truncated observation. If the probability of observability is $p \in (0.0, 1.0]$, then for each left-truncated observation with truncation threshold $t^ l$, there exist $(1-p)/p$ observations with a response variable value less than or equal to $t^ l$. Each such observation has a probability of $\Pr (Y \leq t^ l) = F_\Theta (t^ l)$. The right-truncation and censoring information does not apply to these added observations. Thus, following the notation of the section Likelihood Function, the likelihood of the data is as follows:

\begin{align*} L = & {\displaystyle \prod _{E} f_\Theta (y)} {\displaystyle \prod _{E_ t, t^ l = \tau ^ l} \frac{f_\Theta (y)}{F_\Theta (t^ r)}} {\displaystyle \prod _{E_ t, t^ l > \tau ^ l} \frac{f_\Theta (y)}{F_\Theta (t^ r)} F_\Theta (t^ l)^{\frac{1-p}{p}}} \\ & {\displaystyle \prod _{C} \left( F_\Theta (c^ l) - F_\Theta (c^ r) \right)} {\displaystyle \prod _{C_ t, t^ l = \tau ^ l} \frac{F_\Theta (c^ l) - F_\Theta (c^ r)}{F_\Theta (t^ r)}} {\displaystyle \prod _{C_ t, t^ l > \tau ^ l} \frac{F_\Theta (c^ l) - F_\Theta (c^ r)}{F_\Theta (t^ r)} F_\Theta (t^ l)^{\frac{1-p}{p}}} \end{align*}

Note that the likelihood of the observations that are not left-truncated (observations in sets E and C, and observations in sets $E_ t$ and $C_ t$ for which $t^ l=\tau ^ l$) is not affected.
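
The modified factor for an uncensored, truncated observation (a member of $E_ t$) can be sketched as follows. The exponential distribution with mean `theta` and the function names are assumptions chosen for illustration; only the formula itself comes from the likelihood in this section.

```python
import math

def exp_pdf(x, theta):
    """PDF of an exponential distribution with mean theta (illustrative choice)."""
    return math.exp(-x / theta) / theta if x >= 0 else 0.0

def exp_cdf(x, theta):
    """CDF of the same distribution; returns 1.0 for x = +inf."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x / theta)

def modified_likelihood(theta, y, t_l, p, t_r=math.inf):
    """Likelihood factor of an observation in E_t when the probability of observability is p."""
    tau_l = 0.0  # smallest value in the support of the exponential distribution
    value = exp_pdf(y, theta) / exp_cdf(t_r, theta)
    if t_l > tau_l:
        # (1 - p) / p additional losses at or below t_l, each with probability F(t_l).
        value *= exp_cdf(t_l, theta) ** ((1.0 - p) / p)
    return value

# With p = 0.5 the exponent (1 - p) / p equals 1, so the factor is f(y) * F(t_l).
factor_half = modified_likelihood(2.0, y=3.0, t_l=1.0, p=0.5)

# An observation with t_l equal to tau_l keeps the factor f(y) / F(t_r).
factor_untruncated = modified_likelihood(2.0, y=3.0, t_l=0.0, p=0.5)
```

A smaller `p` raises $F_\Theta (t^ l)$ to a larger power, so parameter values that place little mass below the threshold are penalized more heavily.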

If you specify a custom objective function, then PROC SEVERITY accounts for the probability of observability only while computing the empirical distribution function estimate. The parameter estimates are affected only by your custom objective function.

Estimating Covariance and Standard Errors

PROC SEVERITY computes an estimate of the covariance matrix of the parameters by using the asymptotic theory of the maximum likelihood estimators (MLE). If N denotes the number of observations used for estimating a parameter vector $\pmb {\theta }$, then the theory states that as $N \rightarrow \infty $, the distribution of $\hat{\pmb {\theta }}$, the estimate of $\pmb {\theta }$, converges to a normal distribution with mean $\pmb {\theta }$ and covariance $\hat{\mathbf{C}}$ such that $\mathbf{I}(\pmb {\theta }) \cdot \hat{\mathbf{C}} \rightarrow 1$, where $\mathbf{I}(\pmb {\theta }) = -E\left[ \nabla ^2 \log (L(\pmb {\theta }))\right]$ is the information matrix for the likelihood of the data, $L(\pmb {\theta })$. The covariance estimate is obtained by using the inverse of the information matrix.

In particular, if $\mathbf{G} = \nabla ^2 (-\log (L(\pmb {\theta })))$ denotes the Hessian matrix of the negative of the log likelihood, then the covariance estimate is computed as

\[ \hat{\mathbf{C}} = \frac{N}{d} \mathbf{G}^{-1} \]

where d is a denominator that is determined by the VARDEF= option. If VARDEF=N, then $d = N$, which yields the asymptotic covariance estimate. If VARDEF=DF, then $d=N - k$, where k is the number of parameters (the model’s degrees of freedom). The VARDEF=DF option is the default, because it attempts to correct the potential bias introduced by the finite sample.

The standard error $s_ i$ of the parameter $\theta _ i$ is computed as the square root of the ith diagonal element of the estimated covariance matrix; that is, $s_ i = \sqrt {\hat{C}_{ii}}$.
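
The computation of $\hat{\mathbf{C}}$ and the standard errors can be sketched in Python for a two-parameter lognormal model without censoring or truncation. This is a minimal illustration under stated assumptions: the sample, the function names, and the central finite-difference Hessian are all hypothetical choices, and PROC SEVERITY may obtain its Hessian differently. The `vardef` argument mirrors the two VARDEF= values described above.

```python
import math

def neg_log_likelihood(params, ys):
    """-log L for a lognormal sample with parameters (mu, sigma)."""
    mu, sigma = params
    total = 0.0
    for y in ys:
        z = math.log(y)
        total += (z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
                  + (z - mu) ** 2 / (2.0 * sigma ** 2))
    return total

def hessian(func, point, h=1e-3):
    """Central finite-difference approximation of the Hessian of func at point."""
    k = len(point)

    def shifted(i, di, j, dj):
        p = list(point)
        p[i] += di * h
        p[j] += dj * h
        return func(p)

    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4.0 * h * h)
             for j in range(k)] for i in range(k)]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def covariance(G, n, vardef="DF"):
    """C = (N / d) * inverse(G), with d = N - k for VARDEF=DF and d = N for VARDEF=N."""
    k = len(G)
    d = n - k if vardef == "DF" else n
    return [[n / d * value for value in row] for row in inverse_2x2(G)]

# Hypothetical loss sample.
ys = [1.2, 3.4, 0.8, 5.1, 2.7, 1.9, 4.4, 2.2]
n = len(ys)

# Closed-form lognormal MLEs, used here in place of a numerical optimizer.
logs = [math.log(y) for y in ys]
mu_hat = sum(logs) / n
sigma_hat = math.sqrt(sum((z - mu_hat) ** 2 for z in logs) / n)

G = hessian(lambda p: neg_log_likelihood(p, ys), [mu_hat, sigma_hat])
C = covariance(G, n, vardef="DF")
std_errors = [math.sqrt(C[i][i]) for i in range(2)]
```

For this model the Hessian at the estimates is diagonal with entries $N/\hat{\sigma }^2$ and $2N/\hat{\sigma }^2$, so the standard errors should be close to $\hat{\sigma }/\sqrt {d}$ and $\hat{\sigma }/\sqrt {2d}$, which gives a simple check on the numerical result.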

If you specify a custom objective function, then the covariance matrix of the parameters is still computed by inverting the information matrix, except that the Hessian matrix $\mathbf{G}$ is computed as $\mathbf{G} = \nabla ^2 \log (U(\pmb {\theta }))$, where U denotes your custom objective function that is minimized by the optimizer.

Covariance and standard error estimates might not be available if the Hessian matrix is found to be singular at the end of the optimization process. This is especially likely if the optimization process stops without converging.