The SEVERITY Procedure

Censoring and Truncation

One of the key features of PROC SEVERITY is that it enables you to specify whether the magnitude of a severity event is observable and, if it is observable, whether its exact value is known. If an event is unobservable when its magnitude falls in certain intervals, the effect is referred to as truncation. If the exact magnitude of the event is not known but is known to lie in a certain interval, the effect is referred to as censoring.

PROC SEVERITY allows a severity event to be subject to any combination of the following four censoring and truncation effects:

Left-truncation: An event is said to be left-truncated if it is observed only when $Y > T^l$, where $Y$ denotes the random variable for the magnitude and $T^l$ denotes a random variable for the truncation threshold. You can specify left-truncation by using the LEFTTRUNCATED= option in the LOSS statement.
Right-truncation: An event is said to be right-truncated if it is observed only when $Y \leq T^r$, where $Y$ denotes the random variable for the magnitude and $T^r$ denotes a random variable for the truncation threshold. You can specify right-truncation by using the RIGHTTRUNCATED= option in the LOSS statement.
Left-censoring: An event is said to be left-censored if its magnitude is known to satisfy $Y \leq C^l$ but the exact value of $Y$ is not known. Here, $C^l$ is a random variable for the censoring limit. You can specify left-censoring by using the LEFTCENSORED= option in the LOSS statement.
Right-censoring: An event is said to be right-censored if its magnitude is known to satisfy $Y > C^r$ but the exact value of $Y$ is not known. Here, $C^r$ is a random variable for the censoring limit. You can specify right-censoring by using the RIGHTCENSORED= option in the LOSS statement.

For each effect, you can specify a different threshold or limit for each observation or specify a single threshold or limit that applies to all the observations.

If all four types of effects are present for an event, then the following relationship must hold: $T^l < C^r \leq C^l \leq T^r$. PROC SEVERITY checks these relationships and writes a warning to the SAS log if any of them is violated.
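The ordering check for an event that has all four effects can be sketched in Python as follows. The function name `check_limits` and its message strings are hypothetical illustrations; this sketch does not reproduce PROC SEVERITY's internal logic or the text of its log warnings, and it covers only the case in which all four limits are present.

```python
from typing import List


def check_limits(t_left: float, c_right: float, c_left: float, t_right: float) -> List[str]:
    """Hypothetical helper: return one message per violated inequality in
    T^l < C^r <= C^l <= T^r (an empty list means the ordering holds)."""
    problems: List[str] = []
    if not t_left < c_right:       # strict: T^l < C^r
        problems.append("T^l < C^r violated")
    if not c_right <= c_left:      # C^r <= C^l
        problems.append("C^r <= C^l violated")
    if not c_left <= t_right:      # C^l <= T^r
        problems.append("C^l <= T^r violated")
    return problems
```

For example, `check_limits(1.0, 5.0, 3.0, 2.0)` reports both the `C^r <= C^l` and the `C^l <= T^r` violations, whereas `check_limits(1.0, 2.0, 3.0, 4.0)` returns an empty list.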

If you specify the response variable in the LOSS statement, then PROC SEVERITY also checks whether each observation satisfies the definitions of the specified censoring and truncation effects.

If you specify left-truncation, then PROC SEVERITY ignores observations where $Y \leq T^l$, because such observations are not observable by definition. Similarly, if you specify right-truncation, then PROC SEVERITY ignores observations where $Y > T^r$.

If you specify left-censoring, then PROC SEVERITY treats an observation with $Y > C^l$ as uncensored and ignores the value of $C^l$. Observations with $Y \leq C^l$ are considered left-censored, and the value of $Y$ is ignored. If you specify right-censoring, then PROC SEVERITY treats an observation with $Y \leq C^r$ as uncensored and ignores the value of $C^r$. Observations with $Y > C^r$ are considered right-censored, and the value of $Y$ is ignored.

Specifying both left-censoring and right-censoring is referred to as interval-censoring. If $C^r < C^l$ for an observation, then that observation is considered interval-censored and the value of the response variable is ignored. If $C^r = C^l$ for an observation, then PROC SEVERITY treats that observation as uncensored.

If all the observations in a data set are censored in some form, then specifying the response variable in the LOSS statement is optional, because the actual value of the response variable is not required for estimating a model.
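The per-observation rules above can be summarized in a small Python sketch that classifies one observation. The function `classify_observation` and its string labels are hypothetical. Rejecting $C^r > C^l$ with an exception is an assumption made only for illustration; according to the text, PROC SEVERITY writes a warning to the log when the ordering is violated.

```python
from typing import Optional


def classify_observation(
    y: float,
    t_left: Optional[float] = None,   # left-truncation threshold T^l
    t_right: Optional[float] = None,  # right-truncation threshold T^r
    c_left: Optional[float] = None,   # left-censoring limit C^l
    c_right: Optional[float] = None,  # right-censoring limit C^r
) -> str:
    """Hypothetical helper that applies the censoring and truncation rules to one observation."""
    # Truncation: values outside the observable range are ignored.
    if t_left is not None and y <= t_left:
        return "ignored"
    if t_right is not None and y > t_right:
        return "ignored"

    # Interval-censoring: both censoring limits are specified.
    if c_left is not None and c_right is not None:
        if c_right < c_left:
            return "interval-censored"  # the value of y is ignored
        if c_right == c_left:
            return "uncensored"
        # Assumption for this sketch only: treat C^r > C^l as an error.
        raise ValueError("right-censoring limit exceeds left-censoring limit")

    # Only one censoring limit is specified.
    if c_left is not None:
        return "left-censored" if y <= c_left else "uncensored"
    if c_right is not None:
        return "right-censored" if y > c_right else "uncensored"
    return "uncensored"
```

For example, `classify_observation(3.0, c_left=4.0)` returns `"left-censored"`, while `classify_observation(7.0, c_left=10.0, c_right=5.0)` returns `"interval-censored"`.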

Specification of censoring and truncation affects the likelihood of the data (see the section Likelihood Function) and how the empirical distribution function (EDF) is estimated (see the section Empirical Distribution Function Estimation Methods).

Probability of Observability

For left-truncated data, PROC SEVERITY also enables you to provide additional information in the form of the probability of observability by using the PROBOBSERVED= option. It is defined as the probability that the underlying severity event is observed (and recorded) for the specified left-truncation threshold value. For example, if you specify a value of 0.75, then for every 75 observations recorded above the specified threshold, 25 more events have occurred with severity values less than or equal to that threshold. Although the exact severity values of those 25 events are not known, PROC SEVERITY can use the information about how many such events occurred.

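In particular, for each left-truncated observation $i$, PROC SEVERITY assumes the presence of $(1-p)/p$ additional observations with $y_i = t_i$, where $p$ is the probability of observability and $t_i$ is the left-truncation threshold of observation $i$. For $p = 0.75$, this gives $0.25/0.75 = 1/3$ additional observation per recorded observation, or 25 for every 75. These additional observations are then used for computing the likelihood (see the section Probability of Observability and Likelihood) and an unconditional estimate of the empirical distribution function (see the section EDF Estimates and Truncation).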
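A minimal Python sketch of this augmentation follows, assuming that the fractional count of additional observations is represented as a weight on a pseudo-observation at each threshold. The function `augment_with_unobserved` and the (value, weight) representation are hypothetical and are not meant to describe how PROC SEVERITY stores or processes data internally.

```python
from typing import List, Tuple


def augment_with_unobserved(
    losses: List[float], thresholds: List[float], p: float
) -> List[Tuple[float, float]]:
    """Hypothetical helper: return (value, weight) pairs containing each recorded
    loss y_i with weight 1, plus a pseudo-observation at its left-truncation
    threshold t_i with weight (1 - p) / p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability of observability must be in (0, 1]")
    extra = (1.0 - p) / p  # additional observations per left-truncated observation
    pairs: List[Tuple[float, float]] = []
    for y, t in zip(losses, thresholds):
        pairs.append((y, 1.0))
        if extra > 0.0:  # p == 1 means nothing went unobserved
            pairs.append((t, extra))
    return pairs
```

With 75 recorded losses above a threshold of 100 and p = 0.75, the pseudo-observations at 100 carry weights that sum to 25 (up to floating-point rounding), which matches the example in the text.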

Truncation and Conditional CDF Estimates

If you specify left-truncation without the probability of observability or if you specify right-truncation, then the EDF estimates that are computed by all methods except the STANDARD method are conditional on the truncation information. See the section EDF Estimates and Truncation for more information. In such cases, PROC SEVERITY uses conditional estimates of the CDF whenever they are used for computational or visual comparison with the EDF estimates.

Let $t^ l_{\text {min}} = \text {min}_ i \lbrace t^ l_ i \rbrace$ be the smallest value of the left-truncation threshold ( $t^ l_ i$ is the left-truncation threshold for observation $i$ ) and $t^ r_{\text {max}} = \text {max}_ i \lbrace t^ r_ i \rbrace$ be the largest value of the right-truncation threshold ( $t^ r_ i$ is the right-truncation threshold for observation $i$ ). If $\hat{F}(y)$ denotes the unconditional estimate of the CDF at $y$ , then the conditional estimate $\hat{F}^ c(y)$ is computed as follows:

If you do not specify the probability of observability, then the EDF estimates are conditional on the left-truncation information. If an observation is both left-truncated and right-truncated, then

$\hat{F}^ c(y) = \frac{\hat{F}(y) - \hat{F}(t^ l_{\text {min}})}{\hat{F}(t^ r_{\text {max}}) - \hat{F}(t^ l_{\text {min}})}$

If an observation is left-truncated but not right-truncated, then

$\hat{F}^ c(y) = \frac{\hat{F}(y) - \hat{F}(t^ l_{\text {min}})}{1 - \hat{F}(t^ l_{\text {min}})}$

If an observation is right-truncated but not left-truncated, then

$\hat{F}^ c(y) = \frac{\hat{F}(y)}{\hat{F}(t^ r_{\text {max}})}$

If you specify the probability of observability, then the EDF estimates are not conditional on the left-truncation information. If an observation is not right-truncated, then the conditional estimate is the same as the unconditional estimate. If an observation is right-truncated, then the conditional estimate is computed as

$\hat{F}^ c(y) = \frac{\hat{F}(y)}{\hat{F}(t^ r_{\text {max}})}$
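All of the preceding formulas share the form $(\hat{F}(y) - L)/(U - L)$, where $L$ is $\hat{F}(t^l_{\text{min}})$ when the estimates are conditional on left-truncation (otherwise 0), and $U$ is $\hat{F}(t^r_{\text{max}})$ when there is right-truncation (otherwise 1). The following Python sketch combines them into one function. The function `conditional_cdf` and its arguments are hypothetical. `cdf` stands for any unconditional CDF estimate $\hat{F}$, and the uniform CDF and threshold values in the usage lines are arbitrary examples.

```python
from typing import Callable, Optional


def conditional_cdf(
    cdf: Callable[[float], float],       # unconditional CDF estimate F-hat
    y: float,
    t_left_min: Optional[float] = None,  # smallest left-truncation threshold, if any
    t_right_max: Optional[float] = None, # largest right-truncation threshold, if any
    prob_observed_given: bool = False,   # True if a probability of observability was specified
) -> float:
    """Hypothetical helper: conditional CDF estimate F^c(y) from the cases in the text."""
    # Lower term: F-hat(t^l_min) only when estimates are conditional on left-truncation.
    lower = cdf(t_left_min) if (t_left_min is not None and not prob_observed_given) else 0.0
    # Upper term: F-hat(t^r_max) under right-truncation, otherwise 1.
    upper = cdf(t_right_max) if t_right_max is not None else 1.0
    if upper <= lower:
        raise ValueError("CDF estimate has no mass between the truncation thresholds")
    return (cdf(y) - lower) / (upper - lower)


# Usage with an arbitrary uniform CDF on [0, 10] and made-up thresholds:
def uniform_cdf(y: float) -> float:
    return min(max(y / 10.0, 0.0), 1.0)


left_thresholds = [2.0, 3.0, 4.0]  # t^l_i for each observation (example values)
right_thresholds = [7.0, 8.0]      # t^r_i for each observation (example values)
t_l_min = min(left_thresholds)
t_r_max = max(right_thresholds)
value = conditional_cdf(uniform_cdf, 5.0, t_l_min, t_r_max)  # (0.5 - 0.2) / (0.8 - 0.2)
```

Folding the cases into two normalizing terms makes it explicit that specifying the probability of observability removes only the left-truncation adjustment, while the right-truncation adjustment still applies.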

If you specify regressors, then $\hat{F}(y)$ , $\hat{F}(t^ l_{\text {min}})$ , and $\hat{F}(t^ r_{\text {max}})$ are all computed from a mixture distribution, as described in the section CDF and PDF Estimates with Regression Effects.