The HPSEVERITY Procedure

Example 23.5 Fitting Distributions to Interval-Censored Data

In some applications, the data available for modeling might not be exact. A commonly encountered scenario is the use of grouped data from an external agency that, for reasons such as privacy, does not provide information about individual loss events. The losses are grouped into disjoint bins, and you know only the range and the number of values in each bin. Each group is essentially interval-censored, because you know that a loss magnitude lies in a certain interval, but you do not know the exact magnitude. This example illustrates how you can use PROC HPSEVERITY to model such data.

The following DATA step creates a sample data set of grouped dental insurance claims, taken from Klugman, Panjer, and Willmot (1998). Each record holds the lower bound, the upper bound, and the number of claims for one group:

/* Grouped dental insurance claims data
    (Klugman, Panjer, and Willmot, 1998) */
data gdental;
   input lowerbd upperbd count @@;
   datalines;
0 25 30  25 50 31  50 100 57  100 150 42  150 250 65  250 500 84
500 1000 45  1000 1500 10  1500 2500 11  2500 4000 3
;
run;
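As a quick sanity check outside SAS, the grouped data can be summarized in a few lines of Python. This is only an illustrative sketch: the names GROUPS, cumulative_proportions, and bins_are_contiguous are made up for this example and are not part of any SAS interface. It confirms that the bins tile the range from 0 to 4000 without gaps and shows the share of claims at or below each upper bound.

```python
# Grouped dental claims as (lower bound, upper bound, count),
# copied from the DATA step above.
GROUPS = [
    (0, 25, 30), (25, 50, 31), (50, 100, 57), (100, 150, 42),
    (150, 250, 65), (250, 500, 84), (500, 1000, 45),
    (1000, 1500, 10), (1500, 2500, 11), (2500, 4000, 3),
]


def cumulative_proportions(groups):
    """Return (upper bound, share of claims at or below it) for each bin."""
    total = sum(count for _, _, count in groups)
    running = 0
    result = []
    for _, upper, count in groups:
        running += count
        result.append((upper, running / total))
    return result


def bins_are_contiguous(groups):
    """True when each bin starts exactly where the previous one ends."""
    return all(prev[1] == cur[0] for prev, cur in zip(groups, groups[1:]))


if __name__ == "__main__":
    print("total claims:", sum(count for _, _, count in GROUPS))
    print("contiguous bins:", bins_are_contiguous(GROUPS))
    for upper, share in cumulative_proportions(GROUPS):
        print(f"<= {upper:>4}: {share:.4f}")
```

Because the bins are disjoint and contiguous, these cumulative shares are all that the data reveal about the empirical distribution; nothing is known about how claims are spread within a bin.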

The following PROC HPSEVERITY step fits all the predefined distributions to the data in the Work.Gdental data set:

/* Fit all predefined distributions */
proc hpseverity data=gdental edf=turnbull print=all criterion=aicc;
   loss / rc=lowerbd lc=upperbd;
   weight count;
   dist _predef_;
   performance nthreads=1;
run;

The EDF= option in the PROC HPSEVERITY statement specifies that Turnbull’s method be used for EDF estimation. The LOSS statement specifies the left and right boundaries of each group as the right-censoring and left-censoring limits, respectively: a loss in a group is known to be greater than the group's lower bound (so it is right-censored at lowerbd) and no greater than the group's upper bound (so it is left-censored at upperbd). The variable count records the number of losses in each group and is specified in the WEIGHT statement. Note that no response variable is specified in the LOSS statement, which is allowed as long as each observation in the input data set is censored. Because the input data set is very small, the PERFORMANCE statement requests a single thread of execution, which minimizes the overhead associated with multithreading.
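To make the role of the censoring limits and weights concrete, the following Python sketch evaluates an interval-censored likelihood for the exponential distribution, in which each group contributes F(upperbd) - F(lowerbd) raised to the power of its weight. It is not the procedure's implementation. It rests on two labeled assumptions: that the weights are rescaled to sum to the number of observations (10 rows), and that a simple golden-section search is an adequate stand-in for the optimizer. The names GROUPS, interval_prob, neg2_loglik, and golden_section_min are hypothetical.

```python
import math

# (lower bound, upper bound, count), copied from the Work.Gdental DATA step.
GROUPS = [
    (0, 25, 30), (25, 50, 31), (50, 100, 57), (100, 150, 42),
    (150, 250, 65), (250, 500, 84), (500, 1000, 45),
    (1000, 1500, 10), (1500, 2500, 11), (2500, 4000, 3),
]


def interval_prob(lower, upper, theta):
    """P(lower < X <= upper) for an exponential with scale theta.

    Written as a difference of survival functions, which equals
    F(upper) - F(lower) but is numerically safer in the far tail.
    """
    return math.exp(-lower / theta) - math.exp(-upper / theta)


def neg2_loglik(theta, groups):
    """-2 * weighted log likelihood of interval-censored groups.

    ASSUMPTION: counts are rescaled so that the weights sum to the
    number of rows in the data set, not to the total claim count.
    """
    scale = len(groups) / sum(count for _, _, count in groups)
    loglik = sum(
        scale * count * math.log(interval_prob(lower, upper, theta))
        for lower, upper, count in groups
    )
    return -2.0 * loglik


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


if __name__ == "__main__":
    theta_hat = golden_section_min(lambda t: neg2_loglik(t, GROUPS), 50.0, 2000.0)
    print(f"theta estimate: {theta_hat:.3f}")
    print(f"-2 log likelihood: {neg2_loglik(theta_hat, GROUPS):.5f}")
```

If the weight-rescaling assumption holds, the minimized value can be compared with the -2 log likelihood that the procedure reports for the exponential distribution. The search is valid here because the exponential interval-censored log likelihood is concave in the rate parameter, and therefore has a single optimum in theta.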

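Some of the key results prepared by PROC HPSEVERITY are shown in Output 23.5.1. According to the "Model Selection" table in Output 23.5.1, all distribution models have converged, and the exponential distribution (EXP) is selected because it has the smallest value of AICC, the criterion requested by the CRITERION= option. The "All Fit Statistics" table in Output 23.5.1 indicates that the exponential distribution has the best fit for the data according to three of the four likelihood-based statistics (AIC, AICC, and BIC). The Burr distribution has the smallest -2 log likelihood and the best values of the EDF-based statistics (KS, AD, and CvM).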

Output 23.5.1: Statistics of Fit for Interval-Censored Data

The HPSEVERITY Procedure

Input Data Set
Name WORK.GDENTAL

Model Selection
Distribution Converged AICC Selected
Burr Yes 51.41112 No
Exp Yes 44.64768 Yes
Gamma Yes 47.63969 No
Igauss Yes 48.05874 No
Logn Yes 47.34027 No
Pareto Yes 47.16908 No
Gpd Yes 47.16908 No
Weibull Yes 47.47700 No

All Fit Statistics
Distribution  -2 Log Likelihood  AIC         AICC        BIC         KS         AD         CvM
Burr          41.41112 *         47.41112    51.41112    48.31888    0.08974 *  0.00103 *  0.0000816 *
Exp           42.14768           44.14768 *  44.64768 *  44.45026 *  0.26412    0.09936    0.01866
Gamma         41.92541           45.92541    47.63969    46.53058    0.19569    0.04608    0.00759
Igauss        42.34445           46.34445    48.05874    46.94962    0.34514    0.12301    0.02562
Logn          41.62598           45.62598    47.34027    46.23115    0.16853    0.01884    0.00333
Pareto        41.45480           45.45480    47.16908    46.05997    0.11423    0.00739    0.0009084
Gpd           41.45480           45.45480    47.16908    46.05997    0.11423    0.00739    0.0009084
Weibull       41.76272           45.76272    47.47700    46.36789    0.17238    0.03293    0.00472
Note: The asterisk (*) marks the best model according to each column's criterion.
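The AIC, AICC, and BIC columns can be reproduced from the -2 log likelihood column with the standard formulas AIC = -2 log L + 2k, AICC = AIC + 2k(k+1)/(n-k-1), and BIC = -2 log L + k log(n). The Python sketch below does this under two assumptions that are inferred from the table and are not stated in the output: the sample size n is the number of rows in the input data set (10), and the parameter counts k are 1 for Exp, 3 for Burr, and 2 for the other distributions. The names FITS and information_criteria are illustrative.

```python
import math

N_OBS = 10  # ASSUMPTION: n is the number of rows (groups), not the total claim count

# Distribution: (-2 log likelihood from the table, assumed parameter count k).
FITS = {
    "Burr": (41.41112, 3),
    "Exp": (42.14768, 1),
    "Gamma": (41.92541, 2),
    "Igauss": (42.34445, 2),
    "Logn": (41.62598, 2),
    "Pareto": (41.45480, 2),
    "Gpd": (41.45480, 2),
    "Weibull": (41.76272, 2),
}


def information_criteria(neg2_loglik, k, n):
    """Return (AIC, AICC, BIC) for a fit with k parameters and n observations."""
    aic = neg2_loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = neg2_loglik + k * math.log(n)
    return aic, aicc, bic


if __name__ == "__main__":
    for name, (neg2ll, k) in FITS.items():
        aic, aicc, bic = information_criteria(neg2ll, k, N_OBS)
        print(f"{name:<8} AIC={aic:.5f}  AICC={aicc:.5f}  BIC={bic:.5f}")
    best = min(FITS, key=lambda d: information_criteria(*FITS[d], N_OBS)[1])
    print("smallest AICC:", best)
```

The calculation shows why the exponential distribution is selected even though its likelihood is not the largest: with only 10 observations, the penalty for each additional parameter outweighs the small improvement in -2 log likelihood that the two- and three-parameter distributions achieve.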