In some applications, the data available for modeling might not be exact. A commonly encountered scenario is the use of grouped data from an external agency that, for several reasons including privacy, does not provide information about individual loss events. The losses are grouped into disjoint bins, and you know only the range and the number of values in each bin. Each group is essentially interval-censored, because you know that a loss magnitude lies in a certain interval, but you do not know the exact magnitude. This example illustrates how you can use PROC HPSEVERITY to model such data.
The following DATA step creates a sample data set of grouped dental insurance claims, taken from Klugman, Panjer, and Willmot (1998):
```sas
/* Grouped dental insurance claims data (Klugman, Panjer, and Willmot, 1998) */
data gdental;
   input lowerbd upperbd count @@;
   datalines;
0 25 30  25 50 31  50 100 57  100 150 42  150 250 65
250 500 84  500 1000 45  1000 1500 10  1500 2500 11  2500 4000 3
;
run;
```
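Before fitting, it can be worth confirming that the groups form proper, contiguous intervals and that the counts add up as expected. The following step is a minimal sketch of such a check and is not part of the original example; the variables `prev_upper` and `total` are introduced here for illustration only.

```sas
/* Illustrative sanity check on the grouped data (not part of the original example) */
data _null_;
   set gdental end=last;
   prev_upper = lag(upperbd);          /* upper bound of the previous group */
   /* Each group must be a proper interval */
   if lowerbd >= upperbd then
      put "Invalid interval at row " _n_ lowerbd= upperbd=;
   /* Each group should start where the previous one ended */
   if _n_ > 1 and lowerbd ne prev_upper then
      put "Gap or overlap before row " _n_ lowerbd= prev_upper=;
   total + count;                      /* running total of claims */
   if last then put "Total number of claims: " total;
run;
```

For the data shown above, the counts sum to 378 claims across 10 groups, and no interval messages should appear in the log.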
The following PROC HPSEVERITY step fits all the predefined distributions to the data in the Work.Gdental data set:
```sas
/* Fit all predefined distributions */
proc hpseverity data=gdental edf=turnbull print=all criterion=aicc;
   loss / rc=lowerbd lc=upperbd;
   weight count;
   dist _predef_;
   performance nthreads=1;
run;
```
The EDF= option in the PROC HPSEVERITY statement specifies that Turnbull’s method be used for EDF estimation. The LOSS statement specifies the left and right boundaries of each group as the right-censoring and left-censoring limits, respectively: the lower bound (RC=lowerbd) is the value that the loss is known to exceed, and the upper bound (LC=upperbd) is the value that the loss is known not to exceed. The variable count records the number of losses in each group and is specified in the WEIGHT statement. Note that no response variable is specified in the LOSS statement, which is allowed as long as each observation in the input data set is censored. The PERFORMANCE statement specifies that just one thread of execution be used, to minimize the overhead associated with multithreading, because the input data set is very small.
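To make the censoring specification concrete, the grouped-data log likelihood that such a fit maximizes can be sketched as follows. This is the standard form for interval-censored data, not a transcription of the procedure's internals; the symbols $l_i$, $u_i$, $w_i$, and $F$ are notation introduced here, and any scaling that the procedure applies to the weights is left unspecified.

```latex
\ell(\theta) = \sum_{i=1}^{N} w_i \,\log\bigl[ F(u_i;\theta) - F(l_i;\theta) \bigr],
\qquad w_i \propto \texttt{count}_i,\quad
l_i = \texttt{lowerbd}_i,\quad u_i = \texttt{upperbd}_i
```

Here $F$ is the cumulative distribution function of the candidate distribution with parameters $\theta$, and $N$ is the number of groups. For the first group, $l_1 = 0$, and $F(0;\theta) = 0$ for distributions supported on the positive reals, so that group's contribution reduces to $w_1 \log F(25;\theta)$.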
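Some of the key results prepared by PROC HPSEVERITY are shown in Output 9.5.1. According to the “Model Selection” table in Output 9.5.1, all distribution models have converged. The “All Fit Statistics” table in Output 9.5.1 indicates that the exponential distribution (EXP) has the best fit for the data according to a majority of the likelihood-based statistics (AIC, AICC, and BIC), whereas the Burr distribution (BURR) has the smallest -2 log likelihood and the best fit according to all the EDF-based statistics (KS, AD, and CvM). Because CRITERION=AICC is specified, the exponential distribution is selected.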
Output 9.5.1: Statistics of Fit for Interval-Censored Data
Input Data Set | |
---|---|
Name | WORK.GDENTAL |
Model Selection | |||
---|---|---|---|
Distribution | Converged | AICC | Selected |
Burr | Yes | 51.41112 | No |
Exp | Yes | 44.64768 | Yes |
Gamma | Yes | 47.63969 | No |
Igauss | Yes | 48.05874 | No |
Logn | Yes | 47.34027 | No |
Pareto | Yes | 47.16908 | No |
Gpd | Yes | 47.16908 | No |
Weibull | Yes | 47.47700 | No |
All Fit Statistics | |||||||
---|---|---|---|---|---|---|---|
Distribution | -2 Log Likelihood | AIC | AICC | BIC | KS | AD | CvM |
Burr | 41.41112 * | 47.41112 | 51.41112 | 48.31888 | 0.08974 * | 0.00103 * | 0.0000816 * |
Exp | 42.14768 | 44.14768 * | 44.64768 * | 44.45026 * | 0.26412 | 0.09936 | 0.01866 |
Gamma | 41.92541 | 45.92541 | 47.63969 | 46.53058 | 0.19569 | 0.04608 | 0.00759 |
Igauss | 42.34445 | 46.34445 | 48.05874 | 46.94962 | 0.34514 | 0.12301 | 0.02562 |
Logn | 41.62598 | 45.62598 | 47.34027 | 46.23115 | 0.16853 | 0.01884 | 0.00333 |
Pareto | 41.45480 | 45.45480 | 47.16908 | 46.05997 | 0.11423 | 0.00739 | 0.0009084 |
Gpd | 41.45480 | 45.45480 | 47.16908 | 46.05997 | 0.11423 | 0.00739 | 0.0009084 |
Weibull | 41.76272 | 45.76272 | 47.47700 | 46.36789 | 0.17238 | 0.03293 | 0.00472 |
Note: The asterisk (*) marks the best model according to each column's criterion. |
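The information criteria in the “All Fit Statistics” table can be checked by hand from the -2 log likelihood column. The sketch below uses the standard definitions, with $p$ the number of distribution parameters and $N$ the effective sample size. The value $N = 10$ (the number of groups in Work.Gdental) is an inference from the reported values, not something stated in the output shown here.

```latex
\begin{align*}
\mathrm{AIC}  &= -2\log L + 2p, \\
\mathrm{AICC} &= \mathrm{AIC} + \frac{2p(p+1)}{N - p - 1}, \\
\mathrm{BIC}  &= -2\log L + p \log N. \\[4pt]
\text{Exp } (p = 1):\quad
\mathrm{AIC}  &= 42.14768 + 2 = 44.14768, \\
\mathrm{AICC} &= 44.14768 + \tfrac{4}{8} = 44.64768, \\
\mathrm{BIC}  &= 42.14768 + \log 10 \approx 44.45026. \\[4pt]
\text{Burr } (p = 3):\quad
\mathrm{AIC}  &= 41.41112 + 6 = 47.41112, \\
\mathrm{AICC} &= 47.41112 + \tfrac{24}{6} = 51.41112, \\
\mathrm{BIC}  &= 41.41112 + 3\log 10 \approx 48.31888.
\end{align*}
```

The Burr distribution attains the smallest -2 log likelihood, but its three parameters incur a larger penalty than the single parameter of the exponential distribution, which is why the exponential distribution is preferred under AIC, AICC, and BIC.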