The HPSEVERITY Procedure

A Simple Example of Fitting Predefined Distributions

The simplest way to use PROC HPSEVERITY is to fit all the predefined distributions to a set of values and let the procedure identify the best fitting distribution.

Consider a lognormal distribution whose probability density function (PDF) $f$ and cumulative distribution function (CDF) $F$ are given by the following expressions, where $\Phi$ denotes the CDF of the standard normal distribution:

\[  f(x; \mu , \sigma ) = \frac{1}{x \sigma \sqrt {2 \pi }} e^{-\frac{1}{2}\left(\frac{\log (x) - \mu }{\sigma }\right)^2} \quad \text {and} \quad F(x; \mu , \sigma ) = \Phi \left(\frac{\log (x) - \mu }{\sigma }\right)  \]
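As a quick numerical check of these formulas, the following DATA step is a minimal sketch that evaluates the PDF and CDF directly and compares them with the built-in PDF and CDF functions. The data set name Work.Logn_check, the variable names, and the evaluation points are hypothetical; the parameter values $\mu = 1.5$ and $\sigma = 0.25$ are chosen only for illustration.

```sas
/* Sketch: evaluate the lognormal PDF and CDF from the formulas and   */
/* compare with the built-in functions. Names here are illustrative.  */
data work.logn_check;
   Mu = 1.5;
   Sigma = 0.25;
   pi = constant('PI');
   do x = 2 to 8 by 2;
      z = (log(x) - Mu) / Sigma;                 /* standardized log value */
      pdf_formula = exp(-0.5 * z**2) / (x * Sigma * sqrt(2 * pi));
      cdf_formula = probnorm(z);                 /* Phi(z) */
      pdf_builtin = pdf('LOGNORMAL', x, Mu, Sigma);
      cdf_builtin = cdf('LOGNORMAL', x, Mu, Sigma);
      output;
   end;
run;

proc print data=work.logn_check noobs;
   var x pdf_formula pdf_builtin cdf_formula cdf_builtin;
run;
```

If the formulas are coded correctly, the formula columns and the built-in columns should agree up to floating-point rounding.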

The following DATA step statements simulate a sample from a lognormal distribution with population parameters $\mu = 1.5$ and $\sigma =0.25$, and store the sample in the variable Y of a data set Work.Test_sev1:

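/*------------- Simple Lognormal Example -------------*/
data test_sev1(keep=y label='Simple Lognormal Sample');
   call streaminit(45678);          /* fix the seed for reproducibility */
   label y='Response Variable';
   Mu = 1.5;                        /* mean of log(y) */
   Sigma = 0.25;                    /* standard deviation of log(y) */
   do n = 1 to 100;
      /* rand('LOGNORMAL') returns exp(Z) with Z standard normal, */
      /* so y = exp(Mu + Sigma*Z)                                 */
      y = exp(Mu) * rand('LOGNORMAL')**Sigma;
      output;
   end;
run;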
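Because log(Y) is normally distributed with mean $\mu$ and standard deviation $\sigma$, one way to sanity-check the simulated sample is to summarize the log-transformed values. The following statements are a sketch of such a check; the data set Work.Test_sev1_log and the variable Log_y are hypothetical names introduced here.

```sas
/* Sketch: log-transform the simulated sample (hypothetical data set name) */
data work.test_sev1_log;
   set work.test_sev1;
   log_y = log(y);
run;

/* The sample mean and standard deviation of log_y estimate Mu and Sigma */
proc means data=work.test_sev1_log n mean std;
   var log_y;
run;
```

With 100 observations, the reported mean and standard deviation should be close to, but not exactly equal to, 1.5 and 0.25.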

The following statements fit all the predefined distribution models to the values of Y and identify the best distribution according to the corrected Akaike’s information criterion (AICC):

proc hpseverity data=test_sev1 crit=aicc;
   loss y;
   dist _predefined_;
run;

The PROC HPSEVERITY statement specifies the input data set along with the model selection criterion, the LOSS statement specifies the variable to be modeled, and the DIST statement with the _PREDEFINED_ keyword specifies that all the predefined distribution models be fitted.
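When only a few candidate distributions are of interest, the DIST statement can list them by name instead of using the _PREDEFINED_ keyword. The following statements are a minimal sketch that restricts the comparison to three of the predefined distributions; the particular subset is chosen only for illustration.

```sas
/* Sketch: fit only the lognormal, gamma, and Weibull distributions */
proc hpseverity data=test_sev1 crit=aicc;
   loss y;
   dist logn gamma weibull;
run;
```

The model selection then applies only to the listed distributions.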

Some of the default output displayed by this step is shown in Figure 9.1 through Figure 9.3. First, information about the input data set is displayed followed by the "Model Selection" table, as shown in Figure 9.1. The model selection table displays the convergence status, the value of the selection criterion, and the selection status for each of the candidate models. The Converged column indicates whether the estimation process for a given distribution model has converged, might have converged, or failed. The Selected column indicates whether a given distribution has the best fit for the data according to the selection criterion. For this example, the lognormal distribution model is selected, because it has the lowest value for the selection criterion.

Figure 9.1: Data Set Information and Model Selection Table

The HPSEVERITY Procedure

Input Data Set
Name WORK.TEST_SEV1
Label Simple Lognormal Sample

Model Selection
Distribution  Converged  AICC       Selected
Burr          Yes        322.50845  No
Exp           Yes        508.12287  No
Gamma         Yes        320.50264  No
Igauss        Yes        319.61652  No
Logn          Yes        319.56579  Yes
Pareto        Yes        510.28172  No
Gpd           Yes        510.20576  No
Weibull       Yes        334.82373  No
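To work with the fit statistics of all candidate models programmatically rather than reading them from the displayed output, the statistics can be written to a data set. The following statements are a sketch that assumes the OUTSTAT= option of the PROC HPSEVERITY statement; the data set name Work.Sev_stat is hypothetical.

```sas
/* Sketch: save the fit statistics of all candidate models to a data set */
proc hpseverity data=test_sev1 crit=aicc outstat=work.sev_stat;
   loss y;
   dist _predefined_;
run;

/* Inspect the saved statistics */
proc print data=work.sev_stat noobs;
run;
```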



Next, the estimation information for each of the candidate models is displayed. The information for the lognormal model, which is the best-fitting model, is shown in Figure 9.2. The first table displays a summary of the distribution. The second table displays the convergence status. This is followed by a summary of the optimization process, which indicates the technique used, the number of iterations, the number of times the objective function was evaluated, and the log likelihood attained at the end of the optimization. Because the lognormal model has converged, PROC HPSEVERITY displays its statistics of fit and parameter estimates. The estimates of Mu=1.49605 and Sigma=0.26243 are quite close to the population parameters of Mu=1.5 and Sigma=0.25 from which the sample was generated. The small p-value for each estimate leads to rejection of the null hypothesis that the corresponding parameter is 0, which implies that both estimates are significantly different from 0.

Figure 9.2: Estimation Details for the Lognormal Model

The HPSEVERITY Procedure
Logn Distribution

Distribution Information
Name Logn
Description Lognormal Distribution
Distribution Parameters 2

Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Optimization Summary
Optimization Technique Trust Region
Iterations 2
Function Calls 8
Log Likelihood -157.72104

Fit Statistics
-2 Log Likelihood 315.44208
AIC 319.44208
AICC 319.56579
BIC 324.65242
Kolmogorov-Smirnov 0.50641
Anderson-Darling 0.31240
Cramer-von Mises 0.04353

Parameter Estimates
Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|
Mu         1.49605   0.02651         56.43    <.0001
Sigma      0.26243   0.01874         14.00    <.0001
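
The information criteria in the "Fit Statistics" table can be reproduced by hand from the log likelihood. The following worked calculation is a sketch that assumes the standard definitions of AIC, AICC, and BIC, with $p = 2$ estimated parameters and $N = 100$ observations; the symbols $L$, $p$, and $N$ are introduced here for illustration:

\[ \begin{aligned} -2 \log L &= -2 \times (-157.72104) = 315.44208 \\ \text{AIC} &= -2 \log L + 2p = 315.44208 + 4 = 319.44208 \\ \text{AICC} &= \text{AIC} + \frac{2p(p+1)}{N - p - 1} = 319.44208 + \frac{12}{97} \approx 319.56579 \\ \text{BIC} &= -2 \log L + p \log (N) = 315.44208 + 2 \log (100) \approx 324.65242 \end{aligned} \]

These values agree with the AIC, AICC, and BIC rows of the table.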



The parameter estimates of the Burr distribution are shown in Figure 9.3. These estimates are used in the next example.

Figure 9.3: Parameter Estimates for the Burr Model

Parameter Estimates
Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|
Theta      4.62348   0.46181         10.01    <.0001
Alpha      1.15706   0.47493         2.44     0.0167
Gamma      6.41227   0.99039         6.47     <.0001
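
Each t value in the table is the ratio of the estimate to its standard error. The following DATA step is a sketch that recomputes these ratios from the displayed values and attaches an approximate two-sided p-value. The data set name Work.Burr_tcheck is hypothetical, and the degrees of freedom (number of observations minus number of parameters) are an assumption, not something stated in the output.

```sas
/* Sketch: recompute t values for the Burr estimates shown in Figure 9.3 */
data work.burr_tcheck;
   length Parameter $ 8;
   input Parameter $ Estimate StdErr;
   tValue = Estimate / StdErr;
   /* Assumption: t distribution with df = 100 observations - 3 parameters */
   df = 100 - 3;
   pValue = 2 * (1 - probt(abs(tValue), df));
   datalines;
Theta 4.62348 0.46181
Alpha 1.15706 0.47493
Gamma 6.41227 0.99039
;
run;

proc print data=work.burr_tcheck noobs;
   format tValue 8.2 pValue pvalue6.4;
run;
```

The recomputed t values can be compared against the t Value column; the p-values depend on the assumed degrees of freedom.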