The HPCDM Procedure (Experimental)

Scenario Analysis

The distributions of loss frequency and loss severity often depend on exogenous variables (regressors). For example, the number of losses and the severity of each loss that an automobile insurance policyholder incurs might depend on the characteristics of the policyholder and of the vehicle. When you fit frequency and severity models, you need to account for the effects of such regressors on the probability distributions of the loss counts and loss severities. The COUNTREG procedure enables you to model regression effects on the mean of the count distribution, and the SEVERITY procedure enables you to model regression effects on the scale parameter of the severity distribution. When you use these models to estimate the compound distribution of the aggregate loss, you need to specify a set of values for all the regressors; these values represent the state of the world for which the simulation is conducted. This is referred to as what-if or scenario analysis.
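In symbols, the two kinds of regression effects can be sketched as follows. This assumes the NB2 form of the negative binomial model that DIST=NEGBIN requests in PROC COUNTREG (reported as NegBin(p=2) in the output later in this section). Here $\mathbf{x}$ holds the frequency regressors (with a leading 1 for the intercept), $\mathbf{z}$ holds the $k$ severity regressors, and $\boldsymbol{\beta}$, $\alpha$, $\theta_0$, and $\gamma_j$ are the estimated parameters. For a distribution whose scale is parameterized on the log scale, such as the lognormal, the regressors enter additively on that log scale instead. This is a simplified summary, not the procedures' full specification.

```latex
\begin{aligned}
E[N \mid \mathbf{x}] &= \lambda = \exp(\mathbf{x}^{\top}\boldsymbol{\beta}),
  \qquad \operatorname{Var}[N \mid \mathbf{x}] = \lambda + \alpha\lambda^{2}, \\
\theta &= \theta_0 \exp\Big(\sum_{j=1}^{k} \gamma_j z_j\Big)
\end{aligned}
```

A scenario fixes the values of $\mathbf{x}$ and $\mathbf{z}$, which in turn fixes $\lambda$ and $\theta$ for the simulation.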

Suppose that you, as an automobile insurance company, have postulated that the distribution of the loss event frequency depends on five regressors (external factors): age of the policyholder, gender, type of car, annual miles driven, and the policyholder's education level. Further, the distribution of the severity of each loss depends on three regressors: type of car, safety rating of the car, and annual household income of the policyholder (which can be thought of as a proxy for the luxury level of the car). As this example illustrates, the frequency model regressors and the severity model regressors need not be the same.

Let these regressors be recorded in the following variables: Age (scaled by a factor of 1/50), Gender (1: female, 2: male), CarType (1: sedan, 2: sport utility vehicle), AnnualMiles (scaled by a factor of 1/5,000), Education (1: high school graduate, 2: college graduate, 3: advanced degree holder), CarSafety (scaled to lie between 0 and 1, with 1 being the safest), and Income (scaled by a factor of 1/100,000). Let the historical data about the number of losses that various policyholders incur in a year be recorded in the NumLoss variable of the Work.LossCounts data set, and let the severity of each loss be recorded in the LossAmount variable of the Work.Losses data set.

The following PROC COUNTREG step fits the count regression model and stores the fitted model information in the Work.CountregModel item store:

/* Fit negative binomial frequency model for the number of losses */
proc countreg data=losscounts;
   model numloss = age gender carType annualMiles education / dist=negbin;
   store work.countregmodel;
run;

You can examine the parameter estimates of the count model that are stored in the Work.CountregModel item store by submitting the following statements:

/* Examine the parameter estimates for the model in the item store */
proc countreg restore=work.countregmodel;
   show parameters;
run;

The "Parameter Estimates" table that is displayed by the SHOW statement is shown in Figure 4.5.

Figure 4.5: Parameter Estimates of the Count Regression Model

ITEM STORE CONTENTS: WORK.COUNTREGMODEL

Parameter Estimates
Parameter DF Estimate Standard Error t Value Approx Pr > |t|
Intercept 1 0.910479 0.090515 10.06 <.0001
age 1 -0.626803 0.058547 -10.71 <.0001
gender 1 1.025034 0.032099 31.93 <.0001
carType 1 0.615165 0.031153 19.75 <.0001
annualMiles 1 -1.010276 0.017512 -57.69 <.0001
education 1 -0.280246 0.021677 -12.93 <.0001
_Alpha 1 0.318403 0.020090 15.85 <.0001



The following PROC SEVERITY step fits the severity scale regression models for all the common distributions that are predefined in PROC SEVERITY:

/* Fit severity models for the magnitude of losses */
proc severity data=losses plots=none outest=work.sevregest print=all;
   loss lossamount;
   scalemodel carType carSafety income;
   dist _predef_;
   nloptions maxiter=100;
run;

The comparison of fit statistics of various scale regression models is shown in Figure 4.6. The scale regression model that is based on the lognormal distribution is deemed the best-fitting model according to the likelihood-based statistics, whereas the scale regression model that is based on the generalized Pareto distribution (GPD) is deemed the best-fitting model according to the EDF-based statistics.

Figure 4.6: Severity Model Comparison

The SEVERITY Procedure

All Fit Statistics
Distribution -2 Log Likelihood AIC AICC BIC KS AD CvM
Burr 127231 127243 127243 127286 7.75407 224.47578 27.41346
Exp 128431 128439 128439 128467 6.13537 181.83094 12.33919
Gamma 128324 128334 128334 128370 7.54562 276.13156 24.59515
Igauss 127434 127444 127444 127480 6.15855 211.51908 17.70942
Logn 127062* 127072* 127072* 127107* 6.77687 212.70400 21.47945
Pareto 128166 128176 128176 128211 5.37453 110.53673 7.07119
Gpd 128166 128176 128176 128211 5.37453* 110.53660* 7.07116*
Weibull 128429 128439 128439 128475 6.21268 190.81178 13.45425
Note: The asterisk (*) marks the best model according to each column's criterion.



Now you are ready to analyze the distribution of the aggregate loss that can be expected from a specific policyholder: for example, a 59-year-old male policyholder with an advanced degree who earns 159,870 a year and drives a sedan with a very high safety rating about 11,474 miles annually. First, you need to encode and scale this information into the appropriate regressor variables of a data set. Let that data set be named Work.SinglePolicy, with an observation as shown in Figure 4.7.

Figure 4.7: Scenario Analysis Data for One Policyholder

age gender carType annualMiles education carSafety income
1.18 2 1 2.2948 3 0.99532 1.5987
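One way to produce the observation in Figure 4.7 is a DATA step that applies the encodings and scaling factors defined earlier to the raw characteristics of the policyholder. In the following sketch, the raw-value variables (rawAge, rawMiles, rawIncome) are hypothetical helper names that are not part of the original example. The CarSafety value is taken directly from Figure 4.7 because it is already on the 0 to 1 scale.

```sas
/* Encode one policyholder's raw characteristics as scaled regressors.
   rawAge, rawMiles, and rawIncome are hypothetical helper variables. */
data work.singlePolicy;
   rawAge    = 59;        /* age in years                     */
   rawMiles  = 11474;     /* miles driven per year            */
   rawIncome = 159870;    /* annual household income          */

   age         = rawAge / 50;          /* scaled age: 1.18          */
   gender      = 2;                    /* 2 = male                  */
   carType     = 1;                    /* 1 = sedan                 */
   annualMiles = rawMiles / 5000;      /* scaled miles: 2.2948      */
   education   = 3;                    /* 3 = advanced degree       */
   carSafety   = 0.99532;              /* already on the 0-1 scale  */
   income      = rawIncome / 100000;   /* scaled income: 1.5987     */

   /* Keep only the regressors so that the data set matches Figure 4.7 */
   keep age gender carType annualMiles education carSafety income;
run;
```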



Now, you can submit the following PROC HPCDM step to analyze the compound distribution of the aggregate loss that is incurred by the policyholder in the Work.SinglePolicy data set in a given year by using the frequency model from the Work.CountregModel item store and the two best severity models, lognormal and GPD, from the Work.SevRegEst data set:

/* Simulate the aggregate loss distribution for the scenario
   with single policyholder */
proc hpcdm data=singlePolicy nreplicates=10000 seed=13579 print=all
           countstore=work.countregmodel severityest=work.sevregest;
   severitymodel logn gpd;
   outsum out=onepolicysum mean stddev skew kurtosis median
         pctlpts=97.5 to 99.5 by 1;
run;

The displayed results from the preceding PROC HPCDM step are shown in Figure 4.8.

When you use a severity scale regression model, it is recommended that you verify the severity scale regressors that PROC HPCDM uses by examining the Scale Model Regressors row of the "Compound Distribution Information" table. PROC HPCDM detects the severity regressors automatically by examining the variables in the SEVERITYEST= and DATA= data sets. If those data sets contain variables that you did not include in the SCALEMODEL statement in PROC SEVERITY, then PROC HPCDM might treat such variables as severity regressors. One common mistake that leads to this situation is fitting a severity model with a BY statement and then forgetting to specify the identical BY statement in the PROC HPCDM step; PROC HPCDM can then treat the BY variables as scale model regressors. In this example, Figure 4.8 confirms that the correct set of scale model regressors (carType, carSafety, and income) is detected.
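The following sketch shows the pattern of repeating the identical BY statement in every step. It assumes a hypothetical BY variable, Region, that is not part of the original example and that is present in the Losses, LossCounts, and scenario data sets. It also assumes that the COUNTREG item store can hold one count model per BY group; check the BY-group requirements for the COUNTSTORE= option in the PROC HPCDM documentation before relying on this layout.

```sas
/* Region is a HYPOTHETICAL BY variable assumed to exist in all inputs */
proc sort data=losscounts   out=countsByRegion;   by region; run;
proc sort data=losses       out=lossesByRegion;   by region; run;
proc sort data=singlePolicy out=scenarioByRegion; by region; run;

/* Frequency model fitted separately for each region */
proc countreg data=countsByRegion;
   by region;
   model numloss = age gender carType annualMiles education / dist=negbin;
   store work.countregByRegion;
run;

/* Severity scale regression models fitted separately for each region */
proc severity data=lossesByRegion plots=none outest=work.sevregestByRegion;
   by region;
   loss lossamount;
   scalemodel carType carSafety income;
   dist logn gpd;
run;

/* Repeat the identical BY statement so that Region is treated as a
   BY variable rather than as a scale model regressor */
proc hpcdm data=scenarioByRegion nreplicates=10000 seed=13579
           countstore=work.countregByRegion severityest=work.sevregestByRegion;
   by region;
   severitymodel logn gpd;
run;
```

After running such a step, the Scale Model Regressors row should still list only carType, carSafety, and income.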

Figure 4.8: Scenario Analysis Results for One Policyholder with Lognormal Severity Model

The HPCDM Procedure
Severity Model: Logn
Count Model: NegBin(p=2)

Compound Distribution Information
Severity Model Lognormal Distribution
Scale Model Regressors carType carSafety income
Count Model NegBin(p=2) Model in Item Store WORK.COUNTREGMODEL

Sample Summary Statistics
Mean 209.76287 Median 0
Standard Deviation 559.78686 Interquartile Range 126.67003
Variance 313361.3 Minimum 0
Skewness 5.04884 Maximum 9998.2
Kurtosis 39.24667 Sample Size 10000

Sample Percentiles
Percentile Value
0 0
1 0
5 0
25 0
50 0
75 126.67003
95 1215.9
97.5 1863.1
98.5 2288.9
99 2737.3
99.5 3529.0
Percentile Method = 5



The "Sample Summary Statistics" and "Sample Percentiles" tables in Figure 4.8 show estimates of the aggregate loss distribution for the lognormal severity model. The expected (average) loss is about 210, and the worst-case loss, if approximated by the 97.5th percentile, is about 1,863. The percentile table shows that the distribution is highly skewed to the right; the skewness estimate of about 5 confirms this. The median estimate of 0 can be interpreted in two ways. One interpretation is that the policyholder will not incur any loss in at least 50% of the years during which he or she is insured. The other is that at least 50% of policyholders who have the characteristics of this policyholder will not incur any loss in a given year. At the other extreme, there is a 2.5% chance that the policyholder will incur a loss that exceeds 1,863 in any given year and a 0.5% chance that the loss will exceed 3,529.
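The zero median can be checked by hand from the count model alone. The following DATA step is a back-of-the-envelope sketch, not the way PROC HPCDM computes its results. It hardcodes the estimates from Figure 4.5 and assumes the NB2 parameterization, under which $\Pr(N=0) = (1+\alpha\lambda)^{-1/\alpha}$. The output data set name Work.ZeroLossCheck is hypothetical. For the policyholder in Work.SinglePolicy, this yields an expected loss count of about 0.72 and a probability of no losses of about 0.52. Because more than half of the simulated years contain no losses, the median aggregate loss is 0.

```sas
/* Expected loss count and Pr(N = 0) for the single policyholder,
   using the Figure 4.5 estimates and ASSUMING the NB2 form
   (mean lambda, variance lambda + alpha*lambda**2). */
data work.zeroLossCheck;
   set work.singlePolicy;
   xbeta = 0.910479
           - 0.626803*age
           + 1.025034*gender
           + 0.615165*carType
           - 1.010276*annualMiles
           - 0.280246*education;
   alpha  = 0.318403;                         /* _Alpha estimate       */
   lambda = exp(xbeta);                       /* expected loss count   */
   p0     = (1 + alpha*lambda)**(-1/alpha);   /* Pr(N = 0) under NB2   */
run;

proc print data=work.zeroLossCheck;
   var lambda p0;
run;
```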

If the aggregate loss sample is simulated by using the GPD severity model, then the results are as shown in Figure 4.9. The average and worst-case losses are 211 and 1,856, respectively. These estimates are very close to the values that are predicted by the lognormal severity model.

Figure 4.9: Scenario Analysis Results for One Policyholder with GPD Severity Model

The HPCDM Procedure
Severity Model: Gpd
Count Model: NegBin(p=2)

Compound Distribution Information
Severity Model Generalized Pareto Distribution
Scale Model Regressors carType carSafety income
Count Model NegBin(p=2) Model in Item Store WORK.COUNTREGMODEL

Sample Summary Statistics
Mean 211.16729 Median 0
Standard Deviation 539.58331 Interquartile Range 121.18808
Variance 291150.1 Minimum 0
Skewness 4.44116 Maximum 7349.2
Kurtosis 29.03404 Sample Size 10000

Sample Percentiles
Percentile Value
0 0
1 0
5 0
25 0
50 0
75 121.18808
95 1259.5
97.5 1855.7
98.5 2288.5
99 2577.5
99.5 3294.9
Percentile Method = 5



The scenario that you just analyzed contains only one policyholder. You can extend the scenario to include multiple policyholders. Let the Work.GroupOfPolicies data set record information about five different policyholders, as shown in Figure 4.10.

Figure 4.10: Scenario Analysis Data for Multiple Policyholders

policyholderId age gender carType annualMiles education carSafety income
1 1.18 2 1 2.2948 3 0.99532 1.59870
2 0.66 2 1 2.6718 2 0.86412 0.84459
3 0.64 2 2 1.9528 1 0.86478 0.50177
4 0.46 1 2 2.6402 2 0.27062 1.18870
5 0.62 1 1 1.7294 1 0.32830 0.37694
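If you want to reproduce this scenario, you can enter the values in Figure 4.10 directly with a DATA step like the following sketch. The values are copied from the figure; the first record is the policyholder from Figure 4.7.

```sas
/* Scenario data for five policyholders (values from Figure 4.10) */
data work.groupOfPolicies;
   input policyholderId age gender carType annualMiles education
         carSafety income;
   datalines;
1 1.18 2 1 2.2948 3 0.99532 1.59870
2 0.66 2 1 2.6718 2 0.86412 0.84459
3 0.64 2 2 1.9528 1 0.86478 0.50177
4 0.46 1 2 2.6402 2 0.27062 1.18870
5 0.62 1 1 1.7294 1 0.32830 0.37694
;
run;
```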



The following PROC HPCDM step conducts a scenario analysis for the aggregate loss that is incurred by all five policyholders in the Work.GroupOfPolicies data set together in one year:

/* Simulate the aggregate loss distribution for the scenario
   with multiple policyholders */
proc hpcdm data=groupOfPolicies nreplicates=10000 seed=13579 print=all
           countstore=work.countregmodel severityest=work.sevregest
           plots=(conditionaldensity(rightq=0.95)) nperturbedSamples=50;
   severitymodel logn gpd;
   outsum out=multipolicysum mean stddev skew kurtosis median
         pctlpts=97.5 to 99.5 by 1;
run;
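Conceptually, the quantity simulated in each replicate is the sum of the losses of all the policyholders in the scenario. The following sketch assumes the usual compound-distribution setup: counts and severities are independent within and across policyholders. Here $\lambda_i$ is the expected loss count computed from policyholder $i$'s frequency regressors, $\theta_i$ is the severity scale computed from that policyholder's severity regressors, and $F$ is the chosen severity distribution (lognormal or GPD).

```latex
\begin{aligned}
S &= \sum_{i=1}^{5} \sum_{j=1}^{N_i} X_{ij},
  \qquad N_i \sim \text{NegBin}(\lambda_i, \alpha),
  \qquad X_{ij} \sim F(\,\cdot \mid \theta_i), \\
E[S] &= \sum_{i=1}^{5} \lambda_i \, E[X_{i1}]
\end{aligned}
```

The expression for the mean follows from $E\big[\sum_{j=1}^{N} X_j\big] = E[N]\,E[X]$, which holds when the severities are identically distributed and independent of the count.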

The preceding PROC HPCDM step conducts perturbation analysis by simulating 50 perturbed samples. The perturbation summary results for the lognormal severity model are shown in Figure 4.11, and the results for the GPD severity model are shown in Figure 4.12. If the severity of each loss follows the fitted lognormal distribution, then you can expect that the group of policyholders together incurs an average loss of 5,333 $\pm $ 547 and a worst-case loss of 26,416 $\pm $ 2,681 when you define the worst-case loss as the 97.5th percentile.

Figure 4.11: Perturbation Analysis of Losses from Multiple Policyholders with Lognormal Severity Model

The HPCDM Procedure
Severity Model: Logn
Count Model: NegBin(p=2)

Compound Distribution Information
Severity Model Lognormal Distribution
Scale Model Regressors carType carSafety income
Count Model NegBin(p=2) Model in Item Store WORK.COUNTREGMODEL

Sample Perturbation Analysis
Statistic Estimate Standard Error
Mean 5333.4 547.02353
Standard Deviation 7428.7 729.93126
Variance 55718988 11240603
Skewness 2.99560 0.18583
Kurtosis 14.12580 2.83814
Number of Perturbed Samples = 50
Size of Each Sample = 10000

Sample Percentile Perturbation Analysis
Percentile Estimate Standard Error
1 0 0
5 0 0
25 727.92534 113.89462
50 2589.4 302.39245
75 6919.1 718.06509
95 20059.4 1971.8
97.5 26416.3 2681.2
98.5 31256.8 3136.3
99 35166.1 3403.8
99.5 42119.0 4099.4
Number of Perturbed Samples = 50
Size of Each Sample = 10000



If the severity of each loss follows the fitted GPD, then you can expect an average loss of 5,303 $\pm $ 553 and a worst-case loss of 25,885 $\pm $ 2,936.

If you decide to use the 99.5th percentile to define the worst-case loss, then the worst-case loss is 42,119 $\pm $ 4,099 for the lognormal severity model and 40,626 $\pm $ 5,022 for the GPD severity model. The lognormal and GPD estimates are well within one standard error of each other, which indicates that, in this particular example, the aggregate loss distribution is not very sensitive to the choice between these two severity distributions; you can use the results from either of them.
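The "within one standard error" comparison is simple arithmetic on the values in Figures 4.11 and 4.12. The following sketch expresses the absolute difference between the lognormal and GPD estimates in units of the smaller of the two standard errors; the data set and variable names are hypothetical. The resulting ratios are roughly 0.06 for the mean, 0.20 for the 97.5th percentile, and 0.36 for the 99.5th percentile, all well below 1.

```sas
/* Lognormal vs. GPD perturbation estimates (Figures 4.11 and 4.12) */
data work.sevCompare;
   input statistic $ lognEst lognSE gpdEst gpdSE;
   absDiff  = abs(lognEst - gpdEst);            /* absolute difference   */
   diffInSE = absDiff / min(lognSE, gpdSE);     /* in units of smaller SE */
   datalines;
Mean  5333.4  547.02353 5302.7  552.73848
P97.5 26416.3 2681.2    25885.1 2936.3
P99.5 42119.0 4099.4    40626.2 5022.4
;
run;

proc print data=work.sevCompare;
run;
```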

Figure 4.12: Perturbation Analysis of Losses from Multiple Policyholders with GPD Severity Model

The HPCDM Procedure
Severity Model: Gpd
Count Model: NegBin(p=2)

Compound Distribution Information
Severity Model Generalized Pareto Distribution
Scale Model Regressors carType carSafety income
Count Model NegBin(p=2) Model in Item Store WORK.COUNTREGMODEL

Sample Perturbation Analysis
Statistic Estimate Standard Error
Mean 5302.7 552.73848
Standard Deviation 7280.4 836.84441
Variance 53704766 12319537
Skewness 2.90094 0.19750
Kurtosis 13.29785 2.75709
Number of Perturbed Samples = 50
Size of Each Sample = 10000

Sample Percentile Perturbation Analysis
Percentile Estimate Standard Error
1 0 0
5 0 0
25 708.74349 102.91404
50 2616.7 282.24080
75 6974.4 697.83278
95 19747.8 2171.9
97.5 25885.1 2936.3
98.5 30424.8 3525.2
99 34213.3 4066.3
99.5 40626.2 5022.4
Number of Perturbed Samples = 50
Size of Each Sample = 10000



The PLOTS=CONDITIONALDENSITY option that is used in the preceding PROC HPCDM step produces conditional density plots for the body and right-tail regions of the density function of the aggregate loss. The plots for the aggregate loss sample that is generated by using the lognormal severity model are shown in Figure 4.13. The plot on the left side is the plot of $\Pr (Y| Y \leq \text {19,085})$, where the limit 19,085 is the 95th percentile as specified by the RIGHTQ=0.95 suboption. The plot on the right side is the plot of $\Pr (Y| Y > \text {19,085})$, which helps you visualize the right-tail region of the density function. If you want to explore the details of the left-tail region, you can also request a plot of the left tail by specifying the LEFTQ= suboption of the CONDITIONALDENSITY option. Note that the conditional density plots are always produced by using the unperturbed sample.
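A request for both tails might look like the following sketch. LEFTQ=0.05 is an illustrative choice that is not part of the original example, and this sketch assumes that LEFTQ= and RIGHTQ= can be combined in one CONDITIONALDENSITY specification. Only the lognormal severity model is simulated to keep the output short, and the perturbation analysis is omitted because the conditional density plots use the unperturbed sample anyway.

```sas
/* Conditional density plots for the left tail, body, and right tail.
   LEFTQ=0.05 is an illustrative value, not taken from the example. */
proc hpcdm data=groupOfPolicies nreplicates=10000 seed=13579
           countstore=work.countregmodel severityest=work.sevregest
           plots=(conditionaldensity(leftq=0.05 rightq=0.95));
   severitymodel logn;
run;
```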

Figure 4.13: Conditional Density Plots for the Aggregate Loss of Multiple Policyholders
