The HPCDM Procedure (Experimental)

Scenario Analysis

The distributions of loss frequency and loss severity often depend on exogenous variables (regressors). For example, the number of losses and the severity of each loss that an automobile insurance policyholder incurs might depend on the characteristics of the policyholder and the characteristics of the vehicle. When you fit frequency and severity models, you need to account for the effects of such regressors on the probability distributions of the counts and severity. The COUNTREG procedure enables you to model regression effects on the mean of the count distribution, and the SEVERITY procedure enables you to model regression effects on the scale parameter of the severity distribution. When you use these models to estimate the compound distribution model of the aggregate loss, you need to specify a set of values for all the regressors, which represents the state of the world for which the simulation is conducted. This is referred to as the what-if or scenario analysis.

Consider that you, as an automobile insurance company, have postulated that the distribution of the loss event frequency depends on five regressors (external factors): age of the policyholder, gender, type of car, annual miles driven, and policyholder’s education level. Further, the distribution of the severity of each loss depends on three regressors: type of car, safety rating of the car, and annual household income of the policyholder (which can be thought of as a proxy for the luxury level of the car). Note that the frequency model regressors and severity model regressors can be different, as illustrated in this example.

Let these regressors be recorded in the variables Age (scaled by a factor of 1/50), Gender (1: female, 2: male), CarType (1: sedan, 2: sport utility vehicle), AnnualMiles (scaled by a factor of 1/5,000), Education (1: high school graduate, 2: college graduate, 3: advanced degree holder), CarSafety (scaled to be between 0 and 1, the safest being 1), and Income (scaled by a factor of 1/100,000), respectively. Let the historical data about the number of losses that various policyholders incur in a year be recorded in the NumLoss variable of the Work.LossCounts data set, and let the severity of each loss be recorded in the LossAmount variable of the Work.Losses data set.

The following PROC COUNTREG step fits the count regression model and stores the fitted model information in the Work.CountregModel item store:

/* Fit negative binomial frequency model for the number of losses */
proc countreg data=losscounts;
   model numloss = age gender carType annualMiles education / dist=negbin;
   store work.countregmodel;
run;

You can examine the parameter estimates of the count model that are stored in the Work.CountregModel item store by submitting the following statements:

/* Examine the parameter estimates for the model in the item store */
proc countreg restore=work.countregmodel;
   show parameters;
run;

The "Parameter Estimates" table that is displayed by the SHOW statement is shown in Figure 4.5.

Figure 4.5: Parameter Estimates of the Count Regression Model

ITEM STORE CONTENTS: WORK.COUNTREGMODEL

Parameter Estimates
Parameter	DF	Estimate	Standard Error	t Value	Approx Pr > \|t\|
Intercept	1	0.910479	0.090515	10.06	<.0001
age	1	-0.626803	0.058547	-10.71	<.0001
gender	1	1.025034	0.032099	31.93	<.0001
carType	1	0.615165	0.031153	19.75	<.0001
annualMiles	1	-1.010276	0.017512	-57.69	<.0001
education	1	-0.280246	0.021677	-12.93	<.0001
_Alpha	1	0.318403	0.020090	15.85	<.0001

The following PROC SEVERITY step fits the severity scale regression models for all the common distributions that are predefined in PROC SEVERITY:

/* Fit severity models for the magnitude of losses */
proc severity data=losses plots=none outest=work.sevregest print=all;
   loss lossamount;
   scalemodel carType carSafety income;
   dist _predef_;
   nloptions maxiter=100;
run;

The comparison of fit statistics of various scale regression models is shown in Figure 4.6. The scale regression model that is based on the lognormal distribution is deemed the best-fitting model according to the likelihood-based statistics, whereas the scale regression model that is based on the generalized Pareto distribution (GPD) is deemed the best-fitting model according to the EDF-based statistics.

Figure 4.6: Severity Model Comparison

The SEVERITY Procedure

All Fit Statistics
Distribution	-2 Log Likelihood		AIC		AICC		BIC		KS		AD		CvM
Burr	127231		127243		127243		127286		7.75407		224.47578		27.41346
Exp	128431		128439		128439		128467		6.13537		181.83094		12.33919
Gamma	128324		128334		128334		128370		7.54562		276.13156		24.59515
Igauss	127434		127444		127444		127480		6.15855		211.51908		17.70942
Logn	127062	*	127072	*	127072	*	127107	*	6.77687		212.70400		21.47945
Pareto	128166		128176		128176		128211		5.37453		110.53673		7.07119
Gpd	128166		128176		128176		128211		5.37453	*	110.53660	*	7.07116	*
Weibull	128429		128439		128439		128475		6.21268		190.81178		13.45425
Note: The asterisk (*) marks the best model according to each column's criterion.

Now, you are ready to analyze the distribution of the aggregate loss that can be expected from a specific policyholder—for example, a 59-year-old male policyholder with an advanced degree who earns 159,870 and drives a sedan that has a very high safety rating about 11,474 miles annually. First, you need to encode and scale this information into the appropriate regressor variables of a data set. Let that data set be named Work.SinglePolicy, with an observation as shown in Figure 4.7.

Figure 4.7: Scenario Analysis Data for One Policyholder

age	gender	carType	annualMiles	education	carSafety	income
1.18	2	1	2.2948	3	0.99532	1.5987

Now, you can submit the following PROC HPCDM step to analyze the compound distribution of the aggregate loss that is incurred by the policyholder in the Work.SinglePolicy data set in a given year by using the frequency model from the Work.CountregModel item store and the two best severity models, lognormal and GPD, from the Work.SevRegEst data set:

/* Simulate the aggregate loss distribution for the scenario
   with single policyholder */
proc hpcdm data=singlePolicy nreplicates=10000 seed=13579 print=all
           countstore=work.countregmodel severityest=work.sevregest;
   severitymodel logn gpd;
   outsum out=onepolicysum mean stddev skew kurtosis median
         pctlpts=97.5 to 99.5 by 1;
run;

The displayed results from the preceding PROC HPCDM step are shown in Figure 4.8.

When you use a severity scale regression model, it is recommended that you verify the severity scale regressors that are used by PROC HPCDM by examining the Scale Model Regressors row of the "Compound Distribution Information" table. PROC HPCDM detects the severity regressors automatically by examining the variables in the SEVERITYEST= and DATA= data sets. If those data sets contain variables that you did not include in the SCALEMODEL statement in PROC SEVERITY, then such variables can be treated as severity regressors. One common mistake that can lead to this situation is to fit a severity model by using the BY statement and forget to specify the identical BY statement in the PROC HPCDM step; this can cause PROC HPCDM to treat BY variables as scale model regressors. In this example, Figure 4.8 confirms that the correct set of scale model regressors is detected.

Figure 4.8: Scenario Analysis Results for One Policyholder with Lognormal Severity Model

The HPCDM Procedure

Severity Model: Logn

Count Model: NegBin(p=2)

Compound Distribution Information
Severity Model	Lognormal Distribution
Scale Model Regressors	carType carSafety income
Count Model	NegBin(p=2) Model in Item Store WORK.COUNTREGMODEL

Sample Summary Statistics
Mean	209.76287	Median	0
Standard Deviation	559.78686	Interquartile Range	126.67003
Variance	313361.3	Minimum	0
Skewness	5.04884	Maximum	9998.2
Kurtosis	39.24667	Sample Size	10000

Sample Percentiles
Percentile	Value
0	0
1	0
5	0
25	0
50	0
75	126.67003
95	1215.9
97.5	1863.1
98.5	2288.9
99	2737.3
99.5	3529.0
Percentile Method = 5

The "Sample Summary Statistics" and "Sample Percentiles" tables in Figure 4.8 show estimates of the aggregate loss distribution for the lognormal severity model. The average expected loss is about 210, and the worst-case loss, if approximated by the 97.5th percentile, is about 1,863. The percentiles table shows that the distribution is highly skewed to the right; this is also confirmed by the skewness estimate. The median estimate of 0 can be interpreted in two ways. One way is to conclude that the policyholder will not incur any loss in 50% of the years during which he or she is insured. The other way is to conclude that 50% of policyholders who have the characteristics of this policyholder will not incur any loss in a given year. However, there is a 2.5% chance that the policyholder will incur a loss that exceeds 1,863 in any given year and a 0.5% chance that the policyholder will incur a loss that exceeds 3,529 in any given year.

If the aggregate loss sample is simulated by using the GPD severity model, then the results are as shown in Figure 4.9. The average and worst-case losses are 211 and 1,856, respectively. These estimates are very close to the values that are predicted by the lognormal severity model.

Figure 4.9: Scenario Analysis Results for One Policyholder with GPD Severity Model

The HPCDM Procedure

Severity Model: Gpd

Count Model: NegBin(p=2)

Compound Distribution Information
Severity Model	Generalized Pareto Distribution
Scale Model Regressors	carType carSafety income
Count Model	NegBin(p=2) Model in Item Store WORK.COUNTREGMODEL

Sample Summary Statistics
Mean	211.16729	Median	0
Standard Deviation	539.58331	Interquartile Range	121.18808
Variance	291150.1	Minimum	0
Skewness	4.44116	Maximum	7349.2
Kurtosis	29.03404	Sample Size	10000

Sample Percentiles
Percentile	Value
0	0
1	0
5	0
25	0
50	0
75	121.18808
95	1259.5
97.5	1855.7
98.5	2288.5
99	2577.5
99.5	3294.9
Percentile Method = 5

The scenario that you just analyzed contains only one policyholder. You can extend the scenario to include multiple policyholders. Let the Work.GroupOfPolicies data set record information about five different policyholders, as shown in Figure 4.10.

Figure 4.10: Scenario Analysis Data for Multiple Policyholders

policyholderId	age	gender	carType	annualMiles	education	carSafety	income
1	1.18	2	1	2.2948	3	0.99532	1.59870
2	0.66	2	1	2.6718	2	0.86412	0.84459
3	0.64	2	2	1.9528	1	0.86478	0.50177
4	0.46	1	2	2.6402	2	0.27062	1.18870
5	0.62	1	1	1.7294	1	0.32830	0.37694

The following PROC HPCDM step conducts a scenario analysis for the aggregate loss that is incurred by all five policyholders in the Work.GroupOfPolicies data set together in one year:

/* Simulate the aggregate loss distribution for the scenario
   with multiple policyholders */
proc hpcdm data=groupOfPolicies nreplicates=10000 seed=13579 print=all
           countstore=work.countregmodel severityest=work.sevregest
           plots=(conditionaldensity(rightq=0.95)) nperturbedSamples=50;
   severitymodel logn gpd;
   outsum out=multipolicysum mean stddev skew kurtosis median
         pctlpts=97.5 to 99.5 by 1;
run;

The preceding PROC HPCDM step conducts perturbation analysis by simulating 50 perturbed samples. The perturbation summary results for the lognormal severity model are shown in Figure 4.11, and the results for the GPD severity model are shown in Figure 4.12. If the severity of each loss follows the fitted lognormal distribution, then you can expect that the group of policyholders together incurs an average loss of 5,333 $\pm$ 547 and a worst-case loss of 26,416 $\pm$ 2,681 when you define the worst-case loss as the 97.5th percentile.

Figure 4.11: Perturbation Analysis of Losses from Multiple Policyholders with Lognormal Severity Model

The HPCDM Procedure

Severity Model: Logn

Count Model: NegBin(p=2)

Compound Distribution Information
Severity Model	Lognormal Distribution
Scale Model Regressors	carType carSafety income
Count Model	NegBin(p=2) Model in Item Store WORK.COUNTREGMODEL

Sample Perturbation Analysis
Statistic	Estimate	Standard Error
Mean	5333.4	547.02353
Standard Deviation	7428.7	729.93126
Variance	55718988	11240603
Skewness	2.99560	0.18583
Kurtosis	14.12580	2.83814
Number of Perturbed Samples = 50
Size of Each Sample = 10000

Sample Percentile Perturbation Analysis
Percentile	Estimate	Standard Error
1	0	0
5	0	0
25	727.92534	113.89462
50	2589.4	302.39245
75	6919.1	718.06509
95	20059.4	1971.8
97.5	26416.3	2681.2
98.5	31256.8	3136.3
99	35166.1	3403.8
99.5	42119.0	4099.4
Number of Perturbed Samples = 50
Size of Each Sample = 10000

If the severity of each loss follows the fitted GPD distribution, then you can expect an average loss of 5,303 $\pm$ 553 and a worst-case loss of 25,885 $\pm$ 2,936.

If you decide to use the 99.5th percentile to define the worst-case loss, then the worst-case loss is 42,119 $\pm$ 4,099 for the lognormal severity model and 40,626 $\pm$ 5,022 for the GPD severity model. The numbers for lognormal and GPD are well within one standard error of each other, which indicates that the aggregate loss distribution is less sensitive to the choice of these two severity distributions in this particular example; you can use the results from either of them.

Figure 4.12: Perturbation Analysis of Losses from Multiple Policyholders with GPD Severity Model

The HPCDM Procedure

Severity Model: Gpd

Count Model: NegBin(p=2)

Compound Distribution Information
Severity Model	Generalized Pareto Distribution
Scale Model Regressors	carType carSafety income
Count Model	NegBin(p=2) Model in Item Store WORK.COUNTREGMODEL

Sample Perturbation Analysis
Statistic	Estimate	Standard Error
Mean	5302.7	552.73848
Standard Deviation	7280.4	836.84441
Variance	53704766	12319537
Skewness	2.90094	0.19750
Kurtosis	13.29785	2.75709
Number of Perturbed Samples = 50
Size of Each Sample = 10000

Sample Percentile Perturbation Analysis
Percentile	Estimate	Standard Error
1	0	0
5	0	0
25	708.74349	102.91404
50	2616.7	282.24080
75	6974.4	697.83278
95	19747.8	2171.9
97.5	25885.1	2936.3
98.5	30424.8	3525.2
99	34213.3	4066.3
99.5	40626.2	5022.4
Number of Perturbed Samples = 50
Size of Each Sample = 10000

The PLOTS=CONDITIONALDENSITY option that is used in the preceding PROC HPCDM step prepares the conditional density plots for the body and right-tail regions of the density function of the aggregate loss. The plots for the aggregate loss sample that is generated by using the lognormal severity model are shown in Figure 4.13. The plot on the left side is the plot of $\Pr (Y| Y \leq \text {19,085})$ , where the limit 19,085 is the 95th percentile as specified by the RIGHTQ=0.95 option. The plot on the right side is the plot of $\Pr (Y| Y > \text {19,085})$ , which helps you visualize the right-tail region of the density function. You can also request the plot of the left tail by specifying the LEFTQ= suboption of the CONDITIONALDENSITY option if you want to explore the details of the left tail region. Note that the conditional density plots are always produced by using the unperturbed sample.

Figure 4.13: Conditional Density Plots for the Aggregate Loss of Multiple Policyholders