Prior Probabilities

For a categorical target variable, each modeling node can estimate posterior probabilities for each class, which are defined as the conditional probabilities of the classes given the input variables. By default, the posterior probabilities are based on implicit prior probabilities that are proportional to the frequencies of the classes in the training set. You can specify different prior probabilities via the Target Profile using the Prior Probabilities tab (see the Target Profile chapter). Also, given a previously scored data set containing posterior probabilities, you can compute new posterior probabilities for different priors by using the DECIDE procedure, which reads the prior probabilities from a decision data set.
Prior probabilities should be specified when the sample proportions of the classes in the training set differ substantially from the proportions in the operational data to be scored, whether through sampling variation or deliberate bias. For example, when the purpose of the analysis is to detect a rare class, it is common practice to use a training set in which the rare class is overrepresented. If no prior probabilities are used, the estimated posterior probabilities for the rare class will be too high. If you specify correct priors, the posterior probabilities will be correctly adjusted no matter what the proportions in the training set are. For more information, see Detecting Rare Classes.
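As a concrete illustration (the numbers here are hypothetical), suppose the rare class makes up 1% of the operational data but was oversampled to 50% of the training set, so the implicit priors are 0.5 for each class. Applying the adjustment formula given later in this section to a case that the model scores at 0.5 for the rare class:

\[
\mathrm{Post}(\text{rare}) = \frac{0.5 \times 0.01 / 0.5}{0.5 \times 0.01 / 0.5 \;+\; 0.5 \times 0.99 / 0.5} = \frac{0.01}{0.01 + 0.99} = 0.01
\]

A case that looks like an even split under the training-set priors is in fact only about 1% likely to belong to the rare class.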
Increasing the prior probability of a class increases the posterior probability of the class, moving the classification boundary for that class so that more cases are classified into the class. Changing the prior will have a more noticeable effect if the original posterior is near 0.5 than if it is near zero or one.
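The reason is easiest to see in terms of odds. For two classes t and s, Bayes' theorem gives (a standard identity, not specific to Enterprise Miner):

\[
\frac{\mathrm{Post}(t)}{1 - \mathrm{Post}(t)} = \frac{\mathrm{Prior}(t)}{\mathrm{Prior}(s)} \times \frac{p(x \mid t)}{p(x \mid s)}
\]

so multiplying the prior odds by a factor k multiplies the posterior odds by the same k. Doubling the prior odds moves a posterior of 0.5 to 2/3, a change of about 0.17, but moves a posterior of 0.99 only to 198/199, or about 0.995.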
For linear logistic regression and linear normal-theory discriminant analysis, classification boundaries are hyperplanes; increasing the prior for a class moves the hyperplanes for that class farther from the class mean, while decreasing the prior moves the hyperplanes closer to the class mean. But changing the priors does not change the angles of the hyperplanes.
For quadratic logistic regression and quadratic normal-theory discriminant analysis, classification boundaries are quadratic hypersurfaces; increasing the prior for a class moves the boundaries for that class farther from the class mean, while decreasing the prior moves the boundaries closer to the class mean. But changing the priors does not change the shapes of the quadratic surfaces.
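Both behaviors follow from the form of the discriminant scores. In normal-theory discriminant analysis (a standard result, stated here for illustration), the score for class t is

\[
d_t(x) = -\tfrac{1}{2}(x - \mu_t)^{\top} \Sigma_t^{-1} (x - \mu_t) - \tfrac{1}{2}\log\lvert\Sigma_t\rvert + \log \pi_t
\]

where \(\mu_t\), \(\Sigma_t\), and \(\pi_t\) are the mean, covariance matrix, and prior probability of class t, and the boundary between classes t and s is the set of points where \(d_t(x) = d_s(x)\). Changing a prior changes only the additive constant \(\log \pi_t\), leaving the linear and quadratic coefficients of the boundary equation untouched; the boundary therefore shifts toward or away from the class mean without rotating (in the linear, equal-covariance case) or changing shape (in the quadratic case).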
To show the effect of changing prior probabilities, the data in the following figure were generated to have three classes, shown as red circles, blue crosses, and green triangles. Each class has 100 training cases with a bivariate normal distribution.
Figure: Training Data
These training data were used to fit a quadratic logistic regression model using the Neural Network engine. Since each class has the same number of training cases, the implicit prior probabilities are equal. In the following figure, the plot on the left shows color-coded posterior probabilities for each class. Bright red areas have a posterior probability near 1.0 for the red circle class, bright blue areas have a posterior probability near 1.0 for the blue cross class, and bright green areas have a posterior probability near 1.0 for the green triangle class. The plot on the right shows the classification results as red, blue, and green regions.
Figure: Equal Priors
If the prior probability for the red class is increased, the red areas in the plots expand in size as shown in the following figure. The red class has a small variance, so the effect is not widespread. Since the priors for the blue and green classes are still equal, the boundary between blue and green has not changed.
Figure: Adjusted Red Priors
If the prior probability for the blue class is increased, the blue areas in the plots expand in size as shown in the following figure. The blue class has a large variance and has a substantial density extending beyond the high-density red region, so increasing the blue prior causes the red areas to contract dramatically.
Figure: Adjusted Blue Priors
If the prior probability for the green class is increased, the green areas in the plots expand as shown in the following figure.
Figure: Adjusted Green Priors
In the literature on data mining, statistics, pattern recognition, and related fields, prior probabilities are used for a variety of purposes, which can be confusing. In Enterprise Miner, however, the nodes are designed to use prior probabilities in a simple, unambiguous way:
  • Prior probabilities are assumed to be estimates of the true proportions of the classes in the operational data to be scored.
  • Prior probabilities are not used by default for parameter estimation. This enables you to manipulate the class proportions in the training set by nonproportional sampling or by a frequency variable in any manner that you want.
  • If you specify prior probabilities, the posterior probabilities computed by the modeling nodes are always adjusted for the priors.
  • If you specify prior probabilities, the profit and loss summary statistics are always adjusted for priors and therefore provide valid model comparisons, assuming that you specify valid decision consequences. (See the following section on Decisions.)
If you do not explicitly specify prior probabilities (or if you specify None for prior probabilities in the target profile), no adjustments for priors are performed by any nodes.
Posterior probabilities are adjusted for priors as follows. Let:
  • t be an index for target values (classes)
  • i be an index for cases
  • OldPrior(t) be the old prior probability or implicit prior probability for target t
  • OldPost(i,t) be the posterior probability based on OldPrior(t)
  • Prior(t) be the new prior probability desired for target t
  • Post(i,t) be the posterior probability based on Prior(t)
Then:

\[
\mathrm{Post}(i,t) = \frac{\mathrm{OldPost}(i,t)\,\mathrm{Prior}(t)\,/\,\mathrm{OldPrior}(t)}{\sum_{j} \mathrm{OldPost}(i,j)\,\mathrm{Prior}(j)\,/\,\mathrm{OldPrior}(j)}
\]

where the sum in the denominator runs over all classes j.
For classification, each case i is assigned to the class with the greatest posterior probability, that is, the class t for which Post(i,t) is maximized.
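This adjustment and classification rule are straightforward to sketch in code. The following is a minimal sketch in Python using NumPy; the function name adjust_posteriors is hypothetical and is not part of Enterprise Miner or the DECIDE procedure:

```python
import numpy as np

def adjust_posteriors(old_post, old_prior, new_prior):
    """Recompute posterior probabilities for new class priors.

    old_post  : (n_cases, n_classes) posteriors based on old_prior
    old_prior : (n_classes,) priors implicit in the training set
    new_prior : (n_classes,) priors specified for the operational data
    """
    old_prior = np.asarray(old_prior, dtype=float)
    new_prior = np.asarray(new_prior, dtype=float)
    # Priors are normalized to sum to one (see the rules below); note that
    # a zero old prior with a positive new prior would divide by zero.
    old_prior = old_prior / old_prior.sum()
    new_prior = new_prior / new_prior.sum()
    # Reweight each class by Prior(t) / OldPrior(t), then renormalize
    # so the posteriors for each case again sum to one.
    weighted = np.asarray(old_post, dtype=float) * (new_prior / old_prior)
    return weighted / weighted.sum(axis=1, keepdims=True)

# Two cases, two classes; the training set had equal class sizes,
# but class 0 (the rare class) is only 1% of the operational data.
old_post = np.array([[0.5, 0.5],
                     [0.9, 0.1]])
post = adjust_posteriors(old_post, old_prior=[0.5, 0.5],
                         new_prior=[0.01, 0.99])
print(post.round(3))        # [[0.01  0.99 ]
                            #  [0.083 0.917]]
print(post.argmax(axis=1))  # [1 1]: both cases classified as class 1
```

The first case reproduces the rare-class example given earlier: an apparent even split becomes a posterior of 0.01 once the 1% operational prior is applied.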
Prior probabilities have no effect on estimating parameters in the Regression node, on learning weights in the Neural Network node, or, by default, on growing trees in the Tree node. Prior probabilities do affect classification and decision processing for each case. Hence, if you specify the appropriate options for each node, prior probabilities can affect the choice of models in the Regression node, early stopping in the Neural Network node, and pruning in the Tree node.
Prior probabilities are also used to adjust the relative contribution of each class when computing the total and average profit and loss as described in the section below on Decisions. The adjustment of total and average profit and loss is distinct from the adjustment of posterior probabilities. The latter is used to obtain correct posteriors for individual cases, whereas the former is used to obtain correct summary statistics for the sample. The adjustment of total and average profit and loss is done only if you explicitly specify prior probabilities; the adjustment is not done when the implicit priors based on the training set proportions are used.
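The exact computation is described in the Decisions section. As a rough sketch of the idea in Python: the weighting shown below, Prior(t) divided by the class's sample proportion, is one plausible reading of the description above rather than the documented formula, and prior_adjusted_average_profit is a hypothetical name:

```python
import numpy as np

def prior_adjusted_average_profit(profit, true_class, prior):
    """Average profit with each class contributing in proportion to its
    specified prior rather than its frequency in this data set.

    profit     : (n_cases,) profit of the decision chosen for each case
    true_class : (n_cases,) integer class index of each case
    prior      : (n_classes,) specified prior probabilities
    """
    profit = np.asarray(profit, dtype=float)
    true_class = np.asarray(true_class)
    prior = np.asarray(prior, dtype=float)
    prior = prior / prior.sum()
    # Proportion of each class actually present in this data set.
    sample_prop = np.bincount(true_class, minlength=prior.size) / true_class.size
    # Each case is weighted by Prior(t) / SampleProportion(t) for its class,
    # so the weighted average equals sum over t of Prior(t) times the
    # mean profit among cases of class t.
    weights = (prior / sample_prop)[true_class]
    return np.average(profit, weights=weights)
```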
Note that fit statistics such as the misclassification rate and mean squared error are not adjusted for prior probabilities. These fit statistics are intended to provide information about the training process, under the assumption that you have provided an appropriate training set with appropriate frequencies; adjusting them for prior probabilities could therefore present a misleading picture of the training results. The profit and loss summary statistics, by contrast, are intended for model selection and for assessing decisions made with the model, under the assumption that you have provided the appropriate prior probabilities and decision values; they therefore must be adjusted for prior probabilities when the class proportions in a data set are not representative of the operational data. For more details, see Decisions.
If you specify priors explicitly, Enterprise Miner assumes that the priors that you specify represent the true operational prior probabilities and adjusts the profit and loss summary statistics accordingly. Therefore:
  • If you are using profit and loss summary statistics, the class proportions in the validation and test sets need not be the same as in the operational data as long as your priors are correct for the operational data.
  • You can use training sets based on different sampling methods or with differently weighted classes (using a frequency variable), and as long as you use the same explicitly specified prior probabilities, the profit and loss summary statistics for the training, validation, and test sets will be comparable across all of those different training conditions.
  • If you fit two or more models with different specified priors, the profit and loss summary statistics will not be comparable and should not be used for model selection, since the different summary statistics apply to different operational data sets.
If you do not specify priors, Enterprise Miner assumes that the validation and test sets are representative of the operational data. Hence, the profit and loss summary statistics are not adjusted for the implicit priors based on the training set proportions. Therefore:
  • If the validation and test sets are indeed representative of the operational data, then regardless of whether you specify priors, you can use training sets based on different sampling methods or with differently weighted classes (using a frequency variable), and the profit and loss summary statistics for the validation and test sets will be comparable across all of those different training conditions.
  • If the validation and test sets are not representative of the operational data, then the validation statistics might not provide valid model comparisons, and the test-set statistics might not provide valid estimates of generalization accuracy.
If a class has both an old prior and a new prior of zero, it is omitted from the computations. If a class has a zero old prior, you cannot assign it a positive new prior, since that would cause a division by zero. Prior probabilities must not be missing or negative, and they must sum to a positive value. If the priors do not sum to one, they are automatically adjusted to do so by dividing each prior by the sum of the priors. A class can have a zero prior probability; however, if you use PROC DECIDE to update posterior probabilities, any case having a nonzero posterior corresponding to a zero prior will have its results set to missing values.
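These rules are simple to state in code. A minimal sketch in Python (validate_and_normalize_priors is a hypothetical helper, not the DECIDE procedure's implementation):

```python
import numpy as np

def validate_and_normalize_priors(prior, old_prior=None):
    """Apply the rules above: no missing or negative priors, a positive
    sum, automatic normalization to one, and no positive new prior for
    a class whose old prior is zero."""
    prior = np.asarray(prior, dtype=float)
    if np.isnan(prior).any() or (prior < 0).any():
        raise ValueError("priors must not be missing or negative")
    if prior.sum() <= 0:
        raise ValueError("priors must sum to a positive value")
    if old_prior is not None:
        old_prior = np.asarray(old_prior, dtype=float)
        if ((old_prior == 0) & (prior > 0)).any():
            raise ValueError("a class with a zero old prior cannot "
                             "receive a positive new prior")
    # Automatic adjustment: divide each prior by the sum of the priors.
    return prior / prior.sum()
```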
To summarize, prior probabilities do not affect:
  • Estimating parameters in the Regression node.
  • Learning weights in the Neural Network node.
  • Growing (as opposed to pruning) trees in the Decision Tree node unless you configure the property Use Prior Probability in Split Search.
  • Residuals, which are based on posteriors before adjustment for priors, except in the Decision Tree node if you choose to use prior probabilities in the split search.
  • Error functions such as deviance or likelihood, except in the Decision Tree node if you choose to use prior probabilities in the split search.
  • Fit statistics such as MSE based on residuals or error functions, except in the Decision Tree node if you choose to use prior probabilities in the split search.
Prior probabilities do affect:
  • Posterior probabilities
  • Classification
  • Decisions
  • Misclassification rate
  • Expected profit or loss
  • Profit and loss summary statistics, including the relative contribution of each class
By default, prior probabilities affect the following processes if and only if the decision matrix contains two or more decisions:
  • Choice of models in the Regression node
  • Early stopping in the Neural Network node
  • Pruning trees in the Tree node.