All of the Enterprise
Miner modeling nodes enable you to specify a frequency variable. Typically,
the values of the frequency variable are nonnegative integers. The
data are treated as if each case were replicated as many times as
the value of the frequency variable.
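The replication rule can be checked with a small sketch. It is written in Python for illustration only. The helper names `replicate` and `freq_mean` are hypothetical and are not part of Enterprise Miner. With integer frequencies, a statistic computed from the values and their frequencies matches the same statistic computed on the explicitly replicated data.

```python
def replicate(values, freqs):
    """Expand each value into as many copies as its integer frequency."""
    expanded = []
    for value, freq in zip(values, freqs):
        expanded.extend([value] * freq)
    return expanded


def freq_mean(values, freqs):
    """Mean computed directly from the values and their frequencies."""
    total = sum(freqs)
    return sum(v * f for v, f in zip(values, freqs)) / total


values = [2.0, 5.0, 9.0]
freqs = [3, 1, 2]
expanded = replicate(values, freqs)
print(expanded)  # [2.0, 2.0, 2.0, 5.0, 9.0, 9.0]
# Both means equal 29/6, about 4.833.
print(sum(expanded) / len(expanded), freq_mean(values, freqs))
```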

Unlike most SAS procedures,
the modeling nodes in Enterprise Miner accept non-integer values for a
frequency variable and do not truncate the fractional part.
Thus, you can use a frequency variable to perform weighted analyses.
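The following sketch shows why keeping the fractional part matters for a weighted analysis. It uses Python and a hypothetical `weighted_mean` helper and compares a weighted mean computed with fractional frequencies against the same mean after the frequencies are truncated to integers. It illustrates the arithmetic only and does not reproduce how any SAS procedure is implemented.

```python
import math


def weighted_mean(values, freqs):
    """Frequency-weighted mean; fractional frequencies are used as-is."""
    total = sum(freqs)
    return sum(v * f for v, f in zip(values, freqs)) / total


values = [10.0, 20.0, 30.0]
freqs = [0.5, 1.5, 2.0]

# Fractional part kept: (5 + 30 + 60) / 4.
print(weighted_mean(values, freqs))  # 23.75

# Fractional part truncated for comparison: frequencies become [0, 1, 2].
truncated = [math.trunc(f) for f in freqs]
print(weighted_mean(values, truncated))  # 80 / 3, about 26.67
```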

However, Enterprise
Miner does not provide explicit support for sampling weights, noise-variance
weights, or other analyses in which the weight variable does not represent
the frequency of occurrence of each case. If the frequency variable
represents sampling weights or noise-variance weights, the point estimates
of regression coefficients and neural network weights will be valid.
But because the data are treated as replicated cases, the sum of the
frequencies acts as the sample size. If the frequency variable does not
represent actual frequencies, then standard errors, significance tests,
and statistics such as MSE, AIC, and SBC might be invalid.
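A small weighted least-squares sketch illustrates the distinction between point estimates and standard errors. The Python function `weighted_linear_fit` is hypothetical and uses the textbook frequency-based formulas, treating the sum of the weights as the number of cases. It is not how Enterprise Miner computes its statistics. Multiplying every weight by 10 leaves the intercept and slope unchanged, but the standard error of the slope shrinks because the "sample size" has grown tenfold.

```python
import math


def weighted_linear_fit(x, y, w):
    """Fit y = a + b*x by weighted least squares, treating w as frequencies.

    Returns (a, b, se_b), where se_b is the frequency-based standard error
    of the slope with sum(w) taken as the number of cases.
    """
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    sigma2 = sse / (total - 2)  # two parameters estimated
    return a, b, math.sqrt(sigma2 / sxx)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
w = [1.0, 2.0, 1.0, 2.0, 1.0]
w_scaled = [10.0 * wi for wi in w]  # same relative weights, 10x the "cases"

a1, b1, se1 = weighted_linear_fit(x, y, w)
a2, b2, se2 = weighted_linear_fit(x, y, w_scaled)
print(a1, b1, se1)
print(a2, b2, se2)  # same a and b, smaller standard error
```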

If you want to do weighted
estimation under the usual weighted least squares assumption that
the weights are inversely proportional to the noise variance (error
variance) of the target variable, you can obtain statistically
correct results by rescaling the weights so that the frequency values
add up to the sample size (the number of cases).
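The rescaling step can be sketched as follows. The Python helper `normalize_to_sample_size` and the noise variances are hypothetical, and the sketch assumes all weights are positive. Each weight is multiplied by n divided by the total weight, so the relative weights are preserved and only their sum is fixed at the number of cases.

```python
def normalize_to_sample_size(weights):
    """Rescale positive weights so that they sum to the number of cases.

    Relative weights (and hence the weighted least-squares estimates)
    are unchanged; only their total is fixed at n.
    """
    if any(wt <= 0 for wt in weights):
        raise ValueError("weights must be positive")
    n = len(weights)
    total = sum(weights)
    return [n * wt / total for wt in weights]


# Hypothetical per-case noise (error) variances of the target.
noise_variance = [1.0, 4.0, 0.25, 1.0]
weights = [1.0 / v for v in noise_variance]  # inversely proportional to variance
freqs = normalize_to_sample_size(weights)
print(freqs)       # approximately [0.64, 0.16, 2.56, 0.64]
print(sum(freqs))  # approximately 4, the sample size
```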

If you want to use sampling
weights that are inversely proportional to the sampling probability
of each case, you can obtain approximate estimates of MSE and related
statistics in the Regression and Neural Network nodes by specifying
frequencies that add up to the effective sample size. A pessimistic
approximation to the effective sample size is provided by

$$ n_{\mathrm{eff}} = \frac{\left( \sum_i W(i) \right)^2}{\sum_i W(i)^2} $$

where W(i) is the sampling
weight for case i. This approximation does not work properly with
the Decision Tree node.
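The sketch below computes an effective sample size and rescales sampling weights to sum to it. It assumes the approximation is the common Kish-style formula (sum of W(i))² / (sum of W(i)²). This is an assumption, as are the hypothetical helper names and weights. Equal weights give back the number of cases, and unequal weights give a smaller value.

```python
def effective_sample_size(weights):
    """Assumed Kish-style approximation: (sum of W(i))**2 / sum of W(i)**2."""
    s1 = sum(weights)
    s2 = sum(wt * wt for wt in weights)
    return s1 * s1 / s2


def frequencies_for_sampling_weights(weights):
    """Rescale sampling weights so that they sum to the effective sample size."""
    n_eff = effective_sample_size(weights)
    total = sum(weights)
    return [n_eff * wt / total for wt in weights]


# Hypothetical sampling weights: 1 / (probability each case was sampled).
sampling_weights = [1.0, 1.0, 1.0, 5.0]
print(effective_sample_size(sampling_weights))  # 64 / 28, about 2.29 (< 4 cases)
freqs = frequencies_for_sampling_weights(sampling_weights)
print(sum(freqs))  # approximately the effective sample size
```

As a usage note, the resulting `freqs` values would be supplied as the frequency variable. Per the text above, this applies to the Regression and Neural Network nodes, not to the Decision Tree node.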