The ENTROPY Procedure (Experimental)

Information Measures

PROC ENTROPY returns several measures of fit. The first is the value of the objective function, followed by the signal entropy and the noise entropy. The sum of the noise and signal entropies should equal the value of the objective function. The next two measures are the normalized entropies of the signal and the noise.

Normalized entropy (NE) measures the relative informational content of both the signal and noise components through p and w, respectively (Golan, Judge, and Miller, 1996). Let S denote the normalized entropy of the signal, $ X \!  \beta $, defined as:

\[  S(\tilde{p}) = \frac{-\tilde{p}' \,  \ln (\tilde{p})}{-q' \,  \ln (q)}  \]

where $ S(\tilde{p}) \in [0,1] $ and $q$ is the vector of prior probabilities. In the case of GME, where uniform priors are assumed, S can be written as:

\[  S(\tilde{p}) = \frac{-\tilde{p}' \,  \ln (\tilde{p})}{\sum _ i \ln (M_ i)}  \]

where $M_ i$ is the number of support points for parameter $i$. A value of 0 for S implies that there is no uncertainty regarding the parameters; hence, it is a degenerate situation. However, a value of 1 implies that the posterior distributions equal the priors, which indicates total uncertainty if the priors are uniform.
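For example, in the GME case with a single parameter supported on $M = 5$ points, uniform estimated probabilities $\tilde{p} = (1/5, \ldots , 1/5)$ give

\[  S(\tilde{p}) = \frac{-\sum _{m=1}^{5} \frac{1}{5} \,  \ln (\frac{1}{5})}{\ln (5)} = \frac{\ln (5)}{\ln (5)} = 1  \]

whereas probabilities concentrated on a single support point drive the numerator, and hence $S(\tilde{p})$, toward 0.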

Because NE is relative, it can be used to compare different situations. Consider adding a data point to the model, and let $S_{T}$ denote the normalized entropy computed from $T$ observations. If $ S_{T+1} = S_{T}$, then there is no additional information contained in that data constraint. However, if $ S_{T+1} < S_{T} $, then the data point yields a more informed set of parameter estimates.

NE can also be used to determine the importance of particular variables by measuring how much each reduces the uncertainty in the model. Each of the $k$ parameters that are estimated has an associated NE, defined as

\[  S(\tilde{p}_{k}) = \frac{-\tilde{p}_{k}' \,  \ln (\tilde{p}_{k})}{-q_{k}' \,  \ln (q_{k})}  \]

or, in the GME case,

\[  S(\tilde{p}_{k}) = \frac{-\tilde{p}_{k}' \,  \ln (\tilde{p}_{k})}{\ln (M)}  \]

where $\tilde{p}_{k}$ is the vector of estimated support probabilities for parameter $\beta _{k}$ and $M$ is the corresponding number of support points. Since a value of 1 implies no relative information for that particular sample, Golan, Judge, and Miller (1996) suggest an exclusion criterion of $S(\tilde{p}_{k}) > 0.99 $ as an acceptable means of selecting noninformative variables. See Golan, Judge, and Miller (1996) for some simulation results.
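As a minimal sketch of this screening rule (a Python illustration of the formula above, not PROC ENTROPY itself; the parameter names and probability values are hypothetical), the per-parameter NE under uniform priors can be computed and compared against the 0.99 cutoff as follows:

    import numpy as np

    def normalized_entropy(p):
        # S for one parameter under uniform priors (GME case):
        # -sum(p * ln(p)) / ln(M), where M is the number of support points.
        p = np.asarray(p, dtype=float)
        m = p.size                 # number of support points M
        p = p[p > 0]               # treat 0 * ln(0) as 0
        return float(-np.sum(p * np.log(p)) / np.log(m))

    # Hypothetical estimated support probabilities for three parameters (M = 5)
    p_tilde = {
        "x1": [0.02, 0.05, 0.86, 0.05, 0.02],  # mass concentrated: informative
        "x2": [0.20, 0.20, 0.20, 0.20, 0.20],  # still uniform: uninformative
        "x3": [0.18, 0.21, 0.22, 0.20, 0.19],  # nearly uniform
    }

    for name, p in p_tilde.items():
        s = normalized_entropy(p)
        flag = "candidate for exclusion" if s > 0.99 else "informative"
        print(f"{name}: S = {s:.4f} ({flag})")

Here x2 ($S = 1$) and x3 ($S \approx 0.998$) exceed the cutoff and would be flagged as noninformative, while x1 ($S \approx 0.36$) concentrates its probability mass and is retained.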

The final set of measures of fit consists of the parameter information index and the error information index. Each can be summarized as 1 minus the appropriate normalized entropy.
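In symbols, writing $\tilde{w}$ for the estimated noise-component probabilities as above (the index names here are for exposition only):

\[  I_{\mathrm{parameter}} = 1 - S(\tilde{p}), \qquad I_{\mathrm{error}} = 1 - S(\tilde{w})  \]

so a value near 1 indicates a highly informative component, and a value near 0 indicates a component whose posterior probabilities remain close to the priors.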