The AUTOREG Procedure

Heteroscedasticity- and Autocorrelation-Consistent Covariance Matrix Estimator

The heteroscedasticity-consistent covariance matrix estimator (HCCME), also known as the sandwich (or robust or empirical) covariance matrix estimator, has been popular in recent years because it provides a consistent estimate of the covariance matrix of the parameter estimates even when the form of heteroscedasticity is unknown or misspecified. White (1980) proposes the original HCCME, known as HC0. However, HC0 can perform poorly in small samples. Davidson and MacKinnon (1993) introduce the refinements HC1, HC2, and HC3, which apply a degrees-of-freedom or leverage adjustment to HC0. Cribari-Neto (2004) proposes HC4 for samples that contain points of high leverage.

HCCME can be expressed in the following general sandwich form:

\[  \Sigma =B^{-1} M B^{-1}  \]

where $B$ ("bread") is the Hessian matrix and $M$ ("meat") is the outer product of the gradients (OPG), with or without adjustment. For HC0, $M$ is the unadjusted OPG; that is,

\[  M_{\mr {HC0}}=\sum _{t=1}^{T}{g_ t g_ t'}  \]

where $T$ is the sample size and $g_ t$ is the gradient vector of the $t$th observation. For HC1, $M$ is the OPG with a degrees-of-freedom correction; that is,

\[  M_{\mr {HC1}}=\frac{T}{T-k}\sum _{t=1}^{T}{g_ t g_ t'}  \]

where $k$ is the number of parameters. For HC2, HC3, and HC4, the adjustment depends on the leverage $h_{tt}$:

\[  M_{\mr {HC2}}=\sum _{t=1}^{T}{\frac{g_ t g_ t'}{1-h_{tt}}} \\ M_{\mr {HC3}}=\sum _{t=1}^{T}{\frac{g_ t g_ t'}{(1-h_{tt})^2}} \\ M_{\mr {HC4}}=\sum _{t=1}^{T}{\frac{g_ t g_ t'}{(1-h_{tt})^{\min {(4,T h_{tt}/k)}}}}  \]
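As an illustration, the following Python sketch builds the meat matrix $M$ for HC0 through HC4 from a list of gradient vectors and a list of leverages, both supplied by the caller (the leverage $h_{tt}$ is defined in the next paragraph). The function name `hc_meat` and the list-of-lists matrix representation are assumptions made for this example; it is not the AUTOREG implementation.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def hc_meat(grads: Sequence[Sequence[float]], lev: Sequence[float], kind: str) -> Matrix:
    """Meat matrix M for HC0-HC4 (illustrative sketch).

    grads[t] is the gradient vector g_t (length k); lev[t] is the leverage h_tt.
    """
    T, k = len(grads), len(grads[0])
    meat = [[0.0] * k for _ in range(k)]
    for g, h in zip(grads, lev):
        if kind in ("HC0", "HC1"):
            w = 1.0  # no per-observation adjustment
        elif kind == "HC2":
            w = 1.0 / (1.0 - h)
        elif kind == "HC3":
            w = 1.0 / (1.0 - h) ** 2
        elif kind == "HC4":
            w = 1.0 / (1.0 - h) ** min(4.0, T * h / k)  # exponent min(4, T h_tt / k)
        else:
            raise ValueError(f"unknown estimator: {kind}")
        for r in range(k):  # accumulate w * g_t g_t'
            for c in range(k):
                meat[r][c] += w * g[r] * g[c]
    if kind == "HC1":
        factor = T / (T - k)  # degrees-of-freedom correction
        meat = [[factor * v for v in row] for row in meat]
    return meat
```

HC1 scales the whole sum once, while HC2 through HC4 reweight each observation individually, which is why the correction for HC1 is applied after the loop.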

The leverage $h_{tt}$ is defined as $h_{tt}\equiv j_ t'(\sum _{s=1}^{T}{j_ s j_ s'})^{-1} j_ t$, where $j_ t$ is defined as follows:

  • For an OLS model, $j_ t$ is the column vector of regressors for the $t$th observation.

  • For an AR error model, $j_ t$ is the derivative vector of the $t$th residual with respect to the parameters.

  • For a GARCH or heteroscedasticity model, $j_ t$ is the gradient of the $t$th observation (that is, $g_ t$).
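For the OLS case, where $j_ t$ is the regressor vector, the leverage and the full sandwich $B^{-1} M B^{-1}$ can be sketched in plain Python as follows. This assumes $B=X'X$ and $g_ t = x_ t e_ t$, with $e_ t$ the OLS residual; any common scale factor carried by a likelihood-based Hessian and gradient cancels in the sandwich. The helper names (`inverse`, `matmul`, `cross_product`, `ols_leverage`, `ols_sandwich`) are hypothetical, and the `weight` argument selects the per-observation HC adjustment.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def inverse(a: Matrix) -> Matrix:
    """Invert a small nonsingular matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cross_product(X: Sequence[Sequence[float]]) -> Matrix:
    """sum_t x_t x_t' (that is, X'X)."""
    k = len(X[0])
    return [[sum(x[r] * x[c] for x in X) for c in range(k)] for r in range(k)]


def ols_leverage(X: Sequence[Sequence[float]]) -> List[float]:
    """h_tt = j_t' (sum_s j_s j_s')^{-1} j_t with j_t = x_t (the OLS case)."""
    k = len(X[0])
    inv = inverse(cross_product(X))
    return [sum(x[r] * inv[r][c] * x[c] for r in range(k) for c in range(k)) for x in X]


def ols_sandwich(X: Sequence[Sequence[float]], y: Sequence[float],
                 weight: Callable[[float], float] = lambda h: 1.0) -> Matrix:
    """B^{-1} M B^{-1} with B = X'X and g_t = x_t * e_t (e_t = OLS residual).

    weight(h_tt) scales each outer product: 1.0 gives HC0, 1/(1-h) gives HC2.
    """
    k = len(X[0])
    b_inv = inverse(cross_product(X))
    xty = [[sum(x[r] * yt for x, yt in zip(X, y))] for r in range(k)]
    beta = [row[0] for row in matmul(b_inv, xty)]
    resid = [yt - sum(b * v for b, v in zip(beta, x)) for x, yt in zip(X, y)]
    meat = [[0.0] * k for _ in range(k)]
    for x, e, h in zip(X, resid, ols_leverage(X)):
        g = [v * e for v in x]  # gradient of observation t, up to a common scale
        w = weight(h)
        for r in range(k):
            for c in range(k):
                meat[r][c] += w * g[r] * g[c]
    return matmul(matmul(b_inv, meat), b_inv)
```

For an intercept-only model every leverage equals $1/T$, so the HC0 result reduces to $\sum _ t e_ t^2/T^2$; the leverages of any full-rank design sum to $k$.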

The heteroscedasticity- and autocorrelation-consistent (HAC) covariance matrix estimator can also be expressed in sandwich form:

\[  \Sigma =B^{-1} M B^{-1}  \]

where $B$ is still the Hessian matrix, but $M$ is the kernel estimator in the following form:

\[  M_{\mr {HAC}}=a\left(\sum _{t=1}^{T}{g_ t g_ t'}+\sum _{j=1}^{T-1}{k(\frac{j}{b})\sum _{t=1}^{T-j}{\left(g_ t g_{t+j}' + g_{t+j} g_{t}'\right)}}\right)  \]

where $T$ is the sample size, $g_ t$ is the gradient vector of the $t$th observation, $k(\cdot )$ is a real-valued kernel function, $b$ is the bandwidth parameter, and $a$ is the small-sample degrees-of-freedom adjustment factor (that is, $a=1$ if the ADJUSTDF option is not specified; otherwise $a=T/(T-k)$, where $k$ is the number of parameters). The available kernel functions are listed in Table 8.2.
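A minimal Python sketch of $M_{\mr {HAC}}$ follows, assuming the gradients are given as a list of length-$k$ lists and the kernel is any function of one argument. The Bartlett kernel from Table 8.2 is included so that the block runs on its own; the names `hac_meat` and `bartlett` are illustrative only.

```python
from typing import Callable, List, Sequence


def bartlett(x: float) -> float:
    """Bartlett kernel: 1 - |x| for |x| <= 1, else 0."""
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0


def hac_meat(grads: Sequence[Sequence[float]], kernel: Callable[[float], float],
             b: float, adjustdf: bool = False) -> List[List[float]]:
    """Kernel estimator M_HAC built from the per-observation gradients g_t."""
    T, k = len(grads), len(grads[0])
    # lag-0 term: sum_t g_t g_t'
    meat = [[sum(g[r] * g[c] for g in grads) for c in range(k)] for r in range(k)]
    for j in range(1, T):
        w = kernel(j / b)
        if w == 0.0:
            continue  # this lag receives zero weight
        for t in range(T - j):
            g0, gj = grads[t], grads[t + j]
            for r in range(k):
                for c in range(k):
                    # g_t g_{t+j}' + g_{t+j} g_t'
                    meat[r][c] += w * (g0[r] * gj[c] + gj[r] * g0[c])
    a = T / (T - k) if adjustdf else 1.0  # ADJUSTDF small-sample factor
    return [[a * v for v in row] for row in meat]
```

Lags with zero kernel weight are skipped, which saves work for kernels with bounded support when $b$ is small relative to $T$.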

Table 8.2: Kernel Functions

  • Bartlett: $k(x)=\left\{  \begin{array}{ll} 1-|x| &  |x|\leq 1 \\ 0 &  \text {otherwise} \end{array} \right.$

  • Parzen: $k(x)=\left\{  \begin{array}{ll} 1-6x^2+6|x|^3 &  0\leq |x| \leq 1/2 \\ 2(1-|x|)^3 &  1/2 \leq |x| \leq 1 \\ 0 &  \text {otherwise} \end{array} \right.$

  • Quadratic spectral: $k(x)=\frac{25}{12\pi ^2x^2} \left( \frac{\sin {(6\pi x/5)}}{6\pi x/5} - \cos {(6\pi x/5)} \right)$

  • Truncated: $k(x)=\left\{  \begin{array}{ll} 1 &  |x|\leq 1 \\ 0 &  \text {otherwise} \end{array} \right.$

  • Tukey-Hanning: $k(x)=\left\{  \begin{array}{ll} \left(1+\cos {(\pi x)}\right)/2 &  |x|\leq 1 \\ 0 &  \text {otherwise} \end{array} \right.$
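The kernels in Table 8.2 translate directly into Python; the sketch below is an illustration, not the AUTOREG code, and the function names are hypothetical. The only subtlety is the quadratic spectral kernel, whose formula is an indeterminate 0/0 form at $x=0$; the sketch returns its limiting value 1 there.

```python
import math


def bartlett(x: float) -> float:
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0


def parzen(x: float) -> float:
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax**2 + 6.0 * ax**3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0


def quadratic_spectral(x: float) -> float:
    if x == 0.0:
        return 1.0  # limiting value of the formula as x -> 0
    z = 6.0 * math.pi * x / 5.0
    return 25.0 / (12.0 * math.pi**2 * x**2) * (math.sin(z) / z - math.cos(z))


def truncated(x: float) -> float:
    return 1.0 if abs(x) <= 1.0 else 0.0


def tukey_hanning(x: float) -> float:
    return (1.0 + math.cos(math.pi * x)) / 2.0 if abs(x) <= 1.0 else 0.0
```

Unlike the other four, the quadratic spectral kernel does not vanish for $|x|>1$, so every lag contributes to $M_{\mr {HAC}}$ when it is used.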


When you specify BANDWIDTH=ANDREWS91, the bandwidth parameter is estimated according to Andrews (1991), as shown in Table 8.3.

Table 8.3: Bandwidth Parameter Estimation

  • Bartlett: $b = 1.1447(\alpha (1)T)^{1/3}$

  • Parzen: $b = 2.6614(\alpha (2)T)^{1/5}$

  • Quadratic spectral: $b = 1.3221(\alpha (2)T)^{1/5}$

  • Truncated: $b = 0.6611(\alpha (2)T)^{1/5}$

  • Tukey-Hanning: $b = 1.7462(\alpha (2)T)^{1/5}$


Let $\{ g_{at}\} $, $a=1,\ldots ,k$, denote the $a$th component series of $\{ g_ t\} $, and let $(\rho _ a,\sigma _ a^2)$ denote the estimates of the autoregressive and innovation variance parameters of an AR(1) model fitted to $\{ g_{at}\} $, where the AR(1) model is parameterized as $g_{at}=\rho _ a g_{a,t-1} + \epsilon _{at}$ with $\mr {Var}(\epsilon _{at})=\sigma _ a^2$. The factors $\alpha (1)$ and $\alpha (2)$ are estimated with the following formulas:

\[  \alpha (1) = \frac{\sum _{a=1}^ k{\frac{4\rho _ a^2\sigma _ a^4}{(1-\rho _ a)^6(1+\rho _ a)^2}}}{\sum _{a=1}^ k{\frac{\sigma _ a^4}{(1-\rho _ a)^4}}} \\ \alpha (2) = \frac{\sum _{a=1}^ k{\frac{4\rho _ a^2\sigma _ a^4}{(1-\rho _ a)^8}}}{\sum _{a=1}^ k{\frac{\sigma _ a^4}{(1-\rho _ a)^4}}}  \]
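The Andrews (1991) bandwidth can be sketched in Python as below. The sketch assumes that each AR(1) model is fitted by least squares without an intercept (matching the parameterization above) and that $\sigma _ a^2$ is the mean squared residual; the procedure's exact estimation details are not stated here. The kernel keys and function names are hypothetical labels.

```python
from typing import Dict, Sequence, Tuple

# (constant, which alpha, exponent) for each kernel, from Table 8.3
ANDREWS91: Dict[str, Tuple[float, int, float]] = {
    "BARTLETT": (1.1447, 1, 1 / 3),
    "PARZEN": (2.6614, 2, 1 / 5),
    "QS": (1.3221, 2, 1 / 5),
    "TRUNCATED": (0.6611, 2, 1 / 5),
    "TUKEYHANNING": (1.7462, 2, 1 / 5),
}


def ar1_fit(series: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of g_t = rho * g_{t-1} + e_t (no intercept) -> (rho, sigma^2)."""
    lagged, current = series[:-1], series[1:]
    rho = sum(x * y for x, y in zip(lagged, current)) / sum(x * x for x in lagged)
    resid = [y - rho * x for x, y in zip(lagged, current)]
    return rho, sum(e * e for e in resid) / len(resid)


def andrews_alphas(params: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """alpha(1) and alpha(2) from the per-series (rho_a, sigma_a^2) estimates."""
    num1 = num2 = den = 0.0
    for rho, s2 in params:
        s4 = s2 * s2
        num1 += 4 * rho**2 * s4 / ((1 - rho) ** 6 * (1 + rho) ** 2)
        num2 += 4 * rho**2 * s4 / (1 - rho) ** 8
        den += s4 / (1 - rho) ** 4
    return num1 / den, num2 / den


def andrews_bandwidth(kernel: str, grads: Sequence[Sequence[float]]) -> float:
    """Bandwidth b from Table 8.3, using one AR(1) fit per gradient component."""
    T = len(grads)
    params = [ar1_fit([g[a] for g in grads]) for a in range(len(grads[0]))]
    const, which, power = ANDREWS91[kernel]
    alpha = andrews_alphas(params)[which - 1]
    return const * (alpha * T) ** power
```

With a single series, $\alpha (1)$ simplifies to $4\rho ^2/(1-\rho ^2)^2$ and $\alpha (2)$ to $4\rho ^2/(1-\rho )^4$, which makes a quick sanity check.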

When you specify BANDWIDTH=NEWEYWEST94, the bandwidth parameter is estimated according to Newey and West (1994), as shown in Table 8.4.

Table 8.4: Bandwidth Parameter Estimation

  • Bartlett: $b = 1.1447(\{ s_1/s_0\} ^2T)^{1/3}$

  • Parzen: $b = 2.6614(\{ s_1/s_0\} ^2T)^{1/5}$

  • Quadratic spectral: $b = 1.3221(\{ s_1/s_0\} ^2T)^{1/5}$

  • Truncated: $b = 0.6611(\{ s_1/s_0\} ^2T)^{1/5}$

  • Tukey-Hanning: $b = 1.7462(\{ s_1/s_0\} ^2T)^{1/5}$


The factors $s_1$ and $s_0$ are estimated with the following formulas:

\[  s_1 = 2\sum _{j=1}^ n{j\sigma _ j} \\ s_0 = \sigma _0+2\sum _{j=1}^ n{\sigma _ j}  \]

where $n$ is the lag selection parameter, which depends on the kernel as listed in Table 8.5.

Table 8.5: Lag Selection Parameter Estimation

  • Bartlett: $n = c(T/100)^{2/9}$

  • Parzen: $n = c(T/100)^{4/25}$

  • Quadratic spectral: $n = c(T/100)^{2/25}$

  • Truncated: $n = c(T/100)^{1/5}$

  • Tukey-Hanning: $n = c(T/100)^{1/5}$


The factor $c$ in Table 8.5 is specified by the C= option; by default it is 12.
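A short Python sketch of the lag selection parameter in Table 8.5 follows. Truncating $n$ to an integer is an assumption of this example (the formulas above do not say how a fractional $n$ is handled); the kernel keys and the function name are hypothetical.

```python
import math

# Exponent of T/100 for each kernel, from Table 8.5 (keys are hypothetical labels)
LAG_RATE = {"BARTLETT": 2 / 9, "PARZEN": 4 / 25, "QS": 2 / 25,
            "TRUNCATED": 1 / 5, "TUKEYHANNING": 1 / 5}


def lag_selection(kernel: str, T: int, c: float = 12.0) -> int:
    """n = c * (T/100)**rate, truncated to an integer (an assumption of this sketch)."""
    return math.floor(c * (T / 100) ** LAG_RATE[kernel])
```

At $T=100$ every kernel gives $n=c$, so the default C=12 yields 12 lags; the kernel only matters through how fast $n$ grows with $T$.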

The factor $\sigma _ j$ is estimated with the equation

\[  \sigma _ j = T^{-1}\sum _{t=j+1}^{T}{\left(\sum _{a=i}^ k{g_{at}}\sum _{a=i}^ k{g_{a,t-j}}\right)}, j=0, ..., n  \]

where $i$ is 1 if the NOINT option in the MODEL statement is specified (otherwise, it is 2), and $g_{at}$ is the same as in the Andrews method.
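Combining Table 8.4, Table 8.5, and the $\sigma _ j$ formula gives the following Python sketch of the Newey and West (1994) bandwidth. It assumes that the first gradient component belongs to the intercept when NOINT is not specified; truncating $n$ to an integer and capping it at $T-1$ are also assumptions. All names are hypothetical.

```python
import math
from typing import Sequence

# (constant, exponent) for each kernel, from Table 8.4 (keys are hypothetical labels)
NW94 = {"BARTLETT": (1.1447, 1 / 3), "PARZEN": (2.6614, 1 / 5), "QS": (1.3221, 1 / 5),
        "TRUNCATED": (0.6611, 1 / 5), "TUKEYHANNING": (1.7462, 1 / 5)}
# exponent of T/100 in the lag selection parameter n, from Table 8.5
LAG_RATE = {"BARTLETT": 2 / 9, "PARZEN": 4 / 25, "QS": 2 / 25,
            "TRUNCATED": 1 / 5, "TUKEYHANNING": 1 / 5}


def newey_west_bandwidth(grads: Sequence[Sequence[float]], kernel: str,
                         c: float = 12.0, noint: bool = False) -> float:
    T = len(grads)
    # lag selection parameter n (integer truncation and the T-1 cap are assumptions)
    n = min(math.floor(c * (T / 100) ** LAG_RATE[kernel]), T - 1)
    i = 1 if noint else 2                # first component included in the sums
    z = [sum(g[i - 1:]) for g in grads]  # sum_{a=i}^{k} g_at for each t
    # sigma_j = T^{-1} sum_{t=j+1}^{T} z_t z_{t-j}, j = 0..n (0-based indexing here)
    sigma = [sum(z[t] * z[t - j] for t in range(j, T)) / T for j in range(n + 1)]
    s1 = 2 * sum(j * sigma[j] for j in range(1, n + 1))
    s0 = sigma[0] + 2 * sum(sigma[1:])
    const, power = NW94[kernel]
    return const * ((s1 / s0) ** 2 * T) ** power
```

Because the intercept component is excluded by default, changing that column of the gradients leaves the bandwidth unchanged unless `noint=True`.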

If you specify BANDWIDTH=SAMPLESIZE, the bandwidth parameter is estimated with the equation

\[  b = \left\{  \begin{array}{ l l } \left\lfloor {\gamma T^{r} + c} \right\rfloor &  \text {if BANDWIDTH=SAMPLESIZE(INT) option is specified} \\ \gamma T^{r} + c &  \text {otherwise} \end{array} \right.  \]

where $T$ is the sample size; $\left\lfloor {x} \right\rfloor $ is the largest integer less than or equal to $x$; and $\gamma $, $r$, and $c$ are values specified by the BANDWIDTH=SAMPLESIZE(GAMMA=, RATE=, CONSTANT=) options, respectively.
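The SAMPLESIZE rule is a one-line computation. The Python sketch below mirrors the equation, with `gamma`, `rate`, and `constant` standing in for the GAMMA=, RATE=, and CONSTANT= values and `integer=True` standing in for the INT suboption; the function name is hypothetical and no default values are assumed.

```python
import math


def samplesize_bandwidth(T: int, gamma: float, rate: float, constant: float,
                         integer: bool = False) -> float:
    """b = gamma * T**rate + constant, floored when integer=True (the INT suboption)."""
    b = gamma * T ** rate + constant
    return math.floor(b) if integer else b
```

For example, with $T=100$, $\gamma =1$, $r=0.5$, and $c=0.5$, the bandwidth is 10.5, or 10 with the INT suboption.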

If you specify the PREWHITENING option, $g_ t$ is prewhitened by fitting the VAR(1) model

\[  g_ t = A g_{t-1} + w_ t  \]

Then $M$ is calculated as

\[  M_{\mr {HAC}}=a(I-A)^{-1}\left(\sum _{t=1}^{T}{w_ t w_ t'}+\sum _{j=1}^{T-1}{k(\frac{j}{b})\sum _{t=1}^{T-j}{\left(w_ t w_{t+j}' + w_{t+j} w_{t}'\right)}}\right)\left((I-A)^{-1}\right)'  \]

The bandwidth calculation is also based on the prewhitened series $w_ t$.
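For a single gradient series ($k=1$), the prewhitening step can be sketched in Python as follows: fit $g_ t = A g_{t-1} + w_ t$ by least squares, apply the kernel estimator to the residuals $w_ t$, and recolor by dividing by $(1-A)^2$, because both $(I-A)^{-1}$ factors reduce to $1/(1-A)$ in the scalar case. The least-squares fit without an intercept, the use of the $T-1$ available residuals, passing the bandwidth `b` in directly rather than estimating it from $w_ t$, and the function names are all assumptions of this sketch; the multivariate case needs matrix versions of each step.

```python
from typing import Callable, Sequence


def truncated(x: float) -> float:
    """Truncated kernel: 1 for |x| <= 1, else 0."""
    return 1.0 if abs(x) <= 1.0 else 0.0


def prewhitened_hac_1d(g: Sequence[float], kernel: Callable[[float], float], b: float,
                       adjustdf: bool = False, k: int = 1) -> float:
    """Scalar prewhitened HAC: fit g_t = A g_{t-1} + w_t, then recolor by 1/(1-A)^2."""
    T = len(g)
    lagged, current = g[:-1], g[1:]
    A = sum(x * y for x, y in zip(lagged, current)) / sum(x * x for x in lagged)
    w = [y - A * x for x, y in zip(lagged, current)]  # prewhitened residuals w_t
    s = sum(v * v for v in w)                           # lag-0 term
    for j in range(1, len(w)):
        # kernel-weighted w_t w_{t+j} + w_{t+j} w_t, which is 2 w_t w_{t+j} for scalars
        s += kernel(j / b) * sum(2 * w[t] * w[t + j] for t in range(len(w) - j))
    a = T / (T - k) if adjustdf else 1.0                # ADJUSTDF factor
    return a * s / (1.0 - A) ** 2
```

The recoloring step is where prewhitening pays off: a strongly persistent series ($A$ near 1) has a small residual sum but a large factor $1/(1-A)^2$.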