The AUTOREG Procedure

GARCH Models

Consider the series ${y_{t}}$, which follows the GARCH process. The conditional distribution of the series Y for time t is written

\[  y_{t}| {\Psi }_{t-1} {\sim } \mr{N}(0, h_{t})  \]

where ${\Psi }_{t-1}$ denotes all available information at time ${t-1}$. The conditional variance ${ h_{t}}$ is

\[  h_{t} = {\omega } + \sum _{i=1}^{q}{{\alpha }_{i} y_{t-i}^{2}} + \sum _{j=1}^{p}{{\gamma }_{j} h_{t-j}}  \]


\[  p \ge 0, q>0  \]
\[  {\omega }>0, {\alpha }_{i} \ge 0, {\gamma }_{j} \ge 0  \]

The GARCH$(p,q)$ model reduces to the ARCH$(q)$ process when ${p=0}$. At least one of the ARCH parameters must be nonzero (${q > 0}$). The GARCH regression model can be written

\[  y_{t} = \mb{x} _{t}’ {\beta } + {\epsilon }_{t}  \]
\[  {\epsilon }_{t} = \sqrt { h_{t}} e_{t}  \]
\[  h_{t} = {\omega } + \sum _{i=1}^{q}{{\alpha }_{i} {\epsilon }_{t-i}^{2}} + \sum _{j=1}^{p}{{\gamma }_{j} h_{t-j}}  \]

where ${ e_{t} {\sim } \mr{IN}(0, 1)}$.

In addition, you can consider the model with disturbances following an autoregressive process and with the GARCH errors. The AR$(m)$-GARCH$(p,q)$ regression model is denoted

\[  y_{t} = \mb{x} _{t}’ {\beta } + {\nu }_{t}  \]
\[  {\nu }_{t} = {\epsilon }_{t} - {\varphi }_{1} {\nu }_{t-1} -{\ldots }- {\varphi }_{m} {\nu }_{t-m}  \]
\[  {\epsilon }_{t} = \sqrt { h_{t}} e_{t}  \]
\[  h_{t} = {\omega } + \sum _{i=1}^{q}{{\alpha }_{i} {\epsilon }_{t-i}^{2}} + \sum _{j=1}^{p}{{\gamma }_{j} h_{t-j}}  \]

GARCH Estimation with Nelson-Cao Inequality Constraints

The GARCH$(p,q)$ model is written in ARCH(${\infty }$) form as

\begin{eqnarray*}  h_{t} & =&  \left( 1-\sum _{j=1}^ p {\gamma }_{j}{B} ^{j} \right)^{-1} \left[{{\omega } + \sum _{i=1}^{q}{{\alpha }_{i} {\epsilon }_{t-i}^{2}}}\right] \nonumber \\ & =&  {\omega }^{*} + \sum _{i=1}^{{\infty }}{{\phi }_{i} {\epsilon }_{t-i}^{2}} \end{eqnarray*}

where B is a backshift operator. Therefore, ${h_{t} \ge 0}$ if ${ {\omega }^{*} \ge 0}$ and ${{\phi }_{i} \ge 0,\   {\forall } i}$. Assume that the roots of the following polynomial equation are inside the unit circle:

\[  \sum _{j=0}^{p}{-{\gamma }_{j}\mi{Z} ^{p-j}}  \]

where ${{\gamma }_{0}=-1}$ and Z is a complex scalar. $-\sum _{j=0}^{p}{{\gamma }_{j}\mi{Z} ^{p-j}}$ and ${\sum _{i=1}^{q}{{\alpha }_{i}\mi{Z} ^{q-i}}}$ do not share common factors. Under these conditions, ${{| {\omega }^{*}|}<{\infty }}$, ${{|{\phi }_{i}|} < {\infty }}$, and these coefficients of the ARCH(${\infty }$) process are well defined.

Define ${n={\max }(p,q)}$. The coefficient ${{\phi }_{i}}$ is written

\begin{eqnarray*}  {\phi }_{0} & =&  {\alpha }_{1} \\ {\phi }_{1} & =&  {\gamma }_{1}{\phi }_{0}+{\alpha }_{2} \\ {\cdots } \\ {\phi }_{n-1} & =&  {\gamma }_{1}{\phi }_{n-2}+ {\gamma }_{2}{\phi }_{n-3}+{\cdots }+{\gamma }_{n-1}{\phi }_{0} +{\alpha }_{n} \\ {\phi }_{k} & =&  {\gamma }_{1}{\phi }_{k-1}+ {\gamma }_{2}{\phi }_{k-2}+{\cdots }+{\gamma }_{n}{\phi }_{k-n} \text { for }k \ge n \nonumber \end{eqnarray*}

where ${{\alpha }_{i}=0}$ for ${i>q}$ and ${{\gamma }_{j}=0}$ for ${j>p}$.

Nelson and Cao (1992) proposed the finite inequality constraints for GARCH$(1,q)$ and GARCH$(2,q)$ cases. However, it is not straightforward to derive the finite inequality constraints for the general GARCH$(p,q)$ model.

For the GARCH$(1,q)$ model, the nonlinear inequality constraints are

\begin{eqnarray*}  {\omega } &  \ge &  0 \\ {\gamma }_{1} & \ge &  0 \\ {\phi }_{k} & \ge &  0\mr{\  for\  }k=0,1,{\cdots },q-1 \nonumber \end{eqnarray*}

For the GARCH$(2,q)$ model, the nonlinear inequality constraints are

\begin{eqnarray*}  {\Delta }_{i} & {\in }&  \emph{R} \text { for }i=1,2 \\ {\omega }^{*} & \ge &  0 \\ {\Delta }_{1} & >&  0 \\ \sum _{j=0}^{q-1}{{\Delta }^{-j}_{1} {\alpha }_{j+1}} & >&  0 \\ {\phi }_{k} & \ge &  0\text { for }k=0,1,{\cdots },q \nonumber \end{eqnarray*}

where ${{\Delta }_{1}}$ and ${{\Delta }_{2}}$ are the roots of ${(\mi{Z} ^{2}-{\gamma }_{1}\mi{Z} -{\gamma }_{2})}$.

For the GARCH$(p,q)$ model with ${p > 2}$, only $\max (q-1,p)+1$ nonlinear inequality constraints (${{\phi }_{k} \ge 0}$ for ${k=0}$ to max(${q-1,p}$)) are imposed, together with the in-sample positivity constraints of the conditional variance ${h_{t}}$.

Using the HETERO Statement with GARCH Models

The HETERO statement can be combined with the GARCH= option in the MODEL statement to include input variables in the GARCH conditional variance model. For example, the GARCH$(1,1)$ variance model with two dummy input variables D1 and D2 is

\begin{eqnarray*}  \epsilon _ t & =&  \sqrt {h_ t}e_ t \\ h_ t & =&  \omega + \alpha _1\epsilon ^2_{t-1} + \gamma _1 h_{t-1} + \eta _1 D1_ t + \eta _2 D2_ t \nonumber \end{eqnarray*}

The following statements estimate this GARCH model:

proc autoreg data=one;
   model y = x z / garch=(p=1,q=1);
   hetero d1 d2;

The parameters for the variables D1 and D2 can be constrained using the COEF= option. For example, the constraints $\eta _1 = \eta _2 = 1$ are imposed by the following statements:

proc autoreg data=one;
   model y = x z / garch=(p=1,q=1);
   hetero d1 d2 / coef=unit;

IGARCH and Stationary GARCH Model

The condition ${\sum _{i=1}^{q}{{\alpha }_{i}}+\sum _{j=1}^{p}{{\gamma }_{j}} < 1}$ implies that the GARCH process is weakly stationary since the mean, variance, and autocovariance are finite and constant over time. When the GARCH process is stationary, the unconditional variance of ${{\epsilon }_{\mi{t} }}$ is computed as

\[  \mb{V} ({\epsilon }_{\mi{t} }) = \frac{{\omega }}{(1- \sum _{i=1}^{q}{{\alpha }_{\mi{i} }}-\sum _{j=1}^{p}{{\gamma }_{\mi{j} }})}  \]

where ${ {\epsilon }_{t}=\sqrt { h_{t}} e_{t}}$ and ${ h_{t}}$ is the GARCH$(p,q)$ conditional variance.

Sometimes the multistep forecasts of the variance do not approach the unconditional variance when the model is integrated in variance; that is, ${\sum _{i=1}^{q}{{\alpha }_{i}} + \sum _{j=1}^{p}{{\gamma }_{j}} = 1}$.

The unconditional variance for the IGARCH model does not exist. However, it is interesting that the IGARCH model can be strongly stationary even though it is not weakly stationary. Refer to Nelson (1990) for details.


The EGARCH model was proposed by Nelson (1991). Nelson and Cao (1992) argue that the nonnegativity constraints in the linear GARCH model are too restrictive. The GARCH model imposes the nonnegative constraints on the parameters, ${\alpha }_{i}$ and ${\gamma }_{j}$, while there are no restrictions on these parameters in the EGARCH model. In the EGARCH model, the conditional variance, $h_{t}$, is an asymmetric function of lagged disturbances ${\epsilon }_{t-i}$:

\[  {\ln }( h_{t}) = {\omega } + \sum _{i=1}^{q}{{\alpha }_{i}g( z_{t-i})} + \sum _{j=1}^{p}{{\gamma }_{j}{\ln }( h_{t-j})}  \]


\[  g( z_{t}) = {\theta } z_{t}+{\gamma }[{|z_{t}|} -{E}{|z_{t}|}]  \]
\[  z_{t} = {\epsilon }_{t}/\sqrt { h_{t}}  \]

The coefficient of the second term in ${g( z_{t})}$ is set to be 1 (${\gamma }$=1) in our formulation. Note that ${{E}{| z_{t}|} = (2/{\pi })^{1/2}}$ if ${ z_{t} {\sim } \mr{N}(0,1)}$. The properties of the EGARCH model are summarized as follows:

  • The function ${g( z_{t})}$ is linear in ${ z_{t}}$ with slope coefficient ${{\theta }+1}$ if ${ z_{t}}$ is positive while ${g( z_{t})}$ is linear in ${ z_{t}}$ with slope coefficient ${{\theta }-1}$ if ${ z_{t}}$ is negative.

  • Suppose that ${{\theta }=0}$. Large innovations increase the conditional variance if ${{{| z_{t}|}- {E}{| z_{t}|}} > 0}$ and decrease the conditional variance if ${{{| z_{t}|}- {E}{| z_{t}|}} < 0}$.

  • Suppose that ${{\theta }<1}$. The innovation in variance, ${g( z_{t})}$, is positive if the innovations ${ z_{t}}$ are less than ${(2/{\pi })^{1/2}/({\theta }-1)}$. Therefore, the negative innovations in returns, ${ {\epsilon }_{t}}$, cause the innovation to the conditional variance to be positive if ${\theta }$ is much less than 1.


As shown in many empirical studies, positive and negative innovations have different impacts on future volatility. There is a long list of variations of GARCH models that consider the asymmetricity. Three typical variations are the quadratic GARCH (QGARCH) model (Engle and Ng, 1993), the threshold GARCH (TGARCH) model (Glosten, Jaganathan, and Runkle, 1993; Zakoian, 1994), and the power GARCH (PGARCH) model (Ding, Granger, and Engle, 1993). For more details about the asymmetric GARCH models, see Engle and Ng (1993).

In the QGARCH model, the lagged errors’ centers are shifted from zero to some constant values:

\[  h_{t} = {\omega } + \sum _{i=1}^{q}{{\alpha }_{i} ( {\epsilon }_{t-i} - {\psi }_{i} )^{2}} + \sum _{j=1}^{p}{{\gamma }_{j} h_{t-j}}  \]

In the TGARCH model, there is an extra slope coefficient for each lagged squared error,

\[  h_{t} = {\omega } + \sum _{i=1}^{q}{({\alpha }_{i} + 1_{{\epsilon }_{t-i}<0} {\psi }_{i}) {\epsilon }_{t-i}^{2}} + \sum _{j=1}^{p}{{\gamma }_{j} h_{t-j}}  \]

where the indicator function $1_{{\epsilon }_{t}<0}$ is one if ${\epsilon }_{t}<0$; otherwise, zero.

The PGARCH model not only considers the asymmetric effect, but also provides another way to model the long memory property in the volatility,

\[  h_{t}^{\lambda } = {\omega } + \sum _{i=1}^{q}{{\alpha }_{i} (|{\epsilon }_{t-i}|-{\psi }_{i}{\epsilon }_{t-i})^{2\lambda }} + \sum _{j=1}^{p}{{\gamma }_{j} h_{t-j}^{\lambda }}  \]

where $\lambda > 0$ and $|{\psi }_{i}| \leq 1, i = 1, ..., q$.

Note that the implemented TGARCH model is also well known as GJR-GARCH (Glosten, Jaganathan, and Runkle, 1993), which is similar to the threshold GARCH model proposed by Zakoian (1994) but not exactly same. In Zakoian’s model, the conditional standard deviation is a linear function of the past values of the white noise. Zakoian’s version can be regarded as a special case of PGARCH model when $\lambda =1/2$.


The GARCH-M model has the added regressor that is the conditional standard deviation:

\[  y_{t} = \mb{x} _{t}’ {\beta } + {\delta }\sqrt { h_{t}} + {\epsilon }_{t}  \]
\[  {\epsilon }_{t} = \sqrt { h_{t}} e_{t}  \]

where ${ h_{t}}$ follows the ARCH or GARCH process.

Maximum Likelihood Estimation

The family of GARCH models are estimated using the maximum likelihood method. The log-likelihood function is computed from the product of all conditional densities of the prediction errors.

When ${e_{t}}$ is assumed to have a standard normal distribution (${e_{t} {\sim } \mr{N}(0,1)}$), the log-likelihood function is given by

\[  l = \sum _{t=1}^{N}{\frac{1}{2}\left[-{\ln }(2{\pi }) -{\ln }( h_{t})- \frac{{\epsilon }_{t}^{2}}{h_{t}}\right]}  \]

where ${{\epsilon }_{t}= y_{t}- \mb{x} _{t}’{\beta }}$ and ${ h_{t}}$ is the conditional variance. When the GARCH$(p,q)$-M model is estimated, ${ {\epsilon }_{t}= y_{t}- \mb{x} _{t}’{\beta }- {\delta }\sqrt { h_{t}}}$. When there are no regressors, the residuals ${{\epsilon }_{t}}$ are denoted as ${y_{t}}$ or ${y_{t} - {\delta }\sqrt { h_{t}}}$.

If ${e_{t}}$ has the standardized Student’s t distribution, the log-likelihood function for the conditional t distribution is

\[  {\ell } = \sum _{t=1}^{\mi{N} } \Biggl [ {\ln }\left({\Gamma }\left(\frac{{\nu }+1}{2}\right)\right) -{\ln }\left({\Gamma }\left(\frac{{\nu }}{2}\right)\right) -\frac{1}{2}{\ln }(({\nu }-2){\pi }h_{t})  \]
\[  -\frac{1}{2}({\nu }+1){\ln }\left(1+\frac{{\epsilon }^{2}_{t}}{h_{t}({\nu }-2)}\right)\Biggr ]  \]

where ${{\Gamma }({\cdot })}$ is the gamma function and ${{\nu }}$ is the degree of freedom (${{\nu }>2}$). Under the conditional t distribution, the additional parameter ${1}/{\nu }$ is estimated. The log-likelihood function for the conditional t distribution converges to the log-likelihood function of the conditional normal GARCH model as ${{1}/{\nu } {\rightarrow }\;  0}$.

The likelihood function is maximized via either the dual quasi-Newton or the trust region algorithm. The default is the dual quasi-Newton algorithm. The starting values for the regression parameters ${\beta }$ are obtained from the OLS estimates. When there are autoregressive parameters in the model, the initial values are obtained from the Yule-Walker estimates. The starting value $1.0^{-6}$ is used for the GARCH process parameters.

The variance-covariance matrix is computed using the Hessian matrix. The dual quasi-Newton method approximates the Hessian matrix while the quasi-Newton method gets an approximation of the inverse of Hessian. The trust region method uses the Hessian matrix obtained using numerical differentiation. When there are active constraints, that is, ${\mb{q} ({\theta })=\mb{0} }$, the variance-covariance matrix is given by

\[  \mb{V} (\hat{{\theta }}) = \mb{H} ^{-1}[\mb{I} - \mb{Q} ’ (\mb{Q} \mb{H} ^{-1}\mb{Q} ’)^{-1}\mb{Q} \mb{H} ^{-1}]  \]

where ${\mb{H} =-{\partial }^{2}l/{\partial }{\theta }{\partial }{\theta }’}$ and ${\mb{Q} = {\partial }\mb{q} ({\theta })/{\partial }{\theta }’}$. Therefore, the variance-covariance matrix without active constraints reduces to ${\mb{V} (\hat{{\theta }}) = \mb{H} ^{-1}}$.