The VARMAX procedure uses numerous linear algebra routines and frequently uses the sweep operator (Goodnight 1979) and the Cholesky root (Golub and Van Loan 1983).
In addition, the VARMAX procedure uses the nonlinear optimization (NLO) subsystem to perform the nonlinear optimization that maximum likelihood estimation requires. This optimization is computationally intensive.
For some data sets, the computation algorithm can fail to converge. Nonconvergence can result from a number of causes, including flat or ridged likelihood surfaces and ill-conditioned data.
If you experience convergence problems, the following points might be helpful:
Data that contain extreme values can affect results in PROC VARMAX. Rescaling the data can improve stability.
Changing the TECH=, MAXITER=, and MAXFUNC= options in the NLOPTIONS statement can improve the stability of the optimization process.
Specifying a different model that fits the data more closely might improve convergence.
Let $T$ be the length of each series, $k$ be the number of dependent variables, $p$ be the order of autoregressive terms, and $q$ be the order of moving-average terms. The number of parameters to estimate for a VARMA($p$,$q$) model is
$$k + (p+q)k^2 + \frac{k(k+1)}{2}$$
As $k$ increases, the number of parameters to estimate increases very quickly. Furthermore, the memory requirement for VARMA($p$,$q$) increases quadratically as $k$ and $T$ increase.
For a VARMAX($p$,$q$,$s$) model and GARCH-type multivariate conditional heteroscedasticity models, the number of parameters to estimate and the memory requirements are considerable.
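To give a feel for how quickly the parameter count grows, the following Python sketch tabulates it for a few model sizes. It is an illustration only, not part of PROC VARMAX. It assumes an unrestricted VARMA model whose parameters are k intercepts, p + q coefficient matrices of size k by k, and the k(k+1)/2 distinct elements of the symmetric innovation covariance matrix. The function name `varma_param_count` is hypothetical.

```python
def varma_param_count(k: int, p: int, q: int) -> int:
    """Count the parameters of an unrestricted VARMA(p,q) model.

    Assumption (not taken from PROC VARMAX itself): the model has
    k intercepts, p + q coefficient matrices of size k x k, and the
    k(k+1)/2 distinct elements of the symmetric covariance matrix.
    """
    if k < 1 or p < 0 or q < 0:
        raise ValueError("k must be >= 1, and p and q must be >= 0")
    intercepts = k
    ar_ma_coefficients = (p + q) * k * k
    covariance_elements = k * (k + 1) // 2  # k(k+1) is always even
    return intercepts + ar_ma_coefficients + covariance_elements


if __name__ == "__main__":
    # Doubling the number of dependent variables roughly quadruples the count.
    for k in (2, 4, 8):
        print(f"k={k}: {varma_param_count(k, p=1, q=1)} parameters")
```

Under these assumptions, a VARMA(1,1) model has 13 parameters for two dependent variables, 46 for four, and 172 for eight, so even modest increases in the number of series make the optimization problem substantially larger.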