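Given the order p, let $\mathbf{p}_t$ be the vector of current and past values relevant to prediction of $\mathbf{x}_{t+1}$:
$$\mathbf{p}_t = (\mathbf{x}_t', \mathbf{x}_{t-1}', \ldots, \mathbf{x}_{t-p}')'$$
Let $\mathbf{f}_t$ be the vector of current and future values:
$$\mathbf{f}_t = (\mathbf{x}_t', \mathbf{x}_{t+1}', \ldots, \mathbf{x}_{t+p}')'$$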
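As a rough illustration of how the past and future vectors are stacked, the following Python sketch builds both from a small multivariate series. The function name `past_future_vectors` and the toy data are hypothetical and are not part of PROC STATESPACE; the stacking order (current value first, then increasing lags or leads) is an assumption based on the definitions above.

```python
import numpy as np

def past_future_vectors(x, t, p):
    """Stack the past vector (x_t, x_{t-1}, ..., x_{t-p}) and the
    future vector (x_t, x_{t+1}, ..., x_{t+p}) for an (n, r) array x
    whose row s holds the observation x_s (0-based)."""
    x = np.asarray(x, dtype=float)
    if t - p < 0 or t + p >= len(x):
        raise ValueError("t must leave p observations on each side")
    past = np.concatenate([x[t - k] for k in range(p + 1)])    # x_t, x_{t-1}, ...
    future = np.concatenate([x[t + k] for k in range(p + 1)])  # x_t, x_{t+1}, ...
    return past, future

# Toy bivariate series (r = 2): row s is (2s, 2s + 1).
x = np.arange(20.0).reshape(10, 2)
past, future = past_future_vectors(x, t=4, p=2)
```

Both vectors have r(p+1) components and share the current value as their first block.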
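In the canonical correlation analysis, consider submatrices of the sample covariance matrix of $\mathbf{p}_t$ and $\mathbf{f}_t$. This covariance matrix, $\mathbf{V}$, has a block Hankel form:
$$\mathbf{V} = \begin{bmatrix} \Gamma_0 & \Gamma_1' & \cdots & \Gamma_p' \\ \Gamma_1' & \Gamma_2' & \cdots & \Gamma_{p+1}' \\ \vdots & \vdots & & \vdots \\ \Gamma_p' & \Gamma_{p+1}' & \cdots & \Gamma_{2p}' \end{bmatrix}$$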
The canonical correlation analysis forms a sequence of potential state vectors $\mathbf{z}_t^j$. Examine a sequence $\mathbf{f}_t^j$ of subvectors of $\mathbf{f}_t$, form the submatrix $\mathbf{V}^j$ that consists of the rows and columns of $\mathbf{V}$ that correspond to the components of $\mathbf{f}_t^j$, and compute its canonical correlations.
The smallest canonical correlation of $\mathbf{V}^j$ is then used in the selection of the components of the state vector. The selection process is described in the following discussion. For more details about this process, see Akaike (1976).
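For readers who want to experiment with canonical correlations outside the procedure, here is a minimal Python sketch. It uses a generic QR-plus-SVD method on raw data columns, which is an assumption chosen for illustration; the procedure itself works from submatrices of the covariance matrix, and nothing here describes its internal computation. The function name and the simulated data are hypothetical.

```python
import numpy as np

def canonical_correlations(a, b):
    """Return the canonical correlations between the columns of a and b,
    in decreasing order, using orthonormal bases from a QR factorization."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)          # center each column
    b = b - b.mean(axis=0)
    qa, _ = np.linalg.qr(a)         # orthonormal basis for the columns of a
    qb, _ = np.linalg.qr(b)
    s = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)     # guard against rounding slightly above 1

rng = np.random.default_rng(0)
past = rng.standard_normal((500, 3))
noise = rng.standard_normal(500)
# First "future" column is an exact linear function of the past;
# the second is unrelated noise.
future = np.column_stack([past[:, 0] + past[:, 1], noise])
rho = canonical_correlations(past, future)
rho_min = rho[-1]                   # smallest canonical correlation
```

In this toy setup the largest canonical correlation is essentially 1, while the smallest is close to 0, which is the situation in which a candidate component adds no information about the past.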
In the following discussion, the notation $\mathbf{x}_{t+k|t}$ denotes the wide sense conditional expectation (best linear predictor) of $\mathbf{x}_{t+k}$, given all $\mathbf{x}_s$ with s less than or equal to t. In the notation $x_{i,t+1}$, the first subscript denotes the ith component of $\mathbf{x}_{t+1}$.
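The initial state vector $\mathbf{z}_t^1$ is set to $\mathbf{x}_t$. The sequence $\mathbf{f}_t^j$ is initialized by setting
$$\mathbf{f}_t^1 = (\mathbf{z}_t^{1\prime}, x_{1,t+1|t})'$$
That is, start by considering whether to add $x_{1,t+1|t}$ to the initial state vector $\mathbf{z}_t^1$.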
The procedure forms the submatrix $\mathbf{V}^1$ that corresponds to $\mathbf{f}_t^1$ and computes its canonical correlations. Denote the smallest canonical correlation of $\mathbf{V}^1$ as $\rho_{\min}$. If $\rho_{\min}$ is significantly greater than 0, $x_{1,t+1|t}$ is added to the state vector.
If the smallest canonical correlation of $\mathbf{V}^1$ is not significantly greater than 0, then a linear combination of $\mathbf{f}_t^1$ is uncorrelated with the past, $\mathbf{p}_t$. Assuming that the determinant of $\Gamma_0$ is not 0 (that is, no input series is a constant), you can take the coefficient of $x_{1,t+1|t}$ in this linear combination to be 1. Denote the coefficients of $\mathbf{z}_t^1$ in this linear combination as $\ell$. This gives the relationship:
$$x_{1,t+1|t} = \ell' \mathbf{x}_t$$
Therefore, the current state vector already contains all the past information useful for predicting $x_{1,t+1}$ and any greater leads of $x_{1,t}$. The variable $x_{1,t+1|t}$ is not added to the state vector, nor are any terms $x_{1,t+k|t}$ considered as possible components of the state vector. The variable $x_1$ is no longer active for state vector selection.
The process described for $x_{1,t+1|t}$ is repeated for the remaining elements of $\mathbf{f}_t$. The next candidate for inclusion in the state vector is the next component of $\mathbf{f}_t$ that corresponds to an active variable. Components of $\mathbf{f}_t$ that correspond to inactive variables that produced a zero $\rho_{\min}$ in a previous step are skipped.
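Denote the next candidate as $x_{l,t+k|t}$. The vector $\mathbf{f}_t^j$ is formed from the current state vector and $x_{l,t+k|t}$ as follows:
$$\mathbf{f}_t^j = (\mathbf{z}_t^{j\prime}, x_{l,t+k|t})'$$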
The matrix $\mathbf{V}^j$ is formed from $\mathbf{f}_t^j$ and its canonical correlations are computed. The smallest canonical correlation of $\mathbf{V}^j$ is judged to be either greater than 0 or equal to 0. If it is judged to be greater than 0, $x_{l,t+k|t}$ is added to the state vector. If it is judged to be 0, then a linear combination of $\mathbf{f}_t^j$ is uncorrelated with the past, $\mathbf{p}_t$, and the variable $x_l$ is now inactive.
The state vector selection process continues until no active variables remain.
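The control flow of the selection process can be sketched in Python as follows. This is only an outline under stated assumptions: the significance decision is delegated to a caller-supplied function `is_significant`, candidates are visited lead by lead in variable order, and all names (`select_state_vector`, `max_lead`, the `(variable, lead)` tuples) are hypothetical rather than anything the procedure exposes.

```python
def select_state_vector(variables, max_lead, is_significant):
    """Outline of the sequential selection of state vector components.

    variables      -- names of the series, in order
    max_lead       -- largest lead k to consider
    is_significant -- callable(state, candidate) -> bool, standing in for
                      the test on the smallest canonical correlation
    Returns the selected state vector and the list of candidates tested.
    """
    state = [(v, 0) for v in variables]   # initial state: current values
    active = list(variables)
    tested = []
    for k in range(1, max_lead + 1):
        for v in list(active):            # copy, because active is modified
            candidate = (v, k)            # stands for the lead-k predictor of v
            tested.append(candidate)
            if is_significant(state, candidate):
                state.append(candidate)
            else:
                active.remove(v)          # v is inactive from now on
        if not active:
            break                         # no active variables remain
    return state, tested

# Hypothetical decisions for two series x and y.
decisions = {("x", 1): True, ("y", 1): False, ("x", 2): False}
state, tested = select_state_vector(
    ["x", "y"], max_lead=3, is_significant=lambda s, c: decisions[c]
)
```

With these hypothetical decisions, the lead-1 predictor of x is added, y becomes inactive after its lead-1 test, and x becomes inactive after its lead-2 test, so higher leads of an inactive variable are never examined.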
For each step in the canonical correlation sequence, the significance of the smallest canonical correlation $\rho_{\min}$ is judged by an information criterion from Akaike (1976). This information criterion is
$$-n \ln(1 - \rho_{\min}^2) - \lambda\,(r(p+1) - q + 1)$$
where q is the dimension of $\mathbf{f}_t^j$ at the current step, r is the order of the state vector, p is the order of the vector autoregressive process, and $\lambda$ is the value of the SIGCORR= option. The default is SIGCORR=2. If this information criterion is less than or equal to 0, $\rho_{\min}$ is taken to be 0; otherwise, it is taken to be significantly greater than 0. (Do not confuse this information criterion with the AIC.)
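A small Python sketch of this decision rule follows. It assumes the criterion has the form $-n \ln(1 - \rho_{\min}^2) - \lambda(r(p+1) - q + 1)$ with n the number of observations. The inputs n = 200, r = 2, p = 2, and q = 3 are assumptions, chosen because together with the smallest canonical correlation 0.237045 they approximately reproduce the information criterion printed in Figure 28.12; the function name is hypothetical.

```python
import math

def information_criterion(n, rho_min, r, p, q, sigcorr=2.0):
    """Information criterion for the smallest canonical correlation.
    sigcorr plays the role of the SIGCORR= option value (default 2)."""
    penalty_terms = r * (p + 1) - q + 1
    return -n * math.log(1.0 - rho_min**2) - sigcorr * penalty_terms

# Assumed inputs (see the paragraph above); rho_min is taken from Figure 28.12.
ic = information_criterion(n=200, rho_min=0.237045, r=2, p=2, q=3)
add_candidate = ic > 0   # positive criterion: treat rho_min as significant
```

Raising `sigcorr` increases the penalty, so fewer candidates pass the test and the selected state vector tends to be smaller.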
Variables in $\mathbf{x}_{t+p|t}$ are not added to the model, even with a positive information criterion, because of the singularity of $\mathbf{V}$. You can force the consideration of more candidate state variables by increasing the size of the $\mathbf{V}$ matrix by specifying a PASTMIN= option value larger than p.
To print the details of the canonical correlation analysis process, specify the CANCORR option in the PROC STATESPACE statement. The CANCORR option prints the candidate state vectors, the canonical correlations, and the information criteria for testing the significance of the smallest canonical correlation.
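Bartlett’s $\chi^2$ and its degrees of freedom are also printed when the CANCORR option is specified. The formula used for Bartlett’s $\chi^2$ is
$$\chi^2 = -\left(n - 0.5\,(r(p+1) - q + 1)\right) \ln(1 - \rho_{\min}^2)$$
with $r(p+1) - q + 1$ degrees of freedom.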
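The Bartlett statistic can be checked by hand with a few lines of Python. The sketch assumes the statistic is $-(n - 0.5\,(r(p+1) - q + 1)) \ln(1 - \rho_{\min}^2)$ with $r(p+1) - q + 1$ degrees of freedom. As before, n = 200, r = 2, p = 2, and q = 3 are assumptions that approximately reproduce the Chi Square and DF columns of Figure 28.12, and the function name is hypothetical.

```python
import math

def bartlett_chi_square(n, rho_min, r, p, q):
    """Return Bartlett's chi-square statistic and its degrees of freedom
    for the smallest canonical correlation rho_min."""
    df = r * (p + 1) - q + 1
    chi2 = -(n - 0.5 * df) * math.log(1.0 - rho_min**2)
    return chi2, df

# Assumed inputs; rho_min is the smallest canonical correlation in Figure 28.12.
chi2, df = bartlett_chi_square(n=200, rho_min=0.237045, r=2, p=2, q=3)
```

The degrees of freedom equal the multiplier of the SIGCORR= value in the information criterion, so the two statistics penalize the same count.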
Figure 28.12 shows the output of the CANCORR option for the introductory example shown in the section Getting Started: STATESPACE Procedure.
    proc statespace data=in out=out lead=10 cancorr;
       var x(1) y(1);
       id t;
    run;
Figure 28.12: Canonical Correlations Analysis
x(T;T) | y(T;T) | x(T+1;T) | Information Criterion | Chi Square | DF |
---|---|---|---|---|---|
1 | 1 | 0.237045 | 3.566167 | 11.4505 | 4 |
New variables are added to the state vector if the information criteria are positive. In this example, $y_{t+1|t}$ and $x_{t+2|t}$ are not added to the state space vector because the information criteria for these models are negative.
If the information criterion is nearly 0, then you might want to investigate models that arise if the opposite decision is made regarding $\rho_{\min}$. This investigation can be accomplished by using a FORM statement to specify part or all of the state vector.
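When a candidate variable $x_{l,t+k|t}$ yields a zero $\rho_{\min}$ and is not added to the state vector, a linear combination of $\mathbf{f}_t^j$ is uncorrelated with the past, $\mathbf{p}_t$. Because of the method used to construct the $\mathbf{f}_t^j$ sequence, the coefficient of $x_{l,t+k|t}$ in this linear combination can be taken as 1. Denote the coefficients of $\mathbf{z}_t^j$ in this linear combination as $\ell$.
This gives the relationship:
$$x_{l,t+k|t} = \ell' \mathbf{z}_t^j$$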
The vector $\ell$ is used as a preliminary estimate of the first r columns of the row of the transition matrix $\mathbf{F}$ corresponding to $x_{l,t+k-1|t}$.
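One way to picture how such a coefficient vector could be obtained is the least-squares sketch below. It is a generic illustration, not the procedure's documented computation: it assumes you already have the covariances of the state components with the past and of the candidate with the past, and it solves for the coefficients that make the candidate minus the linear combination of state components uncorrelated with the past. The function name and inputs are hypothetical.

```python
import numpy as np

def preliminary_row(cov_state_past, cov_cand_past):
    """Solve for ell such that Cov(candidate - ell' z, past) is zero.

    cov_state_past -- (m, P) covariances of the m state components with the past
    cov_cand_past  -- (P,)   covariances of the candidate with the past
    """
    s = np.asarray(cov_state_past, dtype=float)
    c = np.asarray(cov_cand_past, dtype=float)
    ell, *_ = np.linalg.lstsq(s.T, c, rcond=None)   # least squares on s' ell = c
    return ell

# Synthetic check: build covariances that satisfy the relationship exactly.
rng = np.random.default_rng(1)
cov_state_past = rng.standard_normal((2, 6))
ell_true = np.array([0.5, -0.3])
cov_cand_past = cov_state_past.T @ ell_true
ell = preliminary_row(cov_state_past, cov_cand_past)
```

When the relationship holds only approximately, as it would with sample covariances, the least-squares solution gives the coefficients that leave the smallest residual covariance with the past.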