PROC CORRESP: Algorithm and Notation :: SAS/STAT(R) 9.3 User's Guide

Algorithm and Notation

This section is primarily based on the theory of correspondence analysis found in Greenacre (1984). If you are interested in other references, see the section Background.

Let $\mathbf{N}$ be the contingency table formed from those observations and variables that are not supplementary and from those observations that have no missing values and have a positive weight. This table is an $(n_r \times n_c)$ rank q matrix of nonnegative numbers with nonzero row and column sums. If $\mathbf{Z}_a$ is the binary coding for variable A, and $\mathbf{Z}_b$ is the binary coding for variable B, then $\mathbf{N} = \mathbf{Z}_a'\mathbf{Z}_b$ is a contingency table. Similarly, if $\mathbf{Z}_{b,c}$ contains the binary coding for both variables B and C, then $\mathbf{N} = \mathbf{Z}_a'\mathbf{Z}_{b,c}$ can also be input to a correspondence analysis. With the BINARY option, $\mathbf{N} = \mathbf{Z}$, and the analysis is based on a binary table. In multiple correspondence analysis, the analysis is based on a Burt table, $\mathbf{Z}'\mathbf{Z}$.
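The construction of a contingency table and a Burt table from binary codings can be sketched as follows. This is an illustration in plain Python, not PROC CORRESP code; the five observations and the helper names (`binary_coding`, `matmul`, `transpose`) are hypothetical and chosen only for this example.

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def binary_coding(values, levels):
    """One indicator column per level, one row per observation."""
    return [[1 if v == level else 0 for level in levels] for v in values]


# Hypothetical raw data: one entry per observation.
hair = ["Blond", "Brown", "Brown", "White", "Blond"]
sex = ["Female", "Male", "Female", "Male", "Male"]

z_a = binary_coding(hair, ["Blond", "Brown", "White"])
z_b = binary_coding(sex, ["Female", "Male"])

# Contingency table: transpose of one binary coding times the other.
contingency = matmul(transpose(z_a), z_b)

# Burt table: Z'Z, where Z places the two binary codings side by side.
z = [row_a + row_b for row_a, row_b in zip(z_a, z_b)]
burt = matmul(transpose(z), z)
```

The off-diagonal partition of `burt` repeats `contingency`, and the diagonal partitions hold the category counts of each variable.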

Let $\mathbf{1}$ be a vector of 1s of the appropriate order, let $\mathbf{I}$ be an identity matrix, and let diag$(\cdot)$ be a matrix-valued function that creates a diagonal matrix from a vector. Let

	$f$	$= \mathbf{1}'\mathbf{N}\mathbf{1}$
	$\mathbf{P}$	$= \frac{1}{f}\mathbf{N}$
	$\mathbf{r}$	$= \mathbf{P}\mathbf{1}$
	$\mathbf{c}$	$= \mathbf{P}'\mathbf{1}$
	$\mathbf{D}_r$	$= \mathrm{diag}(\mathbf{r})$
	$\mathbf{D}_c$	$= \mathrm{diag}(\mathbf{c})$
	$\mathbf{R}$	$= \mathbf{D}_r^{-1}\mathbf{P}$
	$\mathbf{C}'$	$= \mathbf{D}_c^{-1}\mathbf{P}'$

The scalar f is the sum of all elements in $\mathbf{N}$. The matrix $\mathbf{P}$ is a matrix of relative frequencies. The vector $\mathbf{r}$ contains row marginal proportions or row "masses." The vector $\mathbf{c}$ contains column marginal proportions or column masses. The matrices $\mathbf{D}_r$ and $\mathbf{D}_c$ are diagonal matrices of marginals.
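The quantities described above can be computed by hand for a small table. The sketch below uses a hypothetical 2 x 3 table of counts; the variable names (`table`, `f`, `p`, `r`, `c`) are illustrative choices, not names used by PROC CORRESP.

```python
# Hypothetical 2 x 3 contingency table of counts.
table = [[20, 10, 10],
         [5, 15, 40]]

# Grand total: the sum of all elements of the table.
f = sum(sum(row) for row in table)

# Matrix of relative frequencies.
p = [[count / f for count in row] for row in table]

# Row masses (row marginal proportions).
r = [sum(row) for row in p]

# Column masses (column marginal proportions).
c = [sum(col) for col in zip(*p)]
```

Both sets of masses sum to one, because each is a marginal of the relative-frequency matrix.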

The rows of $\mathbf{R}$ contain the "row profiles." The elements of each row of $\mathbf{R}$ sum to one. Each $(i,j)$ element of $\mathbf{R}$ contains the observed probability of being in column j given membership in row i. Similarly, the columns of $\mathbf{C}$ contain the column profiles. The coordinates in correspondence analysis are based on the generalized singular value decomposition of $\mathbf{P}$,

$\mathbf{P} = \mathbf{A}\mathbf{D}_u\mathbf{B}'$

where

$\mathbf{A}'\mathbf{D}_r^{-1}\mathbf{A} = \mathbf{B}'\mathbf{D}_c^{-1}\mathbf{B} = \mathbf{I}$

In multiple correspondence analysis,

$\mathbf{P} = \mathbf{B}\mathbf{D}_u^2\mathbf{B}'$

The matrix $\mathbf{A}$, which is the rectangular matrix of left generalized singular vectors, has $n_r$ rows and q columns; the matrix $\mathbf{D}_u$, which is a diagonal matrix of singular values, has q rows and columns; and the matrix $\mathbf{B}$, which is the rectangular matrix of right generalized singular vectors, has $n_c$ rows and q columns. The columns of $\mathbf{A}$ and $\mathbf{B}$ define the principal axes of the column and row point clouds, respectively.

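The generalized singular value decomposition of $\mathbf{P} - \mathbf{r}\mathbf{c}'$, discarding the last singular value (which is zero) and the last left and right singular vectors, is exactly the same as a generalized singular value decomposition of $\mathbf{P}$, discarding the first singular value (which is one), the first left singular vector, $\mathbf{r}$, and the first right singular vector, $\mathbf{c}$. The first (trivial) column of $\mathbf{A}$ and $\mathbf{B}$ and the first singular value in $\mathbf{D}_u$ are discarded before any results are displayed. You can obtain the generalized singular value decomposition of $\mathbf{P} - \mathbf{r}\mathbf{c}'$ from the ordinary singular value decomposition of $\mathbf{D}_r^{-1/2}(\mathbf{P} - \mathbf{r}\mathbf{c}')\mathbf{D}_c^{-1/2}$:

$\mathbf{D}_r^{-1/2}(\mathbf{P} - \mathbf{r}\mathbf{c}')\mathbf{D}_c^{-1/2} = \mathbf{U}\mathbf{D}_u\mathbf{V}' = (\mathbf{D}_r^{-1/2}\mathbf{A})\mathbf{D}_u(\mathbf{D}_c^{-1/2}\mathbf{B})'$
$\mathbf{P} - \mathbf{r}\mathbf{c}' = \mathbf{D}_r^{1/2}\mathbf{U}\mathbf{D}_u\mathbf{V}'\mathbf{D}_c^{1/2} = (\mathbf{D}_r^{1/2}\mathbf{U})\mathbf{D}_u(\mathbf{D}_c^{1/2}\mathbf{V})' = \mathbf{A}\mathbf{D}_u\mathbf{B}'$

Hence, $\mathbf{A} = \mathbf{D}_r^{1/2}\mathbf{U}$ and $\mathbf{B} = \mathbf{D}_c^{1/2}\mathbf{V}$.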
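As a short check on this construction, the normalization of the generalized singular vectors follows from the orthonormality of the ordinary singular vectors. The lines below are a sketch that assumes the following notation: $\mathbf{U}$ and $\mathbf{V}$ are the ordinary left and right singular vectors (so $\mathbf{U}'\mathbf{U} = \mathbf{V}'\mathbf{V} = \mathbf{I}$), $\mathbf{D}_r$ and $\mathbf{D}_c$ are the diagonal matrices of row and column masses, and $\mathbf{A} = \mathbf{D}_r^{1/2}\mathbf{U}$, $\mathbf{B} = \mathbf{D}_c^{1/2}\mathbf{V}$.

$\mathbf{A}'\mathbf{D}_r^{-1}\mathbf{A} = \mathbf{U}'\mathbf{D}_r^{1/2}\mathbf{D}_r^{-1}\mathbf{D}_r^{1/2}\mathbf{U} = \mathbf{U}'\mathbf{U} = \mathbf{I}$

$\mathbf{B}'\mathbf{D}_c^{-1}\mathbf{B} = \mathbf{V}'\mathbf{D}_c^{1/2}\mathbf{D}_c^{-1}\mathbf{D}_c^{1/2}\mathbf{V} = \mathbf{V}'\mathbf{V} = \mathbf{I}$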

The default row coordinates are $\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u$, and the default column coordinates are $\mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_u$. Typically the first two columns of $\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u$ and $\mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_u$ are plotted to graphically display associations between the row and column categories. The plot consists of two overlaid plots, one for rows and one for columns. The row points are row profiles, and the column points are column profiles, both rescaled so that distances between profiles can be displayed as ordinary Euclidean distances, then orthogonally rotated to a principal axes orientation. Distances between row points and other row points have meaning, as do distances between column points and other column points. However, distances between column points and row points are not interpretable.

The PROFILE=, ROW=, and COLUMN= Options

The PROFILE=, ROW=, and COLUMN= options standardize the coordinates before they are displayed and placed in the output data set. The options PROFILE=BOTH, PROFILE=ROW, and PROFILE=COLUMN provide the standardizations that are typically used in correspondence analysis. There are six choices each for row and column coordinates (see Table 31.3). However, most of the combinations of the ROW= and COLUMN= options are not useful. The ROW= and COLUMN= options are provided for completeness, but they are not intended for general use.

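Table 31.3 Coordinates
ROW=		Matrix Formula
A		$\mathbf{A}$
AD		$\mathbf{A}\mathbf{D}_u$
DA		$\mathbf{D}_r^{-1}\mathbf{A}$
DAD		$\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u$
DAD1/2		$\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u^{1/2}$
DAID1/2		$\mathbf{D}_r^{-1}\mathbf{A}(\mathbf{I} + \mathbf{D}_u)^{1/2}$
COLUMN=		Matrix Formula
B		$\mathbf{B}$
BD		$\mathbf{B}\mathbf{D}_u$
DB		$\mathbf{D}_c^{-1}\mathbf{B}$
DBD		$\mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_u$
DBD1/2		$\mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_u^{1/2}$
DBID1/2		$\mathbf{D}_c^{-1}\mathbf{B}(\mathbf{I} + \mathbf{D}_u)^{1/2}$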

When PROFILE=ROW (ROW=DAD and COLUMN=DB), the row coordinates $\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u$ and column coordinates $\mathbf{D}_c^{-1}\mathbf{B}$ provide a correspondence analysis based on the row profile matrix. The row profile (conditional probability) matrix is defined as $\mathbf{R} = \mathbf{D}_r^{-1}\mathbf{P} = \mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u\mathbf{B}'$. The elements of each row of $\mathbf{R}$ sum to one. Each $(i,j)$ element of $\mathbf{R}$ contains the observed probability of being in column j given membership in row i. The "principal" row coordinates $\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u$ and "standard" column coordinates $\mathbf{D}_c^{-1}\mathbf{B}$ provide a decomposition of $\mathbf{D}_r^{-1}\mathbf{P}\mathbf{D}_c^{-1} = \mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u\mathbf{B}'\mathbf{D}_c^{-1} = \mathbf{R}\mathbf{D}_c^{-1}$. Since $\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u = \mathbf{R}\mathbf{D}_c^{-1}\mathbf{B}$, the row coordinates are weighted centroids of the column coordinates. Each column point, with coordinates scaled to standard coordinates, defines a vertex in $(n_c - 1)$-dimensional space. All of the principal row coordinates are located in the space defined by the standard column coordinates. Distances among row points have meaning, but distances among column points and distances between row and column points are not interpretable.

The option PROFILE=COLUMN can be described as applying the PROFILE=ROW formulas to the transpose of the contingency table. When PROFILE=COLUMN (ROW=DA and COLUMN=DBD), the principal column coordinates $\mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_u$ are weighted centroids of the standard row coordinates $\mathbf{D}_r^{-1}\mathbf{A}$. Each row point, with coordinates scaled to standard coordinates, defines a vertex in $(n_r - 1)$-dimensional space. All of the principal column coordinates are located in the space defined by the standard row coordinates. Distances among column points have meaning, but distances among row points and distances between row and column points are not interpretable.

The usual sets of coordinates are given by the default PROFILE=BOTH (ROW=DAD and COLUMN=DBD). All of the summary statistics, such as the squared cosines and contributions to inertia, apply to these two sets of points. One advantage to using these coordinates is that both sets $\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u$ and $\mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_u$ are postmultiplied by the diagonal matrix $\mathbf{D}_u$, which has diagonal values that are all less than or equal to one. When $\mathbf{D}_u$ is a part of the definition of only one set of coordinates, that set forms a tight cluster near the centroid, whereas the other set of points is more widely dispersed. Including $\mathbf{D}_u$ in both sets makes a better graphical display. However, care must be taken in interpreting such a plot. No correct interpretation of distances between row points and column points can be made.

Another property of this choice of coordinates concerns the geometry of distances between points within each set. The default row coordinates can be decomposed into $\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u = (\mathbf{D}_r^{-1}\mathbf{P})(\mathbf{D}_c^{-1/2})(\mathbf{D}_c^{-1/2}\mathbf{B})$. The row coordinates are row profiles $(\mathbf{D}_r^{-1}\mathbf{P})$, rescaled by $\mathbf{D}_c^{-1/2}$ (rescaled so that distances between profiles are transformed from a chi-square metric to a Euclidean metric), then orthogonally rotated (with $\mathbf{D}_c^{-1/2}\mathbf{B}$) to a principal axes orientation. Similarly, the column coordinates are column profiles rescaled to a Euclidean metric and orthogonally rotated to a principal axes orientation.

The rationale for computing distances between row profiles by using the non-Euclidean chi-square metric is as follows. Each row of the contingency table can be viewed as a realization of a multinomial distribution conditional on its row marginal frequency. The null hypothesis of row and column independence is equivalent to the hypothesis of homogeneity of the row profiles. A significant chi-square statistic is geometrically interpreted as a significant deviation of the row profiles from their centroid, $\mathbf{c}'$. The chi-square metric is the Mahalanobis metric between row profiles based on their estimated covariance matrix under the homogeneity assumption (Greenacre and Hastie 1987). A parallel argument can be made for the column profiles.
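The link between the chi-square metric and the chi-square statistic can be illustrated numerically. The sketch below, on a hypothetical 2 x 3 table, computes the squared chi-square distance of each row profile from the centroid (the vector of column masses) and the mass-weighted sum of those distances. It relies on the standard identity that this weighted sum (the total inertia) equals the Pearson chi-square statistic divided by the grand total. All names are illustrative.

```python
# Hypothetical 2 x 3 contingency table of counts.
table = [[20, 10, 10],
         [5, 15, 40]]

f = sum(map(sum, table))                       # grand total
r = [sum(row) / f for row in table]            # row masses
c = [sum(col) / f for col in zip(*table)]      # column masses (centroid)
profiles = [[x / sum(row) for x in row] for row in table]


def chi_square_distance_sq(u, v, masses):
    """Squared chi-square distance between two profiles."""
    return sum((a - b) ** 2 / m for a, b, m in zip(u, v, masses))


# Mass-weighted centroid of the row profiles; it equals the column masses.
centroid = [sum(r_i * prof[j] for r_i, prof in zip(r, profiles))
            for j in range(len(c))]

# Total inertia: weighted sum of squared distances from the centroid.
inertia = sum(r_i * chi_square_distance_sq(prof, c, c)
              for r_i, prof in zip(r, profiles))

# Pearson chi-square statistic for independence.
chi_square = sum((table[i][j] - f * r[i] * c[j]) ** 2 / (f * r[i] * c[j])
                 for i in range(len(r)) for j in range(len(c)))
```

When every row profile equals the centroid, both the inertia and the chi-square statistic are zero, which is the homogeneity hypothesis described above.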

When ROW=DAD1/2 and COLUMN=DBD1/2 (Gifi 1990; van der Heijden and de Leeuw 1985), the row coordinates $\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u^{1/2}$ and column coordinates $\mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_u^{1/2}$ are a decomposition of $\mathbf{D}_r^{-1}\mathbf{P}\mathbf{D}_c^{-1}$.

In all of the preceding pairs, distances between row and column points are not meaningful. This prompted Carroll, Green, and Schaffer (1986) to propose that row coordinates $\mathbf{D}_r^{-1}\mathbf{A}(\mathbf{I} + \mathbf{D}_u)^{1/2}$ and column coordinates $\mathbf{D}_c^{-1}\mathbf{B}(\mathbf{I} + \mathbf{D}_u)^{1/2}$ be used. These coordinates are (except for a constant scaling) the coordinates from a multiple correspondence analysis of a Burt table created from two categorical variables. This standardization is available with ROW=DAID1/2 and COLUMN=DBID1/2. However, this approach has been criticized on both theoretical and empirical grounds by Greenacre (1989). The Carroll, Green, and Schaffer standardization relies on the assumption that the chi-square metric is an appropriate metric for measuring the distance between the columns of a bivariate indicator matrix. See the section Using the TABLES Statement for a description of indicator matrices. Greenacre (1989) showed that this assumption cannot be justified.

The MCA Option

The MCA option performs a multiple correspondence analysis (MCA). This option requires a Burt table. You can specify the MCA option with a table created from a design matrix with fuzzy coding schemes as long as every row of every partition of the design matrix has the same marginal sum. For example, each row of each partition could contain the probabilities that the observation is a member of each level. Then the Burt table constructed from this matrix no longer contains all integers, and the diagonal partitions are no longer diagonal matrices, but MCA is still valid.

A TABLES statement with a single variable list creates a Burt table. Thus, you can always specify the MCA option with this type of input. If you use the MCA option when reading an existing table with a VAR statement, you must ensure that the table is a Burt table.

If you perform MCA on a table that is not a Burt table, the results of the analysis are invalid. If the table is not symmetric, or if the sums of all elements in each diagonal partition are not equal, PROC CORRESP displays an error message and quits.
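The two conditions named above (symmetry, and equal sums over the diagonal partitions) can be sketched as a simple check. This is an illustration of the stated conditions only, not the procedure's actual implementation; the function name `looks_like_burt` and the argument `sizes` (the number of categories of each variable, in order) are hypothetical. Passing the check does not prove that a table is a Burt table.

```python
def looks_like_burt(table, sizes, tol=1e-9):
    """Check symmetry and equal sums of the diagonal partitions.

    table: square list-of-lists; sizes: categories per variable, in order.
    """
    n = len(table)
    if sum(sizes) != n or any(len(row) != n for row in table):
        return False
    # Condition 1: the table must be symmetric.
    for i in range(n):
        for j in range(i):
            if abs(table[i][j] - table[j][i]) > tol:
                return False
    # Condition 2: every diagonal partition must have the same total.
    totals = []
    start = 0
    for size in sizes:
        block = range(start, start + size)
        totals.append(sum(table[i][j] for i in block for j in block))
        start += size
    return all(abs(t - totals[0]) <= tol for t in totals)


# Hypothetical Burt table for two variables with 3 and 2 categories.
burt = [[2, 0, 0, 1, 1],
        [0, 2, 0, 1, 1],
        [0, 0, 1, 0, 1],
        [1, 1, 0, 2, 0],
        [1, 1, 1, 0, 3]]
valid = looks_like_burt(burt, [3, 2])
asymmetric = looks_like_burt([[2, 1], [0, 3]], [1, 1])
unequal = looks_like_burt([[2, 0, 1], [0, 2, 1], [1, 1, 3]], [2, 1])
```

Here `valid` is true, while `asymmetric` and `unequal` fail the first and second condition, respectively.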

A subset of the columns of a Burt table is not necessarily a Burt table, so in MCA it is not appropriate to designate arbitrary columns as supplementary. You can, however, designate all columns from one or more categorical variables as supplementary.

The results of a multiple correspondence analysis of a Burt table $\mathbf{Z}'\mathbf{Z}$ are the same as the column results from a simple correspondence analysis of the binary (or fuzzy) matrix $\mathbf{Z}$. Multiple correspondence analysis is not a simple correspondence analysis of the Burt table. It is not appropriate to perform a simple correspondence analysis of a Burt table. The MCA option is based on $\mathbf{P} = \mathbf{B}\mathbf{D}_u^2\mathbf{B}'$, whereas a simple correspondence analysis of the Burt table would be based on $\mathbf{P} = \mathbf{B}\mathbf{D}_u\mathbf{B}'$.

Since the rows and columns of the Burt table are the same, no row information is displayed or written to the output data sets. The resulting inertias and the default (COLUMN=DBD) column coordinates are the appropriate inertias and coordinates for an MCA. The supplementary column coordinates, cosines, and quality of representation formulas for MCA differ from the simple correspondence analysis formulas because the design matrix column profiles and left singular vectors are not available.

The following statements create a Burt table and perform a multiple correspondence analysis:

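/* OBSERVED displays the observed table; SHORT suppresses the point statistics; */
/* MCA requests a multiple correspondence analysis of the Burt table.           */
proc corresp data=Neighbor observed short mca;
   tables Hair Height Sex Age;   /* a single variable list creates a Burt table */
run;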

Both the rows and the columns have the same nine categories (Blond, Brown, White, Short, Tall, Female, Male, Old, and Young).

MCA Adjusted Inertias

The usual principal inertias of a Burt table constructed from m categorical variables in MCA are the eigenvalues $u_{kk}$ from $\mathbf{D}_u^2$. The problem with these inertias is that they provide a pessimistic indication of fit. Benzécri (1979) proposed the following inertia adjustment, which is also described by Greenacre (1984, p. 145):

$\left(\frac{m}{m-1}\right)^2 \times \left(u_{kk} - \frac{1}{m}\right)^2$ for $u_{kk} > \frac{1}{m}$

This adjustment computes the percent of adjusted inertia relative to the sum of the adjusted inertias for all inertias greater than $\frac{1}{m}$. The Benzécri adjustment is available with the BENZECRI option.
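The Benzécri adjustment can be sketched in a few lines. The code below assumes the adjustment takes the form (m/(m-1))^2 (u - 1/m)^2 for each principal inertia u greater than 1/m, with m the number of categorical variables. The inertia values are hypothetical, and the function name is illustrative rather than part of any SAS interface.

```python
def benzecri_adjusted(inertias, m):
    """Adjusted inertias for the principal inertias that exceed 1/m."""
    threshold = 1.0 / m
    scale = (m / (m - 1)) ** 2
    return [scale * (u - threshold) ** 2 for u in inertias if u > threshold]


# Hypothetical principal inertias from an MCA of m = 4 variables.
inertias = [0.46, 0.30, 0.24, 0.15, 0.10]
m = 4

adjusted = benzecri_adjusted(inertias, m)

# Percentages are taken relative to the sum of the adjusted inertias.
percents = [100.0 * a / sum(adjusted) for a in adjusted]
```

Only the inertias above 1/m (here 0.25) are retained, so the percentages always sum to 100 over the retained dimensions.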

Greenacre (1994, p. 156) argues that the Benzécri adjustment overestimates the quality of fit. Greenacre proposes instead to compute the percentage of adjusted inertia relative to

$\frac{m}{m-1} \times \left(\mathrm{trace}(\mathbf{D}_u^4) - \frac{n_c - m}{m^2}\right)$

for all inertias greater than $\frac{1}{m}$, where $\mathrm{trace}(\mathbf{D}_u^4)$ is the sum of squared inertias. The Greenacre adjustment is available with the GREENACRE option.
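The Greenacre adjustment can be sketched in the same way. The code below assumes that the adjusted inertias themselves are computed as (m/(m-1))^2 (u - 1/m)^2 for u greater than 1/m, and that the denominator is m/(m-1) times (the sum of squared inertias minus (n_c - m)/m^2), where n_c is the total number of categories. These forms, the inertia values, and the function name are stated as assumptions for illustration.

```python
def greenacre_percents(inertias, m, n_categories):
    """Percent of adjusted inertia relative to the Greenacre total."""
    threshold = 1.0 / m
    scale = (m / (m - 1)) ** 2
    adjusted = [scale * (u - threshold) ** 2
                for u in inertias if u > threshold]
    # The sum of squared inertias runs over all principal inertias.
    sum_of_squares = sum(u * u for u in inertias)
    total = (m / (m - 1)) * (sum_of_squares - (n_categories - m) / m ** 2)
    return [100.0 * a / total for a in adjusted]


# Hypothetical principal inertias: m = 4 variables, 9 categories in all.
inertias = [0.46, 0.30, 0.24, 0.15, 0.10]
percents = greenacre_percents(inertias, m=4, n_categories=9)
```

Unlike the Benzécri percentages, these need not sum to 100, which reflects the more conservative assessment of fit.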

Ordinary unadjusted inertias are printed by default with MCA when neither the BENZECRI nor the GREENACRE option is specified. However, the unadjusted inertias are not printed by default when either the BENZECRI or the GREENACRE option is specified. To display both adjusted and unadjusted inertias, specify the UNADJUSTED option in addition to the relevant adjusted inertia option (BENZECRI, GREENACRE, or both).

Supplementary Rows and Columns

Supplementary rows and columns are represented as points in the joint row and column space, but they are not used in determining the locations of the other active rows and columns of the table. The formulas that are used to compute coordinates for the supplementary rows and columns depend on the PROFILE= option or the ROW= and COLUMN= options. Let $\mathbf{S}_o$ be a matrix with rows that contain the supplementary observations, and let $\mathbf{S}_v$ be a matrix with rows that contain the supplementary variables. Note that $\mathbf{S}_v$ is defined to be the transpose of the supplementary variable partition of the table. Let $\mathbf{R}_s = \mathrm{diag}(\mathbf{S}_o\mathbf{1})^{-1}\mathbf{S}_o$ be the supplementary observation profile matrix, and let $\mathbf{C}_s = \mathrm{diag}(\mathbf{S}_v\mathbf{1})^{-1}\mathbf{S}_v$ be the supplementary variable profile matrix. Note that the notation diag$(\cdot)^{-1}$ means to convert the vector to a diagonal matrix, then invert the diagonal matrix. The coordinates for the supplementary observations and variables are shown in Table 31.4.

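Table 31.4 Coordinates for Supplementary Observations
ROW=	Matrix Formula
A	$\frac{1}{f}\mathbf{S}_o\mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_u^{-1}$
AD	$\frac{1}{f}\mathbf{S}_o\mathbf{D}_c^{-1}\mathbf{B}$
DA	$\mathbf{R}_s\mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_u^{-1}$
DAD	$\mathbf{R}_s\mathbf{D}_c^{-1}\mathbf{B}$
DAD1/2	$\mathbf{R}_s\mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_u^{-1/2}$
DAID1/2	$\mathbf{R}_s\mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_u^{-1}(\mathbf{I} + \mathbf{D}_u)^{1/2}$
COLUMN=	Matrix Formula
B	$\frac{1}{f}\mathbf{S}_v\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u^{-1}$
BD	$\frac{1}{f}\mathbf{S}_v\mathbf{D}_r^{-1}\mathbf{A}$
DB	$\mathbf{C}_s\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u^{-1}$
DBD	$\mathbf{C}_s\mathbf{D}_r^{-1}\mathbf{A}$
DBD1/2	$\mathbf{C}_s\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u^{-1/2}$
DBID1/2	$\mathbf{C}_s\mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_u^{-1}(\mathbf{I} + \mathbf{D}_u)^{1/2}$
MCA COLUMN=	Matrix Formula
B	not allowed
BD	not allowed
DB	$\mathbf{C}_s\mathbf{D}_r^{-1}\mathbf{B}\mathbf{D}_u^{-2}$
DBD	$\mathbf{C}_s\mathbf{D}_r^{-1}\mathbf{B}\mathbf{D}_u^{-1}$
DBD1/2	$\mathbf{C}_s\mathbf{D}_r^{-1}\mathbf{B}\mathbf{D}_u^{-3/2}$
DBID1/2	$\mathbf{C}_s\mathbf{D}_r^{-1}\mathbf{B}\mathbf{D}_u^{-2}(\mathbf{I} + \mathbf{D}_u)^{1/2}$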

Statistics That Aid Interpretation

The partial contributions to inertia, squared cosines, quality of representation, inertia, and mass provide additional information about the coordinates. These statistics are displayed by default. Include the SHORT or NOPRINT option in the PROC CORRESP statement to avoid having these statistics displayed.

These statistics pertain to the default PROFILE=BOTH coordinates, no matter what values you specify for the ROW=, COLUMN=, or PROFILE= option. Let sq$(\cdot)$ be a matrix-valued function denoting element-wise squaring of the argument matrix. Let t be the total inertia (the sum of the elements in $\mathbf{D}_u^2$).

In MCA, let $\mathbf{B}_s$ be the Burt table partition containing the intersection of the supplementary columns and the supplementary rows. The matrix diag$(\mathbf{B}_s)$ is a diagonal matrix of marginal frequencies of the supplemental columns of the binary matrix $\mathbf{Z}$. Let p be the number of rows in this design matrix. The statistics are defined in Table 31.5.

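Table 31.5 Statistics That Aid Interpretation
Statistic		Matrix Formula
Row partial contributions to inertia		$\mathbf{D}_r^{-1}$ sq$(\mathbf{A})$
Column partial contributions to inertia		$\mathbf{D}_c^{-1}$ sq$(\mathbf{B})$
Row squared cosines		diag(sq$(\mathbf{A}\mathbf{D}_u)\mathbf{1})^{-1}$ sq$(\mathbf{A}\mathbf{D}_u)$
Column squared cosines		diag(sq$(\mathbf{B}\mathbf{D}_u)\mathbf{1})^{-1}$ sq$(\mathbf{B}\mathbf{D}_u)$
Row mass		$\mathbf{r}$
Column mass		$\mathbf{c}$
Row inertia		$\frac{1}{t}\mathbf{D}_r^{-1}$ sq$(\mathbf{A}\mathbf{D}_u)\mathbf{1}$
Column inertia		$\frac{1}{t}\mathbf{D}_c^{-1}$ sq$(\mathbf{B}\mathbf{D}_u)\mathbf{1}$
Supplementary row squared cosines		diag(sq$(\mathbf{R}_s - \mathbf{1}\mathbf{c}')\mathbf{D}_c^{-1}\mathbf{1})^{-1}$ sq$(\mathbf{R}_s\mathbf{D}_c^{-1}\mathbf{B})$
Supplementary column squared cosines		diag(sq$(\mathbf{C}_s - \mathbf{1}\mathbf{r}')\mathbf{D}_r^{-1}\mathbf{1})^{-1}$ sq$(\mathbf{C}_s\mathbf{D}_r^{-1}\mathbf{A})$
MCA supplementary column squared cosines		$(p\,\mathrm{diag}(\mathbf{B}_s)^{-1} - \mathbf{I})^{-1}$ sq$(\mathbf{C}_s\mathbf{D}_r^{-1}\mathbf{B}\mathbf{D}_u^{-1})$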

The quality of representation in the DIMENS=n dimensional display of any point is the sum of its squared cosines over only the n dimensions. Inertia and mass are not defined for supplementary points.
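The quality computation is a partial row sum, as the following sketch shows. The squared cosines are hypothetical values for two points on three dimensions, and `quality` is an illustrative name.

```python
def quality(squared_cosines, n_dims):
    """Sum each point's squared cosines over the first n_dims dimensions."""
    return [sum(row[:n_dims]) for row in squared_cosines]


# Hypothetical squared cosines: one row per point, one column per dimension.
squared_cosines = [[0.70, 0.20, 0.10],
                   [0.10, 0.30, 0.60]]

# Quality of representation in a two-dimensional display.
quality_2d = quality(squared_cosines, 2)
```

In this example the first point is well represented in two dimensions, whereas most of the second point's squared cosine lies on the third dimension.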

A table that summarizes the partial contributions to inertia is also computed. The points that best explain the inertia of each dimension and the dimension to which each point contributes the most inertia are indicated. The output data set variable names for this table are Best1–Bestn (where DIMENS=n) and Best. The Best column contains the dimension number of the largest partial contribution to inertia for each point (the index of the maximum value in each row of $\mathbf{D}_r^{-1}$ sq$(\mathbf{A})$ or $\mathbf{D}_c^{-1}$ sq$(\mathbf{B})$).

For each row, the Best1–Bestn columns contain either the corresponding value of Best, if the point is one of the biggest contributors to the dimension’s inertia, or 0 if it is not. Specifically, Best1 contains the value of Best for the point with the largest contribution to dimension one’s inertia. A cumulative proportion sum is initialized to this point’s partial contribution to the inertia of dimension one. If this sum is less than the value for the MININERTIA= option, then Best1 contains the value of Best for the point with the second-largest contribution to dimension one’s inertia. Otherwise, this point’s Best1 is 0. This point’s partial contribution to inertia is added to the sum. This process continues for the point with the third-largest partial contribution, and so on, until adding a point’s contribution to the sum increases the sum beyond the value of the MININERTIA= option. This same algorithm is then used for Best2, and so on.

For example, the following table contains contributions to inertia and the corresponding Best variables. The contribution to inertia variables are proportions that sum to 1 within each column. The first point makes its greatest contribution to the inertia of dimension two, so Best for point one is set to 2, and Best1–Best3 for point one must all be 0 or 2. The second point also makes its greatest contribution to the inertia of dimension two, so Best for point two is set to 2, and Best1–Best3 for point two must all be 0 or 2, and so on.

Assume MININERTIA=0.8, the default. Table 31.6 shows some contributions to inertia. In dimension one, the largest contribution is 0.41302 for the fourth point, so its Best1 is set to 1, the value of Best for the fourth point. Because this value is less than 0.8, the second-largest value (0.36456 for point five) is found and its Best1 is set to its Best value of 1. Because $0.41302 + 0.36456 = 0.77758$ is less than 0.8, the third-largest value (0.08820 for point eight) is found and its Best1 is set to 3, the value of Best for that point, since the contribution to dimension three for that point (0.16555) is greater than its contribution to dimension one. This increases the sum of the partial contributions to 0.86578, which is greater than 0.8, so the remaining Best1 values are all 0.

Table 31.6 Best Statistics
Contr1	Contr2	Contr3	Best1	Best2	Best3	Best
0.01593	0.32178	0.07565	0	2	2	2
0.03014	0.24826	0.07715	0	2	2	2
0.00592	0.02892	0.02698	0	0	0	2
0.41302	0.05191	0.05773	1	0	0	1
0.36456	0.00344	0.15565	1	0	1	1
0.03902	0.30966	0.11717	0	2	2	2
0.00019	0.01840	0.00734	0	0	0	2
0.08820	0.00527	0.16555	3	0	3	3
0.01447	0.00024	0.03851	0	0	0	3
0.02855	0.01213	0.27827	0	0	3	3
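The selection rule described above can be sketched as follows. The contributions are the values from Table 31.6; the function name `best_statistics` and the argument `min_inertia` (standing in for the MININERTIA= value) are illustrative, and the code is an interpretation of the description rather than the procedure's own implementation.

```python
def best_statistics(contr, min_inertia=0.8):
    """Return (best, flags): Best per point and the Best1..Bestn columns."""
    n_points, n_dims = len(contr), len(contr[0])
    # Best: dimension (1-based) of each point's largest contribution.
    best = [max(range(n_dims), key=lambda k: row[k]) + 1 for row in contr]
    flags = [[0] * n_dims for _ in range(n_points)]
    for k in range(n_dims):
        # Visit points from largest to smallest contribution to dimension k.
        order = sorted(range(n_points), key=lambda i: contr[i][k],
                       reverse=True)
        cumulative = 0.0
        for i in order:
            flags[i][k] = best[i]
            cumulative += contr[i][k]
            if cumulative >= min_inertia:
                break  # remaining points keep the value 0
    return best, flags


# Contributions to inertia (Contr1, Contr2, Contr3) from Table 31.6.
contr = [[0.01593, 0.32178, 0.07565],
         [0.03014, 0.24826, 0.07715],
         [0.00592, 0.02892, 0.02698],
         [0.41302, 0.05191, 0.05773],
         [0.36456, 0.00344, 0.15565],
         [0.03902, 0.30966, 0.11717],
         [0.00019, 0.01840, 0.00734],
         [0.08820, 0.00527, 0.16555],
         [0.01447, 0.00024, 0.03851],
         [0.02855, 0.01213, 0.27827]]

best, flags = best_statistics(contr)
```

With these inputs, `best` and the columns of `flags` match the Best and Best1–Best3 columns of Table 31.6.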
