Contents: | Purpose / History / Requirements / Usage / Details / Limitations / Missing Values / References |
NOTE: Beginning in SAS 9.4, this macro is no longer needed. Use the OUTPLC= option in Base SAS PROC CORR to save a matrix of polychoric (or tetrachoric) correlations.
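On SAS 9.4 or later, the PROC CORR replacement might look like the following minimal sketch. The data set MYDATA, the variables X1-X5, and the output name PLCORR are hypothetical placeholders. Check the PROC CORR documentation for your release before relying on this.

```sas
/* SAS 9.4+: estimate polychoric correlations directly in PROC CORR. */
/* MYDATA and X1-X5 are hypothetical; substitute your ordinal items. */
proc corr data=mydata polychoric outplc=plcorr noprint;
   var x1-x5;
run;

/* PLCORR holds the saved polychoric correlation matrix */
proc print data=plcorr;
run;
```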
Version | Update Notes |
1.7 | Fixed errors referring to variable type or length of the NAME variable when analyzing character variables. Improved distance example. Added ID=, ORDER=, and PRINTLEVELS= options. Made _NUMERIC_, _CHARACTER_, _ALL_ available with VAR=. Fixed looping when a variable named I or J is used. Fixed a problem when NAME is a variable in the input data set. Use of ODS now makes SAS 8 the minimum release. |
1.6 | Fixed bug that didn't allow a variable named X in the input data. Default for VAR= is now all variables, not just numeric variables. Fixed problem where correlation from previous variable pair was used when the current pair does not form at least a 2x2 table (a WARNING appears in the log when this happens). Use of %sysfunc requires SAS 6.12 or later. Added automatic check for new version. |
1.5 | Fixed display of converge message. Capture and reset NOTES option at end. |
1.4 | Removed NOSTIMER option. Allow for long variable names in SAS 7 or later. Added version indicator to macro notes. |
1.3 | Print message if no convergence when computing polychoric correlations. Added CONVERGE= and MAXITER= options. |
1.2 | Fixed problem with macro notes not printing. |
Save the macro code to a file, then define the macro in your SAS session by submitting:
%inc "<location of your file containing the POLYCHOR macro>";
Following this statement, you may call the %POLYCHOR macro. See the Results tab for an example.
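As a rough illustration only, a basic call might look like this sketch. MYDATA and X1-X5 are hypothetical. VAR=, OUT=, and TYPE= are the options referred to elsewhere on this page; DATA= is assumed here to name the input data set.

```sas
/* Hypothetical call: MYDATA and X1-X5 are placeholders.               */
/* DATA= is assumed to name the input data set; VAR=, OUT=, and TYPE=  */
/* are the macro options described on this page.                       */
%polychor(data=mydata, var=x1-x5, out=plcorr, type=corr)

/* Inspect the resulting TYPE=CORR data set */
proc print data=plcorr;
run;
```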
The options and allowable values are:
Convergence problems
The PLCORR option in PROC FREQ uses an iterative maximum likelihood method to estimate the polychoric correlation. Occasionally this method does not converge to an estimate; in that case, the correlation is set to missing. By adjusting the CONVERGE= and/or MAXITER= options in the TABLES statement of PROC FREQ, you may be able to obtain an estimate. For example, the following statements attempt to estimate the polychoric correlation between variables X1 and X2, relaxing the convergence criterion to 0.001 (the default is 0.0001) and allowing 30 iterations (the default is 20). See the FREQ chapter of the SAS/STAT User's Guide for details on these options.
proc freq;
   tables x1 * x2 / plcorr converge=.001 maxiter=30;
run;
You can adjust the CONVERGE= and MAXITER= options for the estimation of all polychoric correlations by using the CONVERGE= and MAXITER= options in the %POLYCHOR macro.
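A minimal sketch of the macro-level equivalent, assuming a hypothetical data set MYDATA with ordinal variables X1-X5 (DATA= is assumed to name the input data set):

```sas
/* Relax the convergence criterion and allow more iterations for  */
/* every pair of variables analyzed by the macro (names hypothetical). */
%polychor(data=mydata, var=x1-x5, converge=.001, maxiter=30)
```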
Ordering of levels
The variables are assumed to be ordinal variables. PROC FREQ forms a crosstabulation for each pair of variables. Note that the ordering of the rows and columns in a table affects the computation of the polychoric correlation. For instance, two variables with levels LOW, MEDIUM, and HIGH, in this order, will produce a different correlation estimate when ordered MEDIUM, LOW, HIGH. You should verify that all of your variables will be in the desired order for your chosen setting of the ORDER= option. The PRINTLEVELS=YES option displays all character variable levels in the order used when computing the correlations. The ORDER= option affects the ordering of all variables, character and numeric, in the analysis.
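One possible way to enforce a LOW < MEDIUM < HIGH order is sketched below, under stated assumptions: MYDATA, SIZE, and COST are hypothetical character variables, and ORDER=INTERNAL is assumed to be an accepted macro value that is passed to PROC FREQ. Under internal ordering, character levels sort alphabetically (HIGH, LOW, MEDIUM), so the sketch first displays the levels and then recodes them to numbers that sort correctly.

```sas
/* Step 1: display the order used for the character levels.      */
/* MYDATA, SIZE, and COST are hypothetical (LOW/MEDIUM/HIGH).     */
%polychor(data=mydata, var=size cost, printlevels=yes)

/* Step 2: recode to numeric codes that sort in the intended order. */
/* _K is used as the loop index to avoid clashing with data names.  */
data recoded;
   set mydata;
   array c{*} $ size cost;         /* character source variables */
   array n{*} size_n cost_n;       /* new numeric ordinal codes  */
   do _k = 1 to dim(c);
      select (upcase(strip(c{_k})));
         when ('LOW')    n{_k} = 1;
         when ('MEDIUM') n{_k} = 2;
         when ('HIGH')   n{_k} = 3;
         otherwise       n{_k} = .;
      end;
   end;
   drop _k;
run;

/* Step 3: rerun on the numeric codes, which sort as 1 < 2 < 3     */
/* under internal ordering (assumed to be passed to PROC FREQ).    */
%polychor(data=recoded, var=size_n cost_n, order=internal)
```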
Obtaining a correlation matrix
If TYPE=CORR is specified, the individual correlation coefficients are assembled into a TYPE=CORR data set containing a matrix of polychoric correlations. The resulting data set can be used in the FACTOR or CALIS procedure, but for descriptive analyses only (specify METHOD=ULS in either procedure). If you use the maximum likelihood method (METHOD=ML) instead, none of the hypothesis tests are valid, and with small samples the polychoric correlation matrix may be indefinite (not positive definite).
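A descriptive factor analysis of the matrix might be sketched as follows. PLCORR is a hypothetical name for a TYPE=CORR data set produced by the macro, and the choice of two factors with varimax rotation is only an example.

```sas
/* PLCORR: hypothetical TYPE=CORR data set from %POLYCHOR.         */
/* METHOD=ULS gives descriptive unweighted least squares estimates; */
/* no hypothesis tests should be interpreted.                       */
proc factor data=plcorr method=uls nfactors=2 rotate=varimax;
run;
```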
Obtaining a distance matrix
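If TYPE=DISTANCE is specified, a TYPE=DISTANCE data set is created containing a matrix of dissimilarity values. The dissimilarity value used is computed as 1 - plcorr², where plcorr is the polychoric correlation. Items whose polychoric correlation is near +1 or -1 are therefore treated as close (dissimilarity near 0), and uncorrelated items as far apart (dissimilarity 1).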
It is assumed that the columns of the input data set are the items to be clustered and that each row of the data set is a variable (such as HEIGHT) on which the items are measured. Collectively, the variables (rows) locate the items (columns) to be clustered. Note that this data structure is the transpose of the usual data set input to such procedures as PROC CLUSTER in which the items to be clustered are the rows (observations) in the data set and the variables which locate the items are the columns. You can use PROC TRANSPOSE to convert rows to columns. The distance data set created by the %POLYCHOR macro includes a variable containing the item (column) names which can be used in subsequent analyses to identify the items. You can name this variable using the ID= option (the default name is _ID_).
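A minimal sketch of the transpose step, assuming a hypothetical data set ITEMS with one row per item, the item name in ITEM, and ordinal measurements in V1-V8. The ITEM values become column names, so they should be valid SAS names; otherwise SAS may alter them.

```sas
/* Hypothetical input: one row per item (usual PROC CLUSTER layout). */
/* Transpose so each item becomes a column, as the macro expects.    */
proc transpose data=items out=items_t(drop=_name_);
   id item;          /* item names become the new column names     */
   var v1-v8;        /* measurement variables become the rows      */
run;
/* _NAME_ is dropped so it is not analyzed as one of the items.      */

/* Build the dissimilarity matrix; ID= names the item-name variable. */
%polychor(data=items_t, type=distance, id=item, out=dist)
```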
The output data set can be used in the CLUSTER procedure (although the cubic clustering criterion, CCC, is not valid) or in the MODECLUS procedure. As discussed in the documentation of these procedures and of the DISTANCE procedure, variables with higher variability have a greater effect on the distance measure. As a result, you may want to standardize the variables before computing distances, for example with PROC STDIZE.
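The standardize-then-cluster workflow might be sketched as follows. All names are hypothetical: ITEMS holds one row per item with measurement variables V1-V8, and DIST is assumed to be the TYPE=DISTANCE data set that %POLYCHOR writes (with ID=ITEM) after ITEMS_STD has been transposed as described above.

```sas
/* Standardize each measurement variable to mean 0, std dev 1 */
/* (hypothetical ITEMS data set, variables V1-V8).            */
proc stdize data=items out=items_std method=std;
   var v1-v8;
run;

/* Hierarchical clustering of the items from the distance matrix */
/* DIST; ignore the CCC if it is displayed, as it is not valid.   */
proc cluster data=dist method=average outtree=tree;
   id item;
run;

/* Draw the dendrogram from the OUTTREE= data set */
proc tree data=tree;
run;
```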
See the appendix "Special SAS Data Sets" in the SAS/STAT User's Guide for descriptions of TYPE=CORR and TYPE=DISTANCE data sets.
If a message that begins like this appears in the SAS log:
WARNING: No OUTPUT data set is produced because no statistics can be computed for this table, which has ...
it indicates that the current pair of variables does not form at least a 2x2 table. The polychoric correlation cannot be computed and is set to missing in the output data set.
If some polychoric correlations could not be estimated and are missing in the OUT= data set, then an attempt to use the data set as input to an analytical procedure such as PROC PRINCOMP results in this message in the SAS log:
ERROR: CORR matrix incomplete in data set WORK._PLCORR.
All correlations must be nonmissing in order to do an analysis.
See the DETAILS and LIMITATIONS sections above for information on polychoric correlation estimates that are set to missing.
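Before passing the matrix to another procedure, a quick check like this sketch can list which coefficients are missing. It assumes the OUT= data set is WORK._PLCORR, as in the log message above, and that it carries the standard _TYPE_ and _NAME_ variables of a TYPE=CORR data set.

```sas
/* List missing coefficients in the (assumed) OUT= data set _PLCORR. */
data _null_;
   set _plcorr(where=(_type_ = 'CORR'));
   length varname $32;
   array r{*} _numeric_;            /* all correlation columns      */
   do _col = 1 to dim(r);           /* _COL avoids clashing with data */
      if missing(r{_col}) then do;
         varname = vname(r{_col});
         put 'Missing polychoric correlation: ' _name_ 'and ' varname;
      end;
   end;
run;
```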
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.