
**NOTE: Beginning in SAS 9.4, this macro is no longer needed. Use the OUTPLC= option in Base SAS PROC CORR to save a matrix of polychoric (or tetrachoric) correlations.**
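For readers on SAS 9.4 or later, the replacement mentioned in the note might look like the following minimal sketch. The data set name MYDATA, the variables ITEM1-ITEM3, and the output name PLCORR are hypothetical placeholders, not names used by this sample.

```sas
/* Sketch only: MYDATA and ITEM1-ITEM3 are hypothetical names.      */
/* POLYCHORIC requests polychoric correlations; OUTPLC= saves them. */
proc corr data=mydata polychoric outplc=plcorr noprint;
   var item1 item2 item3;
run;

/* Inspect the saved matrix */
proc print data=plcorr noobs;
run;
```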

*PURPOSE:*- The %POLYCHOR macro creates a SAS data set containing a correlation matrix of polychoric correlations or a distance matrix based on polychoric correlations.
*HISTORY:*-

| Version | Update Notes |
| --- | --- |
| 1.7 | Fixed errors referring to variable type or length of the NAME variable when analyzing character variables. Improved distance example. Added ID=, ORDER=, and PRINTLEVELS= options. Made _NUMERIC_, _CHARACTER_, _ALL_ available with VAR=. Fixed looping if variable named I or J used. Fixed problem if NAME is variable in input data set. Use of ODS now makes SAS 8 the minimum release. |
| 1.6 | Fixed bug that didn't allow a variable named X in the input data. Default for VAR= is now all variables, not just numeric variables. Fixed problem where correlation from previous variable pair was used when the current pair does not form at least a 2x2 table (a WARNING appears in the log when this happens). Use of %sysfunc requires SAS 6.12 or later. Added automatic check for new version. |
| 1.5 | Fixed display of converge message. Capture and reset NOTES option at end. |
| 1.4 | Removed NOSTIMER option. Allow for long variable names in SAS 7 or later. Added version indicator to macro notes. |
| 1.3 | Print message if no convergence when computing polychoric correlations. Added CONVERGE= and MAXITER= options. |
| 1.2 | Fixed problem with macro notes not printing. |

*REQUIREMENTS:*- The %POLYCHOR macro requires Version 8 or later of Base SAS Software.
*USAGE:*-
Follow the instructions in the Downloads tab of this
sample to save the %POLYCHOR macro definition. Replace the text within quotes in the following statement with the location of the %POLYCHOR macro definition file on your system. In your SAS program or in the SAS editor window, specify this statement to define the %POLYCHOR macro and make it available for use:
%inc "<location of your file containing the POLYCHOR macro>";

Following this statement, you may call the %POLYCHOR macro. See the Results tab for an example.

The options and allowable values are:

**DATA=**- SAS data set to be analyzed. If the DATA= option is not supplied, the most recently created SAS data set is used. Input data must be raw data, not data summarized to cell counts. When computing distances among items, this data set should be structured with items as columns and variables as rows as described below.
**VAR=**- Polychoric or tetrachoric correlations will be computed for every pair of variables listed in the VAR= option. Separate individual variable names with blanks. Keywords _NUMERIC_, _CHARACTER_ and _ALL_ may also be used to represent all numeric variables, all character variables, or all variables in the data set, respectively. Do not repeat variable names. By default, all variables found in the data set will be used. See LIMITATIONS below for time considerations.
**OUT=**- Specifies the name of the output data set that will contain the correlation or distance matrix. By default, the output data set is named _PLCORR.
**TYPE=CORR | DISTANCE**- Specifies the type of matrix to be created. If TYPE=CORR (the default), then a correlation matrix is computed and the output data set is assigned a data set type of CORR. If TYPE=DISTANCE, then a distance matrix is computed and the output data set is assigned a data set type of DISTANCE. See DETAILS below for more information.
**ID=**- When a distance matrix is requested by specifying TYPE=DISTANCE, ID= names the variable added to the OUT= data set which contains the variable names from the DATA= data set. This variable can be used in subsequent analyses to identify the items. The ID= option is ignored if TYPE=CORR. If ID= is not specified when TYPE=DISTANCE, the variable is named _ID_.
**ORDER=DATA | FORMATTED | FREQ | INTERNAL**- If specified, this option is placed in all runs of PROC FREQ which is used to compute each correlation. The default is INTERNAL. See the description of this option in the PROC FREQ documentation.
**PRINTLEVELS=YES | NO**- When PRINTLEVELS=YES (the default), the "Character Variable Levels" table is displayed which shows the levels for all character variables and the order of the levels used when computing the polychoric correlations. See DETAILS below for more information.
**CONVERGE=**- Specifies the convergence criterion for computing the polychoric correlation. The value of the CONVERGE= option must be a positive number; by default, CONVERGE=0.0001. Iterative computation of the polychoric correlation stops when the convergence measure falls below the value of the CONVERGE= option or when the number of iterations exceeds the value specified in the MAXITER= option, whichever happens first. See DETAILS below for more information.
**MAXITER=**- Specifies the maximum number of iterations for computing the polychoric correlation. The value of the MAXITER= option must be a positive integer; by default, MAXITER=20. Iterative computation of the polychoric correlation stops when the number of iterations exceeds the value of the MAXITER= option, or when the convergence measure falls below the value of the CONVERGE= option, whichever happens first. See DETAILS below for more information.
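As a rough illustration of how these options combine, the call below sets several of them explicitly. It assumes a data set named ORDINAL with ordinal variables X1-X5, like the one built in the EXAMPLE section. The output name PLC and the particular CONVERGE= and MAXITER= values are arbitrary choices for this sketch.

```sas
/* Define the macro (replace the quoted text with your file location) */
%inc "<location of your file containing the POLYCHOR macro>";

/* Sketch: correlation matrix for five variables, written to PLC.   */
/* Variable names are separated by blanks, as VAR= requires.        */
%polychor(data=ordinal, var=x1 x2 x3 x4 x5, out=plc,
          type=corr, order=internal, printlevels=no,
          converge=0.001, maxiter=50)
```

Leaving out any of these options falls back to the defaults listed above.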

*DETAILS:*-
The PLCORR option in the FREQ procedure is used iteratively to compute the polychoric correlation for each pair of variables. If both variables in a pair are binary (that is, they take on only two distinct values), then the correlation computed by the PLCORR option is usually referred to as the *tetrachoric* correlation.

**Convergence problems**

The PLCORR option uses an iterative, maximum likelihood method to estimate the polychoric correlation. Occasionally, this method will not converge on an estimate, and in that case, the value of the correlation is set to missing. By adjusting the values of the CONVERGE= and/or MAXITER= options in the TABLE statement of PROC FREQ, you may be able to obtain an estimate. For example, the following statements attempt to estimate the polychoric correlation between variables X1 and X2 setting the convergence criterion to a more lenient 0.001 and allowing more iterations than the default. See the FREQ chapter of the *SAS/STAT User's Guide* for details on these options.

    proc freq;
       table x1 * x2 / plcorr converge=.001 maxiter=30;
    run;

You can adjust the CONVERGE= and MAXITER= options for the estimation of all polychoric correlations by using the CONVERGE= and MAXITER= options in the %POLYCHOR macro.

**Ordering of levels**

The variables are assumed to be ordinal variables. PROC FREQ forms a crosstabulation for each pair of variables. Note that the ordering of the rows and columns in a table affects the computation of the polychoric correlation. For instance, two variables with levels LOW, MEDIUM, and HIGH, in this order, will produce a different correlation estimate when ordered MEDIUM, LOW, HIGH. You should verify that all of your variables will be in the desired order for your chosen setting of the ORDER= option. The PRINTLEVELS=YES option displays all character variable levels in the order used when computing the correlations. The ORDER= option affects the ordering of all variables, character and numeric, in the analysis.

**Obtaining a correlation matrix**

If TYPE=CORR is specified, the individual correlation coefficients are then assembled into a TYPE=CORR data set containing a matrix of polychoric correlations. The resulting data set can be used, but for descriptive analyses only, in either the FACTOR or the CALIS procedure (specify METHOD=ULS in either procedure). If the maximum likelihood method (METHOD=ML) is used, note that none of the hypothesis tests will be valid, and the polychoric correlation matrix may be indefinite with small samples.

**Obtaining a distance matrix**

If TYPE=DISTANCE is specified, a TYPE=DISTANCE data set is created containing a matrix of dissimilarity values. The dissimilarity value used is computed as 1 - *plcorr*², where *plcorr* is the polychoric correlation. It is assumed that the columns of the input data set are the items to be clustered and that each row of the data set is a variable (such as HEIGHT) on which the items are measured. Collectively, the variables (rows) locate the items (columns) to be clustered. Note that this data structure is the transpose of the usual data set input to such procedures as PROC CLUSTER, in which the items to be clustered are the rows (observations) in the data set and the variables that locate the items are the columns. You can use PROC TRANSPOSE to convert rows to columns.

The distance data set created by the %POLYCHOR macro includes a variable containing the item (column) names, which can be used in subsequent analyses to identify the items. You can name this variable using the ID= option (the default name is _ID_). The output data set can be used in the CLUSTER procedure (but the CCC value is not valid) or the MODECLUS procedure. As discussed in the documentation of these procedures and the DISTANCE procedure, variables with higher variability have greater effect on the distance measure. As a result, you may want to standardize the variables before computing distances. This can be done by using PROC STDIZE.

See the Appendix, **Special SAS Data Sets**, in the *SAS/STAT User's Guide* for a description of TYPE=CORR and DISTANCE data sets.

*LIMITATIONS:*-
The time required to compute the correlation or distance matrix increases quadratically as the number of variables increases. Up to 999 variables are allowed, but the time required for more than 100 variables may be exorbitant.

If a message which begins like this appears in the SAS log:

WARNING: No OUTPUT data set is produced because no statistics can be computed for this table, which has ...

it indicates that the current pair of variables does not form at least a 2x2 table. The polychoric correlation cannot be computed and is set to missing in the output data set.

If some polychoric correlations could not be estimated and are missing in the OUT= data set, then an attempt to use the data set as input to an analytical procedure such as PROC PRINCOMP results in this message in the SAS log:

ERROR: CORR matrix incomplete in data set WORK._PLCORR.

All correlations must be nonmissing in order to do an analysis.

*MISSING VALUES:*-
Observations with missing values are omitted from the computation
of correlations. However, when computing the polychoric
correlation between two variables, if an observation's values for
these two variables are not missing, then the observation is used
regardless of any missing values the observation may have in other
variables. Since a matrix of correlations based on differing numbers of observations can be singular, you may want to remove all missing values from the analysis variables prior to use of this macro.
See the DETAILS and LIMITATIONS sections above for information on polychoric correlation estimates that are set to missing.
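One way to drop incomplete observations before calling the macro is sketched below. It assumes a data set named ORDINAL with analysis variables X1-X5, as in the EXAMPLE section. The output name COMPLETE is a hypothetical choice. The CMISS function counts missing values among its arguments and accepts both numeric and character variables.

```sas
/* Keep only observations with no missing values in X1-X5 */
data complete;
   set ordinal;
   if cmiss(of x1-x5) = 0;
run;

/* Every correlation is then based on the same observations */
%polychor(data=complete, var=x1 x2 x3 x4 x5)
```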

*REFERENCES:*-
See the Drasgow (1986) and Olsson (1979) references cited in the FREQ chapter of the
*SAS/STAT User's Guide*.

These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.

*EXAMPLE:*- The following DATA step creates five variables of normally-distributed data. Each observation is given an ID number for use later when distances between IDs are computed. The PROC RANK step computes quintile scores from the normal data to simulate ordinal data resulting from coarsely-measured normal data.

    data norm;
       length id $ 8;
       array x{5} x1-x5;
       do n=1 to 20;
          do i=1 to 5;
             x{i}=rannor(238423)*3+10;
          end;
          id=cats(n,'');
          keep id x1-x5;
          output;
       end;
    run;

    proc rank data=norm out=ordinal groups=5;
       var x1-x5;
    run;

This statement defines the POLYCHOR macro and makes it available for use:

%inc "<location of your file containing the POLYCHOR macro>";

The following statements call the POLYCHOR macro which creates a TYPE=CORR data set named _PLCORR containing a matrix of polychoric correlations among the numeric variables (X1-X5) in the last-created data set, ORDINAL.

    %polychor(var=_numeric_)

    proc print noobs;
       var _type_ _name_ x1-x5;
    run;

PROC PRINT displays the TYPE=CORR data set, _PLCORR, containing the polychoric correlation matrix:

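| _TYPE_ | _NAME_ | x1 | x2 | x3 | x4 | x5 |
| --- | --- | --- | --- | --- | --- | --- |
| N | | 20.0000 | 20.0000 | 20.0000 | 20.0000 | 20.0000 |
| MEAN | | 2.0000 | 2.0000 | 2.0000 | 2.0000 | 2.0000 |
| STD | | 1.4510 | 1.4510 | 1.4510 | 1.4510 | 1.4510 |
| CORR | x1 | 1.0000 | . | . | . | . |
| CORR | x2 | -0.2023 | 1.0000 | . | . | . |
| CORR | x3 | 0.2163 | -0.1856 | 1.0000 | . | . |
| CORR | x4 | 0.3830 | -0.1577 | 0.3868 | 1.0000 | . |
| CORR | x5 | -0.0689 | 0.1593 | 0.1964 | 0.0890 | 1.0000 |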
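As noted under DETAILS, a TYPE=CORR data set like this one can be passed to PROC FACTOR with METHOD=ULS for descriptive purposes. The following is a minimal sketch that assumes the _PLCORR data set created by the macro call above. The single-factor request (NFACTORS=1) is an arbitrary choice for illustration, not a recommendation for these data.

```sas
/* Descriptive factor analysis of the polychoric correlation matrix. */
/* PROC FACTOR recognizes the TYPE=CORR data set type automatically. */
proc factor data=_plcorr method=uls nfactors=1;
   var x1-x5;
run;
```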

The following steps create a TYPE=DISTANCE data set named DIST containing a dissimilarity matrix for the first six observations of data set ORDINAL. PROC TRANSPOSE creates the required data structure -- items to be clustered as columns, variables that locate items as rows. The call of the POLYCHOR macro requests computation of the distance matrix using all numeric variables and allowing for extra iteration in the algorithm that computes the correlations. A variable named ID is created containing the names of the items (variables) being clustered. PROC CLUSTER reads the distance data set and does a cluster analysis of the six observations. PROC TREE draws a dendrogram showing the clustering process.

    proc transpose data=ordinal out=ordt prefix=ID;
       where id in ('1' '2' '3' '4' '5' '6');
       id id;
       var x1-x5;
    run;

    %polychor(var=_numeric_, type=distance, id=id, out=dist, maxiter=100)

    proc print noobs;
    run;

    proc cluster data=dist method=average outtree=tree noprint;
       id id;
    run;

    proc tree data=Tree horizontal;
       id id;
    run;

PROC PRINT displays the TYPE=DISTANCE data set, DIST, containing the distances based on the polychoric correlations, followed by the dendrogram from PROC TREE:

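| id | ID1 | ID2 | ID3 | ID4 | ID5 | ID6 |
| --- | --- | --- | --- | --- | --- | --- |
| ID1 | 0.00000 | . | . | . | . | . |
| ID2 | 0.99150 | 0.00000 | . | . | . | . |
| ID3 | 0.89850 | 0.33690 | 0.00000 | . | . | . |
| ID4 | 0.99906 | 0.98574 | 0.20678 | 0.00000 | . | . |
| ID5 | 0.99906 | 0.90162 | 0.00000 | 0.00000 | 0 | . |
| ID6 | 0.89850 | 0.33689 | 0.00000 | 0.20679 | .000002000 | 0 |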

Clustering of the six observations is shown in the dendrogram displayed by PROC TREE.
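The distances in the DIST data set can be related back to the underlying correlations through the dissimilarity definition given under DETAILS. In the worked check below, the symbols $d_{ij}$ (distance between items $i$ and $j$) and $\hat{\rho}_{ij}$ (their estimated polychoric correlation) are notation introduced here for illustration. The macro itself does not use them.

```latex
d_{ij} = 1 - \hat{\rho}_{ij}^{\,2}
\quad\Longrightarrow\quad
\lvert \hat{\rho}_{ij} \rvert = \sqrt{1 - d_{ij}}

% ID1 and ID2: d = 0.99150
\lvert \hat{\rho}_{12} \rvert = \sqrt{1 - 0.99150} = \sqrt{0.00850} \approx 0.092

% ID3 and ID5: d = 0.00000
\lvert \hat{\rho}_{35} \rvert = \sqrt{1 - 0} = 1
```

Because the correlation is squared, the distance does not retain its sign: strongly positive and strongly negative correlations both give distances near 0.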

Right-click on the link below and select **Save** to save the %POLYCHOR macro definition to a file. It is recommended that you name the file `polychor.sas`.

Download and save polychor.sas


#### Operating System and Release Information

Type: | Sample |

Topic: | SAS Reference ==> Procedures ==> FREQ; Analytics ==> Longitudinal Analysis; Analytics ==> Exact Methods; Analytics ==> Categorical Data Analysis; Analytics ==> Nonparametric Analysis; SAS Reference ==> Procedures ==> CORR; Analytics ==> Descriptive Statistics |

Date Modified: | 2015-09-17 16:46:27 |

Date Created: | 2005-01-13 15:03:32 |

| Product Family | Product | Host | SAS Release (Starting) | SAS Release (Ending) |
| --- | --- | --- | --- | --- |
| SAS System | SAS/STAT | z/OS | | |
| | | 64-bit Enabled HP-UX | | |
| | | Microsoft Windows Server 2012 R2 Std | | |
| | | Microsoft Windows Server 2012 Std | | |
| | | Microsoft Windows XP Professional | | |
| | | Windows 7 Enterprise 32 bit | | |
| | | Windows 7 Enterprise x64 | | |
| | | Windows 7 Home Premium 32 bit | | |
| | | Windows 7 Home Premium x64 | | |
| | | Windows 7 Professional 32 bit | | |
| | | Windows 7 Professional x64 | | |
| | | Windows 7 Ultimate 32 bit | | |
| | | Windows 7 Ultimate x64 | | |
| | | Windows Millennium Edition (Me) | | |
| | | Windows Vista | | |
| | | Windows Vista for x64 | | |
| | | 64-bit Enabled AIX | | |
| | | OS/2 | | |
| | | Microsoft Windows 8 Enterprise 32-bit | | |
| | | Microsoft® Windows® for x64 | | |
| | | Microsoft Windows XP 64-bit Edition | | |
| | | Microsoft Windows Server 2003 Enterprise 64-bit Edition | | |
| | | Microsoft Windows Server 2003 Datacenter 64-bit Edition | | |
| | | OpenVMS VAX | | |
| | | Microsoft® Windows® for 64-Bit Itanium-based Systems | | |
| | | Z64 | | |
| | | Microsoft Windows 8.1 Enterprise x64 | | |
| | | Microsoft Windows 8.1 Enterprise 32-bit | | |
| | | Microsoft Windows 8 Pro x64 | | |
| | | Microsoft Windows 8 Enterprise x64 | | |
| | | Microsoft Windows 8 Pro 32-bit | | |
| | | Microsoft Windows Server 2012 R2 Datacenter | | |
| | | Microsoft Windows Server 2012 Datacenter | | |
| | | Microsoft Windows Server 2008 for x64 | | |
| | | Microsoft Windows Server 2008 R2 | | |
| | | Microsoft Windows Server 2008 | | |
| | | Microsoft Windows Server 2003 for x64 | | |
| | | Microsoft Windows Server 2003 Standard Edition | | |
| | | Microsoft Windows Server 2003 Enterprise Edition | | |
| | | Microsoft Windows Server 2003 Datacenter Edition | | |
| | | Microsoft Windows NT Workstation | | |
| | | Microsoft Windows 2000 Professional | | |
| | | Microsoft Windows 2000 Server | | |
| | | Microsoft Windows 2000 Datacenter Server | | |
| | | Microsoft Windows 2000 Advanced Server | | |
| | | Microsoft Windows 95/98 | | |
| | | Microsoft Windows 8.1 Pro 32-bit | | |
| | | Microsoft Windows 8.1 Pro | | |
| | | 64-bit Enabled Solaris | | |
| | | ABI+ for Intel Architecture | | |
| | | AIX | | |
| | | HP-UX | | |
| | | HP-UX IPF | | |
| | | IRIX | | |
| | | Linux | | |
| | | Linux for x64 | | |
| | | Linux on Itanium | | |
| | | OpenVMS Alpha | | |
| | | OpenVMS on HP Integrity | | |
| | | Solaris | | |
| | | Solaris for x64 | | |
| | | Tru64 UNIX | | |