Example 2.1 Computing Four Measures of Association
This example produces a correlation analysis with descriptive statistics and four measures of association: the Pearson product-moment correlation, the Spearman rank-order correlation, Kendall’s tau-b coefficients, and Hoeffding’s measure of dependence, D.
The Fitness data set created in the section Getting Started: CORR Procedure contains measurements from a study of physical fitness of 31 participants. The following statements request all four measures of association for the variables Weight, Oxygen, and Runtime:
ods graphics on;
title 'Measures of Association for a Physical Fitness Study';
proc corr data=Fitness pearson spearman kendall hoeffding
          plots=matrix(histogram);
   var Weight Oxygen RunTime;
run;
ods graphics off;
Note that Pearson correlations are computed by default only if none of the three nonparametric options (SPEARMAN, KENDALL, and HOEFFDING) is specified. If you specify any of them, you must also specify the PEARSON option explicitly to compute Pearson correlations.
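The default rule above can be illustrated with three short calls. This is a minimal sketch that assumes the Fitness data set from the Getting Started section is already available in the WORK library; only options named in this example are used.

```sas
/* No correlation option: Pearson correlations are computed by default. */
proc corr data=Fitness;
   var Weight Oxygen RunTime;
run;

/* SPEARMAN alone: only Spearman correlations are computed. */
proc corr data=Fitness spearman;
   var Weight Oxygen RunTime;
run;

/* PEARSON must be named explicitly to get it alongside SPEARMAN. */
proc corr data=Fitness pearson spearman;
   var Weight Oxygen RunTime;
run;
```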
The "Simple Statistics" table in Output 2.1.1 displays univariate descriptive statistics for analysis variables. By default, observations with nonmissing values for each variable are used to derive the univariate statistics for that variable. When nonparametric measures of association are specified, the procedure displays the median instead of the sum as an additional descriptive measure.
Output 2.1.1
Simple Statistics
Variable |  N |     Mean |  Std Dev |   Median |  Minimum |  Maximum
Weight   | 31 | 77.44452 |  8.32857 | 77.45000 | 59.08000 | 91.63000
Oxygen   | 29 | 47.22721 |  5.47718 | 46.67200 | 37.38800 | 60.05500
RunTime  | 29 | 10.67414 |  1.39194 | 10.50000 |  8.17000 | 14.03000
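The differing N values reflect the default handling of missing values, where each variable's statistics use every observation in which that variable is nonmissing. If you would rather base all statistics on the same set of complete observations, one option is NOMISS. The following sketch assumes the same Fitness data set; NOSIMPLE is omitted so that the "Simple Statistics" table is still printed and the common N can be seen.

```sas
/* NOMISS drops any observation that has a missing value for */
/* at least one VAR variable, so N is the same for every     */
/* variable and for every pair of variables.                 */
proc corr data=Fitness nomiss pearson spearman;
   var Weight Oxygen RunTime;
run;
```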
The "Pearson Correlation Coefficients" table in Output 2.1.2 displays Pearson correlation statistics for pairs of analysis variables. The Pearson correlation is a parametric measure of association for two continuous random variables. When there are missing data, the number of observations used to calculate the correlation can vary.
Output 2.1.2
Pearson Correlation Coefficients
The table shows that the Pearson correlation between Runtime and Oxygen is -0.86843, which is significant with a p-value less than 0.0001. This indicates a strong negative linear relationship between these two variables: as Runtime increases, Oxygen decreases linearly.
The Spearman rank-order correlation is a nonparametric measure of association based on the ranks of the data values. The "Spearman Correlation Coefficients" table in Output 2.1.3 displays results similar to those of the "Pearson Correlation Coefficients" table in Output 2.1.2.
Output 2.1.3
Spearman Correlation Coefficients
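Because the Spearman coefficient is the Pearson coefficient applied to ranks, the two computations can be compared directly. The sketch below is illustrative only: the data set names FitPair and FitRanks and the rank variable names are hypothetical, and the data are first restricted to observations where both variables are present so that both steps work on exactly the same rows.

```sas
/* Keep only observations where both variables are nonmissing. */
data FitPair;
   set Fitness;
   if nmiss(Oxygen, RunTime) = 0;
run;

/* Replace each value by its rank; tied values get the average rank. */
proc rank data=FitPair out=FitRanks ties=mean;
   var Oxygen RunTime;
   ranks OxygenRank RunTimeRank;
run;

/* Pearson correlation of the ranks ... */
proc corr data=FitRanks pearson nosimple;
   var OxygenRank RunTimeRank;
run;

/* ... compared with the Spearman correlation of the original values. */
proc corr data=FitPair spearman nosimple;
   var Oxygen RunTime;
run;
```

The two coefficients should agree, which is a convenient way to confirm what the SPEARMAN option computes.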
Kendall’s tau-b is a nonparametric measure of association based on the number of concordances and discordances in paired observations. The "Kendall Tau b Correlation Coefficients" table in Output 2.1.4 displays results similar to those of the "Pearson Correlation Coefficients" table in Output 2.1.2.
Output 2.1.4
Kendall’s Tau-b Correlation Coefficients
Hoeffding’s measure of dependence, D, is a nonparametric measure of association that detects more general departures from independence. Without ties in the variables, the values of the D statistic can vary between -0.5 and 1, with 1 indicating complete dependence. Otherwise, the D statistic can result in a smaller value. The "Hoeffding Dependence Coefficients" table in Output 2.1.5 displays Hoeffding dependence coefficients. Since ties occur in the variable Weight, the D statistic for the Weight variable is less than 1.
Output 2.1.5
Hoeffding’s Dependence Coefficients
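If you want to work with the four sets of coefficients as data rather than as printed tables, PROC CORR can also write them to output data sets. The following is a sketch under the assumption that the Fitness data set is available; the output data set names (PearsonOut, SpearmanOut, KendallOut, HoeffdingOut) are hypothetical and can be anything you choose.

```sas
/* Write each measure of association to its own data set.   */
/* NOSIMPLE suppresses the "Simple Statistics" table.       */
proc corr data=Fitness pearson spearman kendall hoeffding nosimple
          outp=PearsonOut outs=SpearmanOut
          outk=KendallOut outh=HoeffdingOut;
   var Weight Oxygen RunTime;
run;

/* Inspect one of the saved data sets. */
proc print data=SpearmanOut;
run;
```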
When you use the PLOTS=MATRIX(HISTOGRAM) option, the CORR procedure displays a symmetric matrix plot for the analysis variables listed in the VAR statement (Output 2.1.6).
Output 2.1.6
Symmetric Scatter Plot Matrix
The strong negative linear relationship between Oxygen and Runtime is evident in Output 2.1.6.
Note that this graphical display is requested by enabling ODS Graphics and by specifying the PLOTS= option. For more information about ODS Graphics, see Chapter 21, Statistical Graphics Using ODS (SAS/STAT User's Guide).
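To look at a single pair of variables more closely, the same PLOTS= mechanism can request an individual scatter plot instead of the full matrix. This is a minimal sketch, again assuming the Fitness data set; it restricts the VAR statement to the two variables whose relationship is discussed above.

```sas
ods graphics on;

/* One scatter plot for the Oxygen and RunTime pair. */
proc corr data=Fitness pearson nosimple
          plots=scatter;
   var Oxygen RunTime;
run;

ods graphics off;
```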