Example 2.5 Computing Polyserial Correlations

The following statements create the data set Fitness1. This data set contains an ordinal variable, Oxygen, that is derived from a continuous measurement of oxygen intake that is not directly observed.

*----------------- Data on Physical Fitness -----------------*
| These measurements were made on men involved in a physical |
| fitness course at N.C. State University.                   |
| The variables are Age (years), Weight (kg),                |
| Runtime (time to run 1.5 miles in minutes), and            |
| Oxygen (an ordinal variable based on oxygen intake,        |
|         ml per kg body weight per minute)                  |
| Certain values were changed to missing for the analysis.   |
*------------------------------------------------------------*;
data Fitness1;
   input Age Weight RunTime Oxygen @@;
   datalines;
44 89.47 11.37  8     40 75.07 10.07  9
44 85.84  8.65 10     42 68.15  8.17 11
38 89.02   .    9     47 77.45 11.63  8
40 75.98 11.95  9     43 81.19 10.85  9
44 81.42 13.08  7     38 81.87  8.63 12
44 73.03 10.13 10     45 87.66 14.03  7
45 66.45 11.12  8     47 79.15 10.60  9
54 83.12 10.33 10     49 81.42  8.95  9
51 69.63 10.95  8     51 77.91 10.00  9
48 91.63 10.25  9     49 73.37 10.08  .
57 73.37 12.63  7     54 79.38 11.17  9
52 76.32  9.63  9     50 70.87  8.92 10
51 67.25 11.08  9     54 91.63 12.88  7
51 73.71 10.47  9     57 59.08  9.93 10
49 76.32   .    .     48 61.24 11.50  9
52 82.78 10.50  9
;

The following statements compute Pearson correlations and polyserial correlations:

proc corr data=Fitness1 pearson polyserial;
   with Oxygen;
   var  Age Weight RunTime;
run;

For the purpose of computing Pearson correlations, the variables in the WITH and VAR statements are treated as continuous variables. For the purpose of computing polyserial correlations, the variables in the WITH statement are treated as ordinal variables by default, and the variables in the VAR statement are treated as continuous variables.
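
If it is more convenient to list the ordinal variable in the VAR statement, the roles of the two statements can be swapped. The following is a minimal sketch, assuming that the POLYSERIAL option accepts an ORDINAL= suboption (with ORDINAL=WITH as the default behavior described above); confirm the suboption against the PROC CORR syntax for your SAS release before relying on it.

```sas
/* Sketch: same polyserial analysis with the ordinal variable in VAR. */
/* Assumes the POLYSERIAL suboption ORDINAL=VAR is available.         */
proc corr data=Fitness1 polyserial(ordinal=var);
   with Age Weight RunTime;   /* treated as continuous */
   var  Oxygen;               /* treated as ordinal    */
run;
```

Under that assumption, the polyserial correlations are the same as in the original specification. Only the layout of the variable lists changes.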


The "Simple Statistics" table in Output 2.5.1 displays univariate descriptive statistics for each analysis variable.

Output 2.5.1 Simple Statistics
The CORR Procedure

1 With Variables: Oxygen
3 Variables: Age Weight RunTime

Simple Statistics
Variable    N       Mean    Std Dev     Median    Minimum    Maximum
Oxygen     29    8.93103    1.16285    9.00000    7.00000   12.00000
Age        31   47.67742    5.21144   48.00000   38.00000   57.00000
Weight     31   77.44452    8.32857   77.45000   59.08000   91.63000
RunTime    29   10.67414    1.39194   10.50000    8.17000   14.03000
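
The N column differs across variables because Oxygen and RunTime each have two missing values in the input data, whereas Age and Weight have none. One way to confirm these counts is sketched below with PROC MEANS; this step is an illustration and is not part of the original example.

```sas
/* Count nonmissing (N) and missing (NMISS) values for each variable. */
proc means data=Fitness1 n nmiss;
   var Age Weight RunTime Oxygen;
run;
```

Each correlation uses only the observations that are nonmissing for both variables in the pair, which is why the number of observations reported for a pair can be smaller than either variable's own N.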

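The "Pearson Correlation Coefficients" table in Output 2.5.2 displays Pearson correlation statistics between Oxygen and the other three variables. The table shows a strong negative correlation between Oxygen and RunTime (r = -0.85750, p < .0001). The correlations of Oxygen with Age and Weight are weak, with p-values of 0.1804 and 0.2469, respectively.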

Output 2.5.2 Pearson Correlation Coefficients
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

               Age      Weight     RunTime
Oxygen    -0.25581    -0.22211    -0.85750
            0.1804      0.2469      <.0001
                29          29          28

The "Polyserial Correlations" table in Output 2.5.3 displays polyserial correlation statistics between Oxygen and the three continuous variables. The variable Oxygen is treated as an ordinal variable derived from oxygen intake (the underlying continuous variable), assuming a bivariate normal distribution for oxygen intake and each of the three continuous variables Age, Weight, and RunTime. The CORR procedure provides two tests for a zero polyserial correlation: the Wald test and the likelihood ratio (LR) test. Both tests indicate a strong negative polyserial correlation (-0.91042, p < .0001) between RunTime and the continuous variable that underlies Oxygen, although the two chi-square statistics differ considerably (500.0345 for the Wald test and 38.6963 for the LR test).

Output 2.5.3 Polyserial Correlation Coefficients
Polyserial Correlations

Continuous  Ordinal                       Standard   ------- Wald Test -------   -------- LR Test --------
Variable    Variable    N   Correlation      Error   Chi-Square    Pr > ChiSq   Chi-Square    Pr > ChiSq
Age         Oxygen     29      -0.23586    0.18813       1.5717        0.2100       1.4466        0.2291
Weight      Oxygen     29      -0.24514    0.18421       1.7709        0.1833       1.6185        0.2033
RunTime     Oxygen     28      -0.91042    0.04071     500.0345        <.0001      38.6963        <.0001
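
The Wald chi-square values in Output 2.5.3 are consistent with the squared ratio of each estimate to its standard error, referred to a chi-square distribution with one degree of freedom. The following sketch recomputes them from the displayed estimates under that assumption. The data set name WaldCheck and the variable names Corr, StdErr, Wald, and PWald are hypothetical and are not produced by PROC CORR.

```sas
/* Recompute the Wald chi-square from the estimates in Output 2.5.3.  */
/* Assumes the statistic is (estimate / standard error)**2 with 1 df. */
data WaldCheck;
   input Variable $ Corr StdErr;
   Wald  = (Corr / StdErr)**2;
   PWald = sdf('chisq', Wald, 1);   /* upper-tail chi-square probability */
   datalines;
Age     -0.23586 0.18813
Weight  -0.24514 0.18421
RunTime -0.91042 0.04071
;

proc print data=WaldCheck noobs;
run;
```

Because the inputs are the rounded values shown in the table, the recomputed statistics should agree with the reported ones only approximately; the difference is most visible for RunTime, where the small standard error magnifies rounding error.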