Descriptive Statistics

Computing Correlations

You can use the Correlations task to compute pairwise correlation coefficients for the variables in your data set. A correlation coefficient measures the strength of the linear relationship between two variables. The task can compute standard Pearson product-moment correlations, nonparametric measures of association, partial correlations, and Cronbach's coefficient alpha. It can also produce scatter plots with confidence ellipses.
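
As an illustration of what a Pearson product-moment correlation measures, the following is a minimal Python sketch of the textbook formula. It is not the task's own implementation, and the function name pearson_r and the sample values are hypothetical, chosen only for this example.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, NOT taken from the Fitness data set.
minutes = [8.5, 9.0, 10.0, 11.0, 12.5]
intake = [60.0, 55.0, 50.0, 46.0, 40.0]
r = pearson_r(minutes, intake)
print(round(r, 4))  # strongly negative: longer times go with lower intake
```

A value near -1 or 1 indicates a strong linear relationship, and a value near 0 indicates a weak one.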

The following example computes correlation coefficients for four variables in the Fitness data set. This data set contains measurements made on groups of men taking a physical fitness course at North Carolina State University. The variables are as follows:

age, in years
weight, in kilograms
oxygen intake rate, in milliliters per kilogram of body weight per minute
time taken to run 1.5 miles, in minutes
heart rate while resting
heart rate while running
maximum heart rate recorded while running
group number

This example examines the correlations among the variables runtime, runpulse, maxpulse, and oxygen, and produces the corresponding scatter plots with confidence ellipses.

Open the Fitness Data Set

To open the Fitness data set, follow these steps:
  1. Select Tools → Sample Data ...
  2. Select Fitness.
  3. Click OK to create the sample data set in your Sasuser directory.
  4. Select File → Open By SAS Name ...
  5. Select Sasuser from the list of Libraries.
  6. Select Fitness from the list of members.
  7. Click OK to bring the Fitness data set into the data table.

Request Correlations

To compute correlations for variables in the Fitness data set, follow these steps:
  1. Select Statistics → Descriptive → Correlations ...
  2. Select the variables runtime, runpulse, maxpulse, and oxygen to correlate.

Figure 7.18 displays the resulting Correlations dialog.

Figure 7.18: Correlations Dialog

If you click OK in the Correlations main dialog, the task produces the default output, which includes Pearson correlations. Alternatively, you can request specific types of correlations in the Options dialog.

Request a Scatter Plot

To request a scatter plot with a confidence ellipse, follow these steps:
  1. Click on the Plots button.
  2. Select Scatter plots.
  3. Select Add confidence ellipses.

By default, the confidence level used in calculating the confidence ellipse is 0.95. To use a different level, type that value in the Probability value: field, as displayed in Figure 7.19.

  4. Click OK.

Figure 7.19: Correlations: Plots Dialog

Click OK in the main dialog to perform the analysis.
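
To give a sense of how the probability value affects the size of a confidence ellipse, the sketch below computes the scaling constant for a bivariate normal distribution under a large-sample (chi-square, 2 degrees of freedom) approximation. This is an assumption made for illustration: the task may apply a small-sample adjustment, and the function name ellipse_scale is hypothetical.

```python
import math

def ellipse_scale(probability):
    """Squared Mahalanobis radius enclosing `probability` of a bivariate normal.

    With two dimensions, the chi-square quantile has the closed form
    -2 * ln(1 - p). This large-sample form is an assumption of the sketch,
    not necessarily the formula the Correlations task uses.
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return -2.0 * math.log(1.0 - probability)

# A larger probability value yields a larger ellipse.
for p in (0.90, 0.95, 0.99):
    print(p, round(ellipse_scale(p), 4))
```

Each semi-axis of the ellipse is proportional to the square root of this constant, so raising the probability value from 0.95 to 0.99 enlarges the ellipse without changing its orientation.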

Review the Results

The results are presented in the project tree, as displayed in Figure 7.20.

Figure 7.20: Correlations: Project Tree

You can double-click on any of the resulting nodes in the project tree to view the information in a separate window.

Figure 7.21 displays univariate statistics for each of the analysis variables. The table provides the number of observations, the mean, the standard deviation, the sum, and the minimum and maximum values for each variable.

Figure 7.21: Correlations: Univariate Statistics
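
The same six summary quantities can be sketched in Python for a single variable. The function name univariate_summary and the sample values are hypothetical, and the sketch assumes the standard deviation uses the n - 1 divisor.

```python
import statistics

def univariate_summary(values):
    """Return the summary quantities listed in the univariate statistics table."""
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        # Sample standard deviation (n - 1 divisor); an assumption of this sketch.
        "Std Dev": statistics.stdev(values),
        "Sum": sum(values),
        "Minimum": min(values),
        "Maximum": max(values),
    }

# Hypothetical values, NOT taken from the Fitness data set.
pulse = [170, 174, 178, 182, 186]
summary = univariate_summary(pulse)
for name, value in summary.items():
    print(f"{name}: {value}")
```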

Figure 7.22 displays the table of correlations. The p-value, which is the significance probability of the correlation, is displayed under each of the correlation coefficients. For example, the correlation between the variables maxpulse and runtime is 0.22610, with an associated p-value of 0.2213, and the correlation between the variables oxygen and runpulse is -0.39797, with an associated p-value of 0.0266.

Figure 7.22: Correlations: Table of Correlations
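
A p-value of this kind is conventionally obtained from a t test of the hypothesis that the true correlation is zero, using t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below implements that standard test with only the Python standard library, integrating the t density numerically. It assumes n = 31 observations, which the text does not state (the actual count appears in the univariate statistics table), and the function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def correlation_p_value(r, n, steps=2000):
    """Two-sided p-value for H0: correlation = 0, given sample r and size n."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the area under the density on [0, t].
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area on [-t, t] by symmetry
    return 1 - central_area

n_obs = 31  # ASSUMPTION: not stated in the text
for r in (0.22610, -0.39797):
    print(r, round(correlation_p_value(r, n_obs), 4))
```

Under that assumption, the printed values can be compared against the p-values of 0.2213 and 0.0266 reported in the table.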

Six scatter plots, each of which includes a 95% confidence ellipse, are produced in this analysis. Each plot displays the relationship between one pair of the analysis variables. The scatter plot of runtime versus oxygen is displayed in Figure 7.23.

Figure 7.23: Correlations: Scatter Plot with Confidence Ellipse
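
The count of six plots follows from the number of distinct pairs among four variables. A short Python sketch that enumerates them (the order in which the task itself lists the plots may differ):

```python
from itertools import combinations

# The four analysis variables selected in this example.
variables = ["runtime", "runpulse", "maxpulse", "oxygen"]

# Every unordered pair of distinct variables gets one scatter plot.
pairs = list(combinations(variables, 2))
for x, y in pairs:
    print(f"{x} versus {y}")
print(len(pairs))
```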

Confidence ellipses serve as a graphical indicator of correlation. When two variables are uncorrelated, the confidence ellipse is circular in shape. The ellipse becomes more elongated as the correlation between the two variables becomes stronger.
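
The link between correlation and elongation can be made concrete. For two variables scaled to unit variance with correlation r, the correlation matrix has eigenvalues 1 + r and 1 - r, so the ratio of the ellipse's major axis to its minor axis is sqrt((1 + |r|) / (1 - |r|)). The sketch below assumes standardized variables (on the raw plot axes, the apparent shape also depends on the axis scaling), and the function name ellipse_axis_ratio is hypothetical.

```python
import math

def ellipse_axis_ratio(r):
    """Major-to-minor axis ratio of the ellipse for standardized variables.

    Assumes both variables are scaled to unit variance, so the eigenvalues
    of the 2 x 2 correlation matrix are 1 + r and 1 - r.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return math.sqrt((1 + abs(r)) / (1 - abs(r)))

# 0.0 gives a circle; the other two values are the correlations quoted in the text.
for r in (0.0, 0.22610, -0.39797):
    print(r, round(ellipse_axis_ratio(r), 3))
```

A ratio of 1 corresponds to a circle, and the ratio grows without bound as |r| approaches 1.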
