Student Survey 3 Investigate characteristics of college students using survey results.
|
Football: Height and Neck Measurements: Problem |
A set of data was collected for the Brigham Young football team. Information on players’ positions, heights, weights, percent body fat, neck measurements, and performances on various weight lifts were recorded.
Use the Pearson correlation coefficient to evaluate the relationship between height and neck measurement, and between weight and neck measurement. With which of these two variables, height or weight, does neck measurement seem to be more strongly correlated? |
 Lee Creighton SAS Institute Inc.
Football: Height and Neck Measurements: Sample Data | |
The Football data set was collected from the Brigham Young University football program. Various body composition measurements (such as weight, height, percent body fat) and physical performance measurements (such as speed, bench press, and squat) are included in the data set. Note: some observations have missing data values. These are the variables in the data set:
Name | Type | Description
Height | num | height of player (inches)
Weight | num | weight of player (pounds)
Fat | num | percent body fat
Speed | num | evaluation of player’s speed
Neck | num | neck measurement (inches)
Bench | num | player’s bench press (pounds)
Squat | num | player’s squat (pounds)
LegPress | num | player’s leg press (pounds)
Position | num | primary position
Position2 | num | secondary position
Speed2 | num | second evaluation of player’s speed
|
Source of Data
|
Sall, J., Creighton, L., & Lehman, A. (2006). JMP Start Statistics, Third Edition. Cary, NC: SAS Institute Inc. |
Football: Height and Neck Measurements: Solution |
Using SAS Enterprise Guide, the Pearson correlation coefficient between height and neck measurement is 0.47769, and the coefficient between weight and neck measurement is 0.81370. Based on these values, neck measurement appears to be more closely correlated with weight than with height. |
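The solution above was produced with SAS Enterprise Guide. As a rough illustration of the same calculation, here is a minimal Python sketch of the Pearson correlation coefficient. The function name `pearson_r`, the choice to skip any pair with a missing value (pairwise deletion), and the sample values are all assumptions made for this example; the values are invented and are not taken from the Football data set, so the printed results will not match the coefficients quoted above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation using only pairs where both values are present.

    Missing values are represented by None and the whole pair is skipped
    (an assumption for this sketch, not a statement about SAS defaults).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Invented illustrative values (NOT the Football data set).
height = [70, 72, 74, 71, 76, None, 73]       # inches
weight = [180, 210, 250, 190, 300, 240, 225]  # pounds
neck = [15.0, 16.0, 17.5, 15.5, 18.5, 17.0, None]  # inches

r_height_neck = pearson_r(height, neck)
r_weight_neck = pearson_r(weight, neck)
print(f"height vs neck: {r_height_neck:.5f}")
print(f"weight vs neck: {r_weight_neck:.5f}")
```

Comparing the two printed coefficients mirrors the comparison the exercise asks for: the variable with the coefficient larger in absolute value is the one more strongly linearly related to neck measurement.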
Student Survey 3: Problem |
To determine the characteristics of the students in a large introductory statistics class, a group of college professors administered a survey to the students. The questions asked included the student’s age in years, height (in inches), shoe size, and high school GPA.
Exercise:
a. Before you actually do any calculations or construct graphics, consider the relationship between a student’s height and the height of their ideal mate. What type of relationship would you expect for these variables? (Positive or negative? Strong or weak?)
b. Explain in layman’s terms what a positive correlation between these variables would mean. Then explain (again in layman’s terms) what a negative correlation would mean.
c. Calculate Pearson’s correlation for these variables. Is it positive or negative? Does this match your intuition? Explain.
d. Create a scatterplot of these variables. Does the scatterplot match the correlation you calculated? Explain.
e. Further explore the data to develop an explanation for the calculated correlation.
|
 Dr. Roger Woodard North Carolina State University
Student Survey 3: Sample Data | |
The Survey data is the result of a survey administered to a large introductory statistics class. The data set contains answers from 485 participants. Not all questions are answered, resulting in missing data. Missing data is indicated by a period. These are the variables in the data set:
Name | Type | Description
Gender | char | gender (male or female) of the respondent
Age | num | age of the subject in years
Textbook | num | answer to the question “How much did you spend for textbooks this term (to nearest dollar)?”
Cigs | num | answer to the question “How many cigarettes did you smoke yesterday?”
ColGPA | num | answer to the question “What is your cumulative Grade Point Average at this institution?”
HSGPA | num | answer to the question “What was your cumulative high school GPA (4 point scale)?”
Height | num | height of the respondent in inches
Mateh | num | the height of the respondent’s “ideal mate”
Shoe | num | the respondent’s shoe size
Breakfast | char | answer to the question “Did you eat breakfast this morning?”
Flight | char | answer to the question “Did you fly on a commercial airline during the past 30 days? (Yes or No)”
Play | char | answer to the question “Have you seen a play in a live theater in the past 6 months? Yes or No”
Vote | char | answer to the question “Are you registered to vote? Yes or No”
Credit | num | answer to the question “How many credit hours are you taking this term?”
|
Source of Data
|
These data were collected by Roger Woodard of North Carolina State University in 2005. |
Student Survey 3: Solution |
a. Most people would expect that a student’s height and the height of their ideal mate are positively correlated.
b. A positive relationship would indicate that a taller individual would want a taller mate. Thus, a tall basketball player would want a tall mate rather than a very short one. A negative correlation would indicate that tall people want a short mate and short people want a tall mate.
c. The correlation is -0.289. This does not match what most people would expect.
d. The scatterplot is given below. It does match the correlation.
e. The scatterplot has two clusters. Further exploration might lead a person to split the scatterplot by gender. Doing this (see the second scatterplot) shows two positively correlated groups joined together. The positive correlation within each group matches what we expect. Calculating the correlations separately by gender gives positive values for both males (0.468) and females (0.401).
Teacher's note: This is a good example of why one should not take a correlation at face value, but instead always make a scatterplot of the relationship. It is a continuous-variable example of Simpson’s paradox, in which a third variable (here, gender) changes the apparent relationship between two variables.
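The effect described in the teacher's note can be reproduced with a small sketch. The heights below are invented for illustration and are not taken from the Survey data set; the helper `pearson_r` and the variable names are likewise assumptions of this example. Within each group, taller respondents name a taller ideal mate, but pooling the two groups yields a negative coefficient because one group is shorter and prefers taller mates while the other is taller and prefers shorter mates.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented heights in inches (NOT the Survey data set).
female_height = [62, 63, 64, 65, 66]
female_mate = [69, 70, 70, 71, 72]   # ideal mate is taller than the respondent
male_height = [69, 70, 71, 72, 73]
male_mate = [63, 64, 64, 65, 66]     # ideal mate is shorter than the respondent

# Correlation within each gender, then for the pooled data.
r_female = pearson_r(female_height, female_mate)
r_male = pearson_r(male_height, male_mate)
r_all = pearson_r(female_height + male_height, female_mate + male_mate)

print(f"females only: {r_female:.3f}")
print(f"males only:   {r_male:.3f}")
print(f"pooled:       {r_all:.3f}")
```

In this sketch both within-group coefficients are positive while the pooled coefficient is negative, which is the same sign pattern reported in the solution.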
|
Crime and Recreation in Cities: Problem |
In a study to investigate the prevalence of relationships between different socioeconomic factors, 52 western cities were rated by nine criteria: climate and terrain, housing, health care and environment, crime, transportation, education, arts, recreation, and economics. For housing and crime, the lower the rating score, the better. For the remaining seven criteria, the higher the score, the better. Find the correlation of crime and recreation for this sample data. Based on your value for the Pearson correlation coefficient, would you describe the linear relationship between crime and recreation as strong or weak, positive or negative? |
 SAS Institute Inc.
Crime and Recreation in Cities: Sample Data | |
The Westernrates data set contains data about ratings on nine criteria (climate and terrain, housing, health care and environment, crime, transportation, education, arts, recreation, and economics) for 52 western cities. These are the variables in the data set:
Name | Type | Description
City | char | city
State | char | state
ClimateTerrain | num | rating of climate and terrain
Housing | num | rating of housing
HealthCareEnvironment | num | rating of health care and environment
Crime | num | rating of crime
Transportation | num | rating of transportation
Education | num | rating of education
Arts | num | rating of the arts
Recreation | num | rating of recreation
Economics | num | rating of economics
|
Source of Data
|
This data is sample data from SAS Institute Inc. |
Crime and Recreation in Cities: Solution |
The correlation value of 0.30337 suggests a weak positive linear relationship between crime and recreation. In other words, among these western cities the crime score tends to increase as the recreation score increases, although the association is not strong. |
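Describing a coefficient as "weak" or "strong" depends on the cutoffs one adopts. The sketch below uses hypothetical cutoffs (an absolute value below 0.5 is called weak, 0.8 or above is called strong, and anything between is called moderate) chosen only so that the labels agree with the wording of the solutions above; other conventions are common, and the function name `describe_correlation` is an assumption of this example.

```python
def describe_correlation(r, weak_below=0.5, strong_from=0.8):
    """Label the sign and strength of a Pearson coefficient.

    The cutoffs are hypothetical defaults for illustration, not a standard.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a Pearson coefficient must lie between -1 and 1")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "zero"
    size = abs(r)
    if size < weak_below:
        strength = "weak"
    elif size < strong_from:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction}"

# Coefficients quoted in the solutions above.
print(describe_correlation(0.30337))  # weak positive
print(describe_correlation(0.81370))  # strong positive
print(describe_correlation(-0.289))   # weak negative
```

A label of this kind summarizes only the linear association; as the Student Survey exercise shows, a scatterplot is still needed to see what the coefficient is hiding.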
|