Select an Exercise for |
Click any exercise title to see the problem for that exercise. Then you can view and download sample data, complete the exercise, and check the solution.
Student Survey 4: Investigate characteristics of college students using survey results.
Livestock Auctions 1: Describe the relationship between the cost of operations and the number of cattle sold at auctions.
|
Student Survey 4: Problem |
To determine the characteristics of the students in their large introductory statistics class, a group of college professors administered a survey to the class. The questions asked included the student’s age (in years), height (in inches), shoe size, and high school GPA.
Exercise:
a. Can we estimate a person’s height from their shoe size? Create a scatterplot of the relationship between shoe size and height. Choose the appropriate variable to be on the horizontal axis.
b. Examine the scatterplot. Is the relationship positive or negative?
c. Calculate the least squares regression line that relates height to shoe size. What is the least squares line?
d. Explain in layman’s terms the meaning of the slope term in this regression line.
e. Is the slope between shoe size and height significant? Explain how you know.
f. Does the intercept term make sense in this setting? Explain.
g. Calculate the R-square statistic for the relationship between shoe size and height. Explain this term’s meaning.
h. We find a student’s shoe print that is a size 10. What height would you estimate for this student? Also calculate a prediction interval. Would this interval be useful in identifying the student? Explain.
|
Dr. Roger Woodard, North Carolina State University
Student Survey 4: Sample Data | |
The Survey data set is the result of a survey administered to a large introductory statistics class. It contains answers from 485 participants. Not every question was answered, so some values are missing; a missing value is indicated by a period. These are the variables in the data set:
Name | Type | Description
Gender | char | gender (male or female) of the respondent
Age | num | age of the subject in years
Textbook | num | answer to the question “How much did you spend for textbooks this term (to nearest dollar)?”
Cigs | num | answer to the question “How many cigarettes did you smoke yesterday?”
ColGPA | num | answer to the question “What is your cumulative Grade Point Average at this institution?”
HSGPA | num | answer to the question “What was your cumulative high school GPA (4 point scale)?”
Height | num | height of the respondent in inches
Mateh | num | the height of the respondent’s “ideal mate”
Shoe | num | the respondent’s shoe size
Breakfast | char | answer to the question “Did you eat breakfast this morning?”
Flight | char | answer to the question “Did you fly on a commercial airline during the past 30 days? (Yes or No)”
Play | char | answer to the question “Have you seen a play in a live theater in the past 6 months? Yes or No”
Vote | char | answer to the question “Are you registered to vote? Yes or No”
Credit | num | answer to the question “How many credit hours are you taking this term?”
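Because missing values are coded as a period, they need to be filtered out before any regression on shoe size and height. The sketch below shows one way to do that in Python. It assumes the data has been exported as comma-separated text, which the page does not state; the sample rows are invented for illustration, and `parse_number` and `complete_pairs` are hypothetical helper names.

```python
import csv
import io

# Hypothetical excerpt: the column names follow the variable table,
# but the rows are invented and the CSV layout is an assumption.
SAMPLE = """Gender,Height,Shoe
female,64,7.5
male,.,10
male,71,.
"""

def parse_number(field):
    """Return a float, or None when the field holds the missing-value marker '.'."""
    field = field.strip()
    return None if field == "." else float(field)

def complete_pairs(text, x_name="Shoe", y_name="Height"):
    """Collect (x, y) pairs, skipping rows where either value is missing."""
    pairs = []
    for row in csv.DictReader(io.StringIO(text)):
        x = parse_number(row[x_name])
        y = parse_number(row[y_name])
        if x is not None and y is not None:
            pairs.append((x, y))
    return pairs

print(complete_pairs(SAMPLE))  # only the first row has both values
```

Dropping incomplete rows is the simplest choice and matches what most regression procedures do by default, but it is only one way to handle missing data.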
|
Source of Data
|
This data was collected by Roger Woodard of North Carolina State University in 2005.
Student Survey 4: Solution |
a. The scatterplot is given below. If we are predicting height from shoe size, we should put shoe size on the horizontal axis.
b. For this scatterplot we see a strong positive correlation.
c. The least squares line is “height=52.05+1.64*shoe”.
d. The slope term 1.64 indicates that the expected height of a student increases by 1.64 inches for each one-size increase in shoe size.
e. The slope term is significant based on the t-test of significance. This term has a p-value of less than 0.001.
f. The intercept term would indicate that a person with shoe size zero would be 52.05 inches tall. It is impossible to have a shoe size of zero, so this term does not have a practical interpretation.
g. The R-square is 0.7318. This indicates that about 73% of the variability in the height of a subject can be explained by the straight-line relationship with shoe size.
h. We would predict that the height would be around 68.45 inches. If we want to put a prediction interval around this estimate, it would range from 64.34 to 72.55 inches. This interval is not very informative. Even without the information about shoe size, we might guess that someone would be between 5’4” and 6’0” tall.
Teacher's note: This example can be well motivated by the question of evaluating crime scene evidence. It can also serve as a good lead-in to multiple regression by adding variables such as gender and age to the model.
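The calculations behind parts c, g, and h can be sketched in a few lines of Python. This is a minimal illustration of the standard closed-form formulas for simple linear regression, not the procedure used to produce the solution. The function names are hypothetical, and the t critical value `t_crit` (with n - 2 degrees of freedom) must be supplied by the caller from a table or statistics package.

```python
from math import sqrt

def fit_line(xs, ys):
    """Closed-form least squares fit of y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    sse = max(syy - b1 * sxy, 0.0)      # residual sum of squares
    return {
        "b0": b0, "b1": b1,
        "r2": 1 - sse / syy,            # share of variability explained
        "s": sqrt(sse / (n - 2)),       # residual standard error
        "n": n, "x_bar": x_bar, "sxx": sxx,
    }

def predict(b0, b1, x):
    """Point prediction from a fitted line."""
    return b0 + b1 * x

def prediction_interval(fit, x0, t_crit):
    """Interval for a single new observation at x0."""
    y_hat = predict(fit["b0"], fit["b1"], x0)
    half = t_crit * fit["s"] * sqrt(
        1 + 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    )
    return y_hat - half, y_hat + half

# Point prediction for a size 10 shoe, using the coefficients reported in part c.
print(round(predict(52.05, 1.64, 10), 2))  # 68.45
```

The leading `1 +` inside the square root is what makes this a prediction interval for an individual student rather than a confidence interval for the mean height, which is why the interval stays wide even with 485 respondents.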
|
Livestock Auctions 1: Problem |
A group of livestock auction market managers were interested in learning how the number of cattle sold at their markets influenced the cost of operations of their markets. Data was collected from 19 such auction markets.
Find the simple linear regression equation for cost of operations on cattle sold. Use this equation to predict the cost of operations for a market that anticipates selling 13,500 cattle. |
 SAS Institute Inc.
Livestock Auctions 1: Sample Data | |
The Market data set contains data from 19 livestock auction markets, including the number of head of cattle sold (in thousands), the cost of operations of the auction market (in thousands of dollars), and the market identifier. These are the variables in the data set:
Name | Type | Description
marketid | char | market identifier
cattle | num | number of head of cattle sold (in thousands)
cost | num | cost of operations of the auction market (in thousands of dollars)
|
Source of Data
|
This data is sample data from SAS Institute Inc. |
Livestock Auctions 1: Solution |
The estimate for the intercept parameter is 7.19650, and the estimate for the slope parameter is 4.56396. So, the equation of the simple linear regression line is
y-hat = 7.19650 + 4.56396x,
where y-hat is the predicted cost of operations (in thousands of dollars) and x is the number of cattle sold (in thousands).
For a market anticipating the sale of 13,500 cattle (x = 13.5, because cattle are measured in thousands), the predicted cost of operations is
y-hat = 7.19650 + 4.56396(13.5) = 68.810
So, about $68,810 in operating costs is expected for 13,500 cattle sold.
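Both variables are recorded in thousands, so the main risk in this prediction is a unit slip. The short Python sketch below makes the conversions explicit, using the intercept and slope exactly as stated in the fitted equation; the function name `predicted_cost_dollars` is hypothetical.

```python
INTERCEPT = 7.19650   # thousands of dollars
SLOPE = 4.56396       # thousands of dollars per thousand head of cattle

def predicted_cost_dollars(head_of_cattle):
    """Predicted cost of operations in dollars for a given number of cattle sold."""
    cattle_thousands = head_of_cattle / 1000        # the model expects thousands of head
    cost_thousands = INTERCEPT + SLOPE * cattle_thousands
    return cost_thousands * 1000                    # convert back to dollars

print(f"${predicted_cost_dollars(13_500):,.0f}")
```

Passing 13,500 directly as x, without dividing by 1,000, would overstate the prediction by roughly a factor of a thousand.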
|