Select an Exercise for |
Click any exercise title to see the problem for that exercise. Then you can view and download sample data, complete the exercise, and check the solution.
Student Survey 5: Investigate characteristics of college students using survey results.
Colon Cancer: Determine whether risk of colon cancer differs between age groups.
Student Survey 5: Problem
To determine the characteristics of the students in their large introductory statistics classes, a group of college professors administered a survey to those students. The questions asked included the student’s age in years, height (in inches), shoe size, and high school GPA. We all know that there are many physical differences between male and female college students, but what other characteristics differ? Assume that we can consider our sample results to be a random sample from the population of all undergraduate students at this institution.
Exercise:
a. Is there a relationship between the gender of a student and whether the student is registered to vote? Create a two-way table displaying the gender and voter registration status for these students. Does there appear to be a relationship?
b. Calculate Pearson’s Chi-square statistic and the associated p-value for these variables.
c. Explain the null and alternative hypotheses for the Chi-square test in this setting.
d. What conclusion can you draw from the results of the Chi-square test?
e. Conduct the Chi-square test for gender and whether or not the student ate breakfast.
f. Conduct the Chi-square test for gender and whether the student took a commercial flight in the last 30 days.
g. Conduct the Chi-square test for gender and whether the student has seen a play in the past 6 months.
 Dr. Roger Woodard North Carolina State University
Student Survey 5: Sample Data
The Survey data is the result of a survey administered to a large introductory statistics class. The data set contains answers from 485 participants. Not all questions were answered, resulting in missing data. Missing data is indicated by a period. These are the variables in the data set:
Name | Type | Description
Gender | char | gender (male or female) of the respondent
Age | num | age of the subject in years
Textbook | num | answer to the question “How much did you spend for textbooks this term (to nearest dollar)?”
Cigs | num | answer to the question “How many cigarettes did you smoke yesterday?”
ColGPA | num | answer to the question “What is your cumulative Grade Point Average at this institution?”
HSGPA | num | answer to the question “What was your cumulative high school GPA (4 point scale)?”
Height | num | height of the respondent in inches
Mateh | num | the height of the respondent’s “ideal mate”
Shoe | num | the respondent’s shoe size
Breakfast | char | answer to the question “Did you eat breakfast this morning?”
Flight | char | answer to the question “Did you fly on a commercial airline during the past 30 days? (Yes or No)”
Play | char | answer to the question “Have you seen a play in a live theater in the past 6 months? Yes or No”
Vote | char | answer to the question “Are you registered to vote? Yes or No”
Credit | num | answer to the question “How many credit hours are you taking this term?”
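Because missing answers are coded as a period, any two-way table built from these variables has to leave out respondents with a missing value in either variable. The following is a minimal Python sketch of that tallying step. The `records` list is a small set of hypothetical values made up for illustration (it is not the survey data), and `two_way_table` is an illustrative helper name, not part of any package.

```python
from collections import Counter

# Hypothetical toy records of (Gender, Vote); "." marks a missing answer,
# following the data set's convention. These are NOT the survey data.
records = [
    ("f", "y"), ("f", "n"), ("m", "y"), ("m", "."),
    ("f", "y"), (".", "y"), ("m", "n"), ("m", "y"),
]

def two_way_table(pairs, missing="."):
    """Count (row, column) pairs, skipping any pair with a missing value."""
    counts = Counter()
    n_missing = 0
    for row, col in pairs:
        if row == missing or col == missing:
            n_missing += 1
            continue
        counts[(row, col)] += 1
    return counts, n_missing

counts, n_missing = two_way_table(records)
for gender in ("f", "m"):
    print(gender, [counts[(gender, vote)] for vote in ("n", "y")])
print("Frequency Missing =", n_missing)
```

Records with a missing value are counted separately rather than silently dropped, so the number of excluded respondents can be reported alongside the table.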
Source of Data
This data was collected by Roger Woodard of North Carolina State University in 2005.
Student Survey 5: Solution
a. The two-way table is given below. There does not seem to be much of a relationship: among both males and females, over 86% of the subjects were registered to vote.
Table of gender by vote
Each cell lists Frequency, Percent, Row Pct, Col Pct.
gender | vote = n | vote = y | Total
f | 40, 8.26, 13.75, 66.67 | 251, 51.86, 86.25, 59.20 | 291, 60.12
m | 20, 4.13, 10.36, 33.33 | 173, 35.74, 89.64, 40.80 | 193, 39.88
Total | 60, 12.40 | 424, 87.60 | 484, 100.00
Frequency Missing = 2
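The percentages in the table follow directly from the four observed frequencies. As a rough check, the Python sketch below recomputes Percent, Row Pct, and Col Pct for each cell. The variable and function names are illustrative choices, and this is not the procedure that produced the table above.

```python
# Observed frequencies from the table of gender by vote.
observed = {"f": {"n": 40, "y": 251}, "m": {"n": 20, "y": 173}}

row_totals = {g: sum(cells.values()) for g, cells in observed.items()}
col_totals = {v: sum(observed[g][v] for g in observed) for v in ("n", "y")}
grand_total = sum(row_totals.values())

def cell_percentages(gender, vote):
    """Return (Percent, Row Pct, Col Pct) for one cell, rounded to 2 places."""
    freq = observed[gender][vote]
    return (
        round(100 * freq / grand_total, 2),         # share of all respondents
        round(100 * freq / row_totals[gender], 2),  # share within the gender
        round(100 * freq / col_totals[vote], 2),    # share within the vote answer
    )

for gender in observed:
    for vote in ("n", "y"):
        print(gender, vote, observed[gender][vote], cell_percentages(gender, vote))
```

The Row Pct values for the "y" column are the figures behind the statement in part a: they give the share of each gender that is registered to vote.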
b. The test statistic is 1.2229 and the p-value is 0.2688.
c. The null hypothesis is that gender and voter registration are independent. The alternative hypothesis is that the variables are not independent.
d. Based on the large p-value, we cannot reject the null hypothesis. Therefore, we cannot reject the idea that gender and voter registration are independent.
e. The test statistic is 5.7322 and the p-value is 0.0167. We can reject the null hypothesis at the 5% level (but not the 1% level). We can conclude that there is evidence that eating breakfast and gender are related among the students at this university.
f. The test statistic is 1.1028 and the p-value is 0.2937. We cannot reject the null hypothesis. Therefore, there is not enough evidence to conclude there is a relationship between gender and taking a flight among the students at this university.
g. The test statistic is 9.1603 and the p-value is 0.0025. We can reject the null hypothesis at both the 5% and 1% levels. We can conclude that there is evidence that seeing a play and gender are related among the students at this university.
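The arithmetic behind part b can be sketched in plain Python from the counts in the table of gender by vote. Each expected count is (row total × column total) / n, and the statistic sums (observed − expected)² / expected over the cells. The function names `pearson_chi_square` and `p_value_df1` are illustrative. The p-value shortcut with `math.erfc` applies only to one degree of freedom, and using it for parts e to g assumes that each of those tables is also 2 × 2. This is a hand check, not the procedure that produced the reported results.

```python
import math

# Observed counts from the table of gender by vote (rows: f, m; columns: n, y).
observed = [[40, 251], [20, 173]]

def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def p_value_df1(stat):
    """Upper-tail chi-square probability; valid only for 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

stat, df = pearson_chi_square(observed)
print(f"chi-square = {stat:.4f}, df = {df}, p = {p_value_df1(stat):.4f}")

# Parts e-g report only the statistic, so only the p-value can be rechecked
# (ASSUMPTION: each of those tables is 2 x 2, giving df = 1).
for label, reported in (("e", 5.7322), ("f", 1.1028), ("g", 9.1603)):
    print(label, f"{p_value_df1(reported):.4f}")
```

The counts for the breakfast, flight, and play tables are not given in the solution, so their statistics cannot be recomputed here. Only the conversion from statistic to p-value is checked.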
Colon Cancer: Problem
A colonoscopy screening study was performed on individuals who were considered to be at a high risk of colon cancer, due to adenoma findings in previous examinations. The data recorded from the screening were the variables Finding (coded 0 for negative examination, 1 for small adenoma, and 2 for large adenoma) and Age (rounded: 30-39 years coded as 35, 40-49 years coded as 45, and so on). Using the Pearson chi-square test, determine whether there are significant differences among age groups for risk of colon cancer. Give the chi-square test statistic and the p-value for the test.
 SAS Institute Inc.
Colon Cancer: Sample Data
The Colonoscopy data set contains data from a colonoscopy screening study on individuals considered to be at high risk of colon cancer. These are the variables in the data set:
Name | Type | Description
Finding | num | finding (0 for negative examination, 1 for small adenoma, and 2 for large adenoma)
Age | num | age (rounded to the midpoint of each decade; e.g., ages 30-39 years coded as 35, ages 40-49 years coded as 45)
Source of Data
This data is sample data from SAS Institute Inc.
Colon Cancer: Solution
The value of the Pearson chi-square statistic is 28.2566, with a p-value of 0.0004, which is highly significant. We can therefore conclude that there are significant differences among age groups for risk of colon cancer.
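For a table with r rows and c columns, the chi-square test uses (r − 1)(c − 1) degrees of freedom. The Python sketch below converts a statistic to a p-value using the closed-form upper-tail probability that holds when the degrees of freedom are even. Finding has three levels, as described above, but the number of age groups is not stated in this exercise. The value `age_groups = 5` is therefore an assumption made only for illustration, and `chi_square_p_value_even_df` is an illustrative name.

```python
import math

def chi_square_p_value_even_df(stat, df):
    """Upper-tail chi-square probability using the closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = stat / 2
    term = 1.0
    total = 0.0
    for i in range(df // 2):
        total += term            # term equals (stat/2)**i / i!
        term *= half / (i + 1)
    return math.exp(-half) * total

finding_levels = 3   # from the exercise: negative, small adenoma, large adenoma
age_groups = 5       # ASSUMPTION: the number of age groups is not stated
df = (finding_levels - 1) * (age_groups - 1)

p = chi_square_p_value_even_df(28.2566, df)
print(f"df = {df}, p = {p:.4f}")
```

If the data set has a different number of age groups, `df` changes and so does the p-value, so the degrees of freedom should be confirmed from the actual data before relying on this check.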