Hot Dogs (Bar Chart): Problem
An investigation of the taste and nutritional content of three types of hot dogs was carried out. The types of franks included in the study were categorized as beef, meat, and poultry.
Construct a bar chart of hot dog type and use it to compare the type frequencies.
Lee Creighton, SAS Institute Inc.

Hot Dogs (Bar Chart): Sample Data

The Hot_dogs data set is from a study on the taste and nutritional content of hot dogs. Fifty-four brands of hot dogs were included. Information on the type of hot dog, a description of the taste, weight (ounces), protein content, calories, sodium, and protein from fat was collected. These are the variables in the data set:

| Name | Type | Description |
| --- | --- | --- |
| Product_Name | char | brand of hot dog |
| Type | char | type of hot dog (beef, meat, poultry) |
| Taste | char | description of taste (bland, medium, scrumptious) |
| _oz | num | weight of frank (ounces) |
| _lb_Protein | num | protein content |
| Calories | num | caloric content |
| Sodium | num | sodium content |
| Protein_Fat | num | protein from fat |

Source of Data

Sall, J., Creighton, L., & Lehman, A. (2006). JMP Start Statistics, Third Edition. Cary, NC: SAS Institute Inc.

Hot Dogs (Bar Chart): Solution
Using Type as the categorical variable, the horizontal axis of the bar chart is divided into three classes, one for each hot dog type. The vertical axis shows the frequency (count) of each type, and the height of each vertical bar equals the class frequency. The meat and poultry types each have 17 hot dogs, while the beef type has the most, with 20 hot dogs (17 + 17 + 20 = 54 brands).
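The counting behind a bar chart can be sketched in a few lines of Python. This is a minimal sketch, not SAS or JMP code: it assumes the Type column has already been read into a Python list, and the list below is rebuilt from the counts stated in the solution rather than read from the Hot_dogs file. The function names are illustrative.

```python
from collections import Counter

def frequency_table(categories):
    """Count how many observations fall into each category (class)."""
    return Counter(categories)

def text_bar_chart(counts, order):
    """Return one line per class; the bar length equals the class frequency."""
    width = max(len(name) for name in order)
    return [f"{name:<{width}} | {'#' * counts[name]} {counts[name]}" for name in order]

# Rebuilt from the counts reported in the solution (20 beef, 17 meat, 17 poultry).
hot_dog_types = ["beef"] * 20 + ["meat"] * 17 + ["poultry"] * 17

counts = frequency_table(hot_dog_types)
for line in text_bar_chart(counts, ["beef", "meat", "poultry"]):
    print(line)
```

Passing the class order explicitly keeps the bars in a fixed sequence instead of the order in which categories first appear in the data.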

Typing Speed (Stem & Leaf Plot): Problem
Three brands of typewriters were tested for typing speed by having expert typists type identical passages of text.
Construct a stem and leaf plot to examine the distribution of typing speeds. Would you describe the shape of the distribution as symmetric, skewed to the left, or skewed to the right?
Lee Creighton, SAS Institute Inc.

Typing Speed (Stem & Leaf Plot): Sample Data

The Typing_data data set contains the results of a test for typing speed on three different brands of typewriters. The speeds were recorded as words typed per minute. These are the variables in the data set:

| Name | Type | Description |
| --- | --- | --- |
| brand | char | brand of typewriter |
| speed | num | typing speed (words per minute) |

Source of Data

Sall, J., Creighton, L., & Lehman, A. (2006). JMP Start Statistics, Third Edition. Cary, NC: SAS Institute Inc.
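
Typing Speed (Stem & Leaf Plot): Solution
Each stem appears twice: the first row holds leaves 0-4 and the second holds leaves 5-9.

| Stem | Leaf |
| --- | --- |
| 6 | 12 |
| 6 | 668 |
| 7 | 001223 |
| 7 | 779 |
| 8 | 01 |
| 8 | 7 |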
The distribution of typing speeds appears to be skewed to the right.
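A split-stem plot like the one above can be generated with a short Python sketch. It assumes stems are the tens digit and leaves the units digit of each speed, and the speeds are read back from the plot rather than from the Typing_data file. `split_stem_leaf` is a hypothetical helper, not part of any library.

```python
import statistics

def split_stem_leaf(values):
    """Stem-and-leaf rows with split stems: each stem (tens digit) appears twice,
    once for leaves 0-4 and once for leaves 5-9."""
    rows = {}
    for v in sorted(values):
        # Each row spans five consecutive values, so v // 5 is the row index.
        rows.setdefault(v // 5, []).append(str(v % 10))
    lines = []
    for r in range(min(values) // 5, max(values) // 5 + 1):
        # Row r belongs to stem r // 2; empty rows are still printed.
        lines.append(f"{r // 2} | {''.join(rows.get(r, []))}".rstrip())
    return lines

# Values read back from the stem-and-leaf plot above (words per minute).
speeds = [61, 62, 66, 66, 68, 70, 70, 71, 72, 72, 73, 77, 77, 79, 80, 81, 87]

for line in split_stem_leaf(speeds):
    print(line)
print("mean:", round(statistics.mean(speeds), 2), "median:", statistics.median(speeds))
```

Running it reproduces the six rows above. The mean (about 72.5) lies slightly above the median (72), which is consistent with the right skew described in the solution.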

Student Survey 1 (Histogram): Problem
To determine the characteristics of the students in their large introductory statistics class, a group of college professors administered a survey to the students in their classes. Some of the questions asked include: the student’s age in years, height (in inches), shoe size, and high school GPA.
Create a histogram of the age of these students in years.
a. What shape does this histogram take on?
b. Where is the main cluster of the students’ ages?
c. How old is the oldest student in this class (approximately)?
d. Examine the shape of this histogram. In layman’s terms, explain why the histogram takes on the shape that it does.
Dr. Roger Woodard, North Carolina State University

Student Survey 1 (Histogram): Sample Data

The Survey data set is the result of a survey administered to a large introductory statistics class. The data set contains answers from 485 participants. Not all questions were answered, resulting in missing data; missing values are indicated by a period. These are the variables in the data set:

| Name | Type | Description |
| --- | --- | --- |
| Gender | char | gender (male or female) of the respondent |
| Age | num | age of the subject in years |
| Textbook | num | answer to the question “How much did you spend for textbooks this term (to nearest dollar)?” |
| Cigs | num | answer to the question “How many cigarettes did you smoke yesterday?” |
| ColGPA | num | answer to the question “What is your cumulative Grade Point Average at this institution?” |
| HSGPA | num | answer to the question “What was your cumulative high school GPA (4 point scale)?” |
| Height | num | height of the respondent in inches |
| Mateh | num | the height of the respondent’s “ideal mate” |
| Shoe | num | the respondent’s shoe size |
| Breakfast | char | answer to the question “Did you eat breakfast this morning?” |
| Flight | char | answer to the question “Did you fly on a commercial airline during the past 30 days? (Yes or No)” |
| Play | char | answer to the question “Have you seen a play in a live theater in the past 6 months? Yes or No” |
| Vote | char | answer to the question “Are you registered to vote? Yes or No” |
| Credit | num | answer to the question “How many credit hours are you taking this term?” |

Source of Data

This data was collected by Roger Woodard of North Carolina State University in 2005.

Student Survey 1 (Histogram): Solution
a. The histogram is skewed to the right.
b. The main cluster of ages is in the 19 to 22 year age range.
c. The oldest student is about 60 years of age.
d. The histogram is skewed to the right because the main cluster represents traditional students who entered college immediately after high school. Ages are bounded on the low end, since few people can practically attend college before about age 16, but there is no upper age limit on attending college. The long right tail therefore represents students who have taken a few years longer to complete college, non-traditional students, and lifelong learners.
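The binning step behind a histogram, and a quick numerical check of skewness, can be sketched in Python. This sketch assumes the Age column has been loaded into a list of integers with missing responses (a period in the SAS data) stored as `None`; the ages below are made up for illustration and are not the Survey responses. The helper name `histogram_counts` is hypothetical.

```python
import statistics

def histogram_counts(values, start, width):
    """Count values in equal-width bins [start, start + width), [start + width, ...).
    Assumes integer values >= start; missing responses are assumed to be None."""
    present = [v for v in values if v is not None]  # drop missing data
    # Create every bin up to the largest value so empty bins show as 0.
    counts = {left: 0 for left in range(start, max(present) + 1, width)}
    for v in present:
        counts[start + (v - start) // width * width] += 1
    return counts

# Hypothetical ages for illustration only; these are NOT the Survey responses.
ages = [18, 19, 19, 20, 20, 20, 21, 21, 22, 23, 26, 31, 45, None]

WIDTH = 5
for left, n in histogram_counts(ages, start=15, width=WIDTH).items():
    print(f"{left}-{left + WIDTH - 1}: {'*' * n}")

present = [a for a in ages if a is not None]
print("mean:", round(statistics.mean(present), 1), "median:", statistics.median(present))
```

In a right-skewed sample like this one, the few large values pull the mean above the median, which matches the long right tail described in part d.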

Student Survey 2 (Box Plot): Problem
To determine the characteristics of the students in their large introductory statistics class, a group of college professors administered a survey to the students in their classes. Some of the questions asked include: the student’s age in years, height (in inches), shoe size and high school GPA.
Create box plots of the heights of these students separated by gender. Compare the shape, location and spread of the height of the groups.
Dr. Roger Woodard, North Carolina State University

Student Survey 2 (Box Plot): Sample Data

The Survey data set is the result of a survey administered to a large introductory statistics class. The data set contains answers from 485 participants. Not all questions were answered, resulting in missing data; missing values are indicated by a period. These are the variables in the data set:

| Name | Type | Description |
| --- | --- | --- |
| Gender | char | gender (male or female) of the respondent |
| Age | num | age of the subject in years |
| Textbook | num | answer to the question “How much did you spend for textbooks this term (to nearest dollar)?” |
| Cigs | num | answer to the question “How many cigarettes did you smoke yesterday?” |
| ColGPA | num | answer to the question “What is your cumulative Grade Point Average at this institution?” |
| HSGPA | num | answer to the question “What was your cumulative high school GPA (4 point scale)?” |
| Height | num | height of the respondent in inches |
| Mateh | num | the height of the respondent’s “ideal mate” |
| Shoe | num | the respondent’s shoe size |
| Breakfast | char | answer to the question “Did you eat breakfast this morning?” |
| Flight | char | answer to the question “Did you fly on a commercial airline during the past 30 days? (Yes or No)” |
| Play | char | answer to the question “Have you seen a play in a live theater in the past 6 months? Yes or No” |
| Vote | char | answer to the question “Are you registered to vote? Yes or No” |
| Credit | num | answer to the question “How many credit hours are you taking this term?” |

Source of Data

This data was collected by Roger Woodard of North Carolina State University in 2005.

Student Survey 2 (Box Plot): Solution
Shape: For both genders, height is approximately symmetric.
Location: The males are centered at about 71 inches, while the females are centered at about 65 inches.
Spread: Overall, the heights of the males are more spread out than those of the females; however, the spread of the middle 50% (the box of the box plot) is about the same for both groups.
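The numbers a box plot draws (minimum, first quartile, median, third quartile, maximum) can be computed per group with Python's `statistics` module. This is a sketch under stated assumptions: the heights below are hypothetical, not the Survey data, and the `"inclusive"` quartile method is only one of several conventions, so SAS may report slightly different quartiles. The helper names are illustrative.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: the numbers a box plot draws.
    Missing values (None) are dropped first."""
    data = sorted(v for v in values if v is not None)
    # 'inclusive' is one of several quartile conventions; other software
    # may use a different default, so quartiles can differ slightly.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]

def compare_groups(pairs):
    """Group (group, value) pairs and summarize location and spread."""
    groups = {}
    for group, value in pairs:
        groups.setdefault(group, []).append(value)
    summary = {}
    for group, values in sorted(groups.items()):
        lo, q1, med, q3, hi = five_number_summary(values)
        summary[group] = {"median": med, "IQR": q3 - q1, "range": hi - lo}
    return summary

# Hypothetical (gender, height in inches) pairs; NOT the Survey responses.
heights = [("F", 60), ("F", 63), ("F", 65), ("F", 67), ("F", 70),
           ("M", 64), ("M", 69), ("M", 71), ("M", 73), ("M", 78)]

for group, summary in compare_groups(heights).items():
    print(group, summary)
```

Comparing the medians addresses location, while comparing the IQR (box length) with the range (whisker to whisker) separates the spread of the middle 50% from the overall spread.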

Muzzle Velocities 1 (Box Plot): Problem
A government bureau in charge of assessing the efficiency of firearms for law enforcement agencies performed an experiment in which the muzzle velocities of cartridges made from two types of gunpowder were recorded. The same type of firearm and cartridge was used for both types of gunpowder. Create box plots for each gunpowder type. Then, based on a comparison of the distributions, decide which gunpowder seems generally to result in a higher muzzle velocity.
SAS Institute Inc.

Muzzle Velocities 1 (Box Plot): Sample Data

The Bullets data set contains data that was collected to determine whether there is a difference in the muzzle velocity of cartridges made from two types of gunpowder. These are the variables in the data set:

| Name | Type | Description |
| --- | --- | --- |
| powder | num | type of gunpowder |
| velocity | num | muzzle velocity |

Source of Data

This data is sample data from SAS Institute Inc.

Muzzle Velocities 1 (Box Plot): Solution
It seems that the use of powder 2 generally results in a higher muzzle velocity, because the overall distribution of its values is higher.
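One informal way to decide which group is "generally higher" is to check whether its quartiles all lie above the other group's, which is what side-by-side boxes show visually. The Python sketch below implements that heuristic; the velocities are hypothetical (not from the Bullets data set), and the all-quartiles rule is an illustrative choice, not a formal statistical test.

```python
import statistics

def quartiles(values):
    """Q1, median, Q3 using the 'inclusive' method (one of several conventions)."""
    return statistics.quantiles(sorted(values), n=4, method="inclusive")

def generally_higher(by_group):
    """Return the group whose Q1, median, and Q3 all exceed those of every
    other group, or None if no group dominates; also return all quartiles."""
    qs = {group: quartiles(values) for group, values in by_group.items()}
    for group, q in qs.items():
        rivals = [other for name, other in qs.items() if name != group]
        if all(a > b for other in rivals for a, b in zip(q, other)):
            return group, qs
    return None, qs

# Hypothetical muzzle velocities keyed by powder type; NOT the Bullets data.
velocity = {1: [27.3, 28.1, 28.4, 28.9, 29.5],
            2: [28.0, 28.9, 29.4, 30.1, 30.6]}

winner, qs = generally_higher(velocity)
print("quartiles by powder:", qs)
print("generally higher: powder", winner)
```

Returning `None` when the boxes overlap in a mixed way is deliberate: in that case the box plots alone do not support calling either powder generally higher.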

Candy Bars (Pie Chart): Problem
The United States Department of Agriculture (USDA) and the Department of Health and Human Services (HHS) have suggested that a daily diet should consist of an appropriate number of calories (among other things), of which 30% or fewer should be calories from fat. For adults consuming 2,000 calories per day, this works out to about 65 grams of fat or less. A sample of various candy bars and non-bar candies (such as M&Ms and Skittles) was collected, and the nutritional facts for each candy were recorded, including total fat and saturated fat (in grams). Create a pie chart to display the number of candy bars from each brand that were included in the sample.
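The 65-gram figure follows from the 30% guideline if we assume the standard factor of about 9 kilocalories per gram of fat (a value not stated in the exercise):

$$
0.30 \times 2000\ \text{kcal} = 600\ \text{kcal from fat}, \qquad \frac{600\ \text{kcal}}{9\ \text{kcal/g}} \approx 66.7\ \text{g of fat}
$$

The 65 grams quoted above is slightly below this result, consistent with rounding down to a convenient limit.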
SAS Institute Inc.

Candy Bars (Pie Chart): Sample Data

The Candy data set contains nutritional facts about candy bars and non-bar candies such as M&Ms, Reese's Pieces, Skittles, and Super Hot Tamales. These are the variables in the data set:

| Name | Type | Description |
| --- | --- | --- |
| Brand | char | brand of candy |
| Name | char | name of candy |
| Serving_pkg | num | servings per package |
| Oz_pkg | num | ounces per package |
| Calories | num | calories |
| Total_fat_g | num | total fat content in grams |
| Saturated_fat_g | num | saturated fat content in grams |
| Cholesterol_g | num | cholesterol content in grams |
| Sodium_mg | num | sodium content in milligrams |
| Carbohydrate_g | num | carbohydrate content in grams |
| Dietary_fiber_g | num | dietary fiber content in grams |
| Sugars_g | num | sugars content in grams |
| Protein_g | num | protein content in grams |
| Vitamin_A_RDI | num | vitamin A content as a percentage of the RDI (Reference Daily Intake) |
| Vitamin_C_RDI | num | vitamin C content as a percentage of the RDI (Reference Daily Intake) |
| Calcium_RDI | num | calcium content as a percentage of the RDI (Reference Daily Intake) |
| Iron_RDI | num | iron content as a percentage of the RDI (Reference Daily Intake) |

Source of Data

This data is sample data from SAS Institute Inc.

Candy Bars (Pie Chart): Solution
It appears that the Hershey brand accounts for the largest share of the candy bars in the sample.
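A pie chart converts each brand's count into a share of the circle: the slice angle is 360 degrees times the brand's proportion of the total. The Python sketch below assumes the Brand column has already been tallied into a dictionary; the brand names and counts are placeholders, not values from the Candy data set.

```python
def pie_slices(counts):
    """Map each category to (percent of total, slice angle in degrees)."""
    total = sum(counts.values())
    return {name: (100 * n / total, 360 * n / total) for name, n in counts.items()}

# Placeholder counts for illustration only; NOT taken from the Candy data set.
brand_counts = {"Brand A": 6, "Brand B": 3, "Brand C": 3}

for name, (pct, deg) in pie_slices(brand_counts).items():
    print(f"{name}: {pct:.1f}% of the sample, {deg:.0f} degrees")

# The brand with the largest slice is the one with the highest count.
print("largest slice:", max(brand_counts, key=brand_counts.get))
```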

Car Types (Bar Chart): Problem
A set of data was collected on 116 cars from different countries, containing information such as weight, gas tank size, turning radius, horsepower, and engine displacement. Create a bar chart to display the distribution of car type (compact, sporty, small, medium, and large).
SAS Institute Inc.

Car Types (Bar Chart): Sample Data

The Cars data set contains data about cars from different countries. These are the variables in the data set:

| Name | Type | Description |
| --- | --- | --- |
| Model | char | model |
| Country | char | country |
| Type | char | type (Compact, Small, Medium, Large, Sporty) |
| Weight | num | weight |
| TurningRadius | num | turning radius |
| Displacement | num | engine displacement |
| Horsepower | num | horsepower |
| GasTank | num | capacity of gas tank |

Source of Data

This data is sample data from SAS Institute Inc.

Car Types (Bar Chart): Solution
It appears that the medium car type occurred most frequently in the sample.
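Before drawing the bar chart it helps to fix the category order, so that a type with no cars still gets a zero-height bar, and to report every type tied for the highest count. This Python sketch assumes the Type column is a list of strings spelled as in the variable description; the sample values are hypothetical, not from the Cars data set, and the function names are illustrative.

```python
from collections import Counter

# Category order as listed in the Type variable description.
CAR_TYPES = ["Compact", "Small", "Medium", "Large", "Sporty"]

def frequency_table(values, order):
    """Counts in a fixed category order; categories with no cars get 0."""
    counts = Counter(values)
    return [(category, counts[category]) for category in order]

def modal_categories(table):
    """All categories tied for the highest count (there may be more than one)."""
    top = max(n for _, n in table)
    return [category for category, n in table if n == top]

# Hypothetical sample; NOT taken from the Cars data set.
types = ["Medium", "Small", "Medium", "Compact", "Medium", "Large", "Small"]

table = frequency_table(types, CAR_TYPES)
for category, n in table:
    print(f"{category:<8} {'#' * n}")
print("most frequent:", modal_categories(table))
```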

Arrests (Stem & Leaf Plot): Problem
A division of the U.S. Department of Justice collected data on the total number of arrests in the United States for each year from 1970 to 1999. Variables in the data set include the year, the total number of arrests, the number of arrests by age group, and the total population of the U.S. on July 1 of the given year. Use a stem plot to display the distribution of the total number of arrests in the U.S. from 1970 to 1999.
SAS Institute Inc.

Arrests (Stem & Leaf Plot): Sample Data

The Totarrests data set contains data about the total number of arrests in the United States from the year 1970 to 1999. These are the variables in the data set:

| Name | Type | Description |
| --- | --- | --- |
| Year | num | year of arrest |
| TotalArrests | num | total number of arrests |
| AGE1 | num | number of arrests for age group 1 |
| AGE2 | num | number of arrests for age group 2 |
| AGE3 | num | number of arrests for age group 3 |
| AGE4 | num | number of arrests for age group 4 |
| AGE5 | num | number of arrests for age group 5 |
| ArrestRate | num | total arrests per 100 thousand in population |
| AGE1rate | num | arrests per 100 thousand in population for age group 1 |
| AGE2rate | num | arrests per 100 thousand in population for age group 2 |
| AGE3rate | num | arrests per 100 thousand in population for age group 3 |
| AGE4rate | num | arrests per 100 thousand in population for age group 4 |
| AGE5rate | num | arrests per 100 thousand in population for age group 5 |
| Population | num | total population of the U.S. on July 1 of the given year |
| AGE1pop | num | population of the U.S. in age group 1 on July 1 of the given year |
| AGE2pop | num | population of the U.S. in age group 2 on July 1 of the given year |
| AGE3pop | num | population of the U.S. in age group 3 on July 1 of the given year |
| AGE4pop | num | population of the U.S. in age group 4 on July 1 of the given year |
| AGE5pop | num | population of the U.S. in age group 5 on July 1 of the given year |

Source of Data

This data is sample data from SAS Institute Inc.
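
Arrests (Stem & Leaf Plot): Solution
This is the stem plot:

| Stem | Leaf |
| --- | --- |
| 15 | 123 |
| 14 | 00122356 |
| 13 | 8 |
| 12 | 157 |
| 11 | 679 |
| 10 | 22348 |
| 9 | 0136 |
| 8 | 167 |

Each stem represents millions of arrests and each leaf represents hundreds of thousands. To recover the approximate value behind a stem-leaf pair, read the pair as a decimal and multiply by 1,000,000; for example, stem 15 with leaf 1 represents about 15.1 × 1,000,000 = 15,100,000 arrests. The 30 leaves correspond to the 30 years from 1970 through 1999.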
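A plot of this kind can be built with a short Python sketch, assuming TotalArrests holds raw counts: the stem is the number of millions and the leaf is the hundred-thousands digit, with lower digits truncated. The values below are rebuilt from the plot itself, so they are only accurate to the nearest 100,000 below; `stem_leaf_millions` is a hypothetical helper.

```python
def stem_leaf_millions(values):
    """Stem-and-leaf rows for large counts: the stem is the number of millions
    and the leaf is the hundred-thousands digit (lower digits are truncated).
    Rows run from the largest stem down to the smallest, as in the plot above."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v // 100_000, 10)
        rows.setdefault(stem, []).append(str(leaf))
    return [f"{stem:>2} | {''.join(rows.get(stem, []))}".rstrip()
            for stem in range(max(rows), min(rows) - 1, -1)]

# Values rebuilt from the plot above (accurate only to the nearest 100,000 below).
plot = {15: "123", 14: "00122356", 13: "8", 12: "157",
        11: "679", 10: "22348", 9: "0136", 8: "167"}
arrests = [stem * 1_000_000 + int(leaf) * 100_000
           for stem, leaves in plot.items() for leaf in leaves]

for line in stem_leaf_millions(arrests):
    print(line)
```

Truncating (rather than rounding) the lower digits is the usual stem-plot convention, so a count such as 15,149,999 still lands on stem 15, leaf 1.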

Cholesterol Levels (Quantiles Plot): Problem
In a study to investigate the relationships between various factors and heart disease, blood lipid screenings were conducted on a group of patients. Three months after an initial screening, data was collected from a second screening that included information such as gender, age, weight, total cholesterol, and history of heart disease. Use a normal q-q plot to inspect the normality of total cholesterol level before we proceed with a simple linear regression with this variable as our response.
SAS Institute Inc.

Cholesterol Levels (Quantiles Plot): Sample Data

The Lipid data set contains data about blood lipid screenings and patient history. These are the variables in the data set:

| Name | Type | Description |
| --- | --- | --- |
| Name | char | name |
| Gender | char | gender |
| Age | num | age |
| Weight | num | weight at first screening |
| Cholesterol | num | total cholesterol level at first screening |
| Triglycerides | num | triglycerides level at first screening |
| HDL | num | HDL level at first screening |
| LDL | num | LDL level at first screening |
| PercentIdeal | num | percentage of ideal weight at first screening |
| Height | num | height |
| Skinfold | num | skinfold measurement |
| SystolicBP | num | systolic blood pressure |
| DiastolicBP | num | diastolic blood pressure |
| Weight3 | num | weight at 3-month screening |
| PercentIdeal3 | num | percentage of ideal weight at 3-month screening |
| Triglyceride3 | num | triglycerides level at 3-month screening |
| Cholesterol3 | num | total cholesterol level at 3-month screening |
| HDL3 | num | HDL level at 3-month screening |
| LDL3 | num | LDL level at 3-month screening |
| Exercise | num | exercise |
| Coffee | num | coffee consumption (cups per day) |
| Smoking | char | smoking behavior (none, quit, cigar, pipes, cigarettes) |
| Alcohol | char | alcohol consumption (number of drinks per day) |
| HeartDisease | char | history of heart disease |
| CholesterolLoss | num | reduction in cholesterol level between first and 3-month screening |

Source of Data

This data is sample data from SAS Institute Inc.

Cholesterol Levels (Quantiles Plot): Solution
The points on the q-q plot show no strong overall departure from a linear pattern, so it is reasonable to treat total cholesterol level as approximately normally distributed.