SAS Statistical Library  

1.


If 2% of the fuses produced are defective, the probability that in a
randomly selected sample of six there are two defectives is:


a. (6C2)((.02)**2)((.98)**4)
b. ((.02)**4)((.98)**2)
c. (6C4)((.02)**4)((.98)**2)
d. ((.98)**4)((.02)**2)
e. none of the above



Answer:

a. (6C2)((.02)**2)((.98)**4)


Use the binomial probability formula.
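
The binomial formula in choice (a) can be checked numerically. The following is a minimal Python sketch; the helper name `binomial_pmf` is introduced here for illustration and is not part of the original exercise.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Two defectives in a sample of six when 2% of the fuses are defective:
# (6C2)((.02)**2)((.98)**4)
prob_two_defective = binomial_pmf(2, 6, 0.02)
print(round(prob_two_defective, 6))
```

The result is a little over half of one percent, so two defectives in six is a rare event at this defect rate.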


2.


A company engaged in recruiting wishes to develop a questionnaire that
spreads out applicants (shows greater variation in scores than does
its standard form). A new, longer form is developed and tested by
having 16 randomly selected applicants from the current applicant
pool fill it out. (Another group of 16 randomly and independently
selected applicants fills out the standard form.)


Results obtained were:
Variance - new form S**2(1) = 40.2
Variance - standard form S**2(2) = 12.3


Regarding scores as normally distributed -
a. Test at the 5% level the claim that the variance of the new form
is greater than that of the old form.
b. Sketch the relevant F distribution and indicate the rejection
region.



Answer:

a. F(calculated) = 40.2/12.3 = 3.268
F(critical) = 2.40 at the 5% significance level with df = 15,15.


Since F(calculated) is greater than F(critical), we will reject
the null hypothesis that the new variance is less than or equal
to the old variance.


b. If available, consult file of graphs and diagrams that could not
be computerized for accompanying diagram.


3.


It is desired to see if there is a relationship in tastes for an
expensive car and owning a tri-maran. A survey of 200 upper-
upper class potential purchasers of cars and tri-marans gave
these responses:


                   want expensive    do not want       totals
                   car               expensive car

want tri-maran         100               40              140

don't want              20               40               60
tri-maran

totals                 120               80              200


Specifically, it is desired to test H(0): the desire for a tri-maran
is independent of a desire for an expensive car vs. H(1): there
is a relationship, at level ALPHA = 10%.


The correct table value (cutoff point) to use for this problem is
closest to:


a) 2.71 d) 9.21
b) 4.61 e) 6.25
c) 6.63



Answer:

a) 2.71
CHISQUARE(critical, df = 1, ALPHA = .10) = 2.70554
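
As a hedged check of the cutoff, the sketch below derives the degrees of freedom from the 2 x 2 table and obtains the critical value without a chi-square table. It relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The Pearson statistic is computed only for context; the exercise itself asks for the cutoff alone. All variable names are illustrative.

```python
from statistics import NormalDist

# Observed counts (rows: want / don't want tri-maran;
# columns: want / do not want expensive car).
observed = [[100, 40], [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Degrees of freedom for an r x c table: (r - 1)(c - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# With df = 1, the upper-ALPHA chi-square cutoff equals the square of the
# upper-ALPHA/2 standard normal quantile.
alpha = 0.10
critical_value = NormalDist().inv_cdf(1 - alpha / 2) ** 2

# Pearson chi-square statistic: sum of (observed - expected)**2 / expected.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

print(df, round(critical_value, 5), round(chi_square, 3))
```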


4.


The thickness of the individual cards produced by a certain
playing card manufacturer is normally distributed with mean =
0.01 inches and variance = 0.000052. What is the probability
that a deck of 52 cards is more than 0.65 inches in thickness?


A. .001 B. .006 C. .023 D. .036
E. .067 F. .087 G. .159 H. .184



Answer:

B. .006


MU (deck) = 52 * .01 = .52
Var(deck) = 52 * .000052 = .002704


Z = (.65 - .52)/SQRT(.002704) = .13/.052
= 2.5


P(Z > 2.5) = .0062
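
The same calculation can be sketched in Python with the standard library. It assumes, as the answer does, that card thicknesses are independent, so the means and the variances both add across the 52 cards.

```python
from math import sqrt
from statistics import NormalDist

cards = 52
card_mean = 0.01          # inches
card_variance = 0.000052  # square inches

# Sum of independent normal variables: means add and variances add.
deck = NormalDist(mu=cards * card_mean, sigma=sqrt(cards * card_variance))

prob_thicker = 1 - deck.cdf(0.65)
print(round(prob_thicker, 4))
```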


5.


A particular type of bolt is produced having diameters with mean 0.500
inches and standard deviation 0.005 inches. Nuts are also produced
having inside diameters with mean 0.505 inches and standard deviation
0.005 inches. If a nut and a bolt are chosen at random, what is the
probability that the bolt will fit inside the nut?





Answer:

Mean for the distribution of differences = .005
Standard deviation = SQRT((.005)**2/1 + (.005)**2/1) = .007071


Z = (value of interest - mean of distribution of differences) /
standard error of the distribution of differences


Z = (0 - .005)/.007071 = -.71


We want all the area to the right of -.71


= .7611 or 76%.
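
A minimal sketch of this calculation follows. It assumes, as the answer implicitly does, that both diameters are normally distributed and independent; the problem statement itself gives only means and standard deviations. The small gap from .7611 comes from rounding Z to -.71 in the table lookup.

```python
from math import sqrt
from statistics import NormalDist

bolt_mean, bolt_sd = 0.500, 0.005
nut_mean, nut_sd = 0.505, 0.005

# Difference (nut - bolt) of independent normals: means subtract,
# variances add.
difference = NormalDist(nut_mean - bolt_mean, sqrt(nut_sd**2 + bolt_sd**2))

# The bolt fits when the nut's inside diameter exceeds the bolt's diameter.
prob_fit = 1 - difference.cdf(0)
print(round(prob_fit, 4))
```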


6.


It is known that the lengths of a particular manufactured item are
normally distributed with a mean of 6 and a standard deviation of
3. If one item is selected at random, what is the probability that
it will fall between 5.7 and 7.5?



Answer:

P(5.7 < Y < 7.5) = P((5.7-6)/3 < Z < (7.5-6)/3)
= P(-.1 < Z < .5)
= .0398 + .1915
= .2313


7.


Suppose the length of life of certain kinds of batteries is normally
distributed with MU = 36 months, SIGMA = 4 months. The company
guarantees the battery to last 30 months. What proportion of the
batteries will they have to make an adjustment on?



Answer:

P(X < 30) = P(Z < (30 - 36)/4)
= P(Z < -1.5)
= .0668 or 6.68%


8.


It is found that the gyre swivit, manufactured on any given day
at Gozornenplatz, Inc.'s Swat City plant, has the following
characteristics with respect to length:


Normally distributed with MU = 3.5"
and SIGMA = .2".


Draw a picture for each of the following, and show your work.


a. What percent of a day's output lies within two standard
deviations of the mean?


b. Of a day's output of 2500 gyre swivits, how many will
measure less than 3.966"?


c. 95% of a day's output, centered around the average, will
measure between _______ and _______.


d. What percent of a day's output lies between 1 and 2.58
standard deviations above the mean?



Answer:

If available, consult file of graphs and diagrams that could not be
computerized.


a. 95.4%


P(3.3 <= X <= 3.7) = P(-2 <= Z <= 2)
= 2(P(0 <= Z <= 2))
= 2 * (.4772)
= .9544


b. 2475


Z = (X - MU)/SIGMA
= (3.966 - 3.5)/.2
= (.466/.2)
= 2.33


P(Z < 2.33) = .99
99% of 2500 = (.99)(2500) = 2475


c. 3.108, 3.892


P(0 <= Z <= ?) = .475
Therefore ? = 1.96


Lower Boundary = MU - (Z)(SIGMA)
= 3.5 - (1.96)(.2)
= 3.108


Higher Boundary = MU + (Z)(SIGMA)
= 3.5 + (1.96)(.2)
= 3.892


d. 15.4%


P(1 <= Z <= 2.58)
= P(0 <= Z <= 2.58) - P(0 <= Z <= 1)
= .4951 - .3413
= .1538
= 15.4%


9.


The U.S. Department of Commerce has just completed a sample survey of
weekly food expenditures. A simple random sample of 100 families was
taken. The average weekly food expenditure was $70.00 per week, with
a standard deviation of $8.00. You may assume expenditures in the
population to be normally distributed.


a. What proportion of the families spent $85.00 or more per week
on food? Be sure to diagram your problem solution!


b. Using the information above, find the expenditure value above
which 80% of the families lie.



Answer:

If available, consult file of graphs and diagrams that could not be
computerized.


a) Z = (X - MU)/SIGMA
= (85 - 70)/8
= 1.875


Area beyond this Z value is .0301, so 3.01% of the families spent
85 dollars or more per week.


b) A cumulative Z value such that 80% lies above it or 20% lies below
it is -.84.


Z = (X - MU)/SIGMA
-.84 = (X - 70)/8
X = 63.28


Therefore, 80% of the families lie above the expenditure value of
$63.28/week.


10.


Suppose a floor manager of a large department store is
studying buying habits of their customers.


a) If he is willing to assume that monthly income of these customers
is distributed normally, what proportion of the income should he
expect to fall in the interval determined by MU +/- 1.2(SIGMA)?


b) What proportion of the income should he expect to be greater
than MU + SIGMA?


c) Still assuming normality, what is the probability that a
customer selected at random will have an income exceeding the
population mean by 3*SIGMA?



Answer:

a) P(MU <= X <= MU + 1.2SIGMA) = .3849
P(X is in interval MU +/- 1.2SIGMA) = 2(.3849) = .7698


b) P(X > (MU + SIGMA)) = (.5 - .3413) = .1587


c) P(X > MU + (3*SIGMA)) = (.5 - .4987) = .0013


11.


Suppose a floor manager of a large department store is studying
the buying habits of the store's customers.


a) If he is willing to assume that monthly income of these
customers is distributed normally and SIGMA = $500, find
the proportion of customers exceeding the population mean
by $375.


b) Find the proportion of customers within $125 of the
population mean.



Answer:

a) Z = 375/500 = .75
P(Z > .75) = (0.5 - .2734) = .2266


b) Z = 125/500 = .25
P(-.25 <= Z <= .25) = 2(.0987) = .1974


12.


A floor manager of a large department store is studying the buying
habits of the store's customers. Suppose the manager has someone
tell him that monthly income of these customers is distributed
normally with a population mean of $600 and standard deviation of $500.


a) What proportion of the customers should he expect to have
incomes less than $600?


b) What proportion should he expect to have incomes less
than $725?



Answer:

a) .5


b) Z = (725 - 600)/500 = .25
P(Z < .25) = .5 + .0987 = .5987


13.


A company manufactures rope.  From a large number of tests over a long
period of time, they have found a mean breaking strength of 300 lbs.
and a standard deviation of 24 lbs. Assume that these values are
MU and SIGMA.


It is believed that by a newly developed process, the mean breaking
strength can be increased.


(a) Design a decision rule for rejecting the old process with an
ALPHA error of 0.01 if it is agreed to test 64 ropes.


(b) Under the decision rule adopted in (a), what is the probability
of accepting the old process when in fact the new process has
increased the mean breaking strength to 310 lbs.? Assume SIGMA
is still 24 lbs. Use a diagram to illustrate what you have done,
i.e., draw the reference distributions.



Answer:

a. One tail test at ALPHA = .01, therefore Z = 2.33.


Z = (YBAR-MU)/(SIGMA/SQRT(n))
2.33 = (YBAR-300)/(24/SQRT(64))
YBAR = 307


Decision Rule: If the mean strength of 64 ropes tested is 307
lbs. or more, we reject the hypothesis of no
improvement, i.e., we conclude that the new process
is better.


b. If available, consult file of graphs and diagrams that could not
be computerized for reference distributions.


Z = (307-310)/(24/SQRT(64)) = -1.00
Area below Z = -1.00 is 0.1587 or 15.87%


P(type II error) = 0.1587
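
The decision rule and the Type II error can be reproduced with a short sketch. It uses the exact normal quantile rather than the rounded table value 2.33, so the cutoff and the error probability differ slightly from the rounded figures above. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_old, sigma, n, alpha = 300, 24, 64, 0.01
standard_error = sigma / sqrt(n)  # 24 / 8 = 3

# One-tailed test: reject the old process when the sample mean
# exceeds this cutoff.
cutoff = mu_old + NormalDist().inv_cdf(1 - alpha) * standard_error

# Type II error: the sample mean falls below the cutoff even though
# the true mean has risen to 310 lbs.
mu_new = 310
beta = NormalDist(mu_new, standard_error).cdf(cutoff)

print(round(cutoff, 2), round(beta, 4))
```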


14.


Suppose X is the price of a certain stock exactly 6 months
from today. Assume that X is normally distributed with a mean of $30
and a standard deviation of $5.


a. Find the probability that X will be at least $30.
b. Find the probability that X will be greater than $40.
c. Find the probability that X will be between $20 and $35.
d. How many standard deviations is $38 from the mean?
e. If you paid $29 for the stock today, what is the probability that
you will make a profit if you sell the stock exactly 6 months from
today?



Answer:

a. P(X >= 30) = P(Z >= 0) = 1/2, where Z = (30 - 30)/5


b. Z = (40 - 30)/5 = 2; P(X > 40) = .5 - .4772 = .0228


c. Z(1) = (20 - 30)/5 = -2
Z(2) = (35 - 30)/5 = 1


Prob(20 < X < 35) = .4772 + .3413 = .8185


d. 8/5 = 1.6 SD's


e. Prob(X > 29) = .5 + .0793 = .5793, where Z = (29 - 30)/5 = -.2


15.


Distribution of the I.Q.'s of 4,500 employees of a company is
roughly normal with mean 104 and standard deviation 15. Find
the number of employees whose I.Q. is:


a. greater than or equal to 110
b. between 95 and 110.



Answer:

a. Z = (110 - 104)/(15) = .4
NO. = (4500)(.5 - .1554) = (4500)(.3446) = 1551 (rounded)


b. Z = (95 - 104)/(15) = -.6
NO. = (4500)(.1554) + (4500)(.2258) = 1715


16.


A certain kind of automobile battery is known to have a length of
life which is normally distributed with a mean of 1200 days and
standard deviation 100 days. How long should the guarantee be if the
manufacturer wants to replace only 10% of the batteries which are
sold?



Answer:

Z = -1.28 for 10 percent failure


-1.28 = (X - 1200)/100


X = 1072 days for guarantee


17.


A floor manager of a large department store is studying the buying
habits of the store's customers. Suppose he assumes that the monthly
income of these customers is normally distributed with a standard
deviation of 500. If he were to draw a random sample of N = 100
customers and determine their income:


a) What is the probability that the sample mean of incomes will
differ from the population mean by more than $25?


b) What is the probability that the sample mean is larger than
the population mean?


c) Could you provide a reasonable answer to (a) and (b) if
the population of incomes were not normal? Explain.



Answer:

a) SIGMA(XBAR) = SIGMA/SQRT(n)
= 500/SQRT(100)
= 50


Z = (XBAR - MU)/SIGMA(XBAR)
Z = 25/50 = .5
P(Z < -.5 or Z > +.5) = 2(.5 - .1915) = .6170


b) .5


c) Yes, the central limit theorem assures us that the
distribution of means for n = 100 is symmetrical and
approximately normal.


18.


Suppose that you work for a brewery as a clerk to receive barley
shipments. As part of your job you are to decide whether to keep
or return new shipments of barley. The criterion used for making your
decision is an estimate of the moisture content of the shipment.
If the moisture level is too high (above 17.5%) the shipment has a
good possibility of rotting before use and, therefore, a loss of
money to the company. You know from past experience that the variance
for all barley shipments is 36 and that your staff can process at the
most one sample of 9 moisture readings per shipment.


a. Propose a rule for accepting and rejecting grain shipments on the
basis of sample means where the null claim is a shipment has a
mean moisture content of 17.5% or less (H(0): MU <= 17.5%).
Let the probability of Type I error be .10.


b. When will you make incorrect decisions about a grain shipment
having MU = 17.4? What will be the probability of such an
error?


c. When will you make incorrect decisions about a grain shipment
having MU = 19? What will be the probability of such errors?


d. When will you make incorrect decisions about a grain shipment
having MU = 21? What will be the probability of such errors?



Answer:

SIGMA**2 = 36
Take a sample, n = 9
SIGMA(XBAR) = SIGMA/SQRT(n) = 6/3 = 2


a. H(0): MU <= 17.5
H(1): MU > 17.5


ALPHA = .10 implies Z = 1.28
Z = (XBAR - MU)/SIGMA(XBAR)


1.28 = (XBAR - 17.5)/2
2.56 = XBAR - 17.5


XBAR = 20.06


Reject H(0) when XBAR > 20.06.


b. I am rejecting H(0) when XBAR > 20.06, so when MU is REALLY 17.4,
I make incorrect decisions whenever XBAR > 20.06.


Z = (20.06 - 17.4)/2
Z = 1.33
Area beyond Z = 1.33 is .0918.


The probability of an incorrect decision is .0918.


c. I am rejecting H(0) when XBAR > 20.06, so when MU is REALLY 19,
I make incorrect decisions whenever XBAR <= 20.06.


Z = (20.06 - 19)/2
Z = .53
Area between mean and Z = .2019.


The probability of an incorrect decision is .5 + .2019 = .7019.


d. I am rejecting H(0) when XBAR > 20.06, so when MU is REALLY 21,
I make incorrect decisions whenever XBAR < 20.06.


Z = (20.06 - 21)/2
Z = -.47
Area beyond Z = -.47 is .3192.


The probability of an incorrect decision is .3192.
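
Parts (b) through (d) all apply one rule to different true means, so they can be sketched as a single function. The code below is an illustration under the stated assumptions (SIGMA = 6, n = 9, ALPHA = .10); the function name `error_probability` is hypothetical. Because it uses the exact normal quantile instead of 1.28, the results differ from the table-based answers in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, alpha, mu_null = 6.0, 9, 0.10, 17.5
standard_error = sigma / sqrt(n)  # 6 / 3 = 2

# Reject H(0): MU <= 17.5 when the sample mean exceeds this cutoff.
cutoff = mu_null + NormalDist().inv_cdf(1 - alpha) * standard_error


def error_probability(true_mu: float) -> float:
    """Probability of a wrong decision when the shipment mean is true_mu."""
    p_reject = 1 - NormalDist(true_mu, standard_error).cdf(cutoff)
    # Rejecting is the error when H(0) is true (Type I); failing to
    # reject is the error when H(0) is false (Type II).
    return p_reject if true_mu <= mu_null else 1 - p_reject


for mu in (17.4, 19, 21):
    print(mu, round(error_probability(mu), 4))
```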


19.


If the number of complaints which a laundry receives per day is a
random variable having a Poisson distribution with LAMBDA = 4, find
the probabilities that on a given day the laundry will receive:


a. no complaints,
b. exactly 2 complaints.



Answer:

a. P(x=0) = (4**0)(e**(-4))/0! = .018
b. P(x=2) = (4**2)(e**(-4))/2! = .147
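
Both values follow from the Poisson probability function, P(x) = (LAMBDA**x)(e**(-LAMBDA))/x!. A minimal sketch, with the helper name `poisson_pmf` chosen here for illustration:

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


prob_no_complaints = poisson_pmf(0, 4)
prob_two_complaints = poisson_pmf(2, 4)
print(round(prob_no_complaints, 3), round(prob_two_complaints, 3))
```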


20.


As a quality control inspector you have observed that wooden wheels
which are bored off-center occur about three percent of the time. If
six of these wheels are to be used on each toy truck produced by Acme
Toy Company, the probability that a given truck has no wheels off
center would be obtained by using which distribution?


(a) Normal (c) Hypergeometric
(b) Poisson (d) Binomial



Answer:

(d) Binomial


21.


Suppose that 2% of the fuses produced by a machine are defective.
If we take a sample of 6 from the machine's output, the probability
that the first four fuses are good and the last two defective is:


a. (6C4)((.98)**4)((.02)**2)
b. ((.02)**4)((.98)**2)
c. (6C4)((.02)**4)((.98)**2)
d. ((.98)**4)((.02)**2)
e. none of these



Answer:

d. ((.98)**4)((.02)**2)


The combination (6C4) is not necessary because the order GGGGDD
is specified, so only that one arrangement counts.


22.


An accounting firm processed 1000 balance sheets for its client
last year. If 20% of these are known to contain errors, what
is the probability of finding at least one error in a
sample of 4 balance sheets chosen at random with replacement?


a. .4906
b. .0016
c. .5904
d. .9984
e. none of these



Answer:

c. Let x = # balance sheets containing errors
P(x>=1) = 1-P(x=0);
P(x=0) = b(0;4,.20) = (4C0)(.20**0)(.80**4)
= .4096
P(x>=1) = 1 - .4096 = .5904


23.


We have a manufacturing process which produces good items
with probability .9. We select a sample of 15 items. Assume a
binomial experiment.


What is the probability that there is at least one good item in
the sample?


a) 15C1(.9)**1
b) 15C0(.9)**0(.1)**15
c) 1 - (15C0(.9)**0(.1)**15)
d) 1 - (15C1(.9)**1(.1)**14)
e) none of the above



Answer:

c) 1 - (15C0(.9)**0(.1)**15)


24.


The following triangle test is sometimes used to identify taste
experts. In the case of wine tasting, a test subject is presented
with three glasses of wine, two of one kind and a third glass of
another wine. The test subject is asked to identify the single
glass of wine. A test subject who merely guesses has a 1 chance
in 3 of identifying the single glass correctly. An expert wine
taster should be able to do much better. Let K stand for the
number of correct identifications made by a test subject in 10
independent triangle tests.


A test subject makes at least 5 correct identifications (k >= 5). The
descriptive level associated with this result is:


a. .076 c. .213
b. .137 d. .057



Answer:

c. .213


Descriptive level = P(5 or more correct identifications)
= P(5) + P(6) + P(7) + P(8) + P(9) + P(10)
= .1366+.0569+.0162+.0030+.0003+.0000
= .213
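
The descriptive level is an upper-tail binomial sum with n = 10 and p = 1/3, which can be sketched as follows (the names are illustrative):

```python
from math import comb

n_trials = 10
p_guess = 1 / 3  # chance of a correct identification by pure guessing


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# P(K >= 5): add the probabilities for 5, 6, ..., 10 correct answers.
descriptive_level = sum(
    binomial_pmf(k, n_trials, p_guess) for k in range(5, n_trials + 1)
)
print(round(descriptive_level, 3))
```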


25.


In a dispute over the proportion of defects in a large shipment, a
buyer claims there are 20% defective while the seller claims only
10%. To settle the dispute it is decided to take a sample of size
100 from the shipment and, if fewer than 15 defectives are found,
to rule in favor of the seller. (Note: the shipment is so large that
sampling can be considered to be with replacement.)


a. What is the probability of ruling in favor of the seller if
he is correct?
b. What is the probability of ruling in favor of the seller if
the buyer is correct?



Answer:

a. If the seller is correct -


Using normal approximation: p = .1, np = 10, SIGMA = SQRT(npq) = 3
Prob(Z < (14.5 - 10)/3) = .9332


Using binomial with n = 100 and p = .1:
P(X < 15) = .9274


b. If the buyer is correct -


Using normal approximation: p = .2, np = 20, SIGMA = 4
Prob(Z < (14.5 - 20)/4) = .0845


Using binomial with n = 100 and p = .2:
P(X < 15) = .0804
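
The exact binomial probabilities and their normal approximations can be compared side by side. This sketch assumes the continuity-corrected cutoff of 14.5 used in the answer; the helper names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation to P(X <= k) with continuity correction."""
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(k + 0.5)


# "Fewer than 15 defectives" means X <= 14.
for p in (0.1, 0.2):
    exact = binomial_cdf(14, 100, p)
    approx = normal_approx_cdf(14, 100, p)
    print(p, round(exact, 4), round(approx, 4))
```

The two methods agree to about two decimal places here, which is typical when np and nq are both at least 10.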


26.


Suppose that it is known that one out of ten undergraduate college
textbooks is an outstanding financial success. A publisher has
selected four new textbooks for publication. What is the
probability that:


a. Exactly one will be an outstanding financial success?


b. at least one?


c. at least two?



Answer:

a. P = (4C1)(.1**1)(.9**3) = .2916


b. P = 1 - (4C0)(.9**4) = .3439


c. P = 1 - .6561 - .2916 = .0523


27.


The Liddalol Airline Company runs an airline from New York to
Boston. Its planes carry a maximum of 90 passengers. Knowing
that not all persons who reserve seats will actually use them,
they accept 100 reservations for each flight. The company has
determined that 80% of the persons who make reservations
actually use them. Assuming that 100 reservations are made for a
particular flight, find the probability that some passengers
will not get seats. What two assumptions do you have to make?



Answer:

p = .8
1 - p = .2
n = 100


Using normal approximation to binomial:


XBAR = np = 100*.8 = 80
S**2 = npq = 100*.8*.2 = 16
S = 4


P(X > 90) = P(Z > (90 - 80)/4)
= P(Z > 2.5)
= .0062


The two assumptions are:


1. That the decisions for individuals to use their reservations
are independent


2. That the probability that a person who makes a reservation
actually uses it remains constant from person to person.


28.


The Noglow automatic cigarette lighter is claimed to light 80% of
the time when the button is pushed. If this is true, and if the
lighter is tried 25 times:


a. What is the probability of getting exactly 20 lights?


b. What is the probability of getting fewer than 17 lights?


c. What is the probability of getting no lights on the first
4 trials?



Answer:

a. P(X = 20) = (25C20)(.8**20)(.2**5)
= .196


b. P(X < 17) = P(X <= 16) = .047 (use table)


c. (.2**4) = .0016


29.


A floor manager of a large department store is studying habits of their
customers. One aspect of this research pertains to residence location
of customers.


a) If 1/2 of the customers live outside the city, what is the
probability that 4 customers selected at random will all live inside
the city?


b) Continuing to suppose that 1/2 live outside, what is the probability
that 3 or fewer in a random sample of size n = 10 will live outside?


c) If 1/2 live outside, the probability is 0.10 that the random sample
of size n = 100 will contain ____ or fewer persons living outside.



Answer:

a) (1/2)**4 = 1/16


b) Let X = number chosen who live outside
P(X<=3) = b(0; 10, .5) + b(1; 10, .5) + b(2; 10, .5) + b(3; 10, .5)
= .001 + .0098 + .0439 + .1173
= 0.172


c) P(X<=?) = .10 = b(?; 100, 1/2)
Using the fact that for large n and p and q not too close to zero,
the binomial distribution can be closely approximated by a normal
distribution where
Z = (X - np)/SQRT(npq)


Therefore
X = (Z*SQRT(npq)) + np
= (-1.28*SQRT(100*.5*.5)) + (100*.5)
= (-1.28 * 5) + 50
= -6.4 + 50
= 43.6
X == 44


30.


A salesman has found that, on the average, the probability of a sale
on a single contact is .3. If the salesman contacts 50 customers,
what is the probability that at least 10 will buy? Write an exact
expression for the probability and then obtain an approximate
numerical value using the normal approximation.



Answer:

Exact Expression:


p = .3, q = 1 - p = .7, n = 50


p(X >= 10) = SUM(X = 10, 50)((50CX)(.3**X)(.7**(50 - X))) = .9598


Using normal approximation:


XBAR = n*p = 50*.3 = 15
SIGMA = SQRT(npq) = SQRT(50*.3*.7) = 3.24


Z = (10 - 15)/3.24 = -1.54
P(Z >= -1.54) = .4382 + .5000
= .9382


If you use the correction factor, Z = -1.70 and P(Z >= -1.70) = .9554.


31.


A machine produces bolts in a length (in inches) found to
obey a normal probability law with mean MU = 5 and standard
deviation SIGMA = 0.1. The specifications for a bolt call for
items with a length (in inches) equal to 5 +/- 0.15.
A bolt not meeting these specifications is called defective.


a. What is the probability that a bolt produced by this
machine will be defective?


b. If a sample of 10 bolts is chosen at random what is
the probability that there will be at least two
defective bolts?



Answer:

X = length of bolt


a. P(defective) = 1 - P(4.85 < X < 5.15)
= 1-P((4.85 - 5)/.1 < (X - 5)/.1 < (5.15 - 5)/.1)
= 1 - P(-1.5 < Z < 1.5)
= 1 - .8664
= .1336


b. Y = number of defectives in a sample of size 10


Then Y has a binomial distribution with parameters n = 10,
P = .1336.


Hence, P(at least 2 are defective) = P(Y >= 2)
= 1 - P(Y = 0) - P(Y = 1)
= 1 - (10C0)*(P**0)*((1 - P)**10) - (10C1)*P*((1-P)**9)
= 1 - (.8664**10) - 10*(.1336)*(.8664**9)
= .394
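
The two-stage calculation (a normal probability feeding a binomial) can be sketched as follows. The variable names are illustrative, and the code assumes the bolts in the sample are independent, as the binomial model requires.

```python
from math import comb
from statistics import NormalDist

length = NormalDist(mu=5, sigma=0.1)

# Part a: a bolt is defective when its length falls outside 5 +/- 0.15.
p_defective = 1 - (length.cdf(5.15) - length.cdf(4.85))

# Part b: number of defectives in 10 bolts is Binomial(10, p_defective);
# P(Y >= 2) = 1 - P(Y = 0) - P(Y = 1).
n = 10
p_at_most_one = sum(
    comb(n, k) * p_defective**k * (1 - p_defective) ** (n - k) for k in (0, 1)
)
p_at_least_two = 1 - p_at_most_one

print(round(p_defective, 4), round(p_at_least_two, 3))
```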


32.


A factory finds that, on the average, 20% of the bolts produced by a
given machine are defective. If 10 bolts are selected at random from
the day's production, find the probability that:


a) exactly 2 will be defective.
b) 2 or more will be defective.



Answer:

a) P(X = 2) = (nCX)(p**X)(q**(n - X))
= (10C2)(.2**2)(.8**(10 - 2))
= .3020


b) P(X >= 2) = 1 - P(X <= 1)
= 1 - (P(1) + P(0))
= 1 - (((10C1)(.2**1)(.8**(10 - 1)))
+ ((10C0)(.2**0)(.8**(10 - 0))))
= 1 - (.2684 + .1074)
= .6242


33.


Seventy-five percent of the Ford autos made in 1976 are falling
apart. Determine the probability distribution of the number of Fords
in a sample of 4 that are falling apart. Draw a histogram of the
distribution. What are the mean and variance of the distribution?



Answer:

Let X = the number of Fords falling apart in a sample of four.


probability distribution: (binomial distribution with n=4 and p=.75)


X | p(X)
--+----------------------------------
0 | 0.0039 = (4C0)(.75**0)(.25**4)
1 | 0.0469 = (4C1)(.75**1)(.25**3)
2 | 0.2109 = (4C2)(.75**2)(.25**2)
3 | 0.4219 = (4C3)(.75**3)(.25**1)
4 | 0.3164 = (4C4)(.75**4)(.25**0)


[Histogram: P(X) on the vertical axis (ticks at 0.1 through 0.6) and
X = 0, 1, 2, 3, 4 on the horizontal axis; the bar heights are the p(X)
values in the probability distribution above.]



mean = np = 4*.75 = 3
variance = npq = 4*.75*.25 = .75
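
The distribution, mean, and variance can be verified directly from the definition, without the np and npq shortcuts. A minimal sketch with illustrative names:

```python
from math import comb

n, p = 4, 0.75

# Probability distribution of X = number of Fords falling apart.
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# Mean and variance computed from the definition.
mean = sum(x * prob for x, prob in pmf.items())
variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())

for x, prob in pmf.items():
    print(x, round(prob, 4))
print(mean, variance)
```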


34.


The probability that a particular kind of machine, used in production,
breaks down during a one week period is 0.2. If a company has 10 of
these machines, what is the probability of having:
a. at least two breakdowns during a given week?
b. 3,4, or 5 breakdowns during a given week?
c. How many breakdowns should the company expect to have over
a one month (4 weeks) time period?
d. Now suppose that during a particular week six machines
break down. Do you have reason to believe that the
breakdown rate may have increased above the 0.20 rate? State
reasoning.
e. What assumptions are necessary in making the above
calculations of probability?



Answer:

x = # of breakdowns during a given week
x has a binomial distribution with n=10, p=.2


a. P(x>=2) = 1 - [b(0;10,.2) + b(1;10,.2)]
= 1 - [.376]
= .624
b. P(3 <= x <= 5) = b(3;10,.2) + b(4;10,.2) + b(5;10,.2)
= .2013 + .0881 + .0264
= .3158
c. E[x] = np = 10(.2) = 2 for 1 week
for 4 weeks = 4(2) = 8
d. The probability of 6 or more breakdowns given p = .2, n = 10
is .0064. Since the probability that this event occurred by
chance is so small, there appears to be some indication that
the breakdown rate has increased above .2.
e. 1) The breakdown of one machine is not related to the
condition of any of the other machines; in other words, machine
breakdowns are independent.
2) The probability of a breakdown is the same for each of the
machines.


35.


A large TV retailer in San Francisco claims that 80 percent of all
service calls on color television sets are concerned with the small
receiving tube. Test this claim against the alternative PI =/= 0.80
at ALPHA = 0.05 if a random sample of 222 calls on color television
sets included 167 which were concerned with the small receiving tube.



Answer:

Do not reject the null hypothesis that the proportion is .80.


Two methods of solution:
1. Using the sample proportion:


H(0): PI = .80
H(A): PI =/= .80


Z = ((167/222) - .80)/SQRT(PI*Q/n)
= (.752 - .80)/SQRT(.00072)
= -1.779


2. Using the normal approximation to the binomial:


Mean = n*PI = 222 * .8 = 177.6
Variance = n*PI*Q = 222*.8*.2 = 35.52


H(0): MU = 177.6
H(A): MU =/= 177.6


Z = (167 - 177.6)/SQRT(35.52)
= -10.6/5.9599
= -1.779


The critical Z value for both of these cases is:
Z(ALPHA/2 = .025) = 1.96


Since the calculated Z value in both cases (-1.779) is smaller in
absolute value than the critical Z value (1.96), we do not reject the
null hypothesis that the proportion is equal to .80.
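
Both forms of the test statistic can be computed in a few lines, which also shows that they are algebraically the same quantity. The two-sided p-value is an addition for illustration; the original answer compares Z with the critical value only.

```python
from math import sqrt
from statistics import NormalDist

n, tube_calls, pi_null, alpha = 222, 167, 0.80, 0.05

# Method 1: standardize the sample proportion.
z_proportion = (tube_calls / n - pi_null) / sqrt(pi_null * (1 - pi_null) / n)

# Method 2: standardize the count.
z_count = (tube_calls - n * pi_null) / sqrt(n * pi_null * (1 - pi_null))

critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_null = abs(z_proportion) > critical
p_value = 2 * NormalDist().cdf(-abs(z_proportion))

print(round(z_proportion, 3), round(critical, 2), reject_null, round(p_value, 4))
```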


36.


You, as a manufacturer, can use a particular part only if its
diameter is between .14 and .20 inches. Two companies, A and B, can
supply you with these parts at comparable costs. Supplier A produces
parts whose mean is .17 and whose standard deviation is .015 inches.
However, supplier B produces parts whose mean is .16 inches and whose
standard deviation is .012. The diameters of the parts from each
company are normally distributed. Which company should you buy from
and why?



Answer:

For Supplier A:


Z = (X - MU)/SIGMA
= (.14 - .17)/.015
= -2


and Z = (.20 - .17)/.015
= 2


Area between Z = 2 and Z = -2 under the normal curve is .9544.
Therefore, 95.44% of the parts would be within .14 in. and .20 in.


For Supplier B:


Z = (.14 - .16)/.012
= -1.67


and Z = (.20 - .16)/.012
= 3.33


Area between Z = 3.33 and Z = -1.67 under the normal curve is .9520.
Therefore, 95.20% of the parts would be within .14 in. and .20 in.


Conclusion: I would choose Supplier A by a hair.


37.


A lightbulb is selected randomly from a factory's monthly production.
The bulb's lifetime (total hours of illumination) is a random variable
with exponential density function
f(x) = (1/MU)*(e**(-x/MU)) if x >= 0
= 0 if x < 0,
where the fixed parameter MU is the mean of this distribution (MU > 0).


a) Derive the cumulative distribution function F(x).
Show that a random lifetime X exceeds x hours (x > 0) with
probability
P(X > x) = e**(-x/MU)
b) Let M denote the smallest value in a random sample of n bulb
lifetimes X(1), X(2), ..., X(n).
Show that P(M > x) = P(X(1) > nx).
HINT: M > x if and only if X(1) > x and X(2) > x and ...
and X(n) > x.
c) Assume the mean lifetime MU = 700 hours.
Use a) and a table of the exponential function to evaluate
numerically
i) the median lifetime x(.50),
ii) P(X <= 70),
iii) P(70 < X <= 700).



Answer:

a) For x >= 0,
F(x) = INT(x/0)((1/MU)*(e**(-t/MU)) dt)
= [-e**(-t/MU)] evaluated from t = 0 to t = x
= 1.0 - e**(-x/MU)


F(x) = [ 0; x < 0
       [ 1.0 - e**(-x/MU); x >= 0


Prob(X > x) = 1.0 - F(x)
= 1.0 - [1.0 - e**(-x/MU)]
= e**(-x/MU)


b) Because the n lifetimes are independent,
Prob(M > x) = [Prob(X(1)>x)]*[Prob(X(2)>x)]*...*[Prob(X(n)>x)]
= [e**(-x/MU)]**n
= e**(-nx/MU)
= Prob(X(1) > nx)


c) i) 0.50 = Prob(X <= Median)
= F(x)
= 1.0 - e**(-x/700)
so e**(-x/700) = 0.50
using a table of the exponential function
x/700 == .693
x == 485.1 hours
ii) Prob(X <= 70) = F(70)
= 1.0 - e**(-70/700)
= 1.0 - 0.90484
= 0.09516
iii) Prob(70 < X <= 700) = F(700) - F(70)
= [1.0 - e**(-700/700)] - [1.0 - e**(-70/700)]
= [1.0 - .36788] - [0.09516]
= 0.53696
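
The results above can be checked with a short sketch built on the survival function P(X > x) = e**(-x/MU). The function name `survival` and the sample values n = 5 and x = 100 used to illustrate part (b) are assumptions made here, not part of the exercise.

```python
from math import exp, isclose, log

MU = 700.0  # mean lifetime in hours


def survival(x: float, mu: float = MU) -> float:
    """P(X > x) for an exponential lifetime with mean mu (x >= 0)."""
    return exp(-x / mu)


# Part b, illustrated for n = 5 bulbs and x = 100 hours:
# P(min > x) = P(X > x)**n = P(X > n*x).
n, x = 5, 100.0
assert isclose(survival(x) ** n, survival(n * x))

# Part c.
median = MU * log(2)                     # solves e**(-x/MU) = 0.50
p_at_most_70 = 1 - survival(70)          # F(70)
p_between = survival(70) - survival(700)  # F(700) - F(70)

print(round(median, 1), round(p_at_most_70, 5), round(p_between, 5))
```

The median from `log(2)` is 485.2 hours; the 485.1 above reflects rounding ln 2 to .693.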


38.


A lightbulb is selected randomly from a factory's monthly production.
The bulb's lifetime (total hours of illumination) is a random variable
with exponential density function
f(x) = (1/MU)*(e**(-x/MU)) if x >= 0
= 0 if x < 0,
where the fixed parameter MU is the mean of this distribution (MU>0).
a) For an exponential distribution the standard deviation SIGMA = MU.
Let XBAR = (1/n)(X(1)+X(2)+...+X(n)) denote the average value in
a random sample of n bulb lifetimes. Express E[XBAR] and VAR[XBAR]
in terms of MU. If the mean MU = 700 hours and sample size n = 100,
then the statistic Z=(XBAR-700)/70 has approximately a normal
distribution with what mean and variance?
b) Describe a test of the null hypothesis H(0): MU <= 700 against the
alternative hypothesis H(1): MU > 700, using only the sample mean
XBAR. If the desired significance level is ALPHA = .05 and sample
size n = 100, then indicate which numerical values of XBAR
correspond to this test rejecting H(0).
(Use the table of the standard normal distribution.)
c) If mean MU = 700 hours, then P(X > 2100) = .04979. If instead
MU > 700, is P(X > 2100) larger or smaller than .04979?



Answer:

a) E[XBAR] = E[(1/n)*(X(1)+X(2)+...+X(n))]
= (1/n)*[E[X(1)]+E[X(2)]+...+E[X(n)]]
= (1/n)*[n*E[X]]
= E[X]
= INT(INFNTY/0)(x*(1/MU)*(e**(-x/MU)) dx)


Integrating by parts, with
u = x           dv = (1/MU)*(e**(-x/MU)) dx
du = dx         v = -e**(-x/MU)


E[X] = [-x*(e**(-x/MU))] evaluated from 0 to INFNTY
       - INT(INFNTY/0)(-e**(-x/MU) dx)
= [-MU*(e**(-x/MU))] evaluated from 0 to INFNTY
= MU


E[X**2] = INT(INFNTY/0)((x**2)*(1/MU)*(e**(-x/MU)) dx)


By parts, with
u = x**2        dv = (1/MU)*(e**(-x/MU)) dx
du = 2x dx      v = -e**(-x/MU)


E[X**2] = [(x**2)*(-e**(-x/MU))] evaluated from 0 to INFNTY
          - INT(INFNTY/0)((2x)*(-e**(-x/MU)) dx)
= -2*INT(INFNTY/0)(x*(-e**(-x/MU)) dx)


By parts again, with
u = x           dv = -e**(-x/MU) dx
du = dx         v = MU*(e**(-x/MU))


E[X**2] = -2*[[x*MU*(e**(-x/MU))] evaluated from 0 to INFNTY
          - INT(INFNTY/0)(MU*(e**(-x/MU)) dx)]
= [-2*(MU**2)*(e**(-x/MU))] evaluated from 0 to INFNTY
= 2*(MU**2)


VAR[XBAR] = VAR[(1/n)*(X(1)+X(2)+...+X(n))]
= [(1/n)**2]*[VAR[X(1)]+VAR[X(2)]+...+VAR[X(n)]]
= [(1/n)**2]*[n*VAR[X]]
= (1/n)*(VAR[X])
= (1/n)*[E[X**2]-(E[X]**2)]
= (1/n)*[2*(MU**2)-(MU**2)]
= (MU**2)/n


Z = (XBAR-700)/70
E[Z] = (E[XBAR]-700)/70
= (MU-700)/70
= (700-700)/70
= 0/70
= 0


VAR[Z] = VAR[(XBAR-700)/70]
= [(1/70)**2] * VAR[XBAR]
= [(1/70)**2] * [(MU**2)/n]
= [1/4900] * [(700**2)/100]
= 1


b) test statistic: Z = (XBAR-700)/(700/SQRT(n))
critical region: Any value of Z(calc) that lies beyond the Z(crit)
which is found in the standard normal table with a proportion
ALPHA of the distribution beyond it.


with n = 100 and ALPHA = .05, Z(crit) = 1.645


Thus in order to reject H(0),
(XBAR-700)/(700/SQRT(100)) >= 1.645
XBAR >= (1.645*70) + 700
XBAR >= 815.15


c) It can be shown that a random lifetime X exceeds x hours (x > 0)
with probability
P(X > x) = e**(-x/MU)
Therefore, with MU = 700,
P(X > 2100) = e**(-2100/700)
= e**(-3)
= .04979
Now if MU > 700, the exponent -2100/MU is closer to zero (less
negative) than -3, and a table of the exponential function shows
that the probability becomes larger than .04979.


39.


In a given business venture a man can make a profit of $1000 or
suffer a loss of $500. The probability of a profit is 0.6. What
is the expected profit (or loss) in that venture?



Answer:

p = .6
Expected profit = (.6*1000) - (.4*500)
= 600 - 200 = 400


40.


The following probability distribution applies to the value of a stock
during the coming year:


VALUE P(VALUE)
100 .46
150 .04
200 .20
250 .20
300 .10


Compute the expected value of the stock. What interpretation would
you give to this value?



Answer:

E(V) = (100 * .46) + (150 * .04) + (200 * .20) + (250 * .20)
+ (300 * .10)
= 46 + 6 + 40 + 50 + 30
= 172


The $172 is the average, or mean, of the distribution of stock values.


41.


Suppose that the probability that a salesman makes a sale to any
customer is .4. If each sale is worth $100 in commissions and the
events of making a sale to two different customers are independent,
what is his expected commission if he sees two customers on a parti-
cular day?



Answer:

If we let X = commissions that day, the probability distribution
for X is:


X ^ p(X)
_________________
0 ^ .6 * .6 = .36
$100 ^ (.6 * .4) + (.4 * .6) = .48
$200 ^ (.4 * .4) = .16


E(X) = (0 * .36) + (100 * .48) + (200 * .16)
= 48 + 32
= $80
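
The probability table above can be rebuilt by listing every sale/no-sale outcome for the two customers. This is a small Python sketch under the stated assumptions (independent customers, P(sale) = .4, $100 per sale); the names are illustrative only.

```python
from itertools import product

p_sale = 0.4       # probability of a sale to any one customer
commission = 100   # dollars earned per sale

# Accumulate P(X = x) over all outcomes for two independent customers.
distribution = {}
for outcome in product([0, 1], repeat=2):   # 1 = sale, 0 = no sale
    prob = 1.0
    for sold in outcome:
        prob *= p_sale if sold else 1 - p_sale
    total = commission * sum(outcome)
    distribution[total] = distribution.get(total, 0.0) + prob

expected_commission = sum(x * p for x, p in distribution.items())

for x in sorted(distribution):
    print(x, round(distribution[x], 2))
print("E(X) =", round(expected_commission, 2))
```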


42.


An investor wishes to buy a stock and sell it three months later.
After much investigation he has narrowed his possible choices to
five different stocks and decides to pick one of these at random.
The first stock has one chance in four of losing value. The
second stock has one chance in three of losing value. The third
stock has two chances out of nine of losing value. The fourth
and the fifth stocks both have three chances out of ten of losing
value. What is the probability that the investor loses money on
his investment?



Answer:

P(picking a particular stock) = .2


Given: First Stock: P(losing) = .25
Second Stock: P(losing) = .33
Third Stock: P(losing) = .22
Fourth Stock: P(losing) = .3
Fifth Stock: P(losing) = .3


Probability that the investor loses money on his
investment = E[P(losing)]


E[P(losing)] = (.25*.2) + (.33*.2) + (.22*.2) + (.3*.2) + (.3*.2)
= .28
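
The same total-probability calculation can be carried out with exact fractions, which avoids rounding 1/3 and 2/9. A minimal Python sketch (the names are illustrative):

```python
from fractions import Fraction

# P(losing value) for each of the five stocks, as stated in the question.
loss_probs = [
    Fraction(1, 4),
    Fraction(1, 3),
    Fraction(2, 9),
    Fraction(3, 10),
    Fraction(3, 10),
]

# Each stock is equally likely to be picked.
p_pick = Fraction(1, len(loss_probs))

# Law of total probability: sum of P(pick stock) * P(loss | stock).
p_loss = sum(p_pick * p for p in loss_probs)

print(p_loss, round(float(p_loss), 2))
```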


43.


Suppose an investor buys a stock for $100 per share with intentions
of selling it three months later. At the end of three months he has
one chance in four of selling for $80 per share, one chance in four
of selling for $100 per share and one chance in two of selling for
$140 per share. How much per share can the investor expect to make
on this stock when he sells it?



Answer:

Let X = amount made by investor when he sells stock.


x ^ p(x)
_____________
-20 ^ .25
0 ^ .25
40 ^ .5


E(X) = (-20*.25) + (0*.25) + (40*.5)
= -5 + 0 + 20
= 15


The investor can expect to make $15 per share on this stock when he
sells it.


44.


Joe Pennyworth has a very rare 1919S-VDB penny.   He  is  considering
accepting a firm offer of $8 for the penny or putting it up for
auction at the local numismatic club. His possible actions are:


a(1): put the penny up for auction
a(2): accept $8 for the penny


An estimate of the probability distribution for the sales price at the
auction is given to Joe as:


Sales Price Probability
----------- -----------
$ 6 .10
7 .20
8 .30
9 .30
10 .10


Joe has determined his utility function as:


Dollar Value Utility
------------ -------
$ 6 1.0
7 2.5
8 4.0
9 4.5
10 5.0


a. If Joe evaluates the problem by considering expected monetary values
what is his decision?


b. If Joe evaluates the problem by considering expected utilities, what
is his decision?


c. Plot the utility function.


d. Is Joe a risk preferrer?



Answer:

a. Expected profit if the penny is put up for auction:


E[a(1)] = [6*.10]+[7*.20]+[8*.30]+[9*.30]+[10*.10]
= .60 + 1.40 + 2.40 + 2.70 + 1.00
= $8.10


Expected profit if he accepts the offer:


E[a(2)] = [8.00*1.00]
= $8.00


Since E[a(1)] > E[a(2)], he should put the penny up for auction.


b. Expected utility of putting the penny up for auction:


Exp. Ut. [a(1)] = [1*.10]+[2.5*.20]+[4*.30]+[4.5*.30]+[5*.10]
= 3.65


Expected utility of accepting offer:


Exp. Ut. [a(2)] = [4*1.00]
= 4


Since Exp. Ut.[a(2)] > Exp. Ut.[a(1)], he should accept the offer of
$8 for the penny.


c. [Plot: Utility (vertical axis, 1 to 5) against Dollar Value
(horizontal axis, 1 to 10). Plot the points (6, 1.0), (7, 2.5),
(8, 4.0), (9, 4.5) and (10, 5.0) and connect them with a smooth
curve.]


d. No, a risk avoider.
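
Parts a and b of this answer compare the same two actions under two criteria. The Python sketch below is an illustration with made-up variable names, not part of the original answer; it computes both criteria from the tables in the question.

```python
prices = [6, 7, 8, 9, 10]
probs = [0.10, 0.20, 0.30, 0.30, 0.10]
utility = {6: 1.0, 7: 2.5, 8: 4.0, 9: 4.5, 10: 5.0}
offer = 8   # the firm offer, in dollars

# Expected monetary value of each action.
emv_auction = sum(x * p for x, p in zip(prices, probs))
emv_offer = float(offer)

# Expected utility of each action.
eu_auction = sum(utility[x] * p for x, p in zip(prices, probs))
eu_offer = utility[offer]

print("EMV:", round(emv_auction, 2), "vs", emv_offer)
print("Expected utility:", round(eu_auction, 2), "vs", eu_offer)
```

The two criteria disagree here: the auction has the higher expected monetary value, while the firm offer has the higher expected utility.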


45.


A lot containing 12 parts among which 3 are defective is put on  sale
"as is" at $10.00 per part with no inspection possible. If a
defective part represents a complete loss of the $10.00 to the buyer
and the good parts can be resold at $14.50 each, is it worthwhile to
buy one of these parts and select it at random?



Answer:

Expected return value of part = .75*(14.50) + .25(0) = 10.875


Therefore, you expect to gain approximately $.87 on each part you buy,
and it is worthwhile to buy one selected at random.


46.


The Connecticut Daily Numbers game uses a selection procedure
very similar to the one described in the following paragraph
for the selection of random numbers.


There are 10 identical ping pong balls on which the
digits 0, 1, ..., 9 have been written. After mixing
the balls thoroughly in a box, one is selected with-
out looking. The digit written on the ball is record-
ed, and then the ball is put back in the box. This
whole process of mixing, selecting, writing down a
digit, and returning the ball to the box is repeated
again and again.


The first 100 digits selected by the Connecticut Lottery showed
the following distribution:


digit: 0 1 2 3 4 5 6 7 8 9


number of
occurrences: 23 7 5 8 12 11 8 9 7 10


Test the hypothesis that all 10 digits are equally likely.
The descriptive level, DELTA, for the test satisfies:


a. DELTA < .01 c. .025 < DELTA < .10
b. .01 < DELTA < .025 d. .10 < DELTA



Answer:

a. DELTA < .01


H(O): All digits are equally likely to occur. (This defines
a Goodness of Fit Test for a uniform distribution.)


Expected frequency for each cell is 10.


CHISQ(calc) = [[(23-10)**2]+[(7-10)**2]+[(5-10)**2]+[(8-10)**2]+
[(12-10)**2]+[(11-10)**2]+[(8-10)**2]+[(9-10)**2]+
[(7-10)**2]+[(10-10)**2]]/10
= 22.6


The probability of obtaining a CHISQ value of 22.6 or more is less
than .01. (This DELTA value was obtained from a table of the
CHISQUARE distribution with 9 df.)
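
The goodness-of-fit statistic can be reproduced in a few lines. This is a minimal Python sketch; the critical value 21.666 is the tabled upper 1% point of the chi-square distribution with 9 df and is entered by hand as an assumption rather than computed.

```python
observed = [23, 7, 5, 8, 12, 11, 8, 9, 7, 10]

# Under H(O) every digit is equally likely, so each expected count
# is the total divided by the number of digits.
expected = sum(observed) / len(observed)

chisq = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1

crit_01 = 21.666   # tabled upper 1% point, 9 df (entered by hand)

print(round(chisq, 1), df, chisq > crit_01)
```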


47.


Among twenty-five articles, nine are defective, six having only minor
defects and three having major defects. Determine the probability
that an article selected at random has major defects given that it
has defects.


a. 1/3
b. .25
c. .24
d. .08



Answer:

a. 1/3


P(MD | D) = (3/25)/(9/25)
= 3/9
= 1/3


48.


Among twenty-five articles eight are defective, six having only
minor defects and two having major defects. Determine the pro-
bability that an article selected at random has major defects
given that it has defects.


(a) .08 (c) 1/3
(b) .25 (d) .24



Answer:

(b) .25


P(MD | D) = P(MD and D)/P(D)
= (2/25)/(8/25)
= .25


49.


The following table shows the composition of employees at Dwinal's
Inn.


(FT) ^ (PT) ^
Full-time ^ Part-time ^ TOTAL
------------------------------------
Waiters (W) 20 ^ ^ 30
-------------------------------------
Bartenders (B) ^ ^
-------------------------------------
Cooks (C) 10 ^ ^ 15
-------------------------------------
TOTAL ^ 15 ^ 50



a) Complete the above table.


b) From this table structure (form) another table showing all
the marginal and joint probabilities.


c) Find the following conditional probabilities:


i) P(PT | W) = ?
ii) P(B | FT) = ?
iii) P(B | C) = ?
iv) P(C | PT) = ?



Answer:

a)
(FT) ^ (PT) ^
Full-time ^ Part-time ^ TOTAL
------------------------------------
Waiters (W) 20 ^ 10 ^ 30
------------------------------------
Bartenders (B) 5 ^ 0 ^ 5
------------------------------------
Cooks (C) 10 ^ 5 ^ 15
------------------------------------
TOTAL 35 ^ 15 ^ 50



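b)                    (FT)   ^   (PT)   ^  Marginal
                   ----------------------------------
   Waiters (W)         .4    ^    .2    ^     .6
                   ----------------------------------
   Bartenders (B)      .1    ^   0.00   ^     .1
                   ----------------------------------
   Cooks (C)           .2    ^    .1    ^     .3
                   ----------------------------------
   Marginal            .7    ^    .3    ^    1.0

   (The six interior cells are joint probabilities; the row and
   column totals are marginal probabilities.)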


c) i) P(PT | W) = P(PT INTRSCT W)/P(W)
= .2/.6
= .33


ii) P(B | FT) = P(B INTRSCT FT)/P(FT)
= .1/.7
= .143


iii) P(B | C) = P(B INTRSCT C)/P(C)
= 0.0/.3
= 0


iv) P(C | PT) = P(C INTRSCT PT)/P(PT)
= .1/.3
= .33


50.


A certain kind of job opening can be filled by hiring only  either  a
high school graduate or a college graduate. In the past all hirings
for the job have resulted in 80% of the hirings being successful
according to the company's evaluation. It is also known that among
all failures at the job, 40% have been high school graduates, while
among all successes only 30% have been high school graduates. Using
the percentages as probabilities, find the probability that if the
job is given to a high school graduate it will be successful.



Answer:

Using Bayes' Law:


P(S) = .8     P(HS | F) = .4     P(HS | S) = .3


P(S | HS) = [P(S)*P(HS | S)]/[P(S)*P(HS | S) + P(F)*P(HS | F)]
= ((.8)(.3))/((.8)(.3) + (.2)(.4)) = .75


or the population breaks down as follows:


Successful Failure
------------------------
High school graduate ^ 24% ^ 8% ^ 32%
------------------------
College graduate ^ 56% ^ 12% ^ 68%
------------------------
80% 20%


Therefore, Prob(successful | high school graduate) = 24/32
= .75
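
The Bayes' Law calculation can be written out step by step. A minimal Python sketch with illustrative variable names, using only the three percentages given in the question:

```python
p_s = 0.8             # P(success)
p_hs_given_s = 0.3    # P(high school graduate | success)
p_hs_given_f = 0.4    # P(high school graduate | failure)

p_f = 1 - p_s         # P(failure)

# Total probability that a hire is a high school graduate.
p_hs = p_s * p_hs_given_s + p_f * p_hs_given_f

# Bayes' Law: P(success | high school graduate).
p_s_given_hs = p_s * p_hs_given_s / p_hs

print(round(p_hs, 2), round(p_s_given_hs, 2))
```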


51.


Suppose in a 500 mile race there are 12 entries. 4 cars are to be
placed in each of three rows to start the race. How many ways can
the first row of cars in the race be formed?



Answer:

4!*12C4 = 11880


or


12P4 = 11880
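
Both forms of the count can be checked with Python's standard `math` module (Python 3.8 or later); this is a quick sketch for verification only.

```python
from math import comb, factorial, perm

# Choose 4 of the 12 cars for the first row, then order them.
by_combination = factorial(4) * comb(12, 4)

# Equivalently, count ordered selections of 4 cars from 12 directly.
by_permutation = perm(12, 4)

print(by_combination, by_permutation)
```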


52.


An ice cream store has three sizes of ice cream cones, small, medium,
and large. If four cones are randomly selected, one at a time, what
is the probability that a small cone will be selected before three
large cones are selected?


a. 1/3 b. 2/3 c. 64/81 d. 9/16 e. 61/64



Answer:

c. 64/81


1 - P(no S selected) - P(LLLS) = 1 - ((2/3)**4) - ((1/3)**4)
= 1 - (16/81) - (1/81)
= 64/81


OR:


Sxxx 1/3 27/81 !
-Sxx 2/3*1/3 18/81 !
--Sx 2/3*2/3*1/3 12/81 ! 64/81
MMMS ((1/3)**4) 1/81 !
MMLS 3*((1/3)**2)((1/3)**2) 3/81 !
MLLS 3*((1/3)**2)((1/3)**2) 3/81 !
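
The count of 64 favourable sequences can be confirmed by brute force. The Python sketch below assumes, as the answer does, that each of the four selections is independently small, medium or large with probability 1/3; the helper name is illustrative.

```python
from fractions import Fraction
from itertools import product

def small_before_three_large(seq):
    """True if an S appears before three L's have been selected."""
    larges = 0
    for cone in seq:
        if cone == "S":
            return larges < 3
        if cone == "L":
            larges += 1
    return False   # no small cone was selected at all

sequences = list(product("SML", repeat=4))
favourable = sum(small_before_three_large(s) for s in sequences)
prob = Fraction(favourable, len(sequences))

print(favourable, len(sequences), prob)
```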


53.


A certain assembly consists of two sections, A and B, which are bolted
together. In a bin of 100 assemblies, 12 have only section A defective,
10 have only section B defective, and 2 have both section A and section
B defective. What is the probability of choosing, without replacement,
2 assemblies from the bin which have neither section A nor section B
defective?


a. (76)**2/(100)**2
b. (98)**2/(100)**2
c. 98(97)/[100(99)]
d. 76(75)/[100(99)]
e. none of these



Answer:

d. 76(75)/[100(99)]
# of assemblies without defects = 100 - (12 + 10 + 2)
= 100 - 24 = 76
P(no defectives in 2 draws) = (76/100)*(75/99)
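
The sequential product above agrees with a direct count of pairs. A short Python sketch, with illustrative names, showing both routes:

```python
from fractions import Fraction
from math import comb

total = 100
defective = 12 + 10 + 2      # only A, only B, both sections
good = total - defective

# Sequential form: first draw is good, then second draw is good.
p_sequential = Fraction(good, total) * Fraction(good - 1, total - 1)

# Counting form: pairs of good assemblies out of all possible pairs.
p_counting = Fraction(comb(good, 2), comb(total, 2))

print(good, p_sequential, p_counting)
```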


54.


Given that each of three identical devices operating independently
has probability 3/4 of operating successfully, determine the pro-
bability that exactly two of the three fail.


(a) 3/64 (c) 4/64
(b) 9/64 (d) 27/64



Answer:

(b) 9/64


P(2 failures and 1 success) = 3*(1/4)*(1/4)*(3/4)
= 3*(3/64)
= 9/64
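
This is a binomial probability with n = 3 trials and failure probability 1/4. A minimal Python sketch (the function name `binom_pmf` is illustrative, not from the source):

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k events in n independent trials, each with probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_fail = 1 - Fraction(3, 4)   # each device fails with probability 1/4

prob_two_fail = binom_pmf(2, 3, p_fail)
print(prob_two_fail)
```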


55.


The  life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25, 29, 31, 32, 35, 37, 39, 40
Firm B: 34, 36, 41, 43, 44, 45, 47, 48


Let ETA(A) and ETA(B) denote the median service life of picture tubes
produced by the 2 firms. A confidence interval for ETA(B) - ETA(A) is
bounded by the dth smallest and the dth largest of all differences of
B- and A-observations. For confidence coefficient .99, we take d
equal to:


(a) 9 (b) 14 (c) 15 (d) 17



Answer:

(a) 9


56.


Prices of shares on the stock market are recorded to 1/8th of a dollar.
We might then expect to find stocks selling at prices ending in:


0 1/8 1/4 3/8 1/2 5/8 3/4 7/8


with about equal frequency. On a certain day, 120 stocks showed the
following frequencies:


26 8 15 9 22 12 19 9.


H(O): P(0) = P(1/8) = P(1/4) = P(3/8) = P(1/2) = P(5/8) = P(3/4) =
P(7/8), or the distribution of final eighths is uniform.
H(A): the distribution of final eighths is not uniform.


At significance level ALPHA = .10, the hypothesis being tested is
rejected provided the test statistic is:


a) greater than 1.28. d) greater than 12.0.
b) greater than 14.7. e) smaller than 2.83.
c) smaller than 4.17.



Answer:

d) greater than 12.0.


CHISQUARE(critical, df=7, ALPHA=.10) = 12.0


57.


Prices of shares on the stock market are recorded to 1/8th of a dollar.
We might then expect to find stocks selling at prices ending in:


0 1/8 1/4 3/8 1/2 5/8 3/4 7/8


with about equal frequency. On a certain day, 120 stocks showed the
following frequencies:


26 8 15 9 22 12 19 9.


H(O): P(0) = P(1/8) = P(1/4) = P(3/8) = P(1/2) = P(5/8) = P(3/4) =
P(7/8), or the distribution of final eighths is uniform.
H(A): the distribution of final eighths is not uniform.


In testing the above hypothesis, all expected frequencies equal:


a) 12 b) 60 c) 125 d) 500 e) none of these



Answer:

e) none of these


e = 120 * (1/8)
= 15


58.


A manufacturer of floor polish conducted a consumer-preference experi-
ment to determine which of the five different floor polishes was
superior. A sample of 100 housewives viewed five patches of flooring
which received the five polishes. Each housewife indicated the patch
that she considered superior in appearance. The lighting, background,
etc., were approximately the same for all five patches. The result
of the survey was as follows:


Polish A B C D E TOTAL
Frequency 27 17 15 22 19 100


a. State the hypothesis of "no preference" in statistical
terminology.
b. State the test statistic used.
c. Test the hypothesis at ALPHA = .10 and draw the conclusion.



Answer:

a.) H(O): P(A) = P(B) = P(C) = P(D) = P(E) = 1/5
H(A): At least one of P(A), P(B), P(C), P(D), P(E) differs from 1/5


b.) Use CHISQUARE with 4 df


c.) CHISQUARE (calculated) = Sum [((O-E)**2)/E]


                     A      B      C      D      E    TOTAL
   O                27     17     15     22     19     100
   E = n*p(i)       20     20     20     20     20     100
   O-E               7     -3     -5      2     -1       0
   (O-E)**2         49      9     25      4      1
   ((O-E)**2)/E   2.45    .45   1.25    .20    .05    4.40


CHI SQUARE (calculated) = 4.40
CHI SQUARE (critical, df = 4, ALPHA = .10) = 7.78


Since CHI SQUARE (calculated) < CHI SQUARE (critical), do not
reject the null hypothesis at the .10 level. No significant
consumer preference among the floor polishes has been found.


59.


A  market  research  firm  was  hired to test consumer preference for
different packages for some soap. Two hundred randomly selected
housewives were given a package of soap wrapped in each of the
following colors: red, white, blue, green. After a month in which
they could use the soap, they were given a free case of the color
package of their choice. There were no markings to differentiate the
packages - just color - and the soap itself was the same. Is there a
significant difference in the colors they selected?


Color Package No. Housewives Choosing
------------- -----------------------
red 50
white 75
blue 30
green 45



Answer:

Observed (O) and expected (E) frequencies:
          red   white   blue   green
   O       50      75     30      45
   E       50      50     50      50


CHISQ = ((50 - 50)**2)/50 + ((75 - 50)**2)/50 + ((30 - 50)**2)/50 +
((45 - 50)**2)/50
= 0 + 12.5 + 8 + .5
= 21


df = (K - 1) = 3


P(CHISQ(3) >= 21) < .001


Reject H(O) at ALPHA = .10, .05, or .01.
Conclude that there is a significant difference in the colors chosen.


60.


Horse-racing fans often insist that in a race around a circular track
the horses in certain post positions have significant advantages.
Post position 1 is nearest to the inside rail and post position 8 is
farthest to the outside. Suppose we observed the results of races for
one month of racing at a track. (Horses were randomly assigned to
post positions.) Results were as follows:


post position 1 2 3 4 5 6 7 8
no. of wins 29 15 18 25 17 10 15 11


Use CHISQUARE to test whether there is any difference in number of wins.
State the null hypothesis you are testing.



Answer:

H(O): No difference in number of wins per post position (uniform
distribution).


Total number of wins = 140.
Expected number of wins/position = 140/8 = 17.5


CHISQUARE(calculated) = ((29 - 17.5)**2)/17.5 + ((15 - 17.5)**2)/17.5 +
((18 - 17.5)**2)/17.5 + ((25 - 17.5)**2)/17.5 +
((17 - 17.5)**2)/17.5 + ((10 - 17.5)**2)/17.5 +
((15 - 17.5)**2)/17.5 + ((11 - 17.5)**2)/17.5
= 17.143


ALPHA = .05 (or .10 or .01) usually; df = 8 - 1 = 7


.01 <= P(CHISQ(7) >= 17.143) <= .05


Therefore, the conclusion is to reject H(O) at ALPHA = .05 or .10,
and continue H(O) at ALPHA = .01.


61.


New York State Thruway Commission is examining lane usage on the bridge
leading to the Big Apple (Tappan Zee Bridge). It is hypothesized that,
during rush hours, traffic in vehicles/hour in the rightmost four lanes
is in the ratio:


Lane 1 2 3 4
^--------------------------^
LEFT ^ 11 ^ 12 ^ 10 ^ 7 ^ RIGHT
^__________________________^


In essence this means that of the total traffic for the four lanes:


11/40 took lane 1
12/40 took lane 2
10/40 took lane 3
7/40 took lane 4


A sampling of lane traffic for one day is as follows:


^-----------------------------------^
^ 8200 ^ 9000 ^ 7350 ^ 5200 ^
^___________________________________^


Can we conclude at ALPHA = .05 that traffic lane usage occurred in
the hypothesized ratios?


(a) Pick the most appropriate hypothesis test. What is it?
(b) State the null and alternative hypotheses.
(c) Compute a test statistic.
(d) Indicate the critical value or values.
(e) Do you continue or reject H(O)? What is your conclusion
relative to the question posed above?



Answer:

(a) CHISQUARE goodness of fit


(b) H(O): The lane usage occurs in the ratio of 11:12:10:7.
H(A): The lane usage is other than 11:12:10:7.


(c) observed ^ 8200 ^ 9000 ^ 7350 ^ 5200 ^ 29750
__________^________^________^________^________^_______
expected ^8181.25 ^ 8925 ^ 7437.5 ^5206.25 ^


CHISQUARE(calc) = .042 + .630 + 1.029 + .008 = 1.709


(d) CHISQUARE(.05,3) = 7.815


(e) Do not reject H(O) since CHISQUARE(calc) < CHISQUARE(crit).
Conclude that lane usage occurs in hypothesized ratios.
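
When the hypothesized proportions are unequal, the expected counts come from the ratio itself. The Python sketch below is an illustration; the critical value 7.815 is copied from part (d) rather than computed.

```python
observed = [8200, 9000, 7350, 5200]
ratio = [11, 12, 10, 7]          # hypothesized lane ratio

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

crit_05 = 7.815                  # from part (d), entered by hand

print(expected, round(chisq, 2), chisq > crit_05)
```

Carrying full precision gives a statistic that rounds to 1.71; the 1.709 above comes from adding the four rounded terms.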


62.


The following is the number of cars produced in an auto plant.


MON TUE WED THU FRI
-------------------------------
20 25 25 20 10


Test the null hypothesis at ALPHA = .01 that production does not depend
on the day of the week.



Answer:

n(i)       20   25   25   20   10
-------------------------------------
E[n(i)]    20   20   20   20   20


CHISQUARE(calculated) = 7.5
CHISQUARE(critical, df=4, ALPHA=.01) = 13.3


Since CHISQUARE(calculated) < CHISQUARE(critical), we cannot reject
the null hypothesis at ALPHA = .01.


63.


It is desired to see whether there is a relationship in tastes for an
expensive car and owning a tri-maran. A survey of 200 upper-class
potential purchasers of cars and tri-marans gave these responses:


Want Expensive Do Not Want Expensive Totals
Car Car


Want Tri-maran 100 40 140
Don't Want Tri-maran 20 40 60


Totals 120 80 200


Specifically, it is desired to test H(O): the desire for a tri-maran
is independent of a desire for an expensive car.


If H(O) were true, the estimate of the expected number of those who do
not want either a tri-maran or an expensive car would be:


a) (140*120)/200 d) (140*80)/200
b) (140*60)/200 e) (120*60)/200
c) (60*80)/200



Answer:

c) (60*80)/200


Expected Value = (80/200)(60/200)(200)
= (80*60)/200


64.


It is desired to see if there is a relationship in tastes for an
expensive car and owning a tri-maran. A survey of 200 upper-upper
class potential purchasers of cars and tri-marans gave these
responses:


want expensive don't want expensive totals
car car


want tri-maran 100 40 140


don't want 20 40 60
tri-maran


totals 120 80 200


Specifically, it is desired to test H(O): the desire for a tri-
maran is independent of the desire for an expensive car.


The contribution to the Chi-square statistic of the term: desire tri-
maran - and desire expensive car is:


a) ((100-120)**2)/120 d) ((100-78)**2)/78
b) ((100-140)**2)/140 e) ((100-80)**2)/80
c) ((100-84)**2)/84



Answer:

c) ((100-84)**2)/84


Expected Value = (120/200)(140/200)(200)
= 84


Contribution = ((100-84)**2)/84


65.


It is desired to see if there is a relationship in tastes for an
expensive car and owning a tri-maran. A survey of 200 upper-upper
class potential buyers of cars and tri-marans gave these results:


want expensive don't want expensive total
car car


want tri-maran 100 40 140


don't want 20 40 60
tri-maran


totals 120 80 200


Specifically, it is desired to test H(O): the desire for a tri-maran
is independent of the desire for an expensive car vs. H(1): there is
a relationship at level ALPHA = .20.


Given that the appropriate normalized statistic is greater than 23
and less than 30, one should ______ H(O): independence since the
value ______ is ______ than the correct cutoff point.


a) reject, 30, bigger d) continue, 23, bigger
b) reject, 23, bigger e) continue, 23, smaller
c) continue, 30, bigger



Answer:

b) reject, 23, bigger


CHISQUARE(critical, df = 1, ALPHA = .20) = 1.64
and if CHISQUARE(calculated) is in the interval (23,30)
CHISQUARE(calculated) > CHISQUARE(critical), which implies
that H(O) should be rejected.


66.


The life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm B are as follows
(arranged according to size):


Firm B: 34, 36, 41, 43, 44, 45, 47, 48


Let ETA(B) denote the median service life of picture tubes produced by
the firm. To test the hypothesis ETA(B) = 38.5 against the alternative
ETA(B) =/= 38.5, the value of CHISQ(calculated) for the median test
equals:


(a) 8 (b) 6 (c) 4 (d) 2



Answer:

(d) 2


             ^ Above 38.5 ^ Below 38.5 ^
-----------------------------------
observed     ^     6      ^     2      ^
-----------------------------------
expected     ^     4      ^     4      ^
-----------------------------------


CHISQ = [[(6 - 4)**2] + [(2 - 4)**2]]/4 = 2


67.


A car rental agency is in the process of deciding the brand  of  tire
to purchase as standard equipment for their fleet. As part of the
decision process, they are interested in studying the treadlife of
five competing brands. Based on testing, the research department
determined that each of 10 tires of each brand will last the
following number of miles (in 1000's to the nearest 1000). Compute a
CHISQ median test. Test the null hypothesis H(O): no difference among
tires, with ALPHA = .05.


Tire Brands
-------------
A B C D E


40 45 30 35 28
42 40 32 40 32
45 40 31 42 34
38 44 35 36 28
40 42 28 38 32
41 44 29 34 26
43 41 31 41 29
43 41 30 41 31
37 43 34 35 25
40 41 27 37 31



Answer:

MD(overall) = 37


Observed:
A B C D E
- - - - -
above MD 9 10 0 5 0
below MD 0 0 10 4 10


Expected:


4.5 5 5 4.5 5
4.5 5 5 4.5 5


CHISQ(calculated) = 41.34
CHISQ(ALPHA=.05, df=4) = 9.488


CHISQ(calculated) > CHISQ(critical), therefore reject H(O) and
conclude the samples are from populations with different medians.


68.


Test that there is no relationship between performance in a company's
training program and ultimate success in the job. Use ALPHA = 0.01. The
following data were obtained from a sample of 400 people at a company.


PERFORMANCE IN TRAINING PROGRAM


A B C
----------------------------------
SUCCESS A ^ 63 ^ 49 ^ 9 ^
IN JOB ----------------------------------
B ^ 60 ^ 79 ^ 28 ^
----------------------------------
C ^ 29 ^ 60 ^ 23 ^
----------------------------------



Answer:

H(O): There is no relationship between performance in the training
program and success in the job.
H(A): There is a relationship between performance in the training
program and success in the job.


A B C
-------------------------------------------
A ^ 63(45.98) ^ 49(56.87) ^ 9(18.15) ^ 121
-------------------------------------------
B ^ 60(63.46) ^ 79(78.49) ^ 28(25.05) ^ 167
-------------------------------------------
C ^ 29(42.56) ^ 60(52.64) ^ 23(16.8) ^ 112
-------------------------------------------
152 188 60 400


CHISQUARE = ((63 - 45.98)**2)/45.98 + ... + ((23 - 16.8)**2)/16.8
= 20.18


Critical value = 13.3
df = (3 - 1)(3 - 1) = 4


Since CHISQUARE(critical) < CHISQUARE(calculated), reject the null
hypothesis and conclude that there is a relationship.
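
The expected counts in parentheses and the test statistic can be regenerated from the observed table alone. A minimal Python sketch with illustrative names; the critical value 13.3 is taken from the answer above, not computed.

```python
# Rows: success in job (A, B, C); columns: training performance (A, B, C).
observed = [
    [63, 49, 9],
    [60, 79, 28],
    [29, 60, 23],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence, expected count = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chisq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

crit_01 = 13.3   # from the answer above, entered by hand

print(round(chisq, 2), df, chisq > crit_01)
```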


69.


In order to find out how viewing preferences of TV viewers change
over the years, networks conduct viewer surveys. In such a survey,
viewers of sports events were asked to name their favorite sport.
The following table gives responses for the years 1960 and 1970.


1960 1970
---- ----


football 150 250
baseball 250 150
basketball 100 100


The null hypothesis tested by an appropriate CHISQUARE test is:


a) 1970 viewers prefer football to baseball.
b) 1960 viewers prefer baseball to football.
c) There have been no changes in viewing preferences between 1960
and 1970.
d) Viewing habits of TV watchers have changed between 1960 and 1970.
e) The number of sports viewers has remained the same over the years.



Answer:

c) There have been no changes in viewing preferences between 1960
and 1970.


70.


In order to find out how viewing preferences of TV viewers change
over the years, networks conduct viewer surveys. In such a survey,
viewers of sports events were asked their favorite sport. The fol-
lowing table gives responses for the years 1960 and 1970.


1960 1970
---- ----


football 150 250
baseball 250 150
basketball 100 100


Using the null hypothesis that there have been no changes in viewing
preferences between 1960 and 1970, the value of CHISQUARE for the given
table is:


a) less than 2. d) between 25 and 45.
b) between 2 and 10. e) greater than 45.
c) between 10 and 25.



Answer:

e) greater than 45.


Expected values:


1960 1970
---- ----
football 200 200
baseball 200 200
basketball 100 100


CHISQUARE(calc) = SUM([(O-E)**2]/E)
= ([(150-200)**2]/200) + ([(250-200)**2]/200) +
([(100-100)**2]/100) + ([(250-200)**2]/200) +
([(150-200)**2]/200) + ([(100-100)**2]/100)
= (2500/200) + (2500/200) + 0 + (2500/200) +
(2500/200) + 0
= 10000/200
= 50


71.


In order to find out how viewing preferences of TV viewers change
over the years, networks conduct viewer surveys. In such a survey,
viewers of sports events were asked their favorite sport. The fol-
lowing table gives responses for the years 1960 and 1970.


1960 1970
---- ----


football 150 250
baseball 250 150
basketball 100 100


The network was interested in testing the null hypothesis that there
have been no changes in viewing preferences between 1960 and 1970.
If the correct value of CHISQUARE is sufficiently high to reject
the hypothesis being tested, then we can conclude that:


a) viewing habits have not changed over the 10-year span.
b) basketball is more popular in 1970 than in 1960.
c) both football and baseball have become more popular in 1970.
d) the appeal of football has increased and that of baseball has
decreased between 1960 and 1970.
e) the appeal of baseball has increased and that of football has
decreased between 1960 and 1970.



Answer:

d) the appeal of football has increased and that of baseball has
decreased between 1960 and 1970.


Since we reject H(O): that there has been no change in viewing
habits, and from the table, we can see that more people preferred
football in 1970 than in 1960, that fewer people preferred base-
ball in 1970 than in 1960, and that there was no change in pre-
ference with regards to basketball, the above conclusion is
appropriate.


72.


Given the following data matrix:


AUTOMOBILES
CHEV.
CORVETTE MUSTANG II VW RABBIT MONTE CARLO
-------- ----------- --------- -----------
OWNER'S AGE


less than 40 21 143 36 28 ^ 228


greater than
or equal to
40 26 61 35 41 ^ 163
---- ---- ---- ---- ---
47 204 71 69 ^ 391


Test at ALPHA = .05 if the populations of cars have different
distributions of ages of persons owning them. If the null hypothesis
is rejected, construct confidence intervals about the proportion
differences as a post-hoc test procedure.



Answer:

a. Expected Frequencies:
27.41 118.96 41.40 40.24
19.59 85.04 29.60 28.76


CHISQ(calculated) = 21.88
CHISQ(ALPHA=.05, df=3) = 7.815


CHISQ(calculated) > CHISQ(critical), therefore reject H(O) and con-
clude that the distributions are different.


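b. Proportion of owners less than 40, by car:

    P(1)    P(2)    P(3)    P(4)
    ----    ----    ----    ----
   .4468   .7010   .5070   .4058


P(i) - P(j) +/- SQRT(CHISQ(critical) * (p(i)q(i)/n(i) + p(j)q(j)/n(j)))

where q(i) = 1 - p(i) and n(i) is the number of owners sampled for
car i.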


Pairs          C.I.
-----          ----
1 - 2    -.2542 +/- .2216 *
1 - 3    -.0602 +/- .2619
1 - 4     .0410 +/- .2615
2 - 3     .1940 +/- .1885 *
2 - 4     .2952 +/- .1879 *
3 - 4     .1012 +/- .1751


Conclude: the differences P(1) - P(2), P(2) - P(3), and P(2) - P(4)
(marked *) caused rejection of H(O).


73.


Frequency of repairs are being examined for two populations of cars,
foreign and domestic. Given the sample data below, can we conclude
that the population distributions are the same at ALPHA = .10?


Frequency of Repairs/Year
0 1 - 2 3 - 5 More than 5
- ----- ----- -----------
Foreign Autos 6 ^ 11 ^ 11 ^ 7
Domestic Autos 100 ^ 50 ^ 22 ^ 17



Answer:

Expected frequencies:


0 1 - 2 3 - 5 More than 5
- ----- ----- -----------
Foreign Autos 16.56 ^ 9.53 ^ 5.16 ^ 3.75
Domestic Autos 89.44 ^ 51.47 ^ 27.84 ^ 20.25


CHISQUARE(calculated) = 19.439
CHISQUARE(critical, ALPHA=.10, df=3) = 6.251


Since CHISQUARE(calculated) > CHISQUARE(critical), reject H(O)
and conclude that the distributions are not the same.


74.


Below are the results of an insurance survey to relate amount of
insurance to income.


Amount of Insurance Income
Family (in Thousand $) (in Thousand $)
------ ------------------- ---------------
A 9 10
B 20 14
C 22 15
D 15 14
E 17 14
F 30 25
G 18 12
H 25 16
I 10 12
J 20 15


Find RHO and TAU and test each for significance.
(Note: data has ties.)



Answer:

a. R(X)    R(Y)    (R(X)-R(Y))**2
   ----    ----    --------------
    1       1           0
    2       2.5          .25
    3       5           4
    4       5           1
    5       2.5         6.25
    6.5     5           2.25
    6.5     7.5         1
    8       7.5          .25
    9       9           0
   10      10           0
                       -----
                       15


H(O): X and Y are independent
H(A): X and Y are correlated


RHO(crit) = .6364
RHO(calc) = 1 - ((6*15)/(10*99))
= .9091


Since RHO(calc) > RHO(crit), reject H(O) and conclude that
X and Y are correlated.


b. N(C)    N(D)    #Neither
   ----    ----    --------
    9       0         0
    7       0         1
    4       1         2
    4       1         1
    5       0         0
    3       0         1
    2       0         1
    2       0         0
    1       0         0
    0       0         0
   --       -         -
   37       2         6


(Families B and J are tied on amount of insurance, so that pair is
counted as neither concordant nor discordant.)


U(X) = (1/2) * (2*1) = 1
U(Y) = (1/2) * (2*1 + 3*2 + 2*1) = 5


TAU = (37 - 2)/(SQRT(45 - 1)*SQRT(45 - 5))
= 35/(SQRT(44)*SQRT(40))
= 35/41.952
= .8343


H(O): X and Y are independent
H(A): X and Y are correlated


T(calc) = 37 - 2 = 35
T(crit) = 21


Since T(calc) > T(crit), reject H(O) and conclude that
X and Y are correlated.
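
The rank correlation in part a can be reproduced from the raw data. This Python sketch is illustrative only: `midranks` is a hypothetical helper that assigns tied values the average of the ranks they occupy, and RHO uses the same simple formula as the answer (no further correction for ties).

```python
insurance = [9, 20, 22, 15, 17, 30, 18, 25, 10, 20]   # families A..J
income = [10, 14, 15, 14, 14, 25, 12, 16, 12, 15]

def midranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

rx = midranks(insurance)
ry = midranks(income)
n = len(rx)

# Sum of squared rank differences, then Spearman's formula.
d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho = 1 - 6 * d_squared / (n * (n ** 2 - 1))

print(d_squared, round(rho, 4))
```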


75.


The observed life, in months of service, before failure for the color
television picture tube in 8 television sets manufactured by Firm B are
as follows (arranged according to size):


Firm B: 34 36 41 43 44 45 47 48


Let ETA(B) denote the median service life of picture tubes produced by
the firm.


The point estimate of ETA(B) equals:


a. 35 b. 43.5 c. 44 d. 33.5



Answer:

b. 43.5


n = 8
Therefore, the median equals the average of the two middle values.
Median = (43 + 44)/2 = 43.5
or any number between 43 and 44.


76.


The life in months of service before failure of the color television
picture tubes in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25 29 31 32 35 37 39 40
Firm B: 34 36 41 43 44 45 47 48


Let ETA(A) and ETA(B) denote the median service life of picture tubes
produced by the two firms.


The S-interval with confidence coefficient .71 for ETA(A) is bounded
by:


a. 29 and 39 b. 36 and 47 c. 31 and 37 d. 41 and 45



Answer:

c. 31 and 37


GAMMA = .71
n = 8


From the Table of d-factors for Sign Test and Confidence Intervals
for the median, d = 3. The confidence interval is bounded by the
dth smallest and dth largest sample observations. Thus, the
S-interval about the median is bounded by the third smallest and
third largest sample observations, or 31 and 37.
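
The endpoints and the confidence coefficient can both be checked directly. In the Python sketch below, d = 3 is taken from the table cited above; the coverage is computed from the binomial distribution with p = 1/2, which is the usual basis for a sign-test interval and is stated here as an assumption.

```python
from math import comb

firm_a = sorted([25, 29, 31, 32, 35, 37, 39, 40])
n = len(firm_a)
d = 3   # from the table of d-factors

# dth smallest and dth largest observations.
lower, upper = firm_a[d - 1], firm_a[n - d]

# P(fewer than d observations fall below the median), Binomial(n, 1/2).
tail = sum(comb(n, k) for k in range(d)) / 2 ** n
confidence = 1 - 2 * tail

print(lower, upper, round(confidence, 2))
```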


77.


The life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25 29 31 32 35 37 39 40
Firm B: 34 36 41 43 44 45 47 48


Let ETA(A) and ETA(B) denote the median service life of picture tubes
produced by the two firms.


The W-interval with confidence coefficient .98 for ETA(A) is bounded
by:


a. 29 and 39 b. 36 and 47 c. 35 and 47.5 d. 27 and 39.5



Answer:

d. 27 and 39.5


n = 8
Using a table of critical values for the W-interval with ALPHA=.02,
d = 2. From the table of averages (only the entries near the extremes
are shown; the 2nd smallest and 2nd largest are bracketed):


   | 25     29     31     32     35     37     39     40
---+--------------------------------------------------------
25 | 25    [27]    28
29 |        29     30
31 |               31
32 |
35 |
37 |                                    37     38     38.5
39 |                                           39    [39.5]
40 |                                                  40


W-interval is 27 and 39.5.
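
The table of averages can also be generated rather than written out. This is a minimal Python sketch, with an illustrative function name, that forms all pairwise averages (each observation paired with itself and with every other observation) and picks the d-th smallest and d-th largest.

```python
from itertools import combinations_with_replacement

def walsh_interval(sample, d):
    """Bounds given by the d-th smallest and d-th largest pairwise averages."""
    averages = sorted(
        (a + b) / 2 for a, b in combinations_with_replacement(sample, 2)
    )
    return averages[d - 1], averages[-d]

firm_a = [25, 29, 31, 32, 35, 37, 39, 40]
print(walsh_interval(firm_a, d=2))
```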


78.


Explain which measure of central tendency is most useful when reporting
an average income for persons employed by Beech Aircraft.



Answer:

The median would be the most useful measure of central tendency
when reporting an average income. The distribution
of income is positively skewed since there are relatively
few people who earn a substantially high income. These
extreme values would affect the mean by inflating it. The
median, which simply indicates the point where half the
observations are above and half are below, would
not be affected by such extreme values and in this sense
would more accurately convey the "average" income.


79.


A college athlete, equally talented in baseball and football, compares
the income potential in the two sports before choosing to specialize in
one of them. The data available for annual income from all sources is
below:


Mean Median 90th percentile
Football players: $25,000 $20,000 $70,000


Baseball players: $23,000 $28,000 $50,000


a. Give a one-sentence interpretation of the mean which indicates how
it can be used to help him to decide between the two sports.
b. Do the same for the median and the same for the 90th percentile.
c. Based on the data above, which sport would you suggest he choose?
Indicate why.



Answer:

a. The mean is the average dollar income (or expected income) and
indicates football is the better choice, though the difference
between the two means is not great.


b. The median is the midpoint in terms of rankings and is substantially
higher for baseball. The 90th percentile is the dollar value that
exceeds 90 percent of the salaries and indicates football is the
better choice.


c. The distribution of salaries in football is skewed right, indicating
higher potential salary extremes: a riskier but higher-payoff
choice. A risk-averse athlete might choose baseball, whose
left-skewed salaries suggest higher "typical" salaries.


80.


The life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25 29 31 32 35 37 39 40
Firm B: 34 36 41 43 44 45 47 48


Let ETA(A) and ETA(B) denote the median service life of picture tubes
produced by the two firms.


You want to test the hypothesis ETA(A) = 38 against the alternative
ETA(A) < 38. The correct sign test statistic and its value is:


a. S(+) = 2 b. S(-) = 2 c. S(+) = 3 d. S(-) = 3



Answer:

a. S(+) = 2


Since we have H(A): ETA(A) < 38, we expect fewer observa-
tions to be larger than the median, and the correct test
statistic is S(+). Its value is:


S(+) = # observations > 38 = 2.


81.


A student organization surveyed food prices at 4 local food stores:


Stores
Item Weight/volume A B C D
----------------------------------------------------------------------
Apples per lb .30 .30 .33 .45
Lettuce one head .39 .25 .25 .39
Milk, homogenized 1/2 gal container .84 .76 .81 .76
Eggs: fresh, grade A, 1 doz .89 .83 .69 .93
large
Hamburger per lb 1.29 .99 .99 1.09
Frying chicken cut up, per lb .65 .46 .59 .69
Chicken noodle soup 10 3/4 oz can .22 .19 .22 .19
White bread 1 lb loaf .48 .59 .48 .33
Raviolios with meat 15 oz .45 .41 .43 .35
sauce
Soda qt bottle .38 .40 .37 .39
Coffee 4 oz 1.39 1.31 1.29 1.23
Peanut butter 28 oz jar 1.19 1.16 1.17 1.09
Laundry soap 3 lb 1 oz .89 .85 .81 .80


You may want to compare prices at stores C and D. An appropriate
two-sample test can be based on either:


a. the sign test or the median test.
b. the Wilcoxon one- or two- sample test.
c. the sign test or Wilcoxon one-sample test.
d. the median test or Wilcoxon two-sample test.



Answer:

c. the sign test or Wilcoxon one-sample test.


82.


The observed life, in months of service, before failure for the color
television picture tube in 8 television sets manufactured by Firm B are
as follows (arranged according to size):


Firm B: 34, 36, 41, 43, 44, 45, 47, 48


Let ETA(B) denote the median service life of picture tubes produced by
the firm and assume the lifetimes have symmetric distributions. You
want to test the hypothesis ETA(B) = 38.5 against the alternative
ETA(B) =/= 38.5 using the Wilcoxon signed rank test. From the
following list, select the most reasonable test statistic:


(a) W(+) = 2 (b) W(+) = 5 (c) W(-) = 5 (d) W(-) = 2



Answer:

(c) W(-) = 5


X(i) D(i) |D(i)| Rank
----- ----- ------ ----


34 -4.5 4.5 3.5
36 -2.5 2.5 1.5
41 2.5 2.5 1.5
43 4.5 4.5 3.5
44 5.5 5.5 5
45 6.5 6.5 6
47 8.5 8.5 7
48 9.5 9.5 8


W(-) = SUM(R(-)) = 3.5 + 1.5 = 5
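
The ranking above can be reproduced with a short Python sketch. The function name is illustrative; observations equal to the hypothesized median are dropped and tied absolute differences receive midranks, as in the table.

```python
def signed_rank_sums(sample, hypothesized_median):
    """Return (W+, W-): rank sums of positive and negative differences."""
    diffs = [x - hypothesized_median for x in sample if x != hypothesized_median]
    # Round to guard against floating-point noise when detecting ties.
    magnitudes = sorted(round(abs(d), 9) for d in diffs)

    def midrank(value):
        first = magnitudes.index(value) + 1
        return first + (magnitudes.count(value) - 1) / 2

    w_plus = sum(midrank(round(abs(d), 9)) for d in diffs if d > 0)
    w_minus = sum(midrank(round(abs(d), 9)) for d in diffs if d < 0)
    return w_plus, w_minus

firm_b = [34, 36, 41, 43, 44, 45, 47, 48]
w_plus, w_minus = signed_rank_sums(firm_b, 38.5)
print(w_plus, w_minus)
```

The two sums always add to n(n + 1)/2 for the n nonzero differences, which gives a quick check on the ranking.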


83.


Ten randomly selected cars of a specific year, make, and model and
with similar equipment, are subjected to an EPA gasoline mileage
test. The resulting miles/gallon are:


24.6, 30.0, 28.2, 27.4, 26.8,
23.9, 22.2, 26.4, 32.6, 28.8


Using the Wilcoxon Median Test, test the hypothesis that the population
median is 30 miles/gallon at the ALPHA = .10 level. Construct a 90%
confidence interval for the median.



Answer:

Measurement D(i) |D(i)| Rank
----------- ---- ------ -----
24.6 -5.4 5.4 7
30.0 0 0 -
28.2 -1.8 1.8 2
27.4 -2.6 2.6 3.5
26.8 -3.2 3.2 5
23.9 -6.1 6.1 8
22.2 -7.8 7.8 9
26.4 -3.6 3.6 6
32.6 2.6 2.6 3.5
28.8 -1.2 1.2 1


R+ = 3.5
---> T = 3.5
R- = 41.5


Lower w = 9
Upper w = (9*10)/2 - 9 = 36


Since (T=3.5) < 9, we reject H(0): median = 30.


For the confidence interval, we need the 11th largest and 11th smallest
averages (bracketed), to be obtained from the following table:


     | 32.6   30.0   28.8   28.2   27.4   26.8   26.4   24.6   23.9   22.2
-----+----------------------------------------------------------------------
32.6 | 32.6   31.3   30.7   30.4   30.0   29.7   29.5   28.6   28.25  27.4
30.0 |        30.0   29.4   29.1   28.7   28.4   --     --     --     --
28.8 |              [28.8]  28.5   28.1   27.8   --     --     --     --
28.2 |                      --     --     --     --     --     26.05 [25.2]
27.4 |                             --     --     --     26.0   25.65  24.8
26.8 |                                    --     --     25.7   25.35  24.5
26.4 |                                           26.4   25.5   25.15  24.3
24.6 |                                                  24.6   24.25  23.4
23.9 |                                                         23.9   23.05
22.2 |                                                                22.2


Therefore, 90% C.I.: from 25.2 to 28.8.


84.


The  life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25, 29, 31, 32, 35, 37, 39, 40
Firm B: 34, 36, 41, 43, 44, 45, 47, 48


Against the two-sided alternative, the Wilcoxon (Mann Whitney) two-
sample test has descriptive level:


(a) .050 (b) .010 (c) .007 (d) .004



Answer:

(c) .007


U(A) = 0 + 0 + 0 + 0 + 1 + 2 + 2 + 2
= 7
U(B) = 64 - 7
= 57
P(U(A) <= 7) = .0035 (one tail)
Two-sided descriptive level = 2 * .0035 = .007
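
The tabled tail probability can be verified by enumerating the exact null distribution of U. This is a Python sketch under the assumption of no ties between the samples (true for these data); the function names are illustrative.

```python
from functools import lru_cache
from math import comb

def mann_whitney_u(xs, ys):
    """U for xs: number of (x, y) pairs with y < x, ties counted as 1/2."""
    return sum((y < x) + 0.5 * (y == x) for x in xs for y in ys)

@lru_cache(maxsize=None)
def count_arrangements(m, n, u):
    """Number of orderings of m x's and n y's (no ties) that give U = u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest observation is either an x (it exceeds all n y's)
    # or a y (it adds nothing to U).
    return count_arrangements(m - 1, n, u - n) + count_arrangements(m, n - 1, u)

def lower_tail(m, n, u):
    """P(U <= u) when all orderings are equally likely."""
    total = comb(m + n, m)
    return sum(count_arrangements(m, n, k) for k in range(int(u) + 1)) / total

firm_a = [25, 29, 31, 32, 35, 37, 39, 40]
firm_b = [34, 36, 41, 43, 44, 45, 47, 48]
u_a = mann_whitney_u(firm_a, firm_b)
p_two_sided = 2 * lower_tail(len(firm_a), len(firm_b), u_a)
print(u_a, round(p_two_sided, 3))
```

Doubling the lower tail corresponds to the two-sided alternative named in the question.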


85.


The  life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25, 29, 31, 32, 35, 37, 39, 40
Firm B: 34, 36, 41, 43, 44, 45, 47, 48


Suppose the data is ranked as one combined set. The sum of the ranks
R(B) for the B-observations equals:


(a) 36 (b) 43 (c) 57 (d) 93



Answer:

(d) 93


Table of Ranks:


Firm A: 1 2 3 4 6 8 9 10
Firm B: 5 7 11 12 13 14 15 16


SUM(R(B)) = 5 + 7 + 11 + 12 + 13 + 14 + 15 + 16
= 93


86.


An expert gave the following subjective ratings of the driving abilities
of two groups of subjects. Test the hypothesis that, according to the
expert's ratings, women are better drivers than men. (Use a
nonparametric test with ALPHA = .05.) (NOTE: higher scores indicate
better drivers.)


Expert's Ratings


Male 7, 4, 2, 3, 12, 1, 14, 10, 10
Female 6, 13, 12, 10, 14, 7, 3, 11



Answer:

H(O): Female drivers are worse or as good as male drivers
H(A): Female drivers are better than male drivers


Using the Mann Whitney-Wilcoxon test:


U(M) = 2.5 + 1 + 0 +.5 + 5.5 + 0 + 7.5 + 3.5 + 3.5
= 24


U(critical, onetail, ALPHA=.05,9,8) = 19


Since U(M) > U(critical) continue H(O). Therefore sample evidence was
not strong enough to indicate that females are better drivers than
males.


Using Wilcoxon version of test:


Ranks:
Ratings |  1    2    3    4    6    7    10     11   12    13   14
--------+-----------------------------------------------------------
M or F  |  M    M   M,F   M    F   M,F  M,M,F   F   M,F    F   M,F
Rank    |  1    2   3.5   5    6   7.5   10     12  13.5   15  16.5


Sum of ranks for females = 3.5 + 6 + 7.5 + 10 + 12 + 13.5 + 15 + 16.5
= 84 = T(F)


Sum of ranks for males = [(17)(18)/2] - 84
= 69 = T(M)


Converting to U statistic:


U(M) = T(M) - [.5 * n(M) * (n(M) + 1)]
= 69 - [.5 * 9 * 10]
= 24


Conclusion reached is the same as above.
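
The equivalence of the two versions can be illustrated with a small Python sketch that assigns midranks to the combined sample and converts the male rank sum to U. The helper name is illustrative and not from the original solution.

```python
def midranks(values):
    """Map each distinct value to its midrank in the combined sample."""
    ordered = sorted(values)
    return {
        v: ordered.index(v) + 1 + (ordered.count(v) - 1) / 2
        for v in set(values)
    }

male = [7, 4, 2, 3, 12, 1, 14, 10, 10]
female = [6, 13, 12, 10, 14, 7, 3, 11]

ranks = midranks(male + female)
t_female = sum(ranks[v] for v in female)
t_male = sum(ranks[v] for v in male)

# Convert the male rank sum to the Mann-Whitney U statistic.
u_male = t_male - len(male) * (len(male) + 1) / 2
print(t_female, t_male, u_male)
```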


87.


A large consulting firm hires a west coast university to provide an
MBA program for its employees. The basic statistics course is taught
at two locations of the firm. After completion of the course, stan-
dardized tests are given to the participating employees at each loca-
tion. Assume the distributions of test scores are symmetric for both
groups. The results are shown below.


Observation Location A Location B
----------- ---------- ----------
1 65 60
2 74 72
3 77 66
4 82 75
5 70 78
6 78 65
7 84 --


Test the hypothesis at ALPHA = .10 that the two samples came from the
same population, or equivalently that the populations have the same
median scores.



Answer:

H(O): The two samples came from the same population.
(median(1) = median(2))
H(A): The two samples came from different populations.
(median(1) =/= median(2))


Using Wilcoxon-Mann Whitney test statistic we find:


   A    Rank        B    Rank
  --    ----       --    ----
  84    13         78    10.5
  82    12         75     8
  78    10.5       72     6
  77     9         66     4
  74     7         65     2.5
  70     5         60     1
  65     2.5
        ----             ----
Rank Sums: 59.0          32.0


Smaller Rank Sum = 32 (Note that some books give critical values
for this sum.)
Transforming to the Mann-Whitney U statistic:
T = 32 - (6 * 7)/2
= 11


Critical Values: lower U = 9
upper U = (6 * 7) - 9
= 33


Since 9 <= (T=11) < 33, do not reject H(O). We do not have sufficient
evidence to claim a difference between the two populations.


88.


New employees of the ABC corporation are given a training program  to
acquaint them with business procedures and principles. Two groups of
ten each are selected randomly from a large set of new employees. The
first group is trained using Method A, and the second group is
trained using Method B. At the end of the training period, each
group is given the same test to determine how much information has
been assimilated. The data are:


Method A Method B
-------- --------


55 50
70 91
70 90
65 62
62 75
81 88
72 84
58 78
67 82
50 80


Use ALPHA = .05 to test that the two training methods result in the
same amount of assimilated information.



Answer:

X(i) Rank
---- ----


50 1.5
50* 1.5*
55* 3 *
58* 4 *
62* 5.5*
62 5.5
65* 7 *
67* 8 *
70* 9.5*
70* 9.5*
72* 11 *
75 12
78 13
80 14
81* 15 *
82 16
84 17
88 18
90 19
91 20


(* indicates Method A)


S = SUM(R(X(i))) = 1.5 + 3 + 4 + 5.5 + 7 + 8 + 9.5 + 9.5 + 11 + 15
= 74


T = S - n*(n + 1)/2 = 74 - 10*11/2
= 19


lower w = 24
upper w = 10*10 - 24 = 76


T < 24, reject H(0); conclude methods result in different scores.


89.


Eight names are selected at random from the subscriber list of
Magazine A, and eight additional names from the list of Magazine
B. The ages of the subscribers are determined and listed below
(fictitious data):


A: 18, 24, 35, 19, 20, 20, 40, 17
B: 20, 30, 45, 38, 42, 34, 50, 22


a. Using appropriate 5-year age groups, prepare a stem & leaf
plot OR a histogram for each group.


b. Is there evidence (at the 5% level of significance) that the
two magazines appeal to different age groups?


c. Give two possible reasons for choosing the test you chose for
part b instead of some other test.



Answer:

a. STEM & LEAF PLOT:


                   Magazines
               A      |      B
         1  |  897    |
         2  |  400    |  02
         2  |         |
Age      3  |         |  04
Groups   3  |  5      |  8
         4  |  0      |  2
         4  |         |  5
         5  |         |  0


HISTOGRAMS (frequency by 5-year age group; one * per subscriber):


Age group    Magazine A    Magazine B
---------    ----------    ----------
15-19        ***
20-24        ***           **
25-29
30-34                      **
35-39        *             *
40-44        *             *
45-49                      *
50-54                      *


b. Using the Mann-Whitney U Test:


U(A) = 0 + 2 + 4 + 0 + .5 + .5 + 5 + 0
= 12


Using a table of critical values for this test at 5% level
of significance, U(critical) = 14


Since U(observed) < U(critical), there is evidence that the
two magazines appeal to different age groups.


c. 1) Normality cannot be assumed.
2) The sample sizes are not large enough to avoid necessity
for normality.


90.


Suppose you run a warehouse that stocks replacement parts for
appliances. You randomly sample orders for parts for replacement
burner units for two brands (A and B) of electric stoves. Over a
period of 50 weeks you observe the following:


weekly demand number of weeks
for burners Brand A Brand B
------------- -------------------


0 28 22
1 15 21
2 6 7
3 1 0
over 3 0 0


Use the Kolmogorov-Smirnov test to determine if these two distributions
are the same.



Answer:

F(A) F(B) D
---- ---- -
.56 .44 .12
.86 .86 0
.98 1.00 .02
1.00 1.00 0
1.00 1.00 0


P(D(50) >= .12) > .2
Do not reject H(O) at ALPHA = .10, .05 or .01.
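
The D column can be computed directly from the frequency table. This is a minimal Python sketch with an illustrative function name; it returns the largest absolute gap between the two cumulative relative frequency distributions.

```python
from itertools import accumulate

def max_cdf_gap(counts_a, counts_b):
    """Largest absolute difference between two cumulative distributions
    built from frequency counts over the same ordered categories."""
    cdf_a = [c / sum(counts_a) for c in accumulate(counts_a)]
    cdf_b = [c / sum(counts_b) for c in accumulate(counts_b)]
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))

brand_a = [28, 15, 6, 1, 0]   # weeks with demand 0, 1, 2, 3, over 3
brand_b = [22, 21, 7, 0, 0]
print(round(max_cdf_gap(brand_a, brand_b), 2))
```

The same function serves for a one-sample test if the second argument holds the expected counts under the hypothesized distribution.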


91.


Suppose we examine a random sample of subjects to see whether there is a
preference for certain colors. The colors selected are green, blue,
brown, yellow, and black. Theory suggests that people may prefer those
colors that are most commonly found in nature, so the colors have been
ranked from least commonly found in nature (black) to most commonly
found (green). We have recorded the number of people (n = 50) who rated
each color as their favorite. Our null hypothesis is one of no special
preference. Perform a Kolmogorov-Smirnov test to see whether the ob-
served preferences fit our expected uniform distribution.


color expected observed
----- -------- --------


black 10 0
yellow 10 5
brown 10 0
blue 10 25
green 10 20



Answer:

S(n)(X) F(0)(X) D
------- ------- -


0 .2 .2
.1 .4 .3
.1 .6 .5
.6 .8 .2
1.0 1.0 0


Prob.(D(n=50) >= .5) < .01


Reject H(O), that all colors are uniformly preferred, at ALPHA = .10,
.05 and .01.


92.


You wish to compare four methods of displaying apples sold in
supermarkets. The question to be answered is:


Does one of these methods (A,B,C, or D) provide greater daily apple
sales than another?


In order to evaluate methods, a single display is used in a store for a
full day. Displays cannot be changed during the working day but can be
changed between days. A store owner agrees to let you set up and meas-
ure sales on Monday, Tuesday, Wednesday, and Thursday for each of 4
consecutive weeks during October and November (i.e. a total of 16
selling days). Prior to this test period you obtain the following
information on apple sales where the same display has been used
each day:


Units of Apples Sold Monday Tuesday Wednesday Thursday
First Week 100 105 70 100
Second Week 125 120 85 120


Design an experiment to compare A,B,C, and D. Explain your choice of
design.



Answer:

The information available on apple sales using the same display (uniform
treatment) indicates one should not assume uniform results, since both
1. Day of week (Wednesday is noticeably worse) and
2. Week (second week had more sales)
appear to affect the response. Therefore, a Latin Square Design is
appropriate to restrict randomization so that each display is tested
an equal number of times on each day of the week and an equal number
of times during each week. In this way, each display is equally
exposed to effects of week and day.


The following assignment of treatments (from the program LSQPLN***) is
one possible assignment since each display occurs once on each day of
the week as well as once during each week.


Day of Week


Week 1 2 3 4


1 C D B A
2 D C A B
3 A B D C
4 B A C D
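
Any proposed assignment can be checked for the Latin square property with a few lines of Python. This sketch is illustrative only and is unrelated to the LSQPLN program mentioned above.

```python
def is_latin_square(rows):
    """True if every symbol appears exactly once in each row and column."""
    n = len(rows)
    symbols = set(rows[0])
    rows_ok = all(len(r) == n and set(r) == symbols for r in rows)
    cols_ok = all(set(col) == symbols for col in zip(*rows))
    return len(symbols) == n and rows_ok and cols_ok

# Rows are weeks 1-4, columns are days 1-4, as in the plan above.
plan = ["CDBA", "DCAB", "ABDC", "BACD"]
print(is_latin_square(plan))
```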


93.


Suppose that you have been instructed to test 2 chemicals that are
said to be mosquito repellants. You are to compare these 4 treat-
ment combinations:


Amount of Chemical A Amount of Chemical B
____________________ ____________________
T(11) 0 0
T(12) 0 +
T(21) + 0
T(22) + +


+ indicates the recommended rate of application for each chemical.
An experimental unit consists of an arm of a subject. Each subject
provides 2 arms that are considered comparable, but subjects may
differ in mosquito appeal (i.e., experimental units are homogeneous
within incomplete blocks of size 2).


^----^----^ ^----^----^
^ ^ ^ ^ ^ ^
^ ^ ^ ^ ^ ^
^ ^ ^ ^ ^ ^
^----^----^ ^----^----^
Arm 1 2 1 2


Person 1 Person 2


A) How would you group treatments so that main effect differences
between rates for chemical A would be measured precisely, while
main effect differences between rates for B are measured less pre-
cisely? (Indicate clearly which treatments must be applied to the
same person).


B) How would you group treatments so that A and B main effects are
measured most precisely while AB interaction is estimated less pre-
cisely?



Answer:

A) To obtain high precision for A at the expense of low precision
for B, let B be the basis for grouping (B is main plot factor) ==>
Group 1: T(11), T(21); Group 2: T(12), T(22). Both group members
are assigned to the same person.


B) Let AB be the basis for grouping, i.e., group members will consist
of treatments receiving (+) signs for AB interaction, or those re-
ceiving (-) signs.


T(11) T(12) T(21) T(22)


A + + - -
B + - + -
AB + - - + ==> Group 1: T(11) and T(22)
Group 2: T(12) and T(21)


94.


An investigator wished to study the effect of an operator on the
performance of a machine. He could arrange to have each of four
operators run the machine five times. A response measurement could be
recorded each time the machine was used. How many experimental units
will he have if-


a. He randomly selects an operator, has him run the machine five times,
then selects another operator, etc.?


b. He identifies 20 turns for running the machine and randomly assigns
operators to turns subject to the requirement that each operator
perform five times?


c. He forms five groups of four turns and randomly and independently
assigns operators within each group of four?



Answer:

a. 4 : an experimental unit is a set of 5 turns (times of running the
machine) by one operator.
b. 20: an experimental unit is a turn.
c. 20: an experimental unit is a turn even though turns have been
arranged in groups.


95.


An investigator suspected that the time required to pour a mold in a
foundry was longer after lunch than before lunch. He proposed comparing
these two conditions or treatments by measuring times needed to pour a
mold.


Which of the following schemes meets the requirement of independence of
response among experimental units? Why or why not?


a. The foundry was visited one day and four times were recorded before
lunch and four times after lunch.


b. The foundry was visited on four days. On each day one time was
recorded before lunch and one after lunch.



Answer:

Scheme b comes closer to meeting the requirement of independence of
response among experimental units. In scheme a, all eight times share
the conditions of a single day, so any factor peculiar to that day
(for example, the weather, or paychecks being handed out during lunch)
affects every response and ties the responses together. In scheme b,
such day-specific factors are spread over four days, so the responses
are not all tied to one common influencing factor.


96.


A company is interested in adopting a new type of machine.  Since
it is an expensive model they are not willing to adopt it unless
they are fairly positive it will decrease the production time per
unit. If MU(S) is the mean production time per unit under the stan-
dard machine and MU(n) is the mean production time per unit under
the new machine, the appropriate pair of hypotheses to test is:


(a) H(O): MU(S) = MU(n) vs. H(A): MU(S) < MU(n)
(b) H(O): MU(S) >= MU(n) vs. H(A): MU(S) < MU(n)
(c) H(O): MU(S) = MU(n) vs. H(A): MU(S) =/= MU(n)
(d) H(O): MU(S) = MU(n) vs. H(A): MU(S) > MU(n)



Answer:

(d) H(O): MU(S) = MU(n) vs. H(A): MU(S) > MU(n)


The hypothesis we do not wish to reject unduly is MU(S) = MU(n).
This we call H(O). The alternative we wish to investigate and
not accept unduly is MU(S) > MU(n).


97.


A home owner claims that the current market value of his house is at
least $40,000. Sixty real estate agents were asked independently to
estimate the house's value. The hypothesis test that followed ended
with a decision of "reject H(O)". Which of the following statements
accurately states the conclusion?


a) The home owner is right, the house is worth $40,000.
b) The home owner is right, the house is worth less than $40,000.
c) The home owner is wrong, the house is worth less than $40,000.
d) The home owner is wrong, the house is worth more than $40,000.
e) The home owner is wrong, he should not sell his home.



Answer:

c) The home owner is wrong, the house is worth less than $40,000.


98.


In an experiment to determine the effect of a utilization review (UR)
procedure on the length of hospitalization, patients were paired by age
and sex and one member of each pair was randomly assigned to a ward that
had UR, the other to a regular ward. The results of the 30 pairs are
summarized below. The hospital wants to know at the 5% significance
level if the length of hospitalization is different for those experien-
cing a utilization review.


Regular Ward UR Ward Paired Difference
Mean Length of Stay 5.64 4.29 1.35
S.D. 2.75 2.41 3.41


a. Set up the appropriate confidence interval to evaluate this
experiment.


b. How can the hospital use the confidence interval to make a
decision about the effectiveness of the procedure?



Answer:

a. C.I. = DBAR +/- [t*S(D)/SQRT(n)]
= 1.35 +/- [2.045*3.41/SQRT(30)]
= 1.35 +/- 1.27


.08 <= MU(D) <= 2.62


b. Because MU(D) = 0 is not included in this interval, we know the
null hypothesis would be rejected in a significance test at
the 5% level. Therefore, we can conclude that the length of stay
differs between the two wards. Since the whole interval lies above
zero, stays were shorter on the UR ward.
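
The interval arithmetic can be reproduced with a minimal Python sketch. The function name is illustrative, and the critical value 2.045 is taken from the t table (29 df) as in the solution rather than computed.

```python
import math

def mean_ci(mean, sd, n, critical):
    """Confidence limits: mean +/- critical * sd / sqrt(n)."""
    half_width = critical * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

low, high = mean_ci(mean=1.35, sd=3.41, n=30, critical=2.045)
print(round(low, 2), round(high, 2))
# The hypothesized value 0 is rejected at the matching level
# exactly when it falls outside (low, high).
print(not (low <= 0 <= high))
```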


99.


A shirt manufacturer is considering the purchase of new sewing
machines. If MU(1) is the average number of shirts made per hour by
his old machines and MU(2) is the corresponding average number of
shirts per hour for the new machine, he wants to test the null
hypothesis MU(1) = MU(2) against a suitable alternative.


a. What alternative hypothesis should he use if he does not want to
buy the new machine unless it is proven superior?


b. What alternative hypothesis should the manufacturer use if he
wants to buy the new machine (which has nice features) unless
the old machines are actually superior?



Answer:

a. MU(2) > MU(1)


b. MU(1) > MU(2)


100.


A manufacturer who produces auto tires wished to compare the wearing
qualities of two types of tires, A and B. To make the comparison, a
tire of Type A and one of Type B were randomly assigned and mounted
on the rear wheels of each of 5 automobiles. The automobiles were
then operated for a specific number of miles and the amount of wear
was then recorded for each tire.


Auto A B
___________________________


1 10.6 10.2
2 9.8 9.4
3 12.3 11.8
4 9.7 9.1
5 8.8 8.3


Test H(0): MU(A) = MU(B) against H(A): MU(A) =/= MU(B) with
ALPHA = .05:


a) using 2-sample test.
b) using paired-sample test.


What are your conclusions? Explain.



Answer:

a) XBARA = 51.2/5 = 10.24
XBARB = 48.8/5 = 9.76


S(A)**2 = (SUM(X(A)**2) - ((SUM(X(A)))**2)/n)/(n - 1)
= (531.22 - (51.2**2)/5)/4
= 1.73


Similarly: S(B)**2 = (483.34 - (48.8**2)/5)/4
= 1.76


(NOTE: The following is the same as pooling since the sample sizes
are equal. However, the proper df = 5 + 5 - 2 = 8.)


S(XBARA - XBARB) = SQRT((S(A)**2)/n(A) + (S(B)**2)/n(B))
= SQRT((1.73/5) + (1.76/5))
= .836


Critical values of:
(XBARA - XBARB) = (MU(A) - MU(B)) +/- t(crit)*S(XBARA - XBARB)
= 0 +/- (2.31*.836)
= +/- 1.931


DM(CALC) = XBARA - XBARB
= 10.24 - 9.76
= .48


Since .48 is neither less than -1.931 nor more than 1.931, we
cannot reject H(O).


b)    A      B      D     D - DBAR = d    d**2


    10.6   10.2    .4        -.08        .0064
     9.8    9.4    .4        -.08        .0064
    12.3   11.8    .5         .02        .0004
     9.7    9.1    .6         .12        .0144
     8.8    8.3    .5         .02        .0004
                  ----                   -----
                   2.4                   .0280


DBAR = 2.4/5 = .48


S(D) = SQRT(SUM(d**2)/(n - 1))
= SQRT(.0280/4)
= .0837


S(DBAR) = S(D)/SQRT(n)
= .0837/SQRT(5)
= .0374


t(calc) = DBAR/S(DBAR)
= .48/.0374
= 12.83


t(crit) = 2.776 for n - 1 = 4 df


Since 12.83 > 2.776, we reject H(O) and conclude that the means
are different.


H(O) was rejected in (b) but not in (a) because pairing removes the
large car-to-car variation from the comparison, so the test for related
samples is far more sensitive here than that for independent samples.
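
The contrast between the two analyses can be seen by computing both t statistics from the raw data. This is a Python sketch using only the standard library; the variable names are illustrative, and critical values are still read from a t table.

```python
import math
from statistics import mean, stdev, variance

tire_a = [10.6, 9.8, 12.3, 9.7, 8.8]
tire_b = [10.2, 9.4, 11.8, 9.1, 8.3]
n = len(tire_a)

# Independent-samples view: the standard error carries the full
# car-to-car variation of both samples.
se_two_sample = math.sqrt(variance(tire_a) / n + variance(tire_b) / n)
t_two_sample = (mean(tire_a) - mean(tire_b)) / se_two_sample

# Paired view: differencing within each car removes that variation.
diffs = [a - b for a, b in zip(tire_a, tire_b)]
se_paired = stdev(diffs) / math.sqrt(n)
t_paired = mean(diffs) / se_paired

print(round(t_two_sample, 2), round(t_paired, 2))
```

Both statistics share the same numerator (the mean difference); only the standard error changes, which is why the paired statistic is so much larger.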


101.


We are interested in the wearing capabilities of tires. We obtain
Good-day and Good-poor Tires and 9 racing cars (and also the track
used for the Indianapolis 500 Race). We put Good-day on the
left-hand side of the car (front and rear) and Good-poor on the
right-hand side of the car (front and rear). We then allow the cars
to complete the 500 miles at a (relatively) safe speed and then
measure the wear (in millimeters) per tire.


Car No. Good-day Good-poor
------- -------- ---------
77 17 16
82 18 19
92 17 12
41 16 13
17 15 14
22 14 12
18 10 10
23 18 15
43 17 13


a. All the advertising literature claims equality between Good-day and
Good-poor. Can you present evidence to disprove this claim? Use a
significance level of 5%.


b. Comment on the validity of this experimental set-up.



Answer:

a. Let d be the difference in wear between tires on the left-hand side
compared to tires on the right-hand side. We are interested in
testing the hypothesis that the population mean of such difference
scores is zero.


H(0): MU(dBAR) = 0
H(1): MU(dBAR) =/= 0


The problem is obviously a paired experiment set-up and therefore we
perform a t-test on the difference.


Car No. GD GP d(i) d(i)**2
------- -- -- ---- -------
77 17 16 1 1
82 18 19 -1 1
92 17 12 5 25
41 16 13 3 9
17 15 14 1 1
22 14 12 2 4
18 10 10 0 0
23 18 15 3 9
43 17 13 4 16
---- -------
SUM 18 66


dBAR = SUM(d(i))/9
= 18/9
= 2


S(d)**2 = SUM((d(i) - dBAR)**2)/(n - 1)
= (SUM(d(i)**2) - n*(dBAR**2))/(n - 1)
= (66 - 9*4)/8
= 3.75


t(calc.) = (dBAR - 0)/SQRT(S(d)**2/n)
= (2 - 0)/SQRT(3.75/9)
= 3.098


t(critical, .05, two-tailed, 8 df) = 2.306


Since t(calculated) > t(critical), reject H(0). Therefore we can
claim on the basis of this test that the tires are not equal.


b. The Indianapolis race track has an oval shape with highly-banked
curves. Since the cars travel in only one direction, only the
inner tires would wear appreciably. There are many other drawbacks
to the design, but this one is catastrophic.


102.


A random sample of 625 boxes taken from the output of a box making
machine was inspected for flaws. It was found that 500 of the boxes
were free from flaws. To three decimals, what is the upper limit of
the 0.99 confidence interval estimate of the proportion of
acceptable boxes being produced?


a. .8 + 1.96*SQRT(.16/625)
b. .8 + 2.576*(.16/625)
c. .8 + 1.96*(.16/625)
d. .8 + 2.576*SQRT(.16/625)



Answer:

d. .8 + 2.576*SQRT(.16/625)


C.I. = p +/- Z(ALPHA/2)*SQRT(pq/n)
p = 500/625 = .8, q = 125/625 = .2
C.I. = .8 +/- Z(.005)*SQRT(.8*.2/625)


Upper limit = .8 + 2.576*SQRT(.16/625)
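
To obtain the limit to three decimals, the expression can be evaluated with a short Python sketch. The function name is illustrative, and the z value 2.576 is taken from the normal table as in the solution.

```python
import math

def proportion_ci(successes, n, z):
    """Normal-approximation limits: p +/- z * sqrt(p * q / n)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = proportion_ci(successes=500, n=625, z=2.576)
print(round(low, 3), round(high, 3))
```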


103.


Suppose that I'm interested in a herd of Unicorns where mean weight
is unknown, but the population variance is known to be 100. The
Director of Unicorns has given me $1200 for a year and directed me to
submit monthly reports on the average weight of this herd. It costs $5
per unicorn to weigh a beast and the Director will let me have
whatever money is left over after I pay expenses. He won't tolerate a
confidence interval greater than +/- 5 and becomes very upset with an
interval that doesn't contain the true mean. (He has a special infor-
mant who reports intervals that don't contain the true population
mean.) What shall I do? Why? Will I make any money? Will I survive
the year?



Answer:

The steps taken would probably depend upon several factors, including:


(1) What action the Director would take if you failed to
produce a confidence interval that contains the mean,


(2) What chance of failure you are willing to risk, and


(3) How you expect the mean weight of the herd to change through-
out the year.


If you feel it is necessary to estimate the mean every month with as
much confidence as possible and forego making any money, you would
use a sample size = 20 and find a confidence interval with a 97.5%
confidence level (see calculations below).


n = (1200/12)/5
= 20


Z = 5/(10/SQRT(20))
= 2.24


P(-2.24 < Z < 2.24) = .975


In order to make money, you could either decrease your confidence
level, not make a new estimate every month, or ask the director for a
raise.
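
The trade-off between budget and confidence can be explored with a small Python sketch. The function and argument names are illustrative assumptions; the population standard deviation is 10 (variance 100), and the normal distribution is used because the variance is known.

```python
import math
from statistics import NormalDist

def achievable_confidence(budget_per_report, cost_per_unit, sigma, half_width):
    """Sample size the budget allows, the resulting Z, and the confidence
    level of an interval of the required half-width."""
    n = int(budget_per_report // cost_per_unit)
    z = half_width / (sigma / math.sqrt(n))
    confidence = 2 * NormalDist().cdf(z) - 1
    return n, z, confidence

# Spending the whole monthly budget ($1200 / 12) on weighing.
n, z, confidence = achievable_confidence(1200 / 12, 5, sigma=10, half_width=5)
print(n, round(z, 2), round(confidence, 3))
```

Calling the function with a smaller budget per report shows how quickly the confidence level falls as money is kept back.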


104.


A random sample of 500 accounts receivable is selected from the 4,032
accounts that a firm has, and the sample mean is found to be
$242.30. The sample standard deviation is computed to be $3.20.
Set up a .99 confidence interval estimate of the population mean.
How do you interpret the meaning of this interval?



Answer:

Using t with ALPHA = .01 and df = 499,


C.I. = XBAR +/- (t) (S/SQRT(n))
= 242.30 +/- (2.576) (3.20/SQRT(500))
= 241.93 to 242.67.


99% of the time that this procedure is used to calculate an interval,
the resulting interval will contain MU. This interval may or may not
include MU.


105.


A survey on consumer finances reports that 33 per cent of a sample
of 2,600 spending units expected good times during the next 12 months.
Assume that a simple random sample was used in the study. Set up a
.95 confidence interval estimate of the population proportion of
spending units expecting good times.



Answer:

p = .33
n = 2600
Z(ALPHA=.025) = 1.96


Stand. error of proportion = SQRT(pq/n)
= SQRT((.33*.67)/2600)
= .009


C.I. = .33 +/- (1.96 * .009)
= from .312 to .348


106.


In a random sample of 200 television viewers in a certain area, 95
had seen a certain controversial program. Construct a 0.99 confi-
dence interval for the actual percentage of television viewers in
that area who saw the program.



Answer:

.475 +/- 2.58 SQRT((.475*.525)/200) = .475 +/- .091


107.


Taking a random sample from its very extensive files, a water company
finds that the amount owed in 16 delinquent accounts have a mean of
$16.35 and a standard deviation of $4.56.


a. Use these values to construct a .98 confidence interval for the
average amount owed on all delinquent accounts.


b. If Mr. Blackwater, the company president, claims the delinquent
accounts have a population mean of $19.01, how could you quickly
respond to him based on part a above (also after explaining that
you were using a 2% ALPHA level)?



Answer:

a. C.I. = XBAR +/- [t*S(XBAR)]; t(ALPHA=.01, one-tail, df=15) = 2.602
= 16.35 +/- [2.602*(4.56/SQRT(16))]
= from 13.384 to 19.316


b. According to the confidence interval found in part a, Mr.
Blackwater's estimate of $19.01 is a possible estimate for
the population mean. It should be pointed out that we are 98%
confident that such a confidence interval would contain the
population mean.


108.


A floor manager of a large department store is studying the buying
habits of the store's customers. Suppose he assumes that monthly
income of these customers is normally distributed with a standard
deviation of 500. If he draws a random sample of size N = 100 and
obtains a sample mean of YBAR = 800,


a) Find a .95 confidence interval for the true population mean.
b) Do you think that it would be quite unreasonable for
the true population mean to be $600? Explain.



Answer:

a) C.I. = YBAR +/- Z*SIGMA(YBAR)
= 800 +/- 1.96*(500/SQRT(100))
= 800 +/- (1.96*50)
= 800 +/- 98


Therefore, 702 < MU < 898


b) Yes. $600 lies outside the above confidence interval, so we would
reject the hypothesis that MU = 600 (at ALPHA = .05).
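
A minimal Python sketch of this interval, assuming the population standard deviation is known so that a Z value applies. The helper name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width

low, high = z_interval(800, 500, 100, 0.95)
print(round(low), round(high))   # interval endpoints, rounded to whole dollars
print(low <= 600 <= high)        # is $600 a plausible value for MU?
```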


109.


A cigarette manufacturer tests tobaccos of two different brands of cig-
arettes for nicotine content and obtains the following results:


Brand A: 4 6 5 2 3
Brand B: 7 8 5 9 6


a. Using ALPHA = .01, would you say that there is a difference in
the averages?


b. Set up 99% confidence limits on the difference. Does this
answer agree with your answer in part a? Why or why not?





Answer:

a. For Brand A:


XBARA = 20/5 = 4


S(A)**2 = (SUM(X**2) - ((SUM(X))**2)/n)/(n-1)
= (90-(400/5))/4 = 2.5


For Brand B:


XBARB = 35/5 = 7


S(B)**2 = (255-(1225/5))/4 = 2.5



S(XBARA-XBARB) = SQRT((S(A)**2/n(A)) + (S(B)**2/n(B)))
= SQRT((2.5/5) + (2.5/5))
= 1.00


(XBARA-XBARB)(crit) = MU +/- t(crit)*S(XBARA-XBARB)
= 0 +/- (3.36)*1.00
= +/- 3.36


where t(crit) = 3.36 for df = 8 and two-tailed ALPHA = .01.


(XBARA-XBARB)(calc) = 4 - 7 = -3


Since -3 is neither less than -3.36 nor more than 3.36, we cannot
reject H(0).


b. (XBARA - XBARB) - t(crit)*S(XBARA - XBARB) < MU(A) - MU(B)
< (XBARA-XBARB) + t(crit)*S(XBARA-XBARB)


-3 - 3.36 < MU(A) - MU(B) < -3 + 3.36
-6.36 < MU(A) - MU(B) < 0.36


Note that H(O): MU(A) - MU(B) = 0 is included in this interval, and
thus we are led to the same conclusion (i.e., continuation of
H(O)) as in part (a).
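
One way to check the arithmetic in this answer is to recompute everything from the raw nicotine readings. The sketch below is illustrative only: `difference_summary` is a made-up helper, and the t value is assumed to come from a table (df = 8, two-tailed ALPHA = .01) because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

brand_a = [4, 6, 5, 2, 3]
brand_b = [7, 8, 5, 9, 6]

def difference_summary(x, y):
    """Return (difference of sample means, standard error of that difference)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return mean(x) - mean(y), se

diff, se = difference_summary(brand_a, brand_b)

T_CRIT = 3.355  # assumed table value: df = 8, two-tailed ALPHA = .01
low, high = diff - T_CRIT * se, diff + T_CRIT * se

print("sample variances:", variance(brand_a), variance(brand_b))
print("difference:", diff, "standard error:", se)
print("99% limits:", round(low, 2), round(high, 2))
print("zero inside limits:", low < 0 < high)
```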


110.


A brewery producing beer has a number of specifications for quality.
Among these standards is the requirement that the degree of hop-like
flavor should be a value of 8.0.


The production of the brewery consists of a large number of batches.
It's possible for differences to arise between batches, so we will
regard each batch as a different population. We will consider the
hoppiness of each batch as a normally distributed variable with mean
and variance unknown.


From each batch you can remove 6 samples for hoppiness. For each
batch you are to:


a. set confidence limits for the batch (population) mean, MU;
b. determine if these limits are consistent with the require-
ment that hoppiness is a value of 8.0.


1. Outline the procedure to be followed in setting confidence limits
where the probability of the interval calculated including MU is:


a. 90%
b. 99%


2. Apply the procedure outlined to this set of sample values: 13,
11, 9, 14, 8, 11. Is this sample data consistent with the speci-
fication of hoppiness = 8.0 when the probability level used is:


a. 90%
b. 99%


3. Do these results suggest any weakness in the procedure used? If
so, what?



Answer:

1. a. To set 90% confidence limits including MU, we need XBAR, the
sample standard deviation, s, the sample size, n, and a t
value for ALPHA = .1.


Use the formula:
XBAR +/- (t, ALPHA/2)(s/SQRT(n)) with t(ALPHA/2) = 2.015.


b. Use the same procedure as in 1a, but use t(ALPHA/2) = 4.032.


2. XBAR = 11
Standard deviation = 2.28
n = 6


a. 11 +/- (2.015)(2.28/SQRT(6))
= 11 +/- (2.015)(.93)
= 11 +/- 1.876
= 9.124 to 12.876


This is inconsistent with the specification of hoppiness = 8.0.


b. 11 +/- (4.032)(2.28/SQRT(6))
= 11 +/- (4.032)(.93)
= 11 +/- 3.753
= 7.247 to 14.753


This is consistent with the specification of hoppiness = 8.0.


3. The basic weakness seems to be in using a procedure that produces a
confidence interval consistent with a wide range of values, which
makes it difficult to detect departures from MU = 8. This situation
is exaggerated when the 99% level is used. In addition, the sample
size used seems small relative to the variability measured.
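
The procedure outlined in part 1 can be sketched as a small Python function. The name `t_interval` is hypothetical, and the t values are taken as given from a table (df = 5).

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_crit):
    """Confidence limits for a mean using a tabled t value with n - 1 df."""
    n = len(sample)
    half_width = t_crit * stdev(sample) / sqrt(n)
    center = mean(sample)
    return center - half_width, center + half_width

hoppiness = [13, 11, 9, 14, 8, 11]
results = {}
for label, t_crit in (("90%", 2.015), ("99%", 4.032)):
    low, high = t_interval(hoppiness, t_crit)
    results[label] = (low, high)
    # The last column reports whether the specification 8.0 lies inside the limits.
    print(label, round(low, 3), round(high, 3), low <= 8.0 <= high)
```

Running both levels side by side makes the point of part 3 concrete: the same six values lead to opposite verdicts depending only on the confidence level chosen.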


111.


Willy the Waiter claims that the amount in tips that he receives per
customer on any given day is normally distributed, but that the
average and variability change from day to day (in response to changes
in Willy, the weather, the menu, etc.). So far today, Willy has
received the following amounts in tips (in dollars):


1, 2, .5, 1, 1.5, 0.


a. Write a model for Willy's tips for today. Define all terms.
b. Set 90% confidence limits for the next tip that he will receive.



Answer:

a. Y(J) = MU + EPSILON(J)
where:


Y(J) : tips received from customer J
MU : mean value for tips for the day
EPSILON(J): deviation of customer J's tips from the mean value,
assumed to be a value of a normally distributed ran-
dom variable with a mean of zero and a variance of
SIGMA**2.


b. Since the mean and variance of the population have been estimated,
the variance of the predicted, or future, value involves both the
variance of the individuals (S**2) and the variance involved in
estimating the mean ((S**2)/n).


S(Predicted value)**2 = S**2 + (S**2)/n
= (S**2)(1 + 1/n)


Finding XBAR = 1 and S = .707,
we get:


S(Pred. Value) = SQRT(.5(1 + 1/6))
= SQRT (.583)
= .764


Using ALPHA = .10 (two-tailed) and df = 5, t = 2.015


C.I. = 1 +/- 2.015(.764)
= -.54 to 2.54
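
As a rough check, the prediction limits for the next tip can be recomputed from the six observed tips. This is a sketch; the t value is assumed to be read from a table (df = 5, two-tailed ALPHA = .10).

```python
from math import sqrt
from statistics import mean, variance

tips = [1, 2, 0.5, 1, 1.5, 0]
n = len(tips)

# Variance of a future value = variance of individuals + variance of the mean.
s_pred = sqrt(variance(tips) * (1 + 1 / n))

T_CRIT = 2.015  # assumed table value: df = 5, two-tailed ALPHA = .10
low = mean(tips) - T_CRIT * s_pred
high = mean(tips) + T_CRIT * s_pred
print(round(s_pred, 3), round(low, 2), round(high, 2))
```

The lower limit is negative, which is not a possible tip; that is a reminder that the normal model is only an approximation for this kind of data.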


112.


The following triangle test is sometimes used to identify taste
experts. In the case of wine tasting, a test subject is presented
with three glasses of wine, two of one kind and a third glass of
another wine. The test subject is asked to identify the single
glass of wine. A test subject who merely guesses has a 1 chance
in 3 of identifying the single glass correctly. An expert wine
taster should be able to do much better. Let K stand for the num-
ber of correct identifications made by a test subject in 10 inde-
pendent triangle tests.


Assume that a test subject is accorded the title "expert wine
taster" if the number K of correct identifications is suffi-
ciently high to reject the hypothesis P = 1/3 at significance
level ALPHA = .02


i) A Type II error has the consequence that:


a. an experienced wine taster is accorded the title.
b. a person who is guessing is not accorded the title.
c. an experienced wine taster is not accorded the title.
d. a person who is guessing is accorded the title.


ii) The power of the test when P = .8 equals:


a. .38 b. .61 c. .88 d. .97



Answer:

i) c. an experienced wine taster is not accorded the title.


ii) c. .88


Under H(O): P = 1/3


P(X >= 7) = P(7) + P(8) + P(9) + P(10)
= .016 + .003 + 0 + 0
== .02


Therefore, X = 7 is the critical value.


Under H(A): P = .8


Power = 1 - BETA; (BETA = P(X < 7))
= P(X >= 7)
= .201 + .302 + .268 + .107
= .878
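
The binomial tail sums in this answer can be reproduced exactly with the standard library. The helper name `binom_upper_tail` is invented for this sketch.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X distributed Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# Significance level actually attained with the rule "reject if K >= 7".
alpha_actual = binom_upper_tail(7, 10, 1 / 3)
# Power of that rule when the taster's true success probability is .8.
power = binom_upper_tail(7, 10, 0.8)

print(round(alpha_actual, 4), round(power, 3))
```

Lowering the cutoff to 6 would push the tail probability under P = 1/3 above .02, which is why 7 is the critical value.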


113.


A toaster manufacturer produces two models, A and B.  Experience
indicates that 3% of the customers buying model A will make a claim
on their warranty. In a sample of 400 owners of model B (whose
warranties have expired), 16 made a claim on their warranty. The
manufacturer wishes to determine if the models differ in the number
of claims.


(a) Determine the value of the test statistic. What are your
conclusions?


(b) Let PI(B) be the probability that a buyer of model B will make a
claim on the warranty. For what values of the test statistic
would you reject H(0) when testing H(0): PI(B) = .03 against
H(A): PI(B) =/= .03? (Let ALPHA = .10).



Answer:

(a) For A: PI(A) = .03 For B: n = 400, p(B) = 16/400 = .04


H(0): PI(A) = PI(B) = .03, or PI(B) - PI(A) = 0
H(A): PI(B) =/= .03


Z(calculated) = (.04 - .03)/SQRT(.03*.97/400)
= .01/.00853
= 1.1724


Z(critical, twotail, ALPHA = .05) = +/- 1.96


Therefore continue H(0), and assume at the 95% confidence level
that the models are the same in the number of claims.


(b) Using ALPHA = .10:
Z(critical, twotail) = +/- 1.645
Reject H(0) if Z(calculated) < -1.645 or Z(calculated) > 1.645.
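
A short Python sketch of the test statistic and the ALPHA = .10 decision rule follows. The function name `one_proportion_z` is hypothetical; the standard error uses the hypothesized proportion, as in the answer.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(p_hat, p0, n):
    """Z statistic for testing a single proportion against the value p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z = one_proportion_z(16 / 400, 0.03, 400)
z_crit = NormalDist().inv_cdf(1 - 0.10 / 2)   # two-tailed, ALPHA = .10
reject = abs(z) > z_crit
print(round(z, 4), round(z_crit, 3), reject)
```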


114.


Past experience shows that, if a certain machine is adjusted properly, 5
percent of the items turned out by the machine are defective. Each day
the first 25 items produced by the machine are inspected for defects.
If three or fewer defects are found, production is continued without
interruption. If four or more items are found to be defective, produc-
tion is interrupted and an engineer is asked to adjust the machine.
After adjustments have been made, production is resumed. This proce-
dure can be viewed as a test of the hypothesis p = .05 against the
alternative p > .05, p being the probability that the machine turns
out a defective item. In test terminology, the engineer is asked to
make adjustments only when the hypothesis is rejected.


Interpret the quality control procedure described above as a test of
the indicated hypothesis. A Type I error results in:


a. a justified production stoppage to carry out machine adjustments.
b. an unnecessary interruption of production.
c. the continued production of an excess of defective items.
d. the continued production, without interruption, of items that
satisfy the accepted standard.



Answer:

b. an unnecessary interruption of production.


115.


The workers in a large plant have complained through their union
negotiators that they are being underpaid. Both sides
(labor-management) agree that the mean wage for plant workers in this
industry is about $3.75 per hour with a standard deviation of $.84 per
hour.


(i) Does the fact that a random sample of 49 workers from this plant
gave mean wage $3.54 provide sufficient evidence to indicate the
plant is paying an inferior wage? Use ALPHA = .05.


(ii) State what a Type I and a Type II error would be for this problem.



Answer:

(i) H(O): MU >= 3.75
H(A): MU < 3.75


n = 49
XBAR = 3.54
MU = 3.75
SIGMA = .84


SIGMA(XBAR) = [.84]/[SQRT(49)]
= .12


Z(calc) = [3.54 - 3.75]/[.12]
= -1.75


Z(crit, ALPHA=.05, one-tailed) = -1.645


Since Z(calc) < Z(crit), reject H(O). Therefore, sample evidence
is strong enough to suggest that workers are being underpaid.


(ii) Type I Error: A type I error will occur when the null hypothesis
is rejected on the basis of the sample information
and in reality the null hypothesis is true. In
this case, the conclusion based on the random
sample would be that the workers are being under-
paid when actually they are not. So, the workers'
complaint would be erroneously supported.


Type II Error: A type II error will occur when the null hypothesis
is not rejected on the basis of the sample informa-
tion and in reality the null hypothesis is false.
In this case, the conclusion based on the random
sample would be that the workers are not being un-
derpaid when actually they are being underpaid. So
the employers' position of just wages would be er-
roneously supported.


116.


The daily yield of a chemical manufactured in a chemical plant,
recorded for n = 49 days, produced a mean and standard deviation
equal to XBAR = 870 tons and s = 21 tons, respectively.


Test H(0): MU = 880 against H(A): MU < 880, using ALPHA = .05.
Calculate BETA for H(A): MU = 870.



Answer:

S(M) = S/SQRT(n) = 21/7 = 3
XBAR(crit) = MU(M) + Z(crit)S(M)
= 880 + ((-1.65)*3)
= 875.05


Since 870 < 875.05, we reject H(0) and conclude that MU < 880.


BETA is the probability of committing a type II error. Using the
above decision rule and given H(A), it is the probability that XBAR
is greater than XBAR(crit) = 875.05 when MU = 870.


Z = (875.05 - 870)/3 = 1.683


BETA(H(A): MU = 870) = P(XBAR > 875.05)
= P(Z > 1.683)
= .046
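
The critical value and BETA can be recomputed with the standard library's normal distribution. This is a sketch that keeps the rounded table value Z = -1.65 used in the answer; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_alt, s, n = 880, 870, 21, 49
se = s / sqrt(n)                 # standard error of the mean

Z_CRIT = -1.65                   # rounded table value, one-tailed ALPHA = .05
xbar_crit = mu0 + Z_CRIT * se    # reject H(0) when XBAR falls below this value

# BETA: probability of NOT rejecting (XBAR above the cutoff) when MU = 870.
beta = 1 - NormalDist(mu_alt, se).cdf(xbar_crit)
print(round(xbar_crit, 2), round(beta, 3))
```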


117.


We are interested in finding the linear relation between the number
of widgets purchased at one time and the cost per widget. The
following data has been obtained:


X: Number of widgets purchased-- 1 3 6 10 15
Y: Cost per widget(in dollars)--55 52 46 32 25


Suppose the regression line is YHAT = -2.5X + 60. We compute the
average price per widget if 30 are purchased and observe:


a. YHAT = -15 dollars; obviously, we are mistaken; the prediction
YHAT is actually +15 dollars.
b. YHAT = 15 dollars, which seems reasonable judging by the data.
c. YHAT = -15 dollars, which is obvious nonsense. The regression
line must be incorrect.
d. YHAT = -15 dollars, which is obvious nonsense. This reminds us
that predicting Y outside the range of X values in our data is a
very poor practice.



Answer:

d. YHAT = -15 dollars, which is obvious nonsense. This reminds us
that predicting Y outside the range of X values in our data is a
very poor practice.


118.


A management analyst is studying production in an electronic
component assembly factory. Workers individually assemble components
into final products. Each worker is given 100 sets of components to
assemble each day. Employees clock out at the time they finish
assembling the 100 sets into final products. The analyst has average
hourly production rates for each individual worker. Which mean
should be used to calculate the overall average production per labor
hour?


a. arithmetic mean
b. geometric mean
c. harmonic mean



Answer:

c. harmonic mean


The harmonic mean is properly used since the numerator in each
worker's average production is 100 units and the denominator,
hours worked, varies.
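
A small numerical illustration may help. The three hourly rates below are hypothetical values chosen for the example, not data from the question; the point is only that the harmonic mean of the rates equals total units divided by total hours when every worker assembles the same 100 units.

```python
from statistics import harmonic_mean, mean

rates = [10, 20, 25]          # hypothetical units per hour for three workers
UNITS_PER_WORKER = 100

hours = [UNITS_PER_WORKER / r for r in rates]          # time each worker needs
overall = UNITS_PER_WORKER * len(rates) / sum(hours)   # total units / total hours

print(round(overall, 3), round(harmonic_mean(rates), 3), round(mean(rates), 3))
```

The arithmetic mean of the rates overstates production per labor hour here, because slower workers contribute more hours to the total.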


119.


A management analyst is studying production in an electronic
component assembly factory. Workers individually assemble components
into final products. Workers assemble as many units as they can in
an eight hour day. The analyst has average hourly production rates
for each individual worker. To calculate the factory's overall
average hourly production per worker, which mean should be used?


a. arithmetic mean
b. geometric mean
c. harmonic mean



Answer:

a. arithmetic mean


The arithmetic mean of individual average hourly production rates
is the same as total production divided by total hours worked,
since individual rates are daily production divided by eight for
every employee.


120.


Nelly finds that 30 out of 100 randomly selected persons walking in
downtown Cincinnati believe that the government should spend more
money on health care; a similar survey in the suburbs shows that 20
out of 100 persons believe in more government spending. At the ALPHA
= 0.05 level, are these data evidence that the people in downtown
Cincinnati believe to a different extent than people in the suburbs
that the government should spend more money on health care?



Answer:

Z = (P(1) - P(2) -0)/(S)
where S is the standard error for the difference of proportions
S = SQRT((.3(.7)/100) + (.2(.8)/100)) = .06
Z(calc.) = ((.3 - .2) - 0)/.06 = 1.67
Z(crit.) = +/- 1.96 for ALPHA = .05


Since 1.67 < 1.96; Continue the H(O).
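
For reference, the same statistic computed without rounding the standard error is sketched below. The function name is hypothetical, and the unpooled standard error is used to match the formula in the answer. The unrounded Z comes out slightly smaller than the hand-computed value, and the conclusion is unchanged.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for a difference of two proportions (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

z = two_proportion_z(30, 100, 20, 100)
print(round(z, 3), abs(z) < 1.96)
```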


121.


You are to conduct an opinion poll to determine  the  opinions  of resi-
dents of a given community about a projected industrial development pro-
gram. How large a sample should you select to estimate the proportion
of adult residents favoring the projected development? Make all assump-
tions necessary to determine the sample size, and justify these assump-
tions.



Answer:

Necessary assumptions are:


Level of significance = .05, then Z = 1.96
Tolerable error, e = .05
Assume X has a binomial distribution with a population of size N
and PI = .50, where X is the number of adult residents favor-
ing the projected development.


The first two assumptions are arbitrary values that will depend upon
the preference of the researcher. The choice of the proportion has
been set at maximum variability since no other information on the
proportion was available.


n = (Z**2)(PI)(1 - PI)/(e**2)
= (1.96**2)(.5)(.5)/(.05**2)
= 384.16
== 385
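
The sample size formula is easy to wrap in a function so that other tolerable errors can be tried. This is a sketch under the same assumptions as the answer (normal approximation, PI = .50 unless stated otherwise); the function name is made up.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n giving the stated margin of error for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)   # always round up

print(sample_size_for_proportion(0.05))
print(sample_size_for_proportion(0.03))
```

Tightening the tolerable error from .05 to .03 nearly triples the required sample, since n grows with 1/e**2.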


122.


a.  For each of the samples listed below obtain:
1. a mean
2. a variance, and
3. a standard deviation


Each sample was randomly obtained from the production of the hot
dog manufacturer listed.


Company Dog Length(inches)


A 5,5,5,5,5
B 6,5,5,5,4
C 9,9,5,1,1
D 9,5,5,5,1
E 9,5,5,5,5,5,5,5,5,1
F 9,9,9,4,4,3,3,3,3,3


b. Given that the price per hot dog is the same for all manufacturers,
whose hot dogs would you buy? Why?



Answer:

a. Company   Mean                Variance                      St. Dev.
             XBAR=(SUM X(i))/n   S**2=(SUM(X-XBAR)**2)/(n-1)   S=SQRT(S**2)


   A         25/5=5              0                             0
   B         25/5=5              2/4=.5                        SQRT(.5)=.707
   C         25/5=5              64/4=16                       SQRT(16)=4
   D         25/5=5              32/4=8                        SQRT(8)=2.83
   E         50/10=5             32/9=3.56                     SQRT(3.56)=1.89
   F         50/10=5             70/9=7.78                     SQRT(7.78)=2.79


b. This question may have a variety of answers. The decision would
depend on the purpose. If it was important to have as little
variability as possible when selling 5 inch hot dogs, company A
would be best since it has the least variability. However, if you
could profit from selling hot dogs in a variety of lengths, company
F might prove best since it shows a lot of variability and produces hot
dogs ranging from 9 to 3 inches in length.
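
The table in part a can be reproduced with Python's statistics module, which uses the n - 1 divisor for sample variance. The dictionary name `lengths` is just a label for this sketch.

```python
from statistics import mean, stdev, variance

lengths = {
    "A": [5, 5, 5, 5, 5],
    "B": [6, 5, 5, 5, 4],
    "C": [9, 9, 5, 1, 1],
    "D": [9, 5, 5, 5, 1],
    "E": [9, 5, 5, 5, 5, 5, 5, 5, 5, 1],
    "F": [9, 9, 9, 4, 4, 3, 3, 3, 3, 3],
}

for company, x in lengths.items():
    # Columns: company, mean, sample variance, sample standard deviation.
    print(company, mean(x), round(variance(x), 2), round(stdev(x), 2))
```

Every company has the same mean length, so the comparison in part b rests entirely on the spread.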


123.


Two workers on the same job show the following results over a long
period of time.


Worker Worker
A B
-------------------------------------------------------------------
Mean time of completing the job (minutes) 30 25
Standard deviation (minutes) 6 4


a. Which worker appears to be more consistent in the time he
requires to complete the job? Explain.


b. Which worker appears to be faster in completing the job?
Explain.



Answer:

a. Worker B appears to be more consistent in the time he requires
to complete the job, since he has a smaller variance.


b. Worker B appears to be faster in completing the job, since he
has a smaller mean. (You could actually test this.)


124.


Suppose the manager of a plant is concerned with the total
number of man-hours lost due to accidents for the past 12
months. The company statistician has reported the mean number
of man-hours lost per month but did not keep a record of the
total sum. Should the manager order the study repeated to
obtain the desired information? Explain your answer clearly.



Answer:

No--the estimate that he would get using the mean number per
month would most likely be accurate enough, without having to
go to the extra expense of another study. Presumably the mean
number of hours lost per month is equal to the total number
of hours lost divided by 12, so it's not difficult to
calculate the total.


125.


A large health screening program that will have 36 clinics needs to
purchase scales for the clinics. A manufacturing firm has available
36 scales on which the same 180 pound man was weighed. The variance
in his weight on the 36 scales was .07 (lb**2). The screening program
will buy the scales if the variance is not significantly greater than
.05 at the 1% significance level.


a. What test statistic would you use to test the null hypothesis that
the true variance in weights on the new scale is .05? Set up the
computations.


b. What are the null hypothesis, alternative hypothesis, and critical
region of such a test?



Answer:

a. CHISQUARE = (n-1)*(S**2)/(SIGMA**2)
= (35)*(.07)/(.05)
= 49


b. H(O): SIGMA**2 = .05
H(A): SIGMA**2 > .05


Critical Region: CHISQUARE > 57.34, where
CHISQUARE(df=35, ALPHA=.01, one-tail) = 57.34


126.


Suppose that the variable measured using a random sample is annual
income. (Suppose that it and all other items were measured accurately.)
Explain what it is that these two models have to say about income.


1. Y(I) = MU + EPSILON(I)
2. Y(I,J) = MU(J) + EPSILON(I,J) Where J = 1 indicates a Democrat
J = 2 indicates a Republican



Answer:

Model 1 states that annual income can be described by a single popula-
tion having a single mean and standard deviation. Individual incomes
consist of a common mean plus random variation.


Model 2 states that description of annual income may require 2 popula-
tions, one for Democrats and one for Republicans. It provides for the
possibility of different population means for income and may also pro-
vide for different standard deviations.


NOTE: This is a case where it would probably not be advisable to
assume that EPSILONS are normally distributed.


127.


A manufacturing company operates 12 plants that are regarded as
about the same in all important respects. This company decides
to try a new safety program. The new program is randomly
assigned to 6 plants while the old program is continued at the
other 6. Number of man-hours lost per plant per month, were
measured in each plant following completion of the safety
programs. Results were:
New Program: 46, 41, 16, 11, 58, 61
Old Program: 92, 65, 10, 24, 46, 51
a. Write a model for number of man hours lost.
b. Fill out an ANOVA table corresponding to this model.
c. Was there a change in accident rate that was detectable at
the 10% level?



Answer:

a. Y(I,J) = MU(I) + EPSILON(I,J) or MU + TAU(I) + EPSILON(I,J)
Where:
Y(I,J) is manhours lost in plant J under program I
MU(I) is population mean for time lost under program I.
or
MU is population mean and
TAU(I) is used to indicate effect of program I defined as a
deviation from MU.
and EPSILON(I,J) indicates a random element associated with the Jth
plant using program I. These random elements are normally
distributed with mean = 0 and variance = SIGMA**2


b. Source             df          Source              df
   Total              12          Total               12
   Mean, Program 1     1    or    General mean         1
   Mean, Program 2     1          Corrected total     11
   Error              10            Programs           1
                                    Error             10


c. Old Program:          New Program:
   XBAR(1) = 48          XBAR(2) = 38.83
   S(1) = 29.18          S(2) = 21.03
   n = 6                 n = 6
Testing:
H(0): MU(1) = MU(2)
H(A): MU(1) =/= MU(2)
We first must test for homogeneity of variance, or
H(0): SIGMA(1) = SIGMA(2)
H(A): SIGMA(1) =/= SIGMA(2)
F(calculated) = [S(1)**2]/[S(2)**2] = 851.6/442.17
= 1.926 with 5 and 5 df
F(critical) = 5.05 with ALPHA = .05, df = 5 and 5
Since F(calculated) is less than F(critical), we continue the null
hypothesis of homogeneity of variance at the 5% level.
The variances should now be pooled:
S(p)**2 = ((5) (851.6) + (5) (442.17))/10
= 646.89
and finally find the standard error of the difference between means
S(XBAR(1) - XBAR(2)) = SQRT ((S(p)**2)(1/n(1) + 1/n(2)))
= 14.68
Now using the two-tailed t test with ALPHA = .10, df = 10 we test
the null hypothesis about the means
t(calculated) = ((48-38.83)-0)/14.68 = .6245
t(criticals) = -1.812 and 1.812
Since t(calculated) is greater than the smaller t(critical) but
less than the larger t(critical), we continue the null hypothesis
that there was no change in the accident rate (at ALPHA = .10).
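
The pooled two-sample t statistic in part c can be recomputed from the plant data as follows. This is a sketch; `pooled_t` is a hypothetical helper, and the critical value 1.812 is taken from the answer (df = 10, two-tailed ALPHA = .10).

```python
from math import sqrt
from statistics import mean, variance

old = [92, 65, 10, 24, 46, 51]
new = [46, 41, 16, 11, 58, 61]

def pooled_t(x, y):
    """Return (t statistic, degrees of freedom) using a pooled variance."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, df

t, df = pooled_t(old, new)
print(round(t, 3), df, abs(t) < 1.812)
```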


128.


In the attached Table 1, results for the routine measurement of
nickel in a steel standard are reported. This determination was made
daily over a long period of time to establish a quality control
program.


In Table 2, the data have been plotted as a tally sheet of
individual values. Clearly, a grouped tally sheet would be more
effective in revealing the pattern of variation in these data.


Perform the following --


(a) Set up a grouped tally sheet and histogram. A cell interval of
0.05% is recommended. List the frequency, cumulative frequency
and relative cumulative frequency for each cell.


(b) Calculate the mean and standard deviation (use coding) by both
the ungrouped and the grouped procedures. Compare results.


(c) What is the mode -- comment -- is it meaningful?


(d) What is the median?


(e) Calculate the standard deviation of the mean.


(f) Plot an ogive. Plot the data on normal probability paper. Is it
reasonable to assume a normal distribution? If so, estimate the
standard deviation and mean and compare with the calculated values.
Estimate the percentage of values outside of the limits 4.88 to
5.21 and compare with the actual percentage.


Table 1. Results of Daily Determination of Nickel in a Nickel
Steel Standard


Date % Ni Date % Ni Date % Ni


Mar. 6 4.95 Apr. 17 4.96 May 29 5.03
7 5.02 18 4.79 30 5.08
8 5.17 19 5.06 31 5.20
9 5.08 20 5.03 June 1 5.11
10 4.92 21 4.95 2 4.95
11 4.94 22 5.10 3 4.95


13 5.22 24 5.05 5 5.00
14 4.96 25 5.30 6 4.92
15 5.05 26 5.24 7 5.16
16 5.02 27 5.00 8 5.14
17 5.14 28 5.08 9 5.02
18 5.00 29 5.04 10 5.14


20 5.07 May 1 4.97 12 5.02
21 4.83 2 4.86 13 4.97
22 5.11 3 5.07 14 4.96
23 4.99 4 4.90 15 5.26
24 4.98 5 5.22 16 5.11
25 5.26 6 5.07 17 5.15


27 4.88 8 5.31 19 4.98
28 5.01 9 5.05 20 5.15
29 4.98 10 5.16 21 5.00
30 5.21 11 5.02 22 5.14
31 5.15 12 5.18 23 4.98
Apr. 1 5.00 13 4.90 24 5.03


3 5.00 15 5.20 26 5.01
4 5.10 16 5.08 27 4.97
5 5.03 17 5.19 28 5.12
6 4.97 18 5.16 29 4.98
7 4.89 19 4.88
8 5.12 20 4.99


10 5.27 22 4.92
11 5.09 23 5.17
12 5.13 24 5.01
13 4.93 25 5.02
14 4.93 26 5.06
15 5.04 27 5.03



Table 2. Frequency Table and Tally Sheet for the Data
in Table 1


Ni Conc., Tally Frequency Ni Conc., Tally Frequency
% (y) Marks (f) % (y) Marks (f)


4.79 X 1 5.05 XXX 3
4.80 5.06 XX 2
4.81 5.07 XXX 3
4.82 5.08 XXXX 4
4.83 X 1 5.09 X 1
4.84 5.10 XX 2
4.85 5.11 XXX 3
4.86 X 1 5.12 XX 2
4.87 5.13 X 1
4.88 XX 2 5.14 XXXX 4
4.89 X 1 5.15 XXX 3
4.90 XX 2 5.16 XXX 3
4.91 5.17 XX 2
4.92 XXX 3 5.18 X 1
4.93 XX 2 5.19 X 1
4.94 X 1 5.20 XX 2
4.95 XXXX 4 5.21 X 1
4.96 XXX 3 5.22 XX 2
4.97 XXXX 4 5.23
4.98 XXXXX 5 5.24 X 1
4.99 XX 2 5.25
5.00 XXXXXX 6 5.26 XX 2
5.01 XXX 3 5.27 X 1
5.02 XXXXXX 6 5.28
5.03 XXXXX 5 5.29
5.04 XX 2 5.30 X 1
5.31 X 1



Answer:

a) (If available, consult file of graphs and charts that could not be
computerized.)


Cell Cell Cum Rel Cum
Midpoints Boundaries f f f
4.775
4.80 1 1 0.01
4.825
4.85 2 3 0.03
4.875
4.90 8 11 0.11
4.925
4.95 14 25 0.25
4.975
5.00 22 47 0.47
5.025
5.05 15 62 0.62
5.075
5.10 12 74 0.74
5.125
5.15 13 87 0.87
5.175
5.20 7 94 0.94
5.225
5.25 4 98 0.98
5.275
5.30 2 100 1.00
5.325
___
100


b) ungrouped YBAR = 504.99/100 = 5.0499 == 5.05


ungrouped S(Y) = SQRT[(2551.3039 - 2550.1490)/99]
= SQRT(0.01166)
= 0.108 == 0.11


Grouped and coded by: Y = 0.05d + 5.05


Cell
Midpoint d f f*d f(d**2)
4.80 -5 1 -5 25
4.85 -4 2 -8 32
4.90 -3 8 -24 72
4.95 -2 14 -28 56
5.00 -1 22 -22 22
5.05 0 15 0 0
5.10 +1 12 +12 12
5.15 +2 13 +26 52
5.20 +3 7 +21 63
5.25 +4 4 +16 64
5.30 +5 2 +10 50
___ ___
sum(fd) = -2 sum(f*d**2) = 448


dBAR = (sum(fd))/n = -2/100 = -.02


YBAR = (0.05)(-.02) + 5.05 = 5.049 == 5.05


S(d) = SQRT[(448 - ((-2)**2)/100) / 99] = SQRT(4.525) = 2.127


S(Y) = (2.127)(0.05) = 0.106 == 0.11


c) 5.00 or 5.02 - not meaningful because no single value occurs
with sufficient frequency.


d) Median is average of 50th and 51st observations -
(5.03 + 5.03)/2 = 5.03


e) S(YBAR) = S(Y)/SQRT(n) = 0.108/SQRT(100) = 0.0108 == 0.011


f) Estimates graphically should compare closely.


(If available, consult file of graphs and charts that could not be
computerized.)


Actual percentage outside = 11%.
Graphical estimate should be within about 2% of this.


129.


A coffee dispensing machine provides servings that have a population
mean of 6 ounces and a population standard deviation of .3 ounces.
If the difference is measured between randomly chosen cups (e.g.
the 7th minus the 15th, the 22nd minus the 29th, etc.), the
distribution of differences will have a mean of ______ and a
standard deviation of ______.



Answer:

a. MU = 0
b. SIGMA = SQRT(.09/1 + .09/1) = .424


130.


The closing prices of two common stocks were recorded for a period
of 15 days. The means and variances were:


Y(1)BAR = 40.33, Y(2)BAR = 42.54,
S(1)**2 = 1.54, S(2)**2 = 2.96


Do these data present sufficient evidence to indicate a difference in
variability of the two stocks for the populations associated with the
two samples? [Assume stock 1 is normally distributed with mean = MU(1)
and variance = SIGMA(1)**2 and stock 2 is normally distributed with mean
= MU(2) and variance = SIGMA(2)**2; ALPHA = 5% and S(i)**2 =
SUM(j=1,n(i))([(Y(ij)-Y(i)BAR)**2]/[n(i)-1]), i=1,2.]



Answer:

H(0): SIGMA(1)**2 = SIGMA(2)**2
H(A): SIGMA(1)**2 =/= SIGMA(2)**2


F(calc) = [larger variance] / [smaller variance]
= [2.96] / [1.54] = 1.922


F(crit., df=14,14, ALPHA= .05, one-tail) = 2.48


Since our calculated F value is less than our tabled F value, we do not
reject (continue) the null hypothesis that the variances for the two
populations are the same.


131.


Once upon a time there was a king who proclaimed that a proper
kingdom should not have great differences in wealth. One day he
instructed his wizard to randomly sample his kingdom so that he
could assess the distribution of wealth.


So the wizard did this -


1. He randomly selected 100 people, found their income, and wrote
down the mean for the group of 100;
2. He randomly and independently repeated this process over and
over again;
3. He truthfully reported to the king, "I have repeatedly taken
average wealth in the kingdom of 100 subjects and find that the
average wealth is 10 units and variance of those averages is 1
unit. Further, those means are normally distributed."


a. What is mean wealth of individuals in the kingdom?
b. What is the variance for individual wealth in the kingdom?
c. Why did the wizard report on means based on samples of 100?



Answer:

a. 10 units
b. SIGMA**2 = (n) (SIGMA(XBAR)**2)
= 100 * 1
= 100
c. To conceal the variability that would be obvious if he reported
on individuals.


132.


HEADLINE: MPG for Gas Guzzler skyrockets over
MPG for Econ Scooter!


Data: in miles per gallon


Gas Econ
Guzzler Scooter
------- -------
1964 4 25
1968 5 30
1972 8 35
1976 16 40


            100|                                    G
Percent        |
Increase       |
Over         75|
Prior          |
Time           |                    G
Period       50|
               |
               |
             25|    G
               |    E
               |                    E               E
               -------------------------------------------
                  1968            1972            1976


(To complete the graph connect the three G points with
straight lines to relate the performance of Gas Guzzler.
Similarly, connect the three E points to show the trend
for Econ Scooter.)


Even though the above graph is correct, explain how it has led to
the misleading headline.



Answer:

The headline is misleading in the sense that it implies that mpg is
being compared for the two vehicles. Only upon inspection of the data
can one see that Econ Scooters have a substantially higher mpg, while
their rate of increasing mpg has not been as great. The graph accurate-
ly indicates the rate of increase in mpg, but the headline is comparing
actual mpg, which is quite different.
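
The distinction between level and rate of change can be made concrete by computing both from the data table. A short sketch follows; the function name `percent_increases` is made up for the example.

```python
mpg = {
    "Gas Guzzler": [4, 5, 8, 16],       # 1964, 1968, 1972, 1976
    "Econ Scooter": [25, 30, 35, 40],
}

def percent_increases(values):
    """Percent increase of each value over the one before it."""
    return [100 * (b - a) / a for a, b in zip(values, values[1:])]

for car, values in mpg.items():
    rates = [round(r, 1) for r in percent_increases(values)]
    print(car, "mpg:", values, "percent increase:", rates)
```

The graph plots only the second list for each car, while the headline reads as if it described the first.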


133.


Suppose that a report contains this graph:


                ^
                ^
Annual Income   ^
(thousands of   ^
$ per year)     ^
                ^
             50 +                             *
                ^                   *
                ^
                ^
                ^
                ^
                ^
                ^
                ^         *
                ^
             25 +
                ^
                ^
                ^
                ^
                *
                ^
                ^
                ^
                ^
                ----------+---------+---------+---------->
                         10        20        30
                  Years of Experience in Trade


(Note: to complete graph, connect the *'s
with a smooth curve.)


a. What does the graph indicate as annual income for someone with no
experience in the trade?


b. Describe the relation between income and experience over the inter-
val from 0 to 20.


c. Describe the relation between income and experience over the inter-
val 20 to 30.


d. Describe the overall graph.



Answer:

a. Around 12,500 dollars per year.


b. There appears to be approximately a straight line relation in which
income increases with experience over the interval from 0 to 20.
(There seems to be some curvature or flattening for experience near
20.) The change in income in this range is from around 12.5 to
around 48, so the rate of increased income is roughly $35,500/20 =
$1775 per year.


c. The relation between experience and income for experience between
20 and 30 years also appears to be roughly a straight line, but a
flat straight line, indicating that income stays roughly constant
at a little less than $50,000 per year.


d. The overall graph indicates income initially around $12,500 (no ex-
perience), increasing income in the range from 0 to 20 years exper-
ience, approaching a limit that seems to be a little below $50,000.
That limit seems to be reached sometime between 10 and 25 years.
(Income seems to remain about constant afterward.)


134.


If a random sample of 18 homes south of Center Street in Provo showed
the average selling price to be $15,000 with s**2 = $2400 and a random
sample of 18 homes north of Center Street revealed an average selling
price of $16000 with s**2 = $4800, can you conclude that there is a
statistically significant difference (ALPHA = .05) between the selling
price of homes in these areas of Provo?



Answer:

H(0): MU(north) - MU(south) = 0
H(A): MU(north) - MU(south) =/= 0


s**2 = ((18-1)(2400) + (18-1)(4800)) / (18+18-2) = 3600
s = 60


Before pooling the sample variances, we will test to see if the
population variances are equal:
H(0): SIGMA(north)**2 = SIGMA(south)**2
H(A): SIGMA(north)**2 =/= SIGMA(south)**2


F(calc.) = 4800/2400
= 2
F(crit, df=17,17, ALPHA=.05) = 2.29


So do not reject (continue) H(0), and pool the s**2's as above. (With
equal sample sizes, the pooled variance is simply the average of the
two sample variances.)


t(calculated) = (15000-16000)/((60)*SQRT(1/18+1/18)) = -50
t(crit., ALPHA=.05, df=34, two-tailed) = +/- 2.03


conclusion: there is a significant difference


135.


A machine is supposed to produce Zorkel fingers having a thickness of
.050 inches. To test if the machine is working properly, a random
sample of 16 Zorkel fingers is selected from the day's output. The
mean thickness of the sample is .053 inches with S = .003. We wish
to determine if the machine is in proper working order with
ALPHA = .01. Use a two-tailed test.



Answer:

Hypothetical Population: Set of all Zorkel fingers.
Sample : The 16 randomly selected fingers.


H(O): MU = .05. The mean Zorkel finger thickness is .05.
H(A): MU =/= .05. The mean Zorkel finger thickness is other than .05.


MU(M) = .05 by H(O)


S(M) = S/SQRT(n)
= (.003)/4
= .00075


M(crit) = MU(M) +/- t(crit)*S(M)
= .05 +/- (2.95)*(.00075)
= .052 to .048


Since M = .053 is greater than .052, we reject H(O) and conclude that
Zorkel finger thickness is other than .05.


OR, using a t-test:


t(calc) = [XBAR - MU] / [S(M)]
= [.053 - .050] / [.00075]
= 4.00


t(crit, df=15, ALPHA=.01, two-tailed) = 2.947


Since t(calc) > t(crit), we reach the same conclusion as above.
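

Both routes to the decision (critical region for the mean, and the t
ratio) can be sketched in Python as follows. The variable names are
mine, and t_crit is simply the table value quoted above.

```python
import math

n, xbar, s, mu0 = 16, 0.053, 0.003, 0.050
t_crit = 2.947  # table value quoted above (df = 15, ALPHA = .01, two-tailed)

std_err = s / math.sqrt(n)            # S(M)
lower = mu0 - t_crit * std_err        # lower critical mean
upper = mu0 + t_crit * std_err        # upper critical mean
t_calc = (xbar - mu0) / std_err

# The two decision rules are equivalent.
reject_by_region = xbar < lower or xbar > upper
reject_by_t = abs(t_calc) > t_crit
```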


136.


An investigator was interested in studying relations between a number
of factors and salary in a university. One of the factors of
interest was tenure status. After much agonizing, the investigator
decided to use the following variable for persons having faculty
appointments.


Variable called T coded as:
1 for non-tenure track people such as administrators
with faculty appointments


2 for faculty with less than one year in service in
tenure track positions


3 for faculty with one to three years on tenure track


.



.


7 for faculty having tenure and more than 20 years
service


The investigator then carried out a multiple regression analysis in
which one of the variables fitted was T.


If you accept his seven tenure classes as a reasonable grouping
scheme, would you use this approach? Why or why not?



Answer:

If I regarded the seven classes specified as a good way to form
groups based on tenure, I would want to see what would happen if I
used six independent variables instead of just one to represent
tenure effects. Using T alone might work if the relation between
salary and the code values for T were linear. But, the code values
for T don't appear to be either well ordered (the first class
includes administrators, who are apt to have higher salaries than the
following classes, which do seem to provide a roughly increasing
order) or equally spaced. (Is the amount of "tenure" the same
between classes 1 and 2 as between classes 2 and 3?)


I would not use this approach.
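

As a sketch of the "six independent variables" alternative, the code T
can be expanded into six 0/1 indicator variables. The helper name and
the choice of class 1 as the reference class are assumptions made for
this illustration, not part of the original problem.

```python
def indicator_columns(code, levels=7, reference=1):
    """Return one 0/1 indicator for each non-reference class of T."""
    return [1 if code == level else 0
            for level in range(1, levels + 1) if level != reference]

# Classes 1, 2 and 7 expanded into six indicators each.
rows = [indicator_columns(t) for t in (1, 2, 7)]
```

Each indicator gets its own regression coefficient, so no ordering or
equal spacing of the seven classes is assumed.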


137.


A report on the effect of sex on faculty salaries at a Western
University states that all ranks and departments have been surveyed.
It states that:


A simple regression with salary as dependent variable and sex
as independent variable had a regression coefficient for sex
equal $5000.


A multiple regression with the same dependent variable but
additional variables for rank (full professor, associate,
assistant, instructor), department, tenure status, length
of employment, etc. had a regression coefficient for sex
equal $100.


(In both cases, the independent variable used for sex was coded so
that the regression coefficient for sex represented the salary
advantage of males over females.)


Which of these values would you use to represent the effect of sex on
salary? Explain your answer.



Answer:

I would expect the value $100 to be more informative or
trustworthy. When all important factors affecting response have been
held constant except for a single independent variable and that
variable is related to response by a straight line relation, a simple
regression coefficient can provide a good measure of how that
variable affects response. But, if response is affected by many
variables and they are not constant in the data set being examined, a
simple regression coefficient can be very misleading. In this case,
the fact that the regression coefficient for sex changed from 5000 to
100 when other variables were included in the fitted regression
indicates that much of the apparent influence of sex on salary really
was the result of treating variables like rank, department, etc. as
constant or unimportant when in fact some of them were important and
not constant.


138.


Attached is a table relating current food prices and prices from
3 months ago for a certain supermarket in the area. Perform the
following:


a. Plot current price vs. price 3 months ago.
b. Propose a model relating current price and price 3 months ago.
Define all terms and estimate all parameters.


Produce                      Price 3 months ago    Current price
-----------------------------------------------------------------
Milk (1 gallon)                     1.39               1.39
Cheese (sliced 12 oz.)               .89                .93
Eggs (1 doz. large)                  .89                .81
Bologna (12 oz.)                     .85                .89
White Tuna Fish (7 oz.)              .75                .79
Soup (chicken noodle)                .22                .24
Green Beans (1 lb. can)              .33                .39
Ground Beef (1 lb.)                  .95                .98
Corn Flakes (12 oz.)                 .49                .49
Spaghetti (2 lbs.)                   .89                .95
Sauce (w/o meat, 16 oz.)             .59                .59
Coffee (6 oz.)                      1.57               1.57
Bread (1 oz.)                        .45                .52
Lettuce (1 head)                     .49                .33
Potatoes (10 lbs.)                   .49                .69
Fruit Cocktail (18 oz.)              .49                .49
Peanut Butter (18 oz.)               .87                .99
Yogurt (8 oz.)                       .37                .37
Rice (2 lbs.)                        .71               1.09
Cottage Cheese (1 lb.)               .65                .67


Total                              14.33              15.17



Answer:

a. If available, consult file of graphs and diagrams that could not be
computerized for graph.


b. The model I propose is: Y = B(1)*X + EPSILON


where: Y is the response, current price;
B(1) is the estimated effect of X on Y;
X is the independent variable, price 3 months ago;
EPSILON is a random error term.


The fitted equation is: YHAT = 1.046 * X


I forced this regression through the origin because, with a regres-
sion not through the origin, (the intercept equalled .052), the
intercept was not significant at the 5% level. Also, the usual
method of describing inflation would not include adding a constant
to some computed number.


The t test for the regression coefficient and F test for the regres-
sion mean square are significant.


ANOVA
Source               df      SS         M.Sq.


Uncorrected total    20    13.8443
Regression            1    13.6213    13.6213
Pooled Error         19     0.2230      .01174


R**2, adjusted for matched X error = .9894
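

The through-the-origin fit can be reproduced from the price table with
a few lines of Python. This is a sketch using plain least squares
formulas; the variable names are mine.

```python
three_months_ago = [1.39, 0.89, 0.89, 0.85, 0.75, 0.22, 0.33, 0.95, 0.49, 0.89,
                    0.59, 1.57, 0.45, 0.49, 0.49, 0.49, 0.87, 0.37, 0.71, 0.65]
current = [1.39, 0.93, 0.81, 0.89, 0.79, 0.24, 0.39, 0.98, 0.49, 0.95,
           0.59, 1.57, 0.52, 0.33, 0.69, 0.49, 0.99, 0.37, 1.09, 0.67]

sum_xy = sum(x * y for x, y in zip(three_months_ago, current))
sum_xx = sum(x * x for x in three_months_ago)
sum_yy = sum(y * y for y in current)      # uncorrected total SS

slope = sum_xy / sum_xx                   # least squares slope, no intercept
ss_regression = sum_xy ** 2 / sum_xx
ss_error = sum_yy - ss_regression
ms_error = ss_error / (len(current) - 1)  # n - 1 df: only the slope is fitted
```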


139.


A small mail-order house uses the weight of incoming mail to determine
how many of their employees are to be assigned to filling orders on a
given day. Assume a linear regression model, given X = weight (lbs)
of mail on hand at 7:00 a.m., and Y = no. of 8-hour shifts required to
fill the orders of that day. The calculated results from some data
are given below:


n = 8 SUM(X**2) = 524
SUM(X) = 56 SUM(XY) = 364
SUM(Y) = 40 SUM(Y**2) = 256 YHAT = .52 + .84X


(a) Test the hypothesis that the slope of the regression line is zero
at ALPHA = .05.


(b) Find a 90% confidence interval for the number of eight hour shifts
required if there are 10 lbs. of mail on hand at 7 a.m. on a par-
ticular day.


(c) In analyzing the fitted regression model, explain what the values
for b(0) and b(1) mean. Is there anything inconsistent about your
fitted values from a practical standpoint?



Answer:

(a) H(O): BETA = 0
H(A): BETA =/= 0


SSE = [256 - (40**2)/8] - [(364 - (56*40)/8)**2 / (524 - (56**2)/8)]
= [56] - [(84**2)/(132)]
= 2.5454


MSE = [2.5454]/[6]
= 0.4242


S(b)**2 = 0.4242/132
= 0.0032


S(b) = 0.0567


t(calc) = [0.84 - 0]/[.0567]
= 14.817


t(crit, ALPHA=.05, two-tailed, df=6) = +/- 2.447


Since t(calc) > +t(crit), reject H(O). Therefore, based on this
sample evidence, conclude that the regression coefficient is
different from zero.


(b) S(YHAT) = SQRT([.4242] * [(1/8) + ([(10 - 7)**2]/[132])])
= 0.286
(XBAR = 56/8 = 7, so the distance term is (10 - XBAR)**2 = 9.)


C.I. = YHAT +/- [t * S(YHAT)]
= [.52 + (.84*10)] +/- [1.943 * 0.286]
= 8.92 +/- 0.556
= from 8.36 to 9.48


(c) b(0) is the estimated value for BETA(0) which is the intercept
value on the Y-axis for the regression line.


b(1) is the estimated value for BETA(1) which is the slope of the
regression line. This indicates the ratio of the change in the Y-
variable with respect to the change in the X-variable for the par-
ticular line.


The fitted value that might cause some concern from a practical
standpoint is b(0), which implies that approximately a four hour
shift is needed even when no mail is on hand.
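

The slope test in part (a) can be reproduced from the summary sums
alone. The sketch below uses my own variable names, takes the slope
.84 as given in the problem statement, and takes t_crit from the table
value quoted above.

```python
import math

n, sum_x, sum_y = 8, 56, 40
sum_xx, sum_xy, sum_yy = 524, 364, 256
b1 = 0.84        # slope as given in the problem statement
t_crit = 2.447   # table value quoted above (df = 6, ALPHA = .05, two-tailed)

s_xx = sum_xx - sum_x ** 2 / n        # corrected sum of squares for X
s_xy = sum_xy - sum_x * sum_y / n     # corrected sum of cross products
s_yy = sum_yy - sum_y ** 2 / n        # corrected sum of squares for Y

sse = s_yy - s_xy ** 2 / s_xx
mse = sse / (n - 2)
se_b1 = math.sqrt(mse / s_xx)
t_calc = b1 / se_b1
reject = abs(t_calc) > t_crit
```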


140.


A report for an organization states that a simple regression was used
to relate salary to sex. The independent variable for sex was coded
so that the regression coefficient for sex represented the salary ad-
vantage of males over females. The result of fitting over 700 pairs
of values was a regression coefficient of 5000 (advantage for males
of $5000). "A test of the regression coefficient at the 1% level was
significant. The correlation coefficient r was .24."


What action would you take on the basis of this report? Explain.



Answer:

Send the report writers back to re-examine their data.


1) The only time that a simple regression would be a good way to
estimate the effect of sex on salary would be when all other
important factors affecting response have been held constant
or nearly constant. That seems unlikely in most organizations.
(There should also be a straight line relation between salary
and sex. That should be a good bet unless there are more than
two sexes in the organization.)


2) For this data set, sex has only accounted for around 6% (.24**2)
of the variation in salary. It would take someone bolder than I
am to take action on a description that leaves 94% of the varia-
tion in salary unexplained. (The test of significance says that
the evidence at hand is consistent with the claim that the regres-
sion coefficient is not zero. It offers guarantees neither that
the model fitted is reasonable nor that a worthwhile amount of var-
iation in response has been accounted for.)


141.


The following data illustrates the relationship between income and
education for a sample of nine U.S. workers.


Education (X(j) in years) Income (Y(j) in thousands of $)
0 5
6 6
8 7
10 9
12 8
12 10
12 12
14 11
16 12


a. Obtain a scattergram for the data.


b. Perform a regression analysis using the model:
Y(j) = a + b*X(j) + e(j)


c. Draw the regression line on the scattergram.


d. What income, in thousands of dollars, would you predict for a
single U.S. worker with 10 years of education?


e. Find the correlation coefficient for the data.


f. What proportion of the variance in income is "explained" by the
regression equation?


g. Based on this sample, how much extra income would an additional
year of education be worth to a person with less than 16 years
of education?



Answer:

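a. [Scattergram: income Y (thousands of $; axis marked at 5, 10, 15)
plotted against education X (years; axis marked at 5, 10, 15) for
the nine data points. Connect points A & B to form the graph of
the regression line; A lies on the Y axis just below 5, and B lies
at the upper right of the data.]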


b. YHAT = 4.1063 + .47826(X)


c. Refer to scatter diagram above.


d. Prediction for a worker with 10 years of education:
YHAT = 4.1063 + .47826(10)
YHAT = 8.889


Therefore, I predict an income of $8,889 when education = 10 years.


e. r = .892


f. r**2 = .796, so 79.6% of the variation in income has been explained
by the model.


g. An additional year of education is worth $478, since b(1), the
regression coefficient, indicates the change in Y for a unit change
in X.
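

Parts b, d, e, f and g can be checked with the usual least squares
formulas. The following Python sketch is one way to do it; the
variable names are mine.

```python
import math

education = [0, 6, 8, 10, 12, 12, 12, 14, 16]
income = [5, 6, 7, 9, 8, 10, 12, 11, 12]

n = len(education)
x_bar = sum(education) / n
y_bar = sum(income) / n
s_xx = sum((x - x_bar) ** 2 for x in education)
s_yy = sum((y - y_bar) ** 2 for y in income)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(education, income))

b = s_xy / s_xx                      # slope: income change per extra year
a = y_bar - b * x_bar                # intercept
r = s_xy / math.sqrt(s_xx * s_yy)    # correlation coefficient
r_squared = r ** 2                   # proportion of variation explained
predicted_at_10 = a + b * 10         # in thousands of dollars
```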


142.


An experiment was conducted in a supermarket to observe the relation
between the amount of display space allocated to Petrushka brand
coffee and its weekly sales. The data for the five time periods are
below.


Space Allocated (sq. yds.) - X: 1 2 3 4 5
Weekly Sales (cases) - Y: 2 4 5 6 8


After gathering the data, Clark Kent, the SUPERmarket MANager,
discovered that the slope of the least squares line is 1.40 and the
intercept is +0.80.


a) Plot the least squares line and the data points. Comment on the
fit.


b) What would you predict the weekly sales would be if the manager
allocated 4.5 sq. yd.?


c) How much of an increase in sales can he expect for every extra
sq. yd. of display space?


d) What does LEAST SQUARES refer to?


e) Why are there two equations for the confidence intervals for a
future Y value, and when would you use each one?



Answer:

a) [Plot: weekly sales Y (cases; axis marked 1 to 9) against space
allocated X (sq. yds.; axis marked 1 to 5), showing the five data
points. NOTE: To plot an approximation to the regression line,
connect points A & B. Point A lies on the Y axis at about 1.
Also note that point B is a data point (X = 3, Y = 5).]


The least squares line appears to fit the data very well.


b) Sales = (1.4*4.5) + .8
= 7.1


c) 1.4 cases per week for each extra sq. yd. (the slope of the line).


d) Least squares refers to minimizing the sum of the squared vertical
distances between the data points and the regression line.


e) One equation is used to arrive at confidence intervals for predict-
ing the mean response, while the other is used for predicting a
particular response.
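

To make part e) concrete, here is a Python sketch of the two standard
errors at X = 4.5 for these data. The names are mine; the formulas are
the usual ones for simple linear regression, and the multiplier t would
still come from a table (df = 3).

```python
import math

space = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 6, 8]
intercept, slope = 0.80, 1.40   # least squares values given in the problem
x_new = 4.5

n = len(space)
x_bar = sum(space) / n
s_xx = sum((x - x_bar) ** 2 for x in space)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(space, sales))
mse = sse / (n - 2)

leverage = 1 / n + (x_new - x_bar) ** 2 / s_xx
se_mean_response = math.sqrt(mse * leverage)          # mean of Y at x_new
se_single_response = math.sqrt(mse * (1 + leverage))  # one future Y at x_new
```

The interval for a single future response is always the wider of the
two, because it adds the variation of an individual observation to the
uncertainty in the fitted line.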


143.


A study was conducted in which typing speed (number of words per minute)
was measured each day after the beginning of a period of practice
typing. Part of the results of fitting a series of polynomial models
appear below.


df for Error S**2 R**2
Linear 9 28.40 .85
Quadratic 8 4.51 .98
Cubic 7 .91 .99


On the basis of this information, which model would you choose? Why?



Answer:

SSE(Quad.) = 4.51*8 = 36.08
SSE(Cubic) = 0.91*7 = 6.37
Diff. SS. = 29.71 with 1 df
F(CALC) = Mean Sq. Diff./Error Mean Sq. (Containing Model)
= 29.71/.91 = 32.65 with 1 and 7 df.
F(CRITICAL, ALPHA = .05, df = 1,7) = 5.59


Since F(CALC) is greater than F(CRITICAL), we reject the null
hypothesis that the coefficient of the cubic term equals zero.
I would, therefore, choose the cubic model.
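

The same "extra sum of squares" comparison can be written as a small
function and applied to each step up in degree. This is a sketch; the
function name extra_ss_f is hypothetical.

```python
def extra_ss_f(ms_reduced, df_reduced, ms_full, df_full):
    """F ratio for the terms added in going from the reduced to the full model."""
    sse_reduced = ms_reduced * df_reduced
    sse_full = ms_full * df_full
    extra_df = df_reduced - df_full
    return ((sse_reduced - sse_full) / extra_df) / ms_full

f_cubic_vs_quadratic = extra_ss_f(4.51, 8, 0.91, 7)
f_quadratic_vs_linear = extra_ss_f(28.40, 9, 4.51, 8)
```

Each ratio is compared with the tabled F for 1 and df_full degrees of
freedom.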


144.


What would you guess the value of the correlation coefficient to be for
the pair of variables: "number of man-hours worked" and "number of
units of work completed"?


a) Approximately 0.9
b) Approximately 0.4
c) Approximately 0.0
d) Approximately -0.4
e) Approximately -0.9



Answer:

a) Approximately 0.9


145.


The results of an imaginary investigation of the effect on sales of
different methods of displaying peaches included:
ANOVA
Source of Variation df SS M.S.


Total 25
Mean 1
Corrected Total 24
Day of the week 4 400 100
Fruit Market 4 1000 250
Display 4 200 50
Error 9 225 25


Using the information contained in this table perform appropriate
tests to decide if background variation accounts for the effect on
sales of:
a. Display
b. Day of Week
c. Fruit Market



Answer:

a. F(calculated) = 50/25 = 2
F(critical, df = 4, 9, ALPHA = .05) = 3.63
Therefore, retain the null hypothesis that the effect of display equals
zero.


b. F(calculated) = 100/25 = 4
F(critical) = 3.63
Therefore, reject the null hypothesis that the effect of the day of the
week equals zero.


c. F(calculated) = 250/25 = 10
F(critical) = 3.63
Therefore, reject the null hypothesis that the effect of fruit market
equals zero.
From these F tests we can see that background variation accounts for the
effect on sales of display only.


146.


To test the hypothesis that shelf placement influences sales, a
marketing researcher has collected data on sales in a random sample
of 15 comparable supermarkets with 3 different shelving policies
for an identical brand of soup. The data is weekly sales figures
(in tens of cans). Perform the appropriate test at the 5% level.
If you reject, which shelving policies are different? (Note:
1/SQRT(.4) = 1.6.)


bottom shelf middle shelf top shelf
sales sales sales
------------ ------------ ---------
10 25 10
5 20 10
10 25 20
10 30 20
15 50 40


Sums 50 150 100
Sums of
squared scores 550 5050 2600 8200



Answer:

OVERALL MEAN(SALES) 20
STANDARD DEVIATION 10
COEFFICIENT OF VARIATION 50


ANALYSIS OF VARIANCE:
---------------------
SOURCE OF VARIATION     DF       SS        MEAN SQUARE   F(CALC.)
UNCORRECTED TOTAL       15    8200.0000
CORRECT'N FOR MEAN       1    6000.0000
CORRECTED TOTAL         14    2200.0000
SHELF                    2    1000.0000     500.00000      5.00
EXPERIMENTAL ERROR      12    1200.0000     100.00000


MEANS FOR SHELF
TREATMENT MEAN(SALES)
MIDDLE 30
TOP 20
BOTTOM 10


PROBABILITY LEVEL FOR COMPARING MEANS = .05
VALUE FOR STUDENT'S t (DF=12,ALPHA=.05,TWO-TAILED) = 2.179


LSD FOR ABOVE MEANS IS 13.7812 at PROB.LEVEL .05
(Note: LSD means Least Significant Difference.)
------------------------------------------------------------------------
F(critical, df=2,12, ALPHA=.05) = 3.88


Therefore, reject the null hypothesis that shelving policy does not
influence sales.


Based on the LSD given above, it appears that there is a significant
difference between the middle shelf and the bottom shelf.
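

The analysis of variance and the LSD comparison can be reproduced with
the Python sketch below. Variable names are mine, and t_crit is the
table value quoted above rather than a computed quantile.

```python
import math
from itertools import combinations

sales = {
    "bottom": [10, 5, 10, 10, 15],
    "middle": [25, 20, 25, 30, 50],
    "top": [10, 10, 20, 20, 40],
}
t_crit = 2.179  # table value quoted above (df = 12, ALPHA = .05, two-tailed)

values = [y for group in sales.values() for y in group]
grand_mean = sum(values) / len(values)
means = {shelf: sum(g) / len(g) for shelf, g in sales.items()}

ss_shelf = sum(len(g) * (means[s] - grand_mean) ** 2 for s, g in sales.items())
ss_error = sum((y - means[s]) ** 2 for s, g in sales.items() for y in g)
df_shelf = len(sales) - 1
df_error = len(values) - len(sales)
ms_error = ss_error / df_error
f_calc = (ss_shelf / df_shelf) / ms_error

per_shelf = 5  # stores per shelving policy
lsd = t_crit * math.sqrt(2 * ms_error / per_shelf)
different = [(a, b) for a, b in combinations(sales, 2)
             if abs(means[a] - means[b]) > lsd]
```

Only pairs of shelves whose means differ by more than lsd end up in
the list named different.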


147.


Suppose that you wish to test 4 brands of tires for length of
usefulness and that you have available 4 car-driver combinations.
Thus you have available 16 experimental units if you consider
each tire position on a car as an experimental unit: i.e.


Front-right Front-left Rear-right Rear-left
___________ __________ __________ _________
Car 1 Unit 1 Unit 2 Unit 3 Unit 4
Car 2 Unit 5 Unit 6 Unit 7 Unit 8
Car 3 Unit 9 Unit 10 Unit 11 Unit 12
Car 4 Unit 13 Unit 14 Unit 15 Unit 16


A. Assign Brands (A,B,C,D) randomly to experimental units
(i.e., Use a procedure appropriate to a completely random,
CR, design). Show how you used the random numbers table.
Do you see any dangers in using a CR design for this kind
of experiment?


B. Assign brands to experimental units subject to the restric-
tion that each brand must be tested once in each tire posi-
tion (i.e., Use a procedure appropriate to a randomized com-
plete block, RCB, design where tire position is used in form-
ing blocks). Show how you used the random numbers table. Do
you see any dangers in using an RCB design for this kind of
experiment?


C. Suggest a way of testing tires in this situation that might
overcome the dangers of using either a CR or RCB design.



Answer:

A. Generate a random ordering of the 16 unit numbers, using a
computer or a random numbers table, and assign the brands to
the units in that order, so that each brand appears 4 times
in the experiment, i.e.


2, 4, 8, 14, 13, 12, 7, 1, 16, 6, 15, 3, 9, 5, 10, 11


e.g. Brand A to unit 2
" B " " 4
" C " " 8
" D " " 14
" A " " 13
" B " " 12
" C " " 7
" D " " 1
etc.


(1) There is a possibility that one brand might
appear on only one tire position (e.g. A to 1,5,
9, and 13)


(2) Or one brand might appear on only one car (e.g.
B to 1,2,3, and 4).


B. Problem (1) will be solved since in RCB designs each brand
will be applied once to each tire position. e.g. In Front-
right (say as block I) randomly assign brands to units
1,5,9 and 13 (e.g A to 5, B to 13, C to 1 and D to 9).
This scheme will eliminate the position to position vari-
ation. But one would expect with this scheme a car to
car variation (row-wise).


C. A Latin square design will resolve (2) as opposed to RCB
and resolve (1) and (2) as opposed to CR. In this scheme
each brand will appear once and only once in each position
and each car. One way is:


_________________________
^ A ^ B ^ C ^ D ^
^_______________________^
^ B ^ C ^ D ^ A ^
^_______________________^
^ C ^ D ^ A ^ B ^
^_______________________^
^ D ^ A ^ B ^ C ^
^_______________________^
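

A randomized version of such a square can be produced by computer
instead of a random numbers table. The Python sketch below starts from
the cyclic square shown above and shuffles its rows and columns. The
function name is hypothetical, the seed is arbitrary, and this simple
scheme does not sample uniformly from all possible Latin squares.

```python
import random

def random_latin_square(treatments, rng):
    """Cyclic Latin square with rows and columns put in random order."""
    k = len(treatments)
    square = [[treatments[(r + c) % k] for c in range(k)] for r in range(k)]
    rng.shuffle(square)                  # random order of rows (cars)
    col_order = list(range(k))
    rng.shuffle(col_order)               # random order of columns (positions)
    return [[row[c] for c in col_order] for row in square]

brands = ["A", "B", "C", "D"]
square = random_latin_square(brands, random.Random(1))
```

Shuffling whole rows and whole columns keeps the defining property:
each brand still appears once in each car and once in each position.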


148.


The manager of a department store wished to compare the influence of
background music on the volume of sales in the shoe department.
He wished to test:


T1. Waltzes,
T2. Marches,
T3. Acid Rock,
T4. Polkas


He decided to use the same treatment for a sales period where each
week provided four periods.


P1. Friday - 10 a.m. to 3 p.m.
P2. Friday - 3 p.m. to 8 p.m.
P3. Saturday - 10 a.m. to 3 p.m.
P4. Saturday - 3 p.m. to 8 p.m.


He also decided to use one month (4 weeks) for testing. When asked
what, if any, differences would he find if the same background music
were used during all test periods he answered:


a. Sales would be greatest during P2 and P4. P3 would be better
than P1.


b. Sales would be best during the first week of the month, next
best during the 3rd week, and about equally poor during the 2nd
and 4th week.


A. Define and illustrate experimental unit in terms of this problem.
B. Would you elect to conduct this inquiry as indicated?
C. Suppose that you have no alternative but to conduct an experiment
under the conditions decreed by the manager. Which of the common
designs discussed in the course would you use?
D. Write the model for the design you have chosen. Define all terms
carefully. (Be sure that your definitions of terms are relevant
to this particular problem.)



Answer:

A. An experimental unit is one of the four time periods during a
certain week, such as: Saturday from 10 a.m. to 3 p.m. during
the first week.
B. Problems:
1. The treatment set doesn't include a treatment with no
music. Yet that would seem to be a reasonable stan-
dard for comparison. This treatment set only allows
comparisons among conditions involving background music.
2. If a randomized block or latin square design is to
be used it requires the assumption that there is no
interaction between treatments and the blocking factor.
It seems questionable that the difference in sales be-
tween, say, acid rock and waltzes would be the same for
10 a.m. to 3 p.m. on Friday and 3 p.m. to 8 p.m. on
Saturday.
C. I would use the Latin Square design with time of day and week
of month as my blocking factors.
D. Y(I,J,K) = MU + TAU(I) + RHO(J) + KAPPA(K) + EPSILON(I,J,K)
with I = 1, 2, 3, 4
J = 1, 2, 3, 4
K = 1, 2, 3, 4


where Y(I,J,K) is the response
MU is the overall mean
TAU(I) are the treatment (type of music) effects
RHO(J) are the effects of the time period
KAPPA(K) are the effects of the weeks of the month
EPSILON is the random error


149.


A test was conducted to compare the relative effectiveness of three
waterproofing compounds, (A,B,C). A strip of cloth was subdivided
into nine pieces - - -


Left Center Right
_____ _____ _____ _____ _____ _____ _____ _____ _____


_____ _____ _____ _____ _____ _____ _____ _____ _____


Each piece was considered to be an experimental unit, but it was
suspected that the pieces differed systematically from left to
right in capacity to become waterproofed. Accordingly, the
random assignments of compounds to experimental units was res-
tricted so that:


I. Each compound was tested once in each set of three pieces (sets
are left, center, and right); and
II. Each compound was tested once in each of the positions within a
set of three (once furthest left in a section, once in the cen-
ter of a section, and once on the right of a section).


a. Write a model appropriate to such a trial.
b. Analyze and interpret the following results for such a randomization
scheme:


Left Center Right
_____ _____ _____ _____ _____ _____ _____ _____ _____
B, 12 A, 15 C, 16 A, 11 C, 17 B, 10 C, 10 B, 12 A, 14
_____ _____ _____ _____ _____ _____ _____ _____ _____


(consider higher numbers as better)



Answer:

a. This is an LSQ design where the model is:


Y(I,J,K) = MU + TAU(I) + RHO(J) + KAPPA(K) + EPSILON(I,J,K)
Y is response, degree of waterproofing
MU is an overall mean for waterproofing
TAU(I) are the treatment effects
RHO(J) are the column effects, the position of a piece within its
set of three
KAPPA(K) are the row effects, the set (left, center, or right) to
which the piece belongs
EPSILON is the random error, assumed to be normally distributed
with mean = 0 and variance = SIGMA**2


Estimates of parameters


SIGMA**2 = 5.333


MU        13         RHO(1)   -2         KAPPA(1)    1.333
TAU(1)      .333     RHO(2)    1.667     KAPPA(2)   - .333
TAU(2)   - 1.667     RHO(3)     .333     KAPPA(3)   -1
TAU(3)     1.333


Treatment means were:
C = 14.333
A = 13.333
B = 11.333


b. None of the differences among treatment means appear to be signi-
ficant; they are all less than the LSD of 18.7148 (ALPHA = .01).


The F test for treatments (alternative test with higher Type II
error rate):


H(0): TAU(1) = TAU(2) = TAU(3) = 0
F(calculated) = 1.3125
F(table, ALPHA = .01, df = 2,2) = 99,


also does not allow one to reject H(0). In conclusion, it appears
that none of the compounds are significantly different from any
other at ALPHA = .01.
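

The estimates and the F ratio above can be recomputed from the nine
observations. The Python sketch below is an illustration with names of
my own choosing; each record is (set, position within set, compound,
response).

```python
data = [("left", 1, "B", 12), ("left", 2, "A", 15), ("left", 3, "C", 16),
        ("center", 1, "A", 11), ("center", 2, "C", 17), ("center", 3, "B", 10),
        ("right", 1, "C", 10), ("right", 2, "B", 12), ("right", 3, "A", 14)]

grand_mean = sum(rec[3] for rec in data) / len(data)

def effects(index):
    """Mean of each level of one factor, minus the grand mean."""
    groups = {}
    for rec in data:
        groups.setdefault(rec[index], []).append(rec[3])
    return {level: sum(v) / len(v) - grand_mean for level, v in groups.items()}

def sum_of_squares(effect_table):
    return 3 * sum(e ** 2 for e in effect_table.values())  # 3 pieces per level

set_effects = effects(0)
position_effects = effects(1)
compound_effects = effects(2)

ss_total = sum((rec[3] - grand_mean) ** 2 for rec in data)
ss_error = (ss_total - sum_of_squares(set_effects)
            - sum_of_squares(position_effects)
            - sum_of_squares(compound_effects))
ms_error = ss_error / 2                  # (3 - 1)(3 - 2) = 2 df
f_compounds = (sum_of_squares(compound_effects) / 2) / ms_error
```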


150.


A test has been conducted in which four tire brands have been tested
using 12 experimental units where an experimental unit consisted of one
tire position on one car. The random assignment of brands to experi-
mental units was restricted so that each brand was tested once on each
car. Results (in amount of wear) were:


Front Right Front Left Rear Right Rear Left


Car 1 D, 7.17 A, 7.62 B, 8.14 C, 7.76
Car 2 B, 8.15 A, 8.00 D, 7.57 C, 7.73
Car 3 C, 7.74 B, 7.87 A, 7.93 D, 7.80


a. Write a model appropriate to this trial and estimate all parameters.
b. Do any of the assumptions for this design make you uneasy? Explain.
c. Analyze and interpret these results.



Answer:

a. The model is Y(I,J) = MU + TAU(I) + RHO(J) + EPSILON(I,J)
where Y is the response, tread wear
TAU(I) are the treatment effects, effects of tire brand
RHO(J) are the block effects, effects of car
EPSILON is the random error term with mean = 0 and
variance = SIGMA**2
MU is the overall mean


Estimates of parameters:


MU(HAT) = 7.79
TAU(A,HAT) = .0599 = .06
TAU(B,HAT) = .2633
TAU(C,HAT) = -.04667 = -.047
TAU(D,HAT) = -.27667 = -.277
RHO(1,HAT) = -.1175
RHO(2,HAT) = .0725
RHO(3,HAT) = .045


SIGMA**2 = .0419 with 6 df.


b. Using a randomized block (RCB) design makes me uneasy since I would
expect wheel position on car to also affect tread wear. Therefore,
I would also block on wheel position as well as car and use a Latin
Square design.


c. Treatments means are: B = 8.053, A = 7.85, C = 7.743, D = 7.513
Only one difference is significant at the .05 level. Tires B and
D are different since their difference is greater than the LSD.
(B - D) +/- LSD
.54 +/- .409
Interval is from .131 to .949
Since the interval does not include zero, we reject the null hypo-
thesis that the true difference is zero.


The F test for treatments fails. This is the case where the LSD
indicates a significant difference while the F test of treatments
doesn't. These procedures usually are different and usually have
different properties regarding Type I and Type II error rates.
Here, the LSD is more exposed to Type I errors and the F test is
more exposed to Type II errors.
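

The disagreement between the LSD and the F test can be seen directly
by recomputing both. In the Python sketch below the names are mine; the
value 2.447 is the tabled t for 6 df (ALPHA = .05, two-tailed), and
4.76 is the tabled F for 3 and 6 df at ALPHA = .05.

```python
import math

wear = {  # car -> {brand: wear}
    1: {"D": 7.17, "A": 7.62, "B": 8.14, "C": 7.76},
    2: {"B": 8.15, "A": 8.00, "D": 7.57, "C": 7.73},
    3: {"C": 7.74, "B": 7.87, "A": 7.93, "D": 7.80},
}
brands = ["A", "B", "C", "D"]
cars = list(wear)

grand_mean = sum(wear[c][b] for c in cars for b in brands) / (len(cars) * len(brands))
brand_mean = {b: sum(wear[c][b] for c in cars) / len(cars) for b in brands}
car_mean = {c: sum(wear[c].values()) / len(brands) for c in cars}

ss_total = sum((wear[c][b] - grand_mean) ** 2 for c in cars for b in brands)
ss_brand = len(cars) * sum((m - grand_mean) ** 2 for m in brand_mean.values())
ss_car = len(brands) * sum((m - grand_mean) ** 2 for m in car_mean.values())
df_error = (len(cars) - 1) * (len(brands) - 1)
ms_error = (ss_total - ss_brand - ss_car) / df_error
f_brand = (ss_brand / (len(brands) - 1)) / ms_error

t_crit = 2.447  # table value, df = 6, ALPHA = .05, two-tailed
f_crit = 4.76   # table value, df = 3 and 6, ALPHA = .05
lsd = t_crit * math.sqrt(2 * ms_error / len(cars))
```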


151.


Write out the sources of variation and the degrees of freedom for the
following industrial experiment. Mention also the name of the design.


Three machines were used to produce parts made from four kinds of
metal. Each machine made one part from each type of metal. The order
with which the metals were assigned to the machines was established
through a randomization procedure.



Answer:

Source of Variation df
------------------- --


Total 12
Mean 1
Metals 3
Machines 2
Residual 6
(Metal x Machine)


This is a randomized block experiment with machines playing the role
of blocks (the metals are the treatments, and their order was
randomized within each machine).


152.


The Crapi Cable Company #35 cable has a mean breaking strength of 1800
pounds with a standard deviation of 100 pounds. A new material is used
which, it is claimed, increases the breaking strength. To test this
claim a random sample of 50 cables, manufactured with the new material,
is tested. It is found that the sample has a mean breaking strength
of 1850 pounds. Test this claim using ALPHA = .01.



Answer:

Hypothetical population: All Crapi #35 cables made with the new
material.
Sample: The 50 cables randomly selected.


H(O): MU = 1800. The mean breaking strength of the new cable is
1800 lb.
H(A): MU > 1800. The mean breaking strength of the new cable is
more than 1800 lb.


MU(XBAR) = 1800 by H(O)


SIGMA(XBAR) = SIGMA/SQRT(n)
= 100/SQRT(50)
= 14.142


XBAR(crit) = MU(XBAR) + Z(crit)*SIGMA(XBAR)
= 1800 + (2.33)*(14.142)
= 1832.951


Since the sample mean breaking strength is 1850, which is greater than
1832.951, we must reject H(O) and conclude that the mean breaking
strength of the new cable is significantly more than 1800 lb.
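

A Python sketch of the same upper-tailed Z test follows. The names are
mine, and z_crit is the table value used above.

```python
import math

mu0, sigma, n, xbar = 1800, 100, 50, 1850
z_crit = 2.33  # table value used above (ALPHA = .01, one-tailed)

std_err = sigma / math.sqrt(n)        # SIGMA(XBAR)
xbar_crit = mu0 + z_crit * std_err    # reject H(O) if XBAR exceeds this
z_calc = (xbar - mu0) / std_err       # equivalent test statistic
reject = xbar > xbar_crit
```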


153.


In the past a chemical fertilizer plant has produced an average of
1100 pounds of fertilizer per day. The record for the past year based
on 256 operating days shows the following:


XBAR = 1060 lbs/day
S = 320 lbs/day


where XBAR and S have the usual meaning. It is desired to test
whether or not the average daily production has dropped significantly
over the past year. Suppose that in this kind of operation, the
traditionally acceptable level of significance has been .05. But the
plant manager, in his report to his bosses, uses level of significance
.01. Analyze the data at both levels after setting up appropriate
hypotheses, and comment.



Answer:

H(O): MU = 1100
H(A): MU < 1100


Since n = 256, use Z to approximate t.


S(XBAR) = 320/SQRT(256)
= 320/16
= 20


Z(calculated) = (1060 - 1100)/20
= -40/20
= -2


Z(critical, ALPHA=.05, one-tailed) = -1.645


Z(critical, ALPHA=.01, one-tailed) = -2.33


Since -2.33 < Z(calculated) = -2 < -1.645, H(0) is rejected at
ALPHA=.05 but continued at ALPHA=.01.
It appears that the manager is trying to pull a fast one on his
bosses by using ALPHA=.01 and saying production has not dropped.
However, if the traditional level of significance is used, ALPHA=.05,
there is evidence that indicates a drop in production.
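

Reporting the tail probability makes the dependence on ALPHA explicit.
The Python sketch below uses the standard library's NormalDist for the
normal approximation; the variable names are mine.

```python
import math
from statistics import NormalDist

xbar, s, n, mu0 = 1060, 320, 256, 1100

z_calc = (xbar - mu0) / (s / math.sqrt(n))
p_value = NormalDist().cdf(z_calc)    # lower-tail area, since H(A): MU < 1100

reject_at_05 = p_value < 0.05
reject_at_01 = p_value < 0.01
```

The tail area falls between .01 and .05, which is why the two levels
of significance lead to different decisions.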


154.


The Pfft Light Bulb Company claims that the mean life of its 2 watt
bulbs is 1300 hours. Suspecting that the claim is too high, Nalph
Rader gathered a random sample of 64 bulbs and tested each. He found
the average life to be 1295 hours with s = 20 hours. Test the com-
pany's claim using ALPHA = .01.



Answer:

Hypothetical population: All Pfft 2 watt bulbs.
Sample: The 64 randomly selected bulbs.


H(O): MU = 1300. The mean life of 2 watt bulbs is 1300 hours.
H(A): MU < 1300. The mean life of 2 watt bulbs is less than 1300
hours.


MU(XBAR) = 1300 by H(O)


S(XBAR) = S/SQRT(n)
= 20/8
= 2.5


XBAR(crit) = MU(XBAR) - Z(crit)*S(XBAR)
= 1300 - 2.33*2.5
= 1294.18


Since 1295 is not less than 1294.18, we cannot reject H(O). There is
not enough evidence to conclude that the mean life of the 2 watt bulbs
is significantly less than 1300 hours.


155.


The Rickety Railroad Company claims that .5 of the trains on its Foggy
Bottom branch run on time. An Interstate Commerce Commission investi-
gator doubts the claim but is uncertain about whether the true frac-
tion is less than or greater than the claim. A random sample of 64
trains was checked and he found that 21 were on time. Test the com-
pany's claim using ALPHA = .05.



Answer:

Hypothetical population: All trains on the Foggy Bottom branch.
Sample: The 64 randomly selected trains.


H(O): PI = .5 50% of the Foggy Bottom branch trains run on time.
H(A): PI =/= .5 Other than 50% of the Foggy Bottom branch trains
run on time.


We can use the normal approximation since:


n*PI = 64 * .5 = 32 > 5; and
n*(1 - PI) = 64 * (1 - .5) = 32 > 5.


MU(p) = .5 by H(O)


SIGMA(p) = SQRT(PI*(1 - PI)/n)
= SQRT(.5*(1 - .5)/64)
= .0625


p(crit) = MU(p) +/- Z(crit)*SIGMA(p)
= .5 +/- 1.96*.0625
= .6225, .3775


In this case, p = 21/64 = .328, which is less than .3775, so we
reject H(O). Other than 50% of the Foggy Bottom branch trains
run on time.


OR:


Z(calc) = (p - PI)/SIGMA(p)
= ((21/64) - .5)/(.0625)
= -2.75


Since -2.75 < -1.96 (Z-crit), we reject H(O) and reach the same
conclusion as above.
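

The two-tailed test for a proportion, including the check that the
normal approximation is reasonable, can be sketched in Python as
follows. Variable names are mine; the critical value is computed with
the standard library's NormalDist rather than read from a table.

```python
import math
from statistics import NormalDist

n, on_time, pi0, alpha = 64, 21, 0.5, 0.05

normal_ok = n * pi0 > 5 and n * (1 - pi0) > 5   # rule of thumb used above
p_hat = on_time / n
sigma_p = math.sqrt(pi0 * (1 - pi0) / n)
z_calc = (p_hat - pi0) / sigma_p
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96
reject = abs(z_calc) > z_crit
```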


156.


Define the term "stratified sample" and explain
why it would be useful in the following situation. A
company is composed of many small plants located through-
out the United States. A Vice President of the company
wants to determine the opinions of the employees on
the vacation policy.



Answer:

A stratified sample is one which has been
obtained by a procedure in which the frame is divided into
nonoverlapping categories (strata). Sampling units are
then selected at random from each stratum, thus assuring
that all strata are represented in the sample.
For the given problem, I would suggest a type of
stratified sampling procedure. Specifically, I would
recommend that each plant be considered a stratum and a
random sample obtained within each stratum to ensure that
all plants are represented in the sample. I would
further suggest that the sample from each stratum represent
the proportional size of that stratum. For example,
if plant A employs 25% of the total company's employees,
then the sample from plant A should represent 25% of the
total sample obtained.
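

Proportional allocation of this kind is easy to sketch in Python. The
plant head counts below are hypothetical, chosen only so that plant A
has 25% of the employees, and the function name is my own.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}

# Hypothetical head counts for three plants (not from the problem).
plants = {"A": 250, "B": 450, "C": 300}
allocation = proportional_allocation(plants, 200)
```

With less convenient head counts the rounded allocations may not add
up exactly to the intended total, so a final adjustment of a unit or
two may be needed.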


157.


A company wants to estimate with a degree of confidence of 0.95 and
with an absolute error not greater than $4.00 the true mean dollar
size of orders for a particular item. How large a sample should the
company take from its very extensive records to meet this requirement,
if SIGMA is assumed to equal $20.00?



Answer:

(2 * SIGMA)/SQRT(n) = d
(2 * 20)/SQRT(n) = 4


n = 100


Note: 1.96 is a more accurate estimate of the critical value, but
complicates the computation.
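To see how much the rounded critical value matters, the same formula can be evaluated with both Z = 2 and Z = 1.96. This is a sketch only; the helper name sample_size is illustrative, and rounding up to the next whole unit is an assumption about how a fractional result should be handled.

```python
from math import ceil

def sample_size(z, sigma, d):
    """Smallest whole n with z * sigma / sqrt(n) <= d."""
    return ceil((z * sigma / d) ** 2)

n_with_z_2 = sample_size(2.0, 20.0, 4.0)      # critical value used in the answer
n_with_z_196 = sample_size(1.96, 20.0, 4.0)   # more accurate critical value
print(n_with_z_2, n_with_z_196)
```

Using Z = 2 is the slightly more conservative choice, since it calls for a few more observations than Z = 1.96.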


158.


A manufacturer wishes to determine the average weight of a certain type
of product in order to design the proper package. What size sample is
required so that the risk of exceeding an error of .20 pounds is .010?
(Note: Errors can be positive or negative.) Assume SIGMA is
1.10 pounds.



Answer:

Using Z = (XBAR - MU)/(SIGMA/SQRT(n))


we get n = ((Z) (SIGMA)/(XBAR - MU))**2
n = ((2.576)(1.1)/.2)**2
n = 200.73


Therefore, a sample size of 201 is required.
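The critical value 2.576 need not be read from a table. As a sketch, it can be derived from the stated two-sided risk with the standard library's NormalDist; the helper name z_two_sided is illustrative.

```python
from math import ceil
from statistics import NormalDist

def z_two_sided(risk):
    """Critical value leaving risk/2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - risk / 2)

z = z_two_sided(0.010)               # errors can be positive or negative
n_raw = (z * 1.10 / 0.20) ** 2       # n = ((Z)(SIGMA)/error)**2
n = ceil(n_raw)                      # round up to a whole unit
print(round(z, 3), round(n_raw, 2), n)
```

The unrounded critical value differs from 2.576 only in the fourth decimal place, so the required sample size is unchanged.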


159.


An aircraft parts manufacturer wishes to determine the average shearing
strength of a certain type of weld in order to submit a bid for a con-
tract to produce these parts. What size sample is required so that the
risk of exceeding an error of 20 pounds or more is .005? Assume that
SIGMA is 100 pounds.



Answer:

Taking the risk of .005 as the area in one tail, Z = 2.576.
(If the .005 were split between both tails, Z would be 2.807
and n would be 197.)


n = (Z**2)(SIGMA**2)/(e**2)
= (2.576**2)(100**2)/(20**2)
= 165.9
= 166 (rounded up)
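The formula used here follows from requiring that the margin of error, Z times the standard error of the mean, equal the tolerable error e. A short derivation, written with \sigma for SIGMA and z for the chosen critical value:

```latex
\begin{align*}
z \cdot \frac{\sigma}{\sqrt{n}} &= e \\
\sqrt{n} &= \frac{z\,\sigma}{e} \\
n &= \frac{z^{2}\sigma^{2}}{e^{2}}
   = \frac{(2.576)^{2}(100)^{2}}{(20)^{2}} \approx 165.9
\end{align*}
```

Because n must be a whole number and the error must not exceed e, the result is rounded up.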


160.


In a random sample of flashlight batteries, the average useful life was
22 hours and the sample standard deviation was 5 hours. How large
should the sample size be if you want the mean of your sample to be
within 1 hour of MU 99 times out of 100 in repeated sampling?



Answer:

If significance level = .01, then Z = 2.576
SIGMA(HAT) = 5 hours
Tolerable error = 1 hour


n = (Z**2)(SIGMA**2)/(error**2)
= (2.576**2)(5**2)/(1**2)
= 165.89
= 166


161.


A floor manager of a large department store is studying the buying
habits of his customers. Suppose he has good reason to believe that an
estimate of $600 for the population mean for the amount spent in his
store each year is wrong. He makes preparation to draw a sample but
lacks the funds to draw N=100 as he had planned. How large a sample
need he draw in order to estimate the population mean within $100 of
the true value with probability of 0.95? (Assume SIGMA = $500.)



Answer:

n = [Z(ALPHA/2) * SIGMA/e]**2; e = tolerable error
= [(1.96) * 500/100]**2
= 96.04, or about 96 (97 if rounded up so that the error
bound is strictly met)
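Since (1.96 * 500/100)**2 works out to 96.04, it can be worth checking the margin of error actually achieved at the neighbouring whole sample sizes. A minimal sketch, with margin_of_error as an illustrative helper name:

```python
from math import sqrt

def margin_of_error(z, sigma, n):
    """Half-width of the confidence interval for the mean: z * sigma / sqrt(n)."""
    return z * sigma / sqrt(n)

m96 = margin_of_error(1.96, 500.0, 96)
m97 = margin_of_error(1.96, 500.0, 97)
print(round(m96, 2), round(m97, 2))
```

With n = 96 the margin is a few cents over $100; with n = 97 it falls below $100. Either way the sample is smaller than the N=100 originally planned.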


162.


The variance of average family income in New York State is known to be
about the same as it is in Delaware. The mean family income is to be
estimated by a sample survey in each state. It is desired to have the
sampling error equal in both states. If the recommended sample size for
Delaware is 2500 families:


a) What size sample would you take in New York State?
b) What statistical formula supports your answer to part (a)?



Answer:

a) 2500


b) Variance of mean = SIGMA**2/n


Quantity estimated in each state is a mean. SIGMA is the same for
each state, so equal sampling error can be achieved by using equal
sample size.
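The point that the sampling error depends on SIGMA and n, and not on the size of the state, can be illustrated numerically. The value of SIGMA below is hypothetical, and the sketch assumes the finite population correction is negligible in both states.

```python
from math import ceil, sqrt

def standard_error(sigma, n):
    """Standard error of the mean: SQRT(SIGMA**2/n)."""
    return sigma / sqrt(n)

def required_n(sigma, target_se):
    """Sample size giving the target standard error; population size does not enter."""
    return ceil((sigma / target_se) ** 2)

sigma = 8000.0                               # hypothetical common SIGMA, in dollars
se_delaware = standard_error(sigma, 2500)
n_new_york = required_n(sigma, se_delaware)  # match Delaware's sampling error
print(se_delaware, n_new_york)
```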


163.


Should a sample survey be considered before a complete count
(census)? If so, why? Be brief.



Answer:

Yes, most of the time a sample survey should be considered before
a census because it has the following advantages:


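1) Greater accuracy advantage - the results from a carefully
planned and well executed sample survey can be more accurate
than those from a complete census, because the smaller scale
of the work permits better training, closer supervision and
more careful checking, which reduces nonsampling errors.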


2) Cost advantage - if data are secured from a small fraction
of the population, expenditures are smaller than if a com-
plete count is attempted. Low cost permits expansion of
the statistical program and expanded usefulness.


3) Time advantage - in many research problems, time is a criti-
cal factor. A sample of the data can be collected, coded,
tabulated and analyzed more quickly than a complete count.


4) Destructive nature of the test - in order to make observations
in some problems, particularly those dealing with manufactured
products, the elementary units being observed must be destroyed
or weakened. To test all units in the population would result
in damage to all units.


164.


What are the major sources of uncertainty (error) in sample survey data?
Describe them and give an example of each.



Answer:

All data, whether obtained by a census or sample, are subject to various
types of uncertainties. There are three types of uncertainties:


1. Structural Limitations - are defects that are built into the survey
procedures. The following are some examples:


a. Failure to obtain observations which would be useful;
b. Unclear or biased wording of the questionnaire;
c. Poor selection of the measuring or testing instrument;
d. Too large a gap between the frame and population;
e. Poor choice of survey data;
f. Incorrect usage of statistical formulas for calculating
estimates.


Structural limitations are best controlled and avoided by
detailed planning and design, through testing, review of the
literature, and prior studies.


2. Operational Blemishes and Blunders - originate in the execution of
the work. The following are some examples:


a. Failure to ask some of the questions;
b. Asking questions not on the questionnaire;
c. Mistakes in reading the measuring instrument;
d. Nonresponse or refusal;
e. Keypunch errors.


The detection, control and measurement of operational blemishes and
blunders can be achieved through a repetition or audit of the sample,
thereby enabling evaluation of their impact on the estimates.


3. Random Variation - is measured by the standard error of estimate.
The first source of random variation is simply the variability
(spread, dispersion) among the sampling units in the frame. The
second source is the accidental, uncorrelated (nonpersistent)
variation, of a cancelling nature, in the work of the
investigators, supervisors, editors, coders, keypunchers, and
other workers, which may change perhaps on an hourly basis.
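The first source of random variation can be illustrated with a small simulation: the spread of the means of repeated samples should come close to SIGMA/SQRT(n). This is a sketch only. The normal population with mean 600 and SIGMA 500, the sample size of 100, the number of replicates and the seed are all hypothetical choices made for the illustration.

```python
import random
from math import sqrt
from statistics import stdev

def simulated_standard_error(mu, sigma, n, replicates, seed=0):
    """Standard deviation of sample means over repeated random samples."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    sample_means = [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        for _ in range(replicates)
    ]
    return stdev(sample_means)

simulated = simulated_standard_error(600.0, 500.0, 100, 2000)
theoretical = 500.0 / sqrt(100)   # SIGMA/SQRT(n)
print(round(simulated, 1), theoretical)
```

The simulation captures only the variability among sampling units; the second source, variation in the work of the survey staff, is not modelled here.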