SAS Statistical Library  

1.


Explain briefly how you would decide which of the following two events
is the more unusual:


a. A 90 degree day in Vermont.
b. A 100 degree day in Florida.



Answer:

One would examine previous weather records and compare the relative
frequency of 90+ degree days in Vermont with that of 100+ degree days
in Florida. Relative frequencies are one method of estimating the
probability of each event. (Note: The event with the smaller relative
frequency is the more unusual.)


2.


The Getrich Tire Company is having a tire sale on tires salvaged from
a train wreck. Of the 15 tires offered in the sale five tires have
suffered internal damage and the remaining ten are damage free. You
are planning on purchasing two of these tires. In finding the proba-
bility that the two tires selected at random from the 15 will be damage
free, the probability distribution to use is:


(a) Normal (c) Hypergeometric
(b) Poisson (d) Binomial



Answer:

(c) Hypergeometric


Since two tires are selected without replacement,
it is Hypergeometric.
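
As a quick numerical check of the hypergeometric setup, the sketch below
evaluates the probability that both selected tires are damage free. The
function name hypergeom_pmf and its argument names are illustrative
choices, not part of the original exercise.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, good, bad, draws):
    """P(exactly k good items) when `draws` items are taken without replacement."""
    return Fraction(comb(good, k) * comb(bad, draws - k), comb(good + bad, draws))

# 10 damage-free tires, 5 damaged tires, 2 tires purchased
p_both_good = hypergeom_pmf(2, good=10, bad=5, draws=2)
print(p_both_good, float(p_both_good))
```

The same value follows from the sequential argument (10/15)*(9/14).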


3.


As a seamstress you have observed that flaws in a certain type of
material occur on the average of 0.2 per yard. The distribution
to find the probability of no more than one flaw occurring in a
dress requiring four yards of this material would be:


(a) Normal (c) Hypergeometric
(b) Poisson (d) Binomial



Answer:

(b) Poisson; with LAMBDA = 4(.2) = .8 flaws expected in four yards


Any random phenomenon for which a count of some sort is of interest
is a candidate for modeling by assuming a Poisson distribution.
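
The exercise only asks which distribution applies, but the probability
itself is easy to evaluate. The sketch below assumes flaws occur
independently at a constant rate, so the rate of 0.2 flaws per yard
scales to a mean of 0.8 flaws for a four-yard dress. The helper
poisson_pmf is an illustrative name.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when the mean count is lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 0.2 * 4  # mean flaws in four yards
p_at_most_one = poisson_pmf(0, lam) + poisson_pmf(1, lam)
print(round(p_at_most_one, 4))
```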


4.


Suppose that a neutron passing through plutonium is equally likely to
release 1, 2, or 3 other neutrons, and suppose that these second
generation neutrons are in turn each likely to release 1, 2, or 3
third generation neutrons. What is the probability distribution of
the number of third generation neutrons? What is the mean of this
distribution?



Answer:

To find the probability distribution use a tree diagram and count.


N = n   |  1     2     3      4      5      6      7     8     9
--------+----------------------------------------------------------
P(N=n)  | 1/9   4/27  16/81  12/81  12/81  10/81  2/27  1/27  1/81


Mean = E(N) = (1)(1/9) + (2)(4/27) + ... + (9)(1/81) = 4
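
The tree-diagram count can be reproduced by brute-force enumeration. The
sketch below assumes, as the problem implies, that every neutron releases
1, 2, or 3 neutrons with probability 1/3 each, independently of the
others. Variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

outcomes = (1, 2, 3)
third = Fraction(1, 3)

dist = defaultdict(Fraction)  # maps n to P(N = n)
for k in outcomes:  # k = number of second-generation neutrons
    # each of the k second-generation neutrons releases 1, 2, or 3 neutrons
    for releases in product(outcomes, repeat=k):
        dist[sum(releases)] += third * third**k

mean = sum(n * p for n, p in dist.items())
for n in sorted(dist):
    print(n, dist[n])
print("mean =", mean)
```

Fractions print in lowest terms, so 12/81 appears as 4/27.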


5.


The lengths of Frank's 24 inch franks are normally distributed with
mean of 2 feet and variance of 0.03 feet. If you purchase 3 of Frank's
franks for your family, what is the probability that you will have a
total length of hot dogs in excess of 6.60 feet?


a. .008 b. .014 c. .019 d. .023 e. .029



Answer:

d. .023


Variance = 3*(.03) = .09
Z = (6.60 - 6)/SQRT(.09) = .60/.3 = 2


Area beyond Z = .0228 == .023


6.


If the life of wild pheasants follows a normal distribution with a
mean of 9 months and a variance of 9, what percent of the population
will be less than 11 months of age?
(Note that MU = 9 and SIGMA(X)**2 = 9.)


(a) 34.13 (c) 74.86
(b) 84.13 (d) 62.93



Answer:

(c) 74.86


Z = (11 - 9)/3 = .67
P(Z < .67) = (.2486) + (.5000)
= .7486
= 74.86%


7.


The distribution of lifetimes for a certain type of light
bulb is normally distributed with a mean of 1000 hours and a
standard deviation of 100 hours. Find the 33rd percentile of
the distribution of lifetimes.


a. 560
b. 330
c. 1044
d. 1440
e. none of these



Answer:

e. none of these
P(Z < z) = .33
z = -.44
-.44 = (x-1000)/(100)
x = -44 + 1000
= 956


8.


In testing a new rifle, the new rifle and a standard rifle are fired a
large and equal number of times under similar conditions. The new rifle
scored 53 hits while the old rifle scored 47 hits. For a comparison
of the two rifles, the total number of hits by the two rifles may be re-
garded as equivalent to 100 flips of a coin and a hit by the new rifle
as a head. Consider the null hypothesis that the two rifles are equally
good (prob. of head = 1/2) against the alternative that the new rifle
is better. Answer the following three questions without using the
continuity correction.


A. The Z-value is


1) .5
2) .6
3) -.6
4) 4.3
5) none of the above


B. The significance level is


1) 0.011
2) 0.274
3) 0.726
4) less than 0.001
5) none of the above


C. The null hypothesis should


1) be rejected at 5% but not at 1% level
2) be rejected at 1% but not at 5% level
3) be rejected at either 5% or 1% level
4) not be rejected at either 1% or 5% level
5) not be continued or rejected without further information



Answer:

A. (2) .6
MU = np = 100*.5 = 50
SIGMA = SQRT(npq) = SQRT(100 * .5 * .5)
= 5


Z = (53 - 50)/5 = .6


B. (2) 0.274


C. (4) not to be rejected at either 1% or 5% level.


9.


Molybdenum  rods produced on a production line are supposed to average
2.2 inches in length. It is desired to check whether the process is in
control. Let X = length of such a rod. Assume X is approximately nor-
mally distributed with mean = MU and variance = SIGMA**2, where the mean
and the variance are unknown.


Suppose a sample of n = 400 rods is taken and yields a sample average
length of XBAR = 2 inches, and SUM((X - XBAR)**2) = 399.


To test H(0): MU = 2.2 vs. H(1): MU =/= 2.2 at level ALPHA = 8%, one
would use a _____ confidence interval for MU and hence a table value
of _____.


a) 92%, 1.67
b) 92%, 1.41
c) 92%, 1.75
d) 96%, 2.06
e) 96%, 1.75



Answer:

c) 92%, 1.75


10.


A rod from a production line has length X where X is normally
distributed with mean = 2 and variance = 1/2.


Draw two rods X(1) and X(2) and place them end to end. The sum
of their lengths is X(1) + X(2).


P(X(1) + X(2) < 3.6) = P(XBAR < 1.8) since (X(1) + X(2))/2 =
XBAR, for sample size n = 2. Hence P(X(1) + X(2) < 3.6) is
expressible in Z terms as


a) P(Z < -SQRT(2)/5)
b) P(Z < -(2/5))
c) P(Z < -(4/5))
d) P(Z < -(1/5))
e) P(Z < -(2*SQRT(2)/5))



Answer:

b) P(Z < -(2/5))


SIGMA(XBAR) = SQRT((SIGMA**2)/n)
= SQRT((1/2)/2)
= 1/2


Z = (XBAR - MU)/(SIGMA(XBAR))
= (1.8 - 2)/(1/2)
= -(1/5)/(1/2)
= -(2/5)


11.


Rods produced by G&R Company are normally distributed with a mean of 66
cm. and a standard deviation of 2 cm. Rods are too long to be useable
if they are longer than 68.5 cm. What percentage of these rods are too
long?


a) 0.1056 b) 0.1151 c) 0.3849 d) 0.3944
e) None of the above are correct.



Answer:

a) 0.1056


Z = [X - MU]/[SIGMA]
= [68.5 - 66]/[2]
= 1.25


Prob.(Z>1.25) = .1056


12.


A particular type of bolt is produced having diameters with mean 0.500
inches and standard deviation 0.005 inches. Nuts are also produced
having inside diameters with mean 0.505 inches and standard deviation
0.005 inches. If a nut and a bolt are chosen at random, what is the
probability that the bolt will fit inside the nut?





Answer:

Mean for the distribution of differences = .005
Standard deviation = SQRT((.005)**2/1 + (.005)**2/1) = .007071


Z = (value of interest - mean of distribution of differences) /
(standard error of the distribution of differences)


Z = (0 - .005)/.007071 = -.71


We want all the area to the right of -.71


= .7611 or 76%.


13.


It is known that the lengths of a particular manufactured item are
normally distributed with a mean of 6 and a standard deviation of
3. If one item is selected at random, what is the probability that
it will fall between 5.7 and 7.5?



Answer:

P(5.7 < Y < 7.5) = P((5.7-6)/3 < Z < (7.5-6)/3)
= P(-.1 < Z < .5)
= .0398 + .1915
= .2313


14.


A company manufactures cylinders that have a mean 2 inches in
diameter. The standard deviation of the diameters of the cylinders
is .10 inches. The diameters of a sample of 4 cylinders are
measured every hour. The sample mean is used to decide
whether or not the manufacturing process is operating satisfactorily.
The following decision rule is applied: If the mean diameter
for the sample of 4 cylinders is equal to 2.15 inches or more,
or equal to 1.85 inches or less, stop the process. If the
mean diameter is more than 1.85 inches and less than 2.15
inches, leave the process alone.
a. What is the probability of stopping the process if the
process average MU, remains at 2.00 inches?
b. What is the probability of stopping the process if the
process mean were to shift to MU = 2.10 inches?
c. What is the probability of leaving the process alone if
the process mean were to shift to MU = 2.15 inches?
To MU = 2.30 inches?



Answer:

a. Z = (1.85 - 2.00)/(.10/SQRT(4)) or (2.15 - 2.00)/(.10/SQRT(4))
= -.15/.05 or +.15/.05
= -3 or +3
P(Z < -3 or Z > +3) = .0013 + .0013
= .0026
b. Z = (1.85 - 2.10)/.05 or (2.15 - 2.10)/.05
= -.25/.05 or .05/.05
= -5 or 1
P(Z < -5 or Z > 1) = .0000 + .1587
= .1587
c. Using MU = 2.15:
Z = (1.85 - 2.15)/.05 or (2.15 - 2.15)/.05
= -.30/.05 or 0/.05
= -6 or 0
P(-6 < Z < 0) = .5000
Using MU = 2.30:
Z = (1.85 - 2.30)/.05 or (2.15 - 2.30)/.05
= -.45/.05 or -.15/.05
= -9 or -3
P(-9 < Z < -3) = .0013
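
The decision rule can also be evaluated directly rather than from a Z
table. The sketch below assumes the sample mean of 4 cylinders is normal
with standard deviation .10/SQRT(4); the function name prob_stop and its
defaults are illustrative.

```python
from statistics import NormalDist

def prob_stop(mu, sigma=0.10, n=4, lower=1.85, upper=2.15):
    """P(sample mean <= lower or sample mean >= upper) when the process mean is mu."""
    xbar = NormalDist(mu, sigma / n**0.5)
    return xbar.cdf(lower) + (1 - xbar.cdf(upper))

for mu in (2.00, 2.10, 2.15, 2.30):
    stop = prob_stop(mu)
    print(f"MU = {mu:.2f}  P(stop) = {stop:.4f}  P(leave alone) = {1 - stop:.4f}")
```

Small differences from hand answers come from rounding in printed tables.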

15.


A company manufactures rope.  From a large number of tests over a long
period of time, they have found a mean breaking strength of 300 lbs.
and a standard deviation of 24 lbs. Assume that these values are
MU and SIGMA.


It is believed that by a newly developed process, the mean breaking
strength can be increased.


(a) Design a decision rule for rejecting the old process with an
ALPHA error of 0.01 if it is agreed to test 64 ropes.


(b) Under the decision rule adopted in (a), what is the probability
of accepting the old process when in fact the new process has
increased the mean breaking strength to 310 lbs.? Assume SIGMA
is still 24 lbs. Use a diagram to illustrate what you have done,
i.e., draw the reference distributions.



Answer:

a. One tail test at ALPHA = .01, therefore Z = 2.33.


Z = (YBAR-MU)/(SIGMA/SQRT(n))
2.33 = (YBAR-300)/(24/SQRT(64))
YBAR = 307


Decision Rule: If the mean strength of 64 ropes tested is 307
lbs. or more, we reject the hypothesis of no
improvement, i.e., we conclude that the new process
is better.


b. If available, consult file of graphs and diagrams that could not
be computerized for reference distributions.


Z = (307-310)/(24/SQRT(64)) = -1.00
Area below Z = -1.00 is 0.1587 or 15.87%


P(type II error) = 0.1587
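
Both parts can be checked numerically. The sketch below assumes a
one-sided Z test with known SIGMA; it uses the exact normal quantile
instead of the rounded table value 2.33, so the cutoff and the Type II
error probability differ slightly from the hand calculation. Variable
names are illustrative.

```python
from statistics import NormalDist

mu0, mu1 = 300, 310        # old mean and the hypothesized improved mean
sigma, n, alpha = 24, 64, 0.01

se = sigma / n**0.5                       # standard error of the sample mean
z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
cutoff = mu0 + z_crit * se                # reject "no improvement" at or above this

# Type II error: the sample mean falls below the cutoff although the true mean is mu1
beta = NormalDist(mu1, se).cdf(cutoff)
print(round(cutoff, 2), round(beta, 4))
```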


16.


A certain kind of automobile battery is known to  have  a  length  of
life which is normally distributed with a mean of 1200 days and
standard deviation 100 days. How long should the guarantee be if the
manufacturer wants to replace only 10% of the batteries which are
sold?



Answer:

Z = -1.28 for 10 percent failure


-1.28 = (X - 1200)/100


X = 1072 days for guarantee


17.


It  is  known  from  past experience that when a certain type of farm
machine is used, the length of time it will run before needing an
overhaul is approximately normally distributed with MU=455 hours and
SIGMA=50 hours. When running the output is 100 bushels per hour.


a. What is the probability that such a machine will process at least
40,000 bushels before needing an overhaul?


b. If a large number of such machines are put into service about 25%
will be running after X hours. Calculate X.


c. If 25 machines are put into service, what is the probability that
their AVERAGE life will be at least 445 hours?



Answer:

a. Z = (40,000-455*100)/(50*100) = -1.1
prob. = .5 + .3643 = .8643


b. prob. = .25 Z = .68
X = 455 + 50*.68 = 489 hours (equivalently, 48,900 bushels)


c. S = 50/SQRT(25) = 10
Z = (445-455)/10 = -1


prob. = .5 + .3413 = .84


18.


Suppose the hour life lengths, X(1) and X(2), of two brands of
electronic tubes, say T(1) and T(2), are:


MU(1) = 100 MU(2) = 102
SIGMA(1)**2 = 36 SIGMA(2)**2 = 9


a. Find the value of K such that P(X(1) > K) = .93319.


b. If a tube is needed for a 106 hour time period, which brand
should be selected? Why?


c. If one tube is selected at random from brand T(1), find the
probability that its life will exceed 100 hours.


d. Find P(X(2)-X(1) > 0).



Answer:

a. P(Z > Y) = .9334 or P(Z < Y) = .0666
Therefore, Y = -1.5


Using the formula Z = (X - MU)/SIGMA:
X = (Z*SIGMA) + MU
K = (-1.5)(6) + 100
= -9 + 100
= 91


b. Z = (106 - 100)/6 Z = (106 - 102)/3
= 1 = 1.33
P(Z > 1) = .1587 P(Z > 1.33) = .0918


T(1) should be selected because 15.87% of T(1) tubes last for
106 hours or more, but only 9.18% of T(2) tubes last that long.


c. Z = (100 - 100)/6
= 0/6
= 0
P(Z > 0) = .5000


d. Since the variances are known, the standard error of differences
between elements equals:


SIGMA(X(2) - X(1)) = SQRT[(SIGMA(2)**2) + (SIGMA(1)**2)]
= SQRT(36 + 9)
= 6.71


MU(X(2) - X(1)) = 2


Therefore, Z = (0 - 2)/6.71 = -.3
P(Z > -.3) = .1179 + .5000 = .6179 or 61.79%.


19.


Suppose that you work for a brewery as a clerk to receive barley
shipments. As part of your job you are to decide whether to keep
or return new shipments of barley. The criterion used for making your
decision is an estimate of the moisture content of the shipment.
If the moisture level is too high (above 17.5%) the shipment has a
good possibility of rotting before use and, therefore, a loss of
money to the company. You know from past experience that the variance
for all barley shipments is 36 and that your staff can process at the
most one sample of 9 moisture readings per shipment.


a. Propose a rule for accepting and rejecting grain shipments on the
basis of sample means where the null claim is a shipment has a
mean moisture content of 17.5% or less (H(0): MU <= 17.5%).
Let the probability of Type I error be .10.


b. When will you make incorrect decisions about a grain shipment
having MU = 17.4? What will be the probability of such an
error?


c. When will you make incorrect decisions about a grain shipment
having MU = 19? What will be the probability of such errors?


d. When will you make incorrect decisions about a grain shipment
having MU = 21? What will be the probability of such errors?



Answer:

SIGMA**2 = 36
Take a sample, n = 9
SIGMA(XBAR) = SIGMA/SQRT(n) = 6/3 = 2


a. H(0): MU <= 17.5
H(1): MU > 17.5


ALPHA = .10 implies Z = 1.28
Z = (XBAR - MU)/SIGMA(XBAR)


1.28 = (XBAR - 17.5)/2
2.56 = XBAR - 17.5


XBAR = 20.06


Reject H(0) when XBAR > 20.06.


b. I am rejecting H(0) when XBAR > 20.06, so when MU is REALLY 17.4,
I make incorrect decisions whenever XBAR > 20.06.


Z = (20.06 - 17.4)/2
Z = 1.33
Area beyond Z = 1.33 is .0918.


The probability of an incorrect decision is .0918.


c. I am rejecting H(0) when XBAR > 20.06, so when MU is REALLY 19,
I make incorrect decisions whenever XBAR <= 20.06.


Z = (20.06 - 19)/2
Z = .53
Area between mean and Z = .2019.


The probability of an incorrect decision is .5 + .2019 = .7019.


d. I am rejecting H(0) when XBAR > 20.06, so when MU is REALLY 21,
I make incorrect decisions whenever XBAR < 20.06.


Z = (20.06 - 21)/2
Z = -.47
Area beyond Z = -.47 is .3192.


The probability of an incorrect decision is .3192.
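
The three error probabilities can be computed from one function. The
sketch below assumes the rule "reject H(0) when XBAR > 20.06" and a
normal sample mean with standard error 2. A decision counts as wrong
when H(0) is true and rejected, or false and not rejected. The function
name prob_wrong is illustrative.

```python
from statistics import NormalDist

se = 6 / 9**0.5            # SIGMA / SQRT(n) = 2
cutoff = 17.5 + 1.28 * se  # reject H(0) when the sample mean exceeds this

def prob_wrong(mu, limit=17.5):
    """Probability of an incorrect decision when the true mean moisture is mu."""
    p_reject = 1 - NormalDist(mu, se).cdf(cutoff)
    # Type I error if H(0) is true (mu <= limit), Type II error otherwise
    return p_reject if mu <= limit else 1 - p_reject

for mu in (17.4, 19, 21):
    print(mu, round(prob_wrong(mu), 4))
```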


20.


A man purchases 100 boxes of nails, each box containing 1000 nails.
If, on the average, one out of every 500 nails is rusty, how many of
the 100 boxes would you expect to contain less than 2 rusty nails?


a. 18 b. 27 c. 41 d. 49 e. 68



Answer:

c. 41


f(X) = (LAMBDA**X)*(exp(-LAMBDA))/X!, where LAMBDA = np = 2
100*(f(0) + f(1)) = 100 * (exp(-2) + 2exp(-2))
= 13.5 + 27.1
== 41
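
To see how good the Poisson approximation is here, the sketch below
compares it with the exact binomial count for 1000 nails per box and a
rust probability of 1/500. It assumes nails are rusty independently of
one another; names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when the mean count is lam."""
    return lam**k * exp(-lam) / factorial(k)

n, p = 1000, 1 / 500
lam = n * p  # expected rusty nails per box

p_poisson = poisson_pmf(0, lam) + poisson_pmf(1, lam)
p_binomial = (1 - p)**n + n * p * (1 - p)**(n - 1)  # exact P(0) + P(1)

expected_boxes = 100 * p_poisson
print(round(expected_boxes, 1), round(100 * p_binomial, 1))
```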


21.


Failures of electron tubes in airborne applications have been found to
follow closely the Poisson Distribution. A receiver with sixteen tubes
suffers a tube-failure on the average of once every 50 hours of operat-
ing time. Find the probability of more than one failure on an eight
hour mission.



Answer:

Using the Poisson distribution where:


P(Y) = ((e**(-LAMBDA))(LAMBDA**Y))/(Y!)


LAMBDA = 8/50 = .16


P(Y > 1) = 1 - P(0) - P(1)
= 1 - ((e**(-.16))*(.16**0))/0!
- ((e**(-.16))*(.16**1))/1!
= 1 - .85214 - .13634
= .0115


22.


Suppose the weather forecaster is either  right  or  wrong  with  his
daily forecast and that the probability he is wrong on any day is .4.
Assume his performance is to be evaluated on 18 randomly selected
days such that his performance is independent from day to day. Let A
be the event that he is wrong on less than 5 of the days.


a. Find the exact value of P(A).
b. Find the approximate value of P(A), based on the Poisson
approximation.
c. Is the approximation in (b.) valid? Why or why not?
d. Find the approximate value of P(A), based on the Central Limit
Theorem. (Hint: SIGMA**2 = np(1 - p))
e. Is the approximation in (d.) valid? Why or why not?



Answer:

a. P = P(4 wrong) + P(3 wrong) + P(2 wrong) + P(1 wrong) + P(0 wrong)
= (18C4)(.4**4)(.6**14) + ... + (18C0)(.4**0)(.6**18)
= .061 + .025 + .007 + .001 + .000
= .094


b. LAMBDA = np = (18)(.4) = 7.2
P = (7.2**4)(e**-7.2)/4! + (7.2**3)(e**-7.2)/3! +
(7.2**2)(e**-7.2)/2! + (7.2)(e**-7.2) + (e**-7.2)
= .084 + .046 + .019 + .005 + .001
= .155


c. No, because n is too small and P is too large.


d. p = .4
SIGMA = SQRT(npq) = 2.08
MU = np = 7.2
4.5 in standard units is -1.30 = (4.5 - 7.2)/2.08.
P(Z < -1.30) = .5 - .4032 = .0968


e. Yes, because np and nq are greater than 5, (or p is not close to 0
or 1 and n is at least moderate size).
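
The exact value and the two approximations can be compared side by
side. The sketch below computes P(A) = P(at most 4 wrong days) three
ways; the normal version uses the continuity correction at 4.5, as in
part (d). Variable names are illustrative.

```python
from math import comb, exp, factorial
from statistics import NormalDist

n, p, k_max = 18, 0.4, 4  # A = "wrong on fewer than 5 days" = at most 4

# exact binomial probability
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max + 1))

# Poisson approximation with LAMBDA = np
lam = n * p
poisson = sum(lam**k * exp(-lam) / factorial(k) for k in range(k_max + 1))

# normal approximation with continuity correction
normal = NormalDist(n * p, (n * p * (1 - p))**0.5).cdf(k_max + 0.5)

print(round(exact, 4), round(poisson, 4), round(normal, 4))
```

The normal value lands much closer to the exact one than the Poisson
value does, which is consistent with the answers to (c) and (e).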


23.


The probability of a snow storm on any given day during January is equal
to P.


a) What is the probability of at least one snow storm during January
(the month has 31 days)? Set this up in general since you have
no values for P.


b) If p = 1/10, what is the probability of exactly three storms during
the period beginning with January 10 and ending with January 21?
Set this up but do not evaluate.


c) Use the normal approximation to evaluate the above probability in
part b.



Answer:

a) P(at least 1 storm) = 1-P(no storm) = 1-Q**31
where P = 1-Q


b) P(3) = (12C3)*(.1**3)*(.9**9) = .085


c) p=.1 mean=np=1.2 SIGMA=SQRT(npq)=1.04
P(3) == P(2.5 < X < 3.5)


standard score = (2.5-1.2)/1.04 = 1.25
standard score = (3.5-1.2)/1.04 = 2.21


prob. = .4864 - .3944 = .092


24.


Seventy  five  percent  of  the  Ford  autos made in 1976 are falling
apart. Determine the probability distribution of the number of Fords
in a sample of 4 that are falling apart. Draw a histogram of the
distribution. What is the mean and variance of the distribution?



Answer:

Let X = the number of Fords falling apart in a sample of four.


probability distribution: (binomial distribution with n=4 and p=.75)


X  |  p(X)
---+------------------------------------
0  | 0.0039 = (4C0)(.75**0)(.25**4)
1  | 0.0469 = (4C1)(.75**1)(.25**3)
2  | 0.2109 = (4C2)(.75**2)(.25**2)
3  | 0.4219 = (4C3)(.75**3)(.25**1)
4  | 0.3164 = (4C4)(.75**4)(.25**0)


[Histogram of P(X) against X = 0, 1, 2, 3, 4. The vertical axis runs
from 0 to 0.6, and each bar's height equals the probability tabulated
above.]



mean = np = 4*.75 = 3
variance = npq = 4*.75*.25 = .75
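
The table, a rough text histogram, and the moments can be generated
together. The sketch below computes the binomial probabilities for n = 4
and p = .75 and then gets the mean and variance from the definition
rather than from np and npq. Names are illustrative.

```python
from math import comb

n, p = 4, 0.75
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

# one '#' per 0.02 of probability gives a crude histogram
for x, prob in pmf.items():
    print(f"{x} | {prob:.4f} {'#' * round(prob / 0.02)}")

mean = sum(x * prob for x, prob in pmf.items())
variance = sum((x - mean)**2 * prob for x, prob in pmf.items())
print("mean =", mean, "variance =", variance)
```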


25.


The following results were obtained from life tests on miniature
bearings. Each datum represents the hours to failure in a
particular turbine.


Time to Failure
---------------------------------------------------
Turbine No. 1 2 3 4 5 6 7 8 9 10
---------- - - - - - - - - - --


Bearing 110 116 670 530 260 190 116 254 150 99
Runs 600 1130 525 242 336 414 300 213 769 140
350 90 194 112 78 558 308
280 123 108 330 930 320 41
96 690 925 92
122 260


a) Plot a frequency histogram with cell interval of 50 hours.
Does the distribution look normal?


b) Plot a conventional % relative cumulative frequency curve on
normal probability paper. What do you conclude?


c) Plot a similar curve on log-normal probability paper. What
do you conclude?


d) Calculate the proper estimate of central tendency and dispersion.



Answer:

LIFE TEST DATA


Time to Failure Frequency % Relative Cumulative Frequency
--------------- --------- -------------------------------
41 1 2.5
78 1 5.0
90 1 7.5
92 1 10.0
96 1 12.5
99 1 15.0
108 1 17.5
110 1 20.0
112 1 22.5
116 2 27.5
122 1 30.0
123 1 32.5
140 1 35.0
150 1 37.5
190 1 40.0
194 1 42.5
213 1 45.0
242 1 47.5
254 1 50.0
260 2 55.0
280 1 57.5
300 1 60.0
308 1 62.5
320 1 65.0
330 1 67.5
336 1 70.0
350 1 72.5
414 1 75.0
525 1 77.5
530 1 80.0
558 1 82.5
600 1 85.0
670 1 87.5
690 1 90.0
769 1 92.5
925 1 95.0
930 1 97.5
1130 1 100.0


a) No, the distribution does not look normal. (If available, consult
file of graphs and diagrams that could not be computerized.)


b) This plot does not form a straight line, and thus appears to be non-
normal. (If available, consult file of graphs and diagrams that
could not be computerized.)


c) This plot does form a straight line, and thus the log of the data
appears to form a normal distribution. (If available, consult file
of graphs and diagrams that could not be computerized.)


d) Arithmetic mean is 329.275 hours.
Standard deviation is 269.96.


Arithmetic mean of logs is 2.382. Antilog is 241 hours. As
expected, the geometric mean is less than the arithmetic mean. The
standard deviation of the logs is 0.352; its antilog, the geometric
standard deviation, is 2.25 (a multiplicative factor, not hours).
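
The summary statistics in (d) can be recomputed from the 40 failure
times. The sketch below assumes base-10 logs and the sample (n - 1)
standard deviation; the answer does not say which divisor was used, so
the last digits may differ. Variable names are illustrative.

```python
from math import log10
from statistics import mean, stdev

# hours to failure, read row by row from the data table
hours = [
    110, 116, 670, 530, 260, 190, 116, 254, 150, 99,
    600, 1130, 525, 242, 336, 414, 300, 213, 769, 140,
    350, 90, 194, 112, 78, 558, 308,
    280, 123, 108, 330, 930, 320, 41,
    96, 690, 925, 92,
    122, 260,
]

arith_mean = mean(hours)
arith_sd = stdev(hours)           # assumption: n - 1 divisor

logs = [log10(h) for h in hours]
geo_mean = 10 ** mean(logs)       # antilog of the mean log
geo_sd = 10 ** stdev(logs)        # multiplicative spread factor

print(arith_mean, round(arith_sd, 2), round(geo_mean, 1), round(geo_sd, 2))
```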


26.


The strengths of elevator cables are to be measured.  Let X = strength
of a cable, and assume X is normal with mean MU and variance SIGMA**2,
both unknown. A sample of 89 cables is taken, with results XBAR = 31
and S**2 = 89.


A 93% confidence interval for MU uses a table value closest to:


(a) 1.60 (b) 2.11 (c) 1.32 (d) 1.12 (e) 1.81



Answer:

(e) 1.81


Use Z value because sample size is large, although t distribution
would ordinarily be used when SIGMA**2 is unknown.


27.


Rods from a production line have a length X which is distributed
normally with a mean of 2 and a variance of 1/2. Draw two rods
X(1), X(2) and place them end to end. The sum of their lengths
is X(1) + X(2).


P[(X(1) + X(2)) < 3.6] = P(XBAR < 1.8) has a value closest to:


(a) .1554 (d) .3446
(b) .2157 (e) .7843
(c) .2843



Answer:

(d) .3446


Z = (X - MU)/SQRT(Variance/n)


Z = (1.8 - 2)/SQRT(.5/2) = -.4


Area beyond Z of .4 = .3446


Therefore, the probability that the sum of the two rods will
be < 3.6 is .3446.


28.


You, as a manufacturer,  can  use  a  particular  part  only  if  its
diameter is between .14 and .20 inches. Two companies, A and B, can
supply you with these parts at comparable costs. Supplier A produces
parts whose mean is .17 and whose standard deviation is .015 inches.
However, supplier B produces parts whose mean is .16 inches and whose
standard deviation is .012. The diameters of the parts from each
company are normally distributed. Which company should you buy from
and why?



Answer:

For Supplier A:


Z = (X - MU)/SIGMA
= (.14 - .17)/.015
= -2


and Z = (.20 - .17)/.015
= 2


Area between Z = 2 and Z = -2 under the normal curve is .9544. There-
fore, 95.44% of the parts would be within .14 in. and .20 in.


For Supplier B:


Z = (.14 - .16)/.012
= -1.67


and Z = (.20 - .16)/.012
= 3.33


Area between Z = 3.33 and Z = -1.67 under the normal curve is .9520.
Therefore, 95.20% of the parts would be within .14 in. and .20 in.


Conclusion: I would choose Supplier A by a hair.


29.


A lightbulb is selected randomly from a factory's monthly production.
The bulb's lifetime (total hours of illumination) is a random variable
with exponential density function
f(x) = (1/MU)*(e**[-x/MU]) if x >= 0
= 0 if x < 0,
where the fixed parameter MU is the mean of this distribution (MU > 0).


a) Derive the cumulative distribution function F(x).
Show that a random lifetime X exceeds x hours (x > 0) with
probability
P(X > x) = e**(-x/MU)
b) Let M denote the smallest value in a random sample of n bulb
lifetimes X(1), X(2), ..., X(n).
Show that P(M > x) = P(X(1) > nx).
HINT: M > x if and only if X(1) > x and X(2) > x and ...
and X(n) > x.
c) Assume the mean lifetime MU = 700 hours.
Use a) and a table of the exponential function to evaluate
numerically
i) the median lifetime x(.50),
ii) P(X <= 70),
iii) P(70 < X <= 700).



Answer:

a) F(X) = INT(X/0)((1/MU)*(e**(-t/MU))dt)
= [-e**(-t/MU)] evaluated from t = 0 to t = X
= 1.0 - e**(-X/MU)


F(x) = 0 if x < 0
= 1.0 - e**(-x/MU) if x >= 0


Prob(X > x) = 1.0 - F(x)
= 1.0 - [1.0 - e**(-x/MU)]
= e**(-x/MU)


b) Prob(M > x) = [Prob(X(1)>x)]*[Prob(X(2)>x)]*...*[Prob(X(n)>x)]
= [e**(-x/MU)]**n
= e**(-xn/MU)
= Prob(X(1) > xn)


c) i) 0.50 = Prob(X <= Median)
= F(x)
= 1.0 - e**(-x/700)
0.50 = e**(-x/700)
using a table of the exponential function
x/700 == .693
x == 485.1 hours
ii) Prob(X <= 70) = F(70)
= 1.0 - e**(-70/700)
= 1.0 - 0.90484
= 0.09516
iii) Prob(70 < X <= 700) = F(700) - F(70)
= [1.0 - e**(-700/700)] - [1.0 - e**(-70/700)]
= [1.0 - .36788] - [0.09516]
= 0.53696


30.


Suppose that the duration of a storm on a tropical island is expo-
nentially distributed with mean value of THETA = 5 minutes. What is
the probability that a storm on the island will last at least two
minutes more, given that it has already lasted for 5 minutes?



Answer:

The distribution for the duration of a storm f(X) is:


f(X) = (1/5) * (e**(-X/5)) X > 0
= 0 elsewhere


P(rain will last at least 2 minutes more | lasted for 5 min.)
= P(X >= 7 | X >= 5)
= (INT(INFNTY/7)((1/5)(e**(-X/5))))/
(INT(INFNTY/5)((1/5)(e**(-X/5))))
= (e**(-7/5))/(e**-1)
= (e**(-2/5))
= .67032
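
This result is an instance of the memoryless property of the exponential
distribution: the chance of lasting 2 more minutes does not depend on
how long the storm has already lasted. The sketch below checks that
numerically; the function name survival is illustrative.

```python
from math import exp

def survival(x, theta=5.0):
    """P(X > x) for an exponential duration with mean theta minutes."""
    return exp(-x / theta)

p_conditional = survival(7) / survival(5)  # P(X >= 7 | X >= 5)
p_unconditional = survival(2)              # P(X >= 2)
print(round(p_conditional, 5), round(p_unconditional, 5))
```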


31.


A lightbulb is selected randomly from a factory's monthly production.
The bulb's lifetime (total hours of illumination) is a random variable
with exponential density function
f(x) = (1/MU)*(e**[-x/MU]) if x >= 0
= 0 if x < 0,
where the fixed parameter MU is the mean of this distribution (MU>0).
a) For an exponential distribution the standard deviation SIGMA = MU.
Let XBAR = (1/n)(X(1)+X(2)+...+X(n)) denote the average value in
a random sample of n bulb lifetimes. Express E[XBAR] and VAR[XBAR]
in terms of MU. If the mean MU = 700 hours and sample size n = 100,
then the statistic Z=(XBAR-700)/70 has approximately a normal
distribution with what mean and variance?
b) Describe a test of the null hypothesis H(0): MU <= 700 against the
alternative hypothesis H(1): MU > 700, using only the sample mean
XBAR. If the desired significance level is ALPHA = .05 and sample
size n = 100, then indicate which numerical values of XBAR corre-
spond to this test rejecting H(0).
(Use the table of the standard normal distribution.)
c) If mean MU = 700 hours, then P(X > 2100) = .04979. If instead
MU > 700, is P(X > 2100) larger or smaller than .04979?



Answer:

a) E[XBAR] = E[(1/n)*(X(1)+X(2)+...+X(n))]
= (1/n)*[E[X(1)]+E[X(2)]+...+E[X(n)]]
= (1/n)*[n*E[X]]
= E[X]
= INT(INFNTY/0)(x*(1/MU)*(e**(-x/MU))dx)
(Integrating by parts, with
u = x dv = (1/MU)(e**(-x/MU))dx
du = dx v = -e**(-x/MU))
= [-x*(e**(-x/MU))] evaluated from 0 to INFNTY
- INT(INFNTY/0)(-e**(-x/MU)dx)
= [-MU*(e**(-x/MU))] evaluated from 0 to INFNTY
= MU


E[X**2] = INT(INFNTY/0)((x**2)*(1/MU)*(e**(-x/MU))dx)
by parts with
u = (x**2) dv = (1/MU)(e**(-x/MU))dx
du = 2x dx v = -e**(-x/MU)
= [(x**2)*(-e**(-x/MU))] evaluated from 0 to INFNTY
- INT(INFNTY/0)((2x)*(-e**(-x/MU))dx)
= -2*INT(INFNTY/0)(x*(-e**(-x/MU))dx)
by parts with
u = x dv = -e**(-x/MU)dx
du = dx v = MU*(e**(-x/MU))
= -2*[[x*MU*(e**(-x/MU))] evaluated from 0 to INFNTY
- INT(INFNTY/0)(MU*(e**(-x/MU))dx)]
= [-2(MU**2)*(e**(-x/MU))] evaluated from 0 to INFNTY
= 2(MU**2)


VAR[XBAR] = VAR[(1/n)*(X(1)+X(2)+...+X(n))]
= [(1/n)**2]*[VAR[X(1)]+VAR[X(2)]+...+VAR[X(n)]]
= [(1/n)**2]*[n*VAR[X]]
= (1/n)*(VAR[X])
= (1/n)*[E[X**2]-(E[X]**2)]
= (1/n)*[2(MU**2)-(MU**2)]
= (MU**2)/n


Z = (XBAR-700)/70
E[Z] = (E[XBAR]-700)/70
= (MU-700)/70
= (700-700)/70
= 0/70
= 0


VAR[Z] = VAR[(XBAR-700)/70]
= [(1/70)**2] * VAR[XBAR]
= [(1/70)**2] * [(MU**2)/n]
= [1/4900] * [(700**2)/100]
= 1


b) test statistic: Z = [XBAR-700]/[700/SQRT(n)]
critical region: Any value of Z(calc) that lies beyond the Z(crit)
which is found in the standard normal table with ALPHA per
cent of the distribution beyond it.


with n = 100 and ALPHA = .05, Z(crit) = 1.645


Thus in order to reject H(0),
[XBAR-700]/[700/SQRT(100)] >= 1.645
XBAR >= (1.645*70) + 700
XBAR >= 815.15


c) It can be shown that a random lifetime X exceeds x hours (x > 0)
with probability
P(X > x) = e**(-x/MU)
Therefore, with MU = 700,
P(X > 2100) = e**(-2100/700)
= e**(-3)
= .04979
Now if MU > 700, the exponent -2100/MU is closer to zero (less
negative) than -3, and a table of the exponential function shows that
the probability becomes larger than .04979.


32.


A lot containing 12 parts among which 3 are defective is put on  sale
"as is" at $10.00 per part with no inspection possible. If a
defective part represents a complete loss of the $10.00 to the buyer
and the good parts can be resold at $14.50 each, is it worthwhile to
buy one of these parts and select it at random?



Answer:

Expected return value of part = .75*(14.50) + .25(0) = 10.875


Therefore, you expect to gain approximately $.87 on each part you buy,
and it is worthwhile to buy one selected at random.


33.


Usually when we make use of a random numbers table we wish
to arrange things so that each event has an equal probability
of occurring. If we were interested in locating 5 corn
trials in a region having 48 corn farms and we wanted each
farm to have an equal likelihood of being selected (in contrast
to the common practice of locating trials on the farms of the
growers most friendly to the local extension agent), describe
a method using the random numbers table that could be used
to make the selection. Indicate the five farms selected
using your method.



Answer:

To use a random numbers table one must do the following:
1. Make up a rule for converting digits from the table into
sample identification numbers. The rule used ordinarily should
make selections of each population item equally likely. It
should also indicate if the same element can be counted more
than once.
2. Find a starting point in the table in a manner that will
not always lead to the same starting point or a small set of
starting points.
3. Translate the digits that follow the starting point into
sample identification numbers.
In this case we will use sampling without replacement meaning
that a population element can only appear once in a sample.
It is also assumed that the I.D. numbers 1 to 48, have been
assigned to the farms.
a. The rule for converting digits is: beginning at the starting
point and going left to right take a pair of digits and use
those if they are in the range 1 to 48 otherwise discard.
Continue this process until you get five.
b. To arrive at the starting point, haphazardly put your
finger on a group of digits, use the first two digits (that fit
the table) to get a row number and the next two to get a column.
Using this process I get row 44 and column 04 as my starting
point. Starting from there I get the following pairs:
76, 54, 91, 40, 69, 90, 67, 24, 56, 83, 50, 82, 94, 81, 13,
98, 42, 87, 88, 02
Therefore, the sample would contain the following farms:
40, 24, 13, 42, 2
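
The conversion rule in (a) is a simple rejection procedure and can be
written out as code. The sketch below applies it to the twenty digit
pairs listed above; the function name select_ids and its parameters are
illustrative, and the duplicate check implements the
without-replacement assumption.

```python
def select_ids(pairs, population=48, sample_size=5):
    """Keep table values in 1..population, skipping repeats, until the sample is full."""
    chosen = []
    for value in pairs:
        if 1 <= value <= population and value not in chosen:
            chosen.append(value)
        if len(chosen) == sample_size:
            break
    return chosen

# digit pairs read from the table, starting at row 44, column 04 ("02" is 2)
table_pairs = [76, 54, 91, 40, 69, 90, 67, 24, 56, 83,
               50, 82, 94, 81, 13, 98, 42, 87, 88, 2]
farms = select_ids(table_pairs)
print(farms)
```

Discarding out-of-range pairs wastes digits but keeps every farm
equally likely, which a shortcut such as taking remainders mod 48 on
pairs 00 to 99 would not.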


34.


Electron tubes made by two factories, A and B, are installed at random
in single tube units. Thirty percent of the tubes are from factory B.
The probability that a factory B tube will fail in the first week of
operation is .1, and the probability that a factory A tube will fail is
.3. If a particular unit fails in the first 100 hours of continuous
operation, what is the probability that it had a tube from factory A
installed? From factory B?



Answer:

a. P(A | failure) = [P(A)*P(failure | A)]/
[P(A)*P(failure | A) + P(B)*P(failure | B)]
= (.7*.3)/[(.7*.3) + (.3*.1)]
= .875 = 7/8


b. P(B | failure) = 1 - P(A | failure)
= 1 - .875 = .125 = 1/8
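
The Bayes' rule calculation generalizes to any number of factories. The
sketch below uses exact fractions with the figures from the problem;
the dictionary names prior and p_fail are illustrative.

```python
from fractions import Fraction

prior = {"A": Fraction(7, 10), "B": Fraction(3, 10)}   # share of tubes by factory
p_fail = {"A": Fraction(3, 10), "B": Fraction(1, 10)}  # P(failure | factory)

# total probability of a failure, then the posterior for each factory
total = sum(prior[f] * p_fail[f] for f in prior)
posterior = {f: prior[f] * p_fail[f] / total for f in prior}
print(posterior)
```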


35.


Suppose  that two of the six spark plugs on a six-cylinder automobile
engine require replacement. If the mechanic removes two plugs at
random, what is the probability that he will select the two defective
plugs? At least one of the two defective plugs?



Answer:

a. prob = 1/(6C2) = 1/15
b. prob = 1 - (4C2)/(6C2) = 1 - 6/15 = 9/15 = 3/5


36.


A certain assembly consists of two sections, A and B, which are bolted
together. In a bin of 100 assemblies, 12 have only section A defective,
10 have only section B defective, and 2 have both section A and section
B defective. What is the probability of choosing, without replacement,
2 assemblies from the bin which have neither section A nor section B
defective?


a. (76)**2/(100)**2
b. (98)**2/(100)**2
c. 98(97)/[100(99)]
d. 76(75)/[100(99)]
e. none of these



Answer:

d. 76(75)/[100(99)]
# of assemblies with no defective section = 100 - (12 + 10 + 2)
= 100 - 24 = 76
P(both assemblies free of defects) = (76/100)*(75/99)


37.


Suppose that the probability is 0.1 that the weather (being either sun-
shine or rain) does not change from one day to the next. The sun is
shining today. What is the probability that it will rain the day after
tomorrow?



Answer:

S(today) - R(tomorrow) - R(day after)


P(SRR) = (.9)(.1)
= .09


S(today) - S(tomorrow) - R(day after)


P(SSR) = (.1)(.9)
= .09


P(rain day after tomorrow) = .09 + .09 = .18
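
The two paths can be written as a short function of the day-to-day change probability. This is a sketch under the problem's assumption that each day depends only on the previous day; `rain_in_two_days` is an illustrative name.

```python
def rain_in_two_days(p_change):
    """P(rain two days ahead | sun today) when the weather switches
    from one day to the next with probability p_change."""
    p_stay = 1 - p_change
    sun_rain_rain = p_change * p_stay  # change tomorrow, then no change
    sun_sun_rain = p_stay * p_change   # no change tomorrow, then change
    return sun_rain_rain + sun_sun_rain

# Here the weather stays the same with probability 0.1, so p_change = 0.9.
print(round(rain_in_two_days(0.9), 4))  # 0.18
```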


38.


-------------     -----------------     ---------------------
^   ^   ^   ^     ^   ^   ^   ^   ^     ^   ^   ^   ^   ^   ^
-------------     -----------------     ---------------------
^   ^   ^   ^     ^   ^   ^   ^   ^     ^   ^   ^   ^   ^   ^
-------------     -----------------     ---------------------
^   ^   ^   ^     ^   ^   ^   ^   ^     ^   ^   ^   ^   ^   ^
-------------     -----------------     ---------------------


      I                  II                      III


Three fields have been divided into plots as in the above figure.
Define an edge plot as one that is on the outside of the field (this
includes corner plots).


A farmer selects at random one plot from each field, with no relation
between choices from different fields. What is the probability he
ends up with an edge plot from I and II but not from III? Give
answer as simplified fraction.



Answer:

P(edge 1) = 8/9
P(edge 2) = 10/12 = 5/6
P(not edge 3) = 3/15 = 1/5
P(edge 1, edge 2, not edge 3) = (8/9)(5/6)(1/5) = 4/27


39.


Suppose that the probability is 0.7 that the weather (sunshine or rain)
is different for any given day than it was on the preceding day. If it
is a sunshine day today, what is the probability that it will be rain-
ing the day after tomorrow?



Answer:

PROB = P(sun, sun, rain) + P(sun, rain, rain)
= (.3)(.7) + (.7)(.3)
= .42


40.


The following table gives data which have been rounded from an
actual federal report on the subject.


DISTRIBUTION OF ADMINISTRATORS FOR NURSING AND PERSONAL CARE HOMES,
BY LENGTH OF TOTAL WORK EXPERIENCE AND SIZE OF THE HOME,
UNITED STATES, EXCLUDING ALASKA AND HAWAII, JUNE-AUGUST 1969.


-----------------------------------------------------------------------
           Length of  ^ Under ^       ^       ^       ^       ^       ^
Size      total work  ^   1   ^  1-4  ^  5-9  ^ 10-19 ^  20+  ^ Total ^
of home   experience  ^ year  ^ years ^ years ^ years ^ years ^       ^
______________________^_______^_______^_______^_______^_______^_______^
                      ^       ^       ^       ^       ^       ^       ^
Under 25 beds         ^  200  ^  600  ^  850  ^ 1100  ^  550  ^ 3300  ^
                      ^       ^       ^       ^       ^       ^       ^
25-49 beds            ^  200  ^  750  ^  500  ^  600  ^  350  ^ 2400  ^
                      ^       ^       ^       ^       ^       ^       ^
50-99 beds            ^  250  ^  700  ^  550  ^  450  ^  250  ^ 2200  ^
                      ^       ^       ^       ^       ^       ^       ^
100-299 beds          ^  100  ^  300  ^  250  ^  200  ^  150  ^ 1000  ^
                      ^       ^       ^       ^       ^       ^       ^
300 beds and over     ^    0  ^   20  ^   30  ^   30  ^   20  ^  100  ^
______________________^_______^_______^_______^_______^_______^_______^
                      ^       ^       ^       ^       ^       ^       ^
Total                 ^  750  ^ 2370  ^ 2180  ^ 2380  ^ 1320  ^ 9000  ^
______________________^_______^_______^_______^_______^_______^_______^


If administrator's experience were independent of size of home, find:


A. the probability that an administrator chosen at random is adminis-
tering a home with 25 beds or more, given that he/she has at least
10 years experience.


B. the probability that (for an administrator-home pair selected at
random) the home will have 99 or fewer beds and the administrator
will have a work experience of 1 to 9 years inclusive.


C. the probability of an administrator with experience of 1 to 9 years
with a nursing home of 300 beds or over.


D. the number of administrators you would expect to have from 1 to 9
years experience and work in a home with 99 or fewer beds.


E. the number of administrators with 1 to 9 years experience in a home
of 300 or over beds.



Answer:

A. (Note: If events A and B are independent, then
   P(A|B) = P(A).)


   P(25 beds+ | 10 yrs.+) = P(25 beds+)
                          = (2400+2200+1000+100)/9000
                          = 57/90
                          = .633


B. P(99 or fewer beds INTERSECTION 1-9 yrs)
      = [(2370 + 2180)/9000] * [(3300 + 2400 + 2200)/9000]
      = [4550/9000] * [7900/9000]
      = 0.4438


C. P(300+ beds INTERSECTION 1-9 yrs)
      = [(2370 + 2180)/9000] * [100/9000]
      = .0056


D. Using the probability found in part B.
Expected number = 9000*.4438 = 3994.2


E. Using the probability found in part C.
Expected number = 9000*.0056 = 50.4
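
Parts B through E use only the marginal totals of the table. The following sketch reproduces them; the dictionary keys and helper names are illustrative. Because it does not round the probabilities before multiplying by 9000, the expected counts come out near 3993.9 and 50.6 rather than the 3994.2 and 50.4 obtained above from the rounded probabilities.

```python
# Marginal totals from the table (keys are illustrative labels).
row_totals = {"<25": 3300, "25-49": 2400, "50-99": 2200,
              "100-299": 1000, "300+": 100}
col_totals = {"<1": 750, "1-4": 2370, "5-9": 2180, "10-19": 2380, "20+": 1320}
N = 9000

def p_rows(keys):
    """Marginal probability of a set of home-size categories."""
    return sum(row_totals[k] for k in keys) / N

def p_cols(keys):
    """Marginal probability of a set of experience categories."""
    return sum(col_totals[k] for k in keys) / N

# Under independence, P(size AND experience) = P(size) * P(experience).
p_b = p_rows(["<25", "25-49", "50-99"]) * p_cols(["1-4", "5-9"])
p_c = p_rows(["300+"]) * p_cols(["1-4", "5-9"])

print(round(p_b, 4), round(p_c, 4))          # 0.4438 0.0056
print(round(N * p_b, 1), round(N * p_c, 1))  # expected counts, unrounded inputs
```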


41.


A population of 160 communities is arranged according to death
rate and air pollution level as follows:


                      AIR POLLUTION LEVEL


                    Low    Medium    High
                  ---------------------------
          Low        2        6        8   ^   16
DEATH             ---------------------------
          Medium    14       42       56   ^  112
RATE              ---------------------------
          High       4       12       16   ^   32
                  ---------------------------
                    20       60       80   ^  160


How many communities would you expect to have a low death rate and a
high air pollution level if death rate and air pollution level are
independent (i.e. are not associated)?


a. (16*80)/(160**2) b. (20*32)/(160**2) c. 8/160
d. 4 e. none of these



Answer:

e. none of these


P(high pollution) = 80/160 = 1/2 = .5
P(low death rate) = 16/160 = 1/10 = .1
P(both) = .05


Number of communities with low death rate and high pollution =
(.05)*(160) = 8


42.


A population of 160 communities is arranged according to death
rate and air pollution level as follows:


                      AIR POLLUTION LEVEL


                    Low    Medium    High
                  ---------------------------
          Low        2        6        8   ^   16
DEATH             ---------------------------
          Medium    14       42       56   ^  112
RATE              ---------------------------
          High       4       12       16   ^   32
                  ---------------------------
                    20       60       80   ^  160


This is the entire population not a sample. In view of this --


Which of the following statements is correct about the two events
"low death rate" and "high air pollution level":


a. they are independent
b. they are mutually exclusive
c. they are exhaustive
d. they are opposite
e. none of these



Answer:

a. they are independent


P(high pollution)*P(low death) = .5 * .1 = .05
8/160 = .05


43.


A random sample of 160 communities is distributed according to death
rate and air pollution level as follows:


                      AIR POLLUTION LEVEL


                    Low    Medium    High
                  ---------------------------
          Low        2        6        8   ^   16
DEATH             ---------------------------
          Medium    14       42       56   ^  112
RATE              ---------------------------
          High       4       12       16   ^   32
                  ---------------------------
                    20       60       80   ^  160


Which of the following statements is correct?


a. There is no evidence that air pollution level and death rate are
related.


b. Death rate and air pollution level are dependent variables.


c. The CHISQUARE index for the table is very large.


d. The probability that a randomly selected community will have a high
death rate will vary as the air pollution level of the community
varies.


e. None of these.



Answer:

a. There is no evidence that air pollution level and death rate are
related.


CHISQUARE(calculated) = 0, so continue the notion in H(O) of
independence.
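
CHISQUARE is zero exactly when every observed count equals its expected count, row total times column total divided by n. A short check of that condition for this table, using integer arithmetic only (variable names are illustrative):

```python
table = [
    [2, 6, 8],      # low death rate
    [14, 42, 56],   # medium death rate
    [4, 12, 16],    # high death rate
]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count under independence is row total * column total / n.
# Comparing observed * n with row total * column total avoids division.
matches = all(
    table[i][j] * n == row_totals[i] * col_totals[j]
    for i in range(len(table))
    for j in range(len(table[0]))
)
print(matches)  # True, so every (O - E) is 0 and CHISQUARE = 0
```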


44.


A special steel alloy has an average tensile strength of 25,800 psi.
The numerical value of the variance is 1,500,000. The units assoc-
iated with this variance would be:


(a) (psi)**2 (c) SQRT(psi)
(b) psi (d) unknown



Answer:

(a) (psi)**2


45.


The  life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25, 29, 31, 32, 35, 37, 39, 40
Firm B: 34, 36, 41, 43, 44, 45, 47, 48


Let ETA(A) and ETA(B) denote the median service life of picture tubes
produced by the 2 firms. A confidence interval for ETA(B) - ETA(A) is
bounded by the dth smallest and the dth largest of all differences of
B- and A-observations. For confidence coefficient .99, we take d
equal to:


(a) 9 (b) 14 (c) 15 (d) 17



Answer:

(a) 9


46.


A Physicist comes to you (as Associate Professor of  Statistics)  for
your help. He has a special electronic circuit consisting of Anodes
and Diodes linked together in a special way. This is an experimental
piece of equipment and for the 4 components (i.e. Anodes and Diodes)
he knows the following probability distribution from previous
experiments.


 ____________________________________________________________
^ Anodes or Diodes failing per circuit  ^ 0 ^ 1 ^ 2 ^ 3 ^ 4  ^
^---------------------------------------^---^---^---^---^----^
^ Probability that many fail            ^0.1^0.2^0.3^0.3^ 0.1^
 ------------------------------------------------------------


He creates a special circuit consisting of the same Anodes and Diodes
and wonders whether the distribution has remained the same. He uses
500 of the circuits and counts the number of Diodes and Anodes that
fail. He finds the following:


 ___________________________________________________
^ Anodes or Diodes failing     ^ 0 ^ 1 ^ 2 ^ 3 ^ 4  ^
^------------------------------^---^---^---^---^----^
^ Frequency                    ^ 50^105^145^155^ 45 ^
 ---------------------------------------------------


Could the Physicist conclude the new circuit had the same
characteristics as the previous circuits? Be careful to state the
Level of Significance used.



Answer:

Here we need to see how well the data we have fits the theoretical
(past) distribution. This is a Chi square goodness of fit problem.


H(O): The data fits the past distribution.
H(A): The data does not fit the past distribution.


CHISQ = SUM([(O-E)**2]/[E])


where O = observed value
E = expected value = probability*total


df= k-m-1,
k = no. of categories,
m = no. of estimated parameters


Table:


  No.
failing   ^  O  ^  E  ^ (O-E) ^ [(O-E)**2] ^ [(O-E)**2]/[E]
----------^-----^-----^-------^------------^----------------
    0     ^  50 ^  50 ^   0   ^      0     ^     0.000
    1     ^ 105 ^ 100 ^   5   ^     25     ^     0.250
    2     ^ 145 ^ 150 ^  -5   ^     25     ^     0.167
    3     ^ 155 ^ 150 ^   5   ^     25     ^     0.167
    4     ^  45 ^  50 ^  -5   ^     25     ^     0.500
----------^-----^-----^-------^------------^----------------
  TOTAL   ^ 500 ^ 500 ^   0   ^    100     ^     1.084


CHISQ (calc.) = 1.084


From tables:


CHISQ(crit., df=4, ALPHA=.05) = 9.49
CHISQ(crit., df=4, ALPHA=.10) = 7.78


Since CHISQ(calc.) = 1.084 is smaller than both critical values, we do
not reject H(O) at ALPHA = .05 or at ALPHA = .10. The data give no
evidence that the characteristics of the new circuit differ from those
of the previous circuits.
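
The goodness-of-fit statistic can be recomputed in a few lines. This is a minimal sketch; `chi_square_gof` is an illustrative helper that assumes no parameters were estimated from the data (m = 0). Summing unrounded terms gives about 1.083, while the table's 1.084 is the sum of the rounded column.

```python
def chi_square_gof(observed, probs):
    """Chi-square goodness-of-fit statistic and its degrees of freedom (k - 1)."""
    n = sum(observed)
    expected = [p * n for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

stat, df = chi_square_gof([50, 105, 145, 155, 45], [0.1, 0.2, 0.3, 0.3, 0.1])
print(round(stat, 3), df)  # 1.083 4
```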


47.


The works known to be written by a famous author have been thoroughly
analyzed as to sentence length. A newly found manuscript is claimed
to have been written by the same author. The data below are taken
from a sample of 2000 sentences in this new manuscript. Use
CHISQUARE to decide whether the new manuscript is by the same author.


                        proportion of sentences
                   _________________________________
no. words in
sentence            known author      new manuscript
____________       _________________________________
3 or less               .010               .007
4-5                     .030               .024
6-8                     .041               .031
9-12                    .102               .034
13-16                   .263               .250
17-20                   .279               .203
21-24                   .118               .198
25-27                   .105               .156
28-29                   .042               .081
30 or more              .010               .016





Answer:

                     Number of sentences
     -----------------------------------------------------------
O     14    48    62    68   500   406   396   312   162    32
E     20    60    82   204   526   558   236   210    84    20


CHISQ = (14-20)**2/20 + (48-60)**2/60 + (62-82)**2/82 +
        (68-204)**2/204 + (500-526)**2/526 + (406-558)**2/558
        + (396-236)**2/236 + (312-210)**2/210 + (162-84)**2/84
        + (32-20)**2/20
      = 380.081


d.f. = (k-1) = 9


P(CHISQ(9) >= 380.081) < .001


Reject H(0) that the new manuscript is by the same author at
ALPHA = .10, .05 or .01.


48.


On the basis of the data presented below, do we have reason to believe
the geneticist who says offspring with characteristics A, B, C, and D
should occur with relative frequency 1:2:4:8 in the experiment? Use
ALPHA = .005.


Characteristic A B C D
Number 28 60 208 304



Answer:

There are a total of 600 offspring. If the geneticist is correct, then
(1/(1+2+4+8))(600)=(1/15)(600) = 40 offspring are expected to have Cha-
racteristic A. We can compute other expected values as follows.


Characteristic              A      B      C      D


Expected Number            40     80    160    320
Observed Number            28     60    208    304
Difference (O(i)-E(i))    -12    -20     48    -16


Now these differences look very large, which implies that the
geneticist is probably wrong. A statistical test can be made
by computing:


W = SUM(i=1,4)(([O(i)-E(i)]**2)/E(i)),


which is the CHISQUARE test with 3 degrees of freedom.


W = ((-12)**2)/40 + ((-20)**2)/80 + (48**2)/160 + ((-16)**2)/320
  = 3.6 + 5.0 + 14.4 + 0.8
  = 23.8


CHISQUARE(critical, ALPHA=.005, df=3) = 12.8381


Since 23.8 > 12.8381, reject the geneticist's claim.


49.


A random sample of 160 communities is distributed according to death
rate and air pollution level as follows:


                      AIR POLLUTION LEVEL


                    Low    Medium    High
                  ---------------------------
          Low        2        6        8   ^   16
DEATH             ---------------------------
          Medium    14       42       56   ^  112
RATE              ---------------------------
          High       4       12       16   ^   32
                  ---------------------------
                    20       60       80   ^  160


What would you estimate from the above table to be the probability that
a randomly sampled community will have a low death rate and a high air
pollution level?


a. (16*80)/160 b. (20*32)/(160**2) c. 8/160
d. 4/160 e. none of these



Answer:

c. 8/160


50.


The following table gives data which have been rounded from an
actual federal report on the subject.


DISTRIBUTION OF ADMINISTRATORS FOR NURSING AND PERSONAL CARE HOMES,
BY LENGTH OF TOTAL WORK EXPERIENCE AND SIZE OF THE HOME,
UNITED STATES, EXCLUDING ALASKA AND HAWAII, JUNE-AUGUST 1969.


-----------------------------------------------------------------------
           Length of  ^ Under ^       ^       ^       ^       ^       ^
Size      total work  ^   1   ^  1-4  ^  5-9  ^ 10-19 ^  20+  ^ Total ^
of home   experience  ^ year  ^ years ^ years ^ years ^ years ^       ^
______________________^_______^_______^_______^_______^_______^_______^
                      ^       ^       ^       ^       ^       ^       ^
Under 25 beds         ^  200  ^  600  ^  850  ^ 1100  ^  550  ^ 3300  ^
                      ^       ^       ^       ^       ^       ^       ^
25-49 beds            ^  200  ^  750  ^  500  ^  600  ^  350  ^ 2400  ^
                      ^       ^       ^       ^       ^       ^       ^
50-99 beds            ^  250  ^  700  ^  550  ^  450  ^  250  ^ 2200  ^
                      ^       ^       ^       ^       ^       ^       ^
100-299 beds          ^  100  ^  300  ^  250  ^  200  ^  150  ^ 1000  ^
                      ^       ^       ^       ^       ^       ^       ^
300 beds and over     ^    0  ^   20  ^   30  ^   30  ^   20  ^  100  ^
______________________^_______^_______^_______^_______^_______^_______^
                      ^       ^       ^       ^       ^       ^       ^
Total                 ^  750  ^ 2370  ^ 2180  ^ 2380  ^ 1320  ^ 9000  ^
______________________^_______^_______^_______^_______^_______^_______^


Decide whether work experience is associated with size of home in which
the administrator works. Explain your decision.



Answer:

Using the CHI SQUARE statistic to test the hypotheses:


H(0): Work experience and size of home are independent.
H(A): Work experience and size of home are dependent.


CHI SQUARE = SUM[((O-E)**2)/E]
= 344.73


CHI SQUARE (d.f. = 16, ALPHA = .05) = 26.29


Therefore we have very strong evidence to reject the null hypothesis
at the .05 ALPHA level.
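
The statistic is tedious to compute by hand over 25 cells. A sketch of the calculation follows; `chi_square_independence` is an illustrative helper, and the expected count for each cell is taken as row total times column total divided by the grand total.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

homes = [
    [200, 600, 850, 1100, 550],   # under 25 beds
    [200, 750, 500, 600, 350],    # 25-49 beds
    [250, 700, 550, 450, 250],    # 50-99 beds
    [100, 300, 250, 200, 150],    # 100-299 beds
    [0, 20, 30, 30, 20],          # 300 beds and over
]
stat, df = chi_square_independence(homes)
print(round(stat, 2), df)  # approximately 344.73 with 16 d.f.
```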


51.


Brand of Tire
A B C D E
--- --- --- --- ---
151 157 135 147 146
143 158 146 174 171
159 150 142 179 167
152 142 129 163 145
156 140 139 148 147
165 166


The data in the above table give stopping distance for five brands of
tires. You want to test the hypothesis that brands D and E do not
differ with respect to stopping ability. This hypothesis can be tested
using


a. either the sign test or the Wilcoxon signed rank test.
b. the sign test, but not the Wilcoxon signed rank test.
c. either the median test or the Wilcoxon two-sample test.
d. the median test, but not the Wilcoxon two-sample test.
e. the test of homogeneity.



Answer:

c. either the median test or the Wilcoxon two-sample test.


52.


The life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm B are as follows
(arranged according to size):


Firm B: 34, 36, 41, 43, 44, 45, 47, 48


Let ETA(B) denote the median service life of picture tubes produced by
the firm. To test the hypothesis ETA(B) = 38.5 against the alternative
ETA(B) =/= 38.5, the value of CHISQ(calculated) for the median test
equals:


(a) 8 (b) 6 (c) 4 (d) 2



Answer:

(d) 2


            ^ Above 38.5 ^ Below 38.5 ^
---------------------------------------
observed    ^     6      ^     2      ^
---------------------------------------
expected    ^     4      ^     4      ^
---------------------------------------


CHISQ = [(6 - 4)**2 + (2 - 4)**2]/4 = 2


53.


Five cars are entered in a race:


starting order: 1 2 3 4 5
finishing order: 2 1 4 3 5


The Kendall rank correlation coefficient between starting order and
finishing order equals


a. -.4 b. -.2 c. .6 d. .2 e. .4



Answer:

c. .6


                   N(C)                    N(D)
X    Y    (# concordant pairs)    (# discordant pairs)
1    2             3                       1
2    1             3                       0
3    4             1                       1
4    3             1                       0
5    5             0                       0
                  --                      --
                   8                       2


# of pairs in data = n = 5


T = [N(C) - N(D)]/[n(n-1)/2]
  = [8 - 2] / [5(4)/2]
  = .6
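
The concordant and discordant counts can be tallied by looping over all pairs of cars. A minimal sketch, assuming no ties in either ranking (`kendall_tau` is an illustrative name):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank correlation: (concordant - discordant) / (n choose 2)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.6
```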


54.


The observed life, in months of service, before failure for the color
television picture tube in 8 television sets manufactured by Firm B are
as follows (arranged according to size):


Firm B: 34 36 41 43 44 45 47 48


Let ETA(B) denote the median service life of picture tubes produced by
the firm.


The point estimate of ETA(B) equals:


a. 35 b. 43.5 c. 44 d. 33.5



Answer:

b. 43.5


n = 8
Therefore, the median equals the average of the two middle values.
Median = (43 + 44)/2 = 43.5
or any number between 43 and 44.


55.


The life in months of service before failure of the color television
picture tubes in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25 29 31 32 35 37 39 40
Firm B: 34 36 41 43 44 45 47 48


Let ETA(A) and ETA(B) denote the median service life of picture tubes
produced by the two firms.


The S-interval with confidence coefficient .71 for ETA(A) is bounded
by:


a. 29 and 39 b. 36 and 47 c. 31 and 37 d. 41 and 45



Answer:

c. 31 and 37


GAMMA = .71
n = 8


From the Table of d-factors for Sign Test and Confidence Intervals
for the median, d = 3. The confidence interval is bounded by the
d-smallest and d-largest sample observations. Thus, the S-inter-
val about the median is bounded by the third smallest and third
largest sample observations, or 31 and 37.
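
The d-factor table entry can be checked against the binomial distribution: the interval from the d-th smallest to the d-th largest observation has confidence coefficient 1 - 2*P(X <= d-1), with X binomial (n, .5). A sketch with an illustrative helper name:

```python
from math import comb

def s_interval(sample, d):
    """Sign-test (S) interval for the median and its exact confidence coefficient."""
    xs = sorted(sample)
    n = len(xs)
    tail = sum(comb(n, k) for k in range(d)) / 2 ** n  # P(X <= d - 1)
    return xs[d - 1], xs[-d], 1 - 2 * tail

firm_a = [25, 29, 31, 32, 35, 37, 39, 40]
print(s_interval(firm_a, 3))  # (31, 37, 0.7109375)
```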


56.


The life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25 29 31 32 35 37 39 40
Firm B: 34 36 41 43 44 45 47 48


Let ETA(A) and ETA(B) denote the median service life of picture tubes
produced by the two firms.


The W-interval with confidence coefficient .98 for ETA(A) is bounded
by:


a. 29 and 39 b. 36 and 47 c. 35 and 47.5 d. 27 and 39.5



Answer:

d. 27 and 39.5


n = 8
Using a table of critical values for the W-interval with ALPHA = .02,
d = 2. The relevant corners of the table of averages are:


     ^  25     29     31     32     35     37     39     40
--------------------------------------------------------------
25   ^  25    [27]    28
29   ^         29     30
31   ^                31
32   ^
35   ^
37   ^                                     37     38    38.5
39   ^                                            39   [39.5]
40   ^                                                   40


W-interval is 27 and 39.5.
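
The table of averages holds the pairwise averages (x(i) + x(j))/2 for i <= j, and the W-interval runs from the d-th smallest to the d-th largest of them. A sketch that builds all of them instead of only the corners (`w_interval` is an illustrative name):

```python
from itertools import combinations_with_replacement

def w_interval(sample, d):
    """d-th smallest and d-th largest of the pairwise averages (i <= j)."""
    averages = sorted((a + b) / 2
                      for a, b in combinations_with_replacement(sample, 2))
    return averages[d - 1], averages[-d]

firm_a = [25, 29, 31, 32, 35, 37, 39, 40]
print(w_interval(firm_a, 2))  # (27.0, 39.5)
```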


57.


The coded values for a measure of brightness in paper (light
reflectivity), prepared by two different processes, are as
follows for samples of size 9 drawn randomly from each of the
two processes:


A B
___ ___


6.1 9.1
9.2 8.2
8.7 8.6
8.9 6.9
7.6 7.5
7.1 7.9
9.5 8.3
8.3 7.8
9.0 8.1


Do the data present sufficient evidence (ALPHA = .10) to
indicate a difference in the populations of brightness
measurements for the two processes?


a. Use the sign test.
b. Use the Mann-Whitney rank test.



Answer:

H(O): Brightness has the same distribution under both
processes


H(A): Brightness has different distributions under
the two processes


a. Sign test:


The signs associated with the differences are:
-,+,+,+,+,-,+,+,+.


The smaller number of like signs is 2. With 9 pairs, 1 or
fewer signs are required for significance at the .10 level,
therefore we continue the null hypothesis of no difference.


b. Mann-Whitney rank test:



       A    Rank         B    Rank


      6.1     1         9.1    16
      9.2    17         8.2     9
      8.7    13         8.6    12
      8.9    14         6.9     2
      7.6     5         7.5     4
      7.1     3         7.9     7
      9.5    18         8.3    10.5
      8.3    10.5       7.8     6
      9.0    15         8.1     8
            ----              ----
            96.5              74.5


With n(1) = 9 and n(2) = 9, a value of 63 or less for
the smaller sum of ranks should lead to rejection at
the .05 level. Therefore we will continue H(O) at both
the .05 and .10 levels.


58.


Model           ^    G       F       C
---------------------------------------
Compacts        ^  20.3    25.6    24.0
Intermediate 6s ^  21.2    24.7    23.1
Intermediate 8s ^  18.2    19.3    20.6
Full-size 8s    ^  18.6    19.3    19.8
Sport Cars      ^  18.5    20.7    21.4


The data in the above table give gasoline mileage for various types of
cars produced by three different manufacturers. You want to compare
cars produced by manufacturers G and F. The hypothesis that gasoline
mileage does not differ for the two manufacturers can be tested using


a. either the sign test or the Wilcoxon signed rank test.
b. the sign test, but not the Wilcoxon signed rank test.
c. either the median test or the Wilcoxon two-sample test.
d. the median test, but not the Wilcoxon two-sample test.
e. the test of homogeneity.



Answer:

a. either the sign test or the Wilcoxon signed rank test.


59.


The life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25 29 31 32 35 37 39 40
Firm B: 34 36 41 43 44 45 47 48


Let ETA(A) and ETA(B) denote the median service life of picture tubes
produced by the two firms.


You want to test the hypothesis ETA(A) = 38 against the alternative
ETA(A) < 38. The correct sign test statistic and its value is:


a. S(+) = 2 b. S(-) = 2 c. S(+) = 3 d. S(-) = 3



Answer:

a. S(+) = 2


Since we have H(A): ETA(A) < 38, we expect fewer observa-
tions to be larger than the median, and the correct test
statistic is S(+). Its value is:


S(+) = # observations > 38 = 2.


60.


An experiment was designed to compare the durability of  two  highway
paints named type A and type B. An "A" strip and a "B" strip were
painted across a highway at each of 30 locations. At the end of the
test period the following results were observed: at 6 locations type
A showed the least wear, at 15 locations type B showed the least
wear, and at 9 locations both had the same amount of wear. Use the
sign test at the 5% level to test that both paints have equal
durability.



Answer:

H(O): P(A < B) = P(B < A) = .5


(NOTE: All tied cases are dropped from the analysis for the sign
test.)


n = 21
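X = the smaller number of like signs = 6


Using the binomial distribution with n = 21 and p = .5:
P(X <= 6) = .039, so the two-sided descriptive level is 2(.039) = .078.
Since .078 > .05, do not reject the null hypothesis at the 5% level;
the data do not show a difference in durability.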
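
The binomial tail behind the sign test can be computed directly instead of being read from a table. A minimal sketch for n = 21 untied locations with 6 as the smaller count; `sign_test_p` is an illustrative helper.

```python
from math import comb

def sign_test_p(n, x):
    """One-sided and two-sided sign-test descriptive levels under p = .5."""
    k = min(x, n - x)  # the smaller number of like signs
    lower = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k)
    return lower, min(1.0, 2 * lower)

one_sided, two_sided = sign_test_p(21, 6)
print(round(one_sided, 4), round(two_sided, 4))  # 0.0392 0.0784
```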


61.


The observed life, in months of service, before failure for the color
television picture tube in 8 television sets manufactured by Firm B are
as follows (arranged according to size):


Firm B: 34, 36, 41, 43, 44, 45, 47, 48


Let ETA(B) denote the median service life of picture tubes produced by
the firm and assume the lifetimes have symmetric distributions. You
want to test the hypothesis ETA(B) = 38.5 against the alternative
ETA(B) =/= 38.5 using the Wilcoxon signed rank test. From the
following list, select the most reasonable test statistic:


(a) W(+) = 2 (b) W(+) = 5 (c) W(-) = 5 (d) W(-) = 2



Answer:

(c) W(-) = 5


X(i)     D(i)     |D(i)|     Rank
-----    -----    ------     ----


 34      -4.5      4.5       3.5
 36      -2.5      2.5       1.5
 41       2.5      2.5       1.5
 43       4.5      4.5       3.5
 44       5.5      5.5       5
 45       6.5      6.5       6
 47       8.5      8.5       7
 48       9.5      9.5       8


W(-) = SUM(R(-)) = 3.5 + 1.5 = 5
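
The ranking step, including the averaged ranks for tied absolute differences, can be sketched as follows. `signed_rank_sums` is an illustrative helper; it drops zero differences and relies on exact equality to detect ties, which is safe for these data because every difference is a multiple of .5.

```python
def signed_rank_sums(sample, median0):
    """Return (W+, W-): sums of midranks of |x - median0| by sign."""
    diffs = [x - median0 for x in sample if x != median0]
    abs_sorted = sorted(abs(d) for d in diffs)
    rank_of = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j < len(abs_sorted) and abs_sorted[j] == abs_sorted[i]:
            j += 1
        rank_of[abs_sorted[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus

firm_b = [34, 36, 41, 43, 44, 45, 47, 48]
print(signed_rank_sums(firm_b, 38.5))  # (31.0, 5.0)
```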


62.


The state highway department is collecting data to determine whether
a highway's repair priorities should be raised, lowered, or should
remain the same. The decision will be made in the following manner.
If the population median of traffic flow = 100 cars per day, the
priority will remain the same. If the population median of traffic
flow > 100 cars per day, raise the priority. If the population me-
dian of traffic flow < 100 cars per day, decrease the priority. Data
for nine randomly selected days is as follows:


Traffic Flow: 88, 91, 89, 101, 93, 86, 95, 98, 92


Can we conclude at ALPHA = .05 that the median number of cars per day
is 100?


(a) Pick the most appropriate nonparametric procedure.
(b) State null and alternative hypotheses.
(c) Compute a test statistic.
(d) Indicate your critical values.
(e) Do you or do you not reject H(0)? What is your conclusion? What
happens to the road in question?



Answer:

(a) Use Wilcoxon test.


(b) H(0): Md = 100
H(1): Md =/= 100


(c)    X     D(i) = X - 100     ABS(D(i))     Rank
       -     --------------     ---------     ----


       86         -14               14          9
       88         -12               12          8
       89         -11               11          7
       91          -9                9          6
       92          -8                8          5
       93          -7                7          4
       95          -5                5          3
       98          -2                2          2
      101          +1                1          1


    R+ =  1  (Sum of positive ranks)
    R- = 44  (Sum of negative ranks)
    T  = smaller of R+ and R- = 1


(d) lower W = 6
upper W = 9(10)/2 - 6 = 39


(e) (T=1) < 6, therefore reject H(O). Conclude that the median is
less than 100 cars per day, and decrease the priority.


63.


Ten randomly selected cars of a specific year, make, and model and
with similar equipment, are subjected to an EPA gasoline mileage
test. The resulting miles/gallon are:


24.6, 30.0, 28.2, 27.4, 26.8,
23.9, 22.2, 26.4, 32.6, 28.8


Using the Wilcoxon Median Test, test the hypothesis that the population
median is 30 miles/gallon at the ALPHA = .10 level. Construct a 90%
confidence interval for the median.



Answer:

Measurement     D(i)     |D(i)|     Rank
-----------     ----     ------     -----
   24.6         -5.4      5.4        7
   30.0          0        0          -
   28.2         -1.8      1.8        2
   27.4         -2.6      2.6        3.5
   26.8         -3.2      3.2        5
   23.9         -6.1      6.1        8
   22.2         -7.8      7.8        9
   26.4         -3.6      3.6        6
   32.6          2.6      2.6        3.5
   28.8         -1.2      1.2        1


R+ = 3.5
            ---> T = 3.5
R- = 41.5


Lower w = 9
Upper w = (9*10)/2 - 9 = 36


Since (T=3.5) < 9, we reject H(0): median = 30.


For the confidence interval, we need the 11th largest and smallest
values, to be obtained from the following table:


     ^ 32.6   30.0   28.8   28.2   27.4   26.8   26.4   24.6   23.9   22.2
----------------------------------------------------------------------------
32.6 ^ 32.6   31.3   30.7   30.4   30.0   29.7   29.5   28.6   28.25  27.4
30.0 ^        30.0   29.4   29.1   28.7   28.4    --     --     --     --
28.8 ^              [28.8]  28.5   28.1   27.8    --     --     --     --
28.2 ^                       --     --     --     --     --    26.05 [25.2]
27.4 ^                              --     --     --    26.0   25.65  24.8
26.8 ^                                     --     --    25.7   25.35  24.5
26.4 ^                                           26.4   25.5   25.15  24.3
24.6 ^                                                  24.6   24.25  23.4
23.9 ^                                                         23.9   23.05
22.2 ^                                                                22.2


Therefore, 90% C.I.: from 25.2 to 28.8.


64.


On eight occasions of cloud seeding, the following amounts of rainfall
were observed: .74, .54, 1.25, .27, .76, 1.01, .49, .70. On six control
occasions (when no cloud seeding took place), the following amounts of
rainfall were measured: .25, .36, .42, .16, .59, .66.


We test (using the Wilcoxon - Mann - Whitney test) the hypothesis that
cloud seeding does not increase amount of rainfall against the alter-
native that it does. The descriptive level for the given data equals


a. .00 b. .01 c. .02 d. .04 e. .05



Answer:

c. .02


U(S) = number of times seeded observations are larger than
control observations
U(S) = 6 + 4 + 6 + 2 + 6 + 6 + 4 + 6
= 40
U(NS) = (8*6) - 40
= 48 - 40
= 8


from table:
P(U(NS) < 9) = .021
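
The tabled probability can be reproduced from the exact null distribution of U, using the standard recursion on whether the largest observation belongs to the first or the second sample. This is a sketch assuming no ties; `count_u` and `p_u_at_most` are illustrative names.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_u(u, m, n):
    """Number of orderings of m X's and n Y's with exactly u pairs (X > Y)."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # Largest value is an X (it exceeds all n Y's) or a Y (it adds nothing).
    return count_u(u - n, m - 1, n) + count_u(u, m, n - 1)

def p_u_at_most(u, m, n):
    """P(U <= u) when all orderings of the combined sample are equally likely."""
    return sum(count_u(k, m, n) for k in range(u + 1)) / comb(m + n, m)

# 6 control and 8 seeded observations, U(NS) = 8
print(round(p_u_at_most(8, 6, 8), 3))  # 0.021
```

With m = n = 8 and u = 7 the same function gives a one-sided probability of about .0035.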


65.


The  life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25, 29, 31, 32, 35, 37, 39, 40
Firm B: 34, 36, 41, 43, 44, 45, 47, 48


Against the two-sided alternative, the Wilcoxon (Mann Whitney) two-
sample test has descriptive level:


(a) .050 (b) .010 (c) .007 (d) .004



Answer:

(c) .007


U(A) = 0 + 0 + 0 + 0 + 1 + 2 + 2 + 2
= 7
U(B) = 64 - 7
= 57
P(U(A) <= 7) = .0035
Two-sided descriptive level = 2(.0035) = .007


66.


The  life in months of service before failure of the color television
picture tube in 8 television sets manufactured by Firm A and 8 sets
manufactured by Firm B are as follows (arranged according to size):


Firm A: 25, 29, 31, 32, 35, 37, 39, 40
Firm B: 34, 36, 41, 43, 44, 45, 47, 48


Suppose the data is ranked as one combined set. The sum of the ranks
R(B) for the B-observations equals:


(a) 36 (b) 43 (c) 57 (d) 93



Answer:

(d) 93


Table of Ranks:


Firm A: 1 2 3 4 6 8 9 10
Firm B: 5 7 11 12 13 14 15 16


SUM(R(B)) = 5 + 7 + 11 + 12 + 13 + 14 + 15 + 16
= 93
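
The combined ranking can be sketched in a few lines. The helper below assumes there are no ties, which holds for these data; `rank_sums` is an illustrative name. Subtracting n(n+1)/2 = 36 from R(B) gives 93 - 36 = 57, the number of (A, B) pairs in which the B-observation is the larger.

```python
def rank_sums(a, b):
    """Rank sums of samples a and b in the combined ordering (no ties assumed)."""
    combined = sorted(a + b)
    rank = {value: i + 1 for i, value in enumerate(combined)}
    return sum(rank[v] for v in a), sum(rank[v] for v in b)

firm_a = [25, 29, 31, 32, 35, 37, 39, 40]
firm_b = [34, 36, 41, 43, 44, 45, 47, 48]
print(rank_sums(firm_a, firm_b))  # (43, 93)
```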


67.


An investigator is interested in teachers' use of various types of
questions in teaching mathematics. He identifies 4 types of questions
which demand responses of different levels of complexity. He records
the number of questions of each type asked by each teacher in a random
sample of 10 teachers. The frequencies are reported for teacher and
question type.


                Question type
Teacher      1     2     3     4

   A         9     1     9     2
   B         4     6     7     0
   C         8     2     5     1
   D         6     9     2     3
   E         7     5     6     2
   F         7     3     4     1
   G         8     5     2     5
   H         8     9     7     1
   I         6     5     8     4
   J         7     2     5     1


The most appropriate nonparametric test for these data would be:


A. Mann-Whitney Test
B. Friedman Test
C. Wilcoxon Test
D. CHI-SQUARE Test of Homogeneity



Answer:

B. Friedman Test


We are interested in comparing the average ranks for the four
question types. Friedman will do this directly.


68.


Using the list of designs below, indicate which type of design is
most descriptive of the following study:


a. one shot case study
b. factorial design
c. time series design
d. nonequivalent control group
e. co-relational study
f. one group pretest posttest
g. equivalent time series
h. patched up design
i. posttest only control group design
j. criterion group
k. pretest-posttest control
l. separate sample pretest-posttest


A company making electric drills has kept accurate monthly records of
the number of faulty drills sold as indicated by the number of them
that have been returned to the factory. Because this number has
been increasing, the company has instituted a training program to im-
prove the skills of its inspectors in the hopes that fewer faulty
drills will be distributed. The company plans to assess the effects
of this training by continuing to examine monthly records of how many
drills are returned to the factory for repairs after the training pro-
gram has been completed.



Answer:

c. time series


This study consists first of repeated observation, then conduct of
training program, and then continued observation.


69.


Regarding the testing of ammunition using a 16-inch gun subject to a
linear trend for wear, W.J. Youden makes these comments on a testing
sequence


AAAA/BBBB/..../EEEE where A, ..., E are brands of shells


"Nothing good comes from this work. The averages are worthless. Each
average depends on its position in the firing order. The estimate of
the experimental error based on repeat rounds fired in succession ob-
viously has no applicability for judging differences between
ammunitions not fired in immediate succession."


a. Why are the averages worthless?
b. What is wrong with the estimate of experimental error?



Answer:

a. The averages are worthless because differences between ammunition
means are mixed up with differences in firing order where firing or-
der is an important source of variation. If we let, say, RHO(1) re-
present the effect of testing during the first four firings, RHO(2)
represent the effect of testing during the second four firings, etc.
and let TAU(A) represent the effect of Brand A ammunition, TAU(B)
represent the effect of Brand B ammunition, etc., then the differ-
ence YBAR(A) - YBAR(B) = (TAU(A) + RHO(1)) - (TAU(B) + RHO(2)). The
proper difference is distorted by an amount (RHO(1) - RHO(2)).


b. Repeat rounds fired in succession will tend to agree with each other
much more than rounds scattered randomly through the firing se-
quence. They will underestimate the variance that should be used in
comparing brands of ammunition.


70.


An experiment is to be conducted in a greenhouse to compare three
treatments. The 9 experimental units to be used will be arranged as
below:


                          Units


Greenhouse              X   X   X


Heat                    X   X   X


Source (Radiator)       X   X   X



Since the plants used in the experiment are sensitive to heat, it is
expected that the closer an experimental unit is to the heat source, the
poorer the response that will be obtained. Otherwise the experimental
units are considered to be about the same.


a. How would you assign treatments to experimental units? Explain.
b. What, if any, experimental design is defined by this method of
assignment?



Answer:

a. I would assign treatments so that each treatment occurs once at
each of the distances from the radiator, because we suspect that the
distance from the heat will affect the response.
b. This is the randomized complete block design, with distance from the
heat source used as the blocking factor.


71.


Suppose that you have been appointed energy czar of New Hampshire
and have been instructed to provide guidance to consumers on the
relation of speed to miles per gallon for various makes of cars.
Disregarding cost considerations, which of the following static
test schemes would you prefer? Why?


a.) 5 tests all at 25 mph
b.) 2 tests at 25 mph, 3 tests at 55 mph
c.) 1 test at 25, 1 at 35, 2 at 45, and 1 at 55 mph
d.) 1 test each at 25, 35, 45, 55, and 65 mph



Answer:

d.) Because I would be able to say something about speeds of 25 and
65 without extrapolating, and I would be able to say more about
the whole range because it is well covered using this scheme.


72.


Suppose that you are a member of a garden club that has 30 members.  The
club has fallen into controversy as to whether or not planting garlic
next to Baby's Breath is an effective way to reduce insect attacks on
Baby's Breath.


One club member proposes that the best gardener in the club plant
various parts of his garden either with Baby's Breath alone or with
Baby's Breath next to garlic. The proposal is that random selection be
used so that half of 20 areas are planted one way and half are planted
the other way.


Another club member disagrees. He suggests that each club member plant
2 areas - one with Baby's Breath alone and one with Baby's Breath plus
garlic (again randomly assigning treatment to area). Which scheme do
you favor? Why?



Answer:

I favor the second scheme because the scope of inference will be much
broader. After the study is finished, the results would apply to many
types of soil conditions, environmental conditions and gardener's
abilities.


73.


Suppose that you are an entomologist and wish to test 4 compounds that
are said to attract a certain insect. You have 16 insect traps in a
square (4X4) arrangement in a field that contains a growing crop. You
think, but are not sure, that either wind direction or distance from
insect source may affect number trapped. Your traps are like this:


Wind ------------->


X X X X


X X X X


X X X X


X X X X


Insect Source /|\
               |


a. How many df will be associated with experimental error if the
design used for this situation is
i) completely random?
ii) randomized blocks (with one block consisting of the traps clos-
est to the insect source, ..., another block
consisting of the traps furthest from the
source)?
iii) latin square?


b. What model terms have to be important if it's to be worthwhile to
have used the latin square?


c. Which model terms (or influences) have to be unimportant if a CR
design is to be a good choice?



Answer:

a. i) t(r-1) = 4(4-1) = 4(3) = 12
ii) (r-1)(t-1) = (4-1)(4-1) = (3)(3) = 9
iii) (t-1)(t-2) = (4-1)(4-2) = (3)(2) = 6
where t = number of treatments
r = number of blocks or repetitions


b. Both wind and distance from insect source have to be important in
order for the latin square design to be worthwhile; therefore, both
RHO(j) and KAPPA(k), for all j and k, must be important.


c. Both wind and distance from insect source, or RHO(j) and KAPPA(k)
for all j and k, must be unimportant for the completely random
design to be appropriate.
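

The degrees-of-freedom arithmetic in part a can be checked with a short
sketch. The function name error_df and the design labels "CR", "RCB" and
"LS" are illustrative assumptions, not part of the original problem. The
latin square formula assumes that the number of rows and columns equals
the number of treatments.

```python
def error_df(design, t, r):
    """Error degrees of freedom for t treatments and r blocks/replicates.

    design: "CR" (completely random), "RCB" (randomized complete block),
            or "LS" (latin square; r is ignored because r must equal t).
    """
    if design == "CR":
        return t * (r - 1)
    if design == "RCB":
        return (r - 1) * (t - 1)
    if design == "LS":
        return (t - 1) * (t - 2)
    raise ValueError(f"unknown design: {design}")


# 4 compounds tested on a 4 x 4 grid of traps
for design in ("CR", "RCB", "LS"):
    print(design, error_df(design, t=4, r=4))
```

In this layout each added restriction on randomization costs 3 error df
(12, then 9, then 6), which is the price paid for removing a row or
column effect from the error term.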


74.


A computer user wishes to compare two programs in terms of amount of
computer time used. Both programs perform the same analysis and use
the same data set. The computer user randomly selects 10 time periods
during the time when the computer ordinarily is used. For each of
these periods, both programs are run. Each time a random method
is used to decide which program should be run first. What experimental
design has been used?



Answer:

A randomized complete block (RCB) design has been used to compare
the two programs. The implied blocking factor is time period, since
each program must run within each time period. Each treatment
(program) occurs randomly within each time period.


75.


A computer user is charged for the amount of computer time that he uses.
He has at his disposal two programs that perform the same analysis. He
sets up a standard data set and wishes to compare programs in terms of
computer time used. He randomly chooses 20 time periods from those
times when he ordinarily would use the computer. Programs are randomly
assigned to these 20 periods subject only to the requirement that each
program be run 10 times. What experimental design has been used?



Answer:

The completely random design has been used since the treatments
(programs) have been randomly assigned to the experimental units without
any restriction on randomization except that each treatment be assigned
10 times.


76.


An experiment has been conducted in which two computer programs were
compared in terms of computer time needed to perform the same analysis
of the same data.


Results obtained included:


ANOVA
Source of Variation    df    SS    M.S.
Total                  20    -     -
Mean                    1    -     -
Corrected Total        19    -     -
Treatments              1    -     -
Error                  18    -     -


Means
Program 1 20
Program 2 15


LSD(at ALPHA .05) = t * S(dBAR) = 2


a. Which computer program would you use in the future? Why?
b. What design is suggested by this report?



Answer:

a. I would use program 2 because the difference in mean times between
the two programs is 5 which is larger than the LSD of 2, which
indicates that the observed difference was unlikely (at the .05
level) to have occurred by chance alone. Thus, the sample data
indicates a significant difference between the two programs.
b. A completely random design is indicated by the ANOVA table because
no degrees of freedom have been subtracted from the total for any
blocking factors. The total loss in degrees of freedom is
attributable to the mean and one treatment factor.


77.


An imaginary experiment was conducted to compare length of life of
batteries sold by 4 manufacturers.


Results obtained included:


ANOVA
Source of Variation    df    SS    M.S.


Total                  16    -     -
Mean                    1    -     -
Corrected Total        15    -     -
Type of flashlight      3    -     -
Month of testing        3    -     -
Brands                  3    -     -
Error                   6    -     -


Means
Brand c 4.0
Brand b 3.9
Brand a 3.5
Brand d 3.3


LSD(.05) = .2


a. What brand(s) would you suggest buying if all prices were the same?
Why?
b. What design was used?
c. What does the ANOVA table tell you about how the data was obtained?



Answer:

a. I would suggest buying brands c or b because although they are not
significantly different from each other, both of them are significantly
different from the other two brands, based on the LSD at a .05
significance level.
b. A latin square design is implied by the ANOVA table. Two blocking
factors are indicated; type of flashlight and month of testing, each
having four levels.
c. The ANOVA table indicates a latin square design with two restrictions on
randomization of treatments to experimental units. Each brand of
battery was required to occur once within each month and once with each
type of flashlight.
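

The LSD comparisons behind part a can be laid out pair by pair. This is
a minimal sketch using only the reported means and LSD(.05) = .2. The
variable names are illustrative, and treating a difference exactly equal
to the LSD as not significant is an assumed convention.

```python
from fractions import Fraction
from itertools import combinations

# Brand means and LSD(.05) from the report; Fractions avoid float round-off
means = {"c": Fraction("4.0"), "b": Fraction("3.9"),
         "a": Fraction("3.5"), "d": Fraction("3.3")}
lsd = Fraction("0.2")

# A pair is declared different when the gap between means exceeds the LSD
significant = {
    (x, y): abs(means[x] - means[y]) > lsd
    for x, y in combinations(means, 2)
}

for (x, y), flag in significant.items():
    diff = float(abs(means[x] - means[y]))
    print(f"{x} vs {y}: diff = {diff:.1f}, significant = {flag}")
```

Brands c and b are not separated from each other, but each is separated
from a and from d, which agrees with the recommendation above.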


78.


Suppose that you are in charge of product testing for a chemical
company. You are persuaded that your company has a new product that is
a promising way of relieving insomnia.


a. Suppose that the resources available only permit testing one other
treatment in addition to the new compound. What will that
treatment be? Why?


b. Suppose that available resources permit testing two treatments
besides the new product. What will be your choice of treatments?
Why?


c. What will be your choice if five additional treatments can be tested
(in addition to the new product)? Explain your choice.



Answer:

a. The other treatment would be a control so that there will be some
real basis for comparison to see if the product does any good at all
b. Then I would choose a control and a product which is believed to be
effective at relieving insomnia.
c. If I could test 5 additional treatments, they would be a control and
four treatments for insomnia. If four were much more widely used or
more interesting than others, they would be the four tested. If all
insomnia preparations were about equally interesting, then I would
choose four randomly.


79.


Sominex commercials repeatedly present endorsements of the form: "I take
Sominex and sleep something fierce." Some skeptics would suggest that
many, if not all, of the Sominex endorsers could take an inert pill and
sleep something fierce. Why would an inert pill be a better "control"
for testing the effectiveness of a sleeping potion than no pill at all?
(e.g. test subjects might report not being sleepy, then they might
randomly receive either Sominex or no pill at all.)



Answer:

The problem associated with not taking any pill at all to compare with
taking a Sominex pill is that such a design has not controlled for the
possibility of an effect on response of taking "any" pill. That is,
some people may take a pill to sleep, and just the act of taking a pill
that they think works will in fact have an effect on their ability to
sleep. By comparing Sominex to an inert pill, or control, one can
control for this possible effect and look at just the effect of Sominex.
Since both groups are going through the process of "taking pills", this
effect when comparing the two has been controlled and the differences
between the two groups will be due to the ingredients of the pill alone,
rather than including effects of "taking a pill".


80.


Suppose that you have identified three pocket calculator models that you
regard as suitable for your work and comparable in price. Suppose that
you will make your decision on which one to buy on the basis of the time
that it takes for you to perform a particular set of calculations. You
feel that you are equally familiar with all 3 models, but suspect that
if you repeat the same set of calculations over and over again, you will
become slower and slower. Suppose that it is reasonable to repeat the
calculations six times on each machine. Which design will you use? Why?


i) Completely Random
ii) Two 3X3 Latin Squares
iii) Randomized Block



Answer:

Choose two 3X3 Latin Squares because the position in the testing process
appears to have a significant effect on the response, and, therefore,
this effect should be balanced out.


81.


The time required for a computer user to finish running a program to
perform a particular analysis depends on many things such as-
Language used by the program,
Number of other jobs being processed on the system,
Amount of data being summarized, etc.


Suppose that 3 programs were available to perform the same analysis
where the chief difference among programs was the language used for
writing them. Suppose that these languages were:


1. FORTRAN
2. BASIC
3. Assembler


Suppose further that 3 terminals were available and all 3 versions could
be run at the same time. (A different person at each terminal). Which
of the following designs would you use? Why?


i) Randomized Block
ii) Latin Square
iii) Completely Random



Answer:

I would use a Randomized Block design so that each program would run at
the same time. This assumes that there will be no important effect of
person or terminal on the amount of time needed to run a program.


82.


An investigator wished to study the effect of an operator on the
performance of a machine. He could arrange to have each of four
operators run the machine five times. A response measurement could be
recorded each time the machine was used. How many experimental units
will he have if-


a. He randomly selects an operator, has him run the machine five times,
then selects another operator, etc.?


b. He identifies 20 turns for running the machine and randomly assigns
operators to turns subject to the requirement that each operator
perform five times?


c. He forms five groups of four turns and randomly and independently
assigns operators within each group of four?



Answer:

a. 4: an experimental unit is a set of 5 turns (times of running the
machine).
b. 20: an experimental unit is a turn.
c. 20: an experimental unit is a turn even though turns have been
arranged in groups.


83.


An investigator plans to conduct an experiment to evaluate three
different methods for measuring a chemical. An experimental
unit will involve the activities of a technician during 12 time
periods over six days. When asked what pattern of variation
he would expect if the technician used the same method in
each of those time periods, the investigator responded that all
measurements would be about the same, that he expected no re-
gular pattern of variation.
Which of the following experimental designs do you recommend? Why?
i) Latin Square
ii) Randomized Block
iii) Completely Random Design



Answer:

I recommend iii, Completely Random Design because there are no expected
influencing factors for which we should balance the effects.


84.


Suppose that we wished to compare two drivers in terms of miles
travelled per gallon of gas used. Suppose that an experimental
unit consists of independently driving over a single
prescribed course involving about the same traffic and both
city and open road driving within a prescribed time range. In
order to broaden the scope of this comparison, it is desired
to use:
a Cadillac
Ford
Volkswagen
Datsun Z
Jeep
Mazda
a. How would you set up this trial to balance the effects of
kind of car on driver performance? What design would you use?
b. How would you balance both the effects of kind of car
and day of travel?



Answer:

a. I would use a randomized block design and use kind of car as
a blocking factor, having each driver drive in each of
the cars.
b. If length of time required to complete a run were short
enough so that two comparable runs could be finished on the
same day, I would define a block as 2 runs on the same day
involving the same kind of car. Drivers would be randomly
assigned to runs each day. If it was not reasonable to
regard 2 runs on the same day as comparable, then a scheme
involving a 2 X 2 latin square for each brand of car
might be useable.


85.


You now have at your disposal one 16-inch gun.  You are to develop a
testing scheme to test shell velocity for five brands of ammunition.
Because of the amount of explosive required for each round, each
supplier will provide just four rounds.


While the gun to be used for testing has a new barrel, you know from
previous experience that gun barrel wear is pronounced. In fact,
you are willing to assume that the only consequential uncontrolled
environmental factor is gun wear and that order of firing has a
linear effect on shell velocity. (Velocity of shell declines from
firing 1 to firing 20.)


a. Define experimental unit.
b. Propose a testing scheme and assign brands to experimental units.
c. Will your testing scheme avoid distortions due to firing order
if the influence of firing order is I = 50 - 2T (I: influence,
T: time of firing, T = 1, 2, ..., 20)?



Answer:

a. An experimental unit is one firing of the gun, and all the
preparation that goes with it.
b. I propose that the randomized complete block design be used.
I would assign the treatments to the experimental units within
four blocks of five sequential units. The following is an example
of an assignment of treatments to experimental units.


            Firing order assigned, by block
Treatment   Block 1   Block 2   Block 3   Block 4
    1          2         6        15        18
    2          5         9        11        19
    3          1        10        13        20
    4          3         7        14        17
    5          4         8        12        16


c. This scheme will not entirely avoid distortions due to firing
order. The randomized complete block goes a long way in balancing
these distortions.


If you had five rounds of each brand of ammunition you could use a
Latin Square design and eliminate the distortions altogether. (It's
also possible to use 4 columns of a 5 X 5 Latin Square, but it's not
expected that this option would be proposed in an introductory
class.)


Influence values for the above randomization
(e.g. for firing #1. I = 50 - 2 = 48
firing #2. I = 50 - 4 = 46)


Treatment   Block 1   Block 2   Block 3   Block 4   Total Influence
    1         46        38        20        14           118
    2         40        32        28        12           112
    3         48        30        24        10           112
    4         44        36        22        16           118
    5         42        34        26        18           120


Treatment 5 had the largest advantage due to firing order,
but RCB randomization largely balanced firing order effects.
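

The influence totals can be reproduced directly from the firing
positions in the example assignment. This is a sketch under the linear
wear model stated in part c; the names assignment, influence and totals
are illustrative.

```python
# Firing positions (1..20) assigned to each brand in the example layout,
# one position per block of five consecutive firings
assignment = {
    1: [2, 6, 15, 18],
    2: [5, 9, 11, 19],
    3: [1, 10, 13, 20],
    4: [3, 7, 14, 17],
    5: [4, 8, 12, 16],
}


def influence(t):
    """Linear firing-order effect given in part c: I = 50 - 2T."""
    return 50 - 2 * t


totals = {brand: sum(influence(t) for t in firings)
          for brand, firings in assignment.items()}

for brand, total in totals.items():
    print(f"Treatment {brand}: total influence = {total}")

# Spread between the most-favored and least-favored brands
print("range of totals:", max(totals.values()) - min(totals.values()))
```

By contrast, firing each brand's four rounds in succession (brand 1 in
firings 1 to 4, and so on) would give totals running from 180 down to
52 under the same model, so blocking removes most of the distortion.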


86.


An investigator wishes to explore the effectiveness of four coagulants
(named A, B, C, and D) in removing suspended material from water.
He proposes to try these coagulants on six different water samples
where each sample is large enough to provide an aliquot for testing
each coagulant. The proposed procedure consists of


* Preparing enough of each coagulant for Sample 1 where the
order of preparing coagulants is random
* Treating aliquots of Sample 1 in the same order and putting
samples on the stirrer in that order
* Repeating the same process for each of the other five water
samples (using a different randomization for each
sample).


a. How would you conduct a uniformity trial for the above situation?
b. What would it tell you?



Answer:

a. You would conduct a uniformity trial by running the whole experi-
ment but using only one of the coagulants throughout.


b. It would show you any patterns of variation in response among the
experimental units independent of the treatments. It would also
give you an estimate of the background variation.


87.


The average number of hours an electric circuit lasts before
failing is 100 hours. An engineer claims that he can develop
a circuit that increases the average life of the circuit. It
is desired to test H(0): MU = 100 against the appropriate al-
ternative hypothesis. The alternative hypothesis is best rep-
resented as:


(a) H(A): MU =/= 100 (c) H(A): MU < 100
(b) H(A): MU <= 100 (d) H(A): MU > 100



Answer:

(d) H(A): MU > 100


88.


We want to compare two machines for production line speed of beer bottle
manufacturing. At the end of n(1) = 9 days, the number of bottles pro-
duced by machine 1 yield XBAR(1) = 19, S(1)**2 = 4. For machine 2, we
have n(2) = 6 days, XBAR(2) = 17, and S(2)**2 = 9. Assume independence
of samples, normality, and equality of unknown variances.


In testing H(0): MU(1) = MU(2) vs. H(1): MU(1) =/= MU(2), the observed
value of our statistic is:


(A) (2)/(SQRT(77/13))(SQRT((1/9) + (1/6)))
(B) (2)/SQRT((4/9) + (9/6))
(C) (2)/(SQRT(90/13))(SQRT((1/9) + (1/6)))
(D) (2)/SQRT((16/9) + (81/6))
(E) (2)/(SQRT(36/13))(SQRT((1/9) + (1/6)))



Answer:

(A) (2)/(SQRT(77/13))(SQRT((1/9) + (1/6)))


First we must find the pooled variance:
S(P)**2 = ((8)(4) + (5)(9))/(9 + 6 - 2)
= 77/13


Now standard error of the difference between means:
S(XBAR(1) - XBAR(2)) = (SQRT(77/13))(SQRT((1/9) + (1/6)))


The observed t-value is:
t(calculated) = ((19 - 17) - 0)/S(XBAR(1) - XBAR(2))
= 2/S(XBAR(1) - XBAR(2))
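

The three steps above can be carried through numerically. This is a
minimal sketch using only the summary statistics given in the problem;
the variable names are illustrative.

```python
import math

n1, xbar1, var1 = 9, 19.0, 4.0   # machine 1
n2, xbar2, var2 = 6, 17.0, 9.0   # machine 2

# Pooled variance weights each sample variance by its degrees of freedom
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

# Standard error of the difference between the two sample means
se_diff = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)

t_calc = (xbar1 - xbar2) / se_diff
print(f"pooled variance = {pooled_var:.4f}, SE = {se_diff:.4f}, "
      f"t = {t_calc:.3f}")
```

The statistic works out to roughly 1.56 on 13 degrees of freedom.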


89.


We want to compare two machines for production line speed of beer bottle
manufacturing. At the end of each of n(1) = 9 days, the number of
bottles produced by machine 1 yield XBAR(1) = 19, S(1)**2 = 4. For ma-
chine 2, we have n(2) = 6 days, XBAR(2) = 17, and S(2)**2 = 9. Assume
independence of samples, normality, and equality of unknown variances.


Suppose we test H(0): MU(1) = MU(2) vs. H(1): MU(1) =/= MU(2) at
ALPHA=.05, and the value of the test statistic is 2.20. Then we should:


(A) Do not reject (continue) H(0)
(B) Reject H(0)
(C) Do not reject (continue) H(1)
(D) Both (B) and (C)
(E) Both (A) and (C)



Answer:

(D) Both (B) and (C)


Given that: t(calculated) = 2.20
and find : t(critical, ALPHA=.05, twotailed, df=13) = +/- 2.160.


Since t(calculated) falls in the area of rejection, we would reject
H(0) and would not reject (continue) H(1).


90.


Molybdenum rods are produced by a production line setup.   It is desir-
able to check whether the process is in control. Let X = length of such
a rod. Assume X is approximately normally distributed with mean = MU
and variance = SIGMA**2, where the mean and variance are unknown.


Take n = 400 sample rods, with sample average length XBAR = 2 inches,
and SUM((X - XBAR)**2) = 399.


In testing H(0): MU = 2.2 vs. H(1): MU =/= 2.2 at level ALPHA = 8%,
one should _____ the H(0) since the value _____ lies _____ the
confidence interval.


a) continue, 2.2, within
b) reject, 2, outside of
c) reject, 2.2, outside of
d) continue, 2, within
e) either b or c



Answer:

c) reject, 2.2, outside of


S**2 = 399/399 = 1
S(XBAR) = SQRT(S**2/n) = .05


If you center the confidence interval on the sample mean the
confidence interval = 2 +/- (1.75)(.05)
= from 1.9125 to 2.0875


which does not contain the hypothesized value, 2.2.
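

The interval can be verified with a few lines of arithmetic. This is a
sketch that takes the table value 1.75 used above as given; the
variable names are illustrative.

```python
import math

n = 400
xbar = 2.0
sum_sq_dev = 399.0          # SUM((X - XBAR)**2)
z = 1.75                    # table value for ALPHA = 8%, two-sided
mu0 = 2.2                   # hypothesized mean

s2 = sum_sq_dev / (n - 1)   # sample variance
se = math.sqrt(s2 / n)      # standard error of the mean

lower, upper = xbar - z * se, xbar + z * se
reject = not (lower <= mu0 <= upper)
print(f"CI = ({lower:.4f}, {upper:.4f}); reject H(0): {reject}")
```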


91.


Molybdenum rods are produced by a production line setup.  It is desired
to check whether the process is in control. Let X = length of such a
rod. Assume X is approximately normally distributed with mean = MU
and variance = SIGMA**2, where the mean and variance are unknown.


Take n = 400 sample rods, with sample average length XBAR = 2 inches
and SUM((X - XBAR)**2) = 399.


If one were testing H(0): MU = 1 vs. H(1): MU =/= 1 at level
ALPHA = _____, one should _____ the H(0) since the value 1 lies
_____ the confidence interval.


a) 16%, not reject (continue), within
b) 8%, not reject (continue), within
c) 4%, not reject (continue), within
d) 4%, not reject (continue), to the left of
e) 4%, reject, to the left of



Answer:

e) 4%, reject, to the left of


S**2 = 399/399 = 1; S(XBAR) = SQRT(S**2/n) = .05;
C.I. = 2 +/- Z(ALPHA/2) * .05;


Z(16%/2) = 1.41, Z(8%/2) = 1.75, Z(4%/2) = 2.05


C.I.(ALPHA=16%) = 2 +/- (1.41)(.05)
= from 1.93 to 2.07


C.I.(ALPHA= 8%) = 2 +/- (1.75)(.05)
= from 1.91 to 2.09


C.I.(ALPHA= 4%) = 2 +/- (2.05)(.05)
= from 1.90 to 2.10


1 is not included in any of the confidence intervals, so H(0) should
be rejected in all cases.
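

The three intervals can also be generated without a table. This is a
sketch that assumes Python 3.8 or later, where statistics.NormalDist
provides the inverse normal CDF; the names intervals and side are
illustrative.

```python
import math
from statistics import NormalDist

n, xbar, s2 = 400, 2.0, 1.0
se = math.sqrt(s2 / n)
mu0 = 1.0

intervals = {}
for alpha in (0.16, 0.08, 0.04):
    # Upper ALPHA/2 point of the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)
    intervals[alpha] = (xbar - z * se, xbar + z * se)

for alpha, (lo, hi) in intervals.items():
    side = "left of" if mu0 < lo else "right of" if mu0 > hi else "within"
    print(f"ALPHA = {alpha:.0%}: CI = ({lo:.2f}, {hi:.2f}); "
          f"{mu0} lies {side} the interval")
```

At every level the hypothesized value 1 falls to the left of the
interval, so only choice e) is consistent with the data.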


92.


It is known that long, thin titanium rods lengthen with increasing
temperature. A sample of n=20 identical titanium rods is selected.
Each is subjected to a particular uniform temperature for a specified
time. Let Y denote the change in length. The readings are
(X(1),Y(1)),...,(X(20),Y(20)), with data XBAR=2 (in hundreds of
degrees F), YBAR=3 (in milli-inches), SUM((X-XBAR)**2) = 10,
SUM((Y-YBAR)**2) = 40, and SUM((X-XBAR)(Y-YBAR)) = 16.


In testing H(0): RHO = 0 vs H(1): RHO =/= 0 (RHO = population value
for Pearson correlation coefficient) at level ALPHA = 5%, one should ___
H(0) since the statistic r/SQRT((1-r**2)/(n-2)) = ____ is ____
than the correct table value of ____.


(a) reject, 5.7, greater, 2.086
(b) reject, 5.7, greater, 1.734
(c) reject, 5.7, greater, 2.093
(d) reject, 5.7, greater, 2.101
(e) continue, 1.6, less, 2.086



Answer:

(d) reject, 5.7, greater, 2.101
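

The answer follows from the sums of squares and cross products given in
the problem. A minimal sketch of the calculation, with illustrative
variable names and the critical value 2.101 taken from a t table
(df = n - 2 = 18) rather than computed:

```python
import math

n = 20
sxx = 10.0   # SUM((X - XBAR)**2)
syy = 40.0   # SUM((Y - YBAR)**2)
sxy = 16.0   # SUM((X - XBAR)(Y - YBAR))

r = sxy / math.sqrt(sxx * syy)                  # sample correlation
t_calc = r / math.sqrt((1 - r**2) / (n - 2))    # test statistic

t_crit = 2.101   # table value, ALPHA = .05 two-tailed, df = 18
print(f"r = {r:.2f}, t = {t_calc:.2f}, "
      f"reject H(0): {abs(t_calc) > t_crit}")
```

The distractors 2.086 and 2.093 are the two-tailed table values for 20
and 19 degrees of freedom, and 1.734 is the one-tailed value for 18.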


93.


In order to compare two brands of tires, Nader's Raiders selected
five tires of each brand, measuring the mileage for which each tire
gave adequate service. The results of the test (expressed in
thousands of miles) were:


BRAND A BRAND B
------- -------
28.2 25.5
24.9 25.4
23.0 25.3
21.8 25.0
28.1 24.8
----- -----
126.0 126.0


Assuming both brands sell for the same price, which brand of tire would
you say is the better buy? (Do not compute standard deviations.)



Answer:

Under the no computation requirement, it appears Brand B is the better
buy, since the mileage figures are more consistent and the rank ordering
(highest to lowest) AABBBBABAA seems to favor B.
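

The rank ordering and a quick measure of consistency can be obtained
without standard deviations. This is a sketch; using the range as the
consistency measure is an assumption made here for illustration.

```python
brand_a = [28.2, 24.9, 23.0, 21.8, 28.1]
brand_b = [25.5, 25.4, 25.3, 25.0, 24.8]

# Pool the tires, sort from highest to lowest mileage, read off the brands
labeled = [(x, "A") for x in brand_a] + [(x, "B") for x in brand_b]
ordering = "".join(label for _, label in sorted(labeled, reverse=True))

# The range is a quick consistency check that needs no standard deviation
range_a = max(brand_a) - min(brand_a)
range_b = max(brand_b) - min(brand_b)

print(ordering)
print(f"range A = {range_a:.1f}, range B = {range_b:.1f}")
```

Both brands average the same mileage, but Brand A's range is about nine
times that of Brand B.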


94.


Two types of paint are to be tested.  Paint I is somewhat
cheaper than paint II. The test consists of giving scores to
the paints, after they have been exposed to certain weather
conditions for a period of 6 months. Five samples of each
type of paint are scored as follows:


Paint I  | 26 16 20 25 23
---------+----------------
Paint II | 20 28 32 25 25


We should like to adopt paint I, the cheaper one, unless we
have definite reason to believe that paint II is better.
Test the hypothesis that MU(2) <= MU(1) at level of
significance ALPHA = .10.


A. State your test statistic and critical region.
B. Perform your calculations.
C. State your conclusions.



Answer:

Assumptions: (a) Both populations are normal and independent.
(b) Populations have the common variance SIGMA**2.


A. Test statistic: t = [XBAR(1) - XBAR(2)]/[S(XBAR(1) - XBAR(2))]
t(critical, ALPHA=.1, df=8) = -1.397


B. t = [XBAR(1) - XBAR(2)]/[S(XBAR(1) - XBAR(2))]


where: n(1) = n(2) = 5
S(XBAR(1) - XBAR(2)) = SQRT((S(1)**2/n(1)) + (S(2)**2/n(2)))


XBAR(2) = 26 XBAR(1) = 22
S(2)**2 = 19.5 S(1)**2 = 16.5


If t(calc) <= t(crit), we reject H(0).


t(crit) = -1.4
t(calc) = (22-26)/2.68
= -1.491


C. Since t(calculated) < t(critical), we reject H(0): MU(2) <=
MU(1) and adopt paint II.
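

The calculations in part B can be reproduced from the raw scores. This
is a minimal sketch with illustrative variable names; the critical
value -1.397 is taken from a t table rather than computed.

```python
import math
from statistics import mean, variance

paint_1 = [26, 16, 20, 25, 23]
paint_2 = [20, 28, 32, 25, 25]

n1, n2 = len(paint_1), len(paint_2)
xbar1, xbar2 = mean(paint_1), mean(paint_2)
var1, var2 = variance(paint_1), variance(paint_2)   # divisor n - 1

# With equal sample sizes the pooled standard error reduces to this form
se_diff = math.sqrt(var1 / n1 + var2 / n2)
t_calc = (xbar1 - xbar2) / se_diff

t_crit = -1.397   # table value, ALPHA = .10 one-tailed, df = 8
print(f"t = {t_calc:.3f}, reject H(0): {t_calc <= t_crit}")
```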


95.


Five parallel determinations of zinc in an organic substance have been
obtained. The results arranged in order are: 16.84%, 16.86%, 16.91%,
16.93%, and 17.08%. The initial reaction is a desire to discard the
highest value, which seems to be an outlier.


a) Briefly describe the considerations which should arise
in the mind of the experimenter in deciding how to treat
the data.


b) Perform a statistical test to determine if the datum should
be rejected. What is the inherent weakness of such tests?



Answer:

a) 1. Are there any physical reasons on which to base a rejection?
(i.e., dirty glassware, spill, etc.)
2. Are data normally distributed?
3. What are requirements of the analyses?


b) Using the first 4 measurements to calculate XBAR and S(X):


XBAR = 16.885
S(X) = 0.0420


t(calc) = (17.08 - 16.885)/(.042*SQRT(1+(1/4)))
= 4.153


t(crit, ALPHA=.05, df=3, two-tailed) = 3.182


Since t(calc) > t(crit), the datum does appear suspect and should
be considered for deletion.


NOTE: In this case,


VAR(X(5)-XBAR) = VAR(X(5) - [[X(1)+X(2)+X(3)+X(4)]/4])
= SIGMA**2 + [1/16][4*(SIGMA**2)] - 2COV(X(5),XBAR)
= (SIGMA**2)(1 + (1/4))


(COV(X(5),XBAR) = 0, since X(5) and XBAR are independent.)
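

The test in part b can be reproduced as follows. This is a sketch with
illustrative variable names; the critical value 3.182 is taken from a
t table rather than computed.

```python
import math
from statistics import mean, stdev

retained = [16.84, 16.86, 16.91, 16.93]   # first four determinations (%)
suspect = 17.08                           # value under suspicion

xbar = mean(retained)
s = stdev(retained)
n = len(retained)

# Var(X5 - XBAR) = SIGMA**2 * (1 + 1/n) when X5 is independent of XBAR
t_calc = (suspect - xbar) / (s * math.sqrt(1 + 1 / n))

t_crit = 3.182   # table value, ALPHA = .05 two-tailed, df = n - 1 = 3
print(f"XBAR = {xbar:.3f}, S = {s:.4f}, t = {t_calc:.2f}, "
      f"suspect: {t_calc > t_crit}")
```

Carrying S(X) unrounded gives a statistic of about 4.15, marginally
below the 4.153 obtained above with S(X) rounded to .042. The
conclusion is unchanged.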


96.


A report on the effect of a nuclear power plant on number of fish per
unit area in nearby waters states that the hypothesis under test
was: H(0): MU(1) - MU(2) >= 20 where


MU(1) is mean number of fish during the year before the plant was
constructed; and


MU(2) is mean number of fish during the year after the plant began
operation.


a. Is this a one tailed or a two tailed test?


b. Will a sample difference of 19 ever result in rejection of
this claim? (i.e. will a reduction of 19 ever lead to
rejection of this claim.)



Answer:

a. one tailed test


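b. Yes. If the standard error of the difference is small enough, a
sample difference of 19 will fall significantly below the
hypothesized value of 20.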


97.


a.  The information sheet of an insecticide company carried this
statement:


"Differences in control between our material and the current
standard material were not statistically significant at the
99% confidence level."


How do you interpret this statement? What, if any, additional
information would you like to have in order to make a choice of
which material to use?


b. Suppose that the statement read that differences were significant
at the 99% confidence level. How would you interpret this state-
ment? What, if any, additional information would you like to have
in order to make a choice?


(In answering this question, it may be helpful to picture an
experiment as a way of sampling a population of differences in
response to these 2 chemicals. Commonly, it's hypothesized that
such a population of differences has a mean of zero, i.e., H(0):
MU = 0. Such a hypothesis is readily tested by using sample dif-
ferences to set confidence limits.)



Answer:

a. As the statement stands, it simply means that the test did not show
the new material to be better or worse at controlling insects than the current
standard material. However, no information is available on the res-
pective means and the corresponding measures of variation; what size
sample was used to make the comparison; was a one or two tailed test
made on the difference or a confidence interval created around the
observed difference; what was the standard error of the difference?


b. My first reaction to the statement of significant differences at the
.99 level is to ask what the observed difference was, i.e., which
product controls insects better? However, it still lacks the necessary
information to support such a statement. Once again, knowledge of the
size of the observed difference, the corresponding t-value or confi-
dence interval, and measures of variation are needed to judge the
merit of the concluding statement. Statistically significant dif-
ferences can be observed simply due to large sample sizes or dif-
ferences that are just barely significant may indicate the need
for further experimentation or replication.


98.


In a study of learning ability, six boys and six girls were chosen at
random from a kindergarten class. Scores were obtained as measure-
ments of their ability to learn nonsense syllables. Students were
paired on the basis of IQ (that is, the boy with the lowest IQ was
paired with the girl with the lowest IQ, etc.). The data are
presented below.


SCORES
Pair Number Boys Girls
----------- ---- -----
1 10 13
2 14 8
3 16 10
4 13 13
5 13 12
6 15 13


a. Estimate the difference between the population mean scores for boys
and that for girls.


b. Find a 99% confidence interval for the difference in population
means, doing your calculations very roughly, so as to find which
of the following is closest to the answer. Circle the corres-
ponding number.


1. -12 to 16
2. -4 to 8
3. -1 to 5
4. 0 to 4


(If you don't like any of these, show your work for partial credit.)


c. The primary purpose of the pairing was in hopes of reducing
(circle one):


1. skewness in the data
2. the standard error of the difference in sample means
3. the degrees of freedom in the appropriate t-test
4. heterogeneity of the variances
5. correlation between boys and girls



Answer:

a. Estimated difference between the population mean scores for boys
and girls = XBAR(boys) - XBAR(girls)
= 13.5 - 11.5
= 2


b. 2. -4 to 8


Since it is a paired test, we will compute S(DBAR) in order to
compute the 99% confidence interval for the difference in popu-
lation means.


DBAR = XBAR(boys) - XBAR(girls) = 2
d(i) = (D(i) - DBAR), where D(i) is the difference within the
ith pair.


S(D) = SQRT[SUM(d(i)**2)/(n-1)]
= SQRT(62/5)
= 3.52


S(DBAR) = S(D)/SQRT(n)
= 3.52/SQRT(6)
= 1.44


99% confidence interval:
= DBAR +/- [t(ALPHA=.01, twotailed, df=5)*S(DBAR)]
= 2 +/- [(4.032)*(1.44)]
= 2 +/- 5.8
= from -3.8 to 7.8


c. 2. the standard error of the difference in sample means
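

The paired interval in part b can be reproduced from the raw scores.
This is a minimal sketch with illustrative variable names; the value
4.032 is taken from a t table rather than computed.

```python
import math
from statistics import mean, stdev

boys = [10, 14, 16, 13, 13, 15]
girls = [13, 8, 10, 13, 12, 13]

# Within-pair differences D(i) = boy's score - girl's score
diffs = [b - g for b, g in zip(boys, girls)]
n = len(diffs)

dbar = mean(diffs)
s_d = stdev(diffs)               # standard deviation of the differences
s_dbar = s_d / math.sqrt(n)      # standard error of the mean difference

t_table = 4.032   # table value, ALPHA = .01 two-tailed, df = n - 1 = 5
half_width = t_table * s_dbar
print(f"DBAR = {dbar}, S(DBAR) = {s_dbar:.2f}, "
      f"99% CI = ({dbar - half_width:.1f}, {dbar + half_width:.1f})")
```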


99.


A chemist studies two treatments applied to a chemical which he must
prepare in small quantity because of cost and variability of its pro-
perties. The first time he runs the experiment he applies treatment
A to half of the first batch and treatment B to the other half of the
first batch. He experiments with six batches and on the basis of six
pairs of observations he declares the difference between the means of
the two treatment populations to be just barely significant. The next
time, he runs a similar experiment involving the same chemical and two
treatments. He intends to run a two group (unpaired) experiment.


(1) State some advantages (things in favor of) a two group experiment
for the chemist.


(2) State some reasons why the chemist might again prefer a paired de-
sign.



Answer:

(1) A two group experiment allows for more degrees of freedom for mak-
ing a test. The group design has twice as many degrees of freedom
and in addition it allows for the possibility of getting three or
more experimental units out of one test chemical preparation batch.
With the two group experiment there is no restriction with regard
to equal sample sizes.


(2) With a paired design, unwanted variability can be handled. That
is, the variance of difference SIGMA(D)**2 will in general be less
than the variance of observations. There is no need to assume nor-
mality of each population and only differences need to be assumed
independent with homogeneous variance when a t-test is employed.


100.


We are interested in the wearing capabilities of tires. We obtain
Good-day and Good-poor Tires and 9 racing cars (and also the track
used for the Indianapolis 500 Race). We put Good-day on the
left-hand side of the car (front and rear) and Good-poor on the
right-hand side of the car (front and rear). We then allow the cars
to complete the 500 miles at a (relatively) safe speed and then
measure the wear (in millimeters) per tire.


Car No. Good-day Good-poor
------- -------- ---------
77 17 16
82 18 19
92 17 12
41 16 13
17 15 14
22 14 12
18 10 10
23 18 15
43 17 13


a. All the advertising literature claims equality between Good-day and
Good-poor. Can you present evidence to disprove this claim? Use a
significance level of 5%.


b. Comment on the validity of this experimental set-up.



Answer:

a. Let d be the difference in wear between the tires on the left-hand
side (Good-day) and the tires on the right-hand side (Good-poor) of a
car. We are interested in testing the hypothesis that the population
mean of such difference scores, MU(d), is zero.


H(0): MU(d) = 0
H(1): MU(d) =/= 0


The problem is obviously a paired experiment set-up and therefore we
perform a t-test on the difference.


Car No. GD GP d(i) d(i)**2
------- -- -- ---- -------
77 17 16 1 1
82 18 19 -1 1
92 17 12 5 25
41 16 13 3 9
17 15 14 1 1
22 14 12 2 4
18 10 10 0 0
23 18 15 3 9
43 17 13 4 16
---- -------
SUM 18 66


dBAR = [SUM(d(i))]/[9]
= 18/9
= 2


S(d)**2 = [SUM([d(i)-dBAR]**2)]/[n-1]
= [[SUM(d(i)**2)]-[n*(dBAR**2)]]/[n-1]
= [[66]-[9*4]]/[8]
= 3.75


t(calc.) = [dBAR-0]/[SQRT([S(d)**2]/n)]
= [2-0]/[SQRT([3.75]/9)]
= 3.098


t(critical, .05, two-tailed, 8 df) = 2.306


Since t(calculated) > t(critical), reject H(0). Therefore we can
claim on the basis of this test that the tires are not equal.
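
The arithmetic in part a can be checked with a short Python sketch. It is only an illustration: the function and variable names are not part of the original answer, and the critical value is the table value quoted above rather than something the code computes.

```python
import math

# Tire wear (mm) for the nine cars, copied from the table in the question.
good_day = [17, 18, 17, 16, 15, 14, 10, 18, 17]
good_poor = [16, 19, 12, 13, 14, 12, 10, 15, 13]


def paired_t(x, y):
    """Return (mean difference, variance of differences, t statistic)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    d_bar = sum(d) / n
    s2 = sum((v - d_bar) ** 2 for v in d) / (n - 1)  # n - 1 divisor
    t = d_bar / math.sqrt(s2 / n)
    return d_bar, s2, t


d_bar, s2_d, t_calc = paired_t(good_day, good_poor)

# Table value quoted in the answer: two-tailed, ALPHA = .05, 8 df.
T_CRITICAL = 2.306
reject_h0 = abs(t_calc) > T_CRITICAL

print(d_bar, s2_d, round(t_calc, 3), reject_h0)
```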


b. The Indianapolis race track has an oval shape with highly-banked
curves. Since the cars travel in only one direction, only the
inner tires would wear appreciably. There are many other drawbacks
to the design, but this one is catastrophic.


101.


A computer programmer was concerned about the length of time that
would be required to print exam questions and answers at a
terminal (a printing device like a typewriter). The programmer
knew:
-that the terminal was capable of printing 30 characters
per second;


-the number of lines (maximum length: 80 characters) of text
that made up each question and answer;


-the identity of the questions to be printed at any particular
time; and


-that question selection for a single exam was a random process.


Accordingly, the programmer estimated the time required to print a
set of questions and answers by multiplying the total number of lines
to be printed by 80 and then dividing the result by 30 to arrive at
an estimated print time in seconds. Later the programmer was
informed that his estimates for print time were always too high.
What do you recommend that the programmer do to improve his
estimation scheme?



Answer:

A variety of suggestions should be offered in response to this
question. Some should focus on the distribution of characters per
line and the use of something other than the maximum length as a
basis for estimation. (This assumes that the printing device
recognizes short lines and does not always attempt to print 80
characters.) My preference would be to form a series of exams
covering the anticipated range of usage and using random selection
as much as possible. I would print these exams on the terminal and
hope that a regression involving:


Y: Print time; and
X: Total number of lines


would provide an adequate approximation for estimating print time. If
not, I would then be inclined to examine use of such variables as
question length, answer length, computer load, etc.


102.


A test was performed to determine intensity settings for a certain type
of filter. In 20 separate runs of the test, the results for filter 1
are as follows:


96, 83, 97, 93, 99, 95, 97, 91, 100, 92,
88, 89, 85, 94, 90, 92, 91, 78, 77, 93.


Use this data to answer the following. (You may assume a normal distri-
bution.)


a. Write a model to describe an individual measurement from the popula-
tion in terms of an overall mean and a random element. Define all
terms completely. (You may assume the intensity settings to be
determined by a large number of factors that operate independently.
Assume that each factor makes a small contribution to intensity set-
ting and that contributions are additive.)


b. Obtain sample estimates for the parameters and set 90% confidence
limits for MU.


c. How would you modify your answer to part b if you knew that SIGMA**2
was 36?


d. Verify that the sample variance, S**2, is equal to:


S**2 = [SUM(i=1,n)(e(i)**2)]/(n - P)


where e(i) is a deviation from the mean and P is the number of
parameters estimated other than the population variance.



Answer:

a. Population form of model: Y(i) = MU + EPSILON(i), i = 1, 2, ...


where Y(i): intensity setting at which operator i can first
detect an image using filter 1.


MU: the population average intensity setting for an
indefinitely large population of operators.


EPSILON(i): a random element representing the difference in
intensity setting between that required by a par-
ticular operator (Operator i) and the population
mean. The usual assumption is that the EPSILON
(i)'s are normally and independently distributed
with a mean of 0 and a variance of SIGMA**2.


Sample form of model: Y(i) = MU(HAT) + e(i), i = 1, 2, ...


where Y(i): same as above.


MUHAT or YBAR: an estimate of MU above.


e(i): = Y(i) - MU(HAT)
= Y(i) - Y(i)HAT
= Y(i) - YBAR
A deviation, the difference between an observed intensity
setting, Y(i), and what we would estimate for the ith
intensity setting, Y(i)HAT. Here, Y(i)HAT = MU(HAT) = YBAR.


b. PARAMETER ESTIMATE
--------- --------
MU MU(HAT) = YBAR = (SUM(i=1,20)(Y(i)))/n
= 1820/20
= 91


SIGMA**2 S**2 = (SUM(i=1,20)((Y(i) - YBAR)**2))/(n - 1)
= ((SUM(i=1,20)(Y(i)**2))-(SUM(Y(i))**2)/20)/(n-1)
= [(96**2 + ... + 93**2) - ((1820)**2/20)] / 19
= 39.789 with 19 df


CONFIDENCE LIMITS (Variance Estimated):


General form for limits: parameter estimate +/- (t)*(estimated
standard error of parameter estimate)


In this case, limits for MU = MU(HAT) +/- t(ALPHA=.05, df=19) *
(S(MU(HAT)))
= YBAR +/- (1.729) * (S(YBAR))
= 91 +/- (1.729) * (1.410)
= from 88.56 to 93.44


i.e., 90% of the time that we draw a random sample of 20 opera-
tors and calculate an interval in this way, we will get an in-
terval that contains the true mean, MU. (This assumes that the
model used is appropriate.)


c. CONFIDENCE LIMITS (Variance Known):


General form for limits: parameter estimate +/- (Z)* (known
standard error of parameter estimate)


In this case, limits for MU = MU(HAT) +/- (Z) * (SIGMA(MU(HAT)))
= YBAR +/- (1.64) * (SIGMA(YBAR))
= 91 +/- (1.64) * (1.34)
= from 88.8 to 93.2
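
The limits in parts b and c can be reproduced with the sketch below. It assumes the tabled critical values used in the answer (1.729 for t with 19 df and 1.64 for Z) and uses Python's standard `statistics` module; the helper name `limits` is illustrative only.

```python
import math
import statistics

# The 20 intensity settings for filter 1, copied from the question.
readings = [96, 83, 97, 93, 99, 95, 97, 91, 100, 92,
            88, 89, 85, 94, 90, 92, 91, 78, 77, 93]


def limits(center, critical, std_error):
    """Return (lower, upper) limits: center +/- critical * std_error."""
    half_width = critical * std_error
    return center - half_width, center + half_width


n = len(readings)
y_bar = statistics.mean(readings)
s2 = statistics.variance(readings)  # sample variance, n - 1 divisor

# Critical values taken from the answer, not computed here.
T_05_19DF = 1.729   # upper 5% point of t with 19 df (90% two-sided limits)
Z_05 = 1.64         # upper 5% point of the standard normal, as rounded above

# Part b: variance estimated from the sample.
t_limits = limits(y_bar, T_05_19DF, math.sqrt(s2 / n))

# Part c: variance known to be 36.
z_limits = limits(y_bar, Z_05, math.sqrt(36 / n))

print(y_bar, round(s2, 3), t_limits, z_limits)
```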


d. i Y(i) Y(i)HAT=YBAR e(i)
- ---- ------------ ----
1 96 91 +5
2 83 91 -8
3 97 91 +6
4 93 91 +2
5 99 91 +8
6 95 91 +4
7 97 91 +6
8 91 91 0
9 100 91 +9
10 92 91 +1
11 88 91 -3
12 89 91 -2
13 85 91 -6
14 94 91 +3
15 90 91 -1
16 92 91 +1
17 91 91 0
18 78 91 -13
19 77 91 -14
20 93 91 +2



SUM(i=1,20)(e(i)**2) = (5)**2 + ... + (2)**2
= 756


[SUM(i=1,20)(e(i)**2)]/(n - P) = 756/19
= 39.789 with 19 df


since Y(i)HAT = YBAR.


103.


Molybdenum rods are produced by a production line setup. It is desired
to check whether the process is in control. Let X = length of such a
rod. Assume X is approximately normally distributed with mean = MU and
variance = SIGMA**2, where MU and SIGMA**2 are unknown.


Take N = 400 sample rods, with sample average length XBAR = 2 inches,
and SUM((X-XBAR)**2) = 399.


The correct confidence interval for MU at ALPHA = 8% is closest to:


a. 2.2 +/- (1.75/20)
b. 2 +/- (1.41)SQRT(399/400)
c. 2.2 +/- (2.06/20)
d. 2 +/- (1.75/20)
e. 2 +/- (1.67)SQRT(399/400)



Answer:

d. 2 +/- (1.75/20)


S**2 = 399/(400-1) = 1
S(XBAR) = SQRT(S**2/n)
= SQRT(1/400)
= 1/20


C.I. = XBAR +/- Z(ALPHA=.08/2) * S(XBAR)
= 2 +/- (1.75) * (1/20)


104.


John has done an experiment on gallons of water per second that flow in
a sewer main in the city. He makes 16 measurements of this flow and
finds that their average is 100 and their variance is 9. Find a 98%
confidence interval for the mean flow.



Answer:

C.I. = XBAR +/- [t(df=15,ALPHA=.01)*S/SQRT(n)]
C.I. = 100 +/- [2.602*(3/SQRT(16))]
C.I. = 100 +/- 1.95
= from 98.05 to 101.95


105.


The calculated nitrogen content of pure benzanilide is 7.10%. Five
repeat analyses of "representative" samples yielded values of 7.11%,
7.08%, 7.06%, 7.06%, and 7.04%. Using an ALPHA level of size 5%, can we
conclude that the experimental mean differs from the expected value?
Assume that the measured values are approximately normally distributed.



Answer:

H(O): MU = 7.10
H(A): MU =/= 7.10


YBAR = 7.07


S(Y) = 0.0265


t = (YBAR - MU)/S(YBAR) = (7.07 - 7.10)/(0.0265/SQRT(5))
= -2.53


t(critical, ALPHA=.05, df=4) = +/- 2.776


Since the calculated value of t is not in the critical region, continue
H(O) that the nitrogen content has a true value of 7.10%, i.e., the
0.03% difference is ascribable to random error.


or


YBAR +/- t*(S(Y)/SQRT(n))
YBAR +/- 2.776*(0.0265)/(SQRT(5))
P(7.037 <= MU <= 7.103) = 0.95


Continue H(O) that the nitrogen content has a true value of 7.10% at 95%
level since 7.10 lies within the 95% confidence interval.


106.


Wire cable is being manufactured by two processes. We need to
determine if the processes are having different effects on the
mean breaking strength of the cable. Randomly selected samples
from each process were submitted to the lab for testing as though
they were regular samples. Coded values of the load required
to break the cables (tension) are given below:


Process No. 1: 9, 4, 10, 7, 9, 10


Process No. 2: 14, 9, 13, 12, 13, 8, 10


Determine if there is any difference in the mean breaking
strength for the two processes at the 95% probability level.



Answer:

H(0): MU(1) - MU(2) = 0
H(A): MU(1) - MU(2) =/= 0


n(1) = 6 n(2) = 7
XBAR(1) = 8.167 XBAR(2) = 11.286
S(1)**2 = 5.37 S(2)**2 = 5.24


F = 5.37/5.24 = 1.02
F(ALPHA = .05, df = 5,6) = 4.39


Since F(calculated) < F(critical), we can assume at the 95% probability
level that there is no difference in the standard deviations. It is,
therefore, acceptable to pool the variances.


S(P)**2 = [(n(1) - 1)(S(1)**2) + (n(2) - 1)(S(2)**2)]/(n(1) + n(2) - 2)
= (5*5.37 + 6*5.24)/11
= 5.30


S(1BAR - 2BAR) = SQRT(5.30/6 + 5.30/7)
= 1.28


t = [(8.167 - 11.286) - 0]/1.28
= -2.44


t(ALPHA = .05, df = 11, two-tailed) = 2.201


Since |t(calculated)| > t(critical), reject H(0). Therefore, the two
processes differ in mean breaking strength.


Using a confidence interval:
X(2)BAR - X(1)BAR +/- 2.201 * 1.28
3.119 +/- 2.817


C.I. = P(.302 <= MU(2) - MU(1) <= 5.936) = .95
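
A pooled two-sample t-test on these data can be sketched in Python as follows. The sample variances are computed with the n - 1 divisor that the pooled-variance formula assumes; the function name and the returned tuple layout are illustrative choices, and the critical value is the table value quoted above.

```python
import math
import statistics

# Coded breaking loads, copied from the question.
process_1 = [9, 4, 10, 7, 9, 10]
process_2 = [14, 9, 13, 12, 13, 8, 10]


def pooled_t(a, b):
    """Return (var_a, var_b, pooled variance, std. error, t) for two samples."""
    n1, n2 = len(a), len(b)
    v1 = statistics.variance(a)  # sample variance, n - 1 divisor
    v2 = statistics.variance(b)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return v1, v2, sp2, se, t


v1, v2, sp2, se, t_calc = pooled_t(process_1, process_2)

# Table value: two-tailed, ALPHA = .05, 11 df.
T_CRITICAL = 2.201
diff = statistics.mean(process_2) - statistics.mean(process_1)
ci = (diff - T_CRITICAL * se, diff + T_CRITICAL * se)

print(round(v1, 2), round(v2, 2), round(sp2, 2), round(se, 3), round(t_calc, 2))
print(ci)
```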


107.


Past production units of a certain jet engine model showed the mean
military thrust to be 7600 pounds. The first ten production units
manufactured after a model change yielded military thrusts of 7620,
7680, 7570, 7700, 7650, 7720, 7600, 7540, 7670, and 7630. Is there
sufficient evidence (use ALPHA = 0.05) that the model change
resulted in a higher average military thrust?



Answer:

Using ALPHA = .05 and a one-tailed t-test we test:
H(0): MU <= 7600
H(A): MU > 7600


Finding: YBAR = 7638
S(Y) = SQRT((583,420,000 - ((76,380)**2/10))/9) = 57.3


t = (YBAR - MU)/(S(Y)/SQRT(N))
t = (7638 - 7600)/(57.3/SQRT(10))
t = 2.097


t(critical) = 1.83


Since t(calculated) is larger than t(critical) for a one-sided test at
ALPHA = .05, reject the null hypothesis. At the 95% confidence level,
the sample evidence indicates a detectable increase.


108.


Past experience shows that, if a certain machine is adjusted properly, 5
percent of the items turned out by the machine are defective. Each day
the first 25 items produced by the machine are inspected for defects.
If three or fewer defects are found, production is continued without
interruption. If four or more items are found to be defective, produc-
tion is interrupted and an engineer is asked to adjust the machine.
After adjustments have been made, production is resumed. This proce-
dure can be viewed as a test of the hypothesis p = .05 against the
alternative p > .05, p being the probability that the machine turns
out a defective item. In test terminology, the engineer is asked to
make adjustments only when the hypothesis is rejected.


Interpret the quality control procedure described above as a test of
the indicated hypothesis. A Type I error results in:


a. a justified production stoppage to carry out machine adjustments.
b. an unnecessary interruption of production.
c. the continued production of an excess of defective items.
d. the continued production, without interruption, of items that
satisfy the accepted standard.



Answer:

b. an unnecessary interruption of production.


109.


The daily yield of a chemical manufactured in a chemical plant,
recorded for n = 49 days, produced a mean and standard deviation
equal to XBAR = 870 tons and s = 21 tons, respectively.


Test H(0): MU = 880 against H(A): MU < 880, using ALPHA = .05.
Calculate BETA for H(A): MU = 870.



Answer:

S(M) = S/SQRT(n) = 21/7 = 3
XBAR(crit) = MU(M) + Z(crit)S(M)
= 880 + ((-1.65)*3)
= 875.05


Since 870 < 875.05, we reject H(0) and conclude that MU < 880.


BETA is the probability of committing a type II error. Using the
above decision rule and given H(A), it is the probability that XBAR
is greater than XBAR(crit) = 875.05 when MU = 870.


BETA(H(A): MU = 870) = P(XBAR > 875.05)
= P(Z > (875.05 - 870)/3)
= P(Z > 1.683)
= .046
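
The BETA calculation can be checked with Python's standard `statistics.NormalDist`. This is a sketch under the same assumptions as the answer: a normal sampling distribution for XBAR, s treated as the known standard deviation, and the rounded critical value Z = -1.65. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu_0 = 880      # mean under H(0)
mu_a = 870      # mean under the specific alternative
s, n = 21, 49   # sample standard deviation and sample size
Z_CRIT = -1.65  # lower-tail critical value for ALPHA = .05, as rounded above

std_error = s / math.sqrt(n)
x_crit = mu_0 + Z_CRIT * std_error

# BETA = P(XBAR > x_crit) when the true mean is mu_a (fail to reject H(0)).
beta = 1 - NormalDist(mu=mu_a, sigma=std_error).cdf(x_crit)

print(std_error, x_crit, round(beta, 3))
```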


110.


An economist is interested in the possible influence of "Miracle Wheat"
on the average yield of wheat in a district. To do so he fits a linear
regression of average yield per year against year after introduction of
"Miracle Wheat" for a ten year period. The fitted trend line is


YHAT(j) = 80 + 1.5*X(j)
(Y(j): average yield in the jth year after introduction)
(X(j): the jth year after introduction).


a. What is the estimated average yield for the fourth year after
introduction?
b. Do you want to use this trend line to estimate yield for, say, 20
years after introduction? Why? What would your estimate be?



Answer:

a. 80 + 1.5*4 = 86
b. No. I would not want to extrapolate that far beyond the ten years of
data. If I did, my estimate would be 80 + 1.5*20 = 110, but other
factors probably come into play over 20 years.


111.


A management analyst is studying production in an electronic
component assembly factory. Workers individually assemble components
into final products. Each worker is given 100 sets of components to
assemble each day. Employees clock out at the time they finish
assembling the 100 sets into final products. The analyst has average
hourly production rates for each individual worker. Which mean
should be used to calculate the overall average production per labor
hour?


a. arithmetic mean
b. geometric mean
c. harmonic mean



Answer:

c. harmonic mean


The harmonic mean is properly used since the numerator in each
worker's average production is 100 units and the denominator,
hours worked, varies.
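
A small numerical illustration of why the harmonic mean is the right choice here. The hours worked are hypothetical values chosen for the example, not data from the question; the point is only that, with a fixed 100 units per worker, the harmonic mean of the individual rates equals total units divided by total hours.

```python
import statistics

UNITS_PER_DAY = 100  # every worker assembles the same number of units

# Hypothetical hours each of three workers needs to finish the 100 sets.
hours_worked = [8.0, 10.0, 12.5]

# Individual average hourly production rates (units per hour).
rates = [UNITS_PER_DAY / h for h in hours_worked]

# Overall production per labor hour: total units over total hours.
overall = UNITS_PER_DAY * len(hours_worked) / sum(hours_worked)

harmonic = statistics.harmonic_mean(rates)
arithmetic = statistics.mean(rates)

print(round(overall, 3), round(harmonic, 3), round(arithmetic, 3))
```

With these assumed hours the arithmetic mean of the rates overstates the overall production per labor hour, while the harmonic mean matches it.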


112.


A management analyst is studying production in an electronic
component assembly factory. Workers individually assemble components
into final products. Workers assemble as many units as they can in
an eight hour day. The analyst has average hourly production rates
for each individual worker. To calculate the factory's overall
average hourly production per worker, which mean should be used?


a. arithmetic mean
b. geometric mean
c. harmonic mean



Answer:

a. arithmetic mean


The arithmetic mean of individual average hourly production rates
is the same as total production divided by total hours worked,
since individual rates are daily production divided by eight for
every employee.


113.


A computer programmer reports that the average time required to run a
particular program is 11.67 minutes, and that the variance is 8.55 with
5 df. In the Appendix of his report, he lists the following values for
time to run the program:


12, 17, 9, 13, 11, 8.


a. What model for time to run the program was implicit in what he re-
ported?


b. What does this report (or the model) say about factors that might
affect run times?



Answer:

a. Y(I) = MU + EPSILON(I)
Where:


Y(I): time to run job I
MU: population mean run time
EPSILON(I): random error term associated with
job I, usually assumed to be normally
distributed with a mean of zero and a
variance of SIGMA**2.


b. The model that he uses in his report says that all important
factors that affect runtime have been held constant.


114.


Suppose that you have at your disposal the information below for each
of 30 drivers. Propose a model (including a very brief indication of
symbols used to represent independent variables) to explain how miles
per gallon vary from driver to driver on the basis of the factors
measured.


Information:
1. miles driven per day
2. weight of car
3. number of cylinders in car
4. average speed
5. miles per gallon
6. number of passengers



Answer:

Y(j) = b(0) + b(1)*X(1) + b(2)*X(2) + b(3)*X(3) + b(4)*X(4) + b(5)*X(6)
+ e(j)


where the dependent variable is variable 5 - miles per gallon and the
independent variables are
X(1) - miles driven per day
X(2) - weight of car
X(3) - number of cylinders in car
X(4) - average speed
X(6) - number of passengers


115.


A hospital is considering use of a new device for measuring patient
temperatures. For each of 50 patients it is proposed that there be one
time when the patient's temperature is taken with both a standard
thermometer and the new device. The order of using devices is to be
randomly determined for each patient. The model proposed to describe
the temperatures recorded is:


Y(i,j) = MU + TAU(i) + RHO(j) + EPSILON(i,j), i = 1 or 2; j = 1, 2, ..., 50.


a. What is Y(i,j)?
b. What is TAU(i)?
c. What is RHO(j)?
d. What design is proposed?



Answer:

a. Y(i,j) is the response measured for treatment i at level j of the
blocking factor (Person j).
b. TAU(i) is the effect of treatment i, treatment 1: standard thermo-
meter, treatment 2: new thermometer.
c. RHO(j) is the effect of block j, (Person j).
d. The Randomized Complete Block design has been proposed.


116.


In the attached Table 1, results for the routine measurement of
nickel in a steel standard are reported. This determination was made
daily over a long period of time to establish a quality control
program.


In Table 2, the data have been plotted as a tally sheet of
individual values. Clearly, a grouped tally sheet would be more
effective in revealing the pattern of variation in these data.


Perform the following --


(a) Set up a grouped tally sheet and histogram. A cell interval of
0.05% is recommended. List the frequency, cumulative frequency
and relative cumulative frequency for each cell.


(b) Calculate the mean and standard deviation (use coding) by both
the ungrouped and the grouped procedures. Compare results.


(c) What is the mode -- comment -- is it meaningful?


(d) What is the median?


(e) Calculate the standard deviation of the mean.


(f) Plot an ogive. Plot the data on normal probability paper. Is it
reasonable to assume a normal distribution? If so, estimate the
standard deviation and mean and compare with the calculated values.
Estimate the percentage of values outside of the limits 4.88 to
5.21 and compare with the actual percentage.


Table 1. Results of Daily Determination of Nickel in a Nickel
Steel Standard


Date % Ni Date % Ni Date % Ni


Mar. 6 4.95 Apr. 17 4.96 May 29 5.03
7 5.02 18 4.79 30 5.08
8 5.17 19 5.06 31 5.20
9 5.08 20 5.03 June 1 5.11
10 4.92 21 4.95 2 4.95
11 4.94 22 5.10 3 4.95


13 5.22 24 5.05 5 5.00
14 4.96 25 5.30 6 4.92
15 5.05 26 5.24 7 5.16
16 5.02 27 5.00 8 5.14
17 5.14 28 5.08 9 5.02
18 5.00 29 5.04 10 5.14


20 5.07 May 1 4.97 12 5.02
21 4.83 2 4.86 13 4.97
22 5.11 3 5.07 14 4.96
23 4.99 4 4.90 15 5.26
24 4.98 5 5.22 16 5.11
25 5.26 6 5.07 17 5.15


27 4.88 8 5.31 19 4.98
28 5.01 9 5.05 20 5.15
29 4.98 10 5.16 21 5.00
30 5.21 11 5.02 22 5.14
31 5.15 12 5.18 23 4.98
Apr. 1 5.00 13 4.90 24 5.03


3 5.00 15 5.20 26 5.01
4 5.10 16 5.08 27 4.97
5 5.03 17 5.19 28 5.12
6 4.97 18 5.16 29 4.98
7 4.89 19 4.88
8 5.12 20 4.99


10 5.27 22 4.92
11 5.09 23 5.17
12 5.13 24 5.01
13 4.93 25 5.02
14 4.93 26 5.06
15 5.04 27 5.03



Table 2. Frequency Table and Tally Sheet for the Data
in Table 1


Ni Conc., Tally Frequency Ni Conc., Tally Frequency
% (y) Marks (f) % (y) Marks (f)


4.79 X 1 5.05 XXX 3
4.80 5.06 XX 2
4.81 5.07 XXX 3
4.82 5.08 XXXX 4
4.83 X 1 5.09 X 1
4.84 5.10 XX 2
4.85 5.11 XXX 3
4.86 X 1 5.12 XX 2
4.87 5.13 X 1
4.88 XX 2 5.14 XXXX 4
4.89 X 1 5.15 XXX 3
4.90 XX 2 5.16 XXX 3
4.91 5.17 XX 2
4.92 XXX 3 5.18 X 1
4.93 XX 2 5.19 X 1
4.94 X 1 5.20 XX 2
4.95 XXXX 4 5.21 X 1
4.96 XXX 3 5.22 XX 2
4.97 XXXX 4 5.23
4.98 XXXXX 5 5.24 X 1
4.99 XX 2 5.25
5.00 XXXXXX 6 5.26 XX 2
5.01 XXX 3 5.27 X 1
5.02 XXXXXX 6 5.28
5.03 XXXXX 5 5.29
5.04 XX 2 5.30 X 1
5.31 X 1



Answer:

a) (If available, consult file of graphs and charts that could not be
computerized.)


Cell Boundaries   Cell Midpoint    f   Cum f   Rel Cum f
---------------   -------------   ---  -----   ---------
4.775 - 4.825         4.80          1      1     0.01
4.825 - 4.875         4.85          2      3     0.03
4.875 - 4.925         4.90          8     11     0.11
4.925 - 4.975         4.95         14     25     0.25
4.975 - 5.025         5.00         22     47     0.47
5.025 - 5.075         5.05         15     62     0.62
5.075 - 5.125         5.10         12     74     0.74
5.125 - 5.175         5.15         13     87     0.87
5.175 - 5.225         5.20          7     94     0.94
5.225 - 5.275         5.25          4     98     0.98
5.275 - 5.325         5.30          2    100     1.00
                                  ---
                                  100


b) ungrouped YBAR = 504.99/100 = 5.0499 == 5.05


ungrouped S(Y) = SQRT[(2551.3039 - 2550.1490)/99]
= SQRT(0.01166)
= 0.108 == 0.11


Grouped and coded by: Y = 0.05d + 5.05


Cell
Midpoint d f f*d f(d**2)
4.80 -5 1 -5 25
4.85 -4 2 -8 32
4.90 -3 8 -24 72
4.95 -2 14 -28 56
5.00 -1 22 -22 22
5.05 0 15 0 0
5.10 +1 12 +12 12
5.15 +2 13 +26 52
5.20 +3 7 +21 63
5.25 +4 4 +16 64
5.30 +5 2 +10 50
___ ___
sum(f*d) = -2 sum(f*(d**2)) = 448


dBAR = (sum(fd))/n = -2/100 = -.02


YBAR = (0.05)(-.02) + 5.05 = 5.049 == 5.05


S(d) = SQRT[(448 - ((-2)**2)/100)/99] = SQRT(4.525) = 2.127


S(Y) = (2.127)(0.05) = 0.106 == 0.11
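
The grouped, coded calculation can be reproduced with the sketch below. The cell frequencies and the coding Y = 0.05d + 5.05 are taken from the answer; the variable names are illustrative.

```python
import math

# Cell frequencies for midpoints 4.80, 4.85, ..., 5.30 (from the grouped tally).
freqs = [1, 2, 8, 14, 22, 15, 12, 13, 7, 4, 2]

# Coding used in the answer: Y = WIDTH * d + ORIGIN, with d = -5, ..., +5.
ORIGIN, WIDTH = 5.05, 0.05
codes = list(range(-5, 6))

n = sum(freqs)
sum_fd = sum(f * d for f, d in zip(freqs, codes))
sum_fd2 = sum(f * d * d for f, d in zip(freqs, codes))

# Mean and standard deviation in coded units (n - 1 divisor).
d_bar = sum_fd / n
s_d = math.sqrt((sum_fd2 - sum_fd ** 2 / n) / (n - 1))

# Decode back to % Ni.
y_bar = WIDTH * d_bar + ORIGIN
s_y = WIDTH * s_d

print(n, sum_fd, sum_fd2, round(y_bar, 3), round(s_d, 3), round(s_y, 3))
```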


c) 5.00 or 5.02 - not meaningful because no single value occurs
with sufficient frequency.


d) Median is average of 50th and 51st observations -
(5.03 + 5.03)/2 = 5.03


e) S(YBAR) = S(Y)/SQRT(n) = 0.108/SQRT(100) = 0.0108 == 0.011


f) Estimates graphically should compare closely.


(If available, consult file of graphs and charts that could not be
computerized.)


Actual percentage outside = 11%.
Graphical estimate should be within about 2% of this.


117.


A coffee dispensing machine provides servings that have a population
mean of 6 ounces and a population standard deviation of .3 ounces.
If the difference is measured between randomly chosen cups (e.g.
the 7th minus the 15th, the 22nd minus the 29th, etc.), the
distribution of differences will have a mean of ______ and a
standard deviation of ______.



Answer:

a. MU = 0
b. SIGMA = SQRT(.09/1 + .09/1) = .424


118.


The administrator of a loan program for small farmers (five foot and
under) institutes a new objective scale by which his field
investigators are asked to rate small farms on their profit
potential. He suspects that two of his investigators are applying
the standard quite differently, which offends his sense of order. To
check on them, he asks both of them to rate 12 randomly chosen farms.
The results:


FARM # 1 2 3 4 5 6 7 8 9 10 11 12


A RATING 90 80 75 80 60 80 55 40 80 65 70 60
B RATING 65 50 50 65 40 50 55 45 55 55 45 45


a) Use an appropriate statistical test to see whether this is strong
enough evidence to reject the hypothesis that they rate in the
same way.


b) Make a scatter diagram of the same data. Fit a straight line to
the set of points by eye. Estimate the equation of this line using
your graph.


c) How could the information in (a) and (b) TAKEN TOGETHER be useful
to the administrator?



Answer:

a) The appropriate test in this case appears to be the paired (re-
lated samples) t-test.


H(O): MU(Y) - MU(X) = 0
H(A): MU(Y) - MU(X) =/= 0


Calculations:


 Y  |  X  | D = Y - X | d = (D - DBAR) |  d**2
--------------------------------------------------------
 90 |  65 |    25     |      7.08      |   50.17
 80 |  50 |    30     |     12.08      |  146.01
 75 |  50 |    25     |      7.08      |   50.17
 80 |  65 |    15     |     -2.92      |    8.51
 60 |  40 |    20     |      2.08      |    4.34
 80 |  50 |    30     |     12.08      |  146.01
 55 |  55 |     0     |    -17.92      |  321.01
 40 |  45 |    -5     |    -22.92      |  525.17
 80 |  55 |    25     |      7.08      |   50.17
 65 |  55 |    10     |     -7.92      |   62.67
 70 |  45 |    25     |      7.08      |   50.17
 60 |  45 |    15     |     -2.92      |    8.51
--------------------------------------------------------
835 | 620 |   215     |      0.00      | 1422.91


YBAR = 69.58
XBAR = 51.67
DBAR = 17.92


S(D) = SQRT(1422.91/11) = 11.37
S(DBAR) = S(D)/SQRT(12) = 3.28


t(calc) = 17.92/3.28
= 5.457


t(crit, ALPHA=.05, df=11, two-tailed) = +/- 2.201


Since t(calc) > +t(crit), reject H(O). Thus, the evidence
is strong enough that the hypothesis that they rate in the
same way can be rejected.


b) [Scatter diagram: A-RATING (Y, vertical axis, 40 to 90) plotted
against B-RATING (X, horizontal axis, 40 to 80). A "*" marks one
data point and a "2" marks two data points located at the same
position.]


NOTE: for a line possibly fit by eye, draw a line through the points
marked A and B on the diagram.


The equation associated with the above line fitted by eye is:
YHAT = 20 + (1.0*X)


The estimated equation found by the least squares method is:
YHAT = 14.69 + (1.063*X)
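
The least squares line quoted above can be reproduced from the ratings with a few lines of Python. This is a sketch using the usual formulas slope = Sxy/Sxx and intercept = YBAR - slope*XBAR; the function name is illustrative.

```python
# Ratings copied from the question: A is treated as Y, B as X.
a_rating = [90, 80, 75, 80, 60, 80, 55, 40, 80, 65, 70, 60]
b_rating = [65, 50, 50, 65, 40, 50, 55, 45, 55, 55, 45, 45]


def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = least_squares_line(b_rating, a_rating)
print(round(intercept, 2), round(slope, 3))
```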


c) The information in part (a) does imply that the investigators do
apply the standards quite differently. However, using the infor-
mation in part (b), the administrator can estimate one of the
ratings given the other.


119.


We want to compare two machines for production line speed of beer
bottle manufacturing. At the end of each of n(1) = 9 days, the
number of bottles produced by machine(1) is recorded, yielding
XBAR(1) = 19, S(1)**2 = 4. For machine(2), we have n(2) = 6 days and
XBAR(2) = 17, S(2)**2 = 9. Assume independence of samples, normality,
and equality of unknown variances.


To test H(0): MU(1) = MU(2) vs. H(1): MU(1) =/= MU(2) at ALPHA=.05,
we need a table value equal to:


a. 1.771
b. 1.960
c. 2.160
d. 2.131
e. 2.145



Answer:

c. 2.160


t(df=n(1)+n(2)-2=13, ALPHA=.05, Two-tailed) = 2.160


120.


Four replicate analyses of each of two ends of a special metal rod were
made. All eight analyses were made in random order. Results for copper
analyses on end A were: 4.02, 4.04, 4.08, and 4.05. On end B, they
were: 4.08, 4.06, 4.12, and 4.10. At the 95% probability level, can we
reject the hypothesis of no difference in copper content for the two
ends? At the 99% level?



Answer:

Mean for end A = (4.02 + 4.04 + 4.08 + 4.05)/4 = 4.0475
Variance for end A = .000625
Mean for end B = (4.08 + 4.06 + 4.12 + 4.1)/4 = 4.09
Variance for end B = .000667


S(P)**2 = [(n(1) - 1)(S(1)**2) + (n(2) - 1)(S(2)**2)]/(n(1) + n(2) - 2)
= (3*.000625 + 3*.000667)/6 = .000646


S(ABAR - BBAR) = SQRT(.000646/4 + .000646/4) = .0180


H(0): MU(A) - MU(B) = 0


t = (-.0425 - 0)/.0180 = -2.36


t(ALPHA=.05, df=6) = 2.447 Do not reject the null hypothesis.
t(ALPHA=.01, df=6) = 3.707 Do not reject the null hypothesis.


121.


A biologist is working with dangerous chemical residues found in wild
skunks. Of particular interest is the possible relationship between
the % Mercury accumulation in the liver and the % Tellurium
accumulation in the lungs. He purchases seven "chemically clean"
skunks, and subjects them to a diet containing Tellurium and Mercury.
The amounts absorbed by an animal will, of course, vary from animal
to animal. The results were:


% Mercury (X) % Tellurium (Y)
------------- --------------
3 3
5 4
2 2
4 5
6 2
1 0
7 1


Possible useful summaries:


SUM(X) = 28.00000 SUM(Y) = 17.00000
SUM(X**2) = 140.00000 SUM(Y**2) = 59.00000
SUM([X-XBAR]**2) = 28.00000 SUM([Y-YBAR]**2) = 17.71429
SUM(X*Y) = 72.00000 SUM([X-XBAR]*[Y-YBAR]) = 4.00000


At the 5% level of significance, do you think there is a linear
relationship between % Mercury accumulation and % Tellurium accumulation?



Answer:

Using Pearson's Product Moment correlation coefficient as an indicator
of the strength of a linear relationship:


r = [SUM([X-XBAR]*[Y-YBAR])]/
[SQRT([SUM([Y-YBAR]**2)]*[SUM([X-XBAR]**2)])]
= [4.000]/[SQRT([17.71429]*[28.00000])]
= 0.1796


To see if this is suspiciously large we may refer to special tables or
use the approximate t-test.


H(O): Correlation is zero.
H(A): Correlation is other than zero.


i.e.
H(O): RHO = 0
H(A): RHO =/= 0


t(calc.) = [r]/[SQRT([1-(r**2)]/[n-2])]
= [0.1796]/[SQRT([1-(0.1796**2)]/[5])]
= 0.4082 with 5 df


t(crit., df=5, ALPHA=.05, two-tailed) = +/- 2.571


Since -t(crit.) < t(calc.) < +t(crit.) continue (do not reject)
H(O). It seems there is no linear relationship between Mercury and
Tellurium accumulations.
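
The correlation coefficient and its approximate t statistic can be checked with the following Python sketch. The data are copied from the question; the function name `pearson_r` is an illustrative choice.

```python
import math

# Data copied from the question.
mercury = [3, 5, 2, 4, 6, 1, 7]     # X
tellurium = [3, 4, 2, 5, 2, 0, 1]   # Y


def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


n = len(mercury)
r = pearson_r(mercury, tellurium)

# Approximate t statistic for H(O): RHO = 0, with n - 2 df.
t_calc = r / math.sqrt((1 - r ** 2) / (n - 2))

print(round(r, 4), round(t_calc, 4))
```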


However, a plot of the data reveals:


[Scatter plot of % Tellurium (Y, vertical axis, 0 to 5) against
% Mercury (X, horizontal axis, 1 to 8); each * marks one data point
from the table in the question.]


It is obvious that Mercury and Tellurium are related by a curve having
a maximum Y when X=4. Why didn't the t-test reveal this? It is because
the test only looks at linear correlation. These two variables are
related, but not in a linear manner.


122.


We want to know which of two types of filters should be used over an
oscilloscope to help the operator pick out the image on the cathode ray
tube. A test was designed in which the strength of a signal could be
varied from zero up to the point where the operator first detects the
image. At this point, the intensity setting was read. The lower the
reading when the image was first detected, the better the filter is.
Because people vary in their ability to detect the image, twenty opera-
tors were selected and each one made one reading for each filter. From
the results which are tabulated below, test the null hypothesis of no
detectable difference in the filters. If they do differ at some ALPHA
level of less than .10, tell which is best.


Operator F1 F2 Operator F1 F2 Operator F1 F2


1 96 92 8 91 90 15 90 89
2 83 84 9 100 93 16 92 90
3 97 92 10 92 90 17 91 90
4 93 90 11 88 88 18 78 80
5 99 93 12 89 89 19 77 80
6 95 91 13 85 86 20 93 90
7 97 92 14 94 91



Answer:

DBAR = SUM(D)/N = 40/20 =2


F1 F2 D=F1-F2 D-DBAR (D-DBAR)**2


96 92 4 2 4
83 84 -1 -3 9
97 92 5 3 9
93 90 3 1 1
99 93 6 4 16
95 91 4 2 4
97 92 5 3 9
91 90 1 -1 1
100 93 7 5 25
92 90 2 0 0
88 88 0 -2 4
89 89 0 -2 4
85 86 -1 -3 9
94 91 3 1 1
90 89 1 -1 1
92 90 2 0 0
91 90 1 -1 1
78 80 -2 -4 16
77 80 -3 -5 25
93 90 3 1 1
-- ---
40 140


S(D) = SQRT[SUM((D-DBAR)**2)/(N-1)] = SQRT(140/19) = 2.7145


Using a paired t test:


H(O): MU(F1 - F2) = 0
H(A): MU(F1 - F2) =/= 0


t(calculated) = (DBAR - MU(F1-F2))/(S(D)/SQRT(N))
= (2 - 0)/(2.7145/SQRT(20))
= 3.30


t(ALPHA = .01, df=19) = 2.86
and
t(ALPHA = .001, df=19) = 3.88


At ALPHA = .01, t(calculated) > t(critical), so you would reject H(O).
At ALPHA = .001, t(calculated) < t(critical), so you would continue
H(O).


Since F1BAR = 91 and F2BAR = 89, and a lower reading indicates a better
filter, F2 should be considered the better of the two.


123.


Two methods were used in a study of the latent heat of fusion of ice.
Both method A (an electrical method) and method B (a method of mixtures)
were conducted with the specimens cooled to -0.72 degrees C. The data
represent the change in total heat from -0.72 degrees C to water at
0 degrees C, in calories per gram of mass.


METHOD A METHOD B
79.98 80.02
80.04 79.94
80.02 79.98
80.04 79.97
80.03 79.97
80.03 80.03
80.04 79.95
79.97 79.97
80.05
80.03
80.02
80.00
80.02


Is there any difference in the 2 methods at the 5% probability level?



Answer:

H(0): MU(A) = MU(B)
H(A): MU(A) =/= MU(B)


YBAR(A) = 80.02 YBAR(B) = 79.98
S(A)**2 = 0.000574 S(B)**2 = 0.000984
n = 13 n = 8


F = 984/574 = 1.71 F(ALPHA=.05, df=7,12) = 2.92


Since F(calculated) < F(critical), we can assume at the .95 level that
there is no difference in the standard deviations. Therefore, it is
acceptable to pool.


S(P) = SQRT[((12*.000574) + (7*.000984))/(12+7)] = 0.0269


t = [((80.02-79.98)-0)/(0.0269*SQRT(1/13+1/8))]
= .04/(0.0269*.45) = 3.30


t(ALPHA=.05, df=19, two-tailed) = 2.093


Since t(calculated) exceeds t(critical), reject H(0), i.e. the 2
methods are different.


124.


Propose a model, if any, relating steam use to the following
variables, and justify your proposal.
Steam Use Production Wind Days Worked Days Below 32 degrees Temperature
1. 10.98 .61 7.4 20 22 35.3
2. 11.13 .64 8.0 20 25 29.7
3. 12.51 .78 7.4 23 17 30.8
4. 8.40 .49 7.5 20 22 58.8
5. 9.27 .84 5.5 21 0 61.4
6. 8.73 .74 8.9 22 0 71.3
7. 6.36 .42 4.1 11 0 74.4
8. 8.50 .87 4.1 23 0 76.7
9. 7.82 .75 4.1 21 0 70.7
10. 9.14 .76 4.5 20 0 57.5
Values are on a monthly basis for a manufacturing firm. Wind
and temperature entries are monthly means. Days worked and Days
Below 32 degrees are number of days in a month.



Answer:

The model proposed here is:
Y(Steam use) = B(0) + B(1)*X(2)(prod) + B(2)*X(6)(temp) + E
This description based on temperature (X(6)) and production
(X(2)) accounts for around 96% of the variation in steam use. The
fitted equation is:
YHAT = 11.31 + 4.56*X(2) - .09*X(6).
The tests of individual b values and F test of regression mean
square are significant. The residual patterns seem to
be acceptable.
Since Days below 32 degrees and Days worked are highly correlated
with Temperature and Production (respectively) we suspect
that they will not add much to the regression.


Model with Temperature and Production
             df      SS     M.sq.
Regression    2    27.90    13.95
Error         7     1.21      .17
R**2 = .958


Model with all five variables:
             df      SS     M.sq.
Regression    5    28.42     5.68
Error         4      .69      .17
R**2 = .976
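

One way to make the suspicion above concrete is an extra-sum-of-squares F test comparing the two tables. This test is not part of the original answer; the sketch below (Python, with the hypothetical helper partial_f) only rearranges the sums of squares already reported. The resulting F of about 1.0 is far below the tabled F(ALPHA = .05, df = 3,4) of about 6.59, which agrees with the statement that the three extra variables add little.

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic for the terms added in going from the reduced to the full model."""
    extra_ss = sse_reduced - sse_full   # reduction in error SS due to added terms
    extra_df = df_reduced - df_full     # number of added terms
    return (extra_ss / extra_df) / (sse_full / df_full)


# Error lines of the two ANOVA tables above.
f_extra = partial_f(sse_reduced=1.21, df_reduced=7, sse_full=0.69, df_full=4)
print(f"F for the three added variables = {f_extra:.2f}")
```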


125.



August Yield    Average June Minimum Temperature    June Rainfall
     Y                       X(1)                       X(2)


    13.1                     50.4                        3.1
    14.1                     51.0                        5.0
    15.7                     49.1                        6.7
    14.3                     51.2                        5.2
    15.2                     48.1                        6.9
    16.7                     48.0                        7.8
    13.8                     51.0                        5.6
    12.4                     49.6                        4.0
    11.5                     53.1                        3.7
    15.3                     48.2                        6.5
    14.4                     52.2                        4.8
    13.3                     50.5                        4.3
    12.5                     54.2                        1.9
    12.7                     50.1                        5.6
    16.5                     49.9                        6.8


The above data was studied with the aid of a computer. It is data
typical of actual corn yield information recorded in Oklahoma. The
correlations were as follows:


yield vs. temperature : -.657
yield vs. rainfall : .846
temperature vs. rainfall: -.796


Now, although the correlation between yield and temperature is strong
and negative, the least squares equation given by the computer print-
out was:


Y = 7.79 + .0379X(1) + .847X(2).


Is the sign of the temperature variable X(1) consistent with the nega-
tive correlation coefficient? Explain.



Answer:

The printout is correct. The correlation need not have the same sign
as the coefficient of the variable in the least squares fit. In the
presence of X(2), the effect of X(1) need not be the same as the effect
of X(1) alone. This makes sense in this experiment. Corn needs
moisture, and rain is usually accompanied by cool weather, but the best
combination for corn yield is warm and wet weather.
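

The sign change can be seen from the three correlations alone. With two predictors, the standardized partial regression coefficient of X(1) is (r(Y,1) - r(Y,2)*r(1,2))/(1 - r(1,2)**2). The sketch below (Python; the function name standardized_betas is hypothetical) evaluates it for the correlations quoted in the question and gives a small positive value, consistent with the positive coefficient in the printout.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for a two-predictor least squares fit."""
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Correlations from the question: yield/temperature, yield/rainfall,
# temperature/rainfall.
beta1, beta2 = standardized_betas(-0.657, 0.846, -0.796)
print(f"temperature: {beta1:+.3f}   rainfall: {beta2:+.3f}")
```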


126.


Listed below is some fictitious data concerning
1. Amount of oil needed to fill the tank for the heating plant of a
building.
2. Days elapsed since last oil delivery.
3. Average outside temperature (Fahrenheit) during the time since last
delivery


Oil    Days since last delivery    Average Temperature (F)
 60               16                        10.1
 41               17                        22
 50               16                        15.4
 29               13                        18
 81               19                         7.3
 74               26                        28
 57               25                        34


a. Write and fit 2 models
1. One relating oil consumption to average temperature alone.
2. One relating oil consumption to both temp. and days since last
delivery.
b. Compare the estimated regression coefficients for temperature from
these two models. Why do you think there is this difference (or
lack of difference)?
c. Compare the estimated intercepts for the two models. Does either
of these values seem reasonable? Why?



Answer:

a. 1. Y = b(0) + b(1)*X(3) + e
Y = 61.216 - .27084*X(3)
2. Y = b(0) + b(2)*X(2) + b(3)*X(3) + e
Y = .42296 + 5.0193*X(2) -2.0289*X(3)


b. Regression coefficients for temp.
Model 1: -.27084 Model 2: -2.0289
The coefficients for average temperature are different because the
variable "Days since last delivery" provides important information
for explaining variation in oil use. When average temperature is
fitted without using information about time since last delivery, all
times are treated as if they were the same, so the coefficient for
temperature also absorbs the effect of elapsed time (in these data
the longest intervals fall in the warmest periods). When variation
in time since last delivery is taken into account, the coefficient
for temperature measures the effect of temperature with time since
last delivery held fixed.


c. Intercepts Model 1: 61.2 Model 2: .42296
R-Square Model 1: .02 Model 2: .99


I would go with Model 2 because Model 1 does not account for much
of the variation, and the intercept for Model 2 does make more
sense. If it has been zero days since the last delivery, they
should not need much oil, even if it has been very cold (0 degrees)
that day.
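

Both fits can be reproduced with a few lines of code. The following is a minimal sketch in Python using only the standard library; the helper names cross, fit_one, and fit_two are hypothetical, and the two-predictor fit solves the normal equations in deviation form rather than calling a regression package.

```python
from statistics import mean

# Data from the problem statement.
oil = [60, 41, 50, 29, 81, 74, 57]           # Y
days = [16, 17, 16, 13, 19, 26, 25]          # X(2)
temp = [10.1, 22, 15.4, 18, 7.3, 28, 34]     # X(3)


def cross(u, v):
    """Sum of cross products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def fit_one(y, x):
    """Least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    b1 = cross(x, y) / cross(x, x)
    return mean(y) - b1 * mean(x), b1


def fit_two(y, x1, x2):
    """Least squares fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return mean(y) - b1 * mean(x1) - b2 * mean(x2), b1, b2


a1, b_temp_alone = fit_one(oil, temp)
a2, b_days, b_temp = fit_two(oil, days, temp)

print(f"Model 1: Y = {a1:.3f} + ({b_temp_alone:.5f})*X(3)")
print(f"Model 2: Y = {a2:.5f} + ({b_days:.4f})*X(2) + ({b_temp:.4f})*X(3)")
```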


127.


It is known that long, thin titanium curtain rods lengthen with
increasing temperature. A sample of n = 20 identical titanium rods
is selected. Each is subjected to a particular uniform temperature
X for a specified time. Let Y denote the change in length.
The readings are (X(1), Y(1)), ..., (X(20), Y(20)), with data XBAR
= 2 (in hundreds of degrees Fahrenheit), YBAR = 3 (in milli-inches),
SUM((X - XBAR)**2) = 10, SUM((Y - YBAR)**2) = 40, and SUM[(X - XBAR)
(Y - YBAR)] = 16.


The least squares regression line of the form Y = a + bX has values
a = __________and b = __________respectively.


(a) 4/5, 7/5 (d) 7/4, -1/2
(b) 3/4, 3/2 (e) 1/5, 13/5
(c) 8/5, -1/5



Answer:

(c) 8/5, -1/5 (that is, b = 8/5 and a = -1/5; the choice lists the
slope first)


a = YBAR - b(XBAR)


b = SUM(i = 1, n)[(X(i) - XBAR)(Y(i) - YBAR)]/
SUM(i = 1, n)[(X(i) - XBAR)**2]


b = 16/10 = 8/5


a = 3 - (8/5)*2 = -1/5


128.


It is known that long, thin titanium curtain rods lengthen with in-
creasing temperature. A sample of n = 20 identical titanium rods is
selected. Each is subjected to a particular uniform temperature X for
a specified time. Let Y denote the change in length. The readings are
(X(1), Y(1)), ..., (X(20), Y(20)), with data XBAR = 2 (in hundreds of
degrees F), YBAR = 3 (in milli-inches), SUM((X - XBAR)**2) = 10, SUM
((Y - YBAR)**2) = 40, and SUM[(X - XBAR)(Y - YBAR)] = 16.


When the temperature is set at 400 degrees (i.e., X = 4), then the
predicted value of the lengthening of the rod is closest to ______
milli-inches.


(a) 24/5 (d) 3/5
(b) 17/5 (e) 31/5
(c) 5



Answer:

(e) 31/5


b = SUM(i = 1, n)[(X(i) - XBAR)(Y(i) - YBAR)]/
SUM(i = 1, n)[(X(i) - XBAR)**2]


b = 16/10 = 8/5


a = YBAR - b(XBAR)


a = 3 - 8/5*2 = -1/5


Hence, YHAT = a + bX
= -1/5 + (8/5)*X


Now X = 4


Hence, YHAT = -1/5 + (8/5)*4 = 31/5
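

Since the answer choices are fractions, the arithmetic can be checked exactly. The sketch below uses Python's fractions module; the function name line_from_summaries is hypothetical, and the inputs are the summary statistics given in the question.

```python
from fractions import Fraction


def line_from_summaries(xbar, ybar, sxx, sxy):
    """Return (a, b) for the least squares line Y = a + bX from summary sums."""
    b = Fraction(sxy, sxx)   # slope = SUM[(X - XBAR)(Y - YBAR)] / SUM[(X - XBAR)**2]
    a = ybar - b * xbar      # intercept = YBAR - b*XBAR
    return a, b


a, b = line_from_summaries(xbar=Fraction(2), ybar=Fraction(3), sxx=10, sxy=16)
y_hat = a + b * 4            # prediction at X = 4 (400 degrees)

print(a, b, y_hat)           # prints: -1/5 8/5 31/5
```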


129.


The following data were obtained in a study of road width and the number
of accidents occurring per hundred million vehicle miles.


Width Number of Accidents
73 42
50 83
62 58
30 93
25 90


The Department of Transportation wishes to use width to predict number
of accidents. Determine an equation which will enable them to do this.
Can the department significantly improve its prediction of number of
accidents by using the data on width, over what the prediction would
be not using the width data? (HINT: do a hypothesis test.) (Get the
appropriate formulas set up, data inserted, then approximate.)



Answer:

X = Width Y = Number of Accidents


SUM(X) = 240 SUM(Y) = 366
SUM(X**2) = 13198 SUM(Y**2) = 28766
SUM(X*Y) = 15852


b = (15852 - (240*366)/5)/(13198 - (240**2)/5)
= (-1716)/1678
= -1.02


a = 366/5 - ((-1.02)*240/5)
= 122.16


Regression equation: YHAT = 122.16 - 1.02*X


H(0): BETA = 0 (width does not help predict number of accidents)


Source        df       SS        MSQ        F
------------------------------------------------
Regression     1    1754.86    1754.86    23.93
Deviations     3     219.94      73.31
Total          4    1974.8


Critical Region (ALPHA = .05, df = 1,3): F > 10.1


Reject H(0) and conclude that width data can significantly
improve the prediction of accidents.
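

The regression line and the F ratio can be reproduced directly from the five data pairs. The following is a minimal sketch in Python (the function name regression_anova is hypothetical). It carries the slope unrounded, so the intercept comes out near 122.29 instead of the 122.16 obtained above with b rounded to -1.02; the F ratio agrees with the table.

```python
width = [73, 50, 62, 30, 25]
accidents = [42, 83, 58, 93, 90]


def regression_anova(x, y):
    """Return (a, b, SS regression, SS deviations, F) for the line y = a + b*x."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    syy = sum(v * v for v in y) - sum(y) ** 2 / n
    sxy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
    b = sxy / sxx
    a = sum(y) / n - b * sum(x) / n
    ss_reg = sxy ** 2 / sxx
    ss_dev = syy - ss_reg
    f = ss_reg / (ss_dev / (n - 2))   # 1 df for regression, n - 2 for deviations
    return a, b, ss_reg, ss_dev, f


a, b, ss_reg, ss_dev, f = regression_anova(width, accidents)
F_CRIT = 10.1  # tabled F(ALPHA=.05, df=1,3), as quoted above

print(f"YHAT = {a:.2f} + ({b:.4f})*X")
print(f"SS regression = {ss_reg:.2f}, SS deviations = {ss_dev:.2f}, F = {f:.2f}")
print("reject H(0)" if f > F_CRIT else "fail to reject H(0)")
```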


130.


Find the regression line of the stopping distance Y on the speed X
of cars based on the following data:


X = speed (mph) 20 30 40 50
Y = stopping distance (ft) 50 90 150 210



Answer:

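XBAR = 35, YBAR = 125
SUM((X - XBAR)(Y - YBAR)) = 2700, SUM((X - XBAR)**2) = 500
b = 2700/500 = 5.4
a = 125 - (5.4*35) = -64


Y = -64 + (5.4*X)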


131.


The following data are the result of a thermodynamic experiment:


X surface area -2 -1 3 5
Y heat loss 0 6 9 10


a. Find the least squares line to fit these data, and make a sketch
of the points and the line.


b. Estimate the loss of heat for a surface area of 4.0.


c. Would you feel safe in using this line to estimate heat loss for
a surface area of 10? Explain.



Answer:

a. Computations:


   X   ^   Y   ^   X**2   ^   Y**2   ^   X*Y
-------^-------^----------^----------^----------
  -2   ^   0   ^     4    ^     0    ^     0
  -1   ^   6   ^     1    ^    36    ^    -6
   3   ^   9   ^     9    ^    81    ^    27
   5   ^  10   ^    25    ^   100    ^    50
-------^-------^----------^----------^----------
   5   ^  25   ^    39    ^   217    ^    71


XBAR = 5/4 = 1.25


YBAR = 25/4 = 6.25


bHAT = [(4*71)-(5*25)]/[(4*39)-(5**2)]
= 1.214


aHAT = 6.25 - (1.214*1.25)
= 4.733


                            Y ^
                           10 +-----^-----^-----^-----^-----*
                            9 +-----^-----^-----*-----^-----^
                            8 +-----^-----^-----^-----^-----^
                            7 +-----^-----^-----^-----^-----^
                        *   6 +-----B-----^-----^-----^-----^
Heat Loss                   5 +-----^-----^-----^-----^-----^
                            4 +-----^-----^-----^-----^-----^
                            3 +-----^-----^-----^-----^-----^
                            2 +-----^-----^-----^-----^-----^
            A               1 +-----^-----^-----^-----^-----^
          <-+-----*-----+-----+-----+-----+-----+-----+-----+------> X
           -3    -2    -1     0     1     2     3     4     5
                           Surface Area

(Note: draw a line through points A and B for an approximation
to the least squares line.)


b. YHAT = 4.733 + (X*1.214)
= 4.733 + (4.0*1.214)
= 9.589


c. No. Since the observed values for surface area range from -2 to
5, I would feel very apprehensive about extrapolating to a surface
area of 10.


132.


An engineer is interested in the flow rate of a river (volume/min.)
at a downstream location, D. He has a poor set of records for this
location, but an extensive set of records for an upstream location
U. He would like to find a way to estimate flow rates at D corres-
ponding to various flow rates at U.


a) If we use a regression model for a straight line, what are the
usual symbols and assumptions that apply to measurements taken
at U and D?


b) If there were no streams of any consequence entering the river
between U and D, how would our choice of model be influenced
and what parameters are to be estimated for a straight line re-
lation?


c) If a number of major streams entered the river between U and D,
how would our choice of model be influenced and what parameters
are to be estimated for a straight line relation?


d) Suppose that there are several streams entering the river between
U and D. Estimate the slope of a straight line linking these
measurements.


Flow Rate at U Flow Rate at D
-------------- --------------


1 3
2 7
3 9
4 9
5 12



Answer:

a) The measurements at U are values for the independent variable
usually represented by X and are assumed to be measured with
negligible error. The measurements at D are values for the de-
pendent variable usually represented by Y such that each Y(i)
is assumed to be normally and independently distributed with mean
= ALPHA + BETA*X(i) and variance = SIGMA**2.


b) We might force the straight line through the origin (X = 0,
Y = 0). Parameters to be estimated would be BETA and SIGMA**2.


c) We should then be reluctant to claim that there were any values
of X where we did not have to estimate the corresponding value
of Y. We would use a model for a straight line through (XBAR,
YBAR) and estimate ALPHA, BETA and SIGMA**2.


d) For a straight line through (XBAR, YBAR), let x(j) = X(j) - XBAR
and y(j) = Y(j) - YBAR, where XBAR = 3 and YBAR = 8. Then


b = [SUM(x(j)*y(j))]/[SUM(x(j)**2)]


SUM(x(j)*y(j)) = 20
SUM(x(j)**2) = 10


BETA(HAT) = b = 2
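

The two models discussed in parts b) and c) lead to different slope estimates for the same data. The sketch below (Python; both function names are hypothetical) computes the slope for the line through (XBAR, YBAR), as in part d), and, for comparison only, the slope for a line forced through the origin as suggested in part b).

```python
flow_u = [1, 2, 3, 4, 5]    # X, flow rate at U
flow_d = [3, 7, 9, 9, 12]   # Y, flow rate at D


def slope_through_means(x, y):
    """Slope of the least squares line through (XBAR, YBAR)."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


def slope_through_origin(x, y):
    """Slope of the least squares line forced through (0, 0)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


b_means = slope_through_means(flow_u, flow_d)
b_origin = slope_through_origin(flow_u, flow_d)

print(f"through (XBAR, YBAR): b = {b_means:.3f}")
print(f"through the origin:   b = {b_origin:.3f}")
```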


133.


A few months ago, Road & Track magazine compared the performance of
about 25 sports cars with respect to attainable top speed and fuel eco-
nomy. Regressions were run to investigate how both top speed and fuel
economy were affected by the horsepower capability of the engine. The
findings are summarized below.


Where: M(i) = miles per gallon of the ith car.
S(i) = top speed in miles per hour of the ith car.
H(i) = horsepower rating of the ith car, measured as the actual
number of horses. All cars tested had between 50 and 300
horsepower engines.


Equation (1): MHAT(i) = 30 - .05*H(i) r**2 = .55
Equation (2): SHAT(i) = 60 + .20*H(i) r**2 = .72


(1) Interpret the regression coefficients (slope and intercept) of
Equation (1) precisely.


(2) What do the results suggest about the relative usefulness of horse-
power rating in predicting fuel economy on one hand and top speed
on the other hand?


(3) Why would you hesitate to use the regression results for predicting
the performance characteristics of cars which have less than a 50
horsepower engine?


(4) Based on the results of equations (1) and (2) (as well as on
intuition), one could say that there exists a "trade-off" between top
speed and fuel economy; i.e., in order to generate an improvement
in one, you must sacrifice some of the other. Compute the magnitude
of the trade-off: the number of miles per hour we would predict
would be sacrificed for each additional mile per gallon of
fuel economy.



Answer:

(1) Slope = -.05, indicating that predicted m.p.g. decreases by .05 for
each additional horsepower.


Intercept = 30, the predicted m.p.g. at zero horsepower. This lies
outside the range of the data, so its main use is as a bound:
since minimum horsepower was 50, all predicted values are at
most 27.5 m.p.g.


(2) Horsepower is more useful in predicting speed, (i.e. r**2 is
larger).


(3) No such cars were in the sample, therefore one cannot safely extra-
polate unless one has reason to believe (through other similar
experiments) that the relationship follows a similar linear pattern
below 50 h.p.


(4) From Equation (1), a change of 1 m.p.g. in economy corresponds to
20 horsepower (1/.05), and from Equation (2), 20 horsepower would
change speed by 4 m.p.h. (20*.20). Therefore, an increase of 1 m.p.g.
in fuel economy should be accompanied by a decrease of 4 m.p.h. in
the speed of the car.
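

The trade-off in (4) is the ratio of the two slopes. A minimal sketch in Python follows; the constant and function names are hypothetical, and the only inputs are the coefficients of Equations (1) and (2).

```python
MPG_INTERCEPT, MPG_PER_HP = 30.0, -0.05   # Equation (1)
MPH_INTERCEPT, MPH_PER_HP = 60.0, 0.20    # Equation (2)


def predicted_mpg(horsepower):
    """Predicted miles per gallon from Equation (1)."""
    return MPG_INTERCEPT + MPG_PER_HP * horsepower


def mph_change_per_mpg():
    """Predicted change in top speed per additional mile per gallon."""
    hp_per_mpg = 1 / MPG_PER_HP           # -20 horsepower per extra m.p.g.
    return MPH_PER_HP * hp_per_mpg


tradeoff = mph_change_per_mpg()
print(f"predicted m.p.g. at 50 horsepower: {predicted_mpg(50):.1f}")
print(f"change in top speed per extra m.p.g.: {tradeoff:.1f} m.p.h.")
```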


134.


Once upon a time an investigator was concerned about fuel consumption
and speed of travel of automobiles. He measured miles per gallon,
mpg, for static tests at 25 miles per hour (mph) and 55 miles per
hour for many brands of cars.


His report stated that a regression line had been fitted for each
brand and that a straight line describes the relation between
mpg and mph perfectly since r**2 = 100 for every brand.


Do you subscribe to the claim that a straight line perfectly
describes the relation of mpg to speeds between 25 and 55 mph?
Why?



Answer:

No, I don't, because he has fit regression lines with only two
observations, so naturally his r**2 equals 100 percent. You need at
least three observations to test for (or to observe) departures from
a straight line. Two points determine one and only one line.
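

This is easy to demonstrate numerically. In the sketch below (Python), the m.p.g. values are made up for illustration and are not from the investigator's report; whatever two distinct values are used, r**2 comes out as 1.

```python
def r_squared(x, y):
    """Squared sample correlation between x and y."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


speeds = [25, 55]
# Hypothetical m.p.g. readings at 25 and 55 mph for three brands.
brands = [[30.0, 22.0], [18.5, 19.0], [40.0, 12.0]]
results = [r_squared(speeds, mpg) for mpg in brands]

for mpg, r2 in zip(brands, results):
    print(mpg, f"r**2 = {r2:.4f}")
```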


135.


Sometimes a fitted regression equation will do a good job of explaining
how response varies with the independent variables measured and fail
miserably to agree with theory or previous observations. For example,
an equation relating yield for a crop to rate of Nitrogen (N) applica-
tion might fit well and indicate that increased N reduces yield. How
can there be such an inconsistency?



Answer:

A regression equation that fits observed responses well is concerned
with summarizing the data set at hand. If the message conveyed by that
equation doesn't fit with theory or previous experience, it may well be
that the data set at hand involves variable settings different from
those envisioned by theory or encountered in previous experience. Such
conflicts should not be dismissed quickly. They probably indicate that
response is being observed under "new" conditions that warrant detailed
comparison with those that have been considered previously. (These "new"
conditions may be due to variation in factors other than those
ordinarily thought to be important. For example, if we obtained data
relating the volume of an ideal gas to pressure, we would obtain
consistent results as long as all other factors were kept constant. But
the results that applied under constant conditions would fail if we
allowed, say, temperature to vary. These results would not fit previous
experience or theory. But, if temperature had been measured, these
results could be reconciled with previous experience by further
analysis.)


136.


It is known that long, thin titanium curtain rods lengthen with
increasing temperature. A sample of n = 20 identical titanium rods
is selected. Each is subjected to a particular uniform temperature
X for a specified time. Let Y denote the change in length
corresponding to X.


The readings are (X(1), Y(1)), ..., (X(20), Y(20)), with data XBAR =
2 (in hundreds of degrees Fahrenheit), YBAR = 3 (in milli-inches),
SUM((X - XBAR)**2) = 10, SUM((Y - YBAR)**2) = 40, SUM[(X - XBAR)(Y -
YBAR)] = 16.


The sample correlation coefficient r =


(a) 4/5 (d) 1/4
(b) 3/4 (e) 1/25
(c) 1/5



Answer:

(a) 4/5


r = SUM((X - XBAR)(Y - YBAR))/
SQRT(SUM((X - XBAR)**2)*SUM((Y - YBAR)**2))


= 16/SQRT(10*40) = 16/20 = 4/5
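

The same computation as a short Python sketch (the function name corr_from_sums is hypothetical); squaring the result also shows that the fitted line accounts for 64% of the variation in Y.

```python
from math import sqrt


def corr_from_sums(sxy, sxx, syy):
    """Sample correlation from sums of squares and cross products of deviations."""
    return sxy / sqrt(sxx * syy)


r = corr_from_sums(sxy=16, sxx=10, syy=40)
print(f"r = {r:.2f}, r**2 = {r * r:.2f}")
```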


137.


Briefly discuss your evaluation of the following statement.


"Since the linear correlation coefficient (r) between IQ and
earning potential is near 0 there is no relationship between
the two."



Answer:

The statement should probably be rewritten as:


"Since the linear correlation coefficient (r) between IQ and
earning potential is near 0 there is no linear relationship
between the two."


This would better emphasize that r measures only linear association:
there may be no linear relationship, yet the possibility remains
that a higher order (nonlinear) relationship exists.
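

A small constructed example makes the point. In the sketch below (Python), the data are invented for illustration: Y is exactly X**2, a perfect quadratic relationship, yet the linear correlation coefficient is zero because the X values are symmetric about their mean.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample linear correlation coefficient."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: Y depends on X exactly, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```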


138.


Thirty patients in a leprosorium were randomly selected to be treated
for several months with one of the following:


A - an antibiotic
B - a different antibiotic
C - an inert drug used as a control


At the end of the test period, laboratory tests were conducted to
provide a measure of abundance of leprosy bacilli in each patient.


Scores obtained were:


Patient    1    2    3    4    5    6    7    8    9   10
Drug A     6    0    2    8   11    4   13    1    8    0
Drug B     0    2    3    1   18    4   14    9    1    9
Drug C    13   10   18    5   23   12    5   16    1   20


a. Analyze and interpret the results of this trial (use ALPHA = .05).
b. Write a model appropriate to this trial. Define all terms
and estimate all parameters.
c. To be consistent with the model you have specified, what order
should have been followed in collecting samples from patients
and in carrying out lab tests?



Answer:

a. Using the program CARROT*** you get the following results:


Means for Antibiotics
Treatment Mean (leprosy bacilli)
Placebo 12.3
Drug A 5.3
Drug B 6.1


LSD (at ALPHA = .05) = 5.57113


LSD tests on means:


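H(0): MU(1) - MU(2) = 0
H(A): MU(1) - MU(2) =/= 0


(XBAR(1) - XBAR(2)) +/- LSD


Placebo and drug A
7.0 +/- 5.57
Interval is from +1.43 to +12.57


Placebo and drug B
6.2 +/- 5.57
Interval is from +.63 to +11.77


Drug B and drug A
0.8 +/- 5.57
Interval is from -4.77 to +6.37


The first two intervals do not contain zero, therefore, we reject
H(0) for each antibiotic versus the placebo. The third interval
contains zero, so we do not reject H(0) for drug A versus drug B.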


Since both antibiotic treatments are significantly different from
the placebo, one would conclude that they have a significant effect
on reducing leprosy bacilli. However, antibiotics A and B are not
significantly different from each other. The differences in their
means could be attributed to chance variation alone.


b. Model: Y(I,J) = MU + TAU(I) + EPSILON(I,J)


Response = overall mean + treatment effect + error
(error assumed to have a mean of zero and variance = SIGMA**2)


Estimates:  MU                  7.9
            TAU(1) drug A      -2.6
            TAU(2) drug B      -1.8
            TAU(3) Placebo      4.4
            SIGMA**2           36.9


c. Suppose that the 30 patients had been assigned identification
numbers 1, 2, ..., 30. Presumably, treatments were randomly
assigned to patients on the basis of these numbers. To be
consistent with this setup, samples should have been collected
and analyses run according to the identification numbers
(i.e., sample collected first and analysis run first from patient
1, and so on). This amounts to randomly assigning order of sample
collection and order of lab analysis as well as patients to
treatments. (It is preferable to collecting samples and running
analyses for all on Drug A, then all on Drug B, then all on
placebo.)
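

The estimates and the LSD quoted in parts a and b can be reproduced from the scores. The following is a minimal sketch in Python, not the CARROT program itself; the function name one_way_summary is hypothetical, and the critical value 2.052 is the tabled t for ALPHA = .05 (two-tailed) with 27 degrees of freedom, entered by hand.

```python
from math import sqrt

scores = {
    "Drug A": [6, 0, 2, 8, 11, 4, 13, 1, 8, 0],
    "Drug B": [0, 2, 3, 1, 18, 4, 14, 9, 1, 9],
    "Placebo": [13, 10, 18, 5, 23, 12, 5, 16, 1, 20],
}


def one_way_summary(groups):
    """Return (grand mean, group means, error mean square, error df)."""
    means = {name: sum(v) / len(v) for name, v in groups.items()}
    n_total = sum(len(v) for v in groups.values())
    grand = sum(sum(v) for v in groups.values()) / n_total
    sse = sum((y - means[name]) ** 2 for name, v in groups.items() for y in v)
    df_error = n_total - len(groups)
    return grand, means, sse / df_error, df_error


grand, means, mse, df_error = one_way_summary(scores)
effects = {name: m - grand for name, m in means.items()}   # TAU estimates

T_CRIT = 2.052                       # tabled t(.025, df=27), an input
lsd = T_CRIT * sqrt(2 * mse / 10)    # 10 patients per treatment

print(f"MU = {grand:.1f}, SIGMA**2 = {mse:.1f}, LSD = {lsd:.2f}")
for a, b in [("Placebo", "Drug A"), ("Placebo", "Drug B"), ("Drug B", "Drug A")]:
    diff = means[a] - means[b]
    verdict = "significant" if abs(diff) > lsd else "not significant"
    print(f"{a} - {b}: {diff:.1f} ({verdict})")
```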


139.


An imaginary investigation was conducted to compare three different
locations at which body temperature can be obtained. Over a two
month period, 20 occasions were randomly selected from a large set
of times and beds that could reasonably be used in a ward. On those
occasions, the temperature of the patient in the bed was taken at
all three locations where the order of measurement was randomized
independently with each patient. (This imaginary experiment
must have been conducted by extraordinarily persuasive people.)


Imaginary results were:


Location:         1         2         3


Patient  1      98.8      98.9      98.6
         2      98.2      98.2      98.0
         3      99.0      99.1      99.1
         4     100.6     100.6     100.5
         5      99.8      99.9      99.6
         6     101.3     101.5     101.2
         7     100.4     100.4     100.2
         8     101.6     101.7     101.6
         9      98.5      98.5      98.4
        10      98.1      98.2      97.9
        11      98.8      99.0      98.7
        12     100.9     100.9     100.7
        13      98.4      98.6      98.3
        14      99.0      99.1      98.8
        15     101.4     101.4     101.3
        16      99.8      99.9      99.6
        17      99.2      99.3      99.2
        18      98.7      98.9      98.6
        19      99.2      99.3      99.2
        20     101.5     101.6     101.5


a. Summarize the results of this investigation (use ALPHA = .05).
b. What was the estimated variance (S**2)? What was the standard
error of the difference between location means (S(dBAR))?
c. Set 95% confidence limits for the temperature difference between
location 1 and location 2.



Answer:

a. The analysis needed for this trial is for an RCB design with
patients corresponding to blocks. It is found that the three
treatments (locations) all yield responses (temperatures)
significantly different from one another.


The results were:


ANOVA


Source                df         SS          M.Sq.


Total                 60    595929.1280
Mean                   1    595847.2000
Corrected total       59        81.9219
Location               2         0.4063     0.20313
Person                19        81.3906     4.28372
Experimental Error    38         0.1250     0.00329


Treatment means


Location 2 99.75
Location 1 99.66
Location 3 99.55


LSD (t*S(dBAR)) for these means at the .05 significance level = .0366547


Location 2 - Location 1 = .09
Location 2 - Location 3 = .20
Location 1 - Location 3 = .11


All of the differences between treatment means are larger than
the least significant difference, therefore, I conclude that
all the treatments yield responses significantly different from
each other.


An F test of the blocking factor:


F(calc) = 4.28372/0.00329 = 1302
F(crit., df = 19,38; ALPHA = .05) = 1.85 (approximately)


indicates that the blocking factor was worthwhile.


b. The variance (S**2) is estimated by the mean square for experi-
mental error, which is 0.00329


The standard error of a difference between two means (S(dBAR)) is
computed using the formula:


SQRT((2*S**2)/r)


Therefore, S(dBAR) = SQRT((2*.00329)/20)
= .01813836


c. Confidence limits for the difference between Location 2 and
Location 1:


(99.75 - 99.66) +/- t * S(dBAR)
.09 +/- 2.021 * 0.018138
95% confidence interval is from .053 to .127.
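

Parts b and c follow from three numbers in the ANOVA table. The sketch below (Python; the variable names are our own) recomputes S(dBAR), the LSD, and the confidence limits. The value 2.021 is the tabled t used in the answer above and is treated as an input.

```python
from math import sqrt

MSE = 0.00329        # experimental error mean square from the ANOVA table
R = 20               # observations per location mean (one per patient)
T_CRIT = 2.021       # tabled t used in the answer above

sd_diff = sqrt(2 * MSE / R)   # standard error of a difference of two means
lsd = T_CRIT * sd_diff

means = {"Location 1": 99.66, "Location 2": 99.75, "Location 3": 99.55}
diff_2_1 = means["Location 2"] - means["Location 1"]
ci = (diff_2_1 - lsd, diff_2_1 + lsd)

print(f"S(dBAR) = {sd_diff:.6f}, LSD = {lsd:.5f}")
print(f"95% limits for Location 2 - Location 1: {ci[0]:.3f} to {ci[1]:.3f}")
```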


140.


Treatments A, B, C and D are to be applied to a field pictured below.
3 blocks are to be used. The soil is known to improve as we go from
east to west.


        --------------------------------------------------
        ^                                                ^
        ^                                                ^
East    ^                                                ^    West
        ^                                                ^
        ^                                                ^
        --------------------------------------------------
        -->Soil gets better as we move in this direction-->


1) Show in the field layout a typical randomized block design.
(Indicate the 3 blocks.)


2) Write out a fixed effects model appropriate for the data
resulting from such a design and experiment.


3) Write the sources of variation and the degrees of freedom
for an ANOVA table associated with the experiment.


4) Write the expected mean square for treatments in the ANOVA
table and explain how this EMS can be used to justify ANOVA
F tests for the null hypothesis H(O): All treatment means
are equal.



Answer:

1)           Block 1     Block 2     Block 3
          -------------------------------------
          ^     D     ^     B     ^     A     ^
          ^     A     ^     C     ^     C     ^
   East   ^     B     ^     D     ^     B     ^   West
          ^     C     ^     A     ^     D     ^
          -------------------------------------



2) Let Y(ij) be the observation for the jth treatment in block i.
Suppose Y(ij) = MU + BETA(i) + TAU(j) + EPSILON(ij), where MU,
BETA(i), and TAU(j) are fixed, BETA(i) is the effect of the ith
block, TAU(j) is the effect of the jth treatment, and EPSILON(ij)
is the random error associated with observation Y(ij).


Suppose also that SUM(i=1,3)(BETA(i)) = 0, SUM(j=1,4)(TAU(j)) = 0,
and EPSILON(ij) is distributed normally and independently with
mean zero and homogeneous variance, SIGMA**2.


3) ANOVA table


Source of      df    SS     MS     EMS
Variability


Total          12
Mean            1
Blocks          2    SSB
Treatments      3    SST    MST    SIGMA**2+(3/3)SUM(j=1,4)(TAU(j)**2)
Exp. Error      6    SSE    MSE    SIGMA**2


4) If H(0) is true then TAU(j) = 0 for all j and SUM(j=1,4)(TAU(j)**2)=
0. This means that MST and MSE are both estimating SIGMA**2. If H(0)
is false, MST estimates SIGMA**2 plus a positive quantity. Hence
an F ratio (MST/MSE) near 1 is compatible with H(0), and a large
F ratio implies rejection of H(0).


141.


Suppose that you are to test the effects of these fertilizer rates on
strength of cotton fiber. Rates (Potassium source in pounds/Acre)
R1. 36
R2. 54
R3. 72
R4. 108
R5. 144


Field plots to be used for this test are arranged as below:


---- ---- ---- ---- ----
1 2 3 4 5
---- ---- ---- ---- ----
6 7 8 9 10
---- ---- ---- ---- ----
11 12 13 14 15


(The numbers identify experimental units)


Randomly assign treatments within groups of 5 plots so that a randomized
complete block design will be appropriate.



Answer:

The program RCBPLN*** can be used to generate a random assignment of
treatments to experimental units so that each treatment occurs once
within each block. The following is an example:


Plot:       1    2    3    4    5
Block 1    R3   R5   R2   R4   R1


Plot:       6    7    8    9   10
Block 2    R1   R4   R5   R3   R2


Plot:      11   12   13   14   15
Block 3    R4   R2   R3   R5   R1


(15 experimental units, numbered 1-15)
Any procedure which produces a random assignment of treatments within
each block is acceptable. A random numbers table could be used where a
method of determining I.D. numbers corresponding to the 5 treatments
was performed for each of the 3 blocks.
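

As an illustration of such a procedure, here is a minimal sketch in Python. It is not the RCBPLN program; the function name rcb_plan and the seed value are hypothetical, and any other source of random permutations would serve equally well. Treatments are shuffled independently within each block, so every rate appears exactly once per block.

```python
import random


def rcb_plan(treatments, blocks, seed=None):
    """Randomly assign each treatment once within every block.

    blocks maps a block label to the list of experimental unit numbers in it.
    Returns {block: {unit: treatment}}.
    """
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        order = list(treatments)
        rng.shuffle(order)               # independent shuffle for each block
        plan[block] = dict(zip(units, order))
    return plan


rates = ["R1", "R2", "R3", "R4", "R5"]
blocks = {1: [1, 2, 3, 4, 5], 2: [6, 7, 8, 9, 10], 3: [11, 12, 13, 14, 15]}

plan = rcb_plan(rates, blocks, seed=141)   # seed fixed only for repeatability
for block, assignment in plan.items():
    print(f"Block {block}:", assignment)
```

The same function covers other block sizes, for example four rations assigned within four groups of four cows.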


142.


As a researcher working for the milk industry you have been asked to
test four feed rations (named A, B, C, and D) and their effect on milk
yield for cows. A total of 16 cows are available for testing purposes.
These cows are not all alike, but do form four similar (homogeneous)
groupings of four cows each.


a. Based on the information presented above, which of the more common
experimental designs would you choose for this investigation?
Explain.
b. Write a model appropriate to the chosen design and define all terms.
c. What constitutes an experimental unit? Explain.
d. What are the treatments and how will you make comparisons among
them?
e. Produce a random assignment of treatments to experimental units for
the specified design.



Answer:

a. I would use a randomized complete block design because the cows form
four similar groups of four cows, and I think that responses may
vary among these groups.
b. Y(I,J) = MU + TAU(I) + RHO(J) + EPSILON(I,J)
Where Y is the response (milk yield)
MU is the overall mean
TAU(I) are the treatment (ration) effects
RHO(J) are the block (group of cows) effects
EPSILON is the random error term with mean = 0 and
variance = SIGMA**2
c. An experimental unit involves a cow being fed a particular ration
over the test period.
d. The treatments are the four feed rations, A, B, C, and D. I would
feed the cows their assigned rations for the same length of time
and record the milk yield of each cow over the testing period. I
would then compute means for the treatments and an estimate of the
variance, and from these compute an LSD with which to compare the
means.
e. I have obtained below one possible assignment of treatments to
experimental units. Cows 1 through 4 are similar, 5 through 8 are
similar, etc.


(Entries are cow numbers.)

Blocks:         1     2     3     4
Treatments:
1               2     6     9    13
2               3     5    12    15
3               4     7    11    16
4               1     8    10    14


143.


An experiment involving two calculating machines [the Curta
(hand-operated) and the SR-51 (electronically operated)], with the former
designated as treatment C and the latter as treatment S, was
conducted on 10 sets of 15 two-digit numbers. The yield data are
seconds required to square and sum the 15 numbers. Since it was
suspected that the time required to square and sum a set of numbers
was shorter for the second operation on the same set of numbers than
it was on the first, this source of variation was taken into account
in designing the experiment. Each of the two treatments appeared five
times in the first order of performing the calculation and five times
in the second order, and both treatments appeared on each set of
numbers; except for these restrictions the allocation of the
treatments was random. The randomized plan for executing the
experiment and the data obtained are given below (treatments are C
and S):


---------------------------------------------------------------------------------------
^       ^                  Set of numbers squared and summed                  ^ Order ^
^ Order ^   1  ^   2  ^   3  ^   4  ^   5  ^   6  ^   7  ^   8  ^   9  ^  10  ^ Total ^
---------------------------------------------------------------------------------------
^       ^                           Time in seconds                           ^       ^
^   1   ^   C  ^   S  ^   C  ^   S  ^   S  ^   C  ^   C  ^   S  ^   C  ^   S  ^       ^
^       ^  255 ^  115 ^  280 ^  107 ^  105 ^  240 ^  195 ^  110 ^  202 ^   85 ^  1694 ^
^   2   ^   S  ^   C  ^   S  ^   C  ^   C  ^   S  ^   S  ^   C  ^   S  ^   C  ^       ^
^       ^  113 ^  200 ^  117 ^  238 ^  210 ^  104 ^   90 ^  200 ^  105 ^  180 ^  1557 ^
---------------------------------------------------------------------------------------
^ Total ^  368 ^  315 ^  397 ^  345 ^  315 ^  344 ^  285 ^  310 ^  307 ^  265 ^  3251 ^
^ Mean  ^  184 ^157.5 ^198.5 ^172.5 ^157.5 ^  172 ^142.5 ^  155 ^153.5 ^132.5 ^   --  ^
---------------------------------------------------------------------------------------


Overall mean: YBAR(...) = 162.55


Order means: YBAR(1..) = 169.4 and YBAR(2..) = 155.7


Treatment means: YBAR(.C.) = 220.0 and YBAR(.S.) = 105.1


Sum of squares of estimated random errors = SUM(eHAT(hij)**2)
= 2219.50


a. The experimental unit is:


b. The experimental design is:


c. The number of degrees of freedom for error is:


d. Show how to obtain eHAT(1C1) in terms of the numbers above.


e. Show how to compute S(e)**2 = estimated variance of a single
observation in terms of the numbers above.


f. For the above data, the estimated variance of a treatment mean
equals:


g. For the above data, the estimated variance of a difference between
two treatment means is:


h. The 95% confidence interval or interval estimate for the difference
between the two treatment means is from:


i. How do the computations in the preceding statement change in com-
puting the 80% confidence interval?


j. The sources of variation in the above experiment are:


k. What effects are orthogonal to each other in the above design?


l. The coefficient of variation for a single observation is:



Answer:

a. The experimental unit is one set of 15 numbers.


b. The experimental design is a simple change-over (cross-over).


c. The number of degrees of freedom for error is (2-1)(10-2) = 8.


d. eHAT(1C1) = 255 - 169.4 - 184.0 - 220.0 + 2(162.55)
= 6.7


e. S(e)**2 = 2219.50/8
= 277.44


f. The estimated variance of a treatment mean = 2219.50/(8*10)
= 2219.50/80
= 27.74


g. The estimated variance of a difference between two treatment
means = (2*2219.50)/(8*10)
= 2219.50/40
= 55.49


h. The 95% confidence interval for the difference between the two
treatment means is from:
220.0 - 105.1 - [2.31*SQRT(2219.5/40)] to
220.0 - 105.1 + [2.31*SQRT(2219.5/40)]; or
from 97.69 to 132.11


i. The computations in the preceding statement change in that
2.31 is changed to 1.40.


j. The sources of variation are: overall mean, bias, order effects,
set of numbers effects, calculator effects, and residual (random
error).


k. Effects that are orthogonal to each other include:


The mean and bias are not orthogonal to each other, but the mean
(with bias), order, set, and calculator effects are orthogonal to
each other.


l. The coefficient of variation = [SQRT(2219.5/8)]/[162.55]
= .1025
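

Parts d through l are arithmetic on the summary numbers given in the question, so they can be checked in a few lines. The sketch below is in Python with variable names of our own choosing; the two t values are the tabled ones used in the answers above.

```python
from math import sqrt

SSE, DF_ERROR = 2219.50, 8
N_PER_TREATMENT = 10
GRAND_MEAN = 162.55
MEAN_C, MEAN_S = 220.0, 105.1
T_95, T_80 = 2.31, 1.40      # tabled t values for 8 df, as used above

# d. residual for order 1, treatment C, set 1
e_1c1 = 255 - 169.4 - 184.0 - 220.0 + 2 * GRAND_MEAN

s2 = SSE / DF_ERROR                      # e. variance of a single observation
var_mean = s2 / N_PER_TREATMENT          # f. variance of a treatment mean
var_diff = 2 * s2 / N_PER_TREATMENT      # g. variance of a difference
diff = MEAN_C - MEAN_S


def interval(t_value):
    """Confidence limits for the difference between the treatment means."""
    half_width = t_value * sqrt(var_diff)
    return diff - half_width, diff + half_width


ci_95 = interval(T_95)                   # h.
ci_80 = interval(T_80)                   # i. only the t value changes
cv = sqrt(s2) / GRAND_MEAN               # l. coefficient of variation

print(f"eHAT(1C1) = {e_1c1:.1f}, S(e)**2 = {s2:.2f}")
print(f"var(mean) = {var_mean:.2f}, var(difference) = {var_diff:.2f}")
print(f"95%: {ci_95[0]:.2f} to {ci_95[1]:.2f}")
print(f"80%: {ci_80[0]:.2f} to {ci_80[1]:.2f}")
print(f"CV = {cv:.4f}")
```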


144.


A test was conducted to compare the relative effectiveness of three
waterproofing compounds, (A,B,C). A strip of cloth was subdivided
into nine pieces - - -


Left Center Right
_____ _____ _____ _____ _____ _____ _____ _____ _____


_____ _____ _____ _____ _____ _____ _____ _____ _____


Each piece was considered to be an experimental unit, but it was
suspected that the pieces differed systematically from left to
right in capacity to become waterproofed. Accordingly, the
random assignment of compounds to experimental units was
restricted so that:


I. Each compound was tested once in each set of three pieces (sets
are left, center, and right); and
II. Each compound was tested once in each of the positions within a
set of three (once furthest left in a section, once in the cen-
ter of a section, and once on the right of a section).


a. Write a model appropriate to such a trial.
b. Analyze and interpret the following results for such a randomization
scheme:


Left Center Right
_____ _____ _____ _____ _____ _____ _____ _____ _____
B, 12 A, 15 C, 16 A, 11 C, 17 B, 10 C, 10 B, 12 A, 14
_____ _____ _____ _____ _____ _____ _____ _____ _____


(consider higher numbers as better)



Answer:

a. This is an LSQ design where the model is:


Y(I,J,K) = MU + TAU(I) + RHO(J) + KAPPA(K) + EPSILON(I,J,K)
Y is response, degree of waterproofing
MU is an overall mean for waterproofing
TAU(I) are the treatment effects
RHO(J) are the column effects, or the position within a set of three
KAPPA(K) are the row effects, or the set (left, center, right)
EPSILON is the random error, assumed to be normally distributed
with mean = 0 and variance = SIGMA**2


Estimates of parameters


SIGMA**2 = 5.333


MU 13 RHO(1) -2 KAPPA(1) 1.333
TAU(1) .333 RHO(2) 1.667 KAPPA(2) - .333
TAU(2) - 1.667 RHO(3) .333 KAPPA(3) -1
TAU(3) 1.333


Treatment means were:
C = 14.333
A = 13.333
B = 11.333


b. None of the differences among treatment means appear to be signi-
ficant; they are all less than the LSD of 18.7148 (ALPHA = .01).


The F test for treatments (alternative test with higher Type II
error rate):


H(0): TAU(1) = TAU(2) = TAU(3) = 0
F(calculated) = 1.3125
F(table, ALPHA = .01, df = 2,2) = 99,


also does not allow one to reject H(0). In conclusion, it appears
that none of the compounds are significantly different from any
other at ALPHA = .01.
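
The Latin square arithmetic can be reproduced with a minimal Python
sketch. The tuple layout and function name are illustrative, and the
value 9.925 is an assumed tabled t for 2 df at two-sided ALPHA = .01;
only the data come from the exercise.

```python
# Each record: (set on the cloth, position within the set, compound, response).
data = [
    ("Left", 1, "B", 12), ("Left", 2, "A", 15), ("Left", 3, "C", 16),
    ("Center", 1, "A", 11), ("Center", 2, "C", 17), ("Center", 3, "B", 10),
    ("Right", 1, "C", 10), ("Right", 2, "B", 12), ("Right", 3, "A", 14),
]

def factor_ss(index, grand_mean):
    """Sum of squares for the factor stored at tuple position `index`."""
    groups = {}
    for record in data:
        groups.setdefault(record[index], []).append(record[3])
    return sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
               for v in groups.values())

grand_mean = sum(r[3] for r in data) / len(data)
ss_total = sum((r[3] - grand_mean) ** 2 for r in data)
ss_set, ss_position, ss_treatment = (factor_ss(i, grand_mean) for i in range(3))
ss_error = ss_total - ss_set - ss_position - ss_treatment

df_error = (3 - 1) * (3 - 2)                 # (t-1)(t-2) for a 3 x 3 square
ms_error = ss_error / df_error               # estimate of SIGMA**2
f_treatments = (ss_treatment / 2) / ms_error
t_table = 9.925                              # assumed tabled value, 2 df, ALPHA = .01
lsd = t_table * (2 * ms_error / 3) ** 0.5    # 3 observations per compound

print(round(ms_error, 3), round(f_treatments, 4), round(lsd, 2))
```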


145.


A test has been conducted in which four tire brands have been tested
using 12 experimental units where an experimental unit consisted of one
tire position on one car. The random assignment of brands to experi-
mental units was restricted so that each brand was tested once on each
car. Results (in amount of wear) were:


Front Right Front Left Rear Right Rear Left


Car 1 D, 7.17 A, 7.62 B, 8.14 C, 7.76
Car 2 B, 8.15 A, 8.00 D, 7.57 C, 7.73
Car 3 C, 7.74 B, 7.87 A, 7.93 D, 7.80


a. Write a model appropriate to this trial and estimate all parameters.
b. Do any of the assumptions for this design make you uneasy? Explain.
c. Analyze and interpret these results.



Answer:

a. The model is Y(I,J) = MU + TAU(I) + RHO(J) + EPSILON(I,J)
where Y is the response, tread wear
TAU(I) are the treatment effects, effects of tire brand
RHO(J) are the block effects, effects of car
EPSILON is the random error term with mean = 0 and
variance = SIGMA**2
MU is the overall mean


Estimates of parameters:


MU(HAT) = 7.79
TAU(A,HAT) = .0599 = .06
TAU(B,HAT) = .2633
TAU(C,HAT) = -.04667 = -.047
TAU(D,HAT) = -.27667 = -.277
RHO(1,HAT) = -.1175
RHO(2,HAT) = .0725
RHO(3,HAT) = .045


SIGMA**2 = .0419 with 6 df.


b. Using a randomized block (RCB) design makes me uneasy since I would
expect wheel position on car to also affect tread wear. Therefore,
I would block on wheel position as well as car and use a Latin
Square design.


c. Treatment means are: B = 8.053, A = 7.85, C = 7.743, D = 7.513
Only one difference is significant at the .05 level. Tires B and
D are different since their difference is greater than the LSD.
(B - D) +/- LSD
.54 +/- .409
Interval is from .131 to .949
Since the interval does not include zero, we reject the null hypo-
thesis that the true difference is zero.


The F test for treatments is not significant at the .05 level.
This is a case where the LSD indicates a significant difference
while the F test for treatments does not. The two procedures
have different properties regarding Type I and Type II error
rates: here, the LSD is more exposed to Type I errors and the
F test is more exposed to Type II errors.
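
A minimal Python sketch of the randomized block computations follows.
The dictionary layout and names are illustrative; the values 2.447
(t, 6 df) and 4.76 (F, 3 and 6 df) are assumed tabled values at
ALPHA = .05 and do not come from the exercise.

```python
# wear[car][brand] = amount of wear, copied from the table above.
wear = {
    1: {"D": 7.17, "A": 7.62, "B": 8.14, "C": 7.76},
    2: {"B": 8.15, "A": 8.00, "D": 7.57, "C": 7.73},
    3: {"C": 7.74, "B": 7.87, "A": 7.93, "D": 7.80},
}
cars = sorted(wear)
brands = sorted(wear[1])
values = [wear[c][b] for c in cars for b in brands]

mu_hat = sum(values) / len(values)
brand_mean = {b: sum(wear[c][b] for c in cars) / len(cars) for b in brands}
car_mean = {c: sum(wear[c].values()) / len(brands) for c in cars}

ss_total = sum((y - mu_hat) ** 2 for y in values)
ss_brand = len(cars) * sum((m - mu_hat) ** 2 for m in brand_mean.values())
ss_car = len(brands) * sum((m - mu_hat) ** 2 for m in car_mean.values())
df_error = (len(brands) - 1) * (len(cars) - 1)
ms_error = (ss_total - ss_brand - ss_car) / df_error

f_brand = (ss_brand / (len(brands) - 1)) / ms_error
t_table, f_table = 2.447, 4.76               # assumed tabled values, ALPHA = .05
lsd = t_table * (2 * ms_error / len(cars)) ** 0.5

print(round(ms_error, 4), round(lsd, 3), round(f_brand, 2))
print("B - D exceeds LSD:", brand_mean["B"] - brand_mean["D"] > lsd)
print("F exceeds tabled F:", f_brand > f_table)
```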


146.


Write out the sources of variation and the degrees of freedom for the
following industrial experiment. Mention also the name of the design.


Three machines were used to produce parts made from four kinds of
metal. Each machine made one part from each type of metal. The order
with which the metals were assigned to the machines was established
through a randomization procedure.



Answer:

Source of Variation df
------------------- --


Total 12
Mean 1
Metals 3
Machines 2
Residual 6
(Metal x Machine)


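This is a randomized block experiment with machines playing the role
of blocks and metals the role of treatments, since the order of the
metals was randomized within each machine.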


147.


An investigator has at his disposal a garden in which there are 16
spaces for planting marigolds. The investigator is persuaded that a
plant will respond equally well (produce the same number of flowers) in
any one of these spaces. He wishes to compare 4 new marigold varieties.
The design that matches his notion of the experimental material is:


a. Latin Square
b. Randomized Block
c. Completely Random



Answer:

c., since his vision for uniform conditions is of equal response for all
experimental units.


148.


The model proposed to describe the responses measured in an experiment
is:


Y(i,j) = MU + TAU(i) + EPSILON(i,j) i=1, 2, 3, or 4


Where Y(i,j) is the number of flowers produced by a marigold plant j
belonging to the variety i.


a. What is TAU(i)?
b. What design corresponds to the model?



Answer:

a. The treatment effect of variety i, which in this case represents the
number of blossoms more or less than the overall mean produced by
a plant belonging to variety i.
b. Completely Random Design, since the model doesn't include any terms
for blocking factors.


149.


Five laboratories were invited to participate in an experiment to
test the chemical content of four materials known to vary over the
range of interest. Each laboratory was given two samples of each
material to analyze. The results were:


Laboratories
-------------------------------------------------------
Material
Specimens I II III IV V
-------------------------------------------------------


1 8,11 10, 8 7,10 9,12 10,13
2 14,19 11,15 13,11 10,13 17,19
3 20,16 21,18 21,20 22,25 24,22
4 19,13 11,12 17,15 19,17 9,11


Perform the appropriate calculations to determine if there is any
systematic difference between laboratories.



Answer:

ANOVA:


Source of Variation df SS MS F
------------------- -- -- -- -
Total 40 9696.00
Correction for mean 1 8761.60
Laboratories 4 36.65 9.16 1.97
Materials 3 628.20
Error 20 93.00 4.65
Interaction 12 176.55 14.71 3.16 *


F(critical, ALPHA=.05, df=12,20) = 2.28


In this case, the interaction is significant. This probably masks
the systematic variation in laboratories, which turns out not to
be significant.


To get a clearer picture, a separate mean was calculated for each
material and laboratory combination.


ML (Material * Laboratory) Mean Response
-------------------------- -------------
11 9.5
12 9.0
13 8.5
14 10.5
15 11.5
21 16.5
22 13.0
23 12.0
24 11.5
25 18.0
31 18.0
32 19.5
33 20.5
34 23.5
35 23.0
41 16.0
42 11.5
43 16.0
44 18.0
45 10.0


Mean responses were plotted against materials for each laboratory.
The graph indicated that the lab effects depend on the specific
material being considered. No lab is consistently different from
other labs for all materials. For example, Lab 5 gives the highest
mean response for materials one and two, but the lowest for material
four. Similarly, Lab 4 gives the highest mean response for materials
three and four, but the lowest for material two.


If available, consult file of graphs and diagrams that could not be
computerized for appropriate graph.
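
The sums of squares in the ANOVA table can be recomputed with a short
Python sketch. The nested-list layout and names are illustrative; both
F ratios use the within-cell error mean square, as in the table.

```python
# results[material][laboratory] = the two determinations from the data table.
results = [
    [(8, 11), (10, 8), (7, 10), (9, 12), (10, 13)],
    [(14, 19), (11, 15), (13, 11), (10, 13), (17, 19)],
    [(20, 16), (21, 18), (21, 20), (22, 25), (24, 22)],
    [(19, 13), (11, 12), (17, 15), (19, 17), (9, 11)],
]
a, b, r = len(results), len(results[0]), 2   # materials, labs, replicates
all_y = [y for row in results for cell in row for y in cell]

cf = sum(all_y) ** 2 / len(all_y)            # correction for the mean
ss_material = sum(sum(sum(c) for c in row) ** 2 for row in results) / (b * r) - cf
ss_lab = sum(sum(sum(results[i][j]) for i in range(a)) ** 2
             for j in range(b)) / (a * r) - cf
ss_cells = sum(sum(c) ** 2 for row in results for c in row) / r - cf
ss_interaction = ss_cells - ss_material - ss_lab
ss_error = sum(y * y for y in all_y) - cf - ss_cells

ms_error = ss_error / (a * b * (r - 1))
f_lab = (ss_lab / (b - 1)) / ms_error
f_interaction = (ss_interaction / ((a - 1) * (b - 1))) / ms_error

print(round(ss_lab, 2), round(ss_material, 2),
      round(ss_interaction, 2), round(ss_error, 2))
print(round(f_lab, 2), round(f_interaction, 2))
```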


150.


A completely randomized design was used for an experiment on light
intensity in foot candle power units for three types of lights (M =
mercury vapor, L = low pressure sodium vapor, and H = high pressure
sodium vapor), in one large parking lot. Suppose the results ob-
tained were:


        | Treatment and Responses (Y(ij))            |
        |                                            |
        |      M       |      L       |      H       |
        |--------------|--------------|--------------|
        | Y(M1) = 12   | Y(L1) = 15   | Y(H1) = 20   |
        | Y(M2) = 10   | Y(L2) = 14   | Y(H2) = 12   |
        | Y(M3) = 11   | Y(L3) = 13   | Y(H3) =  8   |
        | Y(M4) =  9   | Y(L4) = 11   | Y(H4) =  7   |
        | Y(M5) =  8   |              | Y(H5) = 23   |
--------|--------------|--------------|--------------|--------------
        |              |              |              | Overall mean
Totals  | Y(M.) = 50   | Y(L.) = 53   | Y(H.) = 70   | = 173/14
        |              |              |              |
Means   |YBAR(M.)=10.0 |YBAR(L.)=53/4 |YBAR(H.)=14.0 | = YBAR(..)
--------------------------------------------------------------------


The estimated treatment effects in terms of the above data are
computed as:


a. tHAT(M) = _______________.
b. tHAT(L) = _______________.
c. tHAT(H) = _______________.


The estimated random error effects in terms of the above data may
be computed as:


d. eHAT(M2) = _______________.
e. eHAT(L3) = _______________.



Answer:

a. 10 - 173/14


b. 53/4 - 173/14


c. 14 - 173/14


d. 10 - 10


e. 13 - 53/4
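
The same quantities can be evaluated exactly with a small Python
sketch. The dictionary and the helper name e_hat are illustrative;
exact fractions are used so that 173/14 and 53/4 are not rounded.

```python
from fractions import Fraction

# Responses by light type, copied from the table above.
light = {
    "M": [12, 10, 11, 9, 8],
    "L": [15, 14, 13, 11],
    "H": [20, 12, 8, 7, 23],
}
n_total = sum(len(v) for v in light.values())
grand_mean = Fraction(sum(sum(v) for v in light.values()), n_total)
means = {k: Fraction(sum(v), len(v)) for k, v in light.items()}

# Estimated treatment effect: treatment mean minus overall mean.
t_hat = {k: means[k] - grand_mean for k in light}

def e_hat(treatment, j):
    """Estimated error for observation j (1-based) of a treatment."""
    return light[treatment][j - 1] - means[treatment]

print(grand_mean, t_hat["M"], t_hat["L"], t_hat["H"])
print(e_hat("M", 2), e_hat("L", 3))
```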


151.


In the past a chemical fertilizer plant has produced an average of
1100 pounds of fertilizer per day. The record for the past year based
on 256 operating days shows the following:


XBAR = 1060 lbs/day
S = 320 lbs/day


where XBAR and S have the usual meaning. It is desired to test
whether or not the average daily production has dropped significantly
over the past year. Suppose that in this kind of operation, the
traditionally acceptable level of significance has been .05. But the
plant manager, in his report to his bosses, uses level of significance
.01. Analyze the data at both levels after setting up appropriate
hypotheses, and comment.



Answer:

H(0): MU = 1100
H(A): MU < 1100


Since n = 256, use Z to approximate t.


S(XBAR) = 320/SQRT(256)
= 320/16
= 20


Z(calculated) = (1060 - 1100)/20
= -40/20
= -2


Z(critical, ALPHA=.05, one-tailed) = -1.645


Z(critical, ALPHA=.01, one-tailed) = -2.33


Since -2.33 < -2 < -1.645, H(0) is rejected at ALPHA=.05 but is not
rejected at ALPHA=.01. It appears that the manager is trying to pull
a fast one on his bosses by using ALPHA=.01 and saying production has
not dropped. However, if the traditional level of significance is
used, ALPHA=.05, there is evidence that indicates a drop in
production.
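
A minimal Python sketch of this lower-tailed test follows. It uses the
standard library's NormalDist for the critical values and the p-value;
the variable names are illustrative, and the critical values are
expressed as negative numbers because the rejection region is in the
lower tail.

```python
import math
from statistics import NormalDist

mu_0, xbar, s, n = 1100, 1060, 320, 256      # values given in the exercise

z = (xbar - mu_0) / (s / math.sqrt(n))       # test statistic
p_value = NormalDist().cdf(z)                # lower-tail probability

decisions = {}
for alpha in (0.05, 0.01):
    z_critical = NormalDist().inv_cdf(alpha) # negative for a lower tail
    decisions[alpha] = z < z_critical        # True means reject H(0)

print(z, round(p_value, 4), decisions)
```

The p-value falls between .01 and .05, which is why the two levels of
significance lead to opposite decisions.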


152.


A test of the breaking strengths of two different types of cables was
conducted using samples of n(1) = n(2) = 100 pieces of each type of
cable.


CABLE I CABLE II
-------------------------------------
YBAR(1) = 1925 YBAR(2) = 1905
S(1) = 40 S(2) = 50


Do the data provide sufficient evidence to indicate a difference
between the mean breaking strengths of the two cables? Use ALPHA =
.10. Assume SIGMA(1)**2 = SIGMA(2)**2. The tabular value is 1.65.



Answer:

Z = (1925 - 1905)/[SQRT(1600/100 + 2500/100)] = 3.1


Since 3.1 > 1.65, the data indicate a difference.
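
The statistic can be checked with a short Python sketch. It also shows,
as an illustration not worked in the original answer, that with equal
sample sizes the pooled-variance standard error (equal variances
assumed) equals the unpooled one. All names are illustrative.

```python
import math

n1 = n2 = 100
ybar1, ybar2 = 1925, 1905
s1, s2 = 40, 50

# Unpooled standard error, as used in the answer above.
se_unpooled = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Pooled variance under the assumption SIGMA(1)**2 = SIGMA(2)**2.
s2_pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
se_pooled = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))

z = (ybar1 - ybar2) / se_pooled
tabular = 1.65                               # value given in the exercise
print(round(se_unpooled, 3), round(se_pooled, 3), round(z, 1), abs(z) > tabular)
```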


153.


A standard method for determining the amount of active ingredient in
propellants is known to have a standard deviation SIGMA = .8. Two
new propellants, assumed to be homogeneous, were tested five times.
The results of these tests are:


X(1): 63.2, 63.6, 62.7, 64.4, 63.1
X(2): 62.2, 64.8, 62.2, 60.2, 61.1


Test at the .05 level of significance whether there is a difference
in the amount of active ingredient in the two propellants.



Answer:

H(0): MU(1) - MU(2) = 0
H(1): MU(1) - MU(2) =/= 0


XBAR(1) = 63.4
XBAR(2) = 62.1


SIGMA(XBAR(1)-XBAR(2)) = SQRT([SIGMA(1)**2]/[n(1)] + [SIGMA(2)**2]/[n(2)])
= SQRT(.64/5 + .64/5)
= .506


Using a Z-test:
Z = ((XBAR(1) - XBAR(2)) - 0)/SIGMA(XBAR(1) - XBAR(2))
= 1.3/.506
= 2.57


The critical two-tailed value for Z with ALPHA = .05 is 1.96.
Therefore reject H(0) at 5% significance level since 2.57 > 1.96.
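
A minimal Python sketch of this known-SIGMA Z-test follows. The names
are illustrative, and the two-tailed p-value is an addition that the
original answer does not report.

```python
import math
from statistics import NormalDist, mean

x1 = [63.2, 63.6, 62.7, 64.4, 63.1]
x2 = [62.2, 64.8, 62.2, 60.2, 61.1]
sigma = 0.8                                  # known standard deviation

se = math.sqrt(sigma ** 2 / len(x1) + sigma ** 2 / len(x2))
z = (mean(x1) - mean(x2)) / se
z_critical = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-tailed, ALPHA = .05
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(se, 3), round(z, 2), round(z_critical, 2), abs(z) > z_critical)
```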


154.


Suppose that you have been assigned to estimate the height of a
group of corn plants arranged in 4 rows with 50 plants in each row.
You may take measurements of 10 plants.


a. Outline a method for obtaining a random sample in such a situation.
b. What advantages or disadvantages are in such a procedure?



Answer:

a. Assign numbers to plants (1 - 200). Draw a random sample of size 10
using a random numbers table. Simplest procedure is to use
sampling with replacement.


b. Advantage is that common formulas for mean and variance apply,
but it's a nuisance to have to number plants and use random
selection.
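
The numbering-and-drawing procedure in part a can be sketched in
Python, with the standard random module standing in for a random
numbers table. The fixed seed and all names are illustrative choices;
both sampling with and without replacement are shown.

```python
import random

rng = random.Random(2024)    # fixed seed only so the sketch is repeatable

# Number the plants 1 to 200: 4 rows of 50 plants each.
plants = list(range(1, 201))

def row_and_position(plant_number):
    """Translate a plant number back to (row, position within row)."""
    return (plant_number - 1) // 50 + 1, (plant_number - 1) % 50 + 1

with_replacement = rng.choices(plants, k=10)     # a plant may repeat
without_replacement = rng.sample(plants, k=10)   # ten distinct plants

print([row_and_position(p) for p in with_replacement])
print([row_and_position(p) for p in without_replacement])
```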


155.


In a random sample of flashlight batteries, the average useful life
was 22 hours and the sample standard deviation was 2 hours. How large
should the sample size be if you want the mean of your sample to be
within 1 hour of MU 90 times out of 100 in repeated sampling?


a. 25
b. 11
c. 90
d. 35
e. both c & d. Since the calculated n is too small for the
central limit theorem to apply, choose n >= 30.



Answer:

b. 11


n = [[2**2][1.645**2]]/[1**2]
= 10.8241
== 11
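
The sample size formula can be evaluated with a short Python sketch.
The names are illustrative. NormalDist returns a Z value of about
1.6449 rather than the tabled 1.645, so the unrounded n differs
slightly in the third decimal place; the rounded answer is unchanged.

```python
import math
from statistics import NormalDist

s = 2              # sample standard deviation, hours
margin = 1         # desired half-width, hours
confidence = 0.90  # "90 times out of 100"

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
n_exact = (z * s / margin) ** 2
n_required = math.ceil(n_exact)    # round up to a whole battery

print(round(z, 3), round(n_exact, 2), n_required)
```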