David B. Howell
-
A. Objective — students will be able to apply 2 given formulas to solve problems such as 1, 2, and 3 posed near the beginning of this Unit.
-
B. The experimental question—see Problem 1: What percent of the high school population watches MTV more than one hour per week? How large a sample do I need to answer the question within a given error tolerance with 95% confidence?
-
C. Issues, and some resolutions —
With the bean/pea population as a model and with some sampling, we essentially worked toward an answer to the experimental question by trial and error. We discovered that all our combinations of samples reaching about 550 in size would give us a predicted percent within E = 3. So we would be willing to claim 95% confidence! Or, for N approximately 110, our sample gave us a value within E = 6, 96% of the time.
I have chosen to concentrate on the 95% confidence level because it is a very common level used by experimenters and pollsters. Other levels sometimes used are 90%, 99%, and 99.9%. [These correspond, of course, to α = 0.05, 0.10, 0.01, and 0.001 in formal statistics.]
Here is a formula we can use to answer our MTV question:
We will predict that P, the percent of the population we want to know, is p, the percent of the population in our sample, plus or minus E, the error tolerance. In symbols, P is p ± E. Or P is in the interval from p - E to p + E. And we will make this prediction with 95% confidence. But how do we know what E is? Or the sample size, N?
E = 1.96 times the square root of (p times q divided by N)
E = 1.96 × √(pq/N)
E is the error tolerance.
1.96 is a factor mathematicians calculate from the 95% confidence level we said we’d use. [It is, of course, the z value for α = 0.05.] If we wanted only 90% confidence, then the factor would be 1.65; if we wanted 99% confidence, the factor would be 2.58.
p = the percent of what we want in our sample.
q = 1 - p or the percent of everything else in our sample.
N = the size of our sample, the number of people or answers or objects in our sample.
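The factors quoted above can be checked with Python's standard library. This is only a sketch: the helper name `z_factor` is our own, and it assumes the usual two-tailed normal approximation behind the formula.

```python
from statistics import NormalDist

def z_factor(confidence):
    """Two-tailed normal factor for a confidence level such as 0.95 (hypothetical helper)."""
    alpha = 1.0 - confidence               # chance we are willing to be wrong
    # Put alpha/2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_factor(level), 3))
```

The 90% factor comes out near 1.645, which the text rounds to 1.65.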
Let’s use this formula with our bean/pea population. For my particular sample B, we had N = 26, p = 62%. Then we would predict
(figure available in print form)
P lies within 0.62 ± 1.96 times 0.095 = 0.62 ± 0.19.
P lies between 0.62 - 0.19 and 0.62 + 0.19 OR between 0.43 and 0.81 with 95% confidence.
When we took lots of samples of size about 27, we found P was about 68%, or 0.68. Is that between 0.43 and 0.81? Of course it is!
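For classes with access to a computer, the sample B calculation can be sketched in a few lines of Python. The function names `error_tolerance` and `interval` are our own labels for the formula E = 1.96 × √(pq/N); they are not part of the Unit.

```python
import math

def error_tolerance(p, n, z=1.96):
    """E = z * sqrt(p * q / n), where q = 1 - p."""
    q = 1.0 - p
    return z * math.sqrt(p * q / n)

def interval(p, n, z=1.96):
    """Return the pair (p - E, p + E)."""
    e = error_tolerance(p, n, z)
    return p - e, p + e

# Sample B from the text: N = 26, p = 62%
e = error_tolerance(0.62, 26)
low, high = interval(0.62, 26)
print(round(e, 2), round(low, 2), round(high, 2))   # 0.19 0.43 0.81
print(low <= 0.68 <= high)                          # True: 0.68 is inside
```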
Let’s try this for F. p = 75% or 0.75
Then q = 0.25
(figure available in print form)
P is within 0.75 ± 0.16
P is between 0.59 and 0.91 with 95% confidence.
Let’s try it for sample A. p = 88% or 0.88
Then q = 0.12
(figure available in print form)
P is within 0.88 ± 0.11
P is between 0.77 and 0.99 with 95% confidence.
Did you say, “No, P was 0.68. That is NOT between 0.77 and 0.99.”? Well, we didn’t claim 100% confidence, did we?! 95% “confident” means 5% of the time we’re wrong! This was one of those cases where we were wrong!
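A quick simulation can make the "wrong about 5% of the time" idea concrete. The sketch below is not the class's bean experiment: Python's random number generator stands in for scooping beans, and the population value P = 0.68, the sample size 27, the number of trials, and the seed are all assumptions chosen to echo the numbers above.

```python
import math
import random

def covers(true_p, n, rng, z=1.96):
    """Draw one sample of size n; report whether p ± E contains true_p."""
    hits = sum(1 for _ in range(n) if rng.random() < true_p)
    p = hits / n
    e = z * math.sqrt(p * (1 - p) / n)
    return p - e <= true_p <= p + e

rng = random.Random(1)        # fixed seed so the run is repeatable
trials = 10_000
captured = sum(covers(0.68, 27, rng) for _ in range(trials))
coverage = captured / trials
print(f"{coverage:.1%} of the intervals contained P = 0.68")
```

The printed percent should come out close to 95%, and for samples this small often a little under, because the formula is an approximation.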
Let’s try two more. Use samples of about N = 110.
For my sample 1, p = 71% or 0.71
Then q = 0.29
N = 113
(figure available in print form)
P is within 0.71 ± 0.08
P is between 0.63 and 0.79 with 95% confidence. Notice, since N is larger than before, how much smaller E is.
For my sample 8, p = 65% or 0.65
Then q = 0.35
N = 111
(figure available in print form)
P is within 0.65 ± 0.09
P is between 0.56 and 0.74 with 95% confidence.
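To see how E shrinks as N grows, the same formula can be run over sample B and samples 1 and 8 side by side. This is a sketch only; the dictionary layout and the function name are our own.

```python
import math

def error_tolerance(p, n, z=1.96):
    """E = z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

# (p, N) pairs taken from the text
samples = {"B": (0.62, 26), "1": (0.71, 113), "8": (0.65, 111)}

for name, (p, n) in samples.items():
    e = error_tolerance(p, n)
    print(f"sample {name}: N = {n}, E = {e:.3f}, "
          f"interval {p - e:.2f} to {p + e:.2f}")
```

Because N sits under the square root, it takes about four times as many observations to cut E in half.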
Let’s go back to the beginning. We wanted to predict what percent of high school students watch more than one hour of MTV a week. We pretended the beans were those students and the beans and peas together were all high school students. Actually, we could conduct a survey, trying to pick students at random, couldn’t we? But how many students should we pick in our sample? There is a way to use the formula we’ve just worked with to answer the question. We’ll stick with the beans/peas model.
P, we said, was within p ± E.
(figure available in print form)
E = 1.96 × √(pq/N). Suppose we decide our error tolerance in advance.
Then we can solve for N as long as we have a guess about p. [If your class can do the solution, do it. Otherwise simply present the following.]
(figure available in print form)
In words, N equals 1.96 divided by E, then squared or multiplied by itself, times p times q. In symbols, N = (1.96/E)² × p × q.
Remember when we guessed a percent, p, for beans way back at the beginning of Lesson 10? We’ll use that number for p now. And let’s agree we want E = 0.06 at the 95% confidence level.
(figure available in print form)
N = 256
A sample of 256 should do it.
Suppose we had guessed p = 0.70.
(figure available in print form)
N = 224. Close, but a little less than the 256 we had before.
Suppose we set E = 0.10, and guessed p = 0.60. Do you expect N to be larger or smaller? Why? Let’s calculate N.
(figure available in print form)
N = 92. Did you expect N to be smaller because we made E larger?
Let’s make E = 0.04, and keep our guess at p = 0.60. What do you expect will happen to N now?
(figure available in print form)
N = 576. Did you expect N to be larger because we made E smaller?
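The four sample sizes above can be reproduced with a short Python sketch of N = (1.96/E)² × p × q. Two assumptions are ours: the function name `sample_size`, and the value p = 0.60 for the first case, which is inferred from N = 256 because the text does not restate the Lesson 10 guess.

```python
def sample_size(e, p, z=1.96):
    """N = (z / E)**2 * p * q, rounded to the nearest whole number as the text does."""
    q = 1.0 - p
    return round((z / e) ** 2 * p * q)

print(sample_size(0.06, 0.60))   # 256 (p = 0.60 is an inferred guess)
print(sample_size(0.06, 0.70))   # 224
print(sample_size(0.10, 0.60))   # 92
print(sample_size(0.04, 0.60))   # 576
```

Many textbooks round N up rather than to the nearest whole number, so that the error tolerance is never exceeded; swapping `round` for `math.ceil` would do that.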
What do we do if we have no idea at all about what p might be?
The safest solution is to use p = 0.50. That will give the largest value of N for a given error tolerance.
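One way to convince a class that p = 0.50 is the safe choice is to try a range of guesses and compare. The sketch below assumes E = 0.06 purely for illustration; the function name is our own.

```python
def sample_size(e, p, z=1.96):
    """N = (z / E)**2 * p * q, left unrounded for comparison."""
    return (z / e) ** 2 * p * (1.0 - p)

guesses = [i / 10 for i in range(1, 10)]          # 0.1, 0.2, ..., 0.9
sizes = {p: sample_size(0.06, p) for p in guesses}

for p, n in sizes.items():
    print(f"p = {p:.1f}  ->  N is about {n:.0f}")

worst_case = max(sizes, key=sizes.get)            # guess needing the largest sample
print("largest N at p =", worst_case)             # 0.5
```

The product p × q is largest when p and q are equal, which is why the sample sizes rise toward p = 0.5 and fall away on either side.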
D. Observations and discussion to objectives —
[Don’t try the foregoing without calculators! And you may have to teach calculator use for the specific formulas, too!]
Here are two more problems. Let’s try them to see how to summarize what we’ve learned. Recall problem 2. Sandra “Dunk-em” Smith may run for Student Government president. First, however, she wants an estimate of what percent of the students in her school know who she is. She’d like to have 95% confidence in a prediction within E = 0.10.
How many students should be polled?
(figure available in print form)
“Dunk-em” thinks 75% know who she is. Her campaign manager says to use 50% because it will give a “safer”, larger number of students to sample. Try it both ways!
(figure available in print form)
N = 72 using p = 0.75, or N = 96 using p = 0.50
“Dunk-em” decides to play it safe. Her campaign workers poll a sample of 96 students. 62 of them know who “Dunk-em” is. What is the prediction for the percent of all students?
(figure available in print form)
(figure available in print form)
(figure available in print form)
P is within 0.65 ± 0.10
P is between 0.55 and 0.75.
“Dunk-em” is now 95% confident that between 55% and 75% of the students at her school know who she is. Now she can decide whether to run for Student Government president. What would you decide?
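The whole “Dunk-em” problem, from choosing a sample size to reporting the interval, can be sketched in Python as a check on the calculator work. The function names are our own; the rounding of p and E to two places before forming the interval follows the text.

```python
import math

Z = 1.96   # factor for 95% confidence

def sample_size(e, p):
    """N = (Z / E)**2 * p * q, rounded to the nearest whole number."""
    return round((Z / e) ** 2 * p * (1 - p))

def error_tolerance(p, n):
    """E = Z * sqrt(p * q / n)."""
    return Z * math.sqrt(p * (1 - p) / n)

n_guess = sample_size(0.10, 0.75)    # her own guess of 75%
n_safe = sample_size(0.10, 0.50)     # the campaign manager's safe choice
print(n_guess, n_safe)               # 72 96

p = round(62 / 96, 2)                       # 62 of 96 polled know her
e = round(error_tolerance(62 / 96, 96), 2)  # rounded as in the text
low, high = round(p - e, 2), round(p + e, 2)
print(p, e, low, high)
```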
One approach to Willie B. Ready’s air filter problem (Problem 3 at the beginning):
For 95% confidence and E = 0.03 and a guess of p = 0.20, we get
(figure available in print form)
N = 683 air filters.
Willie figures it would take two months’ worth of air filters to get that many. So he changes his E to 0.10.
(figure available in print form)
N = 61 air filters.
He goes with it. He gets 5 defective ones. So he calculates
(figure available in print form)
P is within 0.08 ± 0.07
P is between 0.01 and 0.15.
Willie has estimated that between 1% and 15% of the air filters are defective. What would you do? Change suppliers? Warn the supplier that you will change if there is no improvement? Ignore it?
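Willie's numbers can be checked the same way. This is a sketch under the same assumptions as the formulas above (95% confidence, N rounded to the nearest whole number); the function names are ours.

```python
import math

Z = 1.96   # factor for 95% confidence

def sample_size(e, p):
    """N = (Z / E)**2 * p * q, rounded to the nearest whole number."""
    return round((Z / e) ** 2 * p * (1 - p))

def error_tolerance(p, n):
    """E = Z * sqrt(p * q / n)."""
    return Z * math.sqrt(p * (1 - p) / n)

n_strict = sample_size(0.03, 0.20)   # first plan, E = 0.03
n_loose = sample_size(0.10, 0.20)    # revised plan, E = 0.10
print(n_strict, n_loose)             # 683 61

p = 5 / 61                           # 5 defective filters out of 61
e = error_tolerance(p, 61)
print(round(p, 2), round(e, 2))              # 0.08 0.07
print(round(p - e, 2), round(p + e, 2))      # 0.01 0.15
```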
Statistics help us predict. But the important decisions we base on the predictions cannot be made by the statistics. Human beings make those decisions!
The series of Lessons is concluded. Hopefully, students have met the objectives. The base of understanding in real problems, in concrete experience, should prepare the students both for a clearer understanding of general statistical data and for the further study of statistics.
(figure available in print form)
WORKSHEET 1