David B. Howell
-
A. Objective—students will be able to describe relationships among sample size, error tolerance, and “confidence level” in a Table.
-
B. The experimental question—What percent of the colored cubes in the box are red?
-
C. Issues, and some possible resolutions —
Let’s review and extend the activity of Lesson 8 in “The Statistics Sampler.” We were concerned with the percent, or proportion, of the total population of Colsquar which was red (the residents were colored cubic centimeters). We analyzed samples, first, of size 10. Then we combined those into samples of 30, and then of 50. We formed three histograms as follows:
(figure available in print form)
(figure available in print form)
(figure available in print form)
We summarized the graphed data as follows:
Sample size | Range (percent of red) | Mean
10 | 0-40% | 13%
30 | 0-23% | 13%
50 | 6-22% | 13%
Now, based on the information in the graphs, we are going to expand the detail in the Table.
-
*1 The graph shows a total of 32 cases.
-
*2 Three cases, those beyond the 20% column of the graph, do NOT lie between 0 and 26%. Therefore, 32 - 3 = 29 cases DO lie in the interval from 0 to 26%. Since the actual population mean is equal to the mean of all possible samples of a given size, we might feel, based on our samples, 91% confident that the population mean lies between 0 and 26% red. Or, looked at from the outside, only 3 cases, or 9% of our samples, would by chance cause us to predict smaller than 0 or larger than 26% red.
-
*3 Our sample size of 10 is too small to show differences between multiples of 10%.
(figure available in print form)
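The counting in note *2 can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are ours, and the two counts (32 cases in all, 3 outside the interval) are the ones read from the graph above.

```python
# Counts read from the size-10 histogram (see notes *1 and *2).
total_cases = 32      # all the samples we graphed
cases_outside = 3     # samples beyond the 20% column

# Everything that is not outside the interval is inside it.
cases_inside = total_cases - cases_outside

# Turn the counts into whole-number percents of all 32 cases.
confidence_percent = round(100 * cases_inside / total_cases)
miss_percent = round(100 * cases_outside / total_cases)

print(cases_inside, "of", total_cases, "cases are inside:", confidence_percent, "%")
print(cases_outside, "of", total_cases, "cases are outside:", miss_percent, "%")
```

The same two steps, count the cases inside and divide by the total, produce every “confidence level” entry in the Table.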
-
D. Observations and discussion to Objective—
The expanded Table on the previous page shows clearly the inter-relationships among sample size, error tolerance, and “confidence level.” For sample size 10, 72% of our samples (23 of the 32 different samples we made) have “percent red” within E = 8 of the sample mean value, which we are treating as the population value, 13. That is, they lie between 13 - 8 = 5 and 13 + 8 = 21. For sample size 30, 72% of our samples have “percent red” within E = 6 of the population value, and 82% are within E = 8. For sample size 50, 69% (almost 72%) of our samples have “percent red” within E = 4 of the population value, and 100% are within E = 8!
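For a class with a computer handy, the cube-drawing activity can be imitated in Python. The sketch below is not our actual data. It assumes a hypothetical box of 1000 cubes of which 130 (13%) are red, draws 32 samples of each size without putting cubes back, and counts what share of the samples fall within E of 13. The function names, the box size, and the seed are all our own choices for illustration.

```python
import random

def share_within(sample_percents, center, E):
    """Fraction of the sample results that lie within E points of center."""
    hits = sum(1 for pct in sample_percents if abs(pct - center) <= E)
    return hits / len(sample_percents)

def draw_sample_percents(box, sample_size, num_samples, rng):
    """Percent red in each of num_samples samples drawn without replacement."""
    results = []
    for _ in range(num_samples):
        sample = rng.sample(box, sample_size)
        results.append(100 * sample.count("red") / sample_size)
    return results

# ASSUMPTION: a hypothetical box of 1000 cubes, 13% of them red.
box = ["red"] * 130 + ["other"] * 870
rng = random.Random(8)  # fixed seed so the run can be repeated

print("size  E=4  E=6  E=8   (percent of samples within E of 13)")
for sample_size in (10, 30, 50):
    percents = draw_sample_percents(box, sample_size, 32, rng)
    row = [round(100 * share_within(percents, 13, E)) for E in (4, 6, 8)]
    print(sample_size, row)
```

Because only 32 samples are drawn for each size, the printed percents will jump around from seed to seed and will not match our Table exactly. Raising 32 to a few thousand gives steadier values and makes the pattern easier to see: a wider E or a bigger sample both push the “confidence level” up.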
Let’s work with the language a bit. We said, “For sample size 50, 69% of our samples have ‘percent red’ within E = 4 of the population value.” Said another way, if we took just one sample of 50, there would be about a 69% chance that it would have a “percent red” value within E = 4 of the actual population value. Or another way, if we took just one sample of 50, we would be 69% confident that it would have a “percent red” value within E = 4 of the actual population value.
I can hear you now! “Hold it! Hold it!” you say. “If we take just one sample, we won’t know what the population value is. So what good does it do us to be 69% ‘confident’ our value is within E = 4 of it?”
Well, look at it from the other side. If my value is within 4 of some other value, then isn’t the other value within 4 of mine? If the other value is 13 and my value is 16, we are within 4 of each other. If my value is 9, we are still within 4 of each other. If I get 16, I’ll simply say that the other value, the value I want to predict, is between 16 - 4 = 12 and 16 + 4 = 20. 13 qualifies, doesn’t it?! If I get 9, I’ll predict 5 to 13. 13 still qualifies! And if I am 69% confident my value is within 4 of the true one, then I am 69% confident the true value is within 4 of mine.
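The “other side” argument can be written out as a short Python sketch. The function names are ours. The numbers 16, 9, 13, and E = 4 are the ones used in the paragraph above.

```python
def prediction_interval(sample_value, E):
    """The range we predict for the true value, given one sample value."""
    return (sample_value - E, sample_value + E)

def contains(interval, value):
    """True when value lies inside the interval, endpoints included."""
    low, high = interval
    return low <= value <= high

true_value = 13  # in real life we would not know this
E = 4

for my_value in (16, 9):
    interval = prediction_interval(my_value, E)
    # "My value is within E of the true value" and "the true value is
    # inside my interval" are the same statement, so these always agree.
    close_enough = abs(my_value - true_value) <= E
    print(my_value, interval, close_enough, contains(interval, true_value))
```

Trying a sample value such as 18 shows the other case: it is 5 away from 13, so the interval from 14 to 22 misses the true value, and both checks report False together.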
When we started this experiment with the little cubes, we pretended the cubes were residents of the planet Colsquar, and red residents (cubes) liked the red records we wanted to sell. We wanted to predict the percent of the population which was red. Now pretend something different. Pretend that the colored cubes are air filters. Red cubes are defective. Willie B. Ready of Awesome Auto Parts wants to predict what percent are defective with 95% confidence and error tolerance E = 3. How many filters does he need in his sample? He can get to E = 3 for a sample of 10, but only at the 44% confidence level. For a sample of 50, he can get to E = 3 with 69% confidence. Obviously, he’ll have to sample some number more than 50. But we don’t know, yet, a simple way to find that number.
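Until we have that simple way, a computer can do the slow way for us. The sketch below is not the method of this lesson. It assumes that each filter is defective with the same 13% chance, independently of the others (a binomial model, which we have not introduced), and adds up the chance that a sample of a given size lands within E = 3 of 13%. The function name and the list of sample sizes are our own choices.

```python
from math import comb

def chance_within(sample_size, true_percent, E):
    """Chance that the sample percent lands within E points of true_percent.

    ASSUMPTION: each item is defective independently with the same chance.
    """
    p = true_percent / 100
    total = 0.0
    for defective in range(sample_size + 1):
        # Whole-number form of |100 * defective / sample_size - true_percent| <= E.
        gap = abs(100 * defective - true_percent * sample_size)
        if gap <= E * sample_size:
            ways = comb(sample_size, defective)
            total += ways * p**defective * (1 - p)**(sample_size - defective)
    return total

for n in (10, 50, 100, 200, 500, 1000):
    print(n, round(100 * chance_within(n, 13, 3)), "% chance of being within E = 3")
```

The values for 10 and 50 will not agree exactly with the 44% and 69% in our Table, since those came from only 32 real samples. What the printout should make plain is how far past 50 Willie has to go before the chance gets near 95%.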
Before we describe a simple way, however, let’s go through one more model experiment to be sure we have a good idea of this whole sampling process.