David B. Howell
-
A. Objective — same as Lesson 9.
-
B. The experimental questions — What percent of the population is beans? How close can I get? How confident am I?
-
C. Issues, and some possible resolutions —
[Materials and procedure. A box of several thousand objects — two different kinds or colors of objects. I used dried beans and peas, less than two small packages in all, which just filled a one-quart container and, I estimated, approached 4000 objects. To mix the beans and peas thoroughly, I dumped them into a large container which I shook vigorously! Sampling was a bit less efficient than for the colored cubic centimeters; since beans and peas are different in size and shape, I couldn’t count out the same number of objects for each sample without compromising randomness. I scooped out a level teaspoonful for each sample, getting generally 25 to 30 beans and peas. It became important, then, to have a calculator to find the “percent” of beans in each sample (and later, to find the totals of combined samples).
Theoretically, one should take successive samples with replacement to meet a condition of randomness. For such a large population, however, it shouldn’t make a noticeable difference to sample without replacement. For the class that raises the issue, and if you have time, it might be worthwhile to try both methods to compare results.
The greatly reduced copies of Worksheets included here illustrate my results.
Use full-size Worksheets with the class! The particular combinations used to generate larger samples are, of course, not important.]
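[For a class that raises the replacement issue but has no time to sample both ways, a short simulation can stand in for the second experiment. The following is a minimal Python sketch, not a record of my procedure. The box of 4000 objects and the 68% bean figure are taken from this lesson as assumed inputs, the scoop size is fixed at 27 for simplicity (real teaspoon scoops ran 25 to 30), and the function name scoop_percents is made up for illustration.]

```python
import random

def scoop_percents(population, scoop_size, n_scoops, replace, rng):
    """Percent of beans in each scoop; population is a list of 1 (bean) / 0 (pea)."""
    percents = []
    if replace:
        for _ in range(n_scoops):
            # Each scoop comes from the full box: the previous scoop was returned.
            scoop = rng.sample(population, scoop_size)
            percents.append(100 * sum(scoop) / scoop_size)
    else:
        # Scoops are set aside: shuffle once, then deal out non-overlapping scoops.
        pool = list(population)
        rng.shuffle(pool)
        for i in range(n_scoops):
            scoop = pool[i * scoop_size:(i + 1) * scoop_size]
            percents.append(100 * sum(scoop) / scoop_size)
    return percents

# Assumed values for illustration only: 4000 objects, 68% beans, 26 scoops of 27.
N_OBJECTS, PCT_BEANS, SCOOP, N_SCOOPS = 4000, 68, 27, 26
n_beans = N_OBJECTS * PCT_BEANS // 100
box = [1] * n_beans + [0] * (N_OBJECTS - n_beans)

rng = random.Random(1)  # fixed seed so the run is repeatable
with_repl = scoop_percents(box, SCOOP, N_SCOOPS, True, rng)
without_repl = scoop_percents(box, SCOOP, N_SCOOPS, False, rng)

for label, pcts in (("with replacement", with_repl),
                    ("without replacement", without_repl)):
    mean = sum(pcts) / len(pcts)
    print(f"{label:>20}: mean {mean:.1f}%, "
          f"range {min(pcts):.0f}% to {max(pcts):.0f}%")
```

[Changing the seed and rerunning several times lets students see that the two methods differ from each other by about as much as either differs from itself between runs, which is the point of the remark about large populations.]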
Here are several thousand beans and peas. We could use them as models of air filters, with beans (or peas) the defective ones. Or as high school students, with beans (or peas) students who know “Dunk’em” Smith. Or who watch MTV more than one hour per week. In any case, we want to take samples so we can predict the percent of the entire population which is beans.
On your Worksheet, write the Research Question, “What percent is beans?” And in parentheses, record your estimate (guess) right now based on looking at the top layer (these are well-mixed). We’ll need to refer to your estimates later.
With the teaspoon, each of you take one sample. Count the total, and the number of beans, and record on your Worksheet. Then we’ll list all of your samples on one set of Worksheets.
[Here are my Worksheets for 26 samples]
(figure available in print form)
From these 26 samples, where N averages about 27, what would you be willing to predict about the percent of the total which is beans? Less than 50%? More than 90%? Probably in the 60’s or high 50’s, or low 70’s? How confident are you? What error tolerance will you accept?
Let’s graph the data; perhaps it will be easier to see what’s happening...
(figure available in print form)
PERCENT OF BEANS
After our experience with the cubes, we would expect that combining the small samples into larger ones would give us a clearer picture and a narrower range. Let’s do that here, combining four small samples into new samples averaging about 110 in size.
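[When samples of unequal size are combined, the percent for the larger sample comes from the pooled counts (total beans divided by total objects), not from averaging the four separate percents. Here is a minimal Python sketch of that arithmetic. The four (total, beans) pairs are made-up numbers for illustration, not entries from my Worksheets, and the function name combine is hypothetical.]

```python
def combine(samples):
    """Pool (total, beans) pairs into one larger sample.

    Returns (pooled_total, pooled_beans, percent_beans).
    """
    pooled_total = sum(total for total, _ in samples)
    pooled_beans = sum(beans for _, beans in samples)
    return pooled_total, pooled_beans, 100 * pooled_beans / pooled_total

# Made-up teaspoon samples: (objects counted, beans counted).
four_scoops = [(27, 18), (25, 17), (30, 21), (28, 19)]

total, beans, percent = combine(four_scoops)
print(f"combined sample: {beans} beans out of {total} = {percent:.1f}%")
```

[With these made-up counts the combined sample has 75 beans out of 110, or about 68.2%.]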
[Here are my results.]
(figure available in print form)
And let’s graph these on the same scale we had before.
(figure available in print form)
PERCENT OF BEANS
What are you willing to predict now?
Let’s combine again — combining groups of five of the second set of samples into a new set of 26 samples averaging about 550 in size.
[Here are my results.]
(figure available in print form)
(figure available in print form)
And graphing as before, but with one modification since the percents cluster so tightly... Let’s keep the same scale, but break each interval in half so we see each percent value.
(figure available in print form)
PERCENT OF BEANS
Now what are you willing to predict?
As we did in Lesson 9, we’ll make a Table summarizing these results in terms of the sample size N, the error tolerance E (in percent), and the “confidence level.”
[Here are my results.]
(figure available in print form)
D. Observations and discussion to Objective —
The Table above makes it clear, for the samples I took, that with a sample size of 27, 85% of the samples lie within E = 14 of the population (sample mean) percent of 68. Or, to change the point of view again, 85% of the time the population percent will be within E = 14 of whatever my sample percent is. When the sample size is increased to 110, 85% of the time the population percent will be within E = 4 of whatever my sample percent is! And when the sample size gets up to 550, 88% of the time I can predict within E = 2 of the population percent.
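[The “confidence level” entries in such a Table are simple counts: the percent of the samples whose bean percent falls within E of the center value. Here is a minimal Python sketch of that count. The list of ten sample percents is made up for illustration and is not my data; the center of 68 is the sample mean percent quoted above, and the function name confidence_level is hypothetical.]

```python
def confidence_level(sample_percents, center, e):
    """Percent of the samples whose bean percent lies within e of center."""
    hits = sum(1 for p in sample_percents if abs(p - center) <= e)
    return 100 * hits / len(sample_percents)

# Made-up sample percents, for illustration only.
made_up = [55, 60, 64, 66, 68, 68, 70, 71, 75, 84]

for e in (14, 4, 2):
    level = confidence_level(made_up, 68, e)
    print(f"E = {e:>2}: {level:.0f}% of samples within E of 68")
```

[Students can run the same count on the class Worksheets for each sample size, trying different values of E until the confidence level is one they are willing to accept.]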
[You may want to view the “confidence level” in a more technically correct way from the error side. In the last case, for example, one would say that only 12% (100 - 88) of the time will I have a sample more than E = 2 off by chance. Or...the population percent is different from my sample value, say 67%, 67 ± 2, only 12% of the time; therefore I have no reason to reject the hypothesis P = 67 ± 2 at the 12% level. But I think such a degree of technicality requires a far more sophisticated and formal background in probability, the normal curve, and hypothesis testing than is appropriate at the level of these lessons and than is necessary to establish the basic concepts as we have been doing. For a similar, intuitive, non-technical approach using box and whisker plots for 90% confidence instead of the histogram and Table techniques here, refer to Information from Samples by Landwehr et al. (Bibliography reference 6).]