Yale-New Haven Teachers Institute Home

## Some More Statistical Exercises

by
James Francis Langan

#### To Guide Entry

Doing a second unit on statistics gives me the chance to update the recommendations I made last time and report the results of what I did.

I said to put statistics in the curriculum and look at the literature. Have the students do statistics. “Let’s get started.” Well I am still getting started, but here is what I have accomplished and what I have found.

Students will do statistics, they enjoy it. In fact I now believe they would find the standard deviation, especially if calculators are available. In fact, the properties of the mean and standard deviation of modified data sets can be used to study transformations on functions. I cover this in the section called “Working with the Mean and Standard Deviation”.

The section “What Happened” gives a report of my experiences in the past school year. It includes a computer program to do the steps of a classroom demonstration for drawing a histogram, finding the mean, median and mode.

I am still looking for the ideal books for students to read. I have found some statistics stories to share with students they are found in the obvious section “Some Stories”. I also found a set of materials that have the students do statistics.

The set is four 8 1/2 x 11 paper backs called Statistics by Example by the Joint Committee on the Curriculum in Statistics and probability of the American Statistical Association and the National Council of Teachers of Mathematics. The four volumes are Exploring Data, Weighing Chances, Detecting Patterns and Finding Models. The Chairman of the committee is Frederick Mosteller the professor I mentioned last year who samples his class to predict how much money it has. He also says students should be able to find out if runs of heads on coin flipping experiments or runs of dice are independent. In two of the articles he wrote for these volumes he looks at the fractional parts of stock prices to see if they are uniformly distributed, equally likely. All together there are 52 articles.

So what is statistics? What are the key ideas of statistics? Statistics is the scientific method. We do experiments, record our data and then ask, “How significant is it?” It is inductive thinking, not deductive. Since inductive thinking does not give certainty we would like to have some idea of what our chances are of being right. We calculate the probability of our results as if chance alone were responsible. If the probability is small then something else must be the cause. Here is an example.

“Turning the Tables” by Joel E. Cohen is a three page article in Statistics by Example: Exploring Data that makes this point very well. A psychologist observes people sitting at square tables in cafeterias. He counts the numbers of times people sit “opposite” each other and the number of times they sit adjacent to each other, “corner” seating. He also observes whether they are interacting or working independently and just happen to be at the same table. His counts were

____Number of pairs observed
 Corner Opposite Interacting 134 63 Noninteracting 2 16
The psychologist concluded people preferred corner seating for interacting and they preferred it because it allowed then to avoid eye contact.

What if the people had sat down at random? What is the probability of sitting at a corner? What is the probability of sitting across from one another? How many corner seatings are there? There are four corners. How many ways to sit opposite one another? There are only two opposite seating plans. Here is another way to look at it. The first person sits down, the next person has three choices two of them will result in corner seating only one will result in opposite seating. So we should expect to see twice as many corner seatings as opposite seatings. That is, out of 197 interacting pairs we expect 131 to sit at corners and 66 to sit opposite each other if they sit at random. 131 versus 134, 66 versus 63 that seems to be what one would expect. Look at the noninteracting pairs, however, out of 18 pairs we expect 12 to be corner seatings and 6 to be opposite if the seating is random. 12 versus 2 and 6 versus 16 seem to be going just the reverse of what is expected.

Something must cause people to sit opposite one another when they are working independently. Eye contact must not be the issue with the interacting pairs either since, I assume, noninteractors would be more likely to want to avoid eye contact than interactors.

Can we be sure? How do we know that the difference in the first case does not matter and the difference in the second case does matter? To answer that question we need to know about a statistic called the chi-square statistic. At this point in my experience of showing statistics to high school students I do not believe I should teach chi-square.

Our job is to give students insight into what is going on. Statistics is uncertainty. If students are to use chi-square they should know that it is a distribution, not a number one looks up in a table. These points are well covered in a number of articles in Statistics by Example. If your students show interest and ability let them use these books to find out.

For the reader who may not know how to calculate a chi-square, I did not when I read we should test to see if runs of heads were random, here is the formula.

The sum for i=1 to n of terms of the form

(Oi Ð Ei)2/ E2 Where Oi is the i-th observed count Ei is the i-th count expected by probability Perfect agreement with chance would give a chi-square of zero. How probable one’s chi-square is when it is not zero is determined by looking up a value in a chi-square table.

### Some Stories

Statistics means both data and analysis of that data. The history of mathematical analysis of statistics begins around 1890 with Karl Pearson. Much development of statistics took place between the World Wars but it was not until World War II that statistics became a tool of war. How many uniforms do you need? You have to make them before you have the personnel. When the inductee arrives he must be given his uniform and it must “fit”. Someone better have kept data on the population. If we make too many we are wasting resources that could go to some other activity. The British did not have such data at the start of the war.

One war story is the case of armor plate on aircraft. It is heavy and if it is not needed it will slow down the plane and use more fuel. A team recorded the locations of all the bullet holes in planes that had returned from battle. Then armor plate was placed at random in the places were no bullet holes were recorded. Why? Any bullet hole that was recorded was not a vital spot, the plane had returned. The planes hit in vital spots had never been seen.

The Nature of Statistics by W. Allen Wallis and Harry V. Roberts is a paper back that one could have students read. The authors claim another title for their book could be “How to Live with Statistics, Without Actually Figuring”. Their second chapter is 26 short anecdotes of cases where statistics was used to solve a problem.

One of their cases is a study of the relationship between aircraft losses and the length of time since overhaul. It was found that the number of hours lost decreased as the time increased, the opposite of what was expected. So the length of time between overhauls was allowed to increase and the maintenance system improved so overhaul would increase the chances a plane would fly rather than decrease them. This points out that statistics is the scientific method. Being observant has unexpected rewards.

Another story Wallis and Roberts tell is how the Allies estimated German industrial output and capacity from the serial numbers on captured equipment. After the war it was found the Allies had the figures sooner than the Germans and were more accurate because the Germans waited for full counts.

One last war story. After the war the occupation forces required bakers to sell bread of a minimum weight. A professor kept track of his bread weights. He found they made a normal curve but the mean was lower than called for. He reported the baker to the authorities. They spoke to the baker and the professor’s loaves now were all over the required weight. He again reported the baker to the authorities and this time the baker was prosecuted. Why? The histogram was not normal. The baker was selling the professor only loaves from the heavy end and those who were not complaining were getting the rest.

Sampling is another source of stories. Rail Roads and Airlines have found it cheaper to sample inter-line billings than to calculate the exact amounts due by working out every bill. After all it takes time to do the bills.

There is popular literature on statistics and probability available for your students. It is just less frequent than Algebra, Geometry and Topology. Look for it. I am still looking for more.

Stories do not have to come from the literature. Look around you. One example I can think of is the geysers in Yellowstone National park. How do we know when to show up to see them? How do we know when it is not worth waiting any longer? Why is Old Faithful called Old Faithful? Records are kept and standard deviations are calculated.

The literature need not be math books. Frequently someone will write Ann Landers or some other columnist bewailing some behavior they observed. Is it probable? Could it just have happened by chance?

### Working with the Mean and Standard Deviation

Insight to the meaning of the standard deviation will be gained by experience in finding standard deviations. The sets of data should be selected so as to demonstrate key ideas. Two points to investigate are what happens to the mean and standard deviation when a constant is added to all the numbers in a set of data and what happens when all the data are multiplied by a constant. This is an opportunity to give experience in coordinate transformations. In Algebra when the teacher moves a graph of an equation just by looking at its formula and the previous example many students find this to be sleight of hand. If the students are non-Algebra students they will be gaining experience for later. If the students are Algebra students not only are we providing an example to be studied but also we are integrating a statistics concept into the standard curriculum.

It is to be hoped that calculators will be provided to do these calculations. I believe some learning is just doing the calculations oneself. If it is done by computer much of the point is lost. Once the work gets boring and we have seen the point then the computer can be used.

Here are some sets and some questions to be asked with each one.

Draw a histogram for each set and find the mean and standard deviation.

 Example Mean Standard dev. 1 19,20,21 20 0.81649
Compare each of the following to example one. How is example one changed to get the new case? How did the mean and standard deviation change? Make another pair to show the same point using more numbers.

 2 -1, 0, 1 0 0.81648 3 19,20,20,21 20 0.70711 4 38,40,42 40 1.63299 5 57,60,63 60 2.44949 6 19,19,20,20,21,21 20 0.81648
Example 2 is 20 subtracted from each score in example 1. The mean has decreased by 20 but the dispersion has not changed so the standard deviation stay the same. Example 3 shows how adjoining the mean to a set of scores does not change the mean but does decrease the dispersion and the standard deviation. Examples 4 and 5 are example 1 multiplied by 2 and 3 respectively and the means and standard deviations have likewise been multiplied by 2 and 3. Example 6 shows that multiplying the frequencies by a constant changes neither the mean nor the standard deviation.

 Example Mean Standard dev. 7 3,5,8 5.33 . . . 2.05480 8 4,7,9 6.66 . . . 2.05480
Tell how examples 7 and 8 were combined to make examples 9 and 10. How did the means change? How did the standard deviations change?

 9 7,12,17 12.0 4.08248 10 12,20,21,27,32,35, 45,45,56,72 35.55 . . . 18.04384
Example 9 is the term by term sum of 7 and 8 the new mean is the sum of the given means. There does not seem to be a simple relationship for the standard deviations. Example 10 is the set of all possible products formed by taking one number from each set. The mean is the product of the two given means, but again there is no obvious relationship for the standard deviations.

After the students can find standard deviations tell them we may expect certain percentages of the data to fall within fixed numbers of standard deviations from the mean. Here would be the place to show the class a normal curve telling them the area of an interval under the curve represents the probability of a score falling in that interval. The probability is 68% that a score lies within plus or minus one standard deviation of the mean, 95% that a score is within plus or minus two standard deviations of the mean, and 99% that a score is within three standard deviations of the mean. I would stress that these percentages are dependent upon the distribution that is being considered.

### What Happened

When I wrote last year’s unit, I frequently thought of the Algebra II students I had in 1984-85. This year I taught no Algebra classes. However, I was able to do parts of the unit with an Applied Math class, students who failed the Ninth Grade Proficiency Exam.

So this year I am thinking of what you can do with non-college-prep students. Students who take long times to get started, to believe they can do the work, and finally to complete the work.

In September, full of enthusiasm from having written the unit, I started talking about probability. The concept of a common fraction is central to the Applied Math course. We did the sample space for two dice, drawing cards, and drawing marbles from bags. The students did the work, they answered the questions.

I should have continued on into statistics. Instead I went to preparing the students to pass a parallel form of the Ninth Grade Proficiency Exam, by looking at questions modeled on the exam itself. The number of students who passed it in September was small as was the number who finally passed it in June.

So we were all frustrated by the end of the year, the students much earlier than the teacher. I wanted some different, hands on activity. I decided we would do circle graphs. First we would have to learn how to use a protractor, to make angles of a given size and to measure given angles. Some students did learn to construct and measure angles. Most of the students needed much more time to even learn the concept of what an angle is and what it means to measure one. So I was still frustrated.

The text book I was using was Stein’s Refresher Mathematics by Edwin I. Stein. Chapter five was “Graphs, Statistics and Probability.” A colleague says the book is written on a college reading level. The presentation is quite sparse. There are problem sets of data to use for finding mean, median, mode and for drawing histograms. Next Stein goes on to Percentiles without any motivation or visualization technique. The book is written for someone who can read but needs to find out how to do some arithmetic task. It is not written for anyone afraid of school.

So we spent some time on histograms. The students behaved as if it were a hands on activity. We did the work in class step by step. Stein treats each topic separately. I, however, had the students draw the histogram even when the book was only asking for the mean, median and mode. Stein’s next topic was percentiles. So I wanted a way to introduce percentiles from histograms. I decided to teach cumulative frequency plots as an introduction and visualization of the median and the percentiles. So a cumulative frequency column was added to the work sheet.

Stein leaves it to the instructor to ask questions to stretch the students minds. When we were doing a histogram and finding the modes, I asked what was the probability of a particular score. The students answered the question. After seven months they knew their probability. I was pleased.

The amount of time needed to achieve proficiency was much greater than what I believed when writing the unit. We did make cumulative frequency plots, but we ran out of time before we could get to translating the frequency ordinates into percentages. The student satisfaction leads me to believe I should have started with this topic earlier.

Many of the topics of Applied Math easily tie into statistics. Changing common fractions to percents is a large part of the Applied Math course, but what does it mean, why would anyone want to do it? Well changing frequencies on the left edge of a cumulative frequency plot to percentages from zero to 100 on the right edge is concrete and visual, perhaps, even motivational.

The histograms were done as a class activity: data on board, arrange in order, make tallies, count frequencies, plot histogram, find mean, median and mode. Answers were provided step by step. I waited for the students to finish a step before I gave that answer on the board. We discussed that step before they went on to the next step and so forth. Additional problems were given for homework. Some students were motivated on their own to use color for the shading on their histograms.

While this may be arithmetic, the students seemed to enjoy the organized process. I now think finding a standard deviation could be done with such students. I need some motivation to do it, however. I do not want to do it just because the students like to do arithmetic. I would want to find the standard deviations of a number of sets of data, after making their histograms. This would give a visual meaning to the standard deviation.

With the Applied Math students the data was not grouped into intervals. The data were treated as if they were scores on tests. Usually about ten scores 20 to 50 items.

So the next time I teach Applied Math I will start with probability and statistics. I will use these subjects as occasions to use and introduce the arithmetic skills of the course.

So plotting data as histograms and cumulative frequency plots is the way to go. How do we get the data to plot? Do some experiments.

### Some Experiments

If you have time, do the paper thickness experiment in last year’s unit. There are quicker ways to generate data. Have each student write 5 numbers from one to ten on a note card. Tell them to do it “randomly”. Collect the cards and record the results. Do the experiment on a number of days. If you teach chi-square you can see if the scores really are random. Calculate the chi-square each day too.

Another activity was suggested by an example in Samprit Chatterjee’s article “Estimating the Size of Wildlife Populations”, in Statistics by Example: Exploring Data. The problem is to take a deck of cards select a number of them, record them, replace them in the deck, shuffle well, make another selection and record the number of repeats from the first selection. This simulates the technique used by ecologists to estimate the number of fish in a lake. A number of fish are captured and tagged (N1) they are then released. A short time later another fishing trip is made. The number in the second catch is N2 the number that have tags in the second catch is T. We now have a proportion. It is assumed that the ratio of tagged fish in the lake to the total number of fish in the lake is the same as the ratio of tagged fish in the second catch to the total number of fish in the second catch. So now do the same with your card data. Do you get the correct number?

Students are apt to just count the cards or tell you how many there are supposed to be in the deck if you ask the question. So give the directions first. In fact do not have any one student do both catches. Since we want data to plot have each student do only one selection of cards. Have each student make two copies of the selection they made. Having numbered each selection assign number pairs to each student to compare for repeats. With a class as small as 16 that is 120 comparisons. Do as many as you can stand. I started to do this experiment by hand. I had many questions. If you do it many times and record all the cards that you see eventually you see all the cards. It would have made more sense to count them in the first place if you repeat the draws. Should the two selections be the same number of cards? How many cards? I thought these questions would be answered by Chatteerjee’s second article “Estimating Wildlife populations by the Capture-Recapture Method” in Statistics by Example: Finding Models. What the article did do was show the answer one gets by assuming a proportion is the maximum likelihood estimate. It is a feast of combinations, finding probabilities. I found doing 16 deals for 8 comparisons was tedious, so I wrote a computer program to make all the comparisons. The number of cards in a selection may be varied from 1 to 52, but the same number will be used in all selections, it was easier to program. Playing with the computer might give a hint to some of my questions. So there is another statistics problem for me to explore.

### The Computer Programs

The programs are available on disk call the Institute office and I will get a copy to you.

The first program is “Step by Step”, it does all the arithmetic steps needed to do the work done with the Applied Math class. Read the REMark statements and you will see what is going on.

The second program, “HISTOGRAM” makes a histogram using X’s as tallies. The output will appear on the screen, but also may be printed out so as to share more examples with students than can be generated in one class period. This program could be used in conjunction with the mean and standard deviation activity.

The third program, “HOW MANY CARDS IN A DECK”, simulates the capture-recapture experiment with a deck of cards. One may have as many as 16 catches of as many as 52 cards. The program will make all comparisons for repeats and calculate the maximum likelihood estimate. If there are no repeats the program avoids division by zero and prints: “NO VALUE”.

(figure available in print form)
Here is the output of histogram. Its listing follows:

(Figure available in print form)
(Figure available in print form)
(Figure available in print form)
(Figure available in print form)
(Figure available in print form)

### Bibliography

See the books listed in last year’s unit as well as the following.

Atkinson, Anthony C. and Fienberg, Stephen E., A Celebration of Statistics: The ISI Centenary Volume. New York, N. Y.: Springer-Verlag New York, Inc., 1985.

A collection of articles for teachers. Some are about statistics and its history.

Mosteller, Frederick, et ali, Statistics by Example. Reading. Mass.: Addison-Wesley Publishing Company, 1973.

These are the four volumes I mention in this unit. Do look at them. You will find some articles your students can read independently.

Rustagi, Jagdish S. and Wolfe, Douglas A., Teaching of Statistics and Statistical Consulting. New York, N.Y.: Academic Press, Inc., 1982.

Some more background reading to see what the Statisticians believe they should be doing themselves.

Wallis, W. Allen and Roberts, Harry V., The Nature of Statistics. New York, N.Y.: The Free Press, 1965.

The paper back mentioned in the unit that you could assign for reading.