Yale-New Haven Teachers Institute Home

## Some Statistical Exercises

by
James Francis Langan


Accountants, actuaries, administrators, agronomists, anatomists, anthropologists, archaeologists, astronomers, biologists, buyers for stores, chemists, cliometricians, consumers, demographers, economists, educators, epidemiologists, gamblers, geneticists, industrial engineers, lawyers, managers, market researchers, material strength testers, medical researchers, military strategists, oceanographers, paleontologists, physicists, politicians, product safety engineers, psychologists, purchasing agents, researchers, sales persons, sociologists, television producers, weather forecasters. Do any of those titles describe you? Are any of those jobs ones you hope to have? How many? All of these occupations use statistics: they collect statistics, make decisions on the basis of statistics, persuade others with statistics, or are the object of statistics.

Our first activity is, “Do you know what all those jobs are?” Use a dictionary and give complete-sentence definitions of each job. Can you name any other users of statistics? Can you tell how statistics are used by these people?

Here are some other fields that use statistics: communications and control theory, cybernetics, information theory, game theory, operations research, systems analysis.

So statistics is a useful subject. That is why we are asked to expose our students to it. By this activity our students come to see mathematics around themselves. This activity allows the use of nonmath skills in the math class. We are called upon to integrate writing into all our subjects; here is an opportunity.

While we are using our dictionaries, let us find the definition and etymology (another word for students to look up) of statistics. What word do you hear in statistics? State. Historically, statistics was the collection, organization and interpretation of census data, data about the nation. Today statistics uses any type of data.

Those interested in precise language may be interested to know that what many people call statistics are really data. Baseball “statistics” that give the number of times at bat or the number of hits, and football “statistics” that give the number of yards gained, are giving data. A statistic is a number that is calculated to summarize a set of numbers. So a batting average is a statistic.

When I think of statistics, I think of averages and standard deviations. What do others think of or use? One list, the result of a survey of political science journals, reported these seven topics in order of use: relative frequency, frequency, mean, correlation coefficient $r$, index numbers, $r^2$ and, lastly, standard deviation and variance.

Frequency is the count, the number of times something happens. We will see frequency in the dice activity and in the measurement project. Relative frequency is a fraction or a percent. A probability is a relative frequency. These two concepts are in the paper. I do not use the term relative frequency in the paper, however. I think probability is enough of an introduction. The vocabulary would be an intrusion. The student who knows a probability will have no trouble calculating a relative frequency at a later date.

As I said, the average is what I think of when I say statistics. What good is an average? I sometimes think some teachers just ask for it as a problem using two operations. Textbooks emphasize three properties of the mean: the mean is the most probable score; the mean is the number about which the sum of the deviations will be zero; and the mean is the number about which the sum of the squares of the deviations will be a minimum. This was the motivation for this project. I wanted to find how to demonstrate these properties to students. The third is a calculus problem, so I did not write out the proof. The Fair Game activity is my attempt to answer these questions.

The correlation coefficient $r$ and $r^2$ are topics for a full course development. I want topics that can be covered in short periods of time as breaks from the normal flow of the course.

Index numbers are an arithmetic activity that some teachers might want to explore. When I read the calculations I thought my arithmetic students could do it, but I wondered how I could get them to do it. What would motivate index numbers?

The standard deviation and variance go with the mean. The mean gives us a measure of central tendency, but what is the spread? The standard deviation gives us the dispersion. I have difficulty including standard deviation in an introduction. True, it is of fundamental importance, but the formula is intimidating and not self-evident. Yes, we need a measure of dispersion, but why such a complicated one? That is not a question to answer in an introduction. If we can get the students to see a need for a measure of dispersion we will have done a good job. The measuring project is an effort to answer this desire.

If teachers want to give students practice in evaluating formulas and following directions, the standard deviation formula might be a candidate. Students can feel successful by working out a complicated formula, in class, in competition with their neighbors. Telling the class the formula is from a more advanced class would be all the motivation needed.
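For a teacher who does want to hand the class the formula, one common form is sketched here. It is the population standard deviation of $n$ scores; the symbols $n$, $x_i$, $\bar{x}$ and $\sigma$ are notation assumed for this sketch, and some texts divide by $n - 1$ instead of $n$.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
$$

In words: find the mean, subtract it from each score, square each difference, average the squares, and take the square root.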

I wrote the units as if I were talking to my class. The Algebra class I had this past year is the one that came to mind most often. That does not mean this is for Algebra II students. If the dialogues were transcriptions there would have been interruptions. I would have had to check to see if the students understood what had been said. Students would have had questions that might have led to other issues. That is why I think of these stories as science fiction, not as a put down, but as a story that begins with certain assumptions. When these activities are done in a classroom, time will be called for to wait for calculations to be performed and for measurements to be made and recorded. All this time may cause the activity to take a number of days, which in turn will call for review. Maybe it will become too long.

As you read this, think of how you could modify it for your students. Try some different strategies. Finding probabilities when you have a sample space is no different from asking, “What fraction of the set is the circled subset?” If the question is asked in terms of probability, some students will see it as useful rather than as just fractions.

I did make the proposal of the dice game in class one day. The game is you get a dollar for each point appearing on the die. The students could see that some amount would have to be paid for the privilege of playing the game. However, the discussion turned to how bookies make money. The students were surprised that a bookie would take bets on any side.

How you use the material will depend upon your teaching style. I want discussions with my students. I need a blackboard and chalk. The measurement project will require metric rulers and the students’ texts. A form for recording the data would be helpful if it will take more than one day to collect and present the data.

The list of jobs and the page on shooting dice could be copied as handouts for the students. Since I think in terms of discussion, the questions in the articles are only suggestions. If you are going to use them as handouts you might ask your students to add questions of their own.

While I am telling the reader to try new things, it is easier said than done. Historically probability came before statistics; statistics uses probability to interpret data; I was taught probability before statistics; and my algebra text has probability in it but not statistics per se. So I wrote this paper starting with probability. If you are interested in doing something different, go directly to the article Measuring Paper Thickness. It is the article I see as the most statistical. It will call for the most student participation.

So let us get on with the activities. Our first was Word Wealth, using the dictionary on the job list. Our next activity is Role Playing.

Since statistics is concerned with real world problems we can share such problems with our students. The problems can be presented as role playing situations. To solve problems we must understand what is being asked for. To understand a problem we must ask questions. In role playing situations the students may be the managers who have to make the decisions. The students need not solve the problem, but asking the proper questions will be considered successful completion of the activity.

Consider this scenario. You are the manufacturer of pantyhose. At the start of each month you service your equipment. You run a batch of raw material. You know how much raw material is used each month so you know how many pairs you should have at the end of the month.

The problem is that you never have as many pairs at the end of the month as were predicted at the start of the month. It is always less, and by a good amount too.

Some of your fellow managers believe it is employee pilfering. They set up traps, but find nothing. They decide to call in a team of psychologists to find the coo-coo they believe is doing the stealing.

Can you save your firm from being told it has a suspicious mind, that in fact no stealing is going on?

The idea for this story came from How to Use (and Misuse) Statistics by Gregory A. Kimble. The psychologists were called in, but what they found was a “nonrepresentative” sample. When the machines were clean they produced a finer thread than later in the month, after some loss of efficiency. So as the month went on the thread got thicker, taking more thread to make a pair of hose.

Problems like these are opportunities for students to think. One aspect of applying mathematics to real life situations is to know what is being attempted and what the assumptions are. These activities allow opportunity for discussion. Allow the students time to think. If they think you are asking riddles they may just wait for you to tell them the answer.

This idea, role playing, needs more work. There are books of applications of math to real life problems; however, the mathematics is at too high a level for students who have not had calculus. I think some of those problems can be presented to the class up to the solution step: all discussion and definition of the problem.

So we have two reasons for teaching statistics: it is used in jobs, and it provides interesting problems. However, we cannot teach a full year's course in statistics; there is not enough time in the schedule. Furthermore, if a full year course were to be offered, a text would be purchased and followed, and there would be no need for these units. These units are to show places where statistics may be integrated into the normal math curriculum. The teacher is already doing statistics; it only needs to be mentioned.

### Shooting Dice

(figure available in print form)
The chart above lists all the ways that two dice can come up. A list of all the ways something can end up is called the Sample Space for that game or experiment or whatever.

The probability of something is the ratio of the number of ways the thing can happen to the number of things that can happen. Let us find the probability of a one on the first die, P(one on first die). How many points in the table have a one come first? ___ Put a box around them. How many points are there altogether? ___ So the ratio of number of ways to have one on first die to number of ways the dice can come up is 6 to 36, which we reduce to 1/6. So P(one on first die) = 1/6.

So how do we find the probability of an event? First make your sample space, the list of everything that can happen. Then circle the set of things you want to happen. Count the number of things in your set, count the number of things in the whole sample space. The ratio of the first number to the second number, what I want over what can happen, is my probability. It is a fraction so reduce it if possible.

How would you describe the points in the upside down T box? So what is the probability of getting a 3 on the first die or a 6 on the second? How many points are in the box? How many points are there altogether? So P(3 on first die OR 6 on second die) is 11/36.

How would you describe the event in the oval? So what is the probability of getting a 3 on the first die and a 6 on the second? P(3 on first die AND 6 on second die) = 1/36.
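The counting recipe above can also be checked by machine. What follows is a minimal sketch in Python, not part of the classroom activity; the names `SAMPLE_SPACE` and `probability` are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair (first die, second die), 36 points in all.
SAMPLE_SPACE = list(product(range(1, 7), repeat=2))

def probability(event):
    """Ratio of the points we want to all the points; `event` tests one point."""
    favorable = [point for point in SAMPLE_SPACE if event(point)]
    return Fraction(len(favorable), len(SAMPLE_SPACE))  # reduced automatically

p_one_first = probability(lambda p: p[0] == 1)
p_either = probability(lambda p: p[0] == 3 or p[1] == 6)   # the upside down T box
p_both = probability(lambda p: p[0] == 3 and p[1] == 6)    # the oval

print(p_one_first, p_either, p_both)  # 1/6 11/36 1/36
```

The OR event has 11 points rather than 12 because the point (3, 6) sits in both the row and the column and is counted only once.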

Do people usually say, “I have a 3 on my first die and a 6 on my second”? Of course not. What do they say? They add the 3 and the 6 and say: “I have a nine.” So make a new sample space chart, but this time instead of (1,1) put 2 and instead of (2,3) put 5 and so forth for all 36 points.

Use your new sample space to find these probabilities.

P(2), P(3), P(4), P(5), P(6), P(7), P(8), P(9), P(10), P(11), P(12). Make a bar graph of your probabilities from above. Have the scores from 2 to 12 across the bottom and the probabilities go up the side.
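Students should build this chart by hand, but a teacher may want an answer key. Here is a small Python sketch that counts the ways to throw each total and prints a sideways bar graph; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 points give each total from 2 to 12.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each total, kept as an exact fraction.
sum_probability = {total: Fraction(count, 36) for total, count in sorted(ways.items())}

# Sideways bar graph: one asterisk for each way of throwing the total.
for total, count in sorted(ways.items()):
    print(f"{total:2d} | {'*' * count} {count}/36")
```

The bars run sideways here only because that is easy to print; on paper the scores still go across the bottom.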

### Multiplying Polynomials

Here is a way to change the dice sample space into a multiplication problem. First let us look at a different way to multiply two binomials. (2a + 3)(5a + 4). First make a two by two square with 2a + 3 across the top and 5a + 4 along the side. Each of the boxes is filled in by multiplying the term at the top by the term at the side. Then combine like terms.
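Since the printed figure is not reproduced here, the square for this product can be sketched as follows. This is a reconstruction from the description above, with $2a + 3$ across the top and $5a + 4$ down the side.

$$
\begin{array}{c|cc}
\times & 2a & 3 \\ \hline
5a & 10a^2 & 15a \\
4 & 8a & 12
\end{array}
\qquad
(2a + 3)(5a + 4) = 10a^2 + 15a + 8a + 12 = 10a^2 + 23a + 12
$$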

(figure available in print form)
We get the same answer as we normally would. Now use this square table method to multiply $x + x^2 + x^3 + x^4 + x^5 + x^6$ by itself.

(figure available in print form)
Look at the chart. If the x’s were erased inside the boxes, what would it look like? Look at your sample space for throwing two dice. How many ways can you throw 7 with two dice? ___ What is the coefficient on $x^7$ in the expansion? ___ So what do the exponents stand for? What do the coefficients stand for? ___

This is the idea of generating functions. Since computers can multiply polynomials they can calculate probabilities if they have the correct generating function. So the generating function for the number of ways two dice can come up is $(x + x^2 + x^3 + x^4 + x^5 + x^6)^2$, where the coefficient on $x^n$ tells the number of ways $n$ can be thrown.

In fact if you let $x = 1$ in the generating function you will find the total number of ways the dice can come up: $(1 + 1 + 1 + 1 + 1 + 1)^2 = 6^2 = 36$.
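To show what "computers can multiply polynomials" might look like, here is a short Python sketch. It stores a polynomial as a list of coefficients, with the position in the list standing for the exponent; the function name `multiply` is an assumption of this sketch.

```python
def multiply(p, q):
    """Multiply two polynomials given as coefficient lists (index = exponent)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b   # x^i times x^j lands in the x^(i+j) box
    return result

one_die = [0, 1, 1, 1, 1, 1, 1]      # x + x^2 + x^3 + x^4 + x^5 + x^6
two_dice = multiply(one_die, one_die)

for total, ways in enumerate(two_dice):
    if ways:
        print(f"x^{total}: {ways} ways")

print(sum(two_dice))  # 36, the same as letting x = 1
```

The double loop is the square table method: every term across the top is multiplied by every term down the side, and like terms are combined as they are found.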

The point of this activity is to show how math is interconnected. Ideas in one course appear in another. I certainly do not claim I am teaching generating functions, only that generating functions and multiplying polynomials can be mentioned at an early date. This is part of a math course called Combinatorics. Some people call combinatorics counting without counting. Look at the generating function: it counts the number of ways the dice can come up, but we did not actually count 1, 2, 3 to get the answers.

### A Fair Game

What is a fair game? One in which you have just as much chance of winning as your opponent. If money were involved we would say it was a fair game if neither of the players lost any money, at least if they played long enough. If a gambler offered us the following game, what would he expect us to pay before each turn (ante up) to make it a fair game? Here is the game. We have one die. If we throw a one the gambler pays us one dollar, if we throw a two the gambler pays us two dollars, and so forth up to six dollars. Obviously the gambler will not do this for nothing, and we will not do it if there is no hope of us making any money. If he charged six or seven dollars we would pay and have no chance of a profit.

Let us use probability to look at the game. A die has six faces each numbered one through six. So P(1),P(2),P(3),P(4),P(5),P(6) are all 1/6. So one sixth of the time we can expect a one to turn up, and the gambler pays one dollar. One sixth of the time a two will show up and the gambler pays two dollars and so forth. If we were to pretend the gambler had to pay the same amount every time then he would expect to pay

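$$
\tfrac{1}{6}\text{ of }\$1 + \tfrac{1}{6}\text{ of }\$2 + \tfrac{1}{6}\text{ of }\$3 + \tfrac{1}{6}\text{ of }\$4 + \tfrac{1}{6}\text{ of }\$5 + \tfrac{1}{6}\text{ of }\$6.
$$

Using the distributive law we factor out the 1/6, getting

$$
\tfrac{1}{6}\left(\$1 + \$2 + \$3 + \$4 + \$5 + \$6\right) = \tfrac{1}{6}\left(\$21\right) = \$3.50.
$$
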
So if we pay him \$3.50 as our ante and he expects to pay out \$3.50 each time, everything will balance and it will be a fair game. If a 1, 2, or 3 comes up we lose from 50¢ to \$2.50; if a 4, 5, or 6 comes up we win from 50¢ to \$2.50. Our wins balance off our losses so it is a fair game.

Here are some vocabulary words to go with the ideas above. The \$3.50 is also called the mathematical expectation of the game. We get the mathematical expectation of a game by multiplying the probability of each payoff by the value of each payoff and then adding up all the products. If we were to drop the dollar signs, we could think of the numbers on the dice as scores and the 3.5 would be the expected score on one die. It is impossible to get 3.5 on one die. Think about two dice: 3.5 is expected on the first and 3.5 is expected on the second, so 7 is expected on both of them together. Does that make sense?

Let us use the rule for mathematical expectation and see if it gives seven too. From our sample space for two dice we know the probabilities: P(2) = 1/36; P(3) = 2/36; P(4) = 3/36; P(5) = 4/36; P(6) = 5/36; P(7) = 6/36; P(8) = 5/36; P(9) = 4/36; P(10) = 3/36; P(11) = 2/36; P(12) =1/36. To find the expected score multiply the score by its probability:

2/36; 6/36; 12/36; 20/36; 30/36; 42/36; 40/36; 36/36; 30/36; 22/36; 12/36.

Add them up, getting 252/36, which is seven. So we get the same mathematical expectation two different ways. Furthermore, we know from our sample space for two dice that seven is the most probable score. So for two dice the mathematical expectation is also the most probable score or result.
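The rule "multiply each score by its probability and add" is easy to check with a few lines of Python. This is only a sketch; `expectation`, `one_die` and `two_dice` are names chosen for the illustration.

```python
from fractions import Fraction
from itertools import product

def expectation(distribution):
    """Add up score times probability over every possible score."""
    return sum(score * prob for score, prob in distribution.items())

# One die: each face has probability 1/6.
one_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Two dice: each of the 36 points adds 1/36 to the probability of its total.
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + Fraction(1, 36)

print(expectation(one_die))   # 7/2, that is 3.5
print(expectation(two_dice))  # 7
```

Exact fractions are used so that the answers come out as 7/2 and 7 with no rounding.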

There is another way to look at the mathematical expectation. When we found \$3.50 as the value of the game, what did we do? We added the payoffs or scores and then divided the answer by 6, the number of scores. That is called the mean average. So the mean average is the most probable score.

There is still more to say about the mean average. Let us look at the one die game where the value of the game, the mean average of the points, the ante, was 3.50. Here is the payoff chart:

| Point | Payoff | Ante | Net |
|---|---|---|---|
| 1 | \$1 | \$3.50 | -\$2.50 |
| 2 | \$2 | \$3.50 | -\$1.50 |
| 3 | \$3 | \$3.50 | -\$0.50 |
| 4 | \$4 | \$3.50 | \$0.50 |
| 5 | \$5 | \$3.50 | \$1.50 |
| 6 | \$6 | \$3.50 | \$2.50 |

Negative numbers mean we lose. Positive numbers mean we win.

Add up our net and we get zero. For every gain there was a balancing loss. The numbers in the Net column were calculated by subtracting the \$3.50 from the payoff. The \$3.50 is the mean; the net numbers are known as the deviations from the mean, and the sum of the deviations from the mean is zero.

Let us use some algebra. We have a set of numbers. We will subtract an unknown number $a$ from each of the given numbers, add up all the differences, and set the sum equal to zero. What is $a$ when we solve for it?

(figure available in print form)
If you know about sigma notation for sums, you could do the same proof for any set of numbers, not just the 3 numbers 2, 6, and 7.
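Because the printed figure is not reproduced here, the algebra can be sketched as follows for the three numbers 2, 6, and 7 named in the text. This is a reconstruction, and the general version uses $n$, $x_i$ and $a$ as assumed notation.

$$
(2 - a) + (6 - a) + (7 - a) = 0
\;\Longrightarrow\;
15 - 3a = 0
\;\Longrightarrow\;
a = \frac{15}{3} = 5
$$

$$
\sum_{i=1}^{n} (x_i - a) = 0
\;\Longrightarrow\;
\sum_{i=1}^{n} x_i - na = 0
\;\Longrightarrow\;
a = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

In both cases $a$ turns out to be the sum of the numbers divided by how many there are, which is the mean.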

So we have seen two properties of the mean. It is the expected or most probable score, and it is the number the deviations from which sum to zero. There is a third property. If we squared the deviations from the mean and added the squares up, the answer would be smaller than if we used any number other than the mean.

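| Case | Number | Deviation | Square |
|---|---|---|---|
| I (deviations from the mean, 5) | 2 | 2 - 5 = -3 | 9 |
| | 6 | 6 - 5 = 1 | 1 |
| | 7 | 7 - 5 = 2 | 4 |
| | | Sum of squares | 14 |
| II (deviations from 6) | 2 | 2 - 6 = -4 | 16 |
| | 6 | 6 - 6 = 0 | 0 |
| | 7 | 7 - 6 = 1 | 1 |
| | | Sum of squares | 17 |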
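Two cases do not prove the property, but students can try as many other numbers as they like. Here is a small Python sketch for the same three numbers; `sum_of_squares` and the list of trial values are choices made for this illustration.

```python
def sum_of_squares(numbers, a):
    """Add up the squared deviations of the numbers from the trial value a."""
    return sum((x - a) ** 2 for x in numbers)

numbers = [2, 6, 7]
mean = sum(numbers) / len(numbers)

# Try several values on both sides of the mean and compare the totals.
trial_values = [4, 4.5, 5, 5.5, 6]
for a in trial_values:
    print(a, sum_of_squares(numbers, a))

best = min(trial_values, key=lambda a: sum_of_squares(numbers, a))
print("smallest total at", best)
```

For these three numbers the total works out to $14 + 3(5 - a)^2$, which is an algebra-only way to see that nothing beats $a = 5$.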

### Measuring Paper Thickness

In this activity we will find the thickness of a page in our text book, by use of a ruler! The purpose is to show that all measurements are estimates, that the measurements cluster around a central value, and that there is a spread to the measurements around the central value, but as we go farther from the central value there are fewer measurements.

This story will take a number of days. I have not tried it with students yet so I do not know how many days will be needed. I can anticipate some delays. It takes longer to do the measuring than one would think. How are the measurements of all the students going to be shared? Provide the students with data recording and calculating forms. Have a box at the bottom edge to be filled in with the five student thicknesses. You may then stack the forms on a copier so only the results show getting all the class data on one page. Having all the results the class can then arrange them in order, set up intervals, do frequency counts, think about what they have and so on.

That implies we do the measuring one day and the analyzing the next. However, we must be sure the students know how to use a ruler. What are the marks, mm or cm? What are the numbers? So first have the students do some gross measurements. Measure the length, width, and thickness of their text including the covers. I can hear students saying, “We did this in general science.” Maybe they will be able to use a ruler.

Since we want to show that measurements are estimates, we need more than one measurement to show the variation. When I tried this with the seminar the adults felt one measurement was enough. When the measurements of others were pointed out the response was “I’m right, they’re wrong.” That is why I ask the students to do me a “favor” in the script. If you get two different values you are not going to say that you are wrong. Do not pass out the recording forms until you have agreement that a number of measurements need to be done.

So you know how thick the pack is. How many sheets are in the pack? Are you going to count them? My, you are ambitious. Here is a chance for mathematics to do the work for us. How many page numbers are there on each sheet of paper? So we will be dividing by two. What will be divided by two? Pretend we had one sheet with the numbers 6 and 7. If we subtracted we would get one, but dividing by two would give us a half, so we will have to do something to the one. Add one and then divide by two. Try it with two sheets; pretend the numbers are 7, 8 and 9, 10. Ten take away seven makes 3, add one makes 4, divide by two makes 2, like it should. So if BPN stands for “back page number” and FPN stands for “front page number”, then the number of sheets in the pack will be (BPN - FPN + 1)/2.
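If a computer is handy, the sheet count and the thickness of one sheet could be worked out along these lines. This Python sketch is only an illustration; the function names are made up, and the 8.5 mm pack running from page 7 to page 206 is a hypothetical example, not a real measurement.

```python
def sheets_in_pack(front_page_number, back_page_number):
    """Number of sheets in the pack: (BPN - FPN + 1) / 2."""
    pages = back_page_number - front_page_number + 1
    sheets, leftover = divmod(pages, 2)
    if leftover:
        raise ValueError("a pack of whole sheets must hold an even number of pages")
    return sheets

def sheet_thickness_mm(pack_thickness_mm, front_page_number, back_page_number):
    """Thickness of one sheet: pack thickness divided by the number of sheets."""
    return pack_thickness_mm / sheets_in_pack(front_page_number, back_page_number)

print(sheets_in_pack(7, 10))  # 2, the two-sheet check from the text

# Hypothetical pack: 8.5 mm thick, from page 7 to page 206 (100 sheets).
print(sheet_thickness_mm(8.5, 7, 206))
```

The error check catches a pack whose first and last page numbers do not make whole sheets, which usually means a page number was misread.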

Go to it. Record your data, do your calculations. Now before you tell me your results, please do me a favor. Do it again, using different page numbers at the beginning and end. You came up with different answers each time. Why? Some variation may be due to the difference in pressure when the pack was held for measuring. The difference in an individual’s measurements will be due to the precision of the ruler. A mm mark is thicker than a sheet of paper. How good are you at estimating distances between the marks?

Accuracy is how close you are to the true value. How can you tell how accurate you are if you do not know the true value? If we could get our measurements closer together then we would say they were more precise. If our measurements agreed for a certain number of decimal places we could say the measurement was accurate to that number of places. How can we get our measurements closer together? Use more precise instruments. A ruler with finer and more frequent markings, a vernier caliper or a micrometer would give more accurate results. There would still be variation but farther out in the decimal places.

That might well be one day’s work. We established the need to make a number of measurements. The next day pass out the data form and collect the measurements. Some teachers might have the class fill out the forms as homework. If a computer is available a program could be written so all the students had to do was enter the data and the machine would calculate the thicknesses.

We have all these measurements. What are we going to do with them? What is the smallest measurement? What is the largest? How can we tell? If the numbers were arranged in order we might see some pattern. If a computer or calculator was used we will have so many decimal places that rounding will be called for. We have all these measurements; which one is right? Which one is in the middle? That is called the median. Which one appears most frequently? That is called the mode. To see what other questions can be asked we need a set of data. Here are my results in summary form. Since my measurements seem to say the thickness is 0.08xx mm, I tried to determine the next decimal place, so I grouped the data in intervals of length .0010.

I expect that someone will suggest calculating the mean.

| Measurement interval (mm) | Tally | Frequency | Interval mid-value | Frequency times mid-value |
|---|---|---|---|---|
| .0800–.0809 | | 0 | .0805 | |
| .0810–.0819 | `******` | 6 | .0815 | .4890 |
| .0820–.0829 | `*` | 1 | .0825 | .0825 |
| .0830–.0839 | `*********` | 9 | .0835 | .7515 |
| .0840–.0849 | `*******` | 7 | .0845 | .5915 |
| .0850–.0859 | `************` | 12 | .0855 | 1.0260 |
| .0860–.0869 | `**********` | 10 | .0865 | .8650 |
| .0870–.0879 | `***` | 3 | .0875 | .2625 |
| .0880–.0889 | `**` | 2 | .0885 | .1770 |
| .0890–.0899 | `*****` | 5 | .0895 | .4475 |
| .0900–.0909 | | 0 | .0905 | |
| .0910–.0919 | | 0 | .0915 | |
| .0920–.0929 | `*` | 1 | .0925 | .0925 |
| .0930–.0939 | `*` | 1 | .0935 | .0935 |
| Total | | 57 | | 4.8785 |

Mean = (4.8785)/57 = .085588
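The arithmetic in the summary chart can be checked with a short Python sketch. The mid-values and frequencies are copied from the chart; the variable names are assumptions of this sketch.

```python
# (interval mid-value in mm, frequency), copied from the summary chart.
table = [
    (0.0805, 0), (0.0815, 6), (0.0825, 1), (0.0835, 9), (0.0845, 7),
    (0.0855, 12), (0.0865, 10), (0.0875, 3), (0.0885, 2), (0.0895, 5),
    (0.0905, 0), (0.0915, 0), (0.0925, 1), (0.0935, 1),
]

total_count = sum(freq for _, freq in table)
weighted_sum = sum(mid * freq for mid, freq in table)   # frequency times mid-value
grouped_mean = weighted_sum / total_count

# The interval with the most measurements, and its share of all measurements.
modal_mid, modal_freq = max(table, key=lambda row: row[1])
relative_frequency = modal_freq / total_count

print(total_count, round(weighted_sum, 4), round(grouped_mean, 6))
print(modal_mid, modal_freq, round(100 * relative_frequency), "%")
```

Using the mid-value of each interval treats every measurement in the interval as if it sat in the middle, so the grouped mean, near .0856 mm, is an estimate of the mean of the raw measurements.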

We can treat the chart as a sample space. What is the probability that a measurement will be between .0850 and .0859? 12/57. If we did all the measurements again it is doubtful we would get the same ratio so instead of probability we ask for the relative frequency of measurements between .0850 and .0859. 12/57 or 21%.

Ask each student to compare his individual scores against the class average. Were they higher, lower or equal to the class average? Count the number of times for each case. Most students will be either high all the time or low all the time. This is called bias. It shows that some used more pressure than others when they did the measuring, but they were consistent. Bias is to be expected in measurement; our job is to be aware of it.

Ask each student to compute the average of his scores. You will find the range of the averages will be less than the range of the scores.

The asterisks give us a histogram. The normal way to show a histogram is to have the equal interval along the x-axis and vertical bars rising up for the frequency.

We may be able to go farther, but I think we would overwhelm our students if we did so. We have seen an application of statistics, a need to use statistics, and the statistical techniques. A large amount of data had to be organized. The central tendency gave us an idea as to what the correct measurement was. The dispersion gave us an idea of how close we were to being correct.

I would like to hear how it goes when you try it with your class. Ask the questions. Teachers ask questions, students answer them. If students do not answer one question ask them another question.

### Summary and Conclusion

In this paper we have seen six activities covering many statistical concepts.

1. Word Wealth: Jobs and subjects using statistics.
2. Role Playing: Do the assumptions fit the facts?
3. Shooting Dice: Sample space and probability.
4. Multiplying Polynomials: An example of a generating function.
5. A Fair Game: Mathematical expectation and the properties of the mean.
6. Measuring Paper Thickness: Collecting and organizing data, median, mode, mean, frequency, accuracy, precision.

The Word Wealth, Shooting Dice, and Measuring Paper Thickness activities can be done with any class. Classes at different levels will take different amounts of time to do the job, because of different levels of efficiency in deciding what to do and then doing it. Different classes have different levels of interest and attention spans. How the activities are sold when they are introduced will determine the interest of the class. If Word Wealth is presented as “Look these words up” it will go over as “busy work”, “boring!” Emphasize doing something different.

Do try putting statistics in your curriculum. Have the class generate its own data. Work for class participation in active activities. We all learn more by doing than by listening.

Do look at the literature. There are authors presenting statistics as an interesting subject. Texts are being written that engage the student. One example is sampling. One college professor has his class count its money, then he takes a random sample and predicts the class total. How big a class is needed?

There is a lot that can be done. Let’s get started. Let me know about your results.

### Bibliography

These are books that I can recommend to both teachers and students. These books are not texts, they do not make demands upon one’s math ability.

Huff, Darrell, How to Lie with Statistics. New York, N.Y.: W.W. Norton & Company, Inc., 1954. Most of the authors who write statistics books have read this one. It may have caused some people to believe statistics is lying.

Huff, Darrell, How to Take a Chance. New York, N.Y.: W.W. Norton & Company, Inc., 1959. More of the same. Both of these books are readable and entertaining. The illustrations really sell these books. Everyone should read them.

Kline, Morris, Mathematics in Western Culture. New York, N.Y.: Oxford University Press, 1964. Read all of it. Chapters XXI-XXIV involve probability and statistics. Find out who Graunt, Petty and Quételet were.

Kramer, Edna E., The Main Stream of Mathematics. New York, N.Y.: Fawcett World Library, n.d., Copyright 1951 by Oxford University Press, Inc. Chapter 8 covers statistics.

Slonim, Morris James, Sampling, A Quick, Reliable Guide to Practical Statistics. New York, N.Y.: Simon and Schuster, 1966.

Steen, Lynn Arthur, Editor, Mathematics Today: Twelve Informal Essays. New York, N.Y.: Springer-Verlag, 1979.

Steen, Lynn Arthur, Editor, Mathematics Tomorrow. New York, N.Y.: Springer-Verlag, 1981.

Weaver, Warren, Lady Luck: The Theory of Probability. Garden City, N.Y.: Anchor Books; Doubleday and Company, Inc., n.d., Copyright 1963 by Educational Services Incorporated.

Youden, W.J., Experimentation and Measurement. New York, N.Y.: Scholastic Book Services, 1962, Copyright 1962 by the National Science Teachers Association, Inc. The book that motivated the measurement unit. It introduces the statistics of measurement, the normal curve, accuracy, bias, precision.