The primary objective of this section is to take a set of data and see both how the values center, by determining the mean, median and mode, and how they vary, by determining the range and making a box plot. The basic strategy is to have students gather classroom information and then work with it repeatedly, finding the indicators in which we are interested.
The structure I plan to use in gathering classroom material for this work has three levels. The first level is to have each student keep a daily record of class attendance and his or her own grades. Form A is a possible example of an individual form to use. The space near the bottom can be used for the student's individual grades. Level two of classroom data gathering is the daily maintenance of a wall chart on which will be recorded the percent of class attendance, an average of class grades when grades are given, and the room temperature. See Forms B, C and D as possible examples. The third level of data gathering is more flexible and consists of the assignment and test grades obtained from students’ work in sections II and III.
The basic time frame is to have students keep the individual forms and the wall chart for a period of about four weeks. After the students understand what is expected of them, keeping the individual forms and the wall chart will take only a few minutes each day. During that four-week period classroom time will mostly be spent on statistical topics such as those given in sections II and III. I think of this part as statistically cultivating the classroom soil.
For some students keeping the individual forms will be a fairly easy task; for others it will be difficult even to keep track of the forms. Both for the individual forms and the wall chart, the percent of attendance must be calculated. To do this we need a count of who is present and who is absent. How do we count a student who is excused, say to attend a student government meeting? Is that student absent or present? It is a decision that needs to be discussed and made. During the process of keeping these daily records many such questions may come up. Discussing these questions and coming to an agreement as a class will help students develop a better understanding of what it means to gather data. There is leeway. How solid are the numbers we read in the newspapers, magazines and books?
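For a class that wants to check its daily wall chart figure with a computer, here is a minimal Python sketch of the percent-of-attendance calculation. The function name, the `count_excused_as_present` switch, and the head counts are all hypothetical, invented only for illustration; the switch stands for whatever rule the class agrees on for excused students.

```python
def attendance_percent(present, absent, excused, count_excused_as_present=True):
    """Percent of attendance for one day, rounded to one decimal place.

    count_excused_as_present is a hypothetical switch standing for the
    class's own decision about how to count excused students.
    """
    enrolled = present + absent + excused
    if enrolled == 0:
        raise ValueError("no students enrolled")
    counted_present = present + (excused if count_excused_as_present else 0)
    return round(100 * counted_present / enrolled, 1)


# Hypothetical day: 22 present, 2 absent, 1 excused for a meeting.
print(attendance_percent(22, 2, 1, count_excused_as_present=True))   # 92.0
print(attendance_percent(22, 2, 1, count_excused_as_present=False))  # 88.0
```

The same day yields 92 percent or 88 percent depending on the rule chosen, which is the kind of leeway worth discussing with students.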
But what does this have to do with the primary objective of this section, which is to develop skill in finding measures of central tendency and variation for a given set of data? It is generating the sets of data we are going to use in finding the mean, median, mode and range and from which we will make box plots. What sets of data? Each set of classroom grades from assignments and tests, the final set of percent of attendance for each student, and the set of temperature readings is a set of data the students understand well, and for each we can find the measures we want. First we find them together in class, then students take sets of data and find measures on their own either in class or for homework.
Students can bring in their own sets from the areas in which they are interested, or sets of data can come from the World Almanac or any area of concern. The number of sets used depends on the class situation, but for each set used the values are ordered, the mean, median, mode, and range are determined and a box plot is made. Let’s look at the statistical concepts involved.
The mean is the sum of the values of a set divided by the number of values and is the average with which most students are familiar.
Example: {200, 30, 125, 92}
Mean = (200 + 30 + 125 + 92) / 4 = 447 / 4 = 111.75
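As a quick check on the hand calculation, the definition of the mean can be sketched in a few lines of Python. The variable names are my own choice for illustration.

```python
# The example set from the text.
values = [200, 30, 125, 92]

# Mean = sum of the values divided by the number of values.
mean = sum(values) / len(values)

print(mean)  # 111.75
```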
The median may be defined as the number in the middle after the values have been put in order. There are always the same number of values above it as below it.
Example: {2,4,6,8,10}
Median = 6
If the number of values is even, there isn’t a middle value, so you take the mean average of the two middle numbers.
Example: {2,4,6,8,10,12}
Median = (6 + 8) / 2 = 7
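The two cases for the median, an odd or an even number of values, can be sketched as a small Python function. The function name `median_by_hand` is hypothetical; it simply follows the steps described above of ordering the values and then taking the middle one or the mean of the two middle ones.

```python
def median_by_hand(values):
    """Median found the way it is done on paper: order, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the set has no values")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return ordered[mid]
    # Even count: take the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_hand([2, 4, 6, 8, 10]))      # 6
print(median_by_hand([2, 4, 6, 8, 10, 12]))  # 7.0
```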
The mode is the value found most frequently. None of the three sets of data given above have a mode since each value was used only once.
Example: {2,2,4,8,7,7,3,2}
Mode = 2
Example: {7,9,3,7,5,9,1}
Mode = 7 and 9
The last example has two modes and is called bimodal.
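Finding the mode amounts to counting how often each value occurs. Below is a minimal Python sketch, assuming the convention used in this section that a set in which every value occurs only once has no mode. The function name `modes` is my own; it returns a list so that a bimodal set can report both of its modes.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value was used only once, so by this section's
        # convention the set has no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 2, 4, 8, 7, 7, 3, 2]))  # [2]
print(modes([7, 9, 3, 7, 5, 9, 1]))     # [7, 9]
print(modes([2, 4, 6, 8, 10]))          # []
```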
The average to choose is the one that best serves your need. A shopkeeper would be interested in knowing the mode of the sizes of shirts he sold so he would know how to order. If you were considering buying a house in a particular neighborhood, knowing the median income for the families who live there might be the most helpful average to know. If you were interested in baseball you would watch the batting averages of your favorite players. Here the mean is used. Each average gives different information. It gives a different view of the data.
Consider the information in the table below on the XYZ Plant Incomes. Find the mean, median and mode.
Mean = $183,000 / 9 = $20,333 (rounded to the nearest dollar)
Median = $12,000
Mode = $12,000
Incomes for the XYZ Plant
Position | Income
owner | $60,000
manager | 40,000
worker | 15,000
worker | 15,000
worker | 12,000
worker | 12,000
worker | 12,000
helper | 9,000
helper | 8,000
If you were the owner and wanted to show how well you paid you would say your plant paid an average salary of $20,333. If you were a worker who wanted an increase you would say that the average wage was $12,000.
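The three averages for the XYZ Plant can be checked with Python's standard `statistics` module. This is only a sketch for verifying the hand work; the variable names are my own.

```python
import statistics

# Incomes for the XYZ Plant, in dollars.
incomes = [60000, 40000, 15000, 15000, 12000, 12000, 12000, 9000, 8000]

mean_income = statistics.mean(incomes)      # total of 183,000 divided by 9
median_income = statistics.median(incomes)  # middle of the nine ordered values
mode_income = statistics.mode(incomes)      # most frequent value

print(round(mean_income))  # 20333
print(median_income)       # 12000
print(mode_income)         # 12000
```

The owner's figure and the worker's figure both come from the same nine numbers; only the choice of average differs.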
To look at the variation in this set of data we want to find the range and make a box plot. The range of a set of data is the difference between the largest value and the smallest. Using the data from the XYZ Plant above, the range equals $60,000 - $8,000 = $52,000.
Range = largest value - smallest value
The range gives us an indication of how far the data is spread. It is an indicator of variation, but it does not give us any information about how the individual values are distributed or how they vary. For this a box plot can be helpful.
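To illustrate why the range alone says little about the distribution, here is a small Python sketch. The second data set is hypothetical, invented only for comparison: it has the same lowest and highest values as the XYZ incomes but is spread evenly between them.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


# Incomes for the XYZ Plant, in dollars.
xyz_incomes = [60000, 40000, 15000, 15000, 12000, 12000, 12000, 9000, 8000]

# Hypothetical comparison set: same extremes, values evenly spaced.
even_spread = [8000, 14500, 21000, 27500, 34000, 40500, 47000, 53500, 60000]

print(value_range(xyz_incomes))  # 52000
print(value_range(even_spread))  # 52000
```

Both sets have a range of $52,000, yet most of the XYZ incomes are bunched near the low end while the comparison set is not.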
(figure available in print form)
The box plot is a quick way of seeing how the data is distributed. There is a lot of information presented. To construct a box plot, draw a line and impose a scale on it that will include your lowest and highest values. Next plot your values by putting an x above the proper number. If there is more than one of the same value, stack them.
Next draw a light dotted line down indicating the median value. For our data it is $12,000. Next put a dot indicating the lowest and highest values. There are only two more numbers we have to find, the upper and lower quartiles. The upper quartile is the median of the data above the set median. The lower quartile is the median of the data below the set median. For our example, the set median is $12,000. There are four values above it and four below. The upper four values are 15,000, 15,000, 40,000 and 60,000, so their median is (15,000 + 40,000) / 2 = 27,500; this is called the upper quartile. Doing similarly for the lower four values (8,000, 9,000, 12,000 and 12,000), we get (9,000 + 12,000) / 2 = 10,500 for the lower quartile. Mark lines indicating the quartiles and draw a box going from the upper quartile to the lower quartile as shown. Finally draw a line going from the upper quartile end of the box to the highest value, and similarly a line going from the lower quartile end to the lowest value. These end lines are called the whiskers of the box plot. This is quick and easy to do once you have learned the pattern.
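The five numbers needed for a box plot (lowest value, lower quartile, median, upper quartile, highest value) can be sketched in Python as follows. This sketch assumes the method described in this section, in which each quartile is the median of the values below or above the set median; software packages often use other quartile rules and may give slightly different answers. The function names are hypothetical.

```python
def middle(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Lowest, lower quartile, median, upper quartile, highest."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower_half = ordered[:half]
    # With an odd count, the set median itself belongs to neither half.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return {
        "lowest": ordered[0],
        "lower_quartile": middle(lower_half),
        "median": middle(ordered),
        "upper_quartile": middle(upper_half),
        "highest": ordered[-1],
    }


# Incomes for the XYZ Plant, in dollars.
incomes = [60000, 40000, 15000, 15000, 12000, 12000, 12000, 9000, 8000]
summary = five_number_summary(incomes)
print(summary["lower_quartile"])  # 10500.0
print(summary["median"])          # 12000
print(summary["upper_quartile"])  # 27500.0
```

The box runs from the lower quartile to the upper quartile, and the whiskers run out to the lowest and highest values.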
Taking a look at some of the information presented in our box plot, notice that the line in the center area of the box is the median, which tells us that half the values are greater than or equal to it and half the values are less than or equal to it. The upper end of the box divides in half the frequencies of the upper-valued data, and similarly the lower end of the box divides the lower-valued data. The shape of the box will change as the set of data changes. It is a model that makes it easy to talk about distributions. One can easily say and understand things like, “There are 2 values on the lower whisker of this box whereas the last one we did had 6.”