In the statistics classes that I teach, we open the course with data production and collection. I have often felt that this section was "not math-y enough", and I have worried about emphasizing it, because the material is very accessible and can give students a false sense of what the rest of the class will be like. I have come to see that this section is very important. When seen as the setup for the rest of the year, it can be taught in a manner that foreshadows future learning and builds good process habits at the outset of the class.
All maths can be seen as emanating from measurement, or quantification. The ways that we choose to quantify inform our data collection, processing and analysis. Therefore, from the outset, we need to think carefully about what we want to know, how we can find it out and how we can quantify it. Given any topic of interest that a student proposes, we begin by asking our framing questions:
What can be measured? What are the tools needed? What are the units of measurement?
From the outset, teaching will include the vocabulary of sampling and experimental procedures. Variables can be quantitatively measured (pounds lifted), or categorized and counted (girls who performed a memory task successfully). For example, perhaps a student expresses interest in body-building and exercise. We can ask our framing questions: What can be measured? (pulling or lifting strength); What are the tools needed? (weights, sophisticated lab tools, pushups); What are the units of measurement? (pounds or kilograms if measuring weight lifted, or number of pushups). These are simple questions that lead to the next set of questions: How could data be collected? Should we use an observational approach or an experiment?
We can design simple studies. In another example, if we want to observe gender differences in performing memory tasks, do we need to study equal numbers of boys and girls? Can we observe just the students in our class? Can we prove causation? Working in small groups or with partners who listen and respond with questions, students should plan and refine their studies. Students should then produce a written reflection on how their study evolved, noting any improvement in the results that came from how the questions were framed.
Exploratory analysis of our initial data should help us to see patterns in our results. By quantifying the results and organizing them into tables or graphs, we may see patterns that lead us to predict outcomes or to ask further questions. In particular, we often ask: How unusual are these results? The question of usualness leads us to the study of variability, and in turn to probability.
In addition, each step of organizing the problem guides the steps that follow: the type of data determines the comparison, the display, the formula and the probability model. Categorical data should be compared using proportions and displayed in bar graphs; quantitative data should be displayed in histograms. The standard deviation of a sampling model for categorical data is computed with the formula sqrt(p(1-p)/n), where p is the proportion and n is the sample size, while the standard deviation of a sampling model for quantitative data uses the formula sx/sqrt(n), where sx is the sample standard deviation. Probabilities for categorical data are calculated with the normal model; for quantitative data, we use t-models.
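To make the two formulas concrete, here is a minimal Python sketch of each one. The function names and the sample numbers are my own illustrations, not data from a real class study.

```python
import math


def sd_sampling_proportion(p, n):
    """Standard deviation of the sampling model for a proportion: sqrt(p(1-p)/n)."""
    if n <= 0 or not 0 <= p <= 1:
        raise ValueError("need n > 0 and 0 <= p <= 1")
    return math.sqrt(p * (1 - p) / n)


def sd_sampling_mean(s_x, n):
    """Standard deviation of the sampling model for a mean: s_x / sqrt(n)."""
    if n <= 0 or s_x < 0:
        raise ValueError("need n > 0 and s_x >= 0")
    return s_x / math.sqrt(n)


# Hypothetical categorical example: half of 100 students
# complete a memory task successfully.
print(f"proportion: {sd_sampling_proportion(0.5, 100):.3f}")

# Hypothetical quantitative example: 36 students lift weights,
# and the sample standard deviation is 12 pounds.
print(f"mean: {sd_sampling_mean(12.0, 36):.3f}")
```

With these made-up numbers, the first result is 0.05 (a proportion) and the second is 2 pounds. Students can change n to see how a larger sample shrinks both values.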