This section describes in detail the main statistical, graphical, economic, and environmental concepts that will be discussed in this unit. Terms that may be included in a vocabulary list are italicized. These concepts are referred to in the individual lesson plans. The economic and environmental concepts are related to the statistical and graphical applications that will be used to study the behavior of, and relationships between, natural resources and human populations. References to resources for data sets involving the economic and environmental concepts are provided in the individual lessons.
Statistical Concepts
*Data* is measured or observed information. The two main *data types* are *quantitative* and *qualitative*. Quantitative data is numerical, while qualitative data is non-numerical. The four common *data types* are nominal, ordinal, interval, and ratio measurements. *Nominal data* is purely qualitative and essentially categorical, i.e., it has no order or magnitude. *Ordinal data* can be qualitative or quantitative and has an implicit order, but the differences between entries are not meaningful. *Interval data* is purely quantitative, and the difference between two data entries has meaning. *Ratio data* is interval data in which a zero implies a quantity of zero for some measurement or observation (Larson & Farber, 2000). Comparison of the characteristics of data types can be eased by use of a matrix¹.
Data summary measures of central tendency include the *mean*, *median*, and *mode*. The mean is the sum of all data divided by the size of the data set. The median is the middle entry of an ordered data set of odd size n; if n is even, the median is the mean of the two middle entries of the ordered data set. The mode is the value with the highest frequency.
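The three definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the small `water_use` data set are made up for this example and do not come from the lessons.

```python
from collections import Counter

def mean(data):
    # Sum of all data divided by the size of the data set.
    return sum(data) / len(data)

def median(data):
    # Middle entry of the ordered data (odd n), or the mean of the
    # two middle entries (even n).
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(data):
    # Every value tied for the highest frequency (a data set can have
    # more than one mode).
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Hypothetical daily water use in litres (made-up values).
water_use = [120, 135, 135, 150, 160, 175, 210]

print(mean(water_use), median(water_use), modes(water_use))
```

Returning a list from `modes` is a design choice for this sketch, since a data set may have several values that share the highest frequency.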
Data summary measures of dispersion, spread, or variability include the *range*, *variance*, *standard deviation*, and *interquartile range*. In an ordered data set, the lowest value is the *minimum* and the largest value is the *maximum*. The range is the difference between the maximum and the minimum. The variance (of a sample of size n) is the sum of the squared differences of each data point from the mean, divided by one less than the size of the data set. The standard deviation is the square root of the variance². The interquartile range is found by first separating the ordered data set into four *quartiles*, each of the same size. The divisions between the quartiles are known as Q₁, the median (Q₂), and Q₃. The interquartile range (IQR) is Q₃ - Q₁. Taken with the maximum and minimum, the median and quartiles are collectively known as the *five-number summary*, which is useful for creating box plots.
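A minimal sketch of these measures follows. It assumes the "median of each half" convention for Q₁ and Q₃, leaving the median itself out of both halves when n is odd; other textbooks and software use slightly different quartile rules, so results may differ. The data values are made up for illustration.

```python
from math import sqrt

def median(ordered):
    # Expects an already ordered list.
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def sample_variance(data):
    # Sum of squared differences from the mean, divided by n - 1.
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

def five_number_summary(data):
    # Assumed convention: Q1 and Q3 are the medians of the lower and
    # upper halves of the ordered data.
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

data = [2, 4, 4, 5, 7, 9, 11]  # made-up values

minimum, q1, q2, q3, maximum = five_number_summary(data)
print("range:", maximum - minimum)
print("variance:", sample_variance(data))
print("standard deviation:", sqrt(sample_variance(data)))
print("IQR:", q3 - q1)
```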
A *frequency distribution* is a tabular representation of the *frequency* of a finite number of categories, or *classes*, where the classes are intervals of data. *Frequency* is the number of data entries in a class. The types of distributions include *normal or symmetric*, *skewed right*, *skewed left*, and *uniform or flat*. In a normal distribution, all measures of center coincide. In a skewed-left distribution, mean < median < mode, and the “peak” of the data occurs on the right side of the distribution. In a skewed-right distribution, mode < median < mean, and the “peak” of the data occurs on the left side of the distribution³. When a frequency distribution is normal, probabilities or percentiles can be found using the standard deviation: about 68% of the data lies within one standard deviation of the mean, about 95% lies within two standard deviations of the mean, and virtually all (99.7%) lies within three standard deviations of the mean⁴.
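The sketch below illustrates two ideas from this paragraph: building a frequency distribution from equal-width classes, and checking the 68%, 95%, and 99.7% figures against the standard normal curve. The function names, the class boundaries, and the `scores` data are assumptions made for this example; classes are treated as half-open intervals (lower limit included, upper limit excluded).

```python
from statistics import NormalDist

def frequency_distribution(data, lower, width, num_classes):
    """Count the data entries falling in each equal-width class [low, high)."""
    counts = [0] * num_classes
    for x in data:
        index = int((x - lower) // width)
        if 0 <= index < num_classes:  # entries outside all classes are ignored
            counts[index] += 1
    return [(lower + i * width, lower + (i + 1) * width, counts[i])
            for i in range(num_classes)]

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

scores = [3, 7, 8, 12, 13, 14, 18, 21, 22, 29]  # made-up values

for low, high, count in frequency_distribution(scores, lower=0, width=10, num_classes=3):
    print(f"{low} to under {high}: {count}")

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```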
A *variable*, for the purposes of this unit, is a phenomenon measured or recorded as quantitative data. In this way, a data set consisting of discrete nominal categories with frequencies is considered *univariate*. A data set with two corresponding sets of data (ordered pairs) is considered a *bivariate* data set.
A *population* is the set of all outcomes or measurements. A *sample* is a part of the population. A *statistic* is a measurement of some facet of a sample, while a *parameter* measures a characteristic of the population (Larson & Farber, 2000). One of the main goals of *inferential statistics* is to use statistics to estimate parameters.
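To make the distinction between a statistic and a parameter concrete, the following sketch simulates a population, computes its mean (a parameter), and then estimates that mean from a random sample (a statistic). The population here is simulated and entirely hypothetical; the values 60 and 15 and the sample size of 100 are arbitrary choices for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated monthly household water bills
# (made-up numbers, not real data).
population = [random.gauss(60, 15) for _ in range(10_000)]
parameter = mean(population)  # population mean: a parameter

sample = random.sample(population, 100)  # a part of the population
statistic = mean(sample)                 # sample mean: a statistic

print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}")
```

In practice the parameter is usually unknown, which is why the sample statistic is used as its estimate.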
Missing data points within the range of a bivariate data set can be *interpolated*, and values beyond the range of the data can be *extrapolated*, using a linear regression formula⁵.
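As a rough illustration, the sketch below fits a least-squares line to a small bivariate data set and uses it to estimate one value inside the range of the data and one value beyond it. The function names and data are made up for this example, and the data were chosen to lie exactly on a line so the arithmetic is easy to check by hand; real data will scatter around the fitted line.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the ordered pairs (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    return slope * x + intercept

# Made-up data with no observation at x = 4.
xs = [1, 2, 3, 5]
ys = [2, 4, 6, 10]

slope, intercept = fit_line(xs, ys)
print("interpolated value at x = 4:", predict(4, slope, intercept))  # inside the data range
print("extrapolated value at x = 7:", predict(7, slope, intercept))  # beyond the data range
```

Extrapolated values deserve more caution than interpolated ones, because the linear pattern may not continue outside the observed range.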
Graphical Concepts
Graphical displays used to show the frequency of univariate data that is classified nominally include dot plots, bar graphs, pie charts, and frequency tables. A stem and leaf plot and a histogram display the frequency and distribution shape of data that is grouped into interval classes. Cumulative frequency plots or curves show how the total frequency accumulates across successive classes. Graphics that lend themselves to displaying distribution shape and variability include box plots and histograms. Graphical displays used to highlight the relationship between two interval or ratio variables include time series graphs and scatter plots. In both of these types of displays, data are considered as ordered pairs.
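One of these displays, the stem and leaf plot, can be produced as plain text. The sketch below assumes non-negative whole numbers, with the tens digit(s) as the stem and the ones digit as the leaf; the function name and data are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return stem and leaf rows for non-negative integers (stem = tens, leaf = ones)."""
    groups = defaultdict(list)
    for value in sorted(data):
        groups[value // 10].append(value % 10)
    rows = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

ages = [12, 15, 21, 21, 27, 34, 48, 49]  # made-up values

for row in stem_and_leaf(ages):
    print(row)
```

Turned on its side, the rows of leaves show the same shape a histogram with classes of width 10 would show.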
Economic and Environmental Concepts
Some of the economic and environmental concepts that students may explore include the basics of human existence on a large scale: climate, weather, freshwater, food, energy, health care, child care, housing, transportation, and taxes. Other data that may be explored include the human population of different areas, land area, production of waste and consumption of energy, carbon footprint, and water footprint. Preliminary data on, or an introduction to, the idea of natural resources may be acquired by viewing data maps and cartograms, which will reinforce geographical knowledge and give practice in spatial reasoning.
Economic concepts consist primarily of costs, which students will investigate and to which they will apply the statistics and graphics. Some basic cost categories that may be explored include household costs, such as the costs of water, food, electricity and fuel (energy), health care, and child care. These costs or rates of consumption may be compared to the known quantities of resources that are available, and conclusions or predictions about the future availability of the resources will be explored⁶.