Module 8 – Introduction
This is the beginning of Unit 3. Unit 3 includes Modules 8 and 9 and concludes with the Unit 3 Checkpoint.
Before we begin this Unit, let’s see how the new ideas in this module relate to the Big Picture of Statistics.
We begin a statistical investigation with a research question. The investigation proceeds with the following steps:
- Produce Data: Determine what to measure, how to sample, then collect the data.
- Explore the Data: Analyze and summarize the data.
- Draw a Conclusion: Use the data, probability, and statistical inference to draw a conclusion about the population.
In the course up to this point, we have focused on summarizing and analyzing data for a quantitative variable. In this Unit, we focus on summarizing and analyzing the relationship between two quantitative variables. This material is still part of exploratory data analysis.
Introduction
Previously, we analyzed a quantitative variable using a graph (dotplot, histogram, or boxplot) and a numerical summary (5-number summary or mean with standard deviation).
We also examined the relationship between a categorical variable and a quantitative variable by using the categorical variable to create two groups and analyzing the quantitative variable for each group. For example, we compared credit card debt (quantitative variable) for male and female college students (categorical variable). For each student, we had two pieces of information: gender and credit card debt. In this situation we used side-by-side histograms or boxplots to compare the two groups.
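As a quick refresher, the two numerical summaries named above can be sketched in Python. The data values here are hypothetical and chosen only for illustration. Quartile conventions differ between textbooks and software, so the quartiles from `statistics.quantiles` (default "exclusive" method) may not match a hand calculation that uses a different rule.

```python
import statistics

# Hypothetical values of one quantitative variable (illustration only).
values = [2, 4, 4, 5, 7, 9, 11]

# 5-number summary: minimum, Q1, median, Q3, maximum.
# statistics.quantiles with n=4 returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(values, n=4)
five_number_summary = (min(values), q1, median, q3, max(values))

# Mean with (sample) standard deviation.
mean = statistics.mean(values)
std_dev = statistics.stdev(values)

print("5-number summary:", five_number_summary)
print("mean:", mean, "standard deviation:", round(std_dev, 2))
```

The 5-number summary is usually paired with a boxplot, and the mean with standard deviation is usually paired with a roughly symmetric histogram.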
In this Unit we examine the relationship between two quantitative variables. For each individual in the data set, we collect two pieces of quantitative information. For example, for a breakfast cereal, we could record sugar content in a serving and calories in a serving.
We use a graph called a scatterplot to graph the two pieces of information in two dimensions.
In a scatterplot one variable is the explanatory variable, and the other is the response variable. The response variable is the focus of the study, the outcome that we examine. The explanatory variable is the variable that we think may influence the response. We can think of the explanatory variable as the predictor of the response.
EXAMPLE Scatterplot
A research firm conducts a study to explore the relationship between a driver’s age and the driver’s ability to read highway signs. The subjects are a random sample of 30 drivers between the ages of 18 and 82.
(Source: Jessica M. Utts and Robert F. Heckard, Mind on Statistics [Brooks/Cole, 2002]. Original source: Data collected by The Last Resource, Inc., Bellefonte, PA.)
Because the purpose of this study is to explore the effect of age on the driver’s ability to read highway signs,
- the explanatory variable is age, and
- the response variable is the maximum distance at which the driver can read a highway sign, or maximum reading distance.
Both variables are quantitative.
Here is what the raw data look like:
In this data set, the individuals are the 30 drivers. For each driver, we have two values: age and maximum reading distance.
To explore the relationship between age and distance, we create a graph called a scatterplot. To create a scatterplot, we use an ordered pair (x, y) to represent the data for each driver. The x-coordinate is the explanatory variable: driver’s age. The y-coordinate is the response variable: maximum reading distance.
For this example, the ordered pair (18, 510) represents an 18-year-old driver who can read a highway sign at a maximum distance of 510 feet. We plot a point for each ordered pair. In the scatterplot, each driver appears as a single point.
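The pairing step can be sketched in Python as follows. This is a minimal illustration, not the study's analysis: the data values are hypothetical except for the pair (18, 510) mentioned above, and the helper names `to_ordered_pairs` and `plot_scatter` are made up for this example. The plotting helper assumes the matplotlib library is installed.

```python
# Hypothetical data for illustration only. These are NOT the study's values,
# except the pair (18, 510), which the text uses as its example.
ages = [18, 25, 37, 52, 68, 82]              # explanatory variable (x), in years
distances = [510, 490, 470, 430, 390, 360]   # response variable (y), in feet


def to_ordered_pairs(x_values, y_values):
    """Pair each individual's explanatory value with its response value."""
    if len(x_values) != len(y_values):
        raise ValueError("each individual needs both an x and a y value")
    return list(zip(x_values, y_values))


def plot_scatter(pairs, x_label, y_label):
    """Draw one point per ordered pair (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)        # explanatory on the horizontal axis
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)    # response on the vertical axis
    return fig


pairs = to_ordered_pairs(ages, distances)
print(pairs[0])  # the 18-year-old driver's ordered pair
```

Calling `plot_scatter(pairs, "Driver age (years)", "Maximum reading distance (feet)")` would then draw the graph, with one point for each driver.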
Generally, each point in a scatterplot represents one individual. However, a dot can represent more than one individual. For example, if two individuals have the same variable values, then one dot will represent them both.
The x-coordinate is the value of the explanatory variable for that individual. The y-coordinate is the value of the response variable for that individual.
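To see how one dot can stand for more than one individual, we can count how many individuals share each ordered pair. The sketch below uses hypothetical (age, distance) pairs in which two drivers happen to have identical values.

```python
from collections import Counter

# Hypothetical (age, distance) pairs; two drivers share the values (30, 480).
pairs = [(18, 510), (30, 480), (30, 480), (45, 450)]

# Count the individuals at each plotted location.
point_counts = Counter(pairs)

individuals = sum(point_counts.values())   # number of drivers in the data
distinct_points = len(point_counts)        # number of dots visible in the plot
overlapping = {point: n for point, n in point_counts.items() if n > 1}

print(individuals, "individuals,", distinct_points, "distinct points")
print("points that represent more than one individual:", overlapping)
```

When the number of distinct points is smaller than the number of individuals, at least one dot in the scatterplot represents several individuals.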
Here is the completed scatterplot:
COMMENT
The explanatory variable is on the horizontal x-axis. The response variable is on the vertical y-axis. Sometimes the variables do not have a clear explanatory–response relationship. In that case, there is no rule to follow, and either variable can go on either axis.