Problem:
A retail company is testing a new platform to promote and sell products to business customers.
As part of the experiment to examine the effect of the new platform on sales, the following data was collected:
- 180 established business customers (or accounts) were randomly selected, and their average weekly order for the last 6 months was calculated.
- 90 randomly selected customers out of the 180 were routed to use the new platform. After six months, their average weekly order was calculated once again.
- Additional 90 new customers were routed to the new platform, and their average weekly orders were calculated after six months.
The experiment process generated three disjoint groups of customers:
Group Name | Description | #customers |
A Customers 1-90 | Assigned to new platform: The weekly order was calculated before and after the new platform For each customer, there are two observations one before and one after the new platform was deployed | 90 |
B Customers 91-180 | Did not experience the new platform: The weekly order was calculated twice by six months For each customer, there are two observations both with the old platform | 90 |
C Customers 181-270 | Assigned to new platform: The weekly order was calculated only once after 6 months on the new platform For each customer, there is one observation after the new platform was deployed | 90 |
Please note that the groups do not necessarily correlate with different populations. The populations will be defined by the research question in each study.
Timeline of experiment:
May 1, 2023 (first measure, A & B) November 1, 2023 (second measure, A, B & C)
For each business customer, additional information was collected such as business “age” (numeric), “premium” (yes/no, whether the customer subscribes to the premium delivery), “industry” (primary, secondary, tertiary or quaternary), and “location” in the US (pacific/ eastern). In addition, the variable “measure” indicates whether the calculation corresponds to the first or second measurement. For the C group, there is only one “measure” that equates to the second measurement on the other groups.
Part 1: Data Exploration
For the variable “order” in each of the specified subgroups:
- Provide descriptive statistics: mean, mean, standard deviation, and sample size
- Plot histogram and boxplot
For the variable “premium” in each of the specified subgroups:
- Populate the given frequency table (use Pivot Table or R commands to find the counts)
- Orders in the new platform.
Descriptive statistics:
New platform (Nov 1, 2023) | |
Mean | |
Median | |
Standard Deviation | |
Sample Size |
<Provide screenshot of calculations Excel, or R commands>
Graphs:
Histogram
Boxplot
<Provide screenshot of calculations Excel, or R commands>
- Orders in the old platform.
Descriptive statistics:
Old platform (May 1, 2023) | |
Mean | |
Median | |
Standard Deviation | |
Sample Size |
<Provide screenshot of calculations Excel, or R commands>
Graphs:
Histogram
Boxplot
<Provide screenshot of calculations Excel, or R commands>
- Orders of customers located in the pacific versus eastern region (New platform, Nov 1, 2023)
Descriptive statistics:
Pacific | Eastern | |
Mean | ||
Median | ||
Standard Deviation | ||
Sample Size |
<Provide screenshot of calculations Excel, or R commands>
Graphs:
Histogram by location
Boxplot by location
<Provide screenshot of calculations Excel, or R commands>
- Count of premium and no premium subscribers while on the old platform at the start (May 1, 2023).
Frequency Table:
<Provide screenshot of calculations Excel, or R commands>
- Count of premium and no premium subscribers while on the old platform and new platform on November 1, 2023?
Frequency Table:
Count/Frequency | |||
Old platform | New platform | Total | |
Yes | |||
No | |||
Total |
<Provide screenshot of calculations Excel, or R commands>
Part 2: Statistical
For each of the following research question:
- State the null and alternative hypotheses.
- Determine the appropriate hypothesis test and justify your selection.
- Run the hypothesis test and provide the R code with results.
- State the statistical conclusion
- Communicate the statistical conclusion within the context of the problem in plain English.
Use 0.05 significance level for all statistical tests.
- Is the average weekly order on the new platform greater than $5,900?
Hypotheses:
H0: _____________________________________________________
H1: _____________________________________________________
Statistical test and justification: _____________________________________________
Statistical test code and results: _____________________________________________
Statistical conclusion: _____________________________________________________
Conclusion within context: _________________________________________________
- Based on the data from May 1, 2023, is the proportion of customers that subscribe to the premium delivery while on the old platform less than 12%?
Hypotheses:
H0: _____________________________________________________
H1: _____________________________________________________
Statistical test and justification: _____________________________________________
Statistical test code and results: _____________________________________________
Statistical conclusion: _____________________________________________________
Conclusion within context: _________________________________________________
- Based on the data from November 1, 2023, when using the new platform, is there a difference in the average weekly order of customers located in the pacific versus eastern region?
Hypotheses:
H0: _____________________________________________________
H1: _____________________________________________________
Statistical test and justification: _____________________________________________
Statistical test code and results: _____________________________________________
Statistical conclusion: _____________________________________________________
Conclusion within context: _____________________________________________________
- Based on the data from November 1, 2023, is the average weekly order of customers that experience the new platform greater than those who did not?
Hypotheses:
H0: _____________________________________________________
H1: _____________________________________________________
Statistical test and justification: _____________________________________________
Statistical test code and results: _____________________________________________
Statistical conclusion: _____________________________________________________
Conclusion within context: _________________________________________________
- Based on the data from November 1, 2023, is the proportion of customers that subscribe to the premium delivery different between the old platform and new platform?
Hypotheses:
H0: _____________________________________________________
H1: _____________________________________________________
Statistical test and justification: _____________________________________________
Statistical test code and results: _____________________________________________
Statistical conclusion: _____________________________________________________
Conclusion within context: _________________________________________________
- Based on the data from group A, is the average weekly order after the new platform greater than on the old platform?
Hypotheses:
H0: _____________________________________________________
H1: _____________________________________________________
Statistical test and justification: _____________________________________________
Statistical test code and results: _____________________________________________
Statistical conclusion: _____________________________________________________
Conclusion within context: _________________________________________________