A single PDF document produced in R Markdown, which includes:
1. code used to answer each question,
2. the statistical output from the code, and
3. your discussion or comment on the results in response to each question.
Make sure each question and the corresponding code chunk(s) are clearly marked and/or annotated.
In this problem set, we will use a data set on U.S. House of Representatives election returns (house.csv) to study the incumbency advantage (and more). This data set contains 7 variables, the first three of which uniquely identify each election by the state, year, and district in which it took place. The next four columns provide the vote totals for the Democratic and Republican candidates (vote_D and vote_R) and indicate whether each candidate is an incumbent (inc_R, for example, takes the value 1 if the Republican candidate is an incumbent, and 0 otherwise).
- (5 points) Successful rendering of an .Rmd file into a PDF, with a proper document title.
- (5 points) Download house.csv and read it into R. Before doing so, don’t forget to remove everything from the Global Environment and set the working directory.
- (5 points) Within the data set, create a new variable, called dem.pct, which reflects the Democratic vote percentage in each congressional election, defined as 100 × vote_D / (vote_D + vote_R).
- (5 points) Similarly, create a new variable, called gop.pct, which reflects the Republican vote percentage in each congressional election, by adapting the formula used for dem.pct.
- (5 points) Calculate the overall mean of dem.pct.
- (5 points) Calculate the overall median of dem.pct.
- (5 points) Compare the mean and median just calculated. What can you infer from the difference?
- (5 points) Calculate the mean of dem.pct for elections in which a Democratic incumbent is running.
- (5 points) Calculate the mean of dem.pct for elections in which no Democratic incumbent is running.
- (5 points) Calculate the mean of dem.pct for elections in which neither party has an incumbent running.
- (5 points) Compare the three quantities calculated above. What do they suggest about how incumbents perform in elections? Any conclusion you draw should be based on evidence borne out by the data.
- (10 points) Compute the difference between the mean Democratic vote share in elections with a Democratic incumbent and the mean Democratic vote share in open-seat races (i.e., races where neither party has an incumbent running). Does the difference between these two quantities reflect an effect of having a Democratic incumbent (as opposed to having no incumbent) on the Democratic vote percentage? Why or why not?
- (15 points) Plot two histograms: one for Democratic vote share in open-seat elections and the other for Democratic vote share in elections with a Democratic incumbent. Label the histograms properly.
- (5 points) In one of the histograms, there appears to be some “lumping” at one end of the plot. Can you speculate about what kind of elections these were? Why doesn’t the other histogram seem to have the same lumping?
- (5 points) Calculate the median of dem.pct for the midterm election in 2006.
- (5 points) Calculate the median of dem.pct for the midterm election in 2010.
- (5 points) What can you infer from the two quantities just calculated about the Democratic Party’s political fortunes in these two elections? Specifically, did it manage to clinch a majority each time? Can you speculate about any factor(s) that contributed to the party’s electoral performance each time?
- Extra Credit (10 points) Make a two-way scatterplot where dem.pct is on the x-axis and gop.pct on the y-axis. Do you observe any relationship between these two variables? Can you explain it?
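
The vote-share definition and the incumbency subsets described in the tasks above can be sketched on made-up numbers. This is only an illustration, not a solution to the problem set. The data frame `toy` and its values are invented, and the column names `vote_D`, `vote_R`, `inc_D`, and `inc_R` are assumptions that may not match the headers in house.csv.

```r
# Toy data, NOT taken from house.csv: three made-up races.
# Column names (vote_D, vote_R, inc_D, inc_R) are assumed for illustration.
toy <- data.frame(
  vote_D = c(60000, 45000, 80000),
  vote_R = c(40000, 55000, 20000),
  inc_D  = c(1, 0, 1),
  inc_R  = c(0, 0, 0)
)

# Democratic share of the two-party vote.
# The parentheses around the denominator are essential.
toy$dem.pct <- 100 * toy$vote_D / (toy$vote_D + toy$vote_R)
print(toy$dem.pct)  # 60 45 80

# Without parentheses, R divides first and then adds vote_R,
# which is not a percentage at all.
wrong <- 100 * toy$vote_D / toy$vote_D + toy$vote_R
print(wrong)  # 40100 55100 20100

# Mean share in races with a Democratic incumbent (logical subsetting).
mean_dem_inc <- mean(toy$dem.pct[toy$inc_D == 1])

# Mean share in open-seat races (neither party has an incumbent).
mean_open <- mean(toy$dem.pct[toy$inc_D == 0 & toy$inc_R == 0])

print(c(mean_dem_inc, mean_open))  # 70 45
```

If the real data contain missing vote totals, `mean()` and `median()` return `NA` unless `na.rm = TRUE` is supplied, so it may be worth checking for missing values first.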