This is an analysis of financial contributions to the 2016 presidential election campaigns in the state of Pennsylvania. I chose this state because Pennsylvania is a traditional “blue wall” stronghold that had not been won by a Republican presidential candidate since the 1980s. I think the contributions here could provide interesting insights into whom people support and their patterns of donation. We will be considering contributions made between July 17, 2014 and November 8, 2016.
Through this exploration, I will try to find answers to questions like:
Firstly, we’ll load the dataset we need to use.
dim(pa_data)
## [1] 243796 18
levels(pa_data$cand_nm)
## [1] "Bush, Jeb" "Carson, Benjamin S."
## [3] "Christie, Christopher J." "Clinton, Hillary Rodham"
## [5] "Cruz, Rafael Edward 'Ted'" "Fiorina, Carly"
## [7] "Graham, Lindsey O." "Huckabee, Mike"
## [9] "Jindal, Bobby" "Johnson, Gary"
## [11] "Kasich, John R." "Lessig, Lawrence"
## [13] "McMullin, Evan" "O'Malley, Martin Joseph"
## [15] "Pataki, George E." "Paul, Rand"
## [17] "Perry, James R. (Rick)" "Rubio, Marco"
## [19] "Sanders, Bernard" "Santorum, Richard J."
## [21] "Stein, Jill" "Trump, Donald J."
## [23] "Walker, Scott" "Webb, James Henry Jr."
There are 243796 observations, 18 variables and 24 candidates in this dataset. Let us take a look at the contributions.
summary(pa_data$contb_receipt_amt)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -93308.0 15.0 27.0 102.8 80.0 10800.0
The contributions include some negative amounts, which appear to be refunds. We will remove these from our analysis. Also, while the dataset contains post-election contributions up to December 31, 2016, we will consider only contributions made up to election day for most of our analysis.
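A minimal sketch of this filtering step in base R. The date column name `contb_receipt_dt`, the `pa_post` name and the toy rows are assumptions for illustration; the code actually used is not shown in this report.

```r
# Toy stand-in for pa_data; the values are invented for illustration.
pa_data <- data.frame(
  cand_nm = c("Clinton, Hillary Rodham", "Trump, Donald J.",
              "Sanders, Bernard", "Trump, Donald J."),
  contb_receipt_amt = c(25, -100, 50, 40),
  contb_receipt_dt = as.Date(c("2016-03-01", "2016-05-10",
                               "2016-06-15", "2016-12-01")),
  stringsAsFactors = FALSE
)

election_day <- as.Date("2016-11-08")

# Keep positive amounts (drops refunds) received on or before election day.
pa <- subset(pa_data,
             contb_receipt_amt > 0 & contb_receipt_dt <= election_day)

# Post-election contributions are set aside for a separate look later on.
pa_post <- subset(pa_data,
                  contb_receipt_amt > 0 & contb_receipt_dt > election_day)

summary(pa$contb_receipt_amt)
```

Keeping the post-election rows in their own data frame avoids re-reading the file when they are examined later.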
summary(pa$contb_receipt_amt)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05 15.00 28.00 108.85 80.00 10800.00
The median contribution is $28, which indicates that people tend to donate in small amounts. Let’s look at the distribution of the contributions.
As the contributions have extreme low and high values, it is not possible to see the details in this distribution. Transforming the x scale to log10 may help us visualize the plot better.
Now the distribution looks approximately normal on the log scale, and we can see that amounts less than $100 are the most common.
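The log10 view can be sketched in base R as follows. The amounts here are simulated, and the original plot may well have been drawn with a different plotting package, so treat this only as an illustration of the transformation.

```r
set.seed(1)

# Simulated right-skewed amounts; NOT the real contribution data.
amt <- round(rlnorm(1000, meanlog = log(28), sdlog = 1.2), 2)
amt <- amt[amt > 0]

# Bin the log10-transformed amounts.
log_amt <- log10(amt)
h <- hist(log_amt, breaks = 30, plot = FALSE)

# Draw the histogram and label the x axis in dollars instead of log units.
plot(h, main = "Contribution amounts (log10 scale)",
     xlab = "Contribution amount ($)", axes = FALSE)
axis(1, at = 0:4, labels = c("1", "10", "100", "1000", "10000"))
axis(2)
```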
We shall now take a look at how the contributions looked for different candidates. I will be creating a new variable for party affiliation of the candidates here, to be able to distinguish between them.
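One way to derive such a variable is a simple lookup on the candidate name. This is a sketch under the assumption that the three labels are `democrat`, `republican` and `other`, as they appear in the grouped summaries further down; the toy rows are invented.

```r
# Candidates from the list above who ran as Democrats.
democrats <- c("Clinton, Hillary Rodham", "Sanders, Bernard",
               "O'Malley, Martin Joseph", "Lessig, Lawrence",
               "Webb, James Henry Jr.")

# Third-party and independent candidates.
others <- c("Johnson, Gary", "Stein, Jill", "McMullin, Evan")

# Toy rows standing in for the cleaned contributions.
pa <- data.frame(
  cand_nm = c("Clinton, Hillary Rodham", "Trump, Donald J.",
              "Stein, Jill", "Sanders, Bernard"),
  stringsAsFactors = FALSE
)

# Everyone not listed above is treated as a Republican.
pa$party <- ifelse(pa$cand_nm %in% democrats, "democrat",
                   ifelse(pa$cand_nm %in% others, "other", "republican"))

table(pa$party)
```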
It looks like a few candidates received most contributions. Since many of the candidates dropped out of the presidential race early on, or were not as consequential to the elections as others, going forward, it would be more productive to focus on the top six candidates. Let’s just look at the top six candidates in the next plot to see how they compare against each other.
Hillary Clinton has the highest number of contributions, followed by Sanders and then Trump. Although there are more republican candidates than democrats, 49.34% of the contributions went to Hillary Clinton alone.
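The share of contributions per candidate can be computed with `table()` and `prop.table()`. The vector below is a toy example with shortened names, so the percentages it prints are not those of the real data.

```r
# Toy vector of candidate names, one entry per contribution.
cand <- c(rep("Clinton", 5), rep("Sanders", 3), rep("Trump", 2), "Cruz")

# Count contributions per candidate, largest first.
counts <- sort(table(cand), decreasing = TRUE)

# Convert counts to percentages of all contributions.
share <- round(100 * prop.table(counts), 2)

# Show at most the top six candidates.
head(share, 6)
```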
Now let us consider where the contributors come from, i.e. which cities and towns have the most contributors.
Not surprisingly, the more populous cities seem to have more donors.
At this point, it would also be useful to add latitude and longitude variables for each contribution, to help locate the contributions geographically later in the analysis. I will use the zipcode package for this purpose.
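The join itself can be sketched with base R `merge()`. The `zip_lookup` table below is a hypothetical stand-in for the data that the zipcode package supplies, with rounded coordinates, and the contribution rows are invented.

```r
# Toy contributions; zip codes may be ZIP+4, malformed or missing.
pa <- data.frame(
  contbr_zip = c("191034567", "15213", "0", NA),
  contb_receipt_amt = c(25, 50, 500, 2700),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table (illustrative values only).
zip_lookup <- data.frame(
  zip = c("19103", "15213"),
  city = c("Philadelphia", "Pittsburgh"),
  state = c("PA", "PA"),
  latitude = c(39.95, 40.44),
  longitude = c(-75.17, -79.95),
  stringsAsFactors = FALSE
)

# Keep the first five digits so ZIP+4 values match the lookup.
pa$zip <- substr(pa$contbr_zip, 1, 5)

# Left join: unmatched zip codes keep their row with NA coordinates.
pa_geo <- merge(pa, zip_lookup, by = "zip", all.x = TRUE)
pa_geo
```

Using `all.x = TRUE` keeps unmatched contributions in the data, which makes it possible to inspect odd zip codes later rather than silently dropping them.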
Next, let us see how the supporters vary by their occupation and plot the top six. Since there are entries with occupation listed as “Information requested”, we will need to eliminate those. We will also do some data cleaning here and combine some occupations, like NOT EMPLOYED and UNEMPLOYED, into one. There is truly a lot of work that can be done here, but I will clean only a handful that seem relevant to the analysis.
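A sketch of this cleaning step, assuming the occupation lives in a column named `contbr_occupation` (a name not shown in this report) and that the cleaned value is stored in a new `occupation` column. The rows are invented.

```r
# Toy rows with inconsistent occupation labels.
pa <- data.frame(
  contbr_occupation = c("RETIRED", "NOT EMPLOYED", "UNEMPLOYED",
                        "INFORMATION REQUESTED", "ATTORNEY", " retired "),
  stringsAsFactors = FALSE
)

# Normalise case and surrounding whitespace before comparing labels.
occ <- toupper(trimws(pa$contbr_occupation))

# Treat NOT EMPLOYED and UNEMPLOYED as the same occupation.
occ[occ %in% c("NOT EMPLOYED", "UNEMPLOYED")] <- "UNEMPLOYED"

# Drop rows where the occupation was never supplied.
keep <- !is.na(occ) & occ != "INFORMATION REQUESTED"
pa_occ <- pa[keep, , drop = FALSE]
pa_occ$occupation <- occ[keep]

# Top six occupations by number of contributions.
head(sort(table(pa_occ$occupation), decreasing = TRUE), 6)
```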
About 24% of the contributors were retired, as found in this analysis. I am interested in finding out which candidates they donated to.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.55 20.00 35.00 86.56 80.00 5400.00
About 53.4% of the retired people who donated chose to contribute to Hillary.
While retired people made the most contributions, they are followed by unemployed donors. Unemployed contributors made up only about 7.8% of all contributors, yet that is a significant number given that they had no steady source of income. It would be interesting to see how they contributed.
It’s interesting to see that Bernie Sanders had overwhelming support (89.24%) from the unemployed citizens of Pennsylvania. Sanders’s popularity amongst the unemployed has been noted previously in the media as well.
Since I see a difference in choice of candidate here, let’s quickly check who was more popular with the attorneys, who formed the third largest group of supporters.
Hillary also received overwhelming support from attorneys (78%).
Next let us analyse how the contributions varied by party.
The democrats in Pennsylvania had almost 3x as many supporters as the republicans.
There are slightly more female contributors than male in this dataset.
At this point I would also like to find out how many male and female donors Hillary Clinton had. Being a woman, was she more popular with female supporters?
Hillary Clinton has almost twice as many female supporters as male ones, which is what one would have expected.
One final thing I would like to look at is the post-election contributions. Usually these are contributions made to pay off campaign debt or for re-election.
Trump received 1296 contributions post election; a possible reason for that could be in this article. The other candidates didn’t seem to have received any considerable amount of money. Hillary Clinton in fact had not a single contribution from Pennsylvania after she lost the election.
The variables that interest me the most in this dataset are the contribution amount, contribution date and contributor occupation. I will use these variables to find the amount of money candidates raised, how many supporters they had, where the supporters came from and what party, gender or occupation they belonged to.
The other features that will help support my investigations would be contributor zipcode, party affiliation of candidates and timeline of contributions made.
Yes, I created the following new variables:
The following operations were performed:
Let’s take a look at the contribution amount raised by all parties.
The democratic party raised about 1.5 times as much money as the republican party. Let us look at the contributions in more detail through boxplots. Also, since the amount raised by other parties is very small, we will focus on the democratic and republican parties only.
## pa$party: democrat
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05 14.00 25.00 88.07 50.00 5400.00
## --------------------------------------------------------
## pa$party: other
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 29.14 75.00 172.27 215.93 2700.00
## --------------------------------------------------------
## pa$party: republican
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8 25.0 50.0 168.4 100.0 10800.0
Since there are so many outliers, it is not possible to infer much from this plot, so we will transform the y axis to a log10 scale.
The median for republicans is $50 and that for democrats is $25. As we can see from the plot, the median amount for the republican party is twice that of the democratic party. While both have outliers, there are many small donations (< $1) made to democrats.
Now, let’s also take a look at how much money the candidates raised individually.
Hillary Clinton is clearly a winner here. 50% of the money raised in Pennsylvania went to Hillary alone. Let us look at the contributions for the candidates in detail.
## pa$cand_nm: Bush, Jeb
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 50.0 250.0 909.9 2000.0 5400.0
## --------------------------------------------------------
## pa$cand_nm: Carson, Benjamin S.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 25.00 50.00 94.94 100.00 5400.00
## --------------------------------------------------------
## pa$cand_nm: Christie, Christopher J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15 425 1000 1463 2700 5400
## --------------------------------------------------------
## pa$cand_nm: Clinton, Hillary Rodham
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05 15.00 25.00 109.87 90.00 5000.00
## --------------------------------------------------------
## pa$cand_nm: Cruz, Rafael Edward 'Ted'
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 25.0 50.0 101.1 100.0 8000.0
## --------------------------------------------------------
## pa$cand_nm: Fiorina, Carly
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 50.0 100.0 373.1 500.0 5000.0
## --------------------------------------------------------
## pa$cand_nm: Graham, Lindsey O.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 25 50 250 945 1375 5400
## --------------------------------------------------------
## pa$cand_nm: Huckabee, Mike
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1 25 50 244 100 5400
## --------------------------------------------------------
## pa$cand_nm: Jindal, Bobby
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.16 50.00 100.00 584.24 500.00 2700.00
## --------------------------------------------------------
## pa$cand_nm: Johnson, Gary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 35.0 100.0 241.8 250.0 2700.0
## --------------------------------------------------------
## pa$cand_nm: Kasich, John R.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.0 100.0 250.0 645.1 1000.0 2700.0
## --------------------------------------------------------
## pa$cand_nm: Lessig, Lawrence
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.98 87.50 100.00 306.52 250.00 2700.00
## --------------------------------------------------------
## pa$cand_nm: McMullin, Evan
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.00 25.00 50.00 76.28 100.00 250.00
## --------------------------------------------------------
## pa$cand_nm: O'Malley, Martin Joseph
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15.0 250.0 500.0 897.4 1000.0 5400.0
## --------------------------------------------------------
## pa$cand_nm: Pataki, George E.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 125.0 187.5 250.0 208.3 250.0 250.0
## --------------------------------------------------------
## pa$cand_nm: Paul, Rand
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 25.0 50.0 122.8 100.0 2700.0
## --------------------------------------------------------
## pa$cand_nm: Perry, James R. (Rick)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 25 250 250 745 500 2700
## --------------------------------------------------------
## pa$cand_nm: Rubio, Marco
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.0 25.0 75.0 411.7 250.0 10800.0
## --------------------------------------------------------
## pa$cand_nm: Sanders, Bernard
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 10.00 27.00 42.63 50.00 5000.00
## --------------------------------------------------------
## pa$cand_nm: Santorum, Richard J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.0 100.0 500.0 825.2 1000.0 5400.0
## --------------------------------------------------------
## pa$cand_nm: Stein, Jill
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.20 29.00 50.00 91.89 100.00 1000.00
## --------------------------------------------------------
## pa$cand_nm: Trump, Donald J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8 28.0 40.0 144.2 101.0 5000.0
## --------------------------------------------------------
## pa$cand_nm: Walker, Scott
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.0 206.5 300.0 919.0 1000.0 10800.0
## --------------------------------------------------------
## pa$cand_nm: Webb, James Henry Jr.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 50.0 100.0 250.0 445.5 375.0 2600.0
As we can infer from the boxplot above, Hillary Clinton has the lowest median contribution ($25), while Christopher Christie has the highest ($1000). In the plot, Jeb Bush has the widest interquartile box and George Pataki the narrowest, although in dollar terms the summaries above give Christie the largest IQR ($2275) and Sanders the smallest ($40). Clinton, Ted Cruz, Sanders and Trump have more donors who made large donations, compared to other candidates. Marco Rubio and Scott Walker both received a maximum donation of $10800, which is the largest among all candidates.
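The spread can also be compared directly from the quartiles printed above. This sketch copies the first and third quartiles for four candidates and computes the IQR both in dollars and on a log10 scale; using a log10 scale for the comparison is an assumption about how the boxplot axis was drawn. The two scales can rank candidates differently.

```r
# First and third quartiles copied from the per-candidate summaries above.
q <- data.frame(
  cand = c("Bush", "Christie", "Pataki", "Sanders"),
  q1 = c(50, 425, 187.5, 10),
  q3 = c(2000, 2700, 250, 50),
  stringsAsFactors = FALSE
)

# IQR in dollars.
q$iqr_dollars <- q$q3 - q$q1

# Height of the box when the amount axis is log10-scaled.
q$iqr_log10 <- log10(q$q3) - log10(q$q1)

# Candidates ordered by dollar IQR, largest first.
q[order(-q$iqr_dollars), ]
```

With the plain IQR, a box spanning $10 to $50 looks small next to one spanning $425 to $2700, whereas on a log axis what matters is the ratio between the quartiles.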
Let’s take a look at what percentage of contribution amount each candidate received within their own party.
For both parties, only a few candidates got most of the donations. Hillary Clinton received a whopping 83% of the donation amount in the democratic party, while Trump received 40% in the republican party. Five candidates within the republican party each received less than 1% of the money donated to the republican party.
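A within-party share like this can be computed by summing per candidate and dividing by the party total. The sketch below uses invented amounts and shortened names, and the `pct_of_party` column name is an assumption.

```r
# Toy contributions with shortened candidate names.
pa <- data.frame(
  cand_nm = c("Clinton", "Clinton", "Sanders", "Trump", "Cruz", "Cruz"),
  party = c("democrat", "democrat", "democrat",
            "republican", "republican", "republican"),
  contb_receipt_amt = c(100, 50, 50, 40, 30, 30),
  stringsAsFactors = FALSE
)

# Total amount raised by each candidate.
totals <- aggregate(contb_receipt_amt ~ cand_nm + party, data = pa, FUN = sum)

# Divide each candidate's total by the total of their own party.
party_total <- ave(totals$contb_receipt_amt, totals$party, FUN = sum)
totals$pct_of_party <- 100 * totals$contb_receipt_amt / party_total

totals[order(totals$party, -totals$pct_of_party), ]
```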
We shall now analyse the contributions raised by occupation.
It’s interesting to note that while unemployed people were second in the number of contributions made, attorneys contributed the second highest total amount. Let’s plot the average contribution to get a better idea.
It seems that higher-paying professions such as attorneys or doctors donate more on average.
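Count, total and average per occupation can be put side by side with `tapply()`. This is a sketch on invented rows, assuming a cleaned `occupation` column.

```r
# Toy contributions by occupation.
pa <- data.frame(
  occupation = c("RETIRED", "RETIRED", "RETIRED",
                 "UNEMPLOYED", "UNEMPLOYED", "ATTORNEY"),
  contb_receipt_amt = c(50, 100, 30, 10, 20, 500),
  stringsAsFactors = FALSE
)

# Number of contributions and total amount per occupation.
n <- tapply(pa$contb_receipt_amt, pa$occupation, length)
total <- tapply(pa$contb_receipt_amt, pa$occupation, sum)

# Combine into one table; the average is total divided by count.
by_occ <- data.frame(n = as.vector(n),
                     total = as.vector(total),
                     mean = as.vector(total / n),
                     row.names = names(n))

# Highest average contribution first.
by_occ[order(-by_occ$mean), ]
```

Looking at the three columns together shows how a group can rank high by count but low by average, which is the pattern described for unemployed donors.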
We saw that many retired people donated to Hillary Clinton, while most unemployed people donated to Bernie Sanders. Let’s take a look at how various occupations donate party-wise. Are some professions more likely to donate to democrats or republicans?
Most professions in Pennsylvania seem to donate to democrats. Only the retired people seem to be somewhat evenly split between both parties.
Let’s see how the contributions look for men and women.
Men contributed more than women in total, even though the number of contributions from women was slightly higher than from men, as seen earlier. The box plot will give a better idea of how they donated.
## pa.gender$gender: female
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05 15.00 25.00 88.21 55.00 5400.00
## --------------------------------------------------------
## pa.gender$gender: male
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.09 20.00 35.00 129.10 100.00 10800.00
The median contribution of men is 40% higher than that of women. The third quartile and the maximum donation are also higher for men, so men seem to have made more big donations than women.
It would be interesting to see how men and women contributed party wise.
Female donors seem to support democrats more than republicans in Pennsylvania. As one can infer from the plot, democrats have almost 4x as many contributions from women as republicans. Let’s analyse what share of these donations went to Hillary Clinton. Did women donate more to the democratic party because of Hillary?
75% of the women who contributed to the democratic party contributed to Hillary, which is consistent with the idea that women tended to donate more to a female candidate.
We shall now visit the contributions made in the year 2016 and look at them in more detail.
Democrats consistently led in the number of supporters every month, right up to election day. The only time republicans came somewhat close to matching the democrats’ support was in July. A possible trigger for that could be the announcement of Trump as presidential nominee.
In Pennsylvania, the democratic party had 3x as many supporters as the republican party and raised about 1.5x as much money. The number of contributions during the election year was consistently higher for democrats every month. Hillary Clinton received 83% of the total money donated within her party, while Trump received 40% of the total within his. In both parties, a few candidates received most of the donations.
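Monthly counts by party can be tabulated by formatting the contribution date as a year-month label. This sketch assumes a `Date` column named `contb_receipt_dt` (a name not shown in this report) and uses invented rows.

```r
# Toy contributions with dates already parsed as Date objects.
pa <- data.frame(
  contb_receipt_dt = as.Date(c("2016-01-15", "2016-01-20", "2016-07-04",
                               "2016-07-21", "2016-07-22", "2015-12-31")),
  party = c("democrat", "republican", "democrat",
            "republican", "republican", "democrat"),
  stringsAsFactors = FALSE
)

# Restrict to the election year.
pa_2016 <- subset(pa, format(contb_receipt_dt, "%Y") == "2016")

# Label each contribution with its month and cross-tabulate by party.
pa_2016$month <- format(pa_2016$contb_receipt_dt, "%Y-%m")
monthly <- table(pa_2016$month, pa_2016$party)
monthly
```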
The median contribution amount for democratic party was $25 while that for republican party was $50. In general, men contribute more than women.
Retired people contributed the most, while attorneys had the highest average donation amount. While unemployed people were the second largest group of contributors by number, they had the lowest average donation amount.
Most retired people (53.4%) and attorneys (78%) donated to Hillary Clinton, while most unemployed supporters (89%) donated to Bernie Sanders.
Hillary Clinton received twice as many contributions from women as from men.
75% of the donations that women made to the democratic party went to Hillary Clinton.
Since most variables except the amount donated are non-numeric, it is not possible to compute correlations directly here. However, from the analysis it looks like a strong relationship exists between gender and party. Women, it seems, are more likely to donate to democrats, possibly because of Hillary.
Also, from the analysis, a strong relationship could exist between the population of a city or town and the number of donations.
Also there could possibly be a strong relationship between income and amount donated, as we have observed that higher income professions like attorneys and doctors tend to donate more on average.
Population data and salary statistics would be necessary to find an exact relationship, along with the variables that already exist in this dataset.
We have seen which candidates raised more money and what share they received within their own party. We also found that the democrats consistently led the republicans in contributions. It would be interesting to explore this relationship further. I would now like to find out how each of the candidates received these donations over time. So let’s look at some scatter plots of this data.
Note: I have used the LOESS smoothing method for smoothing the scatterplot above. We can see that Bernie Sanders had far more supporters than Hillary Clinton from January 2016 until June 2016, when Hillary was declared the presumptive presidential nominee.
On the other hand, among the republican candidates, Ted Cruz was steadily leading the campaign and Ben Carson followed as a close second, until they ended their candidacies. Trump actually had the fewest supporters until he was declared the republican presidential nominee, after which his support soared.
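For readers unfamiliar with LOESS, the idea can be sketched with base R's `loess()`. The daily counts below are simulated and the `span` value is an arbitrary choice; the smoothing in the plots above may have been produced through a plotting package instead.

```r
set.seed(42)

# Simulated daily contribution counts with an upward trend plus noise.
day <- 1:120
n_contrib <- round(50 + 0.5 * day + rnorm(120, sd = 10))

# Fit a local regression; span controls how much of the data each fit uses.
fit <- loess(n_contrib ~ day, span = 0.5)
smoothed <- predict(fit)

# Raw points in grey with the smoothed trend drawn on top.
plot(day, n_contrib, pch = 16, col = "grey60",
     xlab = "Day", ylab = "Number of contributions")
lines(day, smoothed, lwd = 2)
```

A smaller `span` follows short-term swings more closely, while a larger one gives a smoother long-run trend.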
Let’s take a look now at where the supporters came from on a map. Before plotting the zipcodes on the map, I will check if there are any odd datapoints here.
## contbr_zip contbr_city party cand_nm
## 13698 75014 PARIS democrat Clinton, Hillary Rodham
## 29305 75014 PARIS democrat Sanders, Bernard
## 32424 61462 ILLINOIS republican Trump, Donald J.
## 32528 85710 TUCSON republican Huckabee, Mike
## 37944 75017 PARIS democrat Clinton, Hillary Rodham
## 40845 75014 PARIS democrat Clinton, Hillary Rodham
## contb_receipt_amt zip city state latitude longitude
## 13698 25.00 75014 Irving TX 32.76727 -96.77763
## 29305 50.00 75014 Irving TX 32.76727 -96.77763
## 32424 267.29 61462 Monmouth IL 40.91885 -90.64466
## 32528 250.00 85710 Tucson AZ 32.21329 -110.82559
## 37944 500.00 75017 Irving TX 32.76727 -96.77763
## 40845 25.00 75014 Irving TX 32.76727 -96.77763
## contbr_zip contbr_city party cand_nm
## 288 99999 SPRINGBROOK TWP. republican Trump, Donald J.
## 290 99999 BEDMINSTER republican Trump, Donald J.
## 1695 <NA> PITTSBURGH republican Walker, Scott
## 1703 0 GLENMORE republican Walker, Scott
## 10097 17373 DALLASTOWN republican Trump, Donald J.
## 10098 17373 DALLASTOWN republican Trump, Donald J.
## contb_receipt_amt zip city state latitude longitude
## 288 98.34 99999 <NA> <NA> NA NA
## 290 48.61 99999 <NA> <NA> NA NA
## 1695 2700.00 <NA> <NA> <NA> NA NA
## 1703 500.00 0 <NA> <NA> NA NA
## 10097 150.00 17373 <NA> <NA> NA NA
## 10098 100.00 17373 <NA> <NA> NA NA
As you can see, some of the zipcodes are not listed, or are outside PA, even though the state is listed as PA. Since this dataset is meant to cover contributors from Pennsylvania only, there is possibly some error here and the data needs more cleaning. For now, I will ignore the zipcodes outside of Pennsylvania (39 were found).
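One way to separate these records is to use the `state` column that came from the zip code lookup. This is a sketch on invented rows; the names `odd_zips` and `pa_map` are assumptions.

```r
# Toy rows after joining the zip code lookup; state is NA when unmatched.
pa_geo <- data.frame(
  contbr_zip = c("75014", "19103", "99999", NA, "15213"),
  state = c("TX", "PA", NA, NA, "PA"),
  stringsAsFactors = FALSE
)

# TRUE only when the zip code resolved to a Pennsylvania location.
in_pa <- !is.na(pa_geo$state) & pa_geo$state == "PA"

# Records to inspect by hand, and records safe to plot on the map.
odd_zips <- pa_geo[!in_pa, ]
pa_map <- pa_geo[in_pa, ]

nrow(odd_zips)
```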
Let’s now plot the contributions on the map.
While there is some overplotting in the major city regions, it is still evident from the map that the sparsely populated areas, which include small towns, contribute more to republicans, while the cities seem to have more contributions made to the democratic party.
From the map, it can be observed that people in cities or areas with a larger concentration of population donated more to democrats, while sparsely populated regions had more republican donors.
Bernie Sanders led Hillary Clinton in supporters throughout, until Hillary was declared the presidential nominee. Trump, on the other hand, had almost no support in comparison to other candidates, but his support soared after his nomination.
It’s both necessary and interesting for campaigns to find out what demographic their supporters belong to. Hillary Clinton received support from retired people who donated the most money and also from attorneys who formed the third largest group of supporters. Yet it is worthwhile to note that unemployed people didn’t support her and chose to contribute to Bernie Sanders instead.
Hillary Clinton received 83%, while Trump received only 40%, of the donations within their respective parties. The fact that there were more republican candidates in the race likely split the republican donations.
Bernie Sanders was leading with the maximum number of contributions until he ended his run. After that, Hillary’s support soared closer to the election. Trump, on the other hand, had the lowest contributions until others dropped out of the race and he was declared presidential nominee. While his support rose after his nomination, it dwindled closer to the election.
The presidential election of 2016 was both controversial and interesting and it threw a surprise in the end, when Donald Trump won states nobody was expecting him to. I chose to analyse the financial contributions made to presidential campaigns in Pennsylvania, as it was a democratic stronghold state and the contribution trend in this state would be interesting to discover.
I wasn’t familiar with the gender, zipcode and ggmap packages and spent some time learning to use them for this analysis. I also spent a lot of time improving the aesthetics of the plots and adding features like hover for one particular scatterplot using the plotly package.
There are 8370 unique occupations listed in this dataset, many of which imply the same profession. I spent some time trying to clean this information up, but had to limit myself to only a chosen few occupations, as it was a time-consuming task.
It would have really helped to have county information for every contribution in this dataset. Since it is not present, it would be a wonderful addition to obtain this information, possibly through web scraping, and include it in the dataset. It might help discover new insights on donation trends in counties, especially those like Luzerne (home to Wilkes-Barre) or Erie that swung from blue to red this election.
It would also be a great idea to find the correlation between income and contributions made, if income data for various occupations in Pennsylvania can be obtained through census data. Similarly, another area to explore would be whether densely populated counties make more donations than sparsely populated ones.
Finally, this analysis is only for the state of Pennsylvania. It would be interesting to analyse campaign data for swing states like Florida or Ohio. To gain even more useful insights, it would help to expand the analysis to the nation as a whole and study the data for various candidates.