Financial Contributions to the 2016 Presidential Campaigns in Pennsylvania

Introduction

This is an analysis of the financial contributions to the 2016 presidential election campaigns in the state of Pennsylvania. I chose this state because Pennsylvania is a traditional “blue wall” stronghold that had not been won by a Republican presidential candidate since the 1980s. I think the contributions here could provide interesting insights into whom people support and how they donate. We will consider contributions made between July 17, 2014 and November 8, 2016.

Through this exploration, I will try to find answers to questions like:

  1. Which parties and candidates have the most support, based on the number and amount of contributions?
  2. How do men and women contribute? Does Hillary have more female supporters than other candidates?
  3. How do donations vary across various occupations?
  4. How do the contributions vary geographically? Did cities and smaller towns favor particular parties or candidates?

Univariate Plots Section

First, we load the dataset.

dim(pa_data)
## [1] 243796     18
levels(pa_data$cand_nm)
##  [1] "Bush, Jeb"                 "Carson, Benjamin S."      
##  [3] "Christie, Christopher J."  "Clinton, Hillary Rodham"  
##  [5] "Cruz, Rafael Edward 'Ted'" "Fiorina, Carly"           
##  [7] "Graham, Lindsey O."        "Huckabee, Mike"           
##  [9] "Jindal, Bobby"             "Johnson, Gary"            
## [11] "Kasich, John R."           "Lessig, Lawrence"         
## [13] "McMullin, Evan"            "O'Malley, Martin Joseph"  
## [15] "Pataki, George E."         "Paul, Rand"               
## [17] "Perry, James R. (Rick)"    "Rubio, Marco"             
## [19] "Sanders, Bernard"          "Santorum, Richard J."     
## [21] "Stein, Jill"               "Trump, Donald J."         
## [23] "Walker, Scott"             "Webb, James Henry Jr."

There are 243796 observations, 18 variables and 24 candidates in this dataset. Let us take a look at the contributions.

summary(pa_data$contb_receipt_amt)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -93308.0     15.0     27.0    102.8     80.0  10800.0

Some contributions are negative; these appear to be refunds, and we will remove them from our analysis. Also, while the dataset contains post-election contributions up to December 31, 2016, we will consider only contributions made up to election day for most of the analysis.
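A minimal base-R sketch of this filtering step, assuming the receipt date has already been parsed into a `Date`. The data frame names `pa_data` and `pa` follow the summaries above, but the rows are hypothetical toy values, not FEC records.

```r
# Hypothetical toy rows standing in for the FEC file (not real records).
pa_data <- data.frame(
  cand_nm = c("Trump, Donald J.", "Clinton, Hillary Rodham",
              "Sanders, Bernard", "Trump, Donald J."),
  contb_receipt_amt = c(50, -25, 27, 100),
  # Assumption: dates are already parsed into Date objects.
  contb_receipt_dt = as.Date(c("2016-10-01", "2016-09-15",
                               "2016-03-02", "2016-12-01")),
  stringsAsFactors = FALSE
)

election_day <- as.Date("2016-11-08")

# Keep positive amounts (drops refunds) received on or before election day.
pa <- subset(pa_data,
             contb_receipt_amt > 0 & contb_receipt_dt <= election_day)

nrow(pa)  # 2: the refund and the December contribution are dropped
```

The post-election rows can be kept aside with the opposite condition (`contb_receipt_dt > election_day`) for the post-election look later in this section.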

summary(pa$contb_receipt_amt)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     0.05    15.00    28.00   108.85    80.00 10800.00


The median contribution is $28, which indicates that most people donate small amounts. Let’s look at the distribution of the contributions.

Because the contributions include extremely low and extremely high values, the details of this distribution are hard to see on a linear scale. Transforming the x axis to a log10 scale should help us visualize it better.
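To see what the transformation does without drawing a plot, one can bin `log10(amount)` directly with base R’s `hist(..., plot = FALSE)`, using one bin per power of ten. The amounts below are hypothetical. In ggplot2, adding `scale_x_log10()` to the histogram gives the equivalent axis.

```r
# Hypothetical amounts spanning several orders of magnitude.
amounts <- c(0.05, 5, 15, 27, 28, 50, 80, 100, 250, 2700, 10800)

# One bin per power of ten on the log10 scale: (-2,-1], (-1,0], ..., (4,5],
# i.e. ($0.01, $0.10], ($0.10, $1], ..., ($10,000, $100,000].
h <- hist(log10(amounts), breaks = seq(-2, 5, by = 1), plot = FALSE)

h$breaks  # bin edges (log10 dollars)
h$counts  # contributions per bin; bin 4, (1, 2], covers $10 to $100
```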


On the log scale, the distribution looks roughly normal, and amounts under $100 appear to be the most common.
We shall now look at how the contributions were distributed across candidates. To distinguish the candidates by party, I create a new variable holding each candidate’s party affiliation.
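A minimal sketch of one way to build such a party variable, using a named lookup vector. Only a few of the 24 candidates are listed; the labels `democrat`, `republican` and `other` mirror the party values in the grouped summaries later in this report. Because `cand_nm` is a factor (it has levels), it is converted to character before indexing.

```r
# Named lookup: candidate name -> party label (subset of the 24 candidates).
party_lookup <- c(
  "Clinton, Hillary Rodham"   = "democrat",
  "Sanders, Bernard"          = "democrat",
  "Trump, Donald J."          = "republican",
  "Cruz, Rafael Edward 'Ted'" = "republican",
  "Johnson, Gary"             = "other",
  "Stein, Jill"               = "other"
)

# Hypothetical rows; cand_nm is a factor, as in the dataset.
pa <- data.frame(cand_nm = factor(c("Trump, Donald J.", "Stein, Jill",
                                    "Clinton, Hillary Rodham")))

# Convert to character first: indexing by a factor would use its integer codes.
# Candidates missing from the lookup would get NA.
pa$party <- unname(party_lookup[as.character(pa$cand_nm)])
pa$party  # "republican" "other" "democrat"
```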


It looks like a few candidates received most of the contributions. Since many candidates dropped out of the race early on, or were less consequential to the election than others, it is more productive to focus on the top six candidates from here on. Let’s look at just those six in the next plot to see how they compare.
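Selecting the top six can be sketched with `table()` and `sort()`: count contributions per candidate, keep the six largest, and filter rows with `%in%`. The candidate labels and counts below are hypothetical toy values.

```r
# Hypothetical: one element per contribution, labelled by candidate.
cand_nm <- rep(c("Clinton", "Sanders", "Trump", "Cruz", "Carson",
                 "Rubio", "Kasich", "Bush"),
               times = c(40, 30, 20, 8, 6, 4, 3, 2))

# Count contributions per candidate and keep the six largest.
counts <- sort(table(cand_nm), decreasing = TRUE)
top6 <- names(counts)[1:6]
top6

# Logical filter for contributions to the top six.
keep <- cand_nm %in% top6
sum(keep)  # 108
```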


Hillary Clinton has the highest number of supporters, followed by Sanders and then Trump. Although there were more Republican candidates than Democratic ones, 49.34% of the contributions went to Hillary Clinton alone.

Now let us consider where the contributors come from, i.e., which cities and towns have the most contributors.


Not surprisingly, the more populous cities seem to have more donors.

At this point, it is also useful to add latitude and longitude variables for each contribution, to help locate the contributions geographically later in the analysis. I will use the zipcode package for this purpose.
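A sketch of the join, assuming a lookup table shaped like the zipcode package’s data set (columns `zip`, `state`, `latitude`, `longitude`, matching the columns in the zip-code output later in this report). The coordinates are rough illustrative values, and truncating ZIP+4 codes to five digits is an assumption about the raw `contbr_zip` format.

```r
contrib <- data.frame(
  contbr_zip = c("191031234", "15213", "17373"),  # hypothetical zips
  contb_receipt_amt = c(25, 50, 100),
  stringsAsFactors = FALSE
)

# Stand-in for the zipcode lookup table (illustrative coordinates only).
zip_lookup <- data.frame(
  zip = c("19103", "15213"),
  state = c("PA", "PA"),
  latitude = c(39.95, 40.44),
  longitude = c(-75.17, -79.96),
  stringsAsFactors = FALSE
)

# Keep the five-digit prefix so ZIP+4 values match the lookup.
contrib$zip <- substr(contrib$contbr_zip, 1, 5)

# Left join: contributions without a matching zip keep NA coordinates.
geo <- merge(contrib, zip_lookup, by = "zip", all.x = TRUE)
geo
```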

Next, let us see how the supporters vary by occupation and plot the top six. Since some entries list the occupation as “Information requested”, we need to eliminate those. We will also do some data cleaning here and merge equivalent occupations, such as NOT EMPLOYED and UNEMPLOYED, into one. Much more cleaning could be done, but I clean only a handful of occupations that seem relevant to the analysis.
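A base-R sketch of this cleaning. The NOT EMPLOYED to UNEMPLOYED merge comes from the text; the LAWYER to ATTORNEY merge and the exact raw spellings of the “information requested” entries are hypothetical examples.

```r
# Hypothetical raw occupation strings.
occ <- c("RETIRED", "NOT EMPLOYED", "unemployed ", "INFORMATION REQUESTED",
         "ATTORNEY", "LAWYER", "INFORMATION REQUESTED PER BEST EFFORTS")

# Normalise case and surrounding whitespace before matching.
occ <- toupper(trimws(occ))

# Merge equivalent labels. NOT EMPLOYED -> UNEMPLOYED follows the text;
# LAWYER -> ATTORNEY is a hypothetical extra example.
recode <- c("NOT EMPLOYED" = "UNEMPLOYED", "LAWYER" = "ATTORNEY")
hit <- occ %in% names(recode)
occ[hit] <- unname(recode[occ[hit]])

# Drop placeholder entries that start with INFORMATION REQUESTED.
occ <- occ[!startsWith(occ, "INFORMATION REQUESTED")]
table(occ)
```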



About 24% of the contributors were retired. Which candidates did they donate to?

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.55   20.00   35.00   86.56   80.00 5400.00


53.4% of the retired people who donated chose to contribute to Hillary.
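A percentage like this can be computed by subsetting one occupation and applying `prop.table()` to the candidate counts. A minimal sketch on hypothetical rows:

```r
# Hypothetical contributions with occupation and candidate.
donations <- data.frame(
  occupation = c("RETIRED", "RETIRED", "RETIRED", "RETIRED", "UNEMPLOYED"),
  cand_nm = c("Clinton", "Clinton", "Trump", "Sanders", "Sanders"),
  stringsAsFactors = FALSE
)

# Candidates chosen by retired contributors only.
retired <- donations$cand_nm[donations$occupation == "RETIRED"]

# Percentage of retired contributions going to each candidate.
share <- round(100 * prop.table(table(retired)), 1)
share  # Clinton 50, Sanders 25, Trump 25
```

The same pattern, with the filter changed to `"UNEMPLOYED"` or `"ATTORNEY"`, gives the shares discussed next.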

Retired people made the most contributions, followed by unemployed donors. Unemployed contributors made up only about 7.8% of all contributors, yet that is a notable number given that they presumably had no steady source of income. It would be interesting to see how they contributed.


Interestingly, Bernie Sanders had overwhelming support (89.24%) from the unemployed citizens of Pennsylvania. Sanders’ popularity among the unemployed has also been noted in the media.

Since the choice of candidate differs here, let’s quickly check who was more popular with attorneys, who formed the third-largest group of supporters.

Hillary again received overwhelming support from attorneys (78%).
Next, let us analyse how the contributions varied by party.


Democrats in Pennsylvania had almost three times as many supporters as Republicans.

We will now analyse how men and women donated. Since the dataset doesn’t contain gender information, we need to add two more variables here: the contributor’s first name and a gender predicted from it.
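The first step, extracting a first name from names stored as “LAST, FIRST MIDDLE”, can be sketched with a regular expression. The report predicts gender with the gender package, but that call is not shown, so a hypothetical toy lookup stands in for the prediction here.

```r
# Hypothetical names in the "LAST, FIRST MIDDLE" form used by contbr_nm.
contbr_nm <- c("SMITH, JOHN A.", "DOE, JANE", "O'BRIEN, MARY ELLEN")

# Take the first word after the comma. Names without a comma are
# returned unchanged by sub(), so they would need separate handling.
first_nm <- sub("^[^,]+,[[:space:]]*([^[:space:]]+).*$", "\\1", contbr_nm)
first_nm  # "JOHN" "JANE" "MARY"

# Hypothetical stand-in for a name-to-gender prediction.
gender_lookup <- c(john = "male", jane = "female", mary = "female")
gender <- unname(gender_lookup[tolower(first_nm)])
gender  # "male" "female" "female"
```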



There are slightly more female contributors than male in this dataset.

At this point, I would also like to find out how many male and female donors Hillary Clinton had. As a woman, was she more popular with female supporters?



Hillary Clinton had almost twice as many female supporters as male ones, as one might have expected.

One final thing I would like to look at is the post-election contributions. These are usually made to pay off campaign debt or for re-election.


Trump received 1296 contributions after the election; a possible reason is discussed in this article. The other candidates didn’t receive any considerable amount of money. In fact, Hillary Clinton did not receive a single contribution from Pennsylvania after she lost the election.

Univariate Analysis

What is the structure of your dataset?

There are 243796 observations, 18 variables and 24 candidates in this dataset. The variables in the dataset are:
  • cmte_id: Committee ID
  • cand_id: Candidate ID
  • cand_nm: Candidate Name
  • contbr_nm: Contributor Name
  • contbr_city: Contributor City
  • contbr_st: Contributor State
  • contbr_zip: Contributor Zipcode
  • contbr_employer: Contributor Employer
  • contbr_occupation: Contributor Occupation
  • contb_receipt_amt: Contribution Receipt Amount
  • contb_receipt_dt: Contribution Receipt Date
  • receipt_desc: Receipt Description
  • memo_cd: Memo Code
  • memo_text: Memo Text
  • form_tp: Form Type
  • file_num: File Number
  • tran_id: Transaction ID
  • election_tp: Election Type

What is/are the main feature(s) of interest in your dataset?

The variables that interest me the most in this dataset are the contribution amount, contribution date and contributor occupation. I will use these variables to explore how much money the candidates raised, how many supporters they had, where the supporters came from, and what party, gender or occupation they belonged to.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

The other features that will help support my investigations would be contributor zipcode, party affiliation of candidates and timeline of contributions made.

Did you create any new variables from existing variables in the dataset?

Yes, I created the following new variables:

  • contb_year: Contribution Year extracted from contb_receipt_dt
  • contb_month: Contribution Month extracted from contb_receipt_dt
  • contb_day: Contribution Day extracted from contb_receipt_dt
  • party: Candidate’s party affiliation
  • first_nm: Contributor’s first name
  • gender: Contributor’s gender (predicted from the first name using the gender package)
  • latitude: Contributor’s Latitude
  • longitude: Contributor’s Longitude
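The three date variables can be derived with `format()` on a `Date` column. A minimal sketch, assuming `contb_receipt_dt` has already been parsed into a `Date` (the raw file’s date format is not shown here); the two example dates are the start and end of the study period.

```r
# Assumes contb_receipt_dt has already been parsed into a Date.
contb_receipt_dt <- as.Date(c("2014-07-17", "2016-11-08"))

contb_year  <- as.integer(format(contb_receipt_dt, "%Y"))
contb_month <- as.integer(format(contb_receipt_dt, "%m"))
contb_day   <- as.integer(format(contb_receipt_dt, "%d"))

contb_year   # 2014 2016
contb_month  # 7 11
contb_day    # 17 8
```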

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

The following operations were performed:

  • The negative contributions were omitted because there was no information about what they were. They were likely refunds and would not contribute to the analysis.
  • The histogram of contribution amounts was heavily skewed, so its x axis was converted to a log10 scale to make it easier to read.

Bivariate Plots Section


Let’s take a look at the total amount raised by each party.


The Democratic Party raised 1.5 times as much money as the Republican Party. Let us look at the contributions in more detail through boxplots. Since the amount raised by other parties is very small, we will focus on the Democratic and Republican parties only.
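Party totals, the “times as much” ratio, and grouped summaries like the ones that follow can be sketched with `tapply()` and `by()`. The amounts and labels below are hypothetical.

```r
# Hypothetical amounts and party labels.
amt   <- c(25, 50, 100, 40, 60, 10)
party <- c("democrat", "democrat", "democrat",
           "republican", "republican", "other")

# Total raised per party.
totals <- tapply(amt, party, sum)
totals

# "X times as much": ratio of the two main parties' totals.
ratio <- as.numeric(totals["democrat"]) / as.numeric(totals["republican"])
ratio  # 1.75

# Drop the small "other" group, then print per-party summaries.
two_party <- party %in% c("democrat", "republican")
by(amt[two_party], party[two_party], summary)
```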

## pa$party: democrat
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.05   14.00   25.00   88.07   50.00 5400.00 
## -------------------------------------------------------- 
## pa$party: other
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   29.14   75.00  172.27  215.93 2700.00 
## -------------------------------------------------------- 
## pa$party: republican
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.8    25.0    50.0   168.4   100.0 10800.0


Because there are so many outliers, little can be inferred from this plot, so we transform the y axis to a log10 scale.

The median Republican contribution is $50, twice the Democratic median of $25. Both parties have outliers, but there are many very small donations (under $1) to Democrats.

Now, let’s also take a look at how much money the candidates raised individually.



Hillary Clinton is the clear leader here: 50% of the money raised in Pennsylvania went to her alone. Let us look at each candidate’s contributions in more detail.

## pa$cand_nm: Bush, Jeb
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    50.0   250.0   909.9  2000.0  5400.0 
## -------------------------------------------------------- 
## pa$cand_nm: Carson, Benjamin S.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   25.00   50.00   94.94  100.00 5400.00 
## -------------------------------------------------------- 
## pa$cand_nm: Christie, Christopher J.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      15     425    1000    1463    2700    5400 
## -------------------------------------------------------- 
## pa$cand_nm: Clinton, Hillary Rodham
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.05   15.00   25.00  109.87   90.00 5000.00 
## -------------------------------------------------------- 
## pa$cand_nm: Cruz, Rafael Edward 'Ted'
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    25.0    50.0   101.1   100.0  8000.0 
## -------------------------------------------------------- 
## pa$cand_nm: Fiorina, Carly
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     3.0    50.0   100.0   373.1   500.0  5000.0 
## -------------------------------------------------------- 
## pa$cand_nm: Graham, Lindsey O.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      25      50     250     945    1375    5400 
## -------------------------------------------------------- 
## pa$cand_nm: Huckabee, Mike
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1      25      50     244     100    5400 
## -------------------------------------------------------- 
## pa$cand_nm: Jindal, Bobby
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   20.16   50.00  100.00  584.24  500.00 2700.00 
## -------------------------------------------------------- 
## pa$cand_nm: Johnson, Gary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     3.0    35.0   100.0   241.8   250.0  2700.0 
## -------------------------------------------------------- 
## pa$cand_nm: Kasich, John R.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    10.0   100.0   250.0   645.1  1000.0  2700.0 
## -------------------------------------------------------- 
## pa$cand_nm: Lessig, Lawrence
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.98   87.50  100.00  306.52  250.00 2700.00 
## -------------------------------------------------------- 
## pa$cand_nm: McMullin, Evan
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00   25.00   50.00   76.28  100.00  250.00 
## -------------------------------------------------------- 
## pa$cand_nm: O'Malley, Martin Joseph
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    15.0   250.0   500.0   897.4  1000.0  5400.0 
## -------------------------------------------------------- 
## pa$cand_nm: Pataki, George E.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   125.0   187.5   250.0   208.3   250.0   250.0 
## -------------------------------------------------------- 
## pa$cand_nm: Paul, Rand
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    25.0    50.0   122.8   100.0  2700.0 
## -------------------------------------------------------- 
## pa$cand_nm: Perry, James R. (Rick)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      25     250     250     745     500    2700 
## -------------------------------------------------------- 
## pa$cand_nm: Rubio, Marco
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     2.0    25.0    75.0   411.7   250.0 10800.0 
## -------------------------------------------------------- 
## pa$cand_nm: Sanders, Bernard
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   10.00   27.00   42.63   50.00 5000.00 
## -------------------------------------------------------- 
## pa$cand_nm: Santorum, Richard J.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     5.0   100.0   500.0   825.2  1000.0  5400.0 
## -------------------------------------------------------- 
## pa$cand_nm: Stein, Jill
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20   29.00   50.00   91.89  100.00 1000.00 
## -------------------------------------------------------- 
## pa$cand_nm: Trump, Donald J.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.8    28.0    40.0   144.2   101.0  5000.0 
## -------------------------------------------------------- 
## pa$cand_nm: Walker, Scott
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    10.0   206.5   300.0   919.0  1000.0 10800.0 
## -------------------------------------------------------- 
## pa$cand_nm: Webb, James Henry Jr.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    50.0   100.0   250.0   445.5   375.0  2600.0


As we can infer from the boxplot above, Hillary Clinton has the lowest median contribution ($25), while Christopher Christie has the highest ($1000). Measured by the ratio of the third to the first quartile (which is how box sizes appear on a log-scaled axis), Jeb Bush has the widest spread ($2000 / $50 = 40) and George Pataki the narrowest ($250 / $187.50 ≈ 1.33). Clinton, Ted Cruz, Sanders and Trump have more donors who made unusually large donations than the other candidates. Marco Rubio and Scott Walker each received a maximum donation of $10800, the largest among all candidates.

Let’s take a look at what percentage of contribution amount each candidate received within their own party.


For both parties, only a few candidates received most of the donations. Hillary Clinton received a whopping 83% of the money donated to the Democratic Party, while Trump received 40% of the money donated to the Republican Party. Five Republican candidates each received less than 1% of their party’s donations.
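Within-party percentages can be sketched with `ave()`, which repeats each party’s sum on every row of that party. The per-candidate totals below are made-up numbers, not the Pennsylvania figures.

```r
# Hypothetical per-candidate totals (made-up numbers).
cand_totals <- data.frame(
  party   = c("democrat", "democrat", "republican", "republican", "republican"),
  cand_nm = c("Clinton", "Sanders", "Trump", "Cruz", "Kasich"),
  total   = c(600, 200, 300, 150, 50),
  stringsAsFactors = FALSE
)

# Each row gets its own party's total, so dividing gives the
# candidate's percentage of that party's donations.
party_sum <- ave(cand_totals$total, cand_totals$party, FUN = sum)
cand_totals$pct_within_party <- 100 * cand_totals$total / party_sum
cand_totals
```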


We shall now analyse the contributions raised by occupation.
Interestingly, while unemployed people made the second-largest number of contributions, attorneys gave the second-largest total amount. Let’s plot the average contribution to get a better idea.

It seems that higher-paying professions, such as attorneys and doctors, donate more on average.
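The contrast between count and average can be sketched with `table()` and `aggregate()`: a group can rank first by number of contributions yet last by average amount. The rows below are hypothetical.

```r
# Hypothetical contributions by occupation.
donations <- data.frame(
  occupation = c("ATTORNEY", "ATTORNEY", "RETIRED", "RETIRED",
                 "UNEMPLOYED", "UNEMPLOYED", "UNEMPLOYED"),
  amt = c(500, 300, 50, 30, 10, 20, 15),
  stringsAsFactors = FALSE
)

# Number of contributions per occupation.
n_by_occ <- table(donations$occupation)

# Mean contribution per occupation, sorted from highest to lowest.
mean_by_occ <- aggregate(amt ~ occupation, data = donations, FUN = mean)
mean_by_occ <- mean_by_occ[order(-mean_by_occ$amt), ]
mean_by_occ
```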
We saw that many retired people donated to Hillary Clinton, while most unemployed people donated to Bernie Sanders. Let’s look at how the contributions from various occupations split by party. Are some professions more likely to donate to Democrats or Republicans?

Most professions in Pennsylvania seem to donate to Democrats. Only retired people seem to be roughly evenly split between the two parties.
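A party split per occupation is a cross-tabulation normalised by row: `prop.table(..., margin = 1)` makes each occupation sum to 100%. A sketch on hypothetical rows:

```r
# Hypothetical occupation and party for each contribution.
occupation <- c("RETIRED", "RETIRED", "RETIRED", "RETIRED",
                "ATTORNEY", "ATTORNEY", "ATTORNEY")
party <- c("democrat", "republican", "democrat", "republican",
           "democrat", "democrat", "republican")

tab <- table(occupation, party)

# margin = 1: each row (occupation) sums to 100%.
row_pct <- round(100 * prop.table(tab, margin = 1), 1)
row_pct
```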

Let’s see how the contributions look for men and women.


Men gave more than women in total, even though, as seen earlier, women made slightly more contributions than men. A box plot gives a better sense of how each group donated.

## pa.gender$gender: female
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.05   15.00   25.00   88.21   55.00 5400.00 
## -------------------------------------------------------- 
## pa.gender$gender: male
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     0.09    20.00    35.00   129.10   100.00 10800.00


The median contribution from men ($35) is 40% higher than that from women ($25). The third quartile ($100 vs. $55) and the maximum donation ($10800 vs. $5400) are also higher for men, so men seem to have made more large donations than women.

It would be interesting to see how men and women contributed by party.


Female donors seem to support Democrats more than Republicans in Pennsylvania: as the plot shows, Democrats received almost four times as many contributions from women as Republicans did. Let’s analyse what share of these donations went to Hillary Clinton. Did women donate more to the Democratic Party because of Hillary?


75% of the contributions that women made to the Democratic Party went to Hillary, which is consistent with the idea that women tended to donate to the female candidate.
We shall now look at the contributions made in 2016 in more detail.

Democrats consistently led in the number of supporters every month, right up to election day. The only month in which Republicans came somewhat close to matching the Democrats was July. A possible trigger is Trump’s nomination as the Republican presidential candidate that month.
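Monthly counts per party can be sketched by restricting to 2016, cross-tabulating month against party, and then listing the months in which Republicans outnumbered Democrats. The dates and labels below are hypothetical.

```r
# Hypothetical contribution dates and party labels.
contb_receipt_dt <- as.Date(c("2016-01-10", "2016-01-20", "2016-07-04",
                              "2016-07-19", "2016-07-25", "2016-10-01",
                              "2015-12-30"))
party <- c("democrat", "republican", "democrat",
           "republican", "republican", "democrat", "republican")

# Keep 2016 only, then cross-tabulate month against party.
in_2016 <- format(contb_receipt_dt, "%Y") == "2016"
monthly <- table(month = format(contb_receipt_dt[in_2016], "%m"),
                 party = party[in_2016])
monthly

# Months in which Republicans received more contributions than Democrats.
rownames(monthly)[monthly[, "republican"] > monthly[, "democrat"]]  # "07"
```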


Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

In Pennsylvania, the Democratic Party had three times as many supporters as the Republicans and raised 1.5 times as much money. During the election year, Democrats consistently received more contributions every month. Hillary Clinton received 83% of the money donated within her party, while Trump received 40% within his. In both parties, a few candidates received most of the donations.

The median contribution was $25 for the Democratic Party and $50 for the Republican Party. In general, men contributed larger amounts than women.

Retired people contributed the most, while attorneys had the highest average donation amount. Although unemployed people were the second-largest group of contributors by number, they made the lowest average donations.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Most retired donors (53.4%) and attorneys (78%) donated to Hillary Clinton, while most unemployed supporters (89%) donated to Bernie Sanders.

Hillary Clinton received twice as many contributions from women as from men.

75% of the donations that women made to the Democratic Party went to Hillary Clinton.

What was the strongest relationship you found?

Since the amount donated is the only numeric variable of interest, correlation coefficients can’t be computed directly for most pairs of variables here. However, the analysis suggests a strong relationship between gender and party: women seem more likely to donate to Democrats, possibly because of Hillary.

The analysis also hints at a possibly strong relationship between the population of a city or town and the number of donations.

There could also be a strong relationship between income and amount donated, since higher-income professions such as attorneys and doctors tend to donate more on average.

Population data and salary statistics, combined with the variables already in this dataset, would be needed to quantify these relationships.

Multivariate Plots Section

We have seen which candidates raised more money and what share they received within their own party. We also found that Democrats consistently led Republicans in contributions. It would be interesting to explore this relationship further, so I would now like to find out how each candidate received donations over time. Let’s look at some scatter plots of this data.


As the election neared, Hillary was raising the most money, while Trump’s contributions fell closer to the election. Let’s now look at the number of supporters each candidate had.


Note: I used the LOESS smoothing method to smooth the scatterplot above. Bernie Sanders had far more supporters than Hillary Clinton from January 2016 until June 2016, when Hillary was declared the presidential nominee.
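For readers unfamiliar with LOESS, here is a sketch using base R’s `loess()` on a synthetic daily-count series. The plot above was presumably smoothed with ggplot2’s `geom_smooth(method = "loess")`, but that call is not shown, and the `span` value here is an arbitrary choice.

```r
set.seed(42)
day <- 1:60

# Synthetic daily contribution counts: a smooth trend plus noise.
daily_count <- 20 + 10 * sin(day / 10) + rnorm(60, sd = 3)

# Local regression; span = 0.5 fits each point using the nearest 50% of points.
fit <- loess(daily_count ~ day, span = 0.5)

# Fitted (smoothed) values at the original days.
smoothed <- predict(fit)
head(round(smoothed, 1))
```

A smaller `span` follows the data more closely; a larger one gives a smoother curve.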

On the Republican side, Ted Cruz steadily led the campaign and Ben Carson followed a close second, until each ended his candidacy. Trump actually had the fewest supporters until he was declared the Republican presidential nominee, after which his support soared.

Let’s take a look now at where the supporters came from on a map. Before plotting the zipcodes on the map, I will check if there are any odd datapoints here.

##       contbr_zip contbr_city      party                 cand_nm
## 13698      75014       PARIS   democrat Clinton, Hillary Rodham
## 29305      75014       PARIS   democrat        Sanders, Bernard
## 32424      61462    ILLINOIS republican        Trump, Donald J.
## 32528      85710      TUCSON republican          Huckabee, Mike
## 37944      75017       PARIS   democrat Clinton, Hillary Rodham
## 40845      75014       PARIS   democrat Clinton, Hillary Rodham
##       contb_receipt_amt   zip     city state latitude  longitude
## 13698             25.00 75014   Irving    TX 32.76727  -96.77763
## 29305             50.00 75014   Irving    TX 32.76727  -96.77763
## 32424            267.29 61462 Monmouth    IL 40.91885  -90.64466
## 32528            250.00 85710   Tucson    AZ 32.21329 -110.82559
## 37944            500.00 75017   Irving    TX 32.76727  -96.77763
## 40845             25.00 75014   Irving    TX 32.76727  -96.77763
##       contbr_zip      contbr_city      party          cand_nm
## 288        99999 SPRINGBROOK TWP. republican Trump, Donald J.
## 290        99999       BEDMINSTER republican Trump, Donald J.
## 1695        <NA>       PITTSBURGH republican    Walker, Scott
## 1703           0         GLENMORE republican    Walker, Scott
## 10097      17373       DALLASTOWN republican Trump, Donald J.
## 10098      17373       DALLASTOWN republican Trump, Donald J.
##       contb_receipt_amt   zip city state latitude longitude
## 288               98.34 99999 <NA>  <NA>       NA        NA
## 290               48.61 99999 <NA>  <NA>       NA        NA
## 1695            2700.00  <NA> <NA>  <NA>       NA        NA
## 1703             500.00     0 <NA>  <NA>       NA        NA
## 10097            150.00 17373 <NA>  <NA>       NA        NA
## 10098            100.00 17373 <NA>  <NA>       NA        NA


As you can see, some of the zip codes are missing or point outside Pennsylvania, even though the state is listed as PA. This suggests data-entry errors, so the data needs more cleaning. For now, I will ignore the zip codes outside Pennsylvania (39 were found).
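A sketch of this filter, assuming each row already carries the `state` returned by a zip-code lookup (as in the output above, where 75014 resolved to Irving, TX). The rows below are hypothetical.

```r
# Hypothetical rows after a zip-code lookup join; state comes from the
# lookup table, not from the contributor's listed state.
geo <- data.frame(
  contbr_zip = c("75014", "19103", "99999", "15213", NA),
  state      = c("TX", "PA", NA, "PA", NA),
  stringsAsFactors = FALSE
)

# Zips that resolve to a state other than PA.
outside_pa <- !is.na(geo$state) & geo$state != "PA"
sum(outside_pa)  # 1

# Keep only rows that map to Pennsylvania; unmatched zips have no
# coordinates and cannot be placed on the map anyway.
geo_pa <- geo[!is.na(geo$state) & geo$state == "PA", ]
nrow(geo_pa)  # 2
```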

Let’s now plot the contributions on the map.

While there is some overplotting around the major cities, the map suggests that sparsely populated areas, including small towns, contribute more to Republicans, while the cities contribute more to the Democratic Party.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

From the map, people in cities or areas with larger populations appear to have donated more to Democrats, while sparsely populated regions had more Republican donors.

Were there any interesting or surprising interactions between features?

Bernie Sanders led Hillary Clinton in supporters until Hillary was declared the presidential nominee. Trump, on the other hand, had little support compared with the other candidates, but his support soared after his nomination.

Final Plots and Summary

Retired people and attorneys supported Hillary, while the unemployed supported Bernie Sanders


It’s both necessary and interesting for campaigns to find out which demographics their supporters belong to. Hillary Clinton received support from retired people, who donated the most money, and from attorneys, who formed the third-largest group of supporters. Yet it is worth noting that most unemployed donors didn’t support her and chose to contribute to Bernie Sanders instead.

Percentage of donation received by presidential nominee in the party


Hillary Clinton received 83% of the donations made to her party, while Trump received only 40% of those made to his. The larger number of Republican candidates in the race likely split the donations that might otherwise have gone to Trump.

How the candidates received donations over time


Bernie Sanders led in the number of contributions until he ended his campaign. After that, Hillary’s support soared as the election approached. Trump, on the other hand, had the fewest contributions until the other candidates dropped out and he was declared the presidential nominee. While his support rose after his nomination, it dwindled closer to the election.


Reflection

The 2016 presidential election was both controversial and interesting, and it ended in a surprise when Donald Trump won states few expected him to win. I chose to analyse the financial contributions made to presidential campaigns in Pennsylvania because it was a Democratic stronghold, and the contribution trends in this state seemed worth discovering.

Conclusion

  • The Democratic Party had more supporters and raised more money than the Republicans.
  • More women donated to Democrats, and specifically to Hillary Clinton.
  • Hillary Clinton had the highest number of contributions in this dataset, and 50% of all donations in the state went to her campaign.
  • 53.4% of retired donors and 78% of attorneys supported Hillary, while most unemployed donors (89%) supported Sanders.
  • Hillary Clinton received 83% of the total donations made to her party, while Trump received 40%.
  • Retired people made the most contributions, followed by the unemployed.
  • Higher-paying occupations such as attorneys and doctors had the highest average contributions.
  • While there were slightly more female contributors than male ones, men gave more money in total than women.
  • Bernie Sanders led in the number of contributions until he ended his campaign. After that, Hillary’s support soared as the election approached. Trump, on the other hand, had the fewest contributions until the other candidates dropped out and he was declared the presidential nominee.

Struggles

I wasn’t familiar with the gender, zipcode and ggmap packages and spent some time learning to use them for this analysis. I also spent a lot of time improving the aesthetics of the plots and adding features such as hover tooltips to one scatterplot with the plotly package.

There are 8370 unique occupations listed in this dataset, many of which refer to the same profession. I spent some time trying to clean this information up, but had to limit myself to a chosen few occupations, as it was a time-consuming task.

Success

  • I wouldn’t have been able to complete this analysis without the ggplot2 and dplyr packages. They are powerful and were critical to this analysis, and learning them was a valuable experience.
  • It was also an enriching experience to use ggmap and plotly which made my visualizations so much better, and there is a lot more in there that I am eager to discover.
  • There was a clear difference between the demographics of Republican and Democratic donors, and even between Clinton’s and Sanders’ donors. These insights were significant for me, and I hope they could help campaign managers in the future.

Future Work

It would have really helped to have county information for every contribution in this dataset. Since it is not present, obtaining it, possibly through web scraping, would be a valuable addition. It might reveal new donation trends in counties, especially those such as Luzerne (home to Wilkes-Barre) or Erie that swung from blue to red in this election.

It would also be worth measuring the correlation between income and contributions, if income data for various occupations in Pennsylvania can be obtained from the census. Another question to explore is whether densely populated counties make more donations than sparsely populated ones.

Finally, this analysis is only for the state of Pennsylvania. It would be interesting to analyse campaign data for swing states like Florida or Ohio. To gain even more useful insights, it would help to expand the analysis to the nation as a whole and study the data for various candidates.