Project

#BTW17 - Recording and Analyzing Twitter data of the Federal German Elections 2017 (election campaigns for the 19th German Bundestag)

Goal: Data-driven political campaigns can be successful. "The Obama 2012 campaign used data analytics and the experimental method to assemble a winning coalition vote by vote. In doing so, it overturned the long dominance of TV advertising in U.S. politics and created something new in the world: a national campaign run like a local ward election, where the interests of individual voters were known and addressed."

But four years later, Hillary Clinton’s data-driven campaign, organized by the same party, failed in full view of the world. Why did data-driven campaigns work for Barack Obama but not for Hillary Clinton? One interesting article argues that ‘data-driven’ campaigns are killing the US Democratic Party because the wrong lessons were learned from Obama’s success. Dave Gold states that "Democrats have allowed microtargeting to become microthinking. Each cycle, we speak to fewer and fewer people and have less and less to say", even though the right audience is being addressed. Whether or not this is true cannot be answered by a dataset. However, it should be obvious to the reader that data collected alongside such election campaigns might contain valuable insights.

Twitter analyses of US election campaigns, in particular, have been conducted for some time. However, only a few openly accessible Twitter datasets have a clear focus on political election campaigns in countries of the European Union. That is why this project records Twitter interactions for one more European country (Germany).

The project has the following objectives:

- Record a representative dataset of Twitter interactions during the pre-phase and hot phase of the 19th German Bundestag elections.
- Provide this dataset via an Open Data platform like Zenodo.
- Develop or contribute to pragmatic software tools that make it possible to record Twitter datasets over long periods of time.
- Perform a network analysis of the recorded dataset regarding political parties in Germany.
- Publish analysis results in Open Access scholarly contribution channels.


Project log

Nane Kratzke
added a research item
Data from social networks like Twitter is a valuable source for research but is full of redundancy, which makes it hard to provide large-scale, self-contained, and small datasets. Data recording is a common problem in social media-based studies and could be standardized; unfortunately, this is rarely done. This paper reports on lessons learned from a long-term evaluation study recording the complete public sample of the German and English Twitter stream. It proposes a recording solution that simply chunks a linear stream of events to reduce redundancy: if an event is observed multiple times within the time span of a chunk, only the latest observation is written to the chunk. A 10 GB raw Twitter dataset covering 1.2 million tweets from 120,000 users, recorded between June and September 2017, was used to estimate the compression rates that can be expected. It turned out that the resulting datasets need only 10% to 20% of the original data size without losing any event, any metadata, or the relationships between individual events. This kind of redundancy-reducing recording makes it possible to curate large-scale (even nationwide), self-contained, and small social network datasets for research in a standardized and reproducible manner.
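The chunking idea described above can be sketched in a few lines. This is a minimal sketch, not the recording tool from the paper. It assumes that the stream is already ordered by timestamp and that each observation is a hypothetical `(timestamp, event_id, payload)` tuple, where `event_id` could be a tweet ID and `span` is the chunk length in seconds.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical observation format: (timestamp in seconds, event id, payload).
# Assumption: observations arrive ordered by timestamp.
Observation = Tuple[float, str, dict]


def chunk_stream(observations: Iterable[Observation], span: float) -> Iterator[List[Observation]]:
    """Split a time-ordered event stream into chunks covering `span` seconds.

    Within one chunk, only the latest observation of each event id is kept,
    so repeated observations of the same tweet do not bloat the chunk.
    """
    current: Dict[str, Observation] = {}
    chunk_start: Optional[float] = None
    for obs in observations:
        ts, event_id, _ = obs
        if chunk_start is None:
            chunk_start = ts
        elif ts - chunk_start >= span:
            # The time span of the current chunk is exhausted: emit it, start a new one.
            yield list(current.values())
            current = {}
            chunk_start = ts
        # Drop an earlier observation of the same event and re-insert the newer one,
        # so the chunk is ordered by each event's latest observation.
        current.pop(event_id, None)
        current[event_id] = obs
    if current:
        yield list(current.values())


# Usage: tweet "t1" is observed three times, twice within the first 60-second chunk.
stream = [
    (0.0, "t1", {"retweets": 0}),
    (5.0, "t2", {"retweets": 0}),
    (8.0, "t1", {"retweets": 3}),
    (65.0, "t1", {"retweets": 7}),
]
chunks = list(chunk_stream(stream, span=60.0))
```

The first chunk holds `t2` and only the latest `t1` observation (3 retweets). The second chunk holds the later `t1` observation (7 retweets). The redundancy that is removed is exactly the earlier copies of events within the same chunk.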
Nane Kratzke
added an update
I am glad that another dataset, "Anatomy of an online misinformation network", has been added to the Twitter Datasets community on Zenodo.
 
Nane Kratzke
added an update
Twista is a stream and analysis command line tool suite to record and analyze Twitter streams. Twista is provided via GitHub: https://github.com/nkratzke/twista
The best thing about the 0.2.0 release, however, is that it comes with a small wiki covering:
  • An introduction on how to use Twista
  • A Twista API cookbook on how to use Twista for analysis
 
Nane Kratzke
added a research item
From June to September, Twitter interactions with German politicians of the 18th German Bundestag and with federally relevant politicians of the FDP and AfD were "recorded" and prepared as an Open Data dataset for further analysis. In total, the accounts of 364 politicians were tracked. In the course of this, about 120,000 Twitter users were captured, who together produced over 1.2 million tweets. This corresponds to a sample of about 5% of the actual traffic on Twitter. The total volume of recorded data is about 10 GB. The talk presents first insights that can be found in this dataset. It addresses several questions, for example: Are there "loud" and "quiet" parties on Twitter? Can the political proximity of Twitter users to parties be derived? Is Twitter suitable as an instrument for opinion research? And above all: Was the election result foreseeable in advance?
Nane Kratzke
added an update
I am glad to announce that the Dataset Descriptor for the #BTW17 Twitter Dataset is available here:
This descriptor has been published in the journal Data and provides additional insights into how the #BTW17 Twitter Dataset was collected. Furthermore, it gives some hints on what can be analyzed using this Open Access dataset, and on its limitations.
 
Nane Kratzke
added a research item
The German Bundestag elections are the most important elections in Germany. This dataset comprises Twitter interactions related to German politicians of the most important political parties over several months in the (pre-)phase of the German federal election campaigns in 2017. The Twitter accounts of more than 360 politicians were followed for four months. The collected data comprise a sample of approximately 10 GB of Twitter raw data, and they cover more than 120,000 active Twitter users and more than 1,200,000 recorded tweets. Even without sophisticated data analysis techniques, it was possible to deduce a likely political party proximity for more than half of these accounts simply by looking at their retweet behavior. This might be of interest to innovative data-driven party campaign strategists in the future. Furthermore, it can be observed that in Germany, supporters and politicians of populist parties use Twitter much more intensively and aggressively than supporters of other parties. In addition, established left-wing parties seem to be more active on Twitter than established conservative parties. The dataset can be used to study how political parties, their followers, and their supporters make use of social media channels in political election campaigns and what kind of content is shared.
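The abstract does not state the exact rule used to infer party proximity from retweets. The following minimal sketch shows one plausible rule, assumed here purely for illustration: a user is assigned to the party whose politicians they retweet in a strict majority of cases. `party_of` is a hypothetical mapping from politician screen names to parties, and `min_share` is a hypothetical threshold.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def infer_party_proximity(
    retweets: Iterable[Tuple[str, str]],
    party_of: Dict[str, str],
    min_share: float = 0.5,
) -> Dict[str, Optional[str]]:
    """Guess a party for each user from (retweeting_user, retweeted_account) pairs.

    Assumption (not from the source): a user is close to the party whose politicians
    receive more than `min_share` of the user's retweets of tracked politicians.
    Users without any retweet of a tracked politician are left out of the result.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user, account in retweets:
        party = party_of.get(account)
        if party is not None:  # ignore retweets of accounts that are not tracked
            counts[user][party] += 1

    proximity: Dict[str, Optional[str]] = {}
    for user, per_party in counts.items():
        party, n = per_party.most_common(1)[0]
        total = sum(per_party.values())
        # No clear majority means no party is assigned.
        proximity[user] = party if n / total > min_share else None
    return proximity


# Usage with made-up accounts and parties.
party_of = {"politician_a": "Party A", "politician_b": "Party B"}
retweets = [
    ("u1", "politician_a"), ("u1", "politician_a"), ("u1", "politician_b"),
    ("u2", "politician_a"), ("u2", "politician_b"),
    ("u3", "someone_else"),
]
proximity = infer_party_proximity(retweets, party_of)
```

Here `u1` is assigned to "Party A" (2 of 3 retweets), `u2` gets no party because of the tie, and `u3` does not appear because it never retweeted a tracked politician. A strict-majority threshold is only one design choice; it leaves ambiguous accounts unassigned, which fits the abstract's observation that proximity could be deduced for "more than half" of the accounts rather than for all of them.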
Nane Kratzke
added an update
The dataset description is being prepared for publication in the Open Access journal "Data".
So, stay tuned.
 
Nane Kratzke
added an update
Now available: 10 GB #BTW17 #dataset (German election campaigns on Twitter, 120,000 users and 1.2 million tweets)
 
Nane Kratzke
added an update
I am looking forward to releasing the Twitter dataset of my #BTW17 project.
Currently, the dataset comprises 10 GB of Twitter raw data, covering more than 100,000 Twitter accounts and more than 1,000,000 tweets. I hope this dataset will be a great resource for natural language processing. With one week to go, the tweet volume is increasing day by day.
The dataset is ready to be published on Zenodo and already has a registered DOI: 10.5281/zenodo.835735. So, stay tuned.
 
Nane Kratzke
added a research item
Twista is a Twitter streaming and analysis command line tool suite implemented in Python 3.6. It provides the following core features: crawling HTML pages of Twitter accounts, collecting tweets (statuses, replies, retweets) for a specified set of screen names, and transforming collected chunks of tweets into a NetworkX graph for follow-up analysis of observed Twitter interactions.
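This text does not describe how Twista itself builds its NetworkX graph, so the following is only a sketch under stated assumptions. It assumes that tweets are dicts in Twitter's v1.1 JSON format (fields `user.screen_name`, `retweeted_status`, `in_reply_to_screen_name`, `quoted_status`) and counts directed interaction edges using only the standard library. The function name `interaction_edges` is hypothetical and not part of Twista.

```python
from collections import Counter
from typing import Iterable


def interaction_edges(tweets: Iterable[dict]) -> Counter:
    """Count directed (source, target, kind) interaction edges between screen names.

    Assumes Twitter v1.1 tweet dicts; this is not Twista's actual implementation.
    """
    edges: Counter = Counter()
    for tweet in tweets:
        source = tweet["user"]["screen_name"]
        retweeted = tweet.get("retweeted_status")
        if retweeted:
            # A retweet is counted only as a retweet edge; the embedded status
            # carries its own reply/quote information.
            edges[(source, retweeted["user"]["screen_name"], "retweet")] += 1
            continue
        if tweet.get("in_reply_to_screen_name"):
            edges[(source, tweet["in_reply_to_screen_name"], "reply")] += 1
        quoted = tweet.get("quoted_status")
        if quoted:
            edges[(source, quoted["user"]["screen_name"], "quote")] += 1
    return edges


# Usage with minimal, made-up tweet dicts.
tweets = [
    {"user": {"screen_name": "alice"}, "retweeted_status": {"user": {"screen_name": "bob"}}},
    {"user": {"screen_name": "alice"}, "retweeted_status": {"user": {"screen_name": "bob"}}},
    {"user": {"screen_name": "carol"}, "in_reply_to_screen_name": "bob"},
]
edges = interaction_edges(tweets)
```

The resulting counts could then be loaded into NetworkX, for example by creating `G = nx.MultiDiGraph()` and calling `G.add_edge(source, target, key=kind, weight=count)` for each entry. A multigraph keeps retweets, replies, and quotes between the same two accounts as separate edges.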
Nane Kratzke
added a project goal