#BTW17 - Recording and Analyzing Twitter data of the Federal German Elections 2017 (election campaigns for the 19th German Bundestag)
Data from social networks like Twitter is a valuable source for research but is full of redundancy, which makes it hard to provide datasets that are large-scale, self-contained, and small at the same time. Data recording is a common problem in social media-based studies and could be standardized, yet this is hardly ever done. This paper reports on lessons learned from a long-term evaluation study recording the complete public sample of the German and English Twitter stream. It proposes a recording solution that simply chunks a linear stream of events to reduce redundancy: if an event is observed multiple times within the time span of a chunk, only the latest observation is written to the chunk. A 10 gigabyte Twitter raw dataset covering 1.2 million tweets of 120,000 users, recorded between June and September 2017, was used to analyze the compression rates that can be expected. The resulting datasets need only between 10% and 20% of the original data size without losing any event, metadata, or relationship between single events. This kind of redundancy-reducing recording makes it possible to curate large-scale (even nationwide), self-contained, and small datasets of social networks for research in a standardized and reproducible manner.
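The chunking idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the event layout (dicts with hypothetical `id` and `observed_at` keys), the fixed-length time windows, and the function name `chunk_stream` are all assumptions made for illustration. The sketch also assumes the stream arrives ordered by observation time.

```python
def chunk_stream(events, chunk_seconds):
    """Yield one list of events per time window.

    Within a window, a later observation of the same event id replaces
    the earlier one, so each chunk holds only the latest observation.
    Assumes `events` is ordered by the (hypothetical) `observed_at` key.
    """
    current_window = None
    latest = {}  # event id -> latest observation seen in this window
    for event in events:
        window = int(event["observed_at"] // chunk_seconds)
        if current_window is not None and window != current_window:
            yield list(latest.values())
            latest = {}
        current_window = window
        latest[event["id"]] = event  # overwrite any earlier observation
    if latest:
        yield list(latest.values())


# Illustrative stream: tweet 1 is observed three times, tweet 2 once.
events = [
    {"id": 1, "observed_at": 0, "retweets": 0},
    {"id": 2, "observed_at": 10, "retweets": 0},
    {"id": 1, "observed_at": 20, "retweets": 5},
    {"id": 1, "observed_at": 70, "retweets": 9},
]
chunks = list(chunk_stream(events, chunk_seconds=60))
```

With 60-second windows, the first chunk keeps two events instead of three, and tweet 1 appears there only with its latest state. The second observation window starts a new chunk, so no event is lost across chunks.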
From June to September, Twitter interactions with German politicians of the 18th German Bundestag and with FDP and AfD politicians relevant to federal politics were "recorded" and prepared as an open data dataset for further analysis. In total, the accounts of 364 politicians were followed. In the process, about 120,000 Twitter users were captured, who together produced more than 1.2 million tweets. This corresponds to a sample of about 5% of the actual traffic on Twitter. The total volume of recorded data is approximately 10 GB. The talk presents first insights found in this dataset. It pursues several questions, for example: Are there "loud" and "quiet" parties on Twitter? Can the political proximity of Twitter users to parties be derived? Is Twitter suitable as an instrument for opinion research? And above all: Was the election result already foreseeable in advance?
The German Bundestag elections are the most important elections in Germany. This dataset comprises Twitter interactions related to German politicians of the most important political parties over several months in the (pre-)phase of the German federal election campaigns in 2017. The Twitter accounts of more than 360 politicians were followed for four months. The collected data comprise approximately 10 GB of Twitter raw data and cover more than 120,000 active Twitter users and more than 1,200,000 recorded tweets. Even without sophisticated data analysis techniques, it was possible to deduce a likely political party proximity for more than half of these accounts simply by looking at their retweet behavior. This might be of interest for innovative data-driven party campaign strategists in the future. It is also observable that, in Germany, supporters and politicians of populist parties use Twitter much more intensively and aggressively than supporters of other parties. In addition, established left-wing parties seem to be more active on Twitter than established conservative parties. The dataset can be used to study how political parties, their followers, and their supporters use social media channels in political election campaigns and what kind of content is shared.
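The description above does not spell out how party proximity was deduced from retweet behavior. One simple reading, sketched here purely as an assumption, is a majority vote: a user is assigned to a party if more than half of their retweets of tracked politicians go to politicians of that party. The function name `party_proximity`, the `min_share` threshold, and the sample names are hypothetical.

```python
from collections import Counter, defaultdict


def party_proximity(retweets, party_of, min_share=0.5):
    """Map each user to the party they retweet most, if it dominates.

    retweets: iterable of (user, retweeted_politician) pairs.
    party_of: dict mapping politician account -> party name.
    min_share: assumed threshold; the top party must exceed this share.
    """
    counts = defaultdict(Counter)
    for user, politician in retweets:
        party = party_of.get(politician)
        if party is not None:  # ignore retweets of untracked accounts
            counts[user][party] += 1

    proximity = {}
    for user, per_party in counts.items():
        party, hits = per_party.most_common(1)[0]
        if hits / sum(per_party.values()) > min_share:
            proximity[user] = party
    return proximity


# Illustrative data only; none of these names come from the dataset.
party_of = {"pol_a": "Party A", "pol_b": "Party B"}
retweets = [
    ("alice", "pol_a"), ("alice", "pol_a"), ("alice", "pol_b"),
    ("bob", "pol_a"), ("bob", "pol_b"),
    ("carol", "someone_else"),
]
proximity = party_proximity(retweets, party_of)
```

In this toy input, only `alice` receives a label. `bob` retweets both parties equally and `carol` never retweets a tracked politician, which mirrors the observation that a proximity could be deduced for only part of the accounts.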
Twista is a Twitter streaming and analysis command line tool suite implemented in Python 3.6. It provides the following core features: crawling HTML pages for Twitter accounts, collecting Tweets (statuses, replies, retweets) for a specified set of screen names, and transforming collected chunks of Tweets into a NetworkX graph for follow-up analysis of observed Twitter interactions.
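To make the last feature concrete, the following sketch shows one way tweet records could be turned into a NetworkX interaction graph. It is not Twista's own code or chunk format. It assumes tweets are dicts using the field names of the Twitter REST API v1.1 JSON (`user.screen_name`, `retweeted_status`, `in_reply_to_screen_name`), and the helper names `interaction_edges` and `to_graph` are hypothetical.

```python
from collections import Counter


def interaction_edges(tweets):
    """Count (source, target, kind) interactions found in tweet dicts."""
    edges = Counter()
    for tweet in tweets:
        source = tweet["user"]["screen_name"]
        retweeted = tweet.get("retweeted_status")
        if retweeted:
            target = retweeted["user"]["screen_name"]
            edges[(source, target, "retweet")] += 1
        reply_to = tweet.get("in_reply_to_screen_name")
        if reply_to:
            edges[(source, reply_to, "reply")] += 1
    return edges


def to_graph(edges):
    """Build a directed multigraph with one weighted edge per kind."""
    import networkx as nx  # third-party dependency, imported on demand

    graph = nx.MultiDiGraph()
    for (source, target, kind), weight in edges.items():
        graph.add_edge(source, target, key=kind, weight=weight)
    return graph


# Illustrative tweets only, reduced to the fields used above.
tweets = [
    {"user": {"screen_name": "alice"},
     "retweeted_status": {"user": {"screen_name": "bob"}}},
    {"user": {"screen_name": "alice"},
     "retweeted_status": {"user": {"screen_name": "bob"}}},
    {"user": {"screen_name": "carol"}, "in_reply_to_screen_name": "bob"},
    {"user": {"screen_name": "bob"}},
]
edges = interaction_edges(tweets)
```

Calling `to_graph(edges)` then yields a graph in which `alice -> bob` carries a retweet edge of weight 2 and `carol -> bob` a reply edge of weight 1. Keeping the edge counting separate from graph construction means the counts can be inspected or merged across chunks before NetworkX is involved.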