ISSN: 2003-1998
This article is published under a CC BY-SA license
VOL. 3, NO. 1, 2021, 106–118
Armin Pournaki, Felix Gaisbauer, Sven Banisch and Eckehard Olbrich*
We present an open-source interface for scientists to explore Twitter data through
interactive network visualizations. Combining data collection, transformation and
visualization in one easily accessible framework, the twitter explorer connects distant
and close reading of Twitter data through the interactive exploration of interaction
networks and semantic networks. By lowering the technological barriers of data-
driven research, it aims to attract researchers from various disciplinary backgrounds
and facilitates new perspectives in the thriving field of computational social science.
Keywords: Twitter, complex networks, interface, digital methods, computational
social science.
* Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
Due to its public-by-default nature and the possibility of calling data sets
conveniently via an API, Twitter has become a widely used source for the
observation and analysis of political debates (Conover, Gonçalves, et al. 2011;
Gaumont, Panahi, and Chavalarias 2018), sentiments (Paltoglou and Thelwall
2017), brand communication (Nitins and Burgess 2014), or natural disasters (Bruns
and Burgess 2014), to name a few. Different kinds of interactions on Twitter
(Rainie 2014) are often represented in the form of networks, such as retweet
networks (Conover, Gonçalves, et al. 2011; Conover, Ratkiewicz, et al. 2011), reply
networks (Gaisbauer et al. 2020), mention networks (Conover, Ratkiewicz, et al.
2011), follower networks (Myers et al. 2014) or co-hashtag networks (Burgess and
Matamoros-Fernández 2016). While many of the employed methods, building on
concepts from graph theory and network science, can be regarded as distant reading
approaches, it is undoubtedly crucial for social science researchers to perform a
close reading of digital traces to gain a more focused and specific understanding of
their objects of research. As an interface that bridges the two approaches, the twitter
explorer gives an "overview of the data that highlights potentially interesting
patterns", while allowing a "drill down on these patterns for further exploration"
(Jänicke et al. 2015). This means that the structural overview given by the network
allows the user to find the relevant content through a framework we present as
"guided close reading". In this context, we conceive the twitter explorer as a social
media observatory, enabling users to "capture the complexities of social behaviour
[...] through computational analyses of digital media data" (Willaert et al. 2020).
There exists a wide range of tools for collecting, analyzing and visualizing Twitter
data, some of which are referenced on Twitter’s own website (Twitter 2020e).
Among the most popular tools are DMI tcat (Borra and Rieder 2014) for data
collection and analysis in combination with the powerful network visualization suite
Gephi (Bastian, Heymann, and Jacomy 2009). While many existing solutions are
suited for one specific task and rely on the interplay and compatibility of several
applications, the twitter explorer provides an open framework that combines data
collection, transformation and visualization and allows users to explore the collected
Twitter corpus interactively, while being open to external data sources and analysis
suites through data import and export. To better situate the twitter explorer in its
context, a comparison of existing tools is presented in Table 1 below.
The terms "distant reading" and "close reading" were originally coined by Franco Moretti in the
context of literary studies (Moretti 2000). Close reading refers to "the thorough interpretation
of a text passage" (Jänicke et al. 2015), while distant reading "aims to generate an abstract view
by shifting from observing textual content to visualizing global features of a single or of
multiple text(s)" (Jänicke et al. 2015).
Table 1. A comparison of tools for access, analysis and visualization of Twitter data. Due
to the steady pace of tool development in this field of research, this list cannot be exhaustive.
However, we aim to give an overview of some popular methods and their features. A
checkmark in parentheses denotes basic or experimental functionality. Note that the table
includes almost exclusively open-source software. Furthermore, we chose to omit tools
that are no longer maintained.
[Table 1 body. Columns: data access, data analysis, data visualization, data flow, last commit. Surviving row labels: twitter explorer, DMI tcat, NodeXL Pro, OSoMe Networks. The checkmark entries are not recoverable from this copy.]
The twitter explorer consists of three components:
The collector, a Streamlit-powered application (Treuille, Teixeira, and Kelly 2020),
provides a graphical user interface for the Twitter Search API and
saves the collected data for further processing.
The visualizer, a Streamlit-powered application, provides a graphical user
interface for the generation of interaction networks and semantic networks
based on the collected data and saves the interactive networks.
The explorer interface allows users to interact with the networks and explore
the underlying metadata of nodes and links.
Each of these components is conceived in a modular way which facilitates adding
new features to the twitter explorer (see Figure 1).
Tool references for Table 1: DocNow (2020); Borra and Rieder (2014); Smith (2013);
Bastian, Heymann and Jacomy (2009); Jünger and Keyling (2019); TWINT-Project (2018);
VOSON-Lab (2018); Young (2020); Davis et al. (2016).
Streamlit is a Python library for the creation and deployment of data-analytic tools.
Figure 1. The twitter explorer framework. Once the credentials have been set up, the
collector (left) connects to the Twitter Search API and saves the collected
tweets in jsonl format. They are then passed on to the visualizer (middle), where the
user can get an overview of the content and then create the retweet and hashtag
networks. The interactive networks are generated as html files that can be explored in
the web browser. The modular structure of the three components facilitates the
development of new features, which are suggested by the light grey boxes.
In the collector, the user interacts with the Twitter Search API (Twitter 2020f),
giving access to a limited set of tweets from the last 7 days.
3.1.1 Authentication
Since 2018, users need to apply for a Twitter Developer Account in order to access
the API (Roth and Johnson 2018). Since the collector makes direct API calls, this
step is necessary for its usage. There are developer accounts specific to academic
research (Twitter data for academic research 2020). The user can then create app
tokens which will allow the twitter explorer to connect to the API via Application-
only authentication (OAuth 2.0) (Twitter 2020a).
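Under application-only authentication, the app's key and secret are exchanged for a bearer token, which is then sent with every API request. The following is a minimal sketch of how the credential for that token request is assembled (percent-encode both values, join them with a colon, Base64-encode the result). It is not the twitter explorer's own code, which delegates this step to a client library; the key and secret shown are placeholders, and no network request is made.

```python
import base64
from urllib.parse import quote


def basic_auth_header(consumer_key: str, consumer_secret: str) -> str:
    """Build the HTTP Basic credential used to request an app-only bearer token."""
    # Percent-encode each value, join with a colon, then Base64-encode.
    key = quote(consumer_key, safe="")
    secret = quote(consumer_secret, safe="")
    encoded = base64.b64encode(f"{key}:{secret}".encode("ascii")).decode("ascii")
    return f"Basic {encoded}"


# Placeholder credentials, for illustration only.
header = basic_auth_header("my-key", "my-secret")
```

The resulting string would be sent as the Authorization header of the token request, together with the form field grant_type=client_credentials.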
3.1.2 Collection
There are different APIs for users to collect Twitter data. The Stream API (Twitter
2020g) filters all incoming tweets for a given search string. It can be used to collect
tweets containing a certain keyword, or to collect all tweets by a certain (group of)
user(s). This API allows the retrieval of all published tweets and is only capped by
the upper bound of 1% of the total Twitter traffic. The twitter explorer has no built-in
feature for the Stream API because we believe that such collections are best done
on a headless server which stores the large amounts of incoming data in a database.
[Figure 1 diagram labels. Collector (backend: Python, frontend: Streamlit, main library: tweepy): collect tweets using the Search API; save tweets as jsonl. Visualizer (backend: Python, frontend: Streamlit, main library: igraph): plot timeline; generate retweet networks and hashtag networks; community detection (Louvain / Infomap); data display options (hide certain metadata); export options (.gml / .csv / .gv). Explorer (backend: JavaScript, frontend: HTML5, main library: d3 force-graph): display networks with a force-directed algorithm; change node size according to metadata; change node color according to community; explore Twitter metadata (show node's tweets in dataset, show node's current timeline).]
To collect tweets from the past, we resort to the Search API (Twitter 2020f). The
collection of tweets is again initiated by a keyword string, following the rules of a
Twitter Advanced Search (Twitter 2020c). This free API comes with limitations:
users can only make a limited number of requests per 15 minutes (Twitter 2020d).
In the twitter explorer, tweets are continuously stored until all possible tweets that
the Search API provides are collected.
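The collection loop described above can be sketched as follows. This is an illustration under stated assumptions, not the collector's actual implementation: `fetch_page` is a hypothetical stand-in for one Search API request that returns tweets with ids at or below `max_id`, and the request budget `max_requests` per window is a parameter whose default is only illustrative.

```python
import json
import time
from typing import Callable, Dict, List, Optional

Tweet = Dict[str, object]


def collect(fetch_page: Callable[[Optional[int]], List[Tweet]],
            out_path: str,
            max_requests: int = 100,          # illustrative request budget per window
            window_seconds: int = 15 * 60,    # the 15-minute rate-limit window
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Page backwards through search results, appending each tweet to a jsonl file."""
    max_id: Optional[int] = None
    requests_made = 0
    total = 0
    with open(out_path, "a", encoding="utf-8") as out:
        while True:
            if requests_made == max_requests:
                # Budget exhausted: wait for the next window before continuing.
                sleep(window_seconds)
                requests_made = 0
            page = fetch_page(max_id)
            requests_made += 1
            if not page:
                break  # nothing older is available
            for tweet in page:
                out.write(json.dumps(tweet) + "\n")
            total += len(page)
            # Continue strictly below the oldest tweet seen so far.
            max_id = min(int(t["id"]) for t in page) - 1
    return total
```

Passing `sleep` as a parameter keeps the loop testable without actually waiting for a rate-limit window to elapse.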
Note that the Search API gives access only to indexed tweets from the last 7
days. Therefore, a collection created by the Search API cannot be considered
exhaustive, and it is subject to Twitter’s nontransparent filtering algorithm. Previous
research comparing the Stream and Search APIs, however, concludes
that Twitter filters mostly duplicates and strong language (Thelwall 2015; Black et
al. 2012). Measuring the volume of a 48-hour collection of tweets based on the
keyword "clubhouse", we find that 80% of tweets from the Stream API collection
are also contained in the Search API collection (see Figure 5 in the Appendix).
The visualizer creates interactive network visualizations from the collected corpus.
One can distinguish between interaction networks (with users as nodes) and
semantic networks (with words or concepts as nodes). The twitter explorer currently
supports the creation of retweet networks as interaction networks and hashtag co-
occurrence networks as semantic networks. Several data aggregation methods allow
for exploration of the network at different scales.
3.2.1 Twitter timeline
The data is presented as a timeline, where tweet counts are plotted over time. The
user can get a sense of the overall salience of the chosen keyword, and possible
peaks can hint at special events.
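The counts behind such a timeline can be computed directly from the stored jsonl file. The sketch below assumes that each stored tweet carries a `created_at` field in the format used by the standard Twitter API (for example "Mon Jan 25 15:04:05 +0000 2021") and bins tweets by hour; the bin width is an arbitrary choice for illustration.

```python
import json
from collections import Counter
from datetime import datetime
from typing import Iterable

# Assumed timestamp layout of the created_at field.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def hourly_counts(jsonl_lines: Iterable[str]) -> Counter:
    """Count tweets per hour from an iterable of JSON Lines strings."""
    counts: Counter = Counter()
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        tweet = json.loads(line)
        created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
        counts[created.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Since an open file is an iterable of lines, `hourly_counts(open(path, encoding="utf-8"))` would process a collection saved by the collector.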
Figure 2. The retweet network exploration interface. The modular command palette
(left) can (1) show information about the underlying data, (2) modify the
visualization, (3) display network measures and (4) search for and show information
about specific users and the content they generated in the dataset. Nodes are colored
according to their community. They can be interacted with by clicking or hovering to
display the username and relevant metadata in the palette. We invite the reader to
test the interactive visualization here:
3.2.2 Interaction networks
There are several ways of interaction on Twitter: retweets, mentions, replies,
following, likes, quotes and direct messages. Not all of them are accessible through
the API. We focus on retweet interaction which can be represented as a directed
network in which nodes are users and a link is drawn from node i to node j if i retweets j. The
twitter explorer’s visualizer provides an interface for creating retweet networks which
includes the following features:
Community detection. In order to find strongly connected clusters of a
network, it has become common practice to employ community detection
algorithms. The twitter explorer currently supports Louvain (Blondel et al. 2008)
and InfoMap (Rosvall and Bergstrom 2007) algorithms.
Force-directed layout. The visualization library (Asturiano 2018) spatializes
the network using a force-directed layout in which nodes that retweet each other
more often are placed closer to each other (Noack 2009).
Aggregation methods. One challenge for understanding and visualizing
complex interaction networks is to find useful aggregation methods necessary to
observe the underlying discourse at different levels of granularity. We therefore
propose several methods of node aggregation: (1) removing nodes that only retweet
one source and don’t generate any content, (2) removing nodes that were retweeted
fewer than t times, for a chosen threshold t, and (3) reducing the network to an
interaction network of communities (cluster graph).
Hiding sensitive metadata. Removes all accessible metadata of users that have
fewer than 5000 followers from the interactive visualization. The nodes are visible,
and their links are taken into account, but they cannot be personally identified in
the interface.
Export abilities. Exports the networks to common formats like edgelist, GML
or GraphViz. The framework is therefore compatible with a wide range of existing
tools for network analysis (Bastian, Heymann, and Jacomy 2009; Peixoto 2014;
Csardi and Nepusz 2006).
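To make the construction and two of the aggregation steps concrete, here is a minimal sketch in plain Python. It is not the visualizer's implementation, which relies on igraph. It assumes tweets in the standard API layout, where a retweet carries a `retweeted_status` object, and it takes a literal reading of aggregation (2). The community membership passed to `cluster_graph` would come from a community detection algorithm such as Louvain or InfoMap; here it is simply supplied by hand.

```python
from collections import Counter
from typing import Dict, Iterable


def retweet_edges(tweets: Iterable[dict]) -> Counter:
    """Count directed links (retweeting user -> retweeted user)."""
    edges: Counter = Counter()
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if original:
            source = tweet["user"]["screen_name"]
            target = original["user"]["screen_name"]
            edges[(source, target)] += 1
    return edges


def drop_rarely_retweeted(edges: Counter, t: int) -> Counter:
    """Aggregation (2): remove nodes retweeted fewer than t times, with their links."""
    times_retweeted: Counter = Counter()
    for (_, target), weight in edges.items():
        times_retweeted[target] += weight
    keep = {node for node, n in times_retweeted.items() if n >= t}
    return Counter({(s, d): w for (s, d), w in edges.items()
                    if s in keep and d in keep})


def cluster_graph(edges: Counter, membership: Dict[str, int]) -> Counter:
    """Aggregation (3): collapse users into communities and sum link weights."""
    collapsed: Counter = Counter()
    for (source, target), weight in edges.items():
        collapsed[(membership[source], membership[target])] += weight
    return collapsed


# Toy data: a and c retweet b, b retweets d, d posts an original tweet.
sample = [
    {"user": {"screen_name": "a"}, "retweeted_status": {"user": {"screen_name": "b"}}},
    {"user": {"screen_name": "a"}, "retweeted_status": {"user": {"screen_name": "b"}}},
    {"user": {"screen_name": "c"}, "retweeted_status": {"user": {"screen_name": "b"}}},
    {"user": {"screen_name": "b"}, "retweeted_status": {"user": {"screen_name": "d"}}},
    {"user": {"screen_name": "d"}},
]
edges = retweet_edges(sample)
```

Keeping link weights in a counter means that repeated retweets between the same pair of users strengthen one link rather than creating parallel links.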
An example of a retweet network visualized with the twitter explorer can be seen in
Figure 2. We collected data using the keyword "Brexit" about 10 days before the
General Election in the UK in December 2019. We observe a polarized retweet
network, where pro- and anti-Brexiteers form two distinct clusters. This suggests
that users in the debate tend to mainly share (and endorse) content created by
their own opinion group.
Figure 3. Hashtag network. Every node is a hashtag, and a link is drawn between
hashtags for every tweet they appear in together. The size of the text is
proportional to the node degree. We invite the reader to test the interactive
visualization here:
3.2.3 Semantic networks
While retweet networks make it possible to identify the main proponents of a debate and their
interaction patterns, looking at the most retweeted tweets might not be sufficient
to get an impression of the content structure of the debate. In order to explore the
textual content of the data, we propose hashtag co-occurrence networks. Here,
every node is a hashtag, and links are drawn between nodes if they appear in the
same tweet. By again laying out the network with a force-directed algorithm, the
hashtag network gives an overview of the debate’s vocabulary and can reveal the
different subtopics within a debate.
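A hashtag co-occurrence network of this kind can be derived with a few lines of Python. The sketch below assumes tweets in the standard API layout, with hashtags listed under `entities.hashtags`, and counts one link per tweet for every pair of distinct hashtags it contains. Hashtags are compared case-sensitively here; whether to merge case variants is a design decision left open.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable


def hashtag_cooccurrence(tweets: Iterable[dict]) -> Counter:
    """Count, for each pair of hashtags, the tweets in which both appear."""
    links: Counter = Counter()
    for tweet in tweets:
        hashtags = tweet.get("entities", {}).get("hashtags", [])
        # A set removes repeats within one tweet; sorting gives each pair one canonical order.
        tags = sorted({h["text"] for h in hashtags})
        for pair in combinations(tags, 2):
            links[pair] += 1
    return links
```

The resulting weighted, undirected link list can be laid out with the same force-directed approach as the retweet network.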
An example using the previously introduced Brexit data is shown in Figure 3.
Hashtags like "#votetactically", "#GetTheToriesOut" or "#VoteConservative"
point towards discussions closely related to the General Election, while hashtags
like "#DeepStateCorruption", "#TheGreatAwakening" or "#QAnon" shed light on
the existence of conspiracy-theory-related sub-discussions in the dataset.
The twitter explorer offers an intuitive exploration interface (see Figure 2). A
modular command palette allows for user interaction and provides insight into the
underlying metadata of the network:
Network information. Accesses generic information about the network
(keywords used to collect the data, date of collection, first/last tweet of the dataset).
Visualization options. Supports different node colorings according to their
community assignment. The node size can be dynamically changed according to
their respective metadata values (in/out-degree, number of followers, number of
followed accounts). This facilitates for instance the detection of news outlets.
Network measures. Shows the number of nodes and links in the network. This
set will be extended to include a wider range of network indicators in future releases.
User information. Searches for users in the given network and locates them by
zooming or flashing their color. Displays the user’s relevant metadata (number of
followers, number of followed accounts, number of retweets, number of times
retweeted), their tweets in the dataset as well as their current timeline. Note that
the interface will only display tweets that are still online at the time of exploration.
By doing so, it complies with the Twitter display requirements (Twitter 2020b).
The twitter explorer can be regarded as an all-in-one solution for the exploration of
Twitter networks, for which it is easy to develop new modules within the existing
components (see Figure 1). An example would be to include additional community
detection algorithms or new node aggregation methods.
Figure 4. The twitter explorer in context. Its modular structure makes it easy to
develop new features for the twitter explorer, but it also allows it to be used in
combination with existing data analysis and network science tools. The dotted arrows
depict export paths allowing users to integrate the (transformed) data from the twitter
explorer into their desired data analysis environment.
At the same time, its modular structure (division into collector / visualizer /
explorer) and the ability to export the generated data make the tool compatible
with a variety of other data analysis tools (see Figure 4). Therefore, scientists can
use the twitter explorer in combination with existing tools from data and network
science. For instance, after the collector, the data could be passed on to a database,
or passed on to a natural language processing pipeline for content analysis. After
the visualizer, the exported network can be imported to a visualization suite like
Gephi, where various network measures and layout algorithms can be computed.
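As an illustration of such an export path, the following sketch writes a weighted, directed link list to the GML format, which Gephi and other network tools can read. It is a hand-written writer for illustration only and makes no claim about how the twitter explorer produces its export files. It assumes node names contain no double quotes, which holds for Twitter screen names.

```python
from typing import Dict, Tuple


def to_gml(edges: Dict[Tuple[str, str], int]) -> str:
    """Serialize a weighted, directed edge dictionary as a GML document."""
    nodes = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(nodes)}
    lines = ["graph [", "  directed 1"]
    for name in nodes:
        lines.append(f'  node [ id {index[name]} label "{name}" ]')
    for (source, target), weight in sorted(edges.items()):
        lines.append(
            f"  edge [ source {index[source]} target {index[target]} weight {weight} ]"
        )
    lines.append("]")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file with the .gml extension yields a file that can be opened in a visualization suite for further layout and analysis.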
The twitter explorer is currently in an open beta stage on GitHub. Future work will
incorporate the dynamic nature of retweet interaction into the visualization paradigms.
In order to disseminate the framework and attract new audiences to the field of
data-driven research, vignettes (use cases) will be designed to showcase the twitter
explorer’s use in social science research. They will be published on our blog.
Furthermore, it is planned to add the
possibility of exploring recently developed measures such as graph curvatures which
can provide new insights to the analysis of social networks (Leal et al. 2018). The
authors plan to actively maintain the tool and adapt it to Twitter API changes, like
the one that was recently announced for Academic Research (Twitter 2021).
The twitter explorer interface can be tested online. The
source code is available on GitHub, where the current release can be downloaded
(Pournaki 2020). It is licensed under the GNU GPLv3 license (Free Software
Foundation Inc. 2007).
The twitter explorer is written partly in Python (data collection and transformation)
and partly in JavaScript (interactive network visualization). The frontend for the data
collector and the visualizer is made with Streamlit (Treuille, Teixeira, and Kelly
2020), a Python library for the creation and deployment of data-analytic tools. The
Twitter objects are stored in the json lines format (Ward 2020). The network
operations and community detection rely on the Python implementation of igraph
(Csardi and Nepusz 2006). The interactive networks are drawn using D3.js
(Bostock 2011), more specifically the force-graph library (Asturiano 2018).
The idea for the twitter explorer originated from fruitful discussions in the context
of the ODYCCEUS project between Armin Pournaki, Felix Gaisbauer, Sven
Banisch and Eckehard Olbrich. The tool is designed and developed by Armin
Pournaki. All authors wrote the manuscript. This project has received funding from
the European Union’s Horizon 2020 research and innovation programme under
grant agreement No 732942.
Asturiano, Vasco (2018). force-graph.
[Online; accessed 29-January-2021].
Bastian, Mathieu, Sebastien Heymann, and Mathieu Jacomy (2009). “Gephi: An
Open Source Software for Exploring and Manipulating Networks”. In: url:
Black, Alan et al. (2012). “Twitter zombie: Architecture for capturing, socially
transforming and analyzing the Twittersphere”. In: Proceedings of the 17th
ACM international conference on Supporting group work, pp. 229–238.
Blondel, Vincent D et al. (2008). “Fast unfolding of communities in large
networks”. In: Journal of statistical mechanics: theory and experiment 2008.10,
Borra, Erik and Bernhard Rieder (2014). “Programmed method: Developing a
toolset for capturing and analyzing tweets”. In: Aslib Journal of Information
Bostock, Mike (2011). D3.js. [Online; accessed 29-January-
Bruns, Axel and Jean Burgess (2014). “Crisis communication in natural disasters:
The Queensland floods and Christchurch earthquakes”. In: Twitter and
society [Digital Formations, Volume 89]: ed. by A Bruns et al. United States of
America: Peter Lang Publishing, pp. 373–384.
Burgess, Jean and Ariadna Matamoros-Fernández (2016). “Mapping sociocultural
controversies across digital media platforms: One week of #gamergate on
Twitter, YouTube, and Tumblr”. In: Communication Research and Practice
2.1, pp. 79–96.
Conover, Michael D, Bruno Gonçalves, et al. (Oct. 2011). “Predicting the
Political Alignment of Twitter Users”. In: 2011 IEEE Third International
Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third
International Conference on Social Computing, pp. 192–199. doi:
Conover, Michael D, Jacob Ratkiewicz, et al. (2011). “Political polarization on
twitter”. In: Fifth international AAAI conference on weblogs and social
Csardi, Gabor and Tamas Nepusz (2006). “The igraph software package for
complex network research”. In: InterJournal Complex Systems, p. 1695. url:
Davis, Clayton A et al. (2016). “OSoMe: the IUNI observatory on social media”.
In: PeerJ Computer Science 2, e87.
DocNow (2020). twarc. [Online; accessed 29-
Free Software Foundation Inc. (2007). GNU General Public License. [Online; accessed 29-January-
Gaisbauer, Felix et al. (2020). “How Twitter affects the perception of public
opinion: Two case studies”. In: arXiv preprint arXiv:2009.01666.
Gaumont, Noé, Maziyar Panahi, and David Chavalarias (Sept. 2018).
“Reconstruction of the socio-semantic dynamics of political activist Twitter
networks—Method and application to the 2017 French presidential
election”. In: PLOS ONE 13.9, pp. 1–38. doi: 10.1371/journal.pone.0201879.
Jänicke, Stefan et al. (2015). “On Close and Distant Reading in Digital
Humanities: A Survey and Future Challenges.” In: EuroVis (STARs), pp. 83–
Jünger, Jakob and Till Keyling (2019). Facepager. [Online; accessed 29-January-2021].
Leal, Wilmer et al. (2018). “Forman-Ricci Curvature for Hypergraphs”.
doi: 10.13140/RG.2.2.27347.84001.
Moretti, Franco (2000). “Conjectures on world literature”. In: New left review 1,
p. 54.
Myers, Seth A et al. (2014). “Information network or social network? The
structure of the Twitter follow graph”. In: Proceedings of the 23rd
International Conference on World Wide Web, pp. 493–498.
Nitins, Tanya and Jean Burgess (2014). “Twitter, brands, and user engagement”.
In: Twitter and society [Digital Formations, Volume 89]: ed. by A Bruns et al.
United States of America: Peter Lang Publishing, pp. 293–304.
Noack, Andreas (Feb. 2009). “Modularity clustering is force-directed layout”. In:
Physical Review E 79.2, p. 026102. doi: 10.1103/physreve.79.026102.
Paltoglou, Georgios and Mike Thelwall (2017). “Sensing social media: A range of
approaches for sentiment analysis”. In: Cyberemotions. Springer, pp. 97–117.
Peixoto, Tiago P. (2014). “The graph-tool python library”. In: figshare. doi:
10.6084/m9.figshare.1164194 (visited on 09/10/2014).
Pournaki, Armin (2020). twitter-explorer. [Online; accessed 29-January-2021].
Rainie, Lee (2014). “The six types of Twitter conversations”. In: Pew Research
Center 20.
Rosvall, Martin and Carl T Bergstrom (2007). “Maps of information flow reveal
community structure in complex networks”. In: arXiv preprint physics.soc-
Stream vs. Search API
We investigate the difference between the Twitter Stream and the Search API.
Using the keyword "clubhouse", we first collect tweets using the Stream API from
Jan. 25th to Jan. 27th. We then launch the Twitter Search on Jan. 27th to see how
many tweets we can collect going back to Jan. 25th. The tweet count over time is shown in
Figure 5. The Search API provides about 80% of the tweets collected by the Stream
API. In our example, 13% of the missing tweets in the Search corpus were original
tweets and 13% were retweets.
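The comparison reduces to set operations on tweet ids. The sketch below shows one way to compute the share of streamed tweets that the search collection also contains, and the share of retweets among the missing ones. It assumes each tweet has an `id` field and that retweets carry a `retweeted_status` object; the function names are illustrative.

```python
from typing import Iterable, List


def coverage(stream_ids: Iterable[int], search_ids: Iterable[int]) -> float:
    """Fraction of streamed tweets that also appear in the search collection."""
    stream, search = set(stream_ids), set(search_ids)
    if not stream:
        raise ValueError("the stream collection is empty")
    return len(stream & search) / len(stream)


def missing_retweet_share(stream_tweets: List[dict], search_ids: Iterable[int]) -> float:
    """Among streamed tweets absent from the search collection, the fraction that are retweets."""
    search = set(search_ids)
    missing = [t for t in stream_tweets if t["id"] not in search]
    if not missing:
        return 0.0
    return sum(1 for t in missing if "retweeted_status" in t) / len(missing)
```

Tweets that appear only in the search collection do not affect either number, since both are measured relative to the stream collection.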
Figure 5. Streaming API vs Search API. We collected tweets using the keyword
"clubhouse" for 48 hours using the Search and the Streaming API and observe that the
Search API consistently returns fewer tweets than the Streaming API. Over the whole time
range, the searched tweets make up 80% of the streamed tweets.
[Figure 5 plot: tweet count (vertical axis) over time (horizontal axis), from 01/25 15:00 to 01/27 15:00.]
... A recently introduced open-source interface for scientists to explore Twitter data through interactive network visualizations is the Twitter Explorer [21]. It makes use of the Twitter search API with all the limitations (number of requests per 15 min and tweets from the last seven days) to collect tweets based on a search term and analyze them. ...
Full-text available
The proliferation of online news, especially during the “infodemic” that emerged along with the COVID-19 pandemic, has rapidly increased the risk of and, more importantly, the volume of online misinformation. Online Social Networks (OSNs), such as Facebook, Twitter, and YouTube, serve as fertile ground for disseminating misinformation, making the need for tools for analyzing the social web and gaining insights into communities that drive misinformation online vital. We introduce the MeVer NetworkX analysis and visualization tool, which helps users delve into social media conversations, helps users gain insights about how information propagates, and provides intuition about communities formed via interactions. The contributions of our tool lie in easy navigation through a multitude of features that provide helpful insights about the account behaviors and information propagation, provide the support of Twitter, Facebook, and Telegram graphs, and provide the modularity to integrate more platforms. The tool also provides features that highlight suspicious accounts in a graph that a user should investigate further. We collected four Twitter datasets related to COVID-19 disinformation to present the tool’s functionalities and evaluate its effectiveness.
... This helps to show how influential the user sharing real or false information could be. The code for both these visualizations is written partly by the visualizer in Twitter explorer [44]. Python implementation of the igraph library [45] is used for making the graph networks. ...
Full-text available
The rise in online misinformation in recent years threatens democracies by distorting authentic public discourse and causing confusion, fear, and even, in extreme cases, violence. There is a need to understand the spread of false content through online networks for developing interventions that disrupt misinformation before it achieves virality. Using a Deep Bidirectional Transformer for Language Understanding (BERT) and propagation graphs, this study classifies and visualizes the spread of misinformation on a social media network using publicly available Twitter data. The results confirm prior research around user clusters and the virality of false content while improving the precision of deep learning models for misinformation detection. The study further demonstrates the suitability of BERT for providing a scalable model for false information detection, which can contribute to the development of more timely and accurate interventions to slow the spread of misinformation in online environments.
Full-text available
In contrast to graph-based models for complex networks, hypergraphs are more general structures going beyond binary relations of graphs. For graphs, statistics gauging different aspects of their structures have been devised and there is undergoing research for devising them for hypergraphs. Forman-Ricci curvature is a statistics for graphs, which is based on Riemannian geometry, and that stresses the relational character of vertices in a network through the analysis of edges rather than vertices. In spite of the different applications of this curvature, it has not yet been formulated for hypergraphs. Here we devise the Forman-Ricci curvature for directed and undirected hypergraphs, where the curvature for graphs is a particular case. We report its upper and lower bounds and the respective bounds for the graph case. The curvature quantifies the trade-off between hyperedge(arc) size and the degree of participation of hyperedge(arc) vertices in other hyperedges(arcs). We calculated the curvature for two large networks: Wikipedia vote network and Escherichia coli metabolic network. In the first case the curvature is ruled by hyperedge size, while in the second by hyperedge degree. We found that the number of users involved in Wikipedia elections goes hand-in-hand with the participation of experienced users. The curvature values of the metabolic network allowed detecting redundant and bottle neck reactions. It is found that ADP phosphorilation is the metabolic bottle neck reaction but that the reverse reaction is not that central for the metabolism.
Full-text available
Background Digital spaces, and in particular social networking sites, are becoming increasingly present and influential in the functioning of our democracies. In this paper, we propose an integrated methodology for the data collection, the reconstruction, the analysis and the visualization of the development of a country’s political landscape from Twitter data. Method The proposed method relies solely on the interactions between Twitter accounts and is independent of the characteristics of the shared contents such as the language of the tweets. We validate our methodology on a case study on the 2017 French presidential election (60 million Twitter exchanges between more than 2.4 million users) via two independent methods: the comparison between our automated political categorization and a human categorization based on the evaluation of a sample of 5000 profiles descriptions; the correspondence between the reconfigurations detected in the reconstructed political landscape and key political events reported in the media. This latter validation demonstrated the ability of our approach to accurately reflect the reconfigurations at play in the off-line political scene. Results We built on this reconstruction to give insights into the opinion dynamics and the reconfigurations of political communities at play during a presidential election. First, we propose a quantitative description and analysis of the political engagement of members of political communities. Second, we analyze the impact of political communities on information diffusion and in particular on their role in the fake news phenomena. We measure a differential echo chamber effect on the different types of political news (fake news, debunks, standard news) caused by the community structure and emphasize the importance of addressing the meso-structures of political networks in understanding the fake news phenomena. 
Conclusions Giving access to an intermediate level, between sociological surveys in the field and large statistical studies (such as those conducted by national or international organizations) we demonstrate that social networks data make it possible to qualify and quantify the activity of political communities in a multi-polar political environment; as well as their temporal evolution and reconfiguration, their structure, their alliance strategies and their semantic particularities during a presidential campaign through the analysis of their digital traces. We conclude this paper with a comment on the political and ethical implications of the use of social networks data in politics. We stress the importance of developing social macroscopes that will enable citizens to better understand how they collectively make society and propose as example the “Politoscope”, a macroscope that delivers some of our results in an interactive way.
The study of social phenomena is becoming increasingly reliant on big data from online social networks. Broad access to social media data, however, requires software development skills that not all researchers possess. Here we present the IUNI Observatory on Social Media, an open analytics platform designed to facilitate computational social science. The system leverages a historical, ongoing collection of over 70 billion public messages from Twitter. We illustrate a number of interactive open-source tools to retrieve, visualize, and analyze derived data from this collection. The Observatory, now available at, is the result of a large, six-year collaborative effort coordinated by the Indiana University Network Science Institute.
We present an overview of the last ten years of research on visualizations that support close and distant reading of textual data in the digital humanities. We look at various works published within both the visualization and digital humanities communities. We provide a taxonomy of applied methods for close and distant reading, and illustrate approaches that combine both reading techniques to provide a multifaceted view of the data. Furthermore, we list toolkits and potentially beneficial visualization approaches for research in the digital humanities. Finally, we summarize collaboration experiences when developing visualizations for close and distant reading, and give an outlook on future challenges in that research area.
Purpose – The purpose of this paper is to introduce the Digital Methods Initiative Twitter Capture and Analysis Toolset, a toolset for capturing and analyzing Twitter data. Instead of just presenting a technical paper detailing the system, however, the authors argue that the type of data used for, as well as the methods encoded in, computational systems have epistemological repercussions for research. The authors thus aim at situating the development of the toolset in relation to methodological debates in the social sciences and humanities.
Design/methodology/approach – The authors review the possibilities and limitations of existing approaches to capture and analyze Twitter data in order to address the various ways in which computational systems frame research. The authors then introduce the open-source toolset and put forward an approach that embraces methodological diversity and epistemological plurality.
Findings – The authors find that design decisions and more general methodological reasoning can and should go hand in hand when building tools for computational social science or digital humanities.
Practical implications – Besides methodological transparency, the software provides robust and reproducible data capture and analysis, and interlinks with existing analytical software. Epistemic plurality is emphasized by taking into account how Twitter structures information, by allowing for a number of different sampling techniques, by enabling a variety of analytical approaches or paradigms, and by facilitating work at the micro, meso, and macro levels.
Originality/value – The paper opens up critical debate by connecting tool design to fundamental interrogations of methodology and its repercussions for the production of knowledge. The design of the software is inspired by exchanges and debates with scholars from a variety of disciplines, and the attempt to propose a flexible and extensible tool that accommodates a wide array of methodological approaches is directly motivated by the desire to keep computational work open for various epistemic sensibilities.
Sentiment analysis deals with the computational detection and extraction of opinions, beliefs and emotions in written text. It combines theories and methodologies from a diverse set of scientific domains, such as psychology, natural language processing and machine learning. It fulfils the very important role of transforming the unstructured textual communication between social media users into quantifiable and informed estimations of expressed sentiment, which can subsequently be used by physicists, sociologists and complex systems experts in studying the collective properties of such phenomena. The problem has been addressed from two different but often complementary directions: lexicon-based solutions that rely on sentiment dictionaries (i.e., lists of words in which each token is annotated with an indication of the affective content it typically conveys) and machine learning solutions that automatically or semi-automatically learn to detect the affective content of text. In this chapter, we discuss a range of solutions and their strengths and weaknesses in different environments and settings. We conclude that, depending on the application environment and the desired output, different types of analyses are appropriate, with varying levels of predictive accuracy.
Social media play a prominent role in mediating issues of public concern, not only providing the stage on which public debates play out but also shaping their topics and dynamics. Building on and extending existing approaches to both issue mapping and social media analysis, this article explores ways of accounting for popular media practices and the special case of ‘born digital’ sociocultural controversies. We present a case study of the GamerGate controversy with a particular focus on a spike in activity associated with a 2015 Law and Order: SVU episode about gender-based violence and harassment in games culture that was widely interpreted as being based on events associated with GamerGate. The case highlights the importance and challenges of accounting for the cultural dynamics of digital media within and across platforms.
This paper proposes a new method for the study of world literature. Moretti assumes that within modernity all cultural influences - not necessarily identical with political influences - are part of the struggle for symbolic hegemony. As carriers of modernity, literary genres are subjected to various deformations within local cultures. To capture these deformations it is necessary to test selected formal or structural elements (tropes, themes, motifs, narrative) across several national literatures. This is not possible through close reading, but only through distant reading, that is to say through interpretations that build on the work of other literary historians. A literary form analysed in this way represents the literary system embodied in the work (a system of literary inequality), and is an abstraction of social relationships. Hence the study of world literature is an analysis of power.
In this paper, we provide a characterization of the topological features of the Twitter follow graph, analyzing properties such as degree distributions, connected components, shortest path lengths, clustering coefficients, and degree assortativity. For each of these properties, we compare and contrast with available data from other social networks. These analyses provide a set of authoritative statistics that the community can reference. In addition, we use these data to investigate an often-posed question: Is Twitter a social network or an information network? The "follow" relationship in Twitter is primarily about information consumption, yet many follows are built on social ties. Not surprisingly, we find that the Twitter follow graph exhibits structural characteristics of both an information network and a social network. Going beyond descriptive characterizations, we hypothesize that from an individual user's perspective, Twitter starts off more like an information network, but evolves to behave more like a social network. We provide preliminary evidence that may serve as a formal model of how a hybrid network like Twitter evolves.