Contropedia - the analysis and visualization of
controversies in Wikipedia articles
Erik Borra
University of Amsterdam
Esther Weltevrede
University of Amsterdam
Paolo Ciuccarelli
Politecnico de Milan
Andreas Kaltenbrunner
Barcelona Media
David Laniado
Barcelona Media
Giovanni Magni
Politecnico de Milan
Michele Mauri
Politecnico de Milan
Richard Rogers
University of Amsterdam
Tommaso Venturini
Médialab, Sciences Politiques
Collaborative content creation inevitably reaches situations
where different points of view lead to conflict. In Wikipedia,
one of the most prominent examples of collaboration online,
conflict is mediated by both policy and software, and con-
flicts often reflect larger societal debates.
In this paper, we describe the ongoing project Contrope-
dia which aims to build a platform for the analysis and vi-
sualization of such controversies in Wikipedia. Controversy
metrics are extracted from activity streams generated by ed-
its to, and discussions about, individual articles and groups
of related articles. An article’s revision history and its corre-
sponding discussion pages constitute two parallel streams of
user interactions that, taken together, fully describe the pro-
cess of the collaborative creation of an article. Our proposed
platform, Contropedia, builds on state of the art techniques
and extends current metrics for the analysis of both edit
and discussion activity and visualizes these both as a layer
on top of Wikipedia articles as well as a dashboard view pre-
senting additional analytics. Furthermore, the combination
of these two approaches allows for a deeper understanding
of the substance, composition, actor alignment, trajectory
and liveliness of controversies on Wikipedia.
Our research aims to provide a better understanding of
socio-technical phenomena that take place on the web and
to equip citizens with tools to fully deploy the complexity of
controversies. Contropedia is useful for the general public
as well as user groups with specific interests such as scien-
tists, students, data journalists, decision makers and media
Contropedia is still an ongoing project and the present
document has been written with the aim of asking for feed-
back from the Wikipedia and research community.
Submitted for confidential review.
May 2014
Categories and Subject Descriptors
D.2.2 [Design Tools and Techniques]: User interfaces;
H.5.3 [Group and Organization Interfaces]: Computer-
supported cooperative work
General Terms
Algorithms, Measurement, Design, Human Factors
Keywords
Wikipedia, Controversy Mapping, Digital Methods, Information
Visualization, Actor-Network Theory, Data Mining
The aim of Contropedia is to build a tool for the real-time
analysis and visualization of discussions and controversies in
systems for collaborative content generation. The tool helps
to achieve a deeper multidisciplinary understanding of the
web as a societal artifact and its function as a mirror of
societal controversies. As our experimental field we chose
Wikipedia, the largest platform for such content and also,
overall, the sixth most tracked website1 at the moment this
paper was written. We are currently developing a real-time
dynamic atlas of the substance, composition, actor alignment,
trajectory and liveliness of controversies on Wikipedia.
We have followed an interdisciplinary approach in design-
ing and implementing the Contropedia platform combin-
ing experts from the fields of social science, data mining,
data visualization as well as new media and digital cul-
ture studies. The tool is anchored in previous experiences
of mapping public debates obtained from the ongoing EU
FP7 project, EMAPS2, and its predecessor MACOSPOL3.
These two projects have been developing a toolkit of
conceptual and methodological instruments to explore and
represent techno-scientific disputes.
1 Alexa (2014) Top sites. Available at http://www.alexa.com/topsites. Last accessed on April 10, 2014.
2 Electronic Maps to Assist Public Science (http://www. Last accessed on May 1, 2014.
3 Mapping controversies in science and technology for politics ( Last accessed on May 1, 2014.
Drawing on the traditions
of pragmatist political thinking [11] and of the science and
technology studies [4], the method of controversy mapping
has been introduced by Bruno Latour to understand and
participate in public debate [20, 21]. The idea behind this
method is that conflict is an essential part of collective
existence and that its energies can be positively channelled
(or at least the most destructive outcomes can be avoided)
by equipping citizens with the tools to deploy the full com-
plexity of controversies. Developing such tools is becoming
increasingly feasible thanks to the growing traceability of
collective discussions made possible by the spread of digital
media [22].
Here we extend these concepts to the controversies which
may evolve around Wikipedia articles. While Wikipedia’s
apparatus is primed to produce encyclopedic content, in a
system where everyone is allowed to edit, it comes as no
surprise that agreement about content is not always easily
reached. To maintain its ’encyclopedianess’, Wikipedia has
mechanisms in place that are designed to produce consensus, such
as the core content policies neutral point of view (NPoV),
verifiability and no original research. By recognizing Wiki-
pedia’s purpose to reach consensus, we suggest that by scru-
tinizing the disagreements which become apparent through-
out an article’s edit-history and talk pages, Wikipedia can
be repurposed to study not only consensus but controver-
sies [26].
Conflicts in Wikipedia articles have previously been stud-
ied by observing article edit histories [3] - where each edit
and the user who made that edit are logged, by considering
reverts [1, 18] - a key mechanism in Wikipedia for repair-
ing detrimental edits to an article, and by analyzing talk
pages [6, 9] - parallel spaces for discussion about the article
content. Additionally, research has been pursued to char-
acterize and visualize conflict and coordination on Wikipe-
dia [7, 10] and to identify which articles are controversial [24,
27]. The vast majority of these studies are primarily focused
on the social dynamics between editors and only a few have
taken into account the content of controversial edits [17, 23],
as we do in Contropedia.
Furthermore, while most of the previous literature uses
articles as basic units of content, in this project we go beyond
the mere identification of controversial pages by studying
the controversies within an article, or a set of articles, in
more detail. We combine techniques based on both mining
article edit history and analysing discussion patterns in talk
pages to identify and visualize what exactly is controversial
within an article (which figures, concepts, references, etc.),
how these controversies evolve, which users are involved,
whether these users belong to opposing camps, and what
issues they hold dear. While Contropedia uses up-to-date
data, each controversy will be provided with a historical view
to show which of the controversial issues within a (set of)
articles are especially active and at which time (are they hot
or not?, what is the trajectory of a contested issue?).
During the development of Contropedia we have put spe-
cial focus on Wikipedia’s articles related to the debate on
climate change. There are several reasons for this choice.
From a practical point of view, this choice allows us to profit
from the synergies of two other research projects in which
the partners of this consortium are participating: EMAPS
and MEDEA, which are both devoted to the analysis of the
discussions around global warming adaptation. From a po-
litical point of view, climate change is probably the most
important current techno-scientific debate, both in terms of
the number of actors and resources involved and for the
impacts that its outcomes are likely to have on collective life.
Finally, climate change has always been one of the most in-
tensely discussed topics on Wikipedia [8].
The rest of this paper is structured as follows: In Section
2 we first introduce some theoretical background on contro-
versy mapping and explain why Wikipedia is especially well
suited for controversy mapping. In Section 3 various met-
rics are introduced which allow one to locate the substance
of controversies within an article. In Section 4 the different
modules and functionalities of Contropedia’s user interface
are explained and finally, in Section 5 we recapitulate the
goals and uses of Contropedia and point towards future di-
rections of research and development.
Here we briefly discuss why Wikipedia is suitable for con-
troversy mapping. We do this by connecting the defining
characteristics of controversy mapping with the relevant ele-
ments of Wikipedia’s socio-technical apparatus. This discus-
sion establishes the prerequisites with which we can explore
the substance of controversies.
2.1 Controversy mapping
Controversy mapping is the practice of Actor-Network
Theory (ANT) through which one aims to observe and de-
scribe the debate around techno-scientific issues [20]. Some-
thing is said to be controversial when it becomes apparent
that the actors in a debate disagree and cannot ignore each
other until a compromise is reached. As such, controver-
sies constitute the best setting to observe the construction
of social life, as these are the moments at which the rela-
tions between actors and their positions are articulated and,
because of the fluidity of debates, constantly repositioned.
Mapping out this multiplicity of viewpoints allows one to
find out what is at stake in the unfolding of the controversy.
Although an elaborate treatment of ANT is beyond the scope
of this paper, for the purpose of this research it is impera-
tive to understand that in ANT an actor is understood to
be anything which is acting. That means that whenever the
presence or absence of something makes a difference, and
whenever this difference is perceived by other actors, that
something can be said to be an actor in the controversy.
Actors are thus “not only human beings and human groups,
but also natural and biological elements, industrial and artis-
tic products, economic and other institutions, scientific and
technical artifacts and so on and so forth” [20, p. 4].
Furthermore, an actor is never isolated but always composed
by, and a component of, a network. An actor exists because
it inter-acts; it is always relating and being related.
Now that the very basics of controversy mapping have
been introduced, Wikipedia articles can be positioned as sites
of controversy, which we do in the following paragraphs.
2.2 Wikipedia as a controversy defusing de-
While much research has been devoted to the identifica-
tion of controversial articles, see e.g. [24, 27], to our knowl-
edge little research has yet revealed which content within an
article is controversial. Our project seeks to go beyond the
mere identification of controversial articles and seeks to dis-
cover within a controversial article what exactly is disputed,
when it is, and to what extent.
Wikipedia’s goal is the production of encyclopedic content
through an open and collaborative environment. It will come
as no surprise that agreement about content is not always
easily reached. Therefore, Wikipedia has a socio-technical
apparatus [13] in place to support consensus reaching – con-
sensus is even said to be “the primary way decisions are
made”4. It does not imply unanimity, nor is it the result of a
vote; it is a form of decision-making which involves “an effort
to incorporate all editors’ legitimate concerns, while
respecting Wikipedia’s norms”5. Phrased differently, consensus is
the guiding principle for knowledge production and conflict
resolution on Wikipedia. Consensus is achieved most notably
through the core content policies neutral point of view6
(NPoV), verifiability7 and no original research8, although
Wikipedia also offers a plethora of other tools and
techniques to defuse controversies: to avoid edit wars there is
a three-revert rule9, controversial matters should be further
discussed on talk pages, etc.
The built-in edit history and the talk pages of a Wikipe-
dia article are rich sources for the controversy mapper as
they meticulously document the work involved in reaching
consensus. They provide a detailed record of present and
past changes to the content of articles and the unfolding of
discussions on the talk page; they reveal the fabric of col-
lective knowledge production as editors of Wikipedia engage
in tying and untying relations, and argue about categories
and terminology. Because of this, we propose that the edit
histories and talk pages of an article may be repurposed to
map out the specific actors, positions and matters of concern
in controversies, as well as the extent to which something is
controversial within that article [26].
Returning to ANT’s notion of an actor, we posit that each
linked element in a controversial Wikipedia article can be
considered an actor in the controversy. According to Wiki-
pedia’s guidelines links should generally be created to “any
subject of another article that will help readers understand
the article more fully”, to “articles with relevant informa-
tion for the current article”, to “articles explaining words of
technical terms, jargon or slang expressions/phrases”, and
to “proper names that are likely to be unfamiliar to readers”10.
4 Wikipedia:Consensus&oldid=604604782 Last accessed on May 1, 2014.
5 Wikipedia:Consensus&oldid=604604782 Last accessed on May 1, 2014.
6 Last accessed on May 1, 2014.
7 Wikipedia:Verifiability&oldid=605140017 Last accessed on May 1, 2014.
8 Last accessed on May 1, 2014.
9 Wikipedia:Edit_warring&oldid=605860122 Last accessed on May 1, 2014.
As such, links to other Wikipedia articles can be
said to identify all things which are acting within the con-
text of an article. They are also networked, because each link
within an article points to another Wikipedia article which
is collectively produced11. These actors then, are the lenses
through which we can look at the substance and activity of
controversies within a Wikipedia article.
Now that we have established links within an article, or
more generically wiki objects12, as the actors and focal points
within a controversial article, it is possible to find out which
of these actors have been most controversial within the ar-
ticle. In this section we first discuss metrics based on the
edit-history of a controversial article, then we discuss a met-
ric based on the talk page of an article, and lastly we discuss
how we combine these two metrics.
3.1 Assigning controversy scores based on the
edit history
We have implemented two metrics based on the edit-history
to indicate which of the actors have been the locus of most
edit activity or debate. By looking at edits to the sentences
in which these wiki objects appear, they both use the activity
around a specific actor as an indication of controversiality.
Before our metrics are run, our system will retrieve the full
edit history for a controversial article, including the wiki text
of each revision and the meta data conveying at what time it
was edited, who the editor was, as well as the editor’s com-
ment. As we are specifically interested in the substance of
the controversy, or more specifically, those edits where sub-
stantive contributions are made, our measures do not take
into account vandalism edits or vandalism reverts.13
Additionally, we have filtered out punctuation and maintenance
edits.14
Let $\{R_1, \ldots, R_r, R_{r+1}, \ldots\}$ be the set of revisions
of a Wikipedia article. As we are specifically interested in
the substance of edits we consider the edit activity on a
sentence level by comparing every revision $R_{r-1}$ with its
successor $R_r$.15 We first split each revision into sections16
and then make a pairwise comparison of the sections via a
10 Last accessed on May 1, 2014.
11 Note that editors of a Wikipedia page are also actors and
12 In a later stage we will also consider external links, references, figures, and templates as actors.
13 Vandalism edits are detected, and hence discarded, by identifying whether a comment contains the word ’vandal’, whether the user name making the revert belongs to one of the known vandalism bots, or whether an IP edit is reverted within 60 seconds.
14 We consider an edit as a punctuation or maintenance edit if the textual difference between two revisions has a Levenshtein distance smaller than five.
15 Whenever a user makes multiple consecutive edits, we compare the last version made by the user with the version which she started revising.
16 We do this both because [10] and [14] have shown that controversy is located in specific sections of an article and because it makes comparison easier.
diff17 algorithm. Whenever a difference between two
sections is detected, we identify the individual sentences which
are edited and the wiki objects contained in them.
For each wiki object thus obtained we store in a database
the article’s revision id, the user who made the edit, the com-
ment added by the user, the sentence which was changed,
the diff of the edit, the wiki object, its canonical form18,
and the type of edit (inserted, deleted, changed, or change
in surrounding sentence).
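The comparison step described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the project's implementation: it compares whole sentences rather than tokens, uses Python's `difflib.SequenceMatcher` (a matching-block heuristic, not a strict longest-common-subsequence diff), splits sentences naively, and only recognizes wiki links as wiki objects. The function names are hypothetical.

```python
import difflib
import re

# Matches [[Target]], [[Target|anchor]] and [[Target#Section|anchor]];
# group 1 is the linked article name.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def wiki_objects(sentence):
    """Canonical forms (type, article name) of the wiki links in a sentence."""
    return [("link", m.group(1).strip()) for m in WIKI_LINK.finditer(sentence)]


def edited_sentences(old_section, new_section):
    """List (edit_type, sentence, wiki_objects) for sentences that differ."""
    old = split_sentences(old_section)
    new = split_sentences(new_section)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            edits += [("changed", s, wiki_objects(s)) for s in new[j1:j2]]
        elif tag == "insert":
            edits += [("inserted", s, wiki_objects(s)) for s in new[j1:j2]]
        elif tag == "delete":
            edits += [("deleted", s, wiki_objects(s)) for s in old[i1:i2]]
    return edits
```

Because the canonical form keeps only the link type and the target article, a change to the anchor text of a link is still attributed to the same wiki object.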
3.1.1 Edit-based measure
Although we are interested in finding out how controversial
a wiki object $O_k$ is, we are comparing the edit activity
of sentences in which $O_k$ appears. Intuitively, the more wiki
objects appear in an edited sentence, the less focus there
is on one particular wiki object, i.e. an actor. For every
edited sentence $S_j$ we thus divide the weight attributed to a
wiki object by the total number of wiki objects $o(S_j)$ which
appear in that sentence.
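For a given revision $R_r$ and each wiki object $O_k$ in edited
sentences $S_j$ we assign a controversy score
$$c(O_k) = \sum_{i=1}^{r} \; \sum_{S_j \in R_i :\, O_k \in S_j} \frac{1}{o(S_j)} \qquad (1)$$
In other words: over all revisions up to $R_r$ and over all edited
sentences where $O_k$ appears in a certain revision we sum the
inverse of the number of wiki objects in these sentences.
We can also make reverts count more:
$$c(O_k) = \sum_{i=1}^{r} \varrho_i \sum_{S_j \in R_i :\, O_k \in S_j} \frac{1}{o(S_j)} \qquad (2)$$
where $\varrho_i = 2$ when the edit of revision $R_i$ is a revert and
$\varrho_i = 1$ if it is a normal edit.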
We obtain the normalized controversy score for a wiki ob-
ject in a particular article by normalizing all the scores by
the wiki object with the maximum controversy score. To
find out which wiki object is most controversial, i.e. around
which actors most negotiations took place, we simply rank
the wiki objects of the article in descending order.
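As an illustration of the edit-based measure, the following sketch accumulates the inverse object counts per edited sentence, doubles the contribution of reverts, and then normalizes and ranks the result. The input layout (a list of revisions, each with an `is_revert` flag and a list of edited sentences given as lists of wiki objects) is an assumption made for this example, as is the choice to count distinct objects per sentence for $o(S_j)$.

```python
from collections import defaultdict


def edit_based_scores(revisions, revert_weight=2.0):
    """Sum 1/o(S_j) over edited sentences; reverts count revert_weight times.

    revisions: list of dicts with keys
      'is_revert'        -> bool
      'edited_sentences' -> list of lists of wiki objects (one list per sentence)
    With revert_weight=1.0 every edit counts equally.
    """
    scores = defaultdict(float)
    for rev in revisions:
        weight = revert_weight if rev["is_revert"] else 1.0
        for objects in rev["edited_sentences"]:
            distinct = set(objects)  # assumption: o(S_j) counts distinct objects
            for obj in distinct:
                scores[obj] += weight / len(distinct)
    return dict(scores)


def normalize_and_rank(scores):
    """Divide by the maximum score and sort in descending order of score."""
    top = max(scores.values(), default=0.0)
    if top == 0:
        return []
    ranked = [(obj, score / top) for obj, score in scores.items()]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


revisions = [
    {"is_revert": False, "edited_sentences": [["climate", "IPCC"]]},
    {"is_revert": True, "edited_sentences": [["climate"]]},
    {"is_revert": False, "edited_sentences": [["IPCC", "CO2"], ["climate"]]},
]
scores = edit_based_scores(revisions)
ranking = normalize_and_rank(scores)
```

In this toy history 'climate' collects 0.5 + 2 + 1 = 3.5 and therefore ranks first with a normalized score of 1.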
3.1.2 User-based measure
While the ’edit-based’ measures of Eqs. (1) and (2) al-
ready provide wiki objects with an intuitively intelligible
controversy rank, we have also experimented with a ’user-
based’ measure, inspired by the mutual revert measure of [19].
Again, let $\{R_1, \ldots, R_r, R_{r+1}, \ldots\}$ be the set of
revisions of a Wikipedia article. We denote by $n^u_r$ the
number of times a user $U_u$, author of revision $R_r$, has edited the
article19. We characterize edits by tuples $(n^v_q, n^u_r)$, where $u$
denotes the index of the user who makes edits to the article
versions edited by user $U_v$. Note that $r = q + 1$.
17 Like the diff file comparison utility, the text of an element is tokenized on white space, full stop, or line break and the longest common sub-sequence of tokens is calculated. Tokens absent in the sub-sequence but present in the new revision are treated as insertions, and tokens absent in the sub-sequence but present in the old revision are treated as deletions.
18 For each wiki object we derive its canonical form, which allows us to uniformly identify elements across versions. For example, the canonical form of a wiki link is the combination of its type (link) and the article name it refers to, allowing us to track the edit activity in the anchor text.
19 Here too we consider consecutive edits by the same user as one edit.
As we want to focus on wiki objects instead of revisions,
we denote by $n^u_r(O_k)$ the total number of times (until
revision number $r$) in which wiki object $O_k$ was edited by
user $U_u$.
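Let us denote in analogy to Eq. (1)
$$\tilde{n}_r(O_k) = \sum_{S_j \in R_r :\, O_k \in S_j} \frac{1}{o(S_j)}$$
as a weighted version of this counter so that we include the
notion that there should be less weight attributed to object
$O_k$ if multiple wiki objects are present in the edited
sentences $S_j$. Weighted edit pairs for a specific wiki object $O_k$
are then defined as $(\tilde{n}^v_q(O_k), \tilde{n}^u_r(O_k))$, where
$$\tilde{n}^u_r(O_k) = \sum_{R_i,\, i \le r :\, U_u \text{ changes } O_k} \tilde{n}_i(O_k) = \sum_{R_i,\, i \le r :\, U_u \text{ changes } O_k} \; \sum_{S_j \in R_i :\, O_k \in S_j} \frac{1}{o(S_j)}$$
In other words: we sum the weighted edit count of all
revisions of user $U_u$ where she changes object $O_k$.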
When ranking editor contributions, two main edit types
can be distinguished: when one or both of the editors have
made few edits to the article, these are typically editors who
are not “invested” in the controversy, and when both editors
are heavily invested. In order to express this distinction
numerically, we use the lesser of the re-weighted edit counters
$(\tilde{n}^v_q(O_k), \tilde{n}^u_r(O_k))$, so that the total count includes edits by
less invested pairs of users as well, but with a much smaller
weight.
Finally, we multiply by the number of editors $|E_i|$ (the
larger the armies, the larger the war).
$$c(O_k) = |E_i| \sum_{i=1}^{r} \min\left(\tilde{n}^v_{i-1}(O_k),\, \tilde{n}^u_i(O_k)\right)$$
est weighted edit counts to eliminate cases with conflicts
between two persons only.
c(Ok)=|Ei| r
min ˜nv
i=1 min ˜nv
The results are similar to the ’edit-based’ measures, with
small differences in emphasis.20
Section 4 reports on the uses of these metrics. First,
however, we introduce how we measured controversy scores
for actors based on the activity in the talk pages.
3.2 Assigning controversy scores based on the
talk pages
Besides looking at controversies in article edit histories we
analyze talk pages, which are special wiki pages associated to
each article and devoted to discussion about how to improve
article quality. Here is where controversies are explicitly
discussed by editors of an article. In cases in which an article
is protected from editing21, talk pages are the only place
where controversy takes place.
20 In a later stage we will also experiment with mutual edits, i.e. only regard those edits where editors have mutually edited each other at least once.
First, we identify all talk pages associated with an article
and parse them to detect thread titles, as well as the signature
and date of each comment. Additionally we look at comment
indentation to extract the thread structure of messages and
replies, i.e. we reconstruct the discussion tree. For this
step we follow the methodology of [6] and [9], but instead of
quantifying controversiality at the level of articles, we do it
at the level of discussion threads.
We represent each thread as a tree of comments and replies,
and we characterize controversiality of each thread based on
two metrics, representing respectively depth and width of
the discussion. The first metric, derived from the structure
of the discussion tree, is the maximum depth of the thread,
i.e. the maximum level of indentation of comments. When
comments are directly written under the thread title without
indentation, i.e. no comment is a reply to another comment,
maximum depth of the thread is 1. If some comment is a re-
ply to another comment, maximum depth is 2, while if there
is a reply to a reply maximum depth is 3, and so on. High
values of this metric indicate a high presence of
argumentation among users and are therefore a straightforward proxy
for controversiality. However, a deep discussion thread can
be created just by a few users, typically just by two users
arguing with each other. Thus, as a complementary met-
ric we take the number of users participating in a thread,
representing the width of the discussion.
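A minimal sketch of the two thread metrics follows. It assumes that each comment occupies a single line, that replies are indented with leading colons as is customary on talk pages, and that every comment carries a signature linking to the author's user page; real talk pages are messier (multi-line comments, bullet indentation, unsigned comments), so this is an illustration rather than a parser. The function name is hypothetical.

```python
import re

# A signature is assumed to contain a link to the user page: [[User:Name|...]]
SIGNATURE = re.compile(r"\[\[User:([^\]|/]+)")


def thread_metrics(comment_lines):
    """Return (maximum depth, number of distinct users) for one thread."""
    max_depth = 0
    users = set()
    for line in comment_lines:
        if not line.strip():
            continue
        # Number of leading ':' characters gives the indentation level;
        # an unindented comment has depth 1, a reply depth 2, and so on.
        indent = len(line) - len(line.lstrip(":"))
        max_depth = max(max_depth, indent + 1)
        match = SIGNATURE.search(line)
        if match:
            users.add(match.group(1).strip())
    return max_depth, len(users)


thread = [
    "The sea level sentence needs a source. [[User:Alice|Alice]] 10:02, 3 May 2014 (UTC)",
    ":It is cited in the next paragraph. [[User:Bob|Bob]] 10:15, 3 May 2014 (UTC)",
    "::That source does not support it. [[User:Alice|Alice]] 10:31, 3 May 2014 (UTC)",
    "I agree with Alice. [[User:Carol|Carol]] 11:00, 3 May 2014 (UTC)",
]
depth, width = thread_metrics(thread)
```

Here the reply to a reply gives a depth of 3, while three distinct users give a width of 3.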
3.3 Combining controversy scores from the edit
history and the talk pages
Discussions in talk pages are not explicitly associated with
the specific sections of the article to which they relate,
so we establish this connection relying on a set of
heuristics to associate each discussion thread with one or more
sections.
The first method consists of detecting when a user men-
tions a discussion at the moment the article is edited. When
submitting an edit, a user has to fill in a comment field de-
scribing the content of the edit. If in this comment we find
a link to a discussion thread, we deduce that the editor is
executing what has been deliberated in the corresponding
thread, or is anyway referring to that discussion as related
to the edit. We therefore interpret this as straightforward
evidence of a connection between the article section affected
by the edit and the discussion thread mentioned in the edit
comment.
As a second method, we look for each section’s title in dis-
cussion thread titles, or in the text of the discussion threads.
If the section’s title is mentioned in a discussion thread, we
establish a link between the two. Analogously, we also look
for the word “abstract” in the talk pages to detect connec-
tions between a discussion thread and the article’s abstract.
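The two heuristics above could be sketched as follows. The link pattern assumes that an edit comment refers to a thread with a wiki link of the form `[[Talk:Article#Thread title]]`, and the second function uses a plain case-insensitive substring match; both are assumptions for illustration, and the function names are hypothetical.

```python
import re

# Captures the thread title after '#' in links such as
# [[Talk:Global warming#Sea level rise wording|talk]].
THREAD_LINK = re.compile(r"\[\[Talk:[^\]|#]*#([^\]|]+)")


def threads_in_edit_comment(comment):
    """Thread titles that an edit comment links to (first heuristic)."""
    return [m.group(1).strip() for m in THREAD_LINK.finditer(comment)]


def threads_mentioning_section(section_title, threads):
    """Threads whose title or text mentions the section title (second heuristic).

    threads: dict mapping thread title -> full text of the thread.
    """
    needle = section_title.lower()
    return [
        title
        for title, text in threads.items()
        if needle in title.lower() or needle in text.lower()
    ]


comment = "Rewording per [[Talk:Global warming#Sea level rise wording|talk]]"
linked = threads_in_edit_comment(comment)

threads = {
    "Sea level rise wording": "Should we say 'rise' here?",
    "Lead image": "The feedback section needs a chart.",
}
matched = threads_mentioning_section("Feedback", threads)
```

A substring match of this kind is prone to false positives for short or generic section titles such as "History".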
To be able to match a higher number of discussion threads,
and have a more robust matching, we are currently develop-
ing more sophisticated methods, based on common elements
between a discussion thread and an article section. Namely,
we look for the co-occurrence of common actors mentioned,
and of common users active in the same time period. In
other words, if we find that the same actors are mentioned
in a discussion thread and in a section of the article, we
hypothesize a relationship between the two.
21 Wikipedia:Protection_policy&oldid=606716407 Last accessed May 3, 2014.
Similarly, if some
users are editing the article and discussing in the associated
talk page at the same time, we interpret this as a hint for
associating the discussion thread in which these users are
discussing to the article section they are editing.
After associating discussion threads to article sections, we
are able to assign to each actor and each section the contro-
versiality indexes coming from the analysis of both sources.
Contropedia will be an online platform22 that seeks to
provide insights into the disagreement about the substance
of controversial Wikipedia articles. As addressed in the pre-
vious section, the platform makes use of the rich data that
is available on the history and talk pages of Wikipedia ar-
ticles to map out the disputed aspects of controversial ar-
ticles. The aim is to provide insights into the debate itself
and gain an understanding of the matters of concern, i.e. the
disputed actors within an article. In this section we describe
and present the design of the main components of the Con-
tropedia platform. The questions we have currently sought
to answer are: What is controversial and to what extent?
Who is involved in the controversy? When are controversies
These questions are addressed in a number of components,
which consist of a layer on top of the controversial Wikipedia
article as well as a dashboard view of controversial actors.
These visualizations depict three main sources of data taken
from the Wikipedia platform: controversial elements within
an article, discussion threads from talk pages, and interac-
tion networks of Wikipedians editing each other’s revisions.
4.1 The overall platform design
Contropedia provides visual access to our analyses, in
order to assist the user in understanding how controversies
are deployed. Diagrammatic tools are particularly suited
for this purpose, as controversies need tools that do not di-
vide or analyze the elements separately but present them in
an interconnected and indivisible manner [15]. In Contro-
pedia, information visualization is used to convey the page
history, its genesis and the social dynamics which generated
it. Providing visual access to the page history provides an
overview and allows the user to study the dynamics behind
controversial topics. It also allows one to compare
controversial pages and to identify patterns, similarities and differences.
Compared to other tools that use visualization on Wikipe-
dia, see e.g. [7, 25], Contropedia proposes a shift in visu-
alization goals. The visualization does not just represent
an abstract view of Wikipedia data but overlays our mea-
sures of controversiality onto Wikipedia’s contents, similar
to annotations. We have applied methods from information
visualization to identify and design visual models in order
to communicate the data and to provide useful interactions
to explore controversies [2]. The aim of these visualizations
is not to convey data in a more efficient way, but rather to
use them to shape new knowledge about the topic [12].
22 A demo of Contropedia can be found at http://www. We provide it here for review purposes only. Please do not distribute the link or make it public, as the system is still in an experimental phase.
Figure 1: Controversial layer view: minified version of the
Global Warming article which visualizes where controversial
wiki objects are located within an article, as well as the
discussion and content toggle options.
Contropedia seeks to annotate and rework the original
Wikipedia article layout to show which actors are contro-
versial and in which part of the article they are. The whole
page is represented through a minified version of the arti-
cle (see Figure 1)23 that simplifies the original article and
highlights the controversial elements while respecting the
original article’s layout. To achieve this, a visual contrast
has been created between controversial and non-controversial
elements. Controversial elements are represented through
five colour shades, from the most controversial (red) to the
least controversial (pale blue). To enhance the visual contrast, colors
and graphical elements were removed from the page. Text
is rendered using a block font, emphasizing the proportions
of each section and the position of controversial elements.
Images are replaced with a standard placeholder. We used gray colours
to represent all non-controversial elements.
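One simple way to map a normalized controversy score onto the five shades is equal-width binning, sketched below. The source does not state how the bins are chosen, so the binning rule and the function name are assumptions; index 0 stands for the least controversial shade (pale blue) and index 4 for the most controversial one (red).

```python
def shade_index(normalized_score, n_shades=5):
    """Map a normalized score in (0, 1] to a shade index 0..n_shades-1.

    Returns None for a score of 0, i.e. for non-controversial elements,
    which are drawn in gray. Equal-width bins are an assumption.
    """
    if normalized_score <= 0:
        return None
    return min(int(normalized_score * n_shades), n_shades - 1)
```

The `min` call keeps a score of exactly 1 in the top shade instead of producing an out-of-range index.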
Contropedia’s visual navigation is based on the semantic
zooming pattern: “Zooming need not follow a strict geomet-
ric metaphor: semantic zooming methods can modify both
the amount of information shown and how it is displayed as
analysts move among levels of detail.” [5]. In Contropedia,
each page is subdivided according to article sections. Em-
pirically, article sections are a good solution to sample the
article in coherent sub-elements24, as sections can be seen as
minimum thematic elements. In our visual representation,
each section will share the same graphical structure. The
use of a compact, minified, representation allows one to get
an overall view of the article and to identify the most rel-
evant, controversial, sections at a glance. The user is then
able to expand part of the article and see the article’s text
in its original layout. By hovering over a wiki object, a further
level of zoom is provided showing the list of edits related to
the wiki object, i.e. the debate about the actor.
4.2 Where is the controversy located, and what
is controversial and to what extent?
Contropedia shows how much a given part of an article is
edited sequentially by disagreeing article editors (to
appreciate the lack of consensus on it). Additionally, if the section
of an article is discussed in the talk pages, the corresponding
discussion threads are linked from the article and their
controversiality is accounted for25.
23 Figures 1, 2 and 3 can be seen as a functional demo on Last accessed May 1, 2014.
24 Wikipedia guidelines recommend avoiding sections that are too short or too long. Basic_technique Last accessed May 1, 2014
Figure 2: Content view: Zoomed in layer on top
of the original Global Warming article with controversial
wiki objects highlighted according to their
controversiality index.
4.2.1 Controversial edits
When the application is loaded, each section is represented
with its title, a lateral bar showing the overall controversy
level, and two controversy navigation buttons: discussions
and content (see Figure 1). When clicking discussions the
main threads related to the section are represented through
dots. When clicking content the minified version of that sec-
tion can be extended so that a zoom overlay of the original
article’s section is shown (see Figure 2).
The content button reveals the original Wikipedia structure
of the article with additional highlights on the most
controversial wiki objects. Hovering over those actors allows the
users to go into deeper detail and shows them the section
as if they were on the original page (see Figure 2). All the
original elements are set in grayscale, while links are
highlighted with a colored background according to their
controversiality index and images are overlaid by a shape.
This way we can show controversial parts directly in the
Wikipedia format in a less abstract way.
On mouse-over, the highlighted controversial wiki objects
within an article can be expanded to show the edits involving,
or debate about, that wiki object (see Figure 3). This
function allows the user to zoom in and look further into
the edit history of a particular wiki object, as the edits are
divided and presented by wiki object, as opposed to the
overall article history over time as offered by the original
Wikipedia interface. The edit view allows the user to further
scrutinize the construction of controversial elements in
an article and to gain an understanding of what precisely is
disputed about that particular subsection of the controversy.
4.2.2 Controversial discussion threads
Underneath the discussion button at the top of the sec-
tion, dots represent the discussion threads in the talk pages
that are about that section. The size indicates the num-
ber of Wikipedia editors involved in the discussion thread
and the redness indicates the controversiality of the thread.
25At the time of writing this paper, this specific link has not
been implemented in the demo yet.
Figure 3: Edits view: The detailed view of the edits
related to the wiki object ‘climate’ in the Global
Warming article.
Figure 4: Discussion view: Toggled view of the ‘dis-
cussion’ button depicting the history of a discussion
and the authors involved. This is a mockup and not
yet implemented in the demo.
Toggling the discussion button shows the history of the discussion
behind the page and the Wikipedians involved in it
(see Figure 4). Following previous project guidelines and
the overall color scheme, threads are represented as trees to
highlight the evolution, the ramifications and the dynamics
of the debate over time. Each tree leads to the full thread
on the discussion page.
4.2.3 Controversial wiki objects
The layer view provides a powerful way to present what
the controversial elements are within the context of the arti-
cle. In addition to the layer view the Contropedia platform
presents a dashboard view to quickly identify the most con-
troversial elements (see Figure 5)26. This view is designed
to address a more analytical view on the data and also in-
cludes highly controversial elements which might have been
deleted in the shown revision. This dashboard consists of
a table listing all the controversial elements, from the most
controversial to the least. Each cell contains a different
variable associated with the element, represented through a
suitable visualization. The table adopts the small multiples idea,
allowing a vertical comparison of the same variable among
controversial elements. Through its color, the bar on the left
represents how controversial an actor is.
26The figure is accessible in the Contropedia Demo under Last accessed May 1, 2014.
Figure 5: Dashboard view: Analytics for the most
controversial elements in the article on Global
Warming.
A timeline shows the number of edits through time, allowing
the user to identify historical periods where several
edits affected the same element. The timeline is visualized
using a horizon chart, a visualization which is particularly
suitable for items with high variability that should be
represented in a compact way. The type of actor is represented
through an icon. All the other variables are represented
through bars. Deletes, inserts and changes are normalized
on the maximum value of the three. In this way it is possible
to compare the same variable among different elements
but also to compare values of the same item. The remaining
variables are normalized independently, each on its own
maximum value. This is due to the fact that they are different
properties of the page, and it would not make sense to compare them.
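The two normalization schemes described above can be sketched as follows. This is a minimal illustration, not the Contropedia implementation: it assumes each table row is a dictionary of non-negative numbers, and that the remaining variables are each scaled on their own column maximum. The function name `normalize_dashboard` and the field names are hypothetical.

```python
def normalize_dashboard(rows, shared=("deletes", "inserts", "changes")):
    """Scale bar lengths to [0, 1] for a list of row dictionaries.

    Columns named in `shared` are divided by one common maximum, so their
    bars are comparable both across rows and within a row. Every other
    column is divided by its own maximum (an assumption of this sketch).
    """
    # One maximum over all three edit-type columns and all rows.
    shared_max = max((row[k] for row in rows for k in shared), default=0) or 1

    # A separate maximum for each remaining column.
    own_max = {}
    for row in rows:
        for key, value in row.items():
            if key not in shared:
                own_max[key] = max(own_max.get(key, 0), value)

    scaled = []
    for row in rows:
        scaled.append({
            key: value / (shared_max if key in shared else (own_max[key] or 1))
            for key, value in row.items()
        })
    return scaled


rows = [
    {"deletes": 2, "inserts": 8, "changes": 4, "editors": 5},
    {"deletes": 1, "inserts": 2, "changes": 0, "editors": 10},
]
scaled = normalize_dashboard(rows)
```

With a shared maximum, a delete bar of length 0.25 and an insert bar of length 1.0 in the same row can be read against each other, whereas the `editors` bars are only comparable down their own column.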
The timeline visualization is also meant to provide access
to the full list of single revisions. Following the folded in-
teraction which was used in the minified layout, when the
timeline is clicked, all edits to that wiki object are shown
(see Figure 3). This revisions table adopts a classical “diff
visualization” showing the added and removed parts with
color-coding. It also shows several other relevant variables
like the name of the editor, the timestamp of the edit, the
comment, and the name of the section in which the edit was made.
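A word-level diff of the kind shown in such a revisions table can be sketched with Python's standard `difflib` module. This is an illustrative assumption about how added and removed parts might be extracted, not the method used in the demo; the function name `word_diff` is hypothetical.

```python
import difflib


def word_diff(old, new):
    """Return (kind, text) pairs for the words removed from `old`
    and added in `new`, in document order."""
    a, b = old.split(), new.split()
    ops = []
    matcher = difflib.SequenceMatcher(a=a, b=b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' contributes both a removed and an added span.
        if tag in ("delete", "replace"):
            ops.append(("removed", " ".join(a[i1:i2])))
        if tag in ("insert", "replace"):
            ops.append(("added", " ".join(b[j1:j2])))
    return ops


ops = word_diff("climate change is natural",
                "climate change is largely anthropogenic")
```

Each `removed` span would be rendered in one color and each `added` span in another, next to the editor name, timestamp and comment of the revision.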
The interactive layer view on top of the article, as well as
the additional dashboard view, show various analytics and
provide the user of Contropedia with various possibilities
to quickly gain an understanding of what is controversial,
where something is controversial within the article, and to
what extent. A feature which we have not yet implemented
will be placed at the top of the layer view and will allow
the user of Contropedia to set the time frame to visualize
the selected article. This will allow the user to explore the
trajectory of which sections and actors were controversial
at which time, i.e. at what point an actor was still hot or
whether its controversiality has cooled down over time.
4.3 Who is involved in the discussion?
The Contropedia platform helps in discovering who the
main actors of the controversy are. In addition to extracting
the actors in the debate, Contropedia also extracts the
actors of the debate, that is to say, the Wikipedians who
have been most active in rousing the controversy. In Contropedia
the editors of an article are used as proxies to identify
controversial wiki objects and to define clusters, or editor
camps, around an issue. These are in turn used to identify
the multiplicity of viewpoints in the controversy by connecting
editor clusters to the substance of the controversy in the
article. For the purpose of this paper, we provide a case
study of these actors in the context of one specific article:
the Climatic Research Unit email controversy.
Figure 6: Reply network: Depicting Wikipedia editors as nodes and replies between editors as edges based
on the talk page of the Climatic Research Unit email controversy article.
To generate clusters of editor camps we built editor net-
works based on two types of interactions between editors:
interactions in reply chains in the article’s talk page (reply
network), and reverts of edits to the article (revert network).
4.3.1 Interaction graphs
An interaction in a reply chain means that in a specific
discussion thread on the article talk page the users have
mutually replied to each other consecutively. The simplest
case we consider is that user A writes a comment, user
B replies to this comment and user A replies back. Much
longer chains can occur.
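The reply chain definition above can be sketched in a few lines of Python. The sketch assumes a thread is given as a list of `(comment_id, parent_id, author)` tuples and that a longer alternating chain between the same two users counts once; both the input format and this counting rule are assumptions made for illustration, and `count_reply_chains` is a hypothetical name.

```python
from collections import Counter


def count_reply_chains(comments):
    """Count A -> B -> A reply chains per unordered pair of users.

    `comments` is a list of (comment_id, parent_id or None, author).
    """
    parent = {cid: pid for cid, pid, _ in comments}
    author = {cid: who for cid, _, who in comments}
    chains = Counter()
    for cid, pid, who in comments:
        gid = parent.get(pid)  # grandparent comment, if any
        if pid not in author or gid not in author:
            continue
        a, b = author[gid], author[pid]
        if who != a or a == b:
            continue  # not an A -> B -> A pattern
        ggid = parent.get(gid)
        if ggid in author and author[ggid] == b:
            continue  # continuation of a chain that was already counted
        chains[frozenset((a, b))] += 1
    return chains


thread = [
    (1, None, "alice"),
    (2, 1, "bob"),
    (3, 2, "alice"),   # alice -> bob -> alice: one chain
    (4, 3, "bob"),     # extends the same chain, not counted again
    (5, 1, "carol"),
    (6, 5, "dave"),    # dave never replied before: no chain
]
chains = count_reply_chains(thread)
```

The resulting counts per pair of users can then serve as edge weights of a reply network.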
Examples of these graphs can be observed in Figure 6
(reply network) and Figure 7 (revert network). In both cases
anonymous users have been removed from the networks. The
edge weights indicate the number of reply chains between
each pair of users in the reply network and, in the case of
the revert network, the number of reverts between the two
users (more exactly, the sum of the reverts between the two
users in both directions divided by two and rounded up to
the nearest integer).
Figure 7: Revert graph: Depicting Wikipedia editors
as nodes and reverts of editor contributions as
edges based on the edit history of the Climatic Research
Unit email controversy article.
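The revert edge weight described above (reverts in both directions, summed, halved and rounded up) can be written as a short Python sketch. The input format, a list of `(reverting_editor, reverted_editor)` pairs, and the choice to ignore self-reverts are assumptions of this example; `revert_edge_weights` is a hypothetical name.

```python
from collections import Counter


def revert_edge_weights(reverts):
    """Map each unordered editor pair to ceil(total reverts / 2)."""
    totals = Counter()
    for reverter, reverted in reverts:
        if reverter != reverted:  # assumption: self-reverts are ignored
            totals[frozenset((reverter, reverted))] += 1
    # (n + 1) // 2 is the integer form of "divide by two and round up".
    return {pair: (n + 1) // 2 for pair, n in totals.items()}


weights = revert_edge_weights([
    ("a", "b"), ("b", "a"), ("a", "b"),  # three reverts between a and b
    ("c", "a"),                          # one revert between a and c
])
```

Rounding up ensures that a single one-directional revert still yields an edge of weight one.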
4.3.2 “My enemy’s enemy is my friend” network
All the interactions described in the previous sub-section
can be considered interactions between antagonists in the
controversies. Thus, standard community detection methods,
which aim at maximizing the number of connections
within a community (and minimizing those outside it), are not
suitable here.
We have therefore developed a Gephi plugin27 aimed at
finding communities when edges are connecting antagonists.
It is based on the idea that “the enemy of my enemy is
my friend”. The plugin first computes a “friendship” score
between each pair of nodes. The more common antagonists
two nodes have, the higher their friendship score will be.
The more connected two nodes are, the lower their friendship
score will become. Then all original edges of the network are
removed (for community detection) and each pair of nodes
that has a score higher than a given threshold will become
connected. Finally, Gephi’s modularity algorithm is used to
detect communities in this new graph. Later the original
edges of the network are restored and the provisional edges
are removed again.
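The steps above can be sketched in plain Python. The text does not give the exact friendship formula, so this sketch assumes a simple one: the number of common antagonists minus the weight of the direct antagonistic edge. It also substitutes connected components for Gephi's modularity algorithm to stay self-contained. All names (`friendship_graph`, `camps`) are hypothetical, and the code is not the plugin's implementation.

```python
from itertools import combinations


def friendship_graph(conflicts, threshold=0):
    """Build provisional 'friendship' edges from antagonistic edges.

    `conflicts` maps frozenset({u, v}) to the weight of the antagonistic
    edge between u and v. Two nodes become friends when their score is
    higher than `threshold`.
    """
    neighbours = {}
    for pair, weight in conflicts.items():
        u, v = tuple(pair)
        neighbours.setdefault(u, {})[v] = weight
        neighbours.setdefault(v, {})[u] = weight

    friends = {node: set() for node in neighbours}
    for u, v in combinations(sorted(neighbours), 2):
        common = neighbours[u].keys() & neighbours[v].keys()
        # Assumed score: more common antagonists raise it,
        # a direct conflict between u and v lowers it.
        score = len(common) - neighbours[u].get(v, 0)
        if score > threshold:
            friends[u].add(v)
            friends[v].add(u)
    return friends


def camps(friends):
    """Group nodes into connected components of the friendship graph
    (a simple stand-in for modularity-based community detection)."""
    seen, result = set(), []
    for start in sorted(friends):
        if start in seen:
            continue
        stack, camp = [start], set()
        while stack:
            node = stack.pop()
            if node not in camp:
                camp.add(node)
                stack.extend(friends[node] - camp)
        seen |= camp
        result.append(camp)
    return result


conflicts = {
    frozenset(("a1", "b1")): 2,
    frozenset(("a1", "b2")): 1,
    frozenset(("a2", "b1")): 1,
    frozenset(("a2", "b2")): 3,
}
editor_camps = camps(friendship_graph(conflicts))
```

In this toy network every antagonistic edge runs between the two camps, so `a1` and `a2` share both antagonists and end up in one camp, as do `b1` and `b2`.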
Examples of the outcome of this algorithm can be found
in Figure 6 (reply network) and Figure 7 (revert network).
Nodes from the same communities (those “fighting” with the
same users) have the same color. We observe that the large
majority of interactions happen between users of
different communities. The algorithm is still in an experimental
phase and extensive user evaluations are currently
being performed to verify its outcome. Nonetheless an
initial verification of the outcome of the algorithm indicates
that the obtained communities indeed correspond to persons
with similar positions in the controversies.
While currently these networks are generated on the basis
of an article’s full edit history, thus involving all wiki ob-
jects within the article, we will soon also implement these
networks around specific wiki objects. This will allow us to
indicate per actor whether the debate around it is polarized
or involves multiple groups, whether the controversy is ar-
tificially kept alive by a specific set of users (which might
point towards slant or purposeful bias), etc.
The DMI proverb to “learn from medium method” [16] entails
that any online (search) engine or platform has its own
specific methods and objects with which digital researchers
can work. This medium-specific approach seeks to iden-
tify the core objects of the medium, to study how these ob-
jects are handled by the medium and to repurpose medium
method productively. Closely looking at the inner workings
of Wikipedia and thereby recognizing that Wikipedia works
as a controversy defusing device, and that each link within
an article can be seen as an actor in a debate, has allowed us
to repurpose controversial Wikipedia articles as ideal sites
to map controversies. We have shown in this paper how
actors in a Wikipedia article may be assigned controversy
scores by using edit histories and talk pages. We then in-
troduced the design of Contropedia which is intended to
get a quick overview of what is controversial in an article,
where and to what extent. The design, and accompanying
demo, furthermore allow the users of Contropedia to zoom
in to the specific discussion and changes around actors in
the unfolding debate. The last part of our paper dealt with
detecting which Wikipedians have been most active in
rousing the controversy, identifying the multiplicity of
viewpoints around actors in a controversy and showing which
issues they hold dear. We thus believe that Contropedia,
as an implementation of controversy mapping, provides an
elaborate toolkit of social life as it unfolds on Wikipedia.
27The Antagonist Based Community Detection plugin for
Gephi can be found at
We thus foresee many different types of users:
Wikipedians can gain insight into the substance and
build-up of controversies, allowing informed decisions
about the management of “edit wars” and disagree-
ments about the articles’ content.
For scientists it will provide a tool to visualize in real-
time the dynamics of techno-scientific debates, stim-
ulating the framing and phrasing of scientific issues,
helping to clarify and defuse the conflict.
For teachers the platform could be employed to teach
the complexity of techno-scientific controversies. By
showing the full complexity of a scientific controversy
as well as its entanglement of relations and networks,
Contropedia can provide an innovative viewpoint on
‘science in action’.
For decision makers, both public and private, it will
make it possible to more readily understand the essence and the
extent of open controversies and therefore to assess
their potential consequences.
For communicators, on professional media or open plat-
forms (e.g. blogs), access to the background of the con-
troversies will facilitate a better informed description
of facts and issues to the public.
For citizens wishing to be better informed, and for so-
ciety in general, access to an integrated platform map-
ping the nature and the extent of Wikipedia contro-
versies will help to build a greater trust in the online
encyclopedia as well as to make its limits explicit.
Although we are still in early phases of the Contropedia
platform this paper is intended to share our work as soon
as possible with the Wikipedia community and Wikipedia
researchers, so that we can get early feedback about the usefulness
of our system. As such we have already identified
various points in our paper where there is still room for
improvement or experimentation. A few other areas could also
use further exploration. First, as a link should generally
“appear only once in an article, but if helpful for readers,
[they] may be repeated in infoboxes, tables, image captions,
footnotes, and at the first occurrence after the lead”, to
increase the resolution of our metrics it could be fruitful to
also detect where in the text a link could have been.
Second, as in our current metrics each article will always have
a most controversial actor, we currently focus only on ar-
ticles which are already known to be controversial. Third,
while we currently focus on a subset of the controversial arti-
cles in the English Wikipedia we would like to expand these
techniques to other controversial articles and other language
Wikipedias. And last, apart from just looking at the contro-
versiality of actors within one article, we also intend to allow
Contropedians to explore related sets of articles. While cur-
rently we provide a zoom in to actors within an article, we
could thus also allow one to zoom out and see which articles
are more controversial in the network of related articles.
The research leading to these results has received funding
from the EU FP7 EINS under grant agreement No 288021.
We also wish to thank Alexis Jacomy and Benjamin Ooghe
for their help in the development of this project.
[1] U. Brandes and J. Lerner. Visual analysis of
controversy in user-generated encyclopedias.
Information Visualization, 7(1):34–48, 2008.
[2] P. Ciuccarelli. Mind the Graph: From Visualization to
Collaborative Network Constructions. Leonardo, 47(3),
[3] M. D. Ekstrand and J. T. Riedl. rv you’re dumb:
identifying discarded work in wiki article history. In
Proceedings of the 5th international Symposium on
Wikis and Open Collaboration, WikiSym ’09, pages
4:1–4:10, 2009.
[4] E. J. Hackett, O. Amsterdamska, M. Lynch, and
J. Wajcman. The Handbook of Science and Technology
Studies. The MIT Press, 3 edition, 2008.
[5] J. Heer and B. Shneiderman. Interactive dynamics for
visual analysis. Queue, 10(2):30, 2012.
[6] A. Kaltenbrunner and D. Laniado. There is No
Deadline: Time Evolution of Wikipedia Discussions.
In Proceedings of the Eighth Annual International
Symposium on Wikis and Open Collaboration,
WikiSym ’12, pages 6:1–6:10, New York, NY, USA,
2012. ACM.
[7] A. Kittur, B. Suh, B. Pendleton, and E. Chi. He says,
she says: conflict and coordination in Wikipedia. In
Proceedings of the SIGCHI conference on Human
factors in computing systems, pages 453–462, 2007.
[8] D. Laniado, A. Kaltenbrunner, and T. Venturini.
Identifying more Wikipedia articles related to Climate
change, 2012.
[9] D. Laniado, R. Tasso, Y. Volkovich, and
A. Kaltenbrunner. When the Wikipedians talk:
network and tree structure of Wikipedia discussion
pages. Proceedings of ICWSM, 2011.
[10] C. Li, A. Datta, and A. Sun. Mining latent relations in
peer-production environments: a case study with
Wikipedia article similarity and controversy. Social
Network Analysis and Mining, 2(3):265–278, 2012.
[11] W. Lippmann. The phantom public. Transaction
Publishers, 1927.
[12] L. Masud, F. Valsecchi, P. Ciuccarelli, D. Ricci, and
G. Caviglia. From data to knowledge-visualizations as
transformation processes within the
data-information-knowledge continuum. In
Information Visualisation (IV), 2010 14th
International Conference, pages 445–449. IEEE, 2010.
[13] S. Niederer and J. Van Dijck. Wisdom of the crowd or
technicity of content? Wikipedia as a sociotechnical
system. New Media & Society, 12(8):1368–1387, 2010.
[14] H. S. Rad and D. Barbosa. Identifying controversial
articles in Wikipedia: A comparative study. In
Proceedings of the Eighth Annual International
Symposium on Wikis and Open Collaboration,
WikiSym ’12, pages 7:1–7:10. ACM, 2012.
[15] D. Ricci. Seeing what they are saying: Diagrams for
socio-technical controversies. Proceedings of DRS
[16] R. Rogers. Digital Methods. MIT Press, Cambridge,
MA, USA, 2013.
[17] H. Sepehri Rad, A. Makazhanov, D. Rafiei, and
D. Barbosa. Leveraging Editor Collaboration Patterns
in Wikipedia. In Proceedings of the 23rd ACM
Conference on Hypertext and Social Media, HT ’12,
pages 13–22, New York, NY, USA, 2012. ACM.
[18] B. Suh, E. Chi, B. Pendleton, and A. Kittur. Us vs.
them: Understanding social dynamics in Wikipedia
with revert graph visualizations. In Visual Analytics
Science and Technology, 2007. VAST 2007. IEEE
Symposium on, pages 163–170, 2007.
[19] R. Sumi, T. Yasseri, A. Rung, A. Kornai, and
J. Kertesz. Edit Wars in Wikipedia. In 2011 IEEE
Third International Conference on Social Computing
(SocialCom), pages 724–727, 2011.
[20] T. Venturini. Diving in magma: How to explore
controversies with actor-network theory. Public
Understanding of Science, 19(3):258–273, 2010.
[21] T. Venturini. Building on faults: how to represent
controversies with digital methods. Public
Understanding of Science, 21(7):796–812, 2012.
[22] T. Venturini and B. Latour. The social fabric: Digital
traces and quali-quantitative methods. Proceedings of
Future En Seine 2009, pages 30–15, 2010.
[23] F. Viégas, M. Wattenberg, and K. Dave. Studying
cooperation and conflict between authors with history
flow visualizations. In Proceedings of the SIGCHI
conference on Human factors in computing systems,
pages 575–582, 2004.
[24] B.-Q. Vuong, E.-P. Lim, A. Sun, M.-T. Le, H. W.
Lauw, and K. Chang. On Ranking Controversies in
Wikipedia: Models and Evaluation. In Proceedings of
the 2008 International Conference on Web Search and
Data Mining, WSDM ’08, pages 171–182, New York,
NY, USA, 2008. ACM.
[25] M. Wattenberg, F. Viégas, and K. Hollenbach.
Visualizing Activity on Wikipedia with Chromograms.
In C. Baranauskas, P. Palanque, J. Abascal, and
S. Barbosa, editors, Human-Computer Interaction –
INTERACT 2007, volume 4663 of Lecture Notes in
Computer Science, pages 272–287. Springer Berlin
Heidelberg, 2007.
[26] E. Weltevrede and E. Borra. Repurposing Wikipedia
as a Controversy Exploration Device. In Digital
Methods Mini-Conference, Amsterdam, 2013.
[27] T. Yasseri, R. Sumi, A. Rung, A. Kornai, and
J. Kertész. Dynamics of conflicts in Wikipedia. PloS
... These effects would further increase the affiliation between Google searches and the news coverage of ad hoc S&T topics. Different from the automatic ranking algorithm of Google, Wikipedia organizes information based on human contribution and a continuous negotiation among editors (Borra et al., 2014). It would therefore be very instrumental website to deepen one's knowledge on cyclic S&T topics such as those triggered by educational cues. ...
... Wikipedia, on the other hand, has very different function and information retrieval mechanism. It mostly relies on peoples' contributions, and the negotiation between human editors (Borra et al., 2014). Edits in Wikipedia are often done in batches since users save multiple minor edits to a single article (Weltevrede et al., 2014). ...
In response to the news coverage of scientific events and to science education, people increasingly go online to get more information. This study investigates how patterns of science and technology information-seeking on Google and Wikipedia change over time, in ways that differ between “ad hoc” terms that correspond to news coverage and “cyclic” terms that correspond to the academic period. Findings show that the science and technology activity in Google and Wikipedia was significantly associated with ad hoc and cyclic patterns. While the peak activity in Google and Wikipedia largely overlapped for ad hoc terms, it mismatched for cyclic terms. The findings indicate the importance of external cues such as news media and education, and also of the online engagement process, and particularly the crucial but different role played by Google and Wikipedia in gaining science and technology knowledge. Educators and policy makers could benefit from taking into account those different patterns.
... However, from an epistemic perspective, they lay the responsibility for assessing content quality and reliability to third parties via reliable sources [4]. On the one hand, reliable sources are essential to Wikipedia's status of a neutral encyclopedia, yet on the other hand the selection of sources invariably leads to controversies [18] and even edit wars [19]. To be sure, this might largely be a feature as some researchers believe that the existence of such controversies ultimately leads to better quality articles [20]. ...
Full-text available
Wikipedia is the largest online encyclopedia: its open contribution policy allows everyone to edit and share their knowledge. A challenge of radical openness is that it facilitates introducing biased contents or perspectives in Wikipedia. Wikipedia relies on numerous external sources such as journal articles, books, news media, and more. News media sources, in particular, take up nearly third of all citations from Wikipedia. However, despite their importance for providing up-to-date and factual contents, there is still a limited understanding on which news media sources are cited from Wikipedia. Relying on a large-scale open dataset of nearly 30M citations from English Wikipedia, we find a moderate yet systematic liberal polarization in the selection of news media sources. We also show that this effect is not mitigated by controlling for news media factual reliability. Our results contribute to Wikipedia's knowledge integrity agenda in suggesting that a systematic effort would help to better map potential biases in Wikipedia and find means to strengthen its neutral point of view policy.
... This entails a shift from the analytical functions built into platforms to ''critical analytics'' in order to draw attention to their mediating capacities (Rogers, 2018). For example, researchers use data from edit histories and talk pages on Wikipedia in order to map controversies ( Borra et al., 2014;Weltevrede and Borra, 2016). These features of Wikipedia were originally intended to coordinate the improvement of articles, foster consensus and revert spam. ...
Full-text available
A recent report from the UN makes the case for “global data literacy” in order to realise the opportunities afforded by the “data revolution”. Here and in many other contexts, data literacy is characterised in terms of a combination of numerical, statistical and technical capacities. In this article, we argue for an expansion of the concept to include not just competencies in reading and working with datasets but also the ability to account for, intervene around and participate in the wider socio-technical infrastructures through which data is created, stored and analysed – which we call “data infrastructure literacy”. We illustrate this notion with examples of “inventive data practice” from previous and ongoing research on open data, online platforms, data journalism and data activism. Drawing on these perspectives, we argue that data literacy initiatives might cultivate sensibilities not only for data science but also for data sociology, data politics as well as wider public engagement with digital data infrastructures. The proposed notion of data infrastructure literacy is intended to make space for collective inquiry, experimentation, imagination and intervention around data in educational programmes and beyond, including how data infrastructures can be challenged, contested, reshaped and repurposed to align with interests and publics other than those originally intended.
... Because of the way in which MediaWiki (the software that supports the famous collabo- rative encyclopaedia) stores information, 'controversiality' can be operationalised not only at the article level (to highlight which topics are disputed) but also at the level of smaller elements such as the links within the articles (to reveal, for instance, which refer- ences are most contested). In addition to this, multiple measures of controversiality may be defined, from the volume of edit histories, to the depth of discussions in associated talk pages (Borra et al., 2014;Weltevrede and Borra, 2016). Each of these operationalisa- tions leads to a different appraisal of what constitutes a matter of concern or an expres- sion of disagreement. ...
Full-text available
Digital Methods can be defined as the repurposing of the inscriptions generated by digital media for the study of collective phenomena. The strength of these methods comes from their capacity to take advantage of the data and computational capacities of online platforms; their weakness comes from the difficulty to separate the phenomena that they investigate from the features of the media in which they manifest (‘the medium is the message’, according to McLuhan’s 1964 dictum). In this article, we discuss various methodological difficulties deriving from the lack of separation between medium and message and propose eight practical precautions to deal with it.
... Because of the way in which MediaWiki (the software that supports the famous collaborative encyclopaedia) stores information, 'controversiality' can be operationalized at the article level (to highlight which topics are disputed), but also at the level of smaller elements such as the links within the articles (to reveal, for instance, which references are most contested). In addition to this, multiple measures of controversiality may be defined, from the volume of edit histories, to the depth of discussions in associated talk pages (Borra et al, 2014, Weltevrede andBorra, 2016). Each of these operationalizations lead to a different appraisal of what constitutes a matter of concern or an expression of disagreement. ...
Full-text available
Digital Methods can be defined as the repurposing of the inscriptions generated by digital media for the study of collective phenomena. The strength of these methods comes from their capacity to take advantage of the data and computational capacities of online platforms; their weakness comes from the difficulty to separate the phenomena that they investigate from the features of the media in which they manifest (‘the medium is the message’, according to McLuhan’s 1964 dictum). In this article, we discuss various methodological difficulties deriving from the lack of separation between medium and message and propose eight practical precautions to deal with it.
... It often happens, however, that digital traces provide us information directly about opposition. For instance, studying controversies in Wikipedia, we can easily access 'reverts' and other antagonistic edits, but to exploit them to detect 'edit-factions' we need to turn the network around, according to the principle of 'my enemy's enemy is my friend' ( Borra et al, 2014). ...
Full-text available
This paper discusses the differences and affinities among three types of networks (namely Actor-?Network, Network Analysis and Digital Networks) that are playing an increasingly important role in digital STS.In the last few decades, the idea of ‘network’ has slowly but steadily colonized broad strands of STS research. This colonization started with the advent of actor-?network theory, which provided a convenient set of notions to describe the construction of socio-?technical phenomena. Then came network analysis, and scholars who imported in the STS the techniques of investigation and visualization developed in the tradition of social network analysis and scientometrics. Finally, with the increasing ‘computerization’ of STS, scholars turned their attention to digital networks a way of tracing collective life.Many researchers have more or less explicitly tried to link these three movements in one coherent set of digital methods for STS, betting on the idea that actor-?network theory can be operationalized through network analysis thanks to the data provided by digital networks. Yet, to be honest, little proves the continuity among these three objects besides the homonymy of the word ‘network’. Are we sure that we are talking about the same networks?"Odi
... These studies have shown that edithistory information such as the number of reverted revisions, length of discussions, editors' vote for one another in elections can be used both to automatically find conflict within articles, and for ascertaining levels of trustworthiness. Other studies have modeled, mapped and visualized controversy over time in Wikipedia[3,4,23,7,18,5,7,10]or have worked on visualizing, mapping and modeling collaboration patterns and content change[11,27,26]. The visualization approaches that used color schemas, dashboards and representing text as lines were effective in unmasking the types of social behaviors such as negotiation and consensus that occur through the knowledge building process in Wikipedia. ...
Conference Paper
Wikipedia has challenged the way traditional encyclopedia knowledge is built and contested by creating an open sociotechnical environment that allows non-domain experts to contribute to scientific and medical knowledge. The open nature of Wikipedia has been successful in attracting readers to its medical content, but there are doubts about the quality and trustworthiness of its articles. The goal of this research is to increase transparency and trust in Wikipedia medical articles by understanding the process of medical knowledge building over time. Health-related articles in Wikipedia pass through increasing trends in editing activities. In addition, health-related articles include medical controversies that are discussed between editors. By examining the community's levels of engagement and reactions over time through the lens of Actor Network Theory and applying quantitative and qualitative analyses of actors and their relations, the contribution of this work will extend theory to offer both theoretically- and empirically-informed design principles for building and evaluating crowd-sourced knowledge environments that engender trust and maintain transparency.
Full-text available
In this article, we present a few lessons we learnt in the establishment of the Sciences Po médialab. As an interdisciplinary laboratory associating social scientists, code developers and information designers, the médialab is not one of a kind. In the last years, several of such initiatives have been established around the world to harness the potential of digital technologies for the study of collective life. If we narrate this particular story, it is because, having lived it from the inside, we can provide an intimate account of the surprises and displacements of digital research. Founding the médialab in 2009, we knew that we were leaving the reassuring traditions of social sciences to venture in the unexplored territory of digital inscriptions. What we couldn't foresee was how much such encounter would change our research. Buying into gospel of Big Data, we imagined that the main novelty of digital research came from handling larger amounts of data. We soon realized that the interest of digital inscriptions comes instead from their proliferating diversity. Such diversity encouraged us to reshape our professional alliances, research practices and theoretical perspectives. It also led us to overcome several of the oppositions that used to characterize social sciences (qualitative/quantitative, situation/aggregation, micro/macro, local/global) and to move in the direction of a more continuous sociology.
Wikipedia has challenged the way traditional encyclopedia knowledge is built and contested by creating an open socio-technical environment that allows non-domain experts to contribute to scientific and medical knowledge. The open nature of Wikipedia has been successful, but there are concerns about the quality and trustworthiness of its articles. The goal of my research is to build a theoretical framework to explain the dynamics of knowledge building in crowd-sourcing based environments like Wikipedia and to judge the trustworthiness of medical articles based on dynamic network data. By applying Actor Network Theory and Social Network Analysis, my research makes both a theoretical and a practical contribution: it builds a theory of the dynamics of knowledge building in Wikipedia over time, and it offers insights for developing citizen science crowd-sourcing platforms by better understanding how editors interact to build health science content.
Wikipedia articles are the result of the collaborative editing of a diverse group of anonymous volunteer editors, who are passionate and knowledgeable about specific topics. One can argue that this plurality of perspectives leads to broader coverage of the topic, thus benefitting the reader. On the other hand, differences among editors on polarizing topics can lead to controversial or questionable content, where facts and arguments are presented and discussed to support a particular point of view. Controversial articles are manually tagged by Wikipedia editors, and span many interesting and popular topics, such as religion, history, and politics, to name a few. Several recent works have proposed methods for automatically identifying controversy within unmarked articles. However, to date, no systematic comparison of these efforts has been made. This is in part because the various methods are evaluated using different criteria and on different sets of articles by different authors, making it hard for anyone to verify their efficacy and compare the alternatives. We provide a first attempt at bridging this gap. We compare five different methods for modelling and identifying controversy, and discuss some of the unique difficulties and opportunities inherent to the way Wikipedia is produced.
Predicting the positive or negative attitude of individuals towards each other in a social environment has long been of interest, with applications in many domains. We investigate this problem in the context of the collaborative editing of articles in Wikipedia, showing that the edit history of articles contains enough information to predict the attitude of co-editors. We train a model using a distant supervision approach, by labeling interactions between editors as positive or negative depending on how these editors vote for each other in Wikipedia admin elections. We use the model to predict the attitude among other editors, who have neither run nor voted in an election. We validate our model by assessing its accuracy in the tasks of predicting the results of the actual elections, and identifying controversial articles. Our analysis reveals that the interactions in co-editing articles can accurately predict votes, although there are differences between positive and negative votes. For instance, the accuracy when predicting negative votes substantially increases by considering longer traces of the edit history. As for predicting controversial articles, we show that exploiting positive and negative interactions during the production of an article provides substantial improvements on previous attempts at detecting controversial articles in Wikipedia.
Wikipedia is often considered an example of ‘collaborative knowledge’. Researchers have contested the value of Wikipedia content on various grounds. Some have disputed the ability of anonymous amateurs to produce quality information, while others have contested Wikipedia’s claim to accuracy and neutrality. Even if these concerns about Wikipedia as an encyclopaedic genre are relevant, they misguidedly focus on human agents only. Wikipedia’s advance is not only enabled by its human resources, but is equally defined by the technological tools and managerial dynamics that structure and maintain its content. This article analyses the sociotechnical system (the intricate collaboration between human users and automated content agents) that defines Wikipedia as a knowledge instrument.
In a previous article in this journal, I introduced Bruno Latour's cartography of controversies and I discussed half of it, namely how to observe techno-scientific controversies. In this article I will concentrate on the remaining half: how to represent the complexity of social debates in a legible form. In my previous paper, we learnt how to explore the richness of collective existence through Actor-Network Theory. In this one, I will discuss how to render such complexity through an original visualization device: the controversy-website. Capitalizing on the potential of digital technologies, the controversy-website has been developed as a multilayered toolkit to trace and aggregate information on public debates.
The cartography of controversies is a set of techniques to explore and visualize issues. It was developed by Bruno Latour as a didactic version of Actor-Network Theory to train college students in the investigation of contemporary socio-technical debate. The scope and interest of such cartography, however, exceed its didactic origin. Adopted and developed in several universities in Europe and the US, the cartography of controversies is today a full research method, though, unfortunately, not a well-documented one. To fill this gap in documentation, we draw on our experience as Latour’s teaching assistant to introduce some of the main techniques of the social cartographer's toolkit. In particular, in these pages we will focus on exploration, leaving the discussion of visualization tools to a further paper.
A taxonomy of tools that support the fluent and flexible use of visualizations.
Social and Human Sciences have recently discovered the potential of a hybrid research process, where the specificity of design knowledge and the peculiarity of design thinking can be exploited. Two ongoing experiences demonstrate how - after a first stage where Communication Design has been placed at the end of a linear sequence from data to prototypes - a more integrated and collaborative research process can be established, building on the proclivity of humanities scholars to mingle thinking and making.
The increasing scale and availability of digital data provides an extraordinary resource for informing public policy, scientific discovery, business strategy, and personal lives. To enable analysts to explore large datasets involving varied data types, flexible visual analysis tools must provide appropriate controls for specifying the data and views of interest. Classic scientific visualization systems use data-flow graphs, in which the visualization process is deconstructed into a set of finer-grained operators for data import, transformation, layout, or coloring. When analyzing data with visualizations, users regularly traverse the space of views in an iterative fashion. Interactive visualizations often serve not only as data-exploration tools, but also as a means for recording, organizing, and communicating insights gained during exploration. Data-aware annotations allow a pointing intention to be reapplied to different views of the same data, enabling reuse of references across different choices of visual encodings.