Electronic copy available at: http://ssrn.com/abstract=2592528
Collaborative Visualizations for Wikipedia Critique and Activism
Stefano De Sabbata, Kathryn Eccles,
Scott Hale, Ralph Straumann
Oxford Internet Institute,
University of Oxford
1 St Giles', Oxford, England
{stefano.desabbata, kathryn.eccles, scott.hale, ralph.straumann}@oii.ox.ac.uk
Arzu Çöltekin
Department of Geography
University of Zurich
Winterthurerstr. 190, Zurich, Switzerland
arzu.coltekin@geo.uzh.ch
Abstract
Wikipedia is one of the largest platforms based on the con-
cept of asynchronous, distributed, collaborative work. A
systematic collaborative exploration and assessment of Wik-
ipedia content and coverage is however still largely missing.
On the one hand editors routinely perform quality and cov-
erage control of individual articles, while on the other hand
academic research on Wikipedia is mostly focused on global
issues, and only sporadically on local assessment. In this
paper, we argue that collaborative visualizations have the
potential to fill this gap, enabling editors to collaboratively
explore and analyse patterns in Wikipedia content at different
scales. We illustrate how a collaborative visualization
service can be an effective tool for editors to create, edit,
and discuss public visualizations of Wikipedia data. Com-
bined with the large Wikipedia user-base, and its diverse lo-
cal knowledge, this could result in a large-scale collection of
evidence for critique and activism, and the potential to en-
hance the quantity and quality of Wikipedia content.
Introduction
Wikipedia articles are a prime example of asynchronous,
distributed collaboration on an internet scale. Editors from
all connected parts of the world can gather to collaborate
on a single shared document, without being in the same
physical place or working on that document at the same
time. For example, at the time of this writing, the term
“city of dreaming spires” used by poet Matthew Arnold to
describe Oxford is still part of Oxford’s Wikipedia article¹,
as it was in the first version of the article, created in May
2001 by the editor Mjausson². This has given every Wikipedia
user the opportunity to reflect on that first description
of Oxford, to discuss it on the Talk page, to edit it, and to
move snippets of text between the different sections of the article.
This is the essence of Wikipedia as a tool for asynchronous,
distributed, collaborative sensemaking.
¹ en.wikipedia.org/w/index.php?oldid=648076589
² en.wikipedia.org/w/index.php?oldid=271629
While this asynchronous, distributed collaboration is at
the very heart of Wikipedia, the analysis of its content and
coverage is still largely composed of many separate, indi-
vidual efforts. Several research projects have focused on
the analysis of Wikipedia content, from cross-language
comparison (e.g., Hale, 2015; Hecht and Gergle, 2010b;
Pfeil et al., 2006), to geographic analysis (e.g., Hecht and
Gergle, 2009; Graham et al., 2014), to the analysis of
controversial topics (e.g., Yasseri et al., 2014). Since systematic
collaborative exploration and assessment of Wikipedia
content and coverage is still largely missing, the platform
mostly relies on ad-hoc assessments by users for decisions
about new content creation; i.e., users compare and analyse
article contents individually and then may decide to con-
tribute additional content or amend existing content.
What if the editors (and readers) were able to visualize
the content (e.g., word frequencies), structure (e.g., which
articles are linked to which other articles), or statistics
(e.g., how many people write about a particular topic,
where are these people, how many visitors were on this
page and when)? What if these visualizations were also
collaborative, so that other editors could also edit them?
We contend that a tool allowing the broad Wikipedia
community to collaboratively explore and analyse Wikipedia
at different scales and collect evidence for critique and
activism has large potential to enhance the quantity and
quality of Wikipedia content. In other words, in this
paper we argue that collaborative visualizations (Pea,
1993; Isenberg et al., 2011) can afford this function by
giving groups of people the opportunity to reflect on, discuss,
and edit a common visual representation of Wikipedia
content (see e.g., Figures 1 and 3, discussed below), in the
same way that editors can discuss and collaboratively edit
the content of individual Wikipedia articles today. Therefore,
collaborative visualizations would support the process
of asynchronous, distributed, collaborative sensemaking of
entire parts of Wikipedia, in addition to the sensemaking
that already occurs on the level of single articles.
Pre-print of the following paper: De Sabbata, S.; Çöltekin, A.; Eccles, K.; Hale, S.; Straumann, R. 2015.
Collaborative Visualizations for Wikipedia Critique and Activism. In Proceedings of ICWSM. AAAI. (forthcoming)
Visual analytics
The term visual analytics was coined a decade ago by
Thomas and Cook (2005) to refer to the “science of human
analytical reasoning facilitated by interactive visualiza-
tions” (ibid: p.28). Visual analytics can be considered a
direct descendant of the concept of exploratory data analy-
sis proposed by Tukey (1977). The fundamental idea is to
combine the computer capabilities in automatic analysis
and the human capabilities in visual pattern recognition.
The aim, therefore, is to address a particular class of problems
that are both too complex or ill-defined to be fully
automatised (i.e., too hard for a computer) and involve
datasets too large and diverse to be presented in a static
visualization for humans to analyse (Keim et al., 2008;
2010). It comes as no surprise that several visual analytics
software programs are being developed in the recent wave
of ‘big data’ (Zhang et al., 2012), since they offer data
exploration functionality and dashboards for making sense of
large datasets. Within the domain of visual analytics, the
field of geographic information science is devoting particular
attention to the development of geo-visual analytics
methods that can account for the spatial and temporal
components of data, and the inherent challenges that those
dimensions pose in terms of both analysis and visualization
methods (Andrienko et al., 2010).
This paper contends that the analysis of Wikipedia content
falls into the category of problems that visual analytics
has been developed to tackle. Assessing the adequacy, correctness,
completeness, and currency of Wikipedia articles and
categories is a complex and ill-defined problem that could
hardly be fully automatised. Moreover, information visualization
methods have long been used by researchers to analyse
and investigate Wikipedia contents, edits, editors and
their geographies, as well as the differences between
different editions. Methods employed range from pie charts
(Bao et al., 2012) to maps (Yasseri et al., 2014) and from
density plots to network diagrams (Hale, 2014).
Nonetheless, while ad-hoc processes and tools have so
far been successfully used by researchers, such methods
might not be suitable for Wikipedia contributors, who may
lack the tools, time, or skills to perform the technical pro-
cesses needed to create such visualizations. These factors
serve as barriers limiting the number of people who have
access to such analyses. In turn, not only the scope but
especially the scale of such analyses is diminished. “Local”
scale analyses might be of great interest and relevance to
particular communities, groups, or individuals but might
not be chosen as a research direction by professional scientists
who have a global audience in mind or who simply lack the
local knowledge to do these subjects justice.
This paper further contends that collaborative visualiza-
tions, including collaborative geo-visualizations, can be a
useful means to enable the analysis of Wikipedia content at
scale. That is, a collaborative visualization service would
provide the Wikipedia community with a tool to perform
analyses of Wikipedia content, in a manner which would
be consistent with the principles and practices of Wikipe-
dia. Users would be able to collaborate in investigating the
structures and content of the platform, propose hypotheses,
collect evidence, formulate critiques, and promote actions,
such as new content creation and revision.
Collaborative visualization
One definition of collaborative visualization is “the shared
use of computer-supported, (interactive,) visual representa-
tions of data by more than one person with the common
goal of contribution to joint information processing activi-
ties” (Isenberg et al., 2011, p.312) which covers its most
important aspects. The key distinction between collabora-
tive visualization and other visualization environments is
the possibility of different users asynchronously accessing,
commenting, and editing visualizations created by other
users. When specifically applied to visual analytics services,
this approach is also referred to as collaborative visual
analytics (Heer and Agrawala, 2008).
Collaborative visualization services (e.g., Heer et al.,
2007; Viegas et al., 2007) are founded on the same princi-
ples as user-generated content websites like Wikipedia.
Thus, both offer very similar functionalities. A user of a
collaborative visualization service is able to create a new
visualization, which is visible and editable by any other
user of the same service. Users can edit visualizations,
leave comments, and graphically annotate them, while the
system records a changelog of each stage in the evolution
of the visualization thereby ensuring complete lineage in-
formation. Heer et al. (2007) discuss how each of these
functionalities has been used in a pilot study of the
sense.us website. They clearly illustrate how the comment
section is key to the ongoing process of sensemaking, as
different users observe and point out patterns, ask questions,
and suggest interpretations of the visualized data;
an analogous role is performed by Talk pages in Wikipedia.
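The functionality described above (edits by any user, comments, and a changelog that preserves lineage) can be pictured as a small data structure. The following is only a minimal sketch under our own assumptions: the class names, the `spec` dictionary of chart settings, and the editor names are hypothetical and do not describe sense.us or any existing service.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Revision:
    """One stage in the evolution of a visualization."""
    editor: str
    summary: str
    spec: Dict[str, str]  # hypothetical chart settings, e.g. {"type": "dot map"}


@dataclass
class SharedVisualization:
    """A visualization that any user can edit and comment on."""
    title: str
    revisions: List[Revision] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)

    def edit(self, editor: str, summary: str, **changes: str) -> None:
        # Append a new revision instead of overwriting the previous one,
        # so that the full lineage stays available.
        spec = dict(self.revisions[-1].spec) if self.revisions else {}
        spec.update(changes)
        self.revisions.append(Revision(editor, summary, spec))

    def current(self) -> Dict[str, str]:
        # Settings of the latest revision (empty if nothing was created yet).
        return dict(self.revisions[-1].spec) if self.revisions else {}

    def lineage(self) -> List[str]:
        # Human-readable changelog, oldest revision first.
        return [f"{i}: {r.editor}: {r.summary}" for i, r in enumerate(self.revisions)]
```

Keeping every revision rather than only the latest state mirrors the page history of a Wikipedia article, while the `comments` list plays the role of the Talk page.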
Similar concepts have been developed within the field of
geographic information science (Brewer et al., 2000; Brod-
lie et al., 2005). These take the forms of participatory geo-
graphic information systems (GIS) or public participation
GIS (Abbot et al., 1998; Dunn, 2007) and volunteered geo-
graphic information (VGI; Goodchild, 2007). These devel-
opments are also partially rooted in critical cartography
(Crampton and Krygier, 2006), and critical geographic
information systems (Harvey et al., 2005), and thus in the
on-going discussion within geography concerning the con-
cepts of space and place (Sui and Goodchild, 2011).
The following section presents three scenarios that illus-
trate how collaborative visualizations, visual analytics, and
geo-visual analytics methods could be applied to Wikipedia
as an object of analysis.
Collaborative visualizations for Wikipedia
A long-standing challenge for Wikipedia has been that
most of its content (over 74 percent of all concepts) is writ-
ten in only one language (Hecht and Gergle 2010b). Fur-
thermore, even when users edit multiple language editions
of Wikipedia, they are much more likely to edit articles in
a second language that have a corresponding article in their
first languages (Hale, 2015). So-called interlanguage links
are a valuable resource to analyse what articles exist in
certain language editions but not others. Interlanguage
links connect articles about the same concept in different
languages. For example, the article on Oxford in English is
linked to the article on オックスフォード in Japanese.
Interlanguage links were previously maintained sepa-
rately in each language edition of Wikipedia through a mix
of human and machine processes. They did not necessarily
align perfectly between different language editions. In
2013, these separate interlanguage links were replaced with
a global, conflict-free, centrally stored and edited repository
in WikiData³. WikiData provides a knowledge base that
is closely coupled with Wikipedia, making it a good possible
source of information for collaborative visualization
applications in general.
A collaborative visualization of the interlanguage link
data stored in WikiData could allow Wikipedia editors to
understand what concepts are covered in other languages
beyond the languages they edit in most frequently. This
could help both multilingual readers to discover additional
content and multilingual editors to write about some of this
content in their primary languages, thereby expanding the
coverage of each language edition of Wikipedia.
³ Launched in 2012, www.wikidata.org
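As an illustration of how interlanguage link data could feed such a visualization, the sketch below lists the concepts that have an article in one language edition but not in another. It is a minimal example under stated assumptions: the item identifiers and the two "Hypothetical topic" entries are invented sample data, and the dictionary layout (item to per-wiki title) is only our simplification of how such links might be held in memory.

```python
from typing import Dict, List

# Invented sample data: item identifier -> {wiki code: article title}.
# Only the Oxford titles come from the text; the rest is hypothetical.
SITELINKS: Dict[str, Dict[str, str]] = {
    "item_1": {"enwiki": "Oxford", "itwiki": "Oxford", "jawiki": "オックスフォード"},
    "item_2": {"enwiki": "Hypothetical topic A"},
    "item_3": {"itwiki": "Hypothetical topic B", "jawiki": "Hypothetical topic B (ja)"},
}


def coverage_gap(
    sitelinks: Dict[str, Dict[str, str]], present_in: str, missing_from: str
) -> List[str]:
    """Titles of concepts covered in `present_in` but absent from `missing_from`."""
    return sorted(
        titles[present_in]
        for titles in sitelinks.values()
        if present_in in titles and missing_from not in titles
    )
```

For the sample data, `coverage_gap(SITELINKS, "enwiki", "itwiki")` returns the English titles that have no Italian counterpart, which is the kind of list a multilingual editor could use to decide what to write about next.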
Figure 1 illustrates how a network diagram could be
used to explore how different entities related to Oxford are
represented in English and Italian Wikipedia.
Besides exploiting data on relative coverage in different
languages through interlanguage links to enhance Wikipe-
dia, WikiData could also be used to monitor specific as-
pects of coverage such as the gender of biography article
subjects or the representation of different locations.
The Wikimedia Lab DB offers another crucial source of
data for a collaborative visualizations service, as it stores
the complete structure of Wikipedia and other wikis in an
SQL format (i.e., a standard relational database format).
These databases provide a variety of information about
single pages as well as their metadata. For instance, from
data accessible through Wikimedia Lab DB (or related
services, such as Quarry⁴ or the MediaWiki web API⁵), a
hierarchical matrix plot (see Figure 2) could be created for
comparing the coverage of a category in two different language
editions. Each cell in such a plot would show the
difference in, e.g., the number of pages or the page lengths
contained in a category and its subcategories (the latter two
structured using marginal dendrograms in Figure 2).
Figure 2. Illustrative example of hierarchical matrix plot (generated
using random data).
⁴ Launched in 2014, quarry.wmflabs.org
⁵ www.mediawiki.org/wiki/API:Main_page
Figure 1. Illustrative example of usage of a network diagram to illustrate interlanguage links on Wikipedia.
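The values behind a category comparison such as the one in Figure 2 could be derived by aggregating page counts over a category and its subcategories in each edition and then taking the difference. The sketch below is a simplified illustration, not the method used for the figure: the category names and page counts are invented, and it assumes the category structure is a tree, whereas real Wikipedia category graphs can contain shared subcategories and cycles that would need extra handling.

```python
from typing import Dict, List

# Invented sample data: a small category tree and direct page counts per edition.
SUBCATEGORIES: Dict[str, List[str]] = {
    "Oxford": ["Colleges", "Museums"],
    "Colleges": [],
    "Museums": [],
}
PAGES_EN: Dict[str, int] = {"Oxford": 10, "Colleges": 40, "Museums": 12}
PAGES_IT: Dict[str, int] = {"Oxford": 8, "Colleges": 15, "Museums": 12}


def subtree_total(category: str, tree: Dict[str, List[str]], direct: Dict[str, int]) -> int:
    """Pages directly in `category` plus pages in all of its subcategories."""
    total = direct.get(category, 0)
    for child in tree.get(category, []):
        total += subtree_total(child, tree, direct)
    return total


def coverage_difference(
    tree: Dict[str, List[str]], counts_a: Dict[str, int], counts_b: Dict[str, int]
) -> Dict[str, int]:
    """Per-category difference (edition A minus edition B) of subtree page totals."""
    return {
        category: subtree_total(category, tree, counts_a)
        - subtree_total(category, tree, counts_b)
        for category in tree
    }
```

A positive value marks a category that is more extensively covered in the first edition, a negative value one that is more extensively covered in the second.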
As another branch of exploration, collaborative visuali-
zation of the data in WikiData and Wikimedia Lab DB
could also be used to analyse the geographic biases present
in Wikipedia (e.g., Graham et al., 2014). Many Wikipedia
articles about places and events have geolocation infor-
mation attached to them (a.k.a., geo-tags). As such, it is
possible to map the coverage of Wikipedia as a whole as
well as the coverage of any particular language edition.
Figure 3 illustrates how a map could be used to explore the
presence and absence of geo-tags in Oxford, comparing
English and Italian Wikipedia.
Additionally, one can map the locations of contributors
using either IP address geocoding or user profile geocoding⁶.
Ongoing work has also started to geolocate the
third-party sources (e.g., newspaper articles, websites, etc.)
cited in each language edition (Sen et al., 2015) enabling a
third layer of geographic coverage to be visualized and
collaboratively analysed. Plotting any of these three layers
of geographic information in the form of a dot map or a
density map could reveal interesting patterns, and possibly
coverage gaps. Such visualizations are especially meaning-
ful and useful to Wikipedia users holding deep local
knowledge of a certain geographical region (e.g., a valley,
or a village) and may motivate their future contribution
efforts.
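A density map of this kind rests on a simple step: binning coordinates into grid cells and counting them. The following sketch shows that step and a comparison of two layers. The coordinates are invented points in the vicinity of Oxford, the cell size is an arbitrary choice, and a square grid in degrees is only one of several possible binning schemes.

```python
import math
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

Cell = Tuple[int, int]


def grid_counts(points: Iterable[Tuple[float, float]], cell_deg: float) -> Counter:
    """Count (latitude, longitude) points per square grid cell of `cell_deg` degrees."""
    return Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in points
    )


def coverage_gaps(counts_a: Mapping[Cell, int], counts_b: Mapping[Cell, int]) -> List[Cell]:
    """Cells that contain points in layer A but none in layer B."""
    return sorted(cell for cell in counts_a if counts_b.get(cell, 0) == 0)


# Invented geo-tag coordinates for two language editions.
EN_POINTS = [(51.752, -1.258), (51.754, -1.254), (51.768, -1.243)]
IT_POINTS = [(51.753, -1.257)]
```

Calling `coverage_gaps(grid_counts(EN_POINTS, 0.01), grid_counts(IT_POINTS, 0.01))` yields the cells that are geo-tagged in the first layer only. The same functions apply unchanged to the other two layers discussed above (contributor locations and cited sources).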
More generally, we believe that opening up shortcom-
ings of Wikipedia content and structure to reflection and
discussion by rendering them explicit through collaborative
visualizations has great potential for alleviating the known
biases such as geography, gender or status present in all
user-generated content platforms. Similarly, displaying the
strengths of Wikipedia may allow inferring which content
may be better quality controlled than others, or potentially
lead to channelling the content creators’ energy and efforts
to less attended topics.
Furthermore, we envision that collaborative visualization
tools could expand to encompass contributor statistics
and user retention metrics in the future. Such data is not
currently available in WikiData, but efforts are underway
to make this data more easily accessible. Analysis and
visualizations of such data would be potentially very valuable
for promoting diversity among contributors and thus another
vector for improving the quality of Wikipedia as a
community and platform.
⁶ E.g., cii.oii.ox.ac.uk/visualising-the-locality-of-participation-and-voice-on-wikipedia
Challenges and research agenda
In this paper, we have illustrated how a collaborative visu-
alization service would enable users to analyse Wikipedia
content using visual analytics methods to investigate di-
verse aspects of the platform in a collaborative and asyn-
chronous manner. Such activities would then ideally result
in new content creation or in amendments of existing con-
tent. Building a collaborative visualization service on top
of a user-generated content platform (to expand and im-
prove the platform’s coverage through collaborative intro-
spection and discussion) is not restricted to Wikipedia, but
could also benefit other crowdsourcing and open data initi-
atives. However, this new perspective also poses some
questions and opens up new challenges in a number of
research areas related to technology, design and social sci-
ences.
First, a number of technical challenges need to be ad-
dressed in order to implement a service allowing collabora-
tive visualization and analyses as discussed above. In the
case of Wikipedia, the WikiData project and the Wiki-
media Lab DB currently seem the most promising founda-
tions for such a service, providing the necessary underlying
input data. Currently, vector-based interactive visualization
tools represent the state-of-the-art for visual analytics, possibly
using WebGL for complex visualizations (see e.g.
Garaizar et al., 2012). Custom-made tools could be built
for collaborative sensemaking or adapted from existing
projects such as RAW⁷ (see Uboldi and Caviglia, 2015).
Second, information visualization design challenges
need to be carefully considered to decide which type of
graphs and maps should be made available for which kind
of data. In order to reach a broad user base among Wikipedia
editors, the overarching emphasis in service and visualization
development needs to be put on ease of use for
constructing, editing, annotating and discussing visualizations,
while the visualization designs should be guided by
cognitive and perceptual principles. The interface design
should focus on learnability and on consistency with Wikipedia
and its modalities of interaction, and should support users in their
visualization process by offering informed choices and
annotations leading them to good design choices.
⁷ github.com/densitydesign/raw
Figure 3. Illustrative example of usage of a density map to illustrate presence and absence of Wikipedia geotags.
Furthermore, assuming that a collaborative visualization
service for Wikipedia has been developed, deployed, and is
being actively used, new opportunities for development
and testing of new ideas and methods in the field of com-
puter-supported collaborative work will arise. A critical
perspective from the digital humanities community could
lead to significant improvements of the service, resulting
from rich historical understandings of the construction of
knowledge, and experience of using such mixed methods
(visualizations alongside discussion) for collaborative
sensemaking. Such a service would also be a valuable tool
for digital humanities research, allowing for multilayered
analyses of articles on, for example, historical events, liter-
ary texts, and historiography. The open-source approach at
the core of Wikipedia will provide researchers in the social
sciences with a great source of data on collective behaviour
on the internet, and the use of data and visualization for
decision-making, critique, and activism.
Finally, a distributed, large-scale analysis of Wikipedia,
which has developed into one of the pivotal sources of information
on the internet, will shed light on the role of digital
mediation in content production, reproduction, and its
representativeness.
References
Abbot, J.; Chambers, R.; Dunn, C.; Harris, T.; Merode, E. D.;
Porter, G.; Townsend, J.; and Weiner, D. 1998. Participatory GIS:
opportunity or oxymoron. PLA Notes, 33: 27-33.
Andrienko, G.; Andrienko, N.; Demsar, U.; Dransch, D.; Dykes,
J.; Fabrikant, S. I.; Jern, M.; Kraak, M.J.; Schumann, H.; and
Tominski, C. 2010. Space, time and visual analytics. Internation-
al Journal of Geographical Information Science, 24(10): 1577-
1600.
Bao, P.; Hecht, B.; Carton, S.; Quaderi, M.; Horn, M.; and Ger-
gle, D. 2012. Omnipedia: Bridging the Wikipedia language gap.
In Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems (pp. 1075-1084). ACM.
Brewer, I.; MacEachren, A. M.; Abdo, H.; Gundrum, J.; and Otto,
G. 2000. Collaborative geographic visualization: Enabling shared
understanding of environmental processes. In Information Visual-
ization, 2000. InfoVis 2000. IEEE Symposium on (pp. 137-141).
IEEE.
Brodlie, K.; Fairbairn, D.; Kemp, Z.; and Schroeder, M. 2005.
Connecting people, data and resources - distributed geovisualization.
Exploring Geovisualisation: 425-443.
Crampton, J. W., and Krygier, J. 2006. An introduction to critical
cartography. ACME: An International e-Journal for Critical Ge-
ographies, 4(1): 11-33.
Dunn, C. E. 2007. Participatory GIS - a people's GIS? Progress
in Human Geography, 31(5): 616-637.
Garaizar, P.; Vadillo, M. A.; and Lopez-de-Ipina, D. 2012. Benefits
and Pitfalls of Using HTML5 APIs for Online Experiments
and Simulations. International Journal of Online Engineering, 8:
20-25.
Goodchild, M. F. 2007. Citizens as sensors: The world of volun-
teered geography. GeoJournal, 69(4): 211-221.
Graham, M.; Hogan, B.; Straumann, R. K.; and Medhat, A. 2014.
Uneven geographies of user-generated information: patterns of
increasing informational poverty. Annals of the Association of
American Geographers, 104(4): 746-764.
Hale, S. A. 2015. Cross-language Wikipedia editing of Okinawa,
Japan. In Proceedings of the SIGCHI Conference on Human Fac-
tors in Computing Systems, CHI ’15. ACM.
Hale, S. A. 2014. Multilinguals and Wikipedia editing. In Pro-
ceedings of the 2014 ACM Conference on Web Science, WebSci
’14, (pp. 99-108). ACM.
Harvey, F.; Kwan, M. P.; and Pavlovskaya, M. 2005. Introduction:
critical GIS. Cartographica: The International Journal for
Geographic Information and Geovisualization, 40(4): 1-4.
Hecht, B., and Gergle, D. 2009. Measuring self-focus bias in
community-maintained knowledge repositories. In Proceedings of
the Fourth International Conference on Communities and Technologies,
C&T '09, (pp. 11-20). ACM.
Hecht, B., and Gergle, D. 2010a. On the “localness” of user-
generated content. In Proceedings of the 2010 ACM Conference
on Computer Supported Cooperative Work, CSCW '10, (pp. 229-232).
ACM.
Hecht, B., and Gergle, D. 2010b. The Tower of Babel meets Web
2.0: User-generated content and its applications in a multilingual
context. In Proceedings of the 28th International Conference on
Human Factors in Computing Systems, CHI '10, (pp. 291-300).
ACM.
Heer, J., and Agrawala, M. 2008. Design considerations for col-
laborative visual analytics. Information visualization, 7(1): 49-62.
Heer, J.; Viégas, F. B.; and Wattenberg, M. 2007. Voyagers and
voyeurs: supporting asynchronous collaborative information vis-
ualization. In Proceedings of the SIGCHI conference on Human
factors in computing systems (pp. 1029-1038). ACM.
Isenberg, P.; Elmqvist, N.; Scholtz, J.; Cernea, D.; Ma, K. L.; and
Hagen, H. 2011. Collaborative visualization: definition, challeng-
es, and research agenda. Information Visualization, 10(4): 310-
326.
Keim, D.; Andrienko, G.; Fekete, J. D.; Görg, C.; Kohlhammer,
J.; and Melançon, G. 2008. Visual Analytics: Definition, Process,
and Challenges. In Information Visualization (pp. 154-175).
Springer-Verlag.
Keim, D. A.; Mansmann, F.; and Thomas, J. 2010. Visual analytics:
how much visualization and how much analytics? ACM
SIGKDD Explorations Newsletter, 11(2): 5-8.
Pea, R. D. 1993. The collaborative visualization project. Commu-
nications of the ACM, 36(5): 60-63.
Pfeil, U.; Zaphiris, P.; and Ang, C. S. 2006. Cultural differences
in collaborative authoring of Wikipedia. Journal of Computer-
Mediated Communication, 12(1): 88-113.
Sen, S.; Ford, H.; Musicant, D.; Graham, M.; Keyes, O. S.; and
Hecht, B. 2015. Barriers to the localness of volunteered geo-
graphic information. In Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems, CHI 2015. ACM.
Sui, D., and Goodchild, M. 2011. The convergence of GIS and
social media: challenges for GIScience. International Journal of
Geographical Information Science, 25(11): 1737-1748.
Thomas, J. J., and Cook, K. A. 2005. Illuminating the path: the
research and development agenda for visual analytics. IEEE
Computer Society.
Tukey, J. W. 1977. Exploratory data analysis. Reading, Ma, 231:
32.
Uboldi, G., and Caviglia, G. 2015. Information Visualizations and
Interfaces in the Humanities. In New Challenges for Data Design
(pp. 207-218). Springer London.
Viegas, F. B.; Wattenberg, M.; Van Ham, F.; Kriss, J.; and
McKeon, M. 2007. Manyeyes: a site for visualization at internet
scale. Visualization and Computer Graphics, IEEE Transactions
on, 13(6): 1121-1128.
Yasseri, T.; Spoerri, A.; Graham, M.; and Kertész, J. 2014. The
most controversial topics in Wikipedia: A multilingual and geo-
graphical analysis. In Global Wikipedia: International and cross-
cultural issues in online collaboration, Scarecrow Press.
Zhang, L.; Stoffel, A.; Behrisch, M.; Mittelstadt, S.; Schreck, T.;
Pompl, R.; Weber, S.; Last, H.; and Keim, D. 2012. Visual analytics
for the big data era - A comparative review of state-of-the-art commercial
systems. In Visual Analytics Science and Technology (VAST),
2012 IEEE Conference on (pp. 173-182). IEEE.
... Behind the user interface, diverse algorithms implement the tasks of node filtering, coordinate computation and edge selection due to a limited size of screen. Though researchers have intended to diversify the user experience of Wiki with visual effects [13,24,27], our contributions include the possibility of real-time updated visual navigation that responds to users browsing in the open space of Wikipedia pages, and the fact that nodes on the screen are determined by the personalized algorithmic parameters directly set by users. As users are often only aware of the pages that comprise their browsing paths only relatively "blind" to any "surrounding" ones, we hope our design of such an immersive visual navigation would make for a more useful Wikipedia search experience. ...
... IkeWiki [28] and SweetWiki [10] made the inherent structure of a Wikipedia page accessible to users and computing machines via annotations derived from semantic methods (e.g., RDF and conceptual graphs). A visual analytics framework [13] illustrated how editors could work together for a public visualization of Wikipedia data. ...
... Since only the edges with a large enough weight could be added to the map, the dense edges suggest several local clusters, such as Nodes (4,13,11,12,7), or another group (18,3,8,19) in Figure 5a. Besides, the spiral layout clearly shows the similarity-based distance to the center node in an anti-clockwise order. ...
Conference Paper
In this paper we present a proof-of-concept of a visual navigation tool for a personalized “sandbox” of Wiki pages. The navigation tool considers multiple groups of algorithmic parameters and adapts to user activity via graphical user interfaces. The output is a 2D map of a subset of Wikipedia pages network which provides a different and broader visual representation – a map – in the neighborhood (according to some metric) of the pages around the page currently displayed in a browser. The representation schema includes the incorporation of a kind of transparency in the algorithmic parameters affecting the presentation of the landscape visualization, which in turn enables the delivery of a personalized canvas, designed by the user. A case study shows the combination of four different sourcing (i.e., identification and extraction of the neighboring pages) rules and three layouts over the same Wikipedia subnetwork. The basic schema is readily adapted to other search experiences and contexts.
Conference Paper
Full-text available
Visual analytics (VA) system development started in academic research institutions where novel visualization techniques and open source toolkits were developed. Simultaneously, small software companies, sometimes spin-offs from academic research institutions, built solutions for specific application domains. In recent years we observed the following trend: some small VA companies grew exponentially; at the same time some big software vendors such as IBM and SAP started to acquire successful VA companies and integrated the acquired VA components into their existing frameworks. Generally the application domains of VA systems have broadened substantially. This phenomenon is driven by the generation of more and more data of high volume and complexity, which leads to an increasing demand for VA solutions from many application domains. In this paper we survey a selection of state-of-the-art commercial VA frameworks, complementary to an existing survey on open source VA tools. From the survey results we identify several improvement opportunities as future research directions.
Article
Full-text available
We present Omnipedia, a system that allows Wikipedia readers to gain insight from up to 25 language editions of Wikipedia simultaneously. Omnipedia highlights the similarities and differences that exist among Wikipedia language editions, and makes salient information that is unique to each language as well as that which is shared more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with a multilingual Wikipedia experience. These include visualizing content in a language-neutral way and aligning data in the face of diverse information organization strategies. We present a study of Omnipedia that characterizes how people interact with information using a multilingual lens. We found that users actively sought information exclusive to unfamiliar language editions and strategically compared how language editions defined concepts. Finally, we briefly discuss how Omnipedia generalizes to other domains facing language barriers.
Conference Paper
Localness is an oft-cited benefit of volunteered geographic information (VGI). This study examines whether localness is a constant, universally shared benefit of VGI, or one that varies depending on the context in which it is produced. Focusing on articles about geographic entities (e.g. cities, points of interest) in 79 language editions of Wikipedia, we examine the localness of both the editors working on articles and the sources of the information they cite. We find extensive geographic inequalities in localness, with the degree of localness varying with the socioeconomic status of the local population and the health of the local media. We also point out the key role of language, showing that information in languages not native to a place tends to be produced and sourced by non-locals. We discuss the implications of this work for our understanding of the nature of VGI and highlight a generalizable technical contribution: an algorithm that determines the home country of the original publisher of online content.
Chapter
For the last few years, computers and the Internet have been changing the way research is conceived, conducted, and communicated, transforming scholarly publication and collaboration, and supporting the creation, the storage, the analysis, and the dissemination of data and information. While natural, medical, and social sciences have a long and established tradition with these technologies, most of the humanities disciplines have found it difficult if not impossible to integrate computational tools, based mostly on quantitative approaches, with their research methods. In the last 20 years, however, new research areas and activities have emerged from the intersection between humanities and computing. Today, what is known as digital humanities represents a heterogeneous set of studies and practices that aims at understanding the implications and the opportunities that digital technologies can provide as media, tools, or objects of study in the humanities (Schreibman et al. in A companion to digital humanities. Blackwell, Oxford, 2004; Gold in Debates in the digital humanities. University of Minnesota Press, Minneapolis, 2012; Berry in Understanding digital humanities. Palgrave Macmillan, New York, 2012). These new relationships between the digital and the humanities are rapidly demanding new modes of observation and interpretation. Information visualizations and interfaces appear as essential tools to explore and make sense of large and heterogeneous amounts of data (Manovich 2013). But, in a context where most of the methods and the technologies are still adopted from other disciplines, the biggest challenge seems to be imagining new genuine research tools capable of embedding and valorizing the humanities endeavor (Drucker in Culture Machine 12:1–20, 2011).
The work presented here aims at deepening the relationships between designers, humanities scholars, and computer scientists through the outlining of new research tools and processes based on humanistic data and digital environments. Furthermore, it explores the possibilities and challenges set forth by information and data visualizations as tools to support scholarly activities.
Article
This chapter highlights the way Distributed Geovisualization (connecting people, data and resources) can deliver real benefit to science and society. The discussion is framed around a scenario of environmental crisis management, a flood emergency, which is typical of the challenges that only a distributed approach can solve in an effective and timely manner. A real-world problem of flood management is considered and it is shown that a distributed approach can lead to more effective crisis management. This is also typical of many other scenarios such as the management of forest fires, oil slicks, radiation leaks, and toxic chemical release. In all of these cases, a combined force of data, resources, and people must be harnessed at very short notice and on a global scale. Geovisualization is being presented with unique challenges as spatio-temporal decision-making applications require capabilities for extracting and using relevant subsets of data from heterogeneous distributed data resources. The users in the emergency management application (the river or water resources authority, the highways authority and the civic authorities) will require access to relevant subsets of their operational databases. In the context of the flood emergency, the spatial dimension will be extremely relevant to enable users to specify the data to be extracted from the river and road networks and the information about land use that refers to the region at risk.
Article
This article analyzes users who edit Wikipedia articles about Okinawa, Japan, in English and Japanese. It finds these users are among the most active and dedicated users in their primary languages, where they make many large, high-quality edits. However, when these users edit in their non-primary languages, they tend to make edits of a different type that are overall smaller in size and more often restricted to the narrow set of articles that exist in both languages. Design changes to motivate wider contributions from users in their non-primary languages and to encourage multilingual users to transfer more information across language divides are presented.
Article
Geographies of codified knowledge have always been characterized by stark core–periphery patterns, with some parts of the world at the center of global voice and representation and many others invisible or unheard. Many have pointed to the potential for radical change, however, as digital divides are bridged and 2.5 billion people are now online. With a focus on Wikipedia, which is one of the world's most visible, most used, and most powerful repositories of user-generated content, we investigate whether we are now seeing fundamentally different patterns of knowledge production. Even though Wikipedia consists of a massive cloud of geographic information about millions of events and places around the globe put together by millions of hours of human labor, the encyclopedia remains characterized by uneven and clustered geographies: There is simply not a lot of content about much of the world. The article then moves to describe the factors that explain these patterns, showing that although just a few conditions can explain much of the variance in geographies of information, some parts of the world remain well below their expected values. These findings indicate that better connectivity is only a necessary but not a sufficient condition for the presence of volunteered geographic information about a place. We conclude by discussing the remaining social, economic, political, regulatory, and infrastructural barriers that continue to disadvantage many of the world's informational peripheries. The article ultimately shows that, despite many hopes that a democratization of connectivity will spur a concomitant democratization of information production, Internet connectivity is not a panacea and can only ever be one part of a broader strategy to deepen the informational layers of places.
Article
This article analyzes one month of edits to Wikipedia in order to examine the role of users editing multiple language editions (referred to as multilingual users). Such multilingual users may serve an important function in diffusing information across different language editions of the project, and prior work has suggested this could reduce the level of self-focus bias in each edition. This study finds multilingual users are much more active than their single-edition (monolingual) counterparts. They are found in all language editions, but smaller-sized editions with fewer users have a higher percentage of multilingual users than larger-sized editions. About a quarter of multilingual users always edit the same articles in multiple languages, while just over 40% of multilingual users edit different articles in different languages. When non-English users do edit a second language edition, that edition is most frequently English. Nonetheless, several regional and linguistic cross-editing patterns are also present.