El profesional de la información, 2014, mayo-junio, v. 23, n. 3. ISSN: 1386-6710
Article received on 19-01-2014
Approved on 09-03-2014
How we draw texts: a review of approaches to text visualization and exploration
Jaume Nualart-Vilaplana, Mario Pérez-Montoro and Mitchell Whitelaw
Jaume Nualart-Vilaplana is a PhD candidate in the Faculty of Arts and Design, University of Canberra (Australia), a research engineer at NICTA (Australia), and a PhD candidate in the Faculty of Information Science, University of Barcelona. He holds an MAS and an MSc (Licenciatura) from the Autonomous University of Barcelona.
http://orcid.org/0000-0003-4954-5303
Machine Learning Research Group at NICTA, Canberra Research Laboratory
Tower A, 7 London Circuit, Canberra City ACT 2601, Canberra, Australia
jaume.nualart@canberra.edu.au
Mario Pérez-Montoro holds a PhD in Philosophy and Education from the University of Barcelona and a Master in Information Management and Systems from the Polytechnic University of Catalonia. He studied at the Istituto di Discipline della Comunicazione at the Università di Bologna (Italy) and has been a visiting scholar at the Center for the Study of Language and Information (CSLI) at Stanford University (California, USA) and at the School of Information at UC Berkeley (California, USA). He is a professor in the Department of Information Science at the University of Barcelona. His work has focused on information architecture and visualization. He is the author of the book Arquitectura de la información en entornos web (Trea, 2010).
http://orcid.org/0000-0003-2426-8119
Facultat de Biblioteconomia i Documentació, Universitat de Barcelona
Melcior de Palau, 140. 08014 Barcelona, España
perez-montoro@ub.edu
Mitchell Whitelaw is an academic, writer and practitioner with interests in new media art and culture, especially generative systems and data-aesthetics. His work has appeared in journals including Leonardo, Digital creativity, Fibreculture, and Senses and society. His work on a-life art was published in the book Metacreation: art and artificial life (MIT Press, 2004). His current work spans generative art and design, digital materiality, and data visualisation. He is currently an associate professor in the Faculty of Arts and Design at the University of Canberra, where he leads the Master of Digital Design. He blogs at The Teeming Void.
http://orcid.org/0000-0001-9013-9732
Faculty of Arts and Design, University of Canberra
Bldg, Floor & Room: 9, C12. ACT 2617, Canberra, Australia
mitchell.whitelaw@canberra.edu.au
Abstract
This paper presents a review of approaches to text visualization and exploration. Text visualization and exploration, we argue, constitute a subfield of data visualization, and are fuelled by the advances being made in text analysis research and by the growing amount of accessible data in text format. We propose an original classification for a total of 49 cases based on the visual features of the approaches adopted, identified using an inductive process of analysis. We group the cases (published between 1994 and 2013) in two categories: single-text visualizations and text-collection visualizations, both of which can be explored and compared online.
Keywords
Review, Text visualization, Data visualization, Data exploration, Data display, Information visualization, Text analysis.
Spanish title: Cómo dibujamos textos. Revisión de propuestas de visualización y exploración textual
Note: A Spanish translation of this article is available at:
http://www.elprofesionaldelainformacion.com/contenidos/2014/may/02_esp.pdf
Summary
This work presents a review of strategies for the visualization and exploration of texts. It argues that text visualization and exploration constitute a subfield of data visualization that draws on advances in text analysis and on the growing amount of accessible data in text format. We propose an original classification for a total of forty-nine reviewed cases. The classification is based on the visual characteristics of each case, identified through an inductive process of analysis. We group the cases (published between 1994 and 2013) in two categories: visualizations of individual texts and visualizations of collections of texts. The reviewed cases can be explored and compared online.
Keywords
Text visualization, Data visualization, Data exploration, Information visualization, Text analysis.
Nualart-Vilaplana, Jaume; Pérez-Montoro, Mario; Whitelaw, Mitchell (2014). “How we draw texts: a review of approaches to text visualization and exploration”. El profesional de la información, mayo-junio, v. 23, n. 3, pp. 221-235.
http://dx.doi.org/10.3145/epi.2014.may.02
1. Introduction
The aim of this review is to propose a classification of text visualization and exploration tools, while describing the broader context in which they operate. To do so, we list, classify and discuss the most important contributions made in the field of text visualization and exploration between 1994 and 2013. This field is undergoing rapid growth, fuelled by open data initiatives and web scraping, and has become highly diversified, developing in parallel in a range of disciplines. Some of the most important visualization methods invented between 1765 and 1999 were the timeline, bar chart, pie chart, flow map, Venn diagram, histogram, Gantt chart, flowchart, tag cloud, social networks, boxplot, star plot, treemap, heatmap, and sparkline. Figure 1 presents a word cloud (using Wordle) of the professions practiced by their respective inventors. Given this diversity, our search for cases has been conducted in many different contexts and has involved the examination of many different sources, ranging from the sciences to the humanities, from academic journals to blog sites, from universities to freelance studios, and from open data institutions to open data communities. Clearly this proliferation of disciplines has meant the adoption of a variety of different philosophies and points of view.
This review aims to help those who work with data, and especially with texts (academics and non-academics alike), to use visualization techniques that can identify patterns or behaviours present in the texts. Moreover, these techniques can help users improve, in terms of both the speed and the clarity of the process, the way in which they visualize and discover the facts that lie within the data.
Drawing a clear conceptual line between approaches to text visualization and exploration is no straightforward task, but here we have opted to review cases dedicated to both processes, be they described separately or together. Note that on occasions, for the sake of simplicity, we use the term text visualization in reference to both approaches.
The two types of text visualization considered here are:
1) Single-text representation, that is, ways of extracting meaning from texts based on writing style, document structure and language register as opposed to pure statistics. Our interest lies in representing the meaning and salient features of texts because their convenient visualization can speed up and/or improve our ability to select texts and manage the time required to tackle them. The research output of fields such as natural language processing, linguistic computing and machine learning provides techniques for producing high quality data representing complex texts. It is our belief that by combining these techniques with a suitable text visualization method we can improve the way in which we examine and understand texts.
2) Representation and exploration of collections of texts. Exploring and selecting individual texts and navigating and analyzing collections of texts are daily tasks for many of those who work with computers and datasets, and there is clearly plenty of room for new ideas and tools to facilitate their work. Information retrieval is a critical factor in an environment characterized by an excess of information (Baeza-Yates et al., 1999). When a user conducts a search, information retrieval systems normally respond with a list of results. More often than not, the presentation of these results plays an important role in satisfying the user’s information needs, so a poor or inadequate presentation can thwart the user (Baeza-Yates et al., 2011). Typically, information retrieval systems present the results of a query in a flat, one-dimensional list. Such lists tend to be opaque about how they order the information, i.e., users do not know why the list is presented in a particular order. To refine their search, users have to interact again, normally by filtering the first output of results. It is our belief that new techniques for representing collections of texts, including search results, can help improve navigation, exploration and retrieval.
Figure 1. Word cloud of the professions practiced by inventors of visualization methods
As we show below, data visualization can today be considered a consolidated academic field (Strecker; IDRC, 2012). Thus:
- Seven of the top 10 universities according to the Times Higher Education ranking (2012) have departments or research groups working in the field of data visualization. The discipline is incorporated in a wide variety of departments, ranging from computer science and statistics to linguistics and graphical design, and from chemistry and physics to genetics and history. Recently, data visualization has emerged as a distinct field, with specific departments dedicated to its study and master’s programs being taught in the subject (table 1).
- Over the last five years a number of conferences have been dedicated primarily to data visualization. These are listed in table 2.
- A number of journals are now specifically dedicated to studies in data visualization, and important contributions can also be found in conference proceedings (table 3).
Finally, a number of leading websites, including Infosthetics, Visualcomplexity and Visualizingdata.com, play a key role in the dissemination of the subject.
1.1. Text visualization
Shneiderman (1996) classifies regular texts as one-dimensional data, that is, data organized in a sequential manner, running right-to-left (or left-to-right), line-by-line, top-to-bottom. Yet, a text can have multiple internal structures, a morphology made up of paragraphs, sentences and words.
| Conference | Location | Topic | No. participants | URL |
|---|---|---|---|---|
| Nicar 2013 | USA | Data journalism | 149 | http://ire.org/conferences/nicar-2013 |
| Dd4d 2009 | France | Information visualization | 52 | http://www.dd4d.net |
| FutureEverything 2013 | UK | Technology/society/art | 52 | http://futureeverything.org |
| Resonate 2013 | UK | Creative code | 44 | http://www.thisisresonate.co.uk/resonate-13 |
| Graphical web 2012 | Switzerland | Open web/datavis | 38 | http://www.graphicalweb.org/2012 |
| IeeeVis - VisWeek 2012 | USA | Information visualization | - | http://ieeevis.org |
| EuroVis 2013 | Germany | Computational aesthetics | - | http://www.eurovis2013.de |
| Siggraph 2013 | USA | Computer graphics and interactive techniques | - | http://s2013.siggraph.org |
| OzViz 2012 | Australia & NZ | Workshops for visualisation practitioners, academics and researchers | - | http://www.ozviz2012.org |

Table 2. Conferences dedicated primarily to data visualization ordered by number of participants (Stefaner, 2013)
| Institution | Rank in 2012 | Department/Course | URL |
|---|---|---|---|
| Harvard University | 1 | Broad Institute of Harvard and MIT | http://www.broadinstitute.org/vis |
| Massachusetts Institute of Technology | 2 | Broad Institute of Harvard and MIT | http://www.broadinstitute.org/vis |
| University of Cambridge | 3 | -- | -- |
| Stanford University | 4 | Stanford Vis Group | http://vis.stanford.edu |
| University of California, Berkeley | 5 | VisualizationLab | http://vis.berkeley.edu |
| University of Oxford | 6 | Visual Informatics Lab at Oxford | http://oxvii.wordpress.com |
| Princeton University | 7 | PrincetonVisLab | http://www.princeton.edu/researchcomputing/vis-lab |
| University of Tokyo | 8 | -- | -- |
| University of California, Los Angeles | 9 | IDRE GIS and visualization | https://idre.ucla.edu/visualization |
| Yale University | 10 | -- | -- |

Table 1. Leading universities and their data visualization departments
| Name | URL |
|---|---|
| Parsons journal for information mapping | http://pjim.newschool.edu/issues/index.php |
| Journal of visualization | http://springer.com/materials/mechanics/journal/12650 |
| IEEE Transactions on visualization and computer graphics (TVCG) | http://www.computer.org/portal/web/tvcg |
| Information visualization | http://ivi.sagepub.com |
| International journal of image processing and data visualization (Ijipdv) | http://iartc.net/index.php/Visualization |
| IEEE Vis (former Visweek) | http://ieeevis.org |
| EuroVis | http://www.eurovis2013.de |
| ACM CHI | http://chi2013.acm.org |
| EG CGF | http://www.eg.org |
| IVS | http://www.graphicslink.co.uk/IV2013 |

Table 3. Main journals dedicated to data visualization
Depending on its information structure, a text may be ordered by chapters, parts, sections, subsections, etc. If a text is given in a specific format, such as html, then it may be organized into bodies, divs, paragraphs, etc. In these examples the text includes tree structures as well as a one-dimensional structure. Additionally, texts may have a subjective component and an abstract structure that is not readily analysed by a computer. All in all, these data types and structures constitute the specificities of a text.
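The tree structure that an html text carries alongside its one-dimensional sequence can be made visible with a few lines of code. The following is a minimal sketch using Python's standard `html.parser` module; the class name `OutlineParser` and the sample markup are illustrative assumptions, and the sketch presumes well-formed markup without void elements such as `<br>`.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Record each opening tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag) pairs in reading order

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


# Hypothetical sample document: a body containing one div with two paragraphs.
parser = OutlineParser()
parser.feed("<body><div><p>One</p><p>Two</p></div></body>")

for depth, tag in parser.outline:
    print("  " * depth + tag)
```

The list order preserves the linear reading sequence, while the depth values expose the tree: both structures coexist in the same text.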
The amount of data to which we have access grows on a daily basis. Most of these data are in text format, as Fernanda Viégas and Martin Wattenberg argue in an interview with Jeff Heer: “One of the things I think is really promising is visualizing text. That has been mostly ignored so far in terms of information visualization approaches, and yet a lot of the richest information we have is in text format” (Heer, 2010).
Data analysis defines the boundaries of data visualization, i.e., it provides the fine line between multiple truths and lies. In the case of text visualization, this role has been taken on by text analysis: in the main, via computational linguistics, natural language processing, machine learning and statistics. The advances made in text analysis at a whole range of levels have given computers a degree of text understanding, enabling them to transform text, the so-called unstructured data (see next subsection “Text analysis”).
There is some discussion as to whether text visualization might be considered a specific subfield of data visualization. Some authors tend to disagree: Illinski (2013) claims that text cannot be considered a data type; Šilić (2010) argues that “unstructured text is not suitable for visualization”. Yet most text visualizations transform the initial “unstructured” textual data into a reduced, structured dataset. This new dataset is no longer one-dimensional, but rather it constitutes a categorical or a network dataset and it can be represented with a wide range of tools that are not specific to text representation (Hearst, 2009; Grobelnik; Mladenić, 2002).
As we show in the cases we review here, most text visualizations do not represent raw data: that is, the text as it is. Rather, they transform the text into smaller chunks of data, normally extracting a representative part of that text. This process is one of data transformation and it occurs, for example, when a text is reduced to a list of words based on their frequency of appearance. In that case, the method chosen to represent the data will belong to a family of methods best suited to the data type. In this review we consider the most frequently employed strategies to represent single texts or collections of texts, paying special attention to strategies for representing textual data as it is, as a regular text, with all its complexities, irregularities and rich abstractions.
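The reduction of a text to a list of words ranked by frequency can be illustrated with a short sketch. This is a minimal example rather than the method of any reviewed case; the function name `word_frequencies`, the tokenizing rule and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5, stopwords=frozenset()):
    """Reduce a raw text to its top_n most frequent words."""
    # Naive tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(top_n)


sample = "the cat sat on the mat and the dog sat too"
print(word_frequencies(sample, top_n=2))  # [('the', 3), ('sat', 2)]
```

The output is a small categorical dataset (word, count) that no longer preserves the order of the text, which is why it can be drawn with generic tools such as a bar chart or a word cloud.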
Text analysis is a key field for text visualization. Below, we present a brief commentary on this matter and its relationship with text visualization.
1.2. Text analysis
Text analysis, roughly synonymous with text mining (Feldman; Sanger, 2006), is an interdisciplinary field that includes information retrieval, data mining, machine learning, statistics, linguistics and natural language processing. According to Marti Hearst (2003), the goal of text mining is to discover “heretofore unknown information, something that no one yet knows and so could not have yet written down”. Text mining is a subfield of data mining whose typical applications include the analysis or comparison of literary texts, the analysis of biological and genomic data sequences and, more recently, the identification of consumer behaviour patterns or the detection of the fraudulent use of credit cards. Hearst differentiates these applications from information extraction operations, such as the extraction of people’s names, addresses or job skills. Information extraction of this kind can be done with >80% accuracy, whereas the full interpretation of natural language by a computer program looks like it will not be possible for “a very long time” (Hearst, 2003).
To study text visualization and exploration it is important to examine the literature dedicated to both data visualization and text analysis, given the significant interrelationships that exist. Thus, while the text analysis output may limit the possibilities of visual presentation and interaction with the text, there is strong empirical evidence indicating that people learn better with a combination of text and illustration (visualization) than with text alone (Anglin et al., 2004; Levie; Lentz, 1982).
2. Review
In this section we propose a possible classification based on the visual features that characterize the approaches to textual visualization and exploration, as identified in 49 cases.
The cases were collected in a two-part process. First, we conducted a traditional literature search and review (including practical examples and visualisation studies). Second, we selected a subset of the results, based on a preliminary analysis of their features. The aim was to select cases that provided a representative overview of the range of work in the field.
The classification of the cases is the product of empirical observation following an inductive analysis. The classification is followed by an analysis of these cases.
Alternative methodologies for selecting and categorizing primary sources exist, such as those of Kitchenham (2004) and Benavides; Segura; Ruiz-Cortés (2010).
2.1. Classification of approaches
The basic classification of text visualization approaches comprises two categories according to the type of data to which they are applied:
1) Textual documents: that is, representations of single texts, where text is understood as a sequence of words ordered according to the hierarchy: document > paragraphs > sentences > other punctuation marks > words > syllables and phonemes or morphemes. Where a text is a book or another kind of structure, it may have more granularities, including: chapters > sections > subsections > etc. We also include the metadata of the text and other attached texts, i.e., title, author(s), publisher, copyright notes, acknowledgement, dedication, preface, table of contents, foreword, glossary, bibliography, index, etc.
2) Text collections: that is, a group of texts in which each item constitutes a clearly differentiable entity. Typically when speaking of collections of texts, we speak of texts that have elements in common, be it their register, length or structure. All the cases we review here are collections of the same text type. Heterogeneous collections of texts are also referenced in the literature (Meeks, 2011), especially in representative analyses of a field of knowledge, where the aim is to include the greatest possible variety of expressions and vocabulary. In such cases the dataset can be said to be heterogeneous in terms of its structure and register.
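The hierarchy given for single texts (document > paragraphs > sentences > words) can be sketched as a nested list. The example below is a minimal illustration under naive assumptions: paragraphs are separated by blank lines, sentences end in ".", "!" or "?", and words are runs of word characters. The function name `segment` and the sample text are hypothetical; real texts need a proper tokenizer.

```python
import re


def segment(text):
    """Split a text into paragraphs > sentences > words (naive rules)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    tree = []
    for paragraph in paragraphs:
        # Split after sentence-final punctuation followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        tree.append([re.findall(r"\w+", s) for s in sentences])
    return tree


doc = "Texts have structure. Words form sentences.\n\nSentences form paragraphs."
tree = segment(doc)

print(len(tree))      # number of paragraphs
print(len(tree[0]))   # number of sentences in the first paragraph
print(tree[1][0])     # words of the first sentence in the second paragraph
```

Each level of the nested list corresponds to one granularity that a single-text visualization may choose to draw.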
To these two data types, we then add several subjective subdivisions according to the visual features used to represent the textual features. The aim here is to be able to describe and explain the cases under review, as well as to identify the key features of the text visualization approaches.
Single texts
- Whole <-> Part
- Sequential <-> Non-sequential
- Discourse structure <-> Syntactic structure
- Search
- Time
Text collections
- Items <-> Aggregations
- Landscape
- Search
- Time
2.1.1. Single texts
In the specific instance of single texts, we classify the cases according to the part of the text that is represented, whether the approach follows the same sequence as that of the text, and the text structure employed in each case.
Whole or part?
In some instances, one part of the text is considered the essence of the text and is used in the visualization process rather than the whole text. Yet, there are processes that use the whole text, at least implicitly. Examples include, among others:
- chapters of a book but not the whole text.
- representation of all the sentences of the text as coloured lines.
- verbs of a text, providing an impression of the style of the text.
- characters of a novel and their appearance within the text.
- places or dates present in the text.
The cases in which the whole text is explicitly represented are, for obvious reasons, cases involving relatively short texts, e.g., song lyrics, speeches, poems, etc.
In some instances, such as when using Radial word connections (see case 1 below), only certain words from the text are represented; yet, we classify this case as a whole-text representation because the whole novel, chapter by chapter, is implicitly represented in the circle.
In those instances in which the whole text is represented (even implicitly) as one central element in the visualization, we classify it as being a whole-text visualization.
Does the visualization follow the same sequence as that of the text?
If the visualization follows the same sequence, or order, as that of the text, then the case is considered sequential; if not, then it is considered non-sequential. For example, a typical case that does not follow the same sequence as that of the original text would be a word cloud (see figure 1).
Does the visualization use elements from discourse structure or from syntactic structure?
A text may present one of two kinds of structure that we consider useful for our research. One is so-called discourse structure. Depending on the nature of the text, the discourse structure can be left entirely to the author’s point of view, as in literature, or restricted to a given structure, as in legal and scientific texts. In linguistics, discourse is a broad concept, but here we use it to refer to the parts of a text and the outline of a document: parts, chapters, sections, subsections, etc. The discourse structure is widely used when visualizing texts because it is a relatively straightforward way to represent the text sequence.
The second structure is the text’s syntactic structure, which refers to the organization of the text into sentences, phrases and word classes, including verbs and nouns. This is an objective structure and is dependent on the rules of linguistics. In text visualizations, the elements comprising this structure, such as sentences, are very common.
2.1.2. Text collections
In the specific instance of text collections we classify the cases according to pure items or aggregations, i.e., as pure data or data landscapes. Thus we determine whether the items making up the collection can be differentiated or are represented as aggregations. The specific questions we address are:
How is each item in the collecon graphically represented?
Is each text represented as a graphical entity, i.e., as a point, a word or short sentence? Can the items in the visualization be counted, i.e., are they visually differentiated?
There are cases in which each item is not represented by a graphically distinct entity, but rather, for example, as a coloured block. Alternatively, the items are accumulated and shown as frequency distributions. When the items of the collection are not graphically distinct (visually countable) then we speak in terms of the visualization of an aggregation rather than that of an item.
Pure data or data and landscape?
Are the items of the collection accompanied by any graphical content? Is another dataset, apart from that emanating from the text, also being represented? Some cases present the items embedded in a graphical environment, such as a map. This context might be an actual geographical map, a metaphor, or, for example, a conceptual landscape composed of words that form a second layer complementing that of the data collection, in which every distance plays a role: item-item (similarity between documents), word-item (importance of a word in a document), word-word (similarity between words in the collection).
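The three kinds of distance in such a conceptual landscape can be approximated in many ways. The sketch below uses raw term counts and cosine similarity purely as an illustration; these measures, the helper names (`tf_vector`, `word_vector`, `cosine`) and the three toy documents are assumptions, not the measures used by any reviewed case.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a document (whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical three-document collection.
docs = {
    "d1": "text visualization review",
    "d2": "text analysis review",
    "d3": "credit card fraud",
}
vectors = {name: tf_vector(text) for name, text in docs.items()}


def word_vector(word):
    """Describe a word by its counts across the documents of the collection."""
    return Counter({n: v[word] for n, v in vectors.items() if v[word] > 0})


item_item = cosine(vectors["d1"], vectors["d2"])              # similarity between documents
word_item = vectors["d1"]["text"]                             # importance of a word in a document
word_word = cosine(word_vector("text"), word_vector("review"))  # similarity between words

print(round(item_item, 3), word_item, round(word_word, 3))
```

A landscape layout would then place items and words so that screen distances reflect these values, for instance with a force-directed or multidimensional-scaling layout.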
Scales and axes are not considered as landscapes, nor are the elements of the interface in which the representation is embedded. The landscape layer, if not counted as part of the main dataset, substantially reduces Tufte’s data-ink ratio (Tufte; Graves-Morris, 1983) compared with that of a pure data representation.
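For reference, the data-ink ratio is conventionally stated as follows. The wording of the terms is recalled from Tufte's standard definition and is not given in this review.

```latex
\[
\text{data-ink ratio}
  \;=\; \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  \;=\; 1 - \text{proportion of the graphic that can be erased without loss of data-information}
\]
```

Under this definition, any ink spent on a landscape layer that is not itself counted as data enlarges the denominator only, which is why the ratio falls.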
2.1.3. Both single texts and text collections
Properties that are equally applicable to single-text and text-collection visualizations include time, search results and dataset size.
Does time play a role?
Do the texts change over time? One set of visualization approaches highlights the changes undergone by a dataset over time. The most common approaches of this kind have been developed in computer science to represent code evolution or in Wikipedia to indicate various aspects of article revisions.
This category also includes visualizations in which the dataset itself changes over time; for example, the visualization of the latest news will see the dataset grow over time.
Does the visualization result from a search query?
Visualization of the output of information retrieval systems is a well-defined kind of visualization, characterized by a number of represented items that changes with the number of search results obtained. This is a growing visualization subfield related to the disciplines of information systems and information retrieval (Mann, 2002; Hearst, 2009).
Validity for small or large datasets
It is rare that a visualization tool is independent of the size of the dataset that is to be represented. Where a tool has clearly been designed for a specific dataset size, we note this in the discussion of the corresponding case.
2.2. Analysis of visualization approaches
We review a total of 49 cases applying the classification outlined above. In an attempt to incorporate the most crucial aspects of text visualization, our review concentrates on the specific ideas underpinning the text visualization, rather than the dataset and the contexts of each case.
Sixteen fields have been collected for each case: name, short name, author(s), year of publication, URL for further information, original dataset, discipline related to the work, description of the visualization method, description of the case, screen shot, thumbnail, classification (single or collection), classification (single-whole, single-part, collection-items, collection-aggregations), classification (time), classification (search), classification (dataset small, dataset large, N/A).
The cases are grouped into two sections and four subsections:
Single-text visualizations (23 cases)
– Whole-text visualizations (15 cases)
– Partial-text visualizations (8 cases)
Text collection visualizations (26 cases)
– Collection of items (16 cases)
– Collection of aggregations (10 cases)
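A case record and the grouping above can be sketched as a small data structure. The following is a minimal illustration that keeps only five of the sixteen fields; the class name `Case`, the field names and the `tally` helper are hypothetical, the three sample records are whole-text cases listed in this review, and the group sizes are the ones reported above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    name: str
    authors: str
    year: int
    scope: str    # "single" or "collection"
    subtype: str  # "whole", "part", "items" or "aggregations"


# Three whole-text cases taken from the list reviewed in this paper.
cases = [
    Case("Texty", "Jaume Nualart", 2008, "single", "whole"),
    Case("Bible cross-references", "Chris Harrison", 2008, "single", "whole"),
    Case("TileBars", "Marti A. Hearst", 1995, "single", "whole"),
]


def tally(case_list):
    """Count cases per (scope, subtype) group."""
    return Counter((c.scope, c.subtype) for c in case_list)


# Group sizes as reported in the review; they should add up to 49.
reported = {
    ("single", "whole"): 15,
    ("single", "part"): 8,
    ("collection", "items"): 16,
    ("collection", "aggregations"): 10,
}
total = sum(reported.values())

# Most recent first; ties keep their original order.
by_year = sorted(cases, key=lambda c: c.year, reverse=True)
print(total, tally(cases), [c.name for c in by_year])
```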
For each subsection the cases are sorted by year of publication (descending). To assist the reader, the collection of all reviewed cases can be viewed using the visualization and exploration software (also included in the review) known as AREA (Nualart, 2013).
2.2.1 Single-text visualization
We present single texts grouped as whole-text visualizations, partial-text visualizations and other subcategories.
Figure 2. The 49 reviewed cases visualized with the Area software (screen shot).
The other subcategories include sequential and non-sequential visualizations, discourse-structure and syntactic-structure visualizations, search results and visualizations of time-dependent datasets. Each subsection adheres to the following structure: list of cases, description of the group and discussion.
a) Whole-text visualizations
1) Literature. Novel views: Les misérables, Radial word connections by Jeff Clark (2013)
2) Literature. Novel views: Les misérables, Character mentions by Jeff Clark (2013)
3) Literature. Poem viewer by Katharine Coles et al. (2013)
4) Politics. State of the Union 2011, Sentence bar diagrams by Jeff Clark (2011)
5) Literature. Visualizing lexical novelty in literature by Matthew Hurst (2011)
6) Science/papers. On the origin of species: The preservation of favoured traces by Ben Fry (2009)
7) Science/papers. Texty by Jaume Nualart (2008)
8) Religion. Bible cross-references by Chris Harrison (2008)
9) Literature. Literature fingerprint by Daniel A. Keim and Daniela Oelke (2007)
10) Wikipedia. History flow by Fernanda Viégas and Martin Wattenberg (2003)
11) Literature. Colour-coded chronological sequencing by Joel Deshaye and Peter Stoicheff (2003)
12) Literature. 2-D display of time in the novel by Joel Deshaye (2003)
13) Literature. 3-D display of time in the novel by Joel Deshaye (2003)
14) Any. Wattenberg’s arc diagram by Martin Wattenberg (2002)
15) Health. TileBars by Marti A. Hearst (1995)
Description
- Number of cases: We identify 15 cases that can be categorized as whole-text visualizations.
- Years: The cases were published over an 18-year period from 1995 to 2013.
- Authors: All the authors work in academic fields. The most prolific authors in this category are Jeff Clark and Joel Deshaye (with three cases each), followed by Martin Wattenberg (with two cases).
- Datasets: Most of the text corpora in this category are taken from literature (eight cases). Most authors draw on novels, especially well-known texts such as the classics, to demonstrate new visualization approaches.
- Methods: All the cases except case 14 (arc diagram) use colour as part of the visualization method. Five cases use methods that are bar chart derivatives (cases 4, 5, 6, 9 and 11). Three cases use curves connecting parts of the texts: two arc diagrams and one radial diagram (cases 1, 8 and 14).
Discussion
A common method cannot be identified for these whole-text visualizations. Yet, as expected, they all present an axis representing the whole text. In 13 of the 15 cases, the text line is represented by a horizontal or vertical line. The two exceptions use a circle, in the case of Radial word connections (case 1), and an iconification of a text on the page, in the case of Texty (case 7).
Since whole-text visualizations always include an abstraction of the text, referred to as its text line, a question arises: which part of the text is physically present in the whole-text visualization being reviewed? Interestingly, nine of the 15 visualizations do not show a single word (cases 4, 5, 6, 7, 8, 9, 10, 11 and 15). Four cases show a small number of words (cases 1, 2, 12 and 13) (figure 3), while only two cases show all the text (cases 3 and 14).
The most common approach is to show the occurrence of a certain feature (this might be a term, topic, cross-reference or character) within the text as a whole (all cases except 3, 12, 13 and 15). With the exception of Wattenberg’s arc diagrams (case 14), these occurrences are represented using the same colour.
It is interesting to observe how very similar data are represented in very different ways depending on the case under review. For example, while Viégas and Wattenberg’s History flow (case 10) and Fry’s Favoured traces (case 6) both present document-version histories by section, the former is spatialized and the latter animated. Similarities, however, are seen in the approaches adopted, for example, by TileBars (case 15) and Texty (case 7): both highlight words from the text within a rectangular figure that is representative of the whole text. Other cases use opposite or complementary techniques. Thus, Wattenberg’s Arc diagram (case 14) shows repetitions while Hurst’s novelty visualization (case 5) shows only new strings, and no repetitions.
Figure 3. (Case 13) 3-D display of time of William Faulkner’s novel The Sound and the Fury, by Joel Deshaye and Peter Stoicheff (2003)
Literature and other complex texts, such as political speeches (case 4) and the Bible (case 8), dominate the type of corpora used in this category (10 cases). This is perhaps surprising, as these texts tend to be complex, often presenting a high level of abstraction and little formal structure. Arguably, when opting to introduce or test a new approach, it would make more sense to work with simpler, more structured texts (such as scientific papers, patents, health diagnostics, etc.) that present greater regularity in terms of their vocabulary, text length, discourse structure and register. Given the inherent freedoms associated with literature, novelists are under no obligation to adhere to any pattern or rule that might help us give structure to the unstructured.
However, depending on how the text is treated and processed, the nature of the text is not always relevant. For example, Matthew Hurst (case 5) tracks the introduction of new terms in literary texts. Yet the tool can be applied to any other text type, and its results do not depend on the complexity of the text, because the method itself is generic. Having said this, it would be interesting to apply the technique to scientific papers in which the style is much more clearly defined. Similar arguments can be applied to Radial word connections (case 1), Sentence bar diagrams (case 4) and Literature fingerprints (case 9).
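The idea of tracking newly introduced terms can be illustrated with a minimal Python sketch. It is not Hurst’s implementation: the function name `new_terms_per_segment`, the fixed-size token segments and the simple tokenizer are all assumptions made for illustration.

```python
import re


def new_terms_per_segment(text, segment_size=50):
    """Count, for each consecutive segment of `segment_size` tokens,
    how many terms appear for the first time in the text."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    seen = set()
    counts = []
    for start in range(0, len(tokens), segment_size):
        segment = tokens[start:start + segment_size]
        fresh = set(segment) - seen  # terms not seen in any earlier segment
        counts.append(len(fresh))
        seen |= fresh
    return counts


sample = "the cat sat on the mat the cat ran to the dog and the dog ran"
print(new_terms_per_segment(sample, segment_size=5))
```

Each count could then be mapped to a bar height or colour along the text line. Because the procedure only compares strings, it behaves the same way on a novel, a patent or a scientific paper.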
b) Partial-text visualizations
16) Literature. Novel views: Les misérables. Characteristic verbs by Jeff Clark (2013)
17) Any. Wordle by Jonathan Feinberg (2009)
18) Books. DocuBurst by C. Collins, S. Carpendale and G. Penn (2009)
19) Literature. Phrase nets by Frank van Ham, Martin Wattenberg and Fernanda B. Viégas (2009)
20) Google data. Word spectrum: Visualizing Google’s bi-gram data by Chris Harrison (2008)
21) Google data. Word associations: Visualizing Google’s bi-gram data by Chris Harrison (2008)
22) Literature/songs. Document arc diagrams by Jeff Clark (2007)
23) Any book. Gist icons by P. DeCamp, A. Frid-Jimenez, J. Guiness, D. Roy (2005)
Description
- Number of cases: We identify eight cases that can be categorized as partial-text visualizations.
- Years: The cases were published over an eight-year period from 2005 to 2013.
- Authors and datasets: Two cases by Jeff Clark (cases 16 and 22) and one by the creative team of Wattenberg and Viégas in collaboration with van Ham (case 19) use literary texts. The two cases by Chris Harrison use large bi-gram datasets published by Google. One case is not dependent on the nature of the text: Wordle (case 17), the very popular “word cloud” method introduced by Feinberg. Finally, two interactive approaches involving large datasets are presented: DocuBurst (case 18) and Gist icons (case 23).
- Methods: In six of the eight cases (cases 16, 17, 18, 19, 22 and 23), the dataset is reduced to what is called a bag of words and only these words are present in the visualization. Cases 20 and 21 are representations of all bi-grams that pit two primary terms against each other.
Discussion
Partial-text visualization is a successful, popular way to draw a text, presumably because of the way in which a long text can be effectively represented using a small set of words. Simple statistical methods, such as word frequency counts, are readily interpretable. A list of variously sized words is a direct way of communicating with any user, from beginner to expert. Most of the partial-text approaches available online use statistical methods to extract the part from the whole.
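As a minimal sketch of such a statistical extraction, the following Python snippet counts word frequencies and scales them to font sizes, as a word cloud might. The stop-word list, the function names and the linear size scaling are illustrative assumptions, not the method of any tool reviewed here.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def top_words(text, n=5):
    """Return the n most frequent non-stop words with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


def font_sizes(ranked, min_pt=10, max_pt=40):
    """Scale counts linearly to font sizes between min_pt and max_pt."""
    if not ranked:
        return {}
    highest, lowest = ranked[0][1], ranked[-1][1]
    span = max(highest - lowest, 1)  # avoid division by zero when all counts are equal
    return {w: min_pt + (c - lowest) * (max_pt - min_pt) / span for w, c in ranked}


sample = "the whale and the sea and the whale and the ship of the whale"
ranked = top_words(sample, n=2)
print(ranked)
print(font_sizes(ranked))
```

The reduction from a full text to a handful of sized words is what makes the result easy to read, and also what discards the original word order.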
It is our contention that extracting part of the corpora can be affected by the structure and complexity of the whole. In the visualizations under review, half present unstructured text corpora, but the criteria used in extracting the part from the whole are well defined and include lists of verbs (Characteristic verbs, case 16), words occurring in the text in an “X and Y” pattern (DocuBurst, case 18) and lists of words not included in a list of predefined empty words (Google’s bi-gram data, case 21).
Figure 4. (Case 16) Novel views: Les misérables. Characteristic verbs by Jeff Clark (2013)
Clearly, extraction processes based on word or phrase functionality, as opposed to those that use statistical methods, are more closely affected by the nature of the text. Here, we focus on these cases because they are more interesting in terms of our research goals. They include the cases of Novel views: Les misérables. Characteristic verbs (case 16), which represents only verbs, DocuBurst (case 18), which uses the crowd-sourced lexical database WordNet as a human-like backup, and Phrase nets (case 19) and the two Google bi-gram visualizations (cases 20 and 21).
A common pattern detected in the partial-text visualizations reviewed is that once a part of the text has been extracted all except one (Document arc diagrams, case 22) discard any reference to the original text sequence in the visualization. See the following point for a more detailed discussion of this idea.
c) Other subcategories
Here we include sequential and non-sequential visualizations, visualizations of discourse and syntactic structures, search-result visualizations and visualizations of time-dependent datasets.
Sequential visualizations
Sixteen of the 23 single-text visualizations maintain a similar sequence to that of the original text. Eight of these visualize the sequence using a discourse structure (primarily chapters), while the remaining eight use syntactic elements to represent the original sequence of the text (primarily words).
Strikingly, only one partial-text visualization, Clark’s Document arc diagrams (case 22) (figure 5), follows the original text sequence, whereas all the whole-text visualizations are sequential. It would thus appear that sequentiality is intrinsic to whole-text visualization. Whole-text visualizations do not literally represent every word of the text, but rather present a graphical metaphor of the whole: a text line. This text line may represent either a discourse structure or a syntactic structure of the text; but, whatever the case, graphically a line or area is used to represent the length of the text.
The sequentiality of the visualization means it can be read both backwards and forwards, as can the text. In the case of a long text, such as a book (nine of the 16 cases), the visualization can serve as a map or guide to the text.
Non-sequential visualizations
Five cases use non-sequential visualizations: three use word clouds (cases 17, 20 and 21), one a net of phrases (case 19) and one visualizes all the verbs in the text (case 16).
Discourse structures in the visualization
Cases: 1, 2, 5, 6, 8, 11, 12 and 13
The eight visualizations that follow the discourse structure of the text are sequential: no cases were found in which the discourse structure appeared out of sequence with regard to the text. This is perhaps unsurprising, as those cases in which the text is divided into chapters and each chapter represented as a separate entity were considered as text collection visualizations (e.g., Sentence bar diagrams, case 4). For this reason, all the cases in this section represent the parts of a text ordered and aligned (in a curve or line). Of the eight visualizations, five represent chapters or sections of a book, two represent complete volumes, while one (Colour-coded chronological sequencing, case 11) divides the text in colours according to narrative topics and scenes. Indeed, case 11 is the only one we have identified that uses discourse structure elements that are more deeply embedded than chapters, sections, books and volumes. In all likelihood, more deeply embedded elements than these, such as narrative topics, would require manual text line segmentation.
Syntactic structures in the visualization
Cases: 3, 16, 4, 7, 18, 9, 22 and 23.
The other eight sequential visualizations use intrinsic text elements, including groups of words (cases 7, 18, 22 and 23), verbs (case 16), sentences (cases 4 and 9) and a complete text analysis (case 3). Syntactic analysis requires either word-by-word parsing of the text (using a database of lexical or semantic word lists) or sentence and paragraph parsing. Syntactic-structure visualization is less dependent on the nature of the text in the sense that the methodology is unaffected by the complexity of the text. Typically, the software automatically extracts or marks the chosen syntactic elements.
Search-result visualizations
Cases: 15, 18 and 23
The three search-result visualizations were presented as web applications and were, therefore, interactive: the user was able to query the visualization system and obtain a unique representation for each search. The three cases, however, are no longer available online. DocuBurst (case 18) is a Prefuse application that can be downloaded (Collins et al., 2009). Prefuse is a set of software tools for creating rich interactive data visualizations.
Figure 5. (Case 15) TileBar search on (patient medicine medical AND test scan cure diagnosis AND software program) with stricter distribution constraints.
TileBars is a classic case of visualization (cited 625 times according to Google Scholar) designed by a leading expert in visualization and search engine interfaces, Marti Hearst. DocuBurst and Gist icons are interactive radial visualizations, the latter being one of the references and main influences on the development of DocuBurst, as explained in the DocuBurst paper cited.
Search-result visualization approaches have not been widely implemented in information retrieval systems and most result outputs are one-dimensional lists of itemized texts (Nualart; Pérez-Montoro, 2013). The three cases reviewed here are each applied to large datasets and, starting with a search query, present an improved search output designed to help the user read and filter the results. All three are particularly concerned with distinguishing between similar items: TileBars searches PubMed (more than 20 million papers); DocuBurst uses the WordNet lexical database (155,287 words organized in 117,659 synsets for a total of 206,941 word-sense pairs) to classify the visualized text; and Gist icons uses, among others, the complete dataset of approximately 7 million USPTO patents and the Enron email dataset comprising 500,000 emails.
In the text collection category below, we present nine further search-result visualizations.
Time-dependent datasets
Cases: 6 and 10.
We present two cases in which the visualization approaches can be used to understand or follow the evolution of a text over time. A dynamic text visualization demonstrates that data visualization may be the only way to solve certain tasks and that it is not just one more method of pure data advocacy. For example, it is extremely challenging to show how a Wikipedia entry evolves over time in line with the editors’ participation (History flow, case 10) (figure 6). History flow provides a solution to this problem and sheds light on the complex collaborative process of Wikipedia.
In the second case (Favoured traces, case 6), an animated visualization demonstrates how Darwin’s ideas evolved through successive editions of the Origin of Species. In Ben Fry’s words: “The first English edition was approximately 150,000 words and the sixth is a much larger 190,000 words. In the changes are refinements and shifts in ideas – whether increasing the weight of a statement, adding details, or even a change in the idea itself.”
2.2.2. Text collections
We present text collections grouped as pure item visualizations, aggregation visualizations and other subcategories. The latter includes data as a landscape layer and search result visualizations. Each subsection adheres to the following structure: list of cases, description of the group and discussion.
a) Item visualizations
24) Literature (Note: this converts a single text into a collection). Novel views: Les misérables. Segment word clouds by Jeff Clark (2013)
25) Literature. Grimm’s fairy tale network by Jeff Clark (2013)
26) Twitter. Spot by Jeff Clark (2012)
27) Science. Word storm by Quim Castella and Charles Sutton (2012)
28) Literature. Topic networks in Proust. Topology by Elijah Meeks and Jeff Drouin (2011)
29) Wikipedia. Notabilia by D. Taraborelli, G. L. Ciampaglia and M. Stefaner (2010)
30) Media art. X by Y by Moritz Stefaner (2009)
31) Search engine. Search clock by Chris Harrison (2008)
32) Online media. Digg rings by Chris Harrison (2008)
33) Science. Royal Society Archive by Chris Harrison (2008)
34) Wikipedia. WikiViz: Visualizing Wikipedia by Chris Harrison (2007)
35) Visualization. Area by Jaume Nualart (2007)
36) Chromograms by M. Wattenberg, F.B. Viégas and K. Hollenbach (2004)
37) Search engines. KartOO/Ujiko by Laurent Baleydier and Nicholas Baleydier (2001)
38) Search engines. Touchgraph by TouchGraph, LLC. (2001)
39) Internet. HotSauce by Ramanathan V. Guha (1996)
Figure 6. (Case 10) History flow by Fernanda Viégas and Martin Wattenberg, researchers at IBM’s Visual Communication Lab (2003)
Description
- Number of cases: We identify 16 cases that can be categorized as item visualizations.
- Years: The cases were published over a 17-year period from 1996 to 2013.
- Authors: The most prolific authors in this category are Chris Harrison (cases 31, 32, 33 and 34) and Jeff Clark (cases 24, 25 and 26), followed by Moritz Stefaner with two cases (29 and 30).
- Disciplines and datasets: Interestingly, nine cases are datasets taken from the Internet: Wikipedia (cases 29, 34 and 36), search engines (cases 31, 37 and 38), Twitter (case 26), online media (case 32), web pages (case 39). Only three cases use literary texts (cases 24, 25 and 28). Finally, two cases visualize scientific papers (cases 27 and 33), one case uses media art datasets (case 30) and one represents non-specific collections (case 35).
Discussion
The main difference between single-text and text-collection visualizations lies in the nature of the text. In the case of the latter, most of the texts do not originate from literature and are accessible online. Yet, the nature of the text appears to be less important when the goal is the representation of the collection rather than of the text itself.
Item visualizations use methods that are independent of the nature of the items themselves. Once the text collections have been itemized, the dataset can be considered a general case of data visualization and not a pure case of text visualization. For this reason, in this category, the methods are generally well known and used in other fields of visualization. Thus, we find six network visualizations (cases 25, 28, 34, 37, 38 and 39), three timelines (cases 31, 32 and 33) and three cases that likewise use timelines but which also permit categorization-based groupings (cases 26, 30 and 35) (figure 7).
Finally, four cases are, we believe, quite specific to text visualization. Two are concerned with item comparison: Segment word clouds (case 24) and Word storm (case 27). Segment word clouds transforms a single text into a text collection. Specifically, it is used to represent the chapters of Les misérables as word cloud items, thus facilitating their comparison. It also uses colour to identify words as they acquire prominence in the text.
Word storm is a reinvention of word cloud, or more specifically a variation of Wordle (case 17) that allows word clouds to be compared. This is achieved by assigning a fixed position to each word. This simple idea makes it visually easy to compare word clouds while maintaining the usual word cloud features.
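The fixed-position idea can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: words are placed on an alphabetical grid rather than by the coordinated layout a real word-cloud tool would compute, and the names `shared_layout` and `cloud` are hypothetical.

```python
from collections import Counter


def shared_layout(documents, columns=3):
    """Give every word in the collection one grid cell, reused by every cloud."""
    vocabulary = sorted({w for doc in documents for w in doc.lower().split()})
    return {w: (i % columns, i // columns) for i, w in enumerate(vocabulary)}


def cloud(document, layout):
    """Return {word: (x, y, count)} for one document, using the shared layout."""
    counts = Counter(document.lower().split())
    return {w: (*layout[w], c) for w, c in counts.items()}


docs = ["data text data", "text map"]
layout = shared_layout(docs)
clouds = [cloud(d, layout) for d in docs]
print(clouds)
```

Because a word such as "text" occupies the same cell in every cloud, only its size changes between documents, which is what makes side-by-side comparison straightforward.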
Figure 7. (Case 30) X by Y by Moritz Stefaner (2009)
Figure 8. (Case 29) Notabilia. 100 longest Article for deletion [AfD] discussions on Wikipedia by Dario
Taraborelli, Giovanni-Luca Ciampaglia (data and analysis) and Moritz Stefaner (visualization) (2010)
To conclude, Notabilia (case 29) and Chromograms (case 36) are two highly original cases that deserve mention. The very specific design of Notabilia shows the evolution of “Article for deletion” discussions of Wikipedians (figure 8), discussions that are sometimes more like “flame wars” given the controversies that rage over the simple existence of certain definitions. Notabilia visualizes the evolution of the hundred longest discussions and their final outcomes. Moritz Stefaner’s visualization constitutes an interactive bushtree, the branches of which are highlighted when moused over. The shape of the branches informs the reader about the nature of the discussion: cyclical, straight or never-ending.
Chromograms is also based on Wikipedia data, providing an analysis of the comments of editors for each edition of a Wikipedia entry. Visually it produces colour-coded stripes that in a small space rapidly inform the reader about the edit history of Wikipedia entries.
b) Aggregation visualizations
40) Literature. Grimm’s fairy tale metrics by Jeff Clark (2013)
41) Topic models. Termite by J. Chuang, C.D. Manning and J. Heer (2012)
42) Wikipedia. Pediameter by Müller-Birn, Benedix and Hantke (2011)
43) Google suggestions. Web Seer by Fernanda Viégas & Martin Wattenberg (2009)
44) Google n-grams. Web trigrams: visualizing Google’s tri-gram data by Chris Harrison (2008)
45) Political speech. FeatureLens by A. Don, E. Zheleva, M. Gregory, S. Tarkan, L. Auvil, T. Clement, B. Shneiderman and C. Plaisant (2007)
46) Online news. Newsmap by Marcos Weskamp (2004)
47) Email conversation. Themail by Fernanda B. Viégas, Scott Golder, Judith Donath (2006)
48) Search engine. WebBook by S.K. Card, G.G. Robertson and W. York (1996)
49) Any texts. Dotplot applications by Jonathan Helfman (1994)
Description
- Number of cases: We identify 10 cases that can be categorized as aggregation visualizations.
- Years: The cases were published over a 19-year period from 1994 to 2013.
- Authors and datasets: Only Fernanda B. Viégas participated in more than one of the 10 cases in this category (cases 43 and 47); the rest participated in just one case each. The texts are very similar in nature to those in the item visualization category. Five cases are corpora that can be found online (Wikipedia, case 42; Google, cases 43 (figure 9) and 44; online news, case 46; search engine results, case 48). The standard unstructured texts include one from literature (Grimm’s fairy tale metrics, case 40), one from political speeches (FeatureLens, case 45) and one from a year’s worth of email conversations between two correspondents (Themail, case 47). Finally, there are two quite unique cases: Termite (case 41) and Dotplot (case 49). All the cases are discussed below.
Discussion
Aggregation visualizations is the category with the greatest variation in the methods employed. Thus, apart from visualizing text collections, the only thing the 10 cases assigned to this category have in common is that they do not represent specific items.
Given these circumstances, we comment on each case separately:
Grimm’s fairy tale metrics (case 40) provides a matrix (or table-like) visualization that allows rows to be sorted by clicking on columns. The columns provide a quantitative definition of 13 metrics related to the 62 stories making up Grimm’s fairy tales. It is a powerful tool for analysing, understanding and comparing the tales.
Termite (case 41) is a case that represents an intermediary dataset known as topic models. Topic models are a “cleverer” way of obtaining a bag-of-words from a text than applying a typical word-frequency statistical analysis. Termite does not visualize texts but it does compare parts of texts. As such, the tool can be used to compare topic models.
Figure 9. (Case 43) Web seer by Fernanda Viégas & Martin Wattenberg (2009)
Pediameter (case 42) is a specific interface that uses bar charts to show Wikipedia editions in real time. It is most remarkable for using a device known as an Arduino to detect editions and transcribe them to a physical indicator, merging digital and material worlds.
Web Seer (case 43) is another specific visualization method that shows the most popular search queries based on Google suggestions. The approach allows queries to be compared by representing the suggestions with trees and then connecting the matching branches. The simplicity of this case contrasts with its power of communication: rapid and user friendly.
Google’s tri-gram data (case 44) uses a similar visualization method to that used by Web Seer. It draws on the huge Google n-gram dataset and represents and compares three-word sequences (tri-grams).
FeatureLens (case 45) is an interactive, dashboard-style interface for comparing texts. The central representation uses a visualization of frequent concepts similar to that used by Texty (case 7) and TileBars (case 15). It allows text browsing and shows line graphs of frequent words found throughout a text.
Newsmap (case 46) uses treemap visualization to offer a new method for reading and monitoring the news in real time, employing online Google news feeds. It is totally customizable in terms of topic, country and publication time. The software, which is available free of charge online, can also be used for news searches.
Themail (case 47) is an experiment in which a highly specific interface was developed to follow and analyse the evolution of an email correspondence between two people over the course of one year. It visualizes the words that characterize each of the writers and their evolution over time.
When first developed in 1996, WebBook (case 48) (figure 10) was a somewhat surprising application, as it transformed search engine results into a multimedia (text and images, primarily) mash-up based on the metaphor of the book. The application was a pure text (web pages) collection visualization that presented the results as aggregations of text and images.
Finally, Dotplot (case 49) was an innovative visualization application with multiple uses, not unlike Arc diagrams (case 14). The main use of Dotplots is for text comparisons, including multi-language, text version and programming code comparisons.
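The principle behind a dotplot can be sketched as follows, assuming word-level tokens and exact string matching. The function names `dotplot` and `render` are hypothetical, and this is not Helfman’s implementation.

```python
def dotplot(tokens_a, tokens_b):
    """Boolean matrix: cell (i, j) is True when token i of A equals token j of B."""
    return [[a == b for b in tokens_b] for a in tokens_a]


def render(matrix):
    """Render the matrix as text, one row per line ('#' = match, '.' = no match)."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in matrix)


words = "to be or not to be".split()
print(render(dotplot(words, words)))
```

Comparing a text with itself yields a full main diagonal, with off-diagonal marks wherever a word repeats. Comparing two versions of a text (or two source files) shows shared passages as diagonal runs and edits as gaps.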
c) Other subcategories
Here we include landscape data layers, search-result visualizations and time-dependent datasets.
Landscape as an additional data layer
Cases: 40, 26, 28, 33, 47, 37, 38 and 49.
The typical concept of landscape data is a network visualization comprising two layers of data, as in Topic networks (case 28). In this specific case, the first layer is provided by the Marcel Proust texts represented as items and the second layer by a network of topic models of these texts. The positions of the nodes of both layers are optimised so that proximity indicates more strongly related nodes. This definition of landscape can also be found in the defunct search engine results provided by KartOO/Ujiko (case 37) and TouchGraph (case 38).
All the other cases included in this category present text collections in combination with more data. This is the case of Dotplot, which represents the coincidence or otherwise of strings in various texts, and of Grimm’s fairy tale metrics, which combines a list of texts in rows with various parameters listed in columns. These parameters do not form a direct part of the text, but rather they are recalculated features related to the text, including, for example, length, lexical diversity and the presence of different groups of words that represent entities (for example: body -> hand, head, heart, eyes and foot) in each tale.
A third kind of landscape is based on the representation of timed metadata, as exemplified by Spot (case 26), the Royal Society Archive (case 33) and Themail (case 47).
A common feature of landscape visualizations is their capacity to compare a collection of texts simultaneously with a second parameter. Their main limitation is the number of items that can be represented, since large numbers create problems of overlapping items.
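Features of this kind are simple to derive from the text. The sketch below computes three of them for a single tale. It assumes that lexical diversity means the type-token ratio (distinct words divided by total words); this is a common definition, but the source does not say which measure Grimm’s fairy tale metrics uses. The function name and tokenizer are hypothetical.

```python
import re

# Word group taken from the example in the text (body -> hand, head, heart, eyes, foot).
BODY_WORDS = {"hand", "head", "heart", "eyes", "foot"}


def tale_metrics(text, group=BODY_WORDS):
    """Compute length, type-token ratio and word-group hits for one text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    length = len(tokens)
    diversity = len(set(tokens)) / length if length else 0.0
    group_hits = sum(1 for t in tokens if t in group)
    return {"length": length, "lexical_diversity": diversity, "group_hits": group_hits}


metrics = tale_metrics("The wolf raised his hand and then his head")
print(metrics)
```

Applying the function to every tale gives one row per text and one column per metric, which is the table-like layout described above.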
Search result visualizations
Cases: 26, 43, 35, 45, 47, 46, 37, 38 and 48.
Compared to single-text visualizations, text-collection visualizations include considerably more cases offering search capacities (three vs. nine). Common sense suggests that when presenting a text collection, a natural feature of such an approach will be a way of selecting part of that collection based on given criteria, i.e., filter and search features.
Figure 10. (Case 48) WebBook by Stuart K. Card, George G. Robertson, and William York
(1996)
All the cases included in this category allow search queries and output a unique visualization for each query. All the cases include a search box and a search button.
Time-dependent datasets
Cases: 42, 29, 36 and 46.
The four cases included in this category allow the user to monitor the evolution of the texts in the collection over time. Only one is designed for use in real time (Newsmap, case 46), but potentially all of them can visualize the collection on a specific date and at a specific time.
One obstacle faced by an approach that represents changes in text collections over time is providing access to an updated feed or an accessible API. It is presumably for this reason that three of the four use Wikipedia data and the other uses Google news. In all cases, they are online sources that have long allowed public access to their feeds.
3. Conclusions
The diversity of approaches developed in different disciplines, the wide diffusion of publications or, on occasions, the absence of formal publications of innovative ideas, represent a considerable challenge to the undertaking of a comprehensive survey of the work completed in this field. Thus, some of the visualizations we present here have been unearthed in highly specific publications, the case for example of Joel Deshaye and Peter Stoicheff and their work on representing Faulkner (cases 11, 12 and 13). If we read Stoicheff’s working notes it is apparent that their visualizations were developed to facilitate the study of William Faulkner’s narrative timelines. There are no additional references to the application of these interesting ideas to other texts, suggesting that more works remain hidden in the depths of other fields.
Text visualization, as we have argued throughout this review, may be considered a subfield of data visualization. Yet, the boundaries of the discipline are not always clearly defined. This is readily illustrated, for example, by the case of Harrison’s Search clock (case 31), in which the text corpora comprise an enormous dataset of search engine queries. Can this dataset really be considered a collection of texts when each of them, in most instances, is no more than one or two words in length? Does a text have to satisfy a minimum length in order to be considered a text? Here, we opted to treat case 31 as a collection of texts, short ones admittedly but, ultimately, texts.
Clearly, the critical decision to be made throughout this review has been how to classify the cases identified. As few papers have attempted to review only text visualization approaches, we turned to classic data visualization reviews (e.g., Shneiderman, 1996) as well as to more recent ones (e.g., Collins et al., 2009). In all these instances, the classifications were based on tasks that the visualization approach can solve rather than on the explicit aspects of the visualizations themselves. For this reason we chose to propose our own classification, which, while far from perfect, we hope will be useful for undertaking a classification based on visual features.
We conclude with a list of insights, as well as shortcomings, that we have identified to date:
- Single-text visualizations have been applied mainly to literature, a field that, apart from being characterized by complex combinations of words, can present high levels of human abstraction and freedom of structure and experimentation. As such it might prove more effective to apply visualization techniques to texts that have a more formal register and/or predefined outline and a well-defined vocabulary, such as legal texts, scientific papers, template-based texts and communications, etc.
- We have identified only one single/partial-text visualization that is sequential (Document arc diagrams, case 22). Most partial-text visualizations extract the essence of the text based on one or more criteria and so the original sequence of the text is lost. Since sequential visualization approaches present certain advantages, it seems that partial-visualization approaches that maintain the original text sequence should be encouraged.
- Text-collection visualizations tend to employ methods that are used for data visualization in general. Hence, there is a need for further experimentation in applying more standard data visualization methods and approaches to the specific subfield of text visualization.
- Text collection aggregations is the category in which the most specific designs and ideas have been developed. More work needs to be undertaken to identify any common approaches in this kind of visualization.
And, finally, we pose the following question:
- Why is it that most of the cases reviewed here that are more than five years old are no longer available online? If the software used is no longer (or was never) in use, we should perhaps question its effectiveness. While we have not investigated just how many cases form part of commercial software products and how many, following publication, have simply been forgotten, the question remains as to why some apparently magnificent ideas did not establish themselves as new standards. Our challenge to researchers is to produce applications that will be adopted in one field or another, or which can solve a problem for a certain group of users; indeed, as the cases reviewed here highlight, adoption seems to represent a considerable challenge.
Acknowledgement
This work is part of the project “Active audiences and journalism. Interactivity, web integration and findability of journalistic information”. CSO2012-39518-C04-02. National plan for R+D+i, Spanish Ministry of Economy and Competitiveness.
4. References
Anglin, Gary J.; Vaez, Hossein; Cunningham, Kathryn L. (2004). “Visual representations and learning: The role of static and animated graphics”. Handbook of research on educational communications and technology, 2, pp. 865-916.
Baeza-Yates, Ricardo; Ribeiro-Neto, Berthier et al. (1999). Modern information retrieval. New York: ACM press, vol. 463.
Baeza-Yates, Ricardo; Broder, Andrei Z.; Maarek, Yoelle (2011). “The new frontier of web search technology: Seven challenges”. Search computing, v. 6585 of Lecture notes in computer science, pp. 3-9.
http://dx.doi.org/10.1007/978-3-642-19668-3_1
Benavides, David; Segura, Sergio; Ruiz-Cortés, Antonio (2010). “Automated analysis of feature models 20 years later: A literature review”. Information systems, v. 35, n. 6, pp. 615-636.
http://dx.doi.org/10.1016/j.is.2010.01.001
Collins, Christopher; Carpendale, Sheelagh; Penn, Gerald (2009). “DocuBurst: Visualizing document content using language structure”. Computer graphics forum (Procs. of the Eurographics/IEEE-VGTC Symposium on visualization, EuroVis), v. 28, n. 3, pp. 1039-1046.
http://dx.doi.org/10.1111/j.1467-8659.2009.01439.x
Feldman, Ronen; Sanger, James (2006). The text mining handbook: advanced approaches in analyzing unstructured data. Cambridge University Press. ISBN: 13 978 0 521 83657 9
Grobelnik, Marko; Mladenić, Dunja (2002). “Efficient visualization of large text corpora”. In: Procs of the 7th seminar. Dubrovnik, Croatia.
http://ailab.ijs.si/dunja/SiKDD2002/papers/GrobelnikSep02.pdf
Hearst, Marti A. (2003). What is text mining?
http://people.ischool.berkeley.edu/~hearst/text-mining.html
Hearst, Marti A. (2009). “Search user interfaces”, Chapter 1. ISBN: 9780521113793
http://searchuserinterfaces.com/book
http://searchuserinterfaces.com/book/sui_ch1_design.html
Hearst, Marti A. (2011). “Natural search user interfaces”. Communications of the ACM, v. 54, n. 11, November, pp. 60-67.
http://cacm.acm.org/magazines/2011/11/138216-natural-search-user-interfaces/fulltext
http://dx.doi.org/10.1145/2018396.2018414
Heer, Jeff (2010). “A conversation with Jeff Heer, Martin
Waenberg, and Fernanda Viégas”. Queue, v. 8, n. 3, 10 pp.,
March.
hp://doi.acm.org/10.1145/1737923.1744741
Iliinsky, Noah (2013). Choosing visual properes for suc-
cessful visualizaons. IBM Soware. Business Analycs.
http://public.dhe.ibm.com/common/ssi/ecm/en/
ytw03323usen/YTW03323USEN.PDF
Kitchenham, Barbara (2004). Procedures for performing
systemac reviews. Keele, UK, Keele University, 33 pp.
Levie, W. Howard; Lentz, Richard (1982). “Eects of text
illustraons: A review of research”. ECTJ, v. 30, n. 4, pp.
195–232.
Mann, Thomas M. (2002). Visualizaon of search results
from the world wide web.
hp://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.2535
Meeks, Elijah (2011). Digital humanies specialist. Docu-
ments.
https://dhs.stanford.edu/comprehending-the-digital-
humanies/documents
Nualart-Vilaplana, Jaume (2013). How we draw texts: a
visualizaon of text visualizaon tools.
hp://research.nualart.cat/textvistools
Nualart, Jaume; Pérez-Montoro, Mario (2013). “Texty, a vis-
ualizaon tool to aid selecon of texts from search outputs”.
Informaon research, v. 18, n. 2, June.
hp://www.informaonr.net/ir/18-2/paper581.html
Shneiderman, Ben (1996). “The eyes have it: A task by data
type taxonomy for informaon visualizaons”. In: Visual
Languages. Proceedings IEEE Symposium, pp. 336–343.
hp://dx.doi.org/10.1109/VL.1996.545307
Šilić, Artur; Dalbelo-Bašić, Bojana (2010). “Visualizaon of
text streams: A survey”. Knowledge-based and intelligent in-
formaon and engineering systems, v. 6277 of Lecture notes
in computer science, pp. 31–43. Berlin, Heidelberg: Springer.
hp://dx.doi.org/10.1007/978-3-642-15390-7_4
Stefaner, Moritz (2013). Gender balance visualizaon.
hp://moritz.stefaner.eu/projects/gender-balance/#NUM/
NUM
Strecker, Jacqueline (2012). Data visualizaon in review:
summary. Internaonal Development Research Centre
(IDRC), Oawa, ON, Canada.
http://idl-bnc.idrc.ca/dspace/bitstream/10625/49286/1/
IDL-49286.pdf
Times Higher Educaon. World university rankings 2012-
2013.
hp://www.meshighereducaon.co.uk/world-university-
rankings/2012-13/world-ranking
Tue, Edward R.; Graves-Morris, P. R. (1983). The visual dis-
play of quantave informaon, v. 2. Cheshire, CT: Graphics
Press, 199 pp.