How we draw texts: A review of approaches to text visualization and exploration

El profesional de la información, 2014, mayo-junio, v. 23, n. 3. ISSN: 1386-6710
Article received on 19-01-2014
Approved on 09-03-2014
Jaume Nualart-Vilaplana, Mario Pérez-Montoro and Mitchell Whitelaw
Jaume Nualart-Vilaplana is a PhD candidate in the Faculty of Arts and Design, University of Canberra (Australia), research engineer at Nicta (Australia), and a PhD candidate in the Faculty of Information Science, University of Barcelona. He holds a MAS and an MSc (Licenciatura) from the Autonomous University of Barcelona.
http://orcid.org/0000-0003-4954-5303
Machine Learning Research Group at NICTA, Canberra Research Laboratory
Tower A, 7 London Circuit, Canberra City ACT 2601, Canberra, Australia
jaume.nualart@canberra.edu.au
Mario Pérez-Montoro holds a PhD in Philosophy and Education from the University of Barcelona and a Master in Information Management and Systems from the Polytechnic University of Catalonia. He studied at the Istituto di Discipline della Comunicazione at the Università di Bologna (Italy) and has been a visiting scholar at the Center for the Study of Language and Information (CSLI) at Stanford University (California, USA) and at the School of Information at UC Berkeley (California, USA). He is a professor in the Department of Information Science at the University of Barcelona. His work has focused on information architecture and visualization. He is author of the book Arquitectura de la información en entornos web (Trea, 2010).
http://orcid.org/0000-0003-2426-8119
Facultat de Biblioteconomia i Documentació, Universitat de Barcelona
Melcior de Palau, 140. 08014 Barcelona, Spain
perez-montoro@ub.edu
Mitchell Whitelaw is an academic, writer and practitioner with interests in new media art and culture, especially generative systems and data-aesthetics. His work has appeared in journals including Leonardo, Digital creativity, Fibreculture, and Senses and society. In 2004 his work on a-life art was published in the book Metacreation: art and artificial life (MIT Press, 2004). His current work spans generative art and design, digital materiality, and data visualisation. He is currently an associate professor in the Faculty of Arts and Design at the University of Canberra, where he leads the Master of Digital Design. He blogs at The Teeming Void.
http://orcid.org/0000-0001-9013-9732
Faculty of Arts and Design, University of Canberra
Bldg, Floor & Room: 9, C12. ACT 2617, Canberra, Australia
mitchell.whitelaw@canberra.edu.au
Abstract
This paper presents a review of approaches to text visualization and exploration. Text visualization and exploration, we argue, constitute a subfield of data visualization, and are fuelled by the advances being made in text analysis research and by the growing amount of accessible data in text format. We propose an original classification for a total of 49 cases based on the visual features of the approaches adopted, identified using an inductive process of analysis. We group the cases (published between 1994 and 2013) in two categories: single-text visualizations and text-collection visualizations, both of which can be explored and compared online.
Keywords
Review, Text visualization, Data visualization, Data exploration, Data display, Information visualization, Text analysis.
Spanish title: Cómo dibujamos textos. Revisión de propuestas de visualización y exploración textual
Note: This article can be read in Spanish translation at:
http://www.elprofesionaldelainformacion.com/contenidos/2014/may/02_esp.pdf
Nualart-Vilaplana, Jaume; Pérez-Montoro, Mario; Whitelaw, Mitchell (2014). “How we draw texts: a review of approaches to text visualization and exploration”. El profesional de la información, mayo-junio, v. 23, n. 3, pp. 221-235.
http://dx.doi.org/10.3145/epi.2014.may.02
1. Introduction
The aim of this review is to propose a classification of text visualization and exploration tools, while describing the broader context in which they operate. To do so, we list, classify and discuss the most important contributions made in the field of text visualization and exploration between 1994 and 2013. This field is undergoing rapid growth –fuelled by open data initiatives and web scraping– and has become highly diversified, developing in parallel in a range of disciplines. Some of the most important visualization methods invented between 1765 and 1999 were the timeline, bar chart, pie chart, flow map, Venn diagram, histogram, Gantt chart, flowchart, tag cloud, social networks, boxplot, star plot, treemap, heatmap, and sparkline. Figure 1 presents a word cloud (using Wordle) of the professions practiced by their respective inventors. Given this diversity, our search for cases has been conducted in many different contexts and has involved the examination of many different sources, ranging from the sciences to the humanities, from academic journals to blog sites, from universities to freelance studios, and from open data institutions to open data communities. Clearly this proliferation of disciplines has meant the adoption of a variety of different philosophies and points of view.
This review aims to help those that work with data, and especially with texts (but by no means limited to academics), to use visualization techniques that can identify patterns or behaviours present in the textual reality. Moreover, these techniques can help users improve –in terms of both the speed and the clarity of the process– the way in which they visualize and discover the facts that lie within the data.
Drawing a clear conceptual line between approaches to text visualization and exploration is no straightforward task, but here we have opted to review cases dedicated to both processes, be they described separately or together. Note that on occasions, for the sake of simplicity, we use the term text visualization in reference to both approaches.
The two types of text visualization considered here are:
1) Single-text representation, that is, ways of extracting meaning from texts based on writing style, document structure and language register as opposed to pure statistics. Our interest lies in representing the meaning and salient features of texts because their convenient visualization can speed up and/or improve our ability to select texts and manage the time required to tackle them. The research output of fields such as natural language processing, linguistic computing and machine learning provides techniques for producing high quality data representing complex texts. It is our belief that by combining these techniques with a suitable text visualization method we can improve the way in which we examine and understand texts.
2) Representation and exploration of collections of texts. Exploring and selecting individual texts and navigating and analyzing collections of texts are daily tasks for many of those who work with computers and datasets, and there is clearly plenty of room for new ideas and tools to facilitate their work. Information retrieval is a critical factor in an environment characterized by an excess of information (Baeza-Yates et al., 1999). When a user conducts a search, the information retrieval systems normally respond with a list of results. More often than not, the presentation of these results plays an important role in satisfying the user's information needs, so a poor or inadequate presentation can thwart the user (Baeza-Yates et al., 2011). Typically, information retrieval systems present the results of a query in a flat, one-dimensional list. Such lists tend to be opaque in terms of the order they give to the information, i.e., the users are unaware as to why the list is presented in a particular order. To refine their search, users have to interact again, normally by filtering the first output of results. It is our belief that new techniques for representing collections of texts –including search results– can help improve navigation, exploration and retrieval.
Figure 1. Word cloud of the professions practiced by inventors of visualization methods
As we show below, data visualization can today be considered a consolidated academic field (Strecker; IDRC, 2012). Thus:
- Seven of the top 10 universities according to the Times Higher Education ranking (2012) have departments or research groups working in the field of data visualization. The discipline is incorporated in a wide variety of departments, ranging from computer science and statistics to linguistics and graphical design, and from chemistry and physics to genetics and history. Recently, data visualization has emerged as a distinct field, with specific departments dedicated to its study and master’s programs being taught in the subject (table 1).
- Over the last five years a number of conferences have been dedicated primarily to data visualization. These are listed in table 2.
- A number of journals are now specifically dedicated to studies in data visualization, and important contributions can be found also in conference proceedings (table 3).
Finally, a number of leading websites –including Infosthetics, Visualcomplexity and Visualizingdata.com– play a key role in the dissemination of the subject.
| Institution | Rank in 2012 | Department/Course | URL |
|---|---|---|---|
| Harvard University | 1 | Broad Institute of Harvard and MIT | http://www.broadinstitute.org/vis |
| Massachusetts Institute of Technology | 2 | Broad Institute of Harvard and MIT | http://www.broadinstitute.org/vis |
| University of Cambridge | 3 | -- | -- |
| Stanford University | 4 | Stanford Vis Group | http://vis.stanford.edu |
| University of California, Berkeley | 5 | VisualizationLab | http://vis.berkeley.edu |
| University of Oxford | 6 | Visual Informatics Lab at Oxford | http://oxvii.wordpress.com |
| Princeton University | 7 | PrincetonVisLab | http://www.princeton.edu/researchcomputing/vis-lab |
| University of Tokyo | 8 | -- | -- |
| University of California, Los Angeles | 9 | IDRE GIS and visualization | https://idre.ucla.edu/visualization |
| Yale University | 10 | -- | -- |
Table 1. Leading universities and their data visualization departments

| Conference | Location | Topic | No. participants | URL |
|---|---|---|---|---|
| Nicar 2013 | USA | Data journalism | 149 | http://ire.org/conferences/nicar-2013 |
| Dd4d 2009 | France | Information visualization | 52 | http://www.dd4d.net |
| FutureEverything 2013 | UK | Technology/society/art | 52 | http://futureeverything.org |
| Resonate 2013 | UK | Creative code | 44 | http://www.thisisresonate.co.uk/resonate-13 |
| Graphical web 2012 | Switzerland | Open web/datavis | 38 | http://www.graphicalweb.org/2012 |
| IeeeVis - VisWeek 2012 | USA | Information visualization | - | http://ieeevis.org |
| EuroVis 2013 | Germany | Computational aesthetics | - | http://www.eurovis2013.de |
| Siggraph 2013 | USA | Computer graphics and interactive techniques | - | http://s2013.siggraph.org |
| OzViz 2012 | Australia & NZ | Workshops for visualisation practitioners, academics and researchers | - | http://www.ozviz2012.org |
Table 2. Conferences dedicated primarily to data visualization ordered by number of participants (Stefaner, 2013)

| Name | URL |
|---|---|
| Parsons journal for information mapping | http://pjim.newschool.edu/issues/index.php |
| Journal of visualization | http://springer.com/materials/mechanics/journal/12650 |
| Ieee Transactions on visualization and computer graphics (TVCG) | http://www.computer.org/portal/web/tvcg |
| Information visualization | http://ivi.sagepub.com |
| International journal of image processing and data visualization (Ijipdv) | http://iartc.net/index.php/Visualization |
| IEEE Vis (former Visweek) | http://ieeevis.org |
| EuroVis | http://www.eurovis2013.de |
| ACM CHI | http://chi2013.acm.org |
| EG CGF | http://www.eg.org |
| IVS | http://www.graphicslink.co.uk/IV2013 |
Table 3. Main journals dedicated to data visualization

1.1. Text visualization
Shneiderman (1996) classifies regular texts as one-dimensional data, that is, data organized in a sequential manner, running right-to-left (or left-to-right), line-by-line, top-to-bottom. Yet, a text can have multiple internal structures, a morphology made up of paragraphs, sentences and words. Depending on its information structure, a text may be ordered by chapters, parts, sections, subsections, etc. If a text is given in a specific format, such as html, then it may be organized into bodies, divs, paragraphs, etc. In these examples the text includes tree structures as well as a one-dimensional structure. Additionally, texts may have a subjective component and an abstract structure that is not readily analysed by a computer. All in all, these data types and structures constitute the specificities of a text.
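To make concrete the idea that a text carries tree structures on top of its one-dimensional sequence, the following is a minimal Python sketch. It is an illustration only and not a method taken from the cases reviewed: the function name text_to_tree, the blank-line rule for paragraphs and the punctuation rule for sentences are all assumptions.

```python
import re

def text_to_tree(text):
    """Split a plain text into a nested list: paragraphs > sentences > words."""
    tree = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        sentences = []
        # Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            words = re.findall(r"\w+", sentence)
            if words:
                sentences.append(words)
        if sentences:
            tree.append(sentences)
    return tree

sample = "Texts are linear. Yet they have structure.\n\nA second paragraph follows."
tree = text_to_tree(sample)

# Flattening the tree recovers the one-dimensional word sequence.
flat = [word for paragraph in tree for sentence in paragraph for word in sentence]
```

Flattening the nested list returns the words in their original reading order, which is the sense in which the same text is both a sequence and a tree.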
The amount of data to which we have access grows on a daily basis. Most of these data are in text format, as Fernanda Viégas and Martin Wattenberg argue in an interview with Jeff Heer: “One of the things I think is really promising is visualizing text. That has been mostly ignored so far in terms of information visualization approaches, and yet a lot of the richest information we have is in text format” (Heer, 2010).
Data analysis defines the boundaries of data visualization, i.e., it provides the fine line between multiple truths and lies. In the case of text visualization, this role has been taken on by text analysis: in the main, via computational linguistics, natural language processing, machine learning and statistics. The advances made in text analysis at a whole range of levels have provided computers with text understanding, enabling them to transform text, the so-called unstructured data (see next subsection “Text analysis”).
There is some discussion as to whether text visualization might be considered a specific subfield of data visualization. Some authors tend to disagree: Illinski (2013) claims that text cannot be considered a data type; Šilić (2010) argues that “unstructured text is not suitable for visualization”. Yet, as discussed above, most text visualizations transform the initial “unstructured” textual data into a reduced, structured dataset. This new dataset is no longer one-dimensional, but rather it constitutes a categorical or a network dataset and it can be represented with a wide range of tools that are not specific to text representation (Hearst, 2009; Grobelnik; Mladenić, 2002).
As we show in the cases we review here, most text visualizations do not represent raw data: that is, the text as it is. Rather what they do is transform the text into smaller chunks of data, normally extracting a representative part of that text. This process is one of data transformation and it occurs, for example, when a text is reduced to a list of words based on their frequency of appearance. In that case, the method chosen to represent the data will belong to a family of methods best suited to the data type. In this review we consider the most frequently employed strategies to represent single texts or collections of texts, paying special attention to strategies for representing textual data as it is, as a regular text, with all its complexities, irregularities and rich abstractions.
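The reduction of a text to a frequency-ranked list of words can be sketched in a few lines of Python. This is a hedged illustration rather than the procedure of any reviewed tool: the function name word_frequencies, the tokenization rule and the tiny STOP_WORDS set are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "it"}

def word_frequencies(text, top_n=5):
    """Reduce a text to its top_n most frequent words, ignoring stop words."""
    # Assumption: a word is a run of lowercase letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

sample = "The whale is the whale, and the sea is the sea of the whale."
top = word_frequencies(sample, top_n=2)
# top is a small categorical dataset: [("whale", 3), ("sea", 2)]
```

The output is a categorical dataset (word, count) that no longer preserves reading order, so it can be drawn with generic methods such as a bar chart or a word cloud.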
Text analysis is a key field for text visualization. Below, we present a brief commentary on this matter and its relationship with text visualization.
1.2. Text analysis
Text analysis, roughly synonymous with text mining (Feldman; Sanger, 2006), is an interdisciplinary field that includes information retrieval, data mining, machine learning, statistics, linguistics and natural language processing. According to Marti Hearst (2003), the goal of text mining is to discover “heretofore unknown information, something that no one yet knows and so could not have yet written down”. Text mining is a subfield of data mining whose typical applications include the analysis or comparison of literary texts, the analysis of biological and genomic data sequences and, more recently, the identification of consumer behaviour patterns or the detection of the fraudulent use of credit cards. Hearst differentiates these applications from information extraction operations, such as the extraction of people’s names, addresses or job skills. This latter task can be done with >80% accuracy, but the former, the full interpretation of natural language by a computer program, looks like it will not be possible for “a very long time” (Hearst, 2003).
To study text visualization and exploration it is important to examine the literature dedicated to both data visualization and text analysis, given the significant interrelationships that exist. Thus, while the text analysis output may limit the possibilities of visual presentation and interaction with the text, there is strong empirical evidence indicating that people learn better with a combination of text and illustration (visualization) than with text alone (Anglin et al., 2004; Levie; Lentz, 1982).
2. Review
In this section we propose a possible classification based on the visual features that characterize the approaches to textual visualization and exploration, as identified in 49 cases.
The cases were collected in a two-part process: first, a traditional literature search and review (including practical examples and visualisation studies); second, the selection of a subset of these, based on a preliminary analysis of their features. The aim was to select cases that provided a representative overview of the range of work in the field.
The classification of the cases is the product of empirical observation following an inductive analysis. The classification is followed by an analysis of these cases.
Alternative methodologies for selecting and categorizing primary sources exist, such as those of Kitchenham (2004) and Benavides; Segura; Ruiz-Cortés (2010).
2.1. Classification of approaches
The basic classification of text visualization approaches comprises two categories according to the type of data to which they are applied:
1) Textual documents: that is, representations of single texts, where text is understood as a sequence of words ordered according to the hierarchy: document > paragraphs > sentences > other punctuation marks > words > syllables and phonemes or morphemes. Where a text is a book or another kind of structure, then, it may have more granularities, including: chapters > sections > sub sections > etc. We also include the metadata of the text and other attached texts, i.e., title, author(s), publisher, copyright notes, acknowledgement, dedication, preface, table of contents, foreword, glossary, bibliography, index, etc.
2) Text collections: that is, a group of texts in which each item constitutes a clearly differentiable entity. Typically when speaking of collections of texts, we speak of texts that have elements in common, be it their register, length or structure. All the cases we review here are collections of the same text type. Heterogeneous collections of texts are also referenced in the literature (Meeks, 2011), especially in representative analyses of a field of knowledge, where the aim is to include the greatest possible variety of expressions and vocabulary. In such cases the dataset can be said to be heterogeneous in terms of its structure and register.
To these two data types, we then add several subjective subdivisions to each category according to the visual features used to represent the textual features. The aim here is to be able to describe and explain the cases under review, as well as to identify the key features of the text visualization approaches.
Single texts
- Whole <-> Part
- Sequential <-> Non sequential
- Discourse structure <-> Syntactic structure
- Search
- Time
Text collections
- Items <-> Aggregations
- Landscape
- Search
- Time
2.1.1. Single texts
In the specific instance of single texts, we classify the cases according to the part of the text that is represented, whether the approach follows the same sequence as that of the text, and the text structure employed in each case.
Whole or part?
In some instances, one part of the text is considered the essence of the text and is used in the visualization process rather than the whole text. Yet, there are processes that use the whole text, at least implicitly. Examples include:
- chapters of a book but not the whole text.
- representation of all the sentences of the text as coloured lines.
- verbs of a text, providing an impression of the style of the text.
- characters of a novel and their appearance within the text.
- places or dates present in the text.
- etc.
The cases in which the whole text is explicitly represented are, for obvious reasons, cases involving relatively short texts, e.g., song lyrics, speeches, poems, etc.
In some instances, such as when using Radial word connections (see case 1 below), only certain words from the text are represented; yet, we classify this case as a whole-text representation because the whole novel, chapter by chapter, is implicitly represented in the circle.
In those instances in which the whole text is represented (even implicitly) as one central element in the visualization, we classify it as being a whole-text visualization.
Does the visualization follow the same sequence as that of the text?
If the visualization follows the same sequence, or order, as that of the text, then the case is considered sequential; if not, then it is considered non-sequential. For example, a typical case that does not follow the same sequence as that of the original text would be a word cloud (see figure 1).
Does the visualization use elements from discourse structure or from syntactic structure?
A text may present one of two kinds of structure that we consider useful for our research. One is so-called discourse structure. Depending on the nature of the text, the discourse structure can be completely subjective to the author’s point of view –as in literature–, or restricted to a given structure –as in legal and scientific texts. In linguistics, discourse is a broad concept, but here we use it to refer to the parts of a text and the outline of a document: parts, chapters, sections, subsections, etc. The discourse structure is widely used when visualizing texts because it is a relatively straightforward way to represent the text sequence.
The second structure is the text’s syntactic structure, which refers to the organization of the text into sentences, phrases and word classes, including verbs and nouns. This is an objective structure and is dependent on the rules of linguistics. In text visualizations, the elements comprising this structure, such as sentences, are very common.
2.1.2. Text collections
In the specific instance of text collections we classify the cases as items or aggregations, and as pure data or data landscapes. Thus we determine whether the items making up the collection can be differentiated or are represented as aggregations. The specific questions we address are:
How is each item in the collection graphically represented?
Is each text represented as a graphical entity, i.e., as a point, a word or short sentence? Can the items in the visualization be counted, i.e., are they visually differentiated?
There are cases in which each item is not represented by a graphically distinct entity, but rather, for example, as a coloured block. Alternatively, the items are accumulated and shown as frequency distributions. When the items of the collection are not graphically distinct (visually countable) then we speak in terms of the visualization of an aggregation rather than that of an item.
Pure data or data and landscape?
Are the items of the collection accompanied by any graphical content? Is another dataset, apart from that emanating from the text, also being represented? Some cases present the items embedded in a graphical environment, such as a map. This context might be an actual geographical map, a metaphor, or, for example, a conceptual landscape composed of words that form a second layer complementing that of the data collection, in which every distance plays a role: item-item (similarity between documents), word-item (importance of a word in a document), word-word (similarity between words in the collection).
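The three distances named above can be illustrated with a small Python sketch. The measures used here are assumptions chosen for simplicity (cosine similarity over word counts, and relative frequency as word importance); the reviewed landscapes may use other measures, and the toy collection docs is hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy collection: document id -> text.
docs = {
    "d1": "whale sea ship",
    "d2": "whale sea storm",
    "d3": "garden rose bee",
}
# One word-count vector per document (item).
doc_vecs = {d: Counter(t.split()) for d, t in docs.items()}

# item-item: similarity between documents.
item_item = cosine(doc_vecs["d1"], doc_vecs["d2"])

# word-item: importance of a word in a document (assumed: relative frequency).
def word_item(word, doc_id):
    vec = doc_vecs[doc_id]
    return vec[word] / sum(vec.values())

# word-word: similarity between words, comparing the documents they occur in.
def word_vec(word):
    return {d: vec[word] for d, vec in doc_vecs.items() if vec[word]}

word_word = cosine(word_vec("whale"), word_vec("sea"))
```

In a landscape layout, values like these would be converted to screen distances, so that similar documents and words sit close together.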
Scales and axes are not considered as landscapes, nor are the elements of the interface in which the representation is embedded. The landscape layer, if not considered as part of the main dataset, would substantially reduce Tufte’s data-ink ratio, the proportion of a graphic’s ink that presents data (Tufte; Graves-Morris, 1983), compared to the ratio of a pure data representation.
2.1.3. Both single texts and text collections
Properties that are equally applicable to single-text and text-collection visualizations include time, search results and dataset size.
Does time play a role?
Do the texts change over time? One set of visualization approaches highlights the changes undergone by a dataset over time. The most common approaches of this kind have been developed in computer science to represent code evolutions or in Wikipedia to indicate various aspects of article revisions.
This category also includes visualizations in which the dataset itself changes over time; for example, the visualization of the latest news will see the dataset grow over time.
Does the visualization result from a search query?
The visualization of the output of information retrieval systems is a well-defined kind of visualization characterized by the changing number of represented items depending on the number of search results obtained. This is a growing visualization subfield related to the disciplines of information systems and information retrieval (Mann, 2002; Hearst, 2009).
Validity for small or large datasets
It is rare that a visualization tool is independent of the size of the dataset that is to be represented. Here, in those cases in which the tool has been clearly designed for a specific dataset size, the reader will be given the corresponding explanation.
2.2. Analysis of visualization approaches
We review a total of 49 cases applying the classification outlined above. In an attempt to incorporate the most crucial aspects of text visualization, our review concentrates on the specific ideas underpinning the text visualization, rather than the dataset and the contexts of each case.
Sixteen fields have been collected for each case: name, short name, author(s), year of publication, URL for further information, original dataset, discipline related to the work, description of the visualization method, description of the case, screen shot, thumbnail, classification (single or collection), classification (single-whole, single-part, collection-items, collection-aggregations), classification (time), classification (search), classification (dataset small, dataset large, N/A).
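One way to picture the sixteen fields is as a single record per case. The Python sketch below is an assumption about how such a record could be laid out, not the schema actually used by the authors: the class name Case, the field names and the types are hypothetical, and the example leaves empty any value not stated in this review.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class Case:
    """Hypothetical record holding the sixteen fields collected per case."""
    name: str
    short_name: str
    authors: List[str]
    year: int
    url: str
    dataset: str
    discipline: str
    method_description: str
    case_description: str
    screenshot: str
    thumbnail: str
    kind: str                      # "single" or "collection"
    subtype: str                   # e.g. "single-whole", "collection-items"
    uses_time: Optional[bool]      # classification (time)
    from_search: Optional[bool]    # classification (search)
    dataset_size: Optional[str]    # "small", "large" or None for N/A

# Example built from the case list; empty strings and None are placeholders
# for values not given in the text, and short_name is invented here.
example = Case(
    name="TileBars", short_name="tilebars", authors=["Marti A. Hearst"],
    year=1995, url="", dataset="", discipline="Health",
    method_description="", case_description="", screenshot="", thumbnail="",
    kind="single", subtype="single-whole",
    uses_time=None, from_search=None, dataset_size=None,
)
```

Keeping the four classification fields separate from the descriptive ones makes it straightforward to filter or group the cases by category.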
The cases are grouped into two sections and four subsections:
- Single-text visualizations (23 cases)
  - Whole-text visualizations (15 cases)
  - Partial-text visualizations (8 cases)
- Text collection visualizations (26 cases)
  - Collection of items (16 cases)
  - Collection of aggregations (10 cases)
For each subsection the cases are sorted by year of publication (descending). To assist the reader, the collection of all reviewed cases can be viewed using the visualization and exploration software (also included in the review) known as AREA (Nualart, 2013).
Figure 2. The 49 reviewed cases visualized with the Area software (screen shot).
2.2.1. Single-text visualization
We present single texts grouped as whole-text visualizations, partial-text visualizations and other subcategories. The latter include sequential and non-sequential visualizations, discourse-structure and syntactic-structure visualizations, search results and time-dependent datasets. Each subsection adheres to the following structure: list of cases, description of the group and discussion.
a) Whole-text visualizations
1) Literature. Novel views: Les misérables, Radial word connections by Jeff Clark (2013)
2) Literature. Novel views: Les misérables, Character mentions by Jeff Clark (2013)
3) Literature. Poem viewer by Katharine Coles et al. (2013)
4) Politics. State of the Union 2011, Sentence bar diagrams by Jeff Clark (2011)
5) Literature. Visualizing lexical novelty in literature by Matthew Hurst (2011)
6) Science/papers. On the origin of species: The preservation of favoured traces by Ben Fry (2009)
7) Science/papers. Texty by Jaume Nualart (2008)
8) Religion. Bible cross-references by Chris Harrison (2008)
9) Literature. Literature fingerprint by Daniel A. Keim and Daniela Oelke (2007)
10) Wikipedia. History flow by Fernanda Viégas and Martin Wattenberg (2003)
11) Literature. Colour-coded chronological sequencing by Joel Deshaye and Peter Stoicheff (2003)
12) Literature. 2-D display of time in the novel by Joel Deshaye (2003)
13) Literature. 3-D display of time in the novel by Joel Deshaye (2003)
14) Any. Wattenberg’s arc diagram by Martin Wattenberg (2002)
15) Health. TileBars by Marti A. Hearst (1995)
Description
- Number of cases: We identify 15 cases that can be categorized as whole-text visualizations.
- Years: The cases were published over an 18-year period from 1995 to 2013.
- Authors: All the authors work in academic fields. The most prolific authors in this category are Jeff Clark and Joel Deshaye (with three cases each), followed by Martin Wattenberg (with two cases).
- Datasets: Most of the text corpora in this category are taken from literature (eight cases). Most authors draw on novels, especially well-known texts such as the classics, to demonstrate new visualization approaches.
- Methods: All the cases except case 14 (arc diagram) use colour as part of the visualization method. Five cases use methods that are bar chart derivatives (cases 4, 5, 6, 9 and 11). Three cases use curves connecting parts of the texts: two arc diagrams and one radial diagram (cases 1, 8 and 14).
Discussion
A common method cannot be identified for these whole-text visualizations. Yet, as expected, they all present an axis representing the whole text. In 13 of the 15 cases, the text line is represented by a horizontal or vertical line. The two exceptions use a circle, in the case of Radial word connections (case 1), and an iconification of a text on the page, in the case of Texty (case 7).
Since whole-text visualizations always include an abstraction of the text, referred to as its text line, a question arises: which part of the text is physically present in the whole-text visualization being reviewed? Interestingly, nine of the 15 visualizations do not show a single word (cases 4, 5, 6, 7, 8, 9, 10, 11 and 15). Four cases show a small number of words (cases 1, 2, 12 and 13) (figure 3), while only two cases show all the text (cases 3 and 14).
The most common approach is to show the occurrence of a certain feature (this might be a term, topic, cross-reference or character) within the text as a whole (all cases except 3, 12, 13 and 15). With the exception of Wattenberg’s arc diagrams (case 14), these occurrences are represented using the same colour.
It is interesng to observe how very similar data are repre-
sented in very dierent ways depending on the case under
review. For example, while Viégas and Waenbergs History
ow (case 10) and Fry’s Favoured Traces (case 6) both pre-
sent document-version histories by secon, the former is
spaalized and the laer animated. Similaries, however,
are seen in the approaches adopted, for example, by Tile-
Bars (case 15) and Tex t y (case 7). Thus, both highlight words
from the text within a rectangular gure that is representa-
Figure 3. (Case 13) 3-D display of timeof William Faulkner’s novel The Sound and the Fury,
by Joel Deshaye and Peter Stoicheff (2003)
Jaume Nualart-Vilaplana, Mario Pérez-Montoro y Mitchell Whitelaw
228 El profesional de la información, 2014, mayo-junio, v. 23, n.3. ISSN: 1386-6710
ve of the whole text. Other cases use opposite or comple-
mentary techniques. Thus, Waenberg’s Arc diagram (case
14) shows repeons while Hurst’s novelty visualizaon
(case 5) shows only new strings, and no repeons.
Literature and other complex texts, such as political speeches (case 4) and the Bible (case 8), dominate the type of corpora used in this category (10 cases). This is perhaps surprising, as these texts tend to be complex, often presenting a high level of abstraction and little formal structure. Arguably, when opting to introduce or test a new approach, it would make more sense to work with simpler, more structured texts (such as scientific papers, patents, health diagnostics, etc.) that present greater regularity in terms of their vocabulary, text length, discourse structure and register. Given the inherent freedoms associated with literature, novelists are under no obligation to adhere to any pattern or rule that might help us give structure to the unstructured.
However, depending on how the text is treated and processed, the nature of the text is not always relevant. For example, Matthew Hurst (case 5) tracks the introduction of new terms in literary texts. Yet the tool can be applied to any other text type, its results being unrelated to the complexity of the text given the ubiquity of the method. Having said this, it would be interesting to apply the technique to scientific papers in which the style is much more clearly defined. Similar arguments can be applied to Radial word connections (case 1), Sentence bar diagrams (case 4) and Literature fingerprints (case 9).
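By way of illustration only, the idea of tracking newly introduced terms along a text can be sketched in a few lines of Python. This is not Hurst’s implementation: the function name `novelty_by_segment`, the tokenizing regular expression and the segment size are all assumptions made for this example. The sketch splits the text into consecutive segments and reports, for each one, the share of tokens that have not appeared earlier.

```python
import re


def novelty_by_segment(text, segment_size=100):
    """Return one novelty ratio per consecutive segment of the text.

    The ratio is the fraction of tokens in the segment that occur for the
    first time in the text. Tokenization and segment size are illustrative
    assumptions, not taken from any of the reviewed tools.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    seen = set()
    ratios = []
    for start in range(0, len(tokens), segment_size):
        segment = tokens[start:start + segment_size]
        new_tokens = 0
        for token in segment:
            if token not in seen:
                # First occurrence anywhere in the text so far.
                seen.add(token)
                new_tokens += 1
        ratios.append(new_tokens / len(segment))
    return ratios
```

Because the procedure only compares strings, it applies equally to a novel or a scientific paper, which is the point made above about the method being independent of the nature of the text. The resulting list of ratios could then be drawn as a bar chart along the text line.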
b) Paral-text visualizaons
16) Literature. Novel views: Les misérables. Characterisc
verbs by Je Clark (2013)
17) Any. Wordle by Jonathan Feinberg (2009)
18) Books. DocuBurst by C. Collins, S. Carpendale and G.
Penn (2009)
19) Literature. Phrase nets by Frank van Ham, Marn Wat-
tenberg and Fernanda B. Viégas (2009)
20) Google data. Word spectrum: Visualizing Google’s bi-
gram data by Chris Harrison (2008)
21) Google data. Word associaons: Visualizing Google’s bi-
gram data by Chris Harrison (2008)
22) Literature/songs. Document arc diagrams by Je Clark
(2007)
23) Any book. Gist icons by P. DeCamp, A. Frid-Jimenez, J.
Guiness, D. Roy (2005)
Descripon
- Number of cases: We idenfy eight cases that can be ca-
tegorized as paral-text visualizaons.
- Years: The cases were published over an eight-year period
from 1995 to 2013.
- Authors and datasets: Two cases by Je Clark (cases 16
and 22) and one by the creave team of Waenberg and
Viégas in collaboraon with van Ham (case 19) use literary
texts. The two cases by Chris Harrison use large bi-gram
datasets published by Google. One case is not dependent
on the nature of the text: Wordle (case 17), the very popu-
lar “word cloud” method introduced by Feinberg. Finally,
two interacve approaches involving large datasets are
presented: DocuBurst (case 18) and Gist icons (case 23).
- Methods: In six of the eight cases (cases 16, 17, 18, 19, 22
and 23), the dataset is reduced to what is called a bag of
words and only these words are present in the visualiza-
on. Cases 20 and 21 are representaons of all bi-grams
that pit two primary terms against each other.
Discussion
Partial-text visualization is a successful, popular way to draw a text, presumably because of the way in which a long text can be effectively represented using a small set of words. Simple statistical methods, such as word frequency counts, are readily interpretable. A list of variously sized words is a direct way of communicating with any user, from beginner to expert. Most of the partial-text approaches available online use statistical methods to extract the part from the whole.
It is our contenon that
extracng part of the cor-
pora can be aected by
the structure and com-
plexity of the whole. In
the visualizaons under
review, half present uns-
tructured text corpora,
but the criteria used in
extracng the part from
the whole are well de-
ned and include lists
of verbs (Characterisc
verbs, case 16), words
occurring in the text in an
“X and Y” paern (Docu-
Burst, case 18) and lists
of words not included in
a list of predened empty
words (Google’s bi-gram
data, case 21).
Figure 4. (Case 16) Novel views: Les misérables. Characteristic verbs by Jeff Clark (2013)
How we draw texts: a review of approaches to text visualization and exploration
El profesional de la información, 2014, mayo-junio, v. 23, n. 3. ISSN: 1386-6710 229
Clearly, extracon processes based on
word or phrase funconality, as opposed
to those that use stascal methods, are
more closely aected by the nature of the
text. Here, we focus on these cases becau-
se they are more interesng in terms of
our research goals. They include the cases
of Novel views: Les misérables. Characte-
risc verbs (case 16), which represents
only verbs, DocuBurst (case 18) which uses
the crowd-sourced lexical database Word-
net as a human-like backup, and Phrase net
(case 19) and the two Google bi-gram vi-
sualizaons (cases 20 and 21).
A common paern detected in the paral-
text visualizaons reviewed is that once a
part of the text has been extracted all ex-
cept one (Document arc diagrams, case
22) discard any reference to the original
text sequence in the visualizaon. See the
following point for a more detailed discus-
sion of this idea.
c) Other subcategories
Here we include sequential and non-sequential visualizations, discourse and syntactic structure visualizations, search-result visualizations and time-dependent dataset visualizations.
Sequenal visualizaons
Sixteen of the 23 single-text visualizaons maintain a similar
sequence to that of the original text. Seven of these visuali-
ze the sequence using a discourse structure (primarily chap-
ters), while the remaining nine use syntacc elements to re-
present the original sequence of the text (primarily words).
Strikingly, only one paral-text visualizaon, Clark’s Docu-
ment arc diagrams (case 22) (gure 5), follows the original
text sequence, whereas all the whole-text visualizaons are
sequenal. It would thus appear that sequenality is intrin-
sic to whole-text visualizaon. Whole-text visualizaons do
not literally represent every word of the text, but rather pre-
sent a graphical metaphor of the whole: a text line. This text
line may represent either a discourse structure or a syntac-
c structure of the text; but, whatever the case, graphically
a line or area is used to represent the length of the text.
The sequenality of the visualizaon means it can be read
both backwards and forwards, as can the text. In the case of
a long text, such as a book (nine of the 16 cases), the visua-
lizaon can serve as a map or guide to the text.
Non-sequenal visualizaons
Five cases use non-sequenal visualizaons: three use word
clouds (cases 17, 20 and 21), one a net of phrases (case 19)
and one visualizes all the verbs in the text (case 16).
Discourse structures in the visualization
Cases: 1, 2, 5, 6, 8, 11, 12 and 13
The eight visualizations that follow the discourse structure of the text are sequential; no cases were found in which the discourse structure appeared out of sequence with regard to the text. This is perhaps unsurprising, as those cases in which the text is divided into chapters and each chapter represented as a separate entity were considered as text collection visualizations (e.g., Sentence bar diagrams, case 4). For this reason, all the cases in this section represent the parts of a text ordered and aligned (in a curve or line). Of the eight visualizations, five represent chapters or sections of a book, two represent complete volumes, while one (Colour-coded chronological sequencing, case 11) divides the text in colours according to narrative topics and scenes. Indeed, case 11 is the only one we have identified that uses discourse structure elements that are more deeply embedded than chapters, sections, books and volumes. In all likelihood, more deeply embedded methods than these, such as narrative topics, would require manual text line segmentation.
Syntacc structures in the visualizaon
Cases: 3, 16, 4, 7, 18, 9, 22 and 23.
The other eight sequenal visualizaons use intrinsic text
elements, including groups of words (cases 7, 18, 22 and
23), verbs (case 16), sentences (cases 4 and 9) and a com-
plete text analysis (case 3). Syntacc analysis requires either
word-by-word parsing of the text (using a database of lexi-
cal or semanc word lists) or sentence and paragraph pars-
ing. Syntacc-structure visualizaon is less dependent on
the nature of the text in the sense that the methodology is
unaected by the complexity of the text. Typically, the so-
ware automacally extracts or marks the chosen syntacc
elements.
Search-result visualizaons
Cases: 15, 18 and 23
The three search-result visualizaons were presented as
web applicaons and were, therefore, interacve – the user
being able to query the visualizaon system and obtain a
Figure 5. (Case 15) TileBar search on (patient medicine medical AND test scan cure diagnosis
AND software program) with stricter distribution constaints.
Jaume Nualart-Vilaplana, Mario Pérez-Montoro y Mitchell Whitelaw
230 El profesional de la información, 2014, mayo-junio, v. 23, n.3. ISSN: 1386-6710
unique representaon for each search. The three cases,
however, are no longer available online. DocuBurst (case 18)
is a Prefuse applicaon that can be downloaded (Collins et
al., 2009). Prefuse is a set of soware tools for creang rich
interacve data visualizaons.
TileBars is a classic case of visualization (cited 625 times according to Google Scholar) designed by a leading expert in visualization and search engine interfaces, Marti Hearst. DocuBurst and Gist icons are interactive radial visualizations, the latter being one of the references and main influences on the development of DocuBurst, as explained in the DocuBurst paper cited.
Search-result visualizaon approaches have not been wi-
dely implemented in informaon retrieval systems and
most result outputs are one-dimensional lists of itemized
texts (Nualart; Pérez-Montoro, 2013). The three cases re-
viewed here are each applied to large datasets and, starng
with a search query, present an improved search output de-
signed to help the user read and lter the results. All three
are parcularly concerned with disnguishing between si-
milar items: TileBars searches PubMed (more than 20 mi-
llion papers); DocuBurst uses the WordNet lexical database
(155,287 words organized in 117,659 synsets for a total of
206,941 word-sense pairs) to classify the visualized text;
and, Gist icons use, among others, the complete dataset of
approximately 7 million USpto patents and the Enron email
dataset comprising 500,000 emails.
In the text collecon category below, we present nine fur-
ther search-result visualizaons.
Time-dependent datasets
Cases: 6 and 10.
We present two cases in which the visualization approaches can be used to understand or follow the evolution of a text over time. A dynamic text visualization demonstrates that data visualization may be the only way to solve certain tasks and that it is not just one more method of pure data advocacy. For example, it is extremely challenging to show how a Wikipedia entry evolves over time in line with the editors’ participation (History flow, case 10) (figure 6). History flow provides a solution to this problem and sheds light on the complex collaborative process of Wikipedia.
In the second case (Favoured traces, case 6), an animated visualization demonstrates how Darwin’s ideas evolved through successive editions of the Origin of Species. In Ben Fry’s words: “The first English edition was approximately 150,000 words and the sixth is a much larger 190,000 words. In the changes are refinements and shifts in ideas – whether increasing the weight of a statement, adding details, or even a change in the idea itself.”
2.2.2. Text collecons
We present text collecons grouped as pure item visualiza-
ons, aggregaon visualizaons and other subcategories. The
laer includes data as a landscape layer and search result vi-
sualizaons. Each subsecon adheres to the following struc-
ture: list of cases, descripon of the group and discussion.
a) Item visualizaons
24) Literature (Note: this converts a single text into a collec-
on). Novel views: Les misérables. Segment word clouds by
Je Clark (2013)
25) Literature. Grimm’s fairy tale network by Je Clark (2013)
26) Twier. Spot by Je Clark (2012)
27) Science. Word storm by Quim Castella and Charles
Suon (2012)
28) Literature. Topic networks in Proust. Topology by Elijah
Meeks and Je Drouin (2011)
29) Wikipedia. Notabilia by D. Taraborelli, G. L. Ciampaglia
and M. Stefaner (2010)
30) Media art. X by Y by Moritz Stefaner (2009)
31) Search engine. Search clock by Chris Harrison (2008)
32) Online media. Digg rings by Chris Harrison (2008)
33) Science. Royal Society Archive by Chris Harrison (2008)
34) Wikipedia. WikiViz: Visualizing Wikipedia by Chris Ha-
rrison (2007)
35) Visualizaon. Area by Jaume Nualart (2007)
36) Chromograms by M. Waenberg, F.B. Viégas and K. Ho-
llenbach (2004)
Figure 6. (Case 10) History flow by Fernanda Viégas and Martin Wattenberg researchers at IBM’s Visual Communication
Lab (2003)
Partial-text visualization is a successful,
popular way to draw a text, presumably
because of the way in which a long text
can be effectively represented using a
small set of words
How we draw texts: a review of approaches to text visualization and exploration
El profesional de la información, 2014, mayo-junio, v. 23, n. 3. ISSN: 1386-6710 231
37) Search engines. KartOO/Ujiko by
Laurent Baleydier and Nicholas Bale-
ydier (2001)
38) Search engines. Touchgraph by
TouchGraph, LLC. (2001)
39) Internet. HotSauce by Rama-
nathan V. Guha (1996)
Descripon
- Number of cases: We idenfy 16
cases that can be categorized as
item visualizaons.
- Years: The cases were published
over a 17-year period from 1996
to 2013.
- Authors: The most prolic authors
in this category are Chris Harri-
son (cases 13, 32, 33 and 34) and
Je Clark (cases 24, 25 and 26),
followed by Moritz Stefaner with
two cases (29 and 30).
- Disciplines and datasets: Inter-
esngly, nine cases are datasets
taken from the Internet: Wikipedia (cases 29, 34 and 36),
search engines (cases 31, 37 and 38), Twier (case 26),
online media (case 32), web pages (case 39). Only three
cases use literary texts (cases 24, 25 and 28). Finally, two
cases visualize scienc papers (cases 27 and 33), one
case uses media art datasets (case 30) and one represents
non-specic collecons (case 35).
Discussion
The main difference between single-text and text-collection visualizations lies in the nature of the text. In the case of the latter, most of the texts do not originate from literature and are accessible online. Yet, the nature of the text appears to be less important when the goal is the representation of the collection rather than of the text itself.
Item visualizaons use methods
that are independent of the nature
of the items themselves. Once the
text collecons have been itemized,
the dataset can be considered a ge-
neral case of data visualizaon and
not a pure case of text visualizaon.
For this reason, in this category, the
methods are generally well known
and used in other elds of visualiza-
on. Thus, we nd six network visua-
lizaons (cases 25, 28, 34, 37, 38 and
39), three melines (cases 31, 32
and 33) and three cases that likewise
use melines but which also permit
categorizaon-based groupings (ca-
ses 26, 30 and 35) (gure 7).
Finally, four cases are, we believe, quite specific to text visualization. Two are concerned with item comparison: Segment word clouds (case 24) and Word storm (case 27). Segment word clouds transforms a single text into a text collection. Specifically, it is used to represent the chapters of Les misérables as word cloud items, thus facilitating their comparison. It also uses colour to identify words as they acquire prominence in the text.
Word storm is a reinvenon of word cloud, or more speci-
cally a variaon of Wordle (case 17) that allows word clouds
to be compared. This is achieved by assigning a xed posi-
on to each word. This simple idea makes it visually easy
to compare word clouds while maintaining the usual word
cloud features.
Figure 7. (Case 30) X by Y by Moritz Stefaner (2009)
Figure 8. (Case 29) Notabilia. 100 longest Article for deletion [AfD] discussions on Wikipedia by Dario
Taraborelli, Giovanni-Luca Ciampaglia (data and analysis) and Moritz Stefaner (visualization) (2010)
To conclude, Notabilia (case 29) and Chromograms (case 36) are two highly original cases that deserve mention. The very specific design of Notabilia shows the evolution of “Article for deletion” discussions of Wikipedians (figure 8), discussions that are sometimes more like “flame wars” given the controversies that rage over the simple existence of certain definitions. Notabilia visualizes the evolution of the hundred longest discussions and their final outcomes. Moritz Stefaner’s visualization constitutes an interactive bushtree, the branches of which are highlighted when moused over. The shape of the branches informs the reader about the nature of the discussion: cyclical, straight or never-ending.
Chromograms is also based on Wikipedia data, providing an analysis of the comments of editors for each edition of a Wikipedia entry. Visually it produces colour-coded stripes that, in a small space, rapidly inform the reader about the edit history of Wikipedia entries.
b) Aggregaon visualizaons
40) Literature. Grimm’s fairy tale metrics by Je Clark (2013)
41) Topic models. Termite by J. Chuang, C.D. Manning and
J. Heer (2012)
42) Wikipedia. Pediameter by Müller-Birn, Benedix and Han-
tke (2011)
43) Google suggesons. Web Seer by Fernanda Viégas &
Marn Waenberg (2009)
44) Google n-grams. Web trigrams: visualizing Google’s tri-
gram data by Chris Harrison (2008)
45) Polical speech. Feature-
Lens by A. Don, E. Zheleva, M.
Gregory, S. Tarkan, L. Auvil, T.
Clement, B. Shneiderman and
C. Plaisant (2007)
46) Online news. Newsmap by
Marcos Weskamp (2004)
47) Email conversaon. Themail
by Fernanda B. Viégas, Sco
Golder, Judith Donath (2006)
48) Search engine. WebBook by
S.K. Card, G.G. Robertson and
W. York (1996)
49) Any texts. Dotplot appli-
caons by Jonathan Helfman
(1994)
Descripon
- Number of cases: We idenfy 10 cases that can be catego-
rized as aggregaon visualizaons.
- Years: The cases were published over a 19-year period
from 1994 to 2013.
- Authors and datasets: Only Fernanda B. Viégas parcipa-
ted in more than one of the 10 cases in this category (ca-
ses 43 and 47); the rest parcipated in just one case each.
The texts are very similar in nature to those in the item
visualizaon category. Five cases are corpora that can be
found online (Wikipedia, case 42; Google, cases 43 (gu-
re 9) and 44; online news, case 46; search engine results,
case 48). The standard unstructured texts include one
from literature (Sentence Bar Diagrams, case 4), one from
polical speeches (FeatureLens, case 45) and one from a
year’s worth of email conversaons between two corres-
pondents (Themail, case 47). Finally, there are two quite
unique cases: Termite (case 41) and Dotplot (case 49). All
the cases are discussed below.
Discussion
Aggregation visualizations is the category with the greatest variation in the methods employed. Thus, apart from visualizing text collections, the only thing the 10 cases assigned to this category have in common is that they do not represent specific items.
Given these circumstances, we comment on each case separately:
Grimm’s fairy tale metrics (case 40) provides a matrix (or table-like) visualization that allows rows to be sorted by clicking on columns. The columns provide a quantitative definition of 13 metrics related to the 62 stories making up Grimm’s fairy tales. It is a powerful tool for analysing, understanding and comparing the tales.
Termite (case 41) is a case that represents an intermediary dataset known as topic models. Topic models are a “cleverer” way of obtaining a bag-of-words from a text than applying a typical word-frequency statistical analysis. Termite does not visualize texts but it does compare parts of texts. As such, the tool can be used to compare topic models.
Figure 9. (Case 43) Web seer by Fernanda Viégas & Martin Wattenberg (2009)
Pediameter (case 42) is a specific interface that uses bar charts to show Wikipedia editions in real time. It is most remarkable for using a device known as an Arduino to detect editions and transcribe them to a physical indicator, merging digital and material worlds.
Web Seer (case 43) is another specific visualization method that shows the most popular search queries based on Google suggestions. The approach allows queries to be compared by representing the suggestions with trees and then connecting the matching branches. The simplicity of this case contrasts with its power of communication: rapid and user friendly.
Google’s tri-gram data (case 44) uses a similar visualization method to that used by Web Seer. It draws on the huge Google n-gram dataset and represents and compares three-word sequences (tri-grams).
FeatureLens (case 45) is an interactive, dashboard-style interface for comparing texts. The central representation uses a visualization of frequent concepts similar to that used by Texty (case 7) and TileBars (case 15). It allows text browsing and shows line graphs of frequent words found throughout a text.
Newsmap (case 46) uses treemap visualization to offer a new method for reading and monitoring the news in real time, employing online Google news feeds. It is totally customizable in terms of topic, country and publication time. The software, which is available free of charge online, can also be used for news searches.
Themail (case 47) is an experiment in which a highly specific interface was developed to follow and analyse the evolution of an email correspondence between two people over the course of one year. It visualizes the words that characterize each of the writers and their evolution over time.
When rst developed in 1996, WebBook (case 48) (gure
10) was a somewhat surprising applicaon, as it trans-
formed search engine results in a mulmedia (text and
images, primarily) mash-up based on the metaphor of the
book. The applicaon was a pure text (web pages) collecon
visualizaon that presented the results as aggregaons of
text and images.
Finally, Dotplot (case 49) was an innovative visualization application with multiple uses, not unlike Arc diagrams (case 14). The main use of Dotplots is for text comparisons, including multi-language, text version and programming code comparisons.
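The principle behind a dotplot can be sketched briefly, under the assumption that both texts are already split into tokens: a mark is placed at position (i, j) whenever token i of the first text equals token j of the second. The function names `dotplot` and `render`, and the text rendering with `#` and `.`, are hypothetical choices for this illustration and do not reproduce Helfman’s application.

```python
def dotplot(tokens_a, tokens_b):
    """Return a matrix where cell [i][j] is True when tokens_a[i] == tokens_b[j]."""
    return [[a == b for b in tokens_b] for a in tokens_a]


def render(matrix):
    """Draw the matrix as text: '#' marks a match, '.' marks no match."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    tokens = "to be or not to be".split()
    # Comparing a text with itself: the main diagonal is always filled,
    # and off-diagonal marks reveal repeated tokens.
    print(render(dotplot(tokens, tokens)))
```

Since only equality between tokens is tested, the same sketch works for two versions of a document or two files of source code, with tokens being words or lines as appropriate.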
c) Other subcategories
Here we include landscape data layers, search-result visualizations and time-dependent datasets.
Landscape as an addional data layer
Cases: 40, 26, 28, 33, 47, 37, 38 and 49.
The typical concept of landscape data is a network visuali-
zaon comprising two layers of data, as in Topic networks
(case 28). In this specic case, the rst layer is provided by
the Marcel Proust texts represented as items and the se-
cond layer by a network of topic models of these texts. The
posions of the nodes of both layers are opmised so that
proximity indicates more strongly related nodes. This de-
nion of landscape can also be found in the defunct search
engine results provided by KartOO/Ujiko (case 37) and To u -
chGraph (case 38).
All the other cases included in this category present text collections in combination with more data. This is the case of Dotplot, which represents the coincidence or otherwise of strings in various texts, and of Grimm’s fairy tale metrics, which combines a list of texts in rows with various parameters listed in columns. These parameters do not form a direct part of the text, but rather they are recalculated features related to the text, including, for example, length, lexical diversity and the presence of different groups of words that represent entities (for example: body -> hand, head, heart, eyes and foot) in each tale.
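To make the notion of calculated features concrete, the following sketch computes three of the parameters mentioned above for a single text. It is an assumption-laden illustration rather than Clark’s procedure: lexical diversity is taken here to be the type-token ratio (distinct words divided by total words), which is only one of several possible definitions, and the names `tale_metrics` and `BODY_WORDS` are hypothetical.

```python
import re

# Word group taken from the example in the text (body -> hand, head, ...).
BODY_WORDS = {"hand", "head", "heart", "eyes", "foot"}


def tale_metrics(text, word_group=BODY_WORDS):
    """Compute length, lexical diversity and word-group hits for one text.

    Lexical diversity is assumed to be the type-token ratio; the source
    does not specify which measure is actually used.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    length = len(tokens)
    diversity = len(set(tokens)) / length if length else 0.0
    group_hits = sum(1 for token in tokens if token in word_group)
    return {
        "length": length,
        "lexical_diversity": diversity,
        "group_hits": group_hits,
    }
```

Applying `tale_metrics` to every tale in a collection gives one row per text and one column per parameter, which is the table-like layout described for this kind of landscape.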
A third kind of landscape is based on the representation of timed metadata, as exemplified by Spot (case 26), the Royal Society Archive (case 33) and Themail (case 47).
A common feature of landscape visualizations is their capacity to compare a collection of texts simultaneously with a second parameter, while their main limitation is the number of items represented: large numbers create problems of overlapping items.
Search-result visualizations
Cases: 26, 43, 35, 45, 47, 46, 37, 38 and 48.
Compared to single-text visualizations, text-collection visualizations include considerably more cases offering search capacities (three vs. nine). Common sense suggests that when presenting a text collection, a natural feature of such an approach will be a way of selecting part of that collection based on given criteria, i.e., filter and search features.
Figure 10. (Case 48) WebBook by Stuart K. Card, George G. Robertson, and William York
(1996)
All the cases included in this category allow search queries and output a unique visualization for each query. All the cases include a search box and a search button.
Time-dependent datasets
Cases: 42, 29, 36 and 46.
The four cases included in this category allow the user to monitor the evolution of the texts in the collection over time. Only one is designed for use in real time (Newsmap, case 46), but potentially all of them can visualize the collection on a specific date and at a specific time.
One obstacle faced by an approach that represents changes in text collections over time is providing access to an updated feed or an accessible API. It is presumably for this reason that three of the four use Wikipedia data and the other uses Google news. In all cases, they are online sources that have long allowed public access to their feeds.
3. Conclusions
The diversity of approaches developed in different disciplines, the wide diffusion of publications or, on occasions, the absence of formal publications of innovative ideas, represent a considerable challenge to the undertaking of a comprehensive survey of the work completed in this field. Thus, some of the visualizations we present here have been unearthed in highly specific publications, the case for example of Joel Deshaye and Peter Stoicheff and their work on representing Faulkner (cases 11, 12 and 13). If we read Stoicheff’s working notes it is apparent that their visualizations were developed to facilitate the study of William Faulkner’s narrative timelines. There are no additional references to the application of these interesting ideas to other texts, suggesting that more works remain hidden in the depths of other fields.
Text visualizaon, as we have argued throughout this re-
view, may be considered a subeld of data visualizaon. Yet,
the boundaries of the discipline are not always clearly de-
ned. This is readily illustrated, for example, by the case of
Harrison’s Search clock (case 31), in which the text corpora
comprise an enormous dataset of search engine queries.
Can this dataset really be considered a collecon of texts
when each of them, in most instances, is no more than one
or two words in length? Does a text have to sasfy a mi-
nimum length in order to be considered a text? Here, we
opted to treat case 31 as a collecon of texts, short ones
admiedly but, ulmately, texts.
Clearly, the crical decision to be made throughout this re-
view has been how to classify the cases idened. As few
papers have aempted to review only text visualizaon
approaches, we turned to classic data visualizaon reviews
(e.g., Shneiderman, 1996) as well as to more recent ones
(e.g., Collins et al., 2009). In all these instances, the classi-
caons were based on tasks that the visualizaon approach
can solve rather than on the explicit aspects of the visuali-
zaon themselves. For this reason we chose to propose our
own classicaon, which, while far from perfect, we hope
will be useful for undertaking a classicaon based on visual
features.
We conclude with a list of insights, as well as shortcomings,
that we have idened to date:
- Single-text visualizaons have been applied mainly to li-
terature, a eld that, apart from being characterized by
complex combinaons of words, can present high levels
of human abstracon and freedom of structure and ex-
perimentaon. As such it might prove more eecve to
apply visualizaon techniques to texts that have a more
formal register and/or predened outline and a well-
dened vocabulary, such as legal texts, scienc papers,
template-based texts and communicaons, etc.
- We have idened only one single/paral-text visualiza-
on that is sequenal (Document arc diagrams, case 22).
Most paral-text visualizaons extract the essence of the
text based on one or more criteria and so the original se-
quence of the text is lost. Since sequenal visualizaon
approaches present certain advantages, it seems that
paral-visualizaon approaches that maintain the original
text sequence should be encouraged.
- Text-collection visualizations tend to employ methods
that are used for data visualization in general. Hence,
there is a need for further experimentation in applying more
standard data visualization methods and approaches to
the specific subfield of text visualization.
- Text collection aggregations is the category in which the
most specific designs and ideas have been developed.
More work needs to be undertaken to identify any
common approaches in this kind of visualization.
And, finally, we pose the following question:
- Why is it that most of the cases reviewed here that are
more than five years old are no longer available online?
If the software used is no longer (or was never) in use,
we should perhaps question its effectiveness. While we
have not investigated just how many cases form part of
commercial software products and how many, following
publication, have simply been forgotten, the question
remains as to why some apparently magnificent ideas
did not establish themselves as new standards. Our
challenge to researchers is to produce applications that will
be adopted in one field or another, or which can solve a
problem for a certain group of users; indeed, as the cases
reviewed here highlight, adoption seems to represent a
considerable challenge.
Acknowledgement
This work is part of the project “Active audiences and
journalism. Interactivity, web integration and findability of
journalistic information”. CSO2012-39518-C04-02. National
plan for R+D+i, Spanish Ministry of Economy and
Competitiveness.
4. References
Anglin, Gary J.; Vaez, Hossein; Cunningham, Kathryn L.
(2004). “Visual representations and learning: The role of
static and animated graphics”. Handbook of research on
educational communications and technology, 2, pp. 865-916.
Baeza-Yates, Ricardo; Ribeiro-Neto, Berthier et al. (1999).
Modern information retrieval. New York: ACM press, vol. 463.
Baeza-Yates, Ricardo; Broder, Andrei Z.; Maarek, Yoelle
(2011). “The new frontier of web search technology: Seven
challenges”. Search computing, v. 6585 of Lecture notes in
computer science, pp. 3-9.
http://dx.doi.org/10.1007/978-3-642-19668-3_1
Benavides, David; Segura, Sergio; Ruiz-Cortés, Antonio
(2010). “Automated analysis of feature models 20 years
later: A literature review”. Information systems, v. 35, n. 6,
pp. 615-636.
http://dx.doi.org/10.1016/j.is.2010.01.001
Collins, Christopher; Carpendale, Sheelagh; Penn, Gerald
(2009). “DocuBurst: Visualizing document content using
language structure”. Computer graphics forum (Procs. of
the Eurographics/IEEE-VGTC Symposium on visualization,
EuroVis), v. 28, n. 3, pp. 1039-1046.
http://dx.doi.org/10.1111/j.1467-8659.2009.01439.x
Feldman, Ronen; Sanger, James (2006). The text mining
handbook: advanced approaches in analyzing unstructured data.
Cambridge University Press. ISBN: 13 978 0 521 83657 9
Grobelnik, Marko; Mladenić, Dunja (2002). “Efficient
visualization of large text corpora”. In: Procs of the 7th seminar.
Dubrovnik, Croatia.
http://ailab.ijs.si/dunja/SiKDD2002/papers/GrobelnikSep02.pdf
Hearst, Marti A. (2003). What is text mining?
http://people.ischool.berkeley.edu/~hearst/text-mining.html
Hearst, Marti A. (2009). “Search user interfaces”, Chapter 1.
ISBN: 9780521113793
http://searchuserinterfaces.com/book
http://searchuserinterfaces.com/book/sui_ch1_design.html
Hearst, Marti A. (2011). “Natural search user interfaces”.
Communications of the ACM, v. 54, n. 11, November, pp.
60-67.
http://cacm.acm.org/magazines/2011/11/138216-natural-search-user-interfaces/fulltext
http://dx.doi.org/10.1145/2018396.2018414
Heer, Jeff (2010). “A conversation with Jeff Heer, Martin
Wattenberg, and Fernanda Viégas”. Queue, v. 8, n. 3, 10 pp.,
March.
http://doi.acm.org/10.1145/1737923.1744741
Iliinsky, Noah (2013). Choosing visual properties for
successful visualizations. IBM Software. Business Analytics.
http://public.dhe.ibm.com/common/ssi/ecm/en/ytw03323usen/YTW03323USEN.PDF
Kitchenham, Barbara (2004). Procedures for performing
systematic reviews. Keele, UK, Keele University, 33 pp.
Levie, W. Howard; Lentz, Richard (1982). “Effects of text
illustrations: A review of research”. ECTJ, v. 30, n. 4, pp.
195–232.
Mann, Thomas M. (2002). Visualization of search results
from the world wide web.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.2535
Meeks, Elijah (2011). Digital humanities specialist.
Documents.
https://dhs.stanford.edu/comprehending-the-digital-humanities/documents
Nualart-Vilaplana, Jaume (2013). How we draw texts: a
visualization of text visualization tools.
http://research.nualart.cat/textvistools
Nualart, Jaume; Pérez-Montoro, Mario (2013). “Texty, a
visualization tool to aid selection of texts from search outputs”.
Information research, v. 18, n. 2, June.
http://www.informationr.net/ir/18-2/paper581.html
Shneiderman, Ben (1996). “The eyes have it: A task by data
type taxonomy for information visualizations”. In: Visual
Languages. Proceedings IEEE Symposium, pp. 336–343.
http://dx.doi.org/10.1109/VL.1996.545307
Šilić, Artur; Dalbelo-Bašić, Bojana (2010). “Visualization of
text streams: A survey”. Knowledge-based and intelligent
information and engineering systems, v. 6277 of Lecture notes
in computer science, pp. 31–43. Berlin, Heidelberg: Springer.
http://dx.doi.org/10.1007/978-3-642-15390-7_4
Stefaner, Moritz (2013). Gender balance visualization.
http://moritz.stefaner.eu/projects/gender-balance/#NUM/NUM
Strecker, Jacqueline (2012). Data visualization in review:
summary. International Development Research Centre
(IDRC), Ottawa, ON, Canada.
http://idl-bnc.idrc.ca/dspace/bitstream/10625/49286/1/IDL-49286.pdf
Times Higher Education. World university rankings 2012-2013.
http://www.timeshighereducation.co.uk/world-university-rankings/2012-13/world-ranking
Tufte, Edward R.; Graves-Morris, P. R. (1983). The visual
display of quantitative information, v. 2. Cheshire, CT: Graphics
Press, 199 pp.