
Zappavigna, M. (2011). Visualising logogenesis: Preserving the dynamics of meaning. In S. Dreyfus, M. Stenglin & S. Hood [eds.] Semiotic Margins. London: Continuum. pp. 211-228

Chapter 10
Visualizing Logogenesis: Preserving the
Dynamics of Meaning
Michele Zappavigna
University of Sydney
Discourse analysts are concerned with understanding patterns in language.
These patterns are often highly complex involving relationships between many
variables and across many dimensions. Data is said to have high dimensionality
when there are multiple values that uniquely identify a data point. For example,
we might model a point in two dimensions using X and Y axes, in three dimen-
sions using X, Y and Z axes and in N dimensions using N number of axes. As the
degree of dimensionality increases, our capacity to perceive the data as mean-
ingful diminishes. This is in part because metaphors that we can intuitively
understand such as three-dimensional space cannot be applied to visualize the
data directly. The problem is pertinent for linguistics because language has
high dimensionality. So for a discourse analyst wanting to understand patterns
in language, some kind of technological support is required.
Since discourse analysts are concerned both with understanding relation-
ships between variables and with understanding how patterns of variables
unfold in a text, the kind of support that they need should be dynamic. In other
words it needs to be able to account for logogenesis: the unfolding of meaning
in a text. Fortunately, advances in computer technology now afford us the pos-
sibility of annotating, managing and visualizing highly complex data. We can
now track multiple relationships between variables unfolding in time or along
other dimensions. As a result we have the potential to model logogenesis and
understand how meanings work together as they unfold in real-time and across
semiotic modes. This chapter explores three enabling text visualization tech-
niques that use three different methods to represent the temporal sequencing
of a text: text arcs (Wattenberg 2002), streamgraphs (Byron & Wattenberg
2008, Havre et al. 2002) and animated networks (Fry 2000b).
I begin by considering the complementarity of dynamic and synoptic per-
spectives on texts and introduce the concept of logogenesis as it is theorized
in Systemic Functional Linguistics (SFL). I then introduce the field of Text
Visualization and describe some of its principles and methods with a view to
212 Semiotic Margins
suggesting how text arcs, streamgraphs and animated networks might be used
by functional discourse analysts as tools for exploring text.
Preserving Logogenesis
The toolkit available to the Systemic Functional linguist is currently largely
composed of strategies suited to a synoptic gaze. SFL has focused much effort
over the last half-century on modelling the meaning potential of language as a
semiotic system from a paradigmatic perspective. This effort has centred upon
using system networks (resembling, visually, tree diagrams turned on their
side), as a means of displaying choices in systems of possible meanings. Comple-
mentary to the paradigmatic view of ‘what could go instead of what’ (Halliday
& Matthiessen 2004:22) that these system networks afford, is a syntagmatic
perspective that considers ‘what goes where when’.
The preoccupation with paradigms has directly impacted how strata are
modelled in SFL as levels of abstraction. For Matthiessen (2007), strata are
subsystems at particular orders of abstraction that are held together in a realization
relationship whereby patterns at a higher level are realized by those at lower
levels. The co-tangential circles representation in Figure 10.1 presupposes a
paradigmatic gaze in which features are (metaphorically) distributed across
the two-dimensional plane rather than sequenced. This is in keeping with the
fact that realization affords minimal information about sequencing beyond
the small window of structure specified in function structures.
Figure 10.1 Stratification conceived as levels of abstraction (Halliday & Martin ...). Figure labels: discourse semantics; graphology; expression 'plane'; content 'plane'.
Visualizing Logogenesis 213
However, a syntagmatic perspective considers an additional dimension
when modelling meaning-making, which is change. Halliday recognized that
once contiguity relations are added to a paradigmatic modelling strategy,
the linguist is ‘taking on a dynamic commitment’ (Halliday 1991:40); that is,
they are involved in modelling change. Halliday (1993) described three
kinds of semiotic change or ‘semogenesis’: logogenesis, the unfolding of text,
phylogenesis, the evolution of culture, and ontogenesis, the development of the
meaning-making potential of a human over time. In this chapter I am
concerned with logogenesis and with thinking about meanings unfolding in
texts ‘dynamically as currents flowing through a stratified semiotic system’
(Halliday 1991:40).
In modelling change we come up against the problem of the limitations of
the information that systemic probability represented using system networks
can provide (Zappavigna et al. 2008). Lemke (1991) draws attention to the emergent
complexity of language as a dynamic open system. In contrast to the position
that argues that interdependency can be modelled at the stratum of lexico-
grammar, Lemke suggests that ‘relations of interdependency’ are dynamic
semantic relations (Lemke 1991:24):
If we imagine the description of dynamic systems to be mainly a matter of
the dynamic weightings of selection probabilities, then we wish to know how
the selections ‘up to now’ condition the probabilities for selections ‘now’.
(Lemke 1991:26)
In other words, unless we have a dynamic model of register, we are unable to
reset the ‘probability weightings in just the right way just in time for each pass
through the network’ (Lemke 1991:241). He notes that these kinds of sets or
probabilities can be described mathematically ‘and amount, in fact precisely to
the re-weightings of dynamic systems needed for text production to produce
texts of recognizable social formations’ (Lemke 1991:31).
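One way to make this point concrete is to estimate, from an annotated sequence, how the selection just made conditions the probabilities of the next selection. The following Python sketch is illustrative only and is not Lemke's own formalism: it assumes a first-order dependency (only the immediately preceding selection counts), and the feature labels and the function name `transition_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(selections):
    """Estimate P(next feature | current feature) from a sequence of selections."""
    counts = defaultdict(Counter)
    # Pair every selection with the one that follows it in the text.
    for current, following in zip(selections, selections[1:]):
        counts[current][following] += 1
    return {
        current: {feat: n / sum(nexts.values()) for feat, n in nexts.items()}
        for current, nexts in counts.items()
    }

# Hypothetical sequence of mood selections, clause by clause (assumed data).
sequence = ["declarative", "declarative", "interrogative", "declarative",
            "imperative", "declarative", "interrogative", "declarative"]
probs = transition_probabilities(sequence)
```

A static count over the same sequence would report only overall frequencies; the conditional table keeps some of the information about 'what goes where when'. A fuller dynamic model would need to condition on more than one preceding selection.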
In accord with Lemke’s position outlined above, Martin (2004:341–342) fore-
grounds the importance of maintaining the logogenetic integrity of texts when
he argues that ‘as social discourse analysts we need to guard against studies
that submerge unfolding texture in processes of counting and averaging that
look for trends across texts rather than contingencies within them’. While,
in accounting for the unfolding of a text, it may be clear that we wish to avoid
approaches that characterize the text as a ‘bag of words’, we also want to avoid
the position where the text is reduced to a collection of clauses:
It doesn’t matter how many clauses we analyse, it’s only once we analyse
meaning beyond the clause that we’ll be analysing discourse. And we need
to analyse discourse right along the cline of instantiation if we want to make
sense of the semiotic weather we experience in the ecosocial climate of our
times. (Martin & Rose 2003:272)
In order to make such a jump out of the clause, we need means of communicating
the kinds of patterning that we will find. Static forms of representation
such as bar charts will not meet our needs because they reduce the complexity
to a single value visualized in two dimensions. Instead we perhaps require tech-
niques that assist linguists in exploring the patterning of annotations that they
have made to a text across as many dimensions as are necessary.
Rather than reducing the annotated text to a table of statistical values we
might employ various kinds of text visualization to achieve a dynamic lens on
the data. For example, consider the description, provided by a developer of a
text visualization system that presents texts in a three-dimensional network:
Instead of focusing on numeric specifics . . . the piece provides a qualitative
feel for perturbations in the data, in this case the different types of words
and language used throughout the book. This provides a qualitative slice
into how the information is structured. On its own the raw data might not be
particularly useful. But when relationships between data points can be estab-
lished, and these relationships expressed through movement and structural
changes in the on-screen visuals, a more useful perspective is established.
(Fry 2000a:67)
Such a ‘qualitative slice’ may be of great use to the discourse analyst because
it emphasizes relationships between linguistic features in texts as they are
interwoven to create particular meanings. What is presented here is not an
argument against ‘counting’ these features, but a suggestion that we should not
toss out information about their sequencing.
Patterning and Co-occurrence in Systemic
Functional Linguistics
When we turn from modelling potential paradigmatically to considering the
unfolding of meanings realized in texts, different patterns of coordination
need to be foregrounded. Logogenesis is clearly more than the text unfolding
in a simple linear progression. The orchestration of a text might involve dif-
ferent kinds of semiotic crescendo and decrescendo as different meanings
emerge and fade prosodically. This potential in discourse, a kind of ‘snowball-
ing’ (Martin, personal communication, 16 July 2008) of meaning, is apparent
when manually analysing, for example, evaluative language using Appraisal
theory (Martin & White 2005). However, we currently have an impoverished
repertoire for talking synoptically about this kind of patterning. A naive repres-
entation of a text accumulating particular meanings while shedding others is
presented in Figure 10.2. Current strategies for annotating patterning include
colour-coding, tabular organization, and, in some cases, annotations in formats
such as XML. While any kind of annotation is a useful first step, the problem of
making visible patterns that emerge in sequences that exceed a single page or
screen is significant. This is a problem of tractability. Until we have a way of
representing extended patterning we are limited to probing small co-patternings
of meaning deemed qualitatively relevant to the particular questions the analyst
is asking about the text.
To be true to the unfolding of a genuine, multimodal text, however, we
need to find ways of analysing and representing the unfolding of two kinds
of co-occurrence in actual texts: co-occurrence across unfolding modes, for
example, simultaneous use of a particular intonation and a particular gesture,
and co-occurrence within the same text sequence, for example, use of AFFECT
together with GRADUATION [1] in a clause in the unfolding text. Time, in the
second type of co-occurrence, is not clock time but instead a form of ‘text
time’ dependent on the dimension of meaning that the discourse analyst is
interested in exploring. The latter type of co-occurrence has begun to be
explored in the notions of coupling (Martin 2000) and syndromes (Zappavigna
et al. 2008) introduced earlier. Coupling refers to meanings that are co-related
in a text, for example, relationships between evaluative and ideational meaning
integral to construing shared values in a community (Knight 2008). Syndromes
are larger-scale configurations involving multiple associations between different
meanings involved in the overall rhetoric being developed as the text unfolds.
I will now suggest how the domain of text visualization may offer assistance to
linguists trying to analyse unfolding textual patterns.
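As a first approximation, coupling within the same text sequence can be explored by counting which annotation labels co-occur in the same unit. The Python sketch below assumes that each clause has already been annotated manually with a set of labels; the data and the function name `count_couplings` are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter
from itertools import combinations

def count_couplings(units):
    """Count how often each pair of annotation labels co-occurs in the same unit."""
    pairs = Counter()
    for labels in units:
        # Sort so that each pair is always recorded in the same order.
        for a, b in combinations(sorted(set(labels)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical clause-by-clause annotations, listed in text order (assumed data).
clauses = [
    {"affect", "graduation"},
    {"judgement"},
    {"affect", "graduation", "judgement"},
    {"affect"},
]
couplings = count_couplings(clauses)
```

A count of this kind is still synoptic: it records that two meanings couple, but not where in the unfolding text they do so. Retaining the clause index alongside each pair would be one way to keep the sequencing available for visualization.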
Figure 10.2 Representing accumulation and shedding of meaning in a text as it
A Brief Introduction to Text Visualization
Text Visualization is an emergent field, a cousin of Scientific Visualization,
and often classified as part of Information or Knowledge Visualization. Those
interested in visualizing text often have a background in both computer science
and digital art, bringing both technical and aesthetic skills to the endeavour
(see, for example, the work of Martin Wattenberg, Ben Fry and Lee Byron). The field tends to naturalize
the encoding of language as a product, typically written ‘raw text’ repositioned
as ‘data’. The ‘raw text’ is a string of characters with lexical items as the focus of
inquiry. This attention to lexis is partly pragmatic, resulting from the difficulty
of training a computer to identify linguistic features of greater complexity (e.g.
clauses). Thus, most visualization techniques are ‘word’-based, often excluding
apparently irrelevant ‘stop words’ (often function words). They are also con-
stituency-oriented, chunking the text into units. These two limitations mean
that, to date, visualization techniques have not been used to explore meaning
beyond the clause in all its prosodic complexity. In short, the area has inherited
some of the bad habits of generative computational linguistics.
However, it is entirely possible that we may move beyond lexis and out of the
clause via text visualization. Visualization offers us an important opportunity to
gain synoptic views of the text that concurrently preserve a dynamic view of
logogenesis. This is because many visualization techniques allow the text to be
manipulated along multiple dimensions while allowing us to track multiple
kinds of relationships between features.
Visualization, in general, is concerned with finding methods of representa-
tion that best leverage the characteristics of human visual perception to make
complex data meaningful. A synoptic view of a text or corpus should assist the
viewer by lessening the cognitive burden of perceiving patterns in texts:
For any reader, the rather slow serial process of mentally encoding a text
document is the motivation for providing a way for them to instead use their
primarily preattentive, parallel processing powers of visual perception . . .
The goal of text visualization, then, is to spatially transform text information
into a new visual representation that reveals thematic patterns and relation-
ships between documents in a manner similar if not identical to the way the
natural world is perceived. (Wise et al. 1995:51–52)
Thus, a visualization will only be effective to the extent that it can profitably
make use of preattentive perceptual capabilities. In addition, as with all forms
of computing, ‘bad data in equals bad data out’. Careful attention needs to be
paid to which visualization strategies best accommodate the kinds of linguistic
relationships that we want to explore. We risk creating a representation that
resemiotizes our data in misleading ways.
The following sections present three visualization techniques that may be
useful in resolving the tension between gaining a synoptic perspective on the
text (the paradigmatic perspective) and capturing its unfolding (the syntag-
matic perspective). The overview of these three techniques is intended as
an invitation to the reader to think about how we might begin the task of
exploring the emergent complexity of logogenesis.
Text Arcs: Visualizing Repetition
Text arcs are a technique for summarizing repetition in long strings. They have
been used to visualize text, code (Wattenberg 2001), DNA (Spell & Brady 2003)
and music (Wattenberg 2001). Text arcs are a development of the dotplot
technique, a form of recurrence plot used in, for example, bioinformatics, to
graphically compare repetition in genomic sequences (Figure 10.3). Dotplots
represent repetitions in a similarity matrix by shading identical cells. The text
arc layout, on the other hand, creates links between repeated units using trans-
lucent arcs (Figure 10.4) and is thus able to preserve a view of the time sequence.
Figure 10.3 A DNA dotplot of a human zinc finger transcription factor (GenBank
ID NM_002383), showing regional self-similarity
Figure 10.4 A text arc representation of music (Wattenberg 2002:5)
Translucency allows for differing levels of aggregation to be represented on the
one diagram. The arcs overcome the problem of scalability, meaning that a long
text sequence can fit onto a single page or screen with the time series repre-
sented along the horizontal axis. In essence, text arcs make a text tractable by
providing a representation strategy that makes a time series manageable.
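The difference between the two layouts can be sketched in a few lines of Python. The example below is a deliberate simplification and not Wattenberg's algorithm: it treats only exact repetition of single units (here, words), whereas a practical implementation would also need to handle repeated subsequences and rendering. The function names `dotplot` and `arcs` and the sample string are hypothetical.

```python
def dotplot(units):
    """Similarity matrix: cell [i][j] is True when units i and j are identical."""
    return [[a == b for b in units] for a in units]

def arcs(units):
    """Link each unit to its previous occurrence as a (start, end) pair of positions."""
    last_seen = {}
    links = []
    for position, unit in enumerate(units):
        if unit in last_seen:
            links.append((last_seen[unit], position))
        last_seen[unit] = position
    return links

# Assumed sample text; positions count words from 0.
words = "the cat saw the dog and the dog saw the cat".split()
matrix = dotplot(words)
links = arcs(words)
```

The matrix grows with the square of the text length, while the list of arcs is laid out along a single axis of text positions, which is what allows the sequence to remain visible on one page or screen.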
An example of the text arc technique used dynamically is ‘animated text arcs’
such as that developed by Byron (2007). Byron’s system dynamically renders
text arcs while an audio-text unfolds to assist children with learning about
rhyming in poetry (see an example at
A simplified text to speech engine is used to break down the poem into
individual phonemes, so that ‘Once upon a time’ becomes ‘w-ah-n-s ax-p-aa-n
ey t-ay-m’ these phonemes can then be identified in patterns representative
of alliteration, rhyme and rhythm. (Byron 2007)
The steps beneath the arcs represent rhythm, while the links represent repeated
rhyme, alliteration and homophones (Figure 10.5). The rhyming engine
has also been used to create an interactive limerick writing assistance applica-
tion with which a child can begin to type a line and be prompted with informa-
tion about how many syllables remain to be used in that line. ‘As you exhaust
remaining syllables the words become shorter, if you begin to type a word,
words that begin with what letters you have typed so far are presented’ (Byron
Figure 10.5 Dynamic text arc visualization of ‘Hickory Dickory Dock’ (Byron 2007)
Text arcs have also been used to ‘represent visually different types of multi-
modal prosody so that a single text can be explored or comparisons can be
made between different texts’ (Zappavigna & Caldwell 2008). Caldwell and
Zappavigna (Chapter 11) explored how text arcs could be used in visualizing
the patterning of end-rhymes in rap music. They also showed how end-rhymes
unfold in popular rap music, providing a logogenetic view that allows the rhym-
ing style of rap artists to be compared in terms of how they unfold with the text.
In general, the text arc technique may be useful to discourse analysts investigat-
ing how repeated patterns differ across texts of the same or different genres.
Streamgraphs: Visualizing Multiple Data Series
Discourse analysts are usually interested in tracking the unfolding of more than
one linguistic feature as it varies over a text or across a corpus. A visualization
technique able to represent multiple features on the same diagram is the
streamgraph. Streamgraphs are a form of stacked graph, a display in which
multiple data series are positioned one on top of the other, and so offer a
way of fulfilling this requirement. Streamgraphs visualize multiple variables as
coloured ‘streams’ flowing with the time series on a single graph. Smooth
curves are generated by interpolating between points to produce the ‘flowing’
river of data. The technique has been used to visualize box office revenues
changing over time (Byron & Wattenberg 2008), changes in music listening
habits (Byron 2008), shifts in lexical themes in corpora with time (Havre et al.
2002) and changes in word association in Twitter status messages (Clark 2008b).
Figure 10.6 Extract from a streamgraph depicting a user’s ‘listening history’ (Byron 2008). Stream labels include Sufjan Stevens and Dj Shadow; the horizontal axis is marked in months.
For example, Figure 10.6 is a streamgraph depicting a user’s ‘listening history’,
which is the variation in artists that a user listens to over time. In this graph
each layer or ‘stream’ represents a different artist and the width of the layer
represents the frequency of the listening. Time is the movement from left to
right over an 18-month span. The developer describes the graph as ‘a sort of
virtual mirror, reflecting very personally significant events made visible by the
changes in listening trends’ (Byron 2008). The colour scheme, represented
here only in greyscale, was also used to indicate the level of interest a user had
in each artist:
I ultimately decided on a color scheme that highlighted both the point
of discovery of a musician as well as the user’s overall interest in them. Cool
colors represent a ‘core’ musician who the user is familiar with, while warmer
colors represent a more recent discovery. The most saturated the color, the
more interest the user has in that musician. (Byron 2008)
This kind of representation is potentially useful to linguists because it is a
technique that allows multiple types of instances to be displayed as unfolding in
time. The graph layout also provides a mechanism for representing informa-
tion about the ‘weighting’ of those instances visualized as the width of the
stream. Such weighting may be a simple frequency count of annotated items in
a text, or based on a more complex metric in accord with a particular linguistic
theory. If applied to an annotated text, the technique could be used to show
how different types of linguistic phenomena co-vary over time. So, for example,
we might export annotation series made in the video annotation software,
ELAN, that have been encoded against a time series, and after some post-
processing, visualize how modes such as gesture and facial expression are working
together in the video.
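The post-processing step might look something like the following Python sketch, which turns annotations into the layer boundaries that a streamgraph would draw. It makes several assumptions that are not part of any tool described here: annotations are taken to be simple (unit index, label) pairs rather than any particular export format, the weighting is a plain frequency count per window of text time, and the layers are stacked symmetrically around a central axis in the manner of a river layout. The function names are hypothetical.

```python
def windowed_counts(annotations, labels, n_units, window):
    """Count each label per window of text time; annotations are (unit_index, label)."""
    n_windows = -(-n_units // window)  # ceiling division
    series = {label: [0] * n_windows for label in labels}
    for unit_index, label in annotations:
        series[label][unit_index // window] += 1
    return series

def stack_symmetric(series, labels):
    """Stack the series around a central axis; returns (lower, upper) edges per layer."""
    n = len(series[labels[0]])
    totals = [sum(series[label][t] for label in labels) for t in range(n)]
    # Start half the total below the axis so the stack is centred on zero.
    lower = [-total / 2 for total in totals]
    layers = {}
    for label in labels:
        upper = [base + value for base, value in zip(lower, series[label])]
        layers[label] = (lower, upper)
        lower = upper
    return layers

# Hypothetical annotations against four units of text time (assumed data).
annotations = [(0, "gesture"), (1, "gesture"), (1, "facial"), (2, "facial"),
               (3, "facial"), (3, "gesture")]
labels = ["gesture", "facial"]
series = windowed_counts(annotations, labels, n_units=4, window=2)
layers = stack_symmetric(series, labels)
```

The width of each layer at a given window is the difference between its upper and lower edges, so the frequency count could be replaced by a theoretically motivated weighting without changing the layout code.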
An example of the streamgraph technique applied to textual data is Themeriver
(Havre et al. 2002), a system that uses the river metaphor to visualize ‘themes’
varying over time in a collection of documents. ‘Themes’ are represented by
colour-coded horizontal streams of varying width (Figure 10.7). Variance in
width indicates the ‘strength’ of a theme, defined as the frequency of particular
lexical items or the frequency of texts containing particular lexical items,
depending on the customization selected. When the former frequency metric
is adopted the system offers a topic-centered approach to visualizing a corpus
in contrast to document-centered approaches (Berry 2003). Themeriver has
been used to visualize the shifting of themes in a collection of texts by Fidel
Castro from 1959 until 1961 (Figure 10.7). In this visualization particular points
in time can be selected along the time series and annotated with labels (e.g.
‘Cuba and Soviet relations resume’ in Figure 10.7). This feature supports
hypothesizing about context in particular domains, for example, political dis-
course, in various ways:
Providing such context allows users to evaluate content in relation to issues
beyond those contained within the documents themselves. Continuing with
the earlier example of candidates running for election, we might ask how
the candidates’ themes change in response to news events. Do their speeches
appear to trigger news events? Does a candidate’s opinion have any apparent
impact on the stock market? (Havre et al. 2002:11)
However, as always, the old adage that ‘correlation does not equal cause’
needs to be kept in mind.
Streamgraphs have been used to visualize Twitter feeds (Clark 2008b). Twitter
is a micro-blogging service that allows users to post status messages in text of
up to 140 characters. Other users can subscribe to an individual’s Twitter feed to
receive these updates automatically. For example, Figure 10.8 shows a ‘Twitter
Topic Stream’ for the top 100 Twitter users (twits), which uses a variation of the
streamgraph technique to represent the distribution of the most ‘interesting’
capitalized words that occur in a database of Twitter messages for the top 100 users.
The developer employed a particular operationalization of ‘interestingness’:
The interestingness of a word was quantified by a function of the total refer-
ences as well as the burstiness of the word distribution.
The most ‘interesting’ words in this data are primarily product, techno-
logy, or technology event names with the exceptions of ‘Scoble’ and ‘Obama’.
This isn’t surprising since the top twitter users are early-adopters interested
in technology. I was a bit surprised at the large volume for Seesmic but
discovered that it is a company founded by Loic Le Meur, the 6th top twitter
user. (Clark 2008c)
Figure 10.7 ‘Theme River uses a river metaphor to represent themes in a
collection of Fidel Castro’s speeches, interviews and articles from the end of 1959
to mid-1961.’ (Havre et al. 2002:11)
However, it is clear that any number of linguistic criteria might be used, although
these are limited by what might be automatically detected. The interactive
application is available at
Figure 10.9 is an example by the same developer of the streamgraph tech-
nique applied to a single text, the novel ‘Tom Sawyer’ by Mark Twain, to visual-
ize the salience of particular characters throughout the novel. The streamgraph
technique allows an intuitive exploration of temporal changes across multiple
attributes; in this case the attributes are different characters in a novel. While
the accuracy of interpolated values, that is, new values that have been calculated
based on a discrete set of known values, might be questioned, the strategy offers
a useful qualitative view on sequential data such as text.
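To make clear what is being estimated, the Python sketch below interpolates between a small set of known values. It uses straight-line (linear) interpolation purely for simplicity; streamgraphs typically draw smoother curves, and the function name `interpolate` and the sample values are hypothetical.

```python
def interpolate(known, step):
    """Linearly interpolate between consecutive (x, y) points at intervals of `step`."""
    points = []
    for (x0, y0), (x1, y1) in zip(known, known[1:]):
        x = x0
        while x < x1:
            # Position of x between x0 and x1, applied to the change in y.
            points.append((x, y0 + (y1 - y0) * (x - x0) / (x1 - x0)))
            x += step
    points.append(known[-1])
    return points

# Assumed counts observed at text positions 0, 2 and 4.
known = [(0, 2.0), (2, 6.0), (4, 4.0)]
curve = interpolate(known, step=1)
```

The values at positions 1 and 3 were never observed; they are produced by the drawing method, which is why the in-between regions of a stream are better read qualitatively than as measurements.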
Figure 10.8 Topic Stream for a Twitter User (Clark 2008b)
Figure 10.9 Tom Sawyer Character Streamgraph (Clark 2008a)
Animated Networks: The Text as ‘Becoming’
The metaphor often used in SFL of the text unfolding (logogenesis) invokes
ideas about linear progression that might not be optimal for modelling a text’s
complexity. An alternative metaphor that might be invoked is that of an ani-
mated network. This type of visualization seems more in accord with viewing
the text as a complex adaptive system in which changes, particularly in initial
conditions, have repercussions throughout the system. These types of systems
are common in nature. An animated representation also invokes a metaphor of
‘becoming’ or propagation. Indeed it is through propagation that systems such
as evaluative language swarm in a text, forming prosodic rather than constitu-
ent structures (Zappavigna et al. 2008). Fry’s (2000a:19) concept of ‘organic
information visualization’ deploys related ideas, conceiving visualization as
functioning to employ ‘simulated organic properties in an interactive, visually
refined environment to glean qualitative facts from large bodies of quantitative
data generated by dynamic information sources’. His system, Valence (Fry
2000b), will be reviewed in this section. A simplified example of Valence reading
another of Mark Twain’s works, The Innocents Abroad, is available at www. (screen capture provided in Figure 10.10).
Figure 10.10 A simplified version of Valence reading ‘The Innocents Abroad’ by
Mark Twain (Fry 2000b)
Valence (Fry 2000b) is a system that visualizes word usage as a network
unfolding in a three-dimensional globe. The system renders words as ‘nodes’ in
the network and connects words with branches if they are adjacent in the text
so that ‘each time these words are found adjacent to each other, the connecting
line shortens, pulling the two words closer together in space’ (Fry 2000a:67).
An important aspect of the value of the system is this foregrounding of the
relationality of language:
The premise is that the best way to understand a large body of information
. . . is to provide a feel for general trends and anomalies in the data, by pro-
viding a qualitative slice into how the information is structured. The most
important information comes from providing context and setting up the
interrelationships between elements of the data. If needed, one can later
dig deeper to find out specifics, or further tweak the system to look at other
types of parameters. (Fry 2000b)
While the system only models one kind of relationship, lexical adjacency,
a logical extension would be to replace the input data, currently ‘raw text’
(Figure 10.11), with annotated data and to specify different kinds of relation-
ships between annotation series. This would occur at the ‘preprocessor engine’
stage of the information pipeline that Fry proposes as a software engineering
method (Figure 10.11).
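The underlying data structure can be sketched independently of the three-dimensional rendering. The Python example below is not Fry's implementation: it simply counts adjacency between tokens and applies a hypothetical shortening rule in which the length of a connecting line is inversely proportional to the number of times the pair has been found side by side. The tokens could equally be annotation labels rather than words, which is the substitution suggested above. The function names and the sample string are assumptions.

```python
from collections import Counter

def adjacency_network(tokens):
    """Count how often each unordered pair of distinct tokens occurs side by side."""
    edges = Counter()
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            edges[frozenset((left, right))] += 1
    return edges

def edge_length(count, rest_length=1.0):
    """Hypothetical shortening rule: each further adjacency makes the edge shorter."""
    return rest_length / count

# Assumed sample input; a list of annotation labels would work in the same way.
tokens = "the river and the sea and the river".split()
edges = adjacency_network(tokens)
length = edge_length(edges[frozenset(("the", "river"))])
```

Processing the tokens one at a time, and redrawing after each update, is what would turn this static count into an animated ‘reading’ of the text.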
Valence ‘reads’ the text by moving words that are used most frequently to
the edges of the globe and less frequent words to its centre (Figure 10.12).
Within the system, logogenesis is represented as a proximal–distal relationship
rather than movement from left to right across a page. The text ‘unfolds’ by
moving the current lexical item being ‘read’ to the centre front of the three-
dimensional space. In some versions of the system a small page is shown next
to the network with lines of the text appearing in sync with the ‘reading’
provided by the movement of the network. A Quicktime video of Valence
reading Mark Twain’s The Innocents Abroad is available at
Figure 10.11 The ‘information pipeline’, a software engineering method for the
Valence visualization (Fry 2000a:65)
The dynamic network representation is an attempt to overcome the problem
of how to make tractable hefty data sources such as texts that contain large
quantities of unique elements. As Fry explains:
A bar chart containing this many elements would be nearly worthless. It
would be too large to take in at a glance, or if shrunk to one’s field of view,
too small to understand. A focus + context technique like the Table Lens could
be used, but due to enormous disparities in word usage . . . less than 25% [in
the case of the text ‘The Innocents Abroad’ by Mark Twain] would be worth-
while at all, with the interesting features not even appearing until the top 5%.
(Fry 2000a:66)
Figure 10.12 Metaphors of space used in Valence
Figure 10.13 Two viewpoints on a network (Fry 2000a:68)
The three-dimensional visualization affords a way for the user to move around
inside the text and explore relationships between words. The user is able
to zoom in or view the network from different viewpoints (Figure 10.13),
depending on the relationships that they wish to investigate.
This chapter has presented three text visualization techniques that use particu-
lar representation strategies for making logogenesis both visible and tractable:
Text Arcs, Streamgraphs and animated networks. The first technique is useful
for discourse analysts exploring repeated patterns in texts, the second for rep-
resenting the unfolding of more than one linguistic feature on the same graph,
and the third for achieving a dynamic representation of features unfolding
in time. The techniques are examples of moving beyond a ‘bag of entities’
perspective on texts to embrace the complex sequencing of discourse. If we are
able to develop these techniques to cope with annotated systemic functional
input then we will have a powerful lens on our data. We will also have a useful
mechanism for communicating analyses of patterns that will allow us, in turn,
to develop functional theory about discourse patterning without factoring out
time [2] (Zhao 2009, forthcoming).
Effective annotation is the first step in visualization of features that cannot
be automatically extracted from text with current computational techniques.
This means that we require systems that support easy manual annotation of
texts by the linguist. Examples of existing text annotation systems developed by
Systemic Functional linguists include Systemics (Judd & O’Halloran 2001),
UAM Corpus Tool (O’Donnell 2008) and SysAM (Matthiessen & Wu 2001).
To date, there has been no work done on how the output of these systems
might be visualized. We might think of ourselves as biologists trying to map
the genome without a theory of sequencing.
I would like to acknowledge the support of the Australian Research Council
in funding this research.
1 These categories are taken from Appraisal theory (Martin & White 2005) and
refer respectively to language about emotional responses and language scaling
evaluation in a text.
2 By time, I do not refer to physical time but instead to ‘text time’ in the sense of
logogenetic unfolding.
Berry, M. (2003). Survey of text mining: Clustering, classification, and retrieval. New York:
Byron, L. (2007). Children’s poetry and lymerick visualizations. Retrieved 11 August
2008, from Lee Byron:
Byron, L. (2008). listening history – What have I been listening to? Retrieved
31 July 2008, from Lee Byron:
Byron, L. & Wattenberg, M. (2008). Stacked graphs – Geometry & aesthetics. Retrieved
8 July 2008, from Lee Byron:
Clark, J. (2008a). Tom Sawyer character streamgraph. Retrieved 11 August 2008, from
Neoformix: Discovering and illustrating patterns in data:www.neoformix.
Clark, J. (2008b). Twitter topic stream. Retrieved 31 July 2008, from Neoformix:
Discovering and illustrating patterns in
Clark, J. (2008c). Twitter topic streams for some top users. Retrieved 11 August 2008,
from Neoformix: Discovering and illustrating patterns in data:www.neoformix.
Fry, B. (2000a). Organic Information Design. Unpublished dissertation. Boston,
MA: Massachusetts Institute of Technology.
Fry, B. (2000b). Valence. Retrieved 18 July 2008, from Ben
Halliday, M.A.K. (1991). Towards probabilistic interpretations. In E. Ventola (Ed.),
Functional and systemic linguistics: Approaches and uses (pp. 39–61). Berlin and
New York: Walter de Gruyter.
Halliday, M.A.K. (1993). Language in a Changing World. Occasional Paper Number 13.
Toowoomba, Queensland: Applied Linguistics Association of Australia, Centre
for Language Learning and Teaching, University of Southern Queensland.
Halliday, M.A.K. & Martin, J.R. (1993). Writing science: Literacy and discursive power.
London: Routledge, Taylor & Francis Group.
Halliday, M.A.K. & Matthiessen, C. (2004). An Introduction to Functional Grammar.
London: Edward Arnold.
Havre, S., Hetzler, E., Whitney, P. & Nowell, L. (2002). ThemeRiver: Visualizing
thematic changes in large document collections. IEEE Transactions on Visualization
and Computer Graphics, 8(1), 9–20.
Judd, K. & O’Halloran, K. (2001). Systemics. Singapore: Singapore University Press
2001. (Educational software).
Knight, N. (2008). ‘Still cool . . . and american too!’: An SFL analysis of deferred
bonds in internet messaging humour. In N. Nørgaard (Ed.), Systemic Functional
Linguistics in Use, Odense Working Papers in Language and Communication (Vol. 29)
(pp. 481–502). Odense: University of Southern Denmark, Institute of Language
and Communication.
Lemke, J.L. (1991). Text production and dynamic text semantics. In E. Ventola (Ed.),
Functional and Systemic Linguistics: Approaches and Uses (pp. 23–38). Berlin and
New York: Mouton de Gruyter.
Martin, J.R. (2000). Beyond exchange: Appraisal systems in English. In J. Martin,
S. Hunston & G. Thompson (Eds), Evaluation in text: Authorial stance and the
construction of discourse (pp. 142–175). Oxford: Oxford University Press.
Martin, J.R. (2004). Mourning: How we get aligned. Discourse and Society, 15(2–3),
Martin, J.R. (2008, July 21–25). Chaser’s war on context: Making meaning. Paper
presented at the 35th International Systemic Functional Congress. Sydney.
Martin, J.R. & Rose, D. (2003). Working with discourse: Meaning beyond the clause.
London, New York: Continuum.
Martin, J.R. & White, P.R.R. (2005). The language of evaluation: Appraisal in English.
New York: Palgrave Macmillan.
Matthiessen, C. (2007). The ‘architecture’ of language according to systemic
functional theory: Developments since the 1970s. In R. Hasan, C. Matthiessen &
J. Webster (Eds), Continuing discourse on language: A functional perspective (Volume
two). London: Equinox.
Matthiessen, M.I.M. & Wu, C. (2001). SysAm. [Programs for computational
Analysis] Available at:
O’Donnell, M. (2008). Demonstration of the UAM CorpusTool for text and
image annotation. Proceedings of the ACL-08:HLT Demo Session (Companion volume)
(pp. 13–16). Columbus, OH: Association for Computational Linguistics.
Spell, R. & Brady, R. (2003). BARD: A visualization tool for biological sequence
analysis. Proceedings of the IEEE Symposium on Information Visualization. Seattle,
Wattenberg, M. (2001). The shape of song. Retrieved 31 July 2008, from Turbulence:
Wattenberg, M. (2002). Arc diagrams: Visualizing structure in strings. Proceedings
of the IEEE Symposium on Information Visualization (pp. 110–116). Boston, MA.
Wise, J., Thomas, J., Pennock, K., Lantrip, D., Pottier, M., Schur, A., et al. (1995).
Visualizing the non-visual: Spatial analysis and interaction with information
from text documents. Proceedings of the IEEE Information Visualization Symposium
(pp. 51–58). Atlanta, GA.
Zappavigna, M. & Caldwell, D. (2008). Visualising multimodal patterning. Paper
presented at Critical Dimensions in Applied Linguistics, July 4–6. Sydney.
Zappavigna, M., Dwyer, P. & Martin, J. (2008). Syndromes of meaning: Exploring
patterned coupling in a NSW Youth Justice Conference. In A. Mahboob &
K. Knight (Eds), Questioning linguistics (pp. 103–117). Newcastle: Cambridge
Scholars Publishing.
Zhao, S. (2010). Intersemiotic relations as logogenetic patterns: Towards the
restoration of the time dimension in hypertext description. In M. Bednarek &
J. Martin (Eds), New discourse on language: Functional perspectives on multimodality,
identity, and affiliation (pp. 195–218). London: Continuum.