All content in this area was uploaded by Vincent Labatut on Jun 24, 2022
Extraction and Analysis of Fictional Character Networks:
A Survey
VINCENT LABATUT, Laboratoire Informatique d’Avignon – LIA EA 4128, France
XAVIER BOST, Orkis, France and Laboratoire Informatique d’Avignon – LIA EA 4128, France
A character network is a graph extracted from a narrative, in which vertices represent characters and edges
correspond to interactions between them. A number of narrative-related problems can be addressed
automatically through the analysis of character networks, such as summarization, classification, or role detection.
Character networks are particularly relevant when considering works of fiction (e.g. novels, plays, movies, TV
series), as their exploitation allows developing information retrieval and recommendation systems. However,
works of fiction possess specific properties making these tasks harder.
This survey aims at presenting and organizing the scientific literature related to the extraction of character
networks from works of fiction, as well as their analysis. We first describe the extraction process in a generic
way, and explain how its constituting steps are implemented in practice, depending on the medium of the
narrative, the goal of the network analysis, and other factors. We then review the descriptive tools used to
characterize character networks, with a focus on the way they are interpreted in this context. We illustrate
the relevance of character networks by also providing a review of applications derived from their analysis.
Finally, we identify the limitations of the existing approaches, and the most promising perspectives.
Keywords: Information retrieval, Character network, Work of fiction, Narrative, Graph extraction, Graph
analysis, Natural language processing, Multimedia processing, Image processing.
Cite as:
Vincent Labatut and Xavier Bost. 2019. Extraction and Analysis of Fictional Character Networks: A Survey.
ACM Computing Surveys 52(5):89. https://doi.org/10.1145/3344548
Note: This is a longer and slightly updated version of the official ACM CS article, including the
supplementary material. In particular, it contains additional figures extracted from the surveyed
articles, and Table 3 has been completed with a number of bibliographic references published after
the original article.
Contents
Abstract
Contents
1 Introduction
2 Overview of the Extraction Process
3 Character Identification
4 Interaction Detection
5 Graph Extraction
6 Analysis and Applications
7 Discussion, Opportunities and Perspectives
A Methods for the Extraction of Fictional Character Networks
References
Authors’ addresses: Vincent Labatut, Laboratoire Informatique d’Avignon – LIA EA 4128, 339 chemin des Meinajaries,
Agroparc BP 91228, Avignon cedex 9, 84911, France, vincent.labatut@univ-avignon.fr; Xavier Bost, Orkis, 610 rue Georges
Claude, Pôle d’activité d’Aix-en-Provence, Aix-En-Provence, 13290, France, Laboratoire Informatique d’Avignon – LIA EA
4128, 339 chemin des Meinajaries, Agroparc BP 91228, Avignon cedex 9, 84911, France, xbost@orkis.com.
1 INTRODUCTION
The first works of fiction possibly date back to as far as the Paleolithic, and have constituted a
major part of human culture since then [316]. Nowadays, it is estimated that, on average, adults
are in contact with fictional stories for 6% of their time awake [316]. Besides their artistic and
entertainment aspects, fictions are assumed to fulfill various social and psychological purposes, e.g.
improvement of communication [200], development of empathy and collaboration skills [315, 351, 352],
elaboration of social norms [315], proxy to understand the real world [375], assessment of
social strategies [316], constitution of a collective memory [316]. It is therefore natural that they
are abundantly studied by academia, and that fiction-related business is a significant part of the
economy [18, 26, 260, 324].
A work of fiction takes the form of a narrative, i.e. a report of events telling a story. This report can
be conducted through a variety of communication means: text, speech, image, music, gesture, and
others, under a variety of forms: fables, tales, novels, plays, but also movies, TV series, video games,
cartoons, and comics. The collection of events explicitly reported by the narrative constitutes its plot.
These events are often ordered to form a chronological and/or causal chain [45]. By comparison,
the story contains all the plot events, plus those imagined or inferred by the audience, based on
both the plot and a number of contextual factors [45]. As an illustration, an ellipsis consists in
removing events from the plot without affecting the story, as the audience will interpolate the
missing parts. Put differently, the plot is what is told, whereas the narrative is how it is told, and
the story is what the audience perceives of the plot through the narrative.
Historically, narratives have been studied from the Aristotelian perspective, which argues that
the most important part of a narrative is its plot. However, more modern approaches focus on
characters instead [21], and consider that they are the agents that advance the plot through their
actions [263]. This is exemplified by Woloch in the field of literary analysis [412]. He defines the
notion of character-space as the narrative environment of characters in a novel, i.e. their position
relative to the other elements of the plot (place, time, other characters). In other words, this is how
characters are described in the narrative. The concept of character-system extends this notion to
the narrative as a whole, and corresponds to the union of all character-spaces. This approach has
notably been used to study and understand how writers and directors build a narrative.
In addition to the characters themselves, researchers have then started to take into account the
way characters interact, which is considered as the backbone of the narrative [77, 307]. In such a
context, graphs are a natural modeling paradigm, as they allow representing and studying a system
through the interactions of its constituting elements. A character network is a graph describing a
narrative by representing the characters through its vertices, and the interactions between them
through its edges. As we will see later, there are many methods to extract this type of network from
some raw data representing the considered work, depending not only on the nature of these data, but
also on the information that one wants to encode in the produced network, and on what one wants
to do with it eventually. Moretti has shown that such an approach allows handling Woloch’s
concepts more formally [268]. In a graph, the subgraph induced by a vertex and its neighborhood can be
seen as a projection of the social aspects of the notion of character-space, whereas the whole graph,
which contains all characters and their relations, represents the character-system [319]. Woloch
emphasizes the fact that character-spaces must be considered jointly, and this is precisely what
graphs, a naturally relational modeling framework, allow.
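To make the graph reading of character-space and character-system concrete, the following minimal Python sketch builds a toy character-system and extracts the subgraph induced by one character and its neighborhood. The character names, the edge list, and the function names `build_graph` and `character_space` are illustrative assumptions, not taken from any surveyed article.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]  # adjacency sets of an undirected graph


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected graph (the character-system) from interaction pairs."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def character_space(graph: Graph, character: str) -> Graph:
    """Return the subgraph induced by a character and its direct neighbors."""
    nodes = {character} | graph.get(character, set())
    # Keep only the edges whose two endpoints both belong to the neighborhood.
    return {u: graph.get(u, set()) & nodes for u in nodes}


# Hypothetical toy data: each pair is one interaction between two characters.
edges = [
    ("Holmes", "Watson"),
    ("Holmes", "Moriarty"),
    ("Holmes", "Lestrade"),
    ("Watson", "Lestrade"),
    ("Watson", "Mary"),
]
character_system = build_graph(edges)
watson_space = character_space(character_system, "Watson")
```

Here `watson_space` keeps Watson, his three neighbors, and the Holmes-Lestrade edge between two of them, while Moriarty, who never interacts with Watson in this toy data, is left out.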
1.1 Value of Character Networks
This relevance of graphs for modeling works of fiction is illustrated by the number of articles
dealing with character networks in the literature, and the variety of purposes for which they are
used. We distinguish three categories of such articles.
First, in the context of Narrative Analysis, character networks are generally extracted manually,
for a very small number of narratives (typically, a single one). Authors use them in a “distant
reading” fashion to obtain a simplification of the plot [268], characterize the plot structure at various
levels [320], detect relevant patterns and narrative events, identify character roles (e.g. protagonist
vs. antagonist) or particularly important characters [319], assess the validity of literary theories [98, 104],
and produce graphical representations [289, 396, 413]. In addition to the description of
individual plots, they are also used to compare them, for example among episodes of a given
series [135], or works belonging to the same genre [320], period [167] or author [311]. In other
social science domains, character networks are also used for educational purposes [41], and to
study certain psychological mechanisms [47].
Second, another category of works also adopts a descriptive and comparative approach, but
relies on a Complex Systems paradigm. These authors consider that character networks are a type
of Complex Network, and as such they apply the standard tools developed to analyze them [307],
and/or propose new ones [74]. The network itself is the object of the study. As with Narrative
Analysis, these works generally consider a few narratives, as the networks are often extracted
manually. Many articles of this type compare the topological properties of character networks
with those of other kinds of complex networks, e.g. real-world social networks [10, 241], random
models [74], or other fictions [373].
Third, a large number of works originate from the Artificial Intelligence domain. They focus
more on automating the network extraction process, which requires solving various text, speech,
image, and/or video processing problems, depending on the media used. Compared to the two other
categories, this allows using much larger corpora. These works also consider character networks as
models of the plot, and take advantage of this to solve higher-level problems: role detection [178],
genre classification [16, 368], storyline detection [407], story segmentation [409], movie scene
segmentation [231], video abstraction [386], recommendation systems [219], and others. The
results obtained by solving some of these problems can be used to treat higher-level tasks, e.g. the
detected roles can help summarize a plot. Certain authors directly relate character networks to
novel fields such as movie information retrieval [296], which consists in obtaining and exploiting
valuable information from collections of movies.
Character networks even reach the mainstream audience, mainly for their relevance as a
visualization tool. Numerous non-academic or educational Web pages display character graphs extracted
from popular culture works, such as Star Wars [30, 120, 121, 323], Harry Potter [56, 205, 313], The
Witcher [165, 377], Marvel movies [186], Love Actually [314], Game of Thrones [82, 127, 153, 187, 240],
Star Trek [299], The Simpsons [140, 180, 308, 395], South Park [141], The Office [92, 287, 354],
Seinfeld [363], Curb Your Enthusiasm [108], Grey’s Anatomy [233, 406], and Friends [11, 37, 333, 346];
as well as from classics like Sherlock Holmes [54], European drama [113, 422] and Shakespeare’s
plays [55, 134, 160, 300].
1.2 Specific Features of Fiction Works
The extraction and use of character networks concern all types of works, including non-fictional
ones. For instance, certain authors focus on biographies [392], professional meetings [122], journal
articles [367], and broadcast news [398]. So in theory, it is possible to apply these methods developed
for non-fiction to deal with fiction. However, in practice this does not necessarily lead to good
results, because works of fiction possess some specific features, absent from non-fiction. These can
result in specific issues, whose resolution requires suitable processing, but they can also correspond
to additional information one can leverage through appropriate methods to improve performance.
We give examples of both aspects in the following.
First, there can be differences in the structure of the narrative. For instance, plays and TV or movie
scripts are semi-structured, in the sense that scenes are explicitly bounded and speakers are explicitly
named. This feature can be harnessed during network extraction [279], and does not appear in
non-fictional narratives (or most other types of fictional ones, for that matter). In video-based narratives,
the set of camera and editing rules, conventions and guidelines, sometimes metaphorically called
film grammar [49], is quite different in fiction and non-fiction works. For instance, the so-called
180-degree rule states that during a scene, the relative positions of the characters on the screen must not
change. The shot alternation (or shot/countershot) rule is particularly used during conversations: it
specifies that consecutive shots alternately show the involved characters. Yeh et al. [416] leverage
both of them to improve character detection in movies. Comics and animated films stand apart, as
in their cases, the medium itself is unlike anything related to real life. Characters can be highly
deformed humans, non-anthropomorphic beings, or even inanimate objects, which renders ineffective
the methods designed to detect faces or persons in photographs [18] or live action movies [390].
Moreover, the structure of comics narratives is unique, in the sense that they include information
under a variety of forms encoded in both text (captions, speech balloons, onomatopoeia) and
drawings (pose, graphical conventions) not found in other media, and whose extraction requires
specific methods [18].
Second, there are generally significant stylistic differences. In texts, literary prose is considered
as more complex than journalistic prose [97], and even more so when the work is older [136].
One of the effects of style is actually to give a unique identity to the work, and to distinguish
it from both non-fictional works and other fictions [46]. Stylistic differences are so marked that
it is possible to assign works automatically to their creators [16]. They significantly affect the
performance of generic methods on a variety of NLP tasks: plot modeling, character detection
and story generation [96], text summarization [183], named entity recognition (NER) [17, 389],
co-reference resolution [198, 389]. There are a number of reasons for this drop in performance.
For instance, for character detection in novels [96]: many characters are relatives and share the
same last name; they bear nicknames; some fictional characters are inanimate objects in real life;
writers use specific honorifics corresponding to complex, possibly outdated and even imaginary
social conventions; and they craft names in order to convey certain meaning or function. In fact,
this task is difficult even for humans, enough to require a specific annotation process [388]. For
co-reference resolution, the problem comes from longer sentences, more frequent use of pronouns
and direct speech, more numerous and shorter co-reference chains [198]. Similarly to text, certain
characteristics of fiction works make generic audiovisual processing tools inefficient [416]. For
instance, movie directors use a variety of complex, possibly genre-related, editing techniques [230].
At a lower level, the same face can appear under a variety of lights, colors, angles, expressions, and
other deformations, which do not correspond at all to the very controlled conditions under which
non-fictional works are recorded (e.g. news or talk show). Speech-wise, conversations are subject
to background noise or music, involve more participants, and a way of speaking that is unlike that
found in other forms of audiovisual productions [57].
Third, fictions often are closed worlds, in the sense that they are self-contained and involve
recurring entities, possibly with made-up names. Generic tools ignore this characteristic, which
sometimes can help handling certain tasks [176] such as alias resolution (finding the different
variants of a character’s name). On the contrary, most generic tools rely either on a training corpus
or on external databases: in both cases, the described entities are likely to be completely different
from those occurring in a fiction. For example, a standard approach when performing face-matching
in news is to leverage pictures from press articles and their captions: this cannot be done for movies
containing fictional characters [433]. If anything, this method is more likely to return the actor’s
rather than the character’s name. Similarly, many NER systems rely on gazetteers or services such
as DBPedia, Wikidata, or YAGO [46], which are likely to include only the main characters (if any)
of the considered fiction. For instance, none of the proper nouns used in Tolkien’s The Lord of the
Rings would be present in a standard list of first names or places.
Fourth, there is also a difference in the way characters interact in works of fiction, compared to
real life [319]. A real-world social network represents a self-organized system, whose structure
emerges from the interactions between some agents acting according to their own agenda. By
comparison, the writer or director controls all actions of fictional characters, and arranges them
according to a plot. Put differently, real-world networks are the result of microscopic processes,
whereas fictional ones are caused by a macroscopic process [319]. There is no reason to suppose that
the writer tries to mimic actual social relationships when producing the work of fiction. As we will
see later, studies show that this is generally not the case, as numerous character networks extracted
from fictions do not exhibit realistic topological properties. This is because other constraints
come into play, such as the intelligibility and appeal of the plot. Analyzing a different structure is
likely to require specific tools, compared to real-world networks (including non-fictional character
networks).
1.3 Perimeter and Organization of the Survey
The first publications related to fictional character networks date back to the early 2000s, e.g. [10, 362].
As explained before, both extracting and leveraging these networks involve solving specific
problems. However, there is no synthetic review describing the solutions proposed in the literature.
With this survey, we want to fill this gap. Not only do we consider articles directly related to
fictional character network extraction and/or usage, but also articles focusing on certain specific
steps of this process (without necessarily trying to deal with such networks). Note that certain
authors extract other types of graphs (non-character-based) from works of fiction, such as scene
transition graphs [419], or narrative structure graphs [177], but we do not include them in this
review.
Our contributions include the description of the tools currently available and the approaches
currently adopted to detect characters and their interactions from all forms of narratives, as well as
the methods leveraging them to build character networks. We also contribute by identifying open
problems at all levels of the extraction and analysis processes, and proposing perspectives to solve
them.
Terminology-wise, we need to distinguish scientific work from work of fiction. For this purpose,
we will use the words author and article to refer to scientific authors and their work, whereas writer
(or director, playwright, or any medium-specific term) and simply work will refer to artistic authors
and their works of fiction.
The rest of the survey is organized in two parts. We first focus on the process of extracting a
character network from a work of fiction. We introduce it in a generic way (Section 2), before
describing its three main steps: the identification of characters and their occurrences (Section 3), the
detection of their interactions over the narrative (Section 4), and the extraction of the graph itself
(Section 5). In the second part, we focus on how to leverage character networks. We first discuss the
descriptive tools used in the literature to characterize them, and then examine a selection of more
elaborate tools developed to solve specific problems (Section 6). Finally, we identify the current
main issues of the field, and conclude with some perspectives (Section 7).
2 OVERVIEW OF THE EXTRACTION PROCESS
The process of extracting character networks from works of fiction depends a lot on the form of the
considered narrative, e.g. novels are not treated like movies. In order to give the reader a general
overview, in this section we abstract away from these differences and present this process in a
very generic way. In the rest of the survey, on the contrary, we focus on their differences.
We consider that this process consists of three main steps, represented in Figure 1: 1) the
identification of characters; 2) the detection of their interactions; and 3) the extraction of the graph
proper. Each of them can be conducted in a number of ways, depending not only on the nature of the
considered narrative, but also on the planned usage of the character network, and on certain
methodological choices.
[Figure 1: flow diagram. 1. Identify characters: the work of fiction goes through "Detect occurrences" (character occurrences), then "Unify occurrences" (unified occurrences, t=1 to t=4). 2. Detect interactions: co-occurrences, conversations, mentions, direct actions, or affiliations, producing an interaction list. 3. Extract graph: "Filter/merge characters" (filtered list), then either full temporal integration (static network) or partial temporal integration (dynamic network, t=1, t=2, ..., t=θ).]
Fig. 1. Overview of the generic character network extraction process. Figure available at
10.6084/m9.figshare.7993040 under CC-BY license.
The first step is the most dependent on the form of the narrative, as it starts with the raw material,
i.e. the work of fiction itself. We distinguish two substeps. The first is to detect occurrences of
characters in the narrative, for instance looking for people’s names in a novel, or looking for faces in
a movie. The second is to unify these occurrences, i.e. to determine which ones correspond to the
same character. In a text, the same character can appear under different names, whereas in a movie,
the same face can be shown under a variety of scales, colors, lights, and angles. The output of this
step takes the form of a chronological sequence of unified character occurrences.
The second step consists in detecting interactions between characters. Note that it is sometimes
more efficient or convenient to conduct parts of this process during the first step, but this is generally
not the case. We identify five different definitions for the notion of interaction. Many authors
consider that a simple co-occurrence between two characters is enough to infer an interaction
between them. Others prefer to identify explicit interactions, which is generally a more difficult
process. One way of doing this is to take into account conversations, and to consider that two
characters interact when one talks to the other. With certain forms of narrative such as plays, in
which speakers are given, this task is relatively straightforward. An alternative is to focus on the
content of the conversations, and to leverage mentions, i.e. situations where one character talks
about the other. Some authors consider all sorts of actions one character can perform on the other
(besides conversing). This is particularly the case with novels, a form of narrative in which such
actions are explicitly described. Finally, certain authors do not focus on actions and prefer to use
affiliations, i.e. explicit or inferred social relationships such as being married, being relatives, or
working together. Note that it is possible to combine these definitions of the notion of interaction,
for instance by looking for both co-occurrences and conversations.
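The simplest of these definitions, co-occurrence, can be sketched in a few lines of Python. The sketch assumes that the first step has already produced unified occurrences tagged with a narrative unit (a scene, a page, a chapter); the data, the choice of unit, and the function name `cooccurrence_interactions` are hypothetical and only serve as an illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical unified occurrences: (narrative unit index, character name).
occurrences: List[Tuple[int, str]] = [
    (1, "Holmes"), (1, "Watson"),
    (2, "Holmes"), (2, "Lestrade"), (2, "Watson"),
    (3, "Moriarty"),
]


def cooccurrence_interactions(
    occurrences: List[Tuple[int, str]],
) -> List[Tuple[int, str, str]]:
    """Infer one interaction per pair of characters sharing a narrative unit."""
    by_unit: Dict[int, Set[str]] = defaultdict(set)
    for unit, character in occurrences:
        by_unit[unit].add(character)

    interactions = []
    for unit in sorted(by_unit):  # keep the chronological order
        # Sorting the names gives each undirected pair a single canonical form.
        for a, b in combinations(sorted(by_unit[unit]), 2):
            interactions.append((unit, a, b))
    return interactions


interaction_list = cooccurrence_interactions(occurrences)
```

In this toy data, unit 3 yields no interaction because Moriarty appears alone. A conversation-based or action-based detector would produce the same kind of chronological list, only with a stricter criterion for adding a pair.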
The output of the second step is a chronological sequence of interactions between characters.
The third step is therefore relatively generic, as it relies only on this list and is thus independent
of the nature of the original narrative. We distinguish two substeps. The first, which is optional,
consists in simplifying this sequence by filtering and/or merging some of the characters under
certain conditions. For example, when considering co-occurrences, some authors merge characters
that always appear together: this allows simplifying the network. The second substep defines how
the graph is extracted through temporal integration, i.e. the aggregation of the previously identified
interactions. There are a number of approaches for this purpose, which we separate into two groups:
those performing a full integration and therefore leading to a static network, and those performing
only a partial integration, and producing a dynamic network.
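The difference between full and partial temporal integration can be illustrated with a short Python sketch. It assumes a chronological interaction list of (time, character, character) triples and uses fixed-size, non-overlapping time windows for the dynamic case; the window scheme, the weighting by interaction counts, and all names are illustrative assumptions, and the surveyed methods use a range of other aggregation schemes.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Interaction = Tuple[int, str, str]  # (time index starting at 1, character, character)

# Hypothetical chronological interaction list.
interactions: List[Interaction] = [
    (1, "Holmes", "Watson"),
    (2, "Holmes", "Lestrade"),
    (2, "Watson", "Holmes"),
    (4, "Holmes", "Moriarty"),
]


def edge_key(a: str, b: str) -> Edge:
    """Canonical form of an undirected edge."""
    return (a, b) if a <= b else (b, a)


def static_network(interactions: List[Interaction]) -> Dict[Edge, int]:
    """Full integration: one weighted graph aggregating the whole narrative."""
    weights: Counter = Counter()
    for _, a, b in interactions:
        weights[edge_key(a, b)] += 1
    return dict(weights)


def dynamic_network(interactions: List[Interaction], window: int) -> List[Dict[Edge, int]]:
    """Partial integration: one weighted graph per non-overlapping time window.

    Windows without any interaction are skipped in this sketch.
    """
    slices: Dict[int, Counter] = defaultdict(Counter)
    for t, a, b in interactions:
        slices[(t - 1) // window][edge_key(a, b)] += 1
    return [dict(slices[k]) for k in sorted(slices)]


static = static_network(interactions)
dynamic = dynamic_network(interactions, window=2)
```

With `window=2`, the first time slice holds the Holmes-Watson and Holmes-Lestrade edges and the second holds only Holmes-Moriarty, whereas the static network merges all three edges into a single graph.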
3 CHARACTER IDENTIFICATION
Character identification consists in detecting which characters appear in the considered narrative,
and when exactly they appear in this narrative. As mentioned before, the form under which
characters appear in the narrative varies much depending on the medium. In the case of text,
they can be represented in three ways [97, 385]: proper nouns (e.g. “Sherlock Holmes”), pronouns
(e.g. “He”), and nominals, i.e. anaphoric noun phrases referring to characters (e.g. “The consulting
detective”). For videos, they either can appear onscreen, or be mentioned in the audio stream (again
as a proper noun, pronoun, or nominal). In comics, characters can either appear as drawings or be
mentioned in the text (again, under the same three forms).
Automating character identification is quite challenging, which explains why many non-specialists
prefer to perform this task manually. We describe this manual approach separately, as there are
various ways of proceeding (Section 3.1). But our focus is rather on automatic approaches, for which
we distinguish two subproblems: first, finding character occurrences in the narrative (Section 3.2);
and second, determining which of these occurrences represent the same character (Section 3.3).
For both subproblems, there are two very different categories of approaches, which depend on
whether characters are represented in a textual vs. audiovisual way. Note that this dichotomy does
not necessarily match the type of narrative, for instance a movie can be treated as a video or as a
text (through its transcript or script).
Besides strict character identification, certain authors perform some additional processing in
order to extract individual attributes to describe characters (e.g. age, gender), and/or to filter them.
We discuss this in Section 3.4.
3.1 Manual Approaches
Some authors adopt a fully manual approach to detect character occurrences, in which case there
is no need to distinguish occurrence detection from occurrence unification, as both tasks are
conducted at once.
3.1.1 Direct Annotation. The most widespread method is direct annotation, in which the authors
themselves annotate the narrative they want to study, e.g. [47, 207] for
novels, [241, 265] for myths, [268, 355] for plays, [77, 382] for movies, and [28, 407] for TV series.
Opting for such a manual approach can be due to technical limitations, e.g. the authors do not
have access to efficient automatic methods [268]. However, it can also be a methodological choice,
e.g. to better focus on the assessment of the other steps of the network extraction and/or analysis
process [5, 6, 28].
Though rarely mentioned explicitly [135], the authors that adopt manual approaches for later
extracting co-occurrence networks (cf. Section 4.1) often ignore mentions of characters which are
just named by others, but do not physically participate in the action, as in [307]. It is also not clear
exactly how annotators deal with occurrence unification. However, context generally suggests
they perform this task, and do so manually (e.g. [268]), as the extra cost is marginal (cognitively
speaking).
3.1.2 Character Index. Instead of doing the annotation work themselves, certain authors take
advantage of predefined resources, which are also manually constituted. For certain classic novels,
literary experts have constituted so-called character indices, indicating at which point of the plot
each character appears. This is for instance the case for Rousseau’s Les confessions in [317–319],
and Park’s Toji in [293, 294]. Several authors proceed similarly for comics [10, 130], as they study
the Marvel universe by taking advantage of the Marvel Chronology Project¹, an online database
listing the occurrences of all significant Marvel characters.
Even if such indices are elaborated by experts of the considered work of fiction, it is difficult, or
even impossible to assess their reliability. Moreover, it is important to notice that they impose a
predefined level of precision on the rest of the extraction process. For instance, character occurrences
are expressed in terms of pages for Les confessions, and comic issues for the Marvel universe. This lack
of control can be considered as a limitation, since the level of precision affects certain subsequent
extraction steps (e.g. it constrains the selection of a narrative unit when extracting co-occurrence
networks, cf. Section 4.1).
As with direct annotation, the elaboration of indices is likely to include some form of character
occurrence unification. However, it is difficult to determine whether it is the case for a given index,
and according to which procedure exactly. Indeed, this task is conducted by the creators of the index,
not the authors of the study that take advantage of this index for network extraction. Moreover,
the index is often not properly documented regarding this aspect. Only very few articles mention
occurrence unification, but they nevertheless reveal some differences in the way they handle this
task. For instance, in [317, 319], the character index considers all variants of the character names,
but not pronominal references, whereas the index used in [318] includes both.
3.1.3 Crowdsourcing. In practice, it is hard to handle more than a few works of fiction when
using either of the previous approaches (direct annotation vs. predefined character index). A
workaround is to turn to crowdsourcing, as Rochat & Kaplan do in [320] to constitute their own
indices. Interestingly, this study is also characterized by its multimedia nature, as the authors
consider a corpus of science-fiction works including novels, comics, movies, TV series, and video
games. The manual approach has the advantage of allowing a more uniform network extraction
process over the variety of considered media, and therefore makes it possible to compare them.
The authors can also select their own level of precision during the elaboration of the indices: pages for
novels and comics, and intervals of one minute for movies and TV series. For video games, they
experiment with three different base materials: walk-throughs (i.e. texts explaining how to finish
the game), which are treated like novels; cinematic scenes, which are treated like other videos
(movies and TV series); and transcriptions of these scenes, which are treated like scripts.
¹ http://www.chronologyproject.com/
3.2 Detection of Character Occurrences
We now switch to approaches that are at least partially automated. As mentioned before, the process
of character identification largely differs depending on whether the narrative is visual (Section 3.2.3)
or textual, which is why we separate them in our description. Moreover, certain texts such as plays
and scripts possess a structure which can be leveraged for character occurrence detection, so we
distinguish such semi-structured text (Section 3.2.2) from free text (Section 3.2.1).
3.2.1 Free Text. As mentioned before, a character can appear under three forms in text: proper
noun, nominal, and pronoun. The methods used in the literature all handle the first form, but not
necessarily the two others, as detecting them is generally a much harder problem, and they are often
not considered as informative. A simple way to detect character names is to use a predefined list of
these names and proceed through exact matching [33, 44, 138, 159, 212]. Such a list is generally
constituted manually, either by the authors themselves or through an external source such as the
Wikipedia page of the considered novel. Constituting it is not a trivial task, as characters can be
referred to through a variety of aliases, i.e. variations of their name. For instance, Sherlock Holmes
can also be called simply “Sherlock”, “Holmes”, or “Mr. Holmes”. Some authors perform a manual
verification after the exact matching step [138].
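As an illustration, exact matching against a predefined alias list could be sketched as follows.
This is a minimal sketch and not the implementation of any surveyed work: the alias dictionary,
the function name, and the longest-alias-first rule are assumptions made for the example.

```python
import re

# Hypothetical alias list: canonical character -> known aliases.
ALIASES = {
    "Sherlock Holmes": ["Sherlock Holmes", "Mr. Holmes", "Sherlock", "Holmes"],
    "John Watson": ["John Watson", "Dr. Watson", "Watson"],
}

def find_occurrences(text, aliases):
    """Return sorted (start, end, character) spans found by exact matching."""
    # Try longer aliases first, so that "Sherlock Holmes" wins over "Holmes".
    pairs = sorted(
        ((alias, name) for name, forms in aliases.items() for alias in forms),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    spans = []
    for alias, name in pairs:
        # Only match whole words, not substrings of longer words.
        pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            # Skip matches nested inside an already accepted longer alias.
            nested = any(s <= match.start() and match.end() <= e for s, e, _ in spans)
            if not nested:
                spans.append((match.start(), match.end(), name))
    return sorted(spans)

text = "Sherlock Holmes smiled. Watson frowned, but Holmes said nothing."
occurrences = find_occurrences(text, ALIASES)
characters = [name for _, _, name in occurrences]
```

Note that this sketch blindly maps “Holmes” to a single character, whereas such an alias can be
ambiguous in practice, which is one reason why some authors verify the output manually.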
Detecting character names can be viewed as a specific version of the Named Entity Recognition
(NER) problem. NER consists in finding expressions in the text corresponding to proper nouns [278],
and identifying their category (e.g. Location, Person, Organization). A number of authors apply
off-the-shelf NER tools to novels, e.g. [7, 66, 98, 152, 358], and then only retain the Person entities.
Incidentally, those are generally much more frequent than other proper nouns in literary texts [391].
Dekker et al. perform an empirical comparison of four such tools in the context of character
extraction based on novels [85]. It is possible for a NER tool to assign different categories to distinct
instances of the same string, because of contextual differences. For instance, “France” is a country,
but also a first name. However, in the context of novels, such a situation is likely an error: it is
generally agreed upon that a novel is a small, self-contained world [16], and that the writer would
not confuse the reader by using the same name to denote entities of different types (such as a
person and a place) [17]. A straightforward solution to the multiple category issue is then to keep
the majority category, as in [16, 17]. Valls-Vargas et al. specifically train a classifier to distinguish
characters from other types of mentions [390, 390]. Very few authors consider other categories
in addition to Person, such as Location [213, 252] and Organization [15], which eventually results in
a network with multiple types of vertices.
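The majority-category rule mentioned above can be illustrated with a short sketch. The input
format (a list of surface string and category pairs, as a generic NER tool might output) and the
function name are assumptions made for this example.

```python
from collections import Counter, defaultdict

def majority_category(mentions):
    """Map each mention string to the category it most often receives."""
    counts = defaultdict(Counter)
    for surface, category in mentions:
        counts[surface][category] += 1
    return {surface: c.most_common(1)[0][0] for surface, c in counts.items()}

# Hypothetical NER output: (surface string, predicted category).
mentions = [
    ("France", "Person"), ("France", "Location"), ("France", "Person"),
    ("London", "Location"),
]
resolved = majority_category(mentions)
# Only the strings whose majority category is Person are kept as characters.
persons = sorted(s for s, category in resolved.items() if category == "Person")
```

Here the isolated Location label of “France” is treated as an error and overridden by the two
Person labels, in line with the assumption that a novel does not reuse a name across entity types.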
Fiction texts have certain characteristics which are leveraged by some authors, either to perform
some post-processing after having applied an off-the-shelf NER tool, in order to find missed mentions
and/or discard incorrectly detected ones, or to design new fiction-specific NER tools. A simple
method is to remove infrequent names, as they are likely to be errors. For instance, Elsner [96] and
Sack [326] remove names appearing fewer than five times. Some authors also perform a manual
verification to fix the errors of the automatic tools [326]. Some automatic approaches use honorifics
(titles such as “Sir” or “Madam”), generally by relying on manually predefined resources. One can
take advantage of a list of honorifics to detect them in the text and check the surrounding text for
character names [16, 17], or look for a set of patterns describing the various possible combinations
of honorifics, first names and last names [385]. Some approaches proceed similarly with action
verbs, as only characters are likely to be their subjects. For instance, Ardanuy & Sporleder [16]
use a manually constituted list of speech verbs (e.g. to say, to discuss), while Goh et al. leverage
WordNet to focus on human action verbs only [131]. Zhang et al. also use the grammatical structure
of the sentence through part-of-speech (PoS) tagging [431]. Finally, some approaches consist in
looking for relations of possession (through genitive marks, such as “’s” in English) [385], as only
characters are supposed to own things. These approaches are more robust than generic NER tools,
in the sense that they allow detecting non-human characters behaving as humans [182]. In [46],
Bornet & Kaplan propose an ensemble-based method associating most of the aspects listed above,
for French texts. It relies on the vote-based combination of the outputs of six basic classifiers. Each
one focuses on the detection of a specific type of clue: presence of honorifics, position in the
sentence, semantics of neighboring words, grammatical structure, occurrence in external resources,
and presence of nearby explicit quotes.
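Two of the heuristics above, honorific-based detection and the removal of infrequent names, can
be combined in a small sketch. The honorific list, the regular expression, and the function name
are illustrative assumptions and do not reproduce the resources of any cited work.

```python
import re
from collections import Counter

HONORIFICS = ["Mr", "Mrs", "Miss", "Sir", "Madam", "Dr"]  # illustrative list

# An honorific, an optional period, then one or two capitalized words.
PATTERN = re.compile(
    r"\b(?:" + "|".join(HONORIFICS) + r")\.?\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)"
)

def candidate_names(text, min_count=5):
    """Count names introduced by an honorific and drop the infrequent ones."""
    counts = Counter(match.group(1) for match in PATTERN.finditer(text))
    return {name: n for name, n in counts.items() if n >= min_count}

text = ("Mr. Darcy bowed. Sir William laughed, and Mr. Darcy left with "
        "Miss Bingley. Later Mr. Darcy wrote.")
names = candidate_names(text, min_count=2)  # low threshold for this toy text
```

The default threshold mirrors the value of five occurrences cited above; the toy call lowers it
only because the example text is very short.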
These last methods are likely to return not only proper nouns, but also nominals (anaphoric noun
phrases referring to characters). Some authors propose methods specifically designed to detect
these nominals, generally through regular expression matching. Elson et al. [99] look for structures
of the form: a determiner (article, possessive, number...), an optional modifier (e.g. an adjective), and
a head noun (not necessarily a proper noun). They manually compile lists of determiners and head
nouns based on their corpus and external linguistic resources such as WordNet. They use them to
detect the determiner and head noun first, and consider the text located in between as the modifier.
The task of detecting pronouns is more or less difficult depending on the considered language. For
English, exact matching based on a manually defined list is a simple and efficient approach [97].
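The determiner, modifier, head noun structure described above lends itself to a regular
expression. The following sketch is only indicative: the two word lists are tiny placeholders for
the manually compiled ones, and the modifier is restricted to a single word for simplicity.

```python
import re

DETERMINERS = ["the", "a", "an", "his", "her", "two"]      # illustrative list
HEAD_NOUNS = ["detective", "doctor", "brothers", "woman"]  # illustrative list

NOMINAL = re.compile(
    r"\b(?P<det>" + "|".join(DETERMINERS) + r")\s+"
    r"(?:(?P<mod>[a-z]+)\s+)?"  # optional one-word modifier (a simplification)
    r"(?P<head>" + "|".join(HEAD_NOUNS) + r")\b",
    re.IGNORECASE,
)

def find_nominals(text):
    """Return (determiner, modifier or None, head noun) triples."""
    return [
        (m.group("det").lower(), m.group("mod"), m.group("head").lower())
        for m in NOMINAL.finditer(text)
    ]

found = find_nominals("The famous detective greeted her doctor and a woman.")
```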
3.2.2 Semi-Structured Text. A number of narratives can take the form of a script: theatrical plays,
movies, TV series. A script is essentially a conversation-based text, with specific structure and
formatting described as semi-regular by certain authors [4]: scene boundaries are clearly indicated,
the characters involved in a scene are explicitly listed at its beginning in uppercase, and the name
of the character speaking a line is indicated right before it, also in uppercase. When the script
is properly formatted and structured, it is relatively straightforward to extract this information
automatically. Authors have proposed methods based on exact string matching [87, 178, 277],
regular expressions [368, 413], or a custom parser [311].
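To make the regular-expression strategy concrete, here is a minimal sketch that recovers the
speakers of each scene from a well-formatted script. The toy script, the INT./EXT. scene heading
convention, and the uppercase-line rule are assumptions made for the example; real scripts
deviate from them.

```python
import re
from collections import defaultdict

SCRIPT = """\
INT. BAKER STREET - NIGHT
HOLMES
You have been in Afghanistan, I perceive.
WATSON
How on earth did you know that?
EXT. LONDON STREET - DAY
HOLMES
Come, Watson.
"""

SCENE = re.compile(r"^(?:INT|EXT)\.")       # scene heading (assumed convention)
SPEAKER = re.compile(r"^[A-Z][A-Z .'-]*$")  # a line written entirely in uppercase

def speakers_by_scene(script):
    """Map each scene heading to the ordered list of its speaker names."""
    scenes = defaultdict(list)
    current = None
    for line in script.splitlines():
        line = line.strip()
        if SCENE.match(line):
            current = line
        elif current is not None and SPEAKER.match(line):
            scenes[current].append(line)
    return dict(scenes)

scenes = speakers_by_scene(SCRIPT)
```

Scene headings are tested before speaker names because they are also written in uppercase.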
Fig. 2. Examples of semi-structured text: a) from [4], b) from [87], c) from [178].
However, this structure and formatting is not a proper standard, and can vary from one script
to the other [248], as illustrated in Figure 2. It is even possible to find inconsistencies in the same
script. Machine learning can help solve this issue. Agarwal et al. [4] propose a method to identify
which parts of the script correspond to character lists, dialogues, speaker names, scene boundaries,
and scene instructions. Tan et al. [373] take advantage of this type of decomposition to only focus
on character mentions associated with utterances, in order to ignore passive characters which are
present in a scene but do not intervene.
Discrepancies can also appear in the speaker names. In this case, a simple approach consists in
using an a priori list of the characters involved in the script [196, 373, 413], with their associated
aliases. This list is generally constituted manually, or by taking advantage of publicly available
resources (generally also constituted manually), such as the Wikipedia page of the considered
work of fiction. Again, machine learning-based methods can be more robust than such simple
matching-based approaches. Makris & Vikatos [248] take advantage of the Wikipedia pages of the
movies they study to train a classifier to identify which character speaks which line. Certain
authors directly apply off-the-shelf NER tools [132, 433] to detect speaker names.
Identifying the characters involved or speaking during a scene is enough when extracting
co-occurrence (cf. Section 4.1) or conversational networks (Section 4.2), respectively. Dealing with
other types of interactions between characters requires identifying character mentions in the rest
of the text [206, 277]: not only explicitly identified speakers, but also scene metadata, spoken lines,
and/or stage directions. In this case, one can apply similar approaches to those already described for
free text. For instance, Krishnan & Eisenstein [196] train a classifier to detect addressees mentioned
in utterances, in order to determine who speaks to whom exactly. They detect not only proper
nouns, but also nominals, including titles and placeholder names (e.g. “bro”, “dude”, “sir”).
Choi et al. [74] apply an approach relatively similar to those designed for scripts, but rather to a
biographic dictionary of fictional characters: the classic Dictionary of Greek and Roman mythology
by Grant & Hazel. This work takes the form of a series of entries, each one summarizing the
biography of a given character. Choi et al. constitute the character list by parsing the entry keys,
and identify their occurrences in the entry bodies through exact matching.
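The two-step procedure described above (entry keys first, then exact matching in entry bodies)
can be sketched on a toy dictionary. The three entries below are written for this example and are
not taken from the cited work; the function name is likewise hypothetical.

```python
import re

# Hypothetical mini-dictionary: entry key -> entry body.
ENTRIES = {
    "Zeus": "King of the gods, husband of Hera and father of Athena.",
    "Hera": "Queen of the gods, wife of Zeus.",
    "Athena": "Goddess of wisdom, born from the head of Zeus.",
}

def mentions_by_entry(entries):
    """For each entry, list the other entry keys appearing in its body."""
    names = list(entries)  # the character list is given by the entry keys
    result = {}
    for key, body in entries.items():
        result[key] = [
            name for name in names
            if name != key and re.search(r"\b" + re.escape(name) + r"\b", body)
        ]
    return result

mentions = mentions_by_entry(ENTRIES)
```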
3.2.3 Visual Narratives. In audiovisual narratives, detecting character occurrences amounts to
solving several distinct but related problems, depending on whether one focuses on the video or
the audio stream. In videos, these are face detection and face tracking [416,417].
Video Streams. Face detection consists in identifying which parts of a still image correspond
to faces, as illustrated in Figure 3. See [204] for a recent review of the field. Jung et al. note that
current face detection methods are efficient mainly on front views of the faces [178]: this is a strong
limitation in our context, as a character can be filmed under a variety of angles. Weng et al. also
observe that current automatic methods do not reach a satisfactory performance, which is why
they first proceed manually [407]. However, they later train their own model to obtain acceptable
performance on their dataset [408, 409]. They experimentally find the community structure of the
extracted networks to be relatively robust to face detection errors. A number of authors proceed
automatically using off-the-shelf tools [231, 281, 295, 386, 416].
The face detection problem is relatively similar when dealing with comics, except that the images
are drawings (cf. Figure 3). This implies a number of additional difficulties: the characters can be
very deformed, non-human, or even non-anthropomorphic. Moreover, the structural lines defining
the characters and objects composing the panels are mixed with textures, screentones, and stylistic
elements. For these reasons, methods designed to handle photographs generally perform poorly on
comics, which require specific approaches [76, 365]. One such approach consists in adapting features
or models originally developed for photographs, e.g. facial landmarks (points corresponding to
specific parts of the face such as eyes or mouth) in [365]. Takayama et al. [371] handcraft features
to fit the specific case of mangas (skin and hair colors, jaw line shape, symmetry). The fact that a
pattern appears frequently in the narrative is also used as a hint to distinguish characters from
other objects [154]. More recent articles focus on training Deep Neural Networks [76], but there
is not enough publicly available annotated data yet to reach the full potential of such approaches [18].
Finally, it is worth noticing that before being applied, many face detection approaches require
some preprocessing, in particular detecting panel bounds and speech bubbles [76], which in turn
constitute specific problems [312, 364].
Face tracking builds upon face detection, and aims at identifying chronological sequences of
faces corresponding to the same person in a video. These sequences are called face tracks, and can
be considered as character occurrences in videos. Performing face tracking requires accounting
for changes in pose, scale, rotation, expression, color, light, angle, and blur. Most authors use
off-the-shelf tools [231, 433], usually based on some form of similarity-based classification of the
detected faces.
Fig. 3. Examples of face and body detection in: a) live-action movies [295]; b) animated movies [353];
c) comics [76].
Somandepalli et al. [353] detect characters in animated movies. This task proves to be much more
difficult than with live-action videos, as the design of the characters can vary widely, including
non-human, and even non-anthropomorphic shapes (cf. Figure 3). The authors first list character
candidates by detecting salient objects in a generic way, before taking advantage of graphical and
saliency features to discard irrelevant ones. They then use an off-the-shelf tool to track deformable
objects.
Audio Streams. When using the audio stream, detecting character occurrences amounts to solving
the speaker segmentation (or speaker change detection) problem [267]. It consists in partitioning the
audio stream into segments associated with unique speakers. Put differently, one wants to find the
moments corresponding to switches between speakers. This task is sometimes performed
simultaneously with that of segment clustering, which consists in grouping the segments spoken by
the same person. Performing both these tasks sequentially or simultaneously is called speaker
diarization. However, we treat this later in Section 3.3.2, and focus here only on speaker segmentation.
Certain existing systems work well in controlled environments, but their performance drops
strongly when they are applied to fiction works, e.g. movie trailers and cartoons [79], and TV
series [100]. This is mainly due to the presence of background music and sound effects, the higher
number of speakers [50], the spontaneous (though acted) nature of the exchanges, and the shorter
speech turns [57]. Results improve when using methods specifically designed for or trained on
fictional audiovisual narratives, e.g. [25] for movies, [52] for TV series. It is worth noting that
compared to video-based methods, audio-only tools do not allow identifying characters that appear
in a scene without speaking [178].
Multimodal Approaches. Certain approaches try to combine several types of information, be it
video-, audio-, or language-based. A few multimodal methods able to perform speaker segmentation
using both audio and video have been proposed, but we describe them later as they all additionally
solve speaker clustering (and therefore speaker diarization). Scripts can be used to distinguish
speakers or on-screen characters as in [178, 295, 296]. A script is not time-stamped, so this approach
requires first solving an additional problem called script alignment, which consists in determining
the exact time at which each line contained in the script occurs in the video. In [71], Chen et al.
use transcripts extracted from TV series. They apply off-the-shelf tools to detect all three forms of
textual character mentions (proper nouns, nominals, pronouns).
3.3 Unification of Character Occurrences
The second step of character identification is occurrence unification, which consists in determining,
for each detected occurrence, to which character it corresponds. As for occurrence detection, the
methods proposed for this purpose vary considerably depending on the medium of the narrative:
textual (Section 3.3.1) vs. visual (Section 3.3.2). However, this time there is no distinction to make
between free and semi-structured text, as all the additional information of the latter has already
been used during the detection step.
As the literature shows, character unification is often not performed at all. There are mainly two
reasons for this: this task is generally harder than occurrence detection (especially in text); and in
certain situations it is simply unnecessary. For instance, when extracting a purely conversational
network (Section 4.2) from a clean script, the speakers explicitly named in the script are enough,
e.g. [87, 268, 295]. The same holds when extracting a network from a novel by considering chapter
co-occurrences (Section 4.1) [192]: it is likely that all characters participating in the chapter will
be explicitly named. Some authors show this empirically, e.g. Seo et al. [341] argue that restricting
their analysis to explicit proper nouns (and thus, ignoring pronouns and noun phrases) is enough to
perform their targeted tasks (character ranking and edge prediction) without significant
performance loss.
3.3.1 Textual Narratives. As mentioned before, character occurrences appear under three forms
in text: proper nouns, nominals, and pronouns. Unifying these occurrences can be considered as
a specific version of the coreference resolution problem, which consists in identifying sequences
of expressions, called coreference chains, that represent the same concept (see [369] for a recent
review). Generic tools exist to solve this problem, but their performance does not necessarily
translate to fiction works [389]. In particular, they tend to overlook minor characters such as those
mentioned only through nominals (e.g. “the detective”) [389]. Moreover, in our case the referents
are necessarily persons (characters), a category of entities possessing certain characteristics (e.g.
gender) which can be leveraged to improve performance. Two variants of the problem appear in
the literature: certain authors focus only on alias resolution (e.g. [98]), which consists in grouping
proper nouns referring to the same character, while others additionally solve pronominal and/or
nominal anaphoras.
The task of alias resolution arises because of the variability of proper nouns appearing in fiction
works. On the one hand, in addition to their full name, characters are generally called by a variety
of aliases depending on context, style, and other factors. For instance, Sherlock Holmes can also
be called “Mr. Holmes” or “Sherlock”. On the other hand, some aliases cannot be unequivocally
associated with a character, e.g. “Mr. Holmes” can refer to both Sherlock Holmes and his brother
Mycroft Holmes. Most authors use some form of name clustering to perform alias resolution,
each cluster corresponding to all the names encountered for a specific character. Roughly, they
use two factors to determine that two aliases point at the same character: string similarity and
gender compatibility [16, 96, 99, 289, 389]. The gender of a character mention can be detected using
gendered honorifics (e.g. “Mr.” vs. “Mrs.”) and gendered first names (e.g. “Stephen” vs. “Stephanie”),
matched to a manually constituted list or some external resource such as WordNet [99, 289].
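The interplay between string similarity and gender compatibility can be illustrated by a
deliberately crude sketch. It is not the method of any cited work: similarity is reduced to shared
name tokens, the gender marker lists are tiny placeholders, and each name is only compared to the
longest name (the seed) of each cluster.

```python
# Illustrative marker lists (honorifics and first names), assumed for the example.
MALE_MARKERS = {"Mr.", "Sir", "Stephen"}
FEMALE_MARKERS = {"Mrs.", "Miss", "Stephanie", "Elizabeth"}
HONORIFICS = {"Mr.", "Mrs.", "Miss", "Sir"}

def parse(name):
    """Return (gender or None, set of name tokens without honorifics)."""
    tokens = name.split()
    gender = None
    if tokens[0] in MALE_MARKERS:
        gender = "M"
    elif tokens[0] in FEMALE_MARKERS:
        gender = "F"
    return gender, frozenset(t for t in tokens if t not in HONORIFICS)

def compatible(a, b):
    """Two names are compatible if genders do not clash and tokens are nested."""
    (gender_a, tokens_a), (gender_b, tokens_b) = parse(a), parse(b)
    same_gender = gender_a is None or gender_b is None or gender_a == gender_b
    similar = tokens_a <= tokens_b or tokens_b <= tokens_a
    return same_gender and similar

def cluster_aliases(names):
    """Greedy clustering: longest names seed clusters, others join the first fit."""
    clusters = []
    for name in sorted(names, key=len, reverse=True):
        for cluster in clusters:
            if compatible(name, cluster[0]):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

names = ["Darcy", "Elizabeth", "Mr. Bennet", "Miss Bennet", "Mr. Darcy", "Elizabeth Bennet"]
clusters = cluster_aliases(names)
```

In this toy run, gender compatibility is what keeps “Mr. Bennet” out of the cluster seeded by
“Elizabeth Bennet”, although both share the same last name.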
A straightforward approach to compare strings is to use an appropriate distance function [162].
However, by doing so, one ignores the structure of the names: potential presence of honorifics,
initials, multiple first or last names, distinction between first and last names. Moreover, a number
of conventions are culture-specific (e.g. the use of patronyms in Russian names). Also, the relative
proportions of first and last name occurrences are likely to vary considerably from one work
to the other, as they are tied to stylistic aspects: they are assumed to reflect the level of intimacy
in the narrative [391]. Certain authors propose to perform direct comparisons through predefined
patterns [289] or rules [16, 17]. Elsner [96] first compares only multiword names, in order to deal
with the ambiguity of isolated first or last names. He constitutes clusters of similar and compatible
multiword names, and only then assigns single word names to these clusters whenever possible.
The remaining names are assigned based on spatial proximity in the text and lexical frequency.
Certain authors use a generative approach [98, 389]: based on multiword names found in the text,
they produce potential variants thanks to predefined recombination rules (e.g. addition of honorifics,
omission of first names) and resources such as gazetteers. These artificial names are then matched
to those found in the text. Vala et al. [389] use additional constraints to prevent certain names from
being grouped together: co-occurring names, names with the same last name but different first
names, names containing different honorifics.
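The generative approach can be sketched as follows: variants are produced from each multiword
name through a few recombination rules, then matched to the names observed in the text. The
rules, the honorific assigned to each name, and the function names are assumptions made for the
example.

```python
def generate_variants(full_name, honorific=None):
    """Produce plausible aliases of a multiword name via recombination rules."""
    first, *_, last = full_name.split()  # assumes at least two tokens
    variants = {full_name, first, last}
    if honorific:
        variants.add(f"{honorific} {last}")       # omission of the first name
        variants.add(f"{honorific} {full_name}")  # addition of an honorific
    return variants

def match_aliases(full_names, observed):
    """Map each observed name to the full names able to generate it."""
    return {
        name: sorted(
            full for full, honorific in full_names.items()
            if name in generate_variants(full, honorific)
        )
        for name in observed
    }

# Hypothetical multiword names found in the text, with an assumed honorific.
full_names = {"Sherlock Holmes": "Mr.", "Mycroft Holmes": "Mr."}
matches = match_aliases(full_names, ["Sherlock", "Mr. Holmes", "Watson"])
```

The output makes ambiguity explicit: “Mr. Holmes” is generated by both full names, so an extra
criterion (such as the constraints or proximity cues mentioned above) is needed to resolve it.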
The other types of anaphoras are more difficult to handle, as they raise additional issues.
Some pronouns or nominals may not be connected to any proper noun (and therefore character), if
their referent is missing. They can also have split referents, e.g. “They” and “the Holmes brothers”
can both refer to “Sherlock and Mycroft Holmes”. Certain anaphoric expressions can also refer
to non-character entities. Many authors use off-the-shelf tools to automatically resolve pronominal
anaphoras, e.g. [358, 366, 385]. Lee & Yeung additionally define a distance limit between the
reference and the referent, in order to discard relations deemed too remote [213]. Vala et al. [389]
extend to pronouns and nominals the cluster-based approach used for alias resolution. As for proper
nouns, gender compatibility can be leveraged for certain pronouns (e.g. “she” vs. “he”) and nominals
(e.g. “uncle” vs. “aunt”). In order to identify anaphoras referring to characters (as opposed to
other types of entities), they constitute a list of verb-noun co-occurrences considered as frequent
in novels, and perform a grammatical dependency parsing: only the expressions involved in such
situations are considered as character mentions. In [162], Jannidis et al. use a co-reference resolution
tool that they previously developed specifically for novels in German [198]. In particular, they use
linguistic resources to associate close synonym nominals with the same character.
3.3.2 Visual Narratives. Like before, handling audiovisual narratives amounts to solving very
different problems, which depend on whether one uses video or audio data. When dealing with
videos, the problems at hand are face track clustering, and possibly face-name matching.
Face track clustering consists in identifying groups of face tracks (output from the occurrence
detection step) corresponding to the same face, hence character. Certain authors use off-the-shelf
tools [231, 386, 416], but others consider that these generic methods are not sufficiently efficient
when applied to fiction works [407]. Their results can be improved by training them on a corpus of
such works [408, 409], but this requires additional work and resources. Zhang et al. [433] propose a
new method combining the Earth-mover’s distance with constrained k-means clustering, followed
by an additional pruning step.
As mentioned before, when using the audio stream, the problem is to perform speaker clustering
based on the output of the speaker segmentation step, the whole process being called speaker
diarization [267]. Speaker clustering consists in grouping audio segments spoken by the same
character (akin to face track clustering). Generic methods are subject to the same limitations as
observed for speaker segmentation [57, 79, 100]. Methods specifically developed for fiction obtain
better results. In [52], Bost & Linarès treat TV series through a two-step method first solving the
problem locally at the scene level, then combining these partial results at the global level in order
to deal with the whole character set. In [53], they turn to a multimodal approach to enhance their
method: in addition to their audio-based tool, they independently perform speaker diarization
based on low-level video and audio features, before performing optimal matching to combine
both resulting outputs. However, after further experimentation, Bost et al. consider that the obtained
performance is not sufficient, and eventually turn to manual annotation [50, 51].
As for character occurrence detection, certain authors adopt multimodal approaches. Some
prefer to extract the transcript of the work and apply text-based methods instead of directly using
the video or audio stream. This is the case for Chen et al. [71] who, after having detected character
occurrences in this text, use existing off-the-shelf co-reference tools to detect chains of mentions of
the same character. They identify the concerned character by using predefined rules, exploiting
the presence of a proper noun in the chain, or connections to utterances whose speaker could be
identified. They propose an automatic method based on agglomerative convolutional networks to
take advantage of the latter type of information [72] when solving co-references and identifying
characters associated with co-reference chains. In [57], Bredin & Gelly combine face track clustering
and speaker diarization: they first detect speakers through standard speech activity detection tools,
before using face embeddings to cluster the face tracks corresponding to the resulting speech
segments.
An additional issue specific to audiovisual narratives is to determine the name of the characters
detected by grouping faces or speech segments. In the former case, this is called face-name
matching, and in the latter, speaker identification. In both cases, solving this problem requires using
linguistic information: speech content (via transcripts, scripts, or subtitles), text overlaid in the
video, predefined list of characters, other external resources. For instance, based on the assumption
that the title of the fiction work is known, Tran et al. [379] retrieve its list of characters from IMDb,
look for their picture using Google Image, and leverage this information to infer character names
in the movie through matching. In the context of fiction works though, the favored approach is
to leverage scripts [178, 231, 295, 433]. This again raises the issue of script alignment (with the
corresponding video, transcript, or subtitle), as scripts are not time-stamped. After alignment, the
script directly allows recovering speaker names, and inferring addressee names (for instance by
crosschecking names mentioned in the conversation and on-screen faces).
For comics, a variety of methods have been proposed for character detection (or face
recognition [371]), i.e. to match the multiple occurrences of a character’s face. The general approach
consists in defining some form of similarity measure, which is then leveraged to group occurrences
corresponding to the same character. This is what Takayama et al. [371] do, based on the features
they use for face detection (cf. Section 3.2.3). Stricker et al. [365] adopt a similar approach, but
to compare their sets of facial landmarks. The problem is difficult and still open; its resolution
will likely require large annotated corpora [18]. Ho et al. [154] represent character occurrences by
graphs of adjacent graphical subregions, and use approximate graph matching to group occurrences
corresponding to the same objects. Sun et al. [370] propose a method based on local features.
3.4 Additional Processing
In addition to character occurrence detection and unification, certain authors perform some
additional processing related to character identification. We identify two such operations (which are
not mutually exclusive): filtering characters (or character occurrences) considered to amount to
noise (Section 3.4.1), and getting a more detailed individual description of the characters
(Section 3.4.2).
3.4.1 Character Filtering. Certain authors remove characters deemed not frequent enough [96,
98, 159, 366]. For instance, Elson et al. [98] remove a character if it appears three times or fewer in
a novel, or if it amounts to 1% or less of all occurrences. They consider these as noise, possibly
generated by their alias resolution method: the occurrences are likely to refer to another existing
character, and not a separate infrequent one. When extracting conversational networks, Suen et
al. [368] ignore a character if it speaks fewer than five times over the plot. Character filtering can
concern a large share of the characters, e.g. Hutchinson et al. [159] remove approximately 80% of the
characters to study the Harry Potter novels.
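The frequency-based filter described above, with the two thresholds quoted for Elson et al., can
be written in a few lines. The function name and the toy occurrence list are assumptions made for
the example.

```python
from collections import Counter

def filter_characters(occurrences, min_count=3, min_share=0.01):
    """Keep characters with more than min_count occurrences AND more than
    min_share of all occurrences; the others are treated as noise."""
    counts = Counter(occurrences)
    total = sum(counts.values())
    return {
        name: n for name, n in counts.items()
        if n > min_count and n / total > min_share
    }

# Toy list of unified occurrences (one item per detected mention).
occurrences = ["Alice"] * 300 + ["Bob"] * 150 + ["Carol"] * 4 + ["Dave"] * 2
kept = filter_characters(occurrences)
```

In this toy case, “Carol” passes the absolute threshold (4 occurrences) but not the relative one
(4 out of 456 is below 1%), so she is removed along with “Dave”.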
Bossaert & Meidert [47] remove characters depending on their social category. Indeed, they
want to use the character network of the Harry Potter series of novels as a proxy to study peer
support among adolescents. For this reason, they remove all characters that are not Harry Potter’s
schoolmates.
3.4.2 Attribute Extraction. In addition to simply identifying the characters, certain authors
extract some additional information describing them. It typically takes the form of nodal
attributes in the final character network. These are retrieved directly from the work of fiction itself,
but also from external sources.
The most popular character trait used in the literature is probably gender, which can be
assessed manually [47, 325], through some external resource [130], or automatically [16, 17, 99].
Some authors define a categorical attribute representing the various sides present in the considered
narrative; in the articles reviewed in this survey, this is always performed manually. For instance,
when dealing with the Marvel Universe, Gleiser extracts the characters’ alignment (hero vs. villain).
In [420], Yose et al. study the medieval Irish epic text Cogadh Gaedhel re Gallaibh, which narrates a
war between Irishmen and Vikings. For their analysis, they distinguish three categories of vertices:
Irishmen, Vikings, and Others. When studying Homer’s Iliad, Kydros et al. identify four mutually
exclusive categories of characters: Greek, Trojans, Gods and Others.
Certain authors focus on more sociological traits. In [320], Rochat & Triclot are interested in
the representation and relative position of science and politics in science-fiction works. Through
crowdsourcing, they manually define an attribute representing whether the occupation of
each character is mainly related to Politics, Technology, or Science, but also Family, Religion, or
Art. They use the same attribute to identify Animals, and the remaining characters are noted as
Undefined. When studying the Harry Potter series of novels, Bossaert & Meidert manually retrieve
the school house and school year of each student (in addition to his or her gender) [47]. In [325],
Rydberg studies Greek tragedies. He manually extracts the so-called social class attribute: a
mix between race, social status and narrative role (Gods, Upper/lower class mortals, Chorus, and
Non-speaking characters).
The resulting nodal attributes have mainly two usages. Certain authors use them directly to
study the considered narrative. For instance, Bossaert & Meidert want to assess the effect of certain
character traits on the psychological mechanism of peer support, and therefore on the character
network structure [47]. Other authors take advantage of these attributes to solve some higher-level
problem. For instance, Ardanuy & Sporleder [16, 17] use character gender to build features later
leveraged to classify novels automatically.
4 INTERACTION DETECTION
Based on the character occurrences, the next step of the extraction process consists in detecting all
interactions happening in the narrative between each pair of characters. Such an interaction can
be explicitly described, but also inferred from the narrative, depending on what one considers to be
an interaction. We identify five distinct approaches in the literature.
The first (Section 4.1) is co-occurrence-based, and relies on a decomposition of the narrative
into smaller narrative units. Two characters are considered to interact when they jointly appear
in the same such unit. The second approach (Section 4.2) considers only direct verbal interactions
between the characters. This is particularly appropriate for dialogue-oriented narratives such as
plays. The third (Section 4.3) requires one character to explicitly mention another one to infer
an interaction between them. The fourth (Section 4.4) takes into account other types of direct
interaction than conversation (e.g. fighting, kissing). The fifth (Section 4.5) focuses on explicitly
expressed affiliations, such as family relationships or being coworkers. Finally, it is also possible to
combine several of these approaches, in various ways (Section 4.6).
4.1 Co-occurrences
The co-occurrence-based approach is the most widespread in the literature, probably because it
is the easiest to apply: detecting interactions in a more precise way can be a difficult problem,
even for humans [320]. This approach consists in breaking down the considered work into smaller
narrative units, and in assuming that two characters interact when they occur together within
the same unit. A few authors use additional constraints, to ensure that co-occurrences actually
capture interactions. Some want the narrative unit to contain only the two characters of interest,
and no one else [96, 161]. Others take into account only consecutive occurrences, i.e. not separated
by another character [136].
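The co-occurrence principle and its two stricter variants can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions, not the implementation used by any of the cited works: each narrative unit is assumed to be already available as an ordered list of character occurrences, and the function names (cooccurrence_edges, consecutive_edges), the exclusive flag, and the toy characters A to D are hypothetical.

```python
from itertools import combinations


def cooccurrence_edges(units, exclusive=False):
    """Return the unordered character pairs co-occurring in at least one unit.

    units: iterable of narrative units, each an ordered list of character
           occurrences (assumed to come from a prior detection step).
    exclusive: hypothetical flag; if True, a pair is kept only when the unit
               contains these two characters and no one else.
    """
    edges = set()
    for unit in units:
        present = set(unit)
        if exclusive and len(present) != 2:
            continue
        # Pairs are sorted so that (a, b) and (b, a) denote the same edge.
        for pair in combinations(sorted(present), 2):
            edges.add(pair)
    return edges


def consecutive_edges(units):
    """Variant: link only characters whose occurrences are consecutive,
    i.e. not separated by an occurrence of another character."""
    edges = set()
    for unit in units:
        for a, b in zip(unit, unit[1:]):
            if a != b:
                edges.add(tuple(sorted((a, b))))
    return edges


# Toy data (made up for illustration): three narrative units.
units = [
    ["A", "B", "C"],
    ["A", "D"],
    ["B", "A", "B", "C"],
]

plain = cooccurrence_edges(units)
strict = cooccurrence_edges(units, exclusive=True)
adjacent = consecutive_edges(units)
```

On this toy input, the unconstrained version links A and C because they share a unit, whereas the consecutive variant does not, since B always occurs between them. The edges are stored as unordered pairs, which reflects the fact that plain co-occurrence carries no direction.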
Using co-occurrences presents several limitations, mainly caused by their imprecise nature.
Indeed, co-occurrence is only a proxy for actual interaction, as it is possible for two characters to
appear together without interacting at all (e.g. they both are spectators of some event [263], or one
mentions the other in his absence [307]). The first limitation is that this imprecision propagates to the
network itself: the set of co-occurrence-based interactions theoretically contains the conversation-,
mention-, action- and affiliation-based ones, plus some false positives². In practice though, Ardanuy
& Sporleder [17] argue that false positives are rare in the sense that two co-occurring characters
are almost always related in one way or the other. But this holds only for their experimental results,
obtained by integrating co-occurrences over whole narratives. On the contrary, Edwards et al. note
that co-occurrence networks are denser [95].
The second limitation also directly comes from the imprecise nature of co-occurrences. As they
encompass a number of different types of interactions, it is not possible to assign them a direction,
and they are therefore regarded as some form of bilateral interaction. Furthermore, for Kwon &
Shim [206], due to their imprecise nature, co-occurrences ignore intimate aspects of interactions,
such as opinions and emotions. The third limitation, according to Prado et al. [307], is that using
co-occurrences results in more importance being given to otherwise minor characters, when later
analyzing the obtained character network. As discussed in our introduction, all these arguments
must be weighed against the possibly very specific nature of the considered narrative.
We discuss the choice of the narrative unit in Section 4.1.1, as it depends on the type of narrative
and can affect the end result. Besides the detection of interactions in the form of co-occurrences,
certain authors additionally assign a numerical score to each interaction in order to include
more information in the character network eventually extracted: we review such approaches in
Section 4.1.2.
4.1.1 Narrative Unit.
Novels. In novels, Rochat & Kaplan use the page as a narrative unit [319], as imposed by the
predefined character index they leverage during character identification (see Subsection 3.1). Such
a partitioning of the text, based on purely physical (and therefore arbitrary) aspects, results in
the possible split of chapters, paragraphs, or even sentences. It is therefore very likely to miss
co-occurrences. Rochat & Kaplan try to overcome this problem through a