The Automatic Identification of Conceptual Metaphors in Hungarian Texts: A Corpus-Based Analysis

Anna Babarczy¹, Ildikó Bencze M.¹, István Fekete¹, Eszter Simon¹,²
¹ Budapest University of Technology and Economics
Department of Cognitive Science
H-1111 Budapest, Stoczek u. 2.
E-mail: {babarczy, ibencze, ifekete, esimon}@cogsci.bme.hu
² Research Institute for Linguistics, Hungarian Academy of Sciences
H-1068 Budapest, Benczúr u. 33.
E-mail: eszter@nytud.hu
Abstract
The present study is a corpus-based analysis of literal versus metaphorical language use. Previous corpus linguistic works have
focused on the linguistic characteristics of the metaphorical expressions. The main question of the present paper is whether the
automatic identification of certain conceptual metaphors could be successful taking the embodiment hypothesis as a starting point. 12
widespread conceptual metaphors were selected from Lakoff & Johnson (1980)
and the metaphor index in Kövecses (2002), where
consistent mapping was observed between a concrete (source) domain and an abstract (target) domain. According to our hypothesis, a
metaphoric sentence should include both source-domain and target-domain expressions. This assumption was tested relying on three
different methods of selecting target-domain and source-domain expressions: a psycholinguistic word association method, a dictionary
method and a corpus-based method. The results show that for the automatic identification of metaphorical expressions, the corpus-based
method is the most effective strategy, which suggests that the concept of source and target domains is best characterized by
statistical patterns rather than by psycholinguistic factors.
Keywords: embodiment hypothesis, conceptual metaphors, association, corpus-based, automatic identification
1. The Theory of Metaphor
1.1 The Cognitive Theory of Metaphor
In everyday language use the term metaphor is held to be
a figure of speech which refers to an analogy between two
entities or concepts (e.g., Achilles was a lion). In cognitive
linguistics, in contrast, metaphor is first of all a conceptual
process, thus metaphorical relations are taken to be
conceptual mappings, which characterize not only our
language use but also our everyday life, thought and
behavior (Lakoff & Johnson, 1980). According to the
cognitive linguistic view, conceptual metaphors refer to
the understanding of an abstract concept, also called the
target domain, in terms of a concrete concept of which we
can have direct sensory experience, namely the source
domain. This underlying association between the two
domains is held to be systematic in both language and
thought.
The hypothesis that the representation of abstract concepts
in the mind/brain is grounded in the representation of
concrete knowledge, which in turn is grounded in our
bodily experience of the world, is the main statement of
the embodiment theory in cognitive linguistics (Gibbs,
2006; Kövecses, 2002; Lakoff & Johnson, 1980, 1999).
For example, people universally think and talk about the
abstract concept of “time” with the help of “space”, the
terms of which are acquired through our interaction with
the environment (before, after, under, in etc.).
Consequently, we can argue that the concept of “time” is
structured by the concept of “space”, which means that
there is a TIME IS SPACE conceptual metaphor in our mind.
This hypothesis is supported by psycholinguistic
experiments: it has been shown, for instance, that sensory-motor
experiences influence the interpretation of
metaphorical expressions about “time” (Boroditsky &
Ramscar, 2002), which suggests that, when understanding
metaphors, people mentally simulate physical motion,
i.e. they imagine the actions or events
described by metaphorical expressions (Gibbs & Matlock,
2008). However, other experiments did not find evidence
that conceptual metaphoric mappings are necessary for the
comprehension of metaphorical expressions (Keysar et al.,
2000; Szamarasz, 2006). Whether abstract concepts are
independent of concrete concepts in natural language use
remains an open question.
1.2 The Statistical Learning Theory
Another approach to the nature of abstract
knowledge is the statistical learning theory, which
argues that people acquire and structure their abstract
concepts with the help of the statistical properties of
language (Burgess & Lund, 1997; Landauer & Dumais,
1997). This means that novel linguistic symbols are
directly abstracted from known symbols without the
intervention of metaphorical processes or embodied
schemas.
The two theoretical approaches do not necessarily exclude
one another since it is conceivable that our abstract
knowledge exploits both sources mentioned above.
According to this integrative point of view (Andrews et
al., 2005, 2007), both the attributive and distributive
properties of words play an important role in symbol
grounding. Attributive properties are non-linguistic
physical attributes associated with a word, while
distributive factors refer to common occurrences of a
word with other linguistic elements.
Based on our discussion so far, the present paper
investigates whether the automatic extraction of
conceptual metaphors from large corpora can be successful
taking the embodiment hypothesis as a starting point, and,
if so, which strategy is the most
effective: the psycholinguistic word association method or
the corpus-linguistic method based on statistical patterns.
2. Metaphor and Corpus Linguistics
2.1 Corpus-Based Research on Metaphor
Corpus-based studies of metaphorical language use have
already pointed out the inadequacy of the cognitive theory
and the shortcomings of psycholinguistic experiments.
These critics claim that theoretical and experimental
research neglects the linguistic attributes of metaphorical
expressions and relies on invented examples rather than
natural data, which can be misleading. For
example, Deignan (2008) demonstrates that, according to
corpus-linguistic results, the conceptual metaphor
AN ANGRY GROUP OF PEOPLE IS A WILDFIRE is more likely to
occur than the metaphor ANGER IS THE PRESSURE OF
HEATED FLUID IN A CONTAINER, even though it is the latter
that is ubiquitously listed in works on cognitive theory.
Observed metaphorical patterns (Stefanowitsch, 2006)
and collocations (Deignan, 2005, 2008) also have
characteristic grammatical features. Deignan
(2005) demonstrates that, in metaphorical usage, words
have less grammatical freedom than in their literal
occurrences. For example, words belonging to the source
domain of a metaphorical mapping tend to denote
actions and properties, and thus they occur mainly as
verbs and adjectives. These results show that the logical
relations between concrete entities are not simply
mirrored in abstract language use but undergo some kind
of change. This fact supports the so-called blending theory
(Fauconnier & Turner, 2002), which contends that during
metaphoric language use people create a mixed or blended
domain that has a proper structure and relations, and thus
proper linguistic features.
Taking all the evidence into account, it is clear that the
conceptual theory of metaphor alone is not able to explain
all the phenomena found in texts.
2.2 Methodological Problems in Automatic Conceptual Metaphor Identification
The default method of metaphor annotation is manual
processing: based on their linguistic intuitions, researchers
mark expressions that they perceive as metaphorical in a
given corpus. Since this method is very labor-intensive
and time-consuming, it is worth experimenting with at
least partly automated techniques, such as searching a
corpus for expressions belonging to the source domain
(e.g., Deignan, 2008) or to the target domain
(Stefanowitsch, 2006) and manually checking the
extracted sentences for metaphoricity. Finally, it is also
possible to search the corpus for sentences containing
characteristic words from both the source and the target
domains of a given conceptual metaphor (e.g., Martin,
2006). The disadvantage of this method is that in this way
we can test only predetermined metaphorical mappings,
and, in contrast to the technique used by Stefanowitsch
(2006), the recovery of novel metaphors is precluded.
However, it has the advantage of a higher level of
automation in the annotation process allowing the
processing of larger corpora. It is this latter strategy that
our study attempts to enhance.
The first step of any of the above three (semi-) automated
methods is that expressions that are likely to characterize
either the source domain or the target domain of a given
metaphor type need to be collected. However, the
identification of the linguistic cues that may characterize a
particular domain is not a straightforward question. A
problem facing automatic metaphor annotation is that, in
general, the domains of conceptual mappings discussed in
the cognitive literature are associated with concepts rather
than specific linguistic forms. Our paper undertakes to
address this issue by testing three different methods of
compiling word lists characterizing the source versus the
target domains of a set of conceptual metaphors. The first
two methods rely on experimental psycholinguistic
evidence and on lexicographic data, while the third
approach is based on the manual analysis of a reference
corpus. In addition to the practical import of the results for
corpus analysis, the experiments also shed light on the
language theoretical issue discussed in Section 1. If either
of the first two methods proves to be more successful, we
have some support for the embodiment hypothesis. If,
however, the third method leads to the best results, the
statistical approach to metaphor proves to be more
plausible.
3. The Study: Automatic Identification of Metaphors
The main question addressed by the present study is,
therefore, whether the automatic identification of certain
conceptual metaphors is feasible taking the concept of
source-to-target domain mapping as a starting point.
The experiment involved the following phases:
1. A set of conceptual metaphors was selected from
the cognitive linguistic literature.
2. A corpus was compiled using a variety of text
types.
3. Word lists characterizing the source and the target
domains of the selected conceptual metaphors
were compiled using three different methods.
This resulted in three separate sets of source-target
word lists.
4. Sentences containing at least one source-domain
word and at least one corresponding target-domain
word were automatically extracted from
the corpus. The three sets of word lists were used
in separate runs.
5. The results were manually checked for precision
and recall.
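The extraction step in the fourth phase can be illustrated with a minimal sketch. It assumes that each sentence is available as a list of lemmas and that every conceptual metaphor has one source-domain and one target-domain word list; the dictionary `WORD_LISTS`, the function `find_candidates` and the abbreviated word lists are hypothetical and are not taken from the system used in the study. The Hungarian words are those cited as examples elsewhere in this paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, heavily abbreviated word lists: metaphor -> (source words, target words).
# The real lists contain many more lemmas (see Table 2).
WORD_LISTS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "TIME IS MONEY": ({"pazarol"}, {"idő"}),      # 'waste', 'time'
    "ANGER IS HEAT": ({"gerjeszt"}, {"harag"}),   # 'induce', 'anger'
    "CONTROL IS UP": ({"magas"}, {"vezető"}),     # 'tall', 'manager'
}


def find_candidates(sentence_lemmas: List[str]) -> List[str]:
    """Return every metaphor whose source AND target list both match the sentence."""
    lemmas = set(sentence_lemmas)
    return [
        metaphor
        for metaphor, (source, target) in WORD_LISTS.items()
        if lemmas & source and lemmas & target
    ]


# A sentence with both 'pazarol' and 'idő' is tagged as a TIME IS MONEY candidate.
print(find_candidates(["ne", "pazarol", "az", "idő"]))
# Co-occurrence alone also tags literal sentences, e.g. managers next to a tall rig.
print(find_candidates(["mérnök", "és", "vezető", "magas", "fúrótorony"]))
```

Because the test is pure co-occurrence within a sentence, the sketch tags any sentence in which the two words happen to appear together, whether or not they are conceptually related; this is why the output has to be checked manually.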
3.1 Resources and Methods
3.1.1. The Conceptual Metaphors
12 widespread conceptual metaphors were selected from
Lakoff & Johnson (1980) and the metaphor index in
Kövecses (2002). The criteria for the selection process
were the following:
a) The metaphor had to be general enough to be
found in many types of texts,
b) The domains had to be suitable for providing
associations in a psycholinguistic experiment,
and
c) There had to be a mapping from a concrete source
domain to an abstract target domain.
Based on the above, the following 12 conceptual
metaphors were chosen:
1. ANGER IS HEAT
2. CHANGE IS MOTION
3. CONFLICT IS FIRE
4. CONTROL IS UP
5. CREATION IS BUILDING
6. MORE IS UP (LESS IS DOWN)
7. POLITICS IS WAR
8. PROGRESS IS MOTION FORWARD
9. RESOURCES ARE FOOD
10. THE MIND IS A MACHINE
11. THEORIES ARE BUILDINGS
12. TIME IS MONEY
3.1.2. The Corpus
The corpus was compiled observing two criteria: a variety
of genres should be represented; and the texts should be
accessible for research purposes in four different
languages. The genres include modern fiction from digital
libraries, popular science articles from the National
Geographic magazine and movie subtitles, the latter of
which was included as a representation of quasi-spoken
language. The criterion of multilingual availability was
needed in view of future plans of creating a multilingual
parallel corpus (Hungarian, English, Spanish and Italian)
with metaphor annotation. As the analysis has only been
completed for the Hungarian texts, the results described in
this paper apply to the Hungarian corpus. The sizes of the
Hungarian texts from the different genres are shown in
Table 1.
Text type              Number of text words
National Geographic    68,997
Subtitles              32,148
Fiction                208,384
Total                  309,529
Table 1: The content of the corpus.
The texts were converted to plain text format with UTF-8
character encoding. The part-of-speech tagger Hunpos
(Halácsy et al., 2007) was used to tag the Hungarian texts.
Hunpos was chosen because it is an open source tagger
based on a Hidden Markov Model, which can tag
any language once it has been trained on a pre-tagged
corpus. As the next step, the tagged corpus was converted
to XML format, which was our working format for
metaphor identification.
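The conversion of tagged text to XML might look like the following minimal sketch. The XML schema used in the study is not described here, so the element names `corpus`, `s` and `w`, the attribute `pos`, the tag labels in the sample and the function name `tagged_to_xml` are all assumptions made for illustration. The input is assumed to contain one token per line in the form word, tab, tag, with an empty line between sentences.

```python
import xml.etree.ElementTree as ET


def tagged_to_xml(tagged_text: str) -> ET.Element:
    """Build a <corpus> tree with one <s> per sentence and one <w pos="..."> per token."""
    corpus = ET.Element("corpus")
    sentence = None
    for line in tagged_text.splitlines():
        line = line.strip()
        if not line:
            sentence = None          # an empty line closes the current sentence
            continue
        parts = line.split("\t")
        word = parts[0]
        tag = parts[1] if len(parts) > 1 else ""
        if sentence is None:
            sentence = ET.SubElement(corpus, "s")
        token = ET.SubElement(sentence, "w", pos=tag)
        token.text = word
    return corpus


# Toy input with hypothetical tag labels.
SAMPLE = "az\tDET\nidő\tNOUN\n\nfut\tVERB\n"
root = tagged_to_xml(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```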
3.1.3. The Baseline Corpus
In order to obtain an estimate of the performance expected
from an automatic metaphor annotation method a baseline
corpus was constructed on which human inter-annotator
agreement was measured.
The baseline corpus was created by extracting 10%
(approximately 30,000 words) of the entire corpus in
which each genre was represented in the same proportion
as in the main corpus. The baseline corpus was
independently annotated for metaphors by two human
annotators.
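As a rough illustration of the proportional sampling, the sketch below computes how many words a 10% sample would draw from each genre, using the word counts in Table 1. The function name `proportional_sample_sizes` is hypothetical, and how the texts were actually selected within each genre is not specified here.

```python
from typing import Dict

# Word counts per genre, taken from Table 1.
GENRE_SIZES: Dict[str, int] = {
    "National Geographic": 68_997,
    "Subtitles": 32_148,
    "Fiction": 208_384,
}


def proportional_sample_sizes(sizes: Dict[str, int], fraction: float) -> Dict[str, int]:
    """Take the same fraction of every genre so the sample keeps the corpus proportions."""
    return {genre: round(count * fraction) for genre, count in sizes.items()}


sample = proportional_sample_sizes(GENRE_SIZES, 0.10)
print(sample)
print(sum(sample.values()))  # total size of the sample in words
```

Under these assumptions the sample comes to roughly 6,900 words of popular science, 3,200 words of subtitles and 20,800 words of fiction, which is consistent with the figure of approximately 30,000 words.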
The manual annotation followed a pre-defined procedure
based on the criteria defined by the
Pragglejaz Group (2007). For example, classical idioms, i.e.,
fixed collocations which are not decomposable (e.g., pop
the question), as well as “dead metaphors” or those which are
metaphorical only in an etymological sense (e.g., the word
depression), were not classed as metaphorical. A rule was
further defined for each type of conceptual metaphor. For
example, in the case of the MORE IS UP conceptual
metaphor we applied the following rules: “Every
expression with a ‘quantity’ meaning which can be
visualized as moving along a vertical scale, e.g., price,
lease, temperature, should be annotated as a potential
target domain expression. Every sentence which contains
the word csúcs (‘top’), e.g., csúcsteljesítmény (‘top
performance’), csúcstechnológia (‘peak technology’),
should be annotated as metaphorical.”
At the first attempt, inter-annotator agreement was only
17%. After refining the annotation instructions, we made
a second attempt, which resulted in an agreement level of
48%, which is still a strikingly low value. These results
indicate that the definition of “metaphoricity” is
problematic in itself.
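How the agreement figures were computed is not specified here. One simple possibility, shown purely as an assumption, is the proportion of sentences marked as metaphorical by both annotators among the sentences marked by at least one of them. The function name `overlap_agreement` and the sentence identifiers are hypothetical.

```python
from typing import Set


def overlap_agreement(annotator_a: Set[int], annotator_b: Set[int]) -> float:
    """Share of sentences marked by both annotators among those marked by either."""
    union = annotator_a | annotator_b
    if not union:
        return 1.0  # neither annotator marked anything: trivially in agreement
    return len(annotator_a & annotator_b) / len(union)


# Toy example: sentence IDs each annotator marked as metaphorical.
marked_by_a = {1, 2, 3, 5}
marked_by_b = {2, 3, 4, 8}
print(f"{overlap_agreement(marked_by_a, marked_by_b):.0%}")
```

A measure of this kind ignores the large number of sentences that both annotators left unmarked, so it yields much lower values than plain percentage agreement over all sentences would.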
Some typical sources of disagreement between the
annotators are the following:
In the absence of a statistical measure of semantic
distance, it was difficult to draw the line between words
directly referring to a concept belonging to the source
domain and those indirectly referring to it. For example,
in the case of the conceptual metaphors ANGER IS HEAT
or CONFLICT IS FIRE, the source-domain item should be an
expression referring to a sort of “heated thing”. However,
in some cases, one or the other annotator included words
indirectly suggesting the presence of heat, such as kiolt
('extinguish'), kihűl ('get cold') etc. Another case in point
is the phrase a memória élesítése ('the sharpening of one’s
memory'), which may or may not be an instance of the
conceptual metaphor THE MIND IS A MACHINE, depending
on whether the annotator is prepared to accept the indirect
association between machines and acts of sharpening.
A second source of discrepancies was the fuzzy nature
of the boundary between ambiguous words having an
established abstract sense and metaphorical uses of
unambiguous words. For example, the expression
eljutottam a mai napig ('I've gotten to this day') may or
may not represent a CHANGE IS MOTION metaphor
depending on whether the Hungarian verb jut (literally:
get somewhere, reach a place by moving the entire body)
is taken only to denote physical movement or to be
ambiguous. The verb alapul ('be founded on something'),
which is derived from the noun alap (‘foundation’), is
similarly problematic since, although az elmélet alapjai
('foundations of the theory') is a good example of
THEORIES ARE BUILDINGS, the verb derived from the
concrete noun can only have an abstract sense. The
question is, therefore, how far we should go in diachronic
or morphological analysis when making a decision on
metaphoricity.
The level of inter-annotator agreement was further
lowered by discrepancies in the classification of
metaphorical expressions. Consider the following
example from the novel The Master and Margarita: az
öreg előbb megdöntötte mind az öt bizonyítékot, és aztán,
mintegy magamagából csúfot űzve, ő maga felállított egy
hatodikat ('the old man first demolished all five
arguments and then, as if mocking himself, constructed
a sixth of his own'). This sentence was classified by one of
the annotators as a THEORIES ARE BUILDINGS metaphor,
while the other considered it to be of the CREATION IS
BUILDING type. Similarly, it is difficult to make an
informed decision on whether the following example
contains a CHANGE IS MOTION or a PROGRESS IS MOTION
FORWARD metaphor, neither of which appears to be an
intuitively correct choice: a járvány végigsöpört
szülővárosukon ('the epidemic swept through their
hometown').
3.1.4. The Compilation of the Word Lists
For the automatic identification of metaphors, we
searched the corpus for sentences containing one or more
words characterizing the source domain and one or more
words representing the target domain of a given
conceptual metaphor. Three different methods of
compiling the word lists were tested: a) word association
experiment, b) dictionary of synonyms, and c) reference
corpus.
The first method is based on the assumption that the
expressions people associate with a key word for the
source domain and a key word for the target domain can
provide a lexical profile for a given metaphor type. The
word associations were collected in an online experiment.
138 students from the Budapest University of Technology
and Economics participated in the experiment. One key
word for each source and target domain (e.g., anger,
building, change, up, war) appeared on the screen one at a
time in randomized order and the participants had one
minute to type words they associated with the key word.
When the minute was up, the keyword disappeared and
participants were instructed to click a button when they
were ready for the next key word.
The lists obtained in the association experiment were
normalized: multiword expressions, proper names and
antonyms were filtered out, abbreviations were
completed, and finally, the words were stemmed by the
Hunmorph open source morphological analyzer (Trón et
al., 2005).
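The normalization steps can be sketched as a small filter pipeline. This is only an illustration: the function `normalize_associations`, the capital-letter heuristic for proper names, the de-duplication step and the toy data are assumptions, and the stemmer is passed in as a plain function because the call interface of the morphological analyzer is not described here.

```python
from typing import Callable, Dict, Iterable, List, Set


def normalize_associations(
    responses: Iterable[str],
    antonyms: Set[str],
    abbreviations: Dict[str, str],
    stem: Callable[[str], str],
) -> List[str]:
    """Filter and stem raw association responses, keeping first-occurrence order."""
    seen: Set[str] = set()
    result: List[str] = []
    for raw in responses:
        word = raw.strip()
        if not word or " " in word:            # drop empty and multiword responses
            continue
        if word[0].isupper():                  # crude proper-name heuristic (assumption)
            continue
        word = abbreviations.get(word, word)   # complete abbreviations
        if word in antonyms:                   # drop antonyms of the key word
            continue
        lemma = stem(word)
        if lemma not in seen:                  # de-duplication is an added assumption
            seen.add(lemma)
            result.append(lemma)
    return result


# Toy data for the key word 'building'; the stemmer is a stand-in lookup table.
toy_stem = lambda w: {"házak": "ház"}.get(w, w)
responses = ["ház", "házak", "Budapest", "magas épület", "ép.", "rom", "tégla"]
print(normalize_associations(responses, {"rom"}, {"ép.": "épület"}, toy_stem))
```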
For each of the 12 conceptual metaphors, the resulting
two word association lists (one containing associations
provided for the source domain, and another providing
associations for the target domain) were taken to
constitute the metaphor’s lexical profile.
For the second method, the word lists obtained from the
association experiment were expanded with the synonyms
listed for the association words in the Magyar szókincstár
[Hungarian Word Thesaurus] (Kiss, 2007). Dialectal,
slang and obsolete expressions were omitted. Compared
to the association lists, the size of the word lists
increased substantially (see Table 2).
For the third, corpus-based method, the word lists for each source and
target domain were extracted from the manually annotated
baseline corpus. Due to the low level of inter-annotator
agreement obtained for the baseline corpus, the union of the
sentences annotated as metaphorical by the two annotators
was used for compiling the corpus-based lists of source
and target domain words.
Word list        Psycholinguistic   Synonyms   Corpus-based
Source domain    1239               6348       126
Target domain    674                5094       120
Table 2: Number of words in source- and target-domain
lists compiled by the three methods.
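The synonym expansion used in the second method can be sketched as follows. The thesaurus is represented here as a plain dictionary mapping a word to pairs of synonym and usage label; this data structure, the label names, the function `expand_with_synonyms` and the toy entries are assumptions for illustration and do not reflect the actual format of the thesaurus.

```python
from typing import Dict, List, Set, Tuple

# Usage labels whose entries are skipped during expansion (hypothetical label names).
EXCLUDED_LABELS = {"dialectal", "slang", "obsolete"}


def expand_with_synonyms(
    words: Set[str],
    thesaurus: Dict[str, List[Tuple[str, str]]],
) -> Set[str]:
    """Add every listed synonym of every word unless its label is excluded."""
    expanded = set(words)
    for word in words:
        for synonym, label in thesaurus.get(word, []):
            if label not in EXCLUDED_LABELS:
                expanded.add(synonym)
    return expanded


# Toy thesaurus entry for harag ('anger'); labels are illustrative only.
toy_thesaurus = {"harag": [("düh", ""), ("indulat", ""), ("pipa", "slang")]}
print(sorted(expand_with_synonyms({"harag"}, toy_thesaurus)))
```

Expansion of this kind grows the lists quickly, as Table 2 shows, and every added synonym is a further chance of an accidental source-target co-occurrence.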
3.1.5. The Annotation Process and its Verification
Based on the three sets of word lists, the XML test corpus
was automatically annotated, producing three files in
which the sentences were marked with tags showing the
type of conceptual metaphor the system identified. Each
of the three annotation versions was then verified
manually using the graphical interface of the GATE
application (Cunningham et al., 2002). Because of time
constraints, the manual verification was completed for
10% of the test corpus, where the different genres were
represented in the same proportion as in the entire corpus.
In this sub-corpus, a total of 155 sentences were identified
as metaphorical by two human annotators.
3.2 Results
The results of the three methods were quantified by the
precision and recall measures (Table 3). Precision is
the proportion of the sentences tagged as
metaphorical by the automatic system that are in fact
metaphorical, while recall is the proportion of metaphorical sentences
successfully identified by the system. The F-measure is
the weighted harmonic mean of these values, i.e. the final
indicator of the system’s performance.
Method         Recall   Precision   F-measure
Association    3.8%     7.5%        5.6%
Dictionary     18.1%    4.5%        11.3%
Corpus         31.3%    55.4%       43.3%
Table 3: Results of the three methods.
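For reference, the sketch below shows how a weighted harmonic mean of precision and recall is usually computed, and applies it to the recall and precision values in Table 3. The function name `f_measure` and the equal weighting (beta = 1) are assumptions, since the weighting used in the study is not stated. With equal weights, the harmonic mean of the reported values comes to about 5.0, 7.2 and 40.0, whereas the F-measure column of Table 3 coincides with the arithmetic mean of recall and precision; the sketch prints both so that they can be compared.

```python
from typing import Dict, Tuple


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta = 1 weights them equally)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (recall, precision) in percent, as reported in Table 3.
TABLE_3: Dict[str, Tuple[float, float]] = {
    "Association": (3.8, 7.5),
    "Dictionary": (18.1, 4.5),
    "Corpus": (31.3, 55.4),
}

for method, (recall, precision) in TABLE_3.items():
    harmonic = f_measure(precision, recall)
    arithmetic = (precision + recall) / 2
    print(f"{method}: harmonic mean {harmonic:.1f}, arithmetic mean {arithmetic:.2f}")
```

Whichever mean is used, the ranking of the three methods is the same.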
The results reveal that the association method covered
substantially fewer metaphorical sentences containing
both a source and a target expression than the other two
methods. This psycholinguistic method also performed
very poorly in terms of precision. When the association
word lists were expanded with synonyms, recall
somewhat improved but only at the cost of a decline in
precision. The corpus-based method was very clearly the
most successful of the three strategies. Taking all our
results into account, we must conclude that the hypothesis
that the co-occurrence of psycholinguistically typical
source-domain and target-domain words in a sentence is a
good predictor of metaphoricity receives no empirical
support. Exploiting the statistical properties of texts leads
to considerably better but still unsatisfactory results.
3.3 Problem Cases
It is clear from the above discussion that deciding whether
a sentence is metaphorical or not is far from being a
straightforward task. Our general experience
is that if certain elements are difficult for a
human language user to find in a text, then the automatic
identification of these elements also yields poor results. One
problem is that in several cases we must look beyond a
single sentence. The manual annotation identified several
sentences that were metaphorical but did not contain
words from both the source and the target domains, i.e.
they were problematic with regard to recall. There were
sentences in which a word denoting a concrete action in
its literal interpretation (source domain) referred to a
metaphorical event, which could only be deduced from
the extra-sentential context.
In other cases, the metaphoricity of the sentence was
signaled by a single word which incorporated both the
source and the target meaning.
Precision values were lowered by the frequent occurrence
of sentences which contained both a source and a target
expression but were not metaphorical. A typical example
is given below:
Mérnökök és vezetők tanakodnak kisebb csoportokban a
23 emelet magas fúrótorony tövében. (‘Small groups of
engineers and managers are discussing their options at
the base of the 23-storey tall oil-rig.’)
The word manager is a target-domain expression and the
adjective tall is a source-domain expression for the
metaphor CONTROL IS UP, but the two words are
conceptually unrelated in this particular sentence.
4. Conclusions
The present paper investigated the automatic
identification of conceptual metaphors using corpus-
linguistic analyses, and found that the concept of source
and target domains is best characterized by statistical
patterns rather than by psycholinguistic factors. Since the
main objective of our study was to find the most effective
way of automatically identifying conceptual metaphors in
natural texts, we did not carry out a detailed grammatical
analysis of the examples or explore the possible
connection between the type of texts and the type of
metaphors occurring in them. However, it seems that our
research supports previous results of corpus-linguistic
analyses, in particular those regarding collocations and the
linguistic form of metaphorical expressions. This is also
confirmed by the fact that, while the lists compiled on the
basis of the association experiment had very weak
predictive force, the targeted selection of words
characteristic of the conceptual domains brought the best
result, which means that not every association suggests
metaphoricity but only the common co-occurrences of
certain words and expressions. For example, in Hungarian
the co-occurrence of the verb pazarol ('waste') and the
noun idő ('time'), or the verb gerjeszt ('induce') and the noun
harag ('anger') within a single sentence almost always
signals a metaphor.
Our analyses also found several examples highlighting the
importance of grammatical form: for example, in the case
of the conceptual metaphor RESOURCES ARE FOOD,
according to the reference corpus method the source
domain is represented mainly by verbs (fogyaszt
‘consume’, felfal ‘devour’, táplál ‘feed’), while the
majority of the words collected in the association experiment are
nouns (edény ‘dish’, fagylalt ‘ice cream’, reggeli
‘breakfast’ etc.). This observation supports the results
obtained by Deignan (2005) showing that for the majority
of metaphorical expressions, words referring to the source
domain are verbs or adjectives. The author argues that this
is because in metaphorical language use people try to
describe abstract entities, thus they take words denoting
behaviors, features or actions from the concrete source
domains. Of course, the confirmation of these hypotheses
requires a more comprehensive analysis of the metaphors
found so far. Our plans for the future involve the
expansion of the reference corpus and the extraction of a
larger word list for source and target domains. At the
same time, we intend to analyze the English, Spanish and
Italian versions of the texts, and to compare the results
with the Hungarian data, since cross-linguistic analyses
might reveal important factors in the conceptual nature of
metaphorical expressions.
5. References
Andrews, M., Vigliocco, G., Vinson, D. (2005). The role
of attributional and distributional information in
semantic representation. In B. Bara, L. Barsalou, & M.
Bucciarelli (Eds.), Proceedings of the Twenty Seventh
Annual Conference of the Cognitive Science Society.
Andrews, M., Vinson, D., Vigliocco, G. (2007).
Evaluating the Contribution of Intra-Linguistic and
Extra-Linguistic Data to the Structure of Human
Semantic Representations. In Proceedings of the
Cognitive Science Society.
Boroditsky, L., Ramscar, M. (2002). The roles of body
and mind in abstract thought. Psychological Science,
13, pp. 185–188.
Burgess, C., Lund, K. (1997). Representing abstract words
and emotional connotation in high-dimensional
memory space. In Proceedings of the Cognitive Science
Society, Hillsdale, NJ: Lawrence Erlbaum Associates,
pp. 61–66.
Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V.
(2002). GATE: A Framework and Graphical
Development Environment for Robust NLP Tools and
Applications. In Proceedings of the 40th Anniversary
Meeting of the Association for Computational
Linguistics (ACL'02), Philadelphia.
Deignan, A. (2005). Metaphor and corpus linguistics,
Amsterdam/Philadelphia: John Benjamins.
Deignan, A. (2008). Corpus linguistics and metaphor. In
R.W. Gibbs Jr. (Ed.), The Cambridge Handbook of
Metaphor and Thought, Cambridge: Cambridge
University Press, pp. 280–294.
Fauconnier, G., Turner, M. (2002). The way we think:
conceptual blending and the mind’s hidden
complexities. New York: Basic Books.
Gibbs, R.W. (2006). Embodiment and cognitive science,
New York: Cambridge University Press.
Gibbs Jr., R.W., Matlock, T. (2008). Metaphor,
imagination and simulation. Psycholinguistic evidence.
In R.W. Gibbs Jr., (Ed.), The Cambridge Handbook of
Metaphor and Thought, Cambridge: Cambridge
University Press, pp. 161–176.
Keysar, B., Shen, Y., Glucksberg, S., Horton, W.S.
(2000). Conventional language: How metaphorical is
it? Journal of Memory and Language, 43, pp. 576–593.
Kiss, G. (2007). Magyar szókincstár [Hungarian Word
Thesaurus], Budapest: Tinta.
Kövecses, Z. (2002). Metaphor. A Practical Introduction,
Oxford: Oxford University Press.
Lakoff, G., Johnson, M. (1980). Metaphors we live by,
Chicago: University of Chicago Press.
Lakoff, G., Johnson, M. (1999). Philosophy in the Flesh:
The Embodied Mind and Its Challenge to Western
Thought, New York, NY: Basic Books.
Landauer, T.K., Dumais, S.T. (1997). A solution to Plato's
problem: the Latent Semantic Analysis theory of
acquisition, induction and representation of knowledge.
Psychological Review, 104(2), pp. 211–240.
Martin, J.H. (2006). A corpus-based analysis of context
effects on metaphor comprehension. In A.
Stefanowitsch & S.Th. Gries (Eds.), Corpus-based
approaches to metaphor and metonymy, Berlin/New
York: Mouton de Gruyter, pp. 214–236.
Halácsy, P., Kornai, A., Oravecz, Cs. (2007). HunPos - an
open source trigram tagger. In Proceedings of the 45th
Annual Meeting of the Association for Computational
Linguistics Companion Volume Proceedings of the
Demo and Poster Sessions, Association for
Computational Linguistics, Prague, Czech Republic,
pp. 209–212.
Pragglejaz Group. (2007). MIP: A method for identifying
metaphorically used words in discourse. Metaphor and
Symbol, 22(1), pp. 1–39.
Stefanowitsch, A. (2006). Words and their metaphors: a
corpus-based approach. In A. Stefanowitsch & S.Th.
Gries (Eds.), Corpus-based approaches to metaphor
and metonymy, Berlin/New York: Mouton de Gruyter,
pp. 63–105.
Szamarasz, V.Z. (2006). Az idő téri metaforái: a
metaforák szerepe a feldolgozásban [The spatial metaphors
of time: the role of metaphors in processing]. Világosság,
47(8-9-10), pp. 99–109.
Trón, V., Németh, L., Halácsy, P., Kornai, A., Gyepesi,
Gy., Varga, D. (2005). Hunmorph: open source word
analysis. In Proceedings of the ACL 2005 Workshop on
Software, pp. 77–85.