Getting to the Bottom of How Language Works



Gilles-Maurice de Schryver
With Patrick around, it always feels that bit more likely that we shall
get to the bottom of how language works.
Adam Kilgarriff 1
1. Genesis of this book
This book originated in 2006, when some friends, colleagues, and admirers of
Patrick Hanks got together at the prompting of one of them and agreed to compile
a Festschrift for him. It is a pleasure to pay tribute here to the original editors
(Gregory Grefenstette, Ramesh Krishnamurthy, Karel Pala, and James Pustejovsky)
and to the constant enthusiasm and unwavering support of a few early contributors,
in particular Igor Mel’čuk and David Wiggins. Special mention must be made here of
Anne Urbschat, who first broached the idea and heroically carried it forward
virtually unaided for three years. In the summer of
2009, Anne challenged me to take over the project, knock it into shape, and find
a publisher.
When I came to examine the contributions that Anne had already solicited
and obtained, I recognized immediately that this was not a mere ragbag of duti-
ful tributes by a small coterie of colleagues, but a major contribution to under-
standing the lexicon (an area of linguistics of great current interest), by an un-
usually wide range of leading lexicographers, lexicologists, philosophers of
language, and computational linguists. More specifically, the contributions ad-
dress issues in the analysis of corpus data and definition writing from both a
theoretical and a practical point of view. I solicited some additional contribu-
tions and rejected a few that did not seem to fit in well with the theme of the
2. Patrick Hanks’s contribution to lexicography, corpus linguistics, and
lexical theory
Patrick Hanks is a linguistic theorist and empirical corpus analyst, also an ono-
mastician, but above all he is a lexicographer. He has a way with words. And
yet, in many of his theoretical writings, he proposes that lexicographers need to
do away with words in order to focus on phraseology. In the words of one of
his good friends:
He’s the ideal lexicographer’s lexicographer. I like the way he thinks about
Sue Atkins
Patrick has played a central role in editing no fewer than four major, highly
original dictionaries of the English language: the Hamlyn Encyclopedic World
Dictionary (1971), the Collins Dictionary of the English Language (1979), the
Collins COBUILD English Language Dictionary (1987), and The New Oxford
Dictionary of English (1998), as well as three important dictionaries of personal
names (for these, cf. Section 3).2
However, it is for his work as a corpus linguist, building on foundations laid
by the late John Sinclair, that he will probably be best remembered. Having
hacked his way through literally hundreds of thousands of corpus vines for
three decades now, Patrick has come to answer the question ‘Do Word Meanings
Exist?’ with ‘Yes, but …’:
Yes, but traditional dictionaries give a false impression. What dictionar-
ies contain are (more or less inaccurate) statements of meaning poten-
tials, not meanings;
Yes, but only in context;
Yes, but the meaning potential of a word consists of a cluster of seman-
tic components, only some of which are activated when the word is used
in context. (cf. Hanks 2000c)
A decade on, his work has morphed into the Theory of Norms and Exploitations,
‘a principled approach according to which exploitations can be identified as
such and set on one side, leaving the distinctive patterns of normal phraseology
associated with each word to stand out more clearly’ (Hanks forthcoming d; but
see also Hanks 2004a).
Certainly, Patrick’s most famous publication is ‘Word Association Norms,
Mutual Information, and Lexicography,’ written jointly with Ken Church,
which reintroduced statistical methods of lexical analysis into linguistics and
which emphasized the importance, for applications such as lexicography,
language teaching, and NLP, of measuring the statistical significance of word
associations in corpora.3
The Church and Hanks paper had a galvanizing effect on the computational
linguistics world in 1989, when it was presented at the 27th Annual Meeting of
the Association for Computational Linguistics (ACL) in Vancouver. There, it
was the only paper to discuss statistical methods in computational linguistics,
while at most (if not all) previous meetings of ACL there were none. Nowadays,
such papers at ACL are in the majority.
The paper has attracted occasional hostile criticism, but seemingly only by
people who feel threatened by it. For example, some people have proposed
log-likelihood measures as a means of compensating for the so-called sparse data
problem. Computer scientists seem to like log-likelihood very much: it is
elegant. But it has not been used in lexicography because it typically produces
results that are less useful, practically speaking, than MI score (the statistical
measure used by Church and Hanks in their 1989 paper) or t-score. As they were
to comment in a later paper (Church et al. 1994), t-score favours collocating
function words, whereas MI favours collocations of pairs of content words.
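The contrast between the two measures can be made concrete with a small calculation. The following Python sketch uses the formulations of MI and t-score that are common in corpus linguistics; the exact estimators used in the papers by Church and Hanks may differ in detail, and the function names and all counts are invented here purely for illustration.

```python
import math


def mi_score(f_xy, f_x, f_y, n):
    """Mutual information (log base 2): observed vs. expected co-occurrence."""
    return math.log2((f_xy * n) / (f_x * f_y))


def t_score(f_xy, f_x, f_y, n):
    """Common t-score approximation: (observed - expected) / sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


# Hypothetical counts in a corpus of N tokens (not taken from any real corpus).
N = 1_000_000
# Two fairly rare content words that nearly always occur together.
content_pair = dict(f_xy=20, f_x=40, f_y=50, n=N)
# Two very frequent words that co-occur often in absolute terms.
function_pair = dict(f_xy=2_000, f_x=10_000, f_y=60_000, n=N)

for label, counts in [("content pair", content_pair),
                      ("function pair", function_pair)]:
    print(label, round(mi_score(**counts), 2), round(t_score(**counts), 2))
```

With these invented counts, MI ranks the rare content-word pair higher, while t-score ranks the frequent pair higher, which is the tendency described in the paragraph above.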
The argument is that collocations have a large role to play in decoding mean-
ing, and that normal collocations are frequently recurrent in actual usage, so
their relative importance can be measured by analysis of a large body of texts.
What is more, Church and Hanks found (and published) a methodology for
discovering the most significant collocates of any selected target word. The
importance of this can hardly be overestimated. Previous studies measured relations
between two pre-selected target words, so they did not give us a discovery
procedure. Church and Hanks also showed how collocates can be grouped to
distinguish meanings.
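The idea of a discovery procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Church and Hanks’s implementation: the function name top_collocates, the symmetric window, and the min_count threshold are all choices made here for the sake of the example.

```python
import math
from collections import Counter


def top_collocates(tokens, target, window=5, min_count=2, top_n=10):
    """Rank the words found near `target` by mutual information.

    Only the target word is chosen in advance; every word seen within
    `window` tokens on either side of it is a candidate collocate.
    """
    n = len(tokens)
    freq = Counter(tokens)
    co_occurrence = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                co_occurrence[tokens[j]] += 1

    scored = []
    for word, f_xy in co_occurrence.items():
        if f_xy < min_count:  # very low counts make MI unreliable
            continue
        mi = math.log2(f_xy * n / (freq[target] * freq[word]))
        scored.append((word, f_xy, mi))
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored[:top_n]


# Toy input, invented for illustration; a real study needs a large corpus.
toy = "the strong tea the powerful car the strong tea the old car".split()
for word, count, mi in top_collocates(toy, "strong", window=1):
    print(word, count, round(mi, 2))
```

The point of the design is that nothing but the target word is fixed beforehand: the candidate collocates emerge from the text itself and are then ranked, rather than being tested pair by pair.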
When the proceedings of recent corpus linguistics conferences are read, it is
surprising and saddening to note that there are many corpus linguists who have
still, twenty years on, not yet adjusted their thinking to the most fundamental
theoretical implication of this paper, namely that natural languages are analogi-
cal systems built around prototypes of many different sorts, and that corpora
make it possible to identify these prototypes and measure agreement and vari-
ance statistically. If Church and Hanks are right about this (and their implication
is hard to refute), it means that all linguistic categorization is a statistical procedure,
a point of fundamental importance for lexicography of many different
genres, as well as for theoretical and corpus linguistics.
3. Patrick Hanks’s career
As an undergraduate at Oxford, studying English Language and Literature, Pat-
rick had dreams of becoming a medievalist, but it was not to be. Like many a
young graduate before and since, he drifted around for a while after leaving
Oxford and ended up going into publishing as a trainee editor. He has said that
he learned more about the practicalities of the English language from Mr Bird
and Mr Bell, the chief editor and his deputy in the ‘morgue’ (the editorial
department) of George G. Harrap & Co. Ltd., than from any academic course of
He owes his career in lexicography to the millionaire philanthropist pub-
lisher Paul Hamlyn, who in 1964 appointed the then 24-year-old Patrick as edi-
tor of the Encyclopedic World Dictionary (EWD, 1971), an Anglicization of
Clarence Barnhart’s The American College Dictionary (1947). Patrick’s brief,
as he recalls it, was something like: ‘Minimum of alteration, just introduce
British spellings and add a few cricket terms.’ However, this on-the-job training
was to lead immediately to the rather obvious discovery that a dictionary is
a collective cultural index and that the cultural world of speakers of British
English is a world apart from American culture (and of course other
English-speaking countries are different again). These differences, obviously, are not
restricted to cricket and baseball, nor even to ‘football’ and ‘hockey,’ which
mean completely different things in America. A huge number of cultural and
vocabulary differences rapidly became apparent as he and the young team ap-
pointed to work with him began ploughing their way through the text. These
included differences of social structure, law, business practice, leisure activities,
language use, and many, many other domains. A team of six lexicographers, all
‘learning on the job,’ together with a dozen expert consultants, was assembled.
To take just one example of the need for expert advice: The plant and animal
species of Britain and Europe are of course quite different from those of Amer-
ica, while those in Australia, New Zealand, South Africa, and other English-
speaking countries are different again. Patrick therefore decided that a full-scale,
world-wide review of terms denoting flora and fauna was necessary (from a
British standpoint, naturally). This was just one of many special investigations
that were undertaken for the Hamlyn dictionary. This dictionary did not make
much of a dent in the market supremacy in Britain of Oxford and Chambers, but
it did well abroad, and was in turn to be the foundation for the Australian Mac-
quarie Dictionary.
While editing EWD, Patrick got to know the American lexicographer Laur-
ence Urdang (cf. Hanks 2008b) and, through him, the Scottish publisher Jan
Collins, who was looking for a British editor for a monolingual dictionary to
join the range of new Collins bilingual dictionaries that he had commissioned.
Together with Larry Urdang, Alan Isaacs, Paul Procter, and Della Summers, all
of whom were to make important contributions to lexicography in English, and
a large team of assistants and consultants, Patrick set to work to create a com-
pletely new dictionary of English. This was to be the hugely successful Collins
Dictionary of the English Language (1979), which became the flagship of a
range of synchronic monolingual English dictionaries published by Collins.
Patrick’s introductory essay on ‘Meaning and Grammar’ in the first edition of
this dictionary (Hanks 1979a, unfortunately omitted from subsequent editions),
is his first important theoretical statement. It is of interest, not least because it is
based on the experience of practical lexicography rather than on theoretical
speculation. At the same time, he wrote a fundamental paper on the theory and
practice of definition writing (Hanks 1979b).
At this point, Patrick decided that the time had come to put lexicography be-
hind him. After a short period teaching English for Business Purposes in Swe-
den, he signed up as a PhD student in linguistics at the University of Essex. At
this time (1980-82), the Department of Language and Linguistics at the University
of Essex was, as he puts it, ‘crawling with Chomskyans.’ The interaction
was one that he describes as ‘one of mutual bewilderment.’ Patrick’s interests in
word use and word meaning were satisfied neither by the syntactocentric approach
to linguistic theory of generative linguists nor by the logical approaches
to meaning (symbol pushing) of formal semanticists. It was, however, not an
entirely unproductive period for him, as he struck up a lifelong friendship with
his supervisor, Yorick Wilks, who introduced him to Preference Semantics in
the context of Artificial Intelligence and to the work of linguistic philosophers
such as Wittgenstein, Putnam, and Grice, all of which were to bear fruit later on.
Patrick did not pursue his PhD studies at Essex, however. According to his supervisor:
Patrick turned out not to need a PhD for what he was then doing and decided
to leave that until later, like a continental scholar of the age of “habilitation.”
But I suspect we both gained from our discussions in those times long past;
I certainly did.
Yorick Wilks (cf. p. 50)
In 1983, Patrick received ‘an offer he could not refuse.’ He was appointed
project manager of John Sinclair’s COBUILD project at the University of Birmingham.
COBUILD at this time had been in development for over two years.
Patrick describes what he found there as ‘a project bursting with talent and good
ideas, but suffering from inadequate management.’ One of the lexicographers
on the team takes up the story:
The first thing Patrick said to me was ‘I want to die.’ This was 1983. He had
just arrived to bring some sense to the intellectually exciting but organiza-
tionally-challenged COBUILD project (on which I was a lowly lexicogra-
pher) and had presumably just been looking at his in-tray. I liked him im-
mediately, and over the years have learned an enormous amount from him,
both in conversations and from reading his prolific, original, and insightful
output. For my money he is the most important lexicographic thinker since
Samuel Johnson.
Michael Rundell
John Sinclair’s low-key approach, which typically consisted of throwing out a
few carefully chosen, thought-provoking questions and encouraging students to
work things out for themselves by analysing data, debate, and other activities,
was very well-suited to the context of university teaching and research, but not
to the business of dictionary publishing, where something more heavy-handed is
called for. Patrick provided the requisite heavy-handedness to the management
of the project, as well as more delicate contributions to the content. He restruc-
tured the project, negotiated new arrangements with the university bureaucracy
and Collins Publishers, and hired staff to be trained as full-time professional
lexicographers, rather than part-timers working at home, with only a skeleton
staff of full-time professionals. He chaired a committee of the three team lead-
ers (senior lexicographers) and insisted on mutual common sense and practical
implementation of collectively agreed editorial policy; i.e., the committee was
not just an academic talking shop for airing ideas. He introduced a few adjust-
ments of his own to project policy. Probably the most important of these was
insisting that lexicographers should take account of not just one, but three
sources of evidence: corpus data, intuitions, and previous dictionaries. That is,
he saw a role for respectful evaluation of the English lexicographic tradition and
the personal knowledge of lexicographers, in particular the comparison of
intuitions of two or more team members in interpreting corpus data. He was
not afraid to make ruthless judgements.
Less than a year before the end of the project, he judged (to everyone’s consternation)
that the entries that had been drafted so far were riddled with unacceptable
vagueness about the relationship between definitions and definienda,
and that this would confuse learners. As a result, after some debate, and with the
computational support of Jeremy Clear, all the definitions were rewritten during
the final editing phase in the now familiar COBUILD style of ‘full-sentence
definitions,’ as a practical implementation of Sinclair’s objections to traditional
dictionary style. The details are discussed in Hanks (1987, 1988b).
Working on the COBUILD project brought about a ‘road to Damascus’ conversion
for Patrick personally, which was to influence all his subsequent work
in corpus linguistics and lexicography. Definitions in his two previous diction-
aries had been based on a mixture of comparative introspection (at least two
members of the editorial team had to agree that the definitions in an entry correctly
represented their mutual beliefs about the word’s meaning) and accretion
(surveying earlier dictionaries), supported by directed reading programmes,
designed to explore the vocabulary of particular domains rather than the lan-
guage as a whole. COBUILD changed all that. Working systematically through
the lexicon, Patrick and his colleagues at COBUILD became aware from a very
early stage in the project that pre-corpus dictionaries regularly misreport or
distort the basic meanings of words. Dictionaries that claimed to put the mod-
ern meaning first often failed to do so, because they did not have enough evi-
dence to know which meaning was the modern meaning. Distortions ranged
from the gross to the subtle. An example of a gross distortion was that pre-corpus
dictionaries (even those claiming to put the modern meaning first) stated
that the most modern meaning of the noun dope is ‘a kind of varnish,’ even
though by 1979 the ‘drugs’ sense had been dominant for at least two decades.
Examples of more subtle questions involved deciding how widely to define the
scope of dope. Is it a slang term used generically for any drug, even aspirins and
laxatives? Or does it denote only drugs that have some effect such as impairing
athletic performance or imparting a ‘high’ or a sense of wellbeing? When used
of a recreational drug, does it denote primarily cannabis, or is it used more
widely to include cocaine, heroin, etc.? Without corpus evidence, lexicogra-
phers were powerless to decide such questions by any means other than guess-
work. Corpus evidence provided a basis for an attempt to provide answers. The
answers were not always correct (subsequent collections of data from larger
corpora and from the Internet have since prompted revisions), but COBUILD’s
definitions were not only radically different in style from those of previous dic-
tionaries; they also represented the first attempt ever to base the definitions of
contemporary words on usage as recorded in a large collection of contemporary
A much debated word in COBUILD at the time was take. Should a concrete
sense such as ‘remove,’ ‘steal,’ ‘escort,’ etc. be placed first, or should pride of
place be given to the much more frequent idiomatic use of the verb in semanti-
cally depleted structures such as take a look, take a step, take a breath, take a
walk, take time, take charge, take a chance?
In Hanks (2000d), Patrick recounts his earliest experiences of studying cor-
pus evidence with John Sinclair. Looking at concordances for the word lap,
Patrick commented that the several hundred uses of this word in the Birmingham
corpus did not contain a single example of ‘going once around a track.’
Sinclair’s response was, ‘I’m much more interested in all those punctuation
marks.’ Indeed, the most normal use of lap in English is in a prepositional
phrase in clause-final position: in her lap, on his lap, etc. Obviously, this sense
and this phraseology had to be represented first in the COBUILD dictionary,
although the other senses had to be recorded as well.
Literally thousands of similar decisions about the meaning and use of English
words were supported in COBUILD by corpus evidence, leading its publisher
to claim that COBUILD, published in 1987, was the first dictionary designed
to help learners with ‘real English.’
For an overview of those revolutionary times in lexicography, see Patrick’s
own accounts in Hanks (1990a), an early take, as actually written in 1984, and
Hanks (2009d), for a perspective 25 years later.
After the publication of the first edition of COBUILD, Patrick spent some
time as a visiting scientist working with Ken Church at AT&T Bell Laboratories
in New Jersey, while retaining his role as chief editor of English dictionaries
for Collins. As pointed out in Section 2, it was then that Ken and Patrick wrote
a series of highly influential papers on statistical analysis of lexical items in
corpora. Patrick developed and maintained strong links with the research com-
munity, while remaining fully engaged with practical lexicography. Starting
around that time, he also became a frequent guest lecturer at various universities
and research institutions, first in Britain and America, later around the world.
An attendee recalls meeting Patrick:
Patrick was an invited speaker at the Computing Research Laboratory at
New Mexico State University in the late 1980s. He gave a wonderful inspir-
ing talk about the use of corpora in lexicography and I was lucky enough to
join the dinner held for him. He entertained all of us with his terrific stories
and I have been a fan ever since. His recent work on norms and exploitations
has had a significant influence on my own research, and his insights into the
use of words and phrases has led to many helpful discussions for our work
on the detection of anomaly in text.
Louise Guthrie
In 1990 Patrick joined Oxford University Press as manager (later chief editor)
of current English dictionaries. Soon after he arrived, he organized a collaboration,
known as the Hector Project, between OUP and the Systems Research
Laboratory of Digital Equipment Corporation in Palo Alto, California. This was
the first systematic attempt ever to link word meaning with word use using cor-
pus evidence. Unfortunately, the results were never adequately reported or pub-
licly evaluated, though the Hector entries were used as a benchmark in Sen-
seval.4 The best relevant accounts of the project from within were given at suc-
cessive COMPLEX conferences in Budapest by Atkins (1992) and Hanks
(1994). Patrick’s paper, titled Linguistic Norms and Pragmatic Exploitations,
was the first airing of what later became the corpus-driven Theory of Norms
and Exploitations.
Taking account of the findings of the Hector Project, Patrick designed and
(with Judy Pearsall as the project editor) edited The New Oxford Dictionary of
English (1998). This is the only dictionary to be based both on a vast collection
of citations (the Oxford Reading Programme, which collects evidence for new,
rare, and unusual words and meanings) and a corpus (the British National Corpus).
One of its many innovations was to attempt a distinction (not consistently
implemented, it must be said) between core meaning and meaning extension or
subsense. For example, the core sense of cocoon is the most literal one: ‘a silky
case spun by the larvae of many insects for protection as pupae.’ Associated
with this are two subsenses: a technical one, ‘a covering that prevents the corrosion
of metal equipment,’ and a general one, ‘something that envelops or surrounds,
especially in a protective or comforting way.’ In everyday English, the
last of these is more common than either of the other two. As John Sinclair
rightly points out in his paper in this book (p. 38), the core sense is not always
the most common one.
In 2000, after ten years in Oxford, Patrick went to Cambridge, Massachusetts,
to work with James Pustejovsky in a software company (Lexeme, later
rechristened LingoMotors) that was developing applications of computational
linguistics, including ‘breaking the tyranny of text matching,’ for information
retrieval. The company bit the dust along with many other software companies
when the bubble burst in 2001-2002, but the ideas continued to be developed at
Brandeis University, where James is a professor of computer science. Patrick
became an adjunct professor there. His research activities at Brandeis included
exploring ways in which Pustejovsky’s Generative Lexicon Theory could be
developed and extended using corpus evidence (cf., e.g., Pustejovsky & Hanks
2001, Hanks & Pustejovsky 2005). This work was to lead to the development of
a prototype of the Corpus Pattern Analysis project (Hanks 2004a), the aim of
which is to map the relationship between word meaning and word use.
In 2003, Patrick was invited to spend a year at the Berlin-Brandenburg
Academy of Sciences and Humanities, to serve as a consultant to Christiane
Fellbaum’s research project Kollokationen im Wörterbuch, investigating in particular
idioms and light verbs. A product of this collaboration was a paper on
German light verbs, outlining a new approach to a corpus-based analysis of
verbs in dictionary format, while at the same time questioning much of the received
wisdom of the Germanistic research community on ‘function verbs’ and
‘support verbs’ (cf. Hanks, Urbschat & Gehweiler 2006).
Patrick has for a long time had a happy and fruitful association with scholars
in the Czech Republic. In 1995 he had been invited to teach at the seventh Se-
ries of Vilém Mathesius Courses at the Charles University in Prague, and since
then he has given a number of talks on various aspects of lexicography at meetings
of the Prague Linguistic Circle and other Czech institutions. In 1996 he
taught an intensive ‘block’ course in computational lexicography and corpus
analysis at the Faculty of Informatics, Masaryk University, in Brno, and went
on to obtain a PhD degree there! The former head of the Information Technology
Department at the Faculty of Informatics explains how this came about:
My close contacts with Patrick Hanks started in 1993, when I had the oppor-
tunity to visit Oxford and other British corpus centres (Lancaster and Bir-
mingham), together with colleagues from Prague. During the visit to Oxford
it was Patrick who took care of us. He explained how the BNC had been
prepared and how it was being used at the time for compiling The New Ox-
ford Dictionary of English (NODE, 1998). This was not only interesting for
us, but was also a motivating experience for the later development of corpus
tools at the Faculty of Informatics, Masaryk University (FI MU), in Brno,
and for building the Czech National Corpus in Prague, under the direction of
František Čermák at the Charles University.
During one of Patrick’s later visits to Brno I was surprised to discover
that, despite his achievements in lexicography, he had never actually com-
pleted a PhD. I convinced him to go for a PhD degree at FI MU. It took
some time, but finally Patrick completed his dissertation at the end of 2001
and in April 2002 he defended it successfully. From that moment our cooperation
became closer and in 2007 our national grant projects allowed me to
employ Patrick at FI MU for two years.
During those two years he put a lot of effort into the development of the
Corpus Pattern Analysis (CPA) project,5 the output of which is a Pattern
Dictionary of English Verbs (PDEV). CPA is a new technique for finding
verb meanings based on the analysis of the context in which the individual
words (verbs) occur. A lexicographer, i.e. Patrick, starts with a representa-
tive sample of the verb tokens in a corpus. Using the CPA tool, developed at
the NLP Centre at FI MU by Pavel Rychlý and Adam Rambousek, he classi-
fies the various contexts into groups, which are then characterized as verb
patterns capturing verb meanings, or rather, meanings of the whole pattern
in which the verb is embedded. In the process information about particular
sentence constituents (typically subject, object, adverbial) is added as well.
The editor and browser for CPA take advantage of the DEB platform,
also developed at the NLP Centre at FI MU. This software includes the cor-
pus manager Manatee/Bonito, together with an integrated version of the
Sketch Engine,6 plus editing and browsing functions. Thus the user can build
and maintain a database of the context patterns of English verbs (PDEV) and
indeed verbs in other languages (presently Italian and Spanish, with the pos-
sibility of Czech to follow).
The PDEV project results (close to 700 English verbs so far) are publicly
accessible online;7 interested users can browse the completed verbs, and see
their patterns and distribution, as well as the corresponding concordances for
each pattern.
In 2009 Patrick moved to Jan Hajič‘s Institute of Formal and Applied
Linguistics (ÚFAL) at the Charles University in Prague, where one of the
projects currently being undertaken is an evaluation of the potential useful-
ness of PDEV for NLP applications. The software for PDEV is being devel-
oped further, with Martin Holub from ÚFAL joining hands with the team at
FI MU, and in this way the fruitful cooperation between Patrick and FI MU
Karel Pala
At the time of writing (June 2010), at an age when most people are enjoying a
peaceful retirement, Patrick’s appetite for lexicographic work remains undiminished.
He has just started not one but two new jobs. As a visiting professor at
the University of Wolverhampton, he is co-editing with Ruslan Mitkov the Ox-
ford Dictionary of Computational Linguistics. And as a visiting professor at the
University of the West of England, in Bristol, he has just started work with
Richard Coates on building a vast database of Family Names of the United
Kingdom (FaNUK), recording the origin, meaning, history, and demographics
of every name that has any reasonable frequency in Britain. The project will
build on Patrick’s lifelong interest in names and his three dictionaries of personal
names published by Oxford University Press: A Dictionary of Surnames
(1988) and A Dictionary of First Names (1990), both compiled with Flavia
Hodges, and the Dictionary of American Family Names (2003), which was sup-
ported by a team of over 30 specialist consultants. FaNUK has received a large
funding grant from the Arts and Humanities Research Council, which will not
only enable the research to be carried out on a sound academic basis, but will
also train the next generation of onomastic researchers.
4. Patrick Hanks’s Theory of Norms and Exploitations
I have quoted Karel Pala at length in the previous section, simply because Patrick’s
connection with the Czech Republic, and Brno in particular, has been
exceptionally important. Here I am not referring to his closing the loop with
regard to obtaining a PhD, a process started two decades earlier at Essex, but to
the dedicated software tools that have been built for him in Brno. Thanks to
these tools, Patrick has been able to put his own theories to the test, producing a
new type of data in the process, and refining his main theory, the Theory of
Norms and Exploitations, further. Those who have seen drafts of his forthcoming
book on the Theory of Norms and Exploitations invariably (and rightly)
refer to it as his Magnum Opus.8
Patrick’s Theory of Norms and Exploitations argues that a natural language
is indeed a rule-governed system of linguistic behaviour, but that there are two
systems of rules: a sort of double helix of rules. One rule system governs
normal phraseology and meaning of words in use, while the other allows lan-
guage users to exploit normal phraseology in all sorts of creative ways. These
two rule systems constantly interact, so that it sometimes happens that a creative
and original use of a word in one generation of language users becomes estab-
lished as a new conventional norm in the next generation.
The Corpus Pattern Analysis project applies the Theory of Norms and Exploitations
by focusing on the verb, ‘the pivot of the clause.’ Each verb is associated
with a number of patterns based on valencies and collocational preferences;
each pattern has a primary implicature (i.e. the meaning of the pattern)
and any number of secondary implicatures. Actual meanings in actual texts are
created and interpreted by language users relying (in a Gricean fashion) on the
meanings associated with each conventional pattern. Thus, an actual meaningful
use of a word in a text is created by a speaker or writer drawing instinctively on
his or her individual store of patterns and meaning associated with that word. It
is interpreted by hearers and readers in much the same way, by referring to their
individual stores of prototypical patterns and meanings. The hearer matches the
actual observed use in a text to a stored phraseological prototype, which has a
meaning. Speaker’s and hearer’s stored prototypes are not identical, of course,
but any gross and noticeable differences are eliminated by social pressures as
the individual grows up and develops in a language community. This general
account has many similarities to Michael Hoey’s Lexical Priming Theory
(2005). Hoey is, of course, another former associate of Sinclair.
According to Patrick, a pattern consists prototypically of a verb, its valency
structure, and its collocational preferences in each argument slot. Therefore, a
pattern is a prototypical proposition, available for manipulation by language
users in a great variety of ways.
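To make this notion concrete, the description of a pattern given above (a verb, its argument slots, the collocational preferences in each slot, and a primary implicature) can be sketched as a small data structure, together with a naive matching step loosely modelled on the way a hearer is said to match an observed use to a stored prototype. This is purely an illustrative sketch in Python: the class `Pattern`, the functions `match_score` and `best_pattern`, the slot names, and the two toy entries for the verb fire are all assumptions invented for this example. They are not taken from the Pattern Dictionary of English Verbs, and real lexical sets and semantic types are far richer than the flat word sets used here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Pattern:
    """A prototypical proposition: a verb plus preferred collocates per argument slot."""
    verb: str
    slots: Dict[str, Set[str]]   # hypothetical slot name -> preferred collocates
    implicature: str             # primary implicature (the meaning of the pattern)
    secondary: List[str] = field(default_factory=list)  # secondary implicatures


def match_score(pattern: Pattern, observed: Dict[str, str]) -> float:
    """Fraction of the pattern's slots whose observed filler is a preferred collocate."""
    if not pattern.slots:
        return 0.0
    hits = sum(1 for slot, preferred in pattern.slots.items()
               if observed.get(slot) in preferred)
    return hits / len(pattern.slots)


def best_pattern(patterns: List[Pattern], verb: str,
                 observed: Dict[str, str]) -> Optional[Pattern]:
    """Return the stored pattern for `verb` that best fits the observed use, if any."""
    candidates = [p for p in patterns if p.verb == verb]
    return max(candidates, key=lambda p: match_score(p, observed), default=None)


# Toy inventory, invented for illustration only (not real dictionary data).
PATTERNS = [
    Pattern("fire",
            {"subject": {"soldier", "police", "gunman"},
             "object": {"gun", "rifle", "shot"}},
            "discharge a firearm"),
    Pattern("fire",
            {"subject": {"employer", "manager", "company"},
             "object": {"employee", "worker", "coach"}},
            "dismiss from employment"),
]

use = {"subject": "manager", "object": "worker"}
print(best_pattern(PATTERNS, "fire", use).implicature)
```

In the terms of the theory, a use that scores poorly against every stored pattern would be a candidate exploitation rather than a norm; where to draw that line is deliberately left open in this sketch.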
It is a lexicographical task to identify, on the basis of corpus evidence, the
underlying conventional norms of word use and meaning that speakers rely on
when communicating with each other. This involves compiling an inventory of
normal uses, which leads in turn to a distinction between norms and exploitations,
which implies an interest in figurative language. Literal meaning and
metaphorical meaning are complementary concepts, two sides of a single coin.
With this we have arrived at yet another research interest of Patrick, namely
metaphors and figurative language (cf., e.g., Hanks 2005c, 2006e, 2008a,
5. Patrick Hanks’s publications
As can be deduced from the foregoing, Patrick has been a prolific writer of all
sorts of reference works, as well as all sorts of lexicographic and
corpus-linguistic research papers. He has covered the breadth and depth of both
fields, all the while remaining at the forefront of innovation.
The Addendum to this chapter lists a first attempt at bringing his published
output together. As with all first attempts at completeness, this list is certainly
not complete, and may even contain a few ‘funnies.’ Even so, with about 150
unique entries already, in just three decades, his output can only be considered
staggering, all the more so since over thirty of the works listed are dictionaries
compiled under his editorship, with no fewer than half of those (16, to be
precise) first editions!
If one plots the number of Patrick’s publications over time, Figure 1 is obtained.
Figure 1: Number of Patrick Hanks’s publications per year.
While Figure 1 clearly indicates that Patrick moved into a different, higher gear
around the time of the COBUILD project, what is truly astonishing
is that he has not merely stepped up his output yet again during the past
ten years, but has moved into an exponential gear, with no signs whatsoever
of slowing down. As a ‘lexicographer’s lexicographer,’ Patrick is currently
(at seventy) in his ‘prime time.’
If one considers only the books by Patrick which are currently held in the
world’s libraries, the statistics are at least as impressive. According to WorldCat,9
there are over ten thousand of Patrick’s books on the world’s library
shelves: 57 different books, totalling 134 editions. The timeline for those library
books is as shown in Figure 2.
Figure 2: Patrick Hanks’s publication timeline in the world’s libraries.
Perhaps surprisingly, his three most widely held works are not among the
four major dictionaries mentioned at the start of Section 2, but are the three onomastic
reference works mentioned at the end of Section 3: A Dictionary of Surnames
(held in over 2,500 libraries worldwide), A Dictionary of First Names
(held in over 2,300 libraries), and the Dictionary of American Family Names
(held in close to 1,400 libraries).
6. Structure of this book
This book is divided, very roughly, into three parts: a theoretical section, a computational
section, and a lexicographic section. In each of those sections, the
contributions have been placed in an order paralleling Patrick’s career.
‘Part I: Theoretical Aspects and Background’ starts with the last known paper
of John Sinclair, which he was actively working on, for this Festschrift, at
the time of his death. It outlines the most radical version of Sinclair’s approach
to collocational analysis, which is contrasted with the terminological
approach of traditional dictionaries. Part I also contains important papers by
Wilks (on Preference Semantics), Pustejovsky & Rumshisky (on the Generative
Lexicon), Mel’čuk (on the Government Pattern), and Wiggins (on Paradoxes),
advancing our theoretical understanding of the nature of word meaning.
Ken Church, quite naturally, opens ‘Part II: Computing Lexical Relations’
with some reflections on the nature (size) of (Web) corpora. Grefenstette goes
on to use a copy of the Web to predict the number of concepts future (computational)
lexicographers will have to describe; the number he predicts is a daunting
one! Guthrie & Guthrie, Geyken, and Pala & Rychlý then provide a tour
d’horizon, taking us to the places where Patrick has been active over the
past few years (the US, Germany and the Czech Republic), and discuss different
aspects of computational approaches to the lexicon, in casu of adjectives,
nouns and verbs, for English, German and Czech. The last two chapters in Part
II revolve around Patrick’s PDEV. Cinková, Holub & Smejkalová evaluate the
current PDEV, while Jezek & Frontini extend PDEV to cover Italian as well.
Sinclair’s chapter at the start of Part I was lightly edited by Rosamund Moon,
Sinclair’s colleague since the earliest days of the COBUILD project. ‘Part III:
Lexical Analysis and Dictionary Writing’ includes some thought-provoking
reflections on idiom, allusion, and convention by Rosamund herself. John Sinclair’s
sister, Sue Atkins, then takes over in Part III, with a fascinating comparison
of the FrameNet database and DANTE, the latest lexicographical database
that is being populated under Sue’s supervision. Kilgarriff & Rychlý then attempt
the ultimate dream: from corpus to dictionary, if not automatically, at
least semi-automatically. On a lighter note, Bogaards then wonders whether
there is any ‘theory’ in ‘lexicography’; the answer, of course, depends on what
one understands by these two terms. Next, Bańko provides a lively account
of an early adoption of the COBUILD approach, in this case for Polish, while
Green delves into the world of (French) argot. Michael Rundell, finally, concludes
with a masterfully elegant essay on Patrick’s elegance.
1 Unless otherwise noted, all quotes in this chapter are ‘personal communication,’
gathered and received during the production of this book.
2 The details of all Patrick Hanks’s publications mentioned in this chapter
are to be found in the list of his publications included as an Addendum to this chapter.
3 The next four paragraphs are taken from my analysis in De Schryver (2009:
4 Cf.
5 Cf. = Hanks (2007a).
6 Cf.
7 Cf. = Hanks (2007c).
8 I have been particularly ‘lucky’ in this regard, having worked closely with
Patrick for the past three years in Brno, Witney, Ghent, Barcelona, Cape
Town, and Kampala on the final compilation and editing of this forthcoming
book. Much of the contents of this chapter, and especially those on his career,
are based on my notes of our conversations.
9 Cf.
[For all publications by Patrick Hanks, see the Addendum to this chapter.]
Atkins, B. T. S. 1992. ‘Tools for Computer-Aided Corpus Lexicography: The
Hector Project’ in F. Kiefer, G. Kiss & J. Pajzs (eds.). 1992. Papers in Com-
putational Lexicography: COMPLEX 92. Budapest: Linguistics Institute,
Hungarian Academy of Sciences, 159.
Barnhart, C. L. 1947. The American College Dictionary. New York: Random
De Schryver, G.-M. 2009. ‘An Analysis of Practical Lexicography: A Reader
(Ed. Fontenelle 2008)’. Lexikos 19: 458-489.
Hoey, M. 2005. Lexical Priming: A New Theory of Words and Language. Ab-
ingdon: Routledge.
Addendum: Publications by Patrick Hanks
Jalovec, Karel. 1967. German and Austrian Violin-Makers. London:
Paul Hamlyn Publishers. [Translated from Czech by George Theiner,
edited by Patrick Hanks.]
Hanks, Patrick (ed.) & Simeon Potter (editorial consultant). 1971. Encyclopedic
World Dictionary. London: Paul Hamlyn Publishers. / first edition
Guralnik, David B., Martin Joos, Frederic G. Cassidy, James E.
Congleton, Patrick Hanks, Marvin Carmony, Bernard van’t Hul &
Raven I. McDavid, Jr. 1973. ‘Dictionary Treatment of Pronunciation
Regional Discussion’ in R. I. McDavid, Jr. & A. R. Duckert (eds.).
1973. Lexicography in English (Annals of the New York Academy of
Sciences 211). New York: New York Academy of Sciences, 141-143.
Mathiot, Madeleine, Willard Van Orman Quine, Henry A. Gleason,
Jr., Patrick Hanks & Sherman M. Kuhn. 1973. ‘Vagaries of Definition
Discussion’ in R. I. McDavid, Jr. & A. R. Duckert (eds.). 1973. Lexicography
in English (Annals of the New York Academy of Sciences 211).
New York: New York Academy of Sciences, 251-252.
Laurence Urdang Associates (comp.), Edwin Riddell (ed.) & Patrick
Hanks (managing editor). 1976. Lives of the Stuart Age, 1603-1714. New
York: Barnes & Noble Books.
Hanks, Patrick, Alan Isaacs & John Daintith. 1977. The Illustrated
Dictionary. London: Sundial Books.
Also published as Hanks, Patrick, Alan Isaacs & John Daintith.
1980. Illustrated Children’s Dictionary. London: Octopus Books Ltd.
Also published as Hanks, Patrick & Alan Isaacs. 1987. Children’s
Illustrated Dictionary. London: Bounty Books.
Laurence Urdang Associates (comp.), William Gould (ed.) & Patrick
Hanks (managing editor). 1978. Lives of the Georgian Age, 1714-1837.
New York: Barnes & Noble Books.
Hanks, Patrick. 1979a. ‘Meaning and Grammar’ in P. Hanks et al. (eds.).
1979. Collins Dictionary of the English Language. London: William
Collins Sons & Co. Ltd., xxxi-xxxv.
Hanks, Patrick. 1979b. ‘To What Extent Does a Dictionary Definition
Define?’ in R. R. K. Hartmann (ed.). 1979. Dictionaries and their Users:
Papers from the 1978 B.A.A.L. Seminar on Lexicography (Exeter Linguistic
Studies 4 & ITL Review of Applied Linguistics 45-46). Exeter:
University of Exeter Press, 32-38.
Hanks, Patrick (ed.), Thomas H. Long (managing ed.), Laurence Ur-
dang (editorial director), et al. 1979. Collins Dictionary of the English
Language. London: William Collins Sons & Co. Ltd. / first edition
Grossmith, George & Weedon Grossmith. 1981. The Diary of a No-
body. London: William Collins Sons & Co. Ltd. [With an introduction by
Patrick Hanks, illustrated by Weedon Grossmith.]
Hanks, Patrick. 1981. ‘Book Review: J. Branford. 1980. A Dictionary of
South African English: New Enlarged Edition (Oxford University Press)’.
English in Africa 8.1: 73-85.
McLeod, William T. (managing ed.), Patrick Hanks (consultant ed.), et
al. 1982. The New Collins Concise Dictionary of the English Language.
London: William Collins Sons & Co. Ltd. / first edition
Hanks, Patrick & Jim Corbett. 1986a. Business Listening Tasks: Guide
for Teachers and Self-Study (Cambridge Professional English Series).
Cambridge: Cambridge University Press.
Hanks, Patrick & Jim Corbett. 1986b. Business Listening Tasks:
Learner’s Book (Cambridge Professional English Series). Cambridge:
Cambridge University Press.
Hanks, Patrick & Flavia Hodges. 1986. The Oxford Minidictionary of
First Names. Oxford: Oxford University Press. / first edition
Hanks, Patrick (ed.), William T. McLeod (managing ed.), Laurence
Urdang (editorial director), et al. 1986. Collins Dictionary of the English
Language. London: William Collins Sons & Co. Ltd. / second edition
(first edition in 1979)
Hanks, Patrick. 1987. ‘Definitions and Explanations’ in J. M. Sinclair
(ed.). 1987. Looking Up: An account of the COBUILD Project in lexical
computing and the development of the Collins COBUILD English Language
Dictionary. London: Collins ELT, 116-136.
Second half (pp. 123-136) reprinted in R. R. K. Hartmann (ed.). 2003.
Lexicography: Critical Concepts. London: Routledge, Vol. III, 191
Sinclair, John M. (chief ed.), Patrick Hanks (managing ed.), et al. 1987.
Collins COBUILD English Language Dictionary. London: William
Collins Sons & Co. Ltd. / first edition
Hanks, Patrick. 1988a. ‘A New Kind of Dictionary for English Learners:
Cobuild’. Les Cahiers de l’APLIUT 30-31 (VIII / 1-2): 73-88.
Hanks, Patrick. 1988b. ‘Typicality and Meaning Potentials’ in M. Snell-
Hornby (ed.). 1988. ZüriLEX ’86 Proceedings: Papers read at the EURALEX
International Congress, University of Zürich, 9-14 September
1986. Tübingen: A. Francke Verlag GmbH, 37-47.
Reprinted in G. Sampson & D. McCarthy (eds.). 2004. Corpus Lin-
guistics: Readings in a Widening Discipline. London: Continuum, 58
Hanks, Patrick & Flavia Hodges. 1988. A Dictionary of Surnames. Ox-
ford: Oxford University Press. / first edition
Hanks, Patrick (chief ed.), William T. McLeod, Marian Makins
(managing eds.), et al. 1988. The Collins Concise Dictionary of the Eng-
lish Language. London: William Collins Sons & Co. Ltd. / second edition
(first edition in 1982)
Sinclair, John M. (chief ed.), Gwyneth Fox, Patrick Hanks (managing
eds.), et al. 1988. Collins COBUILD Essential English Dictionary. Lon-
don: William Collins Sons & Co. Ltd. / first edition
Church, Kenneth W., William A. Gale, Patrick Hanks & Donald
Hindle. 1989. ‘Parsing, Word Associations and Typical Predicate-
Argument Relations’ in Proceedings of the International Workshop on
Parsing Technologies, 28-31 August 1989. Pittsburgh: Carnegie Mellon
University, 389-398.
Also published in Speech and Natural Language: Proceedings of a
Workshop Held at Cape Cod, Massachusetts, October 15-18, 1989.
San Mateo: Morgan Kaufmann Publishers Inc., 75-81.
Revised version published in M. Tomita (ed.). 1991. Current Issues in
Parsing Technology (Kluwer International Series in Engineering and
Computer Science 126). Norwell: Kluwer Academic Publishers, 103
Church, Kenneth W. & Patrick Hanks. 1989. ‘Word Association
Norms, Mutual Information, and Lexicography’ in Proceedings of the
27th Annual Meeting of the Association for Computational Linguistics,
26-29 June 1989. Vancouver: University of British Columbia, 76-83.
Revised version published in Computational Linguistics 16.1 (1990):
Reprinted in T. Fontenelle (ed.). 2008. Practical Lexicography: A
Reader. New York: Oxford University Press, 285-295.
Reprinted in P. Hanks (ed.). 2008g. Lexicology: Critical Concepts in
Linguistics. Abingdon: Routledge, Vol. VI, 150-167.
Hanks, Patrick. 1989. ‘How common is “common”?’ The Collins Dictionary
Diary 1989: 212.
Hanks, Patrick (chief ed.), Marian Makins (managing ed.), et al. 1989a.
Collins Concise Dictionary Plus. London: William Collins Sons & Co.
Ltd. / new edition, combining dictionary with extensive encyclopaedic
Hanks, Patrick (chief ed.), Marian Makins (managing ed.), et al. 1989b.
Collins Pocket English Dictionary. London: William Collins Sons & Co.
Ltd. / first edition
Hanks, Patrick (chief ed.), Marian Makins, Gwyneth Fox (managing
eds.), et al. 1989. The Collins School Dictionary. London: William
Collins Sons & Co. Ltd. / first edition
Sinclair, John M. (chief ed.), Patrick Hanks (editorial director),
Ramesh Krishnamurthy, et al. 1989. PONS Cobuild English Learner’s
Dictionary. Stuttgart: Ernst Klett Verlag GmbH. / edition for the German
Sinclair, John M. (chief ed.), Patrick Hanks (editorial director),
Rosamund E. Moon, et al. 1989. Collins COBUILD Dictionary of
Phrasal Verbs. London: William Collins Sons & Co. Ltd. / first edition
Hanks, Patrick. 1990a. ‘Evidence and Intuition in Lexicography’ in J.
Tomaszczyk & B. Lewandowska-Tomaszczyk (eds.). 1990. Meaning and
Lexicography (Linguistic and Literary Studies in Eastern Europe 28).
Amsterdam: John Benjamins Publishing Company, 31-41. [Note: This
paper was actually written in 1984 but publication was delayed for 6
years by Benjamins.]
Hanks, Patrick. 1990b. ‘Towards a Statistical Dictionary of Modern
English: Some Preliminary Reflections’ in T. Magay & J. Zigány (eds.).
1990. BudaLEX ’88 Proceedings: Papers from the 3rd International EURALEX
Congress, Budapest, 4-9 September 1988. Budapest: Akadémiai
Kiadó, 53-57.
Hanks, Patrick & Flavia Hodges. 1990. A Dictionary of First Names.
Oxford: Oxford University Press. / first edition
Hanks, Patrick, Alan Isaacs & John Daintith. 1990. The Hamlyn Illus-
trated Children’s Dictionary. London: Hamlyn young books.
Hanks, Patrick, Marian Makins & Diana Adams. 1990. Collins Pa-
perback Thesaurus in A to Z Form. Toronto: HarperCollins Canada. /
second edition (first edition in 1986)
Hardie, Ronald G. 1990. Collins Gem English Grammar. London:
HarperCollins. [Edited by Patrick Hanks & Alice Grandison.]
Church, Kenneth W., William A. Gale, Patrick Hanks & Donald
Hindle. 1991. ‘Using Statistics in Lexical Analysis’ in U. Zernik (ed.).
1991. Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon.
Hillsdale: Lawrence Erlbaum Associates, 115-164.
Hanks, Patrick. 1992. ‘Computational Analysis and Definitional Structure’.
Lexicographica: International Annual for Lexicography 8: 100-129.
Hanks, Patrick & Flavia Hodges. 1992. A Concise Dictionary of First
Names. Oxford: Oxford University Press. / first edition
Hanks, Patrick. 1992/1993a. ‘Lexicography: Theory and Practice’. Dictionaries:
Journal of the Dictionary Society of North America 14: 97-112.
Hanks, Patrick. 1992/1993b. ‘The Present-Day Distribution of Surnames
in the British Isles’. Nomina 16: 79-98.
Church, Kenneth W., William A. Gale, Patrick Hanks, Donald
Hindle & Rosamund E. Moon. 1994. ‘Lexical Substitutability’ in B. T.
S. Atkins & A. Zampolli (eds.). 1994. Computational Approaches to the
Lexicon. Oxford: Clarendon Press, 153-177.
Hanks, Patrick. 1994. ‘Linguistic Norms and Pragmatic Exploitations,
or Why Lexicographers need Prototype Theory, and Vice Versa’ in F.
Kiefer, G. Kiss & J. Pajzs (eds.). 1994. Papers in Computational Lexicography:
COMPLEX ’94. Budapest: Linguistics Institute, Hungarian
Academy of Sciences, 89-113.
Reprinted in P. Hanks (ed.). 2008g. Lexicology: Critical Concepts in
Linguistics. Abingdon: Routledge, Vol. V, 233-255.
Hanks, Patrick & Flavia Hodges. 1994. Naming Your Baby. Oxford:
Oxford University Press.
Hanks, Patrick, Alan Isaacs & John Daintith. 1994. The Hamlyn Illustrated
Children’s Dictionary. London: Hamlyn young books. / new edition
Hanks, Patrick (chief ed.), et al. 1995. Oxford English Reference Dic-
tionary. New York: Oxford University Press.
Hanks, Patrick & Flavia Hodges. 1995. Babies’ Names. Oxford: Ox-
ford University Press. / first edition
Hanks, Patrick. 1996. ‘Contextual Dependency and Lexical Sets’. International
Journal of Corpus Linguistics 1.1: 75-98.
Lewandowska-Tomaszczyk, Barbara & Patrick Hanks. 1996. ‘Completive
Particles and Verbs of Closing in English’ in E. Weigand & F.
Hundsnurscher (eds.). 1996. Lexical Structures and Language Use. Proceedings
of the International Conference on Lexicology and Lexical Semantics,
Münster, September 13-15, 1994. Tübingen: Max Niemeyer
Verlag, 89-103.
Hanks, Patrick. 1997a. ‘Ferocious Empiricism’ = ‘Book Review: J. A.
Foley (ed.). 1996. J. M. Sinclair on Lexis and Lexicography (UniPress)’.
International Journal of Corpus Linguistics 2.2: 289-295.
Hanks, Patrick. 1997b. ‘Lexical Sets: Relevance and Probability’ in B.
Lewandowska-Tomaszczyk & M. Thelen (eds.). 1997. Translation and
Meaning, Part 4. Proceedings of the Łódź Session of the 2nd International
Maastricht-Łódź Duo Colloquium on “Translation and Meaning”, Held
in Łódź, Poland, 20-24 September 1995. Maastricht: School of Translation
and Interpreting, Hogeschool Maastricht, 119-139.
Hanks, Patrick & Flavia Hodges. 1997. A Concise Dictionary of First
Names. Oxford University Press. / second edition (first edition in 1992)
Hanks, Patrick. 1998a. ‘Enthusiasm and Condescension’ in T. Fontenelle,
P. Hiligsmann, A. Michiels, A. Moulin & S. Theissen (eds.). 1998.
Actes EURALEX’98 Proceedings: Communications soumises à EURALEX’98
(Huitième Congrès International de Lexicographie) à Liège,
Belgique / Papers submitted to the Eighth EURALEX International Congress
on Lexicography in Liège, Belgium. Liège: English and Dutch Departments,
University of Liège, 151-166.
Reprinted in W. Teubert & R. Krishnamurthy (eds.). 2007. Corpus
Linguistics: Critical Concepts in Linguistics. Abingdon: Routledge,
Vol. III, 140-153.
Hanks, Patrick. 1998b. ‘Problemas e solucións na preparación de diccionarios
de idioms ingleses’ [Problems and Solutions in the Preparation
of English Idiom Dictionaries] in X. Ferro Ruibal (ed.). 1998. Actas do I
Coloquio Galego de Fraseoloxía. Santiago de Compostela: Centro
Ramón Piñeiro para a Investigación en Humanidades Xunta de Galicia,
Hanks, Patrick & Kate Hardcastle. 1998. ‘The Multiplicitous Origins
of American Family Names’ in W. F. H. Nicolaisen (ed.). 1998. Proceedings
of the XIXth International Congress of Onomastic Sciences, Aberdeen,
August 4-11, 1996. Aberdeen: Department of English, University of
Aberdeen, Vol. III, 164-182.
Pearsall, Judy (ed.), Patrick Hanks (chief ed., current English diction-
aries), et al. 1998. The New Oxford Dictionary of English. New York:
Oxford University Press. / first edition
Hanks, Patrick. 2000a. ‘Contributions of Lexicography and Corpus Linguistics
to a Theory of Language Performance’ in U. Heid, S. Evert, E.
Lehmann & C. Rohrer (eds.). 2000. Proceedings of the Ninth EURALEX
International Congress, EURALEX 2000, Stuttgart, Germany, August 8th
- 12th, 2000. Stuttgart: Institut für Maschinelle Sprachverarbeitung, Uni-
versität Stuttgart, 313.
Hanks, Patrick. 2000b. ‘Dictionaries of Idioms and Phraseology in English’
in G. Corpas Pastor (ed.). 2000. Las lenguas de Europa: Estudios de
fraseología, fraseografía y traducción (Interlingua 12). Granada: Comares,
303-320.
Hanks, Patrick. 2000c. ‘Do Word Meanings Exist?’ Computers and the
Humanities 34.1-2: 205-215.
Reprinted in T. Fontenelle (ed.). 2008. Practical Lexicography: A
Reader. New York: Oxford University Press, 123-134.
Hanks, Patrick. 2000d. ‘Immediate Context Analysis: Distinguishing
Meanings by Studying Usage’ in C. Heffer & H. Sauntson (eds.). 2000.
Words in Context: A Tribute to John Sinclair on his Retirement (ELR
Discourse Analysis Monograph 18). Birmingham: University of Birmingham,
10-30.
Hanks, Patrick & D. Kenneth Tucker. 2000a. ‘A Diagnostic Database
of American Personal Names’. Names: A Journal of Onomastics 48.1:
Hanks, Patrick & D. Kenneth Tucker. 2000b. ‘Two Projects in Onomastic
Lexicography’ in U. Heid, S. Evert, E. Lehmann & C. Rohrer
(eds.). 2000. Proceedings of the Ninth EURALEX International Congress,
EURALEX 2000, Stuttgart, Germany, August 8th - 12th, 2000. Stuttgart:
Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, 213
Hanks, Patrick (chief ed.), Maurice Waite, Sara Hawker, et al. 2000.
The New Oxford Thesaurus of English. Oxford University Press. / first edition
Hanks, Patrick. 2001a. ‘Los diccionarios fraseológicos en lengua inglesa’
[Dictionaries of Idioms and Phraseology in English] in M. C. Ayala
Castro (ed.). 2001. Diccionarios y enseñanza (Ensayos y documentos 45).
Alcalá de Henares: Universidad de Alcalá, 287-303.
Hanks, Patrick. 2001b. ‘The Probable and the Possible: Lexicography in
the Age of the Internet’ in S. Lee (ed.). 2001. Asialex 2001 Proceedings,
Asian Bilingualism and the Dictionary. Seoul: Center for Linguistic In-
formatics Development, Yonsei University, 115.
Reprinted in Studies in Lexicography 11.1 (2001): 736.
Hanks, Patrick & Flavia Hodges. 2001. A Concise Dictionary of First
Names. Oxford: Oxford University Press. / third edition (first edition in
1992, second edition in 1997)
Pustejovsky, James & Patrick Hanks. 2001. ‘Very Large Lexical Databases’.
Tutorial at ACL 2001, Toulouse, 6-11 July 2001. Tutorial notes
available online at
Hanks, Patrick. 2002. ‘Mapping Meaning onto Use’ in M.-H. Corréard
(ed.). 2002. Lexicography and Natural Language Processing: A Festschrift
in Honour of B. T. S. Atkins. S.l.: Euralex, 156-198.
Hanks, Patrick, Flavia Hodges, Anthony D. Mills & Adrian Room.
2002. The Oxford Names Companion. Oxford: Oxford University Press.
Pustejovsky, James, Luc Belanger, José Castaño, Rob Gaizauskas,
Patrick Hanks, Bob Ingria, Graham Katz, Dragomir Radev, Anna
Rumshisky, Antonio Sanfilippo, Roser Saurí, Andrea Setzer, Beth
Sundheim & Marc Verhagen. 2002. NRRC Summer Workshop on Temporal
and Event Recognition for Question Answering Systems. Online at
Hanks, Patrick. 2003a. ‘Americanization of European Family Names in
the Seventeenth and Eighteenth Century’. Onoma: Journal of the International
Council of Onomastic Sciences 38: 119-154.
Hanks, Patrick. 2003b. ‘Lexicography’ in R. Mitkov (ed.). 2003. The
Oxford Handbook of Computational Linguistics: 48-69. New York: Oxford
University Press.
Hanks, Patrick. 2003c. ‘Why WordNet Should Not Include Figurative
Language, and What Would Be Done Instead’ in P. Sojka, K. Pala, P.
Smrž, C. Fellbaum & P. Vossen (eds.). 2003. Proceedings of the Second
International WordNet Conference (GWC 2004), Brno, Czech Republic,
January 20-23, 2004. Brno: Faculty of Informatics, Masaryk University,
Hanks, Patrick (chief ed.), Kate Hardcastle (managing ed.), et al. 2003.
Dictionary of American Family Names (3 volumes). New York: Oxford
University Press. / first edition
Pustejovsky, James, Patrick Hanks, Roser Saurí, Andrew See,
Robert J. Gaizauskas, Andrea Setzer, Dragomir R. Radev, Beth
Sundheim, David Day, Lisa Ferro & Marcia Lazo. 2003. ‘The
TIMEBANK Corpus’ in D. Archer, P. Rayson, A. Wilson & T. McEnery.
2003. Proceedings of the Corpus Linguistics 2003 Conference, 28-31
March 2003 (UCREL Technical Paper 16). Lancaster: Lancaster University,
647-656.
Saurí Colomer, Roser & Patrick Hanks. 2003/2004. ‘Iberian Names in
North America: The Case of Asturian’. Revista de Filoloxía Asturiana 3-4:
89-114.
Hanks, Patrick. 2004a. ‘Corpus Pattern Analysis’ in G. Williams & S.
Vessier (eds.). 2004. Proceedings of the Eleventh EURALEX International
Congress, EURALEX 2004, Lorient, France, July 6-10, 2004.
Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne
Sud, 87-97.
Hanks, Patrick. 2004b. ‘The Syntagmatics of Metaphor and Idiom’. International
Journal of Lexicography 17.3: 245-274.
Hanks, Patrick. 2004c. ‘WordNet: What is to be Done?’ Panel Presentation
at the Second International WordNet Conference (GWC 2004),
Brno, Czech Republic, January 20-23, 2004. Brno: Faculty of Informatics,
Masaryk University, 16. Online at
Hanks, Patrick & James Pustejovsky. 2004. ‘Common Sense about
Word Meaning: Sense in Context’ in P. Sojka, I. Kopeček & K. Pala
(eds.). 2004. Text, Speech and Dialogue. Proceedings of the 7th International
Conference (TSD 2004), Brno, Czech Republic, September 8-11,
2004 (Lecture Notes in Artificial Intelligence 3206). Berlin: Springer-
Verlag GmbH, 15-17.
Pustejovsky, James, Patrick Hanks & Anna Rumshisky. 2004.
‘Automated Induction of Sense in Context’ in Proceedings of the 20th International
Conference on Computational Linguistics (COLING 2004),
23-27 August 2004. Geneva: University of Geneva, 924-930.
Hanks, Patrick. 2005a. ‘Johnson and Modern Lexicography’. International
Journal of Lexicography 18.2: 243-266.
Hanks, Patrick. 2005b. ‘Metaphors and Meanings: A Lexicographical
Approach’ in F. Kiefer, G. Kiss & J. Pajzs (eds.). 2005. Papers in Computational
Lexicography, COMPLEX 2005. Budapest: Linguistics Institute,
Hungarian Academy of Sciences, 81-106.
Hanks, Patrick. 2005c. ‘Similes and Sets: The English Preposition like’
in R. Blatná & V. Petkevič (eds.). 2005. Jazyky a jazykověda: Sborník k
65. narozeninám prof. Františka Čermáka [Languages and Linguistics: A
Festschrift in Honour of Professor František Čermák’s 65th Birthday].
Prague: Philosophy Faculty, Charles University in Prague.
Reprinted in P. Hanks & R. Giora (eds.). (to appear in 2010). Meta-
phor and Figurative Language: Critical Concepts in Linguistics. Ab-
ingdon: Routledge, Vol. VI.
Hanks, Patrick & James Pustejovsky. 2005. ‘A Pattern Dictionary for
Natural Language Processing’. Revue française de linguistique appliquée
10.2: 63-82.
Nicholas, Nick & Patrick Hanks. 2005. ‘Robert Ingria, 1953-2003’.
Journal of Greek Linguistics 6.1: 243-244.
Hanks, Patrick. 2006a. ‘Conventions and Metaphors (Norms and Exploitations):
Why Should Cognitive Linguists bother with Corpus Evidence?’
in Meeting of the German Cognitive Linguistics Association,
Theme Session “Cognitive-Linguistic Approaches: What can we Gain by
Computational Treatment of Data?”, 7 October 2006, Munich, Germany,
Hanks, Patrick. 2006b. ‘Definition’ in K. Brown (chief ed.), et al. 2006.
Encyclopedia of Language and Linguistics, Second Edition. Oxford: Elsevier,
Vol. III, 399-402.
Hanks, Patrick. 2006c. ‘English Lexicography’ in K. Brown (chief ed.),
et al. 2006. Encyclopedia of Language and Linguistics, Second Edition.
Oxford: Elsevier, Vol. IV, 184-194.
Hanks, Patrick. 2006d. ‘Lexicography: Overview’ in K. Brown (chief
ed.), et al. 2006. Encyclopedia of Language and Linguistics, Second Edition.
Oxford: Elsevier, Vol. VII, 113-128.
Hanks, Patrick. 2006e. ‘Metaphoricity is Gradable’ in A. Stefanowitsch
& S. T. Gries (eds.). 2006. Corpus-based Approaches to Metaphor and
Metonymy (Trends in Linguistics: Studies and Monographs 171). Berlin:
Mouton de Gruyter, 17-35.
Reprinted in P. Hanks & R. Giora (eds.). (to appear in 2010). Meta-
phor and Figurative Language: Critical Concepts in Linguistics. Ab-
ingdon: Routledge, Vol. VI.
Hanks, Patrick. 2006f. ‘Nicknames’ in K. Brown (chief ed.), et al. 2006.
Encyclopedia of Language and Linguistics, Second Edition. Oxford: Elsevier,
Vol. VIII, 624-626.
Hanks, Patrick. 2006g. ‘Personal Names’ in K. Brown (chief ed.), et al.
2006. Encyclopedia of Language and Linguistics, Second Edition. Oxford:
Elsevier, Vol. IX, 299-311.
Hanks, Patrick. 2006h. ‘Proper Names: Linguistic Status’ in K. Brown
(chief ed.), et al. 2006. Encyclopedia of Language and Linguistics, Second
Edition. Oxford: Elsevier, Vol. X, 134-137.
Hanks, Patrick. 2006i. ‘The English Language: An International Medium
of Communication’ in J. Wu et al. (eds.). 2006. Proceedings of the
International Conference on EU-Fujian, China: Cross-Cultural Dialogue.
Xiamen: Xiamen University.
Hanks, Patrick. 2006j. ‘The Organization of the Lexicon: Semantic
Types and Lexical Sets’ in E. Corino, C. Marello & C. Onesti (eds.).
2006. Atti del XII Congresso Internazionale di Lessicografia, Torino, 6-9
settembre 2006 / Proceedings XII Euralex International Congress,
Torino, Italia, September 6th-9th, 2006. Alessandria: Edizioni dell’Orso,
Hanks, Patrick (section ed.). 2006k. Section: “Lexicography” in K.
Brown (chief ed.), et al. 2006. Encyclopedia of Language and Linguistics,
Second Edition. Oxford: Elsevier.
Hanks, Patrick, Kate Hardcastle & Flavia Hodges. 2006. Oxford Dic-
tionary of First Names. Oxford: Oxford University Press. / second edition
(first edition in 1990)
Hanks, Patrick, Anne Urbschat & Elke Gehweiler. 2006. ‘German
Light Verb Constructions in Corpora and Dictionaries’. International
Journal of Lexicography 19.4: 439-457.
Rumshisky, Anna, Patrick Hanks, Catherine Havasi & James Pustejovsky.
2006. ‘Constructing a Corpus-based Ontology Using Model Bias’
in G. C. J. Sutcliffe & R. G. Goebel (eds.). 2006. Proceedings of the
Nineteenth International Florida Artificial Intelligence Research Society
Conference, Melbourne Beach, May 11-13, 2006. Menlo Park: AAAI
Press, 327-332.
Sheidlower, Jesse & Patrick Hanks. 2006. ‘American Lexicography’ in
K. Brown (chief ed.), et al. 2006. Encyclopedia of Language and Linguistics,
Second Edition. Oxford: Elsevier, Vol. I, 184-193.
Hanks, Patrick. 2007a. Corpus Pattern Analysis (CPA) Project Page.
Online at
Hanks, Patrick. 2007b. ‘John Sinclair (1933-2007)’. International
Journal of Lexicography 20.2, Euralex Newsletter Summer 2007: 209-215.
Hanks, Patrick. 2007c. Pattern Dictionary of English Verbs (PDEV)
Project Page. Online at
Hanks, Patrick. 2007d. ‘Preference Syntagmatics’ in K. Ahmad, C.
Brewster & M. Stevenson (eds.). 2007. Words and Intelligence II: Essays
in Honor of Yorick Wilks (Text, Speech and Language Technology 36).
Berlin: Springer-Verlag GmbH, 119-135.
Hanks, Patrick, Karel Pala & Pavel Rychlý. 2007. ‘Towards an
Empirically Well-founded Semantic Ontology for NLP’ in Proceedings of
the 4th International Workshop on Generative Approaches to the Lexicon,
Paris, May 10-11, 2007.
Gehweiler, Elke & Patrick Hanks. 2008. ‘The Linguistic Field: An
Investigation’ in P. Hanks (ed.). 2008g. Lexicology: Critical Concepts in
Linguistics. Abingdon: Routledge, Vol. II, 22-44. [Translation of Jost
Trier’s (1934) ‘Das sprachliche Feld: Eine Auseinandersetzung’ in Neue
Jahrbücher für Wissenschaft und Jugendbildung 10: 428-449.]
Hanks, Patrick. 2008a. ‘How to Say New Things: An Essay on Linguistic
Creativity’. Brno Studies in English 34: 39-50.
Hanks, Patrick. 2008b. ‘Laurence Urdang (1927-2008)’. International
Journal of Lexicography 21.4, Euralex Newsletter Winter 2008: 467-471.
Hanks, Patrick. 2008c. ‘Lexical Patterns: From Hornby to Hunston and
Beyond’ in E. Bernal & J. DeCesaris (eds.). 2008. Proceedings of the
XIII EURALEX International Congress (Barcelona, 15-19 July 2008)
(Sèrie Activitats 20). Barcelona: Institut Universitari de Lingüística
Aplicada, Universitat Pompeu Fabra, 89-129.
Hanks, Patrick. 2008d. ‘Lexicology: General Introduction’ in P. Hanks
(ed.). 2008g. Lexicology: Critical Concepts in Linguistics. Abingdon:
Routledge, Vol. I, 1-35.
Hanks, Patrick. 2008e. ‘The Lexicographical Legacy of John Sinclair’.
International Journal of Lexicography 21.3: 219-229.
Hanks, Patrick. 2008f. ‘Towards a Diachronic Structural Semantics’ in
P. Hanks (ed.). 2008g. Lexicology: Critical Concepts in Linguistics.
Abingdon: Routledge, Vol. II, 140-193. [Translation of Eugenio Coşeriu’s
(1964) ‘Pour une sémantique diachronique structurale’ in Travaux de
linguistique et de littérature 2.1: 139-186.]
Hanks, Patrick (ed.). 2008g. Lexicology: Critical Concepts in
Linguistics (6 volumes). Abingdon: Routledge.
Hanks, Patrick (guest ed.). 2008h. Special Issue: “The Legacy of John
Sinclair”. International Journal of Lexicography 21.3.
Hanks, Patrick & Elisabetta Jezek. 2008. ‘Shimmering Lexical Sets’ in
E. Bernal & J. DeCesaris (eds.). 2008. Proceedings of the XIII EURALEX
International Congress (Barcelona, 15-19 July 2008) (Sèrie Activitats
20). Barcelona: Institut Universitari de Lingüística Aplicada, Universitat
Pompeu Fabra, 391-402.
Hanks, Patrick. 2009a. ‘Common Sense Blossoms in Springfield, MA’
= ‘Book Review: S. J. Perrault (ed.). 2008. Merriam-Webster’s Advanced
Learner’s English Dictionary (Merriam Webster Inc.)’. International
Journal of Lexicography 22.3: 301-315.
Hanks, Patrick. 2009b. ‘Dictionaries of Personal Names’ in A. P. Cowie
(ed.). 2009. The Oxford History of English Lexicography. New York:
Oxford University Press, Vol. II, 122-148.
Hanks, Patrick. 2009c. ‘Sestavljanje enojezičnega slovarja za domače
govorce’ [Compiling a Monolingual Dictionary for Native Speakers].
Jezik in slovstvo 54.3-4: 7-24.
Hanks, Patrick. 2009d. ‘The Impact of Corpora on Dictionaries’ in P.
Baker (ed.). 2009. Contemporary Corpus Linguistics (Series:
Contemporary Studies in Linguistics). London: Continuum, 214-236.
Hanks, Patrick. 2009e. ‘The Linguistic Double Helix: Norms and
Exploitations’ in D. Hlaváčková, A. Horák, K. Osolsobě & P. Rychlý (eds.).
2009. After Half a Century of Slavonic Natural Language Processing
(Festschrift for Karel Pala). Brno: Masaryk University, 63-80.
Hanks, Patrick. 2010a. ‘Nine Issues in Metaphor Theory and Analysis’
= ‘Book Review: A. Deignan. 2005. Metaphor and Corpus Linguistics
(John Benjamins Publishing Company)’. International Journal of Corpus
Linguistics 15.1: 133-150.
Hanks, Patrick & Kate Hardcastle. 2010. Babies’ Names. New York:
Oxford University Press. / second edition (first edition in 1995)
Jezek, Elisabetta & Patrick Hanks. 2010. ‘What Lexical Sets Tell Us
about Conceptual Categories’. Lexis: E-Journal in English Lexicology 4
(Corpus Linguistics and the Lexicon / La linguistique de corpus et le
lexique), 7-22. Online at
To appear in 2010
Hanks, Patrick. (2010b). ‘Compiling a Monolingual Dictionary for
Native Speakers’. Lexikos 20.
Hanks, Patrick. (2010c). ‘Elliptical Arguments’ in S. Granger & M.
Paquot (eds.). Proceedings of ELEX 2009 eLexicography in the 21st
century: New challenges, new applications (Cahiers du Cental).
Louvain-la-Neuve: Presses universitaires de Louvain.
Hanks, Patrick. (2010d). ‘Lexicography, Printing Technology, and the
Spread of Renaissance Culture’ in A. Dykstra (ed.). Proceedings of the
14th EURALEX International Congress, 6-10 July 2010,
Leeuwarden/Ljouwert, The Netherlands.
Hanks, Patrick. (2010e). ‘Making Meanings’ in B. Sharp & M. Zock
(eds.). Proceedings of the 7th International Workshop on Natural
Language Processing and Cognitive Science (NLPCS 2010), 8-9 June 2010,
Funchal, Madeira, Portugal.
Hanks, Patrick. (2010f). ‘Marcus Fabius Quintilianus, Institutio
Oratoria (Institutes of Oratory)’ in P. Hanks & R. Giora (eds.). Metaphor and
Figurative Language: Critical Concepts in Linguistics. Abingdon:
Routledge, Vol. I. [Translation of Quintilian’s ‘Institutio Oratoria’ (AD
95), Bk. 8, Ch. 6.]
Hanks, Patrick. (2010g). ‘Terminology, Phraseology, and Lexicography’
in A. Dykstra (ed.). Proceedings of the 14th EURALEX International
Congress, 6-10 July 2010, Leeuwarden/Ljouwert, The Netherlands.
Hanks, Patrick & Rachel Giora (eds.). (2010). Metaphor and
Figurative Language: Critical Concepts in Linguistics (6 volumes, with 6
introductory articles). Abingdon: Routledge.
In preparation
DeCesaris, Janet & Patrick Hanks. (forthcoming). ‘The Persistence of
Technology in Lexis’.
Hanks, Patrick. (forthcoming a). ‘Book Review: D. Geeraerts. 2010.
Theories of Lexical Semantics (Oxford University Press)’. International
Journal of Lexicography.
Hanks, Patrick. (forthcoming b). ‘Coding Semantic Properties of Words
in Computational Dictionaries’ in R. H. Gouws, U. Heid, W.
Schweickard & H. E. Wiegand (eds.). Dictionaries. An International
Encyclopedia of Lexicography. Supplementary Volume: Recent
Developments with Special Focus on Computational Lexicography (Handbooks
of Linguistics and Communication Science). Berlin: Walter de Gruyter.
Hanks, Patrick. (forthcoming c). ‘English and American II: Synchronic
Lexicography’ in R. H. Gouws, U. Heid, W. Schweickard & H. E.
Wiegand (eds.). Dictionaries. An International Encyclopedia of Lexicography.
Supplementary Volume: Recent Developments with Special Focus on
Computational Lexicography (Handbooks of Linguistics and
Communication Science). Berlin: Walter de Gruyter.
Hanks, Patrick. (forthcoming d). Lexical Analysis: Norms and
Exploitations. Cambridge, MA: MIT Press.
Hanks, Patrick. (forthcoming e). ‘Lexicography’ in R. Mitkov (ed.).
Oxford Handbook of Computational Linguistics, Second Edition. New York:
Oxford University Press.
Hanks, Patrick. (forthcoming f). ‘Lexicography from Earliest Times to
the Present’ in K. Brown & K. Allan (eds.). Oxford Handbook of the
History of Linguistics. New York: Oxford University Press.
Hanks, Patrick. (forthcoming g). ‘Linguistic Creativity’ in T. Veale (ed.).
The Agile Mind.
Hanks, Patrick. (forthcoming h). ‘Monolingual Lexicography’ in C.
Chapelle (chief ed.), et al. Encyclopedia of Applied Linguistics. Hoboken,
NJ: Wiley-Blackwell.
Hanks, Patrick. (forthcoming i). ‘Representing the Unrepresentable:
Dictionaries, Documents, and Meaning’ in P. M. Bertinetti (ed.). Atti del
XLII Congresso Internazionale di Studi, Pisa, Scuola Normale Superiore,
25-27 settembre 2008.
Hanks, Patrick. (forthcoming j). ‘The Lexicon’ in R. Mitkov (ed.).
Oxford Handbook of Computational Linguistics, Second Edition. New York:
Oxford University Press.
Hanks, Patrick. (forthcoming k). ‘Wie aus Wörtern Bedeutung entsteht:
Semantische Typen treffen auf syntaktische Dependenzen’ in
Sprachliches Wissen zwischen Lexikon und Grammatik, 46. Jahrestagung
des Instituts für Deutsche Sprache (IDS), 9.-11. März 2010.
Hanks, Patrick & Peter McClure. (forthcoming). ‘Some Methodological
Considerations in Approaching the Study of Family Names’. Nomina:
Journal of the Society for Name Studies in Britain and Ireland.
Mitkov, Ruslan & Patrick Hanks (eds.). (forthcoming). Oxford
Dictionary of Computational Linguistics. New York: Oxford University