The Preposition Project
Ken Litkowski
CL Research
9208 Gue Road
Damascus, MD 20872
ken@clres.com
Orin Hargraves
5130 Band Hall Hill Road
Westminster, MD 21158
orinkh@carr.org
Abstract
Prepositions are an important vehicle for
indicating semantic roles. Their meanings
are difficult to analyze and they are often
discarded in processing text. The
Preposition Project is designed to provide
a comprehensive database of preposition
senses suitable for use in natural language
processing applications. In the project,
prepositions in the FrameNet corpus are
disambiguated using a sense inventory from
a current dictionary, guided by a
comprehensive treatment of preposition
meaning. The methodology provides a
framework for identifying and
characterizing semantic roles, a gold
standard corpus of instances for further
analysis, and an account of semantic role
alternation patterns. By adhering to this
methodology, it is hoped that a
comprehensive and improved
characterization of preposition behavior
(semantic role identification, and syntactic
and semantic properties of the preposition
complement and attachment point) will be
developed. The databases generated in the
project are publicly available for further use
by researchers and application developers.
1 Introduction
Characterization of preposition meanings is
important for understanding the semantic
relations between elements of a sentence. The
difficulty of this task arises from their polysemy.
Defining English prepositions in a native speaker
dictionary is a thankless task: the senses are
many and complexly interrelated; the frequency
of prepositions requires the study of numerous
examples; and their treatment in dictionaries may
cause confusion and information overload, since
there is little agreement in minor, and sometimes
even in major sense divisions. The lexicographer
may suspect that the effort is of little value:
native speakers do not often consult dictionaries
to learn what a preposition means.
When the dictionary entries are used as the
basis for processing text, the tables are turned:
the definition of prepositions is of vital
importance, and can be an important resource.
But the distinctions that dictionaries draw for the
human reader are not in all cases easily or
efficiently processed by computer, and the
dictionary alone may be insufficient.
The Preposition Project (TPP) is designed to
provide a comprehensive characterization of
preposition senses suitable for use in natural
language processing. It is attempting to fine-tune
the distinctions within and among prepositions in
a native speaker dictionary (the Oxford
Dictionary of English, 2003) by comparing and
contrasting them with the treatment of
prepositions in two other sources: the instances
of prepositions that are functionally tagged in
FrameNet, and the treatment of prepositions in a
traditional English grammar (Quirk et al., 1985).
This paper will survey the project and describe
initial findings of prepositional behavior that have
come to light through this exercise.
2 The Preposition Project
Each of 847 preposition senses for 373
prepositions (including phrasal prepositions) will
be characterized with a semantic role name and
the syntactic and semantic properties of its
complement and attachment point. Each sense
will be further described by (1) a link to its
definition in the Oxford Dictionary of English,
(2) its basic syntactic function and meaning as
described in Quirk et al. (1985), (3) other
prepositions filling a similar semantic role, (4)
FrameNet frames and frame elements, (5) other
syntactic forms in which the semantic role may
be realized, and (6) its position in a network of
prepositions. This basic information is provided
in a spreadsheet for each preposition. All data
generated during this project is freely available
for use by other researchers and application
developers.1
The primary source of data is the set of
corpus instances from FrameNet: sentences
tagged with semantic roles (frame elements) for
each preposition. Since FrameNet was not
constructed with prepositions in mind, the
examination of frame elements using a
preposition provides a corpus that considerably
facilitates the construction of a high-quality
preposition database. The assumption is that
frames and frame elements will help elucidate the
meanings of prepositions.
2.1 The Preposition Sense Inventory
The Oxford Dictionary of English (ODE, 2003)
(and its predecessor, the New Oxford Dictionary
of English (NODE, 1997)) was chosen as the
source of the preposition sense inventory
because of the clarity and organization of its
senses and its reliance on corpus evidence.
Litkowski (2002) identifies prepositions in
NODE, including phrasal prepositions. As
indicated there, 373 prepositions (listed in the
appendix) and 847 preposition senses were
identified. This set, with modifications as noted
in Litkowski (2002), forms the basis for TPP's
sense inventory.2
TPP's database does not include the
definitions and examples from ODE, since these are
the intellectual property of Oxford University
Press. The database provides a key to these
definitions, so that a specific sense can be
identified and a user of this database can gain a
further understanding of the meaning conveyed
by each sense by referring to the dictionary itself.
2.2 Methodology for Sense Disambiguation of
Preposition Instances in FrameNet
The initial focus of TPP is on the most common
and most polysemous prepositions. These
include the following 20 prepositions that have
six or more senses: about (6), above (9), after
(11), against (10), around (6), at (12), by (22),
for (14), from (14), in (11), into (9), of (18), on
(23), over (16), through (13), to (17), towards
(6), under (16), with (16), and within (6).
After selecting a preposition for study,3 the
FrameNet corpus instances are obtained using
CL Research's publicly available FrameNet
Explorer (FNE).4 The FrameNet database
includes approximately 7,500 XML lexical unit
files, each containing tagged sentences for a
specific lexical item and frame (e.g., the item
move.v in the Motion frame). Tagged sentences
are grouped into subcorpora, each of which has
a name. The name encodes syntactic properties
of the subcorpus, e.g., V-730-s20-ppacross,
which includes sentences using the verb move
that include a prepositional phrase beginning
with across (tagged as a Path frame element
within the Motion frame).

1 http://www.clres.com/prepositions.html
2 As noted above, any sense inventory may be
fraught with problems and several have already come to
light. It is hoped that the suitability of this sense
inventory for NLP applications, particularly issues of
granularity, can be investigated more thoroughly with
the type of data being generated.
3 By, through, with, and for have been
completed as of March 19, 2005.
4 http://www.clres.com/FNExplorer.html

Frame | Frame Element | Lexical Unit | Subcorpus | Identifier-Position
Achieving_first | No_instances | originate.v | V-570-s20-np-ppby |
Arrest | Authorities | arrest.v | V-730-s20-ppby | 875350-43
Arrest | Authorities | arrest.v | V-730-s20-ppby | 875353-71
Arrest | Authorities | arrest.v | V-730-s20-ppby | 875362-160
Arrest | No_instances | apprehend.v | V-730-s20-ppby |
Table 1. Preposition Instance File Sample Lines

FNE generates a text file of tagged FrameNet
instances of a given preposition. FNE searches
each lexical unit file to find subcorpora having
ppprep in the name. For each subcorpus having
the target name (e.g., ppby), a line is written to
the text file, containing: the frame name, the
frame element, the lexical unit, the subcorpus
name, and the sentence ID and starting position
of the preposition in the sentence. Table 1 shows
sample lines from the instance file for by.
In constructing a line, the FrameNet data for
the sentence are examined to identify the frame
element introduced by the target preposition.
The example data indicate that no sentences
containing a prepositional phrase beginning with
by were tagged in the subcorpora for originate.v
and apprehend.v, but that 3 sentences were
tagged for arrest.v in the Arrest frame with the
Authorities frame element. The file is initially
sorted by frame name; imported into an Excel
spreadsheet, it can be sorted on any other
element. For by, 1314 lines were generated.5
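
The layout of an instance line can be illustrated with a short parsing sketch. It assumes a tab-separated file with the five columns of Table 1; the actual delimiter written by FNE is not stated here, and the names InstanceLine, parse_instance_line, and subcorpus_preposition are hypothetical, introduced only for illustration.

```python
from collections import Counter
from typing import NamedTuple, Optional


class InstanceLine(NamedTuple):
    frame: str
    frame_element: str
    lexical_unit: str
    subcorpus: str
    sentence_id: Optional[int]  # None when no sentence was tagged
    position: Optional[int]     # starting position of the preposition


def parse_instance_line(line: str) -> InstanceLine:
    """Split one instance line (assumed tab-separated, columns as in Table 1)."""
    fields = line.rstrip("\n").split("\t")
    frame, element, unit, subcorpus = fields[:4]
    sentence_id = position = None
    if len(fields) > 4 and fields[4]:
        sid, pos = fields[4].split("-")
        sentence_id, position = int(sid), int(pos)
    return InstanceLine(frame, element, unit, subcorpus, sentence_id, position)


def subcorpus_preposition(subcorpus: str) -> Optional[str]:
    """Return the preposition encoded as 'pp<prep>' at the end of a subcorpus name."""
    last = subcorpus.split("-")[-1]
    return last[2:] if last.startswith("pp") and len(last) > 2 else None


# The sample lines of Table 1, written with the assumed tab delimiter.
SAMPLE = [
    "Achieving_first\tNo_instances\toriginate.v\tV-570-s20-np-ppby\t",
    "Arrest\tAuthorities\tarrest.v\tV-730-s20-ppby\t875350-43",
    "Arrest\tAuthorities\tarrest.v\tV-730-s20-ppby\t875353-71",
    "Arrest\tAuthorities\tarrest.v\tV-730-s20-ppby\t875362-160",
    "Arrest\tNo_instances\tapprehend.v\tV-730-s20-ppby\t",
]

rows = [parse_instance_line(line) for line in SAMPLE]
# Count tagged sentences per lexical unit, skipping the No_instances lines.
tagged = Counter(r.lexical_unit for r in rows if r.sentence_id is not None)
```

On the sample lines, the count reproduces the reading given above: three tagged sentences for arrest.v and none for originate.v or apprehend.v.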
Using this instance file as a guide, the
lexicographer begins the process of analyzing the
preposition’s senses. A separate Excel
spreadsheet is devised for the preposition, with
one row for each sense (with the ODE sense
number in parentheses). The lexicographer
examines the definitions for the preposition,
available information about the preposition in
Quirk et al., and the FrameNet corpus instances.
On the basis of this information, the
lexicographer assigns an arbitrary and subjective
semantic role name, intended to be a
characterization of the sort of information that
the given preposition introduces. He then
identifies the usual syntactic function of a phrase
with the preposition in the specific sense (noun
postmodifier (1); adverbial adjunct (2a),
subjunct (2b), disjunct (2c), or conjunct (2d);
and/or verb (3a) or adjective (3b) complement,
as described in paragraph 9.1, p. 657 of Quirk et
al.). The lexicographer then ascertains the
paragraph, if any, in Quirk et al. that provides a
semantic description of the instant sense. This
paragraph may also identify other prepositions
that have a similar sense and use; these other
prepositions are also recorded in the spreadsheet,
along with any others that the lexicographer
intuits may have a similar meaning.
Based on the definition and the corpus
instances, the lexicographer then sets out to
characterize the syntactic and semantic
properties of the sense's complement and
attachment point, based on an interpretation of
the definition. These characterizations are
preliminary and not based on any systematic
criteria; however, this is not important at this
stage of development. As described below, it is
expected that these characterizations will be
refined when disambiguation routines are
developed. Table 2 shows this information for
five (of 13) senses of through.6

5 The instance file generated by this method
does not represent all instances of a preposition in the
FrameNet database.

Sense | Relation Name | Quirk Syntax | Quirk Paragraphs | Complement Properties | Attachment Properties
1 (1) | ThingTransited | 2a, 3a | 9.25, 9.28 | opening, channel, or location | verbs of motion
2 (1a) | ThingBored | 1, 2a, 3a | 9.25, 9.28 | permeable or breakable physical object | verbs denoting penetration
3 (1b) | ThingTransited | 1, 2a, 3a | 9.25, 9.26, 9.28 | sth regarded as homogenous | verbs of motion
4 (1c) | ThingPenetrated | 1, 2a, 3a | None | a permeable obstacle | a perceived object; sometimes complement of a verb of perception
5 (1d) | ChannelTransited | 1, 2a, 3a | 9.19, 9.22, 9.27 | an opening or obstacle | copula or verb of location
Table 2. Sample Senses for ‘through’

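
One row of the sense spreadsheet can be modelled as a small record, which makes the Quirk syntax codes explicit. This is only a sketch: the class name SenseRecord and its field names are hypothetical, while the code-to-function mapping follows the list of syntactic functions given above and the sample values are taken from Table 2.

```python
from dataclasses import dataclass
from typing import List

# Syntactic function codes as listed in the text (after Quirk et al., para. 9.1).
QUIRK_SYNTAX = {
    "1": "noun postmodifier",
    "2a": "adverbial adjunct",
    "2b": "adverbial subjunct",
    "2c": "adverbial disjunct",
    "2d": "adverbial conjunct",
    "3a": "verb complement",
    "3b": "adjective complement",
}


@dataclass
class SenseRecord:
    sense: str                   # row number with ODE sense number, e.g. "2 (1a)"
    relation_name: str           # semantic role name assigned by the lexicographer
    quirk_syntax: List[str]      # codes from QUIRK_SYNTAX
    quirk_paragraphs: List[str]  # paragraphs in Quirk et al., possibly empty
    complement: str              # complement properties
    attachment: str              # attachment point properties

    def syntax_labels(self) -> List[str]:
        """Spell out the syntactic function codes of this sense."""
        return [QUIRK_SYNTAX[code] for code in self.quirk_syntax]


# Second row of Table 2 for 'through'.
thing_bored = SenseRecord(
    sense="2 (1a)",
    relation_name="ThingBored",
    quirk_syntax=["1", "2a", "3a"],
    quirk_paragraphs=["9.25", "9.28"],
    complement="permeable or breakable physical object",
    attachment="verbs denoting penetration",
)
labels = thing_bored.syntax_labels()
```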
As indicated, the lexicographer assigns a
sense number to each sentence instance. FNE is
used for this purpose, by displaying all annotated
instances of a lexical unit (such as arrest.v)
entered on its search screen. In addition, all
subcorpus names are displayed in a drop-down
list; by selecting the relevant subcorpus (e.g.,
V-730-s20-ppby), the lexicographer can view
just those sentences and determine which ODE
sense of the preposition is applicable. Since
similar items may be grouped together (i.e.,
frame name, frame element name, and lexical
unit), several instances can be tagged at a time.
Tagging about 1500 instances for a preposition
takes about 10 hours.
The lexicographer may tag some instances
with multiple senses. The lexicographer may also
find, through the iterative exercise of examining
FrameNet instances, that the sense division found
in ODE does not quite match the reality of
preposition use. In this case, additional lines may
be created in the sense spreadsheet to
accommodate new subsenses, or less frequently,
entirely new senses. The lexicographer also
keeps notes and prepares a summary describing
the treatment of the preposition, noting any
special or idiomatic uses of the preposition that
may fall outside the defined sense inventory.
Finally, the lexicographic description is
compared with the Lexical Conceptual Structure
inventory available from Dorr (1996) (9 senses
for by, 2 senses for through, 15 for with, and 8
for for).
From a lexicographic perspective, it turns out
that each source of information about the
behavior of a preposition is incomplete in itself.
All sources used in the project are
complementary in providing an overall
assessment of the meaning and characterization
of the preposition. ODE may be found wanting
when placed next to the FrameNet instances; this
project may thus reveal further aspects of the
appropriate sense inventory. ODE does not
provide a summary picture of a preposition's
meanings; the characterization in Quirk et al.
provides such a perspective, but it too is
incomplete, both in coverage of a particular
meaning and in not identifying correspondences
with other prepositions. The FrameNet database
does not provide instances for all the senses. The
Dorr inventory confirms the characterization
here, but does not contain the same level of
detail. Despite the (minor) deficiencies of each
source, their combination appears to be quite
comprehensive.
6 The definitions in ODE for Table 2 are “(1)
moving in one side and out of the other side of (an
opening, channel, or location): (a) so as to make a hole
or opening in (a physical object); (b) moving around or
from one side to the other within (a crowd or group);
(c) so as to be perceived from the other side of (an
intervening obstacle); (d) expressing the position or
location of something beyond or at the far end of (an
opening or an obstacle).”
Sense | Relation Name | Frame:FrameElement Pairs
1 (1) | ThingTransited | Arriving:Path; Cause_motion:Path; Cotheme:Path; Departing:Path; Escaping:Location; Escaping:Path; Evading:Path; Fluidic_motion:Path; Mass_motion:Path; Motion:Path; Motion_directional:Path; Motion_noise:Path; Operate_vehicle:Path; Path_shape:Path; Placing:Goal; Placing:Path; Removing:Path; Roadways:Area; Self_motion:Area; Self_motion:Path; Breathing:Path
2 (1a) | ThingBored | Cause_harm:Body_part; Impact:Impactee; Natural_features:Relative_location; Use_firearm:Path
3 (1b) | ThingTransited | Emotion_heat:Location; Path_shape:Area; Ride_Vehicle:Path; Roadways:Path; Self_motion:Self_mover; Travel:Path
Table 3. Frame:FrameElement Pairs Identified for Senses of ‘through’
Frame:Frame_Element | Lexical Units
Emotion_heat:Location | boil.v seethe.v burn.v
Path_shape:Area | crisscross.v
Ride_Vehicle:Path | hitchhike.v
Roadways:Path | bypass.n highway.n line.n motorway.n path.n pathway.n road.n street.n track.n trail.n
Self_motion:Self_mover | sprint.v
Travel:Path | journey.n journey.v tour.n travel.v
Table 4. Analysis of Sense 3 (ThingTransited) for ‘through’
3 Analyzing the Semantic Role for a Sense
With the tagged instances, a simple sort by sense
number of the Excel spreadsheet identifies the
(Frame Frame_Element) pairs for each sense.
These pairs are aggregated into one list in the
Sense Analysis spreadsheet (as shown in Table
3). As indicated above, the lexicographer
identifies a semantic role label for each sense
based on intuition. These labels are developed
independently of (computational) linguistic
theories and are mainly based on a general
characterization of the sense information for the
preposition. These labels are intended to be used
in characterizing prepositional phrases, based on
the criteria in the complement and attachment
syntactic and semantic properties for
disambiguating the prepositions. Gildea &
Jurafsky (2002) developed a mapping of frame
elements into 18 higher level semantic roles. The
methodology followed here provides an
alternative mapping that is more data-driven and
less subjective.
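
The sort-and-aggregate step described above can be sketched in a few lines. The sketch assumes the sense-tagged instances have been exported from the spreadsheet as (sense, frame, frame element) triples; the function name pairs_by_sense is hypothetical, and the sample triples are a small illustrative selection of pairings that appear in Table 3.

```python
from collections import defaultdict

# (sense, frame, frame_element) for a few sense-tagged instances of 'through'.
# The pairings occur in Table 3; the selection and repetition are illustrative.
tagged_instances = [
    ("1 (1)", "Motion", "Path"),
    ("1 (1)", "Arriving", "Path"),
    ("1 (1)", "Motion", "Path"),       # a second sentence with the same pair
    ("2 (1a)", "Impact", "Impactee"),
    ("3 (1b)", "Travel", "Path"),
    ("3 (1b)", "Path_shape", "Area"),
]


def pairs_by_sense(instances):
    """Collect the distinct Frame:FrameElement pairs observed for each sense."""
    grouped = defaultdict(set)
    for sense, frame, element in instances:
        grouped[sense].add(f"{frame}:{element}")
    return {sense: sorted(pairs) for sense, pairs in sorted(grouped.items())}


sense_pairs = pairs_by_sense(tagged_instances)
```

Because the pairs are held in sets, repeated instances of the same frame and frame element collapse into a single entry per sense, which is the form shown in Table 3.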
In many senses for which FrameNet instances
were identified, there is a clear correspondence
between the frame element names and the
semantic relation assigned by the lexicographer.
But, they also show the range and variation of
frame elements that have been developed by the
FrameNet lexicographers. (Frame
FrameElement) pairs and lexical units are
shown in Table 4 for through (sense 3), given
the label ThingTransited. This table suggests that
this sense encapsulates a Path semantic role.
Since other senses of through also have a Path
role, the FrameNet lexicographer’s assignment
indicates a finer granularity on the type of path.
The assignment of an Area frame element for
crisscross, for example, suggests that the path
might be through a region. This type of analysis
demonstrates the richness of the data generated
by tagging instances.
4 Refining Characterizations Through Disambiguation
In addition to the instances file, FNE also
generates an XML file of the sentences
themselves. These sentences (for which the
preposition senses have been assigned) are
suitable for the development of disambiguation
routines for semantic role assignment. In this
respect, these sentences are essentially equivalent
to the data for the lexical sample task followed in
Senseval. In addition, since these instances are
FrameNet tagged sentences, they provide a suitable
dataset for the Senseval FrameNet semantic role task.
(The XML files are available as part of TPP.)

Frame | Frame Element | Lexical Unit | GF | PT | Preposition
Arriving | Mode_of_transportation | arrive.v | Comp | PP | by
Arriving | Mode_of_transportation | arrive.v | Comp | PP | in
Arriving | Mode_of_transportation | come.v | Comp | PP | by
Arriving | Mode_of_transportation | return.n | Comp | PP | by
Arriving | Path | approach.v | Comp | PP | on
Arriving | Path | approach.v | Comp | PP | through
Arriving | Path | approach.v | Comp | PP | via
Arriving | Path | arrive.v | Comp | PP | through
Arriving | Path | arrive.v | Comp | PP | via
Arriving | Path | come.v | Comp | PP | round
Arriving | Path | come.v | Comp | PP | through
Arriving | Path | come.v | Comp | PP | via
Arriving | Path | come.v | Obj | NP |
Arriving | Path | enter.v | Comp | PP | at
Arriving | Path | enter.v | Comp | PP | by
Arriving | Path | enter.v | Comp | PP | through
Arriving | Path | enter.v | Comp | PP | via
Arriving | Path | get.v | Comp | PP | past
Arriving | Path | reach.v | Comp | PP | by
Arriving | Path | reach.v | Comp | PP | through
Arriving | Path | reach.v | Comp | PPing |
Arriving | Path | return.n | Comp | PP | towards
Arriving | Path | return.v | Comp | PP | across
Table 5. Variations in Syntactic Realizations of a Frame Element for ‘by’

Litkowski (2002) described a set of
disambiguation tests for the preposition of, based
solely on introspection of its definitions. Those
tests are not sufficient. As implied in Table 2, the
complement and attachment properties require a
richer set of semantic tests for which suitable
lexical resources do not presently exist. Sense 1
of through requires that the prepositional phrase
be attached to a verb of motion; WordNet has a
general motion category for verbs, so in this
case, a suitable test can be made. However, for
sense 2, it is necessary to identify verbs of
penetration; no such category is available in
WordNet. A Roget-style thesaurus might provide
the necessary information (e.g., look up
penetration in the thesaurus and then examine
the verbs in the same thesaurus category).
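
An attachment test of the kind described for sense 1 might look like the following sketch. It is not the project's disambiguation routine: the lookup table is a toy stand-in with illustrative values rather than data copied from WordNet, and the function names are hypothetical. With NLTK's WordNet interface, a real lookup could instead return the lexname() of each verb synset, where the motion category is named verb.motion.

```python
from typing import Callable, Iterable

# Toy stand-in for a lexical resource: verb lemma -> category names.
# Illustrative values only (an assumption of this sketch, not WordNet data).
TOY_VERB_CATEGORIES = {
    "move": {"verb.motion"},
    "walk": {"verb.motion"},
    "arrest": {"verb.social"},
}


def toy_lookup(lemma: str) -> Iterable[str]:
    """Return the categories recorded for a verb lemma (empty if unknown)."""
    return TOY_VERB_CATEGORIES.get(lemma, set())


def attaches_to_motion_verb(
    verb_lemma: str, lookup: Callable[[str], Iterable[str]] = toy_lookup
) -> bool:
    """Attachment test for sense 1 of 'through': is the governing verb a motion verb?"""
    return "verb.motion" in set(lookup(verb_lemma))


motion_hits = [v for v in ("move", "arrest", "drill") if attaches_to_motion_verb(v)]
```

Passing the lookup as a parameter keeps the test independent of any one resource, so a thesaurus-based lookup for verbs of penetration could be swapped in for sense 2 without changing the test itself.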
The corpus instances developed in TPP will
be used to refine the characterizations developed
by the lexicographer. Disambiguation routines
will be developed, particularly investigating the
use of various lexical resources, such as
WordNet, machine-readable dictionaries, and
thesauruses. Many of the attachment
characterizations suggest close ties to sets of
verbs; development of appropriate
disambiguation routines may reveal close
associations with verb classes. This phase of TPP
is not yet well developed.
5 Identifying Other Prepositions and Other
Syntactic Realizations Filling the Same
Semantic Roles
A tagged sentence in the FrameNet database
identifies a specific frame element within a
specific frame for the prepositional phrase
introduced by the preposition. The frame element
and frame can be used as a seed to find other
ways recorded in FrameNet for realizing the
combination. For example, as shown in Table 5,
by introduces the frame element
Mode_of_transportation or Path in the
Arriving frame. FNE can be used to query the
FrameNet database to determine other
prepositions and other syntactic realizations in
which these frame elements occur. The distinct
patterns in which these occur are summarized by
identifying all unique occurrences of (Frame
Frame_Element Lexical_Unit
Grammatical_Function Phrase_Type
Preposition) within the database. (Preposition
is included only when the Phrase_Type is PP.)
There may be many sentences that have been
tagged similarly, but only unique occurrences
need to be identified to examine the distribution
of the same frame element.

Sense | Lexicographer Prepositions | Prepositions Identifiable from FrameNet
2 (1a) | into | into; on; over; about; at; across; in; under; against; between; through; around; with; behind; off; onto; towards; by; down; outside; along; near; below; beneath; above; of; within; underneath; beside; beyond; throughout; close; up; for; from
3 (1b) | among, within | inside; through; under; within; at; beneath; amongst; between; on; behind; among; above; around; over; all; close; across; along; down; towards; up; past; via; from; of; alongside; by; with; to
Table 6. Other Similar Prepositions for Senses of ‘through’

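
The reduction of tagged sentences to unique (Frame Frame_Element Lexical_Unit Grammatical_Function Phrase_Type Preposition) patterns can be sketched as follows. The input format, a list of six-field tuples, is an assumption of this sketch, as are the names Realization, unique_patterns, and prepositions_for; the sample values are drawn from the Arriving frame rows of Table 5.

```python
from typing import NamedTuple, Optional


class Realization(NamedTuple):
    frame: str
    frame_element: str
    lexical_unit: str
    grammatical_function: str
    phrase_type: str
    preposition: Optional[str]  # kept only when phrase_type is "PP"


def unique_patterns(annotations):
    """Reduce per-sentence annotations to the distinct realization patterns."""
    seen = set()
    for frame, element, unit, gf, pt, prep in annotations:
        seen.add(Realization(frame, element, unit, gf, pt, prep if pt == "PP" else None))
    # Sort with None mapped to "" so rows without a preposition compare cleanly.
    return sorted(seen, key=lambda r: tuple("" if v is None else v for v in r))


def prepositions_for(patterns, frame, element):
    """Prepositions observed as realizations of one frame element."""
    return {
        p.preposition
        for p in patterns
        if p.frame == frame and p.frame_element == element and p.preposition
    }


ANNOTATIONS = [
    ("Arriving", "Path", "enter.v", "Comp", "PP", "through"),
    ("Arriving", "Path", "enter.v", "Comp", "PP", "through"),  # same pattern, another sentence
    ("Arriving", "Path", "enter.v", "Comp", "PP", "by"),
    ("Arriving", "Path", "come.v", "Obj", "NP", None),
    ("Arriving", "Mode_of_transportation", "arrive.v", "Comp", "PP", "by"),
]

patterns = unique_patterns(ANNOTATIONS)
path_preps = prepositions_for(patterns, "Arriving", "Path")
```

The five sample annotations reduce to four patterns, since the two enter.v sentences with through share one pattern.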
In Table 5, several combinations are evoked
by the seed element. The
Mode_of_transportation frame element was
seeded by the instances for arrive.v and/or
come.v (sense 8 of by); the Path element was
evoked by the instances for enter.v (sense 5 of
by). It can be seen that in addition to by, in is
also used to indicate the
Mode_of_transportation frame element, also as
a Complement to the main verb. For the Path
frame element, in addition to by, the prepositions
on, through, via, round, past, towards, and
across are used. The Path frame element is also
expressed as the Direct Object for one verb,
come.
In a second example (not shown), 52 lines
were generated for the Cure:Treatment
combination from a single instance of through,
via the verb rehabilitate.v (sense 12, labeled
Intermediary by the lexicographer, but
essentially a means semantic role). The
Cure:Treatment pair occurs in a much greater
range of lexical items, including not only verbs
(alleviate, cure, ease, heal, rehabilitate,
resuscitate, and treat), but also nouns (cure,
healer, palliation, remedy, therapist, therapy,
and treatment) and adjectives (curative,
palliative, rehabilitative, and therapeutic).
Examining just those with a Phrase Type of PP,
we see that by, with, without, and for are other
prepositions in addition to through expressing
the Treatment frame element.
Using the frames and frame elements from all
sense-tagged instances as seeds, 9309 lines and
5440 lines similar to those in Table 5 are
generated for by and through, respectively.
(These files are also available in a tab-separated
text file.) These results can be examined by sense
number and can lead to an identification of all
other prepositions expressing the frame elements
as shown in Table 3. These prepositions are
shown in Table 6 alongside those the
lexicographer listed on the basis of intuition and
Quirk assessments of semantic similarity.
The number of other prepositions expressing
frame elements encompassed by a single sense
was quite surprising. The first explanation for
this large number was simply that the
lexicographer had overlooked some possibilities.
And indeed, upon reviewing the lists, the
lexicographer could imagine substituting some of
the suggestions in example sentences. However,
the large number requires a more systematic
explanation.
To assess the substitutability of other
prepositions for a given semantic role, the
lexicographer first examined their definitions in
ODE for similarity. Many had similar definitions,
but many did not. The lexicographer then
examined the definitions in the Oxford English
Dictionary (OED), which has a much larger
number of senses than ODE. Rather than finding
similar senses, the lexicographer concluded that,
in fact, ODE simply provided a better
organization of the many senses, ignoring
obsolete and dated senses.
An immediate explanation for the large
number of prepositions is simply to posit that
prepositions are inherently polysemous. But, this
seems to be too profligate a position. Instead, it
seems much more likely that some meaning
component of the attachment point (usually a
verb) combines with some meaning component
of the preposition to instantiate a frame element.
Instead of attempting to reach a final
conclusion on substitutability, this issue will
await further data when the other prepositions
undergo their sense tagging. The analysis at that
time will examine the semantic role assignments
for prepositions deemed substitutable and
determine their congruence. In particular, it will
be possible to examine the array of frame
elements of putative substitutable senses.
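
One possible way to examine the congruence of two frame element arrays is a simple set overlap. The Jaccard measure used below is an assumption of this sketch, not a measure proposed in the project, and the function names are hypothetical. Since sense-tagged data for other prepositions are not yet available, two senses of through from Table 3 stand in for the senses of two putatively substitutable prepositions.

```python
def frame_elements(pairs):
    """Collect the frame element names from 'Frame:FrameElement' strings."""
    return {pair.split(":", 1)[1] for pair in pairs}


def jaccard(a, b):
    """Share of items common to both sets, relative to all items in either set."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Stand-in data from Table 3: a selection of pairs for sense 1 that covers all
# of its frame element names, and all pairs listed for sense 3.
SENSE_1 = ["Motion:Path", "Escaping:Location", "Placing:Goal", "Roadways:Area"]
SENSE_3 = [
    "Emotion_heat:Location", "Path_shape:Area", "Ride_Vehicle:Path",
    "Roadways:Path", "Self_motion:Self_mover", "Travel:Path",
]

shared = frame_elements(SENSE_1) & frame_elements(SENSE_3)
overlap = jaccard(frame_elements(SENSE_1), frame_elements(SENSE_3))
```

Here the two senses share the Path, Location, and Area elements out of five distinct element names. A comparison at the level of full Frame:FrameElement pairs would be stricter and is an equally plausible choice.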
In addition to the other preposition analysis,
the FrameNet data support an in-depth
examination of other methods of realizing frame
elements. For example, the alternation patterns
for expressing the Treatment frame element
appear to vary by part of speech of the lexical
item. For verbs, we have "Comp PPing" (a
complement prepositional phrase containing a
gerund), "Ext NP" (an external argument, i.e.,
the subject of the verb), "DNI" (a definite null
instantiation, indicating that the element is an
anaphor), and a "Comp AVP" (a complement
adverbial phrase, e.g. treated
pharmacologically). Similar variations are
indicated for nouns and adjectives. These
semantic role alternations await further study.
6 Network of Preposition Senses
Litkowski (2002) claimed that prepositions in
NODE could be arranged in a hierarchy based on
a digraph analysis of the definitions. Prepositions
do not seem a likely candidate for inheritance as
in the case of nouns and verbs. The
lexicographer examined this possibility in other
preposition definitions ending in by (18) and
through (6). Most cases using by (many of
which included the phrase “supported by”) did
not seem to have a strong sense of inheritance,
judged by the lexicographer as having an Agent
sense based simply on the presence of the past
participle.
The lexicographer also examined the 2-level
hierarchy with ODE senses (core senses and their
subsenses). ODE states that subsenses are
usually generalizations or specializations of the
core sense. In this effort, the lexicographer found
that the relation of the subsenses to the core
senses was based on some small bit of expanded
or narrowed meaning. Whether these bits of
meaning are involved in any putative inheritance
will be studied further as TPP continues.
7 Conclusions and Further Work
The disambiguation of prepositions using a well-developed
sense inventory and FrameNet
instances has provided a wealth of data about the
behavior of prepositions and semantic roles.
Even though only two prepositions have been
analyzed (at the time of submission), the results
achieved extend across many semantic roles,
numerous other prepositions, and semantic role
alternation patterns. With only a modest amount
of effort for disambiguating hundreds of
instances, several large databases have been
generated for further characterization of
preposition and semantic role behavior. All data
generated in The Preposition Project will be
publicly available for researchers and application
developers.7
The focus of The Preposition Project so far
has been on establishing a framework for
generating data and making it available. An
important future part of the project will be in
attempting to link this work with other research
on prepositions (e.g., O’Hara & Wiebe, 2003
and Saint-Dizier, 2005).
The Preposition Project demonstrates
considerable benefit available from exploiting the
FrameNet databases. While the initial focus has
been on preposition behavior, the semantic role
alternations suggest the value of the FrameNet
data for paraphrase opportunities.
References
Dorr, B. (1996). Lexical Conceptual Structures for
Prepositions
(http://www.umiacs.umd.edu/~bonnie/AZ-preps-English.lcs)
Gildea, Daniel, and Daniel Jurafsky. (2002) Automatic
Labeling of Semantic Roles. Computational
Linguistics, 28 (3), 245-288.
Litkowski, K. C. (2002). Digraph Analysis of
Dictionary Preposition Definitions. Word Sense
Disambiguation: Recent Success and Future
Directions. Philadelphia, PA: Association for
Computational Linguistics.
The New Oxford Dictionary of English. (1998) (J.
Pearsall, Ed.). Oxford: Clarendon Press.
O’Hara, Thomas, and Jan Wiebe. (2003) Preposition
Semantic Classification via Treebank and
FrameNet, Proc. of the 7th Conference on Natural
Language Learning (CoNLL-2003), Edmonton,
Canada, pp. 79–86.
The Oxford Dictionary of English. (2003) (A.
Stevenson and C. Soanes, Eds.). Oxford:
Clarendon Press.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J.
(1985). A comprehensive grammar of the English
language. London: Longman.
Saint-Dizier, Patrick. (2005) PrepNet: a Framework for
Describing Prepositions: Preliminary Investigation
Results. International Workshop on Computational
Semantics, Tilburg, The Netherlands
7 TPP files include the text file used to create
the Excel instance spreadsheet, the Excel instance
spreadsheet with sense tags, the Excel sense
analysis spreadsheet, and the summary lexicographic
treatment of the preposition in a Word document.
... In the last decades, an impressive number of semantic classifications has been developed, both regarding manual lexicographic and/or cognitive classifications such as WordNet (Fellbaum, 1998), FrameNet (Fillmore et al., 2003), VerbNet (Kipper Schuler, 2006) and PrepNet/The Preposition Project (Litkowski and Hargraves, 2005; SaintDizier, 2005), as well as regarding computational classifications for nouns (Hindle, 1990; Pereira et al., 1993; Snow et al., 2006), verbs (Merlo and Stevenson, 2001; Korhonen et al., 2003; Schulte im Walde, 2006) and adjectives (Hatzivassiloglou and McKeown, 1993; Boleda et al., 2012). Semantic classifications are of great interest to computational linguistics, specifically regarding the pervasive problem of data sparseness in the processing of natural language. ...
... Such classifications have been used in applications such as word sense disambiguation (Dorr and Jones, 1996; Kohomban and Lee, 2005; McCarthy et al., 2007), parsing (Carroll et al., 1998; Carroll and Fang, 2004), machine translation (Prescher et al., 2000; Koehn and Hoang, 2007; Weller et al., 2014), and information extraction (Surdeanu et al., 2003; Venturi et al., 2009). Regarding prepositions, comparably little effort in computational semantics has gone beyond a specific choice of prepositions (such as spatial prepositions), towards a systematic classification of preposition senses, as in The Preposition Project (Litkowski and Hargraves, 2005 ). Distributional approaches towards preposition meaning and sense distinction have only recently started to explore salient preposition features, but with few exceptions (such as Baldwin (2006) ) these approaches focused on token-based classification of preposition senses (Ye and Baldwin, 2006; O'Hara and Wiebe, 2009; Tratz and Hovy, 2009; Hovy et al., 2010; Hovy et al., 2011). ...
... Notably, the polysemy of over and other prepositions has been explained in terms of sense networks encompassing core senses and motivated extensions (Brugman, 1981; Lakoff, 1987; Dewell, 1994; Evans, 2001, 2003). The Preposition Project (TPP; Litkowski and Hargraves, 2005) broke ground in stimulating computational work on fine-grained word sense disambiguation of English prepositions (Litkowski and Hargraves, 2005; Ye and Baldwin, 2007; Tratz and Hovy, 2009; Dahlmeier, Ng, and Schultz, 2009). Typologists, meanwhile, have developed semantic maps of functions, where the nearness of two functions reflects their tendency to fall under the same adposition or case marker in many languages (Haspelmath, 2003; Wälchli, 2010). ...
We consider the semantics of prepositions, revisiting a broad-coverage annotation scheme used for annotating all 4,250 preposition tokens in a 55,000 word corpus of English. Attempts to apply the scheme to adpositions and case markers in other languages, as well as some problematic cases in English, have led us to reconsider the assumption that a preposition's lexical contribution is equivalent to the role/relation that it mediates. Our proposal is to embrace the potential for construal in adposition use, expressing such phenomena directly at the token level to manage complexity and avoid sense proliferation. We suggest a framework to represent both the scene role and the adposition's lexical function so they can be annotated at scale---supporting automatic, statistical processing of domain-general language---and sketch how this representation would inform a constructional analysis.
... Actual comparison of event corpora is made complicated by the wide variance in how many events (or relation-bearing predicates) are annotated per sentence, and how many relations are explicitly annotated, how many implicit relations are inferable, and the distance that is allowed when one makes an annotation. Annotation schemes such as Propbank (Palmer et al., 2005), FrameNet (Fillmore and Baker, 2000), Preposition annotation (Litkowski and Hargraves, 2005; Srikumar and Roth, 2013; Schneider et al., 2015) or AMR (Banarescu et al., 2013) have captured large quantities of temporal and causal relationships, but largely do so within very limited distances from a predicate. Other annotations such as PDTB (Prasad et al., 2008) or RST (Carlson et al., 2003) may also capture relations, but are limited to adjacent sentences or adjacency pairs within rhetorical structure. ...
... Beginning with the seminal resources from The Preposition Project (TPP; Litkowski and Hargraves, 2005), the computational study of preposition semantics has been fundamentally grounded in corpus-based lexicography centered around individual preposition types. Most previous datasets of preposition semantics at the token level (Litkowski and Hargraves, 2005, 2007; Dahlmeier et al., 2009; Tratz and Hovy, 2009; Srikumar and Roth, 2013a) only cover high-frequency prepositions (the 34 represented in the SemEval-2007 shared task based on TPP, or a subset thereof). ...
We present the first corpus annotated with preposition supersenses, unlexicalized categories for semantic functions that can be marked by English prepositions (Schneider et al., 2015). That scheme improves upon its predecessors to better facilitate comprehensive manual annotation. Moreover, unlike the previous schemes, the preposition supersenses are organized hierarchically. Our data will be publicly released on the web upon publication.
... The Preposition Project (TPP). This is an English preposition lexicon and corpus project (Litkowski and Hargraves, 2005) that adapts sense definitions from the Oxford Dictionary of English and applies them to prepositions in sentences from corpora. A dataset for the SemEval-2007 shared task on preposition WSD (Litkowski and Hargraves, 2007) was created by collecting FrameNet-annotated sentences (originally from the BNC) and annotating 34 frequent preposition types (listed in (2) below) with a total of 332 attested senses. ...
... Prepositions, another type of closed-class word, have also been investigated for sense disambiguation. Litkowski and Hargraves [5] designed The Preposition Project (TPP) to characterize English preposition behavior, and a series of preposition WSD tasks has followed in the years since. Despite some fruitful achievements from TPP and other research, the semantic and syntactic complexity of prepositions still hinders a satisfactory working algorithm and representative findings for preposition WSD [6]. ...
English preposition “in” is targeted for word sense disambiguation (WSD) in this paper. In the way of Formal Concept Analysis, a model of formal context is formulated for the task of WSD. This model offers a higher accuracy increase proved in a structural partial-ordered attribute diagram (SPOAD) generated from the formal context. Thus, co-occurrence relation information between the proposed “in” and governors in the context provides a good perspective to preposition sense disambiguation. Moreover, the contributions of governors to the WSD of preposition are varied in terms of different attributes of Mutual Information (MI) and syntactic features.
This paper presents a study about ambiguous French prepositions, stressing out their roles as dependencies introducers, in order to derive some translation heuristics into English, based on a French-English set of parallel texts. These heuristics are formulated out of statistical observations and use some up-to-date results in Machine Translation (MT). Their originality mostly relies upon two items: (1) The importance given to syntax and dependency relations, along with lexicons, the latter being well browsed by the present literature in the domain (2) The existence of intrinsic semantics in prepositions, something rather discarded in NLP literature devoted to statistical MT, that tends to point at the most appropriate translation. An experiment has been run on corpora in both languages, using a dependency parser in the source language, and results looked to be encouraging for a “step by step approach” for MT improvement.
To improve the accuracy of word sense disambiguation (WSD) has been a significant issue, and to visualize the structure of a dataset to discover knowledge has been an urgent demand in natural language processing. In order to fulfill these two tasks simultaneously, a new approach of attribute partial order structure diagram is proposed. The principle of attribute partial order and the approach of attribute partial order structure diagram are described. The proposed approach is testified by the WSD of the English preposition over, using the dataset from SemEval corpus. Two well-accepted sense inventories for fine-grained WSD of the English prepositions are adopted. The formal contexts for the fine-grained WSD of the English preposition over are established and the corresponding attribute partial order structure diagrams are generated and used as the models of WSD. The tested results show that the accuracies of WSD of over by the proposed approach are significantly higher than the ones by the state of the art system. Moreover, the proposed approach can visualize the attribute partial order structure of the dataset, which can be used for knowledge discovery.
While quite a lot of work has been devoted to nouns and verbs, little has been done in Computational Linguistics circles about prepositions. The reasons are quite clear: prepositions are probably the most polysemic category, possibly more so than adjectives, and linguistic realizations are extremely difficult to predict, not to mention the difficulty of identifying cross-linguistic regularities. From a linguistic perspective, several investigations have been carried out on quite diverse languages, emphasizing e.g., monolingual and cross-linguistic contrasts or the role of prepositions in syntactic alternations. These observations cover in general a small group of closely related prepositions. The semantic characterization of prepositions has also motivated the emergence of a few dedicated logical frameworks and reasoning procedures. Let us mention projects devoted to prepositions expressing space, time and movement in AI and in NLP, and also the development of formalisms and heuristics to handle PP attachment ambiguities. Let us also mention the large number of studies in psycholinguistics and in ethnolinguistics around specific preposition senses. These results remain quite often relatively theoretical, focussing on cognitive representations. Most of them need a lot of elaborations to be used in NLP systems. From a representation point of view, prepositions seem to reach a very deep level in the cognitive-semantic
We present a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame. Given an input sentence and a target word and frame, the system labels constituents with either abstract semantic roles, such as Agent or Patient, or more domain-specific semantic roles, such as Speaker, Message, and Topic. The system is based on statistical classifiers trained on roughly 50,000 sentences that were hand-annotated with semantic roles by the FrameNet semantic labeling project. We then parsed each training sentence into a syntactic tree and extracted various lexical and syntactic features, including the phrase type of each constituent, its grammatical function, and its position in the sentence. These features were combined with knowledge of the predicate verb, noun, or adjective, as well as information such as the prior probabilities of various combinations of semantic roles. We used various lexical clustering algorithms to generalize across possible fillers of roles. Test sentences were parsed, were annotated with these features, and were then passed through the classifiers. Our system achieves 82% accuracy in identifying the semantic role of presegmented constituents. At the more difficult task of simultaneously segmenting constituents and identifying their semantic role, the system achieved 65% precision and 61% recall. Our study also allowed us to compare the usefulness of different features and feature combination methods in the semantic role labeling task. We also explore the integration of role labeling with statistical syntactic parsing and attempt to generalize to predicates unseen in the training data.
Litkowski, K. C. (2002). Digraph Analysis of Dictionary Preposition Definitions. Word Sense Disambiguation: Recent Success and Future Directions. Philadelphia, PA: Association for Computational Linguistics.
O'Hara, T., & Wiebe, J. (2003). Preposition Semantic Classification via Treebank and FrameNet. Proc. of the 7th Conference on Natural Language Learning (CoNLL-2003), Edmonton, Canada, pp. 79-86.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A comprehensive grammar of the English language. London: Longman.