Collin F. Baker

International Computer Science Institute

Ph.D. (Linguistics) UC Berkeley

48 Publications
11,617 Reads
3,808 Citations

Publications (48)
Article
The launch of the FrameNet project in 1997 was both a crystallisation point of decades' worth of theoretical investigations into lexical meaning by Charles J. Fillmore and colleagues and the seed of an ongoing line of corpus-based and computational research that seeks to implement Fillmore’s theory of Frame Semantics in a way that both provi...
Chapter
Word Sense Disambiguation (WSD) continues to present a formidable challenge for Natural Language Processing. To better perform automatic WSD, manually annotated corpora are created that serve as training and testing data. When the annotation labels are drawn from an independently created lexical resource, there is an added benefit of checking the r...
Chapter
Beginning with an overview of the theory of Frame Semantics as developed by Charles Fillmore and colleagues, this article details the annotation of English sentences by the FrameNet Project based on this theory. Fillmore’s lexical semantics theory asserts that the meanings of most words are understood via the semantic frames they evoke; e.g. arrest...
Conference Paper
Prof. Charles J. Fillmore had a lifelong interest in lexical semantics, and this culminated in the latter part of his life in a major research project, the FrameNet Project at the International Computer Science Institute in Berkeley, California (http://framenet.icsi.berkeley.edu). This paper reports on the background of this ongoing project, its c...
Article
The verb lexicon can be classified in different and complementary ways; each approach faces challenges and limits. We focus on two large-scale lexical resources, WordNet (Miller et al. 1990; Fellbaum 1998) and FrameNet (Baker et al. 2003). WordNet is a semantic network where lexical meaning is represented in terms of relations among word forms. In...
Chapter
This chapter contrasts a broad use of the term frame in cognitive science with its related use in a type of linguistic analysis, describing the principles and data structure of a particular research project (FrameNet) as a model for representing frame-based analyses of lexical meanings. It introduces an extension of the project to include the seman...
Article
Proceedings of the 25th Annual Meeting of the Berkeley Linguistics Society (2000)
Article
This paper will focus on recent and near-term future developments at FrameNet (FN) and the interoperability issues they raise. We begin by discussing the current state of the Berkeley FN database including major changes in the data format for the latest data release. We then briefly review two recent local projects, "Rapid Vanguarding", which has c...
Conference Paper
There has been a great deal of excitement recently about using the "wisdom of the crowd" to collect data of all kinds, quickly and cheaply (Howe, 2008; von Ahn and Dabbish, 2008). Snow et al. (2008) were the first to give a convincing demonstration that at least some kinds of linguistic data can be gathered from workers on the web more...
Conference Paper
Full-text available
The Manually Annotated Sub-Corpus (MASC) project provides data and annotations to serve as the base for a community-wide annotation effort of a subset of the American National Corpus. The MASC infrastructure enables the incorporation of contributed annotations into a single, usable format that can then be analyzed as it is or ported to any of a vari...
Article
The FrameNet database comprises an English lexicon, organized in terms of semantic frames. Frames describe situations or entities, along with their participants and props, termed frame elements. The frames are organized in an ontology-like network. For the lexical units, corpus annotations illustrate which frame elements are typically realized, and...
Conference Paper
Full-text available
WordNet and FrameNet are widely used lexical resources, but they are very different from each other and are often used in completely different ways in NLP. In a case study in which a short passage is annotated in both frameworks, we show how the synsets and definitions of WordNet and the syntagmatic information from FrameNet can complement...
Conference Paper
Full-text available
In this paper, we describe the SemEval-2010 shared task on "Linking Events and Their Participants in Discourse". This task is a variant of the classical semantic role labelling task. The novel aspect is that we focus on linking local semantic argument structures across sentence boundaries. Specifically, the task aims at linking locally uninstantiat...
Conference Paper
Full-text available
To answer the critical need for sharable, reusable annotated resources with rich linguistic annotations, we are developing a Manually Annotated Sub-Corpus (MASC) including texts from diverse genres and manual annotations or manually-validated annotations for multiple levels, including WordNet senses and FrameNet frames and frame elements, both of w...
Article
The FrameNet Desktop application supports frame-semantic lexicon creation and corpus annotation. It is primarily intended for use with the English language at the FrameNet project, but a recent revised version facilitates easier distribution, installation and adaptation to new projects. This paper reports on a case study where the new Frame-...
Article
Full-text available
This task consists of recognizing words and phrases that evoke semantic frames as defined in the FrameNet project (http://framenet.icsi.berkeley.edu), and their semantic dependents, which are usually, but not always, their syntactic dependents (including subjects). The training data was FN annotated sentences. In testing, participants automatical...
Conference Paper
Full-text available
This task consists of recognizing words and phrases that evoke semantic frames as defined in the FrameNet project (http://framenet.icsi.berkeley.edu), and their semantic dependents, which are usually, but not always, their syntactic dependents (including subjects). The training data was FN annotated sentences. In testing, participants automatically...
Article
Reasoning about natural language most prominently requires combining semantically rich lexical resources with world knowledge, provided by ontologies. Therefore, we are building bindings from FrameNet – a lexical resource for English – to various ontologies depending on the application at hand. In this paper we show the first step toward such bindi...
Article
Since 1997, the FrameNet project at the International Computer Science Institute has been developing a uniquely detailed lexicon of English based on Frame Semantics and manually annotated examples from a balanced corpus, and has distributed copies of the lexicon and the annotations to a wide variety of researchers in natural language processing. Th...
Article
This paper examines the question of how a linguistic analysis of a written document can contribute to identifying, tracking and populating the "eventualities" that are presented in the document, either directly or indirectly, and representing degrees of belief concerning them. It is our view that the role of lexical analysis (as exemplified in th...
Article
This report presents the early results of the Frame+Schema project, a project to represent image and force-dynamic schemas in FrameNet. These structures are eventually intended to be used in leveraging the wealth of frame semantic knowledge available in the FrameNet database for situation-specific reasoning. We detail our image and force-dynamic s...
Conference Paper
This paper describes FrameNet [9,1,3], an online lexical resource for English based on the principles of frame semantics [5,7,2]. We provide a data category specification for frame semantics and FrameNet annotations in an RDF-based language. More specifically, we provide an RDF markup for lexical units, defined as a relation between a lemma and a s...
Article
The FrameNet database contains descriptions of more than 7,000 lexical units based on more than 130,000 annotated sentences. The database and its related software are central to the process of entering lexical information, annotating sentences, displaying the results, and distributing the FrameNet data. This article discusses both how the design of...
Article
The FrameNet project has developed a lexical knowledge base providing a unique level of detail as to the possible syntactic realizations of the specific semantic roles evoked by each predicator, for roughly 7,000 lexical units, on the basis of annotating more than 100,000 example sentences extracted from corpora. An interim version of the Frame...
Article
This paper describes FrameNet (Lowe et al., 1997; Baker et al., 1998; Fillmore et al., 2002), an online lexical resource for English based on the principles of frame semantics (Fillmore, 1977a; Fillmore, 1982; Fillmore and Atkins, 1992), and considers the FrameNet database in reference to the proposed ISO model for linguistic annotation of language...
Article
The Berkeley FrameNet Project (Baker, Fillmore, & Lowe 1998; Fillmore & Baker 2001) (URL: http://framenet.icsi.berkeley.edu/framenet) is creating an online lexical resource for English, based on the principles of Frame Semantics and supported by corpus evidence. A semantic frame is a script-like structure of inferences, which are linked to the mea...
Article
The classification of verbs in Levin's (1993) English Verb Classes and Alternations: A Preliminary Investigation, on the basis of both intuitive semantic grouping and their participation in valence alternations, is often used by the NLP community as evidence of the semantic similarity of verbs (Jing & McKeown 1998; Lapata & Brew 1999; Kohl et al. 1...
Article
FrameNet is a three-year NSF-supported project in corpus-based computational lexicography, now in its second year (NSF IRI-9618838, "Tools for Lexicon Building"). The project's key features are (a) a commitment to corpus evidence for semantic and syntactic generalizations, and (b) the representation of the valences of its target words (mostly nouns...
Article
This paper describes a research effort that exploits information available in the FrameNet database and seeks to find, for argument-structure-bearing verbs, nouns, and adjectives, the lexical heads of the phrases that satisfy the core semantic roles of those predicates, and to create from the database of annotated sentences collections of structure...
Conference Paper
Full-text available
The classification of verbs in Levin's (1993) English Verb Classes and Alternations: A Preliminary Investigation, on the basis of both intuitive semantic grouping and their participation in valence alternations, is often used by the NLP community as evidence of the semantic similarity of verbs (Jing & McKeown 1998; Lapata & Brew 1999; K...
Article
An introduction to knowledge representation using Frame Semantics, as is being carried out in the FrameNet Project. A short news article is analyzed, providing examples of many of the questions being dealt with and the proposed solutions, including semantic composition, text coherence, polysemy and WSD, and evidentiality.
Article
This paper reports on the design of a lexical database for English which is currently under construction ("FrameNet-2"), and describes the kinds of linguistic facts that the database is intended to make available, for both human and computer consumers. Building on a recently completed pilot study ("FrameNet-1"), it is centered on the nature of th...
Conference Paper
The number and arrangement of semantic tags must be constrained, lest the size and complexity of the tagging sets (tagsets) used for semantic annotation become unwieldy both for humans and computers. The description of lexical predicates within the framework of frame semantics provides a natural method for selecting and structuring appropr...
Article
Full-text available
The Berkeley FrameNet Project (http://www.icsi.berkeley.edu/~framenet) is building an on-line lexical resource for contemporary English. The database provides information about the semantic and syntactic combinatorial possibilities (valences) of each item analyzed. This paper describes the conceptual basis for what has been called reframing of data...
Article
While FrameNet does not record the range of semantic relations found in thesaurus-style lexical resources like WordNet, it does provide a number of ways in which lexical units (LUs) can be seen as related to each other. This paper characterizes and motivates the networks of frame-to-frame relations that are being built on FrameNet's frames, and int...
Article
While FrameNet does not record the range of semantic relations found in thesaurus-style lexical resources like WordNet, it does provide a number of ways in which lexical units (LUs) can be seen as related to each other. This paper characterizes and motivates the networks of frame-to-frame relations that are being built on FrameNet's frames, and int...
Article
Full-text available
In this paper, we describe the SemEval-2010 shared task on "Linking Events and Their Participants in Discourse". This task is a variant of the classical semantic role labelling task. The novel aspect is that we focus on linking local semantic argument structures across sentence boundaries. Specifically, the task aims at linking locally uninstan...
Article
Full-text available
We analyze how different conceptions of lexical semantics affect sense annotations and how multiple sense inventories can be compared empirically, based on annotated text. Our study focuses on the MASC project, where data has been annotated using WordNet sense identifiers on the one hand, and FrameNet lexical units on the other. This allows us to c...
Article
Full-text available
The MASC project has produced a multi-genre corpus with multiple layers of linguistic annotation, together with a sentence corpus containing WordNet 3.1 sense tags for 1000 occurrences of each of 100 words produced by multiple annotators, accompanied by in-depth inter-annotator agreement data. Here we give an overview of the contents of MASC and th...
