Alexander Yates

  • Temple University

About

28
Publications
7,550
Reads
3,434
Citations
Current institution
Temple University

Publications (28)
Article
Full-text available
Finding the right representations for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This article investigates novel techniques for extracting features from n-gram models, Hidden Markov Models, and other statistical language models, including a novel Partial Lattice Markov Random Field...
Article
Full-text available
The SAVE Science project is an attempt to address the shortcomings of current assessments of science. The project has developed two virtual worlds that each have a mystery or natural phenomenon requiring scientific explanation; by recording students' behavior as they investigate the mystery, these worlds can be used to assess their understandin...
Article
Finding the right representation for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This paper investigates language model representations, in which language models trained on unlabeled corpora are used to generate real-valued feature vectors for words. We investigate n-gram models a...
Conference Paper
Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than th...
Conference Paper
Like most natural language disambiguation tasks, word sense disambiguation (WSD) requires world knowledge for accurate predictions. Several proxies for this knowledge have been investigated, including labeled corpora, user-contributed knowledge, and machine-readable dictionaries, but each of these proxies requires significant manual effort...
Article
Full-text available
Most information extraction research identifies the state of the world in text, including the entities and the relationships that exist between them. Much less attention has been paid to the understanding of dynamics, or how the state of the world changes over time. Because intelligent behavior seeks to change the state of the world in rational an...
Article
Full-text available
We aim to solve the problem of determining word senses for ambiguous biomedical terms with minimal human effort. We build a fully automated system for Word Sense Disambiguation by designing a system that does not require manually-constructed external resources or manually-labeled training examples except for a single ambiguous word. The system uses...
Article
The task of identifying synonymous relations and objects, or synonym resolution, is critical for high-quality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither hand-tagged training examples nor domain knowledge is available. The paper presents a scalable, fully-...
Conference Paper
Supervised sequence-labeling systems in natural language processing often suffer from data sparsity because they use word types as features in their prediction tasks. Consequently, they have difficulty estimating parameters for types which appear in the test set, but seldom (or never) appear in the training set. We demonstrate that distributional r...
Article
In previous work we introduced KNOWITALL, a data-driven, Web-based information extraction system. This paper focuses on the task of automatically extending the system's initial ontology by extracting subclasses of given general classes and by discovering other closely related classes. We first show that the basic KNOWITALL model can be easily ex-t...
Conference Paper
This paper describes a new method for providing recommendations tailored to a user's preferences using text mining techniques and online technical specifications of products. We first learn a model that can predict the price of a product given automatically-determined features describing technical specifications and users' opinions. We then use thi...
Conference Paper
Traditional information extraction systems have focused on satisfying precise, narrow, pre-specified requests from small, homogeneous corpora. In contrast, the TextRunner system demonstrates a new kind of information extraction, called Open Information Extraction (OIE), in which the system makes a single, data-driven pass over the entire corpus and...
Conference Paper
The task of identifying synonymous relations and objects, or Synonym Resolution (SR), is critical for high-quality information extraction. The bulk of previous SR work assumed strong domain knowledge or hand-tagged training examples. This paper investigates SR in the context of unsupervised information extraction, where neither is avail...
Conference Paper
Traditional information extraction systems have focused on satisfying precise, narrow, pre-specified requests from small, homogeneous corpora. In contrast, the TextRunner system demonstrates a new kind of information extraction, called Open Information Extraction (OIE), in which the system makes a single, data-driven pass over the entire corpus and...
Conference Paper
NLP systems for tasks such as question answering and information extraction typically rely on statistical parsers. But the efficacy of such parsers can be surprisingly low, particularly for sentences drawn from heterogeneous corpora such as the Web. We have observed that incorrect parses often result in wildly implausible semantic interpretat...
Article
The KnowItAll system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KnowItAll's novel architecture and design principles, emphasizing its distinctive ability to extract...
Article
Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KNOWITAL...
Conference Paper
Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KnowItAll, a...
Conference Paper
Our KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an autonomous, domain-independent, and scalable manner. In its first major run, KNOWITALL extracted over 50,000 facts with high precision, but suggested a challenge: How can we improve K...
Conference Paper
The need for Natural Language Interfaces to databases (NLIs) has become increasingly acute as more and more people access information through their web browsers, PDAs, and cell phones. Yet NLIs are only usable if they map natural language questions to SQL queries correctly; people are unwilling to trade reliable and predictable user interfac...
Article
Natural Language Interfaces to Databases (NLIs) can benefit from the advances in statistical parsing over the last fifteen years or so. However, statistical parsers require training on a massive, labeled corpus, and manually creating such a corpus for each database is prohibitively expensive. To address this quandary, this paper reports on the P...
Conference Paper
Natural Language Interfaces to Databases (NLIs) can benefit from the advances in statistical parsing over the last fifteen years or so. However, statistical parsers require training on a massive, labeled corpus, and manually creating such a corpus for each database is prohibitively expensive. To address this quandary, this paper reports on the PREC...
Article
Full-text available
As product prices become increasingly available on the World Wide Web, consumers attempt to understand how corporations vary these prices over time. However, corporations change prices based on proprietary algorithms and hidden variables (e.g., the number of unsold seats on a flight). Is it possible to develop data mining techniques that will enable consumers to predict price changes under these condit...
Article
Full-text available
Article
As household appliances grow in complexity and sophistication, they become harder and harder to use, particularly because of their tiny display screens and limited keyboards. This paper describes a strategy for building natural language interfaces to appliances that circumvents these problems. Our approach leverages decades of research on planning...
Conference Paper
Full-text available
As product prices become increasingly available on the World Wide Web, consumers attempt to understand how corporations vary these prices over time. However, corporations change prices based on proprietary algorithms and hidden variables (e.g., the number of unsold seats on a flight). Is it possible to develop data mining techniques that will enabl...
Conference Paper
As product prices become increasingly available on the World Wide Web, consumers attempt to understand how corporations vary these prices over time. However, corporations change prices based on proprietary algorithms and hidden variables (e.g., the number of unsold seats on a flight). Is it possible to develop data mining techniques that will enabl...
