
Alexander Yates - Temple University
About
28 Publications
7,550 Reads
3,434 Citations
Publications (28)
Finding the right representations for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This article investigates novel techniques for extracting features from n-gram models, Hidden Markov Models, and other statistical language models, including a novel Partial Lattice Markov Random Field...
The SAVE Science project is an attempt to address the shortcomings of current assessments of science. The project has developed two virtual worlds that each have a mystery or natural phenomenon requiring scientific explanation; by recording students' behavior as they investigate the mystery, these worlds can be used to assess their understandin...
Finding the right representation for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This paper investigates language model representations, in which language models trained on unlabeled corpora are used to generate real-valued feature vectors for words. We investigate n-gram models a...
Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than th...
Like most natural language disambiguation tasks, word sense disambiguation (WSD) requires world knowledge for accurate predictions. Several proxies for this knowledge have been investigated, including labeled corpora, user-contributed knowledge, and machine readable dictionaries, but each of these proxies requires significant manual effort...
Most information extraction research identifies the state of the world in text, including the entities and the relationships that exist between them. Much less attention has been paid to the understanding of dynamics, or how the state of the world changes over time. Because intelligent behavior seeks to change the state of the world in rational an...
We aim to solve the problem of determining word senses for ambiguous biomedical terms with minimal human effort.
We build a fully automated Word Sense Disambiguation system that requires no manually constructed external resources or manually labeled training examples, except for a single ambiguous word. The system uses...
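To make the WSD task above concrete, here is a minimal sketch in the spirit of the classic simplified Lesk algorithm, which picks the sense whose dictionary gloss overlaps most with the context. The senses, glosses, and example sentence are hypothetical illustrations, not the knowledge sources used in the paper.

```python
# Simplified Lesk: choose the sense whose gloss shares the most words
# with the ambiguous word's context. All data below is hypothetical.

def simplified_lesk(word, context_words, sense_glosses):
    """Return the sense whose gloss has the largest word overlap with the context."""
    context = set(w.lower() for w in context_words)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

senses = {
    "cold_illness": "a mild viral infection of the nose and throat",
    "cold_temperature": "a low temperature or lack of heat",
}
sentence = "the patient caught a nasty infection of the throat".split()
print(simplified_lesk("cold", sentence, senses))  # cold_illness
```

Real systems replace the gloss overlap with richer knowledge sources; the point here is only the shape of the disambiguation decision.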
The task of identifying synonymous relations and objects, or synonym resolution, is critical for high-quality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither hand-tagged training examples nor domain knowledge is available. The paper presents a scalable, fully-...
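A toy sketch of the unsupervised synonym resolution idea: greedily merge extracted strings whose surface similarity exceeds a threshold, with no labeled examples. The threshold and example strings are hypothetical; the actual system uses a probabilistic model over string and distributional evidence.

```python
# Greedy single-pass clustering of extracted strings by edit-style
# similarity. Threshold and data are illustrative assumptions.
from difflib import SequenceMatcher

def resolve_synonyms(strings, threshold=0.8):
    """Group strings into clusters of likely synonyms."""
    clusters = []
    for s in strings:
        for cluster in clusters:
            if any(SequenceMatcher(None, s.lower(), t.lower()).ratio() >= threshold
                   for t in cluster):
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters

names = ["United States", "united states", "U.S.", "Canada", "canada"]
print(resolve_synonyms(names))
```

Note the deliberate limitation: pure string similarity cannot merge "U.S." with "United States"; that is exactly the kind of case where distributional evidence is needed.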
Supervised sequence-labeling systems in natural language processing often suffer from data sparsity because they use word types as features in their prediction tasks. Consequently, they have difficulty estimating parameters for types which appear in the test set, but seldom (or never) appear in the training set. We demonstrate that distributional r...
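The core idea above can be sketched as follows: represent each word type by counts of its neighbors in unlabeled text, so that words rare in the labeled training set still receive informative features. The tiny corpus is hypothetical; real systems compress such counts (e.g., with an HMM or clustering) before feeding them to a sequence labeler.

```python
# Distributional word representations from an unlabeled corpus: each
# word type gets a sparse vector of left/right neighbor counts.
from collections import defaultdict

def context_vectors(sentences):
    """Map each word type to counts of its immediate left/right neighbors."""
    vecs = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        for i, w in enumerate(sent):
            if i > 0:
                vecs[w]["L=" + sent[i - 1]] += 1
            if i + 1 < len(sent):
                vecs[w]["R=" + sent[i + 1]] += 1
    return {w: dict(v) for w, v in vecs.items()}

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vecs = context_vectors(corpus)
print(vecs["cat"])  # {'L=the': 1, 'R=sat': 1}
```

Because the vectors come from unlabeled text, a test-set word unseen in the labeled data still shares feature dimensions with words the labeler was trained on.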
In previous work we introduced KNOWITALL, a data-driven, Web-based information extraction system. This paper focuses on the task of automatically extending the system's initial ontology by extracting subclasses of given general classes and by discovering other closely related classes. We first show that the basic KNOWITALL model can be easily ext...
This paper describes a new method for providing recommendations tailored to a user's preferences using text mining techniques and online technical specifications of products. We first learn a model that can predict the price of a product given automatically-determined features describing technical specifications and users' opinions. We then use thi...
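A small sketch of the recommendation idea: fit a price model from product features, then flag products priced well below their predicted value. A single-feature least-squares fit stands in for the paper's richer learned model; the products, the `ghz` feature, and the margin are all hypothetical.

```python
# Fit price = a*feature + b by ordinary least squares, then recommend
# products priced below a fraction of their predicted price.
# All products, features, and the margin are illustrative assumptions.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def bargains(products, margin=0.9):
    """Names of products priced below `margin` times their predicted price."""
    a, b = fit_line([p["ghz"] for p in products],
                    [p["price"] for p in products])
    return [p["name"] for p in products
            if p["price"] < margin * (a * p["ghz"] + b)]

laptops = [
    {"name": "A", "ghz": 2.0, "price": 800},
    {"name": "B", "ghz": 3.0, "price": 1200},
    {"name": "C", "ghz": 2.5, "price": 700},   # cheap for its specs
    {"name": "D", "ghz": 3.5, "price": 1500},
]
print(bargains(laptops))  # ['C']
```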
Traditional information extraction systems have focused on satisfying precise, narrow, pre-specified requests from small, homogeneous corpora. In contrast, the TextRunner system demonstrates a new kind of information extraction, called Open Information Extraction (OIE), in which the system makes a single, data-driven pass over the entire corpus and...
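The Open IE idea can be sketched as a single pass over the corpus that extracts (arg1, relation, arg2) tuples without any relation specified in advance. The shallow regex pattern and sentences below are illustrative stand-ins for TextRunner's learned extractor.

```python
# Toy Open IE: one pass, extracting relational tuples with a shallow
# pattern. The pattern and corpus are hypothetical simplifications.
import re

PATTERN = re.compile(r"([A-Z][a-z]+) (invented|founded|acquired) ([A-Z][a-z]+)")

def extract_tuples(corpus):
    """Return (arg1, relation, arg2) tuples found in one pass over the corpus."""
    tuples = []
    for sentence in corpus:
        for m in PATTERN.finditer(sentence):
            tuples.append((m.group(1), m.group(2), m.group(3)))
    return tuples

corpus = ["Edison invented Phonograph in 1877.", "Google acquired Youtube."]
print(extract_tuples(corpus))
# [('Edison', 'invented', 'Phonograph'), ('Google', 'acquired', 'Youtube')]
```

The contrast with traditional extraction is that no target relation is named up front; whatever relation phrases the pattern admits are harvested from the whole corpus in one pass.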
The task of identifying synonymous relations and objects, or Synonym Resolution (SR), is critical for high-quality information extraction. The bulk of previous SR work assumed strong domain knowledge or hand-tagged training examples. This paper investigates SR in the context of unsupervised information extraction, where neither is avail...
NLP systems for tasks such as question answering and information extraction typically rely on statistical parsers. But the efficacy of such parsers can be surprisingly low, particularly for sentences drawn from heterogeneous corpora such as the Web. We have observed that incorrect parses often result in wildly implausible semantic interpretat...
The KnowItAll system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KnowItAll's novel architecture and design principles, emphasizing its distinctive ability to extract...
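The pattern-driven extraction that KnowItAll automates can be sketched with a generic Hearst-style template ("C such as X, Y") instantiated for a class name. The web-page snippets below are hypothetical, and the single pattern stands in for KnowItAll's full repertoire of patterns and assessment machinery.

```python
# Instantiate a Hearst-style pattern for a class name and harvest
# candidate instances. Pattern and pages are illustrative assumptions.
import re

def instances_of(class_name, texts):
    """Extract capitalized candidate instances of class_name via 'such as' lists."""
    pattern = re.compile(class_name + r" such as ((?:[A-Z][\w.]*(?:, )?)+)")
    found = set()
    for text in texts:
        for m in pattern.finditer(text):
            found.update(name.strip() for name in m.group(1).split(","))
    return found

pages = ["We studied scientists such as Curie, Einstein in depth.",
         "Many cities such as Paris are crowded."]
print(instances_of("scientists", pages))
```

A real system must also assess each candidate's correctness (e.g., by hit-count statistics), since patterns alone over-generate; this sketch only shows the harvesting step.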
Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KNOWITAL...
Our KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an autonomous, domain-independent, and scalable manner. In its first major run, KNOWITALL extracted over 50,000 facts with high precision, but suggested a challenge: How can we improve K...
The need for Natural Language Interfaces to databases (NLIs) has become increasingly acute as more and more people access information through their web browsers, PDAs, and cell phones. Yet NLIs are only usable if they map natural language questions to SQL queries correctly; people are unwilling to trade reliable and predictable user interfac...
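The reliability concern can be illustrated with a toy interface that answers only the questions it can confidently map to SQL and abstains otherwise. The schema, the single template, and the questions are hypothetical; real NLIs learn far richer mappings.

```python
# Toy NLI: map a question to SQL via a template, or abstain (return None)
# rather than guess. Schema, template, and questions are assumptions.
import re
import sqlite3

TEMPLATES = [
    (re.compile(r"how many (\w+) are there", re.I),
     "SELECT COUNT(*) FROM {0}"),
]

def answer(question, conn):
    """Answer a question if a template matches; otherwise abstain with None."""
    for pattern, sql in TEMPLATES:
        m = pattern.search(question)
        if m:
            return conn.execute(sql.format(m.group(1))).fetchone()[0]
    return None  # abstain rather than return an unreliable guess

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT)")
conn.executemany("INSERT INTO cities VALUES (?)", [("Rome",), ("Oslo",)])
print(answer("How many cities are there?", conn))  # 2
print(answer("Which city is largest?", conn))      # None
```

Abstaining on the second question is the design point: a wrong answer would erode the trust that makes an NLI usable at all.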
Natural Language Interfaces to Databases (NLIs) can benefit from the advances in statistical parsing over the last fifteen years or so. However, statistical parsers require training on a massive, labeled corpus, and manually creating such a corpus for each database is prohibitively expensive. To address this quandary, this paper reports on the PREC...
As product prices become increasingly available on the World Wide Web, consumers attempt to understand how corporations vary these prices over time. However, corporations change prices based on proprietary algorithms and hidden variables (e.g., the number of unsold seats on a flight). Is it possible to develop data mining techniques that will enable consumers to predict price changes under these condit...
As household appliances grow in complexity and sophistication, they become harder and harder to use, particularly because of their tiny display screens and limited keyboards. This paper describes a strategy for building natural language interfaces to appliances that circumvents these problems. Our approach leverages decades of research on planning...
As product prices become increasingly available on the World Wide Web, consumers attempt to understand how corporations vary these prices over time. However, corporations change prices based on proprietary algorithms and hidden variables (e.g., the number of unsold seats on a flight). Is it possible to develop data mining techniques that will enabl...
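A minimal sketch of the buy-or-wait decision this line of work studies: compare the current fare to a trailing average of recently observed fares and buy only when it looks cheap. The price series, window size, and decision rule are hypothetical; the actual work learns its policy from airfare data.

```python
# Buy-or-wait as a trailing-average rule. The series, window, and rule
# are illustrative assumptions, not the learned policy from the paper.

def should_buy(prices, window=3):
    """Buy if the latest price is below the mean of the previous `window` prices."""
    if len(prices) <= window:
        return True  # too little history: buy rather than risk a rise
    recent = prices[-window - 1:-1]
    return prices[-1] < sum(recent) / len(recent)

history = [300, 310, 320, 290]           # fare dropped below the recent average
print(should_buy(history))               # True
print(should_buy([300, 310, 320, 340]))  # False
```

The hidden variables mentioned above (e.g., unsold seats) are exactly what such a surface-level rule cannot see, which is why the learned approach mines richer signals from the observed price history.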