Anthony Fader’s research while affiliated with Allen Institute for Brain Science and other places


Publications (9)


IKE - An Interactive Tool for Knowledge Extraction
  • Conference Paper

June 2016 · 221 Reads · 22 Citations · Sumithra Bhakthavatsalam · Chris Clark · [...]

Recent work on information extraction has suggested that fast, interactive tools can be highly effective; however, creating a usable system is challenging, and few publicly available tools exist. In this paper we present IKE, a new extraction tool that performs fast, interactive bootstrapping to develop high-quality extraction patterns for targeted relations. Central to IKE is the notion that an extraction pattern can be treated as a search query over a corpus. To operationalize this, IKE uses a novel query language that is expressive, easy to understand, and fast to execute, all essential requirements for a practical system. It is also the first interactive extraction tool to seamlessly integrate symbolic (boolean) and distributional (similarity-based) methods for search. An initial evaluation suggests that relation tables can be populated substantially faster than by manual pattern authoring while retaining accuracy, and more reliably than fully automated tools, an important step towards practical KB construction. We are making IKE publicly available (http://allenai.org/software/interactive-knowledge-extraction).


Open Question answering over curated and extracted knowledge bases

August 2014 · 349 Reads · 319 Citations

We consider the problem of open-domain question answering (Open QA) over massive knowledge bases (KBs). Existing approaches use either manually curated KBs like Freebase or KBs automatically extracted from unstructured text. In this paper, we present OQA, the first approach to leverage both curated and extracted KBs. A key technical challenge is designing systems that are robust to the high variability in both natural language questions and massive KBs. OQA achieves robustness by decomposing the full Open QA problem into smaller sub-problems including question paraphrasing and query reformulation. OQA solves these sub-problems by mining millions of rules from an unlabeled question corpus and across multiple KBs. OQA then learns to integrate these rules by performing discriminative training on question-answer pairs using a latent-variable structured perceptron algorithm. We evaluate OQA on three benchmark question sets and demonstrate that it achieves up to twice the precision and recall of a state-of-the-art Open QA system.


Figure 1: The parsed tree of a Chinese sentence. 
Chinese Open Relation Extraction for Knowledge Acquisition

April 2014 · 532 Reads · 35 Citations

This study presents the Chinese Open Relation Extraction (CORE) system, which extracts entity-relation triples from Chinese free text using a series of NLP techniques: word segmentation, POS tagging, syntactic parsing, and extraction rules. We employ the proposed CORE techniques to extract more than 13 million entity-relations for an open-domain question answering application. To the best of our knowledge, CORE is the first Chinese Open IE system for knowledge acquisition.



Paraphrase-Driven Learning for Open Question Answering

August 2013 · 296 Reads · 343 Citations

We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.


Identifying Relations for Open Information Extraction

January 2011 · 553 Reads · 1,180 Citations

Open Information Extraction (IE) is the task of extracting assertions from massive corpora without requiring a pre-specified vocabulary. This paper shows that the output of state-of-the-art Open IE systems is rife with uninformative and incoherent extractions. To overcome these problems, we introduce two simple syntactic and lexical constraints on binary relations expressed by verbs. We implemented the constraints in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE^pos. More than 30% of ReVerb's extractions are at precision 0.8 or higher, compared to virtually none for earlier systems. The paper concludes with a detailed analysis of ReVerb's errors, suggesting directions for future work.


Open Information Extraction: The Second Generation.

January 2011 · 312 Reads · 439 Citations

How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm which eschews hand-labeled training examples, and avoids domain-specific verbs and nouns, to develop unlexicalized, domain-independent extractors that scale to the Web corpus. Open IE systems have extracted billions of assertions as the basis for both common-sense knowledge and novel question-answering systems. This paper describes the second generation of Open IE systems, which rely on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE.


Extracting Sequences from the Web

July 2010 · 25 Reads · 3 Citations

Classical Information Extraction (IE) systems fill slots in domain-specific frames. This paper reports on SEQ, a novel open IE system that leverages a domain-independent frame to extract ordered sequences such as presidents of the United States or the most common causes of death in the U.S. SEQ leverages regularities about sequences to extract a coherent set of sequences from Web text. SEQ nearly doubles the area under the precision-recall curve compared to an extractor that does not exploit these regularities.


Scaling Wikipedia-based Named Entity Disambiguation to Arbitrary Web Text

January 2009 · 84 Reads · 33 Citations

This paper investigates the "named-entity disambiguation" task on the Web: identifying the referent of a string found on an arbitrary Web page. The GROUNDER system, introduced in this paper, addresses two challenges not considered by previous work: how to utilize a priori information (e.g., Bill Clinton is more prominent on the Web than Clinton County) to improve disambiguation, and how to compose this prior information with contextual evidence. GROUNDER addresses both challenges by leveraging the user-contributed knowledge in Wikipedia and providing a novel formulation of the task. On a sample of strings drawn from the Web, GROUNDER achieves precision of 1.0 at recall 0.34, and precision 0.90 at recall 0.60.

Citations (8)


... Kristjansson et al. [36] developed a system for automatically filling in structured contact information, using a constrained conditional random field to propose a mechanism to estimate the confidence of each field, and integrated the results into a human-computer interaction interface to make manual filling more intuitive and faster. Dalvi et al. [37] proposed an interactive knowledge extraction tool, which can provide a high-quality template for target relationship extraction through fast and interactive guidance. The creation of a relationship table can be much faster than when using the pure manual mode, and is more accurate and reliable than when using the fully automatic mode. ...

Reference:

Chinese Event Trigger Recommendation Model for High-Accuracy Applications
IKE - An Interactive Tool for Knowledge Extraction

... Some work focused on using rules, such as co-occurrence, and a traditional Chinese medicine integrative database to extract relations among herb, gene, syndrome and disease, and to complete molecular mechanism analysis. On the other hand, a series of NLP methods, such as word segmentation and syntactic parsing, have been proposed to handle Chinese open relation extraction [13]. For example, a semi-supervised model was proposed in work [14] based on bootstrapping to discover the knowledge of gene functional, including extracting the relation among disease, symptom and gene in TCM bibliographic literature and MEDLINE. The topic model is a popular method for relation extraction in TCM literature. ...

Chinese open relation extraction for knowledge acquisition
  • Citing Article
  • January 2014

... WikiAnswer dataset (Fader et al., 2013) is composed of about 18 million question pairs that are paraphrased and aligned word-by-word, providing synonym relationships. However, this dataset is limited to questions, which narrows down the scope of the paraphrases. ...

Paraphrase-Driven Learning for Open Question Answering
  • Citing Conference Paper
  • August 2013

... Weakly Paired Data with In-Batch Negatives: The bulk of this data consists of title-body pairs obtained from diverse web sources such as Wikipedia, StackExchange, Semantic Scholar, Arxiv and PubMed. We also include citation pairs (SPECTER (Cohan et al., 2020), Semantic Scholar (Lo et al., 2020)), duplicate questions (StackExchange), and question answer pairs (PAQ (Lewis et al., 2021), WikiAnswers (Fader et al., 2014), SearchQA (Dunn et al., 2017)). For multilingual models, a subset of the English data is used, along with title-body pairs from mC4, multilingual Wikipedia, and multilingual Webhose, and Machine Translations of StackExchange. ...

Open Question answering over curated and extracted knowledge bases
  • Citing Article
  • August 2014

... A well-written text document, no matter whether it is in English or Chinese, often consists of sentences whose entities are linked through semantic relations (for example, "employment" relation between "person" and "company", "has" relation between "product" and "feature", and "is a" relation between two "concepts"). Relation Extraction (RE) is a task of extracting semantic relations from a given text document. ...

Chinese Open Relation Extraction for Knowledge Acquisition

... An important component in most approaches is the probability that a mention links to one entity in the knowledge base. The prior probability, as suggested by Fader et al. [4], is a strong indicator to select the correct entity for a given mention, and consequently adopted as a baseline. Computation of this prior is typically done over knowledge sources such as Wikipedia. ...

Scaling Wikipedia-based Named Entity Disambiguation to Arbitrary Web Text
  • Citing Article
  • January 2009

... For example, Freebase [26], WordNet [27], and Wikidata [28] are well-known large KGs developed by human labor. In contrast, OpenIE [29][30][31] and YAGO [32] are developed to reduce human effort. The work most similar to ours, in addition to BertNet, is "Prompting as Probing" [33], where the authors use a prompt engineering approach to obtain knowledge from GPT-3 [2]. ...

Identifying Relations for Open Information Extraction
  • Citing Conference Paper
  • January 2011

... However, both graphs have similar average degrees (6.2112 and 5.7706), suggesting comparable overall connectivity per node. The number of self-loops is slightly higher in Graph 1 (70 vs. 33), though this does not significantly impact global structure. The clustering coefficients (0.1363 and 0.1434) indicate moderate levels of local connectivity, with Graph 2 exhibiting slightly more pronounced local clustering. ...

Open Information Extraction: The Second Generation.
  • Citing Conference Paper
  • January 2011