January 2016 · 109 Reads · 79 Citations
December 2015 · 17 Reads · 24 Citations
Transactions of the Association for Computational Linguistics
Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as Freebase rarely cover event relations (e.g., “person travels to location”). Thus, the problem of extracting a wide range of events, e.g., from news streams, is an important, open challenge. This paper introduces NewsSpike-RE, a novel, unsupervised algorithm that discovers event relations and then learns to extract them. NewsSpike-RE uses a novel probabilistic graphical model to cluster sentences describing similar events from parallel news streams. These clusters then comprise training data for the extractor. Our evaluation shows that NewsSpike-RE generates high-quality training sentences and learns extractors that perform much better than rival approaches, more than doubling the area under a precision-recall curve compared to Universal Schemas.
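A minimal sketch of the "news spike" intuition behind the abstract, not the paper's actual probabilistic graphical model: sentences from parallel news streams that mention the same entity pair on the same day are grouped as candidate descriptions of one real-world event. All names and data below are invented for illustration.

```python
# Toy illustration of grouping parallel news sentences into candidate event
# clusters keyed by (entity pair, date). This is a simplification, not the
# NewsSpike-RE model itself.
from collections import defaultdict

def spike_clusters(articles, min_sources=2):
    """articles: iterable of (date, entity_pair, sentence) tuples."""
    clusters = defaultdict(list)
    for date, entity_pair, sentence in articles:
        # Sentences mentioning the same entity pair on the same day are
        # treated as candidate paraphrases of one event.
        clusters[(entity_pair, date)].append(sentence)
    # Keep only pairs that actually "spike", i.e. appear in several sources.
    return {k: v for k, v in clusters.items() if len(v) >= min_sources}

if __name__ == "__main__":
    news = [
        ("2013-02-11", ("Pope Benedict", "Vatican"), "Pope Benedict announced he will leave the Vatican."),
        ("2013-02-11", ("Pope Benedict", "Vatican"), "Pope Benedict is stepping down from the Vatican."),
        ("2013-02-11", ("Obama", "Israel"), "Obama plans a trip to Israel in March."),
    ]
    for key, sentences in spike_clusters(news).items():
        print(key, "->", sentences)
```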
June 2014 · 132 Reads · 57 Citations
January 2014 · 56 Reads · 42 Citations
July 2013 · 12 Reads
A translation graph is created using a plurality of reference sources that include translations between a plurality of different languages. Each entry in a source is used to create a wordsense entry, and each new word in a source is used to create a wordnode entry. A pair of wordnode and wordsense entries corresponds to a translation. In addition, a probability is determined for each wordsense entry and is decreased for each translation entry that includes more than a predefined number of translations into the same language. Bilingual translation entries are removed if subsumed by a multilingual translation entry. Triangulation is employed to identify pairs of common wordsense translations between a first, second, and third language. Translations not found in reference sources can also be inferred from the data comprising the translation graph. The translation graph can then be used for searches of a data collection in different languages.
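A hedged sketch of the triangulation step described in the abstract, assuming toy bilingual dictionaries stored as Python dicts. It illustrates the idea (a translation between two languages is inferred when entries share pivot words in a third language), not the patented system's implementation.

```python
# Triangulation over bilingual dictionaries: infer (word_a, word_c) pairs
# supported by at least `min_pivots` shared pivot words in language B.
from collections import defaultdict

def triangulate(dict_ab, dict_cb, min_pivots=2):
    """dict_ab: {word_a: set of B words}; dict_cb: {word_c: set of B words}."""
    pivot_to_a = defaultdict(set)
    for a, b_words in dict_ab.items():
        for b in b_words:
            pivot_to_a[b].add(a)
    support = defaultdict(set)
    for c, b_words in dict_cb.items():
        for b in b_words:
            for a in pivot_to_a[b]:
                support[(a, c)].add(b)
    # Requiring multiple independent pivots filters out sense mismatches.
    return {pair for pair, pivots in support.items() if len(pivots) >= min_pivots}

if __name__ == "__main__":
    es_en = {"perro": {"dog", "hound"}}
    zh_en = {"狗": {"dog", "hound"}}
    print(triangulate(es_en, zh_en))  # {('perro', '狗')}
```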
January 2013 · 75 Reads · 107 Citations
Chambers and Jurafsky (2009) demonstrated that event schemas can be automatically induced from text corpora. However, our analysis of their schemas identifies several weaknesses, e.g., some schemas lack a common topic and distinct roles are incorrectly mixed into a single actor. This is due in part to their pair-wise representation, which treats subject-verb independently from verb-object. This often leads to subject-verb-object triples that are not meaningful in the real world. We present a novel approach to inducing open-domain event schemas that overcomes these limitations. Our approach uses co-occurrence statistics of semantically typed relational triples, which we call Rel-grams (relational n-grams). In a human evaluation, our schemas outperform Chambers's schemas by wide margins on several evaluation criteria. Both Rel-grams and event schemas are freely available to the research community.
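An illustrative sketch of the underlying co-occurrence statistic, assuming relational triples have already been extracted and semantically typed; it is not the released Rel-grams resource or its code.

```python
# Count how often two typed relational triples co-occur within a small window
# of sentences in the same document; such counts are the raw material for
# relational n-gram ("Rel-gram") style statistics.
from collections import Counter
from itertools import combinations

def relgram_counts(doc_triples, window=3):
    """doc_triples: list of (sentence_index, (subj_type, relation, obj_type))."""
    counts = Counter()
    for (i, t1), (j, t2) in combinations(doc_triples, 2):
        if abs(i - j) <= window and t1 != t2:
            counts[(t1, t2)] += 1
    return counts

if __name__ == "__main__":
    triples = [
        (0, ("PERSON", "was arrested in", "LOCATION")),
        (1, ("PERSON", "was charged with", "CRIME")),
        (5, ("ORGANIZATION", "acquired", "ORGANIZATION")),
    ]
    print(relgram_counts(triples).most_common())
```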
November 2012 · 22 Reads · 7 Citations
Both lexical translation and knowledge-based translation systems require sense-distinguished translation lexicons, yet such lexicons are expensive to create manually. However, the abundance of untagged monolingual corpora and the availability of bilingual, machine-readable dictionaries (MRDs) suggest an opportunity. Our PanLexicon system takes advantage of these resources to automatically construct a sense-distinguished multilingual lexicon. The challenge for PanLexicon is that free, bilingual MRDs do not make sense distinctions, and often have spotty coverage. PanLexicon uses word contexts from monolingual corpora to guide it in finding translation sets – sets of words that share the same word sense across multiple languages. By maintaining word sense distinctions, PanLexicon finds translations between language pairs that are not supported by any of its bilingual source dictionaries. PanLexicon runs in time linear in the size of its input, and thus scales readily to large numbers of languages. We built a prototype of PanLexicon with inputs from Spanish-English and Chinese-English dictionaries. Our initial experimental results show that PanLexicon is able to find high-quality translation sets despite the limitations of its inputs.
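A toy sketch of one signal the abstract describes: comparing the monolingual context of a source-language word, mapped word-by-word through a bilingual dictionary, with the context of a candidate English translation. The data, mapping, and scoring below are invented for illustration and are not PanLexicon's implementation.

```python
# Context-vector comparison across languages via a bilingual dictionary:
# a higher cosine overlap suggests the two words share a word sense.
import math
from collections import Counter

def context_vector(word, sentences, window=2):
    vec = Counter()
    for sent in sentences:
        toks = sent.lower().split()
        for i, tok in enumerate(toks):
            if tok == word:
                for ctx in toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]:
                    vec[ctx] += 1
    return vec

def mapped_vector(vec, bidict):
    # Map source-language context words into English via the bilingual dictionary.
    return Counter({bidict[w]: c for w, c in vec.items() if w in bidict})

def cosine(u, v):
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

if __name__ == "__main__":
    es_sentences = ["el banco presta dinero"]
    en_sentences = ["the bank lends money", "she sat on the bank of the river"]
    es_en = {"el": "the", "presta": "lends", "dinero": "money"}
    src = mapped_vector(context_vector("banco", es_sentences), es_en)
    cand = context_vector("bank", en_sentences)
    print(round(cosine(src, cand), 2))  # overlap between "banco" and "bank" contexts
```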
November 2012 · 284 Reads · 2 Citations
The need for translation among the world's thousands of natural languages makes information access and communication costly. One possible solution is lemmatic communication: a human sender encodes a message into sequences of lemmata (dictionary words), a massively multilingual lexical translation engine translates them into lemma sequences in a target language, and a human receiver interprets them to infer the sender's intended meanings. Using a 13-million-lemma, 1300-language translation engine, we conducted an experiment in lemmatic communication with Spanish- and Hungarian-speaking subjects. Translingual communication was less successful than intralingual communication, and intralingual communication was less successful when the lemma sequences were artificially randomized before the receiver saw them (simulating word-order differences among languages). In all conditions, however, meanings were transmitted with high or moderate fidelity in at least 40% of the cases. The results suggest interface and translation-algorithm improvements that could increase the efficacy of lemmatic communication.
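A minimal illustration of the lemmatic-communication pipeline described above, with a tiny invented lexicon standing in for the 13-million-lemma engine; the lemma choices and language code are examples only.

```python
# Translate a sender's lemma sequence lemma-by-lemma into target-language
# lemmata; untranslatable lemmata are marked so the receiver can still guess.
LEXICON = {
    ("need", "spa"): "necesitar",
    ("water", "spa"): "agua",
    ("clean", "spa"): "limpio",
}

def translate_lemmata(lemmata, target_lang, lexicon=LEXICON):
    return [lexicon.get((lemma, target_lang), f"<{lemma}?>") for lemma in lemmata]

print(translate_lemmata(["need", "clean", "water"], "spa"))
# ['necesitar', 'limpio', 'agua']
```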
January 2012 · 34 Reads · 18 Citations
June 2011 · 262 Reads · 193 Citations
Open Information Extraction extracts relations from text without requiring a pre-specified domain or vocabulary. While existing techniques have used only shallow syntactic features, we investigate the use of semantic role labeling techniques for the task of Open IE. Semantic role labeling (SRL) and Open IE, although developed mostly in isolation, are quite related. We compare SRL-based open extractors, which perform computationally expensive, deep syntactic analysis, with TextRunner, an open extractor that uses shallow syntactic analysis but is able to analyze many more sentences in a fixed amount of time and thus exploit corpus-level statistics. Our evaluation answers several questions about these systems: Can SRL extractors, which are trained on PropBank, cope with the heterogeneous text found on the Web? Which extractor attains better precision, recall, F-measure, or running time? How does extractor performance vary for binary, n-ary, and nested relations? How much do we gain by running multiple extractors? How do we select the optimal extractor given the amount of data, available time, and types of extractions desired?
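A small sketch of the kind of tuple-level comparison these evaluation questions imply, scoring two extractors' (arg1, relation, arg2) outputs against a gold set. The extractions and numbers below are made up for illustration and do not reproduce the paper's results.

```python
# Precision / recall / F-measure over sets of extraction tuples.
def prf(predicted, gold):
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f

gold = {("Obama", "visited", "Israel"), ("rebels", "captured", "the town")}
srl_out = {("Obama", "visited", "Israel")}
shallow_out = {("Obama", "visited", "Israel"), ("rebels", "captured", "town")}
print(prf(srl_out, gold))      # (1.0, 0.5, ~0.67)
print(prf(shallow_out, gold))  # (0.5, 0.5, 0.5)
```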
... To enable models to cope with unseen event types, there is a recent trend toward investigating event detection in zero-shot scenarios, in which new events are discovered and classified from text without annotations. In this configuration, events are divided into seen and unseen types [4][5][6]: "seen" means that the model sees the label information of these event types during training, while "unseen" means that it does not. Figure 1 illustrates the task of event detection in zero-shot scenarios (to be formally defined in Section 3). ...
December 2015 · Transactions of the Association for Computational Linguistics
... Therefore, feedback emerges as another crucial factor in enhancing worker performance when dealing with complex, open-ended crowdsourcing tasks. A summary of related work indicates that feedback from multiple sources, such as multi-turn contextual argumentation [58], peer communication [59], gated instruction [60], and an automatic conversational interface [56], could have a teaching effect on novice crowd workers. ...
January 2016
... In certain instances, it may become imperative to reference an entity mentioned in the preceding sentence, despite the absence of direct mention. Also, knowing the types of entities can be helpful for relationship inference. Koch et al. [64] examined the increase in the performance of the weakly supervised model with NEL and coreference added. In tests with 48 relation classes in NYT and GORECE datasets, it was observed that precision increased by 44% and recall by 70%. Tonon et al. [65] tried to predict the rank of the given asset according to the context. ...
Reference:
Weakly-Supervised Relation Extraction
January 2014
... The Automatic Knowledge Base Construction (AKBC) from a dataset problem consists of finding a process t : Y_KB → {(s, p, o)} able to triplify each data point following the RDF language. Concrete examples of AKBC include learning a KB through semantic parsing, using Markov Logic [82] or relational probabilistic models [83]. The triplification process consists of extracting two entities or concepts, subject (s) and object (o), and a predicate (p) that connects them. ...
January 2012
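A hedged sketch of the triplification process t described in the snippet above: each data point (here, a flat record of attribute-value pairs) is mapped to a set of (s, p, o) triples. The record layout and key names are assumptions made for the example.

```python
# Turn one record into RDF-style (subject, predicate, object) triples.
def triplify(record, subject_key="id"):
    subject = record[subject_key]
    return {(subject, predicate, obj)
            for predicate, obj in record.items() if predicate != subject_key}

print(triplify({"id": "Barack_Obama", "bornIn": "Honolulu", "profession": "politician"}))
# {('Barack_Obama', 'bornIn', 'Honolulu'), ('Barack_Obama', 'profession', 'politician')}
```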
... In early years, an event was defined either as a proposition of subject and predicate in studies on temporal news comprehension (Filatova and Hovy, 2001), or as a (predicate, dependency) pair in studies on script learning (Chambers and Jurafsky, 2008). Later, Balasubramanian et al. (2013) represented events by (subject, relation, object) triples for event schema induction. Recent studies in information extraction define an event as a more complex structure consisting of event trigger, event type, event arguments, and argument roles (Li et al., 2022b). ...
January 2013
... Other works present automatically derived hierarchically ordered summaries allowing users to drill down from a general overview to detailed information [36,37]. However, these systems are neither interactive nor do they consider the user's feedback to update their internal summarization models. ...
June 2014
... Though open information extraction has received increasing attention, there are only a few publicly available large-scale Open-IE databases, such as TextRunner (Yates et al., 2007), ReVerb (Fader et al., 2011), PATTY (Nakashole et al., 2012), WiseNet 2.0 (Moro and Navigli, 2013), DefIE (Bovi et al., 2015), and OPIEC (Gashteovski et al., 2019). On the other hand, NELL (Carlson et al., 2010; Mitchell et al., 2018) uses a predefined seed ontology (including 123 categories and 55 relations) instead of extracting from scratch. ...
January 2007
... This has important ramifications for the case representation, which can include attribute-value pairs for only those linguistic constructs and knowledge sources that are available to the NLP system. In our work, the larger NLP system is the CIRCUS information extraction (IE) system (Lehnert 1990, Lehnert et al. 1992, Lehnert et al. 1993). In general, an information extraction system takes as input a set of unrestricted texts and 'summarizes' each text with respect to a prespecified topic or domain of interest: it finds useful information about the domain and encodes that information in a structured, template format, suitable for populating databases (Lehnert and Sundheim 1991, Chinchor et al. 1993, Cardie 1997). ...
January 1993
... An important component in most approaches is the probability that a mention links to one entity in the knowledge base. The prior probability, as suggested by Fader et al. [4], is a strong indicator for selecting the correct entity for a given mention, and is consequently adopted as a baseline. Computation of this prior is typically done over knowledge sources such as Wikipedia. ...
January 2009
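A sketch of the prior-probability baseline mentioned in the snippet above, estimating P(entity | mention) from anchor-text counts of the kind typically harvested from Wikipedia; the counts below are invented for illustration.

```python
# Estimate a link prior P(entity | mention) from (mention, entity) counts.
from collections import defaultdict

def link_prior(anchor_counts):
    """anchor_counts: {(mention, entity): count} -> {mention: {entity: prob}}"""
    totals = defaultdict(int)
    for (mention, _), count in anchor_counts.items():
        totals[mention] += count
    priors = defaultdict(dict)
    for (mention, entity), count in anchor_counts.items():
        priors[mention][entity] = count / totals[mention]
    return priors

counts = {("washington", "George_Washington"): 120,
          ("washington", "Washington,_D.C."): 300,
          ("washington", "Washington_(state)"): 180}
# The highest-prior entity is the baseline prediction for the mention.
print(max(link_prior(counts)["washington"].items(), key=lambda kv: kv[1]))
```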
... The tuple includes a relational phrase and a pair (or more) of argument phrases, which are semantically connected by the relational phrase. For the collection of extraction patterns, some studies use hand-crafted rules [3,16,17], whereas others learn from automatically labeled training datasets [1,2,4]. Additionally, a number of studies improved the accuracy of Open IE by transforming complex sentences containing several clauses into a collection of simplified independent clauses. ...
January 2010
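A minimal data-structure sketch of the tuple shape described in the snippet above, a relational phrase plus two or more argument phrases; the class name and example are illustrative and not taken from any of the cited systems.

```python
# A simple container for an Open IE extraction tuple.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class OpenIETuple:
    relation: str                 # the relational phrase
    arguments: Tuple[str, ...]    # two or more argument phrases it connects

extraction = OpenIETuple(relation="was born in", arguments=("Barack Obama", "Honolulu"))
print(extraction)
```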