
A storytree-based model for inter-document causal relation extraction from news articles


Knowledge and Information Systems (2023) 65:827–853
https://doi.org/10.1007/s10115-022-01781-7
REGULAR PAPER
Chong Zhang1·Jiagao Lyu1·Ke Xu1
Received: 3 March 2021 / Revised: 9 October 2022 / Accepted: 16 October 2022 /
Published online: 3 November 2022
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022
Abstract
With ever more news articles appearing on the Internet, discovering causal relations between news articles is important for understanding how news events develop. Extracting causal relations between news articles is an inter-document relation extraction task. Existing work on relation extraction cannot solve it well, for two reasons: (1) most relation extraction models are intra-document models that focus on relations between entities; news articles, however, are far longer and more complex than entities, which makes inter-document relation extraction the harder task. (2) Existing inter-document relation extraction models rely on similarity information between news articles, which can limit extraction performance. In this paper, we propose an inter-document model based on storytree information to extract causal relations between news articles. We incorporate storytree information into integer linear programming (ILP) and design storytree constraints for the ILP objective function. Experimental results show that all the constraints are effective and that the proposed method outperforms widely used machine learning models and a state-of-the-art deep learning model, improving F1 by more than 5% on three different datasets. Further analysis shows that the five constraints in our model improve the results to varying degrees and that their effects differ across the three datasets. An experiment on link features also suggests the positive influence of link information.
Keywords Relation classification · News article · Causal relation · Constraint
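The preview does not spell out the paper's five storytree constraints, but the kind of ILP formulation the abstract describes can be sketched on a toy instance. The pair scores, the storytree, and both constraints below are invented for illustration only; the paper uses its own constraint set and an ILP solver, whereas this minimal sketch simply brute-forces a tiny instance:

```python
# Illustrative brute-force solution of a tiny causal-link "ILP".
# Binary variable x[(i, j)] = 1 means "article i causes article j".
# Scores, storytree, and constraints are invented for illustration.
from itertools import product

# Toy classifier scores: confidence that article i causes article j.
scores = {(0, 1): 0.9, (0, 2): 0.4, (1, 2): 0.8, (2, 1): 0.3}

# Hypothetical storytree: parent of each article (article 0 is the root).
parent = {1: 0, 2: 1}

def tree_adjacent(i, j):
    """Allow links only along storytree edges (an assumed constraint)."""
    return parent.get(j) == i or parent.get(i) == j

pairs = list(scores)
best_value, best_links = -1.0, set()
for bits in product([0, 1], repeat=len(pairs)):
    links = {p for p, b in zip(pairs, bits) if b}
    # Constraint 1: links must follow the storytree structure.
    if any(not tree_adjacent(i, j) for i, j in links):
        continue
    # Constraint 2: each article has at most one cause.
    effects = [j for _, j in links]
    if len(effects) != len(set(effects)):
        continue
    value = sum(scores[p] for p in links)
    if value > best_value:
        best_value, best_links = value, links

print(sorted(best_links))  # the selected causal links
```

On this toy instance the search keeps the two high-scoring, tree-consistent links (0, 1) and (1, 2); at realistic scale one would hand the same objective and constraints to an ILP solver rather than enumerate.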
1 Introduction
News keeps people informed about events happening around the world. As the amount of information grows, the volume of news on news websites has exploded. Understanding the relations between news articles allows us to better trace the development of events and gain a deeper understanding of the news. Therefore, it is meaningful and
✉ Ke Xu
kexu@buaa.edu.cn
1 State Key Lab of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, China
References
Article
Relation extraction is one of the most important tasks in information extraction. Traditional works use either sentences or surface patterns (i.e., the shortest dependency paths of sentences) to build extraction models. Intuitively, integrating these two kinds of methods should yield more robust and effective extraction models, which is, however, ignored in most existing works. In this paper, we aim to learn embeddings of surface patterns to further augment sentence-based models. To this end, we propose a novel pattern embedding learning framework with a weighted multi-dimensional attention mechanism. To suppress noise in the training dataset, we mine global statistics between patterns and relations and introduce two kinds of prior knowledge to guide pattern embedding learning. Based on the learned embeddings, we present two augmentation strategies to improve existing relation extraction models. We conduct extensive experiments on two popular datasets (i.e., NYT and KnowledgeNet) and observe promising performance improvements.
Article
In the past two decades, security authorities around the world have acknowledged the importance of exploiting the ever-growing amount of information published on the web about various types of events for early threat detection, situation monitoring, and risk analysis. Since the information related to a particular real-world event may be scattered across various sources and mentioned on different dates, an important task is to link together all interrelated event mentions. This article studies the application of various statistical and machine learning techniques to a new, application-oriented variation of the event pair relatedness classification task, which merges fine-grained event relation types reported elsewhere into one concept. The task focuses on linking event templates automatically extracted from online news by an existing event extraction system; these templates contain only short text snippets and potentially erroneous, incomplete information. We explore the performance of shallow learning methods such as decision-tree-based random forests and gradient-boosted tree ensembles (XGBoost), along with kernel-based support vector machines (SVM), in comparison with both simpler shallow learners and a deep learning approach based on a long short-term memory (LSTM) recurrent neural network. Our experiments focus on linguistically lightweight features (some not reported elsewhere) that are easily portable across languages. We obtained F1 scores ranging from 92% (simplest shallow learner) to 96.4% (LSTM-based recurrent neural network) on a newly created event linking corpus.
Article
The task of temporal slot filling (TSF) is to extract values of specific attributes for a given entity, called "facts", as well as temporal tags of the facts, from text data. While existing work treated the temporal tags as single time slots, in this paper we introduce and study the task of Precise TSF (PTSF), which fills two precise temporal slots: the beginning and ending time points. Based on our observation of a news corpus, most facts should have both points; however, fewer than 0.1% of them have time expressions in the documents. On the other hand, a document's post time, though often available, is not as precise an indicator of when a fact was valid as an explicit time expression. Therefore, directly decomposing the time expressions or using an arbitrary post-time period cannot provide accurate results for PTSF. The challenge of PTSF lies in finding precise time tags in noisy and incomplete temporal contexts in the text. To address this challenge, we propose an unsupervised approach based on the philosophy of truth finding. The approach has two modules that mutually enhance each other: one is a reliability estimator of fact extractors conditioned on the temporal contexts; the other is a fact trustworthiness estimator based on the extractors' reliability. Commonsense knowledge (e.g., one country has only one president at a specific time) is automatically generated from data and used to infer false claims from trustworthy facts. For evaluation, we manually collected hundreds of temporal facts from Wikipedia as ground truth, including countries' presidential terms and sports teams' player career histories. Experiments on a large news dataset demonstrate the accuracy and efficiency of our proposed algorithm.
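The commonsense check this abstract mentions (one president per country at a specific time) amounts to flagging claims that assign different values to a single-valued attribute over overlapping time spans. A minimal sketch, with an invented fact representation rather than the paper's actual one:

```python
# Flag conflicting temporal claims for a single-valued attribute.
# The (attribute, value, (start, end)) tuple format is an assumption
# made for this sketch, not the paper's representation.

def overlaps(a, b):
    """True if two [start, end] year intervals intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def conflicting_claims(facts):
    """Return pairs of claims that assign different values to the same
    single-valued attribute over overlapping time spans."""
    conflicts = []
    for i in range(len(facts)):
        for j in range(i + 1, len(facts)):
            (a1, v1, span1), (a2, v2, span2) = facts[i], facts[j]
            if a1 == a2 and v1 != v2 and overlaps(span1, span2):
                conflicts.append((facts[i], facts[j]))
    return conflicts

claims = [
    ("country_X.president", "Alice", (2001, 2008)),
    ("country_X.president", "Bob", (2005, 2012)),   # overlaps Alice's term
    ("country_X.president", "Carol", (2013, 2016)),
]
print(conflicting_claims(claims))
```

Only the Alice/Bob pair is flagged: their terms overlap in 2005-2008, while Carol's term is disjoint from both.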
Article
Document-level relation extraction aims to extract the relationships among the entities in a paragraph of text. Compared with sentence-level extraction, the text in document-level relation extraction is much longer and contains many more entities, which makes document-level relation extraction a harder task. The number and complexity of entities make it necessary to provide models with enough information about the entities. To solve this problem, we put forward a document-level entity mask method with type information (DEMMT), which masks each entity mention with special tokens. With this entity mask method, the model can accurately identify every mention and type of the entities. Based on DEMMT, we propose a BERT-based one-pass model that predicts the relationships among the entities by processing the text once. We test the proposed model on DocRED, a large-scale open-domain document-level relation extraction dataset. On the manually annotated part of DocRED, our approach obtains a 6% F1 improvement over state-of-the-art models that do not use pre-trained models, and a 2% F1 improvement over BERT without DEMMT. On the distantly supervised part of DocRED, the F1 improvement is 2% over models without pre-training and 5% over pure BERT.
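The entity-mask idea can be illustrated with a few lines of string processing. The token format `[E<id>-<type>]` and the example text are assumptions for this sketch, not the paper's exact scheme:

```python
# Sketch of masking entity mentions with type-bearing special tokens,
# in the spirit of DEMMT. Token format and example are assumed.

def mask_entities(text, mentions):
    """Replace each (surface, entity_id, type) mention with a special
    token, so the model sees entity identity and type, not the surface
    form. List longer surfaces first so substring replacement is safe."""
    for surface, ent_id, ent_type in mentions:
        text = text.replace(surface, f"[E{ent_id}-{ent_type}]")
    return text

doc = "Marie Curie was born in Warsaw. Curie won the Nobel Prize."
mentions = [
    ("Marie Curie", 1, "PER"),  # longest surface first
    ("Curie", 1, "PER"),        # coreferent mention, same entity id
    ("Warsaw", 2, "LOC"),
]
print(mask_entities(doc, mentions))
# -> [E1-PER] was born in [E2-LOC]. [E1-PER] won the Nobel Prize.
```

Both mentions of the person collapse to the same token `[E1-PER]`, which is what lets a one-pass model aggregate evidence across all mentions of an entity.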
Article
Given the overwhelming amounts of information in our current 24/7 stream of new incoming articles, new techniques are needed to enable users to focus on just the key entities and concepts along with their relationships. Examples include news articles but also business reports and social media. The fact that relevant information may be distributed across diverse sources makes it particularly challenging to identify relevant connections. In this paper, we propose a system called MuReX to aid users in quickly discerning salient connections and facts from a set of related documents and viewing the resulting information as a graph-based visualization. Our approach involves open information extraction, followed by a careful transformation and filtering approach. We rely on integer linear programming to ensure that we retain only the most confident and compatible facts with regard to a user query, and finally apply a graph ranking approach to obtain a coherent graph that represents meaningful and salient relationships, which users may explore visually. Experimental results corroborate the effectiveness of our proposed approaches, and the local system we developed has been running for more than one year.
Extracting events accurately from vast news corpora and organizing them logically is critical for news apps and search engines, which aim to organize news information collected from the Internet and present it to users in the most sensible forms. Intuitively speaking, an event is a group of news documents that report the same news incident, possibly in different ways. In this article, we describe our experience implementing a news content organization system at Tencent to discover events from vast streams of breaking news and to evolve news story structures in an online fashion. Our real-world system faces unique challenges in contrast to previous studies on topic detection and tracking (TDT) and event timeline or graph generation, in that we (1) need to accurately and quickly extract distinguishable events from massive streams of long text documents, and (2) must develop the structures of event stories in an online manner to guarantee a consistent user viewing experience. To address these challenges, we propose Story Forest, a set of online schemes that automatically clusters streaming documents into events while connecting related events in growing trees to tell evolving stories. A core novelty of our Story Forest system is EventX, a semi-supervised scheme to extract events from massive Internet news corpora. EventX relies on a two-layered, graph-based clustering procedure to group documents into fine-grained events. We conducted extensive evaluations based on (1) 60 GB of real-world Chinese news data, (2) a large Chinese Internet news dataset containing 11,748 news articles with ground-truth event labels, and (3) the 20 Newsgroups English dataset, along with detailed pilot user experience studies. The results demonstrate the superior ability of Story Forest to accurately identify events and organize news text into a logical structure that is appealing to human readers.
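EventX's two-layered procedure is not reproduced here, but the general idea of grouping documents into events via a document graph can be sketched crudely: connect documents whose keyword overlap passes a threshold, then take connected components as "events". The threshold and the toy documents are invented for illustration:

```python
# Drastically simplified stand-in for graph-based event clustering:
# build a keyword-overlap graph, then take connected components.
# Threshold and toy documents are invented; the real system is a
# two-layered, semi-supervised procedure.

def event_clusters(docs, min_shared=2):
    """docs: list of keyword sets; returns sorted document-index groups."""
    n = len(docs)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if len(docs[i] & docs[j]) >= min_shared:
                adj[i].add(j)
                adj[j].add(i)
    seen, clusters = set(), []
    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], set()  # DFS over the overlap graph
        while stack:
            k = stack.pop()
            if k in comp:
                continue
            comp.add(k)
            stack.extend(adj[k] - comp)
        seen |= comp
        clusters.append(sorted(comp))
    return clusters

docs = [
    {"earthquake", "japan", "tsunami"},
    {"earthquake", "japan", "rescue"},
    {"election", "vote", "senate"},
    {"election", "senate", "debate"},
]
print(event_clusters(docs))  # two events: [[0, 1], [2, 3]]
```

Documents 0 and 1 share two keywords, as do 2 and 3, so two events emerge; anything below the threshold stays a singleton cluster.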
Chapter
Document-level RE requires reading, inferring, and aggregating over multiple sentences. In our view, document-level RE needs to exploit inference information at multiple granularities: entity level, sentence level, and document level. How to obtain and aggregate inference information at these different granularities is therefore a challenge for document-level RE, one not considered by previous work. In this paper, we propose a Hierarchical Inference Network (HIN) to make full use of the abundant information at the entity, sentence, and document levels. A translation constraint and a bilinear transformation are applied to the target entity pair in multiple subspaces to obtain entity-level inference information. Next, we model the inference between entity-level information and sentence representations to obtain sentence-level inference information. Finally, a hierarchical aggregation approach is adopted to obtain document-level inference information. In this way, our model can effectively aggregate inference information across these three granularities. Experimental results show that our method achieves state-of-the-art performance on the large-scale DocRED dataset. We also demonstrate that using BERT representations can further substantially boost performance.
Article
Implicit discourse relation recognition (IDRR) remains an ongoing challenge. Recently, various neural network models have been proposed for this task, and have achieved promising results. However, almost all of them predict multi-level discourse senses separately, which not only ignores the semantic hierarchy of and mapping relationships between senses, but also may result in inconsistent predictions at different levels. In this paper, we propose a hierarchical multi-task neural network with a conditional random field layer (HierMTN-CRF) for multi-level IDRR. Specifically, a HierMTN component is designed to jointly model multi-level sense classifications, with these senses as supervision signals at different feature layers. Consequently, the hierarchical semantics of senses are explicitly encoded into features at different layers. To further exploit the mapping relationships between adjacent-level senses, a CRF layer is introduced to perform collective sense predictions. In this way, our model infers a sequence of multi-level senses rather than separate sense predictions in previous models. In addition, our model can be easily constructed based on existing IDRR models. Experimental results and in-depth analyses on the benchmark PDTB data set show that our model achieves significantly better and more consistent results over several competitive baselines on multi-level IDRR, without additional time overhead.