Figure 1 - uploaded by Yuen-Hsien Tseng

The parsed tree of a Chinese sentence.
Source publication
This study presents the Chinese Open Relation Extraction (CORE) system that is able to extract entity-relation triples from Chinese free texts based on a series of NLP techniques, i.e., word segmentation, POS tagging, syntactic parsing, and extraction rules. We employ the proposed CORE techniques to extract more than 13 million entity-relations fo...
Context in source publication
Context 1
... the example sentence described above, the triple (愛迪生/Edison, 發明了/invented, 燈泡/light bulb) is extracted by this approach. Figure 1 shows the parsed tree of a Chinese sentence for the relation extraction by CORE. The Chinese sentence "白宮預算委員會的民主黨星期一發佈報告" ('Democrats on the House Budget Committee released a report on Monday') is the manual translation of one of the English sentences evaluated by ReVerb (Fader et al., 2011). ...
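The following is a minimal sketch of how a triple such as (愛迪生, 發明了, 燈泡) could be read off a parsed tree with a single hand-written rule. It is not CORE's actual rule set: the nested-tuple tree encoding, the labels `S`, `NP`, `VP`, `V`, `ASP`, `N`, and the functions `leaves` and `extract_triple` are all hypothetical choices made for illustration, and the tree is typed in by hand rather than produced by a segmenter, tagger, or parser.

```python
# Toy rule-based triple extraction over a hand-built parse tree.
# Assumption: trees are nested tuples; a leaf is (label, word) and an
# inner node is (label, child, child, ...). Labels are illustrative only.

def leaves(tree):
    """Return the words under a node, left to right."""
    label, *rest = tree
    if len(rest) == 1 and isinstance(rest[0], str):
        return [rest[0]]          # leaf: (label, word)
    words = []
    for child in rest:
        words.extend(leaves(child))
    return words

def extract_triple(tree):
    """Hypothetical rule: S -> NP VP, where VP holds verb material and an NP.

    Returns (subject, relation, object) or None if the rule does not match.
    """
    label, *children = tree
    if label != "S":
        return None
    subj = next((c for c in children if c[0] == "NP"), None)
    vp = next((c for c in children if c[0] == "VP"), None)
    if subj is None or vp is None:
        return None
    # Verb plus any aspect marker (e.g. 了) forms the relation phrase.
    verb_parts = [c for c in vp[1:] if c[0] in ("V", "ASP")]
    obj = next((c for c in vp[1:] if c[0] == "NP"), None)
    if not verb_parts or obj is None:
        return None
    relation = "".join(w for part in verb_parts for w in leaves(part))
    return ("".join(leaves(subj)), relation, "".join(leaves(obj)))

# Hand-built tree for the Edison example sentence in the excerpt above.
tree = (
    "S",
    ("NP", ("N", "愛迪生")),
    ("VP", ("V", "發明"), ("ASP", "了"), ("NP", ("N", "燈泡"))),
)

print(extract_triple(tree))
```

A real system would need many such rules, plus handling for modifiers like the 的 phrase in the Budget Committee sentence, which this single rule does not attempt.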
Similar publications
The typical emotion classification approach adopts one-step single-label classification using intra-sentence features such as unigrams, bigrams and emotion words. However, a single-label classifier with intra-sentence features cannot ensure good performance on short microblog text, which has flexible expressions. To address this problem, this paper p...
Word clustering is a popular research issue in the field of natural language processing. In this paper, the Latent Dirichlet Allocation algorithm is used to extract topics from the nouns in the text, and the highest-probability noun of each topic is selected as a centroid for the k-means algorithm. Experimental results show that this method can get b...
Semantic correlation is a very important research direction in Natural Language Processing, and semantic relatedness is different from semantic similarity. At present, semantic correlation algorithms that are based on the similarity of semantic meaning cannot achieve the desired results to a certain extent. In this paper...
Citations
... A well-written text document, no matter whether it is in English or Chinese, often consists of sentences whose entities are linked through semantic relations (for example, "employment" relation between "person" and "company", "has" relation between "product" and "feature", and "is a" relation between two "concepts"). Relation Extraction (RE) is a task of extracting semantic relations from a given text document. ...
... There are several English-oriented ORE systems, such as TextRunner [2], ReVerb [3], and OpenIE6 [4]. These systems, which use morphological features, usually perform well on English corpora but give poor results on Chinese texts [5]. Recognizing this incompatibility, a group of researchers has recently turned to Chinese ORE, using external syntactic or semantic knowledge to manually design extraction rules and extracting open semantic relations from Chinese texts [5,6]. ...
... These systems, which use morphological features, usually perform well on English corpora but give poor results on Chinese texts [5]. Recognizing this incompatibility, a group of researchers has recently turned to Chinese ORE, using external syntactic or semantic knowledge to manually design extraction rules and extracting open semantic relations from Chinese texts [5,6]. At the same time, another group of ...
Open Relation Extraction (ORE) is a task of extracting semantic relations from a text document. Current ORE systems have significantly improved their efficiency in obtaining Chinese relations, when compared with conventional systems which heavily depend on feature engineering or syntactic parsing. However, the ORE systems do not use robust neural networks such as pre-trained language models to take advantage of large-scale unstructured data effectively. In response to this issue, a new system entitled Chinese Open Relation Extraction with Knowledge Enhancement (CORE-KE) is presented in this paper. The CORE-KE system employs a pre-trained language model (with the support of a Bidirectional Long Short-Term Memory (BiLSTM) layer and a Masked Conditional Random Field (Masked CRF) layer) on unstructured data in order to improve Chinese open relation extraction. Entity descriptions in Wikidata and additional knowledge (in terms of triple facts) extracted from Chinese ORE datasets are used to fine-tune the pre-trained language model. In addition, syntactic features are further adopted in the training stage of the CORE-KE system for knowledge enhancement. Experimental results of the CORE-KE system on two large-scale datasets of open Chinese entities and relations demonstrate that the CORE-KE system is superior to other ORE systems. The F1-scores of the CORE-KE system on the two datasets show relative improvements of 20.1% and 1.3%, respectively, when compared with benchmark ORE systems. The source code is available at https://github.com/cjwen15/CORE-KE.
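The abstract above names a Masked CRF layer without describing it. As the term is commonly used in sequence labeling, the idea is to suppress tag transitions that are illegal under the tagging scheme so the decoder cannot produce them. The sketch below illustrates that general idea only. It is not taken from the CORE-KE code: the BIO tag names (`B-ENT`, `I-ENT`, `B-REL`, `I-REL`), the functions `is_legal` and `mask_transitions`, and the penalty value are all assumptions.

```python
# Sketch of transition masking for a CRF over BIO tags.
# Assumption: tags follow the BIO scheme, so "I-X" may only follow
# "B-X" or "I-X". Tag names and the penalty are illustrative.

def is_legal(prev_tag, curr_tag):
    """Return True if curr_tag may directly follow prev_tag under BIO."""
    if curr_tag.startswith("I-"):
        span_type = curr_tag[2:]
        return prev_tag in ("B-" + span_type, "I-" + span_type)
    return True  # "O" and any "B-X" may follow any tag

def mask_transitions(tags, scores, penalty=-1e4):
    """Replace scores of illegal transitions with a large negative value.

    scores[i][j] is the score for moving from tags[i] to tags[j].
    """
    return [
        [scores[i][j] if is_legal(prev, curr) else penalty
         for j, curr in enumerate(tags)]
        for i, prev in enumerate(tags)
    ]

tags = ["O", "B-ENT", "I-ENT", "B-REL", "I-REL"]
scores = [[0.0] * len(tags) for _ in tags]   # placeholder learned scores
masked = mask_transitions(tags, scores)

for prev, row in zip(tags, masked):
    print(prev, row)
```

With the mask applied, a path such as `O` followed by `I-ENT` carries a very low score, so Viterbi decoding would avoid it.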
... Conventional ORE systems are largely based on syntactic patterns and heuristic rules that depend on external tools of natural language processing (e.g., PoS-taggers) and language-specific relation formations. For example, ReVerb (Fader, Soderland, and Etzioni 2011), ClausIE (Corro and Gemulla 2013), OpenIE4 (Mausam 2016) for English and CORE (Tseng et al. 2014), ZORE (Qiu and Zhang 2014) for Chinese, leverage external tools to obtain part-of-speech tags or dependency features and generate syntactic patterns to extract relational facts. These pattern-based approaches cannot handle the complexity and diversity of languages well, and the extraction is usually far from satisfactory. ...
... The benchmark ORE datasets in English (En) and Chinese (Zh) include OpenIE4 En (Mausam 2016), LSOIE-wiki En, LSOIE-sci En (Solawetz and Larson 2021), COER Zh (Tseng et al. 2014), and SAOKE Zh (Sun et al. 2018), whose contexts are complex or multiple sentences. Nevertheless, the five datasets do not contain tuples with non-existent relations. ...
Open relation extraction (ORE) aims to assign semantic relationships among arguments, essential to the automatic construction of knowledge graphs (KG). The previous ORE methods and some benchmark datasets consider a relation between two arguments as definitely existing and in a simple single-span form, neglecting possible non-existent relationships and flexible, expressive multi-span relations. However, detecting non-existent relations is necessary for a pipelined information extraction system (first performing named entity recognition then relation extraction), and multi-span relationships contribute to the diversity of connections in KGs. To fulfill the practical demands of ORE, we design a novel Query-based Multi-head Open Relation Extractor (QuORE) to extract single/multi-span relations and detect non-existent relationships effectively. Moreover, we re-construct some public datasets covering English and Chinese to derive augmented and multi-span relation tuples. Extensive experiment results show that our method outperforms the state-of-the-art ORE model LOREM in the extraction of existing single/multi-span relations and the overall performances on four datasets with non-existent relationships.
... Numerous methods have been proposed since open IE was first introduced by Banko (Banko et al. 2007) [2], ReVerb (Fader et al. 2011; Etzioni et al. 2011) [3], and OLLIE (Schmitz et al. 2012) [4]. In the field of Chinese open IE, Tseng et al. (2014) [6] proposed CORE based on syntactic analysis, and Qiu et al. (2014) [7] proposed ZORE based on syntax and relational propagation algorithms. The above methods have achieved relatively good results, but there are still several limitations. ...
... Besides, the lack of Chinese training corpus makes it even harder. Tseng et al. (2014) [6] proposed the CORE, which is the first attempt in the field of Chinese open IE. Given a Chinese text as input, CORE uses word segmentation, POS tagging, and syntactic analysis to automatically tag Chinese sentences to complete the extraction of entity-relational triples. ...
Open information extraction (IE) can support knowledge graph enrichment. Open IE systems are capable of extracting relational tuples from texts without the need for a pre-specified vocabulary. There has been more research on open IE in English than in Chinese, and most of it relies on word segmentation and syntactic analysis tools, which have a great influence on the results. Besides, the lack of annotated Chinese corpora also makes it difficult to classify triples in a supervised manner. To address these problems, we propose an unsupervised Chinese open IE model, named graph augmentation model (GAM). It first uses the knowledge graph to obtain linked entities and types of entities, where the linked entities can benefit the word segmentation accuracy and the entity types can help obtain the domain and range of relations for knowledge graph schema completion. Then it uses manually set rules to obtain candidate triples and uses a designed graph-based algorithm to iteratively calculate the importance and accuracy of triples. Experiments demonstrate that our method outperforms existing baseline methods. Specifically, GAM is shown to effectively extract the domain and range of relations, which other methods cannot. GAM achieves high accuracy of triples above a certain threshold, and the triples obtained show benefits in enriching a knowledge graph without the need for data annotation.
... We also experiment with two pattern matching methods. Chinese Open Relation Extraction (CORE) [20] is a system designed for extracting entity-relation triples from text sequences based on a series of NLP techniques, including word segmentation, POS tagging, syntactic parsing, and rules extraction. Another method is an unsupervised OIE model based on Dependency Semantic Normal Forms (DSNF) [5]. ...
Open Information Extraction (OIE) is a task of generating structured representations of information from natural language sentences. In recent years, many works have trained an end-to-end OIE extractor based on a Sequence-to-Sequence (Seq2Seq) model and applied the REINFORCE algorithm to update the model. However, the model performance often suffers from a large training variance and limited exploration. This paper introduces a reinforcement learning framework that enables an Advantage Actor-Critic (AAC) algorithm to update the Seq2Seq model with samples from a novel Confidence Exploration (CE). The AAC algorithm reduces the training variance with a fine-grained evaluation of each individual word. The confidence exploration provides effective training samples by exploring the word at key positions. Empirical evaluations demonstrate the leading performance of our Advantage Actor-Critic algorithm and Confidence Exploration over other comparison methods.
... At present, research on relation extraction in Chinese mainly focuses on the open domain, and methods of relation extraction for Chinese EMRs are still at a preliminary stage. A pipeline of NLP techniques, i.e., word segmentation, POS-tagging, and syntactic parsing, was employed [15] to extract entity relations for an open domain. This system was considered as the first attempt to handle Chinese open relation extraction. ...
The Electronic Medical Record (EMR) contains a great deal of medical knowledge related to patients, which has been widely used in the construction of medical knowledge graphs. Previous studies mainly focus on the features based on surface semantics of EMRs for relation extraction, such as contextual feature, but the features of sentence structure in Chinese EMRs have been neglected. In this paper, a fusion dependency parsing-based relation extraction method is proposed. Specifically, this paper extends basic features with medical record feature and indicator feature that are applicable to Chinese EMRs. Furthermore, dependency syntactic features are introduced to analyse the dependency structure of sentences. Finally, the F1 value of relation extraction based on extended features is 4.87% higher than that of relation extraction based on basic features. And compared with the former, the F1 value of relation extraction based on fusion dependency parsing is increased by 4.39%. The results of experiments performed on a Chinese EMR data set show that the extended features and dependency parsing all contribute to the relation extraction.
... We compare two traditional OIE models based on pattern matching techniques. The first model CORE [29] is a system that selects entity-relation triples by matching a series of intermediate NLP components, including word segmentation, syntactic parsing, and rules extraction. The second model [12] builds an unsupervised OIE extractor based on Dependency Semantic Normal Forms (DSNF). ...
... In the recent study (Khairova et al., 2017), densities of simple and complex facts as features to measure the quality of articles in Russian Wikipedia were considered. The study (Yuen-Hsien Tseng et al., 2014) presents the first Chinese Open IE system that is able to extract entity-relation triples from Chinese free texts. ...
Open Information Extraction (OIE) is a modern strategy to extract triplets of facts from Web-document collections. However, most current OIE approaches are based on NLP techniques such as POS tagging and dependency parsing, for which tools are not available in all languages. In this paper, we suggest a logical-linguistic model whose basic mathematical means are logical-algebraic equations of finite predicates algebra. These equations allow expressing the semantic role of each participant of a fact triplet (Subject-Predicate-Object) through the relations of grammatical characteristics of words in the sentence. We propose a model that extracts an unlimited, domain-independent number of facts from sentences of different languages. The use of our model allows extracting facts from unstructured texts without requiring a pre-specified vocabulary, by identifying relations in phrases and associated arguments in arbitrary sentences of the English, Kazakh, and Russian languages. We evaluate our approach on corpora of three languages based on English and Kazakh bilingual news websites. We achieve a precision of fact extraction of over 87% for the English corpus, over 82% for the Russian corpus and 71% for the Kazakh corpus.
... Finally, regarding other languages, such as Chinese, German and Vietnamese, for example, some methods have been recently proposed in the literature. In the Chinese language, the methods CORE [60] and ZORE [61] use a shallow parser with a set of syntactic constraints to perform extractions. It is worth noting that CORE, according to the authors, was the first Open IE system for the Chinese language. ...
The number of documents published on the Web in languages other than English grows every year. As a consequence, the need to extract useful information from different languages increases, highlighting the importance of research into Open Information Extraction (OIE) techniques. Different OIE methods have dealt with features from a unique language; however, few approaches tackle multilingual aspects. In those approaches, multilingualism is restricted to processing text in different languages, rather than exploring cross-linguistic resources, which results in low precision due to the use of general rules. Multilingual methods have been applied to numerous problems in Natural Language Processing, achieving satisfactory results and demonstrating that knowledge acquisition for a language can be transferred to other languages to improve the quality of the facts extracted. We argue that a multilingual approach can enhance OIE methods as it is ideal to evaluate and compare OIE systems, and therefore can be applied to the collected facts. In this work, we discuss how knowledge transfer between languages can increase acquisition from multilingual approaches. We provide a roadmap of the Multilingual Open IE area concerning state-of-the-art studies. Additionally, we evaluate the transfer of knowledge to improve the quality of the facts extracted in each language. Moreover, we discuss the importance of a parallel corpus to evaluate and compare multilingual systems.
... Second, unreasonable clustering results or inaccurate relation extractions may be uncontrolled and unmeasured. Tseng et al. (2014) employed a pipeline of a series of NLP techniques, i.e., word segmentation, POS-tagging, and syntactic parsing, to extract entity relations for an open domain. Their system was considered as the first attempt to handle Chinese ORE. ...
Named entity relation extraction is an important subject in the field of information extraction. Although many English extractors have achieved reasonable performance, an effective system for Chinese relation extraction remains undeveloped due to the lack of Chinese annotation corpora and the specificity of Chinese linguistics. Here, we summarize three kinds of unique but common phenomena in Chinese linguistics. In this article, we investigate unsupervised linguistics-based Chinese open relation extraction (ORE), which can automatically discover arbitrary relations without any manually labeled datasets, and research the establishment of a large-scale corpus. By mapping the entity relations into dependency trees and considering the unique Chinese linguistic characteristics, we propose a novel unsupervised Chinese ORE model based on Dependency Semantic Normal Forms (DSNFs). This model imposes no restrictions on the relative positions among entities and relationships and achieves a high yield by extracting relations mediated by verbs or nouns and processing the parallel clauses. Empirical results from our model demonstrate the effectiveness of this method, which obtains stable performance on four heterogeneous datasets and achieves better precision and recall in comparison with several Chinese ORE systems. Furthermore, a large-scale knowledge base of entities and relations, called COER, is established and published by applying our method to web text, which addresses the lack of Chinese corpora.