Improving Extraction of Chinese Open Relations Using Pre-trained Language Model and Knowledge Enhancement
Chaojie Wen, Xudong Jia, Tao Chen
Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, Guangdong, China.
Abstract
Open Relation Extraction (ORE) is the task of extracting semantic relations from a text document. Current ORE systems have significantly improved their efficiency in obtaining Chinese relations compared with conventional systems, which depend heavily on feature engineering or syntactic parsing. However, these ORE systems do not use robust neural networks such as pre-trained language models to take advantage of large-scale unstructured data effectively. In response to this issue, a new system entitled Chinese Open Relation Extraction with Knowledge Enhancement (CORE-KE) is presented in this paper. The CORE-KE system employs a pre-trained language model (with the support of a Bidirectional Long Short-Term Memory (BiLSTM) layer and a Masked Conditional Random Field (Masked CRF) layer) on unstructured data in order to improve Chinese open relation extraction. Entity descriptions in Wikidata and additional knowledge (in terms of triple facts) extracted from Chinese ORE datasets are used to fine-tune the pre-trained language model. In addition, syntactic features are adopted in the training stage of the CORE-KE system for knowledge enhancement. Experimental results of the CORE-KE system on two large-scale datasets of open Chinese entities and relations demonstrate that the CORE-KE system is superior to other ORE systems. The F1-scores of the CORE-KE system on the two datasets show relative improvements of 20.1% and 1.3%, respectively, when compared with benchmark ORE systems. The source code is available at https://github.com/cjwen15/CORE-KE.
Corresponding author
Email addresses: klaywen15@163.com (Chaojie Wen), xudong.jia@csun.edu (Xudong Jia), chentao1999@gmail.com (Tao Chen)
Preprint submitted to Data Intelligence, May 12, 2023
© 2023 Chinese Academy of Sciences. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
https://doi.org/10.1162/dint_a_00227
Keywords: Chinese open relation extraction, Pre-trained language model,
Knowledge enhancement
1. Introduction
A well-written text document, no matter whether it is in English or Chinese, often consists of sentences whose entities are linked through semantic relations (for example, an "employment" relation between "person" and "company", a "has" relation between "product" and "feature", and an "is a" relation between two "concepts"). Relation Extraction (RE) is the task of extracting semantic relations from a given text document. As an important step in Information Extraction (IE), relation extraction, after Named Entity Recognition (NER), identifies semantic relations in each sentence of a text document.
Open relation extraction (ORE), different from conventional relation extraction, does not require pre-defined relation types; it automatically discovers possible relations of interest from a text corpus without any human involvement [1]. Given the sentence [在欧洲杯的精彩演出后,鲁尼加盟了曼联 (After giving a wonderful show in the European Cup, Rooney joined Manchester United)] (as shown in Figure 1), a triple (鲁尼 Rooney, 加盟 joined, 曼联 Manchester United) can be extracted by an ORE system, where the relation 加盟 (joined) is not defined in advance in the ORE system.
There are several English-oriented ORE systems, such as TextRunner [2], ReVerb [3], and OpenIE6 [4]. These systems, which use morphological features, usually perform well on English corpora but give poor results on Chinese texts [5]. Understanding this incompatibility, one group of researchers recently has paid attention to studies on Chinese ORE, using external syntactic or semantic knowledge to manually design extraction rules and extract open semantic relations from Chinese texts [5, 6]. At the same time, another group of
researchers have developed Chinese ORE methods/systems to extract semantic relations through relatively simple supervised neural networks [7, 8]. These Chinese ORE methods/systems, however, do not use robust neural networks such as Pre-trained Language Models (PLM) to effectively extract open relations from large-scale unlabeled data.
[Figure 1 content: the input sentence and its English translation, the character-level BIO tags produced by Chinese ORE (B-E1/I-E1 for 鲁尼, B-R/I-R for 加盟, B-E2/I-E2 for 曼联, O elsewhere), and the predicted triple (鲁尼, 加盟, 曼联) (Rooney, joined, Manchester United).]
Figure 1. An example of Chinese ORE.
In this paper, a new ORE system, entitled the Chinese Open Relation Extraction with Knowledge Enhancement (CORE-KE) system, is presented. It strengthens the Bidirectional Long Short-Term Memory-Masked Conditional Random Field (BiLSTM-Masked CRF) layers on a PLM with external syntactic and semantic knowledge for Chinese open relation extraction. First, Wikidata1 (a free and open knowledge base that can be read and edited by both human beings and machines) and the Dependency Semantic Normal Forms (DSNFs) tools developed by Jia et al. [9] are adopted in this study to obtain an additional descriptive and triplet knowledge corpus from Chinese ORE datasets. The enhanced knowledge base is further used to pre-train and fine-tune the PLM within the CORE-KE system. Second, the Language Technology Platform (LTP) [10] is used to extract syntactic features, and these features are prepared as inputs for the PLM. Third, the BiLSTM layer and the Masked CRF layer [11] are further integrated into the PLM for sequence labeling and Chinese ORE.
Chinese ORE problems (as shown in Figure 1) are delimited in the CORE-KE system as problems which can be solved by a sequence labeling task.
1. https://www.wikidata.org/wiki/Wikidata:Main_Page
Given any sentence $s = [c_1, \ldots, c_N]$ with $N$ Chinese characters, the CORE-KE system first assigns each character a BIO tag and then uses the BIO tagging scheme to extract a knowledge triple (E1, R, E2) from the system's labeling result. E1, R, and E2, representing the head entity, relation, and tail entity, respectively, each consist of a continuous character segment $[c_i, \ldots, c_j]$, where $1 \le i \le j \le N$, of the given sentence $s$. Using the sequence labeling technique and the PLM model, the CORE-KE system predicts relations from any given Chinese sentence.
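To make the labeling scheme concrete, the following is a minimal sketch (in Python, not taken from the CORE-KE codebase) of how one (E1, R, E2) triple can be decoded from a character sequence and its BIO tags:

```python
def decode_triple(chars, tags):
    """Decode one (E1, R, E2) triple from characters and BIO tags.

    `tags` uses the scheme described above: B-E1/I-E1, B-R/I-R,
    B-E2/I-E2, and O. Returns a dict mapping E1, R, E2 to the
    concatenated character segments.
    """
    spans = {"E1": "", "R": "", "E2": ""}
    current = None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            spans[current] += ch
        elif tag.startswith("I-") and current == tag[2:]:
            spans[current] += ch
        else:
            current = None
    return spans

# Simplified example from Figure 1:
chars = list("鲁尼加盟了曼联")
tags = ["B-E1", "I-E1", "B-R", "I-R", "O", "B-E2", "I-E2"]
print(decode_triple(chars, tags))
# {'E1': '鲁尼', 'R': '加盟', 'E2': '曼联'}
```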
The CORE-KE system provides an effective tool for addressing challenges such as fine-grained entity acquisition, long-tail entity mining, and taxonomy evolution in the fields of information retrieval, knowledge graph completion, and intelligent question answering [12, 13]. Entities in a sentence are linked to Wikidata and expanded external knowledge (extracted from Chinese open relation datasets). The linked entities are further combined with their original sentence to form an input to the PLM of the CORE-KE system for model training. In doing so, the CORE-KE system can acquire fine-grained entities from Wikipedia and other knowledge sources. Additionally, the PLM in the CORE-KE system helps train the open relation extraction process with large-scale unstructured texts containing many long-tail words. Furthermore, the CORE-KE system is robust: it re-trains its ORE process on new data at low cost and can evolve over time.
The main contributions of our work are summarized below:
1) We propose a Chinese ORE system, CORE-KE, with the support of a pre-trained language model and manifold external knowledge. We have also published the code of the system for researchers to reproduce the experiments in this paper. To the best of our knowledge, this is one of the few open-source Chinese ORE systems.
2) Experimental results demonstrate that the CORE-KE system can effectively alleviate fine-grained entity acquisition, which is a challenge in Chinese ORE.
3) Experimental results show that the CORE-KE system performs well in
Chinese open relation extraction, giving relative improvements of 20.1% and 1.3% in F1-score when compared with two state-of-the-art ORE systems on the COER dataset and the SpanSAOKE/SpanSAOKE-NO dataset, respectively.
The rest of this paper is organized as follows: Section 2 describes previous research related to Chinese open relation extraction. With a good understanding of the research in Chinese ORE, we present our CORE-KE system in Section 3. Section 4 evaluates the CORE-KE system from the view of its performance in open relation extraction on the COER and SpanSAOKE-NO datasets. Furthermore, this section compares the CORE-KE system against several benchmark Chinese ORE systems. The paper is concluded in Section 5 by summarizing the contributions of the research work and outlining future research directions.
2. Related Work
As an important subtask of information extraction and knowledge acquisition, open relation extraction has attracted many researchers' attention in recent years. Mainstream ORE systems can be divided into 1) unsupervised and rule-based systems, 2) supervised and statistical systems, and 3) supervised neural systems.
Unsupervised and rule-based systems mainly apply syntax features to designed syntactic constraints or paradigms in order to extract relationships between entities. Typical systems include TextRunner [2], ReVerb [3], SRLIE [14], ClausIE [15], RelNoun [16], PropS [17], OpenIE4 [18], MinIE [19], Graphene [20], and CALMIE [21]. For example, researchers in the ReVerb system extracted relations in the form of (arg1, relation phrase, arg2) by 1) articulating two simple but powerful constraints (that is, a syntactic constraint and a lexical constraint) and 2) expressing relation phrases via verbs in English sentences. The ReVerb system takes English sentences as inputs, identifies candidate pairs of noun phrase (NP) arguments (arg1, arg2) from sentences, employs the ReVerb extractor to label each word of any two NP arguments as part of a
potential relation phrase, and extracts relations. The ClausIE system explored the linguistic knowledge of English grammar and mapped dependency relations of English sentences to clause constituents. Since the ClausIE system relies on dependency parsing and a small set of domain-independent lexica, it could lead to over-specification in arguments. To solve this problem, researchers in the MinIE system extracted open relations with semantic annotations, identified and removed over-specific parts from its relation extraction process, and enhanced the performance of open relation extraction. It is noted that the MinIE system, constrained by its single annotation type, cannot extract adjectives associated with a noun (for example, assistant director). In summary, unsupervised and rule-based ORE systems can be affected by the implicit error propagation of the tools being used, since they require manually constructed rules and depend on syntactic outputs of NLP tools, which often leads to inefficiency or proneness to errors in the ORE process [22].
The ORE systems supported by supervised and statistical techniques construct open pattern templates over a large training set and then apply these templates to extract open relations. Typical ORE systems include OLLIE [23], Stanford [24], BONIE [25], RSNs [26], and DeepKE [27]. For example, the OLLIE system uses a set of "seed" tuples with high precision from the ReVerb method to bootstrap a large training set and builds open pattern templates. These templates are then applied to individual sentences for ORE. Similarly, the BONIE system also creates "seed" tuples or facts, adds the facts to a training dataset in a bootstrapping process, and develops patterns through dependency parses. The BONIE system constructs numerical tuples by pattern matching and parse-based learning. In summary, ORE systems supported by supervised and statistical techniques depend heavily on learning pattern templates. As training datasets vary, pattern templates may not be constructed with high quality and diversity. As a result, relations may not be extracted with good performance.
In recognizing the drawbacks of the above ORE systems, a few researchers have developed another type of ORE system using deep learning techniques (or
neural networks). Syntactic and semantic features of texts can be captured automatically through NLP tasks. These deep learning ORE systems are mainly considered supervised systems supported by labeling-based, generation-based, and span-based techniques [4].
The labeling-based ORE systems produce a sequence of word labels to mark relationships between words and entities. For example, the SenseOIE system [28] identifies words through syntactic heads of relations and labels each head word for the extraction of semantic relations. It is noted that the labeling-based ORE systems cannot extract relations from sentences where entities and relations overlap.
The generation-based ORE systems, including Seq2Seq [29] and IMoJIE [30], use sequence-to-sequence approaches to generate extractions sequentially. For example, the Seq2Seq system treats open relation extraction as a sequence-to-sequence generation problem, where input sequences are sentences, while output sequences are relation tuples with special placeholders. The Seq2Seq system uses an encoder-decoder framework to obtain relation tuples from large-scale unstructured texts. It does not rely on hand-crafted semantic patterns and rules for open relation extraction. Instead, it bootstraps a large volume of high-quality examples (from state-of-the-art Open IE systems) into the model's training process. The IMoJIE system extends the Copy-Attention of the Seq2Seq system and creates additional open relations from extracted tuples. The generation-based systems, in summary, produce new facts or relation triples from previous triples through an iterative process. When false triples are generated from previous triples, a ripple effect may exist within the iterative process. As a result, wrong extraction of open relations may occur.
The RnnOIE [31] and SpanOIE [32] systems are examples of span-based ORE systems in which any token subsequence (or span) constitutes a potential entity, while a relation can hold between any pair of spans [33]. The RnnOIE system formulates open relation extraction as a sequence tagging problem and applies the BiLSTM model for training its sequence labeling process. This system addresses several task-specific challenges, including the BIO encoding of
predicates with multiple extractions and confidence estimation. The SpanOIE system improves the RnnOIE system by having two modules, with the first module finding predicate spans in a sentence and the second module combining the found predicate spans and their associated sentences. The SpanOIE system forms combined segments as input and outputs argument spans for open relation extraction. The span-based ORE systems may extract spans that construct unrealistic entity relations, which can lead to wrong results. For example, given the sentence [他相信湖人队会获得冠军 (He believes the Lakers will win the championship)], the relation extracted by a span-based ORE system could be (湖人队 the Lakers, 获得 win, 冠军 the championship), which is not the true implication of the original sentence.
English has been the primary language in open relation extraction research. Most of the above-mentioned ORE systems use English syntactic features. They cannot be directly applied to the Chinese context. With a good understanding of the limitations of the existing ORE systems, researchers in CORE [5] explored unsupervised and rule-based methods for Chinese open relation extraction. The CORE system employs word segmentation, part-of-speech (POS) tagging, syntactic parsing, and other NLP techniques to automatically annotate Chinese sentences. In doing so, input sentences are chunked and entity-relation triples are extracted. The ZORE [6] system is a supervised and statistical method. It first identifies relation candidates from automatically parsed dependency trees and then extracts relations iteratively through a novel double propagation algorithm. It is noted that this system has a logistic regression classifier which assigns features with weights and trains relation triples on the Wiki-500 dataset. In addition, this system gives a confidence score to each extracted relation. It thus reduces false relations significantly in the tuple-generation process and improves the performance of ORE.
Researchers in HNN4ORT [34], PGCORE [8], NPORE [7], DBMCSS [35], and MGD-GNN [36] have used supervised neural networks for Chinese ORE. The HNN4ORT system (a labeling-based system) employs the Ordered Neurons LSTM model [37] to encode syntactic information and capture associations
among arguments and relations. The NPORE system implements a graph clique mining algorithm to chunk Chinese noun phrases into modifiers and headwords, generate candidate relation triples, and extract Chinese open semantic relations. The PGCORE system treats relation extraction as a text summarization task. It uses a sequence-to-sequence framework and the Pointer-Generator mechanism [38] to solve the ORE problem. The DBMCSS and MGD-GNN systems apply the span-based ORE method in the Chinese context. They extract named entity spans, filter entity pairs, and extract Chinese open relations. The MGD-GNN model constructs a multi-grained dependency graph to incorporate dependency and word boundary information and employs a Graph Neural Network (GNN) to get node representations for predicate and argument predictions.
The CORE-KE system presented in this paper implements a supervised neural ORE approach. It uses syntactic and semantic features, such as POS tagging, dependency parsing, and large-scale external knowledge, to improve the performance of Chinese open relation extraction on unstructured texts.
3. Methodology
In this section, the Chinese Open Relation Extraction with Knowledge Enhancement (CORE-KE) system powered by a pre-trained language model is presented. Knowledge enhancement in this system refers to understanding concepts (or entities in a Chinese sentence) through description information of the concepts or other relevant knowledge (including dependency relations). An overview of the CORE-KE system is shown in Figure 2. The CORE-KE system (which implements the WoBERT_plus + BiLSTM + Masked CRF model with knowledge enhancement) has four modules/models:
1) the Descriptive Knowledge Acquisition Module. This module is designed to obtain description information of entities in the training set from Wikidata. The description information is used to pre-train the WoBERT_plus + BiLSTM + Masked CRF model (or the WBM model).
2) the Triplet Knowledge Generation Module. This module applies the DSNFs
tools to generate extra dependency relations for Chinese entities. The dependency relations are further used to fine-tune the WBM model (which has already been pre-trained by the Descriptive Knowledge Acquisition module).
3) the Syntactic Feature Generation Module. This module introduces the LTP tool to generate a POS tag and a dependency parsing tag for each character in Chinese sentences.
4) the WBM model. The WBM model is integrated with a PLM. It combines characters (in Chinese sentences) with POS tags and dependency parsing tags and uses the combined sequences as inputs. It produces the characters (attached with their BIO tags) in which predicted relations (in the format of triples) are embedded.
[Figure 2 content: the training sentence with its pos_tag and dp_tag lines, entity descriptions (Step 1: further pre-training), and DSNF-generated triplets (Step 2: fine-tuning) feed the WoBERT_plus Pre-trained Language Model; its outputs pass through a BiLSTM layer and a Masked CRF layer (Step 3) to produce BIO tags (B-E1, I-E1, B-R, I-R, B-E2, I-E2, O) and Chinese knowledge triples. Together these layers form the WBM model.]
Figure 2. The architecture of the CORE-KE system. In step 1 (the Descriptive Knowledge Acquisition module), the descriptions for entities in a sentence s from a Chinese open relation extraction dataset are obtained from Wikidata and concatenated with s to further pre-train the WoBERT_plus PLM. In step 2 (the Triplet Knowledge Generation module), extra dependency relation triplets of the sentence s are generated by the DSNFs tools. These triplets are used to fine-tune the WoBERT_plus PLM. In step 3 (the Syntactic Feature Generation module), syntactic feature tags for each character in the sentence s are generated and concatenated with the sentence s to train the WBM model.
3.1. Descriptive Knowledge Acquisition Module
Incorporating individual concepts (or sentences) with various descriptive and supportive pieces of knowledge can improve the ability to understand concepts [39]. Acquiring and understanding descriptive knowledge is therefore a vital step in enhancing a model's cognition of entities [40]. In the CORE-KE system, the Descriptive Knowledge Acquisition module has a set of utilities to help find descriptions for each entity in Chinese sentences and concatenate the found descriptions with their associated entities. Take the sentence s [在欧洲杯的精彩演出后,鲁尼加盟了曼联 (After giving a wonderful show in the European
Cup, Rooney joined Manchester United)] as an example: 鲁尼 (Rooney) and 曼联 (Manchester United) are two entities. The descriptions of these two entities are extracted from Wikidata. MediaWiki APIs2 and Wikipedia tools3 are used in the CORE-KE system to get the entity descriptions. For the entity 鲁尼 (Rooney), the description is 韦恩·马克·鲁尼,已退役的英格兰职业足球运动员,英格兰足坛巨星之一 (Wayne Mark Rooney, a retired English professional soccer player, is one of the superstars in English soccer) [DesA]. For the entity 曼联 (Manchester United), the description is 曼彻斯特联足球俱乐部,简称曼联,是一家位于英格兰曼彻斯特的足球俱乐部,目前比赛于英格兰超级联赛,球队主场是老特拉福德球场 (Manchester United Soccer Club, abbreviated as Manchester United, is a soccer club located in Manchester, England, currently playing in the English Premier League; the team's home is the Old Trafford stadium) [DesB]. The descriptions of the two entities are then concatenated to the original sentence s, that is,

$$Input = s \oplus Des_A \oplus Des_B \qquad (1)$$

where $Input$ refers to an input instance or sequence used to further train the pre-trained language model, $\oplus$ denotes the concatenation of any two strings, $s$ refers to the original sentence, and $Des_A$ and $Des_B$ refer to the descriptions of the first and second entities (entity A and entity B), respectively.
The CORE-KE system also has utilities to clean up entity descriptions. These utilities unify character encodings, change double-byte characters to single-byte characters, and remove noise characters (such as HTML tags and stop words). Additionally, the CORE-KE system sets the description of a Chinese entity to null if the entity cannot be found in Wikidata.
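As an illustration of this module's behavior (and not the system's actual utilities), the sketch below queries the public Wikidata web API with requests and assembles the input of Equation (1); the parameter choices and the empty-string fallback for unmatched entities are assumptions:

```python
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def wikidata_description(entity: str, lang: str = "zh") -> str:
    """Look up `entity` on Wikidata; return its description, or "" if absent."""
    # 1) Search for the entity to obtain a Wikidata ID (QID).
    search = requests.get(WIKIDATA_API, params={
        "action": "wbsearchentities", "search": entity,
        "language": lang, "format": "json"}).json()
    if not search.get("search"):
        return ""  # entity not found: description set to null/empty
    qid = search["search"][0]["id"]
    # 2) Fetch the description for that ID.
    entities = requests.get(WIKIDATA_API, params={
        "action": "wbgetentities", "ids": qid, "props": "descriptions",
        "languages": lang, "format": "json"}).json()
    desc = entities["entities"][qid].get("descriptions", {})
    return desc.get(lang, {}).get("value", "")

# Build the training input of Equation (1): Input = s ⊕ DesA ⊕ DesB.
s = "在欧洲杯的精彩演出后,鲁尼加盟了曼联"
model_input = s + wikidata_description("鲁尼") + wikidata_description("曼联")
```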
3.2. Triplet Knowledge Generation Module
The Triplet Knowledge Generation module implements the Dependency Semantic Normal Forms (DSNFs) tools to obtain dependency relations
2. https://www.mediawiki.org/wiki/MediaWiki
3. https://github.com/siznax/wptools
for Chinese entities in the training set. Through the dependency semantic normal forms, the CORE-KE system generalizes syntactic and semantic abstractions of relations and structures them with their associated words, POS tags, dependency paths, and labels of dependency paths. There are a total of seven dependency semantic normal forms used in the CORE-KE system in support of four relation structures: the modified structure, verbal structure, coordination structure, and formulaic structure [9]. With these DSNFs, the CORE-KE system addresses three kinds of unique but ubiquitous Chinese linguistic phenomena, that is, the Nominal Modification-Center (NMC) phenomenon, the Chinese Light Verb Construction (CLVC) phenomenon, and the Intransitive Verb (IV) phenomenon.
The CORE-KE system further maps entity relations into dependency trees and gathers a series of paradigms for relation extraction. The DSNF-based knowledge generation involves the following tasks: 1) pre-process input sentences with word segmentation, POS tagging, and dependency parsing; 2) create candidate entities for each subject sentence by using the LTP tool and the Iterated Heuristic Algorithm; 3) pair candidate entities and classify them into the seven DSNFs. Taking advantage of these generated DSNFs, the CORE-KE system extracts dependency relations (in the form of triples) from two large-scale unstructured Chinese ORE datasets for the WBM model.
Given the input sentence s in Figure 1 as an example, the dependency relations [鲁尼 (Rooney), 加盟 (joined), 曼联 (Manchester United)] and [鲁尼 (Rooney), 精彩演出 (giving a wonderful show), 欧洲杯 (European Cup)] are generated by the CORE-KE system using the DSNFs tools. The first dependency relation [鲁尼 (Rooney), 加盟 (joined), 曼联 (Manchester United)] can also be found in the training dataset; however, the second dependency relation [鲁尼 (Rooney), 精彩演出 (giving a wonderful show), 欧洲杯 (European Cup)] does not exist in the training dataset. From the view of the CORE-KE model, the second dependency is an extra one.
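The seven normal-form patterns themselves are too involved for a short listing, but the three-task skeleton above can be sketched as follows; `parse_with_ltp` and `match_dsnf` are hypothetical stand-ins for the LTP calls and the DSNF pattern matching, and the noun-like candidate filter is an assumption:

```python
from dataclasses import dataclass

@dataclass
class Token:
    word: str      # segmented word
    pos: str       # POS tag from LTP
    head: int      # index of dependency head (0 = root)
    deprel: str    # dependency relation label, e.g. "SBV", "VOB"

def extract_dsnf_triples(sentence, parse_with_ltp, match_dsnf):
    # Task 1: pre-process with word segmentation, POS tagging, parsing.
    tokens = parse_with_ltp(sentence)          # -> list[Token]
    # Task 2: create candidate entities (here a simple noun-like filter;
    # the paper uses the LTP tool plus an Iterated Heuristic Algorithm).
    candidates = [i for i, t in enumerate(tokens) if t.pos.startswith("n")]
    # Task 3: pair candidates and classify each pair against the seven DSNFs.
    triples = []
    for i in candidates:
        for j in candidates:
            if i >= j:
                continue
            triple = match_dsnf(tokens, i, j)  # None if no form matches
            if triple is not None:
                triples.append(triple)         # (head, relation, tail)
    return triples
```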
3.3. Syntactic Feature Generation Module
Considering that Chinese ORE is highly dependent on the quality of word segmentation and often suffers from the ambiguity of polysemic words [41], the CORE-KE system uses multiple syntactic features to alleviate the ambiguity of polysemy. After obtaining descriptive knowledge and triplet knowledge from the Descriptive Knowledge Acquisition module and the Triplet Knowledge Generation module, the CORE-KE system generates syntactic feature tags for characters in Chinese sentences. This module covers the process of generating syntactic feature tags, such as POS tags and dependency parsing tags. Figure 3 shows the types of syntactic features which are tagged in the CORE-KE system. Each character in a Chinese sentence belongs to one of 29 POS tag types. Numbers ranging from 0 to 28 are used to represent each tag type. For example, the number for the conjunction POS tag type is 2.
There are 15 dependency parsing tag types within the CORE-KE system. Each character in a Chinese sentence can be parsed with one of these dependency parsing tag types. Numbers ranging from 0 to 14 are used to represent each dependency parsing tag type. For example, the number for the verb-object dependency parsing tag type is 1.
Range of values for the POS tag:
0: adjective; 1: other noun-modifier; 2: conjunction; 3: adverb; 4: exclamation; 5: morpheme; 6: prefix; 7: idiom; 8: abbreviation; 9: suffix; 10: number; 11: general noun; 12: direction noun; 13: person name; 14: organization name; 15: location noun; 16: geographical name; 17: temporal noun; 18: other proper noun; 19: onomatopoeia; 20: preposition; 21: quantity; 22: pronoun; 23: auxiliary; 24: verb; 25: punctuation; 26: foreign words; 27: non-lexeme; 28: descriptive words.
Range of values for the dependency parsing tag:
0: subject-verb; 1: verb-object; 2: indirect object; 3: fronting object; 4: double; 5: attribute; 6: adverbial; 7: complement; 8: coordinate; 9: preposition-object; 10: left adjunct; 11: right adjunct; 12: independent structure; 13: head; 14: punctuation.
Figure 3. Syntactic features used in the CORE-KE system.
Figure 4 shows an example of how these POS and dependency parsing tags are used. The sentence s [在欧洲杯的精彩演出后,鲁尼加盟了曼联 (After giving a wonderful show in the European Cup, Rooney joined Manchester United)] is incorporated with the pos_tag and dp_tag lines. The POS tags in the pos_tag line and the dependency parsing tags in the dp_tag line are generated by the LTP tool. Chinese characters of the same word are assigned the same pos_tag and dp_tag. Take the Chinese word 鲁尼 (Rooney) as an example: its POS tag and dependency parsing tag are person name and subject-verb, respectively. The corresponding tag numbers are 13 and 0, respectively. Accordingly, the Chinese characters 鲁 and 尼 are both assigned the POS tag of 13 and the dependency parsing tag of 0.
Training sentences are further processed with a series of labels in the CORE-KE system. The BIO tagging scheme is used for sequence labeling. Each Chinese character is labeled as B-X, I-X, or O. B-X refers to the beginning of label type X (X refers to one of the three label categories: head entity (E1), tail entity
[Figure 4 content: four aligned rows for the example sentence — the characters of the sentence, the BIO labels (B-E1/I-E1 for 鲁尼, B-R/I-R for 加盟, B-E2/I-E2 for 曼联, O elsewhere), the pos_tag numbers, and the dp_tag numbers.]
Figure 4. A training example with syntactic features used in the CORE-KE system.
(E2), and relation (R) between E1 and E2). I-X refers to the middle of label type X, while O indicates that the Chinese character is not a part of an entity or relation. For example, the Label line in Figure 4 indicates that 鲁尼 (Rooney) is the head entity E1 in the sentence, 曼联 (Manchester United) is the tail entity E2, and 加盟 (joined) is the relation R between E1 and E2. The sentence s is concatenated with its labels, POS tags, and dependency parsing tags to form a knowledge-enhanced Chinese sentence for training in the WBM model.
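The word-to-character tag expansion described above can be sketched as follows; the segmented words and numeric tags are hard-coded here to mirror Figure 4 rather than produced by a live LTP call:

```python
def expand_to_characters(words, word_tags):
    """Spread word-level tags onto characters: every character of a word
    receives that word's tag, as in the pos_tag/dp_tag lines of Figure 4."""
    char_tags = []
    for word, tag in zip(words, word_tags):
        char_tags.extend([tag] * len(word))
    return char_tags

# Word segmentation and numeric tags as LTP would supply them
# (values for 鲁尼/加盟/了/曼联 follow Figure 4).
words    = ["鲁尼", "加盟", "了", "曼联"]
pos_tags = [13, 24, 23, 18]   # 13 = person name, 24 = verb, ...
dp_tags  = [0, 13, 11, 1]     # 0 = subject-verb, 1 = verb-object, ...

print(expand_to_characters(words, pos_tags))  # [13, 13, 24, 24, 23, 18, 18]
print(expand_to_characters(words, dp_tags))   # [0, 0, 13, 13, 11, 1, 1]
```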
3.4. WBM Model
The CORE-KE system incorporates descriptive knowledge and triplet knowledge to pre-train the WoBERT_plus pre-trained language model (see Steps 1 and 2 in Figure 2). After these two steps, the CORE-KE system implements the WBM model for relation extraction. Given a sentence s, the feature vector $V(c_i)$ for every Chinese character $c_i \in s$ in the CORE-KE system is:

$$V(c_i) = emb(c_i) \oplus emb(label(c_i)) \oplus emb(pos(c_i)) \oplus emb(dp(c_i)) \qquad (2)$$

where
$emb(c_i)$: the embedding of $c_i$;
$emb(label(c_i))$: the embedding of $c_i$'s ground-truth label;
$emb(pos(c_i))$: the embedding of $c_i$'s POS tag;
$emb(dp(c_i))$: the embedding of $c_i$'s dependency parsing tag;
$\oplus$: the concatenation of any two embeddings.
The $emb(c_i)$ and $emb(label(c_i))$ are pre-trained contextual embeddings encoded by the pre-trained language model, which together incorporate information from the character's context in the Chinese sentence s.
The $emb(pos(c_i))$ and $emb(dp(c_i))$, randomly initialized embeddings, incorporate information from POS tags and dependency parsing tags, respectively. These embeddings are encoded according to the following mechanism:
1) Matrices are used to map POS tags and dependency parsing tags. A $V_{pos} \times D$ matrix ($S_{pos}$) and a $V_{dp} \times D$ matrix ($S_{dp}$) are generated, where $V_{pos}$ and $V_{dp}$ denote the number of POS tag types and dependency parsing tag types, respectively (which are 29 and 15 in the CORE-KE system). $D$ governs the number of columns of matrix $S_{pos}$ and matrix $S_{dp}$. Assume $D$ is set to 10. Each element in $S_{pos}$ or $S_{dp}$ is a random number ranging from 0 to 10. According to the numbers of the POS tag and dependency parsing tag of a character, $emb(pos(c_i))$ and $emb(dp(c_i))$ are encoded as the corresponding rows of $S_{pos}$ and $S_{dp}$, respectively. For example, the Chinese characters 鲁 and 尼 are both assigned the POS tag of 13 and the dependency parsing tag of 0. The $emb(pos(c_i))$ and $emb(dp(c_i))$ of these two characters are encoded, in an embedded space, as the 13th row of $S_{pos}$ and the 0th row of $S_{dp}$.
Ten columns ($D = 10$) are selected in the CORE-KE system in consideration of the constraints of computing resources (storage and computing speed) used for model training. Testing showed that the matrices ($S_{pos}$ [29 × 10] and $S_{dp}$ [15 × 10]) are suitable for the WBM model.
2) The sentence s with syntactic features (embedded into the feature vector $V(s)$) is fed into the WoBERT_plus pre-trained language model, and the contextualized output embeddings are then computed. The WoBERT_plus4 model is a variant of the WoBERT model, which improves how BERT [42] deals with Chinese texts. Typically, the performance of the WoBERT model is similar to that of the BERT model, but it is 1.16, 1.22, and 1.28 times the speed of the BERT model in dealing with Chinese texts of lengths of 128, 256, and 512 words, respectively5. The WoBERT_plus model improves on the RoBERTa-wwm-ext model [43] by training on a much larger corpus for 250,000 steps. Additionally, the WoBERT_plus model increases the vocabulary of its dictionary compared with the WoBERT and BERT models. Also, the WoBERT_plus model, similar to the RoBERTa-wwm-ext model, utilizes the whole word masking strategy in various pre-training tasks and thus mitigates the drawbacks of masking partial word-piece tokens in the BERT model.
4. https://github.com/ZhuiyiTechnology/WoBERT
5. https://kexue.fm/archives/7758
In the CORE-KE system, the output of the WoBERT_plus model is a sequence of character embeddings. This sequence is further fed into the BiLSTM layer to integrate useful information for sequence labeling. The output of the BiLSTM layer is a predicted score vector for each character. The predicted scores are later referred to as logits. Each logit in the sequence is further considered the emission score for its corresponding character. The emission scores are then used in the Masked CRF layer for computing the scores of possible sequences of BIO tags.
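A minimal PyTorch sketch of this encoder stack (PLM → BiLSTM → per-character emission scores) follows; a plain embedding layer stands in for WoBERT_plus so the sketch runs stand-alone, and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class WBMEncoder(nn.Module):
    """PLM -> BiLSTM -> per-character emission scores (logits).

    `plm` stands in for WoBERT_plus (here a plain embedding layer, so
    the sketch needs no model download). `num_tags` is 7 for the BIO
    scheme used above: B/I for each of E1, R, E2, plus O.
    """
    def __init__(self, vocab_size=21128, plm_dim=768, hidden=256, num_tags=7):
        super().__init__()
        self.plm = nn.Embedding(vocab_size, plm_dim)
        self.bilstm = nn.LSTM(plm_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.to_logits = nn.Linear(2 * hidden, num_tags)

    def forward(self, char_ids):              # (batch, seq_len)
        embeddings = self.plm(char_ids)       # (batch, seq_len, plm_dim)
        states, _ = self.bilstm(embeddings)   # (batch, seq_len, 2*hidden)
        return self.to_logits(states)         # emission scores per character

logits = WBMEncoder()(torch.randint(0, 21128, (1, 18)))  # shape (1, 18, 7)
```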
It is noted that each possible sequence of BIO tags is considered a potential path in the Masked CRF layer. For each possible path, the Masked CRF layer first introduces a transition matrix that models the transition scores from tag $i$ to tag $j$ between any two consecutive characters (positions $k$ and $k+1$, $1 \le k < N$) in the path, where $N$ denotes the number of characters in the sentence s.
Assume that a sequence of input characters is $x = \{x_1, \ldots, x_T\}$, a sequence of ground-truth BIO tags is $y = \{y_1, \ldots, y_T\}$, and a sequence of logits is $l = \{l_1, \ldots, l_T\}$ for sentence s. Also denote the number of distinct tags as $d$ and the set of tag indices as $[d] := \{1, \ldots, d\}$. We then have $y \in [d]^T$ and $l_i \in \mathbb{R}^d$, where $1 \le i \le T$. In addition, the transition matrix is denoted as $A = (a_{ij}) \in \mathbb{R}^{d \times d}$, where $a_{ij}$ is the transition score from tag $i$ to tag $j$, and $W$ denotes the set of all trainable weights in the encoder (the WoBERT_plus model and the BiLSTM layer). All transition scores in the transition matrix are randomly initialized. The purpose of the Masked CRF layer is to learn the tagging-scheme
constraints of the training set during the training process and to update the transition scores.
By aggregating the emission scores and the transition scores, the Masked CRF layer assigns a score to each possible path. Given the input x, the weights W, and the transition matrix A, the score of a path $p = \{n_1, \ldots, n_T\}$ in the Masked CRF layer can be computed as:

$$s(p, x, W, A) = \sum_{i=1}^{T} l_{i,n_i} + \sum_{i=1}^{T-1} a_{n_i,n_{i+1}} \qquad (3)$$

where $l_{i,j}$ denotes the $j$th entry of $l_i$.
eliminates the outcomes of illegal paths. An illegal path denotes a path that390
violates BIO scheme rules. An example rule is that any I-X tag must be preceded
by a B-X tag or another I-X tag of the same type (Xrefers to head entity (E1),
tail entity (E2) or relation (R)). For example, O O O B-E1I-R O is an illegal
path because the transition from B-E1to I-R violates the example rule.
The Masked CRF layer adds a mask to the state transition matrix in advance395
and sets a minimum value for the score of the position on the illegal path. In
doing so, the CORE-KE system does not select illegal paths during the iterative
training process. The system can ensure that the predicted path for ORE is a
legal path.
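A sketch of such a mask under the BIO rule stated above (start-of-sequence constraints are omitted for brevity; the tag inventory and the choice of minimum value are assumptions):

```python
import numpy as np

TAGS = ["O", "B-E1", "I-E1", "B-R", "I-R", "B-E2", "I-E2"]
NEG_INF = -1e9  # the "minimum value" assigned to illegal transitions

def build_transition_mask(tags=TAGS):
    """Mask illegal BIO transitions: I-X may only follow B-X or I-X
    of the same type X. All other transitions stay unmasked (0)."""
    d = len(tags)
    mask = np.zeros((d, d))
    for j, to_tag in enumerate(tags):
        if not to_tag.startswith("I-"):
            continue
        x = to_tag[2:]
        for i, from_tag in enumerate(tags):
            if from_tag not in (f"B-{x}", f"I-{x}"):
                mask[i, j] = NEG_INF  # e.g. B-E1 -> I-R is forbidden
    return mask

# Fold the mask into the randomly initialized transition matrix A.
A = np.random.randn(len(TAGS), len(TAGS)) + build_transition_mask()
```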
The loss function of the Masked CRF layer is described below:

$$L(W, A) = -\frac{1}{|M|} \sum_{(x,y) \in M} \log \frac{\exp s(x, y)}{\sum_{p \in P \setminus I} \exp s(p, x)} \qquad (4)$$

where the dependence of $s(\cdot, \cdot)$ on $(W, A)$ is omitted for the sake of conciseness, $M$ denotes the set of all training samples, $P$ the set of all possible paths, $I$ the set of illegal paths, and $P \setminus I$ the set of legal paths.
The loss function of the Masked CRF layer is the whole loss function of the CORE-KE system. Our system uses the Adam optimization algorithm (a popular first-order method) to minimize $L(W, A)$. Let $(W_{opt}, A_{opt})$ be the minimizer of the loss $L$. The predicted path $y_{opt}$ of a test sample $x_{test}$ is the
path having the highest score, that is,

$$y_{opt} = \arg\max_{p \in P \setminus I} s(p, x_{test}, W_{opt}, A_{opt}) \qquad (5)$$

In the CORE-KE system, the Viterbi algorithm is used to find the predicted path.
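A compact sketch of Viterbi decoding for Equation (5); if the transition matrix already carries the illegal-transition mask from the previous sketch, the decoded path is always a legal one:

```python
import numpy as np

def viterbi_decode(logits, transitions):
    """Return the highest-scoring tag path under Equation (3).

    `logits` is a (T, d) array of emission scores; `transitions` is
    (d, d) and is assumed to already include the illegal-transition
    mask, so masked transitions can never win the argmax.
    """
    T, d = logits.shape
    score = logits[0].copy()             # best score ending in each tag
    backpointers = []
    for t in range(1, T):
        # candidate[i, j] = best score ending at tag i, then moving to j
        candidate = score[:, None] + transitions + logits[t][None, :]
        backpointers.append(candidate.argmax(axis=0))
        score = candidate.max(axis=0)
    path = [int(score.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```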
The outputs of the CORE-KE system are the predicted results (in BIO format) which correspond to each character of the input Chinese sentences. In the training process, the loss between the predicted results and the ground truth is back-propagated to fine-tune the parameters of the CORE-KE model.
3.5. Training and Inferring Processes
The CORE-KE system follows the data flow shown in Figure 2 and trains the WBM model using the enhanced knowledge (that is, descriptive knowledge, triplet knowledge, and syntactic features) generated by the Descriptive Knowledge Acquisition Module, the Triplet Knowledge Generation Module, and the Syntactic Feature Generation Module. This section describes the training and inferring processes.
3.5.1. Training process
The CORE-KE system selects each sentence (or sentence s) from the training set and creates a BIO tag for each character in sentence s. The CORE-KE system also conducts the following tasks in training the WBM model:
1) Link each entity in s to Wikidata and retrieve the description related to each entity (as described in Section 3.1).
2) Further pre-train the WoBERT_plus pre-trained language model (PLM) by using sentence s and its descriptive knowledge.
3) Use the DSNFs tools to generate extra dependency relation triplets for sentence s (as described in Section 3.2).
4) Use these triplets to fine-tune the WoBERT_plus PLM pre-trained in Task 2. After fine-tuning, the new PLM is named the WoBERT_plus PLM-DT model.
5) Use the LTP tools to generate the syntactic feature tags (such as POS tags and dependency parsing tags) for each character in s (as described in Section 3.3).
6) Train the WoBERT_plus PLM-DT model using sentence s and its BIO tags, POS tags, and dependency parsing tags (as shown in Figure 4). After training, the new model is named the WoBERT_plus PLM-DT + BiLSTM + Masked CRF model, or the WBM model.
3.5.2. Inferring process
The CORE-KE system uses the test set to verify the inferring process of the WBM model and to predict relations for a given sentence in the test set. This involves the following tasks:
1) Select each sentence s from the test set and use the LTP tools to generate the syntactic feature tags, such as POS tags and dependency parsing tags, for each character in sentence s (as described in Section 3.3).
2) Use sentence s and its related POS tags and dependency parsing tags as inputs to the CORE-KE system. The system then predicts the BIO tag for each character of sentence s and extracts a relation.
4. Experiments
Aiming to evaluate the performance of the CORE-KE system for Chinese open relation extraction, we conducted experiments on two datasets. In this section, we describe the experimental setup and outline the benchmark ORE systems used for comparison with the CORE-KE system. At the end of this section, the experimental results are discussed.
4.1. Experimental Setup
The two large-scale datasets used in our experiments were the Chinese Open Entity and Relation Knowledge Base dataset (later referred to as the COER dataset) and the Span Symbol Aided Open Knowledge Expression dataset (later referred to as the SpanSAOKE dataset). The COER dataset is a Chinese
open entity and relation knowledge base containing 65,778 different named entities and 21,359 different open relation phrases in 218,362 Chinese sentences extracted from several popular Chinese websites, such as Sohu News [http://www.sohu.com], Sina News [http://news.sina.com.cn], and Baidu Baike [https://baike.baidu.com]. The relations in the COER dataset cover many domains, including military, sports, entertainment, economy, etc. We filtered out sentences that contain multiple relations. After filtering, the COER dataset used in our experiments contained 213,327 triples in the training set and 2,000 triples in the test set. The COER dataset for the CORE-KE system was the same as that for the PGCORE system [8].
The SpanSAOKE dataset is a large-scale sentence-level dataset for Chinese ORE containing 53,869 relation triple facts in 26,496 sentences collected from Baidu Baike. Some of the entities and relations overlap in the SpanSAOKE dataset. For example, in the sentence [农民收入主要以橡胶为主 (Farmers' incomes are mainly from rubber)], 农民收入 (farmers' incomes) and 橡胶 (rubber) are two annotated entities, and 以橡胶为主 (from rubber) is the annotated relation in the SpanSAOKE dataset. The tail entity 橡胶 (rubber) and the relation 以橡胶为主 (from rubber) overlap in this sentence. In our experiments, these overlapping sentences were filtered out and 44,734 triples in 23,856 sentences were retained. The new dataset is later referred to as the SpanSAOKE dataset without overlapped entity-relation triples (or the SpanSAOKE-NO dataset for short). We randomly divided the new dataset into 35,921 triples for training, 4,373 triples for validation, and 4,440 triples for testing. The details of the two experimental datasets are shown in Table 1.
It is noted that the SpanSAOKE-NO dataset is different from the COER dataset: many sentences in the SpanSAOKE-NO dataset contain multiple relation triples, while the CORE-KE system predicts only one result per sentence at a time. Following the procedure used in MGD-GNN [36], the gestalt pattern matching function [44] was used in the CORE-KE system to find the best triple (the one matching the ground truth most closely) from the SpanSAOKE-NO dataset.
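Gestalt pattern matching is what Python's standard difflib implements, so the best-triple selection can be sketched as below; the concatenation-based scoring is an assumption, not the paper's exact criterion:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Gestalt pattern matching (Ratcliff-Obershelp) similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def best_matching_triple(predicted, golden_triples):
    """Pick the golden triple whose concatenated (E1, R, E2) string is
    closest to the predicted triple, for scoring one prediction against
    a sentence that carries several gold triples."""
    pred_str = "".join(predicted)
    return max(golden_triples,
               key=lambda gold: similarity(pred_str, "".join(gold)))

gold = [("农民收入", "以橡胶为主", "橡胶"), ("鲁尼", "加盟", "曼联")]
print(best_matching_triple(("鲁尼", "加盟", "曼联"), gold))
```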
Table 1. Statistics of the COER dataset and the SpanSAOKE-NO dataset.
Dataset         Split        #Triples
COER            Train        213,327
COER            Test         2,000
SpanSAOKE-NO    Train        35,921
SpanSAOKE-NO    Validation   4,373
SpanSAOKE-NO    Test         4,440
In further pre-training and fine-tuning the WoBERT_plus model, we used a learning rate of 5e-5, a max sequence length of 128, a batch size of 16, and 30 training epochs. Additionally, we employed the Gaussian Error Linear Unit (GELU) as the activation function.
In training the CORE-KE system, we used a max sequence length of 256, a batch size of 16, a dropout rate of 0.5, and a BiLSTM hidden size of 256. The other parameters were the same as those in the fine-tuning process of the WoBERT_plus model.
4.2. Benchmark Systems
The following benchmark systems were used in this research to compare their effectiveness with that of the CORE-KE system:
ZORE [6] is a syntactic ORE system that identifies relation candidates from automatically parsed dependency trees and extracts relations with their semantic patterns iteratively through a novel double propagation algorithm.
UnCORE [45] is a rule-based Chinese ORE system that 1) uses word-distance and entity-distance constraints to generate candidate relation triples from a raw corpus, 2) adopts global and domain ranking methods to discover relation words from candidate relation triples, and 3) employs syntactic rules to filter the final relation triples.
DSNFs [9] is an unsupervised Chinese ORE system that establishes its own Dependency Semantic Normal Forms (DSNFs) to map entity relations into
dependency trees and considers unique Chinese linguistic characteristics in open relation extraction.
PGCORE [8] is an end-to-end supervised neural Chinese ORE system that applies a Pointer-Generator framework to copy words in input sequences and move them to output sequences via pointers, while retaining the ability to generate new words.
SpanOIE [32] is a span-selection-based neural open information extraction system which achieves competitive results on English corpora. It uses pre-trained Chinese word embeddings [46] as well as POS tags and dependency labels to apply SpanOIE to Chinese corpora.
CharLSTM [47] applies a vanilla character-based BiLSTM model to encode characters for Chinese open relation extraction.
MGD-GNN [36] is a character-based supervised neural system which constructs a multi-grained dependency graph to incorporate dependency and word boundary information. It employs a GNN to get node representations for predicate and argument predictions.
4.3. Measures
The performance of the CORE-KE system on the COER and SpanSAOKE-NO datasets is measured by Precision (P), Recall (R), and micro F1-score (F1):

$$P = \frac{|C \cap G|}{|C|} \qquad (6)$$

$$R = \frac{|C \cap G|}{|G|} \qquad (7)$$

$$F1 = \frac{2 \times P \times R}{P + R} \qquad (8)$$

where $C$ refers to the predicted result set of the CORE-KE system, $G$ refers to the golden result set, and $C \cap G$ denotes the triples in both $C$ and $G$. Both $C$ and $G$ cover all three categories: head entity, tail entity, and relation.
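A direct rendering of Equations (6)-(8) over sets of extracted triples, for illustration:

```python
def precision_recall_f1(predicted, golden):
    """Equations (6)-(8) over sets of (E1, R, E2) triples."""
    predicted, golden = set(predicted), set(golden)
    correct = predicted & golden                 # C ∩ G
    p = len(correct) / len(predicted) if predicted else 0.0
    r = len(correct) / len(golden) if golden else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy check: one of two predictions matches one of three gold triples.
pred = [("鲁尼", "加盟", "曼联"), ("湖人队", "获得", "冠军")]
gold = [("鲁尼", "加盟", "曼联"), ("A", "B", "C"), ("D", "E", "F")]
print(precision_recall_f1(pred, gold))  # (0.5, 0.333..., 0.4)
```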
4.4. Results and Analysis
We conducted numerous experiments by running the CORE-KE system and the benchmark ORE systems on the COER and SpanSAOKE-NO datasets. The experimental results were obtained, and comparisons between the CORE-KE system and the benchmark ORE systems were made.
4.4.1. Overall Comparison
Table 2 shows the experimental results of the CORE-KE system and the benchmark ORE systems on the COER dataset. The best results are highlighted in bold. It is noted that we did not use all the benchmark ORE systems described in Section 4.2, but only the four ORE systems (ZORE, UnCORE, DSNFs, and PGCORE) which have previously been reported as the top systems with high ORE performance [8]. The first three of these systems are unsupervised systems.
Table 2. Experimental results of Chinese ORE systems on the COER dataset.
Category       System    Precision  Recall  F1-score
Unsupervised   ZORE      0.838      0.145   0.249
Unsupervised   UnCORE    0.806      0.476   0.599
Unsupervised   DSNFs     0.838      0.587   0.690
Supervised     PGCORE    0.854      0.543   0.663
Supervised     CORE-KE   0.835      0.761   0.796
It is observed that all five systems yield similar results in precision (0.83±0.02); the CORE-KE system, however, outperforms the benchmark ORE systems by a large margin in recall and F1-score. In comparison with the three statistical and rule-based unsupervised systems (ZORE, UnCORE, and DSNFs), the CORE-KE system achieves relative improvements of [(0.761-0.145)/0.145 or 424.8%], [(0.761-0.476)/0.476 or 59.9%], and [(0.761-0.587)/0.587 or 29.6%] in recall, respectively, and [(0.796-0.249)/0.249 or 219.7%], [(0.796-0.599)/0.599 or 32.9%], and [(0.796-0.690)/0.690 or 15.4%] in F1-score, respectively. In com-
parison with the supervised neural ORE system (PGCORE), the CORE-KE system achieves relative improvements of [(0.761-0.543)/0.543 or 40.1%] in recall and [(0.796-0.663)/0.663 or 20.1%] in F1-score. All these results show that the CORE-KE system can significantly improve the performance of Chinese open relation extraction.
Table 3 shows the experimental results of the CORE-KE system and the benchmark ORE systems on the SpanSAOKE/SpanSAOKE-NO dataset. The best results are also highlighted in bold. It is noted that we used only four ORE systems (ZORE, SpanOIE, CharLSTM, and MGD-GNN), since these four systems had previously been reported as the top systems with high ORE performance [36]. Among these systems, ZORE is an unsupervised system.
Table 3. Experimental results of Chinese ORE systems on the SpanSAOKE/SpanSAOKE-NO dataset.
Category       System     Precision  Recall  F1-score
Unsupervised   ZORE       0.315      0.177   0.227
Supervised     SpanOIE    0.418      0.443   0.430
Supervised     CharLSTM   0.404      0.454   0.427
Supervised     MGD-GNN    0.450      0.471   0.460
Supervised     CORE-KE    0.594      0.383   0.466
It is observed that the CORE-KE system outperforms all the benchmark ORE systems in precision and F1-score. In comparison with the statistical and rule-based system (ZORE), the CORE-KE system gives relative improvements of [(0.594-0.315)/0.315 or 88.6%] and [(0.466-0.227)/0.227 or 105.3%] in precision and F1-score, respectively. In comparison with the three supervised neural ORE systems (SpanOIE, CharLSTM, and MGD-GNN), the CORE-KE system gives relative improvements of [(0.594-0.418)/0.418 or 42.1%], [(0.594-0.404)/0.404 or 47.0%], and [(0.594-0.450)/0.450 or 32.0%] in precision and [(0.466-0.430)/0.430 or 8.4%], [(0.466-0.427)/0.427 or 9.1%], and [(0.466-0.460)/0.460 or 1.3%] in F1-score, respectively. All these results indicate the
eectiveness of the CORE-KE system. Furthermore, when we compare the
CORE-KE system with the above three supervised neural ORE systems, the
CORE-KE system ends up with a lower recall. One possible reason might be
that the CORE-KE system extracts fewer triples from the SpanSAOKE-NO575
dataset. The extraction results of the CORE-KE system and the MGD-GNN
model are shown in Table 4. It is noted that the CORE-KE system extracted
2,860 triples on the SpanSAOKE-NO dataset, while the MGD-GNN model
extracted 5,591 triples on the SpanSAOKE dataset, which is even more than
the number of golden triples on this dataset. This may explain why the recall580
of the CORE-KE system is lower than that of the MGD-GNN model. Table 5
shows the number of head entities, tail entities, and relations extracted by the
CORE-KE system. It is noted these three values are inconsistent, the CORE-KE
system extracted more head entities than the tail entities and relations. This
indicates that at least 723 (3819 - 3096) triples extracted by the CORE-KE585
system are incomplete. The actual number is 1580 (4440 - 2860). These in-
complete triples containing only an entity and a relation are not included in
the extracted result set. According to the denitions of precision and recall
(Equations 6 and 7 in Section 4.3), fewer extracted triples lead to a lower recall
and a lower F1 score.590
Table 4. The extraction results of the CORE-KE system and the MGD-GNN model on the
test set of the SpanSAOKE/SpanSAOKE-NO dataset.

                     CORE-KE   MGD-GNN
#Golden triples       4,440     5,342
#Extracted triples    2,860     5,591
#Correct triples      1,700     2,516
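As a quick sanity check (a sketch, not part of the released code), the scores in Table 3 can be recomputed from the counts in Table 4 with the standard definitions, precision = correct/extracted and recall = correct/golden:

```python
# Recompute precision, recall, and F1 from the triple counts in Table 4,
# following the definitions referenced above (Equations 6 and 7, Section 4.3).
def prf(correct: int, extracted: int, golden: int) -> tuple[float, float, float]:
    precision = correct / extracted
    recall = correct / golden
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Fewer extracted triples depress recall (CORE-KE), while over-extraction
# depresses precision (MGD-GNN).
print("CORE-KE: P=%.3f R=%.3f F1=%.3f" % prf(1700, 2860, 4440))  # 0.594 0.383 0.466
print("MGD-GNN: P=%.3f R=%.3f F1=%.3f" % prf(2516, 5591, 5342))  # 0.450 0.471 0.460
```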
Table 5. The number of head entities, tail entities, and relations extracted by the CORE-KE
system on the test set of the SpanSAOKE-NO dataset.

                 Golden   Extracted
#head entities    4,440     3,819
#tail entities    4,440     3,128
#relations        4,440     3,096
As a result, the CORE-KE model has limited improvement when compared
with the MGD-GNN model. The MGD-GNN model incorporates the dependency
relations between words and adopts a graph neural network to encode a multi-
grained dependency graph (MGD). Different from the MGD-GNN model, the
CORE-KE model integrates dependency information by encoding dependency
parsing tags in an embedded space and concatenating the dependency-tag
embeddings with word embeddings and other information. Compared with the
MGD-GNN model, the CORE-KE system pays more attention to capturing the
semantic features of a sentence and achieves a better result in precision.
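A minimal PyTorch-style sketch of this concatenation scheme is shown below; the tag-set size and embedding dimensions are hypothetical placeholders, not the actual CORE-KE hyperparameters:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration; the actual CORE-KE settings may differ.
NUM_DEP_TAGS = 15    # size of the dependency-relation tag set (assumed)
DEP_DIM = 32         # dimension of the dependency-tag embedding (assumed)
WORD_DIM = 768       # dimension of the PLM word embeddings (typical BERT size)

dep_embedding = nn.Embedding(NUM_DEP_TAGS, DEP_DIM)

def fuse_features(word_emb: torch.Tensor, dep_tag_ids: torch.Tensor) -> torch.Tensor:
    """Concatenate contextual word embeddings with embedded dependency tags.

    word_emb:    (batch, seq_len, WORD_DIM) embeddings from the PLM
    dep_tag_ids: (batch, seq_len) integer ids of dependency-parsing tags
    """
    dep_emb = dep_embedding(dep_tag_ids)           # (batch, seq_len, DEP_DIM)
    return torch.cat([word_emb, dep_emb], dim=-1)  # (batch, seq_len, WORD_DIM + DEP_DIM)

# Example: a batch of 2 sequences of length 10.
fused = fuse_features(torch.randn(2, 10, WORD_DIM),
                      torch.randint(0, NUM_DEP_TAGS, (2, 10)))
print(fused.shape)  # torch.Size([2, 10, 800])
```

The fused representation can then be fed to downstream layers, so that syntactic cues are available alongside the PLM's semantic features.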
4.4.2. Comparison with Pre-trained Language Models
Using the COER and SpanSAOKE-NO datasets, we also compared the pre-
trained language model used in the CORE-KE system with other pre-trained
language models. Table 6 summarizes the experimental results of the pre-trained
language models. The best results are highlighted in bold.
Table 6. Experimental results of pre-trained language models on the COER dataset and the
SpanSAOKE-NO dataset.

Models/Systems           COER                  SpanSAOKE-NO
                   P      R      F1        P      R      F1
BERT [42]        0.803  0.740  0.770    0.680  0.296  0.413
RoBERTa [48]     0.806  0.712  0.756    0.725  0.277  0.401
RoBERTa-wwm [43] 0.814  0.741  0.776    0.704  0.299  0.419
macBERT [49]     0.807  0.713  0.757    0.700  0.280  0.400
NEZHA [50]       0.819  0.733  0.773    0.625  0.258  0.365
ELECTRA [51]     0.824  0.695  0.754    0.499  0.230  0.316
WoBERT_plus      0.834  0.736  0.782    0.672  0.311  0.425
CORE-KE          0.835  0.761  0.796    0.594  0.383  0.466
It is noted from Table 6 that the CORE-KE system, when compared with
other PLMs, relatively improves the performance of open relation extraction
by 0.1%-4.0% in precision, 2.7%-9.5% in recall, and 1.8%-5.6% in F1-score on
the COER dataset. Additionally, the CORE-KE system enhances the
performance of open relation extraction on the SpanSAOKE-NO dataset by
23.2%-66.5% in recall and 9.6%-47.5% in F1-score. All these results demonstrate
that the CORE-KE system is more effective in Chinese ORE.
4.4.3. Ablation Study
An ablation study was conducted in this research to further assess the perfor-
mance of the CORE-KE system in Chinese ORE. We had the following five
models for the ablation study:
1) Ablation Model #1: the BiLSTM-Masked CRF model without the support
of any pre-trained language model
2) Ablation Model #2: the WoBERT_plus+BiLSTM-Masked CRF model
without knowledge enhancement
3) Ablation Model #3: the CORE-KE system with the support of external
knowledge corpus (descriptive and triplet knowledge) only
4) Ablation Model #4: the CORE-KE system with the support of syntactic
features only
5) Ablation Model #5: the CORE-KE system with the full support of external
knowledge corpus (descriptive and triplet knowledge) and syntactic features.
The experimental results of these models on the two datasets are shown in
Table 7. The best results are highlighted in bold.
Table 7. Experimental results of the ablation study.

Models                    COER                 SpanSAOKE-NO
                    P      R      F1        P      R      F1
Ablation Model #1  0.771  0.616  0.685    0.656  0.124  0.209
Ablation Model #2  0.822  0.752  0.785    0.652  0.336  0.444
Ablation Model #3  0.821  0.762  0.792    0.649  0.343  0.449
Ablation Model #4  0.823  0.756  0.788    0.630  0.357  0.456
Ablation Model #5  0.835  0.761  0.796    0.594  0.383  0.466
It is noted that the Chinese ORE model with the support of a PLM (Ablation
Model #2) has superior performance on the two datasets, when compared with
the Chinese ORE model without the support of a PLM (Ablation Model #1).
The relative improvements of the Chinese ORE model with a PLM are 14.6%
and 112.4% in F1-score, respectively. These findings indicate that using a
pre-trained language model can boost the performance of Chinese ORE systems.
It is also noted that enhanced knowledge has a significant impact on the per-
formance of the CORE-KE system in open relation extraction. When the external
knowledge corpus was used, Ablation Model #3 relatively improved the F1-score
by 0.9% (on the COER dataset) and 1.1% (on the SpanSAOKE-NO dataset), as
compared with Ablation Model #2, which did not incorporate any knowledge.
When syntactic features were used, Ablation Model #4 had a relative
improvement of 0.4% and 2.7% in F1-score, as compared with Ablation Model #2
on the two datasets, respectively. The CORE-KE system, which takes advantage
of both the external knowledge corpus and syntactic features, outperforms
Ablation Model #2 on the two datasets with relative improvements of 1.4% and
5.0% in F1-score, respectively. All these results demonstrate that the use of an
external knowledge corpus and syntactic features can improve the performance
of Chinese ORE.
From Table 7, we can observe that the PLM (WoBERT_plus) plays a more
important role in the CORE-KE system. One possible reason is that the
PLM has obtained abundant context-based semantic knowledge from large data
sources, which is beneficial to downstream Natural Language Understanding
tasks such as relation extraction. The use of knowledge enhancement in
the CORE-KE system creates a new approach to strengthening pre-trained lan-
guage models in learning entities and relations from Chinese contexts. Syntactic
features, such as POS tags and dependency parsing tags for each character in
Chinese sentences, help PLMs gain a better understanding of each character in
terms of syntactic structure.
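Since syntactic analyzers typically tag words rather than characters, one simple alignment scheme (assumed here for illustration; the actual CORE-KE preprocessing may differ) is to broadcast each word-level tag to every character of that word:

```python
# Broadcast word-level syntactic tags to characters, so every character of a
# Chinese sentence carries a syntactic feature (assumed alignment scheme).
def word_tags_to_char_tags(words: list[str], tags: list[str]) -> list[str]:
    """Repeat each word-level tag once per character of that word."""
    char_tags = []
    for word, tag in zip(words, tags):
        char_tags.extend([tag] * len(word))
    return char_tags

words = ["加里亚尼", "接触", "维罗索"]  # segmented words from the case study below
pos = ["nh", "v", "nh"]                # word-level POS tags (LTP-style labels)
print(word_tags_to_char_tags(words, pos))
# ['nh', 'nh', 'nh', 'nh', 'v', 'v', 'nh', 'nh', 'nh']
```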
4.4.4. Case Studies
After the CORE-KE system was trained, we applied it to extract Chinese
relations from the COER and SpanSAOKE-NO datasets. The ORE results are
shown in Table 8 and Table 9, respectively.
Table 8. ORE results from a Chinese sentence with interferences in the COER dataset.

Input: 《米兰体育报》披露,加里亚尼是前来探营,近距离接触葡萄牙天才维罗索。
  (La Gazzetta Dello Sport revealed that Galliani came to visit the camp, took
  a close encounter with the Portuguese genius Veloso.)
Gold (Ground Truth): Head entity: 加里亚尼 (Galliani); Relation: 接触 (encounter);
  Tail entity: 维罗索 (Veloso)
DSNFs: Head entity: 米兰体育报 (La Gazzetta Dello Sport); Relation: 接触 (encounter);
  Tail entity: 维罗索 (Veloso)
CORE-KE: Head entity: 加里亚尼 (Galliani); Relation: 接触 (encounter);
  Tail entity: 维罗索 (Veloso)
From the sentence [《米兰体育报》披露,加里亚尼是前来探营,近距离接触葡
萄牙天才维罗索 (La Gazzetta Dello Sport revealed that Galliani came to visit
the camp, took a close encounter with the Portuguese genius Veloso)] in Table 8,
加里亚尼 (Galliani), 维罗索 (Veloso), and 接触 (encounter) are annotated as a
head entity, a tail entity, and a relation, respectively. There are some interferences
in this sentence, such as 《米兰体育报》 (La Gazzetta Dello Sport), 披露
(revealed), 探营 (visit the camp), and 葡萄牙 (Portuguese). When we run the
ORE systems (the CORE-KE system and the DSNFs system), only the CORE-KE
system can extract 加里亚尼 (Galliani) as the head entity correctly. Both the
DSNFs and CORE-KE systems can precisely extract 接触 (encounter) as the
relation, and the tail entity 维罗索 (Veloso) can be correctly extracted by both
systems. These results indicate that the CORE-KE system can eliminate
interferences and extract fine-grained entities.
Table 9. ORE results of a Chinese sentence with long entities in the SpanSAOKE-NO dataset.

Input: 如果想达到好的效果,需要操作者具有精确的时间判断及操控能力。
  (To achieve good results, the operator is required to have precise time
  judgment and manipulation ability.)
Gold (Ground Truth): Head entity: 操作者 (operator); Relation: 具有 (have);
  Tail entity: 精确的时间判断及操控能力 (precise time judgment and manipulation ability)
DSNFs: Head entity: -; Relation: -; Tail entity: -
CORE-KE: Head entity: 操作者 (operator); Relation: 具有 (have);
  Tail entity: 精确的时间判断及操控能力 (precise time judgment and manipulation ability)
From the sentence [如果想达到好的效果,需要操作者具有精确的时间判断及
操控能力 (To achieve good results, the operator is required to have precise time
judgment and manipulation ability)] in Table 9, 操作者 (operator) is annotated
as a head entity, 具有 (have) is annotated as a relation, and 精确的时间判断及
操控能力 (precise time judgment and manipulation ability) is annotated as a
tail entity. It is noted that the tail entity in this sentence is very long. The
DSNFs system cannot extract open relations from this sentence, while the
CORE-KE system can extract the whole triple accurately. These results
demonstrate that the CORE-KE system can extract very long entities (especially
long-tail entities) correctly.
The CORE-KE system was trained on one NVIDIA 2080Ti GPU for 5 hours
on the COER dataset, while the PGCORE system was trained for 17 hours on
one NVIDIA 1050Ti GPU [8]. This shows that the CORE-KE system can be
deployed quickly onto new datasets. As a result, the taxonomy evolution
challenge can be effectively alleviated.
4.4.5. Error Analysis
In this section, typical extraction errors of the CORE-KE system on the
COER and SpanSAOKE-NO datasets are described through examples in Tables
10 and 11, respectively.
Table 10. Extraction errors of the CORE-KE system on the COER dataset.

Input: 戈麦斯与波斯蒂加则为葡萄牙队打入两球。
  (Gomez and Postiga scored two goals for Portugal.)
Gold (Ground Truth): Head entity: 戈麦斯与波斯蒂加 (Gomez and Postiga);
  Relation: 打入两球 (scored two goals); Tail entity: 葡萄牙队 (Portugal)
CORE-KE: Head entity: 戈麦斯 (Gomez); Relation: 打入两球 (scored two goals);
  Tail entity: 葡萄牙队 (Portugal)
It is noted that the ground truth head entity of the example case in Table 10
is [戈麦斯与波斯蒂加 (Gomez and Postiga)], while the CORE-KE system extracts
only part of the correct answer. One reason is that the acquired descriptive
knowledge brings noise into the original training data. For example, the De-
scriptive Knowledge Acquisition module of the CORE-KE system has acquired
the description knowledge of 戈麦斯 (Gomez), 波斯蒂加 (Postiga), and 葡萄
牙队 (Portugal). All of these three entities and their descriptions are utilized
to further pre-train the WoBERT_plus model. In most cases, using granule
description knowledge can improve the understanding of a certain concept.
However, in this special case with two entities in parallel, the separate
descriptions of 戈麦斯 (Gomez) and 波斯蒂加 (Postiga) indeed create noise in
the relation extraction.
Table 11. Extraction errors of the CORE-KE system on the SpanSAOKE-NO dataset.

Input: 张女士祖籍黄冈,是香港湖北联谊会会员。
  (Ms. Zhang, whose ancestral home is Huanggang, is a member of the
  HongKong-Hubei Friendship Association.)
Gold (Ground Truth #1): Head entity: 张女士 (Ms. Zhang); Relation: 祖籍 (ancestral home);
  Tail entity: 黄冈 (Huanggang)
Gold (Ground Truth #2): Head entity: 张女士 (Ms. Zhang); Relation: 会员 (member);
  Tail entity: 香港湖北联谊会 (HongKong-Hubei Friendship Association)
CORE-KE: Head entity: 张女士 (Ms. Zhang); Relation: -; Tail entity: -
As shown in Table 11, there are two ground truth triples in this example, but the
CORE-KE system only extracted the common head entity of the two triples.
One possible reason is that the CORE-KE system is designed to extract only one
triple from a sentence. With the support of knowledge enhancement technology,
the CORE-KE system has recognized two potential relations in the example
sentence, but remains unable to decide which one should be used as the final relation.
5. Conclusion and Future Work
In this paper, a new method entitled Chinese Open Relation Extraction with
Knowledge Enhancement (CORE-KE) is presented. The CORE-KE system,
which implements this method, takes advantage of a pre-trained language model
(PLM) (with the support of a BiLSTM layer and a Masked CRF layer) on
unstructured data and extracts Chinese open relations. To the best of our
knowledge, this is the rst Chinese ORE method based on a PLM.
Descriptive knowledge from Wikidata and extra triplet knowledge from an un-
structured Chinese corpus were obtained through two separate modules. Chi-
nese sentences, combined with the descriptive and triplet knowledge, were further
used in the CORE-KE system to pre-train and fine-tune the pre-trained lan-
guage model. In addition, syntactic features were adopted in the training stage
of the CORE-KE system.
The experimental results of the CORE-KE system on two large-scale datasets
of open Chinese entities and relations demonstrate that the CORE-KE method
is superior to other ORE methods. The F1-scores of the CORE-KE method on
the two datasets have given a relative improvement of 20.1% and 1.3%, when
compared with benchmark ORE methods, respectively. Additionally, the results
from the ablation study and the case studies also demonstrate that the CORE-
KE method is effective in addressing ORE challenges such as fine-grained entity
acquisition, long-tail entity mining, and taxonomy evolution.
There are still some limitations in the CORE-KE method, and the future work
can be summarized as follows:
1) As discussed in the Results and Analysis section, the CORE-KE system,
when compared with the benchmark neural ORE systems (that is, the SpanOIE,
CharLSTM, and MGD-GNN systems), had a lower recall. Further investiga-
tion will be undertaken to understand the underlying mechanism for such low
performance.
2) Additionally, the CORE-KE method will be improved in future stud-
ies to extract overlapping entities and multiple open relations from Chinese
sentences.
Acknowledgements

This work was supported by the high-level university construction special
project of Guangdong province, China 2019 (No. 5041700175) and the new
engineering research and practice project of the Ministry of Education, China
(No. E-RGZN20201036).
References

[1] S. Pawar, G. K. Palshikar, P. Bhattacharyya, Relation extraction: A survey, arXiv preprint arXiv:1712.05191.

[2] O. Etzioni, M. Banko, S. Soderland, D. S. Weld, Open information extraction from the web, Communications of the ACM 51 (12) (2008) 68–74. doi:10.1145/1409360.1409378.

[3] A. Fader, S. Soderland, O. Etzioni, Identifying relations for open information extraction, in: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2011, pp. 1535–1545.

[4] K. Kolluru, V. Adlakha, S. Aggarwal, Mausam, S. Chakrabarti, Openie6: Iterative grid labeling and coordination analysis for open information extraction, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2020, pp. 3748–3761. doi:10.18653/v1/2020.emnlp-main.306.

[5] Y.-H. Tseng, L.-H. Lee, S.-Y. Lin, B.-S. Liao, M.-J. Liu, H.-H. Chen, O. Etzioni, A. Fader, Chinese open relation extraction for knowledge acquisition, in: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, Association for Computational Linguistics, 2014, pp. 12–16. doi:10.3115/v1/E14-4003.
[6] L. Qiu, Y. Zhang, Zore: A syntax-based system for chinese open relation extraction, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2014, pp. 1870–1880. doi:10.3115/v1/D14-1201.

[7] C. Wang, X. He, A. Zhou, Open relation extraction for chinese noun phrases, IEEE Transactions on Knowledge and Data Engineering 33 (6) (2021) 2693–2708. doi:10.1109/TKDE.2019.2953839.

[8] Z. Cheng, X. Wu, X. Xie, J. Wu, Chinese open relation extraction with pointer-generator networks, in: 2020 IEEE 5th International Conference on Data Science in Cyberspace, IEEE, 2020, pp. 307–311. doi:10.1109/DSC50466.2020.00054.

[9] S. Jia, S. E, M. Li, Y. Xiang, Chinese open relation extraction and knowledge base establishment, ACM Trans. Asian Low Resour. Lang. Inf. Process. 17 (3) (2018) 1–22. doi:10.1145/3162077.

[10] W. Che, Y. Feng, L. Qin, T. Liu, N-ltp: An open-source neural language technology platform for chinese, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, 2021, pp. 42–49. doi:10.18653/v1/2021.emnlp-demo.6.

[11] T. Wei, J. Qi, S. He, S. Sun, Masked conditional random fields for sequence labeling, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, 2021, pp. 2024–2035. doi:10.18653/v1/2021.naacl-main.163.

[12] N. Zhang, Q. Jia, S. Deng, X. Chen, H. Ye, H. Chen, H. Tou, G. Huang, Z. Wang, N. Hua, et al., Alicg: Fine-grained and evolvable conceptual graph construction for semantic search at alibaba, in: Proceedings of the
27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, 2021, pp. 3895–3905. doi:10.1145/3447548.3467057.

[13] Y. Gou, Y. Lei, L. Liu, P. Zhang, X. Peng, A dynamic parameter enhanced network for distant supervised relation extraction, Knowledge-Based Systems 197 (2020) 105912. doi:10.1016/j.knosys.2020.105912.

[14] J. Christensen, S. Soderland, O. Etzioni, An analysis of open information extraction based on semantic role labeling, in: Proceedings of the 6th International Conference on Knowledge Capture, Association for Computing Machinery, 2011, pp. 113–120. doi:10.1145/1999676.1999697.

[15] L. Del Corro, R. Gemulla, Clausie: Clause-based open information extraction, in: Proceedings of the 22nd International Conference on World Wide Web, Association for Computing Machinery, 2013, pp. 355–366. doi:10.1145/2488388.2488420.

[16] H. Pal, et al., Demonyms and compound relational nouns in nominal open ie, in: Proceedings of the 5th Workshop on Automated Knowledge Base Construction, Association for Computational Linguistics, 2016, pp. 35–39. doi:10.18653/v1/W16-1307.

[17] G. Stanovsky, J. Ficler, I. Dagan, Y. Goldberg, Getting more out of syntax with props, arXiv preprint arXiv:1603.01648.

[18] M. Mausam, Open information extraction systems and downstream applications, in: Proceedings of the 25th International Joint Conference on Artificial Intelligence, AAAI Press, 2016, pp. 4074–4077.

[19] K. Gashteovski, R. Gemulla, L. del Corro, Minie: Minimizing facts in open information extraction, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2017, pp. 2630–2640. doi:10.18653/v1/D17-1278.
[20] M. Cetto, C. Niklaus, A. Freitas, S. Handschuh, Graphene: Semantically-linked propositions in open information extraction, in: Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics, 2018, pp. 2300–2311.

[21] S. Saha, et al., Open information extraction from conjunctive sentences, in: Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics, 2018, pp. 2288–2299.

[22] Q. Li, L. Li, W. Wang, Q. Li, J. Zhong, A comprehensive exploration of semantic relation extraction via pre-trained cnns, Knowledge-Based Systems 194 (2020) 105488. doi:10.1016/j.knosys.2020.105488.

[23] Mausam, M. Schmitz, R. Bart, S. Soderland, O. Etzioni, Open language learning for information extraction, in: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Association for Computational Linguistics, 2012, pp. 523–534.

[24] G. Angeli, M. J. J. Premkumar, C. D. Manning, Leveraging linguistic structure for open domain information extraction, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, 2015, pp. 344–354. doi:10.3115/v1/P15-1034.

[25] S. Saha, H. Pal, et al., Bootstrapping for numerical open ie, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, 2017, pp. 317–323. doi:10.18653/v1/P17-2050.

[26] R. Wu, Y. Yao, X. Han, R. Xie, Z. Liu, F. Lin, L. Lin, M. Sun, Open relation extraction: Relational knowledge transfer from supervised data to
unsupervised data, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, Association for Computational Linguistics, 2019, pp. 219–228. doi:10.18653/v1/D19-1021.

[27] N. Zhang, X. Xu, L. Tao, H. Yu, H. Ye, S. Qiao, X. Xie, X. Chen, Z. Li, L. Li, Deepke: A deep learning based knowledge extraction toolkit for knowledge base population, in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - System Demonstrations, Abu Dhabi, UAE, December 7-11, 2022, Association for Computational Linguistics, 2022, pp. 98–108. URL https://aclanthology.org/2022.emnlp-demos.10

[28] A. Roy, Y. Park, T. Lee, S. Pan, Supervising unsupervised open information extraction models, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Association for Computational Linguistics, 2019, pp. 728–737. doi:10.18653/v1/D19-1067.

[29] L. Cui, F. Wei, M. Zhou, Neural open information extraction, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, 2018, pp. 407–413. doi:10.18653/v1/P18-2065.

[30] K. Kolluru, S. Aggarwal, V. Rathore, S. Chakrabarti, et al., Imojie: Iterative memory-based joint open information extraction, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2020, pp. 5871–5886. doi:10.18653/v1/2020.acl-main.521.

[31] G. Stanovsky, J. Michael, L. Zettlemoyer, I. Dagan, Supervised open information extraction, in: Proceedings of the 2018 Conference of the
North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), Association for Computational Linguistics, 2018, pp. 885–895. doi:10.18653/v1/N18-1081.

[32] J. Zhan, H. Zhao, Span model for open information extraction on accurate corpus, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 9523–9530. doi:10.1609/aaai.v34i05.6497.

[33] M. Eberts, A. Ulges, Span-based joint entity and relation extraction with transformer pre-training, in: Proceedings of the 24th European Conference on Artificial Intelligence, IOS Press, 2020, pp. 2006–2013. doi:10.3233/FAIA200321.

[34] S. Jia, E. Shijia, L. Ding, X. Chen, Y. Xiang, Hybrid neural tagging model for open relation extraction, Expert Systems with Applications 200 (2022) 116951. doi:10.1016/j.eswa.2022.116951.

[35] J. Gan, P. Huang, J. Zhou, B. Wen, Chinese open information extraction based on dbmcss in the field of national information resources, Open Physics 16 (1) (2018) 568–573. doi:10.1515/phys-2018-0074.

[36] Z. Lyu, K. Shi, X. Li, L. Hou, J. Li, B. Song, Multi-grained dependency graph neural network for chinese open information extraction, in: Advances in Knowledge Discovery and Data Mining, Springer International Publishing, 2021, pp. 155–167. doi:10.1007/978-3-030-75768-7_13.

[37] Y. Shen, S. Tan, A. Sordoni, A. Courville, Ordered neurons: Integrating tree structures into recurrent neural networks, in: Proceedings of the International Conference on Learning Representations, 2018.

[38] A. See, P. J. Liu, C. D. Manning, Get to the point: Summarization with pointer-generator networks, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers),
Association for Computational Linguistics, 2017, pp. 1073–1083. doi:10.18653/v1/P17-1099.

[39] J. Li, Z. Liu, Granule description in knowledge granularity and representation, Knowledge-Based Systems 203 (2020) 106160. doi:10.1016/j.knosys.2020.106160.

[40] H. Liu, W. Li, Y. Li, A new computational method for acquiring effect knowledge to support product innovation, Knowledge-Based Systems 231 (2021) 107410. doi:10.1016/j.knosys.2021.107410.

[41] J. Zhang, K. Hao, X. song Tang, X. Cai, Y. Xiao, T. Wang, A multi-feature fusion model for chinese relation extraction with entity sense, Knowledge-Based Systems 206 (2020) 106348. doi:10.1016/j.knosys.2020.106348.

[42] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, 2019, pp. 4171–4186. doi:10.18653/v1/N19-1423.

[43] Y. Cui, W. Che, T. Liu, B. Qin, Z. Yang, S. Wang, G. Hu, Pre-training with whole word masking for chinese bert, IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021) 3504–3514. doi:10.1109/TASLP.2021.3124365.

[44] J. W. Ratcliff, D. E. Metzener, Pattern matching: The gestalt approach, Dr. Dobb's Journal 13 (7) (1988) 46.

[45] B. Qin, A. Liu, T. Liu, Unsupervised chinese open entity relation extraction, Journal of Computer Research and Development 52 (5) (2015) 1029. doi:10.7544/issn1000-1239.2015.20131550.

[46] S. Li, Z. Zhao, R. Hu, W. Li, T. Liu, X. Du, Analogical reasoning on chinese morphological and semantic relations, in: Proceedings of the 56th
Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, 2018, pp. 138–143. doi:10.18653/v1/P18-2023.

[47] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer, Neural architectures for named entity recognition, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, 2016, pp. 260–270. doi:10.18653/v1/N16-1030.

[48] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, Roberta: A robustly optimized bert pretraining approach, arXiv preprint arXiv:1907.11692.

[49] Y. Cui, W. Che, T. Liu, B. Qin, S. Wang, G. Hu, Revisiting pre-trained models for chinese natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, Association for Computational Linguistics, 2020, pp. 657–668. doi:10.18653/v1/2020.findings-emnlp.58.

[50] J. Wei, X. Ren, X. Li, W. Huang, Y. Liao, Y. Wang, J. Lin, X. Jiang, X. Chen, Q. Liu, Nezha: Neural contextualized representation for chinese language understanding, arXiv preprint arXiv:1909.00204.

[51] K. Clark, M.-T. Luong, Q. V. Le, C. D. Manning, Electra: Pre-training text encoders as discriminators rather than generators, in: Proceedings of the International Conference on Learning Representations, 2019.
... LLMs have suffered from serious hallucination issues (Sun et al., 2024;Wen et al., 2023;Li et al., 2023a). To solve the problem, researchers retrieve related knowledge to enhance the models (Wen et al., 2023;Lu et al., 2023;Wang et al., 2023b). ...
... LLMs have suffered from serious hallucination issues (Sun et al., 2024;Wen et al., 2023;Li et al., 2023a). To solve the problem, researchers retrieve related knowledge to enhance the models (Wen et al., 2023;Lu et al., 2023;Wang et al., 2023b). Firstly, several works get knowledge through search engines, they finetune models to imitate human's searching actions (Nakano et al., 2021) or use in-context learning to let the model generate API calls (Gao et al., 2023;Trivedi et al., 2023;Lu et al., 2023). ...
... LLMs have suffered from serious hallucination issues (Sun et al., 2024;Wen et al., 2023;Li et al., 2023a). To solve the problem, researchers retrieve related knowledge to enhance the models (Wen et al., 2023;Lu et al., 2023;Wang et al., 2023b). ...
... LLMs have suffered from serious hallucination issues (Sun et al., 2024;Wen et al., 2023;Li et al., 2023a). To solve the problem, researchers retrieve related knowledge to enhance the models (Wen et al., 2023;Lu et al., 2023;Wang et al., 2023b). Firstly, several works get knowledge through search engines, they finetune models to imitate human's searching actions (Nakano et al., 2021) or use in-context learning to let the model generate API calls (Gao et al., 2023;Trivedi et al., 2023;Lu et al., 2023). ...
Preprint
Large language models (LLMs) sometimes demonstrate poor performance on knowledge-intensive tasks, commonsense reasoning is one of them. Researchers typically address these issues by retrieving related knowledge from knowledge graphs or employing self-enhancement methods to elicit knowledge in LLMs. However, noisy knowledge and invalid reasoning issues hamper their ability to answer questions accurately. To this end, we propose a novel method named eliciting, filtering and integrating knowledge in large language model (LINKED). In it, we design a reward model to filter out the noisy knowledge and take the marginal consistent reasoning module to reduce invalid reasoning. With our comprehensive experiments on two complex commonsense reasoning benchmarks, our method outperforms SOTA baselines (up to 9.0% improvement of accuracy). Besides, to measure the positive and negative impact of the injected knowledge, we propose a new metric called effectiveness-preservation score for the knowledge enhancement works. Finally, through extensive experiments, we conduct an in-depth analysis and find many meaningful conclusions about LLMs in commonsense reasoning tasks.
... With the increase in scale, large language models (LLMs) have demonstrated outstanding performance in different tasks (Li et al., 2023;Wen et al., 2023;Sun et al., 2024;Jin et al., 2024a), among them, commonsense reasoning has received significant attention due to its importance for general intelligence (Wang et al., 2022(Wang et al., , 2024Liu et al., 2024). In this task, researchers have proposed a series of chain-of-thought (CoT) like techniques to elicit models' potential abilities (e.g. ...
... Thus, previous studies have proposed various approaches to improve the models' performance on RE under lowresource conditions. For example, some studies propose harnessing higher-resource data through methods such as Weakly Supervised Augmentation (Najafi and Fyshe, 2023), Multi-lingual Augmentation (Taghizadeh and Faili, 2022), and Auxiliary Knowledge Enhancement (Wen et al., 2023). Other studies exploit more robust models, employing techniques such as Meta Learning (Obamuyide and Johnston, 2022), Transfer Learning (Sun et al., 2021), and Prompt Learning (Hsu et al., 2023). ...
Preprint
Full-text available
Relation Extraction (RE) serves as a crucial technology for transforming unstructured text into structured information, especially within the framework of Knowledge Graph development. Its importance is emphasized by its essential role in various downstream tasks. Besides the conventional RE methods which are based on neural networks and pre-trained language models, large language models (LLMs) are also utilized in the research field of RE. However, on low-resource languages (LRLs), both conventional RE methods and LLM-based methods perform poorly on RE due to the data scarcity issues. To this end, this paper constructs low-resource relation extraction datasets in 10 LRLs in three regions (Central Asia, Southeast Asia and Middle East). The corpora are constructed by translating the original publicly available English RE datasets (NYT10, FewRel and CrossRE) using an effective multilingual machine translation. Then, we use the language perplexity (PPL) to filter out the low-quality data from the translated datasets. Finally, we conduct an empirical study and validate the performance of several open-source LLMs on these generated LRL RE datasets.
Article
Open Relation Extraction (ORE) task remains a challenge to obtain a semantic representation by discovering arbitrary relations from the unstructured text. Conventional methods heavily depend on feature engineering or syntactic parsing, which are inefficient or error-cascading. Recently, leveraging supervised deep learning methods to address the ORE task is a promising way. However, there are two main challenges: (1) The lack of enough labeled corpus to support supervised training; (2) The exploration of specific neural architecture that adapts to the characteristics of open relation extracting. In this paper, we build a large-scale, high-quality training corpus in a fully automated way. And wedesign a tagging scheme to assist in transforming the ORE task into a sequence tagging processing. Furthermore, we propose a hybrid neural network model (HNN4ORT) for open relation tagging. The model employs the Ordered Neurons LSTM to encode potential syntactic information to capture the associations among the arguments and relations. It also emerges a novel Dual Aware Mechanism, including Local-aware Attention and Global-aware Convolution. The dual aware nesses complement each other. Takes the sentence-level semantics as a global perspective, and at the same time, the model implements salient local features to achieve sparse annotation. Experiment results on various testing sets show that our model achieves state-of-the-art performance compared toconventional methods or other neural models.
Article
Bidirectional Encoder Representations from Transformers (BERT) has shown marvelous improvements across various NLP tasks, and its consecutive variants have been proposed to further improve the performance of the pre-trained language models. In this paper, we aim to first introduce the whole word masking (wwm) strategy for Chinese BERT, along with a series of Chinese pre-trained language models. Then we also propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways. Especially, we propose a new masking strategy called MLM as correction (Mac). To demonstrate the effectiveness of these models, we create a series of Chinese pre-trained language models as our baselines, including BERT, RoBERTa, ELECTRA, RBT, etc. We carried out extensive experiments on ten Chinese NLP tasks to evaluate the created Chinese pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT could achieve state-of-the-art performances on many NLP tasks, and we also ablate details with several findings that may help future research. We open-source our pre-trained language models for further facilitating our research community. 1</xref
Article
Effect provides a scientific principle-level means for product function realization. The unexpected or new application of effects can create high-level innovations enabling products long-term technical advantages and market competitiveness. Acquiring design knowledge is the vital first step of conducting product innovation activities. In order to capture the effect knowledge that can efficiently aid high-level product innovation, this article proposes a new computational method. The method stems from a novel effect knowledge representation considering both functional and technical area features, and utilizes functional-flow terms of Functional Basis and technical area categories of international patent classification (IPC) respectively to standardize the modelling of the two kinds of features. Based on such representation, the method reasonably combines syntactic analysis, WordNet and word vector technologies to extract the desired effect knowledge from IPC text. To evaluate the method, this article first compares the acquired knowledge with those in a comprehensive human-compiled effect database, and then applies the knowledge to aid the innovation design of several mechanical products with different technical backgrounds. Evaluation results and the discussion based on them suggest the feasibility and potential of the proposed method in automatically acquiring well-organized effect knowledge system, as well as in aiding high-level product innovation.
Chapter
Recent neural Open Information Extraction (OpenIE) models have improved traditional rule-based systems significantly for Chinese OpenIE tasks. However, these neural models are mainly word-based, suffering from word segmentation errors in Chinese. They utilize dependency information in a shallow way, making multi-hop dependencies hard to capture. This paper proposes a Multi-Grained Dependency Graph Neural Network (MGD-GNN) model to address these problems. MGD-GNN constructs a multi-grained dependency (MGD) graph with dependency edges between words and soft-segment edges between words and characters. Our model makes predictions based on character features while still has word boundary knowledge through word-character soft-segment edges. MGD-GNN updates node representations using a deep graph neural network to fully exploit the topology structure of the MGD graph and capture multi-hop dependencies. Experiments on a large-scale Chinese OpenIE dataset SpanSAOKE shows that our model could alleviate the propagation of word segmentation errors and use dependency information more effectively, giving significant improvements over previous neural OpenIE models.