Entity Overlapping Relation Extracting
Algorithm based on CNN and BERT
Yongqing Yang1, Siyuan Li2
1School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, Shaanxi 710126, China
2School of Science, Xi'an Jiaotong University, Xi’an, Shaanxi 710126, China
Corresponding author: Yongqing Yang (e-mail: yangyongqing@opt.ac.cn).
This work was supported by the National Key Research and Development Program of China (No. 2022YFF1300201) and the National Natural Science Foundation of China under Grants No. 42101380 and No. 62201568.
ABSTRACT Knowledge graphs show excellent application potential in natural language processing, but
extracting overlapping relations of entities in their construction poses a significant challenge. This paper
proposes an entity overlapping relation extraction algorithm based on a one-dimensional convolutional
neural network, which combines the local features of convolution and the context features of sequence,
enhances the feature concentration of data, and uses a cascaded decoding framework to solve the problem
of overlapping relation extraction effectively. The feasibility of the proposed method was verified on two
public NYT and WebNLG English datasets, and the experimental results show that the F1-score values of
this algorithm improved by 1.9% and 0.6%, respectively, making it significantly superior to similar algorithms.
INDEX TERMS Convolution Neural Network, Bidirectional Encoder Representations from Transformers,
Entity Relation Extraction
I. INTRODUCTION
As network technology advances rapidly, there is a
growing need to extract structured information from ever-
increasing massive, semi-structured data. Information extraction is the process of obtaining conceptual entities and the associative relations carrying semantic information from natural language text. This process involves two subtasks: entity
recognition and relation extraction. Entity recognition
aims to identify conceptual entities in a natural language
text using model algorithms, while relation extraction
seeks to assign the appropriate relations to the recognized
entities. Previous research implemented relation extraction
in a pipeline manner; entity recognition was performed
first, followed by relation extraction based on its results
[1]. However, the pipeline approach may propagate errors
from one subtask to another, affecting the overall
performance. Consequently, subsequent research adopted
a joint learning approach for the two subtasks of relation
extraction. Various methods have been proposed for
relation extraction, such as rule-based, distant
supervision-based, and deep learning-based. Rule-based
methods rely on manually crafted rules by domain experts
to match entities that conform to the rules from the text;
distant supervision-based methods leverage existing
databases to map data with entities and obtain relations
that match the entities in the database; still, this mapping
may cause semantic drift and introduce noise [2]; deep
learning-based methods formulate relation extraction as a
classification problem and train a supervised learning
model based on annotated data to perform relation
extraction. Sun et al. [3] combined the SURF (Speeded-Up Robust Features) algorithm and a CNN (Convolutional Neural Network) to classify three types of car seat backs: A, B, and C. Test results of their neural network-based car seat back detection method show that the system can effectively reduce labor costs and improve the detection efficiency of auto parts. Zhang et al. [4] constructed an
improved convolutional neural network. This network
retains more information and achieves better performance
in experimental evaluations by fusing forward and
backward information and the complementarity of
temporal and spatial features.
Many relations are missing in previous relation
classification tasks. Reference [5] proposed a new cascade binary tagging framework (CasRel) to address the problem of relation omission. Built on the pre-trained model BERT, the model consists of two binary tagging stages: one identifies head entities, and the other is a set of cascaded, relation-specific binary taggers. These two stages together
achieve the extraction of overlapping relations. Still, the model decodes directly from the encoded tagging framework and does not further process the features.
Natural language processing has witnessed rapid
progress in various tasks since the advent of
BERT(Bidirectional Encoder Representations from
Transformers) [6]. Before BERT, neural networks were
extensively employed in diverse domains to fulfill
different task objectives. Drawing on the insights of
neural networks and the CASREL framework, we pay
attention to the model's distinct phases of data processing
and utilize various methods and techniques to acquire
more abundant information and enhance the performance
of entity relation joint extraction. Yu et al. [7] proposed an end-to-end lightweight quantitative modeling framework based on ensemble convolutional neural networks.
model achieves excellent predictive performance while
maintaining interpretability. This advancement in
quantitative analysis could contribute to the field of
planetary science by significantly improving the accuracy
and understanding of Mars's geological composition
analysis.
Hence, this paper presents a network model, CASCNN (Cascade Binary Tagging Framework Based on Convolution), for relation extraction within the CASREL framework. The network model first acquires word vectors with contextual semantic information via the BERT pre-trained model, then employs a one-dimensional convolutional neural network (CNN) to extract and learn context-related features, and finally predicts the index positions of entities with binary classifiers: it computes the probability of each word being a start or end position and marks it as 1 if the probability exceeds a threshold, and 0 otherwise.
CASCNN is an end-to-end joint entity relationship
extraction model based on a one-dimensional
convolutional neural network. By combining CASREL
and CNN, the model can extract contextual features of
each word and capture local features. We integrate
convolutional networks into entity recognition and
relationship extraction frameworks, using attention-based
bidirectional encoders to capture contextual information.
By incorporating a convolutional network into the model,
our approach considers the context-specific and local
features of the text.
II. LITERATURE REVIEW
A dependency existed between relation extraction and
entity recognition in prior studies, and the extraction of
relations relied on identifying entities. Implementing the two
separately entails many issues, such as error accumulation.
The method of separating the two tasks neglects their interdependence; thus, relation extraction largely hinges on the outcome of entity recognition. When entity recognition accuracy is low, relation extraction will induce erroneous model fitting, leading to errors not only in entity recognition but also in relation extraction. Moreover, the serial mode cannot effectively handle overlapping relations. Hence, joint entity recognition and
relation extraction models have received considerable
attention recently.
Reference [7] presents an end-to-end model for joint entity and relation extraction, which pioneered the use of neural networks for information extraction tasks by sharing
the parameters of the encoding and sequence layers. Still, this model fails to capture long-term dependencies among labels in the entity extraction task.
Reference [8] simplifies the complexity of relation extraction by casting it as a sequence labeling problem, but it does not account for the issue of entity relation overlap. Reference [9]
abandons the multi-classification approach with mutually
exclusive relations for relation extraction and instead adopts
multiple independent binary classification tasks for each
relation. However, this model struggles with entities interrupted by other entities or non-entity tokens. Reference [10] formulates joint entity and relation extraction
as a multi-turn question-and-answer task, where a question-
and-answer template characterizes each entity and each
relation. This method effectively captures the hierarchical
dependencies among labels but hinges on prior knowledge
and question-and-answer templates. Reference [5] introduces
a cascaded binary tagging framework for the overlapping
relation problem. The crux of this framework is to model
relation extraction not as a classification problem for entity
pairs but as a mapping function from head entities to tail
entities. It first identifies head entities and then infers tail
entities under a given relation. Still, this framework does not exploit the features of the data sufficiently. The CASREL method is a sequence of binary classifiers designed to identify entities and their corresponding relationships in text. While adequate for specific tasks, CASREL faces challenges, particularly with overlapping entities, where the start and end of an entity may not be easily distinguishable.
This paper proposes a convolutional network-based
method for relation extraction. We utilize one-dimensional
convolutional networks to extract local and contextual
features from text data. It captures the context specificity of
each word and local features crucial for distinguishing
overlapping entities. We performed preliminary experiments
and achieved significant improvements. We also evaluated
our method on two public datasets to validate its
effectiveness. The experimental results demonstrate that our
proposed model can enhance the performance of joint
extraction tasks and outperform or match the state-of-the-art
methods on the NYT and WebNLG datasets.
III. INTRODUCTION TO THE MODEL
Figure 1 illustrates the extraction method proposed in this paper. The model structure comprises: (1) the BERT layer, which employs BERT to encode the word vectors with contextual semantic features; (2) the first convolutional layer (CNN1), which applies convolutional networks to extract local features from the encoded vectors for the entity tagging subtask; (3) the head entity tagger, which detects all possible head entities and labels their start and end positions; (4) the second convolutional layer (CNN2), which utilizes convolutional networks to extract local features for the overlapping-relation tail entity tagging subtask; and (5) the specific relation tail entity tagger, which comprises a series of relation-specific tail entity taggers that label the tail entity positions of the relations.
FIGURE 1. Convolutional neural network-based entity relationship joint
extraction framework.
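To make the five-stage pipeline concrete, the following is a minimal PyTorch sketch of how the components could be wired together. The paper does not publish code, so the module names, the hidden size of 768, the `bert-base-cased` checkpoint, and the use of the HuggingFace `transformers` library are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel  # assumed encoder backbone


class CASCNN(nn.Module):
    """Sketch of the five components in Figure 1 (names are hypothetical)."""

    def __init__(self, num_relations: int, hidden: int = 768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-cased")  # (1) BERT layer
        self.cnn1 = nn.Conv1d(hidden, hidden, 3, padding="same")  # (2) CNN1
        self.head_start = nn.Linear(hidden, 1)                    # (3) head tagger
        self.head_end = nn.Linear(hidden, 1)
        self.cnn2 = nn.Conv1d(hidden, hidden, 3, padding="same")  # (4) CNN2
        self.tail_start = nn.Linear(hidden, num_relations)        # (5) per-relation
        self.tail_end = nn.Linear(hidden, num_relations)          #     tail taggers

    def forward(self, input_ids, attention_mask, v_sub):
        # v_sub: averaged vector of one detected head entity, shape (batch, hidden)
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        z = torch.relu(self.cnn1(h.transpose(1, 2))).transpose(1, 2)  # local features
        p_hs = torch.sigmoid(self.head_start(z)).squeeze(-1)          # head starts
        p_he = torch.sigmoid(self.head_end(z)).squeeze(-1)            # head ends
        c_in = (z + v_sub.unsqueeze(1)).transpose(1, 2)               # fuse head entity
        c = torch.relu(self.cnn2(c_in)).transpose(1, 2)
        p_ts = torch.sigmoid(self.tail_start(c))                      # (batch, seq, rel)
        p_te = torch.sigmoid(self.tail_end(c))
        return p_hs, p_he, p_ts, p_te
```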
A. BERT encoder
The encoder module extracts a feature from the sentence
and inputs it to the head entity tagger module. We employ
the pre-trained model BERT [6] to encode the contextual
information. BERT has been proven effective in various
natural language processing tasks [11]. Specifically, BERT
comprises N identical transformer blocks that leverage the
self-attention mechanism to capture contextual information
better. The self-attention mechanism computes the relevance
and importance of each word in a sentence concerning all
other words. Based on these scores, new representations are
generated for each word. We denote the transformer block as
Trans(x), where x is the input vector.
$$h_0 = S W_s + W_p \tag{1}$$
$$h_\alpha = \mathrm{Trans}(h_{\alpha-1}), \quad \alpha \in [1, N] \tag{2}$$
where $S$ is the one-hot encoding vector matrix of each subword index obtained by tokenizing the input sentence, $W_s$ is the subword embedding matrix, $W_p$ is the position embedding matrix with $p$ denoting the position index in the input sequence, $h_\alpha$ is the hidden-state vector, i.e., the contextual representation of the input sentence at the $\alpha$-th layer, and $N$ is the number of transformer blocks. Note that the input here is a single sentence, not a sentence pair, so we do not consider the segment embedding vector described in the BERT paper for the input part.
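As an illustration of this encoding step, here is a minimal sketch assuming the HuggingFace `transformers` implementation of BERT; the paper does not name a specific library, and the checkpoint and example sentence are our choices.

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")  # assumed checkpoint
encoder = BertModel.from_pretrained("bert-base-cased")

# A single sentence is encoded, so the segment (token_type) embeddings are all
# zeros, matching the note above that segment embeddings are not used.
batch = tokenizer("The United States president lives in Washington.",
                  return_tensors="pt")
h_N = encoder(**batch).last_hidden_state  # output of Eq. (2): (1, seq_len, 768)
```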
B. CNN LAYER
Convolutional neural networks (CNNs) are a classic
model in the image domain, and they have also become
popular in the natural language processing domain in recent
years. In this paper, we insert a one-dimensional CNN layer
after the encoder layer and before each of the two decoder
frameworks to further extract the local features of the
semantics. The CNN layer of the former framework is for
extracting local features for the entity recognition task, and
the CNN layer of the latter framework is for extracting
local features from the concatenation of the head entity
tagger output and the BERT encoder output. Our goal is to
further capture local word-level features, so we set the
convolution kernel size to 3, which covers each word and
its immediate neighbors and can better exploit the adjacent
features of each word. The “padding” is set to “same” to
keep the CNN layer output dimension consistent with the
binary classifier's input dimension. Since our CNN has only
one layer and BERT has already encoded the contextual
features of the sentence, we do not use any pooling layer
here. The convolution operation is illustrated as follows:
$$z_{ij} = f\left(w_j^{1} * h_N(i) + b_j^{1}\right) \tag{3}$$
where $w_j^{1}$ is the convolution kernel of the convolution operation, $h_N(i)$ is the output of the BERT encoder for the $i$-th word, $i \in [1, \mathrm{seq\_len}]$, where seq_len is the sequence length, i.e., the text length, $N$ denotes the number of transformer blocks, $b_j^{1}$ is the bias term of the $j$-th feature map, $f$ is the non-linear activation function (here we choose the ReLU function for its high sparsity and its effectiveness against the vanishing gradient problem), and $z_{ij}$ denotes the feature map of the $i$-th output.
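A minimal sketch of this layer under our assumptions (PyTorch, hidden size 768 from a bert-base encoder); note that `nn.Conv1d` expects the channel dimension before the sequence dimension, hence the transposes:

```python
import torch
import torch.nn as nn

hidden = 768  # assumed BERT hidden size
cnn1 = nn.Conv1d(in_channels=hidden, out_channels=hidden,
                 kernel_size=3, padding="same")  # kernel 3, "same" padding, no pooling

h_N = torch.randn(2, 40, hidden)            # (batch, seq_len, hidden) from BERT
z = torch.relu(cnn1(h_N.transpose(1, 2)))   # Eq. (3) with f = ReLU
z = z.transpose(1, 2)                       # back to (batch, seq_len, hidden)
assert z.shape == h_N.shape                 # "same" padding preserves seq_len
```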
FIGURE 2. Convolutional process diagram.

Figure 2 illustrates the principle of the one-dimensional convolution operation used in this article as applied to text. In a one-dimensional convolution, the kernel size is specified along only one dimension, while the other dimension matches the length of the word encoding vector. Moreover, the convolution kernel moves sequentially along the text, unlike a two-dimensional convolution, which moves in both dimensions. In Figure 2, the 16x10 matrix denotes the text encoding vector, which is padded with one row at the top and one at the bottom to keep the output dimension constant. Excluding the first and last padding rows, each character has a 1x10 vector, giving 14 characters in total, including punctuation marks. Each row corresponds to a word encoding vector. The convolution kernel has a size of 3x10 and performs convolution operations on the text encoding vector representation to obtain the feature representation.
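The shapes in Figure 2 can be reproduced with a few lines (a sketch; the random input stands in for the real text encoding):

```python
import torch
import torch.nn as nn

# 14 tokens with 10-dimensional encodings, a single 3x10 kernel, and one
# padding row at each end of the text, as in Figure 2.
x = torch.randn(1, 10, 14)                          # (batch, encoding dim, tokens)
conv = nn.Conv1d(10, 1, kernel_size=3, padding=1)   # kernel spans all 10 dims
print(conv(x).shape)                                # torch.Size([1, 1, 14])
```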
The filter size determines the range of the receptive
field of the convolutional layer and affects the model's
ability to capture local features of the input data. We
experimentally compared the impact of filters of different
sizes on model performance to determine the most
appropriate receptive field size. At the same time, the
number of filters directly affects the complexity and
learning ability of the model. We adjusted the number of
filters, observed their impact on model performance, and
chose a balance point that captured sufficient information
while avoiding overfitting.
The depth of the network affects the level at which the
model learns features. Deeper networks can learn more
abstract features but are also more susceptible to vanishing
or exploding gradient problems. To find the optimal network depth, we gradually increased the number of convolutional layers and used residual connections to alleviate gradient problems in deep network training. With this approach, we aim to build a network structure capable of learning deep features without losing training stability.
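The paper does not spell out the exact block layout used in these depth experiments; one plausible sketch of a residual one-dimensional convolutional block is:

```python
import torch
import torch.nn as nn


class ResidualConvBlock(nn.Module):
    """Hypothetical residual block for stacking deeper CNN layers."""

    def __init__(self, hidden: int = 768):
        super().__init__()
        self.conv = nn.Conv1d(hidden, hidden, kernel_size=3, padding="same")

    def forward(self, x):                     # x: (batch, hidden, seq_len)
        return x + torch.relu(self.conv(x))   # skip connection eases gradient flow
```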
C. BINARY TAGGER
The cascaded binary decoding modules proposed in Reference [5] achieved significant improvement in solving the problem of overlapping relations, and we borrow this binary cascade decoding tagger here. The idea of this module is to first identify the head entities from the encoded text vector through the head entity tagger. For each head entity, the decoder then traverses all relations and identifies the corresponding tail entity under each relation. If a relation has no tail entity, the binary tagger marks the probability of each character position being a tail entity as 0.
Head entity tagger: This tagger identifies all potential head entities in the input sentence by decoding the feature vector produced by the convolution operation over the BERT-encoded text. It consists of two identical binary classifiers, which assign a binary label (0/1) to each word, indicating the likelihood of the current position being the start or end of a head entity. The head entity tagger performs the following operations for each word:
$$p_i^{start\_s} = \sigma(W_{start} z_i + b_{start}) \tag{4}$$
$$p_i^{end\_s} = \sigma(W_{end} z_i + b_{end}) \tag{5}$$
where $p_i^{start\_s}$ and $p_i^{end\_s}$ denote the probabilities of the $i$-th word in the input sequence being the start and end positions of a head entity, respectively. Based on a predefined threshold, the binary classifier indicates whether the word belongs to the head entity: the label is set to 1 if the probability is above the threshold and 0 otherwise. $z_i$ represents the encoding of the $i$-th word, $W_{start}$ and $W_{end}$ are trainable weights, $b_{start}$ and $b_{end}$ are bias terms, and $\sigma$ is the sigmoid activation function.
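In code, Eqs. (4)-(5) amount to two sigmoid classifiers applied position-wise; the sketch below assumes a threshold of 0.5, which the paper does not state:

```python
import torch
import torch.nn as nn

hidden, threshold = 768, 0.5      # threshold value is our assumption

start_fc = nn.Linear(hidden, 1)   # W_start, b_start in Eq. (4)
end_fc = nn.Linear(hidden, 1)     # W_end, b_end in Eq. (5)

z = torch.randn(1, 40, hidden)    # CNN1 output, (batch, seq_len, hidden)
p_start = torch.sigmoid(start_fc(z)).squeeze(-1)
p_end = torch.sigmoid(end_fc(z)).squeeze(-1)
start_tags = (p_start > threshold).long()   # 1 = start position of a head entity
end_tags = (p_end > threshold).long()       # 1 = end position of a head entity
```

At inference, one common heuristic (used by CasRel) pairs each predicted start with the nearest following predicted end to form an entity span.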
The specific relation tail entity tagger: This module consists of n sets of binary tagging frames, where n is the number of relations; this design effectively handles overlapping relations. The specific relation tail entity tagger processes each word as follows:
$$c_{ij} = f\left(w_j^{2} * (z_i + v_{sub}^{k}) + b_j^{2}\right) \tag{6}$$
$$p_i^{start\_o} = \sigma(W_{start}^{r} c_i + b_{start}^{r}) \tag{7}$$
$$p_i^{end\_o} = \sigma(W_{end}^{r} c_i + b_{end}^{r}) \tag{8}$$
where $w_j^{2}$ is the convolution kernel of the convolution operation, $b_j^{2}$ is the bias term of the $j$-th feature map, $z_i$ is the output of the CNN1 layer, $c_i$ is the output of the CNN2 layer, $i$ represents the $i$-th word in the sequence, and $f$ denotes the activation function, which is ReLU, the same as in the first CNN layer. $p_i^{start\_o}$ and $p_i^{end\_o}$ represent the probabilities that the $i$-th word in the input sequence is the start and the end position of the object, respectively. $v_{sub}^{k}$ represents the $k$-th head entity detected by the head entity tagger. A head entity usually consists of multiple words; to make the addition of $z_i$ and $v_{sub}^{k}$ in Eq. (6) possible, the two vectors must have identical dimensions. To achieve this, we take the average of the vectors of the beginning and ending words of the $k$-th head entity as $v_{sub}^{k}$.
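A sketch of Eqs. (6)-(8) for a single detected head entity follows; the tensor names and the relation count are illustrative (24 relation types as in NYT):

```python
import torch
import torch.nn as nn

hidden, num_rel, seq_len = 768, 24, 40   # 24 relation types as in NYT

cnn2 = nn.Conv1d(hidden, hidden, kernel_size=3, padding="same")
start_fc = nn.Linear(hidden, num_rel)    # one start tagger per relation
end_fc = nn.Linear(hidden, num_rel)      # one end tagger per relation

z = torch.randn(1, seq_len, hidden)      # CNN1 output
head = torch.randn(1, 2, hidden)         # start/end word vectors of the k-th head
v_sub = head.mean(dim=1, keepdim=True)   # average -> same dimension as each z_i

c = torch.relu(cnn2((z + v_sub).transpose(1, 2))).transpose(1, 2)  # Eq. (6)
p_start = torch.sigmoid(start_fc(c))     # Eq. (7): (batch, seq_len, num_rel)
p_end = torch.sigmoid(end_fc(c))         # Eq. (8)
```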
IV. EXPERIMENTS
In this section, we introduce the two English relation extraction datasets, then describe the baselines used for comparison, and conclude with an analysis of the experimental results.
A. DATA SETS
We evaluated the framework on two public relation
extraction datasets: NYT [12] and WebNLG [13]. The NYT
(New York Times) dataset is a widely used evaluation
dataset in current academic research. It is constructed from
the Freebase knowledge base and the New York Times
corpus. At the same time, the Stanford NER tool was used
to align the entity mentions in the text corpus with the
entities in Freebase to avoid noise caused by homonyms. It
contains 24 types of relations. WebNLG is a dataset
generated with knowledge from the DBPedia knowledge
base, which contains 6 categories (astronauts, buildings,
monuments, universities, sports teams, and works), and the
dataset contains 246 valid relations. Among them, NYT
contains 56,195 sentences for training, 4,999 sentences for
verification and 5,000 sentences for testing; WebNLG
contains 5,019 sentences for training, 500 sentences for
verification, and 703 sentences for testing. The statistics of
the two datasets are shown in Table 1.
The NYT dataset is geographically and culturally
biased due to its source. In contrast, the WebNLG dataset
requires attention to gender, racial, or cultural biases that
may arise when converting structured data into text. These
biases may affect the model's generalization ability,
especially when dealing with diverse and real-world
scenarios. To improve the model's generalization, we
introduce a wider range of data sources in model training
and adopt strategies such as data augmentation and fairness
constraints to reduce bias. In addition, model transparency
and interpretability are equally important, which helps
increase user trust in the model output.
TABLE I
NYT AND WEBNLG DATASETS

Category   NYT Train   NYT Test   WebNLG Train   WebNLG Test
Normal         37013       3266           1596           246
EPO             9782        978            227            26
SEO            14735       1297           3406           457
ALL            61530       5541           5229           729
B. EXPERIMENTAL RESULTS
We compared the most advanced methods with previous classical methods: NovelTagging [15], CopyR [16], GraphRel [17], CopyRRL [18], and CasRel [5]. The results of these baselines were taken directly from the original papers.
A strictly controlled, component-by-component comparison of CASCNN against CASREL is not possible, but CASCNN inherits CASREL's decoding function. CASREL's sequential processing method allows a deeper understanding of the sentence context, which is particularly advantageous in areas such as general-purpose text analysis.
TABLE II
EXPERIMENTAL RESULTS OF DIFFERENT BASELINE MODELS ON NYT AND WEBNLG DATASETS

                          |        NYT         |       WebNLG
Method                    | Prec   Rec    F1   | Prec   Rec    F1
NovelTagging [15]         | 77.9   67.2   72.1 | 63.3   59.9   61.6
CopyR-OneDecoder [16]     | 57.4   56.4   56.9 | 29.7   27.5   28.5
CopyR-MultiDecoder [16]   | 60.0   56.6   58.7 | 35.3   38.2   36.7
GraphRel-1p [17]          | 65.9   57.3   60.4 | 38.7   41.3   39.9
GraphRel-2p [17]          | 58.9   65.0   61.8 | 46.9   38.6   42.9
CopyRRL [18]              | 74.9   71.2   73.0 | 60.2   62.5   61.4
CopyRRL [5]               | 74.8   68.4   71.5 | 63.5   56.6   59.8
CASREL [5]                | 89.7   89.5   89.6 | 93.4   90.1   91.8
CASCNN                    | 91.7   91.2   91.5 | 93.0   91.9   92.4
Table 2 shows the results of different baselines for triple relation extraction on the two datasets. The CASCNN (Cascade Binary Tagging Framework Based on Convolution) model achieved F1 scores (the harmonic mean of precision and recall) on the NYT and WebNLG datasets that were 1.9% and 0.6% higher, respectively, than the most advanced method in Reference [5], and was only slightly lower in precision than the current state-of-the-art algorithm while being superior in all other indicators.
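For reference, triple-level scores in this line of work are typically computed by exact match: a predicted triple counts as correct only if its head entity, relation, and tail entity all match a gold triple. The paper does not detail its matching rule, so the sketch below assumes this convention:

```python
def score_triples(pred: set, gold: set):
    """Exact-match precision, recall, and F1 over (head, relation, tail) triples."""
    correct = len(pred & gold)
    prec = correct / len(pred) if pred else 0.0
    rec = correct / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


p, r, f1 = score_triples({("Obama", "born_in", "Hawaii")},
                         {("Obama", "born_in", "Hawaii"),
                          ("Obama", "president_of", "USA")})
# p = 1.0, r = 0.5, f1 = 0.667: one of two gold triples was recovered.
```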
From the experimental results in Table 2, we observe a
significant difference in the performance of existing models
on the NYT and WebNLG datasets. Compared with other
algorithms, the model's performance on the NYT dataset is
better than that on the WebNLG dataset. We attribute this
discrepancy to the different sample sizes and numbers of
overlapping relations in the two datasets. We find from
Table 1 that the training samples of NYT are about 12 times
that of WebNLG. The NYT dataset primarily consists of
sentences of the Normal class, while most of the sentences
in the WebNLG dataset are of overlapping relation types.
The performance difference between the two datasets lies in
the inconsistent data distribution and the data volume size.
FIGURE 3. F1-score values of different overlapping types
We further investigated the model’s performance on
overlapping relations by conducting additional experiments
on various types of overlaps, such as Normal, Single Entity
Overlap (SEO), and Entity Pair Overlap (EPO). Figure 3
shows that most baseline models exhibit slow improvement
or decline under the three modes, suggesting that triple
extraction becomes increasingly challenging in different
sentence overlaps. The Normal type is relatively
straightforward to extract, while the Entity Pair Overlap and
Single Entity Overlap types pose more difficulties.
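Following the usual definitions, a sentence is EPO if two of its triples share the same entity pair and SEO if two triples share exactly one entity. A sketch of this categorization follows; counting a sentence that matches both patterns as EPO, and treating reversed pairs as identical, are our conventions:

```python
def overlap_type(triples):
    """Classify a sentence as Normal / EPO / SEO from its gold triples."""
    pairs = [frozenset((h, t)) for h, _, t in triples]
    entities = [e for h, _, t in triples for e in (h, t)]
    if len(pairs) != len(set(pairs)):
        return "EPO"     # two triples share the same entity pair
    if len(entities) != len(set(entities)):
        return "SEO"     # two triples share a single entity
    return "Normal"


print(overlap_type([("A", "r1", "B"), ("A", "r2", "B")]))  # EPO
print(overlap_type([("A", "r1", "B"), ("A", "r2", "C")]))  # SEO
```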
FIGURE 4. F1-scores of different numbers of triples
We further evaluated the improved method's
performance in extracting relations from sentences with
varying numbers of relations. As shown in Figure 4, we
categorized the training data into five groups, each
corresponding to the number of triples in a sentence. We
observed that the baseline models performed worse as the
number of triples in a sentence increased, indicating that the
previous baseline models struggled with overlapping triples.
The framework we adopted achieved significant
improvements in handling overlapping relations, and our
improved model also enhanced its performance. The
comparison of the two experiments demonstrated that the
method could extract multiple triples from a complex
sentence and effectively boost the model's performance
compared with the existing methods. With a large number of training samples, the previous models improved steadily but slowly in processing normal sentences. However, compared to the baseline models, our framework improves results even under large variance in sample size.
FIGURE 5. Loss reduction graph of CASCNN model on two datasets.
As shown in Figure 5, our model clearly achieved a
more rapid loss decline and better fitting. Our improvement
enables the model to reach a stable loss function more
quickly. Our model's superior fitting is a testament to its
ability to accurately capture the underlying patterns and
relationships within the data. This is particularly important
for predictive models, where the accuracy of the model's
predictions is directly linked to how well it can fit the
training data without overfitting. By achieving better fitting,
we are not only improving the model's predictive power but
also ensuring that it generalizes well to unseen data.
Furthermore, our improvement in the model's
performance also translates to its ability to reach a stable
loss function more quickly. A stable loss function is
indicative of the model having found a robust and reliable
solution to the problem at hand. The quicker attainment of
this stability is beneficial for several reasons. Firstly, it
reduces the computational resources required for training
the model, as the model does not need to iterate through as
many epochs to reach convergence. Secondly, it allows for
faster experimentation and iteration during the model
development process, as we can assess the model's
performance and make necessary adjustments more rapidly.
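The paper does not print its training objective; the curves in Figure 5 are consistent with the binary cross-entropy objective used by CasRel, summed over the start/end taggers of both decoding stages. A sketch under that assumption:

```python
import torch.nn.functional as F


def tagging_loss(p_hs, p_he, p_ts, p_te, y_hs, y_he, y_ts, y_te):
    """Sum of binary cross-entropies over all four taggers (assumed objective)."""
    return (F.binary_cross_entropy(p_hs, y_hs) +   # head entity starts
            F.binary_cross_entropy(p_he, y_he) +   # head entity ends
            F.binary_cross_entropy(p_ts, y_ts) +   # relation-specific tail starts
            F.binary_cross_entropy(p_te, y_te))    # relation-specific tail ends
```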
V. CONCLUSION
This paper presents an end-to-end joint entity relation
extraction model based on a one-dimensional convolutional
network. Combining CASREL and CNN, the model extracts not only each word's contextual features but also its local features. We integrate the convolutional network into the entity recognition and relation extraction framework, extracting local features from the encodings of the two subtasks, and employ an attention-based bidirectional
encoder to capture the contextual information. With the
addition of a convolutional network to the model, it
considers both the specificity of the text context and the
local features of the text. We evaluated our method on two
public datasets and showed that it outperformed the state-
of-the-art methods by 1.9% and 0.6%, respectively, with
similar training epochs.
However, despite these advances, our model has some
limitations. For example, model interpretability remains
challenging, and further work is needed to improve our
understanding of model decision-making processes. In the
future, we aim to improve the model's interpretability and
computational efficiency through architectural optimization.
We are committed to improving not only the performance
of our models but also their reliability and transparency in
real-world applications through ongoing research.
ACKNOWLEDGEMENT
This work was supported by the National Key Research and Development Program of China (No. 2022YFF1300201) and the National Natural Science Foundation of China under Grants No. 42101380 and No. 62201568. We also thank Tao Tao for his help and advice on algorithms and data analysis.
REFERENCES
[1] Cícero dos Santos, Bing Xiang, and Bowen Zhou. 2015. Classifying relations by ranking with convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 626–634.
[2] Mike Mintz, Steven Bills, Rion Snow, et al. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore. Association for Computational Linguistics.
[3] Sun S., Huang J., Zhu J., et al. 2022. Research on both the classification and quality control methods of the car seat backrest based on machine vision. Wireless Communications and Mobile Computing, vol. 2022.
[4] Zhang R., Zhang L., Yu T., et al. 2021. Feature fusion CNN-LSTM network based gait recognition on covariate of clothing and bag. In 2021 3rd International Academic Exchange Conference on Science and Technology Innovation (IAECST), pages 304–307. IEEE.
[5] Zhepei Wei, Jianlin Su, Yue Wang, Yuan Tian, and Yi Chang. 2020. A novel cascade binary tagging framework for relational triple extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1476–1488.
[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[7] Makoto Miwa and Mohit Bansal. 2016. End-to-end relation extraction using LSTMs on sequences and tree structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1105–1116.
[8] Suncong Zheng, Feng Wang, et al. 2017. Joint extraction of entities and relations based on a novel tagging scheme. arXiv preprint arXiv:1706.05075.
[9] Giannis Bekoulis, Johannes Deleu, et al. 2018. Joint entity recognition and relation extraction as a multi-head selection problem. Expert Systems with Applications, pages 34–45.
[10] Xiaoya Li, Fan Yin, Zijun Sun, et al. 2019. Entity-relation extraction as multi-turn question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1340–1350.
[11] Peixiang Zhong, Di Wang, and Chunyan Miao. 2019. Knowledge-enriched transformer for emotion detection in textual conversations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 165–176.
[12] Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 148–163.
[13] Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. Creating training corpora for NLG micro-planners. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 179–188.
[14] Xiangrong Zeng, Daojian Zeng, Shizhu He, Kang Liu, and Jun Zhao. 2018. Extracting relational facts by an end-to-end neural model with copy mechanism. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 506–514.
[15] Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, and Bo Xu. 2017. Joint extraction of entities and relations based on a novel tagging scheme. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1227–1236.
[16] Xiangrong Zeng, Daojian Zeng, Shizhu He, Kang Liu, and Jun Zhao. 2018. Extracting relational facts by an end-to-end neural model with copy mechanism. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 506–514.
[17] Tsu-Jui Fu, Peng-Hsuan Li, and Wei-Yun Ma. 2019. GraphRel: Modeling text as relational graphs for joint entity and relation extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1409–1418.
[18] Xiangrong Zeng, Shizhu He, Daojian Zeng, Kang Liu, Shengping Liu, and Jun Zhao. 2019. Learning the extraction order of multiple relational facts in a sentence with reinforcement learning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 367–377.
[19] Ryuichi Takanobu, Tianyang Zhang, Jiexi Liu, and Minlie Huang. 2019. A hierarchical framework for relation extraction with reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7072–7079.
[20] Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 541–550.
Yang Yongqing was born in December 1986. He is working toward the Ph.D. degree in the School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, Shaanxi, China. He also serves the Key Laboratory of Spectral Imaging Technology, Chinese Academy of Sciences. His current research interests include the design and simulation of space-based imaging systems and image processing.
Li Siyuan was born in December 1982. He serves the Key Laboratory of Spectral Imaging Technology, Chinese Academy of Sciences. He is now a Ph.D. candidate in the School of Science, Xi'an Jiaotong University, Xi'an, Shaanxi, China. His current research interests mainly focus on hyperspectral imaging.