Intelligent Combination of Approaches Towards
Improved Bangla Text Summarization
Alam Khan∗, Sanjida Akter Ishita†, Fariha Zaman‡, Ashiqul Islam Ashik§ and Md Moinul Hoque¶
Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Dhaka
Bangladesh
Email: ∗alamkhaan1997@gmail.com, †sanjidaakterishita@gmail.com, ‡fariha.aust.99@gmail.com,
§ashiqulislam170204070@gmail.com, ¶moinul@aust.edu
Abstract—Text summarization is a technique to extract
the main concept from a large document. It turns a big
document into a smaller one without changing the main
context. Text summarization is a widely researched area
nowadays. There are two types of text summarization:
one generates an extractive summary and the other
generates an abstractive summary. In this paper, an
intelligent model is proposed which can make an extractive
summary from a given document. After completing some
preprocessing steps on the document, some useful
combinations of methods are applied, such as Named
Entity-based scoring, keyword-based scoring, parts-of-
speech-based scoring, and word- and sentence-based
analysis, to rank the sentences of the passage. These
methods combined together generate the final summary.
The proposed model is compared with multiple human-made
summaries, and the evaluation was performed with respect
to precision, recall, and F-measure. The model is also
compared with state-of-the-art approaches and shows its
effectiveness with respect to the Precision (0.606) and
F-measure (0.6177) evaluation measures.
Index Terms—Extractive summary, data preprocess-
ing, TF-IDF, Sentence Scoring, Keyword Scoring, POS
Tagging, Positional Value
I. Introduction
Bangla text summarization has a wide research scope.
However, there is no proper dataset for Bangla
text summarization that can be considered sufficient. We
composed and used four data sets, where each dataset
has eight categories and each category has a total of
30 passages. Each passage has 3 different human-made
summaries. We had a total of 960 passages and 2880
summaries for the evaluation process. Moreover, it has
been checked that every passage is unique. We have also
made a tool using Python (the tkinter framework) to gather
data and generate human-made summaries with the help
of random volunteers. We have categorized the news
documents into 8 categories in our own dataset. These
are: 1. Accident, 2. Bangladesh, 3. Crime, 4. Economics,
5. Entertainment, 6. International, 7. Politics, 8. Sports.
We have collected the passages from online news portals,
and some passages were collected by web scraping from
online newspaper websites. Each passage was checked to
ensure it is not used in more than one document and is
relevant to its category. Some summaries were generated
manually
and some by using our own desktop app built with the
Python tkinter interface, which can generate human-made
summaries quickly and accurately. We have manually
checked the passages and summaries to verify that they
contain the central idea of the source document. A check
was performed to see whether each summary covers 40% of
the source document. In addition, we have used the
standard BNLP dataset [1] for comparing the effectiveness
of our system with other similar systems. The next section
describes the state of the art for the presented problem.
II. Related Works
As English is an international language, most research
work has been done on the English language. However,
there is also some research on the Bangla language.
Kamal Sarkar [2] proposed a model for extractive
summarization of a passage. Three major steps were used
to construct the model, and finally the model produced a
machine-generated extractive summary of the passage.
Sentences were scored using TF-IDF.
The score of a sentence k, S_k, is calculated as per
equation (1):

S_k = ∑_w TFIDF(w, k)   (1)
The generated summary follows the sequential order of the
passage. Recently, Kamal Sarkar [3] published a research
article on unsupervised Bengali text summarization, in
which a spectral clustering algorithm is applied to identify
the various topics covered in a Bengali document, and a
summary is generated by selecting significant sentences
from the identified topics (Rouge-1 recall score of 0.4481).
Jingzhou Liu [4] approached a graph-based single-
document unsupervised extractive method that constructs
a distance-augmented sentence graph from a document
that enables the model to perform more fine-grained
modeling of sentences and better characterize the original
document structures. The model uses an automatically
constructed sentence graph from each document to select
sentences for summarization based on both the similarities
and the relative distances in the neighborhood of each
sentence. Although their approach was novel, the quality
of the generated summaries was poor.

2023 International Conference on Applied Intelligence and Sustainable Computing (ICAISC)
979-8-3503-2379-5/23/$31.00 ©2023 IEEE
Tumpa et al. [5] discussed an improved extractive
summarization technique in which the summary is generated
using the K-means clustering algorithm, which lacks
synchronization.
Chandro et al. [6] discussed extraction-based
summarization techniques that combine individual word
and sentence scoring. Passages for experimentation were
collected from different popular Bengali daily newspapers.
Uddin and Khan [7] presented an approach where they
have given importance to sentence location, cue phrase
presence, title word presence, term frequency, and
numerical data. They have focused on the importance of the
first and last sentences of the passages. They achieved an
average accuracy of 71.3 percent.
Junan Zhu [8] presented a neural system combination
strategy for sentence summarization using the sequence-
to-sequence model and its encoder-decoder framework. In
this process, bidirectional Gated Recurrent Units are used.
Mahimul Islam [9] described a model of hybrid Bangla
Text Summarization. Three methods were used to gen-
erate a summary. The top 40 percent of the actual
passage was selected as a generated summary based on
the combined weighted score of Sentiment Score, Keyword
Ranking, and Text Ranking. The average precision, recall,
and F-measure scores are 0.57, 0.77, and 0.64, respectively.
Mallick et al. [10] presented a lexical-chain-based
approach for extractive summarization using WordNet.
Their F1-score (Rouge-2) was 0.490, which could have been
improved had they used anaphora and cataphora resolution
in their semantic relationship detection, which would have
created a much better lexical chain.
Gamal et al. [11] used Chicken Swarm Optimization
and a Genetic Algorithm for text summarization, with flaws
such as slow convergence speed and low text summary
accuracy.
Shehab Abdel-Salam [12] proposed a BERT-base model
for extractive text summarization. Different sizes of BERT
were considered, for instance BERT-base with 12 encoders
and BERT-large with 24 encoders, but they focused on
BERT-base. Unfortunately, there was no hyperparameter
tuning to generate a better summary.
III. Proposed Model
In our model, we have added a Named Entity Recogni-
tion (Banner Model) based scoring and Parts of Speech
based scoring along with an improved Keyword based
scoring (TF-IDF) and scoring based on sentence positions.
Each of the approaches was assigned a seed weight value
initially. The weight values were optimized to retrieve the
best possible set of sentences in the final summary. We
take forty percent of the sentences from the input text into
our machine-generated summary. The steps of the model
(Figure 1) are explained in the subsequent subsections.

Figure 1. The Proposed Model for the Bengali Text Summarization.
A. Input Pre-processing Layer
In this step, we eliminate from the input passage
characters such as '—', '!', '"', ',', and '/'. Phrases like
"Title" and extra blank or white spaces have also been
removed in this layer.
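A minimal sketch of this cleaning step, assuming plain string input (the removed character set follows the list above; treating "Title" as a literal stop phrase is an assumption):

```python
import re

def preprocess(text):
    """Strip the stop phrase, the listed punctuation, and extra whitespace."""
    text = text.replace("Title", "")          # stop phrase (illustrative)
    text = re.sub(r"[—!\"',/]", " ", text)    # characters listed above
    return re.sub(r"\s+", " ", text).strip()  # collapse blank/white space
```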
B. Category Labeling Layer
There are 8 categories of passages in our dataset as
mentioned in the introduction section. In this layer, the
input passages are categorized and labeled to be one of
those eight types.
C. Task-Specific Pre-processing Layer
Every document then passes through this task-specific
pre-processing layer. Some documents may need different
preprocessing, based on their category label, for the
sentence scoring step. For example, in the Named Entity
Recognition-based scoring approach, we do not stem
sentences before counting the named entities in a sentence.
We also removed stop words with a tool that is available
online [13] to increase the performance of our model for
some types of scoring approaches, as discussed in a later
section.
Our five approaches for sentence scoring are described
in the following subsections.
1) Sentence Scoring: Sentence scoring is a scoring
criterion based on a similarity-based ranking model which
can be used to find the most relevant sentences in a
passage. This model tokenizes each sentence of the training
dataset and converts it into a sentence vector. We trained a
model with a dataset of 3,248,295 sentences containing
42,466,428 words, of which 461,498 are unique.
Here we used a word2vec model pre-trained on a Bengali
news dataset, tuned it with our dataset, and used it to
generate the summary. Each sentence is represented as a
vector. After that, each vector is compared with all other
vectors present in the text, and the similarity score of each
pair is calculated via the cosine similarity technique.
SentenceScore = (V1 · V2) / (|V1| |V2|)   (2)
Here,
V1 = Vector representation of sentence 1
V2 = Vector representation of sentence 2
A sample scoring sketch using gensim (the model file name
is illustrative):

import gensim
model = gensim.models.Word2Vec.load("bn_word2vec.model")
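The cosine similarity of equation (2) can be sketched as follows, assuming each sentence vector is already obtained from the tuned word2vec model (e.g., by averaging its word vectors):

```python
import numpy as np

def sentence_score(v1, v2):
    """Cosine similarity between two sentence vectors (equation 2)."""
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def rank_sentences(vectors):
    """Score each sentence by summing its similarity to all other sentences."""
    scores = []
    for i, vi in enumerate(vectors):
        scores.append(sum(sentence_score(vi, vj)
                          for j, vj in enumerate(vectors) if j != i))
    return scores
```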
2) Named Entity Recognition: Named Entity Recogni-
tion (NER) is a technique to identify the named entities
from a chunk of words or a sentence. Here the main idea is
to identify the person, location, organization, and object
from the passage sentence. There are several techniques
for NER tagging. Here we have used the BIOES technique
for tagging the named entities.
Before NER tagging, we have:
After NER tagging, we get:
The NER score of the above sentence is 5.
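A sketch of how an NER score might be computed from BIOES tags (the tag strings and the counting rule are assumptions; in BIOES, each entity begins with a B- tag or is a single S- tag):

```python
def ner_score(tags):
    """Count named entities in a BIOES-tagged sentence.

    Each entity either starts with a B-* tag (a multi-token span
    ending in E-*) or is a single-token S-* tag.
    """
    return sum(1 for t in tags if t.startswith(("B-", "S-")))
```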
3) Keyword Scoring: A word whose appearance makes a
sentence valuable is called a keyword. A sentence with
more keywords has a higher chance of being in the
summary, so detecting and scoring keywords is important.
For the keyword scoring, we first preprocessed our dataset
with the task-specific preprocessing layer, e.g., by
eliminating stop words. Then we applied the TF-IDF
scoring method from Kamal Sarkar's approach [14]. Here,
the Term Frequency value is measured by the repetition of
a particular word in a document. The IDF (Inverse
Document Frequency) value is computed as log(N/df),
where N is the number of documents in the dataset and df
is the document frequency (the number of documents in
which the word occurs). After calculating the values of all
keywords in a sentence, we add them to find the overall
score of the sentence. After scoring all the sentences, we
take at most 40% of the candidate sentences with the
highest TF-IDF scores from the passage to construct the
summary.
An example of the keyword-based scoring method:
Here, the score of the above sentence using keyword
scoring (TF-IDF), after tuning the model with our dataset, is:
The final sentence score is: 0.1037
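The TF-IDF computation described above can be sketched as follows (a simplified illustration using whitespace tokenization; the actual system operates on preprocessed Bengali text):

```python
import math
from collections import Counter

def tfidf_sentence_scores(documents, sentences):
    """Score each sentence as the sum of TF-IDF values of its words.

    TF is the count of the word in the current document;
    IDF is log(N / df) over the background document set.
    """
    n_docs = len(documents)
    # Document frequency: number of documents containing each word.
    df = Counter()
    for doc in documents:
        df.update(set(doc.split()))

    current_doc = " ".join(sentences)
    tf = Counter(current_doc.split())

    scores = []
    for sent in sentences:
        score = sum(tf[w] * math.log(n_docs / df[w])
                    for w in sent.split() if df[w] > 0)
        scores.append(score)
    return scores
```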
4) Sentence Scoring based on Parts of Speech Tagging:
The parts of speech in a sentence show us how the words
relate to each other. We have used a model for tagging
parts of speech (POS) developed by Sagor Sarker, a
pre-trained CRF-based model, to detect POS tags such as
nouns, pronouns, adjectives, and verbs in a sentence.
An example of the POS-based scoring method is shown
below.
After applying the POS tagger model, the above sentence
looks as below:
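The text does not specify how POS tags are turned into a sentence score; one plausible sketch, counting content-word tags (the tag names here are illustrative), is:

```python
# Hypothetical weighting of POS tags; the actual mapping used by the
# model is not specified in the text.
CONTENT_TAGS = {"NN", "NNP", "VB", "JJ", "PR"}

def pos_score(tagged_sentence):
    """Score a sentence by counting content-word POS tags.

    `tagged_sentence` is a list of (word, tag) pairs, the usual
    output shape of CRF-based POS taggers.
    """
    return sum(1 for _, tag in tagged_sentence if tag in CONTENT_TAGS)
```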
5) Scoring based on Positional Value: The position of a
sentence in a large text plays a very significant role in
extractive summarization. For this reason, we assigned
scores to the sentences of a passage according to their
positions. For example, for the News Article category, we
observed that the first sentence is the title of the whole
news article, so the first sentence always appears in the
gold summaries. For this reason, we applied the scoring
method in equation (3):
Positional Value = sqrt(len) / line²   (3)

where len is the total number of sentences (lines) in the
passage and line is the position of the sentence.
For a better understanding, suppose a passage has 14
lines. Then:
Positional value of sentence 1 = sqrt(14)/1² = 3.741657
Positional value of sentence 2 = sqrt(14)/2² = 0.935414
...
Positional value of sentence 14 = sqrt(14)/14² = 0.019090
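Equation (3) can be sketched directly:

```python
import math

def positional_values(num_sentences):
    """Positional score per equation (3): sqrt(len) / position^2."""
    root = math.sqrt(num_sentences)
    return [root / (pos * pos) for pos in range(1, num_sentences + 1)]
```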
6) Final Score Generation for a Sentence: After getting
candidate summary texts from each of the five above-
mentioned approaches, we multiply the scores of the
sentences coming from the different approaches by
different weight values, and thus the final score of a
candidate sentence is calculated (equation 4). Finally, the
top 40% of unique sentences are merged to form the final
summary.

Final Score of a candidate sentence =
(W1 * Keyword Scoring) + (W2 * NER Score)
+ (W3 * POS tagger score) + (W4 * Sentence Scoring)
+ (W5 * Positional Scoring)
(4)
Here,
W1= Optimized weight of Keyword Scoring
W2= Optimized weight of Named Entity Recognition
Scoring
W3= Optimized weight of POS tagger Scoring
W4= Optimized weight of Sentence Scoring
W5= Optimized weight of Positional Scoring
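Equation (4) is a weighted sum per sentence; a minimal sketch, using the optimized weight values reported in the weight optimization step:

```python
# Optimized weights reported in the paper: W1=2, W2=1, W3=6, W4=1, W5=7.
WEIGHTS = {"keyword": 2, "ner": 1, "pos": 6, "sentence": 1, "positional": 7}

def final_scores(scores_by_method):
    """Combine per-method sentence scores into final scores (equation 4).

    `scores_by_method` maps a method name to a list with one score
    per sentence; all lists must have the same length.
    """
    n = len(next(iter(scores_by_method.values())))
    return [sum(WEIGHTS[m] * scores_by_method[m][i] for m in WEIGHTS)
            for i in range(n)]
```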
7) Weight Optimization: Initially, we assumed
W1=W2=W3=W4=W5=1, and then we repeatedly applied a
regression-style search, increasing the weight values (W1
to W5) until the summary generation score started to
decline on the candidate test set. After several iterations,
we found the optimized weights to be W1=2, W2=1, W3=6,
W4=1, W5=7 for our dataset, which gave the best result.
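The weight search is described only loosely; a coordinate-ascent sketch consistent with that description (increase one weight at a time until the evaluation score declines; `evaluate` is a hypothetical scoring callback) is:

```python
def optimize_weights(evaluate, n_weights=5, max_value=10):
    """Greedy coordinate ascent over integer weights.

    `evaluate(weights)` returns the summary evaluation score
    (e.g. a Rouge F-measure) for a candidate weight vector.
    """
    weights = [1] * n_weights
    for i in range(n_weights):
        best = evaluate(weights)
        while weights[i] < max_value:
            weights[i] += 1
            score = evaluate(weights)
            if score <= best:
                weights[i] -= 1  # score declined; revert and stop
                break
            best = score
    return weights
```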
IV. Results and Discussion
Our approach to finding human-like summaries by
machine is tested with supporting experiments. For the
experiments, we split our dataset into 4 parts. For
each part, we show the experimental results, and then we
combine the whole results.
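The Rouge-1 precision, recall, and F-measure values reported below are based on unigram overlap; a minimal sketch against a single reference summary:

```python
from collections import Counter

def rouge1(system_tokens, reference_tokens):
    """Rouge-1 precision, recall and F-measure via unigram overlap."""
    sys_counts = Counter(system_tokens)
    ref_counts = Counter(reference_tokens)
    overlap = sum((sys_counts & ref_counts).values())
    precision = overlap / max(len(system_tokens), 1)
    recall = overlap / max(len(reference_tokens), 1)
    f = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return precision, recall, f
```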
A. Evaluation of Named Entity Recognition-based summary
generation
Table I shows the Rouge-1 scores of the generated
summaries compared with the gold-standard summaries.
Table II shows the results based on Banner's BERT-based
NER identifier.
Table I
Rouge-1 Scores based on Sagor Sarker's CRF-based NER
identifier only
Model F-Measure Precision Recall
Haque [15] 0.6166 0.5757 0.6819
Kamal Sarkar [16] 0.5496 0.5603 0.5515
Mahimul Islam [9] 0.6487 0.5658 0.7745
Our Model 0.5854 0.6153 0.5730
Table II
Rouge-1 Scores based on Banner’s BERT-based NER
identifier
Model F-Measure Precision Recall
Haque [15] 0.6166 0.5757 0.6819
Kamal Sarkar [16] 0.5496 0.5603 0.5515
Mahimul Islam [9] 0.6487 0.5658 0.7745
Our Model 0.6645 0.6561 0.6849
From the results, we can see that Banner's BERT-based
NER identifier performed better than Sagor Sarker's CRF-
based NER identifier [17]. We have thus selected Banner's
model for this purpose. Note that this comparison is based
on the BNLPC dataset.
B. Evaluation of the Parts of Speech tagging Model
In this section, we compare two models and determine
which model fits our dataset best. The two models used to
tune the dataset are:
1. Bengali parts-of-speech tagger by Sagor Sarker [18].
2. Bengali parts-of-speech tagger by ashwoolford/bnltk
[19].
The comparison of these two POS tagging models
(based on Rouge-2) on our dataset is shown in Table III
and Table IV.
Table III
Rouge-2 score of the POS tagger based on Sagor Sarker's
Model
Model F-Measure Precision Recall
Haque [15] 0.5830 0.5459 0.6433
Kamal Sarkar [16] 0.5060 0.5165 0.5075
Mahimul Islam [9] 0.5777 0.4958 0.7065
Our Model 0.6067 0.6011 0.6253
After analyzing the results, we selected the Bengali
POS tagger by ashwoolford/bnltk, which performed
comparatively better.
Table IV
Rouge-2 score based on Bengali POS tagger by
ashwoolford/bnltk
Model F-Measure Precision Recall
Haque [15] 0.5830 0.5459 0.6433
Kamal Sarkar [16] 0.5060 0.5165 0.5075
Mahimul Islam [9] 0.5777 0.4958 0.7065
Our Model 0.6254 0.5672 0.7138
C. Evaluation of category-based Keyword Scoring
Here we first used our developed dataset to train a
category-based TF-IDF model. Then, for each keyword
in the corresponding category, we calculate the score of
the keywords appearing in a sentence based on TF-IDF
values. The score of a sentence is the summation of all
keyword scores in that sentence. The top 40 percent of
sentences are selected by score. The comparison of our
system with others is shown in Table V and Table VI,
based on the BNLPC dataset.
Table V
Rouge-1 score for the Keyword-based scoring
Model F-Measure Precision Recall
Haque [15] 0.6166 0.5757 0.6819
Kamal Sarkar [16] 0.5496 0.5603 0.5515
Mahimul Islam [9] 0.6487 0.5658 0.7745
Our Model 0.6467 0.6593 0.6517
Table VI
Rouge-2 score for the Keyword-based scoring
Model F-Measure Precision Recall
Haque [15] 0.5830 0.5459 0.6433
Kamal Sarkar [16] 0.5060 0.5165 0.5075
Mahimul Islam [9] 0.5777 0.4958 0.7065
Our Model 0.5815 0.5891 0.5907
D. Model Combination
Our hybrid model combines NER, POS tagging, keyword-
based sentence scoring, and the positional value of the
sentences to generate the final summary. This combined
approach produces the best results.
E. Comparing our combined model with the existing models
based on the BNLPC dataset
Table VII and Table VIII show the precision, recall, and
F-measure values of our combined system against the
existing models. Our model has better precision and
F-measure values than the other existing methods.
F. Performance of our model based on category-specific
passages
The performance of our model on category-specific
passages was assessed with a dataset split of 60%
Table VII
Rouge-1 Score of the Combined approach
Model F-Measure Precision Recall
Haque [15] 0.6166 0.5757 0.6819
Kamal Sarkar [16] 0.5496 0.5603 0.5515
Mahimul Islam [9] 0.6487 0.5658 0.7745
Our Model 0.6760 0.6665 0.6975
Table VIII
Rouge-2 Score of the Combined Approach
Model F-Measure Precision Recall
Haque [15] 0.5830 0.5459 0.6433
Kamal Sarkar [16] 0.5060 0.5165 0.5075
Mahimul Islam [9] 0.5777 0.4958 0.7065
Our Model 0.6177 0.6069 0.6411
in training and 40% for testing. Here, we used our own
dataset only. The results are shown in Table IX (Rouge-1
scores), Table X (Rouge-2 scores), and Table XI (Rouge-L
scores).
Table IX
Rouge-1 Score for Category-specific passages
Category F-Measure Precision Recall
Accident 0.7214 0.7050 0.7466
Bangladesh 0.7202 0.6955 0.7554
Crime 0.7180 0.6932 0.7517
Economics 0.6946 0.6842 0.7132
Entertainment 0.7019 0.6894 0.7231
International 0.6526 0.6419 0.6725
Politics 0.7279 0.7156 0.7467
Sports 0.6755 0.6622 0.6962
Combined Passages 0.7015 0.6859 0.7257
Table X
Rouge-2 Score for Category Specific passages
Category F-Measure Precision Recall
Accident 0.6559 0.6337 0.6874
Bangladesh 0.6534 0.6282 0.6892
Crime 0.6529 0.6267 0.6877
Economics 0.6208 0.6059 0.6434
Entertainment 0.6387 0.6265 0.6605
International 0.5791 0.5675 0.5981
Politics 0.6671 0.6551 0.6858
Sports 0.6056 0.5910 0.6275
Combined Passages 0.6342 0.6168 0.6600
G. Performance of our combined approach based on the
BNLPC Dataset
To do so, we split the BNLPC dataset at a 70%/30%
train-test ratio; the summary results are given in
Table XII.
In the future, we shall build a model that can perform
Bengali sentiment analysis for news-related data and
incorporate it into the scoring mechanism. We shall also
try to use the dynamic TF-IDF model proposed by Oleg
Barabash et al. [20].
Table XI
Rouge-L Score for Category Specific passages
Category F-Measure Precision Recall
Accident 0.7064 0.6907 0.7306
Bangladesh 0.7041 0.6800 0.7384
Crime 0.7049 0.6808 0.7376
Economics 0.6732 0.6634 0.6907
Entertainment 0.6881 0.6759 0.7090
International 0.6322 0.6221 0.6512
Politics 0.7150 0.7029 0.7335
Sports 0.6591 0.6463 0.6790
Combined Passages 0.6854 0.6703 0.7087
Table XII
Rouge-1, Rouge-2, Rouge-L score based on the BNLPC
dataset
Category F-Measure Precision Recall
Rouge-1 BNLPC-1 0.7065 0.6891 0.7341
BNLPC-2 0.6456 0.6439 0.6609
Rouge-2 BNLPC-1 0.6470 0.6277 0.6781
BNLPC-2 0.5884 0.5861 0.6042
Rouge-L BNLPC-1 0.6931 0.6758 0.7202
BNLPC-2 0.6305 0.6294 0.6443
V. Conclusion and Future Works
In this work, our focus was to create a system that
generates an extractive summary the way a human would.
As there were not enough datasets in the Bangla language,
we made an enriched dataset that is unique and has
categorized passages. Different approaches were
intelligently combined for scoring sentences. We measured
the performance of our model by comparing its summaries
with the human-generated ones. To compare the
similarities, we used standard evaluation measures,
namely precision, recall, and F-measure for Rouge-1,
Rouge-2, and Rouge-L. The score of a sentence could be
further improved by adding a text-sentiment-based scoring
approach: analyzing the human-made summaries, we found
that sentences with a negative sentiment get more
priority in human-generated summaries of the Accident
and Crime categories, whereas in the Entertainment and
Sports categories, sentences with positive sentiment
appear more often in the gold summaries. In the future,
we can add this approach to our overall model. Moreover,
we are also working on enlarging the dataset with
summaries from textbooks and applying deep learning
approaches to see if the summarization performance can
be further improved.
Acknowledgement
The authors of this paper acknowledge the help and
support of Md. Ashraful Haque at the different stages of
this research. His insights and encouragement were
invaluable to the completion of this work.
References
[1] “Bengali Natural Language Processing (BNLP),”
https://bnlp.readthedocs.io/en/latest/, accessed July 22, 2020.
[2] K. Sarkar, “Bengali text summarization by sentence extrac-
tion,” arXiv preprint arXiv:1201.2240, 2012.
[3] S. Roychowdhury, K. Sarkar, and A. Maji, “Unsupervised ben-
gali text summarization using sentence embedding and spectral
clustering,” in Proceedings of the 19th International Conference
on Natural Language Processing (ICON), 2022, pp. 337–346.
[4] J. Liu, D. J. Hughes, and Y. Yang, “Unsupervised extractive
text summarization with distance-augmented sentence graphs,”
in 44th International ACM SIGIR Conference on Research and
Development in Information Retrieval, 2021, pp. 2313–2317.
[5] P. Tumpa, S. Yeasmin, A. Nitu, M. Uddin, M. Afjal, and
M. Mamun, “An improved extractive summarization technique
for bengali text (s),” in 2018 International Conference on
Computer, Communication, Chemical, Material and Electronic
Engineering (IC4ME2). IEEE, 2018, pp. 1–4.
[6] P. Chandro, M. F. H. Arif, M. M. Rahman, M. S. Siddik, M. S.
Rahman, and M. A. Rahman, “Automated bengali document
summarization by collaborating individual word & sentence
scoring,” in 2018 21st International Conference of Computer
and Information Technology (ICCIT). IEEE, 2018, pp. 1–6.
[7] M. N. Uddin and S. A. Khan, “A study on text summarization
techniques and implement few of them for bangla language,” in
2007 10th international conference on computer and informa-
tion technology. IEEE, 2007, pp. 1–4.
[8] J. Zhu, L. Zhou, H. Li, J. Zhang, Y. Zhou, and C. Zong,
“Augmenting neural sentence summarization through extractive
summarization,” in National CCF Conference on Natural Lan-
guage Processing and Chinese Computing, 2017, pp. 16–28.
[9] M. Islam, F. N. Majumdar, A. Galib, and M. M. Hoque, “Hybrid
text summarizer for bangla document,” Int J Comput Vis Sig
Process, vol. 1, no. 1, pp. 27–38, 2020.
[10] C. Mallick, M. Dutta, A. K. Das, A. Sarkar, and A. K. Das,
“Extractive summarization of a document using lexical chains,”
pp. 825–836, 2019.
[11] M. Gamal, A. Elsawy, and A. Abu El Atta, “Hybrid algorithm
based on chicken swarm optimization and genetic algorithm
for text summarization,” International Journal of Intelligent
Engineering and Systems, vol. 14, no. 3, pp. 319–331, 2021.
[12] S. Abdel-Salam and A. Rafea, “Performance study on extractive
text summarization using bert models,” Information, vol. 13,
no. 2, p. 67, 2022.
[13] R. U. Haque, M. Mridha, M. Hamid, M. Abdullah-Al-Wadud,
M. Islam et al., “Bengali stop word and phrase detection mech-
anism,” Arabian Journal for Science and Engineering, vol. 45,
no. 4, pp. 3355–3368, 2020.
[14] K. Sarkar, “An approach to summarizing bengali news docu-
ments,” in Proceedings of the International Conference on Ad-
vances in Computing, Communications and Informatics, 2012,
pp. 857–862.
[15] M. M. Haque, S. Pervin, and Z. Begum, “An innovative approach
of bangla text summarization by introducing pronoun replacement
and improved sentence ranking,” Journal of Information Pro-
cessing Systems, vol. 13, no. 4, pp. 752–777, 2017.
[16] K. Sarkar, “A keyphrase-based approach to text summarization
for english and bengali documents,” International Journal of
Technology Diffusion (IJTD), vol. 5, no. 2, pp. 28–38, 2014.
[17] S. Kamal, “NER model,” https://github.com/sagorbrur/bnlp/
blob/master/model/bn_ner.pkl, 2020, accessed June 22, 2022.
[18] C. Lehmann, “The nature of parts of speech,” STUF-Language
Typology and Universals, vol. 66, no. 2, pp. 141–177, 2013.
[19] S. Kamal, “POS model,” https://github.com/sagorbrur/bnlp/
blob/master/model/bn_pos.pkl, 2020, accessed July 22, 2020.
[20] O. Barabash, O. Laptiev, O. Kovtun, O. Leshchenko,
K. Dukhnovska, and A. Biehun, “The method dynamic TF-
IDF,” International Journal of Emerging Trends in Engineering
Research, vol. 8, no. 9, pp. 5712–5718, 2020.