FoSIL at CheckThat! 2022: Using Human
Behaviour-Based Optimization for Text Classification
Andy Ludwig1, Jenny Felser1, Jian Xi1, Dirk Labudde1 and Michael Spranger1
1University of Applied Sciences Mittweida, Technikumplatz 17, 09664 Mittweida, Germany
Abstract
Nowadays, huge amounts of information and news articles are available every day. The events of recent
years have shown that fake news can severely shake trust in politics and science. Unfortunately, a
decision about truthfulness can only be made for a fraction of all news and posts. In this respect, the
CLEF2022-CheckThat! shared task 3a addresses this problem. In this paper, we propose a new classification
approach using a novel metaheuristic feature selection algorithm that mimics human behavior. The
results show that a baseline classifier can achieve higher performance when combined with this
algorithm, using only a fraction of the features.
Keywords
fake news detection, text classication, feature selection, human behavior-based optimization
1. Introduction
In times of constant availability of vast amounts of information, people have to judge the truth
of news in a short time. This assessment is often neglected due to the fast-moving nature of
the news. There are various reasons why authors, whether intentionally or unintentionally,
contribute to the generation of untrustworthy content. In particular, sources that deliberately
disseminate false information pose a danger to consumers of news.
Eective methods for detecting fake news are essential in the ght against the targeted spread
of fake news. Continuous research and development of approaches to detect misinformation
are highly important. This is one mission of the CLEF2022 - CheckThat! Lab [
1
,
2
]. In general,
the Lab’s goal is to verify the veracity of claims. Task 3a takes up the challenge of assessing the
truth content of news articles [3].
This paper presents a novel approach for text classification. The concept is based on human
behaviour-based optimization (HBBO) [4]. This metaheuristic optimization approach models
some fundamental interactions and behaviours of humans. The potential of this adaptation was
used for fake news detection in task 3a.
The paper is structured as follows: section 2 presents related work, section 3 describes
human behaviour-based optimization, section 4 summarizes the adaptation of the optimization
approach for text classification, section 5 presents the given data and the conducted experiments,
section 6 discusses the results, and finally section 7 gives a conclusion.
CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
ludwig1@hs-mittweida.de (A. Ludwig); jfelser@hs-mittweida.de (J. Felser); xi@hs-mittweida.de (J. Xi);
labudde@hs-mittweida.de (D. Labudde); spranger@hs-mittweida.de (M. Spranger)
©2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
2. Related Work
The assessment of the truthfulness of news or claims is a special case of text classification whose
importance has increased greatly in recent times. Especially during the Covid-19 pandemic,
a large group of people was challenged to judge news correctly. This situation makes
the development of automatic text classification systems necessary [5]. To pool competence
and to push development forward, shared tasks open to different participants are organized, such as
the CheckThat! Lab, which has been held annually since 2018. The results of last year are summarized in [6]. These
tasks require a high-quality dataset [7], which is also reflected in the available labels [8].
To improve the results for fake news detection and the related text classification task, a wide
range of possible approaches can be explored. In this paper, the research focus was on a new
technique to select the best features for training. To solve such optimization tasks, nature-inspired
meta-heuristic techniques can be applied. An overview of already known approaches
is given in [9]. For example, ant colony optimization is a possible algorithm for feature
selection in text classification tasks [10].
The nature-inspired algorithm used in this paper is derived from human behaviour. The basis
of this approach was presented in [4]. This algorithm was already used in [11] in combination
with self-organizing maps in the context of cryptanalytic resistance. The authors compare this
approach with other meta-heuristic solutions such as ant colony optimization. In addition to the approach
of [4], other algorithms based on human behaviour have been described. Firstly, in [12] the goal is to
solve optimization tasks by simulating the phases of gaining and sharing knowledge in younger
and older years of human life. A different aspect of optimization algorithms based on
human behaviour is presented in [13]. This approach focuses on the adoption of behaviours
and manners of other humans, for example in family structures.
3. Human Behaviour-Based Optimization (HBBO)
In this section, we briey describe Ahmadi’s novel swarm intelligence-based optimization
approach [
4
] considering human behavior, which forms the basis for the feature selection
approach used here. A central aspect in all phases of the algorithm is an individual’s pursuit of
self-optimization at dierent stages of his or her life. At the same time, dierent individuals
have dierent levels of experience in their eld, and some of them become experts in one of
them (e.g., art, music, or science, etc.).
The optimization of individual performance is carried out iteratively in four main steps: Initialization,
Education, Consultation, and Field Changing. In the Initialization step the population
is built. Each individual in the population is assigned an area of interest in which improvement
should be achieved. Depending on the underlying optimization problem, an individual
is represented as a vector of characteristic variables (features), $I = [x_1\; x_2\; \dots\; x_N]^T$. An
expert is determined for each field. The expert provides the best function value depending on
his individual feature set. The formal definition of an expert is shown in equation (1), where $I$
denotes a person, $E$ denotes an expert, and $F$ denotes the actual field of expertise.

$$I^{(F)} \to E^{(F)} = \arg\min f_{\mathrm{error}}(F) \tag{1}$$
The rst step in the improvement process, is Education. Individuals in each area learn from
the respective expert. The learning process comprises the improvement of the individual
characteristic values by those of the expert and aims at the reduction of their own error function
value.
A similar procedure is used in Consultation. In this stage, individuals can learn from any
other individual, not only from an expert. For this purpose, some variables are merged between
two randomly selected individuals. The consultation is called eective and the merged set of
variables is kept if the updated variables lead to better function values. Otherwise, the update is
reversed.
The last step is Field Changing. Whether an individual changes his eld of interest is calculated
using a rank probability method and the roulette wheel selection.
After Initialization, the three remaining steps are repeated until a stop criterion is met, e.g.
if the average function values do not change (or change too little) or the number of iterations
reaches a maximum.
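To make the control flow of the original algorithm more concrete, the following minimal sketch outlines the four phases; it is not taken from [4], and all helper names (initialize_population, f_error, education, consultation, field_changing) are placeholder assumptions for the problem-specific parts described above.

```python
def hbbo(initialize_population, f_error, education, consultation,
         field_changing, max_iterations=100, tol=1e-6):
    """Outline of human behaviour-based optimization (HBBO).

    All callables are placeholders for the problem-specific parts described
    in the text; only the overall control flow of the four phases is shown.
    """
    population = initialize_population()                  # Initialization
    previous_mean = float("inf")
    for _ in range(max_iterations):
        # Determine the expert of each field: the individual with the lowest error.
        experts = {}
        for individual in population:
            field = individual["field"]
            if field not in experts or f_error(individual) < f_error(experts[field]):
                experts[field] = individual

        education(population, experts, f_error)           # learn from the field's expert
        consultation(population, f_error)                 # learn from random individuals
        field_changing(population, experts, f_error)      # possibly switch fields

        # Stop early if the average error barely changes between iterations.
        mean_error = sum(f_error(ind) for ind in population) / len(population)
        if abs(previous_mean - mean_error) < tol:
            break
        previous_mean = mean_error
    return population
```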
This algorithm can be adapted to find an optimal feature set for text classification. The next
section explains the adaptation in detail.
4. Adaptation for Text Classification
In order to adapt the HBBO approach for feature selection, the optimization objective, i.e., an
objective function, must be determined. Here, the $F_1$-score on a test set is chosen. For this
purpose, the preprocessed dataset must be divided into training and test data. To obtain reliable
results and avoid overfitting, the dataset is split in terms of k-fold cross-validation. Each fold
results in an optimal feature set in the form of a document-term matrix (DTM) that is able to
classify the respective test data with the greatest possible success. Finally, the results of all
folds are merged into a DTM that contains only the most successful features for classifying the
entire dataset. For assessing the performance after each optimization step, a support vector
machine (SVM) is used.
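As an illustration of this objective, the following sketch evaluates one group of bit-vector pseudo-documents by training an SVM on it and scoring the fold's held-out split; the exact SVM variant and its parameters are not stated in the paper, so scikit-learn's LinearSVC is only an assumption, and the error function can be read as 1 - F1.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

def objective_f1(group_vectors, group_labels, X_test, y_test):
    """F1-score of an SVM trained on one group of bit-vector pseudo-documents.

    group_vectors: (documents x terms) 0/1 matrix of the group,
    group_labels:  the corresponding binary 0/1 class labels,
    X_test/y_test: the fold's held-out split. HBBO keeps an update only if
    this score improves, i.e. the error 1 - F1 decreases.
    """
    clf = LinearSVC()                               # SVM variant is an assumption
    clf.fit(np.asarray(group_vectors), np.asarray(group_labels))
    y_pred = clf.predict(X_test)
    return f1_score(y_test, y_pred)                 # binary F1 in the one-vs-rest setting
```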
4.1. Initialization
During Initialization, each input document is considered as a single individual and encoded in
the form of a bit vector. This vector represents an individual's knowledge. Each individual is
assigned the class of the respective document, which also represents the field of interest. All
vectorized documents together form the entire population. In contrast to the original algorithm,
where each individual is optimized, here subsets of individuals are optimized together. This
leads to a smaller set of documents achieving equally good or even better classification results than
the original set of all documents. A group of individuals contains all fields, which leads to a
simultaneous optimization of all classes within a group.

For this purpose, the individuals are grouped into subsets, with each subset equally populated
with individuals of each class (stratified approach). Each group of individuals is optimized
separately. The size of the subsets is a hyperparameter.

The remaining three steps are applied iteratively to each group of each fold. The number of
iterations needs to be specified as a hyperparameter.
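A minimal sketch of this initialization is given below, assuming scikit-learn's CountVectorizer for the bit-vector encoding; the round-robin grouping and all names are illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def initialize_population(documents, labels, group_size=24, seed=0):
    """Encode documents as bit vectors and split them into stratified groups.

    Each individual is one document's binary term vector plus its class
    (the field of interest). Classes are dealt round-robin into the groups
    so that every group contains individuals of every class.
    """
    vectorizer = CountVectorizer(binary=True)              # bit-vector representation
    X = vectorizer.fit_transform(documents).toarray()
    labels = np.asarray(labels)

    rng = np.random.default_rng(seed)
    n_groups = max(1, len(documents) // group_size)
    groups = [[] for _ in range(n_groups)]
    for cls in np.unique(labels):
        class_indices = rng.permutation(np.flatnonzero(labels == cls))
        for position, doc_index in enumerate(class_indices):
            groups[position % n_groups].append(
                {"vector": X[doc_index].copy(), "field": labels[doc_index]}
            )
    return groups, vectorizer
```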
4.2. Education
In the Education step, a group of individuals must first be determined for each field that achieves
the lowest error with respect to the objective function on the test set, according to equation (2).
Each of these groups is considered to be the expert group for that particular field.

$$I^{(F)} \to E^{(F)} = \arg\max F_1 \tag{2}$$

Subsequently, a subset $S_E$ of features of the expert group, considering all classes, is merged
with the features of other individuals belonging to the remaining groups. This procedure
corresponds to the non-experts approaching the expert group by updating their feature vector.
The number of terms transferred in this step is another hyperparameter, as well as the number
of adapted individuals, i.e., documents. Equation (3) shows the formal definition of this step,
where $I^{(F)}_i$ denotes the $i$-th individual $I$ in the specific field $F$ and $S_E \subseteq E^{(F)}$.

$$I^{(F)}_i = \begin{cases} I^{(F)}_i \cup S_E, & \text{if } f_{\mathrm{error}}(I^{(F)}_i \cup S_E) < f_{\mathrm{error}}(I^{(F)}_i) \\ I^{(F)}_i, & \text{otherwise} \end{cases} \tag{3}$$

The feature update is considered successful and the merged feature set is retained only
if the individual achieves an improvement with respect to the objective function, i.e., better
classification results. Otherwise, the previous feature set remains unchanged.
4.3. Consultation
In Consultation, the basic procedure is very similar to that in the Education step. The main
difference is the group of individuals from which the features for merging are taken. Regardless
of expert status, features are merged between two randomly selected groups of individuals.
This leads to greater heterogeneity of terms across all groups. Equation (4) shows the formal
definition of Consultation, where $S_j$ is a subset of all features of individual $I_j$ and $S_j \subseteq I^{(F)}_j$.

$$I^{(F)}_i = \begin{cases} I^{(F)}_i \cup S_j, & \text{if } f_{\mathrm{error}}(I^{(F)}_i \cup S_j) < f_{\mathrm{error}}(I^{(F)}_i) \\ I^{(F)}_i, & \text{otherwise} \end{cases} \tag{4}$$

As in Education, the updated feature set is retained only if the $F_1$-score improves; otherwise,
the previous terms remain unchanged. Again, the number of terms exchanged and the number
of individuals paired can be controlled using hyperparameters.
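Consultation can be sketched analogously; the only difference to the Education sketch above is that the donated terms $S_j$ come from a randomly chosen partner individual instead of the expert group (again, names and the evaluate callback are illustrative).

```python
import numpy as np

def consultation_step(individual, partner, evaluate, n_terms=3, rng=None):
    """Merge up to n_terms features of a random partner into one individual (equation (4))."""
    rng = rng or np.random.default_rng()
    donor_terms = np.flatnonzero(partner["vector"])     # subset S_j of the partner's terms
    if donor_terms.size == 0:
        return individual
    chosen = rng.choice(donor_terms, size=min(n_terms, donor_terms.size), replace=False)

    candidate = individual["vector"].copy()
    candidate[chosen] = 1
    if evaluate(candidate) > evaluate(individual["vector"]):
        individual["vector"] = candidate                # only effective consultations are kept
    return individual
```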
4.4. Field changing
The last step, Field Changing, does not manipulate the individuals' terms, but changes the
field associated with them. The number of randomly selected individuals changing the field is
another hyperparameter.

In a multi-class classification, an individual can simply change their area of interest to that
of the most successful expert group. For this purpose, all expert groups are ordered by their
$F_1$-score, as shown in equation (5), where $E^{(F_x)}$ is the expert group in the respective area.

$$R = \left(E^{(F_1)}, \dots, E^{(F_n)}\right) \,\big|\, f_{\mathrm{error}}(E^{(F_1)}) < f_{\mathrm{error}}(E^{(F_2)}) < \dots < f_{\mathrm{error}}(E^{(F_n)}) \tag{5}$$

Afterwards, the expert group with the highest rank in $R$ determines the field of the individual
willing to switch, as shown in equation (6).

$$F^{*}_x = F_x \,\big|\, E^{(F_x)} = \arg\min_{E^{(F_x)} \in R} f_{\mathrm{error}}(E^{(F_x)}) \tag{6}$$

In the special case of a binary classification, the process simply reduces to a field change, i.e., an
inversion of the class label. Again, the new field is retained only if it leads to an improvement
in the objective function value.
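The sketch below, with illustrative names, mirrors equations (5) and (6) for the simplest setting: the individual tentatively moves to the best-scoring alternative field (in the binary case a label flip), and the change is kept only if the group's F1-score improves.

```python
def field_changing_step(individual, fields, evaluate_with_field):
    """Tentatively move one individual to the best-scoring field.

    fields:              all possible fields (two labels in the binary case),
    evaluate_with_field: returns the group's F1-score when the individual is
                         assigned to the given field. The change is kept only
                         if the objective improves.
    """
    current_field = individual["field"]
    alternatives = [field for field in fields if field != current_field]
    if not alternatives:
        return individual
    # Best alternative field, i.e. the candidate with the lowest f_error.
    best_field = max(alternatives, key=evaluate_with_field)
    if evaluate_with_field(best_field) > evaluate_with_field(current_field):
        individual["field"] = best_field
    return individual
```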
5. Experiments and Results
In this work, the adaptation of HBBO, as discussed in section 4, was applied to the problem of
detecting fake news in the context of the CLEF2022-CheckThat! shared task 3a. The documents
provided were news articles in English, which were to be grouped into the classes true, false,
partially false and other with regard to potentially containing fake news. For training the model,
the provided training and development sets were combined so that the total training corpus
comprised 1,264 documents, each of which was assigned to exactly one of the four mentioned
classes. The resulting class distribution of the training corpus is shown in Figure 1.
Figure 1: Distribution of classes and documents in the corpus of training data (True: 211, False: 578, Partially False: 358, Other: 117).
Before the HBBO algorithm can be applied, the input data must be cleaned. To this end, a
wide range of cleaning steps were performed (a minimal sketch of such a pipeline follows the list):

- Combination of article and headline into one pseudo-document,
- Replacement of newlines and tabs with white space,
- Removal of emojis and links,
- Removal of special characters, punctuation and numbers,
- Removal of stop words,
- Lemmatization.
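The concrete tools used for cleaning are not named in the paper; the following sketch shows one possible implementation of the listed steps using regular expressions and NLTK (the stopword list and WordNet lemmatizer have to be downloaded first via nltk.download).

```python
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

STOP_WORDS = set(stopwords.words("english"))   # requires nltk.download("stopwords")
LEMMATIZER = WordNetLemmatizer()               # requires nltk.download("wordnet")

def clean_document(headline, article):
    """Apply the cleaning steps listed above to one headline/article pair."""
    text = f"{headline} {article}"                       # one pseudo-document
    text = re.sub(r"[\n\t]+", " ", text)                 # newlines and tabs -> white space
    text = re.sub(r"https?://\S+", " ", text)            # remove links
    text = text.encode("ascii", "ignore").decode()       # crude removal of emojis
    text = re.sub(r"[^A-Za-z ]+", " ", text)             # special characters, punctuation, numbers
    tokens = [t.lower() for t in text.split() if t.lower() not in STOP_WORDS]
    return " ".join(LEMMATIZER.lemmatize(token) for token in tokens)
```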
For simplicity, a bit vector was chosen to represent each single document. Subsequently, the
cleaned dataset was divided into samples in terms of 5-fold cross-validation before HBBO could
be applied to each sample, resulting in an optimized feature set for each fold.

In order to observe the specific behavior of the new algorithm, the problem is considered
as several binary classification tasks. In this respect, for each class, a model is trained using
the training data without applying any kind of balancing technique. Table 1 shows the values
chosen for the experiments.
Table 1
Summary of hyperparameters.

Phase of algorithm | Hyperparameter                                                    | Value
Initialization     | Number of different data sets for cross-validation               | 5
                   | Number of documents (individuals) per subset during optimization | 24
                   | Number of iterations for all phases                              | 125
Education          | Number of terms for exchange                                      | 5
                   | Percent of documents for adaptation of features                   | 100
Consultation       | Number of terms for exchange                                      | 3
                   | Percent of documents for adaptation of features                   | 100
Field changing     | Number of changed labels                                          | 1
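For reference, the values of Table 1 can be collected in a single configuration object; the field names below are illustrative and not taken from the authors' code.

```python
from dataclasses import dataclass

@dataclass
class HBBOConfig:
    """Hyperparameters from Table 1 (field names are illustrative)."""
    n_folds: int = 5                    # data sets for cross-validation
    group_size: int = 24                # documents (individuals) per subset
    n_iterations: int = 125             # iterations for all phases
    education_terms: int = 5            # terms exchanged during Education
    education_fraction: float = 1.0     # share of documents adapted in Education (100%)
    consultation_terms: int = 3         # terms exchanged during Consultation
    consultation_fraction: float = 1.0  # share of documents adapted in Consultation (100%)
    field_changes: int = 1              # labels changed per Field Changing step
```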
Figure 2 shows the history of the $F_1$-score over all 125 iterations for the category true. Shown
in green is the best group of individuals in each iteration, black dots show the mean
$F_1$-score of all groups, and the red horizontal line symbolizes the baseline $F_1$-score obtained
without HBBO feature selection. The results of the remaining classes are shown in Figure 3,
Figure 4, and Figure 5 accordingly.
In order to provide a quantitative overview, the respective results are summarized in Table 2.
In addition, the mean value is calculated for each class. The system yields a macro-$F_1$ of 0.602
over all classes.
Table 2
Results in terms of $F_1$-score for each fold of a 5-fold cross-validation.

Class           | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Mean
True            | 0.508  | 0.464  | 0.553  | 0.558  | 0.574  | 0.531
False           | 0.765  | 0.762  | 0.777  | 0.787  | 0.764  | 0.771
Partially False | 0.633  | 0.601  | 0.633  | 0.602  | 0.645  | 0.623
Other           | 0.476  | 0.450  | 0.468  | 0.416  | 0.607  | 0.483
As mentioned earlier, the results of all folds must be merged to combine them into one
classifier. This can be achieved simply by concatenating the best group of individuals of each
fold. The final feature matrix contains 120 pseudo-documents (5 groups of 24 pseudo-documents each).

In order to uniquely assign each document of the test data to a class, each separately optimized
binary model was used for classification in the order of their performance. Table 3 summarizes
the results of this final classification procedure. Here, macro-$F_1$ = 0.251 with an accuracy of
0.462. With this result we reached the 18th place of the shared task 3a.
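The paper does not spell out how conflicts between the chained binary models are resolved, so the following sketch is only one plausible reading of the described procedure: the models are queried in descending order of their training performance and the first positive vote determines the class.

```python
def chain_predict(document_vector, ordered_models, fallback_class):
    """Assign a test document to a class using chained one-vs-rest models.

    ordered_models: list of (class_name, model) pairs, sorted by the models'
                    training performance (best first); each model is a binary
                    classifier with a scikit-learn style predict method.
    fallback_class: label returned if no model claims the document.
    """
    for class_name, model in ordered_models:
        if model.predict([document_vector])[0] == 1:    # positive one-vs-rest vote
            return class_name
    return fallback_class
```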
Figure 2: HBBO results after 125 iterations for the first category: true; black horizontal line: own baseline without HBBO, black: $F_1$-score of the best group of individuals, grey: mean $F_1$-score of all groups of individuals.
Table 3
Evaluation results using hold-out data.

Class           | Precision | Recall | $F_1$-Score
True            | 0.391     | 0.086  | 0.141
False           | 0.573     | 0.806  | 0.670
Partially False | 0.161     | 0.179  | 0.169
Other           | 0.016     | 0.032  | 0.022
6. Discussion
The results shown are surprisingly low. Nevertheless, HBBO as a feature selection algorithm
has high potential for classification tasks. However, there are still some problems and open
research questions. First of all, there is the drop of macro-$F_1$ from the training data (0.602) to the
test data (0.251), which might indicate a slight overfitting. The optimization of each sample takes into account
the respective test set. This process may lead to good classification results only within that
particular test set. Further studies need to be performed to test this hypothesis.

Figure 3: HBBO results after 125 iterations for the second category: false; black horizontal line: own baseline without HBBO, black: $F_1$-score of the best group of individuals, grey: mean $F_1$-score of all groups of individuals.
Further performance gains could be achieved by structured experimentation with the hyperparameters
or by using more advanced features, such as TF-IDF or BM25. Inseparable from the
chaining of binary classifiers is the question of their best order. Among other strategies, the
shift to true multi-label classification could also be beneficial. For this, further adaptations of
the HBBO algorithm would have to be made.

In addition, the strict requirement for improvement at each optimization step can lead to missing
optimal feature combinations. This could be remedied by allowing temporary deteriorations.
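One way to allow such temporary deteriorations would be a simulated-annealing-style acceptance rule; the sketch below merely illustrates this suggestion and is not part of the submitted system.

```python
import math
import random

def accept_update(old_score, new_score, temperature):
    """Acceptance rule that tolerates temporary deteriorations.

    Improvements are always accepted; a worse feature set is accepted with a
    probability that shrinks with the size of the deterioration and with a
    decreasing temperature, as in simulated annealing.
    """
    if new_score >= old_score:
        return True
    return random.random() < math.exp((new_score - old_score) / temperature)
```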
In general, it is difficult to imagine that a fine-grained classification of a document with regard
to its truthfulness can be carried out automatically. Inseparable from the spread of fake
news is the task of making it appear as real as possible. Thus, there can be no statistically
detectable linguistic features that clearly indicate the truthfulness of a document, as can be
easily demonstrated analytically. Ultimately, truthfulness can only be determined by a fact check. This
insight is also underlined by the overall results achieved by all participants in this shared task.

Figure 4: HBBO results after 125 iterations for the third category: partially false; black horizontal line: own baseline without HBBO, black: $F_1$-score of the best group of individuals, grey: mean $F_1$-score of all groups of individuals.
7. Conclusion
In this paper, a novel feature selection algorithm for text classification using a human behavior-based
optimization approach is presented in order to solve the task of fine-grained fake news
detection. The algorithm shows improved performance compared to classification using
a single SVM. In addition, the enormous reduction of the training input after optimization is
remarkable. In each sample, only 24 optimized pseudo-documents were able to outperform the
baseline calculated considering all documents.

Nevertheless, further experiments must be carried out to find the best values for the hyperparameters.
Further improvements might be achieved by using a more advanced term representation.
Figure 5: HBBO results after 125 iterations for the fourth category: other; black horizontal line: own baseline without HBBO, black: $F_1$-score of the best group of individuals, grey: mean $F_1$-score of all groups of individuals.
References
[1] P. Nakov, A. Barrón-Cedeño, G. Da San Martino, F. Alam, J. M. Struß, T. Mandl, R. Míguez, T. Caselli, M. Kutlu, W. Zaghouani, C. Li, S. Shaar, G. K. Shahi, H. Mubarak, A. Nikolov, N. Babulkov, Y. S. Kartal, J. Beltrán, The CLEF-2022 CheckThat! lab on fighting the COVID-19 infodemic and fake news detection, in: M. Hagen, S. Verberne, C. Macdonald, C. Seifert, K. Balog, K. Nørvåg, V. Setty (Eds.), Advances in Information Retrieval, Springer International Publishing, Cham, 2022, pp. 416–428.

[2] P. Nakov, A. Barrón-Cedeño, G. Da San Martino, F. Alam, J. M. Struß, T. Mandl, R. Míguez, T. Caselli, M. Kutlu, W. Zaghouani, C. Li, S. Shaar, G. K. Shahi, H. Mubarak, A. Nikolov, N. Babulkov, Y. S. Kartal, J. Beltrán, M. Wiegand, M. Siegel, J. Köhler, Overview of the CLEF-2022 CheckThat! lab on fighting the COVID-19 infodemic and fake news detection, in: Proceedings of the 13th International Conference of the CLEF Association: Information Access Evaluation meets Multilinguality, Multimodality, and Visualization, CLEF '2022, Bologna, Italy, 2022.

[3] J. Köhler, G. K. Shahi, J. M. Struß, M. Wiegand, M. Siegel, T. Mandl, Overview of the CLEF-2022 CheckThat! lab task 3 on fake news detection, in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, CLEF '2022, Bologna, Italy, 2022.

[4] S.-A. Ahmadi, Human behavior-based optimization: A novel metaheuristic approach to solve complex optimization problems, Neural Comput. Appl. 28 (2017) 233–244. doi:10.1007/s00521-016-2334-4.

[5] G. K. Shahi, D. Nandini, FakeCovid: a multilingual cross-domain fact check news dataset for COVID-19, in: Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. URL: http://workshop-proceedings.icwsm.org/pdf/2020_14.pdf.

[6] G. K. Shahi, J. M. Struß, T. Mandl, Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection, Working Notes of CLEF (2021).

[7] G. K. Shahi, AMUSED: An annotation framework of multi-modal social media data, arXiv preprint arXiv:2010.00502 (2020).

[8] G. K. Shahi, A. Dirkson, T. A. Majchrzak, An exploratory study of COVID-19 misinformation on Twitter, Online Social Networks and Media 22 (2021) 100104.

[9] M. Sharma, P. Kaur, A comprehensive analysis of nature-inspired meta-heuristic techniques for feature selection problem, Archives of Computational Methods in Engineering 28 (2020). doi:10.1007/s11831-020-09412-6.

[10] J. Xi, M. Spranger, D. Labudde, Music event detection leveraging feature selection based on ant colony optimization 13 (2020) 36–47.

[11] R. Soto, B. Crawford, F. G. Molina, R. Olivares, Human behaviour based optimization supported with self-organizing maps for solving the S-box design problem, IEEE Access 9 (2021) 84605–84618. doi:10.1109/ACCESS.2021.3087139.

[12] A. Wagdy, A. Hadi, A. Khater, Gaining-sharing knowledge based algorithm for solving optimization problems: a novel nature-inspired algorithm, International Journal of Machine Learning and Cybernetics 11 (2020). doi:10.1007/s13042-019-01053-x.

[13] M. Kumar, A. J. Kulkarni, S. C. Satapathy, Socio evolution & learning optimization algorithm: A socio-inspired optimization methodology, Future Generation Computer Systems 81 (2018) 252–272. doi:10.1016/j.future.2017.10.052.