Samira Sadaoui
University of Regina · Department of Computer Sciences
Ph.D. in Computer Science
Working on Incremental Feature Learning, Adaptive Learning & Concept Drift Detection
About
128 Publications
45,069 Reads
1,320 Citations
Introduction
1. Concept drift detection and incremental/adaptive learning using deep learning for data streaming applications, such as credit card fraud detection.
2. Text classification using deep neural networks and word embeddings for applications such as hate speech detection and topic labeling.
Additional affiliations
July 2002 - present
Publications (128)
Organizing e-auctions to purchase the electricity required for an anticipated peak-load period is a new option for utility companies. To meet the extra demand load, we develop an electricity combinatorial reverse auction (CRA) for procuring power from diverse energy sources. In this new, smart electricity market, suppliers of differ...
We introduce a new nature-inspired optimization algorithm called Mushroom Reproduction Optimization (MRO), inspired by the reproduction and growth mechanisms of mushrooms in nature. MRO follows the process by which spores discover rich areas (those with good living conditions) in which to grow and develop their own colonies. We thoroughly asses...
For detecting malicious bidding activities in e-auctions, this study develops a chunk-based incremental learning framework that can operate in real-world auction settings. The self-adaptive framework first classifies incoming bidder chunks to counter fraud in each auction and takes the necessary actions. The fraud classifier is then adjusted with confid...
Improving the performance of Offensive and Hate Speech (OHS) classifiers requires a large, confidently labeled textual training dataset. Our study devises a semi-supervised classification approach with self-training to leverage abundant social media content and develop a robust OHS classifier. The classifier is self-trained iteratively using the mos...
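A generic self-training loop of the kind described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's OHS classifier: a toy nearest-centroid model on one-dimensional features stands in for the text classifier, normalized inverse distances serve as confidence scores, and the acceptance `threshold` of 0.9 is a hypothetical choice. All function names are illustrative.

```python
from statistics import mean


def fit_centroids(xs, ys):
    """Toy nearest-centroid model on 1-D features: one mean per class label."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}


def predict_proba(centroids, x):
    """Confidence per class from normalized inverse distances to the centroids."""
    inv = {c: 1.0 / (abs(x - m) + 1e-9) for c, m in centroids.items()}
    total = sum(inv.values())
    return {c: v / total for c, v in inv.items()}


def self_train(xs, ys, unlabeled, threshold=0.9, max_rounds=10):
    """Iteratively add confidently pseudo-labeled samples to the training set."""
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    for _ in range(max_rounds):
        model = fit_centroids(xs, ys)
        confident, remaining = [], []
        for x in pool:
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:  # nothing passes the threshold: stop early
            break
        for x, label in confident:
            xs.append(x)
            ys.append(label)
        pool = remaining
    return fit_centroids(xs, ys), pool


model, remaining = self_train([0.0, 0.2, 9.8, 10.0], [0, 0, 1, 1], [0.1, 9.9, 5.0])
```

Here the points close to each labeled cluster are pseudo-labeled in the first round, while the ambiguous point 5.0 stays in the unlabeled pool because neither class reaches the threshold.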
Detecting credit card fraud accurately is critical, as fraud in this financial sector causes significant losses for cardholders. Nonetheless, most studies adopt standard machine learning and few adopt incremental learning, which are inadequate for addressing credit card challenges, such as rapid data arrival, unlimited data, data sensitivity, and performance...
This research primarily focuses on tackling the challenge of forecasting univariate time series. Existing methods for training forecasting models, ranging from traditional statistical models to domain-specific algorithms and, more recently, deep neural networks, typically rely on raw time series observations, with alternative representations like...
Concept Drift (CD) is a significant challenge in real-world data stream applications, as its presence requires predictive models to adapt to data-distribution changes over time. Our paper introduces a new algorithm, Probabilistic Real-Drift Detection (PRDD), designed to track and respond to CD based on its probabilistic definitions. PRDD utilizes t...
Concept drift detection in noisy data streams is challenging yet essential. This paper introduces NPRDD, a new concept drift detection algorithm that is robust to noise and accurately identifies Real drifts. NPRDD operates on a moving window of recent data, utilizing predicted class probabilities and cross-entropy-based surprise measures to weigh r...
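The cross-entropy surprise of one labeled observation is -log p(y_true), where p(y_true) is the probability the model assigned to the true class. The sketch below monitors the mean surprise over a moving window and raises an alarm when it rises well above a reference level. It is a generic illustration, not NPRDD itself (whose weighting scheme is cut off in the abstract above); the class `SurpriseDriftMonitor`, the `window` size and the `delta` alarm margin are hypothetical.

```python
import math
from collections import deque


def surprise(prob_of_true_class, eps=1e-12):
    """Cross-entropy surprise of one observation: -log p(true class)."""
    return -math.log(max(prob_of_true_class, eps))


class SurpriseDriftMonitor:
    """Flags drift when the windowed mean surprise exceeds a reference by delta."""

    def __init__(self, window=50, delta=0.5):
        self.window = window
        self.delta = delta
        self.reference = None  # mean surprise of the first full window
        self.recent = deque(maxlen=window)  # only the latest `window` values

    def update(self, prob_of_true_class):
        """Add one observation; return True if drift is signalled."""
        self.recent.append(surprise(prob_of_true_class))
        if len(self.recent) < self.window:
            return False
        current = sum(self.recent) / self.window
        if self.reference is None:
            self.reference = current
            return False
        if current - self.reference > self.delta:
            # Reset so that a fresh reference is learned after the drift.
            self.reference = None
            self.recent.clear()
            return True
        return False


monitor = SurpriseDriftMonitor(window=10, delta=0.5)
flags = [monitor.update(0.9) for _ in range(30)] + [monitor.update(0.3) for _ in range(10)]
```

With a window of 10, feeding 30 observations with p = 0.9 and then observations with p = 0.3 raises the alarm on the fifth low-confidence observation, the first point at which the window mean exceeds the reference by more than `delta`.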
Data clustering has many applications in machine learning, data mining and image processing. K-means is the most popular clustering algorithm due to its efficiency and simplicity of implementation. However, K-means has limitations, such as handling large feature spaces, which may affect its effectiveness. To improve K-means accuracy, we adopt the Biogeograp...
Concept drift, which indicates data-distribution changes in streaming scenarios, can significantly reduce predictive performance. Existing concept drift detection methods often struggle with the trade-off between fast detection and low false alarm rates. This paper presents a novel concept drift detection algorithm, called SPNCD*, based on probabil...
Data clustering has many applications in medical sciences, banking, and data mining. K-means is the most popular data clustering algorithm due to its efficiency and simplicity of implementation. However, K-means has some limitations that may affect its effectiveness, such as giving all features the same degree of importance. To address these...
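One way to avoid giving every feature the same importance is to weight each feature inside the K-means distance. The sketch below is plain Lloyd-style K-means with a feature-weighted squared Euclidean distance; the weights are supplied by hand here, whereas the abstract above is cut off before saying how the study obtains them. `weighted_kmeans` and its parameters are hypothetical names.

```python
import random


def weighted_sq_dist(a, b, w):
    """Squared Euclidean distance in which feature j is scaled by weight w[j]."""
    return sum(wj * (aj - bj) ** 2 for aj, bj, wj in zip(a, b, w))


def weighted_kmeans(points, k, weights, max_iter=100, seed=0):
    """Lloyd-style K-means using a feature-weighted distance."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: weighted_sq_dist(p, centroids[c], weights))
            for p in points
        ]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


pts = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)]
labels, centroids = weighted_kmeans(pts, 2, weights=(1.0, 0.0))
```

With weights (1, 0) the four corner points split by the first feature; with (0, 1) they split by the second, which shows how the weights alone change the clustering.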
Although prediction models can provide high accuracy with massive data, learning from imbalanced data is a great challenge. Existing imbalanced-data solutions change the data distribution or the minority class's impact on the model, increasing the false positive rate to achieve better recall. To overcome these issues, we develop an approach b...
This work presents a deep averaged reinforcement-learning approach to learn improvement heuristics for route planning. The proposed method is tested on the Traveling Salesman Problem (TSP). While machine learning models have been successful at learning improvement heuristics, these methods suffer from low generalization and forgetfulness of the agent...
SAT problems are fundamental in representing and solving combinatorial applications. Over the past years, many sophisticated SAT solvers have been proposed. Given the topic's relevance, a SAT competition is held yearly to promote solving hard SAT instances. No single solver tackles all SAT problems efficiently. Indeed, some solvers...
Our study proposes a dimensionality reduction approach to efficiently process a service monitoring application’s high-dimensional and unlabeled time-series dataset. The approach aims to improve data quality and lower the feature space optimally. Since the dataset is vast and the reduction approach requires colossal resources, we divide it into seve...
Service monitoring applications continuously produce data to monitor their availability. Hence, it is critical to classify incoming data accurately and in real time. For this purpose, our study develops an adaptive classification approach using Learn++ that can handle evolving data distributions. This approach sequentially predicts and updates the...
Network traces represent a critical piece of data for network security. Due to a lack of expertise, companies are forced to outsource their network traces to third parties to perform analytics on the traces and provide security feedback and recommendations. However, these companies are reluctant to share their network traces, as they comprise sensiti...
Our work introduces an ensemble-based dimensionality reduction approach to efficiently address the high dimensionality of an industrial unlabeled time-series dataset, intending to produce robust data labels. The ensemble comprises a self-supervised learning method to improve data quality, an unsupervised dimensionality reduction to lower the ample...
Our research addresses the actual behavior of the credit-card fraud detection environment, where financial transactions containing sensitive data must not be accumulated in large amounts to train robust classifiers. We introduce an adaptive learning approach that adjusts frequently and efficiently to new transaction chunks; each chunk is discard...
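The constraint that each chunk is discarded after learning can be illustrated with a model that keeps only per-class counts and feature sums, never the raw transactions. This is a hypothetical sketch, not the study's adaptive learner: a nearest-centroid classifier stands in for the fraud model, each chunk is predicted before it is learned (test-then-train), and the optional `decay` factor is an assumed forgetting mechanism. Class and function names are illustrative.

```python
from collections import defaultdict


class IncrementalCentroidClassifier:
    """Keeps only per-class (decayed) counts and feature sums, never raw chunks."""

    def __init__(self, decay=1.0):
        self.decay = decay  # hypothetical forgetting factor; 1.0 means no forgetting
        self.counts = defaultdict(float)
        self.sums = {}

    def partial_fit(self, chunk):
        """Update the sufficient statistics from one chunk of (features, label) pairs."""
        for y in self.sums:  # down-weight what was learned from older chunks
            self.counts[y] *= self.decay
            self.sums[y] = [s * self.decay for s in self.sums[y]]
        for x, y in chunk:
            self.counts[y] += 1.0
            old = self.sums.get(y, [0.0] * len(x))
            self.sums[y] = [s + xi for s, xi in zip(old, x)]

    def predict(self, x):
        """Return the label of the nearest class centroid (squared Euclidean)."""
        def sq_dist(y):
            centroid = [s / self.counts[y] for s in self.sums[y]]
            return sum((a - b) ** 2 for a, b in zip(x, centroid))
        return min(self.sums, key=sq_dist)


def process_stream(model, chunks):
    """Test-then-train over a stream: predict a chunk, then learn from it."""
    predictions = []
    for chunk in chunks:
        if model.sums:  # predict only once at least one chunk has been learned
            predictions.append([model.predict(x) for x, _ in chunk])
        model.partial_fit(chunk)
    return predictions


stream = iter([
    [((0.0, 0.0), 0), ((10.0, 10.0), 1)],
    [((1.0, 1.0), 0), ((9.0, 9.0), 1)],
])
model = IncrementalCentroidClassifier()
preds = process_stream(model, stream)
```

Because the model's state is a fixed-size summary per class, memory does not grow with the number of chunks; a `decay` below 1.0 would make older chunks count less as new ones arrive.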
Our study evaluates the quality of a high-dimensional time-series dataset gathered from a service observability and monitoring application. We construct the target dataset by extracting heterogeneous sub-datasets from many servers, tackling data incompleteness in each sub-dataset using several imputation techniques, and fusing all the optimally imput...
This study aims to improve disease detection accuracy by incorporating a discrete version of the Whale Optimization Algorithm (WOA) into a supervised classification framework (KNN). We devise the discrete WOA by redefining the related components to operate on discrete spaces. More precisely, we redefine the notion of distance (between individuals i...
Data-driven applications often change over time by considering new features to improve predictive accuracy. Retraining a model from scratch for every change loses the learned knowledge and is very time-consuming. To fill this large gap in the literature, we devise an incremental feature learning algorithm using constructive neural networks to include new gro...
Risk Management (RM) is critical for projects' success as it can predict undesirable events that may occur. Nevertheless, RM is lacking in industry, especially for projects in the IT sector. For this purpose, our study introduces RM3, a new framework for analyzing and measuring risks to compensate for known and unknown factors affecting the path to...
One key to improving hate speech detection performance is a textual training corpus that is vast and confidently labeled. This paper develops a semi-supervised learning approach with self-training to benefit from the abundance of social media content and develop a robust hate speech classifier for future predictions. The classifier i...
This study addresses the actual behavior of the credit-card fraud detection environment, where financial transactions containing sensitive data must not be accumulated in large amounts for learning. We introduce a new adaptive learning approach that adjusts frequently and efficiently to new transaction chunks; each chunk is discarded after e...
Mobile devices run numerous applications whose need for computing power continuously grows. Due to limited resources for complex computing, offloading, a service offered for mobile devices, is commonly used in cloud computing. In Mobile Cloud Computing (MCC), offloading decides where to execute the tasks to efficiently maximize the benefits. H...
Data chunks are produced continuously every day, at high speed and with soaring dimensionality, and in most cases these chunks are unlabeled. Our study combines incremental learning with self-labeling to deal with these incoming data chunks. We first search for the best data dimensionality reduction algorithm, leading to the optimal...
This study aims to optimize Deep Feedforward Neural Networks (DFNNs) training using nature-inspired optimization algorithms, such as PSO, MTO, and its variant called MTOCL. We show how these algorithms efficiently update the weights of DFNNs when learning from data. We evaluate the performance of DFNN fused with optimization algorithms using three...
Large and accurately labeled textual corpora are vital to developing efficient hate speech classifiers. This paper introduces an ensemble-based semi-supervised learning approach to leverage the availability of abundant social media content. Starting with a reliable hate speech dataset, we train and test diverse classifiers that are then used to lab...
This research explores Cost-Sensitive Learning (CSL) in the fraud detection domain to decrease the fraud class’s incorrect predictions and increase its accuracy. Notably, we concentrate on shill bidding fraud, which is challenging to detect because the behaviors of shill and legitimate bidders are similar. We investigate CSL within the Semi-Supervised...
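One common way to make a probabilistic classifier cost-sensitive is to move its decision threshold according to misclassification costs. If p is the predicted fraud probability, predicting "fraud" has expected cost (1 - p) * C_FP and predicting "normal" has expected cost p * C_FN, so "fraud" is the cheaper call when p >= C_FP / (C_FP + C_FN). The sketch below is a generic illustration of that rule, not necessarily the CSL variant the study investigates; the cost values and function names are hypothetical.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Fraud probability above which predicting 'fraud' has lower expected cost."""
    return cost_fp / (cost_fp + cost_fn)


def classify(p_fraud, cost_fp=1.0, cost_fn=1.0):
    """Return 1 (fraud) when (1 - p) * C_FP <= p * C_FN, otherwise 0 (normal)."""
    return int(p_fraud >= cost_sensitive_threshold(cost_fp, cost_fn))


# Missing a shill bidder assumed to be 4x as costly as a false alarm.
decision = classify(0.3, cost_fp=1.0, cost_fn=4.0)
```

With C_FN = 4 and C_FP = 1, the threshold drops from 0.5 to 0.2, so a bidder scored at 0.3 is flagged, whereas equal costs would leave the same bidder classified as normal.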
Feature Extraction Algorithms (FEAs) aim to address the curse of dimensionality, which makes machine learning algorithms ineffective. Our study conceptually and empirically explores the most representative FEAs. First, we review the theoretical background of many FEAs from different categories (linear vs. nonlinear, supervised vs. unsupervised, rando...
This research explores Cost-Sensitive Learning (CSL) in the fraud detection domain to decrease the fraud class's incorrect predictions and increase its accuracy. Notably, we concentrate on shill bidding fraud, which is challenging to detect because the behaviors of shill and legitimate bidders are similar. We investigate CSL within the Semi-Supervised...
Deep neural networks have been adopted successfully in hate speech detection problems. Nevertheless, the effect of word embedding models on the neural network's performance has not been appropriately examined in the literature. In our study, through three detection tasks (2-class, 3-class, and 6-class classification), we investigate the impa...
Our study explores offensive and hate speech detection for the Arabic language, as previous studies are scarce. Based on two-class, three-class, and six-class Arabic-Twitter datasets, we develop single and ensemble CNN and BiLSTM classifiers that we train with non-contextual (Fasttext-SkipGram) and contextual (Multilingual Bert and AraBert) word-e...
Fraud detection systems aim to process a massive amount of data at high speed. To address the issues of data scalability, we introduce a chunk-based incremental classification approach based on a neural network (MLP) and a memory model to tackle the stability-plasticity dilemma. The incremental approach adapts the fraud model sequentially with inco...
We are witnessing an increasing proliferation of hate speech on social media targeting individuals for their protected characteristics. Our study aims to devise an effective Arabic hate and offensive speech detection framework to address this serious issue. First, we built a reliable Arabic textual corpus by crawling data from Twitter using four ro...
Constraint optimization consists of looking for an optimal solution maximizing a given objective function while meeting a set of constraints. In this study, we propose a new algorithm based on mushroom reproduction for solving constraint optimization problems. Our algorithm, which we call Mushroom Reproduction Optimization (MRO), is inspired by the...
Shill Bidding (SB) is still a predominant auction fraud because it is the toughest to identify due to its resemblance to the standard bidding behavior. To reduce losses on the buyers' side, we devise an example-incremental classification model that can detect fraudsters from incoming auction transactions. Thousands of auctions occur every day in a...
We present a new nature-inspired approach based on the Focus Group Optimization Algorithm (FGOA) for solving Constraint Satisfaction Problems (CSPs). CSPs are NP-complete problems, meaning that solving them with classical systematic search methods requires exponential time in theory. Appropriate alternatives are approximation methods such as metaheur...
Shill Bidding (SB) is a serious auction fraud committed by clever scammers. The challenge of labeling multi-dimensional SB training data hinders research on SB classification. To safeguard individuals from shill bidders, in this study, we explore Semi-Supervised Classification (SSC), which is the most suitable method for our fraud detection proble...
We present a new nature-inspired approach based on the Focus Group Optimization Algorithm (FGOA) for solving Constraint Satisfaction Problems (CSPs). CSPs are NP-complete problems, meaning that solving them with classical systematic search methods requires exponential time in theory. Appropriate alternatives are approximation methods such as metaheur...
Given the magnitude of monetary transactions at auction sites, these sites are very attractive to fraudsters and scam artists. Shill bidding (SB) is a severe fraud in e-auctions, which occurs during the bidding period and is driven by modern-day technology and clever scammers. SB does not produce any obvious evidence, and it is often unnoticed by the vict...
Alzahrani, Ahmad; Sadaoui, Samira. Although shill bidding is a common fraud in online auctions, it is very tough to detect because there is no obvious evidence of it happening. There are limited studies on SB classification because training data are difficult to produce. In this study, we build a high-quality labeled shill bidding dataset based on rece...
Given the magnitude of monetary transactions at auction sites, these sites are very attractive to fraudsters and scam artists. Shill bidding (SB) is a severe fraud in e-auctions, which occurs during the bidding period and is driven by modern-day technology and clever scammers. SB does not produce any obvious evidence, and it is often unnoticed by the vict...
Given the magnitude of monetary transactions at auction sites, these sites are very attractive to fraudsters and scam artists. Shill bidding (SB) is a severe fraud in e-auctions, which occurs during the bidding period and is driven by modern-day technology and clever scammers. SB does not produce any obvious evidence, and it is often unnoticed by the vict...
Online auctions have become one of the most convenient ways to commit fraud due to the large amount of money traded every day. Shill bidding is the predominant form of auction fraud, and it is also the most difficult to detect because it so closely resembles normal bidding behavior. Furthermore, shill bidding does not leave behind any apparent...
Given the magnitude of online auction transactions, it is difficult to safeguard consumers from dishonest sellers, such as shill bidders. To date, the application of Machine Learning Techniques (MLTs) to auction fraud has been limited, unlike their applications for combatting other types of fraud. Shill Bidding (SB) is a severe auction fraud, which...
Given the magnitude of online auction transactions, it is difficult to safeguard consumers from dishonest sellers, such as shill bidders. To date, the application of machine learning to auction fraud detection has been limited. Shill Bidding (SB) is a severe auction fraud, which is driven by modern-day technologies and clever scammers. The difficul...
E-auctions are vulnerable to Shill Bidding (SB), the toughest fraud to detect due to its resemblance to usual bidding behavior. To avoid financial losses for genuine buyers, we develop an SB detection model based on multi-class ensemble learning. For our study, we use a real SB dataset, but since the data are unlabeled, we combine a robust data c...
E-auctions are vulnerable to Shill Bidding (SB), the toughest fraud to detect due to its resemblance to usual bidding behavior. To avoid financial losses for genuine buyers, we develop an SB detection model based on multi-class ensemble learning. For our study, we use a real SB dataset, but since the data are unlabeled, we combine a robust data c...
Shill Bidding (SB) has been recognized as the predominant online auction fraud and also the most difficult to detect due to its similarity to normal bidding behavior. Previously, we produced a high-quality SB dataset based on actual auctions and effectively labeled the instances as normal or suspicious. To overcome the serious problem of imbalanc...
Although shill bidding is a common fraud in online auctions, it is very tough to detect because there is no obvious evidence of it happening. There are limited studies on SB classification because training data are difficult to produce. In this study, we build a high-quality labeled shill bidding dataset based on recently scraped auctions f...
Although shill bidding is a common fraud in online auctions, it is very tough to detect because there is no obvious evidence of it happening. There are limited studies on SB classification because training data are difficult to produce. In this study, we build a high-quality labeled shill bidding dataset based on recently scraped auctions f...
This paper presents a new Chaotic Discrete Firefly Algorithm (CDFA) for solving Constraint Satisfaction Problems (CSPs). CSPs are NP-complete problems, and the systematic search methods for solving them have exponential time costs. Metaheuristic algorithms that have been developed for solving complex problems can be considered as appropriat...
The 31st International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE-2018) was held at Concordia University in Montreal, Canada, June 25–28, 2018. This report summarizes the 31st International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/A...
Online auctions have created a very attractive environment for dishonest moneymakers who can commit different types of fraud. Shill Bidding (SB) is the predominant auction fraud and also the most difficult to detect because of its similarity to usual bidding behavior. Based on a newly produced SB dataset, in this study, we devise a fraud classifica...
With the growing recognition of the importance of Project Management (PM), new solutions are still being researched to improve PM practices in environments where project types are restricted. PM is becoming more widespread in business and academia, but without enough information about the course of action to take to achieve success. Th...
Although shill bidding is a common auction fraud, it is very tough to detect. Due to the lack of available training data, in this study, we build a high-quality labeled shill bidding dataset based on recently collected auctions from eBay. Labeling shill bidding instances with multidimensional features is a critical phase for the fra...
Constraint Satisfaction Problems are regarded as NP-complete problems, and solving them with systematic methods requires exponential time. The Firefly Algorithm is a nature-inspired algorithm that has been successfully applied to different combinatorial problems. This paper presents a new Discrete Firefly Algorithm for Solving Constraint Satisfaction...
E-auctions have attracted serious fraud, such as Shill Bidding (SB), due to the large amount of money involved and the anonymity of users. SB is difficult to detect given its similarity to normal bidding behavior. To this end, we develop an efficient SVM-based fraud classifier that enables auction companies to distinguish between legitimate and shill b...
In the last three decades, we have seen a significant increase in trading goods and services through online auctions. However, this business created an attractive environment for malicious moneymakers who can commit different types of fraud activities, such as Shill Bidding (SB). The latter is predominant across many auctions but this type of fraud...
Constraint Satisfaction Problems (CSPs) are known to be hard to solve and require a backtracking search algorithm with exponential time cost. Metaheuristics have recently gained much reputation for solving complex problems and can be employed as an alternative to tackle CSPs even if, in theory, they do not guarantee a complete solution to the problem...
We have conducted an online survey to review worldwide project management practices in the IT sector. Through this study, our goal is to identify factors that influence the success rate of IT projects and develop a new project management approach that will help businesses in the project management discipline.
Exploitation and exploration are two main search strategies of every metaheuristic algorithm. However, the ratio between exploitation and exploration has a significant impact on the performance of these algorithms when dealing with optimization problems. In this study, we introduce a complete fuzzy system to efficiently and dynamically tune the fire...
Exploitation and exploration are two main search strategies of every metaheuristic algorithm. However, the ratio between exploitation and exploration has a significant impact on the performance of these algorithms when dealing with optimization problems. In this study, we introduce a complete fuzzy system to efficiently and dynamically tune the fir...
This book constitutes the thoroughly refereed proceedings of the 31st International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2018, held in Montreal, QC, Canada, in June 2018.
The 53 full papers and 33 short papers presented were carefully reviewed and selected from 146 submissions. They ar...
Exploration and exploitation are two strategies used to search the problem space in Evolutionary Algorithms (EAs). The key to significantly increasing the performance of these optimization techniques, in terms of solution optimality, is striking the right balance between exploration and exploitation. Firefly is one of the most favored EAs. In this study,...
Key Performance Indicators (KPIs) are used to inspect the performance and progress of businesses. This study introduces a new, integrated approach to efficiently manage KPIs in the context of decentralized information and to address the visual and managerial gaps existing in companies. The proposed Business Indicator Management (BIM) system is esse...
This study introduces an advanced Combinatorial Reverse Auction (CRA) that is multi-unit, multi-attribute and multi-objective, and subject to buyer and seller trading constraints. Conflicting objectives may occur since the buyer can maximize some attributes and minimize others. To address the Winner Determination (WD) problem for this type of C...
Online auctioning has attracted serious fraud given the huge amount of money involved and the anonymity of users. In the auction fraud detection domain, class imbalance, meaning that bidding transactions contain fewer fraud instances than normal ones, negatively impacts classification performance because the latter is biased towards the majority class i.e....
Utility companies can organize e-auctions to procure electricity from other suppliers during peak load periods. For this purpose, we develop an efficient Combinatorial Reverse Auction (CRA) to purchase power from diverse sources, residents and plants. Our auction is different from what has been implemented in the electricity markets. In our CRA, wh...
Monitoring the progress of auctions for fraudulent bidding activities is crucial for detecting and stopping fraud during runtime to prevent fraudsters from succeeding. To this end, we introduce a stage-based framework to monitor multiple live auctions for In-Auction Fraud (IAF). Creating a stage fraud monitoring system is different than what has be...
This study introduces a new type of Combinatorial Reverse Auction (CRA) for products with multiple units, attributes and objectives, which are subject to buyer and seller constraints. In this advanced CRA, buyers may maximize some attributes and minimize others. To address the Winner Determination (WD) problem in the presence of multiple...
Auctioning multi-dimensional items is a key challenge, which requires rigorous tools. This study proposes a multi-round, first-score, semi-sealed multi-attribute reverse auction system. A fundamental concern in multi-attribute auctions is acquiring a useful description of the buyers’ individuated requirements: hard constraints and qualitative prefe...
Winner(s) determination in online reverse auctions is a very appealing e-commerce application. This is a combinatorial optimization problem where the goal is to find an optimal solution meeting a set of requirements and minimizing a given procurement cost. This problem is hard to tackle especially when multiple attributes of instances of items are...