Mohamed Amine Ben Farah’s research while affiliated with Birmingham City University and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (40)


Figures: OOP Poisoned Model Generation · Surrogate Model Development · Calculating Distances from Decision Boundaries · GMM visualization of feature relationships in the dataset with PCA reduction · Performance analysis of support vector machines (SVM) with consistent poisoning (+6 more)
Outlier-oriented poisoning attack: a grey-box approach to disturb decision boundaries by perturbing outliers in multiclass learning
  • Article
  • Publisher preview available

February 2025 · 8 Reads · Mohamed Ben Farah · Khalid Ismail

Poisoning attacks are a primary threat to machine learning (ML) models, aiming to compromise their performance and reliability by manipulating training datasets. This paper introduces a novel attack, the outlier-oriented poisoning (OOP) attack, which manipulates the labels of the samples most distant from the decision boundaries. To ascertain the severity of the OOP attack at different poisoning degrees (5–25%), we analyzed the variance, accuracy, precision, recall, f1-score, and false positive rate of the chosen ML models. Benchmarking the OOP attack, we analyzed key characteristics of multiclass machine learning algorithms and their sensitivity to poisoning attacks. Our analysis helps in understanding the behaviour of multiclass models under data poisoning attacks and contributes to effective mitigation against them. Using three publicly available datasets (IRIS, MNIST, and ISIC), our analysis shows that KNN and GNB are the most affected algorithms, with decreases in accuracy of 22.81% and 56.07% on the IRIS dataset with 15% poisoning, whereas, for the same poisoning level and dataset, Decision Trees and Random Forest are the most resilient algorithms, with the least accuracy disruption (12.28% and 17.52%). We also analyzed the correlation between the number of dataset classes and the performance degradation of the models: the number of classes is inversely proportional to the performance degradation, specifically the decrease in accuracy, which levels off as the number of classes increases. Further, our analysis identified that an imbalanced dataset distribution can aggravate the impact of poisoning on machine learning models.
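The OOP procedure described in the abstract (flipping the labels of the samples most distant from the decision boundaries) can be sketched as a label-flipping step over a grey-box surrogate. This is a minimal illustration, not the authors' implementation: the linear SVM surrogate, the use of one-vs-rest `decision_function` scores as boundary distances, and the random choice of replacement class are all assumptions.

```python
# Illustrative sketch of an outlier-oriented label-flipping poisoning step.
# Assumptions (not from the paper): a linear surrogate SVM approximates the
# victim's decision boundaries, and the "most distant" samples are those
# with the largest one-vs-rest score for their own class.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

def oop_poison(X, y, rate=0.15, seed=0):
    """Flip labels of the samples most distant from the decision boundaries."""
    rng = np.random.default_rng(seed)
    surrogate = LinearSVC(dual=False).fit(X, y)    # grey-box surrogate model
    margins = surrogate.decision_function(X)       # (n_samples, n_classes) scores
    own = margins[np.arange(len(y)), y]            # each sample's own-class score
    n_poison = int(rate * len(y))
    idx = np.argsort(own)[-n_poison:]              # most distant = largest score
    y_poisoned = y.copy()
    classes = np.unique(y)
    for i in idx:                                  # relabel to some other class
        y_poisoned[i] = rng.choice(classes[classes != y[i]])
    return y_poisoned, idx

X, y = load_iris(return_X_y=True)
y_p, flipped = oop_poison(X, y, rate=0.15)
print(len(flipped), int((y_p != y).sum()))
```

Training a victim model on `(X, y_p)` and comparing its test metrics against one trained on the clean labels would reproduce the shape of the degradation analysis described above.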


Deep behavioral analysis of machine learning algorithms against data poisoning

November 2024 · 49 Reads · 1 Citation

Poisoning attacks represent one of the most common and practical adversarial attempts on machine learning systems. In this paper, we conducted a deep behavioural analysis of six machine learning algorithms, analyzing the impact of poisoning and the correlation between poisoning levels and classification accuracy. Adopting an empirical approach, we highlight the practical feasibility of data poisoning, comprehensively analyzing the factors of individual algorithms affected by poisoning. We used public datasets (UNSW-NB15, BotDroid, CTU13, and CIC-IDS-2017) and varying poisoning levels (5–25%) to conduct a rigorous analysis across different settings. In particular, we analyzed the accuracy, precision, recall, f1-score, false positive rate, and ROC of the chosen algorithms. Further, we conducted a sensitivity analysis of each algorithm to understand the impact of poisoning on its performance and the characteristics underpinning its susceptibility to data poisoning attacks. Our analysis shows that, for 15% poisoning of the UNSW-NB15 dataset, the accuracy of Decision Tree decreases by 15.04%, with an increase of 14.85% in false positive rate. Further, with 25% poisoning of the BotDroid dataset, the accuracy of K-nearest neighbours (KNN) decreases by 15.48%. On the other hand, Random Forest is comparatively more resilient to poisoned training data, with a decrease in accuracy of 8.5% at 15% poisoning of the UNSW-NB15 dataset and 5.2% for the BotDroid dataset. Our results highlight that 10–15% dataset poisoning is the most effective poisoning rate, significantly disrupting classifiers without introducing overfitting, whereas 25% poisoning is detectable because of the high performance degradation and overfitting it causes. Our analysis also helps in understanding how asymmetric features and noise affect the impact of data poisoning on machine learning classifiers. Our experimentation and analysis are publicly available at: https://github.com/AnumAtique/Behavioural-Analaysis-of-Poisoned-ML/.
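The poisoning-level sweep described above (flip a fraction of training labels, retrain, measure the accuracy drop) can be sketched as follows; the synthetic dataset and Random Forest classifier are stand-ins for the paper's datasets and model zoo, not its actual pipeline.

```python
# Sketch of a poisoning-level sweep: flip a random fraction of training
# labels and measure the resulting test accuracy. Dataset and model are
# illustrative stand-ins, not the paper's setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def accuracy_under_poisoning(rate, seed=0):
    """Train on a dataset with `rate` of training labels flipped; return test accuracy."""
    X, y = make_classification(n_samples=2000, n_features=20, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    rng = np.random.default_rng(seed)
    n_flip = int(rate * len(y_tr))
    idx = rng.choice(len(y_tr), size=n_flip, replace=False)
    y_poisoned = y_tr.copy()
    y_poisoned[idx] = 1 - y_poisoned[idx]          # binary label flip
    model = RandomForestClassifier(random_state=seed).fit(X_tr, y_poisoned)
    return accuracy_score(y_te, model.predict(X_te))

for rate in (0.0, 0.05, 0.10, 0.15, 0.20, 0.25):
    print(f"{rate:.0%} poisoning -> accuracy {accuracy_under_poisoning(rate):.3f}")
```

Extending the loop over several classifiers and recording precision, recall, f1-score, and false positive rate alongside accuracy would mirror the per-algorithm sensitivity analysis the abstract describes.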


Outlier-Oriented Poisoning Attack: A Grey-box Approach to Disturb Decision Boundaries by Perturbing Outliers in Multiclass Learning

November 2024 · 10 Reads

Poisoning attacks are a primary threat to machine learning models, aiming to compromise their performance and reliability by manipulating training datasets. This paper introduces a novel attack, the Outlier-Oriented Poisoning (OOP) attack, which manipulates the labels of the samples most distant from the decision boundaries. The paper also investigates the adverse impact of such attacks on different machine learning algorithms in a multiclass classification scenario, analyzing their variance and the correlation between poisoning levels and performance degradation. To ascertain the severity of the OOP attack at different poisoning degrees (5%–25%), we analyzed the variance, accuracy, precision, recall, f1-score, and false positive rate of the chosen ML models. Benchmarking our OOP attack, we analyzed key characteristics of multiclass machine learning algorithms and their sensitivity to poisoning attacks. Our experimentation used three publicly available datasets: IRIS, MNIST, and ISIC. Our analysis shows that KNN and GNB are the most affected algorithms, with decreases in accuracy of 22.81% and 56.07%, and increases in false positive rate to 17.14% and 40.45%, for the IRIS dataset with 15% poisoning. Further, Decision Trees and Random Forest are the most resilient algorithms, with the least accuracy disruption (12.28% and 17.52%) at 15% poisoning of the IRIS dataset. We also analyzed the correlation between the number of dataset classes and the performance degradation of the models: the number of classes is inversely proportional to the performance degradation, specifically the decrease in accuracy, which levels off as the number of classes increases. Further, our analysis identified that an imbalanced dataset distribution can aggravate the impact of poisoning on machine learning models.



Machine learning security and privacy: a review of threats and countermeasures

April 2024 · 239 Reads · 8 Citations · EURASIP Journal on Information Security

Machine learning has become prevalent in transforming diverse aspects of our daily lives through intelligent digital solutions; advanced disease diagnosis, autonomous vehicular systems, and automated threat detection and triage are some prominent use cases. Furthermore, the increasing use of machine learning in critical national infrastructures such as smart grids, transport, and natural resources makes it an attractive target for adversaries. The threat to machine learning systems is aggravated by the ability of malicious actors to reverse engineer publicly available models, gaining insight into the algorithms underpinning them. Focusing on the threat landscape for machine learning systems, we conducted an in-depth analysis to critically examine the security and privacy threats to machine learning and the factors involved in developing these adversarial attacks. Our analysis highlighted that feature engineering, model architecture, and targeted system knowledge are crucial aspects in formulating these attacks. Furthermore, one successful attack can lead to others; for instance, poisoning attacks can enable membership inference and backdoor attacks. We also reviewed the literature on methods and techniques to mitigate these threats, including data sanitization, adversarial training, and differential privacy, whilst identifying their limitations: cleaning and sanitizing datasets may introduce other challenges, including underfitting and degraded model performance, whereas differential privacy does not completely preserve a model's privacy. Leveraging the analysis of attack surfaces and mitigation techniques, we identify potential research directions to improve the trustworthiness of machine learning systems.


Deep Learning-Based Watermarking Techniques Challenges: A Review of Current and Future Trends

April 2024 · 228 Reads · 10 Citations · Circuits Systems and Signal Processing

The digital revolution places great emphasis on digital media watermarking due to the increased vulnerability of multimedia content to unauthorized alterations. Amid the recent boom in data-hiding technology, research has increasingly turned to watermarking with deep learning architectures, which have been applied to a variety of problems since their inception. Several watermarking approaches based on deep learning have been proposed and have proven their efficiency compared to traditional methods. This paper summarizes recent developments in conventional and deep learning image and video watermarking techniques. It shows that although many conventional techniques focus on video watermarking, there are yet to be any deep learning models focusing on this area; for image watermarking, by contrast, various deep learning-based techniques are observed, whose invisibility and robustness depend on the network architecture used. The study concludes by discussing possible research directions in deep learning-based video watermarking.





A new efficient anaglyph 3D image and video watermarking technique minimizing generation deficiencies

July 2023 · 208 Reads · 1 Citation

The 3D anaglyph system is among the most popular 3D display techniques thanks to its simplicity and the cheap glasses it uses. Anaglyph generation and watermarking are two essential techniques that have attracted researchers in the 3D anaglyph domain, where several approaches have been proposed. However, most previous anaglyph watermarking studies focused on robustness and the visual difference between original and marked content, and they have not considered the three deficiencies caused by the generation step: color distortion, retinal rivalry, and the ghosting effect. In this paper, we propose the first watermarking technique that protects 3D anaglyph content before its transmission by embedding the signature simultaneously with the generation step. In this technique, three signatures are embedded before, during, and after the generation process, using different domains to obtain robustness to several manipulations, especially malicious attacks. Moreover, the chosen generation process avoids generation deficiencies and yields high visual quality in the marked content. The experimental results illustrate robustness against attacks such as compression and collusion, where the minimum NC value is close to 0.7 and the maximum BER value is close to 0.2. Besides, the suggested technique provides high invisibility, with PSNR and SSIM values close to 58 and 0.9 respectively, and it minimizes the generation deficiencies.
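As a rough illustration of the kind of transform-domain embedding such watermarking schemes build on, here is a generic 2-D DCT sketch. This is an assumption-laden toy (the coefficient positions, strength `alpha`, and non-blind extraction are all illustrative choices), not the paper's three-signature anaglyph method.

```python
# Generic sketch of frequency-domain watermark embedding/extraction via the
# 2-D DCT. Illustrative toy only; not the paper's anaglyph technique.
import numpy as np
from scipy.fft import dctn, idctn

def embed(channel, bits, alpha=8.0):
    """Additively embed a bit sequence into mid-frequency DCT coefficients."""
    coeffs = dctn(channel.astype(float), norm="ortho")
    for k, bit in enumerate(bits):                 # one coefficient per bit
        coeffs[10 + k, 10 + k] += alpha if bit else -alpha
    return idctn(coeffs, norm="ortho")

def extract(marked, original, n_bits):
    """Recover bits by comparing marked and original DCT coefficients (non-blind)."""
    diff = dctn(marked.astype(float), norm="ortho") - dctn(
        original.astype(float), norm="ortho")
    return [int(diff[10 + k, 10 + k] > 0) for k in range(n_bits)]

rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(64, 64))         # stand-in luminance channel
signature = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed(image, signature)
print(extract(marked, image, len(signature)))
```

In a scheme like the one described above, robustness would be evaluated by re-extracting the signature after attacks such as compression and measuring NC/BER, and invisibility by computing PSNR/SSIM between `image` and `marked`.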


Citations (23)


... The poisoning levels are set between 5%-25%, at a scale of 5, studying model behavior in different settings. The settings of these poisoning limits are inspired by [40], which highlighted that a poisoning level > 25% leads to abrupt performance degradation that is detectable. With our analysis, we have identified important parameters of each algorithm that are sensitive to poisoning attacks, answering how the models are getting misclassified and identifying optimal poisoning rates for each algorithm. ...

Reference:

Outlier-oriented poisoning attack: a grey-box approach to disturb decision boundaries by perturbing outliers in multiclass learning
Deep behavioral analysis of machine learning algorithms against data poisoning

... Among different machine learning concepts utilized to address these tasks, Graph Convolutional Network (GCN) models have proven especially promising [5] due to their inherent ability to model molecular structures using graph-based representations, capturing intricate relationships between the atoms and bonds of the molecule. However, maximizing the potential of such models in practice is challenging, primarily because of data privacy and security concerns [6], which are significantly heightened when sharing data during collaborative efforts. ...

Machine learning security and privacy: a review of threats and countermeasures

EURASIP Journal on Information Security

... Blockchain has been proposed to enhance supply chain management objectives such as cost, quality, speed and risk management within ports and vessels (Li and Zhou, 2021;Farah et al., 2024). Furthermore, the application of smart contracts in maritime operations has shown promise in automating regulatory compliance processes (Hasan et al., 2019). ...

A survey on blockchain technology in the maritime industry: Challenges and future perspectives
  • Citing Article
  • April 2024

Future Generation Computer Systems

... Steganography [12,13] and digital watermarking [8,21] are two information hiding techniques widely used in the field of data security. Generally, steganography realizes the secret transmission of data by embedding secret data into multimedia covers such as images, audio or video. ...

Deep Learning-Based Watermarking Techniques Challenges: A Review of Current and Future Trends

Circuits Systems and Signal Processing

... Apart from ML models, recurrent neural networks (RNNs) and large language models (LLMs) are also used for sentiment classification tasks. RNN-based models, such as ANN, LSTM, BiLSTM, CNN, CNN-LSTM, and CNN-BiLSTM [28][29][30][31][32], are leveraged for comment classification. Additionally, fine-tuned versions of LLMs, such as RoBERTa (one-shot), LLaMA (one-shot), and T5 (one-shot) [33] are employed for comment classification based on different datasets. ...

Sorting the Digital Stream: Big Data-driven Insights into Email Classification for Spam and Ham Detection
  • Citing Conference Paper
  • December 2023

... This equation can be used to calculate the energy conversion process under different motion modes and combined with Transformer to predict the motion state of future frames, thus generating video sequences that conform to biomechanical laws. Combined with Transformer's global temporal feature learning capability, biomechanical motion analysis can achieve more accurate motion prediction, anomaly detection and video generation, and promote the in-depth development of intelligent motion analysis and simulation [6]. ...

A new efficient anaglyph 3D image and video watermarking technique minimizing generation deficiencies

... Additionally, WebRTC devices and machine learning models must be interoperable and standard. A simulated eavesdropping attack on WebRTC was presented in this paper [119]. The privacy concerns within the system must also be considered in addition to the security risk. ...

Detection of JavaScript Injection Eavesdropping on WebRTC communications
  • Citing Conference Paper
  • June 2022

... Man-in-the-middle (MITM) attacks have led to vessel mispositioning, collisions, and disruptions to Global Navigation Satellite Systems (GNSS) [23][24][25]. Additionally, ransomware attacks such as Hermes 2.1 and breaches in sensitive data systems, like those involving the US Navy contractor, further highlight the vulnerabilities of maritime power systems [26][27][28][29][30]. ...

Cyber Security in the Maritime Industry: A Systematic Survey of Recent Advances and Future Trends

... This, in turn, can be complemented with professional certifications, which typically operate through a single exam and grant a credential that serves as a benchmark of knowledge in the labor market. In this case, it is useful to understand their scope and their relative difficulty [11]. ...

Cyber Security Certification Programmes
  • Citing Conference Paper
  • July 2021