Deep Behavioral Analysis of Machine Learning Algorithms Against Data Poisoning

Paracha, Anum and Arshad, Junaid and Farah, Mohamed and Ismail, Khalid N. (2024) Deep Behavioral Analysis of Machine Learning Algorithms Against Data Poisoning. International Journal of Information Security. ISSN 1615-5262 (In Press)


Abstract

Poisoning attacks represent one of the most common and practical adversarial attempts against machine learning systems. In this paper, we have conducted a deep behavioural analysis of six machine learning (ML) algorithms, analyzing the impact of poisoning and the correlation between poisoning levels and classification accuracy. Adopting an empirical approach, we highlight the practical feasibility of data poisoning, comprehensively analyzing the factors of individual algorithms that are affected by poisoning. We used public datasets (UNSW-NB15, BotDroid, CTU13, and CIC-IDS-2017) and varying poisoning levels (5%-25%) to conduct a rigorous analysis across different settings. In particular, we analyzed the accuracy, precision, recall, F1-score, false positive rate, and ROC of the chosen algorithms. Further, we conducted a sensitivity analysis of each algorithm to understand the impact of poisoning on its performance and the characteristics underpinning its susceptibility to data poisoning attacks. Our analysis shows that, with 15% poisoning of the UNSW-NB15 dataset, the accuracy of Decision Tree (DT) decreases by 15.04% with an increase of 14.85% in false positive rate. Further, with 25% poisoning of the BotDroid dataset, the accuracy of K-nearest neighbours (KNN) decreases by 15.48%. On the other hand, Random Forest (RF) is comparatively more resilient against poisoned training data, with a decrease in accuracy of 8.5% at 15% poisoning of the UNSW-NB15 dataset and 5.2% for the BotDroid dataset. Our results highlight that poisoning 10%-15% of a dataset is the most effective rate, significantly disrupting classifiers without introducing overfitting, whereas 25% poisoning is detectable because of the pronounced performance degradation and overfitting it induces. Our analysis also helps in understanding how asymmetric features and noise affect the impact of data poisoning on machine learning classifiers. Our experimentation and analysis are publicly available at: https://github.com/AnumAtique/Behavioural-Analaysis-of Poisoned-ML/
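
The abstract describes tracking classification metrics while increasing the fraction of poisoned training data. The sketch below is a minimal illustration of that kind of experiment, assuming a simple label-flipping poisoning strategy and synthetic placeholder data; the paper's actual poisoning method, preprocessing pipeline, and dataset loading are not reproduced here, so the poison_labels helper, the classifiers shown, and the synthetic features are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch (not the paper's code): label-flipping poisoning at
# varying rates, reporting accuracy, F1-score, and false positive rate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def poison_labels(y, rate, rng):
    """Flip the labels of a random `rate` fraction of training samples (assumes 0/1 labels)."""
    y_poisoned = y.copy()
    n_flip = int(rate * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN) from the binary confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return fp / (fp + tn)


rng = np.random.default_rng(0)

# Placeholder data standing in for UNSW-NB15 / BotDroid / CTU13 / CIC-IDS-2017;
# in practice the corresponding datasets would be loaded and encoded here.
X = rng.normal(size=(5000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 0% gives the clean baseline; 5%-25% matches the poisoning range in the abstract.
for rate in [0.0, 0.05, 0.10, 0.15, 0.20, 0.25]:
    y_poisoned = poison_labels(y_train, rate, rng)
    for name, clf in classifiers.items():
        clf.fit(X_train, y_poisoned)
        pred = clf.predict(X_test)
        print(f"{name} @ {rate:.0%} poisoning: "
              f"acc={accuracy_score(y_test, pred):.3f}, "
              f"f1={f1_score(y_test, pred):.3f}, "
              f"fpr={false_positive_rate(y_test, pred):.3f}")
```

In an experiment of this shape, comparing each poisoned run against the 0% baseline yields degradation figures of the kind quoted in the abstract (e.g. the drop in DT accuracy and rise in false positive rate at 15% poisoning).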

Item Type: Article
Dates: 11 November 2024 (Accepted)
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions: Faculty of Computing, Engineering and the Built Environment > College of Computing
Depositing User: Junaid Arshad
Date Deposited: 19 Nov 2024 13:06
Last Modified: 19 Nov 2024 13:06
URI: https://www.open-access.bcu.ac.uk/id/eprint/15975
