Deep Behavioral Analysis of Machine Learning Algorithms Against Data Poisoning

Paracha, Anum and Arshad, Junaid and Farah, Mohamed and Ismail, Khalid N. (2024) Deep Behavioral Analysis of Machine Learning Algorithms Against Data Poisoning. International Journal of Information Security. ISSN 1615-5262 (In Press)


Abstract

Poisoning attacks represent one of the most common and practical adversarial attempts against machine learning systems. In this paper, we have conducted a deep behavioural analysis of six machine learning (ML) algorithms, analyzing the impact of poisoning and the correlation between poisoning levels and classification accuracy. Adopting an empirical approach, we highlight the practical feasibility of data poisoning, comprehensively analyzing the factors of individual algorithms that are affected by poisoning. We used public datasets (UNSW-NB15, BotDroid, CTU13, and CIC-IDS-2017) and varying poisoning levels (5%-25%) to conduct a rigorous analysis across different settings. In particular, we analyzed the accuracy, precision, recall, F1-score, false positive rate, and ROC of the chosen algorithms. Further, we conducted a sensitivity analysis of each algorithm to understand the impact of poisoning on its performance and the characteristics underpinning its susceptibility to data poisoning attacks. Our analysis shows that, with 15% poisoning of the UNSW-NB15 dataset, the accuracy of Decision Tree (DT) decreases by 15.04% with an increase of 14.85% in false positive rate. Further, with 25% poisoning of the BotDroid dataset, the accuracy of K-nearest neighbours (KNN) decreases by 15.48%. On the other hand, Random Forest (RF) is comparatively more resilient against poisoned training data, with a decrease in accuracy of 8.5% at 15% poisoning of the UNSW-NB15 dataset and 5.2% for the BotDroid dataset. Our results highlight that poisoning 10%-15% of a dataset is the most effective rate, significantly disrupting classifiers without introducing overfitting, whereas 25% poisoning is detectable because of the pronounced performance degradation and overfitting it induces. Our analysis also helps in understanding how asymmetric features and noise affect the impact of data poisoning on machine learning classifiers. Our experimentation and analysis are publicly available at: https://github.com/AnumAtique/Behavioural-Analaysis-of Poisoned-ML/
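
The abstract describes tracking classification metrics while increasing the fraction of poisoned training data. The sketch below is a minimal illustration of that kind of experiment, assuming a simple label-flipping poisoning strategy and synthetic placeholder data; the paper's actual poisoning method, preprocessing pipeline, and dataset loading are not reproduced here, so the poison_labels helper, the classifiers shown, and the synthetic features are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch (not the paper's code): label-flipping poisoning at
# varying rates, reporting accuracy, F1-score, and false positive rate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def poison_labels(y, rate, rng):
    """Flip the labels of a random `rate` fraction of training samples (assumes 0/1 labels)."""
    y_poisoned = y.copy()
    n_flip = int(rate * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN) from the binary confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return fp / (fp + tn)


rng = np.random.default_rng(0)

# Placeholder data standing in for UNSW-NB15 / BotDroid / CTU13 / CIC-IDS-2017;
# in practice the corresponding datasets would be loaded and encoded here.
X = rng.normal(size=(5000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 0% gives the clean baseline; 5%-25% matches the poisoning range in the abstract.
for rate in [0.0, 0.05, 0.10, 0.15, 0.20, 0.25]:
    y_poisoned = poison_labels(y_train, rate, rng)
    for name, clf in classifiers.items():
        clf.fit(X_train, y_poisoned)
        pred = clf.predict(X_test)
        print(f"{name} @ {rate:.0%} poisoning: "
              f"acc={accuracy_score(y_test, pred):.3f}, "
              f"f1={f1_score(y_test, pred):.3f}, "
              f"fpr={false_positive_rate(y_test, pred):.3f}")
```

In an experiment of this shape, comparing each poisoned run against the 0% baseline yields degradation figures of the kind quoted in the abstract (e.g. the drop in DT accuracy and rise in false positive rate at 15% poisoning).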

Item Type: Article
Dates: 11 November 2024 (Accepted)
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions: Faculty of Computing, Engineering and the Built Environment > College of Computing
Depositing User: Junaid Arshad
Date Deposited: 19 Nov 2024 13:06
Last Modified: 19 Nov 2024 13:06
URI: https://www.open-access.bcu.ac.uk/id/eprint/15975
