A Behaviour-Informed Approach to Mitigate Data Poisoning Attacks for Machine Learning

Paracha, Anum (2026) A Behaviour-Informed Approach to Mitigate Data Poisoning Attacks for Machine Learning. Doctoral thesis, Birmingham City University.

Text: Anum Paracha PhD Thesis_Final Version_Final Award January 2026.pdf - Accepted Version (14MB)

Abstract

The widespread deployment of machine learning (ML) across critical domains such as healthcare, transport, and smart grids has increased dependence on automated decision-making, exposing these systems to potential exploitation by adversaries. ML models are highly susceptible to adversarial perturbations, including intentional input manipulations that can alter model performance. Among these, data poisoning attacks are particularly concerning because there are numerous ways to corrupt training data, distorting a model's underlying behaviour and undermining system reliability. Furthermore, the risks of data poisoning grow as reliance on public datasets increases.

Data poisoning attacks have been extensively explored in the context of deep learning (DL) models; however, traditional ML, and multiclass models in particular, remains underexplored with respect to both vulnerabilities and defences. Consequently, most mitigation strategies are limited to DL and are designed for specific algorithms or attack models. For example, adversarial training is effective for gradient-based models but less so for traditional models, which do not rely on gradient optimisation. These limitations allow adversaries to bypass defences through new attack vectors, complicating the security of ML systems, while the scarcity of defences for traditional ML leaves these models exposed to such attacks.

This thesis analyses the security of traditional ML under data poisoning attacks mounted with limited adversarial capabilities and knowledge, examines the limitations of existing defences, and introduces an enhanced mitigation strategy. Manipulations of training datasets are studied through a comprehensive deep behavioural analysis that identifies changes in model characteristics, the impact of increasing poisoning levels, and the relationships between them. Furthermore, a new multiclass poisoning attack, the Outlier-Oriented Poisoning (OOP) attack, is proposed; it exploits a common outlier characteristic of ML models and enables the examination of multiclass ML under limited adversarial capabilities. These studies reveal how data poisoning alters a model's learning dynamics and characteristics. Insights from this analysis informed the development of SecureLearn, a behaviour-informed, attack-agnostic mitigation solution that combines enhanced data sanitisation with a novel feature-oriented adversarial training (FORT) approach to improve model resilience against data poisoning.
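The abstract does not detail how the OOP attack selects points to corrupt; the following is a minimal illustrative sketch, assuming an outlier-scoring model (here an IsolationForest, a hypothetical choice) is used to pick the most anomalous training points, whose labels are then flipped to other classes at a given poisoning rate. The thesis's actual algorithm may differ in both the outlier measure and the relabelling strategy.

```python
# Hypothetical sketch of an outlier-oriented label-flipping step; not the
# thesis's exact OOP procedure. Outliers are scored with an IsolationForest
# and the labels of the most anomalous points are flipped to another class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest

def oop_poison(X, y, poison_rate=0.10, n_classes=3, seed=0):
    rng = np.random.default_rng(seed)
    # Lower score_samples values indicate stronger outliers.
    scores = IsolationForest(random_state=seed).fit(X).score_samples(X)
    n_poison = int(poison_rate * len(y))
    idx = np.argsort(scores)[:n_poison]  # indices of the most outlying points
    y_poisoned = y.copy()
    # Flip each selected label to a randomly chosen different class.
    y_poisoned[idx] = (y[idx] + rng.integers(1, n_classes, size=n_poison)) % n_classes
    return y_poisoned, idx

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=0)
y_poisoned, flipped = oop_poison(X, y, poison_rate=0.10)
print(f"Flipped {len(flipped)} of {len(y)} labels")
```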

This thesis evaluates SecureLearn using a proposed 3D evaluation matrix. Experimental results demonstrate that SecureLearn effectively enhances the security and robustness of multiclass ML across random forest (RF), decision tree (DT), Gaussian Naive Bayes (GNB) and neural network models, confirming its generalisability beyond algorithm-specific defences. SecureLearn consistently maintained accuracy above 90%, recall and F1-score above 75%, and reduced the false discovery rate to 0.06 across all evaluated models against three distinct poisoning attacks. For RF models, SecureLearn maintained a minimum recall of 84.19% and an F1-score of 81.54% at a 20% poisoning level under the OOP attack. For DT models, the minimum recall is 78.20% and the F1-score is 7.80%. However, SecureLearn is observed to be less effective in enhancing the resilience of GNB models trained on the MNIST dataset, for which it maintained a minimum recall of 57% with an F1-score of 56%. For neural networks, SecureLearn achieved at least 97% recall and F1-score against all selected poisoning attacks. The adversarial robustness of models trained with SecureLearn improved with an average accuracy trade-off of only 3%.
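For reference, the headline metrics reported above (accuracy, macro-averaged recall and F1-score, and false discovery rate) can be computed as in the sketch below. The dataset, model and the treatment of FDR as one minus macro precision are assumptions for illustration; the thesis's 3D evaluation matrix itself is not reproduced here.

```python
# Minimal sketch of the reported evaluation metrics for a multiclass model.
# Dataset, classifier and macro averaging are illustrative assumptions only.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = model.predict(X_te)

print(f"accuracy : {accuracy_score(y_te, y_pred):.4f}")
print(f"recall   : {recall_score(y_te, y_pred, average='macro'):.4f}")
print(f"f1-score : {f1_score(y_te, y_pred, average='macro'):.4f}")
# False discovery rate = FP / (FP + TP), i.e. 1 - precision (macro-averaged here).
print(f"FDR      : {1 - precision_score(y_te, y_pred, average='macro'):.4f}")
```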

Item Type: Thesis (Doctoral)
Dates: 13 January 2026 (Accepted)
Uncontrolled Keywords: Adversarial Machine Learning, Data Poisoning Attacks, Attack-agnostic Security, Training-time Attacks, Secure AI
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
CAH11 - computing > CAH11-01 - computing > CAH11-01-05 - artificial intelligence
CAH11 - computing > CAH11-01 - computing > CAH11-01-08 - others in computing
Divisions: Architecture, Built Environment, Computing and Engineering > Computer Science
Doctoral Research College > Doctoral Theses Collection
Depositing User: Louise Muldowney
Date Deposited: 09 Feb 2026 11:18
Last Modified: 09 Feb 2026 11:18
URI: https://www.open-access.bcu.ac.uk/id/eprint/16846
