Hybrid Metaheuristic Methods for Ensemble Classification in Non-stationary Data Streams

Ghomeshi, Hossein (2020) Hybrid Metaheuristic Methods for Ensemble Classification in Non-stationary Data Streams. Doctoral thesis, Birmingham City University.


Abstract

The extensive growth of digital technologies has led to new challenges in processing and distilling insights from data that is generated continuously in real time. To address these challenges, several data stream mining techniques, in which each instance of data is typically processed once on its arrival (i.e. online), have been proposed. However, such techniques often perform poorly over non-stationary data streams, where the distribution of data evolves over time in unforeseen ways. To ensure the predictive ability of a computational model working with evolving data, appropriate data stream mining techniques capable of adapting to different types of concept drifts are required. So far, ensemble-based learning methods are among the most popular techniques employed for performing data stream classification tasks in the presence of concept drifts. In ensemble learning, multiple learners forming an ensemble are trained to obtain better predictive performance than that of a single learner.
This thesis aims to propose and investigate novel hybrid metaheuristic methods for performing classification tasks in non-stationary environments. In particular, the thesis offers the following three main contributions. First, it presents the Evolutionary Adaptation to Concept Drifts (EACD) method, which uses two evolutionary algorithms, namely Replicator Dynamics (RD) and the Genetic Algorithm (GA). According to this method, an ensemble of different classification types is created based on various feature sets (called subspaces) randomly drawn from the target data stream. These subspaces are allowed to grow or shrink based on their performance using RD, while their combinations are optimised using GA. As the second contribution, this thesis proposes the REplicator Dynamics & GENEtic (RED-GENE) algorithm. RED-GENE builds upon the EACD method and employs the same approach to creating different classification types and the same GA optimisation technique. At the same time, RED-GENE improves on the EACD method by proposing three different modified versions of RD to accelerate the concept drift adaptation process. The third contribution of the thesis is the REplicator Dynamics & Particle Swarm Optimisation (RED-PSO) algorithm, which is based on a three-layer architecture to produce classification types of different sizes. The selected feature combinations in all classification types are optimised using a non-canonical version of the Particle Swarm Optimisation (PSO) technique for each layer individually.

An extensive set of experiments using both synthetic and real-world data streams demonstrates the effectiveness of the three proposed methods, along with the statistical significance of their results relative to state-of-the-art algorithms. The proposed methods are subsequently compared with each other, which shows that each has its own strengths in concept drift adaptation for non-stationary data stream classification. This has led us to formulate a list of suggestions on when to use each of the proposed methods with regard to different applications and environments.
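
As a rough illustration of the replicator-dynamics idea mentioned in the abstract, where better-performing subspaces grow and weaker ones shrink, the following minimal sketch applies the textbook discrete-time replicator equation to a set of shares. It is not the thesis's actual update rule: the function name `replicator_step`, the use of a per-subspace accuracy as fitness, and the example values are all assumptions made for illustration only.

```python
def replicator_step(shares, fitness):
    """One discrete-time replicator update: x_i <- x_i * f_i / sum_j(x_j * f_j).

    shares  -- hypothetical relative weight of each subspace (sums to 1)
    fitness -- hypothetical non-negative performance score per subspace
    """
    mean_fitness = sum(x * f for x, f in zip(shares, fitness))
    if mean_fitness <= 0:
        # No subspace performs at all: leave the shares unchanged.
        return list(shares)
    return [x * f / mean_fitness for x, f in zip(shares, fitness)]


# Three hypothetical subspaces start with equal shares.
shares = [1 / 3, 1 / 3, 1 / 3]
# Assumed (made-up) recent accuracy of the classifier built on each subspace.
accuracy = [0.9, 0.6, 0.3]

for _ in range(5):
    shares = replicator_step(shares, accuracy)

print(shares)
```

Under these assumed scores, the share of the most accurate subspace increases at every step while the others shrink, and the shares continue to sum to one.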

Item Type: Thesis (Doctoral)
Date: 15 June 2020
Uncontrolled Keywords: Data Stream Mining, Ensemble Learning, Concept Drifts, Non-stationary Data Streams, Data Stream Classification.
Subjects: G400 Computer Science
Divisions: Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology
REF UoA Output Collections > Doctoral Theses Collection
Depositing User: Kip Darling
Date Deposited: 27 Jun 2021 19:42
Last Modified: 27 Jun 2021 19:42
URI: http://www.open-access.bcu.ac.uk/id/eprint/11838
