Ensemble Dynamics in Non-stationary Data Stream Classification

Ghomeshi, Hossein and Gaber, Mohamed Medhat and Kovalchuk, Yevgeniya (2018) Ensemble Dynamics in Non-stationary Data Stream Classification. In: Learning from Data Streams in Evolving Environments. Springer, pp. 123-153.


Abstract

Data stream classification is the process of learning supervised models from a continuous, potentially infinite stream of labelled examples that, in most cases, the data mining algorithm can read only once. One of the most challenging problems in this process is learning such models in non-stationary environments, where the data or class distribution evolves over time. This phenomenon is called concept drift. Ensemble learning techniques have proven effective at adapting to concept drift. Ensemble learning is the process of training a number of classifiers and combining their outputs through a combination rule to predict incoming data. These techniques must process and learn from data incrementally, within limited memory and time, both to predict incoming instances and to cope with different types of concept drift, including incremental, gradual, abrupt and recurring drift. A large number of applications can benefit from classifying non-stationary data streams, including weather forecasting, stock market analysis, spam filtering, credit card fraud detection, traffic monitoring and sensor data analysis in Internet of Things (IoT) networks, to mention a few. Since each application has its own characteristics and conditions, it is difficult to introduce a single approach that would suit all problem domains. This chapter studies the dynamic behaviour of existing ensemble methods (e.g. the addition, removal and update of classifiers) in non-stationary data stream classification. It proposes a new, compact, yet informative formalisation of state-of-the-art methods. The chapter also presents the results of our experiments comparing a diverse selection of the best-performing algorithms on several benchmark data sets with different types of concept drift from different problem domains.
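
The ensemble dynamics named in the abstract (addition, removal and update of classifiers, plus a combination rule) can be illustrated with a minimal sketch. It is not the chapter's formalisation, nor any of the algorithms it compares. Every name here (`CentroidLearner`, `StreamEnsemble`, `make_stream`, `prequential_accuracy`) and every parameter value is hypothetical, and the base learner, the error-weighted vote and the synthetic abrupt-drift stream are assumptions chosen only to keep the example small.

```python
import random
from collections import defaultdict


class CentroidLearner:
    """Hypothetical incremental base learner: one running mean per class."""

    def __init__(self):
        self.sums = {}                   # label -> per-feature running sums
        self.counts = defaultdict(int)   # label -> number of examples seen

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        if not self.sums:
            return None                  # nothing learned yet, so no vote

        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=sq_dist)


class StreamEnsemble:
    """Assumed dynamics: update all members, periodically swap out the worst."""

    def __init__(self, size=5, period=50):
        self.size, self.period = size, period
        self.members = [CentroidLearner()]
        self.errors = [0]                # errors per member in this period
        self.seen = 0

    def predict(self, x):
        # Combination rule: vote weighted by 1 / (1 + recent errors).
        votes = defaultdict(float)
        for member, err in zip(self.members, self.errors):
            label = member.predict(x)
            if label is not None:
                votes[label] += 1.0 / (1.0 + err)
        return max(votes, key=votes.get) if votes else None

    def learn(self, x, y):
        # Update: score each member on the instance, then train it.
        for i, member in enumerate(self.members):
            if member.predict(x) != y:
                self.errors[i] += 1
            member.learn(x, y)
        self.seen += 1
        if self.seen % self.period == 0:
            # Removal: drop the member with the most errors once full.
            if len(self.members) >= self.size:
                worst = max(range(len(self.members)),
                            key=self.errors.__getitem__)
                del self.members[worst]
            # Addition: a fresh member that only sees upcoming data.
            self.members.append(CentroidLearner())
            self.errors = [0] * len(self.members)


def make_stream(n=1000, drift_at=500, seed=0):
    """Synthetic 1-D stream whose class centres swap at `drift_at`."""
    rng = random.Random(seed)
    for t in range(n):
        y = rng.randint(0, 1)
        centre = 1.0 if y == 1 else -1.0
        if t >= drift_at:                # abrupt drift: the concept flips
            centre = -centre
        yield [rng.gauss(centre, 0.3)], y


def prequential_accuracy(ensemble, stream):
    """Test-then-train: predict each instance before learning from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(ensemble.predict(x) == y)
        ensemble.learn(x, y)
        total += 1
    return correct / total


if __name__ == "__main__":
    print(prequential_accuracy(StreamEnsemble(), make_stream()))
```

Each instance is read once and used first for prediction and then for training, which matches the single-pass constraint described above. Replacing the worst member on a fixed schedule is only one possible policy; methods that add or remove classifiers in response to a drift detector would change the `learn` step, not the overall structure.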

Item Type: Book Section
Subjects: G400 Computer Science
Divisions: Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology > Enterprise Systems
Faculty of Computing, Engineering and the Built Environment
Depositing User: Mohamed Gaber
Date Deposited: 13 Nov 2018 11:51
Last Modified: 13 Nov 2018 11:51
URI: http://www.open-access.bcu.ac.uk/id/eprint/6560
