On Expressiveness and Uncertainty Awareness in Rule-based Classification for Data Streams

Le, T. and Stahl, Frederic and Gaber, Mohamed Medhat and Gomes, J.B. and Di Fatta, G. (2017) On Expressiveness and Uncertainty Awareness in Rule-based Classification for Data Streams. Neurocomputing, 265. pp. 127-141. ISSN 0925-2312

submission_neuro.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Official URL: https://doi.org/10.1016/j.neucom.2017.05.081

Abstract

Mining data streams is a core element of Big Data Analytics. It represents the velocity of large datasets, which is one of the four aspects of Big Data, the other three being volume, variety and veracity. As data streams in, models are constructed using data mining techniques tailored towards continuous and fast model updates. The Hoeffding Inequality has been among the most successful approaches in learning theory for data streams. In this context, it is typically used to provide a statistical bound on the number of examples needed in each step of an incremental learning process. It has been applied to both classification and clustering problems. Despite the success of the Hoeffding Tree classifier and other data stream mining methods, such models fall short of explaining how their results (i.e., classifications) are reached (black boxing). The expressiveness of decision models in data streams is an area of research that has attracted less attention, despite its paramount practical importance. In this paper, we address this issue, adopting the Hoeffding Inequality as an upper bound to build decision rules which can help decision makers with informed predictions (white boxing). We termed our novel method Hoeffding Rules with respect to the use of the Hoeffding Inequality in the method, for estimating whether a rule induced from a smaller sample would be of the same quality as a rule induced from a larger sample. The new method brings a number of novel contributions, including handling uncertainty through abstaining, dealing with continuous data through Gaussian statistical modelling, and an experimentally proven fast algorithm. We conducted a thorough experimental study using benchmark datasets, showing the efficiency and expressiveness of the proposed technique when compared with the state-of-the-art.
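
For readers unfamiliar with the bound mentioned in the abstract, the following is a minimal sketch of the standard one-sided Hoeffding Inequality as it is commonly used in data stream mining. It is not taken from the paper and does not reproduce the Hoeffding Rules algorithm; the function names `hoeffding_bound` and `examples_needed`, and the parameters `value_range`, `delta` and `epsilon`, are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon for n independent observations of a bounded variable.

    With probability at least 1 - delta, the true mean is no more than
    epsilon below the observed sample mean (one-sided Hoeffding bound).
    value_range is the width R of the interval the variable can take.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def examples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Return the smallest n for which the bound is at most epsilon."""
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.ceil(
        value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    )


if __name__ == "__main__":
    # Hypothetical setting: a rule-quality score in [0, 1], 95% confidence.
    for n in (50, 100, 500, 1000):
        print(n, round(hoeffding_bound(1.0, 0.05, n), 4))
    print(examples_needed(1.0, 0.05, 0.1))
```

The bound shrinks as more examples arrive, which is why it suits incremental learning: a decision made on a small sample can be accepted once epsilon is small enough that a larger sample would be unlikely to change it.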

Item Type:	Article
Identification Number:	10.1016/j.neucom.2017.05.081
Dates:	3 June 2017 (Published Online); 26 May 2017 (Accepted)
Uncontrolled Keywords:	Data Stream mining; Big Data Analytics; Classification; Expressiveness; Abstaining; Modular Classification Rule Induction
Subjects:	CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions:	Faculty of Computing, Engineering and the Built Environment; Faculty of Computing, Engineering and the Built Environment > College of Computing
Depositing User:	Ian Mcdonald
Date Deposited:	17 May 2017 09:20
Last Modified:	22 Mar 2023 12:01
URI:	https://www.open-access.bcu.ac.uk/id/eprint/4524