Data Stream Clustering for Real-Time Anomaly Detection: An Application to Insider Threats

Haidar, Diana and Gaber, Mohamed Medhat (2018) Data Stream Clustering for Real-Time Anomaly Detection: An Application to Insider Threats. In: Clustering Methods for Big Data Analytics. Springer, pp. 115-144.

Abstract

Insider threat detection is an emergent concern for academia, industry, and governments due to the growing number of insider incidents in recent years. The continuous streaming of unbounded data from various sources in an organisation, typically at high velocity, leads to a typical Big Data computational problem. The malicious insider threat refers to anomalous behaviours (outliers) that deviate from the normal baseline of a data stream. The absence of previously logged activities executed by users shapes the insider threat detection mechanism into an unsupervised anomaly detection approach over a data stream. A common shortcoming of existing data mining approaches to insider threat detection is the high number of false alarms/positives (FPs). To handle the Big Data issue and to address this shortcoming, we propose a streaming anomaly detection approach for insider threat detection, namely Ensemble of Random subspace Anomaly detectors In Data Streams (E-RAIDS). E-RAIDS learns an ensemble of p established outlier detection techniques [Micro-cluster-based Continuous Outlier Detection (MCOD) or Anytime Outlier Detection (AnyOut)], which employ clustering over continuous data streams. Each of the p models learns from a random feature subspace to detect local outliers, which might not be detected over the whole feature space. E-RAIDS introduces an aggregate component that combines the results from the p feature subspaces in order to decide whether to generate an alarm at each window iteration. The merit of E-RAIDS is that it defines a survival factor and a vote factor to address the shortcoming of the high number of FPs. Experiments on E-RAIDS-MCOD and E-RAIDS-AnyOut are carried out on synthetic data sets, including malicious insider threat scenarios, generated at Carnegie Mellon University, to test the effectiveness of voting feature subspaces and the capability to detect (more than one)-behaviour-all-threat in real time. The results show that E-RAIDS-MCOD reports the highest F1 measure and fewer false alarms (= 0) than E-RAIDS-AnyOut, and that it detects approximately all the insider threats in real time.
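
The ensemble-and-vote idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The base detector is a simple distance-based stand-in, not MCOD or AnyOut, and the meanings given to survival_factor (number of consecutive windows in which a point stays flagged within one subspace) and vote_factor (number of subspaces that must agree before an alarm is raised) are assumptions made for illustration, since the abstract does not define them. All function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def make_subspaces(n_features, p, subspace_size, seed=0):
    """Draw p random feature subsets (one per ensemble member)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size)) for _ in range(p)]


def window_outliers(window, features, radius, min_neighbours):
    """Stand-in detector (NOT MCOD/AnyOut): flag points that have fewer than
    min_neighbours other points within radius, measured in the given subspace."""
    flagged = set()
    for i, x in enumerate(window):
        neighbours = 0
        for j, y in enumerate(window):
            if i != j:
                dist = sum((x[f] - y[f]) ** 2 for f in features) ** 0.5
                if dist <= radius:
                    neighbours += 1
        if neighbours < min_neighbours:
            flagged.add(i)
    return flagged


def detect(stream, subspaces, window_size, slide, radius, min_neighbours,
           survival_factor, vote_factor):
    """Return the start index of every window at which an alarm is raised."""
    # streak[k][idx]: consecutive windows in which point idx was flagged in subspace k
    streak = [defaultdict(int) for _ in subspaces]
    alarms = []
    for start in range(0, len(stream) - window_size + 1, slide):
        window = stream[start:start + window_size]
        votes = 0
        for k, features in enumerate(subspaces):
            local = window_outliers(window, features, radius, min_neighbours)
            flagged = {start + i for i in local}  # map to global stream indices
            # Points that are no longer flagged lose their streak.
            streak[k] = defaultdict(int, {i: streak[k][i] + 1 for i in flagged})
            # Assumed rule: a subspace votes once some outlier has survived long enough.
            if any(count >= survival_factor for count in streak[k].values()):
                votes += 1
        # Assumed rule: alarm only if enough subspaces vote.
        if votes >= vote_factor:
            alarms.append(start)
    return alarms


# Toy stream: 20 points, 3 features; point 10 deviates only in feature 2.
stream = [[(i % 3) * 0.1, (i % 2) * 0.1, 0.0] for i in range(20)]
stream[10] = [0.0, 0.0, 5.0]

# Subspaces fixed by hand here for reproducibility; make_subspaces(3, 3, 2)
# would draw them at random instead.
subspaces = [[0, 1], [1, 2], [0, 2]]
print(detect(stream, subspaces, window_size=8, slide=2, radius=0.5,
             min_neighbours=2, survival_factor=2, vote_factor=2))
# prints [6, 8, 10]
```

In this toy run the anomaly is invisible in subspace [0, 1] but flagged in the two subspaces that contain feature 2, which is the sense in which a random subspace can expose a local outlier. Requiring the outlier to persist over two windows and two subspaces to agree delays the alarm by one window iteration, and raising vote_factor to 3 suppresses it entirely.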

Item Type:	Book Section
Dates:	2019 (UNSPECIFIED); 4 April 2018 (Accepted)
Subjects:	CAH11 - computing > CAH11-01 - computing > CAH11-01-05 - artificial intelligence
Divisions:	Faculty of Computing, Engineering and the Built Environment; Faculty of Computing, Engineering and the Built Environment > College of Computing
Depositing User:	Mohamed Gaber
Date Deposited:	13 Nov 2018 11:45
Last Modified:	22 Mar 2023 12:01
URI:	https://www.open-access.bcu.ac.uk/id/eprint/6561