A Novel Feature-Engineered–NGBoost Machine-Learning Framework for Fraud Detection in Electric Power Consumption Data

Hussain, Saddam and Mustafa, Mohd Wazir and Al-Shqeerat, Khalil and Saeed, Faisal and Al-rimy, Bander (2021) A Novel Feature-Engineered–NGBoost Machine-Learning Framework for Fraud Detection in Electric Power Consumption Data. Sensors, 21 (24). p. 8423. ISSN 1424-8220

sensors-21-08423-v2.pdf - Published Version
Available under License Creative Commons Attribution.


This study presents a novel feature-engineered–natural gradient descent ensemble-boosting (NGBoost) machine-learning framework for detecting fraud in power consumption data. The proposed framework was executed sequentially in three stages: data pre-processing, feature engineering, and model evaluation. First, a random forest-based imputation technique was used to fill in the missing entries in the acquired smart meter dataset. In the second phase, the majority weighted minority oversampling technique (MWMOTE) algorithm was used to correct the unequal distribution of data samples among the classes. A time-series feature-extraction library and the whale optimization algorithm were then used to extract and select the most relevant features from the consumers’ kWh readings. Once the most relevant features were acquired, the NGBoost algorithm was trained and tested to classify the consumers into two distinct categories (“Healthy” and “Theft”). Finally, each input feature’s impact (positive or negative) on the prediction of the target variable was identified with the tree SHAP (SHapley Additive exPlanations) algorithm. The proposed framework achieved an accuracy of 93%, a recall of 91%, and a precision of 95%, all of which exceeded those of the competing models, thus validating its efficacy and significance in the studied field of research.
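The accuracy, recall, and precision quoted in the abstract are standard confusion-matrix metrics. The following is a minimal sketch of how such figures are typically computed for a two-class "Healthy"/"Theft" problem, assuming "Theft" is treated as the positive class (the abstract does not state this). The function name `classification_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "Theft"
) -> Dict[str, float]:
    """Return accuracy, recall, and precision, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # theft correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # healthy flagged as theft
    fn = sum(t == positive and p != positive for t, p in pairs)  # theft missed
    tn = n - tp - fp - fn                                        # healthy correctly passed
    return {
        "accuracy": (tp + tn) / n if n else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Toy labels invented for illustration only; they are NOT data from the paper.
y_true = ["Theft"] * 4 + ["Healthy"] * 6
y_pred = ["Theft", "Theft", "Theft", "Healthy"] + ["Healthy"] * 5 + ["Theft"]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a theft-detection setting, recall measures the share of actual theft cases that are caught, while precision measures the share of flagged consumers who are actually committing theft, so the two are usually reported alongside accuracy.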

Item Type: Article
Identification Number: https://doi.org/10.3390/s21248423
Accepted: 13 December 2021
Published Online: 17 December 2021
Uncontrolled Keywords: theft detection in power consumption data; NGBoost algorithm; majority weighted minority oversampling technique algorithm; whale optimization algorithm; tree SHAP algorithm
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
CAH11 - computing > CAH11-01 - computing > CAH11-01-05 - artificial intelligence
Divisions: Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology
Depositing User: Faisal Saeed
Date Deposited: 05 Jan 2022 14:01
Last Modified: 05 Jan 2022 14:01
URI: https://www.open-access.bcu.ac.uk/id/eprint/12584
