An Ensemble Machine Learning and Data Mining Approach to Enhance Stroke Prediction

Wijaya, Richard and Saeed, Faisal and Samimi, Parnia and Albarrak, Abdullah M. and Qasem, Sultan Noman (2024) An Ensemble Machine Learning and Data Mining Approach to Enhance Stroke Prediction. Bioengineering, 11 (7). p. 672. ISSN 2306-5354

bioengineering-11-00672-v2.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Stroke poses a significant health threat, affecting millions annually. Early and precise prediction is crucial to providing effective preventive healthcare interventions. This study applied an ensemble machine learning and data mining approach to enhance the effectiveness of stroke prediction. By employing the cross-industry standard process for data mining (CRISP-DM) methodology, various techniques, including random forest, ExtraTrees, XGBoost, artificial neural network (ANN), and genetic algorithm with ANN (GANN) were applied on two benchmark datasets to predict stroke based on several parameters, such as gender, age, various diseases, smoking status, BMI, HighCol, physical activity, hypertension, heart disease, lifestyle, and others. Due to dataset imbalance, Synthetic Minority Oversampling Technique (SMOTE) was applied to the datasets. Hyperparameter tuning optimized the models via grid search and randomized search cross-validation. The evaluation metrics included accuracy, precision, recall, F1-score, and area under the curve (AUC). The experimental results show that the ensemble ExtraTrees classifier achieved the highest accuracy (98.24%) and AUC (98.24%). Random forest also performed well, achieving 98.03% in both accuracy and AUC. Comparisons with state-of-the-art stroke prediction methods revealed that the proposed approach demonstrates superior performance, indicating its potential as a promising method for stroke prediction and offering substantial benefits to healthcare.
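The oversampling step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not the authors' implementation. The function name `smote_oversample`, its parameters, and the toy (age, BMI) values are assumptions made purely for illustration and are not taken from the paper or its datasets. It uses only the Python standard library.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration: `minority` is a list of
    equal-length numeric feature vectors from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point,
        # by Euclidean distance, excluding the base point itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority-class rows as (age, BMI); the values are invented for the example.
minority = [[67.0, 28.1], [72.0, 31.4], [80.0, 26.9], [59.0, 33.0]]
new_points = smote_oversample(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, each feature stays within the range already observed in the minority class. In practice, oversampling of this kind is usually applied to the training split only, so that synthetic samples do not leak into the evaluation data.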

Item Type: Article
Identification Number: 10.3390/bioengineering11070672
Dates: Accepted 20 June 2024; Published Online 2 July 2024
Uncontrolled Keywords: stroke, prediction model, machine learning, ensemble learning
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions: Faculty of Computing, Engineering and the Built Environment > College of Computing
Depositing User: Gemma Tonks
Date Deposited: 12 Aug 2024 13:26
Last Modified: 12 Aug 2024 13:26
URI: https://www.open-access.bcu.ac.uk/id/eprint/15706
