Dimension Reduction and Classifier-Based Feature Selection for Oversampled Gene Expression Data and Cancer Classification

Petinrin, Olutomilayo Olayemi and Saeed, Faisal and Salim, Naomie and Toseef, Muhammad and Liu, Zhe and Muyide, Ibukun Omotayo (2023) Dimension Reduction and Classifier-Based Feature Selection for Oversampled Gene Expression Data and Cancer Classification. Processes, 11 (7). p. 1940. ISSN 2227-9717

processes-11-01940.pdf - Published Version
Available under License Creative Commons Attribution.


Gene expression data typically have a large number of features, some of which are irrelevant and redundant. In some cases, however, all features, despite being numerous, are highly important and contribute to the data analysis. Similarly, gene expression data sometimes have few instances and a high degree of imbalance among the classes. This can limit a classification model's exposure to instances of the different categories and thereby affect its performance. In this study, we proposed a cancer detection approach that combines oversampling, feature selection, and classification models. The study used SVMSMOTE to oversample the six examined datasets. We then examined feature selection using dimension reduction methods and classifier-based feature ranking and selection. We trained six machine learning algorithms using repeated 5-fold cross-validation on the different microarray datasets. The performance of the algorithms differed depending on the dataset and the feature reduction technique used.
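
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the synthetic data, the choice of PCA as the dimension reduction method, the random forest used for classifier-based feature ranking, the logistic regression classifier, the stratified folds, and every parameter value are assumptions made for illustration. Oversampling is omitted to keep the example to scikit-learn alone; in practice an oversampler such as SVMSMOTE (available in the imbalanced-learn package) would be fitted on the training folds only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a microarray dataset: few samples, many features,
# imbalanced classes. All sizes here are illustrative assumptions.
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=15,
    weights=[0.8, 0.2], random_state=0,
)

# Repeated 5-fold cross-validation. Stratification and the number of
# repeats are assumptions; the abstract only says "repeated 5-fold".
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

# Two feature reduction routes: a dimension reduction method (PCA) and
# classifier-based ranking (keep the 20 features a random forest ranks highest).
reducers = {
    "pca": PCA(n_components=10, random_state=0),
    "rf_ranking": SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=0),
        max_features=20, threshold=-np.inf,
    ),
}

scores = {}
for name, reducer in reducers.items():
    # The reducer sits inside the pipeline so it is refitted on each
    # training fold and never sees the held-out fold.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("reduce", reducer),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    fold_scores = cross_val_score(
        pipe, X, y, cv=cv, scoring="balanced_accuracy"
    )
    scores[name] = fold_scores
    print(f"{name}: mean balanced accuracy = {fold_scores.mean():.3f}")
```

Keeping the feature reduction step inside the pipeline is a design choice that avoids leaking information from the held-out fold into feature selection. Balanced accuracy is used here because plain accuracy can look high on imbalanced classes even when the minority class is missed.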

Item Type: Article
Identification Number: https://doi.org/10.3390/pr11071940
Accepted: 24 June 2023
Published Online: 27 June 2023
Uncontrolled Keywords: cancer classification, gene expression, machine learning, microarray data, sampling methods
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions: Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology
Depositing User: Gemma Tonks
Date Deposited: 05 Jul 2023 13:04
Last Modified: 05 Jul 2023 13:04
URI: https://www.open-access.bcu.ac.uk/id/eprint/14552
