Adaptive Generation-based Approaches of Oversampling using Different Sets of Base and Nearest Neighbor’s Instances
Nabus, Hatem S. Y. and Ali, Aida and Hassan, Shafaatunnur and Shamsuddin, Siti Mariya and Mustapha, Ismail B. and Saeed, Faisal (2022) Adaptive Generation-based Approaches of Oversampling using Different Sets of Base and Nearest Neighbor’s Instances. International Journal of Advanced Computer Science and Applications, 13 (4). ISSN 2158-107X
Paper_61-Adaptive_Generation_based_Approaches_of_Oversampling.pdf - Published Version. Available under License Creative Commons Attribution.
Abstract
Standard classification algorithms often struggle to learn from imbalanced datasets. While several approaches have been employed to address this problem, methods that oversample the minority class remain more widely used than algorithmic modifications. Most oversampling variants are derived from the Synthetic Minority Oversampling Technique (SMOTE), which generates synthetic minority samples at a point in the feature space between two minority class instances. These variants produce different results mainly because of (1) the samples they use as initial selection (base) samples and as nearest neighbors, and (2) how they handle minority noise. Therefore, this paper presents combinations of base and nearest-neighbor samples that have not been used before, in order to monitor their effect in comparison to the standard oversampling techniques. Six methods are proposed: three combinations of Only Danger Oversampling (ODO) techniques and three combinations of Danger Noise Oversampling (DNO) techniques. The ODO and DNO methods use different groups of samples as base samples and nearest neighbors. While the three ODO methods do not consider minority noise, the three DNO methods include minority noise in both the base and neighbor samples. The performance of the proposed methods is compared to that of several standard oversampling algorithms. We present experimental results demonstrating a significant improvement in the recall metric.
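The SMOTE-style interpolation step described in the abstract can be sketched as follows. This is only an illustrative sketch, not the paper's ODO or DNO procedure: the function name `smote_like_oversample`, its parameters, and the toy data are all hypothetical, and the rules the paper uses to decide which minority samples count as "danger" or "noise" are not reproduced here. The sketch assumes Euclidean distance and takes the set of base samples and the pool of candidate neighbors as separate arguments, so that different combinations of the two can be tried.

```python
import numpy as np


def smote_like_oversample(base, neighbor_pool, n_new, k=5, seed=0):
    """Create n_new synthetic points by SMOTE-style interpolation.

    base          -- array of samples used as starting points (assumed 2-D)
    neighbor_pool -- array of samples from which nearest neighbors are drawn
    k             -- number of nearest neighbors considered per base sample
    """
    rng = np.random.default_rng(seed)
    base = np.asarray(base, dtype=float)
    pool = np.asarray(neighbor_pool, dtype=float)
    synthetic = np.empty((n_new, base.shape[1]))

    for i in range(n_new):
        # Pick a base sample uniformly at random.
        x = base[rng.integers(len(base))]

        # Rank the pool by Euclidean distance to x, skipping exact copies of x.
        dist = np.linalg.norm(pool - x, axis=1)
        order = np.argsort(dist)
        order = order[dist[order] > 0][:k]

        if len(order) == 0:
            # No distinct neighbor available: fall back to copying x.
            synthetic[i] = x
            continue

        # Interpolate between x and one randomly chosen neighbor.
        neighbor = pool[rng.choice(order)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic[i] = x + gap * (neighbor - x)

    return synthetic


# Toy minority class: the four corners of the unit square.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_like_oversample(minority, minority, n_new=10, k=2)
```

Passing the same array for `base` and `neighbor_pool` corresponds to plain SMOTE applied to the whole minority class. Passing different subsets for the two arguments is one way to experiment with the kind of base and neighbor combinations the abstract refers to, under whatever subset definitions the reader chooses.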
Item Type: | Article |
---|---|
Identification Number: | 10.14569/IJACSA.2022.0130461 |
Dates: | Accepted: 1 April 2022; Published Online: 1 April 2022 |
Uncontrolled Keywords: | Class imbalance; nearest neighbors; base samples; initial selection; SMOTE |
Subjects: | CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science |
Divisions: | Faculty of Computing, Engineering and the Built Environment > College of Computing |
Depositing User: | Faisal Saeed |
Date Deposited: | 13 Jun 2022 16:21 |
Last Modified: | 13 Jun 2022 16:21 |
URI: | https://www.open-access.bcu.ac.uk/id/eprint/13291 |