A Hybrid Undersampling-SMOTE Method for Imbalanced Big Data Classification

Shaymaa Ahmed Razoqi; Ghayda Al-Talib

doi:10.33899/edusj.2023.143612.1393

Vol. 32 No. 4 (2023), Articles

https://doi.org/10.33899/edusj.2023.143612.1393

Published 2023-12-01

Keywords

Big data, Classification, Imbalanced Problem, Resampling, Clustering

Abstract

Imbalanced data is one of the important challenges in data classification. It leads to poor performance of binary classifiers, because classification is biased in favour of the majority class and the influence of the minority class is poorly captured, even though the minority class is usually the most important in the classification process. To find a compromise between information loss and balancing the data set before classification, this research proposes a hybrid algorithm. In the first phase, clustering methods divide the majority class into subgroups and an encoding method is applied to the majority class; the algorithm uses the resulting codes to group samples that are similar to each other and so reduce the majority class count. In the next phase, the Synthetic Minority Oversampling Technique (SMOTE) increases the number of minority class samples. The study examined the impact of the proposed algorithm on five classifiers, using AUC and F-score as post-classification performance measures, on benchmark datasets with different sizes and imbalance factors. The results showed that the proposed algorithm significantly improved the performance of the classifiers when applied to the resampled data.
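
The two-phase idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the clustering or encoding method, so the sketch assumes plain k-means to split the majority class and proportional random sampling within each cluster as a stand-in for the code-based grouping. The SMOTE step follows the standard interpolation between a minority sample and one of its nearest minority neighbours. All function names, parameters (`target_per_class`, `k`, `k_neighbors`), the 0/1 label convention, and the toy data are assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns a label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance of every sample to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def undersample_by_cluster(X_maj, n_keep, k=5, seed=0):
    """Keep roughly n_keep majority samples, drawn proportionally per cluster."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X_maj, k, seed=seed)
    kept = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue
        quota = max(1, round(n_keep * len(idx) / len(X_maj)))
        kept.append(rng.choice(idx, size=min(quota, len(idx)), replace=False))
    return X_maj[np.concatenate(kept)]

def smote(X_min, n_new, k_neighbors=5, seed=0):
    """Create n_new synthetic points between minority samples and their neighbours."""
    rng = np.random.default_rng(seed)
    dists = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dists, np.inf)          # a point is not its own neighbour
    k = min(k_neighbors, len(X_min) - 1)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def hybrid_resample(X, y, target_per_class, seed=0):
    """Assumes binary labels: 0 = majority class, 1 = minority class."""
    X_min, X_maj = X[y == 1], X[y == 0]
    X_maj_small = undersample_by_cluster(X_maj, target_per_class, seed=seed)
    X_syn = smote(X_min, max(0, target_per_class - len(X_min)), seed=seed)
    X_out = np.vstack([X_maj_small, X_min, X_syn])
    y_out = np.concatenate([
        np.zeros(len(X_maj_small), dtype=int),
        np.ones(len(X_min) + len(X_syn), dtype=int),
    ])
    return X_out, y_out

# Toy data (hypothetical): 950 majority vs. 50 minority samples in 2-D.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (950, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 950 + [1] * 50)
X_res, y_res = hybrid_resample(X, y, target_per_class=200)
```

In this sketch both classes end up near `target_per_class` samples. The majority count can differ by a few samples because per-cluster quotas are rounded. A classifier would then be trained on `X_res, y_res` and evaluated with AUC and F-score on untouched test data.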

Journal of Education and Science