Searching for just a few words should be enough to get started. If you need to make more complex queries, use the tips below to guide you.
Article type: Research Article
Authors: Chong, Weitua | Chen, Ningjianga; b; * | Fang, Chengyuna
Affiliations: [a] School of Computer and Electronic Information, Guangxi University, Nanning, Guangxi, China | [b] Key Laboratory of Parallel and Distributed Computing, Guangxi Colleges and Universities, Nanning, Guangxi, China
Correspondence: [*] Corresponding author: Ningjiang Chen, School of Computer and Electronic Information, Guangxi University, Nanning, Guangxi 530004, China. Tel.: +86 7713236133; E-mail: chnj@gxu.edu.cn.
Abstract: Addressing the problem of imbalanced data category distribution in real applications and the problem of traditional classifiers tending to ensure the accuracy of the majority class while ignoring the accuracy of the minority class when processing imbalanced data, this paper proposes a method called RBSP-Boosting for imbalanced data classification. First, RBSP-Boosting introduces the Shapley value and calculates the Shapley value for each sample of the dataset through the truncated Monte Carlo method. Moreover, the proposed method removes the noise data according to the Shapley value and undersamples the samples with Shapley values less than zero in the majority class. Then, it takes the Shapley value as the weight of the sample and oversamples the minority class according to the weight. Finally, the new dataset is trained on the classifier through the AdaBoost classifier. Experiments are conducted on nine groups of UCI and KEEL datasets, and RBSP-Boosting is compared with four sampling algorithms: Random-OverSampler, SMOTE, Borderline-SMOTE and SVM-SMOTE. Experimental results show that the RBSP-Boosting method in the three evaluation metrics of AUC, F-score and G-mean, compared with the best performance of the four comparison algorithms, increases by 4.69%, 10.3% and 7.86%, respectively. The proposed method can significantly improve the effect of imbalanced data classification.
Keywords: Shapley value, resampling, imbalanced data, monte carlo
DOI: 10.3233/IDA-216092
Journal: Intelligent Data Analysis, vol. 26, no. 6, pp. 1579-1595, 2022
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
sales@iospress.com
For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
info@iospress.nl
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office info@iospress.nl
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
china@iospress.cn
For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl
如果您在出版方面需要帮助或有任何建, 件至: editorial@iospress.nl