Article type: Research Article
Authors: Khoshgoftaar, Taghi M. [a],[*] | Seliya, Naeem [b] | Drown, Dennis J. [a]
Affiliations: [a] Computer and Electrical Engineering and Computer Science Department, Florida Atlantic University, FL, USA | [b] Computer and Information Science Department, University of Michigan – Dearborn, MI, USA
Correspondence: [*] Corresponding author: Taghi M. Khoshgoftaar, Computer and Electrical Engineering and Computer Science Department, Florida Atlantic University, 777 West Glades Road, Boca Raton, FL 33431, USA. E-mail: taghi@cse.fau.edu.
Abstract: Class imbalance, where the classes in a dataset are not equally represented, is common in machine learning. Classification models built from such datasets are often impractical, since most machine learning algorithms tend to perform poorly on minority class instances. We present a unique evolutionary computing-based data sampling approach as an effective solution to the class imbalance problem. The genetic algorithm-based approach, Evolutionary Sampling, works as a majority undersampling technique in which instances from the majority class are selectively removed. This preserves the relative integrity of the majority class while keeping the original minority class intact. Our research prototype, eVann, also implements genetic algorithm-based optimization of the modeling parameters of the machine learning algorithms considered in our study. An extensive empirical investigation on four real-world datasets compares the proposed approach with existing data sampling techniques that target the class imbalance problem. Our results show that Evolutionary Sampling, both with and without learner optimization, performs better than the other data sampling techniques. The detailed coverage of our case studies in this paper facilitates empirical replication.
Keywords: Class imbalance, data sampling, genetic algorithms, machine learning, classification
DOI: 10.3233/IDA-2010-0409
Journal: Intelligent Data Analysis, vol. 14, no. 1, pp. 69-88, 2010
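The abstract describes Evolutionary Sampling only at a high level: a genetic algorithm selectively removes majority class instances and leaves the minority class untouched. The sketch below illustrates that general idea under stated assumptions and is not the eVann implementation. The bit-mask encoding, the nearest-centroid learner, the geometric-mean fitness, the one-point crossover, and all function names and parameter values are hypothetical choices made for illustration; the paper's actual operators and fitness function may differ.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fitness(mask, majority, minority):
    """Assumed fitness: geometric mean of per-class accuracy of a
    nearest-centroid learner trained on the kept majority instances
    plus all minority instances, scored on the full data."""
    kept = [p for p, keep in zip(majority, mask) if keep]
    if not kept:
        return 0.0  # a mask that removes every majority instance is useless
    c_maj, c_min = centroid(kept), centroid(minority)
    tnr = sum(dist2(p, c_maj) <= dist2(p, c_min) for p in majority) / len(majority)
    tpr = sum(dist2(p, c_min) < dist2(p, c_maj) for p in minority) / len(minority)
    return (tpr * tnr) ** 0.5


def evolutionary_undersample(majority, minority, pop_size=20, generations=30,
                             mutation_rate=0.05, seed=0):
    """Evolve a keep/remove mask over the majority class (hypothetical GA).
    Requires len(majority) >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    n = len(majority)
    # Each chromosome is a list of booleans: True means "keep this instance".
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda m: fitness(m, majority, minority),
                        reverse=True)
        parents = scored[: pop_size // 2]   # truncation selection
        children = [scored[0][:]]           # elitism: carry over the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    best = max(population, key=lambda m: fitness(m, majority, minority))
    # The minority class is never touched; only majority instances are removed.
    return [p for p, keep in zip(majority, best) if keep]
```

A caller would pass two lists of equal-length numeric feature vectors and train the final model on the returned majority subset together with the full minority list. In practice the fitness would more plausibly come from cross-validated performance of the learner under study, which is considerably more expensive than the toy learner assumed here.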