Quick online spam classification method based on active and incremental learning

Feng, Lizhou; Wang, Youwei; Zuo, Wanli

doi:10.3233/IFS-151707

Article type: Research Article

Authors: Feng, Lizhou^{a; b} | Wang, Youwei^{a; b} | Zuo, Wanli^{a; b; *}

Affiliations: [a] College of Computer Science and Technology, Jilin University, Changchun, Jilin, China | [b] Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, Jilin, China

Correspondence: [*] Corresponding author. Wanli Zuo, College of Computer Science and Technology, Jilin University, Changchun, China. Tel.: +86 13604307340; Fax: +86 0431 85166492; E-mail: 1036569449@qq.com.

Abstract: To improve classification speed without seriously sacrificing email classification accuracy, a novel online spam classification method is proposed. First, the concept of term-frequency-based interest sets is introduced, and emails are classified by combining these interest sets with a Naïve Bayes classifier. Second, drawing on active learning theory, a novel boundary-density-based method for evaluating classification certainty is proposed; combined with the user's interests, it selects and recommends emails to users for labeling. Finally, following incremental learning theory, the user-labeled emails and the emails classified with the highest probability are used for retraining. In the experiments, Support Vector Machine (SVM), Naïve Bayesian (NB), and K-Nearest Neighbors (KNN) classifiers are evaluated on two corpora: Trec2007 and Enron-spam. Compared with six typical incremental learning methods based on active learning, the proposed method greatly reduces email classification time while preserving accuracy. Moreover, it imposes only a very small sample-labeling burden on users, which makes it well suited to online applications.
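The loop described in the abstract (classify, ask the user about uncertain emails, retrain incrementally on labeled and confidently classified emails) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the interest sets or the boundary-density certainty measure, so the sketch substitutes a plain multinomial Naïve Bayes model and a simple distance-from-0.5 uncertainty test. The names `IncrementalNB`, `process_stream`, `ask_user`, `query_margin`, and `confident_margin`, as well as the threshold values, are assumptions made for illustration.

```python
import math
from collections import Counter


class IncrementalNB:
    """Multinomial Naive Bayes whose counts are updated one email at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.doc_counts = Counter()  # number of emails seen per label
        self.term_counts = {"spam": Counter(), "ham": Counter()}
        self.vocab = set()

    def update(self, tokens, label):
        """Incremental step: fold one labeled email into the counts."""
        self.doc_counts[label] += 1
        self.term_counts[label].update(tokens)
        self.vocab.update(tokens)

    def spam_probability(self, tokens):
        """Posterior P(spam | tokens) under the current counts."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            prior = (self.doc_counts[label] + self.alpha) / (total_docs + 2 * self.alpha)
            # The extra +1 reserves smoothing mass for unseen terms and keeps
            # the denominator positive even before any training.
            denom = sum(self.term_counts[label].values()) + self.alpha * (len(self.vocab) + 1)
            score = math.log(prior)
            for term in tokens:
                score += math.log((self.term_counts[label][term] + self.alpha) / denom)
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long emails.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


def process_stream(model, emails, ask_user, query_margin=0.2, confident_margin=0.45):
    """Classify tokenized emails in order, updating the model as it goes.

    Uncertain emails (posterior close to 0.5) are sent to ask_user for a label,
    which is the active learning step. Very confident predictions are fed back
    as training data, which is the incremental step. Everything in between is
    classified but not used for retraining.
    """
    predictions, queries = [], 0
    for tokens in emails:
        p = model.spam_probability(tokens)
        margin = abs(p - 0.5)  # stand-in for the paper's certainty measure
        label = "spam" if p >= 0.5 else "ham"
        if margin < query_margin:
            label = ask_user(tokens)  # user supplies the true label
            queries += 1
            model.update(tokens, label)
        elif margin >= confident_margin:
            model.update(tokens, label)  # self-training on a confident prediction
        predictions.append(label)
    return predictions, queries
```

In use, the model would be seeded with a few labeled emails through `update`, after which `process_stream` returns the predicted labels together with the number of labeling requests made to the user. That count is a rough proxy for the labeling burden the abstract refers to.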

Keywords: Email classification, term frequency based interest sets, Support Vector Machine, Naïve Bayesian, K-Nearest Neighbors, active learning, incremental learning

Journal: Journal of Intelligent & Fuzzy Systems, vol. 30, no. 1, pp. 17-27, 2016

Published: 2016
