Article type: Research Article
Authors: Borchani, Hanen* | Larrañaga, Pedro | Bielza, Concha
Affiliations: Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain
Correspondence: [*] Corresponding author: Hanen Borchani, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, 28660, Madrid, Spain. Tel.: +34 913363675; Fax: +34 913524819; E-mail: hanen.borchani@upm.es.
Abstract: Recently, several approaches have been proposed to deal with the increasingly challenging task of mining concept-drifting data streams. However, most are based on supervised classification algorithms that assume true labels are immediately and entirely available in the data streams. Unfortunately, this assumption is often violated in real-world applications, because obtaining all true labels is expensive or time-consuming. To deal with this problem, we propose in this paper a new semi-supervised approach for handling concept-drifting data streams containing both labeled and unlabeled instances. First, contrary to existing approaches, we monitor three possible kinds of drift: feature, conditional or dual drift. Drift detection is based on a hypothesis test comparing the Kullback-Leibler divergence between old and recent data, whose distribution under the null hypothesis that both samples come from the same distribution is approximated via a bootstrap method. Then, if any drift occurs, a new classifier is learned from the recent data using the EM algorithm; otherwise, the current classifier is left unchanged. Our approach is general enough to be applied to different classification models. Experimental studies, using the naive Bayes classifier and logistic regression, on both synthetic and real-world data sets demonstrate that our approach performs well.
Keywords: Data streams, concept drift, change detection, semi-supervised learning
DOI: 10.3233/IDA-2011-0488
Journal: Intelligent Data Analysis, vol. 15, no. 5, pp. 655-670, 2011
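The drift test outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single integer-coded discrete feature, add-one smoothing of the histograms, the divergence direction KL(old || recent), and a null distribution obtained by resampling with replacement from the pooled data. The names `kl_divergence`, `smoothed_histogram`, and `detect_drift`, and the parameters `n_boot` and `alpha`, are hypothetical and chosen only for this example.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return float(np.sum(p * np.log(p / q)))


def smoothed_histogram(values, n_bins):
    """Relative frequencies of integer-coded values, with add-one smoothing
    (an assumption of this sketch) so that no bin has zero probability."""
    counts = np.bincount(values, minlength=n_bins).astype(float) + 1.0
    return counts / counts.sum()


def detect_drift(old, recent, n_bins, n_boot=500, alpha=0.05, seed=0):
    """Bootstrap test of H0: `old` and `recent` come from the same distribution.

    Returns (observed KL, bootstrap p-value, drift flag).
    """
    rng = np.random.default_rng(seed)
    observed = kl_divergence(
        smoothed_histogram(old, n_bins), smoothed_histogram(recent, n_bins)
    )

    # Under H0 the two windows are exchangeable, so pool them and resample
    # with replacement to approximate the null distribution of the statistic.
    pooled = np.concatenate([old, recent])
    null = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(pooled, size=pooled.size, replace=True)
        fake_old, fake_recent = sample[: old.size], sample[old.size:]
        null[b] = kl_divergence(
            smoothed_histogram(fake_old, n_bins),
            smoothed_histogram(fake_recent, n_bins),
        )

    # Fraction of bootstrap statistics at least as large as the observed one.
    p_value = (1 + np.sum(null >= observed)) / (n_boot + 1)
    return observed, float(p_value), bool(p_value < alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    old_window = rng.integers(0, 4, size=500)  # roughly uniform over 4 values
    new_window = rng.choice(4, size=500, p=[0.85, 0.05, 0.05, 0.05])  # shifted
    kl, p, drift = detect_drift(old_window, new_window, n_bins=4)
    print(f"KL={kl:.3f}, p-value={p:.4f}, drift={drift}")
```

The sketch covers only a feature-drift check on unlabeled values. Monitoring conditional drift would require applying a comparable test to the labeled instances, and the retraining step with EM is not shown.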