Weighted Instance Typicality Search (WITS): A nearest neighbor data reduction algorithm

Morring, Brent D.; Martinez, Tony R.

doi:10.3233/IDA-2004-8104

Article type: Research Article

Affiliations: Computer Science Department, Brigham Young University, Provo, UT 84602, USA. E-mail: morringb@axon.cs.byu.edu, martinez@cs.byu.edu

Abstract: The standard nearest neighbor algorithm has two disadvantages: 1) it must store every instance of the training set, which creates a large memory footprint, and 2) it must search every instance of the training set to classify a new query point, which makes it slow at run time. Much work has been done to remedy these shortcomings. Data reduction algorithms address both issues by storing and using only a portion of the available instances. This paper presents a new data reduction algorithm, WITS (Weighted-Instance Typicality Search), and a modified version, Clustered-WITS (C-WITS), designed to address these issues. WITS is an incremental data reduction algorithm with O(n²) complexity, where n is the training set size. WITS uses the concept of typicality in conjunction with instance weighting to produce minimal nearest neighbor solutions. WITS and C-WITS are compared to three other state-of-the-art data reduction algorithms on ten real-world datasets. WITS achieved the highest average accuracy, showed fewer catastrophic failures, and stored an average of 71% fewer instances than DROP-5, the next most competitive algorithm in terms of accuracy and catastrophic failures. C-WITS provides a user-defined parameter that lets the user control the balance between training time and accuracy. This modification makes C-WITS more suitable for large problems, the very problems data reduction algorithms are designed for. On two large problems (10,992 and 20,000 instances), C-WITS stores only a small fraction of the instances (0.88% and 1.95% of the training data) while maintaining generalization accuracies comparable to the best reported for these problems.
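The abstract names typicality and incremental instance selection without defining them. As a rough illustration only, the sketch below assumes one common formulation of typicality (mean intra-class similarity divided by mean extra-class similarity, with similarity taken as 1 - d/d_max over Euclidean distance) and pairs it with a hypothetical greedy loop: seed the stored subset with the most typical instance of each class, then keep adding the most typical training instance that 1-NN over the subset still misclassifies. It omits the instance weighting WITS uses and is not the authors' procedure; the function names `typicality`, `predict_1nn`, and `reduce_by_typicality` are hypothetical. Note that the typicality pass alone already costs O(n²) distance evaluations.

```python
import math


def distance(a, b):
    # Euclidean distance between two numeric feature vectors (assumed metric).
    return math.dist(a, b)


def typicality(X, y):
    """Assumed typicality: mean intra-class similarity / mean extra-class
    similarity, where similarity = 1 - d / d_max. O(n^2) distance calls."""
    n = len(X)
    d_max = max((distance(X[i], X[j]) for i in range(n) for j in range(i + 1, n)),
                default=0.0) or 1.0
    scores = []
    for i in range(n):
        intra, extra = [], []
        for j in range(n):
            if i == j:
                continue
            sim = 1.0 - distance(X[i], X[j]) / d_max
            (intra if y[i] == y[j] else extra).append(sim)
        mean_intra = sum(intra) / len(intra) if intra else 0.0
        mean_extra = sum(extra) / len(extra) if extra else 0.0
        # Instances with no extra-class similarity are treated as maximally typical.
        scores.append(mean_intra / mean_extra if mean_extra > 0 else float("inf"))
    return scores


def predict_1nn(S, X, y, q):
    # Classify query q by the label of its nearest stored instance (indices in S).
    best = min(S, key=lambda i: distance(X[i], q))
    return y[best]


def reduce_by_typicality(X, y):
    """Hypothetical incremental reduction (not WITS itself): seed with the most
    typical instance per class, then add the most typical misclassified one."""
    t = typicality(X, y)
    S = []
    for c in sorted(set(y)):
        members = [i for i in range(len(X)) if y[i] == c]
        S.append(max(members, key=lambda i: t[i]))
    while True:
        wrong = [i for i in range(len(X))
                 if i not in S and predict_1nn(S, X, y, X[i]) != y[i]]
        if not wrong:
            return S  # every training instance is now classified correctly
        S.append(max(wrong, key=lambda i: t[i]))


X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
y = ["a", "a", "a", "b", "b", "b"]
S = reduce_by_typicality(X, y)
```

On two well-separated clusters like these, the seed step alone (one instance per class) already classifies the whole training set correctly, so the loop stops with two stored instances. The loop always terminates, because each pass adds an instance not yet in the subset.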

Keywords: instance-based learning, nearest-neighbor, instance reduction, pruning, classification

Journal: Intelligent Data Analysis, vol. 8, no. 1, pp. 61-78, 2004

Received 26 November 2002

Accepted 13 February 2003

Published: 15 March 2004
