A comparative study of saliency analysis and genetic algorithm for feature selection in support vector machines

Tay, Francis Eng Hock; Cao, Li Juan

doi:10.3233/IDA-2001-5302

A comparative study of saliency analysis and genetic algorithm for feature selection in support vector machines

Article type: Research Article

Authors: Tay, Francis Eng Hock | Cao, Li Juan

Affiliations: Department of Mechanical Engineering, National University of Singapore, 10 Kent Ridge Crescent, 119260, Singapore. E-mail: mpetayeh@nus.edu.sg

Abstract: Recently, support vector machine (SVM) has been receiving increasing attention in the field of regression estimation due to its remarkable characteristics such as good generalization performance, the absence of local minima and sparse representation of the solution. However, within the SVMs framework, there are very few established approaches for identifying important features. Selecting significant features from all candidate features is the first step in regression estimation, and this procedure can improve the network performance, reduce the network complexity, and speed up the training of the network. This paper investigates the use of saliency analysis (SA) and genetic algorithm (GA) in SVMs for selecting important features in the context of regression estimation. The SA measures the importance of features by evaluating the sensitivity of the network output with respect to the feature input. The derivation of the sensitivity of the network output to the feature input in terms of the partial derivative in SVMs is presented, and a systematic approach to remove irrelevant features based on the sensitivity is developed. GA is an efficient search method based on the mechanics of natural selection and population genetics. A simple GA is used where all features are mapped into binary chromosomes with a bit “1” representing the inclusion of the feature and a bit of “0” representing the absence of the feature. The performances of SA and GA are tested using two simulated non-linear time series and five real financial time series. The experiments show that with the simulated data, GA and SA detect the same true feature set from the redundant feature set, and the method of SA is also insensitive to the kernel function selection. With the real financial data, GA and SA select different subsets of features. Both selected feature sets achieve higher generation performance in SVMs than that of the full feature set. In addition, the generation performance between the selected feature sets of GA and SA is similar. All the results demonstrate that that both SA and GA are effective in SVMs for identifying important features.

Keywords: feature selection, support vector machines, structural risk minimization principle, saliency analysis, genetic algorithm

DOI: 10.3233/IDA-2001-5302

Journal: Intelligent Data Analysis, vol. 5, no. 3, pp. 191-209, 2001

Received 28 August 2000

Accepted 30 October 2000

Published: 1 May 2001

Price: EUR 27.50

North America

IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA

Tel: +1 703 830 6300
Fax: +1 703 830 2300
sales@iospress.com

For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl

Europe

IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands

Tel: +31 20 688 3355
Fax: +31 20 687 0091
info@iospress.nl

For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office info@iospress.nl

Asia

Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China

Free service line: 400 661 8717
Fax: +86 10 8446 7947
china@iospress.cn

For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl

如果您在出版方面需要帮助或有任何建, 件至: editorial@iospress.nl

Share this:

North America

Europe

Asia