Article type: Research Article
Authors: Mohamed Quafafou, Moussa Boussouf
Affiliations: IRIN, University of Nantes, 2 rue de la Houssiniere, BP 92208 - 44322, Nantes Cedex 03, France. E-mail: quafafou@irin.univ-nantes.fr, boussouf@irin.univ-nantes.fr
Abstract: The problem of feature subset selection can be defined as the selection of a relevant subset of features that allows a learning algorithm to induce small, high-accuracy models. This problem is of primary importance because irrelevant and redundant features may slow down the learner, especially in high-dimensional settings, and reduce both the accuracy and the comprehensibility of the induced model. Two main approaches have been developed. The first is algorithm-independent and considers only the data (the filter approach), while the second is algorithm-dependent and takes into account both the data and a given learning algorithm (the wrapper approach). Recent work has studied the usefulness of rough set theory, and more particularly its notions of reduct and core, for the problem of feature subset selection. Different methods have been proposed to select features using both the core and the reduct concepts, whereas other research shows that useful feature subsets do not necessarily contain all the features in the core. In this paper, we underline the fact that rough set theory is concerned with the deterministic analysis of attribute dependencies, which underlie the two notions of reduct and core. We extend the notion of dependency so that both deterministic and non-deterministic dependencies can be found. A new notion of strong reducts is then introduced, which leads to the definition of strong feature subsets (SFS). The usefulness of SFS is illustrated by the improvement in the accuracy of C4.5 on real-world datasets. Our study shows that, in general, the highest-accuracy subset is not the best one with regard to the filter criteria. The new approach finds the highest-accuracy subset at minimum cost.
The contribution of this work is fourfold: (1) an analysis of feature subset selection in the rough set context, (2) the introduction of new definitions based on a generalized rough set theory, i.e., α-RST, (3) a reformulation of the selection problem, and (4) the description of a hybrid method combining both the filter and the wrapper approaches.
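The abstract contrasts deterministic rough-set dependencies, which define reducts and the core, with a relaxed notion that also captures non-deterministic dependencies. The Python sketch below computes the classical dependency degree γ(B, D) = |POS_B(D)| / |U| and enumerates reducts and the core by brute force on a toy decision table. The `alpha` parameter is a hypothetical precision threshold, similar in spirit to variable-precision rough sets; it is an illustrative assumption and is not the α-RST definition from the paper, which the abstract does not give. The table, function names, and threshold values are likewise assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def partition(rows, attrs):
    """Group row indices by their values on `attrs` (indiscernibility classes)."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs, alpha=1.0):
    """Fraction of objects lying in blocks whose majority-label share is >= alpha.

    alpha=1.0 gives the classical degree gamma(attrs, D) = |POS| / |U|, i.e. only
    blocks with a single decision value count. alpha < 1.0 is a HYPOTHETICAL
    relaxation that also counts mostly consistent blocks; it is not alpha-RST.
    """
    covered = 0
    for block in partition(rows, attrs):
        top = Counter(labels[i] for i in block).most_common(1)[0][1]
        if top / len(block) >= alpha:
            covered += len(block)
    return covered / len(rows)


def reducts(rows, labels, attrs, alpha=1.0):
    """Minimal attribute subsets preserving the dependency of the full set.

    Exhaustive search in order of increasing size, so only tiny tables are feasible.
    """
    full = dependency(rows, labels, attrs, alpha)
    found = []
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            if any(set(r) <= set(subset) for r in found):
                continue  # superset of a known reduct, hence not minimal
            if dependency(rows, labels, subset, alpha) == full:
                found.append(subset)
    return found


def core(reds):
    """The core is the intersection of all reducts."""
    return set.intersection(*(set(r) for r in reds)) if reds else set()


# Hypothetical toy table: the decision mostly follows `a`, except one noisy object.
ROWS = [dict(a=a, b=b, c=c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
LABELS = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]

if __name__ == "__main__":
    print(dependency(ROWS, LABELS, ["a"]))             # 0.5
    print(dependency(ROWS, LABELS, ["a"], alpha=0.7))  # 1.0
    print(reducts(ROWS, LABELS, ["a", "b", "c"]))             # [('a', 'b', 'c')]
    print(reducts(ROWS, LABELS, ["a", "b", "c"], alpha=0.7))  # [('a',)]
```

On this toy table, the single noisy object forces the only deterministic reduct, and therefore the core, to contain all three attributes, whereas relaxing the threshold to 0.7 leaves `{a}` alone. This mirrors the abstract's observation that a useful feature subset need not contain every attribute of the core. Brute-force enumeration is exponential in the number of attributes, so a practical filter would need a heuristic search instead.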
DOI: 10.3233/IDA-2000-4102
Journal: Intelligent Data Analysis, vol. 4, no. 1, pp. 3-17, 2000