Cancer class prediction: Two stage clustering approach to identify informative genes

Alshalalfah, Mohammed; Alhajj, Reda

doi:10.3233/IDA-2009-0386

Article type: Research Article

Authors: Alshalalfah, Mohammed^a | Alhajj, Reda^{a; b}

Affiliations: [a] Department of Computer Science, University of Calgary, Calgary, Alberta, Canada. E-mail: msalshal@ucalgary.ca, alhajj@ucalgary.ca | [b] Department of Computer Science, Global University, Beirut, Lebanon

Abstract: Cancer classification is an important research area that has attracted the attention of several research groups over the last decades. However, there is no generally agreed-upon approach for assigning tumors to known classes (a.k.a. class prediction). One challenge in microarray analysis, especially for cancer gene expression profiles, is to identify genes or groups of genes that are highly expressed in tumor cells but not in normal cells, and vice versa. All of the methods described in the literature deal with features obtained directly from the data. Further, several clustering techniques have been proposed for the analysis of genome expression data, such as k-means and self-organizing maps. However, these methods do not provide information about the influence of a given gene on the overall shape of the clusters. In this paper, we try to generate informative data that can be more powerful in the classification of genes. We identify a reduced set of features capable of distinguishing between two classes by two-stage clustering of genes using fuzzy c-means. In the first stage, the proposed clustering method clusters the original data. In the second stage, it clusters the genes in each of the clusters produced by the first stage. We chose fuzzy c-means because a fuzzy model is a better fit for gene expression data analysis: a gene can belong to different classes, with a degree of membership per class. However, choosing the fuzziness parameter m is a major problem in applying fuzzy c-means for clustering. In this approach, we try to better identify the value of the fuzziness parameter when applying fuzzy c-means to microarray data. Support vector machines combined with different kernel functions are used for classification. The results of experiments conducted on three benchmark data sets (including one multi-class data set) demonstrate the applicability and effectiveness of the proposed approach compared to the other approaches described in the literature.
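
The two-stage idea in the abstract can be illustrated with a minimal, dependency-free sketch. It uses the textbook fuzzy c-means updates with fuzziness parameter m. The abstract does not say how the reduced features are derived from the second-stage clusters, so using the sub-cluster centres as features is an assumption made here for illustration. The function names (`fuzzy_c_means`, `hard_groups`, `two_stage_representatives`), the default `m=2.0`, and the toy data are likewise hypothetical and are not taken from the paper.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means. Returns (centers, memberships).

    points: list of equal-length numeric rows (here: one row per gene).
    m:      fuzziness parameter, must be > 1 (2.0 is only a common default).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )
        # Membership update from relative distances to the centres.
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                          for k in range(c))
                for j in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


def hard_groups(u, c):
    """Assign each row to the cluster in which its membership is highest."""
    groups = [[] for _ in range(c)]
    for i, row in enumerate(u):
        groups[row.index(max(row))].append(i)
    return groups


def two_stage_representatives(genes, c1, c2, m=2.0):
    """Stage 1 clusters all genes; stage 2 re-clusters each stage-1 cluster.

    Assumption: the stage-2 centres serve as the reduced feature set.
    """
    _, u1 = fuzzy_c_means(genes, c1, m)
    reps = []
    for idx in hard_groups(u1, c1):
        if len(idx) < c2:
            continue  # too few genes in this cluster to split further
        sub = [genes[i] for i in idx]
        sub_centers, _ = fuzzy_c_means(sub, c2, m)
        reps.extend(sub_centers)
    return reps


# Toy data: 6 genes (rows) measured on 4 samples (columns).
genes = [
    [0.0, 0.1, 0.0, 0.1], [0.1, 0.0, 0.1, 0.0], [0.05, 0.05, 0.0, 0.1],
    [5.0, 5.1, 5.0, 4.9], [5.1, 5.0, 4.9, 5.0], [4.9, 5.0, 5.1, 5.0],
]
reps = two_stage_representatives(genes, c1=2, c2=2)
print(len(reps), "representative profiles, each over", len(reps[0]), "samples")
```

Under this assumption, each representative is a profile over the samples, so transposing the list of representatives would give one reduced feature vector per sample that a classifier could consume. Values of m very close to 1 make the exponent in the membership update large and can overflow in this plain-float sketch.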

Keywords: Clustering, classification, microarray, validity analysis, support vector machines, fuzziness parameter, Fuzzy C-means

Journal: Intelligent Data Analysis, vol. 13, no. 4, pp. 671-686, 2009

Published: 14 August 2009
