Hubness weighted SVM ensemble for prediction of breast cancer subtypes

Raja Sree, S.; Kunthavai, A.

doi:10.3233/THC-212825

Article type: Research Article

Authors: Raja Sree, S.^{a; *} | Kunthavai, A.^b

Affiliations: [a] Department of Information Technology, Coimbatore Institute of Technology, Coimbatore, India | [b] Department of Computer Science and Engineering, Coimbatore Institute of Technology, Coimbatore, India

Correspondence: [*] Corresponding author: S. Raja Sree, Department of Information Technology, Coimbatore Institute of Technology, Coimbatore, Tamilnadu 641004, India. E-mail: rajasree.s@cit.edu.in.

Abstract:

BACKGROUND: Breast cancer is a major cause of concern among women worldwide. Because gene mutations are a root cause of cancer development, analyzing gene expression can give deeper insight into the various cancer phenotypes relevant to treatment. Predicting breast cancer subtypes from gene expression data can therefore inform cancer treatment decisions.

OBJECTIVE: Gene expression data are difficult to analyze because of their high dimensionality. Machine learning algorithms such as k-Nearest Neighbors, Support Vector Machine (SVM) and Random Forest, combined with feature selection, have been used to predict breast cancer subtypes, but the prediction accuracy of these existing methods suffers from the high-dimensional nature of gene expression data. The objective of this work is to propose an efficient algorithm for predicting breast cancer subtypes from gene expression.

METHODS: For subtype prediction, a novel Hubness Weighted Support Vector Machine algorithm (HWSVM) is proposed, which uses the bad hubness score as a weight measure to handle outliers in the data. Based on the various subtypes, the features are projected into seven different feature sets, and an Ensemble based Hubness Aware Weighted Support Vector Machine (HWSVMEns) is implemented for breast cancer subtype prediction.

RESULTS: The proposed algorithms have been compared with the classical SVM, with other traditional algorithms such as Random Forest and k-Nearest Neighbor, and with various gene selection methods.

CONCLUSIONS: Experimental results show that the proposed HWSVM outperforms the other algorithms in terms of accuracy, precision, recall and F1 score, owing to the hubness weighting scheme and the ensemble approach. The experiments showed an average accuracy of 92% across various gene expression datasets.
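The abstract states that HWSVM weights training points by their bad hubness score but does not give the exact formula. The sketch below illustrates the idea under stated assumptions: bad hubness is taken as the usual bad k-occurrence (how often a point appears among the k nearest neighbours of a point with a different label), the weighting `w = exp(-z)` over the standardized score is a hypothetical choice, and combining the seven feature-set models by majority vote is likewise an assumption. The helper names `knn_lists`, `bad_hubness`, `hubness_weights` and `majority_vote` are illustrative only.

```python
import math
from collections import Counter


def knn_lists(X, k):
    """Indices of the k nearest neighbours of every point (Euclidean, self excluded).

    Brute force O(n^2); ties are broken by the lower index.
    """
    n = len(X)
    neighbours = []
    for i in range(n):
        ranked = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    return neighbours


def bad_hubness(X, y, k):
    """Bad k-occurrence of each point: how many times it appears in the
    k-NN list of a point whose label differs from its own."""
    counts = [0] * len(X)
    for i, nbrs in enumerate(knn_lists(X, k)):
        for j in nbrs:
            if y[j] != y[i]:
                counts[j] += 1
    return counts


def hubness_weights(bad_counts):
    """Hypothetical weighting: w = exp(-z), with z the standardized bad hubness,
    so points that often appear as neighbours of other-class points get small weights."""
    n = len(bad_counts)
    mean = sum(bad_counts) / n
    std = math.sqrt(sum((b - mean) ** 2 for b in bad_counts) / n)
    if std == 0:
        return [1.0] * n  # no variation: treat all points equally
    return [math.exp(-(b - mean) / std) for b in bad_counts]


def majority_vote(predictions):
    """Combine per-feature-set predictions (one list per base model) by majority
    vote; an assumed combination rule, not stated in the abstract."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


if __name__ == "__main__":
    # Toy 1-D data: the point at 1.5 is labelled 1 but sits among class 0.
    X = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (1.5,)]
    y = [0, 0, 0, 1, 1, 1]
    counts = bad_hubness(X, y, k=2)
    print(counts)  # [0, 1, 3, 0, 0, 3]
    print(hubness_weights(counts))
```

With scikit-learn, such weights could be passed to `SVC.fit(X, y, sample_weight=weights)`, which rescales the misclassification penalty C per sample; training one weighted model per feature set and combining their predictions with `majority_vote` gives an ensemble in the spirit of HWSVMEns. Note that in the toy data the legitimate class-0 point at 2.0 receives the same bad hubness as the mislabelled outlier, because it is the class-0 point closest to class 1, so the weighting down-weights borderline points as well as outliers.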

Keywords: Breast cancer subtypes, high-dimensional data, hubness, gene selection, support vector machine

Journal: Technology and Health Care, vol. 30, no. 3, pp. 565-578, 2022

Received 11 January 2021

Accepted 15 June 2021

Published: 12 May 2022
