Searching for just a few words should be enough to get started. If you need to make more complex queries, use the tips below to guide you.
Article type: Research Article
Authors: González-Pérez, Pedro Pabloa | Sánchez-Gutiérrez, Máximo Eduardob; *
Affiliations: [a] Departamento de Matemáticas Aplicadas y Sistemas, Universidad Autónoma Metropolitana-Cuajimalpa, Ciudad de México, México | [b] Colegio de Ciencia y Tecnología, Universidad Autónoma de la Ciudad de México, Ciudad de México, México
Correspondence: [*] Corresponding author: Máximo Eduardo Sánchez-Gutiérrez, Colegio de Ciencia y Tecnología, Universidad Autónoma de la Ciudad de México, México. E-mail: maximo.sanchez@uacm.edu.mx.
Abstract: It is important to make sense of the data within its context to propose a useful model to solve a problem. This domain knowledge includes information not contained in the data, but that will help us understand the data to be fed into a machine-learning algorithm and guide us on what features might help our model. Nevertheless, domain knowledge may become insufficient as the input variables increase, forcing the need to try automated feature selection techniques. In this study, we investigate whether the joint use of 1) feature selection techniques, such as Chi-square, Tree-based Feature Selection, Pearson’s Correlation, LASSO, Low Variance, and Recursive Feature Elimination, 2) outlier detection methods such as Isolation-Forest, and 3) Cross-Validation techniques lead to improving the accuracy in multiclass classification in machine learning. Specifically, we address the classification of patterns representing the activation state of cell signaling components into classes that symbolize the different cellular processes triggered in cancer cells. The results presented in this work have shown an accuracy increase with up to 80% fewer input features by only using 3 out of the 16 original descriptors.
Keywords: Multiclass classification, machine learning, exploratory data analysis, dimensionality reduction, cellular signaling data
DOI: 10.3233/IDA-215826
Journal: Intelligent Data Analysis, vol. 26, no. 2, pp. 481-500, 2022
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
sales@iospress.com
For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
info@iospress.nl
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office info@iospress.nl
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
china@iospress.cn
For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl
如果您在出版方面需要帮助或有任何建, 件至: editorial@iospress.nl