Issue title: Special Section: Intelligent and Fuzzy Systems applied to Language & Knowledge Engineering
Guest editors: David Pinto and Vivek Singh
Article type: Research Article
Authors: Gómez-Adorno, Helena [a] | Fuentes-Alba, Roddy [b] | Markov, Ilia [c] | Sidorov, Grigori [b] | Gelbukh, Alexander [b], [*]
Affiliations: [a] Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS), Universidad Nacional Autónoma de México, Mexico | [b] Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional, Mexico | [c] Institut National de Recherche en Informatique et en Automatique (INRIA), France
Correspondence: [*] Corresponding author. Alexander Gelbukh. Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional, Mexico. E-mail: gelbukh@gelbukh.com. Web: www.gelbukh.com.
Abstract: We present a method for gender and language variety identification using a convolutional neural network (CNN). We compare its performance with that of a traditional machine learning algorithm, support vector machines (SVM), trained on character n-grams (n = 3–8), lexical features (word unigrams and bigrams), and combinations of these. We use a single multi-labeled corpus of news articles in different varieties of Spanish, developed specifically for these tasks. We show that a CNN architecture trained on word- and sentence-level embeddings can be successfully applied to gender and language variety identification on a relatively small corpus (fewer than 10,000 documents). Our experiments show that the deep learning approach outperforms the traditional machine learning approach on both tasks when named entities are present in the corpus. However, when all named entities are replaced with a single symbol “NE” to avoid topic-dependent features, the drop in accuracy is larger for the deep learning approach.
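To make the SVM baseline's feature set and the named-entity masking step more concrete, here is a minimal standard-library sketch. It is not the authors' code. It assumes whitespace tokenization and assumes that entity spans (start, end token indices) come from some external NER tagger, which the abstract does not specify. All function names are hypothetical. The resulting sparse counts would then be weighted and passed to a linear SVM, which is not shown here.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Span = Tuple[int, int]  # hypothetical: [start, end) token indices from an NER tagger


def mask_named_entities(tokens: List[str], spans: Sequence[Span]) -> List[str]:
    """Replace each entity span with a single "NE" token (spans assumed non-overlapping)."""
    span_end = {start: end for start, end in spans}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in span_end:
            out.append("NE")
            i = max(span_end[i], i + 1)  # skip the whole entity; guard against empty spans
        else:
            out.append(tokens[i])
            i += 1
    return out


def char_ngrams(text: str, n_min: int = 3, n_max: int = 8) -> Counter:
    """Count all character n-grams with n_min <= n <= n_max (n = 3-8 as in the abstract)."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_ngrams(tokens: List[str], sizes: Iterable[int] = (1, 2)) -> Counter:
    """Count word unigrams and bigrams (the lexical features)."""
    counts: Counter = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def extract_features(tokens: List[str], entity_spans: Sequence[Span] = ()) -> Counter:
    """Combine character and word n-gram counts, optionally after NE masking."""
    masked = mask_named_entities(tokens, entity_spans)
    features: Counter = Counter()
    for gram, count in char_ngrams(" ".join(masked)).items():
        features["c:" + gram] += count  # prefix keeps char and word features distinct
    for gram, count in word_ngrams(masked).items():
        features["w:" + gram] += count
    return features


if __name__ == "__main__":
    toks = "Vivo en Buenos Aires desde 2010".split()
    print(mask_named_entities(toks, [(2, 4)]))
```

With the entity span (2, 4) covering "Buenos Aires", the masked tokens become `['Vivo', 'en', 'NE', 'desde', '2010']`, so topic-specific names no longer contribute distinct n-grams. Prefixing feature keys with `c:` and `w:` is one simple way to keep the two feature families separate when they are combined.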
Keywords: Convolutional neural networks, deep learning, author profiling, gender identification, language variety identification, machine learning, character n-grams, Spanish
DOI: 10.3233/JIFS-179032
Journal: Journal of Intelligent & Fuzzy Systems, vol. 36, no. 5, pp. 4845-4855, 2019