Issue title: Soft Computing and Advances in Intelligent Systems
Guest editors: Ildar Batyrshin, Fernando Gomide, Vladik Kreinovich and Shahnaz Shahbazova
Article type: Research Article
Authors: Balouchzahi, Fazlourrahman [a] | Shashirekha, Hosahalli Lakshmaiah [b] | Sidorov, Grigori [a], [*] | Gelbukh, Alexander [a]
Affiliations: [a] Instituto Politécnico Nacional, Centro de Investigación en Computación, CDMX, Mexico | [b] Department of Computer Science, Mangalore University, Mangalore, India
Correspondence: [*] Corresponding author. Grigori Sidorov, Instituto Politécnico Nacional, Centro de Investigación en Computación, CDMX, Mexico. Email: sidorov@cic.ipn.mx.
Abstract: Curfews and lockdowns around the world in the Covid-19 era have drastically increased internet usage and, accordingly, the amount of data shared on social media. While social media is used to share useful information, some miscreants use its power to spread hate speech and offensive content. Filtering offensive content manually is laborious because of the huge volume of data. Further, rapid developments in hardware and software technology have enabled users to post comments not only in English but also in their native language scripts. However, because Roman script is easy to use, social media users, specifically in multilingual countries like India, prefer to comment in code-mixed and multi-script texts. The typical systems employed to process and analyze monolingual texts are usually not appropriate for these kinds of texts. Further, as these texts do not adhere to the rules of any language in framing words and sentences, the complexity of analyzing them increases. The novelty of the present study lies in addressing the Offensive Language Identification (OLI) task in code-mixed and multi-script texts: this paper proposes to use relevant syllable and character n-gram features to train Machine Learning (ML) classifiers. The performance of the proposed models is evaluated on three Dravidian language pairs, namely Malayalam-English, Tamil-English, and Kannada-English. The performance of the ML classifiers demonstrates the effectiveness of syllable and character n-gram features for analyzing code-mixed and multi-script texts.
Keywords: Code-mixed, multi-script, offensive language identification, syllable, character n-grams
DOI: 10.3233/JIFS-212872
Journal: Journal of Intelligent & Fuzzy Systems, vol. 43, no. 6, pp. 6995-7005, 2022
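The abstract names character n-grams as one of the feature types used to train the classifiers. As a rough illustration only, and not the authors' implementation, the sketch below shows one common way such features can be extracted from a Romanized code-mixed comment. The function names, the lowercasing step, the n-gram range, and the sample comment are all assumptions made for this example. Syllable n-grams are not covered, since the abstract does not describe the syllabification procedure.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max, shortest first.

    Hypothetical helper: the lowercasing step and the default range
    are assumptions for this sketch, not details taken from the paper.
    """
    text = text.lower()
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the text.
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


def ngram_counts(text, n_min=1, n_max=3):
    """Bag-of-n-grams representation: n-gram -> frequency."""
    return Counter(char_ngrams(text, n_min, n_max))


if __name__ == "__main__":
    # Invented sample comment, used only to show the output shape.
    features = ngram_counts("Super movie", n_min=2, n_max=3)
    print(features.most_common(5))
```

Count vectors of this kind could then be passed to any standard classifier. Because the features are built from characters rather than whole words, they tolerate the spelling variation that is typical of Romanized text.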