A novel ensemble model for identification and classification of cyber harassment on social media platform

Abarna, S.; Sheeba, J.I.; Pradeep Devaneyan, S.

doi:10.3233/JIFS-230346

Article type: Research Article

Authors: Abarna, S.^{a; *} | Sheeba, J.I.^a | Pradeep Devaneyan, S.^b

Affiliations: [a] Department of Computer Science and Engineering, Puducherry Technological University, Puducherry, India | [b] Department of Mechanical Engineering, Sri Venkateshwaraa College of Engineering and Technology, Puducherry, India

Correspondence: [*] Corresponding author. S. Abarna, Ph.D. Scholar, Department of CSE, Puducherry Technological University, Puducherry, India. E-mail: abarna@pec.edu.

Abstract: Schools and universities shut down during the worldwide COVID-19 pandemic lockdown, and student screen time skyrocketed. Because classes were delivered online, social media use spiked during the lockdown, and many pupils became victims of cyberbullying, which includes criticizing one another, posting sexual comments on images of young women, and using fake accounts to bully others. A growing body of work applies Machine Learning (ML) and Natural Language Processing (NLP) techniques to automated cyberbullying detection. Individual machine learning methods, however, are unable to reach the required accuracy. Multiple-classifier systems, known as “ensemble learning”, have therefore been proposed to improve predictive performance by aggregating the predictions of several models. In our proposed system, we apply a novel method of detecting online harassment (cyberbullying) to an Instagram dataset. The attributes of abusive words are first analyzed using feature selection and pre-trained word-embedding language models such as Bidirectional Encoder Representations from Transformers (BERT) and Embeddings from Language Models (ELMO). A knowledge-based frequent pattern method, built with Knowledge-BERT (K-BERT), is used to identify the intention of the harasser. Unsupervised approaches such as Latent Semantic Analysis (LSA), Frequent Pattern Growth (FP-Growth), and the K-Means clustering technique are also applied. The outputs of these detection models are ensembled using Extreme Gradient Boosting (XGBoost) to classify the categories of online harassment. The performance of the ensemble model is then cross-validated using machine learning metrics and compared with various existing techniques. The ensemble model performs better, achieving a higher F1 score of 92.04% with a lower error rate in classifying harassment categories.
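The abstract describes aggregating the outputs of several detection models into one harassment-category prediction and scoring the result with F1. The sketch below illustrates both ideas under stated assumptions: it substitutes simple soft voting (averaging each base model's class probabilities) for the XGBoost meta-classifier the authors use, and it computes macro-averaged F1, since the abstract does not say which F1 averaging was reported. The function names `soft_vote` and `macro_f1`, the category labels, and all probability values are hypothetical.

```python
from collections import Counter


def soft_vote(prob_lists):
    """Average per-class probabilities across base models; return argmax labels.

    prob_lists: one entry per base model; each entry is a list (one item per
    sample) of {label: probability} dicts. Soft voting stands in for the
    XGBoost ensembler described in the abstract.
    """
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    predictions = []
    for i in range(n_samples):
        totals = Counter()
        for model_probs in prob_lists:
            for label, p in model_probs[i].items():
                totals[label] += p / n_models  # running mean over models
        # Iterate labels in sorted order so ties resolve deterministically.
        predictions.append(max(sorted(totals), key=lambda lab: totals[lab]))
    return predictions


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical outputs of two base detectors on four posts.
model_a = [
    {"insult": 0.7, "sexual": 0.1, "none": 0.2},
    {"insult": 0.2, "sexual": 0.6, "none": 0.2},
    {"insult": 0.3, "sexual": 0.3, "none": 0.4},
    {"insult": 0.5, "sexual": 0.1, "none": 0.4},
]
model_b = [
    {"insult": 0.6, "sexual": 0.2, "none": 0.2},
    {"insult": 0.1, "sexual": 0.8, "none": 0.1},
    {"insult": 0.1, "sexual": 0.2, "none": 0.7},
    {"insult": 0.2, "sexual": 0.1, "none": 0.7},
]
y_true = ["insult", "sexual", "none", "insult"]
y_pred = soft_vote([model_a, model_b])
print(y_pred)                              # ['insult', 'sexual', 'none', 'none']
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

In the system the abstract describes, the aggregation step is performed by XGBoost, which would learn how to weight the base models' outputs from training data rather than averaging them uniformly; uniform averaging is used here only because it needs no training step.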

Keywords: Cyber-harassment, ensemble learning, K-BERT, BERT, ELMO, FP-growth, LSA, K-means, XGBoost, NLP

Journal: Journal of Intelligent & Fuzzy Systems, vol. 45, no. 1, pp. 13-36, 2023

Published: 02 July 2023