Article type: Research Article
Authors: Huang, Shuaina [a, b] | Zhang, Zhiyong [a, b, *] | Song, Bin [a, b] | Mao, Yueheng [a, b]
Affiliations: [a] College of Information Engineering, Henan University of Science and Technology, Henan Luoyang, China | [b] Henan International Joint Laboratory of Cyberspace Security Applications, Henan University of Science and Technology, Henan Luoyang, China
Correspondence: [*] Corresponding author. Zhiyong Zhang, Henan International Joint Laboratory of Cyberspace Security Applications, Henan University of Science and Technology, Henan Luoyang 471023, China. E-mail: zhangzy@haust.edu.cn.
Note: [1] This work was supported by the National Natural Science Foundation of China under Grant No. 61972133, the Project of Leading Talents in Science and Technology Innovation in Henan Province under Grant No. 204200510021, the Program for Henan Province Key Science and Technology under Grant No. 222102210177, and the Henan Province University Key Scientific Research Project under Grant No. 23A520008.
Abstract: Social network attackers leverage images and text to disseminate sensitive information associated with pornography, politics, and terrorism, causing adverse effects on society. Current sensitive information classification models do not focus on feature fusion between images and text, which greatly reduces recognition accuracy. To address this problem, we propose an attentive cross-modal fusion model (ACMF), which utilizes a mixed attention mechanism and the Contrastive Language-Image Pre-training (CLIP) model. Specifically, we employ a deep neural network with a mixed attention mechanism as the visual feature extractor, which allows us to progressively extract features at different levels. We combine these visual features with those obtained from a text feature extractor and incorporate image-text frequency-domain information at various levels to enable fine-grained modeling. Additionally, we introduce a cyclic attention mechanism and integrate the CLIP model to establish stronger connections between modalities, thereby enhancing classification performance. Experimental evaluations conducted on the collected sensitive information datasets demonstrate the superiority of our method over other baseline models. The model achieves an accuracy of 91.4% and an F1-score of 0.9145. These results validate the effectiveness of the mixed attention mechanism in enhancing the utilization of important features. Furthermore, the effective fusion of text and image features significantly improves the classification ability of the deep neural network.
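To make the architecture described in the abstract more concrete, below is a minimal, hypothetical sketch of a mixed (channel + spatial) attention block applied to a CNN feature map, followed by a simple concatenation-based image-text fusion head. The module names, tensor shapes, and fusion strategy are illustrative assumptions only; they do not reproduce the authors' actual ACMF implementation, which additionally involves frequency-domain features, cyclic attention, and the CLIP model.

```python
# Hypothetical sketch of a mixed (channel + spatial) attention block and a simple
# image-text fusion head. Names, dimensions, and the fusion strategy are
# illustrative assumptions, not the authors' actual ACMF implementation.
import torch
import torch.nn as nn


class MixedAttention(nn.Module):
    """Channel attention followed by spatial attention over a CNN feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, produce per-channel weights.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over pooled channel statistics.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = x.mean(dim=(2, 3))                      # (B, C)
        mx = x.amax(dim=(2, 3))                       # (B, C)
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ca.view(b, c, 1, 1)                   # re-weight channels
        avg_map = x.mean(dim=1, keepdim=True)         # (B, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)         # (B, 1, H, W)
        sa = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return x * sa                                 # re-weight spatial positions


class SimpleFusionHead(nn.Module):
    """Concatenate pooled visual features with text features and classify."""

    def __init__(self, visual_dim: int, text_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(visual_dim + text_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),
        )

    def forward(self, visual_map: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        pooled = visual_map.mean(dim=(2, 3))          # global average pooling
        return self.classifier(torch.cat([pooled, text_feat], dim=1))


if __name__ == "__main__":
    attn = MixedAttention(channels=256)
    fuse = SimpleFusionHead(visual_dim=256, text_dim=512, num_classes=3)
    feats = torch.randn(4, 256, 14, 14)               # CNN feature map
    text = torch.randn(4, 512)                        # e.g. CLIP text embeddings
    logits = fuse(attn(feats), text)
    print(logits.shape)                                # torch.Size([4, 3])
```

In a full pipeline of this kind, the text features would typically come from a pretrained text encoder (such as CLIP's), and attention blocks like the one above would be interleaved at multiple stages of the visual backbone to extract features at different levels.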
Keywords: Multi-modal, sensitive information, spatial attention mechanism, channel attention mechanism, deep learning
DOI: 10.3233/JIFS-233508
Journal: Journal of Intelligent & Fuzzy Systems, vol. 45, no. 6, pp. 12425-12437, 2023