Article type: Research Article
Authors: Wang, Fan [a] | Tian, Shengwei [a],* | Yu, Long [b] | Long, Jun [c] | Zhou, Tiejun [d] | Wang, Bo [a] | Wang, Junwen [a] | Wang, Yongtao [a]
Affiliations: [a] School of Software, University of Xinjiang, Xinjiang, China | [b] Network and Information Center, University of Xinjiang, Xinjiang, China | [c] Institute of Big Data Research, University of Central South, Changsha, China | [d] Internet Information Security Centre, Xinjiang, China
Correspondence: [*] Corresponding author. Shengwei Tian, School of Software, University of Xinjiang, Xinjiang, China. E-mail: tianshengwei@163.com.
Abstract: Human multi-modal emotion analysis involves time series data from different modalities, such as verbal, visual, and auditory. Because each modality is sampled at a different rate, the collected data streams are unaligned. This cross-modality asynchrony increases the difficulty of multi-modal fusion. Therefore, we propose a new Cross-Modality Reinforcement model (CMR), based on recent advances in cross-modality transformers, which performs multi-modal fusion over unaligned multi-modal sequences for emotion prediction. To deal with the long-term dependencies of unaligned sequences, we introduce a time domain aggregation to model each single modality, aggregating information along the time dimension and enhancing contextual dependencies. Moreover, a CMR strategy is introduced in our approach. With the main and secondary modalities as inputs to the module, the main-modality features are strengthened through cross-modality attention and a cross-modality gate, so that secondary-modality information flows into the main modality while main-modality-specific features are retained and missing cues are complemented. This process gradually learns the features that the main and secondary modalities jointly contribute and reduces the noise caused by the variability of the modal features. Finally, the enhanced features are used to predict human emotions. We evaluate CMR on two multi-modal sentiment analysis benchmark datasets and report accuracies of 82.7% on CMU-MOSI and 82.5% on CMU-MOSEI, which demonstrates that our method outperforms current state-of-the-art methods.
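The abstract describes the core reinforcement step only at a high level. Below is a minimal, hedged PyTorch sketch of what such a step could look like under common assumptions: the main modality queries a secondary modality through cross-modal attention, and a learned gate controls how much of the attended secondary information is merged back into the main stream. It is not the authors' implementation; the class name CrossModalityReinforcement, the dimensions, and the gating formulation are illustrative assumptions.

```python
# Illustrative sketch (NOT the paper's code) of one cross-modality
# reinforcement step: cross-modality attention followed by a cross-modality
# gate, operating on unaligned sequences of different lengths.
import torch
import torch.nn as nn


class CrossModalityReinforcement(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # Cross-modality attention: queries come from the main modality,
        # keys/values from the secondary modality.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-modality gate: element-wise gate computed from both streams.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.norm = nn.LayerNorm(d_model)

    def forward(self, main: torch.Tensor, secondary: torch.Tensor) -> torch.Tensor:
        # main:      (batch, T_main, d_model), e.g. language features
        # secondary: (batch, T_sec,  d_model), e.g. visual or acoustic features
        # The two sequence lengths may differ, so no pre-alignment is needed.
        attended, _ = self.cross_attn(main, secondary, secondary)
        g = self.gate(torch.cat([main, attended], dim=-1))
        # Gated residual: keep main-modality-specific features while letting
        # complementary secondary cues flow into the main modality.
        return self.norm(main + g * attended)


if __name__ == "__main__":
    block = CrossModalityReinforcement()
    language = torch.randn(2, 50, 64)   # unaligned main-modality sequence
    acoustic = torch.randn(2, 375, 64)  # unaligned secondary-modality sequence
    print(block(language, acoustic).shape)  # torch.Size([2, 50, 64])
```

In this reading, the gated residual is what lets secondary-modality information flow in "potentially" while the main modality retains its own features; the exact gate and aggregation used in CMR are specified in the paper itself.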
Keywords: Cross-modality processing, multi-modal fusion, multi-modal unaligned sequences, multi-modal sentiment analysis
DOI: 10.3233/JIFS-213536
Journal: Journal of Intelligent & Fuzzy Systems, vol. 43, no. 5, pp. 6013-6025, 2022