Article type: Research Article
Authors: Xue, Cuihong [a, *] | Yu, Ming [b] | Yan, Gang [b] | Qin, Mengxian [b] | Liu, Yuehao [b] | Jia, Jingli [b]
Affiliations: [a] Technical College for the Deaf, Tianjin University of Technology, Tianjin, China | [b] School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
Correspondence: [*] Corresponding author. Cuihong Xue, Technical College for the Deaf, Tianjin University of Technology, Tianjin, 300384, China. E-mail: redxuech@tjut.edu.cn.
Abstract: Some existing continuous sign language recognition (CSLR) methods require alignment. However, alignment is time-consuming, breaks the continuity of the frame sequence, and affects the subsequent stages of CSLR. In this paper, we propose a multi-modal network framework for CSLR based on a multi-layer self-attention mechanism. For the feature extraction stage, we propose a 3D convolutional residual neural network (CR3D) and a multi-layer self-attention network (ML-SAN). The CR3D extracts short-term spatiotemporal features from the RGB and optical flow image streams, whereas the ML-SAN uses a bidirectional gated recurrent unit (BGRU) to model long-term sequence relationships and a multi-layer self-attention mechanism to learn the internal relationships within sign language sequences. For the performance optimization stage, we propose a cross-modal spatial mapping loss function, which improves the precision of CSLR by exploiting the spatial similarity between the video and text domains. Experiments were conducted on two test datasets: the RWTH-PHOENIX-Weather multi-signer dataset and a Chinese SL (CSL) dataset. The results show that the proposed method obtains state-of-the-art recognition performance on the two datasets, with a word error rate (WER) of 24.4% and an accuracy value of 14.42%, respectively.
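The following is a minimal, hypothetical sketch (not the authors' code) of the pipeline described in the abstract: a small 3D-convolutional residual stem (here called CR3D) applied to RGB and optical-flow streams, a BGRU followed by stacked self-attention layers (here called MLSAN), and an illustrative stand-in for the cross-modal spatial mapping loss. All module names, hyperparameters, and the additive stream fusion are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CR3D(nn.Module):
    """Sketch of a 3D-convolutional residual stem for short-term spatiotemporal features."""
    def __init__(self, in_ch=3, dim=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv3d(in_ch, 64, kernel_size=(3, 7, 7), stride=(1, 2, 2), padding=(1, 3, 3)),
            nn.BatchNorm3d(64), nn.ReLU(inplace=True),
        )
        self.res = nn.Sequential(
            nn.Conv3d(64, 64, 3, padding=1), nn.BatchNorm3d(64), nn.ReLU(inplace=True),
            nn.Conv3d(64, 64, 3, padding=1), nn.BatchNorm3d(64),
        )
        self.proj = nn.Linear(64, dim)

    def forward(self, x):                        # x: (B, C, T, H, W)
        h = self.stem(x)
        h = F.relu(h + self.res(h))              # residual connection
        h = h.mean(dim=(3, 4)).transpose(1, 2)   # global spatial pooling -> (B, T, 64)
        return self.proj(h)                      # per-frame features (B, T, dim)

class MLSAN(nn.Module):
    """Sketch of a BGRU for long-term dependencies plus stacked self-attention layers."""
    def __init__(self, dim=256, layers=2, heads=4):
        super().__init__()
        self.bgru = nn.GRU(dim, dim // 2, batch_first=True, bidirectional=True)
        self.attn = nn.ModuleList(nn.MultiheadAttention(dim, heads, batch_first=True)
                                  for _ in range(layers))

    def forward(self, x):                        # x: (B, T, dim)
        h, _ = self.bgru(x)
        for layer in self.attn:
            a, _ = layer(h, h, h)
            h = h + a                            # residual self-attention
        return h

def cross_modal_mapping_loss(video_emb, text_emb):
    """Illustrative stand-in for the cross-modal spatial mapping loss:
    pull paired video/text embeddings together in a shared space."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    return (1.0 - (v * t).sum(-1)).mean()

# Toy forward pass: fuse RGB and optical-flow streams, then produce per-frame logits.
rgb, flow = torch.randn(2, 3, 16, 112, 112), torch.randn(2, 3, 16, 112, 112)
cr3d_rgb, cr3d_flow, mlsan = CR3D(), CR3D(), MLSAN()
feats = cr3d_rgb(rgb) + cr3d_flow(flow)          # simple additive fusion (assumption)
logits = nn.Linear(256, 1000)(mlsan(feats))      # (B, T, vocab) for CTC-style decoding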
Keywords: CR3D, multi-modal fusion, self-attention mechanism, ML-SAN, cross-modal spatial mapping
DOI: 10.3233/JIFS-211697
Journal: Journal of Intelligent & Fuzzy Systems, vol. 43, no. 4, pp. 4303-4316, 2022