Manual and non-manual sign language recognition framework using hybrid deep learning techniques

Javaid, Sameena; Rizvi, Safdar

doi:10.3233/JIFS-230560

Article type: Research Article

Authors: Javaid, Sameena* | Rizvi, Safdar

Affiliations: Department of Computer Sciences, School of Engineering and Applied Sciences, Bahria University, Karachi Campus, Karachi, Pakistan

Correspondence: [*] Corresponding author. Sameena Javaid, Department of Computer Sciences, School of Engineering and Applied Sciences, Bahria University, Karachi Campus, Karachi, Pakistan. E-mail: sameenajaved.bukc@bahria.edu.pk.

Abstract: Sign language recognition is a significant cross-modal way to bridge the communication gap between deaf and hearing people. Automatic Sign Language Recognition (ASLR) translates sign language gestures into text and spoken words. Most researchers focus on either manual or non-manual gestures separately; concurrent recognition of both is rare. Facial expressions and other body movements can improve recognition accuracy and help convey a sign's exact meaning. This paper proposes a Multimodal Sign Language Recognition (MM-SLR) framework that recognizes non-manual features based on facial expressions together with manual gestures, represented as hand movements in the spatio-temporal domain. The proposed architecture has three modules. First, a modified YOLOv5 architecture extracts faces and hands from videos as two Regions of Interest. Second, a refined C3D architecture extracts features from the hand region and the face region, and the features of both modalities are then concatenated. Lastly, an LSTM network is used to obtain spatio-temporal descriptors, together with attention-based sequential modules for gesture classification. To validate the proposed framework, we used three publicly available datasets: RWTH-PHOENIX-Weather-2014T, SILFA and PkSLMNM. Experimental results show that the proposed MM-SLR framework outperformed the compared approaches on all three datasets.

Keywords: C3D, LSTM, manual gestures, non-manual gestures, sign language recognition, YOLOv5

Journal: Journal of Intelligent & Fuzzy Systems, vol. 45, no. 3, pp. 3823-3833, 2023

Published: 24 August 2023
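
Illustrative sketch: The abstract describes concatenating hand and face features and then applying attention-based sequential modelling. The following Python sketch shows only that data flow on toy vectors, and it is not the authors' implementation. The YOLOv5 detection stage, the C3D feature extractor and the LSTM are omitted; plain lists of floats stand in for per-clip features. The dot-product attention with a single query vector is an assumption, since the abstract does not specify the attention form, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def concat_features(hand: Vector, face: Vector) -> Vector:
    """Feature-level fusion: join the hand and face descriptors of one clip."""
    return hand + face


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(sequence: List[Vector], query: Vector) -> Vector:
    """Weight each time step by softmax(dot(step, query)) and sum the steps.

    Assumption: simple dot-product attention; the paper may use another form.
    """
    scores = [sum(x * q for x, q in zip(step, query)) for step in sequence]
    weights = softmax(scores)
    dim = len(sequence[0])
    return [
        sum(w * step[i] for w, step in zip(weights, sequence))
        for i in range(dim)
    ]


def fuse_and_pool(hand_seq: List[Vector], face_seq: List[Vector],
                  query: Vector) -> Vector:
    """Concatenate both modalities per time step, then pool over time."""
    if len(hand_seq) != len(face_seq):
        raise ValueError("hand and face sequences must have equal length")
    fused = [concat_features(h, f) for h, f in zip(hand_seq, face_seq)]
    return attention_pool(fused, query)


# Toy usage: two time steps, 2-D hand features and 1-D face features.
hand_seq = [[1.0, 0.0], [0.0, 1.0]]
face_seq = [[2.0], [3.0]]
pooled = fuse_and_pool(hand_seq, face_seq, query=[0.0, 0.0, 0.0])
```

With an all-zero query, every time step receives the same attention weight, so the pooled vector is the plain average of the fused steps. In a trained model the query (or an equivalent scoring layer) would be learned, so that informative time steps receive larger weights.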
