Attention mechanism based LSTM in classification of stressed speech under workload

Yao, Xiao; Sheng, Zhengyan; Gu, Min; Wang, Haibin; Xu, Ning; Liu, Xiaofeng

doi:10.3233/IDA-205429

Article type: Research Article

Authors: Yao, Xiao^a | Sheng, Zhengyan^{a; *} | Gu, Min^{b; c} | Wang, Haibin^a | Xu, Ning^a | Liu, Xiaofeng^a

Affiliations: [a] The College of IoT Engineering, Hohai University, Jiangsu, China | [b] Department of Stomatology, Affiliated Third Hospital of Soochow University, Suzhou, Jiangsu, China | [c] The First People’s Hospital of Changzhou, Changzhou, Jiangsu, China

Correspondence: [*] Corresponding author: Zhengyan Sheng, The College of IoT Engineering, Hohai University, Jiangsu, China. E-mail: 760551759@qq.com.

Abstract: To improve the robustness of speech recognition systems, this study aims to classify stressed speech caused by psychological stress under multitasking workloads. Because stressed speech is transient and ambiguous, stress characteristics are not present in every segment of an utterance labeled as stressed. In this paper, we propose a multi-feature fusion model based on the attention mechanism to measure the importance of each segment for stress classification. Through the attention mechanism, each speech frame is weighted to reflect its correlation with the actual stressed state, and the features characterizing stressed speech are fused across multiple channels to classify speech under stress. The proposed model further applies SpecAugment to the feature spectrum for data augmentation, addressing the small-sample-size problem of stressed speech. In the experiments, we compare the proposed model with traditional methods on the CASIA Chinese emotion corpus and the Fujitsu stressed speech corpus, and the results show that the proposed model performs better in speaker-independent stress classification. Transfer learning is also applied to speaker-dependent classification of stressed speech, and it improves performance. Compared with traditional methods, the attention mechanism shows an advantage for continuous speech under stress in authentic contexts.
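
The two ideas named in the abstract, frame-level attention weighting and SpecAugment-style masking, can be illustrated with a minimal sketch. This is not the authors' model: the function names, the scoring vector `w`, and the mask sizes are all hypothetical, the frame vectors stand in for LSTM outputs, and only the masking part of SpecAugment is shown (time warping is omitted).

```python
import math
import random


def attention_pool(frames, w):
    """Softmax-weighted average of frame vectors.

    frames: list of T frame vectors (each a list of D floats), e.g. LSTM outputs.
    w: hypothetical scoring vector of length D (learned in a real model).
    Returns (weights, pooled); the weights sum to 1 over the T frames.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) for frame in frames]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(a * frame[d] for a, frame in zip(weights, frames))
              for d in range(dim)]
    return weights, pooled


def mask_spectrogram(spec, max_time=2, max_freq=2, rng=None):
    """Zero one random band of time frames and one band of frequency bins.

    spec: list of T frames, each a list of F spectral bins.
    Returns a masked copy; the input is left unchanged.
    """
    rng = rng or random.Random()
    n_t, n_f = len(spec), len(spec[0])
    out = [row[:] for row in spec]
    t_len = rng.randint(0, min(max_time, n_t))  # width of the time mask
    t0 = rng.randint(0, n_t - t_len)            # start of the time mask
    f_len = rng.randint(0, min(max_freq, n_f))  # width of the frequency mask
    f0 = rng.randint(0, n_f - f_len)            # start of the frequency mask
    for t in range(n_t):
        for f in range(n_f):
            if t0 <= t < t0 + t_len or f0 <= f < f0 + f_len:
                out[t][f] = 0.0
    return out


# Toy usage: the second frame scores highest, so it dominates the pooled vector.
frames = [[0.1, 0.2], [2.0, 1.5], [0.0, 0.1]]
weights, pooled = attention_pool(frames, w=[1.0, 1.0])
augmented = mask_spectrogram([[1.0] * 4 for _ in range(5)], rng=random.Random(0))
```

In a full system the pooled vector would feed a classifier, and masked copies of each training spectrogram would be added to the training set; both steps are left out here.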

Keywords: Attention mechanism, speech under stress, multi-feature fusion, SpecAugment, transfer learning

Journal: Intelligent Data Analysis, vol. 25, no. 6, pp. 1603-1627, 2021

Published: 29 October 2021
