Searching for just a few words should be enough to get started. If you need to make more complex queries, use the tips below to guide you.
Article type: Research Article
Authors: Zvarevashe, Kudakwashe* | Olugbara, Oludayo O.
Affiliations: ICT and Society Research Group, South Africa Luban Workshop, Durban University of Technology, Durban, South Africa
Correspondence: [*] Corresponding author: Kudakwashe Zvarevashe, ICT and Society Research Group, South Africa Luban Workshop, Durban University of Technology, Durban 4000, South Africa. E-mail: kudakwashe.zvarevashe@gmail.com.
Abstract: Speech emotion recognition has become the heart of most human computer interaction applications in the modern world. The growing need to develop emotionally intelligent devices has opened up a lot of research opportunities. Most researchers in this field have applied the use of handcrafted features and machine learning techniques in recognising speech emotion. However, these techniques require extra processing steps and handcrafted features are usually not robust. They are computationally intensive because the curse of dimensionality results in low discriminating power. Research has shown that deep learning algorithms are effective for extracting robust and salient features in dataset. In this study, we have developed a custom 2D-convolution neural network that performs both feature extraction and classification of vocal utterances. The neural network has been evaluated against deep multilayer perceptron neural network and deep radial basis function neural network using the Berlin database of emotional speech, Ryerson audio-visual emotional speech database and Surrey audio-visual expressed emotion corpus. The described deep learning algorithm achieves the highest precision, recall and F1-scores when compared to other existing algorithms. It is observed that there may be need to develop customized solutions for different language settings depending on the area of applications.
Keywords: Deep learning, emotion recognition, feature extraction, learning algorithm, neural network, speech emotion
DOI: 10.3233/IDA-194747
Journal: Intelligent Data Analysis, vol. 24, no. 5, pp. 1065-1086, 2020
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
sales@iospress.com
For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
info@iospress.nl
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office info@iospress.nl
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
china@iospress.cn
For editorial issues, like the status of your submitted paper or proposals, write to editorial@iospress.nl
如果您在出版方面需要帮助或有任何建, 件至: editorial@iospress.nl