Complementing the DTW based speaker verification systems with knowledge of specific regions of interest

Laskar, Mohammad Azharuddin; Laskar, Rabul Hussain

doi:10.3233/JIFS-169927

Issue title: Soft Computing and Intelligent Systems: Techniques and Applications

Guest editors: Sabu M. Thampi and El-Sayed M. El-Alfy

Article type: Research Article

Authors: Laskar, Mohammad Azharuddin* | Laskar, Rabul Hussain

Affiliations: Department of Electronics and Communication Engineering, National Institute of Technology Silchar, Assam, India

Correspondence: [*] Corresponding author. Mohammad Azharuddin Laskar, Department of Electronics and Communication Engineering, National Institute of Technology Silchar, Assam, 788010 India. E-mail: azharlaskar@gmail.com.

Abstract: In recent times, Dynamic Time Warping (DTW) based template matching systems have again come to the forefront in the field of text-dependent speaker verification. Their integration with recent techniques, such as i-vector/Probabilistic Linear Discriminant Analysis (PLDA) and Deep Neural Networks (DNN), has significantly improved system performance. The DTW algorithm time-aligns two templates and gives a similarity score based on the optimal warping path. However, it weighs all the local distances along the optimal path equally. In this paper, we propose complementing DTW based text-dependent speaker verification systems with local scores derived from the vicinity of speaker-identity-rich regions. The vowel regions are used to determine the portions of the warping path that carry more speaker-discriminating information. Two systems, namely the DTW/Mel-frequency Cepstral Coefficients (MFCC) system and the online i-vector/PLDA/DTW system, have been extended to incorporate the knowledge of specific regions of interest. The results have been evaluated on Part 1 of the RSR2015 database. Relative improvements of up to 11.85% and 49.41% are observed for the extended systems based on MFCC and i-vector, respectively.

Keywords: DTW, vowel regions, online i-vector, text-dependent speaker verification

Journal: Journal of Intelligent & Fuzzy Systems, vol. 36, no. 3, pp. 2155-2163, 2019

Published: 26 March 2019
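
The idea summarized in the abstract, a global DTW score complemented by a local score taken only from regions of interest along the warping path, can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean local distance, the path-averaged scores, the boolean `region_mask` (standing in for detected vowel frames of the first template), the linear fusion rule, and the weight `alpha` are all assumptions made for illustration, and vowel detection itself is not shown.

```python
import numpy as np

def dtw_path(a, b):
    """Align two feature sequences (frames x dims) with basic DTW.

    Returns the local distance matrix and the optimal warping path
    as a list of (i, j) frame-index pairs.
    """
    n, m = len(a), len(b)
    # Euclidean distance between every frame pair (assumed local distance).
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end of both sequences to the start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return local, path

def global_and_region_scores(a, b, region_mask):
    """Average local distance over the whole path and over masked frames only.

    region_mask is a hypothetical boolean array marking the frames of `a`
    that fall in regions of interest (for example, vowel regions).
    """
    local, path = dtw_path(a, b)
    costs = np.array([local[i, j] for i, j in path])
    in_region = np.array([bool(region_mask[i]) for i, _ in path])
    global_score = costs.mean()
    # Fall back to the global score if no path step touches a region.
    region_score = costs[in_region].mean() if in_region.any() else global_score
    return global_score, region_score

def fused_score(global_score, region_score, alpha=0.5):
    """Assumed linear fusion of the global and region-restricted distances."""
    return alpha * global_score + (1.0 - alpha) * region_score

a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [1.0], [3.0]])
mask = np.array([False, False, True])  # pretend only the last frame is a vowel
g, r = global_and_region_scores(a, b, mask)
print(g, r, fused_score(g, r))
```

In this toy case the mismatch sits entirely in the masked frame, so the region-restricted distance is larger than the path-averaged one, which is the kind of emphasis that equal weighting along the path would dilute.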
