Issue title: Communicative social signals: Computational and behavioural aspects of human-human and human-machine interaction
Guest editors: Klára Vicsi [x] and Anna Esposito [y]
Article type: Research Article
Authors: Sárosi, Gellért [a; *] | Tarján, Balázs [a] | Fegyó, Tibor [a; b] | Mihajlik, Péter [a; c]
Affiliations: [a] Department of Telecommunication and Media Informatics, Budapest University of Technology and Economics, Hungary | [b] Aitia International Inc., Hungary | [c] THINKTech Research Center Nonprofit LLC, Hungary | [x] Laboratory of Speech Acoustics, Department of Telecommunications and Media-Informatics, Budapest University of Technology and Economics, Budapest, Hungary | [y] Dipartimento di Psicologia and IIASS, Seconda Università di Napoli, Vietri Sul Mare, Salerno, Italy
Correspondence: [*] Corresponding author: Gellért Sárosi, Department of Telecommunication and Media Informatics, Budapest University of Technology and Economics, Hungary. E-mail: sarosi@mit.bme.hu
Abstract: This paper summarizes our recent efforts to automatically transcribe real-life call center conversations, including non-verbal acoustic events. Future call centers, as cognitive infocom systems, must respond automatically not only to well-formed utterances but also to spontaneous and non-word speaker manifestations, and must be robust against sudden noises. Transcribing conversational telephone speech is itself a major challenge, which we address primarily on real-life bank and insurance tasks. In addition, we introduce several non-word acoustic modeling approaches and their integration into LVCSR (Large Vocabulary Continuous Speech Recognition). The experiments investigate one- and two-channel transcription (client and agent speech merged into one audio stream or kept in two separate streams), cross-task results, and the handling of insufficient transcription data, in parallel with non-verbal acoustic event modeling. On the agent side, a word error rate below 15% was achieved, and the best relative error rate reduction, 20%, resulted from the inclusion of various written corpora and from acoustic event handling.
Keywords: Speech, call centers, LVCSR, transcription
DOI: 10.3233/IDT-140195
Journal: Intelligent Decision Technologies, vol. 8, no. 4, pp. 265-275, 2014