Classification of Parkinson’s disease from smartphone recording data using time-frequency analysis and convolutional neural network

Worasawate, Denchai; Asawaponwiput, Warisara; Yoshimura, Natsue; Intarapanich, Apichart; Surangsrirat, Decho

doi:10.3233/THC-220386

Article type: Research Article

Authors: Worasawate, Denchai^a | Asawaponwiput, Warisara^a | Yoshimura, Natsue^b | Intarapanich, Apichart^c | Surangsrirat, Decho^{d; *}

Affiliations: [a] Department of Electrical Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand | [b] Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan | [c] Educational Technology Team, National Electronics and Computer Technology Center, Pathum Thani, Thailand | [d] Assistive Technology and Medical Devices Research Center, National Science and Technology Development Agency, Pathum Thani, Thailand

Correspondence: [*] Corresponding author: Decho Surangsrirat, Assistive Technology and Medical Devices Research Center, National Science and Technology Development Agency, Pathum Thani, Thailand. E-mail: decho.sur@nstda.or.th.

Abstract: BACKGROUND: Parkinson’s disease (PD) is a long-term neurodegenerative disease of the central nervous system. Current diagnosis depends on clinical observation and on the skill and experience of a trained specialist. One of the symptoms that affects most patients is voice impairment. OBJECTIVE: Voice samples are non-invasive data that can be collected remotely for diagnosis and disease-progression monitoring. In this study, we analyzed smartphone voice recordings as a possible medical self-diagnosis tool using only one-second voice clips. Data from one of the largest mobile PD studies, the mPower study, were used. METHODS: A total of 29,798 ten-second smartphone voice recordings from 4,051 participants were used for the analysis. The recordings captured sustained phonation, with participants saying /aa/ for ten seconds into an iPhone microphone. A dataset of 385,143 one-second audio samples was generated from the original ten-second recordings. Each sample was converted to a spectrogram using a short-time Fourier transform (STFT), and convolutional neural network (CNN) models were then applied to classify the samples. RESULTS: The classification accuracies of the proposed method with LeNet-5, ResNet-50, and VGGNet-16 were 97.7 ± 0.1%, 98.6 ± 0.2%, and 99.3 ± 0.1%, respectively. CONCLUSIONS: We achieved respectable classification performance using a generalized approach on a dataset with a large number of samples. The results emphasize that an analysis based on one-second clips recorded on a smartphone could be a promising non-invasive and remotely available PD biomarker.
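The preprocessing described under METHODS (cut each ten-second recording into one-second clips, then turn each clip into an STFT spectrogram for a CNN) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' code: the helper names `split_into_clips` and `log_spectrogram`, the Hann window, `n_fft=512`, `hop_length=128`, the dB scaling, and the 16 kHz sample rate of the synthetic example are all hypothetical choices, because the abstract gives none of these parameters. The abstract also does not say how clips were windowed. Since 385,143 clips from 29,798 recordings averages about 12.9 clips per recording, which is more than the ten non-overlapping one-second clips a ten-second recording yields, the clip hop is left as a parameter rather than fixed.

```python
import numpy as np


def split_into_clips(signal, sample_rate, clip_seconds=1.0, hop_seconds=1.0):
    """Cut a 1-D recording into fixed-length clips (hypothetical helper).

    hop_seconds == clip_seconds gives non-overlapping clips; a smaller hop
    gives overlapping ones. A trailing partial clip is dropped.
    """
    clip_len = int(round(clip_seconds * sample_rate))
    hop = int(round(hop_seconds * sample_rate))
    if len(signal) < clip_len:
        return np.empty((0, clip_len), dtype=signal.dtype)
    starts = range(0, len(signal) - clip_len + 1, hop)
    return np.stack([signal[s:s + clip_len] for s in starts])


def log_spectrogram(clip, n_fft=512, hop_length=128, eps=1e-10):
    """Log-magnitude STFT of one clip, shape (n_fft // 2 + 1, n_frames).

    Assumed settings: symmetric Hann window (np.hanning), no padding,
    dB scale. Requires len(clip) >= n_fft.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(clip) - n_fft) // hop_length
    # Each row is one windowed frame of the clip.
    frames = np.stack([
        clip[i * hop_length:i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # One-sided FFT per frame; transpose so frequency is on the vertical axis.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # eps keeps log10 finite in near-silent bins.
    return 20.0 * np.log10(magnitude + eps)


# Usage with a synthetic 10 s, 250 Hz tone at an assumed 16 kHz sample rate.
sr = 16000
t = np.arange(10 * sr) / sr
recording = np.sin(2 * np.pi * 250.0 * t)
clips = split_into_clips(recording, sr)   # shape (10, 16000)
spec = log_spectrogram(clips[0])          # shape (257, 122)
```

With these assumed settings each frequency bin spans 16000 / 512 = 31.25 Hz, so the 250 Hz tone falls in bin 8. Each resulting 2-D array would be treated as a single-channel image input to a CNN such as the LeNet-5, ResNet-50, or VGGNet-16 models named in the abstract.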

Keywords: PD voice, audio classification, convolutional neural network, mPower study

Journal: Technology and Health Care, vol. 31, no. 2, pp. 705-718, 2023

Received 25 June 2022

Accepted 16 August 2022

Published: 15 March 2023