Hyperkinetic Dysarthria voice abnormalities: a neural network solution for text translation

Links

https://link.springer.com/10.1007/s10772-024-10098-5

DOI

https://doi.org/10.1007/s10772-024-10098-5
Final published version

Antor Mahamudul Hashan
Roman Dmitrievich Chaganov
Alexander Valerievich Melnikov
Danila Vasilyevich Dorokh
Nikolai Alexandrovich Khlebnikov
Boris Bredikhin

A defect speech recognition (DSR) system has the potential to significantly improve the quality of life of people with speech disorders. In this paper, we develop a novel ConvGRUSpeechNet model for recognizing and understanding speech affected by hyperkinetic dysarthria disorder (HDD). The proposed model combines convolutional layers, recurrent layers (GRU and BiGRU), and dense layers with a LogSoftmax function to recognize HDD speech and translate it into text. To prevent overfitting and handle imbalances, we employed data augmentation and splitting functions during training. Mel-frequency cepstral coefficients (MFCC) were also employed to reduce the issue of vanishing gradients. In addition, we created a dataset of Russian speech comprising 2000 recordings of HDD speech. The primary objective of this research is to improve speech recognition for individuals with HDD by employing the ConvGRUSpeechNet model. On the test dataset, the proposed DSR system achieved a character error rate (CER) of 12.35%. Under the same conditions, the experimental findings show that the proposed solution outperforms existing state-of-the-art CBNs and TDNN-F LF-MMI models. Furthermore, we deployed the TensorFlow model on a Flask server, making it accessible from a web application. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
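
The 12.35% figure above is a character error rate. As a minimal sketch of how this metric is commonly computed (character-level Levenshtein distance summed over all utterances, divided by the total number of reference characters), the Python below may help when reading the result. It is not the authors' evaluation code; the helper names `edit_distance` and `cer` are hypothetical, and the abstract does not state whether text was normalized (case, punctuation, whitespace) before scoring.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

Under this definition, a CER of 12.35% corresponds to roughly 12 character edits (substitutions, insertions, or deletions) per 100 reference characters.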

Original language	English
Pages (from-to)	255-265
Number of pages	11
Journal	International Journal of Speech Technology
Volume	27
Issue number	1
DOI	https://doi.org/10.1007/s10772-024-10098-5
Status	Published - 2024

ASJC Scopus subject areas

Language and Linguistics
Linguistics and Language
Computer Vision and Pattern Recognition
Software
Human-Computer Interaction

ID: 57304969