

The implementation of a defect speech recognition (DSR) system has the potential to significantly improve the quality of life of people with speech disorders. In this paper, we develop a novel ConvGRUSpeechNet model for recognizing and understanding the speech of people with hyperkinetic dysarthria disorder (HDD). The proposed model combines convolutional layers, recurrent layers (GRU and BiGRU), and dense layers with a LogSoftmax function to recognize HDD speech and transcribe it into text. To prevent overfitting and handle imbalances, we employed data augmentation and splitting functions during training. Mel-frequency cepstral coefficients (MFCC) were also employed to reduce the issue of vanishing gradients. In addition, we created a dataset of Russian speech comprising 2000 recordings of HDD speech. The primary objective of this research is to improve speech recognition for individuals with HDD by employing the ConvGRUSpeechNet model. The proposed DSR system achieved a character error rate (CER) of 12.35% on the test dataset. Under the same conditions, the experimental findings show that the proposed solution outperforms the existing state-of-the-art CBNs and TDNN-F LF-MMI models. Furthermore, we deployed the TensorFlow model on a Flask server, making it accessible for use in a web application. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
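For readers unfamiliar with the reported metric, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between the reference transcript and the recognized text, divided by the reference length. The abstract does not describe the authors' evaluation code, so the function names `edit_distance` and `cer` and the sample strings are assumptions made for illustration only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (illustrative helper)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from ref
                curr[j - 1] + 1,     # insert a character from hyp
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pair, not taken from the paper's dataset.
    print(f"CER = {cer('привет', 'привт'):.2%}")
```

Under this definition, a CER of 12.35% corresponds to roughly one character-level edit for every eight reference characters, averaged over the test set.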
Original language: English
Pages (from-to): 255-265
Number of pages: 11
Journal: International Journal of Speech Technology
Issue number: 1
Status: Published - 2024

    ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Software
  • Human-Computer Interaction

ID: 57304969