DYSARTHRIA SPEECH RECOGNITION BY PHONEMES USING HIDDEN MARKOV MODELS

Standard

DYSARTHRIA SPEECH RECOGNITION BY PHONEMES USING HIDDEN MARKOV MODELS. / Bredikhin, B. A.; Antor, M.; Khlebnikov, N. A. et al.
In: Моделирование, оптимизация и информационные технологии, Vol. 12, No. 1 (44), 20, 2024.

Research output: Contribution to journal › Article › peer-review

Harvard

Bredikhin, BA, Antor, M, Khlebnikov, NA, Мельников, АВ & Бачурин, МВ 2024, 'DYSARTHRIA SPEECH RECOGNITION BY PHONEMES USING HIDDEN MARKOV MODELS', Моделирование, оптимизация и информационные технологии, vol. 12, no. 1 (44), 20. https://doi.org/10.26102/2310-6018/2024.44.1.002

APA

Bredikhin, B. A., Antor, M., Khlebnikov, N. A., Мельников, А. В., & Бачурин, М. В. (2024). DYSARTHRIA SPEECH RECOGNITION BY PHONEMES USING HIDDEN MARKOV MODELS. Моделирование, оптимизация и информационные технологии, 12(1 (44)), [20]. https://doi.org/10.26102/2310-6018/2024.44.1.002

Vancouver

Bredikhin BA, Antor M, Khlebnikov NA, Мельников АВ, Бачурин МВ. DYSARTHRIA SPEECH RECOGNITION BY PHONEMES USING HIDDEN MARKOV MODELS. Моделирование, оптимизация и информационные технологии. 2024;12(1 (44)):20. doi: 10.26102/2310-6018/2024.44.1.002

Author

Bredikhin, B. A.; Antor, M.; Khlebnikov, N. A. et al. / DYSARTHRIA SPEECH RECOGNITION BY PHONEMES USING HIDDEN MARKOV MODELS. In: Моделирование, оптимизация и информационные технологии. 2024; Vol. 12, No. 1 (44).

BibTeX

@article{c1e5d3ea283a4a65a5584de0ff2b513c,

title = "DYSARTHRIA SPEECH RECOGNITION BY PHONEMES USING HIDDEN MARKOV MODELS",

abstract = "The paper is motivated by the difficulty of spoken communication between people with speech disorders and interlocutors without them, by the low accuracy of standard speech recognition systems on atypical speech, and by the impossibility of building a single system that handles every speech disorder. The article therefore develops a method for automatic recognition of dysarthric speech that uses a pre-trained neural network to recognize phonemes and hidden Markov models to convert the phonemes into text. The recognition results are then corrected by searching the set of acceptable words for the word nearest in Levenshtein distance and by a dynamic algorithm that splits the model output into separate words. Compared with neural networks, the main advantages of hidden Markov models are the small size of the training data set, which is collected individually for each user, and the ease of retraining the model as a speech disorder progresses. The data set used for model training is described, and recommendations are given for collecting and labeling training data. The proposed method is evaluated on an individual data set recorded by a person with dysarthria, and its recognition quality is compared with that of neural network models trained on the same data set. The results are of practical value for building an augmentative communication system for people with speech disorders.",

author = "Bredikhin, {B. A.} and M. Antor and Khlebnikov, {N. A.} and Мельников, {Александр Валерьевич} and Бачурин, {Матвей Владимирович}",

year = "2024",

doi = "10.26102/2310-6018/2024.44.1.002",

language = "English",

volume = "12",

journal = "Моделирование, оптимизация и информационные технологии",

issn = "2310-6018",

publisher = "Воронежский институт высоких технологий",

number = "1 (44)",

}

RIS

TY - JOUR

T1 - DYSARTHRIA SPEECH RECOGNITION BY PHONEMES USING HIDDEN MARKOV MODELS

AU - Bredikhin, B. A.

AU - Antor, M.

AU - Khlebnikov, N. A.

AU - Мельников, Александр Валерьевич

AU - Бачурин, Матвей Владимирович

PY - 2024

Y1 - 2024

N2 - The paper is motivated by the difficulty of spoken communication between people with speech disorders and interlocutors without them, by the low accuracy of standard speech recognition systems on atypical speech, and by the impossibility of building a single system that handles every speech disorder. The article therefore develops a method for automatic recognition of dysarthric speech that uses a pre-trained neural network to recognize phonemes and hidden Markov models to convert the phonemes into text. The recognition results are then corrected by searching the set of acceptable words for the word nearest in Levenshtein distance and by a dynamic algorithm that splits the model output into separate words. Compared with neural networks, the main advantages of hidden Markov models are the small size of the training data set, which is collected individually for each user, and the ease of retraining the model as a speech disorder progresses. The data set used for model training is described, and recommendations are given for collecting and labeling training data. The proposed method is evaluated on an individual data set recorded by a person with dysarthria, and its recognition quality is compared with that of neural network models trained on the same data set. The results are of practical value for building an augmentative communication system for people with speech disorders.

UR - https://www.elibrary.ru/item.asp?id=65474483

U2 - 10.26102/2310-6018/2024.44.1.002

DO - 10.26102/2310-6018/2024.44.1.002

M3 - Article

VL - 12

JO - Моделирование, оптимизация и информационные технологии

JF - Моделирование, оптимизация и информационные технологии

SN - 2310-6018

IS - 1 (44)

M1 - 20

ER -

ID: 55704531
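
The abstract mentions correcting recognition results by searching the set of acceptable words for the word nearest in Levenshtein distance. The following is a minimal Python sketch of that idea only, not the authors' implementation: the function names, the toy vocabulary, and the choice to compare letter strings (rather than phoneme sequences) are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b.

    Insertions, deletions, and substitutions each cost 1.
    Works on strings or on lists of phoneme symbols.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def nearest_word(candidate, vocabulary):
    """Return the vocabulary entry with the smallest edit distance to candidate.

    Ties are resolved in favor of the entry that appears first.
    """
    return min(vocabulary, key=lambda word: levenshtein(candidate, word))


# Hypothetical vocabulary of acceptable words (illustration only).
VOCABULARY = ["water", "window", "winter", "want"]

print(nearest_word("wadter", VOCABULARY))  # prints "water"
```

A linear scan over the vocabulary is adequate for small, user-specific word lists; a larger vocabulary would likely call for an index such as a trie or BK-tree, which the abstract does not discuss.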