<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Research Result. Theoretical and Applied Linguistics</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2022-8-4-0-6</article-id><article-id pub-id-type="publisher-id">2974</article-id><article-categories><subj-group subj-group-type="heading"><subject>APPLIED LINGUISTICS</subject></subj-group></article-categories><title-group><article-title>Boosting Speech-to-Text software potential</article-title><trans-title-group xml:lang="en"><trans-title>Boosting Speech-to-Text software potential</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Biktimirov</surname><given-names>Andrey R.</given-names></name><name xml:lang="en"><surname>Biktimirov</surname><given-names>Andrey R.</given-names></name></name-alternatives><email>andybikt@yandex.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Gruzdev</surname><given-names>Dmitry Yu.</given-names></name><name xml:lang="en"><surname>Gruzdev</surname><given-names>Dmitry Yu.</given-names></name></name-alternatives><email>gru@inbox.ru</email><xref ref-type="aff" rid="aff1" /></contrib></contrib-group><aff id="aff1"><institution>Military University, Russia</institution></aff><pub-date pub-type="epub"><year>2022</year></pub-date><volume>8</volume><issue>4</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2022/4/Лингвистика_8_4_2022_72-89.pdf" /><abstract xml:lang="ru"><p>The article focuses 
on ways of boosting the efficiency and accuracy of Speech-to-Text (STT) input. The effort is prompted by the growing popularity of such software among professional translators, in line with the general trend of abandoning typing in favor of speech-to-text applications. Arguing that the effectiveness of such programs is contingent on their accuracy, the researchers analyze the major factors, both linguistic and technical, affecting the quality of computer-assisted speech transcription. An experiment then puts the hypothesis to the test. Drawing on numerical and performance data, and on a breakdown of errors into categories intended to trace their origins, the study examines various approaches to dictation in combination with several hardware options and configurations. These pave the way for recommendations on improving STT performance with the Dragon software. The authors conclude that STT accuracy can be boosted to 99 percent by adjusting the program profile to the phonetic features of the speaker, including their accent, adding the most complex and rare vocabulary to the dictionary beforehand, and fine-tuning the input hardware. Other noteworthy results include ways to overcome the most difficult transcription challenges, e.g. proper names, place names and abbreviations.</p></abstract><trans-abstract xml:lang="en"><p>The article focuses on ways of boosting the efficiency and accuracy of Speech-to-Text (STT) input. The effort is prompted by the growing popularity of such software among professional translators, in line with the general trend of abandoning typing in favor of speech-to-text applications. Arguing that the effectiveness of such programs is contingent on their accuracy, the researchers analyze the major factors, both linguistic and technical, affecting the quality of computer-assisted speech transcription. 
An experiment then puts the hypothesis to the test. Drawing on numerical and performance data, and on a breakdown of errors into categories intended to trace their origins, the study examines various approaches to dictation in combination with several hardware options and configurations. These pave the way for recommendations on improving STT performance with the Dragon software. The authors conclude that STT accuracy can be boosted to 99 percent by adjusting the program profile to the phonetic features of the speaker, including their accent, adding the most complex and rare vocabulary to the dictionary beforehand, and fine-tuning the input hardware. Other noteworthy results include ways to overcome the most difficult transcription challenges, e.g. proper names, place names and abbreviations.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>Transcribing</kwd><kwd>Voice recognition</kwd><kwd>STT software</kwd><kwd>Dictation efficiency</kwd><kwd>Voice properties</kwd><kwd>Phonetic properties</kwd></kwd-group><kwd-group xml:lang="en"><kwd>Transcribing</kwd><kwd>Voice recognition</kwd><kwd>STT software</kwd><kwd>Dictation efficiency</kwd><kwd>Voice properties</kwd><kwd>Phonetic properties</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="B1"><mixed-citation>Belitskaya, A. (2014). Roles of hesitation pauses in spontaneous speech, Philology and literary studies, 2, available at: https://philology.snauka.ru/2014/02/698 (Accessed 20 October 2022). (In Russian)</mixed-citation></ref><ref id="B2"><mixed-citation>Brucal, S. G. E. et al. (2021). Filipino speech to text system using Convolutional Neural Network, Fifth World Conference on Smart Trends in Systems Security and Sustainability (WorldS4), 176-181. 
https://doi.org/10.1109/WorldS451998.2021.9513991 (In English)</mixed-citation></ref><ref id="B3"><mixed-citation>Chistova, S. (2021). Abbreviation in the Russian, English and German discourse of pop music, Research Result. Theoretical and Applied Linguistics, 7 (1), 92-115. https://doi.org/10.18413/2313-8912-2021-7-1-0-8 (In Russian)</mixed-citation></ref><ref id="B4"><mixed-citation>Cornaggia-Urrigshardt, A., Gökgöz, F., Kurth, F., Schmitz, H. and Wilkinghoff, K. (2022). Speech Recognition Lab, Procedia Computer Science, 205, 218–228. https://doi.org/10.1016/j.procs.2022.09.023 (In English)</mixed-citation></ref><ref id="B5"><mixed-citation>Deng, L. et al. (2013). Recent advances in deep learning for speech research at Microsoft, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 8604–8608. https://doi.org/10.1109/ICASSP.2013.6639345 (In English)</mixed-citation></ref><ref id="B6"><mixed-citation>Gruzdev, D. and Biktimirov, A. (2022). Written translation via sight translation, Moscow University Translation Studies Bulletin, 1, 7-26. (In Russian)</mixed-citation></ref><ref id="B7"><mixed-citation>Gruzdev, D., Gruzdeva, L. and Makarenko, A. (2019). Sight translation coupled with voice recognition as a key to faster and easier translation, Bashkir University Bulletin, 24 (2), 430-438. (In Russian)</mixed-citation></ref><ref id="B8"><mixed-citation>Jorge, J., Giménez, A., Baquero-Arnal, P., Iranzo-Sánchez, J., Pérez, A., Garcés Díaz-Munío, G. V., Silvestre-Cerdà, J. A., Civera, J., Sanchis, A. and Juan, A. (2021). 
MLLP-VRAIN Spanish ASR Systems for the Albayzin-RTVE 2020 Speech-To-Text Challenge, Proceedings of the 5th International Conference “IberSPEECH 2021”, Valladolid, Spain, 118-122. https://doi.org/10.21437/IberSPEECH.2021-25 (In English)</mixed-citation></ref><ref id="B9"><mixed-citation>Kolář, J. and Lamel, L. (2012). Development and evaluation of automatic punctuation for French and English speech-to-text, Proceedings of the 13th Annual Conference of the International Speech Communication Association “Interspeech 2012”, Portland, Oregon, USA, 1376-1379. (In English)</mixed-citation></ref><ref id="B10"><mixed-citation>Kumar, R., Gupta, M. and Sapra, S. R. (2021). Speech to text community application using natural language processing, 5th International Conference on Information Systems and Computer Networks (ISCON), 1-6. https://doi.org/10.1109/ISCON52037.2021.9702428 (In English)</mixed-citation></ref><ref id="B11"><mixed-citation>Kurzekar, P., Deshmukh, R., Waghmare, V. and Shrishrimal, P. (2014). A comparative study of feature extraction techniques for speech recognition system, International Journal of Innovative Research in Science, Engineering and Technology, 3 (12), 18006-18016. https://doi.org/10.15680/IJIRSET.2014.0312034 (In English)</mixed-citation></ref><ref id="B12"><mixed-citation>Kuzmin, A. and Ivanov, S. (2021). Speech to text system for noisy and quiet speech, Journal of Physics: Conference Series, 2096, 012071. https://doi.org/10.1088/1742-6596/2096/1/012071 (In English)</mixed-citation></ref><ref id="B13"><mixed-citation>Ma, Y., Nguyen, T. H. and Ma, B. (2022). CPT: cross-modal prefix-tuning for speech-to-text translation, ICASSP 2022 – 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6217-6221. 
https://doi.org/10.1109/ICASSP43922.2022.9746935 (In English)</mixed-citation></ref><ref id="B14"><mixed-citation>Messaoudi, A., Haddad, H., Fourati, C., Hmida, M. B., Elhaj Mabrouk, A. B. and Graiet, M. (2021). Tunisian Dialectal End-to-end Speech Recognition based on DeepSpeech, Procedia Computer Science, 189, 183–190. https://doi.org/10.1016/j.procs.2021.05.082 (In English)</mixed-citation></ref><ref id="B15"><mixed-citation>Nugraha, D. S. and Dewanti, R. (2022). English-Indonesian crisis translation: accuracy and adequacy of Covid-19 terms translated by three MT tools, Research Result. Theoretical and Applied Linguistics, 8 (1), 122-134. https://doi.org/10.18413/2313-8912-2022-8-1-0-8 (In English)</mixed-citation></ref><ref id="B16"><mixed-citation>Ogunshile, E., Phung, K. and Ramachandran, R. (2021). Exploring a web-based application to convert Tamil and Vietnamese speech to text without the effect of code-switching and code-mixing, Programming and Computer Software, 47, 757–764. https://doi.org/10.1134/S036176882108020X (In English)</mixed-citation></ref><ref id="B17"><mixed-citation>Perero-Codosero, J. M., Espinoza-Cuadros, F. M. and Hernández-Gómez, L. A. (2022). A comparison of hybrid and end-to-end ASR systems for the IberSpeech-RTVE 2020 speech-to-text transcription challenge, Applied Sciences, 12 (2), 903. https://doi.org/10.3390/app12020903 (In English)</mixed-citation></ref><ref id="B18"><mixed-citation>Pernarčić, M. (2019). Testing the efficiency of voice recognition software in translation, Master's thesis, Strossmayer University, Croatia. (In English)</mixed-citation></ref><ref id="B19"><mixed-citation>Stubna, P. (2020). 
Beyond «Listen and Repeat»: Investigating English Pronunciation Instruction at the Upper Secondary School Level in Slovakia by R. Metruk: A Book Review, Journal of Language and Education, 6 (4), 216-220. https://doi.org/10.17323/jle.2020.10919 (In English)</mixed-citation></ref><ref id="B20"><mixed-citation>Trabelsi, A., Warichet, S., Aajaoun, Y. and Soussilane, S. (2022). Evaluation of the efficiency of state-of-the-art Speech Recognition engines, Procedia Computer Science, 207, 2242-2252. https://doi.org/10.1016/j.procs.2022.09.534 (In English)</mixed-citation></ref></ref-list></back></article>