DOI: 10.18413/2313-8912-2022-8-4-0-6

Boosting Speech-to-Text software potential

Andrey R. Biktimirov (Military University, Russia)
Dmitry Yu. Gruzdev (Military University, Russia)

The article looks for ways to boost the efficiency and accuracy of Speech-to-Text (STT) input. The study is motivated by the growing popularity of such software among professional translators, in line with the general trend of abandoning typing in favor of speech-to-text applications. Arguing that the effectiveness of such programs depends on their accuracy, the researchers analyze the major linguistic and technical factors that affect the quality of computer-assisted speech transcription. The analysis leads to an experiment that puts this hypothesis to the test. Drawing on numerical and performance data, and on errors broken down into categories to trace their origins, the experiment examines various approaches to dictation in combination with several hardware options and configurations. The findings pave the way for recommendations, based on the Dragon software, for improving STT performance. The authors conclude that STT accuracy can be raised to as much as 99 percent by adjusting the program profile to the speaker's phonetic features, with due consideration of accent, by adding the most complex and rare vocabulary to the dictionary beforehand, and by fine-tuning the input hardware. Other noteworthy results include ways to overcome the most difficult transcription challenges, such as proper names, place names, and abbreviations.

Keywords: Transcribing, Voice recognition, STT software, Dictation efficiency, Voice properties, Phonetic properties.

Information for citation:

Biktimirov, A. R. and Gruzdev, D. Yu. (2022). Boosting Speech-to-Text software potential, Research Result. Theoretical and Applied Linguistics, 8 (4), 72-89. DOI: 10.18413/2313-8912-2022-8-4-0-6

Research Result. Theoretical and Applied Linguistics is included in the scientific database of the RINTs (license agreement No. 765-12/2014 dated 08.12.2014).

The journal is included in the list of peer-reviewed scientific publications recommended by the Higher Attestation Commission (VAK).

Research Result. Theoretical and Applied Linguistics (ISSN 2313-8912)

The journal materials and website are licensed under Creative Commons «Attribution» 4.0 International.

The Founder: Federal State Autonomous Educational Institution of Higher Education "Belgorod National Research University". The Founder’s address: 85 Pobedy Street, Belgorod, the Belgorod region, 308015, Russia

The Publisher: Federal State Autonomous Educational Institution of Higher Education "Belgorod National Research University". The Founder’s address: 85 Pobedy Street, Belgorod, the Belgorod region, 308015, Russia

Editors Office: chief editor Olga Dekhnich, e-mail: RR_Linguistics@bsuedu.ru, phone: (4722) 301254.

Registered by the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor)

Charter of the editorial board of the mass media "Research Result. Theoretical and Applied Linguistics"

Order No. 636-OD dated 30.06.2023 "On approval of the Charters of the editorial boards of the mass media of scientific journals of Belgorod State National Research University"

Order No. 1097-OD from 15.11.2023 "On approval of the Regulations for the publication of scientific journals of Belgorod State National Research University"
