<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Research Result. Theoretical and Applied Linguistics</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2022-8-4-0-6</article-id><article-id pub-id-type="publisher-id">2974</article-id><article-categories><subj-group subj-group-type="heading"><subject>APPLIED LINGUISTICS</subject></subj-group></article-categories><title-group><article-title>Boosting Speech-to-Text software potential</article-title><trans-title-group xml:lang="en"><trans-title>Boosting Speech-to-Text software potential</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Biktimirov</surname><given-names>Andrey R.</given-names></name><name xml:lang="en"><surname>Biktimirov</surname><given-names>Andrey R.</given-names></name></name-alternatives><email>andybikt@yandex.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Gruzdev</surname><given-names>Dmitry Yu.</given-names></name><name xml:lang="en"><surname>Gruzdev</surname><given-names>Dmitry Yu.</given-names></name></name-alternatives><email>gru@inbox.ru</email><xref ref-type="aff" rid="aff1" /></contrib></contrib-group><aff id="aff1"><institution>Military University, Russia</institution></aff><pub-date pub-type="epub"><year>2022</year></pub-date><volume>8</volume><issue>4</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2022/4/Лингвистика_8_4_2022_72-89.pdf" /><abstract xml:lang="ru"><p>The article focuses 
on ways of boosting the efficiency and accuracy of Speech-to-Text (STT) input. The effort is prompted by the growing popularity of such software among professional translators, in line with the general trend of abandoning typing in favor of speech-to-text applications. Arguing that the effectiveness of such programs is contingent on their accuracy, the researchers analyze the major factors, both linguistic and technical, affecting the quality of computer-assisted speech transcription. An experiment then puts the hypothesis to the test. Drawing on numerical and performance data, and on a breakdown of errors into categories intended to trace their origins, the study examines various approaches to dictation in combination with several hardware options and configurations. These pave the way for recommendations on improving STT performance with the Dragon software. The authors conclude that STT accuracy can be boosted to 99 percent by adjusting the program profile to the phonetic features of the speaker, including their accent, adding the most complex and rare vocabulary to the dictionary beforehand, and fine-tuning the input hardware. Other noteworthy results include ways to overcome the most difficult transcription challenges, e.g. proper names, place names and abbreviations.</p></abstract><trans-abstract xml:lang="en"><p>The article focuses on ways of boosting the efficiency and accuracy of Speech-to-Text (STT) input. The effort is prompted by the growing popularity of such software among professional translators, in line with the general trend of abandoning typing in favor of speech-to-text applications. Arguing that the effectiveness of such programs is contingent on their accuracy, the researchers analyze the major factors, both linguistic and technical, affecting the quality of computer-assisted speech transcription. 
An experiment then puts the hypothesis to the test. Drawing on numerical and performance data, and on a breakdown of errors into categories intended to trace their origins, the study examines various approaches to dictation in combination with several hardware options and configurations. These pave the way for recommendations on improving STT performance with the Dragon software. The authors conclude that STT accuracy can be boosted to 99 percent by adjusting the program profile to the phonetic features of the speaker, including their accent, adding the most complex and rare vocabulary to the dictionary beforehand, and fine-tuning the input hardware. Other noteworthy results include ways to overcome the most difficult transcription challenges, e.g. proper names, place names and abbreviations.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>Transcribing</kwd><kwd>Voice recognition</kwd><kwd>STT software</kwd><kwd>Dictation efficiency</kwd><kwd>Voice properties</kwd><kwd>Phonetic properties</kwd></kwd-group><kwd-group xml:lang="en"><kwd>Transcribing</kwd><kwd>Voice recognition</kwd><kwd>STT software</kwd><kwd>Dictation efficiency</kwd><kwd>Voice properties</kwd><kwd>Phonetic properties</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="B1"><mixed-citation>Belitskaya, A. (2014). Roles of hesitation pauses in spontaneous speech, Philology and literary studies, 2, available at: https://philology.snauka.ru/2014/02/698 (Accessed 20 October 2022). (In Russian)</mixed-citation></ref><ref id="B2"><mixed-citation>Brucal, S. G. E. et al. (2021). Filipino speech to text system using Convolutional Neural Network, Fifth World Conference on Smart Trends in Systems Security and Sustainability (WorldS4), 176-181. 
https://doi.org/10.1109/WorldS451998.2021.9513991 (In English)</mixed-citation></ref><ref id="B3"><mixed-citation>Chistova, S. (2021). Abbreviation in the Russian, English and German discourse of pop music, Research Result. Theoretical and Applied Linguistics, 7 (1), 92-115. https://doi.org/10.18413/2313-8912-2021-7-1-0-8 (In Russian)</mixed-citation></ref><ref id="B4"><mixed-citation>Cornaggia-Urrigshardt, A., Gökgöz, F., Kurth, F., Schmitz, H. and Wilkinghoff, K. (2022). Speech Recognition Lab, Procedia Computer Science, 205, 218–228. https://doi.org/10.1016/j.procs.2022.09.023 (In English)</mixed-citation></ref><ref id="B5"><mixed-citation>Deng, L. et al. (2013). Recent advances in deep learning for speech research at Microsoft, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 8604–8608. https://doi.org/10.1109/ICASSP.2013.6639345 (In English)</mixed-citation></ref><ref id="B6"><mixed-citation>Gruzdev, D. and Biktimirov, A. (2022). Written translation via sight translation, Moscow University Translation Studies Bulletin, 1, 7-26. (In Russian)</mixed-citation></ref><ref id="B7"><mixed-citation>Gruzdev, D., Gruzdeva, L. and Makarenko, A. (2019). Sight translation coupled with voice recognition as a key to faster and easier translation, Bashkir University Bulletin, 24 (2), 430-438. (In Russian)</mixed-citation></ref><ref id="B8"><mixed-citation>Jorge, J., Giménez, A., Baquero-Arnal, P., Iranzo-Sánchez, J., Pérez, A., Garcés Díaz-Munío, G. V., Silvestre-Cerdà, J. A., Civera, J., Sanchis, A. and Juan, A. (2021). 
MLLP-VRAIN Spanish ASR Systems for the Albayzin-RTVE 2020 Speech-To-Text Challenge, Proceedings of the 5th International Conference “IberSPEECH 2021”, Valladolid, Spain, 118-122. https://doi.org/10.21437/IberSPEECH.2021-25 (In English)</mixed-citation></ref><ref id="B9"><mixed-citation>Kolář, J. and Lamel, L. (2012). Development and evaluation of automatic punctuation for French and English speech-to-text, Proceedings of the 13th Annual Conference of the International Speech Communication Association “Interspeech 2012”, Portland, Oregon, USA, 1376-1379. (In English)</mixed-citation></ref><ref id="B10"><mixed-citation>Kumar, R., Gupta, M. and Sapra, S. R. (2021). Speech to text community application using natural language processing, 5th International Conference on Information Systems and Computer Networks (ISCON), 1-6. https://doi.org/10.1109/ISCON52037.2021.9702428 (In English)</mixed-citation></ref><ref id="B11"><mixed-citation>Kurzekar, P., Deshmukh, R., Waghmare, V. and Shrishrimal, P. (2014). A comparative study of feature extraction techniques for speech recognition system, International Journal of Innovative Research in Science, Engineering and Technology, 3 (12), 18006-18016. https://doi.org/10.15680/IJIRSET.2014.0312034 (In English)</mixed-citation></ref><ref id="B12"><mixed-citation>Kuzmin, A. and Ivanov, S. (2021). Speech to text system for noisy and quiet speech, Journal of Physics: Conference Series, 2096, 012071. https://doi.org/10.1088/1742-6596/2096/1/012071 (In English)</mixed-citation></ref><ref id="B13"><mixed-citation>Ma, Y., Nguyen, T. H. and Ma, B. (2022). CPT: cross-modal prefix-tuning for speech-to-text translation, ICASSP 2022 – 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6217-6221. 
https://doi.org/10.1109/ICASSP43922.2022.9746935 (In English)</mixed-citation></ref><ref id="B14"><mixed-citation>Messaoudi, A., Haddad, H., Fourati, C., Hmida, M. B., Elhaj Mabrouk, A. B. and Graiet, M. (2021). Tunisian Dialectal End-to-end Speech Recognition based on DeepSpeech, Procedia Computer Science, 189, 183–190. https://doi.org/10.1016/j.procs.2021.05.082 (In English)</mixed-citation></ref><ref id="B15"><mixed-citation>Nugraha, D. S. and Dewanti, R. (2022). English-Indonesian crisis translation: accuracy and adequacy of Covid-19 terms translated by three MT tools, Research Result. Theoretical and Applied Linguistics, 8 (1), 122-134. https://doi.org/10.18413/2313-8912-2022-8-1-0-8 (In English)</mixed-citation></ref><ref id="B16"><mixed-citation>Ogunshile, E., Phung, K. and Ramachandran, R. (2021). Exploring a web-based application to convert Tamil and Vietnamese speech to text without the effect of code-switching and code-mixing, Programming and Computer Software, 47, 757–764. https://doi.org/10.1134/S036176882108020X (In English)</mixed-citation></ref><ref id="B17"><mixed-citation>Perero-Codosero, J. M., Espinoza-Cuadros, F. M. and Hernández-Gómez, L. A. (2022). A comparison of hybrid and end-to-end ASR systems for the IberSpeech-RTVE 2020 speech-to-text transcription challenge, Applied Sciences, 12 (2), 903. https://doi.org/10.3390/app12020903 (In English)</mixed-citation></ref><ref id="B18"><mixed-citation>Pernarčić, M. (2019). Testing the efficiency of voice recognition software in translation, Master's thesis, Strossmayer University, Croatia. (In English)</mixed-citation></ref><ref id="B19"><mixed-citation>Stubna, P. (2020). 
Beyond «Listen and Repeat»: Investigating English Pronunciation Instruction at the Upper Secondary School Level in Slovakia by R. Metruk: A Book Review, Journal of Language and Education, 6 (4), 216-220. https://doi.org/10.17323/jle.2020.10919 (In English)</mixed-citation></ref><ref id="B20"><mixed-citation>Trabelsi, A., Warichet, S., Aajaoun, Y. and Soussilane, S. (2022). Evaluation of the efficiency of state-of-the-art Speech Recognition engines, Procedia Computer Science, 207, 2242-2252. https://doi.org/10.1016/j.procs.2022.09.534 (In English)</mixed-citation></ref></ref-list></back></article>