<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Research Result. Theoretical and Applied Linguistics</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2023-9-1-1-1</article-id><article-id pub-id-type="publisher-id">3062</article-id><article-categories><subj-group subj-group-type="heading"><subject>NEURAL NETWORKS IN NATURAL LANGUAGE PROCESSING</subject></subj-group></article-categories><title-group><article-title>A deep learning method based on language models for processing natural language Russian commands in human robot interaction</article-title><trans-title-group xml:lang="en"><trans-title>A deep learning method based on language models for processing natural language Russian commands in human robot interaction</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Sboev</surname><given-names>Alexander G.</given-names></name><name xml:lang="en"><surname>Sboev</surname><given-names>Alexander G.</given-names></name></name-alternatives><email>Sboev_AG@nrcki.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Gryaznov</surname><given-names>Artem V.</given-names></name><name xml:lang="en"><surname>Gryaznov</surname><given-names>Artem V.</given-names></name></name-alternatives><email>Gryaznov_AV@nrcki.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name 
xml:lang="ru"><surname>Rybka</surname><given-names>Roman B.</given-names></name><name xml:lang="en"><surname>Rybka</surname><given-names>Roman B.</given-names></name></name-alternatives><email>Rybka_RB@nrcki.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Skorokhodov</surname><given-names>Maxim S.</given-names></name><name xml:lang="en"><surname>Skorokhodov</surname><given-names>Maxim S.</given-names></name></name-alternatives><email>Skorokhodov_MS@nrcki.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Moloshnikov</surname><given-names>Ivan A.</given-names></name><name xml:lang="en"><surname>Moloshnikov</surname><given-names>Ivan A.</given-names></name></name-alternatives><email>Moloshnikov_IA@nrcki.ru</email><xref ref-type="aff" rid="aff1" /></contrib></contrib-group><aff id="aff1"><institution>Kurchatov Institute National Research Center, Russia</institution></aff><pub-date pub-type="epub"><year>2023</year></pub-date><volume>9</volume><issue>1</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2023/1/Лингвистика_9_1_2023-174-191.pdf" /><abstract xml:lang="ru"><p>The development of high-performance human-machine interface systems for controlling robotic platforms by natural language is a relevant task in the interdisciplinary field of Human-Robot Interaction. In particular, it is in demand when the robotic platform is controlled by an operator without the skills necessary to use specialized control tools. The paper describes the processing of complex Russian-language commands into a formalized RDF graph format for controlling a robotic platform. In this processing, neural network models are applied sequentially to find and replace pronouns in commands, restore missing action verbs, decompose a complex command with several actions into simple commands with a single action each, and classify the attributes of simple commands. State-of-the-art solutions, namely language models based on the deep transformer neural network architecture, serve as the neural network models in this work. Our previous papers present synthetic datasets based on a developed generator of Russian-language text commands, data collected with crowdsourcing technologies, and data from open sources for each of the described processing stages. These datasets were used to fine-tune the neural network language models. In this work, the resulting fine-tuned language models are integrated into the interface. The impact of the pronoun search-and-replace stage on the efficiency of command conversion is evaluated. Using the virtual three-dimensional robotic platform simulator created at the National Research Center «Kurchatov Institute», the high efficiency of complex Russian-language command processing as part of a human-machine interface system is demonstrated.</p></abstract><trans-abstract xml:lang="en"><p>The development of high-performance human-machine interface systems for controlling robotic platforms by natural language is a relevant task in the interdisciplinary field of Human-Robot Interaction. In particular, it is in demand when the robotic platform is controlled by an operator without the skills necessary to use specialized control tools. The paper describes the processing of complex Russian-language commands into a formalized RDF graph format for controlling a robotic platform. In this processing, neural network models are applied sequentially to find and replace pronouns in commands, restore missing action verbs, decompose a complex command with several actions into simple commands with a single action each, and classify the attributes of simple commands. State-of-the-art solutions, namely language models based on the deep transformer neural network architecture, serve as the neural network models in this work. Our previous papers present synthetic datasets based on a developed generator of Russian-language text commands, data collected with crowdsourcing technologies, and data from open sources for each of the described processing stages. These datasets were used to fine-tune the neural network language models. In this work, the resulting fine-tuned language models are integrated into the interface. The impact of the pronoun search-and-replace stage on the efficiency of command conversion is evaluated. Using the virtual three-dimensional robotic platform simulator created at the National Research Center «Kurchatov Institute», the high efficiency of complex Russian-language command processing as part of a human-machine interface system is demonstrated.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>Human-robot interaction</kwd><kwd>Natural language processing</kwd><kwd>Deep learning</kwd><kwd>Artificial intelligence</kwd><kwd>Human-robot interface</kwd></kwd-group><kwd-group xml:lang="en"><kwd>Human-robot interaction</kwd><kwd>Natural language processing</kwd><kwd>Deep learning</kwd><kwd>Artificial intelligence</kwd><kwd>Human-robot interface</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="B1"><mixed-citation>Abadi, M. et al. (2016). Tensorflow: A system for large-scale machine learning, OSDI'16: Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation, 265-283. 
(In English)</mixed-citation></ref><ref id="B2"><mixed-citation>Ahn, M. et al. (2022). Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, arXiv preprint arXiv: 2204.01691. https://doi.org/10.48550/arXiv.2204.01691 (In English)</mixed-citation></ref><ref id="B3"><mixed-citation>Artetxe, M. and Schwenk, H. (2019). Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond, Transactions of the Association for Computational Linguistics, 7, 597-610. https://doi.org/10.1162/tacl_a_00288 (In English)</mixed-citation></ref><ref id="B4"><mixed-citation>Belkin, I. (2019). BERT finetuning and graph modeling for gapping resolution, Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2019”, 63-71. (In English)</mixed-citation></ref><ref id="B5"><mixed-citation>Budnikov, E. A., Toldova, S. Yu., Zvereva, D. S., Maximova, D. M. and Ionov, M. I. (2019). Ru-eval-2019: Evaluating anaphora and coreference resolution for Russian, Dialogue Evaluation, available at: https://www.dialog-21.ru/media/4689/budnikovzverevamaximova2019evaluatinganaphoracoreferenceresolution.pdf (Accessed 10 October 2022). (In English)</mixed-citation></ref><ref id="B6"><mixed-citation>Cer, D. et al. (2018). Universal sentence encoder, arXiv preprint arXiv: 1803.11175. https://doi.org/10.48550/arXiv.1803.11175 (In English)</mixed-citation></ref><ref id="B7"><mixed-citation>Chaplot, D. S., Gandhi, D., Gupta, A. and Salakhutdinov, R. (2020). Object Goal Navigation using Goal-Oriented Semantic Exploration, arXiv preprint arXiv: 2007.00643. https://doi.org/10.48550/arXiv.2007.00643 (In English)</mixed-citation></ref><ref id="B8"><mixed-citation>Choi, D. and Langley, P. (2018). 
Evolution of the Icarus Cognitive Architecture, Cognitive Systems Research, 25-38. https://doi.org/10.1016/j.cogsys.2017.05.005 (In English)</mixed-citation></ref><ref id="B9"><mixed-citation>Choi, D., Shi, W., Liang, Y. S., Yeo, K. H. and Kim, J-J. (2021). Controlling Industrial Robots with High-Level Verbal Commands, International Conference on Social Robotics (ICSR 2021), Social Robotics, 216-226. https://doi.org/10.1007/978-3-030-90525-5_19 (In English)</mixed-citation></ref><ref id="B10"><mixed-citation>Chowdhery, A. et al. (2022). PaLM: Scaling Language Modeling with Pathways, arXiv preprint arXiv: 2204.02311. https://doi.org/10.48550/arXiv.2204.02311 (In English)</mixed-citation></ref><ref id="B11"><mixed-citation>Devlin, J., Chang, M-W., Lee, K. and Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, arXiv preprint arXiv: 1810.04805. https://doi.org/10.48550/arXiv.1810.04805 (In English)</mixed-citation></ref><ref id="B12"><mixed-citation>Feng, F., Yang, Y., Cer, D., Arivazhagan, N. and Wang, W. (2022). Language-agnostic bert sentence embedding, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 878-891. http://dx.doi.org/10.18653/v1/2022.acl-long.62 (In English)</mixed-citation></ref><ref id="B13"><mixed-citation>Gubbi, S. V., Upadrashta, R. and Amrutur, B. (2020). Translating Natural Language Instructions to Computer Programs for Robot Manipulation, arXiv preprint arXiv: 2012.13695. https://doi.org/10.48550/arXiv.2012.13695 (In English)</mixed-citation></ref><ref id="B14"><mixed-citation>He, K., Gkioxari, G., Dollár, P. and Girshick, R. B. (2017). Mask R-CNN, arXiv preprint arXiv: 1703.06870. https://doi.org/10.48550/arXiv.1703.06870 
(In English)</mixed-citation></ref><ref id="B15"><mixed-citation>Hochreiter, S. and Schmidhuber, J. (1997). Long Short-term Memory, Neural computation, 9 (8), 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735 (In English)</mixed-citation></ref><ref id="B16"><mixed-citation>Joshi, M., Levy, O., Zettlemoyer, L. and Weld, D. (2019). BERT for Coreference Resolution: Baselines and Analysis, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 5803-5808. http://dx.doi.org/10.18653/v1/D19-1588 (In English)</mixed-citation></ref><ref id="B17"><mixed-citation>Koenig, N. and Howard, A. (2004). Design and use paradigms for Gazebo, an open-source multi-robot simulator, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sendai, Japan, (3), 2149-2154. https://doi.org/10.1109/IROS.2004.1389727 (In English)</mixed-citation></ref><ref id="B18"><mixed-citation>Korobov, M. (2015). Morphological Analyzer and Generator for Russian and Ukrainian Languages, Analysis of Images, Social Networks and Texts, 320-332. https://doi.org/10.1007/978-3-319-26123-2_31 (In English)</mixed-citation></ref><ref id="B19"><mixed-citation>Kuratov, Y. and Arkhipov, M. (2019). Adaptation of deep bidirectional multilingual transformers for Russian language, arXiv preprint arXiv: 1905.07213. https://doi.org/10.48550/arXiv.1905.07213 (In English)</mixed-citation></ref><ref id="B20"><mixed-citation>McBride, B. (2004). The Resource Description Framework (RDF) and its Vocabulary Description Language RDFS, in Staab, S. and Studer, R. (eds.), Handbook on Ontologies. International Handbooks on Information Systems, Springer, Berlin, Heidelberg, Germany, 51-65. 
https://doi.org/10.1007/978-3-540-24750-0_3 (In English)</mixed-citation></ref><ref id="B21"><mixed-citation>Min, S. Y., Chaplot, D. S., Ravikumar, P., Bisk, Y. and Salakhutdinov, R. (2021). FILM: Following Instructions in Language with Modular Methods, arXiv preprint arXiv: 2110.07342. https://doi.org/10.48550/arXiv.2110.07342 (In English)</mixed-citation></ref><ref id="B22"><mixed-citation>Quigley, M., Conley, K., Gerkey, B. P., Faust, J., Foote, T., Leibs, J., Wheeler, R. and Ng, A. Y. (2009). ROS: an open-source Robot Operating System, Workshops at the IEEE International Conference on Robotics and Automation. (In English)</mixed-citation></ref><ref id="B23"><mixed-citation>Radford, A., Wu, J., Child, R., Luan, D., Amodei, D. and Sutskever, I. (2019). Language Models Are Unsupervised Multitask Learners, OpenAI. (In English)</mixed-citation></ref><ref id="B24"><mixed-citation>Raffel, C., Shazeer, N. and Roberts, A. (2019). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, arXiv preprint arXiv: 1910.10683. https://doi.org/10.48550/arXiv.1910.10683 (In English)</mixed-citation></ref><ref id="B25"><mixed-citation>Sboev, A. G., Gryaznov, A. V., Rybka, R. B., Skorokhodov, M. S. and Moloshnikov, I. A. (2022). Neural network interface for converting complex Russian-language text commands into a formalized graph form for controlling robotic devices, Vestnik Natsional'nogo Issledovatel'skogo Yadernogo Universiteta MIPHI, 11 (2), 153-163. https://doi.org/10.56304/S2304487X22020092 (In Russian)</mixed-citation></ref><ref id="B26"><mixed-citation>Sboev, A., Rybka, R. 
and Gryaznov, A. (2020). Deep Neural Networks Ensemble with Word Vector Representation Models to Resolve Coreference Resolution in Russian, Advanced Technologies in Robotics and Intelligent Systems, 34-35. https://doi.org/10.1007/978-3-030-33491-8_4 (In English)</mixed-citation></ref><ref id="B27"><mixed-citation>Smurov, I. M., Ponomareva, M., Shavrina, T. O. and Droganova, K. (2019). Agrr-2019: Automatic gapping resolution for Russian, Computational Linguistics and Intellectual Technologies, 561-575. (In English)</mixed-citation></ref><ref id="B28"><mixed-citation>Van Rossum, G. and Drake, F. L. (2009). Python 3 Reference Manual, CreateSpace, Scotts Valley, CA. (In English)</mixed-citation></ref><ref id="B29"><mixed-citation>Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. and Polosukhin, I. (2017). Attention Is All You Need, arXiv preprint arXiv: 1706.03762. https://doi.org/10.48550/arXiv.1706.03762 (In English)</mixed-citation></ref><ref id="B30"><mixed-citation>Williams, A., Nangia, N. and Bowman, S. R. (2017). A broad-coverage challenge corpus for sentence understanding through inference, arXiv preprint arXiv: 1704.05426. https://doi.org/10.48550/arXiv.1704.05426 (In English)</mixed-citation></ref><ref id="B31"><mixed-citation>Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A. and Raffel, C. (2020). mT5: A massively multilingual pre-trained text-to-text transformer, arXiv preprint arXiv: 2010.11934. https://doi.org/10.48550/arXiv.2010.11934 (In English)</mixed-citation></ref></ref-list></back></article>