<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Research Result. Theoretical and Applied Linguistics</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2026-12-1-0-4</article-id><article-id pub-id-type="publisher-id">4105</article-id><article-categories><subj-group subj-group-type="heading"><subject>APPLIED LINGUISTICS</subject></subj-group></article-categories><title-group><article-title>Automatic keyphrase extraction and annotation: modern theoretical approaches and practical solutions for text and speech</article-title><trans-title-group xml:lang="en"><trans-title>Automatic keyphrase extraction and annotation: modern theoretical approaches and practical solutions for text and speech</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Guseva</surname><given-names>Daria D.</given-names></name><name xml:lang="en"><surname>Guseva</surname><given-names>Daria D.</given-names></name></name-alternatives><email>daria.guseva@spbu.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Mitrofanova</surname><given-names>Olga A.</given-names></name><name xml:lang="en"><surname>Mitrofanova</surname><given-names>Olga A.</given-names></name></name-alternatives><email>o.mitrofanova@spbu.ru</email><xref ref-type="aff" rid="aff1" /></contrib></contrib-group><aff id="aff1"><institution>Saint-Petersburg State University, Saint-Petersburg, Russia</institution></aff><pub-date
pub-type="epub"><year>2026</year></pub-date><volume>12</volume><issue>1</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2026/1/Лингвистика_12_1-90-122.pdf" /><abstract xml:lang="ru"><p>The exponential growth of textual and audiovisual information has made the task of automatic keyphrase extraction (KE) increasingly significant. This article provides a comprehensive analysis of contemporary theoretical approaches and practical solutions for KE across both text and speech modalities. The primary contribution of this work is its systematic synthesis of these often-disparate research strands into a unified analytical framework, highlighting the evolution of the field from statistical methods towards large language models (LLMs) and end-to-end speech processing. We examine the stages of KE, the characteristics of keyphrases in written and spoken language, and terminological nuances. Various methods for automatic KE are discussed and analyzed in detail: statistical, hybrid, machine learning-based, and structural. The review dedicates substantial attention to emerging paradigms, including keyphrase generation using LLMs, and provides a detailed overview of methodologies and challenges in automatic corpus annotation. Furthermore, we specifically analyze current directions and inherent difficulties in KE for spoken language, comparing transcript-based and end-to-end acoustic approaches. This synthesis leads us to conclude that the field is moving towards a more integrated, context-aware paradigm.
Future progress will depend on addressing key challenges such as data scarcity for low-resource languages, effective multimodal fusion, and the nuanced evaluation of generative KE systems.</p></abstract><trans-abstract xml:lang="en"><p>The exponential growth of textual and audiovisual information has made the task of automatic keyphrase extraction (KE) increasingly significant. This article provides a comprehensive analysis of contemporary theoretical approaches and practical solutions for KE across both text and speech modalities. The primary contribution of this work is its systematic synthesis of these often-disparate research strands into a unified analytical framework, highlighting the evolution of the field from statistical methods towards large language models (LLMs) and end-to-end speech processing. We examine the stages of KE, the characteristics of keyphrases in written and spoken language, and terminological nuances. Various methods for automatic KE are discussed and analyzed in detail: statistical, hybrid, machine learning-based, and structural. The review dedicates substantial attention to emerging paradigms, including keyphrase generation using LLMs, and provides a detailed overview of methodologies and challenges in automatic corpus annotation. Furthermore, we specifically analyze current directions and inherent difficulties in KE for spoken language, comparing transcript-based and end-to-end acoustic approaches. This synthesis leads us to conclude that the field is moving towards a more integrated, context-aware paradigm.
Future progress will depend on addressing key challenges such as data scarcity for low-resource languages, effective multimodal fusion, and the nuanced evaluation of generative KE systems.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>Automatic keyphrase extraction</kwd><kwd>Spoken language processing</kwd><kwd>Speech summarization</kwd><kwd>Automatic annotation</kwd><kwd>Computational linguistics</kwd><kwd>Corpus linguistics</kwd></kwd-group><kwd-group xml:lang="en"><kwd>Automatic keyphrase extraction</kwd><kwd>Spoken language processing</kwd><kwd>Speech summarization</kwd><kwd>Automatic annotation</kwd><kwd>Computational linguistics</kwd><kwd>Corpus linguistics</kwd></kwd-group></article-meta></front><back><ack><p>This research was supported by Saint-Petersburg State University, project № 123042000068-8.</p></ack><ref-list><title>References</title><ref id="B1"><mixed-citation>Abramov, E. G. (2011). Selection of keywords for a scientific article, Nauchnaya periodika: problemy i resheniya, 2, 35&amp;ndash;40. (In Russian)</mixed-citation></ref><ref id="B2"><mixed-citation>Abrosimov, K. I. and Mosyagina, A. G. (2022). Sodner for Russian nested named entity recognition, Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference &amp;ldquo;Dialogue&amp;rdquo;, Moscow, Russia, June 15&amp;ndash;18, 2022, 1&amp;ndash;7. (In English)</mixed-citation></ref><ref id="B3"><mixed-citation>Antipina, E. S. and Prokhorenkova, S. A. (2020). Modeling of creative linguistic personality on the material of romances on A. S. Pushkin&amp;rsquo;s poem &amp;ldquo;Do not sing, beauty, in my presence&amp;hellip;&amp;rdquo;, Prepodavatel ХХI vek, 2&amp;nbsp;(2). (In Russian)</mixed-citation></ref><ref id="B4"><mixed-citation>Augenstein, I., Das, M., Riedel, S., Vikraman, L., and McCallum, A. (2017). SemEval 2017 Task 10: ScienceIE &amp;mdash; Extracting Keyphrases and Relations from Scientific Publications.
In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval&amp;ndash;2017), 546&amp;ndash;555, Vancouver, Canada. Association for Computational Linguistics. (In English)</mixed-citation></ref><ref id="B5"><mixed-citation>BN, S., Shing, H.-C., Xu, L., Strong, M., Burnsky, J., Ofor, J., Mason, J.R., Chen, S., Srinivasan, S., Shivade, C., Moriarty, J., Cohen, J.P. (2025). Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization, Proceedings Interspeech 2025, Rotterdam, The Netherlands, August 17&amp;ndash;21, 2025, 3070&amp;ndash;3074. (In English)</mixed-citation></ref><ref id="B6"><mixed-citation>Bolshakova, E. I., Klyshinskiy, E. S., Lande, D. V., Noskov, A. A., Peskova, O. V. and Yagunova, E. V. (2011). Avtomaticheskaya obrabotka tekstov na estestvennom jazyke i kompyuternaya lingvistika [Automatic processing of texts in natural language and computational linguistics: textbook], MIEM, Moscow, Russia. (In Russian)</mixed-citation></ref><ref id="B7"><mixed-citation>Boudin, F. and Aizawa, A. (2025). An Analysis of Datasets, Metrics and Models in Keyphrase Generation. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM&amp;sup2;), 973&amp;ndash;973, Vienna, Austria and virtual meeting. Association for Computational Linguistics. (In English)</mixed-citation></ref><ref id="B8"><mixed-citation>Brezina, V., McEnery, T. and Wattam, S. (2015). Collocations in context: A new perspective on collocation networks, International Journal of Corpus Linguistics, 20&amp;nbsp;(2). (In English)</mixed-citation></ref><ref id="B9"><mixed-citation>Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C. and Jatowt, A. (2020). YAKE! Keyword Extraction from Single Documents using Multiple Local Features, Information Sciences, 509, 257&amp;ndash;289. (In English)</mixed-citation></ref><ref id="B10"><mixed-citation>Chen, P. I. and Lin, S. J. (2010).
Automatic keyword prediction using Google similarity distance, Expert Systems with Applications, 37(3), 1928&amp;ndash;1938. (In English)</mixed-citation></ref><ref id="B11"><mixed-citation>Chen, W., Chan, H. P., Li, P. and King, I. (2020). Exclusive Hierarchical Decoding for Deep Keyphrase Generation, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, July 5, 2020, 1095&amp;ndash;1105. (In English)</mixed-citation></ref><ref id="B12"><mixed-citation>Chen, Y.-N., Huang, Y., Lee, H.-Y. and Lee, L.-S. (2012). Unsupervised two-stage keyword extraction from spoken documents by topic coherence and support vector machine, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, March 25&amp;ndash;30, 2012, 5041&amp;ndash;5044. (In English)</mixed-citation></ref><ref id="B13"><mixed-citation>Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics. (In English)</mixed-citation></ref><ref id="B14"><mixed-citation>Dostal, M. (2011). Automatic Keyphrase Extraction Based on NLP and Statistical Methods, Proceedings of the Dateso 2011: Annual International Workshop on Databases, Texts, Specifications and Objects, Pisek, Czech Republic, April 20, 2011, 140&amp;ndash;145. (In English)</mixed-citation></ref><ref id="B15"><mixed-citation>Dubinina, E. Yu. (2020). Extracting keywords of a scientific article text in the process of creating an automatic abstract, Vestnik VGU. Seriya: Filologiya. Zhurnalistika, 1, 26&amp;ndash;28. (In Russian)</mixed-citation></ref><ref id="B16"><mixed-citation>Evert, S. (2022). Measuring keyness, Digital Humanities 2022: Conference Abstracts, Tokyo, Japan, 25&amp;ndash;29 July 2022, 202&amp;ndash;205. 
(In English)</mixed-citation></ref><ref id="B17"><mixed-citation>Freisinger, S., Seeberger, P., Ranzenberger, T., Bocklet, T. and Riedhammer, K. (2025). Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation, Proceedings Interspeech 2025, Rotterdam, The Netherlands, August 17&amp;ndash;21, 2025, 276&amp;ndash;280. (In English)</mixed-citation></ref><ref id="B18"><mixed-citation>Gabrielatos, C. (2018). Keyness Analysis: nature, metrics and techniques, Corpus Approaches to Discourse: A critical review, Routledge, Oxford. (In English)</mixed-citation></ref><ref id="B19"><mixed-citation>Gallina, Y., Boudin, F. and Daille, B. (2019). KPTimes: A Large-Scale Dataset for Keyphrase Generation on News Documents, Proceedings of the 12th International Conference on Natural Language Generation, Association for Computational Linguistics, Tokyo, Japan, 130&amp;ndash;135. (In English)</mixed-citation></ref><ref id="B20"><mixed-citation>Glazkova, A., Morozov, D. (2024). Exploring fine-tuned generative models for keyphrase selection: A case study for Russian. DAMDID-2024. https://doi.org/10.1007/978-3-032-03997-2_7 (In English)</mixed-citation></ref><ref id="B21"><mixed-citation>Glazkova, A., Morozov, D., Garipov, T. (2025). Key algorithms for keyphrase generation: Instruction-based LLMs for Russian scientific keyphrases. Analysis of Images, Social Networks and Texts, Springer Nature Switzerland, Cham, 107&amp;ndash;119. (In English)</mixed-citation></ref><ref id="B22"><mixed-citation>Glazkova, A., Morozov, D., Mitrofanova, O., Savchuk, S. (2024). Generation of keywords for regional media texts using large language models [Generatsiya klyuchevyh slov dlja tekstov regionalnyh SMI s pomoshchyu bolshih yazykovyh modeley]. International scientific conference dedicated to the 20th anniversary of the Russian National Corpus, The V.V.
Vinogradov Russian Language Institute of the Russian Academy of Sciences, Moscow, Russia, 37&amp;ndash;39. (In English)</mixed-citation></ref><ref id="B24"><mixed-citation>Gong, Z., Ai, L., Deshpande, H., Johnson, A., Phung, E., Wu, Z., Emami, A. and Hirschberg, J. (2025). Comparison-Based Automatic Evaluation for Meeting Summarization, Proceedings Interspeech 2025, Rotterdam, The Netherlands, August 17&amp;ndash;21, 2025, 291&amp;ndash;295. (In English)</mixed-citation></ref><ref id="B25"><mixed-citation>Grineva, M. and Grinev, M. (2009). Analysis of text documents for extracting thematically grouped key terms, Trudy Instituta sistemnogo programmirovaniya RAN, 16, 155&amp;ndash;165. (In Russian)</mixed-citation></ref><ref id="B26"><mixed-citation>Grootendorst, M. (2020). KeyBERT: Minimal Keyword Extraction with BERT [Online], available at: http://doi.org/10.5281/zenodo.4461265 (Accessed 11.10.2025). (In English)</mixed-citation></ref><ref id="B28"><mixed-citation>Grudeva, E. V. and Churilina, L. N. (2019). Retelling as a secondary text: linguistic and methodological potential, Magnitogorskiy gosudarstvennyy tekhnicheskiy universitet im. G.I. Nosova, Magnitogorsk, Russia. (In Russian)</mixed-citation></ref><ref id="B29"><mixed-citation>Grudeva, E. V. and Gubushkina, A. A. (2020). Selection of keywords and oral retelling as secondary texts (based on the secondary speech activity of 6th grade students), Vestnik Cherepovetskogo gosudarstvennogo universiteta, 2&amp;nbsp;(95). (In Russian)</mixed-citation></ref><ref id="B30"><mixed-citation>Gulyaev, O. V. and Lukashevich, N. V. (2013). Automatic classification of texts based on section heading, Novyye informatsionnyye tekhnologii v avtomatizirovannykh sistemakh, 16, 238&amp;ndash;244. (In Russian)</mixed-citation></ref><ref id="B31"><mixed-citation>Guseva, D., Mitrofanova, O. and Dolgushin, M. (2025).
Human and Machine Keyphrase Perception in Russian Text and Speech, Speech and Computer: 26th International Conference, SPECOM 2024, Belgrade, Serbia, November 25&amp;ndash;28, 2024, 265&amp;ndash;280. (In English)</mixed-citation></ref><ref id="B32"><mixed-citation>Hasan, K. and Ng, V. (2014). Automatic Keyphrase Extraction: A Survey of the State of the Art, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, June 23 &amp;ndash;25, 2014, Baltimore, Maryland, USA, 1262&amp;ndash;1273. (In English)</mixed-citation></ref><ref id="B33"><mixed-citation>Hulth, A. (2003). Improved Automatic Keyword Extraction Given More Linguistic Knowledge. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 216&amp;ndash;223. (In English)</mixed-citation></ref><ref id="B34"><mixed-citation>Jacquemin, C. and Bourigault, D. (2003). Term extraction and automatic indexing, Handbook of Computational Linguistics, Oxford University Press, 599&amp;ndash;615. (In English)</mixed-citation></ref><ref id="B35"><mixed-citation>Jones, K. S. (1972). A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation, 28 (1), 11&amp;ndash;21. (In English)</mixed-citation></ref><ref id="B36"><mixed-citation>Kamshilova, O. N. (2013). Small forms of scientific text: keywords and abstract (informational aspect), Izvestiya Rossiyskogo Gosudarstvennogo pedagogicheskogo universiteta im. A.I. Gertsena, 156, 106&amp;ndash;117. (In Russian)</mixed-citation></ref><ref id="B37"><mixed-citation>Kano, T., Ogawa, A., Delcroix, M., Fukuda, R., Chen, W., Watanabe, S. (2025). Pick and Summarize: Integrating Extractive and Abstractive Speech Summarization, Proceedings Interspeech 2025, Rotterdam, The Netherlands, August 17&amp;ndash;21, 2025, 281&amp;ndash;285. (In English)</mixed-citation></ref><ref id="B38"><mixed-citation>Kilgarriff, A. (2009). 
Simple maths for keywords, Proceedings of the Corpus Linguistics 2009 Conference, Liverpool, UK, July 20&amp;ndash;23, 2009, 1&amp;ndash;6. (In English)</mixed-citation></ref><ref id="B39"><mixed-citation>Kodzasov, S. V. and Krivnova, O. F. (2001). Obshchaya fonetika [General phonetics], Moscow, Russia. (In Russian)</mixed-citation></ref><ref id="B40"><mixed-citation>Koloski, B., Pollak, S., &amp;Scaron;krlj, B. and Martinc, M. (2021). Extending Neural Keyword Extraction with TF-IDF tagset matching, Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation, April, 2021, 22&amp;ndash;29. (In English)</mixed-citation></ref><ref id="B41"><mixed-citation>Kotey, S., Dahyot, R. and Harte, N. (2023). Query Based Acoustic Summarization for Podcasts, Proc. Interspeech 2023, Dublin, Ireland, August 20&amp;ndash;24, 2023, 1483&amp;ndash;1487. (In English)</mixed-citation></ref><ref id="B42"><mixed-citation>Krapivin, M., Autayeu, A., Marchese, M., Blanzieri, E., Segata, N. (2010). Keyphrases Extraction from Scientific Documents: Improving Machine Learning Approaches with Natural Language Processing. In: Chowdhury, G., Koo, C., Hunter, J. (eds) The Role of Digital Libraries in a Time of Global Change, ICADL 2010, Lecture Notes in Computer Science, vol. 6102, Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13654-2_12 (In English)</mixed-citation></ref><ref id="B43"><mixed-citation>Krasavina, V. D. and Mirzagitova, A. R. (2015). Optimization of search in LeadScanner system using automatic extraction of keywords and word combinations, Trudy mezhdunarodnoy konferentsii &amp;ldquo;Korpusnaya lingvistika&amp;ndash;2015&amp;rdquo;, St. Petersburg, Russia. (In Russian)</mixed-citation></ref><ref id="B44"><mixed-citation>Kroll, M. and Kraus, K. (2024). Optimizing the role of human evaluation in LLM-based spoken document summarization systems, Proc. 
Interspeech 2024, Kos Island, Greece, September 1&amp;ndash;5, 2024, 1935&amp;ndash;1939. (In English)</mixed-citation></ref><ref id="B45"><mixed-citation>Le-Duc, K., Nguyen, K.-N., Vo-Dang, L. and Hy, T.-S. (2024). &amp;lsquo;Real-time Speech Summarization for Medical Conversations&amp;rsquo;, Proc. Interspeech 2024, Kos Island, Greece, September 1&amp;ndash;5, 2024, 1960&amp;ndash;1964. (In English)</mixed-citation></ref><ref id="B46"><mixed-citation>Lee, H.-y., Shiang, S.-R., Yeh, Ch.-F., Chen, Y.-N., Huang, Y., Kong, S.-y. and Lee, L.-S. (2014). Spoken Knowledge Organization by Semantic Structuring and a Prototype Course Lecture System for Personalized Learning, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22, 883&amp;ndash;898. (In English)</mixed-citation></ref><ref id="B47"><mixed-citation>Lee, W., Chun, M., Jeong, H., Jung, H. (2023). Toward keyword generation through large language models. Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, 37&amp;ndash;40. https://doi.org/10.1145/3581754.3584126 (In English)</mixed-citation></ref><ref id="B49"><mixed-citation>Litvak, M. (2013). DegExt: A language-independent keyphrase extractor, Journal of Ambient Intelligence and Humanized Computing, 4, 377&amp;ndash;387. (In English)</mixed-citation></ref><ref id="B50"><mixed-citation>Luhn, H.P. (1957). A Statistical Approach to Mechanized Encoding and Searching of Literary Information, IBM Journal of Research and Development, 1&amp;nbsp;(4), 309&amp;ndash;317. (In English)</mixed-citation></ref><ref id="B51"><mixed-citation>Mart&amp;iacute;nez‑Cruz, R., L&amp;oacute;pez‑L&amp;oacute;pez, A. J., and Portela, J. (2023). ChatGPT vs state‑of‑the‑art models: a benchmarking study in keyphrase generation task. arXiv preprint, arXiv:2304.14177.
(In English)</mixed-citation></ref><ref id="B52"><mixed-citation>Mart&amp;iacute;nez-Cruz, R., L&amp;oacute;pez-L&amp;oacute;pez, A. J., Portela, J. (2025). ChatGPT vs state-of-the-art models: A benchmarking study in keyphrase generation task, Applied Intelligence, 55&amp;nbsp;(1), 50. (In English)</mixed-citation></ref><ref id="B53"><mixed-citation>Marujo, L., Gershman, A., Carbonell, J., Frederking, R., and Neto, J. P. (2013). Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co‑reference normalization. arXiv preprint, arXiv:1306.4886. (In English)</mixed-citation></ref><ref id="B54"><mixed-citation>Maskey, S. and Hirschberg, J. (2005). Comparing lexical, acoustic/prosodic, structural and discourse features for speech summarization, Proc. Interspeech 2005, Lisbon, Portugal, September 4&amp;ndash;8, 2005, 621&amp;ndash;624. (In English)</mixed-citation></ref><ref id="B55"><mixed-citation>Matsuura, K., Ashihara, T., Moriya, T., Mimura, M., Kano, T., Ogawa, A. and Delcroix, M. (2024). Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation, Proc. Interspeech 2024, Kos Island, Greece, September 1&amp;ndash;5, 2024, 1945&amp;ndash;1949. (In English)</mixed-citation></ref><ref id="B56"><mixed-citation>Matsuura, K., Ashihara, T., Moriya, T., Tanaka, T., Kano, T., Ogawa, A. and Delcroix, M. (2023). Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization, Proc. Interspeech 2023, Dublin, Ireland, August 20&amp;ndash;24, 2023, 2943&amp;ndash;2947. (In English)</mixed-citation></ref><ref id="B57"><mixed-citation>Meng, R., Zhao, S., Han, S., He, D., Brusilovsky, P. and Chi, Y. (2017). Deep Keyphrase Generation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Vancouver, Canada, 582&amp;ndash;592.
(In English)</mixed-citation></ref><ref id="B58"><mixed-citation>Meng, R., Zhao, S., Han, S., He, D., Brusilovsky, P., and Chi, Y. (2017). Deep Keyphrase Generation, in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 582&amp;ndash;592, Vancouver, Canada. Association for Computational Linguistics. (In English)</mixed-citation></ref><ref id="B59"><mixed-citation>Mihalcea, R. and Tarau, P. (2004). TextRank: Bringing Order into Texts, in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 404&amp;ndash;411. (In English)</mixed-citation></ref><ref id="B60"><mixed-citation>Mijić, J., Dalbelo Ba&amp;scaron;ić, B., &amp;Scaron;najder, J. (2010). Robust Keyphrase Extraction for a Largescale Croatian News Production System, in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), 59&amp;ndash;99. (In English)</mixed-citation></ref><ref id="B61"><mixed-citation>Mitrofanova, O. A. and Gavrilik, D. A. (2022). Experiments on automatic extraction of key expressions in stylistically diverse corpora of Russian texts, Terra Linguistica, 13 (4), 22&amp;ndash;40. (In Russian)</mixed-citation></ref><ref id="B62"><mixed-citation>Morozov, D. A., Glazkova, A. V., Tyutyul&amp;rsquo;nikov, M. A. and Iomdin, B. L. (2023). Generation of keywords for annotations of Russian scientific articles, Vestnik NSU. Seriya: Lingvistika i mezhkul&amp;rsquo;turnaya kommunikatsiya, 1. (In Russian)</mixed-citation></ref><ref id="B63"><mixed-citation>Moskvina, A. D., Erofeeva, A. R., Mitrofanova, O. A. and Kharabet, Ya. K. (2017). Automatic extraction of keywords and word combinations from Russian corpus of texts using RAKE algorithm, Trudy Mezhdunarodnoy konferentsii &amp;ldquo;Korpusnaya lingvistika&amp;ndash;2017&amp;rdquo; (Sankt-Peterburg, 27&amp;ndash;30 iyunya 2017 g.), Izd-vo SPbGU, Russia, 268&amp;ndash;275. (In Russian)</mixed-citation></ref><ref id="B64"><mixed-citation>Moskvina, A., Sokolova, E. and Mitrofanova, O. (2018).
KeyPhrase extraction from the Russian corpus on Linguistics by means of KEA and RAKE algorithm, Data Analytics and Management in Data Intensive Domains: XX International Conference DAMDID/RCDL&amp;rsquo;2018: Conference Proceedings, Moscow, Russia, October 9&amp;ndash;12, 2018, 369&amp;ndash;372. (In English)</mixed-citation></ref><ref id="B65"><mixed-citation>Moskvitina, T. N. (2009). Keywords and their functions in scientific text, Vestnik ChGPU, 11, 270&amp;ndash;283. (In Russian)</mixed-citation></ref><ref id="B66"><mixed-citation>Moskvitina, T. N. (2018). Methods for extracting keywords when abstracting a scientific text, Vestnik Tomskogo gosudarstvennogo universiteta, 8, 45&amp;ndash;50. (In Russian)</mixed-citation></ref><ref id="B67"><mixed-citation>Nguyen, T. D., Kan, M-Y. (2007). Keyphrase Extraction in Scientific Publications. In: Goh, D. H-L., Cao, T. H., S&amp;oslash;lvberg, I. T., Rasmussen, E. (eds) Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers, ICADL 2007, Lecture Notes in Computer Science, vol. 4822, Springer, Berlin, Heidelberg, 317&amp;ndash;326. https://doi.org/10.1007/978-3-540-77094-7_41 (In English)</mixed-citation></ref><ref id="B69"><mixed-citation>Papusha, I. S. (2008). Complex syntactic whole: keywords or herms, Vestnik Assotsiatsii VUZov turizma i servisa, 3, 48&amp;ndash;54. (In Russian)</mixed-citation></ref><ref id="B70"><mixed-citation>Piotrovskiy, R. G., Bektaev, K. B. and Piotrovskaya, A. A. (1977). Matematicheskaya lingvistika: ucheb. posobiye dlya ped. institutov [Mathematical linguistics: textbook for pedagogical institutes], Vysshaya shkola, Moscow, Russia. (In Russian)</mixed-citation></ref><ref id="B71"><mixed-citation>Popova, S. V. and Danilova, V. V. (2014).
&amp;lsquo;Representation of documents in the task of clustering scientific text annotations&amp;rsquo;, Nauchno-tekhnicheskiy vestnik informatsionnykh tekhnologiy, mekhaniki i optiki, 1 (89), 99&amp;ndash;107. (In Russian)</mixed-citation></ref><ref id="B72"><mixed-citation>Riedhammer, K., Favre, B. and Hakkani-Tur, D. (2010). Long story short&amp;ndash;global unsupervised models for keyphrase based meeting summarization, Speech Communication, 52 (10), 801&amp;ndash;815. (In English)</mixed-citation></ref><ref id="B73"><mixed-citation>Rose, S. J., Cowley, W. E., Crow, V. L. and Cramer, N. O. (2009). Rapid Automatic Keyword Extraction for Information Retrieval and Analysis, Text Mining: Applications and Theory, 1&amp;ndash;20. (In English)</mixed-citation></ref><ref id="B74"><mixed-citation>Ryu, S., Do, H., Kim, Y., Lee, G.G. and Ok, J. (2024). Key-Element-Informed sLLM Tuning for Document Summarization, Proc. Interspeech 2024, Kos Island, Greece, September 1&amp;ndash;5, 2024, 1940&amp;ndash;1944. (In English)</mixed-citation></ref><ref id="B75"><mixed-citation>Sakharnyy, L. V. (1982). Actual division and text compression (on the use of informatics methods in psycholinguistics), Teoreticheskiye aspekty derivatsii, Perm, Russia. (In Russian)</mixed-citation></ref><ref id="B76"><mixed-citation>Sakharnyy, L. V. and Shtern, A. S. (1988). Selection of keywords as a type of text, Leksicheskiye aspekty v sisteme professional&amp;rsquo;no-orientirovannogo obucheniya inoyazychnoy rechevoy deyatel&amp;rsquo;nosti, Perm, Russia, 34&amp;ndash;51. (In Russian)</mixed-citation></ref><ref id="B77"><mixed-citation>Shang, H., Li, Z., Guo, J., Li, S., Rao, Z., Luo, Y., Wei, D. and Yang, H. (2024). An End-to-End Speech Summarization Using Large Language Model, Proc. Interspeech 2024, Kos Island, Greece, September 1&amp;ndash;5, 2024, 1950&amp;ndash;1954. (In English)</mixed-citation></ref><ref id="B78"><mixed-citation>Shao, L., Zhang, L., Peng, M., Ma, G., Yue, H., Sun, M., and Su, J. (2024).
One2set+ large language model: Best partners for keyphrase generation, in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 11140&amp;ndash;11153. (In English)</mixed-citation></ref><ref id="B79"><mixed-citation>Shekhtman, N. A. (2005). Ponimaniye rechevogo proizvedeniya i gipertekst [Understanding of a speech work and hypertext], Izd-vo OGPU, Orenburg, Russia. (In Russian)</mixed-citation></ref><ref id="B80"><mixed-citation>Sheremetyeva, S. O. and Osminin, P. G. (2015). Methods and models of automatic keyword extraction, Vestnik YUrGU. Seriya &amp;ldquo;Lingvistika&amp;rdquo;, 12 (1), 76&amp;ndash;81. (In Russian)</mixed-citation></ref><ref id="B81"><mixed-citation>Sokolova, E. V. and Mitrofanova, O. A. (2018). Automatic extraction of keywords and word combinations from Russian texts using KEA algorithm, Kompyuternaya lingvistika i vychislitel&amp;rsquo;nyye ontologii, 157&amp;ndash;165. (In Russian)</mixed-citation></ref><ref id="B82"><mixed-citation>Song, M., Geng, X., Yao, S., Lu, S., Feng, Y., Jing, L. (2023). Large language models as zero-shot keyphrase extractor: A preliminary empirical study. arXiv preprint arXiv:2312.15156 (In English)</mixed-citation></ref><ref id="B83"><mixed-citation>Song, M., Jiang, H., Shi, S., Yao, S., Lu, S., Feng, Y., Liu, H., and Jing, L. (2023). Is ChatGPT a good keyphrase generator? A preliminary study. arXiv preprint, arXiv:2303.13001. (In English)</mixed-citation></ref><ref id="B84"><mixed-citation>Sterckx, L., Demeester, T., Deleu, J., et al. (2018). Creation and evaluation of large keyphrase extraction collections with multiple opinions. Language Resources &amp;amp; Evaluation, 52, 503&amp;ndash;532. (In English)</mixed-citation></ref><ref id="B85"><mixed-citation>Su, J., Zhang, L., Hassanzadeh, H. R. and Schaaf, T. (2022). Extract and Abstract with BART for Clinical Notes from Doctor-Patient Conversations, Proc. 
Interspeech 2022, Incheon, Korea, September 18&amp;ndash;22, 2022, 2488&amp;ndash;2492. (In English)</mixed-citation></ref><ref id="B86"><mixed-citation>Svetozarova, N. D. and Shtern, A. S. (1989). Key and phonematically highlighted words of the text, Eksperimentalnaya fonetika, Moscow, Russia, 157&amp;ndash;170. (In Russian)</mixed-citation></ref><ref id="B87"><mixed-citation>Troshina, A. (2025). Text Preprocessing for Keyword and Key Phrase Extraction. In: Bakaev, M., et al. Internet and Modern Society. Human-Computer Communication. IMS 2024. Communications in Computer and Information Science, vol 2534. Springer, Cham, 105&amp;ndash;112. https://doi.org/10.1007/978-3-031-96177-9_9 (In English)</mixed-citation></ref><ref id="B88"><mixed-citation>Tsarfaty, R., Seddah, D., K&amp;uuml;bler, S. and Nivre, J. (2013). Parsing Morphologically Rich Languages: Introduction to the Special Issue, Computational Linguistics, 39&amp;nbsp;(1), 15&amp;ndash;22. (In English)</mixed-citation></ref><ref id="B89"><mixed-citation>Umair, M., Sultana, T., and Lee, Y.-K. (2024). Pre-trained language models for keyphrase prediction: A review. ICT Express, 10(4), 871&amp;ndash;890. (In English)</mixed-citation></ref><ref id="B90"><mixed-citation>Umair, M., Sultana, T., Lee, Y. K. (2024). Pre-trained language models for keyphrase prediction: A review. ICT Express, 10&amp;nbsp;(4), 871&amp;ndash;890. https://doi.org/10.1016/j.icte.2024.05.015 (In English)</mixed-citation></ref><ref id="B91"><mixed-citation>Vanyushkin, A. S. and Grashchenko, L. A. (2016). Methods and algorithms for extracting keywords, Novyye informatsionnyye tekhnologii v avtomatizirovannykh sistemakh, 19, 85&amp;ndash;93. (In Russian)</mixed-citation></ref><ref id="B92"><mixed-citation>Vanyushkin, A. S. and Grashchenko, L. A. (2017). Evaluation of keyword extraction algorithms: tools and resources, Novyye informatsionnyye tekhnologii v avtomatizirovannykh sistemakh, 20.
(In Russian)</mixed-citation></ref><ref id="B93"><mixed-citation>Vanyushkin, A. S. and Grashchenko, L. A. (2018). On the marking of text corpora with keywords, Novyye informatsionnyye tekhnologii v avtomatizirovannykh sistemakh, 21, 207–211. (In Russian)</mixed-citation></ref><ref id="B94"><mixed-citation>Vartakavi, A. and Garg, A. (2020). Podsumm: Podcast audio summarization [Online], available at: https://arxiv.org/pdf/2009.10315 (Accessed 3.11.2025). (In English)</mixed-citation></ref><ref id="B95"><mixed-citation>Vasileva, V. V. and Kon’kov, V. I. (2015). Ustnaya rech: praktikum [Spoken speech: workshop], S.-Peterb. gos. un-t, St. Petersburg, Russia. (In Russian)</mixed-citation></ref><ref id="B96"><mixed-citation>Vinogradova, N. V. and Ivanov, V. K. (2016). ‘Modern methods of automated extraction of keywords from text’, Informatsionnyye resursy Rossii, 4, 13–18. (In Russian)</mixed-citation></ref><ref id="B97"><mixed-citation>Wan, X. and Xiao, J. (2008). Single document keyphrase extraction using neighborhood knowledge. In Proceedings of the 23rd National Conference on Artificial Intelligence, volume 2, AAAI Press, 855–860. (In English)</mixed-citation></ref><ref id="B98"><mixed-citation>Wang, J. (2022). ESSumm: Extractive Speech Summarization from Untranscribed Meeting, Proc. Interspeech 2022, Incheon, Korea, September 18–22, 2022, 3243–3247. (In English)</mixed-citation></ref><ref id="B99"><mixed-citation>Wang, S., Dai, S., and Jiang, J. (2024). Thinking like an author: A zero-shot learning approach to key phrase generation with large language model. In A. Bifet, J. Davis, T. Krilavičius, M. Kull, E. Ntoutsi, and I. Žliobaitė (Eds.), Machine Learning and Knowledge Discovery in Databases. Research Track. Springer Nature Switzerland, Cham, 335–350. (In English)</mixed-citation></ref><ref id="B100"><mixed-citation>Wienecke, Y. (2020).
Automatic Keyphrase Extraction from Russian-Language Scholarly Papers in Computational Linguistics, University Honors Theses, Portland State University. (In English)</mixed-citation></ref><ref id="B101"><mixed-citation>Wilson, A. (2013). Embracing Bayes factors for key item analysis in corpus linguistics, New approaches to the study of linguistic variability. Language Competence and Language Awareness in Europe, 4, 3–11. (In English)</mixed-citation></ref><ref id="B102"><mixed-citation>Xiong, L., Hu, C., Xiong, C., Campos, D. and Overwijk, A. (2019). Open Domain Web Keyphrase Extraction Beyond Language Modeling. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 5175–5184. (In English)</mixed-citation></ref><ref id="B103"><mixed-citation>Xiong, L., Hu, C., Xiong, C., Campos, D. and Overwijk, A. (2019). Open Domain Web Keyphrase Extraction Beyond Language Modeling. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 5175–5184. (In English)</mixed-citation></ref><ref id="B104"><mixed-citation>Yagunova, E. V. (2004). The role of keywords in the perception of spoken and written text (based on the Russian language), Chelovek pishushchiy i chitayushchiy: problemy i nablyudeniya: Materialy mezhdunarodnoy konferentsii 14–16 marta 2002 g., Izd-vo SPbGU, Sankt-Peterburg, Russia, 197–204. (In Russian)</mixed-citation></ref><ref id="B105"><mixed-citation>Yagunova, E. V. (2010). ‘Experiment and calculations in the analysis of keywords of a fictional text’, Filosofiya yazyka. Lingvistika.
Lingvodidaktika, 1, 83–89. (In Russian)</mixed-citation></ref><ref id="B106"><mixed-citation>Zakharov, V. P. and Khokhlova, M. V. (2014). ‘Extraction of terminological phrases from special texts based on various association measures’, XVII Vserossiyskaya obyedinennaya konferentsiya “Internet i sovremennoye obshchestvo” (IMS-2014), St. Petersburg, Russia. (In Russian)</mixed-citation></ref><ref id="B107"><mixed-citation>Zhang, C., Wang, H., Liu, Y., Wu, D., Liao, Y. and Wang, B. (2008). Automatic keyword extraction from documents using conditional random fields, Journal of Computational Information Systems, 4(3), 1169–1180. (In English)</mixed-citation></ref></ref-list></back></article>