<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Research Result. Theoretical and Applied Linguistics</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2023-9-1-0-3</article-id><article-id pub-id-type="publisher-id">3060</article-id><article-categories><subj-group subj-group-type="heading"><subject>TEXT COMPLEXITY PREDICTORS: METHODS AND APPROACHES FOR ASSESSMENT</subject></subj-group></article-categories><title-group><article-title>Terminology use in school textbooks: corpus analysis</article-title><trans-title-group xml:lang="en"><trans-title>Terminology use in school textbooks: corpus analysis</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Monakhov</surname><given-names>Sergei I.</given-names></name><name xml:lang="en"><surname>Monakhov</surname><given-names>Sergei I.</given-names></name></name-alternatives><email>sergomon@gmail.com</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Turchanenko</surname><given-names>Vladimir V.</given-names></name><name xml:lang="en"><surname>Turchanenko</surname><given-names>Vladimir V.</given-names></name></name-alternatives><email>turchanenko@mail.ru</email><xref ref-type="aff" rid="aff2" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Cherdakov</surname><given-names>Dmitrii N.</given-names></name><name xml:lang="en"><surname>Cherdakov</surname><given-names>Dmitrii 
N.</given-names></name></name-alternatives><email>dm.cherdakov@gmail.com</email><xref ref-type="aff" rid="aff3" /></contrib></contrib-group><aff id="aff1"><institution>Friedrich Schiller University Jena, Germany</institution></aff><aff id="aff3"><institution>Saint Petersburg University, Russia</institution></aff><aff id="aff2"><institution>Institute of Russian Literature (Pushkinsky Dom) of the Russian Academy of Sciences, Russia</institution></aff><pub-date pub-type="epub"><year>2023</year></pub-date><volume>9</volume><issue>1</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2023/1/Лингвистика_9_1_2023-27-49_en__ru.pdf" /><abstract xml:lang="ru"><p>The article presents the methods and results of a study that investigated the use of terminology in textbooks for secondary schools in Russia. The data were taken from a full-text DIY corpus of 207 textbooks for grades 5-11. The toolkit included models trained with the Word2Vec algorithms driven by the ideas of distributional semantics. The models were used to improve traditional automatic term extraction based on word frequency statistics. Numerical representation of word collocation patterns and their semantic similarity enabled the following: more effective automatic term extraction with a clear dividing line between terminology per se and high-frequency common words; comparative analysis of the inventory and functioning of terms in textbooks for different school subjects and grades; analysis of the dynamics of new terms entering educational and methodological complexes and insights into terminological relations between textbooks for different grades. The study included another DIY corpus compiled from scholarly articles across the subjects taught at school. It was used to identify differences in term use in textbooks and scholarly texts as well as in non-specific and popular science contexts. The latter was facilitated by the RusVectōrēs word embedding model. 
The comprehensive analysis identified some patterns in term functioning relevant to particular school subjects or groups of subjects. The results were evaluated in light of the theory of text complexity, teaching methodology and didactics. The study found some contradictions between the expected and actual text complexity. It also showed a certain discrepancy between text complexity and basic didactic principles.</p></abstract><trans-abstract xml:lang="en"><p>The article presents the methods and results of a study that investigated the use of terminology in textbooks for secondary schools in Russia. The data were taken from a full-text DIY corpus of 207 textbooks for grades 5-11. The toolkit included models trained with the Word2Vec algorithms driven by the ideas of distributional semantics. The models were used to improve traditional automatic term extraction based on word frequency statistics. Numerical representation of word collocation patterns and their semantic similarity enabled the following: more effective automatic term extraction with a clear dividing line between terminology per se and high-frequency common words; comparative analysis of the inventory and functioning of terms in textbooks for different school subjects and grades; analysis of the dynamics of new terms entering educational and methodological complexes and insights into terminological relations between textbooks for different grades. The study included another DIY corpus compiled from scholarly articles across the subjects taught at school. It was used to identify differences in term use in textbooks and scholarly texts as well as in non-specific and popular science contexts. The latter was facilitated by the RusVectōrēs word embedding model. The comprehensive analysis identified some patterns in term functioning relevant to particular school subjects or groups of subjects. The results were evaluated in light of the theory of text complexity, teaching methodology and didactics. 
The study found some contradictions between the expected and actual text complexity. It also showed a certain discrepancy between text complexity and basic didactic principles.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>Term</kwd><kwd>Terminology</kwd><kwd>School textbook</kwd><kwd>Text complexity</kwd><kwd>Word frequency</kwd><kwd>Vector representation</kwd><kwd>Word2Vec</kwd><kwd>Neural network</kwd></kwd-group><kwd-group xml:lang="en"><kwd>Term</kwd><kwd>Terminology</kwd><kwd>School textbook</kwd><kwd>Text complexity</kwd><kwd>Word frequency</kwd><kwd>Vector representation</kwd><kwd>Word2Vec</kwd><kwd>Neural network</kwd></kwd-group></article-meta></front><back><ack><p>The reported study was funded by the Russian Foundation for Basic Research, Project number 19-29-14032 mk “Study of terminological subsystems of modern school textbooks in Russian with the help of word embedding models Word2Vec and neural networks”.</p></ack><ref-list><title>References</title><ref id="B1"><mixed-citation>Brownlee, J. (2017). Deep Learning for Natural Language Processing: Develop Deep Learning Models for your Natural Language Problems, Machine Learning Mastery Publ., Vermont, USA. (In English)</mixed-citation></ref><ref id="B2"><mixed-citation>Cabré, M. T., Estopà, R. and Vivaldi, J. (2001). Automatic Term Detection: a Review of Current Systems, in Bourigault, D., Jacquemin, Ch. and L’Homme, M.-C. (eds.), Recent Advances in Computational Terminology, John Benjamins Publ., Amsterdam, Netherlands, 53–87. DOI: 10.1075/nlp.2.04cab (In English)</mixed-citation></ref><ref id="B3"><mixed-citation>Durda, K. and Buchanan, L. (2008). WINDSORS: Windsor Improved Norms of Distance and Similarity of Representations of Semantics, Behavior Research Methods, 40, 705–712. 
DOI: 10.3758/BRM.40.3.705 (In English)</mixed-citation></ref><ref id="B4"><mixed-citation>Fisher, D., Frey, N. and Lapp, D. (2016). Text Complexity: Stretching Readers with Texts and Tasks, Corwin Press, Thousand Oaks, CA, USA. (In English)</mixed-citation></ref><ref id="B5"><mixed-citation>Flor, M., Klebanov, B. and Sheehan, K. (2013). Lexical Tightness and Text Complexity, Proceedings of the 2nd Workshop on Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Atlanta, USA, 29–38. (In English)</mixed-citation></ref><ref id="B6"><mixed-citation>Glazkova, A., Egorov, Yu. and Glazkov, M. (2021). A Comparative Study of Feature Types for Age-Based Text Classification, in van der Aalst, W. et al. (eds.), Analysis of Images, Social Networks and Texts. AIST 2020. Lecture Notes in Computer Science, 12602, Springer Publ., Cham, Switzerland, 120–134. (In English)</mixed-citation></ref><ref id="B7"><mixed-citation>Iomdin, B. L. and Morozov, D. A. (2021). Who Can Understand “Dunno”? Automatic Assessment of Text Complexity in Children’s Literature, Russkaya Rech’, 5, 55–68. DOI: 10.31857/S013161170017239-1 (In Russian)</mixed-citation></ref><ref id="B8"><mixed-citation>Jones, M. N. and Mewhort, D. J. K. (2007). Representing Word Meaning and Order Information in a Composite Holographic Lexicon, Psychological Review, 114, 1–37. DOI: 10.1037/0033-295X.114.1.1 (In English)</mixed-citation></ref><ref id="B9"><mixed-citation>Kilgarriff, A., Jakubíček, M., Kovář, V. et al. (2014). 
Finding Terms in Corpora for Many Languages with the Sketch Engine, Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, 53–56. DOI: 10.3115/v1/E14-2014 (In English)</mixed-citation></ref><ref id="B10"><mixed-citation>Korkontzelos, I. and Ananiadou, S. (2014). Term Extraction, in Mitkov, R. (ed.), Oxford Handbook of Computational Linguistics, Oxford University Press, Oxford, UK, 991–1012. (In English)</mixed-citation></ref><ref id="B11"><mixed-citation>Kutuzov, A. and Kuzmenko, E. (2017). WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models, in Ignatov, D. et al. (eds.), Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, 661, Springer Publ., Cham, Switzerland, 155–161. (In English)</mixed-citation></ref><ref id="B12"><mixed-citation>Laposhina, A. N., Lebedeva, M. U. and Berlin Khenis, A. (2022). Word Frequency and Text Complexity: An Eye-tracking Study of Young Russian Readers, Russian Journal of Linguistics, 26 (2), 493–514. DOI: 10.22363/2687-0088-30084 (In Russian)</mixed-citation></ref><ref id="B13"><mixed-citation>Laposhina, A. N., Veselovskaya, T. S., Lebedeva, M. U. and Kupreshchenko, O. F. (2019). Lexical Analysis of the Russian Language Textbooks for Primary School: Corpus Study, Computational Linguistics and Intellectual Technologies: papers from the Annual International Conference “Dialogue”, Moscow, Russia, 18 (25), 351–363. (In Russian)</mixed-citation></ref><ref id="B14"><mixed-citation>Leichik, V. M. (2007). 
Terminovedenie: predmet, metody, struktura [Terminology Studies: Subject, Methods, Structure], LKI Publishing House, Moscow, Russia. (In Russian)</mixed-citation></ref><ref id="B15"><mixed-citation>Levy, O. and Goldberg, Y. (2014). Linguistic Regularities in Sparse and Explicit Word Representations, Proceedings of the Eighteenth Conference on Computational Natural Language Learning, Baltimore, USA, 171–180. DOI: 10.3115/v1/W14-1618 (In English)</mixed-citation></ref><ref id="B16"><mixed-citation>Lukashevich, N. V. and Logachev, Yu. M. (2010). Combining Features for Automatic Term Extraction, Numerical Methods and Programming, 11 (4), 108–116. (In Russian)</mixed-citation></ref><ref id="B17"><mixed-citation>Martynova, E. V., Solnyshkina, M. I., Merzlyakova, A. F. and Gizatulina, D. Yu. (2020). Lexical Parameters of the Academic Text (Based on the Texts of the Academic Corpus of the Russian Language), Philology and Culture, 3, 72–80. DOI: 10.26907/2074-0239-2020-61-3-72-80 (In Russian)</mixed-citation></ref><ref id="B18"><mixed-citation>Mikk, Ya. A. (1981). Optimizatsiya slozhnosti uchebnogo teksta: V pomoshch' avtoram i redaktoram [Optimizing the complexity of educational text: To help authors and editors], Prosveshchenie, Moscow, Russia. (In Russian)</mixed-citation></ref><ref id="B19"><mixed-citation>Mikolov, T., Sutskever, I., Chen, K. et al. (2013a). Distributed Representations of Words and Phrases and their Compositionality, Advances in Neural Information Processing Systems 26, 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, USA, 3136–3144. 
(In English)</mixed-citation></ref><ref id="B20"><mixed-citation>Mikolov, T., Yih, W. T. and Zweig, G. (2013b). Linguistic Regularities in Continuous Space Word Representations, Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, USA, 746–751. (In English)</mixed-citation></ref><ref id="B21"><mixed-citation>Mitrofanova, O. A. and Zakharov, V. P. (2009). Automatic Analysis of Terminology in the Russian Text Corpus on Corpus Linguistics, Computational Linguistics and Intellectual Technologies: papers from the Annual International Conference “Dialogue”, Bekasovo, Russia, 8 (15), 321–328. (In Russian)</mixed-citation></ref><ref id="B22"><mixed-citation>Monakhov, S. I., Turchanenko, V. V. and Cherdakov, D. N. (2022). Terminology in Textbooks and Research Articles: Cluster Analysis of Corpus Data, Proceedings of 6th International Conference “Informatization of Education and E-learning Methodology: Digital Technologies in Education”, Krasnoyarsk, Russia, 3, 228–233. (In Russian)</mixed-citation></ref><ref id="B23"><mixed-citation>Morozov, D. A. and Iomdin, B. L. (2019). Criteria of Semantic Complexity of Words, Computational Linguistics and Intellectual Technologies: papers from the Annual International Conference “Dialogue”, Moscow, Russia, 18 (25), 119–131. (In Russian)</mixed-citation></ref><ref id="B24"><mixed-citation>Nokel, M. A., Bolshakova, E. I. and Loukachevitch, N. V. (2012). 
Combining Multiple Features for Single-word Term Extraction, Computational Linguistics and Intellectual Technologies: papers from the Annual International Conference “Dialogue”, Bekasovo, Russia, 11 (18), 1, 490–501. (In English)</mixed-citation></ref><ref id="B25"><mixed-citation>Piotrovsky, R. G. and Yastrebova, S. V. (1969). Statistical Term Recognition, in Piotrovskij, R. G. (ed.), Statistika teksta [Text statistics], Belorusskij gosudarstvennyj universitet, Minsk, Belarus, 1, 249–259. (In Russian)</mixed-citation></ref><ref id="B26"><mixed-citation>Rohde, D. L., Gonnerman, L. M. and Plaut, D. C. (2006). An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence, Communications of the ACM, 8, 627–633. (In English)</mixed-citation></ref><ref id="B27"><mixed-citation>Schwanenflugel, P. J. (1991). Why are Abstract Concepts Hard to Understand?, in Schwanenflugel, P. J. (ed.), The psychology of word meanings, Lawrence Erlbaum Associates Inc., Hillsdale, USA, 223–250. (In English)</mixed-citation></ref><ref id="B28"><mixed-citation>Sharoff, S. (2022). What Neural Networks Know about Linguistic Complexity, Russian Journal of Linguistics, 26 (2), 371–390. DOI: 10.22363/2687-0088-30178 (In English)</mixed-citation></ref><ref id="B29"><mixed-citation>Shpakovsky, Yu. F. (2007). Estimation of Perception Difficulty and Optimization of the Educational Text Complexity (on the Material of Texts in Chemistry), Abstract of Ph.D. dissertation, Linguistics, Minsk State Linguistic University, Minsk, Belarus. (In Russian)</mixed-citation></ref><ref id="B30"><mixed-citation>Solnyshkina, M. I. (2022). 
Measuring Text Complexity: State of the Art, Collection of Scientific Papers X Jubilee International Scientific Conference “Teacher. Student. Textbook (in the Context of Global Challenges of Modern Times)”, Moscow, Russia, 20–24. (In Russian)</mixed-citation></ref><ref id="B31"><mixed-citation>Solnyshkina, M. I. and Kiselnikov, A. S. (2015). Text Complexity: Study Phases in Russian Linguistics, Tomsk State University Journal of Philology, 6 (38), 86–99. DOI: 10.17223/19986645/38/7 (In Russian)</mixed-citation></ref><ref id="B32"><mixed-citation>Solnyshkina, M. I., McNamara, D. and Zamaletdinov, R. R. (2022). Natural Language Processing and Discourse Complexity Studies, Russian Journal of Linguistics, 26 (2), 317–341. DOI: 10.22363/2687-0088-30171 (In Russian)</mixed-citation></ref><ref id="B33"><mixed-citation>Solovyev, V. D., Ivanov, V. V. and Solnyshkina, M. I. (2018). Assessment of Reading Difficulty Levels in Russian Academic Texts: Approaches and Metrics, Journal of Intelligent &amp; Fuzzy Systems, 34 (2), 3049–3058. DOI: 10.3233/JIFS-169489 (In English)</mixed-citation></ref><ref id="B34"><mixed-citation>Solovyev, V. D., Solnyshkina, M. I. and McNamara, D. (2022). Computational Linguistics and Discourse Complexology: Paradigms and Research Methods, Russian Journal of Linguistics, 26 (2), 275–316. DOI: 10.22363/2687-0088-30161 (In English)</mixed-citation></ref><ref id="B35"><mixed-citation>Stepanova, D. V. (2017). 
Analiz metodov avtomaticheskogo vydeleniya terminov iz nauchno-tekhnicheskih tekstov [Analysis of Methods for Automatic Terms Extraction from Scientific and Technical Texts], Aktual'nye problemy sovremennoj prikladnoj lingvistiki [Current problems of modern applied linguistics], Minskij gosudarstvennyj lingvisticheskij universitet, Minsk, 62–67. (In Russian)</mixed-citation></ref><ref id="B36"><mixed-citation>Tatarinov, V. A. (2006). Obshchee terminovedenie: Entsiklopedicheskij slovar' [Terminology Studies: Encyclopedic Dictionary], Moskovskij Litsej, Moscow, Russia. (In Russian)</mixed-citation></ref><ref id="B37"><mixed-citation>Turney, P. D. and Pantel, P. (2010). From Frequency to Meaning: Vector Space Models of Semantics, Journal of Artificial Intelligence Research, 37, 141–188. DOI: 10.1613/jair.2934 (In English)</mixed-citation></ref></ref-list></back></article>