<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Research Result. Theoretical and Applied Linguistics</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2022-8-4-0-8</article-id><article-id pub-id-type="publisher-id">2976</article-id><article-categories><subj-group subj-group-type="heading"><subject>APPLIED LINGUISTICS</subject></subj-group></article-categories><title-group><article-title>Lexical and syntactic features of academic Russian texts: a discriminant analysis</article-title><trans-title-group xml:lang="en"><trans-title>Lexical and syntactic features of academic Russian texts: a discriminant analysis</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Kupriyanov</surname><given-names>Roman V.</given-names></name><name xml:lang="en"><surname>Kupriyanov</surname><given-names>Roman V.</given-names></name></name-alternatives><email>kroman1@mail.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Solnyshkina</surname><given-names>Marina I.</given-names></name><name xml:lang="en"><surname>Solnyshkina</surname><given-names>Marina I.</given-names></name></name-alternatives><email>mesoln@yandex.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Dascalu</surname><given-names>Mihai</given-names></name><name 
xml:lang="en"><surname>Dascalu</surname><given-names>Mihai</given-names></name></name-alternatives><email>mihai.dascalu@upb.ro</email><xref ref-type="aff" rid="aff2" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Soldatkina</surname><given-names>Tatyana A.</given-names></name><name xml:lang="en"><surname>Soldatkina</surname><given-names>Tatyana A.</given-names></name></name-alternatives><email>fia.vr.solta@gmail.com</email><xref ref-type="aff" rid="aff1" /></contrib></contrib-group><aff id="aff1"><institution>Kazan Federal University, Russia</institution></aff><aff id="aff2"><institution>Polytechnic University of Bucharest, Romania</institution></aff><pub-date pub-type="epub"><year>2022</year></pub-date><volume>8</volume><issue>4</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2022/4/Лингвистика_8_4_2022_105-122.pdf" /><abstract xml:lang="ru"><p>This article presents three mathematical models that differentiate academic texts written in Russian across three subject discourses (Philological, Mathematical, and Natural Sciences), which in turn enable the design and automated profiling of the corresponding typologies. Our models include five indices: one surface-level feature (sentence length) and four syntactic features (mean verbs per sentence, mean adjectives per sentence, local noun overlap, and global argument overlap). We identified and validated these five statistically significant features out of 45 linguistic features extracted from our research corpus of 91,185 tokens. The shortest sentences are found in Russian language textbooks, while the longest are found in Natural Science texts. The mean number of verbs, nouns, and adjectives per sentence is higher in Natural Science textbooks, whereas Mathematics discourse is characterized by the shortest word length, the highest local noun overlap, and the highest global argument overlap. 
We attribute the metric differences between the three discourses to their functions: Natural Science texts are characterized by descriptions and narrative passages, in contrast to Philology, which is associated with opinions. Mathematical discourse operates with precise definitions, explanations, and justifications, and thus exhibits numerous overlaps. The discriminant analysis built on these features supports the development of text profilers targeting parametric analyses. The automation of these features and the provided classification formulas enable the design and development of text profilers required for textbook writing and editing. Our findings can help professional linguists, technologists, and academic writers select and modify texts for their target audience.</p></abstract><trans-abstract xml:lang="en"><p>This article presents three mathematical models that differentiate academic texts written in Russian across three subject discourses (Philological, Mathematical, and Natural Sciences), which in turn enable the design and automated profiling of the corresponding typologies. Our models include five indices: one surface-level feature (sentence length) and four syntactic features (mean verbs per sentence, mean adjectives per sentence, local noun overlap, and global argument overlap). We identified and validated these five statistically significant features out of 45 linguistic features extracted from our research corpus of 91,185 tokens. The shortest sentences are found in Russian language textbooks, while the longest are found in Natural Science texts. The mean number of verbs, nouns, and adjectives per sentence is higher in Natural Science textbooks, whereas Mathematics discourse is characterized by the shortest word length, the highest local noun overlap, and the highest global argument overlap. 
We attribute the metric differences between the three discourses to their functions: Natural Science texts are characterized by descriptions and narrative passages, in contrast to Philology, which is associated with opinions. Mathematical discourse operates with precise definitions, explanations, and justifications, and thus exhibits numerous overlaps. The discriminant analysis built on these features supports the development of text profilers targeting parametric analyses. The automation of these features and the provided classification formulas enable the design and development of text profilers required for textbook writing and editing. Our findings can help professional linguists, technologists, and academic writers select and modify texts for their target audience.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>Typology</kwd><kwd>Lexical features</kwd><kwd>Automation profilers</kwd><kwd>Subject domain</kwd><kwd>Syntactic features</kwd><kwd>Mathematical model</kwd><kwd>Discriminant analysis</kwd></kwd-group><kwd-group xml:lang="en"><kwd>Typology</kwd><kwd>Lexical features</kwd><kwd>Automation profilers</kwd><kwd>Subject domain</kwd><kwd>Syntactic features</kwd><kwd>Mathematical model</kwd><kwd>Discriminant analysis</kwd></kwd-group></article-meta></front><back><ack><p>This paper has been supported by the Kazan Federal University Strategic Academic Leadership Program (“PRIORITY-2030”), Strategic Project №4.

We thank Polina Alexandrovna Lekhnitskaya, a student at Kazan Federal University, for her assistance in compiling the corpus of academic texts and cooperation while conducting the research.</p></ack></back></article>