DOI: 10.18413/2313-8912-2024-10-3-0-4

Text complexity increase in Russian texts as a function of morphological changes

The study aims to identify (1) morphological predictors of text complexity and (2) domain-inherent markers that differentiate subject areas of academic texts in Russian. The corpus comprises textbooks on biology and social studies at three levels of complexity, corresponding to grades 6-7, 8-9, and 10-11 of the Russian school, and totals 941,963 tokens. The linguistic complexity of the texts was assessed using the Flesch-Kincaid readability formula modified for the Russian language, and the interdependence of the parameters was measured by correlation analysis conducted in STATISTICA. The values of the linguistic parameters, including the distribution of nouns, adjectives, and verbs and the readability index, were calculated with RuLingva (rulex.kpfu.ru/), a text profiler for the Russian language, while the frequencies of deverbatives and deadjectives were identified manually by the contributors. To ensure comparability of the metrics, the distributional analysis of deverbatives and deadjectives was performed on the corpus normalized to 10,000 tokens. The metrics “noun distribution”, “lexical density”, “deverbation”, and “deadjectivation” demonstrated a linear interdependence with readability and can therefore be viewed as complexity predictors. An inverse correlation was revealed between text readability and verb distribution. Morphological analysis confirmed a high level of nominativity in the texts and a steady growth in the frequency of substantives. The latter manifests as an increase in the frequency of deverbation and deadjectivation suffixes in texts from the 6th to the 11th grade. The metrics of lexical density, adjective distribution, and substantive suffixes are able to discriminate between the domains of academic texts. The research findings are applicable in text analytics, computational linguistics, and genre studies, and can be useful for test developers and textbook writers.
The authors see prospects for further research in the study of compounds of Latin and Greek origin in academic texts. The identified parameters may be used as linguistic complexity predictors and domain discriminants.
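The normalization and correlation steps described in the abstract can be illustrated with a minimal Python sketch. It assumes that "normalized to 10,000 tokens" means a relative frequency per 10,000 tokens and that the correlation is a Pearson coefficient; the study itself used STATISTICA, so this is only a stand-in. The function names `per_10k` and `pearson_r` and all numbers in `samples` are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def per_10k(count: int, total_tokens: int) -> float:
    """Relative frequency of a feature per 10,000 tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return count * 10_000 / total_tokens


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT taken from the study:
# raw deverbative counts, subcorpus sizes, and readability indices.
samples = [
    {"tokens": 50_000, "deverbatives": 600, "readability": 6.0},
    {"tokens": 80_000, "deverbatives": 1_200, "readability": 8.0},
    {"tokens": 40_000, "deverbatives": 720, "readability": 11.0},
]

# Normalize raw counts so subcorpora of different sizes are comparable.
normalized = [per_10k(s["deverbatives"], s["tokens"]) for s in samples]
readability = [s["readability"] for s in samples]

r = pearson_r(normalized, readability)
print(f"deverbatives per 10,000 tokens: {normalized}")
print(f"Pearson r with readability: {r:.3f}")
```

Normalizing before correlating matters because the raw counts alone would mostly reflect how large each subcorpus is rather than how dense the feature is in it.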
