DOI: 10.18413/2313-8912-2026-12-2-0-2

Qualitative and quantitative specificity of corpus representation of part-of-speech categoriality in Russian and Chinese

Aliaksandr A. Barkovich (Belarusian State University, Minsk, Republic of Belarus)
Qing Wang (independent researcher, Minsk, Republic of Belarus)

Relevance. This article analyzes corpus-based representations of parts of speech in Russian and Chinese as an objective resource for linguistic research. Generalizing a wide range of corpus data and metadata matters both for understanding the theoretical foundations of language description and for practice-oriented use in applied interdisciplinary research, where the approach can be implemented and developed further. The scientific significance of the work lies in the fact that a corpus-verified comparison of the part-of-speech systems of typologically distant languages makes it possible not only to clarify the statistical parameters of their functioning, but also to address the linguistic validity of two debated categories: the predicative in Russian and the differentiating word in Chinese.

Problems. Refining linguistic categories is equally relevant for Russian and Chinese, as evidenced by the modernization of the part-of-speech systems developed for both languages in the context of corpus representation. Such data make it possible to interpret part of speech as a feature that identifies and qualifies metalinguistic practice.

Methods. Data obtained through software tools integrated into linguistic corpora make it possible to overcome the routine and self-reflective character of traditional linguistic descriptions, using material from a fundamentally stable system of parts of speech. The study employed a comprehensive methodological approach that combined qualitative and quantitative analysis with corpus-based techniques.

Results. The feasibility of positioning part-of-speech categoriality metalinguistically as a linguistic universal is confirmed. Using data from the Russian National Corpus and the Online Corpus of Chinese, the article examines the qualitative and quantitative features of how linguistic units function in relation to their part-of-speech affiliation.

Conclusions. The corpus data show that units belonging to a particular part-of-speech cluster function in stereotyped ways in real speech across typologically distinct languages, which convincingly demonstrates the feasibility of treating part-of-speech classification as a metalinguistic tool. The development of part-of-speech systems in Russian and Chinese is no less pressing, as evidenced by the need to identify new part-of-speech categories and by their actual incorporation into corpus annotation. In this regard, it is significant that corpus representation confirmed the linguistic validity of such categories as the predicative in Russian and the differentiating word in Chinese. The findings relate primarily to the high-frequency segment of the vocabulary of both languages. The results can be applied in corpus annotation, comparative grammar, and the teaching of Russian and Chinese as foreign languages.

Keywords: Corpus representation, Part of speech, Quantitative analysis, Qualitative analysis, Russian, Chinese, Part-of-speech categorization.

Information for citation:

Barkovich, A. A., Wang, Q. (2026). Qualitative and quantitative specificity of corpus representation of part-of-speech categoriality in Russian and Chinese, Research Result. Theoretical and Applied Linguistics, 12 (2), 25–60.

Reference lists

Corpus Material

Akhmanova, O. S. (1966). Slovar lingvisticheskih terminov [Dictionary of Linguistic Terms], Sov. Encyclopedia, Moscow, Russia. (In Russian)

Russian National Corpus [Online], available at: http://www.ruscorpora.ru (Accessed 28 August 2025). (In Russian)

Ozhegov, S. I., Shvedova, N. Yu. (2006). Tolkovy slovar russkogo yazyka: 80 000 slov i frazeologicheskih vyrazheniy [Explanatory Dictionary of the Russian Language: 80,000 Words and Phraseological Expressions], TEMP Publ., Moscow, Russia. (In Russian)

Rosenthal, D. E., Golub, I. B., Telenkova, M. A. (2010). Sovremenny russkiy yazyk [Modern Russian Language], Iris-press Publ., Moscow, Russia. (In Russian)

语料库在线 = Online corpus [Online], available at: http://www.cncorpus.org/ (Accessed 28 August 2025). (In Chinese)

现代汉语语法大全. = Sun, H. Grammar of Modern Chinese [Online], available at: http://wenku.baidu.com/view/0ea2c833433239-68011c9253.html (Accessed 28 August 2025). (In Chinese)

中国社会科学院语言研究所. 北京: 商务印书馆. 2016. 1799 p. = Dictionary of Modern Chinese / Institute of Linguistics, Academy of Social Sciences of the People's Republic of China (2016), Shangwu Yinshuguan, Beijing. (In Chinese)

中学教学语法系统提要 (试行) / 人民教育出版社中学语文室. 北京: 人民教育出版社. 1984. 21 p. = Curriculum for the Grammar System of Middle Schools (Experimental Mode) / Department of Language and Literature for Middle Schools (1984). People's Educational Publishing House, Beijing. [Online], available at: https://eol.shzu.edu.cn/meol/analytics/ (Accessed 28 August 2025). (In Chinese)

Research Result. Theoretical and Applied Linguistics (ISSN 2313-8912)

The journal materials and website are licensed under Creative Commons «Attribution» 4.0 International.
