<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Научный результат. Вопросы теоретической и прикладной лингвистики</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2025-11-3-0-5</article-id><article-id pub-id-type="publisher-id">3879</article-id><article-categories><subj-group subj-group-type="heading"><subject>ПРИКЛАДНАЯ ЛИНГВИСТИКА</subject></subj-group></article-categories><title-group><article-title>От STM до GPT: сравнительный анализ методов тематического моделирования на материале предметной области «ИИ в стоматологии»</article-title><trans-title-group xml:lang="en"><trans-title>From STM to GPT: A Comparative Study of Topic Modeling Methods for AI in Dentistry</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Литвинова</surname><given-names>Татьяна Александровна</given-names></name><name xml:lang="en"><surname>Litvinova</surname><given-names>Tatiana A.</given-names></name></name-alternatives><email>centr_rus_yaz@mail.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Ипполитов</surname><given-names>Юрий Алексеевич</given-names></name><name xml:lang="en"><surname>Ippolitov</surname><given-names>Yury A.</given-names></name></name-alternatives><email>dsvgma@mail.ru</email><xref ref-type="aff" rid="aff2" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Середин</surname><given-names>Павел Владимирович</given-names></name><name 
xml:lang="en"><surname>Seredin</surname><given-names>Pavel V.</given-names></name></name-alternatives><email>paul@phys.vsu.ru</email><xref ref-type="aff" rid="aff3" /></contrib></contrib-group><aff id="aff2"><institution>Воронежский государственный медицинский университет имени Н.Н. Бурденко, Воронеж, Россия</institution></aff><aff id="aff1"><institution>Воронежский государственный педагогический университет, Россия</institution></aff><aff id="aff3"><institution>Воронежский государственный университет, Воронеж, Россия</institution></aff><pub-date pub-type="epub"><year>2025</year></pub-date><volume>11</volume><issue>3</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2025/3/Лингвистика_11_3-86-122.pdf" /><abstract xml:lang="ru"><p>Исследование представляет собой анализ научных аннотаций в области применения искусственного интеллекта (ИИ) в стоматологии с использованием различных методов тематического моделирования. Мы сформировали и проанализировали корпус из 3170 аннотаций научных статей, опубликованных в 2019–2025 гг. в изданиях, индексируемых в базах данных Dimensions и Scopus. Были сравнены три подхода к тематическому моделированию: структурное тематическое моделирование (STM), вероятностная модель, позволяющая анализировать временные тенденции; кластеризация на основе эмбеддингов с использованием алгоритма Leiden (стабильная альтернатива BERTopic); моделирование с использованием GPT-4o без обучения модели. Для оценки качества тем была применена совокупность метрик. Показано, что алгоритм STM даёт наиболее компактную и чётко разделённую структуру тем; GPT оказался эффективным для создания названий тем и кратких описаний, но показал большее тематическое перекрытие и менее чёткие границы между темами. Мы также выполнили согласованное выравнивание тем в едином GPT-пространстве и выявили как стабильные, так и специфичные для моделей темы, а также общие временные тренды. 
Полученные результаты подчёркивают ценность комбинирования классических вероятностных моделей с возможностями LLM для достижения оптимального качества тематического моделирования. Хотя GPT-4o повышает интерпретируемость, его не следует использовать как единственный метод для анализа тем. Предложенный гибридный подход является масштабируемой и воспроизводимой стратегией для проведения обзоров литературы в быстро развивающихся областях исследований.
</p></abstract><trans-abstract xml:lang="en"><p>This study presents a comprehensive topic modeling analysis of scientific abstracts in the field of artificial intelligence (AI) applied to dentistry. We compiled and analyzed 3,170 peer-reviewed abstracts published between 2019 and 2025 from the Dimensions and Scopus databases. Three complementary approaches were compared: (1) Structural Topic Modeling (STM), a probabilistic framework incorporating publication year as a covariate to enable temporal trend analysis; (2) embedding-based clustering using the Leiden algorithm on OpenAI text embeddings; and (3) zero-shot GPT-based topic modeling, in which GPT-4o generated topics, descriptions, and keywords directly from batches of abstracts without model training. Topic quality was evaluated using compactness, distinctiveness, silhouette scores, and label redundancy. STM consistently produced the most compact and well-separated topics, embedding-based clustering excelled in identifying discrete semantic groupings, and GPT-based modeling provided interpretable, human-readable labels but exhibited greater thematic overlap. To ensure comparability across methods, we introduced a two-layer alignment framework that integrates topic-level similarity with document-level consensus, enabling robust cross-model comparison. Using this framework, we identified stable topics consistently recovered across methods (e.g., caries detection, radiographic AI diagnostics) as well as method-specific themes. Temporal trend analysis in this shared space revealed a clear shift from foundational AI methods (e.g., image segmentation, image enhancement) toward applied and integrative areas, including large language model applications, patient-facing tools, and AI in clinical education. Our results underscore the value of combining classical probabilistic models with modern large language model (LLM) tools for optimal topic modeling performance. 
While GPT-4o enhances interpretability, it should not be used in isolation for mapping thematic structures in scientific literature, at least not without pre-screening and prompt experimentation. Overall, our findings demonstrate the importance of hybrid topic modeling for mapping thematic structures in fast-evolving scientific domains, with dentistry serving as a case study.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>Тематическое моделирование</kwd><kwd>Структурное тематическое моделирование</kwd><kwd>GPT-4</kwd><kwd>Большие языковые модели</kwd><kwd>ИИ в стоматологии</kwd><kwd>Выравнивание тем</kwd><kwd>Кластеризация на основе эмбеддингов</kwd><kwd>Извлечение тем с использованием больших языковых моделей</kwd><kwd>Анализ научной литературы</kwd></kwd-group><kwd-group xml:lang="en"><kwd>Topic Modeling</kwd><kwd>Structural Topic Model</kwd><kwd>GPT-4</kwd><kwd>Large Language Models</kwd><kwd>Dentistry</kwd><kwd>AI in Dentistry</kwd><kwd>LLM in Dentistry</kwd><kwd>Scientometric Analysis</kwd><kwd>Embedding-Based Clustering</kwd><kwd>Zero-Shot Topic Modeling</kwd><kwd>Bibliometric Methods</kwd></kwd-group></article-meta></front><back><ack><p>Исследование выполнено в Воронежском государственном университете при поддержке Российского научного фонда, грант № 23-15-00060.</p></ack><ref-list><title>Список литературы</title><ref id="B1"><mixed-citation>Allani H. Multidisciplinary Applications of AI in Dentistry: Bibliometric Review / H. Allani, A. T. Santos, H. Ribeiro-Vidal // Applied Sciences. 2024. Т. 14, № 17. Ст. 7624. https://doi.org/10.3390/app14177624</mixed-citation></ref><ref id="B2"><mixed-citation>Benz P. Mapping the unseen in practice: comparing latent Dirichlet allocation and BERTopic for navigating topic spaces / P. Benz, C. Pradier, D. Kozlowski [и др.] // Scientometrics. 2025. Ранний онлайн-доступ (10.06.2025). https://doi.org/10.1007/s11192-025-05339-6</mixed-citation></ref><ref id="B3"><mixed-citation>Blei D. M. Latent Dirichlet Allocation / D. M. Blei, A. Y. Ng, M. I. Jordan // Journal of Machine Learning Research. 2003. Т. 3. С. 993–1022.</mixed-citation></ref><ref id="B4"><mixed-citation>Büttner M. 
Natural Language Processing: Chances and Challenges in Dentistry / M. Büttner, U. Leser, L. Schneider, F. Schwendicke // Journal of Dentistry. 2024. Т. 141. Ст. 104796. https://doi.org/10.1016/j.jdent.2023.104796</mixed-citation></ref><ref id="B5"><mixed-citation>Cosola D. M. Artificial intelligence in dentistry: a narrative review of applications, challenges, and future directions / D. M. Cosola, A. Ballini, F. A. Prencipe [и др.] // Minerva Dent Oral Sci. 2025. Ранний онлайн-доступ (18.06.2025). https://doi.org/10.23736/S2724-6329.25.05217-9</mixed-citation></ref><ref id="B6"><mixed-citation>de Magalhães A. A. Advancements in Diagnostic Methods and Imaging Technologies in Dentistry: A Literature Review of Emerging Approaches / A. A. de Magalhães, A. T. Santos // Journal of Clinical Medicine. 2025. Т. 14, № 4. Ст. 1277. https://doi.org/10.3390/jcm14041277</mixed-citation></ref><ref id="B7"><mixed-citation>Grootendorst M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure / M. Grootendorst // arXiv preprint arXiv:2203.05794. URL: https://arxiv.org/abs/2203.05794 (дата обращения: 22.08.2025).</mixed-citation></ref><ref id="B8"><mixed-citation>Hu J. Advances in hydrological research in China over the past two decades: Insights from advanced large language model and topic modeling / J. Hu, C. Miao, Y. Wu, J. Su // Fundamental Research. 2025. Ранний онлайн-доступ (10.05.2025). https://doi.org/10.1016/j.fmre.2025.05.002</mixed-citation></ref><ref id="B9"><mixed-citation>Islam K. M. S. Contextual Embedding-based Clustering to Identify Topics for Healthcare Service Improvement / K. M. S. Islam // arXiv:2504.14068. 
URL: https://arxiv.org/abs/2504.14068 (дата обращения: 22.08.2025). https://doi.org/10.48550/arXiv.2504.14068</mixed-citation></ref><ref id="B10"><mixed-citation>Jung H. S. Expansive data, extensive model: Investigating discussion topics around LLM through unsupervised machine learning in academic papers and news / H. S. Jung, H. Lee, Y. S. Woo [и др.] // PLOS ONE. 2024. Т. 19, № 5. Ст. e0304680. https://doi.org/10.1371/journal.pone.0304680</mixed-citation></ref><ref id="B11"><mixed-citation>Kozlowski D. Generative AI for automatic topic labelling / D. Kozlowski // arXiv:2408.07003. URL: https://arxiv.org/abs/2408.07003 (дата обращения: 22.08.2025). https://doi.org/10.48550/arxiv.2408.07003</mixed-citation></ref><ref id="B12"><mixed-citation>Lee V. V. Harnessing ChatGPT for Thematic Analysis: Are We Ready? / V. V. Lee, S. C. C. van der Lubbe, L. H. Goh, J. M. Valderas // Journal of Medical Internet Research. 2024. Т. 26. Ст. e54974. https://doi.org/10.2196/54974</mixed-citation></ref><ref id="B13"><mixed-citation>Lee Y. Prompt engineering in ChatGPT for literature review: practical guide exemplified with studies on white phosphors / Y. Lee, J. H. Oh, D. Lee [и др.] // Sci Rep. 2025. Т. 15. Ст. 15310. https://doi.org/10.1038/s41598-025-99423-9</mixed-citation></ref><ref id="B14"><mixed-citation>Mahmoud M. Predicting Software Engineering Trends from Scientific Papers with a Combined Framework of Clustering and Topic Modeling / M. Mahmoud, M. Mashaly, A.-E. Mervat // Proc. 2025 15th International Conference on Electrical Engineering (ICEENG). 2025. С. 1–6. https://doi.org/10.1109/ICEENG64546.2025.11031347</mixed-citation></ref><ref id="B15"><mixed-citation>Mathis W. S. 
Inductive thematic analysis of healthcare qualitative interviews using open-source large language models: How does it compare to traditional methods? / W. S. Mathis, S. Zhao, N. Pratt [и др.] // Computer Methods and Programs in Biomedicine. 2024. Т. 255. Ст. 108356. https://doi.org/10.1016/j.cmpb.2024.108356</mixed-citation></ref><ref id="B16"><mixed-citation>Meng F. Demand-side energy management reimagined: A comprehensive literature analysis leveraging large language models / F. Meng, Z. Lu, X. Li [и др.] // Energy. 2024. Т. 291. Ст. 130303. https://doi.org/10.1016/j.energy.2024.130303</mixed-citation></ref><ref id="B18"><mixed-citation>Mu Y. Addressing topic granularity and hallucination in large language models for topic modelling / Y. Mu, P. Bai, K. Bontcheva, X. Song // arXiv:2405.00611v1. URL: https://arxiv.org/html/2405.00611v1 (дата обращения: 22.08.2025).</mixed-citation></ref><ref id="B19"><mixed-citation>Mu Y. Large language models offer an alternative to the traditional approach of topic modelling / Y. Mu, C. Dong, K. Bontcheva, X. Song // Proc. LREC-COLING 2024. 2024. С. 10160–10171.</mixed-citation></ref><ref id="B20"><mixed-citation>Ogunleye B. Topic modelling through the bibliometrics lens and its technique / B. Ogunleye, B. S. Lancho Barrantes, K. I. Zakariyyah [и др.] // Artificial Intelligence Review. 2025. Т. 58. Ст. 74. https://doi.org/10.1007/s10462-024-11011-x</mixed-citation></ref><ref id="B21"><mixed-citation>Pham C. M. TopicGPT: A Prompt-based Topic Modeling Framework / C. M. Pham, A. Hoyle, S. Sun // Proc. 
of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2024). 2024. С. 2956–2984. Mexico City. Association for Computational Linguistics.</mixed-citation></ref><ref id="B22"><mixed-citation>Reuter A. GPTopic: Dynamic and interactive topic representations / A. Reuter, A. Thielmann, C. Weisser [и др.] // arXiv:2403.03628. URL: https://arxiv.org/abs/2403.03628 (дата обращения: 22.08.2025).</mixed-citation></ref><ref id="B23"><mixed-citation>Riaz A. Exploring topic modelling: a comparative analysis of traditional and transformer-based approaches with emphasis on coherence and diversity / A. Riaz, O. Abdulkader, M. J. Ikram, J. Sadaqat // International Journal of Electrical and Computer Engineering (IJECE). 2025. Т. 15, № 2. С. 1933–1948. http://doi.org/10.11591/ijece.v15i2.pp1933-1948</mixed-citation></ref><ref id="B24"><mixed-citation>Roberts M. E. STM: An R package for structural topic models / M. E. Roberts, B. M. Stewart, D. Tingley // Journal of Statistical Software. 2019. Т. 91, № 2. С. 1–40. https://doi.org/10.18637/jss.v091.i02</mixed-citation></ref><ref id="B25"><mixed-citation>Şakar S. Research Topics and Trends in Gifted Education: A Structural Topic Model / S. Şakar, S. Tan // Gifted Child Quarterly. 2025. Т. 69, № 1. С. 68–84. https://doi.org/10.1177/0016986224128504</mixed-citation></ref><ref id="B26"><mixed-citation>Sbalchiero S. Topic modeling, long texts and the best number of topics. Some problems and solutions / S. Sbalchiero, M. Eder // Quality &amp; Quantity. 2020. Т. 54. С. 1095–1108. 
https://doi.org/10.1007/s11135-020-00976-w</mixed-citation></ref><ref id="B27"><mixed-citation>Shapurian G. Large Language Models and Knowledge Graphs for Astronomical Entity Disambiguation / G. Shapurian // arXiv:2406.11400. URL: https://arxiv.org/pdf/2406.11400 (дата обращения: 22.08.2025).</mixed-citation></ref><ref id="B28"><mixed-citation>Sharma A. DeTAILS: Deep Thematic Analysis with Iterative LLM Support / A. Sharma, J. R. Wallace // Proc. of the 7th ACM Conference on Conversational User Interfaces (CUI ’25). 2025. Article 28. С. 1–7. https://doi.org/10.1145/3719160.3735657</mixed-citation></ref><ref id="B29"><mixed-citation>Shirani M. Trends and Classification of Artificial Intelligence Models Utilized in Dentistry: A Bibliometric Study / M. Shirani // Cureus. 2025. Т. 17, № 4. Ст. e81836. https://doi.org/10.7759/cureus.81836</mixed-citation></ref><ref id="B30"><mixed-citation>Silveira L. Cone Beam Computed Tomography and Artificial Intelligence. Where We Are? / L. Silveira // Rev Cient Odontol. 2024. Т. 12, № 4. Ст. e214. https://doi.org/10.21142/2523-2754-1204-2024-214</mixed-citation></ref><ref id="B31"><mixed-citation>Tarek A. Query-Based Topic Modeling and Trend Analysis in Scientific Literature / A. Tarek, M. Mahmoud, B. Afifi [и др.] // 2024 International Conference on Microelectronics (ICM). Doha, Qatar. 2024. С. 1–6. https://doi.org/10.1109/ICM63406.2024.10815706</mixed-citation></ref><ref id="B32"><mixed-citation>Torres J. PROMPTHEUS: A Human-Centered Pipeline to Streamline Systematic Literature Reviews with Large Language Models / J. Torres, C. Mulligan, J. Jorge, C. Moreira // Information. 2025. Т. 16, № 5. Ст. 420. 
https://doi.org/10.3390/info16050420</mixed-citation></ref><ref id="B33"><mixed-citation>Wu X. A survey on neural topic models: Methods, applications, and challenges / X. Wu, T. Nguyen, A. Luu // Artificial Intelligence Review. 2024. Т. 57, № 18. С. 1–30. https://doi.org/10.1007/s10462-023-10661-7</mixed-citation></ref><ref id="B34"><mixed-citation>Xie B. Artificial intelligence in dentistry: a bibliometric analysis from 2000 to 2023 / B. Xie, D. Xu, X. Q. Zou [и др.] // Journal of Dental Sciences. 2024. Т. 19. С. 1722–1733. https://doi.org/10.1016/j.jds.2023.10.025</mixed-citation></ref><ref id="B35"><mixed-citation>Zatt F. P. Artificial intelligence applications in dentistry: a bibliometric review with an emphasis on computational research trends within the field / F. P. Zatt, A. O. Rocha, L. M. Anjos [и др.] // Journal of the American Dental Association. 2024. Т. 155, № 9. С. 755–764. https://doi.org/10.1016/j.adaj.2024.05.013</mixed-citation></ref></ref-list></back></article>