
2313-8912

Research Result. Theoretical and Applied Linguistics

10.18413/2313-8912-2025-11-3-0-5

3879

APPLIED LINGUISTICS

<strong>From STM to GPT: A Comparative Study of Topic Modeling Methods for AI in Dentistry</strong>

Tatiana A. Litvinova, centr_rus_yaz@mail.ru

Yury A. Ippolitov, dsvgma@mail.ru

Pavel V. Seredin, paul@phys.vsu.ru

N.N. Burdenko Voronezh State Medical University, Voronezh, Russia

Voronezh State University, Voronezh, Russia

Voronezh State Pedagogical University, Russia

2025

11300

This study analyzes scientific abstracts on the application of artificial intelligence (AI) in dentistry using several topic modeling methods. We compiled and analyzed a corpus of 3,170 abstracts of scientific articles published in 2019–2025 in outlets indexed in the Dimensions and Scopus databases. Three approaches to topic modeling were compared: structural topic modeling (STM), a probabilistic model that allows temporal trends to be analyzed; embedding-based clustering with the Leiden algorithm, a stable alternative to BERTopic; and GPT-4o-based modeling without model training. A set of metrics was used to assess topic quality. STM yielded the most compact and clearly separated topic structure; GPT proved effective for generating topic names and short descriptions but showed greater thematic overlap and less distinct boundaries between topics. We also performed a consistent alignment of topics in a single GPT space and identified both stable and model-specific topics, as well as shared temporal trends. The results underscore the value of combining classical probabilistic models with LLM capabilities to achieve optimal topic modeling quality. Although GPT-4o improves interpretability, it should not be used as the sole method for topic analysis. The proposed hybrid approach is a scalable and reproducible strategy for conducting literature reviews in rapidly developing research fields.

This study presents a comprehensive topic modeling analysis of scientific abstracts in the field of artificial intelligence (AI) applied to dentistry. We compiled and analyzed 3,170 peer-reviewed abstracts published between 2019 and 2025 from the Dimensions and Scopus databases. Three complementary approaches were compared: (1) Structural Topic Modeling (STM), a probabilistic framework incorporating publication year as a covariate to enable temporal trend analysis; (2) embedding-based clustering using the Leiden algorithm on OpenAI text embeddings; and (3) zero-shot GPT-based topic modeling, in which GPT-4o generated topics, descriptions, and keywords directly from batches of abstracts without model training. Topic quality was evaluated using compactness, distinctiveness, silhouette scores, and label redundancy. STM consistently produced the most compact and well-separated topics, embedding-based clustering excelled in identifying discrete semantic groupings, and GPT-based modeling provided interpretable, human-readable labels but exhibited greater thematic overlap. To ensure comparability across methods, we introduced a two-layer alignment framework that integrates topic-level similarity with document-level consensus, enabling robust cross-model comparison. Using this framework, we identified stable topics consistently recovered across methods (e.g., caries detection, radiographic AI diagnostics) as well as method-specific themes. Temporal trend analysis in this shared space revealed a clear shift from foundational AI methods (e.g., image segmentation, image enhancement) toward applied and integrative areas, including large language model applications, patient-facing tools, and AI in clinical education. Our results underscore the value of combining classical probabilistic models with modern large language model (LLM) tools for optimal topic modeling performance. 
While GPT-4o enhances interpretability, it should not be used in isolation for mapping thematic structures in scientific literature, at least not without pre-screening and prompt experimentation. Overall, our findings demonstrate the importance of hybrid topic modeling for mapping thematic structures in fast-evolving scientific domains, with dentistry serving as a case study.
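The abstract lists compactness and distinctiveness among the topic quality metrics but does not define them here. As a rough illustration only, the sketch below assumes compactness is the mean cosine similarity of each document embedding to its topic centroid, and distinctiveness is one minus the mean pairwise cosine similarity between topic centroids. These definitions, the function names, and the toy vectors and labels are all assumptions made for the example, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def topic_quality(embeddings, labels):
    """Return (compactness, distinctiveness) for a hard topic assignment.

    Assumed definitions (not taken from the paper):
      compactness     = mean cosine similarity of documents to their topic centroid
      distinctiveness = 1 - mean pairwise cosine similarity between topic centroids
    """
    # Group document vectors by their assigned topic label.
    groups = {}
    for vec, lab in zip(embeddings, labels):
        groups.setdefault(lab, []).append(vec)
    centroids = {lab: centroid(vecs) for lab, vecs in groups.items()}

    # Compactness: how tightly documents sit around their own centroid.
    doc_sims = [cosine(vec, centroids[lab]) for vec, lab in zip(embeddings, labels)]
    compactness = sum(doc_sims) / len(doc_sims)

    # Distinctiveness: how far apart the topic centroids are from each other.
    topics = sorted(centroids)
    pair_sims = [
        cosine(centroids[a], centroids[b])
        for i, a in enumerate(topics)
        for b in topics[i + 1:]
    ]
    distinctiveness = (1.0 - sum(pair_sims) / len(pair_sims)) if pair_sims else 0.0
    return compactness, distinctiveness

# Hypothetical 2-D "embeddings" for four abstracts assigned to two topics.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = ["caries", "caries", "llm", "llm"]
compactness, distinctiveness = topic_quality(toy_embeddings, toy_labels)
print(f"compactness={compactness:.3f}, distinctiveness={distinctiveness:.3f}")
```

Under these assumed definitions, the same function can be applied to the hard assignments produced by each of the three methods, so the scores are computed in one shared embedding space rather than in each model's own representation.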

Topic modeling; structural topic modeling; GPT-4; large language models; AI in dentistry; topic alignment; embedding-based clustering; topic extraction using large language models; scientific literature analysis

Topic Modeling; Structural Topic Model; GPT-4; Large Language Models; Dentistry; AI in Dentistry; LLM in Dentistry; Scientometric Analysis; Embedding-Based Clustering; Zero-Shot Topic Modeling; Bibliometric methods

The study was carried out at Voronezh State University with the support of the Russian Science Foundation, grant No. 23-15-00060.
