<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Research Result. Theoretical and Applied Linguistics</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2025-11-3-0-5</article-id><article-id pub-id-type="publisher-id">3879</article-id><article-categories><subj-group subj-group-type="heading"><subject>APPLIED LINGUISTICS</subject></subj-group></article-categories><title-group><article-title>From STM to GPT: A Comparative Study of Topic Modeling Methods for AI in Dentistry</article-title><trans-title-group xml:lang="en"><trans-title>From STM to GPT: A Comparative Study of Topic Modeling Methods for AI in Dentistry</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Litvinova</surname><given-names>Tatiana A.</given-names></name><name xml:lang="en"><surname>Litvinova</surname><given-names>Tatiana A.</given-names></name></name-alternatives><email>centr_rus_yaz@mail.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Ippolitov</surname><given-names>Yury A.</given-names></name><name xml:lang="en"><surname>Ippolitov</surname><given-names>Yury A.</given-names></name></name-alternatives><email>dsvgma@mail.ru</email><xref ref-type="aff" rid="aff2" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Seredin</surname><given-names>Pavel V.</given-names></name><name xml:lang="en"><surname>Seredin</surname><given-names>Pavel 
V.</given-names></name></name-alternatives><email>paul@phys.vsu.ru</email><xref ref-type="aff" rid="aff3" /></contrib></contrib-group><aff id="aff3"><institution>Voronezh State University, Voronezh, Russia</institution></aff><aff id="aff2"><institution>N.N. Burdenko Voronezh State Medical University, Voronezh, Russia</institution></aff><aff id="aff1"><institution>Voronezh State Pedagogical University, Russia</institution></aff><pub-date pub-type="epub"><year>2025</year></pub-date><volume>11</volume><issue>3</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2025/3/Лингвистика_11_3-86-122.pdf" /><abstract xml:lang="ru"><p>This study presents a comprehensive topic modeling analysis of scientific abstracts in the field of artificial intelligence (AI) applied to dentistry. We compiled and analyzed 3,170 peer-reviewed abstracts published between 2019 and 2025 from the Dimensions and Scopus databases. Three complementary approaches were compared: (1) Structural Topic Modeling (STM), a probabilistic framework incorporating publication year as a covariate to enable temporal trend analysis; (2) embedding-based clustering using the Leiden algorithm on OpenAI text embeddings; and (3) zero-shot GPT-based topic modeling, in which GPT-4o generated topics, descriptions, and keywords directly from batches of abstracts without model training. Topic quality was evaluated using compactness, distinctiveness, silhouette scores, and label redundancy. STM consistently produced the most compact and well-separated topics, embedding-based clustering excelled in identifying discrete semantic groupings, and GPT-based modeling provided interpretable, human-readable labels but exhibited greater thematic overlap. To ensure comparability across methods, we introduced a two-layer alignment framework that integrates topic-level similarity with document-level consensus, enabling robust cross-model comparison. 
Using this framework, we identified stable topics consistently recovered across methods (e.g., caries detection, radiographic AI diagnostics) as well as method-specific themes. Temporal trend analysis in this shared space revealed a clear shift from foundational AI methods (e.g., image segmentation, image enhancement) toward applied and integrative areas, including large language model applications, patient-facing tools, and AI in clinical education. Our results underscore the value of combining classical probabilistic models with modern large language model (LLM) tools for optimal topic modeling performance. While GPT-4o enhances interpretability, it should not be used in isolation for mapping thematic structures in scientific literature, at least not without pre-screening and prompt experimentation. Overall, our findings demonstrate the importance of hybrid topic modeling for mapping thematic structures in fast-evolving scientific domains, with dentistry serving as a case study.
</p></abstract><trans-abstract xml:lang="en"><p>This study presents a comprehensive topic modeling analysis of scientific abstracts in the field of artificial intelligence (AI) applied to dentistry. We compiled and analyzed 3,170 peer-reviewed abstracts published between 2019 and 2025 from the Dimensions and Scopus databases. Three complementary approaches were compared: (1) Structural Topic Modeling (STM), a probabilistic framework incorporating publication year as a covariate to enable temporal trend analysis; (2) embedding-based clustering using the Leiden algorithm on OpenAI text embeddings; and (3) zero-shot GPT-based topic modeling, in which GPT-4o generated topics, descriptions, and keywords directly from batches of abstracts without model training. Topic quality was evaluated using compactness, distinctiveness, silhouette scores, and label redundancy. STM consistently produced the most compact and well-separated topics, embedding-based clustering excelled in identifying discrete semantic groupings, and GPT-based modeling provided interpretable, human-readable labels but exhibited greater thematic overlap. To ensure comparability across methods, we introduced a two-layer alignment framework that integrates topic-level similarity with document-level consensus, enabling robust cross-model comparison. Using this framework, we identified stable topics consistently recovered across methods (e.g., caries detection, radiographic AI diagnostics) as well as method-specific themes. Temporal trend analysis in this shared space revealed a clear shift from foundational AI methods (e.g., image segmentation, image enhancement) toward applied and integrative areas, including large language model applications, patient-facing tools, and AI in clinical education. Our results underscore the value of combining classical probabilistic models with modern large language model (LLM) tools for optimal topic modeling performance. 
While GPT-4o enhances interpretability, it should not be used in isolation for mapping thematic structures in scientific literature, at least not without pre-screening and prompt experimentation. Overall, our findings demonstrate the importance of hybrid topic modeling for mapping thematic structures in fast-evolving scientific domains, with dentistry serving as a case study.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>Topic Modeling</kwd><kwd>Structural Topic Model</kwd><kwd>GPT-4</kwd><kwd>Large Language Models</kwd><kwd>Dentistry</kwd><kwd>AI in Dentistry</kwd><kwd>LLM in Dentistry</kwd><kwd>Scientometric Analysis</kwd><kwd>Embedding-Based Clustering</kwd><kwd>Zero-Shot Topic Modeling</kwd><kwd>Bibliometric methods</kwd></kwd-group><kwd-group xml:lang="en"><kwd>Topic Modeling</kwd><kwd>Structural Topic Model</kwd><kwd>GPT-4</kwd><kwd>Large Language Models</kwd><kwd>Dentistry</kwd><kwd>AI in Dentistry</kwd><kwd>LLM in Dentistry</kwd><kwd>Scientometric Analysis</kwd><kwd>Embedding-Based Clustering</kwd><kwd>Zero-Shot Topic Modeling</kwd><kwd>Bibliometric methods</kwd></kwd-group></article-meta></front><back><ack><p>The study was supported by the Russian Science Foundation, grant number 23-15-00060.</p></ack><ref-list><title>References</title><ref id="B1"><mixed-citation>Allani, H., Santos, A. T. and Ribeiro-Vidal, H. (2024). Multidisciplinary Applications of AI in Dentistry: Bibliometric Review, Applied Sciences, 14 (17), 7624. https://doi.org/10.3390/app14177624 (In English)</mixed-citation></ref><ref id="B2"><mixed-citation>Benz, P., Pradier, C., Kozlowski, D., Shokida, N. S. and Larivière, V. (2025). Mapping the unseen in practice: comparing latent Dirichlet allocation and BERTopic for navigating topic spaces, Scientometrics, 10 June 2025. https://doi.org/10.1007/s11192-025-05339-6 (In English)</mixed-citation></ref><ref id="B3"><mixed-citation>Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation, J. Mach. Learn. Res., 3, 993–1022. (In English)</mixed-citation></ref><ref id="B4"><mixed-citation>Büttner, M., Leser, U., Schneider, L. and Schwendicke, F. (2024). 
Natural Language Processing: Chances and Challenges in Dentistry, Journal of Dentistry, 141, 104796. https://doi.org/10.1016/j.jdent.2023.104796 (In English)</mixed-citation></ref><ref id="B5"><mixed-citation>Cosola, Di M., Ballini, A., Prencipe, F. A., De Tullio, F., Fabrocini, A., Cupelli, P., Rignani, M., Cazzolla, A. P., Bizzoca, M. E. and Musella, G. (2025). Artificial intelligence in dentistry: a narrative review of applications, challenges, and future directions, Minerva Dent Oral Sci, Jun 18. https://doi.org/10.23736/S2724-6329.25.05217-9 (In English)</mixed-citation></ref><ref id="B6"><mixed-citation>de Magalhães, A. A. and Santos, A. T. (2025). Advancements in Diagnostic Methods and Imaging Technologies in Dentistry: A Literature Review of Emerging Approaches, Journal of Clinical Medicine, 14 (4), 1277. https://doi.org/10.3390/jcm14041277 (In English)</mixed-citation></ref><ref id="B7"><mixed-citation>Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure, arXiv preprint arXiv:2203.05794 [Online], available at https://arxiv.org/abs/2203.05794 (accessed 22.08.2025) (In English)</mixed-citation></ref><ref id="B8"><mixed-citation>Hu, J., Miao, C., Wu, Yi and Su, J. (2025). Advances in hydrological research in China over the past two decades: Insights from advanced large language model and topic modeling, Fundamental Research [Online], available at https://doi.org/10.1016/j.fmre.2025.05.002 (accessed 10.05.2025) (In English)</mixed-citation></ref><ref id="B9"><mixed-citation>Islam, K. M. S. (2025). 
Contextual Embedding-based Clustering to Identify Topics for Healthcare Service Improvement, arXiv:2504.14068, https://doi.org/10.48550/arXiv.2504.14068 (In English)</mixed-citation></ref><ref id="B10"><mixed-citation>Jung, H. S., Lee, H., Woo, Y. S., Baek, S. Y. and Kim, J. H. (2024). Expansive data, extensive model: Investigating discussion topics around LLM through unsupervised machine learning in academic papers and news, PLOS ONE, 19 (5), e0304680. https://doi.org/10.1371/journal.pone.0304680 (In English)</mixed-citation></ref><ref id="B11"><mixed-citation>Kozlowski, D. (2024). Generative AI for automatic topic labelling, arXiv:2408.07003. https://doi.org/10.48550/arxiv.2408.07003 (In English)</mixed-citation></ref><ref id="B12"><mixed-citation>Lee, V. V., van der Lubbe, S. C. C., Goh, L. H. and Valderas, J. M. (2024). Harnessing ChatGPT for Thematic Analysis: Are We Ready?, Journal of Medical Internet Research, 26, e54974. https://doi.org/10.2196/54974 (In English)</mixed-citation></ref><ref id="B13"><mixed-citation>Lee, Y., Oh, J. H., Lee, D., Kang, M. and Lee, S. (2025). Prompt engineering in ChatGPT for literature review: practical guide exemplified with studies on white phosphors, Sci Rep, 15, 15310. https://doi.org/10.1038/s41598-025-99423-9 (In English)</mixed-citation></ref><ref id="B14"><mixed-citation>Mahmoud, M., Mashaly, M. and Mervat, A.-E. (2025). Predicting Software Engineering Trends from Scientific Papers with a Combined Framework of Clustering and Topic Modeling, in 2025 15th International Conference on Electrical Engineering (ICEENG), 1–6. 
https://doi.org/10.1109/ICEENG64546.2025.11031347 (In English)</mixed-citation></ref><ref id="B15"><mixed-citation>Mathis, W. S., Zhao, S., Pratt, N., Weleff, J. and De Paoli, S. (2024). Inductive thematic analysis of healthcare qualitative interviews using open-source large language models: How does it compare to traditional methods?, Computer Methods and Programs in Biomedicine, 255. https://doi.org/10.1016/j.cmpb.2024.108356 (In English)</mixed-citation></ref><ref id="B16"><mixed-citation>Meng, F., Lu, Z., Li, X., Han, W., Peng, J., Liu, X. and Niu, Z. (2024). Demand-side energy management reimagined: A comprehensive literature analysis leveraging large language models, Energy, 291, 130303. https://doi.org/10.1016/j.energy.2024.130303 (In English)</mixed-citation></ref><ref id="B17"><mixed-citation>Mu, Y., Bai, P., Bontcheva, K. and Song, X. (2024a). Addressing topic granularity and hallucination in large language models for topic modelling, arXiv preprint arXiv:2405.00611v1 [Online], available at https://arxiv.org/html/2405.00611v1 (accessed 22.08.2025) (In English)</mixed-citation></ref><ref id="B18"><mixed-citation>Mu, Y., Dong, C., Bontcheva, K. and Song, X. (2024b). Large language models offer an alternative to the traditional approach of topic modelling, in LREC-COLING 2024, ELRA Language Resource Association, 2024, 10160–10171. (In English)</mixed-citation></ref><ref id="B19"><mixed-citation>Ogunleye, B., Lancho Barrantes, B. S. and Zakariyyah, K. I. (2025). Topic modelling through the bibliometrics lens and its technique, Artif Intell Rev, 58, 74. 
https://doi.org/10.1007/s10462-024-11011-x (In English)</mixed-citation></ref><ref id="B20"><mixed-citation>Pham, C. M., Hoyle, A., Sun, S., Resnik, P. and Iyyer, M. (2024). TopicGPT: A Prompt-based Topic Modeling Framework, in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2956–2984, Mexico City, Mexico. Association for Computational Linguistics. (In English)</mixed-citation></ref><ref id="B21"><mixed-citation>Reuter, A., Thielmann, A., Weisser, C., Fischer, S. and Säfken, B. (2024). GPTopic: Dynamic and interactive topic representations, arXiv preprint arXiv:2403.03628 [Online], available at https://arxiv.org/abs/2403.03628 (accessed 22.08.2025) (In English)</mixed-citation></ref><ref id="B22"><mixed-citation>Riaz, A., Abdulkader, O., Ikram, M. J. and Sadaqat, J. (2025). Exploring topic modelling: a comparative analysis of traditional and transformer-based approaches with emphasis on coherence and diversity, International Journal of Electrical and Computer Engineering (IJECE), [S.l.], 15 (2), 1933–1948. http://doi.org/10.11591/ijece.v15i2.pp1933-1948 (In English)</mixed-citation></ref><ref id="B23"><mixed-citation>Roberts, M. E., Stewart, B. M. and Tingley, D. (2019). stm: An R package for structural topic models, J. Stat. Softw., 91 (2), 1–40. https://doi.org/10.18637/jss.v091.i02 (In English)</mixed-citation></ref><ref id="B24"><mixed-citation>Şakar, S. and Tan, S. (2025). Research Topics and Trends in Gifted Education: A Structural Topic Model, Gifted Child Quarterly, 69 (1), 68–84. 
https://doi.org/10.1177/0016986224128504 (In English)</mixed-citation></ref><ref id="B25"><mixed-citation>Sbalchiero, S. and Eder, M. (2020). Topic modeling, long texts and the best number of topics. Some Problems and solutions, Qual Quant, 54, 1095–1108. https://doi.org/10.1007/s11135-020-00976-w (In English)</mixed-citation></ref><ref id="B26"><mixed-citation>Shapurian, G. (2024). Large Language Models and Knowledge Graphs for Astronomical Entity Disambiguation, arXiv:2406.11400 [Online], available at https://arxiv.org/pdf/2406.11400 (accessed 22.08.2025) (In English)</mixed-citation></ref><ref id="B27"><mixed-citation>Sharma, A. and Wallace, J. R. (2025). DeTAILS: Deep Thematic Analysis with Iterative LLM Support, in Proceedings of the 7th ACM Conference on Conversational User Interfaces (CUI '25). Association for Computing Machinery, New York, NY, USA, Article 28, 1–7. https://doi.org/10.1145/3719160.3735657 (In English)</mixed-citation></ref><ref id="B28"><mixed-citation>Shirani, M. (2025). Trends and Classification of Artificial Intelligence Models Utilized in Dentistry: A Bibliometric Study, Cureus, 17 (4), e81836. https://doi.org/10.7759/cureus.81836 (In English)</mixed-citation></ref><ref id="B29"><mixed-citation>Silveira, L. (2024). Cone Beam Computed Tomography and Artificial Intelligence. ¿Where We Are?, Rev Cient Odontol, 12 (4), e214. https://doi.org/10.21142/2523-2754-1204-2024-214 (In English)</mixed-citation></ref><ref id="B30"><mixed-citation>Tarek, A., Mahmoud, M., Afifi, B., Mashaly, M. and Abu-Elkheir, M. (2024). Query-Based Topic Modeling and Trend Analysis in Scientific Literature, in 2024 International Conference on Microelectronics (ICM), Doha, Qatar, 2024, 1–6. 
https://doi.org/10.1109/ICM63406.2024.10815706 (In English)</mixed-citation></ref><ref id="B31"><mixed-citation>Torres, J., Mulligan, C., Jorge, J. and Moreira, C. (2025). PROMPTHEUS: A Human-Centered Pipeline to Streamline Systematic Literature Reviews with Large Language Models, Information, 16 (5), 420. https://doi.org/10.3390/info16050420 (In English)</mixed-citation></ref><ref id="B32"><mixed-citation>Wu, X., Nguyen, T. and Luu, A. T. (2024). A survey on neural topic models: Methods, applications, and challenges, Artif Intell Rev, 57 (18), 1–30. https://doi.org/10.1007/s10462-023-10661-7 (In English)</mixed-citation></ref><ref id="B33"><mixed-citation>Xie, B., Xu, D., Zou, X. Q., Lu, M. J., Peng, X. L. and Wen, X. J. (2024). Artificial intelligence in dentistry: a bibliometric analysis from 2000 to 2023, J Dent Sci., 19, 1722–33. https://doi.org/10.1016/j.jds.2023.10.025 (In English)</mixed-citation></ref><ref id="B34"><mixed-citation>Zatt, F. P., Rocha, A. O., Anjos, L. M., Caldas, R. A., Cardoso, M. and Rabelo, G. D. (2024). Artificial intelligence applications in dentistry: a bibliometric review with an emphasis on computational research trends within the field, J Am Dent Assoc., 155 (9), 755–764. https://doi.org/10.1016/j.adaj.2024.05.013 (In English)</mixed-citation></ref></ref-list></back></article>