<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Research Result. Theoretical and Applied Linguistics</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2024-10-4-0-5</article-id><article-id pub-id-type="publisher-id">3676</article-id><article-categories><subj-group subj-group-type="heading"><subject>Large Language Models and Prompt Engineering in Linguistics</subject></subj-group></article-categories><title-group><article-title>Combining the tasks of entity linking and relation extraction using a unified neural network model</article-title><trans-title-group xml:lang="en"><trans-title>Combining the tasks of entity linking and relation extraction using a unified neural network model</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Sboev</surname><given-names>Alexander G.</given-names></name><name xml:lang="en"><surname>Sboev</surname><given-names>Alexander G.</given-names></name></name-alternatives><email>Sboev_AG@nrcki.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Gryaznov</surname><given-names>Artem V.</given-names></name><name xml:lang="en"><surname>Gryaznov</surname><given-names>Artem V.</given-names></name></name-alternatives><email>Gryaznov_AV@nrcki.ru</email><xref ref-type="aff" rid="aff1" /></contrib></contrib-group><aff id="aff1"><institution>Kurchatov Institute National Research Center, Russia</institution></aff><pub-date 
pub-type="epub"><year>2024</year></pub-date><volume>10</volume><issue>4</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2024/4/Research_Result_4-42-108-119.pdf" /><abstract xml:lang="ru"><p>In this paper we describe methods for training neural network models that extract pharmacologically significant entities from natural-language texts, map them to the formalized terms of thesauri and specialized dictionaries, and establish relations between them. The task of extracting relevant pharmaceutical information from Internet texts is in demand in pharmacovigilance for monitoring the effects and conditions of taking medicines. The analysis of texts from the Internet is complicated by informal speech and distorted terminology, so it requires not only extracting pharmacologically relevant information but also bringing it to a standardized form. The purpose of this work is to obtain an end-to-end neural network model that solves all three tasks – entity recognition, relation extraction, and entity disambiguation – in order to avoid sequential processing of one text by independent models. We consider two approaches: generative neural networks, which produce word sequences conditioned on the input text, and extractive ones, which select and classify words and spans within the source text. The comparison showed the advantage of the extractive approach over the generative one on the considered set of tasks. Extractive models outperform the generative model by 5% (f1-micro = 85.9) in pharmaceutical entity extraction, by 10% (f1-micro = 72.8) in relation extraction, and by 4% (f1-micro = 64.5) in entity disambiguation. A joint extractive model covering all three tasks was also obtained, with f1-micro scores of 83.4, 68.2 and 57.4, respectively.
</p></abstract><trans-abstract xml:lang="en"><p>In this paper we describe methods for training neural network models that extract pharmacologically significant entities from natural-language texts, map them to the formalized terms of thesauri and specialized dictionaries, and establish relations between them. The task of extracting relevant pharmaceutical information from Internet texts is in demand in pharmacovigilance for monitoring the effects and conditions of taking medicines. The analysis of texts from the Internet is complicated by informal speech and distorted terminology, so it requires not only extracting pharmacologically relevant information but also bringing it to a standardized form. The purpose of this work is to obtain an end-to-end neural network model that solves all three tasks – entity recognition, relation extraction, and entity disambiguation – in order to avoid sequential processing of one text by independent models. We consider two approaches: generative neural networks, which produce word sequences conditioned on the input text, and extractive ones, which select and classify words and spans within the source text. The comparison showed the advantage of the extractive approach over the generative one on the considered set of tasks. Extractive models outperform the generative model by 5% (f1-micro = 85.9) in pharmaceutical entity extraction, by 10% (f1-micro = 72.8) in relation extraction, and by 4% (f1-micro = 64.5) in entity disambiguation. A joint extractive model covering all three tasks was also obtained, with f1-micro scores of 83.4, 68.2 and 57.4, respectively.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>NLP</kwd><kwd>LLM</kwd><kwd>Pharm</kwd><kwd>Entity recognition</kwd><kwd>Relation extraction</kwd><kwd>Medical concept normalization</kwd><kwd>Entity disambiguation</kwd></kwd-group><kwd-group xml:lang="en"><kwd>NLP</kwd><kwd>LLM</kwd><kwd>Pharm</kwd><kwd>Entity recognition</kwd><kwd>Relation extraction</kwd><kwd>Medical concept normalization</kwd><kwd>Entity disambiguation</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="B1"><mixed-citation>Broscheit, S. (2020). Investigating entity knowledge in BERT with simple neural end-to-end entity linking, arXiv preprint, arXiv:2003.05473. DOI: 10.18653/v1/K19-1063 (In English)</mixed-citation></ref><ref id="B2"><mixed-citation>De Cao, N., Izacard, G., Riedel, S. and Petroni, F. (2020). Autoregressive Entity Retrieval, ICLR 2021 – 9th International Conference on Learning Representations, Vienna, Austria. (In English)</mixed-citation></ref><ref id="B3"><mixed-citation>Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, arXiv:1810.04805v2. DOI: 10.48550/arXiv.1810.04805 (In English)</mixed-citation></ref><ref id="B4"><mixed-citation>Eberts, M. and Ulges, A. (2020). Span-based joint entity and relation extraction with transformer pre-training, ECAI 2020, 325, 2006–2013. DOI: 10.3233/FAIA200321 (In English)</mixed-citation></ref><ref id="B5"><mixed-citation>Kondragunta, M., Perez-de-Viñaspre, O. and Oronoz, M. (2023). Improving and simplifying template-based named entity recognition, Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, Dubrovnik, Croatia, 79–86. 
DOI: 10.18653/v1/2023.eacl-srw.8 (In English)</mixed-citation></ref><ref id="B6"><mixed-citation>Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H. and Kang, J. (2019). BioBERT: A pre-trained biomedical language representation model for biomedical text mining, Bioinformatics, 36 (4), 1234–1240. DOI: 10.1093/bioinformatics/btz682 (In English)</mixed-citation></ref><ref id="B7"><mixed-citation>Lin, C., Lou, Y. S., Tsai, D. J., Lee, C. C., Hsu, C. J., Wu, D. C., Wang, M. C. and Fang, W. H. (2019). Projection word embedding model with hybrid sampling training for classifying ICD-10-CM codes: Longitudinal observational study, JMIR medical informatics, 7 (3), e14499. DOI: 10.2196/14499 (In English)</mixed-citation></ref><ref id="B8"><mixed-citation>Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L. and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint. DOI: 10.48550/arXiv.1907.11692 (In English)</mixed-citation></ref><ref id="B9"><mixed-citation>Liu, P., Guo, Y., Wang, F. and Li, G. (2022). Chinese named entity recognition: The state of the art, Neurocomputing, 473, 37–53. DOI: 10.1016/j.neucom.2021.10.101 (In English)</mixed-citation></ref><ref id="B10"><mixed-citation>Mondal, I., Purkayastha, S., Sarkar, S., Goyal, P., Pillai, J., Bhattacharyya, A. and Gattu, M. (2019). Medical entity linking using triplet network, Proceedings of the 2nd Clinical Natural Language Processing Workshop, Minneapolis, Minnesota, USA, 95–100. 
DOI: 10.18653/v1/W19-1912 (In English)</mixed-citation></ref><ref id="B11"><mixed-citation>Pattisapu, N., Anand, V., Patil, S., Palshikar, G. and Varma, V. (2020). Distant supervision for medical concept normalization, Journal of biomedical informatics, 109, 103522. DOI: 10.1016/j.jbi.2020.103522 (In English)</mixed-citation></ref><ref id="B12"><mixed-citation>Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W. and Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer, Journal of machine learning research, 21 (140), 1–67. (In English)</mixed-citation></ref><ref id="B13"><mixed-citation>Sahu, S. K. and Ashish, A. (2018). Drug-drug interaction extraction from biomedical texts using long short-term memory network, Journal of biomedical informatics, 86, 15–24. DOI: 10.1016/j.jbi.2018.08.005 (In English)</mixed-citation></ref><ref id="B14"><mixed-citation>Sakhovskiy, A., Semenova, N., Kadurin, A. and Tutubalina, E. (2023). Graph-enriched biomedical entity representation transformer, Experimental IR Meets Multilinguality, Multimodality, and Interaction, CLEF 2023, Lecture Notes in Computer Science, 14163, Springer, Cham. DOI: 10.1007/978-3-031-42448-9_10 (In English)</mixed-citation></ref><ref id="B15"><mixed-citation>Sboev, A., Sboeva, S., Moloshnikov, I., Gryaznov, A., Rybka, R., Naumov, A., Selivanov, A., Rylkov, G. and Ilyin, V. (2022a). Analysis of the full-size Russian corpus of internet drug reviews with complex NER labeling using deep learning neural networks and language models, Applied Sciences, 12 (1), 491. 
DOI: 10.3390/app12010491 (In English)</mixed-citation></ref><ref id="B16"><mixed-citation>Sboev, A., Rybka, R., Gryaznov, A., Moloshnikov, I., Sboeva, S., Rylkov, G. and Selivanov, A. (2022b). Adverse Drug Reaction Concept Normalization in Russian-Language Reviews of Internet Users, Big Data and Cognitive Computing, 6 (4), 145. DOI: 10.3390/bdcc6040145 (In English)</mixed-citation></ref><ref id="B17"><mixed-citation>Sboev, A., Rybka, R., Selivanov, A., Moloshnikov, I., Gryaznov, A., Naumov, A., Sboeva, S., Rylkov, G. and Zakirova, S. (2023). Accuracy analysis of the end-to-end extraction of related named entities from Russian drug review texts by modern approaches validated on English Biomedical corpora, Mathematics, 11 (2), 354. DOI: 10.3390/math11020354 (In English)</mixed-citation></ref><ref id="B18"><mixed-citation>Tutubalina, E., Alimova, I., Miftahutdinov, Z., Sakhovskiy, A., Malykh, V. and Nikolenko, S. (2021). The Russian Drug Reaction Corpus and neural models for drug reactions and effectiveness detection in user reviews, Bioinformatics, 37 (2), 243–249. DOI: 10.1093/bioinformatics/btaa675 (In English)</mixed-citation></ref><ref id="B19"><mixed-citation>Yuan, Z., Zhao, Z., Sun, H., Li, J., Wang, F. and Yu, S. (2022). CODER: Knowledge-infused cross-lingual medical term embedding for term normalization, Journal of biomedical informatics, 126, 103983. DOI: 10.1016/j.jbi.2021.103983 (In English)</mixed-citation></ref><ref id="B20"><mixed-citation>Zhou, W., Huang, K., Ma, T. and Huang, J. (2021). 
Document-level relation extraction with adaptive thresholding and localized context pooling, Proceedings of the AAAI conference on artificial intelligence, 35 (16), 14612–14620. DOI: 10.1609/aaai.v35i16.17717 (In English)</mixed-citation></ref><ref id="B21"><mixed-citation>Zmitrovich, D., Abramov, A., Kalmykov, A., Tikhonova, M., Taktasheva, E., Astafurov, D., Baushenko, M., Snegirev, A., Kadulin, V., Markov, S. and Shavrina, T. (2023). A family of pretrained transformer language models for Russian, arXiv preprint, arXiv:2309.10931. DOI: 10.48550/arXiv.2309.10931 (In English)</mixed-citation></ref></ref-list></back></article>