<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Research Result. Theoretical and Applied Linguistics</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2024-10-4-0-3</article-id><article-id pub-id-type="publisher-id">3674</article-id><article-categories><subj-group subj-group-type="heading"><subject>Large Language Models and Prompt Engineering in Linguistics</subject></subj-group></article-categories><title-group><article-title>Prompt injection – the problem of linguistic vulnerabilities of large language models at the present stage</article-title><trans-title-group xml:lang="en"><trans-title>Prompt injection – the problem of linguistic vulnerabilities of large language models at the present stage</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Zyryanova</surname><given-names>Irina N.</given-names></name><name xml:lang="en"><surname>Zyryanova</surname><given-names>Irina N.</given-names></name></name-alternatives><email>Irina_zyr@mail.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Chernavskiy</surname><given-names>Alexander S.</given-names></name><name xml:lang="en"><surname>Chernavskiy</surname><given-names>Alexander S.</given-names></name></name-alternatives><email>Chernavskiy.com@gmail.com</email><xref ref-type="aff" rid="aff2" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Trubachev</surname><given-names>Stanislav O.</given-names></name><name 
xml:lang="en"><surname>Trubachev</surname><given-names>Stanislav O.</given-names></name></name-alternatives><email>brandei@yandex.ru</email><xref ref-type="aff" rid="aff3" /></contrib></contrib-group><aff id="aff3"><institution>OPS Guru LLC</institution></aff><aff id="aff2"><institution>Federal State Budgetary Educational Institution of Higher Education «Moscow Pedagogical State University»</institution></aff><aff id="aff1"><institution>The Federal State Budget Educational Institution of Higher Education «Baikal State University»</institution></aff><pub-date pub-type="epub"><year>2024</year></pub-date><volume>10</volume><issue>4</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2024/4/Research_Result_4-42-41-66.pdf" /><abstract xml:lang="ru"><p>The article examines the phenomenon of “prompt injection” in the context of contemporary large language models (LLMs), elucidating a significant challenge for AI developers and researchers. The study comprises a theoretical and methodological review of scholarly publications, thereby enhancing the comprehension of the present state of research in this field. The authors present the findings of a case study, which employs a comparative analysis of the linguistic vulnerabilities of prominent LLMs, including ChatGPT 4.0, Claude 3.5, and YandexGPT. The study employs experimental evaluation to assess the resilience of these models against a range of attack vectors, with the objective of determining the extent to which each model resists manipulative prompts designed to exploit their linguistic capabilities. A taxonomy of prompt injection attack types was developed based on the collected data, with classification according to effectiveness and targeting of specific LLMs. This classification facilitates comprehension of the nature of these vulnerabilities and provides a basis for future research in this field. 
Moreover, the article offers suggestions for bolstering the resilience of language models against malicious manipulations, representing a significant stride towards the development of safer and more ethical AI systems. These recommendations are based on empirical data and aim to provide practical guidance for developers seeking to enhance the resilience of their models against potential threats. The research findings extend our understanding of linguistic vulnerabilities in LLMs, while also contributing to the development of more effective defence strategies. These findings have practical implications for the deployment of LLMs across various domains, including education, healthcare and customer service. The authors emphasise the necessity for continuous monitoring and improvement of language model security in an ever-evolving technological landscape. The findings suggest the need for an ongoing dialogue among stakeholders to address issues pertaining to prompt injection.
</p></abstract><trans-abstract xml:lang="en"><p>The article examines the phenomenon of “prompt injection” in the context of contemporary large language models (LLMs), elucidating a significant challenge for AI developers and researchers. The study comprises a theoretical and methodological review of scholarly publications, thereby enhancing the comprehension of the present state of research in this field. The authors present the findings of a case study, which employs a comparative analysis of the linguistic vulnerabilities of prominent LLMs, including ChatGPT 4.0, Claude 3.5, and YandexGPT. The study employs experimental evaluation to assess the resilience of these models against a range of attack vectors, with the objective of determining the extent to which each model resists manipulative prompts designed to exploit their linguistic capabilities. A taxonomy of prompt injection attack types was developed based on the collected data, with classification according to effectiveness and targeting of specific LLMs. This classification facilitates comprehension of the nature of these vulnerabilities and provides a basis for future research in this field. Moreover, the article offers suggestions for bolstering the resilience of language models against malicious manipulations, representing a significant stride towards the development of safer and more ethical AI systems. These recommendations are based on empirical data and aim to provide practical guidance for developers seeking to enhance the resilience of their models against potential threats. The research findings extend our understanding of linguistic vulnerabilities in LLMs, while also contributing to the development of more effective defence strategies. These findings have practical implications for the deployment of LLMs across various domains, including education, healthcare and customer service. 
The authors emphasise the necessity for continuous monitoring and improvement of language model security in an ever-evolving technological landscape. The findings suggest the need for an ongoing dialogue among stakeholders to address issues pertaining to prompt injection.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>Prompt injection</kwd><kwd>Large language models</kwd><kwd>LLM</kwd><kwd>LLM vulnerabilities</kwd><kwd>LLM jailbreak</kwd><kwd>security of AI</kwd><kwd>Linguistic attacks on LLM</kwd><kwd>Prompts security</kwd></kwd-group><kwd-group xml:lang="en"><kwd>Prompt injection</kwd><kwd>Large language models</kwd><kwd>LLM</kwd><kwd>LLM vulnerabilities</kwd><kwd>LLM jailbreak</kwd><kwd>security of AI</kwd><kwd>Linguistic attacks on LLM</kwd><kwd>Prompts security</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="B1"><mixed-citation>Chang, Z., Li, M., Liu, Y., Wang, J., Wang, Q. and Liu, Y. (2024). Play guessing game with LLM: Indirect jailbreak attack with implicit clues, arXiv preprint arXiv:2402.09091. https://doi.org/10.48550/arXiv.2402.09091 (In English)</mixed-citation></ref><ref id="B2"><mixed-citation>Chen, S., Zharmagambetov, A., Mahloujifar, S., Chaudhuri, K. and Guo, C. (2024). Aligning LLMs to Be Robust Against Prompt Injection, arXiv preprint arXiv:2410.05451. https://doi.org/10.48550/arXiv.2410.05451 (In English)</mixed-citation></ref><ref id="B3"><mixed-citation>Duan, M., Suri, A., Mireshghallah, N., Min, S., Shi, W., Zettlemoyer, L., Tsvetkov, Yu., Choi, Y., Evans, D. and Hajishirzi, H. (2024). Do membership inference attacks work on large language models?, arXiv preprint arXiv:2402.07841. https://doi.org/10.48550/arXiv.2402.07841 (In English)</mixed-citation></ref><ref id="B4"><mixed-citation>Hines, K., Lopez, G., Hall, M., Zarfati, F., Zunger, Y. and Kiciman, E. (2024). Defending Against Indirect Prompt Injection Attacks with Spotlighting, arXiv preprint arXiv:2403.14720. 
https://doi.org/10.48550/arXiv.2403.14720 (In English)</mixed-citation></ref><ref id="B5"><mixed-citation>Khandelwal, U., Levy, O., Jurafsky, D., Zettlemoyer, L. and Lewis, M. (2019). Generalization through memorization: Nearest neighbor language models, arXiv preprint arXiv:1911.00172. https://doi.org/10.48550/arXiv.1911.00172 (In English)</mixed-citation></ref><ref id="B6"><mixed-citation>Kumar, S. S., Cummings, M. L. and Stimpson, A. (2024, May). Strengthening LLM trust boundaries: A survey of prompt injection attacks, 2024 IEEE 4th International Conference on Human-Machine Systems (ICHMS), 1–6, available at: https://www.researchgate.net/profile/Missy-Cummings/publication/378072627_Strengthening_LLM_Trust_Boundaries_A_Survey_of_Prompt_Injection_Attacks/links/65c57ac379007454976ae142/Strengthening-LLM-Trust-Boundaries-A-Survey-of-Prompt-Injection-Attacks.pdf/ (Accessed 29.06.2024). https://doi.org/10.1109/ICHMS59971.2024.10555871 (In English)</mixed-citation></ref><ref id="B7"><mixed-citation>Li, X., Wang, R., Cheng, M., Zhou, T. and Hsieh, C. J. (2024). Drattack: Prompt decomposition and reconstruction makes powerful llm jailbreakers, arXiv preprint arXiv:2402.16914. https://doi.org/10.48550/arXiv.2402.16914 (In English)</mixed-citation></ref><ref id="B8"><mixed-citation>Liu, X., Yu, Z., Zhang, Y., Zhang, N. and Xiao, C. (2024). Automatic and universal prompt injection attacks against large language models, arXiv preprint arXiv:2403.04957. https://doi.org/10.48550/arXiv.2403.04957 (In English)</mixed-citation></ref><ref id="B9"><mixed-citation>Marvin, G., Hellen, N., Jjingo, D. and Nakatumba-Nabende, J. (2023). 
Prompt engineering in large language models, Proceedings of the International conference on data intelligence and cognitive informatics, Springer Nature Singapore, Singapore, 387–402, available at: https://www.researchgate.net/publication/377214553_Prompt_Engineering_in_Large_Language_Models (Accessed 29.06.2024). https://doi.org/10.1007/978-981-99-7962-2_30 (In English)</mixed-citation></ref><ref id="B10"><mixed-citation>Mudarova, R. and Namiot, D. (2024). Countering Prompt Injection attacks on large language models, International Journal of Open Information Technologies, 12 (5), 39–48. (In Russian)</mixed-citation></ref><ref id="B11"><mixed-citation>Pedro, R., Castro, D., Carreira, P. and Santos, N. (2023). From prompt injections to SQL injection attacks: How protected is your llm-integrated web application?, arXiv preprint arXiv:2308.01990. https://doi.org/10.48550/arXiv.2308.01990 (In English)</mixed-citation></ref><ref id="B12"><mixed-citation>Piet, J., Alrashed, M., Sitawarin, C., Chen, S., Wei, Z., Sun, E. and Wagner, D. (2023). Jatmo: Prompt injection defense by task-specific finetuning, arXiv preprint arXiv:2312.17673. https://doi.org/10.48550/arXiv.2312.17673 (In English)</mixed-citation></ref><ref id="B13"><mixed-citation>Röttger, P., Pernisi, F., Vidgen, B. and Hovy, D. (2024). Safety prompts: a systematic review of open datasets for evaluating and improving large language model safety, arXiv preprint arXiv:2404.05399. https://doi.org/10.48550/arXiv.2404.05399 (In English)</mixed-citation></ref><ref id="B14"><mixed-citation>Rossi, S., Michel, A. M., Mukkamala, R. R. and Thatcher, J. B. (2024). An Early Categorization of Prompt Injection Attacks on Large Language Models, arXiv preprint arXiv:2402.00898. 
https://doi.org/10.48550/arXiv.2402.00898 (In English)</mixed-citation></ref><ref id="B15"><mixed-citation>Tavabi, N., Goyal, P., Almukaynizi, M., Shakarian, P. and Lerman, K. (2018). Darkembed: Exploit prediction with neural language models, Proceedings of the AAAI Conference on Artificial Intelligence, 32, 1, 7849–7854. https://doi.org/10.1609/aaai.v32i1.11428 (In English)</mixed-citation></ref><ref id="B16"><mixed-citation>Yan, J., Yadav, V., Li, S., Chen, L., Tang, Z., Wang, H. and Jin, H. (2024). Backdooring instruction-tuned large language models with virtual prompt injection, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1, Long Papers, 6065–6086. https://doi.org/10.18653/v1/2024.naacl-long.337 (In English)</mixed-citation></ref><ref id="B17"><mixed-citation>Yu, J., Wu, Y., Shu, D., Jin, M., Yang, S. and Xing, X. (2023). Assessing prompt injection risks in 200+ custom GPTs, arXiv preprint arXiv:2311.11538. https://doi.org/10.48550/arXiv.2311.11538 (In English)</mixed-citation></ref><ref id="B18"><mixed-citation>Yu, Z., Liu, X., Liang, S., Cameron, Z., Xiao, C. and Zhang, N. (2024). Don’t Listen to Me: Understanding and Exploring Jailbreak Prompts of Large Language Models, arXiv preprint arXiv:2403.17336. https://doi.org/10.48550/arXiv.2403.17336 (In English)</mixed-citation></ref><ref id="B19"><mixed-citation>Zhang, J. (2024). Should We Fear Large Language Models? A Structural Analysis of the Human Reasoning System for Elucidating LLM Capabilities and Risks Through the Lens of Heidegger’s Philosophy, arXiv preprint arXiv:2403.03288. 
https://doi.org/10.48550/arXiv.2403.03288 (In English)</mixed-citation></ref></ref-list></back></article>