<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Research Result. Theoretical and Applied Linguistics</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2024-10-4-0-3</article-id><article-id pub-id-type="publisher-id">3674</article-id><article-categories><subj-group subj-group-type="heading"><subject>Large Language Models and Prompt Engineering in Linguistics</subject></subj-group></article-categories><title-group><article-title>Prompt injection – the problem of linguistic vulnerabilities of large language models at the present stage</article-title><trans-title-group xml:lang="en"><trans-title>Prompt injection – the problem of linguistic vulnerabilities of large language models at the present stage</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Zyryanova</surname><given-names>Irina N.</given-names></name><name xml:lang="en"><surname>Zyryanova</surname><given-names>Irina N.</given-names></name></name-alternatives><email>Irina_zyr@mail.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Chernavskiy</surname><given-names>Alexander S.</given-names></name><name xml:lang="en"><surname>Chernavskiy</surname><given-names>Alexander S.</given-names></name></name-alternatives><email>Chernavskiy.com@gmail.com</email><xref ref-type="aff" rid="aff2" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Trubachev</surname><given-names>Stanislav O.</given-names></name><name 
xml:lang="en"><surname>Trubachev</surname><given-names>Stanislav O.</given-names></name></name-alternatives><email>brandei@yandex.ru</email><xref ref-type="aff" rid="aff3" /></contrib></contrib-group><aff id="aff3"><institution>OPS Guru LLC</institution></aff><aff id="aff2"><institution>Federal State Budgetary Educational Institution of Higher Education «Moscow Pedagogical State University»</institution></aff><aff id="aff1"><institution>The Federal State Budget Educational Institution of Higher Education «Baikal State University»</institution></aff><pub-date pub-type="epub"><year>2024</year></pub-date><volume>10</volume><issue>4</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2024/4/Research_Result_4-42-41-66.pdf" /><abstract xml:lang="ru"><p>The article examines the phenomenon of “prompt injection” in the context of contemporary large language models (LLMs), elucidating a significant challenge for AI developers and researchers. The study comprises a theoretical and methodological review of scholarly publications, thereby enhancing the comprehension of the present state of research in this field. The authors present the findings of a case study, which employs a comparative analysis of the linguistic vulnerabilities of prominent LLMs, including ChatGPT 4.0, Claude 3.5, and YandexGPT. The study employs experimental evaluation to assess the resilience of these models against a range of attack vectors, with the objective of determining the extent to which each model resists manipulative prompts designed to exploit their linguistic capabilities. A taxonomy of prompt injection attack types was developed based on the collected data, with classification according to effectiveness and targeting of specific LLMs. This classification facilitates comprehension of the nature of these vulnerabilities and provides a basis for future research in this field. 
Moreover, the article offers suggestions for bolstering the resilience of language models against malicious manipulations, representing a significant stride towards the development of safer and more ethical AI systems. These recommendations are based on empirical data and aim to provide practical guidance for developers seeking to enhance the resilience of their models against potential threats. The research findings extend our understanding of linguistic vulnerabilities in LLMs, while also contributing to the development of more effective defence strategies. These findings have practical implications for the deployment of LLMs across various domains, including education, healthcare and customer service. The authors emphasise the necessity for continuous monitoring and improvement of language model security in an ever-evolving technological landscape. The findings suggest the need for an ongoing dialogue among stakeholders to address issues pertaining to prompt injection.
</p></abstract><trans-abstract xml:lang="en"><p>The article examines the phenomenon of “prompt injection” in the context of contemporary large language models (LLMs), elucidating a significant challenge for AI developers and researchers. The study comprises a theoretical and methodological review of scholarly publications, thereby enhancing the comprehension of the present state of research in this field. The authors present the findings of a case study, which employs a comparative analysis of the linguistic vulnerabilities of prominent LLMs, including ChatGPT 4.0, Claude 3.5, and YandexGPT. The study employs experimental evaluation to assess the resilience of these models against a range of attack vectors, with the objective of determining the extent to which each model resists manipulative prompts designed to exploit their linguistic capabilities. A taxonomy of prompt injection attack types was developed based on the collected data, with classification according to effectiveness and targeting of specific LLMs. This classification facilitates comprehension of the nature of these vulnerabilities and provides a basis for future research in this field. Moreover, the article offers suggestions for bolstering the resilience of language models against malicious manipulations, representing a significant stride towards the development of safer and more ethical AI systems. These recommendations are based on empirical data and aim to provide practical guidance for developers seeking to enhance the resilience of their models against potential threats. The research findings extend our understanding of linguistic vulnerabilities in LLMs, while also contributing to the development of more effective defence strategies. These findings have practical implications for the deployment of LLMs across various domains, including education, healthcare and customer service. 
The authors emphasise the necessity for continuous monitoring and improvement of language model security in an ever-evolving technological landscape. The findings suggest the need for an ongoing dialogue among stakeholders to address issues pertaining to prompt injection.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>Prompt injection</kwd><kwd>Large language models</kwd><kwd>LLM</kwd><kwd>LLM vulnerabilities</kwd><kwd>LLM jailbreak</kwd><kwd>security of AI</kwd><kwd>Linguistic attacks on LLM</kwd><kwd>Prompts security</kwd></kwd-group><kwd-group xml:lang="en"><kwd>Prompt injection</kwd><kwd>Large language models</kwd><kwd>LLM</kwd><kwd>LLM vulnerabilities</kwd><kwd>LLM jailbreak</kwd><kwd>security of AI</kwd><kwd>Linguistic attacks on LLM</kwd><kwd>Prompts security</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="B1"><mixed-citation>Chang, Z., Li, M., Liu, Y., Wang, J., Wang, Q. and Liu, Y. (2024). Play guessing game with LLM: Indirect jailbreak attack with implicit clues, arXiv preprint arXiv:2402.09091. https://doi.org/10.48550/arXiv.2402.09091 (In English)</mixed-citation></ref><ref id="B2"><mixed-citation>Chen, S., Zharmagambetov, A., Mahloujifar, S., Chaudhuri, K. and Guo, C. (2024). Aligning LLMs to Be Robust Against Prompt Injection, arXiv preprint arXiv:2410.05451. https://doi.org/10.48550/arXiv.2410.05451 (In English)</mixed-citation></ref><ref id="B3"><mixed-citation>Duan, M., Suri, A., Mireshghallah, N., Min, S., Shi, W., Zettlemoyer, L., Tsvetkov, Yu., Choi, Y., Evans, D. and Hajishirzi, H. (2024). Do membership inference attacks work on large language models?, arXiv preprint arXiv:2402.07841. https://doi.org/10.48550/arXiv.2402.07841 (In English)</mixed-citation></ref><ref id="B4"><mixed-citation>Hines, K., Lopez, G., Hall, M., Zarfati, F., Zunger, Y. and Kiciman, E. (2024). Defending Against Indirect Prompt Injection Attacks with Spotlighting, arXiv preprint arXiv:2403.14720. 
https://doi.org/10.48550/arXiv.2403.14720 (In English)</mixed-citation></ref><ref id="B5"><mixed-citation>Khandelwal, U., Levy, O., Jurafsky, D., Zettlemoyer, L. and Lewis, M. (2019). Generalization through memorization: Nearest neighbor language models, arXiv preprint arXiv:1911.00172. https://doi.org/10.48550/arXiv.1911.00172 (In English)</mixed-citation></ref><ref id="B6"><mixed-citation>Kumar, S. S., Cummings, M. L. and Stimpson, A. (2024, May). Strengthening LLM trust boundaries: A survey of prompt injection attacks, 2024 IEEE 4th International Conference on Human-Machine Systems (ICHMS), 1–6, available at: https://www.researchgate.net/profile/Missy-Cummings/publication/378072627_Strengthening_LLM_Trust_Boundaries_A_Survey_of_Prompt_Injection_Attacks/links/65c57ac379007454976ae142/Strengthening-LLM-Trust-Boundaries-A-Survey-of-Prompt-Injection-Attacks.pdf/ (Accessed 29.06.2024). https://doi.org/10.1109/ICHMS59971.2024.10555871 (In English)</mixed-citation></ref><ref id="B7"><mixed-citation>Li, X., Wang, R., Cheng, M., Zhou, T. and Hsieh, C. J. (2024). Drattack: Prompt decomposition and reconstruction makes powerful llm jailbreakers, arXiv preprint arXiv:2402.16914. https://doi.org/10.48550/arXiv.2402.16914 (In English)</mixed-citation></ref><ref id="B8"><mixed-citation>Liu, X., Yu, Z., Zhang, Y., Zhang, N. and Xiao, C. (2024). Automatic and universal prompt injection attacks against large language models, arXiv preprint arXiv:2403.04957. https://doi.org/10.48550/arXiv.2403.04957 (In English)</mixed-citation></ref><ref id="B9"><mixed-citation>Marvin, G., Hellen, N., Jjingo, D. and Nakatumba-Nabende, J. (2023). 
Prompt engineering in large language models, Proceedings of the International conference on data intelligence and cognitive informatics, Springer Nature Singapore, Singapore, 387–402, available at: https://www.researchgate.net/publication/377214553_Prompt_Engineering_in_Large_Language_Models (Accessed 29.06.2024). https://doi.org/10.1007/978-981-99-7962-2_30 (In English)</mixed-citation></ref><ref id="B10"><mixed-citation>Mudarova, R. and Namiot, D. (2024). Countering Prompt Injection attacks on large language models, International Journal of Open Information Technologies, 12 (5), 39–48. (In Russian)</mixed-citation></ref><ref id="B11"><mixed-citation>Pedro, R., Castro, D., Carreira, P. and Santos, N. (2023). From prompt injections to SQL injection attacks: How protected is your llm-integrated web application?, arXiv preprint arXiv:2308.01990. https://doi.org/10.48550/arXiv.2308.01990 (In English)</mixed-citation></ref><ref id="B12"><mixed-citation>Piet, J., Alrashed, M., Sitawarin, C., Chen, S., Wei, Z., Sun, E. and Wagner, D. (2023). Jatmo: Prompt injection defense by task-specific finetuning, arXiv preprint arXiv:2312.17673. https://doi.org/10.48550/arXiv.2312.17673 (In English)</mixed-citation></ref><ref id="B13"><mixed-citation>Röttger, P., Pernisi, F., Vidgen, B. and Hovy, D. (2024). Safety prompts: a systematic review of open datasets for evaluating and improving large language model safety, arXiv preprint arXiv:2404.05399. https://doi.org/10.48550/arXiv.2404.05399 (In English)</mixed-citation></ref><ref id="B14"><mixed-citation>Rossi, S., Michel, A. M., Mukkamala, R. R. and Thatcher, J. B. (2024). An Early Categorization of Prompt Injection Attacks on Large Language Models, arXiv preprint arXiv:2402.00898. 
https://doi.org/10.48550/arXiv.2402.00898 (In English)</mixed-citation></ref><ref id="B15"><mixed-citation>Tavabi, N., Goyal, P., Almukaynizi, M., Shakarian, P. and Lerman, K. (2018). Darkembed: Exploit prediction with neural language models, Proceedings of the AAAI Conference on Artificial Intelligence, 32, 1, 7849–7854. https://doi.org/10.1609/aaai.v32i1.11428 (In English)</mixed-citation></ref><ref id="B16"><mixed-citation>Yan, J., Yadav, V., Li, S., Chen, L., Tang, Z., Wang, H. and Jin, H. (2024). Backdooring instruction-tuned large language models with virtual prompt injection, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1, Long Papers, 6065–6086. https://doi.org/10.18653/v1/2024.naacl-long.337 (In English)</mixed-citation></ref><ref id="B17"><mixed-citation>Yu, J., Wu, Y., Shu, D., Jin, M., Yang, S. and Xing, X. (2023). Assessing prompt injection risks in 200+ custom GPTs, arXiv preprint arXiv:2311.11538. https://doi.org/10.48550/arXiv.2311.11538 (In English)</mixed-citation></ref><ref id="B18"><mixed-citation>Yu, Z., Liu, X., Liang, S., Cameron, Z., Xiao, C. and Zhang, N. (2024). Don’t Listen to Me: Understanding and Exploring Jailbreak Prompts of Large Language Models, arXiv preprint arXiv:2403.17336. https://doi.org/10.48550/arXiv.2403.17336 (In English)</mixed-citation></ref><ref id="B19"><mixed-citation>Zhang, J. (2024). Should We Fear Large Language Models? A Structural Analysis of the Human Reasoning System for Elucidating LLM Capabilities and Risks Through the Lens of Heidegger’s Philosophy, arXiv preprint arXiv:2403.03288. 
https://doi.org/10.48550/arXiv.2403.03288 (In English)</mixed-citation></ref></ref-list></back></article>