DOI: 10.18413/2313-8912-2024-10-4-0-3

Prompt injection – the problem of linguistic vulnerabilities of large language models at the present stage

Irina N. Zyryanova (The Federal State Budget Educational Institution of Higher Education «Baikal State University»)
Alexander S. Chernavskiy (Federal State Budgetary Educational Institution of Higher Education «Moscow Pedagogical State University»)
Stanislav O. Trubachev (OPS Guru LLC)

The article examines the phenomenon of “prompt injection” in the context of contemporary large language models (LLMs), elucidating a significant challenge for AI developers and researchers. The study comprises a theoretical and methodological review of scholarly publications, thereby enhancing the comprehension of the present state of research in this field. The authors present the findings of a case study, which employs a comparative analysis of the linguistic vulnerabilities of prominent LLMs, including Chat GPT 4.0, Claude 3.5, and Yandex GPT. The study employs experimental evaluation to assess the resilience of these models against a range of vector attacks, with the objective of determining the extent to which each model resists manipulative prompts designed to exploit their linguistic capabilities. A taxonomy of prompt injection attack types was developed based on the collected data, with classification according to effectiveness and targeting of specific LLMs. This classification facilitates comprehension of the nature of these vulnerabilities and provides a basis for future research in this field. Moreover, the article offers suggestions for bolstering the resilience of language models against negative manipulations, representing a significant stride towards the development of safer and more ethical AI systems. These recommendations are based on empirical data and aim to provide practical guidance for developers seeking to enhance the resilience of their models against potential threats. The research findings extend our understanding of linguistic vulnerabilities in LLMs, while also contributing to the development of more effective defence strategies. These have practical implications for the deployment of LLMs across various domains, including education, healthcare and customer service. The authors emphasise the necessity for continuous monitoring and improvement of language model security in an ever-evolving technological landscape. The findings suggest the necessity for an ongoing dialogue among stakeholders to address issues pertaining to the prompt injection of funds.

Keywords: Prompt injection, Large language models, LLM, LLM vulnerabilities, LLM jailbreak, security of AI, Linguistic attacks on LLM, Prompts security.

Figures

Number of views: 828 (view statistics)

Количество скачиваний: 1422

Full text (HTML)Full text (PDF)To articles list

Information for citation:

Zyryanova, I. N., Chernavskiy, A. S., Trubachev, S. O. (2024). Prompt injection – the problem of linguistic vulnerabilities of large language models at the present stage, Research Result. Theoretical and Applied Linguistics, 10 (4), 40–52.

User comments
Reference lists

While nobody left any comments to this publication.
You can be first.

Chang, Z., Li, M., Liu, Y., Wang, J., Wang, Q. and Liu, Y. (2024). Play guessing game with LLM: Indirect jailbreak attack with implicit clues, arXiv preprint arXiv:2402.09091. https://doi.org/10.48550/arXiv.2402.09091(In English)

Chen, S., Zharmagambetov, A., Mahloujifar, S., Chaudhuri, K. and Guo, C. (2024). Aligning LLMs to Be Robust Against Prompt Injection. arXiv preprint arXiv:2410.05451. https://doi.org/10.48550/arXiv.2410.05451(In English)

Duan, M., Suri, A., Mireshghallah, N., Min, S., Shi, W., Zettlemoyer, L., Tsvetkov, Yu, Choi, Y., Evans, D. and Hajishirzi, H. (2024). Do membership inference attacks work on large language models?, arXiv preprint arXiv:2402.07841. https://doi.org/10.48550/arXiv.2402.07841(In English)

Hines, K., Lopez, G., Hall, M., Zarfati, F., Zunger, Y. and Kiciman, E. (2024). Defending Against Indirect Prompt Injection Attacks with Spotlighting, arXiv preprint arXiv:2403.14720. https://doi.org/10.48550/arXiv.2403.14720(In English)

Khandelwal, U., Levy, O., Jurafsky, D., Zettlemoyer, L. and Lewis, M. (2019). Generalization through memorization: Nearest neighbor language models, arXiv preprint arXiv:1911.00172. https://doi.org/10.48550/arXiv.1911.00172(In English).

Kumar, S. S., Cummings, M. L. and Stimpson, A. (2024, May). Strengthening LLM trust boundaries: A survey of prompt injection attacks, 2024 IEEE 4th International Conference on Human-Machine Systems (ICHMS), 1–6, available at: https://www.researchgate.net/profile/Missy-Cummings/publication/378072627_Strengthening_LLM_Trust_Boundaries_A_Survey_of_Prompt_Injection_Attacks/links/65c57ac379007454976ae142/Strengthening-LLM-Trust-Boundaries-A-Survey-of-Prompt-Injection-Attacks.pdf/ (Accessed 29.06.2024). DOI: 10.1109/ICHMS59971.2024.10555871 (In English)

Li, X., Wang, R., Cheng, M., Zhou, T. and Hsieh, C. J. (2024). Drattack: Prompt decomposition and reconstruction makes powerful llm jailbreakers. arXiv preprint arXiv:2402.16914. https://doi.org/10.48550/arXiv.2402.16914 (In English)

Liu, X., Yu, Z., Zhang, Y., Zhang, N. and Xiao, C. (2024). Automatic and universal prompt injection attacks against large language models, arXiv preprint arXiv:2403.04957. https://doi.org/10.48550/arXiv.2403.04957(In English)

Marvin, G. Hellen, N., Jjingo, D. and Nakatumba-Nabende, J. 2023). Prompt engineering in large language models, Proceedings of the International conference on data intelligence and cognitive informatics, Springer Nature Singapore, Singapore, 387–402, available at: https://www.researchgate.net/publication/377214553_Prompt_Engineering_in_Large_Language_Models (accessed 29.06.2024). DOI: 10.1007/978-981-99-7962-2_30 (In English)

Mudarova, R., Namiot, D. (2024). Countering Prompt Injection attacks on large language models, International Journal of Open Information Technologies, 12 (5), 39–48. (In Russian)

Pedro, R., Castro, D., Carreira, P. and Santos, N. (2023). From prompt injections to SQL injection attacks: How protected is your llm-integrated web application?, arXiv preprint arXiv:2308.01990. https://doi.org/10.48550/arXiv.2308.01990(In English)

Piet, J., Alrashed, M., Sitawarin, C., Chen, S., Wei, Z., Sun, E. and Wagner, D. (2023). Jatmo: Prompt injection defense by task-specific finetuning, arXiv preprint arXiv:2312.17673. https://doi.org/10.48550/arXiv.2312.17673(In English)

Röttger, P., Pernisi, F., Vidgen, B. and Hovy, D. (2024). Safety prompts: a systematic review of open datasets for evaluating and improving large language model safety, arXiv preprint arXiv:2404.05399.m. https://doi.org/10.48550/arXiv.2404.05399(In English)

Rossi, S., Michel, A M., Mukkamala, R. R. and Thatcher, J. B. (2024). An Early Categorization of Prompt Injection Attacks on Large Language Models, arXiv preprint arXiv:2402.00898. https://doi.org/10.48550/arXiv.2402.00898(In English)

Tavabi, N., Goyal, P., Almukaynizi, M., Shakarian, P. and Lerman, K. (2018). Darkembed: Exploit prediction with neural language models, Proceedings of the AAAI Conference on Artificial Intelligence, 32, 1, 7849–7854. https://doi.org/10.1609/aaai.v32i1.11428(In English)

Yan, J., Yadav, V., Li, S., Chen, L., Tang, Z., Wang, H. and Jin, H. (2024). Backdooring instruction-tuned large language models with virtual prompt injection, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1, Long Papers, 6065–6086. DOI: 10.18653/v1/2024.naacl-long.337 (In English)

Yu, J., Wu, Y., Shu, D., Jin, M., Yang, S. and Xing, X. (2023). Assessing prompt injection risks in 200+ custom GPTS, arXiv preprint arXiv:2311.11538. https://doi.org/10.48550/arXiv.2311.11538(In English)

Yu, Z., Liu, X., Liang, S., Cameron, Z., Xiao, C. and Zhang, N. (2024). Don't Listen to Me: Understanding and Exploring Jailbreak Prompts of Large Language Models, arXiv preprint arXiv:2403.17336. https://doi.org/10.48550/arXiv.2403.17336(In English)

Zhang, J. (2024). Should We Fear Large Language Models? A Structural Analysis of the Human Reasoning System for Elucidating LLM Capabilities and Risks Through the Lens of Heidegger’s Philosophy, arXiv preprint arXiv:2403.03288. https://doi.org/10.48550/arXiv.2403.03288(In English)

All journals

Send article

Research Result. Theoretical and Applied Linguistics is included in the scientific database of the RINTs (license agreement No. 765-12/2014 dated 08.12.2014).

Журнал включен в перечень рецензируемых научных изданий, рекомендуемых ВАК

The journal is indexed by the following scientific databases and platforms

Research Result. Research Result. Theoretical and Applied Linguistics (ISSN 2313-8912)

The journal materials and website are licensed under Creative Commons «Attribution» 4.0 International.

The Founder: Federal State Autonomous Educational Institution of Higher Education "Belgorod National Research University"The Founder’s address: 85 Pobedy Street, Belgorod, the Belgorod region, 308015, Russia

The Publisher: Federal State Autonomous Educational Institution of HigherEducation "Belgorod National Research University" The Founder’s address:85 Pobedy Street, Belgorod, the Belgorod region, 308015, Russia

Editors Office: chief editor Olga Dekhnich, e-mail: RR_Linguistics@bsuedu.ru, phone: (4722) 301254.

Registered by the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor)

Certificate

Charter of the editorial board of the mass media "Research Result. Theoretical and Applied Linguistics"

Order No. 636-OD dated 30.06.2023 "On approval of the Charters of the editorial boards of the mass media of scientific journals of Belgorod State National Research University"

Order No. 1097-OD from 15.11.2023 "On approval of the Regulations for the publication of scientific journals of Belgorod State National Research University"

Have questions?
You can write to us:

✉ Executive Secretary

✉ Site administration

✉ Content manager