2313-8912

Научный результат. Вопросы теоретической и прикладной лингвистики

10.18413/2313-8912-2024-10-4-0-3

3674

Large language models and prompt engineering in linguistic research

<strong>Prompt injection – the problem of linguistic vulnerabilities of large language models at the present stage</strong>

Зырянова

Ирина Николаевна

Zyryanova

Irina N.

Irina_zyr@mail.ru

Чернавский

Александр Сергеевич

Chernavskiy

Alexander S.

Chernavskiy.com@gmail.com

Трубачев

Станислав Олегович

Trubachev

Stanislav O.

brandei@yandex.ru

Federal State Budgetary Educational Institution of Higher Education "Moscow Pedagogical State University"

OPS Guru LLC (ООО «ОПС Гуру»)

Federal State Budgetary Educational Institution of Higher Education "Baikal State University"

2024

10400

This article examines the phenomenon of “prompt injection” in contemporary large language models (LLMs), a pressing problem for AI developers and researchers. The study includes a theoretical and methodological review of the scholarly literature that clarifies the current state of research in this field. The authors present the results of a case study: a comparative analysis of the linguistic vulnerability of popular LLMs, namely Chat GPT 4o, Claude 3.5, and Yandex GPT. Experiments tested the resilience of these models to a range of attack vectors in order to assess how effectively each model resists manipulative prompts designed to exploit its linguistic capabilities. Based on the data obtained, the authors developed a taxonomy of prompt injection attack types, classified by effectiveness and by the specific LLMs they target. This classification helps to explain the nature of the vulnerabilities and provides a basis for future research. The article also offers recommendations, grounded in the empirical data, for making language models more resilient to harmful manipulation, giving developers practical guidance on protecting their models against potential threats and contributing to safer and more ethical AI systems. The findings extend the understanding of linguistic vulnerability in LLMs and support the development of more effective defence strategies, with practical implications for future research and for the deployment of LLMs in domains such as education, healthcare, and customer service. The authors emphasise the need for continuous monitoring and improvement of language model security in a constantly changing technological landscape.
The findings call for an ongoing dialogue among stakeholders to address the problems associated with prompt injection.

Prompt injection; Large language models; LLM; LLM vulnerabilities; LLM jailbreak; Security of AI; Linguistic attacks on LLM; Prompts security; Attacks on AI
