A TLDR on Web LLM Attacks and Attack Surface Discovery

Home | Blog | A TLDR on Web LLM Attacks and Attack Surface Discovery

Author: Oscar Gutierrez, Security Engineer

July 23, 2024

The adoption of Large Language Models (LLMs) has increased at an alarming rate ever since the introduction of ChatGPT in November of 2022. As with any groundbreaking technology, the adoption rate is not stopping any time soon, and the efforts to protect the applications that leverage them increase exponentially. This post serves as an attempt to explain how threat actors are currently probing and subsequently exploiting web applications that incorporate LLMs into their workflows.

Attack surface discovery

The initial reconnaissance process is a crucial part in any offensive workflow, and attacks targeting Web LLMs are no exception. Web applications may implement LLM APIs in a plethora of processes such as chat support bots, data analytics features, audio and image processing, and many more. Often, the use of LLMs is detected since most companies leveraging this technology are likely to have some type of chat support bot following users around as they browse the website. However, it is always a good idea to crawl the application’s pages in order to detect the usage of any third-party-provided chatbots by looking at URLs invoking services such as:

api-iam.intercom.io
js.intercomcdn.com
api.chatbot.com
api.anthropic.com
api.openai.com

Once all possible features that take user-supplied input have been identified, the next step would be to determine what kind of data or other APIs the chatbot can access. This could be achieved by simply asking the chatbot what it has access to, but as time progresses, this process is likely to become more and more difficult. For this reason, attackers might resort to providing misleading prompts to the LLM in order to trick them into spilling information. For instance, the attackers may convince the chatbot that they are a tech support employee trying to troubleshoot issues with the chatbot.

It is worth noting that enterprise level applications tend to offer an enormous amount of different features and leverage all kinds of technologies via APIs, so the usage of LLMs might be present in other high-value areas such as image processing features and PDF generators that must be discovered before proceeding to the next stage of the attack chain.

Common web LLM attacks

No technology is perfect, and implementing the newest shiny thing always comes at a cost. LLMs are no exception. The risks that LLMs bring to the table may vary depending on the data the model has access to and the manner in which they are being used within the web application’s context. However, there are some specific issues that are unique to LLMs and must be addressed before incorporating any model in a web application.

The most common and often severe issue with LLMs is called Prompt Injection, which leads to the full compromise of the model, allowing attackers to force the LLM to perform unintended actions such as leaking sensitive data, generating responses that the LLM is not designed to provide, tampering with data, and many more.

This attack can be performed in a variety of ways, but for the sake of simplicity, we will describe the simplest way of prompt injection, which is telling the LLM to ignore certain parts of the prompt and only process the prompt that we want. This method often works against vulnerable models because LLMs are known for having a hard time distinguishing between user-supplied prompts and developer-supplied prompts. This attack would look something like this:

As shown in the image above, the attacker fools the chatbot into processing the second prompt and returning the desired response.

These prompt injection attacks can be performed through means other than directly telling the LLM what to do. For example, an attacker could tell the LLM to visit an innocent looking website and parse its contents, but prompt injection payloads could be hidden inside HTML comments, images and invisible characters within the website; which could lead to an Indirect Prompt Injection scenario. This method is often used by attackers to bypass Direct Prompt Injection mitigations implemented by the developers.

Although these issues are unique to LLMs, it is worth noting that LLMs may serve as a proxy for attackers to interact with other sensitive components such as internal APIs and databases, so it is imperative that all user-supplied input is properly sanitized before passing it onto other services.

How can I protect my application against these attacks?

Although there is not a definitive answer to this question due to the constant evolution of GenerativeAI technology and threat actor sophistication, it is crucial to implement mitigations such as feeding LLMs with only necessary data, sanitizing user-supplied data, restricting LLMs to interact with external websites when not necessary, and constant human lead testing (do not rely on other LLMs to find vulnerabilities in your application).