Prompt Injection
Prompt Injection is an adversarial attack in which a user adds hidden or overt instructions to input text that override or alter a large language model’s original directives, causing it to leak private data, execute disallowed actions, or produce harmful content. The attacker appends commands (“Ignore previous instructions and…”) or embeds them in URLs, PDFs, or style tags so the model processes them as part of the prompt. Two variants exist: direct injection, where the malicious text is sent by the user, and indirect (cross-domain) injection, where the text comes from untrusted external data that the application feeds to the model. Defenses include input sanitization, output filtering, role-based system prompts, retrieval whitelists, and function-calling scopes that strictly separate user content from control tokens. Security teams monitor jailbreak success rate, false-positive blocks, and latency overhead to balance safety with usability. As LLMs integrate into code execution, customer support, and autonomous agents, Prompt Injection stands as the primary vector that must be mitigated to keep generative AI trustworthy and compliant.