Retrieval-Augmented Generation (RAG) definition
Retrieval-Augmented Generation (RAG) is a method that pairs a large language model (LLM) with a search component that fetches relevant, up-to-date documents before the model writes an answer. The workflow has two phases: retrieval, where embedding-based or hybrid search pulls the top-k passages from a vector database or index, and generation, where those passages are injected into the prompt so the LLM can ground its answer in cited sources, which reduces hallucination. This dynamic grounding improves factual accuracy, supports source attribution, and reduces the need for costly fine-tuning because knowledge lives outside the model. Engineers tune chunk size, search filters, and prompt templates to balance latency, recall, and token limits. RAG powers chatbots, analyst copilots, and self-service portals that must reflect new policies or product updates within minutes rather than after a retraining cycle. Combined with evaluation metrics such as precision@k and faithfulness scores, plus guardrails such as PII removal, a RAG stack becomes a maintainable, audit-ready knowledge engine.
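The two phases and the precision@k metric described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-document corpus, the function names (embed, retrieve, build_prompt, precision_at_k), and above all the bag-of-words "embedding", which stands in for a real embedding model and vector database. The generation step stops at prompt assembly; the resulting string would be sent to whichever LLM the stack uses.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus. A real system would store chunked documents
# in a vector database or search index.
DOCS = {
    "policy-1": "Refunds are issued within 14 days of purchase.",
    "policy-2": "Shipping is free for orders over 50 dollars.",
    "policy-3": "Refund requests require the original receipt.",
}


def embed(text):
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    """Retrieval phase: return the k (doc_id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(docs.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Generation phase, input side: inject retrieved passages into the prompt."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below and cite their ids.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved ids that are labeled relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids) / k


if __name__ == "__main__":
    question = "Are refunds issued without the original receipt?"
    passages = retrieve(question, DOCS, k=2)
    print(build_prompt(question, passages))
    ids = [doc_id for doc_id, _ in passages]
    # The relevance labels here are hand-assigned for the example.
    print("precision@2:", precision_at_k(ids, {"policy-1", "policy-3"}, k=2))
```

Because the knowledge lives in DOCS rather than in model weights, updating a policy only means replacing a document and re-indexing it. Swapping embed for a real embedding model and DOCS for an index lookup would leave the shape of retrieve and build_prompt unchanged.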