Retriever (RAG)
In a Retrieval-Augmented Generation (RAG) pipeline, the retriever is the component that locates the most relevant documents for a user query before a large language model (LLM) generates the final answer. It converts the query into a vector embedding, searches a vector database using a similarity metric such as cosine or dot product, and returns a top-k list of passages.

Popular retriever types include dense semantic search (Sentence-BERT, OpenAI embeddings), hybrid BM25-plus-vector search, and cross-encoder re-ranking for higher precision.

A well-tuned retriever boosts factual accuracy, reduces hallucinations, and trims token costs by feeding only high-value context into the LLM's context window. Key settings (the embedding model, the value of k, maximal marginal relevance (MMR), and metadata filters) trade recall against latency. Monitoring recall@k and hit rate guards against drift as the corpus grows.

In essence, the retriever is the "memory lookup" engine that grounds generative AI in trustworthy knowledge.
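The embed-search-return loop above can be sketched in a few lines. This is a minimal illustration, not a production retriever: the four-dimensional vectors stand in for real embeddings (a model such as Sentence-BERT would produce them), and `top_k_passages` is a hypothetical helper, not a library API.

```python
import numpy as np

def top_k_passages(query_vec, passage_vecs, k=3):
    """Return indices of the k passages most similar to the query,
    using cosine similarity (dot product of L2-normalized vectors)."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                       # one cosine score per passage
    return np.argsort(scores)[::-1][:k]  # highest-scoring first

# Toy "embeddings" for three indexed passages.
passages = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.8, 0.6, 0.0],
    [0.7, 0.2, 0.1, 0.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])
top = top_k_passages(query, passages, k=2)
print(top)  # indices of the two closest passages: [0 2]
```

A real system would replace the brute-force `argsort` with an approximate nearest-neighbor index (e.g. FAISS or a managed vector database) once the corpus grows beyond a few thousand passages.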