Re-Ranking

Bartosz Roguski

Machine Learning Engineer

Published: July 3, 2025

Glossary Category

RAG

Re-Ranking is the post-processing step that re-sorts an initial list of retrieved items (documents, products, or passages) by applying a second, more precise model to each candidate. After a fast first-stage search (BM25 or dense vector similarity) returns the top-k results, a Re-Ranking model, often a cross-encoder-style model such as MonoT5 or Cohere Rerank, evaluates each query–item pair jointly and assigns a relevance score. The list is then reordered by that score, which raises precision at the top ranks and improves user satisfaction. In Retrieval-Augmented Generation (RAG), it also lets the pipeline pass a smaller k of better passages to the generator, cutting bandwidth and token costs. Key knobs include the candidate-set size k, the latency budget (the second-stage model runs once per candidate, so cost grows with k), and hybrid scoring that blends the first-stage and rerank scores. Offline metrics such as NDCG and recall@k measure the impact, and A/B tests confirm whether the gains hold with real users. Because it operates only on the candidates already retrieved, Re-Ranking replaces rough heuristics with deep semantic scoring and squeezes extra accuracy from existing search or recommendation pipelines without changing storage layers.
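The two-stage flow, hybrid scoring, and NDCG measurement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: `toy_pair_scorer` is a hypothetical stand-in for a real cross-encoder (it only counts shared query terms), and the names `rerank`, `alpha`, and `top_n`, along with the sample candidates and relevance labels, are invented for the example.

```python
import math
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (text, first-stage score such as BM25)


def toy_pair_scorer(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query terms found in the text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def min_max(values: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so the two stages can be blended on a common scale."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rerank(
    query: str,
    candidates: List[Candidate],
    scorer: Callable[[str, str], float] = toy_pair_scorer,
    alpha: float = 0.8,  # assumed weight on the rerank score; 1 - alpha goes to the first stage
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair, blend with the first-stage score, keep top_n."""
    if not candidates:
        return []
    first = min_max([score for _, score in candidates])
    second = min_max([scorer(query, text) for text, _ in candidates])
    blended = [
        (text, alpha * r + (1 - alpha) * f)
        for (text, _), f, r in zip(candidates, first, second)
    ]
    blended.sort(key=lambda pair: pair[1], reverse=True)
    return blended[:top_n]


def ndcg_at_k(relevances: List[float], k: int) -> float:
    """NDCG@k for graded relevance labels listed in ranked order."""
    def dcg(rels: List[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Illustrative data: first-stage order and made-up relevance labels (0 = irrelevant, 3 = ideal).
query = "reset router password"
candidates = [
    ("router firmware release notes", 12.0),
    ("how to reset a forgotten router password", 9.5),
    ("password policy for employees", 9.0),
    ("reset your router to factory settings", 7.0),
]
labels = {
    "router firmware release notes": 0,
    "how to reset a forgotten router password": 3,
    "password policy for employees": 0,
    "reset your router to factory settings": 2,
}

reranked = rerank(query, candidates, top_n=4)
before = ndcg_at_k([labels[text] for text, _ in candidates], k=4)
after = ndcg_at_k([labels[text] for text, _ in reranked], k=4)
print(f"NDCG@4 before: {before:.3f}, after: {after:.3f}")
```

In a real pipeline the `scorer` argument would wrap a call to an actual re-ranking model, and `alpha` and `top_n` would be tuned against the latency budget and the offline metrics.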

Last updated: July 3, 2025