Re-Ranking
Re-Ranking is the post-processing step that re-sorts an initial list of retrieved items (documents, products, passages) by applying a second, more precise model to each candidate. After a fast first-stage search (BM25, dense vector similarity) returns the top-k results, a Re-Ranking model, often a cross-encoder or similar joint-scoring model such as MonoT5 or Cohere Rerank, evaluates each query–item pair jointly and assigns a relevance score. The list is then reordered by that score, which improves precision at the top of the list. In Retrieval-Augmented Generation (RAG), it also lets the pipeline pass fewer, better-ranked passages to the generator, cutting bandwidth and token costs. Key knobs include the number of candidates k sent to the re-ranker, the latency budget, and hybrid scoring that blends the original and rerank scores. Offline metrics such as NDCG and recall@k, measured at the final cutoff, quantify the impact, and A/B tests confirm whether the gains hold with real users. By replacing rough first-stage heuristics with deeper semantic scoring, Re-Ranking squeezes extra accuracy from existing search or recommendation pipelines without changing storage layers.
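The hybrid scoring and NDCG mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: `overlap_score` is a hypothetical token-overlap stand-in for a real cross-encoder, and the names `rerank`, `alpha`, and `top_n`, as well as the min-max normalization, are choices made for this example.

```python
import math
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (item text, first-stage score such as BM25)


def min_max(values: List[float]) -> List[float]:
    """Scale scores to [0, 1] so the two stages can be blended."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def overlap_score(query: str, text: str) -> float:
    """Hypothetical placeholder for a cross-encoder: the fraction of
    query tokens that also appear in the candidate text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(
    query: str,
    candidates: List[Candidate],
    score_pair: Callable[[str, str], float] = overlap_score,
    alpha: float = 0.8,  # assumed weight on the rerank score
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, item) pair, blend with the first-stage score,
    and return the top_n items sorted by the blended score."""
    if not candidates:
        return []
    first = min_max([score for _, score in candidates])
    second = min_max([score_pair(query, text) for text, _ in candidates])
    blended = [alpha * r + (1 - alpha) * f for f, r in zip(first, second)]
    order = sorted(range(len(candidates)), key=lambda i: blended[i], reverse=True)
    return [(candidates[i][0], blended[i]) for i in order[:top_n]]


def ndcg_at_k(relevances: List[float], k: int) -> float:
    """NDCG@k for graded relevance labels listed in ranked order."""
    def dcg(rels: List[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


candidates = [
    ("cheap flights to paris", 12.0),
    ("how to reset a router password", 9.0),
    ("router firmware update guide", 8.0),
]
for text, score in rerank("reset router password", candidates):
    print(f"{score:.3f}  {text}")
```

With these toy inputs the router-password item moves from second place to first, even though the first stage scored the flights item highest. In a real pipeline, `score_pair` would be replaced by a call to an actual re-ranking model, and `alpha` would be tuned against an offline metric such as `ndcg_at_k` before any A/B test.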