Advanced RAG pipeline, part 1: Rerankers

Description: Standard RAG systems face a common problem: they can quickly find documents, but often retrieve irrelevant ones, which leads to low-quality answers. This analysis shows how adding a reranking step solves this challenge, using real-world regulatory documents to demonstrate clear improvements in finding the right information fast.
Announcing our new series: Advanced RAG Techniques
Welcome to the first post in our new series on Advanced Retrieval-Augmented Generation (RAG) Techniques! While many have heard of RAG, getting it to work well in the real world is another story. Standard RAG pipelines are a great start, but they often fall short when faced with complex, domain-specific documents. In this series, we’ll pull back the curtain and share the tools, methodologies, and tricks we use to build robust and reliable RAG systems. We’ll show you how to move beyond the basics and truly master your RAG pipeline.
RAG allows LLMs to access and reference information outside the LLM's own training data, such as an organization's specific knowledge base, before generating a response—and, crucially, with citations included. This capability enables LLMs to produce highly specific outputs without extensive fine-tuning or training, delivering some of the benefits of a custom LLM at considerably less expense. — Lareina Yee, Senior Partner at McKinsey, "What is retrieval-augmented generation (RAG)?", October 30, 2024
Our testing ground: the EurLex project ⚖️
For this series, we've chosen to work with the EurLex dataset, a collection of European Union regulations comprising more than 700 documents. This dataset presents many of the same challenges you'll encounter in real-world enterprise RAG implementations:
- Domain complexity: Legal language is precise, nuanced, and often contains subtle distinctions that are critical for accurate retrieval
- Document length variability: Relevant documents span from brief amendments to comprehensive regulatory frameworks
- Semantic similarity challenges: Multiple documents may discuss similar topics but with vastly different legal implications
- High accuracy requirements: In legal contexts, retrieving the wrong document or missing a relevant regulation can have serious consequences
Working with EurLex allows us to demonstrate how our techniques perform under demanding conditions that mirror the sensitive, domain-specific datasets our clients work with regularly.
What are Cross-Encoders (aka Rerankers)?
A common pain point in RAG is retrieving irrelevant documents, which leads to poor-quality answers. One of the most effective ways to fix this is by adding a second step to your retrieval process: reranking. A reranker, also known as a cross-encoder, is a model that takes a user's query and a single retrieved document and outputs a relevance score. Its only job is to determine how relevant that specific document is to that specific query. This leads to a powerful two-stage retrieval process:

Stage 1: Broad retrieval. First, a fast and efficient retriever scans your entire database and pulls a large set of potentially relevant documents (e.g. the top 50). This stage prioritizes speed over accuracy.

Stage 2: Precise reranking. Next, the reranker (cross-encoder) carefully examines each of those 50 documents alongside the user's query and assigns a precise relevance score to each query-document pair. The documents are then re-sorted based on this new score, pushing the most relevant to the top. This stage prioritizes accuracy.

We use two stages because rerankers are computationally intensive and slow, while initial retrievers are fast and efficient. Running a reranker over a full database would be far too slow and costly, so we use the fast retriever to create a smaller, more manageable shortlist for the reranker to work on, getting superior end results at lower latency and lower overall cost. In short:
- Retrievers (Bi-encoders): Fast but less accurate. They create generic document embeddings without comparing to the query.
- Rerankers (Cross-encoders): Slow but highly accurate. They directly compare the query and document text, prioritizing rich contextual relevance.
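The two-stage pipeline described above can be sketched in a few lines of Python. This is a minimal illustration of the control flow only: the two scoring functions below are toy lexical stand-ins (simple word overlap) for a real bi-encoder and cross-encoder, and in practice you would replace them with embedding similarity and a trained cross-encoder model such as one from the sentence-transformers library.

```python
def embed_score(query: str, doc: str) -> float:
    # Toy stand-in for a bi-encoder: a cheap word-overlap score.
    # A real system would compare precomputed document embeddings
    # against the query embedding.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)


def cross_score(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder: scores the query and document
    # jointly. A real cross-encoder reads both texts together through
    # one model, which is slower but far more accurate.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)


def two_stage_retrieve(query: str, corpus: list[str],
                       top_k: int = 50, final_k: int = 3) -> list[str]:
    # Stage 1: broad, fast retrieval over the whole corpus.
    candidates = sorted(corpus, key=lambda doc: embed_score(query, doc),
                        reverse=True)[:top_k]
    # Stage 2: slow, precise reranking of the shortlist only.
    reranked = sorted(candidates, key=lambda doc: cross_score(query, doc),
                      reverse=True)
    return reranked[:final_k]
```

The structure is the point: the expensive scorer only ever sees `top_k` candidates, never the full corpus, which is what keeps the pipeline both accurate and affordable.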
Putting it to the Test: Reranking in action 📊
Talk is cheap, so let's look at the data. To measure the real-world impact of a reranker on our EurLex project, we ran a direct comparison. We built an evaluation dataset of 55 questions for which we know the exact document that contains the correct answer. The test is simple: for each question, does our retrieval system fetch the correct document? We measure this as retrieval accuracy, the percentage of questions for which the correct document was successfully retrieved. In this experiment, we vary the Top K, which is the number of documents we initially retrieve from our vector store. We then plot the accuracy with and without our cross-encoder reranker.
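The metric itself is straightforward to compute. A minimal sketch, assuming a `retrieve(question, k)` function that returns the IDs of the top-k retrieved documents (the function and variable names here are illustrative, not from our codebase):

```python
def retrieval_accuracy(eval_set, retrieve, k: int) -> float:
    """Fraction of questions whose gold document appears in the top-k results.

    eval_set: list of (question, gold_doc_id) pairs, where gold_doc_id
    identifies the document known to contain the correct answer.
    """
    hits = sum(gold in retrieve(question, k) for question, gold in eval_set)
    return hits / len(eval_set)


# Hypothetical usage: evaluate both pipelines at several values of Top K.
# for k in (3, 5, 10, 20, 50):
#     base = retrieval_accuracy(eval_set, plain_retrieve, k)
#     rerank = retrieval_accuracy(eval_set, reranked_retrieve, k)
```

Running this sweep once with the plain retriever and once with the reranked pipeline produces the two accuracy curves we compare below.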

As you can see, the reranker provides a significant and consistent boost in accuracy. Looking at the left side of the chart, when retrieving a smaller number of documents (Top K = 3), the reranker improves accuracy by over 10 percent. This is crucial because sending fewer, higher-quality documents to the large language model (LLM) is more efficient and often yields better final answers. Adding a reranker isn't just a theoretical improvement; it's a practical, high-impact technique for enhancing the quality and reliability of any serious RAG system.
In summary: the power of advanced retrieval-augmented generation techniques
Incorporating a reranking step in your RAG implementation can significantly enhance the effectiveness of your information retrieval process. This advanced technique, combined with proper embedding models and vector stores, forms the foundation for building tailored AI assistants capable of leveraging domain-specific knowledge with high accuracy and reliability. The RAG system we’ve described here demonstrates several key benefits:
- Improved accuracy: By using a two-stage retrieval process with reranking, we can raise the relevance of retrieved documents by over 10%.
- Reduced hallucinations: With more accurate document retrieval, we provide LLMs with better context, reducing the likelihood of generating incorrect or irrelevant information.
- Efficient processing: The combination of swift initial retrieval and precise reranking allows for the efficient handling of large document collections.
- Versatility: This approach is applicable across a variety of RAG use cases, from question-answering systems to chatbots and other AI agents.

As we continue to explore advanced RAG techniques in this series, we'll delve deeper into topics such as hybrid search methods, chunking strategies, and the use of dense vectors for improved similarity scoring. We'll also discuss how RAG can be integrated with other machine learning and natural language processing techniques to create even more efficient and flexible systems. You can access the detailed open-source documentation of our process on GitHub. By mastering these advanced RAG implementation techniques, you'll be well equipped to tackle complex, knowledge-intensive tasks and build AI systems that effectively leverage internal and external knowledge sources to provide accurate, contextually relevant responses, whether you're working with legal documents, technical manuals, or any other domain-specific content.

Ready to see how RAG transforms business workflows?
Meet directly with our founders and PhD AI engineers. We will demonstrate real implementations from 30+ agentic projects and show you the practical steps to integrate them into your specific workflows: no hypotheticals, just proven approaches.
Keep an eye out for the next installment of our series on Advanced RAG Techniques, where we’ll continue to explore cutting-edge strategies and new techniques for expanding the boundaries of what’s possible with retrieval augmented generation and LLMs.