On-The-Fly Indexing

Antoni Kozelski
CEO & Co-founder
Published: July 3, 2025
Glossary Category: RAG

On-The-Fly Indexing is the technique of ingesting and indexing new data at query time instead of through pre-scheduled batch jobs, so that search and Retrieval-Augmented Generation (RAG) systems can surface content seconds after it appears. When a user submits a query, a micro-pipeline crawls the targeted source, extracts the text, splits it into chunks, generates vector embeddings, and stores them temporarily in memory or in a hot vector store such as Pinecone or Chroma. The fresh results are merged with those from the pre-built index, ranked, and returned, all within a single request-response cycle. This approach sharply reduces staleness and supports just-in-time analytics for breaking news, live chats, or IoT streams. The trade-offs are higher query-time latency and transient memory spikes, which can be mitigated with caching, rate limits, and incremental write-backs to the persistent index during low-traffic windows. Key metrics are end-to-end latency, freshness (seconds between publication and availability in search), and recall@k. By fusing crawling and retrieval, On-The-Fly Indexing delivers near-real-time knowledge without complex ETL schedules.
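
The query-time pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the `embed` function is a toy hashed bag-of-words stand-in for a real embedding model, the "vector store" is a plain in-memory list rather than Pinecone or Chroma, and `fetch_fresh` is a hypothetical callable standing in for the crawl-and-extract step. All names here are illustrative and do not come from any particular library.

```python
import hashlib
import math
import re
import time
from dataclasses import dataclass

DIM = 256  # toy embedding size (assumption, chosen only for illustration)


@dataclass
class Chunk:
    text: str
    vector: list[float]
    source: str  # "prebuilt" or "fresh"


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding; a stand-in for a real model."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit length, so dot product = cosine


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def index_document(text: str, source: str) -> list[Chunk]:
    """Chunk and embed one document, returning in-memory index entries."""
    return [Chunk(c, embed(c), source) for c in chunk(text)]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(query: str, prebuilt: list[Chunk], fetch_fresh, k: int = 3):
    """Index fresh content at query time, merge with the pre-built index, rank."""
    start = time.perf_counter()
    fresh = index_document(fetch_fresh(), source="fresh")  # on-the-fly step
    q = embed(query)
    merged = prebuilt + fresh
    ranked = sorted(merged, key=lambda c: cosine(q, c.vector), reverse=True)[:k]
    latency_ms = (time.perf_counter() - start) * 1000  # end-to-end latency
    return ranked, latency_ms


# Usage: the pre-built index knows nothing about the breaking item.
prebuilt = index_document(
    "Quarterly report on warehouse robotics and logistics costs.", "prebuilt")


def fetch_fresh() -> str:
    # Hypothetical crawl result; a real system would fetch and extract here.
    return "Breaking: sensor outage reported at the harbor crane this morning."


results, latency_ms = search("harbor crane sensor outage", prebuilt, fetch_fresh, k=1)
print(results[0].source, f"{latency_ms:.2f} ms")
```

Because the fresh chunks are embedded inside `search`, every query pays the crawl and embedding cost, which is the latency trade-off noted above. A cache keyed on the source, or a write-back of `fresh` into the persistent index, would be the natural next step in a real system.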

Last updated: August 4, 2025