On-The-Fly Indexing
On-The-Fly Indexing is the technique of ingesting and indexing new data at query time instead of through pre-scheduled batch jobs, enabling search and Retrieval-Augmented Generation (RAG) systems to surface content seconds after it appears. When a user submits a query, a micro-pipeline crawls the targeted source, extracts text, chunks it, generates vector embeddings, and temporarily stores them in memory or in a hot vector store such as Pinecone or Chroma. The fresh results are merged with those from the pre-built index, ranked, and returned, all within a single request–response cycle. This approach sharply reduces staleness and supports just-in-time analytics for breaking news, live chats, or IoT streams. The trade-offs are higher query-time latency and transient memory spikes, which can be mitigated by caching, rate limits, and incremental write-backs to the persistent index during low-traffic windows. Key metrics are end-to-end latency, freshness (seconds since publish), and recall@k. By fusing crawling and retrieval, On-The-Fly Indexing delivers near-real-time knowledge without complex ETL schedules.
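
The query-time flow described above (fetch, chunk, embed, hold in memory, merge with the pre-built index, rank) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `fetch` is a hypothetical callable standing in for the crawler, `embed` is a toy sparse bag-of-words vector standing in for a real embedding model, and a plain Python list stands in for a hot vector store. The names `Chunk`, `index_on_the_fly`, `search`, `recall_at_k`, and `freshness_seconds` are invented for this example.

```python
import math
from collections import Counter
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    vector: dict[str, float]   # sparse, L2-normalised term weights
    source: str
    published_at: float        # Unix timestamp of the source content


def embed(text: str) -> dict[str, float]:
    """Toy embedding (assumption): normalised bag-of-words, not a real model."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def chunk_text(text: str, size: int = 8) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def index_on_the_fly(fetch: Callable[[str], tuple[str, float]],
                     source: str) -> list[Chunk]:
    """Crawl one source at query time and return its chunks, held in memory.

    `fetch` is a hypothetical crawler returning (text, published_at).
    """
    text, published_at = fetch(source)
    return [Chunk(c, embed(c), source, published_at) for c in chunk_text(text)]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two already-normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def search(query: str, prebuilt: list[Chunk], fresh: list[Chunk],
           k: int = 3) -> list[Chunk]:
    """Merge fresh chunks with the pre-built index and return the top k."""
    q = embed(query)
    merged = prebuilt + fresh
    return sorted(merged, key=lambda c: cosine(q, c.vector), reverse=True)[:k]


def recall_at_k(retrieved_ids: list[str], relevant_ids: list[str], k: int) -> float:
    """Fraction of relevant items that appear in the first k retrieved items."""
    if not relevant_ids:
        return 0.0
    return len(set(retrieved_ids[:k]) & set(relevant_ids)) / len(relevant_ids)


def freshness_seconds(chunk: Chunk, now: float) -> float:
    """Seconds elapsed between publication and the moment of the query."""
    return now - chunk.published_at


if __name__ == "__main__":
    # Stub crawler (assumption): a real one would issue an HTTP request.
    def fake_fetch(source: str) -> tuple[str, float]:
        return "earthquake strikes tokyo this morning", 1000.0

    archive = [Chunk("sourdough bread baking tips",
                     embed("sourdough bread baking tips"), "archive", 0.0)]
    fresh = index_on_the_fly(fake_fetch, "news://live")
    top = search("tokyo earthquake", archive, fresh, k=1)[0]
    print(top.source, freshness_seconds(top, now=1012.0))
```

In a production setting the in-memory list would typically be replaced by a vector store with a time-to-live, and fetched sources would be cached so that repeated queries do not re-crawl the same content; both are omitted here to keep the sketch short.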