Stochastic RAG: End-to-end RAG optimized by expected utility maximization

Bartosz Roguski
Machine Learning Engineer
June 24, 2025

Stochastic RAG: End-to-end RAG optimized by expected utility maximization is a research-grade extension of Retrieval-Augmented Generation (RAG) that treats both retrieval and generation as probabilistic actions in a single decision process. Instead of deterministically picking the top-k documents, the retriever samples candidate passages from a relevance distribution; the generator then samples response tokens. A utility function, often a blend of factuality, length, and style scores, evaluates each complete answer. Using reinforcement learning or policy-gradient methods, the system updates the retriever weights and decoding strategy to maximize expected utility across many rollouts. This end-to-end training aligns the retriever with the generator’s actual needs, reduces exposure bias, and handles uncertain or conflicting sources gracefully. The result is a resilient pipeline that balances accuracy, diversity, and cost, well suited to domains with noisy corpora, shifting knowledge, or risk-weighted outputs such as finance and healthcare.
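The core loop described above (sample a passage, score the resulting answer with a utility function, and nudge the retriever toward higher expected utility with a policy gradient) can be sketched in a few lines. The example below is a deliberately tiny, self-contained illustration, not the paper's actual training recipe: the five-passage corpus, the `utility` function that rewards one "good" passage, and all hyperparameters are hypothetical stand-ins, and the generator is omitted so the REINFORCE update on the retriever's softmax distribution is easy to see.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Toy retriever: one logit per passage in a 5-passage corpus (hypothetical).
logits = np.zeros(5)

def utility(passage_id):
    # Stand-in for a factuality/length/style blend: pretend passage 3
    # leads the generator to the most useful answer.
    return 1.0 if passage_id == 3 else 0.1

def reinforce_step(logits, lr=0.5, n_rollouts=32):
    """One policy-gradient (REINFORCE) step on expected utility."""
    probs = softmax(logits)
    grad = np.zeros_like(logits)
    for _ in range(n_rollouts):
        k = rng.choice(len(probs), p=probs)  # sample a passage (a rollout)
        u = utility(k)                       # score the full answer
        # Score function for a softmax policy: ∇ log p(k) = one_hot(k) - probs
        score = -probs.copy()
        score[k] += 1.0
        grad += u * score
    # Gradient ascent on the Monte Carlo estimate of E[utility]
    return logits + lr * grad / n_rollouts

for _ in range(200):
    logits = reinforce_step(logits)

print(softmax(logits).round(3))  # probability mass concentrates on passage 3
```

In a full system the same estimator backpropagates through the retriever's encoder, the rollout samples whole generated answers rather than single passages, and a baseline is usually subtracted from the utility to reduce gradient variance; the softmax-sampling-and-score-function skeleton stays the same.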