Context Window

Bartosz Roguski
Machine Learning Engineer
July 3, 2025
Glossary Category
LLM

Context Window is the maximum number of tokens that an AI model can attend to during a single inference request, determining the scope of information the model can reference when generating responses. This fundamental limitation affects the model's ability to maintain coherence across long conversations, analyze extensive documents, and perform complex reasoning tasks that require broad contextual understanding.

Context window size is measured in tokens, with modern language models supporting ranges from 2,048 to over 1 million tokens. Larger windows directly increase computational requirements and memory consumption, so the constraint necessitates careful prompt engineering, context management strategies, and chunking techniques for processing lengthy inputs.

Advanced implementations use techniques such as sliding windows, hierarchical attention, and retrieval-augmented generation to extend effective context beyond native limits while maintaining performance and managing computational cost.
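The chunking strategy mentioned above can be sketched in plain Python. This is an illustrative example, not a production implementation: `chunk_tokens`, its parameter names, and the toy token list are all hypothetical, and a real pipeline would operate on IDs produced by the model's actual tokenizer. The key idea is the overlap between consecutive chunks, which preserves local context across chunk boundaries (the same idea behind sliding-window processing).

```python
def chunk_tokens(tokens, window=1024, overlap=128):
    """Split a token sequence into overlapping chunks that each
    fit within a model's context window.

    tokens  -- list of tokens (or token IDs)
    window  -- maximum tokens per chunk (model's context limit)
    overlap -- tokens shared between consecutive chunks, so that
               context at a boundary is not lost
    """
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        # Stop once the final chunk reaches the end of the sequence.
        if start + window >= len(tokens):
            break
    return chunks


# Toy demo: a 20-"token" document with a tiny window for illustration.
doc = [f"tok{i}" for i in range(20)]
for chunk in chunk_tokens(doc, window=8, overlap=2):
    print(len(chunk), chunk[0], "->", chunk[-1])
```

Each chunk can then be processed independently (or fed to a retrieval step), with the overlap reducing the chance that a sentence or fact is split unusably across a boundary.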