Context Window (Context Length)

Wojciech Achtelik
AI Engineer Lead
July 7, 2025
Glossary Category
LLM

Context Window (Context Length) is the maximum number of tokens—words, sub-words, or bytes—that a large language model can consider at once when interpreting or generating text. It covers both the input prompt and any tokens the model has already produced, and is capped by a hard limit fixed during training (e.g., 16 k for GPT-3.5 Turbo, 200 k for Claude 3.5 Sonnet, 1 M for Gemini 1.5 Pro). Tokens beyond this limit are truncated or ignored, so developers must chunk, summarize, or retrieve only the most relevant passages to fit.

Larger windows reduce truncation and hallucination but demand more GPU memory and increase latency; smaller windows cut cost at the risk of dropping critical context. Techniques such as Retrieval-Augmented Generation (RAG), sliding-window attention, and hierarchical summarization extend effective memory without blowing the budget.

Choosing the right context length is therefore a trade-off between accuracy, speed, and dollar spend in chatbots, document QA, and coding copilots.
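The chunking idea above can be sketched in a few lines. This is a minimal illustration, not a production tokenizer: it uses whitespace splitting as a stand-in for a real sub-word tokenizer, and the `chunk_tokens` helper and its parameters are hypothetical names chosen for this example. An optional overlap carries a little context from one chunk into the next, a common trick when no single chunk fits the window.

```python
def chunk_tokens(tokens, max_tokens, overlap=0):
    """Split a token list into windows of at most max_tokens,
    optionally overlapping so context carries across chunks."""
    if max_tokens <= overlap:
        raise ValueError("max_tokens must exceed overlap")
    step = max_tokens - overlap
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), step)]

text = "the quick brown fox jumps over the lazy dog"
tokens = text.split()  # naive whitespace "tokenizer" as a stand-in
chunks = chunk_tokens(tokens, max_tokens=4, overlap=1)
# each chunk now fits a (toy) 4-token context window
```

In practice you would count tokens with the model's own tokenizer (e.g., a BPE tokenizer), since whitespace word counts can differ from real token counts by 30% or more.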