Text Summarization
Text summarization is the natural-language-processing task of condensing a source document into a shorter version that preserves its key facts, sentiment, and intent. Summaries come in two styles: extractive, which selects the most important sentences verbatim, and abstractive, which rewrites the content in new words, typically with higher coherence but a greater risk of fabrication. Modern systems use large language models (LLMs) to generate both, and abstractive output is where they are strongest.

A typical workflow tokenizes the input, encodes it into vectors with a Transformer, and decodes a summary under length, coverage, or headline constraints. Instruction-tuned models such as GPT-4 Turbo, Gemini, and Llama 3 follow domain-specific output formats like "TL;DR bullets," "legal brief," or "executive one-pager." Evaluation combines overlap-based metrics such as ROUGE or BERTScore with LLM-based judgments of factuality and fluency.

Summaries power news digests, meeting minutes, and chat-inbox triage, and they serve as a grounding step in Retrieval-Augmented Generation (RAG) pipelines, reducing context length and cost. The sketches below illustrate the extractive style, the abstractive workflow, and ROUGE-based evaluation in turn.
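The extractive style can be illustrated without a model at all. The sketch below is a minimal frequency-based scorer, not a production method: it ranks sentences by the average corpus frequency of their words and returns the top ones verbatim, in original order. The function and parameter names (`extractive_summary`, `n_sentences`) are illustrative.

```python
import re
from collections import Counter

def extractive_summary(text: str, n_sentences: int = 2) -> str:
    """Pick the top-scoring sentences verbatim, in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        # Average word frequency, so long sentences are not favored.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freqs[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Re-emit the selected sentences in document order, unchanged.
    return " ".join(s for s in sentences if s in top)
```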
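For the abstractive workflow, one common off-the-shelf route is the Hugging Face `transformers` summarization pipeline. The snippet below assumes that library is installed and uses `facebook/bart-large-cnn`, a widely used abstractive baseline; `max_length` and `min_length` stand in for the length constraints described above, and the article text is a placeholder.

```python
from transformers import pipeline  # pip install transformers

# Load an abstractive summarizer; model weights download on first use.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "Long source text goes here. Abstractive models rewrite the input "
    "in new words rather than copying sentences, so decoding is bounded "
    "by explicit length constraints."
)

# max_length / min_length bound the summary in tokens; do_sample=False
# keeps decoding deterministic.
result = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```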
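Overlap-based evaluation can be sketched with Google's `rouge-score` package; the reference and candidate strings below are made-up examples, and the package is assumed to be installed.

```python
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

# Illustrative strings only: a human-written reference vs. a model output.
reference = "The council approved the new transit budget on Tuesday."
candidate = "The new transit budget was approved by the city council."

for name, s in scorer.score(reference, candidate).items():
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.fmeasure:.2f}")
```

ROUGE-1 counts unigram overlap and ROUGE-L measures the longest common subsequence; neither checks factual accuracy, which is why such metrics are paired with LLM-based judgments of factuality and fluency.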