# Text Summarization: From Extractive Methods to Abstractive LLM-Based Summarization

## TL;DR
Text summarization condenses documents into concise summaries while preserving key information. The field has evolved from simple sentence extraction to LLM-powered abstractive generation that rewrites content in its own words. The hard problems remain: summarizing book-length documents, ensuring factual accuracy, and adapting summaries to user needs.

## Core Explanation
Summarization methods fall into two paradigms.

(1) Extractive summarization: select and concatenate the most important sentences from the source text. Methods include TextRank and LexRank (graph-based centrality over a sentence-similarity graph) and BERT-based sentence scoring (BERTSUM, MatchSum). Pros: every sentence is verbatim from the source, so the summary cannot contain fabricated statements. Cons: the output is often disjointed and cannot synthesize information across sentences.

(2) Abstractive summarization: generate new sentences that capture the essence of the source. The standard architecture is an encoder-decoder (BART, PEGASUS, T5): the encoder reads the source and the decoder generates the summary. Training uses teacher forcing on (document, reference summary) pairs from datasets such as CNN/DailyMail, XSum, and PubMed. LLM-based summarization uses prompt-based generation ("Summarize the following article: [text]"), with chain-of-thought prompting for complex documents.
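The graph-based extractive approach can be illustrated with a minimal TextRank-style sketch. It assumes a naive regex sentence splitter, plain word overlap as the similarity measure, and a fixed number of power iterations; the function names (`split_sentences`, `textrank_scores`, `extractive_summary`) are illustrative and do not come from any library. Production systems typically add stop-word removal, stemming, or embedding-based similarity.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens; no stemming or stop-word removal (assumption)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(tokens_a, tokens_b):
    """Shared-word count normalized by the log of both sentence lengths."""
    if len(tokens_a) < 2 or len(tokens_b) < 2:
        return 0.0  # log(1) == 0 would make the denominator degenerate
    shared = len(set(tokens_a) & set(tokens_b))
    return shared / (math.log(len(tokens_a)) + math.log(len(tokens_b)))


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    weights = [
        [0.0 if i == j else similarity(tokens[i], tokens[j]) for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on its score in proportion to edge weight.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def extractive_summary(text, num_sentences=2):
    """Return the top-scoring sentences, verbatim and in source order."""
    sentences = split_sentences(text)
    if len(sentences) <= num_sentences:
        return sentences
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(top[:num_sentences])]
```

Because the selected sentences are returned unchanged and in their original order, the output shows both properties described above: nothing is fabricated, but nothing is merged or rephrased either.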

## Detailed Analysis
Evaluation metrics: ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures overlap between a generated summary and a reference summary: ROUGE-1 (unigram overlap), ROUGE-2 (bigram overlap), and ROUGE-L (longest common subsequence). ROUGE correlates moderately with human judgment but does not measure factual consistency. BERTScore (2020) compares contextual embeddings instead of exact n-grams. Factuality metrics verify summary claims against the source document: SummaC uses NLI (Natural Language Inference), while QAFactEval uses question answering.

Long document summarization: the key bottleneck was context length, since BART and PEGASUS accept inputs of at most about 1024 tokens. Solutions: (1) Hierarchical: encode each sentence, then encode the sequence of sentence representations (HIBERT); (2) Extractive then abstractive: select salient sentences first (30-50% compression), then generate an abstractive summary from them; (3) LLM-based: models with 128K-1M token context windows (e.g., GPT-4 Turbo, Claude, Gemini) process entire documents directly.

Domain-specific summarization: medical (radiology reports → findings), legal (case law → holdings), scientific (full papers → structured abstracts). Meeting summarization handles multi-speaker dialogue and relies on topic segmentation.

The ACM 2025 survey identifies the "summary faithfulness" problem as the most critical open challenge: a 3-8% hallucination rate is unacceptable for high-stakes domains.
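To make the ROUGE definitions concrete, here is a simplified sketch of ROUGE-N and ROUGE-L for a single reference summary. It assumes lowercase regex tokenization with no stemming, so its scores will not exactly match those of established ROUGE packages; the helper names (`rouge_n`, `rouge_l`, `lcs_length`) are illustrative, not a library API.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; no stemming (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def prf(matches, candidate_total, reference_total):
    """Precision, recall, and F1 from a match count and the two totals."""
    precision = matches / candidate_total if candidate_total else 0.0
    recall = matches / reference_total if reference_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rouge_n(candidate, reference, n=1):
    """N-gram overlap with clipped counts (Counter intersection)."""
    cand = Counter(ngrams(tokenize(candidate), n))
    ref = Counter(ngrams(tokenize(reference), n))
    overlap = sum((cand & ref).values())
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Longest common subsequence length via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: LCS length relative to candidate and reference lengths."""
    cand, ref = tokenize(candidate), tokenize(reference)
    return prf(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    for candidate in ("the cat lay on the mat", "the mat sat on the cat"):
        print(candidate)
        print("  ROUGE-1:", rouge_n(candidate, reference, 1))
        print("  ROUGE-2:", rouge_n(candidate, reference, 2))
        print("  ROUGE-L:", rouge_l(candidate, reference))
```

The second candidate, "the mat sat on the cat", contains exactly the same words as the reference and therefore scores a perfect ROUGE-1 even though the meaning is reversed. This is the gap that the factuality metrics described above are meant to cover.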

## Further Reading
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Lewis et al., ACL 2020)
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization (Zhang et al., ICML 2020)
- Hugging Face Summarization Pipeline