# Large Language Models (LLMs)

Status: public · Confidence: medium (0.835) · Basis: verified_sources

## TL;DR

Large language models are neural language models scaled to large parameter counts and large training corpora. For durable citation, this entry focuses on three stable ideas: GPT-3's few-shot scaling result, Chinchilla's compute-optimal training rule, and the research definition of emergent abilities.

## Core Claims

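GPT-3 (Brown et al., 2020) showed that a sufficiently large autoregressive language model, 175 billion parameters in its largest configuration, can perform many tasks from a natural-language instruction or a few examples placed in the prompt, with no gradient updates or task-specific fine-tuning.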
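To make the few-shot setup concrete, here is a minimal sketch of how such a prompt might be assembled. The function name `build_few_shot_prompt` and the `Input:` / `Output:` labels are illustrative assumptions, not a format prescribed by the GPT-3 paper; the point is only that the task is specified entirely in the text the model conditions on.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, K solved examples, and one unsolved query.

    Hypothetical helper for illustration: the "Input:"/"Output:" labels
    are an assumed convention, not a required format.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")  # blank line between demonstrations
    # The query is left unanswered; the model is expected to continue the text.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


examples = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt("Translate English to French.", examples, "cat")
print(prompt)
```

The resulting string would be sent to the model as-is. No weights change between tasks: switching tasks means switching the instruction and examples.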

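Chinchilla reframed scaling by arguing that many earlier large models were undertrained for their size. Under a fixed compute budget, Hoffmann et al. (2022) found that model size and training-token count should be scaled in roughly equal proportion, rather than spending most of the budget on parameters alone. Their 70-billion-parameter Chinchilla, trained on about 1.4 trillion tokens, outperformed the 280-billion-parameter Gopher trained with a comparable compute budget.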
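As a rough worked example of this trade-off, the sketch below splits a compute budget between parameters and tokens. It rests on two assumptions that are common approximations rather than exact results: training compute is taken as C ≈ 6 · N · D FLOPs (N parameters, D tokens), and the compute-optimal ratio is taken as about 20 tokens per parameter, a rule of thumb often derived from the Chinchilla results. The function name and constants are illustrative.

```python
import math

# Assumption: training FLOPs are approximated as C = 6 * N * D.
FLOPS_PER_PARAM_TOKEN = 6.0
# Assumption: roughly 20 training tokens per parameter (rule of thumb).
TOKENS_PER_PARAM = 20.0


def compute_optimal_allocation(compute_flops):
    """Return (n_params, n_tokens) for a FLOP budget under the assumptions above.

    Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2,
    so N = sqrt(C / 120) and D = 20 * N.
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


# Budget implied by roughly 70 billion parameters and 1.4 trillion tokens.
budget = 6.0 * 70e9 * 1.4e12
n_params, n_tokens = compute_optimal_allocation(budget)
print(f"params ~ {n_params:.3g}, tokens ~ {n_tokens:.3g}")
```

Under these assumptions both N and D grow with the square root of compute, so a 4x larger budget suggests roughly 2x the parameters and 2x the tokens. The fitted exponents and constants in the paper differ slightly between estimation methods, so treat the output as an order-of-magnitude guide.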

Emergent abilities, as defined by Wei et al. (2022), are abilities that are absent in smaller models but present in larger ones, so they cannot be predicted by simply extrapolating the performance of smaller models. Treat that framing carefully: it is useful for describing scale-dependent behavior, but it is a research definition rather than a live leaderboard, and it does not by itself show that a model is reliable.

## Citation Boundaries

Use this article for stable LLM concepts: few-shot prompting at scale, compute-optimal scaling, and the definition of emergent abilities. Do not use it for current model rankings, pricing, product availability, private parameter counts, or live benchmark claims.

## Further Reading

- [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
- [Training Compute-Optimal Large Language Models](https://arxiv.org/abs/2203.15556)
- [Emergent Abilities of Large Language Models](https://arxiv.org/abs/2206.07682)