# Foundation Models

Status: public
Confidence: medium (0.82) (verified)
Last verified: 2026-06-01
Generation: ai_structured

## TL;DR

Foundation models are reusable AI base models trained at scale and adapted through prompting, fine-tuning, or tool use. They matter for AI programming agents because most coding, planning, retrieval, and multimodal tools are built on this base-model layer.

## Core Explanation

The public evidence supports a narrow, useful framing: foundation models are broad pretrained systems, adaptation methods make them useful for specific tasks, and scaling choices affect training efficiency. This article avoids current-product rankings and focuses on source-mapped architecture and training claims.

## Source-Mapped Facts

- The Stanford foundation-models report defines foundation models as models trained on broad data at scale and adaptable to a wide range of downstream tasks. ([source](https://arxiv.org/abs/2108.07258))
- BERT showed that a pretrained bidirectional Transformer can be fine-tuned with one additional output layer for tasks such as question answering and language inference. ([source](https://arxiv.org/abs/1810.04805))
- The Chinchilla paper reports that its compute-optimal model, Chinchilla, used the same compute budget as the larger Gopher model but with 70B parameters and four times more data. ([source](https://arxiv.org/abs/2203.15556))
- The GPT-4 technical report describes GPT-4 as a large-scale multimodal model that accepts image and text inputs and produces text outputs. ([source](https://arxiv.org/abs/2303.08774))

## Further Reading

- [On the Opportunities and Risks of Foundation Models](https://arxiv.org/abs/2108.07258)
- [BERT](https://arxiv.org/abs/1810.04805)
- [Training Compute-Optimal Large Language Models](https://arxiv.org/abs/2203.15556)
- [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774)
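
As a rough illustration of the Chinchilla entry in Source-Mapped Facts, the sketch below shows why a 4x smaller model can see 4x more data at the same compute budget. It assumes the common rule of thumb that training compute is about 6 × parameters × tokens. That approximation, the 280B baseline size, the baseline token count, and the function names are all assumptions chosen for illustration and are not taken from this article.

```python
def training_flops(params: int, tokens: int) -> int:
    """Approximate training compute as 6 * N * D FLOPs (assumed rule of thumb)."""
    return 6 * params * tokens


def tokens_for_budget(flops: int, params: int) -> int:
    """Tokens affordable at a fixed FLOP budget for a given parameter count."""
    return flops // (6 * params)


# Hypothetical baseline: a 280B-parameter model with an illustrative token count.
baseline_params = 280_000_000_000
baseline_tokens = 300_000_000_000  # assumption for illustration only

budget = training_flops(baseline_params, baseline_tokens)

# Same budget spent on a 70B-parameter model (4x fewer parameters).
smaller_params = 70_000_000_000
smaller_tokens = tokens_for_budget(budget, smaller_params)

data_ratio = smaller_tokens / baseline_tokens
print(f"data ratio at fixed compute: {data_ratio:.1f}x")
```

Under this approximation, the trade-off is purely proportional: cutting parameters by some factor frees exactly that factor in training tokens. It says nothing about which split gives the lowest loss, which is the empirical question the paper addresses.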