# Foundation Models

Status: public · Confidence: medium (0.82) · Basis: verified_sources

## TL;DR

Foundation models are reusable AI base models trained at scale and adapted through prompting, fine-tuning, or tool use. They matter for AI programming agents because most coding, planning, retrieval, and multimodal tools are built on this base-model layer.

## Core Explanation

The public evidence supports a narrow but useful framing: foundation models are broad pretrained systems, adaptation methods make them useful for specific tasks, and the way a compute budget is split between model size and training data affects training efficiency. This article avoids current-product rankings and focuses on source-mapped architecture and training claims.

## Source-Mapped Facts

- The Stanford foundation-models report defines foundation models as models trained on broad data at scale and adaptable to a wide range of downstream tasks. ([source](https://arxiv.org/abs/2108.07258))
- BERT showed that a pretrained bidirectional Transformer can be fine-tuned with an additional output layer for tasks such as question answering and language inference. ([source](https://arxiv.org/abs/1810.04805))
- The Chinchilla paper reports that its predicted compute-optimal model, Chinchilla, uses the same compute budget as Gopher but with 70B parameters and four times more data. ([source](https://arxiv.org/abs/2203.15556))
- The GPT-4 technical report describes GPT-4 as a large-scale multimodal model that accepts image and text inputs and produces text outputs. ([source](https://arxiv.org/abs/2303.08774))
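
The Chinchilla trade-off in the list above can be made concrete with a little arithmetic. The sketch below assumes the common rule-of-thumb approximation that training compute is roughly `6 * N * D` FLOPs for `N` parameters and `D` training tokens; that formula, the helper names `training_flops` and `tokens_for_budget`, and the 280B-parameter and 300B-token baseline figures are illustrative assumptions, not values taken from this article. Under that assumption, cutting the parameter count by four at a fixed budget frees up compute for four times as many tokens.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the rule of thumb C ~ 6 * N * D (an assumption)."""
    return 6.0 * params * tokens


def tokens_for_budget(flops: float, params: float) -> float:
    """Tokens affordable at a fixed compute budget under the same approximation."""
    return flops / (6.0 * params)


# Illustrative baseline (hypothetical figures chosen for the example).
large_params = 280e9   # a model four times larger than 70B
large_tokens = 300e9   # assumed token count for the larger model
budget = training_flops(large_params, large_tokens)

# Spend the same budget on the 70B-parameter size named in the list above.
small_params = 70e9
small_tokens = tokens_for_budget(budget, small_params)

data_ratio = small_tokens / large_tokens
print(f"budget ~ {budget:.2e} FLOPs")
print(f"70B model sees {data_ratio:.1f}x the tokens at the same budget")
```

The point of the sketch is only the proportionality: with compute held fixed, parameters and tokens trade off inversely. It does not reproduce the paper's own fitting procedure or its exact compute accounting.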

## Further Reading

- [On the Opportunities and Risks of Foundation Models](https://arxiv.org/abs/2108.07258)
- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
- [Training Compute-Optimal Large Language Models](https://arxiv.org/abs/2203.15556)
- [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774)