Code Semantic Tokens and Symbol Classification

Status: public · Confidence: medium (0.865) · Basis: verified_sources

## TL;DR

Semantic tokens attach a language-aware role to each span of source text, so an agent can tell variables, types, functions, keywords, parameters, and properties apart, along with modifiers such as static or read-only.

## Core Explanation

Plain tokenization tells an agent where identifiers, literals, and punctuation begin and end. Semantic classification tells the agent what role each of those tokens plays in a language-aware context. That distinction improves code navigation, review, rename planning, and explanation because identifiers with the same spelling can play different roles: `count`, for example, may name a parameter in one function and a property in an unrelated class.
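As a rough illustration of that distinction, the following sketch uses only Python's standard library: `tokenize` finds name tokens lexically, and a small `ast` visitor attaches a coarse role to each one. The `RoleVisitor` class, the `classify` helper, and the role labels are hypothetical names chosen for this example. The scope handling is deliberately simplified (positional parameters only, plain `def` with a single space before the name, no shadowing), so it is not a substitute for a language server.

```python
import ast
import io
import keyword
import tokenize

SOURCE = '''\
def area(width, height):
    return width * height

width = area(2, 3)
'''


def lexical_names(source):
    """Return (line, column, text) for every NAME token; no role information."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(t.start[0], t.start[1], t.string)
            for t in tokens if t.type == tokenize.NAME]


class RoleVisitor(ast.NodeVisitor):
    """Hypothetical helper: maps (line, column) to a coarse role label."""

    def __init__(self):
        self.roles = {}
        self.params = []  # stack of parameter-name sets, one per enclosing function

    def visit_FunctionDef(self, node):
        # Assumption: the name starts right after "def " on the definition line.
        self.roles[(node.lineno, node.col_offset + len("def "))] = "function"
        for arg in node.args.args:
            self.roles[(arg.lineno, arg.col_offset)] = "parameter"
        self.params.append({arg.arg for arg in node.args.args})
        for stmt in node.body:
            self.visit(stmt)
        self.params.pop()

    def visit_Call(self, node):
        # A bare name in call position is treated as a function reference.
        if isinstance(node.func, ast.Name):
            self.roles[(node.func.lineno, node.func.col_offset)] = "function"
        self.generic_visit(node)

    def visit_Name(self, node):
        pos = (node.lineno, node.col_offset)
        if pos in self.roles:
            return  # already labeled as a call target
        is_param = any(node.id in scope for scope in self.params)
        self.roles[pos] = "parameter" if is_param else "variable"


def classify(source):
    """Combine the lexical pass with the role map from the syntax tree."""
    visitor = RoleVisitor()
    visitor.visit(ast.parse(source))
    result = []
    for line, col, text in lexical_names(source):
        if keyword.iskeyword(text):
            role = "keyword"
        else:
            role = visitor.roles.get((line, col), "unknown")
        result.append((line, col, text, role))
    return result


if __name__ == "__main__":
    for line, col, text, role in classify(SOURCE):
        print(f"{line}:{col:<3} {text:<8} {role}")
```

In this sample, `width` appears as a parameter on lines 1 and 2 and as a module-level variable on line 4, a difference the lexical pass alone cannot express.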

Semantic tokens are not a replacement for full type checking or symbol indexes. They are a compact layer that can enrich code search, highlighting, chunking, and local reasoning when deeper build-aware analysis is unavailable.

## Source-Mapped Facts

- The Language Server Protocol specification (3.17) defines semantic token requests through which a server reports token types and modifiers for semantic highlighting of source code. ([source](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#semanticTokens))
- Tree-sitter documentation describes syntax highlighting with queries that assign capture names to syntax nodes. ([source](https://tree-sitter.github.io/tree-sitter/3-syntax-highlighting.html))
- Pygments documentation describes token types as a hierarchy used by lexers and formatters. ([source](https://pygments.org/docs/tokens/))
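
To make the LSP entry more concrete: the specification transmits semantic tokens as a flat integer array, five integers per token (line delta, start-character delta, length, token type index, modifier bit set), with positions relative to the previous token and type and modifier names looked up in a legend. The sketch below decodes such an array into absolute positions. The `decode_semantic_tokens` function and the `SemanticToken` tuple are hypothetical names for this example; the sample data and legend follow the worked example in the specification.

```python
from typing import NamedTuple


class SemanticToken(NamedTuple):
    line: int
    start_char: int
    length: int
    token_type: str
    modifiers: list


def decode_semantic_tokens(data, token_types, token_modifiers):
    """Decode a relative-encoded LSP semantic token array (hypothetical helper)."""
    if len(data) % 5 != 0:
        raise ValueError("semantic token data must hold five integers per token")
    tokens = []
    line = 0
    start_char = 0
    for i in range(0, len(data), 5):
        delta_line, delta_start, length, type_index, modifier_bits = data[i:i + 5]
        if delta_line:
            line += delta_line
            start_char = delta_start   # first token on a new line: absolute column
        else:
            start_char += delta_start  # same line: relative to the previous token start
        # Each set bit selects one modifier name from the legend.
        modifiers = [name for bit, name in enumerate(token_modifiers)
                     if modifier_bits & (1 << bit)]
        tokens.append(SemanticToken(line, start_char, length,
                                    token_types[type_index], modifiers))
    return tokens


if __name__ == "__main__":
    legend_types = ["property", "type", "class"]
    legend_modifiers = ["private", "static"]
    data = [2, 5, 3, 0, 3, 0, 5, 4, 1, 0, 3, 2, 7, 2, 0]
    for token in decode_semantic_tokens(data, legend_types, legend_modifiers):
        print(token)
```

Decoding to absolute positions first keeps later steps, such as joining tokens with search hits or chunk boundaries, independent of the wire format.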

## Further Reading

- [Language Server Protocol Semantic Tokens](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#semanticTokens)
- [Tree-sitter Syntax Highlighting](https://tree-sitter.github.io/tree-sitter/3-syntax-highlighting.html)
- [Pygments Token Types](https://pygments.org/docs/tokens/)