Code Semantic Tokens and Symbol Classification
Status: public · Confidence: medium (0.865) · Basis: verified_sources
## TL;DR

Semantic tokens classify code beyond raw text, helping agents distinguish variables, types, functions, keywords, parameters, properties, and modifiers.

## Core Explanation

Plain tokenization can tell an agent where words and punctuation are. Semantic classification tells the agent what those tokens mean in a language-aware context. That distinction improves code navigation, review, rename planning, and explanation because identifiers with the same spelling can play different roles.

Semantic tokens are not a replacement for full type checking or symbol indexes. They are a compact layer that can enrich code search, highlighting, chunking, and local reasoning when deeper build-aware analysis is unavailable.

## Source-Mapped Facts

- The Language Server Protocol specification defines semantic tokens for semantic highlighting of source code. ([source](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#semanticTokens))
- Tree-sitter documentation describes syntax highlighting with queries that assign capture names to syntax nodes. ([source](https://tree-sitter.github.io/tree-sitter/3-syntax-highlighting.html))
- Pygments documentation describes token types as a hierarchy used by lexers and formatters. ([source](https://pygments.org/docs/tokens/))

## Further Reading

- [Language Server Protocol Semantic Tokens](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#semanticTokens)
- [Tree-sitter Syntax Highlighting](https://tree-sitter.github.io/tree-sitter/3-syntax-highlighting.html)
- [Pygments Token Types](https://pygments.org/docs/tokens/)
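
## Illustrative Sketch

The Core Explanation's point that identifiers with the same spelling can play different roles can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions, not the Language Server Protocol encoding or any tool's actual API: `SOURCE`, `plain_names`, `RoleClassifier`, and `classify` are hypothetical names, and the role labels simply echo the categories listed in the TL;DR. The resolver handles only module-level declarations and the innermost function's parameters, and it does not distinguish methods from functions.

```python
import ast
import io
import tokenize

# Hypothetical sample: the spelling "size" appears in several roles.
SOURCE = (
    "class Size:\n"
    "    def __init__(self, size):\n"
    "        self.size = size\n"
    "\n"
    "def size(size):\n"
    "    return Size(size).size\n"
)


def plain_names(source):
    """Lexical view: tokenize reports keywords and identifiers alike as NAME."""
    reader = io.StringIO(source).readline
    return [
        (tok.start[0], tok.string)
        for tok in tokenize.generate_tokens(reader)
        if tok.type == tokenize.NAME
    ]


class RoleClassifier(ast.NodeVisitor):
    """Assigns a coarse role to each identifier (assumed labels, simplified scoping)."""

    def __init__(self):
        self.roles = []      # (line, name, role)
        self.declared = {}   # module-level name -> role
        self.params = []     # stack of parameter-name sets, innermost last

    def visit_Module(self, node):
        # Pre-register module-level declarations so later references resolve.
        for stmt in node.body:
            if isinstance(stmt, ast.ClassDef):
                self.declared[stmt.name] = "class"
            elif isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
                self.declared[stmt.name] = "function"
        self.generic_visit(node)

    def visit_ClassDef(self, node):
        self.roles.append((node.lineno, node.name, "class"))
        self.generic_visit(node)

    def visit_FunctionDef(self, node):
        # Methods are labelled "function" too; this sketch does not separate them.
        self.roles.append((node.lineno, node.name, "function"))
        args = node.args
        names = {a.arg for a in [*args.posonlyargs, *args.args, *args.kwonlyargs]}
        self.params.append(names)
        self.generic_visit(node)
        self.params.pop()

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_arg(self, node):
        self.roles.append((node.lineno, node.arg, "parameter"))

    def visit_Attribute(self, node):
        self.generic_visit(node)  # classify the object expression first
        self.roles.append((node.lineno, node.attr, "property"))

    def visit_Name(self, node):
        # Parameters of the innermost function shadow module-level names.
        if self.params and node.id in self.params[-1]:
            role = "parameter"
        else:
            role = self.declared.get(node.id, "variable")
        self.roles.append((node.lineno, node.id, role))


def classify(source):
    classifier = RoleClassifier()
    classifier.visit(ast.parse(source))
    return classifier.roles


if __name__ == "__main__":
    print("lexical:", [t for t in plain_names(SOURCE) if t[1] == "size"])
    print("semantic:", [r for r in classify(SOURCE) if r[1] == "size"])
```

The lexical pass sees every occurrence of `size` as the same kind of token, while the syntax-aware pass separates the function, its parameter, and the property. A real semantic token provider would typically rely on the language's own symbol resolution rather than a hand-written visitor like this one.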