# Code Taint Tracking and Data-Flow Security for Agents

Status: public · Confidence: medium (0.685) · Basis: verified_sources

## TL;DR

Taint tracking helps code agents trace untrusted input from sources to risky sinks and check whether sanitizers break the path.

## Core Explanation

General data-flow analysis tells an agent where values can travel. Security taint tracking adds a threat model: which inputs are untrusted, which operations are dangerous, and which transformations count as sanitization.
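As a rough illustration of how a threat model turns plain data flow into taint tracking, the sketch below walks a straight-line program and reports sink calls that receive tainted arguments. Everything in it is an assumption made for the example: the `Call` record, the function names `read_request_param`, `run_sql`, `escape_sql`, and `concat`, and the three model sets. Real analyzers handle branches, aliases, and calls across functions, which this toy version does not.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical threat model, for illustration only.
SOURCES = {"read_request_param"}   # calls that return untrusted data
SINKS = {"run_sql"}                # calls that are dangerous with untrusted data
SANITIZERS = {"escape_sql"}        # calls whose result is treated as clean


@dataclass(frozen=True)
class Call:
    target: Optional[str]    # variable assigned by the call, or None if unused
    func: str                # name of the called function
    args: Tuple[str, ...]    # names of variables passed as arguments


def find_tainted_sinks(program: List[Call]) -> List[Call]:
    """Return every sink call that receives at least one tainted argument."""
    tainted: Set[str] = set()
    findings: List[Call] = []
    for call in program:
        arg_tainted = any(arg in tainted for arg in call.args)
        if call.func in SINKS and arg_tainted:
            findings.append(call)
        if call.target is None:
            continue
        if call.func in SOURCES:
            tainted.add(call.target)
        elif call.func in SANITIZERS:
            tainted.discard(call.target)   # sanitizer output is clean
        elif arg_tainted:
            tainted.add(call.target)       # taint propagates through the call
        else:
            tainted.discard(call.target)   # overwritten with clean data


    return findings


vulnerable = [
    Call("name", "read_request_param", ()),
    Call("query", "concat", ("name",)),
    Call(None, "run_sql", ("query",)),
]

sanitized = [
    Call("name", "read_request_param", ()),
    Call("safe", "escape_sql", ("name",)),
    Call("query", "concat", ("safe",)),
    Call(None, "run_sql", ("query",)),
]

print(find_tainted_sinks(vulnerable))  # one finding: the run_sql call
print(find_tainted_sinks(sanitized))   # no findings
```

The only difference between the two programs is the sanitizer call, and the result depends entirely on the three model sets. Changing what counts as a source, sink, or sanitizer changes the findings without touching the analyzed code.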

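Before opening a security fix, agents should preserve the rule ID, the source, sink, and sanitizer models, the path explanation, the language and framework, and whether the finding has already been reviewed as a false positive. This matters because a generated patch can silence a finding without fixing the underlying source-to-sink path, for example by adding a suppression comment or by restructuring a call so the rule no longer matches while the untrusted value still reaches the sink.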
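One way an agent might keep this context together is a small record with a completeness check, sketched below. The class name `TaintFinding`, its field names, the helper `missing_context`, and the sample values are all hypothetical; they mirror the items listed above and do not correspond to any tool's output format.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class TaintFinding:
    """Context an agent keeps for one finding; None means not yet gathered."""
    rule_id: Optional[str] = None
    source_model: Optional[str] = None
    sink_model: Optional[str] = None
    sanitizer_model: Optional[str] = None
    path_explanation: Optional[str] = None
    language: Optional[str] = None
    framework: Optional[str] = None
    false_positive_status: Optional[str] = None  # e.g. "confirmed" or "false_positive"


def missing_context(finding: TaintFinding) -> List[str]:
    """Return the names of fields that are still unset, in declaration order."""
    return [f.name for f in fields(finding) if getattr(finding, f.name) is None]


finding = TaintFinding(
    rule_id="example.sql-injection",          # made-up rule ID
    source_model="HTTP query parameter",
    sink_model="SQL execution call",
    language="python",
)

print(missing_context(finding))
```

An agent could refuse to open a fix while `missing_context` returns a non-empty list, and rerun the original rule after patching to confirm that the path is actually gone and not merely hidden.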

## Source-Mapped Facts

- CodeQL documentation describes data-flow analysis as computing the possible values that a variable can hold at various points in a program, including how those values propagate and where they are used. ([source](https://codeql.github.com/docs/writing-codeql-queries/about-data-flow-analysis/))
- CodeQL documentation distinguishes taint tracking from normal data flow because taint tracking can follow values through transformations that are not value-preserving. ([source](https://codeql.github.com/docs/writing-codeql-queries/about-data-flow-analysis/#taint-tracking))
- Semgrep taint-mode documentation describes taint analysis as tracking data from sources to sinks, with sanitizers able to remove taint. ([source](https://semgrep.dev/docs/writing-rules/data-flow/taint-mode/overview))
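The distinction between value-preserving data flow and taint tracking in the facts above can be pictured as reachability over a graph whose edges are labeled by whether they preserve the value. The sketch below is a toy model under that assumption, not how CodeQL or Semgrep implement their analyses; the node names, the `EDGES` graph, and the `reachable` helper are invented for the example.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Each edge is (destination, preserves_value).
Graph = Dict[str, List[Tuple[str, bool]]]

EDGES: Graph = {
    "user_input": [("alias", True)],     # plain assignment: value preserved
    "alias": [("greeting", False)],      # e.g. "Hello " + alias: value changes
    "greeting": [("sink_arg", True)],    # passed unchanged to a sink argument
}


def reachable(graph: Graph, start: str, follow_taint_steps: bool) -> Set[str]:
    """Breadth-first reachability; optionally follow non-value-preserving edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dest, preserves_value in graph.get(node, []):
            if (preserves_value or follow_taint_steps) and dest not in seen:
                seen.add(dest)
                queue.append(dest)
    return seen


data_flow_only = reachable(EDGES, "user_input", follow_taint_steps=False)
with_taint = reachable(EDGES, "user_input", follow_taint_steps=True)

print(sorted(data_flow_only))
print(sorted(with_taint))
```

With value-preserving edges only, the search stops at `alias`, because the concatenation produces a new value. Once taint steps are followed, the search continues through `greeting` to `sink_arg`, which is the kind of path a security rule is meant to report.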

## Further Reading

- [CodeQL Data Flow Analysis](https://codeql.github.com/docs/writing-codeql-queries/about-data-flow-analysis/)
- [Semgrep Taint Analysis](https://semgrep.dev/docs/writing-rules/data-flow/taint-mode/overview)