Web Robots Meta and X-Robots-Tag Controls
Status: public · Confidence: medium (0.865) · Basis: verified_sources
## TL;DR

Robots meta tags, X-Robots-Tag headers, and robots.txt help agents explain why a page is crawlable, indexable, hidden from search, or absent from AI discovery surfaces.

## Core Explanation

Agents debugging web discovery should inspect the fetched HTML, response headers, robots.txt, canonical URL, HTTP status, redirects, sitemap entry, and crawler user agent. A page can be reachable by humans while being intentionally excluded from search or agent indexes.

The key evidence is where a directive appears. A robots meta tag lives in the HTML, while X-Robots-Tag is an HTTP header, so it can also apply to non-HTML resources such as PDFs and images. Robots.txt controls which URLs a crawler may fetch; it does not carry page-level indexing directives. Those directives only take effect once a crawler is allowed to fetch the page and read them, so a URL disallowed in robots.txt hides any noindex rule it carries.

## Source-Mapped Facts

- Google Search Central documentation describes robots meta tags as page-level settings that control how individual pages appear in search results. ([source](https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag))
- Google Search Central documentation says the X-Robots-Tag HTTP header can control indexing behavior. ([source](https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag))
- RFC 9309 defines the Robots Exclusion Protocol for robots.txt files. ([source](https://datatracker.ietf.org/doc/html/rfc9309))

## Further Reading

- [Google Robots Meta Tag](https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag)
- [RFC 9309 Robots Exclusion Protocol](https://datatracker.ietf.org/doc/html/rfc9309)
- [Google robots.txt Introduction](https://developers.google.com/search/docs/crawling-indexing/robots/intro)
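The inspection order described under Core Explanation can be sketched in Python. This is a minimal illustration, not a reference implementation. It assumes the robots.txt body, the X-Robots-Tag header values, and the HTML have already been fetched. The names `explain`, `header_directives`, and `RobotsMetaParser` are hypothetical. The sketch checks robots.txt first, because a disallowed URL is never fetched and its page-level directives are never seen. Two simplifications apply: the standard-library `urllib.robotparser` applies the first matching rule in file order rather than the longest-match precedence RFC 9309 requires, and the `agent: directives` header parsing here is deliberately simplified.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def split_directives(value: str) -> set[str]:
    """Split a comma-separated directive list such as 'noindex, nofollow'."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects <meta name="robots"> and <meta name="<agent>"> directives."""

    def __init__(self, agent: str) -> None:
        super().__init__()
        self.agent = agent.lower()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; valueless attributes arrive as None.
        attributes = {key: value or "" for key, value in attrs}
        if attributes.get("name", "").lower() in ("robots", self.agent):
            self.directives |= split_directives(attributes.get("content", ""))


def header_directives(values: list[str], agent: str) -> set[str]:
    """Hypothetical helper: collects X-Robots-Tag directives that apply to `agent`."""
    found: set[str] = set()
    for value in values:
        prefix, sep, rest = value.partition(":")
        # Simplified assumption: a single token before the first colon (other than
        # 'unavailable_after') names the user agent the value is scoped to.
        if sep and "," not in prefix and prefix.strip().lower() != "unavailable_after":
            if prefix.strip().lower() != agent.lower():
                continue  # value is scoped to a different crawler
            value = rest
        found |= split_directives(value)
    return found


def explain(url: str, agent: str, robots_txt: str, x_robots_tags: list[str], html: str) -> dict:
    """Hypothetical entry point: explain crawlability and indexability from fetched evidence."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(agent, url):
        # A blocked crawler never fetches the page, so it cannot see any noindex rule.
        return {"crawlable": False, "indexable": None, "directives": set(),
                "reason": "disallowed by robots.txt"}
    meta = RobotsMetaParser(agent)
    meta.feed(html)
    directives = meta.directives | header_directives(x_robots_tags, agent)
    blocked = directives & {"noindex", "none"}
    return {"crawlable": True, "indexable": not blocked, "directives": directives,
            "reason": "noindex directive present" if blocked else "no noindex directive found"}
```

For example, with robots.txt `User-agent: *` / `Disallow: /private/`, `explain` reports `/private/a.html` as not crawlable with indexability unknown. A PDF served with `X-Robots-Tag: noindex` is reported as crawlable but not indexable. Returning `None` rather than `False` for blocked URLs reflects that the crawler never saw the page's own directives.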