# Web Robots Meta and X-Robots-Tag Controls

Status: public · Confidence: medium (0.865) · Basis: verified_sources

## TL;DR

Robots meta tags, X-Robots-Tag headers, and robots.txt are the three controls an agent should check to explain why a page is crawlable, indexable, hidden from search, or absent from AI discovery surfaces.

## Core Explanation

Agents debugging web discovery should inspect the fetched HTML, response headers, robots.txt, canonical URL, HTTP status, redirects, sitemap entry, and crawler user agent. A page can be reachable by humans while intentionally excluded from search or agent indexes.
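As a rough illustration of that checklist, the evidence from one fetch could be gathered into a single record and turned into plain-language findings. This is a minimal sketch, not an established schema: `DiscoveryEvidence`, its field names, and `explain` are hypothetical, and the `noindex` check is a simple substring match that ignores crawler-specific prefixes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscoveryEvidence:
    """Hypothetical record of what one fetch observed (names are illustrative)."""
    url: str
    user_agent: str
    http_status: int
    robots_txt_allows_fetch: bool
    canonical_url: Optional[str] = None
    redirect_chain: List[str] = field(default_factory=list)
    in_sitemap: bool = False
    meta_robots: Optional[str] = None   # content of <meta name="robots">, if any
    x_robots_tag: Optional[str] = None  # value of the X-Robots-Tag header, if any


def explain(evidence: DiscoveryEvidence) -> List[str]:
    """Return human-readable reasons a page may be missing from an index."""
    findings: List[str] = []
    if not evidence.robots_txt_allows_fetch:
        findings.append("robots.txt disallows fetching for this user agent")
    if evidence.http_status >= 400:
        findings.append(f"HTTP status {evidence.http_status} is an error response")
    if evidence.redirect_chain:
        findings.append(f"request was redirected {len(evidence.redirect_chain)} time(s)")
    if evidence.canonical_url and evidence.canonical_url != evidence.url:
        findings.append("canonical URL points to a different address")
    # Simplification: substring match only; real values may be crawler-scoped.
    for source, value in (
        ("meta robots", evidence.meta_robots),
        ("X-Robots-Tag", evidence.x_robots_tag),
    ):
        if value and "noindex" in value.lower():
            findings.append(f"{source} contains noindex")
    if not evidence.in_sitemap:
        findings.append("URL is not listed in the sitemap")
    return findings
```

An empty list does not prove a page is indexed; it only means none of these particular signals explains an exclusion.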

What matters most is where a directive appears. A robots meta tag lives in the HTML, while X-Robots-Tag is an HTTP response header, so it can also apply to non-HTML resources such as PDFs or images. robots.txt governs which URLs a crawler may fetch; it does not carry page-level indexing metadata. Indexing directives take effect only once a crawler is allowed to fetch the page and can read them.
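To make the two locations concrete, the sketch below reads directives from both the HTML and the response headers using only the Python standard library. The function name `robots_directives` and the returned keys are assumptions for illustration. It handles only `<meta name="robots">`, ignoring crawler-specific meta names and user-agent prefixes inside the header value, and it assumes the headers arrive as a simple mapping with at most one `X-Robots-Tag` entry.

```python
from html.parser import HTMLParser
from typing import Dict, List, Mapping


def _split(value: str) -> List[str]:
    """Split a comma-separated directive string into lowercase tokens."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives += _split(attr.get("content", ""))


def robots_directives(html: str, headers: Mapping[str, str]) -> Dict[str, List[str]]:
    """Report directives by where they were found: the HTML or the headers."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    return {"meta": parser.directives, "header": _split(header_value)}
```

Keeping the two sources separate in the result preserves the evidence of where each directive came from, which is usually what a debugging agent needs to report.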

## Source-Mapped Facts

- Google Search Central documentation describes robots meta tags as page-level settings that control how individual pages appear in search results. ([source](https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag))
- Google Search Central documentation says the X-Robots-Tag HTTP response header can carry the same indexing rules as a robots meta tag. ([source](https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag))
- RFC 9309 defines the Robots Exclusion Protocol for robots.txt files. ([source](https://datatracker.ietf.org/doc/html/rfc9309))
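For the robots.txt side, a quick fetch-permission check can be sketched with Python's standard `urllib.robotparser`. The helper `may_fetch`, the sample rules, and the `ExampleBot` user agent are made up for illustration, and the standard-library parser's matching may not agree with RFC 9309 or with any particular crawler in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content, not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/private/report.html", "/public/index.html"):
        url = "https://example.com" + path
        print(path, may_fetch(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

A `True` result says only that fetching is permitted; whether the page may be indexed still depends on its robots meta tag or X-Robots-Tag header.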

## Further Reading

- [Google Robots Meta Tag](https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag)
- [RFC 9309 Robots Exclusion Protocol](https://datatracker.ietf.org/doc/html/rfc9309)
- [Google robots.txt Introduction](https://developers.google.com/search/docs/crawling-indexing/robots/intro)