What does the AI Crawler Checker do?

It fetches the public robots.txt of any domain you enter and evaluates the rules for each major AI crawler and training opt-out token: GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Google-Extended, PerplexityBot, CCBot, Bytespider, Meta-ExternalAgent, Applebot-Extended and more. For each one it reports whether the homepage path (/) is allowed or disallowed by the site's robots.txt directives.
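As a rough illustration of that evaluation, here is a minimal sketch built on Python's standard-library urllib.robotparser. It is not the checker's actual implementation: the bot list, the sample robots.txt, and the evaluate_robots helper are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of AI user-agent tokens (assumption: the real tool checks more).
AI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
           "Google-Extended", "PerplexityBot", "CCBot")


def evaluate_robots(robots_txt: str, bots=AI_BOTS, path: str = "/") -> dict[str, bool]:
    """Return {bot: True if `path` is allowed for that bot} for a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


# Hypothetical robots.txt: GPTBot is disallowed, every other agent is allowed.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for bot, allowed in evaluate_robots(SAMPLE).items():
        print(f"{bot:16} {'allowed' if allowed else 'disallowed'}")
```

To check a live site instead of a string, RobotFileParser also offers set_url() and read(), which fetch the file over HTTP. Python's parser applies the first matching rule within a group, so its verdicts may differ from crawlers that use longest-match precedence.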

Does a 'blocked' result mean the AI bot definitely can't access the site?

No. robots.txt is a signal, not enforcement. Well-behaved operators honor it (OpenAI's GPTBot follows its rules, and Google respects the Google-Extended token), but non-compliant bots can ignore it entirely. Real enforcement happens at the network/WAF layer: Cloudflare's default AI-bot block, a 403, or a 402 Pay-Per-Crawl response. The checker tells you the site's stated intent, not the technical reality.
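To make the distinction concrete, the sketch below keeps the two signals separate: what robots.txt says (intent) and what status code the server returned to a request sent with that bot's user agent (enforcement). The describe_access function and its wording are hypothetical, and the checker itself only reports the first signal.

```python
def describe_access(robots_allows: bool, http_status: int) -> str:
    """Summarize stated intent (robots.txt) next to observed enforcement (HTTP status)."""
    if http_status == 402:
        enforcement = "payment required (for example a pay-per-crawl response)"
    elif http_status == 403:
        enforcement = "blocked at the network layer"
    elif 200 <= http_status < 300:
        enforcement = "served"
    else:
        enforcement = f"inconclusive (HTTP {http_status})"
    intent = "allowed" if robots_allows else "disallowed"
    return f"robots.txt: {intent}; server: {enforcement}"


if __name__ == "__main__":
    # A bot that ignores robots.txt can still be served the page...
    print(describe_access(robots_allows=False, http_status=200))
    # ...and a bot that robots.txt allows can still be stopped by a WAF.
    print(describe_access(robots_allows=True, http_status=403))
```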

Why would a site block AI crawlers?

To stop its content from being used for AI training without compensation, to preserve bandwidth, or to push AI companies toward licensing. Over 2.5 million sites have opted out of AI training via managed robots.txt, and roughly 18.7% of all sites block GPTBot specifically. Some sites block training crawlers and opt-out tokens (GPTBot, CCBot, Google-Extended) while still allowing answer-engine crawlers that drive referral traffic.
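A selective policy of that kind might look like the hypothetical robots.txt embedded below, which disallows three training-related agents and explicitly allows two answer-engine crawlers. The policy text and the is_allowed helper are assumptions for illustration, verified here with Python's standard-library parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: opt out of training, stay visible to answer engines.
SELECTIVE_POLICY = """\
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /
"""


def is_allowed(robots_txt: str, bot: str, path: str = "/") -> bool:
    """Return True if `bot` may fetch `path` under the given robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


if __name__ == "__main__":
    for bot in ("GPTBot", "CCBot", "Google-Extended", "OAI-SearchBot", "PerplexityBot"):
        verdict = "allowed" if is_allowed(SELECTIVE_POLICY, bot) else "disallowed"
        print(f"{bot:16} {verdict}")
```

Agents that no group names, and that have no wildcard group to fall back on, are treated as allowed by this parser.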

Is my data stored when I run a check?

No. The tool fetches the target domain's public robots.txt in real time, evaluates it, and returns the result. We don't store the domains you check. Requests are rate-limited to keep the tool fast and fair for everyone.

What's the difference between robots.txt, llms.txt and ai.txt?

robots.txt tells crawlers which paths they may access (a signal, not enforcement). llms.txt is a content map for AI models (it restricts nothing). ai.txt is a training opt-out tied to Spawning's Do-Not-Train registry. This checker reads robots.txt because that's the file AI crawlers actually consult for access. For the full breakdown, see our guide on robots.txt vs llms.txt vs ai.txt.


Free Tool · No Signup · 2026

Can AI Crawlers See Your Site? AI Crawler Checker

Enter any domain to instantly see whether its robots.txt allows or blocks the major AI crawlers — GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot and 15+ more.

We fetch the domain's public robots.txt and evaluate the rules for each known AI crawler. Nothing is stored.

What this tool checks

The checker fetches the public /robots.txt of the domain you enter and resolves the directives for each known AI user-agent against the homepage path. It covers training crawlers, answer-engine crawlers, and the training opt-out tokens:

Training crawlers

GPTBot, ClaudeBot, CCBot, Bytespider, Meta-ExternalAgent, cohere-ai

Answer-engine crawlers

OAI-SearchBot, PerplexityBot, YouBot, Googlebot, Amazonbot

Training opt-out tokens

Google-Extended, Applebot-Extended

Live-fetch user agents

ChatGPT-User, Claude-Web, Perplexity-User

Important: a "blocked" result means the site's robots.txt asks that bot not to crawl — it doesn't guarantee the bot obeys. robots.txt is a signal; real enforcement lives at the WAF/CDN. Read why in robots.txt vs llms.txt vs ai.txt.


Learn more

The Closing Web in 2026 (pillar)

AI crawler blocking, Pay-Per-Crawl, and the data wars in full.

robots.txt vs llms.txt vs ai.txt

What each file does, what it can't, and why the WAF is the real control.

Scraping in the Agentic Era (MCP)

How AI agents collect web data and why the IP layer decides.

Is web scraping legal in 2026?

hiQ, Meta v Bright Data, Reddit v Perplexity & DMCA §1201.

How websites detect proxies in 2026

The 7-layer detection stack you must pass as a real visitor.

Web scraping proxies

Real 4G/5G carrier IPs for legitimate public-data collection.