Check if your site is ready for AI crawlers, citations & training.
The AI Readiness Check is a free 5-stage diagnostic that scores your domain on AI bot access, llms.txt quality, structured data, content rendering, and metadata. Median scan time: under a second.
What this tool measures
AI Readiness is a composite score across five independent dimensions. Each dimension is parsed from public, no-auth-required signals on your domain. Total: 100 points.
- AI crawler access: Tests 21 AI bot user-agents against your robots.txt, including GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, CCBot, and more. Distinguishes training crawlers from search crawlers.
- llms.txt: Detects /llms.txt and /llms-full.txt. Scores presence, structural compliance with the proposed spec, link completeness, and reachability of linked resources.
- Structured data: Extracts JSON-LD, microdata, and RDFa. Validates against schema.org. Recognizes Organization, WebSite, Article, FAQPage, Product, and BreadcrumbList schemas.
- Content accessibility: Measures the gap between server-rendered HTML and JS-rendered HTML. Most AI training bots cannot execute JavaScript, so content behind a hydration boundary is invisible to them.
- Metadata and sitemap: Inventories <title>, meta description, canonical, OpenGraph, Twitter Card, and AI-specific meta directives. Probes for sitemap.xml and validates its structure.
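To make the structured data dimension concrete, here is a minimal sketch of how JSON-LD could be pulled out of raw HTML and summarised by schema type. The function names `extractJsonLd` and `schemaTypes` are hypothetical and are not part of the tool's API; the regex approach is an assumption chosen for brevity, and a production scanner would more likely use a real HTML parser.

```javascript
// Hypothetical helper: collects JSON-LD blocks from raw (non-JS-rendered) HTML.
function extractJsonLd(html) {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip malformed JSON rather than failing the whole scan.
    }
  }
  return blocks;
}

// Hypothetical helper: lists the @type values found, walking arrays and @graph.
function schemaTypes(blocks) {
  const types = new Set();
  const visit = (node) => {
    if (Array.isArray(node)) {
      node.forEach((child) => visit(child));
      return;
    }
    if (node === null || typeof node !== 'object') return;
    // @type may be a single string or an array of strings.
    [].concat(node['@type'] ?? []).forEach((t) => types.add(t));
    if (node['@graph']) visit(node['@graph']);
  };
  visit(blocks);
  return [...types];
}
```

Calling `schemaTypes(extractJsonLd(html))` on a homepage response gives a quick list of which of the recognised schemas are present.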
Why AI readiness is the new SEO surface area
In 2024 your homepage competed for ten blue links on a results page. In 2026 it competes for inclusion inside an AI-generated answer. The mechanics are different enough that classic SEO audits miss most of what now matters:
- AI crawlers have separate user-agents. Your robots.txt may welcome Googlebot and quietly exclude GPTBot. Most CMS defaults haven't caught up to the bots that emerged in the last 18 months.
- AI training crawlers do not run JavaScript. A perfectly optimized Next.js SPA can be invisible to them. Server-side rendering is no longer optional for content you want cited.
- Citation depends on entity identity. Without JSON-LD, an LLM has to guess who you are from prose — and it often guesses wrong, citing competitors with stronger entity signals instead.
- llms.txt aims to be the sitemap for AI systems. It is a proposed convention for telling AI systems which pages matter most. Without it, crawlers have to discover your core content on their own and may never reach it.
What happens when you scan
Each scan is a 5-stage probe completed in under a second. No JavaScript execution in the primary pass; no account required; results are not stored against your IP.
- Stage 1: Fetch robots.txt. Parse User-agent blocks. Match against our 21-bot catalogue. Detect Sitemap directives.
- Stage 2: Probe /llms.txt and /llms-full.txt. Validate Content-Type, parse markdown structure, walk the first five links for reachability.
- Stage 3: Fetch homepage without JS. Extract all <script type="application/ld+json"> blocks. Validate against schema.org vocabularies.
- Stage 4: Measure rendering gap. Compare server-rendered body length to a headless render. A large gap means most AI bots see an empty page.
- Stage 5: Catalogue meta surface. Title, description, canonical, OpenGraph, Twitter Card, AI-specific directives, sitemap.xml.
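Stage 4 above compares what a plain HTTP fetch returns with what a browser would render. The sketch below shows one way that comparison could be expressed. It assumes you already have both HTML strings (the rendered one would come from a headless browser, which is out of scope here), and the names `visibleTextLength` and `renderingGap` are hypothetical rather than the tool's own.

```javascript
// Rough visible-text length: drop scripts and styles, strip tags, collapse whitespace.
function visibleTextLength(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim().length;
}

// Fraction of the rendered text that is missing from the raw server response.
// 0 means the server response already contains everything; 1 means it contains nothing.
function renderingGap(serverHtml, renderedHtml) {
  const server = visibleTextLength(serverHtml);
  const rendered = visibleTextLength(renderedHtml);
  if (rendered === 0) return 0; // nothing to compare against
  return Math.max(0, (rendered - server) / rendered);
}
```

A client-rendered page whose server response is only an empty root element would score a gap of 1 under this measure. Where to draw the line between "server_rendered" and "js_required" is a policy choice this sketch does not make.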
    {
      "domain": "cloudflare.com",
      "score": 85,
      "grade": "A",
      "strategy": "balanced",
      "checks": {
        "robots_txt": { "score": 20, "exists": true },
        "llms_txt": { "score": 20, "exists": true, "links": 84 },
        "structured_data": { "score": 15, "json_ld_count": 3 },
        "content_accessibility": { "score": 20, "rendering": "server_rendered" },
        "meta_sitemap": { "score": 10, "has_canonical": true }
      }
    }

How to read your grade
Scores map to letter grades using the fixed thresholds below. The 'balanced' strategy weights all five dimensions according to their point allocation. A future 'citation-first' strategy will up-weight structured data and llms.txt.
- A · 90–100 · AI-ready: Your domain is discoverable, parseable, and identifiable to every major AI system. No critical gaps.
- B · 75–89 · Strong foundation: One or two dimensions are weak, usually a missing llms.txt or partial schema coverage. Quick fixes available.
- C · 60–74 · Adequate, with gaps: AI systems can find you but may miss context. Expect inconsistent citations and partial content extraction.
- D · 40–59 · Significant gaps: Multiple dimensions failing. AI systems may skip your domain entirely or hallucinate basic facts about you.
- F · 0–39 · Not AI-ready: Either blocked, invisible, or unparseable. Your content is effectively missing from the AI-search era.
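The grade bands listed above translate directly into a threshold lookup. This is a small sketch of that mapping only; `gradeFor` is a hypothetical name, and the service may compute grades differently internally.

```javascript
// Maps a 0-100 score to a letter grade using the bands listed above.
function gradeFor(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  if (score >= 90) return 'A';
  if (score >= 75) return 'B';
  if (score >= 60) return 'C';
  if (score >= 40) return 'D';
  return 'F';
}
```

A helper like this is mostly useful when you store raw scores over time and want to derive the letter grade yourself.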
The 21 AI bots we track
AI user-agents are split into three classes because their access policies usually differ. You may want to allow search bots while restricting training bots, or vice versa.
Training: GPTBot · ClaudeBot · anthropic-ai · Google-Extended · CCBot · Meta-ExternalAgent · Bytespider · Applebot-Extended · cohere-ai · Diffbot · Omgilibot · FacebookBot · img2dataset
Search: OAI-SearchBot · ChatGPT-User · Claude-SearchBot · Claude-User · PerplexityBot · Perplexity-User · DuckAssistBot · YouBot
Traditional: Googlebot · Bingbot. Tracked separately; these don't count toward the AI Crawler Access score, but appear in your report so you can see classic SEO bots alongside AI ones.
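Because access policies usually differ by class, it helps to see how a given robots.txt treats each bot. The sketch below is a simplified reading of robots.txt that only answers "may this bot fetch the site root?". It is an assumption-laden illustration, not the scanner's implementation: it ignores wildcard path patterns such as `/*`, the function names are hypothetical, and `BOT_CLASSES` lists only a few bots from the catalogue above.

```javascript
// Parses robots.txt into groups of { agents: [...], rules: [{ type, path }] }.
function parseRobots(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((field === 'allow' || field === 'disallow') && current) {
      current.rules.push({ type: field, path: value });
      lastWasAgent = false;
    }
  }
  return groups;
}

// Returns 'allowed', 'blocked', or 'unspecified' for the site root ("/").
function rootAccess(groups, botName) {
  const name = botName.toLowerCase();
  const specific = groups.filter((g) => g.agents.includes(name));
  const wildcard = groups.filter((g) => g.agents.includes('*'));
  const matched = specific.length > 0 ? specific : wildcard;
  if (matched.length === 0) return 'unspecified';
  const rules = matched.flatMap((g) => g.rules);
  // Simplification: only the exact pattern "/" is treated as matching the root.
  const allowsRoot = rules.some((r) => r.type === 'allow' && r.path === '/');
  const blocksRoot = rules.some((r) => r.type === 'disallow' && r.path === '/');
  return blocksRoot && !allowsRoot ? 'blocked' : 'allowed';
}

// Illustrative subset of the catalogue above.
const BOT_CLASSES = {
  training: ['GPTBot', 'ClaudeBot', 'Google-Extended', 'CCBot'],
  search: ['OAI-SearchBot', 'PerplexityBot', 'ChatGPT-User'],
};

// Builds a per-class report of root access for each listed bot.
function accessByClass(robotsTxt, classes = BOT_CLASSES) {
  const groups = parseRobots(robotsTxt);
  const report = {};
  for (const [cls, bots] of Object.entries(classes)) {
    report[cls] = Object.fromEntries(bots.map((b) => [b, rootAccess(groups, b)]));
  }
  return report;
}
```

The 'unspecified' result marks bots that no group addresses at all, which is the situation the scoring treats as ambiguous intent.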
How to raise your score, in order of effort
Most domains can move two letter grades in an afternoon. Ordered roughly from lowest to highest effort:
- Add /llms.txt (~15 min · up to +20 pts): A flat markdown file at your root with an H1, a one-line description, and a list of links to your most important pages. Spec at llmstxt.org.
- Add baseline JSON-LD (~30 min · up to +15 pts): Organization and WebSite schemas in your <head>. Include name, url, logo, sameAs (social profiles), and SearchAction. Closes most identity gaps in one shot.
- Explicit AI bot rules in robots.txt (~10 min · up to +5 pts): Add User-agent blocks for GPTBot, ClaudeBot, PerplexityBot, etc. with explicit Allow: / directives. The absence of rules counts against your score, because silence leaves your intent ambiguous.
- Generate a sitemap.xml (~1 hr · up to +5 pts): Reference it from robots.txt with a Sitemap: directive. Most static-site generators and CMSes produce this automatically; verify yours is actually being served at /sitemap.xml.
- Move to server-side or static rendering (1+ sprint · up to +20 pts): If your Content Accessibility score is below 15, your SPA is the bottleneck. Migrate to Next.js SSR, Remix, Astro, or a static export. Hydration after first paint is fine; an empty initial HTML response is not.
- Add /llms-full.txt for high-context citation (~half day · marginal): An expanded version with inline content for large context windows. Useful if you want AI systems to cite your full documentation without re-crawling each page.
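For the /llms.txt step in the list above, a quick local check can confirm the file has the expected shape before you deploy it. The sketch below parses a markdown string for an H1, H2 section names, and links. The field names `has_h1`, `sections`, and `links` mirror the API's response schema; `title`, the function name `parseLlmsTxt`, and the sample content are hypothetical, and the checks are far looser than a full validation against the proposed spec.

```javascript
// Extracts the basic structure of an llms.txt file from its markdown source.
function parseLlmsTxt(markdown) {
  const lines = markdown.split(/\r?\n/);
  // An H1 is a line starting with a single "#" followed by whitespace and text.
  const h1 = lines.find((l) => /^#\s+\S/.test(l));
  const sections = lines
    .filter((l) => /^##\s+\S/.test(l))
    .map((l) => l.replace(/^##\s+/, '').trim());
  const links = [];
  const linkPattern = /\[([^\]]+)\]\(([^)\s]+)\)/g;
  for (const line of lines) {
    for (const m of line.matchAll(linkPattern)) {
      links.push({ title: m[1], url: m[2] });
    }
  }
  return {
    has_h1: Boolean(h1),
    title: h1 ? h1.replace(/^#\s+/, '').trim() : null, // hypothetical extra field
    sections,
    links,
  };
}

// Sample file content, invented for illustration.
const sample = [
  '# Example Co',
  '',
  '> Example Co builds widgets for developers.',
  '',
  '## Docs',
  '',
  '- [Quickstart](https://example.com/docs/quickstart): Install and first run',
  '- [API reference](https://example.com/docs/api)',
].join('\n');

console.log(parseLlmsTxt(sample));
```

A file that yields `has_h1: false` or an empty `links` array is unlikely to earn much of the llms.txt score.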
Use this programmatically
The same scan that powers this page is available as a JSON API. No auth required for ad-hoc checks; rate-limited per IP. For CI/CD or monitoring use cases, request an API key.
Request with curl:

    curl 'https://api.domainscan.in/v1/ai-ready?domain=cloudflare.com'

Request with JavaScript:

    const res = await fetch(
      'https://api.domainscan.in/v1/ai-ready?domain=cloudflare.com'
    );
    const report = await res.json();
    console.log(report.score, report.grade);
    // 85 "A"

Response schema:

    {
      "domain": "string",
      "score": "number (0–100)",
      "grade": "A | B | C | D | F",
      "strategy": "balanced | citation-first",
      "checks": {
        "robots_txt": {
          "exists": "boolean",
          "ai_bots": { "training": {}, "search": {}, "traditional": {} },
          "has_sitemap_directive": "boolean",
          "score": "number"
        },
        "llms_txt": {
          "exists": "boolean",
          "parsed": { "has_h1": "boolean", "sections": [], "links": [] },
          "llms_full_exists": "boolean",
          "score": "number"
        },
        "structured_data": {
          "found_schemas": {},
          "json_ld_count": "number",
          "score": "number"
        },
        "content_accessibility": {
          "rendering": "server_rendered | js_required",
          "body_text_length": "number",
          "score": "number"
        },
        "meta_sitemap": {
          "og_tags": {},
          "has_canonical": "boolean",
          "sitemap_exists": "boolean",
          "score": "number"
        }
      },
      "recommendations": [
        {
          "severity": "pass | warning | error",
          "category": "string",
          "title": "string",
          "description": "string",
          "impact": "string"
        }
      ]
    }

How teams use this tool
Six patterns we see most often:
- Run before a marketing site goes live. Catches misconfigured robots.txt and missing schema before your domain accumulates a poor AI-visibility baseline.
- Schedule a scan every quarter as part of your SEO review. Track grade over time. The bot catalogue grows as new crawlers emerge.
- Scan five competitors in your category. Identify gaps where they're winning AI citations, usually llms.txt presence or richer schema coverage.
- Hit the API from your deployment pipeline. Fail the build if a content release pushes your score below a threshold (e.g., grade drops from A to B).
- Investors and acquirers use AI readiness as a leading indicator of content-distribution health. A low score signals compounding visibility debt.
- Re-scan after a CMS migration, framework upgrade, or rendering-strategy change. SPA migrations are the most common cause of overnight score drops.
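The deployment-pipeline pattern above could look something like the following sketch. It uses the endpoint and response fields documented earlier on this page; the function names `checkReport` and `gate`, the default `minScore` of 75, and the choice of which checks to require are all assumptions for illustration. It relies on the global `fetch` available in Node 18 and later.

```javascript
// Pure check, kept separate from the network call so it can be unit-tested.
// Returns a list of human-readable failures; an empty list means the gate passes.
function checkReport(report, { minScore = 75, requiredChecks = [] } = {}) {
  const failures = [];
  if (typeof report.score !== 'number' || report.score < minScore) {
    failures.push(`score ${report.score} is below the minimum of ${minScore}`);
  }
  for (const name of requiredChecks) {
    if (!report.checks?.[name]?.exists) {
      failures.push(`check "${name}" does not report exists=true`);
    }
  }
  return failures;
}

// Fetches a report and sets a non-zero exit code when the gate fails.
async function gate(domain, options) {
  const url =
    'https://api.domainscan.in/v1/ai-ready?domain=' + encodeURIComponent(domain);
  const res = await fetch(url);
  if (!res.ok) throw new Error(`scan failed with HTTP ${res.status}`);
  const failures = checkReport(await res.json(), options);
  failures.forEach((f) => console.error(f));
  process.exitCode = failures.length > 0 ? 1 : 0;
  return failures;
}
```

In a pipeline step you might call `gate('example.com', { minScore: 90, requiredChecks: ['robots_txt', 'llms_txt'] })`. Gating on the numeric score rather than the letter grade keeps the threshold explicit in your own configuration.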
Common questions
- How accurate is the score? The score reflects what AI bots can mechanically observe — robots.txt rules, JSON-LD presence, server-rendered content. It doesn't measure content quality or topical authority. A high score is necessary, not sufficient, for AI citation.
- Why is my React/Vue SPA scored low when my content is excellent? AI training crawlers fetch HTML over a basic HTTP request. They don't run JavaScript. If your content renders client-side, the bot sees an empty <div id="root"></div>. Server-side rendering, static generation, or pre-rendering fixes this — the rest of your stack can stay.
- Is my scan stored? Anonymous scans are cached for 24 hours to reduce load on your domain. They aren't associated with your IP or identity. Signed-in users get persistent scan history saved to their account.
- How often should I re-scan? Quarterly for established sites; after every major content or framework release for active projects. New AI bots emerge frequently — the catalogue we test against grows roughly every quarter.
- Should I block AI training crawlers? That's a content-licensing decision, not a technical one. Blocking GPTBot won't remove you from ChatGPT search (that's OAI-SearchBot) or stop user-initiated fetches (that's ChatGPT-User), and won't affect Google. Decide per-bot, per-class, based on your content strategy.
- What's the difference between training bots and search bots? Training bots (GPTBot, ClaudeBot, Google-Extended) collect content used to train future model versions. Search bots (OAI-SearchBot, PerplexityBot, ChatGPT-User) index pages or fetch them at query time to ground a live answer. Most sites benefit from allowing search bots even when restricting training bots.
- Can I run this in CI? Yes — hit the JSON API and assert on `score` or specific check fields. Fail the build if a regression is detected. Free tier is rate-limited; for high-volume CI use, request an API key.
- What's coming next? A 'citation-first' scoring strategy that weights llms.txt and structured data more heavily, separate scores per language/region, and a diff view so you can see exactly what changed between scans.