Is Your Online Store Blocking AI Bots? A Crawlability Checklist
If AI crawlers can't access your product pages, AI search engines can't recommend your products. Here's how to check your robots.txt, fix common blocks, and make your store visible to ChatGPT, Perplexity, and Google AI.
You can have the most complete structured data, the richest product descriptions, and perfect taxonomy — but if AI crawlers can’t access your product pages, none of it matters. Your store is invisible to AI search.
This is more common than you’d think. Many e-commerce stores unknowingly block AI bots through misconfigured robots.txt files, CDN settings, or hosting defaults. The result: zero AI referral traffic, while competitors who allow access get recommended by ChatGPT, Perplexity, and Google AI Overviews.
The AI crawlers you need to know
There are two distinct categories of AI bots, and understanding the difference matters for your robots.txt strategy:
AI search and shopping bots (you likely want these)
These bots crawl your pages in real time to answer user queries — including product shopping queries. Blocking them means your products won’t appear in AI shopping recommendations.
| Bot | Operator | Purpose | User Agent |
|---|---|---|---|
| OAI-SearchBot | OpenAI | Powers ChatGPT Shopping and ChatGPT Search. Crawls pages to answer queries in real time. | OAI-SearchBot |
| PerplexityBot | Perplexity | Indexes content for Perplexity’s AI search and shopping features. Fastest-growing AI crawler — 157,490% increase in requests in 2025. | PerplexityBot |
| Google-Extended | Google | Used for Gemini and Google AI Overviews. Separate from Googlebot (which handles traditional search). | Google-Extended |
| Applebot-Extended | Apple | Powers Apple Intelligence features, Siri, and Spotlight suggestions. | Applebot-Extended |
| ClaudeBot | Anthropic | Powers Claude’s web search and answer features. | ClaudeBot |
| Bytespider | ByteDance | Used for TikTok search and content recommendations. | Bytespider |
AI training bots (your choice)
These bots scrape content to train foundational models. Blocking them doesn’t affect your visibility in AI search results — it only prevents your content from being used in future model training.
| Bot | Operator | Purpose | User Agent |
|---|---|---|---|
| GPTBot | OpenAI | Scrapes content for training OpenAI models. Separate from OAI-SearchBot. | GPTBot |
| CCBot | Common Crawl | Open web crawl used by many AI companies for training data. | CCBot |
| FacebookBot | Meta | Crawls for Meta AI training data. | FacebookBot |
| Diffbot | Diffbot | Web scraping for structured data extraction and AI training. | Diffbot |
The key distinction: you can block training bots without affecting your AI search visibility. Many merchants want to allow search bots (so their products get recommended) while blocking training bots (so their content isn’t used to train models). This is a valid and increasingly common configuration.
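For most stores, that split looks like the sketch below. The user agents are the real ones from the tables above; the rules themselves are illustrative and should be adapted to your site:

```
# Allow AI search and shopping bots
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# Block AI training bots
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

Crawlers obey the most specific user-agent group that matches them, so these explicit groups take precedence over any broader `User-agent: *` rules elsewhere in the file.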
How to check your current setup
Step 1: View your robots.txt
Navigate to yourstore.com/robots.txt in your browser. This file tells all crawlers which parts of your site they can and can’t access.
Look for any Disallow rules under user agents that match the AI bots listed above. For example:
```
# This blocks ChatGPT Shopping from accessing your products
User-agent: OAI-SearchBot
Disallow: /

# This blocks Perplexity from indexing your store
User-agent: PerplexityBot
Disallow: /
```
If you see rules like these, your store is invisible to those AI platforms.
Also check for blanket blocks that affect all bots:
```
# This blocks EVERYTHING — including AI bots
User-agent: *
Disallow: /
```
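You can also test these rules programmatically. Python's standard library ships a robots.txt parser; this sketch parses rules like the blocking examples above and reports which AI search bots can reach a hypothetical product URL (swap in your store's live robots.txt):

```python
from urllib.robotparser import RobotFileParser

# Rules mirroring the blocking examples above
robots_txt = """\
User-agent: OAI-SearchBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

AI_SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot", "Google-Extended"]

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for bot in AI_SEARCH_BOTS:
    # A bot with no matching group falls back to the default (allowed)
    allowed = parser.can_fetch(bot, "/products/example-product")
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

With the rules above, `OAI-SearchBot` and `PerplexityBot` come back blocked while `Google-Extended`, which has no matching group, is allowed.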
Step 2: Check your CDN/hosting settings
In July 2025, Cloudflare began blocking AI crawlers by default for new domains on its network. If your store uses Cloudflare (directly, or through a hosting provider that uses it), AI crawlers may be blocked at the network level — even if your robots.txt allows them.
Check your Cloudflare dashboard under Security > Bots. Look for the “AI Bots” toggle. If it’s set to block, AI crawlers are being stopped before they ever reach your robots.txt.
Other CDN providers and hosting platforms may have similar default-block settings. Check your hosting provider’s bot management or security documentation.
Step 3: Test with Google Search Console
Google Search Console’s URL Inspection tool shows whether Googlebot can access and render your pages. While this doesn’t directly test AI bots, if Googlebot can’t crawl a page, it’s very likely AI bots can’t either.
Check for pages with “Crawled - currently not indexed” or “Discovered - not indexed” status, as these may indicate crawlability issues.
Shopify-specific crawlability
Shopify handles robots.txt differently from most platforms, and there are several Shopify-specific considerations.
Shopify’s default robots.txt
By default, Shopify allows all crawlers. The default robots.txt blocks access to `/admin`, `/cart`, `/checkout`, `/search`, `/policies/`, and filtered collection URLs (`/collections/*+*`) — which is correct. Product pages, collection pages, and your homepage are accessible.
However, there are situations where this changes:
Custom robots.txt overrides
Shopify allows merchants to customize their robots.txt through the robots.txt.liquid theme file. If you (or a developer, or a Shopify app) have edited this file, it may contain custom rules that block AI bots.
To check, go to your Shopify admin: Online Store > Themes > Edit Code and search for robots.txt.liquid. If this file exists, review its contents for any AI bot blocks.
Password-protected stores
If your store has password protection enabled (even unintentionally — common on development stores), all bots are blocked. Ensure password protection is disabled for your live store.
Shopify apps and meta tags
Some Shopify SEO apps add `<meta name="robots" content="noindex">` tags to certain pages. This tells all search engines and AI bots not to index those pages. Check your product page source code for any `noindex` meta tags.
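A quick way to audit this across pages is to scan each page's HTML for robots meta directives. Here is a minimal sketch using Python's standard-library HTML parser; the sample page source is hypothetical, so feed it your real product page HTML:

```python
from html.parser import HTMLParser

class RobotsMetaChecker(HTMLParser):
    """Collects the content of any <meta name="robots"> tags in a page."""
    def __init__(self):
        super().__init__()
        self.robots_directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            self.robots_directives.append((attrs.get("content") or "").lower())

# Hypothetical page source — in practice, fetch your product page's raw HTML
html = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
checker = RobotsMetaChecker()
checker.feed(html)

blocked = any("noindex" in d for d in checker.robots_directives)
print("noindex found!" if blocked else "page is indexable")
```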
The OAI-SearchBot consideration
ChatGPT Shopping uses OAI-SearchBot to discover and index products. If you’ve customized your robots.txt to block GPTBot (the training bot) but haven’t explicitly allowed OAI-SearchBot, double-check that OAI-SearchBot isn’t caught by your blocking rules. They’re separate bots with separate user agents — you can block one while allowing the other.
The JavaScript rendering problem
This one catches many e-commerce stores. Some product page content is only rendered by JavaScript after the initial page load. Traditional search engines like Google can render JavaScript, but many AI crawlers cannot — or choose not to for efficiency.
Google can render JavaScript-generated structured data, but its guidance recommends including structured data in the server-rendered HTML — and crawlers that don’t execute JavaScript will never see markup injected after page load.
What this means for your store:
- Product data injected by client-side JavaScript (React apps, single-page applications, dynamically loaded content) may be invisible to AI crawlers
- Schema markup added by JavaScript-based Shopify apps may not be seen by crawlers that don’t render JavaScript
- Lazy-loaded content that requires scrolling or user interaction to appear won’t be crawled
To test: view your product page source code (right-click > View Page Source, not Inspect Element). If your product description, price, or structured data isn’t visible in the raw HTML source, AI crawlers likely can’t see it either.
For Shopify stores, prefer Liquid-rendered content over JavaScript-injected content for critical product data. Theme app embeds that render server-side are preferable to apps that inject data via client-side scripts.
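To make the raw-HTML test concrete, here is a small sketch that scans server-rendered source for JSON-LD Product markup. If it finds none in the raw HTML, crawlers that skip JavaScript rendering won't see your structured data either (the page snippet below is hypothetical):

```python
import json
import re

def find_product_jsonld(raw_html: str):
    """Return any JSON-LD blocks in the raw HTML that describe a Product."""
    pattern = re.compile(
        r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
        re.DOTALL | re.IGNORECASE,
    )
    products = []
    for block in pattern.findall(raw_html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is invisible to crawlers too
        items = data if isinstance(data, list) else [data]
        products += [i for i in items
                     if isinstance(i, dict) and i.get("@type") == "Product"]
    return products

# Hypothetical server-rendered product page snippet
raw_html = """
<script type="application/ld+json">
{"@type": "Product", "name": "Example Widget", "offers": {"price": "19.99"}}
</script>
"""
print(find_product_jsonld(raw_html))
```

Run it against the raw source from View Page Source, not the rendered DOM — the whole point is to check what a non-rendering crawler receives.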
Sitemap hygiene
Your sitemap tells crawlers what pages exist and when they were last updated. A stale or incomplete sitemap means AI crawlers may miss products.
Check your sitemap
Visit yourstore.com/sitemap.xml. Verify:
- All products are included. Shopify auto-generates sitemaps, but products marked as “hidden” from online store channels won’t appear.
- URLs are current. If you’ve changed URL handles, old URLs should redirect properly.
- Last modified dates are accurate. Stale dates may cause crawlers to skip pages they think haven’t changed.
- No excessive pages. Sitemaps bloated with tag pages, duplicate filtered URLs, or paginated collection pages can dilute crawl budget.
Shopify sitemap limits
Shopify auto-generates sitemaps with up to 50,000 URLs per sitemap file, with an index file linking to sub-sitemaps for products, collections, blogs, and pages. For most stores this works well, but verify that your product sitemap includes all active products.
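Sitemap checks are easy to script as well. This sketch parses a sitemap urlset with Python's standard-library XML parser and reports URL counts and lastmod dates; the sitemap fragment is hypothetical:

```python
import xml.etree.ElementTree as ET

# Standard sitemap protocol namespace
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def summarize_sitemap(xml_text: str):
    """Count URLs and collect lastmod dates from a sitemap urlset."""
    root = ET.fromstring(xml_text)
    urls = root.findall("sm:url", NS)
    lastmods = [u.findtext("sm:lastmod", default="", namespaces=NS) for u in urls]
    return len(urls), lastmods

# Hypothetical product sitemap fragment
sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourstore.com/products/widget</loc><lastmod>2025-11-01</lastmod></url>
  <url><loc>https://yourstore.com/products/gadget</loc><lastmod>2025-10-15</lastmod></url>
</urlset>"""

count, lastmods = summarize_sitemap(sitemap)
print(f"{count} URLs, lastmod dates: {lastmods}")
```

Comparing the URL count against your active product count is a quick way to spot products missing from the sitemap.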
Page speed and crawl efficiency
AI crawlers have limited crawl budgets — they won’t spend unlimited time waiting for slow pages. If your product pages take too long to load, crawlers may abandon them or deprioritize your site.
Key performance factors:
- Large, uncompressed images — the most common culprit on e-commerce sites. Use WebP format and proper image sizing.
- Excessive third-party scripts — every Shopify app that loads JavaScript on your storefront adds to page weight. Audit your apps and remove unused ones.
- Render-blocking resources — CSS and JavaScript that prevents the page content from loading quickly.
For Shopify stores, the biggest quick wins are usually image optimization and removing unused apps.
The crawlability checklist
Run through this checklist to ensure AI search engines can access your store:
robots.txt
- Visit `yourstore.com/robots.txt` and review all rules
- Confirm `OAI-SearchBot` is not blocked (ChatGPT Shopping)
- Confirm `PerplexityBot` is not blocked (Perplexity Shopping)
- Confirm `Google-Extended` is not blocked (Google AI Overviews)
- Decide whether to allow or block training bots (`GPTBot`, `CCBot`)
- On Shopify: check `robots.txt.liquid` for custom overrides
CDN and hosting
- If using Cloudflare: check AI bot blocking settings in Security > Bots
- If using another CDN: verify AI bot access isn’t blocked at the network level
- Confirm your store is not password-protected
Page-level signals
- Check product pages for `noindex` meta tags
- Verify structured data appears in raw HTML source (not JavaScript-rendered only)
- Confirm critical product data (description, price, availability) is in server-rendered HTML
- Check that no Shopify app is adding bot-blocking headers or meta tags
Sitemap
- Visit `yourstore.com/sitemap.xml` and confirm it loads
- Verify all active products appear in the product sitemap
- Check that URLs are current and not pointing to old handles
Performance
- Test product page load time (aim for under 3 seconds)
- Compress and properly size product images
- Remove unused Shopify apps that inject storefront scripts
The tradeoff: content protection vs AI visibility
There’s a real tension here. Many merchants (and especially content publishers) are concerned about AI companies using their content to train models without compensation. That’s a legitimate concern, and blocking training bots like GPTBot and CCBot is a reasonable response.
But there’s a critical distinction between training bots and search bots. You can protect your content from being used for model training while still allowing AI search engines to recommend your products. The two are separate bot categories with separate user agents.
The recommended approach for most e-commerce stores:
- Allow AI search bots — `OAI-SearchBot`, `PerplexityBot`, `Google-Extended`, `Applebot-Extended`, `ClaudeBot`
- Decide on training bots based on your values — blocking `GPTBot` and `CCBot` won’t affect your AI search visibility
- Monitor and adjust — as the AI landscape evolves, review your bot policies quarterly
What to do next
Start by checking your robots.txt right now — it takes 30 seconds. Navigate to yourstore.com/robots.txt and scan for AI bot blocks. If you find any, decide whether they’re intentional or accidental.
If you’re on Shopify, StoreBeam checks your store’s crawlability as part of its AI readiness scan — including robots.txt configuration, theme app embed status, and structured data rendering. It flags issues that could be making your products invisible to AI search engines, so you can fix them before your competitors do.
Crawlability is the foundation layer. If AI bots can’t reach your pages, nothing else you optimize matters. Fix this first, then focus on structured data, product descriptions, and taxonomy.