
Is Your Online Store Blocking AI Bots? A Crawlability Checklist

If AI crawlers can't access your product pages, AI search engines can't recommend your products. Here's how to check your robots.txt, fix common blocks, and make your store visible to ChatGPT, Perplexity, and Google AI.

You can have the most complete structured data, the richest product descriptions, and perfect taxonomy — but if AI crawlers can’t access your product pages, none of it matters. Your store is invisible to AI search.

This is more common than you’d think. Many e-commerce stores unknowingly block AI bots through misconfigured robots.txt files, CDN settings, or hosting defaults. The result: zero AI referral traffic, while competitors who allow access get recommended by ChatGPT, Perplexity, and Google AI Overviews.

The AI crawlers you need to know

There are two distinct categories of AI bots, and understanding the difference matters for your robots.txt strategy:

AI search and shopping bots (you likely want these)

These bots crawl your pages in real time to answer user queries — including product shopping queries. Blocking them means your products won’t appear in AI shopping recommendations.

  • OAI-SearchBot (OpenAI): powers ChatGPT Shopping and ChatGPT Search. Crawls pages to answer queries in real time.
  • PerplexityBot (Perplexity): indexes content for Perplexity’s AI search and shopping features. The fastest-growing AI crawler, with a 157,490% increase in requests in 2025.
  • Google-Extended (Google): used for Gemini and Google AI Overviews. Separate from Googlebot, which handles traditional search.
  • Applebot-Extended (Apple): powers Apple Intelligence features, Siri, and Spotlight suggestions.
  • ClaudeBot (Anthropic): powers Claude’s web search and answer features.
  • Bytespider (ByteDance): used for TikTok search and content recommendations.

In each case, the robots.txt User-agent token is the bot name shown.

AI training bots (your choice)

These bots scrape content to train foundation models. Blocking them doesn’t affect your visibility in AI search results — it only prevents your content from being used in future model training.

  • GPTBot (OpenAI): scrapes content for training OpenAI models. Separate from OAI-SearchBot.
  • CCBot (Common Crawl): an open web crawl used by many AI companies for training data.
  • FacebookBot (Meta): crawls for Meta AI training data.
  • Diffbot (Diffbot): web scraping for structured data extraction and AI training.

The key distinction: you can block training bots without affecting your AI search visibility. Many merchants want to allow search bots (so their products get recommended) while blocking training bots (so their content isn’t used to train models). This is a valid and increasingly common configuration.
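The simplest illustration of this split uses OpenAI’s two crawlers. A robots.txt fragment like the following (merged with whatever rules your platform already generates) lets ChatGPT Shopping recommend your products while opting out of model training:

```
# Allow AI search — ChatGPT Shopping can crawl and recommend products
User-agent: OAI-SearchBot
Allow: /

# Block AI training — content is not used to train OpenAI models
User-agent: GPTBot
Disallow: /
```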

How to check your current setup

Step 1: View your robots.txt

Navigate to yourstore.com/robots.txt in your browser. This file tells all crawlers which parts of your site they can and can’t access.

Look for any Disallow rules under user agents that match the AI bots listed above. For example:

# This blocks ChatGPT Shopping from accessing your products
User-agent: OAI-SearchBot
Disallow: /

# This blocks Perplexity from indexing your store
User-agent: PerplexityBot
Disallow: /

If you see rules like these, your store is invisible to those AI platforms.

Also check for blanket blocks that affect all bots:

# This blocks EVERYTHING — including AI bots
User-agent: *
Disallow: /
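If you’d rather check rules programmatically than by eye, Python’s standard-library robots.txt parser applies the same matching logic crawlers do. The rules below are a made-up sample; paste in your store’s actual robots.txt text and test the user agents you care about:

```python
from urllib.robotparser import RobotFileParser

# Sample rules — replace with the contents of yourstore.com/robots.txt.
robots_txt = """\
User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://yourstore.com/products/example-product"
# OAI-SearchBot matches its own Disallow rule; PerplexityBot
# falls through to the User-agent: * group.
print(parser.can_fetch("OAI-SearchBot", url))
print(parser.can_fetch("PerplexityBot", url))
```

With this sample, the first check prints False (ChatGPT Shopping is blocked) and the second prints True.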

Step 2: Check your CDN/hosting settings

In July 2025, Cloudflare began blocking AI crawlers by default for new domains on its network. If your store uses Cloudflare (directly, or through a hosting provider that uses it), AI crawlers may be blocked at the network level — even if your robots.txt allows them.

Check your Cloudflare dashboard under Security > Bots. Look for the “AI Bots” toggle. If it’s set to block, AI crawlers are being stopped before they ever reach your robots.txt.

Other CDN providers and hosting platforms may have similar default-block settings. Check your hosting provider’s bot management or security documentation.

Step 3: Test with Google Search Console

Google Search Console’s URL Inspection tool shows whether Googlebot can access and render your pages. While this doesn’t directly test AI bots, if Googlebot can’t crawl a page, it’s very likely AI bots can’t either.

Check for pages with “Crawled – currently not indexed” or “Discovered – currently not indexed” status, as these may indicate crawlability issues.

Shopify-specific crawlability

Shopify handles robots.txt differently from most platforms, and there are several Shopify-specific considerations.

Shopify’s default robots.txt

By default, Shopify allows all crawlers. The default robots.txt blocks access to /admin, /cart, /checkout, /search, /policies/, and filtered collection URLs (/collections/*+*) — which is correct. Product pages, collection pages, and your homepage are accessible.

However, there are situations where this changes:

Custom robots.txt overrides

Shopify allows merchants to customize their robots.txt through the robots.txt.liquid theme file. If you (or a developer, or a Shopify app) have edited this file, it may contain custom rules that block AI bots.

To check, go to your Shopify admin: Online Store > Themes > Edit Code and search for robots.txt.liquid. If this file exists, review its contents for any AI bot blocks.

Password-protected stores

If your store has password protection enabled (even unintentionally — common on development stores), all bots are blocked. Ensure password protection is disabled for your live store.

Shopify apps and meta tags

Some Shopify SEO apps add <meta name="robots" content="noindex"> tags to certain pages. This tells all search engines and AI bots not to index those pages. Check your product page source code for any noindex meta tags.
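A few lines of Python can flag these directives across saved page sources. The HTML below is a made-up sample of a product page with an app-injected noindex tag:

```python
import re

# Made-up sample of a product page's raw source.
page_source = """
<html><head>
  <title>Example Product</title>
  <meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>
"""

# Find robots meta tags and check whether any carries a noindex directive.
robots_tags = re.findall(
    r'<meta[^>]*name=["\']robots["\'][^>]*>', page_source, re.IGNORECASE
)
has_noindex = any("noindex" in tag.lower() for tag in robots_tags)
print(has_noindex)
```

This prints True for the sample above, meaning the page is telling all crawlers, AI bots included, to stay away.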

The OAI-SearchBot consideration

ChatGPT Shopping uses OAI-SearchBot to discover and index products. If you’ve customized your robots.txt to block GPTBot (the training bot) but haven’t explicitly allowed OAI-SearchBot, double-check that OAI-SearchBot isn’t caught by your blocking rules. They’re separate bots with separate user agents — you can block one while allowing the other.

The JavaScript rendering problem

This one catches many e-commerce stores. Some product page content is only rendered by JavaScript after the initial page load. Traditional search engines like Google can render JavaScript, but many AI crawlers cannot — or choose not to for efficiency.

For crawlers that don’t execute JavaScript, structured data must be present in the HTML returned by the web server; anything generated by JavaScript after page load is simply never seen.

What this means for your store:

  • Product data injected by client-side JavaScript (React apps, single-page applications, dynamically loaded content) may be invisible to AI crawlers
  • Schema markup added by JavaScript-based Shopify apps may not be seen by crawlers that don’t render JavaScript
  • Lazy-loaded content that requires scrolling or user interaction to appear won’t be crawled

To test: view your product page source code (right-click > View Page Source, not Inspect Element). If your product description, price, or structured data isn’t visible in the raw HTML source, AI crawlers likely can’t see it either.

For Shopify stores, prefer Liquid-rendered content over JavaScript-injected content for critical product data. Theme app embeds that render server-side are preferable to apps that inject data via client-side scripts.
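The View Page Source check above can also be run programmatically: look for Product JSON-LD in the raw server response. This sketch parses a made-up HTML string; in practice you’d feed it the page source you fetched, and a real checker would also guard against malformed JSON:

```python
import json
import re

# Made-up raw HTML as returned by the server (no JavaScript executed).
raw_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product",
 "name": "Example Widget", "offers": {"@type": "Offer", "price": "19.99"}}
</script>
</head><body>...</body></html>
"""

# Extract JSON-LD blocks and check for a Product entity.
blocks = re.findall(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    raw_html, re.DOTALL | re.IGNORECASE,
)
has_product_schema = any(
    json.loads(block).get("@type") == "Product" for block in blocks
)
print(has_product_schema)
```

If this prints False for your product pages’ raw HTML but the schema shows up in the rendered DOM, your structured data is JavaScript-injected and likely invisible to AI crawlers.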

Sitemap hygiene

Your sitemap tells crawlers what pages exist and when they were last updated. A stale or incomplete sitemap means AI crawlers may miss products.

Check your sitemap

Visit yourstore.com/sitemap.xml. Verify:

  • All products are included. Shopify auto-generates sitemaps, but products marked as “hidden” from online store channels won’t appear.
  • URLs are current. If you’ve changed URL handles, old URLs should redirect properly.
  • Last modified dates are accurate. Stale dates may cause crawlers to skip pages they think haven’t changed.
  • No excessive pages. Sitemaps bloated with tag pages, duplicate filtered URLs, or paginated collection pages can dilute crawl budget.

Shopify sitemap limits

Shopify auto-generates sitemaps with up to 50,000 URLs per sitemap file, with an index file linking to sub-sitemaps for products, collections, blogs, and pages. For most stores this works well, but verify that your product sitemap includes all active products.
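A short script can cross-check the product sitemap against your catalog count. The XML below is a minimal made-up sample; a real Shopify store serves a sitemap index at /sitemap.xml that links to product sub-sitemaps:

```python
import xml.etree.ElementTree as ET

# Minimal made-up product sub-sitemap.
sitemap_xml = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourstore.com/products/example-widget</loc>
    <lastmod>2025-06-01</lastmod>
  </url>
  <url>
    <loc>https://yourstore.com/products/example-gadget</loc>
  </url>
</urlset>"""

ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(sitemap_xml)
urls = [u.findtext("sm:loc", namespaces=ns) for u in root.findall("sm:url", ns)]
product_urls = [u for u in urls if "/products/" in u]

# Compare this count against the number of active products in your catalog.
print(len(product_urls))
```

If the count here is lower than the number of active products in your Shopify admin, some products are hidden from the online store channel or otherwise missing from the sitemap.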

Page speed and crawl efficiency

AI crawlers have limited crawl budgets — they won’t spend unlimited time waiting for slow pages. If your product pages take too long to load, crawlers may abandon them or deprioritize your site.

Key performance factors:

  • Large, uncompressed images — the most common culprit on e-commerce sites. Use WebP format and proper image sizing.
  • Excessive third-party scripts — every Shopify app that loads JavaScript on your storefront adds to page weight. Audit your apps and remove unused ones.
  • Render-blocking resources — CSS and JavaScript that prevent the page content from loading quickly.

For Shopify stores, the biggest quick wins are usually image optimization and removing unused apps.

The crawlability checklist

Run through this checklist to ensure AI search engines can access your store:

robots.txt

  • Visit yourstore.com/robots.txt and review all rules
  • Confirm OAI-SearchBot is not blocked (ChatGPT Shopping)
  • Confirm PerplexityBot is not blocked (Perplexity Shopping)
  • Confirm Google-Extended is not blocked (Google AI Overviews)
  • Decide whether to allow or block training bots (GPTBot, CCBot)
  • On Shopify: check robots.txt.liquid for custom overrides

CDN and hosting

  • If using Cloudflare: check AI bot blocking settings in Security > Bots
  • If using another CDN: verify AI bot access isn’t blocked at the network level
  • Confirm your store is not password-protected

Page-level signals

  • Check product pages for noindex meta tags
  • Verify structured data appears in raw HTML source (not JavaScript-rendered only)
  • Confirm critical product data (description, price, availability) is in server-rendered HTML
  • Check that no Shopify app is adding bot-blocking headers or meta tags

Sitemap

  • Visit yourstore.com/sitemap.xml and confirm it loads
  • Verify all active products appear in the product sitemap
  • Check that URLs are current and not pointing to old handles

Performance

  • Test product page load time (aim for under 3 seconds)
  • Compress and properly size product images
  • Remove unused Shopify apps that inject storefront scripts

The tradeoff: content protection vs AI visibility

There’s a real tension here. Many merchants (and especially content publishers) are concerned about AI companies using their content to train models without compensation. That’s a legitimate concern, and blocking training bots like GPTBot and CCBot is a reasonable response.

But there’s a critical distinction between training bots and search bots. You can protect your content from being used for model training while still allowing AI search engines to recommend your products. The two are separate bot categories with separate user agents.

The recommended approach for most e-commerce stores:

  1. Allow AI search bots: OAI-SearchBot, PerplexityBot, Google-Extended, Applebot-Extended, ClaudeBot
  2. Decide on training bots based on your values — blocking GPTBot and CCBot won’t affect your AI search visibility
  3. Monitor and adjust — as the AI landscape evolves, review your bot policies quarterly
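In robots.txt terms, that recommended policy looks something like the following. Merge it with the rules your platform already generates (on Shopify, via robots.txt.liquid) rather than replacing them:

```
# Allow AI search and shopping bots
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: ClaudeBot
Allow: /

# Optional: block AI training bots without affecting search visibility
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```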

What to do next

Start by checking your robots.txt right now — it takes 30 seconds. Navigate to yourstore.com/robots.txt and scan for AI bot blocks. If you find any, decide whether they’re intentional or accidental.

If you’re on Shopify, StoreBeam checks your store’s crawlability as part of its AI readiness scan — including robots.txt configuration, theme app embed status, and structured data rendering. It flags issues that could be making your products invisible to AI search engines, so you can fix them before your competitors do.

Crawlability is the foundation layer. If AI bots can’t reach your pages, nothing else you optimize matters. Fix this first, then focus on structured data, product descriptions, and taxonomy.