AI Crawler Control and Management
Control how AI crawlers like GPTBot, Claude-Web, Perplexity, and others access your changelog for AI training purposes. Manage this separately from search engine indexing.
Understanding AI Crawlers
AI crawlers are automated bots used by AI companies to collect content from the web for training language models. These include:
- GPTBot - OpenAI's crawler for ChatGPT training
- ChatGPT-User - OpenAI's agent for requests made on behalf of ChatGPT users
- Claude-Web - Anthropic's crawler for Claude training
- anthropic-ai - Anthropic's AI crawler
- PerplexityBot - Perplexity AI's crawler
- Google-Extended - Google's AI training crawler
- cohere-ai - Cohere's AI crawler
- YouBot - You.com's AI crawler
- Applebot-Extended - Apple's AI crawler
- Diffbot - AI data extraction crawler
Why Control AI Crawlers Separately?
You may want different policies for search engines and AI crawlers:
- Search engines help users discover your product updates
- AI crawlers use your content to train AI models that may reference or summarize your content
- You might want search visibility but control over AI training data
- Or you might want to allow AI training while keeping your changelog out of search results
Control Modes
Block Mode
Prevent AI crawlers from accessing your changelog.
- Blocks all major AI crawlers via robots.txt
- Prevents your content from being used for AI training
- Gives you control over how your content is used
- Ideal if you want to opt out of AI training
Default for free users: Free accounts default to Block mode to protect your content.
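For reference, Block-mode robots.txt directives generally take the following form. This is an illustrative excerpt covering a few of the user agents listed above, not the exact file ChangeCrab serves.

```
# Illustrative Block-mode robots.txt excerpt (not the exact file ChangeCrab generates)
User-agent: GPTBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```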
Allow Mode
Normal access - standard behavior.
- Allows AI crawlers to access your content
- No special directives - standard access
- The standard approach used across most of the web
- Your content may be used for AI training
This is the default for paid users and represents standard practice.
Optimize Mode
Explicitly encourage AI crawler access with enhanced features.
- Explicitly allows all major AI crawlers
- Includes sitemap reference in robots.txt
- Structured data (Schema.org) for better AI understanding
- Helps AI systems better understand and reference your content
Premium feature: Optimize mode is available for paid users who want to maximize AI visibility.
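For comparison, an Optimize-style robots.txt typically combines explicit Allow rules with a sitemap reference, along these lines. Again, this is a sketch: the URL is a placeholder and the exact directives ChangeCrab writes may differ.

```
# Illustrative Optimize-mode robots.txt excerpt (placeholder URL)
User-agent: GPTBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://changelog.example.com/sitemap.xml
```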
How It Works
Robots.txt Configuration
ChangeCrab automatically configures your robots.txt file to control AI crawlers:
- Block: Adds Disallow: / directives for all major AI crawlers
- Allow: No restrictions (default allow behavior)
- Optimize: Explicitly allows AI crawlers and includes sitemap reference
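After changing modes, you can confirm which directives are live by fetching your changelog's robots.txt directly. Here is a minimal Python sketch; the URL is a placeholder, so substitute your changelog's actual public address.

```python
# Fetch and print a public changelog's robots.txt to verify the active AI crawler directives.
from urllib.request import urlopen

CHANGELOG_URL = "https://changelog.example.com"  # placeholder; use your changelog's public URL

with urlopen(f"{CHANGELOG_URL}/robots.txt") as response:
    print(response.read().decode("utf-8"))
```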
Structured Data
When Optimize mode is enabled, ChangeCrab adds Schema.org structured data that helps AI systems:
- Better understand your content structure
- Identify your changelog as a collection of updates
- Extract meaningful information about your product
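As an illustration, Schema.org markup describing a changelog as a collection of updates is typically embedded as JSON-LD, along these lines. All names and URLs here are placeholders, and the exact type and properties ChangeCrab emits may differ from this sketch.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ItemList",
  "name": "Example Product Changelog",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "v2.1.0 - New dashboard widgets",
      "url": "https://changelog.example.com/v2-1-0"
    }
  ]
}
</script>
```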
Configuring AI Crawler Control
- Navigate to your changelog Settings
- Go to the Privacy & Visibility section
- Find AI Crawler Access
- Select your preferred mode:
- Block - Prevent AI crawlers
- Allow - Normal access
- Optimize - Enhanced access (Premium)
- Click Save
Important: AI crawler control is separate from search engine indexing. You can block AI crawlers while allowing search engines, or vice versa.
Use Cases
Block AI Crawlers, Allow Search Engines
Best for: Companies that want search visibility but want to control AI training data.
- Search engines can index your changelog
- AI systems cannot use your content for training
- You maintain control over how your content is used
Allow Both
Best for: Companies that want maximum visibility and don't mind AI training.
- Maximum discoverability in search results
- Your content may be referenced by AI assistants
- Good for brand awareness and reach
Block Both
Best for: Internal or sensitive changelogs that need complete privacy.
- No search engine indexing
- No AI training data collection
- Maximum privacy and control
Optimize Both
Best for: Public-facing products that want maximum visibility and AI presence.
- Professional SEO optimization
- Enhanced AI system understanding
- Best chance of being referenced by AI assistants
- Maximum brand visibility
Compliance and Best Practices
Robots.txt Compliance
Reputable AI crawlers (like GPTBot, Claude-Web) respect robots.txt directives. However:
- Compliance is voluntary - some crawlers may ignore directives
- Block mode provides strong protection but isn't 100% guaranteed
- For maximum protection, combine with password protection or private changelogs
When to Use Each Mode
- Block: When you want to opt out of AI training or have sensitive content
- Allow: Standard approach - let AI systems access your content normally
- Optimize: When you want to maximize AI visibility and understanding
FAQ
Why would I want to block AI crawlers?
You might want to block AI crawlers if:
- You want to control how your content is used for AI training
- You have sensitive or proprietary information
- You prefer to opt out of AI training data collection
- You want to maintain exclusive control over your content
Why would I want to allow or optimize AI crawlers?
You might want to allow AI crawlers if:
- You want your product to be referenced by AI assistants
- You see value in AI systems understanding your updates
- You want maximum brand visibility
- You're comfortable with your content being used for AI training
Can I use Optimize mode on a free plan?
Optimize mode is a premium feature available to paid users. Free users can use Block or Allow modes, or upgrade to Premium to access Optimize mode.
Will Block mode prevent all AI crawlers?
Block mode uses robots.txt directives that reputable AI crawlers respect. However, compliance is voluntary, and some crawlers may ignore these directives. For maximum protection, consider making your changelog private or adding password protection.
What's the difference between blocking AI crawlers and search engines?
Search engines help users discover your content through search results. AI crawlers collect content for training AI models. You can control them independently - for example, allowing search engines while blocking AI crawlers, or vice versa.
How do I know if AI crawlers are accessing my changelog?
You can check your server logs or analytics for user agents like "GPTBot", "Claude-Web", "PerplexityBot", etc. However, ChangeCrab's Block mode should prevent most reputable AI crawlers from accessing your content.
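If you want to spot-check this yourself, a small script can scan a standard web server access log for the AI crawler user agents listed earlier. This is a generic sketch, not a ChangeCrab feature: the log path and format are assumptions that depend on your own hosting setup.

```python
# Count hits from known AI crawler user agents in a web server access log.
# The log path is a placeholder; adjust it for your own hosting setup.
from collections import Counter

AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "Claude-Web", "anthropic-ai",
               "PerplexityBot", "Google-Extended", "cohere-ai", "YouBot",
               "Applebot-Extended", "Diffbot"]

hits = Counter()
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for crawler in AI_CRAWLERS:
            if crawler in line:
                hits[crawler] += 1

for crawler, count in hits.most_common():
    print(f"{crawler}: {count}")
```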