
CCBot

Common Crawl · Training

CCBot is the crawler for Common Crawl, a non-profit foundation that maintains a public, free archive of the web. That archive is the raw material used to train a huge number of AI models, both commercial and open-source. Being in Common Crawl means being in the knowledge base of much of today's AI ecosystem.

User-agent
CCBot
CCBot/2.0 (https://commoncrawl.org/faq/)
Does it respect robots.txt?
Yes
Official documentation
https://commoncrawl.org/ccbot

How to allow it in your robots.txt

User-agent: CCBot
Allow: /

How to block it (not recommended)

User-agent: CCBot
Disallow: /
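
To sanity-check either rule set before deploying it, a minimal sketch using Python's standard-library `urllib.robotparser` might look like the following. The helper name `ccbot_allowed` and the `https://example.com/` URL are illustrative assumptions, not part of any Common Crawl tooling, and the standard-library parser may not match CCBot's own rule matching in every edge case.

```python
from urllib.robotparser import RobotFileParser


def ccbot_allowed(robots_txt: str, url: str = "https://example.com/") -> bool:
    """Return True if the given robots.txt text lets CCBot fetch `url`.

    `url` defaults to a placeholder; pass a real page from your own site.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("CCBot", url)


# The two rule sets shown above, as plain strings.
ALLOW_RULES = "User-agent: CCBot\nAllow: /\n"
BLOCK_RULES = "User-agent: CCBot\nDisallow: /\n"

print(ccbot_allowed(ALLOW_RULES))  # True
print(ccbot_allowed(BLOCK_RULES))  # False
```

To check a live file instead, read your own robots.txt into a string and pass it to the same helper.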

Frequently asked questions

Should I block CCBot?

It's not advisable if you're looking for AI visibility. Common Crawl feeds dozens of models at once: blocking CCBot is like erasing yourself from the encyclopedia almost every AI uses to learn.

Does CCBot respect robots.txt?

Yes. A simple Disallow rule for the CCBot user-agent is enough. Common Crawl also publishes its official IP ranges and offers a voluntary opt-out registry, and warns that impostors posing as CCBot exist.
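
Since impostors exist, one way to use the published IP ranges is to test whether a visitor's address falls inside them. The sketch below uses Python's standard-library `ipaddress` module. The ranges in `PLACEHOLDER_RANGES` are reserved documentation networks used purely for illustration, not Common Crawl's real ranges; substitute the list from the official documentation.

```python
import ipaddress
from typing import Iterable


def ip_in_ranges(ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if `ip` belongs to any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to pass.
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


# ASSUMPTION: placeholder documentation ranges, NOT Common Crawl's real ones.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "2001:db8::/32"]

print(ip_in_ranges("192.0.2.10", PLACEHOLDER_RANGES))    # True
print(ip_in_ranges("198.51.100.1", PLACEHOLDER_RANGES))  # False
```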

How do I know if CCBot visits my site?

Search for "CCBot" in your server logs. Because the user-agent string can be spoofed, confirm a visit with a reverse DNS lookup on the client IP: legitimate visits resolve to domains like crawl.commoncrawl.org.
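
The two steps above (find CCBot entries, then check reverse DNS) can be sketched in Python with only the standard library. This is a rough illustration under stated assumptions: the log format is assumed to put the client IP in the first whitespace-separated field (as in the common/combined access log formats), the function names are hypothetical, and the accepted suffix `crawl.commoncrawl.org` is taken from the text above.

```python
import socket
from typing import Callable, Iterable, Iterator


def ccbot_lines(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield log lines that mention the CCBot user-agent (case-insensitive)."""
    for line in log_lines:
        if "ccbot" in line.lower():
            yield line


def client_ip(log_line: str) -> str:
    """ASSUMPTION: the client IP is the first whitespace-separated field."""
    return log_line.split(maxsplit=1)[0]


def reverse_dns(ip: str) -> str:
    """Return the PTR hostname for `ip`, or an empty string if lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return ""


def looks_like_ccbot(
    ip: str,
    resolve: Callable[[str], str] = reverse_dns,
    suffix: str = "crawl.commoncrawl.org",
) -> bool:
    """Return True if the reverse DNS name of `ip` is `suffix` or a subdomain of it."""
    host = resolve(ip).rstrip(".").lower()
    # Require an exact match or a dot boundary so that a name like
    # "crawl.commoncrawl.org.evil.example" is rejected.
    return host == suffix or host.endswith("." + suffix)
```

In practice you would feed the lines of your access log to `ccbot_lines`, extract each address with `client_ip`, and pass it to `looks_like_ccbot`. The `resolve` parameter exists so the check can be exercised without network access; a stricter variant would also resolve the returned hostname forward and confirm it maps back to the same IP.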
