cohere-training-data-crawler
This is the training crawler of Cohere, an AI company focused on language models for businesses. It collects public text from the web to train and refine Cohere's models. Cohere doesn't publish official documentation for the bot, so the available reference material comes from reputable specialized directories, which document its user-agent and reported behavior.
- User-agent
- cohere-training-data-crawler
- Does it respect robots.txt?
- Yes, according to third-party directories (not confirmed by Cohere)
- Documentation (third-party)
- https://knownagents.com/agents/cohere-training-data-crawler
How to allow it in your robots.txt
User-agent: cohere-training-data-crawler
Allow: /
How to block it (not recommended)
User-agent: cohere-training-data-crawler
Disallow: /
Frequently asked questions
Should I block cohere-training-data-crawler?
Not if you want Cohere's models, which many companies build into their own products, to know about your content. Blocking the crawler excludes your content from that part of the AI ecosystem and gains you nothing in return.
Does this bot respect robots.txt?
Cohere hasn't confirmed it. The specialized directories that document the bot assume that, like most bots from reputable companies, it follows robots.txt directives addressed to "cohere-training-data-crawler", but Cohere publishes no official documentation that guarantees it.
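You can't test Cohere's actual behavior from your side, but you can check what the allow and block rules on this page mean for a crawler that does follow robots.txt. The minimal sketch below feeds both variants to Python's standard `urllib.robotparser` module; the page URL and the `is_allowed` helper are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt variants from this page, as lists of lines.
ALLOW_RULES = [
    "User-agent: cohere-training-data-crawler",
    "Allow: /",
]
BLOCK_RULES = [
    "User-agent: cohere-training-data-crawler",
    "Disallow: /",
]


def is_allowed(rules, url, agent="cohere-training-data-crawler"):
    """Hypothetical helper: may a robots.txt-compliant crawler named `agent` fetch `url`?"""
    parser = RobotFileParser()
    parser.parse(rules)  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"  # hypothetical page on your site
    print("allow rules, Cohere:", is_allowed(ALLOW_RULES, url))
    print("block rules, Cohere:", is_allowed(BLOCK_RULES, url))
    # Rules grouped under a specific user-agent do not affect other bots.
    print("block rules, Googlebot:", is_allowed(BLOCK_RULES, url, agent="Googlebot"))
```

Run as a script, this prints True, False, True: the block rule stops only the bot it names. Python's parser applies rules in file order rather than the longest-match rule of RFC 9309, which makes no difference for these single-rule files.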
How do I know if it visits my site?
Search your server logs for "cohere-training-data-crawler". The user-agent string sometimes also includes a contact email for Cohere's crawling team.
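The log search can be scripted. The sketch below assumes an access log in the common Apache/Nginx "combined" format, where the client IP is the first field. It matches the crawler's name case-insensitively anywhere in a line. The sample lines, IP addresses and helper names are invented for illustration, and the real user-agent string may contain more than the bare name.

```python
import re
from collections import Counter

# Case-insensitive match on the crawler name documented on this page.
BOT_TOKEN = re.compile(r"cohere-training-data-crawler", re.IGNORECASE)

# Invented sample lines in combined log format (documentation-range IPs).
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Mar/2025:12:00:00 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "cohere-training-data-crawler"',
    '198.51.100.23 - - [10/Mar/2025:12:00:05 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.7 - - [10/Mar/2025:12:01:00 +0000] "GET /about HTTP/1.1" 200 1024 "-" "cohere-training-data-crawler"',
]


def find_bot_hits(log_lines):
    """Return the log lines that mention the Cohere crawler."""
    return [line for line in log_lines if BOT_TOKEN.search(line)]


def hits_per_ip(hits):
    """Count hits by client IP, assuming the IP is the first field of each line."""
    return Counter(line.split(" ", 1)[0] for line in hits)


def scan_log_file(path):
    """Scan a real log file; the path you pass (e.g. an Nginx access log) depends on your server."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_bot_hits(f)


if __name__ == "__main__":
    hits = find_bot_hits(SAMPLE_LOG)
    print(len(hits), "hit(s)")
    print(hits_per_ip(hits))
```

A matching user-agent only shows what the client claimed to be, because any client can send that string. Treat the count as an indication, not proof that Cohere visited.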