
cohere-training-data-crawler

CohereTraining

This is Cohere's training crawler. Cohere is an AI company that builds language models for enterprise use, and the crawler collects public text from the web to train and refine those models. Cohere doesn't publish an official documentation page for the bot, so the available reference comes from specialized third-party directories, which document its user-agent and reported behavior.

User-agent
cohere-training-data-crawler
Does it respect robots.txt?
Reportedly yes (according to third-party directories; not confirmed by Cohere)

How to allow it in your robots.txt

User-agent: cohere-training-data-crawler
Allow: /

How to block it (not recommended)

User-agent: cohere-training-data-crawler
Disallow: /
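
To sanity-check either rule set before deploying it, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `is_allowed` and the sample strings are illustrative assumptions. The parser only models how a compliant crawler would read the rules; it says nothing about what Cohere's crawler actually does.

```python
from urllib.robotparser import RobotFileParser

BOT = "cohere-training-data-crawler"


def is_allowed(robots_txt: str, path: str = "/", agent: str = BOT) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The two rule sets shown above, as plain strings.
allow_rules = "User-agent: cohere-training-data-crawler\nAllow: /\n"
block_rules = "User-agent: cohere-training-data-crawler\nDisallow: /\n"

print(is_allowed(allow_rules))  # True
print(is_allowed(block_rules))  # False
```

Because the block rule names only this user-agent, other crawlers are unaffected by it: `is_allowed(block_rules, agent="SomeOtherBot")` returns `True`.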

Frequently asked questions

Should I block cohere-training-data-crawler?

Not if you want Cohere's models, which many companies build into their own products, to know about your content. Blocking the crawler keeps your content out of that part of the AI ecosystem, typically with no benefit in return.

Does this bot respect robots.txt?

This isn't confirmed by Cohere. The specialized directories that document it assume that, like most bots from reputable companies, it follows robots.txt directives targeting "cohere-training-data-crawler", but Cohere doesn't publish official documentation guaranteeing it.

How do I know if it visits my site?

Search for "cohere-training-data-crawler" in your server logs. It sometimes appears together with a contact email for Cohere's crawling team in the user-agent string itself.
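
A minimal sketch of that log search in Python, assuming an access log in the common "combined" format, where the user-agent appears on each request line. The function name `count_bot_hits` is hypothetical, and the sample lines (IP addresses, paths, contact email) are invented for illustration; they are not real Cohere traffic.

```python
import re
from collections import Counter

BOT_TOKEN = "cohere-training-data-crawler"

# Captures the request path from a quoted request such as "GET /page HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)')


def count_bot_hits(log_lines, token=BOT_TOKEN):
    """Count requests per path for log lines that mention `token` (case-insensitive)."""
    hits = Counter()
    needle = token.lower()
    for line in log_lines:
        if needle in line.lower():
            match = REQUEST_RE.search(line)
            hits[match.group(1) if match else "(unknown)"] += 1
    return hits


# Invented sample lines; in practice, iterate over your own access log file.
sample = [
    '203.0.113.7 - - [01/Jan/2025:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "cohere-training-data-crawler (crawler@example.com)"',
    '203.0.113.7 - - [01/Jan/2025:10:00:05 +0000] "GET /about HTTP/1.1" 200 300 "-" "Cohere-Training-Data-Crawler"',
    '198.51.100.2 - - [01/Jan/2025:10:00:09 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

hits = count_bot_hits(sample)
for path, count in hits.most_common():
    print(path, count)
```

To scan a real file, pass an open file handle instead of the list, for example `count_bot_hits(open("access.log", encoding="utf-8", errors="replace"))`, where `access.log` stands in for your server's actual log path.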

Related resources

Do you know if these bots already read your site and what they say about you? Run the free test.
