Question 1

Should I block cohere-training-data-crawler?

Accepted Answer

No, not if you want Cohere's models, which many companies use in their own products, to know about your content. Blocking the crawler excludes you from that branch of the AI ecosystem and gains you nothing in return.

Question 2

Does this bot respect robots.txt?

Accepted Answer

Cohere has not confirmed this. The specialized bot directories that document the crawler assume that, like most bots from reputable companies, it follows robots.txt directives targeting the "cohere-training-data-crawler" user agent, but Cohere publishes no official documentation guaranteeing it.
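
To make "directives targeting the user agent" concrete, here is a minimal sketch that feeds two hypothetical rule sets to Python's standard `urllib.robotparser` and asks how a compliant crawler would read them. The rule sets, the `is_allowed` helper, and the `https://example.com/` URL are assumptions for illustration. The check only shows what the rules mean to a well-behaved bot; it cannot prove that Cohere's crawler honours them.

```python
from urllib.robotparser import RobotFileParser

AGENT = "cohere-training-data-crawler"

# Hypothetical rule sets, written for illustration only.
ALLOW_RULES = """\
User-agent: cohere-training-data-crawler
Allow: /
"""

BLOCK_RULES = """\
User-agent: cohere-training-data-crawler
Disallow: /
"""


def is_allowed(robots_txt: str, url: str, agent: str = AGENT) -> bool:
    """Return True if a robots.txt-compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/page"  # placeholder URL
    print("allow rules:", is_allowed(ALLOW_RULES, page))
    print("block rules:", is_allowed(BLOCK_RULES, page))
```

Rules written for this user agent do not affect other bots: with the blocking rule set above, a crawler with a different name is still allowed.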

Question 3

How do I know if it visits my site?

Accepted Answer

Search for "cohere-training-data-crawler" in your server logs. The user-agent string itself sometimes also includes a contact email for Cohere's crawling team.
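
The log search can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the access log is plain text with one request per line and the client address as the first whitespace-separated field (as in the common and combined log formats). The `count_crawler_hits` helper is hypothetical, and the log path is whatever you pass on the command line.

```python
import sys
from collections import Counter
from typing import Iterable

AGENT_TOKEN = "cohere-training-data-crawler"


def count_crawler_hits(lines: Iterable[str], token: str = AGENT_TOKEN) -> Counter:
    """Count log lines that mention `token`, keyed by the first field of each line.

    The match is case-insensitive. The first field is assumed to be the
    client address; adjust if your log format differs.
    """
    hits: Counter = Counter()
    needle = token.lower()
    for line in lines:
        if needle in line.lower():
            fields = line.split()
            hits[fields[0] if fields else "unknown"] += 1
    return hits


if __name__ == "__main__":
    # Usage: python find_crawler.py <path-to-access-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for client, count in count_crawler_hits(log_file).most_common():
            print(f"{client}\t{count}")
```

An empty result only means the token did not appear in that log file; it does not rule out visits recorded elsewhere or rotated out of the log.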

