Diffbot

Mixed

Diffbot uses computer vision and artificial intelligence to turn web pages into structured data (products, articles, organizations) and maintains a large knowledge base of the web. Many companies and AI applications consume that data. If your business is listed there accurately, that information flows into the tools and assistants that reuse it.

User-agent
Diffbot
Mozilla/5.0 (compatible; Diffbot/0.1; +http://www.diffbot.com/our-apis/crawler/)
Does it respect robots.txt?
Partially

How to allow it in your robots.txt

User-agent: Diffbot
Allow: /

How to block it (not recommended)

User-agent: Diffbot
Disallow: /
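
Either rule set above can be checked before you publish it. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name diffbot_allowed and the example.com URL are illustrative assumptions, not part of Diffbot's tooling. It only shows how a standards-following parser reads the rules, not how Diffbot itself evaluates them.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets shown above, as they would appear in robots.txt.
ALLOW_RULES = "User-agent: Diffbot\nAllow: /\n"
BLOCK_RULES = "User-agent: Diffbot\nDisallow: /\n"


def diffbot_allowed(robots_txt: str, url: str, agent: str = "Diffbot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration: it parses the text locally
    instead of downloading it from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/products/widget"  # placeholder URL
    print("allow rules:", diffbot_allowed(ALLOW_RULES, page))
    print("block rules:", diffbot_allowed(BLOCK_RULES, page))
```

To test a live file instead, RobotFileParser also offers set_url() and read(), which download the robots.txt for you.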

Frequently asked questions

Should I block Diffbot?

It's not advisable. Its structured data ends up in business tools and AI systems that can showcase your business. Being well represented in its knowledge base works in your favor.

Does Diffbot respect robots.txt?

Partially. According to its official documentation, its bulk crawls (Crawl) respect robots.txt, including the Disallow and Crawl-delay directives. However, extractions of specific URLs requested by customers can still be processed even if a block is in place.
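
To see what the Disallow and Crawl-delay directives mentioned above express, here is a small sketch that reads them the way a compliant crawler would. The robots.txt content, the /private/ path, the 10-second delay, and the function name summarize_rules are all assumptions made up for the example. It models only the bulk-crawl case; customer-requested extractions of single URLs may not consult these rules at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow Diffbot down and keep it out of /private/.
SAMPLE_ROBOTS = """\
User-agent: Diffbot
Crawl-delay: 10
Disallow: /private/
"""


def summarize_rules(robots_txt: str, paths, agent: str = "Diffbot") -> dict:
    """Return the crawl delay and a per-path allow/deny map for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "crawl_delay": parser.crawl_delay(agent),  # None if not specified
        "allowed": {path: parser.can_fetch(agent, path) for path in paths},
    }


if __name__ == "__main__":
    summary = summarize_rules(SAMPLE_ROBOTS, ["/", "/private/report.html"])
    print(summary)
```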

How do I know if Diffbot visits my site?

Search for "Diffbot" in your server logs. Its user-agent string includes a link to its crawler documentation, which makes it easy to identify.
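
The log search can be automated with a few lines of Python. This is a rough sketch that assumes your server writes the common "combined" access log format; the sample lines, IP addresses, and the function name diffbot_paths are invented for illustration, so adjust the pattern to match your own logs.

```python
import re
from collections import Counter

# Matches the request part of a combined-format log line, e.g. "GET /about HTTP/1.1".
REQUEST_PATH = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')

# Invented sample lines; real logs would be read from a file instead.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Mar/2025:08:15:02 +0000] "GET /products/widget HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; Diffbot/0.1; +http://www.diffbot.com/our-apis/crawler/)"',
    '198.51.100.4 - - [10/Mar/2025:08:15:09 +0000] "GET /about HTTP/1.1" 200 2210 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.7 - - [10/Mar/2025:08:16:40 +0000] "GET /products/widget HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; Diffbot/0.1; +http://www.diffbot.com/our-apis/crawler/)"',
]


def diffbot_paths(log_lines) -> Counter:
    """Count requested paths on lines whose text mentions Diffbot."""
    counts = Counter()
    for line in log_lines:
        if "diffbot" not in line.lower():
            continue  # not a Diffbot request
        match = REQUEST_PATH.search(line)
        counts[match.group(1) if match else "(unparsed)"] += 1
    return counts


if __name__ == "__main__":
    for path, hits in diffbot_paths(SAMPLE_LOG).most_common():
        print(f"{hits:>4}  {path}")
```

To run it against a real log, pass an open file object in place of SAMPLE_LOG, since the function accepts any iterable of lines.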

Related resources

Do you know if these bots already read your site and what they say about you? Run the free test.
