Question 1

Should I block Diffbot?

Accepted Answer

Generally, no. Diffbot's structured data feeds business tools and AI systems that can surface your business, so being well represented in its knowledge base works in your favor.

Question 2

Does Diffbot respect robots.txt?

Accepted Answer

Partially. According to its official documentation, its bulk crawls (the Crawl product) respect robots.txt, including Disallow and Crawl-delay directives. Extractions of specific URLs requested by customers, however, can still be processed even when a block is in place.
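To see what a robots.txt-compliant crawl would conclude from your rules, you can check them locally with Python's standard library. This is a minimal sketch under assumptions: the robots.txt content and the example.com URLs are made up, and "Diffbot" as the user-agent token is assumed here rather than confirmed by this answer. It only models compliant bulk crawls, not customer-requested extractions of specific URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration. The "Diffbot" token is an
# assumption; check the crawler's documentation for the exact token.
ROBOTS_TXT = """\
User-agent: Diffbot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""


def diffbot_policy(robots_txt, url, agent="Diffbot"):
    """Return (allowed, crawl_delay) for `agent` fetching `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    for url in ("https://example.com/private/report",
                "https://example.com/blog/post"):
        allowed, delay = diffbot_policy(ROBOTS_TXT, url)
        print(url, "allowed" if allowed else "blocked", "crawl-delay:", delay)
```

To test your live file instead of a string, `RobotFileParser` also offers `set_url()` followed by `read()`.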

Question 3

How do I know if Diffbot visits my site?

Accepted Answer

Search for "Diffbot" in your server logs. Its user-agent string includes a link to its crawler documentation, which makes it easy to identify.
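A log search like this can be scripted. The sketch below assumes access logs in the common "combined" format; the sample lines, IP addresses, and user-agent strings are invented for illustration and are not Diffbot's real user-agent. It counts matching requests per path.

```python
import re
from collections import Counter

# Extracts the request path from a quoted request line such as
# "GET /pricing HTTP/1.1" (assumes combined log format).
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')


def diffbot_hits(log_lines):
    """Count requests per path for log lines that mention Diffbot."""
    hits = Counter()
    for line in log_lines:
        # Case-insensitive substring match anywhere in the line.
        if "diffbot" not in line.lower():
            continue
        match = REQUEST_RE.search(line)
        hits[match.group(1) if match else "(unparsed)"] += 1
    return hits


# Made-up sample lines; the user-agent strings are hypothetical.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Diffbot/0.1; +https://example.com/crawler-docs)"',
    '198.51.100.7 - - [01/Jan/2025:10:00:05 +0000] "GET /pricing HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [01/Jan/2025:10:00:09 +0000] "GET /about HTTP/1.1" '
    '200 300 "-" "Mozilla/5.0 (compatible; Diffbot/0.1; +https://example.com/crawler-docs)"',
]

if __name__ == "__main__":
    for path, count in diffbot_hits(SAMPLE_LOG).most_common():
        print(path, count)
```

Because the match covers the whole line, a referrer or URL that happens to contain "diffbot" would also be counted. Restricting the check to the user-agent field is a reasonable refinement.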

Diffbot

How to allow it in your robots.txt

How to block it (not recommended)
