AI2Bot
AI2Bot is the crawler for the Allen Institute for AI (Ai2), a non-profit research organization. It collects web pages to build open datasets, such as Dolma, that are used to train and evaluate Ai2's open-source language models. Many other AI projects later reuse those datasets, so having your pages included in them extends your content's reach across the AI ecosystem.
- User-agent
AI2Bot
Mozilla/5.0 (compatible) AI2Bot (+https://www.allenai.org/crawler)
- Does it respect robots.txt?
- Yes
- Official documentation
- https://allenai.org/crawler
How to allow it in your robots.txt
User-agent: AI2Bot
Allow: /
How to block it (not recommended)
User-agent: AI2Bot
Disallow: /
Frequently asked questions
Should I block AI2Bot?
Blocking it is not recommended. Its open datasets feed open-source models used by thousands of developers and products, and allowing it extends the reach of your content in AI at no cost to you.
Does AI2Bot respect robots.txt?
Yes. Ai2 publishes an official crawling notice that documents its user-agent and confirms that you can restrict or block its access with standard robots.txt rules targeting the "AI2Bot" user-agent token.
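To check that your robots.txt rules say what you intend before deploying them, you can test them locally. The minimal sketch below uses Python's standard-library `urllib.robotparser` on a hypothetical robots.txt that blocks AI2Bot and allows every other crawler. The URL `https://example.com/blog/post`, the agent name `SomeOtherBot`, and the helper `is_allowed` are illustrative placeholders, not anything Ai2 publishes. The sketch only shows how a standards-following parser reads your file; it does not show how AI2Bot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block AI2Bot entirely, allow all other crawlers.
ROBOTS_TXT = """\
User-agent: AI2Bot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"  # placeholder URL
    for agent in ("AI2Bot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Pass the product token `AI2Bot` rather than the full user-agent string. `RobotFileParser` matches only the text before the first "/" of the agent name, so the full string would be read as "Mozilla" and fall through to the `*` group.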
How do I know if AI2Bot visits my site?
Search for "AI2Bot" in your server logs. Its user-agent includes a link to allenai.org/crawler, the official page where the institute explains how it works.
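If you want more than a simple text search, a short script can count which pages AI2Bot requested. The sketch below assumes your server writes the common "combined" access-log format used by default in many nginx and Apache setups, where the user-agent is the last quoted field. The sample log lines, the regular expression, and the helper `ai2bot_hits` are hypothetical; adjust them to your own log format.

```python
import re
from collections import Counter

# Hypothetical sample lines in the "combined" log format.
SAMPLE_LOG = """\
203.0.113.7 - - [10/Mar/2025:12:00:01 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible) AI2Bot (+https://www.allenai.org/crawler)"
198.51.100.2 - - [10/Mar/2025:12:00:05 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/123.0"
203.0.113.7 - - [10/Mar/2025:12:00:09 +0000] "GET /about HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible) AI2Bot (+https://www.allenai.org/crawler)"
"""

# Captures the request path and the user-agent (the last quoted field on the line).
LINE_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*".*"(?P<agent>[^"]*)"$')


def ai2bot_hits(lines):
    """Count requests per path whose user-agent mentions AI2Bot (case-insensitive)."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match and "ai2bot" in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    for path, count in ai2bot_hits(SAMPLE_LOG.splitlines()).most_common():
        print(count, path)
```

On a real server you would pass an open file instead of the sample, for example `ai2bot_hits(open("/var/log/nginx/access.log"))`, where the path is an assumption that depends on your setup.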