AI2Bot

Allen Institute for AI
Training

AI2Bot is the crawler for the Allen Institute for AI (Ai2), a non-profit research organization. It collects web pages to build open datasets, such as Dolma, that Ai2 uses to train and evaluate its open-source language models. Other AI projects reuse those datasets, so being included extends your content's footprint beyond Ai2's own models.

User-agent
AI2Bot
Mozilla/5.0 (compatible) AI2Bot (+https://www.allenai.org/crawler)
Does it respect robots.txt?
Yes
Official documentation
https://allenai.org/crawler

How to allow it in your robots.txt

User-agent: AI2Bot
Allow: /

How to block it (not recommended)

User-agent: AI2Bot
Disallow: /
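
Before deploying either rule set, you may want to confirm that it does what you expect. The sketch below is one way to do that with Python's standard urllib.robotparser module. It is only an illustration: the helper name can_fetch, the example.com URL, and the "SomeOtherBot" agent are placeholders that do not come from Ai2, and Python's parser may not interpret every rule exactly as a given crawler does.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two rule sets shown above.
ALLOW_RULES = "User-agent: AI2Bot\nAllow: /\n"
BLOCK_RULES = "User-agent: AI2Bot\nDisallow: /\n"

# Placeholder URL (assumption, not from the source).
URL = "https://example.com/articles/page.html"

# Pass the short token "AI2Bot", which is what the rules target.
allowed = can_fetch(ALLOW_RULES, "AI2Bot", URL)
blocked = not can_fetch(BLOCK_RULES, "AI2Bot", URL)

# A hypothetical other crawler is not affected by a rule group naming only AI2Bot.
other_unaffected = can_fetch(BLOCK_RULES, "SomeOtherBot", URL)

print(allowed, blocked, other_unaffected)
```

To test your live file instead of an inline string, read its contents and pass the text to the same helper.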

Frequently asked questions

Should I block AI2Bot?

It's not advisable. The open datasets it builds feed open-source models used by thousands of developers and products. Allowing it extends the reach of your content in AI at no cost to you.

Does AI2Bot respect robots.txt?

Yes. Ai2 publishes an official crawling notice that documents its user-agent and confirms that you can filter or reject its traffic with standard rules targeting "AI2Bot".

How do I know if AI2Bot visits my site?

Search for "AI2Bot" in your server logs. Its user-agent includes a link to allenai.org/crawler, the official page where the institute explains how it works.
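
As a rough illustration of that log search, the Python sketch below counts AI2Bot requests per path. It assumes an access log in the common "combined" format, where the request appears as a quoted "METHOD /path HTTP/x" field. The function name count_ai2bot_hits and the sample log lines are made up for the example and are not real traffic.

```python
import re
from collections import Counter
from typing import Iterable

# Case-insensitive match on the token that appears in the user-agent string.
AI2BOT_PATTERN = re.compile(r"AI2Bot", re.IGNORECASE)
# Assumes combined log format: the request is quoted as "METHOD /path HTTP/x".
REQUEST_PATTERN = re.compile(r'"(?:GET|HEAD|POST) (\S+)')


def count_ai2bot_hits(lines: Iterable[str]) -> Counter:
    """Count requests per path for log lines that mention AI2Bot."""
    hits: Counter = Counter()
    for line in lines:
        if not AI2BOT_PATTERN.search(line):
            continue
        match = REQUEST_PATTERN.search(line)
        hits[match.group(1) if match else "(unparsed)"] += 1
    return hits


# Fabricated sample lines for demonstration only (documentation IP ranges).
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /blog/post HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible) AI2Bot (+https://www.allenai.org/crawler)"',
    '198.51.100.7 - - [01/Jan/2025:10:00:05 +0000] "GET /blog/post HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [01/Jan/2025:10:00:09 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" '
    '"Mozilla/5.0 (compatible) AI2Bot (+https://www.allenai.org/crawler)"',
]

hits = count_ai2bot_hits(SAMPLE_LOG)
for path, count in sorted(hits.items()):
    print(path, count)
```

To run it against a real log, open the file and pass the file object to count_ai2bot_hits, since it accepts any iterable of lines. The log location depends on your server setup.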
