AI bots & crawlers

These are the bots that read your website to feed ChatGPT, Gemini, Perplexity and similar AI products. Blocking them can keep your business out of their answers.

58 crawlers

OpenAI · Training

GPTBot

Collects public content to train OpenAI's AI models, such as the ones behind ChatGPT.

OpenAI · User action

ChatGPT-User

Visits your site live when a ChatGPT user asks for information that requires checking your page.

OpenAI · Search

OAI-SearchBot

Indexes websites so they can appear as results and links in ChatGPT search.
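Because OpenAI lists a separate agent for training, live user requests and search, each one can be given its own robots.txt group. The following is a minimal sketch, assuming a hypothetical site that wants to opt out of training while staying reachable for search and live lookups. The rules and the example.com URL are illustrative, and Python's standard `urllib.robotparser` is used only to show how such rules would be read; whether a given bot honors them is up to its operator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of training, stay open to search and live lookups.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, per user-agent token, whether the rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
    "https://example.com/pricing",  # illustrative URL
)
print(result)
```

With these rules, the check reports GPTBot as disallowed and the other two agents as allowed.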

Anthropic · Training

ClaudeBot

Collects public content to train and improve Anthropic's Claude models.

Anthropic · User action

Claude-User

Accesses your site live when a Claude user asks a question that requires checking it.

Anthropic · Search

Claude-SearchBot

Indexes web content to improve the quality and relevance of Claude's search results.

Google · Mixed

Googlebot

Crawls the web for Google Search, including AI Overviews.

Google · Training

Google-Extended

A robots.txt control token that decides whether your content can be used to train Google's Gemini models.

Google · Mixed

GoogleOther

A general-purpose crawler Google's product teams use to download public content for various purposes, including AI research and development.

Perplexity · Search

PerplexityBot

Indexes websites to show and link them in Perplexity search results.

Perplexity · User action

Perplexity-User

Visits your site live when a user asks a question on Perplexity that requires checking your page.

Microsoft · Mixed

Bingbot

Crawls the web for the Bing search engine and also feeds Microsoft Copilot's answers.

Meta · Training

meta-externalagent

Crawls the web to train Meta's AI models (Llama) and to index content for its products.

Meta · User action

meta-externalfetcher

Downloads specific links requested by users and assists Meta AI's agentic capabilities.

Apple · Search

Applebot

Crawls the web for Apple's search features: Siri, Spotlight and Safari suggestions.

Apple · Training

Applebot-Extended

A control token that decides whether Apple can use your content to train Apple Intelligence models.
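Control tokens such as Google-Extended and Applebot-Extended are addressed in robots.txt like any other user-agent name, so a training opt-out can sit alongside rules that leave the regular search crawlers untouched. The sketch below is an illustration only: `build_opt_out` is a hypothetical helper, the URL is made up, and the check uses Python's standard `urllib.robotparser` rather than any vendor's own parser.

```python
from urllib.robotparser import RobotFileParser


def build_opt_out(tokens: list[str]) -> str:
    """Build robots.txt text that disallows each control token and allows everyone else."""
    groups = [f"User-agent: {token}\nDisallow: /\n" for token in tokens]
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)


robots_txt = build_opt_out(["Google-Extended", "Applebot-Extended"])

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/blog/post"  # hypothetical URL
checks = {
    name: parser.can_fetch(name, url)
    for name in ["Google-Extended", "Applebot-Extended", "Googlebot", "Applebot"]
}
print(robots_txt)
print(checks)
```

Under these rules the two control tokens are disallowed while Googlebot and Applebot fall through to the catch-all group and remain allowed.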

Amazon · Mixed

Amazonbot

Crawls the web to improve Amazon products and services, such as Alexa's answers, and can be used to train its AI models.

ByteDance · Training

Bytespider

Collects web content at scale to train AI models for ByteDance, the company behind TikTok and Doubao.

Common Crawl · Training

CCBot

Builds an open archive of the web that serves as training data for many AI models.

Cohere · Training

cohere-training-data-crawler

Collects public web content to train Cohere's language models, aimed at businesses.

Mistral AI · User action

MistralAI-User

Visits your site when a user of Vibe, Mistral's assistant, asks a question that requires checking it.

DuckDuckGo · Search

DuckAssistBot

Collects content for DuckAssist, DuckDuckGo's AI-generated answers.

You.com · Search

YouBot

Discovers and indexes web pages so You.com's search engine and AI assistants can give up-to-date answers.

Diffbot · Mixed

Diffbot

Extracts structured data from web pages to build a knowledge base used by businesses and AI systems.

Allen Institute for AI · Training

AI2Bot

Collects web documents to build open datasets used to train and evaluate Ai2's language models.

xAI · Mixed

GrokBot

Retrieves web content for Grok, xAI's AI assistant built into X (formerly Twitter).

Huawei · Mixed

PetalBot

Crawls the web for Petal Search, Huawei's search engine, and its ecosystem services.

Hive · Mixed

ImagesiftBot

Collects public images and their context for Hive's web intelligence and AI products.

Google · User action

Google-NotebookLM

Downloads the content of a URL when a user of NotebookLM, Google's AI-powered note-taking app, manually adds it as a source in their project.

Google · User action

Google-Read-Aloud

Powers Google's Read Aloud feature: when a user asks to have a web page read to them, this bot fetches the text to convert it into audio.

Google · Mixed

Google-CloudVertexBot

Indexes your website content to power AI agents and enterprise search applications built on Google Cloud Vertex AI.

Google · User action

Google-InspectionTool

Fetches your page on demand when you actively use the URL Inspection tool or the Rich Results Test in Google Search Console.

Google · User action

Google-Agent

Browses the web on behalf of real users so Google's AI agent (Project Mariner) can carry out specific tasks, such as finding information or taking actions on web pages.

Microsoft · Search

msnbot

Legacy crawler that built the Bing (formerly MSN Search) index, helping sites appear in Microsoft's search results.

Amazon · Search

Amzn-SearchBot

Crawls public web content to improve search relevance across Amazon products, including Alexa.

Meta · Mixed

facebookexternalhit

Generates the link preview when someone shares your website on Facebook, Instagram, or Messenger.

Meta · Training

FacebookBot

Downloads public web content to build training datasets for Meta's voice recognition technology.

Meta · Mixed

meta-externalads

Crawls external pages to review Meta ads and generate the link previews displayed on Facebook, Instagram, and WhatsApp.

Baidu · Search

Baiduspider

Crawls public web pages to index them in Baidu, China's largest search engine.

Moonshot AI · Training

KimiBot

Crawls public web content to train Moonshot AI's foundation models, which power the Kimi assistant.

Moonshot AI · User action

Kimi-User

Fetches your web content on demand when a Kimi user asks for real-time information or asks it to summarize an article.

Moonshot AI · Search

Kimi-SearchBot

Analyzes web pages for relevance to build the search index that powers Kimi, Moonshot AI's AI assistant.

Mistral AI · Search

MistralAI-Index

Crawls public web pages to index content that powers the search engine inside Mistral's Vibe platform.

OpenAI · Mixed

OAI-AdsBot

Validates that landing pages submitted for ChatGPT Ads comply with OpenAI's advertising policies and are relevant to the ad.

Webz.io · Training

webzio-extended

Validates which publicly available web content can legally be used to train AI models, labeling it for Webz.io's data pipeline clients.

Webz.io · Mixed

omgilibot

Crawls news sites, forums and user-generated content to power Webz.io's data API, sold to media monitoring platforms and AI training companies alike.

Velen.io · Training

VelenPublicWebCrawler

Crawls public websites to build business datasets and train the machine learning models that power Hunter.io.

Bright Data · Mixed

Brightbot

Collects public web data on behalf of business clients that need price monitoring, competitive intelligence, or commercial databases.

Semrush · Mixed

SemrushBot-OCOB

Crawls public web content to power Semrush's Content Toolkit, an AI-assisted content marketing and SEO suite.

Yandex · Search

YandexAdditionalBot

Crawls pages already indexed by Yandex to use as real-time sources for YandexGPT, Yandex's AI-powered search feature.

Timpi · Search

Timpibot

Crawls public websites to build the index for Timpi's decentralized search engine; that index may also be used to train AI models.

Andi · Search

Andibot

Crawls public websites to power Andi's generative search results, which answer questions with AI-generated summaries instead of a traditional list of links.

iAsk · Search

iaskspider

Crawls public web pages to build the index powering iAsk, an AI-based answer search engine.

Phind · Search

PhindBot

Crawls websites on demand to power Phind's AI-based search engine, which is aimed at developers and technical users.

Romain Beaumont · Training

img2dataset

Downloads images from public websites at scale to build datasets for training AI vision and image-generation models.

Huawei · Training

PanguBot

Downloads public web content to train PanGu, Huawei's multimodal AI model.

Research Organization of Information and Systems · Training

Cotoyogi

Collects publicly available web content in Japanese to build training datasets for AI models.

Awario · Mixed

AwarioBot

Crawls public websites to collect brand and keyword mentions on behalf of Awario's monitoring customers.
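One way to put a directory like this to work is to match its tokens against the user-agent strings in your own server logs. The sketch below assumes a small subset of the entries above and a handful of made-up user-agent strings; `KNOWN_BOTS`, `classify` and the sample values are hypothetical names for illustration, and real user-agent strings sent by these bots may look different. Matching on a token alone also cannot prove that a request really came from the named operator.

```python
from collections import Counter

# Subset of the directory above: user-agent token -> (operator, category).
KNOWN_BOTS = {
    "GPTBot": ("OpenAI", "Training"),
    "ChatGPT-User": ("OpenAI", "User action"),
    "OAI-SearchBot": ("OpenAI", "Search"),
    "ClaudeBot": ("Anthropic", "Training"),
    "PerplexityBot": ("Perplexity", "Search"),
    "CCBot": ("Common Crawl", "Training"),
}


def classify(user_agent: str):
    """Return (token, operator, category) for the first known token found, else None."""
    lowered = user_agent.lower()
    # Try longer tokens first so a short token never shadows a longer one.
    for token in sorted(KNOWN_BOTS, key=len, reverse=True):
        if token.lower() in lowered:
            operator, category = KNOWN_BOTS[token]
            return token, operator, category
    return None


# Illustrative user-agent strings, not copied from any operator's documentation.
sample_user_agents = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
]

counts = Counter()
for ua in sample_user_agents:
    hit = classify(ua)
    if hit is not None:
        counts[hit[2]] += 1  # tally by category

print(dict(counts))
```

Counting by category rather than by bot name gives a quick view of how much traffic comes from training crawlers versus search indexers and live user requests.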

Do you know if these bots already read your site and what they say about you? Run the free test.
