AI bots & crawlers
These are the robots that read your website to feed ChatGPT, Gemini, Perplexity and friends. Block them and your business can disappear from their answers.
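Blocking is normally done in robots.txt, and the crawlers below do not all play the same role. Disallowing a training crawler such as GPTBot is a separate decision from disallowing a search indexer such as OAI-SearchBot. The sketch below uses Python's standard `urllib.robotparser` to check a hypothetical robots.txt (an illustration, not a recommendation) that opts out of a few training crawlers and control tokens while leaving search and user-triggered fetchers allowed. `RobotFileParser` matches user-agent names by case-insensitive substring and uses the first matching group, which differs from RFC 9309 in edge cases. Treat its answers as an approximation of how each vendor's crawler reads the file.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group opts out a few training crawlers and
# control tokens; every other agent falls through to the permissive "*" group.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

# A handful of tokens from the list on this page, mixing training crawlers,
# search indexers, user-triggered fetchers and one control token.
AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot",
    "Googlebot", "Google-Extended", "CCBot",
]


def audit(robots_txt: str, agents: list[str],
          url: str = "https://example.com/") -> dict[str, bool]:
    """Return {agent: allowed} for each agent token against a single URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    for agent, allowed in audit(ROBOTS_TXT, AGENTS).items():
        print(f"{agent:18} {'allowed' if allowed else 'blocked'}")
```

With this file, GPTBot, ClaudeBot, Google-Extended and CCBot come back as blocked, while OAI-SearchBot, ChatGPT-User, Claude-SearchBot and Googlebot stay allowed. Because Googlebot is not touched by the Google-Extended group, the site still reaches Google Search and AI Overviews.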
58 crawlers
GPTBot: Collects public content to train OpenAI's AI models, such as the ones behind ChatGPT.
ChatGPT-User: Visits your site live when a ChatGPT user asks for information that requires checking your page.
OAI-SearchBot: Indexes websites so they can appear as results and links in ChatGPT search.
ClaudeBot: Collects public content to train and improve Anthropic's Claude models.
Claude-User: Accesses your site live when a Claude user asks a question that requires checking it.
Claude-SearchBot: Indexes web content to improve the quality and relevance of Claude's search results.
Googlebot: Crawls the web for Google Search, including AI Overviews.
Google-Extended: A robots.txt control token that lets you decide whether your content can be used to train Google's Gemini models.
GoogleOther: A general-purpose crawler that Google's product teams use to download public content for various purposes, including AI research and development.
PerplexityBot: Indexes websites to show and link them in Perplexity search results.
Perplexity-User: Visits your site live when a user asks a question on Perplexity that requires checking your page.
Bingbot: Crawls the web for the Bing search engine and also feeds Microsoft Copilot's answers.
meta-externalagent: Crawls the web to train Meta's AI models (Llama) and to index content for its products.
meta-externalfetcher: Downloads specific links requested by users and supports Meta AI's agentic features.
Applebot: Crawls the web for Apple's search features, including Siri, Spotlight and Safari suggestions.
Applebot-Extended: A robots.txt control token that lets you decide whether Apple can use your content to train Apple Intelligence models.
Amazonbot: Crawls the web to improve Amazon products and services, such as Alexa's answers, and the content it collects may be used to train Amazon's AI models.
Bytespider: Collects web content at scale to train the AI models of ByteDance, the company behind TikTok and Doubao.
CCBot: Builds Common Crawl's open archive of the web, which serves as training data for many AI models.
cohere-training-data-crawler: Collects public web content to train Cohere's business-focused language models.
MistralAI-User: Visits your site when a user of Vibe, Mistral's assistant, asks a question that requires checking it.
DuckAssistBot: Collects content for DuckAssist, DuckDuckGo's AI-generated answers.
YouBot: Discovers and indexes web pages so You.com's search engine and AI assistants can give up-to-date answers.
Diffbot: Extracts structured data from web pages to build a knowledge base used by businesses and AI systems.
AI2Bot: Collects web documents to build open datasets used to train and evaluate Ai2's language models.
GrokBot: Retrieves web content for Grok, xAI's AI assistant built into X (formerly Twitter).
PetalBot: Crawls the web for Petal Search, Huawei's search engine, and its ecosystem services.
ImagesiftBot: Collects public images and their context for Hive's web intelligence and AI products.
Google-NotebookLM: Downloads the content of a URL when a user of NotebookLM, Google's AI-powered note-taking app, manually adds it as a source in their project.
Google-Read-Aloud: Powers Google's Read Aloud feature, fetching a page's text to convert it into audio when a user asks to have it read aloud.
Google-CloudVertexBot: Indexes your website content to power AI agents and enterprise search applications built on Google Cloud Vertex AI.
Google-InspectionTool: Fetches your page on demand when you actively use the URL Inspection tool in Google Search Console or the Rich Results Test.
Google-Agent: Browses the web on behalf of real users so Google's AI agent (Project Mariner) can carry out specific tasks, such as finding information or taking actions on web pages.
msnbot: Crawled web pages to build the Bing (and formerly MSN Search) index, helping sites appear in Microsoft's search results.
Amzn-SearchBot: Crawls public web content to improve search relevance across Amazon products, including Alexa.
facebookexternalhit: Generates the link preview when someone shares your website on Facebook, Instagram, or Messenger.
FacebookBot: Downloads public web content to build training datasets for Meta's voice recognition technology.
meta-externalads: Crawls external pages to review Meta ads and generate the link previews displayed on Facebook, Instagram, and WhatsApp.
Baiduspider: Crawls public web pages to index them in Baidu, China's largest search engine.
KimiBot: Crawls public web content to train Moonshot AI's foundation models, which power the Kimi assistant.
Kimi-User: Fetches your web content on demand when a Kimi user asks for real-time information or to summarise an article.
Kimi-SearchBot: Analyzes web pages for relevance to build the search index that powers Kimi, Moonshot AI's assistant.
MistralAI-Index: Crawls public web pages to index content that powers the search engine inside Mistral's Vibe platform.
OAI-AdsBot: Validates that landing pages submitted for ChatGPT Ads comply with OpenAI's advertising policies and are relevant to the ad.
webzio-extended: Validates which publicly available web content can legally be used to train AI models, labeling it for Webz.io's data pipeline clients.
omgilibot: Crawls news sites, forums and user-generated content to power Webz.io's data API, sold to media monitoring platforms and AI training companies alike.
VelenPublicWebCrawler: Crawls public websites to build business datasets and train the machine learning models that power Hunter.io.
Brightbot: Collects public web data on behalf of business clients that need price monitoring, competitive intelligence, or commercial databases.
SemrushBot-OCOB: Crawls public web content to power Semrush's Content Toolkit, an AI-assisted content marketing and SEO suite.
YandexAdditionalBot: Crawls pages already indexed by Yandex to use as real-time sources for YandexGPT, Yandex's AI-powered search feature.
Timpibot: Crawls public websites to build the index for Timpi's decentralised search engine, an index that may also be used to train AI models.
Andibot: Crawls public websites to power Andi's generative search results, which answer questions with AI-generated summaries instead of a traditional list of links.
iaskspider: Crawls public web pages to build the index powering iAsk, an AI-based answer search engine.
PhindBot: Crawls websites on demand to power Phind's AI-based search engine, which is aimed at developers and technical users.
img2dataset: Downloads images from public websites at scale to build datasets for training AI vision and image-generation models.
PanguBot: Downloads public web content to train PanGu, Huawei's multimodal AI model.
Cotoyogi: Collects publicly available web content in Japanese to build training datasets for AI models.
AwarioBot: Crawls public websites to collect brand and keyword mentions on behalf of Awario's monitoring customers.
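Server logs record each visitor's User-Agent header, so the names above are also how you spot these bots in practice. The sketch below is a hypothetical helper that tags a User-Agent string with one of a handful of tokens from this list. Its purpose labels are the sketch's own shorthand, based on the descriptions above, not an official vendor taxonomy. There are two caveats. Google-Extended and Applebot-Extended are robots.txt control tokens rather than separate crawlers, so they never appear in a User-Agent string. User-Agent headers can also be spoofed, so confirming a genuine visit needs the vendor's own verification method (several publish IP ranges or support reverse-DNS checks).

```python
from __future__ import annotations

import re
from typing import Optional, Tuple

# Illustrative subset of the tokens listed above, mapped to this sketch's own
# shorthand for what each one does (not an official vendor taxonomy).
CRAWLER_TOKENS = {
    "GPTBot": "training",
    "OAI-SearchBot": "search index",
    "ChatGPT-User": "user-triggered fetch",
    "ClaudeBot": "training",
    "Claude-SearchBot": "search index",
    "Claude-User": "user-triggered fetch",
    "PerplexityBot": "search index",
    "Perplexity-User": "user-triggered fetch",
    "CCBot": "training",
    "Bytespider": "training",
}

# One case-insensitive pattern per token. The lookarounds reject matches that
# sit inside a longer name, so "ClaudeBot" will not match "ClaudeBotX".
_PATTERNS = {
    token: re.compile(rf"(?<![\w-]){re.escape(token)}(?![\w-])", re.IGNORECASE)
    for token in CRAWLER_TOKENS
}


def classify(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (token, purpose) for the first known token in a UA string, else None."""
    for token, pattern in _PATTERNS.items():
        if pattern.search(user_agent):
            return token, CRAWLER_TOKENS[token]
    return None
```

Running the User-Agent field of each log line through `classify` and counting the results gives a rough picture of which kinds of AI traffic a site receives. Anything that returns `None` is either a bot outside this small subset or an ordinary browser.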