How to monitor all your clients across 4 AIs without doing it by hand or taking screenshots
The first time you check whether a client shows up in ChatGPT, the manual method works fine: you open the AI, type the prompts, take screenshots, paste them into a dated document. An hour, maybe two, and you have something to show in the meeting.
The problem isn't the first time. It's the fortieth.
Because monitoring AI visibility isn't a photo, it's a film. The industry calls the discipline GEO (Generative Engine Optimization): tracking and improving how brands appear in the answers of AI search engines. Answers change from one week to the next, and a serious service requires measuring them on a recurring basis, for every client, across every relevant AI. And that's where the manual arithmetic breaks down.
The real arithmetic: clients × prompts × AIs × frequency
Let's run the numbers for a small agency that takes the service seriously:
- 6 clients on an AI visibility service.
- 20 prompts per client (the minimum to cover the commercial questions in a sector: generic, local, comparative and brand).
- 4 AIs (ChatGPT, Gemini, Perplexity and Claude — they don't agree with each other, so looking at just one is looking at a quarter of the problem).
- Weekly frequency, because answers shift week by week and a monthly measurement leaves you blind for 30 days straight.
The multiplication: 6 × 20 × 4 = 480 queries a week. Being optimistic, allow 2 minutes per query to launch the prompt, read the answer, note whether the client appears, in which position, what's said about them and who else shows up, and save the screenshot. That's 480 × 2 = 960 minutes, or 16 hours a week. Two full working days for one person, every single week, just to gather data. Without even building the report.
At €25/hour of internal cost, that's around €1,600/month in hours (16 hours × 4 weeks × €25). And that's with 6 clients: with 10, you're at 800 queries and over 26 hours a week, and nobody can take that on without dropping something else.
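The multiplication above is easy to rerun with your own numbers. This is a minimal sketch: the function name and parameters are illustrative, and the 4 weeks per month is the rounding the figures above imply, not an exact calendar value.

```python
def manual_workload(clients, prompts_per_client=20, ais=4,
                    minutes_per_query=2, hourly_cost_eur=25,
                    weeks_per_month=4):
    """Return (queries per week, hours per week, monthly cost in EUR).

    Defaults mirror the example agency in the text; weeks_per_month=4
    is a simplifying assumption.
    """
    queries_per_week = clients * prompts_per_client * ais
    hours_per_week = queries_per_week * minutes_per_query / 60
    monthly_cost = hours_per_week * weeks_per_month * hourly_cost_eur
    return queries_per_week, hours_per_week, monthly_cost


for n in (6, 10):
    queries, hours, cost = manual_workload(n)
    print(f"{n} clients: {queries} queries/week, "
          f"{hours:.1f} h/week, €{cost:,.0f}/month")
```

Change `minutes_per_query` to whatever you actually measure for one prompt: everything downstream scales linearly with it.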
Cutting the frequency to monthly divides the hours by four, yes. But then the service suffers: if an AI starts giving a false fact about the client on day 3, you find out on day 30. A bad place to find out, especially when 37.9% of the Spanish population already uses generative AI (INE, last quarter of 2025): during those four weeks, real people received the wrong information.
What screenshots won't give you, no matter how many hours you put in
Even if you had the hours, the screenshot method has three holes that effort won't fix:
1. It isn't comparable over time. Forty screenshots in folders don't answer the question the client will ask: "Am I better than three months ago?" To answer it you need structured data (appearances per prompt, per AI, per week), and screenshots aren't that.
2. It doesn't catch what you're not looking for. By hand you check your 20 prompts and that's it. You don't see that the AI has started recommending the client with an outdated fact in a question that wasn't on your list, or that a new competitor has entered the answers for half of your prompts.
3. It depends on who's asking and how. An AI's answers vary between sessions and phrasings. Serious measurement needs to run the same prompts every time, under the same conditions, and record the result systematically. A person with a browser isn't a measurement protocol; it's a person with a browser.
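To make "structured data" concrete, here is one possible shape for it: one record per prompt, per AI, per week, from which an appearance rate can be computed. The `Observation` type, its field names and the sample rows are all hypothetical, invented only to illustrate the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    week: str       # week label, e.g. "2025-W40" (illustrative format)
    ai: str
    prompt: str
    appeared: bool  # did the client show up in the answer?


def appearance_rate(observations):
    """Share of prompts where the client appeared, per (week, ai)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for obs in observations:
        key = (obs.week, obs.ai)
        totals[key] += 1
        hits[key] += int(obs.appeared)
    return {key: hits[key] / totals[key] for key in totals}


# Made-up sample rows, not real measurements.
sample = [
    Observation("2025-W40", "ChatGPT", "prompt A", True),
    Observation("2025-W40", "ChatGPT", "prompt B", False),
    Observation("2025-W41", "ChatGPT", "prompt A", True),
    Observation("2025-W41", "ChatGPT", "prompt B", True),
    Observation("2025-W41", "Perplexity", "prompt A", False),
    Observation("2025-W41", "Perplexity", "prompt B", False),
]

rates = appearance_rate(sample)
for (week, ai), rate in sorted(rates.items()):
    print(week, ai, f"{rate:.0%}")
```

With records like these, "am I better than three months ago?" becomes a comparison between two numbers rather than between two folders.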
The result of those three holes together: the reporting you deliver is anecdotal, and the client, sooner or later, notices. And if the conversation arrives while their organic traffic is falling (AI Overviews have cut organic CTR by an average of 61% in Spain, according to ismajimenez.com), turning up with loose screenshots is turning up unarmed. For that specific conversation, see how to explain the organic traffic drop without panic setting in.
What the tool has to do (whether you build it or buy it)
What's needed is the same thing you already have in SEO with your rank tracker, but for AIs:
- Run each client's prompts against the 4 AIs automatically and on a recurring basis, every week, always the same way.
- Record, for each answer, whether the client appears, what's said about them, which sources are cited and which competitors show up alongside.
- Comparable history: evolution per prompt, per AI and per month, to answer "am I better?" with a chart rather than a hunch.
- True multi-client: each client in their own separate space, with their own prompts and report, without mixing data.
- Presentable output: a report you can hand over without spending two hours laying out screenshots.
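The first two requirements can be sketched as a loop. This is only an outline under stated assumptions: `query_ai` is a hypothetical function you would write yourself around each provider, and the fake version below returns canned text so the sketch runs without any network access. Mention detection is a naive case-insensitive substring match, and cited sources are left out because how they are returned depends on each provider.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, List


@dataclass
class AnswerRecord:
    run_date: date
    client: str
    ai: str
    prompt: str
    answer: str
    client_mentioned: bool
    competitors_mentioned: List[str] = field(default_factory=list)


def run_client(client: str, prompts: List[str], competitors: List[str],
               ais: List[str], query_ai: Callable[[str, str], str],
               run_date: date) -> List[AnswerRecord]:
    """Run every prompt against every AI and record what came back."""
    records = []
    for ai in ais:
        for prompt in prompts:
            answer = query_ai(ai, prompt)  # hypothetical provider call
            lowered = answer.lower()
            records.append(AnswerRecord(
                run_date=run_date, client=client, ai=ai,
                prompt=prompt, answer=answer,
                client_mentioned=client.lower() in lowered,
                competitors_mentioned=[c for c in competitors
                                       if c.lower() in lowered],
            ))
    return records


def fake_query_ai(ai: str, prompt: str) -> str:
    """Stand-in for a real provider call; returns canned, made-up text."""
    canned = {
        "ChatGPT": "Clinica Sol and Dental Norte are popular options.",
        "Perplexity": "Many people recommend Dental Norte.",
    }
    return canned.get(ai, "No specific recommendation.")


records = run_client("Clinica Sol", ["best dentist nearby"],
                     ["Dental Norte"], ["ChatGPT", "Perplexity"],
                     fake_query_ai, date(2025, 1, 6))
for r in records:
    print(r.ai, r.client_mentioned, r.competitors_mentioned)
```

Passing `query_ai` in as a parameter keeps the recording logic identical across providers, which is what makes one week's run comparable with the next.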
You can build it yourself with custom development against the four providers; it's a technical project with permanent maintenance, because the models and formats change every few months. Most agencies don't want that project: they want the data on Monday morning.
The arithmetic with a tool
This is where Surfeo for agencies, which we built for exactly this multiplication, comes in. The Agency account costs €20/month as a base and covers up to 10 clients. Each client you connect costs €35/month (Starter tier: 40 prompts, 3 AIs, weekly tracking and 6 articles a month) or €79/month (Growth tier: 75 prompts and all 4 AIs, Claude included). Each client sits in its own space, with the AIs queried every week and a PDF report ready to hand over.
Go back to the agency with 6 clients: by hand, ~16 hours a week (≈€1,600/month in internal cost). With a tool, 20 + 6 × 35 = €230/month on Starter (3 AIs), or 20 + 6 × 79 = €494/month on Growth (all 4 AIs), and the hours come down to interpreting the data and deciding actions, which is the work the client pays you for, not copying answers into a spreadsheet. The difference isn't 20%: the manual cost is roughly seven times the Starter bill and more than three times the Growth bill, and the gap in euros grows with every new client.
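The same comparison can be run for any portfolio size. This is a small sketch using only the prices quoted above; the function and constant names are illustrative, and the manual figure is the ≈€1,600/month internal cost from the 6-client example.

```python
AGENCY_BASE_EUR = 20
TIER_PRICE_EUR = {"starter": 35, "growth": 79}  # per client, per month
MAX_CLIENTS = 10


def tool_cost(clients, tier="starter"):
    """Monthly tool cost in EUR for a number of clients on one tier."""
    if not 1 <= clients <= MAX_CLIENTS:
        raise ValueError(f"the Agency account covers 1 to {MAX_CLIENTS} clients")
    return AGENCY_BASE_EUR + clients * TIER_PRICE_EUR[tier]


manual_monthly_eur = 1600  # 6 clients, by hand, from the text above
for tier in ("starter", "growth"):
    cost = tool_cost(6, tier)
    print(f"{tier}: €{cost}/month, manual cost is "
          f"{manual_monthly_eur / cost:.1f}x higher")
```

Note that Starter covers 3 AIs, so the like-for-like comparison with the 4-AI manual routine is the Growth line.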
And there's a detail that matters for your margin: what you charge for the service doesn't change just because the data collection is automatic. If you charge €400-600/month per client (within the reasonable range; the full numbers here), the tool cost per client (€35-79) leaves the rest for your real work: analysis, fixes, content. What to deliver with that data each week is covered in what to deliver in the first month of an AI visibility service.
Frequently asked questions
Do you really need to measure every week? Isn't once a month enough?
For a "how are we doing" report, monthly is fine. For a service that fixes things, no: AI answers change often and you need to know soon whether a fix has taken effect or a new false fact has appeared. Weekly measurement is the difference between operating the service and performing its autopsy.
Why 4 AIs and not just ChatGPT, which is the one everyone uses?
Because the answers don't agree: in our study of 9,865 Spanish SMEs, 91% appeared in only 1 of the 4 AIs (full study). Your client may be perfect in ChatGPT and invisible in Perplexity and Gemini — and their audience uses all four. Measuring just one is reporting a quarter of reality.
Are screenshots now useless for everything?
They work as a one-off piece: to open a sales meeting or illustrate a specific case in the report, a dated screenshot is very telling. What they don't work as is a continuous monitoring system: there you need structured, comparable data.
What happens if I have more than 10 clients?
The Agency plan covers up to 10 client spaces. If your AI visibility portfolio exceeds that, write to us from the pricing page and we'll look at it with you — it's the kind of problem we like to have.
Do your own arithmetic: clients × prompts × 4 AIs × weeks. If the result is more hours than you want to give away, start by seeing the tool in action with a real client in the free AI visibility test.