What KPIs to put in a GEO proposal: the ones you can commit to and the ones you only report
You're writing a GEO proposal (Generative Engine Optimization: the work of measuring and improving how a brand appears in the answers of ChatGPT, Gemini, Perplexity and Claude) and you reach the KPI section. That's where the temptation appears: "the client wants results, so let's say that in 3 months they'll appear in 50% of the prompts".
Don't sign it. Not because it's ambitious, but because it doesn't depend on you. And a KPI that doesn't depend on you, in a signed proposal, isn't a target: it's a future claim against your agency.
This article divides GEO KPIs into two columns (the ones you can commit to by contract and the ones you only report) and, more importantly, explains the criterion for telling them apart, because that criterion will still serve you when new metrics appear.
The criterion: who controls the outcome?
The dividing line is a single question: does the value of this metric depend on work you carry out, or on the decision of a system you don't control?
AI answers are the second case, in its purest form. They change from week to week and between phrasings of the same prompt, the models update without warning, and no one, not even the most expensive agency in the world, can force ChatGPT to mention a client. You can systematically improve the conditions for it to happen (the sources, the content, third-party mentions: that's the service), but the final decision is made by a model that isn't yours.
In SEO this distinction already existed (no one serious guarantees position 1 in Google), but the industry spent years fudging it. GEO gives you the chance to start the service with the division drawn properly from the very first proposal. And there's a direct commercial benefit: when you explain why you don't commit to mentions, you're explaining how the AIs work, and that positions you as the one who understands the terrain, against the competitor selling smoke and mirrors.
Column 1: the KPIs you can commit to
These are metrics of executed work and measurement coverage. They depend 100% on your agency, so they can go in the contract with a number and a date.
1. Coverage of monitored prompts. "We'll measure N prompts (the commercial questions in your sector) across the 4 AIs, every week, with comparable history." It's the founding KPI: it defines the perimeter of everything else. Commit to the number of prompts, the AIs and the frequency — and deliver it from week 1, because the dated baseline is what will let you demonstrate progression later.
2. Corrections carried out on the sources. "We'll audit the sources the AIs draw on (site, listings, directories, reviews, mentions) and carry out the prioritised corrections: false data, inconsistencies, omissions." You commit to the full diagnosis in the first month and a pace of corrections after that. It's verifiable work: each correction has a before, an after and a date.
3. Citable content published. "We'll publish N pieces a month geared to the sector's questions: content structured so the AIs can extract and cite it." Commit to quantity, defined quality and a calendar. What you don't commit to is the AIs citing it — that lives in the other column.
To these three you can add a fourth, a process KPI: the monthly report, delivered on a fixed date. It seems trivial, yet it's one of the biggest renewal-savers. What that report should contain in the first month is covered in the piece on what to deliver in the first month of an AI visibility service.
Column 2: the KPIs you report, but don't commit to
These are the outcome metrics: the ones the client really wants to see and the ones you really chase. They're measured rigorously, reported every month, their improvement celebrated — but they don't carry a guaranteed number in the proposal, because their value is decided by the AIs.
1. Mentions: in how many prompts the client appears, by AI. The central result of the service. It's reported with its progression against the baseline. It isn't committed to because a model update can move the figure in any direction in any given week, without anyone having done anything wrong.
2. Share of voice: the client's share against competitors in the same prompts. A brilliant metric for the report, because it provides context (perhaps the client doesn't rise, but the competitor falls — or the other way round). Doubly uncontrollable: it depends on what the AIs decide about the client and about third parties.
3. Sentiment and accuracy: what the AIs say when they mention the client. Do they describe it well, with correct data, in positive terms? Here there's an honest nuance: detected false data does create a commitment, but the commitment is to correct the source (column 1), not to the date on which the AI will reflect the correction.
The drafting rule for this column: verbs of measurement, never of promise. "We'll measure and report the progression of mentions" — not "we'll increase mentions by 40%".
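To make the first two reported metrics concrete, here is a minimal sketch of how mentions by AI and share of voice could be computed from one week of monitored answers. The `Answer` record, the function names, the brand names and the sample data are all hypothetical, and the share-of-voice formula (client mentions divided by mentions of all tracked brands) is one common definition assumed here, not one the article prescribes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    """One AI answer to one monitored prompt (hypothetical record shape)."""
    week: str          # ISO week label, e.g. "2025-W01"
    engine: str        # e.g. "ChatGPT"
    prompt: str
    brands: frozenset  # brand names detected in the answer text


def mention_rate(answers, brand, week):
    """Fraction of that week's answers mentioning `brand`, per engine."""
    asked = defaultdict(int)
    hits = defaultdict(int)
    for a in answers:
        if a.week != week:
            continue
        asked[a.engine] += 1
        if brand in a.brands:
            hits[a.engine] += 1
    return {engine: hits[engine] / asked[engine] for engine in asked}


def share_of_voice(answers, brand, competitors, week):
    """Brand mentions divided by mentions of all tracked brands that week."""
    tracked = {brand, *competitors}
    counts = defaultdict(int)
    for a in answers:
        if a.week == week:
            for name in a.brands & tracked:
                counts[name] += 1
    total = sum(counts.values())
    return counts[brand] / total if total else 0.0


# Invented sample data: two prompts, two engines, one week.
answers = [
    Answer("2025-W01", "ChatGPT", "best CRM for SMEs", frozenset({"Acme", "Rival"})),
    Answer("2025-W01", "ChatGPT", "CRM pricing", frozenset({"Rival"})),
    Answer("2025-W01", "Gemini", "best CRM for SMEs", frozenset({"Acme"})),
    Answer("2025-W01", "Gemini", "CRM pricing", frozenset()),
]

rates = mention_rate(answers, "Acme", "2025-W01")
sov = share_of_voice(answers, "Acme", ["Rival"], "2025-W01")
print(rates)  # {'ChatGPT': 0.5, 'Gemini': 0.5}
print(sov)    # 0.5
```

Both functions only describe what the AIs answered in a given week. Nothing in them is a lever the agency controls, which is why these figures belong in the report and not in the contract.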
How it looks in the proposal (text you can adapt)
Key data
Service commitments: weekly monitoring of 40 key questions in your sector across ChatGPT, Gemini and Perplexity; full source audit and correction of detected errors (month 1, and ongoing after that); 6 optimised pieces of content a month; monthly report with full progression.
Results we measure (and why we don't guarantee them): mentions of your brand by question and AI, share of visibility against your competitors and what the AIs say about you. AI answers are decided by each model and change frequently: anyone who guarantees you appearances is selling you something they don't control. Our commitment is to execute, measure and show you the real progression every month.
That second paragraph, far from weakening the proposal, is usually the one that sets it apart. And it connects with the context the client already notices: organic CTR has fallen by an average of 61% in Spain because of AI Overviews (ismajimenez.com) and zero-click searches have gone from 56% to 69% in a year (data compiled by stucom.com). The client knows the ground is shifting; what they're looking for is someone to measure it honestly. If the price conversation comes later, the ranges for Spain are here.
An operational note: all of the above assumes you can measure mentions and share of voice systematically, not with scattered screenshots. That machinery is what Surfeo for agencies automates — each client in their own workspace, their prompts run every week against the AIs, with history and a PDF report from which the KPIs of both columns come directly for your reporting.
Frequently asked questions
Won't being honest put me at a disadvantage if my competitor promises guaranteed appearances?
In the short term it may cost you the odd deal against the competitor selling smoke and mirrors. In the medium term, whoever promised 50% of mentions in 3 months has a burned client and a dent in their reputation; you have renewals. And in the sale itself, explaining why no one can guarantee AI answers is the cheapest demonstration that you know what you're talking about.
And if the client insists on a results target in the contract?
Offer them a directional target that's reviewable, not a guarantee: "the work target for the quarter is to improve mention coverage against the baseline; we review it together each month and adjust the plan". Commit to review and reaction, not a figure. If even so they demand a guaranteed figure, weigh up whether you want that client: they'll be the same one demanding a refund when a model updates.
Shouldn't traffic and conversions from AI be the main KPI?
They're the consequence the client enjoys, and when that traffic can be attributed, report it, with justified enthusiasm: traffic from AI answers converts at 14.2% versus 2.8% for classic organic search (sector data compiled by roymo.es). But as a committed KPI it has the same problem as mentions (you don't control the source) plus an extra one: attribution from AIs is still partial. Measure it, report it, don't sign it.
How often do I review the battery of prompts I monitor?
Quarterly, with the client, and document the changes: if you change prompts every month, you destroy the comparability of your own baseline. Add new prompts as a new series in the history, not by replacing the old ones — it's the only way for "we're better than in January" to keep meaning something.
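The "new series, not replacement" rule can be sketched as follows, assuming each prompt is tagged with the quarterly review that introduced it. The `MonitoredPrompt` shape, the `coverage` function, the series labels and the sample prompts are hypothetical illustrations, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoredPrompt:
    text: str
    series: str  # review that introduced the prompt, e.g. "2025-Q1"


def coverage(battery, mentioned, series):
    """Fraction of the prompts in one series where the client was mentioned."""
    cohort = [p.text for p in battery if p.series == series]
    if not cohort:
        raise ValueError(f"no prompts in series {series!r}")
    return sum(text in mentioned for text in cohort) / len(cohort)


# Invented battery: four baseline prompts plus two added at the next review.
battery = [
    MonitoredPrompt("best CRM for SMEs", "2025-Q1"),
    MonitoredPrompt("CRM pricing", "2025-Q1"),
    MonitoredPrompt("CRM with invoicing", "2025-Q1"),
    MonitoredPrompt("cheap CRM alternatives", "2025-Q1"),
    MonitoredPrompt("CRM for real estate", "2025-Q2"),
    MonitoredPrompt("CRM with WhatsApp", "2025-Q2"),
]

# Prompts where the client appeared this week (invented).
mentioned = {"CRM with invoicing", "CRM for real estate", "CRM with WhatsApp"}

baseline = coverage(battery, mentioned, "2025-Q1")
added = coverage(battery, mentioned, "2025-Q2")
mixed = len(mentioned) / len(battery)
print(baseline, added, mixed)  # 0.25 1.0 0.5
```

In this invented week the blended figure (0.5) looks like a doubling of the baseline series (0.25) when the baseline prompts have not moved at all. Reporting each series on its own keeps the comparison with January honest.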
Before writing your next proposal, measure the client's baseline: run their site through the free AI visibility test and you'll have, in minutes, the initial data to anchor both columns of KPIs.