01 Why AI visibility resists casual measurement
There is no rank tracker for a conversation. AI answers vary by prompt phrasing, by user context, by model version, and sometimes by nothing you can identify at all. A screenshot of ChatGPT recommending your brand is an anecdote, not a measurement, and the same is true of the screenshot where it recommends your competitor. Any method that reads a single answer as a result will produce whiplash, not insight. The variance is not a flaw you can wait out, either. It is how probabilistic systems behave, and a vendor promising a fixed ranking inside one has already told you how carelessly they measure.
Measuring AI visibility borrows the fix every noisy field uses: sample repeatedly, hold the instrument constant, and read trends instead of points. What is being sampled is specific, too: whether the backlinks and press coverage a brand has earned (the signals AI systems absorb as trust) have actually started registering in answers. That is less exciting than a live dashboard, and considerably more honest.
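To see why a single answer is an anecdote, it helps to put an uncertainty range around a mention rate measured over a small number of runs. The sketch below uses the Wilson score interval, a standard choice for proportions that we are swapping in purely for illustration; nothing above prescribes a specific interval. The function name `wilson_interval` and the 95% level (z = 1.96) are assumptions.

```python
import math


def wilson_interval(hits: int, runs: int, z: float = 1.96) -> tuple:
    """Wilson score interval for the proportion hits / runs.

    z = 1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= hits <= runs:
        raise ValueError("hits must be between 0 and runs")
    p = hits / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical example: the brand appeared in 7 of 10 runs of one prompt.
low, high = wilson_interval(7, 10)
print(f"mention rate 0.70, plausible range {low:.2f} to {high:.2f}")
```

Seven mentions in ten runs is compatible with a true rate anywhere from roughly 40% to 89%, which is why a point reading says little until the sample grows or the trend repeats.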
02 Prompt panels: the core instrument
A prompt panel is a fixed set of questions your buyers might realistically ask an assistant, written once and then held constant. Ours cover the spread of buying intent: category recommendations, comparisons against named competitors, problem-first questions, and brand fact checks. The panel runs against ChatGPT, Claude, Perplexity, Gemini, Google AI Overviews, and Microsoft Copilot on a schedule, and every response is recorded the same way.
The discipline is in not editing the panel to flatter the results. Prompts get added when the category shifts, and changes are logged, but the core set stays fixed so that month twelve is comparable with month one.
Panel design is where the judgment lives. Prompts need to sound like buyers, not like marketers auditioning: the question a procurement lead types at four in the afternoon, the comparison a founder runs before a demo, the fact check a journalist makes on deadline. A panel of flattering softballs measures nothing.
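One way to hold a panel constant in practice is to make the rule structural: core prompts cannot be edited or removed, and every addition is logged with a date and a reason. The following is a minimal sketch of that idea, not a description of any particular tool. The class names, field names, and intent labels are hypothetical, and the `retire` method for non-core prompts is our assumption; the text above only describes adding prompts and logging changes.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)  # frozen: a prompt's wording cannot be edited after creation
class Prompt:
    prompt_id: str
    text: str
    intent: str        # e.g. "category", "comparison", "problem", "fact_check"
    core: bool = True  # core prompts stay fixed for month-to-month comparability


@dataclass
class PromptPanel:
    prompts: dict = field(default_factory=dict)    # prompt_id -> Prompt
    changelog: list = field(default_factory=list)  # (date, action, prompt_id, reason)

    def add(self, prompt: Prompt, when: date, reason: str) -> None:
        """Add a new prompt and log why; existing IDs are never overwritten."""
        if prompt.prompt_id in self.prompts:
            raise ValueError(f"{prompt.prompt_id} exists; prompts are not edited in place")
        self.prompts[prompt.prompt_id] = prompt
        self.changelog.append((when, "added", prompt.prompt_id, reason))

    def retire(self, prompt_id: str, when: date, reason: str) -> None:
        """Remove a non-core prompt and log why; core prompts are refused."""
        if self.prompts[prompt_id].core:
            raise ValueError("core prompts stay fixed")
        del self.prompts[prompt_id]
        self.changelog.append((when, "retired", prompt_id, reason))
```

Because the changelog travels with the panel, anyone reading month twelve can see exactly which prompts were present in month one and which arrived later.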
03 Share of voice, defined properly
For each run, record three things per brand: was it mentioned at all, was it actively recommended, and was one of its pages cited as a source. Share of voice is then the fraction of runs in which your brand appears, tracked separately for mentions and recommendations, and always alongside the same numbers for competitors. A rising mention rate with a flat recommendation rate is a real diagnosis: the engines know you exist but do not yet trust you enough to endorse you, which usually means the coverage footprint is still too thin to verify. Position matters as well: being named first carries different weight than being named fifth in a list, so we record order alongside presence.
Citation counts get tracked as their own line because they respond to different work, the asset building covered in AI citations.
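The definition above translates into a small calculation. This sketch assumes one record per brand per run, where a run is one prompt executed once on one engine; a brand with no record in a run counts as not mentioned. All names here (`Observation`, `share_of_voice`, the field names) are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    run_id: str                     # one prompt, one engine, one execution
    brand: str
    mentioned: bool
    recommended: bool
    cited: bool
    position: Optional[int] = None  # 1 = first named; None if not mentioned


def share_of_voice(observations):
    """Per-brand mention, recommendation, and citation rates over all runs."""
    runs = {o.run_id for o in observations}
    if not runs:
        return {}
    by_brand = defaultdict(list)
    for o in observations:
        by_brand[o.brand].append(o)

    table = {}
    for brand, obs in by_brand.items():
        positions = [o.position for o in obs if o.mentioned and o.position is not None]
        table[brand] = {
            # Denominator is every run, so absence from a run lowers the rate.
            "mention_rate": sum(o.mentioned for o in obs) / len(runs),
            "recommendation_rate": sum(o.recommended for o in obs) / len(runs),
            "citation_rate": sum(o.cited for o in obs) / len(runs),
            # Average list position, counted only where the brand was named.
            "mean_position": sum(positions) / len(positions) if positions else None,
        }
    return table
```

Keeping the three rates as separate columns is the point of the design: a gap between `mention_rate` and `recommendation_rate` is exactly the "known but not yet endorsed" diagnosis, and it disappears if the numbers are blended into one score.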
04 Referral traffic from AI surfaces
Assistants send measurable visitors. Traffic from chatgpt.com and perplexity.ai shows up in analytics referrer data, and clicks from AI Overviews arrive blended into Google organic traffic. The volumes are usually modest, the intent is usually excellent, and the data is genuinely useful as corroboration: if panel share of voice climbs and AI referrals climb with it, the two measurements are validating each other.
Treat referrals as the supporting witness, not the primary metric. Plenty of AI influence never produces a click at all, because the buyer takes the recommendation and searches for your brand directly, showing up in analytics as branded search instead. As of mid-2026, the referrer data is still crude, and some assistants send visitors with no referrer at all, so an absence of clicks never proves an absence of influence.
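Bucketing referrers is simple enough to sketch. The example below assumes the referrer arrives as a full URL string, as most analytics exports provide it. Only chatgpt.com and perplexity.ai come from the text above; the bucket labels are hypothetical, the mapping is meant to be extended as needed, and matching only google.com (not country-specific Google domains) is a simplifying assumption.

```python
from urllib.parse import urlparse

# Domains named in the text; extend this mapping for other assistants as needed.
AI_REFERRER_DOMAINS = {
    "chatgpt.com": "chatgpt",
    "perplexity.ai": "perplexity",
}


def classify_referrer(referrer):
    """Bucket a referrer URL; an empty referrer is its own bucket, not 'no AI'."""
    if not referrer:
        return "no_referrer"
    host = (urlparse(referrer).hostname or "").lower()
    for domain, label in AI_REFERRER_DOMAINS.items():
        # Match the domain itself or any subdomain (e.g. www.perplexity.ai).
        if host == domain or host.endswith("." + domain):
            return label
    if host == "google.com" or host.endswith(".google.com"):
        # AI Overviews clicks are blended in here and cannot be separated out.
        return "google_organic_blended"
    return "other"


print(classify_referrer("https://chatgpt.com/"))  # chatgpt
print(classify_referrer(""))                      # no_referrer
```

The `no_referrer` bucket is kept separate on purpose: it may contain assistant traffic that arrived without attribution, so it should be read as unknown, never as evidence of zero influence.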
05 Reading noisy data without flinching
Single-month movements are usually noise. A brand can drop out of a specific answer for a week and return without anything changing on either side. The signals worth acting on are sustained: three consecutive runs trending the same direction, a competitor newly appearing across many prompts at once, or a whole engine shifting after a model update. Model updates deserve special respect, because a new version can reset behavior across the board overnight. Good measurement logs those events and annotates the timeline, so nobody mistakes a lab release for a strategy failure, in either direction.
The same discipline applies to good news. One flattering answer is not a win to report. It is a data point inside a trend that might become one, and treating it that way is what keeps measuring AI visibility honest.
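The "three consecutive runs" rule can be written down as a check, which removes the temptation to react to a single month. This is a sketch under one interpretation: we read the rule as three consecutive run-to-run moves in the same direction, which needs four data points. The function name and the `window` parameter are assumptions, and a stricter or looser reading is equally defensible.

```python
def sustained_trend(rates, window=3):
    """Return "up" or "down" if the last `window` run-to-run changes all share
    one direction; return None when the movement is mixed, flat, or too short.
    """
    if len(rates) < window + 1:
        return None  # not enough runs to call anything a trend
    recent = rates[-(window + 1):]
    deltas = [later - earlier for earlier, later in zip(recent, recent[1:])]
    if all(d > 0 for d in deltas):
        return "up"
    if all(d < 0 for d in deltas):
        return "down"
    return None


# Hypothetical monthly mention rates for one brand.
print(sustained_trend([0.20, 0.25, 0.30, 0.40]))  # up
print(sustained_trend([0.40, 0.30, 0.35, 0.30]))  # None
```

A check like this does not know about model updates, so a flagged trend still needs to be read against the annotated timeline before anyone treats it as a result of their own work.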
06 How we run it monthly
Every client gets a fixed panel re-run monthly across all six surfaces, a share of voice table against named competitors, a citation ledger showing which pages got quoted where, and an annotated log of model updates and panel changes. The same report pairs the numbers with the work that drives them, the press placements and backlinks earned that month, because coverage is the input the engines absorb as trust and measurement exists to prove it is landing. The mechanism itself is explained in backlinks for AI trust, and per-engine depth, such as share of voice inside ChatGPT specifically, is covered under ChatGPT visibility.