Every AI Alignment project should start the same way: by finding out where you actually stand. Not where you think you stand — where you actually stand, in the answers your buyers are reading right now.
This is the audit. It takes a focused afternoon to do by hand, and it gives you a baseline across all four layers of the framework — Discoverability, Clarity, Authority, Trust. Do it once before you change anything, and you’ll have a “before” you can measure against. Skip it, and you’re optimising blind.
Here’s the process, end to end, plus a template you can copy.
Step 1 — Pick the models that matter to you
You don’t need to track every AI. Start with the surfaces your buyers actually use. For most B2B and consumer brands in 2026 that’s:
- ChatGPT — by far the largest, and the default for most buyers.
- Google AI Overviews — because it sits on top of the searches people still do on Google.
- Perplexity — smaller, but research- and citation-heavy, and over-indexed among power users.
If you’re enterprise or technical, add Gemini, Claude, and Grok. But don’t let “track everything” stop you from starting. Three models, done consistently, beats eight models done once.
Step 2 — Build your prompt set (the new keyword research)
In AI search, the question is the new keyword. Your prompt set is the list of questions a real buyer would ask on the way to choosing something in your category. This is the most important — and most skipped — step. A weak prompt set gives you a meaningless audit.
Build 15–30 prompts across the buyer journey:
- Category / discovery: “What’s the best [category] for [segment]?” · “Top tools for [job to be done]?”
- Comparison: “[You] vs [Competitor]” · “Alternatives to [Competitor]”
- Use-case / fit: “Best [category] for [specific situation, e.g. a 5-person agency]”
- Brand-specific: “What does [your brand] do?” · “Is [your brand] any good?” · “How much does [your brand] cost?”
- Objection / risk: “Is [your brand] safe / legit?” · “[Your brand] reviews” · “Problems with [your brand]”
Organise them with tags — persona, funnel stage, and which layer they test. (Brand-specific prompts mostly test Clarity and Trust; category and comparison prompts test Discoverability and Authority.)
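One way to keep the prompt set organised is to generate it from templates and carry the tags along. The sketch below is an illustration only: `Prompt`, `TEMPLATES` and `build_prompt_set` are hypothetical names, the brand and competitor values are placeholders, and the layer tags on objection prompts are an assumption (the mapping above only covers brand-specific, category and comparison prompts). Persona tags are left out for brevity.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Prompt:
    text: str
    stage: str      # buyer-journey stage, e.g. "category"
    layers: tuple   # framework layers this prompt mostly tests


# Hypothetical templates, adapted from the buyer-journey list above.
# Each stage maps to (templates, layers tested).
TEMPLATES = {
    "category": (["What's the best {category} for {segment}?"],
                 ("Discoverability", "Authority")),
    "comparison": (["{brand} vs {competitor}", "Alternatives to {competitor}"],
                   ("Discoverability", "Authority")),
    "brand": (["What does {brand} do?", "How much does {brand} cost?"],
              ("Clarity", "Trust")),
    # Assumption: objection prompts are tagged like brand-specific ones.
    "objection": (["Is {brand} legit?", "{brand} reviews"],
                  ("Clarity", "Trust")),
}


def build_prompt_set(brand, category, segments, competitors):
    """Expand every template over segments x competitors, dropping duplicates.

    Both lists need at least one entry, otherwise nothing is generated.
    """
    prompts = []
    for stage, (templates, layers) in TEMPLATES.items():
        for template in templates:
            for segment, competitor in product(segments, competitors):
                text = template.format(brand=brand, category=category,
                                       segment=segment, competitor=competitor)
                prompts.append(Prompt(text, stage, layers))
    # dict.fromkeys keeps the first occurrence and preserves order.
    return list(dict.fromkeys(prompts))


if __name__ == "__main__":
    for p in build_prompt_set("Acme", "CRM", ["agencies", "startups"], ["Rival"]):
        print(p.stage, "|", p.text, "|", ", ".join(p.layers))
```

Generated prompts are a starting point; hand-edit them until they read like questions a real buyer would type.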
Step 3 — Run them and capture the answers
Ask each prompt in each model. For every answer, record five things:
- Mentioned? Were you named at all? (Discoverability)
- Accurate? Was what it said about you correct? (Clarity)
- Favourable? Was the tone positive, neutral, or negative? (Trust)
- Position / share. Were you first, buried, or one of many? Who else was named? (Authority)
- Sources. Which URLs/domains did it cite? (Discoverability + Authority)
A note on method: AI answers vary between runs and over time, and they personalise. Use a clean session (logged out / no memory), and run each prompt a couple of times so you’re recording the typical answer, not a fluke. Date everything — answers drift, and the drift is the story.
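If you log answers in code rather than a sheet, the five fields above map naturally onto a small record. This is a minimal sketch under stated assumptions: the `Observation` class and `typical_mentioned` helper are hypothetical, and the majority vote is just one reasonable way to define the "typical" answer across repeated runs.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Observation:
    prompt: str
    model: str
    run_date: date              # date everything: answers drift
    mentioned: bool             # Discoverability
    accurate: Optional[bool]    # Clarity; None when the brand was not mentioned
    sentiment: str              # Trust: "+", "0" or "-"
    position: Optional[int]     # Authority: 1 = named first; None if absent
    competitors: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)  # cited URLs/domains


def typical_mentioned(runs):
    """Majority vote over repeated runs of one prompt in one model.

    A tie counts as "not mentioned", which is the conservative reading.
    """
    votes = Counter(run.mentioned for run in runs)
    return votes[True] > votes[False]


if __name__ == "__main__":
    runs = [
        Observation("Best CRM for agencies?", "ChatGPT", date.today(),
                    True, True, "+", 2),
        Observation("Best CRM for agencies?", "ChatGPT", date.today(),
                    False, None, "0", None),
        Observation("Best CRM for agencies?", "ChatGPT", date.today(),
                    True, True, "0", 3),
    ]
    print(typical_mentioned(runs))
```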
Step 4 — Score across the four layers
Roll your raw notes up into a score per layer. Keep it simple — a 0–5 scale is plenty. The point isn’t false precision; it’s spotting which layer is your weakest link.
| Layer | What you’re scoring | 0–5 guide |
|---|---|---|
| Discoverability | % of category/comparison prompts where you appear at all | 0 = never appears · 5 = appears in nearly every relevant answer |
| Clarity | Accuracy of what models say about you | 0 = frequently wrong / confused with others · 5 = consistently accurate |
| Authority | Position & citation share vs competitors | 0 = competitors dominate, you’re absent · 5 = you’re a default named option |
| Trust | Sentiment & whether you’re recommended | 0 = negative / warned-about · 5 = actively recommended |
Your lowest score is your starting point. The framework is sequential for a reason — fix Discoverability before Authority, Clarity before Trust.
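The roll-up can be done by eye, but a rough sketch of the arithmetic may help. Everything here is an assumption for illustration: the function names are hypothetical, each layer is reduced to a single share (for example, Authority as the share of category and comparison answers that name you first), and a share is mapped linearly onto 0–5. Adjust the definitions to match how you actually judge each layer.

```python
FRAMEWORK_ORDER = ["Discoverability", "Clarity", "Authority", "Trust"]


def share_to_score(hits, total):
    """Map hits/total onto the 0-5 scale, rounding half up; 0 if no data."""
    if total == 0:
        return 0
    # Integer form of floor(5 * hits / total + 0.5).
    return (10 * hits + total) // (2 * total)


def score_layers(rows):
    """rows: dicts with keys stage, mentioned, accurate, sentiment, position."""
    discovery = [r for r in rows if r["stage"] in ("category", "comparison")]
    named = [r for r in rows if r["mentioned"]]
    return {
        # Share of category/comparison answers where you appear at all.
        "Discoverability": share_to_score(
            sum(r["mentioned"] for r in discovery), len(discovery)),
        # Share of answers naming you that describe you correctly.
        "Clarity": share_to_score(
            sum(r["accurate"] is True for r in named), len(named)),
        # Assumption: "named first" stands in for position and citation share.
        "Authority": share_to_score(
            sum(r["position"] == 1 for r in discovery), len(discovery)),
        # Assumption: positive sentiment stands in for "recommended".
        "Trust": share_to_score(
            sum(r["sentiment"] == "+" for r in named), len(named)),
    }


def weakest_layer(scores):
    """Lowest-scoring layer; ties go to the earlier layer in framework order."""
    return min(FRAMEWORK_ORDER, key=lambda layer: scores[layer])


if __name__ == "__main__":
    sample = [
        {"stage": "category", "mentioned": True, "accurate": True,
         "sentiment": "+", "position": 1},
        {"stage": "comparison", "mentioned": False, "accurate": None,
         "sentiment": "0", "position": None},
    ]
    scores = score_layers(sample)
    print(scores, "->", weakest_layer(scores))
```

Breaking ties toward the earlier layer is a design choice that follows the sequential logic of the framework; it is not the only defensible rule.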
Step 5 — The audit template
Copy the table below into Excel or Google Sheets. One row per prompt × model; the scorecard at the bottom becomes your baseline.
| Prompt | Model | Mentioned? (Y/N) | Accurate? (Y/N/—) | Sentiment (+/0/-) | Your position | Competitors named | Sources cited | Notes |
|---|---|---|---|---|---|---|---|---|
| “Best [category] for [segment]” | ChatGPT | | | | | | | |
| “Best [category] for [segment]” | AI Overviews | | | | | | | |
| “[You] vs [Competitor]” | Perplexity | | | | | | | |
| “What does [brand] do?” | ChatGPT | | | | | | | |
| “[Brand] reviews” | ChatGPT | | | | | | | |
| … | … | | | | | | | |
Baseline scorecard (fill once the table is complete):
| Layer | Score (0–5) | Biggest gap observed |
|---|---|---|
| Discoverability | | |
| Clarity | | |
| Authority | | |
| Trust | | |
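If you would rather generate the blank sheet than build it by hand, the template can be written as CSV with Python's standard `csv` module. This is a sketch, not a prescribed format: `audit_template_csv` is a hypothetical helper, the column names follow the audit table above (with a plain `NA` where the table uses a dash), and the prompts and models passed in are placeholders.

```python
import csv
import io

COLUMNS = [
    "Prompt", "Model", "Mentioned? (Y/N)", "Accurate? (Y/N/NA)",
    "Sentiment (+/0/-)", "Your position", "Competitors named",
    "Sources cited", "Notes",
]


def audit_template_csv(prompts, models):
    """Return CSV text with a header and one blank row per prompt x model."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for prompt in prompts:
        for model in models:
            # Only the first two cells are filled; the rest are for your notes.
            writer.writerow([prompt, model] + [""] * (len(COLUMNS) - 2))
    return buffer.getvalue()


if __name__ == "__main__":
    text = audit_template_csv(
        ["Best [category] for [segment]", "[You] vs [Competitor]"],
        ["ChatGPT", "AI Overviews", "Perplexity"],
    )
    with open("audit_template.csv", "w", newline="", encoding="utf-8") as fh:
        fh.write(text)
```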
Step 6 — Read the results like a diagnostician
The scores tell you where; the patterns tell you why. A few common diagnoses:
- Absent from category prompts, fine on brand prompts. You have a Discoverability problem — models can describe you when asked directly, but don’t surface you unprompted. Work on source presence and authority.
- Mentioned but wrong. A Clarity problem. Models found you but have a bad entity record — wrong category, dated facts, confused with a competitor.
- Mentioned, accurate, but never recommended. A Trust problem. You’re a known option, not a preferred one — usually a sentiment or reviews gap.
- Same competitor everywhere, citing the same handful of domains. That’s your Authority target list. Those domains are where the answer is being decided.
Write the diagnosis down in plain language: “We’re invisible on comparison queries because every answer cites three review sites we’re barely on.” That sentence is worth more than any score.
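To find the "same handful of domains" pattern, it can help to tally the sources you recorded. A minimal sketch, assuming you captured cited sources as full URLs; `top_cited_domains` is a hypothetical helper and the URLs in the example are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse


def top_cited_domains(cited_urls, limit=5):
    """Count how often each domain is cited across all recorded answers."""
    domains = Counter()
    for url in cited_urls:
        # hostname is None for strings without a scheme; those are skipped.
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains[host] += 1
    return domains.most_common(limit)


if __name__ == "__main__":
    urls = [
        "https://www.reviews.example/best-crm",
        "https://reviews.example/acme-vs-rival",
        "https://blog.example/crm-guide",
    ]
    for domain, count in top_cited_domains(urls):
        print(domain, count)
```

The domains at the top of this list are the candidates for your Authority target list.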
Step 7 — Turn it into priorities
Don’t try to fix everything. Pick the lowest layer, pick the two or three highest-impact gaps inside it, and take those into your 90-day plan. Re-run the same prompt set monthly so you can see movement — the audit isn’t a one-time event, it’s your scoreboard. (When you’re ready to make that scoreboard a proper dashboard, we wrote a guide to GEO reporting, and you can see what a tracked baseline looks like in our Q1 2026 GEO benchmark.)
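Seeing movement between monthly re-runs is a simple subtraction once both scorecards exist. A small sketch, assuming each scorecard is stored as a mapping of layer to 0–5 score; `score_movement` is a hypothetical name and the numbers are made up for illustration.

```python
def score_movement(baseline, current):
    """Per-layer change since the baseline (positive means improvement)."""
    return {layer: current[layer] - baseline[layer] for layer in baseline}


if __name__ == "__main__":
    baseline = {"Discoverability": 1, "Clarity": 3, "Authority": 1, "Trust": 2}
    rerun = {"Discoverability": 2, "Clarity": 3, "Authority": 1, "Trust": 2}
    for layer, delta in score_movement(baseline, rerun).items():
        print(f"{layer}: {delta:+d}")
```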
Sources
- Search Engine Land: 8 GEO metrics to track in 2026
- Ahrefs: How to track AI Overviews: mentions, citations, and click loss
- Search Engine Land: How to reverse-engineer LLM brand visibility