01 Two systems answer every question
Ask ChatGPT to recommend a payroll provider and the answer comes from one of two places. The first is model knowledge: patterns absorbed during training, frozen at a cutoff date, recalled from memory. The second is live retrieval: the assistant runs a web search, reads the results, and builds an answer from them, usually with citations attached. ChatGPT search, Perplexity, Gemini, and Google AI Overviews all lean on retrieval for anything current. Claude adds web search when a question calls for fresh information, and Microsoft Copilot grounds its answers in Bing's index.
The distinction matters because each system responds to different inputs. Model knowledge shifts slowly and rewards years of consistent brand mentions across the web. Retrieval shifts weekly and rewards pages that are crawlable, quotable, and trusted right now. A serious program works on both, and it budgets patience accordingly.
02 Retrieval-augmented generation, minus the jargon
The industry term is retrieval-augmented generation, or RAG. In plain English: search first, write second. The assistant converts your question into one or more search queries, pulls back a shortlist of pages, extracts the relevant passages, and instructs the model to compose an answer grounded in those passages. The pages that survive the funnel become the citations you see under the answer.
Two practical consequences follow. First, if your page never enters the shortlist, nothing else about it matters, so crawlability and topical authority come before copywriting. Second, if your page enters the shortlist but buries its answer under preamble, the extraction step will quietly favor a competitor that states the same thing plainly.
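To make the funnel concrete, here is a minimal sketch of "search first, write second" in Python. It is a toy under loud assumptions: the `Page` type, the word-overlap scoring, and every function name are invented for illustration, and no real engine is claimed to work this way. It only shows the shape of the steps: shortlist, extract, then hand the passages to the model.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real query understanding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shortlist(query: str, pages: list[Page], k: int = 2) -> list[Page]:
    """Search step: keep the k pages that share the most words with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p.text)), p) for p in pages]
    scored = [item for item in scored if item[0] > 0]  # no overlap, no shortlist
    scored.sort(key=lambda item: item[0], reverse=True)
    return [page for _, page in scored[:k]]


def extract_passage(query: str, page: Page) -> str:
    """Extraction step: pick the one sentence that best matches the query."""
    q = tokenize(query)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page.text) if s.strip()]
    return max(sentences, key=lambda s: len(q & tokenize(s)))


def build_prompt(query: str, pages: list[Page]) -> str:
    """Write step: give the model the question plus numbered, cited passages."""
    lines = [f"Question: {query}", "Answer using only these sources:"]
    for i, page in enumerate(shortlist(query, pages), start=1):
        lines.append(f"[{i}] {extract_passage(query, page)} ({page.url})")
    return "\n".join(lines)
```

Feed it three pages, one of which never mentions the topic, and that page drops out at the shortlist step no matter how well it is written. A page that states its answer in a single sentence also hands the extraction step an easy match.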
Watch it happen once and you have seen how AI search works end to end. Ask Perplexity for the best expense management tool for a small nonprofit and you can see the queries it runs, the sources it selects, and the sentence each claim came from. Every brand named in that answer earned its slot upstream of the conversation, mostly through coverage on pages the engine already trusted.
03 The crawlers reading your site
Each AI company runs identifiable crawlers, and they show up in your server logs. As of mid-2026, the ones that matter most for visibility work are:
- GPTBot and OAI-SearchBot: OpenAI's training crawler and search crawler, respectively
- ClaudeBot: Anthropic's crawler
- PerplexityBot: feeds Perplexity's index
- Googlebot: supplies AI Overviews and Gemini grounding
Bing's index grounds Microsoft Copilot and still supplies part of some other retrieval pipelines, so Bing Webmaster Tools is worth the ten minutes it takes to set up. Blocking these bots in robots.txt is a real decision with a real cost: a crawler you block cannot read your pages, so its engine cannot cite them. Before assuming you have an optimization problem, check whether you have a permission problem. It also pays to watch the logs going forward: which bots visit, how often, and which pages they fetch together show how visible you are to each pipeline before any answer gets measured.
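Both checks can be scripted. The sketch below uses Python's standard `urllib.robotparser` to test which of the bots named above a robots.txt file would turn away, then counts bot visits in access-log lines by looking for each bot's name. The `AI_BOTS` tuple and both function names are illustrative assumptions. Matching on the name alone is deliberately crude: user-agent strings can be spoofed, and a name like Googlebot also matches its sibling crawlers.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

# Bot names taken from the list above; extend as needed.
AI_BOTS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Googlebot")


def blocked_bots(robots_txt: str, url: str, bots: tuple[str, ...] = AI_BOTS) -> list[str]:
    """Return the bots that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


def bot_hits(log_lines: list[str], bots: tuple[str, ...] = AI_BOTS) -> Counter:
    """Count log lines mentioning each bot name (case-insensitive substring match)."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in bots:
            if bot.lower() in lowered:
                hits[bot] += 1
    return hits
```

Run `blocked_bots` against the text of your own robots.txt and a page you care about; an empty list means none of the listed bots is shut out by that file. `bot_hits` takes raw log lines, so it works on whatever log format your server writes, as long as the user agent is recorded.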
04 How citations get chosen
Retrieval systems score candidate pages on relevance to the query, then filter hard on trust. Trust is inferred from the citation graph, the machine-readable version of reputation: who links to the page, whether the domain gets referenced by publications the engine already relies on, and whether the page's claims agree with what the rest of the web says about the topic. A page that is relevant but unverifiable loses to a page that is slightly less relevant but well corroborated. Engines do not publish the formula, but the observable behavior across ChatGPT search, Perplexity, and AI Overviews is consistent: corroborated sources win.
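Since the real formula is unpublished, the following is only a toy model of the behavior described above, not anyone's actual ranking code. The `Candidate` fields, the 0-to-1 scales, and the `min_corroboration` threshold are all invented for illustration. It filters hard on corroboration first, then ranks the survivors by relevance.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    relevance: float      # hypothetical 0..1 match to the query
    corroboration: float  # hypothetical 0..1 share of claims backed elsewhere


def pick_citations(candidates: list[Candidate],
                   min_corroboration: float = 0.5,
                   k: int = 3) -> list[str]:
    """Drop pages below the trust threshold, then keep the k most relevant."""
    trusted = [c for c in candidates if c.corroboration >= min_corroboration]
    trusted.sort(key=lambda c: c.relevance, reverse=True)
    return [c.url for c in trusted[:k]]
```

Under this model a page with relevance 0.9 and corroboration 0.2 never reaches the ranking step, while a page at 0.8 and 0.9 does. That is the "relevant but unverifiable loses" pattern in miniature.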
This is why backlinks and press coverage move AI visibility faster than on-site edits alone: they change what the engine can verify about you. You cannot tag or template your way past a missing reputation; the engines have to find you already being talked about. The specific content patterns that win the quoted slots are broken down in our guide to AI citations.
05 Why mentions matter even without links
Language models learn from text, and text contains brand names with or without hyperlinks. A trade publication describing your product as a strong option for mid-market teams can enter the corpora the next model trains on, so the description becomes part of what the model knows, even when no link is attached. Unlinked mentions also feed the consistency check: when many independent sources describe your company the same way, engines treat those facts as stable and repeat them with confidence.
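If you want to see how much of your coverage is unlinked, a rough count is easy to script. The sketch below uses Python's standard `html.parser` to split brand mentions on a page into those inside link text and those in plain text. The class and function names are made up for this example, and "linked" here only means the name appears inside an anchor tag, not that the link points at your site.

```python
from html.parser import HTMLParser


class MentionCounter(HTMLParser):
    """Counts brand mentions inside and outside <a> elements."""

    def __init__(self, brand: str):
        super().__init__()
        self.brand = brand.lower()
        self.link_depth = 0  # > 0 while the parser is inside an <a> tag
        self.linked = 0
        self.unlinked = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        count = data.lower().count(self.brand)
        if self.link_depth > 0:
            self.linked += count
        else:
            self.unlinked += count


def count_mentions(html: str, brand: str) -> tuple[int, int]:
    """Return (linked, unlinked) mention counts for `brand` in an HTML page."""
    counter = MentionCounter(brand)
    counter.feed(html)
    counter.close()
    return counter.linked, counter.unlinked
```

The match is a plain case-insensitive substring count, so a short brand name that appears inside other words will be overcounted.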
This is a genuine break from traditional SEO, where an unlinked mention was mostly a missed opportunity. In AI search it is an asset in its own right, which changes what a good press placement is worth.
06 What the mechanics mean for your budget
Follow how AI search works and the strategy writes itself. Make your site crawlable by the AI bots. Publish pages that state answers plainly enough to survive extraction. Above all, earn the backlinks and press coverage that feed all three trust channels at once: training corpora, live retrieval, and the citation graph. A brand that is verifiably talked about beats a brand with a perfect website and no witnesses. Then measure whether assistants actually name you, prompt by prompt, month by month, because nothing else counts as proof. The coverage mechanism is explained in full under backlinks for AI trust, and the program built around it is laid out on the AI SEO services page.
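The measurement step can start as a small script. This sketch assumes you have already recorded assistant answers yourself as `(month, prompt, answer_text)` tuples. That record format and the function name are assumptions made for this example, and collecting the answers is left out entirely. It reports, per month, the share of recorded answers that name the brand.

```python
from collections import defaultdict


def mention_rates(records, brand: str) -> dict[str, float]:
    """Share of recorded answers per month that mention `brand`.

    `records` is an iterable of (month, prompt, answer_text) tuples,
    with months written so they sort correctly, e.g. "2026-01".
    """
    totals = defaultdict(int)
    named = defaultdict(int)
    for month, _prompt, answer in records:
        totals[month] += 1
        if brand.lower() in answer.lower():
            named[month] += 1
    return {month: named[month] / totals[month] for month in sorted(totals)}
```

Keep the prompt set fixed from month to month so the rates stay comparable. A substring match is a blunt test, so spot-check a few answers by hand to confirm the brand was actually recommended and not just mentioned in passing.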