RAG (Retrieval-Augmented Generation)

AI Technology

RAG is a technique in which an AI system retrieves relevant documents from external sources before generating a response. This lets the model ground its answer in current, specific information and cite it, rather than relying solely on training data.

Why It Matters for GEO

Perplexity and ChatGPT with search both use RAG. When your content is retrieved, it can be cited, so GEO (Generative Engine Optimization) focuses on making content easy for retrieval systems to find and use.

Without RAG, AI answers are limited to what the model learned during training, which could be months or years out of date. With RAG, the system fetches documents, often from the live web, before answering. This is where GEO becomes critical: the retrieval step is essentially a search, and search results are ranked. Your content needs to be crawlable, well-structured, and authoritative to make it into that retrieval set. Content that does not get retrieved never gets cited.

How RAG Works

  1. User asks a question
  2. System retrieves relevant documents
  3. LLM generates answer using retrieved content
  4. Sources are cited in response
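
The four steps above can be sketched in a few lines of Python. This is a toy illustration rather than how Perplexity or ChatGPT actually implement retrieval: the keyword-overlap scorer stands in for a real search index or embedding model, the generate function is a stub where an LLM call would go, and the Document type, the CORPUS entries, and the example.com URLs are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    url: str   # hypothetical source URL
    text: str  # page content available to the retriever


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, doc: Document) -> int:
    """Toy relevance score: count of query tokens that also occur in the document."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(doc.text))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Step 2: rank documents and keep the top k that match at all."""
    scored = [(score(query, doc), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for s, doc in scored[:k] if s > 0]


def generate(query: str, sources: list[Document]) -> str:
    """Steps 3 and 4: stub for the LLM call; it only stitches sources and citations."""
    if not sources:
        return "No sources were retrieved, so nothing can be cited."
    body = " ".join(f"{doc.text} [{i}]" for i, doc in enumerate(sources, 1))
    refs = "\n".join(f"[{i}] {doc.url}" for i, doc in enumerate(sources, 1))
    return f"{body}\n\nSources:\n{refs}"


def answer(query: str, corpus: list[Document], k: int = 2) -> str:
    """Step 1 is the incoming query; retrieval always runs before generation."""
    return generate(query, retrieve(query, corpus, k))


# Hypothetical two-page corpus for illustration only.
CORPUS = [
    Document("https://example.com/iso-27001-cost",
             "ISO 27001 certification cost depends on company size and region."),
    Document("https://example.com/office-plants",
             "Ferns and succulents suit low-light offices."),
]
```

Calling answer("What does ISO 27001 certification cost?", CORPUS) returns text that cites only the first page. The second page shares no terms with the question, so it never enters the retrieval set and cannot be cited.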

Practical Example

A business owner asks Perplexity: "What is the average cost of ISO 27001 certification for a 200-person company?" A security consultancy has published a guide on exactly this topic with specific price ranges, regional variations, and a 2025 update date. Because the consultancy's site allows PerplexityBot, uses clear headings, and answers the question directly in the first paragraph, Perplexity's RAG system retrieves the guide and cites it. A competitor has the same information buried in a dense PDF behind a gated form. The retriever cannot reach it, so it is never cited.

Common Mistakes

  • Gated content: Putting your most valuable expertise behind lead capture forms prevents RAG systems from retrieving it. Consider publishing a free version of key insights while gating the premium analysis.
  • Blocking AI crawlers: RAG retrieval only works if the AI bot can access your content. A robots.txt that blocks a crawler such as GPTBot or PerplexityBot stops that bot from fetching your pages, which can remove them from the retrieval candidate pool of the system it feeds.
  • No date signals: RAG systems tend to favor recent content, especially for time-sensitive questions. Pages without a visible publication or modification date may be deprioritized relative to dated alternatives on the same topic.
  • Thin answers: RAG extracts the most relevant passage to cite. If your answer to a specific question is vague or incomplete, the system may retrieve a competitor's more direct response instead.
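
To check the crawler-blocking mistake above, Python's standard urllib.robotparser module can be used. The sketch below assumes the two user-agent names mentioned in the list; the crawler_access helper, the SAMPLE_ROBOTS text, and the example.com URL are hypothetical. It only reports what a robots.txt file permits, not whether a given AI system will actually retrieve or cite the page.

```python
from urllib.robotparser import RobotFileParser

# User-agent names taken from the list above; extend as needed.
AI_CRAWLERS = ["GPTBot", "PerplexityBot"]


def crawler_access(robots_txt: str, page_url: str,
                   agents: list[str] = AI_CRAWLERS) -> dict[str, bool]:
    """Return, per crawler, whether the given robots.txt text allows page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network call
    return {agent: parser.can_fetch(agent, page_url) for agent in agents}


# Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
```

With this sample, crawler_access(SAMPLE_ROBOTS, "https://example.com/guide") reports GPTBot as blocked and PerplexityBot as allowed. Running the same check against your own robots.txt is a quick way to confirm that no AI crawler is excluded by accident.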