Building emdash-geo-seo: One Plugin for AI Citations and Traditional SEO

Most SEO tools were designed for one job: rank in Google. The job has quietly become two jobs. The first one — rank in classic search — has not gone away. The second one — get *cited* by AI assistants when they answer questions — is new, has different inputs, and almost no off-the-shelf tooling for a small Astro site running on Cloudflare Workers.

GEO, for *Generative Engine Optimization*, is the awkward name people have started using for that second job. The term will probably not survive, but the work it points to will. AI assistants need different signals than search crawlers do: a clear declaration of what your content may be used for and which crawlers may consume it, structured data they can lift verbatim into an answer, and an llms.txt that tells them where the canonical version of each page lives.

I wrote emdash-geo-seo to handle both at once, from one place, with an admin UI on top so I do not have to redeploy to change a robots policy or a JSON-LD sameAs link. This post covers what it does, why each piece is there, and how it is shaped.

The shape from outside

The plugin emits ten public artifacts and one admin panel. The artifacts are what AI crawlers, search engines, and aggregators consume; the admin is where I edit the inputs.

Public surface (rendered live, no static caching):

/robots.txt — Cloudflare's *Content-Signal* preamble plus a per-crawler allow/deny matrix.
/llms.txt — short summary plus an ordered list of pages, one collection per section.
/llms-full.txt — same index but with the full markdown body of each entry inline.
/sitemap-index.xml — top-level sitemap index linking to per-collection sitemaps.
/sitemap-posts.xml — per-collection URL list with lastmod and priority.
/schemamap.xml — a parallel sitemap for the JSON-LD endpoints below.
/schema/posts.json — JSON-LD @graph for an entire collection (WebSite, Organization, then one node per entry).
/rss.xml — site RSS, kept distinct so feed readers do not get confused by the llms-full.txt payload.
Plus per-page meta tags and JSON-LD injected via the page:metadata hook on every rendered page.

Admin surface, at /_emdash/admin/plugins/emdash-geo-seo/geo, with four tabs:

General — /llms.txt configuration: short summary, which collections to include, per-collection limit.
AI crawlers — three modes (Allow all / Deny all / Custom) plus a per-crawler matrix when Custom is on. Names like GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot.
Organization — schema.org extras: sameAs profile links and any other JSON-LD properties the JSON endpoints should include.
Advanced — meta defaults: title template, title and description length ranges, list of canonical-allowed query parameters.

None of these values has to be baked into the build. Every value is editable in the admin and persisted to the EmDash KV store under the plugin's id. Public routes read settings at request time, so a save in the admin is reflected in the next request to /robots.txt. No redeploy.

Why each piece exists

`/robots.txt` with Content-Signal

The Cloudflare *Content-Signal* spec is the closest thing the open web has to a machine-readable license declaration. The preamble at the top of a Content-Signal-enabled robots.txt says, in plain English: "if the site sets a content signal to yes for a use, you may collect content for that use; if it sets it to no, you may not; if it sets no signal for a use, permission is neither granted nor restricted." The signals themselves (search, ai-input, ai-train) are then attached to user-agent groups with a Content-Signal line. So a publisher can say "Googlebot may consume this for search, but ai-train=no", and the spec gives a crawler a contractual basis for honoring it.

The plugin generates the preamble verbatim, then an allow-list block, then a per-crawler block built from the admin matrix. The matrix is the control surface: flip GPTBot to deny in the admin, save, and the next curl /robots.txt includes Disallow: / under User-agent: GPTBot, with no redeploy.
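A minimal sketch of how robots.txt could be assembled from the admin's mode and per-crawler matrix. Everything here is an assumption for illustration: the RobotsSettings shape, the renderRobotsTxt name, defaulting unlisted crawlers to allow in Custom mode, and the preamble, which is an abbreviated paraphrase of Cloudflare's policy text rather than the verbatim copy the plugin emits.

```typescript
// Hypothetical settings shape; the plugin's real types are not shown in this post.
type CrawlerMode = "allow-all" | "deny-all" | "custom";
type Decision = "allow" | "deny";

interface RobotsSettings {
  mode: CrawlerMode;
  // Only consulted in "custom" mode; crawlers missing from it default to allow (an assumption).
  matrix: Record<string, Decision>;
  // Signals for the wildcard group, e.g. "search=yes, ai-input=yes, ai-train=no".
  contentSignal: string;
}

const CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"];

// Abbreviated paraphrase of the Content Signals preamble; use the full policy text in practice.
const PREAMBLE = [
  "# As a condition of accessing this website, you agree to abide by the following content signals:",
  "# (a) If a content-signal = yes, you may collect content for the corresponding use.",
  "# (b) If a content-signal = no, you may not collect content for the corresponding use.",
  "# (c) If no content signal is given for a use, permission is neither granted nor restricted via content signal.",
];

function decisionFor(s: RobotsSettings, bot: string): Decision {
  if (s.mode === "allow-all") return "allow";
  if (s.mode === "deny-all") return "deny";
  return s.matrix[bot] ?? "allow";
}

function renderRobotsTxt(s: RobotsSettings, sitemapUrl: string): string {
  const lines: string[] = [...PREAMBLE, ""];
  // The wildcard group carries the Content-Signal declaration.
  lines.push("User-agent: *", `Content-Signal: ${s.contentSignal}`, "Allow: /", "");
  // One group per named AI crawler, driven by the admin matrix.
  for (const bot of CRAWLERS) {
    lines.push(`User-agent: ${bot}`, decisionFor(s, bot) === "deny" ? "Disallow: /" : "Allow: /", "");
  }
  lines.push(`Sitemap: ${sitemapUrl}`);
  return lines.join("\n") + "\n";
}
```

Because the function is pure and takes settings as an argument, a route can call it with freshly loaded settings on every request, which is all that "no redeploy" requires.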

`/llms.txt` and `/llms-full.txt`

llms.txt is a proposal from Jeremy Howard's group at Answer.AI for a single canonical document at the root of a site that tells an LLM what the site is, lists the canonical URLs for the most important pages, and optionally inlines the markdown of each one (llms-full.txt). The format is small enough to fit in a model's context window during retrieval, and structured enough to be parsed without HTML.

The plugin emits both. llms.txt has the site title, an admin-editable summary, and one section per included collection ("## Posts", "## Pages") with a bulleted list of links. llms-full.txt has the same structure but inlines each entry's markdown body. Both pull from EmDash's content layer using the same hydration the rest of the site uses, so a draft does not appear and a scheduled post does not appear early. The included collections and per-section limits are settings, not code.
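The llms.txt proposal's layout is an H1 title, a blockquote summary, then H2 sections of markdown links. A minimal sketch of rendering both files from one function, assuming a hypothetical Entry shape. The isVisible filter only approximates EmDash's hydration rules (no drafts, no scheduled posts before their date), and the per-entry heading layout for llms-full.txt is my own choice, not necessarily the plugin's.

```typescript
// Hypothetical entry shape; EmDash's real content types are not shown in this post.
interface Entry {
  title: string;
  url: string;
  excerpt?: string;
  body: string; // markdown source
  status: "draft" | "published";
  publishedAt: Date;
}

interface Section {
  heading: string; // e.g. "Posts"
  entries: Entry[];
  limit: number; // per-collection limit from the General tab
}

// Stand-in for EmDash's hydration rules: hide drafts and not-yet-due scheduled posts.
function isVisible(e: Entry, now: Date): boolean {
  return e.status === "published" && e.publishedAt.getTime() <= now.getTime();
}

function renderLlmsTxt(
  siteTitle: string,
  summary: string,
  sections: Section[],
  full: boolean, // false for /llms.txt, true for /llms-full.txt
  now: Date = new Date(),
): string {
  const out: string[] = [`# ${siteTitle}`, "", `> ${summary}`];
  for (const section of sections) {
    const visible = section.entries.filter((e) => isVisible(e, now)).slice(0, section.limit);
    if (visible.length === 0) continue; // skip empty sections entirely
    out.push("", `## ${section.heading}`, "");
    for (const e of visible) {
      if (full) {
        // Full variant: a heading per entry, then its markdown body inline.
        out.push(`### [${e.title}](${e.url})`, "", e.body.trim(), "");
      } else {
        out.push(`- [${e.title}](${e.url})${e.excerpt ? `: ${e.excerpt}` : ""}`);
      }
    }
  }
  return out.join("\n") + "\n";
}
```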

Sitemaps and schemamap

sitemap-index.xml and sitemap-posts.xml are the unglamorous part, and the part Googlebot and Bingbot still actually read. The index gives crawlers one discovery point that links out to a sitemap per collection; lastmod comes from updatedAt, and priority is set per collection in the file config. Standard fare.

schemamap.xml is the same shape but for the JSON-LD endpoints under /schema/<collection>.json. AI assistants that look for structured data have an easier time when there is a sitemap of the structured-data files, separate from the HTML sitemap, so they do not have to derive it.
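A sketch of the three XML files, assuming schemamap.xml reuses the standard sitemaps.org urlset format with the JSON-LD endpoint URLs as its loc values. The function names and the SitemapUrl shape are hypothetical.

```typescript
// Hypothetical input shape for one <url> entry.
interface SitemapUrl {
  loc: string;
  lastmod?: Date; // from the entry's updatedAt
  priority?: number; // 0.0 to 1.0, set per collection
}

// Escape the five XML special characters so URLs with query strings stay well-formed.
function xmlEscape(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

const XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>';
const SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

// Per-collection sitemap, e.g. /sitemap-posts.xml.
function renderUrlset(urls: SitemapUrl[]): string {
  const items = urls.map((u) => {
    const parts = [`    <loc>${xmlEscape(u.loc)}</loc>`];
    if (u.lastmod) parts.push(`    <lastmod>${u.lastmod.toISOString()}</lastmod>`);
    if (u.priority !== undefined) parts.push(`    <priority>${u.priority.toFixed(1)}</priority>`);
    return `  <url>\n${parts.join("\n")}\n  </url>`;
  });
  return [XML_HEADER, `<urlset xmlns="${SITEMAP_NS}">`, ...items, "</urlset>"].join("\n") + "\n";
}

// /sitemap-index.xml: one <sitemap> per collection file.
function renderSitemapIndex(sitemapLocs: string[]): string {
  const items = sitemapLocs.map((loc) => `  <sitemap>\n    <loc>${xmlEscape(loc)}</loc>\n  </sitemap>`);
  return [XML_HEADER, `<sitemapindex xmlns="${SITEMAP_NS}">`, ...items, "</sitemapindex>"].join("\n") + "\n";
}

// /schemamap.xml: the same urlset shape, pointing at JSON-LD endpoints instead of HTML pages.
function renderSchemamap(origin: string, collections: string[]): string {
  return renderUrlset(collections.map((c) => ({ loc: `${origin}/schema/${c}.json` })));
}
```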

`schema/<collection>.json`

This is the single most useful artifact for AI citation, in my experience. It returns a JSON-LD @graph with three kinds of nodes:

A WebSite node, anchored at https://example.com/#website.
An Organization node, anchored at https://example.com/#org. This is where the admin's *Organization* tab adds sameAs (a list of profile URLs on other sites that belong to the same owner) and any other extras.
One WebPage node per entry, with headline, description, datePublished, dateModified, and back-references to the WebSite and Organization nodes.

The sameAs list is a small thing that does outsized work. AI models doing entity resolution use it to confirm "the person who wrote this post on this site is the same person who has *that* GitHub profile and *those* Mastodon and X accounts." Without it, a model has to guess; with it, you have given it a verifiable list to anchor on.

The whole graph is generated per-collection on every request. Settings — siteName, url, organization.name, organization.sameAs — come from KV; structured fields per entry come from EmDash's content. KV first, file config second, defaults third.
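A minimal sketch of the per-collection graph, assuming hypothetical settings and entry shapes. The #website and #org anchors follow the ones described above; the #webpage suffix on entry nodes and the choice that extras can never override the core Organization fields are my assumptions.

```typescript
// Hypothetical shapes; field names mirror the settings named in the post.
interface OrganizationSettings {
  name: string;
  sameAs: string[]; // profile URLs from the Organization tab
  extras?: Record<string, unknown>; // any other JSON-LD properties
}
interface SchemaSettings {
  siteName: string;
  url: string;
  organization: OrganizationSettings;
}
interface SchemaEntry {
  url: string;
  title: string;
  description: string;
  publishedAt: Date;
  updatedAt: Date;
}

function buildCollectionGraph(s: SchemaSettings, entries: SchemaEntry[]) {
  const origin = s.url.replace(/\/+$/, ""); // "https://example.com/" -> "https://example.com"
  const websiteId = `${origin}/#website`;
  const orgId = `${origin}/#org`;
  return {
    "@context": "https://schema.org",
    "@graph": [
      { "@type": "WebSite", "@id": websiteId, url: `${origin}/`, name: s.siteName, publisher: { "@id": orgId } },
      {
        ...s.organization.extras, // spread first so extras cannot override the core fields below
        "@type": "Organization",
        "@id": orgId,
        name: s.organization.name,
        url: `${origin}/`,
        sameAs: s.organization.sameAs,
      },
      // One node per entry, linked back to the shared nodes by @id.
      ...entries.map((e) => ({
        "@type": "WebPage",
        "@id": `${e.url}#webpage`,
        url: e.url,
        headline: e.title,
        description: e.description,
        datePublished: e.publishedAt.toISOString(),
        dateModified: e.updatedAt.toISOString(),
        isPartOf: { "@id": websiteId },
        publisher: { "@id": orgId },
      })),
    ],
  };
}
```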

Per-page meta and JSON-LD

The other half of the plugin is what runs on every rendered page, not just the GEO endpoints. EmDash exposes a page:metadata hook that lets a plugin inject <meta>, <link>, and <script type="application/ld+json"> tags into the document head. The plugin uses it to add:

Open Graph and Twitter card tags built from the entry's title, excerpt, and featured image.
A canonical link, with an optional query-parameter allow-list (so ?utm_source=... does not produce a different canonical).
A title built from a configurable template (e.g. "{title} — {siteName}").
A page-level JSON-LD node, typed by the entry's collection (Article for posts, WebPage for static pages, an entity-typed node where the schema has one).

This is the part that classic-SEO tools would call "the SEO plugin." It is also the part that makes individual pages legible to AI assistants when they are visited directly, instead of through one of the index files.
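Two of the per-page pieces are small pure functions. A sketch with hypothetical names: canonicalUrl keeps only allow-listed query parameters (sorted, so parameter order never changes the canonical) and drops the fragment; renderTitle fills a template such as "{title} | {siteName}"; pageNodeType covers only the posts/pages split, not entity-typed nodes. How the results are handed to the page:metadata hook is not shown, since that hook's exact signature is not in this post.

```typescript
// Keep only allow-listed query parameters (sorted for stability) and drop the fragment.
function canonicalUrl(href: string, allowedParams: readonly string[]): string {
  const url = new URL(href);
  const kept = new URLSearchParams();
  url.searchParams.forEach((value, key) => {
    if (allowedParams.includes(key)) kept.append(key, value);
  });
  kept.sort();
  url.search = kept.toString(); // an empty string removes the "?" entirely
  url.hash = "";
  return url.toString();
}

// Fill "{name}" placeholders from vars; unknown placeholders are left untouched.
function renderTitle(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) => vars[key] ?? match);
}

// Page-level JSON-LD type: Article for posts, WebPage otherwise (entity-typed nodes omitted).
function pageNodeType(collection: string): "Article" | "WebPage" {
  return collection === "posts" ? "Article" : "WebPage";
}
```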

How it is shaped on the inside

The plugin runs in EmDash's *native* format. That is the format with React admin entries, hooks, and storage, as opposed to the sandboxed format, which is more isolated but does not support a full admin UI. The descriptor is small:

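```typescript
// PluginDescriptor is EmDash's plugin descriptor type and GeoSeoUserConfig is the plugin's
// own options type; their import statements are omitted from this excerpt.
export function geoSeoPlugin(options: GeoSeoUserConfig = {}): PluginDescriptor {
  return {
    id: "emdash-geo-seo", // settings are stored in KV under this id
    version: "0.2.0",
    format: "native", // React admin, hooks, and storage (not the sandboxed format)
    entrypoint: "emdash-geo-seo/sandbox", // registers hooks and JSON routes
    options: options as Record<string, unknown>,
    capabilities: ["read:content"], // read-only access to site content
    storage: {},
    adminEntry: "emdash-geo-seo/admin", // React view router for the four tabs
    adminPages: [{ path: "/geo", label: "GEO", icon: "robot" }], // /_emdash/admin/plugins/emdash-geo-seo/geo
  };
}
```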

Three pieces do most of the work: a sandbox entry that registers hooks and JSON routes, an Astro-side createGeoSeoRoutes function that the host site mounts under /robots.txt, /llms.txt, and friends, and an admin entry that exports the React view router for the four tabs.
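The actual signature of createGeoSeoRoutes is not shown here, so this is only a hypothetical sketch of the path matching such a mount needs: fixed paths map directly, and per-collection paths match only collections that are configured, so a request for an unknown collection gets no match rather than an empty graph. The GeoRoute type and matchGeoRoute name are illustrative.

```typescript
// Hypothetical: which artifact a request path asks for.
type GeoRoute =
  | { kind: "robots" }
  | { kind: "llms"; full: boolean }
  | { kind: "sitemap-index" }
  | { kind: "sitemap"; collection: string }
  | { kind: "schemamap" }
  | { kind: "schema"; collection: string }
  | { kind: "rss" };

function matchGeoRoute(pathname: string, collections: readonly string[]): GeoRoute | null {
  // Fixed paths first, so /sitemap-index.xml never falls through to the per-collection pattern.
  switch (pathname) {
    case "/robots.txt": return { kind: "robots" };
    case "/llms.txt": return { kind: "llms", full: false };
    case "/llms-full.txt": return { kind: "llms", full: true };
    case "/sitemap-index.xml": return { kind: "sitemap-index" };
    case "/schemamap.xml": return { kind: "schemamap" };
    case "/rss.xml": return { kind: "rss" };
  }
  // Per-collection endpoints only match collections that are actually configured.
  const sitemap = /^\/sitemap-([a-z0-9_-]+)\.xml$/.exec(pathname);
  if (sitemap && collections.includes(sitemap[1])) return { kind: "sitemap", collection: sitemap[1] };
  const schema = /^\/schema\/([a-z0-9_-]+)\.json$/.exec(pathname);
  if (schema && collections.includes(schema[1])) return { kind: "schema", collection: schema[1] };
  return null;
}
```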

Settings flow through three layers, KV first:

```
  KV (per-key, edited in admin)
     │
     ▼
  fileConfig (geoSeoOptions in astro.config.mjs)
     │
     ▼
  resolveConfig defaults (sane out-of-box behavior)
```

loadSettings(kv, fileConfig) reads each KV key, falls back to the corresponding fileConfig field, falls back to a default. The organization section is a shallow merge so an admin can override name without losing sameAs. Everything that reads settings does so per request, so saves in the admin take effect immediately.
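A minimal sketch of the three-layer lookup. The KvLike interface, the key names, and the default values are hypothetical; only the precedence (KV, then file config, then defaults) and the shallow merge of organization come from the description above.

```typescript
// Hypothetical KV interface: an async get that returns null (or undefined) for missing keys.
interface KvLike {
  get(key: string): Promise<unknown>;
}

interface Organization { name: string; sameAs: string[] }
interface Settings {
  siteName: string;
  titleTemplate: string;
  organization: Organization;
}
type FileConfig = Partial<Omit<Settings, "organization">> & { organization?: Partial<Organization> };

// Placeholder defaults for illustration only.
const DEFAULTS: Settings = {
  siteName: "My Site",
  titleTemplate: "{title} | {siteName}",
  organization: { name: "My Site", sameAs: [] },
};

// KV wins, then the file config, then the built-in default.
async function pick<T>(kv: KvLike, key: string, fromFile: T | undefined, fallback: T): Promise<T> {
  const fromKv = await kv.get(key);
  if (fromKv !== null && fromKv !== undefined) return fromKv as T;
  return fromFile ?? fallback;
}

async function loadSettings(kv: KvLike, fileConfig: FileConfig = {}): Promise<Settings> {
  const kvOrg = ((await kv.get("organization")) ?? {}) as Partial<Organization>;
  return {
    siteName: await pick(kv, "siteName", fileConfig.siteName, DEFAULTS.siteName),
    titleTemplate: await pick(kv, "titleTemplate", fileConfig.titleTemplate, DEFAULTS.titleTemplate),
    // Shallow merge: overriding name in KV keeps sameAs from the lower layers.
    organization: { ...DEFAULTS.organization, ...fileConfig.organization, ...kvOrg },
  };
}
```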

A plugin:install hook seeds the four KV keys with sensible defaults on first run, so the average installation never needs to think about fileConfig. After install, the file config can be empty:

```typescript
export const geoSeoOptions: GeoSeoUserConfig = {};
```

That is exactly what src/config/geo-seo.ts looks like on this site after I converted it. The admin owns the configuration; the file is only for build-time CI overrides.
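The install hook mainly needs to be idempotent: write a default for each missing key and leave existing keys alone, so reinstalling never clobbers edits made in the admin. A sketch, assuming a hypothetical KV with get/set and guessing one key per admin tab; the key names and default values are placeholders, not the plugin's.

```typescript
// Hypothetical KV with get/set; the real EmDash KV API may differ.
interface SeedableKv {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown): Promise<void>;
}

// Assumed key names (one per admin tab) with placeholder values.
const SEED_DEFAULTS: Record<string, unknown> = {
  general: { summary: "", collections: ["posts"], limit: 50 },
  crawlers: { mode: "allow-all", matrix: {} },
  organization: { name: "", sameAs: [] },
  advanced: { titleTemplate: "{title} | {siteName}", canonicalParams: [] },
};

// Only writes keys that are missing; returns the keys it seeded.
async function seedDefaults(kv: SeedableKv, defaults: Record<string, unknown> = SEED_DEFAULTS): Promise<string[]> {
  const seeded: string[] = [];
  for (const [key, value] of Object.entries(defaults)) {
    const existing = await kv.get(key);
    if (existing === null || existing === undefined) {
      await kv.set(key, value);
      seeded.push(key);
    }
  }
  return seeded;
}
```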

What it does not do

It is worth being explicit about scope. The plugin does *not*:

Run audits or scoring. There are good tools for that (Lighthouse, Ahrefs, Sitebulb, the various GEO scoring plugins). I want the input side handled correctly; auditing is a separate task.
Manage redirects. EmDash 0.7 ships a redirects table; that is the right home for them.
Generate AI-pleasing prose. The point is to make whatever prose the author *wrote* legible to crawlers and models, not to rewrite it.
Track AI citations. There is no AI-equivalent of Search Console yet. When there is, that will be a separate plugin or a separate tab.

The job is the boring middle: make sure every page emits the right meta tags, every collection has a JSON-LD endpoint, every crawler knows what it is allowed to do, and llms.txt says what the site is.

What I'd build next

Three things, in priority order:

Per-page overrides in the admin. Right now per-page metadata is computed from the entry's data plus site defaults. I want a small "GEO" panel inside the post editor for overrides — a custom title, a custom canonical, a per-page robots directive ("noindex, follow" for thank-you pages and the like). The KV layer already supports it; the admin tab does not exist yet.
`schema_export` MCP tool. Settings MCP tools landed in 0.8. A geoseo:export tool that returns the resolved settings, the rendered robots.txt, and the rendered llms.txt would let me drive a CI smoke check from outside the worker without scraping the live site.
A presets pack. Most sites I would install this on want one of three configurations: "open to all," "open to search, deny AI training," "open to citation, deny training and image generation." The admin UI handles all three but they are tedious to compose by hand. A presets dropdown that bulk-edits the matrix would save five minutes on every install.

Where the source lives

The plugin source is in plugins/emdash-geo-seo/ in this site's repo, which is currently the canonical location while the API stabilizes. Once the surface stops changing I will publish it as a standalone npm package; for now, vendoring it as a workspace plugin is fine and lets me iterate on the shape without semver pressure.

The admin lives at `/_emdash/admin/plugins/emdash-geo-seo/geo`. The public surface is everything I listed above; you can hit /robots.txt, /llms.txt, /schema/posts.json, and the rest right now and see what the production configuration looks like.

SEO has always been about making the right declarations in the right places. GEO is the same job extended to a new audience that reads llms.txt and JSON-LD instead of <title> and <meta name="description">. Doing both from one plugin, with one admin UI and no redeploy to flip a setting, is the bar I wanted. This is what cleared it.

Continue reading

Testing emdash-geo-seo: A 92/100 AI SEO Score and What It Means

Upgrading to EmDash 0.8.0: Skipping 0.7 and a Migration That Forgot Itself

Upgrading to EmDash 0.6.0: A Quieter Release
